Thread: Disaster recovery for clusters

  1. #1
    CFPDSA Guest

    Disaster recovery for clusters

    My new company uses clustering, but no one really knows what they are doing.
    I am new to it myself, but I'm doing an Exchange DR review and want to plug
    the holes.

    The clusters are wonderful; we use SAN-based clusters on Dell hardware.
    We have about ten 2-node active/passive clusters in total. They work great
    and fail over beautifully, and I am now in love with clustering, especially
    for Exchange.

    The scary question is this: what happens if we shut down the nodes of a
    cluster (or lose power or something) and, when we go to start them, neither
    one works? In other words, what if we have to restore from backup... can we
    do it?

    From my current reading, the answer would be no, because we are not backing
    up anything other than the stores at the moment.

    The information out there on recovering an entire cluster is sketchy at
    best. The Exchange DR Ops guide suggests that all that is needed is a System
    State backup, but when I test this on a VMware SCSI-based cluster,
    restoring the System State onto a fresh OS install results in the cluster
    service failing to start.

    There are hints in that guide and elsewhere that what is needed is an ASR
    (Automated System Recovery) backup, for the disk signatures. This is fine
    (and I am in the process of testing it), but it seems impractical for an
    enterprise backup solution. We are using BackupExec here at the moment and
    contemplating switching to Commvault. AFAIK, neither of these does ASR
    backups, so we'd have to run an ASR backup manually on each cluster (on the
    active node at least). And what about the floppy? Or excluding certain
    files from the backup? The ASR option just doesn't make practical sense in
    a production environment.

    What am I missing here? How do most enterprises do their cluster backups?

    Thanks for any input...

    -JC

  2. #2
    Edwin vMierlo [MVP] Guest

    Re: Disaster recovery for clusters

    Have a look at these documents first
    http://www.microsoft.com/technet/pro.../sercbrbp.mspx
    http://support.microsoft.com/kb/887017

    That should get you started. Please post specific questions back to this
    newsgroup.

    rgds,
    edwin.

  3. #3
    CFPDSA Guest

    Re: Disaster recovery for clusters

    Ok, here is the current scenario I am troubleshooting:

    3 VMs:

    DCEXCH - DC/Exchange combo (this is a lab environment)
    NodeA/NodeB - two functional nodes hosting an Exchange 2003 clustered
    server using a shared SCSI bus for the clustered resources (quorum drive, log
    drive, data drive). IDE drive for system/boot partition.

    Steps to reproduce the problem:

    1) Take a System State backup of NodeA (plus the Program Files directory)
    to a share on DCEXCH. Also back up the System State of NodeB and the
    Exchange stores using the Exchange API-based backup. All of this is done
    with NTBackup.
    2) Shut down Nodes A and B.
    3) Copy a sysprepped base OS IDE drive over the top of NodeA's system drive
    file to simulate a wipe and rebuild of the machine. Boot the machine and
    enter the name (NodeA) and password; the machine reboots and starts clean
    (not a member of the domain). Assign a static IP/subnet so it can
    communicate with DCEXCH.
    4) FYI, the quorum, log and data cluster resource disks are visible at this
    point in Windows Explorer, labels intact. No errors in the event log.
    5) Use NTBackup to restore the System State and the Program Files
    directory. Prompted to reboot.
    6) The reboot takes a long time and we get the "At least one service or
    driver failed during system startup" error.
    7) The cluster service is not started. The event log lists event ID 1000:
    "Microsoft Clustering Service suffered an unexpected fatal error at line
    '<line>' of source module '<source path>'. The error code was '<error
    code>'." The path references d:\ which is the quorum drive, and the error
    code is 2. I have found no useful information about this error on the
    internet. The cluster resource drives show up in Windows Explorer with the
    correct drive letters, but the labels do not appear, and clicking on a
    drive results in a "the device is not ready" error.
    8) Examine the
    HKLM\System\currentcontrolset\services\clusdisk\parameters\signatures
    key and compare it with the disk signatures reported by diskpart... they
    are the same.
    9) Attempt to use the clusterrecovery.exe tool, which fails because the
    cluster is offline. Attempt to start the cluster service using the
    /fixquorum switch; this fails with the same 1000 error.
    10) Numerous articles refer to using dumpcfg to re-write the signatures, but
    I cannot find dumpcfg available for download anywhere.
    11) Based on the info in article 217157, I was able to determine that the
    HKLM\Cluster hive is not loaded (it is blank). Attempted to follow the
    directions in 224999 to restore the hive by copying the chk backup file over
    CLUSDB, but I cannot rename CLUSDB, as it continues to say there is a
    process accessing it (but the cluster service is stopped!).
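
    The signature comparison in step 8 can be sketched in a few lines of
    Python. This is only an illustration, assuming the signatures have been
    copied by hand from the subkey names under the Signatures key and from the
    "Disk ID" values that diskpart reports; the values below are hypothetical,
    not taken from the lab machines.

```python
def normalize_signature(text):
    """Return a disk signature as 8 upper-case hex digits."""
    value = text.strip().upper()
    if value.startswith("0X"):
        value = value[2:]
    int(value, 16)  # raises ValueError if the text is not hexadecimal
    return value.zfill(8)


def compare_signatures(registry_names, diskpart_ids):
    """Compare two collections of signatures and report the differences."""
    reg = {normalize_signature(s) for s in registry_names}
    disk = {normalize_signature(s) for s in diskpart_ids}
    return {
        "matching": sorted(reg & disk),
        "only_in_registry": sorted(reg - disk),
        "only_on_disk": sorted(disk - reg),
    }


# Hypothetical values, typed in by hand from regedit and diskpart.
registry_names = ["1A2B3C4D", "0badf00d", "00C0FFEE"]
diskpart_ids = ["1a2b3c4d", "0BADF00D", "C0FFEE"]

result = compare_signatures(registry_names, diskpart_ids)
print(result)
```

    Normalizing case and leading zeros first avoids false mismatches when the
    two tools format the same signature differently.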

    Bottom line: what can I do to fix the cluster in this scenario?

    I have read the links you've provided, and many, many others. Nothing out
    there is helpful.

    I just want to verify the procedure for using a plain, ordinary system state
    backup of a cluster node to restore from scratch. Seems like a simple
    request... I've verified that an ASR backup/restore works, but doing ASR
    backups on 20+ servers seems a bit of a tall order, never mind that they
    don't have floppy drives and there are no straightforward official MS
    instructions on using RIS to do ASR that I can find. The point is that this
    is a basic, basic, basic requirement for any clustering solution and it
    should be possible.

    Any help is appreciated.

    -JC

  4. #4
    Raistlin Guest

    Re: Disaster recovery for clusters

    The same problem is bothering my team too. We are truly puzzled by the
    materials out there, which discuss a lot without giving a practical
    way to solve the problem. The lack of official support for Win2k3
    clusters has kept us from deploying more fault-tolerant services.


  5. #5
    Edwin vMierlo [MVP] Guest

    Re: Disaster recovery for clusters

    > 6) Reboot takes a long time and we get the "At least one service or
    > driver failed during system startup" error.

    Did you check the system event log to find out which driver?

    > 7) Cluster service is not started. Event log lists event id 1000
    > "Microsoft Clustering Service suffered an unexpected fatal error at line
    > '<line>' of source module '<source path>'. The error code was '<error
    > code>'." The path references d:\ which is the quorum drive, error code
    > is 2. I have found no useful information about this error on the
    > internet. The cluster resource drives show up in windows explorer with
    > the correct drive letters, but the labels do not appear, and clicking on
    > the drive results in a "the device is not ready" error.

    As for the fatal error: did the cluster.log file show any errors at that
    time? (Note: the cluster.log timestamps are written in GMT, regardless of
    the time zone setting or the time on the host.)
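
    To line up cluster.log entries with the event log, the GMT timestamps have
    to be shifted to the host's local time. A minimal Python sketch of that
    conversion, assuming the yyyy/mm/dd-hh:mm:ss.mmm layout that cluster.log
    uses; the sample timestamp and the +3 hour offset are only illustrative
    assumptions.

```python
from datetime import datetime, timedelta, timezone

# Layout of a cluster.log timestamp, e.g. 2008/01/28-02:57:17.881
LOG_TIME_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def cluster_log_time_to_local(stamp, utc_offset_hours):
    """Convert a GMT cluster.log timestamp to a fixed-offset local time."""
    utc_time = datetime.strptime(stamp, LOG_TIME_FORMAT).replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone)


# Assumed sample values: a host that is 3 hours ahead of GMT.
local = cluster_log_time_to_local("2008/01/28-02:57:17.881", 3)
print(local.strftime("%Y/%m/%d-%H:%M:%S"))  # 2008/01/28-05:57:17
```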





  6. #6
    CFPDSA Guest

    Re: Disaster recovery for clusters

    > did you check the system event log to find out which driver ?

    As mentioned earlier, it wasn't a driver; it was the cluster service that
    failed to start.

    > the fatal error, did the cluster.log file show any errors at that time
    > (note: the cluster.log file timestamps are written in GMT, regardless of
    > timezone settings or time on the host)


    Here ya go:

    0000067c.00000698::2008/01/28-02:57:17.881 INFO [CS] Cluster Service started - Cluster Node Version 4.3790
    0000067c.00000698::2008/01/28-02:57:17.881 INFO OS Version 5.2.3790 - Service Pack 2 (ADS 03000112L)
    0000067c.00000698::2008/01/28-02:57:17.881 INFO Local Time is 2008/01/28-05:57:17.881
    0000067c.000000b4::2008/01/28-02:57:17.897 INFO [CS] Service Starting...
    0000067c.000000b4::2008/01/28-02:57:17.897 INFO [INIT] ClusterInitialize called to start cluster.
    0000067c.000000b4::2008/01/28-02:57:17.897 INFO [EP] Initialization...
    0000067c.000000b4::2008/01/28-02:57:17.897 INFO [DM] Initialization
    0000067c.000000b4::2008/01/28-02:57:17.897 ERR [DM] DmInitialize: The hive was loaded- rollback, unload and reload again
    0000067c.000000b4::2008/01/28-02:57:17.897 INFO [DM] DmpRestartFlusher: Entry
    0000067c.000000b4::2008/01/28-02:57:17.897 INFO [DM] DmpUnloadHive: unloading the hive
    0000067c.000000b4::2008/01/28-02:57:17.928 INFO [Qfs] QfsSetFileAttributes C:\WINDOWS\Cluster\CLUSDB.BKP$ 80, status 2
    0000067c.000000b4::2008/01/28-02:57:17.928 INFO [Qfs] QfsDeleteFile C:\WINDOWS\Cluster\CLUSDB.BKP$, status 2
    0000067c.000000b4::2008/01/28-02:57:17.928 INFO [DM] Loading cluster database from C:\WINDOWS\Cluster\CLUSDB
    0000067c.000000b4::2008/01/28-02:57:17.975 INFO [DM] DmpStartFlusher: Entry
    0000067c.000000b4::2008/01/28-02:57:17.975 INFO [DM] DmpStartFlusher: thread created
    0000067c.000000b4::2008/01/28-02:57:17.975 ERR [DM] Failed to open key Resources, status 2
    0000067c.000000b4::2008/01/28-02:57:17.975 ERR Cluster service suffered an unexpected fatal error at line 1386 of source module d:\nt\base\cluster\service\dm\dminit.c. The error code was 2.
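
    For anyone sifting through a longer cluster.log, here is a small Python
    sketch that pulls out only the ERR entries. It assumes the entry layout
    seen above (process.thread::timestamp LEVEL message) and treats any line
    that does not start that way as a wrapped continuation of the previous
    entry; the function names are made up for this example. Status 2 is the
    Win32 code ERROR_FILE_NOT_FOUND, which at least appears consistent with the
    blank HKLM\Cluster hive described earlier in the thread.

```python
import re

# Start of an entry: <process>.<thread>::<timestamp> <LEVEL> <message>
ENTRY_START = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s*(?P<message>.*)$"
)


def parse_cluster_log(text):
    """Split log text into entries, re-joining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_START.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries:
            # Not the start of an entry: treat it as wrapped text.
            joined = entries[-1]["message"] + " " + line
            entries[-1]["message"] = joined.strip()
    return entries


def errors_only(entries):
    """Keep only the entries logged at ERR level."""
    return [entry for entry in entries if entry["level"] == "ERR"]


SAMPLE = """\
0000067c.000000b4::2008/01/28-02:57:17.897 ERR [DM] DmInitialize: The hive
was loaded- rollback, unload and reload again
0000067c.000000b4::2008/01/28-02:57:17.897 INFO [DM] DmpUnloadHive:
unloading the hive
0000067c.000000b4::2008/01/28-02:57:17.975 ERR [DM] Failed to open key
Resources, status 2
"""

errors = errors_only(parse_cluster_log(SAMPLE))
for entry in errors:
    print(entry["stamp"], entry["message"])
```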


  7. #7
    Edwin vMierlo [MVP] Guest

    Re: Disaster recovery for clusters

    Instead of "fixquorum", can you start with "resetquorumlog"?

  8. #8
    CFPDSA Guest

    Re: Disaster recovery for clusters

    That didn't work either, same results.

    Sorry it's been a while, but we've had a little "undersea cable" problem
    over here in the last week that has interrupted regular internet access.

  9. #9
    Edwin vMierlo [MVP] Guest

    Re: Disaster recovery for clusters

    OK, back to basics.

    If you disable the cluster service, disable clusdisk.sys and reboot,
    do you have access to all your disks?

  10. #10
    CFPDSA Guest

    Re: Disaster recovery for clusters

    Yup, they all show up and I can access them.

  11. #11
    Edwin vMierlo [MVP] Guest

    Re: Disaster recovery for clusters

    First of all, check the signature of your q:\ (quorum disk).
    If it is not the original, you must change that now.

    If your signature is OK, then do the following:

    ensure Node 2 is fully shut down

    rename the q:\mscs folder to q:\mscs_old

    enable clusdisk.sys again on Node 1
    reboot

    start your cluster with -resetquorumlog

    Does it start now?

    If yes, you need to stop it and start it without the parameter.
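
    The sequence above can be written down as a dry-run checklist. This is only
    a sketch: it prints the steps and runs nothing. It assumes the quorum disk
    is Q: and that the service is started through "net start clussvc" with the
    slash form of the switch; the function name is made up for this example.

```python
def resetquorumlog_plan(quorum_drive="Q:"):
    """Return the recovery steps as (description, command) pairs.

    A command of None marks a step that has to be done by hand.
    """
    return [
        ("Make sure the second node is fully shut down", None),
        ("Rename the MSCS folder on the quorum disk",
         f"ren {quorum_drive}\\MSCS MSCS_old"),
        ("Re-enable clusdisk.sys on the first node and reboot", None),
        ("Start the cluster service with a reset quorum log",
         "net start clussvc /resetquorumlog"),
        ("If it started, stop the service", "net stop clussvc"),
        ("Start the service again without the switch", "net start clussvc"),
    ]


# Dry run: print the checklist, execute nothing.
for number, (description, command) in enumerate(resetquorumlog_plan(), start=1):
    print(f"{number}. {description}")
    if command is not None:
        print(f"   {command}")
```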

  12. #12
    CFPDSA Guest

    Re: Disaster recovery for clusters

    Verified that the signatures are correct by comparing the diskpart
    "detail disk" output with the registry entries.

    Renamed mscs, then re-enabled clusdisk.sys and rebooted.

    Attempted to start the cluster service with -resetquorumlog, and it fails
    again with the same error.

    FYI, when the clusdisk.sys driver is enabled, all the disk resources are
    inaccessible. They are visible in Explorer, but give an error of "The device
    is not ready." when double-clicked.

  13. #13
    Edwin vMierlo [MVP] Guest

    Re: Disaster recovery for clusters

    > Verified the signatures are correct using diskpart disk detail compared w/
    > the registry entries.
    >
    > Renamed mscs, then reenabled clusdisk.sys and rebooted.
    >
    > Attempted to start cluster service with -resetquorumlog and it fails again
    > with the same error.

    Hm, it is going to take a long time to solve this type of problem in a
    newsgroup; if you need a quick response, I guess you need to start getting
    help from Microsoft.

    Something is not right with either your backup or your restore procedure...

    If you still want to keep trying to get this restored cluster online: I am
    starting to believe this is not the quorum, then. Maybe some group policy is
    blocking something, or maybe the account that runs the cluster service
    cannot access the registry or a file... Again, this is going to be a tough
    one to troubleshoot in a forum.

    Have you checked that the user account running the cluster service is a
    local admin?

    On the server, log on with the account that is used to run the cluster
    service. Launch filemon.exe and regmon.exe, then start the cluster service,
    capture the filemon and regmon output, and see if this gives you a clue.

    > FYI, when the clusdisk.sys driver is enabled, all the disk resources are
    > inaccessible. They are visible in Explorer, but give an error of "The
    > device is not ready." when double clicked.

    That is normal; the disks first have to be online in the cluster before you
    can access them.


  14. #14
    CFPDSA Guest

    Re: Disaster recovery for clusters

    Well, the whole reason I started this thread was to find out the answer to
    the obvious question: how do you restore a cluster?

    The particular cluster we are working with here (as stated previously) is a
    VMware-based SCSI cluster, just for testing purposes. But the procedure used
    to restore in this case should be the same for any scenario.

    We do not have MS support (AFAIK), so that is not an option for us. I have
    been researching a proper disaster recovery plan for our Exchange
    clusters and have been unable to find precise guidance on how to restore a
    dead cluster (i.e. the system state was backed up, now the cluster won't
    start, how do you restore the cluster?).

    I thought this would be an easy question... oh well...

  15. #15
    CFPDSA Guest

    Re: Disaster recovery for clusters

    One last question: is the last statement below accurate (i.e. no ASR = no
    cluster rebuild)?

    From:
    http://technet2.microsoft.com/window....mspx?mfr=true

    Scenario 8—Complete Cluster Failure
    Symptom: None of the nodes can boot up.

    If all nodes fail in a cluster and the quorum disk cannot be repaired,
    follow these steps:

    • Use Automated System Recovery on one node in the original cluster,
    choosing a node that was backed up recently and that was active in the
    cluster at the time it was backed up. This restores the disk signatures, the
    partition layout of the cluster disks (quorum and nonquorum), and the cluster
    configuration data. Do not start other nodes until the first node is
    restored. For more information, see To Restore a damaged cluster node using
    Automated System Recovery.

    • Restore other nodes. For more information, see Restore a damaged cluster
    node using Automated System Recovery.

    • Restore your applications and application data from backup data sets.


    Important

    • If you do not have an Automated System Recovery backup of each node, you
    cannot restore the cluster. Instead, you must recreate your cluster from
    scratch. For more information, see Checklist: Planning and creating a server
    cluster.
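
To make the quoted rule concrete, here is a rough Python sketch of how one might check a backup inventory against it. This is only an illustration of the two conditions in the quoted text (every node needs an ASR backup, and the first node restored should be one that was active when it was backed up). The `Node` record, its field names, and `restore_plan` are hypothetical and not part of any backup product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    has_asr_backup: bool        # an ASR backup set exists for this node
    was_active_at_backup: bool  # node was active in the cluster when backed up


def restore_plan(cluster: List[Node]) -> Dict[str, object]:
    """Apply the quoted rule: no ASR backup of each node means no restore."""
    missing = [n.name for n in cluster if not n.has_asr_backup]
    if missing:
        # Per the quoted text, the cluster must be recreated from scratch.
        return {"restorable": False, "missing_asr": missing, "order": []}
    # Restore a node that was active at backup time first, then the others.
    # sorted() is stable, so nodes keep their relative order within each group.
    ordered = sorted(cluster, key=lambda n: not n.was_active_at_backup)
    return {
        "restorable": True,
        "missing_asr": [],
        "order": [n.name for n in ordered],
    }
```

Run against an inventory of all clusters, something like this would at least show which ones could not be restored today under that rule.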





    "CFPDSA" wrote:

    > Well, the whole reason I started this thread was to see what the answer to
    > the obvious question of "how do you restore a cluster" was.
    >
    > The particular cluster we are working with here (as stated previously) is a
    > VMWARE based scsi cluster, just for testing purposes. But the procedure used
    > to restore in this case should be the same for any scenario.
    >
    > We do not have MS support (AFAIK) so that is not an option for us. I have
    > been doing research into a proper disaster recovery plan for our Exchange
    > clusters and have been unable to find precise guidance on how to restore a
    > dead cluster (i.e. the system state was backed up, now the cluster won't
    > start, how do you restore the cluster?).
    >
    > I thought this would be an easy question... oh well...
    >
    > "Edwin vMierlo [MVP]" wrote:
    >
    > >
    > >
    > > > Verified the signatures are correct using diskpart disk detail compared w/
    > > > the registry entries.
    > > >
    > > > Renamed mscs, then reenabled clusdisk.sys and rebooted.
    > > >
    > > > Attempted to start cluster service with -resetquorumlog and it fails again
    > > > with the same error.

    > >
    > > Hm, it is going to take a long time to solve this type of problem in a
    > > newsgroup. If you need a quick response, I guess you need to start
    > > getting help from Microsoft.
    > >
    > > Something is not right with either your backup or your restore procedure...
    > >
    > >
    > >
    > > If you still want to keep trying to get this restored cluster online: I am
    > > starting to believe this is not the quorum, then. Maybe some group policy is
    > > blocking something, or maybe the account running the cluster service
    > > cannot access the registry or a file... Again, this is going to be a tough
    > > one to troubleshoot in a forum.
    > >
    > > Have you checked that the user account running the cluster service is a
    > > local admin?
    > >
    > > On the server, log on with the account used to run the cluster service.
    > > Launch filemon.exe and regmon.exe, then start the cluster service, capture
    > > the Filemon and Regmon output, and see if it gives you a clue.
    > >
    > > >
    > > > FYI, when the clusdisk.sys driver is enabled, all the disk resources are
    > > > inaccessible. They are visible in Explorer, but give an error of "The
    > > > device is not ready." when double clicked.
    > > >
    > >
    > > That is normal: the disks first have to be online in the cluster before
    > > you can access them.
    > >
    > >
    > >
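
For anyone repeating the signature check mentioned in the quoted text (diskpart disk detail compared with the registry entries), a small Python sketch of the comparison might look like the following. It assumes the diskpart detail output contains a line of the form `Disk ID: XXXXXXXX` (eight hex digits) and that the signatures recorded in the registry have already been collected into a list by some other means. Both function names are made up for this example.

```python
import re
from typing import Dict, Iterable, List, Set


def disk_ids_from_diskpart(detail_output: str) -> Set[str]:
    """Collect 8-hex-digit 'Disk ID' values from saved diskpart detail text.

    Assumption: the output has lines such as 'Disk ID: 1A2B3C4D'.
    """
    found = re.findall(r"Disk ID\s*:\s*([0-9A-Fa-f]{8})\b", detail_output)
    return {value.upper() for value in found}


def compare_signatures(
    diskpart_ids: Iterable[str], registry_ids: Iterable[str]
) -> Dict[str, List[str]]:
    """Report which signatures match and which appear on one side only."""
    seen = {s.upper() for s in diskpart_ids}
    expected = {s.upper() for s in registry_ids}
    return {
        "matched": sorted(seen & expected),
        "only_on_disk": sorted(seen - expected),
        "only_in_registry": sorted(expected - seen),
    }
```

An empty `only_on_disk` and `only_in_registry` would correspond to the "signatures are correct" result described above. It says nothing about why the cluster service still fails to start.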

