Thread: Physical disk hangs at "offline pending"

  1. #16
    John Fullbright [MVP] Guest

    Re: Physical disk hangs at "offline pending"

    1. "returned error 997" is ERROR_IO_PENDING.
    2. Your configuration violates atomicity in that you have cross-group
    dependencies. A group should contain its resources plus all dependencies
    of those resources in order to maintain atomicity. Resource dependencies
    control the order in which services go online and offline. If you violate
    atomicity, you break this ordering. Considering the error, I'd hazard a
    guess that this is the root of your problem.
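
    As a rough illustration of point 1, the status codes in the cluster.log
    lines quoted in this thread can be pulled out and given a name. This is a
    minimal Python sketch, not a cluster tool: the regular expression and the
    helper name explain_status are assumptions made for illustration, and the
    lookup table holds only the one Win32 code discussed here (997,
    ERROR_IO_PENDING).

```python
import re

# Win32 system error codes relevant to this thread. 997 is ERROR_IO_PENDING
# ("Overlapped I/O operation is in progress"); extend the table as needed.
WIN32_ERRORS = {997: "ERROR_IO_PENDING"}

# Matches "returned error 997" and "status = 997" as seen in the log excerpts.
STATUS_RE = re.compile(r"(?:returned error|status =)\s*(\d+)")


def explain_status(line):
    """Return (code, name) for the first status code in a log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, WIN32_ERRORS.get(code, "unknown")


if __name__ == "__main__":
    sample = ("00000c20.00000a60::2007/03/01-17:58:25.379 INFO [FM] "
              "FmpCompleteMoveGroup: Exit, status = 997")
    print(explain_status(sample))
```

    A pending status is not a failure by itself. It means the offline request
    is still in progress, which fits the later "pending timed out" entry.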

    John




    "Henry" <Henry@discussions.microsoft.com> wrote in message
    news:056FAD68-9EF5-4877-93B1-841B20EED633@microsoft.com...
    > Hi,
    >
    > 1) No. The file system is NTFS.
    > 2) Event ID 1145 - Cluster resource OracleDB timed out. (Physical disk
    > name)
    > Event ID 1205 - The cluster service failed to bring the resource group
    > "OracleDB" completely online or offline.
    > 3) 00000f7c.00000810::2007/03/01-15:51:35.805 INFO [FM] FmpRmOfflineResource: RmOffline() for 5fa5cc41-66f4-4b14-9d9c-32c7f67347a5 returned error 997. ---
    > 0000c20.00000f18::2007/03/01-17:56:22.860 INFO [FM] FmpRmOfflineResource: RmOffline() for 667e7691-4049-44fe-9380-c620cd79971d returned error 997
    >
    > The following entry is repeated:
    > 00000c20.00000a60::2007/03/01-17:58:25.379 INFO [FM] FmpCompleteMoveGroup: Exit, status = 997
    > 00000c20.00000a60::2007/03/01-17:58:25.875 INFO [FM] FmpCompleteMoveGroup: Completing the move for group BANCTEC to node 1 (1)
    > 00000c20.00000a60::2007/03/01-17:58:25.875 INFO [FM] FmpOfflineResource: Offline resource <OracleDB> returned pending
    >
    > until finally:
    > 00000c20.00000a60::2007/03/01-17:59:40.276 INFO [FM] FmpCompleteMoveGroup: Exit, status = 997
    > 000002f4.00000388::2007/03/01-17:59:40.757 WARN [RM] RmpTimerThread: Resource OracleDB pending timed out, CP 3 - setting state to failed.
    >
    > This last message may be the result of us getting fed up and shutting down
    > the server that will not release the physical drive.
    > 4) The other resources are offline. On the odd occasion, another physical
    > disk displays the offline pending symptoms as well.
    >
    > Thanks in Advance
    >
    > --
    > Henry
    >
    >
    > "Edwin vMierlo" wrote:
    >
    >> Henry,
    >>
    >> just a few questions:
    >> - is this Oracle FileSystem (ocfs.sys) ?
    >> - what errors do you see in the system event log (please post) ?
    >> - what errors do you see in the cluster.log (please post) ?
    >> - once the disk is in off-line pending state... what other cluster
    >> resources are off-line pending ?
    >>
    >> thnx,
    >> edwin.
    >>
    >>
    >>
    >>
    >> "Henry" <Henry@discussions.microsoft.com> wrote in message
    >> news:CE4E6FBE-DAE2-425C-B2F5-526812E37245@microsoft.com...
    >> > Hi,
    >> >
    >> > We have installed Oracle Fail Safe on this cluster and the drive in
    >> > question is part of the "Cluster Group" set of resources. The Oracle
    >> > database resides on this SAN drive. I have stopped all Oracle services
    >> > on the server giving me the problems and the disk still does not go
    >> > offline to enable a failover unless the server is shut down.
    >> > I suppose there must be something else preventing the failover and am
    >> > trying to determine what could be preventing this disk from being
    >> > released. The server in question does have exclusive rights to this
    >> > physical disk when it is the active member.
    >> > If anyone has any idea as to how I might determine if some process is
    >> > refusing to release its resources, please make a suggestion.
    >> > Is there a way to increase the logging level of the cluster, and should
    >> > that give me a better indication of what may be the problem? (The logs
    >> > are fairly hard to decipher even at the default logging level.)
    >> >
    >> > Thanks in Advance,
    >> > --
    >> > Henry
    >> >
    >> >
    >> > "Chuck Timon [MSFT]" wrote:
    >> >
    >> > > Sounds like something has a handle to the drive that is preventing
    >> > > cluster from completing the Offline process. What kind of group is this
    >> > > disk resource in?
    >> > >
    >> > > Chuck Timon, Jr.
    >> > > Microsoft Corporation
    >> > > Longhorn Readiness Team
    >> > > This posting is provided "AS IS" with no warranties, and confers no
    >> > > rights.
    >> > >
    >> > > "Henry" <Henry@discussions.microsoft.com> wrote in message
    >> > > news:4429C77B-C125-4677-8F00-C2D96D014716@microsoft.com...
    >> > > > Hi,
    >> > > >
    >> > > > I have a 2 node cluster that works correctly when the active server
    >> > > > goes down.
    >> > > > All resources are taken over by the passive member.
    >> > > >
    >> > > > When I try to move resources from node 1 to node 2 everything works
    >> > > > fine as well.
    >> > > > The problem is that when I try to move the resources back to the
    >> > > > original node, all resources move except for one physical disk. This
    >> > > > physical disk status remains as "offline pending". The cluster log
    >> > > > contains many entries similar to what follows:
    >> > > > "FmpOfflineResource: Offline resource <drivex> returned pending"
    >> > > > until finally
    >> > > > "RmpTimerThread: Resource drivex pending timed out, CP 3 - setting
    >> > > > state to failed."
    >> > > >
    >> > > > The only way for us to get the offline resource available for the
    >> > > > other cluster member is to reboot the server that failed to put the
    >> > > > physical drive offline.
    >> > > >
    >> > > > Any ideas would be appreciated.
    >> > > > --
    >> > > > Thanks in Advance,
    >> > > >
    >> > > > Henry
    >> > >
    >> > >

    >>
    >>
    >>




  2. #17
    Henry Guest

    Re: Physical disk hangs at "offline pending"

    Hi,

    I changed the group configuration so that all dependent resources reside in
    the same group and still had this issue.
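
    For anyone who wants to run the same check on their own cluster, here is a
    minimal sketch in Python. It assumes you have already written down, by
    hand or from your own tooling, which group each resource belongs to and
    what each resource depends on. The function name and the resource and
    group names are made up for illustration and are not taken from this
    cluster.

```python
def cross_group_dependencies(group_of, depends_on):
    """Return (resource, dependency) pairs that live in different groups.

    group_of:   dict mapping resource name -> group name
    depends_on: dict mapping resource name -> list of resource names it needs
    """
    violations = []
    for resource, dependencies in depends_on.items():
        for dependency in dependencies:
            if group_of.get(resource) != group_of.get(dependency):
                violations.append((resource, dependency))
    return violations


if __name__ == "__main__":
    # Hypothetical layout: the database service depends on a disk that was
    # left in a different group.
    group_of = {"Database Service": "App Group", "Disk X": "Cluster Group"}
    depends_on = {"Database Service": ["Disk X"]}
    print(cross_group_dependencies(group_of, depends_on))
```

    An empty result means every dependency sits in the same group as the
    resource that needs it, which is the state I had reached at this point.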

    I finally did get this issue resolved using filemon, processmon and MS
    support.
    The sis.sys (Single Instance Storage) driver was holding the physical
    drive(s) open.
    There is an undocumented issue between RIS (I had it installed on the
    passive cluster member) and MSCS. I couldn't find information on this
    incompatibility in my KB and internet searches. Sis.sys uses grovel.exe to
    check the system disks for duplicate files. This monitoring by grovel.exe
    was the main reason the disk would not release and fail over (during a
    manual failover - "Move Group") from node 2 back to node 1.
    The fix was to uninstall RIS and disable sis.sys from starting in the
    registry (by setting the Start value to 4).
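
    The registry part of the fix can be sketched as follows. This is a hedged
    Python example that only builds and prints a "reg add" command rather than
    running it. The service key name SIS is an assumption (check the actual
    name under HKLM\SYSTEM\CurrentControlSet\Services on your own server), and
    build_disable_command is a hypothetical helper. The meanings of the Start
    values are standard: 0 boot, 1 system, 2 automatic, 3 manual, 4 disabled.

```python
# Standard meanings of a service's "Start" registry value.
START_TYPES = {0: "boot", 1: "system", 2: "automatic", 3: "manual", 4: "disabled"}

SERVICES_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services"


def build_disable_command(service_key_name):
    """Build a `reg add` command that sets Start=4 (disabled) for a service."""
    return [
        "reg", "add", SERVICES_KEY + "\\" + service_key_name,
        "/v", "Start",
        "/t", "REG_DWORD",
        "/d", "4",
        "/f",  # overwrite the existing value without prompting
    ]


if __name__ == "__main__":
    # "SIS" is an assumed key name for the sis.sys driver; verify it first.
    command = build_disable_command("SIS")
    print(" ".join(command))
    print("Start=4 means:", START_TYPES[4])
```

    A changed driver start type applies the next time the server boots, so
    plan the change together with a restart of that node.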

    I will ask MS support to create a KB article on this so that the
    information will be available to others and they will not have to waste
    time resolving similar issues.

    Thanks to everyone for their help on this.

    Henry


  3. #18
    Edwin vMierlo [MVP] Guest

    Re: Physical disk hangs at "offline pending"

    Henry,

    Thanks for posting back. Good to know that you have found the root cause
    of this.

    Thanks,
    Edwin.



