
Thread: Random Event 1135

  1. #1
    D1Artagnan Guest

    Random Event 1135

    I'm looking for advice on how to troubleshoot Event 1135.

    Scenario:
    Two-node Windows Server 2008 SP1 x64 failover cluster (Node and File Share Majority)
    Exchange 2007 SP1 CCR
    Cluster nodes and the witness run on VMware 3.5, connected to an FC SAN
    Additional software: McAfee GroupShield 7 SP1 for Exchange, SCOM 2007
    client, SMS 2003 Advanced Client and ARC Server Backup Agent for Exchange,
    version 12.1

    Problem description:
    Event 1135: Cluster node 'STLAKLMB01' was removed from the active failover
    cluster membership....

    This event is logged on both the active and passive cluster nodes. In
    addition, the passive node reports:
    Event 1069: Cluster resource 'File Share Witness (\\STLAKLXCH03\Quorum)' in
    clustered service or application 'Cluster Group' failed
    and
    Event 1564: File share witness resource 'File Share Witness
    (\\STLAKLXCH03\Quorum)' failed to arbitrate for the file share
    '\\STLAKLXCH03\Quorum'. Please ensure that file share '\\STLAKLXCH03\Quorum'
    exists and is accessible by the cluster.

    This has happened twice in the last week (at 11:30 PM and 1:06 AM). Downtime
    in both cases was about 2 minutes, after which the passive node reconnected
    and the cluster recovered. The impact was that 4 of the 6 Exchange 2007
    storage groups (2 of 6 in the first case) failed to resume replication
    after the failure, and my only option was to re-seed them in the morning.

    The strange thing here is that there aren't any events that suggest a
    network failure. Furthermore, the failed (passive) node keeps reporting that
    both networks, Public and Heartbeat, are up. No other servers or
    infrastructure components registered any network outages at the time of
    the events.

    Q1: How do I troubleshoot this failure - are there any additional logs or
    tools I could use to capture more information?

    Q2: How can I configure the failover cluster to delay shutting down the
    cluster? All current settings are at their defaults.

    Your help is much appreciated



  2. #2
    John Toner [MVP] Guest

    Re: Random Event 1135

    1) If you go to a command line and run "cluster log /g", this will generate
    a cluster.log file in the c:\windows\cluster\reports folder that might
    provide additional information. You can also check the Failover Clustering
    Operational log for network-related messages. It is under Diagnostics >
    Event Viewer > Applications and Services Logs > Microsoft > Windows >
    FailoverClustering > Operational in Server Manager.

    2) You cannot delay the shutdown of the cluster, but you can tweak the
    heartbeat settings to lengthen the time it takes for the cluster to decide
    that a node is unavailable.

    By default, a heartbeat is sent once every second (1000 milliseconds), and
    when a node misses 5 consecutive heartbeats, another node initiates
    failover. You can adjust these values in Windows Server 2008 clusters with
    the following commands (the delay is in milliseconds; the threshold is the
    number of missed heartbeats):

    cluster /prop SameSubnetDelay=<value>
    cluster /prop SameSubnetThreshold=<value>

    If your cluster nodes are on separate subnets, you would adjust the
    following values instead:

    cluster /prop CrossSubnetDelay=<value>
    cluster /prop CrossSubnetThreshold=<value>

    You can type cluster /prop to see your current settings.
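    To see how the two properties combine: assuming, as described above, that a
    node is treated as down once it misses Threshold heartbeats sent every Delay
    milliseconds, the detection window is roughly Delay x Threshold. A minimal
    Python sketch of that arithmetic follows; the "relaxed" values 2000 and 10
    are hypothetical examples, not recommendations from this thread.

```python
def node_down_after_ms(delay_ms: int, threshold: int) -> int:
    """Approximate time without heartbeats before a node is treated as down.

    Assumption: timeout ~= delay * threshold, i.e. one heartbeat per delay
    interval and failover after `threshold` consecutive misses.
    """
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay_ms and threshold must be positive")
    return delay_ms * threshold


# Defaults described above: 1000 ms delay, 5 missed heartbeats.
default_window = node_down_after_ms(1000, 5)

# HYPOTHETICAL tuned values, only to show the effect of raising both settings.
relaxed_window = node_down_after_ms(2000, 10)

print(default_window, relaxed_window)  # 5000 20000
```

    Raising either property widens the window, so a brief network blip is
    less likely to trigger failover, at the cost of slower detection of a
    node that has really failed.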

  3. #3
    D1Artagnan Guest

    Re: Random Event 1135

    Thank you for your help

    The cluster.log files on both nodes were not very useful. The log on the
    active node contains no events between 1.04 and 22.04. The log on the
    passive node has some events logged on 4 and 8 April. Neither log has any
    events for the times of the failures.

    The Failover Clustering Operational log also appears to have gaps, although
    shorter ones: no events were logged between 1:08 AM on 17.04 and 3:29 PM on
    20.04. The first timestamp coincides with the time when the cluster
    recovered from a failure; the second is when the backup started.

    The Windows System event log seems to be the most useful. I'm not sure
    whether the cluster service crashed and caused the disconnection from the
    active node, or the node lost connectivity to the quorum and that caused
    the cluster service to terminate. There also seems to be a pattern in the
    times of the faults. Occurrences in the last 2 weeks:

    23.04 - From 1:05:18 AM to 1:07:51 AM
    17.04 - From 1:06:18 AM to 1:07:55 AM
    14.04 - From 11:30:37 PM to 11:33:09 PM
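    One quick check of the "about 2 minutes" downtime figure is to compute the
    length of each window listed above. A minimal Python sketch, assuming the
    times are all in the same local time zone and that no window crosses
    midnight (true for the three listed):

```python
from datetime import datetime

# Start/end times of each outage, copied from the list above (12-hour clock).
outages = {
    "23.04": ("1:05:18 AM", "1:07:51 AM"),
    "17.04": ("1:06:18 AM", "1:07:55 AM"),
    "14.04": ("11:30:37 PM", "11:33:09 PM"),
}


def duration_seconds(start: str, end: str) -> int:
    """Seconds between two same-day clock times such as '1:05:18 AM'."""
    fmt = "%I:%M:%S %p"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    # None of these windows cross midnight, so plain subtraction is enough.
    return int((t1 - t0).total_seconds())


for day, (start, end) in outages.items():
    secs = duration_seconds(start, end)
    print(f"{day}: {secs // 60}m {secs % 60:02d}s")

# 23.04: 2m 33s
# 17.04: 1m 37s
# 14.04: 2m 32s
```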

  4. #4
    John Toner [MVP] Guest

    Re: Random Event 1135

    The cluster log is supposed to contain detailed logs of everything that
    happens in the cluster. Unfortunately, I have also seen cases where 2008
    cluster logs are missing data from the time an event occurred. I suspect
    the network issue might also be affecting cluster logging in 2008.

    Also FYI, cluster.log timestamps are in GMT, so you may need to adjust for
    your time zone when looking at the times in this log file.
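    Because cluster.log uses GMT, the local outage times listed earlier in the
    thread need converting before you search the log. A minimal Python sketch
    follows; the year and the +12:00 UTC offset in it are hypothetical
    placeholders, since the thread states neither.

```python
from datetime import datetime, timedelta, timezone


def local_to_gmt(local: datetime, utc_offset: timedelta) -> datetime:
    """Treat a naive local timestamp as being at utc_offset; return it in UTC/GMT."""
    aware = local.replace(tzinfo=timezone(utc_offset))
    return aware.astimezone(timezone.utc)


# HYPOTHETICAL inputs: the year and the +12:00 offset are placeholders, not
# values from the thread. Substitute the servers' real offset, including any
# daylight saving adjustment in effect on that date.
start_local = datetime(2009, 4, 17, 1, 6, 18)  # 17.04, 1:06:18 AM local
offset = timedelta(hours=12)

start_gmt = local_to_gmt(start_local, offset)
print(start_gmt.strftime("%Y/%m/%d %H:%M:%S"))  # 2009/04/16 13:06:18
```

    A fixed offset is used here to avoid guessing the servers' location; on
    Python 3.9+ you could instead attach a named zone from the standard
    zoneinfo module, which also handles daylight saving changes.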

  5. #5
    WayCoolKennel Guest

    Event 7024

    This was VERY useful. I think you are correct: we had a switch burp, and
    this is EXACTLY what we saw, except that our cluster.log was definitive.
    It showed that the node had lost all communication with the other nodes in
    the cluster.

    Thanks for the good info here!

  6. #6
    Join Date: Jan 2010
    Posts: 1

    Re: Random Event 1135

    D1Artagnan, did you end up finding a solution to this issue? I'm having the exact same problem, with a configuration pretty much identical to yours. Oddly, it keeps happening every Sunday at the same time (give or take a few minutes) in the middle of the night. I'm thinking it's got to be VMware or the network, but I can't reproduce it, so this is tough to troubleshoot.
