I'm looking for advice on how to troubleshoot Event 1135.
Scenario:
2-node Windows 2008 SP1 x64 Failover Cluster (Node and File Share Majority)
Exchange 2007 SP1 CCR
Cluster nodes and witness are on VMware 3.5, connected to FC SAN
Additional software: McAfee Group Shield 7 Sp1 for Exchange, SCOM2007
client, SMS 2003 Advanced client and ARC Server Backup Agent for Exchange ver
12.1
Problem description:
Event 1135: Cluster node 'STLAKLMB01' was removed from the active failover
cluster membership....
This event is logged on both the Active and Passive cluster nodes. In
addition, the Passive node reports
Event 1069: Cluster resource 'File Share Witness (\\STLAKLXCH03\Quorum)' in
clustered service or application 'Cluster Group' failed
and
Event 1564: File share witness resource 'File Share Witness
(\\STLAKLXCH03\Quorum)' failed to arbitrate for the file share
'\\STLAKLXCH03\Quorum'. Please ensure that file share '\\STLAKLXCH03\Quorum'
exists and is accessible by the cluster.
This happened twice in the last week (at 11:30 PM and 1:06 AM). Downtime
in both cases was about 2 minutes, after which the Passive node reconnected
and the cluster recovered. The impact was that 4 of the 6 Exchange 2007
storage groups (2 of 6 in the first case) failed to resume replication
after the failure, and my only option was to re-seed them in the
morning.
The strange thing here is that there aren't any events that suggest a
network failure. Furthermore, the failed (passive) node keeps reporting that
both networks, Public and Heartbeat, are up. No other servers or infrastructure
components registered any network outages at the time of the events.
Q1: How do I troubleshoot this failure - are there any additional logs or
tools I could use to capture more information?
Q2: How do I configure the Failover Cluster to delay shutting down the
cluster? All current settings are default.
Your help is much appreciated.

