Can't failover resources on Win2k3 Cluster odd IP Address Conflict
We have a Win2k3 SP1 two-node cluster that we use as a file server. Two weeks
ago we had to reboot a node following the reboot of our core network switches.
We have two resources that we balance between the two nodes of the cluster,
but after the reboot everything was on Node 1. Noticing a performance lag, we
attempted to move one resource to Node 2. However, the IP resource failed,
citing an IP address conflict and giving us a MAC address, which turned out
to be the MAC address of Node 2.
We are able to move the Cluster resource (Cluster IP, Name, Quorum) to Node
2, but neither of the File Server resources will move; both give us this same
problem.
Every time we tried moving resources, Veritas NetBackup was backing up the
File Server resources. Could the running backups be preventing us from moving
a resource from Node 1 to Node 2, or could I have another issue?
Keep in mind that this cluster is about 2 years old and this is the first
time we've had this issue.
Here is one of the errors:
Event Type: Error
Event Source: Tcpip
Event Category: None
Event ID: 4199
Date: 10/23/2008
Time: 4:16:53 AM
User: N/A
Computer: Node 1
Description:
The system detected an address conflict for IP address 10.1.20.74 with the
system having network hardware address 00:09:6B:F5:0F:9A. Network operations
on this system may be disrupted as a result.
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 00 00 00 00 03 00 50 00 ......P.
0008: 00 00 00 00 67 10 00 c0 ....g..À
0010: 00 00 00 00 00 00 00 00 ........
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
The MAC Address displayed is for Node 2.
Thanks,
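As a small aid for reading events like the one above, here is a minimal Python sketch that pulls the conflicting IP and MAC address out of an Event ID 4199 description and looks the MAC up in an inventory. The names `EVENT_TEXT`, `KNOWN_MACS` and `parse_conflict` are illustrative assumptions, and the inventory holds only the one mapping this thread confirms (the MAC belongs to Node 2).

```python
import re

# Description text copied from the Event ID 4199 entry above.
EVENT_TEXT = (
    "The system detected an address conflict for IP address 10.1.20.74 with the "
    "system having network hardware address 00:09:6B:F5:0F:9A. Network operations "
    "on this system may be disrupted as a result."
)

# Hypothetical inventory; the thread only confirms that this MAC is Node 2's.
KNOWN_MACS = {"00:09:6B:F5:0F:9A": "Node 2"}

PATTERN = re.compile(
    r"conflict for IP address (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) with the\s+"
    r"system having network hardware address "
    r"(?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)


def parse_conflict(text):
    """Return (ip, mac, owner) from a Tcpip 4199 description, or None."""
    # Collapse line wraps so the pattern also matches text pasted from a log.
    normalized = " ".join(text.split())
    match = PATTERN.search(normalized)
    if match is None:
        return None
    mac = match.group("mac").upper()
    return match.group("ip"), mac, KNOWN_MACS.get(mac, "unknown")


if __name__ == "__main__":
    print(parse_conflict(EVENT_TEXT))
```

With a fuller MAC inventory, the same lookup would show at a glance whether the conflicting address belongs to a cluster node or to some unrelated machine.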
Re: Can't failover resources on Win2k3 Cluster odd IP Address Conf
Sure thing. Also, it turned out not to be NetBackup's fault: we attempted to
move the File Server resource while the backup was not running and saw the
same problem.
Here is the layout:
Node 0 - 10.1.20.65 (Cluster Management IP)
Node 1 Main - 10.1.20.67
Node 1 Heartbeat - 10.1.40.115
Node 1 iSCSI - 10.1.40.114
Node 2 Main - 10.1.20.69
Node 2 Heartbeat - 10.1.40.116
Node 2 iSCSI - 10.1.40.117
FileServer 1 - 10.1.20.74
FileServer 2 - 10.1.20.165
The nodes connect to the shared storage via iSCSI. It's not the best
configuration and was actually set up before my time.
We've had this for a few years, and before the switch maintenance a few weeks
ago we never had a problem moving FileServer 1 and FileServer 2 from node to
node. Also, the Cluster resource (Node 0 name and IP, and the quorum) has no
problem moving between Node 1 and Node 2.
I'm now wondering if something is wrong on the switch side. It's been a
weird problem. The Cluster.log file doesn't tell me much either.
These lines are the only places I see issues:
2008/10/23-08:16:58.481 WARN [ClNet] Tcpip is not bound to adapter
A3688E05-4B26-4717-9348-BBA01806D352.
2008/10/23-08:16:58.481 WARN [ClNet] Tcpip is not bound to adapter
76890C41-573E-4637-94AE-B8EF15A5E73F.
- and later -
2008/10/23-08:17:03.746 ERR IP Address <fileserver1-ip>: IP address
10.1.20.74 is already in use on the network, status 5057.
2008/10/23-08:17:03.746 INFO [RM] RmpSetResourceStatus, Posting state 4
notification for resource <ctdayfs004-ip>
2008/10/23-08:17:03.746 INFO [FM] NotifyCallBackRoutine: enqueuing event
2008/10/23-08:17:03.746 INFO [FM] Calling RmNotifyChanges in monitor 0e2c.
2008/10/23-08:17:03.746 INFO [CP] CppResourceNotify for resource ctdayfs004-ip
2008/10/23-08:17:03.746 INFO [FM] FmpRmDoHandleCriticalResourceStateChange:
call InterlockedDecrement on gdwQuoBlockingResources, Resource
fe02d2ab-9d9b-461d-b0e8-21390dce6b22
2008/10/23-08:17:03.746 WARN [FM] FmpHandleResourceTransition: Resource Name
Re: Can't failover resources on Win2k3 Cluster odd IP Address Conf
It seems to me that you have a duplicate IP address somewhere in your network.
This is what you do:
1) Take the whole group "FileServer1" offline (this will take its IP address
offline as well).
2) Try to ping 10.1.20.74. If it responds, then *some* other machine in
your network is using the same IP.
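The check in step 2 can be taken one step further by reading the local ARP cache after the ping, which shows which MAC address answered for the IP. Below is a minimal Python sketch of that idea. The function names are illustrative, the parser is written for Windows-style `arp -a` output (MACs separated by `-` or `:` with two hex digits per byte), and `who_answers` has to be run on a machine in the same subnet as the address being tested.

```python
import platform
import re
import subprocess


def ping_command(ip, count=2):
    """Build a ping command; Windows takes -n for the count, most others -c."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), ip]


def find_mac_for_ip(arp_output, ip):
    """Return the MAC listed for ip in `arp -a` output as AA:BB:..., or None."""
    mac_re = re.compile(r"(?:[0-9A-Fa-f]{2}[-:]){5}[0-9A-Fa-f]{2}")
    for line in arp_output.splitlines():
        # Some systems print the address in parentheses, e.g. "(10.1.20.74)".
        fields = line.replace("(", " ").replace(")", " ").split()
        if ip in fields:
            match = mac_re.search(line)
            if match:
                return match.group(0).replace("-", ":").upper()
    return None


def who_answers(ip):
    """Ping ip, then read the local ARP cache to see which MAC answered.

    The ping exit code is only a hint: a host can hold the address and
    still drop ICMP, so the ARP cache entry is the more telling result.
    """
    pinged = subprocess.run(ping_command(ip), capture_output=True).returncode == 0
    arp = subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout
    return pinged, find_mac_for_ip(arp, ip)
```

With the FileServer1 group offline, calling `who_answers("10.1.20.74")` and comparing the returned MAC with the adapters of both nodes would indicate whether one of the nodes itself is still holding the address or a third machine is.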
Re: Can't failover resources on Win2k3 Cluster odd IP Address Conflict
Has anyone found a solution to this problem? I'm facing similar issues. Please ping me if anyone finds a workaround.
Re: Can't failover resources on Win2k3 Cluster odd IP Address Conflict
Quote (originally posted by jaipsharma):
Has anyone found a solution to this problem? I'm facing similar issues. Please ping me if anyone finds a workaround.
If an address conflict occurs, the responding system may send out another ARP request for the same address, forcing the other systems on the subnet to update their caches again. Windows NT does this when it detects a conflict with an address that it has successfully registered. For more information on the Address Resolution Protocol (ARP) as discussed in RFC 826, you may obtain a copy on the Internet from the following source:
http://www.freesoft.org/CIE/RFC/826/index.htm
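To make the ARP request mentioned above more concrete, here is a minimal Python sketch that packs and unpacks the 28-byte ARP payload defined in RFC 826 for Ethernet and IPv4. It only builds the bytes in memory and sends nothing. The function names are illustrative, and setting the sender IP equal to the target IP (the form often called a gratuitous ARP) is an assumption about what such a cache-refreshing request looks like, not something this thread states about Windows.

```python
import socket
import struct

# RFC 826 layout for Ethernet + IPv4: htype, ptype, hlen, plen, opcode,
# sender MAC, sender IP, target MAC, target IP (28 bytes, network byte order).
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac):
    """Convert 'AA:BB:CC:DD:EE:FF' or 'AA-BB-...' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Pack an ARP request payload (without the Ethernet header)."""
    return struct.pack(
        ARP_FORMAT,
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        mac_to_bytes(sender_mac),
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC is unknown in a request
        socket.inet_aton(target_ip),
    )


def parse_arp(packet):
    """Unpack an ARP payload into a small dictionary."""
    _htype, _ptype, _hlen, _plen, oper, sha, spa, _tha, tpa = struct.unpack(
        ARP_FORMAT, packet
    )
    return {
        "opcode": oper,
        "sender_mac": ":".join(f"{b:02X}" for b in sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }
```

Any host that receives such a request and already has a cache entry for the sender IP updates that entry to the sender MAC, which is why two machines claiming 10.1.20.74 would keep overwriting each other in their neighbours' caches.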