Forum
Hi everyone.
We were working smoothly with our cluster, but memory suddenly went up and we needed to fail over. As soon as we did, some interfaces stopped working. We only have 4 virtual interfaces; while running on node1 everything was OK, but after the failover to node2, 2 of the interfaces stopped working even though they showed an OK status, and we couldn't use the farms that were on those interfaces. Do you know what could be happening?
Thanks in advance
Good day @rgonzaleza,
When the Community Cluster switches, the other node takes control by activating the virtual IP configured in the variable $cluster_ip. To clarify: can you confirm whether this IP is reachable (you can run ping) on the other node (the new MASTER) at the moment the cluster switches?
If not, ARP broadcast packets are not allowed in your network (it is quite common for cloud services to drop such packets). If you run on-premise, please allow that kind of packet.
If, after a cluster switch, the $cluster_ip is reachable but the other virtual IPs are not, then do the following:
Open the file /usr/local/skudonet/config/zlb-start and add the following line (you can also run it from the new MASTER for test purposes, as this command announces the MAC change to the network):
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_1
I assume Virtual_IP_1 runs over eth0; if not, please change eth0 based on your configuration.
Repeat the line per Virtual IP.
When arping is executed, ARP packets are sent to the network to announce the MAC change of Virtual_IP_1, and your switches have to update their ARP tables. If they do not, please check your network.
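Rather than writing one line per Virtual IP, the repeated commands can be generated with a small loop. This is a minimal sketch, not part of the product: the VIPs and interface are examples taken from this thread, and the loop prints the commands (so it can be dry-run without root); in zlb-start you would drop the echo and execute arping directly.

```shell
# Hypothetical helper: emit one gratuitous-ARP announcement per virtual IP.
# VIPS and IFACE are assumptions based on this thread; adapt to your config.
VIPS="10.1.1.143 10.1.1.144 10.1.1.145 10.1.1.146"
IFACE="eth0"
for vip in $VIPS; do
    # Dry run: print the command. In zlb-start, remove the echo so
    # arping (which needs root) actually runs.
    echo /usr/bin/arping -A -c 2 -I "$IFACE" "$vip"
done
```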
Any update will be welcomed.
Regards!
@emiliocm Thanks for your help, we did what you told us and these are our findings:
When we failed over to the second node, we were able to ping the cluster IP and the first node. Also, all the virtual IPs run over eth0, and the switches are actively updating the ARP table.
Another troubleshooting step we took:
We have another LB, and there we were able to configure the virtual IPs that stopped working on this new cluster.
Hi @rgonzaleza, I cannot understand where your problem is, so can you please describe it in more detail? From your first comment I understood that the cluster was not switching properly, as the virtual IP was not working on the second node, so I gave you indications to troubleshoot the switching, and you reported that it works.
I am a little bit lost, sorry.
Any additional information that would help me figure out what is going on would be appreciated.
Regards!
@emiliocm Hi Emilio, sorry if I didn't make myself clear.
This is the issue:
We have a cluster with 2 nodes: master (10.1.1.140), backup (10.1.1.141), and cluster (10.1.1.142). Everything seems to be working fine; the replication and failover are working. We created 4 virtual interfaces:
10.1.1.143
10.1.1.144
10.1.1.145
10.1.1.146
but here's the problem:
When we fail over to node 2, some of the virtual interfaces stop working, hence the farms also stop working, and we need to fail back to node 1. So we are not able to bring up those virtual interfaces on node 2, but the weird thing is that if we create those virtual interfaces on another LB we have, they start working.
Hi @rgonzaleza, when you say "we are not able to bring up those virtual interfaces on node 2", what kind of test did you run to determine that the interfaces on node 2 are not coming up?
Additionally, some information is missing in the description.
- When the failover is done to node2, are the virtual interfaces configured and up on node2? You can check this with commands like:
ifconfig
ip route list
ip route list table table_eth0
- When the failover is done to node2, are the farms up and running on node2?
If you run HTTP profiles, then check whether the "pound" binaries are running:
ps -ef | grep pound
If you run l4xnat profiles, then check whether nft is loaded with the rules:
nft list ruleset
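To gather all of these checks in one pass on the new MASTER, a throwaway script can dump them into a single report. This is a sketch only: the output path is my choice, and the guards let it run even where nft needs root or pound is absent.

```shell
#!/bin/sh
# Hypothetical post-failover report: collects the interface, routing,
# and farm state discussed above into one file on the new MASTER.
OUT=/tmp/failover-check.txt
{
    echo "== interfaces =="
    ip addr show
    echo "== main routing table =="
    ip route list
    echo "== per-interface table (name depends on your config) =="
    ip route list table table_eth0 2>/dev/null || echo "table_eth0 not present"
    echo "== pound processes (HTTP farms) =="
    ps -ef | grep '[p]ound' || echo "no pound processes found"
    echo "== nft ruleset (l4xnat farms; needs root) =="
    nft list ruleset 2>/dev/null || echo "nft unavailable or not run as root"
} > "$OUT" 2>&1
echo "report written to $OUT"
```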
This information is needed for me to understand whether the issue is a bug in the switching or a problem in your network configuration.
Regards!
@emiliocm Hi Emilio, we were finally able to replicate the issue. We created a new cluster, and I'm sending you a document with images; I hope it will make things clear for you.
Hi @rgonzaleza, thanks for the attached document; it is clear now.
Did you apply the workaround I explained in my previous comment (26/02/2025, 8:37 am)?
Add the following lines to /usr/local/skudonet/config/zlb-start (on both nodes of the cluster):
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_1
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_2
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_3
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_4
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_5
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_6
Then repeat the tests and share the results with us.
@emiliocm hi Emilio.
I added the commands to the file you told me, and it is updated on the other node automatically; I suppose that is because of the HA configuration. But the issue is the same: node 2 shows the interfaces are up, but we lose communication to the virtual interfaces. I am attaching a new document with the evidence; I also ran the command you sent last week.
Hi @rgonzaleza, I can confirm that the issue is related to your network. The SKUDONET Cluster is working properly; see the attached PDF with my tests.
When the cluster switches, the new master sends a gratuitous ARP packet to the network to tell networking devices that the MAC for the IP has changed. For some reason, this kind of packet is being discarded by your switches.
I would recommend you experiment with the arping command on the new MASTER once the cluster switches, making some changes to these commands:
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_1
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_2
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_3
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_4
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_5
/usr/bin/arping -A -c 2 -I eth0 Virtual_IP_6
replacing -A with -U (sending the arping packet in unsolicited mode, -U, instead of answered mode, -A):
/usr/bin/arping -U -c 2 -I eth0 Virtual_IP_1
/usr/bin/arping -U -c 2 -I eth0 Virtual_IP_2
/usr/bin/arping -U -c 2 -I eth0 Virtual_IP_3
/usr/bin/arping -U -c 2 -I eth0 Virtual_IP_4
/usr/bin/arping -U -c 2 -I eth0 Virtual_IP_5
/usr/bin/arping -U -c 2 -I eth0 Virtual_IP_6
You can do some troubleshooting on your switches to identify why these packets are not handled. But as I said, the issue seems to be related to your network, as the cluster works as expected and does what it should.
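One way to start that switch-side troubleshooting is to capture ARP on another host in the same VLAN while forcing a failover; if the gratuitous announcements never arrive there, the switch is dropping them. A sketch with example values from this thread (tcpdump needs root; the fallback message only keeps the snippet from aborting where capture is not possible):

```shell
# A gratuitous announcement appears as an ARP reply "is-at <new MAC>"
# for the virtual IP; -e prints MAC addresses, -c 2 stops after two packets.
timeout 15 tcpdump -n -e -c 2 -i eth0 arp and host 10.1.1.143 2>/dev/null \
    || echo "capture unavailable (needs root on a host in that VLAN)"
```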
Thanks!
@emiliocm Thanks Emilio. I ran the command with -U, and that updated the MAC address. But now, what do we have to do? Because I added the commands to the file you mentioned before (zlb-start), and when I failed over it didn't work.
Good day @rgonzaleza. The file zlb-start is called at the end of the Master startup process, so it is the place to configure additional actions once the node takes the MASTER role. You can add a sleep before executing arping -U and increase the number of packets from 2:
sleep 5
/usr/bin/arping -U -c 10 -I eth0 Virtual_IP_1
/usr/bin/arping -U -c 10 -I eth0 Virtual_IP_2
/usr/bin/arping -U -c 10 -I eth0 Virtual_IP_3
/usr/bin/arping -U -c 10 -I eth0 Virtual_IP_4
/usr/bin/arping -U -c 10 -I eth0 Virtual_IP_5
/usr/bin/arping -U -c 10 -I eth0 Virtual_IP_6
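The fragment above can also be written as a loop, which keeps the delay, packet count, and VIP list in one place and makes it easy to repeat the announcement a few times (a single burst can be lost while the switch ports reconverge). A sketch only: the VIPs, interface, timings, and the DRY_RUN switch are my assumptions, not SKUDONET features.

```shell
# Hypothetical zlb-start fragment. DRY_RUN=1 prints the commands instead
# of executing them (arping needs root), so the loop can be checked first.
DRY_RUN=1
VIPS="10.1.1.143 10.1.1.144 10.1.1.145 10.1.1.146"
announce() {
    if [ "$DRY_RUN" = 1 ]; then
        echo /usr/bin/arping -U -c 10 -I eth0 "$1"
    else
        /usr/bin/arping -U -c 10 -I eth0 "$1"
    fi
}
# Three rounds of unsolicited announcements, five seconds apart.
for round in 1 2 3; do
    [ "$DRY_RUN" = 1 ] || sleep 5
    for vip in $VIPS; do
        announce "$vip"
    done
done
```

Set DRY_RUN=0 (and keep the sleep) once the printed commands look right for your network.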