Recently I had a project at a customer who had a private cloud environment based on VMware.
This environment was running in a datacenter and we had no access to the hypervisor.
After installing a fresh Netscaler 12.0 VPX from the OVF template we encountered a strange phenomenon: the Netscaler was running fine, but after a while all connections to the Netscaler were dropped and the Netscaler was no longer reachable over the network.
We had to log on to the console to reboot the Netscaler. After the reboot it would work for a while but then it would stop functioning again. After some troubleshooting I found drops on the network interface of the Netscaler, so it seemed it could not respond or send data over the network.
After some digging I found an article from VMware stating this is a Citrix Netscaler bug, but the article was for older versions of Netscaler. Citrix also has an article describing this issue. We tried different solutions (redeploying, reconfiguring, etc.), but after a while I thought what the heck and tried the workaround offered in the Citrix article, and it worked!
The loss of network connectivity is caused by a “tx_ring_length” mismatch, which causes TX stalls, and those in turn cause drops on your network cards.
The workaround adds the line “hw.em.txd=512” to a new file called loader.conf.local. This sets the TX value (the number of transmit descriptors of the E1000 “em” driver) to 512; please do not set it any lower, as this will cause core dumps on the Netscaler.
Here’s how I got it working:
Log on to the console of the Netscaler.
Run this line from the shell and reboot:
echo "hw.em.txd=512" >> /flash/boot/loader.conf.local
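One thing to watch with the echo line: running it a second time appends the setting again. A minimal sketch of a variant that only appends when the line is not there yet, assuming a POSIX sh with grep on the appliance; the function name ensure_tunable is made up for this example and is not part of the Citrix article.

```shell
#!/bin/sh
# Hypothetical helper (not from the Citrix article): append a loader tunable
# only if that exact line is not already present in the file.
ensure_tunable() {
    file="$1"   # e.g. /flash/boot/loader.conf.local
    line="$2"   # e.g. hw.em.txd=512
    # -q: quiet, -x: match the whole line, -F: fixed string instead of a regex
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# On the Netscaler shell this would be called as:
# ensure_tunable /flash/boot/loader.conf.local 'hw.em.txd=512'
```

Note the plain ASCII quotes. Typographic quotes pasted from a web page would end up literally in the file.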
Or you can use the Citrix solution:
SSH and log on to Citrix NetScaler VPX appliance as nsroot.
Type shell.
Change directory (cd) to /flash/boot.
Create the file /flash/boot/loader.conf.local (if not present) with the same permissions as /flash/boot/loader.conf. Add the following line and reboot:
hw.em.txd=512
Note: To create the file, use command touch loader.conf.local.
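The “same permissions” part is easy to miss when you only use touch. One possible way to do it, as a sketch: copy loader.conf with cp -p (which keeps the mode) and then empty the copy. This assumes a POSIX sh; the function name create_loader_local is hypothetical and the directory is passed in so nothing is hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: create loader.conf.local next to loader.conf with the
# same mode. An existing loader.conf.local is left untouched.
create_loader_local() {
    dir="$1"                       # e.g. /flash/boot
    src="$dir/loader.conf"
    dst="$dir/loader.conf.local"
    if [ -e "$dst" ]; then
        return 0
    fi
    # cp -p copies the file including its mode (and ownership where allowed).
    cp -p "$src" "$dst" || return 1
    # Empty the copy; truncating keeps the mode that cp -p applied.
    : > "$dst"
}

# On the Netscaler shell this would be called as:
# create_loader_local /flash/boot
```

After this the file exists and is empty, so the hw.em.txd line still has to be added.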
vi Commands
The following are the vi commands to edit the document:
From NetScaler shell type:
vi /flash/boot/loader.conf.local
Move the cursor to the last character of text in the file, type “a” and press Enter.
Type the line:
hw.em.txd=512
Press the ESC key and then the “:” key. The cursor will move to the bottom of the page; then type wq! and press Enter.
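Before rebooting it may be worth checking what actually ended up in the file, since a typo or a pasted typographic quote is easy to overlook in vi. A rough sketch of such a check, assuming a POSIX sh with sed, tail and tr; the function name check_txd is made up for this example.

```shell
#!/bin/sh
# Hypothetical check: print the hw.em.txd value found in a loader file and
# return 1 if it is missing, not a plain number, or lower than 512.
check_txd() {
    file="$1"
    # Take the last hw.em.txd assignment and strip optional ASCII double quotes.
    value=$(sed -n 's/^hw\.em\.txd=//p' "$file" 2>/dev/null | tail -n 1 | tr -d '"')
    case "$value" in
        ''|*[!0-9]*)
            echo "hw.em.txd is not set to a plain number in $file"
            return 1
            ;;
    esac
    if [ "$value" -lt 512 ]; then
        echo "hw.em.txd=$value is lower than 512"
        return 1
    fi
    echo "hw.em.txd=$value"
}

# On the Netscaler shell this would be called as:
# check_txd /flash/boot/loader.conf.local
```

This only inspects the file. It does not tell you whether the running system has picked up the value, which happens at the next boot.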
So although the bug seemed to be fixed in the newer Netscaler builds, it somehow seems to have resurfaced in a fresh install of version 12.0.
UPDATE:
After a while under load the Netscaler stopped working again. I opened a case at Citrix and it seems to be related to the combination of E1000 NIC interfaces selected on the ESX host and DVS (VMware Distributed Virtual Switch) enabled on the underlying hypervisor. After changing the NIC to VMXNET 3 the issue disappeared.
Beware that in my case the license and config were lost, so please back up your config (and export it from the Netscaler to a server) and have your MyCitrix credentials at hand to regenerate the license.
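For the backup, something along these lines could be used from the Netscaler shell. This is only a sketch: it assumes the saved configuration is in ns.conf and the license files are in a license directory under /nsconfig, that the running config has been saved first, and that tar is available. The function name backup_ns and the archive name are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: bundle the saved config file and the license directory
# into one timestamped archive and print the archive path.
backup_ns() {
    cfgdir="$1"   # assumed to be /nsconfig on the appliance
    outdir="$2"   # where the archive is written, e.g. /var/tmp
    out="$outdir/ns-backup-$(date +%Y%m%d-%H%M%S).tgz"
    # -C switches into the config directory so the archive holds relative paths.
    tar -czf "$out" -C "$cfgdir" ns.conf license && echo "$out"
}

# On the Netscaler shell this would be called as:
# backup_ns /nsconfig /var/tmp
```

The resulting archive still has to be copied off the appliance (for example with scp) before you touch the NIC settings.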
Hope this helps 🙂