I believe I have come across a bug in the implementation of the StoreFront monitor in the Citrix NetScaler 10.5. The issue may also exist in previous versions, but I have not tested it.
The NetScaler I was working on was sited in a secure network, with a firewall between the NetScaler and the internal network. Shown below:
I had the following firewall rules in place:
Source | Destination | Port |
---|---|---|
Subnet IP | StoreFront Servers | 443 |
Management Machines | NetScaler IP | 443 |
The two StoreFront servers, in a load-balanced configuration, were constantly showing as down. Expanding the Service Group and looking at the probe results just showed ‘Probe Failed’. To verify connectivity, I created an HTTPS monitor for the same pair of servers, but strangely this monitor always showed as Up.
A Wireshark trace filtered on the SNIP showed no HTTPS requests being sent from that address. The same trace filtered on the NSIP address showed multiple regular requests from the NetScaler. Once another rule was added to the firewall, as per the table below, the StoreFront monitors changed state to Up almost immediately.
Source | Destination | Port |
---|---|---|
Subnet IP | StoreFront Servers | 443 |
NetScaler IP | StoreFront Servers | 443 |
Management Machines | NetScaler IP | 443 |
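To make the reasoning behind the two rule sets concrete, here is a minimal Python sketch that models each table as a list of allow rules and checks whether a probe from a given source address would pass. All of the IP addresses, the `Rule` class and the `allowed` helper are hypothetical and exist only for illustration; this is not how the firewall or the NetScaler evaluates traffic.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical addresses, invented purely for this illustration.
NSIP = "10.0.0.10"               # NetScaler IP (management address)
SNIP = "10.0.0.11"               # Subnet IP
STOREFRONT = "192.168.10.20/31"  # covers the two servers .20 and .21
MANAGEMENT = "172.16.0.0/24"     # management machines


@dataclass(frozen=True)
class Rule:
    source: str       # CIDR block the traffic may come from
    destination: str  # CIDR block the traffic may go to
    port: int         # destination TCP port


def allowed(rules, src, dst, port):
    """Return True if any allow rule matches the source, destination and port."""
    return any(
        ip_address(src) in ip_network(rule.source)
        and ip_address(dst) in ip_network(rule.destination)
        and port == rule.port
        for rule in rules
    )


# First table: only the SNIP may reach the StoreFront servers.
ORIGINAL_RULES = [
    Rule(f"{SNIP}/32", STOREFRONT, 443),
    Rule(MANAGEMENT, f"{NSIP}/32", 443),
]

# Second table: the NSIP is also allowed through to the StoreFront servers.
UPDATED_RULES = ORIGINAL_RULES + [Rule(f"{NSIP}/32", STOREFRONT, 443)]

if __name__ == "__main__":
    server = "192.168.10.21"
    print("SNIP, original rules:", allowed(ORIGINAL_RULES, SNIP, server, 443))
    print("NSIP, original rules:", allowed(ORIGINAL_RULES, NSIP, server, 443))
    print("NSIP, updated rules: ", allowed(UPDATED_RULES, NSIP, server, 443))
```

Under these assumptions, a probe sourced from the SNIP passes the original rules while one sourced from the NSIP does not, which matches the HTTPS monitor showing Up and the StoreFront monitor showing ‘Probe Failed’ until the extra rule was added.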
I don’t believe that this is by design from Citrix, as their documentation for the NetScaler clearly states that the SNIP should be responsible for the monitoring of services and communication with backend services. Hopefully this bug/issue will be resolved in a future release.
“The NetScaler ADC uses the subnet IP address as a source IP address to proxy client connections to servers. It also uses the subnet IP address when generating its own packets, such as packets related to dynamic routing protocols, or to send monitor probes to check the health of the servers.”
http://support.citrix.com/proddocs/topic/ns-system-10-map/ns-nw-ipaddrssng-confrng-snips-tsk.html
Thankfully, in the situation I am currently working on, allowing the NSIP to reach the StoreFront servers, and therefore monitor them, was not a problem (although it is another hole in the firewall). I can imagine, though, that a number of deployments may not have the flexibility to configure their network or firewall in this way.
The source IP can be changed to the SNIP/MIP with RNAT. See https://support.citrix.com/article/CTX217712
Thanks for that link. Useful to know there is an alternative solution.
I came across that same issue myself yesterday, and I opened an SR with Citrix. The engineer confirmed that because the SF monitor uses a Perl script through the BSD kernel, its probes are sourced from the NSIP. I asked for any documentation of the above, and his response was:
(sic) “… I confirmed with Escalation and Dev team. Unfortunately, there is no doc that shows you that SF monitor uses NSIP. Rest assured my email should sufficient that Netscaler used NSIP when communicate with SF monitor and i log a internal request to update the edocs. The main reason behind this is, its using …” [a Perl script through the BSD kernel].
I tried creating a Net Profile using the MIP/SNIP and binding it to the service group and LB vServer, but it did not work. The Net Profile section in the monitor itself was grayed out (maybe because it’s handled by the kernel?).
He feels that creating a DENY ACL (src:NSIP dst: SF server) should force traffic through the MIP/SNIP, and I’ll try that on Monday.
I’m glad to hear that Citrix has confirmed our findings on this. It helps reassure me that it is more of a design/technical restriction than something strange in the configuration of the particular unit.
It is quite disappointing that they don’t have any public documentation on this though. I would estimate I wasted a bit over a day trying to ascertain the reasons for my StoreFront monitors not working – which is frustrating when a short document or KB could have saved all of that time.
I will also try creating a Deny ACL and see how I get on. Thankfully we are not yet at the point of deploying this as a production service so we still have the time to experiment.
Still experiencing this issue on Netscaler 10.5 55.8.nc. Your post helped out a lot, thanks.