by John Smith, CTP
To continue the monthly series on using Wireshark to troubleshoot Citrix, here we are going to take a look at downstream performance. As a Citrix Engineer for over 15 years, I recall once asking our Database team if we could install EdgeSight for Endpoints on their SQL Servers so that we could have some visibility when the SQL Server environment was slow. Of course, I got laughed out of the room and was told there was no chance. I have often found this unfair, given that publishing the application in Citrix left me the responsible party when it was slow. The Database team, while perfectly happy to leave all of their troubleshooting up to me, wasn’t willing to let me install an agent that would at least give me some visibility into their environment.
So, with this narrative, we are going to move forward on how to get visibility into systems outside your purview. Today we will cover Database performance, but we will also cover HTTP. Over the years, published browser applications have risen to as much as 50% of the total number of published applications, so I found it relevant.
Wire Data: Troubleshooting Downstream Communications (HTTP and Database)
So for this post we will cover how to identify your MS SQL transactions and look at the performance metrics as well as some of the data that exists in the queries themselves. We will also look at Web-based and SOAP-based traffic.
My current lab is STILL down after a massive storm, so I am using a sample PCAP from ExtraHop. The new server arrived a few weeks ago; I just need to rack it and get VMware installed.
Finding your Database Traffic:
To find MS SQL traffic, use either one of the following display filters:
tcp.port==1433 (if using the default port, which is not always the case in some clusters) or simply “tds” (without quotes)
Next we want to graph our TDS traffic and see if we have any slowness. To do this, go to “Statistics” → “IO Graph.”
To get the view you see below, go over to the bottom right-hand area of the dialog box and select “Advanced.” You will also set the CALC fields to “AVG” as we want to see the average turn time for each protocol. So, as you see below you can fill out each of the filters and measurements (there are thousands of metrics to choose from). From there, we will toggle each graph one at a time. You can mash them up together but if there are large discrepancies in time deltas you could end up hiding one of your metrics.
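To make the “AVG” setting concrete, here is a minimal Python sketch of the calculation an averaged graph performs: group samples into fixed time intervals and take the mean of each group. The function name, the sample values, and the one-second interval are all assumptions for illustration; this is not Wireshark's own code and the numbers are not taken from the sample PCAP.

```python
from collections import defaultdict


def average_per_interval(samples, interval=1.0):
    """Average the values that fall into each fixed-width time bucket.

    samples: iterable of (timestamp_seconds, value) pairs.
    Returns a dict mapping bucket start time -> mean value, ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Floor the timestamp to the start of its interval.
        bucket_start = int(timestamp // interval) * interval
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical turn times in seconds (illustrative only).
samples = [(0.10, 0.004), (0.70, 0.012), (1.20, 2.500), (1.90, 3.500)]
averages = average_per_interval(samples, interval=1.0)
for start, mean in averages.items():
    print(f"{start:>5.1f}s  avg turn time {mean:.3f}s")
```

In this made-up data the first second averages a few milliseconds while the next averages three seconds. That gap is also why graphing metrics with very different scales together can hide the smaller one.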
Step 1: So first I click “Graph 1” to look for SQL Server Delays
Observations: You have to somewhat desensitize yourself to SQL. Some reports can take several minutes to run; however, most of your SQL transactions should complete in the single- or double-digit millisecond range. You should not expect to see long (multi-second) averages for basic client-server or SOAP-server transactions. Below you see that you are averaging four-digit millisecond (multi-second) transaction turn times. This is likely problematic if it is a very busy EHR/EMR solution that could have several million transactions a day. Keep in mind, in our world, milliseconds “F-ing” matter!
Next I want to check my zero windows related to SQL, so I click “Graph 1” to un-toggle it and click “Graph 2.” Note that the operator is no longer an average; it is now counting the number of frames.
What is a zero window and why does it matter? While wire data cannot give you CPU/Disk/Memory, wire data WILL give you zero windows. A zero window is when a client or server advertises a TCP receive window of zero, meaning there is no more room in its TCP buffer and the sender has to stop. This happens when the system cannot drain the buffer fast enough, usually because of something I/O related on the system itself (my experience is that it is usually Hypervisor over-subscription). So wire data can tell you when a system is I/O bound. Besides, if your transactions are slow, it really doesn’t matter what the CPU, Disk and Memory say, does it?
Observation: I see no zero windows, so we are not dealing with an I/O issue on either the client or server. It looks like just slow queries, which could be bad indexing or just a poorly written app.
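As a rough illustration of what the frame count is measuring, the Python sketch below flags segments that advertise a zero window and counts them per interval. It follows the general rule Wireshark documents for its tcp.analysis.zero_window flag (window size of zero with none of SYN, FIN, or RST set). The segment tuples and function names are hypothetical, and this is a simplification, not Wireshark's actual analysis code.

```python
from collections import Counter


def is_zero_window(window_size, flags):
    """True when a segment advertises a zero receive window.

    flags: collection of TCP flag names set on the segment, e.g. {"ACK"}.
    SYN, FIN and RST segments are excluded from the check.
    """
    return window_size == 0 and not (set(flags) & {"SYN", "FIN", "RST"})


def count_zero_windows(segments, interval=1.0):
    """Count zero-window segments per fixed-width time bucket.

    segments: iterable of (timestamp_seconds, window_size, flags) tuples.
    Returns a dict mapping bucket start time -> number of zero windows.
    """
    counts = Counter()
    for timestamp, window_size, flags in segments:
        if is_zero_window(window_size, flags):
            counts[int(timestamp // interval) * interval] += 1
    return dict(sorted(counts.items()))


# Hypothetical segments (illustrative only).
segments = [
    (0.2, 65535, {"ACK"}),
    (1.1, 0, {"ACK"}),
    (1.4, 0, {"ACK"}),
    (2.0, 0, {"RST"}),  # a reset, so it is not counted
]
zero_windows = count_zero_windows(segments)
print(zero_windows)
```

An empty result across the whole capture is the "no zero windows" case: nothing on the wire suggests either side ran out of buffer.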
Finding your Web based Traffic:
“tcp.port==80 or http”
Looking at basic web traffic is likely not enough; we are going to need to specify a web server. So for the next scenario I want to use 192.168.0.101 as our web server.
So let’s check two things for this specific server: the error count and the process time. (Note: Wireshark actually gives us the specific HTTP transaction time with the http.time value.)
Also, I want to check for 500 errors so I will include the following filter: http && ip.host == 192.168.0.101 && http.response.code >= 500
Graph 1 Observations: HTTP traffic looks pretty good. You had a slight delay of about ½ second, but this is not going to be very noticeable for HTTP traffic, so you are in pretty good shape.
Graph 2 Observation: Since we are filtering for HTTP 500 errors, you can see we do have some issues with 500 errors off and on. Typing the same filter into the main display filter box will give you the specific URLs that are causing those error codes. In my experience, 500 errors are often related to database issues.
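Once you have the responses that match the filter, the next step is usually to see which URLs account for most of the errors. Here is a minimal Python sketch of that grouping. It assumes you have already exported (status code, URL) pairs from the capture by some means; the pairs below and the function name are made up for illustration.

```python
from collections import Counter


def server_errors_by_url(responses):
    """Count HTTP responses with status >= 500, grouped by URL.

    responses: iterable of (status_code, url) pairs.
    Returns a list of (url, error_count), most frequent first.
    """
    errors = Counter(url for status, url in responses if status >= 500)
    return errors.most_common()


# Hypothetical responses (illustrative only).
responses = [
    (200, "/index.html"),
    (500, "/api/orders"),
    (503, "/api/orders"),
    (500, "/login"),
    (404, "/missing"),
]
worst = server_errors_by_url(responses)
for url, count in worst:
    print(f"{count:>3}  {url}")
```

Client errors such as 404 are left out on purpose, matching the >= 500 condition in the filter.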
Next month we may look at some other protocols or even look directly at Citrix and PVS itself to try and find slowness issues. Hopefully this will give you some tools to use to monitor your specific published applications and maybe decrease some of the opacity around packets when you are using Wireshark. As you can see, when you leverage proper filters and have a solid understanding of what is on your network, you have a good shot at getting close to the root of the issue. I am also considering looking at DNS. Please comment if you have an opinion: DNS or Citrix-proper troubleshooting.
Thanks so much for reading and thanks for supporting the Citrix Community!!
John M. Smith, CTP
Solutions Architect, ExtraHop Networks.