by John Smith, CTP
There are times in our careers when even the most expensive tools cannot find or pinpoint an issue. Among the challenges for Citrix teams is the constant burden of proving the negative. When you publish an application in Citrix that has a RESTful tier, an external API call, a message queuing tier and a database tier (all of which are outside your team’s purview), you end up being the fulcrum of support for the application, because issues with any one of those four tiers will undoubtedly be escalated as a Citrix issue. Sometimes we end up in the “hand-to-hand” combat world of application support and it can be flat out ugly! Jobs get threatened, war rooms last for days on end and people begin to wonder if they wouldn’t make a great janitor or reality TV star.
While Desktop Director and HDX Insight have been great additions to the Enterprise and Platinum editions, the burden on the VDI/Citrix team is to track the user experience across every critical control point (see my PACCP post). We need visibility beyond the ICA channel and into the downstream critical control points of our published and installed applications. No, this isn’t fair, but at the end of the day we end up being responsible for the “Citrix” experience, and if nothing else, we become the folks who route the issue to the correct team. Because of this, there are going to be days when our existing tool set does not give us the entire picture, and in spite of sizable investments in agent-based, machine-data-driven tools, we are left looking at the wire to see what the hell is going on.
Sadly, the idea of using Wireshark to solve enterprise-level issues is often likened to slaying a dragon with a Swiss army knife. But the fact is, if you stab it enough times…you will kill it. While I am not proposing that we spend the rest of our careers looking at packets (although you’d be surprised at what’s in there), I AM proposing that we take advantage of wire data and the intelligence we can glean from it when troubleshooting issues and providing visibility into the holistic environments we end up being forced to support.
Today I want to post the first of what I hope will be a series of articles on troubleshooting with Wireshark. Wireshark is a tool that EVERYONE can afford, and it is much easier to use than you may think. It is NOT just for network engineers; as the converged cloud architecture continues to take shape, broadening our skill sets is essential to supporting tomorrow’s converged solutions. Embrace the wire and attain significantly greater awareness of your infrastructure.
Wire Data: Troubleshooting Slow Logons/Launches with Wireshark
While I am not an Active Directory expert, I do understand the extent to which a misconfigured AD can derail your performance. For those who have not watched it, I highly recommend “The Gospel of Carl”: his 10 Things in AD That Can Hurt Your Application and Desktop Virtualization Efforts (https://www.youtube.com/watch?v=o8atf0DcYzg). Working for a wire data analytics vendor, I commonly see misconfigured AD environments supporting XenDesktop and XenApp infrastructure. An organization setting up an office with 10,000 users would likely deploy on-premises Active Directory, or at least configure Sites and Services to ensure an acceptable domain controller is selected, yet I constantly see large VDI environments where no such Active Directory measures have been taken. In many cases the VDI project involves new subnets, and oftentimes the AD team is not alerted to the new infrastructure. I have seen VDI instances using domain controllers across international MPLS links. This can cause serious slowness around logons, profile load times, authorization and authentication, as well as general slowness and malaise.
Today I want to use Wireshark and a sample PCAP file to show how to pinpoint your AD domain controllers (without logging into each system and typing “set”) and to see the performance of LDAP, Group Policy, Kerberos and even your profile share. This can be very handy when you are trying to troubleshoot slow logons.
My current lab is down after a massive storm, so I am using a sample PCAP from ExtraHop. It is somewhat limited because, as you can guess, customers just don’t seem to want to hand you a PCAP from their core switch these days!
Finding your LDAP and Kerberos Traffic:
To find your LDAP and Kerberos traffic, use the following display filter:
tcp.port==389 || tcp.port==88 || udp.port==88
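If it helps to see the filter’s logic spelled out, here is a rough Python sketch of what that expression selects. The packet records are invented for illustration (a real capture would come from Wireshark itself); the point is that `tcp.port`/`udp.port` match either the source or the destination port.

```python
# Hypothetical packet summaries, not from a real capture.
LDAP_PORT = 389
KERBEROS_PORT = 88

def matches_filter(pkt):
    """Mimic: tcp.port==389 || tcp.port==88 || udp.port==88."""
    ports = {pkt["sport"], pkt["dport"]}
    if pkt["proto"] == "tcp" and ports & {LDAP_PORT, KERBEROS_PORT}:
        return True
    if pkt["proto"] == "udp" and KERBEROS_PORT in ports:
        return True
    return False

packets = [
    {"proto": "tcp", "sport": 50111, "dport": 389},  # LDAP query
    {"proto": "udp", "sport": 50112, "dport": 88},   # Kerberos AS-REQ over UDP
    {"proto": "tcp", "sport": 50113, "dport": 445},  # SMB -- excluded by the filter
]

selected = [p for p in packets if matches_filter(p)]
print(len(selected))  # 2 of the 3 sample packets match
```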
Next we want to graph our LDAP and Kerberos traffic and see if we have any slowness. To do this, go to “Statistics” → “IO GRAPH”
To get the view you see below, go to the bottom right-hand area of the dialog box and select “Advanced.” You will also set the Calc fields to “AVG”, as we want to see the average turn time for each protocol. As you see below, you can fill out each of the filters and measurements (there are thousands of metrics to choose from). From there, we will toggle each graph one at a time. You can mash them up together, but if there are large discrepancies in the time deltas you could end up hiding one of your metrics.
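Under the hood, the IO Graph’s AVG calculation is simple: it buckets the measured values by time interval and averages each bucket. This small sketch shows the idea, assuming we already have (timestamp, turn time) samples for one protocol; the numbers are made up to mimic a mostly healthy LDAP graph with one spike.

```python
from collections import defaultdict

# (capture timestamp in seconds, request->response turn time in seconds);
# illustrative values only.
samples = [
    (0.2, 0.004), (0.7, 0.006),
    (1.1, 0.500),               # a half-second spike
    (2.3, 0.005), (2.9, 0.003),
]

def avg_per_interval(samples, interval=1.0):
    """Average the measured values within each time bucket, like IO Graph's AVG."""
    buckets = defaultdict(list)
    for ts, turn in samples:
        buckets[int(ts // interval)].append(turn)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

for bucket, avg in avg_per_interval(samples).items():
    print(f"t={bucket}s  avg turn time = {avg * 1000:.1f} ms")
```

The single slow sample dominates its one-second bucket, which is exactly why a spike stands out so clearly on the graph.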
So first I click “Graph 1” to look for LDAP delays:
Observations: I do see a half-second spike, but that is about it; generally, LDAP is performing at “switched network” speed. This tells you that you are not using a domain controller on the other side of the world or across an MPLS link. You can also filter for specific IPs, but in this post I want to cover just checking turn times for key protocols.
Next I want to check my Kerberos performance, so I click “Graph 1” to un-toggle it and click “Graph 2”:
Observation: I see some delays, but they are single-digit millisecond delays; that is not really a problem, so I am not going to worry about TCP/88.
After un-toggling “Graph 2” and toggling “Graph 3”, I note zero delay on UDP/88.
So where the hell are my logon/launch delays!!???
Well, in this case I ALSO know that we have a profile server located at 10.10.6.179, and I am using profile shares. So I will graph tcp.time_delta with the following filter:
“tcp.port==445 && ip.dst==10.10.6.179” (you may have to stretch the dialog box to see all of it)
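For clarity on what we are graphing here: tcp.time_delta is the time since the previous packet in the same TCP stream, so big values mean one side sat waiting. This sketch computes the same delta over an invented list of SMB packet timestamps that mimics a slow profile share; the timestamps and the one-second threshold are illustrative assumptions, not values from the sample PCAP.

```python
def time_deltas(timestamps):
    """Per-packet delta from the previous packet in the stream (first is 0)."""
    return [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]

# Hypothetical SMB packets to the profile server: mostly fast, with two stalls.
stream_ts = [0.00, 0.01, 0.02, 2.53, 2.54, 5.10]
deltas = time_deltas(stream_ts)

# Flag anything where the server (or client) went quiet for over a second.
slow = [(ts, d) for ts, d in zip(stream_ts, deltas) if d > 1.0]
print(slow)  # the multi-second turn times worth investigating
```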
Observations: I note multi-second delays in turn times with my profile server! If I had to say where the issue is, I would certainly start here.
Traditionally, systems folks have seen Wireshark as the “Network Engineer’s tool”; however, given the number of moving parts in a Citrix XenDesktop/XenApp implementation, it can also be an outstanding tool for Citrix teams. It is free, and with a little training anyone can take a PCAP and start looking at key parts of their infrastructure to help nail down where an issue is. As I stated at the beginning of the article, I hope to make this a monthly post. For my next post, I want to look at downstream performance for web and database transactions.
Thanks so much for reading and thanks for supporting the Citrix Community!!
John M. Smith, CTP
Solutions Architect, ExtraHop Networks.