
Scaling and Load Balancing Session Recording

by Hal Lange, CTA, and Ryan Revord, CTP

One of our favorite and most overlooked products in the Citrix line is Session Recording. It lets you see what users are actually doing instead of relying on hearsay. One of the problems is that if the wrong people have access, it becomes an HR nightmare. I am not here to discuss that issue, but to share our latest frustrations with this wonderful product.

Background

We have what most people would call a very large environment. With a daily peak of over 20k concurrent users, I can agree. This includes XenDesktop desktop and shared workloads across multiple datacenters. Here are the specs we started with after reading the Citrix articles:

4 Session Recording servers – each with 8 vCPU, 32 GB RAM, 20 GB space for MSMQ buffer

File Share – 10 TB Nutanix AFS

Here are the links that we followed:

https://support.citrix.com/article/CTX230015
https://support.citrix.com/article/CTX230013
https://docs.citrix.com/en-us/session-recording/current-release/configure/load-balancing.html
https://support.citrix.com/article/CTX200869

While the first link will get you the closest, even all of these links combined still miss some very important points.

Here are the steps Citrix does have documented correctly:

The parts that Citrix mentions without giving enough detail are the file share and MSMQ redirection.

Redirecting MSMQ

The Citrix documentation seems straightforward until you read the PowerShell script they provide to copy the configuration. At that point, you will see that the script edits one file on the source server and then creates a completely different file on the other server. Which is the correct file?

Both methods are correct. The easiest way I have found to keep the configuration consistent across all of your servers is to create a new file.

Create a new file C:\Windows\System32\msmq\Mapping\sr_lb_map.xml

Add the following lines to the newly-created file changing the <Load_Balance_FQDN> and the <Local_Server_FQDN> to your local names:

<redirections xmlns="msmq-queue-redirections.xml">
    <redirection>
        <from>http://<Load_Balance_FQDN>*/msmq/private$/CitrixSmAudData</from>
        <to>http://<Local_Server_FQDN>/msmq/private$/CitrixSmAudData</to>
    </redirection>
    <redirection>
        <from>https://<Load_Balance_FQDN>*/msmq/private$/CitrixSmAudData</from>
        <to>https://<Local_Server_FQDN>/msmq/private$/CitrixSmAudData</to>
    </redirection>
</redirections>

Note: Every server in the LB group will have a unique file because of the <Local_Server_FQDN> component.

Restart the MSMQ service or restart the server and your MSMQ service is ready to be load balanced.
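Because each server needs its own copy of the mapping file, it can help to generate the files rather than hand-edit them. The following is a minimal Python sketch of that idea, not the script Citrix provides. The function name and the example FQDNs are hypothetical; only the XML layout and the queue path come from the file shown above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Queue path taken from the mapping file shown above.
QUEUE_PATH = "/msmq/private$/CitrixSmAudData"


def build_redirection_map(lb_fqdn: str, local_fqdn: str) -> str:
    """Return the sr_lb_map.xml content for one Session Recording server."""
    lines = ['<redirections xmlns="msmq-queue-redirections.xml">']
    # One redirection per scheme, mirroring the http and https entries above.
    for scheme in ("http", "https"):
        lines += [
            "    <redirection>",
            f"        <from>{scheme}://{escape(lb_fqdn)}*{QUEUE_PATH}</from>",
            f"        <to>{scheme}://{escape(local_fqdn)}{QUEUE_PATH}</to>",
            "    </redirection>",
        ]
    lines.append("</redirections>")
    xml_text = "\n".join(lines) + "\n"
    ET.fromstring(xml_text)  # raises ParseError if the result is not well-formed
    return xml_text


if __name__ == "__main__":
    # Hypothetical names; substitute your load balancer and server FQDNs.
    load_balancer = "srlb.example.com"
    for server in ("sr01.example.com", "sr02.example.com"):
        print(f"--- sr_lb_map.xml for {server} ---")
        print(build_redirection_map(load_balancer, server))
```

Each generated file would still need to be copied to C:\Windows\System32\msmq\Mapping\ on its matching server, followed by the MSMQ restart described above.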

File Storage

The file storage is the tricky one. We have the following requirements:

If you were to follow the recommendations from Citrix, you would use a file share that all the servers have Read/Write access to and leave it at that. There are a few problems with this approach as listed above. How do you have multiple datacenters write to the same share? How do you keep Session Recording servers within their own datacenter?

When it was first tested in our environment, we used one share in one datacenter. While on the surface it looked like everything was working, when you tried to view the recordings only about 60% were available to watch. As we started to go through the Event Viewer on the Session Recording servers, there were many SQL deadlock errors and file failures. The file failures had multiple causes: file opened by another user, couldn’t append the file, couldn’t find the file. The files were an issue, but we were much more concerned about the SQL deadlocks.

Troubleshooting SQL led to an interesting discovery: the deadlocks were coming from the Session Recording servers and NOT SQL. The deadlock message was a misleading error for what was actually a file access issue.

At this point, we had followed Citrix recommendations to the letter and were getting about a 60% success rate on recordings and countless errors regarding the files.

As we kept doing more research, we started reading the Citrix articles more closely. There was a line in CTX200869 that caught our attention:

“Store data on a set of local disks controlled either as RAID by a local disk controller or as a Storage Area Network (SAN). Storing data on a Network Attached Storage (NAS) based on file-based protocols such as SMB, CIFS, or NFS has serious performance and security implications. Never use this configuration in a production deployment of Session Recording.”

That is a very interesting statement. According to all of their other Session Recording load balancing documentation, you must use a file share. Now that we had found the conflicting statements, we had a starting point to work with.

File Storage Corrections

Now this leads to a new problem. How do we set up shared file storage, across multiple servers, across datacenters, where shares are not recommended?

Here are the requirements that need to be considered.

The scenario that Ryan floated is what we affectionately call Local Drive loopback. It entails adding local storage to each Session Recording server and then sharing it.

To use this setup:

Figure 2: Server 1 configuration
Figure 3: Server 2 configuration

With this configuration, each server accesses its own storage locally, but the servers can still reach each other's storage if necessary. Recordings are load balanced across the listed folders.

Current Configuration

Now that we have figured out how to make this work appropriately, without all the SQL and file errors, it is time to look at the server configuration.

We started with:

4 Session Recording servers – each with 8 vCPU, 32 GB RAM, 20 GB space for MSMQ buffer

File Share – 10 TB Nutanix AFS

I always felt those servers were way too big for Session Recording. Now that the configuration has been corrected, I have changed the config to more servers with a smaller footprint.

We currently use 12 Session Recording servers each with:

It turns out that for 30 days' worth of retention and some headroom in case of failure, we need 120 TB of storage across all the servers. Mileage definitely varies on how much space you need for your retention.
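To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 120 TB, 12 servers, and 30 days come from our environment; the function names and the 25% headroom figure are purely illustrative assumptions, since the exact headroom is not something we measured here.

```python
TOTAL_STORAGE_TB = 120   # total across all Session Recording servers
SERVER_COUNT = 12        # current number of Session Recording servers
RETENTION_DAYS = 30      # retention window


def per_server_tb(total_tb: float, servers: int) -> float:
    """Storage each server carries if the total is split evenly."""
    return total_tb / servers


def max_daily_ingest_tb(total_tb: float, retention_days: int,
                        headroom_fraction: float = 0.0) -> float:
    """Largest daily recording volume that fits the retention window.

    headroom_fraction is the share of storage held back for failures;
    it is a hypothetical parameter, not a value from the article.
    """
    if not 0.0 <= headroom_fraction < 1.0:
        raise ValueError("headroom_fraction must be in [0, 1)")
    usable_tb = total_tb * (1.0 - headroom_fraction)
    return usable_tb / retention_days


if __name__ == "__main__":
    print(per_server_tb(TOTAL_STORAGE_TB, SERVER_COUNT), "TB per server")
    print(max_daily_ingest_tb(TOTAL_STORAGE_TB, RETENTION_DAYS),
          "TB/day upper bound with no headroom")
    print(max_daily_ingest_tb(TOTAL_STORAGE_TB, RETENTION_DAYS, 0.25),
          "TB/day if 25% is held back (illustrative)")
```

Split evenly, that is 10 TB per server. Because the 120 TB already includes headroom, 4 TB per day is only an upper bound on what the environment actually records.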

The last piece of advice that I offer is don’t forget to run cleanup at a regularly scheduled interval using the ICLDB tool. Session Recording will not clean itself up without this.
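As a starting point for scheduling that cleanup, here is a small Python sketch that builds an ICLDB command line. Treat everything in it as an assumption to verify against the Citrix documentation for your version: the install path is a guess at the default location, and the REMOVE command with its /RETENTION, /DELETEFILES, and /F switches is written from memory of the ICLDB documentation, not taken from our own scheduled task.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed default install location; adjust to match your server.
ICLDB_EXE = PureWindowsPath(
    r"C:\Program Files\Citrix\SessionRecording\Server\Bin\icldb.exe"
)


def build_cleanup_command(retention_days: int,
                          delete_files: bool = True) -> list[str]:
    """Build an ICLDB REMOVE command for recordings past the retention period."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    cmd = [str(ICLDB_EXE), "remove", f"/RETENTION:{retention_days}"]
    if delete_files:
        # Assumed switch: also delete the recording files, not just the database rows.
        cmd.append("/DELETEFILES")
    # Assumed switch: run without an interactive confirmation prompt.
    cmd.append("/F")
    return cmd


def run_cleanup(retention_days: int) -> None:
    """Run the cleanup; only meaningful on a Session Recording server."""
    subprocess.run(build_cleanup_command(retention_days), check=True)


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(" ".join(build_cleanup_command(30)))
```

A script like this could be run from Windows Task Scheduler on one of the Session Recording servers, with the retention value matched to your own policy.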

 
Hal Lange and Ryan Revord
