Supercharge Citrix Logins: Collection of Tips From the Field

by Ray Davis, CTA & Tampa CUGC Leader

I wanted to take the time to list the optimizations I try to apply wherever I can when helping clients tune images and speed up logins. These tips and tricks are gathered from a collection of EUC sources I follow. I can’t take credit for them; this blog is an attempt to put them all in one place for the community.

Many folks have blogs that go deep into this. One who always comes to mind is James Rankin; I have been following his hat tricks for many years. He has a great “Ultimate guide to Windows Login time” series, and I recommend you read it.

As I list these optimizations, please note that some of this is my opinion based on experience, and the rest is EUC guidance from the community. I also understand that every environment is different: some tips may not apply, and some people may not agree with them. I still try to use as many as I can within the control I’m given in each situation.

As you read this, remember these are helpful tips, not a checklist to start changing things right away. Take your time and test, test, and test. I did not focus on storage, because SSD or NVMe storage is something you should ideally have in any VDI environment anyway.

  1. UEM Tool
    It is beneficial to have a UEM tool that provides system optimizations for CPU, memory, and I/O. Citrix WEM alone has a “magic formula” (greatly simplified): by setting four options, you get a more scalable image, which means you get more out of the hypervisor in terms of CPU cycles, CPU wait time, and CPU responsiveness. Memory management helps because it applies working set optimization and clamps a process’s memory usage when needed.

    The next question folks ask is: what about the disk I/O or disk latency this could cause? Sure, that could happen, but hitting 13k-18k IOPS per disk at 3-6 Gbps is very unlikely. With today’s technology, I don’t run into disk constraints the way I did 6-8 years ago, but it can still happen.
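To put the IOPS numbers above in perspective, here is a quick conversion into throughput. It is a sketch, not a benchmark: the 4 KiB I/O size is an assumption (real VDI I/O sizes vary), and the link math assumes SATA-style 8b/10b encoding.

```python
# Back-of-the-envelope check: how much bandwidth does an IOPS figure represent,
# and how does it compare with a 3 or 6 Gbps link? Assumptions: a fixed 4 KiB
# I/O size and 8b/10b line encoding (10 bits on the wire per data byte).

def iops_to_mb_per_s(iops: int, block_size_bytes: int = 4096) -> float:
    """Convert IOPS at a fixed I/O size into decimal megabytes per second."""
    return iops * block_size_bytes / 1_000_000


def link_mb_per_s(gbps: float) -> float:
    """Usable MB/s of a link that uses 8b/10b encoding (e.g. SATA 3 or 6 Gbps)."""
    return gbps * 1_000_000_000 / 10 / 1_000_000


if __name__ == "__main__":
    for iops in (13_000, 18_000):
        mb_s = iops_to_mb_per_s(iops)
        share = mb_s / link_mb_per_s(6)
        print(f"{iops} IOPS at 4 KiB = {mb_s:.1f} MB/s ({share:.0%} of a 6 Gbps link)")
```

Under those assumptions, 18k IOPS works out to roughly 74 MB/s, a small fraction of the roughly 600 MB/s a 6 Gbps link can carry; larger I/O sizes change the picture quickly, so measure your own workload.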
  1. Tuning GPO
    GPO is an essential part of the picture. There is nothing wrong with the older mindset around GPP and its client-side extensions, logon scripts, item-level targeting, and WMI filters. But if user performance is the priority, these need to go away, or at least be open to change. They work very well, but they add a lot of overhead. This is the #1 thing I’ve cleaned up at many companies: move these to a UEM tool.
  1. GPO Functional vs. Monolithic
    Number 2 leads me to number 3: get rid of functional GPOs and use a monolithic layout. In my experience, too many single-setting GPOs make logins slow. One or two main GPOs make GPO processing a lot better. Yes, each one will contain a lot of settings, but they process faster. The gentleman behind this blog is Trentent Tye. He works for ControlUp, and I occasionally talk to him about custom ControlUp Script-Based Actions. He is very sharp and has helped me many times. Another good one on this topic is James Rankin.
  1. Loopback Processing
    GPO loopback processing is something I have seen done wrong in so many places. In a Citrix XenApp/XenDesktop (XA/XD) or plain RDSH environment, you ideally want loopback processing in Replace mode, because you do not want GPOs from other OUs applying. This can be a hot topic, because your OUs might be laid out with users (and user policies) in one OU and computers (and computer policies) in another. But for my last 15 years, the approach has been computer-linked GPOs: if you want user settings applied, enable loopback and set it to Replace, not Merge. Taking this approach may mean adding GPOs or reorganizing OUs. This is debatable, and some may not agree.
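If you want to confirm which loopback mode a VDA actually ended up with, here is a minimal sketch that interprets the registry value. It assumes the “Configure user Group Policy loopback processing mode” policy is stored as the UserPolicyMode DWORD under HKLM\SOFTWARE\Policies\Microsoft\Windows\System, with 1 meaning Merge and 2 meaning Replace; reading the value itself is left out.

```python
# Interpret the Group Policy loopback setting as stored in the registry.
# Assumption: HKLM\SOFTWARE\Policies\Microsoft\Windows\System\UserPolicyMode
# is a DWORD where 1 = Merge and 2 = Replace; absent = loopback not configured.

from typing import Optional

LOOPBACK_MODES = {1: "Merge", 2: "Replace"}


def loopback_mode(user_policy_mode: Optional[int]) -> str:
    """Return a readable loopback mode for a UserPolicyMode DWORD (None if absent)."""
    if user_policy_mode is None:
        return "Not configured"
    return LOOPBACK_MODES.get(user_policy_mode, f"Unknown ({user_policy_mode})")


def is_replace_mode(user_policy_mode: Optional[int]) -> bool:
    """True only when loopback is enabled in Replace mode, as recommended above."""
    return user_policy_mode == 2
```

For example, `reg query "HKLM\SOFTWARE\Policies\Microsoft\Windows\System" /v UserPolicyMode` shows the DWORD in hex (0x2 is Replace); anything other than Replace on a VDA is worth a second look.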
  1. Computer GPO over user GPO
    One crucial piece: choose a computer GPO setting whenever one is available. If a user setting and a computer setting do the same thing, go with the computer setting. It applies at startup, which takes that processing out of the logon. You might be thinking that you have settings that must apply per user. Yeah, I get that. But again, use a UEM tool and get away from what I listed in #2. Keep nested groups to a minimum, or logins will be impacted. Again, some setups may not be able to do this because of the environment’s complexity.
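To find out how deep your group nesting actually goes, you can export a “group to parent groups” map and measure the longest chain. This is a minimal sketch; the group names are hypothetical, and how you export the data from AD is up to you.

```python
# Measure group nesting depth from an exported "group -> parent groups" map.
# The group names in the example are hypothetical; export your own data from AD.

from typing import Dict, FrozenSet, List


def nesting_depth(member_of: Dict[str, List[str]], group: str,
                  path: FrozenSet[str] = frozenset()) -> int:
    """Length of the longest chain of parent groups above `group` (0 = not nested).

    `path` holds the groups already visited on the current chain, so circular
    nesting is cut off instead of recursing forever.
    """
    if group in path:
        return 0
    path = path | {group}
    parents = member_of.get(group, [])
    return max((1 + nesting_depth(member_of, parent, path) for parent in parents), default=0)


if __name__ == "__main__":
    member_of = {
        "CTX-Desktop-Users": ["APP-Office"],
        "APP-Office": ["APP-All"],
        "APP-All": [],
    }
    for name in member_of:
        print(name, nesting_depth(member_of, name))
```

Groups that come back with a high depth are the ones to flatten first.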
  1. Asynchronous GPO processing
    Ensure you have Asynchronous GPO processing on.
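Whether foreground processing is forced to run synchronously is controlled, among other things, by the “Always wait for the network at computer startup and logon” policy. The sketch below assumes that policy is stored as the SyncForegroundPolicy DWORD under HKLM\SOFTWARE\Policies\Microsoft\Windows NT\CurrentVersion\Winlogon (1 = enabled) and only reports whether it is forcing synchronous processing; confirm the behavior for your OS version, since server and client SKUs may have different defaults.

```python
# Check whether "Always wait for the network at computer startup and logon" is
# forcing synchronous foreground processing. Assumption: the policy is stored as
# SyncForegroundPolicy (DWORD, 1 = enabled) under the key below.

import sys
from typing import Optional

POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows NT\CurrentVersion\Winlogon"
POLICY_VALUE = "SyncForegroundPolicy"


def read_policy_dword(key_path: str, value_name: str) -> Optional[int]:
    """Read a DWORD from HKLM on Windows; return None if absent or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, value_name)
            return int(value)
    except OSError:  # key or value does not exist
        return None


def forces_synchronous_processing(value: Optional[int]) -> bool:
    """True when the policy is enabled (value 1), i.e. processing is forced synchronous."""
    return value == 1


if __name__ == "__main__":
    current = read_policy_dword(POLICY_KEY, POLICY_VALUE)
    print("Forced synchronous:", forces_synchronous_processing(current))
```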
  1. OS optimization
    Apply Windows OS optimizations, such as Citrix Optimizer and its bolt-on templates from the Citrix marketplace for third-party applications such as Edge, Chrome, and Office. Tuning the image is essential. See “VMware OSOT vs Citrix Optimizer: Optimizer Smackdown” on GO-EUC.
  1. Minimize Applications at Startup
    Remove all applications from startup except the key elements, for example the CU agent, the UEM agent, and AV. Autoruns helps here. Nothing needs to run from the HKLM Run or RunOnce keys. If something does need to run at startup, launch it with your UEM tool and call it a day.
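Once you have exported the Run-key entries (Autoruns can save them, or use reg query against HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run), a small allowlist check keeps the review honest. The entry names, paths, and allowlist below are hypothetical.

```python
# Flag Run-key entries that are not on an allowlist of approved startup items.
# All names and paths in the example are hypothetical.

from typing import Dict, Iterable, List


def unexpected_startup_entries(run_entries: Dict[str, str], allowlist: Iterable[str]) -> List[str]:
    """Return the names of Run entries not in the allowlist (case-insensitive), sorted."""
    allowed = {name.casefold() for name in allowlist}
    return sorted(name for name in run_entries if name.casefold() not in allowed)


if __name__ == "__main__":
    entries = {
        "MonitoringAgent": r"C:\Program Files\Vendor\agent.exe",
        "UemAgent": r"C:\Program Files\Vendor\uem.exe",
        "UpdaterTray": r"C:\Program Files\Vendor\updater.exe",
    }
    print(unexpected_startup_entries(entries, ["MonitoringAgent", "UemAgent", "AvTray"]))
```

Anything the check reports is a candidate to remove from the image or move into the UEM tool.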
  1. Application tuning
    In my experience, this can be a daunting task. Many companies run custom software for their business: some of it is developed in-house, some is third-party, and some is used universally across many companies. In any case, reference the vendor documentation where possible. Most, but not all, vendors have guides on applying best practices in RDSH/XenApp/VDI.

    Here are some examples that come to mind; I’m sure there are many more.
  1. Active Setup
    Active Setup is another legacy hook that Microsoft kept around. Remove the Active Setup StubPath entries from the registry: they bloat userinit and delay the shell from loading, causing logon delays. I have details and data on this that I can provide.
    • Citrix Tech Zone highlights this in its best practices for deploying Google Chrome. This post isn’t about Chrome, but it gives you an idea.
    • Preferred method – Add this into Citrix Optimizer:
    • Another method – Run James Rankin’s script:

      @echo off
      echo Querying and deleting 64-bit StubPath values...
      setlocal EnableDelayedExpansion
      REM Query the registry for every Active Setup component under this root.
      set KEY="HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components"
      set FND=find /i %KEY%
      for /f "tokens=*" %%a in ('reg query %KEY% /s ^| %FND%') do (
        set SP=N
        REM Flag the component if it has a 'StubPath' value.
        for /f "tokens=*" %%b in ('reg query "%%a" ^| find /i " STUBPATH" ^| find "REG_"') do (
          set SP=Y
        )
        REM If a 'StubPath' value was found, delete it.
        if "!SP!"=="Y" reg delete "%%a" /v StubPath /f
      )
      endlocal
      echo Querying and deleting 32-bit StubPath values...
      setlocal EnableDelayedExpansion
      REM Same search, this time under the 32-bit (Wow6432Node) registry view.
      set KEY="HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Active Setup\Installed Components"
      set FND=find /i %KEY%
      for /f "tokens=*" %%a in ('reg query %KEY% /s ^| %FND%') do (
        set SP=N
        for /f "tokens=*" %%b in ('reg query "%%a" ^| find /i " STUBPATH" ^| find "REG_"') do (
          set SP=Y
        )
        if "!SP!"=="Y" reg delete "%%a" /v StubPath /f
      )
      endlocal
  1. PVS vDisk maintenance (if PVS is used)
    Perform offline maintenance on your PVS vDisks. Yes, you should defragment the VHDX, no matter how fast your storage is. Fragmentation will occur, and in my experience it can reduce performance by 20-40%. The more versions you create, the more it happens. There are ways to do this with automation and without downtime. I have blogs on this if you are interested.
  1. Extra VDA Image tweaks
    Sometimes I like to squeeze more out of the optimizations, and being in the community means having access to many talented folks with many tricks. Here is another blog I go through to see where it can help. Make sure you understand what each tweak does. If you implement any of them, keep a list of the optimizations you run; it gives you and your peers source control, which helps in supporting the image and the environment.
    1. Finalizing or Sealing
      Remember that finalizing, or sealing, the image is very important; it is another critical element. I have been using BIS-F by Matthias Schlimm for many years now, and I have a good working relationship with him from my CTA experience. If you already have your own image sealing scripts, no problem: you can combine them with BIS-F, and the results are even better.

      Base Image Script Framework (BIS-F) 6.1 (eucweb.com)

      Here are some key elements I always use in my Golden Image:
      • Disable IPv6
      • Run DelProf2
      • Run CCleaner
      • Run an AV scan (depends on the AV product)
      • Configure Citrix optimization
      • Configure the Citrix PVS target (sets my write cache drive for me)
      • Run a defrag ([Issue]: Defrag not performed, not defined based on DiskMode VDAPrivate · Issue #369 · EUCweb/BIS-F · GitHub)
      • Run .NET optimizations
      • Rebuild performance counters
      • Enable WinSxS optimization with a maximum of 480 minutes (execute on the base disk only)
      • Disable “Delete AllUsersStartMenu content”. I do this because it prompts you, and I have seen folks say yes without reading the message.
      • Remove ghost devices (be careful, and understand what this does)
      • Configure the desktop shortcut
      • Shut down the base image after sealing
      • If using FSLogix App Masking, enable “Copy FSLogix rules (*.fxr), assignments (*.fxa) and URL (*.xml) from central share during Device Personalization on System Startup”. You can use GPP instead, but I prefer this approach.
      • Azure AD (if used): see PREP: Azure AD leave doesn’t work · Issue #330 · EUCweb/BIS-F · GitHub
      • Rearm MS Office once (evaluate this for your environment)
      • Rearm MS Windows once (evaluate this for your environment)
      • Enable RDP support (allows you to run BIS-F within an RDP session)
      • Configure logging to a UNC path

    Here are some of the key elements, among others, that I configure through GPO for the VDAs:

    • Configure Citrix WEM
    • VDA configuration: “Delay Citrix Desktop Service”. Besides its main purpose of delaying the service, this helps when you modify the list of DDCs.
    • Configure the page file
    1. Bake GPO in Image or use GPMC
      This is another hot topic that I have discussed many times with the community. I say it depends on the setup and the use case. Baking the GPO into the image gives the best processing and login times. Managing it from GPMC in AD is more convenient: you make a change, and it applies at the next background refresh (every 90 minutes plus a random offset of up to 30 minutes), or you run gpupdate /force remotely and you are mostly done. If you bake it into the image, GPO processing is super fast, but the downside is that you have to crack the image open for every GPO change. Also note that for some computer settings, a reboot may be needed before the change in the HKLM policy hive takes effect.
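The GPMC timing above is simple arithmetic: by default, domain members refresh Group Policy in the background every 90 minutes plus a random offset of 0-30 minutes. This sketch (the timestamps are made up) computes the window in which the next refresh should land, unless you force it with gpupdate /force.

```python
# Window for the next background Group Policy refresh, using the default
# 90-minute interval and 0-30 minute random offset described above.

from datetime import datetime, timedelta
from typing import Tuple


def refresh_window(last_refresh: datetime, interval_min: int = 90,
                   max_offset_min: int = 30) -> Tuple[datetime, datetime]:
    """Earliest and latest time of the next background refresh after `last_refresh`."""
    earliest = last_refresh + timedelta(minutes=interval_min)
    return earliest, earliest + timedelta(minutes=max_offset_min)


if __name__ == "__main__":
    start, end = refresh_window(datetime(2023, 3, 1, 9, 0))
    print(f"Next refresh between {start:%H:%M} and {end:%H:%M}")
```

In the worst case, a GPMC change made just after a refresh takes up to two hours to reach a VDA, which is the trade-off against baking it into the image.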
    1. Good profile management.
      Profile containers seem to be everyone’s go-to here, but that is not always the right answer; in my humble opinion, UPM is still great. The FSLogix Office container is geared toward Office 365 and roaming Office data such as the search database. You can keep that data in the profile container or split it out into a separate Office container.

      On Server 2019, Windows 10 multi-session, and later, do not enable search roaming in the FSLogix ADMX-based GPO anymore. Windows now handles this natively, and enabling it will cause issues. This is in the FSLogix docs, as I’m sure you know. In a webinar about a year ago, my advice was to be careful with exclusions: they are not treated the way they were in the UPM days. Citrix Profile Management’s own profile container (as opposed to classic UPM) is also very good and is giving FSLogix a run for its money. The well-respected James Kindon has broken this down very nicely.
    1. Shrink Scripts/Deduplication/Exclusions
      Jim Moyle is an FSLogix genius, and he preaches this all the time. Yes, you will need a shrink script to shrink the VHDX files. When I did this, I ran Jim Moyle’s script weekly. Also, if you use a Windows server to host the containers, enable Data Deduplication. I have also written a blog showing the savings and the shrink scripts.
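For the shrink and deduplication savings, the math is straightforward. This is a hypothetical sketch that assumes you have already collected each container’s allocated size and the space actually used inside it, plus the logical and on-disk sizes reported for the dedup volume; the function names are illustrative.

```python
# Estimate space a shrink pass could reclaim and the savings from deduplication.
# Inputs are assumed to be collected elsewhere (e.g. from a shrink script's report).

from typing import Iterable, Tuple


def reclaimable_gib(allocated_gib: float, used_gib: float) -> float:
    """Space a shrink could give back for one container (never negative)."""
    return max(allocated_gib - used_gib, 0)


def total_reclaimable(containers: Iterable[Tuple[float, float]]) -> float:
    """Sum reclaimable space over (allocated_gib, used_gib) pairs."""
    return sum(reclaimable_gib(allocated, used) for allocated, used in containers)


def dedup_savings_pct(logical_gib: float, on_disk_gib: float) -> float:
    """Percentage of logical data that deduplication avoids storing."""
    return 100.0 * (logical_gib - on_disk_gib) / logical_gib
```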

      When you configure exclusions, be aware that the first login will impact the PVS write cache. In today’s deployments, the typical choice is “Cache in device RAM with overflow on hard disk”. I wish there were a magic number or a T-shirt size that fit all (maybe there is, and I’ve been living under a rock). The overflow disk is the D: drive that the PVS XenDesktop Setup Wizard, or your automation, creates. The older rule of thumb for the RAM cache was to start with 256-512 MB for desktop operating systems and 2-4 GB for server operating systems. In my testing, the write cache impact only happens on the first login, when the profile is created, and does not happen again.
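The older write-cache rule of thumb above can be captured as a lookup, which is handy when reviewing machine catalogs. The ranges come straight from the text; the function names are hypothetical, and the rule is only a starting point to validate with your own testing.

```python
# Starting RAM cache sizes for "cache in device RAM with overflow on hard disk",
# following the older rule of thumb quoted in the text (values in MB).

from typing import Dict, Tuple

STARTING_RAM_CACHE_MB: Dict[str, Tuple[int, int]] = {
    "desktop": (256, 512),    # desktop operating systems
    "server": (2048, 4096),   # server operating systems (2-4 GB)
}


def starting_ram_cache_mb(os_type: str) -> Tuple[int, int]:
    """Return the (low, high) starting range in MB for 'desktop' or 'server'."""
    try:
        return STARTING_RAM_CACHE_MB[os_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown OS type: {os_type!r}") from None


def within_rule_of_thumb(os_type: str, ram_cache_mb: int) -> bool:
    """True if a configured RAM cache size falls inside the starting range."""
    low, high = starting_ram_cache_mb(os_type)
    return low <= ram_cache_mb <= high
```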

      Exclusions do not make the VHDX mount faster and play no part in making logins faster. I used to use 20 GB drives for disk overflow, but that no longer seems to cut it for today’s applications; in most cases this depends on the environment. FSLogix 2210 introduced a disk compaction feature. I have only used it in a lab, where it seemed to work well, but I am sticking with Jim’s script for now. Matthias Schlimm released a blog giving a great inside look at what is going on; I suggest you read it.
    1. AV Exclusions
      Making sure the proper AV exclusions are in place is extremely important. I also verify them in the registry, if the AV product stores them there; from what I have seen, most do.
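To verify AV exclusions, compare what the product actually has configured against the lists your vendors publish. This minimal sketch does a case-insensitive comparison; the sample entries are illustrative only, so substitute the FSLogix, Citrix, and AV vendor lists you actually follow.

```python
# Report required AV exclusions that are missing from the configured list.
# The entries used in the example are illustrative, not a complete list.

from typing import Iterable, List


def missing_exclusions(configured: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return required entries not present in `configured` (case-insensitive), in order."""
    have = {entry.strip().casefold() for entry in configured}
    return [entry for entry in required if entry.strip().casefold() not in have]


if __name__ == "__main__":
    configured = ["*.VHDX"]
    required = ["*.vhd", "*.vhdx"]
    print(missing_exclusions(configured, required))
```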
    1. Tuning RDS/Virtual Apps TSFairShare
      Fair Share technologies for CPU resources were introduced in Windows Server 2008 R2. Remote Desktop Services (RDS) server, Windows 10 Enterprise multi-session, and Windows 11 Enterprise multi-session use Fair Share technology to manage resources. RDS builds on the Fair Share technologies to add features for allocating network bandwidth and disk resources.

      Fair Share technologies are enabled by default, but you can disable them using Windows PowerShell and WMI. I disable these settings to get the best user experience; make sure to test this beforehand. A March 2023 VirtualExpo session showed that this indeed helped login and application launch times.

    These registry keys exist for CPU, disk, and network; all are enabled by default.

    • Disk: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk
    • Network: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS
    • CPU: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System
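Since the Active Setup script earlier in this post uses reg.exe, here is a sketch that generates the reg.exe commands to disable each Fair Share component, so you can review them before running anything. It assumes the value names EnableFairShare (under the Disk and NetFS keys) and EnableCpuQuota (under Session Manager\Quota System), with 0 meaning disabled; verify those names on your build, and test first as noted above.

```python
# Build (but do not run) reg.exe commands that disable Fair Share components.
# Assumption: value names are EnableFairShare (Disk, NetFS) and EnableCpuQuota
# (Quota System), and a DWORD of 0 disables the component.

from typing import Dict, List, Tuple

FAIR_SHARE_SETTINGS: Dict[str, Tuple[str, str]] = {
    "Disk": (r"HKLM\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk", "EnableFairShare"),
    "Network": (r"HKLM\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS", "EnableFairShare"),
    "CPU": (r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System", "EnableCpuQuota"),
}


def reg_add_command(key: str, value_name: str, data: int) -> str:
    """Format a reg.exe command that writes a DWORD value, overwriting without a prompt."""
    return f'reg add "{key}" /v {value_name} /t REG_DWORD /d {data} /f'


def disable_commands() -> List[str]:
    """One command per Fair Share component, each setting its value to 0."""
    return [reg_add_command(key, name, 0) for key, name in FAIR_SHARE_SETTINGS.values()]


if __name__ == "__main__":
    print("\n".join(disable_commands()))
```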
    1. Hardware Layer
      Understanding the CPU architecture is another topic worth paying attention to. In my experience, most places now use SSD or NVMe storage. The hypervisors I see most are Nutanix and VMware. Nutanix has a wonderful HCI solution, and VMware supports both HCI and a traditional three-tier layout on hardware such as Cisco UCS or Dell PowerEdge. Whatever flavor you run, it is vital to understand VM sizing for your workloads. The answer to “what size?” is mostly “it depends”, but you can follow the scalability guidance from Citrix Tech Zone:

      “On older chips, such as Broadwell and Haswell, Intel connected processors using a ring-based architecture. But as the number of cores increased, access latency increased and bandwidth per core diminished so Intel would mitigate this by splitting the chip into two halves and adding a second ring to reduce distances. And this invisible split was something that needed to be factored into CVAD SSS to provide optimal results. This has been referred to in the past as “NUMA” or Non-Uniform Memory Access. And the leading guidance was to ensure that you are sizing CVA VMs as large as possible but not crossing NUMA nodes, sub-NUMA clusters or rings at the same time. If you sized your CVA VMs too large and they effectively spanned NUMA nodes or rings, it can lead to NUMA “thrashing” by accessing non-local resources and this would yield reduced SSS. Fast-forward to today and Intel has moved from a ring-based architecture to a mesh-based architecture. And this new mesh architecture introduced in Skylake does not have the same limitations as before where we have to split chips, divide cores or add rings. And this changes the way we size CVA servers in particular. So it’s important to understand the specific chip that is being used in the hardware you purchase and how the underlying microprocessor architecture is designed and constructed”
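The classic sizing rule in the quote, keeping a VM inside one NUMA node, can be checked with a few lines of arithmetic. This is a simplified sketch: it treats each socket as one NUMA node, counts physical cores only (no hyper-threads), and ignores sub-NUMA clustering and mesh-specific effects; the host figures are hypothetical.

```python
# Simplified NUMA fit check: does a VM's vCPU and memory footprint fit inside
# one NUMA node? Assumes one node per socket and physical cores only.

from typing import Tuple


def numa_node_size(total_cores: int, total_mem_gib: float, nodes: int) -> Tuple[int, float]:
    """Cores and memory per NUMA node for an evenly populated host."""
    return total_cores // nodes, total_mem_gib / nodes


def fits_in_numa_node(vm_vcpus: int, vm_mem_gib: float,
                      cores_per_node: int, mem_per_node_gib: float) -> bool:
    """True if the VM needs no more cores and memory than a single node has."""
    return vm_vcpus <= cores_per_node and vm_mem_gib <= mem_per_node_gib


if __name__ == "__main__":
    cores, mem = numa_node_size(total_cores=48, total_mem_gib=768, nodes=2)
    for vcpus, vm_mem in ((24, 256), (32, 256)):
        print(vcpus, "vCPU /", vm_mem, "GiB fits:", fits_in_numa_node(vcpus, vm_mem, cores, mem))
```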

      I see this a lot: clients throwing more CPU at things, hoping it will speed up their backend workloads. Sure, there are times it will help, but I pay close attention to two metrics here. CPU wait time and CPU ready time are both terms used in the context of CPU scheduling and resource management on virtualized hosts.

      CPU wait time: the amount of time a vCPU spends in a wait state. The virtual machine did get scheduled, but the guest has nothing to process, or is waiting on something such as I/O, so the CPU simply waits while the virtual machine’s scheduled time ticks by.

      CPU ready time: the amount of time a vCPU was ready to run but could not be scheduled on a physical CPU, because the host’s cores were busy serving other virtual machines. For example, the virtual machine was ready, but could not get scheduled to run on the physical CPU.

      In summary, CPU ready means the guest is waiting on the host, while CPU wait means the host is waiting on the guest. Oversized or overcommitted VMs tend to drive ready time up, which is why throwing more vCPUs at a VM does not always help.
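vCenter reports CPU ready as a summation in milliseconds per sample interval, so it helps to convert it into a percentage. This sketch uses the commonly cited conversion (ready milliseconds divided by the interval in milliseconds, times 100), assumes the 20-second real-time chart interval by default, and divides by the vCPU count to get a per-vCPU figure.

```python
# Convert a vCenter CPU ready summation (ms per sample) into a percentage.
# Default interval assumes the 20-second real-time chart.

def cpu_ready_percent(ready_ms: float, interval_s: float = 20, vcpus: int = 1) -> float:
    """Average CPU ready percentage per vCPU over one sample interval."""
    return 100.0 * ready_ms / (interval_s * 1000.0) / vcpus


if __name__ == "__main__":
    print(f"{cpu_ready_percent(2000, vcpus=2):.1f}% ready per vCPU")
```

For example, 2,000 ms of ready time in a 20-second sample on a 2-vCPU VM is 5% per vCPU, a level often treated as worth investigating.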
    1. Choosing a Suitable Provisioning Method

    This concludes the tips and tricks. Remember, this was more of a catch-all source blog showing links and summarizing what many of the EUC folks use to optimize logins. Please let me know if I missed something you believe can be helpful, and I’ll update the blog to include it.
