by Uddave Jajoo, CTA & Indianapolis CUGC Leader
Recently, I started working on a project for a customer doing healthcare research on molecule studies. They needed to run CUDA-based applications with high-end graphics processing on NVIDIA Tesla V100 vGPU cards.
I had worked on a similar requirement before in an on-prem datacenter. This time, however, the requirement was to configure it in Azure with Citrix DaaS, so I decided to implement the solution using an Azure-native VM family size that supports NVIDIA vGPU-enabled cards. Azure already offers the N-series VM family for GPU workloads, with several offerings depending on the graphics card OEM.
Before we dive into the setup and configuration of NVIDIA vGPU-enabled workloads in Azure, let's talk about accelerated computing:
“Accelerated computing is the use of specialized hardware to dramatically speed up work, often with parallel processing that bundles frequently occurring tasks. It offloads demanding work that can bog down CPUs, processors that typically execute tasks in serial fashion. Born in the PC, accelerated computing came of age in supercomputers. It lives today in your smartphone and every cloud service. And now companies of every stripe are adopting it to transform their businesses with data.
Accelerated computers blend CPUs and other kinds of processors together as equals in an architecture sometimes called heterogeneous computing.” –Rick Merritt, What is Accelerated Computing, NVIDIA blogs.
Let’s walk through the configuration steps to deploy and configure VDAs on vGPU-enabled VMs in Azure:
- Configuring Cloud License Server
- Install Driver on Master Image in Azure
- Provision Catalog and Create VDIs
- Configure Licensing on Client VDIs
Components and prerequisites used in this setup:
- Identify the VM Family Size supporting NVIDIA vGPU Tesla cards – NCv3 Series
- Identify the supported Driver version – NVIDIA supported Tesla Drivers
- Windows 10 Client OS 22H2
- Citrix VDA Agent 2305
- New Cloud License Server appliance
- Firewall requirements to enable communication with Cloud license server
Configuring Cloud License Server
The legacy license server was set to reach end of life (EOL) by July 2023, so NVIDIA now offers two ways to provision a license server: Delegated License Service (DLS), hosted on-premises, and Cloud License Service (CLS), hosted in the cloud. In this blog, I cover how to set up a CLS-based license server. The steps are simple and are also well described in the NVIDIA documentation.
1. Log in to the NVIDIA Licensing Portal to create the new CLS-based license server.
2. In the Dashboard, click on License Servers and select Create Server.
3. In the next screen, provide details for the license server creation.
4. In Step 1, Enter the details as below:
Description – This is a cloud license Server
5. In Step 2 Features, select the available features based on the purchase of licenses.
6. Select NVIDIA Virtual PC and NVIDIA Virtual Applications and enter the number of licenses to add.
Example: I have just added 1 license for each.
7. In Step 3 Environment, select the option CLOUD (CLS).
8. Select Express Installation.
9. In Step 4 Configuration, select Standard configuration, which will configure all the default settings for Cloud License Server.
10. Review the summary and click Create Server.
11. Wait for Cloud License Server to be created in the console and verify the required license configuration exists.
12. Verify the License server is created successfully.
13. Click on Actions and select Generate Client Config Token.
14. Navigate to Settings to modify Lease Duration settings if needed.
By default, the lease time is 24 hours. When the lease expires, the client acquires another license from the Cloud License Server instance. This is an automatic process in which the client communicates with the license server URL over port 443. The client contacts the CLS instance for the following:
- Licensing operations, namely, the borrowing, renewal, and return of a license
- Licensed client authentication
- License return from a Windows licensed client that has not been shut down cleanly
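Because licensing depends on outbound port 443, it can help to confirm that a VDI can reach the license service before troubleshooting anything else. The following is a minimal sketch, not an official NVIDIA check. The host name is an assumption (this post does not state the CLS URL), so replace it with the endpoint that applies to your license server.

```powershell
# Assumption: the CLS endpoint host name below is not given in this post;
# replace it with the URL that applies to your Cloud License Server.
$clsHost = 'api.cls.licensing.nvidia.com'

# Test-NetConnection is built into Windows 10 and reports whether a TCP
# connection to the given port succeeds.
$result = Test-NetConnection -ComputerName $clsHost -Port 443 -WarningAction SilentlyContinue

if ($result.TcpTestSucceeded) {
    Write-Output "Outbound TCP 443 to $clsHost is open."
} else {
    Write-Warning "Cannot reach $clsHost on TCP 443. Check NSG, firewall, and proxy rules."
}
```

If the environment routes traffic through an explicit proxy, a direct TCP test like this may fail even though licensing works through the proxy, so treat a failure as a hint rather than proof.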
Install Driver on Master Image in Azure
Before creating the image in Azure, you need to decide on the VM family size. This depends on several factors, such as the supported driver version, the supported vGPU cards, and the vGPU licenses you have acquired. In my scenario, the customer had already purchased licenses for NVIDIA Tesla V100 vGPU cards, and in Azure, NCv3 is the VM family that offers the Tesla V100.
The NCv3-series is focused on high-performance computing and AI workloads and features NVIDIA’s Tesla V100 GPU.
Important Note: Check with your Microsoft account representative about available discounts before selecting a VM family size, and prefer Reserved Instances plus a Savings Plan, which can cut costs by as much as 80% compared with normal pay-as-you-go pricing.
1. Create a new native Azure VM, selecting NC6s_v3 as the VM size in the Azure portal.
Why do we need to create a new VM in Azure?
So that you can bind the catalog to the respective VM family size and select the required machine profile pointing to the master image.
2. Log in to the master image VM using the local administrator account.
3. Log in to the NVIDIA Licensing Portal and download the latest vGPU package, including the guest drivers.
4. I preferred to go with the latest version – 16.1, released on Aug 29, 2023.
Note: You could also install the driver using Azure VM Extensions, but there seems to be an issue with how the binaries are pushed from Azure: some of the folder structure under C:\Program Files\NVIDIA Corporation\ seems to be missing after the driver installation.
5. After downloading the binaries from the portal, copy the zip file to C:\Support.
6. Right click on the exe file and select Run As Administrator.
7. Let the binaries extract to the local folder as displayed.
8. In the System Check window, make sure there are no compatibility errors. If there are, restart the VM and run the installation again.
9. Under license agreement, select Agree and Continue to proceed further with the installation.
10. Under Installation options, select Custom (Advanced) to perform a clean install of the drivers on the operating system. Click Next.
11. In the custom installation options, check the box for Perform a clean installation. Click Next.
12. Monitor the installation process and wait for the drivers to successfully install.
13. Once installation has finished and status shows installed, click Close.
14. After the driver installation, create the following registry value on the master image, under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\nvlddmkm\Global\GridLicensing:
Create – FeatureType (DWORD)
Set Value – 2
Reference: Client Licensing User Guide :: NVIDIA Virtual GPU Software Documentation
Note: Do not copy the client configuration token file onto the master image, so that the master image does not consume a license.
Physical GPUs only:
Add the FeatureType DWord (REG_DWORD) registry value to the Windows registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\nvlddmkm\Global\GridLicensing.
Note: If you’re licensing an NVIDIA vGPU, the FeatureType DWord (REG_DWORD) registry value is not required.
NVIDIA vGPU software automatically selects the correct type of license based on the vGPU type.
If you are upgrading an existing driver, this value is already set. You can also perform this step from NVIDIA Control Panel.
Set this value to the feature type of a GPU in pass-through mode or a bare-metal deployment:
0: NVIDIA Virtual Applications
2: NVIDIA RTX Virtual Workstation
Limitation of Azure VM Extension
Do not use the Azure VM extension to install the driver on the native master image in Azure. It does not configure the drivers properly and omits some configuration folders under the NVIDIA Corporation directory on the system drive. I have opened a case with NVIDIA and given feedback to Microsoft asking them to adjust the binaries on the Azure back end, so that the VM extension can install the proper driver version directly on the client VDIs.
That would save admins the hassle of installing the drivers manually on the image. Even so, my preferred approach is to install the drivers locally on the master image, so that every newly provisioned VDI gets the latest version.
Provision Catalog and Create VDIs
Follow the process below to provision catalog and VDIs:
1. Shut down the master image in Azure.
2. Log in to the Azure portal and create a snapshot from the native Azure image.
3. Log in to the Citrix DaaS console and navigate to Machine Catalogs.
4. Create a machine catalog by pointing to the respective snapshot and machine profile for the Azure Image.
5. Follow through the catalog creation process, review the summary and monitor the VDI deployments.
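Steps 1 and 2 above can also be done from PowerShell instead of the portal. This is a minimal sketch, assuming the Az PowerShell module is installed, you are already signed in with Connect-AzAccount, and the master image uses a managed OS disk. The resource group, VM, and snapshot names are hypothetical placeholders.

```powershell
# Hypothetical names; replace with your own resource group, VM, and snapshot name.
$resourceGroup = 'rg-vgpu-demo'
$vmName        = 'vgpu-master'
$snapshotName  = 'vgpu-master-snap-01'

# Stop and deallocate the master image VM before taking the snapshot.
Stop-AzVM -ResourceGroupName $resourceGroup -Name $vmName -Force

# Look up the managed OS disk attached to the VM.
$vm       = Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName
$osDiskId = $vm.StorageProfile.OsDisk.ManagedDisk.Id

# Build the snapshot configuration and create the snapshot in the VM's region.
$config = New-AzSnapshotConfig -SourceUri $osDiskId -Location $vm.Location -CreateOption Copy
New-AzSnapshot -ResourceGroupName $resourceGroup -SnapshotName $snapshotName -Snapshot $config
```

The resulting snapshot is what you point the machine catalog at in step 4. The catalog creation itself is still done in the Citrix DaaS console as described.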
Configure Licensing on Client VDIs
In this section, I walk through how to configure licensing on the client VDIs so they can communicate with the CLS (Cloud License Server). Make sure outbound internet communication over port 443 is allowed.
Step 1 – Add the registry Key for FeatureType on the client VDIs. Open PowerShell as administrator and run the following command:
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\nvlddmkm\Global\GridLicensing" -Name FeatureType -PropertyType DWord -Value 2
Step 2 – Download the client configuration token file from the NVIDIA Licensing Portal and copy it to the default location: %SystemDrive%\Program Files\NVIDIA Corporation\vGPU Licensing\ClientConfigToken
Step 3 – Restart the NvDisplayContainer service.
Step 4 – On the client machine, you will see the notification Acquiring NVIDIA License RTX Virtual Workstation (the wording depends on the OS), immediately followed by the notification NVIDIA License acquired.
Steps 1 through 3 can easily be scripted and triggered remotely on newly provisioned VDIs, either with Script Based Action Triggers in ControlUp or with Scripted Tasks in WEM.
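As a rough illustration of what such a script might look like, here is a minimal sketch covering Steps 1 to 3. Several details are assumptions rather than anything from NVIDIA or Citrix: the token share path is a hypothetical placeholder, and the service that the steps call NvDisplayContainer is assumed to be registered under the name NVDisplay.ContainerLocalSystem (confirm the name with Get-Service on your image).

```powershell
#Requires -RunAsAdministrator

# Hypothetical source location for the token file; adjust to your environment.
$tokenSource = '\\fileserver\share\nvidia\*.tok'
$tokenDir    = Join-Path $env:ProgramFiles 'NVIDIA Corporation\vGPU Licensing\ClientConfigToken'
$regPath     = 'HKLM:\SYSTEM\CurrentControlSet\Services\nvlddmkm\Global\GridLicensing'

# Step 1: create the GridLicensing key if needed, then set FeatureType = 2.
if (-not (Test-Path $regPath)) {
    New-Item -Path $regPath -Force | Out-Null
}
New-ItemProperty -Path $regPath -Name 'FeatureType' -PropertyType DWord -Value 2 -Force | Out-Null

# Step 2: copy the client configuration token into the default folder.
if (-not (Test-Path $tokenDir)) {
    New-Item -Path $tokenDir -ItemType Directory -Force | Out-Null
}
Copy-Item -Path $tokenSource -Destination $tokenDir -Force

# Step 3: restart the NVIDIA display container service so it reads the token.
# Assumption: the service name below; verify it with Get-Service first.
Restart-Service -Name 'NVDisplay.ContainerLocalSystem' -Force
```

Using -Force on New-ItemProperty makes the script safe to re-run, since it overwrites an existing FeatureType value instead of failing.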
Note: The NVIDIA licensing log can help in troubleshooting issues with the license acquisition process.
From the Start menu, open NVIDIA Control Panel and select Manage License under Licensing to display the licensing status.
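The licensing status can also be checked from the command line, which is handy when validating many VDIs remotely. This is a minimal sketch, assuming nvidia-smi.exe was installed with the driver and is on the PATH. If it is not, the script only warns and you will need to locate the executable yourself.

```powershell
# Assumption: nvidia-smi.exe is on the PATH after the driver installation.
$smi = Get-Command nvidia-smi.exe -ErrorAction SilentlyContinue

if (-not $smi) {
    Write-Warning 'nvidia-smi.exe was not found on the PATH.'
} else {
    # Run the full query and keep only the lines that mention licensing.
    & $smi.Source -q | Select-String -Pattern 'License'
}
```

The filtered lines should indicate the licensed product and whether a license is currently held. The exact wording varies by driver version.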
License System User Guide – NVIDIA Docs
Azure VM sizes – GPU – Azure Virtual Machines | Microsoft Learn
NCv3-series – Azure Virtual Machines | Microsoft Learn
NVIDIA Virtual GPU Software License Server End of Life Notice (August 31, 2022) :: NVIDIA Virtual GPU Software News and Updates
Client Licensing User Guide :: NVIDIA Virtual GPU Software Documentation
NVIDIA GPU Driver Extension – Azure Windows VMs – Azure Virtual Machines | Microsoft Learn