Have you ever fought an issue that dragged on for days? Working in IT, these fights sprout up every now and again. It just comes with the territory. I recently had one. Our production NAS decided to flake out, and it took a few days to recover. We run our Citrix PVS images off this NAS, so our environment was down. This story covers the steps we took to recover Citrix Provisioning Services.
The environment runs two Windows Server 2016 servers with Citrix Provisioning Services 7.15 in a Citrix 7.12 environment. PVS is configured to use an SMB share, called SMBD1, hosted on a two-node Windows Server 2012 R2 file cluster; the PVS store points to this share, and the images are stored on and streamed from it.
SMB Share Offline
The file clustering services went offline and took the PVS SMB share down with them, which meant any Citrix workloads streaming off that share were now down too. The clustering services were bouncing back and forth between the two nodes, so there were windows of time when the share was available. During those windows, we began copying the images off the file cluster. Luckily, the windows were long enough that we got every image onto the local disks of the PVS servers.
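For a flaky share like that, robocopy's restartable mode is a good fit, since it rides out the share dropping mid-file. A rough sketch of the kind of copy we ran (the UNC path and destination are examples, not our actual paths):

```powershell
# Copy the vDisk files (the VHDX images plus their .pvp property files) from the
# clustered share to a PVS server's local disk. /Z makes each file copy
# restartable if the share bounces mid-transfer; /R and /W keep robocopy
# retrying through the short outages between availability windows.
robocopy \\FILECLUSTER\SMBD1 D:\PVSStore *.vhdx *.pvp /Z /R:10 /W:30
```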
Next, we created a new store in PVS and named it LOCAL. We imported the VHDX files into the new store, repointed all the workloads under Device Collections in the PVS console, and rebooted the workers. Our core production images were back up and functional. Or so we thought.
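We did all of this through the console, but the same steps can be scripted with the PVS PowerShell snap-in (Citrix.PVS.SnapIn, included with the 7.x console install). This is only a sketch under assumptions: the site, collection, path, and vDisk names below are placeholders, and you should check the cmdlet parameters against your own PVS version before running anything like it:

```powershell
Import-Module 'C:\Program Files\Citrix\Provisioning Services Console\Citrix.PVS.SnapIn.dll'

# Create the new store pointing at the local disk path (names are examples).
New-PvsStore -StoreName 'LOCAL' -Path 'D:\PVSStore' -SiteName 'Site'

# Register an existing VHDX from that path in the new store.
New-PvsDiskLocator -Name 'ProdImage' -StoreName 'LOCAL' -SiteName 'Site' -VHDX

# Repoint every device in a collection at the new disk locator.
$disk = Get-PvsDiskLocator -DiskLocatorName 'ProdImage' -StoreName 'LOCAL' -SiteName 'Site'
Get-PvsDevice -CollectionName 'Prod' -SiteName 'Site' | ForEach-Object {
    Add-PvsDiskLocatorToDevice -DiskLocatorId $disk.DiskLocatorId -DeviceId $_.DeviceId
}
```

The devices still need a reboot afterward to pick up the new vDisk assignment, same as when you repoint them in the console.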
Multiple Events on PVS Servers
After a few hours, we received calls that people couldn't access any applications from our environment. Looking in Studio, the VDAs had become Unregistered. On the PVS servers, the Application event log showed two Event ID 11 entries: one saying "Detected one or more hung threads" and the other "Terminating StreamProcess." Both appeared right before the VDAs unregistered. The Stream Service showed as running in the services list, yet the Servers folder in the PVS console said the stream had stopped. A quick restart of the StreamProcess brought the VDAs back online, until the events happened again. They started recurring more frequently, anywhere from 5 to 15 minutes apart.
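For reference, the restart we kept repeating amounts to bouncing the PVS Stream Service. Assuming the service's internal name is StreamService (worth confirming with `Get-Service` on your own servers), it looks like this:

```powershell
# Restart the Citrix PVS Stream Service on the local PVS server.
# -Force also restarts any dependent services rather than failing.
Restart-Service -Name 'StreamService' -Force
```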
Most Importantly, the Fix
A call to support revealed that we needed to remove the reference images under our SMBD1 store in PVS. Since the PVS database still held references to the vDisks on the SMB share, PVS kept trying to connect to them; when it couldn't, it would stop the stream and log the two events. We went into the SMBD1 store and removed the reference images. Once the references were gone, PVS stabilized and the VDAs stayed registered.
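The same cleanup can be done with the snap-in: enumerate the disk locators still registered under the dead store and delete those database references. Again a hedged sketch, with our store and a placeholder site name; the VHDX files themselves are not touched by this, only the PVS database entries:

```powershell
Import-Module 'C:\Program Files\Citrix\Provisioning Services Console\Citrix.PVS.SnapIn.dll'

# Find every vDisk reference still registered under the offline SMBD1 store
# and remove each reference from the PVS database.
Get-PvsDiskLocator -StoreName 'SMBD1' -SiteName 'Site' | ForEach-Object {
    Remove-PvsDiskLocator -DiskLocatorId $_.DiskLocatorId
}
```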
LOCAL Store with images
SMBD1 Store removed references
When we imported the images into the LOCAL store, we did rename one of them. I'm not sure whether the database points to a GUID, but removing the references from the original SMBD1 store fixed our issue. It was a quick and dirty fix, but it got our production environment back up and running and gave us the breathing room we needed to repair the file clustering services.