This is more of a walkthrough of the process that I went through in troubleshooting an issue. This particular issue is, I think, rare, as there are only a few situations where total isolation from the Internet is actually required or implemented, but the process itself provides a potentially useful guide to logging and investigating an issue end to end, from Director to delivery machine.
The error was that when you try to initiate a Remote Assistance session from Citrix Director, you get the following error after about 30 seconds:
This coincides with the following event in the Application log of the Director server:
And on the machine that you are attempting to shadow:
Troubleshooting
Start by looking in the IIS Logs on the Citrix Director server. These are in c:\inetpub\logs\LogFiles\W3SVC1. Open the most recent log file and search up from the end of the file for the following line:
‘/Director/service.svc/web/ShadowSession’
The line returned should look something like this:
2015-02-26 09:39:25 111.111.111.111 POST /Director/service.svc/web/ShadowSession - 443 username 222.222.222.222 Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:35.0)+Gecko/20100101+Firefox/35.0 https://citrixdirector.domain.co.uk/Director/default.html?locale=en_US 500 0 0 37343
The values we care about are the four space-separated values at the end of the line, which correspond to the following headings:
sc-status sc-substatus sc-win32-status time-taken
An sc-status of 500 means that an internal server error occurred. The last value is the time taken in milliseconds; anything above 30000 means the response was not received quickly enough and a timeout occurred. This timeout is the WCF timeout, which defaults to 30 seconds. In this case the total time taken was 37343, which indicates that the request timed out.
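If you would rather not scroll the log by hand, a quick PowerShell sketch along these lines (assuming the default log location above) pulls out the ShadowSession requests and their last four fields:
# Find the most recent IIS log and print sc-status, sc-substatus, sc-win32-status and time-taken
# for every ShadowSession request it contains.
$log = Get-ChildItem 'C:\inetpub\logs\LogFiles\W3SVC1\*.log' | Sort-Object LastWriteTime | Select-Object -Last 1
Select-String -Path $log.FullName -Pattern 'ShadowSession' | ForEach-Object {
    $fields = $_.Line -split ' '
    '{0} {1} {2} {3}' -f $fields[-4], $fields[-3], $fields[-2], $fields[-1]
}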
The next step is to enable Citrix Director logging. This logs the details of all of the various calls made from the Director server to the Desktop Delivery Controllers. The Director server does not communicate with the delivery machines in any way; all requests are processed by the DDC.
To enable Citrix Director Logging on the Director server:
- Create a folder called C:\Logs
- Assign Modify permissions to the INET_USR account
- Open IIS
- Browse to Sites -> Default Web Site -> Director
- Select Application Settings
- Set the following 2 properties:
- FileName C:\logs\Director.log
- LogToFile 1
- Restart IIS
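For reference, a rough PowerShell equivalent of these steps might look like the sketch below. It assumes the default site layout and that the FileName and LogToFile keys do not already exist; if they do, set them through IIS Manager as described above.
# Create the log folder and give the IIS account from the steps above Modify rights.
New-Item -Path 'C:\Logs' -ItemType Directory -Force
icacls 'C:\Logs' /grant 'INET_USR:(OI)(CI)M'
# Add the two application settings to the Director application, then restart IIS.
Import-Module WebAdministration
Add-WebConfigurationProperty -PSPath 'IIS:\Sites\Default Web Site\Director' -Filter 'appSettings' -Name '.' -Value @{key='FileName';value='C:\Logs\Director.log'}
Add-WebConfigurationProperty -PSPath 'IIS:\Sites\Default Web Site\Director' -Filter 'appSettings' -Name '.' -Value @{key='LogToFile';value='1'}
iisreset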
Now that logging is enabled, retry the attempt to shadow the session through the Director web interface. Make a note of the rough time at which you click the Shadow button; it will help in verifying that you are looking at the right record in the log files. Once you have replicated the error, open the log file that should have been generated.
In the open file, starting from the bottom, search for: ENTRY: ShadowSession. You should be taken to a row that looks similar to this:
02/26/2015 12:03:01.2926 : [t:9, s:5xj2ur0kvvzhahemlqraow30] ENTRY: ShadowSession service called
The first entry inside the square brackets represents a thread number. All actions happen on a thread. In this case the thread number is 9. This information is useful in tracking the various log items as all related entries will have occurred on the same thread number. About 20 or so lines further down the log file you should see the PowerShell equivalent command that the DDC will have executed in order to start the Shadowing request. It should look similar to this:
02/26/2015 12:03:01.6833 : [t:9, s:5xj2ur0kvvzhahemlqraow30] PowerShell equivalent: New-BrokerMachineCommand -Category DirectorPlugin -Synchronous -MachineUid 78 -CommandName GetRAConnectionString -CommandData (New-Object System.Text.ASCIIEncoding).GetBytes('<GetRAConnectionStringPayload xmlns="http://schemas.datacontract.org/2004/07/Citrix.Dmc.Vda.Command" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><SessionId>fc267736-3565-4642-95e1-d7a85d789ce9</SessionId></GetRAConnectionStringPayload>') | foreach {(New-Object System.Text.ASCIIEncoding).GetString($_.CommandResponseData)}
Right now we are interested in the next log line on this thread, which should tell you whether or not the command was successful. Chances are, if you are reading this, it was not! In the issue described here the following error was logged:
02/26/2015 12:03:34.6052 : [t:9, s:5xj2ur0kvvzhahemlqraow30] TimeoutException caught: The request channel timed out while waiting for a reply after 00:00:30. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.
02/26/2015 12:03:34.6052 : [t:9, s:5xj2ur0kvvzhahemlqraow30] Connector has faulted. Disposing.
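Because all related entries share the same thread number, Select-String makes it easy to pull the whole exchange out of the log in one go. A minimal sketch, assuming the log path configured earlier and the thread number 9 from the example above:
# Locate the most recent ShadowSession entry, then list every line logged on thread 9.
Select-String -Path 'C:\Logs\Director.log' -Pattern 'ENTRY: ShadowSession' | Select-Object -Last 1
Select-String -Path 'C:\Logs\Director.log' -Pattern '\[t:9,' | ForEach-Object { $_.Line }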
At this point we know it failed, and we know it timed out. This ties in with the timeout value we observed in the first log file. We also have the PowerShell equivalent of the command being run from the DDC. The next step is to verify that the problem is not with the Director server.
Log on to your DDC and open an elevated PowerShell prompt. We are going to import the Citrix snap-ins and then run the command above to get an idea of how long it is taking:
Add-PSSnapIn *Citrix*
Then copy in the PowerShell command, wrapping it in a Measure-Command block. It will look like this:
Measure-Command { New-BrokerMachineCommand -Category DirectorPlugin -Synchronous -MachineUid 78 -CommandName GetRAConnectionString -CommandData (New-Object System.Text.ASCIIEncoding).GetBytes('<GetRAConnectionStringPayload xmlns="http://schemas.datacontract.org/2004/07/Citrix.Dmc.Vda.Command" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><SessionId>fc267736-3565-4642-95e1-d7a85d789ce9</SessionId></GetRAConnectionStringPayload>') | foreach {(New-Object System.Text.ASCIIEncoding).GetString($_.CommandResponseData)}}
P.S Don’t forget the additional trailing } that is needed.
After running the command you will be told how long it took to run from the DDC. In my case it was consistently around 32 seconds; anything over 30 is a timeout. If it is a timeout then you can quite safely say that the Director server is not the issue, as all it is doing is reporting the failure of another component. If you find that the request is not timing out and is working, then you will need to investigate communications between the Director server and the Delivery Controllers.
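If you would like the comparison against the 30 second WCF timeout done for you, you can capture the result of Measure-Command and test it. A minimal sketch; the script block should contain the full New-BrokerMachineCommand pipeline shown above (elided here):
# Capture the TimeSpan from Measure-Command and compare it to the 30 second WCF timeout.
$elapsed = Measure-Command {
    # New-BrokerMachineCommand ... | foreach { ... }   (the full command from above)
}
if ($elapsed.TotalSeconds -gt 30) {
    'Request took {0:N1} seconds - over the 30 second WCF timeout' -f $elapsed.TotalSeconds
} else {
    'Request completed in {0:N1} seconds' -f $elapsed.TotalSeconds
}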
Next up is determining what is going on between the DDC and the delivery machine. I guessed there must be some form of communication breakdown from the DDC, so I opened a copy of Wireshark Portable on the DDC, started a capture, and re-ran the PowerShell command from above. Again I hit the 32 second timeout.
To make more sense of the results, I applied a display filter limiting the capture to traffic to and from the delivery machine: (ip.dst == 192.168.11.51) || (ip.src == 192.168.12.73)
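As an aside, Wireshark can also express "to or from a single host" in one clause with ip.addr, for example: ip.addr == 192.168.11.51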
What was returned was a number of successful HTTP requests. There were a number of requests at the start of the exchange, about 30 seconds of nothing, and then a handful of exchanges at the end. No failures or retransmissions. I removed the filter and scanned through the remaining entries within this 30 second window, and again nothing strange stood out. (Thankfully this was a development system, so the amount of traffic generated was negligible.)
Although at this point I could not be certain that the DDC was not the culprit, I felt the problem actually had to be with the delivery machine. There were no failures being logged on the DDC, no dropped packets or retransmissions, nothing out of the ordinary.
I must add that at this point I did enable logging on the DDC, but I quickly turned it off again. The volume of information in the logs is overwhelming, and I could not find a way to track requests through them. With logging back off, I moved on to the delivery machines.
On the delivery machine I started, again, with a Wireshark trace. I wanted to confirm that what I had seen on the DDC matched what was happening at the delivery end. I started a capture and again ran the PowerShell command above. I could see the same exchange of HTTP communications, again with the 30 second break in the middle.
With the filter removed, though, I was able to see on this machine a couple of requests that were being retried repeatedly; once the 30 seconds were up, the retries stopped. To prove this, I retried the command with the capture running another three times. The same couple of IP addresses were contacted, unsuccessfully, for 30 seconds each time before the failure message appeared in Director.
Each of these requests was an attempt to contact the online Windows certificate revocation list (CRL). DNS resolution succeeded, but the connection attempts were being dropped by the firewall protecting the network. As I mentioned earlier, this is a closed network, with no Internet access for the clients it contains.
Each time a shadowing request was received, an attempt was made to fetch the certificate revocation list. This took about 30 seconds, and the remaining 2 seconds were lost in negotiation and connections between the various servers.
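A quick way to confirm this from the delivery machine is to try fetching the CRL directly with a short timeout. This is only a sketch; the URL below is a placeholder, so substitute the CRL distribution point you see in your own capture:
# Attempt to download the CRL with a 5 second timeout instead of waiting the full 30.
$crlUrl = 'http://crl.microsoft.com/pki/crl/products/tspca.crl'   # placeholder - use the URL from your capture
$request = [System.Net.WebRequest]::Create($crlUrl)
$request.Timeout = 5000
try {
    $response = $request.GetResponse()
    'CRL reachable: HTTP {0}' -f [int]$response.StatusCode
    $response.Close()
} catch {
    'CRL unreachable: {0}' -f $_.Exception.Message
}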
The solution in our particular case was to use Group Policy to tell the clients that they could not use Internet communication, in addition to the firewall that already enforced it. There seems to be an inherent assumption in Windows 7 that it will be able to contact the Internet unless you explicitly tell the client that it cannot.
The setting is:
Computer Configuration\Administrative Templates\System\Internet Communication Management\Restrict Internet Communication
There are a number of other settings in the next folder down (Internet Communication settings), but only this one seemed to stop the CRL check that Remote Assistance was performing.
Hopefully this is somewhat useful in tracing errors of your own, though, even if this turns out not to be the root cause of your issue.