Centrally Disable NAT in VMware Workstation

A fellow IT professional, who works with the non-wired flavor of networking, contacted me with the following scenario:

A group of users, developers in this case, have VMware Workstation installed on their laptops. This makes it easy for them to manage, test and develop their applications in a closed environment without having to install a bunch of tools/services on their centrally managed laptop environment. An excellent use case for VMware Workstation if there ever was one.

So far, so good. The problem in this particular case was that, due to security policies in the network infrastructure, there was a need to disable NAT networking in VMware Workstation.

Network address translation (NAT) configures your virtual machine to share the IP and MAC addresses of the host. The virtual machine and the host share a single network identity that is not visible outside the network. NAT can be useful when you are allowed a single IP address or MAC address by your network administrator. You might also use NAT to configure separate virtual machines for handling http and ftp requests, with both virtual machines running off the same IP address or domain. See Network Address Translation (NAT).

VMware Workstation NAT Configuration

Since the VM shares the host MAC address and IP, blocking network access from the VM is not trivial in this scenario.

Thankfully, in VMware Workstation for Windows, NAT is provided through a Windows Service that we can manipulate. By disabling the “VMware NAT Service” we can ensure that NAT does not work, and that the only real alternative is to run the VM in “Bridged Mode”.
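
For a single machine this is easily done by hand from an elevated PowerShell prompt. A minimal sketch, assuming the service display name is exactly "VMware NAT Service" (double-check it in services.msc on your own install):

    # Find the NAT service by its display name (verify the name on your install)
    $svc = Get-Service -DisplayName "VMware NAT Service"

    # Set the startup type to Disabled so it stays off across reboots
    Set-Service -Name $svc.Name -StartupType Disabled

    # Stop it immediately if it is currently running
    if ($svc.Status -eq 'Running') { Stop-Service -Name $svc.Name -Force }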

Bridged Mode makes it easier for network admins to manage access, since the virtual network adapter is exposed to the switches with its own MAC address, and thus possibly also its own IP address, so the VM is not “hidden” behind the host's MAC. For instance, this makes it possible for the network gurus to limit a VM's physical network access to internet access only, without exposing the internal network to the VM.

Running around disabling the “VMware NAT Service” on all clients that run VMware Workstation is not a fun job, so naturally we need to find a way to automate this as well.

Enter Group Policy Preferences!

  1. On a computer that has VMware Workstation installed, run the Group Policy Management Console and create a new GPO.
  2. In Computer Configuration > Preferences > Control Panel Settings, select Services
  3. In the menu click Action > New > Service and click “…” next to the Service Name field
  4. Select the “VMware NAT Service” and click “Select”
  5. Set the Startup mode to “Disabled”
  6. Assign this new Group Policy Preference to the OU where the clients with VMware Workstation installed reside, and the next time the policies are refreshed, the “VMware NAT Service” should be set to disabled (a quick verification sketch follows this list). Note: this might require a reboot of the client.
  7. Profit.
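
To verify on a client that the preference actually took effect, a quick check along these lines should do, again assuming the same service display name as above:

    # Refresh computer policy and check the resulting startup mode
    gpupdate /target:computer /force

    # StartMode should now report Disabled
    Get-WmiObject Win32_Service -Filter "DisplayName='VMware NAT Service'" |
        Select-Object Name, StartMode, State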

And there it is, a workaround that prevents VMs running in VMware Workstation from using NAT mode. A bit of a hack, but it works.

Wishlist

I really wish VMware would make it possible to configure and manage multiple VMware Workstation for Windows installs centrally, through Group Policy and Group Policy Preferences.

The ability to centrally manage configurations and settings would be a welcome addition to this already excellent piece of software, and I am sure that I am not alone in asking for this possibility. So how about it VMware, yay or nay?

SCCM 2007 Not a Virtualization Candidate?

The last couple of days I’ve been in a training class, taking the 6451B Planning, Deploying and Managing Microsoft System Center Configuration Manager 2007 course.

One of the first things that got mentioned was that for larger deployments you should not run System Center Configuration Manager virtualized. Of course, this caught my eye, as I’m a proponent of the virtualize-first “movement”.

It turns out that the reason for this is that Configuration Manager is somewhat poorly designed: just about everything it receives from the clients in the network is placed in text-based files (the inbox folders) before being processed and pumped into the back-end SQL database. SCCM practically eats log files for a living.

I’m sure there were good reasons for this back in the day when SCCM 2007 was developed, but in retrospect it seems a poor design choice, especially since the IO intensity of writing all those text-based files is high and doesn’t scale well when you have loads of clients, which is what SCCM 2007 is supposedly designed for in the first place.
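
If you want a rough feel for how much small-file churn your own site server is dealing with, a quick sketch like the one below shows which inboxes hold the biggest backlog. The path is the default install location and is an assumption on my part; adjust it for your environment:

    # Count files per inbox folder to spot processing backlogs (path is an assumption)
    $inboxes = 'C:\Program Files\Microsoft Configuration Manager\inboxes'

    Get-ChildItem -Path $inboxes -Recurse -ErrorAction SilentlyContinue |
        Where-Object { -not $_.PSIsContainer } |
        Group-Object { $_.Directory.Name } |
        Sort-Object Count -Descending |
        Select-Object -First 10 Name, Count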

There are ways of alleviating the strain on the machine running SCCM, like running the SQL Server instance on a different server and running the management console on your local computer (remember, a Windows server is tuned by default to prioritize background tasks), but the fact remains that the sum of all the small write operations SCCM constantly performs puts a heavy strain on your storage.

So, if you want to run SCCM 2007 virtualized in your environment, make sure your storage is up to the task and that you don’t saturate it by deploying what is, in essence, management software. Perhaps it is better to run it on a physical server with adequate local storage, but don’t blame that on virtualization; blame it on poor design in SCCM 2007.
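
A quick way of checking whether your storage is keeping up is to watch the write latency counters on the volumes holding the inboxes and the SQL files. A minimal sketch follows; the 15-20 ms rule of thumb is a general storage guideline, not an SCCM-specific number:

    # Sample average write latency per volume for a minute (12 samples, 5 s apart).
    # Sustained values above roughly 15-20 ms suggest the storage is struggling.
    Get-Counter -Counter '\LogicalDisk(*)\Avg. Disk sec/Write' -SampleInterval 5 -MaxSamples 12 |
        ForEach-Object {
            $_.CounterSamples |
                Select-Object InstanceName, @{ n = 'AvgWriteSec'; e = { '{0:N4}' -f $_.CookedValue } }
        }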

Hopefully Configuration Manager 2012, which is currently in beta 2, behaves better when it’s released. If not, how will Microsoft defend not getting real performance when running it on Hyper-V (or any other hypervisor)?

Exchange 2010 SP1 and KB2393802 or How to Have an Interesting Afternoon at the Office

Let me start this post out with a little story. I am normally a hardcore virtualization and storage guy. Sometimes my career in this sector brings me into working with stuff I haven’t worked with before, because virtualization encompasses so much. As I continue to work with other teams, I learn more and more about what they do every day. I usually find myself involved in every performance troubleshooting session and every new project these days. My personal philosophy is that the IT guy of the future will be truly converged, just as all the technologies are converging into one box or “stack”. Specialties in smaller subsets will fall away, and specialization in everything datacenter may become the norm.

Early Monday morning, while drinking some coffee and reviewing my brand new vSphere design, I overheard a conversation about connection issues with our new Exchange 2010 environment. I didn’t think much about it until my boss came to my desk and asked me to have a look at the problem. Our messaging guy was on vacation and I was the only other person on staff with some messaging experience. It seemed that all of our global and even local offices were complaining about random Exchange disconnections as well as email delivery delays of anywhere from 30 minutes to 4 hours! ActiveSync devices and OWA users were not affected by these delays at all. Being always up for learning new stuff, I took the challenge.

First, let’s start with the quick facts I could put together. We had users in every country where we have offices complaining about the random disconnections and delays. I had one actually confirmed in China, but had some slight trouble getting exact user names from the local IT person. We also had connections randomly dropping and showing as disconnected in the lower right-hand corner of the Outlook client, though I did not have any confirmation of exactly who was having these problems. To start, I dug through the event logs on all the servers in the Exchange 2010 environment, and the number of errors I found was overwhelming. To keep this from turning into a novel: almost all of the errors I investigated were directly related to running Exchange 2010 SP1 without any update rollups in place, and there were corresponding KB articles from Microsoft confirming fixes in various update rollups.

An Event ID 2915 on our CAS servers stuck out: several EWS and RPC connections were reporting “Session Limit Over Budget”. I correlated this with the Default Throttling Policy Exchange 2010 uses. It seems that the more mailboxes a user opens, the more connections Exchange creates, and it doesn’t truncate these connections. To understand more about the Default Throttling Policy, see Understanding Client Throttling Policies. So I quickly whipped up a PowerShell script that set the Throttling Policy defaults to null so there were no restrictions (funnily enough, Microsoft states doing just this as a workaround if you encounter the issue).
If you are interested in seeing this script, or want me to go deeper into Throttling Policies, contact me; this article isn’t really about that, so I will move on.
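
For illustration only, here is a minimal sketch of the kind of change involved, run from the Exchange Management Shell. It only lifts the RPC Client Access and EWS concurrency limits, which are the ones matching the connections in the 2915 events; the actual script touched more parameters than this:

    # Grab the default throttling policy (Exchange 2010)
    $policy = Get-ThrottlingPolicy | Where-Object { $_.IsDefault -eq $true }

    # Setting a limit to $null removes the restriction for that parameter
    $policy | Set-ThrottlingPolicy -RCAMaxConcurrency $null -EWSMaxConcurrency $null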

After the Throttling Policy was changed, the reported disconnections stopped, but the delivery delays continued, as mentioned, all around the globe. With the other problem out of the way, I began to realize that this problem seemed very random: some users experienced it, some did not, and some couldn’t tell me whether they experienced it or not. This is when the hours of fruitlessly digging through configurations to learn them and reading about Exchange 2010 on Google began. I noticed our mailbox servers were set up in an active-active configuration with bidirectional replication using DAGs.

This is when I decided to go back to the basics of troubleshooting. I went over to the colleague sitting next to me and sent him various test messages. All of them were promptly delivered without any problems. I noted down which server his mailbox was running on and moved on. Then I walked around the IT department until I found a colleague who confirmed they had delivery delays of up to 4 hours. Just for kicks, I turned off cached mode on their Outlook client and the problem magically vanished. Then I turned cached mode back on and left it broken, since I was determined to fix it on the server side and not just band-aid the problem. When I went back to my desk and noted which mailbox server the delayed colleague’s mailbox was running on, a light bulb went off and everything seemed to be coming together. Now all I had to do was note the differences between the two servers.

First of all, to stop the global issue while I worked out the root cause, I failed all the DAG databases over to the one server that did not seem to be having the problem. Reports quickly came in that the problem was resolved. Then I moved on to examining the differences between the two servers. After comparing Windows updates between the boxes, I noticed that some updates from February had recently been applied to both servers; however, there was one difference: Microsoft KB2393802 was applied to one server but not the other. I googled it, but only found one vague mention of delays in Exchange 2010 mail delivery in the middle of a TechNet article relating to this patch, nothing official at all from Microsoft. I removed the patch, rebooted, and tested with a test mailbox database I had created for this purpose, running on that server. The problem was fixed, just as I thought.
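
If you want to make the same comparison in your own environment, a quick sketch like this will tell you which servers have the update installed; the server names below are placeholders:

    # Check for KB2393802 on a couple of mailbox servers (placeholder names)
    'MBX01', 'MBX02' | ForEach-Object {
        $hotfix = Get-HotFix -Id 'KB2393802' -ComputerName $_ -ErrorAction SilentlyContinue
        '{0}: {1}' -f $_, $(if ($hotfix) { 'installed' } else { 'not installed' })
    }

    # Removing the update (followed by a reboot) can typically be done with:
    # wusa.exe /uninstall /kb:2393802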

I tried to research what it is about this patch that could be causing the problem, but came up with nothing. If any of you readers have an idea, please comment and let me know your thoughts! I have attempted to contact Microsoft regarding this issue so they could possibly append to the KB article, but they have not yet replied.