I’ve talked with several colleagues in the virtualization arena, and one of the things they all say is “VDI is tough, it’s always changing, there is nothing harder than virtualizing desktops!” I have learned this lesson the hard way. Two years ago our company deployed VMware’s VDI solution, View (now Horizon View), as a proof of concept (POC) to a group of test users. These users ranged from task workers to advanced users running CPU- and graphics-intensive applications. That test group was roughly 10 people. Six months later we deployed VDI in waves to various departments and grew to over 50 users.
Now before I go any further, I want to give you some background on the equipment we used to deploy the POC:
- Dell PowerEdge R620 – Intel Xeon E5-2690 2.9 GHz, 128GB RAM, (6) 1GbE NICs
- HP ProCurve 5412zl L2/L3 Switch
- Dual Dell PowerConnect 24 Port Gigabit Managed Switches (SAN Network)
- Dell EqualLogic PS6100 (48TB Raw) – Total IOPS – 1300
The POC had been deployed before I joined the company, and at the time the VDI experience was very good. But as we continued into production, we started seeing performance hits at random times. I started in April of 2012 and was working in another area of IT, but I was quickly drawn in by the allure of VDI and everything VMware. So in my spare time I started researching VDI performance issues. I learned about PCoIP offloading, CPU and RAM issues, sizing Gold Images properly, etc. I threw out everything I knew and started over with new Gold Images, and we had the same performance issues. This all happened over 15 months.
The problem was right in front of us…
Then it occurred to me (read: Google, forums, talking w/ vExperts) that storage was our issue. I started reading everything about Total and Peak IOPS and how they relate to VDI. I started scoring our various Gold Images and discovered that some of our images had Peak IOPS of over 150! Do the math: the EqualLogic that we were running had a peak of 1300 IOPS, and at this point we had over 180 users. 180 users x 25 IOPS (average) = 4500 IOPS, roughly three and a half times what the array could deliver! Houston, we have a problem.
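For anyone who wants to run the same back-of-the-napkin math on their own environment, here is a minimal sketch in Python. The numbers are the ones quoted above. The function and variable names are my own for illustration, and a flat per-user average is a simplification (it ignores login storms and the 150+ IOPS peaks some images hit).

```python
def required_iops(users: int, avg_iops_per_user: float) -> float:
    """Steady-state IOPS demand, assuming a flat average per desktop."""
    return users * avg_iops_per_user


# Figures quoted in the article
ARRAY_IOPS = 1300        # what the EqualLogic array could deliver
USERS = 180              # VDI users at the time
AVG_IOPS_PER_USER = 25   # average per desktop

demand = required_iops(USERS, AVG_IOPS_PER_USER)
oversubscription = demand / ARRAY_IOPS
# How many users the array could carry at the same average
max_users = ARRAY_IOPS // AVG_IOPS_PER_USER

print(f"Demand: {demand} IOPS")
print(f"Oversubscription: {oversubscription:.2f}x")
print(f"Users the array can carry at this average: {max_users}")
```

At a 25 IOPS average, the array's ceiling works out to about 52 desktops, well short of 180.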
So what did we do? It’s simple but not easy! We realized that as we grew our VDI environment, we had improved everything except storage. We upgraded to bigger, more powerful hosts, improved our core switch architecture, expanded to larger SAN switches, and upgraded our power and environmental systems. We did every upgrade except storage. This is not a slight towards our team or myself; we just didn’t have the knowledge and experience to truly understand what we were dealing with in VDI. Getting back to the solution (that is the title of this article, right?): we started meeting with various vendors and sizing their solutions. In the meantime I got the idea to buy a Synology NAS, load it up with some SSDs, and give ourselves a fairly inexpensive band-aid until we could properly implement a permanent storage solution.
In the left corner….Synology DS3612xs
So let’s talk about the Synology DS3612xs because this thing is a beast! I chose this model specifically because of the 12 bay capacity and its ease of transition into our test lab environment (I’m begging my boss to buy it for my Home Lab!) The specs for this thing are really impressive:
- 12 Drive Bays (Expandable to 36 with Add On Chassis)
- Intel Core i3 CPU
- 8GB RAM
- (4) 1GbE NICs
- Available PCIe bay (did someone say 10GbE?)
- vSphere 5 support with VAAI
- SSD TRIM Support
- Synology Awesomesauce DSM operating system
In the right corner….Intel 520 Series SSD and 10Gb Fiber
I went with Intel 520 Series 480GB solid state drives because of their reliability, cost and total IOPS count (42,000 read/50,000 write). I have heard horror stories about running SSDs over 1GbE because of the peak IOPS bursts, so I wanted a nice big pipe to our SAN network and went with an Intel SFP card that supports 10Gb fiber. It fit perfectly into our SAN switches, and I was excited to get everything put together!
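To put a rough number on the "big pipe" concern, here is a small Python sketch comparing the IOPS a single link can carry against one SSD's quoted read figure. The 4 KiB I/O size is my assumption (the article does not state one), the helper name is hypothetical, and the math ignores iSCSI/TCP overhead, so treat the results as optimistic ceilings rather than measurements.

```python
def link_iops_ceiling(link_gbps: float, io_size_bytes: int) -> float:
    """Upper bound on IOPS a link can carry, ignoring protocol overhead."""
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits/s -> bytes/s
    return bytes_per_second / io_size_bytes


IO_SIZE_BYTES = 4096     # ASSUMPTION: 4 KiB random I/O
SSD_READ_IOPS = 42_000   # per-drive read figure quoted above

one_gig = link_iops_ceiling(1, IO_SIZE_BYTES)
ten_gig = link_iops_ceiling(10, IO_SIZE_BYTES)

print(f"1Gb link ceiling:  {one_gig:,.0f} IOPS")
print(f"10Gb link ceiling: {ten_gig:,.0f} IOPS")
print(f"One SSD can outrun a 1Gb link:  {SSD_READ_IOPS > one_gig}")
print(f"One SSD can outrun a 10Gb link: {SSD_READ_IOPS > ten_gig}")
```

Under these assumptions a single drive can, on paper, push more small reads than one 1Gb link can carry, while a 10Gb link leaves plenty of headroom.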
Did it fix the IOPS issue?
Yes it has! But that was the intention all along. We took the time, did the research and assembled a reasonable budget and solution that could solve an immediate crisis for our end users. Is it a permanent solution? Absolutely not! But we have seen an immediate performance improvement across the board, from recomposes and pool creation to end user UI responsiveness. It has been really nice to finally not just know about the problem but understand it.
The next steps?
Now that we have our band-aid, we can focus on our permanent storage solution. I am really excited to start working with various vendors and stand up some POCs to see how the various solutions work with our systems and processes. Until then I get a lot of joy every morning watching the performance metrics as login storms go smoothly. We can clone VMs in seconds as opposed to 90 minutes! I will update this article as I can with some specific performance charts. But for now I am getting ready for our next set of problems after storage….virtualizing graphics. But isn’t that why we are doing this, to learn, understand, solve problems and make things better? I know I am!