Running a VSAN PoC – Customer reactions

I recently set up a VMware Virtual SAN 6.1 Proof-of-Concept for a customer, configuring a 3-node cluster based on the following setup:

Hardware:

  • HP ProLiant DL380 Gen9
  • 2 x Intel Xeon E5-2680 v3 @ 2.50GHz w/ 12 cores
  • 392 GB RAM
  • 1 x Intel DC P3700 800GB NVMe
  • 6 x Intel DC S3610 1.2TB SSD
  • HP FlexFabric 556FLR-SFP+ 10GbE NICs

Virtual SAN Setup:

Since this was a simple PoC setup, VSAN was configured with one disk group per host, with all six Intel DC S3610 drives used as the capacity tier and the Intel DC P3700 NVMe card as the cache tier. This gives a total of 21.61TB of raw VSAN datastore capacity across the cluster. With the Failures-To-Tolerate=1 policy (the only real FTT policy available in a three-node 6.1 cluster), that works out to 10.8TB of usable space.
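For reference, the raw-to-usable math is simple enough to sketch in a few lines of Python (the drive count and 1.2TB size are taken from the parts list above; the small difference from the reported 21.61TB is just drive-size rounding):

```python
# Back-of-the-envelope VSAN capacity math for this 3-node PoC cluster.
HOSTS = 3
DRIVES_PER_HOST = 6    # Intel DC S3610 capacity drives per disk group
DRIVE_TB = 1.2         # per-drive capacity in TB
FTT = 1                # Failures-To-Tolerate, via mirroring

raw_tb = HOSTS * DRIVES_PER_HOST * DRIVE_TB
# FTT=1 mirroring keeps FTT + 1 = 2 full copies of every object,
# so usable capacity is the raw pool divided by the replica count.
usable_tb = raw_tb / (FTT + 1)

print(f"Raw capacity:    {raw_tb:.1f} TB")    # -> 21.6 TB
print(f"Usable at FTT=1: {usable_tb:.1f} TB")  # -> 10.8 TB
```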

vMotion and VSAN traffic were set up to run on separate VLANs over 2 x 10GbE interfaces, connected to a Cisco backend.

Customer reaction:

After the customer had been running it in test for a couple of weeks, I got a single-line email from them simply stating: “WOW!”

They were so impressed with the performance (those NVMe cards are FAST!) and the manageability of the setup that they have now decided to order three additional hosts, bringing the cluster up to a more reasonable six hosts in a metro-cluster setup, and to upgrade to VSAN 6.2 as soon as it’s available. The compression, deduplication, and erasure coding features of 6.2 will increase their available capacity just by upgrading. At the same time, adding three new hosts effectively doubles the available physical disk space, even before the 6.2 improvements kick in.
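To put that capacity gain in rough numbers, here is a sketch of the space overhead of FTT=1 mirroring versus the RAID-5 erasure coding that 6.2 introduces for all-flash clusters. Assumptions: the expanded 6-host pool built from the same 1.2TB drives, the standard 3+1 data-plus-parity layout behind RAID-5’s 1.33x overhead, and dedup/compression left out since those ratios are entirely workload-dependent:

```python
# Effective capacity from the same raw pool under two FTT=1 schemes.
raw_tb = 6 * 6 * 1.2  # 6 hosts x 6 drives x 1.2 TB = 43.2 TB raw

# FTT=1 mirroring stores 2 full copies of every object.
mirrored_tb = raw_tb / 2.0
# FTT=1 RAID-5 erasure coding (VSAN 6.2, all-flash, >= 4 hosts)
# stores 3 data + 1 parity component: 1.33x overhead instead of 2x.
raid5_tb = raw_tb / (4 / 3)

print(f"Mirroring (2.00x): {mirrored_tb:.1f} TB usable")  # -> 21.6 TB
print(f"RAID-5 EC (1.33x): {raid5_tb:.1f} TB usable")     # -> 32.4 TB
```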

VSAN will be this customer’s preferred storage platform going forward, and they can finally move off their existing monolithic, and expensive, FC SAN to a storage solution that outperforms it and greatly reduces complexity.

Comments

    1. Well, I didn’t want to include any tests in the post itself, because that usually ends up in a discussion about how IOPS should be measured, which block sizes to use, and so on, but I can say the following:

      With the current 3-node setup, using IO Analyzer with the standard 4k block size, 50% read / 50% write, 100% random test, we get about 50,000 IOPS out of the solution as it stands now.
      With a MAX IOPS test (0.5k block size / 100% read / 0% random) we get 160,000 IOPS across the three nodes.

      Take these numbers as-is; they’re just a pointer to what we can expect from it.
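      For anyone who wants to translate those IOPS figures into raw bandwidth, the arithmetic is simple; here’s a quick sketch using only the block sizes and IOPS numbers from the two tests above:

      ```python
      # Convert reported IOPS into approximate throughput.
      def throughput_mb_s(iops: int, block_kb: float) -> float:
          """Throughput in MB/s for a given IOPS rate and block size."""
          return iops * block_kb / 1024

      # 4k block, 50/50 read/write, 100% random test: ~50,000 IOPS
      print(f"{throughput_mb_s(50_000, 4):.0f} MB/s")     # -> ~195 MB/s
      # 0.5k block, 100% sequential read MAX IOPS test: ~160,000 IOPS
      print(f"{throughput_mb_s(160_000, 0.5):.0f} MB/s")  # -> ~78 MB/s
      ```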

      Another interesting tidbit, a Windows Server 2012 R2 (with nothing else installed) boots in about 2 seconds, cloning it takes approximately 30 secs from the clone operation starts until it’s powered on and ready to log on.

  1. Hi Christian,

    Thanks for the numbers.
    You are pretty much using the exact same metrics I use for the iSCSI SANs here, so they’re perfect for me to gauge against.
    I know IOPS isn’t the be-all and end-all for a VSAN, but it’s a good indicator/presentation starting point when showing a PoC to management, to get them to agree that yes, this solution is worth the investment.
    The iSCSI SANs I have are going end-of-life; at best I squeeze 14,000 IOPS out of them using pretty much the same 4k block metrics. Not bad for old kit, but not in the same ballpark as those VSAN numbers.
    Nice tidbit on 2012 R2 as well, as that’s the default OS I have in the cluster for all VMs.
    I am also thinking that moving the SQL servers to this would greatly help with the stun issue I currently have with Veeam backups, where the lost ping kicks off an unwanted SQL cluster failover.

    1. 14k IOPS out of iSCSI doesn’t sound bad at all, but this all-flash VSAN setup is an entirely different beast. As I said, I didn’t really want to make metrics a part of this post, due to the inevitable vendor pissing match that would ensue, and nobody’s got time for that stuff.

      Are you experiencing stun problems with vSphere 6, or is it 5.x?

      1. Yep, fully agree, let’s not get into any vendor pissing stuff; let’s all start from the same page: VSAN is pretty damn impressive. I need my CFO to sign off on the SSDs so I can get the PoC up and running.

        Still on 5.5 here, but we will be upgrading to 6.x in the next couple of weeks, so hopefully the new consolidation process that 6 uses will make the stun issue go away.

    1. That’s a good question. I’ll check the next time I’m at the customer site (next week) and see where that number came from. I might have fat-fingered it, but let me double-check to make sure.
