Is it time to say goodbye to Jetstress?

The short answer?  “Yes.”  The long answer?  “Yyyyyesssss.”

But first let me get this out of the way:  If you want to run Jetstress against any storage configuration I come up with, feel free.  I wouldn’t put it forward if I weren’t confident it could handle the workload. 

Prior to 2007, Jetstress REALLY mattered.  You had 500MB mailboxes that could easily generate 2-5 IO/s per mailbox.  Cached clients were rare, so storage latency was the primary driver of customer complaints.  Over the last ten years, Microsoft has put a lot of effort into making Exchange a much more storage-friendly application, and they’ve succeeded.  Today you have 0.1 IO/s per mailbox, and it’s spread over 2-5 GB.  Exchange is now Just Another Workload.  So why are we spending all this time and money (not to mention implementation delays) using an unwieldy purpose-built testing tool for something that’s Just Another Workload?

With Exchange 2010 and its very modest IO profile, I question the value of Jetstress as opposed to other testing tools.  The level of effort and sheer amount of time required to create the databases, replicate them, and then run the test are significant.  It can run to weeks for a reasonably sized deployment.  Yes, you get assurance that your storage rig is operating properly, but you can get that assurance from tools like Iometer, which can take seconds to set up, and mere hours to complete.

For all the effort and time involved in a Jetstress run, I just expect more.  I’d expect that my entire infrastructure would be validated.  I’d expect assurance that I have enough RAM and CPU in my virtual machines, that my network is up to snuff, access to my domain controllers and global catalog servers is sufficient… but I don’t get any of that with Jetstress.

If I’m going to put the kind of time and effort into my testing that Jetstress requires, I’m going to fire up an entire infrastructure and use Loadgen to verify my entire configuration, not just my storage.  On the other hand, if I’m going to test my storage independently from my server and network, I would:

Roll my own Exchange IO test with Iometer in 30 minutes

  • Set up your storage on your production mailbox server
  • Determine the file sizes for your database and logs.  You can find them on the LUN Requirements tab of the Exchange Mailbox Storage Calculator

  • Using fsutil (a built-in Windows command-line tool), create files called iobw.tst sized according to the DB Size + Overhead and Log Size + Overhead values.  For our example, we’re looking at 1595 GB database files and 34 GB log files.  This part is not strictly necessary, but I like it.  Creating a file called iobw.tst in the root directory of the target will prevent Iometer from creating thick files that occupy the entire LUN.  Note that fsutil takes the file length in bytes.
    • fsutil file createnew e:\iobw.tst 1712618209280 <-- simulated 1.5 TB database file (1595 GB in bytes)
    • fsutil file createnew f:\iobw.tst 36507222016 <-- simulated 34 GB log file (34 GB in bytes)
  • Download Iometer and the Exchange 2010 .icf file I’ve created.  Launch Iometer and open the .icf file.
    • If you’re using mount points instead of drive letters, download the latest Iometer release candidate for mount point support
  • Determine the target IO throughput for the databases and logs.  This can be determined from the Role requirements tab of the Exchange 2010 Storage calculator
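
Because fsutil expects the file length in bytes, it is easy to slip by a factor of 1024 when converting the calculator's GB figures. The sketch below is one way to do the conversion. It assumes the calculator's "GB" means binary gigabytes (1024^3 bytes), and the helper name `fsutil_createnew` is hypothetical, introduced only for illustration.

```python
GIB = 1024 ** 3  # assumption: the calculator's "GB" is a binary gigabyte


def fsutil_createnew(path: str, size_gb: int) -> str:
    """Build an 'fsutil file createnew' command line for a file of size_gb GB."""
    return f"fsutil file createnew {path} {size_gb * GIB}"


# Sizes from the example in this post: 1595 GB database, 34 GB log.
print(fsutil_createnew(r"e:\iobw.tst", 1595))
print(fsutil_createnew(r"f:\iobw.tst", 34))
```

Paste the printed commands into an elevated command prompt on the mailbox server.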

  • Modify the transfer delay (in milliseconds) in the “Exchange 2010 DB Workload” Global Access Specification so it will generate the desired number of IOs. The math is: 1000 ÷ target IO/s. Our example requires 30 IO/s per database, and 1000 ÷ 30 = 33.3, so we’ll set it to 33.  The original in the .icf file is 25, which would generate 40 IO/s.

  • Modify the transfer delay in the “Exchange 2010 Log Workload” Global Access Specification so it will generate the desired number of IOs.  Our example requires 7 IO/s per log LUN, and 1000 ÷ 7 = 142.9, so we’ll set it to 143.  The original in the .icf file is 100, which would generate about 10 IO/s.
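
The transfer delay arithmetic for both workloads can be sketched in a few lines. This mirrors the 1000 ÷ target IO/s rule of thumb used in this post and, as an assumption, ignores the service time of each IO, so the real rate will land slightly below the nominal figure. The function names are hypothetical.

```python
def transfer_delay_ms(target_iops: float) -> int:
    """Transfer delay in whole milliseconds for a target IO rate."""
    return round(1000 / target_iops)


def nominal_iops(delay_ms: float) -> float:
    """Nominal IO rate produced by a given transfer delay."""
    return 1000 / delay_ms


# Targets from the example: 30 IO/s per database, 7 IO/s per log LUN.
for name, target in [("DB", 30), ("Log", 7)]:
    delay = transfer_delay_ms(target)
    print(f"{name}: set transfer delay to {delay} ms (~{nominal_iops(delay):.1f} IO/s)")
```

The quick 5-10 minute sanity run is where you confirm that the rate you actually get matches what this arithmetic predicts.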

  • Assign the DB Worker and BDM Worker to the database LUNs
  • Assign the Log Worker to the Log LUNs
  • Click the Green Flag and start.  Let it run for 5-10 minutes for a quick sanity check, and stop it.  Make sure it’s driving the IO you want at the latencies you expect, and you’re not gated by CPU or anything like that.
  • Start a perfmon data collection (perfcollect is good for this).
  • Modify the run time on the test tab to however long you’d like.  I recommend at least a few hours.

  • Take a nap
  • Go for a run
  • Eat some food
  • Watch some TV
  • When the test completes, open up your perfmon file and look at your disk latencies.  Make sure they’re steady, with no spikes and no aberrations in the number of IO/s
    • If you’re an EMC customer and use perfcollect, zip up the perfcollect data collection and send it to your TC, or reseller TC, and ask for a WPA (miTrend) report on the server(s).  You’ll get a nicely formatted report with graphs and tables and twenty-seven 8×10 color glossy pictures with circles and arrows and a paragraph on the back of each one
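
If you would rather eyeball the numbers yourself, a rough latency check could look like the sketch below. It assumes you have already pulled the latency samples out of the perfmon log (for example from a counter such as Avg. Disk sec/Read) and converted them to milliseconds. The 50 ms spike threshold and the sample values are purely illustrative, not a recommendation.

```python
from statistics import mean


def summarize_latency(samples_ms, spike_threshold_ms=50.0):
    """Return the average, the maximum, and the indices of samples above the threshold."""
    spikes = [i for i, value in enumerate(samples_ms) if value > spike_threshold_ms]
    return {
        "avg_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "spikes": spikes,
    }


# Made-up samples: steady latency with a single spike at index 3.
samples = [8.0, 9.5, 7.2, 61.0, 8.8]
report = summarize_latency(samples)
print(report)
```

An empty spike list and an average close to the maximum are what "steady" looks like in this simple view.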

Using this method, you can easily get in and out of testing mode within 36 total hours, and your own time will be less than an hour of setup and analysis.  That translates into weeks of time that your users can spend enjoying your cool new messaging infrastructure.