Please stop using SQLIOSim to model SQL workloads

Every few months I get a (sometimes panicked) call or email about SQLIOSim that goes like this: 

“We’re doing some testing of a new disk array.  SQL Server’s going to be one of the workloads, so we downloaded SQLIOSim and are running it, but the disk latency is terrible.  Can you help?”

Well, the disk latency is terrible because SQLIOSim is designed to test IO stability and functional correctness, not to model a particular workload.  SQLIOSim actually attempts to break the storage (create page corruption or stale reads) so as to expose configuration or hardware problems that can cause data corruption.  It doesn’t really try to create a workload that “looks” like, say, an OLTP or DW workload.

So you can use SQLIOSim to make sure your storage is healthy, but keep in mind:

  • It is a pass/fail test.  Either you have stale reads/corrupt pages or you don’t.
  • Performance data like queue length and latency are irrelevant during SQLIOSim runs.
  • You cannot compare the performance results of one SQLIOSim run to another.
  • Be careful about other workloads on the array while you’re running SQLIOSim.

For more discussion on SQLIOSim, there are a bunch of good links referenced here.

You might wonder how you actually see how a new storage environment is going to perform under a particular workload.  There are several options, and I’ll list them in terms of accuracy of the model.

Use SQL Profiler to capture and replay traces

This is far and away the most accurate way of seeing how a given storage configuration will perform under a production workload.  After all, it is precisely the same workload as production.  You can also use something like Benchmark Factory to modify the workload to anticipate growth and so forth.

Use Perfmon to gather data and Iometer to create the workload

This is somewhat less accurate than traces, but can be easier to set up.  You gather the information needed to create Iometer workloads from Perfmon counters and educated guesses.

  • Disk Reads/sec and Disk Writes/sec – use these to figure the r:w ratio
  • Data file sizes – use these to figure the working set
  • Avg. Disk Bytes/Read and Avg. Disk Bytes/Write – use these to figure the IO sizes
  • Random vs Sequential – there is no way to determine randomness of the workload from perfmon. Logs will always be 100% sequential, DB files will be a mix. Think about the workload a bit to determine the mix (100% random will usually be safe).
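As a rough illustration of turning those counters into Iometer settings, here is a minimal sketch.  The function name, the dictionary keys and the sample numbers are all made up for the example, and collapsing reads and writes into one weighted-average IO size is a simplification (you could just as well define separate read and write sizes).

```python
def iometer_profile(reads_per_sec, writes_per_sec,
                    avg_bytes_per_read, avg_bytes_per_write):
    """Estimate a read percentage and a blended IO size from Perfmon averages.

    All inputs are averages taken from a Perfmon capture; the names here
    are hypothetical and chosen only for this example.
    """
    total_iops = reads_per_sec + writes_per_sec
    if total_iops <= 0:
        raise ValueError("need a non-zero IO rate to build a profile")

    # r:w ratio, expressed as the read percentage.
    read_pct = 100.0 * reads_per_sec / total_iops

    # IO size weighted by how often each operation type occurs.
    avg_io_bytes = (reads_per_sec * avg_bytes_per_read +
                    writes_per_sec * avg_bytes_per_write) / total_iops

    return {
        "read_pct": read_pct,
        "write_pct": 100.0 - read_pct,
        "avg_io_kb": avg_io_bytes / 1024,
        "observed_iops": total_iops,
    }


# Hypothetical capture: 700 reads/sec at 8 KB, 300 writes/sec at 16 KB.
profile = iometer_profile(700, 300, 8192, 16384)
print(profile)
```

The random/sequential mix is deliberately left out, since it cannot be derived from these counters and has to be estimated separately.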

Aside from the educated guesses around randomness and thread counts, there are two other big issues with Iometer:

  • It goes as fast as it can.  There’s no way to configure Iometer to only issue a specific number of IOs/second.  So it will reflect the workload type, but not the workload rate.
  • It will not approximate the skew.  Most database environments will have certain tables or parts of the dataset that are more active than others.  In some cases, over 90% of the IO can be directed at less than 5% of the total data set (high skew).  Consider an order/inventory OLTP database that has decades of data in it: nearly all the IO will be directed at the latest data, so this is a high skew environment.

If you believe you have a high skew environment, you can set the file size to your estimate of the size of the active portion of the data, and create a fully random worker (but preferably use a trace instead of Iometer).
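To make the file-size workaround concrete, here is a small sketch that sizes the test file to the active part of the data set.  It assumes the tool wants the size in 512-byte sectors (check what your version of Iometer expects), and the function name and the 2 TB / 5% figures are purely illustrative.

```python
SECTOR_BYTES = 512  # assumption: test file size is specified in 512-byte sectors


def hot_region_sectors(total_data_gb, hot_fraction):
    """Return the size, in sectors, of the active slice of the data set.

    total_data_gb -- size of all data files, in GB
    hot_fraction  -- estimated share of the data that receives most of
                     the IO (0.05 means 5%); this is an educated guess
    """
    if not 0 < hot_fraction <= 1:
        raise ValueError("hot_fraction must be in the range (0, 1]")

    hot_bytes = round(total_data_gb * hot_fraction * 1024 ** 3)
    return hot_bytes // SECTOR_BYTES


# Hypothetical database: 2000 GB of data files, about 5% of it active.
sectors = hot_region_sectors(2000, 0.05)
print(sectors)
```

A fully random worker confined to a file of that size only approximates the active region; it says nothing about the occasional IO that lands on the cold data.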

Use the SQLIO Disk Subsystem Benchmark Tool

This is actually my least favorite method.  SQLIO is indeed extremely simple to use, but there is no way to make it look like a production workload.  You choose one workload type: all reads, or all writes.  One IO size.  Random or sequential.  Although it can expose some of the limits of a disk configuration, there is no way to make it look like a “real” SQL workload.

At the end of the day, the only workload generator that will create a workload that looks like yours is SQL Profiler and traces.  And it’s more important than ever to capture and recreate the workload accurately, including features such as skew, since they directly impact the effectiveness of modern storage techniques such as extended cache and storage auto-tiering.


Update:  There is a way to limit the IO rates with Iometer, and I touch on how to do it in this blog post.  I'll probably do a video demo on this shortly.

New FRCP Amendments – Clarification or Adding Confusion

The preservation of electronically stored information (ESI) is one of the biggest sources of confusion in eDiscovery.  This area of eDiscovery has been governed almost entirely by common law, as the Federal Rules of Civil Procedure (FRCP) do not explicitly address the many questions inherent in the duty to preserve, such as trigger, scope and duration.  It has also been argued that the FRCP give insufficient guidance regarding the imposition of sanctions for violations of this duty.  That’s why, just five short years after the rules were last amended, there has been a push by many in the legal community to amend the FRCP again.

Consider this complex and ambiguous definition of the duty to preserve from the Supreme Court of Texas:

A party must preserve “what it knows, or reasonably should know is relevant in the action, is reasonably calculated to lead to the discovery of admissible evidence, is reasonably likely to be requested during discovery, [or] is the subject of pending discovery sanctions.”  Trevino v. Ortega, 969 S.W.2d 950 (Tex. 1998).

During the federal rulemaking process, the Advisory Committee on Civil Rules holds conferences around the country for public comment. These conferences are for the committee to consider competing opinions on issues of concern that will need to be addressed by the rulemaking initiative. The dialogue began in earnest in May 2010 at the Duke Civil Litigation Conference which brought together more than 180 federal judges, practitioners, and academics to discuss issues of access, fairness, cost, and delay in the civil litigation process.  The movement to amend the rules gained steam at the Discovery Subcommittee meeting in Dallas in September and took up the bulk of the discussions at the Sedona Conference’s Annual Meeting in October of this year.

Even ahead of the meeting, the Sedona Conference issued a survey to its Electronic Document Retention and Production members seeking input on the difficulties surrounding the duty to preserve.  Ninety-five percent of respondents indicated that preservation issues have become increasingly significant in civil litigation over the past five years.  Additionally, respondents indicated that significant preservation issues arose in 91 percent of $1 million plus matters “sometimes,” “often” or “always,” and 80 percent of the respondents’ cases needed court intervention to resolve the preservation issue.

While it is clear that preservation has become a more complex and risky process, there is no consensus that this pain is directly due to the FRCP.  Amending the rules to specify what constitutes a trigger and to dictate an across-the-board scope and duration is fraught with issues and will only add complexity and ambiguity.  It also has strong potential to create more reasons for litigation over eDiscovery.

That’s why, if the FRCP are to be amended to ease some of the eDiscovery burdens litigants are facing, the amendment should be to Rule 37 and its associated sanctions.  One participant explained it best:

The problem isn’t in the triggers as much as it is in the execution.  That is, clients and lawyers are worried that their reasoned judgment will be second guessed and sanctioned by the court later. Protecting reasonable preservation will encourage a reasoned and documented preservation process that would provide a “safe harbor” from spoliation claims and sanctions to good-faith litigants.

The Subcommittee apparently agreed.  The Advisory Committee met on November 7-8 in Washington, D.C. to continue the preservation rule discussions and consider “the Subcommittee’s present thinking…that the rulemaking focus should be limited to sanctions regulation.”  If the full Committee also agrees, the Subcommittee will try between the November and March meetings to develop a specific proposal.

Stay tuned!

EU Juggernaut Germany Looks at Business-Related E-Mail Differently than from a Pure Privacy Perspective?

Tom Reding

Recently, the Higher Labor Court of Berlin-Brandenburg, Germany ruled that an employer has the right to access and review an employee’s work-related e-mail during his or her absence from work.

The ruling makes it very clear that an employee’s right to use the company’s e-mail system for private communications does not preclude the employer from reviewing the employee’s business-related e-mail.

The circumstances behind this ruling were as follows:

  • The plaintiff (employee) could not work due to a long-term illness.
  • The employer was unsuccessful in locating the employee to get her consent to access and read her business-related e-mails in order to respond to a customer’s request.
  • After several weeks, the employer circumvented the employee’s password, then read and printed the employee’s business-related e-mails.  (The employer did not read or print any e-mails labeled “private”.)

The plaintiff (employee) requested a court order prohibiting her employer from accessing her e-mail account during any future absences without her explicit consent, but was unsuccessful in obtaining such an order.

The Higher Labor Court rejected the plaintiff’s reasoning that, because she and all other company employees were permitted to use the company’s computer system for private e-mail, her employer should therefore be considered a so-called “provider of telecommunications services” and thus be required to observe the “secrecy of telecommunications” according to Germany’s Telecommunications Act (Telekommunikationsgesetz).

The Higher Labor Court said that allowing use of a company e-mail system for private communication is merely a side effect of the employment relationship and does not fall under the scope of the Telecommunications Act.

To this non-German outsider, it appears that an employer in Germany with a business need to read an employee’s e-mail may, under certain circumstances, be allowed to do so when it is unable to obtain the employee’s permission.  The ruling suggests that the EU privacy laws regarding e-mail and other electronic communications have their limits.

The author finds this most noteworthy.

Exchange 2010 BDM

A colleague (and good buddy) of mine recently published a blog post addressing the impact of Background Database Maintenance (BDM) in Exchange 2010.  This topic is often an area of confusion for people and is worth reposting.

Essentially, BDM can be scheduled to run continuously on active databases or at specific times, but it runs 24 x 7 on passive databases.  Also, the BDM process generates a 5MB/sec read workload per database, so environments with more databases need to ensure that the environment has been properly sized, including the network infrastructure between the servers and the storage.
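For a quick feel for what that adds up to, here is a back-of-the-envelope sketch.  It takes the 5MB/sec per database figure above at face value and assumes the worst case where every database on the server is running BDM at the same time; the function name and the 40-database example are hypothetical.

```python
BDM_READ_MB_PER_SEC = 5  # per-database BDM read rate quoted above


def bdm_read_load(database_count):
    """Return the aggregate BDM read load as (MB/sec, Mbit/sec).

    Assumes every database is being scanned concurrently, which is the
    worst case to size storage and network links against.
    """
    if database_count < 0:
        raise ValueError("database_count cannot be negative")

    mb_per_sec = database_count * BDM_READ_MB_PER_SEC
    mbit_per_sec = mb_per_sec * 8  # 8 bits per byte, ignoring protocol overhead
    return mb_per_sec, mbit_per_sec


# Hypothetical server hosting 40 database copies.
print(bdm_read_load(40))
```

This is only the maintenance traffic; user IO, replication and backups come on top of it.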

For the full post, check out Paul's Flipping Bits site.  For further reading, be sure to check out the EMC whitepaper Microsoft Exchange 2010: Storage Best Practices and Design Guidance for EMC Storage for more detailed sizing recommendations when using EMC storage.