WARNING: GEEK BLOG POST
When you talk to a storage vendor about asynchronous block replication, your first two questions should be:
- Do you preserve write order fidelity within a single LUN?
- Can you preserve write order fidelity between multiple LUNs?
Consistency Group (CG) technology is cool. When you put all your databases and associated logs in a CG, you can replicate asynchronously and still have your database come up at the DR site every time. When you don’t have it, you need to enforce consistency another way: quiescing the database into a state in which it can be backed up while still mounted. With SQL, this would mean using a VDI or VSS requestor to enter that state, taking a snapshot with a hardware provider, and finally replicating that snapshot.
It’s not that snap and replicate is a bad thing – people have been doing it for years. But it does limit your achievable recovery point objective to roughly twice the interval at which you can comfortably quiesce and replicate your database. It also limits your achievable recovery time objective, because extra steps are often needed to recover your database.
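To see where that “roughly twice the interval” comes from, here’s a back-of-the-envelope sketch. The cycle length is a made-up number for illustration; the point is that a failure striking just before the newest snapshot finishes replicating can cost you almost two full cycles of data.

```python
# Sketch: worst-case data loss (RPO) with snap-and-replicate.
# Assume one quiesce + snapshot + replicate cycle every CYCLE_MIN minutes
# (hypothetical value), and a failure strikes just before the newest
# snapshot finishes replicating to the DR site.
CYCLE_MIN = 30

# The last fully replicated snapshot is one cycle old, and up to one more
# cycle of new writes has accumulated at production since it was taken.
worst_case_rpo_min = 2 * CYCLE_MIN

print(worst_case_rpo_min)  # 60
```

Tighten the cycle and the worst-case RPO shrinks with it – which is exactly why the comfortable quiesce interval becomes the limiting factor.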
This is all tribal knowledge amongst storage and database folks. But people often don’t know why, either because storage and database administrators are mortal enemies or because they speak different languages.
So here’s why:
Let’s start with a concept known as “Write Order Fidelity” (WOF). When applied to asynchronous remote replication technology, this means that the writes at the disaster recovery (DR) site are applied in the same order as they were applied at production site.
Async replication without WOF
In the instance above, when you try to attach that database, it will appear wholly inconsistent and may not attach. Worse, it could attach successfully and present a corrupt database.
WOF preservation looks like this:
Async replication with WOF
In this case, you’re replicating asynchronously, but the writes are applied to the DR site in the same order they were applied at the production site. So at any given time, the data at the DR site looks as if the server had simply stopped working at the production site. There’s data loss, and transactions may need to be rolled back, but that’s an automatic, normal operation with a database like SQL, the JET database backing Exchange, or Oracle. In fact, that’s what SQL does every time there’s an unplanned cluster failover.
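The difference between the two animations boils down to what the DR site holds when replication dies mid-stream. Here’s a tiny sketch (the write records and sequence numbers are illustrative, not any vendor’s API): with WOF, the replica always holds a prefix of the production write history; without it, the replica can hold an arbitrary subset.

```python
# Production issues three writes in sequence-number order. A "write" here
# is just (sequence_number, payload) for illustration.
writes = [(1, "log: BEGIN T1"), (2, "db: page update"), (3, "log: COMMIT T1")]

def replica_after_crash(writes, delivered_seqs):
    """What the DR site holds when replication stops mid-stream."""
    return [w for w in writes if w[0] in delivered_seqs]

# WOF preserved: whatever arrived is a prefix of production history, so the
# DR copy looks like production simply stopped at some instant.
with_wof = replica_after_crash(writes, {1, 2})

# No WOF: write 3 (the commit record) arrived before write 2 (the data it
# commits) -- the replica claims T1 committed but lacks its data.
without_wof = replica_after_crash(writes, {1, 3})

print(with_wof)     # [(1, 'log: BEGIN T1'), (2, 'db: page update')]
print(without_wof)  # [(1, 'log: BEGIN T1'), (3, 'log: COMMIT T1')]
```

The second replica is the “attaches a corrupt database successfully” case: nothing looks structurally wrong, but a commit exists without the data behind it.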
But why don’t we need WOF with synchronous replication? That’s an interesting question. First, WOF is implied with true synchronous replication. Second, true synchronous replication actually writes to the DR site before writing to the production site:
Sync replication – WOF is always enforced
In this case, the DR site is always in complete synchronicity with the production site: writes must be acknowledged at the DR site before being considered “applied” at the production site. Of course, this presents the optimal situation – replicated data with no potential for data loss. However, it comes at a cost: any network latency you have is added to the storage latency. So in effect, the distance you can replicate is limited by the storage latency your application can tolerate. For those of you keeping notes, you generally want to keep the write latencies to your transaction logs under 10 ms, which makes for a pretty limited distance.
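For a feel of just how limited, here’s a rough sketch of the arithmetic. The numbers are illustrative assumptions: light in fiber travels at roughly 200,000 km/s, so each 200 km of one-way fiber adds about 1 ms each way, and I’m assuming 2 ms of local storage latency.

```python
# Back-of-the-envelope: effective write latency for synchronous replication.
# Assumptions (illustrative): ~200 km of fiber per millisecond each way,
# 2 ms of local storage latency, and ignoring switch/array processing time.
def sync_write_latency_ms(distance_km, local_storage_ms=2.0):
    rtt_ms = 2 * distance_km / 200.0  # round trip over the replication link
    return local_storage_ms + rtt_ms

# With a 10 ms budget for transaction-log writes and ~2 ms local latency,
# the network can add at most ~8 ms RTT -- roughly 800 km one way, best case.
print(sync_write_latency_ms(800))  # 10.0
print(sync_write_latency_ms(0))    # 2.0
```

Real links route indirectly and add equipment latency, so practical distances come in well under this speed-of-light ceiling.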
So that’s the reason behind the first question you should ask the storage vendor. But what’s up with write order fidelity among multiple LUNs?
It turns out that most people will follow their database vendor’s advice and put their database and transaction logs on separate LUNs. It’s sorta outside the scope of this post, but in general it’s to ensure recoverability in the event of a lost LUN. It’s also for performance purposes – your transaction log is sensitive only to write latency and is always sequential in nature, whereas your database is more sensitive to read latency and can be random, sequential, or anything in between.
The function of preserving write order fidelity across multiple LUNs is generally performed by a “Consistency Group” (CG) in EMC parlance. Usually other vendors will use that term – I don’t believe it’s trademarked. CG technology is integrated into RecoverPoint, SRDF/A and even MirrorView/A. Remember, it’s not needed with any true synchronous technology. But most people have asynchronous replication requirements.
And Consistency Groups are really, really important for databases
This has to do with the ACID properties of databases that are in wide use today (if you want a brief but cool read on the history of the modern database, wander on over here). Specifically, it has to do with the atomicity part of the ACID properties. If part of a database transaction fails – no matter the reason – the entire transaction gets rolled back.
That’s one of the big reasons the transaction log even exists. Lots of storage people think the log is there only for rolling forward in the event of a failure. Not true. It can be used to roll back in the event that a transaction fails. In fact, storage failure is not the only reason a transaction fails. Go look at the ACID properties to see other reasons why a transaction might fail.
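A toy sketch of the roll-back side, since that’s the part storage folks often miss. This is a deliberately minimal stand-in for a real transaction log (the record layout and function names are mine, not any RDBMS’s): each log record keeps a before-image, and undoing a failed transaction means replaying those before-images in reverse.

```python
# Minimal illustration of a transaction log that supports rollback.
# Each record is (txn, key, before_image, after_image); a failed transaction
# is undone by applying its before-images in reverse order.
db = {"A": 100}
log = []

def write(txn, key, new_value):
    log.append((txn, key, db.get(key), new_value))  # log before touching db
    db[key] = new_value

def rollback(txn):
    for _, key, before, _ in reversed([r for r in log if r[0] == txn]):
        if before is None:
            db.pop(key, None)  # key didn't exist before this transaction
        else:
            db[key] = before

write("T1", "A", 50)
write("T1", "B", 7)
rollback("T1")   # T1 fails for any reason: undo via the log's before-images
print(db)        # {'A': 100}
```

If those log records are missing at the DR site, the RDBMS has no before-images to replay – which is exactly the failure scenario coming up next.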
So anyway, with atomicity in mind, consider the following scenario: You’re replicating asynchronously, and you’ve verified that your storage vendor honors write order fidelity within a single LUN. However, write order fidelity is not honored among multiple LUNs, and you’ve followed best practices in separating your databases and logs. A failure scenario might look like this:
Multiple LUNs without consistency group technology
In this case, the database is slightly “ahead” of the transaction log. The RDBMS (like SQL or Oracle) would say, “well I’ve got only part of a transaction here. No problem. I’ll roll it back. I’ll refer to my transaction log to see how I might achieve exactly that”.
Keep in mind I don’t write software for a living. I’m paraphrasing.
However, when it refers to the transaction log, it doesn’t see anything in there relevant to how it might go about rolling back the transaction. In my snazzy animation, it needs data from blocks six and nine to roll back the transaction. The RDBMS promptly gives up and goes for a latte, leaving you to restore from a backup.
Enter the consistency group. As I’ve mentioned, this technology enforces write order fidelity across multiple LUNs. So you can have your cake and eat it too.
Multiple LUNs with consistency group technology
In this case we see a failure happen in mid-transaction. Of course this can happen any time even without any sort of remote replication. However, if the database and transaction log are in the same consistency group, the transaction log will always have the data necessary to automatically roll back the transaction and begin processing.
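The two animations can be contrasted in a few lines of illustrative code (sequence numbers and payloads are made up). Without a CG, each LUN replicates independently and the db LUN can run “ahead” of the log LUN; with a CG, writes across both LUNs are applied at DR in one global sequence order, so the DR copy is always a prefix of production history.

```python
# One global write stream spanning two LUNs: (seq, lun, payload).
stream = [(1, "log", "BEGIN T1"),
          (2, "db",  "page 6 update"),
          (3, "log", "undo info for pages 6 and 9"),
          (4, "db",  "page 9 update")]

def dr_state(delivered_seqs):
    """Contents of each LUN at the DR site, given which writes arrived."""
    luns = {"log": [], "db": []}
    for seq, lun, payload in stream:
        if seq in delivered_seqs:
            luns[lun].append(payload)
    return luns

# No CG: LUNs replicate independently, so the db LUN can be ahead of the
# log LUN -- db writes landed whose undo info never arrived.
no_cg = dr_state({1, 2, 4})

# CG: writes are applied in global sequence order, so whatever arrived is a
# prefix -- every db write's undo info is already on the log LUN.
with_cg = dr_state({1, 2, 3})
```

In the `no_cg` state the RDBMS finds page updates with no matching undo records and gives up; in the `with_cg` state it rolls T1 back and begins processing, just like after an unplanned failover.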
That’s about all there is to it. When I call this “crash consistency,” the emphasis is on “consistency.” As long as all the data associated with the database (logs and DB file) is consistent, the RDBMS will be able to recover. It’s a normal, regular, everyday operation that happens whenever a fault is sensed within a SQL Server resource group. Emphasizing “crash,” as in “car wreck,” is misleading.
Lastly, it’s only a matter of time before someone at Pixar notices my awesome animations and calls me with some sort of really cool job offer. So I’m not sure how long I’ll be around here.