SAN-based replication? No problem. Latency? Problem.

Disaster recovery is moving higher and higher up companies' "to do" lists. It's becoming increasingly apparent what it costs a business when it suffers downtime and/or loss of data. People are starting to think about the monetary cost to the business when services or applications are unavailable to internal staff and, more importantly, customers. And with the big push of server virtualization over the last few years, where is the application data, the file data and the application server itself sitting? On the SAN. So it makes sense to leverage that existing SAN infrastructure and use some form of SAN-based replication.

Bear in mind the SAN is no longer a luxury only the privileged enterprise has access to, and it is becoming ever more important to even small businesses. Not all of these organisations have access to biiiig dedicated links between sites, and if they do, the links are probably subject to significant contention. Unfortunately, TCP isn't the most efficient of protocols over distance.

So what do you do to make sure the DR solution you have in mind is feasible and realistic?

Firstly make sure you pick the right technology    

First port of call is sitting down with the customer and mapping out the availability requirements of their applications: things like the RPO/RTO requirements of the applications they have in use. A lot of the time the company may not have thought about this in much detail, so you can really add value here if you are a reseller. Ultimately it boils down to the following being considered for each service:

  • How much downtime can you afford on each given application before the business starts losing money (the RTO)?
  • How much data can you afford to lose in the event of a disaster before it does significant damage to the business (the RPO)?

 

If you can get them to apply a monetary figure to the above, it can help when positioning return on investment.    
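As a rough illustration of putting a figure on those two questions, here is a minimal sketch. The service names, hourly costs and RPO/RTO targets are all invented, and the linear cost model (cost per hour multiplied by hours) is an assumption rather than anything a particular customer would necessarily agree with.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    downtime_cost_per_hour: float   # assumed: money lost per hour the service is down
    data_loss_cost_per_hour: float  # assumed: cost of losing one hour's worth of data
    rto_hours: float                # target time to get the service back
    rpo_hours: float                # target maximum window of lost data


def disaster_exposure(svc: Service) -> float:
    """Worst-case cost of one disaster if the RTO and RPO targets are only just met."""
    return (svc.downtime_cost_per_hour * svc.rto_hours
            + svc.data_loss_cost_per_hour * svc.rpo_hours)


# Hypothetical figures, purely for illustration.
services = [
    Service("order-entry", 5000.0, 2000.0, rto_hours=4, rpo_hours=0.25),
    Service("file-shares", 300.0, 100.0, rto_hours=24, rpo_hours=12),
]

for svc in services:
    print(f"{svc.name}: worst-case exposure {disaster_exposure(svc):,.2f}")
```

Comparing that exposure per service against the cost of the replication solution is one way to frame the return on investment conversation.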

There are a few types of array-based replication out there. They normally come in 3 flavours: asynchronous, synchronous and journaling/CDP. Synchronous replication can be a bit risky for a lot of businesses, as application response time usually becomes dependent on writes being committed to disk on both production and DR storage. Response times therefore also depend on the round trip latency across the link between the 2 sites, and spindle count becomes very important on both sites. I often find that, aside from banks and large conglomerates, the main candidate for synchronous replication in the SMB space is actually universities. Why? Because universities often don't replicate over massive distances; they will have a campus DR setup where they replicate over a couple of hundred metres from building to building, so laying fibre isn't too costly. However, for the average SMB who wants to replicate to another town, synchronous replication usually isn't preferable, due to latency over distance and the cost of the large link required.
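To get a feel for why distance matters with synchronous replication, here is a back-of-the-envelope sketch. It assumes light travels roughly 200 km per millisecond in fibre, that each write costs a single round trip on the link, and that the local and remote commits are issued in parallel. Real links add switch, router and queuing delay on top, and some protocols need more than one round trip per write, so treat the numbers as a best case.

```python
FIBRE_KM_PER_MS = 200.0  # light in fibre covers roughly 200 km per millisecond


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay there and back, ignoring switches, routers and queuing."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def sync_write_ms(distance_km: float, local_write_ms: float, remote_write_ms: float) -> float:
    """Assumed model: the host is acknowledged only after both arrays have committed.

    The local commit and the remote commit (one round trip plus the remote
    disk write) run in parallel, so the slower of the two sets the response time.
    """
    return max(local_write_ms, round_trip_ms(distance_km) + remote_write_ms)


# Campus link, a nearby town, and a long-haul link (distances are illustrative).
for km in (0.2, 100, 1000):
    latency = sync_write_ms(km, local_write_ms=0.5, remote_write_ms=0.5)
    print(f"{km:>7} km: round trip {round_trip_ms(km):.3f} ms, write latency {latency:.3f} ms")
```

Over a couple of hundred metres the link adds next to nothing, which fits the campus DR case; over longer distances the round trip starts to dominate every write.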

MirrorView Asynchronous (EMC)

Asynchronous replication is typically what I see in the case of most small to medium size businesses. Why? Firstly, because application response times are no longer dependent on the round trip time to the DR site, as they are with synchronous replication. With asynchronous replication, a copy on first write mechanism is usually utilised to effectively ship snapshots over an IP link at specified intervals. Below is a diagram showing how EMC MirrorView/A does this:

    

EMC uses what's called a Delta Bitmap (a representation of the data blocks on the volume) to track what has been sent to the secondary array and what hasn't. This Delta Bitmap works in conjunction with reserve LUNs (the Delta Set) on the array to ensure that the data sent across to the secondary array remains consistent. The secondary also has reserve LUNs in place, so that if replication is interrupted or the link is lost, the secondary array can roll back to its original form and the data isn't compromised.
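The idea of a delta bitmap working with a reserve area can be sketched with a toy model. This is not EMC's implementation: the class, its method names and the in-memory "volumes" are all hypothetical, and the rollback reserve on the secondary side is not modelled. It only shows how tracking changed blocks, plus copy on first write, lets each update ship a consistent point-in-time set of changes.

```python
class AsyncMirror:
    """Toy asynchronous mirror: ships only changed blocks, as of the update start."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        self.tracking = [False] * num_blocks  # blocks changed since the last update began
        self.transfer = [False] * num_blocks  # blocks still to ship in the current update
        self.reserve = {}                     # old contents saved by copy on first write

    def write(self, block, data):
        # Copy on first write: if this block is still waiting to be shipped,
        # preserve its point-in-time contents before overwriting it.
        if self.transfer[block] and block not in self.reserve:
            self.reserve[block] = self.primary[block]
        self.primary[block] = data
        self.tracking[block] = True

    def start_update(self):
        """Freeze the set of changed blocks and start tracking new changes afresh."""
        self.transfer = self.tracking
        self.tracking = [False] * len(self.primary)
        self.reserve.clear()

    def ship_one(self):
        """Send the next pending block to the secondary; return its index or None."""
        for block, pending in enumerate(self.transfer):
            if pending:
                # Prefer the preserved copy so the secondary sees the frozen image.
                data = self.reserve.pop(block, self.primary[block])
                self.secondary[block] = data
                self.transfer[block] = False
                return block
        return None


mirror = AsyncMirror(4)
mirror.write(0, b"A")
mirror.write(2, b"C")
mirror.start_update()
mirror.write(2, b"C2")            # lands mid-update; the old b"C" goes to the reserve
while mirror.ship_one() is not None:
    pass
print(mirror.secondary)           # image as of start_update(), without b"C2"
```

The late write to block 2 is not lost: it stays flagged in the tracking bitmap and goes across on the next update.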

Also, you can use higher capacity, less expensive disks on the DR site without affecting response times in production (although application response times will still be affected in the event of a failover, as servers will be accessing disk on the DR box). One potential drawback with asynchronous replication is that, as both SANs are no longer in a synchronous state, you have to decide whether it is important that your remote copies of data are in an application-consistent state. If it is, then you'll have to look at a technology which sits in the host and talks to both the application and the storage. In the EMC world we have a tool called Replication Manager which does all the various required bits on the host side (calling VSS/hot backup mode, flushing host buffers, etc.).

Replication Manager is licensed per application server (or virtual server in a cluster) and also requires an agent per mount host, plus a server licence (or 2, depending on the scenario). There is a lot more to Replication Manager, but that's a whole post in itself.

EMC RecoverPoint    

RecoverPoint is another replication technology from EMC, which allows very granular restore points and small RPOs over IP because it employs journaling rather than copy on first write. It stubs and timestamps at very regular intervals (almost every write in some cases), allowing you to roll back volumes to very specific, granular points in time. See the diagram below for more detail:
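To show why journaling gives such granular restore points, here is a minimal sketch of a timestamped write journal. It is a toy model under stated assumptions (an in-memory list, monotonically increasing timestamps, hypothetical class and method names) and says nothing about RecoverPoint's actual journal format.

```python
class JournaledVolume:
    """Toy journal: every write is kept with its timestamp, in arrival order."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.journal = []  # (timestamp, block, data); timestamps assumed to increase

    def write(self, timestamp, block, data):
        self.journal.append((timestamp, block, data))

    def image_at(self, timestamp):
        """Rebuild the volume as it looked at `timestamp` by replaying the journal."""
        image = [b""] * self.num_blocks
        for ts, block, data in self.journal:
            if ts > timestamp:
                break  # everything after this point is newer than the restore point
            image[block] = data
        return image


vol = JournaledVolume(2)
vol.write(1, 0, b"good")
vol.write(2, 1, b"good")
vol.write(3, 0, b"corrupt")   # e.g. a bad write we later want to undo

print(vol.image_at(3))        # current state, including the bad write
print(vol.image_at(2))        # rolled back to just before it
```

Because every write is a potential restore point, the achievable RPO is bounded by how often writes are journaled rather than by a snapshot interval.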

    

RecoverPoint provides out-of-band replication. It is considered out-of-band because the RecoverPoint appliance itself is not involved in the I/O process. Instead, a component of RecoverPoint called the splitter (or Kdriver) is involved. The function of a splitter is to intercept writes destined for a volume being replicated by RecoverPoint. The write is then split (“copied”), with one copy being sent to the RecoverPoint appliance and the original being sent to the target.

With RecoverPoint, three types of splitters can be used. The first splitter resides on a host server that accesses a volume being protected by RecoverPoint. This splitter resides in the I/O stack, below the file system and volume manager layer and just above the multi-path layer. It operates as a device driver, inspecting each write sent down the I/O stack to determine whether the write is destined for one of the volumes that RecoverPoint is protecting. If the write is destined for a protected LUN, the splitter sends the write downward and rewrites the address packet in the write so that a copy is sent to the RecoverPoint appliance. When the ACK (acknowledgement) for the original write is received, the splitter waits until a matching ACK is received from the RecoverPoint appliance before sending an ACK up the I/O stack. Secondly, the splitter can also be part of the storage services on intelligent SAN switches from Brocade or Cisco.
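The acknowledgement behaviour of the host splitter described above can be sketched roughly as follows. This is a simplified model, not the real driver: the `Splitter` class, the callables standing in for the storage target and the appliance, and the LUN numbers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class Splitter:
    """Toy write splitter: protected writes go to both the target and the appliance."""

    def __init__(self, send_to_target, send_to_appliance, protected_luns):
        self.send_to_target = send_to_target        # callable(lun, data) -> bool (ACK)
        self.send_to_appliance = send_to_appliance  # callable(lun, data) -> bool (ACK)
        self.protected_luns = set(protected_luns)

    def write(self, lun, data):
        """Return True (ACK up the stack) only once every required ACK has arrived."""
        if lun not in self.protected_luns:
            return self.send_to_target(lun, data)
        with ThreadPoolExecutor(max_workers=2) as pool:
            target_ack = pool.submit(self.send_to_target, lun, data)
            appliance_ack = pool.submit(self.send_to_appliance, lun, data)
            # result() blocks, so we wait for both answers before acknowledging.
            acks = [target_ack.result(), appliance_ack.result()]
        return all(acks)


# Stand-ins for the real storage target and appliance: they just record what arrives.
target_log, appliance_log = [], []


def fake_target(lun, data):
    target_log.append((lun, data))
    return True


def fake_appliance(lun, data):
    appliance_log.append((lun, data))
    return True


splitter = Splitter(fake_target, fake_appliance, protected_luns={5})
print(splitter.write(5, b"payroll"))   # protected LUN: copied to the appliance as well
print(splitter.write(9, b"scratch"))   # unprotected LUN: target only
print(len(target_log), len(appliance_log))
```

The appliance never sits between the host and the target here; it only receives a copy, which is the sense in which the replication is out-of-band.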

For the CLARiiON CX4 and CX3, the CLARiiON storage processor also has a write splitter, which is the third type. When a write enters the CLARiiON array (either through a Gigabit Ethernet port or a Fibre Channel port), its destination is examined. If it is destined for one of the LUNs being replicated by RecoverPoint, then a copy of that write is sent back out of one of the Fibre Channel ports of the storage processor to the RecoverPoint appliance. Since the splitter resides in the CLARiiON array, any open systems server that is qualified for attachment to the CLARiiON array can be supported by RecoverPoint. Additionally, both Fibre Channel and iSCSI volumes that reside inside the CLARiiON CX4 or CX3 storage array can be replicated by RecoverPoint. RecoverPoint/SE only supports a Windows host-based splitter and the CLARiiON-based write splitter. Also, automatic installation and configuration for RecoverPoint/SE only supports the CLARiiON-based write splitter.

Below is a video from EMC demonstrating RecoverPoint in a VMware environment:

   

  

Optimise

So how do we ensure we are getting the most out of the links we use (especially over contended links such as VPN or MPLS)? WAN optimisation. There are a number of ways this can be done. Some use an appliance to acknowledge back to the production SAN locally, then cache the data and burst it over the WAN. Some companies have found a more efficient way of transmitting data over a WAN, replacing TCP with more efficient proprietary protocols (such as HyperIP). Below is a snippet from a mail I received from a company called Silver Peak, who seem to deal with the challenges of optimising WAN efficiency quite well, in particular with SAN replication:

“Just a few years ago, it was unheard of to combine SAN traffic with other storage applications, let alone on the same network as non-storage traffic. That is no longer the case. Silver Peak customers like NYK Logistics are doing real-time data replication over the Internet. Want to learn more? Here is a demo of EMC replication running across a shared WAN.”
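To see why plain TCP struggles over distance, it helps to remember that a single TCP flow can have at most one window of data in flight per round trip, so its throughput is capped at window size divided by round trip time regardless of how big the link is. The sketch below works that out and estimates how long a replication update would take. The window size, round trip times and the amount of changed data are illustrative assumptions, not measurements.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP flow: at most one window in flight per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def hours_to_replicate(changed_gb: float, throughput_mbps: float) -> float:
    """Time to ship the changed data at a sustained rate (1 GB taken as 8000 megabits)."""
    return changed_gb * 8 * 1000 / throughput_mbps / 3600


WINDOW = 65535  # bytes; the largest TCP window without window scaling

for rtt in (1, 20, 80):  # illustrative round trip times in milliseconds
    mbps = max_tcp_throughput_mbps(WINDOW, rtt)
    hours = hours_to_replicate(50, mbps)  # assumed 50 GB of changed data per update
    print(f"RTT {rtt:>2} ms: at most {mbps:8.1f} Mbit/s per flow, "
          f"{hours:6.2f} h to ship 50 GB")
```

If the time to ship an update is longer than the interval between updates, the RPO you promised cannot be met on that link, which is where acknowledging locally, caching and more efficient protocols earn their keep.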

 

   

   

    

    

 

 

  In summary   

Replication is a biiiig topic. There are many more factors to be considered, such as automation, cluster awareness, etc. I think the best way to summarise this post is…

To be continued     

      

 

About interestingevan

I work as a Technical Architect for a storage and virtualisation distributor in the UK called Magirus. The goal of this blog is simply to be a resource for people who want to learn about, or go and sell, storage. I'm a qualified EMC CLARiiON Technical Architect, Commvault Engineer and Cisco Unified Computing specialist. I have also worked with the rest of the EMC portfolio for a good few years. This blog will provide information on how specific technologies work, what questions need to be asked in order to spec certain products, competitive info and my two pence on some of these technologies. Please feel free to provide feedback on the content of this blog and on any bits you'd like to see.
