A good video I came across from EMC discussing some storage protocol considerations when looking at VMware.
So I’ve just come back from a week over at EMC in Cork, where I had the privilege of seeing the flashy lights of a Vblock and speaking with the various VCE subject matter experts. So where do I start?
For those of you who aren’t familiar with Vblock or what the VCE (VMware, Cisco, EMC) coalition is all about, you can go to www.vcecoalition.com or watch the video below from the VCE guys for the polished positioning:
This post is more for those of you who are already familiar with the VCE offerings. I shall start with the single support element of the Vblock, which has been the subject of some debate, as there was some ambiguity around what Acadia does and where it operates… so, let’s start by forgetting about Acadia. That sorts that 🙂 It’s all about the SST (Seamless Support Team). The SST is a dedicated vBlock support team based in Cork (amongst other places), made up of VMware, Cisco and EMC qualified staff all working under one roof. They are responsible for qualifying a vBlock and supporting customer implementations of vBlock. More importantly, for those who qualify as VCE partners, the SST will also support the pre-sales element of vBlock and help qualify the opportunity.
More information on VBlock support can be found here
Can I add a xxxxx to my vBlock ?
No!.. well, not without an official exception from the SST anyway. To be fair, aside from maybe adjusting quantities of disks/blades/memory, the architecture of a vBlock shouldn’t need to change. For the most part, if your goal is to move toward the virtualised datacenter, then the vBlock should meet that requirement with the validated architecture. Bear in mind the vBlock is designed to sit in a datacenter environment, effectively at the access layer, and uplink into an existing network core/aggregation layer (which is where you would provide services such as firewall, VPN termination, Layer 3 routing, etc.), and these elements do not fall under the remit of the Seamless Support Team. The SST only looks after the vBlock component(s); components outside the vBlock remain under the support of their native vendors.
Why can’t we just add everything VMware/Cisco/EMC that we have to the same support contract?!
One of the reasons the SST is so effective is that they have a number of vBlocks within their support centers which all support personnel have access to. This means they can re-create any issue a customer logs and massively increase the speed to resolution. That wouldn’t be possible if they didn’t police what a supported vBlock implementation is; without that, staging and resolving issues would become very difficult. Also, yes, the vBlock is an impressive array of flashing lights and cool tech, but the aim of a pre-validated architecture is to let customer conversations be geared more toward meeting business requirements than technical ones, as the technical validation is already done. All the validated reference architectures are available at http://www.vcecoalition.com/solutions.htm
However, if it is felt that a component is absolutely required, then an exception can be applied for and approved at the discretion of the SST. But don’t go asking to add an HP server or Juniper switch… not gonna happen 😉
Bear in mind that it is early days. Although having to abide by the validated architectures and use cases may appear restrictive, more and more validated architectures and vBlock options are going through the testing required to ensure they are truly technically validated and can be supported by the SST.
I will post more on the positioning and technology of vBlock in due course. For now.. I gotta eat.
Disaster recovery is moving higher and higher up the agenda on companies’ “to do” lists. It’s becoming increasingly apparent what downtime and/or loss of data costs a business, and people are starting to think about the monetary cost to the business when services or applications are unavailable to internal staff and, more importantly, to customers. And with the big push of server virtualization over the last few years, where are the application data, the file data and the application server itself sitting? On the SAN. So it makes sense to leverage that existing SAN infrastructure and use some form of SAN-based replication.
Bear in mind the SAN is no longer a luxury only the privileged enterprise has access to; it is becoming ever more important to even small businesses. Not all of these organisations have access to biiiig dedicated links between sites, and if they do, those links are probably subject to significant contention. Unfortunately, TCP isn’t the most efficient of protocols over distance.
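Part of the reason TCP struggles over distance is that a single stream can only have one window of unacknowledged data in flight per round trip, so its throughput is capped at roughly window / RTT however fat the pipe is. Here’s a quick sketch of that ceiling. It assumes the classic 64 KB maximum window (no window scaling), and the RTT values are illustrative. Packet loss on a contended link pulls real throughput lower still.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Ceiling for a single TCP stream: at most one window of unacknowledged
    data can be in flight per round trip, whatever the link speed."""
    rtt_seconds = rtt_ms / 1000.0
    return window_bytes * 8 / rtt_seconds / 1_000_000


if __name__ == "__main__":
    window = 65_535  # classic maximum TCP window without window scaling
    for rtt in (1, 20, 50):  # illustrative round-trip times in milliseconds
        print(f"RTT {rtt} ms: at most {max_tcp_throughput_mbps(window, rtt):.1f} Mbit/s per stream")
```

At 20 ms RTT that works out at roughly 26 Mbit/s per stream, even on a 100 Mbit/s link.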
So what do you do to make sure the DR solution you have in mind is feasible and realistic ?
Firstly make sure you pick the right technology
The first port of call is sitting down with the customer and mapping out the availability requirements of their applications, such as the RPO/RTO requirements of each application in use. A lot of the time the company may not have thought about this in much detail, so you can really add value here if you are a reseller. Ultimately it boils down to considering the following for each service:
- How much downtime can you afford on each given application before the business starts losing money?
- How much data can you afford to lose in the event of a disaster before it does significant damage to the business?
If you can get them to apply a monetary figure to the above, it can help when positioning return on investment.
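To make that concrete, here’s a minimal Python sketch that turns an RTO/RPO pair into a worst-case cost per incident. The `AppRequirement` and `incident_cost` names and all of the figures are hypothetical. Plug in whatever numbers the customer is prepared to sign up to.

```python
from dataclasses import dataclass


@dataclass
class AppRequirement:
    name: str
    downtime_cost_per_hour: float   # lost revenue/productivity per hour of outage (hypothetical, GBP)
    data_loss_cost_per_hour: float  # cost of re-creating one hour's worth of lost data (hypothetical, GBP)


def incident_cost(app: AppRequirement, rto_hours: float, rpo_hours: float) -> float:
    """Worst-case cost of one disaster if recovery takes the full RTO and loses the full RPO."""
    return app.downtime_cost_per_hour * rto_hours + app.data_loss_cost_per_hour * rpo_hours


if __name__ == "__main__":
    crm = AppRequirement("CRM", downtime_cost_per_hour=2000, data_loss_cost_per_hour=500)
    today = incident_cost(crm, rto_hours=48, rpo_hours=24)   # e.g. restoring last night's tape
    proposed = incident_cost(crm, rto_hours=4, rpo_hours=1)  # e.g. with array-based replication
    print(f"{crm.name}: {today:.0f} -> {proposed:.0f} per incident")
```

The gap between the two figures, weighed against the cost of the replication solution, is the basis of the return-on-investment conversation.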
There are a few types of array-based replication out there. They normally come in three flavours: asynchronous, synchronous and journaling/CDP. Synchronous replication can be a bit risky for a lot of businesses. Application response time usually becomes dependent on writes being committed to disk on both the production and the DR storage, so it also depends on the round-trip latency across the link between the two sites, and spindle count becomes very important on both sites. I often find that, aside from banks and large conglomerates, the main candidates for synchronous replication in the SMB space are actually universities. Why? Because universities often don’t replicate over massive distances. They will have a campus DR setup where they replicate a couple of hundred metres from building to building, so laying fibre isn’t too costly. However, for the average SMB that wants to replicate to another town, synchronous replication usually isn’t preferable because of latency over distance and the cost of the large link required.
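For a rough feel of why distance hurts synchronous replication, here’s a back-of-the-envelope sketch. It assumes light travels through fibre at about 200,000 km/s (roughly 5 µs per km each way) and that each write needs at least one round trip. Real FC/IP stacks, switches and arrays add more on top, so treat the result as a floor, not a forecast.

```python
FIBRE_US_PER_KM = 5.0  # approx one-way propagation delay in optical fibre (~200,000 km/s)


def min_sync_write_penalty_ms(distance_km: float, round_trips_per_write: int = 1) -> float:
    """Lower bound on the latency a synchronous write adds while waiting for the
    remote array's ACK. Ignores switch, array and protocol overheads."""
    round_trip_us = 2 * distance_km * FIBRE_US_PER_KM
    return round_trip_us * round_trips_per_write / 1000.0


if __name__ == "__main__":
    for label, km in [("campus, building to building", 0.2), ("another town", 100.0)]:
        print(f"{label}: >= {min_sync_write_penalty_ms(km):.3f} ms extra per write")
```

A couple of microseconds across campus is noise. An extra millisecond or more on every single write to another town is not, particularly when some applications want response times of 5 ms or less.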
MirrorView Asynchronous (EMC)
Asynchronous replication is typically what I see in most small to medium-sized businesses. Why? Firstly, with synchronous replication, application response times depend on the round-trip time of the link, and asynchronous replication removes that dependency. With asynchronous replication, a copy-on-first-write mechanism is usually used to ship snapshots at specified intervals over an IP link. Below is a diagram showing how EMC MirrorView/A does this:
EMC uses what’s called a Delta Bitmap (a map of the data blocks on the volume, flagging which have changed) to track what has been sent to the secondary array and what hasn’t. The Delta Bitmap works in conjunction with reserved LUNs (the Delta Set) on the array to ensure that the data sent across to the secondary array remains consistent. The secondary also has reserved LUNs in place, so if replication is interrupted or the link is lost, the secondary array can roll back to its previous consistent state and the data isn’t compromised.
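To illustrate the idea (and only the idea), here’s a toy Python model of bitmap-driven periodic replication with roll-back on the secondary. The `ToyAsyncMirror` class and its methods are hypothetical and heavily simplified. They are not MirrorView/A’s internals. The frozen delta and the saved image simply stand in for what the reserved LUNs do on each array.

```python
from typing import Optional


class ToyAsyncMirror:
    """Toy model of periodic async replication driven by a change ("delta") bitmap.
    Hypothetical and heavily simplified: NOT MirrorView/A's real implementation."""

    def __init__(self, num_chunks: int):
        self.primary = [0] * num_chunks
        self.secondary = [0] * num_chunks
        self.changed = [False] * num_chunks  # one flag per chunk: written since the last update?

    def write(self, chunk: int, value: int) -> None:
        self.primary[chunk] = value
        self.changed[chunk] = True  # mark the chunk so the next update ships it

    def update(self, link_fails_after: Optional[int] = None) -> bool:
        """Ship changed chunks; optionally simulate the link dropping after N chunks."""
        # Freeze a point-in-time delta set (roughly what reserved LUNs give the primary)
        delta = {i: self.primary[i] for i, flag in enumerate(self.changed) if flag}
        self.changed = [False] * len(self.changed)
        # Remember the secondary's last consistent image (roughly the secondary's reserved LUNs)
        consistent_image = list(self.secondary)
        for sent, (chunk, value) in enumerate(sorted(delta.items())):
            if link_fails_after is not None and sent >= link_fails_after:
                self.secondary = consistent_image  # roll back: never keep a half-applied update
                for c in delta:
                    self.changed[c] = True  # re-flag so the next update resends them
                return False
            self.secondary[chunk] = value
        return True
```

If the link drops part-way through an update, the secondary ends up exactly where it was before the update started, and the bitmap still remembers what has to be resent.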
Also, you can use higher-capacity, less expensive disks on the DR site without affecting production response times. Application response times will still be affected in the event of a failover, though, as servers will then be accessing disk on the DR box. One potential drawback of asynchronous replication is that, because the two SANs are no longer in a synchronous state, you have to decide whether it is important that your remote copies of data are in an application-consistent state. If it is, you’ll need a technology that sits on the host and talks to both the application and the storage. In the EMC world we have a tool called Replication Manager, which does all the required bits on the host side (calling VSS/hot backup mode, flushing host buffers, etc.).
Replication Manager is licensed per application server (or virtual server in a cluster) and also requires an agent per mount host, plus a server licence (or two, depending on the scenario). There is a lot more to Replication Manager, but that’s a whole post in itself.
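One more thing to check with interval-based asynchronous replication: each delta has to cross the link before the next update is due. Here’s a rough sizing sketch under some simple assumptions. It uses decimal GB, assumes a fixed share of the link is available to replication, and approximates the worst-case RPO as one update interval plus the transfer time. The function names and the 70% default are hypothetical and are not MirrorView/A settings.

```python
def transfer_hours(changed_gb: float, link_mbps: float, usable_fraction: float = 0.7) -> float:
    """Hours needed to ship one update's worth of changed data across the WAN link."""
    bits = changed_gb * 8 * 1000**3  # decimal GB -> bits
    usable_bps = link_mbps * 1_000_000 * usable_fraction
    return bits / usable_bps / 3600


def worst_case_rpo_hours(interval_hours: float, changed_gb: float,
                         link_mbps: float, usable_fraction: float = 0.7) -> float:
    """A write that just misses one update waits a full interval for the next one,
    then for that update's transfer to complete."""
    t = transfer_hours(changed_gb, link_mbps, usable_fraction)
    if t > interval_hours:
        raise ValueError("link can't ship each delta before the next update is due")
    return interval_hours + t
```

If the function raises, you need either a bigger link, WAN optimisation, or a longer update interval (and therefore a looser RPO).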
RecoverPoint is another EMC replication technology. It allows very granular restore points and small RPOs over IP because it employs journaling rather than copy on first write. It timestamps writes at very regular intervals (almost every write in some cases), allowing you to roll volumes back to very specific, granular points in time. See the diagram below for more detail:
RecoverPoint provides out-of-band replication. To be considered out-of-band, the RecoverPoint appliance is not involved in the I/O process. Instead, a component of RecoverPoint, called the splitter (or Kdriver), is involved. The function of a splitter is to intercept writes destined for a volume being replicated by RecoverPoint. The write is then split (“copied”) with one copy being sent to the RecoverPoint appliance and the original being sent to the target.
With RecoverPoint, three types of splitters can be used. The first resides on a host server that accesses a volume being protected by RecoverPoint. This splitter sits in the I/O stack, below the file system and volume manager layer and just above the multi-path layer. It operates as a device driver, inspecting each write sent down the I/O stack to determine whether the write is destined for one of the volumes RecoverPoint is protecting. If the write is destined for a protected LUN, the splitter sends the write downward and rewrites the address packet in the write so that a copy is also sent to the RecoverPoint appliance. When the ACK (acknowledgement) for the original write is received, the splitter waits until a matching ACK is received from the RecoverPoint appliance before sending an ACK up the I/O stack. The splitter can also be part of the storage services on intelligent SAN switches from Brocade or Cisco.
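The acknowledgement rule described above can be sketched as a single function. This is a hypothetical, sequential simplification. The two sends happen one after the other, and the callables stand in for the array and the appliance. It is not the actual Kdriver code.

```python
from typing import Callable, Set

Sender = Callable[[int, bytes], bool]  # returns True when the target acknowledges the write


def split_write(lun: int, data: bytes, protected_luns: Set[int],
                send_to_array: Sender, send_to_appliance: Sender) -> bool:
    """Hypothetical host-splitter logic: returns True when it is safe to ACK up the I/O stack."""
    array_ack = send_to_array(lun, data)  # the original write always continues to its target
    if lun not in protected_luns:
        return array_ack  # unprotected volume: nothing to split
    appliance_ack = send_to_appliance(lun, data)  # the copy destined for the RecoverPoint appliance
    # Only acknowledge upward once BOTH the array and the appliance have acknowledged
    return array_ack and appliance_ack
```

The key point is the final line. The application never sees a write as complete until both copies have been acknowledged, so the journal cannot silently fall behind the volume.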
For a CLARiiON CX4 and CX3, the CLARiiON storage processor also has a write splitter. When a write enters the CLARiiON array (through either a Gigabit Ethernet port or a Fibre Channel port), its destination is examined. If it is destined for one of the LUNs being replicated by RecoverPoint, a copy of that write is sent back out through one of the storage processor’s Fibre Channel ports to the RecoverPoint appliance. Because the splitter resides in the CLARiiON array, any open systems server qualified for attachment to the CLARiiON array can be supported by RecoverPoint. Additionally, both Fibre Channel and iSCSI volumes that reside inside the CLARiiON CX4 or CX3 storage array can be replicated by RecoverPoint. RecoverPoint/SE supports only the Windows host-based splitter and the CLARiiON-based write splitter. Also, automatic installation and configuration for RecoverPoint/SE supports only the CLARiiON-based write splitter.
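Back to the journaling that gives RecoverPoint its granular restore points. If every write is kept with a timestamp, any point in time can be rebuilt by replaying the journal up to that moment. Here’s a toy model of that idea. The `ToyJournal` class is hypothetical and purely illustrative. It is not RecoverPoint’s journal format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyJournal:
    """Toy continuous-data-protection journal: every write is kept with its timestamp.
    Illustrative only, not RecoverPoint's real journal format."""
    baseline: Dict[int, int]  # block -> value when the journal started
    entries: List[Tuple[float, int, int]] = field(default_factory=list)  # (timestamp, block, value)

    def record(self, ts: float, block: int, value: int) -> None:
        self.entries.append((ts, block, value))

    def image_at(self, point_in_time: float) -> Dict[int, int]:
        """Rebuild the volume as it was at point_in_time by replaying earlier writes in order."""
        image = dict(self.baseline)
        for ts, block, value in sorted(self.entries, key=lambda e: e[0]):
            if ts > point_in_time:
                break
            image[block] = value
        return image
```

For example, if a corrupting write lands at time 3.0, you can recover the image at 2.9 rather than at the last snapshot interval. Snapshot shipping can only take you back to the instants when an update was taken.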
Below is a video from EMC demonstrating RecoverPoint in a VMware environment:
Optimise
So how do we ensure we are getting the most out of the links we use (especially contended links such as VPN or MPLS)? WAN optimisation. There are a number of ways to do this. Some use an appliance to acknowledge back to the production SAN locally, then cache the data and burst it over the WAN. Some companies have found a more efficient way of transmitting data over a WAN by replacing TCP with proprietary, more efficient protocols (such as Hyper IP). Below is a snippet from a mail I received from a company called Silver Peak, who seem to deal with the challenges of optimising WAN efficiency quite well, in particular with SAN replication:
Replication is a biiiig topic.. there are many more factors to consider, such as automation, cluster awareness, etc. I think the best way to summarise this post is…
Now, for the bulk of organisations (in the UK at least), the majority of business applications are hosted on operating systems such as Windows, Linux, HPUX and Solaris. EMC do very well with these organizations; they have extensive lists of supported operating systems with all their revisions and service pack releases to boot. For these organisations and resellers selling into them, life is good, interoperability is rife and big vendors such as EMC give them much love. But there is another world out there, one often overlooked by the likes of EMC… A world of glorious white, multicoloured fruit and virus free environments.. I shall call this place Mac land, often visited by the likes of graphics design, advertising and publishing companies.
For some years there was no support statement in sight involving the words Mac OS X, and the likes of Emulex and QLogic were not forthcoming with a resolution, so the future was looking bleak for resellers wanting to sell EMC SAN storage into Mac user environments. But wait!! A solution has presented itself in the form of a company called ATTO Technology.. much like Saint Nick delivering presents in the night, these guys are sneaking Mac OS X support statements onto EMC interoperability support matrices. I heard no song and dance about this!? But I was pleased to see it nonetheless….
The supported range of FC HBAs comes in single-port, dual-port and quad-port models (FC-41ES, FC-42ES, FC-44ES), and the iSCSI software initiator is downloadable from their website.
Support covers Mac OS X 10.5.5 through 10.5.10 on Apple Xserve servers and Intel-based Mac Pro workstations attaching to EMC’s CX4 range only. Rather than just providing basic support out of necessity, there are a few bells and whistles. Multipathing is supported with ATTO’s own multipathing driver and integrates with ALUA on the Clariion. A number of Brocade, Cisco MDS and QLogic SANbox switches are supported, with the exception of a few popular recent switches such as the Brocade Silkworm 300, 5100 and 5300 and the QLogic SANbox 1404. ATTO have also released an iSCSI software initiator for iSCSI connectivity to Clariion or Celerra, which is also supported.
Just a brief disclaimer.. I’ve mentioned some specific support statements, but that is not to say that EMC would not support the switches I mentioned that aren’t currently listed. You may, however, have to jump through some hoops to get your solution supported if certain elements aren’t on the standard support statements. If you are a Mac user looking at EMC, I would recommend checking the relevant support statements from EMC, just to make sure your bases are covered.
Take a look at the press release from ATTO Technology here
A few new things are coming out on EMC’s mid-tier storage range to look out for. One of them is very discreetly named Project Odin and will make the life of EMC Celerra users and resellers alike a touch easier. It’s a management console for Celerra and its respective back-end Clariion, so you no longer have to jump into Navisphere to manage the Clariion directly! From what I gather, it will run on any DART or FLARE OS and is pointed at the system it needs to manage via IP address. The appropriate profile is then loaded to reflect the functions relevant to said Clariion/Celerra.. about time!! As I understand it, there will be an announcement in February.. but it won’t be going GA for a little while. Watch this space!!
So the FAST suite is available on Clariion. Good news!! If you don’t know what FAST (Fully Automated Storage Tiering) is, in a nutshell it’s automated storage tiering (as implied by the name). It ensures that LUNs with critical performance requirements and variable I/O characteristics use disk as efficiently as possible. LUNs whose data is accessed frequently are serviced by one tier of disk (e.g. solid state) and others by another tier (e.g. Fibre Channel or SATA disk). This is done dynamically, on the fly, using the Clariion Virtual LUN technology, which means you can migrate a LUN from one set of disks to another seamlessly to the application, retaining all the properties of the LUN, and FAST does all of this automatically. This is especially relevant now that virtualisation is rife. The flexibility of server deployment it brings means that either the storage has to be meticulously designed and frequently reviewed and adjusted, or the storage platform itself has to be adaptive and flexible (even more relevant in multi-tenancy environments which offer a managed service). FAST puts EMC storage in the latter camp.
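To illustrate the principle of tiering, here’s a greedy Python sketch that ranks LUNs by I/O density (IOPS per GB) and fills the fastest tier first. This is not FAST’s actual placement algorithm. The tier names, capacities and LUN figures are hypothetical.

```python
from typing import Dict, List, Tuple


def place_luns(luns: List[Tuple[str, float, float]],
               tiers: List[Tuple[str, float]]) -> Dict[str, str]:
    """Greedy illustration: busiest LUNs (highest IOPS per GB) go to the fastest tier with room.
    luns:  (name, size_gb, observed_iops)
    tiers: (tier_name, capacity_gb), fastest tier first."""
    remaining = {name: capacity for name, capacity in tiers}
    placement: Dict[str, str] = {}
    for name, size_gb, iops in sorted(luns, key=lambda lun: lun[2] / lun[1], reverse=True):
        for tier, _ in tiers:
            if remaining[tier] >= size_gb:
                remaining[tier] -= size_gb
                placement[name] = tier
                break
        else:
            raise ValueError(f"no tier has room for {name}")
    return placement


if __name__ == "__main__":
    luns = [("sql_log", 50, 2000), ("file_share", 500, 100), ("exchange_db", 200, 1500)]
    tiers = [("EFD", 200), ("FC", 600), ("SATA", 2000)]
    print(place_luns(luns, tiers))
```

Ranking by IOPS per GB rather than raw IOPS is the point. A small, very busy LUN gets more value out of expensive flash capacity than a large, moderately busy one.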
So, is there a sting in the tail? Is it silly money, much like ControlCenter was (although that did get better, to be fair)? Surprisingly… no. The FAST suite of tools is surprisingly well priced at a touch over £6000 list. Bear in mind that the FAST suite isn’t just the FAST software. It also includes Navisphere Analyzer (for analysing and monitoring SAN bandwidth and I/O) and QoS Manager (for performance tuning). They also throw in an assessment service to recommend which LUNs are most suitable for migration to FAST-managed LUNs that will use EFD (solid state drives to the rest of us) and/or SATA, etc. (I’ll come back to why this is required a bit later). Considering that you’re looking at a £10k list price for Navisphere Analyzer and QoS Manager alone, that’s not a bad deal. But then, it wouldn’t be, given you’re still looking at just under £8000 for an enterprise flash drive, and FAST is as good a mechanism as any to drive sales of solid state drives. But this isn’t just a smoke-and-mirrors mechanism to sell solid state drives; the benefits are real. The capital expenditure involved in deploying enterprise flash drives with FAST may be undesirable to a lot of businesses, but the return on investment is again very real. The requirement to procure masses of FC drives to support highly transactional databases is not gone, but it is certainly minimised. The man-hours required for certain laborious storage admin tasks are reduced (especially in environments with applications that have extremely variable disk loads), power and cooling requirements are reduced, and the list goes on..
So why is there an assessment service? Can’t I just chuck everything on FAST-managed LUNs and tell it to go do?… Yes, you could. But solid state drives are still expensive, so make the best use of them you can. I would suggest that LUNs with lower performance requirements and predictable disk load characteristics sit on standard LUNs.
See below for FAST on EMC’s VMax.. now just waiting for this on Clariion and Celerra (at sub-LUN level)
And before some boxing buff corrects me on my Muhammad Ali quotation in the post title: I know it’s “float like a butterfly, sting like a bee”.. but cut me some slack, float didn’t quite fit.. call it creative licence 😉
A few things I would suggest you do before just sizing a bunch of disks for capacity.
Firstly, your application response times depend on a few things, and one of the key ones is provisioning enough spindles/drives to support the disk load you are going to put on the SAN. If the SAN isn’t sized with performance in mind, you could see queue depth increasing on the drives. Queue depth is the number of read/write requests waiting to access the drives. If the queue depth gets too high, applications that require response times of 5 ms or less (and a few do) may start timing out, and then you have problems. So you need to do a bit of data gathering..
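The link between queue depth and response time can be made concrete with Little’s law: average outstanding I/Os = throughput × average response time. Here’s a quick sketch. The drive figures are illustrative.

```python
def avg_response_time_ms(outstanding_ios: float, completed_iops: float) -> float:
    """Little's law rearranged: response time = outstanding I/Os / throughput."""
    return outstanding_ios / completed_iops * 1000


if __name__ == "__main__":
    # A drive completing ~180 IOPS (a 15k FC figure) at different average queue depths
    for queue in (1, 3, 9):
        print(f"queue {queue}: ~{avg_response_time_ms(queue, 180):.1f} ms per I/O")
```

Even with a single I/O outstanding, a drive doing 180 IOPS averages about 5.6 ms. With nine queued, the average is around 50 ms, which is nowhere near a sub-5 ms target. The only real fix is spreading the load over more spindles.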
In Windows terms, run something like perfmon in logging mode, looking at counters such as bytes written, bytes read, number of reads, number of writes and queue depth. In Linux/Unix terms, something like iostat should be fine.
Make sure you log over a reasonable period of time and capture metrics over peak hours of activity. We’ll come back to this in a minute.
Identify the profile of your data: which applications write to disk sequentially and which write randomly. Sequential writes are optimised by some clever bits the Clariion does in cache. If you mix sequential and random data on the same RAID groups, you won’t see optimised writes for the sequential data.
Sequential data = large writes (typically backup-to-disk apps, archive applications, media streaming, large files)
Random data = lots of small reads/writes, e.g. databases (Exchange, SQL, Oracle)
Next you need to think about the level of RAID protection required:
RAID 5 = distributed parity. It has a reasonably high write penalty but a good usable-to-raw capacity ratio (the equivalent of one drive’s capacity is used for parity), and a fair few people use it to get the most bang for their buck. Bear in mind that RAID 5 can survive a single drive failure (which will incur performance degradation) but will not protect against a double disk failure. The EMC Clariion does use hot spares, which can be proactively built when the Clariion detects a failing drive and substituted for that drive once built. However, if no hot spare exists, or if a second drive fails during a drive rebuild or while a hot spare is being built, you will lose your data. Write penalty = 4
RAID 3 = dedicated parity disk. Great for large files, media streaming, etc., although RAID 5 can work in the same fashion as RAID 3 under the right conditions.
RAID 1/0 = mirrored/striped. It has a lower write penalty but is more costly per GB, as you lose 50% of usable capacity to mirroring. RAID 1/0 provides better fault resilience and rebuild performance than RAID 5. It also has better overall performance, combining the speed of RAID 0 with the redundancy of RAID 1 without requiring parity calculations. Write penalty = 2
RAID 6 = again distributed parity, but instead of calculating only horizontal parity (as RAID 5 does), it also calculates diagonal parity, essentially protecting you from a double disk failure. The capacity overhead is greater than RAID 5 but not as great as RAID 1/0 (the equivalent of two drives’ capacity is used for parity). The write penalty for RAID 6 is greater than for RAID 5. However, RAID 6 should typically only be used for sequential data (backup to disk, media streaming, large files), where writes are optimised by write coalescing in cache. All data and parity are written out to disk in one go, without having to read existing data and parity back from disk first, which incurs a lower write penalty. This will only happen if the write sizes are at least the full stripe size of the RAID group and the drives are properly aligned.
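Here’s a small sketch for comparing the capacity overheads described above. It uses one drive’s worth of parity for RAID 5 and RAID 3, two for RAID 6, and half the drives for RAID 1/0. The function name and RAID labels are my own. Hot spares and formatting overhead are ignored.

```python
PARITY_DRIVES = {"RAID3": 1, "RAID5": 1, "RAID6": 2}  # drives' worth of capacity lost to parity


def usable_capacity_gb(raid: str, drives: int, drive_gb: float) -> float:
    """Usable capacity of one RAID group (ignores hot spares and formatting overhead)."""
    if raid == "RAID1/0":
        if drives < 2 or drives % 2:
            raise ValueError("RAID 1/0 needs an even number of drives")
        return drives // 2 * drive_gb  # half the drives hold mirror copies
    overhead = PARITY_DRIVES[raid]
    if drives <= overhead:
        raise ValueError(f"{raid} needs more than {overhead} drive(s)")
    return (drives - overhead) * drive_gb
```

For eight 600 GB drives, that works out at 3600 GB usable as RAID 6 against 2400 GB as RAID 1/0. A five-drive RAID 5 group also gives 2400 GB.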
There are a number of documents on Powerlink outlining RAID sizing considerations for specific applications. A rule of thumb is to keep your log files on separate spindles from your main DB volumes.
I’m going to cut this a bit short –
From the data you gathered (mentioned at the beginning of this post), take the following for each logical drive that is currently local or direct-attached:
number of reads + (number of writes x write penalty) = disk load (the write penalty is specific to the RAID type being used: RAID 5 = 4, RAID 1/0 = 2)
Each drive type has an approximate IOPS rating: 10k FC = 150 IOPS, 15k FC = 180 IOPS, SATA = approx. 80 IOPS
Divide the disk load by the IOPS rating of the drive type you’ve chosen, and that gives you the number of spindles required to support your disk load (excluding parity drives). You will most likely have multiple volumes (LUNs) on a given RAID group, so make sure you divide the aggregate disk load of all the volumes that will reside on that RAID group by the IOPS rating of the drives in question.
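Here are the sizing steps above as a small Python sketch. The RAID 5 and RAID 1/0 write penalties and the drive ratings are the ones quoted above. The RAID 6 penalty of 6 is the commonly used rule of thumb rather than a figure from this post, and the function names are hypothetical. The result is rounded up, since you can’t buy two thirds of a drive.

```python
import math

WRITE_PENALTY = {"RAID1/0": 2, "RAID5": 4, "RAID6": 6}  # RAID6 = 6 is a common rule of thumb
DRIVE_IOPS = {"10k FC": 150, "15k FC": 180, "SATA": 80}  # approximate per-drive ratings


def disk_load(reads_per_sec: float, writes_per_sec: float, raid: str) -> float:
    """Back-end IOPS generated by a volume: reads + writes x RAID write penalty."""
    return reads_per_sec + writes_per_sec * WRITE_PENALTY[raid]


def spindles_required(volumes, raid: str, drive_type: str) -> int:
    """volumes: iterable of (reads/s, writes/s) for every LUN sharing the RAID group,
    taken from peak-hour perfmon/iostat data. Excludes parity drives."""
    total_load = sum(disk_load(r, w, raid) for r, w in volumes)
    return math.ceil(total_load / DRIVE_IOPS[drive_type])


if __name__ == "__main__":
    luns = [(600, 200), (300, 100)]  # illustrative peak reads/s and writes/s for two LUNs
    for raid in ("RAID5", "RAID1/0"):
        print(raid, spindles_required(luns, raid, "15k FC"), "data spindles")
```

With the same two LUNs, RAID 5 needs 12 data spindles against 9 for RAID 1/0. That is the write penalty showing up directly in the drive count, which is worth weighing against RAID 5’s better capacity efficiency.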
That’s a starting point for you and some food for thought.. for more detail, there are FLARE best practice guides and application-specific guides galore on Powerlink…
The EMC Clariion CX4 is a very flexible box. One of the ways it lets admins get the most bang for their buck is virtual (thin) provisioning. This effectively lets an admin create a pool of storage from which smaller volumes of storage (thin LUNs) are provisioned. See below:
Many people will see thin provisioning as the answer to all their storage management problems.. No. It still needs to be monitored and managed, and some intelligent thinking still needs to go into which volumes will be thin provisioned. The EMC Clariion CX4 also has some limitations around thin provisioning.
Also, please bear in mind that although a thin provisioning pool may contain more drives than a traditional RAID group allows, a single LUN may not exceed 14TB.
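Since thin pools still need watching, here’s a hypothetical monitoring helper. It flags LUNs over the 14TB limit mentioned above, flags pool oversubscription, and warns when pool consumption crosses a threshold. The 70% alert threshold and the function name are assumptions for illustration. Pick your own threshold and check the CX4 documentation for the array’s built-in alerting.

```python
MAX_THIN_LUN_TB = 14  # per-LUN limit quoted above for CX4 thin provisioning


def check_thin_pool(pool_usable_tb: float, consumed_tb: float,
                    lun_sizes_tb: list, alert_pct: float = 70) -> list:
    """Return human-readable warnings for one thin pool (hypothetical helper)."""
    warnings = []
    for i, size in enumerate(lun_sizes_tb):
        if size > MAX_THIN_LUN_TB:
            warnings.append(f"LUN {i} is {size} TB, above the {MAX_THIN_LUN_TB} TB limit")
    subscribed = sum(lun_sizes_tb)  # capacity promised to hosts
    if subscribed > pool_usable_tb:
        warnings.append(f"pool oversubscribed {subscribed / pool_usable_tb:.1f}x")
    used_pct = consumed_tb / pool_usable_tb * 100  # capacity actually written
    if used_pct >= alert_pct:
        warnings.append(f"pool {used_pct:.0f}% consumed")
    return warnings
```

Oversubscription on its own isn’t a problem; that’s the point of thin provisioning. It only becomes one when consumption creeps up and nobody is watching, which is why the two checks are separate.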