Sizing for FAST performance

So EMC launched the VNX and changed elements of how we size for IO. We still have the traditional approach to sizing for IO, in that we take our LUNs and size for traditional RAID groups. So let’s start there to refresh:

Everything starts with the application. So what kind of load is the application going to put on the disks of our nice shiny storage array ?

So let’s say we have run perfmon or a similar tool to identify the number of disk transfers (IOPS) occurring on a logical volume for an application. For the sake of argument, we are sizing for a SQL DB volume which is generating 1000 IOPS.

Before we get into the grit of the math, we must decide what RAID type we want to use (the two below are the most common for transactional workloads).

RAID 5 = Distributed parity. It has a reasonably high write penalty and a good usable vs raw capacity ratio (the equivalent of one drive’s capacity is lost to parity), so a fair few people use it to get the most bang for their buck. Bear in mind that RAID 5 can survive a single drive failure (which will incur performance degradation), but will not protect against a double disk failure. EMC Clariion does employ hot spares, which can be proactively built when the Clariion detects a failing drive and then substituted for the failing drive once built. However, if no hot spare exists, or if a second drive fails during a drive rebuild or while the hot spare is being built, you will lose your data. Write penalty = 4

RAID 1/0 = Mirrored/Striped. It has a lesser write penalty, but is more costly per GB as you lose 50% of usable capacity to mirroring. RAID 1/0 provides better fault resilience and “rebuild” performance than RAID 5. It has better overall performance by combining the speed of RAID 0 with the redundancy of RAID 1 without requiring parity calculations. Write penalty = 2

Yes there are only 2 RAID types here, but this is more to keep the concept simple.

So, depending on the RAID type we use, a certain write penalty is incurred due to mirroring or parity operations.

Let’s take a view on the bigger picture now. Our application generates 1000 IOPS. We need to separate this into reads and writes:

So let’s say 20% writes vs 80% reads. We then multiply the number of writes by the appropriate write penalty (2 for RAID 1/0 or 4 for RAID 5). Let’s say RAID 5 is our selection:

The math is as follows :

800 Reads + (200 Writes x 4) = 1600 IOPS. This is the actual disk load we need to support.
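
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `backend_iops` and the `WRITE_PENALTY` table are made up for this example, and the only penalties included are the two quoted above.

```python
# Write penalties quoted above; other RAID types are deliberately left out.
WRITE_PENALTY = {"RAID 5": 4, "RAID 1/0": 2}


def backend_iops(host_iops, write_percent, raid_type):
    """Return the disk (back-end) IOPS for a given host workload."""
    writes = host_iops * write_percent / 100
    reads = host_iops - writes
    # Reads hit the disks once; each write costs 'penalty' disk operations.
    return reads + writes * WRITE_PENALTY[raid_type]


# 1000 host IOPS with 20% writes, as in the example above.
print(backend_iops(1000, 20, "RAID 5"))    # 800 reads + (200 writes x 4)
print(backend_iops(1000, 20, "RAID 1/0"))  # 800 reads + (200 writes x 2)
```

Running the same workload through both RAID types shows how much of the disk load comes purely from the write penalty rather than from the application itself.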

We then divide that disk load by the IO rating of the drive we wish to use. Generally speaking, at a 4KB block size the IO ratings below apply (they go down as the block size of the IO gets bigger).

15k SAS/FC = 180 IOPS
10k SAS/FC = 150 IOPS

The figure we are left with after dividing the disk load by the IO rating is the number of spindles required. The same applies when sizing for sequential disk load, but we refer to MB/s and bandwidth instead of disk transfers (IOPS). Avoid using EFD for sequential data (overkill and not much benefit). Per-drive bandwidth figures are as follows:

15k SAS/FC = 42 MB/s
10k SAS/FC = 35 MB/s
7.2k NLSAS = 25 MB/s

Bear in mind this does not take array cache into account, and sequential writes to disk benefit massively from cache, to the point where many papers suggest that NLSAS/SATA give comparable results to FC/SAS.
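
As a rough sketch of the spindle count step, the division can be written as below. The helper name `spindles_needed` is invented for this example, the 200 MB/s sequential figure is an assumed workload rather than anything measured, and cache, hot spares and RAID group layout are all ignored.

```python
import math

# Per-drive ratings quoted above (small-block random IOPS, sequential MB/s).
IOPS_RATING = {"15k SAS/FC": 180, "10k SAS/FC": 150}
MBPS_RATING = {"15k SAS/FC": 42, "10k SAS/FC": 35, "7.2k NLSAS": 25}


def spindles_needed(disk_load, per_drive_rating):
    """Divide the load by the drive rating and round up to whole drives."""
    return math.ceil(disk_load / per_drive_rating)


# Random workload: the 1600 back-end IOPS from the RAID 5 example.
for drive, rating in IOPS_RATING.items():
    print(drive, spindles_needed(1600, rating))

# Sequential workload: a hypothetical 200 MB/s stream (assumed figure).
for drive, rating in MBPS_RATING.items():
    print(drive, spindles_needed(200, rating))
```

Rounding up matters here: 1600 / 180 is just under 9, so nine 15k drives are needed rather than eight.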

So what about FAST?

FAST is slightly different. It allows us to define Tier 0, Tier 1 and Tier 2 layers of disk. Tier 0 might be EFD, Tier 1 might be 15k SAS and Tier 2 might be NLSAS. We can have multiple tiers of disk residing in a common pool of storage (kind of like a RAID group, but allowing for functions such as thin provisioning and tiering).

We can then create a LUN in this pool and specify that we want the LUN to start life on any given tier. As access patterns to that LUN are analysed by the array over time, the LUN is split up into GB chunks and only the most active chunks utilise Tier 0 disk; the less active chunks are trickled down to our Tier 1 and Tier 2 disks in the pool.

Fundamentally speaking, size for 90% of the IOPS with the Tier 0 disk (EFD) and bulk out the capacity by splitting the remaining capacity between Tier 1 and Tier 2. You will find that in most cases you can service the IO with a fraction of the number of EFD disks compared with doing it all on SAS disks. I would suggest that if you know something should never require EFD, such as B2D, archive data or Test/Dev, you put it in a separate disk pool with no EFD.
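
To get a feel for the "fraction of the number of disks" point, here is a minimal sketch. Big caveat: this post does not quote an IOPS rating for EFD, so `ASSUMED_EFD_IOPS` is purely a placeholder to be replaced with whatever figure your vendor gives you. The 16000 IOPS load and both function names are also made up for illustration, and the sketch ignores the fact that pool drives are added in whole RAID groups.

```python
import math

SAS_15K_IOPS = 180        # rating quoted earlier in this post
ASSUMED_EFD_IOPS = 2500   # ASSUMPTION: placeholder, not a figure from this post


def efd_drives_for_pool(backend_iops, efd_share_percent=90,
                        efd_iops=ASSUMED_EFD_IOPS):
    """Drives needed if Tier 0 carries efd_share_percent of the disk load."""
    tier0_iops = backend_iops * efd_share_percent / 100
    return math.ceil(tier0_iops / efd_iops)


def sas_drives_all_sas(backend_iops, sas_iops=SAS_15K_IOPS):
    """Drives needed if the whole load were served from 15k SAS instead."""
    return math.ceil(backend_iops / sas_iops)


load = 16000  # hypothetical back-end IOPS, ten times the earlier example
print(efd_drives_for_pool(load), "EFD for the hot 90%")
print(sas_drives_all_sas(load), "15k SAS if done without EFD")
```

The remaining Tier 1 and Tier 2 drives are then chosen for capacity rather than IOPS, which is why the sketch only counts the Tier 0 spindles.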

EMC World 2011 – Las Vegas – day 1

So after the first day at EMC World, what marvels of technology have been announced? What groundbreaking nuggets of geeky goodness are there to report?
So, first things first: VPLEX! Looks like they may have cracked it.. Active/active storage over synchronous distances; geoclusters will never be the same again!! There was also a slightly ambiguous announcement around integration with open source Hadoop (more to follow on that).

What was the message of the day though? What was this year’s theme? This year EMC are talking about big data and the cloud. Clearly the recent acquisitions of Isilon and Greenplum have planted EMC’s head firmly back in the clouds. Greenplum gives end users the ability to scale out database architectures for data analytics to mammoth scale, with its distributed node architecture and massively parallel processing capabilities. To be frank, learning about the technology was borderline mind numbing, but my god it’s a cool technology. Then we have scale-out NAS with Isilon and its OneFS system, giving the ability to present massive NAS repositories and scale NAS on a large scale. So obviously, EMC are talking about big data.

I also had the opportunity to sit in on an NDA VNX/VNXe session and what they’re going to do is….    aaah, I’m not that stupid. But needless to say, there are some nice additions on the way, the usual thing with higher capacity smaller footprint drives and getting more IO in less U space, but also some very cool stuff on the way which will enable EMC to offer a much cheaper entry point for compliance ready storage..  watch this space.

In true style, EMC threw out some interesting IDC-touted metrics, further justifying the need to drive storage efficiencies and reiterating the fact that there will always be a market for storage. So, our digital universe currently consists of 1.2 zettabytes of data, of which 90% is unstructured, and that figure is predicted to grow 44x over this decade. Also, 88% of Fortune 500 companies have to deal with botnet attacks on a regular basis and have to contend with 60 million malware variants. So, making this relevant, the 3 main pain points of end users are: firstly our age-old friend budget, then explosive data growth, and securing data.

So how have EMC addressed these? Well, budget is always a fun one to deal with, but with efficiencies in storage by way of deduplication, compression, thin provisioning and auto tiering of data, end users should get more bang for their buck. Also, with EMC easing up on the reins with pricing around Avamar and the low entry point of VNXe, this should help the case. Explosive data growth is again tackled with deduplication, compression, thin provisioning and auto tiering of data, but now also with more varied ways of dealing with large sums of data, with technologies such as Atmos, Greenplum and Isilon. Then there is the obvious acquisition of RSA to tie in with the security message, albeit that has had its challenges.

I was also recently introduced to the concept of a cloud architect certification track and the concept of a Data Scientist (god knows, but I’ll find out). So I went over to the proven professionals lounge and had a chat with the guys that developed the course. Essentially it gives a good foundation for steps to consider when architecting a company’s private cloud, around storage, virtualisation, networking and compute. If you’re expecting a consolidated course which covers the storage consolidate courseware, the Cisco DCNI2 and DCUCD courses and VMware Install, Configure, Manage, then think again, but it does set a good scene as an overlay to understanding these technologies. It also delves into some concepts around cloud service change management and control considerations and the concept of a cloud maturity model (essentially EMM, but more cloud specific). I had a crack at the practice exam and passed with 68%. Aside from not knowing the specific cloud maturity terms and EMC-specific cloud management jargon, anyone with knowledge of servers, Cisco Nexus and networking, plus virtualisation, shouldn’t have too many issues, but you may want to skim over the video training package.

There was also a nice shiny demo from the Virtual Geek Chad Sakac showing the new Ionix UIM 2.1 with vCloud integration, using CSC’s cloud service to demonstrate not only the various subsets of multi tenancy, but also mobility between disparate systems. When they integrate with public cloud providers such as Amazon EC2 and Azure, then things will really hot up, but maybe we need some level of cloud standards in place? But we all know the problem with standards: innovation gives way to bureaucracy and slows up.. Then again, with recent cloud provider issues, maybe it couldn’t hurt to enforce a bit of policy which allows the market to slow up a little and take a more considered approach to the public cloud scenario.. who knows?

Anyway.. watch this space..  more to come

Protocol considerations with VMware

A good video I came across from EMC discussing some storage protocol considerations when looking at VMware.

Unisphere.. yay !! what about Celerra Manager and Advanced Manager on the NX4 ?

Right, so EMC have got rid of the Basic and Advanced editions of Celerra Manager and replaced them with Unisphere. Fantastic! No more questions about what the difference is between the Basic and Advanced editions of Celerra Manager!! Naaay.. Interest has peaked on the Celerra NX4; EMC’s little unified storage box must be hitting a sweet spot as we’re getting lots of requests.. and this box still runs Celerra Manager.

So, what do you need to know about Celerra Manager when comparing the two editions?

The Advanced Edition gives you the ability to manage multiple Celerras – so if you are replicating between two Celerras, I would strongly suggest the Advanced Edition.

The Advanced Edition gives you more control of provisioning disk – the Basic Edition will automate management of how disks are carved up in order to present file systems and shares out to the network. A nice feature for the IT manager with not enough time on his hands to do this. But if you want to carve up metas, volumes and disks in a specific way to meet specific performance requirements, then you need the Advanced Edition to circumvent the Automated Volume Manager.

The Advanced Edition has an inbuilt migration tool called CDMS (Celerra Data Migration Service) – I would advise that this tool is reserved for only those who are well versed in Celerra and migrations. But effectively it offers migration capability for file data to Celerra with minimal downtime. If you are going to use this, make sure you know what you’re doing or engage an EMC partner.

Those are the important bits you need to know..     any further questions…    ask your EMC Partner

Interestingevan on

A few weeks ago, a few others and I were asked by Chris Mellor at The Register to provide our thoughts on whether replication could replace backup. Take a look at the below link to see the article :

Administration of Clariion with VMware… Getting easier

So, EMC released the NFS plugin for VMware to support storage administration tasks on Celerra from the VI Client a while back, which was very cool and had some very impressive features.. but what about the traditional SAN man?!

Well, yesterday EMC announced a VMware plugin for Clariion..

Product Overview

The EMC CLARiiON Plug-in for VMware simplifies storage administration between the VMware Virtual Center Server and CLARiiON storage systems. It offers end-to-end management of storage related tasks including provisioning of datastores, provisioning of raw device mapping (RDM) devices, and array-based virtual machine replication.

New Feature Summary 

The EMC CLARiiON Plug-in for VMware allows you to perform the following specific tasks directly from the VMware vSphere client:

  • Provision new datastores (VMFS volumes) or raw device mapping (RDM) volumes
  • Delete existing datastores backed by CLARiiON CX4 storage
  • Create virtual machine replicas using array-based replication services
  • Publish the replicated virtual machines to a View Manager


  • EMC CLARiiON Plug-in for VMware is customer-installable.

  • EMC CLARiiON Plug-in for VMware requires CX4 storage systems running Release 29 FLARE.

That’s all I have at the minute, but I will be picking the brains of the EMC bods as I go to get some more info.

Very useful feature though !!

Iomega ? Consumer only ?.. pfft, Me thinks not

It would appear that the acquisition of Iomega by EMC is paying dividends by way of cool tech being added to the Iomega range. So, as you may be aware, Iomega released their new IX12 NAS box earlier this month (see previous post for more info), which has many of the gubbins of “proper” NAS. What could this sub-£10k little box have that pips EMC and NetApp’s big enterprise boxes to the post? It has an Avamar agent installed in the NAS device!! Granted, if you don’t know what Avamar is, that previous statement may have been something of an anti-climax… Let me elaborate:

  • Typically, which type of data contains the most commonality?
  • Typically, which type of data consumes the most storage?
  • Which type of data takes the longest time to back up?

The answer to the question, my pedigree chums, is file data (in most cases, not all.. granted). So, Company X (the commercial division of the X-Men.. obviously) has a head office in London and a number of small regional branch offices dotted around the country. Each one of these offices is serving up user home directories and network drives from said Iomega IX12 (let’s say 4TB per office).. When it comes to backing those sites up, do they back it all up to tape or disk locally, taking up time and budget on a per site basis for their backups? Do they back it all up to disk, replicate data to a central site for DR and try and shove however many terabytes down a 100MB link, wondering why it takes sooo long? Nay.. After the first full backup they only back up the block level changes over the link to their central site, allowing them to negate the requirement to back up to disk locally at their smaller regional offices.. Bearing in mind that typically the daily rate of change on unstructured data is less than a percent, nightly backups can be done quick sharp and are treated as full backups when it comes to restore, so you don’t have to run through all your incremental backups to ensure you’re up to date.
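
For a sense of scale, here is a back-of-the-envelope sketch of the two transfer times. It rests on assumptions of mine: the "100MB link" is read as 100 Mbit/s, the link is fully available to the backup, terabytes are decimal, and 1% is used as an upper bound for the daily change rate. The name `transfer_hours` is just for this example.

```python
def transfer_hours(data_bytes, link_bits_per_second):
    """Hours to push data_bytes over the link at full line rate."""
    return data_bytes * 8 / link_bits_per_second / 3600


TB = 10**12               # decimal terabyte
LINK = 100 * 10**6        # ASSUMPTION: "100MB link" read as 100 Mbit/s

full_copy = 4 * TB                # 4TB per office, as in the example
daily_change = full_copy // 100   # ASSUMPTION: 1% as a ceiling on daily change

print(round(transfer_hours(full_copy, LINK), 1), "hours for the whole data set")
print(round(transfer_hours(daily_change, LINK), 1), "hours for the changed data only")
```

Even before any deduplication of the changed data, sending only the changes turns a multi-day transfer into something that fits comfortably in a nightly window.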

Not a bad bit of tin if you ask me..

Got Email Xtender ? Want SourceOne ? How do I move my mails ?

EMC released SourceOne some time ago as a replacement for Email Xtender and took a view to a co-existence model for existing Email Xtender estates. What does this mean? It means the Email Xtender archive stays in place and SourceOne simply reads from that archive for searching and shortcut resolution… Grand!! But that means you still need to keep EX running until retention runs its course on its archived mails. So.. EMC, in conjunction with a company called Transvault, have now developed a tool to actually migrate mails from EX to SourceOne.. much better!

It’s a bit new, so you won’t see it just yet. But ultimately it will be a service delivered by EMC only.

Watch this space anywho..    more on the way.

Transvault also offer mail archive migration services for a number of other Mail Archive products. So if you have an archive solution and you don’t like it..   don’t just lump it,  there are ways and means..

Dedupe your file data !! save our hard drives !!

Just a little video I put together showing file server consolidation (in a Blue Peter “here’s one I made earlier” style). 2 minutes, nothing too fancy.. just a bit of fun.

(best watched in full screen)

I do hope geek is the new chic …    because if not…   I feel dirty

Iomega/EMC’s new lovechild

Iomega first started life selling removable storage. The world marvelled at the might of the 200MB Zip drive, brought gifts of gold, frankincense and myrrh as offerings to the almighty Jaz drive and sacrificed livestock in awe of the Ditto drive (I exaggerate.. but bear with me, I’m setting the scene). Then, as removable storage media started to give way to the internet and USB drives became the standard for removable storage.. we started to see the likes of the Zip and Jaz drives fade away.

So..  out with the old, in with the new ? No..  Now Iomega have a massive play in the consumer space for external hard drives and networked storage. The upper end of the networked storage range was the IX4 (now on its second generation). A nice tidy box which would hold up to 8TB of raw capacity and fit well in a remote office environment, a home office, or even as a media server for your movies and music (all legitimately obtained of course). They even did a rackmount NAS device..  Brilliant !!

But what if you need a little more grunt… a bit more redundancy, scalability.. something more feature rich? Iomega/EMC are on the verge of releasing the IX12. This box fits nice and snug between the IX4-200R and EMC’s Celerra NX4; it supports up to 24TB of raw capacity, supports all the RAID types you’d ever want to use and has 4 Gigabit ports which can support up to 256 iSCSI initiators (servers) or 256 LUNs for block level access. All the other usual protocols still apply in the oh so familiar forms of CIFS, NFS, FTP, HTTP, etc. and there are even a few nice bells and whistles such as port aggregation, DFS, array based replication and WebDAV support for online collaboration. It also sports drive spin down (very cool if it’s being used as a backup to disk or archive target).

The IX12 has also been certified by a number of other vendors; it is obviously certified and on VMware’s Hardware Compatibility List for shared storage (and is also supported by a number of other virtualization vendors). Microsoft have verified that it will support Exchange 2010 mailstores for environments of up to 250 users.

It’s being stated by Iomega that these boxes are sitting in at between $5,000 and $10,000 list, so they will help EMC break even further into the lower SMB market. Personally, I think this box will play really well in spaces such as remote offices, graphic design organisations, departmental dedicated storage, backup to disk targets (admittedly it would be more compelling if it supported NDMP, but we’ll leave that to the big boys) and archive storage for the likes of EMC’s SourceOne, EV, Commvault, etc…

I’ll put together a more clear and concise post after the announcements to come, but I think Iomega could be onto a winner on this one..