Category Archives: NAS

EMC World 2011 – Las Vegas – day 1

So after the first day at EMC World, what marvels of technology have been announced?
What groundbreaking nuggets of geeky goodness are there to report? First things first: VPLEX! Looks like they may have cracked it.. active/active storage over synchronous distances. Geoclusters will never be the same again!! There was also a slightly ambiguous announcement around integration with open-source Hadoop (more to follow on that).

What was the message of the day, though? What was this year’s theme? This year EMC are talking about big data and the cloud. Clearly the recent acquisitions of Isilon and Greenplum have planted EMC’s head firmly back in the clouds. Greenplum gives end users the ability to scale out database architectures for data analytics to mammoth scale, using its distributed node architecture and massively parallel processing capabilities. To be frank, learning about the technology was borderline mind-numbing, but my god it’s a cool technology. Then we have scale-out NAS with Isilon and its OneFS system, which gives the ability to present massive NAS repositories and keep scaling them. So obviously, EMC are talking about big data.

I also had the opportunity to sit in on an NDA VNX/VNXe session and what they’re going to do is…. aaah, I’m not that stupid. Needless to say, there are some nice additions on the way: the usual thing with higher capacity, smaller footprint drives and getting more IO in less U space, but also some very cool stuff which will enable EMC to offer a much cheaper entry point for compliance-ready storage.. watch this space.

In true style, EMC threw out some interesting IDC-touted metrics, further justifying the need to drive storage efficiencies and reiterating the fact that there will always be a market for storage. So, our digital universe currently consists of 1.2 zettabytes of data, 90% of which is unstructured, and that figure is predicted to grow 44-fold over this decade. Also, 88% of Fortune 500 companies have to deal with botnet attacks on a regular basis and have to contend with 60 million malware variants. So, making this relevant, the 3 main pain points of end users are: firstly our age-old friend budget, then explosive data growth, and finally securing data.

So how have EMC addressed these? Well, budget is always a fun one to deal with, but with efficiencies in storage by way of deduplication, compression, thin provisioning and auto-tiering of data, end users should get more bang for their buck. Also, with EMC easing up on the reins with pricing around Avamar, and the low entry point of VNXe, this should help the case. Explosive data growth is again tackled with deduplication, compression, thin provisioning and auto-tiering of data, but also now with more varied ways of dealing with large volumes of data, with technologies such as Atmos, Greenplum and Isilon. Then there is the obvious acquisition of RSA to tie in with the security message, albeit that has had its challenges.

I was also recently introduced to the concept of a cloud architect certification track and the concept of a Data Scientist (god knows, but I’ll find out). So I went over to the proven professionals lounge and had a chat with the guys that developed the course. Essentially it gives a good foundation for the steps to consider when architecting a company’s private cloud, around storage, virtualisation, networking and compute. If you’re expecting a consolidated course which covers the storage consolidate courseware, the Cisco DCNI2 and DCUCD courses and VMware Install, Configure, Manage, then think again, but it does set a good scene as an overlay to understanding these technologies. It also delves into some concepts around cloud service change management and control considerations, and the concept of a cloud maturity model (essentially EMM, but more cloud specific). I had a crack at the practice exam and passed with 68%. Aside from not knowing the specific cloud maturity terms and EMC-specific cloud management jargon, anyone with knowledge of servers, Cisco Nexus and networking, plus virtualization, shouldn’t have too many issues, but you may want to skim over the video training package.

There was also a nice shiny demo from the Virtual Geek, Chad Sakac, showing the new Ionix UIM 2.1 with vCloud integration, using CSC’s cloud service to demonstrate not only the various subsets of multi-tenancy, but also mobility between disparate systems. When they integrate with public cloud providers such as Amazon EC2 and Azure, then things will really hot up, but maybe we need some level of cloud standards in place?… We all know the problem with standards: innovation gives way to bureaucracy and slows up… but then again, with recent cloud provider issues, maybe it couldn’t hurt to enforce a bit of policy which allows the market to slow up a little and take a more considered approach to the public cloud scenario.. who knows?

Anyway.. watch this space..  more to come

Protocol considerations with VMware

A good video I came across from EMC discussing some storage protocol considerations when looking at VMware.

Unisphere.. yay !! what about Celerra Manager and Advanced Manager on the NX4 ?

Right, so EMC have got rid of the Basic and Advanced editions of Celerra Manager and replaced them with Unisphere. Fantastic! No more questions about what the difference is between the Basic and Advanced editions of Celerra Manager!! Naaay.. Interest has peaked on the Celerra NX4; EMC’s little unified storage box must be hitting a sweet spot as we’re getting lots of requests.. and this box still runs Celerra Manager.

So, what do you need to know about Celerra Manager when comparing the two editions?

The Advanced Edition gives you the ability to manage multiple Celerras – so if you’re replicating between two Celerras, I would strongly suggest the Advanced Edition.

The Advanced Edition gives you more control of provisioning disk – the Basic Edition will automate management of how disks are carved up in order to present file systems and shares out to the network. A nice feature for the IT manager with not enough time on his hands to do this. But if you want to carve up metas, volumes and disks in a specific way to meet specific performance requirements, then you need the Advanced Edition to bypass the Automated Volume Manager.

The Advanced Edition has an inbuilt migration tool called CDMS (Celerra Data Migration Service) – I would advise that this tool is reserved for those who are well versed in Celerra and migrations. But effectively it offers migration capability for file data to Celerra with minimal downtime. If you are going to use this, make sure you know what you’re doing or engage an EMC partner.

Those are the important bits you need to know..     any further questions…    ask your EMC Partner

Iomega ? Consumer only ?.. pfft, Me thinks not

It would appear that the acquisition of Iomega by EMC is paying dividends by way of cool tech being added to the Iomega range. So, as you may be aware, Iomega released their new IX12 NAS box earlier this month (see previous post for more info), which has many of the gubbins of “proper” NAS. What could this sub-£10k little box have that pips EMC and NetApp’s big enterprise boxes to the post? It has an Avamar agent installed in the NAS device!!… Granted, if you don’t know what Avamar is, that previous statement may have been something of an anti-climax… Let me elaborate:

  • Typically which type of data contains the most commonality?
  • Typically which type of data consumes the most storage?
  • Which type of data takes the longest time to back up?

The answer to the question, my pedigree chums.. is file data (in most cases, not all.. granted). So, Company X (the commercial division of the X-Men.. obviously) has a head office in London and a number of small regional branch offices dotted around the country. Each one of these offices is serving up user home directories and network drives from said Iomega IX12 (let’s say 4TB per office). When it comes to backing those sites up, do they back it all up to tape or disk locally, taking up time and budget on a per-site basis? Do they back it all up to disk, replicate the data to a central site for DR and try to shove however many terabytes down a 100Mb link, wondering why it takes sooo long? Nay.. After the first full backup they only back up the block-level changes over the link to their central site, which removes the need to back up to disk locally at the smaller regional offices. Bearing in mind that the daily rate of change on unstructured data is typically less than a percent, nightly backups can be done quick sharp, and they are treated as full backups when it comes to restore, so you don’t have to run through all your incremental backups to ensure you’re up to date.
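To put some rough numbers on that, here is a quick back-of-an-envelope sketch. It assumes the link is 100 megabits per second, that 1% of the 4TB changes each day (the upper end of "less than a percent"), and it ignores protocol overhead and any dedupe or compression savings, so treat it as an illustration rather than a sizing tool.

```python
def transfer_hours(data_bytes: float, link_bits_per_sec: float) -> float:
    """Hours needed to push data_bytes over a link, ignoring protocol overhead."""
    return data_bytes * 8 / link_bits_per_sec / 3600

TB = 10**12
site_data = 4 * TB          # 4TB per office, as in the example above
link = 100 * 10**6          # assumption: a 100 megabit-per-second WAN link
daily_change = 0.01         # assumption: 1% of the data changes per day

full_hours = transfer_hours(site_data, link)
incremental_hours = transfer_hours(site_data * daily_change, link)

print(f"Full copy of one site: {full_hours:.1f} hours")
print(f"One night of changes:  {incremental_hours * 60:.0f} minutes")
```

Under those assumptions the full copy takes the best part of four days per site, while a night's worth of changed blocks fits comfortably inside an hour.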

Not a bad bit of tin if you ask me..

Dedupe your file data !! save our hard drives !!

Just a little video I put together showing file server consolidation (in a Blue Peter “here’s one I made earlier” style). 2 minutes, nothing too fancy.. just a bit of fun.

(best watched in full screen)

I do hope geek is the new chic …    because if not…   I feel dirty

Iomega/EMC’s new lovechild

Iomega first started life selling removable storage. The world marvelled at the might of the 100MB Zip drive, brought gifts of gold, frankincense and myrrh as offerings to the almighty Jaz drive and sacrificed livestock in awe of the Ditto drive (I exaggerate.. but bear with me, I’m setting the scene). Then, as removable storage media started to give way to the internet and USB drives became the standard for removable storage.. we started to see the likes of the Zip and Jaz drives fade away.

So.. out with the old, in with the new? No.. Now Iomega have a massive play in the consumer space for external hard drives and networked storage. The upper end of the networked storage range was the IX4 (now on its second generation): a nice tidy box which would hold up to 8TB of raw capacity and fit well in a remote office environment, a home office, or even as a media server for your movies and music (all legitimately obtained of course). They even did a rackmount NAS device.. Brilliant!!

But what if you need a little more grunt… a bit more redundancy, scalability.. something more feature rich? Iomega/EMC are on the verge of releasing the IX12. This box fits nice and snug between the IX4-200R and EMC’s Celerra NX4; it supports up to 24TB of raw capacity, supports all the RAID types you’d ever want to use and has 4 Gigabit ports which can support up to 256 iSCSI initiators (servers) or 256 LUNs for block-level access. All the other usual protocols still apply in the oh-so-familiar forms of CIFS, NFS, FTP, HTTP, etc., and there are even a few nice bells and whistles such as port aggregation, DFS, array-based replication and WebDAV support for online collaboration. It also sports drive spin-down (very cool if it’s being used as a backup-to-disk or archive target).

The IX12 has also been certified by a number of other vendors; it is obviously certified and on VMware’s Hardware Compatibility List for shared storage (and is also supported by a number of other virtualization vendors). Microsoft have verified that it will support Exchange 2010 mail stores for environments of up to 250 users.

It’s being stated by Iomega that these boxes are sitting in at between $5,000 and $10,000 list, so they will help EMC break even further into the lower SMB market. Personally, I think this box will play really well in spaces such as remote offices, graphic design organisations, departmental dedicated storage, backup-to-disk targets (admittedly it would be more compelling if it supported NDMP, but we’ll leave that to the big boys) and archive storage for the likes of EMC’s SourceOne, EV, Commvault, etc…

I’ll put together a more clear and concise post after the announcements to come, but I think Iomega could be onto a winner on this one..

Building Blocks of a Vblock

Seeing as lots of people are asking lots of questions around EMC, VMware and Cisco’s Vblock, I thought I’d best dig something out. Attached is a very concise, granular document which outlines the different elements of a Vblock, how the disks are configured, supported Vblock applications and… some pretty pictures for your delectation.


The below clip is the Cisco Vice President talking about the various Vblock packages.

Managing Celerra from VMware

EMC of late have been very good at increasing the level of integration between their storage and the VMware platform. First it was DRS integrating with EMC QoS Manager, then the ability to view which VMs reside on SAN storage from within CLARiiON’s Navisphere Manager, then Replication Manager was pulled into line to facilitate machine-consistent snaps/replication with VMware using their VMFS proxy.

All very cool stuff, but now EMC are pulling the ability to manage storage out of EMC’s Celerra platform and into VMware’s VI Client. As of release 5.6.48 of DART (DART is the firmware/OS for Celerra), you will be able to create and manage NFS exports from within VMware and perform the following actions:

Create an NFS file system and mount to ESX systems to be used as a VMware data store. File systems created with the plug-in will be automatically deployed with EMC and VMware best practices, including Celerra Virtual Provisioning.

Data store (NFS file system) extension extends the Celerra file system that is exported to the ESX cluster.

Compress the VMware VMDK files associated with a virtual machine, a folder (of multiple virtual machines), an ESX server, or a whole data store. Decompress a previously compressed VMDK or set of VMDK files.

Full Clone—Make full copies of virtual machine VMDK files using new Celerra-based functionality

Fast Clone—Make thin copies of virtual machine VMDK files instantly using new Celerra NFS file-based snap functionality



Below is a very good video demonstration provided by the one and only virtualgeek, Mr Chad Sakac, demonstrating the feature.

Celerra Dedupe… How does it work ?!

I’m getting a lot of questions about how EMC Celerra deduplication works. As deduplication is becoming ever more relevant in the market, I thought I’d best address it.

So what is deduplication? We know it’s the elimination of duplicates.. but how is this done in storage? All we’re doing is taking a thing (a file or a block of data, depending on the type of dedupe deployed) and hashing this “thing” (in most cases using SHA-1), which generates a unique fingerprint based on the 1’s and 0’s of that “thing”. When a “thing” is written to disk, we hash it and check the fingerprint: if the fingerprint already exists, we don’t store the data again, we just point to the pre-existing identical “thing”; if it doesn’t exist, we write the data to disk and store a new fingerprint for future things to be pointed at. End result.. surprise surprise.. storage savings!!

Apologies for the excessive use of the word “thing”…  A necessary evil.
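For those who prefer code to “things”, here is a minimal sketch of that hash-and-point idea using Python’s standard hashlib. The DedupeStore class and the file names are made up for illustration; real products keep fingerprints and data on disk and do far more work around integrity and collisions.

```python
import hashlib

class DedupeStore:
    """Toy content-addressed store: one stored copy per unique fingerprint."""

    def __init__(self):
        self.blobs = {}      # fingerprint -> the single stored copy of the data
        self.pointers = {}   # name -> fingerprint that name points at

    def write(self, name: str, data: bytes) -> bool:
        """Store data under name. Returns True only if new data hit 'disk'."""
        fingerprint = hashlib.sha1(data).hexdigest()
        is_new = fingerprint not in self.blobs
        if is_new:
            self.blobs[fingerprint] = data       # first time we've seen this
        self.pointers[name] = fingerprint        # duplicates just get a pointer
        return is_new

    def read(self, name: str) -> bytes:
        return self.blobs[self.pointers[name]]

store = DedupeStore()
report = b"quarterly figures" * 1000
store.write("alice/report.doc", report)
store.write("bob/report-copy.doc", report)       # identical, so pointer only
store.write("bob/notes.txt", b"something else")

logical = sum(len(store.read(name)) for name in store.pointers)
physical = sum(len(blob) for blob in store.blobs.values())
print(f"logical: {logical} bytes, physical: {physical} bytes")
```

Three names are written but only two blobs are stored, so the physical footprint is roughly half of what the users think they have saved.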

Firstly, let’s look at the different kinds of deduplication which are deployed out in the market today. There are a few aspects we need to consider when looking at dedupe: where the hashing and checking occurs, at what point the dedupe process takes place, and the level of granularity of the various types of deduplication.

Where is deduplication Handled (hashing/checking)

Dedupe at Source

We have dedupe at source, where the block deltas are tracked on the client side by an agent. This is currently deployed in the shape of Avamar by EMC and is used for backup to maximise capacity and minimise LAN/WAN traffic (see previous post on Avamar). I believe Commvault may also be making a play for this in Simpana 9.
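As a rough illustration of why source-side dedupe saves so much WAN traffic, here is a generic sketch: the client hashes its blocks locally, asks the server which fingerprints it has never seen, and ships only those. This is not Avamar’s actual protocol; the BackupServer class, the client_backup function and the fixed 4 KiB block size are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed 4 KiB blocks, chosen only for illustration

def split_blocks(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

class BackupServer:
    """Hypothetical central store that keeps one copy of each block."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> block data
        self.backups = {}    # backup name -> ordered list of fingerprints

    def missing(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.blocks}

    def commit(self, name, recipe, new_blocks):
        self.blocks.update(new_blocks)
        self.backups[name] = recipe

    def restore(self, name) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.backups[name])

def client_backup(server, name, data) -> int:
    """Hash locally and send only blocks the server lacks. Returns bytes sent."""
    blocks = split_blocks(data)
    recipe = [hashlib.sha1(block).hexdigest() for block in blocks]
    wanted = server.missing(recipe)
    payload = {fp: block for fp, block in zip(recipe, blocks) if fp in wanted}
    server.commit(name, recipe, payload)
    return sum(len(block) for block in payload.values())

server = BackupServer()
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))   # 10 distinct blocks
tuesday = monday[:3 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + monday[4 * BLOCK_SIZE:]

sent_monday = client_backup(server, "monday", monday)
sent_tuesday = client_backup(server, "tuesday", tuesday)        # one block changed
print(f"Monday sent {sent_monday} bytes, Tuesday sent {sent_tuesday} bytes")
```

Note that Tuesday’s backup still restores as a complete copy, because the server keeps a full list of fingerprints for every backup even though only one block crossed the wire.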

Deduplication at target

This simply means that dedupe is handled at the storage target. This is pretty common, and is used by the likes of Data Domain, EMC Celerra, Quantum, etc.. the list goes on.

When does deduplication Occur


Inline

Data is handled immediately and deduplicated as part of the process of writing it to disk. This is not so common because, unless it’s done very well, there can be a lot of latency involved, as the deduplication process has to take place before a write is committed to disk. Data Domain do this and they do it very well. Their process uses a system called SISL, where write performance relies on CPU power rather than spindle count. Fingerprints are stored in memory, so that when data is written to the device, the fingerprint lookups are handled in memory and the CPU power determines the speed of the hashing process. If it doesn’t find a fingerprint in memory, it will look for it on disk, but upon finding it will pull up a shed load of fingerprints with it which relate to the same locality of data (kinda similar to cache prefetching), so sequential writes can again reference fingerprints from memory, not disk.

Want more info on this.. see attached (DataDomain-SISL-Whitepaper).
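To show why that locality trick matters, here is a toy model of the idea. It is emphatically not Data Domain’s implementation: the LocalityFingerprintCache class, the container size of 4 and the string fingerprints are all invented for the example. The only thing it borrows from the description above is "on a miss, go to disk once and drag the neighbouring fingerprints into memory".

```python
class LocalityFingerprintCache:
    """Toy model: fingerprints are kept on 'disk' in containers, grouped in
    the order they were first written. A memory miss that is found on disk
    loads the whole container, so neighbouring lookups are served from RAM."""

    def __init__(self, container_size=4):
        self.container_size = container_size
        self.containers = []     # on-"disk" containers, each a list of fingerprints
        self.disk_index = {}     # fingerprint -> container number (also on "disk")
        self.memory = set()      # fingerprints currently held in RAM
        self.disk_lookups = 0

    def store_new(self, fp):
        if not self.containers or len(self.containers[-1]) == self.container_size:
            self.containers.append([])
        self.containers[-1].append(fp)
        self.disk_index[fp] = len(self.containers) - 1

    def is_duplicate(self, fp):
        if fp in self.memory:
            return True                      # cheap in-memory hit
        self.disk_lookups += 1               # expensive trip to disk
        container_no = self.disk_index.get(fp)
        if container_no is None:
            return False
        self.memory.update(self.containers[container_no])   # prefetch neighbours
        return True

cache = LocalityFingerprintCache(container_size=4)
stream = [f"fp{i}" for i in range(8)]        # stand-ins for real SHA-1 fingerprints

# First backup: everything is new, so every fingerprint gets stored.
for fp in stream:
    if not cache.is_duplicate(fp):
        cache.store_new(fp)

# Second backup of the same data, starting with nothing cached in memory.
cache.disk_lookups = 0
duplicates = sum(cache.is_duplicate(fp) for fp in stream)
print(f"{duplicates} duplicates found using {cache.disk_lookups} disk lookups")
```

Because the second backup arrives in the same order as the first, each disk lookup pulls in a container full of fingerprints that are about to be asked for, so most checks never touch a spindle.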

Post Process

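Data is written to disk first and deduplicated afterwards by a scheduled or background process. This is the most common approach, as most vendors can’t handle inline dedupe as efficiently as Data Domain.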

Level of Deduplication Granularity

File Level Dedupe

File level dedupe is where an entire file is hashed and referenced. Also known as single instancing, this is not as efficient as block level dedupe, but requires less processing power. You may be familiar with this technology from the likes of EMC Centera or Commvault’s SiS Single instancing from Simpana 7.

Fixed Block Dedupe

This is hashing individual blocks of data in a data stream and is much more efficient than file-level dedupe, although it requires a fair amount more processing power.

Variable block size dedupe

This is essentially where the size of the blocks being hashed can vary. The benefits of this for file data are minimal. It is best placed where there are multiple data sources in heterogeneous environments, or environments where data may be misaligned (i.e. B2D data or VTL). Data Domain do this… and inline.. which is impressive.
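The misalignment point is easier to see with a small experiment. The sketch below compares fixed 64-byte blocks with a very simple content-defined chunker that cuts wherever a CRC of the last 16 bytes has its low bits at zero. The window size, the mask and the use of zlib.crc32 are assumptions picked to keep the example short; real products use proper rolling hashes plus minimum and maximum chunk sizes.

```python
import random
import zlib

def fixed_chunks(data: bytes, size: int = 64):
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F):
    """Cut after any byte where a checksum of the last `window` bytes has its
    low bits all zero, so boundaries follow the content, not the offset."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])          # whatever is left after the last cut
    return chunks

def shared(a, b) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))

rng = random.Random(7)
original = bytes(rng.getrandbits(8) for _ in range(4096))
shifted = b"X" + original    # one extra byte at the front misaligns every block

fixed_o, fixed_s = fixed_chunks(original), fixed_chunks(shifted)
var_o, var_s = variable_chunks(original), variable_chunks(shifted)

print(f"fixed:    {shared(fixed_o, fixed_s)} of {len(set(fixed_o))} chunks still match")
print(f"variable: {shared(var_o, var_s)} of {len(set(var_o))} chunks still match")
```

With fixed blocks, a single inserted byte shifts every block boundary and the fingerprints stop matching. With content-defined boundaries only the chunk containing the insert changes, which is why variable block dedupe copes so much better with backup streams that don’t line up from one run to the next.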

EMC Celerra uses file-level dedupe and compression, with a post-process mechanism. When you specify that you wish to enable dedupe on a file system, you also specify a policy of file types and/or files of a certain age which qualify for dedupe. Celerra then periodically scans the appropriate file system(s) for files which match the policy criteria, compresses them, hashes them and moves them to a specific portion of the file system (transparent to the user). When the next scan runs and finds new data which meets the policy criteria, it compresses and hashes those files too, then compares their hashes against those of previously stored files. If a file already exists.. it doesn’t get stored (it just points to the existing original file); if it doesn’t.. it gets stored… simples. As there will most likely be a fair few duplicate files in user home directories, you should see a fair number of commonalities which qualify for dedupe in many environments, and with compression also being used, this helps make the best use of the available storage on your Celerra.
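The scan-compress-hash-point cycle described above can be sketched in a few lines. This is only a model of the behaviour as I’ve described it, not how DART does it internally: the PostProcessDedupe class, the 30-day age threshold, the extension list and the choice to require both age and file type are all assumptions for the example.

```python
import hashlib
import zlib
from dataclasses import dataclass

@dataclass
class File:
    path: str
    data: bytes
    age_days: int

class PostProcessDedupe:
    """Toy policy-driven, file-level dedupe with compression."""

    def __init__(self, min_age_days=30, extensions=(".doc", ".pdf")):
        self.min_age_days = min_age_days     # assumption: example threshold
        self.extensions = extensions         # assumption: example file types
        self.hidden_store = {}               # fingerprint -> compressed file data
        self.stubs = {}                      # path -> fingerprint it points at

    def qualifies(self, f: File) -> bool:
        return f.age_days >= self.min_age_days and f.path.endswith(self.extensions)

    def scan(self, files):
        """One periodic scan: compress, hash, then store or just point."""
        for f in files:
            if f.path in self.stubs or not self.qualifies(f):
                continue
            compressed = zlib.compress(f.data)
            fingerprint = hashlib.sha1(compressed).hexdigest()
            self.hidden_store.setdefault(fingerprint, compressed)  # store once
            self.stubs[f.path] = fingerprint                       # always point

    def read(self, path: str) -> bytes:
        return zlib.decompress(self.hidden_store[self.stubs[path]])

budget = b"FY11 budget spreadsheet " * 500
files = [
    File("home/alice/budget.doc", budget, age_days=90),
    File("home/bob/budget.doc", budget, age_days=45),     # duplicate of Alice's
    File("home/carol/budget.doc", budget, age_days=2),    # too new for the policy
    File("home/dave/song.mp3", b"\x00" * 1000, age_days=400),  # wrong file type
]

dedupe = PostProcessDedupe()
dedupe.scan(files)
stored = sum(len(blob) for blob in dedupe.hidden_store.values())
print(f"{len(dedupe.stubs)} files deduped into {len(dedupe.hidden_store)} stored copy, {stored} bytes")
```

Alice’s and Bob’s copies end up sharing a single compressed blob, while Carol’s recent copy and Dave’s MP3 are left alone until they meet the policy (or never, in Dave’s case).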

More information in an EMC white paper on the subject here.

And an online demo from EMC below.