A good video I came across from EMC discussing some storage protocol considerations when looking at VMware.
Tag Archives: NAS
It would appear that the aquisition of Iomega by EMC is paying is dividends by way of cool tech being added to the Iomega Range. So, as you may be aware Iomega released their new IX12 NAS box earlier this month (see previous post for more info) , which has many of the gubbins of “proper” NAS. What could this Sub £10k little box have that pips EMC and Netapps big enterprise boxes to the post ? It has an Avamar agent installed in the NAS device !!… Granted, if you don’t know what avamar is, that previous statement may have been something of an anti-climax… Let me elaborate:
- Typically what type of data contain the most commonalilty?
- Typically which type of data consumes the most storage ?
- Which type of data takes the longest time to backup ?
The answer to the question my pedigree chums.. is file data (in most cases, not all.. granted). So, Company X (The commercial division of the Xmen.. obviously), has a head office in London and a number of regional small branch offices dotted around the country. Each one of these offices is serving up user home directories and network drives from said Iomega IX12 (lets say 4TB per office).. When it comes to backing those sites up; do they back it all up to tape or disk locally, taking up time and budget on a per site basis for their backups ? Do they back it all up to disk, replicate data to a central site for DR and try and shove how ever many terrabytes down a 100MB link wondering why it takes sooo long ? nay.. After a the first full backup they only backup the block level changes over the link to their central site , allowing them to negate the requirement to backup to disk locally on their smaller regional offices.. bearing in mind that typically the daily rate of change on unstructured data is less than a percent.. nightly backups can be done quick sharp and are treated as full backups when it comes to restore, so you don’t have to run through all your incremental backups to ensure you’re up to date.
Not a bad bit of tin if you ask me..
Just a little video I put together showing file server consolidation (in a blue peter here’s one I made earlier style). 2 minutes, nothing too fancy.. just a bit of fun.
(best watched in full screen)
I do hope geek is the new chic … because if not… I feel dirty
Iomega first started life selling removable storage. The world marvelled at the might of the 200MB Zip drive, brought gifts of gold , frankincense and murr as offerings to the almighty Jazz drive and sacrificed livestock in awe of the the Ditto Drive (I exagerate.. but bear with me, I’m setting the scene). Then, as removable storage media started to give way to internet and USB drives became the standard for removable storage.. we started to see the likes of the zip and jazz drive fade away.
So.. out with the old, in with the new ? No.. Now Iomega have a massive play in the consumer space for External Hard drives and networked storage. The upper end of the networked storage range was the IX4 (now on its second generation). A nice tidy box which would hold up to 8TB of RAW capacity and fit well in a remote office environment, home office, even as a media server for your movies and music (all legitimately obtained of course). They even did a rackmount NAS device.. Brilliant !!
But what if you need a little more grunt… a bit more redundancy, scalability.. something more feature rich. Iomega/EMC are on the verge of releasing the IX12. This box fits nice and snug between the IX4-200R and EMC’s Celerra NX4; it supports up to 24TB of RAW capacity, supports all the RAID types you’d ever want to use and has 4 Gigabit ports which can support up to 256 iSCSI initiators (servers) or 256 LUN’s for block level access. All the other usual protocols still apply in the oh so familiar forms of CIFS, NFS, FTP, HTTP, etc and there are even a few nice bells and whistles such as port aggregation, DFS, array based replication, WebDav Support for online collaboration and it also sports drive spin down (very cool if its being used for a backup to disk or archive target).
The IX12 has also been certified by a number of other vendors; it is obviously certified and on VMwares Hardware compatibility List for shared storage (also supported by a number of other virtualization vendors). Microsoft have verified that it will support Exchange 2010 Mailstores for environments of up to 250 users.
Its being stated by Iomega that these boxes are sitting in at between $5,000 and $10,000 list, so will help EMC break even further into the lower SMB market. Personally, I think this box will play really well in spaces such as remote office, graphic design organisations, departmental dedicated storage, backup to disk targets (admittedly would be more compelling if it supported NDMP, but we’ll leave that to the big boys), archive storage for the likes of EMC’s SourceOne, EV, Commvault, etc…
I’ll put together a more clear and concise post after the announcements to come, but I think Iomega could be onto a winner on this one..
So what is deduplication ? we know its the elimination of duplicates.. but how is this done in storage ? All we’re doing is taking a thing (file or block of data , depending on the type of dedupe deployed), hashing this “thing” (in most cases using SHA1), a unique fingerprint is generated based on the 1′s and 0′s of that “thing”. So when a “thing” is written to disk, upon hashing said “thing”, if the generated fingerprint already exists.. we don’t store it to disk , we just point to the pre-existing identical ”thing”, if it doesn’t exist then we write it to disk and store a new fingerprint for future things to be pointed at. End result… suprise suprise.. Storage savings !!
Apologies for the excessive use of the word “thing”… A necessary evil.
Firstly, lets look at the different kinds of deduplication which are deployed out in the market today. There are a few aspects we need to consider when looking at dedupe. Where hashing and checking occurs, at what point the dedupe process takes place and the level of granularity of various types of deduplication.
Where is deduplication Handled (hashing/checking)
Dedupe at Source
We have dedupe at source (where the block delta’s are tracked on the client side in the form of an agent). This is currently deployed in the shape of Avamar by EMC and is used for backup to maximise capacity and minimise LAN/WAN traffic (see previous post on avamar). I believe Commvault may also be making a play for this in Simpana 9.
Deduplication at target
Simply means that dedupe is handled at the Storage Target. This is pretty common. Used by the likes of Data Domain, EMC Celerra, Quantum, etc.. this list goes on.
When does deduplication Occur
Data is handled immediately and deduplicated as part of the process of writing data to disk. This is not so common, as unless its done very well, there can be alot of latency involved due to the deduplication process having to take place before a write is commited to disk. Data Domain do this and they do it very well. Their process uses a system called SISL, where write performance relys on CPU power rather than spindle count. Fingerprints are stored in memory, so that when data is written to the device, the fingerprint lookups are handled in memory and the CPU power determines the speed of the hashing process. If it doesn’t find a fingerprint in memory, it will look for it on disk, but upon finding it will pull up a shed load of fingerprints with it which relate to the same locality of data (kinda similar to cache prefetching.), so sequential writes can again reference fingerprints from memory not disk.
Want more info on this.. see attached (DataDomain-SISL-Whitepaper).
This is most common as most people can’t handle inline dedupe as efficiently as Data Domain.
Level of Deduplication Granularity
File Level Dedupe
File level dedupe is where an entire file is hashed and referenced. Also known as single instancing, this is not as efficient as block level dedupe, but requires less processing power. You may be familiar with this technology from the likes of EMC Centera or Commvault’s SiS Single instancing from Simpana 7.
Fixed Block Dedupe
This is hashing individual blocks of data in a data stream and is much more efficient than file level dedupe. Although it incurs a fair amount more processing power.
Variable block size dedupe
This is essentially where the size of the blocks being hashed can be variable in size. The benefits of this for file data is minimal. This is best placed when there are multiple data sources in heterogenous environments or environments where data may be misaligned (ie, B2D data or VTL). Data Domain do this… and inline.. which is impressive.
EMC Celerra uses File level dedupe and Compression, it also uses a post process mechanism. So, when specify that you wish to enable dedupe on a file system, you also specify a policy of file types and/or files of a certain age which qualify for dedupe. It then periodically scans the appropriate file system(s) for files which match the policy criteria, compresses them, hashes them and moves them to a specific portion of the file system (transparent to the user), when the next scan runs and it finds new data which meets the policy criteria, it will compress them and hash them, then it will look at the hashes of previously stored files. If a file exists.. it doesn’t get stored (just points to the existing original file), if it doesn’t.. it gets stored… simples. The fact that there will most likely be a fair few duplicate files in user home directories, means that you should see a fair number of commonalities which qualify for dedupe in many environments and with compression also being used, will assist in making the best usage of available storage on your Celerra.
More information in an EMC white paper on the subject here.
and an online demo from emc below.