EMC Avamar – Deduplication in Backup

With all the backup products on the market, how do you choose which product suits a given requirement? For this post, I shall introduce you to EMC Avamar. Avamar Technologies was acquired by EMC back in 2006, and its product makes backup more efficient via deduplication at block level.

Avamar is predominantly positioned today as an appliance, with the Avamar software pre-installed on a Dell PowerEdge 2950. In the same fashion as any other backup product, agents are deployed on the systems to be backed up; nothing new there. The intelligence comes into play where deduplication is concerned. Avamar agents keep track of blocks which have been backed up and only send changed blocks of data over the network. This has a few benefits:

  1. Capacity efficiency. If you only change a word in a previously backed-up document, you only back up the changed blocks, not the whole document again.
  2. Network utilisation. End users become accustomed to the fact that their network will be hit hard during a backup window. With products like VMware, server sprawl is rife and you can end up really hammering your network. With Avamar only backing up changed data, network utilisation during backups is dramatically reduced.
  3. Remote offices. Many companies have remote offices dotted around with piddly little links. Block-level changes will be significantly smaller than file-level incremental changes, so bandwidth issues aren't always as apparent with Avamar.
  4. Avamar plays best with customer data that has large commonalities (e.g. file data, OS library files, etc.). Fewer commonalities (e.g. database volumes, where the rate of change is greater) will mean a lower dedupe ratio.
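
To make the "only send changed blocks" idea concrete, here is a minimal sketch of hash-based block deduplication. It is not Avamar's implementation: the fixed 4 KB block size, the SHA-256 hash and the names `split_blocks`, `backup` and `seen_hashes` are all assumptions made for illustration, since this post does not describe how the agent actually chunks or fingerprints data.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, seen: set[str]) -> list[bytes]:
    """Return only the blocks whose hash has not been seen before.

    `seen` stands in for the agent's record of blocks already backed up.
    """
    to_send = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen:
            seen.add(digest)
            to_send.append(block)
    return to_send


seen_hashes: set[str] = set()
document = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
first = backup(document, seen_hashes)

# Change one byte inside the second block and back up again.
edited = document[:5000] + b"X" + document[5001:]
second = backup(edited, seen_hashes)

print(len(first), len(second))
```

The first backup sends every block; the second sends only the one block containing the edit, which is the behaviour behind benefits 1 to 3 above.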

 

Avamar appliances can be sold as single nodes (in which case you need a replicated pair of single nodes for EMC to support the solution) or as a RAIN (Redundant Array of Independent Nodes) solution, which works in much the same way that RAID does: you have a parity node, capacity nodes and a spare node.
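
For anyone unfamiliar with how parity lets you survive a failure, the sketch below shows the XOR scheme used by single-parity RAID, applied to nodes instead of disks. This post does not say how Avamar's RAIN actually computes parity, so treat XOR, the node contents and the helper `xor_blocks` as assumptions for illustration only.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical capacity nodes, each holding one equal-sized stripe.
capacity_nodes = [b"node-one", b"node-two", b"node-3!!"]
parity_node = xor_blocks(capacity_nodes)

# Simulate losing the second node and rebuilding it onto the spare
# from the surviving stripes plus parity.
survivors = [capacity_nodes[0], capacity_nodes[2], parity_node]
spare_node = xor_blocks(survivors)

print(spare_node == capacity_nodes[1])
```

The rebuilt stripe matches the lost one because XOR-ing the parity with every surviving stripe cancels them out and leaves only the missing data.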


If you come across an Avamar opportunity and want any level of accuracy in sizing the appliance required, these are the questions the reseller needs to be asking.

Capacity Questions

  • How much of the data is file data?
  • How much data is database data?
  • Is any data VMFS (VMware)? If so, how much?
  • How much data is email data?
  • Is there any mail archive data? If so, how much?

For each of the above, what are the following:

  • Number of daily backups being retained
  • Number of weekly backups being retained
  • Number of monthly backups being retained (I would advise not retaining more than 3 months of data, as it becomes a very costly solution)
  • What is the daily rate of change for each of the above (% approx.)?
  • What is the projected annual growth of data (% approx.)?
  • How many sites are being backed up?
  • How much data is being backed up per site?
  • What is the size of the link between sites?

Plus the obvious questions around how much they're looking to spend, since the smallest change in an Avamar config can have potentially large cost implications.
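
As a rough illustration of why the rate-of-change, growth and link-size answers matter, here is a back-of-envelope sketch. It is not EMC's sizing method, and every figure in it is made up; the names `DataSet`, `daily_change_gb`, `transfer_hours` and `projected_size_gb` are hypothetical. It ignores dedupe ratio, compression and protocol overhead, so it only gives a feel for the numbers.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float           # data being backed up
    daily_change_pct: float  # approximate daily rate of change, in %


def daily_change_gb(ds: DataSet) -> float:
    """Data changed per day, before any dedupe or compression."""
    return ds.size_gb * ds.daily_change_pct / 100


def transfer_hours(gigabytes: float, link_mbps: float) -> float:
    """Hours to move `gigabytes` over a `link_mbps` megabit/s link."""
    megabits = gigabytes * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / link_mbps / 3600


def projected_size_gb(size_gb: float, annual_growth_pct: float, years: int) -> float:
    """Compound the annual growth rate over a number of years."""
    return size_gb * (1 + annual_growth_pct / 100) ** years


# Made-up example site: the answers a customer might give.
site = [
    DataSet("file", size_gb=500, daily_change_pct=2),
    DataSet("database", size_gb=200, daily_change_pct=10),
]
changed = sum(daily_change_gb(ds) for ds in site)
hours = transfer_hours(changed, link_mbps=10)
future = projected_size_gb(sum(ds.size_gb for ds in site), annual_growth_pct=20, years=3)

print(f"{changed:.1f} GB changed per day, {hours:.1f} h over the link, "
      f"{future:.1f} GB after 3 years")
```

Even this crude model shows how a higher change rate on the database data, or a smaller link, quickly pushes the nightly transfer beyond a sensible backup window.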

See the video below for a more in-depth whiteboard session, courtesy of EMC:


About interestingevan

I work as a Technical Architect for a storage and virtualisation distributor in the UK called Magirus. The goal of this blog is simply to be a resource for people who want to learn about or go and sell storage. I'm a qualified EMC Clariion Technical Architect, Commvault Engineer and Cisco Unified Computing specialist. I have also worked with the rest of the EMC portfolio for a good few years. This blog will provide information on how specific technologies work, what questions need to be asked in order to spec certain products, competitive info and my two pence on some of these technologies. Please feel free to provide feedback on the content of this blog and on anything you'd like to see.

8 responses to “EMC Avamar – Deduplication in Backup”

  • cadence

    Hi,

    Hopefully I am talking to the right person as you mention you are a qualified EMC Clariion Technical architect and Commvault Engineer.

    I am attempting to architect a valid solution to back up a Celerra NS480 FC using Commvault, and I am getting a lot of confusion as soon as I mention the word Celerra rather than Clariion. Basically, a Celerra NS480 FC is a Clariion CX4 exposed. Can you point me in the right direction as to whom I should be talking to in order to determine valid supported backup solutions? Basically, I want to use snapshot technology from the Clariion and this appears to be a non-runner.

    • interestingevan

      Hi Bob,

      The Clariion backend of the NS480 uses SnapView to take point-in-time copies of production volumes provisioned via Fibre Channel. You should be able to use the SnapView Quick Recovery enabler from Commvault in conjunction with the ProxyHost agent for the appropriate operating system. Commvault will effectively call upon SnapView as the hardware snapshot provider rather than its own snapshotting tool. The link below should help:

      http://documentation.commvault.com/commvault/release_8_0_0/books_online_1/english_us/search/search.htm

      With regard to it being supported, Commvault happily support it as per the above link, and EMC only really care about the switches, HBAs, servers, firmware revisions and operating system of the attaching servers being supported (this can all be checked using a tool called the E-Lab Navigator, which you have access to via Powerlink if you are an EMC partner/customer: http://elabnavigator.emc.com ). If you're concerned about snaps and backups being application consistent, that depends on the applications having a supported iDataAgent to handle that. I would suggest engaging your reseller to assist with further qualifying the application backup strategy, and engaging with EMC / Commvault if you feel you need a more compelling comfort factor than one man and his blog. I hope this has helped somewhat.

  • interestingevan

    One question: are you planning on using NDMP for CIFS and/or NFS? If so, you may also want to read the link below, which talks through doing NDMP backups from file system checkpoints.

  • cadence

    Tks for the feedback.

    We are already using NDMP successfully, so CIFS and NFS are sorted. Because we are looking at a SAN environment (Celerra FC) at our production site and a NAS at our BCP site (Celerra), the Oracle backups need to be made at the RMAN level. So I think the plan now is:

    1. Put the production Oracle database, using ASM and LUNs from the Celerra FC, into hot backup mode.
    2. Clone the LUNs.
    3. Present the LUNs to a dedicated backup database to allow the following to occur:
       a. Take the backup load off the production database (remember, the backup has to flow through the RMAN interface for BCP compatibility).
    4. Back up the database through the iDataAgent so we can restore at the BCP site if required.

    So it looks like the whole concern about SnapView and the proxy host disappears. A POC will validate this fairly quickly.

    • interestingevan

      Hi Bob,

      It seems that Oracle RAC isn't supported with the Snap Backup feature of Commvault as of Simpana 8 SP3. See the snippet below from Commvault's tech reference library:

      “You are using an Oracle® Real Application Cluster (RAC) on a supported Linux® platform. You would like to use CommVault’s Snap Backup feature on this “shared-disk architecture”.
      Unfortunately, as of Simpana® 8.0 SP3, Oracle’s RAC architecture is not supported by the Snap Backup feature.

      In using Snap Backup on an EMC® Symmetrix or EMC CLARiiON storage array for a single Oracle Server (without RAC or ASM), we are simply placing the database in hot backup mode.

      [ASM is Automatic Storage Management which provides filesystem and volume manager capabilities built into the Oracle database kernel. It simplifies storage management tasks such as creating databases and diskspace management.]

      [Hot Backup Mode means a tablespace is checkpointed, the checkpoint SCN marker in the datafile headers cease to increment with checkpoints, and full images of changed database blocks are written to the redologs. The SCN is an internal number maintained by the database management system to log changes made to a database.]

      We execute the snapshot from the storage array, and then we re-enable standard operations on the Oracle database. We also recommend running Archive Log Backups to provide a mixed level of protection for complete point-in-time recovery by replaying the logs.”

      You could look at using something like EMC Replication Manager to automate the process of calling hot backup mode, flushing host buffers, and initiating and mounting the snap. You can then script it to call the backup software. I can send you a white paper which describes Replication Manager integration with Oracle RAC using ASM if you need it.

  • cadence

    I will take up your offer of the white paper.

    Tks
