What is a SAN? A SAN is a storage area network, whose sole purpose is to provide dedicated storage to a server environment at block level. A SAN provides a central point of management for server storage, flexibility in how that storage is managed, and it addresses the problem of under-utilised pools of storage which you get when giving direct-attached storage to servers on a one-to-one basis.
With the ever-increasing interest in virtualisation technologies such as VMware, virtualisation is driving more and more storage opportunities, as centralised storage is a must-have requirement for functionality such as High Availability and VMware features like VMotion.
So, you’ve established an opportunity; Mr Customer tells you they want to buy a SAN. What next?
First things first, why do they want a SAN? Typical reasons would be:
- They have direct-attached storage for each server and it’s a nightmare to manage.
- They’re implementing (or have implemented) a virtualisation technology and need a SAN to unlock features around high availability, load balancing, etc.
- They have an old SAN which is obsolete and simply won’t scale to the capacity or performance they need.
- Or our favourite: they bought a SAN based on price alone and their IT systems are suffering!
So the first step is a data-gathering exercise; you really need to glean the following information:
What applications will they be giving storage to?
Performance requirements of these applications (based on perfmon/iostat figures if possible).
Remember that different types of drives have different performance ratings. SATA drives, being higher-density, lower-performance drives, are suited to sequential data (i.e. file data, backup-to-disk, archive, streaming).
Applications which will be accessing storage with a more random data profile (i.e. SQL Server, Exchange, Oracle, databases in general) will usually require something with a bit more grunt, such as Fibre Channel or SAS drives, which typically come in 10K RPM and 15K RPM flavours.
The rule of thumb is more drives = more performance (there is a science to this, so understanding the applications is key). Sometimes a good indicator is to look at a given server and the number of direct-attached drives it’s using; if it’s performing as it should, then ensure it has drives which will deliver equal or greater IOPS in the SAN to meet its performance requirements. Also, try to keep data with a sequential profile on separate disks from data with a random profile.
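To make the rule of thumb concrete, here is a minimal sizing sketch in Python. The per-drive IOPS figures and the RAID-5 write penalty are common ballpark assumptions of mine, not vendor-quoted numbers, so treat the output as a first-pass estimate only:

```python
# Rough drive-count estimate for a target workload.
# Per-drive IOPS figures below are ballpark assumptions, not vendor specs.
DRIVE_IOPS = {"sata_7k": 80, "fc_10k": 130, "fc_15k": 180}
RAID5_WRITE_PENALTY = 4  # each host write costs ~4 back-end IOs on RAID-5

def drives_needed(read_iops, write_iops, drive_type):
    """Estimate spindles needed to serve a host workload."""
    backend_iops = read_iops + write_iops * RAID5_WRITE_PENALTY
    per_drive = DRIVE_IOPS[drive_type]
    # Round up: you can't buy a fraction of a drive.
    return -(-backend_iops // per_drive)

# e.g. a database doing 1000 reads/s and 500 writes/s on 15K FC drives
print(drives_needed(1000, 500, "fc_15k"))  # 17 drives
```

The same workload on SATA would need roughly twice as many spindles, which is exactly why drive choice matters for random-profile data.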
What is the current capacity requirement, and what is the expected growth over three years?
How many LUNs (volumes) will be required for the customer environment?
How many servers will be attaching to the SAN?
What kind of IOPS/bandwidth load will each server be putting on the storage network?
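For the capacity growth question above, a simple compound-growth projection is usually enough for a first pass; the starting capacity and growth rate in the example are hypothetical:

```python
def capacity_after_growth(current_tb, annual_growth_pct, years=3):
    """Project capacity with compound annual growth."""
    return current_tb * (1 + annual_growth_pct / 100) ** years

# e.g. 10 TB today, growing 30% per year
print(round(capacity_after_growth(10, 30), 1))  # 22.0 TB after 3 years
```
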
Most customers will want a redundant infrastructure, so make sure you eliminate single points of failure from the environment. This means ideally two NICs or HBAs per server, two switches, failover/multipathing software, etc. With this in mind, make sure that you size the number of ports required on fibre/layer 2 switches accordingly (taking into account ISLs).
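As a rough illustration of that port-sizing exercise, here is a hypothetical per-fabric port count for a dual-switch design; the array-port, ISL and headroom figures are assumptions to adjust per environment:

```python
import math

def ports_per_switch(servers, array_ports_per_fabric=2, isl_ports=2,
                     headroom_pct=20):
    """Ports needed on each of the two fabric switches: one HBA per
    server per fabric, plus array ports, ISLs, and growth headroom."""
    used = servers + array_ports_per_fabric + isl_ports
    return math.ceil(used * (1 + headroom_pct / 100))

# 12 servers: (12 + 2 + 2) * 1.2 = 19.2 -> 20 ports per switch
print(ports_per_switch(12))
```
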
Local recovery is achieved by taking a point-in-time or full copy of a given volume within the storage array, so if data is corrupted, a previous non-corrupt version of the data from a prior point in time can be utilised.
If doing this, ensure the appropriate capacity is accounted for on the storage array:
If using clones, then however many clones are in use will each require the full capacity of the source volumes they are associated with. If snapshots are being used, then each point-in-time copy ideally requires 2x the amount of changed data on the source volume during the life of the copy. Normally point-in-time copies only exist for a number of hours, until the next point-in-time image is taken. It’s advised that volumes used for snapshotting reside on separate drives from their respective source volumes, as there is an IO overhead involved in snapshotting.
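The clone and snapshot reserve rules above can be sketched as follows; the volume size and change rate in the example are hypothetical:

```python
def clone_reserve_gb(source_gb, clones):
    """Clones are full copies: each one needs the source's full capacity."""
    return source_gb * clones

def snapshot_reserve_gb(source_gb, change_rate_pct):
    """Rule of thumb above: reserve ~2x the data expected to change
    during the life of each point-in-time copy."""
    return source_gb * (change_rate_pct / 100) * 2

# 500 GB source volume, 2 clones, ~10% change per snapshot interval
print(clone_reserve_gb(500, 2))       # 1000 GB for the clones
print(snapshot_reserve_gb(500, 10))   # 100 GB snapshot reserve
```
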
If the customer wants to implement a remote DR solution, customer expectations of what is achievable within a given price bracket are sometimes slightly out of line with reality. So it’s important to sit with the customer and discuss things like:
- How much data can I afford to lose before my business is adversely affected? (RPO, Recovery Point Objective)
- How much time can I allow for this volume/application to be offline before it becomes unacceptable? (RTO, Recovery Time Objective)
The smaller the RTO and RPO, the greater the cost to the customer.
If the customer wants to replicate at array level, using something like EMC’s MirrorView/Synchronous product for example, these are the considerations.
Application response time on critical applications is always going to be the key consideration. When replicating synchronously, a write from the application is sent to the storage array, and before a write acknowledgement is returned to the application, the write must have committed to disk on both the local and remote storage systems. If the response time requirement is around 5 ms, then that write needs to travel the link, commit to disk on the remote array, be acknowledged back to the production storage and commit to production disk, all within 5 ms; this means fast spinny disks and a very good link. If this is not sized properly, applications will time out and database admins will scream bloody murder.
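A back-of-envelope latency budget makes this point concrete. The sketch below assumes light travels at roughly 200,000 km/s in fibre and ignores switch and protocol overheads, so treat the result as an optimistic lower bound:

```python
def sync_write_latency_ms(distance_km, local_commit_ms, remote_commit_ms):
    """Round-trip propagation over the link plus commits at both ends."""
    # ~200,000 km/s in fibre; the write and its ack each cross the link
    round_trip_ms = (2 * distance_km / 200_000) * 1000
    return round_trip_ms + local_commit_ms + remote_commit_ms

# 100 km link, ~1 ms disk commit at each end
print(round(sync_write_latency_ms(100, 1.0, 1.0), 2))  # 3.0 ms
```

At 100 km the 5 ms budget still has headroom; stretch the link to a few hundred kilometres or put slow disks at the DR end and the budget evaporates.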
For volumes which may not require such an aggressive RPO, asynchronous replication may be the way forward. So let’s take a look at EMC’s MirrorView/Asynchronous product.
Firstly, the source volume is synchronised in its entirety with its respective volume on the DR site. After this synchronisation has taken place, MirrorView takes point-in-time images of the source volume at specified intervals and replicates only the delta changes to the remote site. The benefit is that the application receives acknowledgement of a write once it has been written to local disk, and is not dependent upon the whole replication process.
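As a quick sanity check on the link for asynchronous replication, you can estimate what fraction of the bandwidth one cycle’s deltas will consume; the change volume, cycle length and link speed in the example are hypothetical:

```python
def link_utilisation(changed_gb_per_cycle, cycle_minutes, link_mbps):
    """Fraction of the link consumed shipping one cycle's deltas
    (GB taken as 10^9 bytes; ignores protocol overhead)."""
    bits_to_ship = changed_gb_per_cycle * 8 * 1000**3
    link_capacity_bits = link_mbps * 1_000_000 * cycle_minutes * 60
    return bits_to_ship / link_capacity_bits

# 5 GB of deltas every 30 minutes over a 100 Mbps link
print(round(link_utilisation(5, 30, 100), 2))  # 0.22 -> plenty of headroom
```

If the result climbs past 1.0 the link can never catch up and the replica falls progressively further behind its RPO.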
There are some other methods, such as continuous data protection (journalling), and other approaches which ensure application transactional consistency, but I will come back to those in another post.