Author Archives: interestingevan

About interestingevan

I work as a Technical Architect for a Storage and Virtualisation distributor in the UK called Magirus. The goal of this blog is simply to be a resource for people who want to learn about or go and sell storage. I'm a qualified EMC Clariion Technical Architect, Commvault Engineer and Cisco Unified Computing specialist. I have also worked with the rest of the EMC portfolio for a good few years. This blog will provide information on how specific technologies work, what questions need to be asked in order to spec certain products, competitive info and my two pence on some of these technologies. Please feel free to provide feedback on the content of this blog and on anything you'd like to see.

Implementing your own corporate Dropbox?

Upon perusing the Intel Cloud Builders site for interesting new cloudy vendors and reference architectures, I came across an interesting new company called Oxygen Cloud. Although Storage as a Service is a reasonably well formed concept, much of the attention has been around public provider services such as Livedrive and Dropbox, or backup with products such as EMC Mozy. This is all well and good, but a number of companies have concerns over how these “public cloud” type products align to corporate policy. Take Dropbox for example: the ease with which data is shared or migrated across to other devices may not align with how they want to control one of an organisation’s most valuable commodities.. data.

So how does an organisation offer device agnostic storage, not based on the constraints of conventional file systems, in such a fashion that they maintain control? Ultimately there are 101 ways to skin a cat… but as far as skinning cats goes, I quite like this one.

The Back End

You take a product like EMC Atmos, which is what we call cloud optimised storage. In real terms this means that the way data is stored, how available it is, how it’s tiered across differently costed storage and where it is stored geographically are all handled by repeatable policy. Not only this, but metadata is also leveraged to the nth degree (beyond the uses of metadata in a traditional file system). I won’t re-invent the explanation as EMC has done a good job of explaining this concept with pretty pictures (video below).

Atmos itself has a fair amount to it, but my point is that this use of metadata means that not only can the way data is handled be derived from the metadata, but the infrastructure can now have some awareness of the context of data, context which is relevant to a front end such as Oxygen Cloud. Yes, Atmos can deliver storage with NFS or CIFS; this is fine, but not overly exciting. The cool part is giving a front end direct access to the context of a file or a set of files using REST, rather than just the last modified date and all the usual stuff. The metatags can be used to define the segregation of data in a multi-tenant environment or application specific elements, such as how a file can be shared and with whom.

Also, with Atmos being scale out storage, the upper limits of scalability are, shall we say, endless (or as near as). The beauty of the storage being content addressable and not based around hierarchical file systems is that as the system grows, you are not constrained and challenged by overly complex file system structures which need to be maintained.

Clearly availability is important, but hey..  this is expected. Needless to say, the system handles it very well.


The Front End

I’m not going to spend a great deal of time upping my word count on this section, as Oxygen Cloud have some very descriptive videos (further down), but the key thing here is that the company controls the data in its own way. We have LDAP/AD integration and full access controls, we can set the expiration of a link if we do share a file publicly, and there is encryption at all points of a file’s transit. Files can be presented via a normal Explorer/Finder plugin (the same way we view normal CIFS shares) or accessed via devices such as the iPhone/iPad. One nice feature for me is that if a phone is stolen or an employee leaves, the organisation can sever access to data/directories on a per user or per device basis.

Anyway, worth spending a bit of time watching the below :


I shall be building this solution out in the lab over the next month or so (as much as the day job allows), so watch this space for more info and a revised review.


VMware – AppBlast. One word…. Wow!

VMware have a history of innovation and creating disruptive technology. Disruption may sound like a bad thing, although as we know with things like the VMware hypervisor, disruption makes people money. It may be disruptive, but if the benefits are clear then people standardise on the technology, and IT resellers, vendors and professionals benefit from the plethora of technology requirements which spill out the sides to accommodate these new marvels of modern tech.

VMware first set the trend when they abstracted the OS dependency on directly seeing physical hardware by introducing a hypervisor; now they have taken away the application dependency on seeing the operating system..  lovely jubbly! This sounds good, but why? How? What?

I’m a little light on the nuts and bolts right now, but needless to say, if you can deliver a Windows/Linux/Mac application to any device with a browser supporting HTML5, the benefit is clear! Visio on my iPad.. yes please. Safari on my Windows PC.. why not?!

I shall await the finer details with bated breath, but leave you with a pretty cool demo as shown below..   geeky soul food! Enjoy!!

 


What is a VBlock.. the latest

Overview

Back in 2009 VMware, Cisco and EMC joined forces to create a new approach to selling full, pre-configured datacenter solution stacks. Rather than simply relying on a gentlemen’s agreement and a cross pollination of development between the three companies, it was decided they would create a new start-up business as the delivery mechanism to drive this new concept to market. This new start-up, known as VCE (Virtual Computing Environment), would take to market a new range of pre-validated, pre-configured and singularly supported solution stacks called VBlock.

The purpose of a VBlock is to simplify infrastructure down to what are effectively units of IT, and to define that a workload can be supported by “a number of floor tiles” in the data centre. This approach is enabled by the fact that everything within a VBlock is pre-validated from an interoperability perspective, and customisable components are reduced down to packs of blades (compute), disks and the network components required to connect into the upstream customer environment. This means that solution design is massively simplified and can focus on supporting the identified workload.

Pre-Validated

VCE extensively soak test workloads and configurations available within the VBlock to reduce pre-sales time spent on researching interoperability between the Network/compute/storage layers of the Data centre. This means that defining how a workload is supported is the focus and planning phases are significantly reduced. This pre-validated approach means that power and cooling requirements are easily determined  in preparation for site deployment.

Pre-Built and Pre-Configured

As part of the VBlock proposition, the physical and logical build processes are carried out in VCE facilities, so that time on the customer site is restricted to that of integrating into the customer environment and application layer services. This reduces deployment time massively.

Single Support Presence

Rather than dealing with the parent companies of VCE (VMware, Cisco, EMC) on a per vendor basis, customers deal with VCE, who act as a single support presence and will own any VBlock related issue end to end. This is partly enabled by the pre-validated aspect of VBlock: VCE have a number of VBlocks in house and, provided the VBlock is constructed as per approved architectures, VCE can simulate the environment which has caused the error to decrease time to resolution.

The Technology

The technology at the core of the VBlock consists of VMware vSphere, Cisco UCS (Cisco’s Unified Computing System), Cisco Nexus (Cisco’s unified fabric offering) and EMC’s VNX unified storage platform. Cisco simplify management of their blade computing platform down to a single point of management (UCS Manager), which resides on the 6100 Fabric Interconnects and allows for “stateless” computing, in that it is possible to abstract the server “personality” (MAC addresses, world wide names, firmware, etc) away from the server hardware, then create and apply these personalities on demand to any blade within the UCS system. This management system manages all aspects of the UCS system (blade/chassis management, connectivity and firmware). Cisco’s Unified Fabric commonly refers to their Nexus range (but elements of unified fabric apply to UCS). Cisco Nexus allows both IP network traffic and Fibre Channel traffic to be delivered over common 10 Gigabit switches using FCoE (Fibre Channel over Ethernet). In addition, the Cisco Nexus 1000v enables deployment of a virtual switch within the VMware environment, allowing network services to be deployed within the virtual infrastructure where previously this was only possible in the physical world.

EMC VNX is a multi protocol storage array allowing for storage connectivity via block storage technologies (iSCSI/Fibre Channel) or NAS connectivity (CIFS/NFS/pNFS), giving the end user free choice as to how storage is provided to the UCS Server estate. EMC also drive efficiencies in how capacity and performance are handled by leveraging technologies such as deduplication and thin provisioning to achieve a lower cost per gigabyte. EMC are also able to leverage solid state disk technologies to extend storage Cache or enable sub LUN level tiering of data between Solid state disk and traditional mechanical disk technologies based on data access patterns.

VMware vSphere has provided many companies with cost savings in the past, but in the Vblock it is leveraged to maximum effect to provide operational efficiencies, with features such as dynamic and automated mobility of virtual machines between physical servers based on load, high availability, and the native integration that is inherent between VMware and EMC through the VAAI API. This integration enables much lower SAN fabric utilisation for what were very intensive storage network operations, such as storage migration. EMC PowerPath/VE is also included in the Vblock, which enables true intelligent load balancing of storage traffic across the SAN fabric.

Management

VCE utilise the Ionix Unified Infrastructure Manager (UIM) as a management overlay which integrates with the storage, compute, network and virtualisation technologies within the Vblock and allows high level automation and operational simplicity in how resources are provisioned within the VBlock. UIM will discover resources within the VBlock and the administrator then classifies those resources. As an example, high performance blades may be deemed “gold” blades versus lower specification blades, which may be classified as “silver” blades. This classification is also applied to other resources within the Vblock, such as storage. Once resources have been classified, they can be allocated on a per tenancy/application/department basis, with each allowed access to differing levels of gold/silver/bronze resources within the Vblock. UIM now also includes operational aspects which give end to end visibility of exactly which hardware within a VBlock a particular VM is utilising (blades, disks, etc). Native vendor management tools can be utilised, although with the exception of vCenter, UIM would be the point of management for 90% of VBlock tasks after initial deployment.

In Summary

The VCE approach to IT infrastructure with VBlock simplifies procurement and IT infrastructure planning, as VCE are able to reduce their infrastructure offerings to essentially units of IT which are sized to support a defined workload within a number of “floor tiles” in the data centre. These predetermined units of IT have deterministic power and cooling requirements and scale in such a way that all VBlock instances (be it few or many) can be managed from a single point of management and are all supported under a single instance of support. By leveraging technologies which drive efficiencies around virtualisation, networking, storage and computing, we see benefits such as higher performance in smaller physical footprints when addressing storage and compute, minimised cabling and complexity with 10GbE enabling technologies such as Fibre Channel over Ethernet, and operational simplicity with the native Vblock unified infrastructure management tool, UIM.


Sizing for FAST performance

So EMC launched the VNX and changed elements of how we size for IO. We still have the traditional approach to sizing for IO, in that we take our LUNs and size for traditional RAID groups. So let’s start here first to refresh:

Everything starts with the application. So what kind of load is the application going to put on the disks of our nice shiny storage array?

So let’s say we have run perfmon or a similar tool to identify the number of disk transfers (IOPS) occurring on a logical volume for an application. For the sake of argument, we are sizing for a SQL DB volume which is generating 1000 IOPS.

Before we get into the grit of the math, we must decide what RAID type we want to use (the below are the most common for transactional workloads).

RAID 5 = Distributed parity. It has a reasonably high write penalty and a good usable vs raw capacity ratio (the equivalent of one drive’s capacity is given over to parity), so a fair few people use this to get the most bang for their buck. Bear in mind that RAID 5 can survive a single drive failure (which will incur performance degradation), but will not protect from a double disk failure. EMC Clariion does employ hot spares, which can be proactively built when the Clariion detects a failing drive and used to substitute the failing drive once built. However, if no hot spare exists, or if a second drive fails during a drive rebuild or while the hot spare is being built, you will lose your data. Write penalty = 4

RAID 1/0 = Mirrored/Striped. It has a lesser write penalty but is more costly per GB, as you lose 50% of usable capacity to mirroring. RAID 1/0 provides better fault resilience and “rebuild” performance than RAID 5. It has better overall performance by combining the speed of RAID 0 with the redundancy of RAID 1 without requiring parity calculations. Write penalty = 2

Yes there are only 2 RAID types here, but this is more to keep the concept simple.

So, depending on the RAID type we use, a certain write penalty is incurred due to mirroring or parity operations.

Let’s take a view on the bigger piece now. Our application generates 1000 IOPS. We need to separate this into reads and writes:

So let’s say 20% writes vs 80% reads. We then multiply the number of writes by the appropriate write penalty (2 for RAID 1/0 or 4 for RAID 5). Let’s say RAID 5 is our selection:

The math is as follows :

800 Reads + (200 Writes x 4) = 1600 IOPS. This is the actual disk load we need to support.

We then divide that disk load by the IO Rating of the drive we wish to use. Generally speaking at a 4KB block size the below IO Ratings apply (this goes down as block sizes/pages to disk sizes get bigger).

EMC EFD = 2500 IOPS
15k SAS/FC = 180 IOPS
10k SAS/FC = 150 IOPS
7.2k NLSAS/SATA = 90 IOPS

The figure we are left with after dividing the disk load by the IO Rating is the number of spindles required. This is the same when sizing for sequential disk load, but we refer to MB/s and bandwidth instead of disk transfers (IOPS). Avoid using EFD for sequential data (overkill and not much benefit).

15k SAS/FC = 42 MB/s
10k SAS/FC = 35 MB/s
7.2k NLSAS = 25 MB/s

Bear in mind this does not take array cache into account and sequential writes to disk benefit massively from Cache, to the point where many papers suggest that NLSAS/SATA give comparable results to FC/SAS.
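As a rough sketch of the random IO arithmetic above, something like the following Python would do the job. The write penalties and per-drive IOPS ratings are the rule-of-thumb figures quoted in this post; the function and dictionary names are my own, and the result ignores array cache and RAID group layout (for example, it will happily return 9 drives where a 4+1 RAID 5 layout would push you to 10).

```python
import math

# Write penalties quoted in the post.
WRITE_PENALTY = {"RAID5": 4, "RAID10": 2}

# Rule-of-thumb per-drive ratings quoted in the post (small block random IO).
DRIVE_IOPS = {"EFD": 2500, "15k": 180, "10k": 150, "7.2k": 90}


def backend_iops(host_iops, write_pct, raid):
    """Disk load once the RAID write penalty has been applied to the writes."""
    writes = host_iops * write_pct / 100
    reads = host_iops - writes
    return reads + writes * WRITE_PENALTY[raid]


def spindles_needed(host_iops, write_pct, raid, drive):
    """Divide the disk load by the drive rating, rounding up to whole drives."""
    return math.ceil(backend_iops(host_iops, write_pct, raid) / DRIVE_IOPS[drive])


if __name__ == "__main__":
    # The worked example from the post: 1000 IOPS, 20% writes, RAID 5.
    print("Disk load:", backend_iops(1000, 20, "RAID5"))
    for drive in DRIVE_IOPS:
        print(drive, spindles_needed(1000, 20, "RAID5", drive))
```

Swapping "RAID5" for "RAID10" in the call shows how much the lower write penalty reduces the spindle count for a write heavy workload.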

So What about FAST ?

FAST is slightly different. It allows us to define Tier 0, Tier 1 and Tier 2 layers of disk. Tier 0 might be EFD, Tier 1 might be 15k SAS and Tier 2 might be NLSAS. We can have multiple tiers of disk residing in a common pool of storage (kind of like a RAID group, but allowing for functions such as thin provisioning and tiering).

We can then create a LUN in this pool and specify that we want the LUN to start life on any given tier. As access patterns to that LUN are analysed by the array over time, the LUN is split up into GB-sized chunks and only the most active chunks utilise Tier 0 disk; the less active chunks are trickled down to our Tier 1 and Tier 2 disks in the pool.

Fundamentally speaking, you size to serve 90% of the IOPS with the Tier 0 disk (EFD) and bulk out the capacity by splitting the remaining capacity between Tier 1 and Tier 2. You will find that in most cases you can service the IO with a fraction of the number of disks you would need if you did it all with SAS. I would suggest that if you know something should never require EFD, such as B2D, archive data or test/dev, you put it in a separate disk pool with no EFD.
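To put a rough number on that, here is a minimal Python sketch which reads the rule of thumb above as "let EFD carry 90% of the disk load" and compares the EFD count with doing the whole job on 15k SAS. The drive ratings are the ones quoted earlier in this post; the function names and the 16000 IOPS example are purely illustrative, and capacity and RAID layout are not considered.

```python
import math

EFD_IOPS = 2500      # per-drive rating quoted earlier in the post
SAS_15K_IOPS = 180   # per-drive rating quoted earlier in the post


def efd_for_hot_data(disk_load_iops, hot_pct=90):
    """EFD drives needed if Tier 0 serves hot_pct percent of the disk load."""
    hot_iops = disk_load_iops * hot_pct / 100
    return math.ceil(hot_iops / EFD_IOPS)


def sas_only(disk_load_iops):
    """15k SAS drives needed to serve the whole disk load with no tiering."""
    return math.ceil(disk_load_iops / SAS_15K_IOPS)


if __name__ == "__main__":
    load = 16000  # hypothetical disk load, after RAID write penalty
    print("EFD for the hot 90%:", efd_for_hot_data(load))
    print("15k SAS for everything:", sas_only(load))
```

The remaining 10% of the IO still has to land on the Tier 1 and Tier 2 disks, but those are usually bought for capacity anyway.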


Power to the People !! A beginner’s guide to the life force of your datacenter

 Aside from chalking, talking, designing and evangelising about the exciting things such as the whizzy storage bits, new blade technologies and the wonder that is unified fabric; I also have to drop back into my corduroys and sandals to get down and geeky with some of the more fundamental elements of the data center. One in particular being..  power.

As much as the network, the storage and all these other elements are pivotal in any solution, without paying close attention to the life force behind all of this we might as well be selling rocks.

This isn’t going to be an extensive post, just enough to cover a few principles. 

Ok, so we’ve put together a high level design, we’ve worked out how the servers talk to each other, how they get their storage, how they see the outside world and how we stop our little world from falling over should we suffer a failure. What else is there?!

Let’s throw a scenario out there and get to the point. You are a project manager under strict deadlines to get your infrastructure implemented in time for a new global application going live. You get your pallets of hardware in good time and your engineering resource is all booked in..  fantastic.

So, next step..   avoid aggravating implementation engineers and project manager alike by following a few key points :

Establish whether your in-rack devices require C20 or C14 connections (that is, C19 or C13 outlets on your PDU), then ensure your PDU will accommodate this. Also make sure you have specified power cables for cabinet power when ordering your devices (running around after power cables can be annoying when you are running behind on network configuration).

Ensure that your PDUs will support the power draw of the devices when they power cycle. Normally vendor specifications should show the power cycle draw and the operating power draw. If you want to turn everything on at the same time, you need to pay attention to the first one.

Make sure that you are matching the current and phase requirements of your PDUs with the power you are driving to the rack. Many organisations will run 3 phase power to the room, then single phases to the rack. If you have a rack full of blade servers, you may need to drive 3 phase power to the rack, most likely at 32 Amp (unless you can cram 4 PDUs in each rack with a little bit of creative cable management, although be warned, 16 Amp PDUs tend to be light on C20 connections).

Make sure that if you are running IEC type commando power connections to the rack, you don’t go and specify PDUs with NEMA power drops. A bit of communication between your electrician and the guys specifying your PDUs can save a world of pain.

If you run a global operation, ensure you map the power and current requirements to the countries of deployment. There is a page on my site which maps some of these requirements.

So, A little bit of maths :

To determine what power load you can support on a PDU, it goes something like this :

Single phase PDU

Current (Amps) x Input voltage  = Watts

So for a single phase 32 Amp PDU in the UK, we would see:

32 x 230 = 7.36 kW

For a 3 phase PDU we need to find our input (line-to-line) voltage, which is the single phase (line-to-neutral) voltage multiplied by the square root of 3 (1.73), so for UK 3 phase we would have 230 x 1.73 = near on 400 (398).

We then take our input voltage (400) and multiply this by the current (let’s say 32), then multiply by 1.73 again, so:

(230 x 1.73) x 32 x 1.73 = 22 kW.

So get the power cycle and operating power draw information from the vendor (or do your own testing). Check you have the right power connectivity and size accordingly.
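For anyone who would rather let a script do the sums, here is a small Python sketch of the PDU maths above. The function names are mine, the device figures in the example are made up, and (like the formulas above) it treats the result as watts, which assumes a power factor of 1.

```python
import math


def single_phase_watts(amps, volts=230):
    """Single phase PDU: current x input voltage."""
    return amps * volts


def three_phase_watts(amps, phase_volts=230):
    """3 phase PDU: line-to-line voltage x current x sqrt(3)."""
    line_volts = phase_volts * math.sqrt(3)   # roughly 400 V in the UK
    return line_volts * amps * math.sqrt(3)


def fits_on_pdu(device_watts, pdu_watts):
    """True if the summed draw of the devices is within the PDU rating."""
    return sum(device_watts) <= pdu_watts


if __name__ == "__main__":
    print("32 A single phase:", single_phase_watts(32), "W")
    print("32 A 3 phase:", round(three_phase_watts(32)), "W")
    # Hypothetical power cycle draws for three devices on one PDU.
    print(fits_on_pdu([2500, 2500, 2000], single_phase_watts(32)))
```

Feed it the power cycle draw rather than the operating draw if you intend to switch everything on at once.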

Then all you have to do is ensure you balance the power between your PDUs while providing some redundancy (don’t plug both your server PSUs into one PDU!!). Also remember that your blade chassis may have 3 power supplies in it for N+1 redundancy to protect you from PSU failure, but if the PDU with 2 x PSUs plugged into it fails, then you’re buggered, so you may want to add that magic number 4 to give you grid redundancy.

So, there we have it, a somewhat rambling post, most likely teaching a whole load of people how to suck eggs…    but if I can save just one project manager a headache, it’s worthwhile 😉


Having performance issues with Celerra and NFS Datastore performance ? patch ! Patch ! Patch !

I sat in on an interesting session yesterday which got under the covers of VMware performance on NFS datastores hosted on the Celerra NS series. This was presented by a chap called Ken Cantrell, who works for EMC engineering, and came off the back of feedback from the field showing that in many cases customers’ VMware estates simply weren’t performing using NFS on Celerra. This is not the case for everybody, but it certainly was an issue. Essentially what was happening is that the Celerra was dealing with an extensive amount of NFS calls to the UxFS log (predominantly Getattr type calls) and it was slowing down response times back to the host. EMC tested a workload on the Celerra running DART 6.0.4 using Jetstress. Jetstress effectively simulates an Exchange workload and also halts the benchmark if response times exceed 20 milliseconds, on the basis that 20 milliseconds is too poor for Exchange. EMC were seeing that with the base version of DART 6, response times were exceeding 25 milliseconds for the Exchange workload they tested on an Exchange VM sitting on a Celerra hosted NFS datastore. EMC then released a patch upgrade which brought that down to sub 15 ms. EMC then released an experimental epatch (DART 6.0.4.805) which brought response times down further to sub 10 ms (closer to 6 ms).

So bottom line is…  don’t just suffer poor performance. Feed back to EMC if you see issues and also keep an eye on patch updates to the OS. They are there to resolve noted issues !

Material:

A good post on ECN with details of the new patch :

https://community.emc.com/thread/118430

A good blog comparing iSCSI Vs NFS for VMware:

http://goingvirtual.wordpress.com/2010/04/07/iscsi-or-nfs-with-emc-celerra/

A good blog post by Jason Boche comparing the performance delta between Dart 6.0.4 and Dart 6.0.4.805

http://www.boche.net/blog/index.php/2011/03/21/emc-celerra-beta-patch-pumps-up-the-nfs-volume/


EMC World 2011 – Las Vegas – day 1

So after the first day at EMC World, what marvels of technology have been announced? What groundbreaking nuggets of geeky goodness are there to report? So, first things first: VPLEX! Looks like they may have cracked it..   active/active storage over synchronous distances. Geoclusters will never be the same again!!..   There was also a slightly ambiguous announcement around integration with open source Hadoop (more to follow on that).

What was the message of the day though? What was this year’s theme?..   This year EMC are talking about big data and the cloud. Clearly the recent acquisitions of Isilon and Greenplum have planted EMC’s head firmly back in the clouds. Greenplum gives end users the ability to scale out database architectures for data analytics to mammoth scale, with its distributed node architecture and massively parallel processing capabilities. To be frank, learning about the technology was borderline mind numbing, but my god it’s a cool technology. Then we have scale out NAS with Isilon and its OneFS system, giving the ability to present massive NAS repositories and scale NAS on a large scale. So obviously, EMC are talking about big data.

I also had the opportunity to sit in on an NDA VNX/VNXe session and what they’re going to do is….    aaah, I’m not that stupid. But needless to say, there are some nice additions on the way, the usual thing with higher capacity smaller footprint drives and getting more IO in less U space, but also some very cool stuff on the way which will enable EMC to offer a much cheaper entry point for compliance ready storage..  watch this space.

In true style EMC threw out some interesting IDC touted metrics, further justifying the need to drive storage efficiencies and re-iterating the fact that there will always be a market for storage. So, our digital universe currently consists of 1.2 zettabytes of data, of which 90% is unstructured, and that figure is predicted to grow 44-fold over this decade. Also, 88% of Fortune 500 companies have to deal with botnet attacks on a regular basis and have to contend with 60 million malware variants. So, making this relevant, the 3 main pain points of end users are: firstly our age old friend budget, then explosive data growth, and securing data.

So how have EMC addressed these? Well, budget is always a fun one to deal with, but with efficiencies in storage by way of deduplication, compression, thin provisioning and auto tiering of data, end users should get more bang for their buck. Also, with EMC easing up on the reins with pricing around Avamar, and the low entry point of VNXe, this should help the case. Explosive data growth is again tackled with deduplication, compression, thin provisioning and auto tiering of data, but also now with more varied ways of dealing with large volumes of data, with technologies such as Atmos, Greenplum and Isilon. Then there is the obvious acquisition of RSA to tie in with the security message, albeit that has had its challenges.

I was also recently introduced to the concept of a cloud architect certification track and the concept of a Data Scientist (god knows, but I’ll find out). So I went over to the Proven Professionals lounge and had a chat with the guys that developed the course. Essentially it gives a good foundation for the steps to consider when architecting a company’s private cloud, around storage, virtualisation, networking and compute. If you’re expecting a consolidated course which covers the storage consolidate courseware, the Cisco DCNI2 and DCUCD courses and VMware Install, Configure, Manage, then think again, but it does set a good scene as an overlay to understanding these technologies. It also delves into some concepts around cloud service change management and control considerations, and the concept of a cloud maturity model (essentially EMM, but more cloud specific). I had a crack at the practice exam and passed with 68%. Aside from not knowing the specific cloud maturity terms and EMC specific cloud management jargon, anyone with knowledge of servers, Cisco Nexus and networking, plus virtualization, shouldn’t have too many issues, but you may want to skim over the video training package.

There was also a nice shiny demo from the Virtual Geek, Chad Sakac, showing the new Ionix UIM 2.1 with vCloud integration, using CSC’s cloud service to demonstrate not only the various subsets of multi tenancy, but also mobility between disparate systems. When they integrate with public cloud providers such as Amazon EC2 and Azure, then things will really hot up, but maybe we need some level of cloud standards in place?…   but we all know the problem with standards: innovation gives way to bureaucracy and slows up…   but then again, with recent cloud provider issues, maybe it couldn’t hurt to enforce a bit of policy which allows the market to slow up a little and take a more considered approach to the public cloud scenario..   who knows?

Anyway.. watch this space..  more to come


Cisco UCS – Extended memory architecture.. What is it ?

As promised in my previous post, let’s go through the blades available in Cisco’s Unified Computing System. Essentially we have a few flavours of blades: full width and half width blades, some of which utilise extended memory architecture (co-developed by Intel and Cisco, which we’ll touch on), dual socket for the most part with the exception of one which is 4 socket, and a veritable feast of different memory options, processor options, IO card options and drive options.

However, I wanted to start with the component pieces before we delve into schematics (because you can read about those on the Cisco Site) and spend a little more time on each piece.

So what is this extended memory architecture Cisco keep bangin on about? Let’s start with the why before we get to the how. Any Tom, Dick and Harry can stick a load of memory DIMMs in a server and scream about the fact they’ve got a few hundred gig of memory..   so why is this different?

Typically each CPU on a server has 3 memory channels for… you guessed it.. accessing memory. The number of transfers per second at which memory will perform is typically dictated by the number of DIMMs that are populated per memory channel. Typically when you populate 1 x DIMM per memory channel, memory runs at 1333 MTpS (million transfers per second); when you populate 2 DIMMs it would run at 1066 MTpS, and when you get to a depth of 3 DIMMs per channel you’re running at 800 MTpS (not ideal). So as memory density gets higher, performance can suffer (as shown below).
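To make the density versus speed trade-off a bit more concrete, here is a small Python sketch built on the figures above. The speed table and the 3 channels per CPU come from this post; the 8 bytes per transfer is my own assumption (a standard 64-bit wide DDR3 channel), the 8 GB DIMM size is just an example, and the bandwidth figures are theoretical peaks rather than anything measured.

```python
# DIMMs populated per channel -> memory speed in million transfers per second,
# as quoted in the post.
SPEED_BY_DIMMS_PER_CHANNEL = {1: 1333, 2: 1066, 3: 800}

CHANNELS_PER_CPU = 3      # from the post
BYTES_PER_TRANSFER = 8    # assumption: 64-bit wide DDR3 memory channel


def capacity_gb(dimms_per_channel, dimm_gb):
    """Memory capacity per CPU for a given DIMM depth and DIMM size."""
    return dimms_per_channel * CHANNELS_PER_CPU * dimm_gb


def peak_bandwidth_gb_s(dimms_per_channel):
    """Theoretical peak memory bandwidth per CPU in GB/s."""
    mtps = SPEED_BY_DIMMS_PER_CHANNEL[dimms_per_channel]
    return mtps * BYTES_PER_TRANSFER * CHANNELS_PER_CPU / 1000


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, "DIMM(s) per channel:",
              capacity_gb(depth, 8), "GB at",
              peak_bandwidth_gb_s(depth), "GB/s peak")
```

Tripling the capacity per CPU this way costs a sizeable chunk of the theoretical bandwidth, which is the problem extended memory sets out to soften.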

 

Cisco, in combination with Intel have developed something called the Catalina chipset. Despite sounding like a car, the Catalina chipset is quite a nifty addition. Effectively acting like a RAID controller for memory, it sits downstream of the CPU memory controllers  (one per memory channel) and presents out four additional memory sockets per channel, then presents an aggregate of the memory sitting beneath it as one logical DIMM up to the CPU memory channel, meaning that you can have denser memory configurations without memory ever clocking in below 1066 MTpS ( as shown below).

The two benefits of this are that you can address a larger amount of memory for memory intensive applications/virtual machines/whatever with a lower socket count (also making it possible to see higher consolidation ratios when virtualising machines), or you can achieve moderate memory configurations using lower capacity, less costly DIMMs. Cisco currently utilise this technology with the Westmere and Nehalem CPUs in B250 blade servers and C250 rackmount servers.

A nice little clip from the Cisco Datacenter YouTube channel with a brief intro into extended memory

Either way..   not a bad idea..


What is this Cisco UCS server Business ?

As I delve into greater numbers of VBlock opportunities, more and more people are asking questions around the Cisco UCS compute offering and what that brings to the table with VBlock. This is a large subject to cover in one post, so I shall start with the fundamentals and start with the B Series offering, as that is where a lot of the more interesting subject matter resides.

The Cisco Unified Computing offering when discussing the B Series relates to Cisco’s Blade server offering. In terms of the architecture of the UCS system much like conventional blade offerings we have :

  • Blade Servers
  • Blade Chassis
  • Blade Chassis Switches (known as fabric extenders in this case, which are slightly different)

But, with the UCS system, rather than having standard upstream switches we have what we call fabric interconnects. These are effectively the same hardware as the Cisco Nexus 5010 and 5020, but running something called UCS Manager rather than just standard switch software. One of the main differentiators of UCS is that all the management for the UCS system is done from these fabric interconnects. The clever bit around UCS is that each blade server upon deployment is completely stateless, meaning that the server has no personality (no MAC addresses, UUID or WWNs). Pools of these unique identifiers are created within UCS Manager and provisioned to what are called service profiles.. these in turn are then deployed to the blades along with the WWNs of boot from SAN LUNs. This means that if we have to down a server (planned or unplanned), we can take the service profile and attach it to another blade.. and the outside world will not see that anything has changed, all with minimal downtime.
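To make the stateless idea a bit more concrete, here is a small Python model of identity pools, service profiles and blades. It is a conceptual sketch only and has nothing to do with the real UCS Manager API; every class name, function name and identifier value below is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class IdentityPool:
    """A pool of unique identifiers (MACs, UUIDs or WWNs)."""

    def __init__(self, name, values):
        self.name = name
        self._free = list(values)

    def allocate(self):
        if not self._free:
            raise RuntimeError(f"pool {self.name} is exhausted")
        return self._free.pop(0)


@dataclass
class ServiceProfile:
    """The 'personality' that gets applied to a stateless blade."""
    name: str
    mac: str
    uuid: str
    wwn: str
    boot_lun_wwn: str


@dataclass
class Blade:
    slot: str
    profile: Optional[ServiceProfile] = None  # stateless until associated


def move_profile(source, target):
    """Detach the profile from one blade and attach it to another."""
    if source.profile is None:
        raise ValueError("source blade has no service profile")
    if target.profile is not None:
        raise ValueError("target blade already has a service profile")
    target.profile, source.profile = source.profile, None


# Hypothetical identifier values, purely for illustration.
macs = IdentityPool("mac-pool", ["00:25:B5:00:00:01", "00:25:B5:00:00:02"])
uuids = IdentityPool("uuid-pool", ["uuid-0001", "uuid-0002"])
wwns = IdentityPool("wwn-pool", ["20:00:00:25:B5:00:00:01"])

profile = ServiceProfile("esx-host-1", macs.allocate(), uuids.allocate(),
                         wwns.allocate(), boot_lun_wwn="50:06:01:60:00:00:00:01")
blade_a = Blade("chassis-1/slot-1", profile)
blade_b = Blade("chassis-2/slot-3")

move_profile(blade_a, blade_b)
print(blade_b.profile.mac, blade_b.profile.wwn)
```

The identifiers belong to the profile rather than the hardware, so after the move the second blade presents exactly the same MAC, UUID and WWN to the network and the SAN.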

Nice UCS Manager demo I came across which gives a high level overview of UCS Management (there is much more to be found on youtube)

There are 2 flavours of fabric interconnect. The first is the 6120, a 20 port 10GbE switch which supports one expansion module to add either additional 10GbE ports or FC ports to enable FCoE downstream to the chassis. Each 6120 can manage up to 160 half width or 80 full width blades across 20 chassis. But bear in mind the port density of the 6120 means that you have a limited number of connections downstream to the chassis in the maximum configuration, so only 10GbE throughput to each chassis per 6120, or 20GbE with 2 x 6120 fabric interconnects (recommended for HA).

The second is the 6140, a 40 port 10GbE switch which supports two expansion modules to add either additional 10GbE ports or FC ports to enable FCoE downstream to the chassis. Each 6140 can manage up to 320 half width or 160 full width blades across 40 chassis. But again, bear in mind the port density of the 6140 means that you have a limited number of connections downstream to the chassis in the maximum configuration, so only 10GbE throughput to each chassis per 6140, or 20GbE with 2 x 6140 fabric interconnects (recommended for HA).
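The maximum configuration figures above are just simple arithmetic, sketched here in Python. It assumes one 10GbE downstream link per chassis per fabric interconnect at maximum scale, as described above; the function and constant names are my own.

```python
MAX_CHASSIS = {"6120": 20, "6140": 40}  # chassis per interconnect model
HALF_WIDTH_PER_CHASSIS = 8  # 160 half width blades / 20 chassis
FULL_WIDTH_PER_CHASSIS = 4  # 80 full width blades / 20 chassis
LINK_GBPS = 10


def max_configuration(model, interconnects=2):
    """Blade counts and per-chassis throughput at the maximum chassis count.

    Assumes a single 10GbE link from each fabric interconnect to each
    chassis, which is what the port density allows at maximum scale.
    """
    chassis = MAX_CHASSIS[model]
    return {
        "chassis": chassis,
        "half_width_blades": chassis * HALF_WIDTH_PER_CHASSIS,
        "full_width_blades": chassis * FULL_WIDTH_PER_CHASSIS,
        "gbps_per_chassis": LINK_GBPS * interconnects,
    }


for model in ("6120", "6140"):
    print(model, max_configuration(model))
```

With fewer chassis than the maximum you can spend the spare ports on extra links per chassis, which is usually the more sensible trade-off.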

The 6100 fabric interconnects connect upstream to the customer’s aggregation/distribution network layer. All native ports are 10GbE capable, but the first 8 ports on the 6120 and the first 16 ports on the 6140 can negotiate down to gigabit speeds. You currently have the choice of 10GbE long range or short range optics, Gigabit SFPs, or Cisco’s CX1 10GbE copper twinax cables, which have the SFPs attached to each end of the cable but only come in 1m, 3m and 5m lengths (I believe 7m is planned). Typically the CX1 cables would be used for downstream chassis connectivity, but they can be used upstream if the connecting device supports them (e.g. Nexus 5k). For FC connectivity, we can add expansion modules for either 8Gb/s FC or 4Gb/s FC connectivity. It’s worth noting that the 6100s work in NPV mode, so they do currently require upstream FC switches which support NPIV.
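If you are planning which ports to use for gigabit uplinks, the rule above boils down to a simple lookup. A minimal Python sketch, assuming ports are numbered from 1 (my assumption, as are the names):

```python
NATIVE_PORTS = {"6120": 20, "6140": 40}
GIGABIT_CAPABLE_PORTS = {"6120": 8, "6140": 16}  # the first N native ports


def supported_speeds_gbps(model, port):
    """Return the speeds (in Gb/s) a native port can run at.

    Ports are assumed to be numbered from 1 for this illustration.
    """
    if not 1 <= port <= NATIVE_PORTS[model]:
        raise ValueError(f"the {model} has no native port {port}")
    if port <= GIGABIT_CAPABLE_PORTS[model]:
        return (1, 10)
    return (10,)


print(supported_speeds_gbps("6120", 8))
print(supported_speeds_gbps("6120", 9))
print(supported_speeds_gbps("6140", 16))
```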

The chassis, otherwise known as the 5108 chassis, is 6RU in height with front to back cooling and can house up to 8 half width blades or 4 full width blades (I shall detail the difference later). The chassis has 8 cooling fans and requires a minimum of two power supplies, but can have up to 4 depending on the power redundancy requirements. The chassis is connected upstream via a pair of 2104 fabric extenders (blade switches effectively) or a single fabric extender if using a non HA configuration. Each 2104 has 4 external ports and connects to its respective fabric interconnect (not dual homed).
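As a quick sanity check on chassis bandwidth, the sketch below works out the uplink throughput for a 5108 depending on how many of the 2104's external ports you cable up. It assumes every link is 10GbE and splits the total evenly across 8 half width slots, which is a simplification of mine rather than how traffic is really distributed.

```python
FEX_EXTERNAL_PORTS = 4  # external ports on each 2104, per the text
LINK_GBPS = 10          # each link to a fabric interconnect is 10GbE
HALF_WIDTH_SLOTS = 8    # half width blades per 5108 chassis


def chassis_uplink_gbps(links_per_fex, fex_count=2):
    """Total uplink throughput for one chassis in Gb/s."""
    if not 1 <= links_per_fex <= FEX_EXTERNAL_PORTS:
        raise ValueError("a 2104 has between 1 and 4 cabled external ports")
    if fex_count not in (1, 2):
        raise ValueError("a chassis takes one or two fabric extenders")
    return links_per_fex * LINK_GBPS * fex_count


for links in range(1, FEX_EXTERNAL_PORTS + 1):
    total = chassis_uplink_gbps(links)
    share = total / HALF_WIDTH_SLOTS  # naive even split, for illustration
    print(f"{links} link(s) per FEX: {total} Gb/s per chassis, "
          f"{share:.1f} Gb/s per half width slot")
```

One link per fabric extender gives the 20GbE per chassis figure mentioned earlier for the maximum chassis count; cabling all four ports on both fabric extenders takes that up to 80 Gb/s.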

Architecture diagram from Cisco.com

As I said.. there is a lot to cover on UCS, so I will detail the blades themselves with all the gubbins, such as extended memory architecture, adapters and processing technology, in the next post. Then in further posts we’ll cover things like virtualization integration, 3rd party tools and network consolidation as a whole.


Protocol considerations with VMware

A good video I came across from EMC discussing some storage protocol considerations when looking at VMware.