As discussed in the previous episode, “Storage Area Network Concepts” (Oil & Gas Newspaper, May 2008, Issue 17), Fibre Channel storage is added to your infrastructure to raise server performance as well as to increase system scalability and availability. Moreover, it gives you the capability to build a tier-5 disaster recovery solution, replicating data between two storage arrays, one placed at the main site while the other is located at the disaster recovery site.

All storage vendors offer different categories of storage arrays, starting from the very entry level, passing through midrange, and going up to high-end boxes. The question is: what is the difference between these categories? How can you choose the optimum solution for your infrastructure without exaggeration or an over-qualified solution? Is it only a matter of capacity? What is the key factor that you have to consider?

The truth is that storage solution selection is primarily driven by the required performance levels and application I/O profiles (i.e., storage category selection is never just a matter of capacity). Let us start with entry-level storage arrays (e.g., the MSA family from HP, the AX-4 family from Dell, and the DS3000 family from IBM). All of the boxes mentioned above add only capacity, without adding any performance enhancement to your applications, as they perform no faster than your servers’ internal disks (i.e., they fit only very small environments that need nothing more than extra capacity). This is because the disk drives used inside these arrays are either SCSI, the lowest-performing option, or SAS, which performs better than SCSI; both are lower in performance than the Fibre Channel disk drives used in midrange and high-end disk systems.

Midrange storage solutions are the best fit for most infrastructures in terms of performance needs, compared to high-end disk systems, which are commonly found in the most performance-critical organizations. Every vendor has two or three types of midrange storage systems that vary in performance and scalability. The questions are: Where does the performance come from? What in the architecture of midrange disk systems determines performance and scalability? How can you compare the different boxes from each vendor against one another? How can you choose the box that really fits your infrastructure? What are the key factors that should drive your decision?

Simply put, when we take a closer look at midrange storage disk array subsystems, we find that they are all based on central dual clustered controllers that handle all system operations (i.e., host connectivity, cache operations and algorithms, RAID parity calculations, and reads and writes on the disk drives). Storage arrays can be classified by the number of front-end and back-end ports, which represent the system bandwidth accessible by servers and delivered to the disk drives for writing data (i.e., the number of ports indicates the system’s category and performance). Most storage systems in the market are based on 4 Gbps end-to-end Fibre Channel speed, except HP, which still uses 2 Gbps Fibre Channel in the back end. The diagram below gives a flavor of the differences between boxes from different vendors.
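To make the port argument concrete, aggregate bandwidth can be estimated simply as port count times link speed. The sketch below uses illustrative port counts, not any vendor’s actual specifications; only the 4 Gbps and 2 Gbps link speeds come from the article:

```python
# Rough aggregate-bandwidth estimate for one side of a dual-controller array.
# Port counts here are illustrative placeholders, not vendor specifications.

def aggregate_bandwidth_gbps(ports: int, link_speed_gbps: float) -> float:
    """Total theoretical bandwidth across all ports of one type."""
    return ports * link_speed_gbps

# Example: an array with 4 front-end and 4 back-end ports.
front_end   = aggregate_bandwidth_gbps(ports=4, link_speed_gbps=4.0)  # 4 Gbps FC
back_end_4g = aggregate_bandwidth_gbps(ports=4, link_speed_gbps=4.0)
back_end_2g = aggregate_bandwidth_gbps(ports=4, link_speed_gbps=2.0)  # 2 Gbps back end

print(front_end, back_end_4g, back_end_2g)  # 16.0 16.0 8.0
```

With identical port counts, a 2 Gbps back end delivers half the back-end bandwidth of a 4 Gbps one, which is why the end-to-end link speed matters as much as the port count.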

Let’s start the technical navigation and go deeper into the storage array clustered controllers: are they clustered in active-active mode? IBM and HP storage controllers are clustered in active-active mode, whereas the Dell CX3 family is based on an active-passive architecture. The benefit of active-active controllers is that the system always keeps the workload balanced and distributed equally across the two clustered controllers; the balance is monitored continuously and corrective actions are taken automatically. In active-passive systems such as the Dell CX3 family, manual intervention is always required: system administrators must keep an eye on the performance monitoring tool to keep the workload balanced, otherwise unexpected performance degradation or failures can occur.
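As a rough sketch of the difference, the following contrasts automatic balancing of LUN workload across two active controllers with all I/O landing on a single active controller. The LUN names and IOps loads are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical sketch: how LUN workload lands on two controllers.
# LUN names and per-LUN IOps loads are made up for illustration.

luns = {"lun0": 1200, "lun1": 800, "lun2": 600, "lun3": 400}  # IOps per LUN

def active_active(luns):
    """Greedy balance: assign each LUN to the currently less-loaded controller."""
    load = {"A": 0, "B": 0}
    for lun, iops in sorted(luns.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        load[target] += iops
    return load

def active_passive(luns):
    """All I/O is served by the single active controller."""
    return {"A": sum(luns.values()), "B": 0}

print(active_active(luns))   # {'A': 1600, 'B': 1400}
print(active_passive(luns))  # {'A': 3000, 'B': 0}
```

In the balanced case the two controllers carry nearly equal load; in the single-active case one controller carries everything while the other sits idle, which is the imbalance an administrator would otherwise have to correct by hand.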

The second point is the controllers’ cache size. Most vendors advertise a big cache size to make their storage arrays appear higher ranked than their competitors’, drawing attention away from the real differentiator, which is the system’s front-end and back-end ports. The truth is, you should know the effective cache available for data operations only, excluding any overhead consumed by system operations (i.e., pay for what you are going to use, never for a vendor’s limitations). Most storage vendors use part of the cache for system operations, leaving the remainder for the data operations of your production workload. Below is a comparison of the actual cache size used for data operations versus the total cache size across the different storage vendors.

System     Total Cache   System Cache   Data Cache
CX3-20     4 GB          2 GB           2 GB
EVA4100    4 GB          2 GB           2 GB
DS4700     4 GB          0 GB           4 GB
CX3-40     8 GB          3 GB           5 GB
EVA6100    4 GB          2 GB           2 GB
DS4800     8 GB          0 GB           8 GB
CX3-80     16 GB         10 GB          6 GB
EVA8100    8 GB          4 GB           4 GB
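The cache figures above can be turned into an effective-cache ratio (data cache divided by total cache), which makes the real differences easier to see; the values below are taken directly from the comparison:

```python
# Effective data cache as a fraction of total cache (figures from the comparison above).
caches = {  # system: (total_gb, data_gb)
    "CX3-20":  (4, 2),  "EVA4100": (4, 2), "DS4700": (4, 4),
    "CX3-40":  (8, 5),  "EVA6100": (4, 2), "DS4800": (8, 8),
    "CX3-80":  (16, 6), "EVA8100": (8, 4),
}

for system, (total, data) in caches.items():
    print(f"{system}: {data} GB of {total} GB usable ({data / total:.0%})")
```

For example, a DS4800 makes 100% of its 8 GB available for data, while a CX3-80 makes only 6 GB of 16 GB (37.5%) usable, despite advertising the larger total.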


The third point is RAID parity calculation: is it performed by dedicated hardware processors, or is it handled in software? Both the HP EVA family and the Dell CX3 family rely on software RAID parity calculations, which perform below hardware-based implementations. The IBM DS4000 family has a dedicated processor for hardware RAID parity calculation, which is high performing and based on the latest technology.
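Whether done in silicon or in software, RAID-5 parity itself is a byte-wise XOR across the data strips of a stripe. A minimal sketch of the calculation, and of rebuilding a lost strip from the survivors, looks like this:

```python
# Minimal RAID-5 parity sketch: parity is the byte-wise XOR of the data strips.

def xor_parity(strips: list) -> bytes:
    """Byte-wise XOR across equal-length strips (data and/or parity)."""
    parity = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            parity[i] ^= b
    return bytes(parity)

# Three data strips on three disks; parity goes to a fourth disk.
d0, d1, d2 = b"\x0f\x0f", b"\xf0\xf0", b"\xaa\x55"
p = xor_parity([d0, d1, d2])

# If the disk holding d1 fails, its strip is rebuilt by XOR-ing the
# surviving strips with the parity strip.
rebuilt = xor_parity([d0, d2, p])
print(rebuilt == d1)  # True
```

A dedicated parity engine does exactly this operation in hardware, so the controller CPU is not consumed by the XOR work on every write.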

The fourth point is the way the controllers communicate with the disk drives. All storage vendors traditionally used a low-performance, bandwidth-consuming technique called Fibre Channel Arbitrated Loop (i.e., controller signals going to any disk drive in the loop have to pass through all of the disk drives in the loop, back and forth). In effect, the loop bandwidth is divided among the disk drives in the loop (the same as a traditional hub in networking). The maximum number of disks that can be placed in a single loop is 127 nodes; this is a physical limitation, as the signal will not pass if more than 127 devices exist in the same loop. The SAN industry has settled on a practical maximum of 56 to 60 disks per loop for acceptable performance, whereas only Dell allows 120 disks in a single loop, which works entirely against performance, as the following graph shows.

The IBM DS4000 storage array family implements a different technique, called switched fabric (i.e., controller signals going to each disk drive travel directly, without passing through any other disk drive on the path). Here, bandwidth is dedicated to each disk drive on its path (the same as a switch in networking). The physical addressing limit in a switched-fabric disk environment is 2^24 (≈16.7 million) devices, a number so large that no vendor will ever reach it.
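The bandwidth argument above can be put into numbers. The sketch below compares the per-disk share of a shared arbitrated loop against a dedicated switched path; the 4 Gbps link speed and the 56/120 disk counts come from the article, and the simple division is an idealized model that ignores protocol overhead:

```python
# Per-disk bandwidth: shared arbitrated loop vs. dedicated switched path.
# Idealized model: ignores arbitration and protocol overhead.

LOOP_BW_GBPS = 4.0  # one 4 Gbps Fibre Channel loop

def per_disk_loop(n_disks: int) -> float:
    """FC-AL: the loop's bandwidth is shared by every disk on it."""
    return LOOP_BW_GBPS / n_disks

def per_disk_switched() -> float:
    """Switched fabric: each disk gets a dedicated path at full link speed."""
    return LOOP_BW_GBPS

for n in (56, 120):
    print(f"{n} disks on a loop: {per_disk_loop(n):.3f} Gbps each")
print(f"switched back end: {per_disk_switched():.1f} Gbps per disk")
```

Even at the industry's 56-disk guideline each drive sees only about 0.07 Gbps of the loop, and at 120 disks barely 0.03 Gbps, whereas a switched back end keeps the full link speed available on every path.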

To get a flavor of the real differences between a switched-fabric disk back end and an arbitrated-loop one, consider the following figure. Moreover, in arbitrated-loop systems, disk failures deeply affect performance and cannot be predicted. In switched-fabric systems, you can place Fibre Channel disks next to SATA disks without any effect on performance, and disk failures can be predicted.

Finally, let’s look at the performance values of the boxes from the different storage vendors. We will consider vendor-published IOps as well as the SPC-1 and SPC-2 standard benchmarks, including a run with the snapshot copy service active in the background to validate its effect on overall system performance.

System     Vendor IOps   SPC-1    SPC-1 Snapview   SPC-2
CX3-20     138K          N/A      N/A              N/A
EVA4100    154K          N/A      N/A              N/A
DS4700     125K          17,195   N/A              798
CX3-40     203K          24,997   8,997            N/A
EVA6100    154K          N/A      N/A              N/A
DS4800     575K          45,014   N/A              1,381
CX3-80     275K          N/A      N/A              N/A
EVA8100    225K          20,096   N/A              N/A

By Mohamed El Mofty
Storage Networking Solutions Expert
IBM Systems and Technology Group
