Be it a single compute node, a multi-node compute cluster, or even a full-blown data center with multiple Clusters & Grids, every one of them will have 2 things for sure:
1. Applications doing the compute
2. Storage for the Applications
Now, every one of us associated with any of the above has surely come across the questions below at least a couple of times, if not hundreds or thousands of times.
- The Application is running slow? Damn, why? No idea?
- Is the problem because of badly written Application code?
- Is the storage bottlenecked, making the Application crawl?
- Is the storage not able to handle the Application's I/O requests?
- Is it network congestion, due to which the network storage server is serving I/Os so slowly?
- Are there enough disk spindles in the storage to serve the IOPS needed by the Application? Or shall we invest in more spindles?
- Nowadays there's a new kid on the block, the SSD. Shall we replace our complete backend storage layer with an all-SSD solution? Or a hybrid solution?
- Should we move the Application to the cloud?
And numerous other such questions come to mind, and we have no clue about the correct answers.
Those on the Application side will invariably refer the issue to the Storage folks.
As Storage folks, we would then start investigating how our storage servers/devices behave under the Application workload.
Our Storage industry is filled with buzzwords like Application I/O Analytics, Data Analytics, Storage Analytics, Performance Analysis, File Analytics, I/O Analytics, Storage Monitoring and lots more. And a lot of tools promise that they can do one or more of the above. But is it true? Well, we will find out the truth about a couple of them.
The aim of any Storage Analytics data is to analyze it, arrive at a conclusion about the behaviour and performance of the Storage/Application, and make recommendations for improving the performance. Let's see how the popular tools available come to our rescue. The first tool we will use is iostat. Heard of it, right?
iostat is a very popular tool packaged in the sysstat utilities on Linux.
It is used to monitor disk I/O performance in terms of blocks read or written.
When executed without any arguments, the command displays blocks read & written since the time the system booted.
Note: iostat displays CPU stats as well, along with the I/O stats. But we will focus on the I/O stats only.
Performance of Local or Block Devices
There are various modes of output metrics that iostat provides
I/O metrics in terms of 512-byte blocks
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda              20.06       650.13        75.82   12480434    1455464
    sdc               5.04       116.79      2196.00    2242100   42156560
    sdd               4.96       115.81      2193.09    2223204   42100552
    sdb               0.06         0.51         0.00       9842          8
    sde               0.13         1.07         0.00      20528          0
    md127           549.42       231.92      4389.09    4452064   84257112
- tps : Transfers per second, i.e. I/O requests issued to the device per second
- Blk_read/s : Number of 512-byte blocks read per second
- Blk_wrtn/s : Number of 512-byte blocks written per second
- Blk_read : The total number of blocks read
- Blk_wrtn : The total number of blocks written
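As a small illustration (a sketch of my own, using the sda numbers from the sample output above), these per-second block counters are already enough to compute the byte throughput and the read/write mix of the workload:

```python
SECTOR = 512  # iostat "blocks" here are 512-byte sectors

def rw_summary(blk_read_s, blk_wrtn_s):
    """Convert per-second block counts into bytes/sec and a read percentage."""
    read_bps = blk_read_s * SECTOR
    write_bps = blk_wrtn_s * SECTOR
    total = read_bps + write_bps
    read_pct = 100.0 * read_bps / total if total else 0.0
    return read_bps, write_bps, read_pct

# sda from the sample output above: 650.13 blocks read/s, 75.82 written/s
r, w, pct = rw_summary(650.13, 75.82)
print(f"read {r:.0f} B/s, write {w:.0f} B/s, {pct:.1f}% reads")
```

So sda is serving a read-heavy workload, roughly 90% reads by volume.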
I/O metrics in terms of Kilobytes (iostat -k)
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    sda              19.64       318.14        37.13    6242073     728476
    sdc               4.94        57.14      1074.30    1121050   21078280
    sdd               4.86        56.66      1072.88    1111602   21050276
    sdb               0.06         0.25         0.00       4921          4
    sde               0.13         0.52         0.00      10264          0
    md127           537.56       113.46      2147.18    2226032   42128556
- kB_read/s : Number of Kilobytes of data read per second
- kB_wrtn/s : Number of Kilobytes of data written per second
- kB_read : The total number of Kilobytes read
- kB_wrtn : The total number of Kilobytes written
I/O metrics in terms of Megabytes (iostat -m)
    Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
    sda              19.64         0.31         0.04       6095        711
    sdc               4.93         0.06         1.05       1094      20584
    sdd               4.86         0.06         1.05       1085      20556
    sdb               0.06         0.00         0.00          4          0
    sde               0.12         0.00         0.00         10          0
    md127           537.45         0.11         2.10       2173      41141
- MB_read/s : Number of Megabytes of data read per second
- MB_wrtn/s : Number of Megabytes of data written per second
- MB_read : The total number of Megabytes read
- MB_wrtn : The total number of Megabytes written
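The three modes are just unit conversions of the same underlying counters: one iostat "block" here is 512 bytes, so the kB and MB figures can be derived directly. A quick sanity check (note: the sample outputs above were captured at slightly different moments, which is why the live numbers differ a little):

```python
blk_read_s = 650.13                   # sda, Blk_read/s from the default-mode output
kb_read_s = blk_read_s * 512 / 1024   # what "iostat -k" would report
mb_read_s = kb_read_s / 1024          # what "iostat -m" would report
print(f"{kb_read_s:.2f} kB/s, {mb_read_s:.2f} MB/s")
```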
All the above metrics are very basic in nature and only provide answers to basic questions like:
- Is the workload read-intensive or write-intensive? What is the percentage of reads vs. writes?
- Which disk is heavily loaded & which one is least loaded?
- How heavy is the I/O workload? Is it continuous or in bursts?
But, to analyze the real performance problem, we have to drill down to the next level of metrics. For that we need to run iostat in extended mode with an interval argument.
This displays a few more advanced I/O metrics which can help us deduce the behaviour of the I/O & storage.
iostat -x 2
Here, it will display the metrics every 2 seconds, i.e. each report is actually the difference of the metrics between the current time and 2 seconds back.
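Under the hood, iostat reads cumulative counters from /proc/diskstats, and each interval report is simply the delta between two samples divided by the interval length. A minimal sketch of that arithmetic (the counter values below are hypothetical, chosen only for illustration):

```python
def interval_rate(prev_count, curr_count, interval_s):
    """Per-second rate of a cumulative counter over one sampling interval."""
    return (curr_count - prev_count) / interval_s

# e.g. a sectors-read counter sampled 2 seconds apart (hypothetical values)
rate = interval_rate(12480434, 12481874, 2.0)
print(rate)  # -> 720.0 sectors/s
```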
    Device:  rrqm/s  wrqm/s    r/s     w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
    sda        0.61    9.28  21.02    1.20   720.37    83.89     36.20      0.31   14.12   2.93   6.52
    sdc        0.13  298.99   0.42    5.17   129.41  2433.28    458.45      3.40  607.49   5.03   2.81
    sdd        0.11  298.56   0.30    5.20   128.32  2430.05    465.05      1.65  299.37   4.54   2.50
    sdb        0.01    0.00   0.06    0.00     0.57     0.00      9.10      0.00    0.98   0.98   0.01
    sde        0.01    0.00   0.14    0.00     1.18     0.00      8.37      0.00    0.50   0.50   0.01
    md127      0.00    0.00   0.86  607.92   256.97  4863.33      8.41      0.00    0.00   0.00   0.00
- rrqm/s : Number of merged read requests issued to the device per second. A large rrqm/s value suggests sequential reads
- wrqm/s : Number of merged write requests issued to the device per second. A large wrqm/s value suggests sequential writes
- r/s : Number of read requests issued to the device per second. A large r/s value signifies a read-intensive I/O workload
- w/s : Number of write requests issued to the device per second. A large w/s value signifies a write-intensive I/O workload
- rsec/s : Number of sectors read from the device per second
- wsec/s : Number of sectors written to the device per second
- avgrq-sz : Average size of the requests (in sectors) issued to the device. A larger request size suggests sequential I/O
- avgqu-sz : Average queue length of the requests issued to the device. A large queue means heavy I/O is happening. But if %util is low at the same time, we can assume it is burst I/O
- await : Average time in milliseconds for I/O requests to be served (time spent in the queue + time taken to service them)
- svctm : Obsolete field - don't use this anymore
- %util : Percentage of CPU time during which I/O requests were issued to the device. If this stays close to 100% for a good amount of time, that's a clear indication of a saturated/bottlenecked disk
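To make the interpretation concrete, here is a rough heuristic classifier built on these fields. The thresholds (256 sectors for "sequential", 90% for "saturated") are my own assumptions for illustration, not official iostat rules:

```python
def classify(avgrq_sz_sectors, util_pct, avgqu_sz):
    """Rough workload classification from iostat -x fields (heuristic only)."""
    # Large average request sizes suggest sequential I/O
    pattern = "sequential" if avgrq_sz_sectors >= 256 else "random/small"
    if util_pct >= 90:
        load = "saturated"            # disk busy nearly all the time
    elif avgqu_sz > 1 and util_pct < 50:
        load = "bursty"               # deep queue but mostly idle disk
    else:
        load = "ok"
    return pattern, load

# sdc from the sample above: avgrq-sz 458.45, %util 2.81, avgqu-sz 3.40
print(classify(458.45, 2.81, 3.40))  # -> ('sequential', 'bursty')
```

This matches what the raw numbers hint at for sdc: large sequential writes arriving in bursts, with the disk itself far from saturated.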
Analytics for a Network Storage / Filesystem
iostat has minimal support for NFS / network shares as well (via the -n option in older sysstat versions).
    Filesystem:          rBlk_nor/s  wBlk_nor/s  rBlk_dir/s  wBlk_dir/s  rBlk_svr/s  wBlk_svr/s  ops/s  rops/s  wops/s
    192.168.1.12:/nfs1    120832.00        0.00        0.00        0.00   119808.00        0.00  60.00   58.50    0.00

    Filesystem:           rMB_nor/s   wMB_nor/s   rMB_dir/s   wMB_dir/s   rMB_svr/s   wMB_svr/s  ops/s  rops/s  wops/s
    192.168.1.12:/nfs1        84.50        0.00        0.00        0.00       85.00        0.00  77.50   85.00    0.00
- rBlk_nor/s : Number of 512-byte blocks read per second by applications using the read() system call
- wBlk_nor/s : Number of 512-byte blocks written per second by applications using the write() system call
- rBlk_dir/s : Number of 512-byte blocks read per second by applications using direct I/O reads
- wBlk_dir/s : Number of 512-byte blocks written per second by applications using direct I/O writes
- rBlk_svr/s : Number of blocks read per second from the NFS server using NFS READs
- wBlk_svr/s : Number of blocks written per second to the NFS server using NFS WRITEs
- ops/s : Number of operations issued to the filesystem per second
- rops/s : Number of read operations issued to the filesystem per second
- wops/s : Number of write operations issued to the filesystem per second
The above output can also be seen in terms of Kilobytes or Megabytes instead of 512-byte blocks, as shown in the second output, where rMB_nor/s corresponds to rBlk_nor/s, and so on.
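One useful figure that can be derived from this output: rBlk_nor/s counts what applications read via the read() call, while rBlk_svr/s counts what actually went over the wire to the NFS server, so comparing them gives a rough view of how much the client-side cache absorbed. A sketch using the sample numbers above:

```python
def cache_served_pct(app_blocks_s, server_blocks_s):
    """Share of application reads that did not have to hit the NFS server."""
    if app_blocks_s == 0:
        return 0.0
    # Clamp at 0: server-side readahead can make the wire count exceed app reads
    return 100.0 * max(app_blocks_s - server_blocks_s, 0) / app_blocks_s

# 192.168.1.12:/nfs1 above: rBlk_nor/s = 120832.00, rBlk_svr/s = 119808.00
print(f"{cache_served_pct(120832.00, 119808.00):.2f}% of reads served locally")
```

Here almost every application read went to the server, i.e. the client cache is barely helping this workload.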
How helpful is iostat? Is the block-level analytical data displayed above enough for our storage-thirsty souls?
These metrics provide us answers to questions like:
- Is the I/O sequential or random?
- Are the disks able to handle the I/O load effectively, or do they get saturated at a point?
- Is the I/O read-intensive or write-intensive?
- What is the latency of each I/O request? Are the I/Os being served at an acceptable rate/throughput, or is each I/O request taking too much time, thereby resulting in bad throughput?
- Which disks are utilized heavily and which are not at all?
- Do we need striping by adding more spindles?
That's it? Any analytical data above the disk layer? Like files, directories, IOPS? Well, sadly, no…
Are the metrics provided by iostat enough to analyze Application & Storage behaviour in a complex environment?
Many Storage analysts would be satisfied with the above analysis. But a good number of them would consider the above analytics information bare minimal and look for more data points like the ones below. The analytical data that we saw above only tells us about the Application's I/O in terms of blocks read and written and the latency of each I/O request. It has no clue about the answers to our even deeper questions. Does it? No way…
Can we somehow find these ?
- Which files were accessed during the I/O workload? We need the details, the exact file paths
- What I/O operations were done on them? Read, write or both?
- What filesystem operations were done? Were there too many open, close, link, symlink, unlink etc.?
- How many files were accessed during the I/O? One large file or thousands of small files?
- Which files were re-read or re-written multiple times?
- Which files were written and never read again?
- Which files were only read?
- Which areas of each file were read/written? Or was the whole file accessed? What are the hot areas of the file?
- Does the I/O involve a lot of Attribute operations or only plain read and write ?
- Throughput, IOPS and latency at different levels, starting from the mount itself down to each file
- For any file, we should be able to see its individual IOPS, throughput and latency for reads & writes
- Which processes were doing what kind of I/O, and how much? Like some processes reading a file while others write to it?
- We need to know how exactly a process (or processes) is accessing a file
- What about process groups? Which process group(s) are accessing which files, and what are their I/O analytics?
- Or an even larger subset, i.e. sessions in the system: which sessions are acting on which files, and what are their analytics? For example, which files is the Gnome Desktop Environment session acting on, and what operations are done on them?
- What is the resident data set size of an Application workload? Does the application access data in MBs, GBs or TBs?
- Does it access a large amount of data only once, or a small amount of data thousands of times?
- The process tree hierarchy of the processes doing I/O. It should be clear exactly which process is doing what I/O
What about Applications running on multiple nodes and accessing the same set of data?
- How do we get consolidated analytics for all the nodes?
- If multiple compute nodes have mounted the same NFS share and are accessing one or more common files, can we get the total I/O metrics on those files? And the individual metrics from each compute node as well?
- How about I/O analytics in an Autofs environment? We need all I/O metrics for all mounts under an Autofs
- Which files do they access, what I/O operations are done, throughput, IOPS, latency etc.?
- And how about if we want to know how the backend NFS servers are behaving? We need IOPS, latency, throughput, the files accessed from them, their metrics and everything related to each storage box.
Now, let's talk about some state-of-the-art Application & Storage Analytics requirements.
In a group of one or more compute nodes running multiple applications, accessing files from multiple NFS filers, can we find out:
- The effective IOPS, Latency, Throughput, Read/Write done by each process
- The files accessed during I/O by each process – again in each node and aggregated in all nodes
- The IOPS, Latency, Throughput, Read/Write done for each file
- The effective IOPS, Latency, Throughput, Read/Write achieved by each compute node
- The effective IOPS, Latency, Throughput, Read/Write of each backend Storage Server / Filer
- Imagine how good it would be if all the above analytics data were available to us in 2 formats, i.e. for each node individually as well as aggregated across all nodes.
- Can we get analytics data for a selected group of applications or a selected group of backend filers? For example, we could select 5-10 applications and see how they are behaving: the effective IOPS, throughput and latency they are getting, the reads/writes done by them, the files being accessed by them, and all other such information that will help us fine-tune them further.
Let me stop here, as I can see it's too much to ask of any utility or Application/Storage analytical tool.
Now, do any of you know of a tool/utility/software by which we can get the answers to the questions above? If not all, any tool/utility which can provide answers to at least a few of the above questions? Let me know about it in the comments section.
Well, we have one such tool which can answer all of the above questions… yes, all of the above and even more. Want to know which one? Stay tuned, I will reveal it in the next blog very soon.