Have you ever made that requirement of your management vendors, or been told that you need one? It’s certainly a logical requirement. As an IaaS provider, you put together products from many vendors. No product can be monitored in isolation, because they affect one another.
This requirement also goes by other names. The popular ones are “single pane of glass” and “end-to-end visibility”.
I’m not implying it is a bad idea. It is in fact the holy grail of IT operations. The mistake is that the solution you have in mind is the wrong one.
Think about it. What is the reason behind the requirement? In other words, what end goal are you trying to achieve?
We can articulate the goal in different words, but they all amount to the same thing. Here are some examples:
- You want to do your job well, which is to serve your customers. The business of IaaS requires you to ensure that the IaaS platform delivers the performance and availability promised in your SLA.
- You want to troubleshoot fast, and to see problems before they become serious.
- If you cannot find the root cause, you at least need to prove it is not due to your IaaS.
On a lighter note, the goal can be summed up in one acronym: TTI.
That’s Time to Innocence, not Time to Investigate.
OK, so there are 2 parties here: you and your customer.
Let’s now look at 2 scenarios of opposite nature.
- Scenario 1: You spot some errors in your storage, network and VMware. But your customers are all happy; none of their VMs are affected. Is there an active fire you need to put out right now? All customers told me the answer is no. Enterprise IT is practical because business is practical. There is no need to complicate matters. You can take your time to troubleshoot, as the business is not affected.
- Scenario 2: Now take the opposite situation. Every component of your IaaS is healthy. Your storage, network, servers and hypervisor are all doing well. You support 10,000 VMs. 9,999 are happy, but 1 is not. That 1 VM happens to be the CEO’s desktop, and he needs to use it right now. Is there a raging fire, and are you firefighting furiously? You bet!
The above 2 scenarios make clear that the single pane of glass you are looking for is not what your customers want you to look at. The dashboard you have in mind has information about your ESXi hosts, clusters, datastores, distributed virtual switches, NSX, physical storage, physical network, FC fabric, etc. You want to see them all: how each performs, and how they relate to one another.
I hope you see the problem with such a single pane of glass.
Yup, your customers don’t care. As far as they are concerned, your infrastructure is irrelevant. The dashboard shows too little information about your customers, and too much about your IaaS.
Focus on what they care about, which is their VMs and how well you serve them.
- If you promise 0 dropped packets for network, then prove that not a single VM experiences a dropped packet.
- If you promise 30 ms disk latency for Tier 3 service level, then prove that every VM has its IOs served within 30 ms.
- If you promise that performance will be as good as physical for your Platinum service level, then show that not a single VM in the Platinum cluster is contending for CPU and RAM.
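To make “prove that not a single VM breaches” concrete, here is a minimal Python sketch, assuming you can export each VM’s worst value per metric for the reporting period from your monitoring tool. The tier names, the metric names (`dropped_packets`, `disk_latency_ms`, `cpu_contention_pct`, `ram_contention_pct`), the `VmSample` class and the `sla_breaches` function are all hypothetical; only the thresholds quoted above (0 dropped packets, 30 ms for Tier 3, no contention for Platinum) come from the text.

```python
from dataclasses import dataclass

# Hypothetical per-tier SLA thresholds. Only the values quoted in the text are
# used: 0 dropped packets, 30 ms disk latency for Tier 3, and no CPU/RAM
# contention for Platinum. Metric names are illustrative, not product counters.
SLA = {
    "Platinum": {"dropped_packets": 0, "cpu_contention_pct": 0.0, "ram_contention_pct": 0.0},
    "Tier 3": {"dropped_packets": 0, "disk_latency_ms": 30.0},
}


@dataclass
class VmSample:
    vm: str
    tier: str
    metrics: dict  # metric name -> worst value observed in the period


def sla_breaches(samples):
    """Return (vm, metric, value, limit) for every VM metric above its tier's limit."""
    breaches = []
    for s in samples:
        for metric, limit in SLA.get(s.tier, {}).items():
            value = s.metrics.get(metric)
            # A missing metric is skipped here; see the note below.
            if value is not None and value > limit:
                breaches.append((s.vm, metric, value, limit))
    return breaches


# Hypothetical sample data for three VMs.
samples = [
    VmSample("app-01", "Tier 3", {"dropped_packets": 0, "disk_latency_ms": 12.5}),
    VmSample("db-07", "Tier 3", {"dropped_packets": 0, "disk_latency_ms": 41.0}),
    VmSample("ceo-desktop", "Platinum",
             {"dropped_packets": 3, "cpu_contention_pct": 0.0, "ram_contention_pct": 0.0}),
]

for vm, metric, value, limit in sla_breaches(samples):
    print(f"{vm}: {metric} = {value}, limit {limit}")
```

An empty list is the proof you are after. This sketch silently skips a VM that has no data for a metric; in practice you would probably want to flag missing data as well, since “no data” is not the same as “no breach”.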
The chart below shows a reasonable expectation of your IaaS from the CIO’s viewpoint.
How do we prove that not a single VM in any service tier fails the SLA threshold you promise for that tier? Since this is IaaS, that means covering CPU, RAM, disk and network, the 4 main components of infrastructure.
Your CIO expects to see consistent performance, so you need to show it in a monthly report.
Is it easy? Let’s see. Assume you look after 4,000 VMs.
Can you picture what the dashboard looks like?
That’s the dashboard you need to display as The First Dashboard. It is the first pane in your single pane of glass screen.
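With 4,000 VMs you cannot chart every VM individually. One way to get there, sketched below in Python under stated assumptions, is to collapse all VMs in a tier into the worst value per day and compare that single number against the tier’s threshold, so the monthly report has one line per tier. The input row format, the functions `worst_per_tier_per_day` and `monthly_report`, and the 10 ms Tier 1 threshold are hypothetical; only the 30 ms Tier 3 threshold comes from the text. Disk latency is used as the example metric; CPU, RAM and network would follow the same pattern.

```python
from collections import defaultdict

# Hypothetical thresholds: 30 ms for Tier 3 is from the text; 10 ms for
# Tier 1 is a made-up value for illustration only.
LATENCY_LIMIT_MS = {"Tier 1": 10.0, "Tier 3": 30.0}


def worst_per_tier_per_day(rows):
    """Collapse many VMs into one value per (tier, day): the worst VM's latency."""
    worst = defaultdict(float)
    for day, tier, _vm, latency_ms in rows:
        worst[(tier, day)] = max(worst[(tier, day)], latency_ms)
    return dict(worst)


def monthly_report(rows, limits):
    """Per tier: worst latency of the month and the days on which any VM breached."""
    report = {}
    for (tier, day), value in sorted(worst_per_tier_per_day(rows).items()):
        entry = report.setdefault(tier, {"worst_ms": 0.0, "breach_days": []})
        entry["worst_ms"] = max(entry["worst_ms"], value)
        if value > limits[tier]:
            entry["breach_days"].append(day)
    return report


# Hypothetical sample: (day of month, tier, VM name, that VM's worst latency in ms).
rows = [
    (1, "Tier 3", "app-01", 12.0),
    (1, "Tier 3", "db-07", 28.0),
    (2, "Tier 3", "app-01", 9.0),
    (2, "Tier 3", "db-07", 41.0),
    (1, "Tier 1", "erp-01", 4.0),
    (2, "Tier 1", "erp-01", 6.5),
]

for tier, entry in monthly_report(rows, LATENCY_LIMIT_MS).items():
    verdict = "met" if not entry["breach_days"] else f"breached on days {entry['breach_days']}"
    print(f"{tier}: worst {entry['worst_ms']} ms, limit {LATENCY_LIMIT_MS[tier]} ms, SLA {verdict}")
```

Taking the maximum rather than the average is the deliberate design choice here: averaged over thousands of VMs, the one unhappy CEO desktop from Scenario 2 would disappear from the chart.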