The age of managed infrastructure is coming to an end. Infrastructure as a service (IaaS) providers like Amazon Web Services (AWS) have spent the last decade developing platforms that eliminate the manual, time-consuming maintenance of hardware, network, and storage, and that allow infrastructure to be controlled via software, i.e., infrastructure as code.
But contrary to what you might expect, enterprises are still managing IaaS. Buildout and maintenance tasks in AWS can be performed via API call and can therefore be automated, but many enterprises still choose to maintain AWS manually, instance by instance. Or, perhaps worse, they purchase a “portal” to orchestrate IaaS. These are very expensive and potentially risky choices.
What Is the Value of Infrastructure?
As the vast majority of businesses become digital (software) companies, the value of infrastructure lies only in its ability to support business-critical applications that change frequently. Therefore, the best possible infrastructure is reliable and quick to spin up or destroy.
In the old world, the only way to make infrastructure more reliable was to throw more people and dollars at the problem. More people to monitor things like CPU utilization and replicate data, either in-house or outsourced to a managed service provider; more dollars to buy better hardware, live backups, etc.
In IaaS, such concerns are irrelevant. AWS monitors hardware, CPU usage, network performance, etc. AWS can be programmed to take snapshots on a regular schedule. If you use a service like AWS Aurora, Amazon’s managed database service, you get replication, upgrades, and management built in. If you want to improve reliability or disposability, AWS does not offer “premium hardware”; instead, you must architect your environment in new ways and rely on automation to improve SLAs. In other words, you must treat AWS resources like objects that can be manipulated with code.
In this new world, you do not care about CPU utilization. Your metrics of success are not measured in minutes of downtime. If you architect IaaS correctly, you should never have downtime unless Amazon has a global issue, which very rarely happens. Instead, your KPIs are things like: How many times did we push changes to our infrastructure as code base? How many infrastructure changes produced errors? How long does it take to go from cloud architectural plan to delivering an environment?
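The three KPIs named above can be computed from a simple log of infrastructure changes. The following is a minimal sketch, not a prescribed tool: the `InfraChange` record, its field names, and the sample data are all hypothetical and stand in for whatever your version control and deployment systems actually record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class InfraChange:
    """One push to the infrastructure-as-code base (hypothetical record)."""
    planned_at: datetime    # when the architectural plan was agreed
    delivered_at: datetime  # when the environment was delivered
    produced_error: bool    # whether the change produced an error


def change_kpis(changes: list[InfraChange]) -> dict:
    """Summarize change count, change failure rate, and plan-to-delivery time."""
    if not changes:
        raise ValueError("need at least one change")
    failures = sum(1 for c in changes if c.produced_error)
    total_lead = sum((c.delivered_at - c.planned_at for c in changes), timedelta())
    return {
        "changes_pushed": len(changes),
        "change_failure_rate": failures / len(changes),
        "mean_plan_to_delivery": total_lead / len(changes),
    }


# Made-up sample data, for illustration only.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    InfraChange(start, start + timedelta(hours=2), False),
    InfraChange(start, start + timedelta(hours=4), True),
    InfraChange(start, start + timedelta(hours=3), False),
    InfraChange(start, start + timedelta(hours=3), False),
]
kpis = change_kpis(sample)
print(kpis)
```

Tracking these numbers over time, rather than CPU graphs, is what tells you whether your automation is improving.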
Cloud automation is what drives better availability, better cost management, better governance, and better time-to-delivery. So whether an enterprise chooses to build an automation team in-house or outsource it to a next-gen service provider, automation should be at the top of its cloud priority list.
Cloud Automation in Action
IaaS gives you the tools to control infrastructure programmatically. In other words, you can manipulate infrastructure resources with code rather than through the AWS console or by manually typing in the CLI. This fits in with the larger vision set out in Agile philosophy, particularly the art of maximizing work not done; if you can automate, you should.
What does this look like in practice with AWS? Teams that want to spend less time on manual infrastructure maintenance usually do one or all of the following:
- They use the fully managed cloud services that AWS already provides (like AWS Aurora or AWS Redshift) as much as possible
- They automate the buildout of infrastructure resources using a templating tool (like AWS CloudFormation)
- They automate the install/configuration of the OS with a configuration management tool
- They integrate infrastructure automation with their existing deployment pipeline
- They prepare for the future of IT where serverless compute resources like AWS Lambda abstract away infrastructure orchestration entirely
The impact of this model is enormous. When you take full advantage of AWS services, you minimize engineer effort, reduce risk by automating things like backups and failover, and get built-in upgrades. When you automate infrastructure buildout and configuration, you enable rapid change, upgrading, patching, and self-healing of AWS resources without human intervention. Engineers never modify individual instances directly; instead they modify templates and scripts so that every system change is documented and can be rolled back, reducing the risk and effort of change.
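To make "modify templates, not instances" concrete, here is a minimal sketch that generates a CloudFormation template from code and shows that a change to a server is just a reviewable difference between two template versions. The `build_template` and `changed_properties` helpers, the `WebServer` logical name, and the AMI ID are placeholders invented for illustration. Deploying the template with CloudFormation is not shown.

```python
import json


def build_template(instance_type: str, ami_id: str) -> dict:
    """Return a minimal CloudFormation template describing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single web server defined as code (illustrative)",
        "Resources": {
            "WebServer": {  # logical name chosen for this sketch
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": instance_type,
                    "ImageId": ami_id,
                },
            }
        },
    }


def changed_properties(old: dict, new: dict, resource: str = "WebServer") -> set:
    """List which properties of a resource differ between two template versions."""
    old_props = old["Resources"][resource]["Properties"]
    new_props = new["Resources"][resource]["Properties"]
    return {k for k in old_props.keys() | new_props.keys()
            if old_props.get(k) != new_props.get(k)}


PLACEHOLDER_AMI = "ami-00000000000000000"  # not a real image ID

v1 = build_template("t3.micro", PLACEHOLDER_AMI)
v2 = build_template("t3.small", PLACEHOLDER_AMI)  # resize by editing the template

print("changed:", changed_properties(v1, v2))
print(json.dumps(v2, indent=2, sort_keys=True))
```

Because both versions live in version control, rolling back the resize means redeploying the earlier template rather than repairing a machine by hand.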
When you move to this model, you are building infrastructure as code, not managing infrastructure. Your engineers are now essentially “developers” of infrastructure software. This requires a new set of skills and an entirely new outlook on how engineers should spend their time.
The Cost of Managed Infrastructure
Unfortunately, many enterprises still throw people and money at cloud availability and agility issues. They create (or buy) “orchestration portals” that tell them about instance performance, storage utilization, and resource usage. They use the same security processes as before, i.e., they spend many weeks of each deployment cycle manually testing infrastructure and keep compliance checklists in spreadsheets. They use only the most basic AWS services, perform manual upgrades and updates, and in the case of an issue, they nurse individual instances back to health. In other words, they still manage infrastructure.
What is the real cost of this model? A recently released report by Puppet found that high-performing IT teams (those that prioritize automation, high deployment velocity, and the “work not done” principle) spend 22% less time on unplanned work and rework than low-performing IT teams. High performers have a change failure rate of 7.5%, compared with 38% for medium performers. Mean time to recover is 1 hour for a high-performing organization and 24 hours for a medium performer. If you multiply mean time to recover by the average cost of downtime at your organization, the real cost of not prioritizing automation becomes unjustifiable.
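The multiplication described above is easy to run with your own numbers. In the sketch below, the failure rates and recovery times are the figures quoted from the Puppet report; the hourly downtime cost and the number of changes per year are hypothetical placeholders. The annual estimate also makes a simplifying assumption that each failed change causes one outage lasting the mean time to recover.

```python
def incident_cost(mttr_hours: float, cost_per_hour: float) -> float:
    """Cost of one outage: mean time to recover times hourly downtime cost."""
    return mttr_hours * cost_per_hour


def annual_failure_cost(changes_per_year: float, change_failure_rate: float,
                        mttr_hours: float, cost_per_hour: float) -> float:
    """Yearly estimate, assuming each failed change causes one MTTR-long outage."""
    failed_changes = changes_per_year * change_failure_rate
    return failed_changes * incident_cost(mttr_hours, cost_per_hour)


COST_PER_HOUR = 10_000   # hypothetical: substitute your own downtime cost
CHANGES_PER_YEAR = 200   # hypothetical: substitute your own change volume

# Failure rates and recovery times as quoted in the text.
high = annual_failure_cost(CHANGES_PER_YEAR, 0.075, 1, COST_PER_HOUR)
medium = annual_failure_cost(CHANGES_PER_YEAR, 0.38, 24, COST_PER_HOUR)

print(f"high performer:   ${high:,.0f} per year")
print(f"medium performer: ${medium:,.0f} per year")
```

With these placeholder inputs, a single outage costs $10,000 for the high performer and $240,000 for the medium performer, and the yearly totals come to about $150,000 versus about $18.2 million.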
What could your engineers do with 22% more time? What could they do if they were not constantly firefighting broken virtual machines — and could instead blast away the instance and rebuild a new one in minutes? They would spend more time on new projects, the products that drive real business value and revenues.
It is true that automation itself takes time and money. It also takes expertise, the kind that is hard to find. Yet infrastructure automation is the inflection point that jumpstarts an organization’s DevOps efforts. These factors make it an ideal service to outsource; it is a skills gap that enterprises are struggling to fill, and a non-disruptive place where MSPs can provide value without replacing internal DevOps efforts. A next-generation MSP that has already developed proprietary infrastructure automation software is an ideal fit; just beware of companies that sell “Managed IaaS” that is just monitoring and upgrades, because they will not help you escape from infrastructure management.
The Future of Infrastructure as Code
We are entering a world where infrastructure is not only disposable, it is invisible. The best way to manage infrastructure is not to manage infrastructure at all, but instead to develop an automation layer composed of AWS services, third-party tools, and your own custom-built software. No matter how your market changes or what the state of IT is in five years, investing now in automation will allow you to adapt quickly.
Major cloud providers are pushing the market in the direction of management-less IT, and it is only a matter of time before the market follows. Chances are that adherents of the “management” model, both internal teams and external vendors, will linger, still wanting to patch and update machines with the same reactive, break/fix approach they used in the 1990s and 2000s in managed datacenters. When companies gain AWS expertise and realize how little value infrastructure management adds, IT will evolve into an infrastructure as code provider. Value creation is moving further up the stack, and IT must follow.
By Jason Deck
SVP of Strategy, Logicworks