Why Workload Isolation Matters and How We Do It

Datetime: 2016-08-23 04:47:50          Topic: OpenStack

It’s no secret that virtualization drives efficiency in most IT environments. However, bringing workloads closer together and allowing them to share resources can introduce unexpected security challenges. Proper isolation for these workloads is critical.

In this post, you will learn why virtual machines need isolation and what can happen when they do not have it. The Rackspace Private Cloud powered by OpenStack delivers industry-standard isolation for your cloud and we will cover those features as well.

What is isolation and why is it important?

Physical servers are much more isolated than virtual machines. Both have connections to a network, but virtual machines on the same host are far more tightly coupled: they often share the same physical network cards, disks, RAM and processors. The Linux kernel has plenty of safeguards that prevent one virtual machine from accessing resources that it should not have access to, but every piece of software has bugs, even the kernel itself.

Isolation on Linux servers starts with the idea that a process — a virtual machine in this case — cannot access certain parts of the server that it is not authorized to access.  For example, a virtual machine should not be able to read RAM that is in use by another virtual machine.  It also should not be able to access another virtual machine’s disks.  In a highly trusted environment, no virtual machine should attempt to do this anyway.  However, if a virtual machine becomes compromised, all bets are off.

These are the situations where more aggressive isolation can help.

How are virtual machines isolated?

The Linux kernel has a framework called Linux Security Modules, or LSM. There are several LSM implementations, but two of the most common are SELinux and AppArmor. These implementations allow administrators to set policies that govern what processes on a Linux server are allowed to do.

If a process tries to do something that is allowed, the LSM allows the action to occur.  If a process tries to do something it is not allowed to do, the LSM takes action. These actions range from simply logging the event to immediately blocking the action at the kernel level.
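To make the allow, log, and block outcomes concrete, here is a minimal toy model in Python. It is not how SELinux or AppArmor are implemented. The `ToyLSM` class, the `Mode` names, and the policy format are all hypothetical and exist only to illustrate the idea that the same violation can be merely logged or logged and blocked, depending on how the policy is enforced.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    LOG_ONLY = "log-only"  # record the violation but let the action through
    ENFORCE = "enforce"    # record the violation and block the action


@dataclass
class ToyLSM:
    # Hypothetical policy format: a set of (process, action) pairs that are allowed.
    allowed: set
    mode: Mode = Mode.ENFORCE
    audit_log: list = field(default_factory=list)

    def check(self, process: str, action: str) -> bool:
        """Return True if the action may proceed, False if it is blocked."""
        if (process, action) in self.allowed:
            return True
        # Anything not explicitly allowed is a violation and is always logged.
        self.audit_log.append(f"violation: {process} attempted {action}")
        # Only enforcing mode actually stops the action.
        return self.mode is Mode.LOG_ONLY


lsm = ToyLSM(allowed={("vm1", "read-own-disk")})
lsm.check("vm1", "read-own-disk")    # allowed by policy
lsm.check("vm1", "read-vm2-disk")    # blocked and logged
```

Switching `mode` to `Mode.LOG_ONLY` keeps the log entry but lets the action through, which is useful for testing a policy before enforcing it.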

OpenStack uses a system called libvirt to manage virtual machines, and libvirt has a feature called sVirt. The sVirt feature ensures that each virtual machine can only access the resources assigned to it. This provides an extra level of granularity that distinguishes each virtual machine’s set of resources from those of other virtual machines.
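One way to picture this per-VM granularity is to give every virtual machine its own unique label and to tag its resources with the same label. The sketch below is loosely modelled on the category-pair labels that sVirt uses with SELinux. That modelling choice is an assumption made for illustration, the details differ under AppArmor, and the function names are hypothetical.

```python
import random


def assign_labels(vm_names, num_categories=1024, seed=None):
    """Give each VM a distinct label built from a pair of category numbers."""
    rng = random.Random(seed)
    used = set()
    labels = {}
    for name in vm_names:
        # Keep drawing until we find a category pair no other VM has.
        while True:
            pair = tuple(sorted(rng.sample(range(num_categories), 2)))
            if pair not in used:
                break
        used.add(pair)
        labels[name] = f"s0:c{pair[0]},c{pair[1]}"
    return labels


def may_access(vm_label, resource_label):
    """A VM may touch a resource only when the two labels match exactly."""
    return vm_label == resource_label


labels = assign_labels(["vm1", "vm2"], seed=42)
may_access(labels["vm1"], labels["vm1"])  # True: vm1 touching its own disk
may_access(labels["vm1"], labels["vm2"])  # False: vm1 touching vm2's disk
```

Because every label is unique, two virtual machines running the same software under the same general policy still cannot reach each other's resources.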

How does this isolation help during an attack?

The worst-case scenario in a virtualized environment is when an attacker can venture outside the virtual machine. This is commonly called a “VM escape” or a “host privilege escalation.” These escapes usually stem from bugs in the Linux kernel or in the server hardware.

The LSM steps in as a last line of defense. When the attacker attempts to access storage from a different virtual machine, the LSM will walk through its policies and choose to allow or deny the request.  Those steps look like this:

  1. I see that a user in VM #1 wants to access this storage volume from VM #2.
  2. I have a policy that says that a VM can only access disks that it owns.
  3. Does this storage volume belong to VM #1? It does not.
  4. Deny the access.
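The four steps above can be sketched as a small Python function. This is a simplified illustration, not the actual LSM code path: the `check_volume_access` name and the `volume_owner` mapping are hypothetical, and a real LSM makes this decision inside the kernel using its loaded policy.

```python
def check_volume_access(requesting_vm, volume, volume_owner):
    """Decide whether `requesting_vm` may access `volume`.

    `volume_owner` is a hypothetical mapping of volume name -> owning VM.
    """
    # Step 1: a process in `requesting_vm` asks for `volume`.
    # Step 2: the policy is "a VM can only access disks that it owns".
    owner = volume_owner.get(volume)
    # Step 3: does the volume belong to the requesting VM?
    if owner is not None and owner == requesting_vm:
        return "allow"
    # Step 4: deny everything else, including volumes with no known owner.
    return "deny"


volume_owner = {"vol-a": "vm1", "vol-b": "vm2"}
check_volume_access("vm1", "vol-a", volume_owner)  # "allow"
check_volume_access("vm1", "vol-b", volume_owner)  # "deny"
```

Denying by default matters here: a volume that the policy knows nothing about is treated the same as a volume owned by someone else.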

These policies protect the server, and its virtual machines, from most attacks until the server can be patched with a permanent fix for the kernel bug.

How does Rackspace isolate virtual machines in our OpenStack private cloud?

It starts with an understanding of the two main systems within an OpenStack cloud:

  • data plane: contains all of the virtual machines that users create
  • control plane: contains all of the OpenStack services (such as nova, glance, and swift) along with the support services (such as MariaDB and RabbitMQ)

Within the data plane, Rackspace Private Cloud (RPC) uses Ubuntu’s preferred LSM implementation: AppArmor. Along with the additional fine-grained isolation provided by sVirt, AppArmor limits the access of each virtual machine. Any attempt to access unauthorized resources is denied immediately and logged for analysis. In the case of a kernel bug, these isolation technologies will reduce or eliminate the damage caused by a host privilege escalation.

RPC also applies AppArmor profiles and policies within the control plane. Each OpenStack service and each support service is provisioned into its own container, with an AppArmor profile applied to each container. In addition, every container has its storage within a different logical volume on the host. This allows AppArmor policies to further restrict storage access between containers and reduces the chances of a successful container breakout.

The only true way to fix a security bug is to apply an update, but updates take time. These security improvements will provide an extra layer of defense when a long-term fix is still in progress.




