High Performance Computing: Think about where you do data operations

Datetime:2016-08-23 05:28:04         Topic: HPC  Test Engineer          Share        Original >>
Here to See The Original Article!!!

The High Performance Computing system at University of Sheffield has several different file systems available to it. We have:-

  • /fastdata – Alustre-based, shared filesystem with hundreds of terabytes of space. No backup. No quota.
  • /data – An NFS file system where each user has access to 100Gb of storage. Back-ups go back 7 days.
  • /home –  An NFS file system where each user has 10Gb. Backed up over 28 days. Mirrored.
  • /scratch – Local disk on each worker node. No back up. Uses ext4.

Lots of options with differing amounts of space, back-up policy and, as I’m about to demonstrate, performance characteristics. I suspect that many other HPC systems have a similar set up.

On our system, it’s very tempting to do everything in /fastdata. There’s lots of space, no quota, readable from all worker nodes simultaneously — good times! I try to encourage people to think about what they are doing, however. Bad things can happen if the lustre filesystem is hammered too much. Also, there can be a huge difference in performance for some operations across different filesystems.

Let’s take an example. I want to download and untar gcc 4.9.2. How long does that take on the three different filesystems?

On the scratch directory of a worker node

cd \scratch
mkdir testing123
cd testing123
wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.gz
time tar xfz ./gcc-4.9.2.tar.gz 

real    0m6.237s
user    0m5.302s
sys 0m3.033s

On the lustre filesystem

cd /fastdata/
mkdir testing123
cd testing123
wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.gz
time tar xfz ./gcc-4.9.2.tar.gz 

real    7m18.170s
user    0m6.751s
sys 0m56.802s

On the NFS filesystem

cd /data/myusername
mkdir testing123
cd testing123
wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.gz
time tar xfz ./gcc-4.9.2.tar.gz 

real	16m37.343s
user	0m6.052s
sys	0m23.438s

For this particular operation, there is a two orders of magnitude difference between the worst and the best option.

I’m not an expert in filesystems and I have no idea what’s causing these differences or if I’d see a similar speed difference given a different file operation. I currently have no interest in doing a robust set of benchmarks. The point I’m making is that if you are using a system that has multiple filesystems it may be worth checking if there’s an advantage to using one over the other for your particular use case.








New

Put your ads here, just $200 per month.