Handling Many Small Files

Working with many small files can cause performance problems on HPC systems. Every time a file is opened, read, or written, metadata requests are sent to the file system. Because file systems such as PC2PFS or PC2DATA are not local to the compute node, each of these requests incurs network latency, and with many small files the accumulated latency can dominate the runtime of a job.

 

Possible solutions are:

  • using alternative file formats for data storage, such as HDF5 or NetCDF, which hold many datasets in a single file

  • using the internal NVMe SSDs available in many compute nodes, see Otus partitions and Noctua 2 partitions

  • packing the small files into a (compressed) archive file
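
As an illustration of the archive approach, the following minimal sketch packs a directory of small files into one gzip-compressed tar archive using Python's standard tarfile module, and reads a single file back without unpacking the archive. The function names pack_directory and read_member and all paths are hypothetical and chosen only for this example; this is one possible way to do it, not a recommended PC2 workflow.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack all regular files below src_dir into one gzip-compressed tar archive.

    Returns the number of files added. archive_path should lie outside
    src_dir, otherwise the archive would try to include itself.
    (Hypothetical helper for illustration.)
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count


def read_member(archive_path: str, member_name: str) -> bytes:
    """Read one file from the archive without unpacking it to disk.

    Raises KeyError if member_name is not in the archive.
    (Hypothetical helper for illustration.)
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        with extracted:
            return extracted.read()
```

With this approach the remote file system only has to handle a single large file. Note that reading individual members from a compressed tar archive requires decompressing it sequentially, so if a job needs many files repeatedly, it is usually more efficient to unpack the archive once to node-local storage at the start of the job.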

 

The best solution depends on the situation, so please don't hesitate to contact us, for example via pc2-support@uni-paderborn.de, if you are considering working with many small files.