Handling Many Small Files
Working with many small files can cause problems on HPC systems. Every time a file is opened, read, or written, metadata requests are sent to the file system. File systems such as PC2PFS or PC2DATA are not local to the compute node, so each request incurs network latency. With many small files, these latencies add up and can become a bottleneck.
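To get a rough feeling for this cost, you can time the metadata calls on a directory of your own. The following is only a minimal sketch: the function name `time_metadata_calls` is made up for illustration, and the numbers depend heavily on the file system, its current load, and client-side caching, so repeated runs on the same directory may look much faster than the first one.

```python
import os
import time


def time_metadata_calls(directory):
    """Return (number of entries, seconds spent calling os.stat on each).

    Hypothetical helper for illustration only; it measures one metadata
    request (stat) per directory entry and nothing else.
    """
    names = os.listdir(directory)
    start = time.perf_counter()
    for name in names:
        # Each stat call is a metadata request to the file system.
        os.stat(os.path.join(directory, name))
    elapsed = time.perf_counter() - start
    return len(names), elapsed


if __name__ == "__main__":
    count, seconds = time_metadata_calls(".")
    print(f"{count} entries, {seconds:.4f} s spent on stat calls")
```

Comparing the result for the same data on a remote file system and on a node-local disk can indicate whether metadata latency is relevant for your workload.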
Possible solutions are:
using alternative file formats for data storage, such as HDF5 or NetCDF, which can hold many datasets in a single file
using the internal NVMe SSDs available in many compute nodes, see Otus partitions and Noctua 2 partitions
packing the small files into a (compressed) archive file
ratarmount (https://github.com/mxmlnkn/ratarmount) can then mount this archive file as a virtual file system, so that your scripts and programs can access the files as usual
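As an illustration of the archive approach, the following sketch packs a directory of small files into one gzip-compressed tar archive and reads a single member back without unpacking the archive to disk. It uses only Python's standard `tarfile` module. The function names `pack_directory` and `read_member` and all paths are hypothetical, and the archive is assumed to be written outside the source directory.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir, archive_path):
    """Pack all regular files below src_dir into one .tar.gz archive.

    Returns the number of files added. Assumes archive_path lies
    outside src_dir, otherwise the archive would try to include itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count


def read_member(archive_path, member_name):
    """Read one file from the archive without extracting it to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        handle = tar.extractfile(member_name)  # KeyError if the name is unknown
        if handle is None:
            # extractfile returns None for members that are not regular files.
            raise FileNotFoundError(member_name)
        with handle:
            return handle.read()
```

With this approach the remote file system only sees a single large file. Note that random access into a compressed archive can be slow, because the data has to be decompressed up to the requested member. Tools such as ratarmount build an index to make such access faster.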
The best solution depends on the situation, so please don't hesitate to contact us, for example via pc2-support@uni-paderborn.de, if you are considering working with many small files.