
...

Temporary and Permanent Storage

At PC2 we currently distinguish three different types of file systems:

  • The home directory:
    Each user gets an individual home directory, which is intended mainly for configuration files and the like. It is limited to 5 GB, backed up regularly, and available on every PC2 cluster system.

  • Permanent project storage (PC2DATA):
    The permanent project storage (also known as PC2DATA) is hosted on the central university storage system and, like the home directories, has an off-site backup. It is suitable, for example, for program binaries and final results, that is, for anything you don't want to lose if the parallel file systems of our clusters fail. We recommend requesting at least a few GB per project even if you are not planning to use it right away. If you plan to request more than 1000 GB, please include an explanation of why you need it.
    The permanent project storage is available on every PC2 cluster system.

  • Temporary project storage (PC2PFS):
    The temporary storage (also known as PC2PFS) is hosted on the local parallel file systems of the HPC clusters. It is fast (on the order of 20-60 GB/s) and has low access latency. It should be used during calculations and data analysis. Due to its size, there is no backup!
    We recommend requesting at least a few hundred GB because it will be your main file system for simulations. You can request up to a few hundred TB.

(See also File Systems)

Type                        | Purpose                            | Size limitation            | Backup | Remote access from outside the clusters
----------------------------|------------------------------------|----------------------------|--------|--------------------------------------------------------
home directory              | configuration files                | yes, 5 GB                  | yes    | yes, (Win)SCP, CIFS (Windows Network Drive)
permanent storage (PC2DATA) | program binaries and final results | yes, requested in proposal | yes    | yes, (Win)SCP, CIFS (Windows Network Drive)
temporary storage (PC2PFS)  | calculations                       | yes, requested in proposal | NO     | yes, via CIFS (Windows Network Drive), NFSv4, scp, rsync

Estimating the required size and number of files:

To estimate the file system size you should request for your project, look at the amount of data produced during one of your typical calculations. Then multiply that amount by the number of calculations you need to store. If the resulting data size is large, say beyond 10 TB, consider whether you need to keep all of the produced data indefinitely or whether you can reduce the storage per calculation. If you have questions or if we can support you with data management, please don't hesitate to contact us at pc2-support@uni-paderborn.de.
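The size estimate described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the decimal definition of a terabyte is an assumption (the text does not say whether TB or TiB is meant), and the example numbers are made up.

```python
import os

TB = 10**12  # assumption: decimal terabyte
REVIEW_THRESHOLD_BYTES = 10 * TB  # the "beyond 10 TB" mark from the text


def directory_size_bytes(path):
    """Sum the sizes of all files below `path`, skipping symbolic links."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def estimate_storage_bytes(bytes_per_calculation, calculations_to_store):
    """Data of one typical calculation times the number of calculations kept."""
    return bytes_per_calculation * calculations_to_store


def needs_review(total_bytes):
    """True if the estimate is large enough to rethink what must be kept."""
    return total_bytes > REVIEW_THRESHOLD_BYTES


if __name__ == "__main__":
    # Hypothetical numbers: 35 GB per calculation, 400 calculations to store.
    per_run = 35 * 10**9
    total = estimate_storage_bytes(per_run, 400)
    print(f"estimated size: {total / TB:.1f} TB, review: {needs_review(total)}")
```

In practice, `bytes_per_calculation` could be taken from `directory_size_bytes()` applied to the output directory of one representative run.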

For the number of files, start from a baseline of 1,000,000 files, which is typically already sufficient. However, have a look at your typical calculations and count the files created during one calculation. Multiply this number by the number of calculations you need to store and use the result instead. If the product is close to or exceeds 10,000,000 files, please contact us so that we can set up more suitable storage, because otherwise working with this many files might slow down your calculations.
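The file-count estimate can be sketched the same way. The function names are hypothetical, and two points are assumptions rather than PC2 policy: the baseline is treated as a minimum (the larger of baseline and product is requested), and "close to" 10,000,000 is taken to mean 8,000,000 or more.

```python
import os

BASELINE_FILES = 1_000_000      # baseline from the text
CONTACT_THRESHOLD = 10_000_000  # at or near this, contact support
CLOSE_TO_THRESHOLD = 8_000_000  # assumption: what counts as "close to"


def count_files(path):
    """Count all files below `path`, e.g. the output of one calculation."""
    return sum(len(files) for _root, _dirs, files in os.walk(path))


def estimate_file_quota(files_per_calculation, calculations_to_store):
    """Files per calculation times calculations kept, but at least the baseline.

    Treating the baseline as a lower bound is an assumption of this sketch.
    """
    product = files_per_calculation * calculations_to_store
    return max(BASELINE_FILES, product)


def should_contact_support(file_count):
    """True if the estimate is close to or above the 10,000,000 file mark."""
    return file_count >= CLOSE_TO_THRESHOLD


if __name__ == "__main__":
    # Hypothetical numbers: 5,000 files per calculation, 1,000 calculations.
    quota = estimate_file_quota(5_000, 1_000)
    print(f"files to request: {quota}, contact support: {should_contact_support(quota)}")
```

Here `files_per_calculation` could come from `count_files()` run on the directory of one representative calculation.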

Will the project store or process data on HPC clusters that have special privacy/confidentiality requirements (e.g. health data)?

...