Storage Requirements
Temporary and Permanent Storage
At PC2 we currently distinguish three different types of file systems:
The home directory:
Each user gets an individual home directory, which is mainly intended for configuration files and similar small data. It is severely limited in size, backed up regularly, and available on every PC2 cluster system.
Permanent project storage (PC2DATA):
The permanent project storage (also known as PC2DATA) is hosted on the central university storage system with off-site backup, similar to the home directories. It is suitable, for example, for program binaries and final results, that is, for anything you don't want to lose if the parallel file systems of our clusters fail. We recommend requesting at least a few GB per project even if you are not planning to use it right away. If you plan to request more than 1000 GB, please include an explanation of why you need it.
The permanent project storage is available on every PC2 cluster system.
Temporary project storage (PC2PFS):
The temporary storage (also known as PC2PFS) is hosted on the local parallel file systems of the HPC clusters. It is fast (on the order of 20-60 GB/s) and has low access latency. It should be used during calculations and data analysis. Due to the size, there is no backup!
We recommend requesting at least a few hundred GB because it will be your main file system for simulations. You can request up to a few hundred TB.
(See also File Systems)
Type | Purpose | Size limitation | Backup | Remote access from outside the clusters |
---|---|---|---|---|
home directory | configuration files | yes, 20 GB | yes | |
permanent storage (PC2DATA) | program binaries and final results | yes, requested in proposal | yes | |
temporary storage (PC2PFS) | calculations | yes, requested in proposal | NO | |
Estimating the required size and number of files:
To estimate the file system size you should request for your project, look at the data produced during one of your typical calculations. Then multiply that amount by the number of calculations whose data you need to store. If the resulting data size is large, say beyond 10 TB, consider whether you need to keep all produced data indefinitely or whether you can reduce the storage per calculation. If you have questions or if we can support you with data management, please don't hesitate to contact us at pc2-support@uni-paderborn.de.
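The size estimate described above is a simple multiplication. A minimal sketch in Python follows; the function name and the example numbers (25 GB per calculation, 600 calculations) are hypothetical and only illustrate the arithmetic, using decimal units (1 TB = 1000 GB).

```python
def estimate_storage_tb(gb_per_calculation: float, calculations_to_keep: int) -> float:
    """Estimated storage need in TB: data per calculation times calculations kept."""
    return gb_per_calculation * calculations_to_keep / 1000.0


# Size beyond which the text suggests reconsidering what needs to be kept.
REVIEW_THRESHOLD_TB = 10.0

# Hypothetical example values, not taken from any real project.
estimate = estimate_storage_tb(gb_per_calculation=25.0, calculations_to_keep=600)
needs_review = estimate > REVIEW_THRESHOLD_TB

print(f"Estimated storage: {estimate:.1f} TB")
if needs_review:
    print("Consider whether all data must be kept or can be reduced per calculation.")
```

With these example values the estimate is 15 TB, which is above the 10 TB mark, so it would be worth checking whether all of the data really has to be stored.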
For the number of files, start from a baseline of 1,000,000 files, which is typically already sufficient. However, look at your typical calculations and count the files created during one calculation. Multiply this number by the number of calculations you need to store and use the result instead. If the product is close to or exceeds 10,000,000 files, please contact us so that we can set up a more suitable storage, because otherwise working with this many files might slow down your calculations.
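One way to obtain the file count of a typical calculation is to walk its output directory. The sketch below uses only the Python standard library; the function names and the example numbers are hypothetical, and because the text does not define what "close to" 10,000,000 means, the code checks only whether the limits are actually exceeded.

```python
import os

BASELINE_FILES = 1_000_000            # baseline named in the text
CONTACT_THRESHOLD_FILES = 10_000_000  # contact support near or above this


def count_files(directory: str) -> int:
    """Count regular directory entries (files) below `directory`, recursively."""
    return sum(len(filenames) for _, _, filenames in os.walk(directory))


def estimate_file_count(files_per_calculation: int, calculations_to_keep: int) -> int:
    """Files per calculation times the number of calculations to store."""
    return files_per_calculation * calculations_to_keep


# Hypothetical example values; in practice files_per_calculation could come
# from count_files() applied to the output directory of one typical run.
total_files = estimate_file_count(files_per_calculation=4000, calculations_to_keep=600)
exceeds_baseline = total_files > BASELINE_FILES
should_contact_support = total_files >= CONTACT_THRESHOLD_FILES

print(f"Estimated number of files: {total_files}")
```

For the example values the product is 2,400,000 files, which is above the baseline but well below the 10,000,000 mark.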
Will the project store or process data on HPC clusters that have special privacy/confidentiality requirements (e.g. health data)?
Mention here if you plan to store or process data on our cluster systems or services that have, or that you suspect have, higher requirements in terms of
data privacy,
data confidentiality,
or data safety.
Examples of such cases are the storage and processing of personal data of people who have not explicitly agreed to our data privacy policy, health data, financial data, or confidential research data. We need this information so that we can support you in the proper storage and processing of such data or, if proper use of this data is not possible on our systems, help you find an HPC system that, for example, has the certifications required to process highly sensitive health data.