Use of HPCC storage
The main function of the HPCC storage systems is to provide rapid access to and from the worker nodes of the HPCC clusters for data needed in high-speed calculations. For this reason, these systems are optimized for speed and are not intended for long-term or archival storage. We cannot guarantee that data will not be lost due to operational factors in the use of the clusters. As a result, it is each researcher's responsibility to back up their own important data.
The HPCC stores cluster-wide data on a set of resilient Lustre-based file systems, and backs up all data in user home areas regularly. (Note: data in work and scratch areas is not backed up, as explained below.) We strongly encourage users to maintain an external backup copy of all critical data and not to use HPCC Lustre cluster-wide storage systems as the only copy of files important to their research. To transfer data to other resources under your control, see the information on our HPCC Data Transfer page.
The following table summarizes the locations, their sizes and backup details. In addition to the amounts of free storage listed below made available at no cost to everyone using the clusters, the HPCC offers researchers and/or research groups the opportunity to purchase dedicated space on the cluster-wide file systems. This storage is available with or without the option of near-line backup storage for users who do not have the capability to maintain their own backups, or who prefer to use our backup systems. For further information on these options, please contact firstname.lastname@example.org.
|Location|Size in GB|Alias|Backup|Purged|
|/lustre/scratch/eraiderid|none|$SCRATCH|No|As needed to maintain <70% Lustre space usage. (Purges typically take place monthly.)|
In more detail, the HPCC Lustre cluster-wide storage systems generally resolve the
trade-off among size, speed, cost, and reliability in favor of large size and
high speed at relatively low cost. Most of the cluster disk storage
is composed of redundant arrays of inexpensive disks (RAID) to be resilient against
single disk failures. There are nearly 100 such arrays operating at this time in the
HPCC. In most cases, at least three disks in any given array must fail for data to
be lost. The option to protect dedicated individual or research group storage on a
separate backup system housed in the HPCC can also be selected as part of an annual
storage purchase agreement.
Please also read the general conditions for access in the TTU HPCC Operational Policies page.
On the HPCC RedRaider cluster-wide file systems:
- The $HOME area for every user is backed up and is subject to usage quotas.
- The $WORK area for every user is neither backed up nor purged, and is subject to usage quotas larger than those used in $HOME.
- Special researcher-owned storage areas may be purchased by individual researchers or research groups, and access permissions are managed according to their own policies. Backup may be provided optionally for purchase once the new backup system is commissioned.
- The Scratch partition is subject to purging in order to keep the overall file system usage below 70% full.
  - If the overall Lustre file system becomes 70% or more full, the $SCRATCH area for every user is purged of its oldest files. See the Purge Policy below for details.
  - On a monthly basis the $SCRATCH area for every user is purged according to the Purge Policy - see below for details.
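As a rough way to see how close a file system is to the 70% mark, the used fraction can be computed with the Python standard library. This is only a sketch: the function names are illustrative, and the figure it reports may not match the numbers HPCC staff use when deciding whether to purge.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some virtual file systems report no capacity
    return 100.0 * usage.used / usage.total


def over_purge_threshold(path, threshold_percent=70.0):
    """Return True when the file system holding `path` is at or above the threshold."""
    return usage_percent(path) >= threshold_percent
```

For example, `over_purge_threshold("/lustre/scratch/eraiderid")` would report whether the file system holding that scratch directory is at or above 70% by this measure.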
Scratch Space Purge Policy
The purpose of the scratch space is to provide temporary storage for intermediate results from HPCC codes that researchers will process further in the short term. Files on scratch space should be expected to live for hours, or at most days. Regardless of the total scratch space usage by each researcher, individual files will be removed from /lustre/scratch/eraiderid ($SCRATCH) automatically if they have not been accessed in over 1 year. To check a file's last access time (atime), you can use the following command: ls -ulh filename.
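The same check can be scripted across a whole directory tree. The following is a minimal sketch, not an HPCC-provided tool: the function name `files_not_accessed_since` and the 365-day default are illustrative assumptions, and it reads the same access time that `ls -ulh` reports.

```python
import os
import time


def files_not_accessed_since(root, max_age_days=365, now=None):
    """Return (path, age_in_days) pairs for files under root not accessed in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.lstat(path).st_atime  # last access time, as shown by `ls -u`
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append((path, (now - atime) / 86400))
    # Least recently accessed files first.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling `files_not_accessed_since(os.environ["SCRATCH"])` would list the files in your scratch area that are past the 1-year limit; a smaller `max_age_days` shows which files would be affected by a shorter retention period.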
HPCC staff will monitor the overall Lustre space usage to ensure that it remains below
70% full. On a regular basis, the $SCRATCH area for each user will be purged of the files
that must be removed to keep overall usage across all researchers below this threshold;
this purge takes place on schedule regardless of the current level of Lustre space usage.
In addition, if overall Lustre utilization goes above the purge threshold, a purge scan
of every user's $SCRATCH space will be initiated, and each user who has files that need
to be removed will be sent a list of their oldest files, which are targeted for removal
to reduce scratch space usage. Under typical conditions, the retention period will
be far shorter than the 1-year maximum, typically several weeks at most, but under
heavy use, this retention period could drop to 1-2 weeks or even days.
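To make the effect of an oldest-first purge concrete, here is a minimal sketch of how files could be selected to bring usage back under a threshold. It is an illustration under stated assumptions, not the procedure HPCC staff actually run: the `ScratchFile` record, the `select_for_purge` function, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScratchFile:
    path: str
    size_bytes: int
    atime: float  # last access time, seconds since the epoch


def select_for_purge(files, used_bytes, total_bytes, threshold=0.70):
    """Pick least-recently-accessed files until usage drops below threshold * total_bytes."""
    target = threshold * total_bytes
    selected = []
    for f in sorted(files, key=lambda f: f.atime):  # oldest access first
        if used_bytes < target:
            break
        selected.append(f)
        used_bytes -= f.size_bytes
    return selected


# Hypothetical example: a 1000-byte file system that is 80% full.
files = [
    ScratchFile("a.out", 60, atime=300.0),
    ScratchFile("b.out", 80, atime=100.0),
    ScratchFile("c.out", 50, atime=200.0),
]
purged = select_for_purge(files, used_bytes=800, total_bytes=1000)
print([f.path for f in purged])  # ['b.out', 'c.out']
```

In this example the two least recently accessed files are enough to bring usage below 70%. The fuller the file system, the further such a selection reaches into recently accessed files, which is why the effective retention period shortens under heavy use.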
Scratch space SHOULD NOT be used for long-term storage. Users must not run "touch" commands, move files between folders within their scratch space, rename files, or otherwise manipulate files in order to alter file timestamps or circumvent scratch space purge operations. Users who violate this policy run the risk of having their accounts suspended by HPCC staff and access restricted until the affected files have been cleaned up.
To help us avoid the need to shorten the retention period, please use the scratch space conscientiously. The Scratch partition should be used for files that have no need for long-term retention. Ideally, this period should be measured in days. The retention period is variable because it depends on usage, so proactively removing files that are no longer needed extends the retention time for yourself and other users. For this reason, please write your codes and implement your workflows so that they delete files from your scratch space as soon as possible after processing.
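One way to follow this advice is to have the job itself remove its intermediate files once the results worth keeping have been copied elsewhere. The sketch below is an assumption-laden example rather than an HPCC-supplied utility: the function name `run_with_scratch_cleanup`, the `compute` callback, and the file names are hypothetical, and the scratch and destination directories are passed in by the caller (for instance from the `$SCRATCH` and `$WORK` aliases described on this page).

```python
import os
import shutil
import tempfile


def run_with_scratch_cleanup(compute, scratch_root, keep_dir, keep_names):
    """Run compute(job_dir) in a private scratch directory, keep selected outputs, delete the rest."""
    os.makedirs(keep_dir, exist_ok=True)
    job_dir = tempfile.mkdtemp(prefix="job_", dir=scratch_root)
    try:
        compute(job_dir)  # writes intermediate and final files into job_dir
        for name in keep_names:
            shutil.copy2(os.path.join(job_dir, name), os.path.join(keep_dir, name))
    finally:
        # Remove everything left in scratch, whether or not the job succeeded.
        shutil.rmtree(job_dir, ignore_errors=True)
    return job_dir
```

Deleting the scratch directory even when the job fails is a design choice that favors keeping scratch usage low; a workflow that needs to inspect failed runs could instead clean up only on success.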
The scratch area should NOT be used for files that will be needed for long time periods.
HPCC staff will work to keep the HPCC community informed and try to give warnings if the expected retention period decreases significantly due to high usage, but ultimately it is your responsibility to use the HPCC file systems judiciously.
For additional assistance please contact email@example.com