Data Policy
Use of HPCC storage
The main function of the HPCC storage systems is to provide rapid data transfer to and from the worker nodes of the HPCC clusters for data needed in high-speed calculations. For this reason, these systems are optimized for speed and are not intended for long-term or archival storage. We cannot guarantee that data will not be lost due to operational factors in the use of the clusters. As a result, it is each researcher's responsibility to back up their own important data externally.
The HPCC stores cluster-wide data on a set of resilient Lustre-based file systems, and backs up a limited amount of user data in home areas. We strongly encourage users to maintain an external copy of all data and not to use HPCC Lustre cluster-wide storage systems as the only copy of files critical to their research. In particular, the work, scratch and other special-purpose areas are not backed up and should not be used as the only long-term copies of important data.
The HPCC is in the process of commissioning a new near-line backup storage system for users who do not have the capability to maintain their own backups, or who prefer to use our backup systems. Further information will be posted once this system has been commissioned.
In more detail, in HPCC Lustre cluster-wide storage systems, the trade-off among size, speed, cost, and reliability is generally resolved in favor of large size at high speed with relatively low cost. Most of the cluster disk storage is composed of redundant arrays of inexpensive disks (RAID), which are resilient against single disk failures. There are nearly 100 such arrays operating at this time in the HPCC. In most cases, at least three disks in any given array must fail for data to be lost.
Please also read the general conditions for access in the TTU HPCC Operational Policies page.
Data Policy for Hrothgar and Quanah
On Hrothgar and Quanah:
- The $HOME area for every user is backed up and is subject to usage quotas.
- The $WORK area for every user is neither backed up nor purged, and is subject to usage quotas larger than those used in $HOME.
- Special researcher-owned storage areas may be purchased by individual researchers or research groups; access permissions are managed according to their own policies. Backup may optionally be purchased once the new backup system is commissioned.
- The Scratch partition is subject to purging in order to keep its usage below 80% full:
  - If the Scratch partition becomes 80% or more full, the $SCRATCH area for every user is purged of its oldest files. See the Purge Policy below for details.
  - On a monthly basis, the $SCRATCH area for every user is purged according to the Purge Policy (see below for details).
The following table summarizes the locations, their sizes and backup details.
|Location|Size in GB|Alias|Backup|Purged|
|---|---|---|---|---|
|/lustre/scratch/eraiderid| |$SCRATCH|No|As needed to maintain <80% usage (purges typically take place monthly)|
Purge Policy
Individual files will be removed from /lustre/scratch/eraiderid ($SCRATCH) automatically if they have not been accessed in more than 1 year. To check a file's last access time (atime), you can use the following command: ls -ulh filename
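To review which files are candidates for removal before a purge runs, the same access-time check can be applied to a whole directory tree. The following is a minimal sketch, not an HPCC-provided tool: the function name `stale_files` and the approximation of "1 year" as 365 days are assumptions made for illustration. It only reads file metadata and never modifies timestamps.

```python
import os
import stat
import time


def stale_files(root, max_age_days=365, now=None):
    """Yield (path, age_in_days) for regular files under root whose last
    access time (atime) is older than max_age_days.

    Hypothetical helper; 365 days is an assumed stand-in for "1 year".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # metadata read only; atime is not changed
            except OSError:
                continue  # file disappeared or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, and other special entries
            if st.st_atime < cutoff:
                yield path, (now - st.st_atime) / 86400
```

A typical use would be to loop over `stale_files(os.environ["SCRATCH"])` and print each path, then copy anything still needed to external storage or delete what is no longer required.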
Users may not run "touch" or similar commands for the purpose of altering their files' timestamps to circumvent this purge policy. Users who violate this policy will have their accounts suspended by HPCC staff. This suspension may result in the user's jobs being killed.
HPCC staff will monitor the scratch space usage to ensure that it remains below 80% full. On a monthly basis, the $SCRATCH area for every user will be purged of all files that have not been accessed in more than 1 year. This monthly purge will be run regardless of the current level of scratch space usage. In the event the Scratch partition goes above the 80% threshold, an immediate purge of every user's $SCRATCH space will be triggered and the oldest files for each user will be removed until usage is well below the 80% threshold.
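The two purge triggers described above can be illustrated with a small selection routine. This is only a sketch of the policy as written, not the procedure HPCC staff actually run. Several details are assumptions: "oldest" is interpreted as least recently accessed, all files are pooled rather than handled per user, "1 year" is taken as 365 days, and the `target` of 70% is a hypothetical stand-in for "well below the 80% threshold", which the policy does not quantify. All names (`FileInfo`, `monthly_purge`, `threshold_purge`) are illustrative.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    path: str
    size: int     # bytes
    atime: float  # last access time, seconds since the epoch


YEAR_SECONDS = 365 * 86400  # assumption: "1 year" approximated as 365 days


def monthly_purge(files, now, max_age=YEAR_SECONDS):
    """Select files not accessed in more than max_age seconds.

    Runs regardless of how full the partition is.
    """
    return [f for f in files if now - f.atime > max_age]


def threshold_purge(files, capacity, used, trigger=0.80, target=0.70):
    """Select least recently accessed files once usage reaches trigger.

    Files are chosen in order of increasing atime until the projected
    usage drops below target (a hypothetical value; see the lead-in).
    """
    if used / capacity < trigger:
        return []  # below the threshold: no immediate purge
    selected = []
    for f in sorted(files, key=lambda f: f.atime):
        if used / capacity < target:
            break
        selected.append(f)
        used -= f.size
    return selected
```

The sketch makes the practical consequence visible: when the partition is nearly full, files much younger than one year can be selected, which is why the retention period depends on overall usage.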
To help us avoid the need to shorten the retention period, please use the scratch space conscientiously. The Scratch partition should be used for files that have no need for long-term retention. Ideally, this period should be measured in days. The retention period is variable because it depends on usage. Proactively removing files that are not needed thus extends the retention time for yourself and other users.
At this time, and with current usage patterns, the file retention period on the scratch area can be expected to be at least several days and most likely up to a few weeks, but in no case will files stay on disk more than a year since their last access. The scratch area should NOT be used for files that will be needed for longer periods.
We will keep the HPCC user community informed and give warnings if the expected retention period decreases significantly.
For additional assistance please contact firstname.lastname@example.org