Texas Tech University

Maintenance Schedule

Unscheduled Emergency Maintenance Shutdowns:

21 December 2021 18:00 - Job submission and interactive session issues were reported. To resolve these, the Slurm scheduler was reset, and some jobs may have been requeued as a result. All partitions are now available and appear to be working normally. Please report any issues you encounter to hpccsupport@ttu.edu.

For status of individual services, see the "RedRaider Cluster" --> "Current Status of HPCC Services" page of the HPCC web site.

Please see the information below on planned maintenance operations, and contact hpccsupport@ttu.edu if you have any questions.

Planned Maintenance:

Planned shutdowns for system maintenance are usually reserved for the 2nd full week of the 2nd month of each calendar quarter. These may be skipped if not needed, shortened, or extended if necessary. In addition, special planned shutdowns will sometimes be necessary to carry out upgrades or alterations to the clusters.

For Academic Year 2021-2022, the planned shutdowns will be on the following schedule:

1) November 8-12, 2021 *
2) February 7-11, 2022
3) May 9-13, 2022
4) August 8-12, 2022

* Extended through November 17, 2021. Work carried out during this shutdown included the following:

    • Updated firmware on InfiniBand switches, cards, and routers
    • Upgraded metadata servers for Lustre storage system to NVMe drive array
    • Updated MATLAB version (further commissioning required by MathWorks)
    • Completed planned purge of older files on the scratch area
    • Updated firmware on all cluster worker nodes

We appreciate your patience during this maintenance. Please feel free to contact us at hpccsupport@ttu.edu if you require assistance or have any further questions.

Maintenance downtime policy for HPCC systems:

  • Special periods may be reserved for performing routine maintenance. Users will be notified as early as possible when we are planning to bring the systems down for any reason.
  • Systems may also go down at any time to fix security issues, but every effort will be made to give the earliest possible notice.
  • Maintenance may also be required without prior notice in the event of system crashes or any unstable behavior.

Users should always keep extra copies of any files that are critical to their research on systems that are outside of the HPCC to guard against possible loss in the event of unforeseen catastrophic failure.

This policy is just good general practice and applies to all critical research files, regardless of where they are stored or whether or not they are located on HPCC resources.

For further information, see the "Data Transfer" and "Data Policy" guides on this web site.

If you have any questions, concerns, or problems, please contact us at hpccsupport@ttu.edu.

High Performance Computing Center