Maintenance Schedule
Unscheduled Emergency Maintenance Issues:
Update March 21, 2025: The RedRaider cluster and the Globus Connect data transfer service are now fully available. Jobs were resumed Thursday, March 20, and logins were re-enabled Friday, March 21. Please contact us at hpccsupport@ttu.edu if you encounter any issues.
Update March 20, 2025: Following restoration of chilled water, stable power, and networking services by the campus Operations and Telecommunications teams, batch jobs have been resumed on the HPCC RedRaider cluster and the "TexasTechHPCC" Globus Connect service has also been resumed to enable data transfers. Logins remain disabled until further service restoration can be performed. We'll have a further announcement about the timing of the planned scratch file system purge once all services are back to normal.
Update March 16, 2025: The campus Physical Plant remains on generator power, which means that it cannot produce the chilled water required to operate the HPCC. We hope to have more information soon on the timeline for restoring these utilities, but are told that it is likely to be days before they are available. The planned purge of scratch file system areas originally scheduled to take place today will be rescheduled for a future time once service is restored.
Update March 15, 2025: Although power has been restored to ESB I, central chilled water cooling systems that the HPCC depends on are not yet online. The HPCC cluster remains off. The scratch system purge previously scheduled to take place March 16, 2025 will be delayed and rescheduled once HPCC systems are back online.
March 13, 2025: There has been a major power failure on campus that has caused widespread outages to networks and equipment. Although the HPCC is protected by a generator, the main campus chilled water system that provides cooling to the HPCC has failed, forcing us to shut down all equipment in the HPCC completely. There is no estimate from the campus Operations Department at this time as to when power and cooling will be restored. The HPCC will remain off until further notice and we will inform everyone when this has been resolved.
Please also see the section below on planned maintenance periods.
Feel free to contact us by email at hpccsupport@ttu.edu if you have any questions.
Planned Maintenance:
Update on Quanah partition: The login node and worker nodes of the Quanah partition are now back online with a new, more modern operating system version (Rocky Linux 9.2), updated and reconfigured network, auxiliary file servers, and software. User-installed Conda modules and packages should continue to function with few or no changes, but previous compiled software configurations that depend on system libraries will need to be rebuilt. A general set of system modules has been made available to support recompiling and relinking any user-supplied software. To see a list of these modules, log on to the quanah login node and use the command "module avail".
Planned shutdowns are usually reserved for the first full week of the second month of each calendar quarter to carry out system maintenance, skipping partial weeks within any given month and avoiding end of semester deadlines where possible, with exceptions as noted below. These can sometimes be skipped if not needed, expanded if necessary, or reduced. In addition, there will sometimes be necessary special planned shutdowns to carry out upgrades or alterations to the clusters.
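As a rough illustration of the scheduling rule above, the following Python sketch computes the nominal maintenance window for each quarter. It assumes that the "second month of each calendar quarter" means February, May, August, and November, and that a "full week" means the first Monday through Friday lying entirely within the month. The function names are hypothetical and are not part of any HPCC tool. The sketch gives only the nominal dates; actual shutdowns may be moved, skipped, or extended as described above.

```python
from datetime import date, timedelta

# Assumption: the second month of each calendar quarter.
SECOND_MONTHS = (2, 5, 8, 11)

def first_full_week(year, month):
    """Return (monday, friday) of the first Monday-Friday week within the month."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday; find the first Monday on or after the 1st.
    days_to_monday = (7 - first.weekday()) % 7
    monday = first + timedelta(days=days_to_monday)
    return monday, monday + timedelta(days=4)

def nominal_windows(year):
    """Return the nominal maintenance windows for one calendar year."""
    return [first_full_week(year, month) for month in SECOND_MONTHS]

if __name__ == "__main__":
    for monday, friday in nominal_windows(2025):
        print(f"{monday:%B %d, %Y} to {friday:%B %d, %Y}")
```

Under these assumptions, the 2025 windows for February and August come out as February 3-7 and August 4-8. The nominal May window would be May 5-9, which falls near the end of the semester.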
For Academic Year 2024-2025, the planned shutdowns will be on the following schedule:
1) October 7-11, 2024 (Completed early) *
2) February 3-7, 2025 (Completed)
3) May 19-23, 2025 (Delayed to avoid finals)
4) August 4-8, 2025
* Shutdown may continue for an extended time for the quanah partition after return
to service of the rest of the cluster. When the quanah partition returns to service,
it will be reconfigured with a new operating system version and all new software modules.
Those using the quanah partition should plan on recompiling or reinstalling their
code.
We appreciate your patience during these maintenance periods, which are used to improve
operation and stability of the resources and to implement new features as available.
Please feel free to contact us at hpccsupport@ttu.edu if you require assistance or have any further questions.
Maintenance downtime policy for HPCC systems:
- Special periods may be reserved for performing routine maintenance. Users will be notified as early as possible when we are planning to bring the systems down for any reason.
- Systems may also go down at any time to fix security issues, but every effort will be made to give the earliest possible notice.
- Maintenance may also be required without prior notice in the event of system crashes or any unstable behavior.
Users should always keep extra copies of any files that are critical to their research on systems that are outside of the HPCC to plan around system maintenance and to guard against possible loss in the event of unforeseen catastrophic failure.
This policy is just good general practice and applies to all critical research files, regardless of where they are stored or whether or not they are located on HPCC resources.
For further information, see the "Data Transfer" and "Data Policy" guides on this web site.
If you have any questions, concerns, or problems, please contact us at hpccsupport@ttu.edu.
High Performance Computing Center
Phone: 806.742.4350
Email: hpccsupport@ttu.edu