Note: After the clusters return to operation following a system upgrade, you may see a message when you connect that states "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!". This occurs when the login nodes for the clusters go through a major upgrade that causes them to adopt new host keys. If you receive this message, you will need to remove the old host key from your system using one of the following methods:
- On Linux or Mac systems this is done by editing your .ssh/known_hosts file to remove the old entry for the system you are trying to reach. This can be done using an editor or with a command like the following (replacing "quanah.hpcc.ttu.edu" with the remote system you are having trouble reaching due to its old key):
ssh-keygen -f ~/.ssh/known_hosts -R quanah.hpcc.ttu.edu
- On Windows systems this is done using the features of your SSH terminal program. A failed connection due to a changed host key is often followed by a prompt that allows you to switch to the new host key. Select the option in the popup window that corresponds to changing keys.
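For Linux or Mac users, the removal step can be wrapped in a small check so that nothing is changed unless a key for the host is actually stored. The following is a minimal sketch, not an HPCC-provided tool: the function name `remove_stale_host_key` and its variable names are hypothetical, and it assumes the OpenSSH `ssh-keygen` utility is installed.

```shell
# Hypothetical helper (not supplied by the HPCC): remove a stored host key
# only if one exists for the given host.
# Usage: remove_stale_host_key HOST [KNOWN_HOSTS_FILE]
remove_stale_host_key() {
    host="$1"
    # Default to the per-user known_hosts file; $HOME is used instead of "~"
    # because the shell does not expand "~" inside double quotes.
    known_hosts="${2:-$HOME/.ssh/known_hosts}"

    # -F searches the file for the host and exits non-zero when no entry
    # is found (or when the file cannot be read).
    if ssh-keygen -F "$host" -f "$known_hosts" >/dev/null 2>&1; then
        # -R removes all keys for the host; ssh-keygen keeps the previous
        # contents in a backup file with the suffix ".old".
        ssh-keygen -R "$host" -f "$known_hosts"
    else
        echo "No stored key for $host in $known_hosts"
    fi
}
```

Calling `remove_stale_host_key quanah.hpcc.ttu.edu` and then reconnecting should lead ssh to prompt you to accept the new host key.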
October 29 - Nov. 16, 2018 (Hrothgar)
Nov. 2 - 9, 2018 (Quanah)
The HPCC is pleased to announce the installation of three major upgrades. As we discussed in the User Group Meeting this past June, Texas Tech initiated a project to install a 750kVA generator for the HPCC, which will be delivered on Thursday, Oct 18. We have also purchased a new 300kVA Uninterruptible Power Supply (UPS) unit to replace an older 150kVA unit that supplies power to the Hrothgar cluster. Finally, we have purchased a major storage upgrade for the Lustre file system, which will add 3.1PB of storage space to the current system, of which 1PB will be available for purchase by researchers as additional dedicated storage.
In order to reduce the total amount of downtime, we have coordinated with multiple vendors to complete all three projects within a single maintenance period. At the end of this month, we will begin a phased shutdown of all HPCC services to install each of these major upgrades. The shutdown timeline is as follows:
- Friday, Nov 9 - By the end of the day, we expect Quanah to return to full operation. (Complete.)
- Monday, Nov. 12 - By the end of the day, we expect Hrothgar community cluster queues to return to full operation. (Update: delayed to Tuesday, Nov. 13.) (Complete.)
- Friday, Nov. 16 - By the end of the day, we expect Hrothgar to return to full operation following a full system test of our new UPS.
In summary, most of the Hrothgar cluster will be unavailable Oct 29 - Nov 16, while the Quanah cluster and storage will be unavailable Nov 2 - Nov 9. Some services, such as the Globus data transfer endpoint, may be available slightly earlier.
Maintenance downtime policy for HPCC systems:
- Special periods may be reserved for performing routine maintenance. Users will be notified as early as possible when we are planning to bring the systems down for any reason.
- Systems may also go down at any time to fix security issues, but every effort will be made to give the earliest possible notice.
- Maintenance may also be required without prior notice in the event of system crashes or any unstable behavior.
Users should always keep extra copies of any files that are critical to their research on systems that are outside of the HPCC to guard against possible loss in the event of unforeseen catastrophic failure.
This policy is just good general practice and applies to all critical research files, regardless of where they are stored or whether or not they are located on HPCC resources.