
HPCC Cluster Updates

Table of Contents

  1. Quanah and Hrothgar Update - July 2018
  2. Quanah and Hrothgar Update - October 2017

Quanah and Hrothgar Update - July 2018

In July 2018, the HPCC updated the Quanah, Hrothgar, and Ivy clusters in order to better homogenize the cluster environments.  During this time, the HPCC made a number of updates to the operating systems on all three clusters and upgraded Quanah's underlying scheduler (UGE - Univa Grid Engine).  All nodes in each cluster have been updated to CentOS 7.4, and their local environments have been homogenized so that cluster-wide applications are the same on all nodes of all clusters.

  • Why do my previously working applications no longer run on Quanah/Hrothgar/Ivy?
    • We have updated our operating system, kernel, and compiler versions, which will require that almost all code compiled on the old Hrothgar be recompiled on the new Hrothgar.  Most code compiled on Quanah should continue to work; however, recompilation may be required if you run into issues.
  • Will my old job submission scripts work?
    • If you are submitting your job to Quanah, your jobs may still be scheduled but could fail because of some of the changes we made.  Please review the Job Submission Guide and make sure you are aware of the changes.  Major changes include the following (a sample submission script is sketched after this list):
      • Your jobs are now constrained by memory limits.
      • Your jobs should now specify how long they will run.
      • We have added two new projects - xlquanah and hep.
    • If you are submitting to Hrothgar, Ivy, Serial, or one of the community clusters then your original job submission scripts should work as the parallel environments and queue names are the same.
  • Where are some of my applications?
    • During the upgrade we attempted to install all applications that existed on the previous clusters, though we may not have installed the exact same versions of every piece of software.  Some applications could not be installed for various reasons.  If you are missing an application that you were using prior to the upgrade, please send an email to hpccsupport@ttu.edu and we will work with you to sort out what happened.
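
For reference, a minimal Quanah submission script under the upgraded scheduler might look like the sketch below.  The job name, queue, parallel environment, slot count, limit values, and executable are placeholders, and h_rt and h_vmem are the standard UGE resource names for run time and memory; consult the Job Submission Guide for the exact options and limits Quanah expects.

    #!/bin/bash
    #$ -V                   # export your current environment to the job
    #$ -cwd                 # run the job from the directory it was submitted from
    #$ -N example_job       # job name (placeholder)
    #$ -q omni              # queue name (placeholder; see the Job Submission Guide)
    #$ -pe sm 36            # parallel environment and slot count (placeholders)
    #$ -P quanah            # project (placeholder; xlquanah and hep are the new projects noted above)
    #$ -l h_rt=48:00:00     # requested run time -- jobs should now state how long they will run
    #$ -l h_vmem=5G         # memory request -- jobs are now constrained by memory limits

    ./example_program       # placeholder executable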


Quanah and Hrothgar Update - October 2017

In October of 2017, the HPCC decommissioned parts of the Hrothgar cluster in order to double the size of our newest cluster, Quanah.  During this time, the HPCC made a number of upgrades to Hrothgar to bring it in line with Quanah, including switching to an NFS-based system for serving applications and migrating from Rocks to OpenHPC for cluster management.  These changes allowed us to update Hrothgar from CentOS 6.3 to CentOS 6.9 and removed the need to use the Lustre file system to serve applications.

  • Why does this matter?
    • We have updated our operating system, kernel, and compiler versions, which requires that all code compiled on the old Hrothgar be recompiled on the new Hrothgar.
    • Because all code must be recompiled, /lustre/work/apps has been disabled and replaced with an NFS server.
  • Where is /lustre/work/apps?
    • It has been removed and is no longer accessible.
    • It has been replaced by our NFS server, which can be accessed using Modules.  See the "Software Environment Setup" user guide for more information on how to use Modules.
  • Where have my applications gone?
    • Applications stored on /lustre/work/apps have been removed and can now only be accessed using Modules.  See the "Software Environment Setup" user guide for more information on how to use Modules; a brief example is shown after this list.
    • Applications stored in your /home, /lustre/work and /lustre/scratch areas will still exist but will likely require re-compilation/re-installation to run on the new cluster.
    • The HPCC can help users install applications in their own folders.  For questions and assistance, please contact hpccsupport@ttu.edu.
    • If you need an application installed then please submit a software request here: Software Request Page
  • What happened to my .bashrc file?
    • Your old .bashrc file has been moved to ~/.bashrc-old.
  • Why do my previously working applications no longer run on Hrothgar/Ivy?
    • We have updated our operating system, kernel, and compiler versions, which requires that all code compiled on the old Hrothgar be recompiled on the new Hrothgar.
  • Will my old job submission scripts work?
    • If you are submitting to one of the community clusters then your original job submission scripts should work as the parallel environments and queue names are the same.
    • If you are submitting your job to the old "normal" queue, then you will need to make the following changes (see the example script after this list):
      • Change the -pe option to "-pe west 12"  (or any multiple of 12 for MPI)
      • Change the -q option to "-q west"
    • If you are submitting your job to the old "ivy" queue then you will need to make the following changes:
      • Change the -pe option to "-pe ivy 20" (or any multiple of 20 for MPI)
  • Where should I go to run my jobs?
    • To be able to load the software settings for running jobs or compiling code, you have to log on to "hrothgar.hpcc.ttu.edu" or "ivy.hpcc.ttu.edu", depending on which queue you are going to use to run your jobs. (Temporary note: the ivy.hpcc.ttu.edu external address is still being set up at this time; for now you can get to ivy by logging into hrothgar.hpcc.ttu.edu and then using "ssh ivy" from there.)
    • If you need to run jobs on the "west", "Chewie", "R2D2" or "Yoda" queues, you need to log on to "hrothgar.hpcc.ttu.edu".
    • If you need to run jobs on the "ivy" queue or one of the other community queues, you need to log on to "ivy.hpcc.ttu.edu" (login examples are shown after this list).
  • What queues can I make use of?
    • Run the command "qstat -g c" or visit the Queue Status page.
      • Currently, the Queue Status page only shows queue status for Quanah; Hrothgar will be included soon.
    • On Hrothgar, the "ivy" and "west" queues are available to all users; the other queues are available only to the research groups that own them.
    • The "serial" queue will be available at a later date.
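
For users who have not worked with Modules before, the basic workflow looks like the following.  The module name and version are purely illustrative; run "module avail" to see what is actually installed.

    module avail                   # list the applications now served from the NFS server
    module load example_app/1.0    # load an application (module name/version are illustrative)
    module list                    # show the modules currently loaded in your session
    module unload example_app/1.0  # remove a module when you no longer need it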
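
As an example of the queue changes above, a script that previously targeted the old "normal" queue might now carry headers like these.  The job name, slot count, and executable are placeholders; "-pe west" takes a multiple of 12 and "-pe ivy" a multiple of 20 for MPI jobs.

    #!/bin/bash
    #$ -V                    # export your current environment to the job
    #$ -cwd                  # run the job from the directory it was submitted from
    #$ -N example_mpi_job    # job name (placeholder)
    #$ -q west               # formerly the "normal" queue
    #$ -pe west 24           # any multiple of 12 for MPI; for the ivy queue use "-pe ivy 20" (or a multiple of 20)

    mpirun -np $NSLOTS ./example_mpi_program    # placeholder executable; $NSLOTS is set by the scheduler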
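
Logging in from a terminal looks like the following; replace <username> with your own account name.

    # Hrothgar login node (for the "west", "Chewie", "R2D2" and "Yoda" queues):
    ssh <username>@hrothgar.hpcc.ttu.edu

    # Ivy login node (for the "ivy" queue and the other community queues):
    ssh <username>@ivy.hpcc.ttu.edu

    # Until the external ivy address is available, hop through hrothgar instead:
    ssh <username>@hrothgar.hpcc.ttu.edu
    ssh ivy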
