HPCC Facilities and Equipment
The High Performance Computing Center's (HPCC) hardware is housed in three data centers. The main production clusters, Quanah and Hrothgar, along with the central file system servers and a number of smaller systems, are in the Experimental Sciences Building on the main TTU campus. Certain specialty clusters, including Weland, Realtime2, and Janus, are located at Reese Center, several miles from the main campus. A final set of resources consists of TTU's portion of the Lonestar 5 cluster, comprising approximately 1600 cores, operated at the Texas Advanced Computing Center in Austin.
These resources include both generally available public systems and researcher-owned private systems. Public nodes are available to any TTU researcher. Private nodes are owned by individual researchers and administered by HPCC. All of the generally available cluster resources use a weighted fair-share queueing system, which balances the queue so that newly submitted jobs compete favorably for upcoming batch slots against long-running job sequences. The TTU resources on Lonestar 5 are available through competitive allocations for specific projects on special request and serve as a "safety valve" for certain large-scale projects.
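As a rough illustration of how a weighted fair-share policy lets a new user's job overtake a long-running job sequence, the sketch below ranks two pending jobs. Everything in it is hypothetical: the decay formula, the weights, the 72-hour wait cap, and the user and job names are assumptions chosen for illustration, not the scheduler configuration used on the HPCC clusters.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    owner: str
    wait_hours: float


def fair_share_factor(recent_usage: float, share: float) -> float:
    """Return 1.0 for an unused allocation, decaying toward 0 as usage grows.

    Assumed formula for illustration: 2 ** (-usage / share).
    """
    return 2.0 ** (-recent_usage / share)


def priority(job, usage, shares, w_fair=0.7, w_age=0.3, max_wait_hours=72.0):
    """Weighted sum of a fair-share term and a capped queue-wait term."""
    fair = fair_share_factor(usage.get(job.owner, 0.0), shares[job.owner])
    age = min(job.wait_hours / max_wait_hours, 1.0)
    return w_fair * fair + w_age * age


# Hypothetical state: alice has consumed 60% of recent cycles, bob none.
usage = {"alice": 0.60, "bob": 0.0}
shares = {"alice": 0.5, "bob": 0.5}
jobs = [
    Job("alice-run-41", "alice", wait_hours=10.0),
    Job("bob-first", "bob", wait_hours=1.0),
]

# Highest priority first.
ordered = sorted(jobs, key=lambda j: priority(j, usage, shares), reverse=True)
for job in ordered:
    print(f"{job.name}: {priority(job, usage, shares):.3f}")
```

With these assumed weights, the newly submitted job from the idle user ranks ahead of the heavy user's job even though the latter has waited longer, which is the behavior the policy is meant to produce.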
The Quanah cluster, commissioned in early 2017, consists of 8748 cores in 243 worker nodes and a total of 45.6 terabytes of memory (36 cores per node, 192 GB memory per node), of which 20 nodes are owned by researchers and 223 are for general use. The nodes consist of dual-18-core Broadwell Xeon processors and are connected by a non-blocking 100 Gbps Intel Omni-Path fabric. The Quanah cluster was benchmarked during commissioning at 253 Teraflops. The cluster is operated with a single queue, with jobs sorted according to projects in order to satisfy the needs of the participating research groups. The Quanah cluster is named for Quanah Parker, and its internal management node Charlie is named for Charles Goodnight.
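The Quanah totals follow directly from the per-node figures, and the short check below reproduces them. The conversion of 1024 GB per terabyte is an assumption; it is the convention under which the per-node figure matches the quoted 45.6 terabytes.

```python
# Per-node figures quoted for Quanah.
nodes = 243
cores_per_node = 36      # two 18-core processors
mem_per_node_gb = 192

total_cores = nodes * cores_per_node
total_mem_gb = nodes * mem_per_node_gb
total_mem_tb = total_mem_gb / 1024   # assumed binary convention (1 TB = 1024 GB)

# Ownership split quoted in the text.
private_nodes, public_nodes = 20, 223

# Measured benchmark result divided evenly across cores.
benchmark_tflops = 253
gflops_per_core = benchmark_tflops * 1000 / total_cores

print(f"cores: {total_cores}")
print(f"memory: {total_mem_tb:.1f} TB")
print(f"benchmark per core: {gflops_per_core:.1f} GFLOPS")
```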
The Hrothgar cluster was initially built in 2011 with 640 nodes (7680 cores) for parallel jobs and 68 private nodes connected with a DDR (20 Gbps) Infiniband fabric, and 32 nodes for serial jobs. Each of the parallel and serial nodes began with two Westmere 2.8 GHz 6-core Xeon processors and 24 GB of memory. The parallel nodes had an initial peak rating of 86 teraflops and a recorded High Performance LINPACK rating of 68 teraflops. Several updates have occurred since then, including the replacement of the core and leaf switches with QDR Infiniband and the upgrade of approximately 100 of the nodes to newer Ivy Bridge Xeon systems.
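The 86 teraflop peak rating can be reproduced from the node counts above. The sketch assumes 4 double-precision floating-point operations per core per clock cycle, a figure that is not stated in the text but is the usual value for SSE-era Westmere processors.

```python
# Original Hrothgar parallel partition.
parallel_nodes = 640
cores_per_node = 2 * 6        # two 6-core Westmere processors
clock_hz = 2.8e9
flops_per_cycle = 4           # assumption: double-precision SSE, add + multiply

cores = parallel_nodes * cores_per_node
peak_tflops = cores * clock_hz * flops_per_cycle / 1e12

# Ratio of the recorded LINPACK result to the theoretical peak.
linpack_tflops = 68
efficiency = linpack_tflops / peak_tflops

print(f"cores: {cores}")
print(f"theoretical peak: {peak_tflops:.1f} TFLOPS")
print(f"LINPACK efficiency: {efficiency:.0%}")
```

Under that assumption, the recorded LINPACK figure corresponds to roughly 79% of theoretical peak.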
The Hrothgar cluster currently comprises approximately 10,000 cores spanning just over 700 nodes, with a theoretical maximum capacity of approximately 180 teraflops. The normal queue, consisting of the original Westmere processors, has approximately 6000 cores, and the ivy queue, consisting of the newer Ivy Bridge processors, has approximately 1800 cores. The serial queue, used for non-parallel processing, currently has just under 900 cores. The rest are dedicated nodes owned by particular researchers or research groups, some of which are configured with extra memory or otherwise customized for particular workloads. The Hrothgar cluster and its internal management node Hygd are named for characters in the epic poem Beowulf.
The HPCC has a DataDirect Networks (DDN) storage system capable of storing up to 2.5 petabytes of data. This storage space is configured using Lustre to provide a set of common file systems to the Quanah and Hrothgar clusters. The file system is reached by a combination of methods: LNet routers that bridge Omni-Path traffic to Infiniband for the Quanah cluster, direct QDR Infiniband connections to the high-speed compute fabric on Hrothgar, and Gigabit Ethernet connections to the Hrothgar serial nodes. In addition to the DDN central file system, a number of researchers and research groups have purchased dedicated servers for long-term data storage, typically in increments of tens of terabytes, that are reachable by the same methods.
The Janus, Weland and Realtime clusters are located at the off-campus Reese data center, which also houses some of the serial nodes for the Hrothgar cluster. Janus and Weland are also named after characters from the Beowulf story.
Janus is a Microsoft Windows HPC cluster with twelve 20-core nodes. This system is used for a small number of dedicated workloads that depend on specific licensed software that is not readily available for the Linux clusters. It is not intended to be a general Windows login system for the university, but instead serves those specific workloads that require Windows HPC Server support.
Weland is a Linux cluster with sixteen 8-core nodes. Each node contains two Xeon E5540 processors running at 2.53 GHz with 16 GB of main memory, for a total of 128 cores. It is primarily operated as a TechGrid resource to augment the campus cycle-sharing grid.
Realtime is a dedicated private weather modeling cluster owned by the Atmospheric Sciences group.
Additionally, a portion of the Hrothgar serial queue operates from the Reese data center.