HPCC Facilities and Equipment
The High Performance Computing Center's (HPCC) hardware is housed in three data centers. The main production clusters, Quanah and Hrothgar, along with the central file system servers and a number of smaller systems, are in the Experimental Sciences Building on the main TTU campus. By far the largest of these resources is the Quanah cluster.
Certain specialty clusters including Weland, Realtime2, and Janus are located at Reese Center, several miles from the main campus. A final set of resources consists of TTU's portion of the Lonestar 5 cluster, comprising approximately 1600 cores, operated at the Texas Advanced Computing Center in Austin.
These resources include both generally-available public systems and researcher-owned private systems. Public nodes are available to any TTU researcher. Private nodes are owned by individual researchers and administered by HPCC. All of the generally-available cluster resources use a weighted fair-share queueing system, which balances access so that newly submitted jobs compete favorably for upcoming batch queue slots against long-running job sequences.
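As a rough illustration of how weighted fair-share scheduling can favor new work over long-running job sequences, the sketch below uses one common textbook form: past usage decays with a half-life, and a group's priority factor falls as its decayed usage grows relative to its share. The formula, the half-life, the group names and the usage figures are all assumptions made for illustration; this page does not state which scheduler settings or formula HPCC actually uses.

```python
# Minimal fair-share sketch. All names and numbers are hypothetical.

def decayed_usage(samples, half_life_days):
    """Sum past usage, halving the weight of a sample every half_life_days.

    samples is an iterable of (age_days, core_hours) pairs.
    """
    return sum(core_hours * 0.5 ** (age_days / half_life_days)
               for age_days, core_hours in samples)


def fair_share_factor(usage_fraction, share_fraction):
    """Return a factor in (0, 1]; 1.0 means no recent usage at all."""
    if share_fraction <= 0:
        raise ValueError("share_fraction must be positive")
    return 2.0 ** (-usage_fraction / share_fraction)


HALF_LIFE_DAYS = 7.0  # assumed decay half-life

# Two hypothetical groups with equal shares of the cluster.
history = {
    "long_running_group": [(0, 5000.0), (7, 5000.0), (14, 5000.0)],
    "new_group": [(0, 100.0)],
}

usage = {group: decayed_usage(samples, HALF_LIFE_DAYS)
         for group, samples in history.items()}
total_usage = sum(usage.values())

factors = {group: fair_share_factor(used / total_usage, 0.5)
           for group, used in usage.items()}

for group, factor in sorted(factors.items(), key=lambda kv: -kv[1]):
    print(f"{group}: fair-share factor {factor:.3f}")
```

In this sketch the group with little recent usage receives the higher factor, so its next job would be ranked ahead of jobs from the group that has been running continuously.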
These generally-available resources, including storage of data for use on the clusters, are provided at no cost to TTU researchers. Additionally, the TTU resources on Lonestar 5 are available for competitive allocations for specific projects on special request and serve as a "safety valve" for certain large-scale projects.
Researchers who need additional computing capacity beyond the generally-available resources and are considering buying dedicated hardware or storage may wish to talk with us about the community cluster option. This option allows researchers to add equipment that is operated at a "guaranteed class of service" level for CPU or storage. Additions of this nature are subject to space or infrastructure limitations. Please check with the HPCC staff and management for the current options.
Under this option, researchers may purchase physical compute nodes and/or storage that are operated as part of the HPCC equipment, and they receive priority access equal to the purchased resource capacity. The HPCC will house, operate and maintain the resources according to a memorandum of understanding, which typically lasts as long as the resources remain in warranty or are covered by a service contract paid for by the researcher. The warranty period is usually determined at the time of purchase of the equipment and is typically three to five years, although extensions are possible. Contact us for more details.
A dedicated cluster is a standalone cluster that is paid for by a specific TTU faculty member or research group. Subject to space and infrastructure availability, HPCC can house these clusters in its machine rooms, providing system administration support, UPS power and cooling. Typically, HPCC system administration support for these clusters is by request, with day-to-day cluster administration provided by the owner of the cluster. Other clusters and dedicated equipment exist on campus for which the HPCC provides occasional assistance to the researchers by request on a consulting basis.
The newest cluster, Quanah, has 467 worker nodes with 36 cores each for a total of 16,812 cores, of which 16,092 are reserved for general use and 720 cores are owned by specific research groups. To connect with West Texas history, the cluster is named for Quanah Parker, and its internal management node Charlie is named for Charles Goodnight. Commissioned in early 2017 and expanded to its current size later that year, it is based on Dell C6300 enclosures holding four C6320 nodes each. Each worker node has two 18-core Broadwell Xeon processors (36 cores per node) and 192 GB of memory. The software environment is based on CentOS 7 Linux, controlled by Intel HPC Orchestrator, and has a fully non-blocking Intel Omni-Path 100 Gbps fabric for MPI computing. The cluster is operated with a single queue, with jobs sorted according to projects in order to satisfy the needs of the participating research groups. Benchmarks show the performance of Quanah to be approximately 485 teraflops as of late 2017.
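The Quanah figures above can be cross-checked with a few lines of arithmetic. The short sketch below uses only the numbers quoted in the preceding paragraph; the aggregate memory, memory per core and the number of whole nodes corresponding to the researcher-owned cores are derived here for illustration and are not figures stated by HPCC.

```python
# Cross-check of the Quanah node and core counts quoted above.
NODES = 467
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 18
MEMORY_PER_NODE_GB = 192
GENERAL_USE_CORES = 16_092
OWNED_CORES = 720

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node

# The general-use and researcher-owned cores should add up to the total.
assert GENERAL_USE_CORES + OWNED_CORES == total_cores

# Derived values (not stated in the text, computed from the figures above).
owned_node_equivalents = OWNED_CORES // cores_per_node
total_memory_gb = NODES * MEMORY_PER_NODE_GB
memory_per_core_gb = MEMORY_PER_NODE_GB / cores_per_node

print(f"cores per node:          {cores_per_node}")
print(f"total cores:             {total_cores:,}")
print(f"owned node equivalents:  {owned_node_equivalents}")
print(f"aggregate memory (GB):   {total_memory_gb:,}")
print(f"memory per core (GB):    {memory_per_core_gb:.2f}")
```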
Hrothgar is an older Dell Linux cluster currently consisting of 630 total nodes and 8246 total processing cores, of which 7408 cores are available for general use and the rest are owned by specific research groups. The Hrothgar cluster was initially built in 2011. Several updates have occurred since then, including the replacement of the core and leaf switches with QDR InfiniBand and the upgrade of approximately 100 of the nodes to newer Ivy Bridge Xeon systems. Most of the Hrothgar nodes have two 2.8 GHz 6-core processors with 24 GB memory, and the rest, approximately 20% of the total, are denser 20-core and 32-core nodes. Roughly 90% of these nodes are connected to either 20 Gbps or 40 Gbps InfiniBand fabric optimized for parallel computing, and the remainder are dedicated to serial processing. The Hrothgar parallel nodes in the current configuration have a total estimated peak rating of approximately 80 teraflops. To honor the first clusters of this type, the Hrothgar cluster and its internal management node Hygd are named for characters in the epic poem Beowulf.
The HPCC has a DataDirect Networks (DDN) storage system capable of providing storage for up to 2.5 petabytes of data. This storage space is configured using Lustre to provide a set of common file systems to the Quanah and Hrothgar clusters. The file system uses a combination of LNet routers to bridge the Omni-Path traffic to InfiniBand for the Quanah cluster, direct QDR InfiniBand to connect the high-speed compute fabric on Hrothgar, and Gigabit Ethernet to connect to Hrothgar serial nodes. In addition to the DDN central file system, a number of researchers and research groups have purchased dedicated servers for long-term data storage, typically in increments of tens of terabytes, that are also reachable by the same methods.
The Janus, Weland and Realtime clusters are located at the off-campus Reese data center, which also houses some of the serial nodes for the Hrothgar cluster. Janus and Weland are also named after characters from the Beowulf story.
Janus is a Microsoft Windows HPC cluster with twelve 20-core nodes. This system is used for a small number of dedicated workloads that depend on specific licensed software that is not readily available for the Linux clusters. It is not intended to be a general Windows login system for the university, but instead serves those specific workloads that require Windows HPC Server support.
Weland is a Linux cluster with sixteen 8-core nodes. Each node contains two Xeon E5540 processors running at 2.53 GHz and 16 GB of main memory, for a total of 128 cores. It is primarily operated as a TechGrid resource to augment the campus cycle-sharing grid.
Realtime is a dedicated private weather modeling cluster owned by the Atmospheric Sciences group.
Additionally, a portion of the Hrothgar serial queue operates from the Reese data center.