HPCC Facilities and Equipment
For a description of HPCC software and other services, see the HPCC software and services pages.
Facilities
The High Performance Computing Center (HPCC) operates and maintains hardware located in three separate data centers. Our main production clusters, along with the central file system servers and several smaller systems, are in the Experimental Sciences Building I (ESB I) on the main TTU campus. Our secondary locations, containing isolated specialty clusters for specific research applications, are found in the Chemistry Building on the main campus and at Reese Center several miles west of the main campus.
Equipment
The HPCC resources include both generally available public systems and researcher-owned private or dedicated systems. Public nodes and storage are available to any TTU researcher or, upon special request, to external research partners of TTU faculty. These resources, including a standard per-user storage allocation for data used on the clusters, are provided at no cost to TTU researchers. Access to processing time on all generally available nodes is allocated on a fair-share queueing basis. Private resources owned by individual researchers are administered by HPCC staff when they are part of HPCC clusters, or by the research group itself for dedicated clusters.
Note: The HPCC does not provide support for clusters or equipment not housed in HPCC machine room facilities.
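As an illustration of the fair-share allocation mentioned above, the short Python sketch below shows how recent usage typically lowers a group's queue priority. This page does not name the scheduler in use, so the classic Slurm-style decay formula used here is an assumption chosen only to make the general idea concrete, not a description of the HPCC's actual configuration.

```python
# Illustrative sketch of the general fair-share scheduling idea.
# Assumption: a Slurm-style fair-share factor F = 2**(-usage/shares),
# used here only as an example of how recent usage reduces priority.

def fairshare_factor(normalized_usage: float, normalized_shares: float) -> float:
    """Return a priority factor in (0, 1]: heavier recent usage -> lower factor."""
    if normalized_shares <= 0:
        return 0.0
    return 2.0 ** (-normalized_usage / normalized_shares)

# Two hypothetical research groups holding equal shares of the public nodes:
light_user = fairshare_factor(normalized_usage=0.05, normalized_shares=0.5)
heavy_user = fairshare_factor(normalized_usage=0.90, normalized_shares=0.5)
print(f"light user factor: {light_user:.2f}")  # ~0.93 (jobs tend to start sooner)
print(f"heavy user factor: {heavy_user:.2f}")  # ~0.29 (jobs tend to wait longer)
```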
Primary Campus-Based Computing Clusters
The HPCC's primary cluster is RedRaider, consisting of the Nocona, Quanah, and Matador partitions; together with the Ivy partition and the associated community cluster portions of the older Hrothgar cluster, these systems provide 1.9 PFLOPS of raw peak computing power. An overview of the configuration of the primary cluster partitions is shown in the table below.
| | RedRaider CPU (Nocona) | RedRaider GPU (Matador) | Quanah (Omni) | Hrothgar (Ivy) |
| --- | --- | --- | --- | --- |
| Total Nodes | 240 | 20 | 467 | 100 |
| Theoretical Max Performance | 983 TFLOPS | 280 TFLOPS | 565 TFLOPS | 80 TFLOPS |
| Benchmarked Performance | 804 TFLOPS | 226 TFLOPS | 485 TFLOPS | N/A |
| Operating System | CentOS 8.1 | CentOS 8.1 | CentOS 7.4 | CentOS 7.4 |
| Node Manufacturer | Dell | Dell | Dell | Dell |
| Node Model | PowerEdge C6525 | PowerEdge R740 | PowerEdge C6320 | PowerEdge C6220 II |
| Cooling Mechanism | Liquid Cooled | Air Cooled | Air Cooled | Air Cooled |
| Processor | AMD EPYC 7702 | Intel Xeon Gold 6242 | Intel Xeon E5-2695 v4 | Intel Xeon E5-2670 v2 |
| Processor Family | EPYC™ Rome | Cascade Lake | Broadwell | Ivy Bridge |
| Cores per Processor | 64 | 20 | 18 | 10 |
| Cores per Node | 128 | 40 CPU + 10,240 GPU | 36 | 20 |
| Total Cores | 30,720 | 800 CPU + 204,800 GPU | 16,812 | 2,000 |
| GPU | N/A | NVIDIA Tesla V100 | N/A | N/A |
| GPUs per Node | 0 | 2 | 0 | 0 |
| Total GPUs | 0 | 40 | 0 | 0 |
| Memory per Node | 512 GB | 384 GB | 192 GB | 64 GB |
| Memory per Core | 4 GB | 9.6 GB | 5.33 GB | 3.2 GB |
| High-Speed Fabric | Mellanox HDR 200 InfiniBand | Mellanox HDR 100 InfiniBand | Intel Omni-Path | Mellanox QDR InfiniBand |
| High-Speed Fabric Speed | 200 Gbps | 100 Gbps | 100 Gbps | 40 Gbps |
| Topology | Non-Blocking Fat-Tree | Non-Blocking Fat-Tree | Non-Blocking Fat-Tree | Non-Blocking Fat-Tree |
| Ethernet Network Speed | 25 GbE | 25 GbE | 10 GbE | 1 GbE |
| Efficiency (Benchmarked / Theoretical Max) | 81.48% | 80.57% | 85.84% | N/A |
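As a rough cross-check of the table, the theoretical peak figures follow from simple per-core arithmetic. The Python sketch below reproduces the Nocona column; the 2.0 GHz base clock and 16 double-precision FLOPs per cycle per core assumed for the AMD EPYC 7702 come from published processor specifications rather than from this page.

```python
# Illustrative cross-check of the Nocona column in the table above.
# Node and core counts come from the table; the clock speed (2.0 GHz base
# for the AMD EPYC 7702) and 16 double-precision FLOPs per cycle per core
# (two 256-bit FMA units on Zen 2) are assumptions from published specs.

nodes = 240                # Nocona total nodes
cores_per_node = 128       # 2 x 64-core AMD EPYC 7702 per node
clock_ghz = 2.0            # assumed base clock, GHz
flops_per_cycle = 16       # assumed double-precision FLOPs per cycle per core

peak_tflops = nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000
print(f"Nocona theoretical peak: {peak_tflops:.0f} TFLOPS")  # -> 983 TFLOPS

mem_per_core_gb = 512 / cores_per_node  # 512 GB per node / 128 cores per node
print(f"Nocona memory per core: {mem_per_core_gb:.0f} GB")   # -> 4 GB
```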
Cluster Details
Texas Tech worked with our partners at Dell to implement one of the first large-scale university-based AMD EPYC™ Rome clusters in the world. This new cluster, named RedRaider, consists of two distinct parts: the Nocona CPU partition and the Matador GPU partition. To honor Texas Tech's official colors, the internal management nodes are named Scarlet and Midnight. The RedRaider cluster was commissioned in Fall 2020 and began production operation in January 2021.
The Quanah cluster was first commissioned in early 2017 and then was upgraded to twice its capacity in Q4 of 2017. To connect with West Texas history, the cluster is named for Quanah Parker, and its internal management node Charlie is named for Charles Goodnight. Its nodes are now operated as a partition of the RedRaider cluster.
The Hrothgar cluster was initially built in 2011. To honor the history of the first clusters of this type, the Hrothgar cluster and its internal management node Hygelac are named for characters in the Beowulf epic poem. Some portions of the original Hrothgar cluster have been retired to make room for newer systems. The remaining Hrothgar parallel nodes were commissioned in 2014 as an upgrade to the original, now-decommissioned Hrothgar hardware. This queue is named Ivy in reference to the Intel "Ivy Bridge" processor family. Other remaining portions of the Hrothgar cluster include the serial and community cluster nodes described below.
Other, Specialty, and Off-Campus Clusters
The HPCC also operates a mix of older and specialty clusters and equipment. The community cluster (22 nodes of various researcher-owned processor and memory configurations, operated as part of the Ivy InfiniBand fabric) and the serial resources (310 nodes, each with dual 6-core Westmere processors and 24 GB of memory, operated as separate nodes with no parallel fabric) are the remaining special-purpose portions of the Hrothgar cluster.
A dedicated cluster is a standalone cluster paid for by a specific TTU faculty member or research group and housed in the HPCC machine rooms, which provide UPS power and cooling. An example of this type of cluster is Nepag, dedicated to health physics dose calculation codes. The HPCC also houses the DISCI cluster, used by researchers in the Department of Computer Science, and test resources, including the Redfish cluster, that support the National Science Foundation Cloud and Autonomic Computing Industry/University Cooperative Research Center. Realtime 2 is a dedicated private weather modeling cluster owned by the Atmospheric Sciences group. For these clusters, system administration support is provided by the researcher or research group itself, with consultation from HPCC staff available on request during business hours. Dedicated clusters are accepted for operation within HPCC machine room facilities only on a space-available basis; space is not guaranteed for such systems.
Additionally, TTU has access to resources on the Lonestar 5 cluster operated by the Texas Advanced Computing Center in Austin. These resources can be made available through competitive allocations for specific projects on special request and serve as a "safety valve" for certain large-scale projects. The TTU portion corresponds to approximately 1,600 cores (roughly 64 TFLOPS) of Intel Haswell processors out of the cluster's 30,048-core total. Lonestar 5 was commissioned in 2015 and is currently at the end of its service life. Contact hpccsupport@ttu.edu for more details if you are interested in using this system during the remainder of its available usage period.
Cluster-Wide Storage
The HPCC operates a DataDirect Networks storage system capable of providing up to 6.9 petabytes of storage. This space is configured using the Lustre parallel file system to provide a set of common file systems to the RedRaider, Quanah, and Hrothgar clusters, and is provisioned with a 1.0-petabyte backup system that can be used to protect research data. The file system uses a 100 Gbps storage network and a set of Lustre Network (LNet) routers to bridge traffic from each cluster's distinct fabric network into the storage fabric network. A set of central NFS servers also provides access to applications used across the clusters.
Researcher-Owned Compute and Storage Capacity
Researchers who need computing capacity beyond the generally available fair-share queue resources and are considering buying dedicated hardware or storage may wish to talk with us about purchasing researcher-specific CPU capacity out of the main cluster resources. This option allows researcher workloads to be scheduled at a "guaranteed class of service" level for CPU or storage. Purchased CPU time or storage space may be shared by users belonging to a given research group. Additions of this nature are subject to space and infrastructure limitations; please check with HPCC staff and management for the current options.
Under this option, researchers may purchase time on physical compute nodes and/or storage that are operated as part of the HPCC equipment, and they receive priority access equal to the purchased resource capacity. The HPCC will house, operate, and maintain the resources according to a formal operational agreement that typically lasts as long as the resources remain in warranty or are covered by a researcher-funded service contract. The warranty period is usually determined at the time of purchase and is typically five years, although extensions are possible. The HPCC also offers researchers the opportunity to purchase backup services for data housed on the HPCC storage system. Contact hpccsupport@ttu.edu for more details.
High Performance Computing Center
Phone: 806.742.4350
Email: hpccsupport@ttu.edu