HPCC Facilities and Equipment
A description of HPCC software and other services is available on the HPCC website. For information on how to use the equipment described below, please consult our HPCC User Guides and HPCC Training pages.
Facilities
The High Performance Computing Center (HPCC) operates and maintains hardware located in three separate data centers. Our main production clusters, along with the central file system servers and several smaller systems, are in the Experimental Sciences Building I (ESB I) on the main TTU campus. Our secondary locations, containing isolated specialty clusters for specific research applications, are in the Chemistry Building on the main campus and at Reese Center, several miles west of the main campus.
Equipment
The HPCC resources include both generally available public systems and researcher-owned private or dedicated systems. Public nodes and storage are available to any TTU faculty member or research grant leader, to TTU students working with them, and, upon special request, to external research partners of TTU faculty. These resources, including a standard per-user storage allocation for data used on the clusters, are provided at no cost to eligible users, with processing time on all generally available nodes allocated on a fair-share queueing basis.
For those with large-scale needs, individual faculty members, research grant leaders, or research groups may purchase, on an annual or pre-paid basis, dedicated priority on one or more CPU nodes or GPUs, or storage beyond the base allocation. Dedicated resources owned by individual researchers or groups are administered by HPCC staff when operated as part of the HPCC clusters.
Note: The HPCC does not provide support for clusters or equipment not housed in HPCC machine room facilities.
Primary Campus-Based Computing Clusters
The HPCC's primary cluster is RedRaider, consisting of the Nocona and Quanah/XLQuanah CPU partitions, the Matador and Toreador GPU partitions, and the Ivy interactive and high-memory nodes, totaling approximately 2.2 PFLOPS of raw peak computing power. An overview of the configuration of the primary cluster partitions is shown in the table below.
| Partition | Nocona | Quanah / XLQuanah | Matador | Toreador | Ivy |
|---|---|---|---|---|---|
| Type | CPU | CPU | GPU | GPU | Auxiliary CPU* |
| Total Nodes | 240 | 467 / 16 | 20 | 11 | 50 / 2 |
| Theoretical Max | 983 TFLOPS | 565 TFLOPS | 280 TFLOPS | 287 TFLOPS | 40 TFLOPS |
| Benchmarked | 804 TFLOPS | 485 TFLOPS | 226 TFLOPS | | N/A |
| OS | CentOS 8.1 | CentOS 7.4 | CentOS 8.1 | CentOS 8.1 | Rocky Linux 8.5 / CentOS 8.1 |
| Manufacturer | Dell | Dell | Dell | Dell | Dell |
| Node Model | PowerEdge C6525 | PowerEdge C6320 | PowerEdge R740 | PowerEdge R7525 | PowerEdge C6220 II |
| Cooling | Liquid Cooled | Air Cooled | Air Cooled | Air Cooled | Air Cooled |
| Processor Family | Rome | Broadwell | Cascade Lake | Rome | Ivy Bridge |
| Cores/Processor | 64 | 18 | 20 | 8 | 10 |
| Cores/Node | 128 | 36 | 40 CPU + 2 GPU | 16 CPU + 3 GPU | 20 |
| Total Cores in Partition | 30,720 | 16,812 / 576 | 800 CPU + 40 GPU | 528 CPU + 33 GPU | 1,000 |
| GPU (if present) | N/A | N/A | | | N/A |
| GPUs/Node | 0 | 0 | 2 | 3 | 0 |
| Total GPUs | 0 | 0 | 40 | 33 | 0 |
| Memory/Node | 512 GB | 192 GB / 256 GB | 384 GB | 192 GB | 128 GB |
| Memory/Core | 4 GB | 5.33 GB | 9.6 GB | 12 GB | 6.4 GB |
| High-Speed Fabric | Mellanox HDR 200 InfiniBand | Intel OmniPath | Mellanox HDR 100 InfiniBand | Mellanox HDR 100 InfiniBand | Mellanox FDR InfiniBand |
| Fabric Speed | 200 Gbps | 100 Gbps | 100 Gbps | 100 Gbps | 56 Gbps |
| Topology | Non-Blocking Fat-Tree | Non-Blocking Fat-Tree | Non-Blocking Fat-Tree | Non-Blocking Fat-Tree | Up to 2:1 Oversubscribed Fat-Tree |
| Ethernet | 25 GbE | 10 GbE | 25 GbE | 25 GbE | 1 GbE |
| Efficiency | 81.5% | 85.8% / (N/A) | 80.6% | | N/A |
* Auxiliary CPUs support workflow management, interactive use, and specialty nodes such as high-memory instances. At present, two Ivy Bridge-class nodes with 1,536 GB of memory each are available in the himem-ivy partition for special use cases that require high memory per node.
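As an illustration of how the figures in the table relate to one another, the short Python sketch below sums the partitions' theoretical peak performance and recomputes the memory-per-core values from the listed memory-per-node and cores-per-node figures. All numbers are copied from the table above; the sketch is for illustration only and is not part of any HPCC tooling.

```python
# Quick arithmetic check of the partition table above. Values are taken
# from the table as (theoretical peak in TFLOPS, memory per node in GB,
# cores per node). Quanah uses the 192 GB node configuration.
partitions = {
    "Nocona":   (983, 512, 128),
    "Quanah":   (565, 192, 36),
    "Matador":  (280, 384, 40),
    "Toreador": (287, 192, 16),
    "Ivy":      (40, 128, 20),
}

# Sum of partition peaks, roughly the 2.2 PFLOPS quoted for RedRaider.
total_peak = sum(peak for peak, _, _ in partitions.values())
print(f"Total theoretical peak: {total_peak} TFLOPS (~{total_peak / 1000:.1f} PFLOPS)")

# Memory per core is simply memory per node divided by cores per node.
for name, (_, mem_per_node, cores_per_node) in partitions.items():
    print(f"{name}: {mem_per_node / cores_per_node:.2f} GB of memory per core")
```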
Cluster Details
Texas Tech worked with our partners at Dell to implement one of the first large-scale university-based AMD EPYC™ Rome clusters in the world. This new cluster, named RedRaider, consists of three distinct parts: the Nocona CPU partition and the Matador and Toreador GPU partitions. To honor Texas Tech's official colors, the internal management nodes are named Scarlet and Midnight. The RedRaider cluster was commissioned in Fall 2020 and began production operation in January 2021.
The Quanah cluster was first commissioned in early 2017 and upgraded to twice its original capacity in Q4 2017. To connect with West Texas history, the cluster is named for Quanah Parker, and its internal management node, Charlie, is named for Charles Goodnight. Its nodes are now operated as a partition of the RedRaider cluster. In 2022, 16 nodes were added as the XLQuanah partition to accommodate long-running, non-checkpointed workflows; their hardware is similar to Quanah's but optimized for single-node jobs.
The Ivy partitions are named for the Intel "Ivy Bridge" processor family and are primarily intended to support interactive computational work on remaining portions of the former Hrothgar cluster. This capability is currently under development. A small number of nodes with 1.5 TB of memory each are also in operation to assess the need for further high-memory computing support. Please contact hpccsupport@ttu.edu if you require these capabilities.
Other, Specialty, and Off-Campus Clusters
The Texas Tech HPCC also manages access to a portion of the resources on Lonestar 6, operated by the Texas Advanced Computing Center (TACC) in Austin. This allocation can be made available for competitive awards to specific projects on special request and serves as a "safety valve" for certain large-scale projects. The Texas Tech portion of Lonestar 6 corresponds to continuous annual use of approximately 38 nodes (4,860 cores), roughly 190 teraflops, of AMD EPYC 7763 Milan processors out of the 71,680-core cluster total. The Lonestar 6 cluster was commissioned in 2022. Codes proposing to use it must be able to occupy entire 128-core nodes. Contact hpccsupport@ttu.edu for more details if you are interested in using this system.
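Because Lonestar 6 jobs must occupy whole 128-core nodes, a core request is effectively rounded up to a multiple of 128. The short Python sketch below illustrates that arithmetic; the helper function and the example request sizes are hypothetical and are not part of any TACC or HPCC interface.

```python
import math

CORES_PER_NODE = 128  # Lonestar 6 nodes must be occupied in full

def whole_node_request(cores_wanted: int):
    """Round a core request up to whole Lonestar 6 nodes.

    Hypothetical helper for illustration only; not part of any TACC or
    HPCC tooling. Returns (nodes, cores_allocated).
    """
    nodes = math.ceil(cores_wanted / CORES_PER_NODE)
    return nodes, nodes * CORES_PER_NODE

# A job that wants 300 cores must request 3 full nodes (384 cores).
print(whole_node_request(300))   # (3, 384)

# The annual TTU share of roughly 4,860 cores corresponds to about 38 nodes.
print(whole_node_request(4860))  # (38, 4864)
```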
Additionally, research group needs sometimes require dedicated cluster resources. A dedicated cluster is a standalone cluster, paid for by a specific Texas Tech faculty member or research group, that is housed in the HPCC machine rooms to take advantage of UPS power and cooling. An example of this type of cluster is Nepag, dedicated to health physics dose calculation codes. The HPCC also houses the DISCI cluster, used by researchers in the Department of Computer Science, and test resources supporting the National Science Foundation Cloud and Autonomic Computing Industry/University Cooperative Research Center, including the Redfish cluster. Realtime 2 is a dedicated private weather modeling cluster owned by the Atmospheric Sciences group. System administration for these clusters must be provided by the researcher or research group itself, with consultation from HPCC staff available on request during business hours. Dedicated clusters are accepted for operation within HPCC machine room facilities only on a space-available basis; space is not guaranteed for such systems.
Cluster-Wide Storage
The HPCC operates a DataDirect Networks storage system capable of storing up to 6.9 petabytes of data. This storage space is configured using Lustre to provide a set of common file systems to the RedRaider, Quanah, and Hrothgar clusters and is provisioned with a 1.0-petabyte backup system that can be used to protect research data. The file system uses a 200 Gbps storage network, with Lustre Network (LNet) routers bridging traffic from the distinct cluster fabric networks into the storage fabric network. A set of central NFS servers also provides access to applications used across each of the clusters.
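As a minimal illustration, the Python sketch below reports the capacity of a mounted file system as seen from a login or compute node. The /lustre mount point is an assumption for illustration only; please consult the HPCC User Guides for the actual file system paths.

```python
import shutil

# Hypothetical mount point for the shared Lustre file systems; the real
# paths are documented in the HPCC User Guides.
LUSTRE_MOUNT = "/lustre"

# shutil.disk_usage returns total, used, and free space in bytes.
usage = shutil.disk_usage(LUSTRE_MOUNT)
print(f"Total: {usage.total / 1e15:.2f} PB")
print(f"Used:  {usage.used / 1e15:.2f} PB")
print(f"Free:  {usage.free / 1e15:.2f} PB")
```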
Researcher-Owned Compute and Storage Capacity
Researchers who need additional computing capacity beyond the generally available fair-share queue resources and are considering buying dedicated hardware or storage may wish to talk with us about purchasing researcher-specific CPU and/or GPU capacity out of the main cluster resources. This option allows researcher workloads to be scheduled at a "guaranteed class of service" level, with CPU or GPU priority access or dedicated storage pre-reserved on an annual basis. Resources purchased in this manner may be shared by users belonging to a given research group. Additions of this nature are subject to space and infrastructure limitations; please check with HPCC staff and management for the current options.
Under this option, researchers may purchase time on physical compute nodes, GPUs, and/or storage operated as part of the HPCC equipment and receive priority access equal to the purchased resource capacity. The HPCC will house, operate, and maintain the resources under a formal operational agreement that typically lasts as long as the equipment is covered by a researcher-funded service contract or remains in warranty. The warranty period is usually determined at the time of purchase and is typically five years, although extensions are possible. The HPCC also offers researchers the opportunity to purchase backup services for data housed on the HPCC storage system. Contact hpccsupport@ttu.edu for more details.
High Performance Computing Center
Phone: 806.742.4350
Email: hpccsupport@ttu.edu