Texas Tech University

TechGrid

What is TechGrid?

The Grid refers to an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and scientific instruments owned and managed by multiple organizations, TTU areas, and units. Grid computing harnesses unused compute cycles, typically from campus computers that are idle during a particular time period, to process large computational jobs. Grid applications often involve large amounts of data and/or computing, often require secure resource sharing across organizational boundaries, and are thus not easily handled by today's Internet and Web infrastructures.

TechGrid is a collection of over 2,000 cores located on the TTU campus (the TTU Library contributed 1,000 3.2 GHz cores in summer 2013) and the Linux multiprocessor servers of the TTU IT HPCC (High Performance Computing Center), organized into a computational grid. It may be accessed using Condor (all platforms) or Globus (Linux) software. Unique mechanisms enable TechGrid to effectively harness wasted CPU power from otherwise idle desktop workstations; for instance, TechGrid can be configured to use only desktop machines where the keyboard and mouse are idle. Idle computer time is donated by many areas and units on campus, such as the Rawls College of Business, the Department of Mathematics and Statistics, the College of Education, the English Department, and the TTU IT Division Advanced Technology Learning Centers.
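
In Condor, this "idle desktop" policy is typically expressed through START and SUSPEND expressions in a machine's configuration. The snippet below is a minimal sketch of such a policy, not TechGrid's actual configuration; the thresholds are assumed values chosen for illustration.

    # condor_config policy sketch (assumed thresholds, not TechGrid's actual settings)
    # Start jobs only after 15 minutes without keyboard/mouse activity
    # and when the machine's non-Condor load is low.
    START   = (KeyboardIdle > 15 * $(MINUTE)) && ((LoadAvg - CondorLoadAvg) <= 0.3)
    # Suspend a running job as soon as the machine's owner returns.
    SUSPEND = (KeyboardIdle < $(MINUTE))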

What type of Grid is TechGrid?

TechGrid is a campus-wide grid. A campus grid is a distributed computing system composed of desktop-class and server-class computers bound together with grid middleware, providing computational resources from computing cycles that would otherwise go unused during off-peak hours of operation.

How does the TechGrid work?

The grid distributes a compute job among compute nodes within the grid, using grid middleware to coordinate the distributed computation. The middleware used by TechGrid is Condor.
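
In practice, a user describes a job in a submit description file and hands it to Condor with the condor_submit command. The sketch below is illustrative; the executable and file names are hypothetical, but the keywords are standard Condor submit-file syntax.

    # sweep.sub -- hypothetical submit description file
    universe   = vanilla
    executable = simulate              # hypothetical program to run
    arguments  = params_$(Process).txt # a different input per job instance
    output     = run_$(Process).out    # stdout of each job instance
    error      = run_$(Process).err    # stderr of each job instance
    log        = sweep.log             # Condor's record of job events
    should_transfer_files   = YES      # no shared file system required
    when_to_transfer_output = ON_EXIT
    queue 10                           # submit 10 instances of this job

Submitting and monitoring then take two commands: "condor_submit sweep.sub" places the jobs in the queue, and "condor_q" shows their progress through the pool.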

Where is TechGrid now?

TechGrid's compute nodes are located in "grid zones" around the Texas Tech campus: the TTU Library, the Advanced Technology Learning Center (ATLC), the High Performance Computing Center (HPCC) at Reese Center, the Computer Science department, the Business Building, the North Computing Center, the Math Building, the English Department, and the College of Education. Currently, TechGrid comprises over 2,000 cores spanning several domains and various operating systems.

What is Condor?

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor; Condor places them into a queue, chooses when and where to run them based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

While providing functionality similar to that of a more traditional batch queuing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to use only desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as when a key press is detected), in many circumstances it can transparently produce a checkpoint and migrate the job to a different machine that would otherwise be idle.

Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or it may be able to transparently redirect all of the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.

The ClassAd mechanism in Condor provides an extremely flexible and expressive framework for matching resource requests (jobs) with resource offers (machines). Jobs can easily state both requirements and preferences; likewise, machines can specify requirements and preferences about the jobs they are willing to run. These requirements and preferences can be described in powerful expressions, allowing Condor to adapt to nearly any desired policy.

Condor can also be used to build Grid-style computing environments that cross administrative boundaries. Condor's "flocking" technology allows multiple Condor installations to work together, and Condor incorporates many of the emerging Grid-based computing methodologies and protocols; for instance, Condor-G is fully interoperable with resources managed by Globus.

Condor is the product of the Condor Research Project at the University of Wisconsin-Madison (UW-Madison) and has run as a production system in the UW-Madison Department of Computer Sciences since 1988.
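
As a concrete illustration of the ClassAd mechanism, a job's submit file can state requirements and preferences that Condor matches against the ClassAds advertised by machines. The attribute names below are standard Condor machine attributes; the thresholds are assumed values chosen for illustration.

    # Match only 64-bit Linux machines with at least 2 GB of memory;
    # among the machines that match, prefer those with the most memory.
    requirements = (OpSys == "LINUX") && (Arch == "X86_64") && (Memory >= 2048)
    rank         = Memory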

High Performance Computing Center