Texas Tech University

Transferring Data

Transfer files and data to and from HPCC resources using the HPCC data transfer service, which is powered by Globus Connect. Many large research labs and computing centers at other universities also run their own Globus Connect Server endpoints, which you can use to move data directly between sites.

Note: Please refrain from using scp, sftp, rsync, or direct connections to the login nodes to transfer data. Doing so can consume considerable resources on the login nodes and interfere with their proper use by other researchers, and it generally moves data more slowly than Globus Connect. Violation of these guidelines can lead to suspension of HPCC access if needed to restore proper operation of the login nodes for other researchers.

Table of Contents:

  1. Transferring data between your computer and the HPCC
  2. Transferring data between the HPCC and other sites

 

Transferring data between your personal computer and the HPCC Cluster

To prevent unnecessary load on the cluster login nodes, we require that all users refrain from using scp, sftp, rsync, direct ssh connections, or any other data transfer tool for large-scale data or file movement. Such transfers may be terminated by automatic cleanup scripts, or by HPCC staff if they become intrusive. Instead, we provide Globus Connect services for transferring data into and out of the HPCC. There are several reasons to prefer this service:

  • The Globus Connect service has redundant high-speed connections to the campus network and, through it, to the outside world.
  • The machines that run our Globus Connect endpoint are more robust for data transfer and have better connectivity to storage than the cluster login nodes.
  • The Globus Connect service eliminates the load of data transfer from the cluster login nodes, which are used by many people for other functions.
  • Globus Connect software includes error detection and correction and automatic retry features, and can move data using multiple streams for optimum transfer speeds.
  • There are easy-to-use Globus Connect Personal clients available for Linux, Mac, and Windows.

To transfer data between your computer and the HPCC, install Globus Connect Personal on your own computer and create a Globus Connect collection on it. This can be done by following the Globus Connect Personal installation guide for your operating system (Linux, Mac, or Windows).

 

Once you have a personal Globus Connect collection set up, you can use the web interface of the service to transfer data between your computer and the HPCC. For detailed instructions, please see the guide "How To Log In and Transfer Files with Globus" provided by the Globus team. When going through these instructions, please keep in mind the following:

  • To reach your data on the HPCC Lustre storage area (/home, /lustre/work, /lustre/scratch, or the /lustre/research area for your group if one is available), set the endpoint to "TexasTechHPCC"
  • To reach your data on your personal computer, set the collection to the name you selected when you created it.
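The web interface is the route this guide describes. For readers who prefer to script the same transfer, the following is a minimal sketch that only builds (and does not run) a `globus transfer` command for the separately installed Globus CLI. It assumes the CLI is installed and logged in. The collection IDs, the `USERNAME` directories, and the helper names `is_hpcc_path` and `build_transfer_command` are hypothetical placeholders, not values published by the HPCC.

```python
import shlex

# Storage areas named in this guide; the sketch rejects any other HPCC path.
HPCC_AREAS = ("/home", "/lustre/work", "/lustre/scratch", "/lustre/research")


def is_hpcc_path(path):
    """Return True if path lies inside one of the storage areas listed above."""
    return any(path == area or path.startswith(area + "/") for area in HPCC_AREAS)


def build_transfer_command(src_id, src_path, dst_id, dst_path, label, recursive=True):
    """Build the argument list for a `globus transfer` call (nothing is executed)."""
    argv = ["globus", "transfer", f"{src_id}:{src_path}", f"{dst_id}:{dst_path}"]
    if recursive:
        argv.append("--recursive")  # needed when the source is a directory
    argv += ["--label", label]
    return argv


# Hypothetical placeholders: substitute the real collection UUIDs and paths.
PERSONAL_ID = "PERSONAL-COLLECTION-UUID"
HPCC_ID = "TEXASTECHHPCC-COLLECTION-UUID"

dest = "/lustre/work/USERNAME/results"
if not is_hpcc_path(dest):
    raise ValueError(f"{dest} is not in a storage area listed in this guide")

cmd = build_transfer_command(
    PERSONAL_ID, "/Users/USERNAME/results", HPCC_ID, dest, "upload results"
)
print(shlex.join(cmd))  # copy the printed line into a terminal to submit it
```

Building the command as a list and printing it keeps the sketch free of side effects, so it can be checked before any data moves.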

Transferring data between the HPCC and other sites

Many research organizations and universities have established Globus Connect collections, making the transfer of data between these sites and the HPCC a fast and simple process. To prevent unnecessary load on the cluster login nodes, we ask that all users refrain from using scp, sftp, rsync, direct ssh connections, or any other data transfer tool. These types of data transfers are likely to be terminated by automatic cleanup scripts, or by HPCC staff if they become intrusive. Instead, we suggest that you use Globus Connect to transfer data into and out of the HPCC.

The first step is to determine the name of the collection used by the outside organization. If the remote site has Globus Connect Server installed, as we do here at the HPCC, all you need is the name of the collection that site provides for its researchers. If the external site does not have Globus Connect but you have an account on a resource there, you can set up Globus Connect Personal on that remote resource, use it as the destination for transfers directly from the HPCC, and proceed as in the previous section.
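As an illustration of this lookup step, here is a small sketch that decides which Globus CLI command to build from whatever the remote site gave you. It assumes the Globus CLI is installed separately and that collection IDs are UUIDs. The helper names `is_collection_id` and `lookup_command` and the example UUID are hypothetical, and the sketch only builds the command without contacting Globus.

```python
import shlex
import uuid


def is_collection_id(text):
    """Return True if text parses as a UUID, the form collection IDs are assumed to take."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


def lookup_command(name_or_id):
    """Return a Globus CLI command: show a known ID, or search by display name."""
    if is_collection_id(name_or_id):
        return ["globus", "endpoint", "show", name_or_id]
    return ["globus", "endpoint", "search", name_or_id]


# A display name leads to a search; a UUID leads to a direct lookup.
print(shlex.join(lookup_command("TexasTechHPCC")))
# The UUID below is a made-up example, not a real collection ID.
print(shlex.join(lookup_command("12345678-1234-5678-1234-567812345678")))
```

If a search returns several matches, confirm the correct collection with the remote site before transferring data.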

Once you have determined the remote site's collection name, you can use the Globus Connect interface to transfer data between that site and the HPCC. For detailed instructions, please see the guide "How To Log In and Transfer Files with Globus" written by the Globus team. When going through these instructions, please keep in mind the following:

  • To reach your data on the HPCC Lustre storage area (/home, /lustre/work, /lustre/scratch, or the /lustre/research area for your group if one is available), set the endpoint to "TexasTechHPCC"
  • To reach your data at the external site, set the collection to the one provided by the other site or to your own Globus Connect Personal collection name there if that is the method you are using.

Please feel free to contact hpccsupport@ttu.edu if you have any questions about transferring data to and from the cluster.

 

High Performance Computing Center