
Using Jupyter Notebooks from RedRaider Worker Nodes

Jupyter notebooks and Jupyter Lab provide a variety of useful tools for research analysis and data processing. The HPCC is developing an interactive resource that will support these tools through a web-based interface. Until it is ready, the following steps let you use Jupyter from worker nodes in the cluster. Note that you should not run these steps directly on the cluster login nodes, which are not meant for resource-intensive computation or data processing; instead, use them from worker nodes as explained on this page.

The steps to install and use either Jupyter Notebook or Jupyter Lab are essentially identical and can be separated into preparation and usage stages. For each stage, first sign on to one of the login nodes as described in the page on Connecting to HPCC Resources.

Preparation:

These steps generally need to be performed only once, and repeated only when you want to update to newer versions of the tools.

For convenience and ease in updating, we recommend the use of Anaconda or Miniconda to set up and isolate the Jupyter environment from your other software and functions. This process is explained in detail on the page on installing a local copy of Python in the HPCC User Guides. Please refer to that page if you would like more information. Here we summarize only the essential commands.

  1. If Miniconda has not already been installed in your account, execute the following commands to install and activate it:

    /lustre/work/examples/InstallPython.sh
    . $HOME/conda/etc/profile.d/conda.sh
    conda activate
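
    To confirm that the installation and activation worked, you can ask conda for its version (the exact version printed will vary):

    conda --version    # prints something like "conda 23.11.0"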

  2. Create an environment to hold the Jupyter Notebook or Jupyter Lab software:

    conda create -n jupyternotebook
    conda activate jupyternotebook
    conda install -c conda-forge notebook

    (For Jupyter Lab, replace "notebook" with "jupyterlab" in the last line above, and we suggest that you name the environment "jupyterlab" instead of "jupyternotebook".)
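
    For reference, following those substitutions, the Jupyter Lab variant of the commands above would be:

    conda create -n jupyterlab
    conda activate jupyterlab
    conda install -c conda-forge jupyterlab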

  3. Create a folder in your home directory to hold Jupyter runtime files. This keeps the runtime files out of local storage on the worker nodes, where they could otherwise consume space needed by system files.

    mkdir ~/runtime
    echo 'export JUPYTER_RUNTIME_DIR=~/runtime' >> ~/.bashrc
    export JUPYTER_RUNTIME_DIR=~/runtime

    The first two of these lines create the runtime directory and add the environment variable expected by Jupyter to your .bashrc file for future login sessions. The third line activates this environment variable for your current session. For details on other Jupyter environment variables, paths, and configuration files, see the online Jupyter documentation.
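
    To check the result, you can confirm that the variable is set and the directory exists; in the expected output below, "<eraider>" stands for your own account name:

    echo $JUPYTER_RUNTIME_DIR    # should print /home/<eraider>/runtime
    ls -ld ~/runtime             # should show the newly created directory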


Usage:

The preparation steps above need to be done only once. For routine use, there are four distinct operations to perform each time. Keep in mind that each of the worker node interactive and ssh tunnel sessions described here should be closed and terminated when you are done using the server, and that they are subject to the usual time limits for worker node sessions.

  1. Sign on to a login node, then from that login node get an interactive session on a worker node in the cluster, and activate the Jupyter environment for that session:

    interactive -p nocona

    (The above example command is for a single-core interactive session on the Nocona partition of RedRaider. For other options, see the HPCC User Guides and/or HPCC Training pages; a sketch of a multi-core request appears after this step.)

    . $HOME/conda/etc/profile.d/conda.sh
    conda activate jupyternotebook

    You should now be on a worker node, and the prompt should read "(jupyternotebook) cpu-NN-nn:$", where "cpu-NN-nn" is the name of the node you are using. (If you are using Jupyter Lab instead of Jupyter Notebook and have followed the preparation instructions above, activate the "jupyterlab" environment instead, and replace "notebook" with "lab" in the step below.)
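
    As referenced above, if you need more than a single core, a rough sketch of an equivalent multi-core request made directly with standard Slurm srun (rather than the site-provided interactive wrapper, whose options may differ) might look like:

    # Hedged example: request a 4-core interactive shell on nocona
    # via standard Slurm; the "interactive" wrapper may accept
    # different options, so check the HPCC User Guides first
    srun --partition=nocona --ntasks=1 --cpus-per-task=4 --pty /bin/bash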

  2. Start the Jupyter notebook service from this worker node:

    jupyter notebook --no-browser --ip=0.0.0.0

    After a short startup phase, there should be some informational lines printed indicating that your notebook service is running, ending with:

    To access the notebook, open this file in a browser:
        file:///home/(eraider)/runtime/nbserver-MMMMM-open.html
    Or copy and paste one of these URLs:
        http://cpu-NN-nn:pppp/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     or http://127.0.0.1:pppp/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    Note that the worker node name, server identifier, port number, and token are unique to your instance of the notebook server. You will need the node name, port number, and token to access your notebook server during the following steps. Use the actual values for your session given by the output of your "jupyter notebook" command in the steps below.

    Please be sure to terminate this server by typing Control-C twice when you are done using it, and to exit the interactive session rather than leave it idle for long periods; an idle session ties up resources on the cluster that could be used by others.
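
    If you would like a more predictable port number for the tunnel in step 3, Jupyter also accepts a --port option, as in the hedged example below (8888 is an arbitrary choice); note that if the requested port is already in use on the node, Jupyter will pick a different one, so always confirm the actual port in the printed URL:

    # Request a specific port; verify the port Jupyter actually
    # chose in its startup output before building the tunnel
    jupyter notebook --no-browser --ip=0.0.0.0 --port=8888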

  3. Connect a tunnel to the worker node to allow communication with the server. Follow the same steps you would use to sign on normally, but be sure to create a tunnel, either by including the specification "-L pppp:cpu-NN-nn:pppp" as part of your connection, where the port (labeled "pppp" here) is the actual port number resulting from step 2 above, or by selecting the equivalent tunneling options within your ssh software. For example, to connect via ssh from the command line of a plain-text ssh program on Mac or Linux, you would use the line:

    ssh eraider@login.hpcc.ttu.edu -L pppp:cpu-NN-nn:pppp

    if you are connecting from on campus or from within the TTU VPN, or

    ssh -J eraider@ssh.ttu.edu eraider@login.hpcc.ttu.edu -L pppp:cpu-NN-nn:pppp

    if you are connecting from off campus through the TTU SSH Gateway. (For variations on the options for connecting, see the overall instructions on Connecting to HPCC Resources.) In each case, substitute your eRaider ID for "eraider", and the actual node name and port number from step 2 for the "cpu-NN-nn" and "pppp" placeholders used in these generic instructions; be sure to enter the port number in both places as specified.

    If you use GUI ssh software rather than the command line, please consult that software's documentation on establishing SSH tunnels using the information above.

    This process establishes a tunnel through the HPCC login node to the worker node and port on which your Jupyter notebook server is running. This tunnel is used in the final step below.
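
    If you connect this way often, the same tunnel can be kept in your OpenSSH client configuration. The sketch below assumes OpenSSH on your local machine and a hypothetical host alias "hpcc-jupyter"; as before, replace "eraider", "cpu-NN-nn", and "pppp" with your actual values:

    # ~/.ssh/config on your local machine (hypothetical alias);
    # afterwards, "ssh hpcc-jupyter" opens the same tunnel as above
    Host hpcc-jupyter
        HostName login.hpcc.ttu.edu
        User eraider
        LocalForward pppp cpu-NN-nn:pppp
        # If connecting from off campus, also add:
        # ProxyJump eraider@ssh.ttu.edu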

  4. You should now be able to reach the notebook server from a web browser on your own machine, at the URL your server printed when it started up. Open a browser window and go to the last URL listed, for example by copying the line printed by your server and pasting it into the address field of the browser window. This will look like:

    http://127.0.0.1:pppp/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    where you should use the exact line from step 2 above as your URL.

    (What is happening: You are connecting to a URL on your own machine that is being tunneled by ssh through to the server running on the worker node.)

The above steps should produce a browser window that shows a Jupyter session. If your browser fails to connect, be sure that you have followed all of the steps above and copied and pasted the URL exactly, and that any firewalls or security software on your machine are configured to allow web connections on the specified outgoing port.
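
One quick way to test the tunnel itself, assuming the curl utility is available on your local machine, is to request the server's response headers through the forwarded port; any HTTP status line (for example a redirect) means the tunnel is working, while "connection refused" indicates a problem with the tunnel or the server:

    # Run on your local machine while the ssh tunnel is open;
    # replace pppp with the actual port number from step 2
    curl -sI http://127.0.0.1:pppp/ | head -n 1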

As mentioned above, please be sure to terminate this server by typing Control-C twice in the original worker node session when you are done using it, and to exit the worker node interactive session and ssh tunnel session rather than leave them idle for long periods. Failing to do so ties up resources that could otherwise be used for batch computing.
