Using Jupyter Notebooks from RedRaider Worker Nodes
Jupyter notebooks and Jupyter Lab provide a variety of useful tools for research analysis and data processing support. The HPCC is in the process of developing an interactive resource to support use of these tools through a web-based interface. Until this is ready for use, the following information provides steps you can follow to support use of Jupyter from worker nodes in the cluster. Note that you should not invoke these steps directly from cluster login nodes, as these are not meant to support resource-intensive computation or data processing; instead these should be used from worker nodes as explained on this page.
The steps to install and use either Jupyter notebook or Jupyter Lab are essentially identical, and can be separated into preparation and usage stages. For each of these stages, first sign on to one of the login nodes as described in the page on Connecting to HPCC Resources.
The preparation steps below generally need to be performed only once, or again when you want to update to newer versions of the tools.
For convenience and ease in updating, we recommend the use of Anaconda or Miniconda to set up and isolate the Jupyter environment from your other software and functions. This process is explained in detail on the page on installing a local copy of Python in the HPCC User Guides. Please refer to that page if you would like more information. Here we summarize only the essential commands.
- To install Miniconda in your account for the first time if it has not already been installed, execute the following commands:
/lustre/work/examples/InstallPython.sh
. $HOME/conda/etc/profile.d/conda.sh
conda activate
- Create an environment to hold the Jupyter notebook or Jupyter lab software:
conda create -n jupyternotebook
conda activate jupyternotebook
conda install -c conda-forge notebook
(For Jupyter Lab, replace "notebook" with "jupyterlab" in the last line above, and we suggest that you name the environment "jupyterlab" instead of "jupyternotebook".)
- If this is the first time you are installing Jupyter and you have not already done
so, create a folder in your home directory to hold Jupyter runtime files. Then set the
corresponding environment variable both in your current session and in your .bashrc
file for use in future sessions. Keeping these runtime files in your home directory
reduces space usage on the worker nodes, which could otherwise interfere with system files.
mkdir -p ~/runtime
echo 'export JUPYTER_RUNTIME_DIR=~/runtime' >> ~/.bashrc
export JUPYTER_RUNTIME_DIR=~/runtime
The first two of these lines create the runtime directory and add the environment variable expected by Jupyter to your .bashrc file for future login sessions. The third line sets this environment variable for your current session. For details on other Jupyter environment variables, paths, and configuration files, see the online Jupyter documentation.
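If you are not sure whether you have already done this step, rerunning the echo command will append a duplicate line to .bashrc. The following minimal bash sketch performs the same three actions but only appends the export line if that exact line is not already present. It is our own illustration and not an HPCC-provided script. The function name setup_jupyter_runtime and its optional rc-file argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an HPCC-provided tool): set up JUPYTER_RUNTIME_DIR
# without adding a duplicate line to the rc file on repeated runs.
setup_jupyter_runtime() {
    local rc_file="${1:-$HOME/.bashrc}"           # file to update; defaults to ~/.bashrc
    local export_line='export JUPYTER_RUNTIME_DIR=~/runtime'   # "~" is written literally

    # 1. Create the runtime directory (no error if it already exists).
    mkdir -p "$HOME/runtime"

    # 2. Append the export line only if that exact whole line is not already present.
    #    grep -x matches whole lines, -F treats the text as a fixed string.
    if ! grep -qxF "$export_line" "$rc_file" 2>/dev/null; then
        echo "$export_line" >> "$rc_file"
    fi

    # 3. Set the variable for the current shell session.
    export JUPYTER_RUNTIME_DIR="$HOME/runtime"
}
```

To use it, source the file containing the function in your login shell and then call setup_jupyter_runtime. Calling it in your current shell, rather than running it as a separate script, ensures that the export in the last step applies to your session.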
The above steps only need to be done once. For routine use, there are four distinct operations that you need to perform. Keep in mind that each of the worker node interactive and ssh tunnel sessions described here should be closed and terminated when you are done using the server, and are subject to the usual time limits for worker node sessions.
- Sign on to a login node, then from that login node get an interactive session on a worker node in the cluster,
and activate the jupyter environment for that session:
interactive -p nocona
(The above example command is for a single-core interactive session on the Nocona partition of RedRaider. For other options, see the HPCC User Guides and/or HPCC Training pages.)
This will open an interactive session on a worker node. The prompt should show the name of the node you are using, in the form "cpu-NN-nn". Take note of this worker node name for use in the steps below. (If you are using Jupyter Lab instead of Jupyter Notebook and have followed the preparation instructions above, activate the "jupyterlab" environment instead. Also replace "notebook" with "lab" in the "jupyter notebook" command in the next step.)
Activate your jupyternotebook conda environment for this session.
conda activate jupyternotebook
- Start the Jupyter notebook service from this worker node:
jupyter notebook --no-browser --ip=0.0.0.0
After a short startup phase, there should be some informational lines printed indicating that your notebook service is running, ending with:
To access the notebook, open this file in a browser:
    file:///home/(eraider)/runtime/nbserver-MMMMM-open.html
Or copy and paste one of these URLs:
    http://cpu-NN-nn:pppp/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 or http://127.0.0.1:pppp/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Note that the worker node name, server identifier, port number, and token are unique to your instance of the notebook server. You will need the node name, port number, and token to access your notebook server during the following steps. Use the actual values for your session given by the output of your "jupyter notebook" command in the steps below. The interactive session that hosts your server will now enter a static state and you will not need to interact with it further until you are done.
When you are done using the server, please be sure to terminate it by typing (control-C) twice, and then exit the interactive session. Do not let it sit idle for long periods of time when you are not using it, because it ties up resources on the cluster that could be used by others.
- Open a new terminal session and create an SSH tunnel to the worker node so that your
machine can communicate with the server. Sign on as you normally would, but include the
tunnel specification "-L pppp:cpu-NN-nn:pppp" as part of your connection, or select
the equivalent tunneling options in your ssh software tool. Here "pppp" is the actual
port number reported in step 2 above. This is typically a four-digit number, such as
8888. The node name, cpu-NN-nn, is the worker node you connected to in the
interactive session you created in step 1 above.
For example, to connect with a plain-text ssh program from the command line on Mac or Linux, you would use the line:
ssh eraider@<login-host> -L pppp:cpu-NN-nn:pppp
if you are connecting from on campus or from within the TTU VPN, or
ssh -J eraider@<gateway-host> eraider@<login-host> -L pppp:cpu-NN-nn:pppp
if you are connecting from off campus through the TTU SSH Gateway. For the login and gateway host names to use in place of "<login-host>" and "<gateway-host>", and for variations on the options for connecting, see the overall instructions on Connecting to HPCC Resources. In each case, make the following substitutions:
- your eRaider ID for "eraider";
- the node name from step 1 for "cpu-NN-nn";
- the actual port number from step 2 for "pppp", entered in both places where it appears.
For non-command-line GUI ssh software, please consult the documentation for that software on establishing SSH tunnels using the information above.
This process establishes a tunnel through the HPCC login node to the worker node and port on which your Jupyter notebook server is running. This tunnel is used in the final step below.
- You should now be able to reach the notebook server using a web browser on your own
machine. Open a browser window, copy the URL from the last line of the server output in
step 2 above, and paste it into the address field of your browser window. The URL will
look like http://127.0.0.1:pppp/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
but you should use the exact line from step 2 above as your URL.
(What is happening: You are connecting to a URL on your own machine that is being tunneled by ssh through to the server running on the worker node.)
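The node name and port must be copied correctly into the tunnel specification, and the port must appear in both places, so it is easy to mistype one of them. The bash function below is a hypothetical convenience sketch and is not part of the HPCC tools. Given the http://cpu-NN-nn:pppp/?token=... URL printed in step 2, it prints the matching -L option and the local 127.0.0.1 URL to open in your browser. It only manipulates strings; you still run ssh yourself, using the host names described in Connecting to HPCC Resources.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the ssh "-L" option and the local browser URL
# from the "http://cpu-NN-nn:pppp/?token=..." line printed by Jupyter.
jupyter_tunnel_info() {
    local url="$1"
    local rest="${url#http://}"          # strip scheme   -> cpu-NN-nn:pppp/?token=...
    local hostport="${rest%%/*}"         # before first / -> cpu-NN-nn:pppp
    local node="${hostport%%:*}"         # worker node name
    local port="${hostport##*:}"         # port number
    local suffix="${rest#"$hostport"}"   # remainder, e.g. /?token=...

    # Require a node name and a purely numeric port.
    if [[ -z "$node" || "$node" == "$hostport" || ! "$port" =~ ^[0-9]+$ ]]; then
        echo "could not parse node and port from: $url" >&2
        return 1
    fi
    # The 127.0.0.1 line does not name the worker node, so it cannot be used here.
    if [[ "$node" == "127.0.0.1" || "$node" == "localhost" ]]; then
        echo "use the cpu-NN-nn URL from step 2, not the 127.0.0.1 one" >&2
        return 1
    fi

    printf '%s\n' "-L ${port}:${node}:${port}"          # add to your ssh command
    printf '%s\n' "http://127.0.0.1:${port}${suffix}"   # paste into your browser
}
```

For example, calling jupyter_tunnel_info with the cpu-NN-nn URL from your own server output prints two lines. The first is the -L option to add to the ssh command in step 3, and the second is the address to paste into your browser in step 4.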
If followed in its entirety, the above procedure should produce a browser window that shows a Jupyter session. If your browser fails to connect, check that you have followed all of the steps above and copied and pasted the URL exactly. Also check that any firewalls or security software on your machine are configured to allow web connections on the specified port.
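To narrow down a failed connection, you can check on your own machine whether anything is listening on the local end of the tunnel. The sketch below uses bash's built-in /dev/tcp redirection, which is available in most bash builds but not in zsh or plain sh, so it needs no extra tools. The function name check_tunnel_port is our own, hypothetical choice. A failed check suggests that the ssh tunnel from step 3 is not running or was given a different port. A successful check only shows that something, normally your ssh client, accepted the connection. It does not prove that the Jupyter server on the worker node is still running.

```shell
#!/usr/bin/env bash
# Hypothetical troubleshooting helper: report whether a TCP port on this
# machine (127.0.0.1) accepts connections, e.g. the local end of the ssh tunnel.
check_tunnel_port() {
    local port="$1"
    if [[ ! "$port" =~ ^[0-9]+$ ]]; then
        echo "usage: check_tunnel_port PORT" >&2
        return 2
    fi
    # Try to open a TCP connection in a subshell using bash's /dev/tcp feature;
    # the connection is closed automatically when the subshell exits.
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
        echo "port ${port}: listening"
        return 0
    else
        echo "port ${port}: nothing listening (is the ssh tunnel running?)"
        return 1
    fi
}
```

For example, run check_tunnel_port with the port number from step 2 in a terminal on your own machine, not on the cluster, while the tunnel session is open.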
As mentioned above, please also be sure to terminate this server by typing (control-C) twice in the original worker node session when you are done using the server, and to exit the worker node interactive session and ssh tunnel session rather than let them sit idle for long periods of time when you are not using the server. Failing to do so will tie up resources that in general could better be used for batch computing.