Installing a local copy of Python
This tutorial will cover installing Python (via Miniconda) locally into your $HOME folder so that you may install Python versions and packages different from those maintained by the HPCC. The instructions to perform this installation can be found below.
This method is intended as a replacement for the system versions available through modules, so if you have previously loaded one of these within your session, you must first issue a "module unload python" or "module purge" command.
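As a minimal sketch of that step, the snippet below unloads the system Python module only when the "module" command is available, so it can be pasted into any session without failing. The function name unload_system_python is illustrative and is not part of any HPCC tooling.

```shell
# unload_system_python: hypothetical helper, not an HPCC-provided command.
unload_system_python() {
    if command -v module >/dev/null 2>&1; then
        # Same command as in the text above; "module purge" would unload everything instead.
        module unload python
    else
        echo "module command not found; nothing to unload"
    fi
}

unload_system_python
```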
Table of Contents:
- Installing a Local Copy of Python
- Installing Python Packages
- Creating Virtual Environments
- Optimize the Conda Init
- Using Python2 over Python3
The following script can be used to install Python 3 (using Miniconda) to your $HOME directory. You can then use the "environments" feature of Conda to create, activate, deactivate, and move between separate self-consistent environments to manage your Python and related code sets. Python 2 reached its end of life on January 1, 2020 and no longer receives updates, so please convert any remaining Python 2 code to Python 3 as soon as possible!
For further information on using the conda command line to control and select between your environments, please see the documentation on the following page: https://conda.io/docs/user-guide/getting-started.html
Automated Python Installation Script:
# Running the following script will automate the process of installing a local copy of Miniconda v3
# Once complete you will need to run the following commands before you can actually use the new environment you created.
Installing Python Packages
Using the Conda installer has the added benefit that it comes pre-loaded with a number of popular and powerful Python packages, such as numpy and scipy. However, you may find that you need additional packages to perform your research. Conda makes installing many Python packages easy with the "conda install" command. For instance, if you need to install the BioPython package (a popular package among bioinformaticians and biologists), you can use the following command:
conda install biopython
To use these packages in practice, be sure to read the "Getting started with conda" tutorial and learn how to use the "conda create", "conda activate", and "conda deactivate" commands to get the most out of the use of these powerful tools for organizing your Python and related software environments.
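After an install finishes, a quick check that the package is visible to the interpreter on your PATH can save confusion later. The helper below is only a sketch: check_import is a made-up name, and it assumes you know the import name of the package (for BioPython the import name is Bio).

```shell
# check_import MODULE: report whether the python found first on PATH can import MODULE.
# check_import is a hypothetical helper name used for illustration only.
check_import() {
    # An activated conda environment provides "python"; fall back to "python3" otherwise.
    py="$(command -v python || command -v python3)" || {
        echo "no python interpreter found on PATH" >&2
        return 2
    }
    if "$py" -c "import $1" >/dev/null 2>&1; then
        echo "$1: importable with $py"
    else
        echo "$1: NOT importable with $py"
        return 1
    fi
}

# BioPython is imported as "Bio"; "|| true" keeps the session going if it is missing.
check_import Bio || true
```

Because the check uses whichever python is first on your PATH, running it before and after activating an environment shows which packages each one can see.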
Creating Virtual Environments
Another added bonus of the Conda package manager is the ability to separate your Python dependency stacks into separate virtual environments. This is particularly useful when you have different workflows that require different versions of Python libraries or even different versions of Python itself. For instance, if you have some applications that require Python v3.x and another application that requires Python v2.7, then you can simply create a Python 2.7 environment and switch to that environment as needed without causing any problems for your Python v3 environment. (Python 2 reached its end of life on January 1, 2020 and is no longer maintained by the Python project, so please convert to Python 3 as soon as possible.)
Below is a very brief tutorial on how to create a virtual environment that uses different versions of some software stacks. For this demonstration we will create a Python 2.7 environment with a pinned, older version of scipy. This is the kind of setup needed by older software such as QIIME v1.9, a bioinformatics package known for having odd dependencies that are difficult to install on modern installations of Python. Please feel free to look over the "Getting started with conda" tutorial as well as the "Managing environments" documentation provided by the developers, as they provide a more in-depth look at managing environments.
If you haven't already done so, please install Miniconda using the installation instructions above. Then make sure the Conda binary directory has been added to your path. If you installed using the instructions above, then this can be done using the following:
Next we will create a conda environment. This is done by giving conda the "create" command and then passing it the argument "-n <name>". This <name> is what you will use to reference your environment later, so it is best to use something descriptive that tells you what the environment contains. After the name, list the packages you want to install. You can also use an "=" (equal sign) after a package name to force Conda to install a specific version. The following example command will create a new environment called "python2" and install into that environment Python v2.7, scipy v0.15.0, numpy, and all of their dependencies:
conda create -n python2 python=2.7 scipy=0.15.0 numpy
#As a reminder:
# create -> instructs conda to create a new virtual environment
# -n python2 -> instructs conda to name this new environment "python2" -- see deprecation notice below!
# python=2.7 -> instructs conda to install Python 2.7 instead of Python v3 (default) -- see deprecation notice below!
# scipy=0.15.0 -> instructs conda to install v0.15.0 of scipy instead of the newest version.
# numpy -> instructs conda to install the most recent version of numpy
Once the installation is complete (it takes about 1 minute), you will have a new conda environment named python2. To see a list of all environments you have access to, use the following command:
conda env list
Switching to one of these environments is done with the "conda activate" command. For the environment we created above, we need to activate the python2 environment. Run the following commands to see how activating an environment changes the applications you can access.
#View the current version of Python
python --version #This will likely return Python 3.X.Y :: Anaconda, Inc.
#Load the python2 environment -- see deprecation notice below!
conda activate python2
#View the current version of Python
python --version #This will likely return Python 2.7.14 :: Anaconda, Inc.
As you can see, the version of Python changed when you changed environments. Any work you do with Python will now use this specific version of Python and only the packages you have installed into this environment. Nothing installed by the HPCC or installed into your other environments will be visible. To leave this environment you can run the following command:
conda deactivate
This returns you to your base conda installation and unloads everything from your virtual environment.
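To confirm which environment a session is in at any point, the sketch below prints the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated, together with the python found first on your PATH. The function name show_env is illustrative.

```shell
# show_env: hypothetical helper that reports the active conda environment.
show_env() {
    # CONDA_DEFAULT_ENV is unset when no conda environment is active.
    echo "active environment: ${CONDA_DEFAULT_ENV:-none}"
    echo "python on PATH: $(command -v python || echo 'not found')"
}

show_env
```

Running it before and after "conda activate python2" should show both lines change.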
Optimize the Conda Init
It is common among scientists to install all the required Python packages under the "base" Conda environment (the default environment in Conda), which leads to a prolonged initialization process after each new login session on the HPCC login or worker nodes. The better practice is to create as many Conda environments as needed and install as few Python packages as possible inside each one, which reduces the initialization time and avoids extra overhead when an environment is activated.
The other approach, which we highly recommend to all Conda users at HPCC, is to disable the automatic activation of the "base" environment after each new login session. This method does not require removing the "conda initialize" lines from the ".bashrc" file and will not stop Conda from preparing your environment for any future interaction with Conda. Instead, it only prevents the "base" environment from being activated automatically after each new SSH/interactive session.
To initialize the Conda environment upon login without automatically activating the "base" environment, you may follow the instructions below:
1. After logging in to one of the HPCC login gateways, please deactivate the Conda base env:
(base)$ conda deactivate
2. Then disable the automatic activation of the "base" environment in the Conda configuration:
$ conda config --set auto_activate_base False
3. You can also check the Conda configurations to make sure the new setup is in place:
$ conda config --show | grep auto_activate_base
4. Log out, and log back into your account. You will see that the "conda" command is still available in your session, but the (base) env is no longer activated automatically. Whenever you need the "base" environment or any other Conda environment, activate it explicitly:
$ conda activate
$ conda activate base
(or better yet, a named environment suited for a particular set of commands, not a huge one with everything in it).
Please note that with this Conda setting you will need to activate the Conda environment in each job submission, which is done by adding a "conda activate <env>" line to your job submission script.
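A minimal sketch of the relevant part of a job script is shown below. Scheduler directives are left out because they depend on the batch system, the environment name python2 is only an example, and run_in_env is a hypothetical helper. It assumes the job runs under bash and that conda is on the PATH of the job. The "conda shell.bash hook" line loads conda's shell functions in case the batch shell did not read ".bashrc".

```shell
#!/bin/bash
# run_in_env ENV CMD [ARGS...]: activate ENV, run CMD, then deactivate again.
# Hypothetical helper for illustration; adapt it to your own job script.
run_in_env() {
    env_name="$1"
    shift
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 127
    fi
    # Make "conda activate" usable in a non-interactive shell.
    eval "$(conda shell.bash hook)"
    conda activate "$env_name" || return 1
    "$@"
    rc=$?
    conda deactivate
    return "$rc"
}

# Example step: print the Python version provided by the "python2" environment.
run_in_env python2 python --version || echo "job step failed" >&2
```

Wrapping the activation in a function keeps each job step tied to a named environment, in line with the recommendation to use small, purpose-built environments rather than "base".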
Using Python2 over Python3
Some code still exists that uses Python v2, which reached its end of life on January 1, 2020 and is no longer maintained by the Python project. For this reason, the installation scripts mentioned above start with examples that install Python v3.
To switch back to Python v2, you will need to create a Python 2 environment and then activate that environment whenever you need access to Python 2. This can be done by following the tutorial in the "Creating Virtual Environments" section above; the relevant commands are reproduced below:
conda create -n python2 python=2.7 scipy=0.15.0 numpy
conda env list #Now you should see an environment called "python2"
conda activate python2 #This will load the python2 environment
#Once you have finished using the environment you will need to deactivate/unload it.
conda deactivate