Getting Started with Conda at HPCC
Introduction
It is often useful to set up a fully customized set of software packages, including Python and many other utilities, in your account by installing the MiniForge package manager. For a variety of reasons, including speed and avoiding potentially restrictive terms of service, we recommend transitioning from Anaconda, Inc. tools such as Anaconda and Miniconda to MiniForge for a more streamlined and community-focused package management experience. While Anaconda provides a comprehensive suite of tools, it relies on default channels whose packages are often outdated, which can limit efficiency. MiniForge starts with the free, community-supported conda-forge channel, offering access to a broad range of up-to-date packages without the complexities of commercial licensing. It also includes the mamba libraries and internal toolset, which are usually faster and more reliable than the older Conda and Miniconda tools. This guide highlights the benefits of making this switch and provides clear steps for installing and configuring MiniForge effectively.
Table of Contents
- Overview of Conda Package Management Landscape
- Importance of Installing MiniForge over Miniconda or Anaconda
- Removing Existing Miniconda or Anaconda Installations
- Installing MiniForge
- Adding Channels to MiniForge
- Creating and Managing Virtual Environments
- Deactivate and Remove the Environment
- Optimizing Conda Initialization
- Using Anaconda/Miniconda and Its Terms of Service
Overview of Conda Package Management Landscape
- Conda-Forge is a community-led collection of recipes, build infrastructure, and distributions for the Conda package manager. Its most important git repository for contributors is staged-recipes, the place to submit new recipes.
- Conda is a powerful command-line tool for package and environment management that runs on Windows, macOS, and Linux. Conda can be found in many distributions, like Anaconda Distribution, Miniconda, or MiniForge.
- Anaconda is a commercial distribution of Python and R languages used to install and manage packages. Anaconda contains all of the most common packages (tools) a data scientist needs and can be considered the hardware store of data science tools.
- Miniconda is a free minimal installer for Conda. It is a small bootstrap version of Anaconda that includes only Conda, Python, the packages they both depend on, and a small number of other useful packages (like pip, zlib, and a few others).
- Mamba is an open source project designed to speed up and improve reliability of conda package installation.
- MiniForge is a minimal installer for Conda that includes the Mamba libraries. It installs the Conda package manager with conda-forge pre-configured as the default channel.
Importance of Installing MiniForge over Miniconda or Anaconda
MiniForge is often preferred over Miniconda and Anaconda for several reasons:
- Community-Focused: MiniForge is a community-driven project specifically designed to work seamlessly with the conda-forge channel, ensuring access to a vast and frequently updated collection of packages.
- No Default Channels: Unlike Anaconda and Miniconda, which include the "defaults" channel that may lead to outdated packages or conflicts, MiniForge sets up and is only dependent on conda-forge, providing a more consistent and reliable package management experience. (As with all Conda installers, user selection of additional repositories such as Bioconda can also be made.)
- Lightweight Installation: MiniForge provides a minimal installer, which means it occupies less disk space and allows users to install only the packages they need.
- Enhanced Compatibility: Using conda-forge as the primary channel improves compatibility with numerous scientific packages that may not be available in the default channels.
Removing Existing Miniconda or Anaconda Installations (if any)
Steps to Remove Existing Conda Environments (Miniconda or Anaconda)
1. Check for Existing Conda Installations
- Run the following command to check if conda is installed
conda --version
- If conda is not installed, you will receive a "command not found" error. This means that you can proceed with the rest of these instructions.
- If you do find a version of conda installed, we recommend that you remove it as described below.
2. Verify and remove any previous conda installation
- To check for Miniconda or Anaconda directories, run the command below from your home directory. If the command above reports that conda is not found and this one returns nothing, proceed to the next section of this guide.
ls -al | grep conda
3. Remove any previously existing conda-related directories and files by running the command below from your home directory.
- This command asks you to confirm each file or directory before deleting it. Answer "n" for any item you wish to keep; otherwise answer "y" to remove it.
find . -maxdepth 1 -name \*conda\* -exec rm -ir {} +
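Before deleting anything, it can help to preview what the removal command would match. The sketch below is a non-destructive dry run, assuming the old installation sits at the top level of your home directory. The helper names list_conda_items and has_conda_init_block are hypothetical, introduced here only for illustration. The second helper looks for the block that "conda init" writes into .bashrc, which deleting directories does not remove.

```shell
# Dry run: print conda-related files and directories at the top level of a
# directory (default: $HOME) without deleting anything.
list_conda_items() {
    find "${1:-$HOME}" -maxdepth 1 -name '*conda*' -print
}

# Succeed if a shell startup file (default: ~/.bashrc) still contains the
# block written by "conda init".
has_conda_init_block() {
    grep -q '>>> conda initialize >>>' "${1:-$HOME/.bashrc}" 2>/dev/null
}

list_conda_items "$HOME"
if has_conda_init_block; then
    echo "Found a 'conda initialize' block in ~/.bashrc."
    echo "Remove it by hand if it points to the old installation."
fi
```

If the listing shows only items you expect, the interactive removal command above can then be run with more confidence.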
Installing MiniForge
Prerequisites: Ensure that no other Conda installation (Miniconda or Anaconda) is present in your account.
1. Download and Install MiniForge
- Download and run the MiniForge Installer
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh
bash Miniforge3-$(uname)-$(uname -m).sh
- In response to the questions, confirm your acceptance of the license and select a
location (the default one is usually okay), and answer the question asking whether
or not to initialize conda on sign-in.
- A good choice is to say "yes" to the question about adding initialization to your .bashrc file, and then take the following steps to prevent auto-activation of the base environment on sign-in for reasons that are explained further below.
source ~/.bashrc
conda config --set auto_activate_base false
2. Review and if necessary update shell configuration in your .bashrc file, then sign out and back in to start with a fresh Conda environment.
3. Confirm the installation by checking the conda version
which conda # This command should return a path inside your MiniForge directory, e.g. ~/miniforge3/bin/conda if you accepted the default location
conda --version # This command should return the current Conda version
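For a scripted setup, the download and install steps in this section can be condensed as sketched below. This relies on the installer's batch mode (-b) and prefix option (-p). The prefix $HOME/miniforge3 is an assumed location rather than a site requirement, and install_miniforge is a hypothetical helper name. The block only defines the function and prints what it would do, so nothing is downloaded or installed until you call the function yourself.

```shell
# Installer name for the current platform, e.g. Miniforge3-Linux-x86_64.sh
installer="Miniforge3-$(uname)-$(uname -m).sh"
url="https://github.com/conda-forge/miniforge/releases/latest/download/${installer}"
prefix="${HOME}/miniforge3"   # assumed install location; change if desired

install_miniforge() {
    wget "${url}" -O "${installer}" || return 1
    # -b: batch mode (no prompts, license terms assumed accepted)
    # -p: installation prefix
    bash "${installer}" -b -p "${prefix}" || return 1
    # Batch mode does not edit shell startup files, so initialize explicitly
    "${prefix}/bin/conda" init bash || return 1
    # Keep the base environment from activating at every login
    "${prefix}/bin/conda" config --set auto_activate_base false
}

echo "Would download:   ${url}"
echo "Would install to: ${prefix}"
```

To perform the installation, run install_miniforge, then sign out and back in so the new shell configuration takes effect.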
Adding Channels to MiniForge
To expand your package options, you can add channels
1. Activate Conda
- Ensure that MiniForge is activated
conda activate
2. Add Necessary Channels
- MiniForge will start with the community-run conda-forge channel. You can verify this using the command below:
conda config --show-sources
Note: You can add other channels as needed with "conda config --add channels (channel name)", but avoid the default Anaconda channels, which are subject to Anaconda's terms of service and may incur costs for the user. Conda-forge does not require the default channels. An example is given below; you only need to follow it if you plan to use Bioconda packages. Other Conda channels generally provide similar instructions for adding them.
Example: Adding bioconda channel (not needed if you are not planning to use bioconda packages):
- Set channel priority to strict to ensure conda-forge remains a priority when adding other channels.
conda config --set channel_priority strict
- Add the Bioconda channel (optional; only for bioinformatics users)
Bioconda is a community-driven channel within the Conda package manager specifically for bioinformatics software. It simplifies the installation and management of specialized bioinformatics packages.
conda config --add channels bioconda
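After adding channels, it can be worth confirming that the Anaconda "defaults" channel has not crept into your configuration. The following is a minimal sketch, assuming the YAML-style list that "conda config --show channels" prints. The helper check_channels is hypothetical and not part of Conda.

```shell
# Read "conda config --show channels" output on stdin and report whether the
# Anaconda "defaults" channel is among the configured channels.
check_channels() {
    if grep -Eq '^[[:space:]]*-[[:space:]]*defaults[[:space:]]*$'; then
        echo "WARNING: the Anaconda 'defaults' channel is configured"
        return 1
    fi
    echo "OK: the 'defaults' channel is not configured"
}

# Only query Conda when it is actually available in this shell
if command -v conda >/dev/null 2>&1; then
    conda config --show channels | check_channels || true
fi
```

If the warning appears, "conda config --remove channels defaults" removes the entry from your user configuration.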
Creating and Managing Virtual Environments
(Recommended for installing packages)
One of the most significant advantages of Conda is the ability to create isolated environments to manage different versions of Python and packages without conflict. This can be particularly useful if you need to work on multiple projects with different dependencies. Here's a quick guide to setting up environments.
1. Create and Activate a New Environment
- To create a virtual environment and specify packages (EXAMPLE: choose your desired packages and Python version as needed, and supply your own environment name in place of <env_name> below):
conda create --name <env_name> python=3.9 scipy=1.5.0 numpy
This command creates an environment named <env_name> with Python version 3.9, and it installs SciPy version 1.5.0 along with the latest compatible version of NumPy. In general you can leave out the version numbers and the "python=..." specifier to obtain the latest versions in your environment. These can be updated later if needed.
- Activate the newly created environment
conda activate <env_name>
- You can view all of your previously created environments using
conda env list
2. Install and Test Python Packages
- You can install other packages into an environment once it has been created and activated. For example, to install Pandas into an existing environment
conda activate <env_name>
conda install pandas
- Test the installed packages. To check the version of the Python interpreter and list the packages in the current environment:
python --version
conda list
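When setup steps are rerun, for instance from a script, recreating an environment that already exists is usually not what you want. The sketch below creates an environment only when it is missing. It assumes the column layout printed by "conda env list", with the environment name in the first column. The helpers env_exists and ensure_env are hypothetical names introduced for illustration.

```shell
# Succeed if the name given as $1 appears in the first column of
# "conda env list" output read from stdin. Header lines start with "#",
# so they never match an environment name.
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Create the environment named $1 with the remaining arguments as packages,
# unless an environment with that name is already present.
ensure_env() {
    name="$1"
    shift
    if conda env list | env_exists "${name}"; then
        echo "Environment ${name} already exists; nothing to do"
    else
        conda create --yes --name "${name}" "$@"
    fi
}
```

For example, "ensure_env my-analysis python numpy" would create an environment called my-analysis (a placeholder name) on the first run and do nothing on later runs.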
Deactivate and Remove the Environment
It's a good habit to deactivate and remove environments when they are no longer needed to avoid clutter and potential conflicts. Additionally, keep environments minimal to improve performance and avoid slow initialization, as discussed under Optimizing Conda Initialization. Use conda deactivate to switch back to your base Conda environment.
1. Deactivate the current environment
conda deactivate
- This returns you to your base conda installation and unloads everything from your virtual environment.
2. Remove a named environment if you are no longer using it or want to reinstall it from scratch:
conda env remove --name <env_name>
Optimizing Conda Initialization
It is common among scientists to install all the required Python packages under the "base" Conda environment (the default environment in Conda), leading to a prolonged initialization process after each new login session to the HPCC login or worker nodes. However, the ideal practice is to create as many Conda environments as needed and install as few Python packages as possible inside each one, to reduce the initialization time and avoid extra overhead once an environment is activated.
The other approach, which we highly recommend to all Conda users at HPCC, is to disable the automatic activation of the "base" environment after each new login session. This method does not require removing the "conda initialize" lines from the ".bashrc" file and won't stop Conda from preparing your shell for later use. Instead, it only prevents the "base" environment from being activated automatically after each new SSH or interactive session.
If you see the following indicator at the start of each of your command lines, it means your base environment is being activated when you sign on or start a session.
(base)$
We recommend that you suppress the automatic activation of the base environment to speed up logins and the startup of other sessions, including batch jobs. Instead, it is best to activate environments, including the base environment, only when needed.
To initialize the Conda environment upon login without automatically activating the "base" environment, follow the instructions below
1. After logging in to one of the HPCC login nodes, if you see the indicator mentioned above, please deactivate the Conda base env
conda deactivate
2. Then disable automatic activation of the base environment in the Conda configuration
conda config --set auto_activate_base False
3. You can also check the Conda configurations to make sure the new setup is in place
conda config --show | grep auto_activate_base
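The setting can also be checked without starting Conda by looking at your user configuration file. This is a minimal sketch, assuming that "conda config --set" stored the value in ~/.condarc as a line of the form "auto_activate_base: false". The helper base_autoactivation_disabled is a hypothetical name.

```shell
# Succeed if the given Conda configuration file (default: ~/.condarc)
# explicitly turns off automatic activation of the base environment.
base_autoactivation_disabled() {
    grep -Eiq '^[[:space:]]*auto_activate_base:[[:space:]]*false[[:space:]]*$' \
        "${1:-$HOME/.condarc}" 2>/dev/null
}

if base_autoactivation_disabled; then
    echo "base will not be activated automatically at login"
else
    echo "base auto-activation is not disabled in ~/.condarc"
fi
```

A missing ~/.condarc is treated the same as a missing setting, so the second message also appears if you have never changed your Conda configuration.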
Log out, and log back into your account. You will see the "conda" command is still
available in your session, but the (base) env is not activated yet. You may now activate
the "base" or any other conda environment whenever you need to do so:
conda activate <env_name>
For example:
conda activate base
As a reminder, please use a named environment suited for a particular set of commands
to manage your related package installations, not a huge one with everything in it.
Generally speaking, conda packages related to a particular workflow can be placed
into a named environment to avoid the base environment growing too large and to reduce
the possibility of conflicting package versions and their dependencies.
The best practice is to activate the specific conda environment needed within each job submission by simply adding the "conda activate <env>" into your job submission script file or within a given login session. While you can add such conda setup steps into your .bashrc file, please note that this can introduce considerable delays into each login or the start of each session due to the intensive disk activity needed to set up the environment and due to the nature of Python as an interpreted and not a compiled language that requires disk activity for each command and step.
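A job script following this practice might look like the sketch below. It assumes a Slurm scheduler (the #SBATCH lines are plain comments to the shell), a MiniForge installation under $HOME/miniforge3, and a hypothetical environment named my-env, so adjust all three to your own setup. Sourcing conda.sh makes "conda activate" available in the batch shell without relying on .bashrc.

```shell
#!/bin/bash
#SBATCH --job-name=conda-example   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Assumed install location; adjust if you chose a different prefix
CONDA_SH="${HOME}/miniforge3/etc/profile.d/conda.sh"
# Hypothetical environment name; can be overridden from the calling shell
ENV_NAME="${ENV_NAME:-my-env}"

if [ -f "${CONDA_SH}" ]; then
    # Load the conda shell function, then activate only the environment
    # this job needs
    . "${CONDA_SH}"
    conda activate "${ENV_NAME}" && python --version
    conda deactivate
else
    echo "conda.sh not found at ${CONDA_SH}; is MiniForge installed there?" >&2
fi
```

Replace the "python --version" line with the commands your job actually runs. The environment is activated only inside the job, so login sessions stay fast.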
Using Anaconda/Miniconda and Its Terms of Service
Anaconda has very particular terms of service that govern use of its proprietary repositories. These channels can be added to the above MiniForge-based installations if needed, or you can alternatively use Anaconda's installation mechanisms as described on their web site. If you prefer to use Anaconda/Miniconda channels for your work, be aware of the following
- Commercial Licensing: Anaconda is a commercial product, which means you must adhere to its terms of service. Using Anaconda in a commercial setting may incur licensing fees.
- Default Channel Limitations: The default channels provided by Anaconda may not always offer the most up-to-date packages. This can lead to compatibility issues and potential project delays.
- Costs: If you intend to use Anaconda for commercial purposes, you will need to purchase a license. Ensure you understand the costs associated with enterprise usage.
Note: For more information, see "Update on Anaconda's Terms of Service for Academia and Research" on the Anaconda website.
Any additional expenses incurred from using Anaconda will be the user's responsibility, and HPCC will not provide financial support. By opting for MiniForge and avoiding the use of proprietary channels, users can avoid these potential complications and still benefit from an open and flexible environment for their projects.
With the above considerations in mind, here is a link to the previous Miniconda-based installation guide formerly used by the HPCC.
High Performance Computing Center
Phone: 806.742.4350
Email: hpccsupport@ttu.edu