Getting Started with Conda at HPCC
Introduction
It is often useful to set up a fully customized set of software packages in your account, including Python and many other utilities, by installing the Miniforge package manager. For a variety of reasons, including speed and the avoidance of potentially restrictive terms of service, we recommend transitioning from Anaconda, Inc. tools such as Anaconda and Miniconda to Miniforge for a more streamlined and community-focused package management experience. While Anaconda provides a comprehensive suite of tools, it relies on default channels that are often outdated, which can limit efficiency. Miniforge emphasizes the conda-forge channel, offering access to a broader range of up-to-date packages without the complexities of commercial licensing. It also includes the Mamba libraries and internal toolset, which are usually faster and more reliable than the older Conda and Miniconda tools. This guide highlights the benefits of making this switch and provides clear steps for installing and configuring Miniforge effectively.
Table of Contents
- Overview of Conda Package Management Landscape
- Importance of Installing Miniforge over Miniconda or Anaconda
- Removing Existing Miniconda or Anaconda Installations (if any)
- Installing Miniforge
- Adding Channels to Miniforge
- Creating a Virtual Environment
- Deactivate and Remove the Environment
- Optimizing Conda Initialization
- Using Anaconda/Miniconda and Its Terms of Service
Overview of Conda Package Management Landscape
- Conda-Forge is a community-led collection of recipes, build infrastructure, and distributions for the Conda package manager. Among its important git repositories, staged-recipes is the place to submit new recipes.
- Conda is a powerful command-line tool for package and environment management that runs on Windows, macOS, and Linux. Conda can be found in many distributions, like Anaconda Distribution, Miniconda, or Miniforge.
- Anaconda is a commercial distribution of Python and R languages used to install and manage packages. Anaconda contains all of the most common packages (tools) a data scientist needs and can be considered the hardware store of data science tools.
- Miniconda is a free minimal installer for Conda. It is a small bootstrap version of Anaconda that includes only Conda, Python, the packages they both depend on, and a small number of other useful packages (like pip, zlib, and a few others).
- Mamba is an open source project designed to speed up and improve reliability of conda package installation.
- Miniforge is a minimal installer for Conda that includes the Mamba libraries. It installs the Conda package manager with conda-forge pre-configured as the default channel.
Importance of Installing Miniforge over Miniconda or Anaconda
Miniforge is often preferred over Miniconda and Anaconda for several reasons:
- Community-Focused: Miniforge is a community-driven project specifically designed to work seamlessly with the conda-forge channel, ensuring access to a vast and frequently updated collection of packages.
- No Default Channels: Unlike Anaconda and Miniconda, which include the "defaults" channel that may lead to outdated packages or conflicts, Miniforge sets up and is only dependent on conda-forge, providing a more consistent and reliable package management experience. (As with all Conda installers, user selection of additional repositories such as Bioconda can also be made.)
- Lightweight Installation: Miniforge provides a minimal installer, which means it occupies less disk space and allows users to install only the packages they need.
- Enhanced Compatibility: Using conda-forge as the primary channel improves compatibility with numerous scientific packages that may not be available in the default channels.
Removing Existing Miniconda or Anaconda Installations (if any)
Steps to Remove Existing Conda Installations (Miniconda or Anaconda)
1. Check for Existing Conda Installations
- Run the following command to check whether conda is installed:
$ conda --version
- If conda is not installed, you will receive a "command not found" error.
2. Verify and Remove the Conda Installation
- If Conda is installed, the command will return the installed version.
- To check for Miniconda or Anaconda directories in your home directory, use the ls command:
$ ls ~
3. Remove the existing directories with:
$ rm -rf ~/miniconda3
$ rm -rf ~/anaconda3
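Before deleting anything, it can help to confirm which installation directories are actually present. The following is a minimal sketch, not an official HPCC tool: the function name find_conda_dirs is hypothetical, and the directory names miniconda3 and anaconda3 are assumed to be the installers' usual defaults. It only lists directories and removes nothing.

```shell
#!/usr/bin/env bash
# List leftover Anaconda/Miniconda directories under a base directory.
# Assumption: the installers used their usual default directory names;
# a custom install location would need to be added to the list.
find_conda_dirs() {
    local base="${1:-$HOME}"
    local name
    for name in miniconda3 anaconda3; do
        if [ -d "$base/$name" ]; then
            echo "$base/$name"
        fi
    done
}

# Report what is present in the home directory; nothing is deleted here.
find_conda_dirs "$HOME"
```

If the function prints nothing, neither directory exists under your home directory and the rm step can be skipped.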
Installing Miniforge
Prerequisites: Ensure that no other Conda installation (Miniconda or Anaconda) is present.
1. Download and Install Miniforge
- Download the Miniforge installer:
$ wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
- Run the installer script:
$ bash Miniforge3-$(uname)-$(uname -m).sh
2. Update Shell Configuration
- Open the .bashrc file to ensure the Miniforge path is set:
$ nano ~/.bashrc
- Ensure the file contains the following line, which adds Miniforge to your PATH:
export PATH="/home/$(whoami)/miniforge3/bin:$PATH"
- Reload the .bashrc file:
$ source ~/.bashrc
3. Verify Installation
- Confirm the installation by checking the conda version:
$ conda --version
- Initialize Conda for your shell, then open a new session (or reload .bashrc) so the change takes effect:
$ conda init
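The installation steps above can be sanity-checked with a short script. This is a sketch under stated assumptions: the helper names installer_name and path_has_miniforge are hypothetical, and ~/miniforge3 is assumed to be the install prefix (the installer's default). It downloads and installs nothing; it only shows the installer file name for the current machine and reports whether the Miniforge bin directory is on PATH.

```shell
#!/usr/bin/env bash
# Build the installer file name the same way the download command does.
installer_name() {
    echo "Miniforge3-$(uname)-$(uname -m).sh"
}

# Succeed if the given PATH-style string contains the Miniforge bin directory.
# Assumption: Miniforge was installed to the default prefix ~/miniforge3.
path_has_miniforge() {
    case ":$1:" in
        *":$HOME/miniforge3/bin:"*) return 0 ;;
        *) return 1 ;;
    esac
}

echo "Installer for this machine: $(installer_name)"
if path_has_miniforge "$PATH"; then
    echo "Miniforge is on PATH"
else
    echo "Miniforge is not on PATH yet"
fi
```

If you chose a different prefix during installation, adjust the path in path_has_miniforge accordingly.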
Adding Channels to Miniforge
To expand your package options, you can add channels.
1. Activate Conda
- Ensure that Miniforge is activated:
$ conda activate
2. Add Necessary Channels
- Add the conda-forge channel:
$ conda config --add channels conda-forge
Note: You can add other channels as needed, but avoid using the defaults channel, as it is subject to Anaconda's terms of service and may incur costs for the user. Conda-forge does not require the defaults channel.
Ensure conda-forge remains the highest-priority channel when adding other channels.
- Set channel priority to strict:
$ conda config --set channel_priority strict
- Add the Bioconda channel (optional; only for bioinformatics)
Bioconda is a community-driven Conda channel specifically for bioinformatics software. It simplifies the installation and management of specialized bioinformatics packages. Using --append places Bioconda below conda-forge in the channel list, so conda-forge remains the highest-priority channel.
$ conda config --append channels bioconda
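Channel order matters: conda config --add places a channel at the top of the list (highest priority), while conda config --append places it at the bottom. The sketch below is one way to confirm that conda-forge is still first after adding other channels. The helper name top_channel is hypothetical, and it assumes that conda config --show channels prints a YAML-style list with one "- name" entry per line.

```shell
#!/usr/bin/env bash
# Print the first (highest-priority) channel from `conda config --show channels`
# output read on stdin. Assumes one "  - channel" entry per line.
top_channel() {
    awk '$1 == "-" { print $2; exit }'
}

if command -v conda >/dev/null 2>&1; then
    top="$(conda config --show channels | top_channel)"
    if [ "$top" = "conda-forge" ]; then
        echo "conda-forge has the highest priority"
    else
        echo "highest-priority channel is: $top"
    fi
else
    echo "conda not found; install Miniforge first"
fi
```

If another channel is reported first, running conda config --add channels conda-forge again should move conda-forge back to the top of the list.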
Creating a Virtual Environment
(Recommended for installing packages)
One of the most significant advantages of Conda is the ability to create isolated environments to manage different versions of Python and packages without conflict. This can be particularly useful if you need to work on multiple projects with different dependencies. Here's a quick guide to setting up environments.
1. Create and Activate a New Environment
- To create a virtual environment and specify packages (choose your desired packages):
$ conda create --name <env_name> python=3.9 scipy=1.5.0 numpy
This command creates an environment named <env_name> with Python version 3.9, and it installs SciPy version 1.5.0 along with the latest compatible version of NumPy.
- Activate the newly created environment:
$ conda activate <env_name>
- You can view all available environments using:
$ conda env list
2. Install and Test Python Packages
- You can install other packages into an environment once it is created. For example, to install NumPy and Pandas:
$ conda install numpy pandas
- Test the installed packages. To enter the Python interpreter, type python in your terminal after installing the packages:
$ python
- Within the Python interpreter:
import numpy as np
print(np.__version__)
3. Install Bioconda Packages (only if the Bioconda channel has been added)
- Install the desired Bioconda packages. For example, to install kallisto:
$ conda install kallisto
- Verify the installation:
$ kallisto --version
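When the steps above are scripted, it is useful to check whether an environment already exists before creating it. The following is a minimal sketch: the helper name env_in_list and the environment name demo-env are hypothetical, and the parsing assumes that conda env list prints one environment per line with the name in the first column and comment lines starting with "#".

```shell
#!/usr/bin/env bash
# Succeed if the named environment appears in `conda env list` output on stdin.
# Comment lines (starting with "#") are ignored.
env_in_list() {
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

ENV_NAME="demo-env"   # hypothetical name, used only for illustration

if command -v conda >/dev/null 2>&1; then
    if conda env list | env_in_list "$ENV_NAME"; then
        echo "$ENV_NAME already exists"
    else
        echo "$ENV_NAME not found; create it with: conda create --name $ENV_NAME python=3.9"
    fi
else
    echo "conda not found; install Miniforge first"
fi
```

The script only reports what it finds; the conda create command is printed rather than executed so that nothing is installed by accident.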
Deactivate and Remove the Environment
It's a good habit to deactivate and remove environments when they are no longer needed to avoid clutter and potential conflicts. Keeping environments minimal also improves performance and avoids slow initialization, as discussed under Optimizing Conda Initialization. Use conda deactivate to switch back to your base Conda environment.
1. Deactivate the environment
$ conda deactivate
- This returns you to your base Conda installation and unloads everything from your virtual environment.
2. Remove the environment
$ conda env remove --name <env_name>
Optimizing Conda Initialization
It is common among scientists to install all the required Python packages in the "base" Conda environment (the default environment in Conda), which leads to a prolonged initialization process after each new login session on the HPCC login or worker nodes. The better practice is to create as many Conda environments as needed and install as few Python packages as possible inside each one, which reduces the initialization time and avoids extra overhead when an environment is activated.
The other approach, which we highly recommend that all Conda users at HPCC apply to their Conda settings, is to disable the automatic activation of the "base" environment after each new login session. This method does not require removing the "conda initialize" lines from the ".bashrc" file and does not stop Conda from preparing your environment for any future interaction with Conda. Instead, it prevents the "base" environment from being activated automatically after each new SSH or interactive session.
To initialize the Conda environment upon login without automatically activating the "base" environment, you may follow the instructions below
1. After logging in to one of the HPCC login gateways, please deactivate the Conda base env
(base)$ conda deactivate
2. Then disable automatic activation of the base environment (the auto_activate_base setting) in the Conda configuration
$ conda config --set auto_activate_base False
3. You can also check the Conda configurations to make sure the new setup is in place
$ conda config --show | grep auto_activate_base
Log out and log back in to your account. The "conda" command will still be available in your session, but the (base) environment will not be activated. Activate "base" or any other Conda environment whenever you need it:
$ conda activate <env_name>
For example:
$ conda activate base
(or, better yet, a named environment suited to a particular set of commands rather than a huge one with everything in it).
Note that with this setting you must activate the Conda environment in each job submission by adding "conda activate <env>" to your job submission script.
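A job script might activate an environment along the following lines. This is only a sketch under several assumptions that do not come from this guide: the #SBATCH lines assume a Slurm scheduler, ~/miniforge3 is assumed to be the install prefix, the function name activate_env is hypothetical, and "base" stands in for your own environment name. Because a batch shell may not read .bashrc, the sketch loads Conda's shell functions explicitly from etc/profile.d/conda.sh before calling conda activate.

```shell
#!/bin/bash
#SBATCH --job-name=conda-demo   # Slurm syntax is an assumption; adapt to your scheduler
#SBATCH --ntasks=1

# Load Conda's shell functions and activate an environment.
# $1: Miniforge install prefix, $2: environment name.
activate_env() {
    local hook="$1/etc/profile.d/conda.sh"
    if [ ! -f "$hook" ]; then
        echo "conda hook not found: $hook" >&2
        return 1
    fi
    . "$hook"
    conda activate "$2"
}

# Assumptions: default prefix ~/miniforge3; "base" is a placeholder name.
if activate_env "$HOME/miniforge3" "base"; then
    python --version
fi
```

Replace "base" with the named environment your job needs, and replace python --version with the actual workload.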
Using Anaconda/Miniconda and Its Terms of Service
If you still prefer to use Anaconda/Miniconda, be aware of the following
- Commercial Licensing: Anaconda is a commercial product, which means you must adhere to its terms of service. Using Anaconda in a commercial setting may incur licensing fees.
- Default Channel Limitations: The default channels provided by Anaconda may not always offer the most up-to-date packages. This can lead to compatibility issues and potential project delays.
- Costs: If you intend to use Anaconda for commercial purposes, you will need to purchase a license. Ensure you understand the costs associated with enterprise usage.
Note: For more information on Anaconda's terms of service, see "Update on Anaconda's Terms of Service for Academia and Research" on the Anaconda website.
Any additional expenses incurred from using Anaconda will be the user's responsibility, and HPCC will not provide financial support. By opting for Miniforge, users can avoid these potential complications and benefit from a more open and flexible environment for their projects.
With the above considerations in mind, here is a link to the previous Miniconda-based installation guide formerly used by the HPCC.
High Performance Computing Center
- Phone: 806.742.4350
- Email: hpccsupport@ttu.edu