Python Virtual Envs

Mon 18 April 2022

Disclaimer: I fully admit I'm not a detail-oriented person, and as such, I'm satisfied when things work, not necessarily when they work the right way or when I understand why they work. Like most Python topics, there is more than one way to make or choose a Python virtual environment, and you should do what's right for you, and, ideally, leave me out of it.

Let's set the scene. You're a relatively new Python user, happily writing amazing scripts that automate those button clicks that seem absurd in hindsight. You've ventured beyond core Python, pip installing third-party modules with glee. You've even got multiple projects going, using a mix of shared and exclusive modules. It works great for a while, perhaps years! But the time comes when one project requires X module version A, and another project requires X module version B. Without some fancy footwork, you can only have one version of each module installed at a time.

You've fallen victim to one of Python's classic blunders: never install modules globally. Rather, use a virtual environment. No one tells you this until it's too late, sorry to say.
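If you're not sure which version of a module your current Python would pick up, the standard library can tell you. This is only a minimal sketch: the helper name `installed_version` is mine, and "requests" is just an example package name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "requests" is only an example; substitute any package you rely on.
    print("requests:", installed_version("requests"))
```

Whatever it prints is the one and only version that interpreter can see, which is exactly the limitation described above.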

You can think of a virtual environment as a self-contained bundle of Python plus a segregated set of modules. Unfortunately, or luckily, depending on your perspective, there are several options for creating virtual environments. This blog post is intended to serve as a starting point for avoiding choice overload when it comes to creating a virtual environment. It is simply the way I do it, and you can do whatever you want in order to segregate your Python environments. You can find some basic advice at the end of each section, but just know that there are many ways to solve this problem, with a variety of tools.

1. Use the recommended tool

Especially when you're starting out, trust repo owners to help you choose the best environment for a project. There may be a good reason they recommend a certain method. If they recommend poetry, try to use poetry. If there is no recommendation, go ahead and choose something else.

2. Use venv

venv is what I assume most people mean when they say "virtual environment". The usage is like:

$ python3 -m venv .venv     # create a new env in a directory called ".venv"
$ source .venv/bin/activate # activate the env in the ".venv" directory

The first command creates a new environment, including, among other things, a Python binary, a known directory in which new modules will be installed (e.g. .venv/lib/python3.8/site-packages/), and a method to install new packages (pip). Once you activate the environment with the second command, all packages installed with that pip will be placed in that site-packages directory.
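If you're curious what actually lands on disk, the same thing can be done from Python with the standard library's venv module. This is a rough sketch, not a replacement for the command above: the helper name `create_bare_env` is made up, and I pass `with_pip=False` to keep it quick, whereas `python3 -m venv` installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_bare_env(env_dir):
    """Create a virtual environment at env_dir and return its top-level entries."""
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast.
    venv.create(env_dir, with_pip=False)
    return sorted(p.name for p in Path(env_dir).iterdir())


if __name__ == "__main__":
    # Build the environment in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / ".venv"
        print(create_bare_env(env_dir))
        # pyvenv.cfg records which Python the environment was created from.
        print((env_dir / "pyvenv.cfg").read_text())
```

The exact directory names differ between operating systems, but there should always be a pyvenv.cfg file at the top level.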

venv is an excellent tool to spend time learning. It is my most used environment type, and good for situations in which:

you want to create the environment quickly
you're not particular about exactly which version of Python will be used (although it is possible to control)
you don't need environment-specific environment variables (although this is achievable)
you're happy with manually activating the environment each time
you don't intend to reuse the environment elsewhere
all of the desired libraries are installable with pip
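Since activation is manual, it's easy to forget it and install into the wrong place. Here is a small sketch of a check you could run from Python itself; the function names are my own, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside a venv.

```python
import sys
import sysconfig


def in_virtual_env():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def site_packages_dir():
    """Return the directory where pure-Python packages are installed."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print("In a virtual environment:", in_virtual_env())
    print("Packages are installed in:", site_packages_dir())
```

Note that this check is specific to venv-style environments; it is not meant to detect other kinds of environments.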

3. Use direnv

If you like venv, you'll love direnv. Once you've installed direnv, the usage is like:

$ echo 'layout_python3' > .envrc # create a file ".envrc" containing the text "layout_python3"
$ direnv allow                   # approve the ".envrc" so direnv will load it

The first command creates a .envrc file with a quick instruction telling direnv that this directory is for Python 3. The second command tells direnv that you trust this .envrc, so it can get to work building a new environment. In essence, a new venv is created in the directory, but also, because direnv recognizes this directory, the environment will automatically be activated on entry and deactivated on exit. Handy! In addition, you can add export and other commands to the .envrc, allowing you to load and unload environment variables for this directory only.
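On the Python side, a variable exported from .envrc looks like any other environment variable. A minimal sketch, assuming a hypothetical variable called MY_PROJECT_DATA_DIR has been exported in the .envrc (both that name and the `get_setting` helper are invented for illustration):

```python
import os


def get_setting(name, default=None):
    """Read a setting from the process environment, falling back to a default."""
    return os.environ.get(name, default)


if __name__ == "__main__":
    # MY_PROJECT_DATA_DIR is a hypothetical name; it would only be set
    # while you are inside the directory whose .envrc exports it.
    print(get_setting("MY_PROJECT_DATA_DIR", "not set"))
```

Run it inside the project directory and then again outside it to see the value appear and disappear.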

direnv is great in situations where:

you want to automatically activate/deactivate the environment on entry/exit
you want a simple way to use directory-specific environment variables
you want all the other benefits (and drawbacks) of venv

4. Use conda

conda is another very popular environment manager (like venv) and also functions as a package manager (like pip). Once installed, conda usage is like:

$ conda create -n env_name # create a new conda environment called "env_name"
$ conda activate env_name  # activate environment

Similar to venv, the first command creates the environment and the second activates it. Note that, as written, the first command creates an empty environment; add a package spec such as python=3.8 to have conda install Python into it. Unlike venv, conda environments are created and accessible globally, meaning you can easily activate conda environments wherever they are needed. New packages are then installed through conda install rather than pip, although you can also install packages using pip. The nice thing about conda environments is that all conda-installed packages are reconciled against each other to ensure compatibility. However, the annoying thing is that this reconciliation can take...

...f o r e v e r.

I was going to use this in my slides but lately one cannot even `conda install gdal` from conda-forge :-( pic.twitter.com/60XW5dIytN
— Filipe Fernandes (I am 😷 under that 🪣) (@ocefpaf) July 15, 2016

(Ignore the text in the tweet above, I'm pretty sure it works in conda now)

A major reason for geospatial developers to consider conda environments is that GDAL Python bindings are notoriously difficult to install correctly through pip. If you run into issues creating a functional environment containing GDAL Python bindings, consider installing in a conda environment with conda install -c conda-forge gdal.
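To find out whether the bindings made it into the environment you're in, a quick check like the one below can help. It's a sketch under the assumption that the bindings are importable as `osgeo` (the package name GDAL's Python bindings use); the helper name `has_module` is mine.

```python
import importlib.util


def has_module(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if has_module("osgeo"):
        try:
            from osgeo import gdal
            print("GDAL version:", gdal.VersionInfo("RELEASE_NAME"))
        except ImportError as exc:
            # The package is present but its compiled parts failed to load.
            print("GDAL bindings found but not importable:", exc)
    else:
        print("GDAL bindings not found in this environment")
```

The try/except is there because a half-working install can be found on disk and still fail at import time.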

conda definitely has its place, and that is where:

you don't mind how long it takes to create the environment
you want to ensure compatibility between all installed modules
you want to share a single environment between multiple projects
you want easy control over the Python version (e.g. add python=3.8 to environment creation command)
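Because conda environments live globally rather than in your project directory, it can be handy to confirm which one is active and which Python it provides. A small sketch, relying on the CONDA_DEFAULT_ENV variable that `conda activate` sets; the function name `active_conda_env` is my own.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if there is none."""
    # `conda activate` sets CONDA_DEFAULT_ENV to the environment's name.
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    print("Python version: {}.{}".format(*sys.version_info[:2]))
```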

5. Use mamba

mamba is a newer spin-off of conda. I haven't used it much, but the idea of mamba is that it behaves like conda, only faster. You do need to have both conda and mamba installed in order to use mamba. To create a mamba environment:

$ mamba create -n env_name # create a new mamba environment called "env_name"
$ conda activate env_name  # activate environment with conda

As mentioned, I don't have much to say about mamba, but if your conda environment is taking too long to build, keep mamba in mind.

Use something else entirely

Other popular environment management tools that I have no real opinion about include:

That's all I've got. Enjoy your virtual environments! Get in touch with me on Twitter if you want to continue the discussion!

Darren Wiens