My setup for virtual environments in Python

python
Author

Filip Wästberg

Published

February 7, 2023

At least once a year I get a chance to dabble with Python. This time I’m involved in a project where we have developed outlier detection algorithms for district heating networks. To run the Python code and the application built on top of it, I have to use a specific version of Python and some specific package versions.

So I started up my Python environment, eager to get started. But instead of investigating the code and the results from the algorithm, I got stuck on installing the correct package versions.

I have been down this road before. Usually it means I have to spend a day googling and following different tutorials to get everything to work. But I have never really understood why. So I thought I would do this a bit more thoroughly and write everything down this time.

Why write this post

I primarily use R to analyze data. After working out package management in Python I have realized that R users are very spoiled by the package management system that is built into R. When I install a package via install.packages("packagename") it just works. This is a very nice feature since most of my work is experimental. I mainly use R to investigate something: manipulate, visualize and model data. This means that the less time I need to spend on setup, the more productive I can be.

Virtual environments are not new to me

Of course, if I want my code to be used in production, for example when scheduling a script, I want to make sure that the script doesn’t fail if I change my setup, like updating a package. So I’m familiar with the concept of virtual environments. I have used R packages for this, like renv and packrat, and also Docker. But I only use these when I need them.

Virtual environments in Python

In order to use Python productively, most developers will encourage you to use virtual environments. If you are a data scientist who uses a bundled Python distribution like Anaconda, you might be using a virtual environment without thinking too much about it.

People use virtual environments to isolate projects from each other. In other words, the packages (and package versions) you use in one project can be different from those in another project, and when you open up a separate project it shouldn’t depend on the virtual environment of another project. I think of virtual environments as folders where you save all the packages that you use for a project in Python.
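To make the "folder" idea a bit more concrete, here is a minimal sketch that asks the running interpreter which folder it belongs to. It assumes an environment created with venv; conda environments are laid out differently, so this particular check does not detect them. The function name in_virtual_environment is made up for the example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("environment folder:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```

Running this with and without an activated environment should show the environment folder changing.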

venv

The most basic way to create a virtual environment in Python is venv.

Using it is pretty straightforward:

python3 -m venv mynewenv

This creates a folder in your directory where the packages installed for your virtual environment will be saved.
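The same thing can be done from Python itself with the standard library venv module. This is only a sketch to show what ends up in the folder: it builds the environment in a temporary directory and skips pip to stay fast, which differs from the command line, where pip is installed by default. The helper name create_env is an assumption for the example.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return its path."""
    # with_pip=False keeps the sketch fast; `python3 -m venv` installs pip by default.
    # Symlinks are used everywhere except Windows, mirroring the command line.
    venv.create(env_dir, with_pip=False, symlinks=os.name != "nt")
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(Path(tmp) / "mynewenv")
    # List what venv put in the new folder.
    created = sorted(p.name for p in env_dir.iterdir())
    print(created)
```

The listing differs a little between operating systems, but it always includes a pyvenv.cfg file that describes the environment.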

What version are we using?

When you open the terminal and type python --version you can get different answers. I found this article really helpful in understanding why this happens. This affects our virtual environment, as venv will inherit your Python version. As it says in the Python documentation:

A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base”

In other words: venv does not let you pick a Python version. The environment gets the version of whichever interpreter you ran venv with. To install and switch between Python versions you will have to use something like pyenv.
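One way to see this inheritance is to read the pyvenv.cfg file that venv writes into the environment folder. The sketch below assumes the standard library venv module and creates a throwaway environment in a temporary directory. The helper read_pyvenv_cfg is made up for the example.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir: Path) -> dict:
    """Parse the simple 'key = value' lines of an environment's pyvenv.cfg."""
    settings = {}
    for line in (env_dir / "pyvenv.cfg").read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "mynewenv"
    venv.create(env_dir, with_pip=False, symlinks=os.name != "nt")
    cfg = read_pyvenv_cfg(env_dir)

# "home" is the folder of the base installation, "version" its Python version.
print("base installation:", cfg.get("home"))
print("environment version:", cfg.get("version"))
print("running interpreter:", "%d.%d.%d" % sys.version_info[:3])
```

The last two lines should print the same version, because the environment is built on top of the interpreter that created it.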

Anyway, to use the environment we created we need to activate it:

source mynewenv/bin/activate

and then we can install packages into it:

pip install pandas numpy

This works. But I usually want to use a virtual environment in many different projects and not have to install everything again when starting a new project. Also, I want to have control over the version of Python that I’m using.

Anaconda and conda

A popular distribution and platform for working with Python is Anaconda. Anaconda is not only for Python. On its website it says:

Package, dependency and environment management for any language—Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN

In other words, when you install Anaconda, you install a lot.

conda

conda is the package manager built into Anaconda. For Python you use it in much the same way as venv, but instead of creating a virtual environment for each project you can use conda to create environments that you reuse across many projects.

It should be noted that I installed Anaconda and conda a long time ago, so I had to update them before things worked properly. First I ran conda update conda, then conda install anaconda (not sure why), and lastly I had to run conda update --all to get it to work the way I wanted. This took a while.

But as soon as it worked I was able to create an environment where I can also specify the Python version.

conda create -n pythonds python=3.8

To install packages into the conda environment you run:

conda install -n pythonds numpy=1.19.2 pandas=1.2.3
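Since the whole point is to end up with specific package versions, it can be handy to check from inside the environment what actually got installed. Here is a rough sketch using importlib.metadata from the standard library. The pins are copied from the conda command above. The helper names and the prefix-style comparison are assumptions for the example, not conda's real matching rules.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def satisfies_pin(installed: Optional[str], pin: str) -> bool:
    """Check an installed version against a pin such as '1.19.2'."""
    if installed is None:
        return False
    # Treat the pin as a prefix so that a pin of '1.19' also accepts '1.19.2'.
    return installed == pin or installed.startswith(pin + ".")


# Pins copied from the conda install command.
pins = {"numpy": "1.19.2", "pandas": "1.2.3"}
for package, pin in pins.items():
    found = installed_version(package)
    print(package, "pinned:", pin, "installed:", found, "ok:", satisfies_pin(found, pin))
```

Run it with the environment's own interpreter, otherwise it reports on whatever Python happens to be first on your path.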

I wanted to use the environment in a Quarto document. To do this I also had to register it as a Jupyter kernel with ipykernel.

python -m ipykernel install --user --name=pythonds
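As far as I understand it, registering a kernel mostly means writing a small kernel.json file that tells Jupyter which interpreter to start. The sketch below builds an approximation of that file as a dictionary. The real file written by ipykernel may contain more fields, and the function name kernel_spec is made up here. The thing to notice is that the interpreter path comes from whichever python ran the command, which is why it matters that the command is run with the environment activated.

```python
import json
import sys


def kernel_spec(display_name: str) -> dict:
    """Build a dictionary shaped like the kernel.json that ipykernel writes."""
    return {
        # The interpreter that runs this code is the one recorded for the kernel.
        "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


print(json.dumps(kernel_spec("pythonds"), indent=2))
```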

The setup that now works for me

At some point we just want things to work. And right now this is what works for me when working with Python.

  1. I create and manage virtual environments with conda
  2. If I want to share my virtual environment I create a conda.yml file to specify dependencies, which is the equivalent of a requirements.txt file
  3. Lastly, because I use Quarto, I have to register the environment as a Jupyter kernel
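To give an idea of what the file in step 2 contains, here is a small sketch that builds the text of a conda environment file from the environment name and pins used earlier. In practice conda env export writes such a file for you, and conda env create -f reads it back in. The function name environment_file is an assumption for the example.

```python
def environment_file(name: str, python_version: str, packages: dict) -> str:
    """Return the text of a minimal conda environment file."""
    lines = [f"name: {name}", "dependencies:", f"  - python={python_version}"]
    # One pinned entry per package, in the same form as on the command line.
    lines += [f"  - {package}={version}" for package, version in packages.items()]
    return "\n".join(lines) + "\n"


text = environment_file("pythonds", "3.8", {"numpy": "1.19.2", "pandas": "1.2.3"})
print(text)
```

The file is short enough to keep next to the project code, so whoever opens the project can recreate the same environment.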

This setup works for me right now. Crossing my fingers that it will work tomorrow.

Bonus tricks

miniconda

As I mentioned, conda is not restricted to Python. A smaller version of Anaconda, primarily made for Python, is Miniconda:

Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib and a few others. Use the conda install command to install 720+ additional conda packages from the Anaconda repository.

The good thing about Miniconda is that you can use conda in the same way. So to create a virtual environment you do exactly the same: conda create -n minienv python=3.8. If I were to start again I would probably restrict myself to Miniconda, but I haven’t really figured out how to run these things separately.

pyenv

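With conda you can specify the Python version, but venv only reuses an interpreter that is already installed. If you want to switch Python versions most people will suggest pyenv. pyenv installs and selects Python versions, and you can then create an environment with venv on top of the version it selects.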