Python Dependency Management with Fewer Headaches
One of the most painful parts of developing neural networks is dependency management in Python. It seems like Python has reinvented multiple wheels that other languages like Java have been merrily rolling along on for years. Ironically, Python packages are actually called wheels. Oh well.
In this short post we want to show you our solution for Deep Learning (DL) projects, where the problem is particularly nasty because you also need to juggle multiple CUDA versions. Note that there are multiple ways to deal with this - this just happens to be the one we like most. Maybe you will too?
What is so hard about this?
In a professional setting you will work on multiple projects in parallel, and you will soon feel the pain of managing their dependencies. AI-assisted coding only amplifies this problem: you now build MVPs much faster and try out ideas that were previously too costly to evaluate. This is of course a good thing, but it forces you to handle even more environments in parallel.
However, your projects will need to handle:
- Different Python versions
- Different versions of the same libraries
- Multiple CUDA versions in parallel
So a single Python install with one set of pip-installed libraries is out. One global CUDA install will not work either. Virtual environments are a widely used solution, but using plain pip inside them is time-consuming, error-prone, and requires a lot of discipline to pin down all library versions. And then you would still have to manually install CUDA in each environment, which is hard to automate. In the end people use all kinds of tool combinations to somehow manage this, every workflow with its own pros & cons.
Enter Conda
Of course one of the most popular ways to solve this is Anaconda. But you will quickly need the full paid version, and for a smaller company this quickly adds up in additional licensing cost. Another (not exactly cheap) subscription just to get working dependency management and stable builds - something that should be a basic feature of any language.
Apart from the additional cost there is another problem: everybody who is supposed to reproduce your builds or work on the project needs a license as well. That makes it a bad choice for open source projects, where you want a large audience to be able to chip in.
Note however that Anaconda is definitely a good solution, so if you or your employer can get you a business license, you are mostly set and can stop reading here.
Our Solution
But there is also a way to achieve Python dependency management bliss with multiple CUDA versions using only open-source components. Here is how.
The quick outline:
- Figure out which PyTorch, Python and CUDA versions you need
- Use miniconda to manage virtual environments and install Python
- Use pip-tools inside the respective virtual environment to conveniently manage library versions
- And the most important trick: Use the PyTorch CUDA dependencies to automatically install the precise CUDA version you need
With these steps you can maintain as many combinations of Python/PyTorch/CUDA as you want on your machine and each project is 100% reproducible by anyone with access to the code.
Step 0: Figuring out what you want
This is how we determine the versions we want to use:
First, determine your target PyTorch version. If it is a greenfield project, just use the latest stable one (check https://pytorch.org/ and scroll down). In some cases you might need to run / train / modify a network from another project or repo that needs an older / deprecated API; in those cases make sure you know the required PyTorch version by reading the (hopefully existing) documentation of that project. Let’s assume for now you decide on PyTorch 2.8.0.
Next you need to determine the matching Python version. You can find the compatibility matrix here: https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix. We have burnt our fingers many times with newer Python versions, as other libraries you might need later often do not support them yet. Hence we recommend sticking to a version a bit older than the most current one. For PyTorch 2.8.0 we would go for Python 3.11.
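For illustration, at the time of writing the matrix row for this choice reads roughly as follows (quoted from memory - always check the linked matrix for the authoritative, current values):
PyTorch 2.8.0 -> Python >=3.9, <=3.13 -> CUDA 12.6 / 12.8 / 12.9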
Finally you need to know which CUDA version you want. For that, you need to know the maximum CUDA version you can run. To find this out, just run nvidia-smi in the console, which will give you output like this:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.169 Driver Version: 570.169 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... Off | 00000000:01:00.0 On | N/A |
| N/A 48C P4 9W / 115W | 1465MiB / 8188MiB | 33% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1892 G /usr/lib/xorg/Xorg 946MiB |
| 0 N/A N/A 2785 G cinnamon 28MiB |
| 0 N/A N/A 4266 G /usr/lib/thunderbird/thunderbird 7MiB |
| 0 N/A N/A 4477 G ...led --variations-seed-version 300MiB |
| 0 N/A N/A 4870 G /usr/lib/firefox/firefox 20MiB |
| 0 N/A N/A 10430 G ...bkit2gtk-4.0/WebKitWebProcess 44MiB |
| 0 N/A N/A 11682 G ...ess --variations-seed-version 9MiB |
+-----------------------------------------------------------------------------------------+
Now at the top right you see CUDA Version: 12.8. You might think this means the system has CUDA 12.8 installed, but that is wrong - a major source of confusion stemming from a tiny misnomer in the output of nvidia-smi: it should really say max CUDA Version, because this is the maximum CUDA version the driver supports. (The output above is actually from a system without a CUDA installation.) Now you just pick the highest stable CUDA version from the compatibility matrix that your driver supports, in our case 12.8, and you are done.
As the drivers' CUDA support is backward compatible, it makes sense to keep the driver as current as possible so you can freely pick which CUDA version to use.
Now we are equipped with all the version numbers we need to create a working Deep Learning project - but how do we manage all those versions?
Step 1: Miniconda
We start with miniconda, the smaller open-source version of the full Anaconda (the downloads are on the right side of this page: https://www.anaconda.com/download/success). It doesn’t come with all the useful repos you need for CUDA and lots of other libraries, but it allows you to quickly set up conveniently switchable virtual environments and install the exact Python version you need. To make this easy to reproduce, it is best to create a YAML config file. Let’s call it cool_project_name.yml. It will always look like this (except for different Python versions):
name: cool_project_name
channels:
  - defaults
dependencies:
  - python=3.11  # or any other python version you like
  - pip
  - pip:
      - pip-tools
Note how little we put into the miniconda environment: just the Python version and pip-tools, which we will use in the next step to actually install the dependencies we need. This file will likely never change again over the lifetime of the project.
To create the environment, do:
conda env create -f cool_project_name.yml
conda activate cool_project_name
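Switching between projects is now a single command. For example, with a second environment created the same way (other_project is a hypothetical name here):
conda activate other_project
# and back again:
conda activate cool_project_name
# list everything you have set up:
conda env list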
Step 2: Pip Tools
Now to install dependencies into this environment you could simply use pip, but pip-tools makes this much cleaner and easier to reproduce: it figures out library versions for you and handles changes to the dependencies much more gracefully than plain pip.
To add packages you don’t just install them; instead you add them to a file named requirements.in (note the suffix: it is .in, not .txt). E.g.:
# PyTorch with CUDA support
--extra-index-url https://download.pytorch.org/whl/cu128
torch==2.8.0+cu128
mlflow
transformers
# ...
# add any other libs you might need here
# ...
Ignore the weird --extra-index-url line for now - it is the secret sauce for our last step. The trick here is that you pin the versions only of the packages where the project really needs a specific version, and leave the others unspecified. Normally this is a recipe for disaster when multiple people work on a project: each person installs packages at different times, in a slightly different order, without specifying versions, which sooner or later introduces nasty errors due to version inconsistencies. I have often seen projects that run perfectly fine on one developer’s machine and crash on another’s. However, this is where pip-tools will help you.
You now run pip-compile requirements.in - pip-tools will figure out good fits for all libraries without pinned versions and create a requirements.txt with every version exactly specified for you. You could do this by hand, but I have yet to meet a Python developer disciplined enough to actually do it.
Using this requirements.txt you can now run pip-sync requirements.txt to update your environment. This is really useful when something changes, as it only installs missing packages or packages whose version number changed.
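To give you an idea, the generated requirements.txt looks roughly like this (abridged, with illustrative pins - the exact versions depend on when you run pip-compile):
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile requirements.in
#
--extra-index-url https://download.pytorch.org/whl/cu128

mlflow==3.1.0
    # via -r requirements.in
torch==2.8.0+cu128
    # via -r requirements.in
transformers==4.55.0
    # via -r requirements.in
# ... plus every transitive dependency (numpy, nvidia-cudnn-cu12, ...), all pinned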
To allow others to reproduce the build, you add cool_project_name.yml, requirements.in and requirements.txt to your repository.
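Anyone checking out the repository (or your CI) can then recreate the exact same environment using only the commands we already saw above:
conda env create -f cool_project_name.yml
conda activate cool_project_name
pip-sync requirements.txt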
To add new packages later:
- add them to requirements.in
- call pip-compile requirements.in
- call pip-sync requirements.txt
If someone else added new packages or changed versions, all you need to do is run pip-sync requirements.txt.
Step 3: CUDA dependency
Apart from shape mismatches, nothing causes as many headaches in Deep Learning as finding and managing the correct CUDA / cuDNN packages. This is normally where the Anaconda Business version shines, but luckily you can solve it with pip alone. The trick is to add the official PyTorch repo, which is done with the following line:
--extra-index-url https://download.pytorch.org/whl/cu128
Note the suffix /cu128 - this is the repo for CUDA 12.8. Similarly, you would use /cu126 for CUDA 12.6, /cu118 for CUDA 11.8, etc. Now you just need to pin the torch and CUDA versions via the torch version specifier:
# 2.8.0 -> the torch version, +cu128 -> the cuda version of the CUDA dependencies
torch==2.8.0+cu128
This triggers pip-tools to fetch the correct CUDA and cuDNN packages for PyTorch. If you want to check which versions exist, you can view all torch/CUDA version combinations here (though normally the compatibility matrix from the PyTorch git repo is enough): https://download.pytorch.org/whl/torch/
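For example, if your driver only supported CUDA 12.6, the two relevant lines in requirements.in would become (assuming a +cu126 build of your torch version exists - the wheel index above tells you):
--extra-index-url https://download.pytorch.org/whl/cu126
torch==2.8.0+cu126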
Important: This means you should never install CUDA manually. In fact, it is better if you don’t have a CUDA install on your OS, as weird as this sounds. Otherwise the global install might interfere with the install in your virtual environment. All you need to install on your OS is an up-to-date GPU driver.
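Once everything is synced you can run a quick sanity check from inside the activated environment (a minimal check, nothing project-specific):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
This should print your pinned torch version, 12.8 and True - even though, as in our nvidia-smi example above, no system-wide CUDA is installed.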
Summary
By just using miniconda, pip-tools and two config files you can now easily manage multiple PyTorch/Python/CUDA combinations and conveniently switch between them. All members of a project team can recreate a working config with just a few commands. Since we started following this setup pattern for every project, we haven’t had any reproducibility headaches, and people can be onboarded to new projects quickly. We hope you find this information useful and that it helps you as much as it did us.
P.S.: But what if I don’t use PyTorch?
Well, if push comes to shove you can still install the PyTorch CUDA packages without PyTorch itself - they work just fine with other frameworks, as long as those do not bring their own CUDA pip packages. Which packages these are we leave as an exercise for the reader - but admittedly, this method is best suited for PyTorch development.