Private Conda package servers: Choosing the right solution

BioStrand (a subsidiary of IPA)
Apr 9, 2024


If you are new to the Python ecosystem, it can be daunting to work out how to set up your Python environment. There are dozens of environment and packaging tools for managing Python packages. You might have seen the figure below, from the article “An unbiased evaluation of environment management and packaging tools”. You may be surprised to learn that even this figure does not cover all of the available tools. Needless to say, how you choose to set up your environment will affect your development experience.

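[Figure: Python environment management and packaging tools. Image source: “An unbiased evaluation of environment management and packaging tools”]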

Environment and package management

At BioStrand, our development environments consist of:

  • Private Python packages for our R&D work
  • Optional compiled binaries mixed in to speed up performance bottlenecks in Python code
  • Many packages that are only available in the Conda ecosystem and have not yet been published on PyPI

Because of these requirements, we use Conda to build and distribute our internal packages. If your organization has similar needs, you may also conclude that hosting your own private Conda packages is the right choice. To decide how to do that, we compared some of the options currently available. In this post, we share our experience exploring various private Conda package server solutions, including their pros and cons.

Exploring private Conda package server options:

Here is a list of requirements that we had in mind:

  • The most important requirement was a working Conda server with authentication for the private packages that we build.
  • We wanted to be able to pull packages from local development machines, which means the server must be publicly accessible.
  • Our next-highest priority was being able to host the server in our own AWS environment. Given our small team size (currently about 15 users), we did not want to spend a large part of our budget on an enterprise solution.
  • On top of that, we preferred software that is regularly maintained.

We explored a number of third-party and open-source options for hosting Conda packages. Here are the solutions we considered:

1. S3 Option:

  • AWS and Azure suggest using blob storage for hosting Conda packages. This involves building a package and storing the compressed package files on S3. It is possible to add an S3 bucket as a Conda channel.
  • You still have to ensure that there is an index file in the S3 bucket, but little information about this is available online. We discussed having a Lambda function that goes through the S3 bucket and rebuilds the index whenever necessary. Adding authentication to this process would require extra work.
  • Because of the lack of readily available solutions and the additional work needed for authentication, we decided against this option.
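
The index a channel needs is a repodata.json file in each platform subdirectory (for example linux-64 or noarch). As a rough sketch of what a rebuild step, such as the Lambda function discussed above, would have to produce, the Python below builds that file for a local directory of .tar.bz2 packages using only the standard library. The function names are hypothetical, the download from and upload to S3 are left out, and newer .conda packages are not handled. In practice, the conda-index tool is the supported way to generate a complete index.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def package_record(pkg_path: Path) -> dict:
    """Build one repodata entry from a legacy .tar.bz2 conda package.

    The entry is the package's own info/index.json metadata plus the
    checksums and size of the archive file itself.
    """
    data = pkg_path.read_bytes()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as tar:
        record = json.load(tar.extractfile("info/index.json"))
    record["md5"] = hashlib.md5(data).hexdigest()
    record["sha256"] = hashlib.sha256(data).hexdigest()
    record["size"] = len(data)
    return record


def build_repodata(subdir_path: Path) -> dict:
    """Collect records for every .tar.bz2 file in one platform subdirectory."""
    packages = {
        pkg.name: package_record(pkg)
        for pkg in sorted(subdir_path.glob("*.tar.bz2"))
    }
    return {
        "info": {"subdir": subdir_path.name},
        "packages": packages,
        "packages.conda": {},  # .conda archives are not handled in this sketch
        "repodata_version": 1,
    }


def write_repodata(subdir_path: Path) -> Path:
    """Write repodata.json next to the packages and return its path."""
    out = subdir_path / "repodata.json"
    out.write_text(json.dumps(build_repodata(subdir_path), indent=2, sort_keys=True))
    return out
```

A scheduled or event-triggered job could sync a subdirectory from the bucket, call `write_repodata(Path("channel/linux-64"))`, and upload the resulting file back. That sync step, and any authentication in front of the bucket, is the extra work mentioned above.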

2. Third-Party Vendors:

  • Anaconda Cloud offers private channels with a paid license. It provides out-of-the-box tokens for authenticating members when they access a private Conda channel.
  • Artifactory and Sonatype provide Conda package hosting with paid subscriptions. They offer similar functionality but charge a monthly fee per user, which exceeded our budget.
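
To illustrate how token-based access to a private channel typically looks, here is a minimal sketch that builds a channel URL with the token embedded in the path, following the `/t/<token>/<channel>` form used by anaconda.org. The function name, the example channel name, and the `CONDA_CHANNEL_TOKEN` environment variable are assumptions made for this example, not part of any vendor's tooling.

```python
import os
from urllib.parse import quote

ANACONDA_BASE_URL = "https://conda.anaconda.org"


def private_channel_url(channel: str, token: str, base_url: str = ANACONDA_BASE_URL) -> str:
    """Return a channel URL of the form <base>/t/<token>/<channel>."""
    if not token:
        raise ValueError("a non-empty access token is required")
    # Percent-encode both parts so unusual characters cannot break the path.
    return f"{base_url}/t/{quote(token, safe='')}/{quote(channel, safe='')}"


if __name__ == "__main__":
    # Hypothetical variable name: read the token from the environment so it
    # never ends up in source control.
    token = os.environ.get("CONDA_CHANNEL_TOKEN", "")
    if token:
        # Note: the printed URL contains the token, so treat it as a secret.
        print(private_channel_url("my-team", token))
```

The resulting URL can be passed to conda as a channel. Because the token is part of the URL, it should be kept out of logs and shared configuration files.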

3. Open Source Solutions:

  • A Go server by Daniel Bok. It is not currently maintained, so we quickly decided against this option.
  • A blog post from 2019 describes a solution using nginx. While this makes sense because Conda package files are static, it gave no information about regenerating the Conda index.
  • Quetz emerged as a promising open-source solution. It is maintained by the team behind mamba and boa, and it offers channel segregation and customizable user permissions.
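
With a purely static setup such as nginx, nothing tells you when the index has fallen behind the files on disk. As a small sketch of how that could be detected, the Python below compares the package files in each platform subdirectory with the entries listed in that subdirectory's repodata.json and reports any that are missing. The function name and directory layout are assumptions for illustration, and the check only detects a stale index; it does not rebuild it.

```python
import json
from pathlib import Path

PACKAGE_SUFFIXES = (".tar.bz2", ".conda")


def unindexed_packages(channel_root: Path) -> dict[str, list[str]]:
    """Map each subdirectory name to package files absent from its repodata.json."""
    stale = {}
    for subdir in sorted(p for p in channel_root.iterdir() if p.is_dir()):
        on_disk = {
            f.name for f in subdir.iterdir() if f.name.endswith(PACKAGE_SUFFIXES)
        }
        indexed = set()
        index_file = subdir / "repodata.json"
        if index_file.exists():
            repodata = json.loads(index_file.read_text())
            # Legacy .tar.bz2 and newer .conda packages are listed separately.
            indexed = set(repodata.get("packages", {})) | set(
                repodata.get("packages.conda", {})
            )
        missing = sorted(on_disk - indexed)
        if missing:
            stale[subdir.name] = missing
    return stale
```

A non-empty result means the index should be regenerated before clients can see the new packages. A server such as Quetz takes care of this bookkeeping itself.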

4. Other Options:

In the table below, we present a MoSCoW analysis of our requirements for a conda server for our Python development team.

Table 1. MoSCoW analysis of conda package servers

Most of the solutions we explored were paid third-party options, did not allow us to host the server ourselves, or both. Quetz was the only ready-made solution that met our requirements. While its documentation could be better, overall we are quite happy with our decision.

Choosing Quetz as our private package repository:

  • Quetz emerged as the preferred option due to its robust features, active maintenance, and positive reports from other companies using it.
  • Our team has successfully adopted Quetz for hosting conda packages, ensuring streamlined package management and distribution.

This is the first part of a series.

In the next part, we will show how we set up our repositories and what to take into account when building Conda packages.

Michael Best | Data Engineer, BioStrand (a subsidiary of IPA)

Originally published at https://blog.biostrand.ai on April 9, 2024.
