
Python monorepos

Kenny DuMez
Graphite software engineer

Creating and managing a Python monorepo involves consolidating various Python projects into a single repository. This approach unifies version control, dependency management, and testing frameworks across multiple projects. This guide explores how to set up and manage a Python monorepo effectively.

Directory structure: Start by defining a clear directory structure that will house all your projects. A common approach is to give each project its own top-level directory and to keep shared resources in separate top-level directories alongside them.

Here's an example directory structure for a monorepo with two discrete projects plus shared code, scripts, and docs:

Terminal
/my-monorepo
    /project1
        /src
        /tests
    /project2
        /src
        /tests
    /shared
        /src
    /scripts
    /docs
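
The layout above can also be created by a small script rather than by hand. The following is a minimal sketch, assuming the directory names from the example; the `LAYOUT` constant and the `scaffold` function are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Directories from the example layout: each project gets src/ and tests/,
# shared code gets src/, and scripts/ and docs/ sit at the top level.
LAYOUT = [
    "project1/src",
    "project1/tests",
    "project2/src",
    "project2/tests",
    "shared/src",
    "scripts",
    "docs",
]


def scaffold(root: Path) -> list[Path]:
    """Create the monorepo directories under root and return their paths."""
    created = []
    for relative in LAYOUT:
        directory = root / relative
        # parents=True creates root and any intermediate directories;
        # exist_ok=True makes the function safe to run more than once.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling `scaffold(Path("my-monorepo"))` from the directory that should contain the repository creates the tree; running it again leaves existing directories untouched.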

Version control: Initialize a Git repository in your monorepo directory. It's essential to maintain a clean commit history and branch structure to manage changes across projects efficiently.

Terminal
git init
git add .
git commit -m "Initial monorepo setup"

Managing dependencies in a monorepo can be challenging, especially when projects have conflicting requirements. Here are two effective approaches:

Virtual environments: Use virtual environments to isolate project-specific dependencies.

Virtual environments in Python are isolated spaces that allow you to manage project-specific dependencies separately from the global Python environment. This isolation prevents conflicts between project requirements and allows each project to maintain its own set of dependencies and its own Python version.

The venv module, which is part of Python's standard library, provides a straightforward way to create these isolated environments.

Terminal
# Creating a virtual environment for project1
python -m venv my-monorepo/project1/venv
source my-monorepo/project1/venv/bin/activate
  • python -m venv my-monorepo/project1/venv creates a new virtual environment in the directory my-monorepo/project1/venv.

  • source my-monorepo/project1/venv/bin/activate activates the virtual environment. The shell then uses the Python and pip located inside that environment, so any packages installed afterwards stay local to it.
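
With several projects in one repository, creating each environment by hand gets repetitive. The sketch below uses the standard library's `venv.create` function to build one environment per project; the `create_project_envs` helper and its parameters are hypothetical names, and the project list is assumed to match the example layout.

```python
import venv
from pathlib import Path


def create_project_envs(
    root: Path, projects: list[str], with_pip: bool = True
) -> list[Path]:
    """Create <root>/<project>/venv for each project and return the env paths."""
    env_dirs = []
    for name in projects:
        project_dir = root / name
        project_dir.mkdir(parents=True, exist_ok=True)
        env_dir = project_dir / "venv"
        # Programmatic counterpart of `python -m venv <env_dir>`.
        # with_pip=True bootstraps pip into the new environment.
        venv.create(env_dir, with_pip=with_pip)
        env_dirs.append(env_dir)
    return env_dirs
```

For example, `create_project_envs(Path("my-monorepo"), ["project1", "project2"])` would create both environments. Each one still has to be activated separately in the shell, as shown above.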

Dependency isolation with Pants or Poetry: Tools like Pants or Poetry can help manage dependencies in a monorepo setting.

  • Pants: Handles dependencies at a fine-grained level, allowing for precise control and minimal rebuilds.
Terminal
# Example of a Pants BUILD file
# (Pants 2 names this target python_sources; older releases called it python_library)
python_sources(
    name="project1_lib",
    dependencies=[
        "//shared/src",
    ],
    sources=["src/**/*.py"],
)
  • Poetry: Manages dependencies and packaging of Python projects. You can set up a pyproject.toml for each project to define its dependencies.
Terminal
# Using Poetry in project1
cd my-monorepo/project1
poetry init
poetry add requests
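
Before choosing between these approaches, it can help to see where projects actually disagree about a dependency. The following is a minimal sketch that compares version pins across projects. The `find_conflicts` function and the example pins are hypothetical, and the comparison is a plain string match on the version spec, so it does not evaluate whether two version ranges overlap.

```python
from collections import defaultdict


def find_conflicts(
    requirements: dict[str, dict[str, str]]
) -> dict[str, dict[str, str]]:
    """Return packages that projects pin to different version specs.

    requirements maps project name -> {package name -> version spec}.
    The result maps package name -> {project name -> version spec}.
    """
    specs_by_package = defaultdict(dict)
    for project, deps in requirements.items():
        for package, spec in deps.items():
            # Package names are case-insensitive, so normalize before comparing.
            specs_by_package[package.lower()][project] = spec
    return {
        package: specs
        for package, specs in specs_by_package.items()
        if len(set(specs.values())) > 1
    }


# Hypothetical pins, for illustration only.
example = {
    "project1": {"requests": "2.31.0", "numpy": "1.26.4"},
    "project2": {"requests": "2.25.1", "numpy": "1.26.4"},
}
print(find_conflicts(example))
# prints {'requests': {'project1': '2.31.0', 'project2': '2.25.1'}}
```

A package that appears in this report is a candidate for either aligning the pins or keeping the projects in separate environments.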

Testing in a monorepo should be centralized yet capable of testing projects in isolation.

Using pytest: Configure pytest to run tests for each project separately or across the entire monorepo. You can use pytest's configuration file (pytest.ini, or the [tool.pytest.ini_options] table in pyproject.toml) to customize test behavior.

Terminal
# Running tests in project1
cd my-monorepo/project1
pytest tests/
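
To run every project's tests in isolation with one command, a small driver script can loop over the projects. This is a sketch under assumptions: it treats any top-level directory that contains a `tests/` folder as a project, and it assumes pytest is installed for the interpreter running the script. The `discover_projects` and `run_all` names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def discover_projects(root: Path) -> list[Path]:
    """Return the subdirectories of root that contain a tests/ directory."""
    return sorted(
        child for child in root.iterdir() if (child / "tests").is_dir()
    )


def run_all(root: Path) -> dict[str, int]:
    """Run pytest once per project and return each project's exit code."""
    results = {}
    for project in discover_projects(root):
        # cwd=project mirrors `cd <project> && pytest tests/`, so each run
        # only collects that project's tests.
        completed = subprocess.run(
            [sys.executable, "-m", "pytest", "tests/"], cwd=project
        )
        results[project.name] = completed.returncode
    return results
```

Calling `run_all(Path("my-monorepo"))` returns a mapping such as project name to exit code, where 0 means all tests passed. In practice each project would be run with its own environment's interpreter rather than `sys.executable`.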

Continuous integration (CI): Set up a CI pipeline using tools like Jenkins, GitHub Actions, or GitLab CI. Define pipeline steps that install dependencies, run tests, and check code quality for each project independently or for the entire monorepo based on the changes detected.

Terminal
# Example GitHub Actions workflow for a Python monorepo
name: Python Monorepo CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        project: [project1, project2]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: |
          pip install poetry
          cd ${{ matrix.project }}
          poetry install
      - name: Run tests
        run: |
          cd ${{ matrix.project }}
          poetry run pytest tests/

This GitHub Actions workflow runs on every push and pull request, on the latest Ubuntu runner. A matrix strategy starts one job per project (project1 and project2). Each job sets up Python 3.12, installs Poetry and then the project's dependencies, and runs pytest inside the Poetry-managed environment from that project's directory. It assumes pytest is declared as a dependency of each project.
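
The workflow above tests every project on every change. Running only the projects affected by a change requires mapping changed files to projects. The following is a minimal sketch of that mapping. It assumes the example layout, assumes that both projects depend on the shared directory, and assumes the list of changed paths comes from something like `git diff --name-only`; `PROJECTS`, `SHARED`, and `affected_projects` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumed from the example layout; adjust to match the real repository.
PROJECTS = ("project1", "project2")
SHARED = "shared"


def affected_projects(changed_files: list[str]) -> list[str]:
    """Map repository-relative changed paths to the projects that need testing."""
    affected = set()
    for changed in changed_files:
        parts = PurePosixPath(changed).parts
        if not parts:
            continue
        top_level = parts[0]
        if top_level == SHARED:
            # Assumption: every project depends on shared code,
            # so a shared change means all projects are retested.
            return sorted(PROJECTS)
        if top_level in PROJECTS:
            affected.add(top_level)
    # Changes outside projects and shared code (docs, scripts) select nothing.
    return sorted(affected)
```

A CI step could feed the result into the job matrix so that a documentation-only change skips the test jobs entirely.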

Maintain a central documentation hub in your monorepo that provides guidelines, project descriptions, and setup instructions. Tools like MkDocs or Sphinx can be used to generate and manage project documentation.

Terminal
# Setting up MkDocs
pip install mkdocs
mkdocs new my-monorepo/docs
# Add documentation files, then build the site from the directory that contains mkdocs.yml
cd my-monorepo/docs
mkdocs build

Large-scale changes in a monorepo should be managed with care. Use feature branches to develop significant updates or new features. Regularly merge changes from the main branch into these feature branches to keep them up-to-date and to minimize merge conflicts.

Monitor the performance of your CI builds and test suites. Optimize them by caching dependencies and using parallel execution strategies. Tools like Pants are particularly good at optimizing builds in a monorepo by caching and skipping unchanged parts of the codebase.
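
Skipping unchanged parts of the codebase usually comes down to fingerprinting a project's inputs and reusing earlier results when the fingerprint matches. The sketch below shows the idea with a content hash over a project's Python files. It illustrates the general technique and is not how Pants implements its cache; `source_fingerprint` is a hypothetical helper, and skipping any directory named `venv` is an assumption based on the layout used earlier.

```python
import hashlib
from pathlib import Path


def source_fingerprint(project: Path, pattern: str = "*.py") -> str:
    """Return a SHA-256 hex digest over the project's matching source files."""
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem iteration order.
    for path in sorted(project.rglob(pattern)):
        relative = path.relative_to(project)
        # Assumption: virtual environments live in a directory named "venv"
        # and should not influence the fingerprint.
        if "venv" in relative.parts or not path.is_file():
            continue
        # Include the relative path so renames change the digest too.
        digest.update(relative.as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

The returned digest can serve as a cache key: if it matches the key stored from the last successful run, the project's tests can be skipped or its cached results restored.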

Managing a Python monorepo requires careful setup and maintenance but can streamline development processes across multiple projects. Implementing a monorepo successfully is all about choosing the right tools and practices.
