Creating a Python package
By turning your code into a package and hosting it on a platform like PyPI (Python Package Index), you enhance the quality and sustainability of your software, promote reuse, and welcome contributions from external collaborators.
pyproject.toml
In the Development workflow section we looked at how to structure your project. Here we will focus on the `pyproject.toml` file, which is a configuration file used in Python projects to define build system requirements and project metadata. It is part of the PEP 517 and PEP 518 specifications, which aim to standardize the way Python projects are built and packaged. The `pyproject.toml` file consists of TOML tables, and can include `[build-system]`, `[project]`, or `[tool]` tables.
[build-system]
The `[build-system]` table is essential because it defines which build backend you will be using, and which dependencies are required to build your project. This is needed because frontend tools like `pip` are not responsible for transforming your source code into a distributable package; that is handled by a build backend (e.g. setuptools, Hatchling).
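As an illustration, a minimal `[build-system]` table using setuptools as the backend could look like this (the version bound is just an example):

```toml
[build-system]
# Dependencies needed to build the package, not to run it
requires = ["setuptools>=61.0"]
# The backend that frontends like pip will call to build the package
build-backend = "setuptools.build_meta"
```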
[project]
Under the `[project]` table you can describe your project's metadata. It can become quite extensive, but this is where you list the name of your project, its version, authors, license, the dependencies specific to your project, and other required or optional information. For a detailed list of what can be included under `[project]`, check the Declaring project metadata section of the Python Packaging Guide.
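As a sketch, a `[project]` table might look like the following; all names and values here are placeholders:

```toml
[project]
name = "your_pkg_name"
version = "0.1.0"
description = "A short description of your package"
authors = [{ name = "Your Name", email = "you@example.com" }]
license = { file = "LICENSE" }
requires-python = ">=3.9"
# Runtime dependencies of your package
dependencies = ["numpy>=1.24"]
```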
[tool]
The `[tool]` table contains subtables specific to each tool. For example, Poetry uses the `[tool.poetry]` table instead of the `[project]` table.
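For instance (an illustrative sketch; the tool and setting shown are arbitrary), a linter such as Ruff reads its configuration from its own subtable:

```toml
[tool.ruff]
# Tool-specific settings live under [tool.<name>]
line-length = 88
```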
Before shifting to `pyproject.toml`, a common approach was to use a `setup.py` build script. You might still encounter these in legacy projects.
Package structuring
If you want to distribute your Python code as a package, you will need an `__init__.py` file in the root directory of your package. This allows Python to treat that directory as a package that can be imported. Every subfolder should also contain an `__init__.py` file.
When importing a package, Python searches through the directories on `sys.path` looking for the package subdirectory. The presence of `__init__.py` files within these directories is crucial, as it tells Python that they should be treated as packages. This mechanism helps avoid the scenario where a directory with a commonplace name accidentally shadows a valid module that appears later on the search path.
While `__init__.py` can simply be an empty file, serving just to mark a directory as a package, it can also contain code that runs when the package is imported. Such code can initialize package-level variables, import submodules, and perform other setup tasks.
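As a runnable sketch (all package and function names here are invented), the following creates a tiny package in a temporary directory and imports it, showing that `__init__.py` code runs at import time:

```python
# Build a throwaway package in a temp directory to show that
# __init__.py marks the directory as a package and runs at import time.
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_pkg"
pkg.mkdir()

# __init__.py runs on import: it can set package-level variables
# and re-export names from submodules.
(pkg / "__init__.py").write_text(
    '__version__ = "0.1.0"\n'
    "from demo_pkg.module import greet\n"
)
(pkg / "module.py").write_text(
    "def greet(name):\n"
    '    return f"Hello, {name}!"\n'
)

sys.path.insert(0, str(tmp))  # make the temp directory importable
import demo_pkg

print(demo_pkg.__version__)         # -> 0.1.0
print(demo_pkg.greet("packaging"))  # -> Hello, packaging!
```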
Building on the project structure from our Development workflow guide, our example package structure would now look like this:
```
your_project/
│
├── docs/                    # Documentation directory
├── notebooks/               # Jupyter notebooks
├── src/                     # Contains your main code
│   └── your_pkg_name/       # A folder where your organized code lives - your package
│       ├── __init__.py      # A marker file for package initialization
│       ├── module.py        # A nested module
│       └── subpkg1/         # A sub-package
│           └── __init__.py  # A marker file for sub-package initialization
├── tests/                   # Your test directory
│
├── data/                    # Data files used in the project (if applicable)
├── processed_data/          # Files from your analysis (if applicable)
├── results/                 # Results (if applicable)
│
├── .gitignore               # Untracked files
├── pyproject.toml           # TOML file
│
├── README.md                # README
└── LICENSE                  # License information
```
You might notice that `requirements.txt` is absent from our updated structure. In many cases, if you have a `pyproject.toml` file, you no longer need a `requirements.txt` file, since `pyproject.toml` is part of the standardized Python packaging format (defined in PEP 518) and can declare your dependencies. However, some deployment and CI/CD pipelines might still expect a `requirements.txt` file, because a set of pinned dependency versions makes pipelines more reproducible. For simple projects, you may still prefer a `requirements.txt` for its simplicity and wide adoption.
It is not considered best practice to use the `pyproject.toml` file to pin dependencies to specific versions or to specify sub-dependencies (i.e. dependencies of your dependencies). This is overly restrictive and prevents users from benefiting from dependency upgrades. For more info, see this discussion.
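As a hedged sketch of the difference (the package names and versions are arbitrary):

```toml
[project]
# Preferred: a loose constraint on a direct dependency
dependencies = ["numpy>=1.24"]

# Overly restrictive: exact pins, including a sub-dependency
# dependencies = ["numpy==1.24.2", "urllib3==2.0.7"]
```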
`lib/` and `build/` directories
You might also have `lib/` and `build/` directories in your project. These are not part of the standard Python package structure, but they can be created by certain tools or processes.
- The `build/` directory is typically used to store compiled or built artifacts of your project, such as binary executables, wheels, or other distribution files. This directory is usually not part of your source code repository and is generated during the (automated) build or packaging process.
- The `lib/` directory stores third-party libraries or dependencies that are not installed through a package manager. By specifying your project's dependencies in the `pyproject.toml` file, and using a package manager like `pip` or `poetry` to install and manage them, these dependencies will be automatically downloaded and installed in the appropriate location (usually the site-packages directory).
Local package installation
By installing a Python package locally during development you can test your changes in an environment that mimics how the package will be used once it is deployed. This process allows you to ensure that your package works correctly when installed and imported by others.
You can use `pip` to install your package in editable mode (`-e`). This way, changes you make to the source code are immediately available without needing to reinstall the package:

```
pip install -e .
```
Next steps
Once you have your package ready, you can publish it. Visit our Release your Python package guide for information on how to publish your package to PyPI.