Project structure

Last reviewed

February 24, 2025

Last modified

March 25, 2025

In software development, the choices you make at the start will affect your project’s final outcome. One key decision is how to structure your project, as a well-organised setup is essential for reproducibility and long-term maintainability.

Essential principles

  • Consistent directory structure: Use a consistent and meaningful directory naming convention.
  • Clear file and folder naming: Prefer lowercase names, with words separated by underscores or hyphens.
  • Managing access levels: Use different Git repositories for public and private parts of your project. Use .gitignore or a specific non-tracked folder for sensitive content and/or files that are too large.
  • Clear documentation: Include a README at the project’s root to offer a summary, and add an appropriate LICENSE to define the terms for reuse and modification.
  • Coding standards: Adhere to a consistent coding style to enhance code readability. Check out our code style guide for more information.
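
The naming principle above can be checked automatically. The following is a minimal sketch, not a prescribed tool: the function names, the regular expression, and the list of allowed exceptions are all assumptions made for illustration and should be adapted to your own conventions.

```python
import re
from pathlib import Path

# Lowercase letters and digits, with words separated by underscores or hyphens,
# optionally followed by lowercase extensions (e.g. "clean_data.py").
NAME_PATTERN = re.compile(r"^[a-z0-9]+([_-][a-z0-9]+)*(\.[a-z0-9]+)*$")

# Assumed exceptions: metadata files that are usually capitalized,
# plus Python's package marker file.
ALLOWED_EXCEPTIONS = {"README.md", "LICENSE", "__init__.py"}


def follows_convention(name: str) -> bool:
    """Return True if a file or folder name matches the naming convention."""
    return name in ALLOWED_EXCEPTIONS or bool(NAME_PATTERN.match(name))


def find_violations(root: Path) -> list[Path]:
    """List paths under root whose names break the convention.

    Hidden entries (names starting with a dot, such as .git) are skipped.
    """
    violations = []
    for path in sorted(root.rglob("*")):
        relative_parts = path.relative_to(root).parts
        if any(part.startswith(".") for part in relative_parts):
            continue
        if not follows_convention(path.name):
            violations.append(path)
    return violations
```

Calling `find_violations(Path("your_project"))` returns the offending paths, which could be printed in a pre-commit check or a CI job.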

Other recommendations

  • Code reusability: Store reusable components in a separate repository so they can be shared across projects, or consider packaging them.
  • Modular code design: Split your code into small, self-contained modules to improve maintainability and reusability.
  • Dependency management: Use virtual environments to manage project dependencies, ensuring consistent environments across different setups.
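
For Python projects, a virtual environment can be created with the standard library's `venv` module. The sketch below is one possible way to do this; the function name `create_environment` and the `.venv` folder name are assumptions, not requirements.

```python
import venv
from pathlib import Path


def create_environment(project_root: Path, name: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment inside the project and return its path."""
    env_dir = project_root / name
    # with_pip=True bootstraps pip so that dependencies can be installed
    # into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

This is equivalent to running `python -m venv .venv` in the project root. The environment folder itself is generated content, so it would normally be listed in .gitignore, while the list of dependencies (e.g. requirements.txt) is tracked.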

Repository structures

The following are recommendations for how to structure your project repository for Python, MATLAB, and R projects.

Python

your_project/
│
├── docs/                     # Documentation directory
├── notebooks/                # Jupyter notebooks
├── src/                      # Contains your main code
│   └── your_project/         # A folder where your organized code lives
│       ├── __init__.py       # A marker file that makes this folder a Python package
│       ├── module            # A file or folder with specific functions or classes
│       └── extras/           # A folder for additional, related code
│           └── __init__.py   # A marker file for the additional code folder
├── tests/                    # Your test directory
│
├── data/                     # Data files used in the project (if applicable)
├── processed_data/           # Files from your analysis (if applicable)
├── results/                  # Results (if applicable)
│
├── .gitignore                # Untracked files
├── requirements.txt          # Software dependencies (environment.yml if using Conda)
│                             # ↑ Even better to use a build system config (pyproject.toml)
│                             # ↑ which is becoming the new standard
├── README.md                 # README
└── LICENSE                   # License information
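
The Python layout above can be created by hand, but a short script makes it repeatable. The following is a minimal sketch: the `scaffold` function and the two template lists are hypothetical names, and the generated files are empty placeholders that you still need to fill in.

```python
from pathlib import Path

# Folders and files taken from the Python layout above; {name} is the project name.
DIRECTORIES = [
    "docs",
    "notebooks",
    "src/{name}/extras",
    "tests",
    "data",
    "processed_data",
    "results",
]
FILES = [
    "src/{name}/__init__.py",
    "src/{name}/extras/__init__.py",
    ".gitignore",
    "requirements.txt",
    "README.md",
    "LICENSE",
]


def scaffold(parent: Path, name: str) -> Path:
    """Create an empty project skeleton under parent and return its root folder."""
    root = parent / name
    for template in DIRECTORIES:
        (root / template.format(name=name)).mkdir(parents=True, exist_ok=True)
    for template in FILES:
        # touch() creates an empty file and leaves existing files untouched.
        (root / template.format(name=name)).touch(exist_ok=True)
    return root
```

For example, `scaffold(Path.cwd(), "your_project")` creates the skeleton in the current directory. Running it again is harmless because existing folders and files are kept as they are.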

For Python projects, you also need to choose between a src/ layout (as shown above) and a flat layout.

MATLAB

your_project/
│
├── docs/                   # Documentation and user guides
├── src/                    # Main MATLAB code
│   ├── utils/              # Helper functions and scripts
│   ├── models/             # Core functions or classes implementing models/algorithms
│   └── main_script.m       # Main script(s) or entry point for the project
│
├── scripts/                # Scripts folder (e.g. for analysis and demo scripts)
├── tests/                  # Tests folder (e.g. MATLAB unit tests)
├── data/                   # Raw data files
├── results/                # Output files (figures, processed data, etc.)
├── examples/               # Example usage or tutorials
│
├── .gitignore              # Specifies files/folders to ignore in version control
├── README.md               # Project overview and instructions
└── LICENSE                 # License information

R

your_project/
│
├── R/                        # R scripts and functions (can also be called src/)
│   ├── function.R            # R functions used across analyses
│   └── other_function.R
│
├── data/                     # raw data files (if applicable)
├── processed_data/           # processed data files (if applicable)
│
├── doc/                      # project documentation
├── man/                      # help files for package functions, generated by roxygen2 (if applicable)
│
├── vignettes/                # explanatory vignettes for the project (if applicable)
│   └── function_vignette.Rmd # vignettes for each function
│
├── tests/                    # test cases for your functions (highly recommended)
│   └── testthat/             # using the testthat package
│
├── results/                  # output from data analyses etc. (if applicable)
│
├── scripts/                  # high-level scripts for running analyses
│   └── analysis_script.R     # script running the main analysis
│
├── .gitignore                # gitignore
├── DESCRIPTION               # package description file (if applicable)
├── NAMESPACE                 # namespace file for package (if applicable)
├── README.md                 # README
└── LICENSE                   # license information

These structures are a starting point and can be adapted based on the specific needs and practices of your project. Some additional tips:

  • Certain metadata files are conventionally named in capitals, such as README, LICENSE, CONTRIBUTING, CODE_OF_CONDUCT, CHANGELOG, CITATION.cff, NOTICE, and MANIFEST.
  • Generally, all content that is generated when building or running your code should be added to .gitignore. This likely includes the contents of the processed_data and results folders.
  • Git cannot track empty folders. If you want to add empty folders to enforce a folder structure, e.g., processed_data or results, the convention is to add an empty file named .gitkeep to the folder.
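
The last two tips can be combined: keep a folder in the repository through a .gitkeep file while ignoring everything else inside it. The following is a minimal sketch of that idea; the function name `keep_but_ignore` is hypothetical, and the folder names are taken from the layouts above.

```python
from pathlib import Path


def keep_but_ignore(project_root: Path, folders: list[str]) -> None:
    """Add .gitkeep to each folder and ignore all other content inside it."""
    rules = []
    for folder in folders:
        target = project_root / folder
        target.mkdir(parents=True, exist_ok=True)
        (target / ".gitkeep").touch()
        # Ignore the folder's contents, then re-include the placeholder file.
        rules += [f"{folder}/*", f"!{folder}/.gitkeep"]

    gitignore = project_root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = text.splitlines()
    # Only append rules that are not already present, so reruns are harmless.
    new_rules = [rule for rule in rules if rule not in existing]
    if new_rules:
        separator = "" if text == "" or text.endswith("\n") else "\n"
        gitignore.write_text(text + separator + "\n".join(new_rules) + "\n")
```

Calling `keep_but_ignore(Path("your_project"), ["processed_data", "results"])` writes rules such as `results/*` followed by `!results/.gitkeep`. The order matters, because the negated pattern has to come after the pattern it overrides.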

Managing data

If your raw data files or any data assets are large (typically more than a few megabytes), it’s usually best not to include them directly in the repository. Instead:

  • Keep such files in external storage (e.g. cloud storage or Git LFS), and add only a reference or a small sample to the repository.
  • Add placeholder files, or instructions in the README, that explain how to obtain the complete datasets.
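
To spot files that are candidates for external storage, you can scan the project for anything above a size limit before committing. This is a minimal sketch: the function name `find_large_files` and the 5 MB default are assumptions, chosen only to illustrate "more than a few megabytes".

```python
from pathlib import Path

BYTES_PER_MB = 1024 * 1024


def find_large_files(root: Path, limit_mb: float = 5.0) -> list[tuple[Path, float]]:
    """Return (path, size in MB) for every file under root larger than limit_mb."""
    large = []
    for path in sorted(root.rglob("*")):
        # Skip Git's own storage folder.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            size_mb = path.stat().st_size / BYTES_PER_MB
            if size_mb > limit_mb:
                large.append((path, size_mb))
    return large
```

The returned paths can then be moved to external storage, added to .gitignore, and replaced in the repository by a small sample or a note in the README.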