Project structure
In software development, the choices you make at the start will affect your project's final outcome. One key decision is how to structure your project, as a well-organised setup is essential for reproducibility and long-term maintainability.
Essential principles
- Consistent directory structure: Use a consistent and meaningful directory naming convention.
- Clear file and folder naming: Opt for lowercase names combined with underscores or hyphens.
- Managing access levels: Use different Git repositories for public and private parts of your project. Use .gitignore or a specific non-tracked folder for sensitive content and/or files that are too large.
- Clear documentation: Include a README at the project's root to offer a summary, and add an appropriate LICENSE to define the terms for reuse and modification.
- Coding standards: Adhere to a consistent coding style to enhance code readability. Check out our code style guide for more information.
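The naming principle above can be checked automatically. The following is a minimal sketch, assuming that "lowercase names combined with underscores or hyphens" also permits digits and dots, and that README and LICENSE are exempt. The names NAME_PATTERN, EXEMPT_STEMS, follows_convention, and find_violations are hypothetical and only serve this illustration.

```python
import re
from pathlib import Path

# Assumption: allowed characters are lowercase letters, digits,
# underscores, hyphens, and dots (for extensions and dotfiles).
NAME_PATTERN = re.compile(r"^[a-z0-9._-]+$")

# Hypothetical allow-list of metadata files that are usually capitalized.
EXEMPT_STEMS = {"README", "LICENSE"}


def follows_convention(name: str) -> bool:
    """Return True if a file or folder name matches the naming convention."""
    stem = name.split(".", 1)[0]
    if stem in EXEMPT_STEMS:
        return True
    return NAME_PATTERN.fullmatch(name) is not None


def find_violations(root: Path) -> list[Path]:
    """List paths under root whose names break the convention, skipping .git."""
    violations = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue
        if not follows_convention(path.name):
            violations.append(relative)
    return violations
```

Calling find_violations(Path(".")) from the project root returns the offending paths, which could be reviewed by hand or reported in a pre-commit check. Extend EXEMPT_STEMS if your project uses other capitalized metadata files.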
Other recommendations
- Code reusability: Store reusable software elements in a separate repository for efficiency across projects or consider packaging them.
- Modular code design: Aim for modular code design to improve maintainability and reusability.
- Dependency management: Use virtual environments to manage project dependencies, ensuring consistent environments across different setups.
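For Python projects, a virtual environment can be created with the standard library's venv module. This is a minimal sketch: the folder name .venv and the function name create_project_env are assumptions, not requirements, and other tools (such as Conda) work equally well.

```python
import venv
from pathlib import Path


def create_project_env(project_root: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment in <project_root>/.venv and return its path."""
    # Assumption: the environment lives in a folder called ".venv".
    env_dir = project_root / ".venv"
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

After calling create_project_env(Path(".")), activate the environment (source .venv/bin/activate on Linux and macOS, .venv\Scripts\activate on Windows) and install the project's dependencies into it. The environment folder itself should not be tracked by Git.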
Repository structures
The following are recommendations for how to structure your project repository for Python, MATLAB, and R projects.
Python:
your_project/
│
├── docs/                     # Documentation directory
├── notebooks/                # Jupyter notebooks
├── src/                      # Contains your main code
│   └── your_project/         # A folder where your organized code lives
│       ├── __init__.py       # A marker file that indicates this folder is for Python code
│       ├── module            # A file or folder with specific functions or classes
│       └── extras/           # A folder for additional, related code
│           └── __init__.py   # A marker file for the additional code folder
├── tests/                    # Your test directory
│
├── data/                     # Data files used in the project (if applicable)
├── processed_data/           # Files from your analysis (if applicable)
├── results/                  # Results (if applicable)
│
├── .gitignore                # Untracked files
├── requirements.txt          # Software dependencies (environment.yml if using Conda)
│                             # Even better to use a build system config (pyproject.toml),
│                             # which is becoming the new standard
├── README.md                 # README
└── LICENSE                   # License information
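The Python layout above can be created in one step with a small script. This is a sketch rather than an official tool: the function name scaffold and the DIRECTORIES and FILES lists are hypothetical, and all files are created empty for you to fill in.

```python
from pathlib import Path

# Folders from the Python layout; "{name}" is replaced by the project name.
DIRECTORIES = [
    "docs",
    "notebooks",
    "src/{name}/extras",
    "tests",
    "data",
    "processed_data",
    "results",
]

# Files from the Python layout, created empty.
FILES = [
    "src/{name}/__init__.py",
    "src/{name}/extras/__init__.py",
    ".gitignore",
    "requirements.txt",
    "README.md",
    "LICENSE",
]


def scaffold(parent: Path, name: str) -> Path:
    """Create the project skeleton under parent/name and return the project root."""
    root = parent / name
    for directory in DIRECTORIES:
        (root / directory.format(name=name)).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        # touch() leaves existing files untouched apart from their timestamp.
        (root / file.format(name=name)).touch(exist_ok=True)
    return root
```

For example, scaffold(Path.home() / "projects", "your_project") would create the skeleton in a projects folder in your home directory. Because existing folders and files are kept, the script can be rerun safely on a partially set-up project.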
MATLAB:
your_project/
│
├── docs/                     # Documentation and user guides
├── src/                      # Main MATLAB code
│   ├── utils/                # Helper functions and scripts
│   ├── models/               # Core functions or classes implementing models/algorithms
│   └── main_script.m         # Main script(s) or entry point for the project
│
├── scripts/                  # Scripts folder (e.g. for analysis and demo scripts)
├── tests/                    # Tests folder (e.g. MATLAB unit tests)
├── data/                     # Raw data files
├── results/                  # Output files (figures, processed data, etc.)
├── examples/                 # Example usage or tutorials
│
├── .gitignore                # Specifies files/folders to ignore in version control
├── README.md                 # Project overview and instructions
└── LICENSE                   # License information
R:
your_project/
│
├── R/                            # R scripts and functions (can also be called src/)
│   ├── function.R                # R functions used across analyses
│   └── other_function.R
│
├── data/                         # raw data files (if applicable)
├── processed_data/               # processed data files (if applicable)
│
├── doc/                          # project documentation
├── man/                          # help files for package functions generated by roxygen2 (if applicable)
│
├── vignettes/                    # explanatory vignettes for the project (if applicable)
│   └── function_vignette.Rmd     # vignettes for each function
│
├── tests/                        # test cases for your functions (highly recommended)
│   └── testthat/                 # using the testthat package
│
├── results/                      # output from data analyses etc. (if applicable)
│
├── scripts/                      # high-level scripts for running analyses
│   └── analysis_script.R         # script running the main analysis
│
├── .gitignore                    # gitignore
├── DESCRIPTION                   # package description file (if applicable)
├── NAMESPACE                     # namespace file for package (if applicable)
├── README.md                     # README
└── LICENSE                       # license information
These structures are a starting point and can be adapted based on the specific needs and practices of your project. Some additional tips:
- Certain metadata files are conventionally written in uppercase, such as README, LICENSE, CONTRIBUTING, CODE_OF_CONDUCT, CHANGELOG, CITATION.cff, NOTICE, and MANIFEST.
- Generally, all content that is generated upon building or running your code should be added to .gitignore. This likely includes the content of the processed_data and results folders.
- Git cannot track empty folders. If you want to add empty folders to enforce a folder structure, e.g., processed_data or results, the convention is to add the file .gitkeep to the folder.
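The last two tips can be combined: ignore the generated content of a folder, but keep the folder itself in the repository through its .gitkeep file. The sketch below assumes the folder names processed_data and results from the layouts above. The function names add_gitkeep and gitignore_lines are hypothetical.

```python
from pathlib import Path


def add_gitkeep(root: Path, folders: list[str]) -> list[Path]:
    """Create each folder if needed and add .gitkeep to those that are empty."""
    created = []
    for folder in folders:
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)
        if not any(path.iterdir()):
            keep = path / ".gitkeep"
            keep.touch()
            created.append(keep)
    return created


def gitignore_lines(folders: list[str]) -> list[str]:
    """Build .gitignore patterns that ignore folder contents but keep .gitkeep."""
    lines = []
    for folder in folders:
        lines.append(f"{folder}/*")          # ignore everything inside the folder
        lines.append(f"!{folder}/.gitkeep")  # except the placeholder file
    return lines
```

For example, gitignore_lines(["processed_data", "results"]) yields four lines that can be appended to the project's .gitignore, and add_gitkeep(Path("."), ["processed_data", "results"]) creates the placeholder files. Note that .gitkeep is a convention, not a Git feature: any tracked file in the folder has the same effect.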