# Using pre-commit hooks with uv for Data Science

## 1. What is a *pre-commit hook*?

A **hook** is a small script that Git runs automatically at certain points in its workflow. A **pre-commit hook** runs right before Git records your changes with `git commit`; if the hook fails, the commit is aborted.

The idea: automate checks or fixes **before your code enters Git history**.

Examples:

- check Python code style (PEP 8),
- automatically format code,
- prevent committing large data files,
- clean up Jupyter notebooks before committing.

## 2. Why should data scientists care?

When working in a team, clean code and reproducibility matter. Hooks help you:

- **avoid silly mistakes** (typos, forgotten formatting),
- **keep the repo clean** (notebooks without huge outputs),
- **save time** (problems are fixed before review),
- **collaborate smoothly** with engineers.

## 3. Install and initialize `uv`

If you don’t have `uv` yet, install it (macOS and Linux):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then create and enter a project:

```bash
uv init my-data-project
cd my-data-project
```

This creates a `pyproject.toml` and a few starter files. The virtual environment (`.venv`) is created the first time you run a command such as `uv add`, `uv sync`, or `uv run`.

## 4. Add `pre-commit` with `uv`

Instead of `pip install pre-commit`, use:

```bash
uv add --dev pre-commit
```

This installs `pre-commit` into the project environment and records it as a **development dependency** in your `pyproject.toml`.

## 5. Initialize hooks in Git

Inside your project (which must be a Git repository), run:

```bash
uv run pre-commit install
```

From now on, every `git commit` will trigger the hooks you configure.

## 6. Configure `.pre-commit-config.yaml`

At the root of your repo, create a `.pre-commit-config.yaml`.
Here’s a setup useful for data scientists:

```yaml
repos:
  # Basic checks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-merge-conflict
      - id: check-added-large-files

  # Python code formatter (Black)
  - repo: https://github.com/psf/black
    rev: 23.9.1
    hooks:
      - id: black
        language_version: python3

  # Organize imports (isort)
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        # Use isort's Black-compatible profile so the two hooks don't undo each other
        args: ["--profile", "black"]

  # Clean Jupyter notebooks
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
```

What these do:

* `trailing-whitespace`, `end-of-file-fixer`: keep files tidy,
* `check-merge-conflict`: block files that still contain merge conflict markers,
* `check-added-large-files`: block newly added files above a size limit (500 kB by default),
* `black`: auto-format Python code,
* `isort`: clean and order imports,
* `nbstripout`: remove execution outputs from Jupyter notebooks.

## 7. Run hooks manually

To check all files at once:

```bash
uv run pre-commit run --all-files
```

## 8. Best practices for data scientists

* Keep `pre-commit` in **dev dependencies** only.
* Document in your README:

  ```bash
  uv run pre-commit install
  ```

  so teammates also enable hooks.
* Regularly update hooks:

  ```bash
  uv run pre-commit autoupdate
  ```

## Conclusion

With `uv` and *pre-commit hooks*, you get:

* faster installations than pip,
* reproducible environments,
* automatic code quality checks before commits.
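
To make the notebook-cleaning step from section 6 more concrete, here is a minimal sketch of the kind of work such a hook does. It is not `nbstripout`'s actual implementation: the names `strip_outputs` and `main` are hypothetical, and it assumes notebooks are plain JSON files with a top-level `cells` list. pre-commit passes the staged file names to a hook as arguments, and a hook that modifies files should exit with a non-zero status so the commit stops and you can review the changes.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def main(paths: list[str]) -> int:
    """Clean each notebook in place; return 1 if any file was changed."""
    changed = False
    for path in paths:
        with open(path, encoding="utf-8") as f:
            notebook = json.load(f)
        cleaned = strip_outputs(notebook)
        if cleaned != notebook:
            # Rewrite the file without outputs and flag the change
            with open(path, "w", encoding="utf-8") as f:
                json.dump(cleaned, f, indent=1)
                f.write("\n")
            changed = True
    return 1 if changed else 0
```

In a real script you would call `main(sys.argv[1:])` and pass its return value to `sys.exit`. In practice, prefer the maintained `nbstripout` hook, which handles many more cases than this sketch.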