Git Workflow for Data Science Project¶
Git is an essential tool for versioning your code, collaborating with your team, and avoiding the loss of work. This tutorial introduces the most common commands you will use in your day-to-day projects, based on the workflow between your local environment and the remote repository.
Daily Workflow¶
Get the project:
git clone <url>
Work on a new branch:
git checkout -b dev/new_feature
Work on your notebook or code.
Stage your changes:
git add .
Commit your work:
uv run cz c
Push your work:
git push origin dev/new_feature
Update with the latest changes:
git pull origin main
or
git fetch origin
git rebase origin/main
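The daily workflow above can be rehearsed end to end in a throwaway sandbox. The sketch below is only an illustration under stated assumptions: a local bare repository stands in for the remote, the paths, file names, and user identity are made up, and a plain git commit -m replaces the interactive uv run cz c so that the script can run unattended. The push targets dev/new_feature, the branch the commit was made on.

```shell
#!/bin/sh
set -eu

# Sandbox: a bare repository plays the role of the remote (demo assumption).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# Seed the remote with one commit on main so there is something to clone.
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# project" > README.md
git add README.md
git commit -q -m "docs: add README"
git push -q "$sandbox/remote.git" main

# Get the project.
git clone -q "$sandbox/remote.git" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Work on a new branch.
git checkout -q -b dev/new_feature

# Work on your code, stage it, and commit it.
echo "print('hello')" > analysis.py
git add .
git commit -q -m "feat: add analysis script"

# Push the branch you worked on.
git push -q origin dev/new_feature

# Update with the latest changes from main.
git fetch -q origin
git rebase -q origin/main

# Show which branches now exist on the remote.
git ls-remote --heads origin
```

Everything lives under the temporary directory stored in sandbox, so it can be deleted afterwards without touching a real project.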
1. Working Directory¶
This is where you write your code (Python, R, SQL, Jupyter notebooks, etc.).
Key command:
git add <file>: adds a file (or its changes) to the staging area in preparation for a commit. Example:
git add analysis.ipynb
git add .
(git add . stages every change in the current directory at once.)
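To see what staging does, the following sketch creates a throwaway repository and compares staging a single file with staging everything. The file names are hypothetical, and git diff --cached --name-only is used here simply as one way to list what is currently staged.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the demo does not touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two new files in the working directory (hypothetical names).
echo "x = 1" > analysis.py
echo "draft ideas" > notes.txt

# Stage only one of them.
git add analysis.py
staged=$(git diff --cached --name-only)
untracked=$(git ls-files --others --exclude-standard)
echo "staged:    $staged"
echo "untracked: $untracked"

# Stage everything that is left in the current directory.
git add .
staged_count=$(git diff --cached --name-only | wc -l | tr -d ' ')
echo "files staged after 'git add .': $staged_count"
```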
2. Staging → Local Repository¶
Once your changes are staged, you need to commit them.
Key command:
git commit -m "Clear message": records the staged changes in your local repository. Example:
git commit -m "Added first version of Random Forest model"
Tip: Write meaningful commit messages that describe your work clearly.
uv run cz c: opens the Commitizen interactive prompt, which guides you through writing a commit message that follows the Conventional Commits format.
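As a minimal sketch of the commit step, the script below stages a file in a throwaway repository, commits it, and checks that the staging area is empty afterwards. The file name, identity, and message are assumptions for the demo. The message is written by hand in the "type: description" shape used by Conventional Commits, since the Commitizen prompt is interactive and cannot run unattended.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a local identity for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Stage a change, then record it in the local repository.
echo "model = 'random_forest'" > model.py
git add model.py
git commit -q -m "feat: add first version of Random Forest model"

# The commit is now part of the local history.
subject=$(git log -1 --format=%s)
echo "last commit: $subject"

# After a commit, nothing is left in the staging area.
git diff --cached --quiet && echo "staging area is empty"
```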
3. Sending to the Remote Repository¶
After saving work locally, you can share it with your team.
Key commands:
git push: sends your commits to the remote repository (GitHub, GitLab, Bitbucket). Example:
git push origin main
git pull: retrieves the latest changes from the team and merges them into your current branch. Example:
git pull origin main
git clone <url>: downloads an existing project from a remote repository. Example:
git clone https://github.com/team/project-ml.git
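The round trip between two teammates can be simulated locally. In this sketch a bare repository stands in for the remote, and the directories alice and bob are hypothetical clones belonging to two team members. Alice pushes a commit and Bob receives it with git pull.

```shell
#!/bin/sh
set -eu

# A local bare repository stands in for GitHub/GitLab (demo assumption).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# Alice creates the project and pushes the first commit.
git init -q "$sandbox/alice"
cd "$sandbox/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$sandbox/remote.git"
echo "# project" > README.md
git add README.md
git commit -q -m "docs: add README"
git push -q origin main

# Bob downloads the existing project.
git clone -q "$sandbox/remote.git" "$sandbox/bob"

# Alice shares new work.
echo "model = 'random_forest'" > model.py
git add model.py
git commit -q -m "feat: add model"
git push -q origin main

# Bob retrieves it and merges it into his branch.
cd "$sandbox/bob"
git pull -q origin main
ls
```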
4. Merging and Fetching Code¶
When working in a team, you often need to integrate your work with others.
Key commands:
git merge <branch>: merges another branch into your current branch. Example:
git merge develop
git fetch: downloads changes from the remote without merging them into your branch. You can review them before applying. Example:
git fetch origin
Use git fetch when you want to check what has changed remotely, and git merge (or git pull, which is fetch + merge) when you are ready to integrate those changes.
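The difference between fetching and merging can be observed in a sandbox. The sketch below assumes a local bare repository as the remote and two hypothetical clones, alice and bob. After git fetch, Bob's branch has not moved, and the incoming commits can be listed with git log HEAD..origin/main before git merge integrates them.

```shell
#!/bin/sh
set -eu

# Local bare repository as a stand-in remote (demo assumption).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# Alice publishes an initial commit.
git init -q "$sandbox/alice"
cd "$sandbox/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$sandbox/remote.git"
echo "# project" > README.md
git add README.md
git commit -q -m "docs: add README"
git push -q origin main

# Bob clones, then Alice publishes a second commit.
git clone -q "$sandbox/remote.git" "$sandbox/bob"
echo "x = 1" > analysis.py
git add analysis.py
git commit -q -m "feat: add analysis"
git push -q origin main

# Bob fetches: the remote-tracking branch moves, his own branch does not.
cd "$sandbox/bob"
before=$(git rev-parse HEAD)
git fetch -q origin
after_fetch=$(git rev-parse HEAD)

# Review what is waiting on the remote before integrating it.
incoming=$(git log --oneline HEAD..origin/main | wc -l | tr -d ' ')
echo "incoming commits: $incoming"

# Integrate the fetched changes.
git merge -q origin/main
after_merge=$(git rev-parse HEAD)
```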
5. Undoing or Correcting Changes (Reset & Stash)¶
Mistakes happen. Git allows you to roll back or temporarily set aside changes.
Useful commands:
git reset <file>: removes a file from the staging area (before commit) without changing its contents. Example:
git reset analysis.ipynb
git reset <commit>: moves the current branch back to a previous commit. With --hard, it also discards all uncommitted changes to tracked files, so use it with caution. Example:
git reset --hard abc123
git stash: temporarily saves your uncommitted changes, useful if you need to switch branches quickly.
git stash
git stash apply # reapply the changes and keep them in the stash
git stash pop # reapply the changes and remove them from the stash
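The three commands above can be tried safely in a throwaway repository. In this sketch the file name, identity, and contents (v1, v2, v3) are assumptions chosen so that each step leaves a visible trace in the file.

```shell
#!/bin/sh
set -eu

# Throwaway repository with two commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > analysis.py
git add analysis.py
git commit -q -m "feat: first version"
first=$(git rev-parse HEAD)
echo "v2" > analysis.py
git add analysis.py
git commit -q -m "feat: second version"

# Unstage a file: the edit stays in the working directory.
echo "v3" > analysis.py
git add analysis.py
git reset -q analysis.py
staged=$(git diff --cached --name-only)

# Stash: the uncommitted edit is set aside, the file is back at the last commit.
git stash > /dev/null
after_stash=$(cat analysis.py)

# Pop: the edit comes back and leaves the stash.
git stash pop > /dev/null
after_pop=$(cat analysis.py)

# Hard reset: the uncommitted edit and the second commit are both dropped.
git reset -q --hard "$first"
after_reset=$(cat analysis.py)

echo "stash: $after_stash, pop: $after_pop, hard reset: $after_reset"
```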
6. Removing a File from Git History¶
Sometimes large or sensitive files (such as datasets or credentials) get accidentally committed into the repository history. These files can make the repository unnecessarily large or expose private data. Simply deleting the file and committing is not enough, because it still exists in the history.
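The following sketch illustrates why deleting and committing is not enough. It uses a throwaway repository and a tiny hypothetical data/large_dataset.csv: after the file is removed in a new commit, its contents can still be read back from the previous commit.

```shell
#!/bin/sh
set -eu

# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Accidentally commit a dataset (tiny stand-in for a large file).
mkdir data
echo "id,value" > data/large_dataset.csv
git add .
git commit -q -m "feat: add dataset"

# Delete it and commit the deletion.
git rm -q data/large_dataset.csv
git commit -q -m "chore: remove dataset"

# The history still references the file...
commits_touching=$(git log --oneline --all -- data/large_dataset.csv | wc -l | tr -d ' ')
echo "commits touching the file: $commits_touching"

# ...and its contents can be restored from the earlier commit.
recovered=$(git show HEAD~1:data/large_dataset.csv)
echo "recovered contents: $recovered"
```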
To permanently remove a file from all history, you can use git filter-repo.
Steps:
Install git filter-repo (if not already installed). On Linux or macOS with Homebrew:
brew install git-filter-repo
or download it manually from the project's GitHub repository.
Run the command to remove the file everywhere in the history:
git filter-repo --path <file-to-remove> --invert-paths
Example:
git filter-repo --path data/large_dataset.csv --invert-paths
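This removes the file data/large_dataset.csv from the entire history of the repository.
Force-push the cleaned repository back to the remote. By default, git filter-repo removes the origin remote as a safety measure, so add it again first if needed (git remote add origin <url>):
git push origin --force --all
git push origin --force --tags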
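After this, the file no longer appears in the history of the pushed branches and tags, and fresh clones will be smaller. Existing clones still contain the old history, so teammates should clone the repository again, and any credentials that were exposed should be rotated.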
This is a powerful operation, so use it carefully.
7. Prepare a MR¶
To help you write the description of your MR (Merge Request), you can use generative AI. The steps below assume you want to merge your branch into the main branch.
Move to the main branch and fetch the latest code from main:
git checkout main
git fetch
git rebase
Move to your branch:
git checkout dev/my_branch
Make a diff between your branch and the main branch and write it to a file, for example diff.txt:
git diff main > diff.txt
Go to ChatGPT or Gemini, upload the diff.txt file, and use this prompt:
# 🔎 Merge request writing analysis
## 🎯 CONTEXT
You are an expert Git assistant. You have mastered software development best practices for code versioning and collaborative coding (working on a development branch, merge requests, etc.).
## 📥 OBJECTIVES
I am sending you:
- The git diff between a source branch and a target branch
You must produce an extremely complete, visual, realistic description of the Merge Request.
## Organization
You must write the Merge Request using the following structure:
### Summary
• Title for the Merge Request
• New Features
• Documentation
• Fixes (if applicable)
### Walkthrough
[Explain in natural language what was modified, created, or deleted. Mention the important files.]
### Changes
[Explain in natural language what was modified, created, or deleted for each file. Be exhaustive in the list of files.]
| File Path | Change Summary |
| --------- | -------------- |
| ... | ... |
## Sequence Diagram(s)
[Add a mermaid diagram]
Write in English. Return only the result.
Create a new MR and paste the generated text into its description.
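To check what ends up in diff.txt before uploading it, the diff step can be tried in a throwaway repository. The sketch below uses hypothetical file names and commit messages. It runs the same git diff main command as above. As a variant, git diff main...HEAD (three dots) limits the output to the changes made on your branch since it diverged from main.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a main branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# project" > README.md
git add README.md
git commit -q -m "docs: add README"

# Feature branch with one new file.
git checkout -q -b dev/my_branch
echo "print('new feature')" > feature.py
git add feature.py
git commit -q -m "feat: add feature script"

# Write the diff between main and the current branch to a file.
git diff main > diff.txt

# Quick look at what the file contains.
changed_files=$(grep -c '^diff --git' diff.txt)
echo "files in diff: $changed_files"
cat diff.txt
```

Because diff.txt is untracked, it does not show up in later diffs, but it is worth deleting before staging with git add . so that it is not committed by accident.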