Git Workflow for Data Science Projects
======================================

Git is an essential tool for **versioning your code**, collaborating with your team, and avoiding the loss of work.

This tutorial introduces the commands you will use most often in your day-to-day projects, following the flow of work between your **local environment** and the **remote repository**.

Daily Workflow
--------------

1. **Get the project:**

   .. code-block:: bash

      git clone <repository-url>

2. **Work on a new branch:**

   .. code-block:: bash

      git checkout -b dev/new_feature

3. **Work on your notebook or code.**

4. **Stage your changes:**

   .. code-block:: bash

      git add .

5. **Commit your work:**

   .. code-block:: bash

      uv run cz c

6. **Push your work** (push the branch you created in step 2, not ``main``):

   .. code-block:: bash

      git push origin dev/new_feature

7. **Update with the latest changes:**

   .. code-block:: bash

      git pull origin main

   or

   .. code-block:: bash

      git fetch
      git rebase

1. Working Directory
--------------------

This is where you write your code (Python, R, SQL, Jupyter notebooks, etc.).

**Key command:**

* ``git add <file>`` : adds a file (or its changes) to the **staging area** in preparation for a commit.

Example:

.. code-block:: bash

   git add analysis.ipynb
   git add .

(``git add .`` stages every change in the current directory and its subdirectories at once.)

2. Staging → Local Repository
-----------------------------

Once your changes are staged, you need to commit them.

**Key command:**

* ``git commit -m "Clear message"`` : records the staged changes in your local repository.

Example:

.. code-block:: bash

   git commit -m "Added first version of Random Forest model"

*Tip: Write meaningful commit messages that describe your work clearly.*

* ``uv run cz c`` : opens commitizen, which guides you through writing a message that follows the conventional commits format.

3. Sending to the Remote Repository
-----------------------------------

After saving work locally, you can share it with your team.

**Key commands:**

* ``git push`` : sends your commits to the remote repository (GitHub, GitLab, Bitbucket).

Example:

.. code-block:: bash

   git push origin main

* ``git pull`` : retrieves the latest changes from the team and merges them into your current branch.

Example:

.. code-block:: bash

   git pull origin main

* ``git clone <repository-url>`` : downloads an existing project from a remote repository.

Example:

.. code-block:: bash

   git clone https://github.com/team/project-ml.git

4. Merging and Fetching Code
----------------------------

When working in a team, you often need to integrate your work with that of others.

**Key commands:**

* ``git merge <branch>`` : merges another branch into your current branch.

Example:

.. code-block:: bash

   git merge develop

* ``git fetch`` : downloads changes from the remote without merging them into your branch, so you can review them before applying them.

Example:

.. code-block:: bash

   git fetch origin

Use ``git fetch`` when you want to check what has changed remotely, and ``git merge`` (or ``git pull``, which is ``fetch + merge``) when you are ready to integrate those changes.

5. Undoing or Correcting Changes (Reset & Stash)
------------------------------------------------

Mistakes happen. Git allows you to roll back or temporarily set aside changes.

**Useful commands:**

* ``git reset <file>`` : removes a file from the staging area (before commit). The changes stay in your working directory.

Example:

.. code-block:: bash

   git reset analysis.ipynb

* ``git reset --hard <commit>`` : reverts the project to a previous state. Use with caution: ``--hard`` discards uncommitted changes to tracked files.

Example:

.. code-block:: bash

   git reset --hard abc123

* ``git stash`` : temporarily saves your uncommitted changes, which is useful if you need to switch branches quickly.

.. code-block:: bash

   git stash
   git stash apply   # reapply changes and keep them in the stash
   git stash pop     # reapply changes and remove them from the stash

6. Removing a File from Git History
-----------------------------------

Sometimes large or sensitive files (such as datasets or credentials) are accidentally committed to the repository history. These files can make the repository unnecessarily large or expose private data.

Simply deleting the file and committing is not enough, because the file still exists in earlier commits.

To permanently remove a file from all history, you can use ``git filter-repo``.

**Steps:**

1. Install ``git filter-repo`` (if not already installed). On Linux or macOS with Homebrew:

   .. code-block:: bash

      brew install git-filter-repo

   or download it manually from the project's GitHub repository.

2. Run the command to remove the file everywhere in the history:

   .. code-block:: bash

      git filter-repo --path <path-to-file> --invert-paths

   Example:

   .. code-block:: bash

      git filter-repo --path data/large_dataset.csv --invert-paths

   This removes the file ``data/large_dataset.csv`` from the entire history of the repository.

3. Force-push the cleaned repository back to the remote. ``git filter-repo`` removes the ``origin`` remote after rewriting, so add it back first with ``git remote add origin <repository-url>`` if needed:

   .. code-block:: bash

      git push origin --force --all
      git push origin --force --tags

After this, the file no longer appears in the rewritten history, and the repository size is reduced. Because this operation rewrites every affected commit, use it carefully and warn your teammates, who will need a fresh clone.

7. Prepare an MR
----------------

To help you write the description for your MR (Merge Request), you can use Generative AI. The example below prepares an MR from your branch into the main branch.

1. Move to the main branch and fetch the latest code from main:

   .. code-block:: bash

      git checkout main
      git fetch
      git rebase

2. Move to your branch:

   .. code-block:: bash

      git checkout dev/my_branch

3. Compute the diff between your branch and the main branch and write it to a file, for example ``diff.txt``:

   .. code-block:: bash

      git diff main > diff.txt

4. Go to ChatGPT or Gemini, upload the ``diff.txt`` file and use this prompt:

   .. code-block:: text

      # 🔎 Merge request writing analysis

      ## 🎯 CONTEXT
      You are an expert Git assistant. You have mastered software development
      best practices for code versioning and collaborative work on code
      (working on a development branch, merge requests, etc.)

      ## 📥 OBJECTIVES
      I am giving you:
      - The git diff between a source branch and a target branch

      Produce a thorough, visual, realistic description of the Merge Request

      ## Organization
      Write the Merge Request using the following structure:

      ### Summary
      • Title for the Merge Request
      • New Features
      • Documentation
      • Fixes (if applicable)

      ### Walkthrough
      [Explain in natural language what was modified, created, or deleted. Mention the important files.]

      ### Changes
      [Explain in natural language what was modified, created, or deleted for each file. Be exhaustive in the list of files.]

      | File Path | Change Summary |
      | --------- | -------------- |
      | ... | ... |

      ## Sequence Diagram(s)
      [Add a mermaid diagram]

      Write in English.
      Return only the result.

5. Create a new MR and paste the generated content into its description.
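
Steps 1 to 3 above can be tried end to end in a throwaway repository before you run them on a real project. The sketch below is an illustration, not part of the original workflow: the temporary directory, the file ``model.py``, the demo identity, and the commit messages are all hypothetical. It also swaps ``git diff main`` for the three-dot form ``git diff main...dev/my_branch``, which is one possible way to list only the changes made on your branch since it diverged from ``main``.

.. code-block:: bash

   #!/usr/bin/env bash
   set -euo pipefail

   # Throwaway repository so nothing here touches a real project (hypothetical setup).
   repo="$(mktemp -d)"
   cd "$repo"
   git init -q
   git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
   git config user.email "demo@example.com"   # demo identity, local to this repository
   git config user.name "Demo"
   git config commit.gpgsign false

   # One commit on main.
   echo "print('baseline')" > model.py
   git add model.py
   git commit -q -m "feat: add baseline model"

   # One commit on the feature branch.
   git checkout -q -b dev/my_branch
   echo "print('random forest')" >> model.py
   git add model.py
   git commit -q -m "feat: add random forest"

   # Three-dot diff: changes on dev/my_branch since it forked from main.
   git diff main...dev/my_branch > diff.txt

The resulting ``diff.txt`` is the file you would upload in step 4. If ``main`` has moved on since you branched, the three-dot form leaves those newer commits out of the diff, which may keep the generated description focused on your own work.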