# Introduction to Bandit for Data Science

## 1. What is Bandit?

**Bandit** is a static analysis tool for Python that automatically detects potential security issues. It parses each file into an abstract syntax tree and runs a set of checks against it, looking for insecure patterns such as:

- hardcoded passwords,
- unsafe Python functions (`eval`, `exec`, …),
- weak cryptographic practices,
- possible code injection.

Its main goal is to **catch common security mistakes before your code reaches production**.

## 2. Installation with `uv`

Instead of `pip`, this guide uses [uv](https://docs.astral.sh/uv/), a modern, fast Python package manager. You can install Bandit as a **development dependency**:

```bash
uv add --dev bandit
```

This adds Bandit to the project's `dev` dependency group in `pyproject.toml`. It is available during development but left out of production installs (for example, `uv sync --no-dev`).

## 3. First Security Scan

Bandit lives in the project's virtual environment, so run it through `uv run` (or activate the environment first).

To analyze a single file:

```bash
uv run bandit my_script.py
```

To analyze an entire project:

```bash
uv run bandit -r . -x ./.venv
```

* `-r` means recursive: Bandit scans all `.py` files in the directory and its subdirectories.
* `-x` excludes paths from the scan. Excluding `./.venv` stops Bandit from reporting issues in your installed third-party packages.

## 4. Practical Example

Let’s consider a file `example.py`:

```python
# Bad practice: storing a password in plain text
password = "1234"

# Bad practice: using exec()
code = "print('Hello World')"
exec(code)
```

Run Bandit:

```bash
uv run bandit example.py
```

**Expected output**:

* `B105` (`hardcoded_password_string`): a possible hardcoded password (`'1234'`) assigned to `password`.
* `B102` (`exec_used`): use of `exec`, a potential code injection risk.

## 5. Understanding the Bandit Report

Bandit outputs one report entry per finding. Each entry includes:

* **Severity**: how serious the issue is (`LOW`, `MEDIUM`, `HIGH`).
* **Confidence**: how confident Bandit is that the finding is a real issue (`LOW`, `MEDIUM`, `HIGH`).
* **Issue**: the test ID and name (e.g. `B102:exec_used`), a short description, and a link to more information.
* **Location**: the file and line of the issue.

Example (simplified; the exact layout varies between Bandit versions):

```
Severity: MEDIUM
Confidence: HIGH
Issue: [B102:exec_used] Use of exec detected.
Location: example.py:6
```

## 6. Output Formats

By default, Bandit prints a plain-text report to the terminal. Use `-f` to choose another format and `-o` to write the report to a file. **JSON** and **HTML** reports are convenient for CI/CD pipelines:

```bash
uv run bandit -r . -x ./.venv -f json -o report.json
uv run bandit -r . -x ./.venv -f html -o report.html
```

## 7. Why Bandit Matters in Data Science

As a **data scientist**, you often handle:

* **sensitive data** (medical, financial, personal),
* **machine learning models** that may go into production,
* **automated scripts** for ETL or APIs.

Bandit helps you catch common security issues before sharing or deploying your code, for example by:

* flagging hardcoded credentials,
* flagging weak cryptographic choices,
* scanning notebooks once they are exported to `.py` scripts (for example with `jupyter nbconvert --to script`). Bandit analyzes Python source files, not `.ipynb` files directly.

## 8. Integrating Bandit into a Pre-Commit Hook

To prevent committing insecure code, you can run Bandit automatically using [pre-commit](https://pre-commit.com/).

1. Install pre-commit:

   ```bash
   uv add --dev pre-commit
   ```

2. Create a `.pre-commit-config.yaml` file at the root of your repo:

   ```yaml
   repos:
     - repo: https://github.com/PyCQA/bandit
       rev: 1.7.9  # use the latest stable version
       hooks:
         - id: bandit
   ```

   No `-r .` argument is needed, because pre-commit passes the staged Python files to the hook itself.

3. Install the git hook:

   ```bash
   uv run pre-commit install
   ```

Bandit now runs automatically on staged files before every commit. If it finds issues, the commit is blocked until they are fixed or, after review, explicitly suppressed with a `# nosec` comment.

## 9. Integrating Bandit into CI/CD

### GitHub Actions

Create `.github/workflows/security.yml`:

```yaml
name: Security Scan

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  bandit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install uv
        run: pip install uv

      - name: Install dependencies
        run: uv sync

      - name: Run Bandit
        run: uv run bandit -r . -x ./.venv -f html -o bandit-report.html

      - name: Upload Bandit report
        if: always()  # upload the report even when Bandit fails the job
        uses: actions/upload-artifact@v4
        with:
          name: bandit-report
          path: bandit-report.html
```

Bandit exits with a non-zero status when it reports issues, which fails the job. The `if: always()` condition ensures the report is uploaded anyway.

### GitLab CI

Create `.gitlab-ci.yml`:

```yaml
stages:
  - security

bandit_scan:
  stage: security
  image: python:3.11
  script:
    - pip install uv
    - uv sync
    - uv run bandit -r . -x ./.venv -f html -o bandit-report.html
  artifacts:
    paths:
      - bandit-report.html
    when: always
    expire_in: 1 week
```

This generates a **Bandit report** as a downloadable job artifact in GitLab. `when: always` keeps the artifact even when the scan fails the job.

## 10. Best Practices with Bandit

- Run Bandit **early in development** (shift-left security).
- Include Bandit scans in **pre-commit hooks**.
- Integrate Bandit into your **CI/CD pipeline** (GitHub or GitLab).
- Review warnings carefully instead of ignoring them.
- Store reports (`HTML`/`JSON`) as artifacts for easy review.

## Conclusion

* Bandit is a **lightweight and powerful tool** for Python security analysis.
* Installing it with **uv** as a development dependency keeps it out of your production dependencies.
* Integrating Bandit with **pre-commit** helps keep the issues it detects out of your git history.
* Adding Bandit to **CI/CD pipelines** provides automated and repeatable security checks.
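The CI jobs in section 9 fail the build on any Bandit finding. You may instead want to keep the full JSON report as an artifact and block merges only on more serious findings. A small post-processing step can do this.

The sketch below is a minimal example under stated assumptions. It relies on the field names Bandit uses in its JSON output (`results`, `issue_severity`, `filename`, `line_number`, `test_id`, `issue_text`). The helpers `findings_at_or_above` and `main` are hypothetical and are not part of Bandit.

```python
import json

# Bandit reports severities as "LOW", "MEDIUM" or "HIGH"; rank them so they can be compared.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def findings_at_or_above(report: dict, min_severity: str = "HIGH") -> list:
    """Return findings from a parsed Bandit JSON report with severity >= min_severity.

    Assumption: findings live in a top-level "results" list and each one has an
    "issue_severity" field; unknown or missing severities rank below LOW.
    """
    threshold = SEVERITY_RANK[min_severity.upper()]
    return [
        issue
        for issue in report.get("results", [])
        if SEVERITY_RANK.get(issue.get("issue_severity", ""), 0) >= threshold
    ]


def main(path: str = "report.json", min_severity: str = "HIGH") -> int:
    """Hypothetical CI gate: print blocking findings and return a process exit code."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    blocking = findings_at_or_above(report, min_severity)
    for issue in blocking:
        print(f"{issue['filename']}:{issue['line_number']} "
              f"[{issue['test_id']}] {issue['issue_text']}")
    # 1 fails the CI step if any blocking finding remains; 0 lets it pass.
    return 1 if blocking else 0
```

For example, a CI step could call `sys.exit(main("report.json", "HIGH"))` after Bandit has written `report.json`. Bandit's own non-zero exit code would then need to be tolerated for the scan step, and this sketch does not handle that. Bandit can also filter by severity directly (`-l`, `-ll`, `-lll`), but that filters the report itself as well.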