# Introduction to Bandit for Data Science

## 1. What is Bandit?
**Bandit** is a static analysis tool for Python that automatically detects potential security issues.
It scans your codebase for insecure patterns such as:
- hardcoded passwords,
- unsafe Python functions (`eval`, `exec`, …),
- weak cryptographic practices,
- possible code injection.

Its main goal is to **catch security problems before your code reaches production**.

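To make "static analysis" concrete: Bandit parses each file into an abstract syntax tree (AST) and runs its checks against the tree, without executing the code. The toy sketch below applies the same idea with the standard-library `ast` module to flag direct calls to `eval` and `exec`. It is not Bandit's implementation; `RISKY_NAMES` and `find_risky_calls` are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical list for illustration; Bandit's real checks cover far more.
RISKY_NAMES = {"eval", "exec"}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, function_name) for each direct eval/exec call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only plain calls such as exec(...) are matched, not obj.exec(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_NAMES
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = "code = \"print('hi')\"\nexec(code)\n"
print(find_risky_calls(sample))  # [(2, 'exec')]
```

The source is only parsed, never run, which is why this kind of check is safe to apply to untrusted or unfinished code.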

## 2. Installation with `uv`

This guide uses [uv](https://docs.astral.sh/uv/), a fast Python package and project manager, instead of `pip`.
Install Bandit as a **development dependency**:

```bash
uv add --dev bandit
```

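This makes Bandit available during development without installing it in production environments.
Because uv installs it into the project's virtual environment, either activate that environment or prefix the commands in this guide with `uv run` (for example, `uv run bandit -r .`).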


## 3. First Security Scan

To analyze a single file:

```bash
bandit my_script.py
```

To analyze an entire project:

```bash
bandit -r .
```

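* `-r` means recursive: Bandit scans all `.py` files in the directory and its subdirectories.
* If your virtual environment lives inside the project (uv creates `.venv` there by default), exclude it, for example with `-x ./.venv`, so that third-party packages are not scanned.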


## 4. Practical Example

Let’s consider a file `example.py`:

```python
import os

# Bad practice: storing a password in plain text
password = "1234"

# Bad practice: using exec()
code = "print('Hello World')"
exec(code)
```

Run Bandit:

```bash
bandit example.py
```

**Expected output**:

* A `B105` warning (`hardcoded_password_string`) for the hardcoded `password`.
* A `B102` warning (`exec_used`) for the `exec` call (potential code injection).

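One way to address both warnings is sketched below. It assumes the secret is supplied through an environment variable; the variable name `APP_PASSWORD` and the helper names are hypothetical. A version along these lines should no longer trigger the two findings above, but rerun Bandit to confirm.

```python
import os


def get_password(env_var: str = "APP_PASSWORD") -> str:
    """Read the secret from the environment instead of the source file."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return value


def say_hello() -> None:
    """Call ordinary code directly; no exec() on a string is needed."""
    print("Hello World")


say_hello()
```

Keeping the secret out of the file also keeps it out of version control, which a later fix cannot undo once it has been committed.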

## 5. Understanding the Bandit Report

Bandit outputs a report with several fields:

* **Severity**: issue severity (`LOW`, `MEDIUM`, `HIGH`).
* **Confidence**: detection confidence (`LOW`, `MEDIUM`, `HIGH`).
* **Issue**: the test ID and name (for example `B102: exec_used`) followed by a short description.
* **Location**: file and line of the issue.

Example (simplified from the text report for `example.py` above):

```
Severity: MEDIUM
Confidence: HIGH
Issue: [B102: exec_used] Use of exec detected.
Location: example.py:8
```


## 6. Output Formats

By default, Bandit prints plain text.
You can also generate **JSON** or **HTML** reports for CI/CD pipelines:

```bash
bandit -r . -f json -o report.json
bandit -r . -f html -o report.html
```

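The JSON format is convenient when you want to post-process findings yourself. The sketch below counts findings per severity and prints one line per issue. It runs on a small hand-written sample; the field names (`results`, `issue_severity`, `filename`, `line_number`, `test_id`, `issue_text`) are assumed to match what your Bandit version writes, so check them against your own `report.json`. The helper names are hypothetical.

```python
from collections import Counter

# Hand-written sample for illustration, shaped like a Bandit JSON report.
SAMPLE_REPORT = {
    "results": [
        {
            "filename": "example.py",
            "line_number": 8,
            "test_id": "B102",
            "issue_severity": "MEDIUM",
            "issue_confidence": "HIGH",
            "issue_text": "Use of exec detected.",
        }
    ]
}


def count_by_severity(report: dict) -> Counter:
    """Count findings per severity level in a parsed report."""
    return Counter(item["issue_severity"] for item in report.get("results", []))


def format_finding(item: dict) -> str:
    """Render one finding as 'file:line [test_id] text'."""
    return (
        f"{item['filename']}:{item['line_number']} "
        f"[{item['test_id']}] {item['issue_text']}"
    )


print(dict(count_by_severity(SAMPLE_REPORT)))
for finding in SAMPLE_REPORT["results"]:
    print(format_finding(finding))
```

For a real report, load the file with `json.loads(Path("report.json").read_text())` (from the standard-library `json` and `pathlib` modules) and pass the result in place of `SAMPLE_REPORT`.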

## 7. Why Bandit Matters in Data Science

As a **data scientist**, you often handle:

* **sensitive data** (medical, financial, personal),
* **machine learning models** that may go into production,
* **automated scripts** for ETL or APIs.

Bandit helps you catch common security issues before you share or deploy your code. Typical uses include:

* flagging hardcoded credentials,
* flagging weak or misused cryptographic primitives,
* scanning notebooks after exporting them to `.py` files (Bandit reads Python source, not `.ipynb`).

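Notebooks are usually exported with `jupyter nbconvert --to script`. If that tool is not available, the export can be approximated with the standard library, since an `.ipynb` file is JSON. The sketch below is a simplified stand-in, not a replacement for nbconvert: `notebook_to_script` is a hypothetical helper, and commenting out lines that start with `%` or `!` is a heuristic for IPython magics and shell escapes.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path: Path, script_path: Path) -> int:
    """Write the notebook's code cells to a .py file; return the cell count."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The notebook format stores source as a list of lines or one string.
        text = source if isinstance(source, str) else "".join(source)
        lines = [
            # Magics and shell escapes are not valid Python: comment them out.
            f"# {line}" if line.lstrip().startswith(("%", "!")) else line
            for line in text.splitlines()
        ]
        chunks.append("\n".join(lines))
    script_path.write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(chunks)
```

After calling, for example, `notebook_to_script(Path("analysis.ipynb"), Path("analysis.py"))`, run `bandit analysis.py`. Line numbers in the report refer to the exported script, not to notebook cells.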

## 8. Integrating Bandit into a Pre-Commit Hook

To prevent committing insecure code, you can run Bandit automatically using [pre-commit](https://pre-commit.com/).

1. Install pre-commit:

```bash
uv add --dev pre-commit
```

2. Create a `.pre-commit-config.yaml` file at the root of your repo:

```yaml
repos:
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.9   # use the latest stable version
    hooks:
      - id: bandit
        args: ["-r", "."]
```

3. Install the git hook:

```bash
uv run pre-commit install
```

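Now Bandit runs automatically before every commit.
pre-commit passes the staged Python files to the hook, but the `-r .` arguments above also make Bandit scan the whole repository on every run; remove the `args` line if you only want staged files checked.
If Bandit finds issues, the commit is blocked until they are fixed.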


## 9. Integrating Bandit into CI/CD

### GitHub Actions

Create `.github/workflows/security.yml`:

```yaml
name: Security Scan

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  bandit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install uv
        run: pip install uv

      - name: Install dependencies
        run: uv sync

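      - name: Run Bandit
        # uv sync installs Bandit into .venv, so run it through uv and skip that directory
        run: uv run bandit -r . -x ./.venv -f html -o bandit-report.html

      - name: Upload Bandit report
        # Bandit exits non-zero when it finds issues; upload the report anyway
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: bandit-report
          path: bandit-report.html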
```


### GitLab CI

Create `.gitlab-ci.yml`:

```yaml
stages:
  - security

bandit_scan:
  stage: security
  image: python:3.11
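  script:
    - pip install uv
    - uv sync
    # uv sync installs Bandit into .venv, so run it through uv and skip that directory
    - uv run bandit -r . -x ./.venv -f html -o bandit-report.html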
  artifacts:
    paths:
      - bandit-report.html
    when: always
    expire_in: 1 week
```

This will generate a **Bandit report** as a downloadable artifact in GitLab.


## 10. Best Practices with Bandit

- Run Bandit **early in development** (shift-left security).
- Include Bandit scans in **pre-commit hooks**.
- Integrate Bandit into your **CI/CD pipeline** (GitHub or GitLab).
- Review warnings carefully instead of ignoring them.
- Store reports (`HTML`/`JSON`) as artifacts for easy review.

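Reviewing a warning sometimes ends with the conclusion that the flagged line is acceptable. In that case, suppress it explicitly with a `# nosec` comment and record the reason next to it, rather than disabling the check everywhere. The sketch below assumes Bandit reports `B311` (non-cryptographic random generator) for `random.choice`, and that your Bandit version accepts a test ID after `# nosec`; the function names are hypothetical.

```python
import random
import secrets


def pick_sample_row(rows: list, seed: int = 42):
    """Pick one row reproducibly, e.g. for a quick sanity check."""
    random.seed(seed)
    # Reviewed: reproducible sampling for an experiment, not a security use.
    return random.choice(rows)  # nosec B311


def new_session_id() -> str:
    """Security-sensitive randomness should come from the secrets module."""
    return secrets.token_hex(16)
```

The justification sits on its own comment line so that the `# nosec` comment contains only the test ID.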

## Conclusion

* Bandit is a **lightweight and powerful tool** for Python security analysis.
* Installing it with **uv** keeps dependencies clean and isolated.
* Integrating Bandit with **pre-commit** helps keep insecure code out of your git history (hooks can be skipped with `git commit --no-verify`, so keep the CI scan as a backstop).
* Adding Bandit to **CI/CD pipelines** provides automated and repeatable security checks.