Global Architecture

Authors: Cao Tri DO <cao-tri.do@keyrus.com>

Version: 2025-09

Objectives

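This article gives an overview of the solution's global architecture, from raw data ingestion through model training and registration to deployment and monitoring. The diagram below shows the main stages and the tooling that supports them.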

graph TD
    A["📦 Data Source (Kaggle / Local)"] -->|📤 Upload| B["🗄️ Databricks Volume (Raw Data)"]
    B -->|"🧹 Cleaning & Preprocessing<br>(scripts/run_cleanup_data.py, data/cleanup.py)"| C["🧪 Data Processing (Feature Engineering)"]
    C -->|✂️ Train / Test Split| D["🧠 MLflow Experiment<br>(train_register_model.py)"]
    D -->|📊 Track Experiments & Metrics| E["📈 MLflow Tracking Server"]
    E -->|🏷️ Model Versioning| F["📚 MLflow Model Registry"]
    F -->|🚀 Deployment & Integration| G["☁️ Databricks Workspace (Dev / Acc / Prod)"]
    G -->|📊 Monitoring & Reporting| H["📉 Dashboard / Visualization (vizualization/)"]

    subgraph "🧰 Tooling & Environment"
        I["🧩 Devbox + UV + Taskfile (Reproducibility & Environments)"]
        J["⚙️ GitHub / GitLab CI-CD (Automated CI/CD)"]
        K["🧼 Pre-commit / Ruff / Commitizen (Code Quality & Standardization)"]
    end

    I --> G
    J --> G
    K --> G

    style A fill:#e6f7ff,stroke:#007acc,stroke-width:2px
    style B fill:#e6f7ff,stroke:#007acc,stroke-width:2px
    style C fill:#e6f7ff,stroke:#007acc,stroke-width:2px
    style D fill:#fff2cc,stroke:#f1c232,stroke-width:2px
    style E fill:#fff2cc,stroke:#f1c232,stroke-width:2px
    style F fill:#d9ead3,stroke:#6aa84f,stroke-width:2px
    style G fill:#d9ead3,stroke:#6aa84f,stroke-width:2px
    style H fill:#d9ead3,stroke:#6aa84f,stroke-width:2px
    style I fill:#f9cb9c,stroke:#e69138,stroke-width:2px
    style J fill:#f9cb9c,stroke:#e69138,stroke-width:2px
    style K fill:#f9cb9c,stroke:#e69138,stroke-width:2px
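
The first half of the diagram (cleaning, train / test split, training, metric computation) can be illustrated with a minimal, standard-library-only sketch. Every name here (`clean`, `train_test_split`, `train`, `evaluate`, `run_pipeline`) is hypothetical and chosen for illustration; none of it reflects the actual contents of `scripts/run_cleanup_data.py`, `data/cleanup.py`, or `train_register_model.py`. The "model" is a deliberately trivial constant predictor, and the returned metrics dictionary only stands in for what would presumably be logged to the MLflow Tracking Server in the real project.

```python
import random
from statistics import mean


def clean(rows):
    """Hypothetical cleaning step: drop rows that contain missing values."""
    return [r for r in rows if None not in r.values()]


def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle with a fixed seed, then cut into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def train(train_rows):
    """Toy 'model': always predict the mean target seen during training."""
    return {"prediction": mean(r["y"] for r in train_rows)}


def evaluate(model, test_rows):
    """Mean absolute error of the constant prediction on the test set."""
    return mean(abs(r["y"] - model["prediction"]) for r in test_rows)


def run_pipeline(raw_rows):
    """Chain the stages in the same order as the diagram."""
    cleaned = clean(raw_rows)
    train_rows, test_rows = train_test_split(cleaned)
    model = train(train_rows)
    # In the real project these values would be tracked as experiment metrics.
    return {
        "n_train": len(train_rows),
        "n_test": len(test_rows),
        "mae": evaluate(model, test_rows),
    }


# Eight valid rows plus one row with a missing target.
raw = [{"x": i, "y": 2.0 * i} for i in range(8)] + [{"x": 8, "y": None}]
result = run_pipeline(raw)
print(result)
```

Keeping each stage a plain function with explicit inputs and outputs mirrors the diagram's one-directional flow and makes each step testable in isolation; the fixed seed keeps the split reproducible across runs.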