Global Architecture
- Authors:
Cao Tri DO <cao-tri.do@keyrus.com>
- Version:
2025-09
Objectives
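This article gives an overview of the solution's global architecture: how data moves from the source through cleaning, training, and model registration to deployment and monitoring, and which tooling supports the workspace environments.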
graph TD
A["Data Source (Kaggle / Local)"] -->|Upload| B["Databricks Volume (Raw Data)"]
B -->|"Cleaning & Preprocessing<br>(scripts/run_cleanup_data.py, data/cleanup.py)"| C["Data Processing (Feature Engineering)"]
C -->|Train / Test Split| D["MLflow Experiment<br>(train_register_model.py)"]
D -->|Track Experiments & Metrics| E["MLflow Tracking Server"]
E -->|Model Versioning| F["MLflow Model Registry"]
F -->|Deployment & Integration| G["Databricks Workspace (Dev / Acc / Prod)"]
G -->|Monitoring & Reporting| H["Dashboard / Visualization (vizualization/)"]
subgraph "Tooling & Environment"
I["Devbox + UV + Taskfile (Reproducibility & Environments)"]
J["GitHub / GitLab CI-CD (Automated CI/CD)"]
K["Pre-commit / Ruff / Commitizen (Code Quality & Standardization)"]
end
I --> G
J --> G
K --> G
style A fill:#e6f7ff,stroke:#007acc,stroke-width:2px
style B fill:#e6f7ff,stroke:#007acc,stroke-width:2px
style C fill:#e6f7ff,stroke:#007acc,stroke-width:2px
style D fill:#fff2cc,stroke:#f1c232,stroke-width:2px
style E fill:#fff2cc,stroke:#f1c232,stroke-width:2px
style F fill:#d9ead3,stroke:#6aa84f,stroke-width:2px
style G fill:#d9ead3,stroke:#6aa84f,stroke-width:2px
style H fill:#d9ead3,stroke:#6aa84f,stroke-width:2px
style I fill:#f9cb9c,stroke:#e69138,stroke-width:2px
style J fill:#f9cb9c,stroke:#e69138,stroke-width:2px
style K fill:#f9cb9c,stroke:#e69138,stroke-width:2px
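The diagram above can also be read as a dependency graph. The following is a minimal sketch, not code from the project: it copies the node and edge labels from the diagram into a plain Python list and uses the standard-library `graphlib` module (Python 3.9+) to derive an order in which every upstream stage comes before the stages that depend on it. The names `EDGES`, `stage_order`, and `upstream_of` are hypothetical helpers introduced only for illustration.

```python
from graphlib import TopologicalSorter

# Edges copied from the diagram: (upstream, downstream, edge label).
# This list is an illustrative assumption, not a file from the repository.
EDGES = [
    ("Data Source", "Databricks Volume", "Upload"),
    ("Databricks Volume", "Data Processing", "Cleaning & Preprocessing"),
    ("Data Processing", "MLflow Experiment", "Train / Test Split"),
    ("MLflow Experiment", "MLflow Tracking Server", "Track Experiments & Metrics"),
    ("MLflow Tracking Server", "MLflow Model Registry", "Model Versioning"),
    ("MLflow Model Registry", "Databricks Workspace", "Deployment & Integration"),
    ("Databricks Workspace", "Dashboard / Visualization", "Monitoring & Reporting"),
    # Tooling nodes point at the workspace; the diagram gives these edges no label.
    ("Devbox + UV + Taskfile", "Databricks Workspace", ""),
    ("GitHub / GitLab CI-CD", "Databricks Workspace", ""),
    ("Pre-commit / Ruff / Commitizen", "Databricks Workspace", ""),
]


def stage_order(edges):
    """Return all nodes so that every upstream node precedes its downstream nodes."""
    sorter = TopologicalSorter()
    for upstream, downstream, _label in edges:
        # TopologicalSorter.add(node, *predecessors): downstream depends on upstream.
        sorter.add(downstream, upstream)
    return list(sorter.static_order())


def upstream_of(node, edges):
    """Return every node that `node` depends on, directly or transitively."""
    parents = {}
    for upstream, downstream, _label in edges:
        parents.setdefault(downstream, set()).add(upstream)
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    for name in stage_order(EDGES):
        print(name)
```

Calling `upstream_of("MLflow Model Registry", EDGES)` returns the five data and MLflow stages that precede the registry and none of the tooling nodes, which matches the diagram: tooling feeds only the Databricks Workspace.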