Deployment with FastAPI and Docker
Serve and containerize ML models, ready for the cloud (GCP, AWS)
Executive Summary
The ability to operationalize a machine-learning model is just as important as the model itself. In this mini-project I demonstrate how a trained XGBoost classifier (taken from my Customer Churn case study) can be (1) wrapped behind a FastAPI service and (2) containerized with Docker, ready to be shipped to Google Cloud.
“A model that never leaves the notebook never creates value.”
Architecture Overview
- FastAPI provides the /predict endpoint and auto-generated docs
- Docker ensures identical runtime environments
- Cloud Run offers zero-ops serverless hosting on Google Cloud
Repositories
1. Building the FastAPI micro-service
This repository contains the code and instructions for serving the API locally. It wraps a pre-trained XGBoost customer churn model, exposes a /predict endpoint for making predictions, and uses Pydantic for request data validation.
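To make the moving parts concrete, here is a minimal sketch of such a service. The feature names, model path, and decision threshold are illustrative assumptions, not the exact schema used in the repository:

```python
# app.py -- minimal FastAPI service around a pre-trained XGBoost churn model.
from fastapi import FastAPI
from pydantic import BaseModel
import pandas as pd
import xgboost as xgb

app = FastAPI(title="Customer Churn API")

# Load the pre-trained model once, at startup (path is illustrative).
model = xgb.XGBClassifier()
model.load_model("model/churn_xgb.json")

class ChurnRequest(BaseModel):
    # Pydantic validates types and rejects malformed payloads
    # before the request ever reaches the model.
    tenure: int
    monthly_charges: float
    total_charges: float

@app.post("/predict")
def predict(payload: ChurnRequest):
    features = pd.DataFrame([{
        "tenure": payload.tenure,
        "monthly_charges": payload.monthly_charges,
        "total_charges": payload.total_charges,
    }])
    proba = float(model.predict_proba(features)[0, 1])
    return {"churn_probability": proba, "churn": proba >= 0.5}

# Run locally with: uvicorn app:app --reload
```

Because the endpoint is declared with a Pydantic model, FastAPI also generates interactive documentation for it at /docs without any extra code.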
2. Containerization with Docker
This repository packages the previous application (the customer churn API) into a Docker image, ready to be shipped to the cloud (e.g. Google Cloud, AWS).
3. Google Cloud deployment
This model is currently being served with Cloud Run on Google Cloud and can be accessed here.
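Once deployed, the service is just an HTTP endpoint. A hedged example of calling it with Python's requests library (the URL below is a placeholder, not the live service address, and the payload follows the illustrative schema from the sketch above):

```python
import requests

# Placeholder Cloud Run URL -- substitute the actual service address.
URL = "https://churn-api-xxxxx-uc.a.run.app/predict"

payload = {"tenure": 12, "monthly_charges": 70.5, "total_charges": 846.0}

response = requests.post(URL, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # e.g. {'churn_probability': 0.42, 'churn': False}
```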
Key Takeaways
- Reproducibility: a single source of truth (the Dockerfile) recreates the exact same runtime environment on any machine.
- Cloud portability: although demonstrated on Google Cloud Run, the same container can be deployed to AWS, Azure, or on-prem Kubernetes.
- Framework agnostic: the same approach applies to other ML frameworks (PyTorch, TensorFlow).
Next steps
- Deploy the same model using a managed platform like SageMaker, Vertex AI, or Azure ML endpoints.
- Add basic monitoring (logging predictions); one possible starting point is sketched after this list.
- Set up a CI/CD pipeline using GitHub Actions to rebuild and redeploy the container on code changes.
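As a rough sketch of what that monitoring step could look like, request-level logging can be added with a FastAPI middleware (hypothetical code, not taken from the repository; the prediction inputs and outputs themselves would be logged inside the /predict handler):

```python
import logging
import time

from fastapi import FastAPI, Request

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("churn-api")

app = FastAPI(title="Customer Churn API")

@app.middleware("http")
async def log_requests(request: Request, call_next):
    # Log every request with its status code and latency.
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s %s -> %s (%.1f ms)",
                request.method, request.url.path,
                response.status_code, elapsed_ms)
    return response
```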