liam-cawley/cawley


Published on 6/23/2025

Productization Rule

Rules

Build & Development Commands

- Use docker-compose up --build to build and run the app locally.
- Ensure requirements.txt is up to date with all Python dependencies.
- Use python -m app.api to run the API locally for testing.
- Swagger UI is served from openapi.yaml and should reflect all available endpoints.
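One way to catch a stale requirements.txt is to compare its pins against the installed environment. The sketch below uses only importlib.metadata from the standard library. The function names are hypothetical, and it checks only exact name==version pins; ranges, URLs and -r/-e lines are skipped. It does not detect imports that are missing from the file. Regenerating the file with pip freeze is the usual way to capture those.

```python
"""Hypothetical helper: report pinned requirements that are missing or drifted."""
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(requirements_path):
    """Return {package: pinned_version} for 'name==version' lines."""
    pins = {}
    for raw in Path(requirements_path).read_text().splitlines():
        # Drop comments and environment markers such as '; python_version < "3.11"'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # ranges, URLs and -r/-e lines are out of scope for this sketch
        name, pinned = line.split("==", 1)
        # Strip extras: 'pkg[extra]' is installed as 'pkg'.
        pins[name.split("[", 1)[0].strip()] = pinned.strip()
    return pins


def find_drift(requirements_path):
    """Compare pins with the installed environment.

    Returns (package, pinned, installed) tuples; installed is None when the
    package is not installed at all.
    """
    problems = []
    for name, pinned in parse_pins(requirements_path).items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems
```

Calling find_drift("requirements.txt") inside the built Docker image is a quick way to confirm that the image matches the pins.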

- /predict: Accepts a test dataset (e.g., JSON or file upload) and returns model predictions.
- /train: Accepts a dataset and optional parameters to train a new model.
- /models:
  - GET: Lists available models in the local registry.
  - POST: Uploads a new model to the registry.
- /status/<job_id>: Returns the status of a training job (e.g., pending, running, completed, failed).
- All endpoints must be documented in openapi.yaml for DART UI integration.
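The rules do not name a web framework, so this sketch stays framework-agnostic. A route table maps (method, path) pairs to handler functions, and dispatch returns a status code plus a JSON-serializable dict. Several details are assumptions: the HTTP methods for /predict and /train (POST here), the status codes, and the in-memory MODELS and JOBS dicts, which are hypothetical stand-ins for model_registry.py and jobs.py. Every handler body is a placeholder.

```python
"""Framework-agnostic routing sketch for the endpoints listed above."""
import re
import uuid

MODELS = {}  # name -> metadata; hypothetical stand-in for model_registry.py
JOBS = {}    # job_id -> status; hypothetical stand-in for jobs.py


def handle_predict(body):
    rows = body.get("data")
    if not isinstance(rows, list):
        return 400, {"error": "expected a JSON body with a 'data' list"}
    # Placeholder output; a real handler would delegate to inference.py.
    return 200, {"predictions": [0 for _ in rows]}


def handle_train(body):
    job_id = uuid.uuid4().hex
    JOBS[job_id] = "pending"  # jobs.py would run training in the background
    return 202, {"job_id": job_id, "status": "pending"}


def handle_list_models(body):
    return 200, {"models": sorted(MODELS)}


def handle_upload_model(body):
    name = body.get("name")
    if not name:
        return 400, {"error": "missing 'name'"}
    MODELS[name] = {k: v for k, v in body.items() if k != "name"}
    return 201, {"name": name}


def handle_status(body, job_id):
    if job_id not in JOBS:
        return 404, {"error": f"unknown job {job_id}"}
    return 200, {"job_id": job_id, "status": JOBS[job_id]}


ROUTES = [
    ("POST", re.compile(r"^/predict$"), handle_predict),
    ("POST", re.compile(r"^/train$"), handle_train),
    ("GET", re.compile(r"^/models$"), handle_list_models),
    ("POST", re.compile(r"^/models$"), handle_upload_model),
    ("GET", re.compile(r"^/status/(?P<job_id>[^/]+)$"), handle_status),
]


def dispatch(method, path, body=None):
    """Return (status_code, json_dict) for a request."""
    body = body or {}
    path_matched = False
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if not match:
            continue
        path_matched = True
        if route_method == method:
            # Named groups such as job_id become keyword arguments.
            return handler(body, **match.groupdict())
    if path_matched:
        return 405, {"error": "method not allowed"}
    return 404, {"error": "not found"}
```

Whatever framework app/api.py ends up using, a single route table like this is easy to compare against openapi.yaml.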

- Models are stored in the models/ directory with metadata (e.g., name, version, date).
- model_registry.py must handle model lookup, registration, and versioning.
- No cloud storage is used; all models are stored locally or on a private server.
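This is a minimal sketch of model_registry.py that uses only the standard library. The rules require only lookup, registration, and versioning, with name, version, and date metadata stored locally. Everything else here is an assumption: the on-disk layout (models/<name>/<version>/ holding model.bin and metadata.json), the integer version numbers, and the class and method names.

```python
"""Hypothetical model_registry.py sketch: a local, file-based registry."""
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


class ModelRegistry:
    def __init__(self, root="models"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def versions(self, name):
        """Return the integer versions registered for a model, ascending."""
        model_dir = self.root / name
        if not model_dir.is_dir():
            return []
        return sorted(int(p.name) for p in model_dir.iterdir()
                      if p.is_dir() and p.name.isdecimal())

    def register(self, name, artifact_path, **extra):
        """Copy an artifact into the registry as the next version."""
        existing = self.versions(name)
        version = existing[-1] + 1 if existing else 1
        target = self.root / name / str(version)
        target.mkdir(parents=True)  # raises FileExistsError instead of overwriting
        shutil.copy2(artifact_path, target / "model.bin")
        # Core fields come last so extra metadata cannot overwrite them.
        metadata = {
            **extra,
            "name": name,
            "version": version,
            "date": datetime.now(timezone.utc).isoformat(),
        }
        (target / "metadata.json").write_text(json.dumps(metadata, indent=2))
        return metadata

    def lookup(self, name, version=None):
        """Return (artifact_path, metadata) for a version, or the latest if None."""
        available = self.versions(name)
        if not available:
            raise KeyError(f"no model named {name!r}")
        if version is None:
            version = available[-1]
        elif version not in available:
            raise KeyError(f"{name!r} has no version {version}")
        target = self.root / name / str(version)
        metadata = json.loads((target / "metadata.json").read_text())
        return target / "model.bin", metadata

    def list_models(self):
        """Return metadata for the latest version of every model."""
        return [self.lookup(p.name)[1] for p in sorted(self.root.iterdir())
                if p.is_dir() and self.versions(p.name)]
```

This sketch does not coordinate concurrent registrations of the same name. The second mkdir fails with FileExistsError rather than overwriting an existing version.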

- inference.py should load the latest or a specified model from the registry and run predictions.
- training.py should support training from scratch or fine-tuning, and save the resulting model to the registry.
- Long-running training jobs should be handled asynchronously via jobs.py.
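To illustrate the split between training.py and inference.py, here is a toy sketch. The model is a one-variable linear regressor fitted by gradient descent and saved as JSON files named <name>-v<version>.json in a flat directory. The model, file naming, and function signatures are all hypothetical. The point is the control flow: base_version=None trains from scratch, a given base_version fine-tunes from saved weights, and predict loads either a specified version or the latest one.

```python
"""Toy sketch of training.py / inference.py around a versioned model store."""
import json
from pathlib import Path


def _versions(registry, name):
    """Return saved version numbers for a model, ascending."""
    found = []
    for p in Path(registry).glob(f"{name}-v*.json"):
        suffix = p.stem[len(name) + 2:]  # text after '<name>-v'
        if suffix.isdecimal():
            found.append(int(suffix))
    return sorted(found)


def train(xs, ys, registry, name, base_version=None, epochs=500, lr=0.05):
    """Fit y = w*x + b by gradient descent and save it as a new version.

    base_version=None trains from scratch; otherwise training starts from the
    saved weights of that version (fine-tuning).
    """
    registry = Path(registry)
    registry.mkdir(parents=True, exist_ok=True)
    if base_version is None:
        w, b = 0.0, 0.0
    else:
        params = json.loads((registry / f"{name}-v{base_version}.json").read_text())
        w, b = params["w"], params["b"]
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    existing = _versions(registry, name)
    version = existing[-1] + 1 if existing else 1
    (registry / f"{name}-v{version}.json").write_text(json.dumps({"w": w, "b": b}))
    return version


def predict(xs, registry, name, version=None):
    """Load the specified version (or the latest) and return predictions."""
    if version is None:
        available = _versions(registry, name)
        if not available:
            raise FileNotFoundError(f"no saved versions of {name!r}")
        version = available[-1]
    params = json.loads((Path(registry) / f"{name}-v{version}.json").read_text())
    return [params["w"] * x + params["b"] for x in xs]
```

In the project layout described above, train would register through model_registry.py and be submitted via jobs.py rather than called directly from a request handler.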

- Use jobs.py to manage background tasks (e.g., training).
- Each job should have a unique ID and status tracking.
- Consider using threading, multiprocessing, or a lightweight queue such as RQ.
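Here is a sketch of jobs.py built on the standard library's ThreadPoolExecutor, one of the threading options the rules mention. Job records live in an in-memory dict guarded by a lock. Each job moves through the pending, running, completed, and failed states that /status/<job_id> reports. The JobManager name and its methods are assumptions.

```python
"""Hypothetical jobs.py sketch: background jobs on threads with status tracking."""
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobManager:
    def __init__(self, max_workers=2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._jobs = {}  # job_id -> {"status": ..., "result": ..., "error": ...}

    def _set(self, job_id, **fields):
        with self._lock:
            self._jobs[job_id].update(fields)

    def _run(self, job_id, fn, args, kwargs):
        self._set(job_id, status="running")
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:  # record the failure instead of losing it in the thread
            self._set(job_id, status="failed", error=repr(exc))
        else:
            self._set(job_id, status="completed", result=result)

    def submit(self, fn, *args, **kwargs):
        """Queue fn(*args, **kwargs) and return its job ID immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "result": None, "error": None}
        self._executor.submit(self._run, job_id, fn, args, kwargs)
        return job_id

    def status(self, job_id):
        """Return a copy of the job record, or None for an unknown ID."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job else None

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```

Threads are adequate when the training code releases the GIL, as many numerical libraries do. However, job state here is lost on restart and is not shared across worker processes. Those limits are where a queue such as RQ becomes the better fit.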

- Keep README.md updated with setup, usage, and endpoint examples.
- Ensure openapi.yaml stays synchronized with actual API behavior.
- Document model formats, expected input/output, and training parameters.
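One way to keep openapi.yaml synchronized with the API is a check that diffs the implemented routes against the spec's paths. The sketch takes the spec as an already-parsed dict; PyYAML's yaml.safe_load is a common way to produce it. It also converts the <job_id> route notation used in these rules into OpenAPI's {job_id} path templating. The function names are hypothetical.

```python
"""Hypothetical check that implemented routes and openapi.yaml agree."""
import re

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def to_openapi_path(route):
    """Convert '/status/<job_id>' (or '<int:job_id>') to '/status/{job_id}'."""
    return re.sub(r"<(?:[^:<>]+:)?([^<>]+)>", r"{\1}", route)


def spec_operations(spec):
    """Return {(METHOD, path)} for every operation declared in the spec."""
    ops = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold 'parameters', 'summary', etc.; skip those.
            if key.lower() in HTTP_METHODS:
                ops.add((key.upper(), path))
    return ops


def diff_routes(routes, spec):
    """routes: iterable of (METHOD, route). Returns (undocumented, unimplemented)."""
    implemented = {(m.upper(), to_openapi_path(r)) for m, r in routes}
    documented = spec_operations(spec)
    return sorted(implemented - documented), sorted(documented - implemented)
```

Running this in CI keeps new endpoints from shipping without an openapi.yaml entry, which in turn keeps the DART UI integration accurate.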

- Use the provided Dockerfile and docker-compose.yml for reproducible builds.
- Ensure all paths in config.py are relative or configurable via environment variables.
- Avoid hardcoding file paths or secrets.
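Two small helpers illustrate the rule against hardcoded paths and secrets. resolve_path reads an environment variable and falls back to a relative default. It resolves relative paths against an explicit base directory rather than the working directory; config.py would typically pass Path(__file__).resolve().parent. require_secret refuses to start without a secret in the environment. The helper names and variable names such as MODEL_DIR and API_TOKEN are assumptions.

```python
"""Hypothetical config.py helpers: no hardcoded paths or secrets."""
import os
from pathlib import Path


def resolve_path(env_var, default, base_dir):
    """Return the path from env_var if set, else default.

    Relative paths are resolved against base_dir, not the current working
    directory, so the app behaves the same inside and outside Docker.
    """
    path = Path(os.environ.get(env_var, default)).expanduser()
    return path if path.is_absolute() else (Path(base_dir) / path).resolve()


def require_secret(env_var):
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; secrets must come from the environment")
    return value
```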

- Use config.py to load settings from a .env file or a YAML/JSON config file.
- Include paths for model storage, logging, and job tracking.
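Here is a minimal config.py sketch that layers settings in order: dataclass defaults, then a JSON config file, then a .env file, then the process environment. It uses only the standard library, so the .env parser handles plain KEY=VALUE lines, and YAML support (for example via PyYAML) is left out. The Settings fields, file names, and upper-case environment keys are assumptions.

```python
"""Hypothetical config.py sketch: defaults < JSON file < .env file < process env."""
import json
import os
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    model_dir: str = "models"  # model storage
    log_dir: str = "logs"      # logging
    jobs_dir: str = "jobs"     # job tracking


def read_dotenv(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    p = Path(path)
    if not p.exists():
        return values
    for raw in p.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(json_path="config.json", env_path=".env"):
    """Merge all sources; later sources override earlier ones."""
    merged = {}
    json_file = Path(json_path)
    if json_file.exists():
        merged.update(json.loads(json_file.read_text()))
    # Environment keys are upper-case field names, e.g. MODEL_DIR for model_dir.
    env = {**read_dotenv(env_path), **os.environ}
    for f in fields(Settings):
        if f.name.upper() in env:
            merged[f.name] = env[f.name.upper()]
    known = {f.name for f in fields(Settings)}
    return Settings(**{k: v for k, v in merged.items() if k in known})
```

With this layering, docker-compose.yml can override any of these paths per deployment under its environment: key without code changes.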