maybe-pablo/maybes-data-science-and-machine-learning-assistant

Generate a data processing pipeline with these requirements:

Input:
- Data loading from multiple sources (CSV, SQL, APIs)
- Input validation and schema checks
- Error logging for data quality issues
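
As a rough illustration of the input requirements above, the sketch below loads rows from CSV text and from a SQL query, coerces them against a schema, and logs rejected rows. It uses only the standard library. The `SCHEMA` mapping, the `sales` table, and all function names are assumptions made for this example; an API source is left out and would plug into `validate` in the same way.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("pipeline.input")

# Hypothetical schema: column name -> callable that converts the raw value.
SCHEMA = {"id": int, "amount": float}


def load_csv(text):
    """Read CSV text into a list of dict rows (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_sql(conn, query):
    """Run a query and return rows as dicts keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def validate(rows, schema, source):
    """Coerce rows to the schema; log and drop rows that fail."""
    valid, errors = [], []
    for index, row in enumerate(rows):
        try:
            valid.append({col: cast(row[col]) for col, cast in schema.items()})
        except (KeyError, TypeError, ValueError) as exc:
            errors.append((source, index, repr(exc)))
            logger.warning("%s row %d rejected: %r", source, index, exc)
    return valid, errors


# CSV source: the second row has a non-numeric amount and is rejected.
csv_rows, csv_errors = validate(
    load_csv("id,amount\n1,9.5\n2,oops\n"), SCHEMA, "csv"
)

# SQL source: an in-memory SQLite table stands in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales VALUES (3, 4.25)")
sql_rows, sql_errors = validate(
    load_sql(conn, "SELECT id, amount FROM sales"), SCHEMA, "sql"
)
conn.close()
```

Returning the error list alongside the valid rows lets a later stage decide whether the rejection rate is acceptable instead of failing on the first bad row.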

Processing:
- Standardized cleaning (missing values, outliers, data types)
- Memory-efficient operations for large datasets
- Numerical transformations using NumPy
- Feature engineering and aggregations
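
One possible shape for the processing requirements above, assuming NumPy is installed. Median imputation, the 1.5 * IQR clipping rule, and the `log1p` feature are illustrative choices, not ones the requirements prescribe; `clean_column` and `chunked_mean` are hypothetical names.

```python
import numpy as np


def clean_column(values):
    """Fill NaNs with the median, then clip outliers to the 1.5 * IQR fences."""
    values = np.asarray(values, dtype=np.float64)
    median = np.nanmedian(values)
    filled = np.where(np.isnan(values), median, values)
    q1, q3 = np.percentile(filled, [25, 75])
    iqr = q3 - q1
    return np.clip(filled, q1 - 1.5 * iqr, q3 + 1.5 * iqr)


def chunked_mean(chunks):
    """Mean over an iterable of arrays without concatenating them in memory."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += float(np.sum(chunk))
        count += chunk.size
    return total / count if count else float("nan")


raw = np.array([1.0, 2.0, np.nan, 3.0, 4.0, 100.0])
cleaned = clean_column(raw)          # NaN imputed, 100.0 clipped to the fence
log_feature = np.log1p(cleaned)      # simple numerical transformation
mean = chunked_mean(np.array_split(cleaned, 3))  # aggregation over chunks
```

Because `chunked_mean` accepts any iterable of arrays, it can also consume a generator that reads one chunk of a large file at a time, which keeps peak memory bounded by the chunk size.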

Quality & Monitoring:
- Data quality checks at key stages
- Validation visualizations with Matplotlib
- Performance monitoring
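
A minimal sketch of the quality and monitoring requirements above, assuming NumPy for the checks and Matplotlib for the optional plot. The `TIMINGS` registry, the 0.3 null-rate threshold, and the function names are assumptions made for this example. Matplotlib is imported inside `plot_null_rates` so that the checks still run where no plotting library is installed.

```python
import time
from functools import wraps

import numpy as np

TIMINGS = {}  # step name -> elapsed seconds of the most recent call


def timed(func):
    """Record the wall-clock time of each call in TIMINGS."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TIMINGS[func.__name__] = time.perf_counter() - start
    return wrapper


@timed
def null_rates(columns):
    """Fraction of NaN values per column."""
    return {name: float(np.mean(np.isnan(col))) for name, col in columns.items()}


def failed_checks(rates, max_null_rate=0.3):
    """Names of columns whose null rate exceeds the threshold."""
    return sorted(name for name, rate in rates.items() if rate > max_null_rate)


def plot_null_rates(rates, path):
    """Save a bar chart of null rates; uses the non-interactive Agg backend."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(rates), list(rates.values()))
    ax.set_ylabel("null rate")
    fig.savefig(path)
    plt.close(fig)


columns = {
    "age": np.array([30.0, np.nan, 41.0, 25.0]),
    "income": np.array([np.nan, np.nan, 52000.0, 48000.0]),
}
rates = null_rates(columns)
failures = failed_checks(rates)
```

Calling `plot_null_rates(rates, "null_rates.png")` after a stage writes the chart to disk; in a Jupyter notebook the same figure could be displayed inline instead.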

Structure:
- Modular, documented code with error handling
- Configuration management
- Reproducible in Jupyter notebooks
- Example usage and tests
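
The structure requirements above could be approached as follows: a frozen dataclass holds configuration, each step is a plain function taking `(rows, config)`, and a small runner wraps failures with the name of the failing step. `PipelineConfig`, its fields, and the `data/sales.csv` path are hypothetical and exist only for this sketch.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings; loading them from JSON keeps runs reproducible."""
    source_path: str
    required_columns: tuple = ("id", "amount")
    random_seed: int = 42

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        if "required_columns" in data:
            data["required_columns"] = tuple(data["required_columns"])
        return cls(**data)


def drop_incomplete(rows, config):
    """Keep rows that have a non-empty value for every required column."""
    return [
        row for row in rows
        if all(row.get(col) not in (None, "") for col in config.required_columns)
    ]


def run_pipeline(rows, config, steps):
    """Apply each step in order; report which step failed if one raises."""
    for step in steps:
        try:
            rows = step(rows, config)
        except Exception as exc:
            raise RuntimeError(f"step {step.__name__!r} failed") from exc
    return rows


def test_drop_incomplete():
    """Example unit test; a runner such as pytest would collect it by name."""
    config = PipelineConfig(source_path="unused.csv")
    rows = [{"id": 1, "amount": 2.0}, {"id": 2, "amount": None}]
    assert drop_incomplete(rows, config) == [{"id": 1, "amount": 2.0}]


# Example usage: configuration from JSON, one step, two input rows.
config = PipelineConfig.from_json('{"source_path": "data/sales.csv", "random_seed": 7}')
result = run_pipeline([{"id": 1, "amount": 3.5}, {"id": 2}], config, [drop_incomplete])
test_drop_incomplete()
```

Keeping every step to the same `(rows, config)` signature means steps can be reordered, tested in isolation, or run cell by cell in a notebook without changing the runner.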

The user has provided the following information:

Context

@code

Reference specific functions or classes from throughout your project

@docs

Reference the contents from any documentation site

@diff

Reference all of the changes you've made to your current branch

@terminal

Reference the last command you ran in your IDE's terminal and its output

@problems

Reference the problems reported for the current file

@folder

Reference the most relevant snippets from a single folder, using the same retrieval mechanism as @codebase

@codebase

Reference the most relevant snippets from your codebase

Data

No Data configured

MCP Servers

No MCP Servers configured