qredence/zacharys-data-science-and-machine-learning-assistant

public

Published on 2/28/2025

Zachary's Data science and machine learning Assistant

Specialized in data science and ML, focusing on the Python scientific stack, statistical analysis, and model development.

Models

Claude 3.7 Sonnet (anthropic): chat, edit · 200k input · 8.192k output

Claude 3.5 Haiku (anthropic): apply, summarize, chat · 200k input · 8.192k output

Codestral (mistral): autocomplete

Voyage AI rerank-2 (voyage): rerank

voyage-code-3 (voyage): embed

Rules

Data science & machine learning rules

You are an experienced data scientist who specializes in Python-based
data science and machine learning. You use the following tools:
- Python 3 as the primary programming language
- PyTorch for deep learning and neural networks
- NumPy for numerical computing and array operations
- Pandas for data manipulation and analysis
- Jupyter for interactive development and visualization
- Conda for environment and package management
- Matplotlib for data visualization and plotting
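A quick way to confirm that an environment matches this tool list is to check which of the importable libraries are present. The sketch below is an editorial illustration, not part of the published rules; `check_stack` and `PACKAGES` are hypothetical names, and it only checks the Python packages (`torch`, `numpy`, `pandas`, `matplotlib`), since Jupyter and Conda are typically used as applications rather than imported.

```python
import importlib
import platform

# Hypothetical list: the importable libraries from the tool list above.
PACKAGES = ["torch", "numpy", "pandas", "matplotlib"]


def check_stack(packages=PACKAGES):
    """Map each package name to its version string, or None if it cannot be imported."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed in the active (e.g. Conda) environment
        else:
            # Most scientific packages expose __version__; fall back if one does not.
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling `check_stack()` in the first cell of a Jupyter notebook gives a record of the Python and library versions the rest of the notebook ran against.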

Docs

Conda: https://docs.conda.io/en/latest/

Jupyter: https://docs.jupyter.org/en/latest/

Matplotlib: https://matplotlib.org/stable/

NumPy: https://numpy.org/doc/stable/

Pandas: https://pandas.pydata.org/docs/

Python: https://docs.python.org/3/

PyTorch: https://pytorch.org/docs/stable/index.html

Prompts

Exploratory Data Analysis

Initial data exploration and key insights

Create an exploratory data analysis workflow that includes:

Data Overview:
- Basic statistics (mean, median, std, quartiles)
- Missing values and data types
- Unique value distributions

Visualizations:
- Numerical: histograms, box plots
- Categorical: bar charts, frequency plots
- Relationships: correlation matrices
- Temporal patterns (if applicable)

Quality Assessment:
- Outlier detection
- Data inconsistencies
- Value range validation

Insights & Documentation:
- Key findings summary
- Data quality issues
- Variable relationships
- Next steps recommendations
- Reproducible Jupyter notebook

The user has provided the following information:
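The Data Overview and Quality Assessment steps in the Exploratory Data Analysis prompt map onto a handful of pandas calls. The following is a minimal sketch assuming the data is already loaded into a DataFrame; `data_overview`, `iqr_outliers`, and `quick_eda` are hypothetical helper names, and the 1.5 * IQR rule is one common outlier heuristic chosen here for illustration, not something the prompt prescribes.

```python
import pandas as pd


def data_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count and share, and number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().round(3),
        "n_unique": df.nunique(),
    })


def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # DataFrame-vs-Series comparisons align the Series index with the columns;
    # NaN compares as False, so missing values are not counted as outliers.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


def quick_eda(df: pd.DataFrame) -> dict:
    """Bundle the overview, summary statistics, outlier counts and correlations."""
    numeric = df.select_dtypes(include="number")
    return {
        "overview": data_overview(df),
        "describe": numeric.describe(),  # count, mean, std, min, quartiles (50% = median), max
        "outliers": iqr_outliers(df),
        "correlations": numeric.corr(),
    }
```

For the Visualizations step, `df.hist()` and `df.boxplot()` draw Matplotlib histograms and box plots directly from a DataFrame, and the `correlations` matrix can be passed to `matplotlib.pyplot.imshow` for a heatmap.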

Data Pipeline Development

Create robust and scalable data processing pipelines

Generate a data processing pipeline with these requirements:

Input:
- Data loading from multiple sources (CSV, SQL, APIs)
- Input validation and schema checks
- Error logging for data quality issues

Processing:
- Standardized cleaning (missing values, outliers, types)
- Memory-efficient operations for large datasets
- Numerical transformations using NumPy
- Feature engineering and aggregations

Quality & Monitoring:
- Data quality checks at key stages
- Validation visualizations with Matplotlib
- Performance monitoring

Structure:
- Modular, documented code with error handling
- Configuration management
- Reproducible in Jupyter notebooks
- Example usage and tests

The user has provided the following information:
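One way to structure the Input, Processing, and Quality & Monitoring requirements of the Data Pipeline Development prompt is a small chain of stage functions driven by a config object. The sketch below assumes CSV input only (`pd.read_sql` or an HTTP client would slot into the loader for the other sources); `PipelineConfig`, its fields, and the stage functions are hypothetical names, and quantile clipping plus median imputation are illustrative cleaning choices rather than requirements from the prompt.

```python
import io
import logging
import time
from dataclasses import dataclass, field

import numpy as np
import pandas as pd

logger = logging.getLogger("pipeline")  # hypothetical logger name


@dataclass
class PipelineConfig:
    """Hypothetical configuration; all field names are illustrative."""
    required_columns: list = field(default_factory=list)
    numeric_columns: list = field(default_factory=list)
    log_columns: list = field(default_factory=list)  # columns to log1p-transform
    clip_quantiles: tuple = (0.01, 0.99)             # winsorization bounds
    chunksize: int = 100_000                         # rows per CSV chunk


def load_csv(source, config: PipelineConfig) -> pd.DataFrame:
    """Read a CSV in chunks, coercing numeric columns and downcasting them where possible."""
    chunks = []
    for chunk in pd.read_csv(source, chunksize=config.chunksize):
        for col in config.numeric_columns:
            if col in chunk:
                before = chunk[col].isna().sum()
                # downcast="float" lets pandas store the column as float32 when it can.
                chunk[col] = pd.to_numeric(chunk[col], errors="coerce", downcast="float")
                n_bad = chunk[col].isna().sum() - before
                if n_bad:
                    logger.warning("%d non-numeric values in %r set to NaN", n_bad, col)
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)


def validate(df: pd.DataFrame, config: PipelineConfig) -> pd.DataFrame:
    """Schema check plus a missing-value report, logged for data quality review."""
    missing = [c for c in config.required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for col, n in df.isna().sum().items():
        if n:
            logger.info("quality check: %r has %d missing values", col, n)
    return df


def clean(df: pd.DataFrame, config: PipelineConfig) -> pd.DataFrame:
    """Drop duplicate rows, clip numeric outliers to quantiles, impute with the median."""
    df = df.drop_duplicates().copy()  # copy so later column assignments don't warn
    lo_q, hi_q = config.clip_quantiles
    for col in config.numeric_columns:
        lo, hi = df[col].quantile([lo_q, hi_q])
        clipped = df[col].clip(lo, hi)
        df[col] = clipped.fillna(clipped.median())
    return df


def add_features(df: pd.DataFrame, config: PipelineConfig) -> pd.DataFrame:
    """NumPy transform example: log1p of the non-negative part of each listed column."""
    for col in config.log_columns:
        df[f"log_{col}"] = np.log1p(df[col].clip(lower=0))
    return df


def run_pipeline(source, config: PipelineConfig) -> pd.DataFrame:
    """Run load -> validate -> clean -> features, logging row counts and timings."""
    start = time.perf_counter()
    df = load_csv(source, config)
    logger.info("load: %d rows in %.3fs", len(df), time.perf_counter() - start)
    for name, stage in [("validate", validate), ("clean", clean), ("features", add_features)]:
        start = time.perf_counter()
        df = stage(df, config)
        logger.info("%s: %d rows in %.3fs", name, len(df), time.perf_counter() - start)
    return df


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    data = io.StringIO("id,amount\n1,10\n2,\n3,abc\n4,1000\n")
    cfg = PipelineConfig(required_columns=["id", "amount"], numeric_columns=["amount"],
                         log_columns=["amount"], chunksize=2)
    print(run_pipeline(data, cfg))
```

Coercion and downcasting happen per chunk, but the clipping bounds and medians are computed on the concatenated frame, so every row is cleaned with the same statistics. Matplotlib validation plots and unit tests would sit on top of the individual stage functions, since each one takes and returns a DataFrame.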

Context

@code

Reference specific functions or classes from throughout your project

@docs

Reference the contents from any documentation site

@diff

Reference all of the changes you've made to your current branch

@terminal

Reference the last command you ran in your IDE's terminal and its output

@problems

Get problems from the current file

@folder

Uses the same retrieval mechanism as @codebase, but only on a single folder

@codebase

Reference the most relevant snippets from your codebase

Data

No Data configured

MCP Servers

No MCP Servers configured