zebulun-woods/aialgo


public

Published on 4/26/2025

ai-database-03


Models

Claude 3.7 Sonnet

anthropic

200k input tokens · 8,192 output tokens

Claude 3.5 Sonnet

anthropic

200k input tokens · 8,192 output tokens

Gemini 2.5 Pro

gemini

1,048k input tokens · 65,536 output tokens

MCP Servers

No MCP Servers configured

Rules

Data science & machine learning rules

You are an experienced data scientist who specializes in Python-based
data science and machine learning. You use the following tools:
- Python 3 as the primary programming language
- PyTorch for deep learning and neural networks
- NumPy for numerical computing and array operations
- Pandas for data manipulation and analysis
- Jupyter for interactive development and visualization
- Conda for environment and package management
- Matplotlib for data visualization and plotting

- You are a PyTorch ML engineer
- Use type hints consistently
- Prioritize readability over premature optimization
- Write modular code, using separate files for models, data loading, training, and evaluation
- Follow the PEP 8 style guide for Python code
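To make the type-hint and modular-file rules concrete, here is a minimal sketch of what a separate data-loading module might look like. The file name `data.py`, the class `TabularDataset`, and the helper `make_loader` are hypothetical, and the sketch assumes the data already sits in memory as cleaned NumPy arrays.

```python
# data.py (hypothetical file name): data loading kept separate from model and training code.
from __future__ import annotations

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class TabularDataset(Dataset):
    """In-memory dataset wrapping a 2-D feature matrix and a target vector.

    Assumes features and targets are already-cleaned NumPy arrays.
    """

    def __init__(self, features: np.ndarray, targets: np.ndarray) -> None:
        if features.ndim != 2:
            raise ValueError(f"features must be 2-D, got shape {features.shape}")
        if len(features) != len(targets):
            raise ValueError("features and targets must have the same length")
        self.features = torch.as_tensor(features, dtype=torch.float32)
        self.targets = torch.as_tensor(targets, dtype=torch.float32)

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
        return self.features[idx], self.targets[idx]


def make_loader(
    features: np.ndarray, targets: np.ndarray, batch_size: int = 32, shuffle: bool = True
) -> DataLoader:
    """Build a DataLoader over a TabularDataset (hypothetical helper)."""
    return DataLoader(TabularDataset(features, targets), batch_size=batch_size, shuffle=shuffle)
```

Validating shapes in `__init__` surfaces bad inputs at construction time rather than as an indexing error deep inside a training run.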

Docs

torch.nn Docs: https://pytorch.org/docs/stable/nn.html

Pandas: https://pandas.pydata.org/docs/

NumPy: https://numpy.org/doc/stable/

PyTorch: https://pytorch.org/docs/stable/index.html

Prompts

RAG Pipeline Design

Comprehensive retrieval-augmented generation system design

Design a RAG (Retrieval-Augmented Generation) system with:

Document Processing:
- Text extraction strategy
- Chunking approach with size and overlap parameters
- Metadata extraction and enrichment
- Document hierarchy preservation

Vector Store Integration:
- Embedding model selection and rationale
- Vector database architecture
- Indexing strategy
- Query optimization

Retrieval Strategy:
- Hybrid search (vector + keyword)
- Re-ranking methodology
- Metadata filtering capabilities
- Multi-query reformulation

LLM Integration:
- Context window optimization
- Prompt engineering for retrieval
- Citation and source tracking
- Hallucination mitigation strategies

Evaluation Framework:
- Retrieval relevance metrics
- Answer accuracy measures
- Ground truth comparison
- End-to-end benchmarking

Deployment Architecture:
- Caching strategies
- Scaling considerations
- Latency optimization
- Monitoring approach

The user's knowledge base has the following characteristics:
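As a rough illustration of two items from the RAG prompt, chunking with size and overlap parameters and hybrid (vector + keyword) search, here is a minimal sketch. It assumes text is already tokenized into a list of strings with sizes counted in tokens, and it merges the vector and keyword result lists with reciprocal rank fusion (RRF), which is one common fusion choice rather than anything the prompt specifies. The function names are hypothetical.

```python
"""Sketch: fixed-size chunking with overlap, plus reciprocal rank fusion for hybrid search."""
from __future__ import annotations


def chunk_text(tokens: list[str], chunk_size: int = 200, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end; avoid a redundant tail chunk
    return chunks


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs (e.g. vector and keyword results).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1; k=60 is a commonly used default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales. A document ranked moderately well by both retrievers tends to beat one ranked highly by only one.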

Data Pipeline Development

Create robust and scalable data processing pipelines

Generate a data processing pipeline with these requirements:

Input:
- Data loading from multiple sources (CSV, SQL, APIs)
- Input validation and schema checks
- Error logging for data quality issues

Processing:
- Standardized cleaning (missing values, outliers, types)
- Memory-efficient operations for large datasets
- Numerical transformations using NumPy
- Feature engineering and aggregations

Quality & Monitoring:
- Data quality checks at key stages
- Validation visualizations with Matplotlib
- Performance monitoring

Structure:
- Modular, documented code with error handling
- Configuration management
- Reproducible in Jupyter notebooks
- Example usage and tests

The user has provided the following information:
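For the "Input validation and schema checks" and "Standardized cleaning" items of the pipeline prompt, a validate-then-clean stage might look like the sketch below. `EXPECTED_SCHEMA`, `validate_schema`, and `clean` are hypothetical names; median imputation and z-score clipping are one possible reading of "standardized cleaning", not something the prompt prescribes.

```python
"""Sketch of a validate-then-clean stage for a tabular pipeline (pandas + NumPy)."""
from __future__ import annotations

import logging

import numpy as np
import pandas as pd

logger = logging.getLogger("pipeline")

# Hypothetical expected schema: column name -> pandas dtype string.
EXPECTED_SCHEMA: dict[str, str] = {"user_id": "int64", "amount": "float64"}


def validate_schema(df: pd.DataFrame, schema: dict[str, str]) -> list[str]:
    """Return schema problems (missing columns, dtype mismatches) and log each one."""
    problems: list[str] = []
    for col, dtype in schema.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    for msg in problems:
        logger.warning("schema check: %s", msg)  # error logging for data quality issues
    return problems


def clean(df: pd.DataFrame, numeric_cols: list[str], z_max: float = 3.0) -> pd.DataFrame:
    """Impute numeric NaNs with the median, then clip values beyond z_max standard deviations."""
    out = df.copy()  # never mutate the caller's frame
    for col in numeric_cols:
        values = out[col].to_numpy(dtype=np.float64)
        median = np.nanmedian(values)
        values = np.where(np.isnan(values), median, values)
        mean, std = values.mean(), values.std()
        if std > 0:
            values = np.clip(values, mean - z_max * std, mean + z_max * std)
        out[col] = values
    return out
```

Returning the list of problems, rather than raising on the first one, lets a pipeline log every issue in a batch and then decide whether to abort.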

Exploratory Data Analysis

Initial data exploration and key insights

Create an exploratory data analysis workflow that includes:

Data Overview:
- Basic statistics (mean, median, std, quartiles)
- Missing values and data types
- Unique value distributions

Visualizations:
- Numerical: histograms, box plots
- Categorical: bar charts, frequency plots
- Relationships: correlation matrices
- Temporal patterns (if applicable)

Quality Assessment:
- Outlier detection
- Data inconsistencies
- Value range validation

Insights & Documentation:
- Key findings summary
- Data quality issues
- Variable relationships
- Next steps recommendations
- Reproducible Jupyter notebook

The user has provided the following information:
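The "Data Overview" and "Outlier detection" steps of the EDA prompt could start from something like this sketch. `overview` and `iqr_outliers` are hypothetical helpers, and the 1.5 × IQR fence (Tukey's rule) is an assumed default rather than part of the prompt.

```python
"""Sketch of the data-overview and outlier-detection steps of an EDA workflow."""
from __future__ import annotations

import pandas as pd


def overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, unique count, plus describe() stats for numerics."""
    summary = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(),
        }
    )
    # describe() covers numeric columns by default: count, mean, std, min, 25%, 50% (median), 75%, max.
    numeric_stats = df.describe().T
    return summary.join(numeric_stats, how="left")  # non-numeric columns get NaN stats


def iqr_outliers(s: pd.Series, factor: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - factor * iqr) | (s > q3 + factor * iqr)
```

The IQR rule relies on quartiles rather than the mean and standard deviation, so a single extreme value cannot inflate the thresholds that are meant to catch it.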

Create a training loop

Please create a training loop following these guidelines:
- Include validation step
- Add proper device handling (CPU/GPU)
- Implement gradient clipping
- Add learning rate scheduling
- Include early stopping
- Add progress bars using tqdm
- Implement checkpointing
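A loop satisfying each guideline in this prompt might be sketched as follows. The function names `run_epoch` and `fit`, the AdamW optimizer, the `ReduceLROnPlateau` scheduler, and the MSE default loss are assumptions chosen for illustration; the `tqdm` import falls back to a no-op wrapper if the package is missing.

```python
"""Sketch of a PyTorch training loop: validation, device handling, gradient clipping,
LR scheduling, early stopping, tqdm progress bars, and checkpointing."""
from __future__ import annotations

from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader

try:
    from tqdm import tqdm
except ImportError:  # assumption: no-op fallback when tqdm is not installed
    def tqdm(iterable, **kwargs):
        return iterable


def run_epoch(model: nn.Module, loader: DataLoader, loss_fn: nn.Module, device: torch.device,
              optimizer: torch.optim.Optimizer | None = None, max_grad_norm: float = 1.0) -> float:
    """Train for one epoch if an optimizer is given, otherwise evaluate. Returns the mean loss."""
    training = optimizer is not None
    model.train(training)  # toggles dropout / batch-norm behaviour
    total, count = 0.0, 0
    with torch.set_grad_enabled(training):
        for xb, yb in tqdm(loader, desc="train" if training else "val", leave=False):
            xb, yb = xb.to(device), yb.to(device)
            loss = loss_fn(model(xb), yb)
            if training:
                optimizer.zero_grad()
                loss.backward()
                # Gradient clipping: rescale gradients so their global L2 norm is <= max_grad_norm.
                nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
                optimizer.step()
            total += loss.item() * len(xb)
            count += len(xb)
    return total / max(count, 1)


def fit(model: nn.Module, train_loader: DataLoader, val_loader: DataLoader, epochs: int = 50,
        lr: float = 1e-3, patience: int = 5, ckpt_path: str | Path = "best.pt",
        loss_fn: nn.Module | None = None) -> tuple[nn.Module, list[dict[str, float]]]:
    """Train with early stopping; return the best-checkpoint model and per-epoch losses."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    loss_fn = loss_fn if loss_fn is not None else nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # Halve the LR when validation loss stalls for 2 epochs, before early stopping kicks in.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=2)
    best_val, bad_epochs = float("inf"), 0
    history: list[dict[str, float]] = []
    for epoch in range(epochs):
        train_loss = run_epoch(model, train_loader, loss_fn, device, optimizer)
        val_loss = run_epoch(model, val_loader, loss_fn, device)
        scheduler.step(val_loss)
        history.append({"train_loss": train_loss, "val_loss": val_loss})
        if val_loss < best_val:  # checkpoint only on improvement
            best_val, bad_epochs = val_loss, 0
            torch.save({"epoch": epoch, "model_state": model.state_dict(),
                        "optimizer_state": optimizer.state_dict(), "val_loss": val_loss},
                       ckpt_path)
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # early stopping
                break
    # Restore the best weights before returning.
    model.load_state_dict(torch.load(ckpt_path, map_location=device)["model_state"])
    return model, history
```

Setting the scheduler's patience below the early-stopping patience gives the model a chance to recover at a lower learning rate before training is cut off.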

Create a new PyTorch module

Please create a new PyTorch module following these guidelines:
- Include docstrings for the model class and methods
- Add type hints for all parameters
- Add basic validation in __init__
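A module following these three guidelines might look like the sketch below. The two-layer `MLP` and its argument names are hypothetical, chosen only to show docstrings, type hints, and constructor validation together.

```python
"""Sketch of a small PyTorch module with docstrings, type hints, and __init__ validation."""
from __future__ import annotations

import torch
from torch import nn


class MLP(nn.Module):
    """Two-layer perceptron: Linear -> ReLU -> Dropout -> Linear.

    Args:
        in_features: Size of each input vector.
        hidden_features: Width of the hidden layer.
        out_features: Size of each output vector.
        dropout: Dropout probability applied after the activation, in [0, 1).
    """

    def __init__(self, in_features: int, hidden_features: int, out_features: int,
                 dropout: float = 0.0) -> None:
        super().__init__()
        # Basic validation: fail early with a clear message instead of a shape error later.
        sizes = {"in_features": in_features, "hidden_features": hidden_features,
                 "out_features": out_features}
        for name, value in sizes.items():
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive int, got {value!r}")
        if not 0.0 <= dropout < 1.0:
            raise ValueError(f"dropout must be in [0, 1), got {dropout}")
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        self.fc2 = nn.Linear(hidden_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Map a tensor of shape (..., in_features) to shape (..., out_features)."""
        return self.fc2(self.drop(self.act(self.fc1(x))))
```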

Convert module to equations

Please convert this PyTorch module to equations. Use KaTeX, surrounding each equation in double dollar signs, like $$E_1 = E_2$$. Your output should include a step-by-step explanation of what happens at each step and a very short explanation of the purpose of that step.
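As an illustration of the output format this prompt asks for, here is how a hypothetical two-layer module (Linear, then ReLU, then Linear, with dropout omitted) might be written out. The sketch assumes a column-vector input $\mathbf{x} \in \mathbb{R}^{d}$, a hidden width $h$, and an output size $o$; the per-step explanations are LaTeX comments.

```latex
% Step 1, first linear layer: project the input into the hidden space.
$$\mathbf{z}_1 = W_1 \mathbf{x} + \mathbf{b}_1, \qquad W_1 \in \mathbb{R}^{h \times d},\; \mathbf{b}_1 \in \mathbb{R}^{h}$$

% Step 2, ReLU activation: apply an element-wise non-linearity.
$$\mathbf{a} = \operatorname{ReLU}(\mathbf{z}_1) = \max(\mathbf{0}, \mathbf{z}_1)$$

% Step 3, output layer: map hidden features to the output size.
$$\mathbf{y} = W_2 \mathbf{a} + \mathbf{b}_2, \qquad W_2 \in \mathbb{R}^{o \times h},\; \mathbf{b}_2 \in \mathbb{R}^{o}$$
```

`nn.Linear` stores its weight with shape `(out_features, in_features)` and computes $x W^\top + b$ on row-vector batches, which is the same map as $W\mathbf{x} + \mathbf{b}$ for a single column vector.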

Context

Reference all of the changes you've made to your current branch

Reference the most relevant snippets from your codebase

Reference the markdown converted contents of a given URL

Uses the same retrieval mechanism as @Codebase, but only on a single folder

Reference the last command you ran in your IDE's terminal and its output

Reference specific functions or classes from throughout your project

Reference any file in your current workspace