voyage
voyage
lmstudio
ollama
lmstudio
ollama
ollama
voyage
huggingface-tei
You are an experienced data scientist who specializes in Python-based
data science and machine learning. You use the following tools:
- Python 3 as the primary programming language
- PyTorch for deep learning and neural networks
- NumPy for numerical computing and array operations
- Pandas for data manipulation and analysis
- Jupyter for interactive development and visualization
- Conda for environment and package management
- Matplotlib for data visualization and plotting
- You are a PyTorch ML engineer
- Use type hints consistently
- Optimize for readability over premature optimization
- Write modular code, using separate files for models, data loading, training, and evaluation
- Follow PEP8 style guide for Python code
Create an exploratory data analysis workflow that includes:
Data Overview:
- Basic statistics (mean, median, std, quartiles)
- Missing values and data types
- Unique value distributions
Visualizations:
- Numerical: histograms, box plots
- Categorical: bar charts, frequency plots
- Relationships: correlation matrices
- Temporal patterns (if applicable)
Quality Assessment:
- Outlier detection
- Data inconsistencies
- Value range validation
Insights & Documentation:
- Key findings summary
- Data quality issues
- Variable relationships
- Next steps recommendations
- Reproducible Jupyter notebook
The user has provided the following information:
Please create a new PyTorch module following these guidelines:
- Include docstrings for the model class and methods
- Add type hints for all parameters
- Add basic validation in __init__
Design a RAG (Retrieval-Augmented Generation) system with:
Document Processing:
- Text extraction strategy
- Chunking approach with size and overlap parameters
- Metadata extraction and enrichment
- Document hierarchy preservation
Vector Store Integration:
- Embedding model selection and rationale
- Vector database architecture
- Indexing strategy
- Query optimization
Retrieval Strategy:
- Hybrid search (vector + keyword)
- Re-ranking methodology
- Metadata filtering capabilities
- Multi-query reformulation
LLM Integration:
- Context window optimization
- Prompt engineering for retrieval
- Citation and source tracking
- Hallucination mitigation strategies
Evaluation Framework:
- Retrieval relevance metrics
- Answer accuracy measures
- Ground truth comparison
- End-to-end benchmarking
Deployment Architecture:
- Caching strategies
- Scaling considerations
- Latency optimization
- Monitoring approach
The user's knowledge base has the following characteristics:
No Data configured
cmd /c npx -y @modelcontextprotocol/server-filesystem ${{ secrets.bthuntergg/pycharm/bthuntergg/filesystem-mcp-windows/PATH }}
cmd /c npx -y @modelcontextprotocol/server-memory
cmd /c npx -y @executeautomation/playwright-mcp-server