voyage
ollama
ollama
You have a short session-based memory, so you can use the memory tools (if present) to persist/access data between sessions. Use memory to store insights, notes, and context that is especially valuable for quick access.
- You are a PyTorch ML engineer
- Use type hints consistently
- Optimize for readability over premature optimization
- Write modular code, using separate files for models, data loading, training, and evaluation
- Follow PEP8 style guide for Python code
- When writing Rust, follow Rust idioms
- When writing Rust, avoid unsafe blocks
We analyze and improve the given code according to this plan:
1. Restructure the Namespace: Organize the codebase to allow modularity and scalability.
- Break down large entities into smaller, well-clustered units.
- Extract reusable components into separate files or modules.
2. Improve Identifier Names: Use more descriptive variable and function names for clarity.
3. Enhance Code Documentation: Add meaningful comments and docstrings to explain functionality.
4. Implement Logging Best Practices: Introduce structured logging for better debugging and monitoring.
- Use JSONL format for logs.
- Define log levels (INFO, DEBUG, ERROR) for better traceability.
5. Finally: Combine all of the improvements above into a single solution.
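Item 4 of the plan (JSONL logs with INFO, DEBUG, and ERROR levels) could be sketched as follows, assuming Python's standard logging module. The names JsonlFormatter and build_logger and the chosen fields (level, logger, message) are illustrative assumptions, not part of the plan.

```python
import json
import logging
from typing import TextIO


class JsonlFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, stream: TextIO, level: int = logging.DEBUG) -> logging.Logger:
    """Return a logger that writes JSONL records to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonlFormatter())
    logger.handlers = [handler]  # replace handlers so repeated calls do not stack
    return logger
```

Each call such as logger.info("loaded %d rows", 3) then yields one self-contained JSON line, which keeps the log file easy to filter by level.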
Generate a data processing pipeline with these requirements:
Input:
- Data loading from multiple sources (CSV, SQL, APIs)
- Input validation and schema checks
- Error logging for data quality issues
Processing:
- Standardized cleaning (missing values, outliers, types)
- Memory-efficient operations for large datasets
- Numerical transformations using NumPy
- Feature engineering and aggregations
Quality & Monitoring:
- Data quality checks at key stages
- Validation visualizations with Matplotlib
- Performance monitoring
Structure:
- Modular, documented code with error handling
- Configuration management
- Reproducible in Jupyter notebooks
- Example usage and tests
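The input validation and error logging requirements above could look roughly like this. It is a minimal sketch using only the standard library and covers only the schema-check stage. The names ValidationReport and validate_rows, and the choice to represent a schema as a column-to-type mapping, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ValidationReport:
    """Hypothetical container for rows that passed and the issues that were found."""

    valid_rows: list[dict[str, Any]] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def validate_rows(rows: list[dict[str, Any]], schema: dict[str, type]) -> ValidationReport:
    """Check every row against the schema and record data quality issues."""
    report = ValidationReport()
    for index, row in enumerate(rows):
        problems: list[str] = []
        for column, expected_type in schema.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {index}: missing '{column}'")
            elif not isinstance(value, expected_type):
                problems.append(
                    f"row {index}: '{column}' is {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
        if problems:
            report.errors.extend(problems)
        else:
            report.valid_rows.append(row)
    return report
```

Rows that fail are kept out of valid_rows rather than repaired, so a later cleaning stage can decide how to treat them.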
The user has provided the following information:
Design a RAG (Retrieval-Augmented Generation) system with:
Document Processing:
- Text extraction strategy
- Chunking approach with size and overlap parameters
- Metadata extraction and enrichment
- Document hierarchy preservation
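For reference, a chunking approach with size and overlap parameters can be sketched as a character-based sliding window. This is one simple option under stated assumptions: the function name chunk_text is hypothetical, and a real design might split on tokens or sentences instead of characters.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

For example, chunk_text("abcdefghij", 4, 1) gives three chunks in which each chunk starts with the last character of the previous one.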
Vector Store Integration:
- Embedding model selection and rationale
- Vector database architecture
- Indexing strategy
- Query optimization
Retrieval Strategy:
- Hybrid search (vector + keyword)
- Re-ranking methodology
- Metadata filtering capabilities
- Multi-query reformulation
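For reference, one common way to combine vector and keyword results in hybrid search is reciprocal rank fusion (RRF). The prompt does not prescribe a fusion method, so this is a sketch of one option. The function name and the default constant k = 60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in, with rank
    starting at 1, and the scores are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because RRF uses only ranks, it does not require the vector and keyword scores to be on the same scale.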
LLM Integration:
- Context window optimization
- Prompt engineering for retrieval
- Citation and source tracking
- Hallucination mitigation strategies
Evaluation Framework:
- Retrieval relevance metrics
- Answer accuracy measures
- Ground truth comparison
- End-to-end benchmarking
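For reference, two widely used retrieval relevance metrics are recall@k and reciprocal rank, both computed against a ground truth set of relevant document ids. The sketch below is illustrative: the function names are assumptions, and the prompt itself does not fix which metrics to use.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging reciprocal_rank over a set of queries gives the mean reciprocal rank (MRR) for an end-to-end benchmark.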
Deployment Architecture:
- Caching strategies
- Scaling considerations
- Latency optimization
- Monitoring approach
The user's knowledge base has the following characteristics:
No Data configured
npx -y @modelcontextprotocol/server-memory
npx -y @modelcontextprotocol/server-filesystem ${{ secrets.ctan-dev/rustymodel/anthropic/filesystem-mcp/PATH }}
npx -y @browsermcp/mcp@latest
docker run --rm -i mcp/sequentialthinking
npx -y @modelcontextprotocol/server-github
npx -y @executeautomation/playwright-mcp-server
npx -y tavily-mcp@latest
docker run --rm -i mcp/memory