anemos-dan/pyopti
Published on 7/16/2025
Python_Optimization

Rules

Auto-Attach Rule: Optimization

Tags

  • Optimization

Optimization Rules

Logging

  • Replace print() debug statements with Python’s logging module.
    • Use:
      import logging
      logger = logging.getLogger(__name__)
      
    • Log Levels:
      • DEBUG: For intermediate values.
      • INFO: For major steps.
      • WARNING/ERROR: For issues.
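A minimal sketch of the rule above, replacing `print()` with the `logging` module; the function `scale_values` is a hypothetical example, not part of the project:

```python
import logging

logger = logging.getLogger(__name__)

def scale_values(values, factor):
    logger.info("Scaling %d values", len(values))            # INFO: major step
    if factor == 0:
        logger.warning("factor is 0; all outputs will be 0")  # WARNING: issue
    scaled = [v * factor for v in values]
    logger.debug("First scaled value: %r", scaled[:1])       # DEBUG: intermediate value
    return scaled

logging.basicConfig(level=logging.DEBUG)
scale_values([1.0, 2.0, 3.0], 2.0)
```

Note that `basicConfig` belongs in the application entry point, not in library modules; modules should only call `getLogger(__name__)`.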

Refactoring

  • Encapsulate every logical block in standalone functions:
    • Data loading
    • Preprocessing
    • Model training
    • Visualization
  • Consider splitting related functions into separate modules/packages.
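The split above can be sketched as one standalone function per logical block; all names and the stand-in bodies here are illustrative, not the project's actual pipeline:

```python
def load_data(path):
    """Data loading: read raw rows from a source (stand-in for a real reader)."""
    return [{"x": 1.0}, {"x": 2.0}]

def preprocess(rows):
    """Preprocessing: derive features from raw rows."""
    return [r["x"] * 10 for r in rows]

def train_model(features):
    """Model training: stand-in returning a trivial 'model' (the mean)."""
    return sum(features) / len(features)

def visualize(model):
    """Visualization: here just a printable summary."""
    return f"model mean = {model:.1f}"

def main(path):
    rows = load_data(path)
    features = preprocess(rows)
    model = train_model(features)
    return visualize(model)
```

Once each step is a function, moving `load_data`/`preprocess` into a separate module is a pure file move.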

Data Loading & HDF5

  • Use optimized approaches:
    • Chunked reading
    • pandas.HDFStore
  • Ensure all team members understand HDF5 concepts.
  • Best Practices for HDF5 Handling with h5py:
    • File Access Modes:
      • Use the appropriate mode to open an HDF5 file:
        • 'r': Read-only.
        • 'w': Write (overwrites existing files).
        • 'a': Read/Write (creates the file if it does not exist).
        • 'r+': Read/Write (requires the file to already exist).
    • Using Context Managers:
      • Use a with block so files are closed properly, even if an error occurs.
        with h5py.File('filename.hdf5', 'r') as f:
            data = f['dataset_name'][:]
        
    • Data Organization:
      • Use groups to organize data hierarchically.
      • Create datasets for your data that can be treated like NumPy arrays.
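A small sketch of hierarchical organization; the group and dataset names are illustrative, and the file is written to a temporary directory:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "experiment.hdf5")

with h5py.File(path, "w") as f:
    run = f.create_group("run_001")                  # group: a folder-like node
    raw = run.create_dataset("raw", data=np.arange(10))
    first_half = raw[:5]                             # datasets slice like NumPy arrays
```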
    • Chunking and Compression:
      • Use chunking for large datasets to improve performance; pick a chunk shape that matches how the data will be read.
        dataset = f.create_dataset('big_dataset', (1000, 1000), chunks=(100, 100))
        
      • Use compression to reduce file size; it is especially useful for large datasets.
        dataset = f.create_dataset('compressed_dataset', (1000, 1000), compression="gzip")
        
    • Reading and Writing Data:
      • Read and write data in blocks or slices to optimize performance and minimize memory usage.
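A sketch of block-wise reading, assuming a row-oriented 2-D dataset; the dataset name and block size are illustrative:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "blocks.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("measurements", data=np.ones((10_000, 4)))

block_size = 1_000
total = 0.0
with h5py.File(path, "r") as f:
    dset = f["measurements"]
    for start in range(0, dset.shape[0], block_size):
        block = dset[start:start + block_size]  # only one block held in memory
        total += block.sum()
```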
    • Attribute Management:
      • Use attributes to add metadata to datasets or groups.
        dataset.attrs['description'] = "This dataset contains measurement data"
        
    • Error Handling:
      • Implement appropriate error handling to manage potential issues such as missing datasets or file access errors.
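One way to sketch that error handling: check for the dataset before reading and translate low-level `OSError`s into a clearer failure. `read_dataset` is a hypothetical helper, not an h5py API:

```python
import os
import tempfile

import h5py
import numpy as np

def read_dataset(path, name):
    """Read one dataset, failing loudly on missing data or unreadable files."""
    try:
        with h5py.File(path, "r") as f:
            if name not in f:  # missing dataset
                raise KeyError(f"dataset {name!r} not found in {path}")
            return f[name][:]
    except OSError as exc:     # file missing, corrupt, or locked
        raise RuntimeError(f"cannot open {path}") from exc

path = os.path.join(tempfile.mkdtemp(), "safe.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("ok", data=np.arange(3))
```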

Parameterization

  • Dynamically generate:
    • Plot titles
    • Axis labels
    • File paths, etc.
  • Use dictionaries for:
    • Colors
    • Model names
    • File paths
    • Other configurable values.
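The rules above can be sketched as a single config dictionary from which titles, labels, colors, and paths are derived; every key and value here is a made-up example:

```python
CONFIG = {
    "model_name": "baseline",
    "metric": "accuracy",
    "colors": {"baseline": "tab:blue", "tuned": "tab:orange"},
    "out_dir": "results",
}

def plot_settings(cfg):
    """Derive plot title, axis label, color, and output path from the config."""
    return {
        "title": f"{cfg['model_name']}: {cfg['metric']}",
        "ylabel": cfg["metric"],
        "color": cfg["colors"][cfg["model_name"]],
        "path": f"{cfg['out_dir']}/{cfg['model_name']}_{cfg['metric']}.png",
    }
```

Changing `model_name` to `"tuned"` updates the title, color, and file path everywhere at once.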

Function Architecture

  • Define each function independently; avoid nested definitions.
  • Allow closures only when there’s a clear benefit (e.g., factory pattern).
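A minimal sketch of the one sanctioned closure use, the factory pattern; `make_scaler` is an illustrative name:

```python
def make_scaler(factor):
    """Factory: returns a standalone scaling function with `factor` bound in."""
    def scale(values):
        return [v * factor for v in values]
    return scale

double = make_scaler(2)
```

Everything else stays a top-level function: `scale` is only nested because the factory genuinely configures it.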

Testing

  • Automatically write pytest tests for every new function.
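A sketch of what those tests look like: pytest discovers `test_*` functions and runs their assertions. `normalize` is a hypothetical new function used only to illustrate the pattern:

```python
def normalize(values):
    """Hypothetical new function: scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def test_normalize_sums_to_one():
    assert sum(normalize([1.0, 1.0, 2.0])) == 1.0

def test_normalize_preserves_ratios():
    assert normalize([2.0, 6.0]) == [0.25, 0.75]
```

Run with `pytest` from the project root; each new function gets at least one such test file alongside it.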

Performance Considerations

  • Highlight potential bottlenecks and optimization opportunities for large datasets (> 1 million rows).
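One common remedy for such bottlenecks, sketched here under the assumption that the data arrives in chunks (e.g. from the block-wise HDF5 reads above): aggregate with running totals instead of concatenating millions of rows into one array.

```python
import numpy as np

def chunked_mean(chunks):
    """Mean over an iterable of 1-D chunks, without concatenating them."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += chunk.sum()   # one chunk in memory at a time
        count += chunk.size
    return total / count

# Simulate a large dataset arriving in pieces:
stream = (np.full(1_000, float(i)) for i in range(5))
result = chunked_mean(stream)
```

The same running-aggregate shape works for sums, counts, and min/max, which covers many >1M-row reductions.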