anemos-dan/pyopti
Published on 7/16/2025
Python_Optimization

Rules

Auto-Attach Rule: Optimization

Tags

  • Optimization

Optimization Rules

Logging

  • Replace print() debug statements with Python’s logging module.
    • Use:
      import logging
      logger = logging.getLogger(__name__)
      
    • Log Levels:
      • DEBUG: For intermediate values.
      • INFO: For major steps.
      • WARNING/ERROR: For issues.
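A minimal sketch of the rule above, replacing `print()` with the `logging` module; the function `scale_values` is a hypothetical example, not part of the project:

```python
import logging

logger = logging.getLogger(__name__)

def scale_values(values, factor):
    logger.info("Scaling %d values", len(values))            # INFO: major step
    if factor == 0:
        logger.warning("factor is 0; all outputs will be 0")  # WARNING: issue
    scaled = [v * factor for v in values]
    logger.debug("First scaled value: %r", scaled[:1])       # DEBUG: intermediate value
    return scaled

logging.basicConfig(level=logging.DEBUG)
scale_values([1.0, 2.0, 3.0], 2.0)
```

Note that `basicConfig` belongs in the application entry point, not in library modules; modules should only call `getLogger(__name__)`.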

Refactoring

  • Encapsulate every logical block in standalone functions:
    • Data loading
    • Preprocessing
    • Model training
    • Visualization
  • Consider splitting related functions into separate modules/packages.
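The split above can be sketched as one standalone function per logical block; all names and the stand-in bodies here are illustrative, not the project's actual pipeline:

```python
def load_data(path):
    """Data loading: read raw rows from a source (stand-in for a real reader)."""
    return [{"x": 1.0}, {"x": 2.0}]

def preprocess(rows):
    """Preprocessing: derive features from raw rows."""
    return [r["x"] * 10 for r in rows]

def train_model(features):
    """Model training: stand-in returning a trivial 'model' (the mean)."""
    return sum(features) / len(features)

def visualize(model):
    """Visualization: here just a printable summary."""
    return f"model mean = {model:.1f}"

def main(path):
    rows = load_data(path)
    features = preprocess(rows)
    model = train_model(features)
    return visualize(model)
```

Once each step is a function, moving `load_data`/`preprocess` into a separate module is a pure file move.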

Data Loading & HDF5

  • Use optimized approaches:
    • Chunked reading
    • pandas.HDFStore
  • Ensure all team members understand HDF5 concepts.
  • Best Practices for HDF5 Handling with h5py:
    • File Access Modes:
      • Use the appropriate mode to open an HDF5 file:
        • 'r': Read-only.
        • 'w': Write (overwrites existing files).
        • 'a': Read/Write (creates the file if it does not exist).
        • 'r+': Read/Write (requires the file to already exist).
    • Using Context Managers:
      • Use a with block so files are closed properly, even if an error occurs.
        with h5py.File('filename.hdf5', 'r') as f:
            data = f['dataset_name'][:]
        
    • Data Organization:
      • Use groups to organize data hierarchically.
      • Create datasets for your data that can be treated like NumPy arrays.
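A small sketch of hierarchical organization; the group and dataset names are illustrative, and the file is written to a temporary directory:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "experiment.hdf5")

with h5py.File(path, "w") as f:
    run = f.create_group("run_001")                  # group: a folder-like node
    raw = run.create_dataset("raw", data=np.arange(10))
    first_half = raw[:5]                             # datasets slice like NumPy arrays
```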
    • Chunking and Compression:
      • Use chunking for large datasets to improve performance; pick a chunk shape that matches how the data will be read.
        dataset = f.create_dataset('big_dataset', (1000, 1000), chunks=(100, 100))
        
      • Use compression to reduce file size; it is especially useful for large datasets.
        dataset = f.create_dataset('compressed_dataset', (1000, 1000), compression="gzip")
        
    • Reading and Writing Data:
      • Read and write data in blocks or slices to optimize performance and minimize memory usage.
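A sketch of block-wise reading, assuming a row-oriented 2-D dataset; the dataset name and block size are illustrative:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "blocks.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("measurements", data=np.ones((10_000, 4)))

block_size = 1_000
total = 0.0
with h5py.File(path, "r") as f:
    dset = f["measurements"]
    for start in range(0, dset.shape[0], block_size):
        block = dset[start:start + block_size]  # only one block held in memory
        total += block.sum()
```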
    • Attribute Management:
      • Use attributes to add metadata to datasets or groups.
        dataset.attrs['description'] = "This dataset contains measurement data"
        
    • Error Handling:
      • Implement appropriate error handling to manage potential issues such as missing datasets or file access errors.
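One way to sketch that error handling: check for the dataset before reading and translate low-level `OSError`s into a clearer failure. `read_dataset` is a hypothetical helper, not an h5py API:

```python
import os
import tempfile

import h5py
import numpy as np

def read_dataset(path, name):
    """Read one dataset, failing loudly on missing data or unreadable files."""
    try:
        with h5py.File(path, "r") as f:
            if name not in f:  # missing dataset
                raise KeyError(f"dataset {name!r} not found in {path}")
            return f[name][:]
    except OSError as exc:     # file missing, corrupt, or locked
        raise RuntimeError(f"cannot open {path}") from exc

path = os.path.join(tempfile.mkdtemp(), "safe.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("ok", data=np.arange(3))
```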

Parameterization

  • Dynamically generate:
    • Plot titles
    • Axis labels
    • File paths, etc.
  • Use dictionaries for:
    • Colors
    • Model names
    • File paths
    • Other configurable values.
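The rules above can be sketched as a single config dictionary from which titles, labels, colors, and paths are derived; every key and value here is a made-up example:

```python
CONFIG = {
    "model_name": "baseline",
    "metric": "accuracy",
    "colors": {"baseline": "tab:blue", "tuned": "tab:orange"},
    "out_dir": "results",
}

def plot_settings(cfg):
    """Derive plot title, axis label, color, and output path from the config."""
    return {
        "title": f"{cfg['model_name']}: {cfg['metric']}",
        "ylabel": cfg["metric"],
        "color": cfg["colors"][cfg["model_name"]],
        "path": f"{cfg['out_dir']}/{cfg['model_name']}_{cfg['metric']}.png",
    }
```

Changing `model_name` to `"tuned"` updates the title, color, and file path everywhere at once.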

Function Architecture

  • Define each function independently; avoid nested definitions.
  • Allow closures only when there’s a clear benefit (e.g., factory pattern).
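A minimal sketch of the one sanctioned closure use, the factory pattern; `make_scaler` is an illustrative name:

```python
def make_scaler(factor):
    """Factory: returns a standalone scaling function with `factor` bound in."""
    def scale(values):
        return [v * factor for v in values]
    return scale

double = make_scaler(2)
```

Everything else stays a top-level function: `scale` is only nested because the factory genuinely configures it.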

Testing

  • Automatically write pytest tests for every new function.
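A sketch of what those tests look like: pytest discovers `test_*` functions and runs their assertions. `normalize` is a hypothetical new function used only to illustrate the pattern:

```python
def normalize(values):
    """Hypothetical new function: scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def test_normalize_sums_to_one():
    assert sum(normalize([1.0, 1.0, 2.0])) == 1.0

def test_normalize_preserves_ratios():
    assert normalize([2.0, 6.0]) == [0.25, 0.75]
```

Run with `pytest` from the project root; each new function gets at least one such test file alongside it.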

Performance Considerations

  • Highlight potential bottlenecks and optimization opportunities for large datasets (> 1 million rows).
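One common remedy for such bottlenecks, sketched here under the assumption that the data arrives in chunks (e.g. from the block-wise HDF5 reads above): aggregate with running totals instead of concatenating millions of rows into one array.

```python
import numpy as np

def chunked_mean(chunks):
    """Mean over an iterable of 1-D chunks, without concatenating them."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += chunk.sum()   # one chunk in memory at a time
        count += chunk.size
    return total / count

# Simulate a large dataset arriving in pieces:
stream = (np.full(1_000, float(i)) for i in range(5))
result = chunked_mean(stream)
```

The same running-aggregate shape works for sums, counts, and min/max, which covers many >1M-row reductions.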