a-le-thanh-son/a-le-thanh-son-first-assistant
public
Published on 5/23/2025
My First Assistant

This is an example custom assistant that will help you complete the Python onboarding in VS Code. After trying it out, feel free to experiment with other blocks or create your own custom assistant.

Models

- Relace Instant Apply (relace): 40k input · 32k output
- Claude 3.7 Sonnet (anthropic): 200k input · 8.192k output
- Claude 3.5 Haiku (anthropic): 200k input · 8.192k output
- Codestral (mistral)
- Voyage AI rerank-2 (voyage)
- voyage-code-3 (voyage)

# Manus Chat Real-time Data Crawling Implementation

## Overview

This implementation provides comprehensive real-time data crawling from the Manus.im interface using Playwright for browser automation. The system captures complete HTML structure, monitors for new messages and task completion notifications, and streams data in real-time to the frontend via WebSockets.

## Key Features

### 1. **Enhanced HTML Content Extraction**
- **Complete Chat Structure**: Captures full HTML of chat containers, sidebars, and page content
- **Message Metadata**: Extracts timestamps, message IDs, completion status, and element classes
- **Completion Detection**: Automatically detects task completion notifications
- **Active Element Monitoring**: Tracks typing indicators, loading states, and animations

### 2. **Real-time Monitoring Service**
- **Continuous Monitoring**: Polls the Manus interface every 2 seconds for changes
- **Extended Duration Support**: Handles response times of up to 15 minutes (configurable)
- **Smart Detection**: Identifies new messages, status changes, and completion events
- **Background Processing**: Runs monitoring in async tasks without blocking the main thread

### 3. **WebSocket Real-time Streaming**
- **Multiple Message Types**: Supports various update types (realtime_update, task_completed, monitoring_heartbeat, etc.)
- **Structured Data**: Sends complete chat context with metadata
- **Progress Tracking**: Real-time progress updates and elapsed time tracking
- **Error Handling**: Robust error handling and reconnection logic

### 4. **Chrome Profile Integration**
- **Persistent Authentication**: Maintains login state across sessions
- **Profile Management**: Uses existing Chrome profiles for seamless authentication
- **Session Persistence**: Preserves cookies and session data
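
A minimal sketch of how a persistent Chrome profile might be opened with Playwright's async API. The helper names `build_launch_kwargs` and `open_manus_context` and the profile path are hypothetical; `launch_persistent_context` is the Playwright call that reuses an on-disk user data directory, which is what lets cookies and login state survive restarts.

```python
from pathlib import Path
from typing import Any, Dict


def build_launch_kwargs(profile_dir: str, headless: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for launch_persistent_context (hypothetical helper)."""
    return {
        "user_data_dir": str(Path(profile_dir).expanduser()),
        "channel": "chrome",  # use the installed Chrome instead of bundled Chromium
        "headless": headless,
    }


async def open_manus_context(profile_dir: str):
    """Open a browser context backed by an on-disk profile (assumed setup)."""
    # Imported lazily so build_launch_kwargs works without Playwright installed.
    from playwright.async_api import async_playwright

    playwright = await async_playwright().start()
    context = await playwright.chromium.launch_persistent_context(
        **build_launch_kwargs(profile_dir)
    )
    # The caller is expected to close the context and then stop playwright.
    return playwright, context
```

Pointing `profile_dir` at a dedicated directory, rather than the profile Chrome is currently using, avoids lock conflicts with a running browser.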

## Architecture

```
┌─────────────────┐    WebSocket    ┌─────────────────┐    Playwright    ┌─────────────────┐
│   NextJS        │◄──────────────►│   FastAPI       │◄───────────────►│   Manus.im      │
│   Frontend      │                │   Backend       │                 │   Interface     │
│                 │                │                 │                 │                 │
│ - Chat UI       │                │ - Real-time     │                 │ - Chat Messages │
│ - Data Display  │                │   Monitoring    │                 │ - Completion    │
│ - Status Panel  │                │ - HTML Crawler  │                 │   Notifications │
└─────────────────┘                │ - WebSocket     │                 │ - Page Content  │
                                   │   Manager       │                 └─────────────────┘
                                   └─────────────────┘
```

## Implementation Details

### Backend Components

#### 1. Enhanced Message Extraction (`extract_chat_messages`)
```python
from typing import Any, Dict, List
from playwright.async_api import Page

async def extract_chat_messages(page: Page) -> List[Dict[str, Any]]:
    """Extracts all messages with complete metadata including HTML structure."""
```
- Captures user messages, AI responses, and system notifications
- Includes full HTML, timestamps, element classes, and completion status
- Detects completion notifications automatically

#### 2. Complete Chat Context (`extract_complete_chat_context`)
```python
async def extract_complete_chat_context(page: Page) -> Dict[str, Any]:
    """Extracts complete chat context including HTML structure and metadata."""
```
- Captures entire page context (chat container, sidebar, full HTML)
- Tracks active elements and page state
- Provides comprehensive metadata for analysis

#### 3. Real-time Monitoring (`monitor_manus_realtime`)
```python
async def monitor_manus_realtime(page: Page, session_id: str, max_duration: int = 900) -> None:
    """Monitors Manus interface in real-time and streams data via WebSocket."""
```
- Continuous monitoring with configurable intervals
- Automatic completion detection
- Real-time WebSocket streaming
- Heartbeat messages for connection health

### Frontend Components

#### 1. Enhanced WebSocket Context
- New interfaces for real-time data types
- Support for multiple message types
- Real-time data state management
- Monitoring status tracking

#### 2. Real-time Data Panel
- Live monitoring status display
- Chat context information
- HTML data size metrics
- Progress and timing information

## WebSocket Message Types

### 1. `realtime_update`
```json
{
  "type": "realtime_update",
  "data": {
    "chat_context": { /* Complete chat context */ },
    "elapsed_time": 45.2,
    "new_messages": 1,
    "monitoring_status": "active"
  }
}
```

### 2. `task_completed`
```json
{
  "type": "task_completed",
  "data": {
    "chat_context": { /* Final chat context */ },
    "elapsed_time": 180.5,
    "completion_detected": true,
    "monitoring_status": "completed"
  }
}
```

### 3. `monitoring_heartbeat`
```json
{
  "type": "monitoring_heartbeat",
  "data": {
    "elapsed_time": 120.0,
    "message_count": 5,
    "monitoring_status": "active",
    "remaining_time": 780.0
  }
}
```
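
In the heartbeat example, `remaining_time` is consistent with the 900-second maximum duration minus the elapsed time (900 - 120 = 780). A small builder, sketched here under that assumption, makes the relationship explicit; `build_heartbeat` is a hypothetical name.

```python
import json
from typing import Any, Dict


def build_heartbeat(
    elapsed: float, message_count: int, max_duration: float = 900.0
) -> Dict[str, Any]:
    """Build a monitoring_heartbeat payload (hypothetical helper)."""
    return {
        "type": "monitoring_heartbeat",
        "data": {
            "elapsed_time": elapsed,
            "message_count": message_count,
            "monitoring_status": "active",
            # Never report a negative remaining time once the limit is passed.
            "remaining_time": max(0.0, max_duration - elapsed),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_heartbeat(120.0, 5), indent=2))
```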

## Usage Examples

### 1. Basic Real-time Monitoring
```python
# Send message with real-time monitoring enabled
result = await process_chat_message(
    session_id="my_session",
    message="Create a Python script",
    enable_realtime_monitoring=True
)
```

### 2. Frontend Integration
```tsx
// Access real-time data in a React component.
// useWebSocket is the hook exposed by the enhanced WebSocket context.
function MonitoringStatus() {
  const { realtimeData, monitoringActive, chatContext } = useWebSocket();

  // Display monitoring status
  return monitoringActive ? (
    <div className="monitoring-active">
      Real-time monitoring active
    </div>
  ) : null;
}
```

### 3. Testing Real-time Features
```bash
# Run the test script
cd fastapi
python test_realtime_monitoring.py
```

## Configuration

### Monitoring Settings
- **Monitoring Interval**: 2 seconds (configurable)
- **Maximum Duration**: 900 seconds (15 minutes)
- **Heartbeat Interval**: 30 seconds
- **HTML Size Limit**: 500KB for full page HTML

### WebSocket Settings
- **Reconnection**: Automatic with exponential backoff
- **Timeout**: 10 seconds for message reception
- **Buffer Size**: Optimized for large HTML payloads

## Performance Considerations

### 1. **Data Size Management**
- HTML content is truncated at 500KB to prevent memory issues
- Only essential metadata is transmitted in real-time updates
- Efficient JSON serialization for WebSocket messages
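
One way to enforce the size limit is to truncate on the UTF-8 byte length rather than the character count, since the payload is sent as bytes. The sketch below assumes a 500 × 1024 byte limit; `truncate_html` is a hypothetical helper name.

```python
from typing import Tuple


def truncate_html(html: str, max_bytes: int = 500 * 1024) -> Tuple[str, bool]:
    """Return (html, was_truncated), cutting at max_bytes of UTF-8."""
    encoded = html.encode("utf-8")
    if len(encoded) <= max_bytes:
        return html, False
    # errors="ignore" drops a multi-byte character split at the cut point.
    return encoded[:max_bytes].decode("utf-8", errors="ignore"), True
```

Returning the flag lets the frontend indicate that the displayed HTML is incomplete.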

### 2. **Resource Usage**
- Monitoring runs in background async tasks
- Automatic cleanup of inactive sessions
- Memory-efficient HTML parsing

### 3. **Network Optimization**
- Incremental updates are sent only when changes are detected
- Compressed WebSocket messages
- Heartbeat messages for connection health
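
Change detection could be done by hashing each snapshot and comparing it with the previous digest, as in this sketch. The class name is hypothetical, and hashing the whole context is one assumed strategy; the project may compare message counts or IDs instead.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class ChangeDetector:
    """Reports whether a chat context differs from the previous one."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def has_changed(self, context: Dict[str, Any]) -> bool:
        # sort_keys makes the digest independent of dict key order.
        payload = json.dumps(context, sort_keys=True, default=str).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        changed = digest != self._last_digest
        self._last_digest = digest
        return changed
```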

## Error Handling

### 1. **Browser Automation Errors**
- Automatic retry for failed page interactions
- Graceful handling of element not found errors
- Session recovery for browser crashes
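
Automatic retry for a failed page interaction might be wrapped in a small helper like the one below. The name `retry_async` and its defaults are assumptions; in practice the `except` clause would probably be narrowed to Playwright's timeout and error types rather than catching every exception.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    action: Callable[[], Awaitable[T]], attempts: int = 3, delay: float = 0.5
) -> T:
    """Run `action`, retrying on failure up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except Exception:
            # Re-raise the last failure so the caller sees the real error.
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```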

### 2. **WebSocket Errors**
- Automatic reconnection with backoff
- Message queuing during disconnections
- Error state propagation to frontend
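
Message queuing during a disconnection can be sketched with a bounded buffer that is flushed on reconnect. `OutboundQueue` and its `max_pending` limit are hypothetical; dropping the oldest messages when the buffer is full is one assumed policy.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict


class OutboundQueue:
    """Buffers messages while disconnected and flushes them on reconnect."""

    def __init__(
        self, send: Callable[[Dict[str, Any]], None], max_pending: int = 100
    ) -> None:
        self._send = send
        # A bounded deque silently discards the oldest entry when full.
        self._pending: Deque[Dict[str, Any]] = deque(maxlen=max_pending)
        self.connected = False

    def publish(self, message: Dict[str, Any]) -> None:
        if self.connected:
            self._send(message)
        else:
            self._pending.append(message)

    def on_reconnect(self) -> None:
        self.connected = True
        while self._pending:
            self._send(self._pending.popleft())
```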

### 3. **Monitoring Errors**
- Continuous operation despite individual errors
- Error logging and reporting
- Fallback to basic monitoring mode

## Testing

### 1. **Unit Tests**
- Individual function testing for extractors
- WebSocket message format validation
- Error condition testing
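
Message format validation could start from a table of required `data` fields per message type, taken from the three examples in the WebSocket Message Types section. The validator below is a sketch with a hypothetical name; it covers only those three types and reports any other type as unknown.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = {
    "realtime_update": {
        "chat_context", "elapsed_time", "new_messages", "monitoring_status"},
    "task_completed": {
        "chat_context", "elapsed_time", "completion_detected", "monitoring_status"},
    "monitoring_heartbeat": {
        "elapsed_time", "message_count", "monitoring_status", "remaining_time"},
}


def validate_message(message: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the message is valid."""
    msg_type = message.get("type")
    if msg_type not in REQUIRED_FIELDS:
        return [f"unknown type: {msg_type!r}"]
    data = message.get("data")
    if not isinstance(data, dict):
        return ["data must be an object"]
    missing = sorted(REQUIRED_FIELDS[msg_type] - set(data))
    return [f"missing field: {name}" for name in missing]
```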

### 2. **Integration Tests**
- End-to-end real-time monitoring
- Browser automation testing
- WebSocket communication testing

### 3. **Performance Tests**
- Long-duration monitoring tests
- High-frequency update handling
- Memory usage monitoring

## Future Enhancements

### 1. **Advanced Analytics**
- Response time analysis
- Message pattern recognition
- Performance metrics collection

### 2. **Enhanced UI**
- Real-time charts and graphs
- Historical data visualization
- Export functionality for crawled data

### 3. **Scalability**
- Multi-session monitoring
- Distributed monitoring architecture
- Database storage for historical data

## Troubleshooting

### Common Issues

1. **Monitoring Not Starting**
   - Check Chrome profile setup
   - Verify Manus.im authentication
   - Check WebSocket connection

2. **Missing Real-time Updates**
   - Verify WebSocket connection status
   - Check browser console for errors
   - Ensure monitoring is enabled

3. **Performance Issues**
   - Reduce monitoring frequency
   - Limit HTML data collection
   - Check system resources

### Debug Mode
Enable detailed logging by setting log level to DEBUG:
```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

## Conclusion

This implementation crawls the Manus.im interface in real time: it handles response times of up to 15 minutes, maintains authentication state through a persistent Chrome profile, and streams the extracted chat context to the frontend over WebSockets. Multi-session and distributed monitoring remain future work.
Python: https://docs.python.org/3/

Prompts

Write Cargo test
Write unit test with Cargo
Use Cargo to write a comprehensive suite of unit tests for this function

Context

@code
Reference specific functions or classes from throughout your project
@docs
Reference the contents from any documentation site
@diff
Reference all of the changes you've made to your current branch
@terminal
Reference the last command you ran in your IDE's terminal and its output
@problems
Get Problems from the current file
@folder
Uses the same retrieval mechanism as @Codebase, but only on a single folder
@codebase
Reference the most relevant snippets from your codebase

No Data configured

MCP Servers

No MCP Servers configured