
# Screen Capture Implementation Progress
## Overview
Implementation of screen capture functionality for SCAR Chat with support for multiple backends:
- **Wayland/Hyprland**: xdg-desktop-portal + Pipewire
- **X11**: FFmpeg with x11grab
- **Windows**: FFmpeg with GDI
## Current Status: 🟡 In Progress
---
## Architecture
### Backend Detection
- ✅ Auto-detection via environment variables (WAYLAND_DISPLAY, HYPRLAND_INSTANCE_SIGNATURE)
- ✅ Fallback mechanism (Portal → X11 → Windows)
- ✅ Manual backend selection support
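
A minimal sketch of the detection order listed above. It assumes `DISPLAY` as the X11 signal and a compile-time `_WIN32` check for Windows (neither is stated in these notes). `CaptureBackend`, `EnvLookup` and `detectBackend` are hypothetical names, not necessarily those in `screen_capture.h`. The environment lookup is injected so the "Backend detection on different environments" unit test can run without modifying the real environment.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical backend enum; the real one in screen_capture.h may differ.
enum class CaptureBackend { Portal, X11, Windows, None };

// Injected environment lookup so tests can fake variables.
using EnvLookup = std::function<const char*(const char*)>;

// True when the variable exists and is non-empty.
bool isSet(const EnvLookup& env, const char* name) {
    const char* value = env(name);
    return value != nullptr && value[0] != '\0';
}

// Fallback order from these notes: Portal (Wayland/Hyprland) -> X11 -> Windows.
CaptureBackend detectBackend(const EnvLookup& env) {
#ifdef _WIN32
    (void)env;  // assumption: Windows is selected at compile time
    return CaptureBackend::Windows;
#else
    if (isSet(env, "WAYLAND_DISPLAY") || isSet(env, "HYPRLAND_INSTANCE_SIGNATURE"))
        return CaptureBackend::Portal;
    if (isSet(env, "DISPLAY"))  // assumption: DISPLAY signals an X11 session
        return CaptureBackend::X11;
    return CaptureBackend::None;
#endif
}

// Production overload that reads the real process environment.
CaptureBackend detectBackend() {
    return detectBackend([](const char* name) -> const char* { return std::getenv(name); });
}
```

Manual backend selection would simply bypass `detectBackend()`.
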
### Wayland/Hyprland Implementation (Priority)
**Status**: 🟡 In Progress (corrected to use the Hyprland portal, xdg-desktop-portal-hyprland)

**Critical Architecture Understanding**:
- xdg-desktop-portal-hyprland **implements** org.freedesktop.portal.ScreenCast (standard API)
- We use the **sdbus-c++** library (not libdbus-1) to communicate with the portal over D-Bus
- The portal handles Hyprland-specific details internally (via hyprland-share-picker)
- From the client's perspective: call the standard portal API → the portal shows hyprland-share-picker → receive a PipeWire stream

**Dependencies**:
- **sdbus-c++** (required for DBus communication) ✅ Installed
- **libpipewire-0.3** (for stream handling) ✅ Installed
- **xdg-desktop-portal-hyprland** (runtime requirement; provides hyprland-share-picker)
- **SPA utilities** from libspa-0.2 (for `struct spa_hook`, declared in `spa/utils/hook.h`)

**Implementation Tasks**:
- [x] Switch from libdbus-1 to sdbus-c++
- [x] Use standard org.freedesktop.portal.Desktop.ScreenCast interface
- [x] Update CMakeLists.txt with pkg-config for sdbus-c++ and pipewire
- [x] Screen capture class structure with backend selection
- [x] initPortalConnection() - sdbus session bus connection
- [x] cleanupPortalConnection() - Resource cleanup
- [x] createPortalSession() - Use sdbus-c++ for CreateSession method with unique handles
- [x] selectPortalSources() - Use sdbus-c++ for SelectSources (portal shows hyprland-share-picker)
- [x] startPortalSession() - Use sdbus-c++ for Start method
- [x] openPipeWireRemote() - Get file descriptor from portal using UnixFd
- [x] getStreamsNodeId() - Query session Streams property to get actual node_id
- [x] initPipewire() - Complete implementation with stream connection, listeners, and thread loop
- [x] onStreamProcess() - Frame callback implementation that dequeues buffers and invokes user callback
- [x] onStreamParamChanged() - Handle resolution/format changes and update dimensions
- [x] cleanupPipewire() - Stop thread loop and cleanup resources properly
- [x] UI Integration - Add floating screen share button to VideoGridWidget
- [ ] Test end-to-end screen capture flow
- [ ] Frame buffer memory management optimization
- [ ] Error handling and session recovery
- [ ] Restore token support for session persistence
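
For the frame buffer memory management item, a sketch of copying one mapped PipeWire buffer into an application-owned, tightly packed buffer. It assumes the buffer is CPU-mapped (not DMA-BUF), uses a 4-byte packed format such as BGRx/RGBx, and has a positive row stride; the source pointer, offset and stride would come from `spa_data::data` and its `spa_chunk` (`offset`, `stride`). `copyFrame` is a hypothetical helper. Reusing one `std::vector` per stream avoids an allocation on every `onStreamProcess()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: packed 4-byte pixels (e.g. BGRx or RGBx).
constexpr std::size_t kBytesPerPixel = 4;

// Copy `height` rows of `width` pixels from a strided source into `dst`,
// dropping any per-row padding so the result is tightly packed.
void copyFrame(const std::uint8_t* src, std::size_t offset, std::int32_t stride,
               std::uint32_t width, std::uint32_t height,
               std::vector<std::uint8_t>& dst) {
    const std::size_t rowBytes = static_cast<std::size_t>(width) * kBytesPerPixel;
    // Negative (bottom-up) strides are not handled in this sketch.
    assert(stride > 0 && static_cast<std::size_t>(stride) >= rowBytes);
    // resize() keeps the existing capacity, so it only reallocates when a
    // frame is larger than any previous one.
    dst.resize(rowBytes * height);
    for (std::uint32_t y = 0; y < height; ++y) {
        std::memcpy(dst.data() + y * rowBytes,
                    src + offset + static_cast<std::size_t>(y) * static_cast<std::size_t>(stride),
                    rowBytes);
    }
}
```
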

**UI Integration Details**:
- Added "Share Screen" button to VideoGridWidget
- Button floats at bottom center of video grid, above video streams
- Positioned via resize events to stay centered
- Styled with Discord-like blue theme
- Toggles between "Share Screen" (blue) and "Stop Sharing" (red)
- Clicking the button calls `ScreenCapture::start()`, which opens the xdg-desktop-portal dialog
- ScreenCapture instance managed by VideoGridWidget
- Signal `screenShareRequested()` emitted when sharing starts
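
The bottom-center placement and the two button states can be expressed as plain functions, which keeps them testable outside Qt. In this sketch, `shareButtonGeometry`, `shareButtonStyle`, the 16 px margin and the hex colors are all assumptions; in `VideoGridWidget` the geometry would be applied from the resize handler.

```cpp
#include <cassert>
#include <string>

struct Rect { int x, y, width, height; };

// Assumed spacing between the button and the bottom edge of the grid.
constexpr int kBottomMargin = 16;

// Center the button horizontally and pin it near the bottom of the grid;
// recomputed on every resize so it stays centered.
Rect shareButtonGeometry(int gridWidth, int gridHeight, int buttonWidth, int buttonHeight) {
    return Rect{(gridWidth - buttonWidth) / 2,
                gridHeight - buttonHeight - kBottomMargin,
                buttonWidth, buttonHeight};
}

struct ButtonStyle { std::string label; std::string color; };

// Placeholder colors standing in for the blue/red theme described above.
ButtonStyle shareButtonStyle(bool sharing) {
    return sharing ? ButtonStyle{"Stop Sharing", "#ED4245"}
                   : ButtonStyle{"Share Screen", "#5865F2"};
}
```
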

**Notes**:
- Service name: `org.freedesktop.portal.Desktop`
- Object path: `/org/freedesktop/portal/desktop`
- Interface: `org.freedesktop.portal.ScreenCast`
- When SelectSources is called, xdph automatically launches hyprland-share-picker GUI
- The user's selection is handled transparently: we only get back the session handle and the PipeWire node ID
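
Because `createPortalSession()` relies on unique handles, it helps to know where the portal will create its objects. Per the portal documentation, a call that passes `handle_token` gets a Request at `/org/freedesktop/portal/desktop/request/SENDER/TOKEN`, and `CreateSession` with `session_handle_token` creates `/org/freedesktop/portal/desktop/session/SENDER/TOKEN`, where SENDER is the caller's unique bus name without the leading `:` and with every `.` replaced by `_`. The helpers below are a hypothetical sketch of that derivation; the unique name (for example `:1.42`) is assumed to come from the session bus connection, and the tokens are illustrative. Subscribing to `Response` on the predicted path before making the call avoids missing a fast reply.

```cpp
#include <cassert>
#include <string>

// ":1.42" -> "1_42": drop the leading ':' and replace every '.' with '_'.
std::string senderPathElement(const std::string& uniqueName) {
    std::string sender = (!uniqueName.empty() && uniqueName[0] == ':')
                             ? uniqueName.substr(1)
                             : uniqueName;
    for (char& c : sender)
        if (c == '.') c = '_';
    return sender;
}

// Request object created for a call that passed `handle_token` in its options.
std::string requestPath(const std::string& uniqueName, const std::string& handleToken) {
    return "/org/freedesktop/portal/desktop/request/" + senderPathElement(uniqueName) + "/" + handleToken;
}

// Session object created by CreateSession's `session_handle_token`.
std::string sessionPath(const std::string& uniqueName, const std::string& sessionToken) {
    return "/org/freedesktop/portal/desktop/session/" + senderPathElement(uniqueName) + "/" + sessionToken;
}
```
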

**Technical Details**:
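```cpp
// XDG Desktop Portal ScreenCast API workflow
// (service org.freedesktop.portal.Desktop, object /org/freedesktop/portal/desktop).
// CreateSession, SelectSources and Start return a Request object path; their
// results arrive asynchronously in that Request's Response(u response, a{sv} results) signal.
//
// 1. ScreenCast.CreateSession(options) -> request handle
//    - options: handle_token, session_handle_token
//    - Response results: session_handle
//
// 2. ScreenCast.SelectSources(session_handle, options) -> request handle
//    - options.types: MONITOR(1), WINDOW(2), VIRTUAL(4) (bitmask)
//    - options.multiple: allow selecting multiple sources
//    - options.cursor_mode: Hidden(1), Embedded(2), Metadata(4)
//    - options.persist_mode: DoNotPersist(0), WhileRunning(1), UntilRevoked(2)
//    - options.restore_token: token from a previous session, if any
//
// 3. ScreenCast.Start(session_handle, parent_window, options) -> request handle
//    - Per the spec the user picks a screen/window here; xdph instead shows
//      hyprland-share-picker during SelectSources (see Notes)
//    - Response results: streams a(ua{sv}), one (node_id, properties) per source
//    - Stream properties include: id, position, size, source_type
//    - restore_token is returned only if a persist_mode other than 0 was requested
//
// 4. ScreenCast.OpenPipeWireRemote(session_handle, options) -> fd
//    - Returns the PipeWire file descriptor directly (no Request object)
//
// 5. PipeWire connection:
//    - pw_context_connect_fd(context, fd, props, 0) creates the pw_core
//    - pw_stream_new(core, name, props) creates the stream
//    - pw_stream_add_listener() registers process/param_changed callbacks
//    - pw_stream_connect(stream, PW_DIRECTION_INPUT, node_id, flags, params, n_params)
//      targets the node_id from step 3 and starts streaming
//
// 6. Frame processing:
//    - param_changed(SPA_PARAM_Format) reports the negotiated video/raw format and size
//    - process(): pw_stream_dequeue_buffer() -> spa_buffer with raw frame data
//      (e.g. BGRx/RGBx), then pw_stream_queue_buffer() to hand it back
//    - Invoke FrameCallback with the raw frame data
```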
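
The SelectSources options from step 2 can be assembled as a vardict. This sketch uses a `std::map` of `std::variant` as a stand-in for D-Bus `a{sv}` (with sdbus-c++ the values would be `sdbus::Variant`). The function name, the chosen cursor and persist modes, and the tokens are illustrative assumptions, not the project's settings.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Values from the ScreenCast interface (step 2 above).
enum class SourceType : std::uint32_t { Monitor = 1, Window = 2, Virtual = 4 };
enum class CursorMode : std::uint32_t { Hidden = 1, Embedded = 2, Metadata = 4 };
enum class PersistMode : std::uint32_t { DoNotPersist = 0, WhileRunning = 1, UntilRevoked = 2 };

template <typename E>
constexpr std::uint32_t bits(E e) { return static_cast<std::uint32_t>(e); }

// Stand-in for a D-Bus a{sv} vardict.
using Value = std::variant<std::uint32_t, bool, std::string>;
using VarDict = std::map<std::string, Value>;

// Hypothetical builder; the modes below are illustrative defaults.
VarDict selectSourcesOptions(const std::string& handleToken, bool allowWindows,
                             const std::string& restoreToken = "") {
    VarDict options;
    options["handle_token"] = handleToken;
    options["types"] = allowWindows ? (bits(SourceType::Monitor) | bits(SourceType::Window))
                                    : bits(SourceType::Monitor);
    options["multiple"] = false;
    options["cursor_mode"] = bits(CursorMode::Embedded);
    options["persist_mode"] = bits(PersistMode::UntilRevoked);
    if (!restoreToken.empty())
        options["restore_token"] = restoreToken;  // reuse an earlier grant
    return options;
}
```
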
### X11 Implementation (Fallback)
**Status**: 🔴 Not Started

**Dependencies**:
- FFmpeg (libavformat, libavcodec, libavutil, libavdevice)
- X11 libraries

**Implementation Tasks**:
- [ ] FFmpeg context initialization
- [ ] x11grab input device configuration
- [ ] Frame extraction and decoding
- [ ] Frame callback integration
- [ ] Display selection (multi-monitor support)
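
For the x11grab items, a sketch of how one monitor's area could be turned into FFmpeg's x11grab input name (`display[.screen]+x,y`) and its `video_size`, `framerate` and `draw_mouse` options. `MonitorArea`, `x11grabUrl` and `x11grabOptions` are hypothetical. In real code the map would be copied into an `AVDictionary` with `av_dict_set()` and passed to `avformat_open_input()` together with the `x11grab` input format (after `avdevice_register_all()`); monitor geometry is assumed to come from something like XRandR.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: one monitor's rectangle on the X screen.
struct MonitorArea { int x, y, width, height; };

// x11grab input name: "display[.screen]+x_offset,y_offset".
std::string x11grabUrl(const std::string& display, const MonitorArea& m) {
    return display + "+" + std::to_string(m.x) + "," + std::to_string(m.y);
}

// Options restricting the grab to this monitor at the requested rate.
std::map<std::string, std::string> x11grabOptions(const MonitorArea& m, int fps) {
    return {
        {"video_size", std::to_string(m.width) + "x" + std::to_string(m.height)},
        {"framerate", std::to_string(fps)},
        {"draw_mouse", "1"},
    };
}
```
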
### Windows Implementation (Future)
**Status**: 🔴 Not Started

**Dependencies**:
- FFmpeg with the gdigrab input device (GDI-based capture)

**Implementation Tasks**:
- [ ] FFmpeg gdigrab setup
- [ ] Frame processing pipeline
- [ ] Display enumeration
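
On Windows, FFmpeg's gdigrab device takes `desktop` (or `title=<window title>`) as its input name and selects a region through the `offset_x`, `offset_y` and `video_size` options rather than through the input name. A hypothetical sketch, assuming monitor rectangles in virtual-desktop coordinates (where monitors left of or above the primary have negative offsets) are supplied by display enumeration:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: one monitor's rectangle in virtual-desktop coordinates.
struct DisplayRect { int left, top, width, height; };

// Input name for capturing (part of) the whole desktop.
constexpr const char* kGdigrabInput = "desktop";

// Options selecting this monitor's region at the requested rate.
std::map<std::string, std::string> gdigrabOptions(const DisplayRect& d, int fps) {
    return {
        {"offset_x", std::to_string(d.left)},
        {"offset_y", std::to_string(d.top)},
        {"video_size", std::to_string(d.width) + "x" + std::to_string(d.height)},
        {"framerate", std::to_string(fps)},
    };
}
```
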
---
## Testing Plan
### Unit Tests
- [ ] Backend detection on different environments
- [ ] Frame callback invocation
- [ ] Start/stop lifecycle
- [ ] Memory leak verification
### Integration Tests
- [ ] Wayland/Hyprland capture on real desktop
- [ ] X11 capture verification
- [ ] Multi-monitor scenarios
- [ ] Permission denial handling
### Performance Tests
- [ ] Frame rate consistency (target: 30 FPS)
- [ ] CPU usage profiling
- [ ] Memory usage under continuous capture
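
For the frame-rate consistency check, a sketch that summarizes timestamps recorded in the frame callback: the average FPS over the run and the worst inter-frame gap (a single long gap is a visible stutter even when the average reaches 30 FPS). `FrameStats` and `measureFrameRate` are hypothetical names; timestamps are assumed to be microseconds from a monotonic clock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameStats {
    double averageFps;        // frames per second over the whole run
    std::int64_t worstGapUs;  // longest interval between consecutive frames
};

// timestampsUs: per-frame capture times in microseconds, in arrival order.
FrameStats measureFrameRate(const std::vector<std::int64_t>& timestampsUs) {
    assert(timestampsUs.size() >= 2);
    std::int64_t worstGap = 0;
    for (std::size_t i = 1; i < timestampsUs.size(); ++i)
        worstGap = std::max(worstGap, timestampsUs[i] - timestampsUs[i - 1]);
    const double spanSeconds =
        static_cast<double>(timestampsUs.back() - timestampsUs.front()) / 1e6;
    assert(spanSeconds > 0.0);
    // N timestamps span N - 1 frame intervals.
    const double averageFps = static_cast<double>(timestampsUs.size() - 1) / spanSeconds;
    return FrameStats{averageFps, worstGap};
}
```
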
---
## Known Issues & Limitations
### Current
- X11 and Windows backends are still stubs; the Wayland/Hyprland (portal + PipeWire) backend is implemented but not yet tested end-to-end
- No frame encoding/compression
- No multi-monitor selection UI
### Future Considerations
- Portal permissions may require user interaction each session until restore token support (`persist_mode` / `restore_token`) is implemented
- Hyprland-specific optimizations possible via hyprland-share-picker
- Frame rate limiting needed to prevent CPU overload
- Consider hardware encoding for lower CPU usage
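
One simple form of the frame rate limiting mentioned above is to drop frames that arrive sooner than 1/target seconds after the last forwarded frame. A hypothetical sketch using `std::chrono::steady_clock`, not the project's code:

```cpp
#include <cassert>
#include <chrono>

class FrameRateLimiter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameRateLimiter(int targetFps) : interval_(intervalFor(targetFps)) {}

    // Returns true if a frame arriving at `now` should be forwarded,
    // false if it comes too soon after the last forwarded frame.
    bool accept(Clock::time_point now) {
        if (hasLast_ && now - last_ < interval_)
            return false;
        last_ = now;
        hasLast_ = true;
        return true;
    }

private:
    static std::chrono::microseconds intervalFor(int fps) {
        assert(fps > 0);
        return std::chrono::microseconds(1'000'000 / fps);
    }

    std::chrono::microseconds interval_;
    Clock::time_point last_{};
    bool hasLast_ = false;
};
```

Because the reference point moves to each accepted frame, jitter on a 60 FPS source can push the output below the target; advancing a fixed deadline by one interval per accepted frame would hold the rate more closely.
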
---
## Code Locations
- **Header**: `client/media/screen_capture.h`
- **Implementation**: `client/media/screen_capture.cpp`
- **Dependencies**: `CMakeLists.txt` (client section)
---
## Next Steps
1. Finish PipeWire + portal screen capture for Wayland/Hyprland (error handling, session recovery, restore tokens)
2. Test on Hyprland environment
3. Implement X11 fallback
4. Add frame encoding for network transmission
5. Integrate with video streaming protocol
---
## Session Log
### Session 1 - December 7, 2025
- **Completed**: Authentication system fully working (plaintext → salt → argon2 verification)
- **Fixed**: Message deserialization bug (async buffer capture issue in both client and server)
- **Status**: Ready to begin screen capture implementation
- **Decision**: Prioritize Wayland/Hyprland implementation due to target environment
### Session 2 - December 7, 2025 (Current)
- **Researched**: xdg-desktop-portal-hyprland specifications and org.freedesktop.portal.ScreenCast API
- **Implemented**:
  - Screen capture header with forward declarations for DBus/Pipewire types
  - Basic structure with backend detection and selection
  - DBus initialization and cleanup functions
  - Pipewire initialization skeleton (loop, context creation)
  - Platform-specific compilation (`#ifdef __linux__`)
  - startPortalCapture() workflow outline (6-step process)
- **TODO**: Implement actual DBus method calls for portal communication
- **Next**: Implement createPortalSession() with proper DBus message building