# Screen Capture Implementation Progress

## Overview

Implementation of screen capture functionality for SCAR Chat with support for multiple backends:

- **Wayland/Hyprland**: xdg-desktop-portal + PipeWire
- **X11**: FFmpeg with x11grab
- **Windows**: FFmpeg with GDI

## Current Status: 🟡 In Progress

---

## Architecture

### Backend Detection

- ✅ Auto-detection via environment variables (WAYLAND_DISPLAY, HYPRLAND_INSTANCE_SIGNATURE)
- ✅ Fallback mechanism (Portal → X11 → Windows)
- ✅ Manual backend selection support

### Wayland/Hyprland Implementation (Priority)

**Status**: 🟡 In Progress - **CORRECTED TO USE HYPRLAND PORTAL**

**Critical Architecture Understanding**:

- xdg-desktop-portal-hyprland **implements** org.freedesktop.portal.ScreenCast (standard API)
- We use the **sdbus-c++** library (NOT libdbus-1) to communicate with the portal
- The portal handles Hyprland-specific details internally (via hyprland-share-picker)
- From the client's perspective: call standard portal API → portal shows hyprland-share-picker → get PipeWire stream

**Dependencies**:

- **sdbus-c++** (required for DBus communication) ✅ Installed
- **libpipewire-0.3** (for stream handling) ✅ Installed
- **xdg-desktop-portal-hyprland** (runtime requirement - provides hyprland-share-picker)
- **spa-utils** (for spa_hook structure)

**Implementation Tasks**:

- [x] Switch from libdbus-1 to sdbus-c++
- [x] Use the standard org.freedesktop.portal.ScreenCast interface (on the org.freedesktop.portal.Desktop service)
- [x] Update CMakeLists.txt with pkg-config for sdbus-c++ and pipewire
- [x] Screen capture class structure with backend selection
- [x] initPortalConnection() - sdbus session bus connection
- [x] cleanupPortalConnection() - Resource cleanup
- [x] createPortalSession() - Use sdbus-c++ for CreateSession method with unique handles
- [x] selectPortalSources() - Use sdbus-c++ for SelectSources (portal shows hyprland-share-picker)
- [x] startPortalSession() - Use sdbus-c++ for Start method
- [x] openPipeWireRemote() - Get file descriptor from portal using UnixFd
- [x] getStreamsNodeId() - Query session Streams property to get actual node_id
- [x] initPipewire() - Complete implementation with stream connection, listeners, and thread loop
- [x] onStreamProcess() - Frame callback implementation that dequeues buffers and invokes user callback
- [x] onStreamParamChanged() - Handle resolution/format changes and update dimensions
- [x] cleanupPipewire() - Stop thread loop and clean up resources properly
- [ ] Test end-to-end screen capture flow
- [ ] Frame buffer memory management optimization
- [ ] Error handling and session recovery
- [ ] Restore token support for session persistence

**Notes**:

- Service name: `org.freedesktop.portal.Desktop`
- Object path: `/org/freedesktop/portal/desktop`
- Interface: `org.freedesktop.portal.ScreenCast`
- When SelectSources is called, xdph automatically launches the hyprland-share-picker GUI
- User selection is handled transparently - we just get back session handle + PipeWire node

**Technical Details**:

```cpp
// XDG Desktop Portal ScreenCast API Workflow:
//
// 1. org.freedesktop.portal.ScreenCast.CreateSession(options) -> request handle
//    - Creates session object for this screen cast
//    - session_handle is delivered in the request's Response signal
//
// 2. org.freedesktop.portal.ScreenCast.SelectSources(session_handle, options)
//    - options.types: MONITOR(1), WINDOW(2), VIRTUAL(4)
//    - options.multiple: allow selecting multiple sources
//    - options.cursor_mode: Hidden(1), Embedded(2), Metadata(4)
//    - options.persist_mode: DoNotPersist(0), WhileRunning(1), UntilRevoked(2)
//
// 3. org.freedesktop.portal.ScreenCast.Start(session_handle, parent_window, options)
//    - User selects screen/window via portal UI
//    - Response includes: streams array with [(node_id, properties)]
//    - Each stream has: id, position, size, source_type
//    - Returns restore_token for future sessions
//
// 4. org.freedesktop.portal.ScreenCast.OpenPipeWireRemote(session_handle, options) -> fd
//    - Returns file descriptor for PipeWire connection
//
// 5. PipeWire Connection:
//    - pw_context_connect_fd(fd) creates pw_core
//    - pw_stream_new() with node_id from Step 3
//    - pw_stream_add_listener() for frame callbacks
//    - pw_stream_connect() to start streaming
//
// 6. Frame Processing:
//    - on_process() callback receives spa_buffer with frame data
//    - Extract video/raw format (RGB, YUV, etc.)
//    - Invoke FrameCallback with decoded data
```

### X11 Implementation (Fallback)

**Status**: 🔴 Not Started

**Dependencies**:

- FFmpeg (libavformat, libavcodec, libavutil, libavdevice)
- X11 libraries

**Implementation Tasks**:

- [ ] FFmpeg context initialization
- [ ] x11grab input device configuration
- [ ] Frame extraction and decoding
- [ ] Frame callback integration
- [ ] Display selection (multi-monitor support)

### Windows Implementation (Future)

**Status**: 🔴 Not Started

**Dependencies**:

- FFmpeg with GDI support

**Implementation Tasks**:

- [ ] FFmpeg GDI grabber setup
- [ ] Frame processing pipeline
- [ ] Display enumeration

---

## Testing Plan

### Unit Tests

- [ ] Backend detection on different environments
- [ ] Frame callback invocation
- [ ] Start/stop lifecycle
- [ ] Memory leak verification

### Integration Tests

- [ ] Wayland/Hyprland capture on real desktop
- [ ] X11 capture verification
- [ ] Multi-monitor scenarios
- [ ] Permission denial handling

### Performance Tests

- [ ] Frame rate consistency (target: 30 FPS)
- [ ] CPU usage profiling
- [ ] Memory usage under continuous capture

---

## Known Issues & Limitations

### Current

- X11 and Windows backends are stubs (no actual implementation); the Wayland/Hyprland portal path is implemented but not yet tested end to end
- No frame encoding/compression
- No multi-monitor selection UI

### Future Considerations

- Portal permissions may require user interaction each session
- Hyprland-specific optimizations possible via hyprland-share-picker
- Frame rate limiting needed to prevent CPU overload
- Consider hardware encoding for lower CPU usage

---

## Code Locations

- **Header**: `client/media/screen_capture.h`
- **Implementation**: `client/media/screen_capture.cpp`
- **Dependencies**: `CMakeLists.txt` (client section)

---

## Next Steps

1. Finish the remaining PipeWire + Portal tasks for Wayland/Hyprland (error handling, restore tokens, frame buffer management)
2. Test on Hyprland environment
3. Implement X11 fallback
4. Add frame encoding for network transmission
5. Integrate with video streaming protocol

---

## Session Log

### Session 1 - December 7, 2025

- **Completed**: Authentication system fully working (plaintext → salt → argon2 verification)
- **Fixed**: Message deserialization bug (async buffer capture issue in both client and server)
- **Status**: Ready to begin screen capture implementation
- **Decision**: Prioritize Wayland/Hyprland implementation due to target environment

### Session 2 - December 7, 2025 (Current)

- **Researched**: xdg-desktop-portal-hyprland specifications and org.freedesktop.portal.ScreenCast API
- **Implemented**:
  - Screen capture header with forward declarations for DBus/PipeWire types
  - Basic structure with backend detection and selection
  - DBus initialization and cleanup functions
  - PipeWire initialization skeleton (loop, context creation)
  - Platform-specific compilation (#ifdef __linux__)
  - startPortalCapture() workflow outline (6-step process)
- **TODO**: Implement actual DBus method calls for portal communication
- **Next**: Implement createPortalSession() with proper DBus message building