VirtualMouse

AI-powered gesture control system using real-time computer vision. Built with innovation, precision, and the future of HCI in mind.

Links

Source Code · Documentation · Interactive Demo · Technical Deep Dive · mayyanks.app · mayankiitj.in

Tech Stack

Python · OpenCV · MediaPipe · PyAutoGUI · Next.js · Tailwind CSS · Framer Motion · Three.js

© 2026 Virtual Mouse by Mayank Sharma. MIT License.

Built with AI & Vision

The Problem

Why Virtual Mouse?

Traditional input devices are becoming obsolete. The future of human-computer interaction is gesture-based.

Challenges

Physical mouse dependency
Accessibility limitations
Hygiene concerns in shared spaces
Touchless interaction demand rising

Our Solution

Touch-free gesture control
AI-powered cursor navigation
Real-time AI inference
Works with any standard webcam

Pipeline

How It Works

From webcam feed to mouse control — a real-time AI pipeline processing every frame.

01

Webcam Input

Captures real-time video frames from any standard webcam at 30+ FPS.

import cv2

# Capture a frame from the default webcam
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
# Mirror the frame so on-screen motion matches hand motion
frame = cv2.flip(frame, 1)
30 FPS · Capture Rate
02

Hand Landmark Detection

MediaPipe Hands detects 21 3D landmarks on each hand with sub-pixel accuracy.

import mediapipe as mp

# Configure MediaPipe Hands to track a single hand
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7
)
# rgb_frame: the webcam frame converted from BGR to RGB
results = hands.process(rgb_frame)
21 · Landmarks
03

Gesture Classification

Interprets landmark positions to classify gestures: point, pinch, scroll, and more.

# Compare the index fingertip and thumb tip to detect a pinch
index_tip = landmarks[8]   # MediaPipe landmark 8: index fingertip
thumb_tip = landmarks[4]   # MediaPipe landmark 4: thumb tip
distance = calculate_distance(
    index_tip, thumb_tip
)
# Fingertips close together -> pinch -> click
if distance < PINCH_THRESHOLD:
    gesture = "CLICK"
<5 ms · Classification
04

Coordinate Mapping

Normalizes hand coordinates to screen space with smoothing interpolation.

import numpy as np

# Map camera-space coordinates to screen-space coordinates
screen_x = np.interp(
    index_x, (0, cam_w), (0, scr_w)
)
screen_y = np.interp(
    index_y, (0, cam_h), (0, scr_h)
)
# Apply smoothing (SMOOTH in (0, 1]; lower values damp jitter more)
smooth_x = prev_x + (screen_x - prev_x) * SMOOTH
1:1 · Mapping Ratio
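
The mapping above assumes the camera resolution (cam_w, cam_h) and the screen resolution (scr_w, scr_h) are already known. One way to obtain them, shown here as an illustrative sketch rather than the project's exact code, is from OpenCV's capture properties and PyAutoGUI's screen size:

import cv2
import pyautogui

cap = cv2.VideoCapture(0)
cam_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))    # camera frame width in pixels
cam_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))   # camera frame height in pixels
scr_w, scr_h = pyautogui.size()                   # primary screen resolution
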
05

OS Mouse Control

Sends mouse events directly to the operating system for seamless integration.

import pyautogui
pyautogui.moveTo(screen_x, screen_y)
if gesture == "CLICK":
    pyautogui.click()
elif gesture == "SCROLL":
    pyautogui.scroll(delta_y)
<1 ms · OS Latency
Engineering

AI & Engineering Deep Dive

Under the hood — the computer vision, gesture logic, and performance engineering that makes it work.

Computer Vision Pipeline

MediaPipe Hands

Detects 21 3D landmarks per hand with real-time inference at 30+ FPS on CPU.

Real-Time Inference

Sub-frame latency processing using optimized TFLite models running on-device.

Confidence Smoothing

Exponential moving average filter to reduce jitter and false detections.
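
As a minimal sketch of that idea (the class name and alpha value are illustrative assumptions, not the repository's implementation), an exponential moving average over successive cursor positions looks like this:

class EMASmoother:
    def __init__(self, alpha=0.3):
        self.alpha = alpha   # lower alpha = smoother but laggier; higher = more responsive
        self.state = None

    def update(self, x, y):
        # First sample initializes the filter state
        if self.state is None:
            self.state = (x, y)
        sx, sy = self.state
        # Blend the new measurement into the running average
        self.state = (sx + self.alpha * (x - sx), sy + self.alpha * (y - sy))
        return self.state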

RGB Processing

Converts BGR → RGB for MediaPipe, maintains color space consistency.

21 landmarks per hand · <8 ms/frame inference · 97.2% accuracy
computer_vision.py
import cv2
import mediapipe as mp

# MediaPipe Hand Detection Pipeline
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5
)

# Process frame (MediaPipe expects RGB; OpenCV frames arrive as BGR)
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
results = hands.process(rgb)

if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # Extract the 21 landmarks as normalized (x, y) plus relative depth z
        landmarks = []
        for lm in hand_landmarks.landmark:
            landmarks.append((lm.x, lm.y, lm.z))
Capabilities

Powerful Features

Every feature is designed for precision, performance, and real-world usability.

Hand Tracking

21-point hand landmark detection with sub-pixel precision using MediaPipe neural networks.

Cursor Control

Smooth, responsive cursor movement mapped from hand position to full screen coordinates.

Click & Scroll Gestures

Pinch to click, vertical finger drag to scroll — natural, intuitive gesture mappings.
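
The click gesture is shown in the pipeline above, but the scroll gesture is not, so here is a rough sketch of how a vertical index-finger drag could become scroll events. The helper name, the SCROLL_SENSITIVITY constant, and the landmark layout are assumptions for illustration, not the project's exact logic.

import pyautogui

SCROLL_SENSITIVITY = 40  # assumed scale from normalized movement to scroll clicks

def handle_scroll(landmarks, prev_index_y):
    index_y = landmarks[8][1]                  # normalized y of the index fingertip
    delta = prev_index_y - index_y             # positive when the finger moves up
    pyautogui.scroll(int(delta * SCROLL_SENSITIVITY))
    return index_y                             # remembered for the next frame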

Real-Time Performance

30+ FPS processing with <33ms end-to-end latency. No GPU required — runs on CPU.

Accessibility Friendly

Enables computer control for users with motor disabilities or limited hand mobility.

Cross-Platform Ready

Works on Windows, macOS, and Linux. Requires only Python and a standard webcam.

Code

Code & Architecture

Clean architecture designed for extensibility, performance, and maintainability.

System Architecture

┌──────────────────────────────────────────────────────────┐
│                       Virtual Mouse                       │
├──────────┬──────────┬──────────┬──────────┬──────────────┤
│ Webcam   │ Hand     │ Gesture  │ Coord    │ OS           │
│ Capture  │ Detector │ Engine   │ Mapper   │ Control      │
│          │          │          │          │              │
│ OpenCV   │ MediaPipe│ Custom   │ NumPy    │ PyAutoGUI    │
│ VideoIO  │ Hands    │ Logic    │ Interp   │ Mouse API    │
├──────────┴──────────┴──────────┴──────────┴──────────────┤
│                  Performance Optimizer                    │
│         (Frame Skip · ROI Crop · Kalman Filter)           │
└──────────────────────────────────────────────────────────┘
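
The Kalman-filter stage in the optimizer is not shown in code elsewhere on this page, so here is a minimal per-axis sketch of what utils/smoothing.py might contain. The constant-position model, class name, and noise values are illustrative assumptions rather than the project's actual implementation.

class Kalman1D:
    def __init__(self, process_var=1e-3, measurement_var=1e-2):
        self.q = process_var      # expected drift of the true position per frame
        self.r = measurement_var  # noise in the detected position
        self.x = 0.0              # current estimate
        self.p = 1.0              # variance of the estimate

    def update(self, z):
        # Predict: position is assumed to stay put, uncertainty grows
        self.p += self.q
        # Correct: blend prediction and measurement by the Kalman gain
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1 - k
        return self.x

# One filter per axis smooths jittery detections, e.g.:
kx = Kalman1D()
for measured_x in (410, 415, 398, 405):   # illustrative noisy measurements
    print(kx.update(measured_x))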

Project Structure

virtual-mouse/
├── main.py              # Entry point
├── hand_tracker.py      # MediaPipe hand detection
├── gesture_classifier.py # Gesture recognition logic
├── mouse_controller.py  # OS mouse control
├── performance.py       # FPS & optimization utils
├── config.py            # Tuning parameters
├── utils/
│   ├── smoothing.py     # Kalman filter & interpolation
│   └── coordinates.py   # Coordinate normalization
├── requirements.txt
└── README.md

Core Logic (Pseudocode)

virtual_mouse.py
class VirtualMouse:
    def __init__(self):
        self.camera = cv2.VideoCapture(0)
        self.detector = HandDetector(confidence=0.7)
        self.classifier = GestureClassifier()
        self.controller = MouseController()
        self.optimizer = PerformanceOptimizer()
    
    def run(self):
        while True:
            ret, frame = self.camera.read()
            
            # Detect hand landmarks
            landmarks = self.detector.find_hands(frame)
            
            if landmarks:
                # Classify gesture from landmarks
                gesture = self.classifier.classify(landmarks)
                
                # Map hand coordinates to screen
                screen_pos = self.optimizer.smooth(
                    self.map_coordinates(landmarks[8])
                )
                
                # Execute mouse action
                self.controller.execute(gesture, screen_pos)
            
            # Adaptive performance tuning
            self.optimizer.adjust(self.fps)
Explore the Full Source Code on GitHub
Roadmap

Future Scope & Vision

What's next — the evolution of gesture-based computing.

In Progress

Phase 1 · Multi-Hand Gestures

Support for simultaneous two-hand tracking, enabling more complex gesture vocabularies and bimanual interactions.

Planned

Phase 2 · Gesture Customization

User-defined gesture mapping — assign any hand pose to any computer action through an intuitive configuration UI.

Research

Phase 3 · ML-Based Gesture Learning

Train the system to recognize new gestures on-the-fly using few-shot learning and user demonstrations.

Vision

Phase 4 · AR/VR Integration

Extend gesture control to spatial computing environments — mixed reality headsets and holographic interfaces.

Vision

Phase 5 · Mobile & Edge Devices

Optimize for mobile processors and edge AI chips — on-device inference for IoT and embedded systems.

Creator

Meet the Builder

Created By

Mayank Sharma

AI / ML Engineer

Focused on Computer Vision, Human–AI Interaction, and building intelligent systems that bridge the gap between humans and machines.

GitHub: @mayyanks · LinkedIn: @mayyankks · mayyanks.app · mayankiitj.in

About This Project

Virtual Mouse is an AI-powered Human–Computer Interaction system that uses real-time computer vision to let you control your computer with just hand gestures. No extra hardware is needed beyond a standard webcam.