VirtualMouse

AI-powered gesture control system using real-time computer vision. Built with innovation, precision, and the future of HCI in mind.

Links

Source Code · Documentation · Interactive Demo · Technical Deep Dive · mayyanks.app · mayankiitj.in

Tech Stack

Python · OpenCV · MediaPipe · PyAutoGUI · Next.js · Tailwind CSS · Framer Motion · Three.js

© 2026 Virtual Mouse by Mayank Sharma. MIT License.

Built with AI & Vision

The Problem

Why Virtual Mouse?

Traditional input devices are becoming obsolete. The future of human-computer interaction is gesture-based.

Challenges

Physical mouse dependency
Accessibility limitations
Hygiene concerns in shared spaces
Touchless interaction demand rising

Our Solution

Touch-free gesture control
AI-powered cursor navigation
Real-time AI inference
Works with any standard webcam

Pipeline

How It Works

From webcam feed to mouse control — a real-time AI pipeline processing every frame.

01

Webcam Input

Captures real-time video frames from any standard webcam at 30+ FPS.

import cv2

# Capture a frame from the default webcam
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
# Mirror the frame so on-screen motion matches hand motion
frame = cv2.flip(frame, 1)
30 FPS · Capture Rate
02

Hand Landmark Detection

MediaPipe Hands detects 21 3D landmarks on each hand with sub-pixel accuracy.

import mediapipe as mp

# Configure MediaPipe Hands to track a single hand
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7
)
# rgb_frame: the webcam frame converted from BGR to RGB
results = hands.process(rgb_frame)
21 · Landmarks
03

Gesture Classification

Interprets landmark positions to classify gestures: point, pinch, scroll, and more.

# Compare the index fingertip and thumb tip to detect a pinch
index_tip = landmarks[8]   # MediaPipe landmark 8: index fingertip
thumb_tip = landmarks[4]   # MediaPipe landmark 4: thumb tip
distance = calculate_distance(
    index_tip, thumb_tip
)
# Fingertips close together -> pinch -> click
if distance < PINCH_THRESHOLD:
    gesture = "CLICK"
<5 ms · Classification
04

Coordinate Mapping

Normalizes hand coordinates to screen space with smoothing interpolation.

import numpy as np

# Map camera-space coordinates to screen-space coordinates
screen_x = np.interp(
    index_x, (0, cam_w), (0, scr_w)
)
screen_y = np.interp(
    index_y, (0, cam_h), (0, scr_h)
)
# Apply smoothing (SMOOTH in (0, 1]; lower values damp jitter more)
smooth_x = prev_x + (screen_x - prev_x) * SMOOTH
1:1 · Mapping Ratio
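
The mapping above assumes the camera resolution (cam_w, cam_h) and the screen resolution (scr_w, scr_h) are already known. One way to obtain them, shown here as an illustrative sketch rather than the project's exact code, is from OpenCV's capture properties and PyAutoGUI's screen size:

import cv2
import pyautogui

cap = cv2.VideoCapture(0)
cam_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))    # camera frame width in pixels
cam_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))   # camera frame height in pixels
scr_w, scr_h = pyautogui.size()                   # primary screen resolution
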
05

OS Mouse Control

Sends mouse events directly to the operating system for seamless integration.

import pyautogui
pyautogui.moveTo(screen_x, screen_y)
if gesture == "CLICK":
    pyautogui.click()
elif gesture == "SCROLL":
    pyautogui.scroll(delta_y)
<1 ms · OS Latency
Engineering

AI & Engineering Deep Dive

Under the hood — the computer vision, gesture logic, and performance engineering that makes it work.

Computer Vision Pipeline

MediaPipe Hands

Detects 21 3D landmarks per hand with real-time inference at 30+ FPS on CPU.

Real-Time Inference

Sub-frame latency processing using optimized TFLite models running on-device.

Confidence Smoothing

Exponential moving average filter to reduce jitter and false detections.
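
As a minimal sketch of that idea (the class name and alpha value are illustrative assumptions, not the repository's implementation), an exponential moving average over successive cursor positions looks like this:

class EMASmoother:
    def __init__(self, alpha=0.3):
        self.alpha = alpha   # lower alpha = smoother but laggier; higher = more responsive
        self.state = None

    def update(self, x, y):
        # First sample initializes the filter state
        if self.state is None:
            self.state = (x, y)
        sx, sy = self.state
        # Blend the new measurement into the running average
        self.state = (sx + self.alpha * (x - sx), sy + self.alpha * (y - sy))
        return self.state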

RGB Processing

Converts BGR → RGB for MediaPipe, maintains color space consistency.

21 landmarks per hand · <8 ms/frame inference · 97.2% accuracy
computer_vision.py
import cv2
import mediapipe as mp

# MediaPipe Hand Detection Pipeline
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5
)

# Process frame (MediaPipe expects RGB; OpenCV frames arrive as BGR)
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
results = hands.process(rgb)

if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # Extract the 21 landmarks as normalized (x, y) plus relative depth z
        landmarks = []
        for lm in hand_landmarks.landmark:
            landmarks.append((lm.x, lm.y, lm.z))
Capabilities

Powerful Features

Every feature is designed for precision, performance, and real-world usability.

Hand Tracking

21-point hand landmark detection with sub-pixel precision using MediaPipe neural networks.

Cursor Control

Smooth, responsive cursor movement mapped from hand position to full screen coordinates.

Click & Scroll Gestures

Pinch to click, vertical finger drag to scroll — natural, intuitive gesture mappings.
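
The click gesture is shown in the pipeline above, but the scroll gesture is not, so here is a rough sketch of how a vertical index-finger drag could become scroll events. The helper name, the SCROLL_SENSITIVITY constant, and the landmark layout are assumptions for illustration, not the project's exact logic.

import pyautogui

SCROLL_SENSITIVITY = 40  # assumed scale from normalized movement to scroll clicks

def handle_scroll(landmarks, prev_index_y):
    index_y = landmarks[8][1]                  # normalized y of the index fingertip
    delta = prev_index_y - index_y             # positive when the finger moves up
    pyautogui.scroll(int(delta * SCROLL_SENSITIVITY))
    return index_y                             # remembered for the next frame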

Real-Time Performance

30+ FPS processing with <33ms end-to-end latency. No GPU required — runs on CPU.

Accessibility Friendly

Enables computer control for users with motor disabilities or limited hand mobility.

Cross-Platform Ready

Works on Windows, macOS, and Linux. Requires only Python and a standard webcam.

Code

Code & Architecture

Clean architecture designed for extensibility, performance, and maintainability.

System Architecture

┌──────────────────────────────────────────────────────────┐
│                       Virtual Mouse                       │
├──────────┬──────────┬──────────┬──────────┬──────────────┤
│ Webcam   │ Hand     │ Gesture  │ Coord    │ OS           │
│ Capture  │ Detector │ Engine   │ Mapper   │ Control      │
│          │          │          │          │              │
│ OpenCV   │ MediaPipe│ Custom   │ NumPy    │ PyAutoGUI    │
│ VideoIO  │ Hands    │ Logic    │ Interp   │ Mouse API    │
├──────────┴──────────┴──────────┴──────────┴──────────────┤
│                  Performance Optimizer                    │
│         (Frame Skip · ROI Crop · Kalman Filter)           │
└──────────────────────────────────────────────────────────┘
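
The Kalman-filter stage in the optimizer is not shown in code elsewhere on this page, so here is a minimal per-axis sketch of what utils/smoothing.py might contain. The constant-position model, class name, and noise values are illustrative assumptions rather than the project's actual implementation.

class Kalman1D:
    def __init__(self, process_var=1e-3, measurement_var=1e-2):
        self.q = process_var      # expected drift of the true position per frame
        self.r = measurement_var  # noise in the detected position
        self.x = 0.0              # current estimate
        self.p = 1.0              # variance of the estimate

    def update(self, z):
        # Predict: position is assumed to stay put, uncertainty grows
        self.p += self.q
        # Correct: blend prediction and measurement by the Kalman gain
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1 - k
        return self.x

# One filter per axis smooths jittery detections, e.g.:
kx = Kalman1D()
for measured_x in (410, 415, 398, 405):   # illustrative noisy measurements
    print(kx.update(measured_x))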

Project Structure

virtual-mouse/
├── main.py              # Entry point
├── hand_tracker.py      # MediaPipe hand detection
├── gesture_classifier.py # Gesture recognition logic
├── mouse_controller.py  # OS mouse control
├── performance.py       # FPS & optimization utils
├── config.py            # Tuning parameters
├── utils/
│   ├── smoothing.py     # Kalman filter & interpolation
│   └── coordinates.py   # Coordinate normalization
├── requirements.txt
└── README.md

Core Logic (Pseudocode)

virtual_mouse.py
class VirtualMouse:
    def __init__(self):
        self.camera = cv2.VideoCapture(0)
        self.detector = HandDetector(confidence=0.7)
        self.classifier = GestureClassifier()
        self.controller = MouseController()
        self.optimizer = PerformanceOptimizer()
    
    def run(self):
        while True:
            ret, frame = self.camera.read()
            
            # Detect hand landmarks
            landmarks = self.detector.find_hands(frame)
            
            if landmarks:
                # Classify gesture from landmarks
                gesture = self.classifier.classify(landmarks)
                
                # Map hand coordinates to screen
                screen_pos = self.optimizer.smooth(
                    self.map_coordinates(landmarks[8])
                )
                
                # Execute mouse action
                self.controller.execute(gesture, screen_pos)
            
            # Adaptive performance tuning
            self.optimizer.adjust(self.fps)
Explore the Full Source Code on GitHub
Roadmap

Future Scope & Vision

What's next — the evolution of gesture-based computing.

In Progress

Phase 1 · Multi-Hand Gestures

Support for simultaneous two-hand tracking, enabling more complex gesture vocabularies and bimanual interactions.

Planned

Phase 2 · Gesture Customization

User-defined gesture mapping — assign any hand pose to any computer action through an intuitive configuration UI.

Research

Phase 3 · ML-Based Gesture Learning

Train the system to recognize new gestures on-the-fly using few-shot learning and user demonstrations.

Vision

Phase 4 · AR/VR Integration

Extend gesture control to spatial computing environments — mixed reality headsets and holographic interfaces.

Vision

Phase 5 · Mobile & Edge Devices

Optimize for mobile processors and edge AI chips — on-device inference for IoT and embedded systems.

Creator

Meet the Builder

Created By

Mayank Sharma

AI / ML Engineer

Focused on Computer Vision, Human–AI Interaction, and building intelligent systems that bridge the gap between humans and machines.

GitHub: @mayyanks · LinkedIn: @mayyankks · mayyanks.app · mayankiitj.in

About This Project

Virtual Mouse is an AI-powered Human–Computer Interaction system that uses real-time computer vision to let you control your computer with just hand gestures. No extra hardware is needed beyond a standard webcam.