Keyboards and mice are no longer the only way to drive a computer. The future of human-computer interaction is gesture-based.
From webcam feed to mouse control — a real-time AI pipeline processing every frame.
Captures real-time video frames from any standard webcam at 30+ FPS.
import cv2

cap = cv2.VideoCapture(0)    # Open the default webcam
ret, frame = cap.read()      # Grab a frame (ret is False on failure)
frame = cv2.flip(frame, 1)   # Mirror the image for natural movement

MediaPipe Hands detects 21 3D landmarks on each hand with sub-pixel accuracy.
import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7
)
results = hands.process(rgb_frame)  # rgb_frame: the captured frame converted to RGB

Interprets landmark positions to classify gestures: point, pinch, scroll, and more.
# Landmark 8: index fingertip, landmark 4: thumb tip
index_tip = landmarks[8]
thumb_tip = landmarks[4]
# calculate_distance: Euclidean distance between two landmarks
distance = calculate_distance(
    index_tip, thumb_tip
)
if distance < PINCH_THRESHOLD:
    gesture = "CLICK"

Normalizes hand coordinates to screen space with smoothing interpolation.
import numpy as np

# Map camera-space fingertip position to screen coordinates
screen_x = np.interp(
    index_x, (0, cam_w), (0, scr_w)
)
screen_y = np.interp(
    index_y, (0, cam_h), (0, scr_h)
)
# Apply smoothing (SMOOTH: interpolation factor in (0, 1])
smooth_x = prev_x + (screen_x - prev_x) * SMOOTH

Sends mouse events directly to the operating system for seamless integration.
import pyautogui

pyautogui.moveTo(screen_x, screen_y)
if gesture == "CLICK":
    pyautogui.click()
elif gesture == "SCROLL":
    pyautogui.scroll(delta_y)  # delta_y: signed scroll amount from the gesture

Under the hood — the computer vision, gesture logic, and performance engineering that makes it work.
Detects 21 3D landmarks per hand with real-time inference at 30+ FPS on CPU.
Sub-frame latency processing using optimized TFLite models running on-device.
Exponential moving average filter to reduce jitter and false detections (see the smoothing sketch after this list).
Converts BGR → RGB for MediaPipe, maintains color space consistency.
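The jitter reduction mentioned above comes down to a short recurrence: each new fingertip position is blended with the previous smoothed position. Here is a minimal sketch, assuming a hypothetical EMASmoother helper whose name and alpha default are illustrative rather than the project's actual API:

# Exponential moving average (EMA) smoothing sketch (illustrative names)
class EMASmoother:
    def __init__(self, alpha=0.3):
        self.alpha = alpha    # Weight of the newest sample, 0 < alpha <= 1
        self.state = None     # Last smoothed (x, y) position

    def update(self, x, y):
        # Blend the new sample with the previous state: s = s + alpha * (sample - s)
        if self.state is None:
            self.state = (x, y)            # First sample initializes the filter
        else:
            sx, sy = self.state
            self.state = (sx + self.alpha * (x - sx),
                          sy + self.alpha * (y - sy))
        return self.state

Lower alpha values give a steadier cursor at the cost of a little lag. In the project layout shown further down, this kind of filtering belongs in utils/smoothing.py.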
# MediaPipe Hand Detection Pipeline
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5
)

# Process frame
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
results = hands.process(rgb)

if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # Extract 21 normalized (x, y, z) landmarks
        landmarks = []
        for lm in hand_landmarks.landmark:
            landmarks.append((lm.x, lm.y, lm.z))

Every feature is designed for precision, performance, and real-world usability.
21-point hand landmark detection with sub-pixel precision using MediaPipe neural networks.
Smooth, responsive cursor movement mapped from hand position to full screen coordinates.
Pinch to click, vertical finger drag to scroll — natural, intuitive gesture mappings (sketched after this list).
30+ FPS processing with <33ms end-to-end latency. No GPU required — runs on CPU.
Enables computer control for users with motor disabilities or limited hand mobility.
Works on Windows, macOS, and Linux. Requires only Python and a standard webcam.
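To make the pinch and scroll mappings above concrete, here is a minimal gesture-decision sketch. The thresholds, the SCROLL_GAIN constant, and the classify helper are illustrative assumptions, not the project's actual configuration:

# Illustrative gesture mapping: pinch -> click, vertical index drag -> scroll
import math

PINCH_THRESHOLD = 0.05   # Normalized thumb-index distance that counts as a pinch
SCROLL_GAIN = 500        # Converts normalized vertical movement into scroll ticks

def classify(landmarks, prev_index_y):
    """Return (gesture, scroll_delta) from 21 normalized (x, y, z) landmarks."""
    thumb_tip, index_tip = landmarks[4], landmarks[8]
    pinch_dist = math.dist(thumb_tip[:2], index_tip[:2])

    if pinch_dist < PINCH_THRESHOLD:
        return "CLICK", 0

    # A vertical index-finger drag drives scrolling via the change in fingertip height
    delta_y = int((prev_index_y - index_tip[1]) * SCROLL_GAIN)
    if abs(delta_y) > 5:
        return "SCROLL", delta_y

    return "POINT", 0

In the structure shown below, this kind of logic would live in gesture_classifier.py, with thresholds kept in config.py.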
Clean architecture designed for extensibility, performance, and maintainability.
┌──────────────────────────────────────────────────────────┐
│                      Virtual Mouse                        │
├──────────┬──────────┬──────────┬──────────┬──────────────┤
│  Webcam  │   Hand   │ Gesture  │  Coord   │      OS      │
│ Capture  │ Detector │  Engine  │  Mapper  │   Control    │
│          │          │          │          │              │
│  OpenCV  │MediaPipe │  Custom  │  NumPy   │  PyAutoGUI   │
│ VideoIO  │  Hands   │  Logic   │  Interp  │  Mouse API   │
├──────────┴──────────┴──────────┴──────────┴──────────────┤
│                  Performance Optimizer                    │
│         (Frame Skip · ROI Crop · Kalman Filter)           │
└──────────────────────────────────────────────────────────┘

virtual-mouse/
├── main.py                  # Entry point
├── hand_tracker.py          # MediaPipe hand detection
├── gesture_classifier.py    # Gesture recognition logic
├── mouse_controller.py      # OS mouse control
├── performance.py           # FPS & optimization utils
├── config.py                # Tuning parameters
├── utils/
│   ├── smoothing.py         # Kalman filter & interpolation
│   └── coordinates.py       # Coordinate normalization
├── requirements.txt
└── README.md

class VirtualMouse:
    def __init__(self):
        self.camera = cv2.VideoCapture(0)
        self.detector = HandDetector(confidence=0.7)
        self.classifier = GestureClassifier()
        self.controller = MouseController()
        self.optimizer = PerformanceOptimizer()

    def run(self):
        while True:
            ret, frame = self.camera.read()
            # Detect hand landmarks
            landmarks = self.detector.find_hands(frame)
            if landmarks:
                # Classify gesture from landmarks
                gesture = self.classifier.classify(landmarks)
                # Map hand coordinates to screen (landmark 8: index fingertip)
                screen_pos = self.optimizer.smooth(
                    self.map_coordinates(landmarks[8])
                )
                # Execute mouse action
                self.controller.execute(gesture, screen_pos)
            # Adaptive performance tuning (self.fps is tracked by the FPS utilities)
            self.optimizer.adjust(self.fps)

What's next — the evolution of gesture-based computing.
Support for simultaneous two-hand tracking, enabling more complex gesture vocabularies and bimanual interactions.
User-defined gesture mapping — assign any hand pose to any computer action through an intuitive configuration UI.
Train the system to recognize new gestures on-the-fly using few-shot learning and user demonstrations.
Extend gesture control to spatial computing environments — mixed reality headsets and holographic interfaces.
Optimize for mobile processors and edge AI chips — on-device inference for IoT and embedded systems.
AI / ML Engineer
Focused on Computer Vision, Human–AI Interaction, and building intelligent systems that bridge the gap between humans and machines.