AI GAZE DETECTION · LOCAL · PRIVATE

Your focus,
under surveillance.

Drsti uses real-time AI gaze detection to measure exactly when you're working and when you drift — down to the second. No guesswork. No self-reporting.

2s · Analysis interval
468 · Face landmarks tracked
0ms · Data sent to cloud
100% · Local processing
HOW IT WORKS

Six steps from
webcam to insight

01
Start the timer
Hit play on a focus session. The webcam activates and the AI engine begins sampling.
02
Frame sampling
Every 2 seconds, a frame is captured from your webcam, encoded as JPEG, and sent to the local Python API at localhost:5000.
03
Landmark detection
MediaPipe FaceMesh maps 468 facial landmarks in the frame and returns their normalized coordinates.
04
Gaze estimation
Head yaw and pitch are calculated from nose-to-eye ratios. Eye aspect ratio detects closed eyes.
05
Focus scoring
Each frame is classified as focused or away. A 30-sample rolling window produces your live focus score (0–100).
06
Session report
When the timer ends, you get a detailed breakdown: focused time, away time, per-minute chart, and a save option.
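The report in step 06 is simple arithmetic over the per-sample labels. Here is a minimal sketch in Python, assuming samples arrive as (timestamp, label) pairs; the function and field names are illustrative, not Drsti's actual code:

```python
from collections import Counter

def session_report(samples, interval_s=2):
    """Aggregate per-sample focus labels into a session summary.

    samples: list of (timestamp_s, label) pairs where label is "focused"
    or "away", one sample every `interval_s` seconds (Drsti samples
    every 2 seconds).
    """
    counts = Counter(label for _, label in samples)
    focused_s = counts["focused"] * interval_s
    away_s = counts["away"] * interval_s

    # Per-minute chart: percentage of focused samples in each minute bucket.
    minutes = {}
    for t, label in samples:
        bucket = int(t // 60)
        minutes.setdefault(bucket, []).append(label == "focused")
    per_minute = [
        round(100 * sum(flags) / len(flags))
        for _, flags in sorted(minutes.items())
    ]
    return {"focused_s": focused_s, "away_s": away_s, "per_minute": per_minute}
```

For a 2-minute session that was fully focused in minute one and fully away in minute two, this yields 60 s focused, 60 s away, and a per-minute chart of [100, 0].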
THE AI AGENT

What is the AI actually doing?

Drsti's AI is not a black box. It is a deterministic, rule-based system built on top of MediaPipe FaceMesh — Google's open-source face landmark detection library. It does not learn from your data, does not adapt over time, and does not make probabilistic inferences.

Every two seconds while your timer is running, a single JPEG frame is captured from your webcam and sent to a local Flask server running on your machine at localhost:5000. The server runs MediaPipe FaceMesh on the frame, which identifies 468 facial landmark coordinates in normalized (0–1) space.

From those 468 points, Drsti extracts three specific signals: head yaw (left/right rotation), head pitch (up/down tilt), and eye aspect ratio (detecting closed eyes). If any threshold is exceeded, that second is marked as "away."
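The kind of arithmetic involved can be sketched with plain normalized coordinates. This is an illustrative approximation, not Drsti's implementation: the 90° scale factor is an arbitrary calibration choice, and the real landmark geometry is more involved:

```python
import math

def estimate_yaw_deg(nose, left_eye, right_eye):
    """Approximate head yaw from the nose's horizontal offset between the eyes.

    All points are (x, y) in normalized 0-1 space, as MediaPipe returns them.
    When the head turns, the nose tip shifts toward one eye; that offset is
    mapped to an angle. The 90-degree scale is an illustrative calibration.
    """
    eye_center_x = (left_eye[0] + right_eye[0]) / 2
    eye_span = abs(right_eye[0] - left_eye[0]) or 1e-6
    offset = (nose[0] - eye_center_x) / eye_span  # roughly -0.5 .. 0.5
    return offset * 90.0

def eye_aspect_ratio(upper_lid, lower_lid, inner_corner, outer_corner):
    """EAR = vertical lid opening / horizontal eye width; small when closed."""
    vertical = math.dist(upper_lid, lower_lid)
    horizontal = math.dist(outer_corner, inner_corner) or 1e-6
    return vertical / horizontal
```

With a centered nose the yaw comes out near zero; a nose shifted well toward one eye crosses the 25° threshold, and a lid opening of a fifth of the eye's width gives an EAR around 0.2, just above the 0.18 cutoff.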

Head yaw threshold: ±25° (turned more than 25° sideways)
Head pitch threshold: ±20° (looking more than 20° up or down)
Eye aspect ratio: < 0.18 (eyes closed or nearly closed)
No face detected: away (face not found in the frame counts as away)
Sample interval: 2s (one frame analyzed every 2 seconds)
Rolling window: 30 frames (score based on the last 60 seconds)
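Put together with the 30-frame window, the live score logic amounts to something like this sketch (threshold values are the ones listed above; the class name and API are illustrative):

```python
from collections import deque

class FocusScorer:
    """Rolling 0-100 focus score over the last 30 samples (60 s at 2 s/frame)."""

    def __init__(self, window=30):
        self.samples = deque(maxlen=window)

    def add_frame(self, yaw=None, pitch=None, ear=None):
        """Classify one sampled frame and return True if focused."""
        if yaw is None:
            focused = False  # no face detected in the frame -> away
        else:
            focused = abs(yaw) <= 25 and abs(pitch) <= 20 and ear >= 0.18
        self.samples.append(focused)
        return focused

    @property
    def score(self):
        if not self.samples:
            return 0
        return round(100 * sum(self.samples) / len(self.samples))
```

Fifteen focused frames followed by fifteen face-lost frames, for example, produce a live score of 50.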
What Drsti's AI cannot and does not do
Read emotions or expressions
Identify who you are
Store or transmit video
Train on your data
Access any other app
Run when timer is off
Connect to the internet
Infer cognitive load
SAFETY & PRIVACY

Six safety guarantees

No video is ever stored
Drsti never saves, uploads, or transmits your webcam feed. Frames are captured, analyzed in under 200ms, then discarded immediately. Nothing persists beyond the analysis window.
Entirely local inference
The face detection model (MediaPipe FaceMesh) runs on your own hardware inside a PyInstaller-bundled binary. Your webcam data never leaves your machine — not to our servers, not to Google, not anywhere.
No biometric data stored
Drsti does not store facial geometry, embeddings, or any biometric identifiers. The only data saved per session is: duration, focused seconds, away seconds, and a per-minute focus percentage — no face data.
Open detection thresholds
The distraction thresholds (head yaw > 25°, pitch > 20°, eye aspect ratio < 0.18) are documented and adjustable. You can set sensitivity to strict, balanced, or relaxed in settings.
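Those sensitivity levels could map to threshold sets like the following. Only the balanced values are documented above; the strict and relaxed numbers here are illustrative assumptions:

```python
# Hypothetical preset values: only "balanced" matches the documented
# defaults (yaw 25°, pitch 20°, EAR 0.18); strict/relaxed are assumptions.
SENSITIVITY_PRESETS = {
    "strict":   {"yaw_deg": 15, "pitch_deg": 12, "ear_min": 0.22},
    "balanced": {"yaw_deg": 25, "pitch_deg": 20, "ear_min": 0.18},
    "relaxed":  {"yaw_deg": 35, "pitch_deg": 30, "ear_min": 0.14},
}

def is_away(yaw, pitch, ear, preset="balanced"):
    """Apply the chosen preset's thresholds to one frame's gaze signals."""
    t = SENSITIVITY_PRESETS[preset]
    return (
        abs(yaw) > t["yaw_deg"]
        or abs(pitch) > t["pitch_deg"]
        or ear < t["ear_min"]
    )
```

The same 30° head turn counts as away on balanced but not on relaxed, which is the whole point of exposing the thresholds.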
You control the camera
The webcam only activates when your focus timer is running. It stops the moment you pause or the session ends. The OS camera indicator light will always reflect this accurately.
Row-level security on all data
Every Supabase query is protected by RLS policies. Users can only read, write, and delete their own session rows. Even with a leaked anon key, no cross-user data access is possible.
TECHNOLOGY

Full stack, open architecture

FACE LANDMARK DETECTION
MediaPipe FaceMesh
468 facial landmarks mapped in real time at ~30ms per frame. Runs entirely on your machine — no cloud.
FRAME PROCESSING
OpenCV
Decodes webcam frames, converts color spaces, and feeds processed images to the gaze estimator.
LOCAL AI SERVER
Flask + Python
A lightweight HTTP server (localhost:5000) that Electron spawns silently. Receives frames, returns gaze data.
DESKTOP SHELL
Electron
Wraps the Next.js renderer in a native window. Handles OS notifications, IPC, and the Python process lifecycle.
UI FRAMEWORK
Next.js 15
React-based UI running inside Electron. Same codebase powers this website and the desktop app UI.
AUTH + DATABASE
Supabase
Google OAuth and Postgres for session history. Row-level security ensures users can only access their own data.
ARCHITECTURE FLOW
Webcam (OS capture) → Canvas (JPEG frame) → Flask API (localhost:5000) → MediaPipe (468 landmarks) → Gaze Engine (yaw + pitch + EAR) → Score (0–100 rolling) → React UI (live display)
CONNECT

Get in touch

Have questions about the architecture? Found a bug? Just want to connect? Send a message or reach out on my socials.

LinkedIn · GitHub · LeetCode · CodeChef · Portfolio
Send a Message