AI GAZE DETECTION · LOCAL · PRIVATE

Your focus,
under surveillance.

Drsti uses real-time AI gaze detection to measure exactly when you're working and when you drift — down to the second. No guesswork. No self-reporting.

2s · Analysis interval
468 · Face landmarks tracked
0ms · Data sent to cloud
100% · Local processing
HOW IT WORKS

Six steps from
webcam to insight

01
Start the timer
Hit play on a focus session. The webcam activates and the AI engine begins sampling.
02
Frame sampling
Every 2 seconds, a frame is captured from your webcam, encoded as JPEG, and sent to the local Python API at localhost:5000.
03
Landmark detection
MediaPipe FaceMesh maps 468 facial landmarks in the frame and returns their normalized coordinates.
04
Gaze estimation
Head yaw and pitch are calculated from nose-to-eye ratios. Eye aspect ratio detects closed eyes.
05
Focus scoring
Each frame is classified as focused or away. A 30-sample rolling window produces your live focus score (0–100).
06
Session report
When the timer ends, you get a detailed breakdown: focused time, away time, per-minute chart, and a save option.
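The report in step 06 is simple arithmetic over the per-sample labels. Here is a minimal sketch in Python, assuming samples arrive as (timestamp, label) pairs; the function and field names are illustrative, not Drsti's actual code:

```python
from collections import Counter

def session_report(samples, interval_s=2):
    """Aggregate per-sample focus labels into a session summary.

    samples: list of (timestamp_s, label) pairs where label is "focused"
    or "away", one sample every `interval_s` seconds (Drsti samples
    every 2 seconds).
    """
    counts = Counter(label for _, label in samples)
    focused_s = counts["focused"] * interval_s
    away_s = counts["away"] * interval_s

    # Per-minute chart: percentage of focused samples in each minute bucket.
    minutes = {}
    for t, label in samples:
        bucket = int(t // 60)
        minutes.setdefault(bucket, []).append(label == "focused")
    per_minute = [
        round(100 * sum(flags) / len(flags))
        for _, flags in sorted(minutes.items())
    ]
    return {"focused_s": focused_s, "away_s": away_s, "per_minute": per_minute}
```

For a 2-minute session that was fully focused in minute one and fully away in minute two, this yields 60 s focused, 60 s away, and a per-minute chart of [100, 0].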
THE AI AGENT

What is the AI actually doing?

Drsti's AI is not a black box. It is a deterministic, rule-based system built on top of MediaPipe FaceMesh — Google's open-source face landmark detection library. It does not learn from your data, does not adapt over time, and does not make probabilistic inferences.

Every two seconds while your timer is running, a single JPEG frame is captured from your webcam and sent to a local Flask server running on your machine at localhost:5000. The server runs MediaPipe FaceMesh on the frame, which identifies 468 facial landmark coordinates in normalized (0–1) space.

From those 468 points, Drsti extracts three specific signals: head yaw (left/right rotation), head pitch (up/down tilt), and eye aspect ratio (detecting closed eyes). If any threshold is exceeded, that second is marked as "away."
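The kind of arithmetic involved can be sketched with plain normalized coordinates. This is an illustrative approximation, not Drsti's implementation: the 90° scale factor is an arbitrary calibration choice, and the real landmark geometry is more involved:

```python
import math

def estimate_yaw_deg(nose, left_eye, right_eye):
    """Approximate head yaw from the nose's horizontal offset between the eyes.

    All points are (x, y) in normalized 0-1 space, as MediaPipe returns them.
    When the head turns, the nose tip shifts toward one eye; that offset is
    mapped to an angle. The 90-degree scale is an illustrative calibration.
    """
    eye_center_x = (left_eye[0] + right_eye[0]) / 2
    eye_span = abs(right_eye[0] - left_eye[0]) or 1e-6
    offset = (nose[0] - eye_center_x) / eye_span  # roughly -0.5 .. 0.5
    return offset * 90.0

def eye_aspect_ratio(upper_lid, lower_lid, inner_corner, outer_corner):
    """EAR = vertical lid opening / horizontal eye width; small when closed."""
    vertical = math.dist(upper_lid, lower_lid)
    horizontal = math.dist(outer_corner, inner_corner) or 1e-6
    return vertical / horizontal
```

With a centered nose the yaw comes out near zero; a nose shifted well toward one eye crosses the 25° threshold, and a lid opening of a fifth of the eye's width gives an EAR around 0.2, just above the 0.18 cutoff.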

Head yaw threshold: ±25° (turned more than 25° sideways)
Head pitch threshold: ±20° (looking more than 20° up or down)
Eye aspect ratio: < 0.18 (eyes closed or nearly closed)
No face detected: away (face not found in the frame counts as away)
Sample interval: 2s (one frame analyzed every 2 seconds)
Rolling window: 30 frames (score based on the last 60 seconds)
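Put together with the 30-frame window, the live score logic amounts to something like this sketch (threshold values are the ones listed above; the class name and API are illustrative):

```python
from collections import deque

class FocusScorer:
    """Rolling 0-100 focus score over the last 30 samples (60 s at 2 s/frame)."""

    def __init__(self, window=30):
        self.samples = deque(maxlen=window)

    def add_frame(self, yaw=None, pitch=None, ear=None):
        """Classify one sampled frame and return True if focused."""
        if yaw is None:
            focused = False  # no face detected in the frame -> away
        else:
            focused = abs(yaw) <= 25 and abs(pitch) <= 20 and ear >= 0.18
        self.samples.append(focused)
        return focused

    @property
    def score(self):
        if not self.samples:
            return 0
        return round(100 * sum(self.samples) / len(self.samples))
```

Fifteen focused frames followed by fifteen face-lost frames, for example, produce a live score of 50.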
What Drsti's AI cannot and does not do
Read emotions or expressions
Identify who you are
Store or transmit video
Train on your data
Access any other app
Run when timer is off
Connect to the internet
Infer cognitive load
SAFETY & PRIVACY

Six safety guarantees

No video is ever stored
Drsti never saves, uploads, or transmits your webcam feed. Frames are captured, analyzed in under 200ms, then discarded immediately. Nothing persists beyond the analysis window.
Entirely local inference
The face detection model (MediaPipe FaceMesh) runs on your own hardware inside a PyInstaller-bundled binary. Your webcam data never leaves your machine — not to our servers, not to Google, not anywhere.
No biometric data stored
Drsti does not store facial geometry, embeddings, or any biometric identifiers. The only data saved per session is: duration, focused seconds, away seconds, and a per-minute focus percentage — no face data.
Open detection thresholds
The distraction thresholds (head yaw > 25°, pitch > 20°, eye aspect ratio < 0.18) are documented and adjustable. You can set sensitivity to strict, balanced, or relaxed in settings.
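Those sensitivity levels could map to threshold sets like the following. Only the balanced values are documented above; the strict and relaxed numbers here are illustrative assumptions:

```python
# Hypothetical preset values: only "balanced" matches the documented
# defaults (yaw 25°, pitch 20°, EAR 0.18); strict/relaxed are assumptions.
SENSITIVITY_PRESETS = {
    "strict":   {"yaw_deg": 15, "pitch_deg": 12, "ear_min": 0.22},
    "balanced": {"yaw_deg": 25, "pitch_deg": 20, "ear_min": 0.18},
    "relaxed":  {"yaw_deg": 35, "pitch_deg": 30, "ear_min": 0.14},
}

def is_away(yaw, pitch, ear, preset="balanced"):
    """Apply the chosen preset's thresholds to one frame's gaze signals."""
    t = SENSITIVITY_PRESETS[preset]
    return (
        abs(yaw) > t["yaw_deg"]
        or abs(pitch) > t["pitch_deg"]
        or ear < t["ear_min"]
    )
```

The same 30° head turn counts as away on balanced but not on relaxed, which is the whole point of exposing the thresholds.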
You control the camera
The webcam only activates when your focus timer is running. It stops the moment you pause or the session ends. The OS camera indicator light will always reflect this accurately.
Row-level security on all data
Every Supabase query is protected by RLS policies. Users can only read, write, and delete their own session rows. Even with a leaked anon key, no cross-user data access is possible.
TECHNOLOGY

Full stack, open architecture

FACE LANDMARK DETECTION
MediaPipe FaceMesh
468 facial landmarks mapped in real time at ~30ms per frame. Runs entirely on your machine — no cloud.
FRAME PROCESSING
OpenCV
Decodes webcam frames, converts color spaces, and feeds processed images to the gaze estimator.
LOCAL AI SERVER
Flask + Python
A lightweight HTTP server (localhost:5000) that Electron spawns silently. Receives frames, returns gaze data.
DESKTOP SHELL
Electron
Wraps the Next.js renderer in a native window. Handles OS notifications, IPC, and the Python process lifecycle.
UI FRAMEWORK
Next.js 15
React-based UI running inside Electron. Same codebase powers this website and the desktop app UI.
AUTH + DATABASE
Supabase
Google OAuth and Postgres for session history. Row-level security ensures users can only access their own data.
ARCHITECTURE FLOW
Webcam (OS capture) → Canvas (JPEG frame) → Flask API (localhost:5000) → MediaPipe (468 landmarks) → Gaze Engine (yaw + pitch + EAR) → Score (0–100 rolling) → React UI (live display)
CONNECT

Get in touch

Have questions about the architecture? Found a bug? Just want to connect? Send a message or reach out on my socials.

LinkedIn · GitHub · LeetCode · CodeChef · Portfolio
Send a Message