Gstrl

Gesture, voice, and AI control for macOS.


Gestures.

Just your Mac's webcam. No special hardware.

cursor

👌 Pinch to move

Right hand pinch — move your hand to drag the cursor anywhere on screen.

click

👌 Pinch to click

A quick left pinch clicks. Hold for one second to right-click.

screenshot

⭕ Circle to screenshot

Draw a circle while pinching — captures that region to your clipboard instantly.

↑ ↓ ← →

🖐 Swipe for arrows

Open hand flick in any direction — up, down, left, right arrow keys.

dictate

✊ Fist to speak

Hold a fist to activate speech-to-text. Say "press enter" or just dictate.

AI agent

✊✊ Both fists for AI

Hold both fists — ask Claude anything. The answer is spoken back to you.

scroll ↕

👌✊ Pinch + fist to scroll

Left pinch, right fist. Move to scroll. Accelerates the longer you hold.

⌫ delete

🤙 Six to delete

Hold the shaka — deletes characters, then words, then lines. Both hands 🤙🤙 = delete lines → select all.

drag / select

👌👌 Both pinch to drag

Both hands pinching — drag and drop files, select text, anything.

Voice commands.

Say commands to trigger actions instead of typing. Dictation supports multiple languages.

👆

Click

click · right click · command click

⌨️

Press + key

press enter · press delete · press tab · press escape · press up/down/left/right

Modifiers

command z · control c · shift left · option delete · command shift z
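
A modifier command like these splits into leading modifier words plus a final key. A minimal parsing sketch — the function and set names (`parseVoiceCommand`, `modifierWords`) are illustrative, not taken from Gstrl's source:

```swift
// Sketch only: split a spoken phrase into modifier names plus the final key.
let modifierWords: Set<String> = ["command", "control", "shift", "option"]

func parseVoiceCommand(_ phrase: String) -> (modifiers: [String], key: String)? {
    let words = phrase.lowercased().split(separator: " ").map(String.init)
    guard let key = words.last else { return nil }            // empty phrase
    let mods = Array(words.dropLast())
    guard mods.allSatisfy(modifierWords.contains) else { return nil }  // unknown word
    return (mods, key)
}
// parseVoiceCommand("command shift z") → (["command", "shift"], "z")
```

The parsed pair would then be posted as a `CGEvent` key press with the matching modifier flags.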

Get started.

Clone, build, run. No accounts needed.

Terminal
git clone https://github.com/TomYang-TZ/Gstrl.git
cd Gstrl
make install
make run

Requires macOS 14+, a webcam, and Swift 5.9+. Permissions auto-prompt on first launch. Claude Code CLI optional for AI agent.

How it works.

All processing on-device. No network round-trips. No cloud.

1

Webcam captures frames

AVCaptureSession at 30fps (configurable to 120fps) feeds frames to Apple Vision.
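
The capture step could look roughly like this — a sketch assuming the default webcam device; the function name and queue label are illustrative, not Gstrl's code:

```swift
import AVFoundation

// Sketch: build a session locked to 30 fps and hand frames to a delegate.
func makeCaptureSession(
    delegate: AVCaptureVideoDataOutputSampleBufferDelegate
) throws -> AVCaptureSession {
    let session = AVCaptureSession()
    guard let camera = AVCaptureDevice.default(for: .video) else {
        throw NSError(domain: "NoCamera", code: 1)
    }
    session.addInput(try AVCaptureDeviceInput(device: camera))

    // Lock the frame rate to 30 fps (per the text, configurable up to 120).
    try camera.lockForConfiguration()
    camera.activeVideoMinFrameDuration = CMTime(value: 1, timescale: 30)
    camera.activeVideoMaxFrameDuration = CMTime(value: 1, timescale: 30)
    camera.unlockForConfiguration()

    // Each frame arrives in captureOutput(_:didOutput:from:) on the delegate.
    let output = AVCaptureVideoDataOutput()
    output.setSampleBufferDelegate(delegate, queue: DispatchQueue(label: "frames"))
    session.addOutput(output)
    return session
}
```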

2

Vision detects hand poses

VNDetectHumanHandPoseRequest identifies 21 joints per hand, every frame.
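
Running that request on a single frame is short; a sketch of the detection step (helper name is illustrative):

```swift
import Vision

// Sketch: detect up to two hands in one camera frame.
// Each observation exposes 21 named joints (wrist plus four per finger).
func detectHands(in pixelBuffer: CVPixelBuffer) throws -> [VNHumanHandPoseObservation] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2   // both-hands combos need two

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])
    return request.results ?? []
}
// try observation.recognizedPoint(.thumbTip) then yields a normalized
// (x, y) location plus a confidence score for that joint.
```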

3

Classifier maps to actions

Pinch detection, velocity-based swipes, and combo tracking turn poses into CGEvents.
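
The pinch part of that classifier reduces to a distance check between two joints. A minimal sketch — the 0.05 normalized-distance threshold is an assumption, not Gstrl's actual tuning:

```swift
// Sketch: a pinch fires when thumb tip and index tip fall within a
// small normalized distance of each other.
struct Landmark { var x: Double; var y: Double }

func isPinching(thumbTip: Landmark, indexTip: Landmark,
                threshold: Double = 0.05) -> Bool {
    let dx = thumbTip.x - indexTip.x
    let dy = thumbTip.y - indexTip.y
    return (dx * dx + dy * dy).squareRoot() < threshold
}
// A classifier layer would then turn pinch transitions (and how long
// they are held) into CGEvent mouse events.
```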

4

Speech + voice commands

Hold a fist to activate speech. Dictate text or say "press enter", "command z", "click" to trigger actions. Dictation supports multiple languages.
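
The command-vs-dictation split described above can be sketched as a routing step: a recognized transcript either matches a known command phrase or is typed verbatim. Type and function names here are illustrative, not Gstrl's API:

```swift
import Foundation

// Sketch: route a speech transcript to a command or to plain typing.
enum SpeechAction: Equatable {
    case runCommand(String)   // e.g. "press enter", "command z", "click"
    case typeText(String)
}

let commandPhrases: Set<String> = [
    "click", "right click", "command click",
    "press enter", "press delete", "press tab", "press escape",
]

func route(_ transcript: String) -> SpeechAction {
    let normalized = transcript.lowercased()
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return commandPhrases.contains(normalized)
        ? .runCommand(normalized)
        : .typeText(transcript)
}
```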