Gstrl

Gesture, voice, and AI control for macOS.


Gestures.

Just your Mac's webcam. No special hardware.

cursor

👌 Pinch to move

Right hand pinch — move your hand to drag the cursor anywhere on screen.

click

👌 Pinch to click

A quick left pinch clicks. Hold for one second to right-click.

screenshot

⭕ Circle to screenshot

Draw a circle while pinching — captures that region to your clipboard instantly.

↑ ↓ ← →

🖐 Swipe for arrows

Open hand flick in any direction — up, down, left, right arrow keys.

dictate

✊ Fist to speak

Hold a fist to activate speech-to-text. Say "press enter" or just dictate.

AI agent

✊✊ Both fists for AI

Hold both fists — ask Claude anything. The answer is spoken back to you.

scroll ↕

👌✊ Pinch + fist to scroll

Left pinch, right fist. Move to scroll. Accelerates the longer you hold.

⌫ delete

🤙 Six to delete

Hold the shaka — deletes characters, then words, then lines. Both hands 🤙🤙 = delete lines → select all.

drag / select

👌👌 Both pinch to drag

Both hands pinching — drag and drop files, select text, anything.

Voice commands.

Say commands to trigger actions instead of typing. Dictation supports multiple languages.

👆

Click

click · right click · command click

⌨️

Press + key

press enter · press delete · press tab · press escape · press up/down/left/right

Modifiers

command z · control c · shift left · option delete · command shift z
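
A modifier command like these splits into leading modifier words plus a final key. A minimal parsing sketch — the function and set names (`parseVoiceCommand`, `modifierWords`) are illustrative, not taken from Gstrl's source:

```swift
// Sketch only: split a spoken phrase into modifier names plus the final key.
let modifierWords: Set<String> = ["command", "control", "shift", "option"]

func parseVoiceCommand(_ phrase: String) -> (modifiers: [String], key: String)? {
    let words = phrase.lowercased().split(separator: " ").map(String.init)
    guard let key = words.last else { return nil }            // empty phrase
    let mods = Array(words.dropLast())
    guard mods.allSatisfy(modifierWords.contains) else { return nil }  // unknown word
    return (mods, key)
}
// parseVoiceCommand("command shift z") → (["command", "shift"], "z")
```

The parsed pair would then be posted as a `CGEvent` key press with the matching modifier flags.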

Get started.

Clone, build, run. No accounts needed.

Terminal
git clone https://github.com/TomYang-TZ/Gstrl.git
cd Gstrl
make install
make run

Requires macOS 14+, a webcam, and Swift 5.9+. Permissions auto-prompt on first launch. Claude Code CLI optional for AI agent.

How it works.

All processing on-device. No network round-trips. No cloud.

1

Webcam captures frames

AVCaptureSession at 30fps (configurable to 120fps) feeds frames to Apple Vision.
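
The capture step could look roughly like this — a sketch assuming the default webcam device; the function name and queue label are illustrative, not Gstrl's code:

```swift
import AVFoundation

// Sketch: build a session locked to 30 fps and hand frames to a delegate.
func makeCaptureSession(
    delegate: AVCaptureVideoDataOutputSampleBufferDelegate
) throws -> AVCaptureSession {
    let session = AVCaptureSession()
    guard let camera = AVCaptureDevice.default(for: .video) else {
        throw NSError(domain: "NoCamera", code: 1)
    }
    session.addInput(try AVCaptureDeviceInput(device: camera))

    // Lock the frame rate to 30 fps (per the text, configurable up to 120).
    try camera.lockForConfiguration()
    camera.activeVideoMinFrameDuration = CMTime(value: 1, timescale: 30)
    camera.activeVideoMaxFrameDuration = CMTime(value: 1, timescale: 30)
    camera.unlockForConfiguration()

    // Each frame arrives in captureOutput(_:didOutput:from:) on the delegate.
    let output = AVCaptureVideoDataOutput()
    output.setSampleBufferDelegate(delegate, queue: DispatchQueue(label: "frames"))
    session.addOutput(output)
    return session
}
```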

2

Vision detects hand poses

VNDetectHumanHandPoseRequest identifies 21 joints per hand, every frame.
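
Running that request on a single frame is short; a sketch of the detection step (helper name is illustrative):

```swift
import Vision

// Sketch: detect up to two hands in one camera frame.
// Each observation exposes 21 named joints (wrist plus four per finger).
func detectHands(in pixelBuffer: CVPixelBuffer) throws -> [VNHumanHandPoseObservation] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2   // both-hands combos need two

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])
    return request.results ?? []
}
// try observation.recognizedPoint(.thumbTip) then yields a normalized
// (x, y) location plus a confidence score for that joint.
```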

3

Classifier maps to actions

Pinch detection, velocity-based swipes, and combo tracking turn poses into CGEvents.
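
The pinch part of that classifier reduces to a distance check between two joints. A minimal sketch — the 0.05 normalized-distance threshold is an assumption, not Gstrl's actual tuning:

```swift
// Sketch: a pinch fires when thumb tip and index tip fall within a
// small normalized distance of each other.
struct Landmark { var x: Double; var y: Double }

func isPinching(thumbTip: Landmark, indexTip: Landmark,
                threshold: Double = 0.05) -> Bool {
    let dx = thumbTip.x - indexTip.x
    let dy = thumbTip.y - indexTip.y
    return (dx * dx + dy * dy).squareRoot() < threshold
}
// A classifier layer would then turn pinch transitions (and how long
// they are held) into CGEvent mouse events.
```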

4

Speech + voice commands

Hold a fist to activate speech. Dictate text or say "press enter", "command z", "click" to trigger actions. Dictation supports multiple languages.
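
The command-vs-dictation split described above can be sketched as a routing step: a recognized transcript either matches a known command phrase or is typed verbatim. Type and function names here are illustrative, not Gstrl's API:

```swift
import Foundation

// Sketch: route a speech transcript to a command or to plain typing.
enum SpeechAction: Equatable {
    case runCommand(String)   // e.g. "press enter", "command z", "click"
    case typeText(String)
}

let commandPhrases: Set<String> = [
    "click", "right click", "command click",
    "press enter", "press delete", "press tab", "press escape",
]

func route(_ transcript: String) -> SpeechAction {
    let normalized = transcript.lowercased()
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return commandPhrases.contains(normalized)
        ? .runCommand(normalized)
        : .typeText(transcript)
}
```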