Local Core ML vision for macOS

Mac Vision Tools

A menu bar workspace for real-time object detection, emotion monitoring, privacy guardrails, and focus sessions from camera or screen capture.

Four workflows

Built for live visual awareness

Standard

Object detection

Run a bundled SSD MobileNet detector or bring your own Core ML model for labeled detections.

Emotion

Face emotion cues

Detect faces first, classify the visible emotion, and keep a short recent history.

Privacy

Presence threshold

Count visible people and start the screen saver when your configured threshold is reached.

Focus

Attention timer

Use native Apple Vision head-pose tracking to count focused time toward a session goal.

Camera or screen

Choose the capture source and display style
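The camera path is a standard AVFoundation capture session, and the screen path comes from ScreenCaptureKit (see the frameworks listed under "Inference stays local" below). A minimal sketch of the camera side, with frame delivery via the video data output's delegate omitted:

    import AVFoundation

    enum CaptureError: Error { case noCamera, cannotConfigure }

    // Minimal camera capture sketch; the screen source would be built on
    // ScreenCaptureKit's SCStream instead of AVCaptureSession.
    func makeCameraSession() throws -> AVCaptureSession {
        let session = AVCaptureSession()
        guard let camera = AVCaptureDevice.default(for: .video) else {
            throw CaptureError.noCamera
        }
        let input = try AVCaptureDeviceInput(device: camera)
        let output = AVCaptureVideoDataOutput()
        guard session.canAddInput(input), session.canAddOutput(output) else {
            throw CaptureError.cannotConfigure
        }
        session.addInput(input)
        session.addOutput(output)
        session.startRunning()   // best moved off the main thread in real code
        return session
    }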

Screenshots: windowed detection preview, emotion mode controls, privacy guard threshold, and native focus tracking.

Model contracts

Know what to train or drop in

Mac Vision Tools accepts local Core ML models. Use these contracts when replacing a bundled model, training a new one, or choosing whether a file belongs in detection or emotion mode.

Bundled detector

SSD MobileNet V2 COCO

Used by Standard Detection and Privacy Guard for live object labels and person counting.

Task Object detection
Classes 80 COCO object classes. Privacy Guard depends on the person class being present and named clearly.
Bundled COCO labels:

person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush.

Input Bundled export uses a 320 x 320 image input. Custom detectors can use a different size as long as Vision accepts the Core ML model.
Output Vision object detections, or SSD-style multi-array boxes, scores, class IDs, and count outputs compatible with the app's parser.
Train/drop in Train a Core ML detector with stable labels, select it in Standard Detection first, then use it for Privacy Guard only if it reliably emits person.
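
As an illustration of this contract, here is a minimal sketch of running a detector through Vision and reading labeled boxes. The function name and model URL are placeholders; when a model emits raw SSD-style multi-arrays instead of recognized objects, custom decoding like the app's parser takes over.

    import CoreML
    import Vision
    import CoreGraphics

    // Sketch: run a compiled Core ML detector through Vision.
    func detectObjects(in image: CGImage, modelURL: URL) throws
            -> [(label: String, confidence: Float, box: CGRect)] {
        let mlModel = try MLModel(contentsOf: modelURL)   // expects a compiled .mlmodelc
        let request = VNCoreMLRequest(model: try VNCoreMLModel(for: mlModel))
        request.imageCropAndScaleOption = .scaleFill      // map the frame onto the 320 x 320 input

        try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

        // Vision surfaces detector outputs it understands as recognized
        // objects; raw multi-array outputs need custom decoding instead.
        let observations = request.results as? [VNRecognizedObjectObservation] ?? []
        return observations.compactMap { obs in
            guard let top = obs.labels.first else { return nil }
            return (top.identifier, top.confidence, obs.boundingBox)   // normalized rect
        }
    }
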
Bundled classifier

EmotiEff emotion model

Used by Emotion mode after Apple Vision finds and crops a visible face.

Task Face emotion classification
Classes 7 bundled labels: Angry, Disgust, Fear, Happy, Neutral, Sad, and Surprise.
Input Face crop supplied by Apple Vision and scaled to the bundled model's 224 x 224 image input. Train replacement classifiers on face crops, not full camera frames.
Output Predicted class label and confidences compatible with Vision classification results.
Train/drop in Export an image classifier as Core ML. Keep labels equivalent to the bundled emotions so colors, history, and summaries stay meaningful.
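
A sketch of the face-first flow, assuming the emotion classifier has already been loaded as a VNCoreMLModel (for example, via the detector-loading code above); the function name is illustrative.

    import Vision
    import CoreGraphics

    // Sketch: Vision finds the face, the crop goes to the emotion classifier.
    func classifyEmotion(in frame: CGImage, with emotionModel: VNCoreMLModel) throws
            -> VNClassificationObservation? {
        // 1. Find the most prominent face in the frame.
        let faceRequest = VNDetectFaceRectanglesRequest()
        try VNImageRequestHandler(cgImage: frame, options: [:]).perform([faceRequest])
        guard let face = faceRequest.results?.first else { return nil }

        // 2. Map the normalized box to pixels. Vision uses a lower-left
        //    origin, while CGImage cropping uses upper-left, so flip y.
        var rect = VNImageRectForNormalizedRect(face.boundingBox, frame.width, frame.height)
        rect.origin.y = CGFloat(frame.height) - rect.origin.y - rect.height
        guard let crop = frame.cropping(to: rect) else { return nil }

        // 3. Classify the crop; Vision scales it to the model's 224 x 224 input.
        let classify = VNCoreMLRequest(model: emotionModel)
        classify.imageCropAndScaleOption = .scaleFill
        try VNImageRequestHandler(cgImage: crop, options: [:]).perform([classify])
        return (classify.results as? [VNClassificationObservation])?.first
    }
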
System framework

Apple Vision

Used for face detection, face crops, landmarks, and native focus tracking.

Classes No trainable app classes. Apple Vision provides system observations instead of a bundled model file.
Drop in Nothing to replace for Focus mode. The app uses the macOS framework automatically.
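
The head-pose signal Focus mode relies on is available from system face observations. A sketch of one way to read it; the angle threshold is illustrative, not the app's actual tuning.

    import Vision
    import CoreGraphics

    // Sketch: treat small yaw and pitch as "facing the screen".
    func isFacingScreen(in frame: CGImage, maxAngle: Double = 0.35) throws -> Bool {
        let request = VNDetectFaceRectanglesRequest()
        try VNImageRequestHandler(cgImage: frame, options: [:]).perform([request])
        guard let face = request.results?.first else { return false }

        let yaw = face.yaw?.doubleValue ?? 0       // radians
        let pitch = face.pitch?.doubleValue ?? 0   // pitch requires macOS 12+
        return abs(yaw) < maxAngle && abs(pitch) < maxAngle
    }
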
User-selected

Custom Core ML models

Select local model files when experimenting with your own detector or classifier.

Formats .mlmodel, .mlpackage, or compiled .mlmodelc files and folders selected from your Mac.
Detection Use Standard Detection to validate labels and boxes. Privacy Guard needs a detector that labels people with the person class.
Emotion Use a classifier with labels that match or clearly map to the 7 bundled emotion labels above.
Licensing Custom models stay local, and you are responsible for permission to use or redistribute any model you select.
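
A sketch of how a user-selected file can be prepared for loading: source formats compile to .mlmodelc first, and already-compiled folders load directly. The function name is illustrative.

    import CoreML

    // Sketch: accept .mlmodel, .mlpackage, or compiled .mlmodelc.
    func loadUserModel(at url: URL) throws -> MLModel {
        let compiledURL: URL
        if url.pathExtension == "mlmodelc" {
            compiledURL = url
        } else {
            // Compiles .mlmodel or .mlpackage to a temporary .mlmodelc.
            compiledURL = try MLModel.compileModel(at: url)
        }
        return try MLModel(contentsOf: compiledURL)
    }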

For export details, see the model replacement notes. For bundled notices, see the app's Notices view and repository credits.

On-device by design

Inference stays local

Mac Vision Tools uses Core ML, Vision, AVFoundation, and ScreenCaptureKit on your Mac. Camera and screen frames are processed for live detections and are not saved by the app.

Permissions are explicit: camera capture needs Camera access, and screen capture needs Screen Recording access in macOS System Settings.
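
A sketch of how an app can check both permissions with the standard system calls:

    import AVFoundation
    import CoreGraphics

    func ensureCaptureAccess() {
        // Camera: prompts on first use, then reflects System Settings.
        switch AVCaptureDevice.authorizationStatus(for: .video) {
        case .notDetermined:
            AVCaptureDevice.requestAccess(for: .video) { granted in
                print("Camera access granted: \(granted)")
            }
        case .authorized:
            print("Camera access already granted")
        default:
            print("Enable Camera access in System Settings > Privacy & Security")
        }

        // Screen Recording: preflight quietly, then request (system prompt).
        if !CGPreflightScreenCaptureAccess() {
            _ = CGRequestScreenCaptureAccess()
        }
    }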

The app does not create accounts, run analytics, track users, or send camera or screen content to a server. Custom Core ML models selected by the user stay on the Mac.

Privacy Guard starts the macOS screen saver when the configured person threshold is reached. macOS controls whether the screen saver requires a password.
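
As a sketch, one common way to start the screen saver programmatically is launching the system's ScreenSaverEngine; whether this is the app's exact mechanism isn't documented here, and the password behavior stays under macOS control either way.

    import AppKit

    // Sketch: launch the system screen saver engine.
    func startScreenSaver() {
        let engine = URL(fileURLWithPath:
            "/System/Library/CoreServices/ScreenSaverEngine.app")
        NSWorkspace.shared.openApplication(at: engine,
                                           configuration: NSWorkspace.OpenConfiguration())
    }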

Read the full privacy policy

Open source macOS utility

Build it, run it, adapt it

The app ships with bundled models and also accepts custom Core ML model files for mode-specific experiments.