Pixel precision for vision LLMs

Click on a screenshot, describe what is there, export to JSON, Playwright spec, or straight into a Claude conversation. Single HTML file, open source, runs entirely in your browser.

Open the app GitHub repo

GridMapper 50 px grid axes X: 1243 Y: 876

1 1 Save button

2 2 Email field

3 3 Header logo

Polygon · 4 vertices

Vision LLMs cannot pixel-point.

When you tell Claude (or GPT-4o, Gemini) “click the blue button in the top-left”, the model has to guess pixels — and hallucinates. Whether you are describing UI elements, measuring on a blueprint, or automating an agent task, there is a missing layer between you and the model that provides exact coordinates. Grid Mapper is that layer.

01

Upload screenshot

Drag & drop PNG/JPG. Multi-image projects.
02

Click or polygon on element

Single click = point. Shift+click = polygon. Label, color, tags.
03

Export to JSON / Playwright / Claude

Copy straight into a Claude chat, or download .spec.ts.

What Grid Mapper does today

8-color marker palette

Point + polygon annotations

Click or Shift+click. 3+ vertices = polygon. Drag individual vertices with Shift+drag.
Multi-image projects

Multiple screenshots in one project, switch with `[` / `]` (or `,` / `.`).
Real-unit calibration

Click 2 points, enter the real distance (mm/cm/m/in). Optional custom origin.
Semantic tags + attributes

Multi-label classification, key/value pairs for selectors and metadata.
16× zoom + pan

GPU-accelerated, zoom-to-cursor, Alt+drag magnifier, Space+drag panning.
Exports

JSON, CSV, Markdown, Playwright `.spec.ts`. Per-image and project-bundle versions.
Copy for Claude

One click puts a structured context into your clipboard for a Claude conversation.
IndexedDB persistence

State survives refresh and restart. 4K / 8K screenshots without issues.

Who it is for

Claude / vision LLM power-user

“Click exactly here” beats “somewhere top-left”.

When you describe a UI screenshot to Claude, the model has to guess pixels. Coordinate hallucinations kill agent tasks and burn tokens. Writing structured JSON with numbered elements is still manual.

Click on the screenshot, describe in one sentence, hit “Copy for Claude”. Claude gets exact pixel-coords + labels. Zero hallucinations, zero re-prompting.

Point + polygon
Semantic tags
Copy for Claude
Multi-image bundle

“From 15 minutes of describing UI to 30 seconds of clicks.”

Try with the Claude template

Architect / surveyor

Real units relative to the room corner, not the photo edge.

You want dimensions from a floor plan or building photo. A full CAD tool is overkill, Notion / Figma give pixels only — not real mm. Then you manually subtract the room-corner offset from the image edge after export.

Calibrate two points with a known real distance, place the coordinate origin on the room corner. Annotations get real mm / cm / m relative to your chosen origin — drop directly into CAD or Excel.

Real-unit calibration
Custom origin (A/B/top-left)
Polygon
Real coords in export

“Pixels on a building drawing are useless. This gives real mm.”

Try calibration

QA / E2E test engineer

Click in the UI, get a Playwright spec.

Writing Playwright tests for a new feature happens from memory — coords approximate, selectors missing. Codegen produces fragile selectors that break on the first refactor. Between “click two pixels left” and `await page.click()` there is no clean workflow.

Click in the UI screenshot, add a `selector` attribute (data-testid=“save”), or just leave pixel coords. The exported `.spec.ts` uses `page.click(selector)` when available, otherwise `page.mouse.click(x, y)`. Code-review-friendly.

Playwright export
Selector attributes
Polygon centroid click
Project bundle

“From mockup to spec test in 5 minutes, not hours.”

Generate a Playwright spec

Pricing

Community

Free

Open source, MIT.

Full feature set
Single HTML file
GitHub repo + issues
No lock-in, no account

Open the app GitHub repo

Enterprise

Let's talk

For companies needing vendor SLA.

Custom support + response time
On-prem / air-gapped builds
Custom feature work
Vendor sign-off for compliance

FAQ

Why a single HTML file?

Distribution via Claude Artifact + browser drag-drop. No server, no sign-up flow, no backend. Works offline and in air-gapped environments. Core thesis, not a limitation.
Where is my data stored?

In your browser's IndexedDB. Locally. Nothing leaves your machine. No telemetry, no cloud sync, no analytics tracking.
Does this work with GPT-4o / Gemini instead of Claude?

JSON / CSV / Markdown exports are LLM-agnostic. The "Copy for Claude" button formats output for a Claude chat, but the same JSON drops into any other model.
How can I contribute?

Issues and PRs on GitHub. Sponsors via GitHub Sponsors / Buy Me a Coffee. For enterprise integrations or on-prem deployments, email us (footer).
Are you planning a cloud version / multi-user collab?

No. Grid Mapper is a single-user power tool, single-file artifact. Cloud sync would dilute the thesis. Annotation sharing = Git commit a JSON export — that already works today.
License?

MIT. Fork, modify, use commercially, all OK.

Pixel precision for vision LLMs

Vision LLMs cannot pixel-point.

Upload screenshot

Click or polygon on element

Export to JSON / Playwright / Claude

Point + polygon annotations

Multi-image projects

Real-unit calibration

Semantic tags + attributes

16× zoom + pan

Exports

Copy for Claude

IndexedDB persistence

“Click exactly here” beats “somewhere top-left”.

Real units relative to the room corner, not the photo edge.

Click in the UI, get a Playwright spec.

Community

Sponsors

Enterprise