How It Works


miska.ai uses AI agents to test your review apps through real browser interactions. No scripts to write, no selectors to maintain — agents explore your app like real users and report what they find.


What is miska.ai?

miska.ai is a GitHub App that automatically runs QA and user testing against your review apps on every pull request. When a PR's preview deployment goes live, miska.ai spins up parallel AI agents that interact with your app through a real browser — clicking, typing, navigating, and breaking things — then posts a structured report back to your PR.

Unlike traditional end-to-end testing tools, miska.ai doesn't require you to write test scripts or maintain CSS selectors. Agents are powered by AI that can see and understand your application, make decisions about what to test, and adapt to changes in your UI automatically.


Zero config defaults

miska.ai works out of the box with no configuration file. When no .miska.yml is found in your repository, the following defaults are used:

  • Agents: The functional agent runs, testing core user flows and catching regressions
  • Personas: A default "First-time user" persona explores your app and evaluates whether it's intuitive and worth using

This means you get both QA testing and user experience testing from the moment you install the app. When you need more control, add a .miska.yml file — see the Configuration docs.
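
When you do add a config, it might look something like the sketch below. The field names here are assumptions for illustration, not the authoritative schema — check the Configuration docs for the real format:

```yaml
# .miska.yml — hypothetical sketch; field names are assumptions,
# not the authoritative schema (see the Configuration docs).
agents:
  - functional     # the zero-config default QA agent
  - destructive    # opt in to adversarial input testing
```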


The pipeline

When you open or update a pull request, here's exactly what happens:

  1. Webhook received — Your deployment platform (Vercel, Railway, Render, etc.) deploys the review app and sends a deployment_status event to GitHub, which forwards it to miska.ai.
  2. Deployment detection — miska.ai creates a test run and begins polling the deployment URL until it returns a successful response.
  3. Configuration loaded — miska.ai reads .miska.yml from the PR branch (not the base branch), so config changes in the PR take effect immediately.
  4. PR diff analysis — The PR diff is fetched and analyzed so agents know which areas of the app have changed and should be prioritized.
  5. Agent orchestration — Parallel agent containers are launched, each with its own Chromium browser session. QA agents and persona agents run simultaneously.
  6. Results collection — Each agent reports its findings (issues, observations, screenshots) back to the orchestrator via callbacks.
  7. Report synthesis — All agent findings are synthesized into a single structured report, with issues deduplicated and categorized by severity.
  8. PR comment posted — The final report is posted as a comment on your pull request, with a summary also visible on the miska.ai dashboard.
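
Step 2's deployment polling boils down to a small retry loop. The sketch below is a simplification, not miska.ai's actual implementation (which isn't public); `fetch` is an injected callable that returns an HTTP status code, so network details stay out of the loop:

```python
import time

def poll_until_ready(url, fetch, timeout=300, interval=5):
    """Poll `url` via `fetch` until it returns a 2xx status or `timeout`
    seconds elapse. Returns True once the deployment responds successfully."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            status = fetch(url)
        except OSError:
            status = None  # network error: deployment not reachable yet
        if status is not None and 200 <= status < 300:
            return True
        time.sleep(interval)
    return False
```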

Browser automation

Each agent runs in its own Docker container with a full Chromium browser, powered by the LLM provider of your choice. By default, agents use Anthropic's Claude, but you can switch to Google Gemini or any OpenAI-compatible provider via your account settings or per-repository .miska.yml configuration. Agents interact with your app the same way a human would — by looking at the screen, moving the mouse, clicking elements, and typing text.

This approach has several advantages over traditional browser automation:

  • No selectors to maintain — Agents find elements visually, so your tests don't break when you change a CSS class or restructure your HTML
  • Realistic interactions — Agents interact with your app exactly like a real user would, catching issues that selector-based tools miss
  • Adaptive testing — Agents can make decisions about what to test based on what they see, exploring unexpected paths and edge cases
  • Screenshot evidence — Every issue includes a screenshot of exactly what the agent saw, making it easy to understand and reproduce
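
The visual interaction style described above reduces to an observe-decide-act loop. The sketch below is a hedged simplification: `observe`, `decide`, and `act` stand in for screenshot capture, the LLM call, and the browser driver, none of which are specified by this document:

```python
def run_agent_loop(observe, decide, act, max_steps=50):
    """Minimal observe-decide-act loop for a visual browser agent.

    observe() -> screenshot bytes; decide(screenshot, history) -> an action
    dict such as {"type": "click", "x": 120, "y": 48} or {"type": "done"};
    act(action) performs it in the browser. Stops when the model says it is
    done or the step budget runs out.
    """
    history = []
    for _ in range(max_steps):
        screenshot = observe()
        action = decide(screenshot, history)
        if action["type"] == "done":
            break
        act(action)
        history.append(action)
    return history
```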

PR diff analysis

Before agents start testing, miska.ai fetches the PR diff and analyzes which files and features have changed. This information is provided to each agent so they can:

  • Prioritize testing areas of the app that correspond to code changes
  • Pay extra attention to UI components that were modified
  • Verify that existing functionality near the changes still works (regression testing)

Agents still explore the broader app, but they focus more effort on the parts that are most likely to have new issues.


Agent types

miska.ai ships with four specialized QA agent types. Each agent approaches your app differently, looking for different categories of issues.

functional (default)

Navigates your app, fills forms, clicks buttons, and verifies that happy paths work. The generalist agent that catches regressions in core user flows. It methodically explores pages, tests interactive elements, and validates that expected content appears.

destructive (opt-in)

Tries to break things on purpose. Submits XSS payloads, SQL injection strings, empty forms, extremely long input, and special characters. Tests error handling, validates input sanitization, and looks for unhandled edge cases that could cause crashes or security vulnerabilities.

accessibility (opt-in)

Audits your app for WCAG 2.1 compliance. Checks keyboard navigation, verifies ARIA labels and roles, validates semantic HTML structure, tests color contrast ratios, and examines mobile viewport behavior. Helps ensure your app is usable by everyone.

integration (opt-in)

Tests cross-system flows and data consistency. Verifies that data entered on one page appears correctly on another, tests email verification flows, checks API error handling, and validates that the app handles network failures gracefully.


User personas

In addition to QA agents, miska.ai runs user persona testing. Each persona is an AI agent that explores your app from the perspective of a specific type of user — with their own background, patience level, technical skill, and goals.

Persona testing answers questions that QA agents can't: "Is this app intuitive?", "Can a first-time user figure out what to do?", "Does the checkout flow make sense?"

Default persona

When no personas are configured, miska.ai uses a default "First-time user" persona. This agent explores your app as someone who has never seen it before, evaluating:

  • Whether the value proposition is clear
  • Whether the main user flow is intuitive
  • Whether the app feels trustworthy and professional

Custom personas

You can define your own personas in .miska.yml to match your actual user segments. Each persona runs as a separate agent in its own browser session, and findings are attributed to the persona in the report.

The best personas include a backstory (who they are), a temperament (patient, rushed, skeptical), and specific goals (what they're trying to accomplish). See the Configuration docs for the full persona format.
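
A persona definition along those lines might look like this. The field names are hypothetical — the document only establishes that personas have a backstory, temperament, and goals, so treat this as a sketch and consult the Configuration docs for the real format:

```yaml
# .miska.yml — illustrative sketch only; field names are assumptions.
personas:
  - name: Rushed shopper
    backstory: Busy parent who found the site through an ad and has five minutes.
    temperament: rushed, easily frustrated
    goals:
      - find a product and complete checkout as fast as possible
```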

QA vs. User Testing

QA agents (functional, destructive, accessibility, integration) test whether your app works correctly. User personas test whether your app makes sense to real people. Both run in parallel and their findings are combined in the final report.

Reports

After all agents finish, miska.ai synthesizes their findings into a single structured report. The synthesis process:

  • Deduplicates issues — If multiple agents found the same bug, it's reported once with notes on which agents encountered it
  • Categorizes by severity — Issues are rated as critical, high, medium, or low severity based on impact and scope
  • Separates QA from UX — QA findings (bugs, security, accessibility) are listed separately from user experience observations
  • Includes evidence — Each issue includes screenshots, reproduction steps, and the agent's reasoning
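
Conceptually, the first two synthesis steps look like the sketch below: merge findings that share a fingerprint, then bucket by severity. How findings are actually fingerprinted as "the same bug" (likely an LLM comparison) is elided here:

```python
from collections import defaultdict

SEVERITIES = ("critical", "high", "medium", "low")

def synthesize(findings):
    """Deduplicate findings by fingerprint, then group them by severity.

    Each finding is a dict with keys: fingerprint, severity, agent, title.
    Duplicates collapse into one issue that records every agent that hit it.
    """
    merged = {}
    for f in findings:
        issue = merged.setdefault(
            f["fingerprint"],
            {"title": f["title"], "severity": f["severity"], "agents": set()},
        )
        issue["agents"].add(f["agent"])
    by_severity = defaultdict(list)
    for issue in merged.values():
        by_severity[issue["severity"]].append(issue)
    # Emit buckets in fixed severity order, skipping empty ones.
    return {s: by_severity[s] for s in SEVERITIES if by_severity[s]}
```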

Where reports appear

The report is posted in two places:

  • PR comment — A summary report posted directly to your pull request on GitHub, visible to all reviewers
  • Dashboard — The full detailed report on the miska.ai dashboard, with individual agent run details, interaction logs, and all screenshots

Severity levels

  • Critical — App crashes, data loss, security vulnerabilities, complete feature breakage
  • High — Major functionality broken, significant UX issues, accessibility failures
  • Medium — Minor bugs, confusing UX, inconsistent behavior, cosmetic issues with impact
  • Low — Polish suggestions, minor cosmetic issues, nice-to-have improvements