Auditing your site for accessibility with axe and Playwright

Taking a site through WCAG 2.2 AA: setting up axe-core in Playwright, what it found, and some of the learnings along the way.

Daniel Hall, Solutions Architect

13 min read

I’ve been exploring ways to bring my personal site closer to compliance with the WCAG 2.2 AA accessibility standard. As with many things, this takes multiple steps, each with its own challenges.

Firstly, I needed some idea of how compliant the site already was, and what changes would improve that. The work also needed a plan for prioritising however many tweaks turned out to be required. It was important that I could re-run the exact same audit to measure how successful the changes were. Ideally, the audit would be lightweight enough to run regularly, so that I stayed compliant as the code changed in future.

I decided to implement Playwright tests with axe-core, a popular accessibility testing engine. These tests would be re-runnable, and I’d be able to integrate them into my existing GitHub CI pipeline to run alongside my existing unit tests.

The first axe-core sweep came back with 131 serious violations across 24 routes. Luckily, 111 of those were a single rule (colour contrast issues), and traced back to one design choice I'd been making in roughly every component. Most of the rest were structural, such as half-implemented ARIA and a skipped heading level. A good proportion of what makes a site fail AA is invisible to a sighted developer testing in their own browser, which is the whole problem with relying on intuition for any of this.

This post is a working playbook to help you get started with auditing your own site. It covers how to set up automated checks, what those checks catch, what they can't, and the structural-vs-cosmetic split that ends up dictating how you spend your time. The WAI quick reference is still the canonical spec; this is the practical sequence around it.

I'll work from the audit I did on this site. Where it helps, I'll include code snippets.

What AA actually means

The bar most teams aim for is WCAG 2.2, level AA. The spec is long, but I feel it’s easy to summarise as follows:

All text contrasts at least 4.5:1 against its background (3:1 for large text and UI components).
Every interactive control is reachable, activatable and visible via the keyboard alone.
The site is navigable and operable with assistive technology, such as screen readers, switch devices, magnifiers.
Animations don't trap users with vestibular disorders.
Forms describe themselves by telling you what they want, what you got wrong, and how to fix it.

Automated tools can catch roughly a third of this, but there are a number of remaining areas that need a human eye. Or ear.

Phase 1: stand up the automated checks

This is the most valuable step, in my opinion.

Install `@axe-core/playwright` and `@playwright/test`:

pnpm add -D @axe-core/playwright @playwright/test
pnpm exec playwright install chromium

Set up a minimal playwright.config.ts:

import { defineConfig } from "@playwright/test";

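export default defineConfig({
  testDir: "./tests/a11y",
  // Walk routes serially so a failure maps cleanly back to one route.
  workers: 1,
  use: {
    // Lets tests navigate with relative paths such as page.goto("/").
    baseURL: "http://localhost:3000",
  },
  webServer: {
    command: "pnpm dev",
    url: "http://localhost:3000",
    reuseExistingServer: true,
    timeout: 60_000,
  },
});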

Define your spec.

import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

const BASE = "http://localhost:3000";

test("axe-core sweep across every sitemap route", async ({ page }) => {
  // Reads /sitemap.xml so the test picks up new routes for free
  // as you add them. No manual list to keep in sync.
  await page.goto(`${BASE}/sitemap.xml`);
  const xml = await page.content();
  const routes = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);

  await page.emulateMedia({ reducedMotion: "reduce" });

  const violations: Array<{ url: string; rule: string; impact: string }> = [];

  for (const url of routes) {
    await page.goto(url);
    const results = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag22aa"])
      .analyze();

    for (const v of results.violations) {
      if (v.impact === "critical" || v.impact === "serious") {
        violations.push({ url, rule: v.id, impact: v.impact });
      }
    }
  }

  expect(violations, JSON.stringify(violations.slice(0, 5), null, 2)).toEqual([]);
});

A couple of notes that matter:

Emulate reduced-motion in the page context. If you use any kind of fade-in animation that starts at opacity: 0, axe will measure the mid-fade composite colour and accuse perfectly-fine text of failing contrast. page.emulateMedia({ reducedMotion: "reduce" }) makes the audit deterministic. It also nudges you to make sure your animations honour the preference for real users, which is a WCAG criterion in itself.
Use `workers: 1`. axe is more useful when routes are walked serially. Easier to map a violation back to a route in the failure summary, and the dev server doesn't get hammered.
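
One assumption baked into the spec is that every `<loc>` in the sitemap points at the server under test. If your sitemap is generated with production URLs, `page.goto(url)` would audit the deployed site rather than the local build. Below is a minimal sketch of a helper that re-bases each entry onto a local origin. `toLocalRoutes` is a hypothetical name, not something from axe-core or Playwright.

```typescript
// Hypothetical helper (not part of axe-core or Playwright): pull every <loc>
// out of a sitemap and re-base it onto the origin of the server under test.
function toLocalRoutes(xml: string, base: string): string[] {
  const pattern = /<loc>\s*(.*?)\s*<\/loc>/g;
  const routes: string[] = [];
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(xml)) !== null) {
    const loc = match[1];
    if (!loc) continue; // skip empty <loc></loc> entries

    // Keep only the path and query string; swap the origin for the local one.
    const { pathname, search } = new URL(loc, base);
    const local = new URL(pathname + search, base).toString();

    // De-duplicate so a route listed twice is only audited once.
    if (routes.indexOf(local) === -1) routes.push(local);
  }
  return routes;
}
```

In the spec, `const routes = toLocalRoutes(xml, BASE);` would take the place of the `matchAll` line.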

My first run came back with 131 serious + 6 moderate violations across 24 routes. 111 of the 131 were color-contrast. The rest were spread across nested-interactive, dlitem, definition-list, landmark-complementary-is-top-level, and heading-order. Five rules, an hour or two of work. Then the contrast pass, which was its own thing.
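
Grouping the raw list by rule is what makes a breakdown like this visible at a glance. Here is a small sketch, assuming the `{ url, rule, impact }` shape collected in the spec. `tallyByRule` is a hypothetical helper name.

```typescript
type Violation = { url: string; rule: string; impact: string };

// Hypothetical helper: count violations per axe rule id, most frequent first.
function tallyByRule(violations: Violation[]): Array<{ rule: string; count: number }> {
  const counts: Record<string, number> = {};
  for (const v of violations) {
    counts[v.rule] = (counts[v.rule] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((rule) => ({ rule, count: counts[rule] ?? 0 }))
    .sort((a, b) => b.count - a.count);
}
```

Calling `console.table(tallyByRule(violations))` just before the final `expect` prints the breakdown alongside the failure.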

Phase 2: keyboard navigation

axe doesn't really audit the keyboard. It checks that focusable elements exist, not that the tab order makes sense, that focus is visible, or that opening a modal moves focus into the modal.

Most of that is a human-with-a-keyboard job. But you can still write Playwright tests for the bits that are easy to break and easy to assert on.

A skip-link smoke test as an example:

test("skip link is the first focusable element", async ({ page }) => {
  await page.goto("/");
  await page.keyboard.press("Tab");

  const focused = await page.evaluate(() => document.activeElement?.textContent);
  expect(focused).toMatch(/skip to/i);

  await page.keyboard.press("Enter");
  const hash = new URL(page.url()).hash;
  expect(hash).toBe("#main");
});

The non-obvious one I noticed was that client-side route changes don't reset focus. After you navigate from /writing to /writing/<slug>, your next Tab press resumes from wherever the previously-focused link sat in the DOM of the old page, which on the new page is usually somewhere in the middle of the content. A fresh page load doesn't do this; it only happens because the site is an SPA and the router swaps content without reloading the document.

I ended up implementing a hidden sentinel that the layout focuses on every pathname change:

"use client";
import { useEffect } from "react";
import { usePathname } from "next/navigation";

export function RouteFocusReset() {
  const pathname = usePathname();
  useEffect(() => {
    const anchor = document.getElementById("route-focus-anchor");
    anchor?.focus({ preventScroll: true });
  }, [pathname]);
  return null;
}

The layout has <div id="route-focus-anchor" tabIndex={-1} className="sr-only" /> before the skip link. The next Tab after a route change now lands on Skip to content, matching a fresh-page-load experience. I'd never have caught this without explicitly testing for it. Visually nothing changes.

The other one worth a programmatic test is the mobile-nav focus trap. If you use a headless dialog primitive (Radix, Headless UI), the trap is usually there, but the cost of confirming it didn't regress when you styled it is roughly zero:

test("mobile-nav traps focus", async ({ page }) => {
  await page.setViewportSize({ width: 375, height: 640 });
  await page.goto("/");
  await page.getByRole("button", { name: /menu/i }).click();

  for (let i = 0; i < 6; i++) {
    await page.keyboard.press("Tab");
  }

  const inDialog = await page.evaluate(() =>
    Boolean(document.activeElement?.closest('[role="dialog"]')),
  );
  expect(inDialog).toBe(true);
});

Phase 3: the structural fixes

Five rules made up the non-contrast violations on my site. Each one took minutes to fix once I'd seen it. Together they account for most of what axe catches that you wouldn't otherwise notice.

`landmark-complementary-is-top-level`. I'd wrapped the article ToC in <aside>. <aside> has an implicit role="complementary", which expects to be outside any other landmark, but it sat inside <main>. Worse, my ToC component already wrapped itself in its own <nav> landmark, so my first attempted fix (swap to <nav aria-label="Table of contents">) just turned the violation into landmark-unique. The solution was a plain <div> wrapper, because the inner component already provided the landmark.

> Lesson: before adding a landmark to a wrapper, check whether the child already provides one. Duplicate-named landmarks read worse to a screen-reader user than no wrapper at all.

`heading-order`. The /writing page rendered <h1>Writing</h1> followed directly by an <h3> per card, skipping <h2>. Visually it looked fine; the cards read as cards. To a screen-reader user, the heading outline jumps a level. I added a visually-hidden <h2 className="sr-only">All posts</h2> between the page header and the grid. Sighted users don't see it, but assistive tech does, and on this page it adds real value for screen-reader users by naming the grid that follows.
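
The rule is simple enough to reason about by hand. Here is a rough sketch of the idea (not axe-core's actual implementation, which handles more cases): walk the heading levels in document order and flag any heading that sits more than one level deeper than the one before it. `findSkippedLevels` is a hypothetical name.

```typescript
// Sketch of the idea behind heading-order, not axe-core's real algorithm.
// Input: heading levels in document order, e.g. h1, h3, h3 -> [1, 3, 3].
// Output: indexes of headings that jump more than one level deeper
// than the heading immediately before them.
function findSkippedLevels(levels: number[]): number[] {
  const skipped: number[] = [];
  for (let i = 1; i < levels.length; i++) {
    const previous = levels[i - 1] ?? 0;
    const current = levels[i] ?? 0;
    if (current > previous + 1) skipped.push(i);
  }
  return skipped;
}
```

For the /writing page before the fix, `findSkippedLevels([1, 3, 3])` flags index 1, the first card's `<h3>`. With the hidden `<h2>` in place, `[1, 2, 3, 3]` comes back empty.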

`nested-interactive`. I had a grid that rendered <ul role="listbox"><li role="option"><button>…</button></li></ul>. The full WAI-ARIA listbox pattern expects the option to be the focusable thing (via roving tabindex), not to contain a <button> that's separately focusable. The fix was simply to remove the wrong ARIA. A plain <ul> of <button>s is keyboard-accessible by default. I added aria-pressed to the buttons so the active state still announces.

> Lesson: missing ARIA is often safer than half-implemented ARIA. The native <button> is already accessible; decorating it with role="option" made things worse.

`definition-list` and `dlitem`. My epoch-converter tool had an extra wrapper between the <dl> and its <dt> / <dd> children. <dl> doesn't allow that. I just had to flatten the structure, no visible change.

Phase 4: colour contrast

A huge 111 of my 131 violations were contrast failures, and the only reason that number wasn't higher is that I happen to use a fairly limited palette.

The site's brand colour is #ff3d00. Against the off-white page background (#f0ede5), that's a contrast ratio of 3.03:1. AA wants 4.5:1 for body text. Every "active filter" label, every link-on-hover colour shift, every accent was failing.
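
The ratio comes from the WCAG relative-luminance formula, which is short enough to write out. This is a minimal sketch that assumes opaque six-digit hex colours. The function names are my own, and it does nothing about opacity modifiers or blended backgrounds, which axe has to resolve before it can measure anything.

```typescript
// WCAG relative luminance and contrast ratio for opaque "#rrggbb" colours.
// Assumption: no alpha channel and no three-digit shorthand.

// Convert one 0-255 sRGB channel to its linear-light value.
function channel(value: number): number {
  const c = value / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Weighted sum of the linear channels, per the WCAG definition.
function luminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// Ratio of the lighter colour to the darker one, each offset by 0.05.
function contrastRatio(foreground: string, background: string): number {
  const a = luminance(foreground);
  const b = luminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}
```

`contrastRatio("#ff3d00", "#f0ede5")` comes out at roughly 3.03, the figure quoted above, against the 4.5 that AA asks of body text.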

My first instinct was to darken the orange brand colour. I tried testing candidate shades by exact luminance against page bg, surface bg and white-on-button, and #c92e00 was the smallest shift that cleared 4.5:1 on all three at once. I swapped the CSS variable and pnpm a11y went from 111 violations to 28. Two more passes on opacity modifiers and tinted backgrounds, and the count hit zero.

This was the technically-correct fix. The brand still looked like itself. But side-by-side, the bright orange just looked better. So I undid it.

I ended up keeping the bright `#ff3d00` brand orange, but stopped using it for text on light backgrounds. The 4.5:1 requirement only bites when the orange is the text itself, not when it's a background, a border or a glow. The split rule I now apply:

Backgrounds, borders, glows, decorative dots, icons, underline decorations under body links are all fine being orange - none are text.
Text hover states, active states, body text coloured orange on light backgrounds all fail. Use a border or a weight or an underline on the hover/active state instead, and keep the text black.
Buttons with the orange background: text colour switches to black, not white. By the WCAG formula, pure black on #ff3d00 is about 5.9:1; pure white on #ff3d00 is about 3.5:1 (which is closer, but not enough).

Replacements such as group-hover:text-accent on a card title became group-hover:underline. Active filter tabs lost the orange text and kept just the accent bottom-border. Inline article links became black text with orange underline, rather than orange text. None of these looked worse - in fact many looked much better.

The one change that did feel like a compromise was the button text colour. Black on bright orange is visually heavier than white on bright orange, but it passes AA without me having to dilute the brand.

axe-core now reports zero contrast violations across every route.

Phase 5: the manual sweeps

Past this point, automation runs out. Roughly a third of WCAG can be machine-checked; the rest needs eyes, ears and a keyboard.

Screen reader walkthrough. macOS VoiceOver (Cmd+F5 to toggle) plus Safari is the supported pairing on Apple's platform. The Web Rotor (Ctrl+Option+U) gives you Headings, Links, Form Controls and Landmarks views per page. It's the fastest triage tool you have. Walk every route from the top with Ctrl+Option+A and listen for the bits that don’t sound right. Do dynamic regions (form results, tool outputs) announce when they change? Do buttons describe what they'll do, not what they look like? Then the small things: does the brand "danieljh.dev" read as the brand or as "danieljh dot dev"?

Reduced motion. macOS System Settings → Accessibility → Display → Reduce Motion (or emulate in browser dev-tools), then check every animation. Animations should skip to the end state, not slow down. The global CSS that handles most of this:

@media (prefers-reduced-motion: reduce) {
  *,
  *::before,
  *::after {
    animation-duration: 0.001ms !important;
    animation-iteration-count: 1 !important;
    transition-duration: 0.001ms !important;
    scroll-behavior: auto !important;
  }
}

For Motion (formerly Framer Motion), wrap the app in <MotionConfig reducedMotion="user">. For GSAP, gate any animation timeline on matchMedia("(prefers-reduced-motion: reduce)").matches.
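
For animation code that goes through neither library's built-in handling, the same check can sit behind a small guard. This is a sketch built around a hypothetical `prefersReducedMotion` helper. The window-like object is passed in rather than read from a global, so the function also behaves during server rendering, where `window` does not exist.

```typescript
// Minimal structural type: just the slice of `window` this helper touches.
type MediaQueryHost = {
  matchMedia?: (query: string) => { matches: boolean };
};

// Hypothetical helper: true when the user has asked for reduced motion.
// Returns false when there is no host (server render) or no matchMedia.
function prefersReducedMotion(host?: MediaQueryHost): boolean {
  if (!host || typeof host.matchMedia !== "function") return false;
  return host.matchMedia("(prefers-reduced-motion: reduce)").matches;
}
```

In the browser this would be called as `prefersReducedMotion(window)` before building a timeline, jumping straight to the end state when it returns true.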

The component-level mistake I had to fix was a fade-in wrapper that started at opacity: 0 and waited for an intersection observer. With reduce-motion on, the observer never fired and the content stayed invisible. useReducedMotion() from Motion fixed it: when it returns true, skip the fade and render the content at its final state immediately.

Keyboard, eyes-on. Even after the programmatic tests, navigate every route via Tab-only and just look at where the focus ring goes. The most common bug I find here is a focus ring that exists but is invisible against the particular background it lands on. I ended up implementing a global rule that draws both a foreground outline and a background-coloured halo so it's visible on every surface, dark mode included:

:focus-visible {
  outline: 2px solid var(--foreground);
  outline-offset: 2px;
  box-shadow: 0 0 0 4px var(--background);
}

The halo trick is what makes it work on accent buttons and dark surfaces alike. The foreground outline says "this is focused"; the background-coloured halo gives the outline a stripe of breathing room against whatever sits underneath, regardless of colour.

Wiring it into CI

As mentioned at the beginning, I want to wire this into my GitHub Actions CI pipeline to try and stay on top of accessibility as I make changes to the site.

a11y:
  runs-on: ubuntu-latest
  timeout-minutes: 10
  steps:
    - uses: actions/checkout@v4
    - uses: pnpm/action-setup@v4
      with: { version: 10 }
    - uses: actions/setup-node@v4
      with: { node-version: 22, cache: pnpm }
    - run: pnpm install --frozen-lockfile

    - name: Cache Playwright browsers
      id: pw-cache
      uses: actions/cache@v4
      with:
        path: ~/.cache/ms-playwright
        key: pw-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}

    - if: steps.pw-cache.outputs.cache-hit != 'true'
      run: pnpm exec playwright install --with-deps chromium

    - run: pnpm a11y

    - if: always()
      uses: actions/upload-artifact@v4
      with:
        name: a11y-baseline
        path: docs/a11y-baseline.json

Two things worth noting:

Cache the Playwright Chromium download. It's around 150MB, and restoring it shaves roughly a minute off every run.
Upload the violations report as an artifact even on failure so a reviewer can diff it against the main-branch baseline on a failing PR.
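
The diff itself can be a few lines. This sketch assumes the report is a JSON array of `{ url, rule }` entries. The actual shape of `docs/a11y-baseline.json` isn't shown here, so treat the `Entry` type and the `diffViolations` name as hypothetical.

```typescript
// Assumed report shape: one entry per violation, keyed by route and rule id.
type Entry = { url: string; rule: string };

// Hypothetical helper: compare a baseline report with the current run.
// "introduced" = present now but not in the baseline; "fixed" = the reverse.
function diffViolations(
  baseline: Entry[],
  current: Entry[],
): { introduced: Entry[]; fixed: Entry[] } {
  const key = (e: Entry): string => `${e.rule} @ ${e.url}`;
  const baselineKeys = baseline.map(key);
  const currentKeys = current.map(key);
  return {
    introduced: current.filter((e) => baselineKeys.indexOf(key(e)) === -1),
    fixed: baseline.filter((e) => currentKeys.indexOf(key(e)) === -1),
  };
}
```

A non-empty `introduced` list is the part a reviewer cares about on a failing PR; `fixed` is a handy sanity check that a change did what it claimed.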

The running time on a small site like mine is under 90 seconds, roughly the same as the existing lint + typecheck job I’ve got set up, and it runs in parallel with it.

What I'd do differently

If I had done this audit earlier, I’d have got design decisions such as the colour contrast right from the start. Retrofitting them across a dozen components was a bit of a pain.

The two-thirds of WCAG that automation can't check is also the two-thirds where I learned the most about how sighted developers think about their own work. If you build web UI for a living, I'd encourage you to spend an hour with VoiceOver on, on a site you didn't build, just to feel what an inaccessible page feels like from the other side.

This site isn't finished yet. The manual phases (per-route screen-reader testing, the visual reduced-motion validation) are still in progress. Accessibility is one of those things where "done" is always a moving target, but it’s important to keep ahead of it as your site evolves.
