Steven's Knowledge

Applying the test pyramid to a frontend codebase — what to test at which level on web and mobile

Frontend Test Strategy

The generic testing strategy — pyramid, FIRST, test doubles — applies to frontend without modification. What changes is where the layers fall in a frontend codebase and which problems each layer actually catches.

Pyramid, Frontend Edition

                  /\
                 /  \         E2E (browser / device)
                /----\        Few. Critical user journeys.
               /      \       Playwright, Cypress, Detox, Maestro.
              /--------\
             /          \     Component / integration
            /            \    Most of the suite for UI work.
           /              \   RTL, Flutter widget test, RN Testing Library.
          /                \  Renders the component with realistic state and
         /------------------\ asserts user-visible output.
        /                    \
       /                      \  Unit
      /                        \ Pure logic, hooks, formatters, reducers, selectors.
     /__________________________\ Vitest, Jest, flutter test.

A common misconception in frontend: "the pyramid says lots of unit tests." For UI-heavy code, the bulk often falls in the component layer, not unit — because most of the value of a UI component is in how it renders with state and reacts to user input, neither of which a unit test of an internal function can prove.

The shape that works in practice for a typical product frontend:

  • ~10% E2E — a dozen tests covering the journeys that, if broken, mean revenue is broken.
  • ~70% component / integration — every screen and every reusable component has tests describing what users see.
  • ~20% unit — pure logic, hooks with non-trivial internals, formatters, validators, reducers.

If the unit number is much higher, look hard at whether the tests are exercising real behavior or just internal helpers.

What Each Layer Catches

A frontend defect almost always falls into one of these buckets:

| Defect | Where it's caught |
| --- | --- |
| Wrong calculation (totals, formatters, validators) | Unit |
| Component renders wrong text/element for a state | Component |
| State change does not produce expected re-render | Component |
| Form submission sends wrong payload | Component + network mock |
| API error not surfaced to the user | Component + network mock |
| Wrong API endpoint, wrong header, wrong serialization | E2E (or contract test) |
| Auth flow broken end-to-end | E2E |
| Layout broken on small screens | Visual regression / manual / responsive test |
| Animation jank, dropped frames | Performance profiling (not unit) |
| Memory leak on long sessions | Long-running E2E or profiling |
| Platform-specific (iOS vs Android, Safari vs Chrome) bug | Cross-browser / cross-device E2E |

Notice that several common defects — layout regressions, animation issues, memory leaks — are not well caught by any tier. They need different tools (visual regression, profilers, observability). Tests are necessary but not sufficient.

What to Test, Concretely

For every new screen or component, ask:

  • What can the user do here? Each interaction should have a component test.
  • What states can this be in? Loading, error, empty, populated, partial — each is a render test.
  • What does the user see when something goes wrong? Error boundaries, fallback UI, retry — each is an explicit test.
  • What is the contract with the API? Capture it in a test that mocks the network at the transport layer and asserts the rendered result.
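
The state question above becomes a checklist once the states are modeled as a discriminated union, with one render test per variant. A minimal sketch, assuming a hypothetical `UserListState` type and a `describeView` function that stands in for a component's rendered output (neither comes from a real library, and the "partial" state is left out for brevity):

```typescript
// Hypothetical view-state model; the names are illustrative only.
type UserListState =
  | { kind: "loading" }
  | { kind: "error"; message: string }
  | { kind: "empty" }
  | { kind: "populated"; users: string[] };

// Stand-in for a component's render: returns the text a user would see.
function describeView(state: UserListState): string {
  switch (state.kind) {
    case "loading":
      return "Loading users...";
    case "error":
      return `Could not load users: ${state.message}. Retry?`;
    case "empty":
      return "No users yet.";
    case "populated":
      return state.users.join(", ");
  }
}

// One row per state: the table doubles as the list of render tests to write.
const renderCases: Array<[UserListState, string]> = [
  [{ kind: "loading" }, "Loading users..."],
  [{ kind: "error", message: "timeout" }, "Could not load users: timeout. Retry?"],
  [{ kind: "empty" }, "No users yet."],
  [{ kind: "populated", users: ["Ada", "Lin"] }, "Ada, Lin"],
];

for (const [state, expected] of renderCases) {
  const actual = describeView(state);
  if (actual !== expected) {
    throw new Error(`${state.kind}: expected "${expected}", got "${actual}"`);
  }
}
```

In a real suite each row would render the component with that state and assert on accessible queries. Adding a new variant to the union then shows up as a missing row.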

For every shared hook or utility, ask:

  • What is the function signature, including edge cases? Empty input, null, single-element, oversized — each is a unit test.
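
Edge cases for a utility often fit in a table, one row per unit test. A minimal sketch, assuming a hypothetical `chunk` pagination helper (illustrative only, not from the original codebase):

```typescript
// Hypothetical utility: split a list into pages of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    pages.push(items.slice(i, i + size));
  }
  return pages;
}

// Edge cases as data; each row would be one unit test.
const chunkCases: Array<{ name: string; input: number[]; size: number; expected: number[][] }> = [
  { name: "empty input", input: [], size: 2, expected: [] },
  { name: "single element", input: [1], size: 2, expected: [[1]] },
  { name: "exact multiple", input: [1, 2, 3, 4], size: 2, expected: [[1, 2], [3, 4]] },
  { name: "remainder", input: [1, 2, 3], size: 2, expected: [[1, 2], [3]] },
  { name: "size larger than input", input: [1, 2], size: 10, expected: [[1, 2]] },
];

for (const c of chunkCases) {
  const actual = JSON.stringify(chunk(c.input, c.size));
  if (actual !== JSON.stringify(c.expected)) {
    throw new Error(`${c.name}: got ${actual}`);
  }
}
```

With Vitest or Jest the same table would typically feed a parameterized test. The invalid-size case needs a separate assertion that the call throws.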

For every critical user journey (sign-up, checkout, primary task), ask:

  • Does it work end-to-end at least once? One E2E test that runs on every merge.

What Not to Test (At Each Level)

Frontend has its own version of "tests that test the mock":

  • Snapshot tests as the entire UI suite. A wall of snapshots that no one diffs is documentation, not testing. Use snapshots for output that is otherwise hard to assert (large rendered structures, generated SQL), not as a substitute for behavioral assertions.
  • Tests that select by CSS class or test-id everywhere. If every assertion is getByTestId('foo-button'), the test is coupled to implementation. Prefer accessible queries — getByRole, getByLabelText — which mirror how a user finds elements.
  • Testing the framework. Asserting that useState updates state, or that Flutter's setState triggers a rebuild, tests the framework. Move on.
  • Re-implementing the component in the test.
    // Anti-pattern: testing internals
    const wrapper = render(<Counter />);
    expect(wrapper.find('Counter').state('count')).toBe(0);
    Test what the user sees, not the internal state name.
  • Pixel-perfect assertions in unit tests. "The button is at x=120, y=84" is what visual regression is for, not RTL.

Async, the Recurring Problem

Almost every frontend test failure that is hard to debug is async. The patterns:

Wait for the assertion, not for time

// Brittle: depends on absolute timing
fireEvent.click(button);
await new Promise(r => setTimeout(r, 1000));
expect(screen.getByText('Saved')).toBeInTheDocument();

// Robust: wait until the assertion can succeed
fireEvent.click(button);
expect(await screen.findByText('Saved')).toBeInTheDocument();

findBy* queries retry until they succeed or time out; waitFor lets you wrap an arbitrary assertion in the same retry behavior. Use these instead of arbitrary sleeps.
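
To make the retry behavior concrete, here is a hand-rolled sketch of retry-until-success. The `retryUntil` helper and its option names are assumptions for illustration; this is not how Testing Library implements `waitFor`, which handles more cases.

```typescript
// Hand-rolled sketch of retry-until-success; not Testing Library's implementation.
async function retryUntil(
  assertion: () => void,
  { timeoutMs = 1000, intervalMs = 10 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      assertion(); // returns as soon as the assertion stops throwing
      return;
    } catch (err) {
      if (Date.now() >= deadline) throw err; // give up, surfacing the last failure
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Simulated async UI: the status text flips after a delay the test does not know.
let statusText = "Saving...";
setTimeout(() => { statusText = "Saved"; }, 30);

// Resolves shortly after the text changes, however long that takes (up to the timeout).
const saved = retryUntil(() => {
  if (statusText !== "Saved") throw new Error(`status is "${statusText}"`);
});
```

The test finishes as soon as the condition holds, so it is both faster than a fixed sleep on a fast machine and more tolerant on a slow one.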

Mock timers when time is the input

For code that relies on setTimeout, setInterval, polling, or debouncing, use fake timers (vi.useFakeTimers(), jest.useFakeTimers()) and advance them deterministically. Real timers in tests produce real flakiness.
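
The idea behind fake timers can be shown without a test runner. The sketch below uses a hand-rolled `FakeClock` and an injected-clock `debounce`, both hypothetical; Vitest and Jest instead patch the global timer functions, and their internals differ from this.

```typescript
// Hand-rolled fake clock to illustrate deterministic time; not Vitest's or Jest's implementation.
class FakeClock {
  private now = 0;
  private nextId = 1;
  private timers: Array<{ id: number; at: number; fn: () => void }> = [];

  setTimeout(fn: () => void, delayMs: number): number {
    const id = this.nextId++;
    this.timers.push({ id, at: this.now + delayMs, fn });
    return id;
  }

  clearTimeout(id: number): void {
    this.timers = this.timers.filter((t) => t.id !== id);
  }

  // Move time forward, firing due timers in chronological order.
  advance(ms: number): void {
    const target = this.now + ms;
    for (;;) {
      const due = this.timers
        .filter((t) => t.at <= target)
        .sort((a, b) => a.at - b.at)[0];
      if (due === undefined) break;
      this.clearTimeout(due.id);
      this.now = due.at;
      due.fn();
    }
    this.now = target;
  }
}

// Debounce written against an injected clock so the test controls time.
function debounce(clock: FakeClock, fn: () => void, waitMs: number): () => void {
  let pending: number | undefined;
  return () => {
    if (pending !== undefined) clock.clearTimeout(pending);
    pending = clock.setTimeout(fn, waitMs);
  };
}

const clock = new FakeClock();
let searches = 0;
const onKeystroke = debounce(clock, () => { searches += 1; }, 300);

onKeystroke();
clock.advance(299); // just short of the wait: nothing fires
onKeystroke();      // a second keystroke restarts the wait
clock.advance(299); // still short of the restarted wait
const firedEarly = searches;
clock.advance(1);   // the restarted 300 ms wait has now fully elapsed
const firedAfterWait = searches;
```

No wall-clock time passes in this test, so the 299 ms and 300 ms boundaries can be asserted exactly.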

Flush microtasks deliberately

Each stack has its own way to "let the framework settle" (React's effects, Flutter's frame pump, RN's event loop). Know the one for your stack:

| Stack | "Let it settle" |
| --- | --- |
| React + RTL | await waitFor(...), await screen.findBy*(...) |
| Flutter widget test | await tester.pump() (one frame), await tester.pumpAndSettle() (until animations done) |
| RN + RN Testing Library | await waitFor(...), await act(async () => ...) |

A test that uses these consistently is far less likely to flake on CI.

Mocking the Network

Where you mock has consequences:

  • Mock the data-access function (e.g., your getUser() wrapper): fastest, but the test bypasses the actual fetch / serialization / error handling. Defects in those layers are not caught.
  • Mock the transport (MSW for fetch/XHR on web and RN, an injected http.Client in Flutter): slightly slower, but the test exercises the real adapter code. Catches more defects.
  • Hit a real test server: slowest, real wiring, real flakiness. Reserve for a small number of contract or E2E tests.

The default for component tests is transport-level mocking. MSW for web/RN is the de facto standard; Flutter has MockClient from package:http/testing.dart, and the http_mock_adapter package provides a DioAdapter for dio.
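
The difference between the first two options shows up in what code actually runs. The sketch below swaps in dependency injection for network interception to stay dependency-free: the `Transport` type, `getUser`, and the `/api/users/` path are hypothetical, and MSW achieves the same effect by intercepting real fetch calls instead.

```typescript
// Minimal transport shape; a stand-in for fetch so the sketch has no dependencies.
type TransportResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type Transport = (url: string) => Promise<TransportResponse>;

type User = { id: number; displayName: string };

// Adapter under test: URL building, status handling, and field mapping
// all execute when only the transport is replaced.
async function getUser(transport: Transport, id: number): Promise<User> {
  const res = await transport(`/api/users/${id}`);
  if (!res.ok) throw new Error(`getUser failed with status ${res.status}`);
  const body = (await res.json()) as { id: number; display_name: string };
  return { id: body.id, displayName: body.display_name };
}

// Transport-level fake: records the request and returns a canned wire payload.
const requestedUrls: string[] = [];
const fakeTransport: Transport = async (url) => {
  requestedUrls.push(url);
  if (url === "/api/users/7") {
    return { ok: true, status: 200, json: async () => ({ id: 7, display_name: "Ada" }) };
  }
  return { ok: false, status: 404, json: async () => ({}) };
};

const found = getUser(fakeTransport, 7);
const missing = getUser(fakeTransport, 8).then(
  () => "resolved",
  (err: Error) => err.message,
);
```

Had the test replaced `getUser` itself with a stub, neither the snake_case-to-camelCase mapping nor the 404 branch would have been exercised.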

CI Considerations

A frontend suite that runs well on a developer's laptop can fall apart on CI. Common causes and mitigations:

  • Timing differences. CI is often slower than a laptop; timeouts that were generous become tight. Use retry-until-success patterns, not sleeps.
  • Resolution differences. Headless browser at 1024×768 vs laptop at 2560×1440. Either fix the viewport or write tests that do not depend on resolution.
  • Font rendering differences. Visual regression tests fail across machines. Run them in a containerized environment with pinned fonts.
  • Network. CI has different network behavior. Mock everything by default; reserve real network for explicit integration tests.
  • Parallelism. Tests sharing state (database, file system, ports) fail when run concurrently. Either isolate state or limit concurrency.

The CI configuration for frontend tests is part of the testing strategy, not an afterthought. See CI/CD for pipeline patterns.

Recovering a Flaky Suite

When the team has stopped trusting tests:

  1. Quarantine flakes to a known list and stop them from gating PRs.
  2. Categorize: real intermittent bugs (race conditions, network), timing fragility (sleeps, animations), or environment dependence (CI vs local).
  3. Fix the timing fragility first — usually the largest category and the easiest. Convert sleeps to waitFor, fix pumpAndSettle usage, mock timers.
  4. Surface the race conditions next — they were always real bugs, exposed only in CI.
  5. Treat environment dependence as test-design debt — fix in batches.
  6. Track the rate. A burn-down chart of flakes is the lever that gets the work resourced.
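
Tracking the rate needs only a record of pass/fail outcomes per test on unchanged code. A minimal sketch, assuming a hypothetical `RunRecord` shape exported from CI history (the field names and test names are illustrative):

```typescript
// Hypothetical record of one CI run of one test against the same commit.
type RunRecord = { test: string; passed: boolean };

// Flake rate per test: the share of failing runs, for tests with mixed results only.
function flakeRates(runs: RunRecord[]): Record<string, number> {
  const tally: Record<string, { pass: number; fail: number }> = {};
  for (const run of runs) {
    const t = (tally[run.test] ??= { pass: 0, fail: 0 });
    if (run.passed) t.pass += 1;
    else t.fail += 1;
  }
  const rates: Record<string, number> = {};
  for (const name of Object.keys(tally)) {
    const t = tally[name];
    // Mixed results on identical code are the signature of a flake;
    // a test that always fails is broken, not flaky.
    if (t && t.pass > 0 && t.fail > 0) rates[name] = t.fail / (t.pass + t.fail);
  }
  return rates;
}

const history: RunRecord[] = [
  { test: "checkout", passed: true },
  { test: "checkout", passed: false },
  { test: "checkout", passed: true },
  { test: "checkout", passed: true },
  { test: "login", passed: true },
  { test: "login", passed: true },
  { test: "profile", passed: false },
  { test: "profile", passed: false },
];

const rates = flakeRates(history);
const quarantineList = Object.keys(rates).sort(); // candidates for step 1
```

Recomputing this per week and plotting the number of entries gives the burn-down chart. Always-failing tests such as "profile" here belong in a separate "broken" bucket.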

Pre-Commit Question

Before merging a PR with new frontend tests, ask:

If this code broke in a way a user would report as a bug, would at least one of these tests fail?

If yes, the suite is doing its job. If no, the tests assert the implementation, not the user-visible behavior. Add the test that would catch the user's bug.
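
One way to answer the question is to run the tests against a deliberately broken variant of the code. A minimal sketch, assuming a hypothetical cart total function and a hypothetical user-reported bug in which quantity is ignored:

```typescript
type LineItem = { unitPriceCents: number; quantity: number };
type TotalFn = (items: LineItem[]) => number;

const correctTotal: TotalFn = (items) =>
  items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);

// Hypothetical user-reported bug: quantity is ignored.
const buggyTotal: TotalFn = (items) =>
  items.reduce((sum, item) => sum + item.unitPriceCents, 0);

// Weak test: only checks the shape of the result.
const weakTest = (total: TotalFn): boolean => typeof total([]) === "number";

// Behavioral test: checks the amount a user would actually see.
const behavioralTest = (total: TotalFn): boolean =>
  total([{ unitPriceCents: 250, quantity: 3 }]) === 750;

// A test "catches" the bug when it fails against the buggy variant.
const weakCatchesBug = !weakTest(buggyTotal);
const behavioralCatchesBug = !behavioralTest(buggyTotal);
```

Both tests pass against `correctTotal`, but only the behavioral one fails against `buggyTotal`, so only it would have caught the user's bug.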
