Playwright, Cypress, Detox, Maestro — driving the real application as a user would

End-to-End

End-to-end tests boot the application (or attach to a deployed one) and drive it through real interactions. They prove the whole stack works — at least once, on the journeys that matter. They are expensive, brittle relative to lower-tier tests, and the most likely tier to be misused.

The discipline is to keep E2E narrow and load-bearing. A few well-chosen journeys catch defects no other tier catches; a wall of E2E tests for every screen turns CI into an obstacle.

What E2E Is Good For

Smoke tests on the critical path. Sign up, log in, primary user action, checkout. If these fail, ship is blocked regardless of unit coverage.
Cross-system contracts. The frontend talks to the right backend; auth tokens flow correctly; redirects work end-to-end.
Cross-browser / cross-device sanity. A handful of tests across Chrome / Safari / Firefox or iOS / Android catch rendering and runtime gaps.
Deployment validation. Post-deploy smoke run against the actual environment.

What E2E Is Bad For

Branch coverage. Every business rule at E2E level is a maintenance disaster.
Edge cases. Negative numbers, weird locales, error states — these belong in component or unit tests where you can construct them quickly.
Speed. Even a fast suite is multi-minute. It cannot run on every save.

If a test could run at the component layer with the same fidelity, it should.
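
As a sketch of pushing an edge case down a tier, assume a hypothetical cartTotal pricing function (the name and shape are illustrative, not from any real codebase). Its empty-cart and negative-quantity cases can be checked directly, with no browser involved:

```typescript
// Hypothetical pricing logic, used only for illustration.
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

function cartTotal(items: LineItem[]): number {
  let total = 0;
  for (const item of items) {
    // Reject the kind of input an E2E test would struggle to construct.
    if (!Number.isInteger(item.quantity) || item.quantity < 0) {
      throw new RangeError(`invalid quantity: ${item.quantity}`);
    }
    total += item.unitPriceCents * item.quantity;
  }
  return total;
}

// Small helper so the error case can be checked without a test framework.
function throwsRangeError(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (err) {
    return err instanceof RangeError;
  }
}

console.log(cartTotal([]));                                      // 0
console.log(cartTotal([{ unitPriceCents: 250, quantity: 2 }]));  // 500
console.log(
  throwsRangeError(() => cartTotal([{ unitPriceCents: 250, quantity: -1 }])),
);                                                               // true
```

Each check builds its own input in one line; the equivalent E2E test would need a signed-in user, a seeded product, and a UI that allows a negative quantity at all.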

Web: Playwright

Playwright is now the de facto default for web E2E. It runs against Chromium, Firefox, and WebKit; auto-waits on most actions; produces traces that are useful for debugging.

Project layout

e2e/
├── playwright.config.ts
├── fixtures/
│   └── auth.ts
├── pages/
│   └── checkout-page.ts          # Page Object pattern
└── tests/
    ├── auth.spec.ts
    └── checkout.spec.ts

A realistic test

import { test, expect } from '@playwright/test';

test('user can place an order', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('link', { name: 'Sign in' }).click();
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse-battery-staple');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await page.getByRole('link', { name: 'Products' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).first().click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Checkout' }).click();

  await expect(page.getByText('Order confirmed')).toBeVisible();
});

The same query priorities apply: role and label first, text next, test-id last.

Playwright config that survives CI

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',                      // resolved relative to this config file, which lives in e2e/
  fullyParallel: true,
  forbidOnly: !!process.env.CI,           // fail if .only sneaks in
  retries: process.env.CI ? 2 : 0,         // retry only on CI; locally a flake fails immediately
  workers: process.env.CI ? 4 : undefined,
  reporter: process.env.CI ? [['html'], ['github']] : 'list',
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:5173',
    trace: 'on-first-retry',               // trace only on retry
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'webkit',   use: { ...devices['Desktop Safari']  } },
    { name: 'mobile-chrome', use: { ...devices['Pixel 7'] } },
  ],
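  webServer: {
    // The command must serve on the port in `url`. Vite's preview server
    // defaults to 4173, so pass a port flag if the script does not set one.
    command: 'npm run preview',
    url: 'http://localhost:5173',
    reuseExistingServer: !process.env.CI,
  },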
});

Key decisions in this config:

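retries: 2 on CI lets a flaky test pass on a retry without hiding it: Playwright reports a test that passed only on retry as flaky.
trace: 'on-first-retry' records a trace only when a test is retried, so it costs nothing in the common case and is available exactly when a flake needs debugging.
forbidOnly fails the CI run if a stray .only is committed, which would otherwise silently skip the rest of the suite.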
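
To make the retry semantics concrete, here is a minimal sketch of how a run with retries can be classified. The classify function and Outcome type are illustrative names, not Playwright APIs; they only model the idea that a test passing on a retry is surfaced as flaky instead of silently counted as passed.

```typescript
type Outcome = 'passed' | 'flaky' | 'failed';

// attempts[i] is true when attempt i passed. With retries: 2 there are at
// most three attempts, and the runner stops at the first passing one.
function classify(attempts: boolean[]): Outcome {
  const firstPass = attempts.indexOf(true);
  if (firstPass === -1) return 'failed'; // every attempt failed
  if (firstPass === 0) return 'passed';  // passed on the first attempt
  return 'flaky';                        // failed first, passed on a retry
}

console.log(classify([true]));                // passed
console.log(classify([false, true]));         // flaky
console.log(classify([false, false, false])); // failed
```

The middle case is the one worth tracking: the build stays green, but the report still names the test as a candidate for a fix.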

Auth: do it once

Logging in through the UI on every test is slow and brittle. Authenticate once, persist the storage state.

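// global-setup.ts
import { chromium } from '@playwright/test';

// Runs once before the suite: sign in through the UI and save the session.
export default async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  // A manually launched browser does not inherit baseURL from the config,
  // hence the absolute URL.
  await page.goto('http://localhost:5173/login');
  await page.fill('input[name=email]', 'user@example.com');
  await page.fill('input[name=password]', 'pass');
  await page.click('button[type=submit]');
  await page.waitForURL('**/dashboard');   // wait until sign-in has completed
  // Persist cookies and localStorage so every other test starts signed in.
  await page.context().storageState({ path: 'e2e/.auth/user.json' });
  await browser.close();
};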

// playwright.config.ts (add)
globalSetup: require.resolve('./global-setup'),
use: { storageState: 'e2e/.auth/user.json' },

The auth test itself still runs through the UI; everything else starts already signed in.
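
One way to keep the auth test honest is to start it from an empty storage state, so it cannot inherit the saved session. A sketch, assuming storageState is applied globally in the config; the signedOut name is illustrative, and the test.use call is shown as a comment so the snippet stands alone:

```typescript
// An empty storage state: no cookies and no per-origin localStorage.
const signedOut = { cookies: [], origins: [] };

// In tests/auth.spec.ts, override the globally configured state for this file
// (assumes @playwright/test is installed):
//
//   import { test } from '@playwright/test';
//   test.use({ storageState: signedOut });
```

With the override in place, the sign-in journey exercises the real login form while every other spec keeps the fast path.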

Network: intercept where it makes the test honest

Two patterns:

Real backend (default). The test exercises real wiring; failures reveal real defects. Reserve for the smoke tests against staging.
Stubbed backend. Use page.route to intercept specific calls; useful for testing error states the real backend cannot produce on demand.

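// Force the checkout call to fail so the error state can be asserted.
await page.route('**/api/checkout', route =>
  route.fulfill({
    status: 500,
    contentType: 'application/json',   // lets the app parse the body as JSON
    body: JSON.stringify({ error: 'oops' }),
  }),
);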
await page.getByRole('button', { name: 'Checkout' }).click();
await expect(page.getByText(/failed/i)).toBeVisible();
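
When several tests stub error responses, a small helper keeps the fulfilled payloads consistent. A sketch: jsonError is a hypothetical helper, and its return shape assumes the status, contentType, and body options of route.fulfill.

```typescript
interface StubbedResponse {
  status: number;
  contentType: string;
  body: string;
}

// Build a JSON error payload in the { error: message } shape used above.
function jsonError(status: number, message: string): StubbedResponse {
  return {
    status,
    contentType: 'application/json',
    body: JSON.stringify({ error: message }),
  };
}

// Usage inside a Playwright test (assumes a `page` fixture):
//   await page.route('**/api/checkout', route => route.fulfill(jsonError(500, 'oops')));

console.log(jsonError(500, 'oops').body); // {"error":"oops"}
```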

Web: Cypress

Cypress is the mature alternative. Strengths: opinionated runner, interactive UI for debugging, time-travel through commands. Weaknesses: the in-browser execution model is restrictive (no multi-tab support, awkward iframe handling), and browser coverage is narrower than Playwright's: Electron is bundled, Chrome-family browsers and Firefox are supported, and WebKit support is experimental.

Pick Playwright for new projects. Stay on Cypress if the team has a working suite — the migration is real work and the gains, while present, are not transformative.

Mobile: Detox

Detox runs the real React Native (RN) app on a real (or simulator/emulator) device and drives it through native APIs. For RN-specific testing it is generally faster and more reliable than Appium.

import { device, element, by, expect as e } from 'detox';

describe('Order flow', () => {
  beforeAll(async () => {
    await device.launchApp({ newInstance: true });
  });

  beforeEach(async () => {
    await device.reloadReactNative();
  });

  it('places an order', async () => {
    await element(by.id('email')).typeText('user@example.com');
    await element(by.id('password')).typeText('pass');
    await element(by.id('signIn')).tap();

    await e(element(by.text('Dashboard'))).toBeVisible();

    await element(by.id('addToCart')).tap();
    await element(by.id('checkout')).tap();

    await e(element(by.text('Order confirmed'))).toBeVisible();
  });
});

The query API is matchers-and-actions: element(by.X).action(). Querying by testID is the practical default in RN because role-based queries are less reliable.

Detox gotchas

iOS vs Android need separate configs. Set up both; CI runs both matrices.
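Animations cause flakiness. Disable animations in test builds: UIView.setAnimationsEnabled(false) on iOS; on Android, set the window_animation_scale, transition_animation_scale, and animator_duration_scale system settings to 0 (for example, adb shell settings put global animator_duration_scale 0).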

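waitFor is essential. Where Detox's automatic synchronization does not cover an asynchronous update, wait for the element explicitly with a timeout instead of sleeping.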

await waitFor(element(by.text('Loaded')))
  .toBeVisible()
  .withTimeout(5000);

JS context reloads. device.reloadReactNative() is faster than launchApp for most tests but does not reset native state.
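
The idea behind the bounded wait in the waitFor gotcha can be sketched in plain TypeScript. This is not Detox's implementation, only an illustration of why a wait with a timeout beats a fixed sleep: it returns as soon as the condition holds and fails loudly when it never does. pollUntil and its parameters are hypothetical names.

```typescript
// Poll `check` until it returns true or `timeoutMs` elapses.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return; // done as soon as the condition holds
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly between polls instead of spinning.
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
}

// Usage (isLoaded is a placeholder for any readiness check):
//   await pollUntil(() => isLoaded(), 5000);
```

A fixed sleep pays the full delay on every run and still fails when the app is slower than expected; the bounded wait pays only what the app actually needs.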

Mobile: Maestro

Maestro is the newer entrant: YAML-defined flows, easier to write than Detox, with support for React Native, native, and Flutter apps.

# flows/place-order.yaml
appId: com.example.app
---
- launchApp
- tapOn: "Sign in"
- tapOn: "Email"                  # focus the field first; inputText types into the focused input
- inputText: "user@example.com"
- tapOn: "Password"
- inputText: "pass"
- tapOn: "Sign in"
- assertVisible: "Dashboard"
- tapOn: "Add to cart"
- tapOn: "Checkout"
- assertVisible: "Order confirmed"

Trade-off: less expressive than Detox (flows are declarative YAML, with JavaScript limited to small embedded scripts), but dramatically faster to write and read. For most product teams, Maestro is enough.

Flutter: integration_test

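Flutter's integration_test package runs the real app on a device and supports the same WidgetTester API as widget tests, with extras such as screenshots and performance tracing. It drives Flutter widgets only; native system UI such as permission dialogs needs additional tooling.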

// integration_test/app_test.dart
import 'package:flutter/material.dart'; // provides Key
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:my_app/main.dart' as app;

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('places an order', (tester) async {
    app.main();
    await tester.pumpAndSettle();

    await tester.enterText(find.byKey(const Key('email')), 'user@example.com');
    await tester.enterText(find.byKey(const Key('password')), 'pass');
    await tester.tap(find.text('Sign in'));
    await tester.pumpAndSettle();

    await tester.tap(find.text('Add to cart'));
    await tester.pumpAndSettle(); // let the cart update before the next tap
    await tester.tap(find.text('Checkout'));
    await tester.pumpAndSettle();

    expect(find.text('Order confirmed'), findsOneWidget);
  });
}

Run with flutter test integration_test/ on a connected device. On device farms (Firebase Test Lab, AWS Device Farm, BrowserStack), the same test code runs unchanged, though each farm has its own build and upload steps.

Cross-Cutting Patterns

Page Objects (or Screen Objects)

Wrap interactions for each screen in a class so the test file reads at the journey level, not the selector level.

// pages/login-page.ts
import type { Page } from '@playwright/test';

export class LoginPage {
  constructor(private readonly page: Page) {}
  async goto() { await this.page.goto('/login'); }
  async signIn(email: string, password: string) {
    await this.page.getByLabel('Email').fill(email);
    await this.page.getByLabel('Password').fill(password);
    await this.page.getByRole('button', { name: 'Sign in' }).click();
  }
}

// tests/auth.spec.ts
import { test, expect } from '@playwright/test';
import { LoginPage } from '../pages/login-page';

test('signs in', async ({ page }) => {
  const login = new LoginPage(page);
  await login.goto();
  await login.signIn('user@example.com', 'pass');
  await expect(page).toHaveURL(/dashboard/);
});

Worth the boilerplate once a page is used in more than 2-3 tests. Premature page objects in a small suite are overhead.

Test data isolation

E2E tests that share a database collide. Options:

Per-test data, cleaned up after. Slow but deterministic.
Per-suite test database. Reset between runs.
Synthetic users per worker. When parallelism is fixed, allocate one test user per worker.

Whichever you pick, do it before the second flake.
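
The synthetic-users-per-worker option can be sketched as a fixed pool indexed by the worker slot. USER_POOL and userForWorker are hypothetical names, the accounts are assumed to be seeded ahead of time, and the pool size of four assumes the workers: 4 setting used on CI.

```typescript
interface TestUser {
  email: string;
  password: string;
}

// Hypothetical pre-seeded accounts, one per parallel worker slot.
const USER_POOL: TestUser[] = [
  { email: 'e2e-worker-0@example.com', password: 'pass-0' },
  { email: 'e2e-worker-1@example.com', password: 'pass-1' },
  { email: 'e2e-worker-2@example.com', password: 'pass-2' },
  { email: 'e2e-worker-3@example.com', password: 'pass-3' },
];

// Map a worker slot to its dedicated user; fail fast if the pool is too small.
function userForWorker(parallelIndex: number): TestUser {
  const user = USER_POOL[parallelIndex];
  if (user === undefined) {
    throw new RangeError(
      `no test user for worker slot ${parallelIndex}; pool has ${USER_POOL.length}`,
    );
  }
  return user;
}

// Usage in a Playwright test, where testInfo.parallelIndex ranges from 0 to workers - 1:
//   const user = userForWorker(testInfo.parallelIndex);

console.log(userForWorker(2).email); // e2e-worker-2@example.com
```

Because no two workers ever share an account, tests can mutate carts and orders freely without colliding, as long as the pool is at least as large as the worker count.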

Run on every merge, not on every commit

E2E suites are too slow for the save loop. Typical CI shape:

Unit + component on every PR commit (seconds to a couple of minutes).
E2E smoke on PR (single critical journey, single browser).
Full E2E matrix on merge to main (all browsers, all devices, parallelized).
Post-deploy smoke against the deployed environment.

The PR-time E2E is the bare minimum; the full matrix runs in the background after merge and notifies on failure.
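
The tiers above can be expressed as a mapping from CI trigger to Playwright CLI arguments. This is a sketch under assumptions: the Trigger names and the @smoke tag convention are hypothetical, while --grep and --project are standard Playwright command-line options.

```typescript
type Trigger = 'pr-commit' | 'merge-to-main' | 'post-deploy';

// Choose which slice of the E2E suite to run for a given CI trigger.
function playwrightArgs(trigger: Trigger): string[] {
  switch (trigger) {
    case 'pr-commit':
      // Single critical journey, single browser.
      return ['--grep', '@smoke', '--project=chromium'];
    case 'merge-to-main':
      // No filters: every test in every configured project.
      return [];
    case 'post-deploy':
      // Smoke only; BASE_URL is expected to point at the deployed environment.
      return ['--grep', '@smoke'];
  }
}

console.log(['npx', 'playwright', 'test', ...playwrightArgs('pr-commit')].join(' '));
// npx playwright test --grep @smoke --project=chromium
```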

When E2E Tests Fail

The investigation order:

Is it the same test failing repeatedly, or different ones? Same test → defect in the system or in the test. Different tests → infra (flaky network, slow CI host, container issue).
Does it reproduce locally? If yes, debug there. If no, check the trace, video, and screenshots from CI; they usually show the problem.
Is the failure environmental? Test data left over, race with a deploy, port collision.
Is it a real defect? Lower-tier tests should be added to prevent regression; E2E is too slow to be the long-term guard.
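
The first question in that list can be automated over recent CI history. A rough sketch, with hypothetical names (Failure, firstSuspect) and a deliberately simple rule: a test that failed in every failing run points at the test or the system, while failures scattered across different tests point at infrastructure.

```typescript
interface Failure {
  runId: string;    // CI run in which the failure occurred
  testName: string; // test that failed
}

type Suspect = 'test-or-system' | 'infrastructure' | 'insufficient-data';

function firstSuspect(failures: Failure[]): Suspect {
  const runIds: string[] = [];
  const runsByTest: Record<string, string[]> = {};
  for (const f of failures) {
    if (runIds.indexOf(f.runId) === -1) runIds.push(f.runId);
    const seen = runsByTest[f.testName] ?? (runsByTest[f.testName] = []);
    if (seen.indexOf(f.runId) === -1) seen.push(f.runId);
  }
  // One failing run cannot distinguish a repeat offender from noise.
  if (runIds.length < 2) return 'insufficient-data';
  // Did any single test fail in every failing run?
  const repeatOffender = Object.keys(runsByTest).some(
    name => runsByTest[name].length === runIds.length,
  );
  return repeatOffender ? 'test-or-system' : 'infrastructure';
}

console.log(firstSuspect([
  { runId: 'a', testName: 'checkout' },
  { runId: 'b', testName: 'checkout' },
])); // test-or-system
console.log(firstSuspect([
  { runId: 'a', testName: 'checkout' },
  { runId: 'b', testName: 'auth' },
])); // infrastructure
```

The result is only a starting point for the remaining questions, not a verdict.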

The cheapest E2E debugging tool is the recorded artifact. Playwright's trace viewer and Detox's test artifacts (screenshots, video, logs) show what the test saw at each step.

Pre-Commit Checklist

Before adding an E2E test:

The journey under test is critical (broken = ship blocked).
The same property cannot be asserted at a lower tier.
Auth and slow setup are reused across tests, not repeated.
Selectors are user-centric where possible.
Async waits are explicit (waitFor, expect(...).toBeVisible()), not sleeps.
The test cleans up its data, or the suite runs in an isolated environment.
Failure produces a trace / video / screenshot for triage.
