2026 Speed Testing & Monitoring Tools — Agency Comparison
Sources: Tool documentation, PageSpeed Matters testing across 300+ client sites, pricing as of Mar 2026
| Feature | PSI / Lighthouse | GTmetrix | SpeedVitals | DebugBear | CrUX BigQuery | WebPageTest |
|---|---|---|---|---|---|---|
| Lab testing | Yes | Yes | Yes | Yes | No | Yes |
| Field data (CrUX) | Yes | Yes | No | Yes | Yes | No |
| Real-User Monitoring (RUM) | No | No | No | Yes | No | No |
| Bulk testing (50+ URLs) | API only | Limited | Yes | Yes | N/A | API only |
| Scheduled monitoring | No | Yes | Yes | Yes | No | API only |
| Regression alerts | No | Basic | Yes | Yes | No | No |
| White-label reports | No | Yes (Pro) | No | Yes | No | No |
| INP measurement | Field only | Lab + field | Lab | Lab + RUM | Field | Lab |
| API access | Free (25K/day) | Paid plans | Paid plans | Yes | Free (BigQuery) | Free + paid |
| Test locations | 1 (US) | 7–30+ | 20+ | 15+ | N/A (field) | 40+ |
| Pricing (agency tier) | Free | $15–$50/mo | $29–$149/mo | $99–$399/mo | Free | Free + $30/mo |
| Best for | Quick audits, CrUX lookup | Client reports, waterfall | Bulk site audits | Ongoing monitoring + RUM | Large-scale field data | Deep debugging |
Key Takeaways
- •Single-page lab tests (Lighthouse, PSI) are useful for debugging but misleading for ongoing monitoring. Field data from CrUX and RUM tools reflects what real users experience — and what Google uses for rankings. Agencies need both, but field data should drive decisions.
- •For agencies managing 10–50 client sites, DebugBear ($99–$399/month) offers the best combined synthetic + RUM monitoring with automated regression alerts, CWV tracking, and white-label reporting. It's the most 'agency-ready' tool on the market in 2026.
- •For agencies managing 50–500+ sites, CrUX BigQuery (free) + a custom dashboard (Looker Studio or Grafana) is the most cost-effective solution for field data monitoring. Combine with the PSI API (free, 25K queries/day) for on-demand lab audits.
- •SpeedVitals ($29–$149/month) is the best purpose-built bulk tester — test 100+ URLs simultaneously with geographic distribution, compare results across test runs, and export CSV reports. Ideal for initial site audits and pre/post optimization comparisons.
- •The biggest agency monitoring mistake: reporting lab scores (Lighthouse) to clients instead of field data (CrUX). Lab scores fluctuate 10–20 points between runs. CrUX data is a 28-day rolling average that reflects actual user experience and directly impacts Google search rankings.
Introduction: Why Agency Speed Monitoring Is Different
If you manage speed optimization for a single website, a Lighthouse audit in Chrome DevTools is often enough to find problems and measure improvements. But if you're an agency, consultancy, or in-house team responsible for 10, 50, or 500+ sites — the workflow is fundamentally different.
You need bulk testing: the ability to audit hundreds of URLs across multiple client sites in a single workflow. You need scheduled monitoring: automated daily or weekly tests that alert you when a client's CWV scores regress. You need field data: real-user metrics from CrUX that show what Google actually sees — not lab scores that fluctuate 5–20 points between runs. And you need reporting: exportable, client-ready reports that communicate performance in business terms.
We've managed speed optimization across 300+ client sites over the past 4 years. This guide shares our evaluation of every major testing and monitoring tool through that agency lens — what we actually use, what we've tried and abandoned, and how we've built a monitoring stack that scales from 10 to 300+ sites without proportional cost increases.
The tools have matured significantly in 2026. INP measurement (finally stable after replacing FID in March 2024) is now available in most tools. CrUX data coverage has expanded. RUM solutions have become affordable for agency deployment. But the landscape is also more fragmented — making tool selection harder than ever.
1. Why Single-Page Testing Isn't Enough for Agencies
The typical agency workflow starts with a single-page Lighthouse audit: run PageSpeed Insights on the client's homepage, screenshot the score, and send it in a proposal. This is fine for sales — but it's dangerously insufficient for ongoing optimization and monitoring.
5–20 pts — typical Lighthouse score variance between consecutive runs of the same URL (source: Google Web Performance team, Lighthouse documentation)
Lab vs. Field: The Gap That Misleads Clients
Lab tests (Lighthouse, GTmetrix, WebPageTest) measure performance under controlled conditions: a single device profile, a single network speed, a single geographic location, a single page load with no prior cache. These results are reproducible but not representative.
Field data (CrUX) measures performance from real Chrome users over a 28-day rolling window: diverse devices (flagship phones to budget Androids), diverse networks (5G to 3G), diverse geographies, and real user interactions (scrolling, clicking, navigating). This is what Google uses for Core Web Vitals ranking signals.
The gap between lab and field is often dramatic: a site scoring 85 in Lighthouse (lab) might have a CrUX LCP of 3.2 seconds (field) — failing CWV. The reverse also happens: a site scoring 55 in Lighthouse might pass all CWV in CrUX because real users are primarily on fast devices and networks.
For agencies, reporting lab scores when field scores tell a different story damages credibility. Clients don't care about Lighthouse numbers — they care about whether Google considers their site 'fast' or 'slow' for ranking purposes. CrUX is the only data source that answers that question.
The Multi-Page Problem
Homepages are rarely the worst-performing page on a site. Product pages with heavy image galleries, collection pages with 50+ product cards, blog posts with embedded videos, and search results pages with dynamic filtering — these pages often have 30–50% worse CWV than the homepage.
An agency auditing only the homepage misses the pages where users actually convert (or bounce). Effective monitoring requires testing the full page-type matrix: homepage, key landing pages, product/detail pages, collection/category pages, search results, checkout flow, and any pages receiving significant organic traffic.
For a typical e-commerce client with 5 page types × 3 geographic test locations, that's 15 URLs to monitor per client. Scale that to 50 clients: 750 URLs requiring regular testing. Single-page tools don't scale to this.
Score Variance: Why Single Tests Lie
Lighthouse scores vary 5–20 points between consecutive runs of the same URL. Server load, network variability, CDN cache state, and JavaScript timing all introduce randomness. A single test showing '78' could be a '65' or a '90' on the next run.
For agencies, this variance creates a credibility problem: 'Your score improved from 62 to 74!' might just be normal variance. Reliable measurement requires either multiple test runs averaged together (3–5 runs minimum) or field data (CrUX, which is a 28-day average by design).
This is why our agency workflow uses lab tests for debugging (finding specific issues) and field data for reporting (measuring actual improvement). The two serve different purposes and should never be conflated.
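The multi-run averaging described above is easy to automate. A minimal sketch in Node.js — `runLighthouseOnce` is a placeholder for whatever single-test function your stack provides (a PSI API call, a local Lighthouse run, etc.):

```javascript
// Take the median of N lab runs to damp run-to-run variance.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// `runLighthouseOnce(url)` is assumed to resolve to a 0-100 performance score.
async function medianScore(url, runs, runLighthouseOnce) {
  const scores = [];
  for (let i = 0; i < runs; i++) {
    scores.push(await runLighthouseOnce(url));
  }
  return median(scores);
}
```

The median is preferable to the mean here because a single pathological run (CDN cache miss, queueing delay) skews an average but not a median.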
2. PageSpeed Insights (PSI) for Bulk Agency Workflows
PageSpeed Insights is the most widely used speed testing tool — and for good reason. It's free, it shows both lab (Lighthouse) and field (CrUX) data, and it's Google's own tool. But its single-URL interface doesn't scale for agencies. The PSI API changes that.
PSI API: Free Bulk Testing at Scale
The PSI API (v5) provides programmatic access to the same data shown in the PSI web interface — Lighthouse lab results and CrUX field data — for any URL. With a free API key, the quota is 25,000 queries per day, throttled to 400 queries per 100 seconds; without a key, only a handful of ad-hoc requests are allowed.
At 25K queries/day, the quota is never the bottleneck: the 750-URL portfolio from the example above (50 clients × 15 URLs) consumes 3% of it. We run a nightly cron job that tests 5 URLs per client (homepage + 4 key page types) across our full client roster, stores results in a database, and generates trend reports.
The API returns 'lighthouseResult' (the full Lighthouse audit) and 'loadingExperience' (CrUX field data for the specific URL) plus 'originLoadingExperience' (CrUX data for the entire origin/domain). This dual dataset — lab for debugging, field for reporting — makes PSI the foundation of most agency monitoring stacks.
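Extracting the headline numbers from a parsed response is a few lines. The property paths below follow the public PSI v5 API as we understand it; verify against the current PSI reference before depending on them:

```javascript
// Pull lab score + p75 field CWV out of a parsed PSI v5 JSON response.
function extractMetrics(psi) {
  const field = psi.loadingExperience?.metrics ?? {};
  const clsRaw = field.CUMULATIVE_LAYOUT_SHIFT_SCORE?.percentile;
  return {
    labScore: Math.round((psi.lighthouseResult?.categories?.performance?.score ?? 0) * 100),
    p75LcpMs: field.LARGEST_CONTENTFUL_PAINT_MS?.percentile ?? null,
    p75InpMs: field.INTERACTION_TO_NEXT_PAINT?.percentile ?? null,
    p75Cls: clsRaw != null ? clsRaw / 100 : null, // the API reports CLS scaled x100
  };
}
```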
- •Cost: Free (25K queries/day, throttled to 400 queries/100 seconds, with a free API key).
- •Data: Lighthouse lab results + CrUX field data (URL-level and origin-level).
- •INP: Available in CrUX field data. Not available in Lighthouse lab results (Lighthouse measures TBT as a lab proxy for INP).
- •Bulk capability: API-only — requires scripting. No bulk UI.
- •Limitations: Lab tests run from a US location only. CrUX requires sufficient traffic (at least ~1,000 page loads in 28 days).
Building a PSI Bulk Testing Script
A basic PSI bulk testing workflow requires: (1) a list of URLs to test, (2) a script that calls the PSI API for each URL, (3) a database or spreadsheet to store results, and (4) a dashboard to visualize trends.
We use a Node.js script that reads URLs from a Google Sheet (one sheet per client, columns for URL and page type), calls the PSI API with a 200ms delay between requests (to stay within rate limits), extracts key metrics (LCP, INP, CLS, TTFB, Lighthouse score), and writes results to a PostgreSQL database. A Looker Studio dashboard connects to the database and generates client-facing reports.
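A trimmed sketch of that loop follows. The endpoint is the public PSI v5 API; the API key and URL list are placeholders, and the database write is left as a stub:

```javascript
// Nightly PSI bulk-test loop (Node 18+, which ships a global fetch).
const PSI_ENDPOINT = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed';

function buildPsiUrl(url, apiKey, strategy = 'mobile') {
  const params = new URLSearchParams({ url, strategy, key: apiKey });
  return `${PSI_ENDPOINT}?${params}`;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function testAll(urls, apiKey) {
  const rows = [];
  for (const url of urls) {
    const res = await fetch(buildPsiUrl(url, apiKey));
    const data = await res.json();
    rows.push({
      url,
      score: data.lighthouseResult?.categories?.performance?.score ?? null,
      p75LcpMs: data.loadingExperience?.metrics?.LARGEST_CONTENTFUL_PAINT_MS?.percentile ?? null,
    });
    await sleep(200); // ~5 req/s keeps us well under 400 queries / 100 s
  }
  return rows; // in the real script, these are written to PostgreSQL
}
```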
Total setup time: ~4 hours for the initial script + dashboard. Ongoing maintenance: near zero. Cost: $0 (PSI API is free, Looker Studio is free, PostgreSQL can run on a $5/month VPS).
PSI Limitations for Agencies
- •No scheduling: PSI doesn't run tests automatically. You need a cron job or external scheduler.
- •No alerting: PSI doesn't notify you when scores drop. You need to build alerting on top of stored results.
- •US-only lab tests: Lighthouse tests run from a US location. For clients with primarily non-US audiences, lab results may not reflect user experience.
- •CrUX coverage gaps: Low-traffic pages and sites may not have CrUX data. ~60% of URLs in CrUX have enough traffic for URL-level data.
- •No waterfall view: PSI doesn't show request waterfalls. For debugging specific issues, you need GTmetrix or WebPageTest.
- •No RUM: PSI shows CrUX (aggregated field data) but not per-session RUM data. You can't see individual user experiences.
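Since PSI has no built-in alerting, the simplest thing to layer on top of stored results is a threshold check against the standard CWV 'good' boundaries. The result shape here is our own convention, not anything PSI emits:

```javascript
// Flag stored results that breach the CWV 'good' thresholds at p75:
// LCP <= 2500 ms, INP <= 200 ms, CLS <= 0.1.
const THRESHOLDS = { p75LcpMs: 2500, p75InpMs: 200, p75Cls: 0.1 };

function findFailures(results) {
  const alerts = [];
  for (const r of results) {
    for (const [metric, limit] of Object.entries(THRESHOLDS)) {
      if (r[metric] != null && r[metric] > limit) {
        alerts.push({ url: r.url, metric, value: r[metric], limit });
      }
    }
  }
  return alerts; // feed these into email/Slack notifications
}
```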
Tip
For agencies on a budget, the PSI API + a simple database + Looker Studio is the highest-ROI monitoring setup. It costs $0–$5/month, provides both lab and field data, and scales to hundreds of client URLs. We used this exact setup for our first 2 years before adding paid tools.
3. GTmetrix: Monitoring Plans & API for Client Reporting
GTmetrix has been the go-to 'client-friendly' speed testing tool for years — its waterfall charts are more readable than Lighthouse's, its reports are visually clean, and its monitoring features are built for recurring testing.
GTmetrix for Agencies: What It Does Well
GTmetrix's strengths are in presentation and ease of use:
- •Visual waterfall charts: GTmetrix's waterfall view is the best in the industry for client communication. Non-technical clients can understand 'this image took 1.2 seconds to load' from a GTmetrix waterfall in ways they can't from a Lighthouse treemap.
- •Monitoring & alerts: GTmetrix Pro plans include scheduled monitoring (hourly, daily, or weekly) with email/Slack alerts when performance degrades. Set thresholds per metric and get notified automatically.
- •Multiple test locations: 7 locations on the free plan, 30+ on paid plans. Test from the geographic regions where your clients' customers actually are.
- •CrUX integration: GTmetrix now shows CrUX field data alongside its lab results — bridging the lab/field gap in a single interface.
- •PDF reports: Export branded PDF reports for client meetings. Cleaner and more professional than Lighthouse HTML reports.
- •Historical comparison: Compare current results against historical baselines. Show clients 'before and after' with visual timelines.
GTmetrix Pricing for Agencies (2026)
GTmetrix's pricing tiers in 2026:
- •Free: 5 tests/day, 1 monitored URL, 7 test locations, basic reports. Sufficient for testing individual URLs during audits.
- •Solo ($15/month): 50 tests/day, 3 monitored URLs, weekly monitoring, PDF reports. Minimum viable for a freelancer with 1–3 clients.
- •Starter ($25/month): 100 tests/day, 10 monitored URLs, daily monitoring, API access (100 calls/day). Reasonable for 5–10 clients.
- •Growth ($50/month): 300 tests/day, 25 monitored URLs, hourly monitoring, priority test queue, white-label reports. The agency sweet spot for 10–25 clients.
- •Custom/Enterprise: Unlimited tests, 100+ monitored URLs, dedicated infrastructure, SLA. Contact for pricing. For 25+ clients.
GTmetrix Limitations
- •No RUM: GTmetrix is lab-only (plus CrUX field data). No real-user monitoring of individual sessions.
- •Limited bulk testing: No 'test 100 URLs at once' feature. Monitored URLs are tested on schedule, but ad-hoc bulk testing requires API scripting.
- •Lighthouse version lag: GTmetrix sometimes runs an older Lighthouse version than PSI. Scores can differ from PSI by 5–10 points due to version differences.
- •INP in lab: GTmetrix's lab INP measurement uses simulated interactions, which may not reflect real user behavior. Rely on CrUX field data for accurate INP.
- •Monitoring limits: Even the Growth plan caps at 25 monitored URLs. Agencies with 50+ client URLs need the Enterprise plan or a supplementary tool.
4. SpeedVitals: The Purpose-Built Bulk Tester
SpeedVitals is the tool most agencies don't know about — and the one that fills the biggest gap in the testing toolkit. It's designed specifically for bulk testing: input 100+ URLs, test them simultaneously from multiple locations, and compare results in a single dashboard.
Why SpeedVitals Stands Out for Bulk Audits
SpeedVitals solves the 'initial audit' problem better than any other tool. When a new client signs up, the first task is auditing their entire site — homepage, product pages, collection pages, blog, checkout, etc. — to identify which pages have CWV issues and prioritize optimization work.
With PSI or GTmetrix, this means running 20–50 individual tests, waiting for each one, recording results in a spreadsheet, and manually comparing. With SpeedVitals, you paste 50+ URLs, select test locations, click 'Run', and get a comparative dashboard in minutes.
- •Bulk testing: Test 100+ URLs simultaneously. Results displayed in a sortable, filterable table with pass/fail indicators for each CWV metric.
- •Geographic comparison: Test each URL from 20+ global locations. Instantly see which pages are slow in which regions.
- •Run comparison: Save test runs and compare them side-by-side. Ideal for 'before/after' optimization reporting.
- •CSV export: Export all results to CSV for further analysis or client reporting.
- •CWV focus: Dashboard is organized around Core Web Vitals (LCP, INP, CLS, TTFB) — not vanity metrics. Each URL gets a clear pass/fail status.
- •Filmstrip view: Visual filmstrip of page load for each URL. Helpful for identifying LCP elements and render-blocking resources.
SpeedVitals Pricing (2026)
- •Free: 5 tests/day, 1 test location, no bulk testing. Useful for quick checks only.
- •Starter ($29/month): 300 tests/month, 5 test locations, bulk testing (up to 20 URLs), run comparison. Good for freelancers.
- •Pro ($79/month): 1,500 tests/month, 15 test locations, bulk testing (up to 50 URLs), API access, CSV export. Agency sweet spot.
- •Business ($149/month): 5,000 tests/month, 20+ test locations, bulk testing (up to 100 URLs), priority queue, dedicated support. For large agencies.
SpeedVitals Limitations
- •Lab only: No CrUX field data integration. No RUM. SpeedVitals is a pure lab testing tool — great for audits, not for ongoing field monitoring.
- •No scheduled monitoring: Tests are on-demand only. No automated daily/weekly monitoring with alerts.
- •No white-label: Reports are SpeedVitals-branded. No custom branding for client deliverables.
- •Newer tool: Smaller community and fewer integrations than GTmetrix or PSI. Documentation is adequate but not extensive.
- •Test consistency: Like all lab tools, results vary between runs. SpeedVitals mitigates this by running 3 tests per URL and showing the median, but variance still exists.
5. DebugBear: The Best Combined Synthetic + RUM Tool for Agencies
DebugBear is the tool we recommend most for agencies that want a single platform covering synthetic monitoring, CrUX field data, AND real-user monitoring. It's purpose-built for the agency use case — and in 2026, it's the most complete monitoring solution available.
What Makes DebugBear Agency-Ready
DebugBear combines three data sources in one dashboard:
- •Synthetic monitoring: Scheduled Lighthouse-based lab tests from 15+ global locations. Daily, 4x/day, or hourly testing depending on plan. Results include full Lighthouse audits, waterfalls, and filmstrips.
- •CrUX field data: Automated CrUX data pulls for every monitored URL and origin. 28-day rolling CWV metrics displayed alongside lab data. No API scripting required — DebugBear pulls CrUX automatically.
- •Real-User Monitoring (RUM): A lightweight JavaScript snippet (<3KB) that captures actual user CWV metrics per page load. See INP by page, by device type, by country, by connection speed. This is the data layer that CrUX doesn't provide — individual page-level and segment-level RUM.
- •Regression detection: Automated alerts when CWV metrics degrade. DebugBear analyzes trends and flags anomalies — not just threshold breaches. If LCP increases by 300ms over 5 days, you get alerted before it becomes a CWV failure.
- •White-label reporting: Custom-branded PDF reports with your agency logo. Scheduled email delivery to clients. CWV trends, improvement tracking, and competitive benchmarking.
- •Competitive monitoring: Monitor competitor URLs alongside client URLs. Show clients how their CWV compares to direct competitors.
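DebugBear's actual anomaly detection is proprietary, but the idea it implements — alert on sustained upward drift before a metric crosses a threshold — can be illustrated with a toy version:

```javascript
// Toy drift check: compare the mean of the last `window` days against the
// mean of the preceding `window` days.
function driftMs(dailyValues, window = 5) {
  if (dailyValues.length < window * 2) return 0;
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const recent = mean(dailyValues.slice(-window));
  const baseline = mean(dailyValues.slice(-window * 2, -window));
  return recent - baseline;
}

// e.g. alert when p75 LCP drifts upward by 300 ms or more over 5 days
function shouldAlert(dailyLcp, limitMs = 300) {
  return driftMs(dailyLcp) >= limitMs;
}
```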
DebugBear Pricing (2026)
- •Starter ($99/month): 50 monitored URLs, daily testing, CrUX data, basic RUM (10K page views/month), email alerts. Good for 5–10 clients.
- •Growth ($199/month): 150 monitored URLs, 4x/day testing, advanced RUM (50K page views/month), white-label reports, Slack/webhook alerts. Agency sweet spot for 15–30 clients.
- •Scale ($399/month): 500 monitored URLs, hourly testing, full RUM (200K page views/month), API access, competitive monitoring, priority support. For 30–100+ clients.
- •Enterprise (custom): Unlimited URLs, custom testing frequency, unlimited RUM, dedicated support, SLA.
DebugBear vs. Other Tools
DebugBear's unique value is the combination of synthetic + CrUX + RUM in one platform. No other tool does all three at agency-friendly pricing:
- •PSI: Lab + CrUX, but no RUM, no monitoring, no alerts.
- •GTmetrix: Lab + CrUX + monitoring, but no RUM, limited bulk.
- •SpeedVitals: Lab + bulk, but no CrUX, no RUM, no monitoring.
- •WebPageTest: Lab + deep debugging, but no CrUX, no RUM, no monitoring.
The only competitors offering RUM + synthetic at similar scale are Calibre ($399–$999/month — more expensive) and SpeedCurve ($15K+/year — enterprise-priced). DebugBear occupies the sweet spot for agencies that need comprehensive monitoring without enterprise budgets.
DebugBear Limitations
- •Cost: At $99–$399/month, DebugBear is the most expensive tool in this comparison (excluding enterprise tools). For agencies managing <5 clients, the PSI API + GTmetrix may be more cost-effective.
- •RUM snippet overhead: The RUM JavaScript snippet adds ~2–3KB and ~5ms to page load. Negligible in practice, but technically adds to client page weight.
- •Learning curve: The dashboard is feature-rich, which means a steeper learning curve than GTmetrix's simpler interface.
- •Test locations: 15+ locations is fewer than WebPageTest (40+) or GTmetrix (30+). Adequate for most use cases but limited for hyper-regional testing.
Tip
If you can only afford one paid monitoring tool as an agency, make it DebugBear. The combination of automated CrUX tracking, RUM data for debugging field issues, regression alerts, and white-label reports eliminates the need for 3–4 separate tools. The time savings alone justify the cost within the first month.
6. CrUX BigQuery: Free Agency-Scale Field Data
The Chrome User Experience Report (CrUX) is available as a public BigQuery dataset — free to query (within BigQuery's free tier of 1TB/month). For agencies that need field data across hundreds of client origins, CrUX BigQuery is the most powerful and cost-effective option.
What CrUX BigQuery Provides
CrUX BigQuery contains anonymized, aggregated performance metrics from real Chrome users — the same data that powers Google's CWV ranking signal. The dataset is updated monthly and contains:
- •All Core Web Vitals: LCP, INP, CLS — with histogram distributions (p75, good/needs-improvement/poor percentages).
- •TTFB, FCP, FID (legacy): Additional timing metrics for comprehensive analysis.
- •Segmentation: Data broken down by form factor (phone, desktop, tablet), effective connection type (4G, 3G, 2G), and country.
- •Historical data: Monthly snapshots going back to 2017. Track CWV trends over years for any origin in the dataset.
- •Origin and URL level: Origin-level data (entire domain) for all qualifying sites. URL-level data for pages with sufficient traffic.
- •Popularity ranking: Sites are ranked by traffic, allowing competitive analysis within verticals.
Agency Use Case: Multi-Client CWV Dashboard
The highest-value CrUX BigQuery use case for agencies is a multi-client CWV dashboard that automatically updates monthly. Here's the workflow:
1. Maintain a table of client domains (origins) in BigQuery.
2. Write a query that joins the CrUX monthly dataset with your client list, extracting p75 LCP, INP, CLS, and pass/fail status for each.
3. Connect the query results to Looker Studio (free) or Grafana.
4. Dashboard auto-updates when CrUX publishes new monthly data.
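A sketch of step 2, assuming the CrUX materialized `metrics_summary` table carries `p75_lcp`/`p75_inp`/`p75_cls` columns (verify names against current CrUX documentation) and a hypothetical `client_origins` table in your own project:

```sql
-- Latest monthly p75 CWV for every client origin.
SELECT
  s.origin,
  s.date,
  s.p75_lcp,   -- ms
  s.p75_inp,   -- ms
  s.p75_cls
FROM `chrome-ux-report.materialized.metrics_summary` AS s
JOIN `your-project.monitoring.client_origins` AS c  -- hypothetical client table
  ON s.origin = c.origin
WHERE s.date = (
  SELECT MAX(date) FROM `chrome-ux-report.materialized.metrics_summary`
)
ORDER BY s.p75_lcp DESC;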
Result: a single dashboard showing CWV status for every client, updated monthly, at zero ongoing cost. We use this as our 'early warning system' — if a client's CrUX data shows INP creeping from 180ms to 195ms, we proactively investigate before it crosses the 200ms threshold.
BigQuery free tier includes 1TB of query processing per month. A typical multi-client CrUX query processing 100 origins uses ~10–50GB — well within the free tier. Even heavy usage rarely exceeds $5–$10/month.
CrUX Limitations
- •Monthly granularity: CrUX data updates monthly. For real-time monitoring, you need RUM (DebugBear) or synthetic monitoring (GTmetrix, DebugBear).
- •Chrome-only: CrUX measures Chrome users only (~65% of global browser market). Safari, Firefox, and Edge users are not represented.
- •Traffic threshold: Sites/URLs need ~1,000+ page loads in 28 days to appear in CrUX. Low-traffic client sites may have no data.
- •No debugging data: CrUX tells you WHAT the metrics are, not WHY. You can't see waterfalls, resource sizes, or JavaScript execution. Use lab tools for debugging.
- •Aggregated data: CrUX shows distributions and percentiles, not individual user sessions. For per-session debugging, you need RUM.
- •BigQuery knowledge required: Querying CrUX requires SQL knowledge and BigQuery familiarity. The learning curve is moderate for non-technical team members.
7. WebPageTest API & Bulk Scripting
WebPageTest (WPT) is the deepest, most configurable lab testing tool available. Its waterfall charts, filmstrips, and connection-level diagnostics are unmatched for debugging complex performance issues. For agencies, the API enables bulk testing at scale.
WebPageTest Strengths for Agencies
- •Deepest diagnostics: Connection-level waterfall, request/response headers, JavaScript execution timeline, CPU profiling, long task detection. When GTmetrix or Lighthouse can't explain a performance issue, WPT usually can.
- •40+ test locations: The widest geographic distribution of any testing tool. Test from Tokyo, São Paulo, Mumbai, Sydney, Frankfurt, and dozens more.
- •Custom device profiles: Test with specific device CPU throttling, network profiles (3G, 4G, cable), and browser configurations. Match real user conditions precisely.
- •Scripted testing: WPT supports multi-step scripts — login, navigate, click, wait for element, measure. Test authenticated pages and complex user flows that single-URL tools can't.
- •Video comparison: Side-by-side video comparison of page loads. Powerful for client presentations showing 'before vs. after' visually.
- •API access: Free API key (200 tests/day) + paid plans for higher volume. Integrate into CI/CD pipelines, bulk testing scripts, or monitoring dashboards.
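A multi-step WPT script is a plain list of tab-separated commands. The commands below (logData, navigate, setValue, submitForm) are real WPT script commands per its scripting documentation; the URLs and element IDs are placeholders:

```
logData	0
navigate	https://example.com/login
setValue	id=email	user@example.com
setValue	id=password	hunter2
submitForm	id=login-form
logData	1
navigate	https://example.com/account
```

`logData 0` suppresses measurement during the login steps, so only the authenticated account page (after `logData 1`) is measured.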
WebPageTest Pricing (2026)
- •Free: 10 tests/day via web UI, 200 tests/day via API (with free API key). No monitoring.
- •Pro ($30/month): 1,500 tests/month, priority queue, no-ads experience, API access. Good for agencies running bulk tests weekly.
- •Enterprise (contact): Higher volume, private instances, SLA, dedicated support.
- •Self-hosted: WebPageTest is open-source. Agencies can run private instances on their own infrastructure for unlimited testing (infrastructure cost only).
WebPageTest Limitations for Agencies
- •No CrUX data: WPT is lab-only. No field data integration.
- •No monitoring: WPT doesn't offer scheduled monitoring or alerts. It's an on-demand testing tool.
- •No RUM: No real-user monitoring capability.
- •Complex UI: The interface is powerful but not client-friendly. Waterfalls and filmstrips require technical knowledge to interpret. Use GTmetrix or DebugBear for client-facing reports.
- •Slow tests: WPT tests take 30–90 seconds each (longer than Lighthouse or SpeedVitals). Bulk testing 100+ URLs can take hours.
- •Queue times: Free-tier tests often wait 2–10 minutes in queue during peak hours. Paid plans provide priority access.
8. Calibre, Treo, & Other Contenders
Several other tools serve specific niches in the agency monitoring landscape.
Calibre ($399–$999/month)
Calibre is a premium synthetic monitoring platform with excellent CI/CD integration, performance budgets, and team collaboration features. Its 'Test Profiles' system lets agencies define standardized testing configurations applied across all client sites.
Best for: Large agencies (50+ clients) and enterprise in-house teams with performance engineering culture. The price point excludes most small/mid-size agencies.
Limitation: No RUM. Lab-only monitoring at enterprise pricing.
Treo (Free + Paid)
Treo focuses specifically on CrUX field data visualization. Its free 'Treo Site Speed' tool provides a beautiful CrUX dashboard for any origin — no setup required. The paid product adds historical trends, multi-site comparison, and alerting.
Best for: Quick CrUX lookups and competitive analysis. The free tier is useful for sales conversations — show a prospect their CrUX data in 10 seconds.
Limitation: CrUX data only — no lab testing, no RUM, no debugging capability.
SpeedCurve ($15K+/year)
SpeedCurve is the enterprise-grade RUM + synthetic monitoring solution. Excellent data visualization, deep RUM segmentation, and strong API. Used by some of the largest e-commerce brands.
Best for: Enterprise clients with large budgets and dedicated performance teams.
Limitation: Pricing excludes agencies and SMBs. Minimum annual commitment.
Pingdom / Uptime Monitoring Tools
Pingdom, UptimeRobot, and similar tools monitor uptime and basic response time — but they don't measure Core Web Vitals. They answer 'is the site up?' not 'is the site fast?'
Best for: Complementary uptime monitoring. Not a substitute for CWV-focused performance monitoring.
Agency tip: Bundle uptime monitoring (UptimeRobot, free for 50 URLs) with CWV monitoring (DebugBear or CrUX BigQuery) for comprehensive client coverage.
9. Real-User Monitoring (RUM) Deep-Dive
RUM — collecting performance metrics from actual user page loads via a JavaScript snippet — is the most underutilized monitoring capability for agencies. CrUX provides aggregated field data, but RUM provides granular, per-session, per-page data that enables precise debugging.
What RUM Tells You That Lab Tests Can't
- •INP by page and interaction: CrUX gives you origin-level INP. RUM gives you INP per page, per interaction type (click, keypress, scroll), per device category. You can identify which specific page and which specific interaction is causing INP failures.
- •Geographic performance distribution: Lab tests measure from 1–3 locations. RUM shows performance from every location where real users visit. Discover that your client's site is fast in the US but slow in Germany due to a CDN misconfiguration.
- •Device-specific issues: RUM segments by device model and browser version. Discover that iPhone 12 users have 120ms INP but Samsung Galaxy A13 users have 380ms INP — revealing a JavaScript complexity issue that only affects low-end devices.
- •Third-party script impact: RUM captures the actual performance impact of third-party scripts (analytics, chat, ads) as experienced by real users — including scripts that load asynchronously after Lighthouse tests complete.
- •Session-level debugging: Trace a specific user session to see exactly which resources loaded, which scripts executed, and which interactions triggered long tasks. This is the 'why' behind poor CrUX numbers.
RUM Options for Agencies in 2026
The RUM landscape has become more accessible in 2026:
- •DebugBear RUM ($99–$399/month): Best integrated option — RUM data alongside synthetic and CrUX in one dashboard. 10K–200K page views/month depending on plan.
- •Vercel Web Analytics (free–$20/month): If your clients' headless stores are on Vercel, the built-in Web Vitals monitoring is excellent and free for small sites. Limited to Vercel-hosted sites.
- •Google Analytics 4: GA4 now reports Core Web Vitals in the 'Site Speed' section. Free, but data is sampled (1% of sessions) and delayed by 24–48 hours. Useful as a free baseline but not precise enough for optimization work.
- •web-vitals.js (free, DIY): Google's open-source JavaScript library for capturing CWV metrics. Send data to your own analytics endpoint. Maximum flexibility, zero cost, but requires development work to build collection and visualization.
- •Cloudflare Web Analytics (free): If clients use Cloudflare, its free Web Analytics includes CWV metrics from real users. Privacy-focused (no cookies). Limited segmentation compared to DebugBear.
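The web-vitals.js route is less work than it sounds. A minimal sketch: `onLCP`/`onINP`/`onCLS` are the library's real entry points, while `/rum-collect` is a placeholder for your own collection endpoint, and the library is assumed to be loaded separately (e.g. from a CDN as a `webVitals` global):

```javascript
// DIY RUM: serialize each CWV metric and beacon it to your collector.
function toPayload(metric, page) {
  return JSON.stringify({
    name: metric.name,     // 'LCP' | 'INP' | 'CLS'
    value: metric.value,   // ms for LCP/INP; unitless score for CLS
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    page,
  });
}

function report(metric) {
  const body = toPayload(metric, location.pathname);
  // sendBeacon survives page unload; keepalive fetch is the fallback
  if (!navigator.sendBeacon('/rum-collect', body)) {
    fetch('/rum-collect', { method: 'POST', body, keepalive: true });
  }
}

if (typeof webVitals !== 'undefined') {
  webVitals.onLCP(report);
  webVitals.onINP(report);
  webVitals.onCLS(report);
}
```

The collection side is whatever you like: a serverless function appending rows to BigQuery or PostgreSQL is enough to start.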
When Agencies Should Deploy RUM
RUM isn't necessary for every client. Deploy it when:
- •A client's CrUX data shows CWV failures but lab tests pass. RUM reveals what real users experience that lab tests miss.
- •INP is the failing metric. INP requires real user interactions to measure accurately — lab simulations are approximations.
- •The client has international traffic. RUM shows geographic performance distribution that lab tests from 1–3 locations can't capture.
- •Third-party scripts are suspected. RUM measures the real-world impact of analytics, ads, and chat widgets.
- •The client's revenue justifies the investment. For clients paying $3K+/month for optimization, $20–$50/month for RUM data is trivial and dramatically improves diagnostic capability.
10. Building an Agency Monitoring Stack
Based on our experience monitoring 300+ client sites, here are the stacks we recommend at different agency scales.
Tier 1: Bootstrapping (1–10 Clients) — $0–$30/month
- •PSI API (free): Nightly bulk tests for all client URLs. Store results in a Google Sheet or simple database.
- •CrUX BigQuery (free): Monthly CWV dashboard in Looker Studio. Auto-updates when CrUX publishes new data.
- •GTmetrix Free: On-demand waterfall analysis for debugging specific issues.
- •WebPageTest Free: Deep-dive debugging when GTmetrix can't explain an issue.
- •Total cost: $0–$5/month (database hosting). Time investment: 4–6 hours initial setup, 1 hour/week maintenance.
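The Tier 1 PSI pipeline above can be sketched in a few lines of Python. The v5 endpoint URL is the real PageSpeed Insights API; the `extract_metrics` helper and its field paths follow the documented PSI v5 response shape, but treat this as a starting point rather than a finished pipeline.

```python
import urllib.parse

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_url(page_url: str, api_key: str, strategy: str = "mobile") -> str:
    """Build a PageSpeed Insights v5 request URL (mobile by default)."""
    params = urllib.parse.urlencode(
        {"url": page_url, "key": api_key, "strategy": strategy})
    return f"{PSI_ENDPOINT}?{params}"

def extract_metrics(psi_response: dict) -> dict:
    """Pull the lab score and CrUX p75 LCP out of a PSI v5 response."""
    lab = psi_response["lighthouseResult"]["categories"]["performance"]["score"]
    field = psi_response.get("loadingExperience", {}).get("metrics", {})
    lcp = field.get("LARGEST_CONTENTFUL_PAINT_MS", {})
    return {
        "lab_score": round(lab * 100),
        "field_lcp_ms": lcp.get("percentile"),    # CrUX p75, in ms
        "field_lcp_rating": lcp.get("category"),  # FAST / AVERAGE / SLOW
    }
```

A nightly cron job then loops over each client's URL list, fetches `psi_url(u, API_KEY)`, runs the JSON through `extract_metrics`, and appends a dated row to a Google Sheet or SQLite table.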
Tier 2: Growing Agency (10–30 Clients) — $200–$310/month
- •DebugBear Growth ($199/month): 150 monitored URLs, daily synthetic testing, CrUX auto-tracking, basic RUM. White-label reports for client meetings.
- •CrUX BigQuery (free): Supplementary monthly field data for origins not in DebugBear.
- •SpeedVitals Pro ($79/month): Bulk testing for new client audits and pre/post optimization comparisons.
- •WebPageTest Pro ($30/month): Deep debugging when DebugBear's synthetic tests identify issues that need waterfall-level analysis.
- •Total cost: ~$310/month. ROI: eliminates 4–6 hours/week of manual testing and report generation.
Tier 3: Large Agency (30–100+ Clients) — $400–$800/month
- •DebugBear Scale ($399/month): 500 monitored URLs, hourly testing, full RUM (200K page views), competitive monitoring, API access for custom integrations.
- •CrUX BigQuery (free): Multi-client field data dashboard for executive reporting.
- •SpeedVitals Business ($149/month): Bulk auditing for new client onboarding (100 URLs at once).
- •WebPageTest Pro ($30/month): On-demand deep debugging.
- •Custom PSI API pipeline (free): Supplementary daily testing for URLs not covered by DebugBear quotas.
- •Total cost: ~$580/month. Scales to 100+ clients without proportional cost increase.
11. Tool Selection Matrix
Choose tools based on the specific task you need to accomplish:
Initial site audit (new client)
SpeedVitals (bulk) + PSI (CrUX)
Test 50+ URLs simultaneously with SpeedVitals. Pull CrUX field data via PSI to compare lab vs. field. Deliver comprehensive audit in 1 hour, not 1 day.
Ongoing CWV monitoring (all clients)
DebugBear (synthetic + CrUX + RUM)
Automated daily testing with regression alerts. CrUX auto-tracking shows field performance trends. White-label reports for client communication.
Debugging a specific performance issue
WebPageTest (waterfall + scripting)
Connection-level waterfall, CPU profiling, and multi-step scripted tests diagnose issues that simplified tools miss. The 'last resort' debugging tool.
Client-facing performance reports
GTmetrix or DebugBear
GTmetrix's visual waterfall + PDF export is best for non-technical clients. DebugBear's white-label reports are best for branded deliverables.
Monitoring field data at scale (50+ origins)
CrUX BigQuery + Looker Studio
Free, monthly-updated field data for every client origin. Custom SQL queries enable any segmentation. Scales to 1,000+ origins at zero marginal cost.
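The Looker Studio dashboard sits on SQL like the sketch below. The table and column names follow the public `chrome-ux-report.materialized` summary tables as we understand them, but confirm the current schema in the BigQuery console before building on it; the parameterization helper requires the `google-cloud-bigquery` package and GCP credentials.

```python
# p75 CWV trend for one origin, mobile only (mobile CrUX is what
# impacts rankings). Assumes the chrome-ux-report.materialized
# device_summary schema; verify column names against the live dataset.
CRUX_P75_QUERY = """
SELECT
  date,
  p75_lcp,   -- ms
  p75_inp,   -- ms
  p75_cls
FROM `chrome-ux-report.materialized.device_summary`
WHERE origin = @origin
  AND device = 'phone'
ORDER BY date DESC
"""

def crux_job_config(origin: str):
    """Bind one client origin as a query parameter (needs
    google-cloud-bigquery installed and GCP credentials configured)."""
    from google.cloud import bigquery
    return bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("origin", "STRING", origin)])
```

Run the same query once per client origin and the marginal cost per additional origin is effectively zero, which is what makes this the right tool past 50 origins.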
Diagnosing INP failures
DebugBear RUM + Chrome DevTools
DebugBear RUM identifies which pages and interactions cause high INP. Chrome DevTools' Performance panel traces the specific long tasks and event handlers.
Pre/post optimization comparison
SpeedVitals (run comparison)
Test the same 50 URLs before and after optimization. SpeedVitals' run comparison shows per-URL deltas. Export to CSV for client reports.
Competitive benchmarking
DebugBear or CrUX BigQuery
DebugBear monitors competitor URLs alongside client URLs. CrUX BigQuery enables large-scale competitive analysis across entire verticals.
12. Common Mistakes in Bulk Monitoring
Mistakes we've seen agencies make repeatedly when scaling their performance monitoring.
Reporting Lab Scores as 'The Truth'
The most damaging mistake: telling a client 'your Lighthouse score improved from 62 to 78' when their CrUX data shows LCP worsened from 2.2s to 2.8s. Lab scores fluctuate; CrUX is the metric Google uses for rankings. Always lead with field data. Use lab scores for debugging, not reporting.
We've seen agencies lose clients after reporting improving Lighthouse scores while the client's organic traffic dropped — because CrUX field data (which impacts rankings) told a different story.
Not Testing Enough Page Types
Monitoring only the homepage misses 80% of performance issues. E-commerce sites have vastly different performance profiles across page types: product pages (heavy images, variant JS), collection pages (large DOM, filter interactions), search results (dynamic content, API calls), and checkout (form validation, payment scripts).
Minimum monitoring set per client: homepage + 2 product pages (simple and complex) + 1 collection page + 1 blog post. For e-commerce clients, add search results and checkout/cart pages.
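One lightweight way to enforce that minimum set is a validation check in your per-client monitoring config. The page-type labels here are our own convention, not any tool's schema.

```python
# Minimum page types per client, per the guidance above; e-commerce
# clients additionally need search results and cart/checkout.
BASE_TYPES = {"home", "product_simple", "product_complex",
              "collection", "blog"}
ECOM_EXTRA = {"search", "cart"}

def missing_page_types(urls_by_type: dict[str, str],
                       ecommerce: bool) -> set[str]:
    """Return the page types a client's monitoring config still lacks."""
    required = BASE_TYPES | (ECOM_EXTRA if ecommerce else set())
    return required - set(urls_by_type)
```

Running this check during client onboarding catches homepage-only configs before they ship.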
Ignoring Mobile vs. Desktop Segmentation
CrUX reports separate mobile and desktop data. A site might pass all CWV on desktop while failing LCP and INP on mobile. Since Google uses mobile-first indexing, the mobile CrUX data is what impacts rankings.
Always segment monitoring by device type. Lab tests should primarily use mobile device profiles (Moto G Power or equivalent mid-range Android). Desktop lab tests are supplementary.
Alert Fatigue
Setting performance alert thresholds too aggressively leads to alert fatigue — daily notifications for 3-point Lighthouse score fluctuations that are normal variance, not real regressions.
Set alert thresholds on CWV metrics (not Lighthouse scores) with meaningful margins: alert when LCP exceeds 2.2s (88% of the 2.5s threshold), INP exceeds 170ms (85% of 200ms), or CLS exceeds 0.08 (80% of 0.1). This catches genuine regressions before they become CWV failures while avoiding false positives.
Not Correlating Speed Changes with Deployments
When a client's CWV regresses, the first question is: 'what changed?' Without deployment tracking, you're guessing. Integrate deployment markers (via API or manual logging) with your monitoring tool so performance changes can be correlated with specific code deployments, plugin updates, or content changes.
DebugBear supports deployment markers via API. For PSI/CrUX-based monitoring, maintain a deployment log per client and cross-reference with monthly CrUX data changes.
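A deployment log per client can be as simple as a list of dated entries; this hypothetical helper then answers "what changed?" by pulling the deployments in the window before a regression.

```python
from datetime import date, timedelta

def deployments_near(log: list[dict], regression_day: date,
                     window_days: int = 7) -> list[dict]:
    """Return deployments within window_days before a CWV regression,
    most recent first: the prime suspects for the regression."""
    start = regression_day - timedelta(days=window_days)
    hits = [d for d in log
            if start <= date.fromisoformat(d["date"]) <= regression_day]
    return sorted(hits, key=lambda d: d["date"], reverse=True)
```

For CrUX-based monitoring, which is monthly, widen `window_days` to cover the full 28-day collection period of the regressed dataset.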
Common Pitfall
The biggest agency monitoring mistake we see: using tools to generate reports instead of using tools to drive action. A beautiful DebugBear dashboard means nothing if no one reviews it weekly, investigates regressions, and communicates findings to clients. Tools are inputs — optimization is the output.
13. Conclusion & Next Steps
The speed testing landscape in 2026 is mature enough that no agency needs to build tools from scratch — but fragmented enough that choosing the right combination requires understanding what each tool does (and doesn't do).
The foundational principle: use field data (CrUX) for reporting and decision-making, lab data for debugging and diagnostics, and RUM for granular per-session analysis when field data raises questions. No single tool covers all three well — which is why the stack approach works better than trying to find one tool that does everything.
For most agencies, the optimal stack in 2026 is: DebugBear for ongoing synthetic + CrUX + RUM monitoring, SpeedVitals for bulk audits, CrUX BigQuery for large-scale field data, and WebPageTest for deep debugging. Total cost: $250–$600/month depending on scale — a fraction of the value delivered to clients through proactive performance management.
If you're building your agency monitoring practice from zero, start with the free tools (PSI API + CrUX BigQuery + GTmetrix free). They're surprisingly capable and cost nothing. Graduate to paid tools (DebugBear + SpeedVitals) when your client count exceeds 10 and the time savings justify the investment.
Whatever tools you choose, remember: the tool is not the deliverable. The deliverable is faster client websites, better CWV scores, and measurable revenue improvement. Use the tools that help you deliver that outcome most efficiently — and don't let tool evaluation become a substitute for actual optimization work.
Matt Suffoletto
Founder & CEO, PageSpeed Matters
Matt Suffoletto is the Founder & CEO of PageSpeed Matters, a performance optimization consultancy helping businesses improve Core Web Vitals, page speed, and conversion rates. With years of experience optimizing hundreds of sites across Shopify, WooCommerce, WordPress, and enterprise platforms, Matt and his team deliver measurable speed improvements that drive real revenue growth.
