Platform Methodology

How We Calculate Every Metric

SiteWorthIt aggregates data from multiple APIs and public datasets to estimate website traffic, revenue, authority, and technical performance. This page documents every data source, formula, and confidence level behind each number you see on the platform.

Monthly Traffic Estimates

Primary: DataForSEO Bulk Traffic Estimation (search traffic)

Traffic estimation is the hardest problem in third-party web analytics. No external service can observe another site's actual server logs — every number is an estimate derived from indirect signals. SiteWorthIt is deliberately transparent about this: each traffic figure carries a source and a confidence level, and the only way to see truly measured traffic is to connect your own Google Analytics.

Tier 1 — Google Analytics 4 (confidence: HIGH, measured). If you own the site and connect your GA4 property (read-only), we replace every estimate with your real, first-party numbers, clearly labelled as measured. This is the only HIGH-confidence, measured source on the platform.

Tier 2 — DataForSEO Bulk Traffic Estimation (confidence: MEDIUM, estimated). For any domain you don't own, our primary estimate comes from DataForSEO's Labs bulk_traffic_estimation endpoint, which models a domain's search traffic (organic + paid) from its ranked-keyword footprint and expected click-through rates. Important: this is search traffic only — it does not include direct, referral, social, or app traffic, so for very large brands it will read below total-visit figures from panel tools like SimilarWeb. We label this metric "Estimated Search Traffic" rather than "monthly visitors" precisely so the number is not mistaken for total traffic. When the endpoint returns no search traffic for a domain, we fall back to rank-based estimation.

Tier 3 — Rank-based fallback (confidence: LOW, inferred). When no search-traffic estimate is available, we infer a rough figure from the domain's global popularity rank — sourced from Cloudflare Radar (DNS-based, primary) with the research-grade Tranco top-1M list as a secondary source — using a power-law regression. This is the least reliable tier and is shown at LOW confidence with a visible warning, because rank is only a loose proxy for traffic:

visits_per_month ≈ 500,000,000 / rank^0.85
# rank 1 → ~500M visits/month (Google-scale)
# rank 1,000 → ~1.4M visits/month
# rank 100,000 → ~28K visits/month
# rank 1,000,000 → ~4.0K visits/month

This formula is derived from the well-established empirical observation that web traffic follows a Zipf-like power-law distribution across ranked websites. The exponent 0.85 (rather than the classic Zipf value of 1.0) makes the curve slightly flatter: each tenfold drop in rank cuts the estimate by a factor of about 7 (10^0.85 ≈ 7.1) rather than 10. We calibrated these constants against publicly disclosed traffic figures from sites that appear in both Tranco and DataForSEO's data.
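The rank-based fallback can be sketched in a few lines of Python. This is an illustration of the formula only; the function name and defaults are assumptions, not SiteWorthIt's production code.

```python
def estimate_visits_from_rank(rank: int,
                              scale: float = 500_000_000,
                              exponent: float = 0.85) -> float:
    """Power-law estimate: visits_per_month = scale / rank ** exponent."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return scale / rank ** exponent


if __name__ == "__main__":
    # Print the estimate for a few sample ranks.
    for rank in (1, 1_000, 100_000, 1_000_000):
        visits = estimate_visits_from_rank(rank)
        print(f"rank {rank:>9,}: ~{visits:,.0f} visits/month")
```

Because the output is an order-of-magnitude indicator, it makes sense to round it heavily before display rather than show every digit.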

About the rank sources. Cloudflare operates one of the world's largest DNS resolvers (1.1.1.1) and passively observes DNS query volume across billions of daily lookups; its Radar ranking reflects how frequently a domain is queried relative to others — a reasonable popularity proxy, even though one DNS query may map to many page views or none (email clients, API calls). Tranco is a research-grade composite list (Cisco Umbrella, Majestic, Cloudflare and others) that is more stable and harder to manipulate than any single ranking. Because rank is only loosely related to actual visits, any traffic figure derived this way is shown at LOW confidence — an order-of-magnitude indicator, not a precise number.

No Data State: If a domain does not appear in any of these sources, SiteWorthIt will display "Insufficient data" rather than inventing a number. Small personal blogs, intranet domains, recently launched sites, and heavily geographic-niche sites often fall into this category. Showing a fabricated estimate would be worse than showing nothing.

Confidence Level	Data Source	Typical Accuracy	Site Size Applicability
HIGH · measured	Google Analytics 4 (owner-connected)	Exact (first-party)	Any site you own
MEDIUM · estimated	DataForSEO Bulk Traffic Estimation (search traffic)	±20–30% of search traffic	Domains with ranked keywords
LOW · inferred	Cloudflare Radar / Tranco rank → power-law	±50–70% (order-of-magnitude)	Ranked domains without search data
—	No data available	N/A	Very small or new sites — shown as "Insufficient data"
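The tier order in the table can be read as a simple priority chain. The sketch below assumes hypothetical inputs (`ga4_visits`, `search_traffic`, `rank`) that are `None` when a source has no data; the names and return type are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrafficEstimate:
    value: Optional[float]   # None means "Insufficient data"
    source: str
    confidence: str          # "HIGH", "MEDIUM", "LOW" or "NONE"


def select_traffic_estimate(ga4_visits: Optional[float],
                            search_traffic: Optional[float],
                            rank: Optional[int]) -> TrafficEstimate:
    # Tier 1: measured first-party data always wins.
    if ga4_visits is not None:
        return TrafficEstimate(ga4_visits, "Google Analytics 4", "HIGH")
    # Tier 2: search-traffic estimate; None or 0 counts as "no data".
    if search_traffic:
        return TrafficEstimate(search_traffic,
                               "DataForSEO Bulk Traffic Estimation", "MEDIUM")
    # Tier 3: infer from global rank with the power-law fallback.
    if rank is not None:
        return TrafficEstimate(500_000_000 / rank ** 0.85,
                               "Rank-based power law", "LOW")
    # No source has data: report that instead of inventing a number.
    return TrafficEstimate(None, "Insufficient data", "NONE")
```

Returning the source and confidence together with the value keeps the label attached to the number wherever it is displayed.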

Revenue Estimation

Based on Traffic × RPM, CPC from DataForSEO

Website revenue estimation is an inherently imprecise exercise. Actual earnings depend on factors no third party can observe: whether the site runs ads at all, which ad network it uses, its negotiated CPM rates, subscription revenue, affiliate commissions, product sales, and dozens of other monetization methods. Our revenue estimate specifically models display advertising revenue — the most common monetisation for general-interest websites — and should be treated as a ballpark indicator, not a valuation.

The RPM (Revenue Per Mille) Framework. RPM — revenue earned per 1,000 page views — is the key variable. It varies enormously by content niche. A cooking blog might earn $2–5 RPM because the advertisers competing for that audience (kitchen appliance brands, recipe app subscriptions) bid modestly. A personal finance site writing about mortgages or insurance might earn $15–40 RPM because financial service advertisers pay some of the highest CPCs in the Google Ads ecosystem. We infer the niche RPM tier from the average cost-per-click (CPC) data returned by DataForSEO's keyword analysis for the domain's top-ranking keywords.

# RPM tier inference from CPC data:
if avg_cpc < $0.50 → LOW tier → RPM range: $2–$5
if avg_cpc $0.50–$2 → MED tier → RPM range: $5–$15
if avg_cpc > $2.00 → HIGH tier → RPM range: $15–$40

# Revenue range formula (monthly):
revenue_low = (RPM_low / 1000) × monthly_visits × 0.30
revenue_high = (RPM_high / 1000) × monthly_visits × 0.80

# The 0.30–0.80 factor accounts for page views per session
# (not all visits result in an ad impression) and fill rate

The multiplier range of 0.30 to 0.80 deserves explanation. A "visit" in traffic analytics typically means a session: one user arriving at the site. That session might involve one page view or ten. The ratio of page views to visits (pages-per-session) varies widely: news sites often see 1.2–1.5 pages/session; Wikipedia-style reference sites can see 2–4. The multiplier folds this ratio together with ad fill rate into a single factor, and it stays below 1 because not every visit produces a filled, paid ad impression. We use 0.30 as a conservative lower bound (low pages/session, low ad fill rate) and 0.80 as an optimistic upper bound (higher-engagement sites with good ad partnerships). The result is presented as a range, for example "$3,200 – $11,000/month", rather than a single number, which would imply false precision.
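As a worked illustration, the tier inference and range formula can be combined as follows. The function names are hypothetical, and treating a CPC of exactly $2.00 as MED tier is an assumption about the boundary.

```python
from typing import Tuple


def rpm_tier(avg_cpc: float) -> Tuple[str, float, float]:
    """Map average CPC (USD) to (tier, rpm_low, rpm_high)."""
    if avg_cpc < 0.50:
        return ("LOW", 2.0, 5.0)
    if avg_cpc <= 2.00:          # assumption: $2.00 itself counts as MED
        return ("MED", 5.0, 15.0)
    return ("HIGH", 15.0, 40.0)


def revenue_range(monthly_visits: float, avg_cpc: float) -> Tuple[float, float]:
    """Monthly display-ad revenue range in USD."""
    _, rpm_low, rpm_high = rpm_tier(avg_cpc)
    low = rpm_low / 1000 * monthly_visits * 0.30    # conservative bound
    high = rpm_high / 1000 * monthly_visits * 0.80  # optimistic bound
    return low, high


if __name__ == "__main__":
    low, high = revenue_range(monthly_visits=200_000, avg_cpc=1.20)
    print(f"${low:,.0f} - ${high:,.0f}/month")
```

For a site with 200,000 monthly visits and an average CPC of $1.20 (MED tier), this gives a range of roughly $300 to $2,400 per month.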

Important constraint: Revenue estimates are only displayed when the traffic figure comes from GA4 or from DataForSEO's search-traffic estimate (MEDIUM confidence or better). We deliberately do not calculate revenue from rank-based traffic estimates because the compounded uncertainty (imprecise traffic × imprecise RPM) would produce a number that is essentially meaningless. Showing a confident-sounding revenue figure for a site whose traffic is a rough, rank-derived guess would be misleading.

Domain Authority Score

Source: OpenPageRank API (Common Crawl)

The "Domain Authority" label is one of the most misunderstood concepts in SEO. Moz created the term in 2010 and trademarked the metric as a predictor of Google ranking potential based on their proprietary backlink index. Ahrefs built an equivalent called "Domain Rating." Neither company shares their methodology in full, and neither metric is used by Google in its ranking algorithms — a fact Google has confirmed repeatedly. Despite this, DA/DR numbers are widely cited as if they were official signals.

SiteWorthIt uses a different, open-data approach. Our Domain Authority display is powered by the OpenPageRank API, which is built on link graph data extracted from the Common Crawl — a free, open repository of petabyte-scale web crawl data collected by a non-profit organisation. Common Crawl scans billions of pages and records which URLs link to which other URLs. OpenPageRank runs a PageRank-style computation on this link graph to assign each domain a score on a 0–10 decimal scale.

We multiply that raw score by 10 to present it on a 0–100 scale for intuitive reading. So a domain with an OpenPageRank score of 6.4 would display as "64" on SiteWorthIt. The scaling does not change how domains compare with one another; it simply reads more easily alongside the other 0–100 scores on the same page.

Not Moz DA or Ahrefs DR. Our score uses open web crawl data from Common Crawl, which has different coverage and crawl freshness than Moz's or Ahrefs' proprietary indexes. For the same domain, these scores will often differ significantly. Neither is definitively "correct" — they measure the same underlying concept (link-based authority) using different data snapshots. OpenPageRank updates its index several times per year as new Common Crawl datasets are released.

The practical implication: a high score on SiteWorthIt's Domain Authority reflects that a domain receives links from many other well-linked domains, based on what Common Crawl has observed in its most recent crawl. A low score may mean the domain is genuinely low-authority, or that Common Crawl's crawler hasn't yet indexed its backlink profile fully — which is more likely for newer or smaller sites.

Backlinks & Referring Domains

Source: DataForSEO Backlinks API

Raw backlink counts are a notoriously noisy metric. A single spammy website might have 50,000 pages that all link to the same target — this inflates the backlink count dramatically without adding meaningful authority. The metric that actually matters is referring domains: the count of unique root domains that link to a website at least once. This is the metric SiteWorthIt prioritises in its display.

We retrieve backlink and referring domain data from the DataForSEO Backlinks API, which maintains its own crawl-based link index updated on a rolling basis. DataForSEO reports both the total number of backlinks (all individual links pointing to the domain) and the number of unique referring domains. The distinction matters: 10,000 links from 10 domains is far less valuable, both to search engines and as a signal of genuine authority, than 500 links from 500 distinct domains.
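The difference between the two counts is easy to see in code. The sketch below is a simplification: it groups by hostname (with a leading "www." stripped), whereas a true root-domain count would also need a public-suffix list to merge subdomains. The function name and sample URLs are made up for illustration.

```python
from typing import Dict, Iterable
from urllib.parse import urlsplit


def link_profile(source_urls: Iterable[str]) -> Dict[str, int]:
    """Count total backlinks and unique linking hosts."""
    urls = list(source_urls)
    hosts = set()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.add(host)
    return {"backlinks": len(urls), "referring_domains": len(hosts)}


if __name__ == "__main__":
    # 50 links from one spammy site plus 2 links from distinct sites.
    links = [f"https://spam.example/page/{i}" for i in range(50)]
    links += ["https://www.news.example/story", "https://blog.example/post"]
    print(link_profile(links))
```

In this sample, 52 backlinks collapse to just 3 referring hosts, which is why the second number is the more meaningful one.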

DataForSEO's backlink index is not as large as Ahrefs' or Majestic's — no freely accessible API has comparable scale to the major paid link databases. This means the referring domain count we display may undercount the true figure for very large sites with extensive backlink profiles. For most sites in the mid-range (hundreds to tens of thousands of referring domains), DataForSEO's coverage is sufficient to give an accurate order-of-magnitude reading.

Fresh vs. historical data: DataForSEO distinguishes between "live" backlinks (still active at last crawl) and "historical" (detected at some point but may be gone). SiteWorthIt reports the live figure where available, as historical counts can be inflated by links that no longer exist and are no longer passing any authority.

Global Rank

Sources: DataForSEO · Tranco · Cloudflare Radar

Global rank attempts to answer the question: "Where does this website sit in the overall pecking order of internet traffic?" Like traffic estimates, rank is derived from third-party data rather than the site's own analytics. SiteWorthIt uses a three-source priority chain for this metric.

Primary: DataForSEO Global Rank. DataForSEO's panel assigns a global popularity rank derived from the same aggregated clickstream data used for traffic estimates. This rank is available for a subset of domains that have enough panel representation to be ranked reliably. When available, this is the most reliable rank signal because it's based on actual user behaviour.

Secondary: Tranco List. The Tranco list ranks the top one million domains globally and is regenerated daily. It is a composite that combines several provider rankings (including Cisco Umbrella, Majestic, and Cloudflare Radar) and by default averages them over a 30-day window, which smooths out day-to-day swings. For domains in the top million that lack DataForSEO rank coverage, we use their Tranco rank position. The list is published openly and the methodology is documented in peer-reviewed research, making it one of the most transparent ranking datasets available.

Tertiary: Cloudflare Radar. For domains outside the top million but with measurable DNS query volume through Cloudflare's 1.1.1.1 resolver infrastructure, Cloudflare Radar provides a rank. This is the broadest-reaching source but also the least correlated with actual user traffic — DNS queries reflect many types of internet activity beyond web browsing — so it serves as a last resort when the other sources have no data.

Indexed Pages

Source: Serper.dev Google SERP API

The indexed page count answers a fundamental SEO question: how many of a website's pages has Google chosen to include in its search index? A page not in Google's index cannot rank for any query, so the indexed count is a rough indicator of a site's total search-visible footprint.

SiteWorthIt retrieves this number by running a site:domain.com query through the Serper.dev Google SERP API, which is a real-time interface to Google's search results. Google displays an estimated total result count when a site: query is submitted — for example, "About 14,200 results" — and we parse and display that number.
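Parsing the count out of a string such as "About 14,200 results" might look like the sketch below. It assumes the count arrives as English text with comma thousands separators; the exact shape of the Serper.dev response is not described here, so the input format is an assumption.

```python
import re
from typing import Optional

# One or more digits, optionally grouped with commas, followed by "result(s)".
_RESULT_COUNT = re.compile(r"(\d[\d,]*)\s+results?", re.IGNORECASE)


def parse_result_count(text: str) -> Optional[int]:
    """Return the integer result count, or None if no count is found."""
    match = _RESULT_COUNT.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(parse_result_count("About 14,200 results"))
```

Returning `None` rather than 0 when nothing matches keeps "no count found" distinct from "zero pages indexed".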

Google's count is approximate. Google itself describes its site: result counts as rough estimates, and in practice they can vary by ±30% or more depending on how the query is processed. For large sites with millions of pages, the count can fluctuate significantly between queries. It is also possible for Google to index more pages than the site: operator reveals, as not all indexed content surfaces through this query type. Treat the displayed count as an indicator of scale rather than a precise inventory.

Additionally, a high indexed page count is not always desirable. Sites with large amounts of thin content, duplicate pages, or low-quality automatically generated pages may have a high index count but poor search performance. Google's systems will often surface only a fraction of technically-indexed pages for any given query, prioritising the highest-quality content. This is why SEO practitioners often focus on index quality over index quantity.

PageSpeed & Core Web Vitals

Source: Google PageSpeed Insights API v5

Unlike the traffic and revenue metrics, which are estimates modelled from third-party data, PageSpeed scores are direct measurements obtained by actually loading the website. SiteWorthIt submits each domain to the Google PageSpeed Insights API v5, which triggers a real Lighthouse analysis. Google's Lighthouse engine loads the page in a controlled, simulated environment and measures a standardised set of performance and quality signals.

The PageSpeed Insights API returns four composite scores, each on a 0–100 scale:

Performance — how fast the page loads and becomes interactive for the user
SEO — whether the page follows technical SEO best practices (crawlability, meta tags, mobile-friendliness)
Accessibility — how well the page works for users with disabilities (ARIA labels, contrast ratios, keyboard navigation)
Best Practices — security, deprecated APIs, HTTPS usage, and other general web hygiene signals

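Within the Performance score, Lighthouse records a set of lab metrics. Two of them, LCP and CLS, are Core Web Vitals, Google's official user experience metrics that are used as a search ranking signal. FCP and TBT are supporting lab metrics; TBT serves as a lab proxy for the interactivity Core Web Vital (INP), which can only be measured with real users: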

Metric	What It Measures	Good	Needs Work	Poor
FCP First Contentful Paint	Time until first content appears on screen	< 1.8s	1.8–3s	> 3s
LCP Largest Contentful Paint	Time until the main content element loads	< 2.5s	2.5–4s	> 4s
TBT Total Blocking Time	Total time main thread was blocked by JavaScript	< 200ms	200–600ms	> 600ms
CLS Cumulative Layout Shift	How much the page layout jumps during load	< 0.1	0.1–0.25	> 0.25

Score thresholds follow Google's official classification: 0–49 is Poor (red), 50–89 is Needs Improvement (orange), and 90–100 is Good (green). These thresholds apply to all four composite category scores.
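The score bands and per-metric thresholds above translate directly into two small lookup functions. This is a minimal sketch with hypothetical names; FCP and LCP are in seconds, TBT in milliseconds, and CLS is unitless.

```python
def classify_score(score: int) -> str:
    """Band a 0-100 category score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Good"
    if score >= 50:
        return "Needs Improvement"
    return "Poor"


# metric -> (upper limit for "Good", upper limit for "Needs Work")
METRIC_THRESHOLDS = {
    "FCP": (1.8, 3.0),    # seconds
    "LCP": (2.5, 4.0),    # seconds
    "TBT": (200, 600),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless
}


def classify_metric(name: str, value: float) -> str:
    good_limit, poor_limit = METRIC_THRESHOLDS[name]
    if value < good_limit:
        return "Good"
    if value <= poor_limit:
        return "Needs Work"
    return "Poor"
```

For example, a Performance score of 72 falls in the "Needs Improvement" band, and an LCP of 3.1 seconds classifies as "Needs Work".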

The API is free to use up to 25,000 requests per day per API key. Because the analysis involves actually loading the target website, it takes 10–30 seconds to complete per domain. Scores reflect the mobile experience by default, because Google's mobile-first indexing evaluates the mobile version of a page.
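A request for all four categories might be built and read roughly as follows. The sketch makes no network call; `YOUR_API_KEY` is a placeholder, the helper names are hypothetical, and the sample response is a hand-written fragment that mirrors the `lighthouseResult.categories` structure, where each score is a fraction between 0 and 1.

```python
from typing import Dict
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
CATEGORIES = ("PERFORMANCE", "SEO", "ACCESSIBILITY", "BEST_PRACTICES")


def build_psi_url(page_url: str, api_key: str, strategy: str = "MOBILE") -> str:
    """Build the request URL; 'category' is repeated once per category."""
    params = [("url", page_url), ("strategy", strategy), ("key", api_key)]
    params += [("category", category) for category in CATEGORIES]
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def extract_scores(psi_response: dict) -> Dict[str, int]:
    """Convert the 0-1 category scores to the 0-100 scale."""
    categories = psi_response["lighthouseResult"]["categories"]
    return {
        name: round(category["score"] * 100)
        for name, category in categories.items()
        if category.get("score") is not None
    }


if __name__ == "__main__":
    print(build_psi_url("https://example.com", "YOUR_API_KEY"))
    sample = {"lighthouseResult": {"categories": {
        "performance": {"score": 0.5}, "seo": {"score": 1.0}}}}
    print(extract_scores(sample))
```

Categories with a missing score are skipped rather than reported as zero, so a failed audit is not mistaken for a poor one.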

Why this is the most reliable metric on the platform: PageSpeed data is not an estimate. Lighthouse runs a reproducible, standardised test against the live website. The scores you see reflect the actual current state of the site's front-end performance, not a model or prediction. The data is only as stale as our cache — see the Data Freshness section below.

Domain Age

Source: RDAP (Registration Data Access Protocol)

Domain age is sourced from the official registrar record for the domain. The registration date is a matter of public record maintained by domain registrars and made accessible through ICANN-mandated protocols.

SiteWorthIt uses RDAP (Registration Data Access Protocol), the modern, structured replacement for the legacy WHOIS system. RDAP was standardised by the IETF (RFC 7480–7484, since updated by RFC 9082, 9083, and 9224) and became mandatory for all ICANN-accredited registrars in 2019. Unlike WHOIS, which returns freeform text that requires fragile string parsing, RDAP returns structured JSON responses with standardised field names. This makes it dramatically more reliable: the registration date is always in the same machine-readable field regardless of which registrar manages the domain.

The specific value we read is the eventDate of the entry in the RDAP response's events array whose eventAction is "registration". This date represents when the domain was first registered, which we use to calculate the domain's age in years and months. It is possible for a domain to have changed ownership since its original registration; a domain acquired from a previous owner retains its original registration date, so "domain age" in this context means "time since first registration," not "time under current ownership."
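In the standard RDAP JSON format, lifecycle dates appear in an `events` array, and the registration date is the `eventDate` of the entry whose `eventAction` is `"registration"`. The sketch below reads that entry and computes an age in whole years and months; the function names and the sample response are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def registration_date(rdap: dict) -> Optional[datetime]:
    """Return the registration timestamp from an RDAP domain response."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == "registration":
            # RDAP dates are RFC 3339; map a trailing "Z" to an explicit offset.
            return datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
    return None


def domain_age(registered: datetime, now: datetime) -> Tuple[int, int]:
    """Return (years, months) of completed time since registration."""
    months = (now.year - registered.year) * 12 + (now.month - registered.month)
    if now.day < registered.day:
        months -= 1          # the current month is not complete yet
    return divmod(months, 12)


if __name__ == "__main__":
    sample = {"events": [
        {"eventAction": "registration", "eventDate": "1997-09-15T04:00:00Z"},
        {"eventAction": "expiration", "eventDate": "2028-09-14T04:00:00Z"},
    ]}
    registered = registration_date(sample)
    print(domain_age(registered, datetime.now(timezone.utc)))
```

When no registration event is present, the function returns `None`, which is the point at which a WHOIS fallback would take over.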

RDAP has advantages over WHOIS beyond structured data. WHOIS servers are rate-limited aggressively, have inconsistent availability, and many registrars have restricted WHOIS data in response to GDPR requirements. RDAP is GDPR-compliant by design and has more consistent uptime. For the minority of domains where RDAP does not return a creation date (some country-code TLDs maintain their own protocols), we fall back to a WHOIS query as a last resort.

Website Valuation

Based on Annual Revenue × Industry Multipliers

Website valuation is the most speculative metric on the platform. The actual sale price of any website depends on a complex negotiation involving traffic trends, revenue consistency, traffic source diversification, owner dependency, niche competition, intellectual property, technical debt, team requirements, and buyer-specific strategic value. None of these factors are available to a third-party analytics tool.

What we can do is apply the industry-standard "income multiple" methodology used by digital asset brokers like Empire Flippers, Flippa, and Motion Invest. In this framework, a content website's value is typically expressed as a multiple of its annual net profit (or gross revenue, for advertising-dependent sites). Based on historical brokered sales data from the digital assets market, content and information websites typically trade at 2.5× to 4× annual revenue at the low and high ends respectively.

valuation_low = estimated_annual_revenue × 2.5
valuation_high = estimated_annual_revenue × 4.0

# where estimated_annual_revenue = monthly_revenue_estimate × 12
# Multiplier range based on Empire Flippers / Flippa market data

The 2.5× lower bound reflects sites with lower-quality traffic, heavy dependence on a single traffic source (e.g., 90%+ organic SEO with algorithm risk), or thin monetisation. The 4× upper bound reflects more stable sites with diversified traffic, consistent revenue history, and lower operational complexity. SaaS businesses and membership sites can trade at 5–10× or higher, but we use content-site multiples as our baseline since that's the most common site type in our dataset.
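The multiple-based range is a two-line calculation. The sketch below takes a single monthly revenue estimate as input; how the low and high ends of the revenue range feed into it is not specified here, so that choice is left to the caller. The function name is hypothetical.

```python
from typing import Tuple


def valuation_range(monthly_revenue: float,
                    low_multiple: float = 2.5,
                    high_multiple: float = 4.0) -> Tuple[float, float]:
    """Return (valuation_low, valuation_high) from a monthly revenue estimate."""
    annual_revenue = monthly_revenue * 12
    return annual_revenue * low_multiple, annual_revenue * high_multiple


if __name__ == "__main__":
    low, high = valuation_range(1_000)
    print(f"${low:,.0f} - ${high:,.0f}")
```

For a site estimated at $1,000 per month, the annual figure is $12,000 and the range is $30,000 to $48,000.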

Strict display conditions: Website valuation is only shown when the traffic figure comes from GA4 or DataForSEO's search-traffic estimate (MEDIUM confidence or better) AND CPC data is available to infer the RPM tier. Valuations are never calculated from rank-based traffic estimates. The compounded uncertainty of rank estimate → traffic estimate → revenue estimate → valuation would produce a number with no meaningful relationship to reality. When conditions are not met, this section is hidden entirely.

Even under ideal conditions, treat our valuation range as a rough frame of reference — useful for understanding scale (is this a $10K site or a $1M site?) but not as a figure you would take to a transaction negotiation without independent due diligence.

Data Freshness & Caching

Redis TTL: 24h · Database Staleness: 30 days

Every time you look up a domain on SiteWorthIt, the platform checks multiple caching layers before making any external API calls. This is essential for performance (API calls take seconds), cost control (many of our data sources bill per request), and reliability (if an upstream API is temporarily unavailable, cached data keeps the service functional).

Layer 1 — Redis in-memory cache (TTL: 24 hours). The first lookup result for any domain is stored in Redis with a 24-hour time-to-live. Redis operates entirely in memory, so cache hits return in milliseconds. If you look up the same domain twice within 24 hours, the second request will return the cached result without touching any external APIs. This means the data you see reflects the state of the web at the time of the first lookup in that 24-hour window, not the current moment.

Layer 2 — PostgreSQL persistent database (staleness threshold: 30 days). When Redis has no cached entry for a domain, the platform checks the PostgreSQL database for a previously stored result. If a database record exists and is less than 30 days old, we serve that data and simultaneously refresh the Redis cache with it. If the database record is older than 30 days, or doesn't exist, the platform makes fresh API calls to all configured upstream sources, processes the results, stores them in both PostgreSQL and Redis, and returns the fresh data.
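The two-layer lookup can be sketched with plain dictionaries standing in for Redis and PostgreSQL. This is a simplified model, not the production code: real Redis expires keys itself, and `fetch_fresh` is a hypothetical callable representing the upstream API calls.

```python
import time
from typing import Callable, Optional, Tuple

REDIS_TTL = 24 * 3600            # 24 hours, in seconds
DB_MAX_AGE = 30 * 24 * 3600      # 30 days, in seconds


def lookup(domain: str, redis_cache: dict, database: dict,
           fetch_fresh: Callable[[str], dict],
           now: Optional[float] = None) -> Tuple[dict, str]:
    """Return (data, layer), where layer is 'redis', 'database' or 'fresh'."""
    now = time.time() if now is None else now

    # Layer 1: in-memory cache, valid for 24 hours.
    cached = redis_cache.get(domain)
    if cached and now - cached["stored_at"] < REDIS_TTL:
        return cached["data"], "redis"

    # Layer 2: persistent record, valid for 30 days; re-warm the cache.
    record = database.get(domain)
    if record and now - record["fetched_at"] < DB_MAX_AGE:
        redis_cache[domain] = {"data": record["data"], "stored_at": now}
        return record["data"], "database"

    # Miss or stale record: call upstream sources and store in both layers.
    data = fetch_fresh(domain)
    database[domain] = {"data": data, "fetched_at": now}
    redis_cache[domain] = {"data": data, "stored_at": now}
    return data, "fresh"
```

Passing `now` explicitly makes the expiry logic easy to test without waiting for real time to pass.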

Setting	Value	Notes
Redis Cache TTL	24 Hours	In-memory, millisecond response
Database Staleness Limit	30 Days	PostgreSQL persistent store
Fresh API Fetch Time	5–20s	Parallel API calls, PSI takes longest
Cached Response Time	< 200ms	Redis hit, no external calls

The practical implication: if a website underwent a major redesign yesterday and you look it up today, you may see data reflecting the pre-redesign state if another user looked it up recently. PageSpeed scores, in particular, can change significantly after a redesign. For fresh data on a specific domain, you can bypass the cache by appending ?refresh=true to the analysis URL — this forces a new API fetch regardless of cache state, subject to rate limits.

Different metrics have different inherent refresh rates from their upstream sources. Traffic data from DataForSEO is updated monthly. The Tranco list is regenerated daily. Cloudflare Radar ranks are updated daily. OpenPageRank refreshes several times per year. RDAP registration data is real-time. Our 30-day database threshold is calibrated to balance freshness against API cost: a monthly refresh matches the cadence of the DataForSEO traffic data on which the revenue and valuation figures depend.

Accuracy & Limitations

Being Honest About Margins of Error

We built SiteWorthIt because we wanted a free, accessible way to get a ballpark sense of a website's scale — not a replacement for first-party analytics. If you have Google Analytics or another analytics platform installed on your own site, that data will always be more accurate than anything we can provide. We are estimating from the outside looking in.

Traffic estimates carry significant uncertainty. Even DataForSEO's search-traffic estimates, our most accurate source short of connected GA4 data, have a margin of error of roughly ±20–30% for sites in the 50K–1M monthly visit range. For sites below 50K visits/month, the underlying data becomes thin and errors can be ±40–60% or more. For rank-based estimates (Tranco/Cloudflare), the figure can be off by a factor of two or more in either direction: the actual traffic might be double or half what we show. We display confidence badges (HIGH/MEDIUM/LOW) precisely to communicate this uncertainty.

Small sites are harder to estimate than large ones. A site with 1 million monthly visitors is in almost every panel dataset and ranking list. A site with 8,000 monthly visitors might appear in none of them. This inverse relationship — where larger sites are easier to estimate, and small sites that might benefit most from traffic context are hardest to measure — is a fundamental limitation of panel-based analytics. We make no apologies for showing "insufficient data" for sites below the threshold; an honest "we don't know" is more useful than a fabricated number.

Revenue and valuation figures should not be used for financial decisions. The revenue estimate models display advertising and applies industry-average RPMs based on keyword CPC data. A site might use no advertising at all, or might monetise through premium subscriptions at ten times the implied RPM, or might have advertising rates negotiated directly with brands at prices very different from programmatic averages. Valuation multiples are drawn from publicly discussed transaction data but vary enormously in practice. These numbers answer "what ballpark are we in?" not "what should I pay for this site?"

Domain Authority comparisons across tools will differ. If you check a site on SiteWorthIt and then check it on Moz or Ahrefs, you will get different scores. All three tools are measuring link-based authority, but with different crawl data, different graph algorithms, and different scoring curves. Our score is not wrong and theirs are not wrong — they are different estimates from different datasets. The most useful thing you can do with any authority score is track it over time on a single platform, not compare the raw number across platforms.

PageSpeed scores change frequently and vary between requests. Google's infrastructure routes PageSpeed Insights requests to different geographic locations and Lighthouse versions may update between our cached fetch and your current visit. It's common to see a 3–5 point variance on repeated tests. Our 24-hour cache means the score you see is from the most recent test within that window. If a site is actively being worked on (JavaScript optimisation, image compression, server upgrades), the score could change substantially in a short period.

Built by engineers, not marketers. We have no interest in inflating numbers to make the platform look more impressive. Showing a $500,000/month revenue estimate for a small blog might look exciting, but it would undermine trust and make the platform useless. If we do not have quality data for a metric, we hide the metric. If our estimate has low confidence, we say so. The goal is useful signal, not impressive-looking noise.

Explore More Tools

Every tool on SiteWorthIt is free, no sign-up required, and built on the data sources documented above.

Website Analytics Lookup

PageSpeed & Core Web Vitals

Domain Authority Checker

Global Rank Checker

YouTube Channel Stats
