WebsiteTemplate/docs/ANALYTICS_ACCURACY.md
2026-01-25 11:33:37 -04:00

Analytics Accuracy Guide

How to Verify Analytics Accuracy

Run the verification script to check for discrepancies:

php /var/www/verify-analytics.php [YYYY-MM-DD]

If no date is provided, it checks today's data.

Current Accuracy Issues Found

1. Returning Visitor Count Bug ⚠️

The summary reports returning visitor counts that do not match the raw data. The verification script counts unique returning visitors, while the summary logic appears to count them differently.

Impact: Returning visitor numbers are inflated.
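
One way this kind of inflation can arise is counting returning pageviews instead of distinct returning visitors. The sketch below contrasts the two counts. It assumes each raw record is an array with hypothetical `visitor_id` and `is_returning` keys; the real storage format is not shown in this guide, so treat the names as placeholders.

```php
<?php
// Hypothetical record shape: ['visitor_id' => string, 'is_returning' => bool].
// These key names are assumptions for illustration only.

/** Count distinct visitors that have at least one returning pageview. */
function countUniqueReturning(array $records): int
{
    $seen = [];
    foreach ($records as $record) {
        if (!empty($record['is_returning'])) {
            $seen[$record['visitor_id']] = true; // array keys de-duplicate IDs
        }
    }
    return count($seen);
}

/** Count every returning pageview (higher when a visitor views several pages). */
function countReturningPageviews(array $records): int
{
    return count(array_filter($records, fn($record) => !empty($record['is_returning'])));
}
```

If the summary matches the second function while the verification script matches the first, the gap between the two numbers would explain the discrepancy.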

2. RSS Click Tracking

RSS clicks are tracked in two ways:

  • Button clicks on the page (tracked via JavaScript)
  • Actual RSS feed fetches (tracked via PHP in feed.php)

Impact: RSS numbers may be double-counted or inconsistent.

3. No Bot Filtering

Bot traffic (search engines, crawlers) is currently counted as regular visitors.

Impact: Numbers may be inflated by 10-30% depending on site popularity.

4. Ad Blockers

Users with ad blockers may block the analytics script entirely.

Impact: Numbers may be deflated by 5-15% (depending on user base).

5. Self-Visits

Your own visits are not filtered out.

Impact: Development/testing visits inflate numbers.

6. Duplicate Pageviews

A pageview from the same visitor on the same page within 5 seconds of the previous one is a potential duplicate.

Impact: Rapid navigation or page refreshes create duplicates.

7. New vs Returning Logic

The new-versus-returning check currently looks only at visits from the same day. A visitor who came yesterday and returns today is counted as "new" again.

Impact: Returning visitor counts are inaccurate across days.

Factors Affecting Accuracy

What IS Tracked Accurately:

  • Pageview timestamps (hourly breakdown is recalculated from raw data)
  • Share counts (when JavaScript executes)
  • Reaction counts (stored separately, very accurate)

⚠️ What May Be Inaccurate:

  • Total visits: May include bots, duplicates, self-visits
  • New vs Returning: Only accurate within same day
  • RSS clicks: May have double-counting issues
  • Unique visitors: Uses localStorage, can be cleared/blocked

Recommendations to Improve Accuracy

1. Filter Bot Traffic

Add bot detection in track.php:

// Check the user agent for common bot keywords
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
// preg_match() returns 1 on a match, 0 on no match, and false on error
$isBot = preg_match('/bot|crawler|spider|scraper/i', $ua) === 1;
if ($isBot) {
    // Skip tracking or mark as bot
}

2. Filter Self-Visits

Add your IP(s) to a blocklist in track.php:

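// Replace the placeholders with your own IP address(es)
$yourIPs = ['YOUR_IP_HERE', 'ANOTHER_IP'];
// Behind a reverse proxy, REMOTE_ADDR holds the proxy's address, not the visitor's
if (in_array($_SERVER['REMOTE_ADDR'] ?? '', $yourIPs, true)) {
    // Skip tracking
}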

3. Fix Returning Visitor Logic

Store visitor history across days, not just within the same day.
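
A minimal sketch of cross-day history, assuming a JSON file that maps each visitor ID to the date it was first seen. The function names, the file layout, and the idea of a single history file are all assumptions for illustration; track.php may store its data differently.

```php
<?php
// Sketch only: the history file and its JSON layout are assumptions,
// not the format track.php actually uses.

/** Load the visitor history map (visitor ID => first-seen date). */
function loadHistory(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $data = json_decode((string) file_get_contents($path), true);
    return is_array($data) ? $data : [];
}

/**
 * Persist the history map. LOCK_EX stops two writers from interleaving,
 * but it does not make the whole read-modify-write cycle atomic.
 */
function saveHistory(string $path, array $history): void
{
    file_put_contents($path, json_encode($history), LOCK_EX);
}

/**
 * Classify a visitor as 'new' or 'returning' using lifetime history,
 * and record the first-seen date for visitors not seen before.
 */
function classifyVisitor(array &$history, string $visitorId, string $today): string
{
    if (isset($history[$visitorId])) {
        return 'returning';
    }
    $history[$visitorId] = $today;
    return 'new';
}
```

Because the history is keyed by visitor ID rather than by day, a visitor first seen yesterday is classified as returning today. The visitor ID still comes from localStorage, so cleared or blocked storage continues to produce "new" visitors.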

4. Deduplicate Rapid Pageviews

Add a cooldown period (e.g., ignore a pageview when the same visitor requests the same page again within 10 seconds).
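
The cooldown could be sketched as below. The function name and the in-memory `$lastSeen` array are hypothetical; in track.php the last-seen timestamps would have to persist between requests (file, database, or cache).

```php
<?php
// Sketch only: $lastSeen must persist between requests in real use;
// here it is a plain in-memory array.

/**
 * Return true when the same visitor loaded the same page less than
 * $cooldown seconds ago. Otherwise record the pageview time and return false.
 */
function isDuplicatePageview(
    array &$lastSeen,
    string $visitorId,
    string $page,
    int $now,
    int $cooldown = 10
): bool {
    $key = $visitorId . '|' . $page;
    if (isset($lastSeen[$key]) && ($now - $lastSeen[$key]) < $cooldown) {
        return true; // inside the cooldown window: ignore this pageview
    }
    $lastSeen[$key] = $now;
    return false;
}
```

The timestamp is only updated when a pageview is counted, so the window is measured from the last counted view rather than from the last request. A visitor who refreshes every few seconds is still counted once per cooldown period.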

5. Separate RSS Tracking

Distinguish between:

  • RSS button clicks (user intent)
  • RSS feed fetches (automatic, may be bots)
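
A minimal sketch of keeping the two signals apart when tallying events. The event names `rss_button_click` and `rss_feed_fetch` are hypothetical labels, not identifiers taken from track.php or feed.php.

```php
<?php
// Sketch only: the event names below are assumed labels for illustration.

/** Tally RSS events into separate counters instead of one combined total. */
function tallyRssEvents(array $events): array
{
    $totals = ['rss_button_click' => 0, 'rss_feed_fetch' => 0];
    foreach ($events as $event) {
        if (isset($totals[$event])) {
            $totals[$event]++; // unrelated event types are skipped
        }
    }
    return $totals;
}
```

Reporting the two counters side by side avoids adding a button click and the feed fetch it triggers into a single "RSS clicks" number.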

Understanding Your Numbers

Realistic Accuracy Range

  • Pageviews: ±15-25% (due to bots, ad blockers, duplicates)
  • Unique Visitors: ±20-30% (localStorage can be cleared/blocked)
  • Shares: ±5% (very accurate, requires JavaScript)
  • Reactions: ±1% (very accurate, stored server-side)
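
As a worked example of reading these margins, the helper below turns a reported count and a ± percentage into a plausible range. The function name is hypothetical, and the margins themselves are the estimates listed above, not measured values.

```php
<?php
/** Return the [low, high] range implied by a reported count and a ± margin in percent. */
function accuracyRange(int $reported, float $marginPercent): array
{
    $delta = $reported * $marginPercent / 100;
    return [(int) round($reported - $delta), (int) round($reported + $delta)];
}
```

For instance, `accuracyRange(1000, 25)` gives `[750, 1250]`: a summary showing 1,000 pageviews at the upper ±25% margin is consistent with roughly 750 to 1,250 real pageviews.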

What the Numbers Mean

  • Total Visits: All page loads, including bots and duplicates
  • New Visitors: First-time visitors today (not lifetime)
  • Returning Visitors: Visitors who visited earlier today (not yesterday)
  • Hourly Breakdown: Accurate (recalculated from timestamps)

Best Practices

  1. Run verification script regularly to catch discrepancies
  2. Focus on trends rather than absolute numbers
  3. Compare with server logs for validation
  4. Filter your own IP for more accurate numbers
  5. Monitor for anomalies (sudden spikes may be bots)
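
For the server log comparison, a rough cross-check could count successful GET requests for a given day. This sketch assumes the web server writes Apache/Nginx "combined" format log lines; the function name is hypothetical, and static assets (CSS, images) are counted too unless they are filtered out first.

```php
<?php
// Sketch only: assumes "combined" log lines such as
// 203.0.113.5 - - [28/Dec/2025:10:15:32 +0000] "GET /post/1 HTTP/1.1" 200 1234 "-" "Mozilla/5.0"

/** Count successful (2xx) GET requests logged on $date (YYYY-MM-DD). */
function countLogHits(iterable $lines, string $date): int
{
    $day = DateTime::createFromFormat('!Y-m-d', $date);
    if ($day === false) {
        throw new InvalidArgumentException("Invalid date: $date");
    }
    $needle = $day->format('d/M/Y'); // log date style, e.g. 28/Dec/2025
    $pattern = '~\[' . preg_quote($needle, '~') . ':[^\]]*\] "GET \S+ [^"]*" 2\d\d ~';

    $hits = 0;
    foreach ($lines as $line) {
        if (preg_match($pattern, $line) === 1) {
            $hits++;
        }
    }
    return $hits;
}
```

Any iterable of lines works, for example `new SplFileObject($logPath)` where `$logPath` is wherever your server keeps its access log. The log count will normally be higher than the analytics count, since it includes bots and visitors who block JavaScript; a large or sudden change in the gap is the signal worth investigating.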

Quick Accuracy Check

# Check today's data
php /var/www/verify-analytics.php

# Check specific date
php /var/www/verify-analytics.php 2025-12-28

# Look for:
# - Discrepancies between summary and raw data
# - High bot counts
# - Duplicate pageviews
# - Rapid-fire visits

Expected Accuracy

For a typical personal blog:

  • Pageviews: 70-85% accurate (after accounting for bots/ad blockers)
  • Unique Visitors: 60-75% accurate (localStorage limitations)
  • Engagement (shares/reactions): 95%+ accurate

The analytics are good enough for trends and general insights, but don't rely on exact numbers for critical decisions.