# Analytics Accuracy Guide

## How to Verify Analytics Accuracy

Run the verification script to check for discrepancies:

```bash
php /var/www/verify-analytics.php [YYYY-MM-DD]
```

If no date is provided, it checks today's data.

## Current Accuracy Issues Found

### 1. **Returning Visitor Count Bug** ⚠️

The summary shows incorrect returning visitor counts. The verification script counts unique returning visitors, but the logic that builds the summary appears to be flawed.

**Impact**: Returning visitor numbers are inflated.

### 2. **RSS Click Tracking**

RSS clicks are tracked in two ways:

- Button clicks on the page (tracked via JavaScript)
- Actual RSS feed fetches (tracked via PHP in `feed.php`)

**Impact**: RSS numbers may be double-counted or inconsistent.

### 3. **No Bot Filtering**

Bot traffic (search engines, crawlers) is currently counted as regular visitors.

**Impact**: Numbers may be inflated by 10-30%, depending on site popularity.

### 4. **Ad Blockers**

Users with ad blockers may block the analytics script entirely.

**Impact**: Numbers may be deflated by 5-15%, depending on the user base.

### 5. **Self-Visits**

Your own visits are not filtered out.

**Impact**: Development and testing visits inflate the numbers.

### 6. **Duplicate Pageviews**

When the same visitor loads the same page twice within 5 seconds, the second load is probably a duplicate.

**Impact**: Rapid navigation or page refreshes create duplicate pageviews.

### 7. **New vs Returning Logic**

The check currently looks only within the same day. A visitor who came yesterday and returns today is counted as "new" again.

**Impact**: Returning visitor counts are inaccurate across days.
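Issues 6 and 7 can both be handled while processing raw pageviews. The following is a minimal sketch of one possible approach. It is not the existing `track.php` or verification logic. It assumes each raw pageview is an array with hypothetical keys `visitor_id` (the localStorage ID), `page`, and `ts` (Unix timestamp in seconds). It also assumes first-visit dates live in a hypothetical persistent map `$firstSeen` that maps visitor ID to `YYYY-MM-DD`.

- `dedupePageviews()` drops repeat views of the same page by the same visitor inside a time window.
- `classifyVisitors()` decides new vs returning by comparing against the first-visit date, not just the current day.

```php
<?php
// Sketch only. Hypothetical record shape for a raw pageview:
//   ['visitor_id' => string, 'page' => string, 'ts' => int (Unix seconds)]
// $firstSeen is a hypothetical persistent map: visitor_id => 'YYYY-MM-DD'.

const DUPLICATE_WINDOW_SECONDS = 5;

/**
 * Drop a pageview when the same visitor viewed the same page less than
 * $window seconds after their last counted view of that page.
 */
function dedupePageviews(array $pageviews, int $window = DUPLICATE_WINDOW_SECONDS): array
{
    usort($pageviews, fn(array $a, array $b): int => $a['ts'] <=> $b['ts']);

    $lastCounted = []; // "visitor_id|page" => timestamp of the last counted view
    $kept = [];
    foreach ($pageviews as $pv) {
        $key = $pv['visitor_id'] . '|' . $pv['page'];
        if (isset($lastCounted[$key]) && $pv['ts'] - $lastCounted[$key] < $window) {
            continue; // rapid repeat: treat as a duplicate
        }
        $lastCounted[$key] = $pv['ts'];
        $kept[] = $pv;
    }
    return $kept;
}

/**
 * Count new vs returning visitors for $date ('YYYY-MM-DD') using first-visit
 * dates that persist across days. Updates $firstSeen in place.
 */
function classifyVisitors(array $pageviews, string $date, array &$firstSeen): array
{
    $new = [];
    $returning = [];
    foreach ($pageviews as $pv) {
        $id = $pv['visitor_id'];
        // Record (or move back) the first-visit date; 'YYYY-MM-DD' strings compare chronologically.
        if (!isset($firstSeen[$id]) || $date < $firstSeen[$id]) {
            $firstSeen[$id] = $date;
        }
        if ($firstSeen[$id] < $date) {
            $returning[$id] = true; // first seen on an earlier day
        } else {
            $new[$id] = true;       // first seen on $date itself
        }
    }
    return ['new' => count($new), 'returning' => count($returning)];
}
```

To use it, run `dedupePageviews()` first and pass its result to `classifyVisitors()`. Persist the updated `$firstSeen` map between days. The default window matches the 5 seconds from issue 6; pass a larger `$window` (for example 10) for a longer cooldown. Because the visitor ID comes from localStorage, a visitor who clears storage will still reappear as new.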
## Factors Affecting Accuracy

### ✅ What IS Tracked Accurately:

- Pageview timestamps (the hourly breakdown is recalculated from raw data)
- Share counts (when JavaScript executes)
- Reaction counts (stored separately; very accurate)

### ⚠️ What May Be Inaccurate:

- **Total visits**: May include bots, duplicates, and self-visits
- **New vs Returning**: Only accurate within the same day
- **RSS clicks**: May be double-counted
- **Unique visitors**: Based on localStorage, which can be cleared or blocked

## Recommendations to Improve Accuracy

### 1. **Filter Bot Traffic**

Add bot detection to `track.php`:

```php
// Check the user agent for common bot keywords
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$isBot = preg_match('/bot|crawler|spider|scraper/i', $ua) === 1;

if ($isBot) {
    // Skip tracking (or record the hit with a bot flag instead)
    exit;
}
```

### 2. **Filter Self-Visits**

Add your IP address(es) to a blocklist in `track.php`:

```php
$yourIPs = ['YOUR_IP_HERE', 'ANOTHER_IP'];
$clientIP = $_SERVER['REMOTE_ADDR'] ?? '';

if (in_array($clientIP, $yourIPs, true)) {
    // Skip tracking
    exit;
}
```

If the site sits behind a reverse proxy or CDN, `REMOTE_ADDR` contains the proxy's address rather than the visitor's. In that case, this check must use the forwarded client IP instead.

### 3. **Fix Returning Visitor Logic**

Store visitor history across days, not just within the same day.

### 4. **Deduplicate Rapid Pageviews**

Add a cooldown period. For example, ignore a pageview if the same visitor viewed the same page less than 10 seconds earlier.

### 5. **Separate RSS Tracking**

Distinguish between:

- RSS button clicks (user intent)
- RSS feed fetches (automatic; may come from bots)

## Understanding Your Numbers

### Realistic Accuracy Range

- **Pageviews**: ±15-25% (due to bots, ad blockers, and duplicates)
- **Unique Visitors**: ±20-30% (localStorage can be cleared or blocked)
- **Shares**: ±5% (very accurate; requires JavaScript)
- **Reactions**: ±1% (very accurate; stored server-side)

### What the Numbers Mean

- **Total Visits**: All page loads, including bots and duplicates
- **New Visitors**: First-time visitors today (not lifetime)
- **Returning Visitors**: Visitors who visited earlier today (visits on previous days are not considered)
- **Hourly Breakdown**: Accurate (recalculated from timestamps)

## Best Practices

1. **Run the verification script regularly** to catch discrepancies
2. **Focus on trends** rather than absolute numbers
3. **Compare with server logs** for validation
4. **Filter your own IP** for more accurate numbers
5. **Monitor for anomalies** (sudden spikes may be bots)

## Quick Accuracy Check

```bash
# Check today's data
php /var/www/verify-analytics.php

# Check a specific date
php /var/www/verify-analytics.php 2025-12-28

# Look for:
# - Discrepancies between summary and raw data
# - High bot counts
# - Duplicate pageviews
# - Rapid-fire visits
```

## Expected Accuracy

For a typical personal blog:

- **Pageviews**: 70-85% accurate (after accounting for bots and ad blockers)
- **Unique Visitors**: 60-75% accurate (localStorage limitations)
- **Engagement** (shares/reactions): 95%+ accurate

The analytics are **good enough for trends and general insights**, but don't rely on exact numbers for critical decisions.
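Some of the Quick Accuracy Check items can also be approximated in code. The following is a rough sketch, not the contents of `verify-analytics.php`. It assumes hypothetical raw records with `visitor_id`, `ua`, and `ip` keys, and a hypothetical stored summary with `total_visits` and `unique_visitors` keys. It reuses the bot keyword regex from the Filter Bot Traffic recommendation and the own-IP list idea from Filter Self-Visits. It then reports the following:

- whether the summary disagrees with the raw data
- how many hits look like bots or self-visits
- how many hits remain after filtering

```php
<?php
// Sketch only: the record shape and summary keys are hypothetical.
//   Raw pageview:   ['visitor_id' => string, 'ua' => string, 'ip' => string]
//   Stored summary: ['total_visits' => int, 'unique_visitors' => int]

// Same keyword heuristic as the bot-filter example for track.php.
function isBotUserAgent(string $ua): bool
{
    return preg_match('/bot|crawler|spider|scraper/i', $ua) === 1;
}

/**
 * Compare a stored daily summary against raw pageviews and count the
 * bot hits and self-visits that inflate the totals.
 */
function auditDay(array $pageviews, array $summary, array $selfIps = []): array
{
    $bots = 0;
    $selfVisits = 0;
    $filtered = 0;

    foreach ($pageviews as $pv) {
        if (isBotUserAgent($pv['ua'] ?? '')) {
            $bots++;
        } elseif (in_array($pv['ip'] ?? '', $selfIps, true)) {
            $selfVisits++;
        } else {
            $filtered++; // neither a bot nor one of your own IPs
        }
    }

    $rawTotal = count($pageviews);
    $rawUnique = count(array_unique(array_column($pageviews, 'visitor_id')));

    return [
        'total_mismatch'  => $rawTotal !== ($summary['total_visits'] ?? null),
        'unique_mismatch' => $rawUnique !== ($summary['unique_visitors'] ?? null),
        'bots'            => $bots,
        'self_visits'     => $selfVisits,
        'filtered_total'  => $filtered,
    ];
}
```

Two kinds of result match the warning signs listed under Quick Accuracy Check: either mismatch flag being `true`, or a `bots` count that is large relative to `filtered_total`.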