# Analytics Accuracy Guide
## How to Verify Analytics Accuracy

Run the verification script to check for discrepancies:

```bash
php /var/www/verify-analytics.php [YYYY-MM-DD]
```

If no date is provided, the script checks today's data.

## Current Accuracy Issues Found

### 1. **Returning Visitor Count Bug** ⚠️

The summary shows incorrect returning visitor counts. The verification script counts unique returning visitors, but the logic that builds the summary appears to be flawed, so the two numbers disagree.

**Impact**: Returning visitor numbers are inflated.

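If the summary counts every pageview flagged as returning instead of each returning visitor once, the total grows with every extra page a returning visitor opens. That cause is an assumption, not a confirmed diagnosis. The sketch below shows the distinct count, assuming raw pageviews are arrays with hypothetical `visitor_id` and `is_returning` keys:

```php
<?php
/**
 * Count distinct returning visitors (not returning pageviews).
 * Assumes each pageview is an array with hypothetical keys
 * 'visitor_id' (string) and 'is_returning' (bool).
 */
function countReturningVisitors(array $pageviews): int
{
    $seen = [];
    foreach ($pageviews as $pv) {
        if ($pv['is_returning']) {
            // Keyed by visitor ID, so repeat pageviews collapse into one entry
            $seen[$pv['visitor_id']] = true;
        }
    }
    return count($seen);
}

$pageviews = [
    ['visitor_id' => 'a', 'is_returning' => true],
    ['visitor_id' => 'a', 'is_returning' => true],
    ['visitor_id' => 'b', 'is_returning' => false],
    ['visitor_id' => 'c', 'is_returning' => true],
];
echo countReturningVisitors($pageviews), PHP_EOL; // 2 (counting pageviews would give 3)
```

Comparing this distinct count with the stored summary for the same day is one way to confirm whether the summary is counting pageviews instead of visitors.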
### 2. **RSS Click Tracking**

RSS clicks are tracked in two ways:

- Button clicks on the page (tracked via JavaScript)
- Actual RSS feed fetches (tracked via PHP in `feed.php`)

**Impact**: A button click that leads to a feed fetch can be recorded twice, while automated feed fetches are recorded with no click at all, so RSS numbers may be double-counted or inconsistent.

### 3. **No Bot Filtering**

Bot traffic (search engines, crawlers) is currently counted as regular visitors.

**Impact**: Numbers may be inflated by 10-30%, depending on site popularity.

### 4. **Ad Blockers**

Users with ad blockers may block the analytics script entirely.

**Impact**: Numbers may be deflated by 5-15%, depending on the user base.

### 5. **Self-Visits**

Your own visits are not filtered out.

**Impact**: Development and testing visits inflate the numbers.

### 6. **Duplicate Pageviews**

Two pageviews from the same visitor for the same page within 5 seconds are likely duplicates, but both are currently counted.

**Impact**: Rapid navigation or page refreshes create duplicate pageviews.

### 7. **New vs Returning Logic**

The new-vs-returning check currently looks only at visits from the same day. A visitor who came yesterday and returns today is counted as "new" again.

**Impact**: Returning visitor counts are inaccurate across days.

## Factors Affecting Accuracy

### ✅ What IS Tracked Accurately:

- Pageview timestamps (the hourly breakdown is recalculated from raw data)
- Share counts (when JavaScript executes)
- Reaction counts (stored separately, very accurate)

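Recalculating the hourly breakdown from raw data amounts to bucketing each pageview timestamp by hour. A minimal sketch, assuming raw pageviews are available as Unix timestamps; the function name is hypothetical, and hours are taken in UTC for simplicity:

```php
<?php
/**
 * Rebuild a 24-bucket hourly breakdown from raw Unix timestamps.
 * Hours are computed in UTC here (an assumption for the sketch).
 */
function hourlyBreakdown(array $timestamps): array
{
    $hours = array_fill(0, 24, 0);
    foreach ($timestamps as $ts) {
        // 'G' = hour of the day without a leading zero (0-23)
        $hour = (int) gmdate('G', $ts);
        $hours[$hour]++;
    }
    return $hours;
}

$counts = hourlyBreakdown([0, 1800, 3600, 86399]);
echo $counts[0], ' ', $counts[1], ' ', $counts[23], PHP_EOL; // 2 1 1
```

Because the buckets are rebuilt from timestamps every time, a wrong stored summary cannot leak into the hourly view. To bucket in the site's local time instead, set the zone with `date_default_timezone_set()` and use `date()` in place of `gmdate()`.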
### ⚠️ What May Be Inaccurate:

- **Total visits**: May include bots, duplicates, and self-visits
- **New vs Returning**: Only accurate within the same day
- **RSS clicks**: May be double-counted
- **Unique visitors**: Rely on localStorage, which can be cleared or blocked

## Recommendations to Improve Accuracy

### 1. **Filter Bot Traffic**

Add bot detection in `track.php`:

```php
// Check the user agent for common bot keywords
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$isBot = preg_match('/bot|crawler|spider|scraper/i', $ua) === 1;

if ($isBot) {
    // Skip tracking (alternatively, store the hit with a bot flag)
    http_response_code(204);
    exit;
}
```

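### 2. **Filter Self-Visits**

Add your IP address(es) to a blocklist in `track.php`:

```php
// Replace with the IP addresses you browse the site from
$yourIPs = ['YOUR_IP_HERE', 'ANOTHER_IP'];
// Behind a reverse proxy or CDN, REMOTE_ADDR holds the proxy's address instead
$clientIP = $_SERVER['REMOTE_ADDR'] ?? '';

if (in_array($clientIP, $yourIPs, true)) {
    // Skip tracking your own visits
    http_response_code(204);
    exit;
}
```
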
### 3. **Fix Returning Visitor Logic**

Store visitor history across days, not just within the same day.

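A minimal sketch of cross-day classification, assuming each visitor has a stable ID and that the date of their first visit is persisted somewhere. The function name is hypothetical, and an in-memory array stands in for the real store:

```php
<?php
/**
 * Classify a visit as 'new' or 'returning' using history across days.
 * $firstSeen maps visitor ID => 'YYYY-MM-DD' of the first recorded visit;
 * in practice it would be loaded from and saved to persistent storage.
 */
function classifyVisitor(array &$firstSeen, string $visitorId, string $today): string
{
    if (!isset($firstSeen[$visitorId])) {
        // Never seen before: remember the first-visit date and count as new
        $firstSeen[$visitorId] = $today;
        return 'new';
    }
    // Seen on an earlier day (or earlier today): returning
    return 'returning';
}

$firstSeen = [];
echo classifyVisitor($firstSeen, 'v1', '2025-12-27'), PHP_EOL; // new
echo classifyVisitor($firstSeen, 'v1', '2025-12-28'), PHP_EOL; // returning (came yesterday)
echo classifyVisitor($firstSeen, 'v2', '2025-12-28'), PHP_EOL; // new
```

Since the visitor ID comes from localStorage, a visitor who clears it will still be classified as new; the history only fixes the same-day limitation.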
### 4. **Deduplicate Rapid Pageviews**

Add a cooldown period: for example, ignore a pageview when the same visitor viewed the same page less than 10 seconds earlier.

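The cooldown rule can be applied to raw pageviews after the fact. A minimal sketch, assuming pageviews are arrays with hypothetical `visitor_id`, `page`, and `ts` (Unix seconds) keys; the function name is also hypothetical:

```php
<?php
/**
 * Drop pageviews that repeat the same visitor + page within a cooldown.
 * Each pageview is an array with hypothetical keys
 * 'visitor_id', 'page', and 'ts' (Unix timestamp in seconds).
 */
function dedupePageviews(array $pageviews, int $cooldown = 10): array
{
    // Process in time order so "previous view" is well defined
    usort($pageviews, fn(array $a, array $b): int => $a['ts'] <=> $b['ts']);

    $lastKept = []; // "visitor|page" => timestamp of the last counted view
    $kept = [];
    foreach ($pageviews as $pv) {
        $key = $pv['visitor_id'] . '|' . $pv['page'];
        if (isset($lastKept[$key]) && $pv['ts'] - $lastKept[$key] < $cooldown) {
            continue; // within the cooldown: treat as a duplicate
        }
        $lastKept[$key] = $pv['ts'];
        $kept[] = $pv;
    }
    return $kept;
}

$views = [
    ['visitor_id' => 'a', 'page' => '/post',  'ts' => 100],
    ['visitor_id' => 'a', 'page' => '/post',  'ts' => 104], // refresh after 4 s: dropped
    ['visitor_id' => 'a', 'page' => '/post',  'ts' => 115], // 15 s after the counted view: kept
    ['visitor_id' => 'a', 'page' => '/about', 'ts' => 105], // different page: kept
];
echo count(dedupePageviews($views)), PHP_EOL; // 3
```

The cooldown is measured from the last *counted* view, so a visitor refreshing every few seconds is counted roughly once per window. Measuring from the last *seen* view instead would collapse a continuous burst of refreshes into a single pageview.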
### 5. **Separate RSS Tracking**

Distinguish between:

- RSS button clicks (user intent)
- RSS feed fetches (automatic, may be bots)

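One way to keep the two signals apart is to log them as different event types and tally them separately instead of adding them together. A minimal sketch, assuming hypothetical event arrays: `['type' => 'rss_click']` from the JavaScript button handler and `['type' => 'rss_fetch', 'ua' => '...']` logged by `feed.php`. The bot check reuses the keyword pattern from the bot-filtering recommendation above:

```php
<?php
/**
 * Tally RSS events by kind instead of summing them into one number.
 * Event shapes are hypothetical: 'rss_click' (button, via JavaScript)
 * and 'rss_fetch' with a user agent (logged by feed.php).
 */
function tallyRssEvents(array $events): array
{
    $tally = ['button_clicks' => 0, 'feed_fetches' => 0, 'bot_fetches' => 0];
    foreach ($events as $event) {
        if ($event['type'] === 'rss_click') {
            $tally['button_clicks']++; // user intent
        } elseif ($event['type'] === 'rss_fetch') {
            $ua = $event['ua'] ?? '';
            // Same keyword check as the bot filter recommendation
            if (preg_match('/bot|crawler|spider|scraper/i', $ua) === 1) {
                $tally['bot_fetches']++;
            } else {
                $tally['feed_fetches']++; // feed readers and browsers
            }
        }
    }
    return $tally;
}

$events = [
    ['type' => 'rss_click'],
    ['type' => 'rss_fetch', 'ua' => 'Mozilla/5.0'],
    ['type' => 'rss_fetch', 'ua' => 'Googlebot/2.1'],
];
print_r(tallyRssEvents($events));
```

Many feed readers poll on a schedule without identifying as bots, so `feed_fetches` still mixes automated and human traffic; only `button_clicks` reflects a deliberate action on the page.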
## Understanding Your Numbers

### Realistic Accuracy Range

- **Pageviews**: ±15-25% (due to bots, ad blockers, duplicates)
- **Unique Visitors**: ±20-30% (localStorage can be cleared/blocked)
- **Shares**: ±5% (very accurate, requires JavaScript)
- **Reactions**: ±1% (very accurate, stored server-side)

### What the Numbers Mean

- **Total Visits**: All page loads, including bots and duplicates
- **New Visitors**: First-time visitors today (not lifetime)
- **Returning Visitors**: Visitors who already visited earlier today (visits on previous days are not taken into account)
- **Hourly Breakdown**: Accurate (recalculated from timestamps)

## Best Practices

1. **Run the verification script regularly** to catch discrepancies
2. **Focus on trends** rather than absolute numbers
3. **Compare with server logs** for validation
4. **Filter your own IP** for more accurate numbers
5. **Monitor for anomalies** (sudden spikes may be bots)

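For anomaly monitoring, one simple heuristic (an assumption, not something the current tooling does) is to flag any hour whose pageview count is several times the day's median hour. A minimal sketch over 24 hourly counts like those in the hourly breakdown; the function name and the factor of 3 are arbitrary choices:

```php
<?php
/**
 * Flag hours whose count exceeds $factor times the median hourly count.
 * Returns the flagged hour indexes (0-23).
 */
function findSpikes(array $hourlyCounts, float $factor = 3.0): array
{
    $sorted = array_values($hourlyCounts);
    sort($sorted);
    $n = count($sorted);
    if ($n === 0) {
        return [];
    }
    $mid = intdiv($n, 2);
    $median = $n % 2 === 1 ? $sorted[$mid] : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
    // On quiet days the median can be 0; use at least 1 as the baseline
    $threshold = max($median, 1) * $factor;

    $spikes = [];
    foreach ($hourlyCounts as $hour => $count) {
        if ($count > $threshold) {
            $spikes[] = $hour;
        }
    }
    return $spikes;
}

$counts = array_fill(0, 24, 4);
$counts[3] = 40; // e.g. a crawler hitting the site at 03:00
echo implode(',', findSpikes($counts)), PHP_EOL; // 3
```

The median is used instead of the mean so that the spike itself does not raise the baseline it is compared against.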
## Quick Accuracy Check

```bash
# Check today's data
php /var/www/verify-analytics.php

# Check a specific date
php /var/www/verify-analytics.php 2025-12-28

# Look for:
# - Discrepancies between summary and raw data
# - High bot counts
# - Duplicate pageviews
# - Rapid-fire visits
```

## Expected Accuracy

For a typical personal blog:

- **Pageviews**: 70-85% accurate (after accounting for bots/ad blockers)
- **Unique Visitors**: 60-75% accurate (localStorage limitations)
- **Engagement** (shares/reactions): 95%+ accurate

The analytics are **good enough for trends and general insights**, but don't rely on exact numbers for critical decisions.