# n8n Workflow: Receipt OCR Analysis

This workflow processes receipt images uploaded from the Axion HR system, extracts key information using OCR, and saves it to the backend database.

## Workflow Overview

1. **Webhook Receives Receipt** - Receives a POST request containing the receipt image (base64) and the user ID
2. **Extract Data** - Extracts the image and user ID from the request
3. **OCR API Call** - Sends the image to the OCR.space API for text extraction
4. **Parse Receipt Data** - Uses regex patterns to extract the following fields, then calculates a confidence score:
   - Amount (total)
   - Date
   - Vendor name
   - Tax amount
5. **Save to Backend** - Saves the extracted data via the backend API
6. **Respond Success** - Returns a success response with the receipt ID, extracted amount, confidence score, and status
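
This README does not say how the confidence score in step 4 is computed. One plausible approach, sketched here as an assumption and not as the workflow's actual logic, is to add up a weight for every field that was successfully extracted. The name `scoreConfidence` and the weights are hypothetical.

```javascript
// Hypothetical weights: how much each extracted field contributes to the score.
const FIELD_WEIGHTS = { amount: 0.4, date: 0.2, vendor: 0.2, tax: 0.2 };

// Returns a score between 0 and 1 based on which fields were extracted.
function scoreConfidence(fields) {
  let score = 0;
  for (const [name, weight] of Object.entries(FIELD_WEIGHTS)) {
    const value = fields[name];
    const present = value !== undefined && value !== null && value !== '';
    if (present) score += weight;
  }
  // Round to two decimals to avoid floating-point noise in the sum
  return Math.round(score * 100) / 100;
}

// Example: amount and date found, vendor and tax missing
console.log(scoreConfidence({ amount: 45.99, date: '2024-01-15', vendor: null, tax: null }));
```

A weighted count like this is cheap to compute inside a Code node, but it only measures how many fields matched, not whether the matched values are correct.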

## Setup Instructions

### 1. Import the Workflow

1. Open your n8n instance
2. Click "Workflows" → "Import from File"
3. Select `receipt-ocr-workflow.json`
4. The workflow will be imported with all nodes configured

### 2. Configure Environment Variables

Set these environment variables in your n8n instance:

```bash
OCR_API_KEY=your_ocr_space_api_key
BACKEND_API_URL=https://your-backend-api.com
BACKEND_API_KEY=your_backend_api_key
```

**Getting an OCR API Key:**
- Sign up at https://ocr.space/ocrapi
- Get a free API key (the free tier allows 25,000 requests/month)
- Or use alternative OCR services (Google Vision, AWS Textract, etc.)
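
For orientation, the request that the "OCR API Call" node sends can be sketched in plain JavaScript. This is a minimal sketch, assuming the OCR.space endpoint accepts form-encoded fields with the key in an `apikey` header and returns recognized text under `ParsedResults[].ParsedText`; the helper names `buildOcrRequest` and `extractParsedText` are hypothetical, so check the exact fields against the OCR.space documentation.

```javascript
// Endpoint of the OCR.space parse API.
const OCR_SPACE_URL = 'https://api.ocr.space/parse/image';

// Builds the fetch options for an OCR request.
// `base64Image` is expected to be a data URL such as "data:image/png;base64,...".
function buildOcrRequest(apiKey, base64Image) {
  const body = new URLSearchParams({ base64Image, language: 'eng' });
  return {
    method: 'POST',
    headers: { apikey: apiKey },
    body,
  };
}

// Pulls the recognized text out of an OCR response, or throws if processing failed.
function extractParsedText(ocrResponse) {
  if (ocrResponse.IsErroredOnProcessing) {
    // ErrorMessage may be a string or an array of strings
    const messages = [].concat(ocrResponse.ErrorMessage || 'unknown error');
    throw new Error(`OCR failed: ${messages.join('; ')}`);
  }
  const results = ocrResponse.ParsedResults || [];
  return results.map(r => r.ParsedText || '').join('\n').trim();
}

// Usage (needs network access and a valid key):
// const res = await fetch(OCR_SPACE_URL, buildOcrRequest(process.env.OCR_API_KEY, dataUrl));
// const text = extractParsedText(await res.json());
```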

### 3. Configure Webhook URL

1. Click on the "Webhook - Receipt Upload" node
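2. Note the production webhook URL (e.g., `https://your-n8n.com/webhook/receipt-upload`). It only responds while the workflow is active; the separate test URL under `/webhook-test/` responds only while n8n is listening for a test event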
3. Update your frontend to POST to this URL

### 4. Update Backend API Endpoint

1. Click on the "Save to Backend" node
2. Update the URL to match your backend API endpoint
3. Ensure your backend expects this data structure:

```json
{
  "userId": "string",
  "amount": "number",
  "date": "string",
  "vendor": "string",
  "tax": "number",
  "confidence": "number",
  "status": "string",
  "extractedText": "string"
}
```
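
A quick way to confirm that a payload matches this structure before it reaches the backend is a small type check. The sketch below assumes every field is required, which the structure above does not actually state; `validateReceiptPayload` is a hypothetical helper, not part of the workflow.

```javascript
// Expected field types, taken from the data structure above.
const RECEIPT_SCHEMA = {
  userId: 'string',
  amount: 'number',
  date: 'string',
  vendor: 'string',
  tax: 'number',
  confidence: 'number',
  status: 'string',
  extractedText: 'string',
};

// Returns a list of human-readable problems; an empty list means the payload matches.
function validateReceiptPayload(payload) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(RECEIPT_SCHEMA)) {
    const value = payload[field];
    if (value === undefined || value === null) {
      problems.push(`${field} is missing`);
    } else if (typeof value !== expectedType) {
      problems.push(`${field} should be a ${expectedType}, got ${typeof value}`);
    } else if (expectedType === 'number' && !Number.isFinite(value)) {
      problems.push(`${field} is not a finite number`);
    }
  }
  return problems;
}
```

If some fields (for example `tax`) may legitimately be empty on your receipts, relax the check for those fields instead of rejecting the payload.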

## Frontend Integration

Update your `Receipts.tsx` component to call the n8n webhook:

```typescript
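const handleFile = async (file: File) => {
  setUploading(true);

  // Read the file as a base64 data URL (e.g. "data:image/png;base64,...")
  const reader = new FileReader();

  // Reset the uploading state if the file cannot be read
  reader.onerror = () => {
    console.error('Failed to read file:', reader.error);
    setUploading(false);
  };

  // onload fires only after a successful read, so reader.result is a string here
  reader.onload = async () => {
    const base64Image = reader.result as string;

    try {
      const response = await fetch('YOUR_N8N_WEBHOOK_URL', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          image: base64Image,
          userId: currentUser.id,
        }),
      });

      // fetch only rejects on network errors, so check the HTTP status explicitly
      if (!response.ok) {
        throw new Error(`Webhook returned HTTP ${response.status}`);
      }

      const result = await response.json();

      if (result.success) {
        // Update UI with extracted data
        setFormData({
          amount: result.amount,
          // ... other fields
        });
      }
    } catch (error) {
      console.error('OCR processing failed:', error);
    } finally {
      setUploading(false);
    }
  };

  reader.readAsDataURL(file);
};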
```

## Workflow Customization

### Using a Different OCR Service

Replace the "OCR API Call" node with your preferred service:

**Google Vision API:**
```javascript
// Use Google Vision API node or HTTP Request:
// POST https://vision.googleapis.com/v1/images:annotate
```

**AWS Textract:**
```javascript
// Use AWS Textract node
```
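
As a rough illustration of what switching to Google Vision involves, the sketch below builds a request body for the `images:annotate` endpoint and reads the text from its response. It assumes the incoming image is a data URL (as produced by `FileReader.readAsDataURL`) and that Vision expects raw base64 without the `data:...;base64,` prefix; the helper names are hypothetical and authentication is left out.

```javascript
// Strips the "data:<mime>;base64," prefix that FileReader.readAsDataURL adds.
function stripDataUrlPrefix(dataUrl) {
  const commaIndex = dataUrl.indexOf(',');
  return dataUrl.startsWith('data:') && commaIndex !== -1
    ? dataUrl.slice(commaIndex + 1)
    : dataUrl;
}

// Builds the JSON body for POST https://vision.googleapis.com/v1/images:annotate
function buildVisionRequestBody(dataUrl) {
  return {
    requests: [
      {
        image: { content: stripDataUrlPrefix(dataUrl) },
        features: [{ type: 'TEXT_DETECTION' }],
      },
    ],
  };
}

// Reads the full recognized text from a Vision API response ('' if none was found).
function extractVisionText(visionResponse) {
  const first = (visionResponse.responses || [])[0] || {};
  return (first.fullTextAnnotation && first.fullTextAnnotation.text) || '';
}
```

Whichever service you choose, the "Parse Receipt Data" node only needs the plain recognized text, so keeping the service-specific response handling in one small function makes later swaps easier.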

### Improving Amount Extraction

Modify the regex in the "Parse Receipt Data" node:

```javascript
// More robust amount regex (single backslashes: this is a regex literal, not a JSON string)
const amountRegex = /(?:total|amount|sum|balance|due|\$|€|£|USD|EUR)\s*:?\s*([\d,]+\.\d{2})/i;
```
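
To try an amount pattern outside n8n, it helps to wrap it in a small function and run it against sample OCR text. The following is a sketch with a hypothetical `extractAmount` helper; the sample strings are made up for illustration.

```javascript
// Keyword or currency symbol, optional colon, then an amount with two decimals.
const amountPattern = /(?:total|amount|sum|balance|due|\$|€|£|USD|EUR)\s*:?\s*([\d,]+\.\d{2})/i;

// Returns the first matched amount as a number, or null when nothing matches.
function extractAmount(text) {
  const match = text.match(amountPattern);
  if (!match) return null;
  // Drop thousands separators before converting: "1,234.56" -> 1234.56
  return parseFloat(match[1].replace(/,/g, ''));
}

console.log(extractAmount('TOTAL: $1,234.56'));
console.log(extractAmount('Amount Due: 12.50'));
console.log(extractAmount('Thank you for your visit'));
```

Note that the pattern returns the first match in the text, and `total` also matches inside `Subtotal`, so a receipt that prints the subtotal before the total yields the subtotal. Anchoring the keyword with `\b` or preferring the last match are two possible refinements.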

### Adding Category Detection

Add a Code node after parsing to detect category:

```javascript
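const categoryKeywords = {
  'Office Supplies': ['office', 'supplies', 'staples', 'paper'],
  'Meals': ['restaurant', 'cafe', 'food', 'dining'],
  'Transportation': ['uber', 'lyft', 'taxi', 'gas', 'fuel'],
};

// Read the parsed receipt from the previous node (assumes it outputs a `vendor` field)
const receipt = $input.first().json;
const vendor = (receipt.vendor || '').toLowerCase();

// Detect category based on vendor name; fall back to 'Other'
let category = 'Other';
for (const [cat, keywords] of Object.entries(categoryKeywords)) {
  if (keywords.some(kw => vendor.includes(kw))) {
    category = cat;
    break;
  }
}

// Pass the receipt on with the detected category attached
return [{ json: { ...receipt, category } }];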
```

## Testing

### Test the Workflow

1. Use n8n's "Execute Workflow" button
2. Or send a test POST request:

```bash
curl -X POST https://your-n8n.com/webhook/receipt-upload \
  -H "Content-Type: application/json" \
  -d '{
    "image": "base64_encoded_image_here",
    "userId": "test-user-123"
  }'
```
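
Pasting a base64 string into the curl command by hand is error-prone, so a short Node.js helper can build the request body from an image file instead. This sketch assumes the webhook expects the same data URL format that the frontend sends; `buildWebhookPayload` and the file name in the usage comment are hypothetical.

```javascript
// Maps common receipt image extensions to MIME types (extend as needed).
const MIME_TYPES = { '.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg' };

// Builds the webhook request body from raw image bytes.
function buildWebhookPayload(imageBytes, extension, userId) {
  const mimeType = MIME_TYPES[extension.toLowerCase()];
  if (!mimeType) throw new Error(`Unsupported image type: ${extension}`);
  const base64 = Buffer.from(imageBytes).toString('base64');
  return { image: `data:${mimeType};base64,${base64}`, userId };
}

// Usage in Node 18+ (needs network access; file name is a placeholder):
// const bytes = require('fs').readFileSync('receipt.png');
// const res = await fetch('https://your-n8n.com/webhook/receipt-upload', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildWebhookPayload(bytes, '.png', 'test-user-123')),
// });
```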

### Expected Response

```json
{
  "success": true,
  "receiptId": "receipt-123",
  "amount": 45.99,
  "confidence": 0.85,
  "status": "needs_review"
}
```

## Troubleshooting

### OCR Not Extracting Amount

- Check that the OCR API key is valid
- Verify image quality (clear, readable text)
- Adjust the regex patterns in the "Parse Receipt Data" node
- Inspect the OCR API response in the node output

### Backend Save Failing

- Verify backend API URL is correct
- Check API authentication headers
- Ensure backend endpoint accepts the data structure
- Check n8n execution logs for errors

### Low Confidence Scores

- Improve image quality before upload
- Adjust regex patterns to match your receipt format
- Add more extraction patterns for different receipt types
- Consider using ML-based extraction for better accuracy
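
Adding extraction patterns usually means trying several regexes in order and taking the first that matches. The sketch below does this for dates; the two formats and the `extractDate` helper are illustrative assumptions, and the second pattern assumes month-first (US-style) dates.

```javascript
// Each entry pairs a pattern with a function that turns its match into YYYY-MM-DD.
const DATE_PATTERNS = [
  // ISO style: 2024-01-15
  {
    regex: /\b(\d{4})-(\d{2})-(\d{2})\b/,
    toIso: m => `${m[1]}-${m[2]}-${m[3]}`,
  },
  // US style: 1/5/2024 or 01/15/2024 (assumes the month comes first)
  {
    regex: /\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/,
    toIso: m => `${m[3]}-${m[1].padStart(2, '0')}-${m[2].padStart(2, '0')}`,
  },
];

// Tries each pattern in order and returns the first normalized date, or null.
function extractDate(text) {
  for (const { regex, toIso } of DATE_PATTERNS) {
    const match = text.match(regex);
    if (match) return toIso(match);
  }
  return null;
}

console.log(extractDate('Date: 1/5/2024 14:02'));
```

Ordering matters: put the most specific or most trusted format first, and add new entries to the list as you encounter receipts that none of the existing patterns match.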

## Alternative OCR Services

If OCR.space doesn't meet your needs:

1. **Google Cloud Vision API** - High accuracy, pay-per-use
2. **AWS Textract** - Good for structured documents
3. **Azure Computer Vision** - Microsoft's OCR service
4. **Tesseract.js** - Open source, runs locally
5. **ABBYY FineReader** - Enterprise-grade OCR

Update the "OCR API Call" node accordingly.