Axion/n8n-workflows/README.md
2025-12-07 12:14:33 -04:00

# n8n Workflow: Receipt OCR Analysis
This workflow processes receipt images uploaded from the Axion HR system, extracts key information using OCR, and saves it to the backend database.
## Workflow Overview
1. **Webhook Receives Receipt** - Receives a POST request with the receipt image (base64) and user ID
2. **Extract Data** - Extracts image and user ID from request
3. **OCR API Call** - Sends image to OCR.space API for text extraction
4. **Parse Receipt Data** - Uses regex patterns to extract the following fields, then calculates a confidence score:
   - Amount (total)
   - Date
   - Vendor name
   - Tax amount
5. **Save to Backend** - Saves extracted data to backend API
6. **Respond Success** - Returns success response with receipt ID and extracted amount
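
The parsing step (step 4) can be sketched roughly as follows. This is a minimal illustration, not the code shipped in the workflow: the function name `parseReceiptText`, the regex patterns, the "first non-empty line is the vendor" rule, and the confidence formula (fraction of fields found) are all assumptions.

```typescript
// Hypothetical result shape; field names follow the backend payload in this README.
interface ParsedReceipt {
  amount: number | null;
  date: string | null;
  vendor: string | null;
  tax: number | null;
  confidence: number;
}

// Convert a captured amount such as "1,234.50" to a number.
function toNumber(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const value = Number(raw.replace(/,/g, ""));
  return isFinite(value) ? value : null;
}

function parseReceiptText(text: string): ParsedReceipt {
  // \btotal\b does not match inside "Subtotal", so the subtotal line is skipped.
  const amountMatch = text.match(/\btotal\b\s*:?\s*[$€£]?\s*([\d,]+\.\d{2})/i);
  const taxMatch = text.match(/\b(?:tax|vat|gst)\b[^\d\n]*([\d,]+\.\d{2})/i);
  const dateMatch = text.match(/\b(\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b/);
  // Assumption: the first non-empty line of the OCR text is the vendor name.
  const lines = text.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0);

  const amount = toNumber(amountMatch?.[1]);
  const tax = toNumber(taxMatch?.[1]);
  const date = dateMatch?.[1] ?? null;
  const vendor = lines[0] ?? null;

  // Assumed confidence formula: fraction of the four fields that were found.
  const found = [amount, tax, date, vendor].filter((v) => v !== null).length;
  return { amount, date, vendor, tax, confidence: found / 4 };
}
```

For a receipt whose text contains a vendor line, a date, a `Tax:` line, and a `Total:` line, all four fields are filled and the confidence is 1; an empty string yields all `null` fields and a confidence of 0.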
## Setup Instructions
### 1. Import the Workflow
1. Open your n8n instance
2. Click "Workflows" → "Import from File"
3. Select `receipt-ocr-workflow.json`
4. The workflow will be imported with all nodes configured
### 2. Configure Environment Variables
Set these environment variables in your n8n instance:
```bash
OCR_API_KEY=your_ocr_space_api_key
BACKEND_API_URL=https://your-backend-api.com
BACKEND_API_KEY=your_backend_api_key
```
**Getting an OCR API Key:**
- Sign up at https://ocr.space/ocrapi
- Get your free API key (the free tier allows 25,000 requests/month)
- Or use alternative OCR services (Google Vision, AWS Textract, etc.)
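
For orientation, the request that an OCR.space call typically makes can be sketched as below. This assumes the public OCR.space endpoint `https://api.ocr.space/parse/image`, the `apikey` header, and the `base64Image` form field; the helper names `buildOcrRequest` and `extractParsedText` are hypothetical, and the imported workflow's own node configuration remains the source of truth.

```typescript
// Hypothetical helper: describes the HTTP request without sending it.
interface OcrRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

function buildOcrRequest(apiKey: string, base64Image: string): OcrRequest {
  // Assumption: base64Image is a data URI such as "data:image/png;base64,...".
  const fields: [string, string][] = [
    ["base64Image", base64Image],
    ["language", "eng"],
  ];
  const body = fields
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  return {
    url: "https://api.ocr.space/parse/image",
    method: "POST",
    headers: { apikey: apiKey, "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// Subset of the OCR.space response fields used in this sketch.
interface OcrResponse {
  IsErroredOnProcessing?: boolean;
  ErrorMessage?: string | string[];
  ParsedResults?: { ParsedText?: string }[];
}

// Join the text of all parsed pages, or throw if the API reported an error.
function extractParsedText(response: OcrResponse): string {
  if (response.IsErroredOnProcessing) {
    const message = Array.isArray(response.ErrorMessage)
      ? response.ErrorMessage.join("; ")
      : response.ErrorMessage;
    throw new Error("OCR failed: " + (message ?? "unknown error"));
  }
  return (response.ParsedResults ?? []).map((r) => r.ParsedText ?? "").join("\n");
}
```

The returned object can be passed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.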
### 3. Configure Webhook URL
1. Click on the "Webhook - Receipt Upload" node
2. Note the webhook URL (e.g., `https://your-n8n.com/webhook/receipt-upload`)
3. Update your frontend to POST to this URL
### 4. Update Backend API Endpoint
1. Click on the "Save to Backend" node
2. Update the URL to match your backend API endpoint
3. Ensure your backend expects this data structure:
```json
{
  "userId": "string",
  "amount": "number",
  "date": "string",
  "vendor": "string",
  "tax": "number",
  "confidence": "number",
  "status": "string",
  "extractedText": "string"
}
```
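If the backend is written in TypeScript, the structure above could be mirrored by a type and a runtime check along these lines. The names `ReceiptPayload` and `isReceiptPayload` are hypothetical, and the check only verifies field types, not value ranges or the allowed `status` values.

```typescript
// Mirrors the JSON structure above; names are illustrative only.
interface ReceiptPayload {
  userId: string;
  amount: number;
  date: string;
  vendor: string;
  tax: number;
  confidence: number;
  status: string;
  extractedText: string;
}

// Runtime type guard for an incoming request body.
function isReceiptPayload(value: unknown): value is ReceiptPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  const stringFields = ["userId", "date", "vendor", "status", "extractedText"];
  const numberFields = ["amount", "tax", "confidence"];
  return (
    stringFields.every((k) => typeof v[k] === "string") &&
    numberFields.every((k) => typeof v[k] === "number" && isFinite(v[k] as number))
  );
}
```

A handler could reject the request with a 400 response whenever `isReceiptPayload(body)` returns `false`.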
## Frontend Integration
Update your `Receipts.tsx` component to call the n8n webhook:
```typescript
const handleFile = async (file: File) => {
  setUploading(true);
  // Convert the file to a base64 data URL
  const reader = new FileReader();
  reader.onloadend = async () => {
    try {
      // reader.result is null if the read failed
      if (typeof reader.result !== 'string') {
        throw new Error('Could not read the selected file');
      }
      const base64Image = reader.result;
      const response = await fetch('YOUR_N8N_WEBHOOK_URL', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          image: base64Image,
          userId: currentUser.id,
        }),
      });
      // fetch() only rejects on network errors, so check the HTTP status too
      if (!response.ok) {
        throw new Error(`Webhook returned HTTP ${response.status}`);
      }
      const result = await response.json();
      if (result.success) {
        // Update UI with extracted data
        setFormData({
          amount: result.amount,
          // ... other fields
        });
      }
    } catch (error) {
      console.error('OCR processing failed:', error);
    } finally {
      setUploading(false);
    }
  };
  reader.readAsDataURL(file);
};
```
## Workflow Customization
### Using a Different OCR Service
Replace the "OCR API Call" node with your preferred service:
**Google Vision API:**
```javascript
// Use the Google Vision API node, or an HTTP Request node that sends:
// POST https://vision.googleapis.com/v1/images:annotate
```
**AWS Textract:**
```javascript
// Use AWS Textract node
```
### Improving Amount Extraction
Modify the regex in the "Parse Receipt Data" node:
```javascript
// More robust amount regex
const amountRegex = /(?:total|amount|sum|balance|due|\$|€|£|USD|EUR)\s*:?\s*([\d,]+\.\d{2})/i;
```
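A short sketch of how such a pattern might be applied. The regex is written as a literal with single backslashes (doubled backslashes are only needed inside JSON or string escapes), and the helper name `extractAmount` is hypothetical.

```typescript
// Amount pattern written as a regex literal.
const amountRegex = /(?:total|amount|sum|balance|due|\$|€|£|USD|EUR)\s*:?\s*([\d,]+\.\d{2})/i;

// Return the first matching amount as a number, or null if nothing matches.
function extractAmount(text: string): number | null {
  const raw = amountRegex.exec(text)?.[1];
  if (raw === undefined) return null;
  // Strip thousands separators before converting: "1,234.50" -> 1234.5
  return Number(raw.replace(/,/g, ""));
}
```

Note that the first match wins, so on a receipt that prints `Subtotal` before `Total` this pattern returns the subtotal; anchoring the keyword with `\b` is one way to avoid that.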
### Adding Category Detection
Add a Code node after parsing to detect the category:
```javascript
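// Keywords that map a vendor name to an expense category
const categoryKeywords = {
  'Office Supplies': ['office', 'supplies', 'staples', 'paper'],
  'Meals': ['restaurant', 'cafe', 'food', 'dining'],
  'Transportation': ['uber', 'lyft', 'taxi', 'gas', 'fuel'],
};
// Assumes the previous node outputs a `vendor` field and that the
// Code node runs in "Run Once for All Items" mode
const item = $input.first().json;
const vendor = (item.vendor || '').toLowerCase();
// Detect category based on vendor name
let category = 'Other';
for (const [cat, keywords] of Object.entries(categoryKeywords)) {
  if (keywords.some(kw => vendor.includes(kw))) {
    category = cat;
    break;
  }
}
return [{ json: { ...item, category } }];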
```
## Testing
### Test the Workflow
1. Use n8n's "Execute Workflow" button
2. Or send a test POST request:
```bash
curl -X POST https://your-n8n.com/webhook/receipt-upload \
  -H "Content-Type: application/json" \
  -d '{
    "image": "base64_encoded_image_here",
    "userId": "test-user-123"
  }'
```
### Expected Response
```json
{
  "success": true,
  "receiptId": "receipt-123",
  "amount": 45.99,
  "confidence": 0.85,
  "status": "needs_review"
}
```
## Troubleshooting
### OCR Not Extracting Amount
- Check OCR API key is valid
- Verify image quality (clear, readable text)
- Adjust regex patterns in "Parse Receipt Data" node
- Check OCR API response in node output
### Backend Save Failing
- Verify backend API URL is correct
- Check API authentication headers
- Ensure backend endpoint accepts the data structure
- Check n8n execution logs for errors
### Low Confidence Scores
- Improve image quality before upload
- Adjust regex patterns to match your receipt format
- Add more extraction patterns for different receipt types
- Consider using ML-based extraction for better accuracy
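
One way to add more extraction patterns is to try them in order of reliability and attach a weight to each. The sketch below illustrates the idea; the pattern list, the weights, and the name `extractAmountWithFallback` are assumptions, not values taken from the workflow.

```typescript
interface AmountPattern {
  name: string;
  regex: RegExp;
  weight: number; // assumed confidence contribution of this pattern
}

// Ordered from most to least specific; the first pattern that matches wins.
const amountPatterns: AmountPattern[] = [
  { name: "labelled total", regex: /\b(?:grand\s+)?total\s*:?\s*[$€£]?\s*([\d,]+\.\d{2})/i, weight: 1.0 },
  { name: "amount due", regex: /\b(?:amount|balance)\s+due\s*:?\s*[$€£]?\s*([\d,]+\.\d{2})/i, weight: 0.8 },
  { name: "currency symbol", regex: /[$€£]\s*([\d,]+\.\d{2})/, weight: 0.4 },
];

interface AmountResult {
  amount: number;
  pattern: string;
  confidence: number;
}

function extractAmountWithFallback(text: string): AmountResult | null {
  for (const p of amountPatterns) {
    const raw = p.regex.exec(text)?.[1];
    if (raw !== undefined) {
      return { amount: Number(raw.replace(/,/g, "")), pattern: p.name, confidence: p.weight };
    }
  }
  return null;
}
```

Recording which pattern matched makes it easier to see, in the n8n execution output, why a given receipt received a low score.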
## Alternative OCR Services
If OCR.space doesn't meet your needs:
1. **Google Cloud Vision API** - High accuracy, pay-per-use
2. **AWS Textract** - Good for structured documents
3. **Azure Computer Vision** - Microsoft's OCR service
4. **Tesseract.js** - Open source, runs locally
5. **ABBYY FineReader** - Enterprise-grade OCR
Update the "OCR API Call" node accordingly.