Automated Redaction Detection: Verify Document Privacy with Vision AI
Ensure your documents are properly sanitized before they leave your secure environment. Automatically detect black bars and whiteouts with Vision AI.
Introduction
In legal discovery, FOIA requests, and financial auditing, "redacting" a document is a critical step. But how do you verify that a 500-page PDF was actually redacted correctly?
Manual review is error-prone. Reviewers skip pages. Software glitches. This guide demonstrates how to use Ninjadoc's Vision AI API to build a "Redaction QA" robot that scans every page for visual proof of sanitization.
Why Detect Redactions?
Before publishing a public record, you need to know: "Did the software actually apply the black bars?" Our API gives you a JSON report of every redacted region found, acting as a final safety check.
The Compliance Challenge
Organizations often use automated tools to apply redactions. But tools fail.
- Silent Failures: Scripts might crash on page 50, leaving the rest of the document exposed.
- Layer Issues: Sometimes a "redaction" is just a transparent box.
- Incomplete Sanitization: A tool can report success while sensitive text is still visible on the page. Visual detection checks what the reader actually sees.
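The silent-failure case can be illustrated with a small check over a detection report. This is a minimal sketch, not part of the Ninjadoc API: it assumes the `redacted_items` shape shown later in this guide, assumes `page` is 1-based, and takes the total page count as a caller-supplied argument. The function name `findPagesWithoutRedactions` is hypothetical.

```javascript
// Sketch: list the pages that have no detected redaction marks at all.
// Assumptions (not confirmed by the API docs): each item carries a 1-based
// `page` number, and the caller already knows the document's page count.
function findPagesWithoutRedactions(redactedItems, totalPages) {
  const pagesWithMarks = new Set(redactedItems.map((item) => item.page));
  const missing = [];
  for (let page = 1; page <= totalPages; page++) {
    if (!pagesWithMarks.has(page)) missing.push(page);
  }
  return missing;
}

// Hypothetical data: a 5-page document where the redaction script
// stopped after page 2.
const sample = [
  { page: 1, bbox: [100, 200, 300, 250], confidence: 0.98 },
  { page: 2, bbox: [80, 120, 260, 150], confidence: 0.95 },
];
console.log(findPagesWithoutRedactions(sample, 5)); // [ 3, 4, 5 ]
```

A page without marks is not necessarily an error, since some pages may contain nothing sensitive. A long run of trailing pages with no marks, however, is a reasonable trigger for manual review.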
The Solution: Vision AI
Unlike text search, Vision AI looks at the document as an image. It identifies the visual characteristics of redaction marks—whether they are standard black bars, whiteout boxes, or pixelated regions.
We offer a dedicated endpoint `POST /api/redaction-detection` that takes a PDF and returns the bounding boxes of every detected redaction.
Implementation Guide
Use the following patterns to integrate automated redaction checking into your workflow.
Redaction Detection Implementation
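// `file` is the PDF to check, for example a File or Blob holding the document.
const formData = new FormData();
formData.append('document_file', file);

const response = await fetch('https://ninjadoc.ai/api/redaction-detection', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.NINJADOC_API_KEY
  },
  body: formData
});

// Fail loudly on HTTP errors instead of parsing an error body as a report.
if (!response.ok) {
  throw new Error(`Redaction detection failed with status ${response.status}`);
}

const result = await response.json();

// Output:
// {
//   "redacted_items": [
//     {
//       "bbox": [100, 200, 300, 250],
//       "page": 1,
//       "confidence": 0.98,
//       "reason": "black_box_overlay"
//     }
//   ],
//   "page_metadata": { ... }
// }
Verifying Sanitization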
The API returns a confidence score for every detected redaction.
- High Confidence (>90%): Confirmed redaction mark.
- Verification: You can cross-reference these coordinates with your expected PII locations to ensure they match.
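The cross-referencing step can be sketched as follows. This is an illustrative example under stated assumptions rather than an official client: it assumes `bbox` is `[x1, y1, x2, y2]`, that your expected PII regions use the same coordinate space as the API response, and that a detection counts only when its confidence is above 0.9. The helper names `covers` and `findUncoveredRegions`, the `label` field, and the 2-unit tolerance are all hypothetical.

```javascript
// Assumption: every bbox is [x1, y1, x2, y2] with x1 < x2 and y1 < y2.
// Returns true when `outer` fully contains `inner`, allowing a small tolerance.
function covers(outer, inner, tolerance = 2) {
  return (
    outer[0] <= inner[0] + tolerance &&
    outer[1] <= inner[1] + tolerance &&
    outer[2] >= inner[2] - tolerance &&
    outer[3] >= inner[3] - tolerance
  );
}

// Returns the expected regions that no high-confidence detection covers.
function findUncoveredRegions(redactedItems, expectedRegions, minConfidence = 0.9) {
  const confident = redactedItems.filter((item) => item.confidence > minConfidence);
  return expectedRegions.filter(
    (region) =>
      !confident.some(
        (item) => item.page === region.page && covers(item.bbox, region.bbox)
      )
  );
}

// Hypothetical detections and expected PII locations.
const detections = [
  { bbox: [100, 200, 300, 250], page: 1, confidence: 0.98, reason: "black_box_overlay" },
  { bbox: [100, 400, 300, 450], page: 1, confidence: 0.55, reason: "black_box_overlay" },
];
const expectedPii = [
  { label: "ssn", page: 1, bbox: [110, 205, 290, 245] },
  { label: "name", page: 1, bbox: [110, 405, 290, 445] },
  { label: "dob", page: 2, bbox: [50, 60, 150, 80] },
];

const uncovered = findUncoveredRegions(detections, expectedPii);
console.log(uncovered.map((r) => r.label)); // [ 'name', 'dob' ]
```

Here the SSN is covered by a high-confidence mark, the name is matched only by a low-confidence detection, and the date of birth has no mark at all, so the last two are reported for review.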
Pricing
Redaction detection consumes credits based on document length. See our pricing page for details.
Frequently Asked Questions
Does this API apply redactions or detect them?
This specific endpoint (`/api/redaction-detection`) is designed to **detect** existing redactions (e.g., black bars, whiteout boxes). It is used for quality assurance and compliance to ensure documents have been properly sanitized before sharing.
Can I redact PII automatically?
Yes. To *apply* redactions, use our `/api/ask` or `/api/ask/batch` endpoints to find the coordinates of sensitive fields (such as SSNs or names), then use a PDF library to draw opaque overlays over those coordinates.
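One detail to watch when drawing overlays is the coordinate system. PDF user space places the origin at the bottom-left of the page by default, while many image-based tools measure from the top-left. The sketch below converts a box to the rectangle a PDF drawing call typically expects. It assumes the coordinates are `[x1, y1, x2, y2]` measured from the top-left corner in the same units as the page height. That is an assumption to verify against the actual API response, and `bboxToPdfRect` is a hypothetical helper name.

```javascript
// Sketch: convert a top-left-origin bbox into a bottom-left-origin rectangle.
// Assumptions: bbox is [x1, y1, x2, y2] and uses the same units as pageHeight.
// `padding` optionally grows the overlay so glyph edges are not left exposed.
function bboxToPdfRect(bbox, pageHeight, padding = 0) {
  const [x1, y1, x2, y2] = bbox;
  return {
    x: x1 - padding,
    y: pageHeight - y2 - padding, // flip the vertical axis
    width: x2 - x1 + 2 * padding,
    height: y2 - y1 + 2 * padding,
  };
}

// Example with a US Letter page height of 792 points.
console.log(bboxToPdfRect([100, 200, 300, 250], 792));
// { x: 100, y: 542, width: 200, height: 50 }
```

The resulting `x`, `y`, `width`, and `height` values can then be passed to whichever rectangle-drawing call your PDF library provides.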
How accurate is the detection?
Our Vision models are trained on thousands of redacted documents to distinguish actual redaction marks from similar visual elements such as dark photos or tables. The API returns a confidence score for every detection.
Build Your Compliance Safety Net
Automatically verify that your sensitive documents are properly redacted before they leave your hands.
- No Credit Card Required
- Visual coordinate overlay for auditability