Amazon Textract

May 2025

Adopt

Amazon Textract is a machine learning service that extracts printed text, handwriting, layout elements, and structured data from scanned documents. It goes beyond plain optical character recognition (OCR) by identifying forms and tables and returning their contents as key-value pairs and table cells. Each extracted element is returned with a confidence score, so downstream code can decide how much to trust it.
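As a rough illustration of how the confidence scores can be used, the sketch below keeps only the lines of text that Textract is reasonably sure about. It assumes a response shaped like the output of the DetectDocumentText operation, where each entry in `Blocks` has a `BlockType`, and text blocks carry `Text` and a `Confidence` between 0 and 100. The helper name `confident_lines`, the 90.0 threshold, and the sample data are illustrative assumptions, not part of the service.

```python
def confident_lines(response, min_confidence=90.0):
    """Return the text of LINE blocks whose confidence meets the threshold.

    `response` is assumed to be a dict shaped like a Textract
    DetectDocumentText response. The threshold is an arbitrary example value.
    """
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


# Hand-written, heavily trimmed sample (not real service output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 1042", "Confidence": 99.1},
        {"BlockType": "LINE", "Id": "l2", "Text": "T0tal due", "Confidence": 61.4},
        {"BlockType": "WORD", "Id": "w1", "Text": "Invoice", "Confidence": 99.3},
    ]
}

print(confident_lines(sample_response))
```

With boto3, a real response could be obtained along the lines of `boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})`. Low-confidence lines are usually better routed to manual review than silently dropped.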

Key Capabilities:

Optical character recognition for printed text and handwriting
Form and table data extraction with structure preservation
Layout analysis for paragraphs, titles, lists, headers, and footers
Query-based extraction using natural language questions
Custom queries for business-specific document types
Signature detection (locating signatures on a page; Textract does not verify them)
Support for invoices, receipts, and identity documents
Multi-language support including English, German, French, Spanish, Italian and Portuguese
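
The query-based extraction listed above can be sketched as follows. This is a minimal example, assuming the AnalyzeDocument operation with the `QUERIES` feature type, where each question comes back as a `QUERY` block linked to a `QUERY_RESULT` block through an `ANSWER` relationship. The helper names `build_query_request` and `query_answers`, the aliases, and the sample response are hypothetical and hand-written for illustration.

```python
def build_query_request(document_bytes, questions):
    """Build keyword arguments for an AnalyzeDocument call with QUERIES.

    `questions` maps a short alias to a natural language question.
    """
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias} for alias, text in questions.items()
            ]
        },
    }


def query_answers(response):
    """Map each query alias to (answer text, confidence), or None if unanswered."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        query = block["Query"]
        alias = query.get("Alias") or query["Text"]
        answers[alias] = None
        for relationship in block.get("Relationships", []):
            if relationship["Type"] != "ANSWER":
                continue
            for answer_id in relationship["Ids"]:
                result = blocks.get(answer_id)
                if result is not None and answers[alias] is None:
                    # Keep the first answer that resolves to a known block.
                    answers[alias] = (result["Text"], result["Confidence"])
    return answers


# Hand-written, heavily trimmed sample (not real service output).
sample_query_response = {
    "Blocks": [
        {
            "BlockType": "QUERY",
            "Id": "q1",
            "Query": {"Text": "What is the invoice total?", "Alias": "total"},
            "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}],
        },
        {"BlockType": "QUERY_RESULT", "Id": "a1", "Text": "$1,250.00", "Confidence": 97.0},
        {
            "BlockType": "QUERY",
            "Id": "q2",
            "Query": {"Text": "What is the due date?", "Alias": "due_date"},
        },
    ]
}

print(query_answers(sample_query_response))
```

In a real project the request would be sent with something like `boto3.client("textract").analyze_document(**build_query_request(data, questions))`, and the result passed to `query_answers`. Keeping request building and response parsing as plain functions lets both be tested without AWS credentials.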

MOHARA has adopted Amazon Textract as a document processing service for extracting structured and unstructured data from documents. It is a particularly good fit for projects that already run on AWS infrastructure, where it integrates directly with other AWS services.