
AWS Textract

Automatically extract printed text, handwriting and data from any document. Amazon Textract allows the extraction of text and structured data such as tables and forms from documents using the power of artificial intelligence (AI).

Overview

Amazon Textract is a machine learning service that automatically extracts text, handwriting and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies extract data from scanned documents such as PDFs, images, tables and forms by hand, or with simple OCR software that requires manual configuration and often needs reconfiguring when a form changes. To replace these manual and expensive processes, Textract uses machine learning to read and process a wide range of document types, extracting text, handwriting, tables and other data without manual effort.

You can quickly automate document processing and act on the information extracted, whether that means automating loan processing or extracting information from invoices and receipts. Textract can extract the data in minutes rather than hours or days.

Go beyond simple Optical Character Recognition (OCR) by extracting relationships, structure and text from documents.

Many documents, such as medical intake forms or employment applications, contain a mix of handwritten and printed text. Amazon Textract can extract both printed text and handwriting from documents written in English with high confidence scores, whether it is free-form text or text embedded in tables and forms.
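As a rough sketch of how the printed/handwritten distinction can be used downstream: each WORD block in a Textract response carries a TextType of PRINTED or HANDWRITING and a Confidence score from 0 to 100. The sample response below is hand-written for illustration and is not real service output; the helper name split_by_text_type and the threshold of 90 are hypothetical choices.

```python
from collections import defaultdict


def split_by_text_type(response, min_confidence=90.0):
    """Group WORD texts by TextType, keeping only confident detections."""
    groups = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue  # skip PAGE and LINE blocks
        if block.get("Confidence", 0.0) < min_confidence:
            continue  # drop low-confidence words
        groups[block.get("TextType", "UNKNOWN")].append(block["Text"])
    return dict(groups)


# Hand-made sample shaped like a text detection response (illustrative only).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "WORD", "Text": "Name:", "TextType": "PRINTED", "Confidence": 99.8},
        {"BlockType": "WORD", "Text": "Jane", "TextType": "HANDWRITING", "Confidence": 95.2},
        {"BlockType": "WORD", "Text": "D0e", "TextType": "HANDWRITING", "Confidence": 61.0},
    ]
}

words = split_by_text_type(sample_response)
print(words)
```

Words that fall below the threshold are dropped here; in a real workflow they are the natural candidates for human review.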

Improve security and compliance through robust data privacy, encryption, security controls and support for compliance standards such as HIPAA, GDPR and more.

Amazon Textract can extract relevant data such as contact information, items purchased and vendor name from almost any invoice or receipt, without the need for templates or configuration. Invoices and receipts come in many layouts, which makes extracting data manually at scale difficult and time-consuming. Amazon Textract uses ML to understand the context of invoices and receipts and automatically extracts fields such as vendor name, invoice number, item prices, total amount and payment terms.

Easily implement human reviews with Amazon Augmented AI (Amazon A2I) to manage nuanced or sensitive workflows and audit predictions on an ongoing basis.

Accurately extract critical business data for financial services, such as mortgage rates, applicant names and invoice totals, from a variety of financial forms, and process loan and mortgage applications in minutes.

Better serve your patients and insurers by extracting important patient data from health intake forms, insurance claims and pre-authorization forms. Keep data organized and in its original context and eliminate manual review of output.

Easily extract relevant data from government-related forms such as small business loans, federal tax forms or business applications with a high degree of accuracy. The public sector has many opportunities to put AWS ML tools such as Textract to use.

Amazon Textract pricing depends on which of its three APIs you use: Detect Document Text API, Analyze Document API and Analyze Expense API.

Detect Document Text API uses OCR technology to extract text and handwriting from a provided document. In the US West (Oregon) region, you’d pay $0.0015 per page for the first one million pages and $0.0006 per page for every page beyond one million.
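To make the tiering concrete, here is a minimal sketch of the arithmetic using only the two rates quoted above. The function name is hypothetical, and the sketch assumes the tiers apply to a single billing period's page count; check the current AWS pricing page before relying on these figures.

```python
from decimal import Decimal

# Rates quoted in the text for Detect Document Text, US West (Oregon).
FIRST_TIER_PAGES = 1_000_000
FIRST_TIER_RATE = Decimal("0.0015")  # USD per page, first one million pages
OVERFLOW_RATE = Decimal("0.0006")    # USD per page, beyond one million pages


def estimate_detect_text_cost(pages):
    """Return the estimated cost in USD for the given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    first_tier = min(pages, FIRST_TIER_PAGES)
    overflow = max(pages - FIRST_TIER_PAGES, 0)
    return first_tier * FIRST_TIER_RATE + overflow * OVERFLOW_RATE


cost_small = estimate_detect_text_cost(10_000)
cost_large = estimate_detect_text_cost(1_500_000)
print(cost_small, cost_large)
```

At these rates, 10,000 pages come to $15, and 1.5 million pages come to $1,800 ($1,500 for the first million plus $300 for the remaining 500,000).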

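Analyze Document API has two functions, forms and tables, with different pricing levels.
Analyze Document API for forms uses OCR technology to extract text and handwriting from a provided document, and additionally returns the key-value pairs it finds in forms. Analyze Document API for tables likewise returns the table and cell structure it detects.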
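The sketch below shows one way to turn a forms response into plain key-value pairs. It assumes the boto3 SDK and its analyze_document call with FeatureTypes=["FORMS"]; the helper names are hypothetical, and the sample response is hand-written for illustration rather than real service output.

```python
def analyze_forms(image_bytes):
    """Call Textract for form data (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helpers work without it

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes}, FeatureTypes=["FORMS"]
    )


def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Pair each KEY block's text with the text of its linked VALUE block."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        pairs[_text_of(block, blocks_by_id)] = value_text
    return pairs


# Hand-made sample shaped like a forms response (illustrative only).
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Applicant"},
        {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    ]
}

pairs = extract_key_values(sample_response)
print(pairs)
```

With real documents you would pass the result of analyze_forms to extract_key_values instead of the sample dictionary.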

Analyze Expense API extracts data from invoices and receipts. For example, different invoices or receipts may label the same field as Invoice ID, Invoice No. or invoice #, each with an associated value of 12345. Amazon Textract recognizes all of these labels as the invoice ID with the corresponding value 12345, which gives you a standard taxonomy of common fields.
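A short sketch of how that normalization looks in a response: each summary field carries a normalized Type alongside the label as printed on the document and the detected value. The sample below is hand-written for illustration, not real service output, and summarize_expense is a hypothetical helper. With boto3, a real response would come from the Textract client's analyze_expense call.

```python
def summarize_expense(response):
    """Map each normalized field type to a list of (printed label, value)."""
    fields = {}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text", "OTHER")
            label = field.get("LabelDetection", {}).get("Text", "")
            value = field.get("ValueDetection", {}).get("Text", "")
            fields.setdefault(field_type, []).append((label, value))
    return fields


# Two hand-made documents that label the same field differently (illustrative only).
sample_response = {
    "ExpenseDocuments": [
        {"SummaryFields": [
            {"Type": {"Text": "INVOICE_RECEIPT_ID"},
             "LabelDetection": {"Text": "Invoice No."},
             "ValueDetection": {"Text": "12345"}},
        ]},
        {"SummaryFields": [
            {"Type": {"Text": "INVOICE_RECEIPT_ID"},
             "LabelDetection": {"Text": "invoice #"},
             "ValueDetection": {"Text": "12345"}},
        ]},
    ]
}

summary = summarize_expense(sample_response)
print(summary)
```

Because downstream code keys on the normalized type rather than the printed label, it does not need a template per vendor layout.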

We recommend visiting the Amazon Textract pricing calculator for an accurate estimate of your cloud costs.
