Form Recognizer 2022-08-31
Form Recognizer extracts information from forms and images into structured data. It includes the following options:
- Read - Extract text from documents.
- Layout - Extract text and layout information from documents.
- Document - Extract text, layout, entities, and general key-value pairs from documents.
- Business Card - Extract key information from business cards.
- ID Document - Extract key information from passports and ID cards.
- Invoice - Extract key information from invoices.
- Receipt - Extract key information from receipts.
- US W2 Tax - Extract key information from IRS US W2 tax forms (year 2018-2021).
- Vaccination Card - Extract key information from US Covid-19 CDC vaccination cards.
- Health Insurance Card - Extract key information from US health insurance cards.
- Custom - Extract information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. Form Recognizer learns the structure of your forms to extract text and data. It ingests text from forms, applies machine learning to identify keys, tables, and fields, and then outputs structured data that includes the relationships within the original file.
Analyze - Analyze document
Analyze document with prebuilt or custom models.
Supported Prebuilt Models
Model ID | Description |
---|---|
prebuilt-read | Extract text from documents. |
prebuilt-layout | Extract text and layout information from documents. |
prebuilt-document | Extract text, layout, entities, and general key-value pairs from documents. |
prebuilt-businessCard | Extract key information from business cards. |
prebuilt-idDocument | Extract key information from passports and ID cards. |
prebuilt-invoice | Extract key information from invoices. |
prebuilt-receipt | Extract key information from receipts. |
prebuilt-healthInsuranceCard.us | Extract key information from US health insurance cards. |
prebuilt-vaccinationCard | Extract key information from US Covid-19 CDC vaccination cards. |
prebuilt-tax.us.w2 | Extract key information from IRS US W2 tax forms (year 2018-2021). |
Analysis Features
Model ID | Content Extraction | Paragraphs | Selection Marks | Tables | Key-Value Pairs | Languages | Document Analysis |
---|---|---|---|---|---|---|---|
prebuilt-read | ✓ | ✓ | ✓ | ||||
prebuilt-layout | ✓ | ✓ | ✓ | ✓ | |||
prebuilt-document | ✓ | ✓ | ✓ | ✓ | ✓ | ||
prebuilt-businessCard | ✓ | ✓ | ✓ | ||||
prebuilt-idDocument | ✓ | ✓ | ✓ | ||||
prebuilt-invoice | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
prebuilt-receipt | ✓ | ✓ | ✓ | ||||
prebuilt-healthInsuranceCard.us | ✓ | ✓ | |||||
prebuilt-vaccinationCard | ✓ | ✓ | ✓ | ✓ | |||
prebuilt-tax.us.w2 | ✓ | ✓ | ✓ | ||||
{ customModelName } | ✓ | ✓ | ✓ | ✓ | ✓ |
Select the testing console in the region where you created your resource:
Australia East, Brazil South, Canada Central, Central India, Central US, Central US EUAP, East Asia, East US, East US 2, France Central, Germany West Central, Japan East, Japan West, Korea Central, North Central US, North Europe, South Africa North, South Central US, Southeast Asia, Switzerland North, Switzerland West, UAE North, UK South, West Central US, West Europe, West US, West US 2, West US 3, Norway East, Jio India West
Request URL
https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&pages={string}&locale={string}&stringIndexType=textElements
Request parameters
modelId - Unique model name. Format - [a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}.
pages - List of 1-based page numbers to analyze. Ex. "1-3,5,7-9"
locale - Locale hint for text recognition and document analysis. Value may contain only the language code (ex. "en", "fr") or BCP 47 language tag (ex. "en-US").
stringIndexType - Method used to compute string offset and length. The code samples on this page pass "textElements".
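The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service or any SDK: the helper names `expand_pages` and `build_analyze_url` are hypothetical, and the endpoint host is the one used in the code samples on this page. It validates the model name against the documented pattern, checks the page list, and assembles the query string.

```python
import re
from urllib.parse import urlencode

# Pattern copied from the modelId description above.
MODEL_ID_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}")


def expand_pages(pages):
    """Expand a pages string such as "1-3,5,7-9" into a sorted list of page numbers."""
    numbers = set()
    for part in pages.split(","):
        first, _, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if last else start
        if start < 1 or end < start:
            raise ValueError("invalid page range: %r" % part)
        numbers.update(range(start, end + 1))
    return sorted(numbers)


def build_analyze_url(endpoint, model_id, pages=None, locale=None, string_index_type=None):
    """Assemble the analyze URL (hypothetical helper, not an official API)."""
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError("model_id does not match the documented format")
    query = {"api-version": "2022-08-31"}
    if pages is not None:
        expand_pages(pages)  # raises ValueError if the page list is malformed
        query["pages"] = pages
    if locale is not None:
        query["locale"] = locale
    if string_index_type is not None:
        query["stringIndexType"] = string_index_type
    return "%s/formrecognizer/documentModels/%s:analyze?%s" % (
        endpoint.rstrip("/"), model_id, urlencode(query))


if __name__ == "__main__":
    print(expand_pages("1-3,5,7-9"))
    print(build_analyze_url("https://japanwest.api.cognitive.microsoft.com",
                            "prebuilt-read", pages="1-3,5", locale="en-US"))
```

Only the parameters that are actually supplied end up in the query string, so the same helper covers requests that omit pages or locale.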
Request headers
Request body
Analyze request parameters.
{
"urlSource": "string"
}
{
"description": "Document analysis parameters.",
"type": "object",
"properties": {
"urlSource": {
"description": "Content at specified URL.",
"type": "string",
"format": "uri"
},
"base64Source": {
"description": "Content represented via Base64 encoding.",
"type": "string",
"format": "byte"
}
},
"example": "{\r\n \"urlSource\": \"string\"\r\n}"
}
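The schema above lists two ways to supply the document: a URL (`urlSource`) or Base64-encoded content (`base64Source`). The sketch below shows one way to build either body. The function name `make_analyze_body` is hypothetical, and the rule that exactly one of the two properties is sent is an assumption, since the schema itself does not state it.

```python
import base64
import json


def make_analyze_body(url_source=None, content=None):
    """Return a JSON request body for the analyze call (hypothetical helper).

    url_source: publicly reachable document URL, sent as "urlSource".
    content:    raw document bytes, sent Base64-encoded as "base64Source".
    Assumption: exactly one of the two is provided per request.
    """
    if (url_source is None) == (content is None):
        raise ValueError("provide exactly one of url_source or content")
    if url_source is not None:
        return json.dumps({"urlSource": url_source})
    encoded = base64.b64encode(content).decode("ascii")
    return json.dumps({"base64Source": encoded})


if __name__ == "__main__":
    print(make_analyze_body(url_source="https://example.com/sample.pdf"))
    print(make_analyze_body(content=b"hello"))
```

The resulting string can be used in place of the `{body}` placeholder in the code samples, with `Content-Type: application/json`.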
Response 202
Request is queued successfully.
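A 202 response means the analysis runs asynchronously, so the result has to be fetched later. This page does not describe how. The sketch below assumes the service returns an `Operation-Location` response header whose URL path ends in `/analyzeResults/{resultId}`. Both the header and the path shape are assumptions to verify against the full API reference, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def result_id_from_operation_location(operation_location):
    """Extract the result ID from an Operation-Location URL (assumed format).

    Assumed shape:
    .../formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=...
    """
    parts = urlparse(operation_location).path.rstrip("/").split("/")
    if len(parts) < 2 or parts[-2] != "analyzeResults" or not parts[-1]:
        raise ValueError("unexpected Operation-Location format")
    return parts[-1]


if __name__ == "__main__":
    header = ("https://japanwest.api.cognitive.microsoft.com/formrecognizer/"
              "documentModels/prebuilt-read/analyzeResults/abc123?api-version=2022-08-31")
    print(result_id_from_operation_location(header))
```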
Response 400
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
InvalidRequest | Invalid request. |
InvalidArgument | Invalid argument. |
Top Error Code | Inner Error Code | Message |
---|---|---|
InvalidArgument | InvalidContentSourceFormat | Invalid content source: {details} |
InvalidArgument | InvalidParameter | The parameter {parameterName} is invalid: {details} |
InvalidArgument | InvalidParameterLength | Parameter {parameterName} length must not exceed {maxChars} characters. |
InvalidArgument | InvalidSasToken | The shared access signature (SAS) is invalid: {details} |
InvalidArgument | ParameterMissing | The parameter {parameterName} is required. |
InvalidRequest | InvalidContent | The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats. |
InvalidRequest | InvalidContentDimensions | The input image dimensions are out of range. Refer to documentation for supported image dimensions. |
InvalidRequest | InvalidContentLength | The input image is too large. Refer to documentation for the maximum file size. |
InvalidRequest | NotSupportedApiVersion | The requested operation requires {minimumApiVersion} or later. |
{
"error": {
"code": "InvalidRequest",
"message": "Invalid request.",
"innererror": {
"code": "InvalidContent",
"message": "The file format is unsupported or corrupted. Refer to documentation for the list of supported formats."
}
}
}
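Error responses nest a more specific `innererror` inside the top-level `error` object, as in the example above. A minimal sketch of reading both levels follows. The function name `describe_error` and the shape of its return value are illustrative choices, not part of the API.

```python
import json


def describe_error(response_text):
    """Summarize an error response body as top-level code, inner code, and message."""
    payload = json.loads(response_text)
    error = payload.get("error") or {}
    inner = error.get("innererror") or {}
    return {
        "code": error.get("code"),
        "inner_code": inner.get("code"),
        # Prefer the more specific inner message when one is present.
        "message": inner.get("message") or error.get("message"),
    }


if __name__ == "__main__":
    sample = """
    {
      "error": {
        "code": "InvalidRequest",
        "message": "Invalid request.",
        "innererror": {
          "code": "InvalidContent",
          "message": "The file format is unsupported or corrupted. Refer to documentation for the list of supported formats."
        }
      }
    }
    """
    print(describe_error(sample))
```

Branching on the inner code (for example `InvalidContent` versus `InvalidContentLength`) gives callers a more actionable signal than the top-level code alone.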
Response 403
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
Forbidden | Access forbidden due to policy or other configuration. |
Top Error Code | Inner Error Code | Message |
---|---|---|
Forbidden | AuthorizationFailed | Authorization failed: {details} |
Forbidden | InvalidDataProtectionKey | Data protection key is invalid: {details} |
Forbidden | OutboundAccessForbidden | The request contains a domain name that is not allowed by the current access control policy. |
{
"error": {
"code": "Forbidden",
"message": "Access forbidden due to policy or other configuration.",
"innererror": {
"code": "OutboundAccessForbidden",
"message": "The request contains a domain name that is not allowed by the current access control policy."
}
}
}
Response 415
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
UnsupportedMediaType | Request content type is not supported. |
Top Error Code | Inner Error Code | Message |
---|---|---|
UnsupportedMediaType | UnsupportedMediaType | Unsupported media type. |
{
"error": {
"code": "UnsupportedMediaType",
"message": "Request content type is not supported.",
"innererror": {
"code": "UnsupportedMediaType",
"message": "Unsupported media type."
}
}
}
Response 500
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
InternalServerError | An unexpected error occurred. |
Top Error Code | Inner Error Code | Message |
---|---|---|
InternalServerError | Unknown | Unknown error. |
{
"error": {
"code": "InternalServerError",
"message": "An unexpected error occurred.",
"innererror": {
"code": "Unknown",
"message": "Unknown error."
}
}
}
Response 503
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
ServiceUnavailable | A transient error has occurred. Please try again. |
Top Error Code | Inner Error Code | Message |
---|---|---|
ServiceUnavailable | ServiceUnavailable | A transient error has occurred. Please try again. |
{
"error": {
"code": "ServiceUnavailable",
"message": "A transient error has occurred. Please try again.",
"innererror": {
"code": "ServiceUnavailable",
"message": "A transient error has occurred. Please try again."
}
}
}
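Because a 503 is described as transient, a client can reasonably retry it. The sketch below is one possible approach, not a documented recommendation: it retries only status 503 with exponential backoff. The function name, the number of attempts, and the delay values are all assumptions chosen for illustration, and `send` stands for any zero-argument callable that performs the request and returns a `(status_code, body)` tuple.

```python
import time


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503 or attempts run out.

    send:  zero-argument callable returning (status_code, body).
    sleep: injectable so the waiting can be stubbed out in tests.
    Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503 or attempt == max_attempts:
            return status, body
        sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    # Simulated responses: two transient failures followed by success.
    responses = iter([(503, "busy"), (503, "busy"), (202, "")])
    print(send_with_retry(lambda: next(responses), sleep=lambda seconds: None))
```

Other statuses, including 500, are returned to the caller unchanged, since this page marks only the 503 as transient.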
Code samples
@ECHO OFF
curl -v -X POST "https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&pages={string}&locale={string}&stringIndexType=textElements" ^
-H "Content-Type: application/json" ^
-H "Ocp-Apim-Subscription-Key: {subscription key}" ^
--data-ascii "{body}"
using System;
using System.Net.Http.Headers;
using System.Text;
using System.Net.Http;
using System.Web;
namespace CSHttpClientSample
{
static class Program
{
static void Main()
{
MakeRequest();
Console.WriteLine("Hit ENTER to exit...");
Console.ReadLine();
}
static async void MakeRequest()
{
var client = new HttpClient();
var queryString = HttpUtility.ParseQueryString(string.Empty);
// Request headers
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "{subscription key}");
// Request parameters
queryString["pages"] = "{string}";
queryString["locale"] = "{string}";
queryString["stringIndexType"] = "textElements";
var uri = "https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&" + queryString;
HttpResponseMessage response;
// Request body
byte[] byteData = Encoding.UTF8.GetBytes("{body}");
using (var content = new ByteArrayContent(byteData))
{
content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
response = await client.PostAsync(uri, content);
}
}
}
}
// This sample uses the Apache HTTP client from HTTP Components (http://hc.apache.org/httpcomponents-client-ga/)
import java.net.URI;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
public class JavaSample
{
public static void main(String[] args)
{
HttpClient httpclient = HttpClients.createDefault();
try
{
URIBuilder builder = new URIBuilder("https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31");
builder.setParameter("pages", "{string}");
builder.setParameter("locale", "{string}");
builder.setParameter("stringIndexType", "textElements");
URI uri = builder.build();
HttpPost request = new HttpPost(uri);
request.setHeader("Content-Type", "application/json");
request.setHeader("Ocp-Apim-Subscription-Key", "{subscription key}");
// Request body
StringEntity reqEntity = new StringEntity("{body}");
request.setEntity(reqEntity);
HttpResponse response = httpclient.execute(request);
HttpEntity entity = response.getEntity();
if (entity != null)
{
System.out.println(EntityUtils.toString(entity));
}
}
catch (Exception e)
{
System.out.println(e.getMessage());
}
}
}
<!DOCTYPE html>
<html>
<head>
<title>JSSample</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"></script>
</head>
<body>
<script type="text/javascript">
$(function() {
var params = {
// Request parameters
"pages": "{string}",
"locale": "{string}",
"stringIndexType": "textElements",
};
$.ajax({
url: "https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&" + $.param(params),
beforeSend: function(xhrObj){
// Request headers
xhrObj.setRequestHeader("Content-Type","application/json");
xhrObj.setRequestHeader("Ocp-Apim-Subscription-Key","{subscription key}");
},
type: "POST",
// Request body
data: "{body}",
})
.done(function(data) {
alert("success");
})
.fail(function() {
alert("error");
});
});
</script>
</body>
</html>
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[])
{
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
NSString* path = @"https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31";
NSArray* array = @[
// Request parameters
@"pages={string}",
@"locale={string}",
@"stringIndexType=textElements",
];
NSString* string = [array componentsJoinedByString:@"&"];
// The path already ends with "?api-version=...", so join further parameters with "&".
path = [path stringByAppendingFormat:@"&%@", string];
NSLog(@"%@", path);
NSMutableURLRequest* _request = [NSMutableURLRequest requestWithURL:[NSURL URLWithString:path]];
[_request setHTTPMethod:@"POST"];
// Request headers
[_request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];
[_request setValue:@"{subscription key}" forHTTPHeaderField:@"Ocp-Apim-Subscription-Key"];
// Request body
[_request setHTTPBody:[@"{body}" dataUsingEncoding:NSUTF8StringEncoding]];
NSURLResponse *response = nil;
NSError *error = nil;
NSData* _connectionData = [NSURLConnection sendSynchronousRequest:_request returningResponse:&response error:&error];
if (nil != error)
{
NSLog(@"Error: %@", error);
}
else
{
NSError* error = nil;
NSMutableDictionary* json = nil;
NSString* dataString = [[NSString alloc] initWithData:_connectionData encoding:NSUTF8StringEncoding];
NSLog(@"%@", dataString);
if (nil != _connectionData)
{
json = [NSJSONSerialization JSONObjectWithData:_connectionData options:NSJSONReadingMutableContainers error:&error];
}
if (error || !json)
{
NSLog(@"Could not parse loaded json with error:%@", error);
}
NSLog(@"%@", json);
_connectionData = nil;
}
[pool drain];
return 0;
}
<?php
// This sample uses the HTTP_Request2 package from PEAR (http://pear.php.net/package/HTTP_Request2)
require_once 'HTTP/Request2.php';
$request = new Http_Request2('https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31');
$url = $request->getUrl();
$headers = array(
// Request headers
'Content-Type' => 'application/json',
'Ocp-Apim-Subscription-Key' => '{subscription key}',
);
$request->setHeader($headers);
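$parameters = array(
// Request parameters
// setQueryVariables() replaces the whole query string, so api-version is repeated here.
'api-version' => '2022-08-31',
'pages' => '{string}',
'locale' => '{string}',
'stringIndexType' => 'textElements',
);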
$url->setQueryVariables($parameters);
$request->setMethod(HTTP_Request2::METHOD_POST);
// Request body
$request->setBody("{body}");
try
{
$response = $request->send();
echo $response->getBody();
}
catch (HTTP_Request2_Exception $ex)
{
echo $ex;
}
?>
########### Python 2.7 #############
import httplib, urllib, base64
headers = {
# Request headers
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': '{subscription key}',
}
params = urllib.urlencode({
# Request parameters
'pages': '{string}',
'locale': '{string}',
'stringIndexType': 'textElements',
})
try:
    conn = httplib.HTTPSConnection('japanwest.api.cognitive.microsoft.com')
    conn.request("POST", "/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&%s" % params, "{body}", headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    # Not every exception carries errno/strerror, so print the exception itself.
    print("Request failed: {0}".format(e))
####################################
########### Python 3.2 #############
import http.client, urllib.request, urllib.parse, urllib.error, base64
headers = {
# Request headers
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': '{subscription key}',
}
params = urllib.parse.urlencode({
# Request parameters
'pages': '{string}',
'locale': '{string}',
'stringIndexType': 'textElements',
})
try:
    conn = http.client.HTTPSConnection('japanwest.api.cognitive.microsoft.com')
    conn.request("POST", "/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31&%s" % params, "{body}", headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    # Not every exception carries errno/strerror, so print the exception itself.
    print("Request failed: {0}".format(e))
####################################
require 'net/http'
uri = URI('https://japanwest.api.cognitive.microsoft.com/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31')
uri.query = URI.encode_www_form({
# Request parameters
'pages' => '{string}',
'locale' => '{string}',
'stringIndexType' => 'textElements'
})
request = Net::HTTP::Post.new(uri.request_uri)
# Request headers
request['Content-Type'] = 'application/json'
request['Ocp-Apim-Subscription-Key'] = '{subscription key}'
# Request body
request.body = "{body}"
response = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
http.request(request)
end
puts response.body