AU2021412659A1 - Architecture for digitalizing documents using multi-model deep learning, and document image processing program - Google Patents


Info

Publication number
AU2021412659A1
Authority
AU
Australia
Prior art keywords
character string
layout
document image
unit
document
Prior art date
Legal status
Pending
Application number
AU2021412659A
Other versions
AU2021412659A9 (en)
Inventor
Hossain Shariar Sheikh
Current Assignee
Deloitte Touche Tohmatsu LLC
Original Assignee
Deloitte Touche Tohmatsu LLC
Priority date
Filing date
Publication date
Application filed by Deloitte Touche Tohmatsu LLC filed Critical Deloitte Touche Tohmatsu LLC
Publication of AU2021412659A1
Publication of AU2021412659A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/192 Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194 References adjustable by an adaptive method, e.g. learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Abstract

The purpose of the present invention is to convert a character string contained in a document image to text data by a method different from conventional optical character recognition. In the present invention, an electronic document generation device is provided with: a document image acquisition unit that acquires a document image obtained by imaging a document; a character string recognition unit that recognizes a character string contained in the document image acquired by the document image acquisition unit using a character string learning model that has learned correspondence between document images and character strings contained in the document images and outputs text data of the recognized character string; and an output unit that outputs the text data as text for an electronic medium.

Description

ARCHITECTURE FOR DIGITALIZING DOCUMENTS USING MULTI-MODEL DEEP LEARNING AND DOCUMENT IMAGE PROCESSING PROGRAM

BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present invention relates to an electronic document generation device, an electronic document generation method, and an electronic document generation program
and particularly to an electronic document generation device, an electronic document
generation method, and an electronic document generation program for scanning paper
documents and generating electronic documents.
2. Description of Related Art
[0002] Digital information technology has advanced and paperless systems have spread, but storage or delivery of information using paper documents is still widely used.
In companies having a large amount of paper documents, for example, there is demand for
technology capable of efficiently converting paper documents to digital documents.
[0003] OCR text recognition technology in the related art has a problem in that the recognition efficiency of character recognition is poor because recognition is performed character by character (for example, see Patent Document 1).
[Citation List]
[Patent Document]
[0004]
[Patent Document 1]
Japanese Unexamined Patent Application, First Publication No. 2010-244372
SUMMARY OF THE INVENTION
[Technical Problem]
[0005] Therefore, an electronic document generation device, an electronic document generation method, and an electronic document generation program according to the present disclosure are provided for converting a character string contained in a document image to text data using a method different from optical character recognition in the related art.
[Solution to Problem]
[0006] That is, an electronic document generation device according to a first aspect includes: a document image acquiring unit configured to acquire a document image obtained
by imaging a document; a character string recognizing unit configured to recognize a
character string contained in the document image acquired by the document image acquiring
unit using a character string learning model having learned correspondence between
document images and character strings contained in the document images and to generate
text data of the character string; and an output unit configured to output the text data as text
of an electronic medium.
[0007] A second aspect provides the electronic document generation device
according to the first aspect, further including a layout recognizing unit configured to identify
a range of each of a plurality of elements contained in the document image acquired by the
document image acquiring unit in the document image using a layout learning model having
learned correspondence between a plurality of elements contained in document images and
identification information of the plurality of elements, to recognize a type of each of the
plurality of elements, and to acquire position information of each of the plurality of elements
in the document image associated with the range, wherein the character string recognizing unit
recognizes a character string contained in the range recognized by the layout recognizing
unit using the character string learning model and generates text data of the character string,
and the output unit outputs the text data associated with the plurality of elements in the
position information of the range associated with the plurality of elements as text of an
electronic medium.
[0008] A third aspect provides the electronic document generation device
according to the second aspect, wherein the type of each element is one of a character string,
a table, an image, a seal, and handwriting.
[0009] A fourth aspect provides the electronic document generation device according to the third aspect, further including an extraction unit configured to extract each of cells in a table included in an element of which the type recognized by the layout recognizing unit corresponds to a table and to acquire position information of each cell in the document image, wherein the character string recognizing unit recognizes a character string included in each cell extracted by the extraction unit using the character string learning model and generates text data of the character string.
[0010] A fifth aspect provides the electronic document generation device according to any one of the second to fourth aspects, wherein annotations associated with the types corresponding to the elements are given to the elements in the document image including the plurality of elements, the electronic document generation device further includes a layout learning data generating unit configured to accumulate a plurality of the document images to which the annotations are given to generate layout learning data, and the layout learning data is used for supervised learning of the layout learning model.
[0011] A sixth aspect provides the electronic document generation device according to the fifth aspect, wherein position information of ranges associated with the plurality of elements included in the document image in the document image along with the annotations is given to the document image.
[0012] A seventh aspect provides the electronic document generation device according to the fifth or sixth aspect, further including a layout learning data correcting unit configured to correct at least one of the types of the plurality of elements recognized by the layout recognizing unit and the position information of the ranges of the plurality of elements in the document image on the basis of an input and to update the layout learning data by adding the corrected data.
[0013] An eighth aspect provides the electronic document generation device according to the seventh aspect, further including a layout learning unit configured to perform re-learning of the layout learning model using the layout learning data updated by the layout learning data correcting unit.
[0014] A ninth aspect provides the electronic document generation device according to any one of the second to eighth aspects, further including a character string learning data generating unit configured to generate character string learning data which is used for supervised learning of the character string learning model.
[0015] A tenth aspect provides the electronic document generation device
according to the ninth aspect, further including a character string learning data correcting
unit configured to correct text data generated by the character string recognizing unit on the
basis of an input and to update the character string learning data by adding the corrected text
data.
[0016] An eleventh aspect provides the electronic document generation device according to the tenth aspect, further including a character string learning unit configured to
perform re-learning of the character string learning model using the character string learning
data updated by the character string learning data correcting unit.
[0017] A twelfth aspect provides the electronic document generation device according to any one of the second to eleventh aspects, wherein the character string
recognizing unit includes a plurality of the character string learning models and uses the
character string learning models adapted to languages of the character strings included in the
plurality of elements.
[0018] A thirteenth aspect provides the electronic document generation device according to any one of the second to twelfth aspects, further including a preprocessing unit
configured to perform preprocessing on the document image acquired by the document
image acquiring unit, wherein the preprocessing unit includes a background eliminating unit,
a tilt correcting unit, and a shape adjusting unit, the background eliminating unit eliminates
a background of the document image acquired by the document image acquiring unit, the tilt
correcting unit corrects a tilt of the document image acquired by the document image
acquiring unit, and the shape adjusting unit adjusts a shape and a size of the document image
as a whole acquired by the document image acquiring unit.
[0019] A fourteenth aspect provides the electronic document generation device
according to any one of the second to thirteenth aspects, wherein the layout learning model
is one of a layout learning model for a contract, a layout learning model for a bill, a layout learning model for a memorandum, a layout learning model for a delivery note, and a layout learning model for a receipt.
[0020] An electronic document generation method according to a fifteenth aspect is performed by a computer used for an electronic document generation device and includes:
a document image acquiring step of acquiring a document image obtained by imaging a
document; a character string recognizing step of recognizing a character string contained in
the document image acquired in the document image acquiring step using a character string
learning model having learned correspondence between document images and character
strings contained in the document images and generating text data of the character string;
and an output step of outputting the text data as text of an electronic medium.
[0021] An electronic document generation program according to a sixteenth aspect causes a computer used for an electronic document generation device to perform: a document
image acquiring function of acquiring a document image obtained by imaging a document;
a character string recognizing function of recognizing a character string contained in the
document image acquired in the document image acquiring function using a character string
learning model having learned correspondence between document images and character
strings contained in the document images and generating text data of the character string;
and an output function of outputting the text data as text of an electronic medium.
[Advantageous Effects of Invention]
[0022] Since the electronic document generation device according to the present
disclosure includes the document image acquiring unit configured to acquire a document
image obtained by imaging a document, the character string recognizing unit configured to
recognize a character string contained in the document image acquired by the document
image acquiring unit using a character string learning model having learned correspondence
between document images and character strings contained in the document images and to
generate text data of the character string, and the output unit configured to output the text
data as text of an electronic medium and recognizes a character string contained in a
document image using a model subjected to machine learning, it is possible to improve
recognition efficiency of character recognition when the document image is converted to text data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] Features, advantages, and technical and industrial significance of exemplary embodiments of the invention will be described below with reference to the accompanying drawings, in which like numerals denote like elements, and wherein:
FIG. 1 is a diagram schematically illustrating an electronic document generation system including an electronic document generation device according to an embodiment.
FIG. 2 is a block diagram illustrating a physical configuration of the electronic document generation device.
FIG. 3 is a diagram schematically illustrating processes which are performed by the electronic document generation device.
FIG. 4 is a block diagram illustrating a functional configuration of the electronic document generation device.
FIG. 5 is a diagram illustrating input data and output data of the electronic document generation device.
FIG. 6 is a diagram illustrating background removal which is performed in preprocessing.
FIG. 7 is a diagram illustrating tilt correction which is performed in preprocessing.
FIG. 8 is a diagram illustrating shape adjustment which is performed in preprocessing.
FIG. 9 is a diagram illustrating a correction process for missing resolution which is performed in a layout recognizing process.
FIG. 10 is a diagram illustrating a correction process for overlap resolution which is performed in the layout recognizing process.
FIG. 11 is a diagram illustrating recognition of a layout which is performed in the layout recognizing process.
FIG. 12 is a diagram illustrating recognition of a table which is performed in the layout recognizing process.
FIG. 13 is a diagram illustrating extraction of a cell image.
FIG. 14 is a diagram illustrating a character string in a cell image.
FIG. 15 is a diagram illustrating arrangement of text data which is performed in a
character string recognizing process.
FIG. 16 is a diagram illustrating elimination of noise which is performed in a character
string recognizing process.
FIG. 17 is a diagram illustrating an example of layout learning data with annotations.
FIG. 18 is a diagram illustrating an example of layout learning data with annotations.
FIG. 19 is a diagram illustrating an example of layout learning data with annotations.
FIG. 20 is a diagram illustrating an example of layout learning data with annotations.
FIG. 21 is a diagram illustrating an example of layout learning data with annotations.
FIG. 22 is a diagram illustrating an example of layout learning data with annotations.
FIG. 23 is a diagram illustrating an example of layout learning data with annotations.
FIG. 24 is a diagram illustrating an example of character string learning data with
annotations.
FIG. 25 is a flowchart of an electronic document generation program.
FIG. 26 is a flowchart (1/3) of an embodiment of the electronic document generation
program.
FIG. 27 is a flowchart (2/3) of the embodiment of the electronic document generation
program.
FIG. 28 is a flowchart (3/3) of the embodiment of the electronic document generation
program.
DETAILED DESCRIPTION OF EMBODIMENTS
[0024] An electronic document generation device 10 according to an embodiment
of the present disclosure will be described below with reference to FIGS. 1 to 24. In this
embodiment, it is assumed that the electronic document generation device 10 is connected
to an information communication network 11 such as the Internet or a local area network
(LAN) for use. An electronic document generation system 100 including the electronic
document generation device 10 will be schematically described below with reference to FIG.
1. FIG. 1 is a diagram schematically illustrating the electronic document generation system 100 including the electronic document generation device 10.
[0025] The electronic document generation system 100 includes an electronic document generation device 10, a user terminal 12, a character string learning model 13, a layout learning model 14, and a document image database 15. The electronic document generation device 10, the user terminal 12, the character string learning model 13, the layout learning model 14, and the document image database 15 are connected to an information communication network 11 and are able to perform information communication with each other.
[0026] The electronic document generation system 100 recognizes character strings contained in a document image and generates text data using the electronic document generation device 10. The electronic document generation device 10 recognizes a layout of a document image using a layout learning model and recognizes character strings contained in the document image using a character string learning model.
[0027] The electronic document generation device 10 is, for example, a type of computer such as a PC and is an information processing device. The electronic document generation device 10 includes an arithmetic processing device and a microcomputer included in various computers and includes instruments and devices for embodying functions according to the present disclosure using applications.
[0028] The character string learning model 13 is a learning model for recognizing an image of a character string contained in a document image and is used for character recognition in the electronic document generation device 10. The character string learning model 13 is not particularly limited in a storage place thereof as long as it can be used via the information communication network 11 by the electronic document generation device 10 and is stored, for example, in an information processing device such as a PC, a server device, or a database. For the purpose of convenient explanation of this embodiment, the character string learning model 13 is assumed to mean an information processing device in which the character string learning model 13 is stored.
[0029] The character string learning model 13 may be constituted by an existing learning model or may be independently constructed as a learning model appropriate for use in the electronic document generation device 10. The character string learning model 13 includes learning models appropriate for various languages such as Japanese, English, and Chinese and is illustrated as a first character string learning model, a second character string learning model, and a third character string learning model in FIG. 1.
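The per-language model selection described above can be sketched as a small registry. This is a minimal illustration, not the patented implementation: the language codes, model names, and the fallback to the Japanese model are all assumptions made for the example.

```python
# Hypothetical registry mapping a language code to its character string
# learning model, mirroring the first/second/third models of FIG. 1.
STRING_MODELS = {
    "ja": "first_character_string_model",   # Japanese
    "en": "second_character_string_model",  # English
    "zh": "third_character_string_model",   # Chinese
}

def select_string_model(language: str, registry=STRING_MODELS) -> str:
    """Pick the model adapted to the element's language; falling back
    to the Japanese model for unlisted languages is an assumption
    made for this sketch."""
    return registry.get(language, registry["ja"])
```

In a real deployment the registry values would be loaded model objects (local or reached over the information communication network 11) rather than strings.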
[0030] The character string learning model 13 is not limited to connection to the information communication network 11, and may be included in the electronic document generation device 10 and used under the direct control of the electronic document generation device 10. The character string learning model 13 may be distributed and stored in a plurality of information processing devices connected to the information communication network 11.
[0031] The layout learning model 14 is a learning model that learns correspondence between a plurality of elements contained in document images and identification information of the plurality of elements on the basis of layout learning data which will be described later and is used for layout recognition in the electronic document generation device 10. Similarly to the character string learning model 13, the layout learning model 14 is not particularly limited in a storage place thereof as long as it can be used via the information communication network 11 by the electronic document generation device 10 and is stored, for example, in an information processing device connected to the information communication network 11. For the purpose of convenient explanation of this embodiment, the layout learning model 14 is assumed to mean an information processing device in which the layout learning model 14 is stored.
[0032] The layout learning model 14 includes a layout learning model for a contract, a layout learning model for a bill, a layout learning model for a memorandum, a layout learning model for a delivery note, and a layout learning model for a receipt.
[0033] The layout learning model for a contract is a learning model for recognizing a layout of a document image of a contract and performs learning using the layout learning data for a contract. The layout learning model for a contract learns at what position in the contract what information is located and particularly learns layouts specific to a contract such as itemization, no tables, and a handwriting signature box. The layout learning data for a contract is generated on the basis of document images of at least 3 or 4 contract sheets for each form of 200 types of contract forms with annotations which will be described later.
[0034] The layout learning model for a bill is a learning model for recognizing a
layout of a document image of a bill and performs learning using the layout learning data for
a bill. The layout learning model for a bill learns at what position in the bill what
information is located and particularly learns layouts specific to a bill, such as a broad range
of tables and many wordings such as alphanumeric characters mixed into Japanese text. The layout
learning data for a bill is generated on the basis of document images of at least 3 or 4 bill
sheets for each form of 200 types of bill forms with annotations which will be described later.
[0035] The layout learning model for a memorandum is a learning model for
recognizing a layout of a document image of a memorandum and performs learning using
the layout learning data for a memorandum. The layout learning model for a memorandum
learns at what position in the memorandum what information is located and particularly
learns layouts specific to a memorandum such as no tables and a handwriting signature box.
The layout learning data for a memorandum is generated on the basis of document images
of at least 3 or 4 memorandum sheets for each form of 200 types of memorandum forms with
annotations which will be described later.
[0036] The layout learning model for a delivery note is a learning model for
recognizing a layout of a document image of a delivery note and performs learning using the
layout learning data for a delivery note. The layout learning model for a delivery note
learns at what position in the delivery note what information is located and particularly learns
layouts specific to a delivery note such as a broad range of tables and writing of many product
names and product numbers. The layout learning data for a delivery note is generated on
the basis of document images of at least 3 or 4 delivery note sheets for each form of 200
types of delivery note forms with annotations which will be described later.
[0037] The layout learning model for a receipt is a learning model for recognizing
a layout of a document image of a receipt and performs learning using the layout learning
data for a receipt. The layout learning model for a receipt learns at what position in the receipt what information is located and learns layouts specific to a receipt such as a handwriting box for an amount of money and many tables in which amounts of money are written. The layout learning data for a receipt is generated on the basis of document images of at least 3 or 4 receipt sheets for each form of 200 types of receipt forms with annotations which will be described later.
[0038] The layout learning model 14 is not limited to use in the electronic document generation device 10 via the information communication network 11, and may be included
in the electronic document generation device 10. The layout learning model 14 may be
distributed and stored in a plurality of information processing devices connected to the
information communication network 11.
[0039] The document image database 15 is a database in which document images are accumulated. The electronic document generation device 10 acquires document images
stored in the document image database 15 and generates character string learning data used
for learning of a character string learning model and layout learning data used for learning
of a layout learning model.
[0040] The user terminal 12 is used to operate the electronic document generation
device 10. When erroneously recognized characters are present in an electronic document
generated by the electronic document generation device 10 or when a layout of an electronic
document generated by the electronic document generation device 10 is erroneously
recognized, the electronic document is corrected on the basis of a correction input from a
user of the user terminal 12, and the electronic document generation device 10 receives the
correction and re-trains at least one of the character string learning model 13 and the layout
learning model 14.
[0041] A physical configuration of the electronic document generation device
10 will be described below with reference to FIG. 2. FIG. 2 is a block diagram illustrating
the physical configuration of the electronic document generation device 10. The
electronic document generation device 10 includes an input/output interface 20, a
communication interface 21, a read only memory (ROM) 22, a random access memory
(RAM) 23, a storage unit 24, a central processing unit (CPU) 25, and a graphics processing unit (GPU) 28.
[0042] The input/output interface 20 performs transmission and reception of data to and from an external device outside of the electronic document generation device 10.
The external device includes an input device 26 and an output device 27 that input and output
data, for example, to and from the electronic document generation device 10. The input
device 26 includes a keyboard, a mouse, and a scanner, and the output device 27 includes a
monitor, a printer, and a speaker.
[0043] The communication interface 21 has a function of inputting and outputting data, for example, to and from the electronic document generation device 10 in
communication with the outside via the information communication network 11.
The storage unit 24 can be used as a storage device, and various applications required
for operation of the electronic document generation device 10 and various types of data used
by the applications are recorded thereon. The GPU 28 is used along with the CPU 25 and
is particularly suitable for the repeated arithmetic operations that are frequently performed
in machine learning.
[0044] The electronic document generation device 10 stores an electronic
document generation program which will be described later in the ROM 22 or the storage
unit 24 and loads the electronic document generation program to a main memory including
the RAM 23. The CPU 25 accesses the main memory to which the electronic document
generation program has been loaded and executes the electronic document generation
program.
[0045] Processes which are performed by the electronic document generation
device 10 will be schematically described below with reference to FIG. 3. FIG. 3 is a
diagram schematically illustrating processes which are performed by the electronic
document generation device 10.
The electronic document generation device 10 performs Processes I to III which will
be described below in that order.
[0046] In Process I, preprocessing 55 including "background elimination," "tilt
correction," and "shape adjustment" of a document image is performed.
The preprocessing 55 is preprocessing performed on an image including a character string to facilitate character recognition using a learning model, and it improves the recognition accuracy of the recognition processes performed in Processes II and III.
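The three steps of Process I can be sketched as a small pipeline. This is a minimal NumPy illustration of the idea only: the white threshold and target canvas size are hypothetical, and the tilt correction is a placeholder (a real deskew would estimate the skew angle, e.g. via a Hough transform, and rotate by it).

```python
import numpy as np

def eliminate_background(img, white_thresh=200):
    """Push near-white pixels to pure white so only ink remains
    (the threshold value is a hypothetical choice)."""
    out = img.copy()
    out[out >= white_thresh] = 255
    return out

def correct_tilt(img, angle_deg):
    """Placeholder deskew: only exact multiples of 90 degrees are
    handled here, for illustration; a real implementation would
    estimate and undo an arbitrary skew angle."""
    return np.rot90(img, round(angle_deg / 90) % 4)

def adjust_shape(img, target=(1024, 768)):
    """Normalize every page to one canvas size by padding with white
    (and cropping overflow); the target size is a hypothetical choice."""
    h, w = target
    canvas = np.full((h, w), 255, dtype=img.dtype)
    ch, cw = min(h, img.shape[0]), min(w, img.shape[1])
    canvas[:ch, :cw] = img[:ch, :cw]
    return canvas

def preprocess(img, tilt_deg=0):
    """Process I: background elimination, tilt correction, shape adjustment."""
    return adjust_shape(correct_tilt(eliminate_background(img), tilt_deg))
```

Feeding every document image through such a pipeline gives the later layout and character recognition steps a uniform, clean input.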
[0047] In Process II, a layout recognizing process 56 is performed. In the layout recognizing process 56, "layout recognition" of a document image is first performed. The layout recognizing process 56 is a process of recognizing at what position in an input image what information is located.
[0048] Information mentioned herein means things such as a character string, a table, an image, a seal, and handwriting. The electronic document generation device 10 recognizes a layout of a document image, performs "recognition of a table" when a table is contained in the document image, and performs "extraction of an image of a cell" on cells contained in the table.
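The output of the layout recognizing process, and the cell extraction that follows it for tables, can be sketched with a simple data structure. The `LayoutElement` shape and the `detect_cells` callback are hypothetical stand-ins for the layout model's output and the extraction unit; only the element type names come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in page pixels

@dataclass
class LayoutElement:
    """One recognized element: its type (character_string, table,
    image, seal, or handwriting) and the position of its range."""
    type: str
    bbox: Box

def extract_table_cells(elements: List[LayoutElement],
                        detect_cells: Callable[[Box], List[Box]]) -> List[Box]:
    """For each element recognized as a table, run a cell detector
    (a hypothetical stand-in for the extraction unit) and collect the
    position of every cell in the document image."""
    cells: List[Box] = []
    for el in elements:
        if el.type == "table":
            cells.extend(detect_cells(el.bbox))
    return cells
```

Each extracted cell box is then cropped from the page image and handed to the character string recognizing process.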
[0049] In Process III, a character string recognizing process 57 is performed. The character string recognizing process 57 is a process of converting an image containing a character string to text data using the character string learning model 13 having learned correspondence between images and character strings contained in the images. The character string recognizing process 57 may include processes such as "arrangement of text data" and "removal of noise."
[0050] In the character string recognizing process 57, an image of a character string is converted to text data, and "arrangement of text data" and "removal of noise" are performed. "Arrangement of text data" means that, when a space is contained in an extracted image of a character string, the space is also recognized along with the character string and thus text data is arranged along with the space.
[0051] "Removal of noise" means that, when noise is contained in an extracted image of a character string, noise is not recognized by the electronic document generation device 10 and thus is passively removed from the text data. Noise mentioned herein means a pixel not constituting a character in the extracted image of a character string.
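Process III, together with the passive noise removal just described, can be sketched as follows. The element dicts, `crop`, and `recognize` are hypothetical stand-ins (for the layout output, image slicing, and the character string learning model respectively); the key idea shown is that the model maps a whole string image, spaces included, to text, so a noise-only crop simply yields nothing.

```python
def digitize_page(elements, crop, recognize):
    """Assemble output text with positions. `elements` is a list of
    {"type": str, "bbox": (x, y, w, h)} dicts from the layout step;
    `recognize` stands in for the learned model that converts a string
    image directly to text (not character by character)."""
    page = []
    for el in elements:
        if el["type"] == "character_string":
            text = recognize(crop(el["bbox"]))
            if text:  # a noise-only crop recognizes to "" and is dropped
                page.append({"bbox": el["bbox"], "text": text})
    return page
```

The output unit would then place each text entry at its recorded position to produce the electronic document.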
[0052] A functional configuration of the electronic document generation device 10
will be described below with reference to FIG. 4. FIG. 4 is a block diagram illustrating the
functional configuration of the electronic document generation device 10. The electronic
document generation device 10 includes a document image acquiring unit 31, a
preprocessing unit 32, a background eliminating unit 32a, a tilt correcting unit 32b, a shape
adjusting unit 32c, a layout recognizing unit 33, an extraction unit 34, a character string
recognizing unit 35, an output unit 36, a layout learning data generating unit 40, a layout
learning data correcting unit 41, a layout learning unit 42, a character string learning data
generating unit 43, a character string learning data correcting unit 44, and a character string
learning unit 45 by causing the CPU 25 to execute an electronic document generation
program which will be described later.
[0053] The document image acquiring unit 31 (see FIG. 4) acquires a document
image by imaging a document.
The document image acquiring unit 31 may acquire a document image from the
document image database 15. Alternatively, the document image acquiring unit 31 may
acquire a document image from the scanner of the input device 26.
[0054] A document image acquired by the document image acquiring unit 31 and
an electronic document output from the electronic document generation device 10 will be
described below with reference to FIG. 5.
FIG. 5 is a diagram illustrating input data and output data of the electronic document
generation device 10, where FIG. 5(a) illustrates a document image acquired as input data
by the document image acquiring unit 31. Noise such as a stapler mark 50, handwriting 51,
a seal 52, and an image 53 is present in the document image.
[0055] This noise either obstructs understanding of the details of the document by a person or by an information processing device such as a PC, or is simply unnecessary. Other examples of noise include a punched hole for filing and a fold remaining in a paper sheet. The fold may be recognized as a line and needs to be removed such that it is not reflected in the electronic document.
[0056] The electronic document generation device 10 converts a character string in the document image to text data while maintaining the layout of the acquired document image and outputs an electronic document (see FIG. 5(b)). The electronic document generation device 10 removes noise by performing active processing on the stapler mark 50, handwriting 51, seal 52, and image 53 recognized as noise, and removes pixels in the document image that are recognized as neither a character string nor noise through passive processing, that is, by not leaving those pixels in the electronic document.
[0057] A table in the document image illustrated in FIG. 5(b) is output as object data in the electronic document along with text data while maintaining the arrangement of
the document image. The electronic document generation device 10 can arbitrarily select
elements contained in the output electronic document.
[0058] For example, a stapler mark 50, handwriting 51, a seal 52, and an image 53 are removed in normal use, but the seal 52 and the image 53 may be included as image data
in the electronic document and then output.
[0059] The preprocessing unit 32 (see FIG. 4) performs the preprocessing 55 on the document image acquired by the document image acquiring unit 31.
The preprocessing 55 is performed to improve recognition accuracy of image
recognition using a learning model in the layout recognizing unit 33 and the character string
recognizing unit 35 which will be described later.
[0060] The preprocessing unit 32 includes the background eliminating unit 32a, the
tilt correcting unit 32b, and the shape adjusting unit 32c.
The background eliminating unit 32a (see FIG. 4) eliminates the background of the
document image acquired by the document image acquiring unit 31.
[0061] Processes which are performed by the background eliminating unit 32a will
be described below with reference to FIG. 6. FIG. 6 is a diagram illustrating background
elimination which is performed in the preprocessing 55. FIG. 6(a) illustrates a document
image 58a before background elimination has been performed thereon, and FIG. 6(b)
illustrates a document image 58b after background elimination has been performed thereon.
[0062] The background eliminating unit 32a eliminates the background of the
document image by changing the background color of the document image to white.
Specifically, the background eliminating unit 32a detects the background color of the acquired document image and determines whether the background color is white. When it is determined that the background color is not white, the background eliminating unit 32a extracts the information other than the background of the document image, changes the background color to white, and then superimposes the extracted information thereon.
[0063] With the background eliminating unit 32a, by eliminating the background, it is possible to remove noise that causes erroneous image recognition in the layout recognizing unit 33 and the character string recognizing unit 35 and to improve recognition accuracy.
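The detect-then-overlay steps described above could be sketched as follows. This is an illustrative outline only, not the claimed implementation: the function names, the use of the most frequent pixel value as the background estimate, and the "near white" cutoff of 250 are all assumptions.

```python
from collections import Counter

WHITE = 255

def estimate_background(gray):
    """Estimate the background as the most common pixel value in the image
    (gray is a grayscale image stored as a list of rows of 0-255 values)."""
    counts = Counter(v for row in gray for v in row)
    return counts.most_common(1)[0][0]

def eliminate_background(gray):
    """Force the background of a grayscale document image to white.

    Content pixels (darker than the estimated background) are kept as-is,
    which mirrors extracting the non-background information and
    superimposing it on a white page.
    """
    bg = estimate_background(gray)
    if bg >= 250:                       # background is already (near) white
        return [row[:] for row in gray]
    return [[v if v < bg else WHITE for v in row] for row in gray]
```

A real implementation would operate on image arrays rather than nested lists, but the control flow (detect, test for white, whiten) is the same.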
[0064] The tilt correcting unit 32b (see FIG. 4) corrects a tilt of the document image acquired by the document image acquiring unit 31. Processes which are performed by the tilt correcting unit 32b will be described below with reference to FIG. 7. FIG. 7 is a diagram illustrating tilt correction which is performed in the preprocessing 55. FIG. 7(a) illustrates a document image 59a before tilt correction has been performed thereon, and FIG. 7(b) illustrates a document image 59b after tilt correction has been performed thereon.
[0065] The tilt correcting unit 32b corrects a tilt of a character string when a tilted character string is contained in the document image such that the character string is parallel or perpendicular to a writing direction. The tilt correcting unit 32b corrects the tilted character string to be parallel to a vertical writing direction when the document image is in vertical writing, and corrects the tilted character string to be parallel to a horizontal writing direction when the document image is in horizontal writing.
[0066] Specifically, the tilt correcting unit 32b extracts character strings of a document image and determines whether a tilted character string is contained in the extracted character strings. When it is determined that a tilted character string is present in the extracted character strings, the tilt correcting unit 32b detects a tilt angle with respect to the writing direction of the tilted character string and performs a rotating process such that the tilt angle of the tilted character string is zero.
[0067] With the tilt correcting unit 32b, it is possible to improve recognition accuracy of image recognition in the character string recognizing unit 35 by correcting a tilt of a character string. It is also possible to reduce an error of layout recognition in the layout recognizing unit 33.
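The detect-angle-then-rotate steps above reduce to plane geometry. The sketch below assumes a horizontally written document whose baseline endpoints have already been extracted; the function names are hypothetical, and coordinates are treated as ordinary Cartesian points.

```python
import math

def baseline_angle(p0, p1):
    """Tilt angle (degrees) of a text baseline relative to the horizontal
    writing direction, given its two endpoints (x, y)."""
    (x0, y0), (x1, y1) = p0, p1
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def deskew_point(point, angle_deg, center=(0.0, 0.0)):
    """Rotate a point about `center` by the negative of the detected tilt
    angle, so that after rotation the tilt angle of the string is zero."""
    a = math.radians(-angle_deg)
    cx, cy = center
    x, y = point[0] - cx, point[1] - cy
    return (cx + x * math.cos(a) - y * math.sin(a),
            cy + x * math.sin(a) + y * math.cos(a))
```

Applying `deskew_point` to every pixel (or to the corners of each recognition range) with the angle returned by `baseline_angle` performs the rotating process described in paragraph [0066].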
[0068] The shape adjusting unit 32c (see FIG. 4) adjusts a shape and a size of the document image as a whole acquired by the document image acquiring unit 31.
Processes which are performed by the shape adjusting unit 32c will be described below
with reference to FIG. 8. FIG. 8 is a diagram illustrating shape adjustment which is
performed in the preprocessing. FIG. 8(a) illustrates a document image 60a before shape
adjustment has been performed thereon, and FIG. 8(b) illustrates a document image 60b after
shape adjustment has been performed thereon.
[0069] When the shape of the document image as a whole acquired by the document image acquiring unit 31 is different from that of an actual document, the shape
adjusting unit 32c adjusts a shape of the document image as a whole on the basis of the whole
shape of the actual document. Specifically, when an aspect ratio of the document image as
a whole acquired by the document image acquiring unit 31 is different from an aspect ratio
of the whole actual document, the shape adjusting unit 32c performs adjustment such that
the aspect ratio of the whole document image is equal to the aspect ratio of the whole actual
document.
[0070] When the size of the document image acquired by the document image
acquiring unit 31 is excessively large or excessively small, there is a likelihood that
subsequent processes will not be performed normally and thus the shape adjusting unit 32c
adjusts the size of the document image acquired by the document image acquiring unit 31
such that the subsequent processes are performed normally.
[0071] With the shape adjusting unit 32c, by adjusting the shape and the size of the
document image acquired by the document image acquiring unit 31, it is possible to improve
recognition accuracy of a layout based on an actual document in the layout recognizing unit
33 which is performed subsequently and to further improve recognition accuracy of image
recognition in the character string recognizing unit 35.
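The aspect-ratio and size adjustment described in paragraphs [0069] and [0070] amounts to simple arithmetic. The sketch below is illustrative only; the bounds `min_side` and `max_side`, standing in for "excessively small" and "excessively large", are assumed values, and the choice to correct the width while keeping the height is likewise an assumption.

```python
def adjust_shape(img_w, img_h, doc_w, doc_h, min_side=300, max_side=4000):
    """Compute target pixel dimensions for a scanned document image.

    First the width is corrected so that the aspect ratio of the whole
    image equals the aspect ratio of the whole actual document; then the
    image is uniformly rescaled so both sides fall inside
    [min_side, max_side], keeping subsequent processes stable.
    """
    # match the actual document's width/height ratio by adjusting the width
    target_w = img_h * doc_w / doc_h
    target_h = float(img_h)
    # uniformly rescale when the result is excessively large or small
    longest, shortest = max(target_w, target_h), min(target_w, target_h)
    scale = 1.0
    if longest > max_side:
        scale = max_side / longest
    elif shortest < min_side:
        scale = min_side / shortest
    return round(target_w * scale), round(target_h * scale)
```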
[0072] The layout recognizing unit 33 (see FIG. 4) identifies a range of each of a plurality of elements contained in the document image acquired by the document image acquiring unit 31 in the document image 61 using a layout learning model 14 having learned correspondence between a plurality of elements contained in the document image 61 and identification information of each of the plurality of elements, recognizes a type of each of the plurality of elements, and acquires position information of the ranges of the plurality of elements in the document image 61.
[0073] The type of each element may be one of a character string 48, a table 49, an image 53, a seal 52, and handwriting 51. The type of each element is not limited thereto, and a stapler mark 50, a punched hole mark, a fractured (torn) mark, and a copying carbon stain, for example, may be used.
[0074] Types appropriate for document types (for example, a contract, a bill, a memorandum, a delivery note, or a receipt) may be used as the type of each element. For example, when copying carbon is added to a rear side of a receipt and the carbon is transferred to a front side to form a stain, the copying carbon stain may be actively removed using the copying carbon stain as the type of an element.
[0075] The layout learning model 14 may be one of a layout learning model for a contract, a layout learning model for a bill, a layout learning model for a memorandum, a layout learning model for a delivery note, and a layout learning model for a receipt.
[0076] The types of elements may be classified into a necessary element and an unnecessary element according to the type of a document. In this case, the layout recognizing unit 33 may not acquire position information of the corresponding element when a recognized element out of the plurality of elements contained in the document image 61 corresponds to an unnecessary element, and may acquire position information of the corresponding element when the recognized element corresponds to a necessary element. Alternatively, the layout recognizing unit 33 may recognize only a necessary element out of the plurality of elements contained in the document image 61 and acquire position information of the corresponding element.
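The necessary/unnecessary classification per document type could be expressed as a lookup followed by a filter. The table below is purely illustrative: which element types count as necessary for a bill or a contract is not specified by the embodiment and is assumed here.

```python
# Hypothetical per-document-type sets of necessary element types.
NECESSARY = {
    "bill": {"character_string", "table"},
    "contract": {"character_string", "table", "seal"},
}

def filter_layout(doc_type, recognized):
    """Keep position information only for necessary elements.

    `recognized` is a list of dicts with at least a 'type' key; elements
    whose type is unnecessary for the given document type are dropped, so
    their position information is never acquired.
    """
    needed = NECESSARY.get(doc_type, set())
    return [e for e in recognized if e["type"] in needed]
```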
[0077] The layout recognizing unit 33 recognizes types of elements, acquires position information of the ranges of the elements in the document image, and then corrects the ranges of the elements and the acquired position information on the basis of an actual document when the elements overlap or the elements are excessively separated.
[0078] An example of a correction process for missing resolution which is performed by the layout recognizing unit 33 when a recognition range recognized by the layout recognizing unit 33 has missing will be described below with reference to FIG. 9. Missing means that a part of a range to be recognized as an element by the layout recognizing unit 33 is not recognized and a part of the range of an element is missed. FIG. 9 is a diagram illustrating a correction process for missing resolution which is performed in the layout recognizing process, where FIG. 9(a) illustrates a state before the correction has been performed and FIG. 9(b) illustrates a state after the correction has been performed.
[0079] When an image 70 of a character string contained in the document image acquired by the document image acquiring unit 31 is recognized as a character string, the layout recognizing unit 33 performs a correction process of determining whether the recognition range thereof has missing and performs a correction process of adding the missed part when there is missing.
[0080] FIG. 9(a) illustrates a state in which the layout recognizing unit 33 recognizes an image 70 of a character string as a character string in a recognition range 72a. The recognition range 72a has missing in the left end part of the image 70 of a character string. The layout recognizing unit 33 determines whether a black line is within a predetermined range around the recognition range 72a and performs correction of adding a range 72b including the black line to the recognition range 72a when there is a black line (see FIG. 9(b)).
[0081] The determination performed by the layout recognizing unit 33 is not limited to a black line, but whether a line with the same color as characters or a line with a preset color is within a predetermined range near the recognition range 72a may be determined. This is because the correction process for missing resolution performed in the layout recognizing process is mainly for improving recognition accuracy of a character recognizing process which is performed subsequently.
[0082] With this correction process, even when the range of an element recognized by the layout recognizing unit 33 has missing, it is possible to correct the recognition range to a normal recognition range by adding the missed range thereto, and the character string recognizing unit 35 can normally recognize a character string contained in the corresponding element.
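The missing-resolution correction (look for character-colored pixels within a predetermined range around the recognition range, and add that range when they are found) might be sketched as follows on a grayscale image stored as a list of rows. The `margin` and the darkness threshold are assumed values, and only horizontal growth is shown for brevity.

```python
def has_dark_pixel(img, x0, y0, x1, y1, dark=128):
    """True if any pixel darker than `dark` lies in the given region,
    clipped to the image bounds."""
    h, w = len(img), len(img[0])
    for y in range(max(0, y0), min(h, y1)):
        for x in range(max(0, x0), min(w, x1)):
            if img[y][x] < dark:
                return True
    return False

def resolve_missing(img, box, margin=3):
    """Grow a recognition box (x0, y0, x1, y1) sideways while
    character-colored (here: dark) pixels remain within `margin` of its
    edge, adding the missed part of the element back into the range."""
    x0, y0, x1, y1 = box
    while x0 > 0 and has_dark_pixel(img, x0 - margin, y0, x0, y1):
        x0 -= margin
    while x1 < len(img[0]) and has_dark_pixel(img, x1, y0, x1 + margin, y1):
        x1 += margin
    return (max(0, x0), y0, min(len(img[0]), x1), y1)
```

As noted in paragraph [0081], the test could equally check for a line with the same color as the characters rather than simply for dark pixels.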
[0083] An example of correction which is performed by the layout recognizing unit 33 when a recognition range recognized by the layout recognizing unit 33 overlaps another
element will be described below with reference to FIG. 10. FIG. 10 is a diagram illustrating
a correction process for overlap resolution which is performed in the layout recognizing
process, where FIG. 10(a) illustrates a state before the correction has been performed and
FIG. 10(b) illustrates a state after the correction has been performed.
[0084] When an image 73 of a character string contained in the document image acquired by the document image acquiring unit 31 is recognized as a character string, the
layout recognizing unit 33 determines whether a recognition range 75a thereof overlaps
another element (for example, a table 74) and performs a correction process for resolving an
overlap when the overlap occurs.
[0085] FIG. 10(a) illustrates a state in which the layout recognizing unit 33
recognizes the image 73 of the character string as a character string in the recognition range
75a. The recognition range 75a overlaps a table 74 on the right side of the image 73 of the
character string over a blank (space). The layout recognizing unit 33 determines whether a
blank (space) with a predetermined size is present in the recognition range 75a and performs
correction for obtaining a recognition range 75b by deleting the recognition range 75a
associated with the blank (space) and a part on the right side of the blank (space) when the
blank (space) is present (see FIG. 10(b)).
[0086] Since a blank (space) with a predetermined size is necessarily present
between an element and another element, the layout recognizing unit 33 determines that the
recognition range overlaps another element when the blank (space) with a predetermined
size is present in the recognition range. With the correction process for overlap resolution
which is performed in the layout recognizing process, it is possible to improve recognition accuracy of a layout in the layout recognizing unit 33.
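The overlap-resolution correction (find a blank of a predetermined size inside the recognition range, then delete the blank and the part on its right) could be sketched as a column scan. The whiteness threshold and `min_gap` width are assumed values standing in for the "predetermined size" of the blank.

```python
def blank_columns(img, box, white=200):
    """Column indices inside `box` that contain no content pixels."""
    x0, y0, x1, y1 = box
    return [x for x in range(x0, x1)
            if all(img[y][x] >= white for y in range(y0, y1))]

def resolve_overlap(img, box, min_gap=5):
    """Cut a recognition range at the first blank run at least `min_gap`
    columns wide: the blank and everything to its right are dropped,
    resolving an overlap with a neighboring element."""
    x0, y0, x1, y1 = box
    blanks = set(blank_columns(img, box))
    run = 0
    for x in range(x0, x1):
        run = run + 1 if x in blanks else 0
        if run >= min_gap:
            return (x0, y0, x - run + 1, y1)  # truncate before the blank
    return box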
[0087] Processes which are performed by the layout recognizing unit 33 will be described below with reference to FIG. 11. FIG. 11 is a diagram illustrating layout
recognition which is performed in the layout recognizing process, where FIG. 11(a)
illustrates a state of a document image 61 before the layout recognition has been performed
thereon and FIG. 11(b) illustrates a state of a document image after the layout recognition
has been performed thereon.
[0088] The layout recognizing unit 33 identifies ranges in the document image 61 of elements (a character string 48, a table 49, a seal 52, and an image 53) contained in the
document image 61 through image recognition using the layout learning model 14.
[0089] In FIG. 11(b), for the purpose of convenient explanation, the range of the
identified character string 48 is surrounded by a solid line, and the ranges of the identified
table 49, the identified seal 52, and the identified image 53 are surrounded by dotted lines.
Since a boundary of an element has only to be recognized by the electronic document
generation device 10, it may not be visible to a person.
[0090] The layout recognizing unit 33 recognizes types of the corresponding elements through image recognition using the layout learning model 14 in the identified
ranges in the document image 61 and acquires position information of the ranges in the
document image 62 along with the types of the elements. The position information may be
expressed by a two-dimensional orthogonal coordinate system with a predetermined point in
the document image 62 as an origin.
[0091] The layout learning model 14 is set in advance according to the type of the
document image 61, and the layout recognizing unit 33 recognizes a layout of the document
image 61 using the layout learning model 14 set in advance.
[0092] That is, the layout recognizing unit 33 performs image recognition using the
layout learning model 14 for a contract when the document image 61 acquired by the
document image acquiring unit 31 is a contract, performs image recognition using the layout
learning model 14 for a bill when the document image 61 is a bill, performs image
recognition using the layout learning model 14 for a memorandum when the document image
61 is a memorandum, performs image recognition using the layout learning model 14 for a
delivery note when the document image 61 is a delivery note, and performs image
recognition using the layout learning model 14 for a receipt when the document image 61 is
a receipt.
[0093] Since the layout recognizing unit 33 uses the layout learning model 14
according to the type of the document image 61 acquired by the document image acquiring
unit 31, it is possible to improve recognition accuracy of a layout of the document image 61.
[0094] The extraction unit 34 (see FIG. 4) extracts each cell of a table contained in an element whose type, as recognized by the layout recognizing unit 33, corresponds to a table and acquires position information of the cells in the document image.
[0095] Recognition of a table 49 which is performed by the layout recognizing unit 33 will be described below with reference to FIG. 12. FIG. 12 is a diagram illustrating
recognition of a table which is performed in the layout recognizing process 56, where FIG.
12(a) illustrates a table 63 before recognition has been performed thereon by the layout
recognizing unit 33 and FIG. 12(b) illustrates a table 64 after recognition has been performed
thereon by the layout recognizing unit 33. In FIG. 12(b), for the purpose of convenient
explanation, lines recognized as vertical lines 65 are denoted by one-dot chain lines, and
lines recognized as horizontal lines 66 are denoted by dotted lines.
[0096] The layout recognizing unit 33 recognizes a length and a position of each of
all the vertical lines 65 and the horizontal lines 66 constituting the table 64. The layout
recognizing unit 33 recognizes all the cells contained in the table 64 by recognizing the
lengths and the positions of all the vertical lines 65 and the horizontal lines 66 constituting
the table 64. That is, the layout recognizing unit 33 recognizes a rectangle constituted by
two neighboring vertical lines 65 and two neighboring horizontal lines 66 as a cell.
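The rule that a cell is the rectangle bounded by two neighboring vertical lines and two neighboring horizontal lines can be sketched directly once the line positions are known. The representation (line positions as x- and y-coordinates, cells keyed by (row, column)) is an assumption that anticipates the position information described for the extraction unit 34.

```python
from itertools import product

def cells_from_lines(vertical_xs, horizontal_ys):
    """Build every cell of a table from its recognized lines.

    Each cell is the rectangle (x0, y0, x1, y1) bounded by two neighboring
    vertical lines and two neighboring horizontal lines; the result maps
    (row, column) to that rectangle.
    """
    xs, ys = sorted(vertical_xs), sorted(horizontal_ys)
    cells = {}
    for row, col in product(range(len(ys) - 1), range(len(xs) - 1)):
        cells[(row, col)] = (xs[col], ys[row], xs[col + 1], ys[row + 1])
    return cells
```

A table bounded by three vertical and three horizontal lines thus yields a 2×2 grid of four cells.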
[0097] The layout recognizing unit 33 also recognizes line types of the lines
constituting the table 64. The recognized line types are reflected in objects of lines
constituting a table contained in an electronic document when the electronic document is
reproduced on the basis of the acquired document image. For example, when lines of a
table in the document image 62 are dotted lines, the lines of the table contained in the electronic document reproduced on the basis of the document image 62 are expressed as objects of dotted lines.
[0098] The extraction unit 34 extracts an image of each cell of all the cells contained in the table 64 recognized by the layout recognizing unit 33.
Extraction of a cell image which is performed by the extraction unit 34 will be described
below with reference to FIG. 13. FIG. 13 is a diagram illustrating extraction of a cell image.
A cell 67 extracted by the extraction unit 34 may include a plurality of character strings.
[0099] The extraction unit 34 acquires an image of each cell and position information of the cell in the table 64 for all the cells contained in the table 64. The position
information may be expressed by a two-dimensional orthogonal coordinate system with a
predetermined point in the table 64 as an origin or may be expressed by (row, column) in the
table 64.
[0100] The extraction unit 34 reproduces all the vertical lines and the horizontal lines constituting the table recognized by the layout recognizing unit 33 and generates
position information of all the cells.
[0101] A cell 67 containing a plurality of character strings will be described below
with reference to FIG. 14. FIG. 14 is a diagram illustrating character strings in a cell image.
[0102] When the extracted cell 67 contains a plurality of rows of character strings, the extraction unit 34 additionally extracts an image for each character string for all the
character strings. The cell 67 illustrated in FIG. 14 contains two rows of character strings,
and the extraction unit 34 extracts an image 67a of a character string and an image 67b of a
character string.
[0103] The character string recognizing unit 35 (see FIG. 4) recognizes a character
string contained in the document image acquired by the document image acquiring unit 31
using the character string learning model 13 having learned correspondence between
document images and character strings contained in the document images and generates text
data of the character string.
[0104] The character string recognizing unit 35 may recognize a character string
contained in a range recognized by the layout recognizing unit 33 using the character string learning model 13 and generate text data of the character string.
[0105] The character string recognizing unit 35 may recognize each of character strings contained in the cells extracted by the extraction unit 34 using the character string
learning model 13 and generate text data of the character strings.
[0106] The character string recognizing unit 35 may include a plurality of character
string learning models 13 and use a character string learning model 13 adapted to the
languages of the character strings contained in the plurality of elements.
The character string recognizing unit 35 uses a character string learning model
appropriate for recognition of character strings in English to recognize a document image
written in English, whereby it is possible to improve recognition accuracy.
[0107] Character recognition which is performed by the character string recognizing unit 35 will be described below with reference to FIGS. 15 and 16.
FIG. 15 is a diagram illustrating arrangement of text data which is performed in the
character string recognizing process 57, where FIG. 15(a) illustrates an image 67a of a
character string before the character recognition has been performed thereon and FIG. 15(b)
illustrates a character string 68a, that is, text data 68a, after the character recognition has
been performed thereon.
[0108] FIG. 16 is a diagram illustrating noise removal which is performed in the character string recognizing process 57, where FIG. 16(a) illustrates an image 71a of a
character string before the character recognition has been performed thereon and FIG. 16(b)
illustrates a character string 71b, that is, text data 71b, after the character recognition has
been performed thereon.
[0109] The image 67a of the character string illustrated in FIG. 15(a) contains a
handwritten check mark in addition to one row of character string. The character string
contains a blank between a word and a word. The character string recognizing unit 35
recognizes the whole image 67a of the character string using the character string learning
model 13 and generates text data.
[0110] The character string recognizing unit 35 recognizes two wordings "L/C NO:"
and "ILC18H000219" and a blank between the two wordings in the image 67a of the character string and generates text data corresponding to the two wordings and text data corresponding to the blank between the two wordings (68a: see FIG. 15(b)).
Accordingly, since the character string recognizing unit 35 recognizes a space between
wordings and converts the space to text data, it is possible to arrange the two wordings
separately similarly to the image 67a.
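The "arrangement of text data" just described (recognizing the blank between two wordings and converting it to text) could be sketched as follows. Representing each recognized wording as a (gap-in-pixels, text) pair and the assumed `char_width` used to turn a pixel gap into a space count are illustrative choices, not part of the embodiment.

```python
def arrange_tokens(tokens, char_width=10):
    """Join recognized wordings into one line of text, converting the
    pixel gap measured before each wording into a proportional number of
    space characters.

    `tokens` is a list of (gap_px, text) pairs in reading order; the gap
    of the first token is the indent from the left edge of the range.
    """
    parts = []
    for gap_px, text in tokens:
        parts.append(" " * round(gap_px / char_width))
        parts.append(text)
    return "".join(parts)
```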
[0111] Since the character string recognizing unit 35 does not recognize a
handwritten check mark and does not add the handwritten check mark to text data when
recognizing the image 67a of the character string, the handwritten check mark is deleted
from the electronic document which is output (68a: see FIG. 15(b)). Accordingly, noise
such as the handwritten check mark which is not recognized by the character string
recognizing unit 35 is passively removed from the electronic document.
[0112] In the image 71a of a character string illustrated in FIG. 16(a), a part of a
seal overlapping one row of character string remains as noise. The character string
recognizing unit 35 recognizes the whole image 71a of a character string using the character
string learning model 13 and generates text data.
[0113] The character string recognizing unit 35 recognizes the whole image 71a of
a character string and generates text data corresponding to the character string "authorized
to act on behalf of the" (71b: see FIG. 16(b)).
[0114] Since noise contained in the image 71a of a character string is not
recognized by the character string recognizing unit 35, the noise is passively removed from
the electronic document (71b: see FIG. 16(b)).
[0115] A document image and text data of a character string after character recognition has been performed on the document image have been described above with reference to FIGS. 15 and 16. The character string learning model 13 can learn, as training data, a plurality of pieces of data in which FIG. 15(a) and FIG. 15(b) are correlated with each other and a plurality of pieces of data in which FIG. 16(a) and FIG. 16(b) are correlated with each other, whereby it is possible to realize character recognition from an image using deep learning.
[0116] The character string recognizing unit 35 may acquire attribute data such as sizes and fonts of characters contained in character strings when the character strings contained in the images 67a and 71a are recognized using the character string learning model
13. The attribute data of characters is reflected as attribute data of text data output from the
output unit 36 which will be described later.
[0117] The output unit 36 (see FIG. 4) outputs text data as text of an electronic
medium.
The output unit 36 may output the position information of the ranges of a plurality of elements and the text data of the plurality of elements as text of an electronic medium.
[0118] An electronic medium is not limited to data electrically stored in a recording medium and may include data which is not stored in a recording medium but of which details
can be handled by an information processing device such as a PC.
Position information of an element may be expressed by a two-dimensional orthogonal
coordinate system with a predetermined point in the document image 62 as an origin.
[0119] Since the output unit 36 outputs text data of a plurality of elements on the basis of the position information of the elements, it is possible to remove noise, to convert a
character string in the document image 61 to text data, and to output an electronic document
while maintaining the layout of the acquired document image 61.
[0120] The output unit 36 may reflect attribute data of characters acquired by the character string recognizing unit 35 in text data and output the text data as an electronic
document. With the output unit 36, the electronic document generation device 10 can
reproduce attribute data such as sizes and fonts of characters contained in the document
image 61 as attribute data of text data contained in the electronic document to be output.
[0121] The layout learning data generating unit 40 (see FIG. 4) adds an annotation
associated with the type of each element in a document image containing a plurality of
elements to the corresponding element and accumulates a plurality of document images with
annotations to generate layout learning data.
[0122] The layout learning data is used for supervised learning of the layout
learning model 14.
Position information, in the document image, of the ranges of the plurality of elements contained in the document image may be added, along with annotations, to the document images accumulated in the layout learning data.
[0123] The layout learning data with annotations will be described below with
reference to FIGS. 17 to 23. FIGS. 17 to 23 are diagrams illustrating examples of layout
learning data with annotations.
[0124] The layout learning data generating unit 40 acquires a document image from
the document image database 15, adds annotations to the document image, and generates
layout learning data. When the layout learning data is generated, a user may manually
generate the layout learning data without using the layout learning data generating unit 40.
When a user manually generates the layout learning data, annotations can be added to a
document image acquired from the document image database 15 using the user terminal 12.
[0125] Layout learning data used for learning of the layout learning model 14 for a
bill will be described below with reference to FIGS. 17 and 18. Annotation symbols are
added to elements contained in a document image such that a character string, a table, an
image, a seal, an outer line, and noise which are the elements can be identified and classified
by the electronic document generation device 10.
[0126] An annotation symbol 76 of a character string is added to an element
associated with a character string, the character string is surrounded by a rectangular
enclosing line, and a tag "Text" is added as a mark to the enclosing line. A part surrounded
by the rectangular enclosing line is learned as a range of the element associated with the
character string in the document image by the layout learning model 14.
[0127] An annotation symbol 77 of a table is added to an element associated with
a table, a rectangular enclosing line is superimposed on the outer line of the table, and a tag
"Border Table" is added as a mark to the enclosing line. A part surrounded by the
rectangular enclosing line is learned as a range of the element associated with the table in
the document image by the layout learning model 14.
[0128] An annotation symbol 78 of an image is added to an element associated with
an image, an enclosing line indicating the annotation symbol is superimposed on a boundary
line of the image, and a tag "Image" is added as a mark to the enclosing line. It is assumed that the image includes a logo, a mark, a photograph, and an illustration. Apartsurrounded by the enclosing line is learned as a range of the element associated with the image in the document image by the layout learning model 14.
[0129] An annotation symbol 79 of a seal is added to an element associated with a seal, an enclosing line indicating the annotation symbol is superimposed on a boundary line
of the seal, and a tag "Hun" is added as a mark to the enclosing line. A part surrounded by
the enclosing line is learned as a range of the element associated with the seal in the document
image by the layout learning model 14.
[0130] An annotation symbol 80 of an outer line is added to an element associated with an outer line, an enclosing line is superimposed on a boundary line of the outer line,
and a tag "Border" is added as a mark to the enclosing line. Lengths and positions of four
segments constituting the enclosing line are learned by the layout learning model 14.
[0131] An annotation symbol 81 of noise is added to an element associated with noise, the noise is surrounded by a rectangular enclosing line, and a tag "Noise" is added as
a mark to the enclosing line. A part surrounded by the enclosing line is learned as a range
of the element associated with the noise in the document image by the layout learning model
14.
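As a hedged sketch of the annotation scheme described above (the embodiment does not specify a concrete data format, so the field names and record shape here are illustrative assumptions), each element can be represented as a rectangular enclosing line plus one of the described tags:

```python
# Hypothetical layout-annotation record: each element in a document image
# is marked by a rectangular enclosing line (x, y, width, height) and one
# of the tags described above. Field names are illustrative only.
LAYOUT_TAGS = {"Text", "Border Table", "Image", "Hun", "Border", "Noise"}

def make_annotation(tag, x, y, width, height):
    """Build one annotation of the layout learning data."""
    if tag not in LAYOUT_TAGS:
        raise ValueError(f"unknown layout tag: {tag}")
    return {"tag": tag, "box": (x, y, width, height)}

# Example: a character string annotated with the tag "Text".
ann = make_annotation("Text", 120, 40, 300, 18)
```

The layout learning model 14 would then be trained on pairs of document images and lists of such records, learning the range (the box) and the type (the tag) of each element.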
[0132] Layout learning data used for learning of recognition of a table will be described below with reference to FIG. 19. In a table contained in the document image
acquired from the document image database 15, a one-dot chain line which is an annotation
symbol 83 of a vertical line is superimposed on all the vertical lines constituting the table,
and a dotted line which is an annotation symbol 84 of a horizontal line is superimposed on
all the horizontal lines constituting the table.
[0133] The layout learning model 14 can learn the size of the table, the range of the
table, the position thereof, and information of all the cells of the table by recognizing all the
one-dot chain lines and all the dotted lines. Information of the cells includes the number of
cells in the table and positions of the cells in the table, and the position of a cell in the table
is expressed by (row, column) of the table.
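The derivation of cell information from the recognized vertical and horizontal lines can be sketched as follows; the coordinate representation and the helper name are illustrative assumptions, not part of the described device:

```python
def cells_from_lines(vertical_xs, horizontal_ys):
    """Given the x coordinates of all vertical lines and the y coordinates
    of all horizontal lines constituting a table, return each cell as
    ((row, column), (left, top, right, bottom)). A cell is the rectangle
    bounded by two neighboring vertical lines and two neighboring
    horizontal lines."""
    xs, ys = sorted(vertical_xs), sorted(horizontal_ys)
    cells = []
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            cells.append(((r, c), (xs[c], ys[r], xs[c + 1], ys[r + 1])))
    return cells

# A table drawn with 3 vertical and 3 horizontal lines has 2 x 2 = 4 cells.
grid = cells_from_lines([0, 100, 200], [0, 30, 60])
```

This also yields the number of cells in the table (the length of the returned list) and the position (row, column) of each cell, matching the information of the cells described above.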
[0134] Layout learning data used for learning of recognition of a character string in a cell of a table will be described below with reference to FIGS. 20 and 21. FIG. 20 illustrates layout learning data in which one row of character string is contained in each cell.
FIG. 21 illustrates layout learning data for recognizing a table containing a cell containing
one row of character string, a cell containing two rows of character strings, and a cell
containing three rows of character strings.
[0135] As illustrated in FIGS. 20 and 21, the annotation symbol 76 of a character string is added to each character string regardless of the number of rows of character strings
contained in one cell, the character string is surrounded by a rectangular enclosing line, and
a tag "Text" is added as a mark to the enclosing line.
[0136] The layout learning model 14 learns the ranges of the annotation symbol 76
of the character strings and positions of the character strings in the table. The electronic
document generation device 10 can reproduce the table by outputting text data of the
character strings as an electronic document along with object data associated with all the
vertical lines and the horizontal lines constituting the table.
[0137] Layout learning data used for learning of recognition of a character string
in a cell of a table will be described below with reference to FIG. 22. The annotation
symbol 76 of a character string is added to each character string contained in a cell of the
table illustrated in FIG. 22, the character string is surrounded by a rectangular enclosing line,
and a tag "Text" is added as a mark to the enclosing line.
[0138] The layout learning model 14 learns the ranges of the annotation symbol 76
of the character strings and position information of the character strings in the document.
The electronic document generation device 10 can reproduce the table in the electronic
document by locating the text data of the character strings at positions in the document. The
electronic document generation device 10 can reproduce the table in the electronic document
by only outputting text data without reproducing all the vertical lines and the horizontal lines
constituting the table in the electronic document.
[0139] Layout learning data used for learning of recognition of a seal will be
described below with reference to FIG. 23. By using the layout learning data illustrated in FIG. 23, the layout learning model 14 can learn a range and a position of a seal on the basis of the element of the character string and a blank located below the character string, without using the element corresponding to the seal.
[0140] The layout learning data correcting unit 41 (see FIG. 4) corrects at least one of a type of each of a plurality of elements acquired by the layout recognizing unit 33 and position information of a range of each of the plurality of elements in the document image on the basis of an input and updates the layout learning data by adding the corrected data thereto.
[0141] A difference may occur between a document image 61 before image recognition has been performed thereon by the layout recognizing unit 33 and a document image 62 after image recognition has been performed thereon by the layout recognizing unit 33. Examples of such a case include a case in which a part of a character string is not recognized, a case in which an element to be recognized as an image is recognized as a seal, and a case in which a position of a table is shifted.
[0142] In this case, the layout learning data is updated by correcting the document image 62 after image recognition has been performed thereon by the layout recognizing unit 33 such that it matches the document image 61 before image recognition has been performed thereon by the layout recognizing unit 33 and adding the corrected data to the layout learning data.
[0143] The layout learning unit 42 (see FIG. 4) performs retraining of the layout learning model 14 using the layout learning data updated by the layout learning data correcting unit 41. When the layout learning model 14 is retrained, it is possible to improve recognition accuracy of a layout of a document image.
[0144] The character string learning data generating unit 43 (see FIG. 4) generates character string learning data used for supervised learning of the character string learning model 13. The character string learning data correcting unit 44 (see FIG. 4) updates the character string learning data by correcting text data generated by the character string recognizing unit 35 on the basis of an input and adding the corrected text data thereto.
[0145] The character string learning unit 45 (see FIG. 4) performs retraining of the character string learning model 13 using the character string learning data updated by the character string learning data correcting unit 44.
[0146] The character string learning data generating unit 43 generates character string learning data by acquiring a document image from the document image database 15 and adding annotations to the document image. When the character string learning data is generated, a user may manually generate the character string learning data without using the character string learning data generating unit 43. When a user manually generates the character string learning data, the user can add annotations to the document image acquired from the document image database 15 using the user terminal 12.
[0147] Character string learning data used for learning of the character string learning model 13 will be described below with reference to FIG. 24. FIG. 24 is a diagram illustrating an example of character string learning data with annotations. FIG. 24 illustrates an output screen of the character string learning data generating unit 43 which is displayed on the user terminal 12 or the output device 27 of the electronic document generation device 10.
[0148] The character string learning data generating unit 43 adds text data corresponding to each character string as an annotation 85 of the text data to the character string contained in the document image acquired from the document image database 15.
[0149] Instead of text data which is added as annotations, corresponding character codes may be added as the annotation 85 of the text data. When a character string contained in the document image includes a blank, the character string learning data generating unit 43 generates the character string learning data such that the text data corresponding to the character string similarly includes a blank.
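As a hedged sketch (the field names and the character-code option below are illustrative assumptions; the embodiment does not specify a concrete data format), one character string learning sample can be represented as an image of a character string paired with its annotation:

```python
def make_sample(string_image, text, as_character_codes=False):
    """Pair an image of a character string with its annotation 85.
    Instead of the text data, corresponding character codes may be added
    as the annotation. A blank contained in the character string is
    preserved in the annotation as-is."""
    annotation = [ord(ch) for ch in text] if as_character_codes else text
    return {"image": string_image, "annotation": annotation}

# The blank inside the character string survives in the annotation.
sample = make_sample("bill_0001_line_03.png", "Total amount  1,200 JPY")
```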
[0150] An electronic document generation method which is performed by the electronic document generation device 10 according to this embodiment will be described below with reference to FIG. 25 along with an electronic document generation program. FIG. 25 illustrates a flowchart of the electronic document generation program. The electronic document generation method is performed by the CPU 25 of the electronic document generation device 10 in accordance with the electronic document generation program.
[0151] The electronic document generation program causes the CPU 25 of the electronic document generation device 10 to embody various functions such as a document
image acquiring function, a preprocessing function, a layout recognizing function, an
extraction function, a character recognizing function, and an output function. These
functions are performed in the order illustrated in FIG. 25, but the order may be changed as appropriate. These functions are the same as in the aforementioned
description of the electronic document generation device 10, and thus detailed description
thereof will be omitted.
[0152] The document image acquiring function acquires a document image by imaging a document (S31: a document image acquiring step).
A format of the document image may be, for example, PDF, JPG, or GIF, and may
include a data format which can be processed as an image by the electronic document
generation device 10.
[0153] The preprocessing function performs preprocessing on the document image acquired by the document image acquiring function (S32: a preprocessing step).
The preprocessing function includes a background eliminating function, a tilt
correcting function, and a shape adjusting function, the background eliminating function
removes the background of the document image acquired by the document image acquiring
function, the tilt correcting function corrects a tilt of the document image acquired by the
document image acquiring function, and the shape adjusting function adjusts a shape and a
size of the document image as a whole acquired by the document image acquiring function.
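Two of the three preprocessing functions can be sketched with NumPy as follows; the binarization threshold, the fixed page size, and the function names are illustrative assumptions (tilt correction, which requires estimating a rotation angle, is omitted from this sketch):

```python
import numpy as np

def eliminate_background(image, threshold=128):
    """Background-eliminating sketch: binarize a grayscale page so that
    light background pixels become white (255) and ink stays black (0).
    The threshold value is an illustrative assumption."""
    return np.where(image >= threshold, 255, 0).astype(np.uint8)

def adjust_shape(image, height, width):
    """Shape-adjusting sketch: pad (with white) or crop the page to a
    fixed overall size expected by the later recognition steps."""
    canvas = np.full((height, width), 255, dtype=np.uint8)
    h = min(height, image.shape[0])
    w = min(width, image.shape[1])
    canvas[:h, :w] = image[:h, :w]
    return canvas

page = np.array([[200, 90], [30, 210]], dtype=np.uint8)
binarized = eliminate_background(page)   # light pixels -> 255, dark -> 0
resized = adjust_shape(binarized, 3, 3)  # padded to 3x3 with white
```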
[0154] The layout recognizing function identifies a range of each of a plurality of
elements contained in the document image acquired by the document image acquiring
function in the document image using a layout learning model 14 having learned
correspondence between a plurality of elements contained in document images and
identification information of the plurality of elements, recognizes a type of each of the
plurality of elements, and acquires position information of a range of each of the plurality of elements in the document image (S33: a layout recognizing step).
[0155] Types of elements can be classified into a necessary element and an unnecessary element according to the type of the document. In this case, the layout recognizing function may not acquire position information of a recognized element when the recognized element out of the plurality of elements contained in the document image acquired by the document image acquiring function corresponds to an unnecessary element and acquire position information of the recognized element when the recognized element corresponds to a necessary element. Alternatively, the layout recognizing function may recognize only necessary elements out of the plurality of elements contained in the document image 61 and acquire position information of the elements.
[0156] The layout recognizing function recognizes a type of each element, acquires position information of the range of each element in the document image, and then corrects the range of each element and the acquired position information on the basis of an actual document when the elements overlap each other or when the elements are excessively separated.
[0157] The layout recognizing function recognizes lengths and positions of all vertical lines and horizontal lines constituting a table. The layout recognizing function recognizes all the cells contained in the table by recognizing the lengths and the positions of all the vertical lines and the horizontal lines constituting the table. The layout recognizing function recognizes a rectangle constituted by two neighboring vertical lines and two neighboring horizontal lines as a cell.
[0158] The layout recognizing function also recognizes line types of lines constituting the table. The recognized line types are reflected in objects of lines constituting the table contained in an electronic document when the electronic document is reproduced on the basis of the acquired document image. Accordingly, for example, when a line of a table in the document image is a dotted line, the line of the table contained in the electronic document reproduced on the basis of the document image is expressed as an object of a dotted line.
[0159] The extraction function extracts each cell in a table contained in an element of which a type recognized by the layout recognizing function corresponds to a table and acquires position information of each cell in the document image (S34: an extraction step).
The extraction function reproduces all the vertical lines and the horizontal lines
constituting the table recognized by the layout recognizing function and generates position
information of all the cells.
[0160] A cell extracted by the extraction function may include a plurality of character strings. When a plurality of rows of character strings are contained in the
extracted cell, the extraction function extracts an image of each character string of all the
character strings.
[0161] An image of a character string recognized by the layout recognizing
function and an image of a character string extracted by the extraction function are sent to
the character recognizing function for each row.
[0162] The character recognizing function recognizes a character string contained in the document image acquired by the document image acquiring function using a character
string learning model having learned correspondence between document images and
character strings contained in the document image and generates text data of the character
string (S35: a character recognizing step).
[0163] The output function outputs the text data as text of an electronic medium
(Step S36: an output step).
The output function outputs the text data on the basis of the position information of the
character strings acquired by the layout recognizing function and the position information of
the cells acquired by the extraction unit in the document image and reproduces the text data
as text of the electronic medium.
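A minimal sketch of the output function's placement logic, assuming each recognized element carries its text data and the (left, top) coordinate of its range (this record shape is illustrative, not the device's actual format):

```python
def reproduce_text(elements):
    """Output-function sketch: place recognized character strings back in
    reading order using their position information (top coordinate first,
    then left coordinate) and join them into the text of the electronic
    medium."""
    ordered = sorted(elements, key=lambda e: (e["box"][1], e["box"][0]))
    return "\n".join(e["text"] for e in ordered)

elements = [
    {"text": "1,200 JPY", "box": (180, 60)},
    {"text": "Receipt",   "box": (40, 10)},
    {"text": "Total",     "box": (40, 60)},
]
# -> "Receipt\nTotal\n1,200 JPY"
```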
[0164] The electronic document generation program in an example in which a
document image of a receipt is converted to an electronic document will be described below
with reference to FIGS. 26 to 28. FIGS. 26 to 28 are flowcharts illustrating the electronic
document generation program according to the embodiment. The flowcharts illustrated in
FIGS. 26 to 28 are combined to form one flowchart of the electronic document generation
program.
[0165] In Step S102, the document image acquiring unit 31 acquires a document image or a PDF from the document image database 15.
In Step S103, it is determined whether data acquired by the document image acquiring
unit 31 is a PDF. When the acquired data is not a PDF (S103: NO), that is, when the data
acquired by the document image acquiring unit 31 is a document image, the process flow
proceeds to Step S106.
[0166] When the data acquired by the document image acquiring unit 31 is a PDF (S103: YES), the PDF is converted to a document image in Step S104 and then the document
image is acquired (S105).
[0167] In Step S106, the preprocessing unit 32 performs preprocessing on the
acquired document image. The preprocessing unit 32 includes the background eliminating
unit 32a, the tilt correcting unit 32b, and the shape adjusting unit 32c.
[0168] The background eliminating unit 32a eliminates the background of the acquired document image. When a character string contained in the acquired document
image is tilted, the tilt correcting unit 32b corrects the tilt of the character string by
performing tilt correction. The shape adjusting unit 32c adjusts the shape and the size of
the acquired document image as a whole.
[0169] In Step S107, the layout recognizing unit 33 acquires the document image subjected to the preprocessing performed by the preprocessing unit 32.
The acquired preprocessed document image is sent to a document image extracting
process of Steps S115, S120, and S136 which will be described later.
[0170] In Steps S108 and S109, the layout recognizing unit 33 performs layout recognition of the document image, identifies a range of each of a plurality of elements
contained in the document image, and acquires a type and position information of each
element.
The type of each element is a character string, a table, an image, a seal, or handwriting.
[0171] In Step S110, the layout recognizing unit 33 performs a process of adjusting
position information of a minimum boundary box of the acquired element.
The minimum boundary box means a rectangle with a minimum area out of rectangles surrounding the element and means a range occupied by the element. The layout recognizing unit 33 compares the document image with the acquired element and adjusts the position information of the minimum boundary box of the acquired element when there is displacement between the document image and the acquired position information of the element.
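The minimum boundary box described here can be computed from an element mask as follows; this NumPy sketch assumes the element's pixels are available as a binary mask, which is an illustrative assumption:

```python
import numpy as np

def minimum_boundary_box(mask):
    """Compute the minimum boundary box of an element: the rectangle
    (left, top, right, bottom) with a minimum area out of rectangles
    surrounding all nonzero pixels of the element mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:3, 2:5] = 1                  # element occupies rows 1-2, columns 2-4
box = minimum_boundary_box(mask)    # -> (2, 1, 4, 2)
```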
[0172] In Step S111, the layout recognizing unit 33 acquires layout information subjected to the minimum boundary box adjusting process performed in Step S110. The layout information includes types and position information of elements.
[0173] In Step S112, the layout recognizing unit 33 determines whether another element remains in the document image with reference to the internally stored layout information of the elements from the process of Step S130 which will be described later.
[0174] When the internally stored layout information of the elements from the process of Step S130 includes layout information of all the elements, the layout recognizing unit 33 determines that another element does not remain in the document image (S112: NO), performs a process of ending the loop from Step S112 to Step S130 in Step S131 and then performs the process of Step S132.
[0175] In contrast, when the internally stored layout information of the elements from the process of Step S130 does not include layout information of all the elements, the layout recognizing unit 33 determines that another element remains in the document image (S112: YES) and then performs the process of Step S113.
[0176] In Step S113, the layout recognizing unit 33 determines whether the element remaining in the document image is a table. When a table does not remain in the document image (S113: NO), the layout recognizing unit 33 sends layout information other than a table to Step S130 which will be described later.
[0177] When a table remains in the document image (S113: YES), the process flow proceeds to Step S114. The document image is associated with a receipt and thus often includes a table. Accordingly, when it is determined that the document image does not include a table, the layout recognizing unit 33 may stop the process flow and ascertain whether the electronic document is associated with a receipt.
[0178] In Step S114, the layout recognizing unit 33 acquires sizes and position information of all the vertical lines and the horizontal lines constituting the table in the
document image. When the sizes and the position information of all the vertical lines and
the horizontal lines constituting the table are acquired, the sizes and the positions of all the
cells contained in the table can be acquired.
[0179] In Step S115, the extraction unit 34 performs a process of extracting an image of the table from the preprocessed document image acquired in the process of Step
S107.
In Step S116, the extraction unit 34 acquires an image of the table extracted in Step
S115.
[0180] In Steps S117 and S118, the extraction unit 34 performs a process of extracting cells from the image of the table acquired in Step S116 (Step S117) and acquires
information of the cells (Step S118).
[0181] The information of a cell includes a row, a column, and coordinates
corresponding to the position information of the cell in the table.
The information of cells acquired in Step S118 is sent to Step S127 which will be
described later.
[0182] In Step S119, the extraction unit 34 determines whether another cell remains
in the table with reference to the internally stored layout information of the table from the
process of Step S127.
[0183] When the internally stored layout information of the table from the process
of Step S127 includes layout information of all the cells, the extraction unit 34 determines
that another cell does not remain in the table (S119: NO), a process of ending the loop from
Step S119 to Step S127 is performed in Step S128, and then the process flow proceeds to
Step S130.
[0184] In contrast, when the internally stored layout information of the cell from
the process of Step S127 does not include layout information of all the cells, the extraction unit 34 determines that another cell remains in the table (S119: YES), and the process flow proceeds to Step S120.
[0185] In Step S120, the extraction unit 34 performs a process of extracting images of cells from the preprocessed document image acquired in the process of Step S107. In Step S121, the extraction unit 34 acquires the images of the cells extracted in the process of Step S120.
[0186] In Step S122, the character string recognizing unit 35 performs a character string recognizing process on the images of the cells acquired in the process of Step S121. In Step S123, the character string recognizing unit 35 acquires position information of the character string subjected to the character string recognizing process.
[0187] In Step S124, the character string recognizing unit 35 performs a process of adjusting position information of the minimum boundary box of the character string acquired in the process of Step S123. The character string recognizing unit 35 compares the acquired position information of the character string with the document image and adjusts the acquired position information of the minimum boundary box of the character string when there is a difference between the document image and the acquired position information of the character string.
[0188] In Step S125, the character string recognizing unit 35 acquires position information after the process of adjusting the position information of the minimum boundary box of the character string performed in Step S124 has been performed thereon.
[0189] In Steps S126 and S127, the character string recognizing unit 35 combines the information of the cell acquired in the process of Step S118 and the adjusted position information of the character string acquired in the process of Step S125 (Step S127) and internally stores the combined information as layout information of the table in an internal storage device (Step S126). The internal storage device is one or both of the RAM 23 and the storage unit 24 illustrated in FIG. 2.
[0190] The processes of Steps S119 to S127 are performed on all the cells contained in the table. The process of ending the loop of Step S128 is performed after the processes of Steps S119 to S127 have been performed on a final cell in the table, and the character string recognizing unit 35 performs the process of Step S130.
[0191] In Steps S129 and S130, the output unit 36 combines the layout information of the table acquired in the process of Step S126 and the layout information other than the table acquired in the process of Step S113 (Step S130) and internally stores the combined information in the internal storage device as layout information of all the elements (Step S129).
[0192] The processes of Steps S112 to S130 are performed on all the elements contained in the document image. The process of ending the loop of Step S131 is performed after the processes of Steps S112 to S130 have been performed on a final element in the document image, and the character string recognizing unit 35 performs the process of Step S132.
[0193] In Step S132, the character string recognizing unit 35 determines whether another element remains in the document image. The character string recognizing unit 35 determines whether another element remains in the document image with reference to the internally stored layout information of the elements from the process of Step S140 which will be described later.
[0194] When the internally stored layout information of the elements from the process of Step S140 includes layout information of all the elements, the character string recognizing unit 35 determines that another element does not remain in the document image (S132: NO) and performs a process of ending the loop from Step S132 to Step S140 in Step S141, and the process flow proceeds to Step S142.
[0195] In contrast, when the internally stored layout information of the elements from the process of Step S140 does not include layout information of all the elements, the character string recognizing unit 35 determines that another element remains in the document image (S132: YES), and the process flow proceeds to Step S133.
[0196] In Step S133, the character string recognizing unit 35 determines whether an element remaining in the document image is a character string. When it is determined that the element remaining in the document image is a character string (S133: YES), the character string recognizing unit 35 proceeds to Step S135.
[0197] When the character string recognizing unit 35 determines that the element remaining in the document image is not a character string (S133: NO), a loop resuming
process proceeding to Step S132 is performed (Step S134).
In Step S135, the character string recognizing unit 35 acquires the position information
of the character string.
[0198] In Steps S136 and S137, the character string recognizing unit 35 extracts an image of the character string from the preprocessed document image acquired in the process
of Step S107 (Step S136) and acquires the image of the character string.
[0199] In Steps S138 and S139, the character string recognizing unit 35 performs a
process of Step S137 (Step S138) and generates text data predicted in the character string
recognizing process (Step S139).
[0200] In Step S140, the character string recognizing unit 35 combines the position information of the character string acquired in the process of Step S135 and the text data
generated in the process of Step S139 to generate layout information of the elements. The
generated layout information of the elements is sent to Step S129. In Step S129, the sent
layout information of the elements is internally stored in the internal storage device. The
internal storage device means one or both of the RAM 23 and the storage unit 24 illustrated
in FIG. 2.
[0201] The processes of Steps S132 to S140 are performed until it is determined in
Step S132 that the internally stored layout information of the elements from the process of
Step S140 includes layout information of all the elements.
[0202] When it is determined in Step S132 that the internally stored layout
information of the elements from the process of Step S140 includes layout information of all
the elements, a process of ending the loop from Step S132 to Step S140 is performed in Step
S141, and the process flow proceeds to Step S142.
[0203] In Step S142, the electronic document generation device 10 performs post-processing. In the post-processing, output to JavaScript Object Notation (JSON) and conversion to tab-separated values (TSV), for example, are performed on the text data, the images, and the position information of all the elements.
The aforementioned processes of the functional units are performed by the CPU 25 of
the electronic document generation device 10.
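The JSON output and TSV conversion of the post-processing can be sketched as follows, assuming a simple per-element record shape (the field names are illustrative, not the device's actual format):

```python
import json

def to_json(elements):
    """Serialize the layout information of all elements to JSON."""
    return json.dumps(elements, ensure_ascii=False)

def to_tsv(elements):
    """Convert the recognized elements to tab-separated values: one
    element per row with its type, text data, and position."""
    rows = ["type\ttext\tleft\ttop"]
    for e in elements:
        rows.append(f'{e["type"]}\t{e["text"]}\t{e["box"][0]}\t{e["box"][1]}')
    return "\n".join(rows)

elements = [{"type": "Text", "text": "Receipt", "box": (40, 10)}]
tsv = to_tsv(elements)
```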
[0204] In Step S143, the output unit 36 outputs information of all the elements subjected to the post-processing as a final electronic document in a plain text file, a hypertext markup language (HTML) format, a file format that can be edited by commercially available character editing software, or an editable PDF format.
[0205] According to the embodiment, the electronic document generation device 10 recognizes a layout of a document image using the layout learning model 14 and
recognizes characters of the document image using the character string learning model 13.
That is, since the electronic document generation device 10 identifies types of a plurality of
elements contained in the document image and performs character recognition appropriate
for the type of an element, it is possible to improve recognition accuracy of character
recognition.
[0206] According to the embodiment, since the electronic document generation device 10 performs character recognition of the document image for each character string using the character string learning model 13, rather than for each character as in the OCR text recognition technique according to the related art, it is possible to improve recognition efficiency at the time of character recognition.
[0207] According to the embodiment, since the electronic document generation
device 10 performs character recognition for each character string instead of character
recognition for each character at the time of character recognition, it is possible to perform
character recognition with a reduced influence of noise overlapping characters, for example,
and to improve recognition accuracy of character recognition in comparison with character
recognition for each character.
[0208] According to the embodiment, even characters which are likely to be
erroneously recognized in character recognition using the OCR text recognition technique
according to the related art can be correctly recognized in character recognition using the character string learning model 13. For example, when a seal overlaps a character, the character is likely to be erroneously recognized in character recognition using the OCR text recognition technique according to the related art, but can be correctly recognized in character recognition using the character string learning model 13.
[0209] According to the embodiment, since the electronic document generation
device 10 performs character recognition for each character string contained in an image of
each cell in an element of which the type corresponds to a table, it is possible to improve
recognition accuracy of character recognition of a character string contained in the table.
[0210] According to the embodiment, since the character string learning model 13 and the layout learning model 14 are trained using the character string learning data and the
layout learning data with annotations, it is possible to improve recognition accuracy of the
layout recognizing unit 33 and the character string recognizing unit 35.
[0211] According to the embodiment, when a document image includes a table, all the vertical lines and the horizontal lines constituting the table are first recognized, and then
all the cells in the table are recognized. Thereafter, since character string recognition is
performed on each of the cells independently, without being affected by position
information in the table, it is possible to improve recognition accuracy of character
recognition of a character string in a cell.
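The cell recognition step of paragraph [0211] can be sketched as follows (a hypothetical helper, not the device's actual implementation): once all vertical and horizontal lines are recognized, every cell is enumerated from adjacent pairs of lines, and each resulting box can then be passed to string recognition independently of its position in the table.

```python
def cells_from_lines(vertical_xs, horizontal_ys):
    """Return (left, top, right, bottom) boxes for every cell in the grid."""
    xs = sorted(vertical_xs)
    ys = sorted(horizontal_ys)
    boxes = []
    for top, bottom in zip(ys, ys[1:]):        # adjacent horizontal lines
        for left, right in zip(xs, xs[1:]):    # adjacent vertical lines
            boxes.append((left, top, right, bottom))
    return boxes

# 3 vertical and 3 horizontal lines bound a 2x2 grid of cells.
boxes = cells_from_lines([0, 100, 200], [0, 50, 100])
print(len(boxes))   # 4 cells
print(boxes[0])     # (0, 0, 100, 50)
```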
[0212] The present disclosure is not limited to the electronic document generation
device 10 according to the embodiment, and can be carried out as various modified examples
or application examples without departing from the gist of the present disclosure described
in the appended claims.
[Reference Signs List]
[0213]
10 Electronic document generation device
11 Information communication network
12 User terminal
13 Character string learning model
14 Layout learning model
15 Document image database
20 Input/output interface
21 Communication interface
22 ROM
23 RAM
24 Storage unit
25 CPU
26 Input device
27 Output device
28 GPU
31 Document image acquiring unit
32 Preprocessing unit
32a Background eliminating unit
32b Tilt correcting unit
32c Shape adjusting unit
33 Layout recognizing unit
34 Extraction unit
35 Character string recognizing unit
36 Output unit
40 Layout learning data generating unit
41 Layout learning data correcting unit
42 Layout learning unit
43 Character string learning data generating unit
44 Character string learning data correcting unit
45 Character string learning unit
47 Pre-tilt-correction document image
48 Character string
49 Table
50 Stapler mark
51 Handwriting
52 Seal
53 Image
54 Noise removal
55 Preprocessing
56 Layout recognizing process
57 Character string recognizing process
58a, 59a, 60a Document image
58b, 59b, 60b Document image
61, 62 Document image
63, 64 Table
65 Vertical line
66 Horizontal line
67 Cell image
69, 70, 73 Image of character string
71a Image of character string
71b Text data
72 Recognition range
73 Table
75 Recognition range
76 Annotation symbol of character string
77 Annotation symbol of table
78 Annotation symbol of image
79 Annotation symbol of seal
80 Annotation symbol of outer line
81 Annotation symbol of noise
82 Annotation symbol of handwriting
83 Annotation symbol of vertical line
84 Annotation symbol of horizontal line
85 Annotation of text data
100 Electronic document generation system
S31 Document image acquiring step
S32 Preprocessing step
S33 Layout recognizing step
S34 Extraction step
S35 Character string recognizing step
S36 Output step

Claims (16)

1. An electronic document generation device comprising:
a document image acquiring unit configured to acquire a document image
obtained by imaging a document;
a character string recognizing unit configured to recognize a character string
contained in the document image acquired by the document image acquiring unit using a
character string learning model having learned correspondence between document images
and character strings contained in the document images and to generate text data of the
character string; and
an output unit configured to output the text data as text of an electronic medium.
2. The electronic document generation device according to claim 1, further
comprising a layout recognizing unit configured to identify a range of each of a plurality of
elements contained in the document image acquired by the document image acquiring unit
in the document image using a layout learning model having learned correspondence
between a plurality of elements contained in document images and identification
information of the plurality of elements, to recognize a type of each of the plurality of
elements, and to acquire position information of each of the plurality of elements in the
document image associated with the range,
wherein the character string recognizing unit recognizes a character string
contained in the range recognized by the layout recognizing unit using the character string
learning model and generates text data of the character string, and
wherein the output unit outputs the text data associated with the plurality of
elements in the position information of the range associated with the plurality of elements
as text of an electronic medium.
3. The electronic document generation device according to claim 2, wherein the
type of each element is one of a character string, a table, an image, a seal, and handwriting.
4. The electronic document generation device according to claim 3, further
comprising an extraction unit configured to extract each of cells in a table included in an
element of which the type recognized by the layout recognizing unit corresponds to a table
and to acquire position information of each cell in the document image,
wherein the character string recognizing unit recognizes a character string
included in each cell extracted by the extraction unit using the character string learning
model and generates text data of the character string.
5. The electronic document generation device according to any one of claims 2 to
4, wherein annotations associated with the types corresponding to the elements are given to
the elements in the document image including the plurality of elements,
wherein the electronic document generation device further comprises a layout
learning data generating unit configured to accumulate a plurality of the document images
to which the annotations are given to generate layout learning data, and
wherein the layout learning data is used for supervised learning of the layout
learning model.
6. The electronic document generation device according to claim 5, wherein
position information of ranges associated with the plurality of elements included in the
document image in the document image along with the annotations is given to the
document image.
7. The electronic document generation device according to claim 5 or 6, further
comprising a layout learning data correcting unit configured to correct at least one of the
types of the plurality of elements recognized by the layout recognizing unit and the
position information of the ranges of the plurality of elements in the document image on
the basis of an input and to update the layout learning data by adding the corrected data.
8. The electronic document generation device according to claim 7, further
comprising a layout learning unit configured to perform re-learning of the layout learning
model using the layout learning data updated by the layout learning data correcting unit.
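The correct-and-retrain loop of claims 7 and 8 (and, analogously, claims 10 and 11) can be sketched schematically; the class and method names below are hypothetical, not taken from the specification. Corrections entered for recognized output are folded back into the learning data, and the updated data set is what a subsequent re-learning pass would consume.

```python
class LearningDataStore:
    def __init__(self):
        self.samples = []          # (input id, label) pairs for supervised learning

    def add_initial(self, image_id, predicted_label):
        # Accumulate the recognizer's output as provisional learning data.
        self.samples.append((image_id, predicted_label))

    def correct(self, image_id, corrected_label):
        # Replace the label for a sample on the basis of an operator's input;
        # the corrected pair becomes part of the updated learning data.
        self.samples = [
            (img, corrected_label if img == image_id else label)
            for img, label in self.samples
        ]

store = LearningDataStore()
store.add_initial("doc-001.png", "1nvoice")   # mis-recognized text
store.correct("doc-001.png", "invoice")       # operator's correction
print(store.samples)  # data set that re-learning would use
```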
9. The electronic document generation device according to any one of claims 2 to
8, further comprising a character string learning data generating unit configured to generate
character string learning data which is used for supervised learning of the character string
learning model.
10. The electronic document generation device according to claim 9, further
comprising a character string learning data correcting unit configured to correct text data
generated by the character string recognizing unit on the basis of an input and to update the
character string learning data by adding the corrected text data.
11. The electronic document generation device according to claim 10, further
comprising a character string learning unit configured to perform re-learning of the
character string learning model using the character string learning data updated by the
character string learning data correcting unit.
12. The electronic document generation device according to any one of claims 2
to 11, wherein the character string recognizing unit includes a plurality of the character
string learning models and uses the character string learning models adapted to languages
of the character strings included in the plurality of elements.
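Claim 12's dispatch among language-adapted models can be illustrated with a minimal sketch (all interfaces below are hypothetical): the character string recognizing unit holds several string learning models and selects the one matching the language of each element.

```python
class StringModel:
    def __init__(self, language):
        self.language = language

    def recognize(self, image):
        # Placeholder for inference by a language-specific string model.
        return f"<{self.language} text from {image}>"

class CharacterStringRecognizingUnit:
    def __init__(self, models):
        # Index the available models by the language they are adapted to.
        self.models = {m.language: m for m in models}

    def recognize(self, image, language):
        return self.models[language].recognize(image)

unit = CharacterStringRecognizingUnit([StringModel("ja"), StringModel("en")])
print(unit.recognize("cell_3.png", "en"))
```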
13. The electronic document generation device according to any one of claims 2
to 12, further comprising a preprocessing unit configured to perform preprocessing on the
document image acquired by the document image acquiring unit,
wherein the preprocessing unit includes a background eliminating unit, a tilt
correcting unit, and a shape adjusting unit,
wherein the background eliminating unit eliminates a background of the document image acquired by the document image acquiring unit,
wherein the tilt correcting unit corrects a tilt of the document image acquired by the document image acquiring unit, and
wherein the shape adjusting unit adjusts a shape and a size of the document image as a whole acquired by the document image acquiring unit.
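The preprocessing chain of claim 13 can be sketched as a simple sequential pipeline (the functions below are hypothetical placeholders; real implementations would operate on pixel arrays): background elimination, then tilt correction, then adjustment of the shape and size of the document image as a whole.

```python
def eliminate_background(doc):
    # Placeholder for removing the background from the document image.
    return {**doc, "background": None}

def correct_tilt(doc):
    # Placeholder for deskewing the document image.
    return {**doc, "tilt_deg": 0.0}

def adjust_shape(doc, width=2480, height=3508):   # e.g. A4 at 300 dpi
    # Placeholder for normalizing the overall shape and size.
    return {**doc, "size": (width, height)}

def preprocess(doc):
    # Apply the three preprocessing steps in sequence.
    for step in (eliminate_background, correct_tilt, adjust_shape):
        doc = step(doc)
    return doc

raw = {"background": "gray", "tilt_deg": 2.3, "size": (2400, 3300)}
print(preprocess(raw))
```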
14. The electronic document generation device according to any one of claims 2
to 13, wherein the layout learning model is one of a layout learning model for a contract, a
layout learning model for a bill, a layout learning model for a memorandum, a layout
learning model for a delivery note, and a layout learning model for a receipt.
15. An electronic document generation method that is performed by a computer
used for an electronic document generation device, the electronic document generation
method comprising:
a document image acquiring step of acquiring a document image obtained by
imaging a document;
a character string recognizing step of recognizing a character string contained in
the document image acquired in the document image acquiring step using a character string
learning model having learned correspondence between document images and character
strings contained in the document images and generating text data of the character string;
and
an output step of outputting the text data as text of an electronic medium.
16. An electronic document generation program causing a computer used for an
electronic document generation device to perform:
a document image acquiring function of acquiring a document image obtained by
imaging a document;
a character string recognizing function of recognizing a character string contained in the document image acquired by the document image acquiring function using a character string learning model having learned correspondence between document images and character strings contained in the document images and of generating text data of the character string; and
an output function of outputting the text data as text of an electronic medium.

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2020-219612 2020-12-28
JP2020219612A JP7150809B2 (en) 2020-12-28 2020-12-28 Document digitization architecture by multi-model deep learning, document image processing program
PCT/JP2021/047935 WO2022145343A1 (en) 2020-12-28 2021-12-23 Architecture for digitalizing documents using multi-model deep learning, and document image processing program

Publications (2)

Publication Number Publication Date
AU2021412659A1 true AU2021412659A1 (en) 2023-07-13
AU2021412659A9 AU2021412659A9 (en) 2024-02-08

Family

ID=82259389

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021412659A Pending AU2021412659A1 (en) 2020-12-28 2021-12-23 Architecture for digitalizing documents using multi-model deep learning, and document image processing program

Country Status (3)

Country Link
JP (2) JP7150809B2 (en)
AU (1) AU2021412659A1 (en)
WO (1) WO2022145343A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH045779A (en) * 1990-04-24 1992-01-09 Oki Electric Ind Co Ltd Character recognizing device
JP3162158B2 (en) * 1992-02-20 2001-04-25 株式会社リコー Image reading device
JPH07160809A (en) * 1993-12-08 1995-06-23 Nec Corp Ocr device
JPH08263588A (en) * 1995-03-28 1996-10-11 Fuji Xerox Co Ltd Character recognition device
JP3940491B2 (en) * 1998-02-27 2007-07-04 株式会社東芝 Document processing apparatus and document processing method
JP2004178010A (en) * 2002-11-22 2004-06-24 Toshiba Corp Document processor, its method, and program
JP2020046860A (en) * 2018-09-18 2020-03-26 株式会社三菱Ufj銀行 Form reading apparatus
JP6590355B1 (en) * 2019-04-26 2019-10-16 Arithmer株式会社 Learning model generation device, character recognition device, learning model generation method, character recognition method, and program

Also Published As

Publication number Publication date
JP2022169754A (en) 2022-11-09
WO2022145343A1 (en) 2022-07-07
AU2021412659A9 (en) 2024-02-08
JP7150809B2 (en) 2022-10-11
JP2022104411A (en) 2022-07-08


Legal Events

Date Code Title Description
SREP Specification republished