WO2022145343A1 - Architecture for digitalizing documents using multi-model deep learning, and document image processing program

Info

Publication number
WO2022145343A1
WO2022145343A1 (PCT/JP2021/047935)
Authority
WO
WIPO (PCT)
Prior art keywords
character string
document image
layout
unit
document
Application number
PCT/JP2021/047935
Other languages
French (fr)
Japanese (ja)
Inventor
ホサイン シャハリアル シェイク
Original Assignee
Deloitte Touche Tohmatsu LLC (有限責任監査法人トーマツ)
Application filed by Deloitte Touche Tohmatsu LLC (有限責任監査法人トーマツ)
Priority: AU2021412659A1 (AU)
Publication of WO2022145343A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/14: Image acquisition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/19: Recognition using electronic means
    • G06V 30/192: Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V 30/194: References adjustable by an adaptive method, e.g. learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40: Document-oriented image-based pattern recognition
    • G06V 30/41: Analysis of document content
    • G06V 30/412: Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Definitions

  • The present invention relates to an electronic document generation device, an electronic document generation method, and an electronic document generation program, and more particularly to an electronic document generation device that scans a paper document to generate an electronic document, together with the corresponding method and program.
  • An object of the electronic document generation device, electronic document generation method, and electronic document generation program of the present disclosure is to convert a character string included in a document image into text data by a method different from conventional optical character recognition.
  • According to a first aspect, the electronic document generation device includes: a document image acquisition unit that acquires a document image obtained by imaging a document; a character string recognition unit that, using a character string learning model that has learned the correspondence between a document image and the character strings included in the document image, recognizes the character strings included in the document image acquired by the document image acquisition unit and generates text data for those character strings; and an output unit that outputs the text data as text in an electronic medium.
  • According to a second aspect, the electronic document generation device according to the first aspect further includes a layout recognition unit that, using a layout learning model that has learned the correspondence between a plurality of elements included in a document image and the identification information of each of those elements, specifies the range of each of the plurality of elements within the document image acquired by the document image acquisition unit, recognizes the type of each element, and acquires the position information in the document image of the range of each element. The character string recognition unit may recognize, using the character string learning model, the character strings included in the ranges specified by the layout recognition unit and generate text data for them, and the output unit may output the text data of each of the plurality of elements as text in the electronic medium at the position information of the range of that element.
  • According to a third aspect, in the electronic document generation device according to the second aspect, the type of an element may be any of a character string, a table, an image, a seal, or handwriting.
  • According to a fourth aspect, the electronic document generation device according to the second or third aspect further includes a cutout unit that cuts out each cell of a table included in an element and acquires the position information of each cell in the document image. The character string recognition unit may recognize, using the character string learning model, the character strings included in each of the cells cut out by the cutout unit and generate text data for those character strings.
  • According to a fifth aspect, the electronic document generation device according to any of the second to fourth aspects further includes a layout learning data generation unit that generates layout learning data by accumulating a plurality of document images, each containing a plurality of elements, in which each element is given an annotation associated with its corresponding type; the layout learning data may be used for supervised learning of the layout learning model.
  • According to a sixth aspect, in the electronic document generation device according to the fifth aspect, the position information in the document image of the range of each of the plurality of elements included in a document image may be added to the document image together with the annotations.
  • According to a seventh aspect, the electronic document generation device according to the fifth or sixth aspect may further include a layout learning data correction unit that, based on input, corrects at least one of the type of each of the plurality of elements recognized by the layout recognition unit and the position information in the document image of the range of each of those elements, and updates the layout learning data by adding the corrected data.
  • According to an eighth aspect, the electronic document generation device according to the seventh aspect may further include a layout learning unit that relearns the layout learning model using the layout learning data updated by the layout learning data correction unit.
  • According to a ninth aspect, the electronic document generation device according to any of the second to eighth aspects may further include a character string learning data generation unit that generates character string learning data used for supervised learning of the character string learning model.
  • According to a tenth aspect, the electronic document generation device according to the ninth aspect may further include a character string learning data correction unit that corrects the text data generated by the character string recognition unit based on input and updates the character string learning data by adding the corrected text data.
  • According to an eleventh aspect, the electronic document generation device according to the tenth aspect may further include a character string learning unit that relearns the character string learning model using the character string learning data updated by the character string learning data correction unit.
  • According to a twelfth aspect, in the electronic document generation device according to any of the second to eleventh aspects, the character string recognition unit may include a plurality of character string learning models and use the character string learning model adapted to the language of the character string included in each of the plurality of elements.
  • According to a thirteenth aspect, the electronic document generation device according to any of the second to twelfth aspects further includes a preprocessing unit that preprocesses the document image acquired by the document image acquisition unit. The preprocessing unit includes a background removal unit, a tilt correction unit, and a shape adjustment unit: the background removal unit removes the background of the acquired document image, the tilt correction unit corrects the tilt of the acquired document image, and the shape adjustment unit may adjust the overall shape and size of the acquired document image.
  • According to a fourteenth aspect, in the electronic document generation device according to any of the second to thirteenth aspects, the layout learning model may be any of a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memorandums, a layout learning model for delivery notes, or a layout learning model for receipts.
  • According to a fifteenth aspect, the electronic document generation method causes a computer used in the electronic document generation device to execute: a document image acquisition step of acquiring a document image obtained by imaging a document; a character string recognition step of recognizing, using a character string learning model that has learned the correspondence between a document image and the character strings included in it, the character strings included in the document image acquired in the document image acquisition step and generating text data for those character strings; and an output step of outputting the text data as text in an electronic medium.
  • According to a sixteenth aspect, the electronic document generation program causes a computer used in the electronic document generation device to realize: a document image acquisition function of acquiring a document image obtained by imaging a document; a character string recognition function of recognizing, using a character string learning model that has learned the correspondence between a document image and the character strings included in it, the character strings included in the document image acquired by the document image acquisition function and generating text data for those character strings; and an output function of outputting the text data as text in an electronic medium.
  • The electronic document generation device includes a document image acquisition unit that acquires a document image obtained by imaging a document, a character string recognition unit that, using a character string learning model that has learned the correspondence between a document image and the character strings it contains, recognizes the character strings included in the acquired document image and generates text data for them, and an output unit that outputs the text data as text in an electronic medium. Since the character strings included in the document image are recognized using a machine-learned model, the recognition efficiency of character recognition when converting a document image into text data can be improved.
  • FIG. 1 is a diagram showing an outline of an electronic document generation system 100 including an electronic document generation device 10.
  • The electronic document generation system 100 includes the electronic document generation device 10, a user terminal 12, a character string learning model 13, a layout learning model 14, a document image database 15, and the like.
  • The electronic document generation device 10, the user terminal 12, the character string learning model 13, the layout learning model 14, and the document image database 15 are connected to an information communication network 11 and can communicate with one another.
  • The electronic document generation system 100 uses the electronic document generation device 10 to recognize the character strings included in a document image and generate text data.
  • The electronic document generation device 10 recognizes the layout of the document image using the layout learning model, and recognizes the character strings included in the document image using the character string learning model.
  • The electronic document generation device 10 is an information processing device, a kind of computer represented by, for example, a personal computer.
  • The electronic document generation device 10 also encompasses the arithmetic processing units and microcomputers contained in various computers, as well as any device or apparatus capable of realizing the functions according to the present disclosure through an application.
  • The character string learning model 13 is a learning model that recognizes images of the character strings included in a document image, and is used for the character recognition of the electronic document generation device 10.
  • The storage location of the character string learning model 13 is arbitrary as long as the electronic document generation device 10 can use it via the information communication network 11; it is stored in an information processing device such as a personal computer, a server device, or a database.
  • For convenience of explanation of the present embodiment, the character string learning model 13 also denotes the information processing device in which the character string learning model 13 is stored.
  • The character string learning model 13 may be configured from an existing learning model, or may be independently configured as a learning model suited to the use of the electronic document generation device 10.
  • The character string learning model 13 is provided as learning models suited to various languages such as Japanese, English, and Chinese; in FIG. 1, these are denoted the first character string learning model, the second character string learning model, and the third character string learning model.
  • The character string learning model 13 is not limited to one connected to the information communication network 11; it may be included in the electronic document generation device 10 and used under the device's direct control. Further, the character string learning model 13 may be distributed and stored across a plurality of information processing devices connected to the information communication network 11.
  • The layout learning model 14 is a learning model that learns, based on the layout learning data described later, the correspondence between a plurality of elements included in a document image and the identification information of each of those elements, and that recognizes the layout of a document image; it is used for the layout recognition of the electronic document generation device 10. Like the character string learning model 13, the layout learning model 14 may be stored anywhere as long as the electronic document generation device 10 can use it via the information communication network 11, and it is stored in an information processing device connected to the information communication network 11. For convenience of explanation of the present embodiment, the layout learning model 14 also denotes the information processing device in which the layout learning model 14 is stored.
  • The layout learning model 14 includes a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memorandums, a layout learning model for delivery notes, a layout learning model for receipts, and the like.
  • The layout learning model for contracts is a learning model that recognizes the layout of document images of contracts, and it learns using the layout learning data for contracts.
  • The layout learning model for contracts learns what kind of information appears at what position in a contract; in particular, it learns layout features peculiar to contracts, such as text often being set out in bullet points, tables often being absent, and a handwritten signature column being present.
  • The layout learning data for contracts is generated based on, for example, 200 types of contract forms, with at least three or four contract document images per form, to which the annotations described later are added.
  • The layout learning model for invoices is a learning model that recognizes the layout of document images of invoices, and it learns using the layout learning data for invoices.
  • The layout learning model for invoices learns what information appears at what position on an invoice; in particular, it learns layout features peculiar to invoices, such as a table often occupying a large area and, even in documents written in Japanese, not a few words being alphanumeric.
  • The layout learning data for invoices is generated based on, for example, 200 types of invoice forms, with at least three or four invoice document images per form, to which the annotations described later are added.
  • The layout learning model for memorandums is a learning model that recognizes the layout of document images of memorandums, and it learns using the layout learning data for memorandums.
  • The layout learning model for memorandums learns what kind of information appears at what position in a memorandum; in particular, it learns layout features peculiar to memorandums, such as tables often being absent and a handwritten signature line being present.
  • The layout learning data for memorandums is generated based on, for example, 200 types of memorandum forms, with at least three or four memorandum document images per form, to which the annotations described later are added.
  • The layout learning model for delivery notes is a learning model that recognizes the layout of document images of delivery notes, and it learns using the layout learning data for delivery notes.
  • The layout learning model for delivery notes learns what kind of information appears at what position on a delivery note; in particular, it learns layout features peculiar to delivery notes, such as a table often occupying a large range and product names and product numbers often being listed.
  • The layout learning data for delivery notes is generated based on, for example, 200 types of delivery note forms, with at least three or four delivery note document images per form, to which the annotations described later are added.
  • The layout learning model for receipts is a learning model that recognizes the layout of document images of receipts, and it learns using the layout learning data for receipts.
  • The layout learning model for receipts learns what kind of information appears at what position on a receipt; in particular, it learns layout features peculiar to receipts, such as often containing a handwritten amount column or a table holding the amount.
  • The layout learning data for receipts is generated based on, for example, 200 types of receipt forms, with at least three or four receipt document images per form, to which the annotations described later are added.
  • The layout learning model 14 is not limited to being used by the electronic document generation device 10 via the information communication network 11; it may be included in the electronic document generation device 10. Further, the layout learning model 14 may be distributed and stored across a plurality of information processing devices connected to the information communication network 11.
  • The document image database 15 is a database that stores images of documents.
  • The electronic document generation device 10 acquires document images stored in the document image database 15 and generates the character string learning data used for learning the character string learning model and the layout learning data used for learning the layout learning model.
  • The user terminal 12 is used for operating the electronic document generation device 10.
  • The electronic document is modified according to correction input from the user of the user terminal 12; the electronic document generation device 10 accepts the correction and relearns at least one of the character string learning model 13 and the layout learning model 14.
  • FIG. 2 is a block diagram showing the hardware configuration of the electronic document generation device 10.
  • The electronic document generation device 10 has an input/output interface 20, a communication interface 21, a Read Only Memory (ROM) 22, a Random Access Memory (RAM) 23, a storage unit 24, a Central Processing Unit (CPU) 25, and a Graphics Processing Unit (GPU) 28.
  • The input/output interface 20 sends and receives data and the like to and from devices external to the electronic document generation device 10.
  • The external devices are an input device 26 and an output device 27 that input and output data and the like to and from the electronic document generation device 10.
  • The input device 26 is a keyboard, a mouse, a scanner, and the like.
  • The output device 27 is a monitor, a printer, a speaker, and the like.
  • The communication interface 21 handles the data input and output of the electronic document generation device 10 when communicating with the outside via the information communication network 11.
  • The storage unit 24 serves as a storage device, recording the various applications required for the electronic document generation device 10 to operate, the various data those applications use, and the like.
  • The GPU 28 is suited to the many repetitive operations involved in executing machine learning and the like, and is used together with the CPU 25.
  • The electronic document generation device 10 stores the electronic document generation program described later in the ROM 22 or the storage unit 24, and loads the electronic document generation program into a main memory composed of the RAM 23 and the like.
  • The CPU 25 accesses the main memory into which the electronic document generation program has been loaded and executes the electronic document generation program.
  • FIG. 3 is a diagram showing an outline of the processing performed by the electronic document generation device 10.
  • The electronic document generation device 10 performs the following Processes I to III in this order.
  • In Process I, preprocessing 55 including "background removal", "tilt correction", and "shape adjustment" of the document image is performed.
  • The preprocessing 55 prepares an image containing character strings so that character recognition using a learning model can be carried out more easily; its purpose is to improve the accuracy of the recognition processing performed in Processes II and III.
  • In Process II, the layout recognition process 56 is performed.
  • In the layout recognition process 56, "layout recognition" of the document image is performed first.
  • Layout recognition is a process of recognizing what kind of information is present at which position in the input image.
  • Information here refers to character strings, tables, images, seals, handwriting, and the like.
  • When the electronic document generation device 10 recognizes the layout of the document image and the document image contains a table, the electronic document generation device 10 performs "table recognition" and "cutting out of cell images" for the cells included in the table.
  • In Process III, the character string recognition process 57 is performed. The character string recognition process 57 converts an image containing a character string into text data using the character string learning model 13, which has learned the correspondence between images and the character strings they contain.
  • The character string recognition process 57 may include processing such as "arrangement of text data" and "noise removal".
  • The image of the character string is converted into text data, and "arrangement of text data" and "noise removal" are performed. "Arrangement of text data" means that when the image of a cut-out character string contains a space, the space is recognized together with the character string, so the text data is arranged with the space preserved.
  • "Noise removal" means that when noise is contained in the image of a cut-out character string, the noise is passively removed from the text data because the electronic document generation device 10 does not recognize it.
  • The noise referred to here is pixels included in the image of a cut-out character string that do not form characters.
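  • Taken together, Processes I to III form a simple pipeline. The following is a minimal sketch of that flow in Python; the function and field names are illustrative stand-ins, not identifiers from this disclosure, and each stage is elaborated in the sketches further below.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Element:
    kind: str                        # "string", "table", "image", "seal", "handwriting"
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in the document image
    text: str = ""                   # filled in by character string recognition

def preprocess(image):               # Process I placeholder (see later sketches)
    return image

def recognize_layout(image) -> List[Element]:   # Process II placeholder
    return []

def recognize_string(image) -> str:  # Process III placeholder (learned model)
    return ""

def crop(image, bbox):
    x, y, w, h = bbox
    return image[y:y + h, x:x + w]

def digitize(image) -> List[Element]:
    image = preprocess(image)                   # Process I
    elements = recognize_layout(image)          # Process II
    for e in elements:
        if e.kind == "string":
            e.text = recognize_string(crop(image, e.bbox))  # Process III
    return elements
```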
  • FIG. 4 is a block diagram showing the functional configuration of the electronic document generation device 10.
  • The electronic document generation device 10 has, realized on the CPU 25, a document image acquisition unit 31; a preprocessing unit 32 with a background removal unit 32a, a tilt correction unit 32b, and a shape adjustment unit 32c; a layout recognition unit 33; a cutout unit 34; a character string recognition unit 35; an output unit 36; a layout learning data generation unit 40; a layout learning data correction unit 41; a layout learning unit 42; a character string learning data generation unit 43; a character string learning data correction unit 44; and a character string learning unit 45.
  • The document image acquisition unit 31 acquires a document image obtained by imaging a document.
  • The document image acquisition unit 31 may acquire a document image from the document image database 15.
  • The document image acquisition unit 31 may also obtain a document image from the scanner of the input device 26.
  • FIG. 5 is a diagram illustrating input data and output data of the electronic document generation device 10; FIG. 5A shows a document image acquired by the document image acquisition unit 31 as input data.
  • The document image contains noise such as a stapler mark 50, handwriting 51, a seal 52, and an image 53.
  • Such noise interferes with, or is simply unnecessary for, understanding the contents of the document, whether by a person or by an information processing device such as a personal computer.
  • Other examples of noise include holes made for filing and creases left in the paper. Creases can be perceived as lines and need to be removed so that they are not reflected in the electronic document.
  • The electronic document generation device 10 converts the character strings in the document image into text data and outputs an electronic document while maintaining the layout of the acquired document image (see FIG. 5(b)).
  • The electronic document generation device 10 removes the noise 54 by active processing for the stapler mark 50, the handwriting 51, the seal 52, and the image 53 that are recognized as noise; other pixels in the document image that are recognized as neither character strings nor noise are removed by passive processing, simply by not being carried over into the electronic document.
  • The table in the document image of FIG. 5B is output, together with the text data, as object data in the electronic document while maintaining its arrangement in the document image.
  • The electronic document generation device 10 can arbitrarily select which elements to include in the output electronic document.
  • The stapler mark 50, the handwriting 51, the seal 52, the image 53, and the like are removed in normal use, but the seal 52 and the image 53 can also be included in the electronic document and output as image data.
  • The preprocessing unit 32 (see FIG. 4) performs the preprocessing 55 on the document image acquired by the document image acquisition unit 31.
  • The preprocessing 55 is performed to improve the recognition accuracy of the image recognition using the learning models by the layout recognition unit 33 and the character string recognition unit 35, described later.
  • The preprocessing unit 32 includes a background removal unit 32a, a tilt correction unit 32b, and a shape adjustment unit 32c.
  • The background removal unit 32a removes the background of the document image acquired by the document image acquisition unit 31.
  • FIG. 6 is a diagram illustrating the background removal performed in the preprocessing 55.
  • FIG. 6A shows a document image 58a before the background is removed.
  • FIG. 6B shows a document image 58b after the background is removed.
  • The background removal unit 32a removes the background of the document image by changing the background color of the document image to white. Specifically, the background removal unit 32a detects the background color of the acquired document image and determines whether or not the background color is white. When it determines that the background color is not white, the background removal unit 32a extracts the information other than the background of the document image, makes the background color white, and then superimposes the extracted information.
  • By removing the background, the background removal unit 32a eliminates noise that would cause misrecognition in the image recognition performed by the layout recognition unit 33 and the character string recognition unit 35, so recognition accuracy can be improved.
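  • A minimal sketch of this background-whitening step, written in Python with OpenCV; the median/Otsu heuristic is an assumption for illustration, since the disclosure does not fix a particular algorithm.

```python
import cv2
import numpy as np

def remove_background(bgr: np.ndarray) -> np.ndarray:
    """Force background pixels to white while keeping dark content (text, lines)."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Estimate the background level from the median pixel; if it is already
    # near white, leave the image unchanged.
    if np.median(gray) >= 245:
        return bgr
    # Otsu's threshold separates dark foreground from the tinted background.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    out = np.full_like(bgr, 255)      # white canvas
    out[mask > 0] = bgr[mask > 0]     # superimpose the extracted content
    return out
```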
  • The tilt correction unit 32b (see FIG. 4) corrects the tilt of the document image acquired by the document image acquisition unit 31.
  • The processing performed by the tilt correction unit 32b will be described with reference to FIG. 7.
  • FIG. 7 is a diagram illustrating the tilt correction performed in the preprocessing 55.
  • FIG. 7A shows the document image 59a before the tilt correction.
  • FIG. 7B shows the document image 59b after the tilt correction.
  • The tilt correction unit 32b corrects the tilt of a character string when the document image contains a tilted character string, making the character string parallel or perpendicular to the writing direction.
  • When the document image is written vertically, the tilt correction unit 32b corrects a tilted character string so that it is parallel to the vertical writing direction; when the document image is written horizontally, it corrects a tilted character string so that it is parallel to the horizontal writing direction.
  • The tilt correction unit 32b extracts the character strings of the document image and determines whether any extracted character string is tilted. When it determines that a tilted character string exists, the tilt correction unit 32b detects the tilt angle of that character string with respect to the writing direction and applies a rotation so that the tilt angle becomes zero.
  • Correcting the tilt of character strings improves the recognition accuracy of image recognition by the character string recognition unit 35, and also reduces layout recognition errors by the layout recognition unit 33.
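  • A minimal deskew sketch in Python with OpenCV, assuming the dominant tilt angle is estimated from the minimum-area rectangle around all content pixels; this is one common heuristic, not the method prescribed by the disclosure.

```python
import cv2
import numpy as np

def correct_tilt(bgr: np.ndarray) -> np.ndarray:
    """Rotate the image so that content runs parallel to the writing direction."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    pts = np.column_stack(np.where(binary > 0))          # (row, col) of ink pixels
    if pts.size == 0:
        return bgr
    # The minimum-area rectangle around the ink gives a dominant tilt angle.
    angle = cv2.minAreaRect(pts[:, ::-1].astype(np.float32))[-1]
    if angle > 45:   # fold OpenCV's (0, 90] angle convention into [-45, 45]
        angle -= 90
    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    # Note: minAreaRect angle conventions differ between OpenCV versions,
    # so the sign of `angle` should be verified against real scans.
    return cv2.warpAffine(bgr, m, (w, h),
                          borderMode=cv2.BORDER_CONSTANT,
                          borderValue=(255, 255, 255))
```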
  • The shape adjustment unit 32c (see FIG. 4) adjusts the overall shape and size of the document image acquired by the document image acquisition unit 31.
  • The processing performed by the shape adjustment unit 32c will be described with reference to FIG. 8.
  • FIG. 8 is a diagram illustrating the shape adjustment performed in the preprocessing 55.
  • FIG. 8A shows a document image 60a before the shape adjustment.
  • FIG. 8B shows a document image 60b after the shape adjustment.
  • The shape adjustment unit 32c adjusts the overall shape of the document image based on the overall shape of the actual document. Specifically, when the overall aspect ratio of the document image acquired by the document image acquisition unit 31 differs from the overall aspect ratio of the actual document, the shape adjustment unit 32c adjusts the document image so that its overall aspect ratio equals that of the actual document.
  • The shape adjustment unit 32c also adjusts the size of the document image acquired by the document image acquisition unit 31 to suit the subsequent processing.
  • By adjusting the shape and size of the document image acquired by the document image acquisition unit 31, the recognition accuracy of the subsequent layout recognition by the layout recognition unit 33 is improved in accordance with the actual document, and the recognition accuracy of image recognition by the character string recognition unit 35 can also be improved.
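  • A minimal sketch of the aspect-ratio and size normalization in Python with OpenCV; the physical dimensions and the target width are illustrative parameters, not values from the disclosure.

```python
import cv2
import numpy as np

def adjust_shape(bgr: np.ndarray, doc_w_mm: float, doc_h_mm: float,
                 target_w: int = 1240) -> np.ndarray:
    """Resize the scan so its aspect ratio matches the actual document and
    normalize its pixel size for the subsequent recognition steps."""
    target_h = round(target_w * doc_h_mm / doc_w_mm)
    return cv2.resize(bgr, (target_w, target_h), interpolation=cv2.INTER_AREA)

# e.g. adjust_shape(img, 210, 297) for an A4 original (210 mm x 297 mm)
```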
  • The layout recognition unit 33 uses the layout learning model 14, which has learned the correspondence between the plurality of elements included in a document image 61 and the identification information of each of those elements, to specify within the document image 61 the range of each of the plurality of elements included in the document image 61 acquired by the document image acquisition unit 31, to recognize the type of each element, and to acquire the position information in the document image 61 of the range of each element.
  • The type of an element may be any of a character string 48, a table 49, an image 53, a seal 52, or handwriting 51.
  • The type of an element is not limited to these; stapler marks 50, punch hole marks, breakage (tear) marks, carbon stains from copying, and the like may also be used.
  • The types of elements may be chosen to suit the type of document (for example, a contract, an invoice, a memorandum, a delivery note, or a receipt). For example, when copying carbon applied to the back of a receipt transfers to the front surface and becomes a stain, a carbon stain from copying may be used as an element type so that such stains can be actively removed.
  • The layout learning model 14 may be any of a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memorandums, a layout learning model for delivery notes, or a layout learning model for receipts.
  • The types of elements may be classified into necessary and unnecessary according to the type of document.
  • When a recognized element corresponds to an unnecessary type, the layout recognition unit 33 need not acquire the element's position information; when it corresponds to a necessary type, the layout recognition unit 33 may acquire the element's position information.
  • The layout recognition unit 33 may also recognize only the necessary elements among the plurality of elements included in the document image 61 and acquire the position information of those elements.
  • When recognized element ranges overlap each other, or when a recognized range falls short of the element it should cover, the layout recognition unit 33 corrects the range of each element and the acquired position information based on the actual document.
  • FIGS. 9A and 9B are diagrams for explaining the correction process for eliminating omissions in the layout recognition process; FIG. 9A shows the state before the correction, and FIG. 9B shows the state after the correction.
  • When the layout recognition unit 33 recognizes the image 70 of a character string included in the document image acquired by the document image acquisition unit 31 as a character string, it determines whether any part of the recognition range is missing, and if there is a missing part, performs a correction process to add it.
  • FIG. 9A shows how the layout recognition unit 33 recognizes the image 70 of the character string as a character string in the recognition range 72a.
  • The recognition range 72a misses the left end portion of the image 70 of the character string.
  • The layout recognition unit 33 determines whether there is a black line within a predetermined range around the recognition range 72a, and if there is, applies a correction that adds the range 72b containing the black line to the recognition range 72a (see FIG. 9(b)).
  • The determination performed by the layout recognition unit 33 is not limited to black lines; it may instead determine whether a line of the same color as the characters, or a line of a preset color, lies within the predetermined range around the recognition range 72a. This is because the main purpose of the omission-correction process performed in the layout recognition process is to improve the accuracy of the character recognition performed afterwards.
  • As a result, the character string recognition unit 35 can perform character recognition normally.
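  • A minimal sketch of this omission correction in Python: growing the recognition range while ink remains in a thin band beside it is one illustrative reading of the black-line check, and the names and margins are assumptions.

```python
import numpy as np

def fix_omission(binary: np.ndarray, box: tuple, margin: int = 5,
                 ink_threshold: int = 1) -> tuple:
    """Grow a recognition range (x, y, w, h) while ink pixels remain in a thin
    band just outside it; `binary` is nonzero where a pixel is character-colored."""
    x, y, w, h = box
    rows, cols = binary.shape
    changed = True
    while changed:
        changed = False
        # band immediately to the left of the range
        left = binary[y:y + h, max(0, x - margin):x]
        if left.size and left.sum() >= ink_threshold:
            grow = x - max(0, x - margin)
            x -= grow; w += grow; changed = True
        # band immediately to the right of the range
        right = binary[y:y + h, x + w:min(cols, x + w + margin)]
        if right.size and right.sum() >= ink_threshold:
            grow = min(cols, x + w + margin) - (x + w)
            w += grow; changed = True
    return x, y, w, h
```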
  • FIGS. 10A and 10B are diagrams for explaining the correction process for eliminating overlaps performed in the layout recognition process; FIG. 10A shows the state before the correction, and FIG. 10B shows the state after the correction.
  • When the layout recognition unit 33 recognizes the image 73 of a character string included in the document image acquired by the document image acquisition unit 31 as a character string, it determines whether the recognition range 75a overlaps another element (for example, the table 74), and if an overlap occurs, performs a correction process to eliminate it.
  • FIG. 10A shows how the layout recognition unit 33 recognizes the image 73 of the character string as a character string in the recognition range 75a.
  • The recognition range 75a extends beyond a blank (space) to the right of the image 73 of the character string and overlaps the table 74.
  • The layout recognition unit 33 determines whether there is a blank (space) of a predetermined size inside the recognition range 75a; if there is, it deletes the blank (space) and the portion of the recognition range 75a to the right of it, yielding the recognition range 75b (see FIG. 10B).
  • Since a blank (space) of a predetermined size always lies between one element and another, when the layout recognition unit 33 finds a blank (space) of the predetermined size inside a recognition range, it concludes that the range overlaps another element. With this overlap-elimination correction performed in the layout recognition process, the layout recognition unit 33 can improve layout recognition accuracy.
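  • A minimal sketch of the overlap elimination in Python, assuming a binarized image and reading "a blank of a predetermined size" as a run of entirely ink-free columns; the threshold value is illustrative.

```python
import numpy as np

def fix_overlap(binary: np.ndarray, box: tuple, min_gap: int = 20) -> tuple:
    """Truncate a recognition range (x, y, w, h) at the first blank run of at
    least `min_gap` fully ink-free columns, since a blank of a predetermined
    size separates one element from the next."""
    x, y, w, h = box
    region = binary[y:y + h, x:x + w]
    blank_cols = region.sum(axis=0) == 0   # columns containing no ink at all
    run = 0
    for col, blank in enumerate(blank_cols):
        run = run + 1 if blank else 0
        if run >= min_gap:
            return x, y, col - run + 1, h  # keep only the part left of the blank
    return box
```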
  • FIGS. 11A and 11B are diagrams for explaining the layout recognition performed in the layout recognition process 56: FIG. 11A shows the document image 61 before layout recognition, and FIG. 11B shows the document image 62 after layout recognition.
  • The layout recognition unit 33 specifies the ranges of the elements (the character string 48, the table 49, the seal 52, and the image 53) included in the document image 61 within the document image 61 by image recognition using the layout learning model 14.
  • The range of the specified character string 48 is surrounded by a solid line, and the ranges of the specified table 49, seal 52, and image 53 are surrounded by broken lines.
  • The boundaries of the elements need not be visible to humans as long as they can be recognized by the electronic document generation device 10.
  • The layout recognition unit 33 recognizes, by image recognition using the layout learning model 14, the type of the element in each range specified in the document image 61, and acquires the position information of that range in the document image 62 together with the element type.
  • The position information may be expressed in plane orthogonal coordinates with a predetermined point in the document image 62 as the origin.
  • The layout learning model 14 is preset according to the type of the document image 61, and the layout recognition unit 33 recognizes the layout of the document image 61 using the preset layout learning model 14.
  • When the document image 61 acquired by the document image acquisition unit 31 is a contract, image recognition is performed using the layout learning model 14 for contracts; when it is an invoice, using the layout learning model 14 for invoices; when it is a memorandum, using the layout learning model 14 for memorandums; when it is a delivery note, using the layout learning model 14 for delivery notes; and when it is a receipt, using the layout learning model 14 for receipts.
  • Since the layout recognition unit 33 selects the layout learning model 14 according to the type of the document image 61 acquired by the document image acquisition unit 31, the accuracy of layout recognition for the document image 61 can be improved.
  • For an element whose type, as recognized by the layout recognition unit 33, corresponds to a table, the cutout unit 34 cuts out each cell of the table included in that element and acquires the position information of each cell in the document image.
  • FIGS. 12A and 12B are diagrams for explaining the table recognition performed in the layout recognition process 56: FIG. 12A shows the table 63 before recognition by the layout recognition unit 33, and FIG. 12B shows the table 64 after recognition by the layout recognition unit 33.
  • In FIG. 12B, lines recognized as vertical lines 65 are drawn as one-dot chain lines, and lines recognized as horizontal lines 66 are drawn as broken lines.
  • The layout recognition unit 33 recognizes the length and position of each of the vertical lines 65 and horizontal lines 66 constituting the table 64.
  • The layout recognition unit 33 recognizes all the cells included in the table 64 by recognizing the lengths and positions of all the vertical lines 65 and horizontal lines 66 constituting the table 64. That is, the layout recognition unit 33 recognizes as a cell each quadrangle formed by two adjacent vertical lines 65 and two adjacent horizontal lines 66.
  • The layout recognition unit 33 also recognizes the line types of the lines constituting the table 64.
  • The recognized line type is reflected in the line objects constituting the table included in the electronic document when the electronic document is reproduced based on the acquired document image. For example, when a table line in the document image 62 is a broken line, the corresponding table line included in the electronic document reproduced based on the document image 62 is represented as a broken-line object.
  • The cutout unit 34 cuts out every cell included in the table 64 grasped by the layout recognition unit 33 as an image of each individual cell. The cutting out of cell images by the cutout unit 34 will be described with reference to FIG. 13.
  • FIG. 13 is a diagram illustrating the cutting out of a cell image.
  • The cell 67 cut out by the cutout unit 34 may contain a plurality of character strings.
  • For all the cells included in the table 64, the cutout unit 34 acquires the image of each cell and the position information of the cell within the table 64.
  • The position information may be expressed in plane orthogonal coordinates with a predetermined point in the table 64 as the origin, or as (row, column) within the table 64.
  • The cutout unit 34 reproduces all the vertical lines and horizontal lines constituting the table recognized by the layout recognition unit 33, and generates the position information of all the cells.
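  • A minimal sketch in Python of deriving cells from the recognized lines: each cell is the quadrangle bounded by two adjacent vertical lines and two adjacent horizontal lines, together with its (row, column) position in the table. The function name and the line representation (x and y positions) are illustrative assumptions.

```python
from typing import List, Tuple

Cell = Tuple[Tuple[int, int], Tuple[int, int, int, int]]  # ((row, col), (x, y, w, h))

def cells_from_lines(xs: List[int], ys: List[int]) -> List[Cell]:
    """Given the x positions of the vertical lines and the y positions of the
    horizontal lines of a table, return every cell bounded by two adjacent
    vertical lines and two adjacent horizontal lines."""
    xs, ys = sorted(xs), sorted(ys)
    cells = []
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            x, y = xs[c], ys[r]
            cells.append(((r, c), (x, y, xs[c + 1] - x, ys[r + 1] - y)))
    return cells

# e.g. a 2x2 table: cells_from_lines([0, 100, 200], [0, 40, 80])
```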
  • FIG. 14 is a diagram illustrating character strings in a cell image.
  • When the cut-out cell 67 contains character strings spanning a plurality of lines, the cutout unit 34 further cuts out an image of each of those character strings.
  • The cell 67 shown in FIG. 14 contains two lines of character strings, and the cutout unit 34 cuts out an image 67a of one character string and an image 67b of the other.
  • The character string recognition unit 35 uses the character string learning model 13, which has learned the correspondence between a document image and the character strings included in it, to recognize the characters of the character strings included in the document image acquired by the document image acquisition unit 31 and to generate text data for those character strings.
  • The character string recognition unit 35 may recognize the character strings included in the ranges recognized by the layout recognition unit 33 using the character string learning model 13 and generate text data for those character strings.
  • The character string recognition unit 35 may also perform character recognition, using the character string learning model 13, on the character strings included in each of the cells cut out by the cutout unit 34, and generate text data for those character strings.
  • The character string recognition unit 35 includes a plurality of character string learning models 13, and may use the character string learning model 13 adapted to the language of the character string included in each of the plurality of elements.
  • For example, when a character string is in English, recognition accuracy can be improved by using a character string learning model suited to recognizing English character strings.
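  • A minimal sketch in Python of holding several character string learning models and dispatching by language; the class, the detect_language step, and the predict interface are illustrative stand-ins, not APIs from the disclosure.

```python
class StringRecognizer:
    """Holds one character string learning model per language and picks the
    model matching the language of each element's character string."""

    def __init__(self, models: dict):
        # e.g. {"ja": ja_model, "en": en_model, "zh": zh_model}; each model
        # is assumed to expose a predict(image) -> str interface.
        self.models = models

    def recognize(self, string_image) -> str:
        lang = self.detect_language(string_image)
        model = self.models.get(lang, self.models["en"])  # fallback choice
        return model.predict(string_image)

    def detect_language(self, string_image) -> str:
        # Placeholder: a lightweight script classifier would go here.
        return "en"
```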
  • FIGS. 15 and 16 are diagrams for explaining the arrangement of text data performed in the character string recognition process 57: FIG. 15A shows an image 67a of a character string before character recognition, and FIG. 15B shows the character string 68a after recognition, that is, the text data 68a.
  • FIGS. 16A and 16B are diagrams for explaining the noise removal performed in the character string recognition process 57: FIG. 16A shows an image 71a of a character string before character recognition, and FIG. 16B shows the character string 71b after recognition, that is, the text data 71b.
  • The image 67a of the character string shown in FIG. 15A contains a handwritten check mark in addition to a single line of characters.
  • The character string contains a space between words.
  • The character string recognition unit 35 recognizes the entire image 67a of the character string using the character string learning model 13 and generates text data.
  • The character string recognition unit 35 recognizes, in the image 67a of the character string, the two words "L/C NO:" and "ILC18H000219" and the blank space between them, and generates the text data corresponding to the two words and the text data corresponding to the blank space between them (68a: see FIG. 15B). Since the character string recognition unit 35 also recognizes spaces between words and converts them into text data, the two words can be arranged separately, as in the image 67a.
  • When the character string recognition unit 35 recognizes the characters of the image 67a, the handwritten check mark is not recognized and is not included in the text data, so it is deleted from the output electronic document (68a: see FIG. 15(b)). In this way, noise such as handwritten check marks that is not a target of character recognition by the character string recognition unit 35 is passively removed from the electronic document.
  • The character string recognition unit 35 recognizes the entire image 71a of the character string using the character string learning model 13 and generates text data.
  • The character string recognition unit 35 recognizes the characters of the entire image 71a of the character string and generates the text data corresponding to the character string "autiated to act on behalf of the" (71b: see FIG. 16B).
  • Since the noise contained in the image 71a of the character string is not a target of character recognition by the character string recognition unit 35, it is passively removed from the electronic document (71b: see FIG. 16B).
  • The character string learning model 13 learns, as teacher data, a large number of pairs such as the data associating FIG. 15(a) with FIG. 15(b) and the data associating FIG. 16(a) with FIG. 16(b); by deep learning with such data, character recognition from images can be realized.
  • When recognizing the character strings included in the images 67a and 71a using the character string learning model 13, the character string recognition unit 35 may also acquire attribute data such as the size and typeface of the characters included in the character string.
  • This character attribute data is reflected in the attribute data of the text data output by the output unit 36 described later.
  • The output unit 36 (see FIG. 4) outputs the text data as text in an electronic medium.
  • The output unit 36 may output the text data of each of the plurality of elements as text in the electronic medium at the position information of the range of that element.
  • The electronic medium is not limited to data electronically stored in a recording medium; it also includes data that can be handled by an information processing device such as a personal computer without being stored in a recording medium.
  • The position information of an element may be expressed in plane orthogonal coordinates with a predetermined point in the document image 62 as the origin.
  • Since the output unit 36 outputs the text data of each of the plurality of elements based on the position information of that element, noise is removed while the layout of the acquired document image 61 is maintained, and the character strings in the document image 61 can be converted into text data and output as an electronic document.
  • The output unit 36 may reflect the character attribute data acquired by the character string recognition unit 35 in the text data and output it to the electronic document.
  • As a result, the electronic document generation device 10 can reproduce attribute data such as the character size and typeface contained in the document image 61 as attribute data of the text data included in the output electronic document.
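  • A minimal sketch in Python of emitting each element's text data at its own position information together with its character attributes; the TextObject shape and the dictionary output format are illustrative assumptions, not a format fixed by the disclosure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextObject:
    text: str
    x: int                    # plane orthogonal coordinates, fixed origin
    y: int
    size_pt: float = 10.5     # character attribute data from recognition
    typeface: str = "serif"

def to_electronic_medium(objects: List[TextObject]) -> list:
    """Emit each element's text data at its own position information,
    preserving the layout of the source document image."""
    return [{"text": o.text, "pos": (o.x, o.y),
             "font": {"size": o.size_pt, "face": o.typeface}}
            for o in objects]
```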
  • The layout learning data generation unit 40 (see FIG. 4) generates layout learning data by accumulating a plurality of document images, each containing a plurality of elements, in which every element has been given an annotation associated with its corresponding type.
  • The layout learning data is used for supervised learning of the layout learning model 14.
  • A document image accumulated in the layout learning data may be given, together with the annotations, the position information in the document image of the range of each of the plurality of elements it contains.
  • FIGS. 17 to 23 are diagrams showing examples of layout learning data to which annotations have been added.
  • The layout learning data generation unit 40 acquires a document image from the document image database 15, annotates the document image, and generates layout learning data.
  • The user can also generate the layout learning data manually without using the layout learning data generation unit 40; in that case, the document image acquired from the document image database 15 can be annotated using the user terminal 12.
  • The layout learning data used for learning the layout learning model 14 for invoices will be described.
  • Annotation symbols are added to each element so that the electronic document generation device 10 can identify and classify the character strings, tables, images, seals, outer frames, and noise that are the elements included in the document image.
  • The character string annotation symbol 76 is given to an element related to a character string: the character string is surrounded by a rectangular frame line, and a "Text" tag is attached to the frame line as a mark.
  • The portion surrounded by the rectangular frame line is learned by the layout learning model 14 as the range occupied in the document image by the element related to the character string.
  • The table annotation symbol 77 is given to an element related to a table: a rectangular frame line is superimposed on the outer frame of the table, and a "Border Table" tag is attached to the frame line as a mark.
  • The portion surrounded by the rectangular frame line is learned by the layout learning model 14 as the range occupied in the document image by the element related to the table.
  • The image annotation symbol 78 is given to an element related to an image: a frame line indicating the annotation symbol is superimposed on the boundary of the image, and an "Image" tag is attached to the frame line as a mark.
  • Images include logos, marks, photographs, illustrations, and the like.
  • The portion surrounded by the frame line is learned by the layout learning model 14 as the range occupied in the document image by the element related to the image.
  • The seal annotation symbol 79 is given to an element related to a seal: a frame line indicating the annotation symbol is superimposed on the boundary of the seal, and a "Hun" tag is attached to the frame line as a mark.
  • The portion covered by the frame line is learned by the layout learning model 14 as the range occupied in the document image by the element related to the seal.
  • The outer frame annotation symbol 80 is given to an element related to the outer frame: a frame line is superimposed on the boundary of the outer frame, and a "Border" tag is attached to the frame line as a mark.
  • The layout learning model 14 learns the lengths and positions of the four line segments constituting the frame line.
  • The noise annotation symbol 81 is given to an element related to noise: the noise is surrounded by a rectangular frame, and a "Noise" tag is attached to the frame as a mark.
  • The portion covered by the frame is learned by the layout learning model 14 as the range occupied in the document image by the element related to the noise.
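  • A minimal sketch in Python of one annotated document image for layout learning, using the tags named above; the JSON-like record shape, file name, and coordinates are illustrative, not a format fixed by the disclosure.

```python
# One annotated document image for layout learning; bbox is (x, y, w, h).
annotation = {
    "image": "invoice_0001.png",          # hypothetical file name
    "elements": [
        {"tag": "Text",         "bbox": [120,  60, 480,   32]},  # character string
        {"tag": "Border Table", "bbox": [ 80, 300, 900,  420]},  # table outer frame
        {"tag": "Image",        "bbox": [760,  40, 180,  120]},  # logo / photograph
        {"tag": "Hun",          "bbox": [820, 880, 100,  100]},  # seal
        {"tag": "Border",       "bbox": [ 40,  20, 960, 1320]},  # outer frame
        {"tag": "Noise",        "bbox": [ 30,  30,  40,   40]},  # e.g. stapler mark
    ],
}
```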
  • Next, the layout learning data used for learning table recognition will be described with reference to FIG.
  • A one-dot chain line, the vertical line annotation symbol 83, is superimposed on every vertical line constituting the table, and a broken line, the horizontal line annotation symbol 84, is superimposed on every horizontal line constituting the table.
  • From this, the layout learning model 14 can learn the size of the table, the range it occupies, its position, and the information of all the cells included in the table.
  • The cell information consists of the number of cells contained in the table and the position of each cell within the table; the position within the table is expressed as (row, column) of the table.
  • FIG. 20 shows layout learning data in which a character string is included in each cell.
  • FIG. 21 shows layout learning data for recognizing a table with cells containing one-line, two-line, and three-line character strings.
  • Regardless of how many lines of character strings one cell contains, the character string annotation symbol 76 is added to each character string: each character string is enclosed in a rectangular frame line, and a "Text" tag is attached to the frame line as a mark.
  • The layout learning model 14 learns the range of the character string annotation symbol 76 and the position of the character string within the table.
  • The electronic document generation device 10 can reproduce the table by outputting the text data of the character strings to the electronic document together with the object data of all the vertical lines and horizontal lines constituting the table.
  • The character string annotation symbol 76 is added to each of the character strings included in the cells of the table of FIG. 22: each character string is surrounded by a rectangular frame line, and a "Text" tag is attached to the frame line as a mark.
  • The layout learning model 14 learns the range of the character string annotation symbol 76 and the position information of the character string within the document.
  • The electronic document generation device 10 can reproduce a table in the electronic document by placing the text data of the character strings at their positions within the document.
  • In this case, the electronic document generation device 10 can reproduce the table in the electronic document by outputting only the text data, without reproducing the vertical lines and horizontal lines constituting the table.
  • The layout learning model 14 can also learn the range and position of a seal from the element related to a character string and the blank located below that character string, without using an element related to the seal itself.
  • Based on an input, the layout learning data correction unit 41 (see FIG. 4) corrects at least one of the type of each of the plurality of elements recognized by the layout recognition unit 33 and the position information, within the document image, of the range of each of the plurality of elements, and updates the layout learning data by adding the corrected data.
  • There may be a discrepancy between the document image 61 before image recognition by the layout recognition unit 33 and the document image 62 after image recognition by the layout recognition unit 33.
  • For example, a part of a character string may not be recognized, an element that should be recognized as an image may be recognized as a seal, or the position of a table may be misaligned.
  • In such cases, the document image 62 after image recognition by the layout recognition unit 33 is corrected so as to match the document image 61 before image recognition, and the layout learning data is updated with this corrected data.
  • the layout learning unit 42 (see FIG. 4) relearns the layout learning model 14 using the layout learning data updated by the layout learning data correction unit 41. By re-learning the layout learning model 14, the recognition accuracy of the layout of the document image can be improved.
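  • A minimal sketch of this correct-and-retrain cycle follows, assuming generic in-memory structures; the function names and the `train_fn` callback are hypothetical stand-ins, since the disclosure does not specify the training interface.

```python
def update_layout_learning_data(dataset, corrections):
    # Role of the layout learning data correction unit 41: append the
    # user-corrected element types and/or bounding boxes to the data set.
    for sample in corrections:
        dataset.append({
            "image": sample["image"],
            "elements": sample["corrected_elements"],  # [(type, bbox), ...]
        })
    return dataset

def relearn_layout_model(model, dataset, train_fn):
    # Role of the layout learning unit 42: re-learn the layout learning
    # model on the updated layout learning data.
    return train_fn(model, dataset)
```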
  • the character string learning data generation unit 43 (see FIG. 4) generates character string learning data used for supervised learning of the character string learning model 13.
  • The character string learning data correction unit 44 (see FIG. 4) corrects, based on an input, the text data generated by the character string recognition unit 35, and updates the character string learning data by adding the corrected text data.
  • the character string learning unit 45 (see FIG. 4) relearns the character string learning model 13 using the character string learning data updated by the character string learning data correction unit 44.
  • the character string learning data generation unit 43 acquires a document image from the document image database 15, annotates the document image, and generates character string learning data.
  • the user can manually generate the character string learning data without using the character string learning data generation unit 43.
  • the document image acquired from the document image database 15 can be annotated by using the user terminal 12.
  • FIG. 24 is a diagram showing an example of character string learning data to which annotations have been added.
  • FIG. 24 is an output screen of the character string learning data generation unit 43, which is displayed on the user terminal 12 or the output device 27 of the electronic document generation device 10.
  • The character string learning data generation unit 43 assigns, to each character string included in a document image acquired from the document image database 15, the text data corresponding to that character string as the text data annotation 85.
  • Alternatively, the annotation may be added as the text data annotation 85 instead of as the text data itself.
  • When a character string included in the document image contains a blank, the character string learning data generation unit 43 generates the character string learning data so that the text data corresponding to the character string also contains the blank, as in the sketch below.
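  • A hedged sketch of generating one such training pair follows; it assumes a PIL image and keeps the label verbatim so that blanks inside the character string survive into the learning data, as described above.

```python
from PIL import Image

def make_string_learning_sample(document_image: Image.Image, bbox, text):
    # Crop the character-string region and pair it with its text label.
    # The label is not trimmed or collapsed, so blanks inside the string
    # (e.g. "Subtotal     1,000") are preserved in the learning data.
    x0, y0, x1, y1 = bbox
    return {"image": document_image.crop((x0, y0, x1, y1)), "label": text}
```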
  • FIG. 25 is a flowchart of an electronic document generation program.
  • the electronic document generation method is executed by the CPU 25 of the electronic document generation device 10 based on the electronic document generation program.
  • The electronic document generation program causes the CPU 25 of the electronic document generation device 10 to realize various functions such as a document image acquisition function, a preprocessing function, a layout recognition function, a cutout function, a character recognition function, and an output function. These functions are executed in the order shown in FIG. 25, but the order may be changed as appropriate. Since each function overlaps with the above description of the electronic document generation device 10, detailed description is omitted.
  • the document image acquisition function acquires a document image obtained by converting a document into an image (S31: document image acquisition step).
  • The format of the document image includes, for example, PDF, JPG, and GIF, and may also include any other data format that the electronic document generator 10 can process as an image; one way of handling the branching between PDF and raster input is sketched below.
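  • For illustration only, input handling could branch on the file type as follows; the use of the pdf2image library (which requires poppler) is an assumption for the PDF-to-image conversion and is not named in this disclosure.

```python
from pathlib import Path
from PIL import Image
from pdf2image import convert_from_path  # assumed helper; requires poppler

def load_document_images(path):
    # PDFs are converted page by page into images; raster formats such as
    # JPG and GIF are opened directly as document images.
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        return convert_from_path(str(p))
    return [Image.open(p)]
```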
  • the pre-processing function performs pre-processing on the document image acquired by the document image acquisition function (S32: pre-processing step).
  • the pre-processing function has a background removal function, a tilt correction function, and a shape adjustment function.
  • The background removal function removes the background of the document image acquired by the document image acquisition function.
  • The tilt correction function corrects the tilt of the document image acquired by the document image acquisition function.
  • the shape adjustment function adjusts the overall shape and size of the document image acquired by the document image acquisition function.
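  • The three preprocessing functions could be realized, for example, with standard image operations as in the following sketch using OpenCV; the thresholding and deskew method shown here are assumptions, not the disclosed implementation, and the angle convention of cv2.minAreaRect differs between OpenCV versions.

```python
import cv2
import numpy as np

def preprocess(image_bgr, out_size=(1024, 1448)):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Background removal: Otsu binarization keeps ink, drops the background.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Tilt correction: estimate the skew angle from the ink pixels.
    ink = np.column_stack(np.where(binary < 128)).astype(np.float32)
    angle = cv2.minAreaRect(ink)[-1]
    if angle > 45:        # normalize across OpenCV angle conventions
        angle -= 90
    elif angle < -45:
        angle += 90
    h, w = binary.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(binary, m, (w, h), borderValue=255)
    # Shape adjustment: normalize the overall size of the document image.
    return cv2.resize(deskewed, out_size)
```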
  • The layout recognition function, using the layout learning model 14 that has learned the correspondence between a plurality of elements included in document images and the identification information of each of those elements, specifies the range within the acquired document image of each of the plurality of elements it contains, recognizes the type of each element, and acquires the position information in the document image for the range of each element (S33: layout recognition step).
  • the types of elements may be classified into necessary and unnecessary according to the type of document.
  • If an identified element corresponds to an unnecessary one, the layout recognition function may recognize the element without acquiring its position information; if it corresponds to a necessary one, the position information of the element may be acquired.
  • the layout recognition function may recognize only the necessary elements among the plurality of elements included in the document image 61 and acquire the position information of the elements.
  • After the layout recognition function has recognized the type of each element and acquired the position information for the range of each element in the document image, if elements overlap or are too far apart, the range of each element and the acquired position information are corrected based on the actual document.
  • the layout recognition function recognizes the length and position of each of the vertical and horizontal lines that make up the table.
  • The layout recognition function grasps all the cells included in the table by grasping the lengths and positions of all the vertical and horizontal lines constituting the table. That is, the layout recognition function recognizes each quadrangle composed of two adjacent vertical lines and two adjacent horizontal lines as a cell, as in the sketch below.
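  • Once the x-coordinates of the vertical lines and the y-coordinates of the horizontal lines are known, the cells can be enumerated directly, as in this sketch; the assumption that every ruled line spans the whole table is a simplification for illustration.

```python
def cells_from_ruled_lines(vertical_xs, horizontal_ys):
    # Each quadrangle bounded by two adjacent vertical lines and two
    # adjacent horizontal lines is recognized as one cell.
    xs, ys = sorted(vertical_xs), sorted(horizontal_ys)
    cells = []
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            cells.append({"row": r, "col": c,
                          "bbox": (xs[c], ys[r], xs[c + 1], ys[r + 1])})
    return cells

# Three vertical and three horizontal lines yield a 2x2 grid of cells.
print(cells_from_ruled_lines([100, 300, 500], [200, 260, 320]))
```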
  • the layout recognition function also recognizes the line types of the lines that make up the table.
  • the recognized line type is reflected in the line object constituting the table included in the electronic document when the electronic document is reproduced based on the acquired document image.
  • For example, if a table line in the document image is a dashed line, the table line contained in the electronic document reproduced based on that document image is represented as a dashed-line object.
  • The cutout function cuts out each of the cells of the table included in an element whose type, as recognized by the layout recognition function, corresponds to a table, and acquires the position information of each cell in the document image (S34: cutout step).
  • the cutout function reproduces all the vertical and horizontal lines constituting the table recognized by the layout recognition function, and generates the position information of all the cells.
  • the cell cut out by the cutout function may contain multiple character strings.
  • the cutout function further cuts out an image for each character string for all the character strings.
  • the image of the character string recognized by the layout recognition function and the image of the character string cut out by the cutout function are sent to the character recognition function line by line.
  • The character recognition function uses a character string learning model that has learned the correspondence between document images and the character strings included in them, recognizes the character strings included in the document image acquired by the document image acquisition function, and generates text data for those character strings (S35: character recognition step).
  • the output function outputs the text data as text on an electronic medium (S36: output step).
  • The output function outputs the text data based on the position information of the character strings acquired by the layout recognition function and the position information in the document image of the cells acquired by the cutout function, reproducing the document as text on an electronic medium, for example as sketched below.
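  • One of the output forms mentioned later in this disclosure is HTML; this sketch illustrates placing each recognized character string at its acquired position with absolutely positioned elements (HTML escaping is omitted for brevity, and the element structure is an assumption).

```python
def render_positioned_html(elements, page_w, page_h):
    # `elements` is assumed to be a list of (text, (x0, y0, x1, y1)) pairs
    # carrying the position information acquired for each character string.
    body = "".join(
        f'<div style="position:absolute;left:{x0}px;top:{y0}px">{text}</div>'
        for text, (x0, y0, x1, y1) in elements
    )
    return (f'<html><body style="position:relative;width:{page_w}px;'
            f'height:{page_h}px">{body}</body></html>')

html = render_positioned_html(
    [("Receipt", (40, 30, 200, 60)), ("Total 1,000", (40, 120, 220, 150))],
    800, 1100)
```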
  • FIGS. 26 to 28 are flowcharts of an embodiment relating to an electronic document generation program.
  • The flowcharts shown in FIGS. 26 to 28, taken together, form the flowchart of a single electronic document generation program.
  • In step S102, the document image acquisition unit 31 acquires a document image or a PDF from the document image database 15.
  • In step S103, it is determined whether or not the data acquired by the document image acquisition unit 31 is a PDF. If it is not a PDF (No: S103), that is, if the acquired data is a document image, the process proceeds to step S106.
  • If it is a PDF (Yes: S103), the PDF is converted into a document image in step S104, and the document image is then acquired (S105).
  • the preprocessing unit 32 performs preprocessing on the acquired document image.
  • The preprocessing unit 32 includes a background removal unit 32a, a tilt correction unit 32b, and a shape adjustment unit 32c.
  • the background removing unit 32a removes the background of the acquired document image.
  • The tilt correction unit 32b corrects the tilt of the document image, thereby correcting the tilt of the character strings.
  • the shape adjusting unit 32c adjusts the overall shape and size of the acquired document image.
  • In step S107, the layout recognition unit 33 acquires the document image that has undergone the preprocessing performed by the preprocessing unit 32.
  • the acquired document image after preprocessing is sent to the document image cutting process of step S115, step S120, and step S136 described later.
  • The layout recognition unit 33 performs layout recognition of the document image, specifies the range of each of the plurality of elements included in the document image, and acquires the type and position information of each element.
  • the types of elements are character strings, tables, images, seals, and handwriting.
  • the layout recognition unit 33 adjusts the position information of the minimum boundary box of the acquired element.
  • the minimum boundary box means the rectangle surrounding the element and having the smallest area, and means the range occupied by the element.
  • Specifically, the layout recognition unit 33 collates the document image with the acquired elements, and if there is a discrepancy between the document image and the position information of an acquired element, adjusts the position information of the minimum boundary box of that element, as in the sketch below.
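  • A hedged sketch of such an adjustment follows: the predicted box is shrunk to the smallest rectangle actually covered by ink, assuming a binarized image with 0 for ink and 255 for background.

```python
import numpy as np

def tighten_min_boundary_box(binary, bbox, pad=2):
    # Collate the predicted box with the document image and shrink it to
    # the minimum boundary box of the ink pixels it contains.
    x0, y0, x1, y1 = bbox
    ys, xs = np.where(binary[y0:y1, x0:x1] < 128)
    if xs.size == 0:
        return bbox  # no ink found; leave the box unchanged
    return (x0 + int(xs.min()) - pad, y0 + int(ys.min()) - pad,
            x0 + int(xs.max()) + pad, y0 + int(ys.max()) + pad)
```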
  • the layout recognition unit 33 acquires the layout information after the adjustment process of the minimum boundary box performed in step S110.
  • the layout information includes element types and position information.
  • The layout recognition unit 33 refers to the layout information of the internally stored elements sent by the process of step S130, described later, and determines whether or not other elements remain in the document image.
  • When the layout information of all the elements is included in the internally stored element layout information sent by the process of step S130, the layout recognition unit 33 determines that no other elements remain in the document image (No: S112); the process proceeds to step S131, where the loop of steps S112 to S130 is terminated, and then moves on to step S132.
  • When the layout information of all the elements is not included, the layout recognition unit 33 determines that other elements remain in the document image (Yes: S112), and the process proceeds to step S113.
  • In step S113, the layout recognition unit 33 determines whether or not an element remaining in the document image is a table. If no table remains in the document image (No: S113), the layout information other than tables is sent to step S130, described later.
  • If a table remains in the document image (Yes: S113), the process proceeds to step S114. Since the document images in this example relate to receipts, they often include a table; therefore, if it is determined that a document image does not include a table, the layout recognition unit 33 may interrupt the process and confirm whether or not the electronic document relates to a receipt.
  • In step S114, the layout recognition unit 33 acquires the size and position information of all the vertical and horizontal lines constituting the table in the document image. Once the size and position information of all these lines has been acquired, the size and position of every cell included in the table can be obtained.
  • In step S115, the cutout unit 34 cuts out a table image from the preprocessed document image acquired in step S107.
  • In step S116, the cutout unit 34 acquires the image of the table cut out in step S115.
  • In steps S117 and S118, the cutout unit 34 extracts the cells from the table image acquired in step S116 (step S117) and acquires the cell information (step S118).
  • The cell information is the row, column, and coordinates corresponding to the position of each cell in the table.
  • The cell information acquired in step S118 is sent to step S127, described later.
  • In step S119, the cutout unit 34 refers to the layout information of the internally stored table sent by the process of step S127, and determines whether or not other cells remain in the table.
  • When the layout information of all the cells is included in the internally stored cell layout information sent by the process of step S127, the cutout unit 34 determines that no other cells remain in the table (No: S119); the process proceeds to step S128, where the S119 loop is terminated, and then moves on to step S130.
  • When the layout information of all the cells is not included, the cutout unit 34 determines that other cells remain in the table (Yes: S119), and the process proceeds to step S120.
  • In step S120, the cutout unit 34 cuts out a cell image from the preprocessed document image acquired in step S107.
  • In step S121, the cutout unit 34 acquires the image of the cell cut out in step S120.
  • In step S122, the character string recognition unit 35 performs character string recognition processing on the cell image acquired in step S121.
  • In step S123, the character string recognition unit 35 acquires the position information of the character string on which the character string recognition processing was performed.
  • In step S124, the character string recognition unit 35 adjusts the position information of the minimum boundary box of the character string acquired in step S123.
  • Specifically, the character string recognition unit 35 collates the document image with the position information of the acquired character string, and if there is a discrepancy between them, adjusts the position information of the minimum boundary box of the acquired character string.
  • In step S125, the character string recognition unit 35 acquires the position information after the adjustment processing performed in step S124.
  • In steps S126 and S127, the character string recognition unit 35 merges the cell information acquired in step S118 with the adjusted character string position information acquired in step S125 (step S127), and internally stores the result in the internal storage device as the layout information of the table (step S126).
  • the internal storage device refers to either or both of the RAM 23 and the storage unit 24 shown in FIG.
  • The processing of steps S119 to S127 is performed for all the cells included in the table; a summary sketch follows below. After this processing has been performed on the last cell included in the table, the loop termination processing of step S128 is performed, and the character string recognition unit 35 proceeds to step S130.
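  • The per-cell loop of steps S119 to S127 might be summarized as in this sketch; `recognize_string` and the dictionary fields stand in for the actual interfaces, which this disclosure does not specify.

```python
def build_table_layout(cells, cell_images, recognize_string):
    # For every cell, recognize its character string, then merge the cell
    # information with the recognized text into the table layout information.
    layout = []
    for cell, image in zip(cells, cell_images):
        text = recognize_string(image)          # step S122 equivalent
        entry = dict(cell)                      # row, col, bbox from S118
        entry["text"] = text                    # merged as in step S127
        layout.append(entry)                    # stored as in step S126
    return layout
```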
  • In steps S129 and S130, the output unit 36 merges the table layout information acquired in step S126 with the layout information other than tables obtained via step S113 (step S130), and internally stores the result in the internal storage device as the layout information of all the elements (step S129).
  • The processing of steps S112 to S130 is performed for all the elements included in the document image. After this processing has been performed on the last element included in the document image, the loop termination processing of step S131 is performed, and the character string recognition unit 35 proceeds to step S132.
  • In step S132, the character string recognition unit 35 determines whether or not other elements remain in the document image.
  • Specifically, the character string recognition unit 35 refers to the layout information of the internally stored elements sent by the process of step S140, described later, and determines whether or not other elements remain in the document image.
  • When the layout information of all the elements is included in the internally stored element layout information sent by the process of step S140, the character string recognition unit 35 determines that no other elements remain in the document image (No: S132); the process proceeds to step S141, where the loop of steps S132 to S140 is terminated, and then moves on to step S142.
  • When the layout information of all the elements is not included, the character string recognition unit 35 determines that other elements remain in the document image (Yes: S132), and the process proceeds to step S133.
  • In step S133, the character string recognition unit 35 determines whether or not an element remaining in the document image is a character string.
  • If the element is a character string (Yes: S133), the process proceeds to step S135.
  • If the character string recognition unit 35 determines that the element remaining in the document image is not a character string (No: S133), the loop continuation process of returning to step S132 is performed (step S134). In step S135, the character string recognition unit 35 acquires the position information of the character string.
  • In steps S136 and S137, the character string recognition unit 35 cuts out an image of the character string from the preprocessed document image acquired in step S107 (step S136), and acquires the image of the character string (step S137).
  • In steps S138 and S139, the character string recognition unit 35 performs character string recognition processing on the character string image acquired in step S137 (step S138), and generates the text data predicted by that recognition processing (step S139).
  • In step S140, the character string recognition unit 35 merges the position information of the character string acquired in step S135 with the text data generated in step S139 to generate the layout information of the element.
  • The layout information of the generated element is sent to step S129.
  • In step S129, the layout information of the sent element is internally stored in the internal storage device.
  • the internal storage device refers to either or both of the RAM 23 and the storage unit 24 shown in FIG.
  • The processing of steps S132 to S140 is performed until it is determined in step S132 that the layout information of all the elements is included in the element layout information sent by the process of step S140.
  • In step S141, in response to the determination in step S132 that the layout information of all the elements is included, the processing that ends the loop of steps S132 to S140 is performed, and the process proceeds to step S142.
  • In step S142, the electronic document generator 10 performs post-processing.
  • Specifically, the text data, images, and position information of all the elements are output as JSON (JavaScript Object Notation) and converted to TSV (Tab-Separated Values), for example as in the sketch below.
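  • A minimal sketch of this post-processing follows; the field names (`type`, `text`, `bbox`) are illustrative assumptions about how the element information might be keyed.

```python
import json

def elements_to_tsv(elements):
    # Serialize the element information to JSON, then flatten it to TSV
    # with one element per row.
    records = json.loads(json.dumps(elements))  # JSON round-trip
    lines = ["type\ttext\tx0\ty0\tx1\ty1"]
    for e in records:
        lines.append(f'{e["type"]}\t{e.get("text", "")}\t'
                     + "\t".join(str(v) for v in e["bbox"]))
    return "\n".join(lines)
```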
  • The output unit 36 outputs the post-processed information of all the elements as an electronic document in a final form such as a plain text file, HTML (HyperText Markup Language), a file format editable by commercially available text editing software, or an editable PDF.
  • the electronic document generator 10 recognizes the layout of the document image using the layout learning model 14, and then performs character recognition of the document image using the character string learning model 13. That is, since the electronic document generation device 10 identifies the types of a plurality of elements included in the document image and performs character recognition suitable for the types of elements, the recognition accuracy of character recognition can be improved.
  • Compared with the character-by-character recognition performed by conventional OCR text recognition technology, the electronic document generator 10 uses the character string learning model 13 to perform character recognition of the document image for each character string, so the recognition efficiency of character recognition can be improved.
  • In addition, since character recognition is performed for each character string rather than for each character, recognition can be performed while suppressing the influence of noise lying on the characters and the like, and the recognition accuracy can be improved compared with recognition performed one character at a time.
  • a character that is erroneously recognized by character recognition using the conventional OCR text recognition technique can be correctly recognized by character recognition using the character string learning model 13.
  • For example, when a seal is superimposed on a character, the character may be erroneously recognized by the conventional OCR text recognition technique, but it can be correctly recognized by character recognition using the character string learning model 13.
  • For an element whose type corresponds to a table, the electronic document generator 10 performs character recognition on each character string contained in the image of each individual cell, so the recognition accuracy of character recognition for the character strings included in the table can be improved.
  • Since the character string learning model 13 and the layout learning model 14 are trained on annotated character string learning data and layout learning data, the recognition accuracy of the layout recognition unit 33 and the character string recognition unit 35 can be improved.
  • The present disclosure is not limited to the electronic document generator 10 according to the above-described embodiment, and can be carried out with various other modifications or applications without departing from the gist of the present disclosure described in the claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The purpose of the present invention is to convert a character string contained in a document image to text data by a method different from conventional optical character recognition. In the present invention, an electronic document generation device is provided with: a document image acquisition unit that acquires a document image obtained by imaging a document; a character string recognition unit that recognizes a character string contained in the document image acquired by the document image acquisition unit using a character string learning model that has learned correspondence between document images and character strings contained in the document images and outputs text data of the recognized character string; and an output unit that outputs the text data as text for an electronic medium.

Description

Architecture for digitalizing documents using multi-model deep learning, and document image processing program

 The present invention relates to an electronic document generation device, an electronic document generation method, and an electronic document generation program, and more particularly to an electronic document generation device, electronic document generation method, and electronic document generation program that scan a paper document to generate an electronic document.

 Although digital information technology has advanced and paperless workflows have become widespread, the accumulation and transmission of information on paper documents is still widely used. Companies holding enormous volumes of paper documents desire a technology that can efficiently convert paper documents into digital documents.

 Conventional OCR text recognition technology performs character recognition one character at a time, so its poor recognition efficiency has been a problem (see, for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2010-244372

 Therefore, an object of the electronic document generation device, electronic document generation method, and electronic document generation program of the present disclosure is to convert a character string included in a document image into text data by a method different from conventional optical character recognition.

 That is, the electronic document generation device according to a first aspect includes: a document image acquisition unit that acquires a document image obtained by imaging a document; a character string recognition unit that, using a character string learning model that has learned the correspondence between document images and the character strings included in them, recognizes a character string included in the document image acquired by the document image acquisition unit and generates text data for that character string; and an output unit that outputs the text data as text on an electronic medium.
 In a second aspect, the electronic document generation device according to the first aspect may further include a layout recognition unit that, using a layout learning model that has learned the correspondence between a plurality of elements included in document images and the identification information of each of those elements, specifies the range within the acquired document image of each of the plurality of elements it contains, recognizes the type of each element, and acquires position information in the document image for the range of each element; the character string recognition unit may recognize, using the character string learning model, the character strings included in the ranges specified by the layout recognition unit and generate text data for those character strings; and the output unit may output each piece of text data as text on the electronic medium at the position information of the corresponding element's range.

 In a third aspect, in the electronic document generation device according to the second aspect, the type of an element may be any of a character string, a table, an image, a seal, or handwriting.

 In a fourth aspect, the electronic document generation device according to the third aspect may further include a cutout unit that, for an element whose type recognized by the layout recognition unit corresponds to a table, cuts out each of the cells in the table included in that element and acquires the position information of each cell in the document image; the character string recognition unit may perform character recognition, using the character string learning model, on the character string included in each of the cells cut out by the cutout unit and generate text data for that character string.

 In a fifth aspect, the electronic document generation device according to any of the second to fourth aspects may further include a layout learning data generation unit that accumulates a plurality of document images, each containing a plurality of elements to which annotations associated with the type of each element have been given, and generates layout learning data; the layout learning data may be used for supervised learning of the layout learning model.

 In a sixth aspect, in the electronic document generation device according to the fifth aspect, position information in the document image for the range of each of the plurality of elements included in the document image may be given to the document image together with the annotations.

 In a seventh aspect, the electronic document generation device according to the fifth or sixth aspect may further include a layout learning data correction unit that, based on an input, corrects at least one of the type of each of the plurality of elements recognized by the layout recognition unit and the position information in the document image of the range of each of the plurality of elements, and updates the layout learning data by adding the corrected data.

 In an eighth aspect, the electronic document generation device according to the seventh aspect may further include a layout learning unit that relearns the layout learning model using the layout learning data updated by the layout learning data correction unit.

 In a ninth aspect, the electronic document generation device according to any of the second to eighth aspects may further include a character string learning data generation unit that generates character string learning data used for supervised learning of the character string learning model.

 In a tenth aspect, the electronic document generation device according to the ninth aspect may further include a character string learning data correction unit that, based on an input, corrects the text data generated by the character string recognition unit and updates the character string learning data by adding the corrected text data.

 In an eleventh aspect, the electronic document generation device according to the tenth aspect may further include a character string learning unit that relearns the character string learning model using the character string learning data updated by the character string learning data correction unit.

 In a twelfth aspect, in the electronic document generation device according to any of the second to eleventh aspects, the character string recognition unit may include a plurality of character string learning models and use the character string learning model adapted to the language of the character string included in each of the plurality of elements.

 In a thirteenth aspect, the electronic document generation device according to any of the second to twelfth aspects may further include a preprocessing unit that performs preprocessing on the document image acquired by the document image acquisition unit; the preprocessing unit may include a background removal unit that removes the background of the acquired document image, a tilt correction unit that corrects the tilt of the acquired document image, and a shape adjustment unit that adjusts the overall shape and size of the acquired document image.

 In a fourteenth aspect, in the electronic document generation device according to any of the second to thirteenth aspects, the layout learning model may be any of a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memorandums, a layout learning model for delivery notes, or a layout learning model for receipts.

 In an electronic document generation method according to a fifteenth aspect, a computer used in an electronic document generation device executes: a document image acquisition step of acquiring a document image obtained by imaging a document; a character string recognition step of recognizing, using a character string learning model that has learned the correspondence between document images and the character strings included in them, a character string included in the document image acquired in the document image acquisition step and generating text data for that character string; and an output step of outputting the text data as text on an electronic medium.

 An electronic document generation program according to a sixteenth aspect causes a computer used in an electronic document generation device to exhibit: a document image acquisition function of acquiring a document image obtained by imaging a document; a character string recognition function of recognizing, using a character string learning model that has learned the correspondence between document images and the character strings included in them, a character string included in the document image acquired by the document image acquisition function and generating text data for that character string; and an output function of outputting the text data as text on an electronic medium.
 The electronic document generation device according to the present disclosure includes a document image acquisition unit that acquires a document image obtained by imaging a document, a character string recognition unit that recognizes character strings included in the acquired document image using a character string learning model that has learned the correspondence between document images and the character strings included in them and generates text data for those character strings, and an output unit that outputs the text data as text on an electronic medium. Since character strings included in a document image are recognized using a machine-learned model, the recognition efficiency of character recognition when converting a document image into text data can be improved.
FIG. 1 is a diagram showing an outline of an electronic document generation system including the electronic document generation device according to the present embodiment.
FIG. 2 is a block diagram showing the physical configuration of the electronic document generation device.
FIG. 3 is a diagram showing an outline of the processing performed by the electronic document generation device.
FIG. 4 is a block diagram showing the functional configuration of the electronic document generation device.
FIG. 5 is a diagram explaining the input data and output data of the electronic document generation device.
FIG. 6 is a diagram explaining the background removal performed in the preprocessing.
FIG. 7 is a diagram explaining the tilt correction performed in the preprocessing.
FIG. 8 is a diagram explaining the shape adjustment performed in the preprocessing.
FIG. 9 is a diagram explaining the correction processing for eliminating omissions performed in the layout recognition processing.
FIG. 10 is a diagram explaining the correction processing for eliminating overlaps performed in the layout recognition processing.
FIG. 11 is a diagram explaining the layout recognition performed in the layout recognition processing.
FIG. 12 is a diagram explaining the table recognition performed in the layout recognition processing.
FIG. 13 is a diagram explaining the cutting out of a cell image.
FIG. 14 is a diagram explaining a character string in a cell image.
FIG. 15 is a diagram explaining the arrangement of text data performed in the character string recognition processing.
FIG. 16 is a diagram explaining the noise removal performed in the character string recognition processing.
FIGS. 17 to 23 are diagrams showing examples of annotated layout learning data.
FIG. 24 is a diagram showing an example of annotated character string learning data.
FIG. 25 is a flowchart of the electronic document generation program.
FIGS. 26 to 28 are flowcharts (1/3 to 3/3) of an embodiment of the electronic document generation program.
 An embodiment of the electronic document generation device 10 according to the present disclosure will be described with reference to FIGS. 1 to 24. This embodiment shows an example in which the electronic document generation device 10 is used while connected to an information communication network 11 such as the Internet or a LAN (Local Area Network). An outline of the electronic document generation system 100 including the electronic document generation device 10 will be described with reference to FIG. 1. FIG. 1 is a diagram showing an outline of the electronic document generation system 100 including the electronic document generation device 10.

 The electronic document generation system 100 includes the electronic document generation device 10, a user terminal 12, a character string learning model 13, a layout learning model 14, a document image database 15, and the like. These are connected to the information communication network 11 and can communicate with one another.

 The electronic document generation system 100 uses the electronic document generation device 10 to recognize character strings included in a document image and generate text data. The electronic document generation device 10 performs image recognition of the layout of the document image using the layout learning model, and performs character recognition of the character strings included in the document image using the character string learning model.

 The electronic document generation device 10 is an information processing device, a kind of computer typified by, for example, a personal computer. The electronic document generation device 10 further includes arithmetic processing units, microcomputers, and the like contained in various computers, and also includes equipment and devices capable of realizing the functions according to the present disclosure through an application.
 The character string learning model 13 is a learning model that performs image recognition of character strings included in document images, and is used for the character recognition of the electronic document generation device 10. The character string learning model 13 may be stored anywhere as long as it is available to the electronic document generation device 10 via the information communication network 11; for example, it is stored in an information processing device such as a personal computer, a server device, or a database. For convenience of description in this embodiment, the character string learning model 13 also denotes the information processing device in which it is stored.

 The character string learning model 13 may be configured from an existing learning model, or may be independently configured as a learning model suited to the use of the electronic document generation device 10. The character string learning model 13 includes learning models suited to various languages such as Japanese, English, and Chinese; in FIG. 1 these are denoted as the first character string learning model, the second character string learning model, the third character string learning model, and so on.
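 As a hedged illustration only, the selection of a language-adapted model could look like the following sketch; the language keys and the fallback policy are assumptions for illustration and are not specified in this disclosure.

```python
def select_string_model(models, language):
    # models: e.g. {"ja": first_model, "en": second_model, "zh": third_model}
    # Pick the character string learning model adapted to the language of
    # the element; fall back to the Japanese model when none matches.
    return models.get(language, models["ja"])
```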
 The character string learning model 13 is not limited to being connected to the information communication network 11; it may be included in the electronic document generation device 10 and used under the direct control of that device. The character string learning model 13 may also be distributed and stored across a plurality of information processing devices connected to the information communication network 11.

 The layout learning model 14 is a learning model that learns the correspondence between a plurality of elements included in document images and the identification information of each of those elements, based on the layout learning data described later, and recognizes the layout of a document image; it is used for the layout recognition of the electronic document generation device 10. Like the character string learning model 13, the layout learning model 14 may be stored anywhere as long as it is available to the electronic document generation device 10 via the information communication network 11, and is stored in an information processing device connected to the network. For convenience of description in this embodiment, the layout learning model 14 also denotes the information processing device in which it is stored.
 The layout learning model 14 includes a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memorandums, a layout learning model for delivery notes, a layout learning model for receipts, and the like.

 The layout learning model for contracts recognizes the layout of contract document images and is trained using layout learning data for contracts. It learns what kind of information appears where in a contract, and in particular learns layouts peculiar to contracts, which are often written in bullet form, frequently have no tables, and have handwritten signature fields. The layout learning data for contracts is generated from annotated document images of, for example, 200 kinds of contract forms with at least three or four contracts per form.

 The layout learning model for invoices recognizes the layout of invoice document images and is trained using layout learning data for invoices. It learns what kind of information appears where in an invoice, and in particular learns layouts peculiar to invoices, in which tables often occupy a large area and alphanumeric tokens are common even in Japanese documents. The layout learning data for invoices is generated from annotated document images of, for example, 200 kinds of invoice forms with at least three or four invoices per form.

 The layout learning model for memorandums recognizes the layout of memorandum document images and is trained using layout learning data for memorandums. It learns what kind of information appears where in a memorandum, and in particular learns layouts peculiar to memorandums, which often have no tables and have handwritten signature fields. The layout learning data for memorandums is generated from annotated document images of, for example, 200 kinds of memorandum forms with at least three or four memorandums per form.

 The layout learning model for delivery notes recognizes the layout of delivery note document images and is trained using layout learning data for delivery notes. It learns what kind of information appears where in a delivery note, and in particular learns layouts peculiar to delivery notes, in which tables often occupy a large area and entries such as product names and product numbers are common. The layout learning data for delivery notes is generated from annotated document images of, for example, 200 kinds of delivery note forms with at least three or four delivery notes per form.

 The layout learning model for receipts recognizes the layout of receipt document images and is trained using layout learning data for receipts. It learns what kind of information appears where in a receipt, and in particular learns layouts peculiar to receipts, which often contain a handwritten amount field or a table in which the amount is written. The layout learning data for receipts is generated from annotated document images of, for example, 200 kinds of receipt forms with at least three or four receipts per form.
 The layout learning model 14 is not limited to use by the electronic document generation device 10 via the information communication network 11, and may be included in the electronic document generation device 10. The layout learning model 14 may also be distributed and stored across a plurality of information processing devices connected to the information communication network 11.

 The document image database 15 is a database that accumulates images of documents. The electronic document generation device 10 acquires the document images stored in the document image database 15 and generates the character string learning data used for training the character string learning model and the layout learning data used for training the layout learning model.

 The user terminal 12 is used for operating the electronic document generation device 10. When an electronic document generated by the electronic document generation device 10 contains a mistakenly recognized character, or when its layout has been mistakenly recognized, the electronic document is corrected in accordance with correction input from the user of the user terminal 12, and the electronic document generation device 10 accepts the correction and relearns at least one of the character string learning model 13 and the layout learning model 14.
Next, the hardware configuration of the electronic document generation device 10 will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the hardware configuration of the electronic document generation device 10. The electronic document generation device 10 includes an input/output interface 20, a communication interface 21, a Read Only Memory (ROM) 22, a Random Access Memory (RAM) 23, a storage unit 24, a Central Processing Unit (CPU) 25, a Graphics Processing Unit (GPU) 28, and the like.
The input/output interface 20 exchanges data and the like with the external devices of the electronic document generation device 10. The external devices are the input device 26 and the output device 27, which input data to and receive data from the electronic document generation device 10. The input device 26 is, for example, a keyboard, a mouse, or a scanner, and the output device 27 is, for example, a monitor, a printer, or a speaker.
The communication interface 21 has the function of inputting and outputting data of the electronic document generation device 10 when communicating with the outside via the information communication network 11.
The storage unit 24 can be used as a storage device, and it records the various applications required for the electronic document generation device 10 to operate, the various data used by those applications, and the like. The GPU 28 is suited to the heavy use of repetitive operations involved in executing machine learning and the like, and it is used together with the CPU 25.
The electronic document generation device 10 stores the electronic document generation program described later in the ROM 22 or the storage unit 24, and loads the program into the main memory composed of the RAM 23 and the like. The CPU 25 accesses the main memory into which the electronic document generation program has been loaded and executes the program.
Next, an outline of the processing performed by the electronic document generation device 10 will be described with reference to FIG. 3. FIG. 3 is a diagram showing an outline of the processing performed by the electronic document generation device 10.
The electronic document generation device 10 performs the following processes I to III in this order.
In process I, preprocessing 55 of the document image, including "background removal", "tilt correction", and "shape adjustment", is performed.
Preprocessing 55 means processing an image containing character strings in advance so that character recognition using a learning model can be executed more easily, and its purpose is to improve the recognition accuracy of the recognition processing performed in processes II and III.
In process II, layout recognition processing 56 is performed.
In layout recognition processing 56, "layout recognition" of the document image is performed first. Layout recognition processing 56 is processing that recognizes what kind of information is present at which position in the input image.
The information referred to here means character strings, tables, images, seals, handwriting, and the like. When the electronic document generation device 10 recognizes the layout of the document image and the document image contains a table, it performs "table recognition" and then "cutting out of cell images" for the cells contained in the table.
In process III, character string recognition processing 57 is performed.
Character string recognition processing 57 is processing that converts an image containing a character string into text data using the character string learning model 13, which has learned the correspondence between images and the character strings they contain. Character string recognition processing 57 may include processing such as "text data placement" and "noise removal".
In character string recognition processing 57, the image of the character string is converted into text data, and "text data placement" and "noise removal" are performed.
"Text data placement" means that when the image of a cut-out character string contains spaces, the spaces are recognized together with the character string, so the text data is placed together with those spaces.
"Noise removal" means that when the image of a cut-out character string contains noise, the noise is passively removed from the text data because it is not recognized by the electronic document generation device 10. Noise here means pixels contained in the cut-out character string image that do not form characters.
Next, the functional configuration of the electronic document generation device 10 will be described with reference to FIG. 4. FIG. 4 is a block diagram showing the functional configuration of the electronic document generation device 10. By executing the electronic document generation program described later, the electronic document generation device 10 implements on the CPU 25 a document image acquisition unit 31, a preprocessing unit 32, a background removal unit 32a, a tilt correction unit 32b, a shape adjustment unit 32c, a layout recognition unit 33, a cutout unit 34, a character string recognition unit 35, an output unit 36, a layout learning data generation unit 40, a layout learning data correction unit 41, a layout learning unit 42, a character string learning data generation unit 43, a character string learning data correction unit 44, a character string learning unit 45, and the like.
The document image acquisition unit 31 (see FIG. 4) acquires a document image in which a document has been converted into an image.
The document image acquisition unit 31 may acquire the document image from the document image database 15. Alternatively, it may obtain the document image from the scanner of the input device 26.
With reference to FIG. 5, the document image acquired by the document image acquisition unit 31 and the electronic document output from the electronic document generation device 10 will be described.
FIG. 5 is a diagram illustrating the input data and output data of the electronic document generation device 10, and FIG. 5(a) shows a document image acquired by the document image acquisition unit 31 as input data. The document image contains noise such as a staple mark 50, handwriting 51, a seal 52, and an image 53.
Such noise is an obstacle to, or unnecessary for, people and information processing devices such as personal computers in understanding the contents of the document. Other examples of noise include holes punched for filing and creases left on the paper. Creases risk being recognized as lines, so there is a strong need to remove them so that they are not reflected in the electronic document.
The electronic document generation device 10 converts the character strings in the acquired document image into text data and outputs an electronic document while maintaining the layout of the document image (see FIG. 5(b)). The electronic document generation device 10 performs noise removal 54 by active processing on the staple mark 50, handwriting 51, seal 52, and image 53 recognized as noise; other pixels in the document image that were recognized as neither character strings nor noise are removed by passive processing that simply does not carry them over into the electronic document.
The table in the document image of FIG. 5(b) is output together with the text data as object data in the electronic document while maintaining its arrangement in the document image. The electronic document generation device 10 can arbitrarily select the elements to include in the output electronic document.
For example, the staple mark 50, handwriting 51, seal 52, and image 53 are removed in normal use, but the seal 52 and the image 53 can also be included in the electronic document and output as image data.
The preprocessing unit 32 (see FIG. 4) performs preprocessing 55 on the document image acquired by the document image acquisition unit 31.
Preprocessing 55 is performed to improve the recognition accuracy of the image recognition using learning models performed by the layout recognition unit 33 and the character string recognition unit 35, which are described later.
The preprocessing unit 32 includes a background removal unit 32a, a tilt correction unit 32b, and a shape adjustment unit 32c.
The background removal unit 32a (see FIG. 4) removes the background of the document image acquired by the document image acquisition unit 31.
The processing performed by the background removal unit 32a will be described with reference to FIG. 6. FIG. 6 is a diagram illustrating the background removal performed in preprocessing 55. FIG. 6(a) shows a document image 58a before background removal, and FIG. 6(b) shows a document image 58b after background removal.
The background removal unit 32a removes the background of the document image by making the background color of the document image white. Specifically, the background removal unit 32a detects the background color of the acquired document image and determines whether that background color is white. When it determines that the background color is not white, the background removal unit 32a extracts the information other than the background of the document image, makes the background color white, and then superimposes the extracted information.
With the background removal unit 32a, deleting the background removes noise that causes malfunctions in the image recognition performed by the layout recognition unit 33 and the character string recognition unit 35, so the recognition accuracy can be improved.
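To make the steps above concrete, the following is a minimal sketch of the background-removal idea: estimate the background color, and if it is not white, keep only the non-background information on a white canvas. It assumes OpenCV and NumPy are available; the most-frequent-gray-level estimate and the difference threshold are illustrative assumptions, not the disclosed implementation.

```python
import cv2
import numpy as np

def remove_background(image_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Estimate the background color as the most frequent gray level.
    background_level = int(np.bincount(gray.ravel()).argmax())
    if background_level >= 245:  # background is already (near) white
        return image_bgr
    # Extract the information other than the background: pixels that
    # differ clearly from the estimated background level.
    diff = np.abs(gray.astype(np.int16) - background_level)
    foreground_mask = diff > 40
    result = np.full_like(image_bgr, 255)                  # white canvas
    result[foreground_mask] = image_bgr[foreground_mask]   # superimpose foreground
    return result
```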
The tilt correction unit 32b (see FIG. 4) corrects the tilt of the document image acquired by the document image acquisition unit 31.
The processing performed by the tilt correction unit 32b will be described with reference to FIG. 7. FIG. 7 is a diagram illustrating the tilt correction performed in preprocessing 55. FIG. 7(a) shows a document image 59a before tilt correction, and FIG. 7(b) shows a document image 59b after tilt correction.
When there is a tilted character string in the document image, the tilt correction unit 32b corrects the tilt of that character string and makes it parallel or perpendicular to the writing direction. When the document image is written vertically, the tilt correction unit 32b corrects the tilted character string so that it is parallel to the vertical writing direction, and when the document image is written horizontally, it corrects the tilted character string so that it is parallel to the horizontal writing direction.
Specifically, the tilt correction unit 32b extracts the character strings in the document image and determines whether any of the extracted character strings is tilted. When it determines that a tilted character string exists, the tilt correction unit 32b detects the tilt angle of that character string with respect to the writing direction and applies a rotation to it so that the tilt angle becomes zero.
With the tilt correction unit 32b, correcting the tilt of character strings improves the recognition accuracy of the image recognition performed by the character string recognition unit 35. Furthermore, layout recognition errors by the layout recognition unit 33 can be reduced.
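A minimal deskew sketch in the same spirit, for a horizontally written image with dark text on a light background; estimating the angle via cv2.minAreaRect over the ink pixels is a common heuristic and an illustrative assumption here, not the patented method.

```python
import cv2
import numpy as np

def correct_tilt(image_gray: np.ndarray) -> np.ndarray:
    # Coordinates of "ink" pixels (dark text on a light background).
    ink = np.column_stack(np.where(image_gray < 128)).astype(np.float32)
    if len(ink) == 0:
        return image_gray
    angle = cv2.minAreaRect(ink)[-1]   # tilt of the tightest rotated box
    if angle > 45:                     # normalize to a small skew angle
        angle -= 90
    h, w = image_gray.shape
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    # Rotate so that the tilt angle against the writing direction becomes zero.
    return cv2.warpAffine(image_gray, rotation, (w, h),
                          flags=cv2.INTER_LINEAR, borderValue=255)
```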
The shape adjustment unit 32c (see FIG. 4) adjusts the overall shape and size of the document image acquired by the document image acquisition unit 31.
The processing performed by the shape adjustment unit 32c will be described with reference to FIG. 8. FIG. 8 is a diagram illustrating the shape adjustment performed in preprocessing. FIG. 8(a) shows a document image 60a before shape adjustment, and FIG. 8(b) shows a document image 60b after shape adjustment.
When the overall shape of the document image acquired by the document image acquisition unit 31 differs from that of the actual document, the shape adjustment unit 32c adjusts the overall shape of the document image based on the overall shape of the actual document. Specifically, when the overall aspect ratio of the acquired document image differs from the overall aspect ratio of the actual document, the shape adjustment unit 32c adjusts the document image so that its overall aspect ratio becomes equal to that of the actual document.
Further, when the document image acquired by the document image acquisition unit 31 is too large or too small, subsequent processing may not be performed normally, so the shape adjustment unit 32c adjusts the size of the acquired document image so that subsequent processing is performed normally.
With the shape adjustment unit 32c, adjusting the shape and size of the document image acquired by the document image acquisition unit 31 improves the accuracy with which the layout recognition unit 33 subsequently recognizes a layout that matches the actual document, and further improves the recognition accuracy of the image recognition performed by the character string recognition unit 35.
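A minimal sketch of the shape adjustment; the A4 target aspect ratio, the tolerance, and the target width are illustrative assumptions standing in for "the shape of the actual document".

```python
import cv2
import numpy as np

A4_ASPECT = 297 / 210    # height / width assumed for the actual document
TARGET_WIDTH = 1240      # roughly A4 width at 150 dpi (an assumption)

def adjust_shape(image: np.ndarray) -> np.ndarray:
    h, w = image.shape[:2]
    # Restore the document's aspect ratio if the scan distorted it.
    if abs(h / w - A4_ASPECT) > 0.02:
        h = int(round(w * A4_ASPECT))
    # Normalize the size so later recognition stages behave consistently.
    scale = TARGET_WIDTH / w
    return cv2.resize(image, (TARGET_WIDTH, int(round(h * scale))),
                      interpolation=cv2.INTER_AREA)
```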
The layout recognition unit 33 (see FIG. 4) uses the layout learning model 14, which has learned the correspondence between the plural elements contained in a document image and the identification information of each of those elements, to specify the range within the document image 61 of each of the plural elements contained in the document image 61 acquired by the document image acquisition unit 31, to recognize the type of each of the plural elements, and to acquire the position information within the document image 61 of the range of each of the plural elements.
The type of an element may be any of a character string 48, a table 49, an image 53, a seal 52, or handwriting 51. Element types are not limited to these; staple marks 50, punch hole marks, damage (tear) marks, carbon stains from copying, and the like may also be used.
Element types suited to the type of document (for example, a contract, an invoice, a memorandum, a delivery note, or a receipt) may be used. For example, when copying carbon applied to the back of a receipt transfers to the front and becomes a stain, "carbon stain from copying" may be used as an element type so that such stains are actively removed.
The layout learning model 14 may be any of a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memorandums, a layout learning model for delivery notes, or a layout learning model for receipts.
Element types may also be classified into necessary ones and unnecessary ones according to the type of document.
In this case, among the plural elements contained in the document image 61, the layout recognition unit 33 may refrain from acquiring the position information of a recognized element that falls under the unnecessary ones, and acquire the position information of a recognized element that falls under the necessary ones. Alternatively, the layout recognition unit 33 may recognize only the necessary elements among the plural elements contained in the document image 61 and acquire their position information.
After recognizing the type of each element and acquiring the position information of the range of each element in the document image, when elements overlap each other or are too far apart, the layout recognition unit 33 corrects the range of each element and the acquired position information based on the actual document.
With reference to FIG. 9, an example of the correction processing that the layout recognition unit 33 performs to resolve a gap in a recognition range it has recognized will be described. A gap means that part of the range that the layout recognition unit 33 should recognize as an element is not recognized, so part of the element's range is lacking. FIG. 9 is a diagram illustrating the gap-resolving correction processing performed in the layout recognition processing; FIG. 9(a) shows the state before correction, and FIG. 9(b) shows the state after correction.
When recognizing the character string image 70 contained in the document image acquired by the document image acquisition unit 31 as a character string, the layout recognition unit 33 determines whether its recognition range has a gap, and when there is a gap, it performs correction processing to add the missing part.
FIG. 9(a) shows the layout recognition unit 33 having recognized the character string image 70 as a character string within recognition range 72a. The recognition range 72a is missing the left end portion of the character string image 70. The layout recognition unit 33 determines whether there is a black line within a predetermined distance around the recognition range 72a, and when there is a black line, it performs a correction that adds the range 72b containing the black line to the recognition range 72a (see FIG. 9(b)).
The presence determination performed by the layout recognition unit 33 is not limited to black lines; it may instead determine whether a line of the same color as the characters, or a line of a preset color, lies within the predetermined distance around the recognition range 72a. This is because the main purpose of the gap-resolving correction processing performed in the layout recognition processing is to improve the recognition accuracy of the character recognition processing performed afterwards.
With this correction processing, even when a gap occurs in the range of an element recognized by the layout recognition unit 33, the range can be corrected to a normal recognition range by adding the missing part, and the character string recognition unit 35 can then perform character recognition normally on the character strings contained in that element.
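A minimal sketch of this gap-resolving correction, checking for black pixels just outside the left edge of the recognition range and extending the range over them; the binarized-page convention (0 = ink), the box format (x0, y0, x1, y1), and the search margin are illustrative assumptions.

```python
import numpy as np

def expand_box_over_adjacent_ink(page: np.ndarray, box, margin: int = 15):
    x0, y0, x1, y1 = box
    # Look for black pixels in a strip just left of the recognition range.
    left = max(0, x0 - margin)
    strip = page[y0:y1, left:x0]
    ink_columns = np.where((strip < 128).any(axis=0))[0]
    if ink_columns.size > 0:
        # Extend the range so that it includes the adjacent black line.
        x0 = left + int(ink_columns.min())
    return (x0, y0, x1, y1)
```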
With reference to FIG. 10, an example of the correction that the layout recognition unit 33 performs when a recognition range it has recognized overlaps another element will be described. FIG. 10 is a diagram illustrating the overlap-resolving correction processing performed in the layout recognition processing; FIG. 10(a) shows the state before correction, and FIG. 10(b) shows the state after correction.
When recognizing the character string image 73 contained in the document image acquired by the document image acquisition unit 31 as a character string, the layout recognition unit 33 determines whether its recognition range 75a overlaps another element (for example, the table 74), and when an overlap has occurred, it performs correction processing to resolve the overlap.
FIG. 10(a) shows the layout recognition unit 33 having recognized the character string image 73 as a character string within recognition range 75a. The recognition range 75a extends across a blank space and overlaps the table 74 to the right of the character string image 73. The layout recognition unit 33 determines whether there is a blank space of a predetermined size inside the recognition range 75a, and when there is such a blank space, it performs a correction that deletes the part of the recognition range 75a consisting of the blank space and everything to its right, yielding the recognition range 75b (see FIG. 10(b)).
Since there is always a blank space of a predetermined size between one element and another, the layout recognition unit 33 concludes that a recognition range overlaps another element when a blank space of that predetermined size exists inside the recognition range. With the overlap-resolving correction processing performed in the layout recognition processing, the layout recognition unit 33 can improve the layout recognition accuracy.
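A minimal sketch of the overlap-resolving correction under the same conventions: scan the recognition range for a run of fully blank columns of the predetermined width, and cut off that blank and everything to its right; the gap threshold is an illustrative assumption.

```python
import numpy as np

def trim_box_at_wide_gap(page: np.ndarray, box, min_gap: int = 40):
    x0, y0, x1, y1 = box
    region = page[y0:y1, x0:x1]
    blank = ~((region < 128).any(axis=0))   # columns containing no ink
    run_start, run_len = 0, 0
    for x, is_blank in enumerate(blank):
        if is_blank:
            run_len += 1
            if run_len >= min_gap:
                # Delete the blank space and everything to its right.
                return (x0, y0, x0 + run_start, y1)
        else:
            run_start, run_len = x + 1, 0
    return box
```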
The processing performed by the layout recognition unit 33 will be described with reference to FIG. 11. FIG. 11 is a diagram illustrating the layout recognition performed in layout recognition processing 56; FIG. 11(a) shows the state of the document image 61 before layout recognition, and FIG. 11(b) shows the state of the document image 62 after layout recognition.
The layout recognition unit 33 specifies the ranges within the document image 61 of the elements it contains (the character string 48, table 49, seal 52, and image 53) by image recognition using the layout learning model 14.
In FIG. 11(b), for convenience of explanation, the range of the specified character string 48 is enclosed with a solid line, and the ranges of the specified table 49, seal 52, and image 53 are enclosed with broken lines. The element boundaries only need to be recognizable by the electronic document generation device 10, so they do not have to be visualized for people.
For each range specified within the document image 61, the layout recognition unit 33 recognizes the type of the corresponding element by image recognition using the layout learning model 14, and acquires, together with the element type, the position information of that range within the document image 62. The position information may be expressed in planar orthogonal coordinates whose origin is a predetermined point in the document image 62.
The layout learning model 14 is set in advance according to the type of the document image 61, and the layout recognition unit 33 recognizes the layout of the document image 61 using the preset layout learning model 14.
That is, when the document image 61 acquired by the document image acquisition unit 31 is a contract, image recognition is performed using the layout learning model 14 for contracts; when it is an invoice, using the layout learning model 14 for invoices; when it is a memorandum, using the layout learning model 14 for memorandums; when it is a delivery note, using the layout learning model 14 for delivery notes; and when it is a receipt, using the layout learning model 14 for receipts.
Since the layout recognition unit 33 selects the appropriate layout learning model 14 according to the type of the document image 61 acquired by the document image acquisition unit 31, the recognition accuracy of the layout recognition of the document image 61 can be improved.
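A minimal sketch of this per-document-type model selection; the registry keys and the model interface are illustrative assumptions, not part of the disclosed implementation.

```python
# One trained layout model per form type, loaded at start-up.
LAYOUT_MODELS = {
    "contract": None, "invoice": None, "memorandum": None,
    "delivery_note": None, "receipt": None,
}

def recognize_layout(image, document_type: str):
    model = LAYOUT_MODELS[document_type]  # model trained for this form type
    if model is None:
        raise RuntimeError(f"no layout model loaded for {document_type!r}")
    # Assumed to return a list of (element_type, bounding_box) pairs.
    return model.recognize(image)
```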
For an element whose type the layout recognition unit 33 recognized as a table, the cutout unit 34 (see FIG. 4) cuts out each of the cells in the table contained in that element and acquires the position information of each cell within the document image.
The recognition of the table 49 by the layout recognition unit 33 will be described with reference to FIG. 12. FIG. 12 is a diagram illustrating the table recognition performed in layout recognition processing 56; FIG. 12(a) shows the table 63 before recognition by the layout recognition unit 33, and FIG. 12(b) shows the table 64 after recognition by the layout recognition unit 33. In FIG. 12(b), for convenience of explanation, lines recognized as vertical lines 65 are drawn as dash-dot lines, and lines recognized as horizontal lines 66 are drawn as broken lines.
The layout recognition unit 33 recognizes the length and position of every vertical line 65 and every horizontal line 66 constituting the table 64. By recognizing the lengths and positions of all the vertical lines 65 and horizontal lines 66 constituting the table 64, the layout recognition unit 33 recognizes all the cells contained in the table 64. That is, the layout recognition unit 33 recognizes as a cell the rectangle formed by two adjacent vertical lines 65 and two adjacent horizontal lines 66.
Furthermore, the layout recognition unit 33 also recognizes the line types of the lines constituting the table 64. When an electronic document is reproduced from the acquired document image, the recognized line types are reflected in the line objects making up the table contained in the electronic document. Therefore, for example, when a table line in the document image 62 is a broken line, the corresponding table line in the electronic document reproduced from the document image 62 is expressed as a broken-line object.
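A minimal sketch of turning the recognized ruled lines into cells; it assumes every vertical line is reduced to an x coordinate and every horizontal line to a y coordinate (full-length lines only), which is a simplifying assumption.

```python
def cells_from_lines(vertical_xs, horizontal_ys):
    xs, ys = sorted(vertical_xs), sorted(horizontal_ys)
    cells = []
    for row, (y0, y1) in enumerate(zip(ys, ys[1:])):
        for col, (x0, x1) in enumerate(zip(xs, xs[1:])):
            # The rectangle bounded by two adjacent vertical lines and two
            # adjacent horizontal lines is one cell; keep its (row, column).
            cells.append({"row": row, "col": col, "box": (x0, y0, x1, y1)})
    return cells

# Example: a table with 3 columns and 2 rows.
print(cells_from_lines([0, 100, 200, 300], [0, 40, 80]))
```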
The cutout unit 34 cuts out every cell contained in the table 64 grasped by the layout recognition unit 33 into an image of each individual cell.
The cutting out of cell images by the cutout unit 34 will be described with reference to FIG. 13. FIG. 13 is a diagram illustrating the cutting out of a cell image. A cell 67 cut out by the cutout unit 34 may contain a plurality of character strings.
For every cell contained in the table 64, the cutout unit 34 acquires an image of the individual cell and the position information of that cell in the table 64. The position information may be expressed in planar orthogonal coordinates whose origin is a predetermined point in the table 64, or it may be expressed as the (row, column) in the table 64.
The cutout unit 34 reproduces all the vertical and horizontal lines constituting the table recognized by the layout recognition unit 33, and generates the position information of all the cells.
A cell 67 containing a plurality of character strings will be described with reference to FIG. 14. FIG. 14 is a diagram illustrating the character strings in a cell image.
When a cut-out cell 67 contains character strings on multiple lines, the cutout unit 34 further cuts out an image of each individual character string for all of them. The cell 67 shown in FIG. 14 contains two lines of character strings, and the cutout unit 34 cuts out a character string image 67a and a character string image 67b.
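A minimal sketch of splitting one cut-out cell into one image per line of text using a horizontal ink projection; the binarization convention and the blank-row rule are illustrative assumptions.

```python
import numpy as np

def split_cell_into_line_images(cell: np.ndarray) -> list:
    has_ink = (cell < 128).any(axis=1)   # True for rows containing ink
    line_images, start = [], None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y                           # a new text line begins
        elif not ink and start is not None:
            line_images.append(cell[start:y])   # the text line ended
            start = None
    if start is not None:                       # line runs to the bottom edge
        line_images.append(cell[start:])
    return line_images
```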
The character string recognition unit 35 (see FIG. 4) uses the character string learning model 13, which has learned the correspondence between document images and the character strings they contain, to perform character recognition on the character strings contained in the document image acquired by the document image acquisition unit 31 and to generate text data for those character strings.
The character string recognition unit 35 may perform character recognition, using the character string learning model 13, on the character strings contained in the ranges recognized by the layout recognition unit 33, and generate text data for those character strings.
The character string recognition unit 35 may also perform character recognition, using the character string learning model 13, on the character strings contained in each of the cells cut out by the cutout unit 34, and generate text data for those character strings.
The character string recognition unit 35 includes a plurality of character string learning models 13, and may use the character string learning model 13 adapted to the language of the character strings contained in each of the plural elements.
When recognizing a document image written in English, the character string recognition unit 35 can improve recognition accuracy by using a character string learning model suited to recognizing English character strings.
The character recognition performed by the character string recognition unit 35 will be described with reference to FIGS. 15 and 16.
FIG. 15 is a diagram illustrating the text data placement performed in character string recognition processing 57; FIG. 15(a) is a character string image 67a before character recognition, and FIG. 15(b) is the character string 68a after character recognition, that is, the text data 68a.
FIG. 16 is a diagram illustrating the noise removal performed in character string recognition processing 57; FIG. 16(a) is a character string image 71a before character recognition, and FIG. 16(b) is the character string 71b after character recognition, that is, the text data 71b.
In the character string image 67a shown in FIG. 15(a), a handwritten check mark remains in addition to the single line of text. The character string contains spaces between words. The character string recognition unit 35 performs character recognition on the entire character string image 67a using the character string learning model 13 and generates text data.
The character string recognition unit 35 performs character recognition on the two tokens "L/C NO:" and "ILC18H000219" in the character string image 67a and on the space between them, and generates the text data corresponding to the two tokens and the text data corresponding to the space between them (68a: see FIG. 15(b)).
Thus, since the character string recognition unit 35 also recognizes the spaces between tokens and converts them into text data, the two tokens can be placed apart from each other just as in the image 67a.
When the character string recognition unit 35 performs character recognition on the character string image 67a, the handwritten check mark is not recognized as characters and is not included in the text data, so the handwritten check mark is deleted from the output electronic document (68a: FIG. 15(b)). Accordingly, noise such as handwritten check marks that is not a target of character recognition by the character string recognition unit 35 is passively removed from the electronic document.
In the character string image 71a shown in FIG. 16(a), part of a seal overlapping the single line of text remains as noise. The character string recognition unit 35 performs character recognition on the entire character string image 71a using the character string learning model 13 and generates text data.
The character string recognition unit 35 performs character recognition on the entire character string image 71a and generates the text data corresponding to the character string "authorized to act on behalf of the" (71b: see FIG. 16(b)).
Since the noise contained in the character string image 71a is not a target of character recognition by the character string recognition unit 35, it is passively removed from the electronic document (71b: see FIG. 16(b)).
FIGS. 15 and 16 have described document images and the text data of the character strings obtained by performing character recognition on them; by training on a large number of pairs as teacher data, such as data associating FIG. 15(a) with FIG. 15(b) and data associating FIG. 16(a) with FIG. 16(b), the character string learning model 13 can realize this kind of character recognition from images using deep learning.
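A minimal sketch of how such teacher-data pairs might be organized for supervised training, assuming PyTorch and Pillow and a file layout in which each string image sits next to a same-named transcription file; all of these are illustrative assumptions rather than the disclosed training setup.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class StringImagePairs(Dataset):
    """Pairs a cut-out character string image (as in Fig. 15(a)) with its
    transcription including spaces (as in Fig. 15(b))."""
    def __init__(self, root: str):
        self.images = sorted(Path(root).glob("*.png"))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = Image.open(self.images[index]).convert("L")
        text = self.images[index].with_suffix(".txt").read_text(encoding="utf-8")
        return image, text
```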
When performing character recognition, using the character string learning model 13, on the character strings contained in the images 67a and 71a, the character string recognition unit 35 may acquire attribute data such as the size and typeface of the characters contained in those character strings. This character attribute data is reflected as attribute data of the text data output by the output unit 36, described later.
The output unit 36 (see FIG. 4) outputs the text data as text on an electronic medium.
The output unit 36 may output each piece of text data for the plural elements as text on the electronic medium at the position information of the range of the corresponding element.
The electronic medium is not limited to data electronically saved on a recording medium; it also includes data itself that an information processing device such as a personal computer can handle without it being saved on a recording medium.
The position information of an element may be expressed in planar orthogonal coordinates whose origin is a predetermined point in the document image 62.
With the output unit 36, the text data for the plural elements is output based on the position information of those elements, so an electronic document can be output in which the character strings in the acquired document image 61 have been converted into text data, with noise removed, while the layout of the document image 61 is maintained.
The output unit 36 may reflect the character attribute data acquired by the character string recognition unit 35 in the text data and output it to the electronic document. With this output unit 36, the electronic document generation device 10 can reproduce attribute data such as the size and typeface of the characters contained in the document image 61 as attribute data of the text data contained in the output electronic document.
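A minimal sketch of emitting recognized text at its recognized coordinates with its character attributes; absolutely positioned HTML stands in here for whatever electronic-document format is actually produced, which is an assumption.

```python
from html import escape

def render_page(elements, width=1240, height=1754):
    # elements: list of dicts with keys "text", "x", "y", "size", "font".
    spans = [
        f'<span style="position:absolute; left:{e["x"]}px; top:{e["y"]}px; '
        f'font-size:{e["size"]}px; font-family:{e["font"]};">'
        f'{escape(e["text"])}</span>'
        for e in elements
    ]
    return (f'<div style="position:relative; width:{width}px; '
            f'height:{height}px;">' + "".join(spans) + "</div>")

print(render_page([{"text": "L/C NO:  ILC18H000219",
                    "x": 80, "y": 120, "size": 14, "font": "serif"}]))
```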
The layout learning data generation unit 40 (see FIG. 4) accumulates a plurality of document images, each containing plural elements to which annotations associated with the type corresponding to each element have been attached, and generates layout learning data from them.
The layout learning data is used for supervised learning of the layout learning model 14.
The document images accumulated in the layout learning data may be given, together with the annotations, the position information within each document image of the ranges of the plural elements contained in that document image.
The annotated layout learning data will be described with reference to FIGS. 17 to 23. FIGS. 17 to 23 are diagrams showing examples of annotated layout learning data.
The layout learning data generation unit 40 acquires document images from the document image database 15, annotates them, and generates layout learning data. A user can also generate layout learning data manually without using the layout learning data generation unit 40; in that case, annotations can be added to the document images acquired from the document image database 15 using the user terminal 12.
With reference to FIGS. 17 and 18, the layout learning data used for training the layout learning model 14 for invoices will be described. Annotation symbols are attached to the character strings, tables, images, seals, outer frames, and noise that are the elements contained in the document image, so that the electronic document generation device 10 can identify and classify each of them.
An element corresponding to a character string is given the character string annotation symbol 76: the character string is enclosed with a rectangular border, and the tag "Text" is attached to the border as a marker. The part enclosed by the rectangular border is learned by the layout learning model 14 as the range occupied by the character string element within the document image.
An element corresponding to a table is given the table annotation symbol 77: a rectangular border is superimposed on the outer frame of the table, and the tag "Border Table" is attached to the border as a marker. The part enclosed by the rectangular border is learned by the layout learning model 14 as the range occupied by the table element within the document image.
An element corresponding to an image is given the image annotation symbol 78: a border indicating the annotation symbol is superimposed on the boundary of the image, and the tag "Image" is attached to the border as a marker. Images include logos, marks, photographs, illustrations, and the like. The part enclosed by the border is learned by the layout learning model 14 as the range occupied by the image element within the document image.
An element corresponding to a seal is given the seal annotation symbol 79: a border indicating the annotation symbol is superimposed on the boundary of the seal, and the tag "Hun" is attached to the border as a marker. The part covered by the border is learned by the layout learning model 14 as the range occupied by the seal element within the document image.
An element corresponding to an outer frame is given the outer frame annotation symbol 80: a border is superimposed on the boundary of the outer frame, and the tag "Border" is attached to the border as a marker. The layout learning model 14 learns the lengths and positions of the four line segments constituting the border.
An element corresponding to noise is given the noise annotation symbol 81: the noise is enclosed with a rectangular frame, and the tag "Noise" is attached to the frame as a marker. The part covered by the frame is learned by the layout learning model 14 as the range occupied by the noise element within the document image.
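A minimal sketch of one annotated layout-learning record, assuming a JSON serialization; the field names and coordinates are illustrative, while the tags mirror the markers described above.

```python
import json

record = {
    "image": "invoice_0001.png",
    "annotations": [
        {"tag": "Text",         "box": [80, 40, 520, 70]},
        {"tag": "Border Table", "box": [60, 300, 1180, 900]},
        {"tag": "Image",        "box": [950, 30, 1180, 120]},
        {"tag": "Hun",          "box": [1000, 950, 1120, 1070]},
        {"tag": "Noise",        "box": [10, 10, 60, 60]},
    ],
}
print(json.dumps(record, indent=2))
```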
With reference to FIG. 19, the layout learning data used for learning table recognition will be described. For a table contained in a document image acquired from the document image database 15, a dash-dot line, the vertical line annotation symbol 83, is superimposed on every vertical line constituting the table, and a broken line, the horizontal line annotation symbol 84, is superimposed on every horizontal line constituting the table.
By recognizing all the dash-dot lines and broken lines, the layout learning model 14 can learn the size of the table, the range and position it occupies, and the information of all the cells contained in the table. The cell information means the number of cells contained in the table and the position of each cell in the table; a position in the table is expressed as the (row, column) of the table.
With reference to FIGS. 20 and 21, the layout learning data used for learning the recognition of character strings in table cells will be described. FIG. 20 is layout learning data in which each cell contains one line of text. FIG. 21 is layout learning data for recognizing a table with cells containing one, two, and three lines of text.
As FIGS. 20 and 21 show, regardless of the number of lines of text contained in a single cell, each character string is given the character string annotation symbol 76: the character string is enclosed with a rectangular border, and the tag "Text" is attached to the border as a marker.
The layout learning model 14 learns the range of the character string annotation symbol 76 and the position of the character string within the table. The electronic document generation device 10 can reproduce the table by outputting to the electronic document the text data for the character strings together with the object data for all the vertical and horizontal lines constituting the table.
With reference to FIG. 22, further layout learning data used for learning the recognition of character strings in table cells will be described. Each character string contained in the cells of the table in FIG. 22 is given the character string annotation symbol 76: the character string is enclosed with a rectangular border, and the tag "Text" is attached to the border as a marker.
The layout learning model 14 learns the range of the character string annotation symbol 76 and the position information of the character string within the document. The electronic document generation device 10 can reproduce the table in the electronic document by placing the text data for the character strings at their positions within the document. In this case, the electronic document generation device 10 reproduces the table in the electronic document through the output of text data alone, without reproducing the vertical and horizontal lines constituting the table.
With reference to FIG. 23, the layout learning data used for learning the recognition of seals will be described. With the layout learning data shown in FIG. 23, the layout learning model 14 can learn the range and position of a seal from the character string element and the blank space located below the character string, without using a seal element.
Based on input, the layout learning data correction unit 41 (see FIG. 4) corrects at least one of the type of each of the plural elements acquired by the layout recognition unit 33 and the position information within the document image of the range of each of those elements, and updates the layout learning data by adding the corrected data.
Discrepancies can arise between the document image 61 before image recognition by the layout recognition unit 33 and the document image 62 after image recognition. For example, part of a character string may not be recognized, an element that should be recognized as an image may be recognized as a seal, or the position of a table may be shifted.
In such cases, the document image 62 after image recognition by the layout recognition unit 33 is corrected to match the document image 61 before image recognition, and the layout learning data is updated by adding this corrected data to it.
The layout learning unit 42 (see FIG. 4) retrains the layout learning model 14 using the layout learning data updated by the layout learning data correction unit 41.
By being retrained, the layout learning model 14 can improve its recognition accuracy for document image layouts.
The character string learning data generation unit 43 (see FIG. 4) generates the character string learning data used for supervised learning of the character string learning model 13.
Based on input, the character string learning data correction unit 44 (see FIG. 4) corrects the text data generated by the character string recognition unit 35, and updates the character string learning data by adding the corrected text data.
The character string learning unit 45 (see FIG. 4) retrains the character string learning model 13 using the character string learning data updated by the character string learning data correction unit 44.
The character string learning data generation unit 43 acquires document images from the document image database 15, annotates them, and generates character string learning data. A user can also generate character string learning data manually without using the character string learning data generation unit 43; in that case, annotations can be added to the document images acquired from the document image database 15 using the user terminal 12.
 With reference to FIG. 24, the character string learning data used for training the character string learning model 13 will be described. FIG. 24 is a diagram showing an example of annotated character string learning data; it is the output screen of the character string learning data generation unit 43, displayed on the user terminal 12 or on the output device 27 of the electronic document generation device 10.
 For each character string included in the document image acquired from the document image database 15, the character string learning data generation unit 43 attaches the text data corresponding to that character string as a text data annotation 85.
 Instead of text data, the corresponding character codes may be attached as the text data annotation 85. When a character string included in the document image contains blanks, the character string learning data generation unit 43 generates the character string learning data so that the corresponding text data likewise contains the blanks.
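 As one way to picture this, the following sketch stores each character string image together with its text label (annotation 85), preserving blanks verbatim; the JSON Lines layout and field names are assumptions made for illustration:

```python
# Hedged sketch: persist (image, label) pairs for supervised training.
import json

def write_string_learning_data(samples, path):
    """samples: iterable of (image_path, text) pairs. Blanks in `text` are
    kept verbatim so the label matches the imaged character string."""
    with open(path, "w", encoding="utf-8") as f:
        for image_path, text in samples:
            record = {"image": image_path, "annotation": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```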
 Next, with reference to FIG. 25, the electronic document generation method executed by the electronic document generation device 10 according to the present embodiment will be described together with the electronic document generation program. FIG. 25 is a flowchart of the electronic document generation program. The electronic document generation method is executed by the CPU 25 of the electronic document generation device 10 based on the electronic document generation program.
 The electronic document generation program causes the CPU 25 of the electronic document generation device 10 to realize various functions such as a document image acquisition function, a preprocessing function, a layout recognition function, a cutout function, a character recognition function, and an output function. These functions are executed in the order shown in FIG. 25, but the order may be changed as appropriate. Since each function overlaps with the above description of the electronic document generation device 10, its detailed description is omitted.
 The document image acquisition function acquires a document image obtained by converting a document into an image (S31: document image acquisition step).
 Formats of the document image include, for example, PDF, JPG, and GIF, and may include any other data format that the electronic document generation device 10 can process as an image.
 The preprocessing function performs preprocessing on the document image acquired by the document image acquisition function (S32: preprocessing step).
 The preprocessing function comprises a background removal function, a tilt correction function, and a shape adjustment function: the background removal function removes the background of the acquired document image, the tilt correction function corrects the tilt of the acquired document image, and the shape adjustment function adjusts the overall shape and size of the acquired document image.
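 A rough sketch of how these three preprocessing functions could be realized with OpenCV; the Otsu binarization, minAreaRect-based deskewing, and fixed target width are assumptions, since the embodiment does not prescribe specific operations:

```python
# Hedged sketch of S32: background removal, tilt correction, shape adjustment.
import cv2
import numpy as np

def preprocess(image, target_width=1024):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Background removal: Otsu binarization leaves the page background white.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Tilt correction: estimate the skew angle from the ink pixels.
    coords = np.column_stack(np.where(binary < 255)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:  # OpenCV reports angles in (0, 90]
        angle -= 90
    h, w = binary.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(binary, m, (w, h), borderValue=255)
    # Shape adjustment: normalize the page width, keeping the aspect ratio.
    scale = target_width / w
    return cv2.resize(deskewed, (target_width, int(h * scale)))
```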
 The layout recognition function uses the layout learning model 14, which has learned the correspondence between the plurality of elements included in a document image and the identification information of each of those elements, to specify the range of each of the plurality of elements within the acquired document image, recognize the type of each element, and acquire the position information of each element's range within the document image (S33: layout recognition step).
 The types of elements may be classified into those that are necessary and those that are unnecessary according to the type of document.
 In this case, among the plurality of elements included in the acquired document image, the layout recognition function may skip acquiring position information for a recognized element that is unnecessary, and acquire position information only for a recognized element that is necessary. Alternatively, the layout recognition function may recognize only the necessary elements among the plurality of elements included in the document image 61 and acquire their position information.
 After recognizing the type of each element and acquiring the position information of each element's range in the document image, the layout recognition function corrects those ranges and the acquired position information based on the actual document when elements overlap each other or are too far apart.
 The layout recognition function recognizes the length and position of every vertical line and horizontal line constituting a table. By grasping the lengths and positions of all the vertical and horizontal lines constituting the table, it grasps every cell included in the table; that is, the layout recognition function recognizes a rectangle formed by two adjacent vertical lines and two adjacent horizontal lines as a cell.
 Furthermore, the layout recognition function also recognizes the line types of the lines constituting the table. When an electronic document is reproduced from the acquired document image, the recognized line types are reflected in the line objects constituting the table in that electronic document. Thus, for example, if a table line in the document image is a dashed line, the corresponding table line in the reproduced electronic document is rendered as a dashed-line object.
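 The following sketch illustrates one way the ruling lines and cells of a table can be recovered with morphological opening; the kernel lengths and the area filter are illustrative assumptions:

```python
# Hedged sketch: extract a table's vertical/horizontal lines and its cells.
import cv2

def find_cells(binary_table):  # ink = 0, background = 255
    inv = 255 - binary_table
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
    horizontal = cv2.morphologyEx(inv, cv2.MORPH_OPEN, h_kernel)
    vertical = cv2.morphologyEx(inv, cv2.MORPH_OPEN, v_kernel)
    grid = cv2.bitwise_or(horizontal, vertical)
    # The regions enclosed by adjacent vertical and horizontal lines are the
    # cells; they appear as connected components of the grid's complement.
    contours, _ = cv2.findContours(255 - grid, cv2.RETR_CCOMP,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h_img, w_img = binary_table.shape
    boxes = [cv2.boundingRect(c) for c in contours]
    # Drop the page background and speckle noise, keeping plausible cells.
    return [b for b in boxes
            if b[2] < w_img and b[3] < h_img and b[2] * b[3] > 100]
```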
 For an element whose type recognized by the layout recognition function is a table, the cutout function cuts out each of the cells in the table included in that element and acquires the position information of each cell within the document image (S34: cutout step).
 The cutout function reproduces all the vertical and horizontal lines constituting the table recognized by the layout recognition function, and generates the position information of all the cells.
 A cell cut out by the cutout function may contain a plurality of character strings. When a cut-out cell contains multiple lines of text, the cutout function further cuts out an image of each individual character string.
 The images of character strings recognized by the layout recognition function and the images of character strings cut out by the cutout function are sent to the character recognition function one line at a time.
 The character recognition function uses a character string learning model, which has learned the correspondence between document images and the character strings they contain, to recognize the character strings included in the acquired document image and generate text data for those character strings (S35: character recognition step).
 The output function outputs the text data as text on an electronic medium (S36: output step).
 The output function outputs the text data based on the position information of the character strings acquired by the layout recognition function and the position information of the cells within the document image acquired by the cutout function, reproducing them as text on the electronic medium.
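 As a simple illustration of this output step, recognized strings can be placed back at their document-image coordinates, for example as absolutely positioned HTML; the element record shape (text, x, y) is an assumption:

```python
# Hedged sketch of S36: re-lay out recognized text using its position info.
def render_html(elements):
    rows = [
        f'<div style="position:absolute; left:{e["x"]}px; top:{e["y"]}px">'
        f'{e["text"]}</div>'
        for e in elements
    ]
    return "<html><body>\n" + "\n".join(rows) + "\n</body></html>"
```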
 Next, an embodiment of the above electronic document generation program that converts a document image of a receipt into an electronic document will be described with reference to FIGS. 26 to 28. FIGS. 26 to 28 are flowcharts of this embodiment of the electronic document generation program; combined, they form the flowchart of a single electronic document generation program.
 In step S102, the document image acquisition unit 31 acquires a document image or a PDF from the document image database 15.
 In step S103, it is determined whether the data acquired by the document image acquisition unit 31 is a PDF. If it is not a PDF (No: S103), that is, if the acquired data is a document image, the process proceeds to step S106.
 If the data acquired by the document image acquisition unit 31 is a PDF (Yes: S103), the process proceeds to step S104, where the PDF is converted into a document image, and the resulting document image is then acquired (S105).
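 One plausible realization of this conversion (steps S104 to S105), assuming the pdf2image package (a Poppler wrapper) is available; the embodiment does not name a specific library:

```python
# Hedged sketch of S104-S105: rasterize each PDF page into a document image.
from pdf2image import convert_from_path

def pdf_to_document_images(pdf_path, dpi=300):
    return convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
```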
 In step S106, the preprocessing unit 32 performs preprocessing on the acquired document image. The preprocessing unit 32 comprises a background removal unit 32a, a tilt correction unit 32b, and a shape adjustment unit 32c.
 The background removal unit 32a removes the background of the acquired document image. The tilt correction unit 32b corrects the tilt of any character strings in the acquired document image that are tilted. The shape adjustment unit 32c adjusts the overall shape and size of the acquired document image.
 In step S107, the layout recognition unit 33 acquires the document image that has undergone the preprocessing performed by the preprocessing unit 32.
 The acquired preprocessed document image is sent to the document image cutout processes of steps S115, S120, and S136 described later.
 In steps S108 and S109, the layout recognition unit 33 performs layout recognition on the document image, specifies the range of each of the plurality of elements included in the document image, and acquires the type and position information of each element.
 The element types are character string, table, image, seal, and handwriting.
 In step S110, the layout recognition unit 33 adjusts the position information of the minimum bounding box of each acquired element.
 The minimum bounding box is the rectangle of smallest area that encloses an element, and represents the range occupied by that element. The layout recognition unit 33 collates the document image with the acquired elements and, if there is a discrepancy between the document image and the position information of an acquired element, adjusts the position information of that element's minimum bounding box.
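 A minimal sketch of this adjustment: the predicted box is tightened to the smallest rectangle that still encloses the element's ink pixels; the ink threshold and padding are assumptions:

```python
# Hedged sketch of S110: recompute a minimum bounding box from ink pixels.
import numpy as np

def tighten_bbox(binary_page, box, pad=2):  # box = (x0, y0, x1, y1)
    x0, y0, x1, y1 = box
    region = binary_page[y0:y1, x0:x1]
    ys, xs = np.where(region < 128)  # ink pixels inside the predicted box
    if len(xs) == 0:
        return box                   # nothing inside; leave the box as is
    return (x0 + int(xs.min()) - pad, y0 + int(ys.min()) - pad,
            x0 + int(xs.max()) + pad, y0 + int(ys.max()) + pad)
```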
 In step S111, the layout recognition unit 33 acquires the layout information resulting from the minimum-bounding-box adjustment performed in step S110. This layout information includes the type and position information of each element.
 In step S112, the layout recognition unit 33 refers to the internally stored element layout information sent by the process of step S130, described later, and determines whether any other elements remain in the document image.
 If the internally stored element layout information sent by the process of step S130 contains the layout information of all the elements, the layout recognition unit 33 determines that no other elements remain in the document image (No: S112), proceeds to step S131 to terminate the loop from step S112 to step S130, and then proceeds to step S132.
 On the other hand, if the internally stored element layout information sent by the process of step S130 does not contain the layout information of all the elements, the layout recognition unit 33 determines that other elements remain in the document image (Yes: S112) and proceeds to step S113.
 In step S113, the layout recognition unit 33 determines whether an element remaining in the document image is a table.
 If no table remains in the document image (No: S113), the layout information other than tables is sent to step S130, described later.
 If a table remains in the document image (Yes: S113), the process proceeds to step S114. Since the document image here relates to a receipt, it usually includes a table. Therefore, if it is determined that the document image contains no table, the layout recognition unit 33 may interrupt the process and confirm whether the electronic document actually relates to a receipt.
 In step S114, the layout recognition unit 33 acquires the size and position information of all the vertical and horizontal lines constituting the table in the document image. Once the size and position information of all these lines is acquired, the size and position of every cell included in the table can be obtained.
 In step S115, the cutout unit 34 cuts out the image of the table from the preprocessed document image acquired in step S107.
 In step S116, the cutout unit 34 acquires the table image cut out in step S115.
 In steps S117 and S118, the cutout unit 34 performs a process of extracting cells from the table image acquired in step S116 (step S117) and acquires the cell information (step S118).
 The cell information consists of the row, column, and coordinates corresponding to the position of the cell in the table.
 The cell information acquired in step S118 is sent to step S127, described later.
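 A minimal record for this cell information might look as follows; the field names are illustrative, not taken from the embodiment:

```python
# Hedged sketch: the row, column, and coordinates acquired in S118.
from dataclasses import dataclass

@dataclass
class CellInfo:
    row: int     # 0-based row index within the table
    col: int     # 0-based column index within the table
    x: int       # left edge in document-image coordinates
    y: int       # top edge in document-image coordinates
    width: int
    height: int
```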
 In step S119, the cutout unit 34 refers to the internally stored table layout information sent by the process of step S127 and determines whether any other cells remain in the table.
 If the internally stored cell layout information sent by the process of step S127 contains the layout information of all the cells, the cutout unit 34 determines that no other cells remain in the table (No: S119), proceeds to step S128 to terminate the loop from step S119 to step S127, and then proceeds to step S130.
 On the other hand, if the internally stored cell layout information sent by the process of step S127 does not contain the layout information of all the cells, the cutout unit 34 determines that other cells remain in the table (Yes: S119) and proceeds to step S120.
 In step S120, the cutout unit 34 cuts out a cell image from the preprocessed document image acquired in step S107.
 In step S121, the cutout unit 34 acquires the cell image cut out in step S120.
 In step S122, the character string recognition unit 35 performs character string recognition on the cell image acquired in step S121.
 In step S123, the character string recognition unit 35 acquires the position information of the recognized character strings.
 In step S124, the character string recognition unit 35 adjusts the position information of the minimum bounding box of each character string acquired in step S123.
 The character string recognition unit 35 collates the document image with the acquired character string position information and, if there is a discrepancy between the two, adjusts the position information of the character string's minimum bounding box.
 In step S125, the character string recognition unit 35 acquires the position information resulting from the minimum-bounding-box adjustment performed in step S124.
 In steps S126 and S127, the character string recognition unit 35 merges the cell information acquired in step S118 with the adjusted character string position information acquired in step S125 (step S127), and stores the result in the internal storage as table layout information (step S126). The internal storage refers to the RAM 23 and/or the storage unit 24 shown in FIG. 2.
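 A sketch of this merge, reusing the illustrative CellInfo record from above; the mapping from (row, col) to recognized strings is an assumed data shape:

```python
# Hedged sketch of S127: attach recognized strings to their source cells.
def merge_table_layout(cells, recognized):
    """cells: list of CellInfo; recognized: dict mapping (row, col) to a
    list of (text, bbox) tuples from the character string recognition."""
    table_layout = []
    for cell in cells:
        strings = recognized.get((cell.row, cell.col), [])
        table_layout.append({"cell": cell, "strings": strings})
    return table_layout
```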
 The processes of steps S119 to S127 are performed for all the cells included in the table. After these processes have been performed for the last cell in the table, the loop termination process of step S128 is performed, and the character string recognition unit 35 proceeds to step S130.
 In steps S129 and S130, the output unit 36 merges the table layout information acquired in step S126 with the non-table layout information acquired in step S113 (step S130), and stores the result in the internal storage as the layout information of all elements (step S129).
 The processes of steps S112 to S130 are performed for all the elements included in the document image. After these processes have been performed for the last element in the document image, the loop termination process of step S131 is performed, and the character string recognition unit 35 proceeds to step S132.
 In step S132, the character string recognition unit 35 determines whether any other elements remain in the document image.
 The character string recognition unit 35 refers to the internally stored element layout information sent by the process of step S140, described later, and determines whether other elements remain in the document image.
 If the internally stored element layout information sent by the process of step S140 contains the layout information of all the elements, the character string recognition unit 35 determines that no other elements remain in the document image (No: S132), proceeds to step S141 to terminate the loop from step S132 to step S140, and then proceeds to step S142.
 On the other hand, if the internally stored element layout information sent by the process of step S140 does not contain the layout information of all the elements, the character string recognition unit 35 determines that other elements remain in the document image (Yes: S132) and proceeds to step S133.
 In step S133, the character string recognition unit 35 determines whether an element remaining in the document image is a character string.
 If the character string recognition unit 35 determines that the remaining element is a character string (Yes: S133), the process proceeds to step S135.
 If the character string recognition unit 35 determines that the remaining element is not a character string (No: S133), the loop continuation process returns to step S132 (step S134).
 In step S135, the character string recognition unit 35 acquires the position information of the character string.
 In steps S136 and S137, the character string recognition unit 35 cuts out the image of the character string from the preprocessed document image acquired in step S107 (step S136) and acquires the character string image (step S137).
 In steps S138 and S139, the character string recognition unit 35 performs character string recognition on the character string image acquired in step S137 (step S138) and generates the text data predicted by the character string recognition process (step S139).
 In step S140, the character string recognition unit 35 merges the character string position information acquired in step S135 with the text data generated in step S139 to generate the element's layout information. The generated element layout information is sent to step S129, where it is stored in the internal storage. The internal storage refers to the RAM 23 and/or the storage unit 24 shown in FIG. 2.
 The processes from step S132 to step S140 are repeated until it is determined in step S132 that the element layout information sent by the process of step S140 contains the layout information of all the elements.
 In step S141, upon the determination in step S132 that the element layout information sent by the process of step S140 contains the layout information of all the elements, the loop from step S132 to step S140 is terminated and the process proceeds to step S142.
 In step S142, the electronic document generation device 10 performs post-processing. In the post-processing, the text data, images, and position information of all the elements are output to JSON (JavaScript Object Notation) and converted to TSV (Tab-Separated Values), among other operations.
 The processing of each functional unit described above is executed by the CPU 25 of the electronic document generation device 10.
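 The post-processing of S142 can be pictured as follows; the field names and TSV columns are illustrative assumptions:

```python
# Hedged sketch of S142: emit all element information as JSON and TSV.
import json

def postprocess(elements, json_path, tsv_path):
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(elements, f, ensure_ascii=False, indent=2)
    with open(tsv_path, "w", encoding="utf-8") as f:
        f.write("type\tx\ty\ttext\n")
        for e in elements:
            f.write(f'{e["type"]}\t{e["x"]}\t{e["y"]}\t{e.get("text", "")}\n')
```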
 In step S143, the output unit 36 outputs the information of all the post-processed elements in a final form as an electronic document, such as a plain text file, HTML (HyperText Markup Language), a file format editable by commercially available text editing software, or an editable PDF.
 According to the embodiment described above, the electronic document generation device 10 recognizes the layout of a document image using the layout learning model 14, and then performs character recognition on the document image using the character string learning model 13. That is, since the electronic document generation device 10 identifies the types of the plurality of elements included in the document image and performs character recognition suited to each element type, the recognition accuracy of character recognition can be improved.
 Furthermore, according to the above embodiment, the electronic document generation device 10 performs character recognition on a document image per character string using the character string learning model 13, which improves recognition efficiency compared with the character-by-character recognition performed by conventional OCR text recognition technology.
 Furthermore, according to the above embodiment, since the electronic document generation device 10 performs character recognition per character string rather than character by character, it can suppress the influence of noise overlapping the characters, improving recognition accuracy compared with character-by-character recognition.
 Furthermore, according to the above embodiment, even characters that would be misrecognized by conventional OCR text recognition technology can be correctly recognized by character recognition using the character string learning model 13. For example, when a seal is stamped over a character, conventional OCR text recognition technology might misrecognize that character, whereas character recognition using the character string learning model 13 can recognize it correctly.
 Furthermore, according to the above embodiment, for elements whose type is a table, the electronic document generation device 10 performs character recognition per character string on the image of each individual cell, improving the recognition accuracy of character recognition for character strings contained in tables.
 Furthermore, according to the above embodiment, the character string learning model 13 and the layout learning model 14 are trained with annotated character string learning data and layout learning data, which improves the recognition accuracy of the layout recognition unit 33 and the character string recognition unit 35.
 Furthermore, according to the above embodiment, when the document image contains a table, all the vertical and horizontal lines constituting the table are recognized first, and then all the cells included in the table are recognized. Character string recognition is then performed on the image of each cell, unaffected by the cell's position inside the table, which improves the recognition accuracy for the character strings inside the cells.
 The present disclosure is not limited to the electronic document generation device 10 according to the embodiment described above, and can be implemented in various other modifications or applications without departing from the gist of the present disclosure set forth in the claims.
10 Electronic document generation device
11 Information communication network
12 User terminal
13 Character string learning model
14 Layout learning model
15 Document image database
20 Input/output interface
21 Communication interface
22 ROM
23 RAM
24 Storage unit
25 CPU
26 Input device
27 Output device
28 GPU
31 Document image acquisition unit
32 Preprocessing unit
 32a Background removal unit
 32b Tilt correction unit
 32c Shape adjustment unit
33 Layout recognition unit
34 Cutout unit
35 Character string recognition unit
36 Output unit
40 Layout learning data generation unit
41 Layout learning data correction unit
42 Layout learning unit
43 Character string learning data generation unit
44 Character string learning data correction unit
45 Character string learning unit
47 Document image before tilt correction
48 Character string
49 Table
50 Staple marks
51 Handwriting
52 Seal
53 Image
54 Noise removal
55 Preprocessing
56 Layout recognition processing
57 Character string recognition processing
58a, 59a, 60a Document images
58b, 59b, 60b Document images
61, 62 Document images
63, 64 Tables
65 Vertical line
66 Horizontal line
67 Cell image
69, 70, 73 Character string images
71a Character string image
71b Text data
72 Recognition range
73 Table
75 Recognition range
76 Character string annotation symbol
77 Table annotation symbol
78 Image annotation symbol
79 Seal annotation symbol
80 Outer frame annotation symbol
81 Noise annotation symbol
82 Handwriting annotation symbol
83 Vertical line annotation symbol
84 Horizontal line annotation symbol
85 Text data annotation
100 Electronic document generation system
S31 Document image acquisition step
S32 Preprocessing step
S33 Layout recognition step
S34 Cutout step
S35 Character recognition step
S36 Output step

Claims (16)

  1.  An electronic document generation device comprising:
     a document image acquisition unit that acquires a document image obtained by converting a document into an image;
     a character string recognition unit that, using a character string learning model that has learned the correspondence between document images and the character strings included therein, recognizes a character string included in the document image acquired by the document image acquisition unit and generates text data for that character string; and
     an output unit that outputs the text data as text on an electronic medium.
  2.  The electronic document generation device according to claim 1, further comprising a layout recognition unit that, using a layout learning model that has learned the correspondence between a plurality of elements included in a document image and the identification information of each of those elements, specifies the range of each of the plurality of elements included in the document image acquired by the document image acquisition unit, recognizes the type of each of the plurality of elements, and acquires position information, within the document image, of the range of each of the plurality of elements,
     wherein the character string recognition unit recognizes, using the character string learning model, the character strings included in the ranges recognized by the layout recognition unit and generates text data for those character strings, and
     the output unit outputs each of the text data of the plurality of elements as text on an electronic medium, at the position information of each of the ranges of the plurality of elements.
  3.  The electronic document generation device according to claim 2, wherein the type of an element is one of a character string, a table, an image, a seal, and handwriting.
  4.  The electronic document generation device according to claim 3, further comprising a cutout unit that, for an element whose type recognized by the layout recognition unit is the table, cuts out each of the cells in the table included in that element and acquires position information of each of the cells within the document image,
     wherein the character string recognition unit performs character recognition, using the character string learning model, on the character strings included in each of the cells cut out by the cutout unit and generates text data for those character strings.
  5.  The electronic document generation device according to any one of claims 2 to 4, further comprising a layout learning data generation unit that accumulates a plurality of document images, each containing a plurality of the elements with each element given an annotation associated with its corresponding type, and generates layout learning data,
     wherein the layout learning data is used for supervised learning of the layout learning model.
  6.  The electronic document generation device according to claim 5, wherein, together with the annotations, position information within the document image of each of the ranges of the plurality of elements included in the document image is attached to the document image.
  7.  The electronic document generation device according to claim 5 or 6, further comprising a layout learning data correction unit that corrects, based on input, at least one of the type of each of the plurality of elements recognized by the layout recognition unit and the position information within the document image of the range of each of the plurality of elements, and updates the layout learning data by adding the corrected data.
  8.  The electronic document generation device according to claim 7, further comprising a layout learning unit that retrains the layout learning model using the layout learning data updated by the layout learning data correction unit.
  9.  The electronic document generation device according to any one of claims 2 to 8, further comprising a character string learning data generation unit that generates character string learning data used for supervised learning of the character string learning model.
  10.  The electronic document generation device according to claim 9, further comprising a character string learning data correction unit that corrects, based on input, the text data generated by the character string recognition unit and updates the character string learning data by adding the corrected text data.
  11.  The electronic document generation device according to claim 10, further comprising a character string learning unit that retrains the character string learning model using the character string learning data updated by the character string learning data correction unit.
  12.  The electronic document generation device according to any one of claims 2 to 11, wherein the character string recognition unit comprises a plurality of the character string learning models and uses the character string learning model adapted to the language of the character string included in each of the plurality of elements.
  13.  The electronic document generation device according to any one of claims 2 to 12, further comprising a preprocessing unit that performs preprocessing on the document image acquired by the document image acquisition unit,
     wherein the preprocessing unit comprises a background removal unit, a tilt correction unit, and a shape adjustment unit,
     the background removal unit removes the background of the document image acquired by the document image acquisition unit,
     the tilt correction unit corrects the tilt of the document image acquired by the document image acquisition unit, and
     the shape adjustment unit adjusts the overall shape and size of the document image acquired by the document image acquisition unit.
  14.  The electronic document generation device according to any one of claims 2 to 13, wherein the layout learning model is one of a layout learning model for contracts, a layout learning model for invoices, a layout learning model for memoranda, a layout learning model for statements of delivery, and a layout learning model for receipts.
  15.  An electronic document generation method in which a computer used in an electronic document generation device executes:
     a document image acquisition step of acquiring a document image obtained by converting a document into an image;
     a character string recognition step of recognizing, using a character string learning model that has learned the correspondence between document images and the character strings included therein, a character string included in the document image acquired in the document image acquisition step, and generating text data for that character string; and
     an output step of outputting the text data as text on an electronic medium.
  16.  An electronic document generation program causing a computer used in an electronic document generation device to realize:
     a document image acquisition function of acquiring a document image obtained by converting a document into an image;
     a character string recognition function of recognizing, using a character string learning model that has learned the correspondence between document images and the character strings included therein, a character string included in the document image acquired by the document image acquisition function, and generating text data for that character string; and
     an output function of outputting the text data as text on an electronic medium.