WO2021059848A1

WO2021059848A1 - Information processing device, information processing method, and information processing program

Info

Publication number: WO2021059848A1
Application number: PCT/JP2020/032346
Authority: WO
Inventors: 択渡久地
Original assignee: ＡＩｉｎｓｉｄｅ株式会社
Priority date: 2019-09-27
Filing date: 2020-08-27
Publication date: 2021-04-01
Also published as: JP2021056659A; JP6722929B1

Abstract

An information processing device 100 in an information processing system 1 comprises the following as functions thereof: an image data acquisition unit 131 that acquires image data; a reading item recognition unit 132 that identifies the location of character information included in the image data and recognizes the same as reading items; a text data generation unit 133 that reads out character information pertaining to the recognized portions and performs character recognition; an attribute settings unit 134 that sets attributes pertaining to text data; a correct data extraction unit 135 that compares the text data to correct text data to determine whether there is a match, and extracts the text data determined to be a match; and a learning unit 136 that performs machine learning on the basis of the extracted text data and the image data serving as the source of the text data.

Description

Information processing equipment, information processing methods and information processing programs

This disclosure relates to an information processing device, an information processing method, and an information processing program that read character information from image data including character information.

As an example of image data including character information, a technique for converting input information into a predetermined character code and generating text data by reading forms with an image scanner and performing OCR (Optical Character Recognition) processing has become widespread. are doing.

The process of generating text data from forms, which is performed by such a technique, can be read with a certain accuracy in a standard form described in a predetermined format, but can be read in an atypical form. May be less accurate. This is because, in the case of an atypical form, it is unclear what kind of entry item is placed at which position on the form, and even if OCR processing is performed with the entry item unknown, it is difficult to read with high accuracy. It depends on what it is.

Therefore, for example, Patent Document 1 discloses a document structure analysis device that performs document structure analysis on an atypical document. This document structure analysis device acquires the lines of the read document, extracts the attribute probabilities for each attribute, what kind of line (title, export, continuation from the previous line, etc.), and there are multiple possibilities. A multiple temporary document structure network that expresses the document structure of is generated. Using this network, the consistency of the document structure is analyzed while reducing the ambiguity of the document structure.

Japanese Unexamined Patent Publication No. 2015-127913

By the way, commonly used forms, especially forms such as documents used in the business world, may have different formats depending on the creator, such as invoices, but the description content is as follows. Often similar content is described. By reading the form and performing machine learning on such a form without requiring complicated processing as described in Patent Document 1, what kind of description item is arranged at which position on the form can be determined. A method that can be grasped with high accuracy has been desired.

Therefore, in the present disclosure, an information processing device, an information processing method, and an information processing program capable of improving the accuracy of recognizing the position of a read item by reading image data including character information and performing machine learning will be described.

The information processing device according to one aspect of the present disclosure is an information processing device that reads character information from image data including character information and performs machine learning about the position of the read character information in the image data, and is an image for acquiring image data. A data acquisition unit, a reading item recognition unit that identifies the position of character information contained in image data and recognizes it as a reading item, and a text data generation unit that recognizes the character information in the reading item and generates text data. , The text data and the correct text data indicating the character information contained in the image data stored in advance are compared for each read item, and it is determined whether or not they match, and it is determined that they match. Machine learning is performed based on the correct answer data extraction unit that extracts text data, the extracted text data, and the position of the read item in the image data that is the basis of the extracted text data, and the learning model is generated and updated. It is equipped with a learning department to perform.

The information processing method in one aspect of the present disclosure is an information processing method in which character information is read from image data including character information and machine learning is performed on the position of the read character information in the image data, which is performed by the image data acquisition unit. , The image data acquisition step of acquiring image data, the reading item recognition step of identifying the position of the character information included in the image data and recognizing it as a reading item, and the text data generation unit, which are performed by the reading item recognition unit. The text data generation step of recognizing the character information in the read item and generating the text data, the text data performed by the correct answer data extraction unit, and the correct answer text data indicating the character information included in the image data stored in advance. Is compared for each read item, it is judged whether or not they match, and the correct answer data extraction step of extracting the text data determined to match, and the extracted text data performed by the learning unit , A learning step of performing machine learning based on the position of a read item in the image data on which the extracted text data is based, and generating and updating a learning model.

Further, the information processing program according to one aspect of the present disclosure is an information processing program that reads character information from image data including character information and performs machine learning about the position of the read character information in the image data, and acquires the image data. Image data acquisition step to be performed, a reading item recognition step that identifies the position of character information contained in the image data and recognizes it as a reading item, and a text data generation that performs character recognition of the character information in the reading item and generates text data. The step, the text data, and the correct text data indicating the character information contained in the image data stored in advance are compared for each read item, and it is determined whether or not they match, and it is determined that they match. Machine learning is performed based on the correct answer data extraction step to extract the extracted text data, the extracted text data, and the position of the read item in the image data that is the basis of the extracted text data, and the learning model is generated and Let the electronic computer execute the learning step to update.

According to the present disclosure, the position of the character information contained in the image data is identified and recognized as a reading item, the character information contained in the image data is recognized as a character to generate text data, and the correct text data and the reading item are obtained. Each item is compared to determine whether or not they match, and machine learning is performed based on the text data determined to match and the underlying image data. Therefore, it is possible to improve the reading accuracy of the image data. In addition, since it is possible to improve the accuracy of recognizing the position of the read item without requiring complicated processing, a machine learning model for reading image data including character information can be generated without requiring a lot of trouble. It is possible to do.

It is a functional block block diagram which shows the information processing system which concerns on one Embodiment of this disclosure. It is a functional block block diagram which shows the user terminal 200 of FIG. It is a flowchart which shows the operation of the information processing apparatus 100 of FIG. It is a schematic diagram which shows the example of the image data P1 acquired by the image data acquisition unit 131 of FIG. It is a schematic diagram which shows the example of the recognition of the reading item performed by the reading item recognition unit 132 of FIG. It is a schematic diagram which shows the example of the text data generated and attribute set by the text data generation unit 133 and the attribute setting unit 134 of FIG. It is a schematic diagram which shows the example of the correct answer data stored in the correct answer text data DB 122 of FIG. It is a schematic diagram which shows the example of the determination in the correct answer data extraction unit 135 of FIG. It is a functional block block diagram which shows the information processing system which concerns on one Embodiment of this disclosure. It is a functional block block diagram which shows the computer 700 which concerns on one Embodiment of this disclosure.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. The embodiments described below do not unreasonably limit the contents of the present disclosure described in the claims. Also, not all of the components shown in the embodiments are essential components of the present disclosure.

(Embodiment 1)
<Structure>
FIG. 1 is a functional block configuration diagram showing an information processing system 1 according to the first embodiment of the present disclosure. This information processing system 1 is not limited, but as an example, performs character recognition of character information included in image data including character information to generate text data, and normally read text data and an image on which the information data is based. It is a system that performs machine learning based on data. The information processing system 1 includes correct text data indicating character information included in the image data in order to determine whether or not the text data has been read normally. The generated text data is compared with the correct text data to determine whether or not they match, and if it is determined to match, it is determined to be normally read text data.

Here, in the present embodiment, as image data including character information, image data obtained by scanning forms as an image is described as an example, but the data is not limited to such form data. In this example, the forms to be scanned are atypical documents. An atypical document is a document such as an invoice, which may have a different format depending on the creator, and has similar contents as the description contents, but the information processing in the present embodiment is described. The forms to be scanned by the system 1 are not limited to this.

The information processing system 1 has an information processing device 100, a user terminal 200, and a network NW. The information processing device 100 and the user terminal 200 are connected to each other via a network NW. The network NW is a communication network for communication, and is not limited to the Internet, an intranet, a LAN (Local Area Network), a WAN (Wide Area Network), a wireless LAN (Wireless LAN: WAN), and a wireless WAN (Wireless LAN). It is composed of a communication network including Wireless WAN: WAN), Virtual Private Network (VPN), and the like.

The information processing device 100 identifies the position of the character information included in the image data, recognizes it as a reading item, performs character recognition of the character information included in the image data, generates text data, and generates the text data, and the text data and the correct answer text. Learning to compare with the data to determine whether or not they match, and to estimate the position of the character information in the image data based on the text data determined to match and the underlying image data. It is a device that performs machine learning about the model. Specifically, the information processing device 100 is composed of, for example, a computer (desktop, laptop, tablet, etc.) that controls various devices, a server device, and the like. The information processing device 100 is not limited to a device that operates independently, and may be a distributed server system or a cloud server in which a plurality of devices are connected to each other via a communication network and cooperate with each other to perform communication.

The user terminal 200 is a device that receives operation input performed by the user to the information processing device 100, and is not limited to the user terminal 200, and is composed of, for example, a smartphone, a mobile terminal, a computer (desktop, laptop, tablet, etc.). There is. In this user terminal 200, as an example, not limited to, an application for receiving the service of the information processing system 1 is installed, or a URL or the like for accessing the information processing device 100 is set, and tap or double them. The service is started by clicking or starting the service.

The information processing device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.

The communication unit 110 is a communication interface for communicating with the user terminal 200 by wire or wirelessly via the network NW, and any communication protocol may be used as long as mutual communication can be executed. The communication unit 110 is not limited, and for example, communication is performed by a communication protocol such as TCP / IP (Transmission Control Protocol / Internet Protocol).

The storage unit 120 stores programs, input data, and the like for executing various control processes and each function in the control unit 130. The storage unit 120 is not limited, but as an example, a RAM (RandomAccessMemory) and a ROM (ReadOnly). It is composed of a memory including a memory) and a storage including an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory and the like. Further, the storage unit 120 stores the image data DB 121, the correct answer T (text) data DB 122, the text data DB 123, and the reading learning model DB 124. Further, the storage unit 120 temporarily stores the data when communicating with the user terminal 200 and the data generated by each process described later. The image data DB 121, the correct text data DB 122, the text data DB 123, and the reading learning model DB 124 are databases that can be accessed, referenced, and updated from various programs of the control unit 130.

The image data DB 121 stores image data obtained by scanning forms as an image with a scanner device, or path information of a storage destination in which the image data is stored. This image data is for reading character information by OCR. As mentioned above, the forms to be scanned are, for example, invoices, etc., and the invoice source and billing destination companies do not have to be the same company, etc., and the format of the invoices, etc. is unified. You don't have to. In the present embodiment, the image data scanned by the scanner device is targeted, but it is sufficient that the forms in the paper medium are converted into electronic data, and for example, the photographic image data captured by a camera or the like may be used.

In the correct answer text data DB 122, character information included in the image data stored in the image data DB 121 (or the image data acquired from the storage destination path information stored in the image data DB 121) is stored as the correct answer text data. Has been done. This correct text data is used to determine whether or not the character information has been read normally by OCR. This correct answer text data is not limited, but is stored for each item described in the underlying forms as an example, and attributes are set for each item. The attribute is, for example, the name of the item, such as "form name", "company name", and "date".

In the text data DB 123, the character information is read by OCR from the image data stored in the image data DB 121 (or the image data acquired from the path information of the storage destination stored in the image data DB 121), and the text is generated. The data is stored. Whether or not this text data is read normally is determined, and the text data read normally is used for machine learning. This text data is not limited, but is stored for each item described in the read forms as in the case of the correct answer text data, and the attribute is set for each item.

The reading learning model DB 124 stores a learning model generated by machine learning based on normally read text data. This learning model is model information for generating text data by performing character recognition of character information included in image data obtained by scanning forms as an image and estimating the position in the image data.

The control unit 130 controls the entire operation of the information processing device 100 by executing a program stored in the storage unit 120, and is not limited to, but as an example, a CPU (Central Processing Unit), an MPU ( Equipment including MicroProcessingUnit), GPU (GraphicsProcessingUnit), microprocessor (Microprocessor), processor core (Processorcore), multiprocessor (Multiprocessor), ASIC (Application-Specific IntegratedCircuit), FPGA (Field ProgrammableGateArray) Etc. The functions of the control unit 130 include an image data acquisition unit 131, a reading item recognition unit 132, a text data generation unit 133, an attribute setting unit 134, a correct answer data extraction unit 135, and a learning unit 136. The image data acquisition unit 131, the reading item recognition unit 132, the text data generation unit 133, the attribute setting unit 134, the correct answer data extraction unit 135, and the learning unit 136 are activated by the program stored in the storage unit 120 to provide information. It is executed by the processing device 100.

The image data acquisition unit 131 receives from the user terminal 200 the image data obtained by scanning the forms, which are examples of the image data including the character information, as an image by the scanner device, or the path information of the storage destination in which the image data is stored. Obtained via the communication unit 110. For example, since the scanner device is directly connected to the user terminal 200 or is connected via the network NW and the scanned image data is transmitted from the user terminal 200, the image data may be acquired. Further, since the image data scanned by another external device is acquired by the user terminal 200 and the image data is transmitted, the image data may be acquired. The scanner device or external device in this case is not shown. The image data acquired by the image data acquisition unit 131 is stored in the image data DB 121.

The reading item recognition unit 132 identifies the position of the character information included in the image data from the image data stored in the image data DB 121, and recognizes it as a reading item for reading the character information by OCR. As described above, in the case of image data obtained by scanning an invoice, for example, the character information contained in the image data, for example, the character information such as "invoice" and "○○ Co., Ltd." is rectangular, for example. It is selected to be enclosed in and recognized as a reading item.

The text data generation unit 133 reads character information by OCR and performs character recognition on a portion recognized as a reading item by the reading item recognition unit 132, and generates text data. As described above, in the case of image data obtained by scanning an invoice, for example, characters such as "invoice" and "○○ Co., Ltd." in the portion recognized as a read item are read as character information, and the text data is displayed. Will be generated. The generated text data is stored in the text data DB 123 for each read item.

The attribute setting unit 134 has attributes such as "form name", "company name", and "date" which are the names of the items of the text data generated by the text data generation unit 133 and stored in the text data DB 123. Set. The attribute setting for each item of the text data is automatically set by the attribute setting unit 134 based on, for example, the position of the read item in the image data obtained by scanning the invoice or the like and the content of the text data read from the read item. Is set. The set attribute is stored in the text data DB 123 in association with the reading item of the text data.

The correct answer data extraction unit 135 compares the text data stored in the text data DB 123 with the correct text data stored in the correct answer text data DB 122, determines whether or not they match, and determines whether or not they match. Extract the text data determined to be. That is, the text data that has been read normally is extracted. The determination as to whether or not there is a match is made for each read item stored in the text data DB 123 and the correct answer text data DB 122.

The determination by the correct answer data extraction unit 135 is not limited, but as an example, first, the read item in the text data is compared with the read item in the correct text data, and it is determined whether or not each read item matches. Next, when it is determined that the read items match, it is determined whether or not the text data and the correct answer text data match. Alternatively, the attributes set in the text data are compared with the attributes set in the correct text data to determine whether or not the attributes match, and then if the attributes match. For the determined read item, it is determined whether or not the text data and the correct answer text data match.

Further, in the determination by the correct answer data extraction unit 135, as an example, not limited, the matching degree of the text data is calculated based on the correct answer text data, and when the calculated matching degree is equal to or more than a predetermined threshold value, the text data and the correct answer text data Is determined to match. As described above, the determination of whether or not there is a match is not limited to the determination of whether or not there is an exact match, and it may be determined that the match is made when the degree of match is equal to or greater than a predetermined threshold value. The determination of the match based on the degree of matching may be made for each read item stored in the text data DB 123 and the correct answer text data DB 122, and similarly, the attributes also match when the degree of matching is equal to or greater than a predetermined threshold value. It may be determined that it is. Further, the determination may be made using a different threshold value for each read item. In particular, in the case of attributes, it is not necessary that they match exactly, and if the degree of matching is equal to or higher than a predetermined threshold value, it may be determined that they match.

The learning unit 136 estimates the position of the character information in the image data based on the text data extracted by the correct answer data extraction unit 135 and the image data stored in the image data DB 121 which is the basis of the text data. Therefore, machine learning is performed on the learning model, and the learning model stored in the reading learning model DB 124 is generated or updated. The learning model may be updated by, for example, an aggregation process in which the learning model stored in the reading learning model DB 124 and the learning result by the learning unit 136 are merged.

Machine learning by the learning unit 136 may be performed by supervised machine learning using the extracted text data and the underlying image data as teacher data, and may be performed by unsupervised machine learning, as an example without limitation. It may be done by deep learning.

FIG. 2 is a functional block configuration diagram showing the user terminal 200 of FIG. The user terminal 200 includes a communication unit 210, a display unit 220, an operation unit 230, a storage unit 240, and a control unit 250.

The communication unit 210 is a communication interface for communicating with the information processing device 100 by wire or wirelessly via the network NW, and any communication protocol may be used as long as mutual communication can be executed. The communication unit 210 is not limited, and for example, communication is performed by a communication protocol such as TCP / IP.

The display unit 220 is a user interface used for displaying the operation content input by the user and the transmission content from the information processing device 100, and is composed of a liquid crystal display or the like. The display unit 220 displays the notification information notified from the information processing device 100 to the user.

The operation unit 230 is a user interface used for the user to input operation instructions, and is composed of a keyboard, a mouse, a touch panel, and the like. The operation unit 230 is used for inputting operation information performed by the user to the information processing device 100.

The storage unit 240 stores programs for executing various control processes and each function in the control unit 250, input data, and the like. The storage unit 240 is not limited, and as an example, a memory including a RAM, a ROM, and the like, an HDD, and the like. It is composed of storage including SSD, flash memory and the like. In addition, the storage unit 240 temporarily stores the data that has communicated with the information processing device 100.

The control unit 250 controls the entire operation of the user terminal 200 by executing a program stored in the storage unit 240, and is not limited to, but as an example, a CPU, an MPU, a GPU, a microprocessor, and a processor. It is composed of a core, a multiprocessor, an ASIC, a device including an FPGA, and the like.

<Processing flow>
The processing flow of an example of the information processing method executed by the information processing apparatus 100 of the information processing system 1 will be described with reference to FIG. FIG. 3 is a flowchart showing the operation of the information processing device 100 of FIG.

As the process of step S101, the user terminal 200 transmits the scanned image data or the path information of the storage destination in which the image data is stored, so that the image data acquisition unit 131 acquires the image data. To. The acquired image data is stored in the image data DB 121.

As the process of step S102, the reading item recognition unit 132 reads the image data acquired in step S101 and stored in the image data DB 121.

FIG. 4 is a schematic diagram showing an example of image data P1 acquired by the image data acquisition unit 131 of FIG. The image data P1 shown in FIG. 4 shows image data obtained by scanning an invoice as an example of a form, in which "△△ Co., Ltd." is the invoice source and an invoice addressed to "○○ Co., Ltd." is illustrated. ing. In this image data P1, information such as a subject, an item, a quantity, and an amount of money is described in addition to a "invoice" which is a form name, a billing source company name, and a billing destination company name. In the process of step S101, for example, image data obtained by scanning an invoice or the like as shown in FIG. 4 is acquired, stored in the image data DB 121, and read by the process of step S102.

As the process of step S103, the reading item recognition unit 132 identifies the position of the character information included in the image data read in step S102, and recognizes it as a reading item for reading the character information by OCR.

FIG. 5 is a schematic diagram showing an example of recognition of a read item performed by the read item recognition unit 132 of FIG. FIG. 5 shows an example in which the position of the character information is identified with respect to the image data P1 shown in FIG. 4 and recognized as a reading item for reading the character information. The reading items A1 to A11 shown in FIG. 5 indicate a state in which the character information of the image data P1 shown in FIG. 4 is recognized as a reading item, and the character information is recognized as a rectangular selection area.

As shown in FIG. 5, for example, for the reading item A1, the character information of the form name "invoice" is selected. For the reading item A2, the character information of "△△ Co., Ltd.", which is the name of the billing company, is selected. For the reading item A3, the character information of "○○ Co., Ltd." which is the billing company name is selected. The reading item A4 selects the character information of the date "September 1, 2019". For the reading item A5, the character information of the subject "○ △ ◇ matter" is selected. For the reading item A6, the character information of the item name "○ △ ◇ fee" is selected. For the reading item A7, the character information of "1", which is the number of items of the reading item A6, is selected. The reading item A8 selects the character information of "150,000" which is the amount of money of the item of the reading item A6. The reading item A9 selects the character information of "150,000", which is the subtotal amount. The reading item A10 selects the character information of "12,000", which is the amount of consumption tax. The reading item A11 selects the character information of "162,000", which is the total amount of money. In the process of step S103, the position of the character information as shown in FIG. 5 is identified and recognized as a reading item for reading the character information.

As the process of step S104, the text data generation unit 133 reads the character information by OCR and performs character recognition on the portion recognized as the read item in step S103, and the text data is generated. The generated text data is stored in the text data DB 123 for each read item.

In the process of step S104, the characters of "invoice" are read by the reading item A1 as shown in FIG. 5, and are generated as text data. Similarly, in the reading item A2, the characters "△△ Co., Ltd." are read and generated as text data. The characters "○○ Co., Ltd." are read by the reading item A3 and generated as text data. The characters "September 1, 2019" are read by the reading item A4 and generated as text data. The characters "○ △ ◇" are read in the reading item A5 and generated as text data. The following processing is the same, so it is omitted.

As the process of step S105, the attribute setting unit 134 sets attributes for the text data generated in step S104 and stored in the text data DB 123.

FIG. 6 is a schematic diagram showing an example of text data T1 generated and attribute-set by the text data generation unit 133 and the attribute setting unit 134 of FIG. The text data shown in the right column of the text data T1 in FIG. 6 is text data generated from the reading items A1 to A11 shown in FIG. 5 (the reading items A6 to A11 are not shown), and the respective texts are shown. The attributes shown in the left column of FIG. 6 are set so as to be linked to the data.

For example, in the text data of the "invoice" generated from the reading item A1 shown in FIG. 5, the "form name" is set as an attribute. Similarly, in the text data of "△△ Co., Ltd." generated from the read item A2, "company name" is set as an attribute. In the text data of "○○ Co., Ltd." generated from the read item A3, "destination" is set as an attribute. "Date" is set as an attribute in the text data of "September 7, 2019" generated from the reading item A4 (in this embodiment, the reading was not normally performed for the item. It shall be.). In the text data of "○ △ ◇ matter" generated from the reading item A5, "subject" is set as an attribute. In the process of step S105, a "form name" or the like is set as an attribute for the text data "invoice" or the like as shown in FIG.

As the process of step S106, the correct answer data extraction unit 135 compares and matches the text data generated in step S104 and stored in the text data DB 123 with the correct answer text data stored in the correct answer text data DB 122. It is determined whether or not it is.

As an example at this time, first, the read items in the text data and the read items in the correct text data are compared, and it is determined whether or not the read items match. Next, when it is determined that the read items match, it is determined whether or not the text data and the correct answer text data match. Alternatively, the attributes set in the text data and the attributes set in the correct text data are compared, and it is determined whether or not the attributes match. Next, it is determined whether or not the text data and the correct text data match with respect to the read items for which the attributes are determined to match.

FIG. 7 is a schematic diagram showing an example of the correct answer text data T2 stored in the correct answer text data DB 122 of FIG. The correct answer text data shown in the right column of the correct answer text data T2 of FIG. 7 is data stored in the correct answer text data DB 122 as the correct answer text data of the character information included in the image data P1 shown in FIG. For each read item of the correct answer text data, the attributes shown in the left column of FIG. 7 are set so as to be associated with the respective correct answer text data in the same manner as the text data stored in the text data DB 123.

For example, in the correct answer text data of "invoice" which is the correct answer data of the reading item A1 shown in FIG. 5, "form name" is set as an attribute. Similarly, in the correct answer text data of "△△ Co., Ltd." which is the correct answer data of the reading item A2, "billing source" is set as an attribute. "Billing address" is set as an attribute in the correct answer text data of "○○ Co., Ltd." which is the correct answer data of the reading item A3. "Date" is set as an attribute in the correct text data of "September 1, 2019", which is the correct answer data of the read item A4. A "subject" is set as an attribute in the correct text data of "○ △ ◇ matter" which is the correct answer data of the read item A5.

FIG. 8 is a schematic diagram showing an example of determination in the correct answer data extraction unit 135 of FIG. In the process of step S106, the text data T1 shown in FIG. 6 and the correct answer text data T2 shown in FIG. 7 are compared, and it is determined whether or not they match. The text data T1 and the correct answer text data T2 shown in FIG. 8 are the same as the text data T1 shown in FIG. 6 and the correct answer text data T2 shown in FIG. 7, respectively.

As an example at this time, first, the attribute set in the text data T1 and the attribute set in the correct text data T2 are compared, and it is determined whether or not the attributes match. In the example shown in FIG. 8, the attribute "company name" in the second line of the text data T1 and the attribute "destination" in the third line are the attributes "billing source" in the second line of the correct text data T2, and the third line. It is different from the attribute "billing address". Regarding such a difference, the correct answer data extraction unit 135 calculates the degree of matching for the attributes of each item, and determines for each attribute that the calculated matching degree matches when the calculated matching degree is equal to or greater than a predetermined threshold value. In this case, since different attributes do not affect the generation of text data, it may be determined that the attributes match.

Next, it is determined whether or not the text data T1 and the correct text data T2 match with respect to the read items for which the attributes are determined to match. In the example shown in FIG. 8, the fourth line “September 7, 2019” of the text data T1 is different from the fourth line “September 1, 2019” of the correct text data T2. Regarding such a difference, the correct answer data extraction unit 135 calculates the matching degree for the text data of each item, and determines that the matching degree matches when the calculated matching degree is equal to or more than a predetermined threshold value. In this case, if the dates are different, it may be determined that the reading could not be performed normally.

As the process of step S107, the correct answer data extraction unit 135 extracts the text data determined that the determination results performed in step S106 match. The text data determined to match may be extracted in units of image data or in units of read items.

For example, in the case of the example shown in FIG. 8, the read item of the attribute "date" was not read normally, but all the text data of the read item for the invoice may be excluded from the extraction target. Only the read item of the attribute "date" may be excluded. For the extracted items, for example, the status information may be provided in the text data DB 123 and the status may be set only for the extracted items, or a separate database may be provided.

As the process of step S108, the learning unit 136 performs machine learning based on the text data extracted in step S107 and the image data stored in the image data DB 121 which is the basis of the text data, and reads the text data. The learning model stored in the learning model DB 124 is generated and updated.

<Effect>
As described above, the information processing device, the information processing system, and the information processing method according to the present embodiment perform character recognition of character information included in image data obtained by scanning forms as an image and generate text data. In addition, the correct text data, which is the correct answer data of the character information included in the image data, is stored in advance. A learning model is generated by comparing the text data with the correct text data to determine whether or not they match, and performing machine learning based on the text data determined to match and the underlying image data. Will be done. Therefore, since machine learning is performed by targeting only the text data that has been read normally as the target of machine learning, it is possible to efficiently improve the reading accuracy of the image data.

In addition, the character information contained in the image data is character-recognized for each read item, compared with the correct text data for each read item to determine whether or not they match, and is based on the text data determined to match. Machine learning is performed based on the image data that becomes. Therefore, since the determination is made for each reading item, it is possible to improve the reading accuracy that differs for each item.

Further, the matching degree of the text data is calculated based on the correct answer text data, and if the calculated matching degree is equal to or more than a predetermined threshold value, it is determined that the text data and the correct answer text data match. Further, this determination may be made for each read item. Therefore, the criteria for determining whether or not they match can be set for each form and each read item. As a result, it is possible to improve the reading accuracy, which differs for each item, more efficiently.

(Embodiment 2)
FIG. 9 is a functional block configuration diagram showing the information processing system 1A according to the second embodiment of the present disclosure. This information processing system 1A generates text data by performing character recognition of character information included in image data including character information, and is a machine based on the normally read text data and the image data on which the information data is based. It is the same as the information processing system 1 according to the first embodiment in that it is a learning system, but includes an image data reading unit 137 as a function of the control unit 130 of the information processing device 100A provided in the present embodiment. In that respect, it differs from the information processing system 1 according to the first embodiment.

In this embodiment, the actual forms are read based on the learning model generated by the information processing system 1A.

The image data reading unit 137 acquires image data by newly scanning forms based on the learning model in which machine learning is performed by the learning unit 136 and is stored in the reading learning model DB 124, and character recognition of character information is performed. And generate new text data. The new text data may be stored in the text data DB 123, or may be newly stored in another database. This text data may be provided to a person who provides the forms, for example, as a product of a service that scans the forms and generates the text data read by OCR.

The learning unit 136 in the present embodiment may perform machine learning based on new text data and image data obtained by scanning new forms. Thereby, the reading accuracy can be further improved. Other configurations and processing flows are the same as those in the first embodiment.

According to the present embodiment, in addition to the effect of the first embodiment, an image data reading unit for newly acquiring image data obtained by scanning forms and performing character recognition of character information is provided, and character information is provided based on a learning model. Character recognition of. As a result, the accuracy of reading can be further improved, and it is possible to provide the form as a product of the service of scanning the forms and generating the read text data by OCR to the person who provides the forms. ..

(Embodiment 3 (program))
FIG. 10 is a functional block configuration diagram showing an example of the configuration of the computer (electronic computer) 700. The computer 700 includes a CPU 701, a main storage device 702, an auxiliary storage device 703, and an interface 704.

Here, the image data acquisition unit 131, the reading item recognition unit 132, the text data generation unit 133, the attribute setting unit 134, the correct answer data extraction unit 135, the learning unit 136, and the image according to the first and second embodiments. The details of the control program (information processing program) for realizing each function constituting the data reading unit 137 will be described. These functional blocks are implemented in the computer 700. The operation of each of these components is stored in the auxiliary storage device 703 in the form of a program. The CPU 701 reads the program from the auxiliary storage device 703, expands it to the main storage device 702, and executes the above processing according to the program. Further, the CPU 701 secures a storage area corresponding to the above-mentioned storage unit in the main storage device 702 according to the program.

Specifically, the program includes an image data acquisition step of acquiring image data, a reading item recognition step of identifying the position of character information included in the image data and recognizing it as a reading item, and a reading item in the computer 700. The text data generation step of performing character recognition of the character information in the above and generating the text data, and the text data and the correct answer text data indicating the character information included in the image data stored in advance are compared for each read item. The correct answer data extraction step that determines whether or not they match and extracts the text data that is determined to match, and the reading of the extracted text data and the image data that is the basis of the extracted text data. It is a control program that realizes a learning step that performs machine learning based on the position of an item and generates and updates a learning model by a computer.

The auxiliary storage device 703 is an example of a tangible medium that is not temporary. Other examples of non-temporary tangible media include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, semiconductor memories, etc. connected via interface 704. When this program is distributed to the computer 700 via the network, the distributed computer 700 may expand the program to the main storage device 702 and execute the above processing.

Further, the program may be for realizing a part of the above-mentioned functions. Further, the program may be a so-called difference file (difference program) that realizes the above-mentioned function in combination with another program already stored in the auxiliary storage device 703.

Although the embodiments related to disclosure have been described above, these can be implemented in various other embodiments, and can be implemented by making various omissions, substitutions, and changes. These embodiments and modifications, as well as those omitted, replaced or modified, are included in the technical scope of the claims and the equivalent scope thereof.

1,1A information processing system, 100,100A information processing device, 110 communication unit, 120 storage unit, 121 image data DB, 122 correct answer T (text) data DB, 123 text data DB, 124 reading learning model DB, 130 control unit , 131 image data acquisition unit, 132 reading item recognition unit, 133 text data generation unit, 134 attribute setting unit, 135 correct answer data extraction unit, 136 learning unit, 137 image data reading unit, 200 user terminal, 210 communication unit, 220 display Unit, 230 operation unit, 240 storage unit, 250 control unit, NW network

Claims

An information processing device that reads the character information from image data including character information and performs machine learning on a learning model for estimating the position of the read character information in the image data.
An image data acquisition unit that acquires the image data,
A reading item recognition unit that identifies the position of the character information included in the image data and recognizes it as a reading item.
A text data generation unit that performs character recognition of the character information in the read item and generates text data,
The text data and the correct text data indicating the character information included in the image data stored in advance are compared for each read item, and it is determined whether or not they match, and it is determined that they match. The correct answer data extraction unit that extracts the text data
It includes a learning unit that performs machine learning based on the extracted text data and the position of a read item in the image data that is the basis of the extracted text data, and generates or updates the learning model. Information processing device.
The correct answer data extraction unit calculates the matching degree of the text data by comparing the text data with the correct answer text data, and when the calculated matching degree is equal to or more than a predetermined threshold value, the text data and the correct answer text data The information processing apparatus according to claim 1, wherein the information processing apparatus is determined to match.
The correct answer data extraction unit
The read items in the text data are compared with the read items in the correct text data, and it is determined that the read items match.
The information according to claim 1 or 2, wherein when it is determined that they match, the text data and the correct text data are compared for each read item, and it is determined whether or not they match. Processing equipment.
The correct answer data extraction unit
The reading item in the text data is compared with the reading item in the correct text data to calculate the matching degree of the reading item of the text data, and when the calculated matching degree is equal to or more than a predetermined threshold value, each reading item is determined. Judge that they match,
When it is determined that they match, the text data and the correct text data are compared for each read item, the matching degree of the text data is calculated for each reading item, and the calculated matching degree is predetermined. The information processing apparatus according to claim 3, wherein if the text data is equal to or greater than the threshold value, it is determined that the text data match and the data is extracted.
The information processing device according to claim 4, wherein the correct answer data extraction unit determines that the text data match when the calculated matching degree is equal to or higher than a predetermined threshold value that differs for each read item.
Claims 1 to 5 include an attribute setting unit for setting the attributes of the read item based on the recognized position of the read item in the image data and the text data read from the read item. The information processing apparatus according to any one of the above items.
The correct answer data extraction unit
The attribute of the read item in the text data is compared with the attribute set in the read item in the correct answer text data, and it is determined whether or not they match.
The information processing apparatus according to claim 6, wherein when it is determined that they match, the text data and the correct text data are compared for each attribute and it is determined whether or not they match.
The learning unit performs supervised machine learning using the extracted text data and the position of a read item in the image data on which the extracted text data is based as supervised data, according to claims 1 to 1. The information processing apparatus according to any one of 7.
The method according to any one of claims 1 to 8, further comprising an image data reading unit that acquires new image data, performs character recognition of character information, and generates new text data based on the learning model. Information processing equipment.
It is an information processing method that reads the character information from image data including character information and performs machine learning on a learning model for estimating the position of the read character information in the image data.
An image data acquisition step of acquiring the image data performed by the image data acquisition unit, and
A reading item recognition step performed by the reading item recognition unit to identify the position of the character information included in the image data and recognize it as a reading item.
A text data generation step of performing character recognition of the character information in a read item and generating text data performed by the text data generation unit, and
The text data performed by the correct answer data extraction unit is compared with the correct text data indicating the character information included in the image data stored in advance for each read item, and it is determined whether or not they match. , The correct answer data extraction step to extract the text data determined to match, and
Learning that the learning unit performs machine learning based on the extracted text data and the position of the read item in the image data that is the basis of the extracted text data, and generates or updates the learning model. An information processing method that includes steps.
An information processing program that reads the character information from image data including character information and performs machine learning on a learning model for estimating the position of the read character information in the image data.
The image data acquisition step for acquiring the image data and
A reading item recognition step that identifies the position of the character information included in the image data and recognizes it as a reading item.
A text data generation step of performing character recognition of the character information in the read item and generating text data, and
The text data and the correct text data indicating the character information included in the image data stored in advance are compared for each read item, and it is determined whether or not they match, and it is determined that they match. Correct answer data extraction step to extract the said text data
A learning step of performing machine learning based on the extracted text data and the position of a read item in the image data on which the extracted text data is based to generate or update the learning model, and electronically. An information processing program for a computer to execute.