WO2023181149A1 - Character recognition system, character recognition method, and recording medium

Info

Publication number
WO2023181149A1
Authority
WO
WIPO (PCT)
Prior art keywords
preprint
image
characters written
recognition
characters
Application number
PCT/JP2022/013389
Other languages
French (fr)
Japanese (ja)
Inventor
Yuichi Nakatani (中谷 裕一)
Original Assignee
NEC Corporation (日本電気株式会社)
Application filed by NEC Corporation (日本電気株式会社)
Priority to PCT/JP2022/013389
Publication of WO2023181149A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/192: Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194: References adjustable by an adaptive method, e.g. learning

Definitions

  • the present invention relates to a character recognition system and the like.
  • OCR (Optical Character Recognition), which reads handwritten characters written on a form as an image using a scanner and converts them into text data by recognizing the characters in the image, is widely used.
  • Character recognition by OCR is performed, for example, by using a learning model generated by machine learning to recognize characters written on a preprint of a form.
  • the shape of the characters written on the preprint of the form and the position where the characters are written on the preprint vary depending on the person writing the characters.
  • in an image obtained by reading characters written on a preprint, the preprint and the characters coexist in the image.
  • a learning model that recognizes handwritten characters on a form may therefore be required to accurately recognize characters that are written in various shapes and at various positions from an image in which the preprint and the characters are mixed. For this reason, it is desirable to have a technology that can accurately recognize characters on preprinted forms.
  • the image processing system of Patent Document 1 uses a learning model to extract handwritten characters written within the frame of a preprint.
  • the image processing system disclosed in Patent Document 1 extracts handwritten characters from an image of handwritten characters written within the preprint frame by erasing the preprint frame through image processing.
  • the main object of the present invention is to provide a character recognition system and the like that can improve the recognition accuracy of characters written on preprints.
  • the character recognition system of the present invention includes an acquisition means for acquiring an image of characters written on a preprint of a form including the preprint; a recognition means for recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image; and an output means for outputting the recognition result.
  • the character recognition method of the present invention acquires an image of characters written on a preprint of a form including the preprint, recognizes the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and outputs the recognition result.
  • the recording medium of the present invention non-transitorily records a character recognition program that causes a computer to execute a process of acquiring an image of characters written on a preprint of a form including the preprint, a process of recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and a process of outputting the recognition result.
  • the recognition accuracy of characters written on preprints can be improved.
  • FIG. 1 is a diagram showing an example of the configuration of the first embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of a form in the first embodiment of the present invention.
  • FIG. 3 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 4 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 5 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 6 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of the configuration of the character recognition system of the first embodiment of the present invention.
  • FIG. 8 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 9 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 10 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 11 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 12 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 13 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 14 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention.
  • FIG. 15 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention.
  • FIG. 16 is a diagram showing an example of the configuration of the second embodiment of the present invention.
  • FIG. 17 is a diagram showing an example of the configuration of the character recognition system of the second embodiment of the present invention.
  • FIG. 18 is a diagram schematically showing the flow of data processing in the second embodiment of the present invention.
  • FIG. 19 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention.
  • FIG. 20 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention.
  • FIG. 21 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention.
  • FIG. 22 is a diagram showing an example of the configuration of another embodiment of the present invention.
  • FIG. 1 is a diagram showing an example of the configuration of a form processing system according to this embodiment.
  • the form processing system includes, for example, a character recognition system 10, a scanner 20, and an information processing server 30.
  • the character recognition system 10 is connected to a scanner 20 via a network, for example. Further, the character recognition system 10 is connected to an information processing server 30 via a network.
  • the character recognition system 10 acquires an image obtained by reading a form by the scanner 20.
  • a preprint for writing characters is printed on the paper of the form.
  • a preprint is, for example, a frame or a line on a form that indicates the position where characters are written.
  • the character recognition system 10 acquires, for example, an image of handwritten characters written on a preprint.
  • the characters written on the preprint may be printed.
  • the characters written on the preprint are not limited to the above examples.
  • the character recognition system 10 uses a recognition model to recognize the characters written on the preprint from an image of the characters written on the preprint acquired from the scanner 20 and a preprint image of the preprint.
  • the recognition model is a learning model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image.
  • the character recognition system 10 outputs the recognition results of characters written on the preprint to the information processing server 30, for example.
  • the information processing server 30 is a server that performs processing according to the purpose of the recognition results of characters written on the preprint.
  • by using the preprint image in addition to the image of the characters written on the preprint to be recognized, the character recognition system 10 can suppress the influence of the preprint on character recognition.
  • FIG. 2 is a diagram showing an example of a form.
  • the name of the form is written as "payment slip" at the top.
  • the example of the form in FIG. 2 is, for example, a document submitted to a financial institution when depositing money into an account at the financial institution.
  • entry columns for "account number” and “amount” are set.
  • the frames in which numbers are entered in the "account number” and “amount” fields are preprints.
  • the characters written on the preprint are, for example, the characters written within the frame of the preprint.
  • the characters written on the preprint may be written so as to overlap with the frame of the preprint.
  • An image with characters written on a preprint is an image that includes both the preprint and the characters written on the preprint.
  • the preprint image is an image of only a preprint without any characters written on it.
  • numbers are written on the preprint, but the characters written on the preprint are not limited to numbers.
  • the characters written on the preprint may include symbols.
  • FIG. 3 is a diagram showing an example of an image of characters written on a preprint.
  • FIG. 3 is an extracted image of the "account number" entry field in the example of the form shown in FIG. 2.
  • FIG. 4 is a preprint image of the "account number" entry field in the example of the form shown in FIG. 2.
  • the characters "01778543" are handwritten on the preprint shown in FIG.
  • An image that is only a preprint may include characters as a preprint.
  • the characters as the preprint are, for example, characters that indicate the digit of the amount, characters that indicate the item, or characters that indicate the unit.
  • the characters as a preprint are not limited to those mentioned above, as long as they are printed on paper as a preprint.
  • FIG. 5 is a diagram showing an example of an image of characters written on a preprint.
  • FIG. 5 is an image in which the "amount" entry field is extracted from the example of the form shown in FIG. 2.
  • FIG. 6 is a preprint image of the "amount" entry field in the example of the form shown in FIG. 2.
  • "yen” indicating the unit of monetary amount is printed as part of the preprint at the bottom of the frame on the right.
  • the characters "40000" are handwritten on the preprint shown in FIG.
  • a form is a document used for procedures at, for example, financial institutions, government offices, educational institutions, hospitals, transportation facilities, or companies. Further, the form may be a document attached to an item to be managed. Examples of forms are not limited to the above.
  • the preprint indicates, for example, a position on the form where the date, name, affiliation, address, telephone number, e-mail address, age, gender, occupation, or amount is to be written.
  • a preprint is composed of, for example, items to be filled in and a frame in which characters are written. When multiple characters are entered in one item, the preprint may be a series of multiple frames.
  • preprints for a plurality of items may be printed on one form. For example, when a preprint is printed on a sheet of paper as an entry column with a plurality of consecutive frames, the character recognition system 10 outputs the recognized characters as character string data according to the order of the frames.
  • FIG. 7 is a diagram showing an example of the configuration of the character recognition system 10.
  • the character recognition system 10 includes an acquisition section 11, a recognition section 13, and an output section 14 as basic components.
  • the character recognition system 10 further includes an image extraction section 12, a generation section 15, and a storage section 16.
  • the acquisition unit 11, the image extraction unit 12, the recognition unit 13, the output unit 14, and the storage unit 16 recognize the characters written on the preprint from an image of the characters written on the preprint. Further, the acquisition unit 11, the generation unit 15, and the storage unit 16 generate the recognition model, for example.
  • the acquisition unit 11 acquires an image of the characters written on the preprint.
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form with characters written on a preprint.
  • the acquisition unit 11 may acquire an image in which the portion of the preprint with characters written on it has already been extracted from the form.
  • the image of the portion where the characters are written on the preprint is, for example, the image shown in the examples of FIGS. 3 and 5.
  • the acquisition unit 11 may acquire an image of the form without any characters written on the preprint.
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form with no characters written on the preprint.
  • the acquisition unit 11 may acquire learning data used to generate the recognition model.
  • the generation unit 15 acquires, as learning data, for example, data in which an image of the characters written on the preprint and the preprint image are associated with the characters written on the preprint.
  • the learning data is input into the character recognition system 10 or another terminal device connected to the character recognition system 10, for example, by an operator's operation.
  • the image extraction unit 12 extracts a preprint image corresponding to the image, acquired by the acquisition unit 11, that depicts the characters written on the preprint.
  • the image extraction unit 12 extracts a preprint image from the form data stored in the storage unit 16, for example.
  • the form data includes, for example, an image of a form and definition data.
  • the definition data includes, for example, information about the items to be written on the form and the position of the preprint corresponding to each item on the form. The information on the position of the preprint is, for example, information indicating the range where the preprint is printed on the form.
  • the items to be described include, for example, one or more of name, postal code, address, telephone number, age, personal identification number, account number, amount, and date. The items to be described are not limited to the above examples.
  • the image extraction unit 12 identifies the position of the preprint on the form, for example, based on the information on the position of the preprint included in the definition data. Then, the image extraction unit 12 extracts a preprint image from the image stored in the storage unit 16 by cutting out the image at the specified preprint position. The image extraction unit 12 may extract the preprint image from the image of the form with no characters written on the preprint, which is acquired by the acquisition unit 11.
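As an illustrative aside, this cut-out step can be pictured with a short sketch. The following is a minimal example assuming definition data that records each item's preprint region as a pixel bounding box on the form; the field names ("bbox") and the dict-based form data are assumptions made for illustration, not structures taken from the patent.

```python
# A minimal sketch of preprint-image extraction, assuming hypothetical
# definition data that stores a pixel bounding box ("bbox") per written item.
from PIL import Image

def extract_preprint_image(blank_form: Image.Image, definition: dict, item: str) -> Image.Image:
    """Cut out the preprint region for one item from an image of the blank form."""
    left, top, right, bottom = definition[item]["bbox"]  # range where the preprint is printed
    return blank_form.crop((left, top, right, bottom))

# Usage: crop the "account number" entry field from a scan of the blank form.
definition_data = {"account_number": {"bbox": (120, 340, 620, 400)}}  # assumed values
blank_form = Image.open("blank_payment_slip.png")  # hypothetical file
preprint_image = extract_preprint_image(blank_form, definition_data, "account_number")
```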
  • the recognition unit 13 uses the recognition model to recognize the characters in the image from the image of the characters written on the preprint acquired by the acquisition unit 11 and the preprint image.
  • the recognition model is a learning model that recognizes the characters written on the preprint from an image of the characters written on the preprint and the preprint image.
  • the recognition unit 13 inputs, for example, the image of the characters written on the preprint and the preprint image acquired by the acquisition unit 11 into the recognition model. Then, the recognition unit 13 recognizes the characters written on the preprint using the recognition model.
  • the recognition unit 13 may recognize characters written on the preprint using a preprint image extracted in advance. Further, the recognition unit 13 may recognize characters written on the preprint using a preprint image generated in advance as an image of the preprint portion.
  • the recognition unit 13 uses, for example, a preprint image stored in the storage unit 16 to recognize characters written on the preprint.
  • the recognition unit 13 extracts an image showing the characters written on the preprint by specifying the position of the preprint, for example, based on the information on the position of the preprint included in the definition data. Then, the recognition unit 13 uses the recognition model to recognize the characters written on the preprint from the extracted image of the characters written on the preprint and the preprint image extracted by the image extraction unit 12.
  • the recognition unit 13 combines an image of the characters written on the preprint and the preprint image into one set of data and inputs that data into the recognition model.
  • Combining an image of characters written on a preprint with a preprint image means generating image data by superimposing the two images. If the image with characters written on the preprint and the preprint image are each images with three RGB channels per pixel, the recognition unit 13, for example, combines the data of the two images into single image data with six channels per pixel. Then, the recognition unit 13 inputs the combined six-channel image data to the recognition model.
  • the recognition unit 13 combines, for example, an image of characters written on a preprint and a preprint image based on preset conditions.
  • the recognition unit 13, for example, combines the two images by overlapping the image of the characters written on the preprint and the preprint image, both extracted at the same size, so that their outer peripheries are aligned.
  • the recognition unit 13 combines image data of corresponding pixels. Then, the recognition unit 13 inputs the combined data into a recognition model and recognizes the characters written on the preprint.
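To make the channel-wise combination concrete, here is a minimal sketch assuming two aligned RGB arrays of identical size; NumPy is used only for illustration, as the patent does not prescribe a library.

```python
# Stack an H x W x 3 written image and an H x W x 3 preprint image into one
# H x W x 6 array, combining the data of corresponding pixels.
import numpy as np

def combine_images(written: np.ndarray, preprint: np.ndarray) -> np.ndarray:
    assert written.shape == preprint.shape, "images must be extracted at the same size"
    return np.concatenate([written, preprint], axis=-1)  # six channels per pixel

combined = combine_images(
    np.zeros((64, 256, 3), dtype=np.uint8),  # image with characters written on the preprint
    np.zeros((64, 256, 3), dtype=np.uint8),  # preprint image
)
assert combined.shape == (64, 256, 6)
```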
  • the recognition unit 13 may recognize characters other than those written on the preprint in the image of the form. For example, the recognition unit 13 may identify the type of the form from the image of the form acquired by the acquisition unit 11. Then, the recognition unit 13 recognizes the characters written on the preprint by specifying the position of the preprint based on the definition data included in the form data corresponding to the specified type of form. The recognition unit 13 identifies the type of the form, for example, by recognizing the form name or form number printed on the form in the image of the form. The relationship between the form name or form number printed on the form and the type of form is set in advance. Further, the recognition model used by the recognition unit 13 may be a learning model generated outside the character recognition system 10.
  • FIG. 8 is a diagram showing an example of an image of characters written on a preprint.
  • FIG. 8 differs from the example image in FIG. 3 in the appearance of the preprint.
  • the preprint of the example image in FIG. 8 differs from the example image in FIG. 3 in the thickness and type of lines, for example.
  • FIG. 9 is a preprint image for the example image of FIG. 8. In the example image of FIG. 8, the characters "13758047" are handwritten on the preprint shown in FIG. 9.
  • the recognition model outputs "13758047" as a recognition result when the example image in FIG. 8 and the example image in FIG. 9 are input. For example, even if the recognition model is a learning model generated using the preprint of the example image in FIG.
  • the recognition model can recognize characters written on a preprint that has not been trained.
  • FIG. 10 is a diagram showing an example of an image in which characters describing the year in Western calendar notation are printed on a preprint.
  • the characters "A.D.” and “Year” are printed in advance within the frame of the preprint.
  • the characters "2022” are handwritten on the preprint image.
  • FIG. 11 is a preprint image in the example of the image in FIG. 10.
  • the recognition model outputs "2022" as a recognition result when the example image in FIG. 10 and the example image in FIG. 11 are input.
  • FIG. 12 shows an example of an image in which, relative to the example image of FIG. 10, the upper two digits "20" of the year in Western calendar notation are printed in advance as part of the preprint. That is, in the example image of FIG. 12, "A.D.", "20", and "Year" are printed in advance as the preprint. In the example image of FIG. 12, "22" out of "2022" is handwritten on the preprint.
  • FIG. 13 is a preprint image in the example of the image of FIG. 12.
  • the recognition model outputs "22" as a recognition result when the example image in FIG. 12 and the example image in FIG. 13 are input. For example, even if the recognition model is a learning model that does not use the preprints of the example images of FIGS.
  • the recognition model can recognize characters written on various forms of preprints by inputting images of the characters written on the preprints and preprint images.
  • the recognition model can also be used for preprints with different frame shapes and colors, and recognition can be performed in the same way even if learning is not performed using preprints of every appearance as learning data.
  • the output unit 14 outputs the recognition result by the recognition unit 13.
  • the output unit 14 outputs the characters recognized by the recognition unit 13 to the information processing server 30, for example.
  • the output unit 14 outputs, for example, an item corresponding to the preprint and the recognized characters in association with each other.
  • when the recognition target is an account number, as in the example image of FIG. 3, the output unit 14 outputs, for example, information indicating that the result is an account number in association with the recognized character string.
  • the output unit 14 may output the recognition result to a display device (not shown) connected to the character recognition system 10.
  • when generating a recognition model in the character recognition system 10, the generation unit 15 performs processing related to generation of the recognition model. The generation unit 15 learns the relationship between an image of the characters written on the preprint, the preprint image, and the characters written on the preprint. Then, the generation unit 15 generates a recognition model that recognizes the characters in the image from the image of the characters written on the preprint and the preprint image.
  • the generation unit 15 generates a recognition model by learning the relationship between the data obtained by combining an image of the characters written on the preprint with the preprint image, and the characters written on the preprint.
  • for example, if the two images each have three RGB channels per pixel, the generation unit 15 combines the data of the two images into single image data with six channels per pixel.
  • the generation unit 15 then generates a recognition model by learning the relationship between the combined six-channel image data and the characters written on the preprint.
  • the generation unit 15 may perform learning using randomly shaped figures as preprints.
  • when using randomly shaped figures as preprints, the generation unit 15 generates a recognition model using as learning data, for example, an image of characters written on a randomly shaped figure and an image of that same figure alone.
  • the generation unit 15 generates a recognition model by deep learning using, for example, DNN (Deep Neural Network).
  • Machine learning algorithms for generating recognition models are not limited to deep learning using DNN.
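As an illustration of such model generation, the following PyTorch sketch trains a small CNN on the six-channel combined input. The network shape, optimizer, and single-character digit labels are assumptions made for brevity; the patent only requires some DNN trained on pairs of combined images and the characters written on the preprint.

```python
# A minimal recognition-model training sketch, assuming 6-channel input and
# per-image digit labels. Layer sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

recognition_model = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=3, padding=1),  # 6 channels: written image + preprint image
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),  # e.g. ten digit classes for a single preprint frame
)
optimizer = torch.optim.Adam(recognition_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(combined: torch.Tensor, labels: torch.Tensor) -> float:
    """One learning step on a batch of (N, 6, H, W) combined images."""
    optimizer.zero_grad()
    loss = loss_fn(recognition_model(combined), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data.
loss = training_step(torch.randn(8, 6, 64, 64), torch.randint(0, 10, (8,)))
```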
  • the storage unit 16 stores, for example, a recognition model used by the recognition unit 13 to recognize characters in an image.
  • the storage unit 16 stores, for example, preprint images.
  • the storage unit 16 stores, for example, form data.
  • the form data includes, for example, image data of a form and definition data.
  • the form data may include a preprint image extracted in advance.
  • the storage unit 16 stores, for example, an image of characters written on a preprint, a preprint image, and characters written on the preprint as learning data.
  • the recognition model used by the recognition unit 13 may be stored in a storage means other than the storage unit 16.
  • the scanner 20, for example, optically reads a form and generates an image of the form.
  • the scanner 20 then outputs the image of the form to the character recognition system 10.
  • the scanner 20 may extract the image of the preprint portion from among the images of the form.
  • the scanner 20 outputs the extracted preprint image to the character recognition system 10.
  • the scanner 20 may generate an image of the form by photographing the form.
  • the information processing server 30 acquires, for example, the recognition results of characters written on the form from the character recognition system 10.
  • the information processing server 30 uses the recognition results to perform processing according to the purpose.
  • the information processing server 30 uses the recognition results, for example, in processing related to application and deposit/withdrawal related to account management at a financial institution.
  • the information processing server 30 may use the recognition results, for example, to process application documents in government offices, educational institutions, hospitals, or transportation facilities.
  • the information processing server 30 may use the recognition results for slip processing at a company. Further, the information processing server 30 may use the recognition results for managing goods in distribution. Examples of uses of the recognition results are not limited to the above.
  • FIG. 14 is a diagram showing an example of an operation flow when the character recognition system 10 recognizes characters written on a preprint.
  • the acquisition unit 11 acquires an image showing the characters written on the preprint (step S11).
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form showing characters written on the preprint.
  • the image extraction unit 12 extracts a preprint image corresponding to the image acquired by the acquisition unit 11 (step S12).
  • the image extraction unit 12 extracts, for example, a preprint image corresponding to the image acquired by the acquisition unit 11 from the data stored in the storage unit 16.
  • the recognition unit 13 uses the recognition model to recognize characters in the image from the image acquired by the acquisition unit 11 and the preprint image (step S13).
  • the recognition model recognizes the characters written on the preprint from the image of the characters written on the preprint and the preprint image.
  • when the characters in the image are recognized, the output unit 14 outputs the recognition results (step S14). The output unit 14 outputs the recognition result to the information processing server 30, for example.
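Tying steps S11 to S14 together, a minimal pipeline sketch might look as follows. The helper arguments (recognition_model as a callable, decode_result for turning model output into a character string) are hypothetical stand-ins, not interfaces defined by the patent.

```python
# End-to-end sketch of the recognition flow of FIG. 14 under assumed helpers.
import numpy as np

def recognize_form_field(form_image, blank_form, bbox, recognition_model, decode_result):
    written = np.asarray(form_image.crop(bbox))    # S11: acquire image with characters on the preprint
    preprint = np.asarray(blank_form.crop(bbox))   # S12: extract the corresponding preprint image
    combined = np.concatenate([written, preprint], axis=-1)  # S13: combine into 6-channel input
    return decode_result(recognition_model(combined))        # S14: output the recognition result
```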
  • FIG. 15 is a diagram showing an example of an operation flow when the character recognition system 10 generates a recognition model.
  • the acquisition unit 11 acquires, as learning data, an image of the characters written on the preprint, the preprint image, and the characters written on the preprint (step S21).
  • upon acquiring the learning data, the generation unit 15 learns the relationship between the image of the characters written on the preprint, the preprint image, and the characters written on the preprint, and generates a recognition model (step S22). For example, the generation unit 15 combines the image of the characters written on the preprint with the preprint image. Then, the generation unit 15 learns the relationship between the combined data and the characters written on the preprint, which are included as correct data in the learning data, and generates a recognition model.
  • after generating the recognition model, the generation unit 15 saves the generated recognition model (step S23).
  • the generation unit 15 stores the generated recognition model in the storage unit 16, for example.
  • the character recognition system 10 of the form processing system of this embodiment uses a recognition model to recognize characters written on a preprint from an image of the characters written on the preprint and a preprint image.
  • by further using the preprint image in addition to the written image showing the characters written on the preprint to be recognized, the character recognition system 10 can suppress the influence of the preprint on the recognition of the characters. As a result, the character recognition system 10 can improve the accuracy of recognizing characters written on preprints.
  • the recognition model used by the character recognition system 10 takes as input an image of the characters written on the preprint and the preprint image, and can recognize characters written on preprints of appearances on which learning has not been performed. Therefore, by inputting an image showing the characters written on the preprint and the preprint image, the character recognition system 10 can recognize characters written on preprints of various appearances. Furthermore, when generating a recognition model, the character recognition system 10 does not need to prepare learning data for each appearance of preprint actually used for recognition.
  • since the character recognition system 10 does not need to learn training data for each appearance of preprint actually used for recognition, the amount of learning required when generating a recognition model can be suppressed. Therefore, the computer resources necessary for generating a recognition model can be reduced, and the character recognition system 10 can generate a recognition model efficiently.
  • by performing learning using randomly shaped figures as preprints, the character recognition system 10 can generate a recognition model that can recognize characters written on various preprint images. That is, by using a recognition model generated with randomly shaped figures as preprints, the character recognition system 10 can accurately recognize the characters written on the preprint even if the shape of the preprint image differs for each form.
  • since the character recognition system 10 of this embodiment recognizes characters by inputting into the recognition model data obtained by combining an image of characters written on a preprint with the preprint image, there is no need to erase the preprint as preprocessing for recognition. Further, since the process of erasing the preprint is not performed, the influence that such erasing processing would have on character recognition can be suppressed. Therefore, the character recognition system 10 of this embodiment can improve recognition accuracy while suppressing the resources necessary for recognizing characters written on preprints.
  • FIG. 16 is a diagram showing an example of the configuration of the form processing system of this embodiment.
  • the form processing system includes, for example, a character recognition system 40, a scanner 20, and an information processing server 30.
  • the character recognition system 40 is connected to the scanner 20 via a network, for example. Further, the character recognition system 40 is connected to the information processing server 30 via a network.
  • the character recognition system 10 of the first embodiment, for example, inputs data that combines an image in which characters are written on a preprint and a preprint image into a recognition model, and recognizes the characters on the preprint. The character recognition system 10 then outputs the recognition result.
  • the character recognition system 40 of the present embodiment, when combining an image with characters written on a preprint and a preprint image, transforms the preprint image using a conversion model and then combines the images, in order to improve the accuracy with which the two images are overlapped.
  • the conversion model is a learning model that estimates conversion parameters used when performing conversion processing on a preprint image.
  • FIG. 17 is a diagram showing an example of the configuration of the character recognition system 40.
  • the character recognition system 40 includes an acquisition section 11 , an image extraction section 12 , a recognition section 41 , an output section 14 , a generation section 42 , and a storage section 16 .
  • the recognition unit 41 also includes a conversion unit 51 and an image recognition unit 52.
  • the configurations and functions of the acquisition unit 11, image extraction unit 12, output unit 14, and storage unit 16 of the character recognition system 40 are the same as those of the acquisition unit 11, image extraction unit 12, output unit 14, and storage unit 16 of the character recognition system 10 of the first embodiment, respectively.
  • the conversion unit 51 of the recognition unit 41 converts the preprint image using, for example, a conversion model.
  • the conversion model, for example, performs an affine transformation on the preprint image.
  • the recognition unit 41 converts the preprint image so that it overlaps the image it is combined with by, for example, rotating, resizing, and translating the preprint image.
  • the conversion model estimates, for example, the conversion parameters used when rotating, resizing, and translating the preprint image.
  • the conversion unit 51 uses the conversion model to estimate affine transformation parameters from data obtained by combining an image of characters written on a preprint and the preprint image according to preset conditions. Then, the conversion unit 51 performs an affine transformation on the preprint image using the estimated parameters. For example, as a preset condition, the conversion unit 51 overlaps the two images so that the outer peripheries of the preprint image and the image of the characters written on the preprint are aligned. The conversion unit 51 then uses the conversion model to estimate the conversion parameters from the data combined under the preset condition.
  • the conversion parameters are parameters for converting the preprint image so that the accuracy of the overlay is improved compared to combining under the preset conditions alone. After estimating the conversion parameters, the conversion unit 51 performs an affine transformation on the preprint image using them, thereby increasing the accuracy of the superposition.
  • the conversion model is, for example, a learning model that uses a DNN called STN (Spatial Transformer Networks).
  • the image transformation method using STN is described, for example, in Max Jaderberg et al., "Spatial Transformer Networks", NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2, December 2015, pp. 2017-2025.
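The following PyTorch sketch shows an STN-style conversion model in the spirit of the cited paper: a small localization network estimates the 2x3 affine parameters from the six-channel combined data, and the preprint image is then warped with those parameters. All layer sizes are assumptions made for illustration.

```python
# A minimal STN-style conversion model sketch (assumed architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConversionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 6),  # six affine transformation parameters
        )
        # Start from the identity transform so early training leaves the image unchanged.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(torch.tensor([1.0, 0, 0, 0, 1.0, 0]))

    def forward(self, combined: torch.Tensor, preprint: torch.Tensor) -> torch.Tensor:
        theta = self.localization(combined).view(-1, 2, 3)  # rotation, scaling, translation
        grid = F.affine_grid(theta, preprint.size(), align_corners=False)
        return F.grid_sample(preprint, grid, align_corners=False)  # transformed preprint image

# Usage: warp a preprint image using parameters estimated from the combined data.
model = ConversionModel()
warped = model(torch.randn(1, 6, 64, 64), torch.randn(1, 3, 64, 64))
```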
  • the image recognition unit 52 of the recognition unit 41 uses a recognition model to recognize the characters written on the preprint from the image of the characters written on the preprint and the preprint image.
  • the image recognition unit 52 combines the image of the characters written on the preprint with the preprint image on which the conversion unit 51 has performed the affine transformation. Then, the image recognition unit 52 uses the recognition model to recognize the characters written on the preprint from the combined data.
  • the conversion model and the recognition model may be learning models generated outside the character recognition system 40.
  • FIG. 18 is a diagram schematically showing the flow of processing when the recognition unit 41 recognizes characters written on a preprint.
  • the conversion unit 51 combines an image of characters written on a preprint and a preprint image according to, for example, preset conditions.
  • the preset conditions are, for example, set so that the outer peripheries of the two images are aligned.
  • the conversion unit 51 estimates affine transformation parameters using the transformation model. Then, the conversion unit 51 performs affine transformation on the preprint image using the estimated affine transformation parameters.
  • the converting unit 51 outputs the preprint image that has undergone affine transformation to the image recognizing unit 52.
  • the image recognition unit 52 combines the image in which characters are written on the preprint and the image that has been subjected to affine transformation.
  • the image recognition unit 52 uses the recognition model to recognize characters written on the preprint from the combined data.
  • the character recognition system 40 may generate only the recognition model out of the conversion model and the recognition model. In this case, a learning model generated outside the character recognition system 40, for example, is used as the conversion model.
  • when generating only the recognition model out of the conversion model and the recognition model, the generation unit 42, for example, combines the image containing the characters written on the preprint and the preprint image, which are included in the learning data, using the conversion model. Then, the generation unit 42 learns the relationship between the combined data and the characters written on the preprint, which are included as correct data in the learning data, and generates a recognition model.
  • the generation unit 42 stores the generated conversion model and recognition model in the storage unit 16.
  • the character recognition system 40 may generate both a conversion model and a recognition model.
  • when generating both models, the generation unit 42 uses the conversion model to estimate the conversion parameters from data obtained by combining an image of the characters written on the preprint and the preprint image according to preset conditions. Furthermore, the generation unit 42 uses the recognition model to recognize the characters written on the preprint from the combined data.
  • the generation unit 42 updates the parameters of the transformation model so that the difference between the affine transformation parameters estimated by the transformation model and the affine transformation parameters included in the learning data becomes smaller.
  • the generation unit 42 also updates the parameters of the recognition model so that the difference between the identification result and the correct data becomes smaller.
  • the generation unit 42 repeats the above process using the updated models. For example, the generation unit 42 generates the conversion model and the recognition model by repeating the above processing until the accuracy of the conversion parameters estimated by the conversion model and of the recognition results of the recognition model satisfies a preset standard. Further, the generation unit 42 generates the recognition model by, for example, updating the parameters of the recognition model so that the difference between the recognition result and the correct data becomes smaller. The generation unit 42 stores the generated conversion model and recognition model in the storage unit 16, for example.
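A rough sketch of this joint update follows, assuming the ConversionModel sketched earlier, a recognition model taking six-channel input, and an optimizer built over both models' parameters. The loss terms (MSE on the affine parameters, cross-entropy on the characters) are illustrative choices, not prescribed by the patent.

```python
# One joint training step for the conversion model and the recognition model.
import torch
import torch.nn as nn
import torch.nn.functional as F

param_loss_fn = nn.MSELoss()          # difference from the affine parameters in the learning data
char_loss_fn = nn.CrossEntropyLoss()  # difference between recognition result and correct data

def joint_training_step(conversion_model, recognition_model, optimizer,
                        combined, written, preprint, true_theta, true_chars) -> float:
    optimizer.zero_grad()
    theta = conversion_model.localization(combined).view(-1, 2, 3)  # estimated parameters
    grid = F.affine_grid(theta, preprint.size(), align_corners=False)
    warped = F.grid_sample(preprint, grid, align_corners=False)     # converted preprint image
    recombined = torch.cat([written, warped], dim=1)                # 6-channel recognition input
    loss = param_loss_fn(theta, true_theta) + char_loss_fn(recognition_model(recombined), true_chars)
    loss.backward()  # shrinks both the parameter error and the recognition error
    optimizer.step()
    return loss.item()
```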
  • FIG. 19 is a diagram showing an example of an operation flow when the character recognition system 40 recognizes characters written on a preprint.
  • the acquisition unit 11 acquires an image showing the characters written on the preprint (step S31).
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form showing characters written on the preprint.
  • the image extraction unit 12 extracts a preprint image corresponding to the image acquired by the acquisition unit 11 (step S32).
  • the image extraction unit 12 extracts, for example, a preprint image corresponding to the image acquired by the acquisition unit 11 from the data stored in the storage unit 16.
  • the conversion unit 51 of the recognition unit 41 uses the conversion model to estimate conversion parameters to be used when converting the preprint image. Then, the conversion unit 51 converts the preprint image using the estimated conversion parameters (step S33).
  • the image recognition unit 52 combines the image with the characters written on the preprint and the converted preprint image. Then, the image recognition unit 52 uses the recognition model to recognize characters in the image from the combined data (step S34).
  • when the characters in the image are recognized, the output unit 14 outputs the recognition results (step S35). The output unit 14 outputs the recognition result to the information processing server 30, for example.
  • FIG. 20 is a diagram showing an example of an operation flow when the character recognition system 40 generates only a recognition model.
  • the acquisition unit 11 acquires, as learning data, an image of the characters written on the preprint, the preprint image, and the characters written on the preprint (step S41).
  • the generation unit 42 uses the conversion model to estimate conversion parameters to be used when converting the preprint image. Then, the generation unit 42 converts the preprint image using the estimated conversion parameters and the conversion model (step S42).
  • after converting the preprint image, the generation unit 42 combines the image containing the characters written on the preprint with the converted preprint image. The generation unit 42 then learns the relationship between the combined data and the characters written on the preprint, and generates a recognition model (step S43).
  • after generating the recognition model, the generation unit 42 saves the generated recognition model (step S44).
  • the generation unit 42 stores the generated recognition model in the storage unit 16, for example.
  • FIG. 21 is a diagram showing an example of an operation flow when the character recognition system 40 generates a conversion model and a recognition model.
  • the acquisition unit 11 acquires, as learning data, data obtained by combining an image of the characters written on the preprint and the preprint image, conversion parameters, and the characters written on the preprint (step S51).
  • when the learning data is acquired, the generation unit 42 generates a conversion model by learning the relationship between the combined data included in the learning data, which is a combination of the image of the characters written on the preprint and the preprint image, and the conversion parameters included in the learning data. In addition, the generation unit 42 generates a recognition model by learning the relationship between the data obtained by combining the image of the characters written on the preprint and the preprint image, and the characters written on the preprint (step S52).
  • after generating the conversion model and the recognition model, the generation unit 42 saves the generated conversion model and recognition model (step S53).
  • the generation unit 42 stores the generated conversion model and recognition model in the storage unit 16, for example.
  • the character recognition system 40 of this embodiment uses a conversion model to combine an image of characters written on a preprint with the preprint image. Then, the character recognition system 40 uses the recognition model to recognize the characters written on the preprint from the combined data. By using the preprint image converted with the conversion model, the character recognition system 40 can improve the accuracy of superposition when combining the image of the characters written on the preprint with the preprint image. By using the data combined in this way, the character recognition system 40 can recognize the characters on the preprint with the recognition model while variations in the deviation between the image showing the characters written on the preprint and the preprint image are suppressed. By recognizing the characters under these conditions, the character recognition system 40 can improve the recognition accuracy of the characters written on the preprint.
  • by generating the conversion model using learning data, the character recognition system 40 can generate a conversion model that suppresses the misalignment between the image of the characters written on the preprint and the preprint image that may occur in actual use. Therefore, the character recognition system 40 can suppress variations in the deviation between the image in which characters are written on the preprint and the preprint image according to the actual usage situation, and can further improve the recognition accuracy of characters written on a preprint.
  • FIG. 22 shows an example of the configuration of a computer 200 that executes a computer program that performs each process in the character recognition system 10 of the first embodiment and the character recognition system 40 of the second embodiment.
  • the computer 200 includes a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, an input/output I/F (Interface) 204, and a communication I/F 205.
  • the CPU 201 reads computer programs for performing each process from the storage device 203 and executes them.
  • the CPU 201 may be configured by a combination of multiple CPUs. Further, the CPU 201 may be configured by a combination of a CPU and other types of processors. For example, the CPU 201 may be configured by a combination of a CPU and a GPU (Graphics Processing Unit).
  • the memory 202 is configured with a DRAM (Dynamic Random Access Memory) or the like, and temporarily stores computer programs executed by the CPU 201 and data being processed.
  • the storage device 203 stores computer programs executed by the CPU 201.
  • the storage device 203 is configured by, for example, a nonvolatile semiconductor storage device. Other storage devices such as a hard disk drive may be used as the storage device 203.
  • the input/output I/F 204 is an interface that receives input from a worker and outputs display data and the like.
  • the communication I/F 205 is an interface that transmits and receives data to and from the scanner 20 and the information processing server 30. The information processing server 30 may also have a similar configuration.
  • the computer program used to execute each process can also be stored and distributed in a computer-readable recording medium that records data non-transitorily.
  • a computer-readable recording medium for example, a magnetic tape for data recording or a magnetic disk such as a hard disk can be used.
  • an optical disc such as a CD-ROM (Compact Disc Read Only Memory) can also be used.
  • a nonvolatile semiconductor memory device may be used as the recording medium.
  • Reference signs: 10 character recognition system, 11 acquisition unit, 12 image extraction unit, 13 recognition unit, 14 output unit, 15 generation unit, 16 storage unit, 20 scanner, 30 information processing server, 40 character recognition system, 41 recognition unit, 42 generation unit, 51 conversion unit, 52 image recognition unit, 100 computer, 101 CPU, 102 memory, 103 storage device, 104 input/output I/F, 105 communication I/F

Abstract

This character recognition system comprises an acquisition unit, a recognition unit, and an output unit. The acquisition unit acquires an image depicting a character noted on a preprint of a ledger sheet including the preprint. The recognition unit uses the image depicting the character noted on the preprint, and a recognition model for recognizing a character noted on a preprint from a preprint image depicting the preprint, to recognize, from the acquired image and the preprint image, the character noted on the preprint of the acquired image. The output unit outputs the result of recognition.

Description

Character recognition system, character recognition method, and recording medium
 The present invention relates to a character recognition system and the like.
 OCR (Optical Character Recognition), which reads handwritten characters written on a form as an image using a scanner and converts them into text data by recognizing the characters in the image, is widely used. Character recognition by OCR is performed, for example, by using a learning model generated by machine learning to recognize characters written on a preprint of a form. However, even when the same characters are written, the shape of the characters written on the preprint of the form and the position where the characters are written on the preprint vary depending on the person writing them. Furthermore, in an image obtained by reading characters written on a preprint, the preprint and the characters coexist in the image. Therefore, a learning model that recognizes handwritten characters on a form may be required to accurately recognize characters that are written in various shapes and at various positions from an image in which the preprint and the characters are mixed. For this reason, it is desirable to have a technology that can accurately recognize characters on preprinted forms.
 The image processing system of Patent Document 1 uses a learning model to extract handwritten characters written within the frame of a preprint. The image processing system of Patent Document 1 extracts the handwritten characters from an image of handwritten characters written within the preprint frame by erasing the preprint frame through image processing.
JP 2021-39424 A
 With the information processing device of Patent Document 1, it may be difficult to accurately recognize characters written on a preprint.
 In order to solve the above problems, the main object of the present invention is to provide a character recognition system and the like that can improve the recognition accuracy of characters written on preprints.
 In order to solve the above problems, the character recognition system of the present invention includes an acquisition means for acquiring an image of characters written on a preprint of a form including the preprint; a recognition means for recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image; and an output means for outputting the recognition result.
 The character recognition method of the present invention acquires an image of characters written on a preprint of a form including the preprint, recognizes the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and outputs the recognition result.
 The recording medium of the present invention non-transitorily records a character recognition program that causes a computer to execute a process of acquiring an image of characters written on a preprint of a form including the preprint, a process of recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and a process of outputting the recognition result.
 According to the present invention, the recognition accuracy of characters written on preprints can be improved.
FIG. 1 is a diagram showing an example of the configuration of the first embodiment of the present invention. FIG. 2 is a diagram showing an example of a form in the first embodiment of the present invention. FIG. 3 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 4 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 5 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 6 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 7 is a diagram showing an example of the configuration of the character recognition system of the first embodiment of the present invention. FIG. 8 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 9 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 10 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 11 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 12 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 13 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 14 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention. FIG. 15 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention. FIG. 16 is a diagram showing an example of the configuration of the second embodiment of the present invention. FIG. 17 is a diagram showing an example of the configuration of the character recognition system of the second embodiment of the present invention. FIG. 18 is a diagram schematically showing the flow of data processing in the second embodiment of the present invention. FIG. 19 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention. FIG. 20 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention. FIG. 21 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention. FIG. 22 is a diagram showing an example of the configuration of another embodiment of the present invention.
(First embodiment)
A first embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a diagram showing an example of the configuration of the form processing system of this embodiment. The form processing system includes, as an example, a character recognition system 10, a scanner 20, and an information processing server 30. The character recognition system 10 connects to the scanner 20 via a network, for example, and likewise connects to the information processing server 30 via a network. There may be a plurality of scanners 20 and information processing servers 30; their numbers are not particularly limited.
The character recognition system 10 acquires, for example, an image of a form read by the scanner 20. A preprint for writing characters is printed on the paper of the form. A preprint is, for example, a frame or line on the form that indicates the position where characters are to be written. The character recognition system 10 acquires, for example, an image of handwritten characters written on a preprint. The characters written on the preprint may instead be printed, and are not limited to these examples.
The character recognition system 10 uses a recognition model to recognize the characters written on a preprint from an image of those characters acquired from the scanner 20 and from a preprint image showing the preprint itself. The recognition model is a learning model that recognizes the characters written on a preprint from an image of those characters and the preprint image. The character recognition system 10 outputs the recognition result of the characters written on the preprint to, for example, the information processing server 30. The information processing server 30 is a server that performs processing according to the intended use of the recognition result.
By using the preprint image in addition to the image showing the characters written on the preprint to be recognized, the character recognition system 10 can suppress the influence of the preprint on character recognition.
FIG. 2 is a diagram showing an example of a form. In the example of FIG. 2, the name of the form, "payment slip", is written at the top. The form in FIG. 2 is, for example, a document submitted to a financial institution when depositing money into an account. Entry fields for "account number" and "amount" are provided, and the frames in which numbers are entered in these fields are the preprint.
The characters written on a preprint are, for example, characters written within the frames of the preprint; they may also be written so as to overlap the frames. An image with characters written on a preprint is an image that includes both the preprint and the characters written on it, whereas a preprint image is an image showing only the preprint without any characters. In the example of FIG. 2, numbers are written on the preprint, but the characters written on a preprint are not limited to numbers and may include symbols.
FIG. 3 is a diagram showing an example of an image of characters written on a preprint. FIG. 3 is an image of the "account number" entry field extracted from the form in FIG. 2, and FIG. 4 is the preprint image of that entry field. In the example of FIG. 3, the characters "01778543" are handwritten on the preprint shown in FIG. 4.
An image of the preprint alone may include characters that are part of the preprint, for example characters indicating the digits of an amount, an item, or a unit. Characters belonging to the preprint are not limited to these, as long as they are printed on the paper as part of the preprint.
FIG. 5 is a diagram showing an example of an image of characters written on a preprint. FIG. 5 is an image of the "amount" entry field extracted from the form in FIG. 2, and FIG. 6 is the preprint image of that entry field. In the example of FIG. 6, "yen", indicating the unit of the amount, is printed as part of the preprint at the bottom of the rightmost frame. In the example of FIG. 5, the characters "40000" are handwritten on the preprint shown in FIG. 6.
A form is a document used in procedures at, for example, financial institutions, government offices, educational institutions, hospitals, transportation operators, or companies. A form may also be a document attached to an article under management; examples of forms are not limited to these. A preprint indicates, for example, the position on the form where a date, name, affiliation, address, telephone number, e-mail address, age, gender, occupation, or amount is to be entered. A preprint is composed of, for example, an item to be filled in and frames in which characters are written. When a plurality of characters are entered for one item, the preprint may be a series of consecutive frames, and preprints for a plurality of items may be printed on one form. When the preprint is printed on the paper as an entry field consisting of consecutive frames, the character recognition system 10, for example, outputs the recognized characters as character-string data following the order of the frames.
The configuration of the character recognition system 10 will now be described. FIG. 7 is a diagram showing an example of the configuration of the character recognition system 10. The character recognition system 10 includes an acquisition unit 11, a recognition unit 13, and an output unit 14 as its basic configuration, and further includes an image extraction unit 12, a generation unit 15, and a storage unit 16. The acquisition unit 11, the image extraction unit 12, the recognition unit 13, the output unit 14, and the storage unit 16 recognize, for example, the characters written on a preprint from an image of those characters. The acquisition unit 11, the generation unit 15, and the storage unit 16, for example, generate the recognition model.
The acquisition unit 11 acquires an image of the characters written on a preprint. For example, the acquisition unit 11 acquires from the scanner 20 an image of a form with characters written on the preprint. The acquisition unit 11 may instead acquire an image in which the portion with characters written on the preprint has already been extracted from the form; such images are, for example, those shown in FIGS. 3 and 5. When the preprint image is to be extracted from a form, the acquisition unit 11 may acquire an image of the form with no characters yet written on the preprint, for example from the scanner 20.
When the character recognition system 10 generates the recognition model, the acquisition unit 11 may acquire the learning data used for the generation. For example, the acquisition unit 11 acquires, as learning data, images of characters written on preprints and preprint images, associated with the characters actually written on the preprints. The learning data are input to the character recognition system 10, or to another terminal device connected to it, for example by an operator's operation.
The image extraction unit 12 extracts the preprint image corresponding to the image, acquired by the acquisition unit 11, of the characters written on the preprint. The image extraction unit 12 extracts the preprint image from, for example, form data stored in the storage unit 16. The form data include, for example, an image of the form and definition data. The definition data include, for example, the items to be written on the form and information on the position, on the form, of the preprint corresponding to each item. The position information of a preprint is, for example, information indicating the area of the form in which the preprint is printed. The items to be written are, for example, one or more of name, postal code, address, telephone number, age, personal identification number, account number, amount, and date, and are not limited to these examples.
The image extraction unit 12 identifies the position of the preprint on the form, for example, based on the preprint position information included in the definition data, and extracts the preprint image by cropping the image stored in the storage unit 16 at the identified position. The image extraction unit 12 may instead extract the preprint image from an image, acquired by the acquisition unit 11, of the form with no characters written on the preprint.
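For illustration only, this cropping step can be sketched as follows in Python. The definition-data layout (a list of entries holding an item name and a pixel bounding box) and the use of the Pillow library are assumptions made for this sketch; the embodiment does not prescribe a particular data format.

```python
from PIL import Image

# Hypothetical definition data: each entry names an item and the area
# of the form (left, top, right, bottom, in pixels) where its preprint
# is printed.
definition_data = [
    {"item": "account_number", "bbox": (120, 80, 520, 140)},
    {"item": "amount", "bbox": (120, 200, 520, 260)},
]

def extract_preprint_images(form_image_path: str) -> dict:
    """Crop each preprint area out of a stored blank-form image."""
    form_image = Image.open(form_image_path)
    preprints = {}
    for entry in definition_data:
        # Cut out the region of the form where the preprint is printed.
        preprints[entry["item"]] = form_image.crop(entry["bbox"])
    return preprints
```

In this sketch, cropping the stored form image at each position listed in the definition data yields one preprint image per item.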
The recognition unit 13 uses the recognition model to recognize the characters in the image from the image, acquired by the acquisition unit 11, of the characters written on the preprint and from the preprint image. The recognition model is a learning model that recognizes the characters written on a preprint from an image of those characters and the preprint image. The recognition unit 13, for example, inputs both images into the recognition model and thereby recognizes the characters written on the preprint. The recognition unit 13 may use a preprint image extracted in advance, or a preprint image generated in advance as an image of the preprint portion; for example, it uses a preprint image stored in the storage unit 16.
The recognition unit 13 extracts the image showing the characters written on the preprint by identifying the position of the preprint, for example, based on the preprint position information included in the definition data. The recognition unit 13 then uses the recognition model to recognize the characters written on the preprint from the extracted image and the preprint image extracted by the image extraction unit 12.
The recognition unit 13, for example, combines the image of the characters written on the preprint and the preprint image into one piece of data and inputs that data into the recognition model. Combining the two images means generating image data in which they are superimposed. When each of the two images has three RGB channels per pixel, the recognition unit 13, for example, combines their data into image data with six channels per pixel and inputs the combined six-channel data into the recognition model.
The recognition unit 13 combines the image of the characters written on the preprint and the preprint image, for example, based on preset conditions. For example, with the two images extracted at the same size, the recognition unit 13 superimposes them with their outer edges aligned, combining the image data of corresponding pixels. The recognition unit 13 then inputs the combined data into the recognition model and recognizes the characters written on the preprint.
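A minimal sketch of this channel-wise combination follows; the use of NumPy arrays and of identical shapes for the two pre-aligned images are assumptions of the example, not requirements stated by the embodiment.

```python
import numpy as np

def combine_images(characters_image: np.ndarray,
                   preprint_image: np.ndarray) -> np.ndarray:
    """Stack two aligned RGB images (H, W, 3) into one (H, W, 6) array.

    Both images are assumed to be extracted at the same size and
    superimposed with their outer edges aligned, so pixel (y, x) of one
    image corresponds to pixel (y, x) of the other.
    """
    assert characters_image.shape == preprint_image.shape
    # Concatenating along the channel axis yields 6 channels per pixel,
    # which is then fed to the recognition model as a single input.
    return np.concatenate([characters_image, preprint_image], axis=-1)
```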
The recognition unit 13 may also recognize characters other than those written on the preprint in the image of the form. For example, the recognition unit 13 may identify the type of the form from the form image acquired by the acquisition unit 11, and then recognize the characters written on the preprint by identifying the preprint position based on the definition data included in the form data corresponding to the identified form type. The recognition unit 13 identifies the form type, for example, by recognizing the form name or form number printed on the form; the relationship between the printed form name or form number and the form type is set in advance. The recognition model used by the recognition unit 13 may be a learning model generated outside the character recognition system 10.
FIG. 8 is a diagram showing an example of an image of characters written on a preprint. The preprint in FIG. 8 differs in aspect from that in the example of FIG. 3, for example in the thickness and type of its lines. FIG. 9 is the preprint image for the example of FIG. 8. In the example of FIG. 8, the characters "13758047" are handwritten on the preprint shown in FIG. 9. Given the images of FIGS. 8 and 9 as input, the recognition model outputs "13758047" as the recognition result. Even a recognition model generated using, for example, the preprint of FIG. 4 as learning data can recognize the characters written on the preprint of FIG. 9. That is, by taking as input both the image of the characters written on the preprint and the preprint image, the recognition model can recognize characters written on preprints it has not been trained on.
FIG. 10 is a diagram showing an example of an image in which the year in Western calendar notation is written on a preprint. In the example of FIG. 10, the characters "A.D." and "year" are printed in advance within the frames as part of the preprint, and the characters "2022" are handwritten on the preprint. FIG. 11 is the preprint image for the example of FIG. 10. Given the images of FIGS. 10 and 11 as input, the recognition model outputs "2022" as the recognition result.
FIG. 12 shows an example of an image in which, relative to the example of FIG. 10, the upper two digits "20" of the year are additionally printed in advance as part of the preprint. That is, in the example of FIG. 12, "A.D.", "20", and "year" are preprinted, and of "2022", only "22" is handwritten on the preprint. FIG. 13 is the preprint image for the example of FIG. 12. Given the images of FIGS. 12 and 13 as input, the recognition model outputs "22" as the recognition result. Even a learning model that has not used the preprints of FIGS. 11 and 13 as learning data can recognize the characters written on such preprints from the input images. In this way, by taking as input an image of the characters written on a preprint together with the preprint image, the recognition model can recognize characters written on preprints of various aspects. The above examples show preprints that differ in line thickness, line type, and preprinted characters, but the recognition model can likewise handle preprints that differ in frame shape and color, without having been trained on preprints of every aspect.
The output unit 14 outputs the recognition result obtained by the recognition unit 13, for example to the information processing server 30. The output unit 14 outputs, for example, the item corresponding to the preprint and the recognized characters in association with each other. When the recognition target is an account number as in the example of FIG. 3, the output unit 14 outputs, for example, information indicating that the result is an account number in association with the recognized character string. The output unit 14 may also output the recognition result to a display device, not shown, connected to the character recognition system 10.
When the character recognition system 10 generates the recognition model, the generation unit 15 performs the processing related to the generation. The generation unit 15 learns the relationship between images of characters written on preprints and preprint images on the one hand, and the characters written on the preprints on the other, and thereby generates a recognition model that recognizes the characters in an image from the image of the characters written on the preprint and the preprint image.
The generation unit 15 generates the recognition model, for example, by learning the relationship between data obtained by combining an image of the characters written on the preprint with the preprint image, and the characters written on the preprint. When each of the two images has three RGB channels per pixel, the generation unit 15 combines their data into image data with six channels per pixel, and generates the recognition model by learning the relationship between the combined six-channel image data and the characters written on the preprint.
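A hedged sketch of this learning step, in Python with PyTorch, is given below. The network architecture, the loss function, and the single-character output are assumptions chosen to keep the example short; the embodiment itself only specifies that the combined six-channel data is the input and that the characters written on the preprint are the learning target. A real model would output a character string (for example via a sequence decoder), not a single class.

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    """Minimal recognition-model sketch: a CNN taking the combined
    six-channel input (characters image + preprint image) and
    predicting a character class (assumed architecture)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

def train(model: nn.Module, loader, epochs: int = 10) -> None:
    """loader is assumed to yield (combined_6ch_image, label) pairs."""
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for combined, label in loader:
            optimizer.zero_grad()
            # Learn the relationship between the combined data and the
            # characters written on the preprint (the correct answers).
            loss = loss_fn(model(combined), label)
            loss.backward()
            optimizer.step()
```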
The images of characters written on preprints and the preprint images that the generation unit 15 uses as learning data need not show the preprints actually used in operation. When generating the recognition model, the generation unit 15 may perform learning using randomly shaped figures as preprints. In that case, the generation unit 15 generates the recognition model using, as learning data, for example an image of characters written on a randomly shaped figure together with an image of the same figure without the characters.
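One conceivable way to produce such learning data is to draw random frame-like figures and use each figure twice, once with characters composited onto it and once without. The following sketch, using Pillow, is an assumed illustration; the embodiment does not prescribe how the random figures are generated.

```python
import random
from PIL import Image, ImageDraw

def random_preprint(width: int = 256, height: int = 64) -> Image.Image:
    """Draw a randomly shaped row of frames as a synthetic preprint."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    n_frames = random.randint(4, 8)
    cell = width // n_frames
    line_width = random.randint(1, 4)
    for i in range(n_frames):
        # A row of frames with random line width; a fuller generator
        # could also vary line type, frame shape, and color.
        draw.rectangle([i * cell, 4, (i + 1) * cell - 2, height - 4],
                       outline="black", width=line_width)
    return img
```

A learning pair would then consist of the figure with characters drawn on it, the same figure without the characters, and the characters themselves as correct-answer data.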
The generation unit 15 generates the recognition model, for example, by deep learning using a DNN (Deep Neural Network). The machine learning algorithm for generating the recognition model is not limited to deep learning using a DNN.
The storage unit 16 stores, for example, the recognition model that the recognition unit 13 uses to recognize characters in images, as well as preprint images and form data. The form data include, for example, image data of forms and definition data, and may include preprint images extracted in advance. The storage unit 16 also stores, for example as learning data, images of characters written on preprints, preprint images, and the characters written on the preprints. The recognition model used by the recognition unit 13 may be stored in storage means other than the storage unit 16.
The scanner 20, for example, optically reads a form, generates an image of the form, and outputs the image to the character recognition system 10. The scanner 20 may extract the image of the preprint portion from the form image; in that case, it outputs the extracted preprint image to the character recognition system 10. When the form is a document attached to an article under management, the scanner 20 may generate the form image by photographing the form.
The information processing server 30 acquires, for example, the recognition result of the characters written on the form from the character recognition system 10 and performs processing according to the intended use. For example, the information processing server 30 uses the recognition result in processing related to account-management applications and deposits and withdrawals at a financial institution. The information processing server 30 may also use the recognition result to process application documents at government offices, educational institutions, hospitals, or transportation operators, for slip processing at companies, or for managing articles in distribution. The uses of the recognition result are not limited to the above.
The operation of the character recognition system 10 when recognizing characters written on a preprint will be described. FIG. 14 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires an image showing the characters written on the preprint (step S11). For example, the acquisition unit 11 acquires from the scanner 20 an image of a form showing the characters written on the preprint.
The image extraction unit 12 extracts the preprint image corresponding to the image acquired by the acquisition unit 11 (step S12), for example from the data stored in the storage unit 16.
When the preprint image has been extracted, the recognition unit 13 uses the recognition model to recognize the characters in the image from the image acquired by the acquisition unit 11 and the preprint image (step S13). The recognition model recognizes the characters written on the preprint from the image of those characters and the preprint image.
When the characters in the image have been recognized, the output unit 14 outputs the recognition result (step S14), for example to the information processing server 30.
The operation of the character recognition system 10 when generating the recognition model will be described. FIG. 15 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires, as learning data, images of the characters written on preprints, preprint images, and the characters written on the preprints (step S21).
When the learning data have been acquired, the generation unit 15 learns the relationship between the images of characters written on preprints and the preprint images on the one hand, and the characters written on the preprints on the other, and generates the recognition model (step S22). For example, the generation unit 15 combines an image of characters written on a preprint with the corresponding preprint image, and learns the relationship between the combined data and the characters written on the preprint, which are included in the learning data as correct-answer data.
When the recognition model has been generated, the generation unit 15 saves it (step S23), for example in the storage unit 16.
The character recognition system 10 of the form processing system of this embodiment uses the recognition model to recognize the characters written on a preprint from an image of those characters and the preprint image. By using the preprint image in addition to the image showing the characters to be recognized, the character recognition system 10 can suppress the influence of the preprint on character recognition and, as a result, improve the accuracy of recognizing characters written on preprints.
Furthermore, by taking as input both the image of the characters written on the preprint and the preprint image, the recognition model used by the character recognition system 10 can recognize characters written on preprints of aspects it has not been trained on. The character recognition system 10 can therefore recognize characters written on preprints of various aspects. In addition, when generating the recognition model, it is unnecessary to prepare and learn training data for every aspect of preprint actually used for recognition, which reduces the amount of learning and the computer resources required for generating the recognition model. The character recognition system 10 can thus generate the recognition model efficiently.
Moreover, by using randomly shaped figures as preprints when generating the recognition model, the character recognition system 10 can generate a recognition model capable of recognizing characters written on a variety of preprint images. That is, by using a recognition model generated with randomly shaped figures as preprints, the character recognition system 10 can accurately recognize characters written on preprints even when the shape of the preprint differs from form to form.
As a character recognition method different from this embodiment, one could, for example, erase the preprint from the image of the characters written on it and then perform character recognition. Erasing the preprint can require substantial computer resources, and part of the characters may be erased along with it. In contrast, the character recognition system 10 of this embodiment recognizes characters by inputting into the recognition model the data obtained by combining the image of the characters written on the preprint with the preprint image, and therefore needs no preprint-erasure preprocessing. Since no erasure is performed, the influence of erasure-related processing on character recognition is also avoided. The character recognition system 10 of this embodiment can thus improve recognition accuracy while reducing the resources required for recognizing characters written on preprints.
(Second embodiment)
A second embodiment of the present invention will be described in detail with reference to the drawings. FIG. 16 is a diagram showing an example of the configuration of the form processing system of this embodiment. The form processing system includes, as an example, a character recognition system 40, the scanner 20, and the information processing server 30. The character recognition system 40 connects to the scanner 20 via a network, for example, and likewise connects to the information processing server 30 via a network. There may be a plurality of scanners 20 and information processing servers 30; their numbers are not particularly limited. The functions of the scanner 20 and the information processing server 30 of this embodiment are the same as in the first embodiment.
The character recognition system 10 of the first embodiment uses the recognition model to recognize the characters on a preprint, taking as input data obtained by combining an image with characters written on the preprint and the preprint image, and outputs the recognition result. In addition to this configuration, when combining the two images, the character recognition system 40 of this embodiment applies a conversion process to the preprint image using a conversion model before combining, in order to improve the accuracy with which the two images are superimposed. The conversion model is a learning model that estimates the conversion parameters used when converting the preprint image.
The configuration of the character recognition system 40 will now be described. FIG. 17 is a diagram showing an example of the configuration of the character recognition system 40. The character recognition system 40 includes the acquisition unit 11, the image extraction unit 12, a recognition unit 41, the output unit 14, a generation unit 42, and the storage unit 16. The recognition unit 41 includes a conversion unit 51 and an image recognition unit 52. The configurations and functions of the acquisition unit 11, the image extraction unit 12, the output unit 14, and the storage unit 16 are the same as those of the corresponding units of the character recognition system 10 of the first embodiment.
The conversion unit 51 of the recognition unit 41 converts the preprint image, for example, using the conversion model. The conversion model performs, for example, an affine transformation on the preprint image: by rotating, scaling, and translating the preprint image, it is transformed so that it overlaps the image it is to be combined with. The conversion model estimates the conversion parameters used for this rotation, scaling, and translation.
The conversion unit 51 uses the conversion model, for example, to estimate affine transformation parameters from data obtained by combining the image of the characters written on the preprint with the preprint image under preset conditions, and then applies an affine transformation to the preprint image using the estimated parameters. As the preset conditions, the conversion unit 51 superimposes the two images, for example, by aligning their outer edges. The conversion model then estimates, from the data combined under these conditions, conversion parameters with which the preprint image can be transformed so that the superposition becomes more accurate than under the preset conditions alone. Having estimated the parameters, the conversion unit 51 applies the affine transformation to the preprint image so that the superposition accuracy becomes higher.
The conversion model is, for example, a learning model using a DNN known as STN (Spatial Transformer Networks). An image conversion method using STN is described, for example, in Max Jaderberg et al., "Spatial Transformer Networks", NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2, December 2015, p. 2017-2025.
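A minimal sketch of this STN-style step in Python with PyTorch is shown below. The localization-network architecture is an assumption made for the example, while the use of `affine_grid` and `grid_sample` follows the standard STN formulation cited above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreprintAligner(nn.Module):
    """Estimate affine parameters from the combined 6-channel input and
    apply them to the preprint image (an STN-style sketch)."""
    def __init__(self):
        super().__init__()
        # Localization network: maps the combined images to the six
        # parameters of a 2x3 affine matrix (assumed architecture).
        self.localization = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 6),
        )
        # Initialize the final layer to the identity transform.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, combined: torch.Tensor,
                preprint: torch.Tensor) -> torch.Tensor:
        theta = self.localization(combined).view(-1, 2, 3)
        grid = F.affine_grid(theta, preprint.size(), align_corners=False)
        # Resample the preprint image under the estimated transform.
        return F.grid_sample(preprint, grid, align_corners=False)
```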
The image recognition unit 52 of the recognition unit 41 uses the recognition model to recognize the characters written on the preprint from the image of those characters and the preprint image. The image recognition unit 52 combines the image of the characters written on the preprint with the preprint image to which the conversion unit 51 has applied the affine transformation, and uses the recognition model to recognize the characters written on the preprint from the combined data. The conversion model and the recognition model may be learning models generated outside the character recognition system 40.
FIG. 18 schematically shows the flow of processing when the recognition unit 41 recognizes characters written on a preprint. In the example of FIG. 18, an image of characters written on a preprint and a preprint image are input to the recognition unit 41. The conversion unit 51 first combines the two images under preset conditions, for example with their outer edges aligned. It then estimates affine transformation parameters using the conversion model, applies the affine transformation to the preprint image with the estimated parameters, and outputs the transformed preprint image to the image recognition unit 52. The image recognition unit 52 combines the image with characters written on the preprint and the affine-transformed preprint image, and uses the recognition model to recognize the characters written on the preprint from the combined data.
The character recognition system 40 may generate, for example, only the recognition model of the two models; in that case, a learning model generated outside the character recognition system 40 is used as the conversion model. When generating only the recognition model, the generation unit 42 combines, for example, the images of characters written on preprints and the preprint images included in the learning data, using the conversion model. The generation unit 42 then learns the relationship between the combined data and the characters written on the preprints, which are included in the learning data as correct-answer data, and generates the recognition model. The generation unit 42 saves the generated model in the storage unit 16.
The character recognition system 40 may also generate both the conversion model and the recognition model. In that case, the generation unit 42 uses the conversion model to estimate conversion parameters from data obtained by combining an image of characters written on a preprint with the preprint image under preset conditions, and uses the recognition model to recognize the characters written on the preprint from the combined data. The generation unit 42 updates the parameters of the conversion model so that the difference between the affine transformation parameters estimated by the conversion model and those included in the learning data becomes smaller, and updates the parameters of the recognition model so that the difference between the recognition result and the correct-answer data becomes smaller.
Having updated the parameters of the conversion model and the recognition model, the generation unit 42 repeats the above processing with the updated models, for example until the accuracy of the conversion parameters estimated by the conversion model and of the recognition results of the recognition model satisfies preset criteria, thereby generating the conversion model and the recognition model. The generation unit 42 saves the generated conversion model and recognition model, for example in the storage unit 16.
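A hedged sketch of this joint update, again in Python with PyTorch, is given below. The learning-data layout and the simple summed loss are assumptions of this example, and `PreprintAligner` and `RecognitionModel` refer to the illustrative classes sketched earlier in this description rather than to components defined by the embodiment.

```python
import torch
import torch.nn as nn

def train_jointly(aligner, recognizer, loader, epochs: int = 10) -> None:
    """loader is assumed to yield (combined, preprint, chars_img,
    theta_true, label): the pre-combined 6-channel input, the raw
    preprint image, the image with written characters, the ground-truth
    affine parameters, and the correct characters."""
    params = list(aligner.parameters()) + list(recognizer.parameters())
    optimizer = torch.optim.Adam(params)
    char_loss = nn.CrossEntropyLoss()
    theta_loss = nn.MSELoss()
    for _ in range(epochs):
        for combined, preprint, chars_img, theta_true, label in loader:
            optimizer.zero_grad()
            # Estimated affine parameters vs. those in the learning data.
            theta = aligner.localization(combined).view(-1, 2, 3)
            aligned = aligner(combined, preprint)
            # Recombine the character image with the aligned preprint.
            recombined = torch.cat([chars_img, aligned], dim=1)
            # Update both models: recognition error and parameter error.
            loss = (char_loss(recognizer(recombined), label)
                    + theta_loss(theta, theta_true))
            loss.backward()
            optimizer.step()
```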
The operation of the character recognition system 40 when recognizing characters written on a preprint will be described. FIG. 19 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires an image showing the characters written on the preprint (step S31). For example, the acquisition unit 11 acquires from the scanner 20 an image of a form showing the characters written on the preprint.
The image extraction unit 12 extracts the preprint image corresponding to the image acquired by the acquisition unit 11 (step S32), for example from the data stored in the storage unit 16.
When the preprint image has been acquired, the conversion unit 51 of the recognition unit 41 uses the conversion model to estimate the conversion parameters and converts the preprint image with the estimated parameters (step S33). When the preprint image has been converted, the image recognition unit 52 combines the image with characters written on the preprint and the converted preprint image, and uses the recognition model to recognize the characters in the image from the combined data (step S34).
When the characters in the image have been recognized, the output unit 14 outputs the recognition result (step S35), for example to the information processing server 30.
The operation of the character recognition system 40 when generating only the recognition model of the two models will be described. FIG. 20 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires, as learning data, images of the characters written on preprints, preprint images, and the characters written on the preprints (step S41).
When the learning data have been acquired, the generation unit 42 uses the conversion model to estimate the conversion parameters and converts the preprint images with the estimated parameters (step S42).
Having converted the preprint images, the generation unit 42 combines the images of the characters written on the preprints with the converted preprint images, learns the relationship between the combined data and the characters written on the preprints, and generates the recognition model (step S43).
When the recognition model has been generated, the generation unit 42 saves it (step S44), for example in the storage unit 16.
The operation of the character recognition system 40 when generating both the conversion model and the recognition model will be described. FIG. 21 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires, as learning data, data obtained by combining images of the characters written on preprints with the preprint images, the conversion parameters, and the characters written on the preprints (step S51).
When the learning data have been acquired, the generation unit 42 generates the conversion model by learning the relationship between the combined data included in the learning data and the conversion parameters included in the learning data. The generation unit 42 also generates the recognition model by learning the relationship between the combined data and the characters written on the preprints (step S52).
When the conversion model and the recognition model have been generated, the generation unit 42 saves them (step S53), for example in the storage unit 16.
The character recognition system 40 of this embodiment uses the conversion model to combine an image of the characters written on a preprint with the preprint image, and then uses the recognition model to recognize the characters written on the preprint from the combined data. By using a preprint image converted with the conversion model, the system improves the accuracy of the superposition when the two images are combined. With data combined in this way, the recognition model can recognize the characters on the preprint while variation in the misalignment between the image of the characters written on the preprint and the preprint image is suppressed. Recognizing the characters under this suppressed variation improves the recognition accuracy of the characters written on the preprint.
Furthermore, when the conversion model is generated from learning data, the character recognition system 40 can produce a conversion model that suppresses the superposition misalignment, arising in actual use, between the image of the characters written on the preprint and the preprint image. The system can thus suppress variation in this misalignment according to the actual usage conditions, so generating the conversion model from learning data further improves the recognition accuracy of the characters written on the preprint.
Each process in the character recognition system 10 of the first embodiment and the character recognition system 40 of the second embodiment can be realized by executing a computer program on a computer. FIG. 22 shows an example of the configuration of a computer 200 that executes the computer programs performing each of these processes. The computer 200 includes a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, an input/output I/F (interface) 204, and a communication I/F 205.
The CPU 201 reads the computer program for each process from the storage device 203 and executes it. The CPU 201 may consist of a combination of multiple CPUs, or of a CPU combined with another type of processor, for example a GPU (Graphics Processing Unit). The memory 202 consists of a DRAM (Dynamic Random Access Memory) or the like and temporarily stores the computer programs executed by the CPU 201 and the data being processed. The storage device 203 stores the computer programs executed by the CPU 201 and consists of, for example, a nonvolatile semiconductor storage device; another storage device such as a hard disk drive may also be used. The input/output I/F 204 is an interface that receives input from an operator and outputs display data and the like. The communication I/F 205 is an interface that transmits and receives data to and from the scanner 20 and the information processing server 30. The information processing server 30 may have a similar configuration.
The computer programs used to execute each process can also be stored and distributed on a computer-readable recording medium that records data non-transitorily. The recording medium may be, for example, a magnetic tape for data recording or a magnetic disk such as a hard disk. An optical disc such as a CD-ROM (Compact Disc Read Only Memory) or a nonvolatile semiconductor storage device may also be used.
The present invention has been described above taking the embodiments described above as examples. However, the present invention is not limited to these embodiments; various aspects that those skilled in the art can understand may be applied within the scope of the present invention.
10 Character recognition system
11 Acquisition unit
12 Image extraction unit
13 Recognition unit
14 Output unit
15 Generation unit
16 Storage unit
20 Scanner
30 Information processing server
40 Character recognition system
41 Recognition unit
42 Generation unit
51 Conversion unit
52 Image recognition unit
200 Computer
201 CPU
202 Memory
203 Storage device
204 Input/output I/F
205 Communication I/F

Claims (10)

  1.  A character recognition system comprising:
      acquisition means for acquiring an image of characters written on a preprint of a form including the preprint;
      recognition means for recognizing, from the acquired image and a preprint image of the preprint, the characters written on the preprint of the acquired image, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image; and
      output means for outputting a result of the recognition.
  2.  The character recognition system according to claim 1, wherein the recognition means recognizes the characters on the preprint of the acquired image from data in which the image of the characters written on the preprint and the preprint image are combined into one data item.
  3.  The character recognition system according to claim 2, further comprising conversion means for converting the preprint image using conversion parameters, wherein the recognition means recognizes the characters on the preprint of the acquired image from data in which the acquired image and the converted preprint image are combined.
  4.  The character recognition system according to claim 3, wherein the conversion means converts the preprint image using a conversion model that estimates the conversion parameters from data in which the image and the converted preprint image are combined.
  5.  The character recognition system according to any one of claims 1 to 4, wherein the recognition means identifies, from the image, the type of form whose characters written on the preprint are to be recognized, and recognizes the characters written on the preprint at the acquired position based on definition data corresponding to the identified type of form.
  6.  The character recognition system according to any one of claims 1 to 5, wherein the recognition means recognizes the characters written on the preprint based on definition data in which the position of the preprint on the form is defined.
  7.  The character recognition system according to any one of claims 1 to 5, further comprising generation means for learning the relationship between an image of the characters written on a preprint together with the preprint image and the characters written on the preprint, and generating a recognition model that recognizes, from the image of the characters written on the preprint and the preprint image, the characters written on the preprint of the image.
  8.  The character recognition system according to claim 7, wherein the generation means learns the relationship between data in which an image of the characters written on the preprint and the preprint image are combined into one data item and the conversion parameters, and generates a conversion model that estimates the conversion parameters used to convert the preprint image.
  9.  A character recognition method comprising:
      acquiring an image of characters written on a preprint of a form including the preprint;
      recognizing, from the acquired image and a preprint image of the preprint, the characters written on the preprint of the acquired image, using a recognition model that recognizes characters on a preprint from an image in which characters are written on the preprint and the preprint image; and
      outputting a result of the recognition.
  10.  A recording medium that non-transitorily records a character recognition program for causing a computer to execute:
      a process of acquiring an image of characters written on a preprint of a form including the preprint;
      a process of recognizing, from the acquired image and a preprint image of the preprint, the characters written on the preprint of the acquired image, using a recognition model that recognizes characters on a preprint from an image of the characters written on the preprint and the preprint image; and
      a process of outputting a result of the recognition.
PCT/JP2022/013389 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium WO2023181149A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/013389 WO2023181149A1 (en) 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/013389 WO2023181149A1 (en) 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium

Publications (1)

Publication Number Publication Date
WO2023181149A1 true WO2023181149A1 (en) 2023-09-28

Family

ID=88100226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/013389 WO2023181149A1 (en) 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium

Country Status (1)

Country Link
WO (1) WO2023181149A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05266247A (en) * 1992-03-19 1993-10-15 Toshiba Corp Picture data processing system
JP2007148846A (en) * 2005-11-29 2007-06-14 Nec Corp Ocr device, form out method, and form out program
JP2021043650A (en) * 2019-09-10 2021-03-18 キヤノン株式会社 Image processing device, image processing system, image processing method, and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05266247A (en) * 1992-03-19 1993-10-15 Toshiba Corp Picture data processing system
JP2007148846A (en) * 2005-11-29 2007-06-14 Nec Corp Ocr device, form out method, and form out program
JP2021043650A (en) * 2019-09-10 2021-03-18 キヤノン株式会社 Image processing device, image processing system, image processing method, and program

Similar Documents

Publication Publication Date Title
US20190279170A1 (en) Dynamic resource management associated with payment instrument exceptions processing
US9317745B2 (en) Data lifting for exception processing
US9342741B2 (en) Systems, methods and computer program products for determining document validity
CA2502811C (en) System and method for capture, storage and processing of receipts and related data
US9098765B2 (en) Systems and methods for capturing and storing image data from a negotiable instrument
US9824288B1 (en) Programmable overlay for negotiable instrument electronic image processing
US10229395B2 (en) Predictive determination and resolution of a value of indicia located in a negotiable instrument electronic image
US10528807B2 (en) System and method for processing and identifying content in form documents
US9031308B2 (en) Systems and methods for recreating an image using white space and check element capture
US20160379186A1 (en) Element level confidence scoring of elements of a payment instrument for exceptions processing
JP2015069256A (en) Character identification system
US10922537B2 (en) System and method for processing and identifying content in form documents
WO2023181149A1 (en) Character recognition system, character recognition method, and recording medium
JP2019191665A (en) Financial statements reading device, financial statements reading method and program
CN110135218A (en) The method, apparatus, equipment and computer storage medium of image for identification
US20150120548A1 (en) Data lifting for stop payment requests
KR101516684B1 (en) A service method for transforming document using optical character recognition
JP2008005219A (en) Document image processing system
US20150120517A1 (en) Data lifting for duplicate elimination
JP2014186659A (en) Image collation device, image collation method, and image collation program
CN101727572A (en) Method for ensuring image integrity by using file characteristics
US11055528B2 (en) Real-time image capture correction device
US10115081B2 (en) Monitoring module usage in a data processing system
WO2023042270A1 (en) Character recognition program, character recognition system, and character recognition method
US20230081511A1 (en) Systems and methods for improved payroll administration in a freelance workforce

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933295

Country of ref document: EP

Kind code of ref document: A1