WO2023181149A1 - Character recognition system, character recognition method, and recording medium

Info

Publication number
WO2023181149A1
Authority
WO
WIPO (PCT)
Prior art keywords
preprint
image
characters written
recognition
characters
Application number
PCT/JP2022/013389
Other languages
French (fr)
Japanese (ja)
Inventor
Yuichi Nakatani (中谷 裕一)
Original Assignee
NEC Corporation (日本電気株式会社)
Application filed by NEC Corporation (日本電気株式会社)
Priority to PCT/JP2022/013389
Publication of WO2023181149A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/192: Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194: References adjustable by an adaptive method, e.g. learning

Definitions

  • the present invention relates to a character recognition system and the like.
  • OCR (Optical Character Recognition), which reads handwritten characters written on a form as an image using a scanner and converts them into text data by recognizing the characters in the image, is widely used.
  • Character recognition by OCR is performed, for example, by using a learning model generated by machine learning to recognize characters written on a preprint of a form.
  • the shape of the characters written on the preprint of the form and the position where the characters are written on the preprint vary depending on the person writing the characters.
  • in an image obtained by reading characters written on a preprint, the preprint and the characters coexist in the image.
  • a learning model that recognizes handwritten characters on a form may therefore be required to accurately recognize characters that are written in various shapes and at various positions from an image in which the preprint and the characters are mixed. For this reason, it is desirable to have a technology that can accurately recognize characters on preprinted forms.
  • the image processing system of Patent Document 1 uses a learning model to extract handwritten characters written within the frame of a preprint.
  • the image processing system disclosed in Patent Document 1 extracts handwritten characters from an image of handwritten characters written within the preprint frame by erasing the preprint frame through image processing.
  • the main object of the present invention is to provide a character recognition system and the like that can improve the recognition accuracy of characters written on preprints.
  • the character recognition system of the present invention includes an acquisition means for acquiring an image of characters written on a preprint of a form including the preprint; a recognition means for recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image; and an output means for outputting the recognition result.
  • the character recognition method of the present invention acquires an image of characters written on a preprint of a form including the preprint, recognizes the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and outputs the recognition result.
  • the recording medium of the present invention non-transitorily records a character recognition program that causes a computer to execute a process of acquiring an image of characters written on a preprint of a form including the preprint, a process of recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and a process of outputting the recognition result.
  • the recognition accuracy of characters written on preprints can be improved.
  • FIG. 1 is a diagram showing an example of the configuration of the first embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of a form in the first embodiment of the present invention.
  • FIG. 3 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 4 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 5 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 6 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of the configuration of the character recognition system of the first embodiment of the present invention.
  • FIG. 8 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 9 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 10 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 11 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 12 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention.
  • FIG. 13 is a diagram showing an example of a preprint image in the first embodiment of the present invention.
  • FIG. 14 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention.
  • FIG. 15 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention.
  • FIG. 16 is a diagram showing an example of the configuration of the second embodiment of the present invention.
  • FIG. 17 is a diagram showing an example of the configuration of the character recognition system of the second embodiment of the present invention.
  • FIG. 18 is a diagram schematically showing the flow of data processing in the second embodiment of the present invention.
  • FIG. 19 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention.
  • FIG. 20 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention.
  • FIG. 21 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention.
  • FIG. 22 is a diagram showing an example of the configuration of another embodiment of the present invention.
  • FIG. 1 is a diagram showing an example of the configuration of a form processing system according to this embodiment.
  • the form processing system includes, for example, a character recognition system 10, a scanner 20, and an information processing server 30.
  • the character recognition system 10 is connected to a scanner 20 via a network, for example. Further, the character recognition system 10 is connected to an information processing server 30 via a network.
  • the character recognition system 10 acquires an image obtained by reading a form by the scanner 20.
  • a preprint for writing characters is printed on the paper of the form.
  • a preprint is, for example, a frame or a line on a form that indicates the position where characters are written.
  • the character recognition system 10 acquires, for example, an image of handwritten characters written on a preprint.
  • the characters written on the preprint may be printed.
  • the characters written on the preprint are not limited to the above examples.
  • the character recognition system 10 uses a recognition model to recognize the characters written on the preprint from an image of the characters written on the preprint acquired from the scanner 20 and a preprint image of the preprint.
  • the recognition model is a learning model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image.
  • the character recognition system 10 outputs the recognition results of characters written on the preprint to the information processing server 30, for example.
  • the information processing server 30 is a server that performs processing according to the purpose of the recognition results of characters written on the preprint.
  • by using the preprint image in addition to the image of the characters written on the preprint to be recognized, the character recognition system 10 can suppress the influence of the preprint on character recognition.
  • FIG. 2 is a diagram showing an example of a form.
  • the name of the form is written as "payment slip" at the top.
  • the example of the form in FIG. 2 is, for example, a document submitted to a financial institution when depositing money into an account at the financial institution.
  • entry columns for "account number” and “amount” are set.
  • the frames in which numbers are entered in the "account number” and “amount” fields are preprints.
  • the characters written on the preprint are, for example, the characters written within the frame of the preprint.
  • the characters written on the preprint may be written so as to overlap with the frame of the preprint.
  • An image with characters written on a preprint is an image that includes both the preprint and the characters written on the preprint.
  • the preprint image is an image of only a preprint without any characters written on it.
  • numbers are written on the preprint, but the characters written on the preprint are not limited to numbers.
  • the characters written on the preprint may include symbols.
  • FIG. 3 is a diagram showing an example of an image of characters written on a preprint.
  • FIG. 3 is an extracted image of the "account number" entry field in the example of the form shown in FIG. 2.
  • FIG. 4 is a preprint image of the "account number" entry field in the example of the form shown in FIG. 2.
  • the characters "01778543" are handwritten on the preprint shown in FIG.
  • An image that is only a preprint may include characters as a preprint.
  • the characters as the preprint are, for example, characters that indicate the digit of the amount, characters that indicate the item, or characters that indicate the unit.
  • the characters as a preprint are not limited to those mentioned above, as long as they are printed on paper as a preprint.
  • FIG. 5 is a diagram showing an example of an image of characters written on a preprint.
  • FIG. 5 is an image in which the "amount" entry field is extracted from the example of the form shown in FIG. 2.
  • FIG. 6 is a preprint image of the "amount" entry field in the example of the form shown in FIG. 2.
  • "yen” indicating the unit of monetary amount is printed as part of the preprint at the bottom of the frame on the right.
  • the characters "40000" are handwritten on the preprint shown in FIG.
  • a form is a document used for procedures at, for example, financial institutions, government offices, educational institutions, hospitals, transportation facilities, or companies. Further, the form may be a document attached to an item to be managed. Examples of forms are not limited to the above.
  • the preprint indicates, for example, a position on the form where the date, name, affiliation, address, telephone number, e-mail address, age, gender, occupation, or amount is to be written.
  • a preprint is composed of, for example, items to be filled in and a frame in which characters are written. When multiple characters are entered in one item, the preprint may be a series of multiple frames.
  • preprints for a plurality of items may be printed on one form. For example, when a preprint is printed on a sheet of paper as an entry column with a plurality of consecutive frames, the character recognition system 10 outputs the recognized characters as character string data according to the order of the frames.
  • FIG. 7 is a diagram showing an example of the configuration of the character recognition system 10.
  • the character recognition system 10 includes an acquisition section 11, a recognition section 13, and an output section 14 as basic components.
  • the character recognition system 10 further includes an image extraction section 12, a generation section 15, and a storage section 16.
  • the acquisition unit 11, the image extraction unit 12, the recognition unit 13, the output unit 14, and the storage unit 16 recognize the characters written on the preprint from an image of the characters written on the preprint. Further, the acquisition unit 11, the generation unit 15, and the storage unit 16 generate the recognition model, for example.
  • the acquisition unit 11 acquires an image of the characters written on the preprint.
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form with characters written on a preprint.
  • the acquisition unit 11 may acquire an image in which the portion of the preprint with characters written on it has already been extracted from the form.
  • the image of the portion where the characters are written on the preprint is, for example, the image shown in the examples of FIGS. 3 and 5.
  • the acquisition unit 11 may acquire an image of the form without any characters written on the preprint.
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form with no characters written on the preprint.
  • the acquisition unit 11 may acquire learning data used to generate the recognition model.
  • the generation unit 15 acquires, as learning data, for example, data in which an image of the characters written on the preprint and the preprint image are associated with the characters written on the preprint.
  • the learning data is input into the character recognition system 10 or another terminal device connected to the character recognition system 10, for example, by an operator's operation.
  • the image extraction unit 12 extracts a preprint image corresponding to the image, acquired by the acquisition unit 11, that depicts the characters written on the preprint.
  • the image extraction unit 12 extracts a preprint image from the form data stored in the storage unit 16, for example.
  • the form data includes, for example, an image of a form and definition data.
  • the definition data includes, for example, information about the items to be written on the form and the position of the preprint corresponding to each item on the form. The information on the position of the preprint is, for example, information indicating the range where the preprint is printed on the form.
  • the items to be described include, for example, one or more of name, postal code, address, telephone number, age, personal identification number, account number, amount, and date. The items to be described are not limited to the above examples.
  • the image extraction unit 12 identifies the position of the preprint on the form, for example, based on the information on the position of the preprint included in the definition data. Then, the image extraction unit 12 extracts a preprint image from the image stored in the storage unit 16 by cutting out the image at the specified preprint position. The image extraction unit 12 may extract the preprint image from the image of the form with no characters written on the preprint, which is acquired by the acquisition unit 11.
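As an illustrative aside, this cut-out step can be pictured with a short sketch. The following is a minimal example assuming definition data that records each item's preprint region as a pixel bounding box on the form; the field names ("bbox") and the dict-based form data are assumptions made for illustration, not structures taken from the patent.

```python
# A minimal sketch of preprint-image extraction, assuming hypothetical
# definition data that stores a pixel bounding box ("bbox") per written item.
from PIL import Image

def extract_preprint_image(blank_form: Image.Image, definition: dict, item: str) -> Image.Image:
    """Cut out the preprint region for one item from an image of the blank form."""
    left, top, right, bottom = definition[item]["bbox"]  # range where the preprint is printed
    return blank_form.crop((left, top, right, bottom))

# Usage: crop the "account number" entry field from a scan of the blank form.
definition_data = {"account_number": {"bbox": (120, 340, 620, 400)}}  # assumed values
blank_form = Image.open("blank_payment_slip.png")  # hypothetical file
preprint_image = extract_preprint_image(blank_form, definition_data, "account_number")
```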
  • the recognition unit 13 uses the recognition model to recognize the characters in the image from the image of the characters written on the preprint acquired by the acquisition unit 11 and the preprint image.
  • the recognition model is a learning model that recognizes the characters written on the preprint from an image of the characters written on the preprint and the preprint image.
  • the recognition unit 13 inputs, for example, the image of the characters written on the preprint and the preprint image acquired by the acquisition unit 11 into the recognition model. Then, the recognition unit 13 recognizes the characters written on the preprint using the recognition model.
  • the recognition unit 13 may recognize characters written on the preprint using a preprint image extracted in advance. Further, the recognition unit 13 may recognize characters written on the preprint using a preprint image generated in advance as an image of the preprint portion.
  • the recognition unit 13 uses, for example, a preprint image stored in the storage unit 16 to recognize characters written on the preprint.
  • the recognition unit 13 extracts an image showing the characters written on the preprint by specifying the position of the preprint, for example, based on the information on the position of the preprint included in the definition data. Then, the recognition unit 13 uses the recognition model to recognize the characters written on the preprint from the extracted image of the characters written on the preprint and the preprint image extracted by the image extraction unit 12.
  • the recognition unit 13 combines an image of the characters written on the preprint and the preprint image into one set of data and inputs that data into the recognition model.
  • Combining an image of characters written on a preprint with a preprint image means generating image data by superimposing the two images. If the image with characters written on the preprint and the preprint image are each images with three RGB channels per pixel, the recognition unit 13, for example, combines the data of the two images into single image data with six channels per pixel. Then, the recognition unit 13 inputs the combined six-channel image data to the recognition model.
  • the recognition unit 13 combines, for example, an image of characters written on a preprint and a preprint image based on preset conditions.
  • the recognition unit 13, for example, combines the two images by overlapping the image of the characters written on the preprint and the preprint image, both extracted at the same size, so that their outer peripheries are aligned.
  • the recognition unit 13 combines image data of corresponding pixels. Then, the recognition unit 13 inputs the combined data into a recognition model and recognizes the characters written on the preprint.
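To make the channel-wise combination concrete, here is a minimal sketch assuming two aligned RGB arrays of identical size; NumPy is used only for illustration, as the patent does not prescribe a library.

```python
# Stack an H x W x 3 written image and an H x W x 3 preprint image into one
# H x W x 6 array, combining the data of corresponding pixels.
import numpy as np

def combine_images(written: np.ndarray, preprint: np.ndarray) -> np.ndarray:
    assert written.shape == preprint.shape, "images must be extracted at the same size"
    return np.concatenate([written, preprint], axis=-1)  # six channels per pixel

combined = combine_images(
    np.zeros((64, 256, 3), dtype=np.uint8),  # image with characters written on the preprint
    np.zeros((64, 256, 3), dtype=np.uint8),  # preprint image
)
assert combined.shape == (64, 256, 6)
```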
  • the recognition unit 13 may recognize characters other than those written on the preprint in the image of the form. For example, the recognition unit 13 may identify the type of the form from the image of the form acquired by the acquisition unit 11. Then, the recognition unit 13 recognizes the characters written on the preprint by specifying the position of the preprint based on the definition data included in the form data corresponding to the specified type of form. The recognition unit 13 identifies the type of the form, for example, by recognizing the form name or form number printed on the form in the image of the form. The relationship between the form name or form number printed on the form and the type of form is set in advance. Further, the recognition model used by the recognition unit 13 may be a learning model generated outside the character recognition system 10.
  • FIG. 8 is a diagram showing an example of an image of characters written on a preprint.
  • FIG. 8 differs from the example image in FIG. 3 in the appearance of the preprint.
  • the preprint of the example image in FIG. 8 differs from the example image in FIG. 3 in the thickness and type of lines, for example.
  • FIG. 9 is a preprint image for the example image of FIG. 8. In the example image of FIG. 8, the characters "13758047" are handwritten on the preprint shown in FIG. 9.
  • the recognition model outputs "13758047" as a recognition result when the example image in FIG. 8 and the example image in FIG. 9 are input. For example, even if the recognition model is a learning model generated using the preprint of the example image in FIG.
  • the recognition model can recognize characters written on a preprint that has not been trained.
  • FIG. 10 is a diagram showing an example of an image in which characters describing the year in Western calendar notation are printed on a preprint.
  • the characters "A.D.” and “Year” are printed in advance within the frame of the preprint.
  • the characters "2022” are handwritten on the preprint image.
  • FIG. 11 is a preprint image in the example of the image in FIG. 10.
  • the recognition model outputs "2022" as a recognition result when the example image in FIG. 10 and the example image in FIG. 11 are input.
  • FIG. 12 shows an example of an image in which, relative to the example image of FIG. 10, the upper two digits "20" of the year in Western calendar notation are printed in advance as part of the preprint. That is, in the example image of FIG. 12, "A.D.", "20", and "Year" are printed in advance as the preprint. In the example image of FIG. 12, "22" out of "2022" is handwritten on the preprint.
  • FIG. 13 is a preprint image in the example of the image of FIG. 12.
  • the recognition model outputs "22" as a recognition result when the example image in FIG. 12 and the example image in FIG. 13 are input. For example, even if the recognition model is a learning model that does not use the preprints of the example images of FIGS.
  • the recognition model can recognize characters written on various forms of preprints by inputting images of the characters written on the preprints and preprint images.
  • the recognition model can also be used for preprints with different frame shapes and colors, and recognition can be performed in the same way even if learning is not performed using preprints of every appearance as learning data.
  • the output unit 14 outputs the recognition result by the recognition unit 13.
  • the output unit 14 outputs the characters recognized by the recognition unit 13 to the information processing server 30, for example.
  • the output unit 14 outputs, for example, an item corresponding to the preprint and the recognized characters in association with each other.
  • when the recognition target is an account number, as in the example image of FIG. 3, the output unit 14 outputs, for example, information indicating that the result is an account number in association with the recognized character string.
  • the output unit 14 may output the recognition result to a display device (not shown) connected to the character recognition system 10.
  • when generating a recognition model in the character recognition system 10, the generation unit 15 performs processing related to generation of the recognition model. The generation unit 15 learns the relationship between an image of the characters written on the preprint, the preprint image, and the characters written on the preprint. Then, the generation unit 15 generates a recognition model that recognizes the characters in the image from the image of the characters written on the preprint and the preprint image.
  • the generation unit 15 generates a recognition model by learning the relationship between the data obtained by combining an image of the characters written on the preprint with the preprint image, and the characters written on the preprint.
  • for example, if the two images each have three RGB channels per pixel, the generation unit 15 combines the data of the two images into single image data with six channels per pixel.
  • the generation unit 15 then generates a recognition model by learning the relationship between the combined six-channel image data and the characters written on the preprint.
  • the generation unit 15 may perform learning using randomly shaped figures as preprints.
  • when using randomly shaped figures as preprints, the generation unit 15 generates a recognition model using as learning data, for example, an image of characters written on a randomly shaped figure and an image of that same figure alone.
  • the generation unit 15 generates a recognition model by deep learning using, for example, DNN (Deep Neural Network).
  • Machine learning algorithms for generating recognition models are not limited to deep learning using DNN.
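As an illustration of such model generation, the following PyTorch sketch trains a small CNN on the six-channel combined input. The network shape, optimizer, and single-character digit labels are assumptions made for brevity; the patent only requires some DNN trained on pairs of combined images and the characters written on the preprint.

```python
# A minimal recognition-model training sketch, assuming 6-channel input and
# per-image digit labels. Layer sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

recognition_model = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=3, padding=1),  # 6 channels: written image + preprint image
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),  # e.g. ten digit classes for a single preprint frame
)
optimizer = torch.optim.Adam(recognition_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(combined: torch.Tensor, labels: torch.Tensor) -> float:
    """One learning step on a batch of (N, 6, H, W) combined images."""
    optimizer.zero_grad()
    loss = loss_fn(recognition_model(combined), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data.
loss = training_step(torch.randn(8, 6, 64, 64), torch.randint(0, 10, (8,)))
```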
  • the storage unit 16 stores, for example, a recognition model used by the recognition unit 13 to recognize characters in an image.
  • the storage unit 16 stores, for example, preprint images.
  • the storage unit 16 stores, for example, form data.
  • the form data includes, for example, image data of a form and definition data.
  • the form data may include a preprint image extracted in advance.
  • the storage unit 16 stores, for example, an image of characters written on a preprint, a preprint image, and characters written on the preprint as learning data.
  • the recognition model used by the recognition unit 13 may be stored in a storage means other than the storage unit 16.
  • the scanner 20, for example, optically reads a form and generates an image of the form.
  • the scanner 20 then outputs the image of the form to the character recognition system 10.
  • the scanner 20 may extract the image of the preprint portion from among the images of the form.
  • the scanner 20 outputs the extracted preprint image to the character recognition system 10.
  • the scanner 20 may generate an image of the form by photographing the form.
  • the information processing server 30 acquires, for example, the recognition results of characters written on the form from the character recognition system 10.
  • the information processing server 30 uses the recognition results to perform processing according to the purpose.
  • the information processing server 30 uses the recognition results, for example, in processing related to application and deposit/withdrawal related to account management at a financial institution.
  • the information processing server 30 may use the recognition results, for example, to process application documents in government offices, educational institutions, hospitals, or transportation facilities.
  • the information processing server 30 may use the recognition results for slip processing at a company. Further, the information processing server 30 may use the recognition results for managing goods in distribution. Examples of uses of the recognition results are not limited to the above.
  • FIG. 14 is a diagram showing an example of an operation flow when the character recognition system 10 recognizes characters written on a preprint.
  • the acquisition unit 11 acquires an image showing the characters written on the preprint (step S11).
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form showing characters written on the preprint.
  • the image extraction unit 12 extracts a preprint image corresponding to the image acquired by the acquisition unit 11 (step S12).
  • the image extraction unit 12 extracts, for example, a preprint image corresponding to the image acquired by the acquisition unit 11 from the data stored in the storage unit 16.
  • the recognition unit 13 uses the recognition model to recognize characters in the image from the image acquired by the acquisition unit 11 and the preprint image (step S13).
  • the recognition model recognizes the characters written on the preprint from the image of the characters written on the preprint and the preprint image.
  • when the characters in the image are recognized, the output unit 14 outputs the recognition results (step S14). The output unit 14 outputs the recognition result to the information processing server 30, for example.
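Tying steps S11 to S14 together, a minimal pipeline sketch might look as follows. The helper arguments (recognition_model as a callable, decode_result for turning model output into a character string) are hypothetical stand-ins, not interfaces defined by the patent.

```python
# End-to-end sketch of the recognition flow of FIG. 14 under assumed helpers.
import numpy as np

def recognize_form_field(form_image, blank_form, bbox, recognition_model, decode_result):
    written = np.asarray(form_image.crop(bbox))    # S11: acquire image with characters on the preprint
    preprint = np.asarray(blank_form.crop(bbox))   # S12: extract the corresponding preprint image
    combined = np.concatenate([written, preprint], axis=-1)  # S13: combine into 6-channel input
    return decode_result(recognition_model(combined))        # S14: output the recognition result
```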
  • FIG. 15 is a diagram showing an example of an operation flow when the character recognition system 10 generates a recognition model.
  • the acquisition unit 11 acquires, as learning data, an image of the characters written on the preprint, the preprint image, and the characters written on the preprint (step S21).
  • upon acquiring the learning data, the generation unit 15 learns the relationship between the image of the characters written on the preprint, the preprint image, and the characters written on the preprint, and generates a recognition model (step S22). For example, the generation unit 15 combines the image of the characters written on the preprint with the preprint image. Then, the generation unit 15 learns the relationship between the combined data and the characters written on the preprint, which are included as correct data in the learning data, and generates a recognition model.
  • after generating the recognition model, the generation unit 15 saves the generated recognition model (step S23).
  • the generation unit 15 stores the generated recognition model in the storage unit 16, for example.
  • the character recognition system 10 of the form processing system of this embodiment uses a recognition model to recognize characters written on a preprint from an image of the characters written on the preprint and a preprint image.
  • by further using the preprint image in addition to the written image showing the characters written on the preprint to be recognized, the character recognition system 10 can suppress the influence of the preprint on the recognition of the characters. As a result, the character recognition system 10 can improve the accuracy of recognizing characters written on preprints.
  • the recognition model used by the character recognition system 10 takes as input an image of the characters written on the preprint and the preprint image, and can recognize characters written on preprints of appearances on which learning has not been performed. Therefore, by inputting an image showing the characters written on the preprint and the preprint image, the character recognition system 10 can recognize characters written on preprints of various appearances. Furthermore, when generating a recognition model, the character recognition system 10 does not need to prepare learning data for each appearance of preprint actually used for recognition.
  • since the character recognition system 10 does not need to learn training data for each appearance of preprint actually used for recognition, the amount of learning required when generating a recognition model can be suppressed. Therefore, the computer resources necessary for generating a recognition model can be reduced, and the character recognition system 10 can generate a recognition model efficiently.
  • by performing learning using randomly shaped figures as preprints, the character recognition system 10 can generate a recognition model that can recognize characters written on various preprint images. That is, by using a recognition model generated with randomly shaped figures as preprints, the character recognition system 10 can accurately recognize the characters written on the preprint even if the shape of the preprint image differs for each form.
  • since the character recognition system 10 of this embodiment recognizes characters by inputting into the recognition model data obtained by combining an image of characters written on a preprint with the preprint image, there is no need to erase the preprint as preprocessing for recognition. Further, since the process of erasing the preprint is not performed, the influence that such erasing processing would have on character recognition can be suppressed. Therefore, the character recognition system 10 of this embodiment can improve recognition accuracy while suppressing the resources necessary for recognizing characters written on preprints.
  • FIG. 16 is a diagram showing an example of the configuration of the form processing system of this embodiment.
  • the form processing system includes, for example, a character recognition system 40, a scanner 20, and an information processing server 30.
  • the character recognition system 40 is connected to the scanner 20 via a network, for example. Further, the character recognition system 40 is connected to the information processing server 30 via a network.
  • the character recognition system 10 of the first embodiment, for example, inputs data that combines an image in which characters are written on a preprint and a preprint image into a recognition model, and recognizes the characters on the preprint. The character recognition system 10 then outputs the recognition result.
  • the character recognition system 40 of the present embodiment, when combining an image with characters written on a preprint and a preprint image, transforms the preprint image using a conversion model and then combines the images, in order to improve the accuracy with which the two images are overlapped.
  • the conversion model is a learning model that estimates conversion parameters used when performing conversion processing on a preprint image.
  • FIG. 17 is a diagram showing an example of the configuration of the character recognition system 40.
  • the character recognition system 40 includes an acquisition section 11 , an image extraction section 12 , a recognition section 41 , an output section 14 , a generation section 42 , and a storage section 16 .
  • the recognition unit 41 also includes a conversion unit 51 and an image recognition unit 52.
  • the configurations and functions of the acquisition unit 11, image extraction unit 12, output unit 14, and storage unit 16 of the character recognition system 40 are the same as those of the acquisition unit 11, image extraction unit 12, output unit 14, and storage unit 16 of the character recognition system 10 of the first embodiment, respectively.
  • the conversion unit 51 of the recognition unit 41 converts the preprint image using, for example, a conversion model.
  • the conversion model, for example, performs an affine transformation on the preprint image.
  • the recognition unit 41 converts the preprint image so that it overlaps the image it is combined with by, for example, rotating, resizing, and translating the preprint image.
  • the conversion model estimates, for example, the conversion parameters used when rotating, resizing, and translating the preprint image.
  • the conversion unit 51 uses the conversion model to estimate affine transformation parameters from data obtained by combining an image of characters written on a preprint and the preprint image according to preset conditions. Then, the conversion unit 51 performs an affine transformation on the preprint image using the estimated parameters. For example, as a preset condition, the conversion unit 51 overlaps the two images so that the outer peripheries of the preprint image and the image of the characters written on the preprint are aligned. The conversion unit 51 then uses the conversion model to estimate the conversion parameters from the data combined under the preset condition.
  • the conversion parameters are parameters for converting the preprint image so that the accuracy of the overlay is improved compared to combining under the preset conditions alone. After estimating the conversion parameters, the conversion unit 51 performs an affine transformation on the preprint image using them, thereby increasing the accuracy of the superposition.
  • the conversion model is, for example, a learning model that uses a DNN called STN (Spatial Transformer Networks).
  • the image transformation method using STN is described, for example, in Max Jaderberg et al., "Spatial Transformer Networks", NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2, December 2015, pp. 2017-2025.
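The following PyTorch sketch shows an STN-style conversion model in the spirit of the cited paper: a small localization network estimates the 2x3 affine parameters from the six-channel combined data, and the preprint image is then warped with those parameters. All layer sizes are assumptions made for illustration.

```python
# A minimal STN-style conversion model sketch (assumed architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConversionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 6),  # six affine transformation parameters
        )
        # Start from the identity transform so early training leaves the image unchanged.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(torch.tensor([1.0, 0, 0, 0, 1.0, 0]))

    def forward(self, combined: torch.Tensor, preprint: torch.Tensor) -> torch.Tensor:
        theta = self.localization(combined).view(-1, 2, 3)  # rotation, scaling, translation
        grid = F.affine_grid(theta, preprint.size(), align_corners=False)
        return F.grid_sample(preprint, grid, align_corners=False)  # transformed preprint image

# Usage: warp a preprint image using parameters estimated from the combined data.
model = ConversionModel()
warped = model(torch.randn(1, 6, 64, 64), torch.randn(1, 3, 64, 64))
```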
  • the image recognition unit 52 of the recognition unit 41 uses a recognition model to recognize the characters written on the preprint from the image of the characters written on the preprint and the preprint image.
  • the image recognition unit 52 combines the image of the characters written on the preprint with the preprint image on which the conversion unit 51 has performed the affine transformation. Then, the image recognition unit 52 uses the recognition model to recognize the characters written on the preprint from the combined data.
  • the conversion model and the recognition model may be learning models generated outside the character recognition system 40.
  • FIG. 18 is a diagram schematically showing the flow of processing when the recognition unit 41 recognizes characters written on a preprint.
  • the conversion unit 51 combines an image of characters written on a preprint and a preprint image according to, for example, preset conditions.
  • the preset conditions are, for example, set so that the outer peripheries of the two images are aligned.
  • the conversion unit 51 estimates affine transformation parameters using the transformation model. Then, the conversion unit 51 performs affine transformation on the preprint image using the estimated affine transformation parameters.
  • the converting unit 51 outputs the preprint image that has undergone affine transformation to the image recognizing unit 52.
  • the image recognition unit 52 combines the image in which characters are written on the preprint and the image that has been subjected to affine transformation.
  • the image recognition unit 52 uses the recognition model to recognize characters written on the preprint from the combined data.
  • the character recognition system 40 may generate only the recognition model out of the conversion model and the recognition model. In this case, a learning model generated outside the character recognition system 40, for example, is used as the conversion model.
  • when generating only the recognition model out of the conversion model and the recognition model, the generation unit 42, for example, combines the image containing the characters written on the preprint and the preprint image, which are included in the learning data, using the conversion model. Then, the generation unit 42 learns the relationship between the combined data and the characters written on the preprint, which are included as correct data in the learning data, and generates a recognition model.
  • the generation unit 42 stores the generated conversion model and recognition model in the storage unit 16.
  • the character recognition system 40 may generate both a conversion model and a recognition model.
  • when generating both models, the generation unit 42 uses the conversion model to estimate the conversion parameters from data obtained by combining an image of the characters written on the preprint and the preprint image according to preset conditions. Furthermore, the generation unit 42 uses the recognition model to recognize the characters written on the preprint from the combined data.
  • the generation unit 42 updates the parameters of the transformation model so that the difference between the affine transformation parameters estimated by the transformation model and the affine transformation parameters included in the learning data becomes smaller.
  • the generation unit 42 also updates the parameters of the recognition model so that the difference between the identification result and the correct data becomes smaller.
  • the generation unit 42 repeats the above process using the updated models. For example, the generation unit 42 generates the conversion model and the recognition model by repeating the above processing until the accuracy of the conversion parameters estimated by the conversion model and of the recognition results of the recognition model satisfies a preset standard. Further, the generation unit 42 generates the recognition model by, for example, updating the parameters of the recognition model so that the difference between the recognition result and the correct data becomes smaller. The generation unit 42 stores the generated conversion model and recognition model in the storage unit 16, for example.
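A rough sketch of this joint update follows, assuming the ConversionModel sketched earlier, a recognition model taking six-channel input, and an optimizer built over both models' parameters. The loss terms (MSE on the affine parameters, cross-entropy on the characters) are illustrative choices, not prescribed by the patent.

```python
# One joint training step for the conversion model and the recognition model.
import torch
import torch.nn as nn
import torch.nn.functional as F

param_loss_fn = nn.MSELoss()          # difference from the affine parameters in the learning data
char_loss_fn = nn.CrossEntropyLoss()  # difference between recognition result and correct data

def joint_training_step(conversion_model, recognition_model, optimizer,
                        combined, written, preprint, true_theta, true_chars) -> float:
    optimizer.zero_grad()
    theta = conversion_model.localization(combined).view(-1, 2, 3)  # estimated parameters
    grid = F.affine_grid(theta, preprint.size(), align_corners=False)
    warped = F.grid_sample(preprint, grid, align_corners=False)     # converted preprint image
    recombined = torch.cat([written, warped], dim=1)                # 6-channel recognition input
    loss = param_loss_fn(theta, true_theta) + char_loss_fn(recognition_model(recombined), true_chars)
    loss.backward()  # shrinks both the parameter error and the recognition error
    optimizer.step()
    return loss.item()
```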
  • FIG. 19 is a diagram showing an example of an operation flow when the character recognition system 40 recognizes characters written on a preprint.
  • the acquisition unit 11 acquires an image showing the characters written on the preprint (step S31).
  • the acquisition unit 11 acquires, for example, from the scanner 20 an image of a form showing characters written on the preprint.
  • the image extraction unit 12 extracts a preprint image corresponding to the image acquired by the acquisition unit 11 (step S32).
  • the image extraction unit 12 extracts, for example, a preprint image corresponding to the image acquired by the acquisition unit 11 from the data stored in the storage unit 16.
  • the conversion unit 51 of the recognition unit 41 uses the conversion model to estimate conversion parameters to be used when converting the preprint image. Then, the conversion unit 51 converts the preprint image using the estimated conversion parameters (step S33).
  • the image recognition unit 52 combines the image with the characters written on the preprint and the converted preprint image. Then, the image recognition unit 52 uses the recognition model to recognize characters in the image from the combined data (step S34).
  • when the characters in the image are recognized, the output unit 14 outputs the recognition results (step S35). The output unit 14 outputs the recognition result to the information processing server 30, for example.
  • FIG. 20 is a diagram showing an example of an operation flow when the character recognition system 40 generates only a recognition model.
  • the acquisition unit 11 acquires, as learning data, an image of the characters written on the preprint, the preprint image, and the characters written on the preprint (step S41).
  • the generation unit 42 uses the conversion model to estimate conversion parameters to be used when converting the preprint image. Then, the generation unit 42 converts the preprint image using the estimated conversion parameters and the conversion model (step S42).
  • after converting the preprint image, the generation unit 42 combines the image containing the characters written on the preprint with the converted preprint image. The generation unit 42 then learns the relationship between the combined data and the characters written on the preprint, and generates a recognition model (step S43).
  • after generating the recognition model, the generation unit 42 saves the generated recognition model (step S44).
  • the generation unit 42 stores the generated recognition model in the storage unit 16, for example.
  • FIG. 21 is a diagram showing an example of an operation flow when the character recognition system 40 generates a conversion model and a recognition model.
  • the acquisition unit 11 acquires, as learning data, data obtained by combining an image of the characters written on the preprint and the preprint image, conversion parameters, and the characters written on the preprint (step S51).
  • when the learning data is acquired, the generation unit 42 generates a conversion model by learning the relationship between the combined data included in the learning data, which is a combination of the image of the characters written on the preprint and the preprint image, and the conversion parameters included in the learning data. In addition, the generation unit 42 generates a recognition model by learning the relationship between the data obtained by combining the image of the characters written on the preprint and the preprint image, and the characters written on the preprint (step S52).
  • after generating the conversion model and the recognition model, the generation unit 42 saves the generated conversion model and recognition model (step S53).
  • the generation unit 42 stores the generated conversion model and recognition model in the storage unit 16, for example.
  • the character recognition system 40 of this embodiment uses a conversion model to combine an image of characters written on a preprint with the preprint image. Then, the character recognition system 40 uses the recognition model to recognize the characters written on the preprint from the combined data. By using the preprint image converted with the conversion model, the character recognition system 40 can improve the accuracy of superposition when combining the image of the characters written on the preprint with the preprint image. By using the data combined in this way, the character recognition system 40 can recognize the characters on the preprint with the recognition model while variations in the deviation between the image showing the characters written on the preprint and the preprint image are suppressed. By recognizing the characters under these conditions, the character recognition system 40 can improve the recognition accuracy of the characters written on the preprint.
  • by generating the conversion model using learning data, the character recognition system 40 can generate a conversion model that suppresses the misalignment between the image of the characters written on the preprint and the preprint image that may occur in actual use. Therefore, the character recognition system 40 can suppress variations in the deviation between the image in which characters are written on the preprint and the preprint image according to the actual usage situation, and can further improve the recognition accuracy of characters written on a preprint.
  • FIG. 22 shows an example of the configuration of a computer 200 that executes a computer program that performs each process in the character recognition system 10 of the first embodiment and the character recognition system 40 of the second embodiment.
  • the computer 200 includes a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, an input/output I/F (Interface) 204, and a communication I/F 205.
  • the CPU 201 reads computer programs for performing each process from the storage device 203 and executes them.
  • the CPU 201 may be configured by a combination of multiple CPUs. Further, the CPU 201 may be configured by a combination of a CPU and other types of processors. For example, the CPU 201 may be configured by a combination of a CPU and a GPU (Graphics Processing Unit).
  • the memory 202 is configured with a DRAM (Dynamic Random Access Memory) or the like, and temporarily stores computer programs executed by the CPU 201 and data being processed.
  • the storage device 203 stores computer programs executed by the CPU 201.
  • the storage device 203 is configured by, for example, a nonvolatile semiconductor storage device. Other storage devices such as a hard disk drive may be used as the storage device 203.
  • the input/output I/F 204 is an interface that receives input from a worker and outputs display data and the like.
  • the communication I/F 205 is an interface that transmits and receives data to and from the scanner 20 and the information processing server 30. The information processing server 30 may also have a similar configuration.
  • the computer program used to execute each process can also be stored and distributed in a computer-readable recording medium that records data non-transitorily.
  • a computer-readable recording medium for example, a magnetic tape for data recording or a magnetic disk such as a hard disk can be used.
  • an optical disc such as a CD-ROM (Compact Disc Read Only Memory) can also be used.
  • a nonvolatile semiconductor memory device may be used as the recording medium.
  • Reference signs: 10 character recognition system, 11 acquisition unit, 12 image extraction unit, 13 recognition unit, 14 output unit, 15 generation unit, 16 storage unit, 20 scanner, 30 information processing server, 40 character recognition system, 41 recognition unit, 42 generation unit, 51 conversion unit, 52 image recognition unit, 100 computer, 101 CPU, 102 memory, 103 storage device, 104 input/output I/F, 105 communication I/F

Abstract

This character recognition system comprises an acquisition unit, a recognition unit, and an output unit. The acquisition unit acquires an image depicting a character noted on a preprint of a ledger sheet including the preprint. The recognition unit uses the image depicting the character noted on the preprint, and a recognition model for recognizing a character noted on a preprint from a preprint image depicting the preprint, to recognize, from the acquired image and the preprint image, the character noted on the preprint of the acquired image. The output unit outputs the result of recognition.

Description

Character recognition system, character recognition method, and recording medium
 The present invention relates to a character recognition system and the like.
 OCR (Optical Character Recognition), which reads handwritten characters written on a form as an image using a scanner and converts them into text data by recognizing the characters in the image, is widely used. Character recognition by OCR is performed, for example, by using a learning model generated by machine learning to recognize characters written on a preprint of a form. However, even when the same characters are written, the shape of the characters written on the preprint of the form and the position where the characters are written on the preprint vary depending on the person writing them. Furthermore, in an image obtained by reading characters written on a preprint, the preprint and the characters coexist in the image. Therefore, a learning model that recognizes handwritten characters on a form may be required to accurately recognize characters that are written in various shapes and at various positions from an image in which the preprint and the characters are mixed. For this reason, it is desirable to have a technology that can accurately recognize characters on preprinted forms.
 The image processing system of Patent Document 1 uses a learning model to extract handwritten characters written within the frame of a preprint. The image processing system of Patent Document 1 extracts the handwritten characters from an image of handwritten characters written within the preprint frame by erasing the preprint frame through image processing.
JP 2021-39424 A
 With the information processing device of Patent Document 1, it may be difficult to accurately recognize characters written on a preprint.
 In order to solve the above problems, the main object of the present invention is to provide a character recognition system and the like that can improve the recognition accuracy of characters written on preprints.
 In order to solve the above problems, the character recognition system of the present invention includes an acquisition means for acquiring an image of characters written on a preprint of a form including the preprint; a recognition means for recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image; and an output means for outputting the recognition result.
 The character recognition method of the present invention acquires an image of characters written on a preprint of a form including the preprint, recognizes the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and outputs the recognition result.
 The recording medium of the present invention non-transitorily records a character recognition program that causes a computer to execute a process of acquiring an image of characters written on a preprint of a form including the preprint, a process of recognizing the characters written on the preprint of the acquired image from the acquired image and a preprint image of the preprint, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image, and a process of outputting the recognition result.
 According to the present invention, the recognition accuracy of characters written on preprints can be improved.
FIG. 1 is a diagram showing an example of the configuration of the first embodiment of the present invention. FIG. 2 is a diagram showing an example of a form in the first embodiment of the present invention. FIG. 3 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 4 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 5 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 6 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 7 is a diagram showing an example of the configuration of the character recognition system of the first embodiment of the present invention. FIG. 8 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 9 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 10 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 11 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 12 is a diagram showing an example of an image with characters written on a preprint in the first embodiment of the present invention. FIG. 13 is a diagram showing an example of a preprint image in the first embodiment of the present invention. FIG. 14 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention. FIG. 15 is a diagram showing an example of the operation flow of the character recognition system of the first embodiment of the present invention. FIG. 16 is a diagram showing an example of the configuration of the second embodiment of the present invention. FIG. 17 is a diagram showing an example of the configuration of the character recognition system of the second embodiment of the present invention. FIG. 18 is a diagram schematically showing the flow of data processing in the second embodiment of the present invention. FIG. 19 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention. FIG. 20 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention. FIG. 21 is a diagram showing an example of the operation flow of the character recognition system of the second embodiment of the present invention. FIG. 22 is a diagram showing an example of the configuration of another embodiment of the present invention.
(First embodiment)
A first embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a diagram showing an example of the configuration of the form processing system of this embodiment. The form processing system includes, as an example, a character recognition system 10, a scanner 20, and an information processing server 30. The character recognition system 10 connects to the scanner 20 via a network, for example, and likewise connects to the information processing server 30 via a network. There may be a plurality of scanners 20 and information processing servers 30; their numbers are not particularly limited.
The character recognition system 10 acquires, for example, an image of a form read by the scanner 20. A preprint for writing characters is printed on the paper of the form. A preprint is, for example, a frame or line on the form that indicates the position where characters are to be written. The character recognition system 10 acquires, for example, an image of handwritten characters written on a preprint. The characters written on the preprint may instead be printed, and are not limited to these examples.
The character recognition system 10 uses a recognition model to recognize the characters written on a preprint from an image of those characters acquired from the scanner 20 and from a preprint image showing the preprint itself. The recognition model is a learning model that recognizes the characters written on a preprint from an image of those characters and the preprint image. The character recognition system 10 outputs the recognition result of the characters written on the preprint to, for example, the information processing server 30. The information processing server 30 is a server that performs processing according to the intended use of the recognition result.
By using the preprint image in addition to the image showing the characters written on the preprint to be recognized, the character recognition system 10 can suppress the influence of the preprint on character recognition.
FIG. 2 is a diagram showing an example of a form. In the example of FIG. 2, the name of the form, "payment slip", is written at the top. The form in FIG. 2 is, for example, a document submitted to a financial institution when depositing money into an account. Entry fields for "account number" and "amount" are provided, and the frames in which numbers are entered in these fields are the preprint.
The characters written on a preprint are, for example, characters written within the frames of the preprint; they may also be written so as to overlap the frames. An image with characters written on a preprint is an image that includes both the preprint and the characters written on it, whereas a preprint image is an image showing only the preprint without any characters. In the example of FIG. 2, numbers are written on the preprint, but the characters written on a preprint are not limited to numbers and may include symbols.
FIG. 3 is a diagram showing an example of an image of characters written on a preprint. FIG. 3 is an image of the "account number" entry field extracted from the form in FIG. 2, and FIG. 4 is the preprint image of that entry field. In the example of FIG. 3, the characters "01778543" are handwritten on the preprint shown in FIG. 4.
An image of the preprint alone may include characters that are part of the preprint, for example characters indicating the digits of an amount, an item, or a unit. Characters belonging to the preprint are not limited to these, as long as they are printed on the paper as part of the preprint.
FIG. 5 is a diagram showing an example of an image of characters written on a preprint. FIG. 5 is an image of the "amount" entry field extracted from the form in FIG. 2, and FIG. 6 is the preprint image of that entry field. In the example of FIG. 6, "yen", indicating the unit of the amount, is printed as part of the preprint at the bottom of the rightmost frame. In the example of FIG. 5, the characters "40000" are handwritten on the preprint shown in FIG. 6.
A form is a document used in procedures at, for example, financial institutions, government offices, educational institutions, hospitals, transportation operators, or companies. A form may also be a document attached to an article under management; examples of forms are not limited to these. A preprint indicates, for example, the position on the form where a date, name, affiliation, address, telephone number, e-mail address, age, gender, occupation, or amount is to be entered. A preprint is composed of, for example, an item to be filled in and frames in which characters are written. When a plurality of characters are entered for one item, the preprint may be a series of consecutive frames, and preprints for a plurality of items may be printed on one form. When the preprint is printed on the paper as an entry field consisting of consecutive frames, the character recognition system 10, for example, outputs the recognized characters as character-string data following the order of the frames.
The configuration of the character recognition system 10 will now be described. FIG. 7 is a diagram showing an example of the configuration of the character recognition system 10. The character recognition system 10 includes an acquisition unit 11, a recognition unit 13, and an output unit 14 as its basic configuration, and further includes an image extraction unit 12, a generation unit 15, and a storage unit 16. The acquisition unit 11, the image extraction unit 12, the recognition unit 13, the output unit 14, and the storage unit 16 recognize, for example, the characters written on a preprint from an image of those characters. The acquisition unit 11, the generation unit 15, and the storage unit 16, for example, generate the recognition model.
The acquisition unit 11 acquires an image of the characters written on a preprint. For example, the acquisition unit 11 acquires from the scanner 20 an image of a form with characters written on the preprint. The acquisition unit 11 may instead acquire an image in which the portion with characters written on the preprint has already been extracted from the form; such images are, for example, those shown in FIGS. 3 and 5. When the preprint image is to be extracted from a form, the acquisition unit 11 may acquire an image of the form with no characters yet written on the preprint, for example from the scanner 20.
When the character recognition system 10 generates the recognition model, the acquisition unit 11 may acquire the learning data used for the generation. For example, the acquisition unit 11 acquires, as learning data, images of characters written on preprints and preprint images, associated with the characters actually written on the preprints. The learning data are input to the character recognition system 10, or to another terminal device connected to it, for example by an operator's operation.
The image extraction unit 12 extracts the preprint image corresponding to the image, acquired by the acquisition unit 11, of the characters written on the preprint. The image extraction unit 12 extracts the preprint image from, for example, form data stored in the storage unit 16. The form data include, for example, an image of the form and definition data. The definition data include, for example, the items to be written on the form and information on the position, on the form, of the preprint corresponding to each item. The position information of a preprint is, for example, information indicating the area of the form in which the preprint is printed. The items to be written are, for example, one or more of name, postal code, address, telephone number, age, personal identification number, account number, amount, and date, and are not limited to these examples.
The image extraction unit 12 identifies the position of the preprint on the form, for example, based on the preprint position information included in the definition data, and extracts the preprint image by cropping the image stored in the storage unit 16 at the identified position. The image extraction unit 12 may instead extract the preprint image from an image, acquired by the acquisition unit 11, of the form with no characters written on the preprint.
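For illustration only, this cropping step can be sketched as follows in Python. The definition-data layout (a list of entries holding an item name and a pixel bounding box) and the use of the Pillow library are assumptions made for this sketch; the embodiment does not prescribe a particular data format.

```python
from PIL import Image

# Hypothetical definition data: each entry names an item and the area
# of the form (left, top, right, bottom, in pixels) where its preprint
# is printed.
definition_data = [
    {"item": "account_number", "bbox": (120, 80, 520, 140)},
    {"item": "amount", "bbox": (120, 200, 520, 260)},
]

def extract_preprint_images(form_image_path: str) -> dict:
    """Crop each preprint area out of a stored blank-form image."""
    form_image = Image.open(form_image_path)
    preprints = {}
    for entry in definition_data:
        # Cut out the region of the form where the preprint is printed.
        preprints[entry["item"]] = form_image.crop(entry["bbox"])
    return preprints
```

In this sketch, cropping the stored form image at each position listed in the definition data yields one preprint image per item.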
The recognition unit 13 uses the recognition model to recognize the characters in the image from the image, acquired by the acquisition unit 11, of the characters written on the preprint and from the preprint image. The recognition model is a learning model that recognizes the characters written on a preprint from an image of those characters and the preprint image. The recognition unit 13, for example, inputs both images into the recognition model and thereby recognizes the characters written on the preprint. The recognition unit 13 may use a preprint image extracted in advance, or a preprint image generated in advance as an image of the preprint portion; for example, it uses a preprint image stored in the storage unit 16.
The recognition unit 13 extracts the image showing the characters written on the preprint by identifying the position of the preprint, for example, based on the preprint position information included in the definition data. The recognition unit 13 then uses the recognition model to recognize the characters written on the preprint from the extracted image and the preprint image extracted by the image extraction unit 12.
The recognition unit 13, for example, combines the image of the characters written on the preprint and the preprint image into one piece of data and inputs that data into the recognition model. Combining the two images means generating image data in which they are superimposed. When each of the two images has three RGB channels per pixel, the recognition unit 13, for example, combines their data into image data with six channels per pixel and inputs the combined six-channel data into the recognition model.
The recognition unit 13 combines the image of the characters written on the preprint and the preprint image, for example, based on preset conditions. For example, with the two images extracted at the same size, the recognition unit 13 superimposes them with their outer edges aligned, combining the image data of corresponding pixels. The recognition unit 13 then inputs the combined data into the recognition model and recognizes the characters written on the preprint.
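A minimal sketch of this channel-wise combination follows; the use of NumPy arrays and of identical shapes for the two pre-aligned images are assumptions of the example, not requirements stated by the embodiment.

```python
import numpy as np

def combine_images(characters_image: np.ndarray,
                   preprint_image: np.ndarray) -> np.ndarray:
    """Stack two aligned RGB images (H, W, 3) into one (H, W, 6) array.

    Both images are assumed to be extracted at the same size and
    superimposed with their outer edges aligned, so pixel (y, x) of one
    image corresponds to pixel (y, x) of the other.
    """
    assert characters_image.shape == preprint_image.shape
    # Concatenating along the channel axis yields 6 channels per pixel,
    # which is then fed to the recognition model as a single input.
    return np.concatenate([characters_image, preprint_image], axis=-1)
```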
The recognition unit 13 may also recognize characters other than those written on the preprint in the image of the form. For example, the recognition unit 13 may identify the type of the form from the form image acquired by the acquisition unit 11, and then recognize the characters written on the preprint by identifying the preprint position based on the definition data included in the form data corresponding to the identified form type. The recognition unit 13 identifies the form type, for example, by recognizing the form name or form number printed on the form; the relationship between the printed form name or form number and the form type is set in advance. The recognition model used by the recognition unit 13 may be a learning model generated outside the character recognition system 10.
FIG. 8 is a diagram showing an example of an image of characters written on a preprint. The preprint in FIG. 8 differs in aspect from that in the example of FIG. 3, for example in the thickness and type of its lines. FIG. 9 is the preprint image for the example of FIG. 8. In the example of FIG. 8, the characters "13758047" are handwritten on the preprint shown in FIG. 9. Given the images of FIGS. 8 and 9 as input, the recognition model outputs "13758047" as the recognition result. Even a recognition model generated using, for example, the preprint of FIG. 4 as learning data can recognize the characters written on the preprint of FIG. 9. That is, by taking as input both the image of the characters written on the preprint and the preprint image, the recognition model can recognize characters written on preprints it has not been trained on.
FIG. 10 is a diagram showing an example of an image in which the year in Western calendar notation is written on a preprint. In the example of FIG. 10, the characters "A.D." and "year" are printed in advance within the frames as part of the preprint, and the characters "2022" are handwritten on the preprint. FIG. 11 is the preprint image for the example of FIG. 10. Given the images of FIGS. 10 and 11 as input, the recognition model outputs "2022" as the recognition result.
FIG. 12 shows an example of an image in which, relative to the example of FIG. 10, the upper two digits "20" of the year are additionally printed in advance as part of the preprint. That is, in the example of FIG. 12, "A.D.", "20", and "year" are preprinted, and of "2022", only "22" is handwritten on the preprint. FIG. 13 is the preprint image for the example of FIG. 12. Given the images of FIGS. 12 and 13 as input, the recognition model outputs "22" as the recognition result. Even a learning model that has not used the preprints of FIGS. 11 and 13 as learning data can recognize the characters written on such preprints from the input images. In this way, by taking as input an image of the characters written on a preprint together with the preprint image, the recognition model can recognize characters written on preprints of various aspects. The above examples show preprints that differ in line thickness, line type, and preprinted characters, but the recognition model can likewise handle preprints that differ in frame shape and color, without having been trained on preprints of every aspect.
The output unit 14 outputs the recognition result obtained by the recognition unit 13, for example to the information processing server 30. The output unit 14 outputs, for example, the item corresponding to the preprint and the recognized characters in association with each other. When the recognition target is an account number as in the example of FIG. 3, the output unit 14 outputs, for example, information indicating that the result is an account number in association with the recognized character string. The output unit 14 may also output the recognition result to a display device, not shown, connected to the character recognition system 10.
When the character recognition system 10 generates the recognition model, the generation unit 15 performs the processing related to the generation. The generation unit 15 learns the relationship between images of characters written on preprints and preprint images on the one hand, and the characters written on the preprints on the other, and thereby generates a recognition model that recognizes the characters in an image from the image of the characters written on the preprint and the preprint image.
The generation unit 15 generates the recognition model, for example, by learning the relationship between data obtained by combining an image of the characters written on the preprint with the preprint image, and the characters written on the preprint. When each of the two images has three RGB channels per pixel, the generation unit 15 combines their data into image data with six channels per pixel, and generates the recognition model by learning the relationship between the combined six-channel image data and the characters written on the preprint.
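A hedged sketch of this learning step, in Python with PyTorch, is given below. The network architecture, the loss function, and the single-character output are assumptions chosen to keep the example short; the embodiment itself only specifies that the combined six-channel data is the input and that the characters written on the preprint are the learning target. A real model would output a character string (for example via a sequence decoder), not a single class.

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    """Minimal recognition-model sketch: a CNN taking the combined
    six-channel input (characters image + preprint image) and
    predicting a character class (assumed architecture)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

def train(model: nn.Module, loader, epochs: int = 10) -> None:
    """loader is assumed to yield (combined_6ch_image, label) pairs."""
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for combined, label in loader:
            optimizer.zero_grad()
            # Learn the relationship between the combined data and the
            # characters written on the preprint (the correct answers).
            loss = loss_fn(model(combined), label)
            loss.backward()
            optimizer.step()
```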
The images of characters written on preprints and the preprint images that the generation unit 15 uses as learning data need not show the preprints actually used in operation. When generating the recognition model, the generation unit 15 may perform learning using randomly shaped figures as preprints. In that case, the generation unit 15 generates the recognition model using, as learning data, for example an image of characters written on a randomly shaped figure together with an image of the same figure without the characters.
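One conceivable way to produce such learning data is to draw random frame-like figures and use each figure twice, once with characters composited onto it and once without. The following sketch, using Pillow, is an assumed illustration; the embodiment does not prescribe how the random figures are generated.

```python
import random
from PIL import Image, ImageDraw

def random_preprint(width: int = 256, height: int = 64) -> Image.Image:
    """Draw a randomly shaped row of frames as a synthetic preprint."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    n_frames = random.randint(4, 8)
    cell = width // n_frames
    line_width = random.randint(1, 4)
    for i in range(n_frames):
        # A row of frames with random line width; a fuller generator
        # could also vary line type, frame shape, and color.
        draw.rectangle([i * cell, 4, (i + 1) * cell - 2, height - 4],
                       outline="black", width=line_width)
    return img
```

A learning pair would then consist of the figure with characters drawn on it, the same figure without the characters, and the characters themselves as correct-answer data.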
The generation unit 15 generates the recognition model, for example, by deep learning using a DNN (Deep Neural Network). The machine learning algorithm for generating the recognition model is not limited to deep learning using a DNN.
The storage unit 16 stores, for example, the recognition model that the recognition unit 13 uses to recognize characters in images, as well as preprint images and form data. The form data include, for example, image data of forms and definition data, and may include preprint images extracted in advance. The storage unit 16 also stores, for example as learning data, images of characters written on preprints, preprint images, and the characters written on the preprints. The recognition model used by the recognition unit 13 may be stored in storage means other than the storage unit 16.
The scanner 20, for example, optically reads a form, generates an image of the form, and outputs the image to the character recognition system 10. The scanner 20 may extract the image of the preprint portion from the form image; in that case, it outputs the extracted preprint image to the character recognition system 10. When the form is a document attached to an article under management, the scanner 20 may generate the form image by photographing the form.
The information processing server 30 acquires, for example, the recognition result of the characters written on the form from the character recognition system 10 and performs processing according to the intended use. For example, the information processing server 30 uses the recognition result in processing related to account-management applications and deposits and withdrawals at a financial institution. The information processing server 30 may also use the recognition result to process application documents at government offices, educational institutions, hospitals, or transportation operators, for slip processing at companies, or for managing articles in distribution. The uses of the recognition result are not limited to the above.
The operation of the character recognition system 10 when recognizing characters written on a preprint will be described. FIG. 14 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires an image showing the characters written on the preprint (step S11). For example, the acquisition unit 11 acquires from the scanner 20 an image of a form showing the characters written on the preprint.
The image extraction unit 12 extracts the preprint image corresponding to the image acquired by the acquisition unit 11 (step S12), for example from the data stored in the storage unit 16.
When the preprint image has been extracted, the recognition unit 13 uses the recognition model to recognize the characters in the image from the image acquired by the acquisition unit 11 and the preprint image (step S13). The recognition model recognizes the characters written on the preprint from the image of those characters and the preprint image.
When the characters in the image have been recognized, the output unit 14 outputs the recognition result (step S14), for example to the information processing server 30.
The operation of the character recognition system 10 when generating the recognition model will be described. FIG. 15 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires, as learning data, images of the characters written on preprints, preprint images, and the characters written on the preprints (step S21).
When the learning data have been acquired, the generation unit 15 learns the relationship between the images of characters written on preprints and the preprint images on the one hand, and the characters written on the preprints on the other, and generates the recognition model (step S22). For example, the generation unit 15 combines an image of characters written on a preprint with the corresponding preprint image, and learns the relationship between the combined data and the characters written on the preprint, which are included in the learning data as correct-answer data.
When the recognition model has been generated, the generation unit 15 saves it (step S23), for example in the storage unit 16.
The character recognition system 10 of the form processing system of this embodiment uses the recognition model to recognize the characters written on a preprint from an image of those characters and the preprint image. By using the preprint image in addition to the image showing the characters to be recognized, the character recognition system 10 can suppress the influence of the preprint on character recognition and, as a result, improve the accuracy of recognizing characters written on preprints.
Furthermore, by taking as input both the image of the characters written on the preprint and the preprint image, the recognition model used by the character recognition system 10 can recognize characters written on preprints of aspects it has not been trained on. The character recognition system 10 can therefore recognize characters written on preprints of various aspects. In addition, when generating the recognition model, it is unnecessary to prepare and learn training data for every aspect of preprint actually used for recognition, which reduces the amount of learning and the computer resources required for generating the recognition model. The character recognition system 10 can thus generate the recognition model efficiently.
Moreover, by using randomly shaped figures as preprints when generating the recognition model, the character recognition system 10 can generate a recognition model capable of recognizing characters written on a variety of preprint images. That is, by using a recognition model generated with randomly shaped figures as preprints, the character recognition system 10 can accurately recognize characters written on preprints even when the shape of the preprint differs from form to form.
As a character recognition method different from this embodiment, one could, for example, erase the preprint from the image of the characters written on it and then perform character recognition. Erasing the preprint can require substantial computer resources, and part of the characters may be erased along with it. In contrast, the character recognition system 10 of this embodiment recognizes characters by inputting into the recognition model the data obtained by combining the image of the characters written on the preprint with the preprint image, and therefore needs no preprint-erasure preprocessing. Since no erasure is performed, the influence of erasure-related processing on character recognition is also avoided. The character recognition system 10 of this embodiment can thus improve recognition accuracy while reducing the resources required for recognizing characters written on preprints.
(Second embodiment)
A second embodiment of the present invention will be described in detail with reference to the drawings. FIG. 16 is a diagram showing an example of the configuration of the form processing system of this embodiment. The form processing system includes, as an example, a character recognition system 40, the scanner 20, and the information processing server 30. The character recognition system 40 connects to the scanner 20 via a network, for example, and likewise connects to the information processing server 30 via a network. There may be a plurality of scanners 20 and information processing servers 30; their numbers are not particularly limited. The functions of the scanner 20 and the information processing server 30 of this embodiment are the same as in the first embodiment.
The character recognition system 10 of the first embodiment uses the recognition model to recognize the characters on a preprint, taking as input data obtained by combining an image with characters written on the preprint and the preprint image, and outputs the recognition result. In addition to this configuration, when combining the two images, the character recognition system 40 of this embodiment applies a conversion process to the preprint image using a conversion model before combining, in order to improve the accuracy with which the two images are superimposed. The conversion model is a learning model that estimates the conversion parameters used when converting the preprint image.
The configuration of the character recognition system 40 will now be described. FIG. 17 is a diagram showing an example of the configuration of the character recognition system 40. The character recognition system 40 includes the acquisition unit 11, the image extraction unit 12, a recognition unit 41, the output unit 14, a generation unit 42, and the storage unit 16. The recognition unit 41 includes a conversion unit 51 and an image recognition unit 52. The configurations and functions of the acquisition unit 11, the image extraction unit 12, the output unit 14, and the storage unit 16 are the same as those of the corresponding units of the character recognition system 10 of the first embodiment.
The conversion unit 51 of the recognition unit 41 converts the preprint image, for example, using the conversion model. The conversion model performs, for example, an affine transformation on the preprint image: by rotating, scaling, and translating the preprint image, it is transformed so that it overlaps the image it is to be combined with. The conversion model estimates the conversion parameters used for this rotation, scaling, and translation.
The conversion unit 51 uses the conversion model, for example, to estimate affine transformation parameters from data obtained by combining the image of the characters written on the preprint with the preprint image under preset conditions, and then applies an affine transformation to the preprint image using the estimated parameters. As the preset conditions, the conversion unit 51 superimposes the two images, for example, by aligning their outer edges. The conversion model then estimates, from the data combined under these conditions, conversion parameters with which the preprint image can be transformed so that the superposition becomes more accurate than under the preset conditions alone. Having estimated the parameters, the conversion unit 51 applies the affine transformation to the preprint image so that the superposition accuracy becomes higher.
The conversion model is, for example, a learning model using a DNN known as STN (Spatial Transformer Networks). An image conversion method using STN is described, for example, in Max Jaderberg et al., "Spatial Transformer Networks", NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2, December 2015, p. 2017-2025.
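A minimal sketch of this STN-style step in Python with PyTorch is shown below. The localization-network architecture is an assumption made for the example, while the use of `affine_grid` and `grid_sample` follows the standard STN formulation cited above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreprintAligner(nn.Module):
    """Estimate affine parameters from the combined 6-channel input and
    apply them to the preprint image (an STN-style sketch)."""
    def __init__(self):
        super().__init__()
        # Localization network: maps the combined images to the six
        # parameters of a 2x3 affine matrix (assumed architecture).
        self.localization = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 6),
        )
        # Initialize the final layer to the identity transform.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, combined: torch.Tensor,
                preprint: torch.Tensor) -> torch.Tensor:
        theta = self.localization(combined).view(-1, 2, 3)
        grid = F.affine_grid(theta, preprint.size(), align_corners=False)
        # Resample the preprint image under the estimated transform.
        return F.grid_sample(preprint, grid, align_corners=False)
```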
The image recognition unit 52 of the recognition unit 41 uses the recognition model to recognize the characters written on the preprint from the image of those characters and the preprint image. The image recognition unit 52 combines the image of the characters written on the preprint with the preprint image to which the conversion unit 51 has applied the affine transformation, and uses the recognition model to recognize the characters written on the preprint from the combined data. The conversion model and the recognition model may be learning models generated outside the character recognition system 40.
FIG. 18 schematically shows the flow of processing when the recognition unit 41 recognizes characters written on a preprint. In the example of FIG. 18, an image of characters written on a preprint and a preprint image are input to the recognition unit 41. The conversion unit 51 first combines the two images under preset conditions, for example with their outer edges aligned. It then estimates affine transformation parameters using the conversion model, applies the affine transformation to the preprint image with the estimated parameters, and outputs the transformed preprint image to the image recognition unit 52. The image recognition unit 52 combines the image with characters written on the preprint and the affine-transformed preprint image, and uses the recognition model to recognize the characters written on the preprint from the combined data.
The character recognition system 40 may generate, for example, only the recognition model of the two models; in that case, a learning model generated outside the character recognition system 40 is used as the conversion model. When generating only the recognition model, the generation unit 42 combines, for example, the images of characters written on preprints and the preprint images included in the learning data, using the conversion model. The generation unit 42 then learns the relationship between the combined data and the characters written on the preprints, which are included in the learning data as correct-answer data, and generates the recognition model. The generation unit 42 saves the generated model in the storage unit 16.
The character recognition system 40 may also generate both the conversion model and the recognition model. In that case, the generation unit 42 uses the conversion model to estimate conversion parameters from data obtained by combining an image of characters written on a preprint with the preprint image under preset conditions, and uses the recognition model to recognize the characters written on the preprint from the combined data. The generation unit 42 updates the parameters of the conversion model so that the difference between the affine transformation parameters estimated by the conversion model and those included in the learning data becomes smaller, and updates the parameters of the recognition model so that the difference between the recognition result and the correct-answer data becomes smaller.
Having updated the parameters of the conversion model and the recognition model, the generation unit 42 repeats the above processing with the updated models, for example until the accuracy of the conversion parameters estimated by the conversion model and of the recognition results of the recognition model satisfies preset criteria, thereby generating the conversion model and the recognition model. The generation unit 42 saves the generated conversion model and recognition model, for example in the storage unit 16.
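A hedged sketch of this joint update, again in Python with PyTorch, is given below. The learning-data layout and the simple summed loss are assumptions of this example, and `PreprintAligner` and `RecognitionModel` refer to the illustrative classes sketched earlier in this description rather than to components defined by the embodiment.

```python
import torch
import torch.nn as nn

def train_jointly(aligner, recognizer, loader, epochs: int = 10) -> None:
    """loader is assumed to yield (combined, preprint, chars_img,
    theta_true, label): the pre-combined 6-channel input, the raw
    preprint image, the image with written characters, the ground-truth
    affine parameters, and the correct characters."""
    params = list(aligner.parameters()) + list(recognizer.parameters())
    optimizer = torch.optim.Adam(params)
    char_loss = nn.CrossEntropyLoss()
    theta_loss = nn.MSELoss()
    for _ in range(epochs):
        for combined, preprint, chars_img, theta_true, label in loader:
            optimizer.zero_grad()
            # Estimated affine parameters vs. those in the learning data.
            theta = aligner.localization(combined).view(-1, 2, 3)
            aligned = aligner(combined, preprint)
            # Recombine the character image with the aligned preprint.
            recombined = torch.cat([chars_img, aligned], dim=1)
            # Update both models: recognition error and parameter error.
            loss = (char_loss(recognizer(recombined), label)
                    + theta_loss(theta, theta_true))
            loss.backward()
            optimizer.step()
```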
The operation of the character recognition system 40 when recognizing characters written on a preprint will be described. FIG. 19 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires an image showing the characters written on the preprint (step S31). For example, the acquisition unit 11 acquires from the scanner 20 an image of a form showing the characters written on the preprint.
The image extraction unit 12 extracts the preprint image corresponding to the image acquired by the acquisition unit 11 (step S32), for example from the data stored in the storage unit 16.
When the preprint image has been acquired, the conversion unit 51 of the recognition unit 41 uses the conversion model to estimate the conversion parameters and converts the preprint image with the estimated parameters (step S33). When the preprint image has been converted, the image recognition unit 52 combines the image with characters written on the preprint and the converted preprint image, and uses the recognition model to recognize the characters in the image from the combined data (step S34).
When the characters in the image have been recognized, the output unit 14 outputs the recognition result (step S35), for example to the information processing server 30.
The operation of the character recognition system 40 when generating only the recognition model of the two models will be described. FIG. 20 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires, as learning data, images of the characters written on preprints, preprint images, and the characters written on the preprints (step S41).
When the learning data have been acquired, the generation unit 42 uses the conversion model to estimate the conversion parameters and converts the preprint images with the estimated parameters (step S42).
Having converted the preprint images, the generation unit 42 combines the images of the characters written on the preprints with the converted preprint images, learns the relationship between the combined data and the characters written on the preprints, and generates the recognition model (step S43).
When the recognition model has been generated, the generation unit 42 saves it (step S44), for example in the storage unit 16.
The operation of the character recognition system 40 when generating both the conversion model and the recognition model will be described. FIG. 21 is a diagram showing an example of the operation flow in this case.
The acquisition unit 11 acquires, as learning data, data obtained by combining images of the characters written on preprints with the preprint images, the conversion parameters, and the characters written on the preprints (step S51).
When the learning data have been acquired, the generation unit 42 generates the conversion model by learning the relationship between the combined data included in the learning data and the conversion parameters included in the learning data. The generation unit 42 also generates the recognition model by learning the relationship between the combined data and the characters written on the preprints (step S52).
When the conversion model and the recognition model have been generated, the generation unit 42 saves them (step S53), for example in the storage unit 16.
The character recognition system 40 of this embodiment uses the conversion model to combine an image of the characters written on a preprint with the preprint image, and then uses the recognition model to recognize the characters written on the preprint from the combined data. By using a preprint image converted with the conversion model, the system improves the accuracy of the superposition when the two images are combined. With data combined in this way, the recognition model can recognize the characters on the preprint while variation in the misalignment between the image of the characters written on the preprint and the preprint image is suppressed. Recognizing the characters under this suppressed variation improves the recognition accuracy of the characters written on the preprint.
Furthermore, when the conversion model is generated from learning data, the character recognition system 40 can produce a conversion model that suppresses the superposition misalignment, arising in actual use, between the image of the characters written on the preprint and the preprint image. The system can thus suppress variation in this misalignment according to the actual usage conditions, so generating the conversion model from learning data further improves the recognition accuracy of the characters written on the preprint.
Each process in the character recognition system 10 of the first embodiment and the character recognition system 40 of the second embodiment can be realized by executing a computer program on a computer. FIG. 22 shows an example of the configuration of a computer 200 that executes the computer programs performing each of these processes. The computer 200 includes a CPU (Central Processing Unit) 201, a memory 202, a storage device 203, an input/output I/F (interface) 204, and a communication I/F 205.
The CPU 201 reads the computer program for each process from the storage device 203 and executes it. The CPU 201 may consist of a combination of multiple CPUs, or of a CPU combined with another type of processor, for example a GPU (Graphics Processing Unit). The memory 202 consists of a DRAM (Dynamic Random Access Memory) or the like and temporarily stores the computer programs executed by the CPU 201 and the data being processed. The storage device 203 stores the computer programs executed by the CPU 201 and consists of, for example, a nonvolatile semiconductor storage device; another storage device such as a hard disk drive may also be used. The input/output I/F 204 is an interface that receives input from an operator and outputs display data and the like. The communication I/F 205 is an interface that transmits and receives data to and from the scanner 20 and the information processing server 30. The information processing server 30 may have a similar configuration.
The computer programs used to execute each process can also be stored and distributed on a computer-readable recording medium that records data non-transitorily. The recording medium may be, for example, a magnetic tape for data recording or a magnetic disk such as a hard disk. An optical disc such as a CD-ROM (Compact Disc Read Only Memory) or a nonvolatile semiconductor storage device may also be used.
The present invention has been described above taking the embodiments described above as examples. However, the present invention is not limited to these embodiments; various aspects that those skilled in the art can understand may be applied within the scope of the present invention.
10 Character recognition system
11 Acquisition unit
12 Image extraction unit
13 Recognition unit
14 Output unit
15 Generation unit
16 Storage unit
20 Scanner
30 Information processing server
40 Character recognition system
41 Recognition unit
42 Generation unit
51 Conversion unit
52 Image recognition unit
200 Computer
201 CPU
202 Memory
203 Storage device
204 Input/output I/F
205 Communication I/F

Claims (10)

  1.  A character recognition system comprising:
      acquisition means for acquiring an image of characters written on a preprint of a form including the preprint;
      recognition means for recognizing, from the acquired image and a preprint image of the preprint, the characters written on the preprint of the acquired image, using a recognition model that recognizes characters written on a preprint from an image of the characters written on the preprint and the preprint image; and
      output means for outputting a result of the recognition.
  2.  The character recognition system according to claim 1, wherein the recognition means recognizes the characters on the preprint of the acquired image from data in which the image of the characters written on the preprint and the preprint image are combined into one data item.
  3.  The character recognition system according to claim 2, further comprising conversion means for converting the preprint image using conversion parameters, wherein the recognition means recognizes the characters on the preprint of the acquired image from data in which the acquired image and the converted preprint image are combined.
  4.  The character recognition system according to claim 3, wherein the conversion means converts the preprint image using a conversion model that estimates the conversion parameters from data in which the image and the converted preprint image are combined.
  5.  The character recognition system according to any one of claims 1 to 4, wherein the recognition means identifies, from the image, the type of form whose characters written on the preprint are to be recognized, and recognizes the characters written on the preprint at the acquired position based on definition data corresponding to the identified type of form.
  6.  The character recognition system according to any one of claims 1 to 5, wherein the recognition means recognizes the characters written on the preprint based on definition data in which the position of the preprint on the form is defined.
  7.  The character recognition system according to any one of claims 1 to 5, further comprising generation means for learning the relationship between an image of the characters written on a preprint together with the preprint image and the characters written on the preprint, and generating a recognition model that recognizes, from the image of the characters written on the preprint and the preprint image, the characters written on the preprint of the image.
  8.  The character recognition system according to claim 7, wherein the generation means learns the relationship between data in which an image of the characters written on the preprint and the preprint image are combined into one data item and the conversion parameters, and generates a conversion model that estimates the conversion parameters used to convert the preprint image.
  9.  A character recognition method comprising:
      acquiring an image of characters written on a preprint of a form including the preprint;
      recognizing, from the acquired image and a preprint image of the preprint, the characters written on the preprint of the acquired image, using a recognition model that recognizes characters on a preprint from an image in which characters are written on the preprint and the preprint image; and
      outputting a result of the recognition.
  10.  A recording medium that non-transitorily records a character recognition program for causing a computer to execute:
      a process of acquiring an image of characters written on a preprint of a form including the preprint;
      a process of recognizing, from the acquired image and a preprint image of the preprint, the characters written on the preprint of the acquired image, using a recognition model that recognizes characters on a preprint from an image of the characters written on the preprint and the preprint image; and
      a process of outputting a result of the recognition.
PCT/JP2022/013389 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium WO2023181149A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/013389 WO2023181149A1 (en) 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/013389 WO2023181149A1 (en) 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium

Publications (1)

Publication Number Publication Date
WO2023181149A1 true WO2023181149A1 (en) 2023-09-28

Family

ID=88100226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/013389 WO2023181149A1 (en) 2022-03-23 2022-03-23 Character recognition system, character recognition method, and recording medium

Country Status (1)

Country Link
WO (1) WO2023181149A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05266247A (en) * 1992-03-19 1993-10-15 Toshiba Corp Picture data processing system
JP2007148846A (en) * 2005-11-29 2007-06-14 Nec Corp Ocr device, form out method, and form out program
JP2021043650A (en) * 2019-09-10 2021-03-18 キヤノン株式会社 Image processing device, image processing system, image processing method, and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05266247A (en) * 1992-03-19 1993-10-15 Toshiba Corp Picture data processing system
JP2007148846A (en) * 2005-11-29 2007-06-14 Nec Corp Ocr device, form out method, and form out program
JP2021043650A (en) * 2019-09-10 2021-03-18 キヤノン株式会社 Image processing device, image processing system, image processing method, and program

Similar Documents

Publication Publication Date Title
US20190279170A1 (en) Dynamic resource management associated with payment instrument exceptions processing
US9317745B2 (en) Data lifting for exception processing
US9342741B2 (en) Systems, methods and computer program products for determining document validity
CA2502811C (en) System and method for capture, storage and processing of receipts and related data
US9098765B2 (en) Systems and methods for capturing and storing image data from a negotiable instrument
US9824288B1 (en) Programmable overlay for negotiable instrument electronic image processing
US10229395B2 (en) Predictive determination and resolution of a value of indicia located in a negotiable instrument electronic image
US10528807B2 (en) System and method for processing and identifying content in form documents
US9031308B2 (en) Systems and methods for recreating an image using white space and check element capture
US20160379186A1 (en) Element level confidence scoring of elements of a payment instrument for exceptions processing
JP2015069256A (en) Character identification system
US10922537B2 (en) System and method for processing and identifying content in form documents
WO2023181149A1 (en) Character recognition system, character recognition method, and recording medium
JP2019191665A (en) Financial statements reading device, financial statements reading method and program
CN110135218A (en) The method, apparatus, equipment and computer storage medium of image for identification
US20150120548A1 (en) Data lifting for stop payment requests
KR101516684B1 (en) A service method for transforming document using optical character recognition
JP2008005219A (en) Document image processing system
US20150120517A1 (en) Data lifting for duplicate elimination
JP2014186659A (en) Image collation device, image collation method, and image collation program
CN101727572A (en) Method for ensuring image integrity by using file characteristics
US11055528B2 (en) Real-time image capture correction device
US10115081B2 (en) Monitoring module usage in a data processing system
WO2023042270A1 (en) Character recognition program, character recognition system, and character recognition method
US20230081511A1 (en) Systems and methods for improved payroll administration in a freelance workforce

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933295

Country of ref document: EP

Kind code of ref document: A1