WO2020203339A1 - Printed character string recognition device, program, and method - Google Patents

Printed character string recognition device, program, and method Download PDF

Info

Publication number
WO2020203339A1
WO2020203339A1 PCT/JP2020/012230 JP2020012230W WO2020203339A1 WO 2020203339 A1 WO2020203339 A1 WO 2020203339A1 JP 2020012230 W JP2020012230 W JP 2020012230W WO 2020203339 A1 WO2020203339 A1 WO 2020203339A1
Authority
WO
WIPO (PCT)
Prior art keywords
character string
image
unit
recognition
type
Prior art date
Application number
PCT/JP2020/012230
Other languages
French (fr)
Japanese (ja)
Inventor
武史 吉田
昂平 安田
諒介 佐々木
康介 木戸
亮介 田嶋
大田 佳宏
Original Assignee
Arithmer株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arithmer株式会社 filed Critical Arithmer株式会社
Priority to JP2020536819A priority Critical patent/JP6820578B1/en
Publication of WO2020203339A1 publication Critical patent/WO2020203339A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/192Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194References adjustable by an adaptive method, e.g. learning

Definitions

  • This disclosure relates to a printed character string recognition device, a program, and a method.
  • Patent Document 1 (Japanese Unexamined Patent Publication No. 2003-115028) discloses a form processing technique for identifying which of a plurality of registered form types a standard form corresponds to.
  • the printed character string recognition device of the first aspect includes an acquisition unit, an extraction unit, a division unit, and a recognition unit.
  • the acquisition unit acquires a document image in which a printed character string is entered in a reading area.
  • the extraction unit extracts the image of the reading area based on a reference document image.
  • the division unit divides the image of the printed character string entered in the reading area into images of rectangular areas of one character each, for each row or column.
  • the recognition unit recognizes the contents of the printed character string using a character string recognition model.
  • the character string recognition model infers, in response to the input of an image of a rectangular area, the content of the printed character written in that image.
  • the printed character string recognition device of the second aspect is the printed character string recognition device of the first aspect, wherein the extraction unit extracts a plurality of reading areas corresponding to a plurality of predetermined items. Further, when two or more predetermined items are associated with the same character string recognition model, the recognition unit combines the rectangular areas obtained from the two or more predetermined items and inputs the combined rectangular areas into that same character string recognition model.
  • in the document image, a printed character string is entered in association with each of the plurality of predetermined items.
  • the character string recognition model is constructed by a neural network, and a plurality of character string recognition models are prepared in association with the plurality of predetermined items. With such a configuration, the recognition speed of the printed character string can be increased.
  • the printed character string recognition device of the third aspect is the printed character string recognition device of the first or second aspect, wherein the recognition unit generates a candidate character string from the printed characters inferred using the character string recognition model.
  • the recognition unit compares the candidate character string with a reference character string stored in advance to calculate a similarity. When the similarity is equal to or greater than a predetermined value, the recognition unit recognizes the reference character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items whose candidate character strings can be anticipated.
  • the printed character string recognition device of the fourth aspect is the printed character string recognition device of the third aspect, wherein, when the similarity is smaller than the predetermined value, the recognition unit recognizes the candidate character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items in which free entry is permitted.
  • the printed character string recognition device of the fifth aspect is the printed character string recognition device of any of the first to fourth aspects, wherein the division unit binarizes the image of the reading area.
  • the division unit identifies a continuous region in which a plurality of printed characters are continuous along the row direction or the column direction of the binarized image. The division unit then scans the image of the continuous region in the column direction or the row direction and divides it into binarized rectangular-area images, one per character. With such a configuration, the rectangular areas of the printed characters can be divided out with high accuracy.
  • the printed character string recognition device of the sixth aspect is the printed character string recognition device of the fifth aspect, wherein the recognition unit recognizes the contents of the printed character string from the binarized rectangular-area images. With such a configuration, the recognition speed of the printed character string can be increased.
  • the printed character string recognition device of the seventh aspect is the printed character string recognition device of the fifth aspect, wherein the recognition unit recognizes the contents of the printed character string from the images of the rectangular areas before binarization that correspond to the binarized rectangular areas. With such a configuration, the recognition accuracy of the printed character string can be improved.
  • the printed character string recognition device of the eighth aspect is a printed character string recognition device of any of the first to fourth aspects, wherein the division unit binarizes the image of the reading area.
  • the division unit identifies a continuous region in which a plurality of printed characters are continuous along the row direction or the column direction of the binarized image. The division unit then divides the binarized image corresponding to the continuous region into rectangular-area images based on a unit length corresponding to one character along the row direction or the column direction. With such a configuration, the rectangular areas of the printed characters can be divided out with high accuracy.
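Because every printed character is assumed to have the same character width, the unit-length division of the eighth aspect can be sketched simply by cutting a continuous region into equal cells. This is an illustrative sketch under that assumption, not the patented implementation; the function name and the `unit_width` parameter are invented here.

```python
import numpy as np

def split_by_unit_length(region: np.ndarray, unit_width: int) -> list:
    """Divide a binarized continuous region into one-character cells of a
    fixed pitch along the row direction. `unit_width` is the assumed
    number of pixels per printed character."""
    height, width = region.shape
    n_chars = max(1, round(width / unit_width))
    # Distribute the width evenly so rounding error is not pushed onto
    # the last character cell.
    bounds = [i * width // n_chars for i in range(n_chars + 1)]
    return [region[:, bounds[i]:bounds[i + 1]] for i in range(n_chars)]
```

For a 30-pixel-wide region and a 10-pixel unit length, this yields three 10-pixel character cells.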
  • the printed character string recognition device of the ninth aspect is a printed character string recognition device of any of the first to eighth aspects, and displays the recognition result of the printed character string by the recognition unit while changing the display form according to the character type. With such a configuration, the correctness of the character type read can be judged easily.
  • the printed character string recognition program of the tenth aspect causes a computer to function as an acquisition unit, an extraction unit, a division unit, and a recognition unit.
  • the acquisition unit acquires a document image in which a print character string is written in the reading area.
  • the extraction unit extracts the image of the reading area based on the reference document image.
  • the dividing unit divides the image of the printed character string written in the reading area into an image of a rectangular area in units of one character.
  • the recognition unit recognizes the contents of the printed character string by using the character string recognition model.
  • the character string recognition model infers the content of the printed characters written in the image of the rectangular area in response to the input of the image of the rectangular area.
  • the printed character string recognition method of the eleventh aspect is a method of recognizing the contents of a printed character string using a computer.
  • a document image in which a print character string is written in a reading area is acquired.
  • the image in the reading area is extracted based on the reference document image.
  • the image of the printed character string written in the reading area is divided into an image of a rectangular area for each character.
  • the content of the printed character string is recognized by using the character string recognition model.
  • the character string recognition model infers the content of the printed characters written in the image of the rectangular area in response to the input of the image of the rectangular area.
  • FIG. 1 is a schematic diagram showing a configuration of a print character string recognition device 20 according to the present embodiment.
  • the print character string recognition device 20 recognizes the contents of the print character string ML described in the form image Gs by using the character string recognition model 21C described later.
  • the contents to be entered are displayed in print in the predetermined item column of the form image Gs according to the present embodiment.
  • the form image Gs means an image on which an arbitrary form is copied.
  • each printed character constituting the printed character string referred to here has the same character width.
  • in the present embodiment, the printed character string recognition device 20 is described as recognizing the contents of the printed character string entered in the form image Gs, but the recognition target of the printed character string recognition device 20 is not limited to form images.
  • the printed character string recognition device can recognize the contents of any "document image" onto which an arbitrary document containing an entered character string is copied.
  • item K1 corresponds to the zip code
  • item K2 corresponds to the address
  • item K3 corresponds to the name
  • item K4 corresponds to the registration number
  • item K5 corresponds to the article used
  • item K6 corresponds to the purpose of use.
  • the contents of these items are merely examples and are not limited thereto.
  • the printed character string recognition device 20 can be realized by an arbitrary computer, and includes a storage unit 21, an input unit 22, an output unit 23, and a processing unit 24.
  • the storage unit 21 stores various types of information, and is realized by an arbitrary storage device such as a memory and a hard disk.
  • the storage unit 21 stores information such as the weight of the neural network that constructs the character string recognition model 21C.
  • the character string recognition model 21C infers the content of the printed characters entered in the image of the rectangular area in response to the input of the image of the rectangular area of each character.
  • the character string recognition model 21C is constructed by a convolutional neural network (CNN) or the like whose weights are adjusted based on the teacher image on which the printed characters are copied.
  • the input unit 22 is realized by an arbitrary input device such as a keyboard, a mouse, and a touch panel, and inputs various information to the computer.
  • the output unit 23 is realized by an arbitrary output device such as a display, a touch panel, and a speaker, and outputs various information from a computer.
  • the processing unit 24 executes various types of information processing, and is realized by a processor such as a CPU or GPU together with memory.
  • the processing unit 24 functions as an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D.
  • the acquisition unit 24A acquires the form image Gs in which the print character string ML is entered.
  • the acquisition unit 24A acquires the image of the form to be read as the form image Gs via an arbitrary imaging device.
  • the extraction unit 24B matches the form image Gs with the reference form image Gc and extracts the reading area R.
  • the matching referred to here is not only to directly compare the images to obtain the difference, but also to obtain the reading area R from the form image Gs based on the coordinate information corresponding to the predetermined item K acquired from the reference form image Gc. It means the process of extracting.
  • in the reference form image Gc, the column corresponding to each predetermined item K is blank. Therefore, in the examples shown in FIGS. 2 and 3, the contents corresponding to the predetermined items K1 to K6 in the form image Gs are extracted as the reading areas R1 to R6, respectively, based on the form image Gs and the reference form image Gc.
  • FIG. 4(a) is the image of the reading area R1 corresponding to item K1, FIG. 4(b) is the image of the reading area R2 corresponding to item K2, FIG. 4(c) is the image of the reading area R3 corresponding to item K3, FIG. 4(d) is the image of the reading area R4 corresponding to item K4, FIG. 4(e) is the image of the reading area R5 corresponding to item K5, and FIG. 4(f) is the image of the reading area R6 corresponding to item K6.
  • the reference form image Gc and the coordinate information corresponding to the predetermined item K in the reference form image Gc are stored in advance in the storage unit 21.
  • the division unit 24C divides the printed character string ML entered in the reading area R into rectangular areas T of one character each. Details will be described later.
  • the recognition unit 24D recognizes the contents of the print character string ML based on the image of the rectangular area T.
  • the recognition unit 24D recognizes the contents of the print character string ML by using the character string recognition model 21C.
  • FIG. 5 is a flowchart for explaining the operation of the print character string recognition device 20 according to the present embodiment.
  • the form image Gs to be read is imaged via an arbitrary imaging device.
  • these form images Gs are timely stored in the storage unit 21 of the print character string recognition device 20 (A1).
  • the printed character string recognition device 20 extracts the image of the reading area R based on the reference form image Gc (A2). Subsequently, it divides the image of the printed character string ML entered in the reading area R into images of rectangular areas T of one character each, for each line (A3). At this time, by the function of the division unit 24C, the device identifies continuous areas S in which a plurality of printed characters are continuous along the line direction in the reading area R. Taking item K2 (address) as an example, as shown in FIG. 6, the division unit 24C binarizes the image of the reading area R2 of item K2 (FIGS. 6(a) and 6(b)).
  • the division unit 24C then expands the image of the binarized reading area R2 in the line direction to fill in the character portions (FIG. 6(c)).
  • as a result, adjacent characters are joined across the ordinary character spacing, and each run of characters up to a blank between words is identified as one of the continuous areas S1 to S4.
  • in this example, the images in which "Aichi Prefecture", "Nagoya City", "Naka Ward", and "X-chome Y-address z" are written are identified as the individual continuous areas S1 to S4 (FIG. 6(d)).
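The binarize-and-expand step described above can be sketched in a simplified one-dimensional form. This is an illustrative approximation, not the patented implementation: instead of a true morphological dilation, it merges runs of ink columns separated by fewer than an assumed `max_gap` blank columns, so that only word-sized blanks delimit the continuous areas.

```python
import numpy as np

def find_continuous_regions(binary: np.ndarray, max_gap: int) -> list:
    """Return (start, end) column spans of continuous areas in a
    binarized line image. Gaps of at most `max_gap` columns (ordinary
    character spacing) are closed; wider blanks split the line."""
    ink_cols = binary.any(axis=0)              # True where a column has ink
    runs, start = [], None
    for x, ink in enumerate(ink_cols):
        if ink and start is None:
            start = x                          # a run of ink begins
        elif not ink and start is not None:
            runs.append([start, x])            # a run of ink ends
            start = None
    if start is not None:
        runs.append([start, binary.shape[1]])
    merged = []
    for run in runs:                           # close small inter-character gaps
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)
    return [(s, e) for s, e in merged]
```

With two characters one column apart and the next word six columns away, the first two merge into one continuous area while the word boundary survives.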
  • next, by the function of the division unit 24C, the printed character string recognition device 20 scans each of the continuous areas S1 to S4 in the column direction and divides it character by character. Specifically, each continuous area is scanned along a line L having a width equal to or less than the character spacing, and the luminance value in the column direction is calculated. An area where the calculated luminance value is equal to or less than a predetermined value is regarded as a space corresponding to the character spacing, and the image of the portion up to the next space is extracted as a rectangular area T for each character.
  • FIG. 7(a) is a diagram showing the concept of the line L scanning the continuous area S1, and FIG. 7(b) is a diagram showing the luminance values in the continuous area S1.
  • in FIG. 7(b), the luminance value in the column direction when the continuous area S1 is scanned by the line L is plotted against the pixels in the horizontal direction of the image of the continuous area S1.
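The column-direction scan of line L can be sketched as follows. The ink-pixel count per column stands in for the (inverted) luminance profile, and `space_thresh` plays the role of the predetermined value; both names are assumptions for illustration.

```python
import numpy as np

def split_characters(region: np.ndarray, space_thresh: int = 0) -> list:
    """Cut a continuous region into per-character rectangles wherever the
    column-direction ink count drops to `space_thresh` or below,
    mimicking the scan along line L in FIG. 7."""
    profile = region.sum(axis=0)               # ink pixels per column
    boxes, start = [], None
    for x, v in enumerate(profile):
        if v > space_thresh and start is None:
            start = x                          # a character begins
        elif v <= space_thresh and start is not None:
            boxes.append(region[:, start:x])   # a character ends at a space
            start = None
    if start is not None:
        boxes.append(region[:, start:])
    return boxes
```

Each returned box is then a one-character rectangular area T ready for the recognition model.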
  • the print character string recognition device 20 recognizes the contents of the print character string ML based on the image of the rectangular area T divided into character units (A4). Specifically, the print character string recognition device 20 recognizes the content of the print character string ML by inferring the content of the print character using the character string recognition model 21C constructed by the neural network.
  • the term "inference” as used herein means a recognition result of a printed character calculated by a neural network whose weight is adjusted based on a general-purpose character image. For example, as shown in FIG. 8, when the image of the rectangular region T1 corresponding to the part "love” is input to the character string recognition model 21C, “love”, “melancholy”, “ ⁇ ”, “receive”, etc. are “love”.
  • a type character similar in shape to "" is inferred with accuracy p.
  • the accuracy of "love” is 99.58309
  • the accuracy of "melancholy” is 0.0040124
  • the accuracy of " ⁇ ” is 1.26771865 ⁇ (10-5th power)
  • the accuracy of "receive” is It is inferred to be 4.238405 ⁇ (10 to the -9th power). Therefore, since the probability of "love” is the highest, the content of the image in the rectangular region T1 is recognized as “love”.
  • the print character string ML described in the predetermined item K of the form image Gs is recognized.
  • in the above description, the blanks between words delimit the continuous areas, but the printed character string recognition device 20 according to the present embodiment may instead identify a continuous region without regard to blanks between words, as long as the characters are on the same line.
  • for example, one continuous region S in which the continuous regions S1 to S3 of the same line are connected may be identified, ignoring the blanks between words.
  • in that case, the whole of "Naka Ward, Nagoya City, Aichi Prefecture" is identified as one continuous area. Even when the continuous area is identified line by line in this way, the rectangular areas T divided into character units can still be obtained.
  • the print character string recognition device 20 includes an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D.
  • the acquisition unit 24A acquires the form image Gs in which the print character string ML is written in the reading area R.
  • the extraction unit 24B extracts the image of the reading area R based on the reference form image Gc.
  • the dividing unit 24C divides the image of the printed character string ML written in the reading area R into an image of the rectangular area T in units of one character for each row or column.
  • the recognition unit 24D recognizes the contents of the print character string ML based on the image of the rectangular area T. In this way, since the printed character string ML written in the form can be extracted as an image in units of one character, the characters contained in a large number of form images can be read at high speed and with high accuracy.
  • the recognition unit 24D recognizes the contents of the print character string ML by using the character string recognition model 21C.
  • the character string recognition model 21C is constructed by a neural network, and infers the content of the printed characters written in the image of the rectangular area T in response to the input of the image of the rectangular area T. Therefore, by using such a neural network, the recognition accuracy of the printed character string ML can be improved.
  • the division unit 24C performs binarization processing of the image in the reading area R. Further, the dividing unit 24C specifies a continuous region S in which a plurality of printed characters are continuous along the line direction of the binarized image. Then, the division unit 24C scans the image of the continuous region S in the column direction and divides the image into the image of the rectangular region T binarized in character units. With such a configuration, the image of the continuous region S can be divided into the image of the rectangular region T of the printed characters with high accuracy. Then, by using the character string recognition model 21C constructed by the neural network optimized for the image of each character, the printed characters can be recognized with high accuracy.
  • the recognition unit 24D recognizes the contents of the print character string ML from the image of the binarized rectangular area T, the recognition speed of the print character string ML can be increased.
  • noise is removed from the image of the binarized rectangular region T, it is possible to reduce the calculation load executed when recognizing the print character string ML. As a result, the recognition speed of the print character string ML can be increased.
  • in the above description, the division unit 24C identifies continuous areas S in which a plurality of printed characters are continuous along the row direction, but it may instead identify continuous areas in which a plurality of printed characters are continuous along the column direction. In that case, the division unit 24C scans the image of the continuous region S in the row direction to divide it into binarized rectangular-area images in character units.
  • the recognition unit 24D recognizes the content of the print character string ML from the image of the binarized rectangular region T, but the recognition unit 24D according to the present embodiment is limited to this. is not.
  • the recognition unit 24D according to the present embodiment may instead use the binarized rectangular area T to locate and extract the corresponding rectangular-area image in the form image Gs before binarization, and recognize the contents of the printed character string ML from that image. With such a configuration, the recognition accuracy of the printed character string ML can be improved.
  • this is because the binarized image may differ in size from the area in which the printed characters appear in the form image Gs before binarization. Therefore, the recognition accuracy of the printed characters can be improved by using the binarized image only until the rectangular area of each character is obtained, and using the image before binarization when inferring the printed character.
  • the position of the rectangular region T divided after the binarization process is projected onto the form image Gs before the binarization process to extract a rectangular image, and this image is input to the character string recognition model 21C.
  • the character string recognition model 21C in this case is not a binarized character image, but an image in which characters are copied in grayscale or color display is learned as a teacher image.
  • the processing unit 24 described above may execute a process of displaying the recognition result of the printed character string by the recognition unit 24D while changing the display form according to the character type.
  • when the contents of a printed character string are inferred using the character string recognition model 21C, characters with similar outer shapes, such as the kanji "工" and the katakana "エ", or the alphabet letter "O" and the number "0", may be misidentified. Therefore, by displaying the recognition result in different colors according to whether a character is a number, a letter, or a symbol, the correctness of the identification result of the character string recognition model 21C can be judged at a glance.
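One way to realize the per-character-type display is to classify each recognized character and emit a colored span, as in this sketch. The class set, the Unicode-range test, and the color assignments are illustrative choices, not part of the disclosure.

```python
def char_class(ch: str) -> str:
    """Classify a recognized character so its display color can signal
    the character type (illustrative class set)."""
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if "\u30a0" <= ch <= "\u30ff":     # Unicode katakana block
        return "katakana"
    return "other"

COLORS = {"digit": "red", "latin": "blue", "katakana": "green", "other": "black"}

def to_colored_html(text: str) -> str:
    """Render the recognition result with one colored <span> per character."""
    return "".join(
        f'<span style="color:{COLORS[char_class(c)]}">{c}</span>' for c in text
    )
```

A mis-read "O" would then show in blue where a red digit "0" was expected, making the error visible at a glance.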
  • in the above description, the form image Gs to be read is stored in the storage unit 21, but the printed character string recognition device 20 according to the present embodiment is not limited to this.
  • the form image Gs may be stored in the external storage device 121.
  • the security level of the external storage device 121 can be raised, and the risk of the form image Gs being leaked can be reduced.
  • the print character string recognition device 20 may be connected to the external storage device 121 via a network. Further, in this case, the acquisition unit 24A of the print character string recognition device 20 may access the external storage device 121 only when reading the form image Gs and read the form image Gs.
  • in the above description, the printed character string recognition device 20 is configured to include the input unit 22 and the output unit 23, but the printed character string recognition device 20 according to the present embodiment is not limited to this.
  • the printed character string recognition device according to the present embodiment does not necessarily have to include the input unit 22 and the output unit 23.
  • the print character string recognition device 20 according to the present embodiment may be configured as a system in which a plurality of devices are connected via a network.
  • a terminal device 100 having an input unit 122 and an output unit 123, which has the same functions as the input unit 22 and the output unit 23, and a print character string recognition device 20 are connected via a network. It may be a printed character string recognition system.
  • the print character string recognition device 20 may be connected to a plurality of terminal devices 100.
  • when configured in this way, the printed character string recognition device in the above description may be referred to as a printed character string recognition system.
  • FIG. 11 is a schematic view showing the configuration of the print character string recognition device 20S according to the second embodiment.
  • the print character string recognition device 20S according to the second embodiment performs processing using the reference character string MS.
  • the configurations already described will be designated by substantially the same reference numerals, and duplicate description will be omitted.
  • the configuration peculiar to the present embodiment will be described with the subscript "S”.
  • the storage unit 21S stores the reference character string database 21D.
  • the reference character string database 21D stores in advance the reference character string MS associated with each of the predetermined items K1 to K6 in the form image Gs. For example, in the reference character string database 21D, as shown in FIG. 12, when the predetermined item K5 of the form image Gs is "used article", “private ordinary passenger car” “private small passenger car” “private light four-wheeled passenger car” " Each of the "private light four-wheeled freight car” and “two-wheeled vehicle” is stored as the reference character strings MS5a to MS5e.
  • the recognition unit 24DS calculates the degree of similarity by comparing a candidate character string MK, described later, with the reference character strings MS stored in advance, and recognizes the reference character string MS with a high degree of similarity as the content of the printed character string ML.
  • when the similarity is smaller than the predetermined value, the recognition unit 24DS instead recognizes the candidate character string MK as the content of the printed character string ML.
  • FIG. 13 is a flowchart for explaining the operation of the print character string recognition device 20S according to the second embodiment.
  • in the present embodiment, steps B4 to B8 are executed instead of step A4 described above. Therefore, in the printed character string recognition device 20S according to the present embodiment as well, first, the same processing as steps A1 to A3 described above is executed (B1 to B3).
  • the print character string recognition device 20S generates a candidate character string MK from the print characters inferred using the character string recognition model 21C (B4).
  • the printed character string recognition device 20S compares the candidate character string MK with the reference character strings MS stored in advance to calculate the similarity (B5).
  • when the maximum similarity is equal to or greater than the predetermined value, the printed character string recognition device 20S recognizes the reference character string MS having the maximum similarity as the content of the printed character string ML (B6-Yes, B7).
  • otherwise, the printed character string recognition device 20S recognizes the candidate character string MK as the content of the printed character string (B6-No, B8).
  • the recognition unit 24DS compares the reference character string MS5 with the candidate character string MK5 and calculates the degree of similarity between the two.
  • “private passenger car”, “small private car”, “private light four-wheeled passenger car”, “private light four-wheeled freight car”, and "two-wheeled vehicle” are stored as reference character strings MS5a to MS5e.
  • the print character string recognition device 20S can further improve the recognition accuracy of the print character string ML by using the reference character string database 21D.
  • the characters included in a large number of form images can be read at high speed and with high accuracy.
  • the print character string recognition device 20S recognizes the candidate character string MK as the content of the print character string ML when the similarity is equal to or less than a predetermined value.
  • the characters included in the form image Gs can be read with high accuracy even when the reference character string MS cannot be registered in advance.
  • a free entry field may be provided in addition to the preset items.
  • the print character string recognition device 20S according to the present embodiment when the similarity is equal to or less than a predetermined value, it is considered that the contents other than the preset items are entered, and this is read with high accuracy.
  • FIG. 16 is a schematic view showing the configuration of the print character string recognition device 20T according to the third embodiment.
  • the configurations already described will be designated by substantially the same reference numerals, and duplicate description will be omitted.
  • the configuration peculiar to the present embodiment will be described with the subscript "T”.
  • the storage unit 21T stores a plurality of character string recognition models 21C1 to 21Cn (n is 2 or more and less than the number of items K) in association with a plurality of predetermined items K1 to K6.
  • in the present embodiment, four character string recognition models 21C1 to 21C4 (n = 4) are stored in the storage unit 21T.
  • when two or more predetermined items are associated with the same character string recognition model 21Ci (i is any value from 1 to 4), the recognition unit 24DT combines the images of the rectangular areas obtained from those items and inputs the combined images of the rectangular areas into that same character string recognition model 21Ci.
  • among the predetermined items K of the form image Gs to be read, there are items in which only numbers are entered, items in which only kana characters are entered, items in which arbitrary characters are entered, and so on.
  • arbitrary characters are entered in the above-mentioned item K2 (address) and item K3 (name). Therefore, the item K2 (address) and the item K3 (name) are associated with the character string recognition model 21C2 capable of identifying any character.
  • the recognition unit 24DT combines the images of the rectangular areas obtained from item K2 (address) and item K3 (name), and inputs the combined images of the rectangular areas into the character string recognition model 21C2 at once.
  • with the above configuration, the print character string recognition device 20T infers the print character strings ML of two or more predetermined items at once with the same character string recognition model 21Ci, so the recognition speed of the print character string ML can be increased.
  • in terms of recognition accuracy, there is no difference between inputting the image of the rectangular area corresponding to item K2 (address) and the image of the rectangular area corresponding to item K3 (name) into the character string recognition model 21C2 separately and inputting them combined at once. However, since the character string recognition model 21C2 is realized by a neural network, the calculation is faster in the latter case. Therefore, collectively processing items that share the same character type, as in the print character string recognition device 20T according to the third embodiment, improves the reading speed of the form image Gs.
  • further, since each character string recognition model 21Ci is constructed in association with the character type used for its items, the recognition accuracy and recognition speed of the print character string ML can be improved. For example, only numbers are entered in the above-mentioned item K1 (zip code). Therefore, for item K1 (zip code) it is sufficient to use the character string recognition model 21C1, which identifies only numbers, and both recognition accuracy and recognition speed can be made higher than when using the character string recognition model 21C2, which identifies arbitrary characters.
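The grouping described above, in which items sharing a character string recognition model (such as K2 and K3 both using model 21C2) have their rectangular-area images combined into one batch, can be sketched as follows. The item-to-model mapping and image sizes are illustrative stand-ins taken from the examples in the text, and the neural network itself is not modeled — only the batching that lets one forward pass cover all items associated with a model.

```python
import numpy as np

# Item-to-model mapping mirroring the text: K1 (zip code) uses the
# digits-only model 21C1; K2 (address) and K3 (name) share model 21C2.
ITEM_TO_MODEL = {"K1": "21C1", "K2": "21C2", "K3": "21C2"}

def batch_by_model(item_images):
    """Combine the rectangular-area images of all items that share a
    character string recognition model into one batch per model."""
    groups = {}
    for item, images in item_images.items():
        model_id = ITEM_TO_MODEL[item]
        groups.setdefault(model_id, []).extend(images)
    # Stack each group into a single (N, H, W) array so one forward pass
    # covers every item associated with that model.
    return {m: np.stack(imgs) for m, imgs in groups.items()}

# Toy 15x15 single-character images: 7 digits for K1, 10 + 4 characters
# for K2 and K3 (sizes are illustrative).
item_images = {
    "K1": [np.zeros((15, 15))] * 7,
    "K2": [np.zeros((15, 15))] * 10,
    "K3": [np.zeros((15, 15))] * 4,
}
batches = batch_by_model(item_images)
print({m: b.shape for m, b in batches.items()})
# Model 21C1 gets a batch of 7; model 21C2 gets one combined batch of 14.
```

Running each model once on its combined batch, rather than once per item, is what yields the speed-up claimed for the third embodiment.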
  • FIG. 17 is a schematic view showing the configuration of the print character string recognition device 20U according to the fourth embodiment.
  • the configurations already described will be designated by substantially the same reference numerals, and duplicate description will be omitted.
  • the configuration peculiar to the present embodiment will be described with the subscript "U".
  • in the present embodiment, another method is adopted in step A3 when extracting rectangular images in units of one character.
  • the print character string recognition device 20U includes a division unit 24CU.
  • the division unit 24CU specifies the continuous regions S1 to S4 by the same processing as described above. That is, in the example shown in FIG. 6, the images in which "Aichi Prefecture", "Nagoya City", "Naka Ward", and "X-chome Y-address z" are written are specified as the individual continuous regions S1 to S4 (FIG. 6(d)).
  • next, the division unit 24CU divides the binarized image corresponding to each of the continuous regions S1 to S4 into images of rectangular areas based on a unit length corresponding to one character along the row direction. Specifically, suppose that a one-character image is written in a rectangular area of 15 pixels in each of the row and column directions, and that the binarized image corresponding to the continuous region S1 has a length of 48 pixels in the row direction. In this case, the value 3.2 obtained by dividing 48 pixels by 15 pixels is rounded to determine the number of divisions (here, 3), and the binarized image corresponding to the continuous region S1 is divided into that number of parts.
  • the print character string recognition device 20U regards each divided image as an image of a rectangular area T divided into single-character units, and recognizes the contents of the print character string ML by using the character string recognition model 21C.
  • the print character string recognition device 20U can extract the print character string ML written in the form as an image for each character, so the characters contained in a large number of form images can be read at high speed and with high accuracy.
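The unit-length division of the fourth embodiment, including the 48-pixel / 15-pixel example above, can be sketched as follows; the character width of 15 pixels and the 48-pixel-wide region are the figures from the text.

```python
import numpy as np

def split_by_unit_length(region, unit_px=15):
    """Split a binarized continuous-region image into rectangular-area
    images of roughly one character each, based on a fixed character width
    (all printed characters are assumed to share the same width)."""
    width = region.shape[1]
    # 48 px / 15 px = 3.2 -> rounded to 3 divisions, as in the example above.
    n_div = max(1, round(width / unit_px))
    bounds = np.linspace(0, width, n_div + 1).astype(int)
    return [region[:, bounds[i]:bounds[i + 1]] for i in range(n_div)]

region = np.zeros((15, 48))           # continuous region S1: 48 px wide
chars = split_by_unit_length(region)  # -> 3 rectangular-area images
print(len(chars), [c.shape[1] for c in chars])
```

Because the split positions come from the fixed character width rather than from blank columns, a character whose strokes contain internal gaps cannot be mistaken for two characters.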
  • depending on the print characters to be read, the reading accuracy may be improved by the method according to this embodiment.
  • for example, when the width of the line L that scans the continuous regions S1 to S4 is narrow, the katakana "ri" and similar characters may be read as two characters, such as "1" and "no".
  • with the print character string recognition device 20U according to the present embodiment, print characters having the same character width can be read with high accuracy, without a single character being read as two.
  • the features and modifications of the first to third embodiments, other than (13-2) above, can also be applied as-is to the print character string recognition device 20U according to the fourth embodiment.
  • the present disclosure is not limited to the above embodiments as they are.
  • at the implementation stage, the components can be modified and embodied without departing from the gist of the disclosure.
  • various disclosures can be formed by appropriately combining the plurality of components disclosed in each of the above embodiments. For example, some components may be deleted from all the components shown in the embodiment. Further, the components may be appropriately combined in different embodiments.
  • 20, 20S, 20T, 20U: Printed character string recognition device
  • 21: Storage unit
  • 21C: Character string recognition model
  • 21D: Reference character string database
  • 22: Input unit
  • 23: Output unit
  • 24: Processing unit
  • 24A: Acquisition unit
  • 24B: Extraction unit
  • 24C: Division unit
  • 24D: Recognition unit
  • 100: Terminal device
  • 121: External storage device
  • 122: Input unit
  • 123: Output unit
  • Gc: Reference form image
  • Gs: Form image
  • ML: Printed character string
  • MK: Candidate character string
  • MS: Reference character string
  • R: Reading area
  • T: Rectangular area


Abstract

This printed character string recognition device 20 is provided with an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D. The acquisition unit 24A acquires a business form image Gs in which a printed character string ML is entered in a reading area R. The extraction unit 24B extracts the image of the reading area R on the basis of a reference business form image Gc. The division unit 24C divides the image of the printed character string ML entered in the reading area R into images of rectangular areas T of one character each. The recognition unit 24D recognizes the content of the printed character string ML on the basis of the images of the rectangular areas T. Here, the recognition unit 24D recognizes the content of the printed character string ML by using a character string recognition model 21C, which infers the content of the printed character entered in an image of a rectangular area T in response to the input of that image.

Description

Printed character string recognition device, program, and method.
This disclosure relates to a printed character string recognition device, a program, and a method.
Conventionally, the development of technology for recognizing characters contained in form images has been promoted. For example, Patent Document 1 (Japanese Unexamined Patent Publication No. 2003-115028) discloses a form processing technique for identifying whether a form corresponds to any of a plurality of types of registered standard forms.
When there are many form images to be processed, the character strings contained in these form images must be read at high speed and with high accuracy.
The printed character string recognition device of the first aspect includes an acquisition unit, an extraction unit, a division unit, and a recognition unit. The acquisition unit acquires a document image in which a printed character string is entered in a reading area. The extraction unit extracts the image of the reading area based on a reference document image. The division unit divides the image of the printed character string entered in the reading area into images of rectangular areas of one character each, row by row or column by column. The recognition unit recognizes the content of the printed character string by using a character string recognition model. Here, the character string recognition model infers the content of the printed characters entered in an image of a rectangular area in response to the input of that image. With such a configuration, character strings contained in a large number of document images (including form images) can be read at high speed and with high accuracy.
The printed character string recognition device of the second aspect is the device of the first aspect, in which the extraction unit extracts a plurality of reading areas corresponding to a plurality of predetermined items. Further, when two or more predetermined items are associated with the same character string recognition model, the recognition unit combines the rectangular areas obtained from the two or more predetermined items and inputs the combined rectangular areas into that same character string recognition model. Here, printed character strings are entered in the document image in association with the plurality of predetermined items. Further, the character string recognition model is constructed by a neural network, and a plurality of such models are prepared in association with the plurality of predetermined items. With such a configuration, the recognition speed of the printed character string can be increased.
The printed character string recognition device of the third aspect is the device of the first or second aspect, in which the recognition unit generates a candidate character string from the printed characters inferred using the character string recognition model. The recognition unit then compares the candidate character string with a reference character string stored in advance to calculate a similarity. When the similarity is equal to or higher than a predetermined value, the recognition unit recognizes the reference character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items whose candidate character strings can be anticipated.
The printed character string recognition device of the fourth aspect is the device of the third aspect, in which, when the similarity is smaller than the predetermined value, the recognition unit recognizes the candidate character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items in which free entry is permitted.
The printed character string recognition device of the fifth aspect is the device of any of the first to fourth aspects, in which the division unit binarizes the image of the reading area. The division unit then identifies, along the row direction or the column direction of the binarized image, continuous regions in which a plurality of printed characters are contiguous. The division unit then scans the image of each continuous region in the column direction or the row direction and divides it into binarized images of rectangular areas of one character each. With such a configuration, the rectangular areas of the printed characters can be divided with high accuracy.
The printed character string recognition device of the sixth aspect is the device of the fifth aspect, in which the recognition unit recognizes the content of the printed character string from the binarized images of the rectangular areas. With such a configuration, the recognition speed of the printed character string can be increased.
The printed character string recognition device of the seventh aspect is the device of the fifth aspect, in which the recognition unit recognizes the content of the printed character string from the pre-binarization images of the rectangular areas corresponding to the binarized rectangular areas. With such a configuration, the recognition accuracy of the printed character string can be improved.
The printed character string recognition device of the eighth aspect is the device of any of the first to fourth aspects, in which the division unit binarizes the image of the reading area. The division unit then identifies, along the row direction or the column direction of the binarized image, continuous regions in which a plurality of printed characters are contiguous. The division unit then divides the binarized image corresponding to each continuous region into images of rectangular areas based on a unit length corresponding to one character, along the row direction or the column direction. With such a configuration, the rectangular areas of the printed characters can be divided with high accuracy.
The printed character string recognition device of the ninth aspect is the device of any of the first to eighth aspects, which displays the recognition result of the printed character string by the recognition unit while changing the display form according to the character type. With such a configuration, the correctness of the character type to be read can be easily judged.
The printed character string recognition program of the tenth aspect causes a computer to function as an acquisition unit, an extraction unit, a division unit, and a recognition unit. The acquisition unit acquires a document image in which a printed character string is entered in a reading area. The extraction unit extracts the image of the reading area based on a reference document image. The division unit divides the image of the printed character string entered in the reading area into images of rectangular areas of one character each. The recognition unit recognizes the content of the printed character string by using a character string recognition model. Here, the character string recognition model infers the content of the printed characters entered in an image of a rectangular area in response to the input of that image. With such a configuration, character strings contained in a large number of document images can be read at high speed and with high accuracy.
The printed character string recognition method of the eleventh aspect is a method of recognizing the content of a printed character string using a computer. In this method, a document image in which a printed character string is entered in a reading area is acquired. The image of the reading area is then extracted based on a reference document image. Next, the image of the printed character string entered in the reading area is divided into images of rectangular areas of one character each. The content of the printed character string is then recognized using a character string recognition model. Here, the character string recognition model infers the content of the printed characters entered in an image of a rectangular area in response to the input of that image. With such a configuration, character strings contained in a large number of document images can be read at high speed and with high accuracy.
FIG. 1 is a schematic diagram showing the configuration of the printed character string recognition device 20 according to the first embodiment.
FIG. 2 is a schematic diagram showing an example of the form image Gs according to the same embodiment.
FIG. 3 is a schematic diagram showing an example of the reference form image Gc according to the same embodiment.
FIG. 4 is a schematic diagram showing examples of images of the reading areas according to the same embodiment.
FIG. 5 is a flowchart for explaining the operation of the printed character string recognition device 20 according to the same embodiment.
FIG. 6 is a diagram for explaining the operation of the division unit 24C according to the same embodiment.
FIG. 7 is a diagram for explaining the operation of the division unit 24C according to the same embodiment.
FIG. 8 is a diagram for explaining the operation of the recognition unit 24D according to the same embodiment.
FIG. 9 is a schematic diagram showing the configuration of the printed character string recognition device 20 according to modification D.
FIG. 10 is a diagram for explaining the operation of the recognition unit 24D according to modification E.
FIG. 11 is a schematic diagram showing the configuration of the printed character string recognition device 20S according to the second embodiment.
FIG. 12 is a schematic diagram for explaining the configuration of the reference character string database 21DS according to the same embodiment.
FIG. 13 is a flowchart for explaining the operation of the printed character string recognition device 20S according to the same embodiment.
FIG. 14 is a diagram for explaining an example of the operation of the printed character string recognition device 20S according to the same embodiment.
FIG. 15 is a diagram for explaining another example of the operation of the printed character string recognition device 20S according to the same embodiment.
FIG. 16 is a schematic diagram showing the configuration of the printed character string recognition device 20T according to the third embodiment.
FIG. 17 is a schematic diagram showing the configuration of the printed character string recognition device 20U according to the fourth embodiment.
Hereinafter, embodiments of the printed character string recognition device according to the present disclosure will be described with reference to the drawings. In the following description, when a plurality of identical objects are described individually, they may be distinguished by subscripts. For example, when the "predetermined items" are described as a whole, they are written as the predetermined items K, and when individual predetermined items are described specifically, they are written with subscripts, such as the predetermined items K1 to K6.
<First Embodiment>
 (1-1) Configuration of the Printed Character String Recognition Device
 FIG. 1 is a schematic diagram showing the configuration of the printed character string recognition device 20 according to the present embodiment. The printed character string recognition device 20 recognizes the content of the printed character string ML written in the form image Gs by using the character string recognition model 21C described later. As a premise, as shown in FIG. 2, the content to be entered is displayed in print in the predetermined item fields of the form image Gs according to the present embodiment. The form image Gs means an image in which an arbitrary form is captured. Each printed character constituting the printed character string referred to here has the same character width.
 In the following description, the printed character string recognition device 20 is described as recognizing the content of the printed character string written in the form image Gs, but the recognition target of the printed character string recognition device 20 is not limited to form images. The printed character string recognition device can recognize the content of any "document image" in which a document containing a character string is captured.
As an example of the form image Gs, one in which the printed character strings ML1 to ML6 are entered in association with the plurality of predetermined items K1 to K6 is used. Item K1 corresponds to the zip code, item K2 to the address, item K3 to the name, item K4 to the registration number, item K5 to the article used, and item K6 to the purpose of use. However, the contents of these items are merely examples and are not limited thereto.
The printed character string recognition device 20 can be realized by an arbitrary computer, and includes a storage unit 21, an input unit 22, an output unit 23, and a processing unit 24.
The storage unit 21 stores various types of information and is realized by an arbitrary storage device such as a memory or a hard disk. Here, the storage unit 21 stores information such as the weights of the neural network that constitutes the character string recognition model 21C.
The character string recognition model 21C infers the content of the printed character entered in an image of a single-character rectangular area in response to the input of that image. The character string recognition model 21C is constructed by, for example, a convolutional neural network (CNN) whose weights have been adjusted based on teacher images in which printed characters are captured.
The input unit 22 is realized by an arbitrary input device such as a keyboard, mouse, or touch panel, and inputs various types of information to the computer.
The output unit 23 is realized by an arbitrary output device such as a display, touch panel, or speaker, and outputs various types of information from the computer.
The processing unit 24 executes various types of information processing and is realized by a processor such as a CPU or GPU and a memory. Here, by reading one or more programs stored in the storage unit 21 into the CPU, GPU, or the like of the computer, the processing unit 24 functions as an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D.
The acquisition unit 24A acquires the form image Gs in which the printed character string ML is entered. The acquisition unit 24A acquires an image of the form to be read as the form image Gs via an arbitrary imaging device.
The extraction unit 24B matches the form image Gs against the reference form image Gc and extracts the reading areas R. The matching referred to here means not only directly comparing the images to obtain the difference, but also extracting the reading areas R from the form image Gs based on the coordinate information corresponding to the predetermined items K acquired from the reference form image Gc.
As shown in FIG. 3, the reference form image Gc is one in which the fields corresponding to the predetermined items K of the form image Gs are blank. Therefore, in the examples shown in FIGS. 2 and 3, the contents corresponding to the predetermined items K1 to K6 in the form image Gs are extracted as the reading areas R1 to R6, respectively, based on the form image Gs and the reference form image Gc. Here, FIG. 4(a) shows the image of the reading area R1 corresponding to item K1, FIG. 4(b) the image of the reading area R2 corresponding to item K2, FIG. 4(c) the image of the reading area R3 corresponding to item K3, FIG. 4(d) the image of the reading area R4 corresponding to item K4, FIG. 4(e) the image of the reading area R5 corresponding to item K5, and FIG. 4(f) the image of the reading area R6 corresponding to item K6. The reference form image Gc and the coordinate information corresponding to the predetermined items K in the reference form image Gc are stored in advance in the storage unit 21.
The division unit 24C divides the printed character string ML entered in the reading area R into rectangular areas T of one character each. Details will be described later.
The recognition unit 24D recognizes the content of the printed character string ML based on the images of the rectangular areas T. Here, the recognition unit 24D recognizes the content of the printed character string ML by using the character string recognition model 21C.
 (1-2) Operation of the Printed Character String Recognition Device
 FIG. 5 is a flowchart for explaining the operation of the printed character string recognition device 20 according to the present embodiment.
 First, the form images Gs to be read are captured via an arbitrary imaging device. These form images Gs are then stored in the storage unit 21 of the printed character string recognition device 20 as appropriate (A1).
Next, the printed character string recognition device 20 extracts the images of the reading areas R based on the reference form image Gc (A2).
Subsequently, the printed character string recognition device 20 divides the image of the printed character string ML entered in each reading area R into images of rectangular areas T of one character each, row by row (A3). At this time, the printed character string recognition device 20, by the function of the division unit 24C, identifies continuous regions S in which a plurality of printed characters are contiguous along the row direction of the reading area R. Taking item K2 (address) as an example, as shown in FIG. 6, the division unit 24C binarizes the image of the reading area R2 of item K2 (FIG. 6(a)) to generate a binarized reading-area image (FIG. 6(b)). The division unit 24C then dilates the binarized image of the reading area R2 in the row direction to fill in the character portions (FIG. 6(c)). As a result, characters adjacent within a predetermined character spacing are joined, and runs of characters bounded by the blanks between words are identified as the continuous regions S1 to S4. In the example shown in FIG. 6, the images in which "Aichi Prefecture", "Nagoya City", "Naka Ward", and "X-chome Y-address z" are written are identified as the individual continuous regions S1 to S4 (FIG. 6(d)).
Subsequently, the print character string recognition device 20 uses the function of the division unit 24C to scan each of the continuous areas S1 to S4 in the column direction and divide it into individual characters. Specifically, the continuous areas S1 to S4 are scanned with a line L whose width is no greater than the character spacing, and the luminance value in the column direction is calculated. An area whose calculated luminance value is at or below a predetermined value is regarded as a space corresponding to the character spacing, and the image of the portion up to the next such space is extracted as a rectangular area T containing a single character. For example, as shown in FIG. 7, when the characters "愛知県" (Aichi Prefecture) are identified as the continuous area S1, the images containing the characters "愛", "知", and "県" are extracted as the rectangular areas T1 to T3. FIG. 7(a) illustrates the concept of the line L scanning the continuous area S1, and FIG. 7(b) shows the luminance values in the continuous area S1: the column-direction luminance values obtained when the line L scans the continuous area S1 are plotted against the horizontal pixels of the image of the continuous area S1.
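The column-direction scan amounts to a projection profile. The pure-Python sketch below is a hedged illustration (the function name and the zero threshold are assumptions): columns whose summed ink is at or below the threshold play the role of the low-luminance spaces of FIG. 7(b).

```python
def split_region_into_chars(region, threshold=0):
    """region: 2D list (rows x columns) of ink values for one continuous
    area S.  Sums each column (the column-direction 'luminance' profile),
    treats columns at or below threshold as inter-character spaces, and
    returns one (start_col, end_col) rectangle T per character."""
    width = len(region[0])
    profile = [sum(row[x] for row in region) for x in range(width)]
    boxes, start = [], None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x                     # entering a character
        elif value <= threshold and start is not None:
            boxes.append((start, x - 1))  # leaving a character
            start = None
    if start is not None:
        boxes.append((start, width - 1))
    return boxes
```

Each returned box corresponds to one single-character rectangular area T to be passed to the recognition model.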
Next, the print character string recognition device 20 recognizes the content of the print character string ML based on the images of the rectangular areas T, each containing one character (A4). Specifically, the print character string recognition device 20 recognizes the content of the print character string ML by inferring the content of each print character with the character string recognition model 21C, which is built from a neural network. "Inference" here means a print-character recognition result computed by a neural network whose weights have been adjusted on general-purpose character images. For example, as illustrated in FIG. 8, when the image of the rectangular area T1 corresponding to the character "愛" is input to the character string recognition model 21C, print characters whose shapes resemble "愛", such as "愛", "憂", "舜", and "受", are inferred together with a confidence p. In the example of FIG. 8, the confidence of "愛" is 99.58309, that of "憂" is 0.0040124, that of "舜" is 1.26771865×10^-5, and that of "受" is 4.238405×10^-9. Since "愛" has the highest confidence, the content of the image in the rectangular area T1 is recognized as "愛".
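The final selection step can be illustrated directly with the FIG. 8 confidences. The model 21C itself is a trained neural network; the sketch below (hypothetical names) only shows how the highest-confidence candidate is taken as the recognition result.

```python
def recognize_char(confidences):
    """confidences: candidate character -> confidence p returned by the
    character string recognition model 21C for one rectangular area T.
    The candidate with the highest confidence is the recognized character."""
    return max(confidences, key=confidences.get)

# Confidences from the FIG. 8 example.
p = {"愛": 99.58309, "憂": 0.0040124,
     "舜": 1.26771865e-5, "受": 4.238405e-9}
# recognize_char(p) -> "愛", the highest-confidence candidate
```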
When all the images of the one-character rectangular areas T have been input to the character string recognition model 21C in this way, the print character string ML written in the predetermined item K of the form image Gs has been recognized.
In the above description, each continuous area was identified up to a blank between words, but the print character string recognition device 20 according to the present embodiment may identify continuous areas without regard to blanks between words, as long as the characters are on the same line. In the example shown in FIG. 6, a single continuous area S joining the same-line continuous areas S1 to S3 may be identified without considering the blanks between words. In that case, the run up to "愛知県名古屋市中区" (Naka Ward, Nagoya City, Aichi Prefecture) is identified as one continuous area. Even when continuous areas are identified line by line in this way, the rectangular areas T divided into single characters can still be obtained.
(1-3) Features of the print character string recognition device
(1-3-1)
As described above, the print character string recognition device 20 according to the present embodiment includes an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D. The acquisition unit 24A acquires the form image Gs in which the print character string ML is written in the reading area R. The extraction unit 24B extracts the image of the reading area R based on the reference form image Gc. The division unit 24C divides the image of the print character string ML written in the reading area R into images of rectangular areas T, one character each, row by row or column by column. The recognition unit 24D recognizes the content of the print character string ML based on the images of the rectangular areas T. Because the print character string ML written in the form can thus be extracted as one image per character, the characters contained in a large number of form images can be read quickly and accurately.
In particular, the recognition unit 24D here recognizes the content of the print character string ML using the character string recognition model 21C. The character string recognition model 21C is built from a neural network and, given the image of a rectangular area T as input, infers the content of the print character written in that rectangular area T. Using such a neural network therefore improves the recognition accuracy of the print character string ML.
(1-3-2)
Further, in the print character string recognition device 20 according to the present embodiment, the division unit 24C binarizes the image of the reading area R, identifies continuous areas S in which a plurality of print characters run consecutively along the row direction of the binarized image, and then scans the image of each continuous area S in the column direction to divide it into binarized images of rectangular areas T, one character each. With this configuration, the image of a continuous area S can be divided into the images of the rectangular areas T of the print characters with high accuracy. Then, by using the character string recognition model 21C, a neural network optimized for single-character images, the print characters can be recognized with high accuracy.
Since the recognition unit 24D here recognizes the content of the print character string ML from the binarized images of the rectangular areas T, the recognition speed of the print character string ML can be increased. To elaborate, noise has been removed from the binarized images of the rectangular areas T, so the computational load incurred when recognizing the print character string ML is reduced, and as a result the recognition speed of the print character string ML can be increased.
(1-4) Modifications
(1-4-1) Modification A
In the above description, the division unit 24C treated an area in which a plurality of print characters run consecutively along the row direction as a continuous area S, but it may instead treat an area in which a plurality of print characters run consecutively along the column direction as a continuous area. In that case, however, the division unit 24C scans the image of the continuous area S in the row direction to divide it into binarized images of rectangular areas T, one character each.
(1-4-2) Modification B
In the above description, the recognition unit 24D recognized the content of the print character string ML from the binarized images of the rectangular areas T, but the recognition unit 24D according to the present embodiment is not limited to this. The recognition unit 24D according to the present embodiment may instead use the binarized image of a rectangular area T to extract the corresponding rectangular image from the form image Gs before binarization, and recognize the content of the print character string ML from that image. Such a configuration can improve the recognition accuracy of the print character string ML.
To elaborate, the binarized image of the reading area R has undergone dilation and other processing, so its size may differ from the area of the pre-binarization form image Gs in which the print characters actually appear. Recognition accuracy can therefore be improved by using the binarized image only until the single-character rectangular areas are determined, and using the pre-binarization image when inferring the print characters. In this case, the positions of the rectangular areas T obtained after binarization are projected onto the form image Gs before binarization to extract rectangular images, and these images are input to the character string recognition model 21C to infer the characters. Specifically, rectangular images are extracted not from the post-binarization rectangular areas T of FIG. 6(d) but from the reading area of FIG. 6(a), and the characters are inferred from these. The character string recognition model 21C in this case is trained not on binarized character images but on teacher images in which the characters appear in grayscale or color.
(1-4-3) Modification C
The processing unit 24 described above may also execute a process that displays the recognition results of the recognition unit 24D for the print character string in a display form that varies with the character type. When the content of a print character string is inferred using the character string recognition model 21C, characters with similar shapes, such as the kanji "工" and the katakana "エ", or the letter "O" and the digit "0", are sometimes misidentified. Displaying the results color-coded by character type, distinguishing digits, letters, symbols, and so on, therefore makes it possible to judge at a glance whether the identification results of the character string recognition model 21C are correct.
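One way such type-dependent display could be realized is sketched below. Everything here is an assumption for illustration, not from the patent: the function name, the type tests, and the color mapping are all hypothetical.

```python
import unicodedata

def char_color(ch):
    """Assign a display color by character type so that look-alike pairs
    such as kanji 工 / katakana エ, or letter O / digit 0, stand apart."""
    if ch.isdigit():
        return "red"                      # digits
    if ch.isascii() and ch.isalpha():
        return "blue"                     # Latin letters
    name = unicodedata.name(ch, "")
    if "KATAKANA" in name or "HIRAGANA" in name:
        return "green"                    # kana
    return "black"                        # kanji, symbols, everything else
```

A recognized string would then be rendered character by character in the returned colors, making a "0" misread among letters visually obvious.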
(1-4-4) Modification D
In the above description, the form images Gs to be read are stored in the storage unit 21, but the print character string recognition device 20 according to the present embodiment is not limited to this. For example, as shown in FIG. 9, the form images Gs may be stored in an external storage device 121. With such a configuration, the security level of the external storage device 121 can be raised, reducing the risk of the form images Gs being leaked.
The print character string recognition device 20 may also be connected to the external storage device 121 via a network. In that case, the acquisition unit 24A of the print character string recognition device 20 may access the external storage device 121 only when reading a form image Gs.
(1-4-5) Modification E
In the above description, the print character string recognition device 20 includes an input unit 22 and an output unit 23, but the print character string recognition device 20 according to the present embodiment is not limited to this. The print character string recognition device according to the present embodiment need not necessarily include the input unit 22 and the output unit 23.
The print character string recognition device 20 according to the present embodiment may also be configured as a system in which a plurality of devices are connected via a network. For example, as shown in FIG. 10, it may be a print character string recognition system in which a terminal device 100, having an input unit 122 and an output unit 123 with the same functions as the input unit 22 and the output unit 23, is connected to the print character string recognition device 20 via a network. In this case, the print character string recognition device 20 may be connected to a plurality of terminal devices 100. For convenience, the print character string recognition device of the above description is here called a print character string recognition system.
<Second Embodiment>
(2-1) Configuration of the print character string recognition device
FIG. 11 is a schematic view showing the configuration of the print character string recognition device 20S according to the second embodiment. The print character string recognition device 20S according to the second embodiment performs processing using reference character strings MS. Hereinafter, configurations already described are given substantially the same reference numerals, and duplicate description is omitted. Configurations peculiar to the present embodiment are described with the subscript "S".
In the print character string recognition device 20S according to the present embodiment, the storage unit 21S stores a reference character string database 21D. The reference character string database 21D stores in advance the reference character strings MS associated with each of the predetermined items K1 to K6 in the form image Gs. For example, as shown in FIG. 12, when the predetermined item K5 of the form image Gs is "article used", the reference character string database 21D stores "自家用普通乗用車" (private standard passenger car), "自家用小型乗用車" (private small passenger car), "自家用軽四輪乗用車" (private light four-wheeled passenger car), "自家用軽四輪貨物車" (private light four-wheeled freight car), and "二輪自動車" (two-wheeled motor vehicle) as the reference character strings MS5a to MS5e.
Further, in the print character string recognition device 20S according to the present embodiment, the recognition unit 24DS compares a candidate character string MK, described later, with the reference character strings MS stored in advance to calculate similarities, and recognizes a reference character string MS with a high similarity as the content of the print character string ML. When the similarity is at or below a predetermined value, the recognition unit 24DS recognizes the candidate character string MK itself as the content of the print character string ML.
(2-2) Operation and features of the print character string recognition device
FIG. 13 is a flowchart for explaining the operation of the print character string recognition device 20S according to the second embodiment.
In the print character string recognition device 20S according to the present embodiment, steps B4 to B8 are executed in place of the above-described step A4. The print character string recognition device 20S therefore first executes the same processing as steps A1 to A3 described above (B1 to B3).
The print character string recognition device 20S then generates a candidate character string MK from the print characters inferred using the character string recognition model 21C (B4). Next, the print character string recognition device 20S compares the candidate character string MK with the reference character strings MS stored in advance to calculate similarities (B5). When the similarity is at or above a predetermined value, the print character string recognition device 20S recognizes the reference character string MS with the maximum similarity as the content of the print character string ML (B6-Yes, B7). When the similarity is below the predetermined value, the print character string recognition device 20S recognizes the candidate character string MK as the content of the print character string (B6-No, B8).
For example, suppose that, as shown in FIG. 14(b), a candidate character string MK5 reading "自家田小型乗田車" is generated from the image of the reading area R5 shown in FIG. 14(a), in which "自家用小型乗用車" (private small passenger car), corresponding to the print character string ML5, is written. In this case, the recognition unit 24DS according to the present embodiment compares the reference character strings MS5 with the candidate character string MK5 and calculates the similarity between them. Here, as shown in FIG. 15, "自家用乗用車", "自家用小型乗用車", "自家用軽四輪乗用車", "自家用軽四輪貨物車", and "二輪自動車" are stored as the reference character strings MS5a to MS5e, and the respective similarities are calculated as 0.5, 0.75, 0.470588, 0.35291, and 0.3076923. Since the reference character string MS5b "自家用小型乗用車" has the highest similarity, 0.75, the recognition unit 24DS replaces the candidate character string MK with the reference character string MS5b.
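The patent does not name its similarity measure, but the Gestalt pattern-matching ratio (2 × matching characters / total length) reproduces the worked example: "自家田小型乗田車" shares six of its eight characters with "自家用小型乗用車", giving 2×6/16 = 0.75. A hedged sketch of steps B5 to B8 under that assumption (the function name and the 0.6 threshold are likewise assumptions):

```python
from difflib import SequenceMatcher

def best_reference(candidate, references, threshold=0.6):
    """Compare the candidate string MK with each reference string MS;
    return the most similar reference if its similarity clears the
    threshold, otherwise fall back to the candidate itself (B6-B8)."""
    scored = [(SequenceMatcher(None, candidate, ref).ratio(), ref)
              for ref in references]
    score, best = max(scored)
    return best if score >= threshold else candidate
```

For the FIG. 15 data, the candidate "自家田小型乗田車" selects "自家用小型乗用車" at similarity 0.75, while a string resembling none of the references (a free-entry field, say) is returned unchanged.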
As described above, the print character string recognition device 20S according to the present embodiment can further improve the recognition accuracy of the print character string ML by using the reference character string database 21D. In particular, when the contents that may appear in a predetermined item of a form are known in advance, the characters contained in a large number of form images can be read quickly and accurately.
On the other hand, when the similarity is at or below the predetermined value, the print character string recognition device 20S according to the present embodiment recognizes the candidate character string MK as the content of the print character string ML. With this configuration, the characters contained in the form image Gs can be read accurately even when they cannot be registered in advance as reference character strings MS. For example, depending on the item, a form may provide a free-entry field in addition to preset choices. The print character string recognition device 20S according to the present embodiment treats a similarity at or below the predetermined value as indicating that content other than the preset choices has been entered, and reads it with high accuracy.
The features and modifications of the first embodiment are also applicable as-is to the print character string recognition device 20S according to the second embodiment.
<Third Embodiment>
FIG. 16 is a schematic view showing the configuration of the print character string recognition device 20T according to the third embodiment. Hereinafter, configurations already described are given substantially the same reference numerals, and duplicate description is omitted. Configurations peculiar to the present embodiment are described with the subscript "T".
The storage unit 21T according to the present embodiment stores a plurality of character string recognition models 21C1 to 21Cn (n being 2 or more and less than the number of items K) in association with the plurality of predetermined items K1 to K6. In the example shown in FIG. 16, the storage unit 21T stores a plurality of character string recognition models 21C1 to 21C4 (n = 4).
Further, when two or more predetermined items are associated with the same character string recognition model 21Ci (i being any value from 1 to 4), the recognition unit 24DT according to the present embodiment combines the rectangular-area images obtained from those two or more predetermined items and inputs the combined rectangular-area images into that same character string recognition model 21Ci.
To elaborate, among the predetermined items K of the form image Gs to be read there are items in which only digits are entered, items in which only kana characters are entered, items in which arbitrary characters are entered, and so on. For example, arbitrary characters are entered in the aforementioned item K2 (address) and item K3 (name). Item K2 (address) and item K3 (name) are therefore associated with the character string recognition model 21C2, which can identify arbitrary characters. In such a case, the recognition unit 24DT combines the rectangular-area images obtained from item K2 (address) and item K3 (name) and inputs the combined rectangular-area images into the character string recognition model 21C2 at once.
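The grouping itself can be sketched independently of any particular model. In the hedged illustration below (all names are assumptions), items that share a character string recognition model are concatenated into one batch, recognized in a single call, and the results are split back per item:

```python
def recognize_items(items, model_of, recognize_batch):
    """items: list of (item_key, char_images); model_of: item_key -> model id;
    recognize_batch(model_id, images) -> one recognized character per image.
    Items sharing a model are inferred together in a single call."""
    grouped = {}
    for key, images in items:
        grouped.setdefault(model_of[key], []).append((key, images))
    results = {}
    for model_id, group in grouped.items():
        batch = [img for _, imgs in group for img in imgs]
        chars = recognize_batch(model_id, batch)   # one model call per group
        i = 0
        for key, imgs in group:
            results[key] = "".join(chars[i:i + len(imgs)])
            i += len(imgs)
    return results
```

With K2 and K3 both mapped to model 21C2, their character images go through `recognize_batch` in one call rather than two.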
With the configuration described above, the print character string recognition device 20T infers the print character strings ML for two or more predetermined items at once with the same character string recognition model 21Ci, which increases the recognition speed of the print character strings ML.
More specifically, inputting the rectangular-area images corresponding to item K2 (address) and the rectangular-area images corresponding to item K3 (name) into the character string recognition model 21C2 individually yields the same recognition accuracy as combining them and inputting them into the character string recognition model 21C2 at once. When the character string recognition model 21C2 is realized by a neural network, however, the latter computes faster. Processing items with the same character type together, as in the print character string recognition device 20T according to the third embodiment, therefore improves the reading speed of the form image Gs.
Moreover, since each character string recognition model 21Ci is built for the character type used in its associated items, the recognition accuracy and recognition speed of the print character string ML can be improved. For example, only digits are entered in the aforementioned item K1 (postal code). Item K1 (postal code) therefore only needs the character string recognition model 21C1, which identifies digits alone, and compared with using the character string recognition model 21C2, which can identify arbitrary characters, both recognition accuracy and recognition speed can be improved.
The features and modifications of the first and second embodiments are also applicable as-is to the print character string recognition device 20T according to the third embodiment.
<Fourth Embodiment>
FIG. 17 is a schematic view showing the configuration of the print character string recognition device 20U according to the fourth embodiment. Hereinafter, configurations already described are given substantially the same reference numerals, and duplicate description is omitted. Configurations peculiar to the present embodiment are described with the subscript "U".
In the present embodiment, a different method is adopted in step A3 above when extracting the single-character rectangular images.
The print character string recognition device 20U according to the present embodiment includes a division unit 24CU. The division unit 24CU identifies the continuous areas S1 to S4 by the same processing as described above. That is, in the example shown in FIG. 6, the images containing "愛知県", "名古屋市", "中区", and "X丁目Y番地z号" are identified as the individual continuous areas S1 to S4 (FIG. 6(d)).
The division unit 24CU then divides the binarized images corresponding to the continuous areas S1 to S4 along the row direction into rectangular-area images based on a unit length corresponding to one character. Specifically, suppose that one character image occupies a rectangular area of 15 pixels in each of the row and column directions, and that the binarized image corresponding to the continuous area S1 is 48 pixels long in the row direction. In this case, 48 pixels divided by 15 pixels gives 3.2, which is rounded to determine a division count of 3, and the binarized image corresponding to the continuous area S1 is divided into that many parts.
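This fixed-pitch division reduces to a couple of lines. A hedged sketch (the function name is an assumption; the patent only specifies dividing by the unit length and rounding): a 48-pixel region at a 15-pixel unit length gives round(3.2) = 3 equal parts.

```python
def split_by_unit_length(region_width, unit=15):
    """Divide a continuous area of region_width pixels (row direction)
    into round(region_width / unit) equal rectangular areas, where unit
    is the nominal one-character width in pixels."""
    n = max(1, round(region_width / unit))
    bounds = [round(k * region_width / n) for k in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# 48-pixel region, 15-pixel unit: round(48 / 15) = 3 parts
# split_by_unit_length(48) -> [(0, 16), (16, 32), (32, 48)]
```

Unlike the luminance-scan split of the first embodiment, this variant never cuts a multi-stroke character into pieces, provided the characters share a fixed pitch.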
The print character string recognition device 20U then treats the divided images as the images of the rectangular areas T, one character each, and recognizes the content of the print character string ML using the character string recognition model 21C.
With the configuration described above, the print character string recognition device 20U can extract the print character string ML written in the form as one image per character, so the characters contained in a large number of form images can be read quickly and accurately.
 In particular, for character types that cannot be written in a single stroke, the method according to this embodiment may improve reading accuracy. For example, in the print character string recognition device 20 according to the first embodiment, the line L that scans the continuous regions S1 to S4 is narrow, so the katakana character "リ" (ri) may be read as two characters, such as "1" and "ノ" (no). The print character string recognition device 20U according to this embodiment does not read two characters from a single character, and can read print characters of uniform character width with high accuracy.
 Note that the features and modifications of the first to third embodiments, other than (13-2) above, are also applicable as-is to the print character string recognition device 20U according to the fourth embodiment.
 <Other embodiments>
 The present disclosure is not limited to the above embodiments as such. At the implementation stage, the components can be modified and embodied without departing from the gist of the disclosure. Various disclosures can also be formed by appropriately combining the plurality of components disclosed in the above embodiments. For example, some components may be deleted from all the components shown in an embodiment, and components from different embodiments may be combined as appropriate.
20  Print character string recognition device
20S Print character string recognition device
20T Print character string recognition device
20U Print character string recognition device
21  Storage unit
21C Character string recognition model
21D Reference character string database
22  Input unit
23  Output unit
24  Processing unit
24A Acquisition unit
24B Extraction unit
24C Division unit
24D Recognition unit
100 Terminal device
121 External storage device
122 Input unit
123 Output unit
Gc  Reference form image
Gs  Form image
ML  Print character string
MK  Candidate character string
MS  Reference character string
R   Reading area
T   Rectangular area
Japanese Unexamined Patent Publication No. 2003-115028

Claims (11)

  1.  A print character string recognition device (20, 20S, 20T) comprising:
     an acquisition unit (24A) that acquires a document image (Gs) in which a print character string (ML) is written in a reading area (R);
     an extraction unit (24B) that extracts an image of the reading area based on a reference document image (Gc);
     a division unit (24C, 24CU) that divides the image of the print character string written in the reading area into images of rectangular areas (T) in units of one character for each row or column; and
     a recognition unit (24D) that recognizes the content of the print character string using a character string recognition model (21C) that infers, in response to input of an image of a rectangular area, the content of the print characters written in that image.
  2.  The print character string recognition device according to claim 1, wherein
     print character strings are written in the document image in association with a plurality of predetermined items (K),
     a plurality of the character string recognition models, each constructed by a neural network, are prepared in association with the plurality of predetermined items,
     the extraction unit extracts a plurality of reading areas corresponding to the plurality of predetermined items, and
     when two or more predetermined items are associated with the same character string recognition model, the recognition unit combines the rectangular areas obtained from the two or more predetermined items and inputs the combined rectangular areas into that same character string recognition model.
  3.  The print character string recognition device according to claim 1 or 2, wherein the recognition unit
     generates a candidate character string (MK) from the print characters inferred using the character string recognition model,
     calculates a similarity by comparing the candidate character string with a reference character string (MS) stored in advance, and
     when the similarity is equal to or greater than a predetermined value, recognizes the reference character string as the content of the print character string.
  4.  The print character string recognition device according to claim 3, wherein,
     when the similarity is smaller than the predetermined value, the recognition unit recognizes the candidate character string as the content of the print character string.
  5.  The print character string recognition device according to any one of claims 1 to 4, wherein the division unit
     binarizes the image of the reading area,
     identifies, along the row direction or the column direction of the binarized image, a continuous region (S) in which a plurality of print characters are contiguous, and
     scans the image of the continuous region in the column direction or the row direction to divide it into binarized rectangular-area images in units of one character.
  6.  The print character string recognition device according to claim 5, wherein the recognition unit recognizes the content of the print character string from the binarized rectangular-area images.
  7.  The print character string recognition device according to claim 5, wherein the recognition unit recognizes the content of the print character string from the images of the rectangular areas before binarization that correspond to the binarized rectangular areas.
  8.  The print character string recognition device according to any one of claims 1 to 4, wherein the division unit
     binarizes the image of the reading area,
     identifies, along the row direction or the column direction of the binarized image, a continuous region (S) in which a plurality of print characters are contiguous, and
     divides the binarized image corresponding to the continuous region into rectangular-area images along the row direction or the column direction, based on a unit length corresponding to one character.
  9.  The print character string recognition device (20, 20S, 20T) according to any one of claims 1 to 7, wherein the recognition result of the print character string by the recognition unit is displayed in a display form that varies according to the character type.
  10.  A print character string recognition program that causes a computer to function as:
     an acquisition unit (24A) that acquires a document image (Gs) in which a print character string (ML) is written in a reading area (R);
     an extraction unit (24B) that extracts an image of the reading area based on a reference document image (Gc);
     a division unit (24C) that divides the image of the print character string written in the reading area into images of rectangular areas (T) in units of one character for each row or column; and
     a recognition unit (24D) that recognizes the content of the print character string using a character string recognition model (21C) that infers, in response to input of an image of a rectangular area, the content of the print characters written in that image.
  11.  A print character string recognition method for recognizing the content of a print character string using a computer, the method comprising:
     acquiring a document image in which a print character string (ML) is written in a reading area (R);
     extracting an image of the reading area based on a reference document image (Gc);
     dividing the image of the print character string written in the reading area into images of rectangular areas (T) in units of one character for each row or column; and
     recognizing the content of the print character string using a character string recognition model (21C) that infers, in response to input of an image of a rectangular area, the content of the print characters written in that image.
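The similarity-based correction recited in claims 3 and 4 can be sketched as follows. This is a minimal illustrative sketch only: the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the sample reference strings are assumptions for illustration; the disclosure does not specify a particular similarity metric, and the reference character string database (21D) would supply the reference strings in practice.

```python
from difflib import SequenceMatcher

# Hypothetical reference strings standing in for the reference
# character string database (21D).
REFERENCE_STRINGS = ["愛知県名古屋市中区", "東京都千代田区"]


def correct_candidate(candidate: str, threshold: float = 0.8) -> str:
    """Return the closest reference string if its similarity meets the
    threshold (claim 3); otherwise return the candidate unchanged (claim 4)."""
    best = max(
        REFERENCE_STRINGS,
        key=lambda ref: SequenceMatcher(None, candidate, ref).ratio(),
    )
    similarity = SequenceMatcher(None, candidate, best).ratio()
    return best if similarity >= threshold else candidate


# One misread character is corrected to the stored reference string:
print(correct_candidate("愛知県名吉屋市中区"))  # -> 愛知県名古屋市中区
# A string far from every reference is kept as-is:
print(correct_candidate("ZZZZ"))                # -> ZZZZ
```

Comparing against stored reference strings in this way lets a single misrecognized character be repaired when the field's vocabulary (e.g. addresses) is known in advance.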


PCT/JP2020/012230 2019-03-29 2020-03-19 Printed character string recognition device, program, and method WO2020203339A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020536819A JP6820578B1 (en) 2019-03-29 2020-03-19 Type string recognition device, program, and method.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-066064 2019-03-29
JP2019066064 2019-03-29

Publications (1)

Publication Number Publication Date
WO2020203339A1 2020-10-08

Family

ID=72667993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/012230 WO2020203339A1 (en) 2019-03-29 2020-03-19 Printed character string recognition device, program, and method

Country Status (2)

Country Link
JP (1) JP6820578B1 (en)
WO (1) WO2020203339A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60142792A (en) * 1983-12-29 1985-07-27 Fujitsu Ltd Multi-kind character recognizing device
JPS6057112B2 (en) * 1977-02-04 1985-12-13 キヤノン株式会社 Magnetic card transfer device
JPH0442380A (en) * 1990-06-08 1992-02-12 Seiko Epson Corp Recognized character display device
JPH10143605A (en) * 1996-11-15 1998-05-29 Sharp Corp Optical character recognition device
JP2009026287A (en) * 2007-07-23 2009-02-05 Sharp Corp Character image extracting apparatus and character image extracting method
JP2010269272A (en) * 2009-05-22 2010-12-02 Toshiba Corp Paper sheet processing apparatus, and paper sheet processing method
JP2011018175A (en) * 2009-07-08 2011-01-27 Mitsubishi Heavy Ind Ltd Character recognition apparatus and character recognition method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6057112B1 (en) * 2016-04-19 2017-01-11 AI inside株式会社 Character recognition apparatus, method and program

Also Published As

Publication number Publication date
JPWO2020203339A1 (en) 2021-04-30
JP6820578B1 (en) 2021-01-27


Legal Events

Date Code Title Description
- ENP: Entry into the national phase — Ref document number: 2020536819; Country of ref document: JP; Kind code of ref document: A
- 121: Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 20784984; Country of ref document: EP; Kind code of ref document: A1
- NENP: Non-entry into the national phase — Ref country code: DE
- 122: Ep: pct application non-entry in european phase — Ref document number: 20784984; Country of ref document: EP; Kind code of ref document: A1