WO2021157422A1 - Character string recognition device and character string recognition program - Google Patents

Character string recognition device and character string recognition program Download PDF

Info

Publication number
WO2021157422A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
rectangular
recognition
character string
area
Prior art date
Application number
PCT/JP2021/002588
Other languages
French (fr)
Japanese (ja)
Inventor
康介 木戸
Original Assignee
Arithmer株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arithmer株式会社 filed Critical Arithmer株式会社
Priority to JP2021575740A priority Critical patent/JP7382544B2/en
Publication of WO2021157422A1 publication Critical patent/WO2021157422A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Definitions

  • the present invention relates to a character string recognition device and a character string recognition program.
  • the conventional OCR technology cuts out a character area for each character from an image containing a character string and estimates the single character represented by that area.
  • such a method can estimate relatively correctly for a monospaced font, in which every character has the same width, but highly accurate estimation was difficult for a character string in a variable-width font, whose character widths are optimized to the character shapes. In particular, sufficient accuracy could not be obtained for character strings in which half-width and full-width characters are mixed.
  • the present invention has been made to solve such a problem, and provides a character string recognition device and the like that enable highly accurate character recognition even when the character string is not in a monospaced font.
  • the character string recognition device includes: an extraction unit that extracts, on a line-by-line basis, a character area containing a character string from a document image including the imaged character string; a division unit that divides the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged; a determination unit that determines whether to use as the recognition character a candidate character obtained by inputting the image of a specific rectangular area under evaluation among the plurality of rectangular areas into a character recognition model that estimates the single character represented in an image, or a candidate character obtained by inputting into the model the image of a combined rectangular area formed by joining other rectangular areas continuous with the specific rectangular area to it; and an output unit that outputs a recognition character string generated by combining the recognition characters sequentially determined while the specific rectangular area transitions over the plurality of rectangular areas.
  • the determination unit may generate a plurality of combined rectangular areas, combining a plurality of other rectangular areas with the specific rectangular area as long as the length of each generated combined rectangular area in the arrangement direction is at or below a set maximum length, and input each image into the character recognition model. When generating combined rectangular areas, targeting for combination not only the one rectangular area adjacent to the specific rectangular area but also further continuous rectangular areas allows the single character represented in the character area to be recognized more accurately.
  • the maximum length may be set based on the respective lengths in the arrangement direction of the plurality of rectangular areas.
  • the character string to be recognized has various per-character widths depending on, for example, the type of font, but if the maximum length is set based on the lengths, in the arrangement direction, of the rectangular areas produced by the division unit, a maximum length suited to that font can be obtained.
  • when generating a combined rectangular area, the determination unit may include in the combination the blank band sandwiched between the rectangular areas. For example, in a kanji whose left radical and right component are separated left and right, a slight blank band is expected between them; combining across the blank band can therefore improve the accuracy of character recognition.
  • when the candidate character obtained by inputting the image of the specific rectangular area into the character recognition model is a kanji and the length of the specific rectangular area in the arrangement direction is at or below a set reference length, the accuracy calculated by the character recognition model for that candidate may be lowered. For example, some kanji radicals are themselves valid kanji, but when used as a radical their width is narrowed. Lowering the calculated accuracy in this way prevents a radical from being erroneously recognized as a standalone kanji.
  • the character string recognition program causes a computer to execute: an extraction step of extracting, on a line-by-line basis, a character area containing a character string from a document image including the imaged character string; a division step of dividing the character area into a plurality of rectangular areas at blank bands orthogonal to the direction in which the characters are arranged; a determination step of determining whether to use as the recognition character a candidate character obtained by inputting the image of a specific rectangular area into the character recognition model, or a candidate character obtained by inputting the image of a combined rectangular area formed by joining other continuous rectangular areas to the specific rectangular area; and an output step of outputting a recognition character string generated by combining the plurality of recognition characters sequentially determined by repeating the determination step while transitioning the specific rectangular area over the plurality of rectangular areas.
  • FIG. 1 is a diagram showing an example of a usage status of the character string recognition device 100 according to the present embodiment.
  • the character string recognition device 100 is, for example, a PC, to which the scanner 200 and the monitor 300 are connected.
  • the scanner 200 and the monitor 300 may be connected to the character string recognition device 100 via a network such as the Internet.
  • the scanner 200 is a device that converts a target document for which character string recognition is desired into image data.
  • the target document has a reading area DE containing a printed character string; an example is the membership card 900 shown in the figure.
  • the membership card 900 has a reading area DE1 on which the name is printed, a reading area DE2 on which the address is printed, and a reading area DE3 on which the issue date of the membership card 900 is printed.
  • the user has the character string recognition device recognize the character strings printed in the reading areas DE, converting them into character codes, and uses the resulting text data, for example, to create a database.
  • the reading areas DE may be preset in a template referenced by the character string recognition device 100, or, when various documents are processed individually, the user may set them each time.
  • the character string recognition device 100 takes in the image data of the document image generated by reading the membership card 900 by the scanner 200, and sequentially executes the character string recognition process for the set reading area DE. In the following description, a case where the character string is printed horizontally in the reading area DE will be described.
  • the monitor 300 displays the recognition character string recognized by the character string recognition device 100 as text data. For example, as shown in the figure, the recognition result may be displayed for the user to confirm, or it may be shown entered in a specific cell of a designated database.
  • FIG. 2 is a main hardware configuration diagram of the character string recognition device 100.
  • the character string recognition device 100 is mainly composed of a processing unit 110, a storage unit 120, and an input / output IF 130.
  • the processing unit 110 executes various types of information processing, and is realized by a processor such as a CPU or GPU and a memory.
  • the processing unit 110 functions as the acquisition unit 111, the extraction unit 112, the division unit 113, the determination unit 114, and the output unit 115.
  • the acquisition unit 111 acquires the image data sent from the scanner 200 and expands the document image on the memory of the processing unit 110.
  • the extraction unit 112 extracts a character area including the imaged character string from the document image on a line-by-line basis.
  • the division unit 113 divides the character area extracted by the extraction unit 112 into a plurality of rectangular areas by separating it at blank bands orthogonal to the arrangement direction in which the characters are arranged.
  • among the plurality of divided rectangular areas, the determination unit 114 determines whether to use as the recognition character the candidate character obtained by inputting the image of the specific rectangular area under evaluation into the character recognition model 121, or the candidate character obtained by inputting the image of a combined rectangular area generated by joining rectangular areas continuous with the specific rectangular area.
  • the output unit 115 outputs a recognition character string generated by combining the recognition characters sequentially determined by the determination unit 114 while transitioning the specific rectangular area. Specific processing of these functional parts will be described in detail later.
  • the storage unit 120 stores various types of information, and is realized by an arbitrary storage device such as a memory or a hard disk. In the present embodiment, the storage unit 120 stores information such as the weights of the neural network that constitutes the character recognition model 121.
  • in response to the input of an image of a rectangular area, the character recognition model 121 estimates the single character represented by that image. Specifically, it outputs a plurality of candidate characters and the accuracy of each candidate for the input image. Candidate characters are output in association with a character code such as the JIS kanji code (JIS X 0208).
  • the character recognition model 121 is constructed by, for example, a convolutional neural network (CNN) whose weights are adjusted based on teacher images showing printed characters. However, another type of neural network may be used, or the model may be constructed on a rule basis without using a neural network at all.
  • the input / output IF 130 is an input / output interface for the processing unit 110 to exchange information with the scanner 200 and the monitor 300. Specifically, it is realized by a USB interface or a LAN interface.
  • the input / output IF 130 may be connected to an input device such as a keyboard, mouse, or touch panel, or may be connected to an output device such as a speaker.
  • FIG. 3 is a diagram showing an example of the designated reading area DE and the extracted character area LE.
  • the figure shows the image of reading area DE2, on which the address is printed in the membership card 900, as expanded on the memory by the acquisition unit 111.
  • the extraction unit 112 extracts the character areas LE1 and LE2 from the image of reading area DE2 so that each of them contains a character string for one line. Specifically, the extraction unit 112 binarizes the reading area DE2 and performs dilation processing in the row direction so that a continuous character string forms a pixel set that is continuous in the row direction.
  • the character areas LE1 and LE2 are then extracted by enclosing each pixel set with a rectangle.
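The line-extraction step above can be sketched roughly as follows. This is an illustrative stand-in only: it uses a simple row projection instead of the binarize-and-dilate grouping the embodiment describes, and all names are hypothetical rather than taken from the patent.

```python
# Illustrative sketch of extracting line regions from a reading area.
# A grayscale image is modelled as a list of rows of pixel values (0-255);
# a row "contains ink" if any pixel is darker than the threshold. Runs of
# consecutive ink rows approximate the character areas LE1, LE2, ...

def extract_line_rows(image, threshold=128):
    """Return (top, bottom) row ranges of each run of rows containing ink."""
    ink_rows = [any(px < threshold for px in row) for row in image]
    lines, start = [], None
    for y, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the line ends at a blank row
            start = None
    if start is not None:
        lines.append((start, len(image)))
    return lines

# Two toy "lines" of dark pixels separated by a blank row.
image = [
    [255, 255], [0, 255], [10, 255],   # line 1 spans rows 1-2
    [255, 255], [255, 0],              # line 2 is row 4
]
print(extract_line_rows(image))  # [(1, 3), (4, 5)]
```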
  • FIG. 4 is a diagram illustrating a process of dividing the character area LE1 into a plurality of rectangular areas B.
  • FIG. 4A shows how the division unit 113 scans the scanning window SW in the direction in which the character string is lined up to calculate luminance values.
  • the scanning window SW has the vertical extent of the character area LE1 and a horizontal width of a few pixels, and the division unit 113 calculates the luminance value at each scanning position of the scanning window SW.
  • the luminance value is calculated as, for example, the sum of the pixel values of the region included in the scanning window SW.
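The scanning-window luminance calculation can be sketched as follows. This is a minimal illustration under the assumption of a binarized image (ink = 1, background = 0); the function name and the window width are assumptions, not the patented implementation.

```python
# Hypothetical sketch of the scanning-window luminance profile of FIG. 4(a).
# The character area is a list of rows of 0/1 pixels (1 = ink). The
# "luminance value" at each horizontal position is the sum of pixel values
# inside a full-height window a few pixels wide, as the description states.

def luminance_profile(area, window=2):
    """Sum of pixel values inside a full-height window at each x position."""
    height, width = len(area), len(area[0])
    profile = []
    for x in range(width - window + 1):
        total = sum(area[y][x + dx] for y in range(height) for dx in range(window))
        profile.append(total)
    return profile

# Toy example: two 1-pixel-wide strokes separated by a blank band.
area = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(luminance_profile(area, window=1))  # [2, 0, 0, 2]
```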
  • FIG. 4B plots the luminance value at each scanning position of the scanning window SW against the coordinate in the arrangement direction of the character string in FIG. 4A; the vertical axis represents the luminance value.
  • the higher the density of characters within the scanning window SW, the larger the luminance value.
  • when the character string is "東京都品川区" ("Shinagawa-ku, Tokyo"), as in the example of the figure, blank bands with a luminance value of 0 occur between the characters, between the left radical and right component of "都", between the three vertical strokes of "川", and before and after "東" and "区".
  • the division unit 113 divides the character area LE1 before and after these blank bands, which are orthogonal to the arrangement direction of the character string, into rectangular areas B1 to B9, each containing a character element.
  • FIG. 4C shows the character elements of "東京都品川区" enclosed in rectangular areas B1 to B9, respectively.
  • note that one character is not necessarily enclosed by one rectangular area.
  • the "都" is divided into its left radical (rectangular area B3) and its right component (rectangular area B4), which are enclosed by separate rectangular areas.
  • the three vertical strokes of "川" are individually enclosed by rectangular areas B6 to B8.
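The division at blank bands can be sketched as follows, operating on a luminance profile like the one in FIG. 4B. The function name and the zero threshold are illustrative assumptions; the patent only requires that regions be cut where the luminance drops to a blank band.

```python
# Minimal sketch of dividing a character area into rectangular regions at
# the blank bands, i.e. at runs of zero luminance in the profile.

def split_at_blank_bands(profile):
    """Return (start, end) x-ranges of regions whose luminance is non-zero."""
    regions, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x                      # entering a character element
        elif v == 0 and start is not None:
            regions.append((start, x))     # leaving it at a blank band
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions

print(split_at_blank_bands([0, 3, 5, 0, 0, 2, 4, 1, 0]))  # [(1, 3), (5, 8)]
```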
  • FIG. 5 is a diagram illustrating a process of setting a reference value from the divided rectangular area B.
  • the determination unit 114 calculates the rectangular widths W1 to W9 of the rectangular areas B1 to B9, and then selects the top 25% when these widths are sorted in descending order. Here, since there are nine rectangular areas, the top two are selected: specifically, the width W1 of rectangular area B1 enclosing "東" and the width W9 of rectangular area B9 enclosing "区".
  • the maximum length Wm used in the subsequent processing is set to the average of the selected widths W1 and W9 multiplied by 1.5. What fraction of the top widths to select and what factor to multiply by can be changed to suit the situation, for example according to the nature of the document to be read or to the number of divided rectangular areas B. Similarly, the reference length Ws used in the subsequent processing is set to the average of all the rectangular widths W1 to W9.
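Under the stated parameters (top 25% of widths, factor 1.5), setting Wm and Ws might look like the following hedged sketch; the function name and the rounding choice for "top 25% of nine regions" are assumptions.

```python
# Illustrative sketch of setting the maximum length Wm and the reference
# length Ws from the divided rectangle widths: Wm is 1.5 times the average
# of the top 25% of widths; Ws is the mean of all widths.

def set_lengths(widths, top_fraction=0.25, factor=1.5):
    """Return (Wm, Ws): the maximum length and the reference length."""
    n = max(1, round(len(widths) * top_fraction))   # e.g. 9 widths -> top 2
    top = sorted(widths, reverse=True)[:n]
    w_max = sum(top) / n * factor                   # maximum length Wm
    w_ref = sum(widths) / len(widths)               # reference length Ws
    return w_max, w_ref

# Nine rectangle widths; the two widest (30 and 26) average 28, so Wm = 42.
print(set_lengths([30, 12, 12, 14, 11, 6, 6, 9, 26]))  # (42.0, 14.0)
```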
  • FIG. 6 is a diagram illustrating a character recognition process using the character recognition model 121.
  • first, the case where the specific rectangular area to be evaluated is rectangular area B1 and the character recognition model 121 estimates only this rectangular area B1 will be described.
  • when an image of a rectangular area is input, the character recognition model 121 outputs a plurality of candidate characters and the accuracy p of each candidate by estimating the single character represented in the image.
  • note that the character recognition model 121 estimates candidate characters representing one character even when the image of the input rectangular area does not actually represent one character (for example, when only a left radical is shown, as described later).
  • FIG. 7 is a diagram for explaining an input target image to be input to the character recognition model 121.
  • as described above, the division unit 113 divides the character string into rectangular areas by separating it at blank bands orthogonal to the arrangement direction in which the characters are arranged.
  • however, the character recognition model 121 estimates a candidate character as representing one character even when the image of the input rectangular area does not actually represent one character. Therefore, if one divided rectangular area were simply designated as the specific rectangular area and the recognition character determined from that image alone, for example, only the left radical of a kanji would be recognized as one character.
  • the width of rectangular area B3, the specific rectangular area, is W3, and the width of the rectangular area B4 to be combined is W4.
  • their sum W3 + W4 is smaller than the maximum length Wm described with reference to FIG. 5. Therefore, the determination unit 114 takes the image of the generated combined rectangular area 1 as an evaluation target and inputs it into the character recognition model 121.
  • in this example the image of combined rectangular area 1 is evaluated, but if W3 + W4 were larger than the maximum length Wm, the image of combined rectangular area 1 would be excluded from evaluation and the determination unit 114 would evaluate only the specific rectangular area.
  • the example of "東" described with reference to FIG. 6 is an example in which only the specific rectangular area is evaluated.
  • the width of rectangular area B3, the specific rectangular area, is W3, and the widths of the rectangular areas B4 and B5 to be combined are W4 and W5, respectively.
  • their sum W3 + W4 + W5 is larger than the maximum length Wm. Therefore, the determination unit 114 excludes the image of the generated combined rectangular area 2 from the evaluation targets. In this example the image of combined rectangular area 2 is not evaluated, but if W3 + W4 + W5 were smaller than the maximum length Wm, the image of combined rectangular area 2 would also be evaluated.
  • by limiting the combinations in this way, the determination unit 114 can accurately recognize the single character represented in the character area. Further, since the maximum length Wm is set based on the widths of the divided rectangular areas B, it takes a value suited to the font used in the character string, and the upper limit of the number (i) of rectangular areas combined with the specific rectangular area is optimized.
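The combination rule described above, evaluating the specific rectangular area alone plus each combined area formed with the following contiguous regions while the total width stays at or below Wm, can be sketched as follows. Names are illustrative, not the patent's code.

```python
# Illustrative sketch of generating the evaluation candidates for a
# specific rectangular region at index `start`: the region alone is always
# a candidate; combined regions are added while their total width <= w_max.

def candidate_spans(widths, start, w_max):
    """Index ranges [start, end) of regions whose total width is <= w_max."""
    spans, total = [], 0
    for end in range(start, len(widths)):
        total += widths[end]
        if end > start and total > w_max:
            break                           # combined region exceeds Wm
        spans.append((start, end + 1))
    return spans

# With widths [10, 9, 14] and Wm = 20, only B alone and B + next qualify:
# 10 <= 20 and 10 + 9 = 19 <= 20, but 10 + 9 + 14 = 33 > 20.
print(candidate_spans([10, 9, 14], 0, 20))  # [(0, 1), (0, 2)]
```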
  • FIG. 8 is a diagram illustrating a character recognition process when region combination is performed as described with reference to FIG. 7.
  • the determination unit 114 first inputs the left radical of "都", which is the image of the specific rectangular area, into the character recognition model 121.
  • the character recognition model 121 outputs "者" ("person"), "日" ("day"), and so on as candidate characters together with their accuracies p.
  • since the output candidate characters are kanji and the rectangular width W3 of the specific rectangular area is smaller than the reference length Ws described with reference to FIG. 5, the determination unit 114 reduces each accuracy p calculated for the output candidates by a factor of 0.95 to calculate the corrected accuracy pc.
  • for example, the corrected accuracy pc of the highest-scoring candidate becomes 0.9139 (= 0.962 × 0.95).
  • next, the determination unit 114 inputs the image of combined rectangular area 1, which shows "都", into the character recognition model 121.
  • the character recognition model 121 outputs "都", "郡" ("county"), and so on as candidate characters together with their accuracies p.
  • the determination unit 114 compares the corrected accuracies pc of the candidate characters for the image of the specific rectangular area with the accuracies p of the candidate characters for the image of combined rectangular area 1, searches for the maximum value among them, and determines the candidate character corresponding to that maximum as the recognition character.
  • specifically, the highest corrected accuracy pc for the image of the specific rectangular area (0.9139 in the example of FIG. 8(a)) and the highest accuracy p for the image of the combined rectangular area (0.982 in FIG. 8(b)) are compared, the larger one is selected, and the corresponding candidate character is determined as the recognition character.
  • the determination unit 114 executes such a process and determines "都" as the recognition character. That is, it determines that the single character to be recognized is the character at row 37, point 52 of the JIS kanji code (JIS X 0208).
  • in this way, when an output candidate character is a kanji and the rectangular width is smaller than the reference length Ws, the accuracy p is multiplied by a preset coefficient less than 1 to calculate the corrected accuracy pc.
  • by adopting the reduced corrected accuracy pc, a radical represented with a relatively narrow width can be prevented from being erroneously recognized as a standalone kanji.
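The correction step can be sketched as below. The factor 0.95 and the worked numbers follow the FIG. 8(a) example; the function name and the explicit kanji flag are illustrative assumptions.

```python
# Hedged sketch of the corrected accuracy pc: when a candidate is a kanji
# and the specific region's width is at or below the reference length Ws,
# its accuracy p is multiplied by a preset factor below 1 (0.95 here).

def corrected_accuracy(p, is_kanji, width, w_ref, factor=0.95):
    return p * factor if is_kanji and width <= w_ref else p

# A narrow left radical scoring p = 0.962 in a region narrower than Ws
# is reduced to pc = 0.962 * 0.95 = 0.9139.
pc = corrected_accuracy(0.962, is_kanji=True, width=10, w_ref=14)
print(round(pc, 4))  # 0.9139
```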
  • the determination unit 114 transitions the specific rectangular area to the rectangular area Bn+1 following the rectangular area Bn whose recognition character has been confirmed, and executes the next character recognition. Specifically, since the recognition up to rectangular area B4 has been determined as "都", the determination unit 114 designates rectangular area B5, which follows rectangular area B4, as the new specific rectangular area and continues character recognition.
  • the output unit 115 combines the recognition characters sequentially determined by the determination unit 114 in this way to generate a recognition character string, and outputs it to the storage unit 120 for storage, or outputs it to the monitor 300 via the input/output IF 130 for display.
  • FIG. 9 is a diagram showing a processing flow of the character string recognition process. The flow starts from the time when the image data of the target document is sent from the scanner 200.
  • in step S101, the acquisition unit 111 acquires the image data from the scanner 200 via the input/output IF 130 and expands the document image on the memory of the processing unit 110.
  • in step S102, the extraction unit 112 extracts the character areas LE from the designated reading area DE so that each of them contains a character string for one line, as described with reference to FIG. 3.
  • in step S103, the division unit 113 divides the character area LE extracted by the extraction unit into a plurality of rectangular areas B at blank bands orthogonal to the arrangement direction in which the characters are arranged, as described with reference to FIG. 4.
  • in step S104, the determination unit 114 sets the specific rectangular area to be evaluated from among the plurality of divided rectangular areas B.
  • if no recognition character has been confirmed yet, the leftmost rectangular area B1 is defined as the specific rectangular area; otherwise, the rectangular area Bn+1 following the rectangular area Bn whose recognition character has been confirmed is defined as the specific rectangular area.
  • in step S105, the determination unit 114 first inputs the image of the specific rectangular area into the character recognition model 121, and obtains a plurality of candidate characters and the accuracy p of each candidate character.
  • in step S106, the determination unit 114 checks whether the rectangular width W of the specific rectangular area is at or below the reference length Ws. If it is, the process proceeds to step S107, where the accuracy p is corrected to calculate the corrected accuracy pc, and then proceeds to step S108. If it is not, step S107 is skipped and the process proceeds to step S108.
  • in step S108, the determination unit 114 joins the continuous rectangular area Bn+1 to the rectangular area Bn, the specific rectangular area, to generate combined rectangular area 1.
  • in step S109, it is checked whether the combined width Wn + Wn+1 of the generated combined rectangular area 1 is at or below the maximum length Wm. If it is, the process proceeds to step S110, where the image of combined rectangular area 1 is input into the character recognition model 121 to obtain a plurality of candidate characters and the accuracy p of each candidate character.
  • similarly, the determination unit 114 joins the rectangular area Bn+2 continuous with rectangular areas Bn and Bn+1 to generate combined rectangular area 2. If the combined width Wn + Wn+1 + Wn+2 of combined rectangular area 2 is at or below the maximum length Wm (YES in step S109), the image of combined rectangular area 2 is likewise input into the character recognition model 121 to obtain a plurality of candidate characters and their accuracies p (step S110).
  • when the determination unit 114 determines in step S109 that the combined width of combined rectangular area i (where i is the number of rectangular areas B joined to the specific rectangular area) is larger than the maximum length Wm, the process proceeds to step S111. In step S111, the determination unit 114 determines whether to use as the recognition character the candidate character showing the maximum accuracy p (or the corrected accuracy pc, if corrected in step S107) among the candidates obtained by inputting the image of the specific rectangular area into the character recognition model 121, or the candidate character showing the maximum accuracy p among the candidates obtained by inputting the images of the combined rectangular areas into the character recognition model 121. Specifically, the respective accuracies are compared, and the candidate character corresponding to the larger one is used as the recognition character.
  • next, the determination unit 114 proceeds to step S112 and checks whether character recognition has been completed for all the rectangular areas B divided in step S103. If not, the process returns to step S104 and the character recognition process is repeated. If completed, the process proceeds to step S113, where the output unit 115 combines and outputs the recognition characters sequentially determined by the determination unit 114, ending the series of processes. If a character area LE remains for which character string recognition has not been completed, the processing from step S103 onward is executed for that character area LE. At this time, when the continuous character areas LE1, LE2, ... are recognized as one mutually related sentence, the output unit 115 may combine the recognition character strings generated in step S113 before outputting them. If a reading area DE remains for which the extraction of character areas has not been completed, the processing from step S102 onward is executed for that reading area DE.
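The flow of FIG. 9 (steps S104 to S111) can be condensed into the following sketch, with a stub standing in for the character recognition model 121. Every name here is illustrative, and the stub's scores simply reproduce the "都" example of FIG. 8 under the assumed widths.

```python
# Condensed sketch of the decision loop: for each specific region, score it
# alone (with the narrow-kanji correction) and combined with following
# regions up to Wm, keep the best-scoring candidate, then transition.
# model(start, end) returns (candidate, accuracy, is_kanji) for the image
# spanning regions [start, end).

def recognize_line(widths, model, w_max, w_ref):
    recognized, n = [], 0
    while n < len(widths):
        best_char, best_p, best_end = None, -1.0, n + 1
        total = 0
        for end in range(n, len(widths)):
            total += widths[end]
            if end > n and total > w_max:
                break                       # combined width exceeds Wm (S109)
            char, p, is_kanji = model(n, end + 1)
            if end == n and is_kanji and widths[n] <= w_ref:
                p *= 0.95                   # corrected accuracy pc (S107)
            if p > best_p:
                best_char, best_p, best_end = char, p, end + 1
        recognized.append(best_char)        # S111: adopt the best candidate
        n = best_end                        # transition the specific region
    return "".join(recognized)

# Stub model reproducing FIG. 8: the left radical alone scores 0.962
# (corrected to 0.9139), the combined region "都" scores 0.982, so the
# combined region wins and both rectangles are consumed.
def stub_model(start, end):
    table = {(0, 1): ("者", 0.962, True), (0, 2): ("都", 0.982, True)}
    return table[(start, end)]

print(recognize_line([10, 9], stub_model, w_max=20, w_ref=14))  # 都
```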
  • the character string recognition device and the character string recognition program of the present embodiment described above can exhibit high character recognition accuracy even for a character string in a variable-width font, a character string in which half-width and full-width characters are mixed, or a handwritten character string contained in the recognized document image. Moreover, they are not limited to Japanese and can be used for all languages, such as those using English characters, Hangul characters, Cyrillic characters, and Arabic characters.
  • in the case of vertical writing, the extraction unit 112 extracts the character areas LE from the image of the reading area DE so that each of them vertically contains a character string for one line.
  • the division unit 113 divides the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the vertical direction in which the characters are lined up.
  • the determination unit 114 then determines whether to use as the recognition character the candidate character obtained by inputting the image of the specific rectangular area under evaluation among the plurality of rectangular areas into the character recognition model 121, or the candidate character obtained by inputting the image of a combined rectangular area in which other continuous rectangular areas are joined below the specific rectangular area.
  • some kanji crowns (top radicals) are themselves valid kanji, but by recognizing a vertically written character string in this way, a crown can be prevented from being mistakenly recognized as a standalone kanji.
  • the character string is not limited to vertical or horizontal writing; it may be written diagonally or along a curved line as shown in FIG. 10, and the writing direction is not particularly limited. For example, when a reading area DE with a curved shape is extracted as shown in FIG. 10, the scanning window SW is set orthogonal to the curve direction, which is the character string direction, and the rectangular areas B are set based on the detected blank bands.
  • in this case, the divided rectangular areas B may be, for example, trapezoids or shapes in which some sides are curved.
  • in the above description, the membership card was used as an example, but the target document is not particularly limited: the present invention can be applied to the front and back of a driver's license, the front and back of a residence card, a My Number card, a My Number notification card, and any other standard or non-standard form.


Abstract

Provided are a character string recognition device that enables highly accurate character recognition even for a character string that is not in a monospaced font and the like. The character string recognition device is provided with: an extraction unit that extracts, on a line-by-line basis, a character region containing a character string from a document image including images of character strings; a division unit that divides the character region into a plurality of rectangular regions by separating the character region on the basis of a blank zone orthogonal to the arrangement direction in which the characters are arranged; a determination unit that determines whether to use, as a recognized character, a candidate character obtained by inputting an image of a particular rectangular region to be evaluated among the plurality of rectangular regions to a character recognition model that estimates a single character represented by the image, or a candidate character obtained by inputting, to the character recognition model, an image of a combined rectangular region that is a combination of the particular rectangular region with another rectangular region continuous from the particular rectangular region; and an output unit that outputs a recognized character string generated by combining a plurality of recognized characters sequentially determined by the determination unit while the particular rectangular region is transitioned over the plurality of rectangular regions.

Description

Character string recognition device and character string recognition program
 The present invention relates to a character string recognition device and a character string recognition program.
 An OCR technique for recognizing a character string contained in an image is known (see, for example, Patent Document 1).
JP-A-2019-175317
 Conventional OCR techniques cut out a character area for each character from an image containing a character string and estimate the single character represented by that area. Such a method works relatively well for monospaced fonts, in which every character has the same width, but accurate estimation is difficult for variable-width fonts, whose character widths are optimized to the character shapes. In particular, sufficient accuracy could not be obtained for character strings in which half-width and full-width characters are mixed.
 The present invention has been made to solve this problem, and provides a character string recognition device and the like that enable highly accurate character recognition even for character strings that are not in a monospaced font.
 The character string recognition device according to the first aspect of the present invention includes: an extraction unit that extracts, on a line-by-line basis, a character area containing a character string from a document image including imaged character strings; a division unit that divides the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged; a determination unit that determines whether the recognized character should be a candidate character obtained by inputting, to a character recognition model that estimates the single character represented by an image, the image of a specific rectangular area under evaluation among the plurality of rectangular areas, or a candidate character obtained by inputting to the character recognition model the image of a combined rectangular area in which other rectangular areas continuous with the specific rectangular area are joined to it; and an output unit that outputs a recognized character string generated by joining the plurality of recognized characters that the determination unit determines sequentially while transitioning the specific rectangular area over the plurality of rectangular areas. By determining recognized characters in this way, the character string recognition device can accurately recognize the character represented by a single-character area even when the widths of the characters differ.
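 The decision loop of this aspect can be sketched as follows. The function name recognize_line, the (x, width, image) tuple layout, the recognize(images) stand-in for the character recognition model, and the greedy control flow are illustrative assumptions for explanation, not the claimed implementation:

```python
def recognize_line(regions, recognize, max_len):
    """regions: list of (x, width, image) tuples in reading order,
    already split at blank bands.
    recognize: hypothetical model returning (char, confidence)."""
    result = []
    i = 0
    while i < len(regions):
        best_char, best_conf, best_take = None, -1.0, 1
        # Evaluate the specific region alone, then joined with the
        # following regions while the combined width is within max_len.
        for take in range(1, len(regions) - i + 1):
            x0 = regions[i][0]
            x1 = regions[i + take - 1][0] + regions[i + take - 1][1]
            if take > 1 and x1 - x0 > max_len:
                break
            # Tuple of part images stands in for cropping the combined area.
            merged = tuple(r[2] for r in regions[i:i + take])
            char, conf = recognize(merged)
            if conf > best_conf:
                best_char, best_conf, best_take = char, conf, take
        result.append(best_char)
        i += best_take  # transition the specific region past the consumed areas
    return "".join(result)
```

When a join scores higher than the specific region alone, the loop consumes all joined regions at once, so each character element contributes to exactly one recognized character.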
 In the above character string recognition device, the determination unit may generate a plurality of combined rectangular areas and input each of their images to the character recognition model, allowing a plurality of other rectangular areas to be joined to the specific rectangular area as long as the length of the resulting combined rectangular area in the arrangement direction does not exceed a set maximum length. When generating combined rectangular areas, considering not only the one rectangular area adjacent to the specific rectangular area but also further consecutive rectangular areas allows the single character represented in the character area to be recognized more accurately.
 In this case, the maximum length may be set based on the lengths of the plurality of rectangular areas in the arrangement direction. The character string to be recognized has various character widths depending on, for example, the font; setting the maximum length based on the lengths of the rectangular areas produced by the division unit yields a maximum length suited to that font.
 Further, in the above character string recognition device, when generating a combined rectangular area, the determination unit may include the blank bands sandwiched between the rectangular areas. For example, a character whose left-hand radical and right-hand component are separated is expected to have a slight blank band between them, so including the blank band in the combination improves the accuracy of character recognition.
 Further, in the above character string recognition device, when the candidate character obtained by inputting the image of the specific rectangular area to the character recognition model is a kanji and the length of the specific rectangular area in the arrangement direction is equal to or less than a set reference length, the determination unit may lower the confidence that the character recognition model calculated for that candidate character. For example, some left-hand radicals are themselves valid as a single kanji, but when used as a radical they are rendered narrower. Lowering the confidence calculated by the character recognition model in this way prevents a radical from being misrecognized as a kanji on its own.
 The character string recognition program according to the second aspect of the present invention causes a computer to execute: an extraction step of extracting, on a line-by-line basis, a character area containing a character string from a document image including imaged character strings; a division step of dividing the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged; a determination step of determining whether the recognized character should be a candidate character obtained by inputting, to a character recognition model that estimates the single character represented by an image, the image of a specific rectangular area under evaluation among the plurality of rectangular areas, or a candidate character obtained by inputting to the character recognition model the image of a combined rectangular area in which other rectangular areas continuous with the specific rectangular area are joined to it; and an output step of outputting a recognized character string generated by joining the plurality of recognized characters determined by repeating the determination step while transitioning the specific rectangular area over the plurality of rectangular areas. By determining recognized characters in this way, the character string recognition program can accurately recognize the character represented by a single-character area even when the widths of the characters differ.
 According to the present invention, it is possible to provide a character string recognition device and the like that enable highly accurate character recognition even for character strings that are not in a monospaced font.
FIG. 1 is a diagram showing an example of the usage situation of the character string recognition device according to the present embodiment.
FIG. 2 is a diagram showing the main hardware configuration of the character string recognition device.
FIG. 3 is a diagram showing an example of a designated reading area and an extracted character area.
FIG. 4 is a diagram explaining the process of dividing a character area into rectangular areas.
FIG. 5 is a diagram explaining the process of setting reference values from the divided rectangular areas.
FIG. 6 is a diagram explaining character recognition processing using the character recognition model.
FIG. 7 is a diagram explaining input target images to be input to the character recognition model.
FIG. 8 is a diagram explaining character recognition processing when area combination is performed.
FIG. 9 is a diagram showing the processing flow of the character string recognition processing.
FIG. 10 is a diagram showing examples of other character strings.
 Hereinafter, the present invention will be described through embodiments, but the claimed invention is not limited to the following embodiments. Moreover, not all of the configurations described in the embodiments are necessarily indispensable as means for solving the problem. In the following description, when multiple instances of the same object are described individually, subscripts may be attached. For example, the reading areas as a whole are denoted reading areas DE, and an individual reading area is denoted with a subscript, such as reading area DE1.
 FIG. 1 is a diagram showing an example of the usage situation of the character string recognition device 100 according to the present embodiment. The character string recognition device 100 is, for example, a PC, to which a scanner 200 and a monitor 300 are connected. The scanner 200 and the monitor 300 may be connected to the character string recognition device 100 via a network such as the Internet.
 The scanner 200 is a device that converts a target document on which character string recognition is to be performed into image data. In the present embodiment, the target document has reading areas DE containing printed character strings, as in the illustrated membership card 900. As shown, the membership card 900 has a reading area DE1 on which a name is printed, a reading area DE2 on which an address is printed, and a reading area DE3 on which the issue date of the membership card 900 is printed. The user has the character string recognition device recognize the character strings printed in the reading areas DE and convert them into character codes, and uses the resulting text data for purposes such as creating a database.
 When a large number of documents of the same type are processed, the reading areas DE may be preset in a template referenced by the character string recognition device 100; when diverse documents are processed individually, the user may set them each time. The character string recognition device 100 takes in the image data of the document image generated by the scanner 200 reading the membership card 900, and sequentially executes character string recognition processing on the set reading areas DE. In the following description, the case where the character strings are printed horizontally in the reading areas DE is described.
 The monitor 300 displays the recognized character strings recognized by the character string recognition device 100 as text data. For example, it may display the recognition results for the user to confirm, as illustrated, or display them entered into specific cells of a designated database.
 FIG. 2 is a diagram showing the main hardware configuration of the character string recognition device 100. The character string recognition device 100 mainly comprises a processing unit 110, a storage unit 120, and an input/output IF 130. The processing unit 110 executes various kinds of information processing and is realized by a processor such as a CPU or GPU and a memory. By loading a program stored in the storage unit 120 into the CPU, GPU, or the like of the computer, the processing unit 110 functions as an acquisition unit 111, an extraction unit 112, a division unit 113, a determination unit 114, and an output unit 115.
 The acquisition unit 111 acquires the image data sent from the scanner 200 and expands the document image into the memory of the processing unit 110. The extraction unit 112 extracts, on a line-by-line basis, character areas containing imaged character strings from the document image. The division unit 113 divides each character area extracted by the extraction unit 112 into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged.
 The determination unit 114 determines whether the recognized character should be a candidate character obtained by inputting to the character recognition model 121 the image of a specific rectangular area under evaluation among the plurality of divided rectangular areas, or a candidate character obtained by inputting to the character recognition model 121 the image of a combined rectangular area generated by joining other rectangular areas continuous with the specific rectangular area. The output unit 115 outputs a recognized character string generated by joining the recognized characters that the determination unit 114 determines sequentially while transitioning the specific rectangular area. The specific processing of these functional units is described in detail later.
 The storage unit 120 stores various kinds of information and is realized by an arbitrary storage device such as a memory or a hard disk. In the present embodiment, the storage unit 120 stores information such as the weights of the neural network that constitutes the character recognition model 121.
 The character recognition model 121 estimates, in response to the input of an image of a rectangular area, the single character represented by that image. Specifically, for an input image, it outputs a plurality of candidate characters together with a confidence for each candidate character. Candidate characters are output in association with a character code, for example the JIS kanji code (JIS X 0208). The character recognition model 121 is constructed by, for example, a convolutional neural network (CNN) whose weights have been adjusted based on teacher images of printed characters. However, another neural network may be used, or no neural network at all; the model may also be constructed on a rule basis.
 The input/output IF 130 is an input/output interface through which the processing unit 110 exchanges information with the scanner 200 and the monitor 300. Specifically, it is realized by a USB interface or a LAN interface. The input/output IF 130 may be connected to input devices such as a keyboard, a mouse, or a touch panel, and to output devices such as a speaker.
 FIG. 3 is a diagram showing an example of the designated reading area DE and the extracted character areas LE. The figure is an image obtained by reading the reading area DE2, on which the address on the membership card 900 is printed, and visually represents the image of the reading area DE2 that the acquisition unit 111 has expanded in memory.
 The extraction unit 112 extracts character areas LE1 and LE2 from the image of the reading area DE2 such that each contains one line's worth of the character string. Specifically, the extraction unit 112 binarizes the reading area DE2 and applies dilation in the line direction, turning each continuous character string into a set of pixels connected along the line direction. Here, pixel sets for two lines are generated, so the character areas LE1 and LE2 are extracted by enclosing each pixel set in a rectangle.
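 The line-by-line extraction can be sketched as follows. For simplicity, this sketch replaces the morphological dilation and bounding-box procedure with a row projection over a binarized 0/1 grid; the function name and input format are illustrative assumptions, not the described implementation:

```python
def extract_line_regions(binary):
    """binary: 2D list (rows x cols) of 0/1 ink values for a reading area.
    Returns (top, bottom) row ranges, one per text line, by grouping
    consecutive rows that contain any ink."""
    rows_with_ink = [any(row) for row in binary]
    lines, start = [], None
    for y, ink in enumerate(rows_with_ink):
        if ink and start is None:
            start = y                      # first inked row of a line
        elif not ink and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(binary)))  # line runs to the bottom edge
    return lines
```

Each returned row range corresponds to one character area LE; a real implementation would also record the horizontal extent of the ink to form the enclosing rectangle.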
 FIG. 4 is a diagram explaining the process of dividing the character area LE1 into a plurality of rectangular areas B. FIG. 4(a) shows the division unit 113 scanning a scanning window SW in the direction in which the character string is arranged and calculating luminance values. The scanning window SW has the height of the character area LE1 and a width of a few pixels, and the division unit 113 calculates the luminance value at each scanning position of the scanning window SW. The luminance value is calculated, for example, as the sum of the pixel values in the region contained in the scanning window SW.
 FIG. 4(b) shows the luminance value at each scanning position of the scanning window SW, aligned with the coordinates along the arrangement direction of the character string in FIG. 4(a); the vertical axis represents the luminance value. Here, the luminance value is shown as larger the denser the characters within the scanning window SW. When the character string is 「東京都品川区」 ("Shinagawa-ku, Tokyo") as in the illustrated example, blank bands with a luminance value of 0 occur between characters, between the left-hand radical and right-hand component of 「都」, between the three vertical strokes of 「川」, in front of 「東」, and behind 「区」. The division unit 113 divides the character area LE1 into rectangular areas B1 to B9, each containing a character element, by separating the character area LE1 before and after these blank bands orthogonal to the arrangement direction of the character string.
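 The splitting at blank bands can be sketched with a one-pixel-wide scanning window, i.e. a column projection. The function name and the 0/1 input format are illustrative assumptions:

```python
def split_by_blank_bands(line_img):
    """line_img: 2D list (rows x cols) of 0/1 ink values for one line.
    Returns (x_start, x_end) spans of the rectangular areas B, split
    wherever the column-wise luminance (ink count) drops to zero."""
    ncols = len(line_img[0])
    col_ink = [sum(row[x] for row in line_img) for x in range(ncols)]
    regions, start = [], None
    for x, v in enumerate(col_ink):
        if v > 0 and start is None:
            start = x                      # region begins at first inked column
        elif v == 0 and start is not None:
            regions.append((start, x))     # blank band closes the region
            start = None
    if start is not None:
        regions.append((start, ncols))
    return regions
```

Because the split is purely based on blank bands, a single character such as 「都」 or 「川」 may yield two or more spans, which is exactly the situation the combining step later addresses.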
 FIG. 4(c) shows the character elements of 「東京都品川区」 each enclosed in one of the rectangular areas B1 to B9. As illustrated, in the present embodiment one character is not necessarily enclosed in a single rectangular area: for example, 「都」 is enclosed in separate rectangular areas for its left-hand radical (rectangular area B3) and its right-hand component (rectangular area B4). Similarly, the three vertical strokes of 「川」 are individually enclosed in rectangular areas B6 to B8.
 Next, the reference values set for the rectangular areas B divided in this way are described. FIG. 5 is a diagram explaining the process of setting the reference values from the divided rectangular areas B. The determination unit 114 calculates the rectangular widths W1 to W9 of the rectangular areas B1 to B9, and selects the top 25% of these widths sorted in descending order. Here, since there are nine rectangular areas, the top two are selected: specifically, the rectangular width W1 of the rectangular area B1 containing 「東」 and the rectangular width W9 of the rectangular area B9 containing 「区」.
 The maximum length Wm used in later processing is set to 1.5 times the average of the selected rectangular widths W1 and W9. What percentage of the top widths is selected, and what coefficient is applied, may be changed according to circumstances, for example set according to the nature of the document to be read or the number of divided rectangular areas B. The reference length Ws, also used in later processing, is the average of all the rectangular widths W1 to W9.
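 The computation of the maximum length Wm and the reference length Ws can be sketched as follows. The 25% fraction and the factor 1.5 follow the example above and are exposed as parameters, since the text notes they may be changed according to circumstances; the rounding of the top-k count is an arbitrary choice for the sketch:

```python
def length_thresholds(widths, top_frac=0.25, factor=1.5):
    """widths: rectangular widths W1..Wn of the divided areas.
    Returns (w_max, w_ref): Wm is `factor` times the mean of the
    widest `top_frac` of regions; Ws is the mean of all widths."""
    k = max(1, int(len(widths) * top_frac))   # at least one region
    top = sorted(widths, reverse=True)[:k]
    w_max = factor * sum(top) / len(top)
    w_ref = sum(widths) / len(widths)
    return w_max, w_ref
```

With nine regions, int(9 * 0.25) = 2 regions are selected, matching the example in which W1 and W9 determine Wm.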
 FIG. 6 is a diagram explaining character recognition processing using the character recognition model 121. First, the case where the specific rectangular area under evaluation is the rectangular area B1 and the character recognition model 121 estimates from this rectangular area B1 alone is described.
 When an image of a rectangular area is input, the character recognition model 121 estimates the single character represented by the image and outputs a plurality of candidate characters together with the confidence p of each candidate character. In other words, the character recognition model 121 estimates candidate characters on the assumption that the input image represents one character, even when the image of the input rectangular area does not actually represent a whole character (for example, when it represents only a left-hand radical, as described later).
 FIG. 7 is a diagram explaining the input target images to be input to the character recognition model 121. In the present embodiment, as described above, the division unit 113 divides the character string into rectangular areas by separating it at blank bands orthogonal to the arrangement direction of the character string. In addition, the character recognition model 121 estimates candidate characters as representing one character even when the image of the input rectangular area does not actually represent a whole character. Therefore, if one divided rectangular area were designated as the specific rectangular area and the recognized character were determined from its image alone, for example, only the left-hand radical of a kanji would be recognized as one character.
 Accordingly, in the present embodiment, not only the image of the specific rectangular area is evaluated; images of combined rectangular areas, in which other rectangular areas continuous with the designated specific rectangular area are joined to it, are also evaluated. FIG. 7(a) shows the case where the specific rectangular area under evaluation is the rectangular area B3, specifically the left-hand radical of 「都」. In this case, zero other rectangular areas are joined, so i = 0.
 FIG. 7(b) shows a combined rectangular area 1 generated by joining the adjacent rectangular area B4 to the rectangular area B3, which is the specific rectangular area. Since one other rectangular area is joined to the specific rectangular area, i = 1. When generating such a combined rectangular area, the determination unit 114 includes the blank band sandwiched between the rectangular area B3 and the rectangular area B4. If the image combining the rectangular areas B3 and B4 represents one character, including the blank band that originally existed between them gives the correct balance as a single character, which improves the recognition accuracy of the character recognition model 121.
 Here, the width of the rectangular area B3, which is the specific rectangular area, is W3, and the width of the joined rectangular area B4 is W4. Their sum W3 + W4 is smaller than the maximum length Wm described with reference to FIG. 5. Therefore, the determination unit 114 takes the generated image of the combined rectangular area 1 as an evaluation target and inputs it to the character recognition model 121. In this example the image of the combined rectangular area 1 is evaluated, but if W3 + W4 were larger than the maximum length Wm, the image of the combined rectangular area 1 would be excluded from evaluation and the determination unit 114 would evaluate only the specific rectangular area. The example of 「東」 described with reference to FIG. 6 is an example in which only the specific rectangular area was evaluated.
 FIG. 7(c) shows a combined rectangular area 2 generated by joining the consecutive rectangular areas B4 and B5 to the rectangular area B3, which is the specific rectangular area. Since two other rectangular areas are joined to the specific rectangular area, i = 2. In this case as well, the determination unit 114 includes the blank bands between the respective rectangular areas.
 Here, the width of the rectangular area B3, which is the specific rectangular area, is W3, and the widths of the joined rectangular areas B4 and B5 are W4 and W5, respectively. Their sum W3 + W4 + W5 is larger than the maximum length Wm. Therefore, the determination unit 114 excludes the generated image of the combined rectangular area 2 from evaluation. In this example the image of the combined rectangular area 2 is excluded, but if W3 + W4 + W5 were smaller than the maximum length Wm, the image of the combined rectangular area 2 would also be evaluated.
 In this way, by allowing one or more other rectangular areas continuous with the specific rectangular area to be joined within the range not exceeding the maximum length Wm, the determination unit 114 can recognize the single character represented in the character area more accurately. Moreover, since the maximum length Wm is set based on the widths of the divided rectangular areas B, it takes a value suited to the font used in the character string, and the number of rectangular areas joined to the specific rectangular area (the upper limit of i) is optimized.
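 The enumeration of evaluation targets, joining consecutive regions with their intervening blank bands while the combined width stays within Wm, can be sketched as follows (function name and span representation are illustrative assumptions):

```python
def candidate_regions(spans, i, w_max):
    """spans: list of (x_start, x_end) for the rectangular areas B in
    reading order. Enumerates the combined areas to evaluate for
    specific region i: the region alone, then with 1, 2, ... following
    regions joined. Spanning from the first x_start to the last x_end
    automatically includes the blank bands between regions."""
    out = []
    for j in range(i, len(spans)):
        x0, x1 = spans[i][0], spans[j][1]
        if x1 - x0 > w_max:   # combined width exceeds Wm: stop joining
            break
        out.append((x0, x1))
    return out
```

For the 「都」 example, i pointing at B3 would yield the radical alone (i = 0) and the join with B4 (i = 1), while the join through B5 is cut off by Wm.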
 FIG. 8 is a diagram explaining character recognition processing when area combination is performed as described with reference to FIG. 7. As shown in FIG. 8(a), the determination unit 114 first inputs the image of the specific rectangular area, the left-hand radical of 「都」, to the character recognition model 121. The character recognition model 121 outputs 「者」, 「日」, and so on as candidate characters together with their confidences p. In the illustrated example, the confidence of 「者」 is p = 0.962 and the confidence of 「日」 is p = 0.021.
 Here, the output candidate characters are kanji, and the rectangular width of the specific rectangular region is W3, which is smaller than the reference length Ws described with reference to FIG. 5. The determination unit 114 therefore multiplies each confidence p calculated for the output candidate characters by 0.95 to lower it, yielding the corrected confidence pc. Specifically, the corrected confidence of "者" is pc = 0.9139 and that of "日" is pc = 0.01995.
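The correction is a single multiplication. A minimal sketch, where the function name and arguments are illustrative and 0.95 is the example coefficient from this paragraph:

```python
KANJI_COEFF = 0.95  # preset coefficient less than 1 (value from the example)

def corrected_confidence(p, is_kanji, width, w_ref):
    """Return the corrected confidence pc: lowered only when the candidate
    is a kanji and the region is narrower than the reference length w_ref."""
    if is_kanji and width < w_ref:
        return p * KANJI_COEFF
    return p

print(round(corrected_confidence(0.962, True, 20, 30), 4))   # 0.9139, as for "者"
print(round(corrected_confidence(0.021, True, 20, 30), 5))   # 0.01995, as for "日"
```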
 Next, as shown in FIG. 8(b), the determination unit 114 inputs the image of combined rectangular region 1, "都", into the character recognition model 121. The character recognition model 121 outputs "都", "郡", and so on as candidate characters together with their confidences p. In the illustrated example, the confidence of "都" is p = 0.982 and that of "郡" is p = 0.013. If the images of combined rectangular regions 2, 3, ... were also evaluation targets, the determination unit 114 would input those images into the character recognition model 121 as well to obtain candidate characters and their confidences.
 Then, as shown in FIG. 8(c), the determination unit 114 compares the corrected confidences pc of the candidate characters for the image of the specific rectangular region with the confidences p of the candidate characters for the image of combined rectangular region 1, searches for the maximum value among them, and determines the candidate character corresponding to that maximum as the recognized character. Specifically, it compares the largest corrected confidence pc for the image of the specific rectangular region (0.9139 in the example of FIG. 8(a)) with the largest confidence p for the image of the combined rectangular region (0.982 in the example of FIG. 8(b)), selects the larger of the two (0.982 in these examples), and determines the corresponding candidate character as the recognized character. By executing this processing, the determination unit 114 determines "都" as the recognized character; that is, the single character to be recognized is determined to be the character at ku 37, ten 52 of the JIS kanji code (JIS X 0208).
 For the specific rectangular region, as described above, when the output candidate characters are kanji and the rectangular width is smaller than the reference length Ws, the determination unit 114 multiplies the confidence p by a preset coefficient less than 1 to calculate the corrected confidence pc. Adopting this lowered corrected confidence pc prevents a radical, which occupies a comparatively narrow width, from being misrecognized as a complete kanji.
 The determination unit 114 shifts the specific rectangular region to the next rectangular region Bn+1 following the rectangular region Bn whose recognized character has been settled, and executes the next character recognition. Specifically, since the recognition of "都" settles the regions up to rectangular region B4, the determination unit 114 designates rectangular region B5, which follows B4, as the new specific rectangular region and continues character recognition. The output unit 115 combines the recognized characters thus sequentially determined by the determination unit 114 to generate a recognized character string, and outputs it to the storage unit 120 for storage, or to the monitor 300 for display via the input/output IF 130.
 Next, the flow of this series of character string recognition processes will be described with reference to a flowchart. FIG. 9 shows the processing flow of the character string recognition process. The flow starts when the image data of the target document is sent from the scanner 200.
 In step S101, the acquisition unit 111 acquires the image data from the scanner 200 via the input/output IF 130 and loads the document image into the memory of the processing unit 110. In step S102, the extraction unit 112 extracts character regions LE from the designated reading region DE, each encompassing one line of the character string, as described with reference to FIG. 3. The process then proceeds to step S103, in which the division unit 113 divides the character region LE extracted by the extraction unit into a plurality of rectangular regions B by partitioning it along blank bands orthogonal to the direction in which the characters are arranged, as described with reference to FIG. 4.
 In step S104, the determination unit 114 sets, from among the divided rectangular regions B, the specific rectangular region to be evaluated. When step S104 is executed for the first time, the leftmost rectangular region B1 is designated as the specific rectangular region; otherwise, the next rectangular region Bn+1 following the rectangular region Bn whose recognized character has been settled is designated. In step S105, the determination unit 114 first inputs the image of the specific rectangular region into the character recognition model 121 and obtains a plurality of candidate characters and the confidence p of each. In the following step S106, the determination unit 114 checks whether the rectangular width W of the specific rectangular region is less than or equal to the reference length Ws; if so, the process proceeds to step S107, where the confidence p is corrected to calculate the corrected confidence pc, and then to step S108. Otherwise, step S107 is skipped and the process proceeds to step S108.
 In step S108, the determination unit 114 combines the contiguous rectangular region Bn+1 with the rectangular region Bn, which is the specific rectangular region, to generate combined rectangular region 1. In the following step S109, it checks whether the combined width Wn + Wn+1 of the generated combined rectangular region 1 is less than or equal to the maximum length Wm. If so, the process proceeds to step S110, where the image of combined rectangular region 1 is input into the character recognition model 121 to obtain a plurality of candidate characters and the confidence p of each. The process then returns to step S108, and the determination unit 114 combines the rectangular region Bn+2, contiguous with Bn and Bn+1, to generate combined rectangular region 2. If the combined width Wn + Wn+1 + Wn+2 of combined rectangular region 2 is less than or equal to the maximum length Wm (YES in step S109), the image of combined rectangular region 2 is likewise input into the character recognition model 121 to obtain a plurality of candidate characters and the confidence p of each (step S110).
 When the determination unit 114 determines in step S109 that the combined width of combined rectangular region i (where i is the number of rectangular regions B combined with the specific rectangular region) exceeds the maximum length Wm, the process proceeds to step S111. In step S111, the determination unit 114 decides whether to adopt as the recognized character the candidate showing the highest confidence p (or corrected confidence pc, if corrected in step S107) among the candidates obtained by inputting the image of the specific rectangular region into the character recognition model 121, or the candidate showing the highest confidence p among the candidates obtained by inputting the images of the combined rectangular regions into the character recognition model 121. Specifically, the respective confidences are compared and the candidate character corresponding to the larger value becomes the recognized character.
 The determination unit 114 proceeds to step S112 and checks whether character recognition has been completed for all the rectangular regions B obtained by the division in step S103. If not, the process returns to step S104 and the character recognition processing is repeated. If it has been completed, the process proceeds to step S113, in which the output unit 115 ends the series of processes by combining the sequentially determined recognized characters and outputting them. If any character region LE whose character string recognition has not yet been completed remains, the processing from step S103 onward is executed for that character region LE. In this case, when consecutive character regions LE1, LE2, ... are recognized as a single, mutually related sentence, the output unit 115 may combine the recognized character strings generated in step S113 before outputting them. Furthermore, if any reading region DE whose character regions have not yet been extracted remains, the processing from step S102 onward is executed for that reading region DE.
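Taken together, steps S104 through S113 amount to the following loop. This is a hedged sketch, not the patent's code: strings stand in for region images, and `model` is a hypothetical stand-in for the character recognition model 121, assumed to return (candidate, confidence, is_kanji) tuples.

```python
def recognize_line(regions, model, w_max, w_ref, coeff=0.95):
    """Sketch of steps S104-S113 for one character region LE.
    `regions` is a list of (image, width) pairs for the rectangular
    regions B1, B2, ...; images are modeled as strings so that merging
    regions is simple concatenation."""
    out = []
    n = 0
    while n < len(regions):                      # S104: set specific region
        img, width = regions[n]
        best_char, best_p, consumed = "", -1.0, 1
        for ch, p, kanji in model(img):          # S105: candidates + confidence
            if kanji and width <= w_ref:         # S106/S107: corrected pc
                p *= coeff
            if p > best_p:
                best_char, best_p = ch, p
        total, merged = width, img
        for k in range(n + 1, len(regions)):     # S108: grow combined region
            img_k, w_k = regions[k]
            total += w_k
            if total > w_max:                    # S109: stop at max length Wm
                break
            merged += img_k
            for ch, p, _ in model(merged):       # S110: evaluate combined image
                if p > best_p:                   # S111: keep the larger value
                    best_char, best_p, consumed = ch, p, k - n + 1
        out.append(best_char)
        n += consumed                            # advance past settled regions
    return "".join(out)                          # S113: recognized string
```

For instance, with a toy model that scores the merged image of two adjacent regions higher than either region alone, both regions are consumed as one recognized character, mirroring how the radical and right-hand side of "都" are merged in FIG. 8.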
 The character string recognition device and character string recognition program of the present embodiment described above can achieve high character recognition accuracy even when the character string contained in the document image to be recognized is in a variable-width font, mixes half-width and full-width characters, or is handwritten. The invention is not limited to Japanese and can be applied to languages in general, including those written in Latin, Hangul, Cyrillic, or Arabic script.
 In the above embodiment, the case where the character string is printed horizontally in the reading region DE has been described, but the same processing can be applied when the character string is printed vertically in the reading region DE. Specifically, the extraction unit 112 first extracts character regions LE from the image of the reading region DE so that each vertically encompasses one line of the character string. Next, the division unit 113 divides the character region into a plurality of rectangular regions by partitioning it along blank bands orthogonal to the vertical direction in which the characters are arranged. The determination unit 114 then determines whether to adopt as the recognized character the candidate obtained by inputting into the character recognition model 121 the image of the specific rectangular region to be evaluated among these rectangular regions, or the candidate obtained by inputting into the character recognition model 121 the image of a combined rectangular region in which another rectangular region contiguous with the specific rectangular region is joined below it. Some kanji crowns (top radicals) are themselves valid kanji, but recognizing a vertically written character string in this way prevents a crown from being misrecognized as a complete kanji. The character string is not limited to vertical or horizontal writing; as shown in FIG. 10, it may be written diagonally or along a curve, and the direction of writing is not particularly limited. For example, when the reading region DE is extracted with a curved shape as shown in FIG. 12(b), the scanning window SW may be set orthogonal to the curve direction, which is the character string direction, and the rectangular regions B may be set based on the detected blank bands. In this case, the divided rectangular regions B may be, for example, trapezoids or shapes in which some sides are curved.
 In the present embodiment, a membership card has been described as an example, but the present invention is not limited to this and can be applied to any standard or non-standard form, such as the front and back of a driver's license, the front and back of a residence card, a My Number card, or a My Number notification card.
 100 ... character string recognition device; 110 ... processing unit; 111 ... acquisition unit; 112 ... extraction unit; 113 ... division unit; 114 ... determination unit; 115 ... output unit; 120 ... storage unit; 121 ... character recognition model; 130 ... input/output IF; 200 ... scanner; 300 ... monitor; 900 ... membership card

Claims (6)

  1.  A character string recognition device comprising:
     an extraction unit that extracts, line by line, a character region encompassing a character string from a document image containing the imaged character string;
     a division unit that divides the character region into a plurality of rectangular regions by partitioning it along blank bands orthogonal to the direction in which the characters are arranged;
     a determination unit that determines whether to adopt, as a recognized character, a candidate character obtained by inputting an image of a specific rectangular region to be evaluated among the plurality of rectangular regions into a character recognition model that estimates a single character represented in an image, or a candidate character obtained by inputting into the character recognition model an image of a combined rectangular region in which another rectangular region contiguous with the specific rectangular region is joined to the specific rectangular region; and
     an output unit that outputs a recognized character string generated by combining a plurality of the recognized characters sequentially determined by the determination unit while shifting the specific rectangular region through the plurality of rectangular regions.
  2.  The character string recognition device according to claim 1, wherein the determination unit generates a plurality of the combined rectangular regions and inputs each of their images into the character recognition model by allowing a plurality of the other rectangular regions to be combined with the specific rectangular region within a range in which the length of the generated combined rectangular region in the arrangement direction is less than or equal to a set maximum length.
  3.  The character string recognition device according to claim 2, wherein the maximum length is set based on the respective lengths of the plurality of rectangular regions in the arrangement direction.
  4.  The character string recognition device according to any one of claims 1 to 3, wherein, when generating the combined rectangular region, the determination unit includes in the combination the blank band sandwiched between the rectangular regions.
  5.  The character string recognition device according to any one of claims 1 to 4, wherein, when the length of the specific rectangular region in the arrangement direction is less than or equal to a set reference length, the determination unit lowers the confidence calculated by the character recognition model for the candidate character.
  6.  A character string recognition program that causes a computer to execute:
     an extraction step of extracting, line by line, a character region encompassing a character string from a document image containing the imaged character string;
     a division step of dividing the character region into a plurality of rectangular regions by partitioning it along blank bands orthogonal to the direction in which the characters are arranged;
     a determination step of determining whether to adopt, as a recognized character, a candidate character obtained by inputting an image of a specific rectangular region to be evaluated among the plurality of rectangular regions into a character recognition model that estimates a single character represented in an image, or a candidate character obtained by inputting into the character recognition model an image of a combined rectangular region in which another rectangular region contiguous with the specific rectangular region is joined to the specific rectangular region; and
     an output step of outputting a recognized character string generated by combining a plurality of the recognized characters sequentially determined by repeating the determination step while shifting the specific rectangular region through the plurality of rectangular regions.

PCT/JP2021/002588 2020-02-06 2021-01-26 Character string recognition device and character string recognition program WO2021157422A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021575740A JP7382544B2 (en) 2020-02-06 2021-01-26 String recognition device and string recognition program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020018529 2020-02-06
JP2020-018529 2020-02-06

Publications (1)

Publication Number Publication Date
WO2021157422A1 true WO2021157422A1 (en) 2021-08-12

Family

ID=77199950

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/002588 WO2021157422A1 (en) 2020-02-06 2021-01-26 Character string recognition device and character string recognition program

Country Status (2)

Country Link
JP (1) JP7382544B2 (en)
WO (1) WO2021157422A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04211884A (en) * 1990-05-24 1992-08-03 Ricoh Co Ltd Method for segmenting character
JP6057112B1 (en) * 2016-04-19 2017-01-11 AI inside株式会社 Character recognition apparatus, method and program
JP2017531262A (en) * 2014-09-16 2017-10-19 アイフライテック カンパニー, リミテッドIflytek Co., Ltd. Intelligent scoring method and system for descriptive problems


Also Published As

Publication number Publication date
JPWO2021157422A1 (en) 2021-08-12
JP7382544B2 (en) 2023-11-17


Legal Events

- 121: EP — the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21750054; Country of ref document: EP; Kind code of ref document: A1)
- ENP: Entry into the national phase (Ref document number: 2021575740; Country of ref document: JP; Kind code of ref document: A)
- NENP: Non-entry into the national phase (Ref country code: DE)
- 122: EP — PCT application non-entry in European phase (Ref document number: 21750054; Country of ref document: EP; Kind code of ref document: A1)