WO2021157422A1 - Character string recognition device and character string recognition program - Google Patents

Character string recognition device and character string recognition program Download PDF

Info

Publication number
WO2021157422A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
rectangular
recognition
character string
area
Prior art date
Application number
PCT/JP2021/002588
Other languages
French (fr)
Japanese (ja)
Inventor
康介 木戸
Original Assignee
Arithmer株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arithmer株式会社 filed Critical Arithmer株式会社
Priority to JP2021575740A priority Critical patent/JP7382544B2/en
Publication of WO2021157422A1 publication Critical patent/WO2021157422A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Definitions

  • the present invention relates to a character string recognition device and a character string recognition program.
  • the conventional OCR technology cuts out a character area for each character from an image containing a character string and estimates the single character represented by that area.
  • such a method can estimate relatively correctly for a monospaced font, in which every character has the same width, but highly accurate estimation was difficult for a character string in a variable-width font, whose character widths are optimized to the character shapes. In particular, sufficient accuracy could not be obtained for character strings in which half-width and full-width characters are mixed.
  • the present invention has been made to solve such a problem, and provides a character string recognition device and the like that enable highly accurate character recognition even when the character string is not in a monospaced font.
  • the character string recognition device includes: an extraction unit that extracts, on a line-by-line basis, a character area containing a character string from a document image including the imaged character string; a division unit that divides the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged; a determination unit that determines whether to use as the recognition character a candidate character obtained by inputting the image of a specific rectangular area under evaluation among the plurality of rectangular areas into a character recognition model that estimates the single character represented in an image, or a candidate character obtained by inputting into the model the image of a combined rectangular area formed by joining other rectangular areas continuous with the specific rectangular area to it; and an output unit that outputs a recognition character string generated by combining the recognition characters sequentially determined while the specific rectangular area transitions over the plurality of rectangular areas.
  • the determination unit may generate a plurality of combined rectangular areas, combining a plurality of other rectangular areas with the specific rectangular area as long as the length of each generated combined rectangular area in the arrangement direction is at or below a set maximum length, and input each image into the character recognition model. When generating combined rectangular areas, targeting for combination not only the one rectangular area adjacent to the specific rectangular area but also further continuous rectangular areas allows the single character represented in the character area to be recognized more accurately.
  • the maximum length may be set based on the respective lengths in the arrangement direction of the plurality of rectangular areas.
  • the character string to be recognized has various per-character widths depending on, for example, the type of font, but if the maximum length is set based on the lengths, in the arrangement direction, of the rectangular areas produced by the division unit, a maximum length suited to that font can be obtained.
  • when generating a combined rectangular area, the determination unit may include in the combination the blank band sandwiched between the rectangular areas. For example, in a kanji whose left radical and right component are separated left and right, a slight blank band is expected between them; combining across the blank band can therefore improve the accuracy of character recognition.
  • when the candidate character obtained by inputting the image of the specific rectangular area into the character recognition model is a kanji and the length of the specific rectangular area in the arrangement direction is at or below a set reference length, the accuracy calculated by the character recognition model for that candidate may be lowered. For example, some kanji radicals are themselves valid kanji, but when used as a radical their width is narrowed. Lowering the calculated accuracy in this way prevents a radical from being erroneously recognized as a standalone kanji.
  • the character string recognition program causes a computer to execute: an extraction step of extracting, on a line-by-line basis, a character area containing a character string from a document image including the imaged character string; a division step of dividing the character area into a plurality of rectangular areas at blank bands orthogonal to the direction in which the characters are arranged; a determination step of determining whether to use as the recognition character a candidate character obtained by inputting the image of a specific rectangular area into the character recognition model, or a candidate character obtained by inputting the image of a combined rectangular area formed by joining other continuous rectangular areas to the specific rectangular area; and an output step of outputting a recognition character string generated by combining the plurality of recognition characters sequentially determined by repeating the determination step while transitioning the specific rectangular area over the plurality of rectangular areas.
  • FIG. 1 is a diagram showing an example of a usage status of the character string recognition device 100 according to the present embodiment.
  • the character string recognition device 100 is, for example, a PC, to which the scanner 200 and the monitor 300 are connected.
  • the scanner 200 and the monitor 300 may be connected to the character string recognition device 100 via a network such as the Internet.
  • the scanner 200 is a device that converts a target document for which character string recognition is desired into image data.
  • the target document has a reading area DE containing a printed character string; an example is the membership card 900 shown in the figure.
  • the membership card 900 has a reading area DE1 on which the name is printed, a reading area DE2 on which the address is printed, and a reading area DE3 on which the issue date of the membership card 900 is printed.
  • the user has the character string recognition device recognize the character strings printed in the reading areas DE, converting them into character codes, and uses the resulting text data, for example, to create a database.
  • the reading areas DE may be preset in a template referenced by the character string recognition device 100, or, when various documents are processed individually, the user may set them each time.
  • the character string recognition device 100 takes in the image data of the document image generated by reading the membership card 900 by the scanner 200, and sequentially executes the character string recognition process for the set reading area DE. In the following description, a case where the character string is printed horizontally in the reading area DE will be described.
  • the monitor 300 displays the recognition character string recognized by the character string recognition device 100 as text data. For example, as shown in the figure, the recognition result may be displayed for the user to confirm, or it may be shown entered in a specific cell of a designated database.
  • FIG. 2 is a main hardware configuration diagram of the character string recognition device 100.
  • the character string recognition device 100 is mainly composed of a processing unit 110, a storage unit 120, and an input / output IF 130.
  • the processing unit 110 executes various types of information processing, and is realized by a processor such as a CPU or GPU and a memory.
  • the processing unit 110 functions as the acquisition unit 111, the extraction unit 112, the division unit 113, the determination unit 114, and the output unit 115.
  • the acquisition unit 111 acquires the image data sent from the scanner 200 and expands the document image on the memory of the processing unit 110.
  • the extraction unit 112 extracts a character area including the imaged character string from the document image on a line-by-line basis.
  • the division unit 113 divides the character area extracted by the extraction unit 112 into a plurality of rectangular areas by separating it at blank bands orthogonal to the arrangement direction in which the characters are arranged.
  • among the plurality of divided rectangular areas, the determination unit 114 determines whether to use as the recognition character the candidate character obtained by inputting the image of the specific rectangular area under evaluation into the character recognition model 121, or the candidate character obtained by inputting the image of a combined rectangular area generated by joining rectangular areas continuous with the specific rectangular area.
  • the output unit 115 outputs a recognition character string generated by combining the recognition characters sequentially determined by the determination unit 114 while transitioning the specific rectangular area. Specific processing of these functional parts will be described in detail later.
  • the storage unit 120 stores various types of information, and is realized by an arbitrary storage device such as a memory or a hard disk. In the present embodiment, the storage unit 120 stores information such as the weights of the neural network that constitutes the character recognition model 121.
  • in response to the input of an image of a rectangular area, the character recognition model 121 estimates the single character represented by that image. Specifically, it outputs a plurality of candidate characters and the accuracy of each candidate for the input image. Candidate characters are output in association with a character code such as the JIS kanji code (JIS X 0208).
  • the character recognition model 121 is constructed by, for example, a convolutional neural network (CNN) whose weights are adjusted based on teacher images showing printed characters. However, another type of neural network may be used, or the model may be constructed on a rule basis without using a neural network at all.
  • the input / output IF 130 is an input / output interface for the processing unit 110 to exchange information with the scanner 200 and the monitor 300. Specifically, it is realized by a USB interface or a LAN interface.
  • the input / output IF 130 may be connected to an input device such as a keyboard, mouse, or touch panel, or may be connected to an output device such as a speaker.
  • FIG. 3 is a diagram showing an example of the designated reading area DE and the extracted character area LE.
  • the figure shows the image of reading area DE2, on which the address is printed in the membership card 900, as expanded on the memory by the acquisition unit 111.
  • the extraction unit 112 extracts the character areas LE1 and LE2 from the image of reading area DE2 so that each of them contains a character string for one line. Specifically, the extraction unit 112 binarizes the reading area DE2 and performs dilation processing in the row direction so that a continuous character string forms a pixel set that is continuous in the row direction.
  • the character areas LE1 and LE2 are then extracted by enclosing each pixel set with a rectangle.
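The line-extraction step above can be sketched roughly as follows. This is an illustrative stand-in only: it uses a simple row projection instead of the binarize-and-dilate grouping the embodiment describes, and all names are hypothetical rather than taken from the patent.

```python
# Illustrative sketch of extracting line regions from a reading area.
# A grayscale image is modelled as a list of rows of pixel values (0-255);
# a row "contains ink" if any pixel is darker than the threshold. Runs of
# consecutive ink rows approximate the character areas LE1, LE2, ...

def extract_line_rows(image, threshold=128):
    """Return (top, bottom) row ranges of each run of rows containing ink."""
    ink_rows = [any(px < threshold for px in row) for row in image]
    lines, start = [], None
    for y, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the line ends at a blank row
            start = None
    if start is not None:
        lines.append((start, len(image)))
    return lines

# Two toy "lines" of dark pixels separated by a blank row.
image = [
    [255, 255], [0, 255], [10, 255],   # line 1 spans rows 1-2
    [255, 255], [255, 0],              # line 2 is row 4
]
print(extract_line_rows(image))  # [(1, 3), (4, 5)]
```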
  • FIG. 4 is a diagram illustrating a process of dividing the character area LE1 into a plurality of rectangular areas B.
  • FIG. 4A shows how the division unit 113 scans the scanning window SW in the direction in which the character string is lined up to calculate luminance values.
  • the scanning window SW has the vertical extent of the character area LE1 and a horizontal width of a few pixels, and the division unit 113 calculates the luminance value at each scanning position of the scanning window SW.
  • the luminance value is calculated as, for example, the sum of the pixel values of the region included in the scanning window SW.
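The scanning-window luminance calculation can be sketched as follows. This is a minimal illustration under the assumption of a binarized image (ink = 1, background = 0); the function name and the window width are assumptions, not the patented implementation.

```python
# Hypothetical sketch of the scanning-window luminance profile of FIG. 4(a).
# The character area is a list of rows of 0/1 pixels (1 = ink). The
# "luminance value" at each horizontal position is the sum of pixel values
# inside a full-height window a few pixels wide, as the description states.

def luminance_profile(area, window=2):
    """Sum of pixel values inside a full-height window at each x position."""
    height, width = len(area), len(area[0])
    profile = []
    for x in range(width - window + 1):
        total = sum(area[y][x + dx] for y in range(height) for dx in range(window))
        profile.append(total)
    return profile

# Toy example: two 1-pixel-wide strokes separated by a blank band.
area = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(luminance_profile(area, window=1))  # [2, 0, 0, 2]
```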
  • FIG. 4B plots the luminance value at each scanning position of the scanning window SW against the coordinate in the arrangement direction of the character string in FIG. 4A; the vertical axis represents the luminance value.
  • the higher the density of characters within the scanning window SW, the larger the luminance value.
  • when the character string is "東京都品川区" ("Shinagawa-ku, Tokyo"), as in the example of the figure, blank bands with a luminance value of 0 occur between the characters, between the left radical and right component of "都", between the three vertical strokes of "川", and before and after "東" and "区".
  • the division unit 113 divides the character area LE1 before and after these blank bands, which are orthogonal to the arrangement direction of the character string, into rectangular areas B1 to B9, each containing a character element.
  • FIG. 4C shows the character elements of "東京都品川区" enclosed in rectangular areas B1 to B9, respectively.
  • note that one character is not necessarily enclosed by one rectangular area.
  • the "都" is divided into its left radical (rectangular area B3) and its right component (rectangular area B4), which are enclosed by separate rectangular areas.
  • the three vertical strokes of "川" are individually enclosed by rectangular areas B6 to B8.
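The division at blank bands can be sketched as follows, operating on a luminance profile like the one in FIG. 4B. The function name and the zero threshold are illustrative assumptions; the patent only requires that regions be cut where the luminance drops to a blank band.

```python
# Minimal sketch of dividing a character area into rectangular regions at
# the blank bands, i.e. at runs of zero luminance in the profile.

def split_at_blank_bands(profile):
    """Return (start, end) x-ranges of regions whose luminance is non-zero."""
    regions, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x                      # entering a character element
        elif v == 0 and start is not None:
            regions.append((start, x))     # leaving it at a blank band
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions

print(split_at_blank_bands([0, 3, 5, 0, 0, 2, 4, 1, 0]))  # [(1, 3), (5, 8)]
```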
  • FIG. 5 is a diagram illustrating a process of setting a reference value from the divided rectangular area B.
  • the determination unit 114 calculates the rectangular widths W1 to W9 of the rectangular areas B1 to B9, and then selects the top 25% when these widths are sorted in descending order. Here, since there are nine rectangular areas, the top two are selected: specifically, the width W1 of rectangular area B1 enclosing "東" and the width W9 of rectangular area B9 enclosing "区".
  • the maximum length Wm used in the subsequent processing is set to the average of the selected widths W1 and W9 multiplied by 1.5. What fraction of the top widths to select and what factor to multiply by can be changed to suit the situation, for example according to the nature of the document to be read or to the number of divided rectangular areas B. Similarly, the reference length Ws used in the subsequent processing is set to the average of all the rectangular widths W1 to W9.
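Under the stated parameters (top 25% of widths, factor 1.5), setting Wm and Ws might look like the following hedged sketch; the function name and the rounding choice for "top 25% of nine regions" are assumptions.

```python
# Illustrative sketch of setting the maximum length Wm and the reference
# length Ws from the divided rectangle widths: Wm is 1.5 times the average
# of the top 25% of widths; Ws is the mean of all widths.

def set_lengths(widths, top_fraction=0.25, factor=1.5):
    """Return (Wm, Ws): the maximum length and the reference length."""
    n = max(1, round(len(widths) * top_fraction))   # e.g. 9 widths -> top 2
    top = sorted(widths, reverse=True)[:n]
    w_max = sum(top) / n * factor                   # maximum length Wm
    w_ref = sum(widths) / len(widths)               # reference length Ws
    return w_max, w_ref

# Nine rectangle widths; the two widest (30 and 26) average 28, so Wm = 42.
print(set_lengths([30, 12, 12, 14, 11, 6, 6, 9, 26]))  # (42.0, 14.0)
```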
  • FIG. 6 is a diagram illustrating a character recognition process using the character recognition model 121.
  • first, the case where the specific rectangular area to be evaluated is rectangular area B1 and the character recognition model 121 estimates only this rectangular area B1 will be described.
  • when an image of a rectangular area is input, the character recognition model 121 outputs a plurality of candidate characters and the accuracy p of each candidate by estimating the single character represented in the image.
  • note that the character recognition model 121 estimates candidate characters representing one character even when the image of the input rectangular area does not actually represent one character (for example, when only a left radical is shown, as described later).
  • FIG. 7 is a diagram for explaining an input target image to be input to the character recognition model 121.
  • as described above, the division unit 113 divides the character string into rectangular areas by separating it at blank bands orthogonal to the arrangement direction in which the characters are arranged.
  • however, the character recognition model 121 estimates a candidate character as representing one character even when the image of the input rectangular area does not actually represent one character. Therefore, if one divided rectangular area were simply designated as the specific rectangular area and the recognition character determined from that image alone, for example, only the left radical of a kanji would be recognized as one character.
  • the width of rectangular area B3, the specific rectangular area, is W3, and the width of the rectangular area B4 to be combined is W4.
  • their sum W3 + W4 is smaller than the maximum length Wm described with reference to FIG. 5. Therefore, the determination unit 114 takes the image of the generated combined rectangular area 1 as an evaluation target and inputs it into the character recognition model 121.
  • in this example the image of combined rectangular area 1 is evaluated, but if W3 + W4 were larger than the maximum length Wm, the image of combined rectangular area 1 would be excluded from evaluation and the determination unit 114 would evaluate only the specific rectangular area.
  • the example of "東" described with reference to FIG. 6 is an example in which only the specific rectangular area is evaluated.
  • the width of rectangular area B3, the specific rectangular area, is W3, and the widths of the rectangular areas B4 and B5 to be combined are W4 and W5, respectively.
  • their sum W3 + W4 + W5 is larger than the maximum length Wm. Therefore, the determination unit 114 excludes the image of the generated combined rectangular area 2 from the evaluation targets. In this example the image of combined rectangular area 2 is not evaluated, but if W3 + W4 + W5 were smaller than the maximum length Wm, the image of combined rectangular area 2 would also be evaluated.
  • by limiting the combinations in this way, the determination unit 114 can accurately recognize the single character represented in the character area. Further, since the maximum length Wm is set based on the widths of the divided rectangular areas B, it takes a value suited to the font used in the character string, and the upper limit of the number (i) of rectangular areas combined with the specific rectangular area is optimized.
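The combination rule described above, evaluating the specific rectangular area alone plus each combined area formed with the following contiguous regions while the total width stays at or below Wm, can be sketched as follows. Names are illustrative, not the patent's code.

```python
# Illustrative sketch of generating the evaluation candidates for a
# specific rectangular region at index `start`: the region alone is always
# a candidate; combined regions are added while their total width <= w_max.

def candidate_spans(widths, start, w_max):
    """Index ranges [start, end) of regions whose total width is <= w_max."""
    spans, total = [], 0
    for end in range(start, len(widths)):
        total += widths[end]
        if end > start and total > w_max:
            break                           # combined region exceeds Wm
        spans.append((start, end + 1))
    return spans

# With widths [10, 9, 14] and Wm = 20, only B alone and B + next qualify:
# 10 <= 20 and 10 + 9 = 19 <= 20, but 10 + 9 + 14 = 33 > 20.
print(candidate_spans([10, 9, 14], 0, 20))  # [(0, 1), (0, 2)]
```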
  • FIG. 8 is a diagram illustrating a character recognition process when region combination is performed as described with reference to FIG. 7.
  • the determination unit 114 first inputs the left radical of "都", which is the image of the specific rectangular area, into the character recognition model 121.
  • the character recognition model 121 outputs "者" ("person"), "日" ("day"), and so on as candidate characters together with their accuracies p.
  • since the output candidate characters are kanji and the rectangular width W3 of the specific rectangular area is smaller than the reference length Ws described with reference to FIG. 5, the determination unit 114 reduces each accuracy p calculated for the output candidates by a factor of 0.95 to calculate the corrected accuracy pc.
  • for example, the corrected accuracy pc of the highest-scoring candidate becomes 0.9139 (= 0.962 × 0.95).
  • next, the determination unit 114 inputs the image of combined rectangular area 1, which shows "都", into the character recognition model 121.
  • the character recognition model 121 outputs "都", "郡" ("county"), and so on as candidate characters together with their accuracies p.
  • the determination unit 114 compares the corrected accuracies pc of the candidate characters for the image of the specific rectangular area with the accuracies p of the candidate characters for the image of combined rectangular area 1, searches for the maximum value among them, and determines the candidate character corresponding to that maximum as the recognition character.
  • specifically, the highest corrected accuracy pc for the image of the specific rectangular area (0.9139 in the example of FIG. 8(a)) and the highest accuracy p for the image of the combined rectangular area (0.982 in FIG. 8(b)) are compared, the larger one is selected, and the corresponding candidate character is determined as the recognition character.
  • the determination unit 114 executes such a process and determines "都" as the recognition character. That is, it determines that the single character to be recognized is the character at row 37, point 52 of the JIS kanji code (JIS X 0208).
  • in this way, when an output candidate character is a kanji and the rectangular width is smaller than the reference length Ws, the accuracy p is multiplied by a preset coefficient less than 1 to calculate the corrected accuracy pc.
  • by adopting the reduced corrected accuracy pc, a radical represented with a relatively narrow width can be prevented from being erroneously recognized as a standalone kanji.
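The correction step can be sketched as below. The factor 0.95 and the worked numbers follow the FIG. 8(a) example; the function name and the explicit kanji flag are illustrative assumptions.

```python
# Hedged sketch of the corrected accuracy pc: when a candidate is a kanji
# and the specific region's width is at or below the reference length Ws,
# its accuracy p is multiplied by a preset factor below 1 (0.95 here).

def corrected_accuracy(p, is_kanji, width, w_ref, factor=0.95):
    return p * factor if is_kanji and width <= w_ref else p

# A narrow left radical scoring p = 0.962 in a region narrower than Ws
# is reduced to pc = 0.962 * 0.95 = 0.9139.
pc = corrected_accuracy(0.962, is_kanji=True, width=10, w_ref=14)
print(round(pc, 4))  # 0.9139
```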
  • the determination unit 114 transitions the specific rectangular area to the rectangular area Bn+1 following the rectangular area Bn whose recognition character has been confirmed, and executes the next character recognition. Specifically, since the recognition up to rectangular area B4 has been determined as "都", the determination unit 114 designates rectangular area B5, which follows rectangular area B4, as the new specific rectangular area and continues character recognition.
  • the output unit 115 combines the recognition characters sequentially determined by the determination unit 114 in this way to generate a recognition character string, and outputs it to the storage unit 120 for storage, or outputs it to the monitor 300 via the input/output IF 130 for display.
  • FIG. 9 is a diagram showing a processing flow of the character string recognition process. The flow starts from the time when the image data of the target document is sent from the scanner 200.
  • in step S101, the acquisition unit 111 acquires the image data from the scanner 200 via the input/output IF 130 and expands the document image on the memory of the processing unit 110.
  • in step S102, the extraction unit 112 extracts the character areas LE from the designated reading area DE so that each of them contains a character string for one line, as described with reference to FIG. 3.
  • in step S103, the division unit 113 divides the character area LE extracted by the extraction unit into a plurality of rectangular areas B at blank bands orthogonal to the arrangement direction in which the characters are arranged, as described with reference to FIG. 4.
  • in step S104, the determination unit 114 sets the specific rectangular area to be evaluated from among the plurality of divided rectangular areas B.
  • if no recognition character has been confirmed yet, the leftmost rectangular area B1 is defined as the specific rectangular area; otherwise, the rectangular area Bn+1 following the rectangular area Bn whose recognition character has been confirmed is defined as the specific rectangular area.
  • in step S105, the determination unit 114 first inputs the image of the specific rectangular area into the character recognition model 121, and obtains a plurality of candidate characters and the accuracy p of each candidate character.
  • in step S106, the determination unit 114 checks whether the rectangular width W of the specific rectangular area is at or below the reference length Ws. If it is, the process proceeds to step S107, where the accuracy p is corrected to calculate the corrected accuracy pc, and then proceeds to step S108. If it is not, step S107 is skipped and the process proceeds to step S108.
  • in step S108, the determination unit 114 joins the continuous rectangular area Bn+1 to the rectangular area Bn, the specific rectangular area, to generate combined rectangular area 1.
  • in step S109, it is checked whether the combined width Wn + Wn+1 of the generated combined rectangular area 1 is at or below the maximum length Wm. If it is, the process proceeds to step S110, where the image of combined rectangular area 1 is input into the character recognition model 121 to obtain a plurality of candidate characters and the accuracy p of each candidate character.
  • similarly, the determination unit 114 joins the rectangular area Bn+2 continuous with rectangular areas Bn and Bn+1 to generate combined rectangular area 2. If the combined width Wn + Wn+1 + Wn+2 of combined rectangular area 2 is at or below the maximum length Wm (YES in step S109), the image of combined rectangular area 2 is likewise input into the character recognition model 121 to obtain a plurality of candidate characters and their accuracies p (step S110).
  • when the determination unit 114 determines in step S109 that the combined width of combined rectangular area i (where i is the number of rectangular areas B joined to the specific rectangular area) is larger than the maximum length Wm, the process proceeds to step S111. In step S111, the determination unit 114 determines whether to use as the recognition character the candidate character showing the maximum accuracy p (or the corrected accuracy pc, if corrected in step S107) among the candidates obtained by inputting the image of the specific rectangular area into the character recognition model 121, or the candidate character showing the maximum accuracy p among the candidates obtained by inputting the images of the combined rectangular areas into the character recognition model 121. Specifically, the respective accuracies are compared, and the candidate character corresponding to the larger one is used as the recognition character.
  • next, the determination unit 114 proceeds to step S112 and checks whether character recognition has been completed for all the rectangular areas B divided in step S103. If not, the process returns to step S104 and the character recognition process is repeated. If completed, the process proceeds to step S113, where the output unit 115 combines and outputs the recognition characters sequentially determined by the determination unit 114, ending the series of processes. If a character area LE remains for which character string recognition has not been completed, the processing from step S103 onward is executed for that character area LE. At this time, when the continuous character areas LE1, LE2, ... are recognized as one mutually related sentence, the output unit 115 may combine the recognition character strings generated in step S113 before outputting them. If a reading area DE remains for which the extraction of character areas has not been completed, the processing from step S102 onward is executed for that reading area DE.
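The flow of FIG. 9 (steps S104 to S111) can be condensed into the following sketch, with a stub standing in for the character recognition model 121. Every name here is illustrative, and the stub's scores simply reproduce the "都" example of FIG. 8 under the assumed widths.

```python
# Condensed sketch of the decision loop: for each specific region, score it
# alone (with the narrow-kanji correction) and combined with following
# regions up to Wm, keep the best-scoring candidate, then transition.
# model(start, end) returns (candidate, accuracy, is_kanji) for the image
# spanning regions [start, end).

def recognize_line(widths, model, w_max, w_ref):
    recognized, n = [], 0
    while n < len(widths):
        best_char, best_p, best_end = None, -1.0, n + 1
        total = 0
        for end in range(n, len(widths)):
            total += widths[end]
            if end > n and total > w_max:
                break                       # combined width exceeds Wm (S109)
            char, p, is_kanji = model(n, end + 1)
            if end == n and is_kanji and widths[n] <= w_ref:
                p *= 0.95                   # corrected accuracy pc (S107)
            if p > best_p:
                best_char, best_p, best_end = char, p, end + 1
        recognized.append(best_char)        # S111: adopt the best candidate
        n = best_end                        # transition the specific region
    return "".join(recognized)

# Stub model reproducing FIG. 8: the left radical alone scores 0.962
# (corrected to 0.9139), the combined region "都" scores 0.982, so the
# combined region wins and both rectangles are consumed.
def stub_model(start, end):
    table = {(0, 1): ("者", 0.962, True), (0, 2): ("都", 0.982, True)}
    return table[(start, end)]

print(recognize_line([10, 9], stub_model, w_max=20, w_ref=14))  # 都
```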
  • the character string recognition device and the character string recognition program of the present embodiment described above can exhibit high character recognition accuracy even for a character string in a variable-width font, a character string in which half-width and full-width characters are mixed, or a handwritten character string contained in the recognized document image. Moreover, they are not limited to Japanese and can be used for all languages, such as those using English characters, Hangul characters, Cyrillic characters, and Arabic characters.
  • in the case of vertical writing, the extraction unit 112 extracts the character areas LE from the image of the reading area DE so that each of them vertically contains a character string for one line.
  • the division unit 113 divides the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the vertical direction in which the characters are lined up.
  • the determination unit 114 then determines whether to use as the recognition character the candidate character obtained by inputting the image of the specific rectangular area under evaluation among the plurality of rectangular areas into the character recognition model 121, or the candidate character obtained by inputting the image of a combined rectangular area in which other continuous rectangular areas are joined below the specific rectangular area.
  • some kanji crowns (top radicals) are themselves valid kanji, but by recognizing a vertically written character string in this way, a crown can be prevented from being mistakenly recognized as a standalone kanji.
  • the character string is not limited to vertical or horizontal writing; it may be written diagonally or along a curved line as shown in FIG. 10, and the writing direction is not particularly limited. For example, when a reading area DE with a curved shape is extracted as shown in FIG. 10, the scanning window SW is set orthogonal to the curve direction, which is the character string direction, and the rectangular areas B are set based on the detected blank bands.
  • in this case, the divided rectangular areas B may be, for example, trapezoids or shapes in which some sides are curved.
  • in the above description, the membership card was used as an example, but the target document is not particularly limited: the present invention can be applied to the front and back of a driver's license, the front and back of a residence card, a My Number card, a My Number notification card, and any other standard or non-standard form.


Abstract

Provided are a character string recognition device that enables highly accurate character recognition even for a character string that is not in a monospaced font and the like. The character string recognition device is provided with: an extraction unit that extracts, on a line-by-line basis, a character region containing a character string from a document image including images of character strings; a division unit that divides the character region into a plurality of rectangular regions by separating the character region on the basis of a blank zone orthogonal to the arrangement direction in which the characters are arranged; a determination unit that determines whether to use, as a recognized character, a candidate character obtained by inputting an image of a particular rectangular region to be evaluated among the plurality of rectangular regions to a character recognition model that estimates a single character represented by the image, or a candidate character obtained by inputting, to the character recognition model, an image of a combined rectangular region that is a combination of the particular rectangular region with another rectangular region continuous from the particular rectangular region; and an output unit that outputs a recognized character string generated by combining a plurality of recognized characters sequentially determined by the determination unit while the particular rectangular region is transitioned over the plurality of rectangular regions.

Description

Character string recognition device and character string recognition program
 The present invention relates to a character string recognition device and a character string recognition program.
 An OCR technique for recognizing a character string contained in an image is known (see, for example, Patent Document 1).
JP-A-2019-175317
 Conventional OCR techniques cut out a character area for each character from an image containing a character string and estimate the single character represented by that area. Such a method works relatively well for monospaced fonts, in which every character has the same width, but accurate estimation is difficult for variable-width fonts, whose character widths are optimized to the character shapes. In particular, sufficient accuracy could not be obtained for character strings in which half-width and full-width characters are mixed.
 The present invention has been made to solve this problem, and provides a character string recognition device and the like that enable highly accurate character recognition even for character strings that are not in a monospaced font.
 The character string recognition device according to the first aspect of the present invention includes: an extraction unit that extracts, on a line-by-line basis, a character area containing a character string from a document image including imaged character strings; a division unit that divides the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged; a determination unit that determines whether the recognized character should be a candidate character obtained by inputting, to a character recognition model that estimates the single character represented by an image, the image of a specific rectangular area under evaluation among the plurality of rectangular areas, or a candidate character obtained by inputting to the character recognition model the image of a combined rectangular area in which other rectangular areas continuous with the specific rectangular area are joined to it; and an output unit that outputs a recognized character string generated by joining the plurality of recognized characters that the determination unit determines sequentially while transitioning the specific rectangular area over the plurality of rectangular areas. By determining recognized characters in this way, the character string recognition device can accurately recognize the character represented by a single-character area even when the widths of the characters differ.
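 The decision loop of this aspect can be sketched as follows. The function name recognize_line, the (x, width, image) tuple layout, the recognize(images) stand-in for the character recognition model, and the greedy control flow are illustrative assumptions for explanation, not the claimed implementation:

```python
def recognize_line(regions, recognize, max_len):
    """regions: list of (x, width, image) tuples in reading order,
    already split at blank bands.
    recognize: hypothetical model returning (char, confidence)."""
    result = []
    i = 0
    while i < len(regions):
        best_char, best_conf, best_take = None, -1.0, 1
        # Evaluate the specific region alone, then joined with the
        # following regions while the combined width is within max_len.
        for take in range(1, len(regions) - i + 1):
            x0 = regions[i][0]
            x1 = regions[i + take - 1][0] + regions[i + take - 1][1]
            if take > 1 and x1 - x0 > max_len:
                break
            # Tuple of part images stands in for cropping the combined area.
            merged = tuple(r[2] for r in regions[i:i + take])
            char, conf = recognize(merged)
            if conf > best_conf:
                best_char, best_conf, best_take = char, conf, take
        result.append(best_char)
        i += best_take  # transition the specific region past the consumed areas
    return "".join(result)
```

When a join scores higher than the specific region alone, the loop consumes all joined regions at once, so each character element contributes to exactly one recognized character.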
 In the above character string recognition device, the determination unit may generate a plurality of combined rectangular areas and input each of their images to the character recognition model, allowing a plurality of other rectangular areas to be joined to the specific rectangular area as long as the length of the resulting combined rectangular area in the arrangement direction does not exceed a set maximum length. When generating combined rectangular areas, considering not only the one rectangular area adjacent to the specific rectangular area but also further consecutive rectangular areas allows the single character represented in the character area to be recognized more accurately.
 In this case, the maximum length may be set based on the lengths of the plurality of rectangular areas in the arrangement direction. The character string to be recognized has various character widths depending on, for example, the font; setting the maximum length based on the lengths of the rectangular areas produced by the division unit yields a maximum length suited to that font.
 Further, in the above character string recognition device, when generating a combined rectangular area, the determination unit may include the blank bands sandwiched between the rectangular areas. For example, a character whose left-hand radical and right-hand component are separated is expected to have a slight blank band between them, so including the blank band in the combination improves the accuracy of character recognition.
 Further, in the above character string recognition device, when the candidate character obtained by inputting the image of the specific rectangular area to the character recognition model is a kanji and the length of the specific rectangular area in the arrangement direction is equal to or less than a set reference length, the determination unit may lower the confidence that the character recognition model calculated for that candidate character. For example, some left-hand radicals are themselves valid as a single kanji, but when used as a radical they are rendered narrower. Lowering the confidence calculated by the character recognition model in this way prevents a radical from being misrecognized as a kanji on its own.
 The character string recognition program according to the second aspect of the present invention causes a computer to execute: an extraction step of extracting, on a line-by-line basis, a character area containing a character string from a document image including imaged character strings; a division step of dividing the character area into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged; a determination step of determining whether the recognized character should be a candidate character obtained by inputting, to a character recognition model that estimates the single character represented by an image, the image of a specific rectangular area under evaluation among the plurality of rectangular areas, or a candidate character obtained by inputting to the character recognition model the image of a combined rectangular area in which other rectangular areas continuous with the specific rectangular area are joined to it; and an output step of outputting a recognized character string generated by joining the plurality of recognized characters determined by repeating the determination step while transitioning the specific rectangular area over the plurality of rectangular areas. By determining recognized characters in this way, the character string recognition program can accurately recognize the character represented by a single-character area even when the widths of the characters differ.
 According to the present invention, it is possible to provide a character string recognition device and the like that enable highly accurate character recognition even for character strings that are not in a monospaced font.
FIG. 1 is a diagram showing an example of the usage situation of the character string recognition device according to the present embodiment.
FIG. 2 is a diagram showing the main hardware configuration of the character string recognition device.
FIG. 3 is a diagram showing an example of a designated reading area and an extracted character area.
FIG. 4 is a diagram explaining the process of dividing a character area into rectangular areas.
FIG. 5 is a diagram explaining the process of setting reference values from the divided rectangular areas.
FIG. 6 is a diagram explaining character recognition processing using the character recognition model.
FIG. 7 is a diagram explaining input target images to be input to the character recognition model.
FIG. 8 is a diagram explaining character recognition processing when area combination is performed.
FIG. 9 is a diagram showing the processing flow of the character string recognition processing.
FIG. 10 is a diagram showing examples of other character strings.
 Hereinafter, the present invention will be described through embodiments, but the claimed invention is not limited to the following embodiments. Moreover, not all of the configurations described in the embodiments are necessarily indispensable as means for solving the problem. In the following description, when multiple instances of the same object are described individually, subscripts may be attached. For example, the reading areas as a whole are denoted reading areas DE, and an individual reading area is denoted with a subscript, such as reading area DE1.
 FIG. 1 is a diagram showing an example of the usage situation of the character string recognition device 100 according to the present embodiment. The character string recognition device 100 is, for example, a PC, to which a scanner 200 and a monitor 300 are connected. The scanner 200 and the monitor 300 may be connected to the character string recognition device 100 via a network such as the Internet.
 The scanner 200 is a device that converts a target document on which character string recognition is to be performed into image data. In the present embodiment, the target document has reading areas DE containing printed character strings, as in the illustrated membership card 900. As shown, the membership card 900 has a reading area DE1 on which a name is printed, a reading area DE2 on which an address is printed, and a reading area DE3 on which the issue date of the membership card 900 is printed. The user has the character string recognition device recognize the character strings printed in the reading areas DE and convert them into character codes, and uses the resulting text data for purposes such as creating a database.
 When a large number of documents of the same type are processed, the reading areas DE may be preset in a template referenced by the character string recognition device 100; when diverse documents are processed individually, the user may set them each time. The character string recognition device 100 takes in the image data of the document image generated by the scanner 200 reading the membership card 900, and sequentially executes character string recognition processing on the set reading areas DE. In the following description, the case where the character strings are printed horizontally in the reading areas DE is described.
 The monitor 300 displays the recognized character strings recognized by the character string recognition device 100 as text data. For example, it may display the recognition results for the user to confirm, as illustrated, or display them entered into specific cells of a designated database.
 FIG. 2 is a diagram showing the main hardware configuration of the character string recognition device 100. The character string recognition device 100 mainly comprises a processing unit 110, a storage unit 120, and an input/output IF 130. The processing unit 110 executes various kinds of information processing and is realized by a processor such as a CPU or GPU and a memory. By loading a program stored in the storage unit 120 into the CPU, GPU, or the like of the computer, the processing unit 110 functions as an acquisition unit 111, an extraction unit 112, a division unit 113, a determination unit 114, and an output unit 115.
 The acquisition unit 111 acquires the image data sent from the scanner 200 and expands the document image into the memory of the processing unit 110. The extraction unit 112 extracts, on a line-by-line basis, character areas containing imaged character strings from the document image. The division unit 113 divides each character area extracted by the extraction unit 112 into a plurality of rectangular areas by separating it at blank bands orthogonal to the direction in which the characters are arranged.
 The determination unit 114 determines whether the recognized character should be a candidate character obtained by inputting to the character recognition model 121 the image of a specific rectangular area under evaluation among the plurality of divided rectangular areas, or a candidate character obtained by inputting to the character recognition model 121 the image of a combined rectangular area generated by joining other rectangular areas continuous with the specific rectangular area. The output unit 115 outputs a recognized character string generated by joining the recognized characters that the determination unit 114 determines sequentially while transitioning the specific rectangular area. The specific processing of these functional units is described in detail later.
 The storage unit 120 stores various kinds of information and is realized by an arbitrary storage device such as a memory or a hard disk. In the present embodiment, the storage unit 120 stores information such as the weights of the neural network that constitutes the character recognition model 121.
 The character recognition model 121 estimates, in response to the input of an image of a rectangular area, the single character represented by that image. Specifically, for an input image, it outputs a plurality of candidate characters together with a confidence for each candidate character. Candidate characters are output in association with a character code, for example the JIS kanji code (JIS X 0208). The character recognition model 121 is constructed by, for example, a convolutional neural network (CNN) whose weights have been adjusted based on teacher images of printed characters. However, another neural network may be used, or no neural network at all; the model may also be constructed on a rule basis.
 The input/output IF 130 is an input/output interface through which the processing unit 110 exchanges information with the scanner 200 and the monitor 300. Specifically, it is realized by a USB interface or a LAN interface. The input/output IF 130 may be connected to input devices such as a keyboard, a mouse, or a touch panel, and to output devices such as a speaker.
 FIG. 3 is a diagram showing an example of the designated reading area DE and the extracted character areas LE. The figure is an image obtained by reading the reading area DE2, on which the address on the membership card 900 is printed, and visually represents the image of the reading area DE2 that the acquisition unit 111 has expanded in memory.
 The extraction unit 112 extracts character areas LE1 and LE2 from the image of the reading area DE2 such that each contains one line's worth of the character string. Specifically, the extraction unit 112 binarizes the reading area DE2 and applies dilation in the line direction, turning each continuous character string into a set of pixels connected along the line direction. Here, pixel sets for two lines are generated, so the character areas LE1 and LE2 are extracted by enclosing each pixel set in a rectangle.
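 The line-by-line extraction can be sketched as follows. For simplicity, this sketch replaces the morphological dilation and bounding-box procedure with a row projection over a binarized 0/1 grid; the function name and input format are illustrative assumptions, not the described implementation:

```python
def extract_line_regions(binary):
    """binary: 2D list (rows x cols) of 0/1 ink values for a reading area.
    Returns (top, bottom) row ranges, one per text line, by grouping
    consecutive rows that contain any ink."""
    rows_with_ink = [any(row) for row in binary]
    lines, start = [], None
    for y, ink in enumerate(rows_with_ink):
        if ink and start is None:
            start = y                      # first inked row of a line
        elif not ink and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(binary)))  # line runs to the bottom edge
    return lines
```

Each returned row range corresponds to one character area LE; a real implementation would also record the horizontal extent of the ink to form the enclosing rectangle.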
 FIG. 4 is a diagram explaining the process of dividing the character area LE1 into a plurality of rectangular areas B. FIG. 4(a) shows the division unit 113 scanning a scanning window SW in the direction in which the character string is arranged and calculating luminance values. The scanning window SW has the height of the character area LE1 and a width of a few pixels, and the division unit 113 calculates the luminance value at each scanning position of the scanning window SW. The luminance value is calculated, for example, as the sum of the pixel values in the region contained in the scanning window SW.
 FIG. 4(b) shows the luminance value at each scanning position of the scanning window SW, aligned with the coordinates along the arrangement direction of the character string in FIG. 4(a); the vertical axis represents the luminance value. Here, the luminance value is shown as larger the denser the characters within the scanning window SW. When the character string is 「東京都品川区」 ("Shinagawa-ku, Tokyo") as in the illustrated example, blank bands with a luminance value of 0 occur between characters, between the left-hand radical and right-hand component of 「都」, between the three vertical strokes of 「川」, in front of 「東」, and behind 「区」. The division unit 113 divides the character area LE1 into rectangular areas B1 to B9, each containing a character element, by separating the character area LE1 before and after these blank bands orthogonal to the arrangement direction of the character string.
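 The splitting at blank bands can be sketched with a one-pixel-wide scanning window, i.e. a column projection. The function name and the 0/1 input format are illustrative assumptions:

```python
def split_by_blank_bands(line_img):
    """line_img: 2D list (rows x cols) of 0/1 ink values for one line.
    Returns (x_start, x_end) spans of the rectangular areas B, split
    wherever the column-wise luminance (ink count) drops to zero."""
    ncols = len(line_img[0])
    col_ink = [sum(row[x] for row in line_img) for x in range(ncols)]
    regions, start = [], None
    for x, v in enumerate(col_ink):
        if v > 0 and start is None:
            start = x                      # region begins at first inked column
        elif v == 0 and start is not None:
            regions.append((start, x))     # blank band closes the region
            start = None
    if start is not None:
        regions.append((start, ncols))
    return regions
```

Because the split is purely based on blank bands, a single character such as 「都」 or 「川」 may yield two or more spans, which is exactly the situation the combining step later addresses.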
 FIG. 4(c) shows the character elements of 「東京都品川区」 each enclosed in one of the rectangular areas B1 to B9. As illustrated, in the present embodiment one character is not necessarily enclosed in a single rectangular area: for example, 「都」 is enclosed in separate rectangular areas for its left-hand radical (rectangular area B3) and its right-hand component (rectangular area B4). Similarly, the three vertical strokes of 「川」 are individually enclosed in rectangular areas B6 to B8.
 Next, the reference values set for the rectangular areas B divided in this way are described. FIG. 5 is a diagram explaining the process of setting the reference values from the divided rectangular areas B. The determination unit 114 calculates the rectangular widths W1 to W9 of the rectangular areas B1 to B9, and selects the top 25% of these widths sorted in descending order. Here, since there are nine rectangular areas, the top two are selected: specifically, the rectangular width W1 of the rectangular area B1 containing 「東」 and the rectangular width W9 of the rectangular area B9 containing 「区」.
 The maximum length Wm used in later processing is set to 1.5 times the average of the selected rectangular widths W1 and W9. What percentage of the top widths is selected, and what coefficient is applied, may be changed according to circumstances, for example set according to the nature of the document to be read or the number of divided rectangular areas B. The reference length Ws, also used in later processing, is the average of all the rectangular widths W1 to W9.
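 The computation of the maximum length Wm and the reference length Ws can be sketched as follows. The 25% fraction and the factor 1.5 follow the example above and are exposed as parameters, since the text notes they may be changed according to circumstances; the rounding of the top-k count is an arbitrary choice for the sketch:

```python
def length_thresholds(widths, top_frac=0.25, factor=1.5):
    """widths: rectangular widths W1..Wn of the divided areas.
    Returns (w_max, w_ref): Wm is `factor` times the mean of the
    widest `top_frac` of regions; Ws is the mean of all widths."""
    k = max(1, int(len(widths) * top_frac))   # at least one region
    top = sorted(widths, reverse=True)[:k]
    w_max = factor * sum(top) / len(top)
    w_ref = sum(widths) / len(widths)
    return w_max, w_ref
```

With nine regions, int(9 * 0.25) = 2 regions are selected, matching the example in which W1 and W9 determine Wm.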
 FIG. 6 is a diagram explaining character recognition processing using the character recognition model 121. First, the case where the specific rectangular area under evaluation is the rectangular area B1 and the character recognition model 121 estimates from this rectangular area B1 alone is described.
 When an image of a rectangular area is input, the character recognition model 121 estimates the single character represented by the image and outputs a plurality of candidate characters together with the confidence p of each candidate character. In other words, the character recognition model 121 estimates candidate characters on the assumption that the input image represents one character, even when the image of the input rectangular area does not actually represent a whole character (for example, when it represents only a left-hand radical, as described later).
 FIG. 7 is a diagram explaining the input target images to be input to the character recognition model 121. In the present embodiment, as described above, the division unit 113 divides the character string into rectangular areas by separating it at blank bands orthogonal to the arrangement direction of the character string. In addition, the character recognition model 121 estimates candidate characters as representing one character even when the image of the input rectangular area does not actually represent a whole character. Therefore, if one divided rectangular area were designated as the specific rectangular area and the recognized character were determined from its image alone, for example, only the left-hand radical of a kanji would be recognized as one character.
 Accordingly, in the present embodiment, not only the image of the specific rectangular area is evaluated; images of combined rectangular areas, in which other rectangular areas continuous with the designated specific rectangular area are joined to it, are also evaluated. FIG. 7(a) shows the case where the specific rectangular area under evaluation is the rectangular area B3, specifically the left-hand radical of 「都」. In this case, zero other rectangular areas are joined, so i = 0.
 FIG. 7(b) shows a combined rectangular area 1 generated by joining the adjacent rectangular area B4 to the rectangular area B3, which is the specific rectangular area. Since one other rectangular area is joined to the specific rectangular area, i = 1. When generating such a combined rectangular area, the determination unit 114 includes the blank band sandwiched between the rectangular area B3 and the rectangular area B4. If the image combining the rectangular areas B3 and B4 represents one character, including the blank band that originally existed between them gives the correct balance as a single character, which improves the recognition accuracy of the character recognition model 121.
 Here, the width of the rectangular area B3, which is the specific rectangular area, is W3, and the width of the joined rectangular area B4 is W4. Their sum W3 + W4 is smaller than the maximum length Wm described with reference to FIG. 5. Therefore, the determination unit 114 takes the generated image of the combined rectangular area 1 as an evaluation target and inputs it to the character recognition model 121. In this example the image of the combined rectangular area 1 is evaluated, but if W3 + W4 were larger than the maximum length Wm, the image of the combined rectangular area 1 would be excluded from evaluation and the determination unit 114 would evaluate only the specific rectangular area. The example of 「東」 described with reference to FIG. 6 is an example in which only the specific rectangular area was evaluated.
 FIG. 7(c) shows a combined rectangular area 2 generated by joining the consecutive rectangular areas B4 and B5 to the rectangular area B3, which is the specific rectangular area. Since two other rectangular areas are joined to the specific rectangular area, i = 2. In this case as well, the determination unit 114 includes the blank bands between the respective rectangular areas.
 Here, the width of the rectangular area B3, which is the specific rectangular area, is W3, and the widths of the joined rectangular areas B4 and B5 are W4 and W5, respectively. Their sum W3 + W4 + W5 is larger than the maximum length Wm. Therefore, the determination unit 114 excludes the generated image of the combined rectangular area 2 from evaluation. In this example the image of the combined rectangular area 2 is excluded, but if W3 + W4 + W5 were smaller than the maximum length Wm, the image of the combined rectangular area 2 would also be evaluated.
 In this way, by allowing one or more other rectangular areas continuous with the specific rectangular area to be joined within the range not exceeding the maximum length Wm, the determination unit 114 can recognize the single character represented in the character area more accurately. Moreover, since the maximum length Wm is set based on the widths of the divided rectangular areas B, it takes a value suited to the font used in the character string, and the number of rectangular areas joined to the specific rectangular area (the upper limit of i) is optimized.
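 The enumeration of evaluation targets, joining consecutive regions with their intervening blank bands while the combined width stays within Wm, can be sketched as follows (function name and span representation are illustrative assumptions):

```python
def candidate_regions(spans, i, w_max):
    """spans: list of (x_start, x_end) for the rectangular areas B in
    reading order. Enumerates the combined areas to evaluate for
    specific region i: the region alone, then with 1, 2, ... following
    regions joined. Spanning from the first x_start to the last x_end
    automatically includes the blank bands between regions."""
    out = []
    for j in range(i, len(spans)):
        x0, x1 = spans[i][0], spans[j][1]
        if x1 - x0 > w_max:   # combined width exceeds Wm: stop joining
            break
        out.append((x0, x1))
    return out
```

For the 「都」 example, i pointing at B3 would yield the radical alone (i = 0) and the join with B4 (i = 1), while the join through B5 is cut off by Wm.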
 FIG. 8 is a diagram explaining character recognition processing when area combination is performed as described with reference to FIG. 7. As shown in FIG. 8(a), the determination unit 114 first inputs the image of the specific rectangular area, the left-hand radical of 「都」, to the character recognition model 121. The character recognition model 121 outputs 「者」, 「日」, and so on as candidate characters together with their confidences p. In the illustrated example, the confidence of 「者」 is p = 0.962 and the confidence of 「日」 is p = 0.021.
 Here, the output candidate characters are kanji, and the rectangular width of the specific rectangular region is W3, which is smaller than the reference length Ws described with reference to FIG. 5. The determination unit 114 therefore multiplies each confidence p calculated for the output candidate characters by 0.95 to lower it, yielding the corrected confidence pc. Specifically, the corrected confidence of "者" is pc = 0.9139 and that of "日" is pc = 0.01995.
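The correction is a single multiplication. A minimal sketch, where the function name and arguments are illustrative and 0.95 is the example coefficient from this paragraph:

```python
KANJI_COEFF = 0.95  # preset coefficient less than 1 (value from the example)

def corrected_confidence(p, is_kanji, width, w_ref):
    """Return the corrected confidence pc: lowered only when the candidate
    is a kanji and the region is narrower than the reference length w_ref."""
    if is_kanji and width < w_ref:
        return p * KANJI_COEFF
    return p

print(round(corrected_confidence(0.962, True, 20, 30), 4))   # 0.9139, as for "者"
print(round(corrected_confidence(0.021, True, 20, 30), 5))   # 0.01995, as for "日"
```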
 Next, as shown in FIG. 8(b), the determination unit 114 inputs the image of combined rectangular region 1, "都", into the character recognition model 121. The character recognition model 121 outputs "都", "郡", and so on as candidate characters together with their confidences p. In the illustrated example, the confidence of "都" is p = 0.982 and that of "郡" is p = 0.013. If the images of combined rectangular regions 2, 3, ... were also evaluation targets, the determination unit 114 would input those images into the character recognition model 121 as well to obtain candidate characters and their confidences.
 Then, as shown in FIG. 8(c), the determination unit 114 compares the corrected confidences pc of the candidate characters for the image of the specific rectangular region with the confidences p of the candidate characters for the image of combined rectangular region 1, searches for the maximum value among them, and determines the candidate character corresponding to that maximum as the recognized character. Specifically, it compares the largest corrected confidence pc for the image of the specific rectangular region (0.9139 in the example of FIG. 8(a)) with the largest confidence p for the image of the combined rectangular region (0.982 in the example of FIG. 8(b)), selects the larger of the two (0.982 in these examples), and determines the corresponding candidate character as the recognized character. By executing this processing, the determination unit 114 determines "都" as the recognized character; that is, the single character to be recognized is determined to be the character at ku 37, ten 52 of the JIS kanji code (JIS X 0208).
 For the specific rectangular region, as described above, when the output candidate characters are kanji and the rectangular width is smaller than the reference length Ws, the determination unit 114 multiplies the confidence p by a preset coefficient less than 1 to calculate the corrected confidence pc. Adopting this lowered corrected confidence pc prevents a radical, which occupies a comparatively narrow width, from being misrecognized as a complete kanji.
 The determination unit 114 shifts the specific rectangular region to the next rectangular region Bn+1 following the rectangular region Bn whose recognized character has been settled, and executes the next character recognition. Specifically, since the recognition of "都" settles the regions up to rectangular region B4, the determination unit 114 designates rectangular region B5, which follows B4, as the new specific rectangular region and continues character recognition. The output unit 115 combines the recognized characters thus sequentially determined by the determination unit 114 to generate a recognized character string, and outputs it to the storage unit 120 for storage, or to the monitor 300 for display via the input/output IF 130.
 Next, the flow of this series of character string recognition processes will be described with reference to a flowchart. FIG. 9 shows the processing flow of the character string recognition process. The flow starts when the image data of the target document is sent from the scanner 200.
 In step S101, the acquisition unit 111 acquires the image data from the scanner 200 via the input/output IF 130 and loads the document image into the memory of the processing unit 110. In step S102, the extraction unit 112 extracts character regions LE from the designated reading region DE, each encompassing one line of the character string, as described with reference to FIG. 3. The process then proceeds to step S103, in which the division unit 113 divides the character region LE extracted by the extraction unit into a plurality of rectangular regions B by partitioning it along blank bands orthogonal to the direction in which the characters are arranged, as described with reference to FIG. 4.
 In step S104, the determination unit 114 sets, from among the divided rectangular regions B, the specific rectangular region to be evaluated. When step S104 is executed for the first time, the leftmost rectangular region B1 is designated as the specific rectangular region; otherwise, the next rectangular region Bn+1 following the rectangular region Bn whose recognized character has been settled is designated. In step S105, the determination unit 114 first inputs the image of the specific rectangular region into the character recognition model 121 and obtains a plurality of candidate characters and the confidence p of each. In the following step S106, the determination unit 114 checks whether the rectangular width W of the specific rectangular region is less than or equal to the reference length Ws; if so, the process proceeds to step S107, where the confidence p is corrected to calculate the corrected confidence pc, and then to step S108. Otherwise, step S107 is skipped and the process proceeds to step S108.
 In step S108, the determination unit 114 combines the contiguous rectangular region Bn+1 with the rectangular region Bn, which is the specific rectangular region, to generate combined rectangular region 1. In the following step S109, it checks whether the combined width Wn + Wn+1 of the generated combined rectangular region 1 is less than or equal to the maximum length Wm. If so, the process proceeds to step S110, where the image of combined rectangular region 1 is input into the character recognition model 121 to obtain a plurality of candidate characters and the confidence p of each. The process then returns to step S108, and the determination unit 114 combines the rectangular region Bn+2, contiguous with Bn and Bn+1, to generate combined rectangular region 2. If the combined width Wn + Wn+1 + Wn+2 of combined rectangular region 2 is less than or equal to the maximum length Wm (YES in step S109), the image of combined rectangular region 2 is likewise input into the character recognition model 121 to obtain a plurality of candidate characters and the confidence p of each (step S110).
 When the determination unit 114 determines in step S109 that the combined width of combined rectangular region i (where i is the number of rectangular regions B combined with the specific rectangular region) exceeds the maximum length Wm, the process proceeds to step S111. In step S111, the determination unit 114 decides whether to adopt as the recognized character the candidate showing the highest confidence p (or corrected confidence pc, if corrected in step S107) among the candidates obtained by inputting the image of the specific rectangular region into the character recognition model 121, or the candidate showing the highest confidence p among the candidates obtained by inputting the images of the combined rectangular regions into the character recognition model 121. Specifically, the respective confidences are compared and the candidate character corresponding to the larger value becomes the recognized character.
 The determination unit 114 proceeds to step S112 and checks whether character recognition has been completed for all the rectangular regions B obtained by the division in step S103. If not, the process returns to step S104 and the character recognition processing is repeated. If it has been completed, the process proceeds to step S113, in which the output unit 115 ends the series of processes by combining the sequentially determined recognized characters and outputting them. If any character region LE whose character string recognition has not yet been completed remains, the processing from step S103 onward is executed for that character region LE. In this case, when consecutive character regions LE1, LE2, ... are recognized as a single, mutually related sentence, the output unit 115 may combine the recognized character strings generated in step S113 before outputting them. Furthermore, if any reading region DE whose character regions have not yet been extracted remains, the processing from step S102 onward is executed for that reading region DE.
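Taken together, steps S104 through S113 amount to the following loop. This is a hedged sketch, not the patent's code: strings stand in for region images, and `model` is a hypothetical stand-in for the character recognition model 121, assumed to return (candidate, confidence, is_kanji) tuples.

```python
def recognize_line(regions, model, w_max, w_ref, coeff=0.95):
    """Sketch of steps S104-S113 for one character region LE.
    `regions` is a list of (image, width) pairs for the rectangular
    regions B1, B2, ...; images are modeled as strings so that merging
    regions is simple concatenation."""
    out = []
    n = 0
    while n < len(regions):                      # S104: set specific region
        img, width = regions[n]
        best_char, best_p, consumed = "", -1.0, 1
        for ch, p, kanji in model(img):          # S105: candidates + confidence
            if kanji and width <= w_ref:         # S106/S107: corrected pc
                p *= coeff
            if p > best_p:
                best_char, best_p = ch, p
        total, merged = width, img
        for k in range(n + 1, len(regions)):     # S108: grow combined region
            img_k, w_k = regions[k]
            total += w_k
            if total > w_max:                    # S109: stop at max length Wm
                break
            merged += img_k
            for ch, p, _ in model(merged):       # S110: evaluate combined image
                if p > best_p:                   # S111: keep the larger value
                    best_char, best_p, consumed = ch, p, k - n + 1
        out.append(best_char)
        n += consumed                            # advance past settled regions
    return "".join(out)                          # S113: recognized string
```

For instance, with a toy model that scores the merged image of two adjacent regions higher than either region alone, both regions are consumed as one recognized character, mirroring how the radical and right-hand side of "都" are merged in FIG. 8.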
 The character string recognition device and character string recognition program of the present embodiment described above can achieve high character recognition accuracy even when the character string contained in the document image to be recognized is in a variable-width font, mixes half-width and full-width characters, or is handwritten. The invention is not limited to Japanese and can be applied to languages in general, including those written in Latin, Hangul, Cyrillic, or Arabic script.
 In the above embodiment, the case where the character string is printed horizontally in the reading region DE has been described, but the same processing can be applied when the character string is printed vertically in the reading region DE. Specifically, the extraction unit 112 first extracts character regions LE from the image of the reading region DE so that each vertically encompasses one line of the character string. Next, the division unit 113 divides the character region into a plurality of rectangular regions by partitioning it along blank bands orthogonal to the vertical direction in which the characters are arranged. The determination unit 114 then determines whether to adopt as the recognized character the candidate obtained by inputting into the character recognition model 121 the image of the specific rectangular region to be evaluated among these rectangular regions, or the candidate obtained by inputting into the character recognition model 121 the image of a combined rectangular region in which another rectangular region contiguous with the specific rectangular region is joined below it. Some kanji crowns (top radicals) are themselves valid kanji, but recognizing a vertically written character string in this way prevents a crown from being misrecognized as a complete kanji. The character string is not limited to vertical or horizontal writing; as shown in FIG. 10, it may be written diagonally or along a curve, and the direction of writing is not particularly limited. For example, when the reading region DE is extracted with a curved shape as shown in FIG. 12(b), the scanning window SW may be set orthogonal to the curve direction, which is the character string direction, and the rectangular regions B may be set based on the detected blank bands. In this case, the divided rectangular regions B may be, for example, trapezoids or shapes in which some sides are curved.
 In the present embodiment, a membership card has been described as an example, but the present invention is not limited to this and can be applied to any standard or non-standard form, such as the front and back of a driver's license, the front and back of a residence card, a My Number card, or a My Number notification card.
 100 ... character string recognition device; 110 ... processing unit; 111 ... acquisition unit; 112 ... extraction unit; 113 ... division unit; 114 ... determination unit; 115 ... output unit; 120 ... storage unit; 121 ... character recognition model; 130 ... input/output IF; 200 ... scanner; 300 ... monitor; 900 ... membership card

Claims (6)

  1.  A character string recognition device comprising:
     an extraction unit that extracts, line by line, a character region encompassing a character string from a document image containing the imaged character string;
     a division unit that divides the character region into a plurality of rectangular regions by partitioning it along blank bands orthogonal to the direction in which the characters are arranged;
     a determination unit that determines whether to adopt, as a recognized character, a candidate character obtained by inputting an image of a specific rectangular region to be evaluated among the plurality of rectangular regions into a character recognition model that estimates a single character represented in an image, or a candidate character obtained by inputting into the character recognition model an image of a combined rectangular region in which another rectangular region contiguous with the specific rectangular region is joined to the specific rectangular region; and
     an output unit that outputs a recognized character string generated by combining a plurality of the recognized characters sequentially determined by the determination unit while shifting the specific rectangular region through the plurality of rectangular regions.
  2.  The character string recognition device according to claim 1, wherein the determination unit generates a plurality of the combined rectangular regions and inputs each of their images into the character recognition model by allowing a plurality of the other rectangular regions to be combined with the specific rectangular region within a range in which the length of the generated combined rectangular region in the arrangement direction is less than or equal to a set maximum length.
  3.  The character string recognition device according to claim 2, wherein the maximum length is set based on the respective lengths of the plurality of rectangular regions in the arrangement direction.
  4.  The character string recognition device according to any one of claims 1 to 3, wherein, when generating the combined rectangular region, the determination unit includes in the combination the blank band sandwiched between the rectangular regions.
  5.  The character string recognition device according to any one of claims 1 to 4, wherein, when the length of the specific rectangular region in the arrangement direction is less than or equal to a set reference length, the determination unit lowers the confidence calculated by the character recognition model for the candidate character.
  6.  A character string recognition program that causes a computer to execute:
     an extraction step of extracting, line by line, a character region encompassing a character string from a document image containing the imaged character string;
     a division step of dividing the character region into a plurality of rectangular regions by partitioning it along blank bands orthogonal to the direction in which the characters are arranged;
     a determination step of determining whether to adopt, as a recognized character, a candidate character obtained by inputting an image of a specific rectangular region to be evaluated among the plurality of rectangular regions into a character recognition model that estimates a single character represented in an image, or a candidate character obtained by inputting into the character recognition model an image of a combined rectangular region in which another rectangular region contiguous with the specific rectangular region is joined to the specific rectangular region; and
     an output step of outputting a recognized character string generated by combining a plurality of the recognized characters sequentially determined by repeating the determination step while shifting the specific rectangular region through the plurality of rectangular regions.

PCT/JP2021/002588 2020-02-06 2021-01-26 Character string recognition device and character string recognition program WO2021157422A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021575740A JP7382544B2 (en) 2020-02-06 2021-01-26 String recognition device and string recognition program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020018529 2020-02-06
JP2020-018529 2020-02-06

Publications (1)

Publication Number Publication Date
WO2021157422A1 true WO2021157422A1 (en) 2021-08-12

Family

ID=77199950

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/002588 WO2021157422A1 (en) 2020-02-06 2021-01-26 Character string recognition device and character string recognition program

Country Status (2)

Country Link
JP (1) JP7382544B2 (en)
WO (1) WO2021157422A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04211884A (en) * 1990-05-24 1992-08-03 Ricoh Co Ltd Method for segmenting character
JP6057112B1 (en) * 2016-04-19 2017-01-11 AI inside株式会社 Character recognition apparatus, method and program
JP2017531262A (en) * 2014-09-16 2017-10-19 アイフライテック カンパニー, リミテッドIflytek Co., Ltd. Intelligent scoring method and system for descriptive problems


Also Published As

Publication number Publication date
JPWO2021157422A1 (en) 2021-08-12
JP7382544B2 (en) 2023-11-17


Legal Events

- 121: EP — the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21750054; Country of ref document: EP; Kind code of ref document: A1)
- ENP: Entry into the national phase (Ref document number: 2021575740; Country of ref document: JP; Kind code of ref document: A)
- NENP: Non-entry into the national phase (Ref country code: DE)
- 122: EP — PCT application non-entry in European phase (Ref document number: 21750054; Country of ref document: EP; Kind code of ref document: A1)