WO2020203339A1 - Printed character string recognition device, program, and method - Google Patents

Printed character string recognition device, program, and method Download PDF

Info

Publication number
WO2020203339A1
WO2020203339A1 PCT/JP2020/012230 JP2020012230W WO2020203339A1 WO 2020203339 A1 WO2020203339 A1 WO 2020203339A1 JP 2020012230 W JP2020012230 W JP 2020012230W WO 2020203339 A1 WO2020203339 A1 WO 2020203339A1
Authority
WO
WIPO (PCT)
Prior art keywords
character string
image
unit
recognition
type
Prior art date
Application number
PCT/JP2020/012230
Other languages
French (fr)
Japanese (ja)
Inventor
武史 吉田
昂平 安田
諒介 佐々木
康介 木戸
亮介 田嶋
大田 佳宏
Original Assignee
Arithmer株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arithmer株式会社 filed Critical Arithmer株式会社
Priority to JP2020536819A priority Critical patent/JP6820578B1/en
Publication of WO2020203339A1 publication Critical patent/WO2020203339A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/192Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194References adjustable by an adaptive method, e.g. learning

Definitions

  • This disclosure relates to a printed character string recognition device, a program, and a method.
  • Patent Document 1 (Japanese Unexamined Patent Publication No. 2003-115028) discloses a form processing technique for identifying which of a plurality of registered form types a standard form corresponds to.
  • the printed character string recognition device of the first aspect includes an acquisition unit, an extraction unit, a division unit, and a recognition unit.
  • the acquisition unit acquires a document image in which a printed character string is entered in a reading area.
  • the extraction unit extracts the image of the reading area based on a reference document image.
  • the division unit divides the image of the printed character string entered in the reading area into images of rectangular areas of one character each, for each row or column.
  • the recognition unit recognizes the contents of the printed character string using a character string recognition model.
  • the character string recognition model infers, in response to the input of an image of a rectangular area, the content of the printed character written in that image.
  • the printed character string recognition device of the second aspect is the printed character string recognition device of the first aspect, wherein the extraction unit extracts a plurality of reading areas corresponding to a plurality of predetermined items. Further, when two or more predetermined items are associated with the same character string recognition model, the recognition unit combines the rectangular areas obtained from the two or more predetermined items and inputs the combined rectangular areas into that same character string recognition model.
  • in the document image, a printed character string is entered in association with each of the plurality of predetermined items.
  • the character string recognition model is constructed by a neural network, and a plurality of character string recognition models are prepared in association with the plurality of predetermined items. With such a configuration, the recognition speed of the printed character string can be increased.
  • the printed character string recognition device of the third aspect is the printed character string recognition device of the first or second aspect, wherein the recognition unit generates a candidate character string from the printed characters inferred using the character string recognition model.
  • the recognition unit compares the candidate character string with a reference character string stored in advance to calculate a similarity. When the similarity is equal to or greater than a predetermined value, the recognition unit recognizes the reference character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items whose candidate character strings can be anticipated.
  • the printed character string recognition device of the fourth aspect is the printed character string recognition device of the third aspect, wherein, when the similarity is smaller than the predetermined value, the recognition unit recognizes the candidate character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items in which free entry is permitted.
  • the printed character string recognition device of the fifth aspect is the printed character string recognition device of any of the first to fourth aspects, wherein the division unit binarizes the image of the reading area.
  • the division unit identifies a continuous region in which a plurality of printed characters are continuous along the row direction or the column direction of the binarized image. The division unit then scans the image of the continuous region in the column direction or the row direction and divides it into binarized rectangular-area images, one per character. With such a configuration, the rectangular areas of the printed characters can be divided out with high accuracy.
  • the printed character string recognition device of the sixth aspect is the printed character string recognition device of the fifth aspect, wherein the recognition unit recognizes the contents of the printed character string from the binarized rectangular-area images. With such a configuration, the recognition speed of the printed character string can be increased.
  • the printed character string recognition device of the seventh aspect is the printed character string recognition device of the fifth aspect, wherein the recognition unit recognizes the contents of the printed character string from the images of the rectangular areas before binarization that correspond to the binarized rectangular areas. With such a configuration, the recognition accuracy of the printed character string can be improved.
  • the printed character string recognition device of the eighth aspect is a printed character string recognition device of any of the first to fourth aspects, wherein the division unit binarizes the image of the reading area.
  • the division unit identifies a continuous region in which a plurality of printed characters are continuous along the row direction or the column direction of the binarized image. The division unit then divides the binarized image corresponding to the continuous region into rectangular-area images based on a unit length corresponding to one character along the row direction or the column direction. With such a configuration, the rectangular areas of the printed characters can be divided out with high accuracy.
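Because every printed character is assumed to have the same character width, the unit-length division of the eighth aspect can be sketched simply by cutting a continuous region into equal cells. This is an illustrative sketch under that assumption, not the patented implementation; the function name and the `unit_width` parameter are invented here.

```python
import numpy as np

def split_by_unit_length(region: np.ndarray, unit_width: int) -> list:
    """Divide a binarized continuous region into one-character cells of a
    fixed pitch along the row direction. `unit_width` is the assumed
    number of pixels per printed character."""
    height, width = region.shape
    n_chars = max(1, round(width / unit_width))
    # Distribute the width evenly so rounding error is not pushed onto
    # the last character cell.
    bounds = [i * width // n_chars for i in range(n_chars + 1)]
    return [region[:, bounds[i]:bounds[i + 1]] for i in range(n_chars)]
```

For a 30-pixel-wide region and a 10-pixel unit length, this yields three 10-pixel character cells.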
  • the printed character string recognition device of the ninth aspect is a printed character string recognition device of any of the first to eighth aspects, and displays the recognition result of the printed character string by the recognition unit while changing the display form according to the character type. With such a configuration, the correctness of the character type read can be judged easily.
  • the printed character string recognition program of the tenth aspect causes a computer to function as an acquisition unit, an extraction unit, a division unit, and a recognition unit.
  • the acquisition unit acquires a document image in which a print character string is written in the reading area.
  • the extraction unit extracts the image of the reading area based on the reference document image.
  • the dividing unit divides the image of the printed character string written in the reading area into an image of a rectangular area in units of one character.
  • the recognition unit recognizes the contents of the printed character string by using the character string recognition model.
  • the character string recognition model infers the content of the printed characters written in the image of the rectangular area in response to the input of the image of the rectangular area.
  • the printed character string recognition method of the eleventh aspect is a method of recognizing the contents of a printed character string using a computer.
  • a document image in which a print character string is written in a reading area is acquired.
  • the image in the reading area is extracted based on the reference document image.
  • the image of the printed character string written in the reading area is divided into an image of a rectangular area for each character.
  • the content of the printed character string is recognized by using the character string recognition model.
  • the character string recognition model infers the content of the printed characters written in the image of the rectangular area in response to the input of the image of the rectangular area.
  • FIG. 1 is a schematic diagram showing a configuration of a print character string recognition device 20 according to the present embodiment.
  • the print character string recognition device 20 recognizes the contents of the print character string ML described in the form image Gs by using the character string recognition model 21C described later.
  • the contents to be entered are displayed in print in the predetermined item column of the form image Gs according to the present embodiment.
  • the form image Gs means an image on which an arbitrary form is copied.
  • each printed character constituting the printed character string referred to here has the same character width.
  • in the present embodiment, the printed character string recognition device 20 is described as recognizing the contents of the printed character string entered in the form image Gs, but the recognition target of the printed character string recognition device 20 is not limited to form images.
  • the printed character string recognition device can recognize the contents of any "document image" onto which an arbitrary document containing an entered character string is copied.
  • item K1 corresponds to the zip code
  • item K2 corresponds to the address
  • item K3 corresponds to the name
  • item K4 corresponds to the registration number
  • item K5 corresponds to the article used
  • item K6 corresponds to the purpose of use.
  • the contents of these items are merely examples and are not limited thereto.
  • the printed character string recognition device 20 can be realized by an arbitrary computer, and includes a storage unit 21, an input unit 22, an output unit 23, and a processing unit 24.
  • the storage unit 21 stores various types of information, and is realized by an arbitrary storage device such as a memory and a hard disk.
  • the storage unit 21 stores information such as the weight of the neural network that constructs the character string recognition model 21C.
  • the character string recognition model 21C infers the content of the printed characters entered in the image of the rectangular area in response to the input of the image of the rectangular area of each character.
  • the character string recognition model 21C is constructed by a convolutional neural network (CNN) or the like whose weights are adjusted based on the teacher image on which the printed characters are copied.
  • the input unit 22 is realized by an arbitrary input device such as a keyboard, a mouse, and a touch panel, and inputs various information to the computer.
  • the output unit 23 is realized by an arbitrary output device such as a display, a touch panel, and a speaker, and outputs various information from a computer.
  • the processing unit 24 executes various types of information processing, and is realized by a processor such as a CPU or GPU together with memory.
  • the processing unit 24 functions as an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D.
  • the acquisition unit 24A acquires the form image Gs in which the print character string ML is entered.
  • the acquisition unit 24A acquires the image of the form to be read as the form image Gs via an arbitrary imaging device.
  • the extraction unit 24B matches the form image Gs with the reference form image Gc and extracts the reading area R.
  • the matching referred to here is not only to directly compare the images to obtain the difference, but also to obtain the reading area R from the form image Gs based on the coordinate information corresponding to the predetermined item K acquired from the reference form image Gc. It means the process of extracting.
  • in the reference form image Gc, the column corresponding to each predetermined item K is blank. Therefore, in the examples shown in FIGS. 2 and 3, the contents corresponding to the predetermined items K1 to K6 in the form image Gs are extracted as the reading areas R1 to R6, respectively, based on the form image Gs and the reference form image Gc.
  • FIG. 4(a) is the image of the reading area R1 corresponding to item K1, FIG. 4(b) is the image of the reading area R2 corresponding to item K2, FIG. 4(c) is the image of the reading area R3 corresponding to item K3, FIG. 4(d) is the image of the reading area R4 corresponding to item K4, FIG. 4(e) is the image of the reading area R5 corresponding to item K5, and FIG. 4(f) is the image of the reading area R6 corresponding to item K6.
  • the reference form image Gc and the coordinate information corresponding to the predetermined item K in the reference form image Gc are stored in advance in the storage unit 21.
  • the division unit 24C divides the printed character string ML entered in the reading area R into rectangular areas T of one character each. Details will be described later.
  • the recognition unit 24D recognizes the contents of the print character string ML based on the image of the rectangular area T.
  • the recognition unit 24D recognizes the contents of the print character string ML by using the character string recognition model 21C.
  • FIG. 5 is a flowchart for explaining the operation of the print character string recognition device 20 according to the present embodiment.
  • the form image Gs to be read is imaged via an arbitrary imaging device.
  • these form images Gs are timely stored in the storage unit 21 of the print character string recognition device 20 (A1).
  • the printed character string recognition device 20 extracts the image of the reading area R based on the reference form image Gc (A2). Subsequently, it divides the image of the printed character string ML entered in the reading area R into images of rectangular areas T of one character each, for each line (A3). At this time, by the function of the division unit 24C, the device identifies continuous areas S in which a plurality of printed characters are continuous along the line direction in the reading area R. Taking item K2 (address) as an example, as shown in FIG. 6, the division unit 24C binarizes the image of the reading area R2 of item K2 (FIGS. 6(a) and 6(b)).
  • the division unit 24C then expands the image of the binarized reading area R2 in the line direction to fill in the character portions (FIG. 6(c)).
  • as a result, adjacent characters are joined across the ordinary character spacing, and each run of characters up to a blank between words is identified as one of the continuous areas S1 to S4.
  • in this example, the images in which "Aichi Prefecture", "Nagoya City", "Naka Ward", and "X-chome Y-address z" are written are identified as the individual continuous areas S1 to S4 (FIG. 6(d)).
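The binarize-and-expand step described above can be sketched in a simplified one-dimensional form. This is an illustrative approximation, not the patented implementation: instead of a true morphological dilation, it merges runs of ink columns separated by fewer than an assumed `max_gap` blank columns, so that only word-sized blanks delimit the continuous areas.

```python
import numpy as np

def find_continuous_regions(binary: np.ndarray, max_gap: int) -> list:
    """Return (start, end) column spans of continuous areas in a
    binarized line image. Gaps of at most `max_gap` columns (ordinary
    character spacing) are closed; wider blanks split the line."""
    ink_cols = binary.any(axis=0)              # True where a column has ink
    runs, start = [], None
    for x, ink in enumerate(ink_cols):
        if ink and start is None:
            start = x                          # a run of ink begins
        elif not ink and start is not None:
            runs.append([start, x])            # a run of ink ends
            start = None
    if start is not None:
        runs.append([start, binary.shape[1]])
    merged = []
    for run in runs:                           # close small inter-character gaps
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)
    return [(s, e) for s, e in merged]
```

With two characters one column apart and the next word six columns away, the first two merge into one continuous area while the word boundary survives.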
  • next, by the function of the division unit 24C, the printed character string recognition device 20 scans each of the continuous areas S1 to S4 in the column direction and divides it character by character. Specifically, each continuous area is scanned along a line L having a width equal to or less than the character spacing, and the luminance value in the column direction is calculated. An area where the calculated luminance value is equal to or less than a predetermined value is regarded as a space corresponding to the character spacing, and the image of the portion up to the next space is extracted as a rectangular area T for each character.
  • FIG. 7(a) is a diagram showing the concept of the line L scanning the continuous area S1, and FIG. 7(b) is a diagram showing the luminance values in the continuous area S1.
  • in FIG. 7(b), the luminance value in the column direction when the continuous area S1 is scanned by the line L is plotted against the pixels in the horizontal direction of the image of the continuous area S1.
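The column-direction scan of line L can be sketched as follows. The ink-pixel count per column stands in for the (inverted) luminance profile, and `space_thresh` plays the role of the predetermined value; both names are assumptions for illustration.

```python
import numpy as np

def split_characters(region: np.ndarray, space_thresh: int = 0) -> list:
    """Cut a continuous region into per-character rectangles wherever the
    column-direction ink count drops to `space_thresh` or below,
    mimicking the scan along line L in FIG. 7."""
    profile = region.sum(axis=0)               # ink pixels per column
    boxes, start = [], None
    for x, v in enumerate(profile):
        if v > space_thresh and start is None:
            start = x                          # a character begins
        elif v <= space_thresh and start is not None:
            boxes.append(region[:, start:x])   # a character ends at a space
            start = None
    if start is not None:
        boxes.append(region[:, start:])
    return boxes
```

Each returned box is then a one-character rectangular area T ready for the recognition model.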
  • the print character string recognition device 20 recognizes the contents of the print character string ML based on the image of the rectangular area T divided into character units (A4). Specifically, the print character string recognition device 20 recognizes the content of the print character string ML by inferring the content of the print character using the character string recognition model 21C constructed by the neural network.
  • the term "inference” as used herein means a recognition result of a printed character calculated by a neural network whose weight is adjusted based on a general-purpose character image. For example, as shown in FIG. 8, when the image of the rectangular region T1 corresponding to the part "love” is input to the character string recognition model 21C, “love”, “melancholy”, “ ⁇ ”, “receive”, etc. are “love”.
  • a type character similar in shape to "" is inferred with accuracy p.
  • the accuracy of "love” is 99.58309
  • the accuracy of "melancholy” is 0.0040124
  • the accuracy of " ⁇ ” is 1.26771865 ⁇ (10-5th power)
  • the accuracy of "receive” is It is inferred to be 4.238405 ⁇ (10 to the -9th power). Therefore, since the probability of "love” is the highest, the content of the image in the rectangular region T1 is recognized as “love”.
  • the print character string ML described in the predetermined item K of the form image Gs is recognized.
  • in the above description, the blanks between words delimit the continuous areas, but the printed character string recognition device 20 according to the present embodiment may instead identify a continuous region without regard to blanks between words, as long as the characters are on the same line.
  • for example, one continuous region S in which the continuous regions S1 to S3 of the same line are connected may be identified, ignoring the blanks between words.
  • in that case, the whole of "Naka Ward, Nagoya City, Aichi Prefecture" is identified as one continuous area. Even when the continuous area is identified line by line in this way, the rectangular areas T divided into character units can still be obtained.
  • the print character string recognition device 20 includes an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D.
  • the acquisition unit 24A acquires the form image Gs in which the print character string ML is written in the reading area R.
  • the extraction unit 24B extracts the image of the reading area R based on the reference form image Gc.
  • the dividing unit 24C divides the image of the printed character string ML written in the reading area R into an image of the rectangular area T in units of one character for each row or column.
  • the recognition unit 24D recognizes the contents of the print character string ML based on the image of the rectangular area T. In this way, since the printed character string ML written in the form can be extracted as an image in units of one character, the characters contained in a large number of form images can be read at high speed and with high accuracy.
  • the recognition unit 24D recognizes the contents of the print character string ML by using the character string recognition model 21C.
  • the character string recognition model 21C is constructed by a neural network, and infers the content of the printed characters written in the image of the rectangular area T in response to the input of the image of the rectangular area T. Therefore, by using such a neural network, the recognition accuracy of the printed character string ML can be improved.
  • the division unit 24C performs binarization processing of the image in the reading area R. Further, the dividing unit 24C specifies a continuous region S in which a plurality of printed characters are continuous along the line direction of the binarized image. Then, the division unit 24C scans the image of the continuous region S in the column direction and divides the image into the image of the rectangular region T binarized in character units. With such a configuration, the image of the continuous region S can be divided into the image of the rectangular region T of the printed characters with high accuracy. Then, by using the character string recognition model 21C constructed by the neural network optimized for the image of each character, the printed characters can be recognized with high accuracy.
  • the recognition unit 24D recognizes the contents of the print character string ML from the image of the binarized rectangular area T, the recognition speed of the print character string ML can be increased.
  • noise is removed from the image of the binarized rectangular region T, it is possible to reduce the calculation load executed when recognizing the print character string ML. As a result, the recognition speed of the print character string ML can be increased.
  • in the above description, the division unit 24C identifies continuous areas S in which a plurality of printed characters are continuous along the row direction, but it may instead identify continuous areas in which a plurality of printed characters are continuous along the column direction. In that case, the division unit 24C scans the image of the continuous region S in the row direction to divide it into binarized rectangular-area images in character units.
  • the recognition unit 24D recognizes the content of the print character string ML from the image of the binarized rectangular region T, but the recognition unit 24D according to the present embodiment is limited to this. is not.
  • the recognition unit 24D according to the present embodiment may instead use the binarized rectangular area T to locate and extract the corresponding rectangular-area image in the form image Gs before binarization, and recognize the contents of the printed character string ML from that image. With such a configuration, the recognition accuracy of the printed character string ML can be improved.
  • this is because the binarized image may differ in size from the area in which the printed characters appear in the form image Gs before binarization. Therefore, the recognition accuracy of the printed characters can be improved by using the binarized image only until the rectangular area of each character is obtained, and using the image before binarization when inferring the printed character.
  • the position of the rectangular region T divided after the binarization process is projected onto the form image Gs before the binarization process to extract a rectangular image, and this image is input to the character string recognition model 21C.
  • the character string recognition model 21C in this case is not a binarized character image, but an image in which characters are copied in grayscale or color display is learned as a teacher image.
  • the processing unit 24 described above may execute a process of displaying the recognition result of the printed character string by the recognition unit 24D while changing the display form according to the character type.
  • when the contents of a printed character string are inferred using the character string recognition model 21C, characters with similar outer shapes, such as the kanji "工" and the katakana "エ", or the alphabet letter "O" and the number "0", may be misidentified. Therefore, by displaying the recognition result in different colors according to whether a character is a number, a letter, or a symbol, the correctness of the identification result of the character string recognition model 21C can be judged at a glance.
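One way to realize the per-character-type display is to classify each recognized character and emit a colored span, as in this sketch. The class set, the Unicode-range test, and the color assignments are illustrative choices, not part of the disclosure.

```python
def char_class(ch: str) -> str:
    """Classify a recognized character so its display color can signal
    the character type (illustrative class set)."""
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if "\u30a0" <= ch <= "\u30ff":     # Unicode katakana block
        return "katakana"
    return "other"

COLORS = {"digit": "red", "latin": "blue", "katakana": "green", "other": "black"}

def to_colored_html(text: str) -> str:
    """Render the recognition result with one colored <span> per character."""
    return "".join(
        f'<span style="color:{COLORS[char_class(c)]}">{c}</span>' for c in text
    )
```

A mis-read "O" would then show in blue where a red digit "0" was expected, making the error visible at a glance.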
  • in the above description, the form image Gs to be read is stored in the storage unit 21, but the printed character string recognition device 20 according to the present embodiment is not limited to this.
  • the form image Gs may be stored in the external storage device 121.
  • the security level of the external storage device 121 can be raised, and the risk of the form image Gs being leaked can be reduced.
  • the print character string recognition device 20 may be connected to the external storage device 121 via a network. Further, in this case, the acquisition unit 24A of the print character string recognition device 20 may access the external storage device 121 only when reading the form image Gs and read the form image Gs.
  • in the above description, the printed character string recognition device 20 is configured to include the input unit 22 and the output unit 23, but the printed character string recognition device 20 according to the present embodiment is not limited to this.
  • the printed character string recognition device according to the present embodiment does not necessarily have to include the input unit 22 and the output unit 23.
  • the print character string recognition device 20 according to the present embodiment may be configured as a system in which a plurality of devices are connected via a network.
  • a terminal device 100 having an input unit 122 and an output unit 123, which has the same functions as the input unit 22 and the output unit 23, and a print character string recognition device 20 are connected via a network. It may be a printed character string recognition system.
  • the print character string recognition device 20 may be connected to a plurality of terminal devices 100.
  • when configured in this way, the printed character string recognition device in the above description may be referred to as a printed character string recognition system.
  • FIG. 11 is a schematic view showing the configuration of the print character string recognition device 20S according to the second embodiment.
  • the print character string recognition device 20S according to the second embodiment performs processing using the reference character string MS.
  • the configurations already described will be designated by substantially the same reference numerals, and duplicate description will be omitted.
  • the configuration peculiar to the present embodiment will be described with the subscript "S”.
  • the storage unit 21S stores the reference character string database 21D.
  • the reference character string database 21D stores in advance the reference character string MS associated with each of the predetermined items K1 to K6 in the form image Gs. For example, in the reference character string database 21D, as shown in FIG. 12, when the predetermined item K5 of the form image Gs is "used article", “private ordinary passenger car” “private small passenger car” “private light four-wheeled passenger car” " Each of the "private light four-wheeled freight car” and “two-wheeled vehicle” is stored as the reference character strings MS5a to MS5e.
  • the recognition unit 24DS calculates the degree of similarity by comparing a candidate character string MK, described later, with the reference character strings MS stored in advance, and recognizes the reference character string MS with a high degree of similarity as the content of the printed character string ML.
  • when the similarity is smaller than the predetermined value, the recognition unit 24DS instead recognizes the candidate character string MK as the content of the printed character string ML.
  • FIG. 13 is a flowchart for explaining the operation of the print character string recognition device 20S according to the second embodiment.
  • in the present embodiment, steps B4 to B8 are executed instead of step A4 described above. Therefore, in the printed character string recognition device 20S according to the present embodiment as well, first, the same processing as steps A1 to A3 described above is executed (B1 to B3).
  • the print character string recognition device 20S generates a candidate character string MK from the print characters inferred using the character string recognition model 21C (B4).
  • the printed character string recognition device 20S compares the candidate character string MK with the reference character strings MS stored in advance to calculate the similarity (B5).
  • when the maximum similarity is equal to or greater than the predetermined value, the printed character string recognition device 20S recognizes the reference character string MS having the maximum similarity as the content of the printed character string ML (B6-Yes, B7).
  • otherwise, the printed character string recognition device 20S recognizes the candidate character string MK as the content of the printed character string (B6-No, B8).
  • the recognition unit 24DS compares the reference character string MS5 with the candidate character string MK5 and calculates the degree of similarity between the two.
  • “private passenger car”, “small private car”, “private light four-wheeled passenger car”, “private light four-wheeled freight car”, and "two-wheeled vehicle” are stored as reference character strings MS5a to MS5e.
  • the print character string recognition device 20S can further improve the recognition accuracy of the print character string ML by using the reference character string database 21D.
  • the characters included in a large number of form images can be read at high speed and with high accuracy.
  • the print character string recognition device 20S recognizes the candidate character string MK as the content of the print character string ML when the similarity is equal to or less than a predetermined value.
  • the characters included in the form image Gs can be read with high accuracy even when the reference character string MS cannot be registered in advance.
  • a free entry field may be provided in addition to the preset items.
  • the print character string recognition device 20S according to the present embodiment when the similarity is equal to or less than a predetermined value, it is considered that the contents other than the preset items are entered, and this is read with high accuracy.
  • FIG. 16 is a schematic view showing the configuration of the print character string recognition device 20T according to the third embodiment.
  • the configurations already described will be designated by substantially the same reference numerals, and duplicate description will be omitted.
  • the configuration peculiar to the present embodiment will be described with the subscript "T”.
  • the storage unit 21T stores a plurality of character string recognition models 21C1 to 21Cn (n is 2 or more and less than the number of items K) in association with a plurality of predetermined items K1 to K6.
  • in the present embodiment, four character string recognition models 21C1 to 21C4 (n = 4) are stored in the storage unit 21T.
  • when two or more predetermined items are associated with the same character string recognition model 21Ci (i is any value from 1 to 4), the recognition unit 24DT combines the images of the rectangular areas obtained from those items and inputs the combined images of the rectangular areas into that same character string recognition model 21Ci.
  • among the predetermined items K of the form image Gs to be read, there are items in which only numbers are entered, items in which only kana characters are entered, items in which arbitrary characters are entered, and so on.
  • arbitrary characters are entered in the above-mentioned item K2 (address) and item K3 (name). Therefore, the item K2 (address) and the item K3 (name) are associated with the character string recognition model 21C2 capable of identifying any character.
  • the recognition unit 24DT combines the images of the rectangular areas obtained from item K2 (address) and item K3 (name), and inputs the combined images of the rectangular areas into the character string recognition model 21C2 at once.
  • with the above configuration, the print character string recognition device 20T infers the print character strings ML of two or more predetermined items at once with the same character string recognition model 21Ci, so the recognition speed of the print character string ML can be increased.
  • in terms of recognition accuracy, there is no difference between inputting the image of the rectangular area corresponding to item K2 (address) and the image of the rectangular area corresponding to item K3 (name) into the character string recognition model 21C2 separately and inputting them combined at once. However, since the character string recognition model 21C2 is realized by a neural network, the calculation is faster in the latter case. Therefore, collectively processing items that share the same character type, as in the print character string recognition device 20T according to the third embodiment, improves the reading speed of the form image Gs.
  • further, since each character string recognition model 21Ci is constructed in association with the character type used for its items, the recognition accuracy and recognition speed of the print character string ML can be improved. For example, only numbers are entered in the above-mentioned item K1 (zip code). Therefore, for item K1 (zip code) it is sufficient to use the character string recognition model 21C1, which identifies only numbers, and both recognition accuracy and recognition speed can be made higher than when using the character string recognition model 21C2, which identifies arbitrary characters.
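The grouping described above, in which items sharing a character string recognition model (such as K2 and K3 both using model 21C2) have their rectangular-area images combined into one batch, can be sketched as follows. The item-to-model mapping and image sizes are illustrative stand-ins taken from the examples in the text, and the neural network itself is not modeled — only the batching that lets one forward pass cover all items associated with a model.

```python
import numpy as np

# Item-to-model mapping mirroring the text: K1 (zip code) uses the
# digits-only model 21C1; K2 (address) and K3 (name) share model 21C2.
ITEM_TO_MODEL = {"K1": "21C1", "K2": "21C2", "K3": "21C2"}

def batch_by_model(item_images):
    """Combine the rectangular-area images of all items that share a
    character string recognition model into one batch per model."""
    groups = {}
    for item, images in item_images.items():
        model_id = ITEM_TO_MODEL[item]
        groups.setdefault(model_id, []).extend(images)
    # Stack each group into a single (N, H, W) array so one forward pass
    # covers every item associated with that model.
    return {m: np.stack(imgs) for m, imgs in groups.items()}

# Toy 15x15 single-character images: 7 digits for K1, 10 + 4 characters
# for K2 and K3 (sizes are illustrative).
item_images = {
    "K1": [np.zeros((15, 15))] * 7,
    "K2": [np.zeros((15, 15))] * 10,
    "K3": [np.zeros((15, 15))] * 4,
}
batches = batch_by_model(item_images)
print({m: b.shape for m, b in batches.items()})
# Model 21C1 gets a batch of 7; model 21C2 gets one combined batch of 14.
```

Running each model once on its combined batch, rather than once per item, is what yields the speed-up claimed for the third embodiment.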
  • FIG. 17 is a schematic view showing the configuration of the print character string recognition device 20U according to the fourth embodiment.
  • the configurations already described will be designated by substantially the same reference numerals, and duplicate description will be omitted.
  • the configuration peculiar to the present embodiment will be described with the subscript "U".
  • in the present embodiment, another method is adopted in step A3 when extracting rectangular images in units of one character.
  • the print character string recognition device 20U includes a division unit 24CU.
  • the division unit 24CU specifies the continuous regions S1 to S4 by the same processing as described above. That is, in the example shown in FIG. 6, the images in which "Aichi Prefecture", "Nagoya City", "Naka Ward", and "X-chome Y-address z" are written are specified as the individual continuous regions S1 to S4 (FIG. 6(d)).
  • next, the division unit 24CU divides the binarized image corresponding to each of the continuous regions S1 to S4 into images of rectangular areas based on a unit length corresponding to one character along the row direction. Specifically, suppose that a one-character image is written in a rectangular area of 15 pixels in each of the row and column directions, and that the binarized image corresponding to the continuous region S1 has a length of 48 pixels in the row direction. In this case, the value 3.2 obtained by dividing 48 pixels by 15 pixels is rounded to determine the number of divisions (here, 3), and the binarized image corresponding to the continuous region S1 is divided into that number of parts.
  • the print character string recognition device 20U regards each divided image as an image of a rectangular area T divided into single-character units, and recognizes the contents of the print character string ML by using the character string recognition model 21C.
  • the print character string recognition device 20U can extract the print character string ML written in the form as an image for each character, so the characters contained in a large number of form images can be read at high speed and with high accuracy.
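The unit-length division of the fourth embodiment, including the 48-pixel / 15-pixel example above, can be sketched as follows; the character width of 15 pixels and the 48-pixel-wide region are the figures from the text.

```python
import numpy as np

def split_by_unit_length(region, unit_px=15):
    """Split a binarized continuous-region image into rectangular-area
    images of roughly one character each, based on a fixed character width
    (all printed characters are assumed to share the same width)."""
    width = region.shape[1]
    # 48 px / 15 px = 3.2 -> rounded to 3 divisions, as in the example above.
    n_div = max(1, round(width / unit_px))
    bounds = np.linspace(0, width, n_div + 1).astype(int)
    return [region[:, bounds[i]:bounds[i + 1]] for i in range(n_div)]

region = np.zeros((15, 48))           # continuous region S1: 48 px wide
chars = split_by_unit_length(region)  # -> 3 rectangular-area images
print(len(chars), [c.shape[1] for c in chars])
```

Because the split positions come from the fixed character width rather than from blank columns, a character whose strokes contain internal gaps cannot be mistaken for two characters.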
  • depending on the print characters to be read, the reading accuracy may be improved by the method according to this embodiment.
  • for example, when the width of the line L that scans the continuous regions S1 to S4 is narrow, the katakana "ri" and similar characters may be read as two characters, such as "1" and "no".
  • with the print character string recognition device 20U according to the present embodiment, print characters having the same character width can be read with high accuracy, without a single character being read as two.
  • the features and modifications of the first to third embodiments, other than (13-2) above, can also be applied as-is to the print character string recognition device 20U according to the fourth embodiment.
  • the present disclosure is not limited to the above embodiments as they are.
  • at the implementation stage, the components can be modified and embodied without departing from the gist of the disclosure.
  • various disclosures can be formed by appropriately combining the plurality of components disclosed in each of the above embodiments. For example, some components may be deleted from all the components shown in the embodiment. Further, the components may be appropriately combined in different embodiments.
  • 20, 20S, 20T, 20U: Printed character string recognition device
  • 21: Storage unit
  • 21C: Character string recognition model
  • 21D: Reference character string database
  • 22: Input unit
  • 23: Output unit
  • 24: Processing unit
  • 24A: Acquisition unit
  • 24B: Extraction unit
  • 24C: Division unit
  • 24D: Recognition unit
  • 100: Terminal device
  • 121: External storage device
  • 122: Input unit
  • 123: Output unit
  • Gc: Reference form image
  • Gs: Form image
  • ML: Printed character string
  • MK: Candidate character string
  • MS: Reference character string
  • R: Reading area
  • T: Rectangular area


Abstract

This printed character string recognition device 20 is provided with an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D. The acquisition unit 24A acquires a business form image Gs in which a printed character string ML is entered in a reading area R. The extraction unit 24B extracts the image of the reading area R on the basis of a reference business form image Gc. The division unit 24C divides the image of the printed character string ML entered in the reading area R into images of rectangular areas T of one character each. The recognition unit 24D recognizes the content of the printed character string ML on the basis of the images of the rectangular areas T. Here, the recognition unit 24D recognizes the content of the printed character string ML by using a character string recognition model 21C, which infers the content of the printed character entered in an image of a rectangular area T in response to the input of that image.

Description

Printed character string recognition device, program, and method.
This disclosure relates to a printed character string recognition device, a program, and a method.
Conventionally, the development of technology for recognizing characters contained in form images has been promoted. For example, Patent Document 1 (Japanese Unexamined Patent Publication No. 2003-115028) discloses a form processing technique for identifying whether a form corresponds to any of a plurality of types of registered standard forms.
When there are many form images to be processed, the character strings contained in these form images must be read at high speed and with high accuracy.
The printed character string recognition device of the first aspect includes an acquisition unit, an extraction unit, a division unit, and a recognition unit. The acquisition unit acquires a document image in which a printed character string is entered in a reading area. The extraction unit extracts the image of the reading area based on a reference document image. The division unit divides the image of the printed character string entered in the reading area into images of rectangular areas of one character each, row by row or column by column. The recognition unit recognizes the content of the printed character string by using a character string recognition model. Here, the character string recognition model infers the content of the printed characters entered in an image of a rectangular area in response to the input of that image. With such a configuration, character strings contained in a large number of document images (including form images) can be read at high speed and with high accuracy.
The printed character string recognition device of the second aspect is the device of the first aspect, in which the extraction unit extracts a plurality of reading areas corresponding to a plurality of predetermined items. Further, when two or more predetermined items are associated with the same character string recognition model, the recognition unit combines the rectangular areas obtained from the two or more predetermined items and inputs the combined rectangular areas into that same character string recognition model. Here, printed character strings are entered in the document image in association with the plurality of predetermined items. Further, the character string recognition model is constructed by a neural network, and a plurality of such models are prepared in association with the plurality of predetermined items. With such a configuration, the recognition speed of the printed character string can be increased.
The printed character string recognition device of the third aspect is the device of the first or second aspect, in which the recognition unit generates a candidate character string from the printed characters inferred using the character string recognition model. The recognition unit then compares the candidate character string with a reference character string stored in advance to calculate a similarity. When the similarity is equal to or higher than a predetermined value, the recognition unit recognizes the reference character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items whose candidate character strings can be anticipated.
The printed character string recognition device of the fourth aspect is the device of the third aspect, in which, when the similarity is smaller than the predetermined value, the recognition unit recognizes the candidate character string as the content of the printed character string. With such a configuration, the recognition accuracy of the printed character string can be improved for items in which free entry is permitted.
The printed character string recognition device of the fifth aspect is the device of any of the first to fourth aspects, in which the division unit binarizes the image of the reading area. The division unit then identifies, along the row direction or the column direction of the binarized image, continuous regions in which a plurality of printed characters are contiguous. The division unit then scans the image of each continuous region in the column direction or the row direction and divides it into binarized images of rectangular areas of one character each. With such a configuration, the rectangular areas of the printed characters can be divided with high accuracy.
The printed character string recognition device of the sixth aspect is the device of the fifth aspect, in which the recognition unit recognizes the content of the printed character string from the binarized images of the rectangular areas. With such a configuration, the recognition speed of the printed character string can be increased.
The printed character string recognition device of the seventh aspect is the device of the fifth aspect, in which the recognition unit recognizes the content of the printed character string from the pre-binarization images of the rectangular areas corresponding to the binarized rectangular areas. With such a configuration, the recognition accuracy of the printed character string can be improved.
The printed character string recognition device of the eighth aspect is the device of any of the first to fourth aspects, in which the division unit binarizes the image of the reading area. The division unit then identifies, along the row direction or the column direction of the binarized image, continuous regions in which a plurality of printed characters are contiguous. The division unit then divides the binarized image corresponding to each continuous region into images of rectangular areas based on a unit length corresponding to one character, along the row direction or the column direction. With such a configuration, the rectangular areas of the printed characters can be divided with high accuracy.
The printed character string recognition device of the ninth aspect is the device of any of the first to eighth aspects, which displays the recognition result of the printed character string by the recognition unit while changing the display form according to the character type. With such a configuration, the correctness of the character type to be read can be easily judged.
The printed character string recognition program of the tenth aspect causes a computer to function as an acquisition unit, an extraction unit, a division unit, and a recognition unit. The acquisition unit acquires a document image in which a printed character string is entered in a reading area. The extraction unit extracts the image of the reading area based on a reference document image. The division unit divides the image of the printed character string entered in the reading area into images of rectangular areas of one character each. The recognition unit recognizes the content of the printed character string by using a character string recognition model. Here, the character string recognition model infers the content of the printed characters entered in an image of a rectangular area in response to the input of that image. With such a configuration, character strings contained in a large number of document images can be read at high speed and with high accuracy.
The printed character string recognition method of the eleventh aspect is a method of recognizing the content of a printed character string using a computer. In this method, a document image in which a printed character string is entered in a reading area is acquired. The image of the reading area is then extracted based on a reference document image. Next, the image of the printed character string entered in the reading area is divided into images of rectangular areas of one character each. The content of the printed character string is then recognized using a character string recognition model. Here, the character string recognition model infers the content of the printed characters entered in an image of a rectangular area in response to the input of that image. With such a configuration, character strings contained in a large number of document images can be read at high speed and with high accuracy.
FIG. 1 is a schematic diagram showing the configuration of the printed character string recognition device 20 according to the first embodiment.
FIG. 2 is a schematic diagram showing an example of the form image Gs according to the same embodiment.
FIG. 3 is a schematic diagram showing an example of the reference form image Gc according to the same embodiment.
FIG. 4 is a schematic diagram showing examples of images of the reading areas according to the same embodiment.
FIG. 5 is a flowchart for explaining the operation of the printed character string recognition device 20 according to the same embodiment.
FIG. 6 is a diagram for explaining the operation of the division unit 24C according to the same embodiment.
FIG. 7 is a diagram for explaining the operation of the division unit 24C according to the same embodiment.
FIG. 8 is a diagram for explaining the operation of the recognition unit 24D according to the same embodiment.
FIG. 9 is a schematic diagram showing the configuration of the printed character string recognition device 20 according to modification D.
FIG. 10 is a diagram for explaining the operation of the recognition unit 24D according to modification E.
FIG. 11 is a schematic diagram showing the configuration of the printed character string recognition device 20S according to the second embodiment.
FIG. 12 is a schematic diagram for explaining the configuration of the reference character string database 21DS according to the same embodiment.
FIG. 13 is a flowchart for explaining the operation of the printed character string recognition device 20S according to the same embodiment.
FIG. 14 is a diagram for explaining an example of the operation of the printed character string recognition device 20S according to the same embodiment.
FIG. 15 is a diagram for explaining another example of the operation of the printed character string recognition device 20S according to the same embodiment.
FIG. 16 is a schematic diagram showing the configuration of the printed character string recognition device 20T according to the third embodiment.
FIG. 17 is a schematic diagram showing the configuration of the printed character string recognition device 20U according to the fourth embodiment.
Hereinafter, embodiments of the printed character string recognition device according to the present disclosure will be described with reference to the drawings. In the following description, when a plurality of identical objects are described individually, they may be distinguished by subscripts. For example, when the "predetermined items" are described as a whole, they are written as the predetermined items K, and when individual predetermined items are described specifically, they are written with subscripts, such as the predetermined items K1 to K6.
<First Embodiment>
 (1-1) Configuration of the Printed Character String Recognition Device
 FIG. 1 is a schematic diagram showing the configuration of the printed character string recognition device 20 according to the present embodiment. The printed character string recognition device 20 recognizes the content of the printed character string ML written in the form image Gs by using the character string recognition model 21C described later. As a premise, as shown in FIG. 2, the content to be entered is displayed in print in the predetermined item fields of the form image Gs according to the present embodiment. The form image Gs means an image in which an arbitrary form is captured. Each printed character constituting the printed character string referred to here has the same character width.
 In the following description, the printed character string recognition device 20 is described as recognizing the content of the printed character string written in the form image Gs, but the recognition target of the printed character string recognition device 20 is not limited to form images. The printed character string recognition device can recognize the content of any "document image" in which a document containing a character string is captured.
As an example of the form image Gs, one in which the printed character strings ML1 to ML6 are entered in association with the plurality of predetermined items K1 to K6 is used. Item K1 corresponds to the zip code, item K2 to the address, item K3 to the name, item K4 to the registration number, item K5 to the article used, and item K6 to the purpose of use. However, the contents of these items are merely examples and are not limited thereto.
The printed character string recognition device 20 can be realized by an arbitrary computer, and includes a storage unit 21, an input unit 22, an output unit 23, and a processing unit 24.
The storage unit 21 stores various types of information and is realized by an arbitrary storage device such as a memory or a hard disk. Here, the storage unit 21 stores information such as the weights of the neural network that constitutes the character string recognition model 21C.
The character string recognition model 21C infers the content of the printed character entered in an image of a single-character rectangular area in response to the input of that image. The character string recognition model 21C is constructed by, for example, a convolutional neural network (CNN) whose weights have been adjusted based on teacher images in which printed characters are captured.
The input unit 22 is realized by an arbitrary input device such as a keyboard, mouse, or touch panel, and inputs various types of information to the computer.
The output unit 23 is realized by an arbitrary output device such as a display, touch panel, or speaker, and outputs various types of information from the computer.
The processing unit 24 executes various types of information processing and is realized by a processor such as a CPU or GPU and a memory. Here, by reading one or more programs stored in the storage unit 21 into the CPU, GPU, or the like of the computer, the processing unit 24 functions as an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D.
The acquisition unit 24A acquires the form image Gs in which the printed character string ML is entered. The acquisition unit 24A acquires an image of the form to be read as the form image Gs via an arbitrary imaging device.
The extraction unit 24B matches the form image Gs against the reference form image Gc and extracts the reading areas R. The matching referred to here means not only directly comparing the images to obtain the difference, but also extracting the reading areas R from the form image Gs based on the coordinate information corresponding to the predetermined items K acquired from the reference form image Gc.
As shown in FIG. 3, the reference form image Gc is one in which the fields corresponding to the predetermined items K of the form image Gs are blank. Therefore, in the examples shown in FIGS. 2 and 3, the contents corresponding to the predetermined items K1 to K6 in the form image Gs are extracted as the reading areas R1 to R6, respectively, based on the form image Gs and the reference form image Gc. Here, FIG. 4(a) shows the image of the reading area R1 corresponding to item K1, FIG. 4(b) the image of the reading area R2 corresponding to item K2, FIG. 4(c) the image of the reading area R3 corresponding to item K3, FIG. 4(d) the image of the reading area R4 corresponding to item K4, FIG. 4(e) the image of the reading area R5 corresponding to item K5, and FIG. 4(f) the image of the reading area R6 corresponding to item K6. The reference form image Gc and the coordinate information corresponding to the predetermined items K in the reference form image Gc are stored in advance in the storage unit 21.
The division unit 24C divides the printed character string ML entered in the reading area R into rectangular areas T of one character each. Details will be described later.
The recognition unit 24D recognizes the content of the printed character string ML based on the images of the rectangular areas T. Here, the recognition unit 24D recognizes the content of the printed character string ML by using the character string recognition model 21C.
 (1-2) Operation of the Printed Character String Recognition Device
 FIG. 5 is a flowchart for explaining the operation of the printed character string recognition device 20 according to the present embodiment.
 First, the form images Gs to be read are captured via an arbitrary imaging device. These form images Gs are then stored in the storage unit 21 of the printed character string recognition device 20 as appropriate (A1).
Next, the printed character string recognition device 20 extracts the images of the reading areas R based on the reference form image Gc (A2).
Subsequently, the printed character string recognition device 20 divides the image of the printed character string ML entered in each reading area R into images of rectangular areas T of one character each, row by row (A3). At this time, the printed character string recognition device 20, by the function of the division unit 24C, identifies continuous regions S in which a plurality of printed characters are contiguous along the row direction of the reading area R. Taking item K2 (address) as an example, as shown in FIG. 6, the division unit 24C binarizes the image of the reading area R2 of item K2 (FIG. 6(a)) to generate a binarized reading-area image (FIG. 6(b)). The division unit 24C then dilates the binarized image of the reading area R2 in the row direction to fill in the character portions (FIG. 6(c)). As a result, characters adjacent within a predetermined character spacing are joined, and runs of characters bounded by the blanks between words are identified as the continuous regions S1 to S4. In the example shown in FIG. 6, the images in which "Aichi Prefecture", "Nagoya City", "Naka Ward", and "X-chome Y-address z" are written are identified as the individual continuous regions S1 to S4 (FIG. 6(d)).
Subsequently, the print character string recognition device 20 uses the function of the division unit 24C to scan each of the continuous areas S1 to S4 in the column direction and divide it into individual characters. Specifically, the continuous areas S1 to S4 are scanned with a line L whose width is no greater than the character spacing, and the luminance value in the column direction is calculated. An area whose calculated luminance value is at or below a predetermined value is regarded as a space corresponding to the character spacing, and the image of the portion up to the next such space is extracted as a rectangular area T containing a single character. For example, as shown in FIG. 7, when the characters "愛知県" (Aichi Prefecture) are identified as the continuous area S1, the images containing the characters "愛", "知", and "県" are extracted as the rectangular areas T1 to T3. FIG. 7(a) illustrates the concept of the line L scanning the continuous area S1, and FIG. 7(b) shows the luminance values in the continuous area S1: the column-direction luminance values obtained when the line L scans the continuous area S1 are plotted against the horizontal pixels of the image of the continuous area S1.
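The column-direction scan amounts to a projection profile. The pure-Python sketch below is a hedged illustration (the function name and the zero threshold are assumptions): columns whose summed ink is at or below the threshold play the role of the low-luminance spaces of FIG. 7(b).

```python
def split_region_into_chars(region, threshold=0):
    """region: 2D list (rows x columns) of ink values for one continuous
    area S.  Sums each column (the column-direction 'luminance' profile),
    treats columns at or below threshold as inter-character spaces, and
    returns one (start_col, end_col) rectangle T per character."""
    width = len(region[0])
    profile = [sum(row[x] for row in region) for x in range(width)]
    boxes, start = [], None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x                     # entering a character
        elif value <= threshold and start is not None:
            boxes.append((start, x - 1))  # leaving a character
            start = None
    if start is not None:
        boxes.append((start, width - 1))
    return boxes
```

Each returned box corresponds to one single-character rectangular area T to be passed to the recognition model.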
Next, the print character string recognition device 20 recognizes the content of the print character string ML based on the images of the rectangular areas T, each containing one character (A4). Specifically, the print character string recognition device 20 recognizes the content of the print character string ML by inferring the content of each print character with the character string recognition model 21C, which is built from a neural network. "Inference" here means a print-character recognition result computed by a neural network whose weights have been adjusted on general-purpose character images. For example, as illustrated in FIG. 8, when the image of the rectangular area T1 corresponding to the character "愛" is input to the character string recognition model 21C, print characters whose shapes resemble "愛", such as "愛", "憂", "舜", and "受", are inferred together with a confidence p. In the example of FIG. 8, the confidence of "愛" is 99.58309, that of "憂" is 0.0040124, that of "舜" is 1.26771865×10^-5, and that of "受" is 4.238405×10^-9. Since "愛" has the highest confidence, the content of the image in the rectangular area T1 is recognized as "愛".
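The final selection step can be illustrated directly with the FIG. 8 confidences. The model 21C itself is a trained neural network; the sketch below (hypothetical names) only shows how the highest-confidence candidate is taken as the recognition result.

```python
def recognize_char(confidences):
    """confidences: candidate character -> confidence p returned by the
    character string recognition model 21C for one rectangular area T.
    The candidate with the highest confidence is the recognized character."""
    return max(confidences, key=confidences.get)

# Confidences from the FIG. 8 example.
p = {"愛": 99.58309, "憂": 0.0040124,
     "舜": 1.26771865e-5, "受": 4.238405e-9}
# recognize_char(p) -> "愛", the highest-confidence candidate
```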
When all the images of the one-character rectangular areas T have been input to the character string recognition model 21C in this way, the print character string ML written in the predetermined item K of the form image Gs has been recognized.
In the above description, each continuous area was identified up to a blank between words, but the print character string recognition device 20 according to the present embodiment may identify continuous areas without regard to blanks between words, as long as the characters are on the same line. In the example shown in FIG. 6, a single continuous area S joining the same-line continuous areas S1 to S3 may be identified without considering the blanks between words. In that case, the run up to "愛知県名古屋市中区" (Naka Ward, Nagoya City, Aichi Prefecture) is identified as one continuous area. Even when continuous areas are identified line by line in this way, the rectangular areas T divided into single characters can still be obtained.
(1-3) Features of the print character string recognition device
(1-3-1)
As described above, the print character string recognition device 20 according to the present embodiment includes an acquisition unit 24A, an extraction unit 24B, a division unit 24C, and a recognition unit 24D. The acquisition unit 24A acquires the form image Gs in which the print character string ML is written in the reading area R. The extraction unit 24B extracts the image of the reading area R based on the reference form image Gc. The division unit 24C divides the image of the print character string ML written in the reading area R into images of rectangular areas T, one character each, row by row or column by column. The recognition unit 24D recognizes the content of the print character string ML based on the images of the rectangular areas T. Because the print character string ML written in the form can thus be extracted as one image per character, the characters contained in a large number of form images can be read quickly and accurately.
In particular, the recognition unit 24D here recognizes the content of the print character string ML using the character string recognition model 21C. The character string recognition model 21C is built from a neural network and, given the image of a rectangular area T as input, infers the content of the print character written in that rectangular area T. Using such a neural network therefore improves the recognition accuracy of the print character string ML.
(1-3-2)
Further, in the print character string recognition device 20 according to the present embodiment, the division unit 24C binarizes the image of the reading area R, identifies continuous areas S in which a plurality of print characters run consecutively along the row direction of the binarized image, and then scans the image of each continuous area S in the column direction to divide it into binarized images of rectangular areas T, one character each. With this configuration, the image of a continuous area S can be divided into the images of the rectangular areas T of the print characters with high accuracy. Then, by using the character string recognition model 21C, a neural network optimized for single-character images, the print characters can be recognized with high accuracy.
Since the recognition unit 24D here recognizes the content of the print character string ML from the binarized images of the rectangular areas T, the recognition speed of the print character string ML can be increased. To elaborate, noise has been removed from the binarized images of the rectangular areas T, so the computational load incurred when recognizing the print character string ML is reduced, and as a result the recognition speed of the print character string ML can be increased.
(1-4) Modifications
(1-4-1) Modification A
In the above description, the division unit 24C treated an area in which a plurality of print characters run consecutively along the row direction as a continuous area S, but it may instead treat an area in which a plurality of print characters run consecutively along the column direction as a continuous area. In that case, however, the division unit 24C scans the image of the continuous area S in the row direction to divide it into binarized images of rectangular areas T, one character each.
(1-4-2) Modification B
In the above description, the recognition unit 24D recognized the content of the print character string ML from the binarized images of the rectangular areas T, but the recognition unit 24D according to the present embodiment is not limited to this. The recognition unit 24D according to the present embodiment may instead use the binarized image of a rectangular area T to extract the corresponding rectangular image from the form image Gs before binarization, and recognize the content of the print character string ML from that image. Such a configuration can improve the recognition accuracy of the print character string ML.
To elaborate, the binarized image of the reading area R has undergone dilation and other processing, so its size may differ from the area of the pre-binarization form image Gs in which the print characters actually appear. Recognition accuracy can therefore be improved by using the binarized image only until the single-character rectangular areas are determined, and using the pre-binarization image when inferring the print characters. In this case, the positions of the rectangular areas T obtained after binarization are projected onto the form image Gs before binarization to extract rectangular images, and these images are input to the character string recognition model 21C to infer the characters. Specifically, rectangular images are extracted not from the post-binarization rectangular areas T of FIG. 6(d) but from the reading area of FIG. 6(a), and the characters are inferred from these. The character string recognition model 21C in this case is trained not on binarized character images but on teacher images in which the characters appear in grayscale or color.
(1-4-3) Modification C
The processing unit 24 described above may also execute a process that displays the recognition results of the recognition unit 24D for the print character string in a display form that varies with the character type. When the content of a print character string is inferred using the character string recognition model 21C, characters with similar shapes, such as the kanji "工" and the katakana "エ", or the letter "O" and the digit "0", are sometimes misidentified. Displaying the results color-coded by character type, distinguishing digits, letters, symbols, and so on, therefore makes it possible to judge at a glance whether the identification results of the character string recognition model 21C are correct.
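One way such type-dependent display could be realized is sketched below. Everything here is an assumption for illustration, not from the patent: the function name, the type tests, and the color mapping are all hypothetical.

```python
import unicodedata

def char_color(ch):
    """Assign a display color by character type so that look-alike pairs
    such as kanji 工 / katakana エ, or letter O / digit 0, stand apart."""
    if ch.isdigit():
        return "red"                      # digits
    if ch.isascii() and ch.isalpha():
        return "blue"                     # Latin letters
    name = unicodedata.name(ch, "")
    if "KATAKANA" in name or "HIRAGANA" in name:
        return "green"                    # kana
    return "black"                        # kanji, symbols, everything else
```

A recognized string would then be rendered character by character in the returned colors, making a "0" misread among letters visually obvious.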
(1-4-4) Modification D
In the above description, the form images Gs to be read are stored in the storage unit 21, but the print character string recognition device 20 according to the present embodiment is not limited to this. For example, as shown in FIG. 9, the form images Gs may be stored in an external storage device 121. With such a configuration, the security level of the external storage device 121 can be raised, reducing the risk of the form images Gs being leaked.
The print character string recognition device 20 may also be connected to the external storage device 121 via a network. In that case, the acquisition unit 24A of the print character string recognition device 20 may access the external storage device 121 only when reading a form image Gs.
(1-4-5) Modification E
In the above description, the print character string recognition device 20 includes an input unit 22 and an output unit 23, but the print character string recognition device 20 according to the present embodiment is not limited to this. The print character string recognition device according to the present embodiment need not necessarily include the input unit 22 and the output unit 23.
The print character string recognition device 20 according to the present embodiment may also be configured as a system in which a plurality of devices are connected via a network. For example, as shown in FIG. 10, it may be a print character string recognition system in which a terminal device 100, having an input unit 122 and an output unit 123 with the same functions as the input unit 22 and the output unit 23, is connected to the print character string recognition device 20 via a network. In this case, the print character string recognition device 20 may be connected to a plurality of terminal devices 100. For convenience, the print character string recognition device of the above description is here called a print character string recognition system.
<Second Embodiment>
(2-1) Configuration of the print character string recognition device
FIG. 11 is a schematic view showing the configuration of the print character string recognition device 20S according to the second embodiment. The print character string recognition device 20S according to the second embodiment performs processing using reference character strings MS. Hereinafter, configurations already described are given substantially the same reference numerals, and duplicate description is omitted. Configurations peculiar to the present embodiment are described with the subscript "S".
In the print character string recognition device 20S according to the present embodiment, the storage unit 21S stores a reference character string database 21D. The reference character string database 21D stores in advance the reference character strings MS associated with each of the predetermined items K1 to K6 in the form image Gs. For example, as shown in FIG. 12, when the predetermined item K5 of the form image Gs is "article used", the reference character string database 21D stores "自家用普通乗用車" (private standard passenger car), "自家用小型乗用車" (private small passenger car), "自家用軽四輪乗用車" (private light four-wheeled passenger car), "自家用軽四輪貨物車" (private light four-wheeled freight car), and "二輪自動車" (two-wheeled motor vehicle) as the reference character strings MS5a to MS5e.
Further, in the print character string recognition device 20S according to the present embodiment, the recognition unit 24DS compares a candidate character string MK, described later, with the reference character strings MS stored in advance to calculate similarities, and recognizes a reference character string MS with a high similarity as the content of the print character string ML. When the similarity is at or below a predetermined value, the recognition unit 24DS recognizes the candidate character string MK itself as the content of the print character string ML.
(2-2) Operation and features of the print character string recognition device
FIG. 13 is a flowchart for explaining the operation of the print character string recognition device 20S according to the second embodiment.
In the print character string recognition device 20S according to the present embodiment, steps B4 to B8 are executed in place of the above-described step A4. The print character string recognition device 20S therefore first executes the same processing as steps A1 to A3 described above (B1 to B3).
The print character string recognition device 20S then generates a candidate character string MK from the print characters inferred using the character string recognition model 21C (B4). Next, the print character string recognition device 20S compares the candidate character string MK with the reference character strings MS stored in advance to calculate similarities (B5). When the similarity is at or above a predetermined value, the print character string recognition device 20S recognizes the reference character string MS with the maximum similarity as the content of the print character string ML (B6-Yes, B7). When the similarity is below the predetermined value, the print character string recognition device 20S recognizes the candidate character string MK as the content of the print character string (B6-No, B8).
For example, suppose that, as shown in FIG. 14(b), a candidate character string MK5 reading "自家田小型乗田車" is generated from the image of the reading area R5 shown in FIG. 14(a), in which "自家用小型乗用車" (private small passenger car), corresponding to the print character string ML5, is written. In this case, the recognition unit 24DS according to the present embodiment compares the reference character strings MS5 with the candidate character string MK5 and calculates the similarity between them. Here, as shown in FIG. 15, "自家用乗用車", "自家用小型乗用車", "自家用軽四輪乗用車", "自家用軽四輪貨物車", and "二輪自動車" are stored as the reference character strings MS5a to MS5e, and the respective similarities are calculated as 0.5, 0.75, 0.470588, 0.35291, and 0.3076923. Since the reference character string MS5b "自家用小型乗用車" has the highest similarity, 0.75, the recognition unit 24DS replaces the candidate character string MK with the reference character string MS5b.
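The patent does not name its similarity measure, but the Gestalt pattern-matching ratio (2 × matching characters / total length) reproduces the worked example: "自家田小型乗田車" shares six of its eight characters with "自家用小型乗用車", giving 2×6/16 = 0.75. A hedged sketch of steps B5 to B8 under that assumption (the function name and the 0.6 threshold are likewise assumptions):

```python
from difflib import SequenceMatcher

def best_reference(candidate, references, threshold=0.6):
    """Compare the candidate string MK with each reference string MS;
    return the most similar reference if its similarity clears the
    threshold, otherwise fall back to the candidate itself (B6-B8)."""
    scored = [(SequenceMatcher(None, candidate, ref).ratio(), ref)
              for ref in references]
    score, best = max(scored)
    return best if score >= threshold else candidate
```

For the FIG. 15 data, the candidate "自家田小型乗田車" selects "自家用小型乗用車" at similarity 0.75, while a string resembling none of the references (a free-entry field, say) is returned unchanged.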
As described above, the print character string recognition device 20S according to the present embodiment can further improve the recognition accuracy of the print character string ML by using the reference character string database 21D. In particular, when the contents that may appear in a predetermined item of a form are known in advance, the characters contained in a large number of form images can be read quickly and accurately.
On the other hand, when the similarity is at or below the predetermined value, the print character string recognition device 20S according to the present embodiment recognizes the candidate character string MK as the content of the print character string ML. With this configuration, the characters contained in the form image Gs can be read accurately even when they cannot be registered in advance as reference character strings MS. For example, depending on the item, a form may provide a free-entry field in addition to preset choices. The print character string recognition device 20S according to the present embodiment treats a similarity at or below the predetermined value as indicating that content other than the preset choices has been entered, and reads it with high accuracy.
The features and modifications of the first embodiment are also applicable as-is to the print character string recognition device 20S according to the second embodiment.
<Third Embodiment>
FIG. 16 is a schematic view showing the configuration of the print character string recognition device 20T according to the third embodiment. Hereinafter, configurations already described are given substantially the same reference numerals, and duplicate description is omitted. Configurations peculiar to the present embodiment are described with the subscript "T".
The storage unit 21T according to the present embodiment stores a plurality of character string recognition models 21C1 to 21Cn (n being 2 or more and less than the number of items K) in association with the plurality of predetermined items K1 to K6. In the example shown in FIG. 16, the storage unit 21T stores a plurality of character string recognition models 21C1 to 21C4 (n = 4).
Further, when two or more predetermined items are associated with the same character string recognition model 21Ci (i being any value from 1 to 4), the recognition unit 24DT according to the present embodiment combines the rectangular-area images obtained from those two or more predetermined items and inputs the combined rectangular-area images into that same character string recognition model 21Ci.
To elaborate, among the predetermined items K of the form image Gs to be read there are items in which only digits are entered, items in which only kana characters are entered, items in which arbitrary characters are entered, and so on. For example, arbitrary characters are entered in the aforementioned item K2 (address) and item K3 (name). Item K2 (address) and item K3 (name) are therefore associated with the character string recognition model 21C2, which can identify arbitrary characters. In such a case, the recognition unit 24DT combines the rectangular-area images obtained from item K2 (address) and item K3 (name) and inputs the combined rectangular-area images into the character string recognition model 21C2 at once.
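The grouping itself can be sketched independently of any particular model. In the hedged illustration below (all names are assumptions), items that share a character string recognition model are concatenated into one batch, recognized in a single call, and the results are split back per item:

```python
def recognize_items(items, model_of, recognize_batch):
    """items: list of (item_key, char_images); model_of: item_key -> model id;
    recognize_batch(model_id, images) -> one recognized character per image.
    Items sharing a model are inferred together in a single call."""
    grouped = {}
    for key, images in items:
        grouped.setdefault(model_of[key], []).append((key, images))
    results = {}
    for model_id, group in grouped.items():
        batch = [img for _, imgs in group for img in imgs]
        chars = recognize_batch(model_id, batch)   # one model call per group
        i = 0
        for key, imgs in group:
            results[key] = "".join(chars[i:i + len(imgs)])
            i += len(imgs)
    return results
```

With K2 and K3 both mapped to model 21C2, their character images go through `recognize_batch` in one call rather than two.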
With the configuration described above, the print character string recognition device 20T infers the print character strings ML for two or more predetermined items at once with the same character string recognition model 21Ci, which increases the recognition speed of the print character strings ML.
More specifically, inputting the rectangular-area images corresponding to item K2 (address) and the rectangular-area images corresponding to item K3 (name) into the character string recognition model 21C2 individually yields the same recognition accuracy as combining them and inputting them into the character string recognition model 21C2 at once. When the character string recognition model 21C2 is realized by a neural network, however, the latter computes faster. Processing items with the same character type together, as in the print character string recognition device 20T according to the third embodiment, therefore improves the reading speed of the form image Gs.
Moreover, since each character string recognition model 21Ci is built for the character type used in its associated items, the recognition accuracy and recognition speed of the print character string ML can be improved. For example, only digits are entered in the aforementioned item K1 (postal code). Item K1 (postal code) therefore only needs the character string recognition model 21C1, which identifies digits alone, and compared with using the character string recognition model 21C2, which can identify arbitrary characters, both recognition accuracy and recognition speed can be improved.
The features and modifications of the first and second embodiments are also applicable as-is to the print character string recognition device 20T according to the third embodiment.
<Fourth Embodiment>
FIG. 17 is a schematic view showing the configuration of the print character string recognition device 20U according to the fourth embodiment. Hereinafter, configurations already described are given substantially the same reference numerals, and duplicate description is omitted. Configurations peculiar to the present embodiment are described with the subscript "U".
In the present embodiment, a different method is adopted in step A3 above when extracting the single-character rectangular images.
The print character string recognition device 20U according to the present embodiment includes a division unit 24CU. The division unit 24CU identifies the continuous areas S1 to S4 by the same processing as described above. That is, in the example shown in FIG. 6, the images containing "愛知県", "名古屋市", "中区", and "X丁目Y番地z号" are identified as the individual continuous areas S1 to S4 (FIG. 6(d)).
The division unit 24CU then divides the binarized images corresponding to the continuous areas S1 to S4 along the row direction into rectangular-area images based on a unit length corresponding to one character. Specifically, suppose that one character image occupies a rectangular area of 15 pixels in each of the row and column directions, and that the binarized image corresponding to the continuous area S1 is 48 pixels long in the row direction. In this case, 48 pixels divided by 15 pixels gives 3.2, which is rounded to determine a division count of 3, and the binarized image corresponding to the continuous area S1 is divided into that many parts.
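This fixed-pitch division reduces to a couple of lines. A hedged sketch (the function name is an assumption; the patent only specifies dividing by the unit length and rounding): a 48-pixel region at a 15-pixel unit length gives round(3.2) = 3 equal parts.

```python
def split_by_unit_length(region_width, unit=15):
    """Divide a continuous area of region_width pixels (row direction)
    into round(region_width / unit) equal rectangular areas, where unit
    is the nominal one-character width in pixels."""
    n = max(1, round(region_width / unit))
    bounds = [round(k * region_width / n) for k in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# 48-pixel region, 15-pixel unit: round(48 / 15) = 3 parts
# split_by_unit_length(48) -> [(0, 16), (16, 32), (32, 48)]
```

Unlike the luminance-scan split of the first embodiment, this variant never cuts a multi-stroke character into pieces, provided the characters share a fixed pitch.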
The print character string recognition device 20U then treats the divided images as the images of the rectangular areas T, one character each, and recognizes the content of the print character string ML using the character string recognition model 21C.
With the configuration described above, the print character string recognition device 20U can extract the print character string ML written in the form as one image per character, so the characters contained in a large number of form images can be read quickly and accurately.
 In particular, for character types that cannot be written in a single stroke, the method according to this embodiment may improve reading accuracy. For example, in the print character string recognition device 20 according to the first embodiment, the line L that scans the continuous regions S1 to S4 is narrow, so the katakana character "リ" (ri) may be read as two characters, such as "1" and "ノ" (no). The print character string recognition device 20U according to this embodiment does not read two characters from a single character, and can read print characters of uniform character width with high accuracy.
 Note that the features and modifications of the first to third embodiments, other than (13-2) above, are also applicable as-is to the print character string recognition device 20U according to the fourth embodiment.
 <Other embodiments>
 The present disclosure is not limited to the above embodiments as such. At the implementation stage, the components can be modified and embodied without departing from the gist of the disclosure. Various disclosures can also be formed by appropriately combining the plurality of components disclosed in the above embodiments. For example, some components may be deleted from all the components shown in an embodiment, and components from different embodiments may be combined as appropriate.
20  Print character string recognition device
20S Print character string recognition device
20T Print character string recognition device
20U Print character string recognition device
21  Storage unit
21C Character string recognition model
21D Reference character string database
22  Input unit
23  Output unit
24  Processing unit
24A Acquisition unit
24B Extraction unit
24C Division unit
24D Recognition unit
100 Terminal device
121 External storage device
122 Input unit
123 Output unit
Gc  Reference form image
Gs  Form image
ML  Print character string
MK  Candidate character string
MS  Reference character string
R   Reading area
T   Rectangular area
Japanese Unexamined Patent Publication No. 2003-115028

Claims (11)

  1.  A print character string recognition device (20, 20S, 20T) comprising:
     an acquisition unit (24A) that acquires a document image (Gs) in which a print character string (ML) is written in a reading area (R);
     an extraction unit (24B) that extracts an image of the reading area based on a reference document image (Gc);
     a division unit (24C, 24CU) that divides the image of the print character string written in the reading area into images of rectangular areas (T) in units of one character for each row or column; and
     a recognition unit (24D) that recognizes the content of the print character string using a character string recognition model (21C) that infers, in response to input of an image of a rectangular area, the content of the print characters written in that image.
  2.  The print character string recognition device according to claim 1, wherein
     print character strings are written in the document image in association with a plurality of predetermined items (K),
     a plurality of the character string recognition models, each constructed by a neural network, are prepared in association with the plurality of predetermined items,
     the extraction unit extracts a plurality of reading areas corresponding to the plurality of predetermined items, and
     when two or more predetermined items are associated with the same character string recognition model, the recognition unit combines the rectangular areas obtained from the two or more predetermined items and inputs the combined rectangular areas into that same character string recognition model.
  3.  The print character string recognition device according to claim 1 or 2, wherein the recognition unit
     generates a candidate character string (MK) from the print characters inferred using the character string recognition model,
     calculates a similarity by comparing the candidate character string with a reference character string (MS) stored in advance, and
     when the similarity is equal to or greater than a predetermined value, recognizes the reference character string as the content of the print character string.
  4.  The print character string recognition device according to claim 3, wherein,
     when the similarity is smaller than the predetermined value, the recognition unit recognizes the candidate character string as the content of the print character string.
  5.  The print character string recognition device according to any one of claims 1 to 4, wherein the division unit
     binarizes the image of the reading area,
     identifies, along the row direction or the column direction of the binarized image, a continuous region (S) in which a plurality of print characters are contiguous, and
     scans the image of the continuous region in the column direction or the row direction to divide it into binarized rectangular-area images in units of one character.
  6.  The print character string recognition device according to claim 5, wherein the recognition unit recognizes the content of the print character string from the binarized rectangular-area images.
  7.  The print character string recognition device according to claim 5, wherein the recognition unit recognizes the content of the print character string from the images of the rectangular areas before binarization that correspond to the binarized rectangular areas.
  8.  The print character string recognition device according to any one of claims 1 to 4, wherein the division unit
     binarizes the image of the reading area,
     identifies, along the row direction or the column direction of the binarized image, a continuous region (S) in which a plurality of print characters are contiguous, and
     divides the binarized image corresponding to the continuous region into rectangular-area images along the row direction or the column direction, based on a unit length corresponding to one character.
  9.  The print character string recognition device (20, 20S, 20T) according to any one of claims 1 to 7, wherein the recognition result of the print character string by the recognition unit is displayed in a display form that varies according to the character type.
  10.  A print character string recognition program that causes a computer to function as:
     an acquisition unit (24A) that acquires a document image (Gs) in which a print character string (ML) is written in a reading area (R);
     an extraction unit (24B) that extracts an image of the reading area based on a reference document image (Gc);
     a division unit (24C) that divides the image of the print character string written in the reading area into images of rectangular areas (T) in units of one character for each row or column; and
     a recognition unit (24D) that recognizes the content of the print character string using a character string recognition model (21C) that infers, in response to input of an image of a rectangular area, the content of the print characters written in that image.
  11.  A print character string recognition method for recognizing the content of a print character string using a computer, the method comprising:
     acquiring a document image in which a print character string (ML) is written in a reading area (R);
     extracting an image of the reading area based on a reference document image (Gc);
     dividing the image of the print character string written in the reading area into images of rectangular areas (T) in units of one character for each row or column; and
     recognizing the content of the print character string using a character string recognition model (21C) that infers, in response to input of an image of a rectangular area, the content of the print characters written in that image.
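The similarity-based correction recited in claims 3 and 4 can be sketched as follows. This is a minimal illustrative sketch only: the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the sample reference strings are assumptions for illustration; the disclosure does not specify a particular similarity metric, and the reference character string database (21D) would supply the reference strings in practice.

```python
from difflib import SequenceMatcher

# Hypothetical reference strings standing in for the reference
# character string database (21D).
REFERENCE_STRINGS = ["愛知県名古屋市中区", "東京都千代田区"]


def correct_candidate(candidate: str, threshold: float = 0.8) -> str:
    """Return the closest reference string if its similarity meets the
    threshold (claim 3); otherwise return the candidate unchanged (claim 4)."""
    best = max(
        REFERENCE_STRINGS,
        key=lambda ref: SequenceMatcher(None, candidate, ref).ratio(),
    )
    similarity = SequenceMatcher(None, candidate, best).ratio()
    return best if similarity >= threshold else candidate


# One misread character is corrected to the stored reference string:
print(correct_candidate("愛知県名吉屋市中区"))  # -> 愛知県名古屋市中区
# A string far from every reference is kept as-is:
print(correct_candidate("ZZZZ"))                # -> ZZZZ
```

Comparing against stored reference strings in this way lets a single misrecognized character be repaired when the field's vocabulary (e.g. addresses) is known in advance.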


PCT/JP2020/012230 2019-03-29 2020-03-19 Printed character string recognition device, program, and method WO2020203339A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020536819A JP6820578B1 (en) 2019-03-29 2020-03-19 Type string recognition device, program, and method.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-066064 2019-03-29
JP2019066064 2019-03-29

Publications (1)

Publication Number Publication Date
WO2020203339A1 2020-10-08

Family

ID=72667993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/012230 WO2020203339A1 (en) 2019-03-29 2020-03-19 Printed character string recognition device, program, and method

Country Status (2)

Country Link
JP (1) JP6820578B1 (en)
WO (1) WO2020203339A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60142792A (en) * 1983-12-29 1985-07-27 Fujitsu Ltd Multi-kind character recognizing device
JPS6057112B2 (en) * 1977-02-04 1985-12-13 キヤノン株式会社 Magnetic card transfer device
JPH0442380A (en) * 1990-06-08 1992-02-12 Seiko Epson Corp Recognized character display device
JPH10143605A (en) * 1996-11-15 1998-05-29 Sharp Corp Optical character recognition device
JP2009026287A (en) * 2007-07-23 2009-02-05 Sharp Corp Character image extracting apparatus and character image extracting method
JP2010269272A (en) * 2009-05-22 2010-12-02 Toshiba Corp Paper sheet processing apparatus, and paper sheet processing method
JP2011018175A (en) * 2009-07-08 2011-01-27 Mitsubishi Heavy Ind Ltd Character recognition apparatus and character recognition method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6057112B1 (en) * 2016-04-19 2017-01-11 AI inside株式会社 Character recognition apparatus, method and program

Also Published As

Publication number Publication date
JPWO2020203339A1 (en) 2021-04-30
JP6820578B1 (en) 2021-01-27


Legal Events

Date Code Title Description
- ENP: Entry into the national phase — Ref document number: 2020536819; Country of ref document: JP; Kind code of ref document: A
- 121: Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 20784984; Country of ref document: EP; Kind code of ref document: A1
- NENP: Non-entry into the national phase — Ref country code: DE
- 122: Ep: pct application non-entry in european phase — Ref document number: 20784984; Country of ref document: EP; Kind code of ref document: A1