CN116569225B - Document image recognition system

Info

Publication number: CN116569225B
Application number: CN202080103301.0A
Authority: CN
Inventors: 岩村光贵; 横田守真; 三轮刚久; 长谷川康次; 小田仁己; 奥村诚司; 小平孝之; 齐藤启太; 榎本嵩久
Original assignee: Mitsubishi Electric Building Solutions Corp
Current assignee: Mitsubishi Electric Building Solutions Corp
Priority date: 2020-08-24
Filing date: 2020-08-24
Publication date: 2024-04-30
Anticipated expiration: 2040-08-24
Also published as: CN116569225A; JP7134380B2; JPWO2022044067A1; WO2022044067A1

Abstract

A document image recognition system (100) comprises a user terminal (10), a center server (20), and character recognition cloud APIs (31). The center server (20) has a selection database (24) that stores the character recognition cloud API (31) having the highest correct answer rate when character recognition processing is performed on an input document image. The user terminal (10) transmits an acquired document image to the center server (20) as a processing target document image. The center server (20) extracts features from the processing target document image, selects one character recognition cloud API (31) according to the extracted features, and transmits the processing target document image to the selected character recognition cloud API (31).

Description

Document image recognition system

Technical Field

The present invention relates to a document image recognition system that uses a character recognition cloud API.

Background

Document image recognition systems are known that use a character recognition function application program interface provided by a cloud service (hereinafter referred to as a character recognition cloud API). In many cases, such a system evaluates the correct answer rates and processing speeds of a plurality of character recognition cloud APIs using test images prepared in advance, selects one character recognition cloud API, and executes character recognition processing with the selected character recognition cloud API (see, for example, Patent Document 1).

Prior art literature

Patent literature

Patent Document 1: Japanese Patent Laid-Open No. 2008-293354

Disclosure of Invention

Problems to be solved by the invention

However, the correct answer rate of a character recognition cloud API sometimes differs according to the characteristics of the document image. Consequently, when a document image is input whose characteristics differ from those of the test images used to evaluate the character recognition cloud APIs in advance, a character recognition cloud API other than the one chosen in the advance evaluation may be optimal. As a result, the character recognition accuracy of the document image recognition system sometimes decreases.

It is therefore an object of the present invention to provide a document image recognition system with high character recognition accuracy.

Means for solving the problems

The document image recognition system of the present invention includes: a user terminal that acquires a document image; a center server connected to the user terminal by a communication line; and a plurality of character recognition cloud APIs that are connected to the center server by a communication line, perform character recognition processing on an input document image, and output a character recognition result. The center server has a selection database that stores sets each consisting of features of an input document image and the character recognition cloud API that, among the plurality of character recognition cloud APIs, has the highest correct answer rate when character recognition processing is performed on that input document image. The user terminal transmits the acquired document image to the center server as a processing target document image. The center server extracts features of the processing target document image received from the user terminal, selects, from the features of the input document images stored in the selection database, the features most similar to the features of the processing target document image, selects the one character recognition cloud API stored as a set with the selected features, transmits the processing target document image to the selected character recognition cloud API, receives a character recognition result from it, and transmits the character recognition result to the user terminal.

In this way, the character recognition cloud API most suitable for the character recognition processing of the processing target document image received from the user terminal is selected, and the character recognition cloud API is caused to perform the character recognition processing, so that the character recognition accuracy of the document image recognition system can be improved.
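The selection step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not specify how similarity is computed, so cosine similarity over numeric feature vectors is an assumption, and the names `select_cloud_api`, `selection_db`, `api_a`, and `api_b` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Assumed similarity measure: 1.0 for identical direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_cloud_api(selection_db, target_features):
    """Return the API stored with the features most similar to the target features."""
    best = max(
        selection_db,
        key=lambda entry: cosine_similarity(entry["features"], target_features),
    )
    return best["api"]

# Hypothetical selection database: each entry pairs stored features with one API.
selection_db = [
    {"features": [0.9, 0.1, 0.2], "api": "api_a"},
    {"features": [0.1, 0.8, 0.7], "api": "api_b"},
]

# Features extracted from a processing target document image (illustrative values).
selected = select_cloud_api(selection_db, [0.2, 0.7, 0.6])
```

The center server would then forward the processing target document image only to the selected API and relay its result to the user terminal.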

In the document image recognition system according to the present invention, after receiving the character recognition result from the center server, the user terminal may output to the center server the correct character string contained in the processing target document image, as input by the user. When the correct character string is input from the user terminal, the center server may transmit the processing target document image to each character recognition cloud API and receive a character recognition result from each of them. Based on the correctness of the received character recognition results, the center server may then update the features of an input document image stored in the selection database as a set with a character recognition cloud API, add a new set of features of an input document image and a character recognition cloud API to the selection database, or do both.

This can optimize the selection database and improve the character recognition accuracy of the document image recognition system.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is correct, at least one of the character recognition results received from the other character recognition cloud APIs is also correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with the selected character recognition cloud API is equal to or greater than a predetermined threshold value, the center server may update the features of the input document image stored as a set with the selected character recognition cloud API based on the features of the processing target document image.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is correct, at least one of the character recognition results received from the other character recognition cloud APIs is also correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with the selected character recognition cloud API is smaller than the predetermined threshold value, the center server may add a set of the features of the processing target document image and the selected character recognition cloud API to the selection database.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is correct, at least one of the character recognition results received from the other character recognition cloud APIs is also correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with one of the other character recognition cloud APIs whose character recognition result is correct is equal to or greater than the predetermined threshold value, the center server may update the features of the input document image stored as a set with that other character recognition cloud API based on the features of the processing target document image.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is correct, at least one of the character recognition results received from the other character recognition cloud APIs is also correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with one of the other character recognition cloud APIs whose character recognition result is correct is smaller than the predetermined threshold value, the center server may add a set of the features of the processing target document image and that other character recognition cloud API to the selection database.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is correct, none of the character recognition results received from the other character recognition cloud APIs is correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with the selected character recognition cloud API is equal to or greater than the predetermined threshold value, the center server may update the features of the input document image stored as a set with the selected character recognition cloud API based on the features of the processing target document image.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is correct, none of the character recognition results received from the other character recognition cloud APIs is correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with the selected character recognition cloud API is smaller than the predetermined threshold value, the center server may add a set of the features of the processing target document image and the selected character recognition cloud API to the selection database.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is not correct, at least one of the character recognition results received from the other character recognition cloud APIs is correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with one of the other character recognition cloud APIs whose character recognition result is correct is equal to or greater than the predetermined threshold value, the center server may update the features of the input document image stored as a set with that other character recognition cloud API based on the features of the processing target document image.

In the document image recognition system according to the present invention, when the character recognition result received from the selected character recognition cloud API is not correct, at least one of the character recognition results received from the other character recognition cloud APIs is correct, and the similarity value between the features of the processing target document image and the features of the input document image stored as a set with one of the other character recognition cloud APIs whose character recognition result is correct is smaller than the predetermined threshold value, the center server may add a set of the features of the processing target document image and that other character recognition cloud API to the selection database.

In the document image recognition system according to the present invention, when neither the character recognition result received from the selected character recognition cloud API nor any of the character recognition results received from the other character recognition cloud APIs is correct, the center server may transmit the processing target document image to further character recognition cloud APIs, that is, character recognition cloud APIs other than those stored in the selection database as a set with features of an input document image. When the character recognition result received from one of these further character recognition cloud APIs is correct, the center server may add a set of the features of the processing target document image and that further character recognition cloud API to the selection database.
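The update cases above share one pattern: for every API whose result matched the correct character string, the stored features are updated when they are similar enough to the processing target document image, and a new set is added otherwise. The following sketch illustrates that pattern under several assumptions that are not in the text: cosine similarity as the similarity measure, a simple average as the update rule, the nearest stored entry of an API as "the features stored with that API", and the hypothetical names `update_selection_db`, `api_a`, `api_b`, and `api_c`.

```python
import math

def similarity(a, b):
    """Cosine similarity (an assumed measure): 1.0 for identical direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def update_selection_db(db, target, is_correct, threshold=0.8):
    """Update or extend db for every API whose recognition result was correct.

    db         -- list of {"features": [...], "api": name} entries
    target     -- features of the processing target document image
    is_correct -- maps API name to True when its result matched the correct string
    """
    for api, ok in is_correct.items():
        if not ok:
            continue  # incorrect results never change the database
        entries = [e for e in db if e["api"] == api]
        nearest = max(entries, key=lambda e: similarity(e["features"], target), default=None)
        if nearest is not None and similarity(nearest["features"], target) >= threshold:
            # Similar enough: update the stored features (assumed rule: simple average).
            nearest["features"] = [(x + y) / 2 for x, y in zip(nearest["features"], target)]
        else:
            # Not similar enough, or API not yet in the database: add a new set.
            db.append({"features": list(target), "api": api})
    return db

db = [
    {"features": [1.0, 0.0], "api": "api_a"},
    {"features": [0.0, 1.0], "api": "api_b"},
]
update_selection_db(db, [0.9, 0.1], {"api_a": True, "api_b": True, "api_c": False})
```

In this example the entry for `api_a` is updated because its stored features are close to the target, while a new set is appended for `api_b`. The fallback to APIs outside the selection database is left to the caller, who would pass their results in `is_correct`.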

In the document image recognition system according to the present invention, the feature of the document image may include at least one of an image feature amount calculated from pixel information of the document image, an image attribute indicating a status when the document image is acquired by the user terminal, and a learning feature value calculated using a learning machine.

In the document image recognition system according to the present invention, the image attribute may be information acquired by the user terminal when the user terminal acquires the document image, and may include at least one of brightness, illuminance, acquisition location, and acquisition time of the document image.

In the document image recognition system according to the present invention, the character recognition cloud API stored in the selection database may be determined as follows: features are extracted from a plurality of setting document images whose character strings are known, setting document images with similar features are grouped into setting document image groups, and, for each setting document image group, the character recognition cloud API having the highest correct answer rate when character recognition is performed on the setting document images in that group is stored. The features of the input document image stored as a set with the character recognition cloud API may be representative features that represent the features of the respective setting document image group.

Effects of the invention

The invention can provide a document image recognition system with high character recognition accuracy.

Drawings

Fig. 1 is a system diagram showing the structure of a document image recognition system of an embodiment.

Fig. 2 is a system diagram showing the structure of a general-purpose computer.

Fig. 3 is a flowchart showing the first half of the selection database setting operation of the document image recognition system according to the embodiment.

Fig. 4 is a flowchart showing the second half of the selection database setting operation of the document image recognition system according to the embodiment.

Fig. 5 is an explanatory diagram showing feature extraction of a setting document image in a selection database setting operation.

Fig. 6 is an explanatory diagram showing classification of image feature data sets and grouping of setting document images in the selection database setting operation.

Fig. 7 is an explanatory diagram showing calculation of the correct answer rate of each character recognition cloud API and extraction of the character recognition cloud API with the highest correct answer rate in the selection database setting operation.

Fig. 8 is an explanatory diagram showing generation of a representative image feature data set in the selection database setting operation.

Fig. 9 is an explanatory diagram showing the correspondence between the set of a representative image feature data set and a character recognition cloud API and the set of a representative image feature data set and a setting document image group.

Fig. 10 is an explanatory diagram showing the construction of the selection database.

Fig. 11 is a flowchart showing a character recognition operation of the document image recognition system of the embodiment.

Fig. 12 is an explanatory diagram showing feature extraction of a processing target document image in a character recognition operation.

Fig. 13 is an explanatory diagram showing selection of the character recognition cloud API in the character recognition operation.

Fig. 14 is a flowchart showing a selection database updating operation in the case where a correct character string for the processing target document image is input from the user terminal.

Fig. 15 is a flowchart showing the processing in the case of the junction point 2 shown in fig. 14.

Fig. 16 is a flowchart showing the processing in the case of the junction point 3 shown in fig. 14.

Fig. 17 is a flowchart showing the processing in the case of the junction point 4 shown in fig. 14.

Fig. 18 is a flowchart showing the processing in the case of the junction point 5 shown in fig. 17.

Fig. 19 is an explanatory diagram showing a selection database updating operation in the case where a correct character string for the processing target document image is input from the user terminal.

Detailed Description

Next, a document image recognition system 100 of an embodiment will be described with reference to the drawings. In the following description, the character recognition cloud API is referred to as the cloud API 31 or the cloud API 32. As shown in fig. 1, the document image recognition system 100 is constituted by a user terminal 10, a center server 20, and a cloud API group 30 including a plurality of cloud APIs 31. The user terminal 10 acquires the document image and transmits it to the center server 20. The center server 20 transmits the document image to the cloud API 31 selected from the cloud API group 30, receives the character recognition result from the cloud API 31, and transmits the character recognition result to the user terminal 10. The user terminal 10 displays the character recognition result received from the center server 20. In the following description, the reference numeral 31 alone is used when the plurality of cloud APIs 31 are not distinguished from one another, and a letter in parentheses is appended to the reference numeral, as in the cloud APIs 31 (A) to 31 (M), when the individual cloud APIs 31 are distinguished.

The user terminal 10 is constituted by a smartphone with a camera or a tablet terminal with a camera, and is connected to the center server 20 via a communication line such as the internet or a telephone line. The user terminal 10 includes three functional blocks, namely, a document image acquisition unit 11, a character string display unit 12, and a correct character string input unit 13. The user terminal 10 acquires a document image by imaging or the like with the document image acquisition unit 11, and transmits the acquired document image to the center server 20 as a processing target document image 80 (see fig. 12). The user terminal 10 receives the character recognition result of the processing target document image 80 from the center server 20, and displays the result on the character string display unit 12. The correct character string input unit 13 of the user terminal 10 receives the user's confirmation input when the character string displayed on the character string display unit 12 is correct, and receives the user's input of the correct character string when the displayed character string is incorrect.

The document image acquisition unit 11 of the user terminal 10 is realized by a camera attached to the user terminal 10. The character string display unit 12 is realized by the screen of a smartphone or a tablet terminal. The correct character string input unit 13 is realized by an input device such as an icon, touch keys, or a keyboard displayed on the screen of a smartphone or a tablet terminal, together with a character conversion function or a voice input function.

The center server 20 is connected to the user terminal 10 via a communication line, and is connected to each cloud API31 included in the cloud API group 30 via a communication line such as the internet or a telephone line. The center server 20 has 3 functional blocks of a character recognition processing unit 21, a selection database 24, and a selection database updating unit 25. The character recognition processing unit 21 includes 2 functional blocks, namely, a data transmitting/receiving unit 22 and a cloud API selecting unit 23.

The data transmitting/receiving unit 22 receives the processing target document image 80 from the user terminal 10 and transmits it to the one cloud API 31 selected by the cloud API selecting unit 23. The data transmitting/receiving unit 22 receives the character recognition result from the selected cloud API 31 and transmits it to the user terminal 10. The cloud API selecting unit 23 refers to the selection database 24, selects the cloud API 31 most suitable for character recognition based on the features of the processing target document image 80, and outputs the selection result to the data transmitting/receiving unit 22. Here, the selection database 24 is a database that stores sets each consisting of the features of an input document image and the cloud API 31 that, among the plurality of cloud APIs 31, has the highest correct answer rate when character recognition processing is performed on that input document image. The operation of the cloud API selecting unit 23 will be described in detail later.

When the correct character string of the processing target document image 80 is input from the user terminal 10, the selection database updating unit 25 transmits the processing target document image 80 to each cloud API 31 of the cloud API group 30, receives the character recognition result from each cloud API 31, and updates the content of the selection database 24 based on whether each character recognition result is correct or incorrect, that is, on the correctness of the results. The operation of the selection database updating unit 25 will be described in detail later.

The functional blocks of the center server 20 can be implemented by the general-purpose computer 150 shown in fig. 2. As shown in fig. 2, the general-purpose computer 150 includes a CPU 151 as a processor that performs information processing, a ROM 152 and a RAM 153 that temporarily store data during information processing, a hard disk drive (HDD) 154 that stores programs, user data, and the like, a mouse 155 and a keyboard 156 provided as input units, and a display 157 provided as a display device. The CPU 151, ROM 152, RAM 153, and HDD 154 are connected via a data bus 160. The mouse 155, the keyboard 156, and the display 157 are connected to the data bus 160 via an input-output interface 158. A network controller 159 provided as a communication unit is also connected to the data bus 160.

The data transmitting/receiving unit 22, the cloud API selecting unit 23, and the selection database updating unit 25 of the center server 20 are realized by cooperation between the hardware of the general-purpose computer 150 shown in fig. 2 and a program running on the CPU 151. The selection database 24 is implemented by storing sets of the features of input document images and the cloud APIs 31 in the HDD 154 of the general-purpose computer 150 shown in fig. 2. The HDD 154 may be replaced by an external storage unit accessed through a network.

The plurality of cloud APIs 31 are character recognition function application program interfaces (character recognition cloud APIs) provided by the cloud service. Each cloud API31 performs character recognition processing of a document image input from the outside, and outputs a character recognition result to the outside. Each cloud API31 is connected to the center server 20 via a communication line such as the internet or a telephone line.

Next, an example of the setting operation of the selection database 24 will be described with reference to fig. 3 to 10. In the following description, the reference numerals 50, 51, 55, 60, and 70 alone are used when the plurality of setting document images 50, image feature data sets 51, image feature data set groups 55, setting document image groups 60, and representative image feature data sets 70 are not distinguished from one another. When they are distinguished, a number in parentheses, such as (1), (2), or (J), is appended to the reference numeral.

First, as shown in step S101 of fig. 3 and in fig. 5, N setting document images 50 used for setting the selection database 24 are prepared. A setting document image 50 is a document image for which the character string it contains is known.

Next, as shown in step S102 of fig. 3 and fig. 5, N setting document images 50 are input to the center server 20. The processor of the center server 20 extracts image features of each setting document image 50. As shown in fig. 5, the image features are extracted as an image feature data set 51 composed of a plurality of parameters representing the image features and data of the respective parameters. The parameters of the image feature data set 51 are constituted by a plurality of image feature amounts calculated from pixel information of the document image, a plurality of image attributes indicating a situation when the document image is acquired by the user terminal 10, and a learning feature value calculated using a learning machine. The image feature data set 51 may not include all of the image feature amount, the image attribute, and the learning feature value, and may include at least one of them.

Various parameters can be used as the image feature amounts, for example an external margin ratio, an internal margin ratio, a chromaticity distribution ratio, a chroma distribution ratio, a color difference distribution ratio, and a formatting ratio. Here, the external margin ratio is an index indicating what percentage of the area of the document image is occupied by the margin along its outer periphery. The internal margin ratio is an index indicating what percentage is occupied by blank (white) portions of the document image other than the peripheral margin. The chromaticity distribution ratio is an index indicating how colored portions are distributed. The chroma distribution ratio, like the chromaticity distribution ratio, is an index indicating how colored portions are distributed. The color difference distribution ratio is an index indicating the distribution of offset, overflow, and blurring in the image. The formatting ratio is an index that quantifies how regularly the characters are arranged.

The image attributes include, for example, brightness, illuminance, acquisition place, and acquisition time of a document image when the document image is captured by the camera of the user terminal 10. The learning feature value is, for example, a feature value extracted using a Convolutional Neural Network (CNN).
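One way to picture an image feature data set is as a small record with three optional parts. The sketch below is purely illustrative: the class name `ImageFeatureDataSet`, its field names, and the example values are hypothetical, and only the rule that at least one of the three kinds of feature must be present comes from the description.

```python
from dataclasses import dataclass, field

@dataclass
class ImageFeatureDataSet:
    """Hypothetical container for the parameters of one image feature data set."""
    # Image feature amounts calculated from pixel information (e.g. margin ratios).
    feature_amounts: dict = field(default_factory=dict)
    # Image attributes recorded when the image was acquired (e.g. brightness, place, time).
    attributes: dict = field(default_factory=dict)
    # Learning feature values, for example a vector extracted with a CNN.
    learning_features: list = field(default_factory=list)

    def is_valid(self):
        """A data set needs at least one of the three kinds of feature."""
        return bool(self.feature_amounts or self.attributes or self.learning_features)

# Illustrative values only; none of them come from the source.
example = ImageFeatureDataSet(
    feature_amounts={"external_margin_ratio": 0.12, "formatting_ratio": 0.8},
    attributes={"illuminance": 300, "acquisition_time": "2020-08-24T10:00"},
)
empty = ImageFeatureDataSet()
```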

Next, as shown in step S103 of fig. 3 and in fig. 6, the processor of the center server 20 classifies the N image feature data sets 51 (1) to 51 (N) extracted in step S102 of fig. 3 into K image feature data set groups 55 (1) to 55 (K), such that the image feature data sets within each group have similarity values equal to or greater than a predetermined threshold value. As shown in fig. 6, each image feature data set group 55 includes a plurality of image feature data sets 51. For example, the image feature data sets 51 (1), 51 (4), …, 51 (N-1) are included in the image feature data set group 55 (1), and the image feature data sets 51 (2), 51 (3), …, 51 (N) are included in the image feature data set group 55 (K). Here, the similarity value is a numerical value indicating how similar two image feature data sets are to each other: 1.0 when they coincide and 0 when they are completely dissimilar. The predetermined threshold value can be chosen freely and may be, for example, about 0.7 to 0.9. Alternatively, classification may first be attempted with a higher threshold value, and if it does not succeed, the threshold value may be lowered step by step.
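A threshold-based classification of this kind could be sketched as below. The description does not name a clustering algorithm or a similarity measure, so the greedy grouping and the cosine similarity used here are assumptions, and `group_by_similarity` is a hypothetical name.

```python
import math

def cosine_similarity(a, b):
    """Assumed similarity measure: 1.0 when the vectors coincide in direction, 0.0 when orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_by_similarity(feature_sets, threshold=0.8):
    """Greedily place each feature set in the first group whose members are all similar to it.

    Returns a list of groups, each a list of indices into feature_sets.
    """
    groups = []
    for i, feats in enumerate(feature_sets):
        for group in groups:
            if all(cosine_similarity(feats, feature_sets[j]) >= threshold for j in group):
                group.append(i)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([i])
    return groups

# Four illustrative feature vectors forming two obvious clusters.
feature_sets = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
groups = group_by_similarity(feature_sets, threshold=0.8)
```

Because the setting document images share indices with their feature data sets, the same index groups also define the setting document image groups.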

In step S104 of fig. 3, as shown in fig. 6, the processor of the center server 20 generates K setting document image groups 60, each of which groups together the setting document images 50 corresponding to the image feature data sets 51 included in one image feature data set group 55. For example, the setting document images 50 (1), 50 (4), …, 50 (N-1) corresponding to the image feature data sets 51 (1), 51 (4), …, 51 (N-1) included in the image feature data set group 55 (1) are grouped to generate the setting document image group 60 (1). The setting document images 50 (2), 50 (3), …, 50 (N) corresponding to the image feature data sets 51 (2), 51 (3), …, 51 (N) included in the image feature data set group 55 (K) are grouped to generate the setting document image group 60 (K).

Next, as shown in step S105 of fig. 4, the processor of the center server 20 sets the counter J to its initial value of 1. Then, the flow proceeds to step S106 of fig. 4, and as shown in fig. 7, each setting document image 50 included in the setting document image group 60 (J) is transmitted to the M cloud APIs 31. Then, as shown in step S107 of fig. 4, the center server 20 receives character recognition results from each of the M cloud APIs 31 (A) to 31 (M).

In step S108 of fig. 4, the processor of the center server 20 compares the character recognition results received from one cloud API 31 (A) for the plurality of setting document images 50 included in the setting document image group 60 (J) with the known character strings contained in the respective setting document images 50. A result that completely matches the known character string is treated as correct, and a result that does not completely match is treated as incorrect. The processor of the center server 20 then counts the number of setting document images 50 for which the result is correct.

Then, in step S109 of fig. 4, the processor of the center server 20 divides the number of correct results by the total number of setting document images 50 included in the setting document image group 60 (J), thereby calculating the correct answer rate obtained when the cloud API 31 (A) performs character recognition on the setting document images 50 of the setting document image group 60 (J).

Similarly, the processor of the center server 20 compares the character recognition results received from the other cloud APIs 31 (B) to 31 (M) for the plurality of setting document images 50 included in the setting document image group 60 (J) with the known character strings contained in the respective setting document images 50, and calculates the correct answer rate obtained when each of the cloud APIs 31 (B) to 31 (M) performs character recognition on the setting document images 50 of the setting document image group 60 (J).

Then, in step S110 of fig. 4, the processor of the center server 20 extracts the cloud API31 (a) having the highest positive solution rate calculated in step S109.
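Steps S108 to S110 can be illustrated with a minimal Python sketch. The function names, the API labels, and the sample strings below are hypothetical and do not appear in the embodiment; the sketch only assumes that each cloud API returns one recognized string per setting document image.

```python
from typing import Dict, List


def positive_solution_rate(results: List[str], known: List[str]) -> float:
    """Fraction of images whose recognized string exactly matches the known string."""
    if len(results) != len(known):
        raise ValueError("one recognition result is required per setting document image")
    if not known:
        return 0.0
    # Only a complete match counts as a positive solution (step S108).
    positives = sum(1 for r, k in zip(results, known) if r == k)
    # Positive solutions divided by the total number of images (step S109).
    return positives / len(known)


def best_api(results_by_api: Dict[str, List[str]], known: List[str]) -> str:
    """Return the API label with the highest positive solution rate (step S110)."""
    rates = {api: positive_solution_rate(res, known) for api, res in results_by_api.items()}
    # On a tie, max() keeps the first label in insertion order (an arbitrary choice here).
    return max(rates, key=rates.get)


known_strings = ["ABC-123", "ROOM 4", "2020-08-24"]
results_by_api = {
    "api_a": ["ABC-123", "ROOM 4", "2020-08-24"],
    "api_b": ["ABC-123", "R00M 4", "2020-08-24"],
}
print(best_api(results_by_api, known_strings))
```

How ties between cloud APIs should be resolved is not stated in the description; the sketch simply keeps the first one encountered.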

Next, in step S111 of fig. 4, as shown in fig. 8, the processor of the center server 20 generates a representative image feature data set 70 (J) in which the data of each parameter is the representative value of that parameter over one image feature data set group 55 (J). As shown in fig. 8, the image feature data sets 51 (1), 51 (4), …, 51 (N-1) are included in the image feature data set group 55 (1). Similarly, the image feature data set 51 (4) stores data of parameters such as the image feature quantity (1), the image feature quantity (2), the image attribute (1), the image attribute (2), and the learning feature value. The processor of the center server 20 stores the representative value of the data of each parameter as the data of the corresponding parameter of the representative image feature data set 70 (J). For example, an average value, a median, or the like may be used as the representative value. When the average value is used, the representative value of the image feature quantity (1) is the average of the image feature quantities (1) of the image feature data sets 51 (1) to 51 (N-1). For the image attribute (1), a term expressing a superordinate concept that covers the image attribute (1) of each image feature data set 51 may be set as the representative value. In the case where the image attribute (1) is the location where the document image was captured by the user terminal 10, the average value or the median of the longitude and latitude may be set as the representative value.
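As a rough illustration of step S111, the following Python sketch computes a representative image feature data set from one group by taking the average or the median of each numeric parameter. The parameter names and values are hypothetical, and non-numeric attributes (for which the description suggests a superordinate term) are outside the scope of this sketch.

```python
from statistics import mean, median
from typing import Dict, List


def representative_feature_set(
    feature_sets: List[Dict[str, float]], use_median: bool = False
) -> Dict[str, float]:
    """Reduce one image feature data set group to one representative value per parameter."""
    if not feature_sets:
        raise ValueError("the group must contain at least one image feature data set")
    reduce = median if use_median else mean
    # Every data set in the group is assumed to carry the same numeric parameters.
    return {key: reduce([fs[key] for fs in feature_sets]) for key in feature_sets[0]}


group = [
    {"feature_1": 0.2, "feature_2": 10.0, "latitude": 35.0},
    {"feature_1": 0.4, "feature_2": 14.0, "latitude": 35.2},
    {"feature_1": 0.9, "feature_2": 12.0, "latitude": 35.1},
]
print(representative_feature_set(group))
print(representative_feature_set(group, use_median=True))
```

The median is less sensitive than the average to a single outlying image in the group, which may matter when a group is small.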

As shown in fig. 9, the representative image feature data set 70 (J) is a representative feature representing a feature of an image of the setting document image group 60 (J) including a plurality of setting document images 50.

When the threshold value used for classification in step S103 of fig. 3 is set to about 0.7 to 0.9, the similarity values between the generated representative image feature data set 70 (J) and the plurality of image feature data sets 51 included in the image feature data set group 55 (J) are also about 0.7 to 0.9, comparable to the threshold value. Therefore, the cloud API31 (a), which has the highest positive solution rate when character recognition is performed on the plurality of setting document images 50 included in the setting document image group 60 (J), is also the cloud API31 that has the highest positive solution rate when character recognition is performed on a document image whose image feature data set 51 is similar to the representative image feature data set 70 (J).

In step S112 of fig. 4, the processor of the center server 20 stores, as a group in the selection database 24, the representative image feature data set 70 (J) generated in step S111 and the cloud API31 (a) having the highest positive solution rate extracted in step S110 of fig. 4.

In step S113 of fig. 4, the processor of the center server 20 increments the counter J by 1, and in step S114 of fig. 4, determines whether the counter J exceeds the number K of image feature data set groups 55, which equals the number of setting document image groups 60. If the determination in step S114 of fig. 4 is no, the flow returns to step S106 of fig. 4.

The processor of the center server 20 thus repeatedly executes steps S106 to S112 of fig. 4 and generates K groups, each consisting of a representative image feature data set 70 and the cloud API31 that has the highest positive solution rate when character recognition is performed on a document image whose image feature data set 51 is similar to that representative image feature data set 70, and stores the K groups in the selection database 24, as shown in fig. 10. In addition, 1 cloud API31 may be grouped with multiple representative image feature data sets 70.
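The loop over steps S106 to S114 can be sketched as follows. This is only an illustrative Python outline: each cloud API is replaced by a hypothetical local callable, each setting document image is a plain dictionary with made-up keys ("image", "text", "features"), and the average is used as the representative value.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Stand-in for a call to a character recognition cloud API (hypothetical).
Recognizer = Callable[[str], str]


def build_selection_database(
    groups: List[List[dict]], apis: Dict[str, Recognizer]
) -> List[Tuple[Dict[str, float], str]]:
    """For each of the K groups, pair its representative features with the best API."""
    database = []
    for group in groups:  # counter J runs over the K setting document image groups
        rates = {}
        for name, recognize in apis.items():
            positives = sum(1 for img in group if recognize(img["image"]) == img["text"])
            rates[name] = positives / len(group)
        best = max(rates, key=rates.get)
        representative = {
            key: mean([img["features"][key] for img in group])
            for key in group[0]["features"]
        }
        database.append((representative, best))  # stored as one group (step S112)
    return database


# Toy recognizers: one returns the input unchanged, the other upper-cases it.
apis = {"api_a": lambda s: s, "api_b": lambda s: s.upper()}
groups = [
    [
        {"image": "abc", "text": "abc", "features": {"f": 0.2}},
        {"image": "de", "text": "de", "features": {"f": 0.4}},
    ],
    [{"image": "xy", "text": "XY", "features": {"f": 0.8}}],
]
database = build_selection_database(groups, apis)
print(database)
```

Because the same API label can win for more than one group, the resulting list naturally allows one cloud API to be grouped with several representative image feature data sets.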

After that, when the processor of the center server 20 determines yes in step S114 in fig. 4, the setting operation of the selection database 24 is ended.

The setting operation of the selection database 24 described above is only an example, and the selection database 24 may be set by other operations.

Next, a character recognition operation using the document image recognition system 100 will be described with reference to fig. 1 and 11 to 13.

When the user transmits the document image acquired by the user terminal 10 as the processing target document image 80 to the center server 20 as shown in fig. 1, the data transmitting/receiving section 22 of the center server 20 receives the processing target document image 80 as shown in step S201 of fig. 11. The data transmitting/receiving section 22 outputs the received processing target document image 80 to the cloud API selecting section 23.

As shown in step S202 of fig. 11 and fig. 12, the cloud API selecting section 23 extracts the features of the processing target document image 80 and generates the image feature data set 81 of the processing target document image 80, as in the case described above in the selection database setting operation.

Next, as shown in step S203 of fig. 11 and fig. 13, the cloud API selecting section 23 calculates the similarity value between the image feature data set 81 and each of the plurality of representative image feature data sets 70 stored in the selection database 24. Then, the representative image feature data set 70 (1) having the largest similarity value is selected. The maximum similarity value differs depending on the image feature data set 81 of the processing target document image 80. When the image feature data set 81 is close to the features of the setting document images 50 used in setting the selection database 24, the maximum similarity value is high, for example about 0.7 to 0.8. On the other hand, when the image feature data set 81 is far from the features of the setting document images 50 used in setting the selection database 24, the maximum similarity value is low, for example about 0.2 to 0.3.

Then, in step S204 of fig. 11, the cloud API selecting section 23 selects the cloud API31 (a) grouped with the representative image feature data set 70 (1) selected in step S203, and outputs it to the data transmitting/receiving section 22.
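Steps S203 and S204 amount to a nearest-neighbor lookup in the selection database. The Python sketch below assumes cosine similarity over numeric feature vectors purely for illustration; the passage above does not specify how the similarity value is computed, and the vectors and API labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here only as an assumed similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_api(
    target: Sequence[float], database: List[Tuple[Sequence[float], str]]
) -> Tuple[str, float]:
    """Return the API grouped with the most similar representative set, and that similarity."""
    if not database:
        raise ValueError("the selection database is empty")
    scored = [(cosine_similarity(target, rep), api) for rep, api in database]
    best_similarity, best_api = max(scored, key=lambda pair: pair[0])
    return best_api, best_similarity


selection_database = [([1.0, 0.0], "api_a"), ([0.0, 1.0], "api_b")]
api, similarity = select_api([0.9, 0.1], selection_database)
print(api, round(similarity, 3))
```

Returning the maximum similarity value alongside the API is convenient because the later update operation compares that value against a threshold.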

As shown in step S205 of fig. 11, the data transmission/reception section 22 transmits the processing target document image 80 to the selected cloud API31 (a) input from the cloud API selection section 23. Then, in step S206 of fig. 11, the data transmission/reception section 22 receives the character recognition result from the cloud API31 (a).

Then, the data transmitting/receiving section 22 transmits the character recognition result received from the cloud API31 (a) to the user terminal 10.

As shown in fig. 1, the user terminal 10 displays a character string of the character recognition result transmitted from the data transmitting/receiving unit 22 of the center server 20 on the character string display unit 12.

As described above, the document image recognition system 100 according to the embodiment selects the cloud API31 that is most suitable for the character recognition processing of the processing target document image 80 received from the user terminal 10, and causes the cloud API31 to perform the character recognition processing, so that the character recognition processing can be performed with high accuracy.

Next, the operation of updating the selection database 24 will be described with reference to fig. 14 to 19.

As described above, the cloud API selecting section 23 calculates the similarity value between the image feature data set 81 of the processing target document image 80 and each of the plurality of representative image feature data sets 70 stored in the selection database 24, and selects the representative image feature data set 70 having the largest similarity value. When the image feature data set 81 is close to the features of the setting document images 50 used in setting the selection database 24, the maximum similarity value is high, for example about 0.7 to 0.8. On the other hand, when the image feature data set 81 is far from the features of the setting document images 50 used in setting the selection database 24, the maximum similarity value is low, for example about 0.2 to 0.3. In the latter case, even if the representative image feature data set 70 having the largest similarity value is selected and character recognition processing is performed using the cloud API31 grouped with it, the character recognition result may not be a positive solution. The selection database 24 therefore needs to be updated so that the similarity value between the image feature data set 81 of the processing target document image 80 and the representative image feature data set 70 stored in the selection database 24 becomes as high as possible.

The updating of the selection database 24 starts when the user terminal 10, having received the character recognition result from the center server 20, displays the character string of the character recognition result on the character string display unit 12, and the user who has seen the character string inputs the forward solution character string (that is, the correct character string) included in the processing target document image 80 into the forward solution character string input unit 13. After the forward solution character string is input, the user terminal 10 transmits it to the center server 20. The center server 20 transmits the processing target document image 80 to each cloud API31, and updates the selection database 24 according to the degree of positive solution of the received character recognition results, that is, whether each result is a positive solution or a non-positive solution. The details are described below. In the following description, a positive solution refers to a case where every character of the received character recognition result is correct, and a non-positive solution refers to a case where the received character recognition result includes at least 1 incorrect character. The following description also assumes that the cloud API31 (a) was selected in the preceding character recognition operation.

As shown in fig. 1, the user confirms the character string of the character recognition result displayed in the character string display section 12 of the user terminal 10. At this time, the consent icon and the character input region are displayed in the screen of the user terminal 10. The consent icon and the character input area constitute the forward solution character string input section 13.

If the character recognition result displayed in the character string display section 12 is a correct character string, the user presses the consent icon displayed on the screen of the user terminal 10. Then, in step S207 of fig. 11, the user terminal 10 transmits the character recognition result received from the center server 20 to the selection database updating section 25 of the center server 20 as the forward solution character string. On the other hand, when the user confirms that the character string displayed on the character string display unit 12 is not correct, the user inputs the correct character string of the processing target document image 80 in the character input area displayed on the screen of the user terminal 10. When the forward solution character string is input in the character input area, the user terminal 10 transmits the input forward solution character string to the selection database updating section 25 of the center server 20. In addition, the user may give consent or input the forward solution character string by voice. In that case, the voice input function constitutes the forward solution character string input section 13.

As shown in step S301 of fig. 14, the selection database updating unit 25 of the center server 20 waits until there is an input of the forward-solution character string of the processing target document image 80 from the user terminal 10, and after the input of the forward-solution character string, it proceeds to step S302 of fig. 14, and as shown in fig. 19, the processing target document image 80 is transmitted to all of the M cloud APIs 31 (a) to 31 (M). Then, as shown in step S303 of fig. 14, the selection database updating unit 25 receives character recognition results from the M cloud APIs 31 (a) to 31 (M).

As shown in step S304 of fig. 14 and fig. 19, the selection database updating unit 25 compares the character recognition result received from the cloud API31 (a) selected by the cloud API selecting unit 23 in the previous character recognition operation with the forward solution character string, and proceeds to step S305 of fig. 14 when the character recognition result of the selected cloud API31 (a) is a positive solution.

In step S305 in fig. 14, the selection database updating unit 25 compares the character recognition results received from the cloud APIs 31 (B) to 31 (M) other than the cloud API31 (a) selected previously with the forward solution character string, and when there is a forward solution in at least one of the character recognition results received from the other cloud APIs 31 (B) to 31 (M), the flow proceeds to step S306 in fig. 15.

In step S306 in fig. 15, the selection database updating unit 25 determines whether or not the similarity value between the image feature data set 81 of the processing target document image 80 shown in fig. 12 and the representative image feature data set 70 (1) shown in fig. 13, which is a group with the cloud API31 (a) selected previously, is equal to or greater than a predetermined threshold value. Here, the predetermined threshold value can be selected freely, but may be set to about 0.8 or 0.7, for example.

When the selection database updating unit 25 determines yes in step S306 in fig. 15, the process proceeds to step S307 in fig. 15, and updates the representative image feature data set 70 (1) grouped with the cloud API31 (a) selected previously based on the image feature data set 81 of the processing target document image 80. The update may be performed, for example, by increasing or decreasing the data representing the parameters of the image feature data set 70 (1) by an amount obtained by weighting the differences between the data representing the parameters of the image feature data set 70 (1) and the data representing the parameters of the image feature data set 81 of the processing target document image 80. Further, each data representing each parameter of the image feature data set 70 (1) may be replaced with each data representing each parameter of the image feature data set 81 of the processing target document image 80.
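The weighted update described for step S307 can be written as r ← r + w (t − r) for each parameter, where r is the stored representative value, t is the value from the processing target document image, and w is a weight. The following Python sketch uses that form; the weight values and parameter names are assumptions for illustration, and w = 1 reproduces the replacement variant mentioned above.

```python
from typing import Dict


def update_representative(
    representative: Dict[str, float], target: Dict[str, float], weight: float = 0.1
) -> Dict[str, float]:
    """Move each representative parameter toward the target by a weighted difference."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie between 0 and 1")
    # weight = 0 leaves the set unchanged; weight = 1 replaces it with the target values.
    return {
        key: representative[key] + weight * (target[key] - representative[key])
        for key in representative
    }


stored = {"feature_1": 0.5, "feature_2": 10.0}
observed = {"feature_1": 0.7, "feature_2": 12.0}
print(update_representative(stored, observed, weight=0.5))
```

A small weight makes the representative set drift slowly and resist single unusual images, while a large weight tracks recent images closely; the description leaves this choice open.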

If the selection database updating unit 25 determines no in step S306 in fig. 15, the flow proceeds to step S308 in fig. 15, and a group consisting of the image feature data set 81 of the processing target document image 80 and the previously selected cloud API31 (a) is added to the selection database 24. However, when this group already exists in the selection database 24, the group is not added.

After ending the processing in step S307 or step S308 in fig. 15, the selection database updating unit 25 determines, in step S309 in fig. 15, whether or not the similarity value between the image feature data set 81 of the processing target document image 80 and the representative image feature data set 70 grouped with the cloud API31, among the other cloud APIs 31, whose character recognition result was a positive solution in step S305 in fig. 14 is equal to or greater than a predetermined threshold value.

When the selection database updating unit 25 determines yes in step S309 in fig. 15, it proceeds to step S310 in fig. 15 to update the representative image feature data set 70 grouped with the cloud API31 having the positive character recognition result among the other cloud APIs 31 based on the image feature data set 81 of the processing target document image 80. As in the case described above, the update may be performed by increasing or decreasing the data representing the parameters of the image feature data set 70 by an amount obtained by weighting the differences between the data representing the parameters of the image feature data set 70 and the data representing the parameters of the image feature data set 81 of the processing target document image 80. Further, each data representing each parameter of the image feature data set 70 may be replaced with each data representing each parameter of the image feature data set 81 of the processing target document image 80.

If the selection database updating unit 25 determines no in step S309 in fig. 15, the flow proceeds to step S311 in fig. 15, and a group consisting of the image feature data set 81 of the processing target document image 80 and the cloud API31, among the other cloud APIs 31, whose character recognition result is a positive solution is added to the selection database 24. When this group already exists in the selection database 24, the group is not added.

In addition, when a plurality of the character recognition results received from the other cloud APIs 31 (B) to 31 (M) are positive solutions in step S305 in fig. 14, the processing in steps S309 to S311 in fig. 15 is performed for each of those other cloud APIs 31.

After the selection database updating unit 25 ends the processing in step S310 or S311 in fig. 15, the updating operation is ended.

When the selection database updating unit 25 determines no in step S305 in fig. 14, the operations of steps S401 to S403 in fig. 16 are performed. The operations of steps S401 to S403 in fig. 16 are the same as those of steps S306 to S308 shown in fig. 15, and therefore, the description thereof is omitted.

If the selection database updating unit 25 determines no in step S304 in fig. 14, the process proceeds to step S501 in fig. 17, and determines whether or not the character recognition results of the other cloud APIs 31 (B) to 31 (M) are positive. Then, when the selection database updating unit 25 determines yes in step S501 in fig. 17, the operations of steps S502 to S504 in fig. 17 are executed. The operations of steps S502 to S504 in fig. 17 are the same as those of steps S309 to S311 shown in fig. 15, and therefore, the description thereof is omitted.

If the selection database updating unit 25 determines no in step S501 in fig. 17, the flow proceeds to step S505 in fig. 18, and as shown in fig. 19, the processing target document image 80 is transmitted to another cloud API32, that is, a cloud API other than the cloud APIs 31 stored in the selection database 24 in groups with the representative image feature data sets 70. Then, as shown in step S506 of fig. 18, after receiving the character recognition result from the other cloud API32, the selection database updating unit 25 confirms in step S507 whether or not the received character recognition result is a positive solution. When the determination in step S507 in fig. 18 is yes, the selection database updating unit 25 proceeds to step S508 and adds a group consisting of the image feature data set 81 of the processing target document image 80 and the other cloud API32 to the selection database 24.
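The branching in steps S304 to S508 can be condensed into a small decision function. The Python sketch below is one possible reading of the flow, not the patented implementation: the action labels ("update", "add", "query_other_api"), the API labels, and the default threshold of 0.8 are all hypothetical, and each similarity value is assumed to be the similarity between the image feature data set 81 and the representative set grouped with that API.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, Optional[str]]


def plan_update(
    selected: str,
    is_positive: Dict[str, bool],
    similarity: Dict[str, float],
    threshold: float = 0.8,
) -> List[Action]:
    """List the database actions implied by the recognition results of all stored APIs."""

    def update_or_add(api: str) -> Action:
        # S306/S309: update the stored representative set when similar enough,
        # otherwise add a new group (a caller would skip groups that already exist).
        return ("update" if similarity[api] >= threshold else "add", api)

    actions: List[Action] = []
    if is_positive[selected]:  # S304 yes -> S306 to S308 (or S401 to S403)
        actions.append(update_or_add(selected))
    for api, ok in is_positive.items():  # S305 / S501 -> S309 to S311 (or S502 to S504)
        if ok and api != selected:
            actions.append(update_or_add(api))
    if not actions:  # no stored API was correct -> S505: try a cloud API outside the database
        actions.append(("query_other_api", None))
    return actions


print(plan_update(
    "api_a",
    {"api_a": True, "api_b": True, "api_c": False},
    {"api_a": 0.85, "api_b": 0.4, "api_c": 0.1},
))
```

Collapsing the four branches this way relies on the observation in the description that steps S401 to S403 mirror S306 to S308 and steps S502 to S504 mirror S309 to S311.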

In the update operation described above, the representative image feature data set 70 grouped with the cloud API31 whose character recognition result is a positive solution is brought closer to the image feature data set 81 of the processing target document image 80. The selection database 24 can therefore be updated so that the similarity value between the image feature data set 81 of the processing target document image 80 and the representative image feature data set 70 stored in the selection database 24 gradually increases. In addition, when none of the cloud APIs 31 stored in the selection database 24 yields a positive solution, the other cloud API32 whose character recognition result is a positive solution and the image feature data set 81 of the processing target document image 80 are stored in the selection database 24 as a group, so that the range of document images for which character recognition can be performed accurately can be enlarged.

Thereby, the character recognition accuracy of the document image recognition system 100 of the embodiment can be improved.

In the above description, a positive solution is a case where every character of the received character recognition result is correct, and a non-positive solution is a case where the received character recognition result includes at least 1 incorrect character; however, the present invention is not limited thereto. For example, the above-described update operation may be executed by treating a case where the proportion of correctly recognized characters in the total number of characters included in the received character recognition result is equal to or greater than a predetermined threshold value, such as 90%, as a positive solution, and a case where the proportion is smaller than the predetermined threshold value as a non-positive solution.
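The relaxed criterion can be illustrated with a short Python sketch. How correct characters are counted is not specified in the description; the sketch assumes a simple position-by-position comparison and, as a further assumption, divides by the longer of the two strings so that missing or extra characters lower the proportion. The sample strings are hypothetical.

```python
def is_positive_solution(recognized: str, correct: str, threshold: float = 0.9) -> bool:
    """Treat a result as a positive solution when enough characters match by position."""
    total = max(len(recognized), len(correct))
    if total == 0:
        return True  # two empty strings are considered identical
    matches = sum(1 for r, c in zip(recognized, correct) if r == c)
    return matches / total >= threshold


# 10 of 11 characters match (letter O read instead of digit 0): about 0.91, accepted at 90%.
print(is_positive_solution("ELEVATOR-1O", "ELEVATOR-10"))
# 2 of 4 characters match: 0.5, rejected.
print(is_positive_solution("R00M", "ROOM"))
```

A position-wise count is sensitive to a single inserted or dropped character early in the string; an edit-distance based count would be a more tolerant alternative under the same threshold idea.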

Description of the reference numerals

10: user terminal; 11: document image acquisition unit; 12: character string display unit; 13: forward solution character string input unit; 20: center server; 21: character recognition processing unit; 22: data transmitting/receiving unit; 23: cloud API selection unit; 24: selection database; 25: selection database updating unit; 30: cloud API group; 31, 32: cloud API; 50: setting document image; 51, 81: image feature data set; 55: image feature data set group; 60: setting document image group; 70: representative image feature data set; 80: processing target document image; 100: document image recognition system; 150: general-purpose computer; 151: CPU; 152: ROM; 153: RAM; 154: HDD; 155: mouse; 156: keyboard; 157: display; 158: input/output controller; 159: network controller; 160: data bus.

Claims

1. A document image recognition system, the document image recognition system comprising:

A user terminal that acquires a document image;

A center server connected to the user terminal by a communication line; and

A plurality of character recognition cloud APIs connected to the center server via a communication line, for performing character recognition processing of the inputted document image, outputting a character recognition result,

Characterized in that,

The center server has a selection database that stores, as a group, a feature of an input document image and the character recognition cloud API that, among the plurality of character recognition cloud APIs, has the greatest positive solution rate of character recognition when character recognition processing of the input document image is performed,

The user terminal transmits the acquired document image as a processing object document image to the center server,

The center server extracts characteristics of the processing object document image from the processing object document image received from the user terminal, selects characteristics of the input document image most similar to the characteristics of the processing object document image from among the characteristics of the input document image stored in the selection database, selects one character recognition cloud API that is grouped with the selected characteristics of the input document image, transmits the processing object document image to the selected one character recognition cloud API, receives a character recognition result from the one character recognition cloud API, transmits the received character recognition result to the user terminal,

The user terminal, upon receiving a character recognition result from the center server, outputs to the center server a forward solution character string that is contained in the processing object document image and is input by a user,

The center server, when the forward solution character string is input by the user terminal, transmits the processing object document image to each character recognition cloud API,

The central server receives character recognition results from the respective character recognition cloud APIs,

The center server, according to the degree of positive solution of the received character recognition results, performs either or both of updating the groups of the features of the input document images and the character recognition cloud APIs in the selection database, and adding a group of a feature of an input document image and a character recognition cloud API to the selection database,

When the character recognition result received from the selected one character recognition cloud API is positive, at least one of the character recognition results received from the character recognition cloud APIs other than the selected one character recognition cloud API is positive, and the similarity value between the feature of the processing target document image and the feature of the input document image grouped with the selected one character recognition cloud API is equal to or greater than a predetermined threshold value, the center server updates the feature of the input document image grouped with the selected one character recognition cloud API based on the feature of the processing target document image.

2. The document image recognition system of claim 1, wherein,

When the character recognition result received from the selected one character recognition cloud API is positive, and at least one of the character recognition results received from the other character recognition cloud APIs is positive, and the similarity value between the feature of the processing object document image and the feature of the input document image grouped with the selected one character recognition cloud API is smaller than a predetermined threshold value, the center server adds the feature of the processing object document image and the group of the selected one character recognition cloud API to the selection database.

3. A document image recognition system, the document image recognition system comprising:

A user terminal that acquires a document image;

A center server connected to the user terminal by a communication line; and

Characterized in that,

When the character recognition result received from the selected one character recognition cloud API is positive, and at least one of the character recognition results received from the character recognition cloud APIs other than the selected one character recognition cloud API is positive, and the similarity value of the feature of the processing-object document image and the feature of the input document image grouped by the character recognition cloud API whose character recognition result is positive among the other character recognition cloud APIs is equal to or greater than a predetermined threshold value, the center server updates the feature of the input document image grouped by the character recognition cloud API whose character recognition result is positive among the other character recognition cloud APIs based on the feature of the processing-object document image.

4. The document image recognition system of claim 3, wherein,

When the character recognition result received from the selected one character recognition cloud API is positive, and at least one of the character recognition results received from the other character recognition cloud APIs is positive, and the similarity value of the feature of the processing object document image and the feature of the input document image of the character recognition cloud API group whose character recognition result is positive among the other character recognition cloud APIs is smaller than a prescribed threshold value, the center server adds the feature of the processing object document image and the group of character recognition cloud APIs whose character recognition result is positive among the other character recognition cloud APIs to the selection database.

5. A document image recognition system, the document image recognition system comprising:

A user terminal that acquires a document image;

A center server connected to the user terminal by a communication line; and

Characterized in that,

When the character recognition result received from the selected one character recognition cloud API is positive, the character recognition result received from the other character recognition cloud APIs other than the selected one character recognition cloud API is not positive, and the similarity value of the feature of the processing object document image and the feature of the input document image grouped with the selected one character recognition cloud API is equal to or greater than a predetermined threshold value, the center server updates the feature of the input document image grouped with the selected one character recognition cloud API according to the feature of the processing object document image.

6. The document image recognition system of claim 5, wherein,

In the case where the character recognition result received from the selected one character recognition cloud API is positive, and the character recognition result received from the other character recognition cloud APIs other than the selected one character recognition cloud API is not positive, and the similarity value of the feature of the processing object document image and the feature of the input document image grouped with the selected one character recognition cloud API is smaller than the prescribed threshold value, the center server adds the feature of the processing object document image and the group of the selected one character recognition cloud API to the selection database.

7. A document image recognition system, the document image recognition system comprising:

A user terminal that acquires a document image;

A center server connected to the user terminal by a communication line; and

Characterized in that,

When the character recognition result received from the selected one character recognition cloud API is a non-positive solution, and at least one of the character recognition results received from the character recognition cloud APIs other than the selected one character recognition cloud API is a positive solution, and the similarity value of the feature of the processing-object document image and the feature of the input document image grouped by the character recognition cloud API whose character recognition result is a positive solution among the other character recognition cloud APIs is equal to or greater than a predetermined threshold value, the center server updates the feature of the input document image grouped by the character recognition cloud API whose character recognition result is a positive solution among the other character recognition cloud APIs based on the feature of the processing-object document image.

8. The document image recognition system of claim 7, wherein,

When the character recognition result received from the selected one character recognition cloud API is a non-positive solution and at least one of the character recognition results received from the character recognition cloud APIs other than the selected one character recognition cloud API is a positive solution and the similarity value of the feature of the processing object document image and the feature of the input document image of the character recognition cloud API group whose character recognition result is a positive solution among the other character recognition cloud APIs is smaller than a prescribed threshold value, the center server adds the feature of the processing object document image and the group of character recognition cloud APIs whose character recognition result is a positive solution among the other character recognition cloud APIs to the selection database.

9. A document image recognition system, the document image recognition system comprising:

A user terminal that acquires a document image;

A center server connected to the user terminal by a communication line; and

The document image recognition system is characterized in that,

When the character recognition result received from the selected one character recognition cloud API is not a positive solution and none of the character recognition results received from the character recognition cloud APIs other than the selected one character recognition cloud API is a positive solution, the center server transmits the processing object document image to further character recognition cloud APIs that are not stored in the selection database in groups with the features of input document images, and when the character recognition result received from one of the further character recognition cloud APIs is a positive solution, the center server adds, to the selection database, a group consisting of the feature of the processing object document image and that character recognition cloud API.
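The fallback of claim 9 can be sketched as below. This is a minimal illustration, not the claimed implementation: the function name `try_unregistered_apis`, the representation of the selection database as a list of `(feature, api_name)` pairs, the representation of each API as a callable, and the `is_positive` callback that decides whether a result is a positive solution are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

SelectionDB = List[Tuple[List[float], str]]  # (feature, API name) groups


def try_unregistered_apis(
    image: object,
    feature: List[float],
    db: SelectionDB,
    all_apis: Dict[str, Callable[[object], str]],
    is_positive: Callable[[str], bool],
) -> Optional[str]:
    """Try APIs that are absent from the selection database.

    Intended for the case where every API already stored in the
    database failed on this image.
    """
    registered = {api for _, api in db}
    for name, recognize in all_apis.items():
        if name in registered:
            continue  # already tried as a stored API
        if is_positive(recognize(image)):
            # Register the new feature/API group for future selection.
            db.append((list(feature), name))
            return name
    return None  # no API produced a positive solution
```

Returning `None` leaves the database unchanged, which matches the claim in that a group is added only when a further API yields a positive solution.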

10. The document image recognition system according to any one of claims 1 to 9, wherein,

The feature of the document image includes at least one of an image feature amount calculated from pixel information of the document image, an image attribute indicating a situation when the document image is acquired by the user terminal, and a learning feature value calculated using a learning machine.

11. The document image recognition system of claim 10, wherein,

The image attribute is information acquired by the user terminal when the user terminal acquires the document image, and includes at least one of brightness, illuminance, acquisition location, and acquisition time of the document image.
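The feature composition described in claims 10 and 11 could be represented by a data structure such as the following. This is only a sketch: the class and field names, the coordinate pair for the acquisition location, and the string timestamp are hypothetical, and the checks simply enforce the "at least one of" wording of the two claims.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ImageAttribute:
    """Situation recorded by the user terminal when the image was acquired."""
    brightness: Optional[float] = None
    illuminance: Optional[float] = None
    location: Optional[Tuple[float, float]] = None  # hypothetical (latitude, longitude)
    acquired_at: Optional[str] = None               # hypothetical ISO 8601 timestamp

    def __post_init__(self) -> None:
        fields = (self.brightness, self.illuminance, self.location, self.acquired_at)
        if all(v is None for v in fields):
            raise ValueError("an image attribute needs at least one field")


@dataclass
class DocumentFeature:
    """Feature of a document image: at least one component must be present."""
    image_feature: Optional[Sequence[float]] = None    # computed from pixel information
    attribute: Optional[ImageAttribute] = None         # acquisition situation
    learned_feature: Optional[Sequence[float]] = None  # computed by a learning machine

    def __post_init__(self) -> None:
        parts = (self.image_feature, self.attribute, self.learned_feature)
        if all(p is None for p in parts):
            raise ValueError("a document feature needs at least one component")
```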

12. The document image recognition system according to any one of claims 1 to 9, wherein,

The character recognition cloud API stored in the selection database is a character recognition cloud API determined as follows: features are extracted from a plurality of setting document images having known character strings, the setting document images having similar features are grouped together, and, for each group of setting document images, the character recognition cloud API that maximizes the positive solution rate of character recognition when character recognition is performed on the setting document images included in that group is chosen,

The features of the input document image grouped with the character recognition cloud API are representative features that represent the features of each group of setting document images.
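The construction of the selection database described in claim 12 can be sketched as follows. The claim does not state how similar features are grouped or how a representative feature is derived, so the greedy threshold grouping, the cosine similarity, the mean vector as representative feature, and the names `build_selection_db`, `cosine`, and `mean` are assumptions made purely for illustration. Each API is modeled as a callable that returns recognized text.

```python
import math
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], object, str]  # (feature, image, known character string)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_selection_db(
    samples: List[Sample],
    apis: Dict[str, Callable[[object], str]],
    threshold: float = 0.9,
) -> List[Tuple[List[float], str]]:
    # Step 1: group setting document images with similar features
    # (greedy grouping against each group's current mean; an assumption).
    groups: List[List[Sample]] = []
    for s in samples:
        for g in groups:
            if cosine(mean([m[0] for m in g]), s[0]) >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])

    # Step 2: per group, pick the API with the highest rate of positive
    # solutions and store it with the group's representative feature.
    db = []
    for g in groups:
        def rate(name: str) -> float:
            return sum(apis[name](img) == text for _, img, text in g) / len(g)
        db.append((mean([m[0] for m in g]), max(apis, key=rate)))
    return db
```

Because the known character string of every setting document image is available, the rate of positive solutions per group can be computed by direct comparison, which is what makes this selection possible ahead of time.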