WO2024084539A1 - Table recognition device and method

Info

Publication number: WO2024084539A1
Application number: PCT/JP2022/038526
Authority: WO
Inventors: 美岬金井
Original assignee: 三菱電機株式会社
Priority date: 2022-10-17
Filing date: 2022-10-17
Publication date: 2024-04-25
Also published as: JP7563655B2; JPWO2024084539A1

Abstract

The purpose of the present invention is to achieve accurate recognition of a character string written across a plurality of ruled line frames without relying on ruled line frame information. The present invention provides a table recognition device that recognizes a character string written in a tabular document from image information of the tabular document, the table recognition device comprising: a character recognition unit that recognizes the character string written inside each of a plurality of ruled line frames provided in the tabular document; and a ruled line frame merge determination unit that compares an independent character string, recognized for a target ruled line frame among the plurality of ruled line frames, with a concatenated character string, obtained by concatenating the independent character string with a character string recognized for a ruled line frame different from the target ruled line frame, and identifies whichever of the two has the higher degree of agreement with a conforming character string to be written in the tabular document as a unified character string that belongs to the target ruled line frame.

Description

Table recognition device and method

The present disclosure relates to a table recognition device and method for performing character recognition from image information of a table-format document.

When recognizing characters from an image of a tabular document, the table's borders are extracted and the rows and columns of the table are separated into multiple areas (multiple border frames) by the borders. Character recognition is then performed for each border frame, and the character recognition results are stored separately. For this reason, if a single item name or item value is written across multiple border frames, such as when the character string within a border frame that is to become an item name or item value exceeds the border frame and spills over into an adjacent border frame, the item name or item value (character string) may not be recognized.

Therefore, a technology has been disclosed that recognizes an item name or item value (character string) by merging ruled frames when the ruled frame information of adjacent ruled frames satisfies a preset condition, for example, when adjacent ruled frames are both drawn with solid lines of the same thickness (for example, Patent Document 1).
According to this conventional technique, it is possible to recognize a character string that is written across multiple ruled frames.

Patent Document 1: JP 2017-097805 A

However, the conventional technology has the following problems. For example, adjacent ruled frames may have different types of lines, and the ruled frame information of adjacent ruled frames may differ. In such cases, the preset conditions are not met. Therefore, it is not possible to recognize a character string that is written across multiple ruled frames that do not meet the conditions.

The present disclosure has been made to solve the above-mentioned problems, and aims to make it possible to accurately recognize character strings that span multiple ruled frames without relying on ruled frame information.

The table recognition device of the present disclosure is a table recognition device that recognizes character strings written in a table-format document from image information of the table-format document, the device comprising:
a character recognition unit that recognizes the character string written within each of a plurality of ruled frames provided in the table-format document; and
a ruled line frame integration determination unit that compares a single character string, which is the character string recognized for a target ruled frame among the plurality of ruled frames, with a concatenated character string, which is a concatenation of the single character string and a character string recognized for a ruled frame other than the target ruled frame, and determines whichever of the two has the higher degree of match with a matching character string to be written in the table-format document as an integrated character string that belongs to the target ruled frame.

The table recognition method of the present disclosure is a table recognition method for recognizing character strings written in a table-format document from image information of the table-format document, the method comprising the steps of:
recognizing the character string written within each of a plurality of ruled frames provided in the table-format document; and
comparing a single character string, which is the character string recognized for a target ruled frame among the plurality of ruled frames, with a concatenated character string, which is a concatenation of the single character string and a character string recognized for a ruled frame other than the target ruled frame, and determining whichever of the two has the higher degree of match with a matching character string to be written in the table-format document as an integrated character string that belongs to the target ruled frame.

According to the present disclosure, it is possible to accurately recognize a character string that is written across multiple ruled frames without relying on ruled frame information.

FIG. 1 is a functional configuration diagram illustrating the configuration of the table recognition device according to Embodiment 1.
FIG. 2 is a diagram illustrating an example of a table-format document to be recognized by the table recognition device in Embodiment 1.
FIG. 3 is a diagram illustrating an example of the knowledge database according to Embodiment 1.
FIG. 4 is a hardware configuration diagram of the table recognition device according to Embodiment 1.
FIG. 5 is a flowchart showing the operation sequence of the table recognition device in Embodiment 1.
FIG. 6 is a flowchart illustrating the operation sequence of the ruled line frame integration determination unit according to Embodiment 1.
FIG. 7 is a diagram illustrating an example of the operation of the table recognition device in Embodiment 1.
FIG. 8 is a functional configuration diagram showing the configuration of the table recognition device in Embodiment 2.
FIG. 9 is a diagram illustrating an example of the table structure knowledge database in Embodiment 2.
FIG. 10 is a flowchart showing the operation sequence of the ruled line frame integration determination unit in Embodiment 2.
FIG. 11 is a diagram illustrating an example of the operation of the table recognition device in Embodiment 3.

In the description of the embodiments and drawings, the same elements and corresponding elements are given the same reference numerals. Descriptions of elements given the same reference numerals are omitted or simplified as appropriate. In the following embodiments, "unit" may be read as "circuit," "step," "procedure," or "process" as appropriate.

Embodiment 1.
<Configuration>
The table recognition device in embodiment 1 will be described with reference to Fig. 1 to Fig. 7. Fig. 1 is a functional block diagram showing the configuration of the table recognition device 100 in embodiment 1. As shown in Fig. 1, the table recognition device 100 comprises a table structure recognition unit 1, a character recognition unit 2, a ruled line frame integration determination unit 3, and a knowledge database 4.

FIG. 2 is a diagram showing an example of a tabular document to be recognized by the table recognition device 100 in the first embodiment. In the first row of the table shown in FIG. 2, the item name "Item A" is written in the first column, "Item B" in the second column, and "Item C" in the third column. The second and subsequent rows of the table contain item values belonging to each item name. As shown in FIG. 2, in a tabular document, characters may protrude from a ruled frame or touch a ruled frame because of the effects of printing or because there are too many characters. Specifically, in the table shown in FIG. 2, the characters "at" of the item value "Total Fat" belonging to the item name "Item A" protrude into the ruled frame of the item name "Item B". Also, the initial character "S" of the item value "Saturated Fat" belonging to the item name "Item B" touches the ruled frame.

The table structure recognition unit 1 extracts lines from the image information of a table-format document and recognizes the table structure. The table structure is made up of multiple areas (i.e., multiple ruled frames) that separate the rows and columns of the table with lines.

The table structure can be recognized, for example, by a method based on edge histograms. Specifically, from the image information of a tabular document, edges in two directions are obtained near the boundaries of white pixel clusters inside the table area. Here, a white pixel cluster is a white area surrounded by a border frame of a color other than white. Then, partial information about the borders is obtained from the edge histograms computed for the edges in each of the two directions. Furthermore, based on this partial information, the border information of the table area is obtained and the table structure is recognized. Note that the method of recognizing the table structure is not limited to this. Various methods can be used as long as they can obtain information on the row and column structure of a table.
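As a rough illustration of obtaining the row and column structure from ruled lines, the following sketch uses a plain projection profile. This is a deliberately simpler stand-in and not the edge-histogram method described above. The function names, the 0/1 image representation, and the `min_fraction` threshold are assumptions made for this example, and rules are assumed to be one pixel thick.

```python
def rule_positions(image, axis, min_fraction=0.9):
    """Return indices of rows (axis=0) or columns (axis=1) that look like rules.

    image: list of equal-length rows, where 1 is a dark pixel and 0 is white.
    A row or column counts as a rule when at least min_fraction of it is dark.
    """
    lines = image if axis == 0 else list(zip(*image))
    return [i for i, line in enumerate(lines)
            if sum(line) / len(line) >= min_fraction]


def frames(image):
    """Return ruled frames as (top, bottom, left, right) rule positions."""
    rows = rule_positions(image, axis=0)
    cols = rule_positions(image, axis=1)
    return [
        (top, bottom, left, right)
        for top, bottom in zip(rows, rows[1:])
        for left, right in zip(cols, cols[1:])
    ]


# Hypothetical 2 x 2 table drawn with one-pixel rules.
IMAGE = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
print(frames(IMAGE))
```

Each returned tuple bounds one ruled frame, which is the unit that the character recognition unit 2 then processes.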

The character recognition unit 2 recognizes the character string within the ruled frame, for example, using optical character recognition (OCR) technology. Note that the method for recognizing the character string within the ruled frame is not limited to OCR, and other methods may also be used.

The ruled line frame integration determination unit 3 determines which ruled line frames should be integrated according to the degree of matching between the character string recognized by the character recognition unit 2 and the matching character strings registered in the knowledge database 4. In other words, it determines whether the character string in each ruled line frame should be concatenated with the character strings in the adjacent ruled line frames on the left and right. The degree of matching can be, for example, a likelihood. The likelihood is a value that indicates how plausible it is that an arbitrary character string belongs to a certain group of character strings. For example, the likelihood can be derived from the standardized edit distance between two character strings. Here, the standardized edit distance is the value obtained by dividing the edit distance by the length of the longer character string, and the edit distance is the minimum number of single-character insertions, deletions, or replacements required to transform one character string into the other. The likelihood may be read as the degree of matching between two character strings.
Then, based on the determination result, the character strings within the ruled frames are concatenated, and either the concatenated character string or the single (unconcatenated) character string is output as an integrated character string, which is the final integration determination result.
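The definition of the standardized edit distance can be written compactly as follows. The symbols d(a, b) for the edit distance and |a| for the length of a string are notation introduced here for illustration and do not appear in the original text.

```latex
\mathrm{NED}(a, b) = \frac{d(a, b)}{\max(|a|, |b|)}, \qquad 0 \le \mathrm{NED}(a, b) \le 1

% Worked example: a = "Total F" (7 characters), b = "Total Fat" (9 characters).
% Two insertions ("a" and "t") turn a into b, so d(a, b) = 2 and
\mathrm{NED}(a, b) = \frac{2}{9} \approx 0.22
```

A value near 0 means the two strings are nearly identical, and a value near 1 means they share almost nothing.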

The knowledge database 4 defines the item name and the matching string, which is a string to be written as the item value belonging to the item name. FIG. 3 shows an example of the contents of the matching string in the knowledge database 4. In the knowledge database 4, one or more matching strings are registered for each item name. In other words, any of the matching strings for the item value defined in the knowledge database 4 can be written within the ruled frame of the item value for each item name of the tabular document. The matching string is not limited to one word, but may be a phrase or a sentence consisting of multiple words. The IDX in FIG. 3 is an index number that is individually assigned to each matching string. This IDX is used to specify the matching string when referring to the matching string in the knowledge database 4. Specifically, in FIG. 3, for example, index numbers A1, A2, ..., A9 are assigned to each item value belonging to the item name "Item A" in order. The matching string is not limited to the string shown in FIG. 3, and can be set arbitrarily.
Furthermore, a so-called wildcard representing any character or number can be set for the item value of the knowledge database 4. Specifically, in the matching character string belonging to the item name "Item C" in Fig. 3, [*NUM*] means a wildcard, and any number can be substituted for the wildcard. By setting a wildcard for the item value of the knowledge database 4, it becomes unnecessary to set a matching character string for each character or number, and the storage capacity of the knowledge database 4 can be reduced. Also, the amount of calculation required to refer to a matching character string for each character or number can be reduced.
Furthermore, if the recognized character string has an error as a result of table recognition and satisfies a predetermined condition, it may be replaced with a character string of an item name or an item value registered in the knowledge database 4. Alternatively, a character string of an item name or an item value that is not registered may be excluded from the recognition target as a result of erroneous recognition. Specific conditions for replacement will be described later.
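The structure of the knowledge database 4 and the [*NUM*] wildcard can be pictured with the following minimal sketch. The dictionary contents are hypothetical (the actual entries of FIG. 3 are not reproduced here), and translating the wildcard into a regular expression is only one possible way to handle it, assumed for this example.

```python
import re

# Hypothetical entries: item name -> {IDX: matching string}.
KNOWLEDGE_DB = {
    "Item A": {"A1": "Total Fat", "A2": "Sodium"},
    "Item B": {"B1": "Saturated Fat", "B2": "Trans Fat"},
    "Item C": {"C1": "[*NUM*]g", "C2": "[*NUM*]mg"},
}

NUM_WILDCARD = "[*NUM*]"


def to_regex(matching_string):
    """Compile a matching string into a regex; [*NUM*] stands for any number."""
    literal_parts = matching_string.split(NUM_WILDCARD)
    pattern = r"\d+(?:\.\d+)?".join(re.escape(p) for p in literal_parts)
    return re.compile(pattern)


def find_exact_match(item_name, text):
    """Return the IDX of the first matching string that fits text exactly."""
    for idx, matching_string in KNOWLEDGE_DB[item_name].items():
        if to_regex(matching_string).fullmatch(text):
            return idx
    return None


print(find_exact_match("Item C", "25g"))   # matches the wildcard entry C1
print(find_exact_match("Item C", "9mg"))   # matches the wildcard entry C2
```

With the wildcard, the two entries under "Item C" cover every numeric value with those units, so no separate entry per number is needed.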

<Hardware>
Next, a description will be given of the hardware of the table recognition device 100 in embodiment 1. Fig. 4 is a hardware configuration diagram of the table recognition device 100. The table recognition device 100 has a processor 101, a memory 102, an external storage device 103, and an input/output interface 104.

The processor 101 controls the entire table recognition device 100. For example, the processor 101 is a CPU (Central Processing Unit) or an FPGA (Field Programmable Gate Array). The processor 101 may be a multiprocessor. The table recognition device 100 may also have a processing circuit.

The memory 102 is the main storage device of the table recognition device 100. For example, the memory 102 is a RAM (Random Access Memory). The external storage device 103 is an auxiliary storage device of the table recognition device 100. For example, the external storage device 103 is an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The input/output interface 104 is an interface that transmits and receives data to and from an external device connected to the table recognition device 100. For example, the input/output interface 104 is a NIC (Network Interface Controller). The external device is, for example, an image scanner or a display. Note that illustration of the external device is omitted.

The processor 101 reads the table recognition program stored in the external storage device 103 into the memory 102, and the processor 101 executes the program, thereby realizing each process of the table recognition method. The external storage device 103 holds the program and data for realizing the table recognition method of the first embodiment. The table recognition program may be provided over a network, or may be provided by being recorded on a computer-readable recording medium. That is, the table recognition program may be provided, for example, as a program product.

The input/output interface 104 receives image information of a tabular document from an external device such as an image scanner, and outputs the table recognition results to an external device such as a display.

<Flowchart>
Next, the operation of table recognition device 100 in embodiment 1 will be described. Fig. 5 is a flowchart showing the operation sequence of table recognition device 100 in embodiment 1. For ease of explanation, it is assumed that the contents of the first row of a table (i.e., the row in which the item names are written) are known, and a method for integrating only the ruled frames of item values from the second row onwards will be described. Also, since processing of the row of item names (first row) will be omitted, the second row, which is the first row of item values, will be considered as the new first row in the explanation.

First, in step S1, the table structure recognition unit 1 extracts ruled lines from the image information of the tabular document and recognizes the table structure consisting of multiple ruled line frames. Furthermore, the table structure recognition unit 1 obtains the number of rows in the table of the tabular document and the number of columns in each row from the recognized table structure (step S1).

In step S2, the character recognition unit 2 recognizes the characters within the ruled frame that the table structure recognition unit 1 recognized in step S1 (step S2).

Next, in step S3, the ruled line frame integration determination unit 3 refers to the knowledge database 4 and determines which ruled line frames to integrate based on the character recognition results obtained in step S2. Then, based on the determination result, it concatenates the character strings in adjacent ruled line frames and outputs either the concatenated character string or the single character string that is not concatenated (step S3).

FIG. 6 is a flowchart showing the sequence of operations performed by the ruled frame integration determination unit 3 in step S3. The "←" in the diagram represents the process of substituting the value or element on the right side for the variable on the left side. To simplify the explanation, the value held by the variable may be indicated by the symbolic name of the variable. The operations are performed for a table with the upper left corner as the origin.

First, in step S301, the variable i, which indicates the position of a row in the table, is assigned the value 1 (step S301).

In step S302, it is confirmed whether the variable i is equal to or less than the number of rows in the table. If the variable i is equal to or less than the number of rows in the table (Yes in step S302), the process proceeds to step S303. If the variable i is greater than the number of rows in the table (No in step S302), the process ends (END) since all rows have been evaluated.

In step S303, 1 is assigned to the variable j, which represents the position of the column in the table (step S303).

In step S304, it is confirmed whether variable j is less than or equal to the total number of ruled frames for item values belonging to row i (hereinafter, the number of items) (step S304). If variable j is less than or equal to the number of items (Yes in step S304), the process proceeds to step S305. If variable j exceeds the number of items (No in step S304), the process proceeds to step S316.

In step S305, it is confirmed whether there is a character recognition result within the ruled frame in the jth column of the ith row (step S305). If there is a character recognition result within the ruled frame in the jth column of the ith row (Yes in step S305), the process proceeds to step S306. If there is no character recognition result within the ruled frame in the jth column of the ith row (No in step S305), the process proceeds to step S315.

In step S306, 0 is assigned to the variable k (step S306). k represents the number of other ruled line frames to be merged with the ruled line frame in the jth column of the ith row. Specifically, when k=0, the ruled line frame in the jth column of the ith row is not merged and is treated as a single ruled line frame, and when k=1, one adjacent ruled line frame is merged with the ruled line frame in the jth column of the ith row.

In step S307, it is confirmed whether the value of variable k is less than or equal to the value obtained by subtracting both the value of variable j and 1 from the number of items, that is, whether k ≤ (number of items) - j - 1 (step S307). If this condition holds (Yes in step S307), the process proceeds to step S308. If it does not hold (No in step S307), the process proceeds to step S313.

In step S308, it is confirmed whether there is a character recognition result within the ruled frame in the (j+k)th column of the ith row (step S308). If there is a character recognition result within the ruled frame in the (j+k)th column of the ith row (Yes in step S308), the process proceeds to step S309. If there is no character recognition result within the ruled frame in the (j+k)th column of the ith row (No in step S308), the process proceeds to step S313.

In step S309, the ruled frames from the jth column to the (j+k)th column are integrated, and the character strings in the ruled frames are concatenated to obtain a concatenated character string (when k=0, this is simply the single character string of the jth column). Then, the knowledge database 4 is referred to, the matching character strings belonging to the jth item name (i.e., the jth column) in the knowledge database 4 are sequentially read out, the likelihood L[j+k,j] that the concatenated character string belongs to the item name in the jth column is calculated, and the likelihood L[j+k,j] is assigned to the variable L1 (step S309). As a specific example of how the knowledge database 4 is referred to in this embodiment, when j=1 (i.e., the first column of the table), the matching character strings belonging to the item name "Item A" in the knowledge database 4 are sequentially read out, and when j=2 (i.e., the second column of the table), the matching character strings belonging to the item name "Item B" are sequentially read out. Note that the index number IDX can be used as a search key to read out matching character strings from the knowledge database 4.
The likelihood L[j+k,j] is calculated, for example, as follows. First, using the index number IDX as a key, one matching string is read out from the one or more matching strings registered in the knowledge database 4 as item values belonging to the jth item name (i.e., the jth column). Next, the standardized edit distance NED is calculated between this matching string and the concatenated string obtained by concatenating the strings in the ruled frames from the jth column to the (j+k)th column. The standardized edit distance NED is calculated in this way for each matching string; it may be calculated for all matching strings registered in the knowledge database 4 or only for some of them.
Next, the minimum value NED_MIN is determined from the one or more calculated standardized edit distances NED. The value obtained by subtracting the minimum value NED_MIN from 1 can then be used as the likelihood L[j+k,j]. In other words, the likelihood represents the degree of match between the concatenated string and the most similar of the matching strings registered in the knowledge database 4. When the degree of match between two character strings is high, few edit operations are needed to transform one into the other, so the standardized edit distance is small. Therefore, when the degree of match between the concatenated character string for the jth to (j+k)th columns and a matching character string registered in the knowledge database 4 as an item value belonging to the jth item name (i.e., the jth column) is high, the likelihood L[j+k,j] takes a high value (i.e., a value close to 1), and when the degree of match is low, the likelihood L[j+k,j] takes a low value (i.e., a value close to 0).
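The likelihood calculation described for step S309 can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names are hypothetical, the comparison is case-sensitive, wildcard entries are not handled, and the list of matching strings is assumed to be non-empty.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions, or replacements."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]


def standardized_edit_distance(a, b):
    """Edit distance divided by the length of the longer string (NED)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def likelihood(text, matching_strings):
    """1 - NED_MIN, where NED_MIN is the smallest NED over the matching strings."""
    ned_min = min(standardized_edit_distance(text, m) for m in matching_strings)
    return 1.0 - ned_min


# Hypothetical matching strings for "Item A".
ITEM_A = ["Total Fat", "Sodium"]
print(likelihood("Total F", ITEM_A))     # single string of column 1
print(likelihood("Total Fat", ITEM_A))   # columns 1 and 2 concatenated
```

In this example the concatenated string scores higher than the single string, which is the situation in which the flowchart goes on to extend the merge.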

The likelihood may instead be calculated from the output of a trained model that has been trained on the knowledge database 4 using a known machine learning method such as a DNN (Deep Neural Network). The trained model can be created from string data obtained from a large number of tabular documents. Specifically, multiple strings are randomly extracted from this string data and concatenated to generate a concatenated string. Next, the likelihood (e.g., based on the standardized edit distance) between the input string data, which is the generated concatenated string, and the matching strings registered in the knowledge database 4 is calculated. The likelihood corresponding to each item of input string data is then assigned as a correct answer label (or ranking) and used as training data. The trained model is created by machine learning using the input string data and the training data so that the estimated likelihood, which is the output of the trained model, matches the correct answer label. Because the estimated likelihood is obtained simply by inputting the concatenated string into the trained model, the likelihood can be calculated directly without referring to the knowledge database 4. This is particularly effective when a large number of matching strings are registered in the knowledge database 4: it reduces the amount of calculation required to refer to the knowledge database 4 and the amount of memory required to store the matching strings.
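The generation of training data described above might look like the following sketch. Everything here is an assumption made for illustration: the function and variable names, the sample corpus, and in particular the labeling function, which uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity score rather than the likelihood based on the standardized edit distance. The model training itself is not shown.

```python
import difflib
import random


def make_training_pairs(strings, label_fn, n_samples, max_parts=3, seed=0):
    """Build (input string, correct answer label) pairs.

    Each input is a concatenation of 2..max_parts strings drawn at random from
    `strings`; label_fn maps the concatenated string to its likelihood label.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        n_parts = rng.randint(2, max_parts)
        text = "".join(rng.sample(strings, n_parts))
        pairs.append((text, label_fn(text)))
    return pairs


# Hypothetical matching strings and recognized fragments.
MATCHING = ["Total Fat", "Saturated Fat"]
CORPUS = ["Total F", "at", "Saturated", " Fat", "25g"]


def stand_in_likelihood(text):
    """Stand-in label: best difflib similarity against the matching strings."""
    return max(difflib.SequenceMatcher(None, text, m).ratio() for m in MATCHING)


pairs = make_training_pairs(CORPUS, stand_in_likelihood, n_samples=4)
for text, label in pairs:
    print(repr(text), round(label, 3))
```

A regression model trained on such pairs would then estimate the label directly from the input string.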

In step S310, to compare the concatenated string obtained in step S309 with a longer concatenated string that also includes the string in the next adjacent ruled frame, the ruled frames from the jth column to the (j+k+1)th column are merged and the strings in those ruled frames are concatenated. Then, as in step S309, the knowledge database 4 is referenced to calculate the likelihood L[j+k+1,j] that this concatenated string belongs to the jth item name, and the likelihood L[j+k+1,j] is assigned to the variable L2 (step S310).

In step S311, it is confirmed whether the value of variable L1 is less than or equal to the value of variable L2 (step S311). If the value of variable L1 is less than or equal to the value of variable L2 (Yes in step S311), concatenating one more string yields a likelihood at least as high as that of the concatenated string obtained in step S309, so the process proceeds to step S312. If the value of L1 is greater than the value of L2 (No in step S311), the process proceeds to step S313.

In step S312, 1 is added to the variable k (step S312), and the process proceeds to step S307.

In step S313, it is confirmed whether the likelihood L[j+k,j] calculated in step S309 is equal to or greater than a predetermined threshold T1 (step S313).
If the likelihood L[j+k,j] is equal to or greater than the predetermined threshold T1 (Yes in step S313), the process proceeds to step S314. If the likelihood L[j+k,j] is less than the predetermined threshold T1 (No in step S313), the process proceeds to step S315. Here, the predetermined threshold T1 is a threshold for suppressing (cutting off) an excessive increase in the number of ruled frame integration candidates C[j]. For example, the predetermined threshold T1 can be preset to 0.5, but is not limited to this value.

In step S314, the concatenated string obtained by concatenating the strings in the ruled frames from the jth column to the (j+k)th column is stored as a ruled frame integration candidate C[j], together with the row number and column number of each ruled frame, for example, in a memory MEM (not shown) (step S314).

In step S315, 1 is added to the variable j (step S315), and the process proceeds to step S304.
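The per-row loop of steps S303 to S315 can be sketched as follows. This is one reading of the flowchart under stated assumptions, not a definitive implementation: columns are 0-based here, an empty string stands for a frame with no recognition result, the likelihood of the single string is always computed before any extension, the next frame is checked for a recognition result before the merge is extended (a simplification of step S308), and the matching strings are hypothetical. The likelihood helper follows the 1 - NED_MIN definition so that the block runs on its own.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and replacements cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def likelihood(text, matching_strings):
    """1 - NED_MIN over the matching strings (assumed non-empty)."""
    return 1.0 - min(edit_distance(text, m) / max(len(text), len(m), 1)
                     for m in matching_strings)


def integration_candidates(row, matching_by_column, t1=0.5):
    """Return {j: (last merged column, concatenated string, likelihood)}."""
    n_items = len(row)
    candidates = {}
    for j in range(n_items):                                # S303, S304, S315
        if not row[j]:                                      # S305
            continue
        k = 0                                               # S306
        l1 = likelihood(row[j], matching_by_column[j])      # S309 with k = 0
        # S307/S308: stop at the last column or at a frame with no result.
        while j + k + 1 < n_items and row[j + k + 1]:
            longer = "".join(row[j:j + k + 2])
            l2 = likelihood(longer, matching_by_column[j])  # S310
            if l1 > l2:                                     # S311: No
                break
            k, l1 = k + 1, l2                               # S312
        if l1 >= t1:                                        # S313
            candidates[j] = (j + k, "".join(row[j:j + k + 1]), l1)  # S314
    return candidates


# Hypothetical recognition results for one row and matching strings per column.
ROW = ["Total F", "at", "25g"]
MATCHING_BY_COLUMN = [
    ["Total Fat", "Sodium"],
    ["Saturated Fat", "Trans Fat"],
    ["25g", "9g"],
]
print(integration_candidates(ROW, MATCHING_BY_COLUMN))
```

For this row the first column absorbs "at" to form "Total Fat", the second column yields no candidate because its likelihood stays below T1, and the third column is kept as a single string.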

In step S316, the memory MEM is referenced to check whether any ruled line frame is included in more than one of the ruled line frame integration candidates C[j]. For example, if the candidate C[j] for j=1 covers the ruled line frames of the first and second columns and the candidate C[j] for j=2 covers the ruled line frames of the second and third columns, the ruled line frame of the second column belongs to both, so it is determined that there is an overlap in the ruled line frames to be integrated. If there is an overlap in the ruled line frames to be integrated (Yes in step S316), the process proceeds to step S317. If there is no overlap (No in step S316), the process proceeds to step S318. In step S316, it may also be determined whether there is a character recognition error in the character string to be concatenated; if it is determined that there is one, the character string to be concatenated may be replaced with the matching character string with the highest degree of match from among the item values registered in the knowledge database 4.

In step S317, among the merging candidates C[j] for overlapping ruled frames, candidates whose likelihood of belonging to the item name of the overlapping merging candidate is less than a predetermined threshold T1 are discarded from the memory MEM (step S317). As a method of rejection, for example, a predetermined number of candidates may be left in order of decreasing likelihood.

In step S318, 1 is added to the variable i (step S318), and the process proceeds to step S302. Note that it is not necessary to perform the series of processes from step S302 to step S318 on all ruled frames in the table-format document. For example, if it is clear during the process that all subsequent ruled frame contents will be blank, or if the user determines that merging of ruled frames is not necessary, the series of processes described above may be discontinued.
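The overlap check of step S316 and one possible rejection rule for step S317 can be sketched as follows. The text leaves the exact rejection rule open, so keeping candidates in order of decreasing likelihood and dropping any that share a column with an already kept candidate is an assumption of this example. The `Candidate` type and the sample values are likewise hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int         # first column of the merged span (1-based, as in the text)
    end: int           # last column of the merged span
    text: str          # concatenated character string
    likelihood: float


def overlaps(a, b):
    """True if the two candidates share at least one column (S316)."""
    return a.start <= b.end and b.start <= a.end


def resolve_overlaps(candidates):
    """Keep candidates by decreasing likelihood, discarding overlapping ones."""
    kept = []
    for cand in sorted(candidates, key=lambda c: c.likelihood, reverse=True):
        if not any(overlaps(cand, other) for other in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


# Illustrative values only: columns 1-2 and columns 2-3 both claim column 2.
CANDIDATES = [
    Candidate(1, 2, "AB", 0.9),
    Candidate(2, 3, "BC", 0.6),
    Candidate(4, 4, "D", 0.7),
]
print(resolve_overlaps(CANDIDATES))
```

Here the candidate spanning columns 2 and 3 is discarded because it shares column 2 with a candidate of higher likelihood.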

FIG. 7 is a diagram for explaining an example of the operation of the table recognition device of the first embodiment. FIG. 7(a) is an example of a tabular document to be recognized. In the tabular document shown in FIG. 7(a), "Total Fat" is written as an item value belonging to the item name "Item A", "Saturated Fat" is written as an item value belonging to the item name "Item B", and "25g" and "9g" are written as item values belonging to the item name "Item C". FIG. 7(b) is an example of a table structure recognition result for FIG. 7(a). FIG. 7(c) is an example of a character recognition result for FIG. 7(b). FIG. 7(d) is an example of a character string recognition result obtained by integrating the ruled frames of FIG. 7(c). For the sake of simplicity, it is assumed that the table positions of the item names "Item A", "Item B", and "Item C" are known, and that the character recognition of each item name has been performed correctly. In the following, the explanation of the processing of the ruled line frame integration determination unit 3 for the row of item names (first row) in FIG. 7 is omitted, and the second row, which is the first row of item values, is treated as the new first row.

In the example of Figure 7, first, the table structure recognition unit 1 recognizes ruled frame 501 to ruled frame 509.

Next, character recognition unit 2 recognizes character strings within each area from ruled frame 501 to ruled frame 509. Then, character strings 510 to 517 are obtained as the character recognition results.

Here, the item value "Total Fat" belonging to the item name "Item A" exceeds the ruled frame in the second row and first column, and protrudes into the adjacent ruled frame in the second row and second column. Therefore, the character recognition result at this point is divided into a character string 513 ("Total F") and a character string 514 ("at"). That is, the character string 513 ("Total F") is recognized as the character string in the ruled frame 504 belonging to the item name "Item A". Furthermore, the character string 514 ("at") is erroneously recognized as the character string in the ruled frame 505 belonging to the item name "Item B".
Also, the item value "Saturated Fat" belonging to the item name "Item B" is within the ruled frame 508, but the initial "S" touches the vertical rule. A character recognition error therefore occurs due to the influence of the vertical rule, and the "S" is read as "6". As a result, the character recognition result at this point is the erroneous character string 516 ("6aturated Fat"). Furthermore, because the item value "9g" belonging to the item name "Item C" is entered left-justified, it is also necessary to determine whether the character strings "6aturated Fat" and "9g" form a continuous character string.

Next, the ruled line frame integration judgment unit 3 refers to the matching strings (i.e., strings that should be written as item values) registered in the knowledge database 4, and judges which ruled line frames should be integrated based on the likelihood that the item value in each ruled line frame (i.e., the string obtained by character recognition) belongs to the item name (i.e., 1 minus the minimum standardized edit distance). In other words, it judges whether the string in each ruled line frame should be concatenated with the strings in the adjacent left and right ruled line frames. Then, based on the judgment result, it concatenates the strings in the ruled line frames and outputs either the concatenated string or the single (unconcatenated) string as the integrated string, which is both the final integration judgment result and the recognition result.

With reference to FIG. 6, a specific operation of the ruled line frame integration determination unit 3 will be described. First, consider a case where, in the first row (i=1) of the item values of the table, ruled line frame 504 (i.e., character string 513 ("Total F")), ruled line frame 505 (i.e., character string 514 ("at")), ruled line frame 506 (i.e., character string 515 ("25g")), and the matching character strings registered in the knowledge database 4 are evaluated.
For simplicity, only the case where the knowledge database 4 contains the item value "Total Fat" belonging to the item name "Item A", the item value "Trans Fat" belonging to the item name "Item B", and the item value "[*NUM*]g" belonging to the item name "Item C" will be described.

First, when variable i=1 and variable j=1, it is determined whether j is equal to or less than the number of items (step S304). Since the number of items in the second row is 3, j is equal to or less than the number of items (Yes in step S304), and it is determined whether the character recognition result is present in the ruled frame in the jth column (step S305). When variable i=1 and variable j=1, character string 513 is present in the ruled frame in the jth column (i.e., ruled frame 504) (Yes in step S305), so 0 is substituted for variable k (step S306). In step S307, since the number of items in the second row is 3, the value obtained by subtracting (j+1) from the number of items is greater than k (=0) (Yes in step S307), so the process proceeds to step S308.
Next, it is determined whether the character recognition result is present in the ruled box in the (j+k)th column (step S308). Since the character string 513 exists in the ruled box in the (j+k)th column (k=0, i.e., the ruled box 504) (Yes in step S308), a variable L1 is calculated (step S309), and a variable L2 is calculated (step S310). Here, the variable L1 is the likelihood between the character string 513 "Total F" in the ruled box 504 and the item value "Total Fat" belonging to the item name "Item A" in the knowledge database 4, and can be calculated from the standardized edit distance NED between the character string 513 "Total F" and the matching character string "Total Fat". Moreover, variable L2 is the likelihood between the concatenated character string "Total Fat" obtained by concatenating ruled box 504 and ruled box 505 and the item value "Total Fat" belonging to the item name "Item A" in knowledge database 4, and can be calculated from the standardized edit distance NED between the concatenated character string "Total Fat" and the matching character string "Total Fat". Next, variable L1 and variable L2 are compared (step S311).
Here, in the calculation of the standardized edit distance NED, converting the character string "Total F" to the item value "Total Fat" requires two character edits (appending "a" and "t"). The length of the character string "Total Fat" is 9, including the blank character. Therefore, the variable L1 is 1-(2/9)=0.778. On the other hand, converting the concatenated character string "Total Fat" to the item value "Total Fat" requires no edits (i.e., 0 characters). Therefore, the variable L2 is 1-(0/9)=1.0. As a result of comparing the variables L1 and L2, L1<L2 (Yes in step S311), and 1 is added to the variable k (step S312). Then, the process returns to step S307.
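The likelihood calculation in this step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that the standardized edit distance is the Levenshtein distance divided by the length of the matching character string, and that the likelihood is clipped to 0 when it would become negative. The function names `edit_distance` and `likelihood` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def likelihood(recognized: str, matching: str) -> float:
    """Likelihood = 1 - NED, clipped to 0 (assumed normalization, see above)."""
    if not matching:
        return 0.0
    ned = edit_distance(recognized, matching) / len(matching)
    return max(0.0, 1.0 - ned)


print(round(likelihood("Total F", "Total Fat"), 3))    # 0.778
print(round(likelihood("Total Fat", "Total Fat"), 3))  # 1.0
```

Under these assumptions the two values reproduce L1 and L2 for ruled frame 504 alone and for ruled frames 504 and 505 combined.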

In step S307, since the number of items in the second row is 3, the value obtained by subtracting (j+1) from the number of items is equal to k (=1) (Yes in step S307), and the process proceeds to step S308.
As in the above, it is determined whether the character recognition result is present in the ruled box in the (j+k)th column (step S308). Since the character string 514 "at" is present in the ruled box in the (j+k)th column (i.e., the ruled box 505) (Yes in step S308), the variable L1 is calculated (step S309), and the variable L2 is calculated (step S310). Here, the variable L1 is the likelihood between the concatenated character string "Total Fat" obtained by concatenating the ruled box 504 and the ruled box 505, and the item value "Total Fat" belonging to the item name "Item A" in the knowledge database 4, and can be calculated from the standardized edit distance NED. Furthermore, variable L2 is the likelihood between "Total Fat 25g", which is a concatenated character string obtained by concatenating ruled box 504, ruled box 505, and ruled box 506, and item value "Total Fat" belonging to item name "Item A" in knowledge database 4, and can be calculated from the standardized edit distance NED. Next, variable L1 and variable L2 are compared (step S311).
Here, in the calculation of the standardized edit distance NED, when converting the concatenated character string "Total Fat" to the item value "Total Fat", no replacement is performed (i.e., 0 characters are replaced). Therefore, the variable L1 is 1-(0/9)=1.0. On the other hand, when converting the concatenated character string "Total Fat 25g" to the item value "Total Fat", replacement of 4 characters is required. Therefore, the variable L2 is 1-4/9=0.556. As a result of comparing the variables L1 and L2, L1>L2 (No in step S311), and the process proceeds to step S313.
In step S313, the previously calculated variable L1 (1.0) is greater than the predetermined threshold T1=0.5 (Yes in step S313), so the merged pair of ruled frame 504 and ruled frame 505 becomes the ruled frame integration candidate C[j] (step S314). Then, 1 is added to variable j (step S315), and the process returns to step S304.

Next, when variable i=1 and variable j=2, it is determined whether j is equal to or less than the number of items (step S304). Since the number of items in the second row is 3, j is equal to or less than the number of items (Yes in step S304), and it is determined whether the character recognition result is present in the ruled frame in the jth column (step S305). When variable i=1 and variable j=2, character string 514 is present in the ruled frame in the jth column (i.e., ruled frame 505) (Yes in step S305), so 0 is substituted for variable k (step S306). In step S307, since the number of items in the second row is 3, the value obtained by subtracting (j+1) from the number of items is greater than k (=0) (Yes in step S307), so the process proceeds to step S308.
Next, it is determined whether the character recognition result is present in the ruled frame of the (j+k)th column (step S308). Since the character string 514 exists in the ruled frame of the (j+k)th column (k=0, i.e., the ruled frame 505) (Yes in step S308), a variable L1 is calculated (step S309), and a variable L2 is calculated (step S310). Here, the variable L1 is the likelihood between the character string 514 "at" in the ruled frame 505 and the item value "Trans Fat" belonging to the item name "Item B" in the knowledge database 4, and can be calculated from the standardized edit distance NED. Moreover, the variable L2 is the likelihood between the concatenated character string "at 25g" obtained by concatenating the character string in the ruled frame 505 and the character string in the ruled frame 506, and the item value "Trans Fat" belonging to the item name "Item B" in the knowledge database 4, and can be calculated from the standardized edit distance NED. Next, the variables L1 and L2 are compared (step S311).
Here, in the calculation of the standardized edit distance NED, when converting the character string "at" to the item value "Trans Fat", seven characters need to be replaced. The length of the character string "Trans Fat" is 9, including blank characters. Therefore, the variable L1 is 1-(7/9)=0.222. On the other hand, when converting the concatenated character string "at 25g" to the item value "Trans Fat", 11 characters need to be replaced. Therefore, the variable L2 is 1-11/9=0.0 (if 0 or less, it is limited to 0). As a result of comparing the variables L1 and L2, L1>L2 (No in step S311), and the process proceeds to step S313.
In step S313, since the previously calculated variable L1 is smaller than the predetermined threshold value T1 = 0.5 (No in step S313), it is not a candidate for merging ruled line frames C[j], and the process proceeds to step S315. Then, 1 is added to the variable j (step S315), and the process returns to the beginning of step S304.

Finally, when variable i=1 and variable j=3, it is determined whether j is equal to or less than the number of items (step S304). Since the number of items in the second row is 3, j is equal to or less than the number of items (Yes in step S304), and it is determined whether the character recognition result is present in the ruled frame in the jth column (step S305). When variable i=1 and variable j=3, character string 515 is present in the ruled frame in the jth column (i.e., ruled frame 506) (Yes in step S305), so 0 is substituted for variable k (step S306). In step S307, since the number of items in the second row is 3, the value obtained by subtracting (j+1) from the number of items is less than k (=0) (No in step S307), so the process proceeds to step S313.
In step S313, a variable L1 is calculated. Here, the variable L1 is the likelihood between the character string 515 "25g" in the ruled box 506 and the item value "[*NUM*]g" belonging to the item name "Item C" in the knowledge database 4, and can be calculated from the standardized edit distance NED.
Here, when converting the character string "25g" to the item value "[*NUM*]g" in the calculation of the standardized edit distance NED, no replacement is performed because [*NUM*] is a wildcard and can take any numeric value. Therefore, the variable L1 is 1.0, which is greater than the predetermined threshold T1=0.5 (Yes in step S313), and the unintegrated single ruled frame 506 becomes the ruled frame integration candidate C[j] (step S314).
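The wildcard comparison described above can be illustrated with a short sketch. The text states only that [*NUM*] can take any numeric value; translating it into a regular expression, and treating a numeric value as digits with an optional decimal part, are assumptions made for this example. The names `NUM_WILDCARD`, `wildcard_to_regex`, and `matches_wildcard` are hypothetical.

```python
import re

NUM_WILDCARD = "[*NUM*]"  # wildcard token as it appears in the matching string


def wildcard_to_regex(matching: str) -> re.Pattern:
    """Build a regex in which each wildcard accepts one numeric value."""
    literal_parts = [re.escape(part) for part in matching.split(NUM_WILDCARD)]
    # Assumption: a numeric value is digits with an optional decimal part.
    return re.compile(r"\d+(?:\.\d+)?".join(literal_parts))


def matches_wildcard(recognized: str, matching: str) -> bool:
    """True when the recognized string fits the wildcard pattern exactly."""
    return wildcard_to_regex(matching).fullmatch(recognized) is not None


print(matches_wildcard("25g", "[*NUM*]g"))  # True
print(matches_wildcard("at", "[*NUM*]g"))   # False
```

A string that fits the pattern would be given a likelihood of 1.0, as in the example for ruled frame 506.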

As described above, all ruled line frames belonging to the first row (variable i = 1) of the table item values are evaluated, and the pair of ruled line frames 504 and 505, and the single ruled line frame 506, are obtained as ruled line frame integration candidates C[j]. Then, in step S316, it is determined whether there is any overlap in the ruled line frames to be merged. Since there is no overlap in the ruled line frames to be merged (No in step S316), the character strings that were recognized after being split into character string 513 "Total F" and character string 514 "at" are concatenated into a single character string as the concatenated character string "Total Fat".
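The walk through steps S304 to S315 for one row can be sketched as below. This is an illustration under stated assumptions rather than the disclosed implementation: the likelihood is 1 minus the Levenshtein distance divided by the matching string length (clipped to 0), strings are concatenated without a separator, the [*NUM*] wildcard is matched as a run of digits, and a "No" at step S308 simply ends the extension. Columns are 0-based here, and all function names are hypothetical.

```python
import re

NUM_WILDCARD = "[*NUM*]"  # wildcard token as written in the text


def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def likelihood(text, matching):
    """1 - NED, clipped to 0; a full wildcard match counts as 1.0 (assumption)."""
    if NUM_WILDCARD in matching:
        pattern = r"\d+".join(re.escape(p) for p in matching.split(NUM_WILDCARD))
        if re.fullmatch(pattern, text):
            return 1.0
    return max(0.0, 1.0 - edit_distance(text, matching) / len(matching))


def best_likelihood(text, matching_strings):
    """Highest likelihood over the matching strings of one item name."""
    return max((likelihood(text, m) for m in matching_strings), default=0.0)


def merge_candidates(row, matching_by_column, t1=0.5):
    """Return {start column: (first column, last column)} for one row."""
    n = len(row)
    candidates = {}
    for j in range(n):                         # S304 / S315
        if not row[j]:                         # S305: nothing recognized
            continue
        k = 0                                  # S306
        l1 = best_likelihood(row[j], matching_by_column[j])
        while j + k + 1 < n and row[j + k]:    # S307, S308
            merged = "".join(row[j:j + k + 2])
            l2 = best_likelihood(merged, matching_by_column[j])  # S310
            if l1 < l2:                        # S311: extending helps
                k, l1 = k + 1, l2              # S312
            else:
                break
        if l1 > t1:                            # S313
            candidates[j] = (j, j + k)         # S314
    return candidates


row = ["Total F", "at", "25g"]
matching = [["Total Fat"], ["Trans Fat"], ["[*NUM*]g"]]
print(merge_candidates(row, matching))  # {0: (0, 1), 2: (2, 2)}
```

With these assumptions the sketch selects the pair of the first two frames and the single third frame, which corresponds to ruled frames 504 and 505 together and ruled frame 506 alone. Intermediate likelihood values can differ from the figures in the text because the concatenation separator and the normalization are assumed.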

Next, in step S317, the likelihood is used to determine whether there is a character recognition error in the concatenated string or the individual string. Since the likelihood of the concatenated string formed by concatenating strings 513 "Total F" and 514 "at" is 1.0 (i.e., a perfect match with the string that should be entered in the item value), the concatenated string is determined to have been recognized correctly, and string 521 ("Total Fat") is output as the integrated string, which is both the final integrated judgment result and the recognition result. Furthermore, string 515 is treated as an individual string, that is, a single string that is not concatenated. Since the likelihood is 1.0 in this case as well, string 515 is determined to have been recognized correctly, as are strings 513 and 514, and string 522 ("25g") is output as the integrated string, which is both the final integrated judgment result and the recognition result.

Next, consider the case where, in the second row (i=2) of the item values in the table, ruled frame 508 (i.e., character string 516 ("6aturated Fat")), ruled frame 509 (i.e., character string 517 ("9g")), and the matching character strings registered in knowledge database 4 are evaluated.
For simplicity, only the case of the item value "Saturated Fat" belonging to the item name "Item B" in the knowledge database 4 and the item value "[*NUM*]g" belonging to the item name "Item C" will be described.

First, when variable i=2 and variable j=1, it is determined whether j is equal to or less than the number of items (step S304). Since the number of items in the third row is 3, j is equal to or less than the number of items (Yes in step S304), and it is determined whether a character recognition result is present within the ruled frame in the jth column (step S305). When variable i=2 and variable j=1, no character string is present in the ruled frame in the jth column (i.e., ruled frame 507) (No in step S305), so 1 is added to variable j (step S315) and the process returns to the beginning of step S304.
Next, when variable i=2 and variable j=2, it is determined whether j is equal to or less than the number of items (step S304). Since the number of items in the third row is 3, j is equal to or less than the number of items (Yes in step S304), and it is determined whether the character recognition result is present in the ruled frame in the jth column (step S305). When variable i=2 and variable j=2, character string 516 is present in the ruled frame in the jth column (i.e., ruled frame 508) (Yes in step S305), so 0 is substituted for variable k (step S306). In step S307, since the number of items in the third row is 3, the value obtained by subtracting (j+1) from the number of items is equal to k (=0) (Yes in step S307), so the process proceeds to step S308.
Next, it is determined whether the character recognition result is present in the ruled box in the (j+k)th column (step S308). Since the character string 516 is present in the ruled box in the (j+k)th column (k=0, i.e., the ruled box 508) (Yes in step S308), a variable L1 is calculated (step S309), and a variable L2 is calculated (step S310). Here, the variable L1 is the likelihood between the character string 516 "6aturated Fat" in the ruled box 508 and the item value "Saturated Fat" belonging to the item name "Item B" in the knowledge database 4, and can be calculated from the standardized edit distance NED. Furthermore, variable L2 is the likelihood between "6aturated Fat9g", which is a concatenated character string obtained by concatenating the character string in ruled box 508 and the character string in ruled box 509, and the item value "Saturated Fat" belonging to the item name "Item B" in knowledge database 4, and can be calculated from the standardized edit distance NED. Next, variables L1 and L2 are compared (step S311).
Here, in the calculation of the standardized edit distance NED, when converting the character string "6aturated Fat" to the item value "Saturated Fat", one character needs to be replaced. The length of the character string "Saturated Fat" is 13, including the blank character. Therefore, the variable L1 is 1-(1/13)=0.923. On the other hand, when converting the concatenated character string "6aturated Fat9g" to the item value "Saturated Fat", three characters need to be replaced. The length of the character string "6aturated Fat9g" is 15, including the blank character. Therefore, the variable L2 is 1-3/15=0.8. As a result of comparing the variables L1 and L2, L1>L2 (No in step S311), and the process proceeds to step S313.
In step S313, the previously calculated variable L1 is greater than a predetermined threshold T1 = 0.5 (Yes in step S313), so the unmerged single ruled line frame 508 becomes the ruled line frame merger candidate C[j] (step S314).
Then, 1 is added to the variable j (step S315), and the process returns to the beginning of step S304.

Finally, when variable i=2 and variable j=3, it is determined whether j is equal to or less than the number of items (step S304). Since the number of items in the third row is 3, j is equal to or less than the number of items (Yes in step S304), and it is determined whether the character recognition result is present in the ruled frame in the jth column (step S305). When variable i=2 and variable j=3, character string 517 is present in the ruled frame in the jth column (i.e., ruled frame 509) (Yes in step S305), so 0 is substituted for variable k (step S306). In step S307, since the number of items in the third row is 3, the value obtained by subtracting (j+1) from the number of items is less than k (=0) (No in step S307), so the process proceeds to step S313.
In step S313, a variable L1 is calculated. Here, the variable L1 is the likelihood between the character string 517 "9g" in the ruled box 509 and the item value "[*NUM*]g" belonging to the item name "Item C" in the knowledge database 4, and can be calculated from the standardized edit distance NED.
Here, when converting the character string "9g" to the item value "[*NUM*]g" in the calculation of the standardized edit distance NED, [*NUM*] is a wildcard and any numerical value can be entered, so no replacement is performed. Therefore, the variable L1 is 1.0, which is greater than the predetermined threshold T1=0.5 (Yes in step S313), so the unintegrated single ruled frame 509 becomes the ruled frame integration candidate C[j] (step S314).

As described above, all ruled line frames belonging to the second row (variable i = 2) of the table item values have been evaluated, and single ruled line frames 508 and 509 have been obtained as candidates for merging ruled line frames C[j]. Then, in step S316, it is determined whether there is any overlap in the ruled line frames to be merged. Since there is no overlap in the ruled line frames to be merged (No in step S316), character string 516 in ruled line frame 508 and character string 517 in ruled line frame 509 are each treated as a single, unlinked character string.

Next, in step S317, it is determined whether or not there is a character recognition error in the concatenated character string or the single character string using the likelihood. The likelihood of the character string 516 is 0.923, that is, it is not an exact match with the character string that should be written in the item value, so the value of the likelihood is compared with a predetermined threshold T2 for error determination. The predetermined threshold T2 for error determination can be set in advance and is preferably, for example, 0.7. Since the value of the likelihood of the character string 516 (0.923) is equal to or greater than the predetermined threshold T2 (0.7) for error determination, it is presumed that there is an error in this character string. Therefore, the character string 516 is replaced with the character string "Saturated Fat", which has the highest likelihood among the matching character strings registered in the knowledge database 4, to obtain the character string 523. Then, the character string 523 is output as an integrated character string, which is both the final integrated judgment result and the recognition result.
On the other hand, since the likelihood of character string 517 is 1.0, character string 517 is determined to have been correctly recognized, and character string 524 ("9g") is output as an integrated character string which is both the final integrated judgment result and the recognition result.
Even if the likelihood of a concatenated string or a single string is 1.0 (i.e., an exact match), the concatenated string or the single string may be replaced with the string with the highest likelihood among the matching strings registered in the knowledge database 4, because the replacement results in the same string. In other words, if the likelihood is equal to or greater than the predetermined threshold T2 for error determination, including the case where the likelihood is 1.0, the string may be replaced with the matching string registered in the knowledge database 4 that has the highest likelihood, that is, the matching string that was used to calculate the likelihood.

If the likelihood of the concatenated string is less than a predetermined threshold T2 (e.g., 0.7) for determining an error, it may be that, for example, the string itself has been correctly recognized, but the degree of match with the matching string registered in the knowledge database 4 is low. In that case, the concatenated string or the single string may be output as is, without being replaced with the matching string registered in the knowledge database 4.
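The error determination with threshold T2 can be sketched as follows. This is an illustration under assumptions: the likelihoods are taken as already calculated, and a matching string containing the [*NUM*] wildcard is not substituted for the recognized string, since the text does not describe how a wildcard would be written back. The function name `finalize` and the low-scoring entries in the usage lines are hypothetical.

```python
NUM_WILDCARD = "[*NUM*]"  # wildcard token as written in the text


def finalize(recognized, scored_matches, t2=0.7):
    """Return the integrated string for one integration candidate.

    scored_matches is a list of (matching_string, likelihood) pairs.
    """
    if not scored_matches:
        return recognized
    best, score = max(scored_matches, key=lambda pair: pair[1])
    # At or above T2: replace with the closest matching string.
    # Assumption: wildcard patterns are never written back as output.
    if score >= t2 and NUM_WILDCARD not in best:
        return best
    # Below T2: output the recognized string as is.
    return recognized


print(finalize("6aturated Fat", [("Saturated Fat", 0.923), ("Trans Fat", 0.2)]))
print(finalize("9g", [("[*NUM*]g", 1.0)]))
```

The first call returns "Saturated Fat" and the second returns "9g", matching character strings 523 and 524 in the example.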

The above process is performed for all ruled lines, and strings 518 to 524 are obtained as integrated strings obtained from the final integration decision of the ruled lines.

In this embodiment 1, a specific example of the operation of the table recognition device has been described only for the ruled frames belonging to the item values of a tabular document, but this is not limiting. For example, it is possible to recognize item names in the same way as item values. In this case, for example, matching character strings for item names can be registered in the knowledge database 4, and the same processing can be performed on the ruled frames belonging to item names as on the ruled frames belonging to item values.

As described above, the ruled line frame integration judgment unit 3 does not use the ruled line frame information of adjacent ruled line frames when judging the ruled line frame integration. Therefore, it is possible to correctly integrate ruled line frames without relying on the ruled line frame information.

The ruled frame integration determination unit 3 also references the matching strings registered in the knowledge database 4, and determines that the concatenated string should be concatenated if it is not a meaningless string of characters and is likely to be meaningful (i.e., if it is judged to have a high likelihood and be close to the item name or item value). Therefore, it is possible to integrate ruled frames even if there is an error in part of the item name or item value (for example, a character recognition error, a typo, partial omission of the written content, etc.). Furthermore, it is possible to replace it with a string that is close to the matching string registered in the knowledge database 4, so the correct string can be output.

The table recognition device described above in detail in embodiment 1 calculates the likelihood as the degree of match belonging to the item of the character recognition result within each frame line, and determines which character strings should be concatenated based on the calculated likelihood.
Therefore, a character string written across a plurality of ruled frames can be accurately recognized without relying on ruled frame information.

The table recognition device described in detail in the first embodiment refers to matching strings registered in a knowledge database, and determines that a concatenated string should be concatenated if it is highly likely that the concatenated string will make sense. Furthermore, the concatenated string is replaced with a string that is close to the matching string registered in the knowledge database.
Therefore, even if there is a character recognition error in an item name or an item value, not only can character strings be concatenated accurately, but errors in the character recognition results can also be corrected at the same time, providing a synergistic effect.

Embodiment 2.
In the above-mentioned first embodiment, a knowledge database is used for the ruled frame integration judgment, but the judgment is not limited to this. For example, in the ruled frame integration judgment, information on table structure constraints, which is information that restricts compatible character strings that can be written as an integrated character string, can also be used. This configuration will be described as the second embodiment.

FIG. 8 is a functional configuration diagram showing the configuration of the table recognition device 100 in the second embodiment. The new component compared to FIG. 1 is the table structure knowledge database 5. The other components and operations are the same as those in FIG. 1, and the description will be omitted.

The table structure knowledge database 5 stores information on table structure constraints, which is information that restricts compatible strings that can be written in a ruled frame as an integrated string. For example, the information on table structure constraints is information that restricts compatible strings that can be written in a ruled frame as an integrated string from among multiple compatible strings defined in the knowledge database 4 based on string information in the surrounding ruled frames. More specifically, for example, when table items represent classifications such as major items, medium items, and minor items, the information on table structure constraints is information that indicates the relationship between these items. For example, the table structure knowledge database 5 may register compatible strings that can be written as item values belonging to item names in a similar manner to the knowledge database 4. Figure 9 is an example of the table structure knowledge database 5. As information on the table structure constraints shown in Figure 9, a compatible string (called a constraint string) belonging to the item name "Item A" is registered in the left column. Also, in the right column, when a matching string (i.e., a constrained string) for the item name "Item A" is written, a matching string (called a writable string) that can be written for the adjacent item name "Item B" is registered. Note that in the table structure knowledge database 5, the string of a matching string is not limited to a single word, and may be multiple words, phrases, or sentences.

The ruled line frame integration determination unit 3 refers to the knowledge database 4, the table structure knowledge database 5, and the ruled line frame integration candidates C[j] stored in a memory MEM (not shown), and limits the multiple matching character strings to one or more matching character strings using information on the constraints of the table structure. It then calculates the degree of match between the limited one or more matching character strings and the character string recognized by the character recognition unit 2, and determines which ruled line frames should be integrated depending on the degree of match.
In this embodiment 2, for example, the ruled line frame integration judgment unit 3 refers to the table structure knowledge database 5, and when the ruled line frame integration candidate C[j] corresponds to an item value (constrained string) belonging to a specified item name, the unit 3 restricts the item value candidates (i.e., integrated strings) belonging to other item names adjacent to the specified item name in the knowledge database 4 to one or more matching strings by restricting them to describable strings.

FIG. 10 is a flow chart showing the operation sequence of the ruled line integration determination unit 3 in the second embodiment. In FIG. 10, the steps that differ from FIG. 6 are step S309A and step S310A. Steps that are given the same numbers as in FIG. 6 perform the same processing as shown in the first embodiment, and therefore their explanations are omitted.

In step S309A, the ruled boxes from the jth column to the (j+k)th column are merged, and the character strings in the ruled boxes are concatenated to obtain a concatenated character string. Then, by referring to the table structure knowledge database 5 and the ruled box merging candidate C[j-1] stored in the memory MEM, it is determined whether the ruled box merging candidate C[j-1], which is a concatenated character string adjacent to the obtained concatenated character string, is a constrained character string for the obtained concatenated character string (step S309A).
If the ruled frame merging candidate C[j-1] corresponds to the constrained string, the matching strings belonging to the jth item name (i.e., the jth column) in the knowledge database 4 are restricted to the describable strings described in the table structure knowledge database 5. Then, the likelihood L[j+k,j] that the concatenated string belongs to the item name in the jth column is calculated, and the likelihood L[j+k,j] is substituted for the variable L1. If the ruled frame merging candidate C[j-1] does not correspond to the constrained string, no restriction is performed using the table structure knowledge database 5, and the likelihood L[j+k,j] that the concatenated string belongs to the item name in the jth column is calculated and substituted for the variable L1 (step S309A).

In step S310A, the ruled boxes from the jth column to the (j+k+1)th column are merged, and the character strings in the ruled boxes are concatenated to obtain a concatenated character string. Then, similar to the processing in step S309A, the table structure knowledge database 5 and the ruled box merging candidate C[j-1] stored in the memory MEM are referenced to determine whether the ruled box merging candidate C[j-1], which is a concatenated character string adjacent to the obtained concatenated character string, is a constrained character string for the obtained concatenated character string (step S310A).
If the ruled frame merging candidate C[j-1] corresponds to the constrained string, the matching strings belonging to the jth item name (i.e., the jth column) in the knowledge database 4 are restricted to the describable strings described in the table structure knowledge database 5. Then, the likelihood L[j+k+1,j] that the concatenated string belongs to the jth item name is calculated, and the likelihood L[j+k+1,j] is substituted for the variable L2. If the ruled frame merging candidate C[j-1] does not correspond to the constrained string, no restriction is performed using the table structure knowledge database 5, and the likelihood L[j+k+1,j] that the concatenated string belongs to the jth item name is calculated and substituted for the variable L2 (step S310A).

A specific example of the operation of the table recognition device of this embodiment 2 will be described using the table structure knowledge database 5 shown in FIG. 9 for the table shown in FIG. 7 above.

The ruled line frame integration determination unit 3 refers to the table structure knowledge database 5 and imposes constraints on the knowledge database 4 used when calculating the likelihood that the character recognition result within each ruled line frame belongs to an item.
Specifically, when character string 521 ("Total Fat") is obtained as an item value belonging to the item name "Item A" in FIG. 7(d), by referring to table structure knowledge database 5 in FIG. 9, the candidates for describable character strings for the item value belonging to the adjacent item name "Item B" are restricted to "Saturated Fat" or "Trans Fat."

By restricting the integrated character strings that are candidates for item values belonging to adjacent item names to the matching character strings that can be written there, the candidates for ruled frame integration are limited to more plausible strings, which improves the accuracy of the ruled frame integration determination. Furthermore, since the number of candidates is reduced, the amount of processing required for likelihood calculation is also reduced.
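The restricted likelihood calculation of steps S309A and S310A can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration only: the function names, the 0-based column index, the dictionary layout standing in for the knowledge database 4 and the table structure knowledge database 5, the sample cell contents, and the use of 1 - (edit distance / length of the matching string) as the likelihood.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # replace ca with cb
        prev = curr
    return prev[-1]


def likelihood(text: str, candidates) -> float:
    """Best score over the candidate matching strings (0.0 if none)."""
    best = 0.0
    for cand in candidates:
        if cand:
            score = 1.0 - edit_distance(text, cand) / len(cand)
            best = max(best, score)
    return best


def column_likelihood(cells, j, end, prev_candidate, knowledge_db, structure_db):
    """Likelihood that cells[j..end], concatenated, belong to column j.

    prev_candidate plays the role of C[j-1]. If it has an entry in
    structure_db, the matching strings of column j are restricted to the
    describable strings listed there; otherwise no restriction is applied.
    """
    concatenated = "".join(cells[j:end + 1])
    candidates = knowledge_db[j]
    allowed = structure_db.get(prev_candidate)
    if allowed is not None:
        candidates = [c for c in candidates if c in allowed]
    return likelihood(concatenated, candidates)


# Hypothetical data: matching strings per column and a structure constraint.
knowledge_db = {0: ["Total Fat", "Sodium"],
                1: ["Saturated Fat", "Trans Fat", "Dietary Fiber"]}
structure_db = {"Total Fat": {"Saturated Fat", "Trans Fat"}}

cells = ["Total Fat", "Saturat", "ed Fat"]  # hypothetical recognition result
j, k = 1, 0
L1 = column_likelihood(cells, j, j + k, "Total Fat", knowledge_db, structure_db)
L2 = column_likelihood(cells, j, j + k + 1, "Total Fat", knowledge_db, structure_db)
```

In this hypothetical input, L2 (frames merged up to column j+k+1) is higher than L1, which would favor merging the additional ruled frame. How L1 and L2 are compared afterwards is outside the scope of this sketch.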

The table recognition device described in detail above in embodiment 2 uses table structure constraint information, which is information that restricts the matching character strings that can be written as an integrated character string, to limit the matching character strings used in the ruled frame integration determination to one or more matching character strings, thereby improving the accuracy of the ruled frame integration determination.

Embodiment 3.
In the ruled line frame integration determination, the possibility that characters adjacent to ruled lines have been erroneously recognized can also be taken into consideration. This configuration will be described as a third embodiment.

When calculating the likelihood that a character string in a recognition result belongs to a certain item, the ruled line frame integration determination unit 3 weights each character so as to reduce the influence of erroneous recognition of characters close to ruled lines.
For example, when the standardized edit distance is used as the likelihood, the cost of character conversion operations (insertion, deletion, and replacement) can be weighted lightly for characters close to a ruled line. The weight applied to the cost value is preferably, for example, 0.5 as opposed to the usual 1, but is not limited to this. For example, the weight can be changed as appropriate depending on the type of ruled line.
By weighting characters close to ruled lines so as to reduce the cost of character conversion, the effect of erroneous recognition of such characters can be suppressed; in other words, erroneous recognition of characters close to ruled lines can be tolerated.

Regarding whether a character is close to a ruled line, for example, if the character touches the ruled line, the character is determined to be close to the ruled line. Even if the character is not touching the ruled line, for example, the distance from the ruled line to the character may be used to determine whether the character is close to the ruled line. In this case, if the distance from the ruled line to the character is closer than a predetermined threshold, the character is determined to be close to the ruled line. The threshold value for the distance from the ruled line to the character can be set in advance to a value corresponding to, for example, the thickness of the ruled line or the size of the character. Specifically, the threshold value for the distance from the ruled line to the character can be set to a distance three times the thickness of the ruled line. Furthermore, the number of characters determined to be close in a character string within one ruled line frame is not limited to one character. For example, in a three-character character string "ABC," if the distance from the ruled line to the characters "B" and "C" is closer than a predetermined threshold, the characters "B" and "C" are determined to be close to the ruled line. In other words, both the characters "B" and "C" can be subject to weighting of the cost value.
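The closeness determination described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `CharBox` type, the function name, the one-dimensional geometry (a vertical ruled line centered at `line_x`), and the sample coordinates are all hypothetical, and the factor of 3 is taken from the example threshold given above.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """Horizontal extent of one recognized character (hypothetical type)."""
    char: str
    left: float
    right: float


def near_ruled_line_flags(chars, line_x, line_thickness, factor=3.0):
    """Return one flag per character: True if it is close to the vertical line.

    A character is treated as close if it touches the line (gap of 0) or if
    its gap to the line is smaller than factor * line_thickness.
    """
    threshold = factor * line_thickness
    half = line_thickness / 2
    flags = []
    for c in chars:
        # Gap between the character box and the line; 0 when they touch or overlap.
        gap = max(c.left - (line_x + half), (line_x - half) - c.right, 0.0)
        flags.append(gap == 0.0 or gap < threshold)
    return flags


# "ABC" written to the left of a vertical line of thickness 4 centered at x = 50.
chars = [CharBox("A", 22, 29), CharBox("B", 31, 38), CharBox("C", 40, 47)]
flags = near_ruled_line_flags(chars, line_x=50, line_thickness=4)
```

With these hypothetical coordinates the threshold is 12, so "B" (gap 10) and "C" (gap 1) are flagged as close to the ruled line while "A" (gap 19) is not, mirroring the "ABC" example in the text.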

FIG. 11 is a specific example of the operation of the table recognition device of this embodiment 3. FIG. 11(a) is an example of a table to be recognized. The table shown in FIG. 11(a) has "Total Fat" as an item value belonging to the item name "Item A" and "25g" as an item value belonging to the item name "Item C". FIG. 11(b) is an example of the table structure recognition result for FIG. 11(a). FIG. 11(c) is an example of the character recognition result for FIG. 11(b). FIG. 11(d) is an example of the character string recognition result obtained by integrating the ruled frame of FIG. 11(c).

In the example of FIG. 11, first, the table structure recognition unit 1 recognizes ruled frame 601 to ruled frame 606.

Then, the character recognition unit 2 recognizes character strings 607 to 612 within each ruled frame. Here, as shown in FIG. 11(c), in character string 610, the character "F" adjacent to the vertical double ruled line is mistakenly recognized as the character "P." Also, in character string 611, the character "a" adjacent to the vertical double ruled line is mistakenly recognized as the character "p."

Next, a specific operation of the ruled line frame integration determination unit 3 will be described.
Consider the case where a concatenated string (i.e., "Total Ppt") obtained by concatenating ruled box 604 (i.e., character string 610 ("Total P")) with ruled box 605 (i.e., character string 611 ("pt")) is evaluated against matching strings registered in knowledge database 4. For simplicity of explanation, only the case of item value "Total Fat" belonging to item name "Item A" in knowledge database 4 will be described.

First, we will explain the case where the cost values of characters close to the ruled line are not weighted. Converting the concatenated string "Total Ppt" into "Total Fat" requires two replacements ("P" with "F" and "p" with "a"), so the conversion cost is 2. The length of the string "Total Fat" is 9, including the space. Therefore, the likelihood is 1 - (2/9) = 0.778.

Next, we will explain the case where the weight of the cost value for character conversion of characters close to a ruled line is set to 0.5 in the likelihood calculation. In this case, the characters "P" and "p" are subject to the weighting of the cost value. Converting the concatenated character string "Total Ppt" into "Total Fat" requires the same two replacements, each at a cost of 0.5, so the likelihood is 1 - ((0.5 x 2)/9) = 0.889.

As described above, the likelihood is 0.778 when the cost value is not weighted, whereas the likelihood is 0.889 when the cost value is weighted. In other words, the likelihood is higher than when the cost value is not weighted, and the possibility that a character string of another item name will be mistakenly adopted (as the correct character string) can be reduced. This can further improve the accuracy of the ruled frame integration judgment.
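The two likelihood values above can be reproduced with a short Python sketch of a weighted edit distance. The function names and the `near_line` flag list are hypothetical. The sketch also assumes that the reduced weight applies to deleting or replacing a flagged recognized character, that inserting a missing character always costs 1, and that the cost is divided by the length of the matching string; the text does not specify these details.

```python
def weighted_edit_cost(recognized, target, near_line, light=0.5):
    """Edit cost from recognized to target with lighter costs near ruled lines.

    near_line[i] is True when recognized[i] is close to a ruled line.
    """
    def weight(i):
        return light if near_line[i] else 1.0

    n, m = len(recognized), len(target)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + weight(i - 1)  # delete a recognized character
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + 1.0            # insert a target character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = recognized[i - 1] == target[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + weight(i - 1),                          # deletion
                dp[i][j - 1] + 1.0,                                    # insertion
                dp[i - 1][j - 1] + (0.0 if same else weight(i - 1)),   # replacement
            )
    return dp[n][m]


def weighted_likelihood(recognized, target, near_line, light=0.5):
    """1 - (weighted cost / length of the matching string)."""
    return 1.0 - weighted_edit_cost(recognized, target, near_line, light) / len(target)


recognized = "Total Ppt"
target = "Total Fat"
no_flags = [False] * len(recognized)
# "P" (index 6) and "p" (index 7) are adjacent to the vertical double ruled line.
flags = [i in (6, 7) for i in range(len(recognized))]

unweighted = weighted_likelihood(recognized, target, no_flags)
weighted = weighted_likelihood(recognized, target, flags)
print(round(unweighted, 3), round(weighted, 3))  # 0.778 0.889
```

Passing a different `light` value, for example one chosen per ruled line type, would correspond to the remark that the weight is not limited to 0.5.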

In the above specific example, the cost value is weighted lightly for characters close to vertical lines, but this is not limited to the above. For example, the same processing can be performed on characters close to horizontal lines, and the same effect as described above can be achieved.

As described above, in the table recognition device described in detail in the third embodiment, in the calculation of the likelihood in the ruled line frame integration determination unit, the cost value for character conversion of characters close to a ruled line is weighted lightly.
This reduces the influence of characters that are more likely to be erroneously recognized compared to other characters, thereby further improving the accuracy of the ruled line frame integration determination.

In each of the above-mentioned embodiments, likelihood has been shown as an example of the degree of similarity between two character strings, but this is not limiting. For example, character strings may be represented as vectors, and the cosine similarity between two character string vectors may be used as the degree of similarity. When the cosine similarity is close to 1, the two character string vectors are similar and the degree of similarity is high; on the other hand, when the cosine similarity is close to 0, the two character string vectors are not similar and the degree of similarity is low.
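As one possible illustration of the cosine similarity alternative, the following Python sketch represents each character string as a vector of character bigram counts. The bigram representation and the function names are assumptions; the text does not specify how character strings are converted into vectors.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count the character n-grams of a string (bigrams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


same = cosine_similarity("Total Fat", "Total Fat")      # close to 1
partial = cosine_similarity("Total Ppt", "Total Fat")   # between 0 and 1
unrelated = cosine_similarity("Total Fat", "Sodium")    # 0: no shared bigrams
```

With count vectors of this kind the similarity always lies between 0 and 1, which matches the interpretation given above.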

In each of the above-described embodiments, the process of determining whether or not to merge ruled lines is not limited to languages that are written horizontally or left-to-right. For example, the table recognition device according to the above-described embodiments can also be applied to tables in which the rows and columns are swapped, such as vertically written documents. For example, the table recognition device according to the above-described embodiments can also be applied to languages in which writing starts from the right, such as Arabic.

In addition to the above, any other configuration may be used as long as it provides similar functions and effects. Furthermore, within the scope of the present disclosure, any component of the embodiment may be modified or omitted.

1 Table structure recognition unit, 2 Character recognition unit, 3 Ruled line frame integration determination unit, 4 Knowledge database, 5 Table structure knowledge database,
100 Table recognition device, 101 Processor, 102 Memory, 103 External storage device, 104 Input/output interface.

Claims

1. A table recognition device that recognizes character strings written in a table-format document from image information of the table-format document, the table recognition device comprising:
a character recognition unit that recognizes character strings written within each of a plurality of ruled line frames provided in the table-format document; and
a ruled line frame integration determination unit that determines, from among a single character string, which is a character string recognized for a target ruled line frame among the plurality of ruled line frames, and a concatenated character string, which is a concatenation of the single character string and a character string recognized for a ruled line frame other than the target ruled line frame, the single character string or the concatenated character string that has a higher degree of match with a matching character string to be written in the table-format document as an integrated character string that belongs to the target ruled line frame.
2. The table recognition device according to claim 1, wherein the ruled line frame integration determination unit replaces the integrated character string with the matching character string used to calculate the degree of match when the degree of match of the single character string or the concatenated character string determined as the integrated character string is equal to or greater than a predetermined threshold.
3. The table recognition device according to claim 1 or 2, wherein the ruled line frame integration determination unit calculates the degree of match for one or more matching character strings limited by information that restricts the matching character strings that can be written within the ruled line frame from among a plurality of matching character strings defined for each ruled line frame, and determines the integrated character string.
4. The table recognition device according to claim 1, wherein the ruled line frame integration determination unit calculates the degree of match by weighting the cost of character conversion of characters close to the ruled lines of the plurality of ruled line frames so as to reduce the cost.
5. A table recognition method for recognizing character strings written in a table-format document from image information of the table-format document, the method comprising the steps of:
recognizing character strings written within each of a plurality of ruled line frames provided in the table-format document; and
determining, from among a single character string, which is a character string recognized for a target ruled line frame among the plurality of ruled line frames, and a concatenated character string, which is a concatenation of the single character string and a character string recognized for a ruled line frame other than the target ruled line frame, the single character string or the concatenated character string that has a higher degree of match with a matching character string to be written in the table-format document as an integrated character string that belongs to the target ruled line frame.