WO2023139650A1 - Drawing reading system, drawing reading method, and drawing reading program - Google Patents

Drawing reading system, drawing reading method, and drawing reading program Download PDF

Info

Publication number
WO2023139650A1
Authority
WO
WIPO (PCT)
Prior art keywords
occurrence
character string
unit
recognized
replacement
Prior art date
Application number
PCT/JP2022/001620
Other languages
French (fr)
Japanese (ja)
Inventor
Tatsuhiko Saito (斉藤 辰彦)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2023553177A priority Critical patent/JP7383209B1/en
Priority to PCT/JP2022/001620 priority patent/WO2023139650A1/en
Publication of WO2023139650A1 publication Critical patent/WO2023139650A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/12 Detection or correction of errors, e.g. by rescanning the pattern

Definitions

  • the present disclosure relates to a drawing reading system, a drawing reading method, and a drawing reading program.
  • This device comprises generating means for generating words not managed by the knowledge dictionary by combining constituent characters of words that are managed by the knowledge dictionary; measuring means for measuring the distance between the recognized character string and the words managed by the knowledge dictionary and the distance between the recognized character string and the words generated by the generating means; and correcting means for correcting errors in the recognized character string based on the measured distances.
  • When character recognition is performed on the characters in each cell of a table in a drawing (hereinafter also referred to as a "table portion") with the above conventional device, there is a problem that the accuracy rate of character recognition is low because the character string in each cell is short.
  • the present disclosure has been made to solve the conventional problems described above, and aims to provide a drawing reading system, a drawing reading method, and a drawing reading program that make it possible to increase the accuracy rate of character recognition of character strings in cells of table portions in drawings.
  • the drawing reading system of the present disclosure comprises: a layout analysis unit that analyzes data of a drawing to be read and divides the drawing into a figure portion, a table portion, and a character string composed of one or more characters; a character recognition unit that performs character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition; a replacement candidate extraction unit that extracts one or more correct candidate character strings, each paired with an error character string matching the recognized character string, as one or more replacement candidate character strings; a co-occurrence filter unit that calculates a first occurrence score corresponding to the occurrence probability of the recognized character string and one or more second occurrence scores corresponding to the occurrence probabilities of the respective replacement candidate character strings, and executes co-occurrence filtering, that is, a process of outputting as the recognized character string the replacement candidate character string whose second occurrence score satisfies a first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and of outputting the recognized character string as it is when the first replacement condition is not satisfied; and a display control unit that outputs image data based on the recognized character string output from the co-occurrence filter unit.
  • the drawing reading method of the present disclosure is a method executed by an information processing apparatus, and includes the steps of: analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and a character string composed of one or more characters; executing character recognition on the character string to generate a recognized character string that is text data obtained as a result of the character recognition; extracting one or more correct candidate character strings paired with a matching error character string as one or more replacement candidate character strings; and executing co-occurrence filtering based on a co-occurrence dictionary composed of at least one of first co-occurrence information, which indicates the frequency of co-occurrence of a target character string in a target cell of the table portion with other character strings in other cells of the table portion, and second co-occurrence information, which indicates the frequency of co-occurrence of the target character string in the target cell with a character string in the figure portion.
  • FIG. 1 is a configuration diagram showing an example of a hardware configuration of a drawing reading system according to Embodiment 1;
  • FIG. 2 shows an example of a drawing in which a figure portion, a table portion, and a character string are described.
  • FIG. 3 shows a character string in a cell in the table portion of the drawing of FIG. 2 and an example of the character recognition result of this character string (when misrecognition is included).
  • FIG. 4 shows a character string in a cell in the table portion of the drawing of FIG. 2 and an example of the result of character recognition of this character string (when no misrecognition is included).
  • FIG. 5 is a functional block diagram showing the configurations of a drawing reading system and an information processing apparatus according to Embodiment 1;
  • FIG. 6 shows an example of the results of character recognition of characters included in the figure portion in the drawing of FIG. 2.
  • FIG. 7 shows an example of a character recognition result of a character string in a cell of a table portion in the drawing of FIG. 2.
  • An example of an error pattern dictionary of the drawing reading system according to Embodiment 1 is shown in tabular form.
  • An example of a co-occurrence dictionary of the drawing reading system according to Embodiment 1 is shown in tabular form.
  • FIG. 10 shows an example of replacement candidate character strings generated by the replacement candidate extraction unit of the drawing reading system according to the first embodiment in tabular form.
  • FIG. 11 shows, in tabular form, an example of replacement candidate character strings generated by the language model filter unit of the drawing reading system according to the first embodiment;
  • FIG. 12 shows an example of replacement candidate character strings generated by the co-occurrence filter unit of the drawing reading system according to the first embodiment in a tabular format;
  • FIG. 13 shows an example of text data, which is the result of character recognition output from the display control unit of the drawing reading system according to the first embodiment, in tabular form.
  • FIG. 14 is a flow chart showing the operation of the information processing device of the drawing reading system according to Embodiment 1;
  • FIG. 15 is a functional block diagram showing the configurations of a drawing reading system and an information processing apparatus according to Embodiment 2;
  • An example of input operation by the user operation unit of the drawing reading system according to the second embodiment is shown in a table format
  • 9 is a flow chart showing the operation of the drawing reading system according to Embodiment 2;
  • FIG. 11 is a functional block diagram showing the configurations of a drawing reading system and an information processing device according to Embodiment 3;
  • An example of a template for the drawing reading system according to the third embodiment is shown in tabular form.
  • Another example of the template of the drawing reading system according to the third embodiment is shown in tabular form.
  • An example of a parts list of the drawing reading system according to Embodiment 3 is shown in tabular form.
  • FIG. 12 shows an example of the number of parts output from the display control unit of the drawing reading system according to the third embodiment in a tabular format;
  • FIG. 10 is a flow chart showing the operation of the drawing reading system according to Embodiment 3;
  • a drawing reading system, a drawing reading method, and a drawing reading program according to the embodiment will be described below.
  • the following embodiments are merely examples, and the embodiments can be combined as appropriate and each embodiment can be modified as appropriate.
  • the drawing reading system can perform character recognition of character strings in cells of table portions in drawings to be read.
  • a drawing may be a drawing drawn on paper or a drawing converted to electronic data (for example, Portable Document Format (PDF) data).
  • the drawings include, for example, mechanical drawings, architectural drawings, civil engineering drawings, piping drawings, electrical drawings, maps, and layout drawings.
  • a drawing contains a figure portion, a table portion, and text.
  • the figure portion and the table portion may be drawn on different pages.
  • Figures are drawn in the figure portion.
  • the figures include, for example, devices (e.g., distribution boards and panels such as switchboards), buildings, and electrical devices (e.g., meters, operation buttons, and indicator lamps).
  • the figure part may contain a character string.
  • the table portion has a plurality of ruled lines that are boundaries of a plurality of cells.
  • a character string is written in the cell of the table portion.
  • a string consists of one or more characters.
  • a character is an element that can be represented by text data. Characters include symbols.
  • the drawing reading system acquires data of a drawing in which a character string is drawn, executes character recognition on the character string drawn in this drawing, and outputs a recognized character string, which is text data corresponding to the character string, as a result of character recognition.
  • the drawing reading system according to the embodiment can improve the accuracy rate of character recognition of character strings in cells of table portions in drawings.
  • the drawing reading method according to the embodiment can be implemented by an information processing device of a drawing reading system. Further, the drawing reading program according to the embodiment is executed by, for example, a computer as an information processing device.
  • FIG. 1 shows an example of the hardware configuration of a drawing reading system 10 according to the first embodiment.
  • the drawing reading system 10 has an information processing device 11 .
  • the information processing device 11 is, for example, a computer.
  • the information processing device 11 is a device capable of implementing the drawing reading method according to the first embodiment.
  • the information processing device 11 acquires data I0 representing a drawing 60 in which a figure portion 61 and a table portion 62 are drawn, character-recognizes a character string in a cell of the table portion 62, and outputs the result of character recognition as output data.
  • the output data includes, for example, text data indicating the characters corresponding to the character strings in the cells of table portion 62 .
  • the drawing reading system 10 has an image scanner 41 , a display 42 and a user operation section 43 .
  • the image scanner 41 is an image reading device that optically reads the drawing 60 and provides the data I0 of the drawing 60 to the information processing device 11 .
  • the display 42 displays an image based on image data output from the information processing device 11 .
  • the user operation unit 43 receives user operations and provides information input by the user to the information processing apparatus 11 . Note that the image scanner 41 , the display 42 and the user operation unit 43 need not be part of the drawing reading system 10 .
  • the image scanner 41, display 42, and user operation unit 43 may be an image scanner, display, and user operation unit provided in an external device (for example, a server on a network) that can communicate with the information processing device 11.
  • the information processing device 11 has a processor 201, a memory 202 that is a volatile storage device, a nonvolatile storage device 203 such as a hard disk drive (HDD) or solid state drive (SSD), a communication device 204 that communicates with external devices, and an interface 205.
  • the memory 202 is, for example, a semiconductor memory such as a RAM (Random Access Memory).
  • the processing circuit may be dedicated hardware or may be the processor 201 that executes a program (for example, a drawing reading program) stored in the memory 202 .
  • the processor 201 may be any of a processing device, an arithmetic device, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).
  • the processing circuit is dedicated hardware, the processing circuit is, for example, ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
  • the drawing reading method is executed by software, firmware, or a combination of software and firmware.
  • Software and firmware are written as programs and stored in memory 202 .
  • the processor 201 can implement the drawing reading method according to the first embodiment by reading and executing the program stored in the memory 202 .
  • the information processing device 11 may be partially implemented by dedicated hardware and partially implemented by software or firmware.
  • the processing circuitry may implement each of the functions described above in hardware, software, firmware, or any combination thereof.
  • the interface 205 is used to communicate with other devices.
  • Image scanner 41 , display 42 , and user operation unit 43 are connected to interface 205 .
  • FIG. 2 shows an example of a drawing 60 in which a figure portion 61, a table portion 62, and a character string are described.
  • the figure portion 61 depicts "Panel No. 101" together with its parts.
  • “Panel No. 101” has gauges (e.g., meters) indicated by symbols A and XI (numbers 1 and 2), display lamps indicated by symbols LS and DS (numbers 3 and 4), and operation buttons indicated by symbols SS (numbers 5 and 6).
  • Drawing 60 may not include drawing portion 61 .
  • Character strings are written in the cells surrounded by the vertical and horizontal ruled lines of the table portion 62 .
  • the table portion 62 describes the part numbers, symbols, and contents.
  • the table portion 62 shows that the part with the number 1 and the symbol A is an instrument that indicates the degree of opening of the air volume control valve No.0. Also, it is shown that the part with symbol XI in number 2 is an instrument for indicating the circuit air volume of the reaction tank ⁇ .
  • the part with the number 3 and the symbol LS is shown to be an indicator lamp that indicates the "open-closed" state of the air agitation valve No. 0 of the anaerobic tank A.
  • the part with the symbol DS in number 4 is an indicator lamp that indicates whether the state of the air agitation valve No. 0 of the anaerobic tank A is "alarm - warning - normal". It is shown that the part with the symbol SS at number 5 is an emergency stop button.
  • the part with the symbol SS at number 6 is shown to be a test button.
  • the drawing 60 is an example, and the drawing to be read may have other description contents.
  • FIG. 3 shows, in tabular form, an example of a character string in a cell of the table portion 62 of the drawing 60 of FIG. 2 and the result of character recognition of this character string (when misrecognition is included).
  • FIG. 3 shows an example in which the character "correct (SEI)" is misrecognized as "king (OU)".
  • FIG. 4 shows, in tabular form, an example of a character string in a cell of the table portion 62 of the drawing 60 of FIG. 2 and the result of character recognition of this character string (when misrecognition is not included).
  • FIG. 5 is a functional block diagram showing the configuration of the drawing reading system 10 and the information processing device 11.
  • the drawing reading system 10 includes a layout analysis unit 100, a character recognition unit 101, a table format conversion unit 102, a replacement candidate extraction unit 103, a language model filter unit 104, a co-occurrence filter unit 105, a reliability calculation unit 108, a display control unit 109, an error pattern dictionary 131, a language model 132, and a co-occurrence dictionary 133.
  • the replacement candidate extraction unit 103, the language model filter unit 104, and the co-occurrence filter unit 105 constitute a filter processing unit 110 that performs filtering for correcting the recognized character string.
  • the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are stored, for example, in the storage device 203 of FIG.
  • the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 may be stored in a storage device of an external device (for example, a server on a network) that can communicate with the information processing device 11.
  • the layout analysis unit 100 analyzes the data I0 of the drawing 60 and divides the drawing 60 into a drawing portion 61, a table portion 62, and a character string (that is, a text portion) composed of one or more characters.
  • the character recognition unit 101 performs character recognition on the character string acquired from the layout analysis unit 100, and generates a recognized character string, which is text data obtained as a result of character recognition. Specifically, the character recognition unit 101 outputs a set of characters obtained by character recognition and position information thereof (that is, information indicating the position in the drawing 60) as a result of character recognition.
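The set of a recognized character string and its position information output by the character recognition unit can be sketched as a simple data structure. The class and field names below are illustrative assumptions, not part of the disclosure:

```python
from dataclasses import dataclass

# Hypothetical container for one character-recognition result: the
# recognized text plus its position (bounding box) in the drawing.
@dataclass
class RecognizedString:
    text: str    # text data obtained by character recognition
    bbox: tuple  # (x, y, width, height) in drawing coordinates

# The character recognition unit would output a list of such pairs.
results = [
    RecognizedString("Panel No. 101", (120, 40, 200, 24)),
    RecognizedString("A", (150, 210, 18, 18)),
]
```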
  • FIG. 6 shows, in tabular form, an example of the results of character recognition of characters included in the drawing portion 61 of the drawing 60 of FIG.
  • the table format conversion unit 102 reconstructs the table structure based on the character recognition results of the character strings recognized by the character recognition unit 101, and holds the character strings in a table format.
  • FIG. 7 shows, in tabular form, an example of character recognition results of character strings in cells of the table portion 62 of the drawing 60 of FIG.
  • FIG. 7 shows an example in which "air volume control valve No. 0" is erroneously recognized as “air volume control valve Q”, "reaction tank ⁇ ” is erroneously recognized as “reaction tank A”, and “positive (SEI)” is erroneously recognized as "king (OU)".
  • the filter processing unit 110 finds errors in character recognition and corrects the character recognition results by filtering the tabular character recognition results.
  • the filter processing unit 110 uses the error pattern dictionary 131 to perform a process of setting those that match the error pattern as replacement candidates, uses the language model 132 to perform a process of calculating the language score, and then performs a process of calculating the co-occurrence score using the co-occurrence dictionary 133.
  • the filtering unit 110 determines whether or not to modify the recognized character string based on either or both of the language score and the co-occurrence score.
  • the filter processing unit 110 uses the error pattern dictionary 131 to perform a process of setting those that match the error pattern as replacement candidates, uses the co-occurrence dictionary 133 to perform a process of calculating a co-occurrence score, and then uses the language model 132 to perform a process of calculating a language score.
  • Alternatively, the filter processing unit 110 may use the error pattern dictionary 131 to set character strings matching an error pattern as replacement candidates and use the co-occurrence dictionary 133 to calculate the co-occurrence score, without performing the language score calculation with the language model 132.
  • the information processing device 11 may not include the language model 132 and the language model filter unit 104 .
  • the filtering unit 110 determines whether or not to modify the recognized character string based on the co-occurrence score.
  • FIG. 8 shows an example of the error pattern dictionary 131 in tabular form.
  • the error pattern dictionary 131 stores information on pairs of character recognition errors and correct character strings.
  • the error pattern dictionary 131 is configured by previously collecting error patterns, which are pairs of error character strings and correct candidate character strings.
  • the replacement candidate extraction unit 103 extracts one or more correct candidate character strings paired with an error character string matching the recognized character string from the error pattern dictionary 131 as one or more replacement candidate character strings.
  • the replacement candidate extraction unit 103 uses the error pattern dictionary 131 to extract a character string that is determined to be a character recognition error, and in preparation for replacing it with a correct character string, for example, creates a correct candidate character string (also referred to as a "replacement candidate character string”) that is a candidate for the correct character string.
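As a minimal sketch of this extraction step (the function name and the example error pairs are illustrative assumptions, not entries from the actual error pattern dictionary), a recognized string can be matched against collected error patterns to produce replacement candidates:

```python
# Hypothetical error pattern dictionary: error string -> correct candidates.
ERROR_PATTERNS = {
    "Q": ["No. 0"],      # "No. 0" often misread as "Q" (example from the text)
    "king": ["correct"], # "correct (SEI)" misread as "king (OU)"
}

def extract_replacement_candidates(recognized, error_patterns):
    """Return a replacement candidate string for every correct candidate
    whose paired error string occurs inside the recognized string."""
    candidates = []
    for error, corrections in error_patterns.items():
        if error in recognized:
            for c in corrections:
                candidates.append(recognized.replace(error, c))
    return candidates
```

For example, `extract_replacement_candidates("air volume control valve Q", ERROR_PATTERNS)` yields `["air volume control valve No. 0"]`.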
  • the language model filter unit 104 calculates a first language score corresponding to the occurrence probability of the recognized character string (that is, the occurrence probability before replacement) and one or more second language scores that correspond to the occurrence probability of each of one or more replacement candidate character strings (that is, occurrence probability after replacement) based on the language model 132 that stores the word chains and the occurrence probabilities of the chains.
  • the language model 132 is a model in which word sequences are statistically described, learned from a large corpus.
  • the language model 132 is stored in the storage device 203 in advance.
  • when any one of the one or more second language scores satisfies the replacement condition C2 (the second replacement condition), namely that it is higher than the first language score and higher than a predetermined language score threshold TH2, the language model filter unit 104 outputs the replacement candidate character string having that second language score as the recognized character string; when the replacement condition C2 is not satisfied, it outputs the recognized character string as it is.
  • the above processing performed by the language model filter unit 104 is also called language model filtering.
  • Language model filtering is thus a process that raises the language score of the output recognized character string.
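The replacement condition C2 can be sketched as a small decision function. Here `lm_score` stands in for the occurrence probability given by the language model 132, and the default threshold value is an illustrative assumption:

```python
def language_model_filter(recognized, candidates, lm_score, th2=0.5):
    """Replacement condition C2: adopt the candidate whose language score
    is higher than the recognized string's score and higher than the
    threshold TH2; otherwise output the recognized string as is."""
    chosen, chosen_score = recognized, lm_score(recognized)
    for cand in candidates:
        s = lm_score(cand)
        if s > chosen_score and s > th2:
            chosen, chosen_score = cand, s
    return chosen
```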
  • FIG. 9 shows an example of the co-occurrence dictionary 133 in tabular form.
  • the co-occurrence dictionary 133 includes at least one of co-occurrence information of first co-occurrence information indicating the frequency of co-occurrence of the target character string in the target cell of the table portion 62 and other character strings in other cells of the table portion 62, and second co-occurrence information indicating the frequency of co-occurrence of the target character string in the target cell and the character string in the figure portion 61.
  • the co-occurrence dictionary 133 is a dictionary that describes co-occurrence information on top, bottom, left, and right of the table part of the drawing, and co-occurrence information with the figure part corresponding to the table part or with the character strings outside the table part, learned from a large corpus.
  • the co-occurrence dictionary 133 is stored in advance in the storage device 203 .
  • the co-occurrence dictionary 133 may be stored in a storage device of another device (for example, a server on a network) that can communicate with the information processing device 11 .
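A co-occurrence dictionary of this kind can be sketched as a table of co-occurrence counts keyed by string pairs. The entries and the normalization are illustrative assumptions, not data from the disclosure:

```python
# Hypothetical co-occurrence counts: how often a target string appears
# together with strings in neighboring cells (first co-occurrence
# information) or in the figure portion (second co-occurrence information).
CO_OCCURRENCE = {
    ("air volume control valve No. 0", "opening degree"): 12,
    ("air volume control valve Q", "opening degree"): 0,
}

def co_occurrence_score(target, context, table):
    """Sum the co-occurrence counts of the target string with every
    context string and squash the total into a 0..1 score."""
    total = sum(table.get((target, c), 0) for c in context)
    return total / (total + 1)  # simple squashing, an assumption
```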
  • the co-occurrence filter unit 105 calculates a first occurrence score corresponding to the occurrence probability of the recognized character string acquired from the language model filter unit 104 or the replacement candidate extraction unit 103 (that is, the occurrence probability before replacement) and one or more second occurrence scores that correspond to each occurrence probability of one or more replacement candidate character strings (that is, occurrence probability after replacement).
  • when any one of the one or more second occurrence scores satisfies the replacement condition C1 (the first replacement condition), namely that it is higher than the first occurrence score and higher than a predetermined occurrence score threshold TH1, the co-occurrence filter unit 105 outputs the replacement candidate character string having that second occurrence score as the recognized character string; when the replacement condition C1 is not satisfied, it outputs the recognized character string as it is.
  • the above processing performed by the co-occurrence filter unit 105 is also called co-occurrence filtering.
  • Co-occurrence filtering is thus a process that raises the co-occurrence score of the output recognized character string.
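The replacement condition C1 has the same shape as condition C2. A hedged sketch, with toy scores standing in for occurrence probabilities from the co-occurrence dictionary:

```python
def co_occurrence_filter(recognized, candidates, score, th1=0.5):
    """Replacement condition C1: adopt the candidate whose occurrence
    score is higher than the recognized string's score and higher than
    the threshold TH1; otherwise output the recognized string as is."""
    chosen, chosen_score = recognized, score(recognized)
    for cand in candidates:
        s = score(cand)
        if s > chosen_score and s > th1:
            chosen, chosen_score = cand, s
    return chosen

# Toy occurrence scores (illustrative values only).
toy = {"air volume control valve Q": 0.1, "air volume control valve No. 0": 0.9}
fixed = co_occurrence_filter("air volume control valve Q",
                             ["air volume control valve No. 0"],
                             lambda s: toy.get(s, 0.0))
# fixed == "air volume control valve No. 0"
```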
  • FIG. 10 shows an example of replacement candidate character strings (replacement candidates #1 to #3) generated by the replacement candidate extraction unit 103 in tabular form.
  • FIG. 11 shows examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the language model filter unit 104 in tabular form.
  • FIG. 12 shows examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the co-occurrence filter unit 105 in tabular form.
  • characters with a gray background indicate misrecognized characters.
  • each figure shows an example in which "No. 0" is erroneously recognized as "Q", the character "⁇" is erroneously recognized as "A", the Japanese "times (KAI)" is erroneously recognized as the symbol "□ (square)", and the Japanese "correct (SEI)" is erroneously recognized as the Japanese "king (OU)".
  • the character strings in the shaded cells are elements that have been removed.
  • the reliability calculation unit 108 integrates (for example, by summation or weighted addition) the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105 for the corrected part, and judges the reliability to be low when the final score is low and high when the final score is high.
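The score integration described above can be sketched as a weighted addition. The weights and the decision threshold below are illustrative assumptions, not values from the disclosure:

```python
def reliability(language_score, co_occurrence_score,
                w_lang=0.5, w_co=0.5, threshold=0.6):
    """Integrate the two scores by weighted addition and label the
    corrected string low-reliability when the combined score is low."""
    final = w_lang * language_score + w_co * co_occurrence_score
    return final, ("high" if final >= threshold else "low")
```

A corrected string whose language and co-occurrence scores are both high would then be labeled "high", prompting no manual check.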
  • the information processing device 11 may not include the reliability calculation unit 108 .
  • the display control unit 109 outputs image data based on the recognized character string output from the co-occurrence filter unit 105 or the recognized character string output from the language model filter unit 104 and co-occurrence filter unit 105 .
  • the display control unit 109 may issue an alert by highlighting or the like to prompt a manual check. For example, "king (OU)" in the string "alarm - warning - king-normal" for the air agitation valve No. 0 of the anaerobic tank A is the result of erroneously recognizing "correct (SEI)", so the display control unit 109 may apply highlighting such as color, brightness, or blinking.
  • FIG. 13 shows an example of character recognition results output from the display control unit 109 in a table format.
  • the result of character recognition of the character string read from the drawing is output in, for example, csv (Comma-Separated Values) format.
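Emitting the table-portion recognition results in CSV format, as mentioned above, can be sketched with the standard library. The column names and row values are illustrative:

```python
import csv
import io

# Example corrected recognition results for the table portion.
rows = [
    ["number", "symbol", "contents"],
    ["1", "A", "air volume control valve No. 0 opening degree"],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)  # serialize rows as comma-separated values
print(buf.getvalue())
```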
  • the information processing device 11 acquires the data I0 of the drawing 60 (step S11).
  • the information processing device 11 divides the drawing 60 into a drawing portion 61, a table portion 62, and a character string portion (step S12).
  • the information processing device 11 performs character recognition of the character strings in the cells of the table portion 62 to generate recognized character strings, which are text data obtained as a result of character recognition (step S13).
  • the information processing device 11 converts the table recognition data, which is the recognition data of the table portion 62, into table format data (for example, csv format) (step S14).
  • the information processing device 11 extracts one or more correct candidate character strings paired with the error character string matching the recognized character string from the error pattern dictionary 131 as one or more replacement candidate character strings (step S15).
  • the information processing device 11 executes language model filtering based on the language model 132 (step S16).
  • the information processing device 11 executes co-occurrence filtering based on the co-occurrence dictionary 133 (step S17).
  • the information processing apparatus 11 may perform the language model filtering and the co-occurrence filtering in the order opposite to that in FIG. 14, or may perform them in parallel.
  • the information processing device 11 calculates the reliability of the corrected recognized character string based on the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105 (step S18).
  • the information processing device 11 outputs image data (for example, display data for displaying a display) (step S19).
  • replacement candidate character strings are evaluated using the co-occurrence frequency (appearance frequency) between cells or the co-occurrence frequency (appearance frequency) between a cell and a figure part, so that the accuracy rate of character recognition of character strings in the cells of the table part in the drawing can be increased.
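The first replacement condition described above can be sketched as follows. This is an illustrative reading, not the patent's implementation: the score function is a stand-in for a score derived from the co-occurrence dictionary 133, the threshold value is arbitrary, and choosing the highest-scoring qualifying candidate is an assumption (the patent only requires that a second occurrence score exceed both the first occurrence score and the occurrence score threshold).

```python
def co_occurrence_filter(recognized, candidates, occurrence_score, threshold):
    """Replace the recognized string with a candidate only if that
    candidate's occurrence score is higher than both the recognized
    string's score and the threshold; otherwise keep the recognized
    string as-is (the first replacement condition)."""
    first_score = occurrence_score(recognized)
    best, best_score = recognized, first_score
    for candidate in candidates:
        second_score = occurrence_score(candidate)
        if (second_score > first_score
                and second_score > threshold
                and second_score > best_score):
            best, best_score = candidate, second_score
    return best
```

For example, a recognized string "王常" (a misread of "正常") would be replaced when "正常" co-occurs far more often with the surrounding cells, but kept unchanged when no candidate clears the threshold.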
  • FIG. 15 is a functional block diagram showing the configurations of the drawing reading system 20 and the information processing device 21 according to the second embodiment. In FIG. 15, configurations that are the same as or correspond to those shown in FIG. 5 are given the same reference numerals as in FIG. 5.
  • the information processing device 21 differs from the information processing device 11 shown in FIG. 5 in that it has the character recognition correct data 135 and an automatic knowledge acquisition unit 134.
  • the automatic knowledge acquisition unit 134 compares the corrected recognized character string, which is the result of character recognition, with the character recognition correct data 135 prepared in advance, and based on the comparison result, learns corrections to the language model 132 and to the co-occurrence dictionary 133, which indicates the frequency of co-occurrence between the target cell and surrounding cells.
  • the user operation unit 43 enables the character recognition results in tabular form displayed on the display 42 by the display control unit 109 to be corrected by the user's operation.
  • when the character recognition result includes an error, the correct character string is added by the user's operation.
  • FIG. 16 shows an example of a correction operation to a correct character string by the user operation unit 43.
  • the automatic knowledge acquisition unit 134 not only automatically extracts the difference between the character recognition result and the character recognition correct data 135 as an error pattern (candidate), but also automatically learns, from the character recognition correct data 135, a language model 132 that statistically represents chains of characters or words, and a co-occurrence dictionary 133 that represents the frequency of co-occurrence with the cells above, below, left, and right. That is, the automatic knowledge acquisition unit 134 modifies one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data.
  • the character recognition correct data 135 is character recognition correct data for the drawing.
  • the character recognition correct data 135 is stored in advance in the storage device 203 .
  • the character recognition correct data 135 may be stored in a storage device of another device (for example, a server on the network) that can communicate with the information processing device 11 .
  • FIG. 17 is a flowchart showing the operation of the information processing device 21 of the drawing reading system 20.
  • steps having the same content as the steps shown in FIG. 14 are given the same reference numerals as those shown in FIG.
  • the information processing device 21 acquires correction data input from the user operation unit 43 for the recognized character string displayed on the display 42 (step S20), and then corrects (that is, updates) any one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data.
  • the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are corrected using the information corrected by the user, so the accuracy rate of character recognition of the character strings in the cells of the table portion of the drawing can be further increased.
  • In other respects, the second embodiment is the same as the first embodiment.
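The difference extraction performed by the automatic knowledge acquisition unit 134 (comparing a recognition result against the correct data to harvest error-pattern candidates) can be sketched as below. Python's `difflib` is used here only as a stand-in for whatever string alignment the actual implementation performs.

```python
import difflib

def extract_error_patterns(recognized, correct):
    """Return (error substring, correct substring) pairs found by
    aligning a recognition result against its ground truth; each pair
    is a candidate entry for the error pattern dictionary."""
    patterns = []
    matcher = difflib.SequenceMatcher(None, recognized, correct)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # substituted span = misrecognition
            patterns.append((recognized[i1:i2], correct[j1:j2]))
    return patterns
```

For instance, aligning the result "王常" with the correct string "正常" yields the error pattern ("王", "正").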
  • FIG. 18 is a functional block diagram showing configurations of the drawing reading system 30 and the information processing device 31 according to the third embodiment.
  • the information processing device 31 differs from the information processing device 11 shown in FIG. 5 in that it has a parts number calculation unit 120, a template 141, and a parts list 142.
  • the parts number calculation unit 120 has an information extraction unit 106 and an information totalization unit 107.
  • the parts number calculation unit 120 automatically calculates the number of parts.
  • the parts number calculation unit 120 calculates the number of parts in the drawing portion 61 of the drawing 60 based on the recognized character strings obtained by co-occurrence filtering alone, or by both language model filtering and co-occurrence filtering.
  • FIG. 19 shows an example of the template 141 in tabular form.
  • a template 141 in FIG. 19 is an information extraction rule written in regular expressions or the like.
  • the template 141 of FIG. 19 describes that the symbol "〇" in the description "〇: 1 to 3" outside the table portion 62 indicates three cases, "Case 1" to "Case 3".
  • the template 141 in FIG. 19 is stored in advance in the storage device 203 in FIG.
  • FIG. 20 shows another example of the template 141 in tabular form.
  • the template in FIG. 20 is used to obtain the "number of points/number of positions", because different parts are adopted depending on the number of states a part can indicate. For example, for the description "closed-open", which is an example of such a character string, a lamp indicating the "closed" state and a lamp indicating the "open" state are required, so the number of points for the lamp is two. Also, since the lamp displays two states, the number of lamp positions is two.
  • in the template of FIG. 20, the "number of points/number of positions" of a part matched with one hyphen "-" inside the square brackets "「 」" is required to be "2".
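The hyphen-counting rule of the FIG. 20 template can be sketched as follows. The bracket notation and the regular expression are illustrative assumptions; the patent only states that the count follows from the number of hyphens in the state list.

```python
import re

def points_and_positions(label):
    """Number of points / positions for a part label: one more than
    the number of hyphens separating its states (e.g. "閉-開" → 2)."""
    # If the label carries Japanese corner brackets, count inside them.
    match = re.search(r"「([^」]*)」", label)
    states = match.group(1) if match else label
    return states.count("-") + 1
```

A "closed-open" lamp thus yields 2, and an "alarm-warning-normal" lamp yields 3.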
  • FIG. 21 shows an example of the parts list 142 in tabular form.
  • the parts list 142 stores rules for specifying parts.
  • the parts list 142 is stored in the storage device 203 in advance.
  • the parts list 142 may be stored in a storage device of another device (for example, a server on the network) that can communicate with the information processing device 11 .
  • the information aggregation unit 107 calculates the number of parts based on the information extracted by the information extraction unit 106.
  • FIG. 22 shows, in tabular form, an example of the number of parts output from the display control unit 109 of the information processing device 31. The numbers, automatically calculated from the drawing 60, indicate how many of each part are required.
  • FIG. 23 is a flow chart showing the operation of the information processing device 31 of the drawing reading system 30 according to the third embodiment.
  • steps having the same content as the steps shown in FIG. 14 are given the same reference numerals as those shown in FIG.
  • the operation of the information processing device 31 is different from the operation of FIG. 14 in that information on the number of parts is extracted based on the template 141 (step S30) and the number of parts in the parts list 142 is tallied (step S31). Otherwise, the operation of FIG. 23 is the same as that of FIG.
  • According to the drawing reading system 30, the drawing reading method, and the drawing reading program of the third embodiment, the number of parts can be counted automatically. Further, when the information processing device 31 can acquire unit price information for the parts, the information processing device 31 can automatically create a cost estimate for the parts.
  • In other respects, the third embodiment is the same as the first or second embodiment.
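When unit prices are available, the estimate mentioned above reduces to a sum of count × unit price per part. A minimal sketch (the part names and prices below are invented for illustration):

```python
def estimate_cost(part_counts, unit_prices):
    """Total estimate: sum over parts of (count × unit price).
    `part_counts` comes from the information totalization unit;
    `unit_prices` is externally supplied price data."""
    return sum(count * unit_prices[name]
               for name, count in part_counts.items())
```

For example, two lamps at 100 and one button at 50 would total 250.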

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

A drawing reading system (10) comprises a character recognition unit (101), a replacement candidate extraction unit (103), a co-occurrence filter unit (105), and a display control unit (109). The co-occurrence filter unit (105) executes processing for calculating a first occurrence score corresponding to an occurrence probability of a recognized string and one or more second occurrence scores corresponding to occurrence probabilities of one or more respective replacement candidate strings, on the basis of a co-occurrence dictionary (133) composed of at least one among first co-occurrence information indicating the frequency with which a target string in a target cell in a table section (62) and another string in another cell co-occur and second co-occurrence information indicating the frequency with which the target string in the target cell and a string in a drawing section (61) co-occur, and for, if one of the second occurrence scores satisfies a first replacement condition of being higher than the first occurrence score and higher than an occurrence score threshold, outputting the replacement candidate string having said second occurrence score as the recognized string, and outputting the recognized string if none of the second occurrence scores satisfy the first replacement condition.

Description

Drawing reading system, drawing reading method, and drawing reading program
The present disclosure relates to a drawing reading system, a drawing reading method, and a drawing reading program.
There is a proposal for a device that executes character recognition by correcting errors in a recognized character string by matching the recognized character string, represented by a lattice of candidate characters, with a knowledge dictionary that manages words in a specific field (see, for example, Patent Document 1). This device comprises generating means for generating words not managed by the knowledge dictionary by combining constituent characters of words managed by the knowledge dictionary, measuring means for measuring the distance between the recognized character string and the words managed by the knowledge dictionary and the distance between the recognized character string and the words generated by the generating means, and correcting means for correcting errors in the recognized character string based on the measured distances.
Japanese Patent Application Laid-Open No. 11-175664 (see, for example, abstract)
However, when the above conventional device performs character recognition of the characters in each cell of a table in a drawing (hereinafter also referred to as a "table portion"), there is a problem that the accuracy rate of character recognition is low because the character string in each cell is short.
The present disclosure has been made to solve the conventional problem described above, and aims to provide a drawing reading system, a drawing reading method, and a drawing reading program that make it possible to increase the accuracy rate of character recognition of character strings in cells of a table portion in a drawing.
The drawing reading system of the present disclosure comprises: a layout analysis unit that analyzes data of a drawing to be read and divides the drawing into a figure portion, a table portion, and character strings each composed of one or more characters; a character recognition unit that performs character recognition on a character string to generate a recognized character string, which is text data obtained as a result of the character recognition; a replacement candidate extraction unit that extracts, from an error pattern dictionary constructed by collecting in advance error patterns that are pairs of an error character string and a correct candidate character string, one or more correct candidate character strings paired with the error character string matching the recognized character string, as one or more replacement candidate character strings; a co-occurrence filter unit that, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating the frequency with which a target character string in a target cell of the table portion and another character string in another cell of the table portion co-occur and second co-occurrence information indicating the frequency with which the target character string in the target cell and a character string in the figure portion co-occur, calculates a first occurrence score corresponding to the occurrence probability of the recognized character string and one or more second occurrence scores corresponding to the occurrence probabilities of the respective one or more replacement candidate character strings, and executes co-occurrence filtering, which is a process of outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold when any of the one or more second occurrence scores satisfies the first replacement condition, and outputting the recognized character string as-is when the first replacement condition is not satisfied; and a display control unit that outputs image data based on the recognized character string output from the co-occurrence filter unit.
The drawing reading method of the present disclosure is a method executed by an information processing device, and comprises the steps of: analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and character strings each composed of one or more characters; performing character recognition on a character string to generate a recognized character string, which is text data obtained as a result of the character recognition; extracting, from an error pattern dictionary constructed by collecting in advance error patterns that are pairs of an error character string and a correct candidate character string, one or more correct candidate character strings paired with the error character string matching the recognized character string, as one or more replacement candidate character strings; based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating the frequency with which a target character string in a target cell of the table portion and another character string in another cell of the table portion co-occur and second co-occurrence information indicating the frequency with which the target character string in the target cell and a character string in the figure portion co-occur, calculating a first occurrence score corresponding to the occurrence probability of the recognized character string and one or more second occurrence scores corresponding to the occurrence probabilities of the respective one or more replacement candidate character strings, and executing co-occurrence filtering, which is a process of outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold when any of the one or more second occurrence scores satisfies the first replacement condition, and outputting the recognized character string as-is when the first replacement condition is not satisfied; and outputting image data based on the recognized character string on which the co-occurrence filtering has been executed.
By using the drawing reading system, the drawing reading method, and the drawing reading program of the present disclosure, it is possible to increase the accuracy rate of character recognition of character strings in cells of table portions in drawings.
FIG. 1 is a configuration diagram showing an example of the hardware configuration of the drawing reading system according to Embodiment 1.
FIG. 2 shows an example of a drawing in which a figure portion, a table portion, and character strings are described.
FIG. 3 shows a character string in a cell of the table portion of the drawing of FIG. 2 and an example of the result of character recognition of this character string (including misrecognition).
FIG. 4 shows a character string in a cell of the table portion of the drawing of FIG. 2 and an example of the result of character recognition of this character string (not including misrecognition).
FIG. 5 is a functional block diagram showing the configurations of the drawing reading system and the information processing device according to Embodiment 1.
FIG. 6 shows, in tabular form, an example of the results of character recognition, by the character recognition unit of the drawing reading system according to Embodiment 1, of characters included in the figure portion of the drawing of FIG. 2.
FIG. 7 shows, in tabular form, an example of the results of character recognition, by the character recognition unit of the drawing reading system according to Embodiment 1, of character strings in cells of the table portion of the drawing of FIG. 2.
FIG. 8 shows, in tabular form, an example of the error pattern dictionary of the drawing reading system according to Embodiment 1.
FIG. 9 shows, in tabular form, an example of the co-occurrence dictionary of the drawing reading system according to Embodiment 1.
FIG. 10 shows, in tabular form, an example of replacement candidate character strings generated by the replacement candidate extraction unit of the drawing reading system according to Embodiment 1.
FIG. 11 shows, in tabular form, an example of replacement candidate character strings generated by the language model filter unit of the drawing reading system according to Embodiment 1.
FIG. 12 shows, in tabular form, an example of replacement candidate character strings generated by the co-occurrence filter unit of the drawing reading system according to Embodiment 1.
FIG. 13 shows, in tabular form, an example of text data, which is the result of character recognition output from the display control unit of the drawing reading system according to Embodiment 1.
FIG. 14 is a flowchart showing the operation of the information processing device of the drawing reading system according to Embodiment 1.
FIG. 15 is a functional block diagram showing the configurations of the drawing reading system and the information processing device according to Embodiment 2.
FIG. 16 shows, in tabular form, an example of an input operation by the user operation unit of the drawing reading system according to Embodiment 2.
FIG. 17 is a flowchart showing the operation of the drawing reading system according to Embodiment 2.
FIG. 18 is a functional block diagram showing the configurations of the drawing reading system and the information processing device according to Embodiment 3.
FIG. 19 shows, in tabular form, an example of the template of the drawing reading system according to Embodiment 3.
FIG. 20 shows, in tabular form, another example of the template of the drawing reading system according to Embodiment 3.
FIG. 21 shows, in tabular form, an example of the parts list of the drawing reading system according to Embodiment 3.
FIG. 22 shows, in tabular form, an example of the number of parts output from the display control unit of the drawing reading system according to Embodiment 3.
FIG. 23 is a flowchart showing the operation of the drawing reading system according to Embodiment 3.
A drawing reading system, a drawing reading method, and a drawing reading program according to the embodiments will be described below. The following embodiments are merely examples, and the embodiments can be combined as appropriate and each embodiment can be modified as appropriate.
The drawing reading system according to the embodiment can perform character recognition of character strings in cells of table portions in drawings to be read. A drawing may be either a drawing drawn on paper or a drawing converted to electronic data (for example, Portable Document Format (PDF) data). The drawings include, for example, mechanical drawings, architectural drawings, civil engineering drawings, piping drawings, electrical drawings, maps, and layout drawings. In general, a drawing contains a figure portion, a table portion, and character strings. However, the figure portion and the table portion may be drawn on different pages. Figures are drawn in the figure portion. The figures include, for example, devices (e.g., panels such as distribution boards and switchboards), buildings, and electrical devices (e.g., meters, operation buttons, indicator lamps). The figure portion may contain a character string. The table portion has a plurality of ruled lines that are boundaries of a plurality of cells. A character string is written in a cell of the table portion. A character string consists of one or more characters. A character is an element that can be represented by text data. Characters include symbols.
The drawing reading system according to the embodiment acquires data of a drawing in which a character string is drawn, executes character recognition on the character string drawn in this drawing, and outputs a recognized character string, which is text data corresponding to the character string, as a result of character recognition. The drawing reading system according to the embodiment can improve the accuracy rate of character recognition of character strings in cells of table portions in drawings. The drawing reading method according to the embodiment can be implemented by an information processing device of a drawing reading system. Further, the drawing reading program according to the embodiment is executed by, for example, a computer as an information processing device.
Embodiment 1.
FIG. 1 shows an example of the hardware configuration of a drawing reading system 10 according to the first embodiment. The drawing reading system 10 has an information processing device 11. The information processing device 11 is, for example, a computer. The information processing device 11 is a device capable of implementing the drawing reading method according to the first embodiment. The information processing device 11 acquires data I0 representing a drawing 60 in which a figure portion 61 and a table portion 62 are drawn, character-recognizes a character string in a cell of the table portion 62, and outputs the result of character recognition as output data. The output data includes, for example, text data indicating the characters corresponding to the character strings in the cells of the table portion 62.
The drawing reading system 10 has an image scanner 41, a display 42, and a user operation unit 43. The image scanner 41 is an image reading device that optically reads the drawing 60 and provides the data I0 of the drawing 60 to the information processing device 11. The display 42 displays an image based on image data output from the information processing device 11. The user operation unit 43 receives user operations and provides information input by the user to the information processing device 11. Note that the image scanner 41, the display 42, and the user operation unit 43 need not be part of the drawing reading system 10. The image scanner 41, display 42, and user operation unit 43 may be an image scanner, display, and user operation unit provided in an external device (for example, a server on a network) that can communicate with the information processing device 11.
The information processing device 11 has a processor 201, a memory 202 that is a volatile storage device, a nonvolatile storage device 203 such as a hard disk drive (HDD) or solid state drive (SSD), a communication device 204 that communicates with external devices, and an interface 205. The memory 202 is, for example, a semiconductor memory such as a RAM (Random Access Memory).
Each function of the information processing device 11 is realized by a processing circuit. The processing circuit may be dedicated hardware or may be the processor 201 that executes a program (for example, a drawing reading program) stored in the memory 202. The processor 201 may be any of a processing device, an arithmetic device, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).
When the processing circuit is dedicated hardware, the processing circuit is, for example, an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
When the processing circuit is the processor 201, the drawing reading method is executed by software, firmware, or a combination of software and firmware. Software and firmware are written as programs and stored in the memory 202. The processor 201 can implement the drawing reading method according to the first embodiment by reading and executing the program stored in the memory 202.
It should be noted that the information processing device 11 may be partially implemented by dedicated hardware and partially implemented by software or firmware. As such, the processing circuitry may implement each of the functions described above in hardware, software, firmware, or any combination thereof.
The interface 205 is used to communicate with other devices. The image scanner 41, the display 42, and the user operation unit 43 are connected to the interface 205.
FIG. 2 shows an example of a drawing 60 in which a figure portion 61, a table portion 62, and character strings are described. The figure portion 61 is "Panel No. 101", which has parts. "Panel No. 101" has gauges (e.g., meters) indicated by the symbols A and XI (numbers 1 and 2), display lamps indicated by the symbols LS and DS (numbers 3 and 4), and operation buttons indicated by the symbol SS (numbers 5 and 6). The drawing 60 may not include the figure portion 61. Character strings are written in the cells surrounded by the vertical and horizontal ruled lines of the table portion 62. The table portion 62 describes the part numbers, symbols, and contents. The table portion 62 shows that the part with the number 1 and the symbol A is a gauge indicating the opening degree of air volume control valve No. 〇. The part with the number 2 and the symbol XI is a gauge indicating the circuit air volume of reaction tank △. The part with the number 3 and the symbol LS is a display lamp indicating the "open-closed" state of air agitation valve No. 〇 of anaerobic tank A. The part with the number 4 and the symbol DS is a display lamp indicating whether the state of air agitation valve No. 〇 of anaerobic tank A is "alarm", "warning", or "normal". The part with the number 5 and the symbol SS is an emergency stop button. The part with the number 6 and the symbol SS is a test button. The drawing 60 is an example, and a drawing to be read may have other contents.
 FIG. 3 shows, in tabular form, an example of character strings in the cells of the table portion 62 of the drawing 60 of FIG. 2 and the results of character recognition of these character strings when misrecognition is included. FIG. 3 shows an example in which "正 (SEI)" is recognized as "王 (OU)", that is, misrecognition has occurred. FIG. 4 shows, in tabular form, an example of character strings in the cells of the table portion 62 of the drawing 60 of FIG. 2 and the results of character recognition when no misrecognition is included.
 FIG. 5 is a functional block diagram showing the configuration of the drawing reading system 10 and the information processing device 11. The drawing reading system 10 includes a layout analysis unit 100, a character recognition unit 101, a table format conversion unit 102, a replacement candidate extraction unit 103, a language model filter unit 104, a co-occurrence filter unit 105, a reliability calculation unit 108, a display control unit 109, an error pattern dictionary 131, a language model 132, and a co-occurrence dictionary 133. The replacement candidate extraction unit 103, the language model filter unit 104, and the co-occurrence filter unit 105 constitute a filter processing unit 110 that performs filtering for correcting the recognized character string. The error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are stored, for example, in the storage device 203 of FIG. 1. They may instead be stored in a storage device of an external device that can communicate with the information processing device 11 (for example, a server on a network).
 The layout analysis unit 100 analyzes the data I0 of the drawing 60 and divides the drawing 60 into a figure portion 61, a table portion 62, and character strings (that is, text portions) each composed of one or more characters.
 The character recognition unit 101 performs character recognition on the character strings acquired from the layout analysis unit 100 and generates recognized character strings, which are text data obtained as a result of the character recognition. Specifically, the character recognition unit 101 outputs, as the result of character recognition, sets of a recognized character and its position information (that is, information indicating its position in the drawing 60). FIG. 6 shows, in tabular form, an example of the results of character recognition, by the character recognition unit 101 of the drawing reading system 10, of the characters included in the figure portion 61 of the drawing 60 of FIG. 2.
 The table format conversion unit 102 reconstructs the table structure based on the character recognition results of the character strings recognized by the character recognition unit 101 and holds the character strings in table form. FIG. 7 shows, in tabular form, an example of the results of character recognition, by the character recognition unit 101 of the drawing reading system 10, of the character strings in the cells of the table portion 62 of the drawing 60 of FIG. 2. FIG. 7 shows an example in which "air volume control valve No. ○" is misrecognized as "air volume control valve No. Q", "reaction tank △" is misrecognized as "reaction tank A", and "正 (SEI)" is misrecognized as "王 (OU)".
 The filter processing unit 110 finds character recognition errors and corrects the character recognition results by filtering the tabular character recognition results. The filter processing unit 110 uses the error pattern dictionary 131 to treat strings matching an error pattern as replacement candidates, uses the language model 132 to calculate language scores, and then uses the co-occurrence dictionary 133 to calculate co-occurrence scores. The filter processing unit 110 determines whether or not to correct the recognized character string based on either or both of the language score and the co-occurrence score.
 Alternatively, the filter processing unit 110 may use the error pattern dictionary 131 to treat strings matching an error pattern as replacement candidates, use the co-occurrence dictionary 133 to calculate co-occurrence scores, and then use the language model 132 to calculate language scores.
 The filter processing unit 110 may also use the error pattern dictionary 131 to treat strings matching an error pattern as replacement candidates and use the co-occurrence dictionary 133 to calculate co-occurrence scores without using the language model 132 to calculate language scores. In other words, the information processing device 11 may omit the language model 132 and the language model filter unit 104. In this case, the filter processing unit 110 determines whether or not to correct the recognized character string based on the co-occurrence score.
 FIG. 8 shows an example of the error pattern dictionary 131 in tabular form. The error pattern dictionary 131 stores pairs of a character recognition error and a correct character string. It is constructed by collecting in advance error patterns, each being a pair of an error character string and a correct candidate character string. The replacement candidate extraction unit 103 extracts, from the error pattern dictionary 131, one or more correct candidate character strings paired with an error character string matching the recognized character string, as one or more replacement candidate character strings. That is, the replacement candidate extraction unit 103 uses the error pattern dictionary 131 to extract character strings judged to be character recognition errors and, in preparation for replacing them with correct character strings, creates correct candidate character strings (also referred to as "replacement candidate character strings").
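As a rough illustration, the lookup performed by the replacement candidate extraction unit 103 can be sketched as a scan over error-pattern pairs. The dictionary entries below are hypothetical stand-ins for the contents of the error pattern dictionary 131 (FIG. 8), not its actual data:

```python
# Hedged sketch of the replacement-candidate extraction step (unit 103).
# Each entry maps an error substring to one or more correct candidate
# substrings; the entries here are illustrative only.
ERROR_PATTERNS = {
    "Q": ["O", "○"],   # "Q" as a possible misread of a circle mark
    "王": ["正"],       # "王 (OU)" as a possible misread of "正 (SEI)"
}

def extract_replacement_candidates(recognized: str) -> list[str]:
    """Return candidate strings with each matching error substring replaced."""
    candidates = []
    for error, corrections in ERROR_PATTERNS.items():
        if error in recognized:
            for correct in corrections:
                candidates.append(recognized.replace(error, correct))
    return candidates
```

A recognized string that matches no error pattern yields an empty candidate list, in which case the later filtering stages leave it unchanged.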
 The language model filter unit 104 calculates, based on the language model 132 in which word chains and their appearance probabilities are stored, a first language score corresponding to the occurrence probability of the recognized character string (that is, the occurrence probability before replacement) and one or more second language scores corresponding to the occurrence probabilities of the one or more replacement candidate character strings (that is, the occurrence probabilities after replacement). The language model 132 is a model, learned from a large corpus, that statistically describes word chains. The language model 132 is stored in the storage device 203 in advance. When any of the one or more second language scores satisfies a replacement condition C2 (second replacement condition) of being higher than the first language score and higher than a predetermined language score threshold TH2, the language model filter unit 104 outputs, as the recognized character string, the replacement candidate character string of the second language score satisfying the replacement condition C2; when the replacement condition C2 is not satisfied, it outputs the recognized character string as-is. The above processing performed by the language model filter unit 104 is also referred to as language model filtering. Language model filtering is processing that raises the language model score.
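A minimal sketch of replacement condition C2, assuming a caller-supplied scoring function in place of the actual language model 132; the scoring function and threshold value are illustrative:

```python
def language_model_filter(recognized, candidates, score_fn, threshold):
    """Return the candidate with the highest language score that exceeds
    both the recognized string's score and the threshold (condition C2);
    otherwise return the recognized string unchanged."""
    best, best_score = recognized, score_fn(recognized)
    base = best_score  # first language score (before replacement)
    for candidate in candidates:
        score = score_fn(candidate)  # second language score (after replacement)
        if score > base and score > threshold and score > best_score:
            best, best_score = candidate, score
    return best
```

With an empty candidate list, or when no candidate clears both bars, the recognized string passes through unchanged, matching the behavior described above.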
 FIG. 9 shows an example of the co-occurrence dictionary 133 in tabular form. The co-occurrence dictionary 133 is composed of at least one of first co-occurrence information, which indicates the frequency with which a target character string in a target cell of the table portion 62 and another character string in another cell of the table portion 62 co-occur, and second co-occurrence information, which indicates the frequency with which the target character string in the target cell and a character string in the figure portion 61 co-occur. Specifically, the co-occurrence dictionary 133 is a dictionary, learned from a large corpus, that describes co-occurrence information between a cell of the table portion of a drawing and the cells above, below, to the left, and to the right of it, as well as co-occurrence information with the figure portion corresponding to the table portion or with character strings outside the table portion. The co-occurrence dictionary 133 is stored in the storage device 203 in advance. It may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 Based on the co-occurrence dictionary 133, the co-occurrence filter unit 105 calculates a first occurrence score corresponding to the occurrence probability of the recognized character string acquired from the language model filter unit 104 or the replacement candidate extraction unit 103 (that is, the occurrence probability before replacement) and one or more second occurrence scores corresponding to the occurrence probabilities of the one or more replacement candidate character strings (that is, the occurrence probabilities after replacement). When any of the one or more second occurrence scores satisfies a replacement condition C1 (first replacement condition) of being higher than the first occurrence score and higher than a predetermined occurrence score threshold TH1, the co-occurrence filter unit 105 outputs, as the recognized character string, the replacement candidate character string of the second occurrence score satisfying the replacement condition C1; when the replacement condition C1 is not satisfied, it outputs the recognized character string as-is. The above processing performed by the co-occurrence filter unit 105 is also referred to as co-occurrence filtering. Co-occurrence filtering is processing that raises the co-occurrence score.
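The co-occurrence scoring can be sketched in the same way. Here the co-occurrence dictionary 133 is stood in for by a plain mapping from (target, context) string pairs to counts, which is an assumption made for illustration rather than its actual format:

```python
def cooccurrence_score(target, context_strings, cooc_counts):
    """Sum the co-occurrence counts between the target string and every
    context string (neighboring cells and figure-portion strings)."""
    return sum(cooc_counts.get((target, ctx), 0) for ctx in context_strings)

def cooccurrence_filter(recognized, candidates, context, cooc_counts, threshold):
    """Apply replacement condition C1: adopt the best-scoring candidate only
    if it beats both the recognized string's score and the threshold TH1."""
    best, best_score = recognized, cooccurrence_score(recognized, context, cooc_counts)
    base = best_score  # first occurrence score (before replacement)
    for candidate in candidates:
        score = cooccurrence_score(candidate, context, cooc_counts)
        if score > base and score > threshold and score > best_score:
            best, best_score = candidate, score
    return best
```

The `context` argument would be drawn from the surrounding cells of the table portion 62 and, where available, the character strings of the figure portion 61.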
 FIG. 10 shows, in tabular form, examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the replacement candidate extraction unit 103. FIG. 11 shows, in tabular form, examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the language model filter unit 104. FIG. 12 shows, in tabular form, examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the co-occurrence filter unit 105.
 In each figure, characters on a gray background indicate misrecognized characters. Specifically, each figure shows examples in which "○" is misrecognized as "Q", "△" as "A", the Japanese character "回 (KAI)" as the symbol "□" (square), "A" as "△", and the Japanese character "正 (SEI)" as the Japanese character "王 (OU)". In each figure, the character strings in the shaded cells are elements that have been removed.
 For the error-corrected portion, the reliability calculation unit 108 combines (for example, by addition or weighted addition) the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105, and judges the result to be of low reliability when the final score is low and of high reliability when the score is high. The information processing device 11 may omit the reliability calculation unit 108.
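A minimal sketch of the score combination, assuming weighted addition with illustrative weights and an illustrative review threshold (the actual weighting scheme is left open in the text):

```python
def reliability(language_score, cooc_score, w_lang=0.5, w_cooc=0.5):
    """Combine the language score and the co-occurrence score by weighted
    addition; a low combined value indicates a low-reliability result."""
    return w_lang * language_score + w_cooc * cooc_score

def needs_manual_check(language_score, cooc_score, threshold=0.5):
    """Flag a correction for manual review when its combined score is low."""
    return reliability(language_score, cooc_score) < threshold
```

The `needs_manual_check` flag corresponds to the alerting behavior of the display control unit 109 described below.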
 The display control unit 109 outputs image data based on the recognized character string output from the co-occurrence filter unit 105, or on the recognized character string output from the language model filter unit 104 and the co-occurrence filter unit 105. When displaying the results of recognition error correction, if the score calculated by the reliability calculation unit 108 is low, the display control unit 109 may issue an alert, such as a highlighted display, to prompt a manual check. For example, since "王 (OU)" in "Mixing tank A material agitation valve No. ○ 'alarm-warning-王常'" is the result of misrecognizing "正 (SEI)", the display control unit 109 may highlight it by color, brightness, blinking, or the like.
 FIG. 13 shows, in tabular form, an example of the character recognition results output from the display control unit 109. The tabular data, that is, the character recognition results of the character strings read from the drawing, is output in, for example, CSV (Comma-Separated Values) format.
 FIG. 14 is a flowchart showing the operation of the information processing device 11 of the drawing reading system 10. First, the information processing device 11 acquires the data I0 of the drawing 60 (step S11). Next, the information processing device 11 divides the drawing 60 into a figure portion 61, a table portion 62, and character string portions (step S12). Next, the information processing device 11 performs character recognition on the character strings in the cells of the table portion 62 and generates recognized character strings, which are text data obtained as a result of the character recognition (step S13). At this time, character recognition is also performed on character strings outside the cells of the table portion 62. Next, the information processing device 11 converts the table recognition data, which is the recognition data of the table portion 62, into tabular data (for example, in CSV format) (step S14).
 Next, the information processing device 11 extracts, from the error pattern dictionary 131, one or more correct candidate character strings paired with an error character string matching the recognized character string, as one or more replacement candidate character strings (step S15). The information processing device 11 executes language model filtering based on the language model 132 (step S16). The information processing device 11 executes co-occurrence filtering based on the co-occurrence dictionary 133 (step S17). The information processing device 11 may perform the language model filtering and the co-occurrence filtering in the reverse of the order shown in FIG. 14, or may perform them in parallel.
 Next, the information processing device 11 calculates the reliability of the corrected recognized character string based on the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105 (step S18). The information processing device 11 then outputs image data (for example, display data for a display) (step S19).
 As described above, with the drawing reading system, drawing reading method, and drawing reading program according to Embodiment 1, replacement candidate character strings are evaluated using the frequency of co-occurrence (appearance frequency) between cells or between a cell and the figure portion, so the accuracy of character recognition of character strings in the cells of the table portion of a drawing can be increased.
Embodiment 2.
 FIG. 15 is a functional block diagram showing the configurations of the drawing reading system 20 and the information processing device 21 according to Embodiment 2. In FIG. 15, configurations identical or corresponding to those shown in FIG. 5 are given the same reference numerals as in FIG. 5. The information processing device 21 differs from the information processing device 11 shown in FIG. 5 in that it has a storage unit for character recognition correct-answer data 135 and an automatic knowledge acquisition unit 134. Specifically, the automatic knowledge acquisition unit 134 compares the result of correcting the recognized character string obtained by character recognition with character recognition correct-answer data prepared in advance and, based on the result of the comparison, learns the corrections, the language model 132, and a co-occurrence dictionary indicating the frequency of co-occurrence between a target cell and its surrounding cells.
 The user operation unit 43 allows the tabular character recognition results displayed on the display 42 by the display control unit 109 to be corrected by user operation. A correct character string is a correct string assigned by user operation when a character recognition error was included. FIG. 16 shows an example of a correction operation to a correct character string by means of the user operation unit 43.
 The automatic knowledge acquisition unit 134 not only automatically extracts the differences between the character recognition results and the character recognition correct-answer data 135 as error pattern candidates, but also automatically learns, from the character recognition correct-answer data 135, the language model 132, which statistically represents chains of characters or words, and the co-occurrence dictionary 133, which represents the frequency of co-occurrence with the cells above, below, to the left, and to the right. That is, the automatic knowledge acquisition unit 134 corrects one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data. The character recognition correct-answer data 135 is correct-answer data for character recognition of drawings. The character recognition correct-answer data 135 is stored in the storage device 203 in advance. It may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 FIG. 17 is a flowchart showing the operation of the information processing device 21 of the drawing reading system 20. In FIG. 17, steps with the same content as the steps shown in FIG. 14 are given the same reference numerals as in FIG. 14. The information processing device 21 acquires the correction data input from the user operation unit 43 for the recognized character strings displayed on the display 42 (step S20), and then corrects (that is, updates) one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data.
 As described above, with the drawing reading system 20, drawing reading method, and drawing reading program according to Embodiment 2, the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are corrected using information corrected by the user, so the accuracy of character recognition of character strings in the cells of the table portion of a drawing can be further increased.
 In all other respects, Embodiment 2 is the same as Embodiment 1.
Embodiment 3.
 FIG. 18 is a functional block diagram showing the configurations of the drawing reading system 30 and the information processing device 31 according to Embodiment 3. In FIG. 18, configurations identical or corresponding to those shown in FIG. 5 are given the same reference numerals as in FIG. 5. The information processing device 31 differs from the information processing device 11 shown in FIG. 5 in that it has a parts count calculation unit 120, a template 141, and a parts list 142. The parts count calculation unit 120 has an information extraction unit 106 and an information aggregation unit 107.
 The parts count calculation unit 120 automatically calculates the number of parts. The parts count calculation unit 120 calculates the number of parts in the figure portion 61 of the drawing 60 based on the recognized character strings obtained by co-occurrence filtering and language model filtering, or by co-occurrence filtering alone.
 FIG. 19 shows an example of the template 141 in tabular form. The template 141 of FIG. 19 is an information extraction rule written in, for example, regular expressions. The template 141 of FIG. 19 describes that the symbol "○" in the description "○: 1 to 3" outside the table portion 62 indicates the three cases "case 1 to case 3". The template 141 of FIG. 19 is stored in advance in the storage device 203 of FIG. 1. The template 141 may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 FIG. 20 shows another example of the template 141 in tabular form. Because different parts are adopted depending on the number of states a part can take, the template of FIG. 20 is used to obtain the "points/number of positions". For example, given the example character string "closed-open", a lamp indicating the "closed" state and a lamp indicating the "open" state are needed, so the point count for the lamps is 2. Also, since the lamps display two states, the number of lamp positions is 2. By applying the template, the "points and number of positions" of a matched part are obtained from the number of hyphens "-" inside the brackets「 」, giving 2 in this case. Also, in FIG. 20, "number of machines/number of units" appears as a single row in the table, but it actually indicates that multiple machines (that is, a plural "number of machines") are required and that each machine requires multiple panels (that is, a plural "number of units"). In Embodiment 3, since ○ in "number of cases of ○" is 3, the "number of machines/number of units" is 3.
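The hyphen-counting rule for "points/number of positions" can be sketched as follows; the helper name is hypothetical:

```python
def state_count(label: str) -> int:
    """Number of states in a hyphen-delimited state label; by the template
    of FIG. 20, this equals both the point count and the position count."""
    return len(label.split("-"))
```

For example, a label with one hyphen, such as "closed-open", yields 2, matching the lamp example above.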
 FIG. 21 shows an example of the parts list 142 in tabular form. The parts list 142 stores rules for identifying parts. The parts list 142 is stored in the storage device 203 in advance. It may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 The information aggregation unit 107 calculates the number of parts based on the information extracted by the information extraction unit 106. FIG. 22 shows, in tabular form, an example of the part counts output from the display control unit 109 of the information processing device 31. Each count is a value automatically calculated from the drawing 60 indicating how many of each part are required.
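The aggregation step can be sketched as a tally over (part, count) pairs produced by the information extraction unit 106; the pair-based input format is an assumption made for illustration:

```python
from collections import Counter

def tally_parts(extracted_rows):
    """Sum the required quantity of each part over all extracted rows,
    where each row is a (part_name, count) pair."""
    totals = Counter()
    for part_name, count in extracted_rows:
        totals[part_name] += count
    return dict(totals)
```

The resulting per-part totals correspond to the counts shown in FIG. 22.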
 FIG. 23 is a flowchart showing the operation of the information processing device 31 of the drawing reading system 30 according to Embodiment 3. In FIG. 23, steps with the same content as the steps shown in FIG. 14 are given the same reference numerals as in FIG. 14. The operation of the information processing device 31 differs from the operation of FIG. 14 in that information on the number of parts is extracted based on the template 141 (step S30) and the part counts are aggregated using the parts list 142 (step S31). In other respects, the operation of FIG. 23 is the same as that of FIG. 14.
 As described above, with the drawing reading system 30, drawing reading method, and drawing reading program according to Embodiment 3, the number of parts can be aggregated automatically. Further, when the information processing device 31 can acquire unit price information for the parts, the information processing device 31 can automatically create an estimate for the parts.
 In all other respects, Embodiment 3 is the same as Embodiment 1 or 2.
 10, 20, 30 drawing reading system, 11, 21, 31 information processing device, 41 image scanner, 42 display, 43 user operation unit, 60 drawing, 61 figure portion, 62 table portion, 100 layout analysis unit, 101 character recognition unit, 102 table format conversion unit, 103 replacement candidate extraction unit, 104 language model filter unit, 105 co-occurrence filter unit, 106 information extraction unit, 107 information aggregation unit, 108 reliability calculation unit, 109 display control unit, 110 filter processing unit, 120 parts number calculation unit, 131 error pattern dictionary, 132 language model, 133 co-occurrence dictionary, 134 automatic knowledge acquisition unit, 135 character recognition correct-answer data, 141 template, 142 parts list, 201 processor, 202 memory, 203 storage device, 204 communication device, 205 interface.

Claims (14)

  1.  A drawing reading system comprising:
     a layout analysis unit that analyzes data of a drawing to be read and divides the drawing into a figure portion, a table portion, and character strings each composed of one or more characters;
     a character recognition unit that performs character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition;
     a replacement candidate extraction unit that extracts, as one or more replacement candidate character strings, one or more correct-answer candidate character strings paired with an error character string that matches the recognized character string, from an error pattern dictionary constructed by collecting in advance error patterns, each of which is a pair of an error character string and a correct-answer candidate character string;
     a co-occurrence filter unit that performs co-occurrence filtering, which is a process of calculating, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating a frequency with which a target character string in a target cell of the table portion co-occurs with another character string in another cell of the table portion and second co-occurrence information indicating a frequency with which the target character string in the target cell co-occurs with a character string in the figure portion, a first occurrence score corresponding to an occurrence probability of the recognized character string and one or more second occurrence scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition when any of the one or more second occurrence scores satisfies the first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and otherwise outputting the recognized character string as it is; and
     a display control unit that outputs image data based on the recognized character string output from the co-occurrence filter unit.
  2.  The drawing reading system according to claim 1, further comprising a language model filter unit that performs language model filtering, which is a process of calculating, based on a language model in which chains of words and appearance probabilities of the chains are stored, a first language score corresponding to the occurrence probability of the recognized character string and one or more second language scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second language score satisfies a second replacement condition when there is a second language score satisfying the second replacement condition of being higher than the first language score and higher than a predetermined language score threshold, and otherwise outputting the recognized character string as it is,
     wherein the co-occurrence filter unit performs the co-occurrence filtering on the recognized character string output from the language model filter unit.
  3.  The drawing reading system according to claim 1, further comprising a language model filter unit that performs language model filtering, which is a process of calculating, based on a language model in which chains of words and appearance probabilities of the chains are stored, a first language score corresponding to the occurrence probability of the recognized character string and one or more second language scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second language score satisfies a second replacement condition when there is a second language score satisfying the second replacement condition of being higher than the first language score and higher than a predetermined language score threshold, and otherwise outputting the recognized character string as it is,
     wherein the language model filter unit performs the language model filtering on the recognized character string output from the co-occurrence filter unit, and
     the display control unit displays an image based on image data based on the recognized character string output from the language model filter unit.
  4.  The drawing reading system according to any one of claims 1 to 3, wherein the other character string is a character string in any one or more of the cells above, below, to the left of, and to the right of the target cell.
  5.  The drawing reading system according to any one of claims 1 to 4, wherein the other character string is a character string in any one or more cells adjacent to the target cell.
  6.  The drawing reading system according to any one of claims 1 to 5, wherein the character string in the figure portion includes a symbol indicating a type of a member drawn in the figure portion.
  7.  The drawing reading system according to any one of claims 1 to 6, wherein a device and a component provided in the device are drawn in the figure portion, and
     the character string in the figure portion includes a symbol indicating a type of the component.
  8.  The drawing reading system according to any one of claims 1 to 7, further comprising:
     a display that displays an image based on the recognized character string output from the display control unit;
     a user operation unit into which correction data for the recognized character string is input; and
     an automatic knowledge acquisition unit that corrects the error pattern dictionary and the co-occurrence dictionary based on the correction data.
  9.  The drawing reading system according to claim 2 or 3, further comprising:
     a display that displays an image based on the recognized character string output from the display control unit;
     a user operation unit into which correction data for the recognized character string is input; and
     an automatic knowledge acquisition unit that corrects the error pattern dictionary, the co-occurrence dictionary, and the language model based on the correction data.
  10.  The drawing reading system according to claim 1 or 2, further comprising a parts number calculation unit that calculates the number of parts in the figure portion based on the recognized character string obtained by the co-occurrence filtering.
  11.  The drawing reading system according to claim 2 or 3, further comprising a parts number calculation unit that calculates the number of parts in the figure portion based on the recognized character string obtained by the co-occurrence filtering and the language model filtering.
  12.  The drawing reading system according to claim 10 or 11, wherein the parts number calculation unit includes:
     an information extraction unit that extracts information from a template in which a method of totaling the numbers of parts is stored in advance; and
     an information aggregation unit that tallies, for parts stored in a parts list in which a list of parts to be totaled is stored in advance, the numbers of the parts extracted by the information extraction unit.
  13.  A drawing reading method executed by an information processing device, the method comprising:
     analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and character strings each composed of one or more characters;
     performing character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition;
     extracting, as one or more replacement candidate character strings, one or more correct-answer candidate character strings paired with an error character string that matches the recognized character string, from an error pattern dictionary constructed by collecting in advance error patterns, each of which is a pair of an error character string and a correct-answer candidate character string;
     performing co-occurrence filtering, which is a process of calculating, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating a frequency with which a target character string in a target cell of the table portion co-occurs with another character string in another cell of the table portion and second co-occurrence information indicating a frequency with which the target character string in the target cell co-occurs with a character string in the figure portion, a first occurrence score corresponding to an occurrence probability of the recognized character string and one or more second occurrence scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition when any of the one or more second occurrence scores satisfies the first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and otherwise outputting the recognized character string as it is; and
     outputting image data based on the recognized character string on which the co-occurrence filtering has been performed.
  14.  A drawing reading program causing a computer to execute:
     analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and character strings each composed of one or more characters;
     performing character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition;
     extracting, as one or more replacement candidate character strings, one or more correct-answer candidate character strings paired with an error character string that matches the recognized character string, from an error pattern dictionary constructed by collecting in advance error patterns, each of which is a pair of an error character string and a correct-answer candidate character string;
     performing co-occurrence filtering, which is a process of calculating, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating a frequency with which a target character string in a target cell of the table portion co-occurs with another character string in another cell of the table portion and second co-occurrence information indicating a frequency with which the target character string in the target cell co-occurs with a character string in the figure portion, a first occurrence score corresponding to an occurrence probability of the recognized character string and one or more second occurrence scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition when any of the one or more second occurrence scores satisfies the first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and otherwise outputting the recognized character string as it is; and
     outputting image data based on the recognized character string on which the co-occurrence filtering has been performed.
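The co-occurrence filtering recited in claims 1, 13, and 14 can be illustrated with a minimal sketch. The dictionary format (co-occurrence frequencies keyed by string pairs), the score definition (summed frequency over neighboring context strings), and all names are assumptions for illustration, not the claimed implementation.

```python
def co_occurrence_filter(recognized, candidates, neighbors, co_dict, threshold):
    """Replace a recognized string with a replacement candidate only when
    the candidate's occurrence score is higher than both the recognized
    string's score and a fixed threshold (the 'first replacement
    condition'); otherwise output the recognized string as it is.

    co_dict maps (string, neighbor_string) pairs to co-occurrence
    frequencies; neighbors are context strings from adjacent table cells
    or from the figure portion.
    """
    def score(s):
        # Occurrence score: total co-occurrence frequency with the context.
        return sum(co_dict.get((s, n), 0) for n in neighbors)

    first_score = score(recognized)
    best, best_score = None, first_score
    for cand in candidates:
        s = score(cand)
        if s > best_score and s > threshold:
            best, best_score = cand, s
    return best if best is not None else recognized

co_dict = {("M4", "screw"): 5}
# 'MA' never co-occurs with 'screw', but candidate 'M4' does, so it wins.
print(co_occurrence_filter("MA", ["M4"], ["screw"], co_dict, 2))
```

Raising the threshold above any candidate's score leaves the recognized string unchanged, which is the conservative fallback the claims describe when the first replacement condition is not satisfied.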
PCT/JP2022/001620 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program WO2023139650A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023553177A JP7383209B1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program
PCT/JP2022/001620 WO2023139650A1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/001620 WO2023139650A1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program

Publications (1)

Publication Number Publication Date
WO2023139650A1 2023-07-27

Family

ID=87347982

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/001620 WO2023139650A1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program

Country Status (2)

Country Link
JP (1) JP7383209B1 (en)
WO (1) WO2023139650A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004133565A (en) * 2002-10-09 2004-04-30 Fujitsu Ltd Postprocessing device for character recognition using internet
WO2014203905A2 (en) * 2013-06-17 2014-12-24 アイビーリサーチ株式会社 Reference symbol extraction method, reference symbol extraction device and program


Also Published As

Publication number Publication date
JP7383209B1 (en) 2023-11-17
JPWO2023139650A1 (en) 2023-07-27


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2023553177

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22921813

Country of ref document: EP

Kind code of ref document: A1