WO2023139650A1 - Drawing reading system, drawing reading method, and drawing reading program - Google Patents

Drawing reading system, drawing reading method, and drawing reading program Download PDF

Info

Publication number
WO2023139650A1
Authority
WO
WIPO (PCT)
Prior art keywords
occurrence
character string
unit
recognized
replacement
Prior art date
Application number
PCT/JP2022/001620
Other languages
French (fr)
Japanese (ja)
Inventor
Tatsuhiko Saito (斉藤 辰彦)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2023553177A priority Critical patent/JP7383209B1/en
Priority to PCT/JP2022/001620 priority patent/WO2023139650A1/en
Publication of WO2023139650A1 publication Critical patent/WO2023139650A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/12 Detection or correction of errors, e.g. by rescanning the pattern

Definitions

  • the present disclosure relates to a drawing reading system, a drawing reading method, and a drawing reading program.
  • This device comprises generating means for generating words not managed by the knowledge dictionary by combining constituent characters of words that are managed by the knowledge dictionary; measuring means for measuring the distance between the recognized character string and the words managed by the knowledge dictionary and the distance between the recognized character string and the words generated by the generating means; and correcting means for correcting errors in the recognized character string based on the measured distances.
  • When character recognition is performed on the characters in each cell of a table in a drawing (hereinafter also referred to as a "table portion") with the above conventional device, there is a problem that the accuracy rate of character recognition is low because the character string in each cell is short.
  • the present disclosure has been made to solve the conventional problems described above, and aims to provide a drawing reading system, a drawing reading method, and a drawing reading program that make it possible to increase the accuracy rate of character recognition of character strings in cells of table portions in drawings.
  • the drawing reading system of the present disclosure comprises: a layout analysis unit that analyzes data of a drawing to be read and divides the drawing into a figure portion, a table portion, and a character string composed of one or more characters; a character recognition unit that performs character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition; a replacement candidate extraction unit that extracts one or more correct candidate character strings, each paired with an error character string matching the recognized character string, as one or more replacement candidate character strings; a co-occurrence filter unit that calculates a first occurrence score corresponding to the occurrence probability of the recognized character string and one or more second occurrence scores corresponding to the occurrence probabilities of the respective replacement candidate character strings, and executes co-occurrence filtering, that is, a process of outputting as the recognized character string the replacement candidate character string whose second occurrence score satisfies a first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and of outputting the recognized character string as it is when the first replacement condition is not satisfied; and a display control unit that outputs image data based on the recognized character string output from the co-occurrence filter unit.
  • the drawing reading method of the present disclosure is a method executed by an information processing apparatus, and includes the steps of: analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and a character string composed of one or more characters; executing character recognition on the character string to generate a recognized character string that is text data obtained as a result of the character recognition; extracting one or more correct candidate character strings paired with a matching error character string as one or more replacement candidate character strings; and executing co-occurrence filtering based on a co-occurrence dictionary composed of at least one of first co-occurrence information, which indicates the frequency of co-occurrence of a target character string in a target cell of the table portion with other character strings in other cells of the table portion, and second co-occurrence information, which indicates the frequency of co-occurrence of the target character string in the target cell with a character string in the figure portion.
  • FIG. 1 is a configuration diagram showing an example of a hardware configuration of a drawing reading system according to Embodiment 1;
  • FIG. 2 shows an example of a drawing in which a figure portion, a table portion, and a character string are described.
  • FIG. 3 shows a character string in a cell in the table portion of the drawing of FIG. 2 and an example of the character recognition result of this character string (when misrecognition is included).
  • FIG. 4 shows a character string in a cell in the table portion of the drawing of FIG. 2 and an example of the result of character recognition of this character string (when no misrecognition is included).
  • FIG. 5 is a functional block diagram showing the configurations of a drawing reading system and an information processing apparatus according to Embodiment 1;
  • FIG. 6 shows an example of the results of character recognition of characters included in the figure portion in the drawing of FIG. 2.
  • FIG. 7 shows an example of a character recognition result of a character string in a cell of a table portion in the drawing of FIG. 2.
  • An example of an error pattern dictionary of the drawing reading system according to Embodiment 1 is shown in tabular form.
  • An example of a co-occurrence dictionary of the drawing reading system according to Embodiment 1 is shown in tabular form.
  • FIG. 10 shows an example of replacement candidate character strings generated by the replacement candidate extraction unit of the drawing reading system according to the first embodiment in tabular form.
  • FIG. 11 shows, in tabular form, an example of replacement candidate character strings generated by the language model filter unit of the drawing reading system according to the first embodiment;
  • FIG. 12 shows an example of replacement candidate character strings generated by the co-occurrence filter unit of the drawing reading system according to the first embodiment in a tabular format;
  • FIG. 13 shows an example of text data, which is the result of character recognition output from the display control unit of the drawing reading system according to the first embodiment, in tabular form.
  • FIG. 14 is a flow chart showing the operation of the information processing device of the drawing reading system according to Embodiment 1;
  • FIG. 15 is a functional block diagram showing the configurations of a drawing reading system and an information processing apparatus according to Embodiment 2;
  • An example of input operation by the user operation unit of the drawing reading system according to the second embodiment is shown in a table format
  • 9 is a flow chart showing the operation of the drawing reading system according to Embodiment 2;
  • FIG. 11 is a functional block diagram showing the configurations of a drawing reading system and an information processing device according to Embodiment 3;
  • An example of a template for the drawing reading system according to the third embodiment is shown in tabular form.
  • Another example of the template of the drawing reading system according to the third embodiment is shown in tabular form.
  • An example of a parts list of the drawing reading system according to Embodiment 3 is shown in tabular form.
  • FIG. 12 shows an example of the number of parts output from the display control unit of the drawing reading system according to the third embodiment in a tabular format;
  • FIG. 10 is a flow chart showing the operation of the drawing reading system according to Embodiment 3;
  • a drawing reading system, a drawing reading method, and a drawing reading program according to the embodiment will be described below.
  • the following embodiments are merely examples, and the embodiments can be combined as appropriate and each embodiment can be modified as appropriate.
  • the drawing reading system can perform character recognition of character strings in cells of table portions in drawings to be read.
  • a drawing may be a drawing drawn on paper or a drawing converted to electronic data (for example, Portable Document Format (PDF) data).
  • the drawings include, for example, mechanical drawings, architectural drawings, civil engineering drawings, piping drawings, electrical drawings, maps, and layout drawings.
  • a drawing contains a figure portion, a table portion, and text.
  • the figure portion and the table portion may be drawn on different pages.
  • Figures are drawn in the figure portion.
  • the figures include, for example, devices (e.g., distribution boards and panels such as switchboards), buildings, and electrical devices (e.g., meters, operation buttons, and indicator lamps).
  • the figure part may contain a character string.
  • the table portion has a plurality of ruled lines that are boundaries of a plurality of cells.
  • a character string is written in the cell of the table portion.
  • a string consists of one or more characters.
  • a character is an element that can be represented by text data. Characters include symbols.
  • the drawing reading system acquires data of a drawing in which a character string is drawn, executes character recognition on the character string drawn in this drawing, and outputs a recognized character string, which is text data corresponding to the character string, as a result of character recognition.
  • the drawing reading system according to the embodiment can improve the accuracy rate of character recognition of character strings in cells of table portions in drawings.
  • the drawing reading method according to the embodiment can be implemented by an information processing device of a drawing reading system. Further, the drawing reading program according to the embodiment is executed by, for example, a computer as an information processing device.
  • FIG. 1 shows an example of the hardware configuration of a drawing reading system 10 according to the first embodiment.
  • the drawing reading system 10 has an information processing device 11 .
  • the information processing device 11 is, for example, a computer.
  • the information processing device 11 is a device capable of implementing the drawing reading method according to the first embodiment.
  • the information processing device 11 acquires data I0 representing a drawing 60 in which a figure portion 61 and a table portion 62 are drawn, character-recognizes a character string in a cell of the table portion 62, and outputs the result of character recognition as output data.
  • the output data includes, for example, text data indicating the characters corresponding to the character strings in the cells of table portion 62 .
  • the drawing reading system 10 has an image scanner 41 , a display 42 and a user operation section 43 .
  • the image scanner 41 is an image reading device that optically reads the drawing 60 and provides the data I0 of the drawing 60 to the information processing device 11 .
  • the display 42 displays an image based on image data output from the information processing device 11 .
  • the user operation unit 43 receives user operations and provides information input by the user to the information processing apparatus 11 . Note that the image scanner 41 , the display 42 and the user operation unit 43 need not be part of the drawing reading system 10 .
  • the image scanner 41, display 42, and user operation unit 43 may be an image scanner, display, and user operation unit provided in an external device (for example, a server on a network) that can communicate with the information processing device 11.
  • the information processing device 11 has a processor 201, a memory 202 that is a volatile storage device, a nonvolatile storage device 203 such as a hard disk drive (HDD) or solid state drive (SSD), a communication device 204 that communicates with external devices, and an interface 205.
  • the memory 202 is, for example, a semiconductor memory such as a RAM (Random Access Memory).
  • the processing circuit may be dedicated hardware or may be the processor 201 that executes a program (for example, a drawing reading program) stored in the memory 202 .
  • the processor 201 may be any of a processing device, an arithmetic device, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).
  • the processing circuit is dedicated hardware, the processing circuit is, for example, ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
  • the drawing reading method is executed by software, firmware, or a combination of software and firmware.
  • Software and firmware are written as programs and stored in memory 202 .
  • the processor 201 can implement the drawing reading method according to the first embodiment by reading and executing the program stored in the memory 202 .
  • the information processing device 11 may be partially implemented by dedicated hardware and partially implemented by software or firmware.
  • the processing circuitry may implement each of the functions described above in hardware, software, firmware, or any combination thereof.
  • the interface 205 is used to communicate with other devices.
  • Image scanner 41 , display 42 , and user operation unit 43 are connected to interface 205 .
  • FIG. 2 shows an example of a drawing 60 in which a figure portion 61, a table portion 62, and a character string are described.
  • the figure portion 61 depicts "Panel No. 101" together with its parts.
  • “Panel No. 101” has gauges (e.g., meters) indicated by symbols A and XI (numbers 1 and 2), display lamps indicated by symbols LS and DS (numbers 3 and 4), and operation buttons indicated by symbols SS (numbers 5 and 6).
  • Drawing 60 may not include drawing portion 61 .
  • Character strings are written in the cells surrounded by the vertical and horizontal ruled lines of the table portion 62 .
  • the table portion 62 describes the part numbers, symbols, and contents.
  • the table portion 62 shows that the part with the number 1 and the symbol A is an instrument that indicates the degree of opening of the air volume control valve No.0. Also, it is shown that the part with symbol XI in number 2 is an instrument for indicating the circuit air volume of the reaction tank ⁇ .
  • the part with the number 3 and the symbol LS is shown to be an indicator lamp that indicates the "open-closed" state of the air agitation valve No. 0 of the anaerobic tank A.
  • the part with the symbol DS in number 4 is an indicator lamp that indicates whether the state of the air agitation valve No. 0 of the anaerobic tank A is "alarm - warning - normal". It is shown that the part with the symbol SS at number 5 is an emergency stop button.
  • the part with the symbol SS at number 6 is shown to be a test button.
  • the drawing 60 is an example, and the drawing to be read may have other description contents.
  • FIG. 3 shows, in tabular form, an example of a character string in a cell of the table portion 62 of the drawing 60 of FIG. 2 and the result of character recognition of this character string (when misrecognition is included).
  • FIG. 3 shows an example in which the character "correct (SEI)" is misrecognized as "king (OU)".
  • FIG. 4 shows, in tabular form, an example of a character string in a cell of the table portion 62 of the drawing 60 of FIG. 2 and the result of character recognition of this character string (when misrecognition is not included).
  • FIG. 5 is a functional block diagram showing the configuration of the drawing reading system 10 and the information processing device 11.
  • the drawing reading system 10 includes a layout analysis unit 100, a character recognition unit 101, a table format conversion unit 102, a replacement candidate extraction unit 103, a language model filter unit 104, a co-occurrence filter unit 105, a reliability calculation unit 108, a display control unit 109, an error pattern dictionary 131, a language model 132, and a co-occurrence dictionary 133.
  • the replacement candidate extraction unit 103, the language model filter unit 104, and the co-occurrence filter unit 105 constitute a filter processing unit 110 that performs filtering for correcting the recognized character string.
  • the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are stored, for example, in the storage device 203 of FIG.
  • the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 may be stored in a storage device of an external device (for example, a server on a network) that can communicate with the information processing device 11.
  • the layout analysis unit 100 analyzes the data I0 of the drawing 60 and divides the drawing 60 into a drawing portion 61, a table portion 62, and a character string (that is, a text portion) composed of one or more characters.
  • the character recognition unit 101 performs character recognition on the character string acquired from the layout analysis unit 100, and generates a recognized character string, which is text data obtained as a result of character recognition. Specifically, the character recognition unit 101 outputs a set of characters obtained by character recognition and position information thereof (that is, information indicating the position in the drawing 60) as a result of character recognition.
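The set of a recognized character string and its position information output by the character recognition unit can be sketched as a simple data structure. The class and field names below are illustrative assumptions, not part of the disclosure:

```python
from dataclasses import dataclass

# Hypothetical container for one character-recognition result: the
# recognized text plus its position (bounding box) in the drawing.
@dataclass
class RecognizedString:
    text: str    # text data obtained by character recognition
    bbox: tuple  # (x, y, width, height) in drawing coordinates

# The character recognition unit would output a list of such pairs.
results = [
    RecognizedString("Panel No. 101", (120, 40, 200, 24)),
    RecognizedString("A", (150, 210, 18, 18)),
]
```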
  • FIG. 6 shows, in tabular form, an example of the results of character recognition of characters included in the drawing portion 61 of the drawing 60 of FIG.
  • the table format conversion unit 102 reconstructs the table structure based on the character recognition results of the character strings recognized by the character recognition unit 101, and holds the character strings in a table format.
  • FIG. 7 shows, in tabular form, an example of character recognition results of character strings in cells of the table portion 62 of the drawing 60 of FIG.
  • FIG. 7 shows an example in which "air volume control valve No. 0" is erroneously recognized as “air volume control valve Q”, "reaction tank ⁇ ” is erroneously recognized as “reaction tank A”, and “positive (SEI)” is erroneously recognized as "king (OU)".
  • the filter processing unit 110 finds errors in character recognition and corrects the character recognition results by filtering the tabular character recognition results.
  • the filter processing unit 110 uses the error pattern dictionary 131 to perform a process of setting those that match the error pattern as replacement candidates, uses the language model 132 to perform a process of calculating the language score, and then performs a process of calculating the co-occurrence score using the co-occurrence dictionary 133.
  • the filtering unit 110 determines whether or not to modify the recognized character string based on either or both of the language score and the co-occurrence score.
  • the filter processing unit 110 uses the error pattern dictionary 131 to perform a process of setting those that match the error pattern as replacement candidates, uses the co-occurrence dictionary 133 to perform a process of calculating a co-occurrence score, and then uses the language model 132 to perform a process of calculating a language score.
  • Alternatively, the filter processing unit 110 may use the error pattern dictionary 131 to set character strings matching an error pattern as replacement candidates and use the co-occurrence dictionary 133 to calculate the co-occurrence score, without performing the language score calculation with the language model 132.
  • the information processing device 11 may not include the language model 132 and the language model filter unit 104 .
  • the filtering unit 110 determines whether or not to modify the recognized character string based on the co-occurrence score.
  • FIG. 8 shows an example of the error pattern dictionary 131 in tabular form.
  • the error pattern dictionary 131 stores information on pairs of character recognition errors and correct character strings.
  • the error pattern dictionary 131 is configured by previously collecting error patterns, which are pairs of error character strings and correct candidate character strings.
  • the replacement candidate extraction unit 103 extracts one or more correct candidate character strings paired with an error character string matching the recognized character string from the error pattern dictionary 131 as one or more replacement candidate character strings.
  • the replacement candidate extraction unit 103 uses the error pattern dictionary 131 to extract a character string that is determined to be a character recognition error, and in preparation for replacing it with a correct character string, for example, creates a correct candidate character string (also referred to as a "replacement candidate character string”) that is a candidate for the correct character string.
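As a minimal sketch of this extraction step (the function name and the example error pairs are illustrative assumptions, not entries from the actual error pattern dictionary), a recognized string can be matched against collected error patterns to produce replacement candidates:

```python
# Hypothetical error pattern dictionary: error string -> correct candidates.
ERROR_PATTERNS = {
    "Q": ["No. 0"],      # "No. 0" often misread as "Q" (example from the text)
    "king": ["correct"], # "correct (SEI)" misread as "king (OU)"
}

def extract_replacement_candidates(recognized, error_patterns):
    """Return a replacement candidate string for every correct candidate
    whose paired error string occurs inside the recognized string."""
    candidates = []
    for error, corrections in error_patterns.items():
        if error in recognized:
            for c in corrections:
                candidates.append(recognized.replace(error, c))
    return candidates
```

For example, `extract_replacement_candidates("air volume control valve Q", ERROR_PATTERNS)` yields `["air volume control valve No. 0"]`.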
  • the language model filter unit 104 calculates a first language score corresponding to the occurrence probability of the recognized character string (that is, the occurrence probability before replacement) and one or more second language scores that correspond to the occurrence probability of each of one or more replacement candidate character strings (that is, occurrence probability after replacement) based on the language model 132 that stores the word chains and the occurrence probabilities of the chains.
  • the language model 132 is a model in which word sequences are statistically described, learned from a large corpus.
  • the language model 132 is stored in the storage device 203 in advance.
  • when any one of the one or more second language scores satisfies the replacement condition C2 (the second replacement condition), namely that it is higher than the first language score and higher than a predetermined language score threshold TH2, the language model filter unit 104 outputs the replacement candidate character string having that second language score as the recognized character string; when the replacement condition C2 is not satisfied, it outputs the recognized character string as it is.
  • the above processing performed by the language model filter unit 104 is also called language model filtering.
  • Language model filtering is thus a process that raises the language score of the output recognized character string.
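The replacement condition C2 can be sketched as a small decision function. Here `lm_score` stands in for the occurrence probability given by the language model 132, and the default threshold value is an illustrative assumption:

```python
def language_model_filter(recognized, candidates, lm_score, th2=0.5):
    """Replacement condition C2: adopt the candidate whose language score
    is higher than the recognized string's score and higher than the
    threshold TH2; otherwise output the recognized string as is."""
    chosen, chosen_score = recognized, lm_score(recognized)
    for cand in candidates:
        s = lm_score(cand)
        if s > chosen_score and s > th2:
            chosen, chosen_score = cand, s
    return chosen
```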
  • FIG. 9 shows an example of the co-occurrence dictionary 133 in tabular form.
  • the co-occurrence dictionary 133 includes at least one of co-occurrence information of first co-occurrence information indicating the frequency of co-occurrence of the target character string in the target cell of the table portion 62 and other character strings in other cells of the table portion 62, and second co-occurrence information indicating the frequency of co-occurrence of the target character string in the target cell and the character string in the figure portion 61.
  • the co-occurrence dictionary 133 is a dictionary that describes co-occurrence information on top, bottom, left, and right of the table part of the drawing, and co-occurrence information with the figure part corresponding to the table part or with the character strings outside the table part, learned from a large corpus.
  • the co-occurrence dictionary 133 is stored in advance in the storage device 203 .
  • the co-occurrence dictionary 133 may be stored in a storage device of another device (for example, a server on a network) that can communicate with the information processing device 11 .
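A co-occurrence dictionary of this kind can be sketched as a table of co-occurrence counts keyed by string pairs. The entries and the normalization are illustrative assumptions, not data from the disclosure:

```python
# Hypothetical co-occurrence counts: how often a target string appears
# together with strings in neighboring cells (first co-occurrence
# information) or in the figure portion (second co-occurrence information).
CO_OCCURRENCE = {
    ("air volume control valve No. 0", "opening degree"): 12,
    ("air volume control valve Q", "opening degree"): 0,
}

def co_occurrence_score(target, context, table):
    """Sum the co-occurrence counts of the target string with every
    context string and squash the total into a 0..1 score."""
    total = sum(table.get((target, c), 0) for c in context)
    return total / (total + 1)  # simple squashing, an assumption
```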
  • the co-occurrence filter unit 105 calculates a first occurrence score corresponding to the occurrence probability of the recognized character string acquired from the language model filter unit 104 or the replacement candidate extraction unit 103 (that is, the occurrence probability before replacement) and one or more second occurrence scores that correspond to each occurrence probability of one or more replacement candidate character strings (that is, occurrence probability after replacement).
  • when any one of the one or more second occurrence scores satisfies the replacement condition C1 (the first replacement condition), namely that it is higher than the first occurrence score and higher than a predetermined occurrence score threshold TH1, the co-occurrence filter unit 105 outputs the replacement candidate character string having that second occurrence score as the recognized character string; when the replacement condition C1 is not satisfied, it outputs the recognized character string as it is.
  • the above processing performed by the co-occurrence filter unit 105 is also called co-occurrence filtering.
  • Co-occurrence filtering is thus a process that raises the co-occurrence score of the output recognized character string.
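The replacement condition C1 has the same shape as condition C2. A hedged sketch, with toy scores standing in for occurrence probabilities from the co-occurrence dictionary:

```python
def co_occurrence_filter(recognized, candidates, score, th1=0.5):
    """Replacement condition C1: adopt the candidate whose occurrence
    score is higher than the recognized string's score and higher than
    the threshold TH1; otherwise output the recognized string as is."""
    chosen, chosen_score = recognized, score(recognized)
    for cand in candidates:
        s = score(cand)
        if s > chosen_score and s > th1:
            chosen, chosen_score = cand, s
    return chosen

# Toy occurrence scores (illustrative values only).
toy = {"air volume control valve Q": 0.1, "air volume control valve No. 0": 0.9}
fixed = co_occurrence_filter("air volume control valve Q",
                             ["air volume control valve No. 0"],
                             lambda s: toy.get(s, 0.0))
# fixed == "air volume control valve No. 0"
```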
  • FIG. 10 shows an example of replacement candidate character strings (replacement candidates #1 to #3) generated by the replacement candidate extraction unit 103 in tabular form.
  • FIG. 11 shows examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the language model filter unit 104 in tabular form.
  • FIG. 12 shows examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the co-occurrence filter unit 105 in tabular form.
  • characters with a gray background indicate misrecognized characters.
  • each figure shows an example in which "No. 0" is erroneously recognized as "Q", the character "⁇" is erroneously recognized as "A", the Japanese "times (KAI)" is erroneously recognized as the symbol "□ (square)", and the Japanese "correct (SEI)" is erroneously recognized as the Japanese "king (OU)".
  • the character strings in the shaded cells are elements that have been removed.
  • the reliability calculation unit 108 integrates (for example, by summation or weighted addition) the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105 for the corrected part, and judges the reliability to be low when the final score is low and high when the final score is high.
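The score integration described above can be sketched as a weighted addition. The weights and the decision threshold below are illustrative assumptions, not values from the disclosure:

```python
def reliability(language_score, co_occurrence_score,
                w_lang=0.5, w_co=0.5, threshold=0.6):
    """Integrate the two scores by weighted addition and label the
    corrected string low-reliability when the combined score is low."""
    final = w_lang * language_score + w_co * co_occurrence_score
    return final, ("high" if final >= threshold else "low")
```

A corrected string whose language and co-occurrence scores are both high would then be labeled "high", prompting no manual check.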
  • the information processing device 11 may not include the reliability calculation unit 108 .
  • the display control unit 109 outputs image data based on the recognized character string output from the co-occurrence filter unit 105 or the recognized character string output from the language model filter unit 104 and co-occurrence filter unit 105 .
  • the display control unit 109 may issue an alert by highlighting or the like to prompt a manual check. For example, "king (OU)" in the string "alarm - warning - king-normal" for the air agitation valve No. 0 of the anaerobic tank A is the result of erroneously recognizing "correct (SEI)", so the display control unit 109 may apply highlighting such as color, brightness, or blinking.
  • FIG. 13 shows an example of character recognition results output from the display control unit 109 in a table format.
  • the result of character recognition of the character string read from the drawing is output in, for example, csv (Comma-Separated Values) format.
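Emitting the table-portion recognition results in CSV format, as mentioned above, can be sketched with the standard library. The column names and row values are illustrative:

```python
import csv
import io

# Example corrected recognition results for the table portion.
rows = [
    ["number", "symbol", "contents"],
    ["1", "A", "air volume control valve No. 0 opening degree"],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)  # serialize rows as comma-separated values
print(buf.getvalue())
```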
  • the information processing device 11 acquires the data I0 of the drawing 60 (step S11).
  • the information processing device 11 divides the drawing 60 into a drawing portion 61, a table portion 62, and a character string portion (step S12).
  • the information processing device 11 performs character recognition of the character strings in the cells of the table portion 62 to generate recognized character strings, which are text data obtained as a result of character recognition (step S13).
  • the information processing device 11 converts the table recognition data, which is the recognition data of the table portion 62, into table format data (for example, csv format) (step S14).
  • the information processing device 11 extracts one or more correct candidate character strings paired with the error character string matching the recognized character string from the error pattern dictionary 131 as one or more replacement candidate character strings (step S15).
  • the information processing device 11 executes language model filtering based on the language model 132 (step S16).
  • the information processing device 11 executes co-occurrence filtering based on the co-occurrence dictionary 133 (step S17).
  • the information processing apparatus 11 may perform the language model filtering and the co-occurrence filtering in the order opposite to that in FIG. 14, or may perform them in parallel.
  • the information processing device 11 calculates the reliability of the corrected recognized character string based on the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105 (step S18).
  • the information processing device 11 outputs image data (for example, display data for displaying a display) (step S19).
  • replacement candidate character strings are evaluated using the co-occurrence frequency (appearance frequency) between cells or the co-occurrence frequency (appearance frequency) between a cell and a figure part, so that the accuracy rate of character recognition of character strings in the cells of the table part in the drawing can be increased.
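The first replacement condition described above can be sketched as follows. This is an illustrative reading, not the patent's implementation: the score function is a stand-in for a score derived from the co-occurrence dictionary 133, the threshold value is arbitrary, and choosing the highest-scoring qualifying candidate is an assumption (the patent only requires that a second occurrence score exceed both the first occurrence score and the occurrence score threshold).

```python
def co_occurrence_filter(recognized, candidates, occurrence_score, threshold):
    """Replace the recognized string with a candidate only if that
    candidate's occurrence score is higher than both the recognized
    string's score and the threshold; otherwise keep the recognized
    string as-is (the first replacement condition)."""
    first_score = occurrence_score(recognized)
    best, best_score = recognized, first_score
    for candidate in candidates:
        second_score = occurrence_score(candidate)
        if (second_score > first_score
                and second_score > threshold
                and second_score > best_score):
            best, best_score = candidate, second_score
    return best
```

For example, a recognized string "王常" (a misread of "正常") would be replaced when "正常" co-occurs far more often with the surrounding cells, but kept unchanged when no candidate clears the threshold.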
  • FIG. 15 is a functional block diagram showing the configurations of the drawing reading system 20 and the information processing device 21 according to the second embodiment. In FIG. 15, configurations that are the same as or correspond to those shown in FIG. 5 are given the same reference numerals as in FIG. 5.
  • the information processing device 21 differs from the information processing device 11 shown in FIG. 5 in that it has the character recognition correct data 135 and an automatic knowledge acquisition unit 134.
  • the automatic knowledge acquisition unit 134 compares the corrected recognized character string, which is the result of character recognition, with the character recognition correct data 135 prepared in advance, and based on the comparison result, learns corrections to the language model 132 and to the co-occurrence dictionary 133, which indicates the frequency of co-occurrence between the target cell and surrounding cells.
  • the user operation unit 43 enables the character recognition results in tabular form displayed on the display 42 by the display control unit 109 to be corrected by the user's operation.
  • when the character recognition result includes an error, the correct character string is added by the user's operation.
  • FIG. 16 shows an example of a correction operation to a correct character string by the user operation unit 43.
  • the automatic knowledge acquisition unit 134 not only automatically extracts the difference between the character recognition result and the character recognition correct data 135 as an error pattern (candidate), but also automatically learns, from the character recognition correct data 135, a language model 132 that statistically represents chains of characters or words, and a co-occurrence dictionary 133 that represents the frequency of co-occurrence with the cells above, below, left, and right. That is, the automatic knowledge acquisition unit 134 modifies one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data.
  • the character recognition correct data 135 is character recognition correct data for the drawing.
  • the character recognition correct data 135 is stored in advance in the storage device 203 .
  • the character recognition correct data 135 may be stored in a storage device of another device (for example, a server on the network) that can communicate with the information processing device 11 .
  • FIG. 17 is a flowchart showing the operation of the information processing device 21 of the drawing reading system 20.
  • steps having the same content as the steps shown in FIG. 14 are given the same reference numerals as those shown in FIG.
  • the information processing device 21 acquires correction data input from the user operation unit 43 for the recognized character string displayed on the display 42 (step S20), and then corrects (that is, updates) any one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data.
  • the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are corrected using the information corrected by the user, so the accuracy rate of character recognition of the character strings in the cells of the table portion of the drawing can be further increased.
  • In other respects, the second embodiment is the same as the first embodiment.
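The difference extraction performed by the automatic knowledge acquisition unit 134 (comparing a recognition result against the correct data to harvest error-pattern candidates) can be sketched as below. Python's `difflib` is used here only as a stand-in for whatever string alignment the actual implementation performs.

```python
import difflib

def extract_error_patterns(recognized, correct):
    """Return (error substring, correct substring) pairs found by
    aligning a recognition result against its ground truth; each pair
    is a candidate entry for the error pattern dictionary."""
    patterns = []
    matcher = difflib.SequenceMatcher(None, recognized, correct)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # substituted span = misrecognition
            patterns.append((recognized[i1:i2], correct[j1:j2]))
    return patterns
```

For instance, aligning the result "王常" with the correct string "正常" yields the error pattern ("王", "正").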
  • FIG. 18 is a functional block diagram showing configurations of the drawing reading system 30 and the information processing device 31 according to the third embodiment.
  • the information processing device 31 differs from the information processing device 11 shown in FIG. 5 in that it has a parts number calculation unit 120, a template 141, and a parts list 142.
  • the parts number calculation unit 120 has an information extraction unit 106 and an information totalization unit 107.
  • the parts number calculation unit 120 automatically calculates the number of parts.
  • the parts number calculation unit 120 calculates the number of parts in the drawing portion 61 of the drawing 60 based on the recognized character strings obtained by co-occurrence filtering alone, or by both language model filtering and co-occurrence filtering.
  • FIG. 19 shows an example of the template 141 in tabular form.
  • a template 141 in FIG. 19 is an information extraction rule written in regular expressions or the like.
  • the template 141 of FIG. 19 describes that the symbol "〇" in the description "〇: 1 to 3" outside the table portion 62 indicates three cases, "Case 1" to "Case 3".
  • the template 141 in FIG. 19 is stored in advance in the storage device 203 in FIG.
  • FIG. 20 shows another example of the template 141 in tabular form.
  • the template in FIG. 20 is used to obtain the "number of points/number of positions", because different parts are adopted depending on the number of states a part can indicate. For example, for the description "closed-open", which is an example of such a character string, a lamp indicating the "closed" state and a lamp indicating the "open" state are required, so the number of points for the lamp is two. Also, since the lamp displays two states, the number of lamp positions is two.
  • in the template of FIG. 20, the "number of points/number of positions" of a part matched with one hyphen "-" inside the square brackets "「 」" is required to be "2".
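The hyphen-counting rule of the FIG. 20 template can be sketched as follows. The bracket notation and the regular expression are illustrative assumptions; the patent only states that the count follows from the number of hyphens in the state list.

```python
import re

def points_and_positions(label):
    """Number of points / positions for a part label: one more than
    the number of hyphens separating its states (e.g. "閉-開" → 2)."""
    # If the label carries Japanese corner brackets, count inside them.
    match = re.search(r"「([^」]*)」", label)
    states = match.group(1) if match else label
    return states.count("-") + 1
```

A "closed-open" lamp thus yields 2, and an "alarm-warning-normal" lamp yields 3.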
  • FIG. 21 shows an example of the parts list 142 in tabular form.
  • the parts list 142 stores rules for specifying parts.
  • the parts list 142 is stored in the storage device 203 in advance.
  • the parts list 142 may be stored in a storage device of another device (for example, a server on the network) that can communicate with the information processing device 11 .
  • the information aggregation unit 107 calculates the number of parts based on the information extracted by the information extraction unit 106.
  • FIG. 22 shows, in tabular form, an example of the number of parts output from the display control unit 109 of the information processing device 31. The numbers, automatically calculated from the drawing 60, indicate how many of each part are required.
  • FIG. 23 is a flow chart showing the operation of the information processing device 31 of the drawing reading system 30 according to the third embodiment.
  • steps having the same content as the steps shown in FIG. 14 are given the same reference numerals as those shown in FIG.
  • the operation of the information processing device 31 is different from the operation of FIG. 14 in that information on the number of parts is extracted based on the template 141 (step S30) and the number of parts in the parts list 142 is tallied (step S31). Otherwise, the operation of FIG. 23 is the same as that of FIG.
  • According to the drawing reading system 30, the drawing reading method, and the drawing reading program of the third embodiment, the number of parts can be counted automatically. Further, when the information processing device 31 can acquire unit price information for the parts, the information processing device 31 can automatically create a cost estimate for the parts.
  • In other respects, the third embodiment is the same as the first or second embodiment.
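When unit prices are available, the estimate mentioned above reduces to a sum of count × unit price per part. A minimal sketch (the part names and prices below are invented for illustration):

```python
def estimate_cost(part_counts, unit_prices):
    """Total estimate: sum over parts of (count × unit price).
    `part_counts` comes from the information totalization unit;
    `unit_prices` is externally supplied price data."""
    return sum(count * unit_prices[name]
               for name, count in part_counts.items())
```

For example, two lamps at 100 and one button at 50 would total 250.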

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

A drawing reading system (10) comprises a character recognition unit (101), a replacement candidate extraction unit (103), a co-occurrence filter unit (105), and a display control unit (109). The co-occurrence filter unit (105) executes processing for calculating a first occurrence score corresponding to an occurrence probability of a recognized string and one or more second occurrence scores corresponding to occurrence probabilities of one or more respective replacement candidate strings, on the basis of a co-occurrence dictionary (133) composed of at least one among first co-occurrence information indicating the frequency with which a target string in a target cell in a table section (62) and another string in another cell co-occur and second co-occurrence information indicating the frequency with which the target string in the target cell and a string in a drawing section (61) co-occur, and for, if one of the second occurrence scores satisfies a first replacement condition of being higher than the first occurrence score and higher than an occurrence score threshold, outputting the replacement candidate string having said second occurrence score as the recognized string, and outputting the recognized string if none of the second occurrence scores satisfy the first replacement condition.

Description

Drawing reading system, drawing reading method, and drawing reading program
The present disclosure relates to a drawing reading system, a drawing reading method, and a drawing reading program.
There is a proposal for a device that executes character recognition by correcting errors in a recognized character string by matching the recognized character string, represented by a lattice of candidate characters, with a knowledge dictionary that manages words in a specific field (see, for example, Patent Document 1). This device comprises generating means for generating words not managed by the knowledge dictionary by combining constituent characters of words managed by the knowledge dictionary, measuring means for measuring the distance between the recognized character string and the words managed by the knowledge dictionary and the distance between the recognized character string and the words generated by the generating means, and correcting means for correcting errors in the recognized character string based on the measured distances.
Japanese Patent Application Laid-Open No. 11-175664 (see, for example, abstract)
However, when the above conventional device performs character recognition of the characters in each cell of a table in a drawing (hereinafter also referred to as a "table portion"), there is a problem that the accuracy rate of character recognition is low because the character string in each cell is short.
The present disclosure has been made to solve the conventional problem described above, and aims to provide a drawing reading system, a drawing reading method, and a drawing reading program that make it possible to increase the accuracy rate of character recognition of character strings in cells of a table portion in a drawing.
The drawing reading system of the present disclosure comprises: a layout analysis unit that analyzes data of a drawing to be read and divides the drawing into a figure portion, a table portion, and character strings each composed of one or more characters; a character recognition unit that performs character recognition on a character string to generate a recognized character string, which is text data obtained as a result of the character recognition; a replacement candidate extraction unit that extracts, from an error pattern dictionary constructed by collecting in advance error patterns that are pairs of an error character string and a correct candidate character string, one or more correct candidate character strings paired with the error character string matching the recognized character string, as one or more replacement candidate character strings; a co-occurrence filter unit that, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating the frequency with which a target character string in a target cell of the table portion and another character string in another cell of the table portion co-occur and second co-occurrence information indicating the frequency with which the target character string in the target cell and a character string in the figure portion co-occur, calculates a first occurrence score corresponding to the occurrence probability of the recognized character string and one or more second occurrence scores corresponding to the occurrence probabilities of the respective one or more replacement candidate character strings, and executes co-occurrence filtering, which is a process of outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold when any of the one or more second occurrence scores satisfies the first replacement condition, and outputting the recognized character string as-is when the first replacement condition is not satisfied; and a display control unit that outputs image data based on the recognized character string output from the co-occurrence filter unit.
The drawing reading method of the present disclosure is a method executed by an information processing device, and comprises the steps of: analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and character strings each composed of one or more characters; performing character recognition on a character string to generate a recognized character string, which is text data obtained as a result of the character recognition; extracting, from an error pattern dictionary constructed by collecting in advance error patterns that are pairs of an error character string and a correct candidate character string, one or more correct candidate character strings paired with the error character string matching the recognized character string, as one or more replacement candidate character strings; based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating the frequency with which a target character string in a target cell of the table portion and another character string in another cell of the table portion co-occur and second co-occurrence information indicating the frequency with which the target character string in the target cell and a character string in the figure portion co-occur, calculating a first occurrence score corresponding to the occurrence probability of the recognized character string and one or more second occurrence scores corresponding to the occurrence probabilities of the respective one or more replacement candidate character strings, and executing co-occurrence filtering, which is a process of outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold when any of the one or more second occurrence scores satisfies the first replacement condition, and outputting the recognized character string as-is when the first replacement condition is not satisfied; and outputting image data based on the recognized character string on which the co-occurrence filtering has been executed.
By using the drawing reading system, the drawing reading method, and the drawing reading program of the present disclosure, it is possible to increase the accuracy rate of character recognition of character strings in cells of table portions in drawings.
FIG. 1 is a configuration diagram showing an example of the hardware configuration of the drawing reading system according to Embodiment 1.
FIG. 2 shows an example of a drawing in which a figure portion, a table portion, and character strings are described.
FIG. 3 shows a character string in a cell of the table portion of the drawing of FIG. 2 and an example of the result of character recognition of this character string (including misrecognition).
FIG. 4 shows a character string in a cell of the table portion of the drawing of FIG. 2 and an example of the result of character recognition of this character string (not including misrecognition).
FIG. 5 is a functional block diagram showing the configurations of the drawing reading system and the information processing device according to Embodiment 1.
FIG. 6 shows, in tabular form, an example of the results of character recognition, by the character recognition unit of the drawing reading system according to Embodiment 1, of characters included in the figure portion of the drawing of FIG. 2.
FIG. 7 shows, in tabular form, an example of the results of character recognition, by the character recognition unit of the drawing reading system according to Embodiment 1, of character strings in cells of the table portion of the drawing of FIG. 2.
FIG. 8 shows, in tabular form, an example of the error pattern dictionary of the drawing reading system according to Embodiment 1.
FIG. 9 shows, in tabular form, an example of the co-occurrence dictionary of the drawing reading system according to Embodiment 1.
FIG. 10 shows, in tabular form, an example of replacement candidate character strings generated by the replacement candidate extraction unit of the drawing reading system according to Embodiment 1.
FIG. 11 shows, in tabular form, an example of replacement candidate character strings generated by the language model filter unit of the drawing reading system according to Embodiment 1.
FIG. 12 shows, in tabular form, an example of replacement candidate character strings generated by the co-occurrence filter unit of the drawing reading system according to Embodiment 1.
FIG. 13 shows, in tabular form, an example of text data, which is the result of character recognition output from the display control unit of the drawing reading system according to Embodiment 1.
FIG. 14 is a flowchart showing the operation of the information processing device of the drawing reading system according to Embodiment 1.
FIG. 15 is a functional block diagram showing the configurations of the drawing reading system and the information processing device according to Embodiment 2.
FIG. 16 shows, in tabular form, an example of an input operation by the user operation unit of the drawing reading system according to Embodiment 2.
FIG. 17 is a flowchart showing the operation of the drawing reading system according to Embodiment 2.
FIG. 18 is a functional block diagram showing the configurations of the drawing reading system and the information processing device according to Embodiment 3.
FIG. 19 shows, in tabular form, an example of the template of the drawing reading system according to Embodiment 3.
FIG. 20 shows, in tabular form, another example of the template of the drawing reading system according to Embodiment 3.
FIG. 21 shows, in tabular form, an example of the parts list of the drawing reading system according to Embodiment 3.
FIG. 22 shows, in tabular form, an example of the number of parts output from the display control unit of the drawing reading system according to Embodiment 3.
FIG. 23 is a flowchart showing the operation of the drawing reading system according to Embodiment 3.
A drawing reading system, a drawing reading method, and a drawing reading program according to the embodiments will be described below. The following embodiments are merely examples, and the embodiments can be combined as appropriate and each embodiment can be modified as appropriate.
The drawing reading system according to the embodiment can perform character recognition of character strings in cells of table portions in drawings to be read. A drawing may be either a drawing drawn on paper or a drawing converted to electronic data (for example, Portable Document Format (PDF) data). The drawings include, for example, mechanical drawings, architectural drawings, civil engineering drawings, piping drawings, electrical drawings, maps, and layout drawings. In general, a drawing contains a figure portion, a table portion, and character strings. However, the figure portion and the table portion may be drawn on different pages. Figures are drawn in the figure portion. The figures include, for example, devices (e.g., panels such as distribution boards and switchboards), buildings, and electrical devices (e.g., meters, operation buttons, indicator lamps). The figure portion may contain a character string. The table portion has a plurality of ruled lines that are boundaries of a plurality of cells. A character string is written in a cell of the table portion. A character string consists of one or more characters. A character is an element that can be represented by text data. Characters include symbols.
The drawing reading system according to the embodiment acquires data of a drawing in which a character string is drawn, executes character recognition on the character string drawn in this drawing, and outputs a recognized character string, which is text data corresponding to the character string, as a result of character recognition. The drawing reading system according to the embodiment can improve the accuracy rate of character recognition of character strings in cells of table portions in drawings. The drawing reading method according to the embodiment can be implemented by an information processing device of a drawing reading system. Further, the drawing reading program according to the embodiment is executed by, for example, a computer as an information processing device.
Embodiment 1.
FIG. 1 shows an example of the hardware configuration of a drawing reading system 10 according to the first embodiment. The drawing reading system 10 has an information processing device 11. The information processing device 11 is, for example, a computer. The information processing device 11 is a device capable of implementing the drawing reading method according to the first embodiment. The information processing device 11 acquires data I0 representing a drawing 60 in which a figure portion 61 and a table portion 62 are drawn, character-recognizes a character string in a cell of the table portion 62, and outputs the result of character recognition as output data. The output data includes, for example, text data indicating the characters corresponding to the character strings in the cells of the table portion 62.
The drawing reading system 10 has an image scanner 41, a display 42, and a user operation unit 43. The image scanner 41 is an image reading device that optically reads the drawing 60 and provides the data I0 of the drawing 60 to the information processing device 11. The display 42 displays an image based on image data output from the information processing device 11. The user operation unit 43 receives user operations and provides information input by the user to the information processing device 11. Note that the image scanner 41, the display 42, and the user operation unit 43 need not be part of the drawing reading system 10. The image scanner 41, display 42, and user operation unit 43 may be an image scanner, display, and user operation unit provided in an external device (for example, a server on a network) that can communicate with the information processing device 11.
The information processing device 11 has a processor 201, a memory 202 that is a volatile storage device, a nonvolatile storage device 203 such as a hard disk drive (HDD) or solid state drive (SSD), a communication device 204 that communicates with external devices, and an interface 205. The memory 202 is, for example, a semiconductor memory such as a RAM (Random Access Memory).
Each function of the information processing device 11 is realized by a processing circuit. The processing circuit may be dedicated hardware or may be the processor 201 that executes a program (for example, a drawing reading program) stored in the memory 202. The processor 201 may be any of a processing device, an arithmetic device, a microprocessor, a microcomputer, and a DSP (Digital Signal Processor).
When the processing circuit is dedicated hardware, the processing circuit is, for example, an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
When the processing circuit is the processor 201, the drawing reading method is executed by software, firmware, or a combination of software and firmware. Software and firmware are written as programs and stored in the memory 202. The processor 201 can implement the drawing reading method according to the first embodiment by reading and executing the program stored in the memory 202.
It should be noted that the information processing device 11 may be partially implemented by dedicated hardware and partially implemented by software or firmware. As such, the processing circuitry may implement each of the functions described above in hardware, software, firmware, or any combination thereof.
The interface 205 is used to communicate with other devices. The image scanner 41, the display 42, and the user operation unit 43 are connected to the interface 205.
FIG. 2 shows an example of a drawing 60 in which a figure portion 61, a table portion 62, and character strings are described. The figure portion 61 is "Panel No. 101", which has parts. "Panel No. 101" has gauges (e.g., meters) indicated by the symbols A and XI (numbers 1 and 2), display lamps indicated by the symbols LS and DS (numbers 3 and 4), and operation buttons indicated by the symbol SS (numbers 5 and 6). The drawing 60 may not include the figure portion 61. Character strings are written in the cells surrounded by the vertical and horizontal ruled lines of the table portion 62. The table portion 62 describes the part numbers, symbols, and contents. The table portion 62 shows that the part with the number 1 and the symbol A is a gauge indicating the opening degree of air volume control valve No. 〇. The part with the number 2 and the symbol XI is a gauge indicating the circuit air volume of reaction tank △. The part with the number 3 and the symbol LS is a display lamp indicating the "open-closed" state of air agitation valve No. 〇 of anaerobic tank A. The part with the number 4 and the symbol DS is a display lamp indicating whether the state of air agitation valve No. 〇 of anaerobic tank A is "alarm", "warning", or "normal". The part with the number 5 and the symbol SS is an emergency stop button. The part with the number 6 and the symbol SS is a test button. The drawing 60 is an example, and a drawing to be read may have other contents.
 FIG. 3 shows, in tabular form, an example of character strings in the cells of the table portion 62 of the drawing 60 of FIG. 2 and the results of character recognition of these character strings when misrecognition is included. FIG. 3 shows an example in which "正 (SEI)" is recognized as "王 (OU)", that is, misrecognition has occurred. FIG. 4 shows, in tabular form, an example of character strings in the cells of the table portion 62 of the drawing 60 of FIG. 2 and the results of character recognition when no misrecognition is included.
 FIG. 5 is a functional block diagram showing the configuration of the drawing reading system 10 and the information processing device 11. The drawing reading system 10 includes a layout analysis unit 100, a character recognition unit 101, a table format conversion unit 102, a replacement candidate extraction unit 103, a language model filter unit 104, a co-occurrence filter unit 105, a reliability calculation unit 108, a display control unit 109, an error pattern dictionary 131, a language model 132, and a co-occurrence dictionary 133. The replacement candidate extraction unit 103, the language model filter unit 104, and the co-occurrence filter unit 105 constitute a filter processing unit 110 that performs filtering for correcting the recognized character string. The error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are stored, for example, in the storage device 203 of FIG. 1. They may instead be stored in a storage device of an external device that can communicate with the information processing device 11 (for example, a server on a network).
 The layout analysis unit 100 analyzes the data I0 of the drawing 60 and divides the drawing 60 into a figure portion 61, a table portion 62, and character strings (that is, text portions) each composed of one or more characters.
 The character recognition unit 101 performs character recognition on the character strings acquired from the layout analysis unit 100 and generates recognized character strings, which are text data obtained as a result of the character recognition. Specifically, the character recognition unit 101 outputs, as the result of character recognition, sets of a recognized character and its position information (that is, information indicating its position in the drawing 60). FIG. 6 shows, in tabular form, an example of the results of character recognition, by the character recognition unit 101 of the drawing reading system 10, of the characters included in the figure portion 61 of the drawing 60 of FIG. 2.
 The table format conversion unit 102 reconstructs the table structure based on the character recognition results of the character strings recognized by the character recognition unit 101 and holds the character strings in table form. FIG. 7 shows, in tabular form, an example of the results of character recognition, by the character recognition unit 101 of the drawing reading system 10, of the character strings in the cells of the table portion 62 of the drawing 60 of FIG. 2. FIG. 7 shows an example in which "air volume control valve No. ○" is misrecognized as "air volume control valve No. Q", "reaction tank △" is misrecognized as "reaction tank A", and "正 (SEI)" is misrecognized as "王 (OU)".
 The filter processing unit 110 finds character recognition errors and corrects the character recognition results by filtering the tabular character recognition results. The filter processing unit 110 uses the error pattern dictionary 131 to treat strings matching an error pattern as replacement candidates, uses the language model 132 to calculate language scores, and then uses the co-occurrence dictionary 133 to calculate co-occurrence scores. The filter processing unit 110 determines whether or not to correct the recognized character string based on either or both of the language score and the co-occurrence score.
 Alternatively, the filter processing unit 110 may use the error pattern dictionary 131 to treat strings matching an error pattern as replacement candidates, use the co-occurrence dictionary 133 to calculate co-occurrence scores, and then use the language model 132 to calculate language scores.
 The filter processing unit 110 may also use the error pattern dictionary 131 to treat strings matching an error pattern as replacement candidates and use the co-occurrence dictionary 133 to calculate co-occurrence scores without using the language model 132 to calculate language scores. In other words, the information processing device 11 may omit the language model 132 and the language model filter unit 104. In this case, the filter processing unit 110 determines whether or not to correct the recognized character string based on the co-occurrence score.
 FIG. 8 shows an example of the error pattern dictionary 131 in tabular form. The error pattern dictionary 131 stores pairs of a character recognition error and a correct character string. It is constructed by collecting in advance error patterns, each being a pair of an error character string and a correct candidate character string. The replacement candidate extraction unit 103 extracts, from the error pattern dictionary 131, one or more correct candidate character strings paired with an error character string matching the recognized character string, as one or more replacement candidate character strings. That is, the replacement candidate extraction unit 103 uses the error pattern dictionary 131 to extract character strings judged to be character recognition errors and, in preparation for replacing them with correct character strings, creates correct candidate character strings (also referred to as "replacement candidate character strings").
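As a rough illustration, the lookup performed by the replacement candidate extraction unit 103 can be sketched as a scan over error-pattern pairs. The dictionary entries below are hypothetical stand-ins for the contents of the error pattern dictionary 131 (FIG. 8), not its actual data:

```python
# Hedged sketch of the replacement-candidate extraction step (unit 103).
# Each entry maps an error substring to one or more correct candidate
# substrings; the entries here are illustrative only.
ERROR_PATTERNS = {
    "Q": ["O", "○"],   # "Q" as a possible misread of a circle mark
    "王": ["正"],       # "王 (OU)" as a possible misread of "正 (SEI)"
}

def extract_replacement_candidates(recognized: str) -> list[str]:
    """Return candidate strings with each matching error substring replaced."""
    candidates = []
    for error, corrections in ERROR_PATTERNS.items():
        if error in recognized:
            for correct in corrections:
                candidates.append(recognized.replace(error, correct))
    return candidates
```

A recognized string that matches no error pattern yields an empty candidate list, in which case the later filtering stages leave it unchanged.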
 The language model filter unit 104 calculates, based on the language model 132 in which word chains and their appearance probabilities are stored, a first language score corresponding to the occurrence probability of the recognized character string (that is, the occurrence probability before replacement) and one or more second language scores corresponding to the occurrence probabilities of the one or more replacement candidate character strings (that is, the occurrence probabilities after replacement). The language model 132 is a model, learned from a large corpus, that statistically describes word chains. The language model 132 is stored in the storage device 203 in advance. When any of the one or more second language scores satisfies a replacement condition C2 (second replacement condition) of being higher than the first language score and higher than a predetermined language score threshold TH2, the language model filter unit 104 outputs, as the recognized character string, the replacement candidate character string of the second language score satisfying the replacement condition C2; when the replacement condition C2 is not satisfied, it outputs the recognized character string as-is. The above processing performed by the language model filter unit 104 is also referred to as language model filtering. Language model filtering is processing that raises the language model score.
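A minimal sketch of replacement condition C2, assuming a caller-supplied scoring function in place of the actual language model 132; the scoring function and threshold value are illustrative:

```python
def language_model_filter(recognized, candidates, score_fn, threshold):
    """Return the candidate with the highest language score that exceeds
    both the recognized string's score and the threshold (condition C2);
    otherwise return the recognized string unchanged."""
    best, best_score = recognized, score_fn(recognized)
    base = best_score  # first language score (before replacement)
    for candidate in candidates:
        score = score_fn(candidate)  # second language score (after replacement)
        if score > base and score > threshold and score > best_score:
            best, best_score = candidate, score
    return best
```

With an empty candidate list, or when no candidate clears both bars, the recognized string passes through unchanged, matching the behavior described above.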
 FIG. 9 shows an example of the co-occurrence dictionary 133 in tabular form. The co-occurrence dictionary 133 is composed of at least one of first co-occurrence information, which indicates the frequency with which a target character string in a target cell of the table portion 62 and another character string in another cell of the table portion 62 co-occur, and second co-occurrence information, which indicates the frequency with which the target character string in the target cell and a character string in the figure portion 61 co-occur. Specifically, the co-occurrence dictionary 133 is a dictionary, learned from a large corpus, that describes co-occurrence information between a cell of the table portion of a drawing and the cells above, below, to the left, and to the right of it, as well as co-occurrence information with the figure portion corresponding to the table portion or with character strings outside the table portion. The co-occurrence dictionary 133 is stored in the storage device 203 in advance. It may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 Based on the co-occurrence dictionary 133, the co-occurrence filter unit 105 calculates a first occurrence score corresponding to the occurrence probability of the recognized character string acquired from the language model filter unit 104 or the replacement candidate extraction unit 103 (that is, the occurrence probability before replacement) and one or more second occurrence scores corresponding to the occurrence probabilities of the one or more replacement candidate character strings (that is, the occurrence probabilities after replacement). When any of the one or more second occurrence scores satisfies a replacement condition C1 (first replacement condition) of being higher than the first occurrence score and higher than a predetermined occurrence score threshold TH1, the co-occurrence filter unit 105 outputs, as the recognized character string, the replacement candidate character string of the second occurrence score satisfying the replacement condition C1; when the replacement condition C1 is not satisfied, it outputs the recognized character string as-is. The above processing performed by the co-occurrence filter unit 105 is also referred to as co-occurrence filtering. Co-occurrence filtering is processing that raises the co-occurrence score.
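The co-occurrence scoring can be sketched in the same way. Here the co-occurrence dictionary 133 is stood in for by a plain mapping from (target, context) string pairs to counts, which is an assumption made for illustration rather than its actual format:

```python
def cooccurrence_score(target, context_strings, cooc_counts):
    """Sum the co-occurrence counts between the target string and every
    context string (neighboring cells and figure-portion strings)."""
    return sum(cooc_counts.get((target, ctx), 0) for ctx in context_strings)

def cooccurrence_filter(recognized, candidates, context, cooc_counts, threshold):
    """Apply replacement condition C1: adopt the best-scoring candidate only
    if it beats both the recognized string's score and the threshold TH1."""
    best, best_score = recognized, cooccurrence_score(recognized, context, cooc_counts)
    base = best_score  # first occurrence score (before replacement)
    for candidate in candidates:
        score = cooccurrence_score(candidate, context, cooc_counts)
        if score > base and score > threshold and score > best_score:
            best, best_score = candidate, score
    return best
```

The `context` argument would be drawn from the surrounding cells of the table portion 62 and, where available, the character strings of the figure portion 61.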
 FIG. 10 shows, in tabular form, examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the replacement candidate extraction unit 103. FIG. 11 shows, in tabular form, examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the language model filter unit 104. FIG. 12 shows, in tabular form, examples of replacement candidate character strings (replacement candidates #1 to #3) generated by the co-occurrence filter unit 105.
 In each figure, characters on a gray background indicate misrecognized characters. Specifically, each figure shows examples in which "○" is misrecognized as "Q", "△" as "A", the Japanese character "回 (KAI)" as the symbol "□" (square), "A" as "△", and the Japanese character "正 (SEI)" as the Japanese character "王 (OU)". In each figure, the character strings in the shaded cells are elements that have been removed.
 For the error-corrected portion, the reliability calculation unit 108 combines (for example, by addition or weighted addition) the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105, and judges the result to be of low reliability when the final score is low and of high reliability when the score is high. The information processing device 11 may omit the reliability calculation unit 108.
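A minimal sketch of the score combination, assuming weighted addition with illustrative weights and an illustrative review threshold (the actual weighting scheme is left open in the text):

```python
def reliability(language_score, cooc_score, w_lang=0.5, w_cooc=0.5):
    """Combine the language score and the co-occurrence score by weighted
    addition; a low combined value indicates a low-reliability result."""
    return w_lang * language_score + w_cooc * cooc_score

def needs_manual_check(language_score, cooc_score, threshold=0.5):
    """Flag a correction for manual review when its combined score is low."""
    return reliability(language_score, cooc_score) < threshold
```

The `needs_manual_check` flag corresponds to the alerting behavior of the display control unit 109 described below.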
 The display control unit 109 outputs image data based on the recognized character string output from the co-occurrence filter unit 105, or on the recognized character string output from the language model filter unit 104 and the co-occurrence filter unit 105. When displaying the results of recognition error correction, if the score calculated by the reliability calculation unit 108 is low, the display control unit 109 may issue an alert, such as a highlighted display, to prompt a manual check. For example, since "王 (OU)" in "Mixing tank A material agitation valve No. ○ 'alarm-warning-王常'" is the result of misrecognizing "正 (SEI)", the display control unit 109 may highlight it by color, brightness, blinking, or the like.
 FIG. 13 shows, in tabular form, an example of the character recognition results output from the display control unit 109. The tabular data, that is, the character recognition results of the character strings read from the drawing, is output in, for example, CSV (Comma-Separated Values) format.
 FIG. 14 is a flowchart showing the operation of the information processing device 11 of the drawing reading system 10. First, the information processing device 11 acquires the data I0 of the drawing 60 (step S11). Next, the information processing device 11 divides the drawing 60 into a figure portion 61, a table portion 62, and character string portions (step S12). Next, the information processing device 11 performs character recognition on the character strings in the cells of the table portion 62 and generates recognized character strings, which are text data obtained as a result of the character recognition (step S13). At this time, character recognition is also performed on character strings outside the cells of the table portion 62. Next, the information processing device 11 converts the table recognition data, which is the recognition data of the table portion 62, into tabular data (for example, in CSV format) (step S14).
 Next, the information processing device 11 extracts, from the error pattern dictionary 131, one or more correct candidate character strings paired with an error character string matching the recognized character string, as one or more replacement candidate character strings (step S15). The information processing device 11 executes language model filtering based on the language model 132 (step S16). The information processing device 11 executes co-occurrence filtering based on the co-occurrence dictionary 133 (step S17). The information processing device 11 may perform the language model filtering and the co-occurrence filtering in the reverse of the order shown in FIG. 14, or may perform them in parallel.
 Next, the information processing device 11 calculates the reliability of the corrected recognized character string based on the language score obtained by the language model filter unit 104 and the co-occurrence score obtained by the co-occurrence filter unit 105 (step S18). The information processing device 11 then outputs image data (for example, display data for a display) (step S19).
 As described above, with the drawing reading system, drawing reading method, and drawing reading program according to Embodiment 1, replacement candidate character strings are evaluated using the frequency of co-occurrence (appearance frequency) between cells or between a cell and the figure portion, so the accuracy of character recognition of character strings in the cells of the table portion of a drawing can be increased.
Embodiment 2.
 FIG. 15 is a functional block diagram showing the configurations of the drawing reading system 20 and the information processing device 21 according to Embodiment 2. In FIG. 15, configurations identical or corresponding to those shown in FIG. 5 are given the same reference numerals as in FIG. 5. The information processing device 21 differs from the information processing device 11 shown in FIG. 5 in that it has a storage unit for character recognition correct-answer data 135 and an automatic knowledge acquisition unit 134. Specifically, the automatic knowledge acquisition unit 134 compares the result of correcting the recognized character string obtained by character recognition with character recognition correct-answer data prepared in advance and, based on the result of the comparison, learns the corrections, the language model 132, and a co-occurrence dictionary indicating the frequency of co-occurrence between a target cell and its surrounding cells.
 The user operation unit 43 allows the tabular character recognition results displayed on the display 42 by the display control unit 109 to be corrected by user operation. A correct character string is a correct string assigned by user operation when a character recognition error was included. FIG. 16 shows an example of a correction operation to a correct character string by means of the user operation unit 43.
 The automatic knowledge acquisition unit 134 not only automatically extracts the differences between the character recognition results and the character recognition correct-answer data 135 as error pattern candidates, but also automatically learns, from the character recognition correct-answer data 135, the language model 132, which statistically represents chains of characters or words, and the co-occurrence dictionary 133, which represents the frequency of co-occurrence with the cells above, below, to the left, and to the right. That is, the automatic knowledge acquisition unit 134 corrects one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data. The character recognition correct-answer data 135 is correct-answer data for character recognition of drawings. The character recognition correct-answer data 135 is stored in the storage device 203 in advance. It may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 FIG. 17 is a flowchart showing the operation of the information processing device 21 of the drawing reading system 20. In FIG. 17, steps with the same content as the steps shown in FIG. 14 are given the same reference numerals as in FIG. 14. The information processing device 21 acquires the correction data input from the user operation unit 43 for the recognized character strings displayed on the display 42 (step S20), and then corrects (that is, updates) one or more of the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 based on the correction data.
 As described above, with the drawing reading system 20, drawing reading method, and drawing reading program according to Embodiment 2, the error pattern dictionary 131, the language model 132, and the co-occurrence dictionary 133 are corrected using information corrected by the user, so the accuracy of character recognition of character strings in the cells of the table portion of a drawing can be further increased.
 In all other respects, Embodiment 2 is the same as Embodiment 1.
Embodiment 3.
 FIG. 18 is a functional block diagram showing the configurations of the drawing reading system 30 and the information processing device 31 according to Embodiment 3. In FIG. 18, configurations identical or corresponding to those shown in FIG. 5 are given the same reference numerals as in FIG. 5. The information processing device 31 differs from the information processing device 11 shown in FIG. 5 in that it has a parts count calculation unit 120, a template 141, and a parts list 142. The parts count calculation unit 120 has an information extraction unit 106 and an information aggregation unit 107.
 The parts count calculation unit 120 automatically calculates the number of parts. The parts count calculation unit 120 calculates the number of parts in the figure portion 61 of the drawing 60 based on the recognized character strings obtained by co-occurrence filtering and language model filtering, or by co-occurrence filtering alone.
 FIG. 19 shows an example of the template 141 in tabular form. The template 141 of FIG. 19 is an information extraction rule written in, for example, regular expressions. The template 141 of FIG. 19 describes that the symbol "○" in the description "○: 1 to 3" outside the table portion 62 indicates the three cases "case 1 to case 3". The template 141 of FIG. 19 is stored in advance in the storage device 203 of FIG. 1. The template 141 may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 FIG. 20 shows another example of the template 141 in tabular form. Because different parts are adopted depending on the number of states a part can take, the template of FIG. 20 is used to obtain the "points/number of positions". For example, given the example character string "closed-open", a lamp indicating the "closed" state and a lamp indicating the "open" state are needed, so the point count for the lamps is 2. Also, since the lamps display two states, the number of lamp positions is 2. By applying the template, the "points and number of positions" of a matched part are obtained from the number of hyphens "-" inside the brackets「 」, giving 2 in this case. Also, in FIG. 20, "number of machines/number of units" appears as a single row in the table, but it actually indicates that multiple machines (that is, a plural "number of machines") are required and that each machine requires multiple panels (that is, a plural "number of units"). In Embodiment 3, since ○ in "number of cases of ○" is 3, the "number of machines/number of units" is 3.
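The hyphen-counting rule for "points/number of positions" can be sketched as follows; the helper name is hypothetical:

```python
def state_count(label: str) -> int:
    """Number of states in a hyphen-delimited state label; by the template
    of FIG. 20, this equals both the point count and the position count."""
    return len(label.split("-"))
```

For example, a label with one hyphen, such as "closed-open", yields 2, matching the lamp example above.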
 FIG. 21 shows an example of the parts list 142 in tabular form. The parts list 142 stores rules for identifying parts. The parts list 142 is stored in the storage device 203 in advance. It may instead be stored in a storage device of another device that can communicate with the information processing device 11 (for example, a server on a network).
 The information aggregation unit 107 calculates the number of parts based on the information extracted by the information extraction unit 106. FIG. 22 shows, in tabular form, an example of the part counts output from the display control unit 109 of the information processing device 31. Each count is a value automatically calculated from the drawing 60 indicating how many of each part are required.
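The aggregation step can be sketched as a tally over (part, count) pairs produced by the information extraction unit 106; the pair-based input format is an assumption made for illustration:

```python
from collections import Counter

def tally_parts(extracted_rows):
    """Sum the required quantity of each part over all extracted rows,
    where each row is a (part_name, count) pair."""
    totals = Counter()
    for part_name, count in extracted_rows:
        totals[part_name] += count
    return dict(totals)
```

The resulting per-part totals correspond to the counts shown in FIG. 22.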
 FIG. 23 is a flowchart showing the operation of the information processing device 31 of the drawing reading system 30 according to Embodiment 3. In FIG. 23, steps with the same content as the steps shown in FIG. 14 are given the same reference numerals as in FIG. 14. The operation of the information processing device 31 differs from the operation of FIG. 14 in that information on the number of parts is extracted based on the template 141 (step S30) and the part counts are aggregated using the parts list 142 (step S31). In other respects, the operation of FIG. 23 is the same as that of FIG. 14.
 As described above, with the drawing reading system 30, drawing reading method, and drawing reading program according to Embodiment 3, the number of parts can be aggregated automatically. Further, when the information processing device 31 can acquire unit price information for the parts, the information processing device 31 can automatically create an estimate for the parts.
 In all other respects, Embodiment 3 is the same as Embodiment 1 or 2.
 10, 20, 30 drawing reading system, 11, 21, 31 information processing device, 41 image scanner, 42 display, 43 user operation unit, 60 drawing, 61 figure portion, 62 table portion, 100 layout analysis unit, 101 character recognition unit, 102 table format conversion unit, 103 replacement candidate extraction unit, 104 language model filter unit, 105 co-occurrence filter unit, 106 information extraction unit, 107 information aggregation unit, 108 reliability calculation unit, 109 display control unit, 110 filter processing unit, 120 parts number calculation unit, 131 error pattern dictionary, 132 language model, 133 co-occurrence dictionary, 134 automatic knowledge acquisition unit, 135 character recognition correct-answer data, 141 template, 142 parts list, 201 processor, 202 memory, 203 storage device, 204 communication device, 205 interface.

Claims (14)

  1.  A drawing reading system comprising:
     a layout analysis unit that analyzes data of a drawing to be read and divides the drawing into a figure portion, a table portion, and character strings each composed of one or more characters;
     a character recognition unit that performs character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition;
     a replacement candidate extraction unit that extracts, as one or more replacement candidate character strings, one or more correct-answer candidate character strings paired with an error character string that matches the recognized character string, from an error pattern dictionary constructed by collecting in advance error patterns, each of which is a pair of an error character string and a correct-answer candidate character string;
     a co-occurrence filter unit that performs co-occurrence filtering, which is a process of calculating, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating a frequency with which a target character string in a target cell of the table portion co-occurs with another character string in another cell of the table portion and second co-occurrence information indicating a frequency with which the target character string in the target cell co-occurs with a character string in the figure portion, a first occurrence score corresponding to an occurrence probability of the recognized character string and one or more second occurrence scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition when any of the one or more second occurrence scores satisfies the first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and otherwise outputting the recognized character string as it is; and
     a display control unit that outputs image data based on the recognized character string output from the co-occurrence filter unit.
  2.  The drawing reading system according to claim 1, further comprising a language model filter unit that performs language model filtering, which is a process of calculating, based on a language model in which chains of words and appearance probabilities of the chains are stored, a first language score corresponding to the occurrence probability of the recognized character string and one or more second language scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second language score satisfies a second replacement condition when there is a second language score satisfying the second replacement condition of being higher than the first language score and higher than a predetermined language score threshold, and otherwise outputting the recognized character string as it is,
     wherein the co-occurrence filter unit performs the co-occurrence filtering on the recognized character string output from the language model filter unit.
  3.  The drawing reading system according to claim 1, further comprising a language model filter unit that performs language model filtering, which is a process of calculating, based on a language model in which chains of words and appearance probabilities of the chains are stored, a first language score corresponding to the occurrence probability of the recognized character string and one or more second language scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second language score satisfies a second replacement condition when there is a second language score satisfying the second replacement condition of being higher than the first language score and higher than a predetermined language score threshold, and otherwise outputting the recognized character string as it is,
     wherein the language model filter unit performs the language model filtering on the recognized character string output from the co-occurrence filter unit, and
     the display control unit displays an image based on image data based on the recognized character string output from the language model filter unit.
  4.  The drawing reading system according to any one of claims 1 to 3, wherein the other character string is a character string in any one or more of the cells above, below, to the left of, and to the right of the target cell.
  5.  The drawing reading system according to any one of claims 1 to 4, wherein the other character string is a character string in any one or more cells adjacent to the target cell.
  6.  The drawing reading system according to any one of claims 1 to 5, wherein the character string in the figure portion includes a symbol indicating a type of a member drawn in the figure portion.
  7.  The drawing reading system according to any one of claims 1 to 6, wherein a device and a component provided in the device are drawn in the figure portion, and
     the character string in the figure portion includes a symbol indicating a type of the component.
  8.  The drawing reading system according to any one of claims 1 to 7, further comprising:
     a display that displays an image based on the recognized character string output from the display control unit;
     a user operation unit into which correction data for the recognized character string is input; and
     an automatic knowledge acquisition unit that corrects the error pattern dictionary and the co-occurrence dictionary based on the correction data.
  9.  The drawing reading system according to claim 2 or 3, further comprising:
     a display that displays an image based on the recognized character string output from the display control unit;
     a user operation unit into which correction data for the recognized character string is input; and
     an automatic knowledge acquisition unit that corrects the error pattern dictionary, the co-occurrence dictionary, and the language model based on the correction data.
  10.  The drawing reading system according to claim 1 or 2, further comprising a parts number calculation unit that calculates the number of parts in the figure portion based on the recognized character string obtained by the co-occurrence filtering.
  11.  The drawing reading system according to claim 2 or 3, further comprising a parts number calculation unit that calculates the number of parts in the figure portion based on the recognized character string obtained by the co-occurrence filtering and the language model filtering.
  12.  The drawing reading system according to claim 10 or 11, wherein the parts number calculation unit includes:
     an information extraction unit that extracts information from a template in which a method of totaling the numbers of parts is stored in advance; and
     an information aggregation unit that tallies, for parts stored in a parts list in which a list of parts to be totaled is stored in advance, the numbers of the parts extracted by the information extraction unit.
  13.  A drawing reading method executed by an information processing device, the method comprising:
     analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and character strings each composed of one or more characters;
     performing character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition;
     extracting, as one or more replacement candidate character strings, one or more correct-answer candidate character strings paired with an error character string that matches the recognized character string, from an error pattern dictionary constructed by collecting in advance error patterns, each of which is a pair of an error character string and a correct-answer candidate character string;
     performing co-occurrence filtering, which is a process of calculating, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating a frequency with which a target character string in a target cell of the table portion co-occurs with another character string in another cell of the table portion and second co-occurrence information indicating a frequency with which the target character string in the target cell co-occurs with a character string in the figure portion, a first occurrence score corresponding to an occurrence probability of the recognized character string and one or more second occurrence scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition when any of the one or more second occurrence scores satisfies the first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and otherwise outputting the recognized character string as it is; and
     outputting image data based on the recognized character string on which the co-occurrence filtering has been performed.
  14.  A drawing reading program causing a computer to execute:
     analyzing data of a drawing to be read and dividing the drawing into a figure portion, a table portion, and character strings each composed of one or more characters;
     performing character recognition on the character string to generate a recognized character string, which is text data obtained as a result of the character recognition;
     extracting, as one or more replacement candidate character strings, one or more correct-answer candidate character strings paired with an error character string that matches the recognized character string, from an error pattern dictionary constructed by collecting in advance error patterns, each of which is a pair of an error character string and a correct-answer candidate character string;
     performing co-occurrence filtering, which is a process of calculating, based on a co-occurrence dictionary composed of at least one of first co-occurrence information indicating a frequency with which a target character string in a target cell of the table portion co-occurs with another character string in another cell of the table portion and second co-occurrence information indicating a frequency with which the target character string in the target cell co-occurs with a character string in the figure portion, a first occurrence score corresponding to an occurrence probability of the recognized character string and one or more second occurrence scores each corresponding to an occurrence probability of one of the one or more replacement candidate character strings, outputting, as the recognized character string, the replacement candidate character string whose second occurrence score satisfies a first replacement condition when any of the one or more second occurrence scores satisfies the first replacement condition of being higher than the first occurrence score and higher than a predetermined occurrence score threshold, and otherwise outputting the recognized character string as it is; and
     outputting image data based on the recognized character string on which the co-occurrence filtering has been performed.
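The co-occurrence filtering recited in claims 1, 13, and 14 can be illustrated with a minimal sketch. The dictionary format (co-occurrence frequencies keyed by string pairs), the score definition (summed frequency over neighboring context strings), and all names are assumptions for illustration, not the claimed implementation.

```python
def co_occurrence_filter(recognized, candidates, neighbors, co_dict, threshold):
    """Replace a recognized string with a replacement candidate only when
    the candidate's occurrence score is higher than both the recognized
    string's score and a fixed threshold (the 'first replacement
    condition'); otherwise output the recognized string as it is.

    co_dict maps (string, neighbor_string) pairs to co-occurrence
    frequencies; neighbors are context strings from adjacent table cells
    or from the figure portion.
    """
    def score(s):
        # Occurrence score: total co-occurrence frequency with the context.
        return sum(co_dict.get((s, n), 0) for n in neighbors)

    first_score = score(recognized)
    best, best_score = None, first_score
    for cand in candidates:
        s = score(cand)
        if s > best_score and s > threshold:
            best, best_score = cand, s
    return best if best is not None else recognized

co_dict = {("M4", "screw"): 5}
# 'MA' never co-occurs with 'screw', but candidate 'M4' does, so it wins.
print(co_occurrence_filter("MA", ["M4"], ["screw"], co_dict, 2))
```

Raising the threshold above any candidate's score leaves the recognized string unchanged, which is the conservative fallback the claims describe when the first replacement condition is not satisfied.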
PCT/JP2022/001620 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program WO2023139650A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023553177A JP7383209B1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program
PCT/JP2022/001620 WO2023139650A1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/001620 WO2023139650A1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program

Publications (1)

Publication Number Publication Date
WO2023139650A1 2023-07-27

Family

ID=87347982

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/001620 WO2023139650A1 (en) 2022-01-18 2022-01-18 Drawing reading system, drawing reading method, and drawing reading program

Country Status (2)

Country Link
JP (1) JP7383209B1 (en)
WO (1) WO2023139650A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004133565A (en) * 2002-10-09 2004-04-30 Fujitsu Ltd Postprocessing device for character recognition using internet
WO2014203905A2 (en) * 2013-06-17 2014-12-24 アイビーリサーチ株式会社 Reference symbol extraction method, reference symbol extraction device and program


Also Published As

Publication number Publication date
JP7383209B1 (en) 2023-11-17
JPWO2023139650A1 (en) 2023-07-27


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2023553177

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22921813

Country of ref document: EP

Kind code of ref document: A1