JP2024078158A - Character extraction method

Info

Publication number: JP2024078158A
Application number: JP2022190552A
Authority: JP
Inventors: 伸一竹内; 鷹之西田; 真吾則竹; 亮村上; 涼柳川
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2024-06-10

Abstract

【課題】画像のうち、所定の情報の内容を示す文字が記載されている部分を探し出すのに要する時間が長くなることを抑制できること。【解決手段】文字抽出方法は、画像を複数の区画に区分けすること（Ｓ１３）、所定の情報を基に、所定の情報の内容を示す文字が記載されている可能性が最も高い区画を、複数の区画の中から選択すること（Ｓ１７、Ｓ１９）、及び、選択した区画から所定の情報の内容を示す文字を抽出すること（Ｓ２１，Ｓ２３）、をコンピューターに実行させる。【選択図】図４ [Problem] To keep the time required to find the portion of an image that contains characters indicating the content of predetermined information from becoming long. [Solution] A character extraction method causes a computer to execute: dividing an image into a plurality of sections (S13); selecting, based on the predetermined information, the section most likely to contain characters indicating the content of the predetermined information from among the plurality of sections (S17, S19); and extracting the characters indicating the content of the predetermined information from the selected section (S21, S23). [Selected Figure] FIG. 4

Description

本発明は、画像から文字を抽出する文字抽出方法に関する。 The present invention relates to a character extraction method for extracting characters from an image.

特許文献１には、光学文字識別の技術を用いてテキストの画像をテキスト形式に変換する方法の一例が開示されている。具体的には、当該方法では、識別対象のピクチャの中からテキスト領域の囲み枠を確定し、当該囲み枠に基づいて、識別対象のピクチャからテキスト領域ピクチャが抽出される。続いて、当該テキスト領域ピクチャの中からテキスト行の囲み枠を確定し、当該囲み枠に基づいて、当該テキスト領域ピクチャからテキスト行ピクチャが抽出される。そして、当該テキスト行ピクチャに対してテキストシーケンス識別を行うことにより、識別結果が得られる。 Patent document 1 discloses an example of a method for converting an image of text into a text format using optical character recognition technology. Specifically, in this method, a bounding box of a text area is determined from a picture to be identified, and a text area picture is extracted from the picture to be identified based on the bounding box. Next, a bounding box of a text line is determined from the text area picture, and a text line picture is extracted from the text area picture based on the bounding box. Then, text sequence recognition is performed on the text line picture to obtain a recognition result.

なお、識別結果を得る際には、軽量テキストシーケンス識別モデルが用いられる。当該識別モデルは、機械学習が施された学習済モデルである。 When obtaining the recognition result, a lightweight text sequence recognition model is used. This recognition model is a trained model that has been subjected to machine learning.

特開２０２１－１９７１９０号公報 JP 2021-197190 A

上記の方法では、識別対象のピクチャから、所望する情報の内容を示す文字が記載されている部分をテキスト行ピクチャとして探し出すのに時間がかかるおそれがある。 With the above method, it may take a long time to find the part of the picture to be identified that contains the characters indicating the content of the desired information as a text line picture.

上記課題を解決するための文字抽出方法は、画像データで示される画像から所定の情報の内容を示す文字を抽出する方法である。当該文字抽出方法は、前記画像を複数の区画に区分けすることと、前記所定の情報を基に、当該所定の情報の内容を示す文字が記載されている可能性が最も高い前記区画を、前記複数の区画の中から選択することと、選択した前記区画から前記所定の情報の内容を示す文字を抽出することと、をコンピューターに実行させる。 The character extraction method for solving the above problem is a method for extracting characters indicating the content of specified information from an image shown by image data. The character extraction method causes a computer to execute the following steps: divide the image into a plurality of sections; select, based on the specified information, from among the plurality of sections, the section that is most likely to contain characters indicating the content of the specified information; and extract the characters indicating the content of the specified information from the selected section.

上記文字抽出方法では、画像を複数の区画に区分けすると、当該複数の区画の中から、所定の情報に関する文字が記載されている可能性が最も高い区画が選択される。この際、当該区画は、所定の情報に基づいて選択される。そして、当該区画から所定の文字情報の内容を示す文字が抽出される。したがって、上記文字抽出方法によれば、画像のうち、所定の情報の内容を示す文字が記載されている部分を探し出すのに要する時間が長くなることを抑制できる。 In the above character extraction method, an image is divided into a plurality of sections, and from the plurality of sections, a section that is most likely to contain characters related to predetermined information is selected. At this time, the section is selected based on the predetermined information. Characters indicating the content of the predetermined text information are then extracted from the section. Therefore, the above character extraction method can prevent the time required to find a portion of an image containing characters indicating the content of the predetermined information from being lengthened.

なお、所定の情報の内容を示す文字が記載されている可能性が最も高い区画を、複数の区画の中から選択する際に、機械学習が施された学習済モデルを用いてもよい。 In addition, when selecting the section that is most likely to contain characters indicating the content of the specified information from among multiple sections, a trained model that has been subjected to machine learning may be used.

図１は、実施形態の文字抽出方法を実現するための抽出装置の概略を示すブロック図である。 FIG. 1 is a block diagram showing an outline of an extraction device for implementing the character extraction method according to the embodiment.
図２は、設計図面の画像を示す模式図である。 FIG. 2 is a schematic diagram showing an image of a design drawing.
図３は、報告書の画像を示す模式図である。 FIG. 3 is a schematic diagram showing an image of a report.
図４は、実施形態の文字抽出方法を示すフローチャートである。 FIG. 4 is a flowchart showing the character extraction method according to the embodiment.

以下、文字抽出方法の一実施形態を図１～図４に従って説明する。 An embodiment of a character extraction method will be described below with reference to FIGS. 1 to 4.
本実施形態の文字抽出方法は、画像データで示される画像に記載されている文字を、当該画像から抽出する方法である。詳しくは、文字抽出方法は、所定の情報の内容を示す文字を画像から抽出する方法である。 The character extraction method of the present embodiment is a method for extracting, from an image represented by image data, characters written in that image. More specifically, the character extraction method is a method for extracting characters indicating the content of predetermined information from an image.

＜抽出装置＞ <Extraction device>
図１を参照し、文字抽出方法を実現するための抽出装置１０について説明する。 With reference to FIG. 1, an extraction device 10 for implementing the character extraction method will be described.
抽出装置１０は、ユーザーインターフェース１１と、コンピューター２０とを備えている。 The extraction device 10 includes a user interface 11 and a computer 20.

ユーザーインターフェース１１は、作業者が操作する操作部１２と、コンピューター２０から受信した情報の内容を表示する表示部１３とを有している。操作部１２は、物理的なボタンやスイッチ、及び、タッチパネルを有している表示画面に表示されるボタンのうちの少なくとも一方を有している。ユーザーインターフェース１１は、作業者による操作部１２の操作に応じた要求をコンピューター２０に出力する。 The user interface 11 has an operation unit 12 operated by the worker, and a display unit 13 that displays the contents of information received from the computer 20. The operation unit 12 has at least one of physical buttons or switches, and buttons displayed on a display screen having a touch panel. The user interface 11 outputs requests to the computer 20 in response to the operation of the operation unit 12 by the worker.

コンピューター２０は、例えば電子制御装置である。この場合、コンピューター２０は、ＣＰＵ２１と、第１記憶装置２２と、第２記憶装置２３とを有している。第１記憶装置２２には、ＣＰＵ２１によって実行される制御プログラムＣＰが記憶されている。ＣＰＵ２１が制御プログラムＣＰを実行することにより、コンピューター２０は、図４に示す一連の処理を実行する。すなわち、コンピューター２０は、画像データで示される画像を複数の区画に分割する。コンピューター２０は、複数の区画について、所定の情報の内容を示す文字が記載されている可能性の高い順に順位付けを行う。そして、コンピューター２０は、複数の区画のうち、順位の最も高い区画から、ＯＣＲを用いて所定の情報の内容を示す文字を抽出する処理を行う。なお、ＯＣＲとは光学文字識別である。 The computer 20 is, for example, an electronic control device. In this case, the computer 20 has a CPU 21, a first storage device 22, and a second storage device 23. The first storage device 22 stores a control program CP executed by the CPU 21. When the CPU 21 executes the control program CP, the computer 20 executes the series of processes shown in FIG. 4. That is, the computer 20 divides an image represented by image data into a plurality of sections. The computer 20 ranks the plurality of sections in order of the likelihood that characters indicating the contents of the specified information are written therein. The computer 20 then performs a process of extracting characters indicating the contents of the specified information from the section with the highest rank among the plurality of sections using OCR. Note that OCR stands for optical character recognition.

第２記憶装置２３には、学習済モデルＬＭと、複数のテキストＴＸとが記憶されている。 The second storage device 23 stores a trained model LM and a plurality of texts TX.
学習済モデルＬＭは、画像を区分けした複数の区画について、所定の情報の内容を示す文字が記載されている可能性の高い順に順位付けを行うための機械学習が施された学習モデルである。例えば、学習済モデルＬＭの一例は、多次元多項式の近似器である。例えば、学習済モデルＬＭは、中間層が１層である全結合順伝搬型のニューラルネットワークによって構成されている。 The trained model LM is a learning model that has been subjected to machine learning so as to rank the plurality of sections into which an image is divided in descending order of the likelihood that characters indicating the content of the predetermined information are written in them. One example of the trained model LM is an approximator of a multidimensional polynomial. For example, the trained model LM is configured as a fully connected feedforward neural network with one intermediate layer.

学習済モデルＬＭは、所定の情報、画像データ及び書類の種類が入力変数として入力された場合に、複数の区画について、当該所定の情報の内容を示す文字が記載されている可能性の大きさを示す値を出力変数として出力する。画像の区分数が４つである場合、学習済モデルＬＭは、４つの区画の各々に対する出力変数を出力する。 When the trained model LM receives specified information, image data, and the type of document as input variables, it outputs, as output variables, values indicating the likelihood that characters indicating the content of the specified information are written in multiple sections. If the image has four sections, the trained model LM outputs output variables for each of the four sections.

例えば学習済モデルＬＭは、以下に示すような機械学習を学習モデルに施すことによって生成された。 For example, the trained model LM was generated by applying machine learning to a learning model as described below.
図２には、設計図面の画像ＩＭＧＡが図示されている。図２に示す例では、画像ＩＭＧＡは４つの区画ＳＣ１，ＳＣ２，ＳＣ３，ＳＣ４に区画される。設計図面の画像ＩＭＧＡにあっては、書類の種類は「設計図面」であり、所定の情報の一例は普通公差である。図２に示す画像ＩＭＧＡでは、普通公差（所定の情報）の内容を示す文字である「ＪＩＳＧ１００」が区画ＳＣ４に存在している。そのため、所定の情報として普通公差、書類の種類として設計図面、及び画像ＩＭＧＡの画像データを学習モデルの入力変数とした場合、区画ＳＣ４の出力変数が、他の区画ＳＣ１，ＳＣ２，ＳＣ３の出力変数よりも大きくなるように、当該学習モデルに対して機械学習が施される。 FIG. 2 illustrates an image IMGA of a design drawing. In the example illustrated in FIG. 2, the image IMGA is divided into four sections SC1, SC2, SC3, and SC4. For the image IMGA of a design drawing, the document type is "design drawing", and an example of the predetermined information is the general tolerance. In the image IMGA illustrated in FIG. 2, "JISG100", the characters indicating the content of the general tolerance (the predetermined information), is present in section SC4. Therefore, when the general tolerance as the predetermined information, the design drawing as the document type, and the image data of the image IMGA are used as input variables of the learning model, machine learning is performed on the learning model so that the output variable of section SC4 becomes larger than the output variables of the other sections SC1, SC2, and SC3.

図３には、報告書の画像ＩＭＧＢが図示されている。図３に示す報告書のページ数は６ページである。そのため、図３に示す例では、画像ＩＭＧＢが６つの区画ＳＣ１，ＳＣ２，ＳＣ３，ＳＣ４，ＳＣ５，ＳＣ６に区画される。報告書の画像ＩＭＧＢにあっては、書類の種類は「報告書」であり、所定の情報の一例は発行日である。図３に示す画像ＩＭＧＢでは、発行日（所定の情報）の内容を示す文字が区画ＳＣ１に存在している。そのため、所定の情報として発行日、書類の種類として報告書及び画像ＩＭＧＢの画像データを学習モデルの入力変数とした場合、区画ＳＣ１の出力変数が、他の区画ＳＣ２，ＳＣ３，ＳＣ４，ＳＣ５，ＳＣ６の出力変数よりも大きくなるように、当該学習モデルに対して機械学習が施される。 In FIG. 3, an image IMGB of a report is shown. The report shown in FIG. 3 has six pages. Therefore, in the example shown in FIG. 3, the image IMGB is divided into six sections SC1, SC2, SC3, SC4, SC5, and SC6. In the image IMGB of a report, the document type is "report", and an example of the specified information is the publication date. In the image IMGB shown in FIG. 3, characters indicating the contents of the publication date (specified information) are present in section SC1. Therefore, when the publication date is the specified information, the report is the document type, and the image data of the image IMGB are input variables of the learning model, machine learning is performed on the learning model so that the output variable of section SC1 is larger than the output variables of the other sections SC2, SC3, SC4, SC5, and SC6.

テキストＴＸは、画像から抽出した文字を識別する際に用いられる。複数のテキストＴＸは、設計図面用のテキストＴＸ、及び、報告書用のテキストＴＸを含んでいる。例えば設計図面の普通公差用のテキストＴＸは「ＪＩＳＧ１００」である。 The text TX is used to identify characters extracted from an image. The plurality of texts TX includes texts TX for design drawings and texts TX for reports. For example, the text TX for the general tolerance of a design drawing is "JISG100".

＜文字抽出方法＞ <Character extraction method>
図４には、文字抽出方法を構成する複数の処理の実行手順を示すフローチャートが図示されている。コンピューター２０が画像データを取得すると、コンピューター２０は図４に示す一連の処理を開始する。 FIG. 4 is a flowchart showing the execution procedure of the plurality of processes constituting the character extraction method. When the computer 20 acquires image data, the computer 20 starts the series of processes shown in FIG. 4.

ステップＳ１１において、コンピューター２０は、取得した画像データの画像における書類の種類及び所定の情報を取得する。コンピューター２０は、例えば作業者によるユーザーインターフェース１１の操作部１２の入力操作に基づいて、書類の種類及び所定の情報を取得する。 In step S11, the computer 20 acquires the document type and predetermined information in the image of the acquired image data. The computer 20 acquires the document type and predetermined information based on, for example, an input operation of the operation unit 12 of the user interface 11 by the worker.

ステップＳ１３において、コンピューター２０は、画像データの画像を複数の区画に区分けする。このとき、コンピューター２０は、ステップＳ１１で取得した書類の種類に応じた数に画像を区分けするとよい。例えば画像が図２に示したような設計図面の画像ＩＭＧＡであるとき、コンピューター２０は、画像ＩＭＧＡを４つの区画ＳＣ１～ＳＣ４に区分けする。また例えば画像が図３に示したような報告書の画像ＩＭＧＢであるとき、コンピューター２０は、ページ数と同数の区画に画像ＩＭＧＢを区分けする。 In step S13, computer 20 divides the image of the image data into a number of sections. At this time, computer 20 may divide the image into a number according to the type of document acquired in step S11. For example, when the image is an image IMGA of a design drawing as shown in FIG. 2, computer 20 divides image IMGA into four sections SC1 to SC4. Also, when the image is an image IMGB of a report as shown in FIG. 3, computer 20 divides image IMGB into the same number of sections as the number of pages.
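The division in step S13 can be sketched in Python as follows. This is only an illustration under stated assumptions: the source says that a design drawing is divided into four sections and a report into as many sections as it has pages, but it does not say how the sections are laid out. The 2 x 2 grid, the vertical stacking of report pages, and the names `GRID_BY_DOCUMENT_TYPE`, `split_into_sections`, and `sections_for_document` are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical lookup: grid rows and columns per document type.
# The 2 x 2 layout is an assumption; the source only states "four sections".
GRID_BY_DOCUMENT_TYPE = {"design drawing": (2, 2)}


def split_into_sections(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Return bounding boxes that tile a width x height image, row by row."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def sections_for_document(doc_type: str, width: int, height: int, pages: int = 1) -> List[Box]:
    """Choose the number of sections from the document type (step S13)."""
    if doc_type == "report":
        # One section per page, assuming the pages are stacked vertically.
        return split_into_sections(width, height, rows=pages, cols=1)
    rows, cols = GRID_BY_DOCUMENT_TYPE[doc_type]
    return split_into_sections(width, height, rows, cols)
```

Returning boxes rather than cropped pixel data keeps the sketch independent of any particular image library; a caller could crop each box with whatever imaging tool it already uses.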

ステップＳ１５において、コンピューター２０は、書類の種類、所定の情報及び画像データを入力変数として学習済モデルＬＭに入力する。続くステップＳ１７において、コンピューター２０は、学習済モデルＬＭの出力変数を基に、所定の情報の内容を示す文字が記載されている可能性の高い順に複数の区画に対して順位付けを行う。例えば画像を４つの区画ＳＣ１～ＳＣ４に区画した場合において、区画ＳＣ４の出力変数が１番目に高く、区画ＳＣ２の出力変数が２番目に高く、区画ＳＣ３の出力変数が３番目に高く、区画ＳＣ１の出力変数が４番目に高かったとする。この場合、コンピューター２０は、区画ＳＣ４を１位とし、区画ＳＣ２を２位とし、区画ＳＣ３を３位とし、区画ＳＣ１を４位とする。そして、コンピューター２０は、所定の情報の内容を示す文字が記載されている可能性が最も高い区画として、１位の区画を複数の区画の中から選択する。 In step S15, the computer 20 inputs the document type, the predetermined information, and the image data as input variables to the trained model LM. In the following step S17, the computer 20 ranks the plurality of sections, based on the output variables of the trained model LM, in descending order of the likelihood that they contain characters indicating the content of the predetermined information. For example, suppose that an image is divided into four sections SC1 to SC4 and that the output variable of section SC4 is the highest, that of section SC2 the second highest, that of section SC3 the third highest, and that of section SC1 the fourth highest. In this case, the computer 20 ranks section SC4 first, section SC2 second, section SC3 third, and section SC1 fourth. The computer 20 then selects the first-ranked section from among the plurality of sections as the section most likely to contain characters indicating the content of the predetermined information.
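The ranking in step S17 amounts to sorting the sections by their output variables in descending order. The Python sketch below illustrates this for the case where SC4 has the highest output variable, followed by SC2, SC3, and SC1. The numeric scores are made up for illustration (the source gives only the ordering), and `rank_sections` is a hypothetical name.

```python
from typing import Dict, List


def rank_sections(scores: Dict[str, float]) -> List[str]:
    """Order section names from the highest output variable to the lowest."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Illustrative output variables only; the source gives the order, not the values.
example_scores = {"SC1": 0.05, "SC2": 0.25, "SC3": 0.10, "SC4": 0.60}
ranking = rank_sections(example_scores)  # SC4 is ranked first, SC1 last
selected = ranking[0]                    # the first-ranked section is tried first
```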

ステップＳ１９において、コンピューター２０は、番号Ｎに１を設定する。そして、コンピューター２０は処理をステップＳ２１に移行する。 In step S19, the computer 20 sets the number N to 1. The computer 20 then proceeds to step S21.
ステップＳ２１において、コンピューター２０は、複数の区画のうち、Ｎ位の区画からＯＣＲによって文字を抽出する。番号Ｎに１が設定されている場合、コンピューター２０は、１位の区画、すなわち所定の情報の内容を示す文字が記載されている可能性が最も高い区画から文字を抽出する処理を実施する。コンピューター２０は、当該処理の実施を完了すると、処理をステップＳ２３に移行する。 In step S21, the computer 20 extracts characters by OCR from the Nth-ranked section among the plurality of sections. If the number N is set to 1, the computer 20 performs the process of extracting characters from the first-ranked section, i.e., the section that is most likely to contain characters indicating the content of the predetermined information. When the computer 20 completes this process, it proceeds to step S23.

ステップＳ２３において、コンピューター２０は、ステップＳ２１で抽出した文字と、テキストＴＸの文字との類似度Ｘを算出する。具体的には、コンピューター２０は、所定の情報及び書類の種類に応じたテキストＴＸを第２記憶装置２３から読み出す。そして、コンピューター２０は、ステップＳ２１で抽出した文字と、第２記憶装置２３から読み出したテキストＴＸの文字とを比較することによって類似度Ｘを算出する。 In step S23, the computer 20 calculates the similarity X between the characters extracted in step S21 and the characters of the text TX. Specifically, the computer 20 reads out the text TX corresponding to the specified information and document type from the second storage device 23. Then, the computer 20 calculates the similarity X by comparing the characters extracted in step S21 with the characters of the text TX read out from the second storage device 23.

ここで、ステップＳ２１で抽出した文字が「ＪＩＳＧ１０６」であり、比較に用いるテキストＴＸの文字が「ＪＩＳＧ１００」である場合を一例として説明する。この場合、ステップＳ２１で抽出した文字の中で、先頭の「Ｊ」、２番目の「Ｉ」、３番目の「Ｓ」、４番目の「Ｇ」、５番目の「１」及び６番目の「０」は、テキストＴＸと一致している。その一方で、７番目の「６」は、テキストＴＸと一致していない。そのため、コンピューター２０は、テキストＴＸと一致した文字数が６個であることを取得できる。そして、コンピューター２０は、例えば以下に示す関係式（Ｆ１）を用いて類似度Ｘを算出する。関係式（Ｆ１）において、「Ｚ１」はテキストＴＸと一致した文字数であり、「Ｙ１」はＯＣＲによって抽出できた文字の数であり、「Ｙ２」はテキストＴＸの文字数である。 Here, an example will be described in which the characters extracted in step S21 are "JISG106" and the characters in text TX used for comparison are "JISG100". In this case, among the characters extracted in step S21, the first "J", the second "I", the third "S", the fourth "G", the fifth "1" and the sixth "0" match the text TX. On the other hand, the seventh "6" does not match the text TX. Therefore, the computer 20 can obtain that the number of characters that match the text TX is six. Then, the computer 20 calculates the similarity X using, for example, the following relational expression (F1). In the relational expression (F1), "Z1" is the number of characters that match the text TX, "Y1" is the number of characters that could be extracted by OCR, and "Y2" is the number of characters in the text TX.

Ｘ＝（２×Ｚ１）／（Ｙ１＋Ｙ２）　（Ｆ１） X = (2 × Z1) / (Y1 + Y2)   (F1)
ここで説明している場合、ＯＣＲで抽出できた文字の数は７個であり、テキストＴＸの文字数は７個である。そのため、コンピューター２０は、「Ｙ１」に７を代入するとともに「Ｙ２」に７を代入する。そして、コンピューター２０は、「Ｚ１」に６を代入することにより、類似度Ｘとして「０．８５７」を算出する。 In the case described here, the number of characters extracted by OCR is 7, and the number of characters in the text TX is 7. The computer 20 therefore substitutes 7 for "Y1" and 7 for "Y2". By then substituting 6 for "Z1", the computer 20 obtains X = (2 × 6) / (7 + 7) and calculates the similarity X as "0.857".

なお、ステップＳ２１で抽出した文字が「ＪＩＳＧ１００」であった場合、コンピューター２０は、関係式（Ｆ１）の「Ｚ１」に７を代入するため、類似度Ｘとして「１」を算出する。 If the characters extracted in step S21 are "JISG100," the computer 20 assigns 7 to "Z1" in the relational expression (F1), and calculates the similarity X to be "1."
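Relational expression (F1) can be checked with a few lines of Python. The sketch below assumes that Z1 is counted by comparing characters at the same position, which is how the worked example reads; the source does not state the matching rule in general. The function name `similarity` and the handling of two empty strings are assumptions.

```python
def similarity(extracted: str, reference: str) -> float:
    """Relational expression (F1): X = (2 * Z1) / (Y1 + Y2)."""
    y1 = len(extracted)  # Y1: number of characters extracted by OCR
    y2 = len(reference)  # Y2: number of characters in the text TX
    if y1 + y2 == 0:
        return 0.0       # assumption: two empty strings score 0
    # Z1: characters that agree at the same position (assumed matching rule)
    z1 = sum(1 for a, b in zip(extracted, reference) if a == b)
    return 2 * z1 / (y1 + y2)


x_partial = similarity("JISG106", "JISG100")  # 12 / 14, about 0.857
x_exact = similarity("JISG100", "JISG100")    # 14 / 14 = 1.0
```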

コンピューター２０は、類似度Ｘを算出すると、処理をステップＳ２５に移行する。ステップＳ２５において、コンピューター２０は、ステップＳ２３で算出した類似度Ｘが判定値Ｘｔｈよりも大きいか否かを判定する。判定値Ｘｔｈは、所定の情報の内容を示す文字が画像に記載されているか否かの判断基準である。ＯＣＲによる文字の抽出では、抽出間違いが発生する可能性もある。その一方で、類似度Ｘが０．５を下回るような場合、ＯＣＲで抽出した文字とテキストＴＸとが一致しているとは言いがたい。そのため、０．５よりも大きく且つ１未満の値を判定値Ｘｔｈとして設定することが好ましい。そして、類似度Ｘが判定値Ｘｔｈよりも大きい場合は、所定の情報の内容を示す文字が画像に記載されていると見なす。一方、類似度Ｘが判定値Ｘｔｈ以下である場合は、所定の情報の内容を示す文字が画像に記載されていないと見なす。コンピューター２０は、類似度Ｘが判定値Ｘｔｈよりも大きい場合（Ｓ２５：ＹＥＳ）、処理をステップＳ２７に移行する。一方、コンピューター２０は、類似度Ｘが判定値Ｘｔｈ以下である場合（Ｓ２５：ＮＯ）、処理をステップＳ３１に移行する。 After calculating the similarity X, the computer 20 proceeds to step S25. In step S25, the computer 20 determines whether the similarity X calculated in step S23 is greater than the judgment value Xth. The judgment value Xth is a criterion for determining whether a character indicating the content of the specified information is written in the image. When extracting characters using OCR, there is a possibility of extraction errors. On the other hand, if the similarity X is less than 0.5, it is difficult to say that the character extracted by OCR and the text TX match. Therefore, it is preferable to set a value greater than 0.5 and less than 1 as the judgment value Xth. Then, if the similarity X is greater than the judgment value Xth, it is considered that a character indicating the content of the specified information is written in the image. On the other hand, if the similarity X is equal to or less than the judgment value Xth, it is considered that a character indicating the content of the specified information is not written in the image. If the similarity X is greater than the judgment value Xth (S25: YES), the computer 20 proceeds to step S27. On the other hand, if the similarity X is equal to or less than the judgment value Xth (S25: NO), the computer 20 transitions to step S31.

ステップＳ２７において、コンピューター２０は、所定の情報の内容を示す文字が画像に記載されている旨をユーザーインターフェース１１の表示部１３に表示させる。また、コンピューター２０は、画像に記載されている「所定の情報の内容を示す文字」を、所定の作業ファイルに書き込む。その後、コンピューター２０は一連の処理を終了する。 In step S27, the computer 20 causes the display unit 13 of the user interface 11 to display that the image contains characters indicating the contents of the specified information. The computer 20 also writes the "characters indicating the contents of the specified information" contained in the image to a specified work file. After that, the computer 20 ends the series of processes.

ステップＳ３１において、コンピューター２０は、全ての区画に対して文字を抽出する処理を実施したか否かを判定する。例えば、コンピューター２０は、番号Ｎが画像の区分数と一致している場合、全ての区画に対して文字を抽出する処理を実施したと判定する（Ｓ３１：ＹＥＳ）。そして、コンピューター２０は処理をステップＳ３５に移行する。一方、コンピューター２０は、番号Ｎが画像の区分数よりも小さい場合、複数の区画の中で、文字を抽出する処理を未だ実施していない区画があると判定する（Ｓ３１：ＮＯ）。そして、コンピューター２０は処理をステップＳ３３に移行する。 In step S31, the computer 20 determines whether the process of extracting characters has been performed for all sections. For example, if the number N matches the number of sections in the image, the computer 20 determines that the process of extracting characters has been performed for all sections (S31: YES). The computer 20 then transitions to step S35. On the other hand, if the number N is smaller than the number of sections in the image, the computer 20 determines that there is a section among the multiple sections for which the process of extracting characters has not yet been performed (S31: NO). The computer 20 then transitions to step S33.

ステップＳ３３において、コンピューター２０は、番号Ｎを１だけインクリメントする。このように番号Ｎを更新すると、コンピューター２０は処理をステップＳ２１に移行する。 In step S33, the computer 20 increments the number N by 1. After updating the number N in this manner, the computer 20 transitions to step S21.

ステップＳ３５において、コンピューター２０は、所定の情報の内容を示す文字が画像に記載されていない旨を作業者に通知する。この場合、コンピューター２０は、所定の情報の内容を示す文字が画像に記載されていない旨のメッセージをユーザーインターフェース１１の表示部１３に表示させる。その後、コンピューター２０は一連の処理を終了する。 In step S35, the computer 20 notifies the worker that the image does not include characters indicating the content of the specified information. In this case, the computer 20 causes the display unit 13 of the user interface 11 to display a message indicating that the image does not include characters indicating the content of the specified information. The computer 20 then ends the series of processes.
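Steps S19 to S35 form a loop over the ranked sections that stops at the first section whose similarity exceeds the judgment value Xth. The following Python sketch shows that control flow only. The OCR step is replaced by a stand-in function returning canned strings, the default `x_th=0.8` is an arbitrary value inside the preferred range (greater than 0.5 and less than 1), and the names `f1_similarity` and `search_sections` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def f1_similarity(extracted: str, reference: str) -> float:
    """Relational expression (F1), with position-wise matching assumed."""
    total = len(extracted) + len(reference)
    if total == 0:
        return 0.0
    matches = sum(1 for a, b in zip(extracted, reference) if a == b)
    return 2 * matches / total


def search_sections(
    ranked_sections: List[str],
    ocr: Callable[[str], str],
    reference: str,
    x_th: float = 0.8,
) -> Tuple[Optional[str], Optional[str]]:
    """Try sections in rank order; return (section, text) or (None, None)."""
    for section in ranked_sections:              # S19 / S33: N = 1, 2, ...
        extracted = ocr(section)                 # S21: OCR on the Nth-ranked section
        x = f1_similarity(extracted, reference)  # S23: similarity X
        if x > x_th:                             # S25: strictly greater than Xth
            return section, extracted            # S27: characters found
    return None, None                            # S35: not found in any section


# Stand-in for a real OCR engine: canned results per section (hypothetical).
fake_ocr_results = {"SC4": "SCALE1:2", "SC2": "JISG106", "SC3": "", "SC1": ""}
result = search_sections(
    ["SC4", "SC2", "SC3", "SC1"], lambda s: fake_ocr_results[s], "JISG100"
)
```

In this made-up run the first-ranked section SC4 fails the threshold, so the loop falls through to the second-ranked section SC2, which is the fallback behaviour described in the steps above.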

＜作用及び効果＞ <Operation and effects>
コンピューター２０に画像データが入力されると、図４に示した一連の処理が実行される。画像データで示される画像の書類の種類及び所定の情報をコンピューター２０が取得すると、コンピューター２０の処理によって、書類の種類を基に、画像が複数の区画に区分けされる。例えば書類の種類が設計図面である場合、画像が４つの区画ＳＣ１～ＳＣ４に区分けされる。そして、複数の区画の中から、所定の情報の内容を示す文字が記載されている可能性が最も高い区画が選択される。そして、選択した区画から所定の情報の内容を示す文字が抽出される。 When image data is input to the computer 20, the series of processes shown in FIG. 4 is executed. When the computer 20 acquires the document type and the predetermined information for the image represented by the image data, the computer 20 divides the image into a plurality of sections based on the document type. For example, if the document type is a design drawing, the image is divided into four sections SC1 to SC4. Then, from among the plurality of sections, the section that is most likely to contain characters indicating the content of the predetermined information is selected, and the characters indicating the content of the predetermined information are extracted from the selected section.

すなわち、本実施形態では、所定の情報を基に、画像のうち、所定の情報の内容を示す文字が記載されている可能性が最も高い部分を探し、文字を抽出する処理を、当該部分に対して他の部分よりも優先して実施できる。そのため、画像のうち、所定の情報の内容を示す文字が記載されている部分を探し出すのに要する時間が長くなることを抑制できる。 In other words, in this embodiment, the part of the image that is most likely to contain characters indicating the content of the predetermined information is identified based on the predetermined information, and the process of extracting characters can be performed on that part with priority over the other parts. This keeps the time required to find the part of the image that contains characters indicating the content of the predetermined information from becoming long.

本実施形態では、以下の効果をさらに得ることができる。 In this embodiment, the following effects can be further obtained.
（１）本実施形態では、学習済モデルＬＭを用い、画像を区分けした複数の区画について、所定の情報の内容を示す文字が記載されている可能性が高い順に順位付けが行われる。１位の区画に所定の情報の内容を示す文字が記載されていないことがある。この場合には、２位の区画に所定の情報の内容を示す文字が記載されているか否かを判定する処理が実行される。そして、２位の区画に所定の情報の内容を示す文字が記載されていない場合には、３位の区画に所定の情報の内容を示す文字が記載されているか否かを判定する処理が実行される。そのため、画像のいずれかに、所定の情報の内容を示す文字が記載されている場合には、当該文字を抽出することができる。 (1) In this embodiment, the trained model LM is used to rank the plurality of sections into which an image is divided in descending order of the likelihood that characters indicating the content of the predetermined information are written in them. The first-ranked section may not contain characters indicating the content of the predetermined information. In that case, a process is executed to determine whether the second-ranked section contains such characters. If the second-ranked section does not contain them either, a process is executed to determine whether the third-ranked section contains them. Therefore, if characters indicating the content of the predetermined information are written anywhere in the image, those characters can be extracted.

（２）学習済モデルＬＭの入力変数は、所定の情報及び画像データに加え、書類の種類も含んでいる。このように入力変数の種類を増やすことにより、画像を区分けした複数の区画について、所定の情報の内容を示す文字が記載されている可能性が高い順に順位付けを行う際に、その精度を高くできる。 (2) The input variables of the trained model LM include the document type in addition to the specified information and image data. Increasing the number of input variables in this way can improve the accuracy of ranking multiple sections into which an image is divided in order of the likelihood that they contain characters indicating the content of the specified information.

＜変更例＞ <Modifications>
上記実施形態は、以下のように変更して実施することができる。上記実施形態及び以下の変更例は、技術的に矛盾しない範囲で互いに組み合わせて実施することができる。 The above embodiment can be modified as follows. The above embodiment and the following modifications can be combined with each other to the extent that no technical contradiction arises.

・画像からＯＣＲで文字を抽出する場合、文字の読み間違いが発生することがある。例えば、読み間違いが発生しやすい文字としては、例えば、以下に示す文字が存在する。 - When extracting characters from an image using OCR, characters may be misread. Examples of characters that are prone to being misread include the following pairs:
アルファベットの大文字の「Ｏ」と、数字の「０（零）」。 The capital letter "O" and the digit "0" (zero).
アルファベットの大文字の「Ｉ」と、数字の「１」。 The capital letter "I" and the digit "1".
アルファベットの大文字の「Ｚ」と、小文字の「ｚ」。 The capital letter "Z" and the lowercase letter "z".
アルファベットの大文字の「Ｇ」と、数字の「６」。 The capital letter "G" and the digit "6".

このように読み間違いが発生しやすい文字を確認文字として予め登録しておくとよい。例えば、画像に実際に記載されている文字が「Ｉ（アルファベットの大文字）」である場合、ＯＣＲによって「Ｉ（アルファベットの大文字）」が抽出されたり、「１（数字のイチ）」が抽出されたりする。そのため、ＯＣＲで抽出した文字が確認文字であり、テキストＴＸの文字が「Ｉ（アルファベットの大文字）」である場合、ＯＣＲで抽出した文字が「Ｉ」であっても「１」であっても、抽出した文字とテキストＴＸの文字とが一致していると判定するとよい。これにより、ＯＣＲによる文字の抽出精度に起因して類似度Ｘが低めに算出されることを抑制できる。 In this way, it is a good idea to register characters that are prone to misreading as confirmation characters in advance. For example, if the character actually written in the image is "I (capital letter)," OCR may extract "I (capital letter)" or "1 (number one)." Therefore, if the character extracted by OCR is the confirmation character and the character in text TX is "I (capital letter)," it is good to determine that the extracted character matches the character in text TX regardless of whether the character extracted by OCR is "I" or "1." This makes it possible to prevent the similarity X from being calculated to be low due to the accuracy of character extraction by OCR.
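One way to realise the confirmation characters is to treat each registered pair as equivalent when counting matching characters. The Python sketch below does this for the four pairs listed above. It is an assumption-laden illustration: the source describes the case where the text TX contains "I" and OCR returns "I" or "1", whereas this sketch treats every pair symmetrically, and the names `CONFIRMATION_GROUPS`, `chars_match`, and `tolerant_similarity` are hypothetical.

```python
# Confusable pairs listed in the text, registered as confirmation characters.
CONFIRMATION_GROUPS = [{"O", "0"}, {"I", "1"}, {"Z", "z"}, {"G", "6"}]


def chars_match(extracted_char: str, reference_char: str) -> bool:
    """Treat a character and its registered look-alike as the same character."""
    if extracted_char == reference_char:
        return True
    # Symmetric treatment of each pair is an assumption of this sketch.
    return any(
        extracted_char in group and reference_char in group
        for group in CONFIRMATION_GROUPS
    )


def tolerant_similarity(extracted: str, reference: str) -> float:
    """Expression (F1) with look-alike characters counted as matches."""
    total = len(extracted) + len(reference)
    if total == 0:
        return 0.0
    z1 = sum(1 for a, b in zip(extracted, reference) if chars_match(a, b))
    return 2 * z1 / total


x_lookalike = tolerant_similarity("J1SG100", "JISG100")  # "1" is accepted for "I"
x_mismatch = tolerant_similarity("JISG106", "JISG100")   # "6" vs "0" is not a pair
```

Because "6" and "0" are not registered as a pair, the second call still counts one mismatch, so the relaxation only raises the similarity for the registered look-alikes.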

・学習済モデルＬＭの入力変数は、所定の情報及び画像データを含んでいるのであれば、書類の種類を含まなくてもよい。 - The input variables of the trained model LM do not need to include the document type as long as they include the predetermined information and the image data.
・ニューラルネットワークは、中間層が１層のフィードフォワードネットワークに限らない。例えば、ニューラルネットワークは、中間層が２層以上のネットワークであってもよいし、畳み込みニューラルネットワークやリカレントニューラルネットワークであってもよい。 - The neural network is not limited to a feedforward network with one intermediate layer. For example, the neural network may be a network with two or more intermediate layers, or may be a convolutional neural network or a recurrent neural network.

・機械学習による学習済みモデルは、ニューラルネットワークでなくてもよい。例えば、学習済みモデルとして、サポートベクトルマシンを採用してもよい。 - The trained model obtained by machine learning does not have to be a neural network. For example, a support vector machine may be used as the trained model.
・画像を区分けした複数の区画について、所定の情報の内容を示す文字が記載されている可能性が高い順に順位付けを行う場合に、学習済モデルＬＭを用いなくてもよい。例えば、予め作成されたルールベースに基づいて、複数の区画について順位付けを行うようにしてもよい。 - When ranking the plurality of sections into which an image is divided in descending order of the likelihood that characters indicating the content of the predetermined information are written in them, the trained model LM does not have to be used. For example, the plurality of sections may be ranked based on a rule base created in advance.
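A rule base for the last modification could be as simple as a lookup table keyed by document type and predetermined information. The Python sketch below is purely illustrative: the source does not describe what such a rule base contains. The two entries place first the sections where the FIG. 2 and FIG. 3 examples locate the target characters (SC4 for the general tolerance, SC1 for the publication date); the order of the remaining sections and the names `RULES` and `rank_by_rules` are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical rule base: (document type, predetermined information) -> order.
# Only the first entry of each list follows the FIG. 2 / FIG. 3 examples;
# the order of the remaining sections is arbitrary.
RULES: Dict[Tuple[str, str], List[str]] = {
    ("design drawing", "general tolerance"): ["SC4", "SC2", "SC3", "SC1"],
    ("report", "publication date"): ["SC1", "SC2", "SC3", "SC4", "SC5", "SC6"],
}


def rank_by_rules(doc_type: str, info: str, sections: List[str]) -> List[str]:
    """Rank sections from the rule base; unknown cases keep the given order."""
    preferred = RULES.get((doc_type, info), [])
    head = [s for s in preferred if s in sections]
    tail = [s for s in sections if s not in head]
    return head + tail
```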

・コンピューター２０は、ＣＰＵとＲＯＭとを備えて、ソフトウェア処理を実行するものに限らない。すなわち、コンピューター２０は、以下（ａ）～（ｃ）の何れかの構成であればよい。 The computer 20 is not limited to having a CPU and ROM and executing software processing. In other words, the computer 20 may have any of the following configurations (a) to (c).

（ａ）コンピューター２０は、コンピュータープログラムに従って各種処理を実行する一つ以上のプロセッサを備えている。プロセッサは、ＣＰＵ並びに、ＲＡＭ及びＲＯＭなどのメモリを含んでいる。メモリは、処理をＣＰＵに実行させるように構成されたプログラムコード又は指令を格納している。メモリ、すなわちコンピューター可読媒体は、汎用又は専用のコンピューターでアクセスできるあらゆる利用可能な媒体を含んでいる。 (a) Computer 20 includes one or more processors that perform various processes according to a computer program. The processor includes a CPU and memory such as RAM and ROM. The memory stores program code or instructions configured to cause the CPU to perform processes. Memory, i.e., computer-readable media, includes any available media that can be accessed by a general-purpose or special-purpose computer.

（ｂ）コンピューター２０は、各種処理を実行する一つ以上の専用のハードウェア回路を備えている。専用のハードウェア回路としては、例えば、特定用途向け集積回路、すなわちＡＳＩＣ又はＦＰＧＡを挙げることができる。なお、ＡＳＩＣは、「Application Specific Integrated Circuit」の略記であり、ＦＰＧＡは、「Field Programmable Gate Array」の略記である。 (b) The computer 20 is equipped with one or more dedicated hardware circuits that execute various processes. Examples of dedicated hardware circuits include application specific integrated circuits, i.e., ASICs or FPGAs. Note that ASIC is an abbreviation for "Application Specific Integrated Circuit" and FPGA is an abbreviation for "Field Programmable Gate Array."

（ｃ）コンピューター２０は、各種処理の一部をコンピュータープログラムに従って実行するプロセッサと、各種処理のうちの残りの処理を実行する専用のハードウェア回路とを備えている。 (c) The computer 20 includes a processor that executes some of the various processes according to a computer program, and a dedicated hardware circuit that executes the remaining processes.

なお、本明細書において使用される「少なくとも１つ」という表現は、所望の選択肢の「１つ以上」を意味する。一例として、本明細書において使用される「少なくとも１つ」という表現は、選択肢の数が２つであれば「１つの選択肢のみ」又は「２つの選択肢の双方」を意味する。他の例として、本明細書において使用される「少なくとも１つ」という表現は、選択肢の数が３つ以上であれば「１つの選択肢のみ」又は「２つ以上の任意の選択肢の組み合わせ」を意味する。 Note that the expression "at least one" used in this specification means "one or more" of the desired options. As an example, the expression "at least one" used in this specification means "only one option" or "both of two options" if the number of options is two. As another example, the expression "at least one" used in this specification means "only one option" or "any combination of two or more options" if the number of options is three or more.

２０…コンピューター、２１…ＣＰＵ、２３…第２記憶装置、ＩＭＧＡ，ＩＭＧＢ…画像、ＬＭ…学習済モデル、ＳＣ１～ＳＣ６…区画。 20: computer; 21: CPU; 23: second storage device; IMGA, IMGB: image; LM: trained model; SC1 to SC6: section.

Claims

A character extraction method for extracting characters indicating the content of predetermined information from an image represented by image data, the method causing a computer to execute:
segmenting the image into a plurality of sections;
selecting, based on the predetermined information, the section that is most likely to contain characters indicating the content of the predetermined information from among the plurality of sections; and
extracting characters indicating the content of the predetermined information from the selected section.

The character extraction method according to claim 1, wherein, when selecting the section most likely to contain characters indicating the content of the predetermined information from among the plurality of sections, the computer is caused to rank the plurality of sections, based on the predetermined information, in descending order of the likelihood of containing characters indicating the content of the predetermined information, and to select the section having the highest rank, the method further causing the computer to execute:
selecting the section having the second highest rank from among the plurality of sections when characters indicating the content of the predetermined information cannot be extracted from the section having the highest rank; and
extracting characters indicating the content of the predetermined information from the section having the second highest rank.

The character extraction method according to claim 2, wherein the computer includes a storage device that stores a trained model that has been subjected to machine learning, the trained model takes the predetermined information and the image data as input variables and outputs, as output variables, values indicating the likelihood that characters indicating the content of the predetermined information are written in each of the plurality of sections into which the image is divided, and
when ranking the plurality of sections, the computer ranks the plurality of sections based on the output variables that the trained model outputs when the predetermined information and the image data are input to it as input variables.

The character extraction method according to claim 3, wherein the input variables of the trained model further include the type of document depicted in the image, and
when ranking the plurality of sections, the computer ranks the plurality of sections based on the output variables that the trained model outputs when the predetermined information, the image data, and the document type are input to it as input variables.