JP7178016B2

JP7178016B2 - Image processing device and its image processing method

Info

Publication number: JP7178016B2
Application number: JP2021032135A
Authority: JP
Inventors: Hironobu Tsuji; Yosuke Tsuji; Tatsuro Hayashi
Original assignee: Eyetec Co Ltd
Current assignee: Eyetec Co Ltd
Priority date: 2021-03-01
Filing date: 2021-03-01
Publication date: 2022-11-25
Anticipated expiration: 2041-03-01
Also published as: JP2022133179A

Description

The present invention relates to a neural network (NN)-based image processing device that processes object images of teeth, identifies and classifies individual teeth, and determines their placement, arrangement, and identity. It further relates to an image processing device that uses this device to locate, arrange, and identify object images accurately and quickly and thereby determine a dental formula and the like, and to an image processing method using the same. In particular, the present invention relates to an image processing device and an image processing method suitable for determining dental formulas from digital dental X-ray images.

In recent years, technology generally referred to as artificial intelligence (AI), which seeks to have computers perform at or above the level of human intellectual ability, has developed rapidly and is being adopted across virtually every industry: agriculture, forestry and fisheries; manufacturing; construction; information and communications; electricity, gas, and water; transportation and postal services; wholesale and retail; finance and insurance; various services; medical care and welfare; and public administration.

Even experts have not settled on a clear definition of AI, but it can be understood as the use of computing systems that analyze all kinds of data, such as sales, financial, environmental, audio, and image data, and output the results as information and knowledge. These systems rely on deep learning (DL), an algorithm that deepens the hierarchy of an NN modeled on the nerve cells (neurons) of biological brains.

The basic configuration of an NN comprises an input layer, hidden layers, and an output layer, and each layer consists of multiple nodes connected by edges. An NN can have multiple hidden layers; one with a particularly large number of layers is called DL. In DL, each layer has a function called an "activation function" and each edge carries a "weight"; the value of each node is computed from the node values of the previous layer, the weights of the connecting edges, and the layer's activation function. NNs have since evolved into a wide variety of configurations suited to different data, applications, and purposes, and countless algorithms have been developed (for example, Non-Patent Document 1).
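The node computation described above can be illustrated with a minimal plain-Python sketch (no framework assumed). The optional per-node bias is an assumption added for completeness, since the text mentions only edge weights and activation functions; the weights in the usage example are arbitrary illustrative values.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Rectified linear activation function."""
    return max(0.0, z)

def layer_forward(prev_values, weights, activation, biases=None):
    """Compute every node value of one layer.

    prev_values: node values of the previous layer (length n_in)
    weights:     weights[j][i] is the weight on the edge from input node i to node j
    activation:  the activation function assigned to this layer
    biases:      optional per-node bias (an assumption; defaults to zero)
    """
    if biases is None:
        biases = [0.0] * len(weights)
    return [
        activation(sum(w * x for w, x in zip(row, prev_values)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Propagate an input through a stack of (weights, activation) layers."""
    values = x
    for weights, activation in layers:
        values = layer_forward(values, weights, activation)
    return values

# Usage: a 2-input network with one hidden ReLU layer and one sigmoid output node.
hidden = ([[0.5, -0.25], [1.0, 1.0]], relu)
output = ([[1.0, 0.5]], sigmoid)
print(forward([1.0, 2.0], [hidden, output]))
```

Each layer reads only the previous layer's values, which is why a deeper network is simply a longer loop over layers.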

Among DL approaches, the one with a particularly strong track record in image processing is the CNN (Convolutional Neural Network), whose hidden layers consist of convolutional layers and pooling layers (for example, Non-Patent Document 1). A convolutional layer applies filters to the nodes of the previous layer to obtain a "feature map," and a pooling layer further shrinks that feature map into a new, smaller one. Together they greatly compress the data volume of the image while preserving its features, producing an abstracted image. Because this abstracted image retains the characteristics of the input image at a much smaller data volume, input images can be identified and classified at high speed, which greatly improved image-recognition performance. As a result, many CNN-based image processing algorithms have been developed, and they still play an important role.
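A minimal sketch of one convolution step followed by one pooling step is shown below, assuming a single-channel image stored as nested lists. The 2x2 vertical-edge kernel is an illustrative choice, not taken from the source. As in most CNN libraries, the "convolution" is computed as cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(w)
        ]
        for r in range(h)
    ]

# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]       # responds to dark-to-bright horizontal transitions
fmap = conv2d(image, edge_kernel)      # 4x4 feature map
pooled = max_pool(fmap)                # 2x2 abstracted map
print(fmap)
print(pooled)
```

The 25 input pixels shrink to 4 pooled values, yet the pooled map still records that the edge lies in the left half of the image. This is the "compress while preserving features" effect described above.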

However, in the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge), an international image-recognition competition, as many as 1.2 million annotated images were used as training data. Simply applying a CNN to image recognition therefore had the drawback that analyzing such a huge amount of training data required a very long training time.

In addition, because an image usually contains multiple objects to be detected, image recognition must solve more than the task of classifying a single detection-target object image (classification) and the task of locating that classified object image (localization). It must also solve the task of classifying and locating the multiple detection-target object images present in the image (detection).

Researchers therefore studied CNN-based image processing algorithms that could classify and locate detection-target object images quickly and accurately. The breakthrough was an algorithm that reduced the amount of image data that had to be processed for the detection targets: R-CNN (Region-based Convolutional Neural Network). R-CNN first extracts region candidates (region proposals) with a method called Selective Search, then extracts features from each candidate with a CNN-based network, and classifies and locates object images by accurately identifying the main objects in the captured image as rectangles (bounding boxes). R-CNN greatly improved the accuracy of object detection, that is, of classifying and locating object images.

A series of CNN-based object detection algorithms aimed at faster and more accurate object detection followed, for example Fast R-CNN, Faster R-CNN, Mask R-CNN, R-FCN (Region-based Fully Convolutional Network), and SPP-net (Spatial Pyramid Pooling Network) (for example, Non-Patent Document 2).

These algorithms are strongly influenced by R-CNN. As described above, they are two-stage object detection algorithms: a network that extracts region candidates for the detection-target objects in an image and a CNN-based network that identifies the objects in those candidates run in series. They are therefore relatively accurate but slow, and improvements were pursued to raise their speed without sacrificing accuracy.
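The series structure of a two-stage detector can be sketched as follows. This is a toy illustration only. `sliding_window_proposals` is a hypothetical stand-in for a region-proposal method such as Selective Search, and `mean_brightness_classifier` is a hypothetical stand-in for the CNN classifier. Neither reflects how R-CNN actually scores regions.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end coordinates exclusive

def sliding_window_proposals(image, size: int, stride: int) -> List[Box]:
    """Toy stage-1 stand-in: enumerate square windows as region candidates."""
    h, w = len(image), len(image[0])
    return [(x, y, x + size, y + size)
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def crop(image, box: Box):
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def mean_brightness_classifier(patch):
    """Toy stage-2 stand-in: 'object' score = mean intensity in [0, 1]."""
    total = sum(sum(row) for row in patch)
    return "object", total / (len(patch) * len(patch[0]))

def two_stage_detect(image, propose: Callable, classify: Callable, min_score: float = 0.5):
    """Stage 1 proposes regions; stage 2 classifies every proposal, one after another."""
    detections = []
    for box in propose(image):
        label, score = classify(crop(image, box))   # one classifier pass per candidate
        if score >= min_score:
            detections.append((box, label, score))
    return detections

image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
detections = two_stage_detect(image,
                              lambda img: sliding_window_proposals(img, 2, 2),
                              mean_brightness_classifier)
print(detections)
```

Because the second stage runs once per proposal, cost grows with the number of candidates. This per-candidate work is the speed bottleneck the paragraph above refers to.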

These efforts produced many one-stage object detection algorithms, in which region-candidate extraction and identification are learned by DL in a single network that includes a CNN, and development continues today. Representative examples are OverFeat, DPM (Deformable Parts Model), SSD (Single Shot MultiBox Detector), DSSD (Deconvolutional Single Shot Detector), ESSD (Extend the shallow part of Single Shot MultiBox Detector), RefineDet (Single-Shot Refinement Neural Network for Object Detection), RetinaNet, M2Det, YOLO, and EfficientDet (for example, Non-Patent Document 2). One-stage algorithms in particular have evolved remarkably, and many improved versions, YOLO among them, have been released under these names with version numbers and the like appended.
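In a one-stage detector, a single forward pass produces raw predictions for every grid cell, and boxes are decoded directly from them with no separate proposal stage. The sketch below uses the YOLOv2/v3-style box parameterization: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h). To keep it short, it assumes one scale, one anchor per cell, anchor sizes given as fractions of the image, and no non-maximum suppression. Real YOLOv3 uses three scales and several anchors per cell.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def decode_grid(preds, anchor_w, anchor_h, conf_threshold=0.5):
    """Decode one grid of raw predictions in a single pass.

    preds[cy][cx] = (tx, ty, tw, th, t_obj, *class_logits) for one anchor per cell.
    Returns (bx, by, bw, bh, class_id, score) with centre/size normalised to [0, 1].
    """
    grid_h, grid_w = len(preds), len(preds[0])
    out = []
    for cy in range(grid_h):
        for cx in range(grid_w):
            tx, ty, tw, th, t_obj, *cls = preds[cy][cx]
            objectness = sigmoid(t_obj)
            class_probs = [sigmoid(c) for c in cls]      # independent logistic classifiers
            class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
            score = objectness * class_probs[class_id]
            if score < conf_threshold:
                continue
            bx = (sigmoid(tx) + cx) / grid_w             # centre offset stays inside the cell
            by = (sigmoid(ty) + cy) / grid_h
            bw = anchor_w * math.exp(tw)                 # size scales the anchor prior
            bh = anchor_h * math.exp(th)
            out.append((bx, by, bw, bh, class_id, score))
    return out

# 2x2 grid: only the cell at row 1, column 0 contains a confident object of class 0.
preds = [[(0.0, 0.0, 0.0, 0.0, -10.0, 0.0, 0.0) for _ in range(2)] for _ in range(2)]
preds[1][0] = (0.0, 0.0, 0.0, 0.0, 10.0, 10.0, -10.0)
print(decode_grid(preds, anchor_w=0.2, anchor_h=0.3))
```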

In both two-stage and one-stage methods, a CNN-based network that identifies object images is built in as a component (the backbone). Reusing the results of training this network by DL on a wide variety of image data, which is called transfer learning, has been a major factor in raising the accuracy and speed with which detection-target object images are classified and located. Object classification algorithms used as such backbones include, for example, AlexNet, GPipe (Giant Neural Networks using Pipeline Parallelism), Inception, SEB (Squeeze-and-Excitation Block)-Inception, Xception, DenseNet (Densely Connected Convolutional Network), ResNet (Residual Network), SEB-ResNet, Inception-ResNet, SEB-Inception-ResNet, ResNeXt, NASNet (Neural Architecture Search Network), VGG (Visual Geometry Group), SEB-VGG, MobileNet, MnasNet, AmoebaNet, CSPNet (Cross Stage Partial Network), CBNet (Composite Backbone Network), Darknet, and EfficientNet. As with object detection algorithms, many improved object classification algorithms have been released under these names with version numbers and the like appended.
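The core idea of transfer learning can be sketched as follows: a pretrained backbone is kept frozen and supplies features, and only a small head is trained on the new task. Everything here is hypothetical. `frozen_backbone` is a hand-written stand-in for a pretrained CNN, and the head is plain logistic regression trained by stochastic gradient descent. Fine-tuning would additionally update some backbone weights, which this sketch does not do.

```python
import math

def frozen_backbone(x):
    """Hypothetical stand-in for a pretrained CNN backbone: fixed, never updated."""
    return [x[0] + x[1], x[0] - x[1], 1.0]   # last entry acts as a bias feature

def _logistic(w, f):
    return 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))

def train_head(samples, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on the frozen features."""
    feats = [frozen_backbone(s) for s in samples]   # computed once: backbone is frozen
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = _logistic(w, f)
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]   # gradient step on head only
    return w

def predict(w, x):
    return _logistic(w, frozen_backbone(x))

# New task: label 1 when the first input exceeds the second.
w = train_head([(1, 0), (2, 1), (0, 1), (1, 2)], [1, 1, 0, 0])
print(predict(w, (3, 1)), predict(w, (1, 3)))
```

Only the few head weights are learned. This is why transfer learning needs far less task-specific data and training time than training a whole network from scratch.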

Image processing devices equipped with such advanced object detection algorithms are commonly called AI and are used across every industry, including agriculture, forestry and fisheries; manufacturing; construction; information and communications; electricity, gas, and water; transportation and postal services; wholesale and retail; finance and insurance; various services; medical care and welfare; and public administration. The prospect of computers performing at or above the level of human intellectual ability has risen dramatically.

In the medical field in particular, digitization is spreading rapidly, driven by the standardization of image data and information, AI-based image diagnosis, and the growth of telemedicine (for example, Non-Patent Documents 3 and 4). Computer-aided diagnosis/detection (CAD) systems have already been put into practical use for detecting cancer in X-ray CT (Computed Tomography) images, cerebral infarction in head MRI (Magnetic Resonance Imaging), and polyps in endoscopic images. They are also used for diagnosing jawbone tumors, cysts, osteoporosis, and the like by charting dental formulas from dental panoramic radiographs (DPR), which involve an extremely low radiation dose. Applying DL-based image processing, the branch of AI that excels at image recognition, to these CAD systems is now the development attracting the most attention and expectation. As noted above, DL-based AI is producing major results in every industry, and the reason is the dramatic gain in AI image-recognition accuracy brought about by advances in DL-based image recognition algorithms.

Within dentistry specifically, two applications are attracting attention: first, AI for automatic image-diagnosis support systems, and second, AI for personal identification systems.

An automatic image-diagnosis support system uses AI to automatically analyze DPRs for osteoporosis, dental caries, periapical lesions, calculus, cysts, and the like, so that diagnosis is fast and accurate and the dentist's burden is reduced (for example, Non-Patent Documents 5 and 6). Conventionally, preparing imaging findings from a DPR has required specialized knowledge and a great deal of time. This has placed a heavy burden on dentists, and the resulting findings have been subjective and dependent on experience. AI is expected to detect, quickly and accurately, findings that inexperienced dentists tend to overlook, and to produce objective findings. Development has centered on the DPR for three reasons: it involves an extremely low radiation dose, it is the most widely used image type with the most abundant data, and it depicts all the features and lesions of the teeth and jawbone. Of course, an AI-based automatic image-diagnosis system is not limited to DPRs; any digital dental image can be used, including various intraoral X-ray images, dental CBCT (Cone Beam X-ray Computed Tomography), which yields three-dimensional images, and cephalograms, which are standardized head X-ray radiographs.

A personal identification system establishes the identity of bodies after large-scale disasters and similar events (for example, Non-Patent Documents 7 and 8). Physical features, fingerprints, genetic information, dental information, and the like are used for personal identification. When a body is severely damaged, however, identification by physical features or fingerprints is difficult, and genetic information is often unavailable because it was never collected during life; dental identification is therefore recognized as important. Dental findings established the identities of 1,250 people after the Great East Japan Earthquake. Dental identification is performed on digital dental X-ray images, but, like the image diagnosis described above, it requires specialized knowledge and time, so identifying large numbers of people after a major disaster demands enormous effort. Moreover, the psychological burden of postmortem examination is far heavier than that of diagnosis. AI-based personal identification systems are therefore increasingly expected to enable fast and accurate identification without a dentist's involvement, and they are being actively developed together with the supporting infrastructure, such as image data management.

As the development of image recognition algorithms suggests, expectations for AI in dentistry are higher abroad than in Japan, and many studies have examined DL-based image recognition for analyzing digital dental X-ray images (for example, Non-Patent Documents 9 to 15).

Taken together, these prior techniques show that, in dental image processing, identifying each tooth, assigning its tooth number, and determining the dental formula is an extremely difficult problem.

First, most of the object detection algorithms used are rectangle (bounding-box) detectors based on R-CNN, the algorithm that triggered the large gain in accuracy for classifying and locating object images. Non-Patent Documents 9, 11, 13, and 14 all use Faster R-CNN, and Non-Patent Document 10 uses Mask R-CNN. Presumably, because teeth are similar objects packed closely together, researchers have favored accuracy, which is why so many results are reported for the R-CNN family of two-stage detectors. Non-Patent Document 14 supports this presumption: there, Faster R-CNN exceeds SSD, a one-stage detector, in recall, specificity, and precision.
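The three metrics named above follow the standard confusion-matrix definitions, sketched below. The counts in the example are made up for illustration and are not the figures from Non-Patent Document 14. Note that what counts as a "true negative" in object detection depends on how a study defines negative cases.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix metrics."""
    recall = tp / (tp + fn)        # sensitivity: share of actual positives that were found
    specificity = tn / (tn + fp)   # share of actual negatives correctly rejected
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    return {"recall": recall, "specificity": specificity, "precision": precision}

# Illustrative counts only.
m = detection_metrics(tp=90, fp=10, fn=5, tn=95)
print(m)
```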

Second, Non-Patent Documents 10 and 12 examine Mask R-CNN and DeepLabv3, algorithms suited to segmentation. Segmentation serves a different purpose from object detection: instead of a rectangle enclosing the object, it delineates the object's boundary and assigns it pixel by pixel. Pixel-level delineation allows tooth size and tooth pixel values to be measured more accurately, an advantage that rectangle detection lacks, but the DL output is large and the computational cost is high.
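The accuracy advantage of pixel-level masks over rectangles can be shown with a small sketch. The 4x4 diagonal "tooth" mask and the intensities (200 for tooth, 50 for background) are invented for illustration. The rectangle measure includes background pixels, so it overstates the area and dilutes the mean pixel value.

```python
def mask_area(mask):
    """Number of pixels labelled as the object."""
    return sum(sum(row) for row in mask)

def bounding_box(mask):
    """Tightest enclosing rectangle (x1, y1, x2, y2), end coordinates exclusive."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1

def box_area(box):
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)

def mean_over_mask(image, mask):
    vals = [p for prow, mrow in zip(image, mask) for p, m in zip(prow, mrow) if m]
    return sum(vals) / len(vals)

def mean_over_box(image, box):
    x1, y1, x2, y2 = box
    return sum(sum(row[x1:x2]) for row in image[y1:y2]) / box_area(box)

# A tilted object occupying 7 pixels of a 4x4 region.
mask = [[1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 1]]
image = [[200 if v else 50 for v in row] for row in mask]
box = bounding_box(mask)
print(mask_area(mask), box_area(box))                            # 7 vs 16 pixels
print(mean_over_mask(image, mask), mean_over_box(image, box))    # 200.0 vs 115.625
```

The trade-off mentioned above is visible here too: the mask stores one label per pixel, whereas the box needs only four numbers.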

Third, as Non-Patent Document 9 and Non-Patent Documents 11 to 14 make clear, the accuracy of a predicted dental formula cannot be raised without some form of correction. Heuristic, empirical methods, as opposed to algorithms in the computer-programming sense, are often used, and in Non-Patent Document 12 the corrections are made by experts. Non-Patent Document 15, whose tooth image recognition does not use DL, predicts the final dental formula with dynamic programming (DP), an algorithm used in bioinformatics to elucidate protein and gene information; specifically, it applies a simplified Smith-Waterman algorithm to local alignment.
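A basic Smith-Waterman local alignment with a linear gap penalty is sketched below, applied to tooth-label sequences. The scoring values (match +2, mismatch -1, gap -1) and the FDI-style labels for the upper-right quadrant are illustrative assumptions. The exact simplification used in Non-Patent Document 15 is not described here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of sequences a and b with a linear gap penalty.

    Returns (best_score, pairs), where pairs lists aligned (index_in_a, index_in_b).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell ends the local alignment.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1        # gap in b
        else:
            j -= 1        # gap in a (e.g. a missing tooth)
    return best, pairs[::-1]

reference = ["18", "17", "16", "15", "14", "13", "12", "11"]   # upper-right quadrant
predicted = ["16", "14", "13", "12"]                          # tooth 15 absent
score, pairs = smith_waterman(predicted, reference)
print(score, [(predicted[i], reference[j]) for i, j in pairs])
```

The alignment bridges the missing tooth 15 with a single gap rather than forcing a mismatch, which is what makes DP alignment useful for reconciling predicted tooth sequences with an expected dentition.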

Fourth, image processing is carried out by connecting algorithms in series: a two-stage object detection algorithm and an object classification algorithm in Non-Patent Document 11, and an algorithm suited to segmentation and an object classification algorithm in Non-Patent Document 12.

Thus, in image processing of teeth, where similar objects are closely packed, identifying each tooth, assigning its tooth number, and determining the dental formula is an extremely difficult problem. Such processing must favor accuracy and apply heuristic corrections, and no image processing device has yet been found that performs DL-based image recognition both quickly and accurately. In particular, studies of tooth image recognition have been limited to adults. It remains difficult to identify and predict the more complex dental formulas of infants and children, which include deciduous teeth, or mixed dentitions containing both permanent and deciduous teeth.

Non-Patent Document 1: TickTack World, "Easy Introduction to Machine Learning" (やさしい機械学習入門), [online], [retrieved January 12, 2021], Internet <http://gagbot.net/machine-learning>
Non-Patent Document 2: Hiromasa Fujiwara, "General Object Recognition by Deep Learning and Its Business Applications," Image Lab, January 2019, pp. 57-67
Non-Patent Document 3: Akitoshi Katsumata, "Current Status and Future Prospects of Dental Image Information," The Japanese Journal of Conservative Dentistry, Vol. 62, No. 5, pp. 238-242 (October 2019)
Non-Patent Document 4: Akitoshi Katsumata, "Do You Know Panoramic X-ray Photography?", NL Dayori, February 2018 issue, No. 482, pp. 1-2
Non-Patent Document 5: Tatsuro Hayashi, Ryu Takahashi, Yosuke Tsuji, Hironobu Tsuji, "Osteoporosis Screening Using Artificial Intelligence Technology," Journal of the Society of Medical Image Information, Vol. 36, No. 2, pp. 114-116 (2019)
Non-Patent Document 6: Medihome Co., Ltd., "(Industry First) Development of Diagnostic AI for Dental X-rays: Diagnosis About 6,000 Times Faster Than a Doctor," [online], [retrieved January 12, 2021], Internet <https://prtimes.jp/main/html/rd/p/000000001.000034901.html>
Non-Patent Document 7: Chisako Muramatsu, "Application of Deep Learning Technology to Dental Personal Identification," JCR News, No. 217, pp. 10-11 (2017)
Non-Patent Document 8: Hideyuki Takano, Yukihiro Momota, Kenji Terada, "Learning from the Past, Preparing for the Future: Faster Identity Confirmation Using AI and Image Analysis," Dental Diamond, March 2020, pp. 88-93
Non-Patent Document 9: Hu Chen, Kailai Zhang, Peijun Lyu, Hong Li, Ludan Zhang, Ji Wu and Chin-Hui Lee, "A deep learning approach to automatic teeth detection and numbering based on object detection in dental periapical films," Scientific Reports volume 9, Article number: 3840 (2019), [online], [retrieved January 12, 2021], Internet <https://www.nature.com/articles/s41598-019-40414-y>
Non-Patent Document 10: Gil Jader, Jefferson Fontinele, Marco Ruiz, Kalyf Abdalla, Matheus Pithon and Luciano Oliveira, "Deep Instance Segmentation of Teeth in Panoramic X-Ray Images," [online], [retrieved January 12, 2021], Internet <file:///C:/Users/User/Downloads/tooth_segmentation%20(1).pdf>
Non-Patent Document 11: Dmitry V. Tuzoff, Lyudmila N. Tuzova, Michael M. Bornstein, Alexey S. Krasnov, Max A. Kharchenko, Sergey I. Nikolenko, Mikhail M. Sveshnikov and Georgiy B. Bednenko, "Tooth detection and numbering in panoramic radiographs using convolutional neural networks," Dentomaxillofacial Radiology Volume 48, Issue 4, 2019, 20180051, [online], [retrieved January 12, 2021], Internet <https://www.birpublications.org/doi/full/10.1259/Dmfr.20180051>
Non-Patent Document 12: Andre Ferreira Leite, Adriaan Van Gerven, Holger Willems, Thomas Beznik, Pierre Lahoud, Hugo Gaeta-Araujo, Myrthel Vranckx and Reinhilde Jacobs, "Artificial intelligence-driven novel tool for tooth detection and segmentation on panoramic radiographs," [online], [retrieved January 12, 2021], Internet <https://omfsimpath.be/onewebmedia/Artificial%20intelligence-driven%20novel%20tool%20for%20tooth%20detection%20and%20segmentation%20on%20panoramic%20radiographs.pdf>
Non-Patent Document 13: Fahad Parvez Mahdi, Kota Motoki and Syoji Kobashi, "Optimization technique combined with deep learning method for teeth recognition in dental panoramic radiographs," Scientific Reports volume 10, Article number: 19261 (2020), [online], [retrieved January 12, 2021], Internet <https://www.nature.com/articles/s41598-020-75887-9>
Non-Patent Document 14: Changgyun Kim, Donghyun Kim, HoGul Jeong, Suk-Ja Yoon and Sekyoung Youm, "Automatic Tooth Detection and Numbering Using a Combination of a CNN and Heuristic Algorithm," Appl. Sci. 2020, 10(16), 5624, [online], [retrieved January 12, 2021], Internet <file:///C:/Users/User/Downloads/applsci-10-05624-v2.pdf>
Non-Patent Document 15: Anny Yuniarti, Anindhita Sigit Nugroho, Bilqis Amaliah and Agus Zainal Arifin, "Classification and Numbering of Dental Radiographs for an Automated Human Identification System," TELKOMNIKA, Vol. 10, No. 1, March 2012, pp. 137-146, [online], [retrieved January 12, 2021], Internet <file:///C:/Users/Owner/Downloads/Classification_and_Numbering_of_Dental_Radiographs.pdf>

Teeth are densely packed, similar objects, and in recognizing them it is extremely difficult for any DL image recognition algorithm developed to date, used alone, to locate and identify each object image quickly and accurately. Identifying and predicting a dental formula in particular requires a two-stage object detection algorithm that favors accuracy over speed, followed by an object classification algorithm, and then correction of the tooth numbers by heuristic, empirical methods or the like. Moreover, although adult dental formulas have been studied, it remains difficult to identify and predict the more complex dental formulas of infants and children, which include deciduous teeth, or mixed dentitions containing both permanent and deciduous teeth.

Accordingly, an object of the present invention is to provide an image processing device and an image processing method capable of recognizing tooth images quickly and accurately. In particular, an object of the present invention is to provide an image processing device and an image processing method capable of quickly and accurately identifying the dental formulas not only of adults but also of infants and children.

The present inventors connected in series YOLOv3, a speed-oriented one-stage object detection algorithm, and EfficientNet, an existing object classification algorithm. With this arrangement they located the rectangle of each tooth, identified each tooth, and then applied DP-based alignment correction to dentally inconsistent sequences. They found that this approach identifies complex dental formulas with deciduous teeth, and mixed dentitions with permanent and deciduous teeth, quickly and accurately. Further studies based on this finding showed that the invention need not be limited to these particular object detection and object classification algorithms: similar results are obtained by adding an appropriate DP alignment correction to the series connection of the two algorithms. This led to the completion of the present invention.
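This section does not spell out the DP formulation used for alignment correction. One plausible formulation, offered only as a hypothetical sketch and not as the inventors' method, is a monotone assignment. Detected teeth, sorted left to right, receive labels in strictly increasing reference order; reference positions may be skipped (missing teeth); and the summed log-probability from the classifier is maximized. Mixed dentition, where a deciduous and a permanent tooth can share a position, is not handled here.

```python
import math

NEG = float("-inf")

def monotone_relabel(probs, reference):
    """Hypothetical DP correction of per-tooth classifier output.

    probs[k][label]: classifier probability that detection k (left to right) has that label.
    reference:       labels in anatomical order.
    Returns one label per detection, in increasing reference order, maximising the
    summed log-probability; unused reference positions are treated as missing teeth.
    """
    n, m = len(probs), len(reference)
    # best[k][j]: best score for the first k detections using only reference[:j]
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    take = [[False] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = 0.0
    for k in range(1, n + 1):
        for j in range(1, m + 1):
            skip = best[k][j - 1]                          # reference[j-1] left unused
            p = probs[k - 1].get(reference[j - 1], 0.0)
            use = best[k - 1][j - 1] + (math.log(p) if p > 0 else NEG)
            if use > skip:
                best[k][j], take[k][j] = use, True
            else:
                best[k][j] = skip
    if best[n][m] == NEG:
        raise ValueError("no consistent labelling exists")
    labels, k, j = [], n, m
    while k > 0:                                           # trace the chosen assignments back
        if take[k][j]:
            labels.append(reference[j - 1])
            k -= 1
        j -= 1
    return labels[::-1]

reference = ["18", "17", "16", "15", "14", "13", "12", "11"]
probs = [{"16": 0.9, "15": 0.1},
         {"16": 0.6, "15": 0.4},     # argmax alone would duplicate "16"
         {"14": 0.8, "13": 0.2},
         {"13": 0.7, "12": 0.3}]
print(monotone_relabel(probs, reference))
```

Taking the per-tooth argmax would yield 16, 16, 14, 13, a dentally impossible sequence. The DP instead returns the most probable labelling that respects the anatomical order.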

That is, the present invention is an image processing device comprising the following units:

- an input unit into which object image data can be input;
- an object image placement unit;
- an object image identification unit;
- an object image correction unit that can correct the results output from the object image identification unit; and
- an output unit that can output the processing results for the detection target object images.

The object image placement unit is executed by an object detection algorithm. This algorithm incorporates an object feature extraction unit as its backbone. The feature extraction unit extracts object features from an existing object image dataset and is executed by an object classification algorithm that has at least a CNN as a module. The placement unit learns the following data:

- the input first teacher image data;
- a learning image dataset; and
- augmented image data derived from that learning image dataset.

The placement unit can create a first learning model by transfer learning and fine-tuning, using the learning model of the object feature extraction unit. On an input detection target image, it can then identify first rectangles, each of which encloses the image of an individual detection target object. For each first rectangle, it identifies an information tag and the position of the rectangle to which that tag is attached.

The object image identification unit is executed by an object classification algorithm. It learns the following data:

- the first-rectangle data and/or the wide-area data of the first rectangles;
- second teacher image data that differs from the first teacher image data;
- the learning image dataset; and
- its augmented image data.

The identification unit can create a second learning model by transfer learning and fine-tuning, using the learning model of the object feature extraction unit. It then attaches a unique information tag to each first rectangle located by the placement unit, and in this way classifies and identifies the detection target object images.
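The serial arrangement of units described above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the patented implementation. The detector, classifier and corrector are passed in as plain callables. These are hypothetical stand-ins for, for example, a YOLO-type detector, an EfficientNet-type classifier and a DP alignment corrector. Boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates. Teeth are assumed to be ordered along the arch by sorting on the horizontal box center within each coarse tag, such as upper or lower jaw.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

# Box as (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box                      # first rectangle located by the placement unit
    tag: str                      # coarse information tag, e.g. "upper" or "lower"
    label: Optional[Any] = None   # unique information tag from the identification unit


@dataclass
class ToothPipeline:
    # Hypothetical stand-ins for the three processing units.
    place: Callable[[Any], List[Detection]]         # object image placement unit
    identify: Callable[[Any, Detection], Any]       # object image identification unit
    correct: Callable[[str, List[Any]], List[Any]]  # object image correction unit

    def run(self, image: Any) -> List[Detection]:
        """Input -> placement -> identification -> correction -> output."""
        dets = self.place(image)
        for d in dets:
            d.label = self.identify(image, d)
        out: List[Detection] = []
        # Correct each coarse group (e.g. each jaw) separately, ordered left to right.
        for tag in sorted({d.tag for d in dets}):
            group = sorted(
                (d for d in dets if d.tag == tag),
                key=lambda d: (d.box[0] + d.box[2]) / 2,
            )
            corrected = self.correct(tag, [d.label for d in group])
            for d, lab in zip(group, corrected):
                d.label = lab
            out.extend(group)
        return out  # handed to the output unit
```

Keeping the units as injected callables mirrors the point made in the text. The specific detection and classification algorithms are interchangeable, as long as the correction step sits after them.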

The object classification algorithm is not particularly limited. However, it is preferable to use at least one of the following algorithms, which combine speed and accuracy: AlexNet, GPipe (Giant Neural Networks using Pipeline Parallelism), Inception, SEB (Squeeze-and-Excitation Block)-Inception, Xception, DenseNet (Densely Connected Convolutional Network), ResNet (Residual Network), SEB-ResNet, Inception-ResNet, SEB-Inception-ResNet, ResNeXt, NASNet (Neural Architecture Search Network), VGG (Visual Geometry Group), SEB-VGG, MobileNet, MnasNet, AmoebaNet, CSPNet (Cross Stage Partial Network), CBNet (Composite Backbone Network), Darknet, EfficientNet, and NFNet. As noted above, many improved variants of these algorithms have been released under the same names with version numbers or similar suffixes attached. The term "object classification algorithm" here covers all such variants, and this applies to every object classification algorithm described below.

In particular, an object classification algorithm that includes at least one module and/or block selected from the following is preferable, because it can improve accuracy: SEB, RB (Residual Block), DConv (Depthwise Convolution Layer), PConv (Pointwise Convolution Layer), MixConv (Mixed Depthwise Convolution Layer), and GAP (Global Average Pooling). Any object classification algorithm with these modules and/or blocks may be used. Examples include ResNet, ResNeXt, MobileNet, MnasNet, Darknet, and EfficientNet.
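The following NumPy sketch illustrates two of the building blocks named above. It shows a forward pass of a Squeeze-and-Excitation block, whose squeeze step is a Global Average Pooling. It follows the commonly published SE design:

1. GAP;
2. a reduction fully connected layer with ReLU;
3. an expansion fully connected layer with sigmoid;
4. channel-wise rescaling.

The weight names and the channel-first `(C, H, W)` layout are assumptions made for the example, and no training is shown.

```python
import numpy as np


def global_average_pool(x: np.ndarray) -> np.ndarray:
    """GAP: average each channel of a (C, H, W) feature map to a single value."""
    return x.mean(axis=(1, 2))


def squeeze_excitation(x, w1, b1, w2, b2):
    """Forward pass of an SE block on a (C, H, W) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    """
    z = global_average_pool(x)                # squeeze: (C,)
    h = np.maximum(0.0, w1 @ z + b1)          # reduction FC + ReLU: (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # expansion FC + sigmoid: (C,), each in (0, 1)
    return x * s[:, None, None]               # excitation: rescale each channel


rng = np.random.default_rng(0)
C, r = 8, 4  # channel count and reduction ratio (illustrative values)
x = rng.standard_normal((C, 5, 5))
y = squeeze_excitation(
    x,
    rng.standard_normal((C // r, C)), np.zeros(C // r),
    rng.standard_normal((C, C // r)), np.zeros(C),
)
print(y.shape)  # (8, 5, 5)
```

Because each channel weight lies strictly between 0 and 1, the block can only attenuate or keep channels. It learns which feature channels to emphasize relative to the others.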

The object image correction unit is preferably executed by dynamic programming (DP), because DP enables fast correction. In contrast to heuristic empirical methods, DP is used in bioinformatics to elucidate protein and gene information. DP alignment can be applied in two ways:

- as local alignment, which finds locally similar regions without judging the sequence as a whole; or
- as global alignment, which judges the entire sequence.

The appropriate form must be chosen according to the purpose. For dental formula identification in particular, local alignment makes it difficult to correct the dental formula as a whole. Global alignment, on the other hand, cannot accurately correct for the presence or absence of wisdom teeth, or for runs of missing teeth that continue from the wisdom teeth. Semi-global alignment is therefore more preferable.
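Below is a minimal sketch of how semi-global DP alignment could correct a predicted tooth sequence. The following assumptions are not taken from the patent:

- predicted FDI tooth numbers for one jaw are already ordered along the arch;
- the reference is the full permanent upper arch;
- the scores `match=2`, `mismatch=-1` and `gap=-2` are arbitrary illustrative values.

The semi-global part is that skipping reference teeth at either end of the arch costs nothing. This is what lets absent wisdom teeth, or runs of missing teeth starting from them, go unpenalized. Gaps inside the arch and extra predictions are still penalized.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def semiglobal_correct(
    pred: Sequence[Hashable],
    ref: Sequence[Hashable],
    match: int = 2,
    mismatch: int = -1,
    gap: int = -2,
) -> Tuple[List[Optional[Hashable]], int]:
    """Align predicted labels to a reference arch with free end gaps in the reference.

    Returns the corrected label for each predicted tooth (None if the tooth was
    aligned to a gap, i.e. treated as a spurious prediction) and the alignment score.
    """
    n, m = len(pred), len(ref)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap  # unassigned predictions before the arch are penalized
    # Row 0 stays 0: skipping reference teeth at the start of the arch is free.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if pred[i - 1] == ref[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j - 1] + sub,  # assign prediction i to reference tooth j
                S[i - 1][j] + gap,      # prediction i left unassigned
                S[i][j - 1] + gap,      # reference tooth j skipped (missing tooth)
            )
    # Skipping reference teeth at the end is also free: take the best last-row column.
    j = max(range(m + 1), key=lambda k: S[n][k])
    score, i = S[n][j], n
    corrected: List[Optional[Hashable]] = [None] * n
    while i > 0 and j > 0:  # traceback
        sub = match if pred[i - 1] == ref[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + sub:
            corrected[i - 1] = ref[j - 1]
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return corrected, score


UPPER_ARCH = [18, 17, 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27, 28]
predicted = [17, 16, 16, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27]  # 15 mislabelled as 16
print(semiglobal_correct(predicted, UPPER_ARCH))
# ([17, 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27], 25)
```

In this example, the duplicated `16` is relabelled `15`, because one mismatch (-1) scores higher than opening two gaps (-4). The absent `18` and `28` at the ends of the arch cost nothing.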

The object detection algorithm, on the other hand, may be either a two-stage or a one-stage object detection algorithm.

For the two-stage object detection algorithm, it is preferable to adopt at least one of the following: R-CNN (Region-based Convolutional Neural Network), Fast R-CNN, Faster R-CNN, Mask R-CNN, and R-FCN (Region-based Fully Convolutional Network).

For the one-stage object detection algorithm, it is preferable to adopt at least one of the following: OverFeat, DPM (Deformable Parts Model), SSD (Single Shot MultiBox Detector), DSSD (Deconvolutional Single Shot Detector), ESSD (Extend the shallow part of Single Shot MultiBox Detector), RefineDet (Single-Shot Refinement Neural Network for Object Detection), RetinaNet, M2Det, YOLO, Scaled YOLO, and EfficientDet. As with the classification algorithms, many improved variants of these detection algorithms have been released under the same names with version numbers or similar suffixes attached. The term "object detection algorithm" here covers all such variants, and this applies to every object detection algorithm described below.

Two-stage methods are generally considered to be more accurate, and one-stage methods faster. However, the present inventors found that an important factor in the accuracy and speed of this image processing device is the combination of object detection algorithm and object classification algorithm that executes it. The preferred combination was found to be:

- an object detection algorithm selected from M2Det, YOLO, and EfficientDet; and
- an object classification algorithm selected from ResNet, ResNeXt, MobileNet, MnasNet, Darknet, and EfficientNet.

For faster and more accurate image recognition, the preferred pairing is:

- YOLOv3 or later, or EfficientDet, as the object detection algorithm; with
- ResNet-101 or later, MobileNetV3 or later, or Darknet-53 or later, as the object classification algorithm.

This concludes the description of the image processing device of the present invention. The image processing method of the present invention is effective for tooth image processing. It can quickly and accurately identify adult dental formulas, and also the more complex dental formulas of infants and children, in which deciduous and permanent teeth are mixed.

The first image processing method of the present invention comprises the following steps.

(1) Identify first rectangles on the input detection target image, each enclosing the image of an individual detection target object, together with the information tag of each rectangle and the position of the rectangle to which that tag is attached. This step uses a first learning model. The first model is acquired by learning the input first teacher image data, the learning image dataset and augmented data of that dataset, together with transfer learning and fine-tuning that use the learning model of the object feature extraction unit.

(2) Attach a unique information tag to each located first rectangle, and so classify and identify the detection target object images. This step uses a second learning model. The second model is acquired by learning the first-rectangle data and/or the wide-area data of the first rectangles, second teacher image data that differs from the input first teacher image data, the learning image dataset and its augmented data. This learning is likewise combined with transfer learning and fine-tuning that use the learning model of the object feature extraction unit.

(3) Correct any detection target object images whose identification results violate natural laws.

(4) Output the processing results for the detection target object images.

The object placement unit of the image processing device of the present invention is executed by a two-stage or one-stage object detection algorithm. This algorithm incorporates, as its backbone, an object classification algorithm that contains an object feature extraction unit, which extracts object features from a known learning dataset. The placement unit could therefore locate and identify object images by itself. Its distinguishing feature, however, is that the information tags annotated in the first teacher image data are restricted. As a result, the unit specializes in accurately locating and identifying the rectangle that encloses each detection target object image.

This process starts from the rectangles enclosing each detection target object image, located accurately in the way described above. Unlike the first teacher image data, the second teacher image data is annotated with information tags that can identify each detected object. An object classification algorithm specialized for classification then attaches a unique information tag to each rectangle. In this way, the position of each detection target object image is specified accurately, and its type is identified accurately.
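The sketch below shows one way to derive both the coarse first-stage tags and the identifying second-stage tags from a single annotation. It assumes that annotators record FDI two-digit tooth numbers; the patent does not state which notation is used. Each number is mapped to two things:

- a jaw-only tag for the first teacher data; and
- the classification targets that this description lists for the second teacher data: jaw, side, permanent or deciduous, tooth type, and wisdom tooth.

The FDI conventions used are standard. Quadrants 1 to 4 hold permanent teeth and 5 to 8 deciduous teeth, ordered upper right, upper left, lower left, lower right.

```python
from typing import Dict, Union

Target = Dict[str, Union[str, bool]]


def targets_from_fdi(code: int) -> Target:
    """Derive multi-task classification targets from an FDI two-digit tooth number."""
    quadrant, position = divmod(code, 10)
    if 1 <= quadrant <= 4:
        permanent, max_position = True, 8
    elif 5 <= quadrant <= 8:
        permanent, max_position = False, 5
    else:
        raise ValueError(f"invalid FDI quadrant in {code}")
    if not 1 <= position <= max_position:
        raise ValueError(f"invalid FDI tooth position in {code}")
    q = quadrant if permanent else quadrant - 4  # map deciduous 5..8 onto 1..4
    if position <= 2:
        tooth_type = "incisor"
    elif position == 3:
        tooth_type = "canine"
    elif permanent and position <= 5:
        tooth_type = "premolar"
    else:
        tooth_type = "molar"  # permanent positions 6-8, deciduous positions 4-5
    return {
        "jaw": "maxillary" if q in (1, 2) else "mandibular",
        "side": "right" if q in (1, 4) else "left",  # patient's right/left
        "dentition": "permanent" if permanent else "deciduous",
        "type": tooth_type,
        "wisdom": permanent and position == 8,
    }


def first_stage_tag(code: int) -> str:
    """Coarse tag for the first teacher data in this sketch: jaw only."""
    return str(targets_from_fdi(code)["jaw"])


print(first_stage_tag(46), targets_from_fdi(75)["type"])  # mandibular molar
```

Deriving both label sets from one annotation keeps the two stages consistent. The restricted first-stage tags and the richer second-stage tags can never disagree about which jaw a tooth is in.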

However, when recognizing images of densely packed teeth, overlapping object images and similar causes can produce predictions that violate natural laws. One example is a duplicate prediction that assigns two different adjacent objects to the same position. It is therefore effective to add a step that corrects the detection target objects as a final check. This correction is preferably performed with a DP alignment algorithm.
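One simple way to detect the kind of contradiction described here is to check the predicted sequence against the anatomical order of the arch. This can be done before deciding whether the correction step is needed. The helper below is hypothetical and for illustration only. It assumes that predictions are FDI numbers ordered along the arch, and flags duplicates, out-of-order labels, and labels absent from the reference.

```python
from typing import Hashable, List, Sequence, Tuple


def find_inconsistencies(
    labels: Sequence[Hashable], ref: Sequence[Hashable]
) -> List[Tuple[int, str]]:
    """Flag predictions that contradict the anatomical order given by `ref`."""
    order = {t: k for k, t in enumerate(ref)}
    problems: List[Tuple[int, str]] = []
    seen = set()
    last = -1  # furthest reference position reached so far
    for idx, t in enumerate(labels):
        if t not in order:
            problems.append((idx, "unknown"))
            continue
        if t in seen:
            problems.append((idx, "duplicate"))
        elif order[t] <= last:
            problems.append((idx, "out_of_order"))
        seen.add(t)
        last = max(last, order[t])
    return problems


UPPER_ARCH = [18, 17, 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27, 28]
print(find_inconsistencies([17, 16, 16, 14], UPPER_ARCH))  # [(2, 'duplicate')]
```

An empty result means the sequence is already anatomically plausible, so the alignment-based correction can be skipped for that image.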

As a specific example, this image processing method can be used to identify a dental formula as follows.

The first teacher image data is built from dental X-ray digital images taken at many medical institutions. A learning image set is prepared from these images, consisting of training data, validation data and test data annotated by dental radiologists. This set is input to the object image placement unit. The placement unit is executed by a two-stage or one-stage object detection algorithm that incorporates an object feature extraction unit as its backbone. The feature extraction unit extracts object features from an existing object image dataset and is executed by an object classification algorithm that has a CNN as a module.

The information tags defined in the annotation distinguish, for example, maxillary teeth from mandibular teeth. They define rectangles that cover a part common to every individual tooth, such as:

- the crown;
- the root;
- the region near the crown-root boundary; or
- the whole crown and root.

The data for rectangles defined in this way are input as the first teacher image data. In addition, augmented images are created from the many dental X-ray digital images by so-called data augmentation. The augmentation takes into account aspect-ratio variation, resolution scaling, cropping, translation, rotation, horizontal flipping, random erasing, random noise, brightness and similar factors. These augmented images are likewise input to the object image placement unit.
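Two of the augmentations listed above are sketched below in NumPy:

- a horizontal flip that also mirrors the bounding boxes; and
- additive random noise with a random brightness shift.

Boxes are assumed to be `(x1, y1, x2, y2)` in continuous pixel coordinates on an 8-bit grayscale image. The noise and brightness ranges are arbitrary example values. A horizontal flip swaps the patient's right and left. It therefore leaves jaw-only tags (maxillary or mandibular) unchanged, but any right/left identity label would have to be swapped as well.

```python
import numpy as np


def hflip_with_boxes(img: np.ndarray, boxes: np.ndarray):
    """Mirror an image left-right and move its (x1, y1, x2, y2) boxes accordingly."""
    w = img.shape[1]
    flipped = img[:, ::-1].copy()
    out = boxes.astype(float)     # astype returns a new array
    out[:, 0] = w - boxes[:, 2]   # new x1 comes from the old x2
    out[:, 2] = w - boxes[:, 0]   # new x2 comes from the old x1
    return flipped, out


def noise_and_brightness(img: np.ndarray, rng: np.random.Generator,
                         noise_std: float = 5.0, max_shift: float = 10.0) -> np.ndarray:
    """Add Gaussian noise and one random brightness offset to an 8-bit image."""
    shift = rng.uniform(-max_shift, max_shift)
    out = img.astype(float) + rng.normal(0.0, noise_std, img.shape) + shift
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


img = np.zeros((10, 20), dtype=np.uint8)
flipped, boxes = hflip_with_boxes(img, np.array([[2, 1, 5, 4]]))
print(boxes)  # [[15.  1. 18.  4.]]
```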

The placement unit learns from these image data. It also performs transfer learning using the features learned and extracted by the object feature extraction unit, and a first learning model is created as a result of fine-tuning. This learning model is then applied to the input detection target image, in which the individual teeth are the detection target objects. For the first rectangles enclosing those teeth, the model identifies each rectangle's information tag and the position of the rectangle to which that tag is attached.

Next, second teacher image data is prepared that differs from the first teacher image data. The learning image set and its augmented images remain the same as those input to the object image placement unit. These data are input to the object image identification unit, which is executed by an object classification algorithm. The second teacher image data consists of two kinds of data:

- image data for detecting each individual tooth to be detected (the target tooth); and
- annotated teacher image data for classifying and identifying the target tooth.

Examples of the first kind include:

- image data of the target tooth;
- image data showing the position of the target tooth relative to the other teeth;
- a wide-area image centered on the target tooth; and
- gradient and angle images of the target tooth.

A wide-area image that is centered on the target tooth and includes at least the adjacent teeth is particularly preferred. For the second kind, teacher signals are used for the following classifications:

- maxillary versus mandibular teeth;
- right versus left teeth;
- permanent versus deciduous teeth;
- tooth type (incisor, canine, premolar, molar, etc.); and
- wisdom versus non-wisdom teeth.

A second learning model is created by learning these teacher signals. This model is then used to infer the dental formula of the teeth to be detected. Either single-task or multi-task learning can be used to create this model, but multi-task learning is preferred in the present invention. Multi-task learning is generally preferable because of its faster inference, and the feared loss of accuracy for similar objects such as teeth did not occur. A general-purpose inference module, such as OpenVINO (registered trademark), can be used for inference.
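The wide-area image centered on the target tooth could be produced by enlarging the first rectangle around its center and clipping it to the image, as in this sketch. The enlargement factors are assumptions for illustration:

- `sx=3.0`, roughly one tooth width on each side, so that the adjacent teeth are included; and
- `sy=1.2`.

The patent does not specify how the wide-area region is defined.

```python
import math

import numpy as np


def wide_area_box(box, img_w, img_h, sx=3.0, sy=1.2):
    """Enlarge an (x1, y1, x2, y2) box about its center and clip it to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * sx / 2, (y2 - y1) * sy / 2
    return (
        max(0, math.floor(cx - half_w)),
        max(0, math.floor(cy - half_h)),
        min(img_w, math.ceil(cx + half_w)),
        min(img_h, math.ceil(cy + half_h)),
    )


def crop_wide_area(img: np.ndarray, box, **kwargs) -> np.ndarray:
    """Return the wide-area crop around a tooth box (hypothetical helper)."""
    h, w = img.shape[:2]
    x1, y1, x2, y2 = wide_area_box(box, w, h, **kwargs)
    return img[y1:y2, x1:x2]


print(wide_area_box((100, 50, 120, 150), 1000, 500, sy=1.5))  # (80, 25, 140, 175)
```

Scaling the width more than the height reflects the layout of a panoramic image, where neighbouring teeth sit mainly to the left and right of the target tooth. This is a design assumption of the sketch.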

In a dental formula inferred in this way, the teeth are similar and densely packed, and the relationships between adjacent teeth are not taken into account. The inference may therefore yield an anatomically impossible arrangement. In that case, a correction step is performed by the object image correction unit, which is executed by a DP alignment algorithm. A distinctive feature of this step is the input it works on: the DP alignment algorithm is applied to the results inferred in the step that classifies and identifies the teeth (the detection target objects). Applying it in this way allows the dental formula to be corrected accurately. Naturally, an erroneous inference is not always obtained. This step is therefore included in the image processing method of the present invention, but it does not always have to be performed.

The inventors also found a more preferable method for more accurate tooth image recognition. In this method, a rectangle enclosing the set of teeth is identified first. The positions of the individual tooth rectangles with information tags are identified next, and the individual teeth are then classified and identified.

That is, a more preferable image processing method of the present invention comprises the following steps.

(1) Identify the position of a second rectangle that encloses all of the detection target object images on the input detection target image. This step uses a third learning model. The third model is acquired by learning third teacher image data that differs from the input first and second teacher image data, the learning image dataset and augmented data of that dataset. This learning is combined with transfer learning and fine-tuning that use the learning model of the object feature extraction unit.

(2) Identify third rectangles, which are refined versions of the first rectangles enclosing the images of the individual detection target objects on the input detection target image. For each third rectangle, identify its information tag and the position of the rectangle to which that tag is attached. This step uses a fourth learning model. The fourth model is acquired by learning the second-rectangle data and/or the wide-area data of the second rectangle, the input first teacher image data, the learning image dataset and its augmented data. This learning is likewise combined with transfer learning and fine-tuning that use the learning model of the object feature extraction unit.

(3) Attach a unique information tag to each third rectangle, and so classify and identify the detection target object images. This step uses a fifth learning model. The fifth model is acquired by learning the third-rectangle data and/or the wide-area data of the third rectangles, the input second teacher image data, the learning image dataset and its augmented data. This learning is likewise combined with transfer learning and fine-tuning that use the learning model of the object feature extraction unit.

(4) Correct the detection target object images.

(5) Output the processing results for the detection target object images.

In this image processing method, the second rectangle encloses the set of teeth. The inventors found this useful for two reasons. First, it removes unnecessary information from the detection target image. Second, identifying teeth requires grasping not only the shape of each individual tooth but also the relative positions of the teeth.

For dental formula identification in particular, the second rectangle enclosing this set of teeth is a rectangle that encloses every individual tooth. For this purpose, third teacher image data was used in addition to the first and second teacher image data. In the third teacher image data, a rectangle enclosing all the individual teeth was set as the teacher image. Apart from this additional step, the method is performed in the same way as the first image processing method described above. The additional step was found to improve the accuracy of dental formula identification.
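If per-tooth rectangles are already annotated, one plausible way to prepare the third teacher image data (a rectangle enclosing all individual teeth) is to take their union. The union can optionally be padded by a margin and clipped to the image. This is an assumption for illustration; the patent does not say how the third teacher rectangles were produced.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention


def enclosing_box(boxes: Sequence[Box], margin: float = 0.0,
                  img_w: Optional[float] = None, img_h: Optional[float] = None) -> Box:
    """Smallest rectangle containing every box, padded by `margin` and clipped to the image."""
    if not boxes:
        raise ValueError("need at least one box")
    x1 = max(0.0, min(b[0] for b in boxes) - margin)
    y1 = max(0.0, min(b[1] for b in boxes) - margin)
    x2 = max(b[2] for b in boxes) + margin
    y2 = max(b[3] for b in boxes) + margin
    if img_w is not None:
        x2 = min(img_w, x2)
    if img_h is not None:
        y2 = min(img_h, y2)
    return (x1, y1, x2, y2)


teeth = [(10, 20, 30, 40), (50, 5, 70, 35)]
print(enclosing_box(teeth))  # (10, 5, 70, 40)
```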

The image processing device and image processing method of the present invention are particularly suitable for recognizing images of teeth, which are detection target objects in which similar objects are densely packed. Any digital image can be used. For the dental formula identification given as a specific example, dental digital X-ray images can be used, such as intraoral X-ray images, DPR, CBCT and cephalograms. DPR is most preferable, for the following reasons:

- its radiation dose is extremely low;
- it is the most widely used modality and has the most abundant data; and
- it depicts all the major features and lesions that occur in the teeth and jawbone.

According to the present invention, a two-stage or one-stage object detection algorithm and an object classification algorithm are connected in series for image processing. This makes it possible to recognize tooth images quickly and accurately. The object image correction of the present invention also employs a DP alignment algorithm. This resolves, logically and quickly, the contradictory predictions that violate natural laws and that readily occur in images of densely packed teeth. The image processing device and image processing method of the present invention can therefore locate, classify and identify extremely closely spaced similar objects, which was difficult with conventional techniques. In particular, prediction speed improved for identifying adult dental formulas. It also became possible to identify the dental formulas of infants and children with deciduous teeth, which had previously been difficult and for which no examples had been reported.

FIG. 1 outlines a first image processing device according to an embodiment of the present invention. In this device, an input unit, an object image placement unit, an object image identification unit, an object image correction unit and an output unit are connected in series. The units are executed as follows:

- the object image placement unit with YOLOv3, a one-stage object detection algorithm;
- the object image identification unit with EfficientNet, an object classification algorithm; and
- the object image correction unit with a DP alignment algorithm.

FIG. 2 outlines a second image processing device according to an embodiment of the present invention. The same five units are connected in series. The units are executed as follows:

- the object image placement unit with EfficientDet, a one-stage object detection algorithm;
- the object image identification unit with EfficientNet, an object classification algorithm; and
- the object image correction unit with a DP alignment algorithm.

FIG. 3 relates to a tooth image processing method that uses the first image processing device according to an embodiment of the present invention. It outlines the steps that start from the image data input from the input unit to the object image placement unit, and end with identifying the rectangles enclosing individual objects, attaching information tags to them and locating them.

FIG. 4 relates to the same tooth image processing method. It outlines the steps that follow the processing of FIG. 3:

- classifying and identifying the rectangular object images input to the object image identification unit;
- correcting, in the object image correction unit, erroneous identifications that violate natural laws; and
- outputting the detected object images as the image processing result.

FIG. 5 relates to a method for processing DPR images of teeth using the first image processing device according to an embodiment of the present invention. It outlines the steps that start from the image data input from the input unit to the object image placement unit, and end with identifying the rectangles enclosing individual teeth, attaching an information tag to each rectangle and locating it.

FIG. 6 relates to the same DPR image processing method. It outlines the steps that follow the processing of FIG. 5:

- classifying the rectangular images of individual teeth input to the object image identification unit, and attaching tooth numbers to generate a dental formula;
- correcting, in the object image correction unit, erroneous dental formula images that violate natural laws; and
- outputting the dental formula image as the image processing result.

FIG. 7 relates to the same DPR image processing method. It outlines the steps that start from the image data input from the input unit to the object image placement unit, and end with identifying a rectangle that encloses all of the teeth.

FIG. 8 continues from FIG. 7 and shows image processing steps executed in the object image placement unit. It outlines the steps that start from the rectangular image enclosing all of the teeth and end with identifying the rectangles of the individual teeth, attaching an information tag to each rectangle and locating it.

FIG. 9 relates to the same DPR image processing method. It outlines the steps that follow the processing of FIG. 8:

- classifying the rectangular images of individual teeth input to the object image identification unit, and attaching tooth numbers to generate a dental formula;
- correcting, in the object image correction unit, erroneous dental formula images that violate natural laws; and
- outputting the dental formula image as the image processing result.

An embodiment of the image processing device of the present invention and the image processing method using it is described in detail below. The description mainly assumes that the device and method are used to identify dental formulas from DPR digital images, but they are not limited to this embodiment. Likewise, the configuration of the image processing device and the steps of the image processing method are not limited to those described. Various modifications can be made without departing from the gist of the present invention, which is limited only by the technical ideas set out in the claims.

FIG. 1 outlines a first image processing device 1000 according to an embodiment of the present invention. In this device, an input unit 1100, an object image placement unit 1200, an object image identification unit 1300, an object image correction unit 1400 and an output unit 1500 are connected in series. The units are executed as follows:

- the object image placement unit 1200 with YOLOv3 (1210), a one-stage object detection algorithm;
- the object image identification unit 1300 with EfficientNet 1310, an object classification algorithm; and
- the object image correction unit 1400 with a DP alignment algorithm 1410.

FIG. 2 shows a second image processing device 2000. As in FIG. 1, an input unit 2100, an object image placement unit 2200, an object image identification unit 2300, an object image correction unit 2400 and an output unit 2500 are connected in series. In this device, however:

- the object image placement unit 2200 uses EfficientDet 2210, a one-stage object detection algorithm; and
- the object image correction unit 2400 executes a DP alignment algorithm 2410 as semi-global alignment.

The second image processing device 2000 is preferable to the first image processing device 1000 for two reasons. Using EfficientDet 2210 as the one-stage object detection algorithm of the placement unit 2200 improves both speed and accuracy. Using the semi-global alignment method 2410 improves accuracy further. The semi-global alignment method 2410 can also be applied to the first image processing device 1000 to improve its accuracy.

The one-stage object detection algorithm and the object classification algorithm are not limited to those named above. For higher accuracy in particular, the object classification algorithm preferably includes at least one module and/or block selected from SEB, RB, DConv, PConv, MixConv, and GAP.
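For reference, the SEB and GAP modules named above can be sketched in plain Python. This is a generic squeeze-and-excitation block with four steps:

1. Global average pooling.
2. A bottleneck fully connected layer with ReLU.
3. A second fully connected layer with a sigmoid.
4. Channel-wise rescaling.

The weights and layer sizes are hypothetical. The source does not describe how these modules are configured in its classifier.

```python
import math
from typing import List

Matrix = List[List[float]]


def global_average_pool(fmap: List[Matrix]) -> List[float]:
    """GAP: reduce each channel (an H x W grid) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def dense(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer y = W x + b; each row of W produces one output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def squeeze_excitation(fmap: List[Matrix], w1: Matrix, b1: List[float],
                       w2: Matrix, b2: List[float]) -> List[Matrix]:
    """SE block: squeeze (GAP), excite (FC + ReLU, FC + sigmoid), rescale channels."""
    z = global_average_pool(fmap)                                # one value per channel
    h = [max(0.0, v) for v in dense(z, w1, b1)]                  # bottleneck with ReLU
    s = [1.0 / (1.0 + math.exp(-v)) for v in dense(h, w2, b2)]   # gates in (0, 1)
    return [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(fmap)]
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each channel is simply halved. Trained weights instead let the block emphasize informative channels.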

FIGS. 3 and 4 outline an image processing method using the first image processing apparatus 1000 according to an embodiment of the present invention.

FIG. 3 covers the steps from image data 1110, input from the input unit 1100 to the object image placement unit 1200, through identifying a rectangle containing each object, attaching an information tag to that rectangle, and identifying its position.

FIG. 4 covers the steps that follow:

1. The object image identification unit 1300 classifies and identifies the rectangular object images produced by the processing of FIG. 3.
2. The object image correction unit 1400 corrects erroneous identifications that violate natural laws.
3. The detected object image is output as the image processing result 1510.

The invention arose from development work applying AI image processing to CAD systems (Non-Patent Documents 3 to 14). Two developments formed the background: AI image recognition accuracy improved dramatically with advances in DL (deep learning)-based recognition algorithms, and AI image processing began producing significant results across industries. The specific motivation was a gap in existing methods. DL-based object detection algorithms alone have difficulty identifying dental formulas quickly and accurately, and no image processing apparatus or method with sufficient speed and accuracy had yet been found.

The present invention is now described more specifically using examples applied to dental formula identification. First, FIGS. 5 and 6 show an example in which the image processing method of FIGS. 3 and 4 is applied to identifying a dental formula.

FIGS. 5 and 6 concern an image processing method for dental panoramic radiographs (DPR) using the first image processing apparatus 1000 according to an embodiment of the present invention.

FIG. 5 covers the steps from image data 1111, input from the input unit 1100 to the object image placement unit 1200, through identifying rectangles that each contain an individual tooth, attaching an information tag to each rectangle, and identifying their positions.

FIG. 6 covers the steps that follow:

1. The object image identification unit 1300 classifies the rectangular images of individual teeth from FIG. 5 and assigns tooth numbers to generate a dental formula.
2. The object image correction unit 1400 corrects erroneous dental formula images that violate natural laws.
3. The dental formula image is output as the image processing result 1511.

In FIG. 5, the learning image data 1111 goes through the following steps so that a first rectangle containing each tooth to be detected can be identified, tagged, and located.

Training inputs. Three kinds of data are input to the object image placement unit 1200:

- defined first teacher image data 1132;
- first annotated learning image data 1142 for training in the object image placement unit;
- augmented image data 1151 derived from the learning image data.

The first teacher image data 1132 preferably uses features common to all teeth: the crown and the root of each tooth (distinguishing maxillary from mandibular teeth), the boundary between crown and root, and the crown and root as a whole. Preferred augmentations are aspect-ratio jitter, resolution scaling, cropping, translation, rotation, left-right inversion, random erasing, random noise, and brightness changes.

Training. The data are trained with YOLOv3 (1210), a one-stage object detection algorithm. Darknet-53 (object feature extraction unit) 1211, the object classification algorithm built into YOLOv3 (1210), has already generated a learning model from a very large image data set. Transfer learning and fine-tuning are performed from that model, and a first learning model 1231 is generated.

Inference. YOLOv3 (1210) performs feature extraction 1213 on the detection image data 1121 input to the object image placement unit 1200, generating feature extraction data 1241. An inference program then infers from the first learning model 1231 and the feature extraction data 1241. It identifies the first rectangle containing each tooth, attaches an information tag to each rectangle, and identifies its position.
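The augmentations listed above move the teeth within the image, so the annotated rectangles must be transformed together with the pixels. The sketch below covers three of them: left-right inversion, resolution/aspect-ratio scaling, and translation. It assumes boxes are given as (x1, y1, x2, y2). Dropping boxes pushed entirely outside the image is also an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def hflip_boxes(boxes: List[Box], image_width: float) -> List[Box]:
    """Left-right inversion: mirror x; x1 and x2 swap roles so that x1 < x2 still holds."""
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]


def scale_boxes(boxes: List[Box], sx: float, sy: float) -> List[Box]:
    """Resolution scaling / aspect-ratio jitter: the image is resized by (sx, sy)."""
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]


def translate_boxes(boxes: List[Box], dx: float, dy: float,
                    width: float, height: float) -> List[Box]:
    """Translation: shift, clip to the image, and drop boxes left with no area."""
    out = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(0.0, x1 + dx), max(0.0, y1 + dy)
        nx2, ny2 = min(width, x2 + dx), min(height, y2 + dy)
        if nx2 > nx1 and ny2 > ny1:
            out.append((nx1, ny1, nx2, ny2))
    return out
```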

FIG. 6 details how the first rectangles generated in this way, each containing an individual tooth, are classified and identified by tooth number.

Training inputs. Unlike in FIG. 5, two other kinds of data are used: image data for detecting the target tooth, and annotated second teacher image data 1133 for classifying and identifying it.

- Image data for detecting the target tooth can include image data of the tooth itself, image data showing its position relative to other teeth, wide-area images centered on it, and gradient and angle images of it. A wide-area image centered on the target tooth and including at least the adjacent teeth is particularly preferred. Specifically, let L be the length of the target tooth's long axis and set a rectangle extending above and below the tooth's center by a multiple of L. That multiple is preferably 1 to 3, and more preferably 1.5 to 2.5.
- The second teacher image data 1133 allows classification into maxillary versus mandibular teeth, right versus left teeth, permanent versus deciduous teeth, tooth type (incisor, canine, premolar, molar, etc.), and wisdom versus non-wisdom teeth.

Because the teacher image data has changed, second annotated learning image data 1143 is used instead of the first annotated image data of FIG. 5. Augmented image data 1151 is also used, produced by aspect-ratio jitter, resolution scaling, cropping, translation, rotation, left-right inversion, random erasing, random noise, and brightness changes. The second annotated learning image data 1143 may be the same as the first annotated image data 1142.

Training and inference. These image data are trained with EfficientNet 1310, an object classification algorithm, to generate a second learning model 1321. Feature extraction 1312 of EfficientNet 1310 generates feature extraction data 1331 from the detection image data 1121. An inference program then infers from the second learning model 1321 and the feature extraction data 1331 to classify and identify each first rectangle.
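The wide-area crop around a target tooth can be computed from the tooth's center and its long-axis length L. The sketch makes two assumptions: the crop extends k·L above and below the tooth center, and its width equals its height times a hypothetical `aspect` parameter (the source gives no width). The result is clipped to the image.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def tooth_context_crop(cx: float, cy: float, long_axis: float, k: float,
                       img_w: float, img_h: float, aspect: float = 1.0) -> Box:
    """Wide-area crop centred on a tooth (interpretation; see lead-in).

    The crop reaches k * long_axis above and below (cx, cy); the source prefers
    k in 1-3, more preferably 1.5-2.5. The width rule via `aspect` is hypothetical.
    """
    half_h = k * long_axis
    half_w = aspect * half_h
    x1, x2 = max(0.0, cx - half_w), min(img_w, cx + half_w)
    y1, y2 = max(0.0, cy - half_h), min(img_h, cy + half_h)
    return (x1, y1, x2, y2)
```

With k = 2 and L = 100 px, a tooth centered at (500, 400) yields the crop (300, 200, 700, 600). This is usually wide enough to include the neighbouring teeth.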

This inference may, however, produce anatomically impossible duplicate tooth numbers. The object image correction unit 1400 therefore corrects the result with the DP alignment algorithm 1411. The output unit 1500 then generates the target dental formula image as the image processing result 1511 of the tooth image processing method using the first image processing apparatus.
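One way to realize the DP alignment correction is sketched below. It rests on these assumptions:

- The detected teeth of one arch are sorted left to right.
- The reference is the anatomical order of the maxillary arch as it appears on a DPR, with the patient's right on the viewer's left.
- Skipping a reference tooth (a missing tooth) costs nothing.
- Assigning a detection to a tooth other than the predicted one costs the distance between the two positions in the arch.

The cost function and the name `correct_tooth_numbers` are hypothetical; the source does not disclose its scoring.

```python
from typing import List, Sequence

# Assumed left-to-right order of the upper arch on a DPR: 18 ... 11, 21 ... 28.
UPPER_ARCH = [18, 17, 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 27, 28]


def correct_tooth_numbers(predicted: Sequence[int],
                          reference: Sequence[int] = UPPER_ARCH) -> List[int]:
    """Give each detection a distinct tooth number in anatomical order,
    staying as close as possible to the classifier's predictions.

    dp[i][j] = best score using the first i detections and the first j reference teeth.
    """
    pos = {t: k for k, t in enumerate(reference)}
    n, m = len(predicted), len(reference)
    if n > m:
        raise ValueError("more detections than reference teeth")
    neg = float("-inf")
    dp = [[neg] * (m + 1) for _ in range(n + 1)]
    dp[0] = [0.0] * (m + 1)
    for i in range(1, n + 1):
        p = pos[predicted[i - 1]]
        for j in range(i, m + 1):
            skip = dp[i][j - 1]                          # reference tooth j absent
            take = dp[i - 1][j - 1] - abs((j - 1) - p)   # detection i labelled reference[j-1]
            dp[i][j] = max(skip, take)
    # Traceback from (n, m): move left while skipping was optimal, otherwise take the label.
    labels, j = [], m
    for i in range(n, 0, -1):
        while dp[i][j] == dp[i][j - 1]:
            j -= 1
        labels.append(reference[j - 1])
        j -= 1
    return labels[::-1]
```

Consider the predictions 16, 15, 14, 13, 12, 11, 21, 22, 23, 24, 25, 26, 26, in which tooth 26 appears twice. The sketch keeps the first twelve labels and relabels the last detection as 27, the cheapest assignment that restores a strictly anatomical order.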

Further study of image processing methods for teeth led to a method with higher accuracy. It is described in more detail below with reference to FIGS. 7 to 9, including evaluation results.

The method adds one step: locating, on the input detection target image, a second rectangle that contains all the teeth, i.e., all the detection target object images. This rectangle encloses the whole set of teeth, which are densely packed, similar objects, and it removes unnecessary information from the detection target image.

The step matters because identifying densely packed, similar objects such as teeth requires knowing their relative positions, not only the shape of each tooth. The second rectangle can serve as a relative reference position for every tooth.
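The source does not state how the second rectangle is used as a reference position. One plausible use, shown here purely as an illustrative assumption, is to express each tooth rectangle in coordinates normalized by the all-teeth rectangle. Position features then become comparable across images of different size and framing.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def relative_position(tooth: Box, arch: Box) -> Tuple[float, float, float, float]:
    """Express a tooth rectangle relative to the all-teeth rectangle.

    Returns (u, v, w, h): the tooth centre as fractions of the arch box
    (0 = left/top edge, 1 = right/bottom edge) and the tooth size relative to it.
    """
    ax1, ay1, ax2, ay2 = arch
    aw, ah = ax2 - ax1, ay2 - ay1
    tx1, ty1, tx2, ty2 = tooth
    u = ((tx1 + tx2) / 2 - ax1) / aw
    v = ((ty1 + ty2) / 2 - ay1) / ah
    return (u, v, (tx2 - tx1) / aw, (ty2 - ty1) / ah)
```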

FIG. 7 covers the steps from image data input from the input unit 1100 to the object image placement unit 1200 through identifying a rectangle containing all the teeth.

Learning data. 1,000 DPR cases taken at many medical institutions were used as learning image data. Two kinds of teacher image data, both defined by dental radiologists, were used with it:

- third teacher image data 1134 for generating the second rectangle, which is a rectangle enclosing all individual teeth;
- first annotated image data 1142, whose representative examples are the crown and root of each tooth (common to all teeth, distinguishing maxillary from mandibular teeth) and the alveolar bone line.

Augmented image data 1151 was also created from the 1,000 DPR images by aspect-ratio jitter, resolution scaling, cropping, translation, rotation, left-right inversion, random erasing, random noise, and brightness (gray-level) changes.

Training. These learning data were input to the object image placement unit 1200 and trained with the one-stage object detection algorithm YOLOv3 (1210) using 5-fold cross-validation, with 720 cases for training, 80 for validation, and 200 for testing.
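The case split described above can be generated as follows: 1,000 cases, with 200 test, 80 validation, and 720 training cases in each fold. The source does not say how the validation cases were chosen from the remaining 800. Taking the first 80 after a seeded shuffle is an assumption of this sketch.

```python
import random
from typing import Dict, List


def five_fold_splits(n_cases: int = 1000, n_val: int = 80,
                     seed: int = 0) -> List[Dict[str, List[int]]]:
    """5-fold cross-validation over case indices.

    Each fold uses one fifth of the cases as the test set; the remaining cases are
    divided into a validation set (n_val cases) and a training set.
    Assumes n_cases is divisible by 5.
    """
    rng = random.Random(seed)
    idx = list(range(n_cases))
    rng.shuffle(idx)
    k = 5
    fold_size = n_cases // k
    splits = []
    for f in range(k):
        test = idx[f * fold_size:(f + 1) * fold_size]
        rest = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        splits.append({"train": rest[n_val:], "val": rest[:n_val], "test": test})
    return splits
```

Every case appears in exactly one test fold. The validation and training sets change from fold to fold.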

In parallel, the same learning data were used for transfer learning and fine-tuning from the learning model that Darknet-53 (1211), the object classification algorithm built into YOLOv3 (1210), had generated from a very large image data set. This yielded a third learning model 1232.

Inference. YOLOv3 (1210) performs feature extraction 1213 on the detection image data 1121 input to the object image placement unit 1200, generating feature extraction data 1241. An inference program then infers from the third learning model 1232 and the feature extraction data 1241. It identifies the second rectangle containing all the teeth, attaches an information tag to it, and identifies its position.

Evaluation. Detection of the all-teeth rectangle was evaluated by AP (Average Precision), a standard object detection accuracy measure computed from precision and recall. Precision and recall depend on the IoU (Intersection over Union), also called the overlap ratio (Jaccard coefficient). With IoU thresholds of 50% and 75%, the AP was 1.000 and 0.997, respectively, and in most cases agreement with the ground truth (GT) was 75% or more. These are good results.
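IoU, the overlap criterion behind the AP values at 50% and 75%, is computed for axis-aligned rectangles as below. Counting a detection as correct when its IoU is at least the threshold follows common practice, but it is an assumption here because the source does not state its exact matching rule.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over Union (Jaccard index) of two axis-aligned rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float) -> bool:
    """Correct at AP50 / AP75 when IoU >= 0.50 / 0.75 (assumed convention)."""
    return iou(pred, gt) >= threshold
```

Two equal squares overlapping by half their width have an IoU of 1/3. Such a pair would not count at either threshold.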

FIG. 8 shows the image processing steps in the object image placement unit 1200 that follow FIG. 7. Within the rectangular image containing all the teeth, third rectangles (an improvement on the first rectangles) are identified, each containing an individual tooth. An information tag is attached to each rectangle and its position is identified.

One point is important here, although the figure does not show it. The second rectangle containing all the teeth is enlarged about its center by a factor of 1.1 to 1.5 horizontally and 1.1 to 1.8 vertically, and the enlarged region is used as a wide-area image. This prevents wisdom teeth, deciduous teeth, and root apices from falling outside the second rectangle.
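The enlargement of the all-teeth rectangle can be written as a scaling about its center. This sketch reads the 1.1 to 1.5 (horizontal) and 1.1 to 1.8 (vertical) factors as multipliers of the rectangle's width and height, which is an interpretation. The result is clipped to the image.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def enlarge_about_center(box: Box, sx: float, sy: float,
                         img_w: float, img_h: float) -> Box:
    """Scale a rectangle about its centre by sx (width) and sy (height), then clip it.

    For the all-teeth rectangle the source gives sx in 1.1-1.5 and sy in 1.1-1.8.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * sx / 2, (y2 - y1) * sy / 2
    return (max(0.0, cx - hw), max(0.0, cy - hh), min(img_w, cx + hw), min(img_h, cy + hh))
```

The larger vertical factor gives more room above and below the arch, where root apices and unerupted teeth are likely to lie.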

Again, 1,000 DPR cases were used as learning image data.

Teacher data. The goal was to identify the third rectangles, i.e., improved first rectangles each containing an individual tooth. The teacher image data used at least one item from the first teacher data 1132 of FIG. 5, all of which are common to all teeth:

- the crown, distinguishing maxillary from mandibular teeth;
- the root, distinguishing maxillary from mandibular teeth;
- the boundary between crown and root;
- the crown and root as a whole.

As in identifying the second rectangle, the first annotated image data 1142 defined by dental radiologists and the augmented data 1151 were also used.

Training. These learning data (720 cases for training, 80 for validation, 200 for testing) were trained with the one-stage object detection algorithm YOLOv3 (1210) using 5-fold cross-validation. Transfer learning and fine-tuning were performed from the learning model that Darknet-53 (1211), the object classification algorithm built into YOLOv3 (1210), had generated from a very large image data set. This yielded a fourth learning model 1233.

Inference. As in FIG. 7, YOLOv3 (1210) performed feature extraction 1213 on the detection image data 1121 input to the object image placement unit 1200, generating feature extraction data 1241. An inference program then inferred from the fourth learning model 1233 and the feature extraction data 1241. It identified the third rectangle containing each tooth, attached an information tag to each rectangle, and identified its position.

Evaluation. The results were evaluated with mAP (mean Average Precision), a standard object detection metric, here the mean of the APs for maxillary and mandibular teeth. The average mAP values were:

- crown: 0.727;
- root: 0.699;
- crown-root boundary: 0.682;
- whole crown and root: 0.777.

97% or more of the target teeth were detected, which is a good result.
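AP and mAP can be computed from ranked detections. The sketch uses all-point interpolation of the precision-recall curve, which is one common convention. The source does not say which interpolation its evaluation used.

```python
from typing import List, Sequence


def average_precision(is_tp: Sequence[bool], n_gt: int) -> float:
    """All-point interpolated AP.

    is_tp lists detections in descending confidence order; True means the detection
    matched a not-yet-matched ground-truth box (for example at IoU >= 0.5).
    n_gt is the number of ground-truth boxes.
    """
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Precision envelope: each rank takes the best precision at the same or a higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(aps: Sequence[float]) -> float:
    """mAP: the mean of per-class APs (here, for example, maxillary and mandibular teeth)."""
    return sum(aps) / len(aps)
```

For the ranking hit, miss, hit with two ground-truth boxes, the AP is 0.5 × 1 + 0.5 × 2/3 = 5/6.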

FIG. 9 covers the steps that follow FIG. 8:

1. The object image identification unit 1300 classifies the third rectangular images from FIG. 8, each containing an individual tooth, and assigns tooth numbers to generate a dental formula.
2. The object image correction unit 1400 corrects anatomically incorrect dental formula images that violate natural laws.
3. The dental formula image is output as the image processing result 1512.

Training inputs. As in FIG. 6, two kinds of data other than the first teacher image data are used: image data for detecting the target tooth, and annotated second teacher image data 1133 for classifying and identifying it.

- Image data for detecting the target tooth can include image data of the tooth itself, image data showing its position relative to other teeth, wide-area images centered on it, and gradient and angle images of it. Wide-area images are particularly preferred. Specifically, let L be the length of the target tooth's long axis and set a rectangle extending above and below the tooth's center by a multiple of L. That multiple is preferably 1 to 3, and more preferably 1.5 to 2.5.
- The second teacher image data 1133 allows classification into maxillary versus mandibular teeth, right versus left teeth, permanent versus deciduous teeth, tooth type (incisor, canine, premolar, molar, etc.), and wisdom versus non-wisdom teeth.

Because the teacher image data was changed, second annotated learning image data 1143 was used instead of the first annotated image data, as in FIG. 6, together with augmented image data 1151. The second annotated learning image data 1143 may be the same as the first annotated image data 1142.

Training and inference. These image data were trained with EfficientNet 1310, the object classification algorithm, to generate a fifth learning model 1322. As in FIG. 6, feature extraction 1312 of EfficientNet 1310 generated feature extraction data 1331 from the detection image data 1121. An inference program then inferred from the fifth learning model 1322 and the feature extraction data 1331 to classify and identify each third rectangle.

Inference used OpenVINO (registered trademark). Multi-task classification with a single model gave results comparable to single-task classification with multiple models, so multi-task processing is preferable when speed matters. This is thought to be because teeth have mutually similar shapes.

Table 1 shows the results on the standard object detection metrics: precision, recall, and F-score (the harmonic mean of precision and recall). They were extremely good for every task: tooth number, maxillary versus mandibular, right versus left, permanent versus deciduous, tooth type (incisor, canine, premolar, molar), and wisdom versus non-wisdom. Notably, accuracy was 97% for tooth number and above 99% for every other task. The method also distinguished teeth in extremely crowded dentitions where deciduous and permanent teeth coexist; this is probably the first such identification in the world.
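The three metrics in Table 1 follow directly from per-class counts. The sketch computes them for one class from lists of true and predicted labels, which are hypothetical inputs. The source does not specify whether or how the scores were averaged over classes.

```python
from typing import Sequence, Tuple


def class_counts(y_true: Sequence[int], y_pred: Sequence[int],
                 cls: int) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for one class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-score (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```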

As the precision for tooth number suggests, however, the inference can still produce anatomically impossible duplicate tooth numbers. For example, in a dental formula based on the FDI (Fédération Dentaire Internationale) two-digit system, tooth 26 (the maxillary left first molar) was detected twice.

To correct this, the object image correction unit 1400 applied the DP alignment algorithm 1412 for semi-global alignment. It used the identification results in Table 1 other than tooth number, for example the maxillary/mandibular and permanent/deciduous results. The duplication was corrected correctly, and the output unit generated the target dental formula image as the image processing result 1512.
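An FDI two-digit number encodes jaw, side, dentition, and tooth position. Each predicted tooth number therefore implies the auxiliary attributes that were also classified (Table 1). The sketch below does two things:

- decodes a number into those attributes;
- lists duplicated numbers, such as the doubled 26 above.

The source does not disclose how it combines these attributes with the semi-global alignment. Cross-checking a decoded number against the auxiliary classifier outputs is only a suggested use.

```python
from collections import Counter
from typing import Dict, List, Sequence

PERMANENT_TYPES = {1: "incisor", 2: "incisor", 3: "canine", 4: "premolar",
                   5: "premolar", 6: "molar", 7: "molar", 8: "molar"}
DECIDUOUS_TYPES = {1: "incisor", 2: "incisor", 3: "canine", 4: "molar", 5: "molar"}


def decode_fdi(code: int) -> Dict[str, object]:
    """Decode an FDI two-digit tooth number into the attributes classified in Table 1."""
    quadrant, tooth = divmod(code, 10)
    if quadrant not in range(1, 9):
        raise ValueError(f"invalid FDI quadrant in {code}")
    deciduous = quadrant >= 5                  # quadrants 5-8 are deciduous teeth
    types = DECIDUOUS_TYPES if deciduous else PERMANENT_TYPES
    if tooth not in types:
        raise ValueError(f"invalid FDI tooth position in {code}")
    q = (quadrant - 1) % 4 + 1                 # map deciduous quadrants 5-8 onto 1-4
    return {
        "jaw": "maxillary" if q in (1, 2) else "mandibular",
        "side": "right" if q in (1, 4) else "left",   # the patient's side
        "dentition": "deciduous" if deciduous else "permanent",
        "type": types[tooth],
        "wisdom": (not deciduous) and tooth == 8,
    }


def duplicated_numbers(labels: Sequence[int]) -> List[int]:
    """Tooth numbers predicted more than once, which is anatomically impossible."""
    return sorted(code for code, count in Counter(labels).items() if count > 1)
```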

In summary, the image processing method using the image processing apparatus of the present invention generates dental formulas quickly and accurately. It also detects deciduous teeth in DPRs of infants and children with mixed permanent and deciduous dentition.

The image processing apparatus of the present invention and the image processing method using it are suited to recognizing images of teeth, in which similar objects are densely packed. They may also apply to other images with many densely packed similar objects, for example:

- the ripeness of clustered fruits such as grapes and tomatoes;
- mixed tree species growing together on mountains;
- stacks of many cardboard boxes of similar shape and material in warehouses;
- schools of fish in the sea;
- various aerial photographs;
- spectators in stadiums.

The industrial applicability is therefore extremely high.

1000 First image processing apparatus
1100 Input unit
1110 Learning image data
1111 Learning image data of teeth
1120 Detection image data
1121 Detection image data of teeth
1130 First teacher image data
1131 Second teacher image data
1132 First teacher image data of teeth
1133 Second teacher image data of teeth
1134 Third teacher image data of teeth
1140 First annotated image data
1141 Second annotated image data
1142 First annotated image data of teeth
1143 Second annotated image data of teeth
1150 Augmented data
1151 Augmented data of teeth
1200 Object image placement unit
1210 One-stage object detection algorithm (YOLOv3)
1211 Object classification algorithm built in as the backbone of YOLOv3 (Darknet-53)
1212 Learning by YOLOv3
1213 Feature extraction by YOLOv3
1214 Inference by YOLOv3
1220 Feature extraction data by Darknet-53
1230 First learning model by YOLOv3
1231 First learning model of teeth by YOLOv3
1232 Third learning model of teeth by YOLOv3
1233 Fourth learning model of teeth by YOLOv3
1240 Feature extraction data by YOLOv3
1241 Feature extraction data of teeth by YOLOv3
1250 Attachment of information tags to object images and position identification by YOLOv3
1251 Attachment of information tags to first rectangles containing individual teeth and position identification by YOLOv3
1252 Identification of the second rectangle containing all teeth by YOLOv3
1253 Attachment of information tags to third rectangles containing individual teeth and position identification by YOLOv3
1300 Object identification unit
1310 Object classification algorithm (EfficientNet)
1311 Learning by EfficientNet
1312 Feature extraction by EfficientNet
1313 Inference by EfficientNet
1320 Second learning model by EfficientNet
1321 Second learning model of teeth by EfficientNet
1322 Fifth learning model of teeth by EfficientNet
1330 Feature extraction data by EfficientNet
1331 Feature extraction data of teeth by EfficientNet
1340 Object classification and identification by EfficientNet
1341 Classification and identification of first rectangles by EfficientNet
1342 Classification and identification of third rectangles by EfficientNet
1400 Object image correction unit
1410 DP alignment algorithm for objects
1411 DP alignment algorithm for first rectangles
1412 DP alignment algorithm for third rectangles (semi-global alignment)
1500 Output unit
1510 Object image processing result by the first image processing apparatus
1511 Tooth image processing result by the first image processing apparatus (dental formula identification)
1512 Tooth image processing result by the first image processing apparatus using the second rectangle containing all teeth (dental formula identification)
2000 Second image processing apparatus
2100 Input unit
2200 Object image placement unit
2210 One-stage object detection algorithm (EfficientDet)
2211 Object classification algorithm built in as the backbone of EfficientDet (EfficientNet)
2300 Object identification unit
2310 Object classification algorithm (EfficientNet)
2400 Object image correction unit
2410 DP alignment algorithm (semi-global alignment)
2500 Output unit

Claims

An image processing device for identifying a dental formula with a tooth as an object,
an input unit capable of inputting object image data;
an object image placement unit that executes an object detection algorithm having, as its backbone, an object feature extraction unit, the object feature extraction unit being executed by an object classification algorithm that includes at least a CNN (Convolutional Neural Network) as a module and extracting object feature quantities from an existing object image data set; wherein the object image placement unit learns the input first teacher image data, a learning image data set, and augmented image data of the learning image data set, performs transfer learning and fine-tuning using the learning model of the object feature extraction unit, and is capable of identifying, for each detection target object image on an input detection target image, a first rectangle surrounding that image, attaching an information tag to the first rectangle, and identifying the position of the tagged first rectangle;
an object image identification unit that is executed by the object classification algorithm; that learns the first rectangle data and/or wide-area data of the first rectangle, second teacher image data different from the input first teacher image data, the learning image data set, and augmented image data of the learning image data set; that is capable of creating a second learning model by transfer learning and fine-tuning using the learning model of the object feature extraction unit; and that is capable of classifying and identifying the detection target object images by attaching a unique information tag to each first rectangle identified by the object image placement unit;
The result output from the object image identification unit isBy DP (Dynamic Programming) alignment algorithman object image corrector capable of correcting;
an output unit capable of outputting a processing result of the detection target object image;
An image processing device comprising:
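Claim 1 feeds the classifier "first rectangular wide area data" alongside the first rectangle itself, but this section does not define how the wide area is obtained. One plausible reading is the first rectangle enlarged by a margin, so the classifier also sees neighbouring teeth and bone. The sketch below assumes that reading; `widen_box` and its `margin` parameter are hypothetical, not taken from the patent.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def widen_box(box: Box, margin: float, width: int, height: int) -> Box:
    """Enlarge a rectangle by `margin` (a fraction of its own width/height)
    on every side, then clamp it to the image bounds.

    Hypothetical helper: `margin` is illustrative, not a value from the patent.
    """
    x0, y0, x1, y1 = box
    dx = round((x1 - x0) * margin)
    dy = round((y1 - y0) * margin)
    return (
        max(0, x0 - dx),
        max(0, y0 - dy),
        min(width, x1 + dx),
        min(height, y1 + dy),
    )


# A 100x200 tooth box widened by 10% per side inside a 2000x1000 image.
print(widen_box((500, 300, 600, 500), 0.1, 2000, 1000))
# (490, 280, 610, 520)
```

Clamping keeps the wide-area crop valid for teeth at the image border, where a naive expansion would produce negative or out-of-range coordinates.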

The image processing device according to claim 1, wherein the object classification algorithm is at least one selected from AlexNet, GPipe (Giant Neural Networks using Pipeline Parallelism), Inception, SEB (Squeeze-and-Excitation Block)-Inception, Xception, DenseNet (Densely Connected Convolutional Network), ResNet (Residual Network), SEB-ResNet, Inception-ResNet, SEB-Inception-ResNet, ResNeXt, NASNet (Neural Architecture Search Network), VGG (Visual Geometry Group), SEB-VGG, MobileNet, MnasNet, AmoebaNet, CSPNet (Cross Stage Partial Network), CBNet (Composite Backbone Network), Darknet, EfficientNet, and NFNet.

The image processing device according to claim 1, wherein the object classification algorithm comprises at least one block and/or module selected from SEB, RB (Residual Block), DConv (Depthwise Convolution Layer), PConv (Pointwise Convolution Layer), MixConv (Mixed Depthwise Convolution Layer), and GAP (Global Average Pooling).
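DConv followed by PConv (a depthwise separable convolution) and GAP are standard building blocks in backbones such as MobileNet and EfficientNet. The generic arithmetic below, not something specified in the patent, shows why the separable form is cheaper: a k × k standard convolution needs k·k·C_in·C_out weights, whereas the depthwise-plus-pointwise pair needs k·k·C_in + C_in·C_out. GAP then reduces each channel of the final feature map to a single mean value. The channel counts are arbitrary example values.

```python
from typing import List


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """DConv (one k x k filter per input channel) followed by a 1 x 1 PConv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def global_average_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """GAP: collapse each channel (a 2-D grid of activations) to its mean."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


print(standard_conv_params(3, 32, 64))   # 18432
print(separable_conv_params(3, 32, 64))  # 2336
print(global_average_pool([[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]))  # [2.5, 2.0]
```

For this example the separable pair uses roughly one eighth of the weights of the standard convolution.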

The image processing device according to claim 1, wherein the object classification algorithm is at least one network selected from ResNet, ResNeXt, MobileNet, MnasNet, Darknet, EfficientNet, and NFNet.

The image processing device according to claim 1, wherein the DP alignment algorithm performs semi-global alignment.
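The patent names semi-global alignment as the DP alignment used for correction but does not give its scoring. The sketch below shows one way such a correction could work, under stated assumptions: teeth are labelled with FDI two-digit numbers, the classifier's labels for one jaw segment have already been ordered by position, and the match, mismatch and gap scores are hypothetical. Gaps at both ends of the reference order cost nothing, so an image showing only part of the arch is not penalised, and each predicted label is replaced by the reference label it aligns to.

```python
from typing import List, Optional, Sequence

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical scores, not from the patent


def score(a: int, b: int) -> int:
    return MATCH if a == b else MISMATCH


def semi_global_correct(pred: Sequence[int], ref: Sequence[int]) -> List[Optional[int]]:
    """Align predicted tooth labels (in positional order) to the expected
    reference order. Leading and trailing gaps in `ref` are free. Returns,
    for each predicted tooth, the reference label it aligned to, or None if
    it aligned to a gap (e.g. a spurious detection)."""
    n, m = len(pred), len(ref)
    # H[i][j]: best score aligning pred[:i] with ref[:j]; row 0 is all zeros,
    # which makes skipping a prefix of ref free.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = H[i - 1][0] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + score(pred[i - 1], ref[j - 1]),
                H[i - 1][j] + GAP,  # extra predicted tooth
                H[i][j - 1] + GAP,  # reference tooth absent from the image
            )
    # Skipping a suffix of ref is free: finish at the best cell of the last row.
    j = max(range(m + 1), key=lambda c: H[n][c])
    i = n
    result: List[Optional[int]] = [None] * n
    while i > 0:
        if j > 0 and H[i][j] == H[i - 1][j - 1] + score(pred[i - 1], ref[j - 1]):
            result[i - 1] = ref[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1  # this predicted tooth stays None
        else:
            j -= 1
    return result


upper_right = [17, 16, 15, 14, 13, 12, 11]  # assumed reference order (FDI)
predicted = [16, 15, 15, 13, 12, 11]        # the second 15 should be 14
print(semi_global_correct(predicted, upper_right))
# [16, 15, 14, 13, 12, 11]
```

The duplicated 15 is reassigned to 14 because a single mismatch (-1) costs less than the two gaps (-4) needed to keep both 15s, while the free leading gap absorbs tooth 17, which does not appear in the prediction.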

The image processing device according to any one of claims 1 to 5, wherein the object detection algorithm is a two-stage object detection algorithm.

The image processing device according to any one of claims 1 to 5, wherein the object detection algorithm is a one-step object detection algorithm.

The image processing device according to claim 6, wherein the two-stage object detection algorithm is at least one selected from R-CNN (Region-based Convolutional Neural Network), Fast R-CNN, Faster R-CNN, and R-FCN (Region-based Fully Convolutional Network).

The image processing device according to claim 7, wherein the one-step object detection algorithm is at least one selected from OverFeat, DPM (Deformable Parts Model), SSD (Single Shot MultiBox Detector), DSSD (Deconvolutional Single Shot Detector), ESSD (Extend the shallow part of Single Shot MultiBox Detector), RefineDet (Single-Shot Refinement Neural Network for Object Detection), RetinaNet, M2Det, YOLO, Scaled YOLO, and EfficientDet.
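Detectors in this list such as SSD, RetinaNet, YOLO and EfficientDet commonly emit many overlapping candidate boxes and prune them with non-maximum suppression (NMS) before the boxes are passed on. The patent text here does not describe this step; the following is a generic greedy NMS sketch, and the 0.5 IoU threshold is a hypothetical default.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, discard every remaining box
    whose IoU with it exceeds `threshold`, and repeat. Returns kept indices,
    highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 0, 30, 10)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second box overlaps the first with IoU 81/119 ≈ 0.68 and is suppressed; the third box does not overlap and survives.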

An image processing method performed in the image processing device according to any one of claims 1 to 9, the method comprising:
a step of learning the input first teacher image data, the learning image dataset, and the augmented data of the learning image dataset, performing transfer learning and fine-tuning using the learning model of the object feature extraction unit, and identifying, for each detection target object on the input detection target image, a first rectangle containing the image of that object, the information tag of the first rectangle, and the position of the first rectangle;
a step of classifying and identifying the detection target object images by adding a unique information tag to each first rectangle, using the second learning model obtained by learning the first rectangle data and/or the first rectangle wide-area data, second teacher image data different from the input first teacher image data, the learning image dataset, and the augmented data of the learning image dataset, and performing transfer learning and fine-tuning using the learning model of the object feature extraction unit;
a step of correcting the result of classifying and identifying the detection target object images by the DP alignment algorithm; and
a step of outputting a processing result of the detection target object images.

An image processing method performed in the image processing device according to any one of claims 1 to 9, the method comprising:
a step of identifying the position of a second rectangle containing all of the detection target object images on the input detection target image, using a third learning model obtained by learning third teacher image data different from the input first and second teacher image data, the learning image dataset, and the augmented data of the learning image dataset, and performing transfer learning and fine-tuning using the learning model of the object feature extraction unit;
a step of identifying, for each detection target object on the input detection target image, a third rectangle containing the image of that object, the information tag of the third rectangle, and the position of the third rectangle, using a fourth learning model obtained by learning the second rectangle data and/or the second rectangle wide-area data, the input first teacher image data, the learning image dataset, and the augmented data of the learning image dataset, and performing transfer learning and fine-tuning using the learning model of the object feature extraction unit;
a step of classifying and identifying the detection target object images by adding the unique information tag to each third rectangle, using a fifth learning model obtained by learning the third rectangle data and/or the third rectangle wide-area data, the input second teacher image data, the learning image dataset, and the augmented data of the learning image dataset, and performing transfer learning and fine-tuning using the learning model of the object feature extraction unit;
a step of correcting the result of classifying and identifying the detection target object images by the DP alignment algorithm; and
a step of outputting a processing result of the detection target object images.
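In this two-stage method, a second rectangle enclosing all teeth is located first and the individual third rectangles are then found within it. A practical consequence, not spelled out in the claim, is that boxes found in the cropped region must be mapped back to the original detection target image. The sketch assumes the crop was resized to a fixed detector input size before detection; the function name and the resizing step are assumptions, not details from the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_full_image(boxes_in_crop: List[Box], crop: Box, input_size: Tuple[int, int]) -> List[Box]:
    """Map boxes detected on a resized crop back to original-image coordinates.

    crop       -- the enclosing (second) rectangle, in original-image pixels
    input_size -- (width, height) the crop was resized to before detection
    """
    sx = (crop[2] - crop[0]) / input_size[0]  # original pixels per input pixel (x)
    sy = (crop[3] - crop[1]) / input_size[1]  # original pixels per input pixel (y)
    return [
        (crop[0] + x0 * sx, crop[1] + y0 * sy, crop[0] + x1 * sx, crop[1] + y1 * sy)
        for (x0, y0, x1, y1) in boxes_in_crop
    ]


# A 1000x500 crop resized to 512x512; a box covering the crop's left half.
print(to_full_image([(0, 0, 256, 512)], crop=(400, 200, 1400, 700), input_size=(512, 512)))
# [(400.0, 200.0, 900.0, 700.0)]
```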

The image processing method according to claim 10 or 11, wherein the DP alignment algorithm performs semi-global alignment.

The image processing method according to any one of claims 10 to 12, wherein the step of correcting by the DP alignment algorithm is applied to the result of the step of classifying and identifying the detection target object images.
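Before a DP alignment can be applied to the classification result, the per-tooth labels must be put into a sequence. The patent does not say how; one simple approach, sketched here under stated assumptions, splits the detections into upper and lower jaw by comparing each box centre with a horizontal occlusal line, then orders each jaw from image-left to image-right. Under the usual radiographic viewing convention the patient's right side appears on the image left, so the upper jaw then reads 18 to 11 and 21 to 28 in FDI numbering. The `occlusal_y` value and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, int]              # (box, classified tooth label)


def order_for_alignment(dets: List[Detection], occlusal_y: float) -> Dict[str, List[int]]:
    """Split detections into upper and lower jaw by box centre relative to a
    hypothetical horizontal occlusal line, then list each jaw's labels from
    image-left to image-right, ready for DP alignment."""
    jaws: Dict[str, List[Tuple[float, int]]] = {"upper": [], "lower": []}
    for (x0, y0, x1, y1), label in dets:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        jaws["upper" if cy < occlusal_y else "lower"].append((cx, label))
    # Sorting (centre_x, label) pairs orders the teeth by horizontal position.
    return {jaw: [label for _, label in sorted(items)] for jaw, items in jaws.items()}


dets = [
    ((300, 100, 340, 200), 15),
    ((200, 100, 240, 200), 17),
    ((250, 100, 290, 200), 16),
    ((220, 260, 260, 360), 47),
]
print(order_for_alignment(dets, occlusal_y=230))
# {'upper': [17, 16, 15], 'lower': [47]}
```

A real panoramic radiograph has a curved occlusal plane, so a single horizontal line is a simplification chosen only to keep the sketch short.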

The image processing device according to any one of claims 1 to 9, wherein the object image is a dental digital photograph.

The image processing method according to any one of claims 10 to 13 , wherein said object is a tooth and said object image is a dental digital photograph.