JPWO2020003992A1

JPWO2020003992A1 - Learning device and learning method, and medical image processing device

Info

Publication number: JPWO2020003992A1
Application number: JP2020527359A
Authority: JP
Inventors: 麻依子遠藤
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2018-06-28
Filing date: 2019-06-10
Publication date: 2021-06-24
Anticipated expiration: 2039-06-10
Also published as: WO2020003992A1; JP7382930B2

Abstract

Provided are a learning device and a learning method capable of efficiently generating a model that performs image recognition on medical images having a specific image quality, and a medical image processing device. The learning device (1) includes a first learning unit (10) that generates a first model (M1), which performs image recognition on medical images of a first image quality, by learning with a first medical image group composed of medical images of the first image quality, and a second learning unit (20) that generates a second model (M2), which performs image recognition on medical images of a second image quality, by learning with a second medical image group composed of medical images of the second image quality on the basis of the first model (M1) generated by the first learning unit (10). The second medical image group is composed of medical images having the same image quality as the medical images to be recognized, and the first medical image group is composed of medical images having an image quality different from the second image quality. Because the second learning unit (20) learns on the basis of the learning result of the first learning unit (10), a highly accurate model can be generated even when the second medical image group is small.

Description

The present invention relates to a learning device, a learning method, and a medical image processing device, and in particular to a learning device and a learning method for generating a model that performs image recognition on medical images, and to a medical image processing device that uses the model.

Techniques are known for automatically detecting lesions in medical images and classifying lesions by type using an image recognition model generated by machine learning. In machine learning, image recognition such as detection and classification becomes possible by training on a large number of images suited to the problem.

Patent Document 1 proposes, as a method of generating a highly accurate image recognition model even when the number of training images is small, performing pre-learning with an image group whose characteristics are similar to those of the image group to be learned, and then performing main learning with the image group to be learned. Specifically, pre-learning is performed with, for example, an image group in which the shape of the subject is similar, an image group obtained by imaging a biological phantom, an image group in which the tissue structure of the subject is similar, or an image group obtained by imaging a mimic organ with the same imaging system, followed by main learning with the image group to be learned.

Patent Document 2 proposes a method in which first learning is performed with a first image group captured at a first frame rate, and second learning is then performed with a second image group captured at a second frame rate lower than the first frame rate.

International Publication No. 2017/221412
International Publication No. 2017/175282

In machine learning, however, if the image quality of the images prepared for learning is biased, that bias is learned as well. Consequently, when an image whose quality falls outside the bias of the learned image group is submitted for recognition, the recognition accuracy decreases. Because the learning methods of Patent Documents 1 and 2 do not take image quality into account, they have the drawback that recognition accuracy decreases when an image whose quality differs from that of the learned image group is recognized.

Meanwhile, devices that capture medical images, such as endoscopes, may differ in image quality from model to model. Generating a highly accurate image recognition model therefore requires optimizing the learning for each device model.

However, generating an image recognition model from scratch for each device model requires a great deal of labor and cost.

The present invention has been made in view of these circumstances, and its object is to provide a learning device and a learning method capable of efficiently generating a model that performs image recognition on medical images having a specific image quality, and a medical image processing device.

The means for solving the above problems are as follows.

(1) A learning device comprising: a first learning unit that generates a first model, which performs image recognition on medical images of a first image quality, by learning with a first medical image group composed of medical images of the first image quality; and a second learning unit that generates a second model, which performs image recognition on medical images of a second image quality different from the first image quality, by learning, on the basis of the first model, with a second medical image group composed of medical images of the second image quality.

According to this aspect, the first model is first generated by performing first learning with the first medical image group of the first image quality. The second model is then generated by performing second learning with the second medical image group of the second image quality on the basis of the first model. Because the second learning builds on the result of the first learning, an accurate model can be generated even when the number of training images is small. Therefore, to generate a model that performs image recognition on medical images having a specific image quality, the first learning is performed with a first medical image group for which training images are plentiful, and the second learning is then performed with the medical image group of the target image quality (the second medical image group). In this way, a model that performs image recognition on medical images of the target image quality can be generated efficiently.
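The two-stage procedure of this aspect can be illustrated with a minimal sketch. It is not the implementation described in this specification: a small logistic classifier on synthetic feature vectors stands in for the image recognition model and for the medical image groups, and the names `train`, `make_group`, `accuracy`, `m1`, and `m2` are hypothetical. The only point shown is that the second learning starts from the parameters obtained by the first learning rather than from zero.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, weights, epochs, lr=0.1):
    """Train a logistic classifier by gradient descent, starting from `weights`."""
    w = list(weights)  # copy, so the model passed in is left unchanged
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def accuracy(samples, w):
    hits = 0
    for x, y in samples:
        predicted = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5
        hits += predicted == bool(y)
    return hits / len(samples)


def make_group(n, noise, rng):
    """Toy stand-in for an image group: [bias, f1, f2] feature vectors with labels."""
    group = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = 1.0 if y else -1.0
        x = [1.0, centre + rng.gauss(0, noise), centre + rng.gauss(0, noise)]
        group.append((x, y))
    return group


rng = random.Random(0)
first_group = make_group(500, noise=0.3, rng=rng)   # plentiful, first image quality
second_group = make_group(20, noise=0.8, rng=rng)   # scarce, target (second) image quality

m1 = train(first_group, [0.0, 0.0, 0.0], epochs=20)  # first learning: first model
m2 = train(second_group, m1, epochs=5)               # second learning starts from the first model
```

In this toy setting the second group has only 20 samples, so the short second training run relies on the starting point supplied by `m1`.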

The images constituting the medical images include both moving images and still images. A moving image can be regarded as a time-series image group containing a plurality of frames. For a moving image, image quality here means the image quality of the image constituting a single frame.

(2) The learning device according to (1) above, wherein the first medical image group is composed of medical images of a first resolution, and the second medical image group is composed of medical images of a second resolution different from the first resolution.

According to this aspect, the first medical image group is composed of medical images of the first resolution, and the second medical image group is composed of medical images of the second resolution, which differs from the first resolution. As a result, when generating a model that performs image recognition on medical images of a specific resolution, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

(3) The learning device according to (2) above, wherein the second resolution is lower than the first resolution.

According to this aspect, the second medical image group is composed of medical images whose resolution (the second resolution) is lower than the resolution (the first resolution) of the medical images constituting the first medical image group. As a result, when generating a model that performs image recognition on medical images of a specific resolution, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

(4) The learning device according to (3) above, wherein the first medical image group is composed of medical images having a resolution of 4K or higher, and the second medical image group is composed of medical images having a resolution lower than 4K.

According to this aspect, the first medical image group is composed of medical images having a resolution of 4K or higher, and the second medical image group is composed of medical images having a resolution lower than 4K. As a result, when generating a model that performs image recognition on medical images of a specific resolution lower than 4K, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

A 4K image is a high-definition image whose long side has about 4000 pixels, in particular an image of about 4000 × 2000 pixels (horizontal × vertical). The commonly known "4K UHDTV (Ultra High Definition Television)" and "DCI 4K" are included in "4K" in this specification. "4K UHDTV" is the 4K format defined by the International Telecommunication Union (ITU) and has 3840 × 2160 pixels (horizontal × vertical). "DCI 4K" is the 4K format defined by Digital Cinema Initiatives (DCI), of which film studios and others are members, and has 4096 × 2160 pixels (horizontal × vertical).

(5) The learning device according to (3) above, wherein the first medical image group is composed of medical images having a resolution of 8K or higher, and the second medical image group is composed of medical images having a resolution lower than 8K.

According to this aspect, the first medical image group is composed of medical images having a resolution of 8K or higher, and the second medical image group is composed of medical images having a resolution lower than 8K. As a result, when generating a model that performs image recognition on medical images of a specific resolution lower than 8K, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

An 8K image is a high-definition image whose long side has about 8000 pixels, in particular an image of about 8000 × 4000 pixels (horizontal × vertical). The commonly known "8K UHDTV" (also called 8K Ultra-high-definition television, 8K Ultra HDTV, 8K UHD, Super Hi-Vision 8K, and so on) is included in "8K" in this specification. "8K UHDTV" is the 8K format defined by the International Telecommunication Union and has 7680 × 4320 pixels (horizontal × vertical).
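The 4K and 8K boundaries used in these aspects can be sketched as a small helper that sorts an image into a resolution class by its long side. The specification only says "about 4000" and "about 8000" pixels, so the exact thresholds below (3840 and 7680, the widths of 4K UHDTV and 8K UHDTV) are an assumption, and the function name `resolution_class` is hypothetical.

```python
# Assumed lower bounds for the long side, taken from the UHDTV widths.
MIN_LONG_SIDE_4K = 3840
MIN_LONG_SIDE_8K = 7680


def resolution_class(width: int, height: int) -> str:
    """Classify an image by the pixel count of its long side."""
    long_side = max(width, height)
    if long_side >= MIN_LONG_SIDE_8K:
        return "8K or higher"
    if long_side >= MIN_LONG_SIDE_4K:
        return "4K or higher, below 8K"
    return "below 4K"
```

With these assumed thresholds, both 3840 × 2160 (4K UHDTV) and 4096 × 2160 (DCI 4K) fall into the same class, which matches the statement that both are included in "4K".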

(6) The learning device according to (2) above, wherein the second resolution is higher than the first resolution.

According to this aspect, the second medical image group is composed of medical images whose resolution (the second resolution) is higher than the resolution (the first resolution) of the medical images constituting the first medical image group. As a result, when generating a model that performs image recognition on medical images of a specific resolution, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

(7) The learning device according to (6) above, wherein the first medical image group is composed of medical images having a resolution lower than 4K, and the second medical image group is composed of medical images having a resolution of 4K or higher.

According to this aspect, the first medical image group is composed of medical images having a resolution lower than 4K, and the second medical image group is composed of medical images having a resolution of 4K or higher. As a result, when generating a model that performs image recognition on medical images of a specific resolution of 4K or higher, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

(8) The learning device according to (6) above, wherein the first medical image group is composed of medical images having a resolution lower than 8K, and the second medical image group is composed of medical images having a resolution of 8K or higher.

According to this aspect, the first medical image group is composed of medical images having a resolution lower than 8K, and the second medical image group is composed of medical images having a resolution of 8K or higher. As a result, when generating a model that performs image recognition on medical images of a specific resolution of 8K or higher, an accurate model can be generated efficiently even when the number of training images of that resolution is small.

(9) The learning device according to (1) above, wherein the first medical image group is composed of medical images having less noise than the medical images constituting the second medical image group.

According to this aspect, the first medical image group is composed of medical images having less noise than the medical images constituting the second medical image group. As a result, when generating a model that performs image recognition on medical images having a specific amount of noise, an accurate model can be generated efficiently even when the number of training images with that amount of noise is small.

(10) The learning device according to (1) above, wherein the first medical image group is composed of medical images having more noise than the medical images constituting the second medical image group.

According to this aspect, the first medical image group is composed of medical images having more noise than the medical images constituting the second medical image group. As a result, when generating a model that performs image recognition on medical images having a specific amount of noise, an accurate model can be generated efficiently even when the number of training images with that amount of noise is small.

(11) The learning device according to (1) above, wherein the first medical image group is composed of medical images having a wider angle of view than the medical images constituting the second medical image group.

According to this aspect, the first medical image group is composed of medical images having a wider angle of view than the medical images constituting the second medical image group. As a result, a model that performs image recognition on medical images of a specific angle of view can be generated without learning from images that have been cropped or otherwise processed.

(12) The learning device according to (1) above, wherein the first medical image group is composed of medical images captured by an endoscope, and the second medical image group is composed of medical images captured by an endoscope different from the endoscope that captured the first medical image group.

According to this aspect, the first medical image group is composed of medical images captured by an endoscope, and the second medical image group is composed of medical images captured by an endoscope different from the one that captured the first medical image group. As a result, when generating a model that performs image recognition on medical images captured by a specific endoscope, an accurate model can be generated efficiently even when the number of training images captured by that endoscope is small.

(13) The learning device according to (12) above, wherein the second medical image group is composed of medical images captured by an endoscope whose specifications differ from those of the endoscope that captured the medical images constituting the first medical image group.

According to this aspect, the second medical image group is composed of medical images captured by an endoscope whose specifications differ from those of the endoscope that captured the medical images constituting the first medical image group. For example, it is composed of medical images captured by an endoscope that differs in the size or pixel count of its image sensor, in the focal length of its optical system, or in the amount of noise it generates.

(14) The learning device according to any one of (1) to (13) above, wherein the first model and the second model are each constituted by a convolutional neural network.

According to this aspect, the first model and the second model are each constituted by a convolutional neural network.

(15) A medical image processing device comprising: a medical image acquisition unit that acquires a medical image; and a model that performs image recognition on the medical image, the model being constituted by the second model generated by the learning device according to any one of (1) to (14) above.

According to this aspect, the model that performs image recognition on the medical image is constituted by the second model generated by the learning device according to any one of (1) to (14) above. As a result, image recognition can be performed accurately on medical images having a specific image quality.

(16) The medical image processing device according to (15) above, comprising a plurality of the models and further comprising a model switching unit that switches the model to be used.

According to this aspect, a plurality of models that perform image recognition are provided and are used by switching between them. As a result, image recognition can be performed with a model appropriate to the image quality.

(17) The medical image processing device according to (16) above, further comprising an endoscope information acquisition unit that acquires information on the endoscope that captured the medical image, wherein the plurality of models are generated by learning in the second learning unit with second medical image groups captured by mutually different endoscopes, and the model switching unit switches the model to be used on the basis of the endoscope information acquired by the endoscope information acquisition unit.

According to this aspect, a plurality of models that perform image recognition are provided. Each model is generated by learning in the second learning unit with a second medical image group captured by a different endoscope, and the model to be used is switched automatically according to the endoscope that captured the medical image to be recognized.
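The switching behaviour of this aspect can be sketched as a lookup from endoscope information to a model. This is an illustration under assumptions, not the device's actual design: the specification does not state here what form the endoscope information takes, so a plain identifier string is assumed, and the class `ModelSwitcher`, the function `recognize`, and the identifiers `"scope-A"` and `"scope-B"` are hypothetical. The models are stand-in callables.

```python
from typing import Callable, Dict, List

Model = Callable[[List[int]], str]  # stand-in for a trained second model


class ModelSwitcher:
    """Selects the model registered for the endoscope that captured the image."""

    def __init__(self, models: Dict[str, Model]) -> None:
        if not models:
            raise ValueError("at least one model is required")
        self._models = dict(models)

    def select(self, endoscope_info: str) -> Model:
        try:
            return self._models[endoscope_info]
        except KeyError:
            raise KeyError(f"no model registered for endoscope {endoscope_info!r}") from None


def recognize(image: List[int], endoscope_info: str, switcher: ModelSwitcher) -> str:
    # The endoscope information decides which model processes the image.
    return switcher.select(endoscope_info)(image)


switcher = ModelSwitcher({
    "scope-A": lambda image: "result from model for scope-A",
    "scope-B": lambda image: "result from model for scope-B",
})
result = recognize([0, 0, 0], "scope-B", switcher)
```

Raising an error for an unregistered endoscope is one possible design choice. Falling back to a default model would be another, and the source does not say which applies.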

(18) The medical image processing device according to (17) above, wherein the plurality of models are generated by learning in the second learning unit with second medical image groups captured by endoscopes having mutually different specifications.

According to this aspect, the plurality of models are generated by learning in the second learning unit with second medical image groups captured by endoscopes having mutually different specifications.

(19) The medical image processing device according to (18) above, wherein the plurality of models are generated by learning in the second learning unit with second medical image groups captured by endoscopes that differ from one another in resolution or amount of noise.

According to this aspect, the plurality of models are generated by learning in the second learning unit with second medical image groups captured by endoscopes that differ from one another in resolution or amount of noise.

(20) A learning method comprising: a step of generating a first model, which performs image recognition on medical images of a first image quality, by learning with a first medical image group composed of medical images of the first image quality; and a step of generating a second model, which performs image recognition on medical images of a second image quality different from the first image quality, by learning, on the basis of the first model, with a second medical image group composed of medical images of the second image quality.

According to the present invention, a model that performs image recognition on medical images having a specific image quality can be generated efficiently.

Block diagram showing an embodiment of the configuration of the learning device
Schematic diagram showing an example of the configuration of a CNN
Conceptual diagram of the settings of the CNNs constituting the first model and the second model
Diagram showing an example of the hardware configuration of the learning device
Flowchart showing the learning procedure performed by the learning device
Flowchart showing a learning procedure based on the learning result from a high-resolution learning image group
Flowchart showing a learning procedure based on the learning result from a low-resolution learning image group
Flowchart showing a learning procedure based on the learning result from a low-noise learning image group
Flowchart showing a learning procedure based on the learning result from a high-noise learning image group
Flowchart showing a learning procedure based on the learning result from a wide-angle learning image group
Flowchart showing a learning procedure based on the learning result from a learning image group captured by a different endoscope
Block diagram showing an embodiment of the configuration of the endoscopic image processing device
Diagram showing an example of the hardware configuration of the endoscopic image processing device
Block diagram showing a modification of the endoscopic image processing device
Block diagram showing another modification of the endoscopic image processing device

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[Configuration of the learning device]
FIG. 1 is a block diagram showing an embodiment of the configuration of the learning device.

The learning device 1 of the present embodiment is configured as a device that generates, by machine learning, a model that performs image recognition on endoscopic images obtained in endoscopy. An endoscopic image is an example of a medical image. In particular, the learning device 1 of the present embodiment is configured as a device that generates, by machine learning, a model that performs image recognition on endoscopic images having a specific image quality. The image recognition performed here is, for example, detection of a lesion contained in an image or classification of lesions by type.

As shown in FIG. 1, the learning device 1 of the present embodiment includes a first learning unit 10 that generates, by machine learning, a first model M1 that performs image recognition on endoscopic images of a first image quality; a second learning unit 20 that generates, on the basis of the first model M1 generated by the first learning unit 10, a second model M2 that performs image recognition on endoscopic images of a second image quality; and a learning control unit 30 that performs overall control of the operation of the learning device 1. The learning device 1 also includes a first learning data set 12 for learning in the first learning unit 10 and a second learning data set 22 for learning in the second learning unit 20.

The first learning unit 10 generates the first model M1, which performs image recognition on endoscopic images of the first image quality, by learning with the first learning data set 12. The first model M1 is constituted by, for example, a convolutional neural network (CNN). Through learning, the first learning unit 10 optimizes the weight parameters of each layer of the CNN constituting the first model M1.

The first learning data set 12 is composed of a first endoscopic image group, which is a group of learning images. The first endoscopic image group is an example of the first medical image group and is composed of endoscopic images of the first image quality.

The images constituting the endoscopic images include both still images and moving images. A moving image can be regarded as a time-series image group containing a plurality of frames. The data constituting an image has, for each pixel, intensity values (luminance values) for red (R), green (G), and blue (B). For a moving image, image quality means the image quality of the image constituting a single frame.
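The image data described here can be sketched as a simple data structure. This is only an illustration: the specification states that each pixel carries R, G, and B intensity values but gives no storage layout, so the row-major list, the 8-bit value range, and the names `Frame` and `as_frames` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) intensity values; 8-bit range is assumed


@dataclass
class Frame:
    width: int
    height: int
    pixels: List[Pixel]  # row-major, one (R, G, B) tuple per pixel

    def __post_init__(self) -> None:
        if len(self.pixels) != self.width * self.height:
            raise ValueError("pixel count does not match width x height")


def as_frames(image) -> List[Frame]:
    """Treat a still image as a one-frame sequence and a moving image as its frames."""
    return [image] if isinstance(image, Frame) else list(image)


still = Frame(2, 1, [(255, 0, 0), (0, 255, 0)])
video = [still, Frame(2, 1, [(0, 0, 255), (10, 10, 10)])]
```

Handling a still image as a sequence of one frame lets per-frame properties such as resolution be evaluated the same way for still and moving images.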

The second learning unit 20 learns using the second learning data set 22, starting from the first model M1 generated by the first learning unit 10, and thereby generates the second model M2, which performs image recognition on endoscopic images of the second image quality. The second model M2 is composed of, for example, a CNN. The second model M2 generated by the second learning unit 20 is the model that performs image recognition on endoscopic images having the specific image quality. That is, image recognition processing is performed using the second model M2 as the trained model. Through learning, the second learning unit 20 optimizes the weight parameters of each layer of the CNN constituting the second model M2.

The second learning data set 22 is composed of a second endoscopic image group, which is a group of images for learning. The second endoscopic image group is an example of the second medical image group and is composed of endoscopic images of a second image quality that differs from the first image quality. The image quality (second image quality) of the endoscopic images constituting the second endoscopic image group is the same as (or comparable to) the image quality of the images on which image recognition is to be performed.

The first learning data set 12, on the other hand, is composed of endoscopic images of an image quality (first image quality) that differs from the second image quality. It is therefore composed of endoscopic images whose image quality differs from that of the endoscopic images on which image recognition is performed. Specifically, it is composed of endoscopic images of higher or lower image quality.

The learning control unit 30 controls the operations of the first learning unit 10 and the second learning unit 20 and comprehensively controls the overall operation of the learning device 1. The learning control unit 30 also sets the CNNs constituting the first model M1 and the second model M2.

FIG. 2 is a schematic diagram showing an example of the configuration of a CNN.

As shown in the figure, the CNN is a multilayer neural network built by stacking convolutional layers, normalization layers, pooling layers, and the like.

FIG. 3 is a conceptual diagram of how the CNNs constituting the first model and the second model are set. In the figure, (A) shows an example of the CNN constituting the first model M1, and (B) shows an example of the CNN constituting the second model M2.

The learning control unit 30 sets the CNN constituting the first model M1 and then, based on the trained first model M1, sets the CNN constituting the second model M2. In the present embodiment, the CNN of the second model M2 is obtained by resetting the weight parameters of some layers of the CNN constituting the trained first model M1. The layers whose weight parameters are reset are some of the layers close to the output. In the example shown in FIG. 3, the weight parameters of the final three layers (fully connected layer, fully connected layer, and Softmax layer) enclosed by the broken line BL are reset to set the CNN of the second model M2. In this case, the remaining layers, enclosed by the solid line SL, take the weight parameters of the trained first model M1 as their initial values.
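The setting of the second model's CNN from the trained first model can be sketched as follows. This is only a minimal illustration in plain Python, not the implementation of the embodiment: the dictionary-of-lists model, the layer names, the helper names `init_weights` and `build_second_model`, and the uniform re-initialization are all assumptions made for the example. A Softmax layer normally has no trainable weights, so only the two fully connected layers are reset here.

```python
import random

def init_weights(n, rng):
    """Return n small random weights (hypothetical stand-in for a real initializer)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

def build_second_model(first_model, reset_layers, rng):
    """Copy a trained first model and re-initialize the named output-side layers."""
    second = {}
    for name, weights in first_model.items():
        if name in reset_layers:
            # Layers inside the broken line BL: weights are reset.
            second[name] = init_weights(len(weights), rng)
        else:
            # Layers inside the solid line SL: trained M1 weights become initial values.
            second[name] = list(weights)
    return second

rng = random.Random(0)
# Hypothetical trained first model M1 (layer name -> flattened weights).
m1 = {
    "conv1": [0.5, -0.2, 0.1],
    "conv2": [0.3, 0.7],
    "fc1": [0.9, -0.4],
    "fc2": [0.2, 0.6],
}
m2 = build_second_model(m1, reset_layers={"fc1", "fc2"}, rng=rng)
```

The second learning would then start from `m2`, so that the layers carried over from M1 are fine-tuned while the reset layers are learned from scratch.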

［Hardware configuration of the learning device］
FIG. 4 is a diagram showing an example of the hardware configuration of the learning device.

The learning device 1 is implemented as a computer such as a server computer or a client computer and includes a CPU (Central Processing Unit) 51, a ROM (Read Only Memory) 52, a RAM (Random Access Memory) 53, an HDD (Hard Disk Drive) 54, a communication interface 55, an input/output interface 56, and the like. The learning device 1 also includes an input device 57, a display device 58, and the like.

By executing programs, the CPU 51 controls each part of the learning device 1 and realizes each function of the learning device 1. The ROM 52 stores various programs executed by the CPU 51, various data, and the like. The RAM 53 provides a work area for the CPU 51. The HDD 54 stores various programs executed by the CPU 51 and various data. The communication interface 55 is an interface (I/F) for connecting the learning device 1 to a network 59 such as a LAN (Local Area Network). The learning device 1 communicates with external devices via the communication interface 55. The input/output interface 56 is an interface for connecting external equipment such as the input device 57 and the display device 58 to the learning device 1. The input device 57 inputs information corresponding to user operations into the learning device 1 and is composed of, for example, a keyboard and a mouse. The display device 58 displays various information and is composed of, for example, a liquid crystal display or an organic EL (Electro Luminescence) display.

The functions of the first learning unit 10, the second learning unit 20, and the learning control unit 30 constituting the learning device 1 are realized by the CPU 51 executing a predetermined program. The first learning data set 12 and the second learning data set 22 are stored in the HDD 54.

［Learning method］
《Basic learning procedure》
FIG. 5 is a flowchart showing the learning procedure performed by the learning device.

First, the CNN constituting the first model M1 is set (step S1).

Next, first learning is performed on the set CNN using the first learning data set 12 (step S2). That is, learning is performed with the first endoscopic image group, which is composed of endoscopic images of the first image quality. This generates the first model M1, which performs image recognition on endoscopic images of the first image quality.

Next, the CNN constituting the second model M2 is set based on the trained first model M1 (step S3). As described above, in the present embodiment the CNN of the second model M2 is set by resetting the weight parameters of some of the layers close to the output of the trained first model M1 (see FIG. 3).

Next, second learning is performed on the set CNN using the second learning data set 22 (step S4). That is, learning is performed with the second endoscopic image group, which is composed of endoscopic images of the second image quality. This generates the second model M2, which performs image recognition on endoscopic images of the second image quality.

Here, the second endoscopic image group used in the second learning is composed of endoscopic images having the same image quality as (or an image quality comparable to) that of the endoscopic images targeted for image recognition (endoscopic images having the specific image quality). As a result, the second model M2 generated by the second learning is a model capable of image recognition on endoscopic images having the same image quality as the endoscopic images targeted for image recognition.

Thus, in the learning device 1 of the present embodiment, the first learning is performed with an endoscopic image group of the first image quality, and the second learning is then performed with an endoscopic image group of the second image quality on the basis of the first learning result. Because the second learning builds on the first learning result, an accurate model can be generated even when the number of images available for the second learning is small. Accordingly, when generating a model that performs image recognition on medical images having a specific image quality, for example, the first learning is performed at an image quality for which learning images are plentiful, and the second learning is then performed with a medical image group of the target image quality. In this way, the target model can be generated efficiently.
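The flow of steps S1 to S4 can be illustrated with a deliberately tiny stand-in. The sketch below is not a CNN and is not taken from the embodiment: it assumes a one-dimensional linear model y = w*x + b trained by full-batch gradient descent, treats `w` as the part inherited from the first model and `b` as the output-side part that is reset, and uses made-up data sets. It is meant only to show the order of operations.

```python
def train(params, data, epochs=2000, lr=0.1):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    w, b = params["w"], params["b"]
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}

# Step S1: set up the first model (toy stand-in for the CNN of M1).
model = {"w": 0.0, "b": 0.0}

# Step S2: first learning on the plentiful first-quality data set (hypothetical data).
first_dataset = [(i / 20, 2 * (i / 20) + 1.0) for i in range(21)]
m1 = train(model, first_dataset)

# Step S3: set up the second model from M1; keep "w", reset the output-side "b".
m2_init = {"w": m1["w"], "b": 0.0}

# Step S4: second learning on the small second-quality data set (hypothetical data).
second_dataset = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]
m2 = train(m2_init, second_dataset)
```

In a real system the two data sets would be the first and second endoscopic image groups, and the model would be a CNN trained with a deep learning framework.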

《Examples》
〈Learning with learning image groups of different resolutions〉
When generating a model that performs image recognition on endoscopic images having a specific resolution, the first learning is performed using an endoscopic image group whose resolution (first resolution) differs from that of the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same resolution (second resolution) as the endoscopic images to be recognized. There are two ways to do this: (1) the first learning is performed using an endoscopic image group with a higher resolution (first resolution) than the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group of the same resolution (second resolution) as the endoscopic images to be recognized; or (2) the first learning is performed using an endoscopic image group with a lower resolution (first resolution) than the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group of the same resolution (second resolution) as the endoscopic images to be recognized. Cases (1) and (2) are described separately below.

(1) Learning based on the learning result of a high-resolution learning image group
When generating a model that performs image recognition on endoscopic images having a specific resolution, the first learning is performed using an endoscopic image group with a resolution higher than that of the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same resolution as (or a resolution comparable to) that of the endoscopic images to be recognized.

FIG. 6 is a flowchart showing the learning procedure based on the learning result of a high-resolution learning image group.

First, the CNN constituting the first model M1 is set (step S11).

Next, the first learning is performed on the set CNN using the first learning data set 12 (step S12). This first learning data set 12 is composed of endoscopic images having a resolution relatively higher than that of the endoscopic images targeted for image recognition. The first learning generates the first model M1, which performs image recognition on endoscopic images of relatively high resolution.

Next, the CNN constituting the second model M2 is set based on the trained first model M1 (step S13).

Next, the second learning is performed on the set CNN using the second learning data set 22 (step S14). This second learning data set 22 is composed of endoscopic images having the same resolution as (or a resolution comparable to) that of the endoscopic images targeted for image recognition. The second learning generates a model (second model M2) capable of image recognition on endoscopic images of the target resolution.

For example, when generating a model that performs image recognition on endoscopic images with a resolution below 4K, the first learning may be performed with an endoscopic image group having a resolution of 4K or higher (for example, a 4K or 8K endoscopic image group), and the second learning may then be performed, based on the first learning result, with an endoscopic image group of the target resolution (for example, 2K). For instance, to generate a model that performs image recognition on 2K endoscopic images, the first learning is performed with a 4K endoscopic image group, and the second learning is then performed with a 2K endoscopic image group based on the first learning result. In this way, even when 2K endoscopic images for learning are in short supply, the target model can be generated efficiently and without added cost.

Note that a 2K image is a high-definition image whose long side has about 2000 pixels; in particular, it is an image of about 2000 × 1000 pixels (horizontal × vertical). Ordinary full high definition (1920 horizontal × 1080 vertical) and the like are therefore included in 2K as the term is used here.
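The resolution classes can be expressed as a small helper. The text above defines only 2K (long side of about 2000 pixels); the nominal long sides used below for 4K and 8K, the 10% tolerance used to interpret "about", and the function name `resolution_class` are assumptions made for illustration.

```python
# Nominal long-side pixel counts; the 4K and 8K values are assumptions.
NOMINAL_LONG_SIDE = {"2K": 2000, "4K": 4000, "8K": 8000}
# Assumed reading of "about": within 10% of the nominal long side.
TOLERANCE = 0.10

def resolution_class(width, height):
    """Return "2K", "4K", or "8K" for an image size, or None if no class fits."""
    long_side = max(width, height)
    for label, nominal in NOMINAL_LONG_SIDE.items():
        if abs(long_side - nominal) <= TOLERANCE * nominal:
            return label
    return None

full_hd = resolution_class(1920, 1080)   # full high definition falls under 2K
uhd = resolution_class(3840, 2160)
small = resolution_class(640, 480)       # matches none of the classes
```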

As another example, when generating a model that performs image recognition on endoscopic images with a resolution below 8K, the first learning may be performed with an endoscopic image group having a resolution of 8K or higher (for example, an 8K endoscopic image group), and the second learning may then be performed, based on the first learning result, with an endoscopic image group of the target resolution (for example, 2K or 4K). For instance, to generate a model that performs image recognition on 4K endoscopic images, the first learning is performed with an 8K endoscopic image group, and the second learning is then performed with a 4K endoscopic image group based on the first learning result. In this way, even when 4K endoscopic images for learning are in short supply, the target model can be generated efficiently and without added cost.

In general, large hospitals such as university hospitals use endoscopes with relatively high resolution (for example, transoral endoscopes), whereas small hospitals such as clinics use endoscopes with relatively low resolution (for example, transnasal endoscopes). Because small hospitals perform fewer examinations than large hospitals, it is difficult to collect learning images from them. Learning images of the target resolution may therefore be in short supply.

According to the learning method of this aspect, a model can be optimized at low cost even when the model is to perform image recognition on endoscopic images captured with the low-resolution endoscopes used in small hospitals and the like.

(2) Learning based on the learning result of a low-resolution learning image group
When generating a model that performs image recognition on endoscopic images having a specific resolution, the first learning is performed using an endoscopic image group with a resolution lower than that of the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same resolution as (or a resolution comparable to) that of the endoscopic images to be recognized.

FIG. 7 is a flowchart showing the learning procedure based on the learning result of a low-resolution learning image group.

First, the CNN constituting the first model M1 is set (step S21).

Next, the first learning is performed on the set CNN using the first learning data set 12 (step S22). This first learning data set 12 is composed of endoscopic images having a resolution relatively lower than that of the endoscopic images targeted for image recognition. The first learning generates the first model M1, which performs image recognition on endoscopic images of relatively low resolution.

Next, the CNN constituting the second model M2 is set based on the trained first model M1 (step S23).

Next, the second learning is performed on the set CNN using the second learning data set 22 (step S24). This second learning data set 22 is composed of endoscopic images having the same resolution as (or a resolution comparable to) that of the endoscopic images targeted for image recognition. The second learning generates a model (second model M2) capable of image recognition on endoscopic images of the target resolution.

For example, when generating a model that performs image recognition on endoscopic images with a resolution of 4K or higher, the first learning may be performed with an endoscopic image group having a resolution below 4K (for example, a 2K endoscopic image group), and the second learning may then be performed, based on the first learning result, with an endoscopic image group of the target resolution (for example, 4K). For instance, to generate a model that performs image recognition on 4K endoscopic images, the first learning is performed with a 2K endoscopic image group, and the second learning is then performed with a 4K endoscopic image group based on the first learning result. In this way, even when 4K endoscopic images for learning are in short supply, the target model can be generated efficiently and without added cost.

As another example, when generating a model that performs image recognition on endoscopic images with a resolution of 8K or higher, the first learning may be performed with an endoscopic image group having a resolution below 8K (for example, a 4K or 2K endoscopic image group), and the second learning may then be performed, based on the first learning result, with an endoscopic image group of the target resolution (for example, 8K). For instance, to generate a model that performs image recognition on 8K endoscopic images, the first learning is performed with a 4K endoscopic image group, and the second learning is then performed with an 8K endoscopic image group based on the first learning result. In this way, even when 8K endoscopic images for learning are in short supply, the target model can be generated efficiently and without added cost.

As endoscopes advance, image resolution is expected to increase further. When that happens, high-resolution images for use in learning may be in short supply.

According to the learning method of this aspect, even when learning images of the target resolution are in short supply, a model for the target resolution can be generated efficiently by making use of existing learning image groups.
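The choice of image groups for cases (1) and (2) above can be sketched as a simple partition of an image pool by resolution. The record layout (`{"id": ..., "resolution": ...}`), the ordering table `RANK`, and the function name `split_by_resolution` are hypothetical; the sketch only mirrors the rule that the second data set matches the target resolution while the first data set is either higher or lower.

```python
# Assumed ordering of resolution labels, from lowest to highest.
RANK = {"2K": 0, "4K": 1, "8K": 2}

def split_by_resolution(images, target, mode):
    """Return (first_dataset, second_dataset) for a target resolution.

    mode="higher" corresponds to case (1): first learning on higher resolutions.
    mode="lower" corresponds to case (2): first learning on lower resolutions.
    """
    if mode not in ("higher", "lower"):
        raise ValueError("mode must be 'higher' or 'lower'")
    second = [img for img in images if img["resolution"] == target]
    if mode == "higher":
        first = [img for img in images if RANK[img["resolution"]] > RANK[target]]
    else:
        first = [img for img in images if RANK[img["resolution"]] < RANK[target]]
    return first, second

# Hypothetical image pool.
pool = [
    {"id": "a", "resolution": "2K"},
    {"id": "b", "resolution": "4K"},
    {"id": "c", "resolution": "4K"},
    {"id": "d", "resolution": "8K"},
]
first_hi, second_hi = split_by_resolution(pool, target="2K", mode="higher")
first_lo, second_lo = split_by_resolution(pool, target="8K", mode="lower")
```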

〈Learning with learning image groups of different noise amounts〉
When generating a model that performs image recognition on endoscopic images having a specific noise amount, the first learning is performed using an endoscopic image group whose noise amount differs from that of the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same noise amount as (or a noise amount comparable to) that of the endoscopic images to be recognized. There are two ways to do this: (1) the first learning is performed using an endoscopic image group with less noise than the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same noise amount as (or a noise amount comparable to) that of the endoscopic images to be recognized; or (2) the first learning is performed using an endoscopic image group with more noise than the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same noise amount as (or a noise amount comparable to) that of the endoscopic images to be recognized. Cases (1) and (2) are described separately below.

(1) Learning based on the learning result of a low-noise learning image group
When generating a model that performs image recognition on endoscopic images having a specific noise amount, the first learning is performed using an endoscopic image group with lower noise than the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same noise amount as (or a noise amount comparable to) that of the endoscopic images to be recognized.

FIG. 8 is a flowchart showing the learning procedure based on the learning result of a low-noise learning image group.

First, the CNN constituting the first model M1 is set (step S31).

Next, the first learning is performed on the set CNN using the first learning data set 12 (step S32). This first learning data set 12 is composed of endoscopic images having a noise amount relatively smaller than that of the endoscopic images targeted for image recognition. For example, when generating a model that performs image recognition on endoscopic images captured with a low-end endoscope, the first learning data set 12 is composed of endoscopic images captured with a lower-noise, high-end endoscope. The first learning generates the first model M1, which performs image recognition on endoscopic images with relatively low noise.

Next, the CNN constituting the second model M2 is set based on the trained first model M1 (step S33).

Next, the second learning is performed on the set CNN using the second learning data set 22 (step S34). This second learning data set 22 is composed of endoscopic images having the same noise amount as (or a noise amount comparable to) that of the endoscopic images targeted for image recognition. For example, when generating a model that performs image recognition on endoscopic images captured with a low-end endoscope, the second learning data set 22 is composed of endoscopic images captured with that low-end endoscope. The second learning generates a model (second model M2) capable of image recognition on endoscopic images of the target noise amount.

In general, large hospitals such as university hospitals use endoscopes with relatively little noise (so-called high-end endoscopes), whereas small hospitals such as clinics use endoscopes with comparatively more noise. Because small hospitals perform fewer examinations than large hospitals, it is difficult to collect learning images from them. Learning images of the target noise amount may therefore be in short supply.

According to the learning method of this aspect, a model can be optimized at low cost even when the model is to perform image recognition on endoscopic images captured with the endoscopes used in small hospitals and the like (endoscopes with a relatively large amount of noise).

(2) Learning based on the learning result of a high-noise learning image group
When generating a model that performs image recognition on endoscopic images having a specific noise amount, the first learning is performed using an endoscopic image group with higher noise than the endoscopic images to be recognized, and the second learning is then performed, based on that result, with an endoscopic image group having the same noise amount as (or a noise amount comparable to) that of the endoscopic images to be recognized.

FIG. 9 is a flowchart showing the learning procedure based on the learning result of a high-noise learning image group.

First, the CNN constituting the first model M1 is set (step S41).

Next, the first learning is performed on the set CNN using the first learning data set 12 (step S42). This first learning data set 12 is composed of endoscopic images having a noise amount relatively larger than that of the endoscopic images targeted for image recognition. The first learning generates the first model M1, which performs image recognition on endoscopic images with relatively high noise.

次に、学習済みの第１モデルＭ１を元に、第２モデルＭ２を構成するＣＮＮを設定する（ステップＳ４３）。 Next, based on the learned first model M1, the CNNs constituting the second model M2 are set (step S43).

次に、設定されたＣＮＮに対して、第２学習用データセット２２を用いて、第２の学習を行う（ステップＳ４４）。この第２学習用データセット２２は、画像認識の対象とされる内視鏡画像のノイズ量と同じノイズ量（同程度のノイズ量を含む）の内視鏡画像で構成される。この第２の学習により、目的とするノイズ量の内視鏡画像に対して画像認識が可能なモデル（第２モデルＭ２）が生成される。 Next, the set CNN is subjected to the second learning using the second learning data set 22 (step S44). The second learning data set 22 is composed of an endoscopic image having the same amount of noise (including a similar amount of noise) as the amount of noise of the endoscopic image to be image-recognized. By this second learning, a model (second model M2) capable of image recognition for an endoscopic image having a target noise amount is generated.

内視鏡の発展により、今後、更に画像の低ノイズ化が進むことが予想される。その場合、学習に使用するための低ノイズの画像が不足することが考えられる。 With the development of endoscopes, it is expected that the noise of images will be further reduced in the future. In that case, it is conceivable that there will be a shortage of low-noise images to be used for learning.

本態様の学習方法によれば、目的とするノイズ量の学習用画像が不足する場合であっても、既存の学習用の画像群を利用して、目的とするノイズ量のモデルを効率よく生成できる。 According to the learning method of this aspect, even when the learning image of the target noise amount is insufficient, the model of the target noise amount is efficiently generated by using the existing learning image group. it can.
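
The two-stage flow of steps S41 to S44 can be sketched as follows. This is only a toy illustration, not the implementation described in this disclosure: a one-variable linear model stands in for the CNN, and the function names (`train`, `mse`), the data values, the learning rate, and the epoch counts are all assumptions made for the example.

```python
# Toy stand-in for the CNN: a one-variable linear model y = w * x + b.
# All names and data are hypothetical and only illustrate steps S41 to S44.

def train(model, dataset, lr=0.5, epochs=200):
    """Plain gradient descent on mean squared error; returns a new (w, b)."""
    w, b = model
    n = len(dataset)
    for _ in range(epochs):
        grad_w = sum(((w * x + b) - y) * x for x, y in dataset) / n
        grad_b = sum(((w * x + b) - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def mse(model, dataset):
    """Mean squared error of the model on a data set."""
    w, b = model
    return sum(((w * x + b) - y) ** 2 for x, y in dataset) / len(dataset)


# First learning data set 12: plentiful samples of a different image quality.
first_dataset = [(i / 50, 2.0 * (i / 50) + 1.0) for i in range(51)]
# Second learning data set 22: only a few samples of the target image quality.
second_dataset = [(x, 2.0 * x + 0.5) for x in (0.0, 0.5, 1.0)]

m1_init = (0.0, 0.0)                             # S41: set the first model
m1 = train(m1_init, first_dataset)               # S42: first learning
m2_init = m1                                     # S43: set the second model from M1
m2 = train(m2_init, second_dataset, epochs=20)   # S44: second learning
```

Because the second learning starts from the parameters obtained in the first learning, a short run on the small second data set is enough in this toy case to fit the target data better than M1 does.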

〈Learning based on the learning result of a wide-angle learning image group〉
When generating a model that performs image recognition on endoscopic images with a specific angle of view, the first learning is performed using a group of endoscopic images with a wider angle of view than the endoscopic images to be recognized. Based on that result, the second learning is performed using a group of endoscopic images with the same angle of view (including substantially the same angle of view) as the endoscopic images to be recognized.

FIG. 10 is a flowchart showing the learning procedure based on the learning result of a wide-angle learning image group.

First, the CNN constituting the first model M1 is set (step S51).

Next, the first learning is performed on the set CNN using the first learning data set 12 (step S52). The first learning data set 12 consists of endoscopic images with an angle of view relatively wider than that of the endoscopic images targeted for image recognition. This first learning generates the first model M1, which performs image recognition on endoscopic images with a wider angle than the target endoscopic images.

Next, the CNN constituting the second model M2 is set based on the trained first model M1 (step S53).

Next, the second learning is performed on the set CNN using the second learning data set 22 (step S54). The second learning data set 22 consists of endoscopic images with the same angle of view (including substantially the same angle of view) as the endoscopic images targeted for image recognition. This second learning generates an image recognition model (the second model M2) optimized for use with a specific endoscope.

In this way, by learning with the learning image group of the target angle of view on the basis of the learning result from the wide-angle learning image group, there is no need to re-learn with cropped images or the like, and learning can be carried out at lower cost.

〈Learning based on the learning result of a learning image group taken with another endoscope〉
When generating a model that performs image recognition on endoscopic images taken with a specific endoscope, the first learning is performed using a group of endoscopic images taken with another endoscope. Based on that result, the second learning is performed using a group of endoscopic images taken with the endoscope whose images are to be recognized.

FIG. 11 is a flowchart showing the learning procedure based on the learning result of a learning image group taken with a different endoscope.

First, the CNN constituting the first model M1 is set (step S61).

Next, the first learning is performed on the set CNN using the first learning data set 12 (step S62). The first learning data set 12 consists of endoscopic images taken with an endoscope (another endoscope) different from the endoscope whose images are to be recognized. For example, it consists of endoscopic images taken with an endoscope whose specifications (image sensor size, image sensor resolution, image sensor type, configuration of the imaging optical system, light source type, and so on) differ from those of the endoscope whose images are to be recognized. This first learning generates the first model M1, which performs image recognition on endoscopic images taken with that other endoscope.

Next, the CNN constituting the second model M2 is set based on the trained first model M1 (step S63).

Next, the second learning is performed on the set CNN using the second learning data set 22 (step S64). The second learning data set 22 consists of endoscopic images taken with the same endoscope as the one whose images are to be recognized (including an endoscope of the same model and the same specifications). This second learning generates a model (the second model M2) capable of image recognition on the target endoscopic images.

According to the learning method of this aspect, even when learning images from the endoscope targeted for image recognition are in short supply, a model that performs image recognition on endoscopic images taken with that specific endoscope can be generated efficiently by using the abundant learning image groups available from other endoscopes.

In addition, endoscopes may differ individually even when they are of the same model. According to the learning method of this aspect, an image recognition model for a specific endoscope can be generated efficiently even when such individual differences exist.

［Modifications of the learning device］
《Modification of the hardware configuration of the learning device》
In the above embodiment, the functions of the first learning unit 10 and the second learning unit 20 are realized by the same computer, but they may also be realized by a plurality of computers. For example, the functions of the first learning unit 10 and the second learning unit 20 can be realized by separate computers.

《Configuration of the first model and the second model》
In the above embodiment, the model that performs image recognition is a CNN, but the configuration of the model is not limited to this. Any model generated by machine learning may be used.

《Setting of the second model》
In the above embodiment, the CNN of the second model M2 is set by resetting the weight parameters of some layers of the CNN constituting the trained first model M1, but the method of setting the second model M2 is not limited to this. Various methods can be adopted, for example: a method of re-learning the entire CNN using the weight parameters of the trained first model M1 as initial values; a method of replacing the input layer and the output layer of the trained first model M1 and then performing the second learning; and a method of fixing the weight parameters of some layers of the trained first model M1 (for example, the layers that perform feature extraction) and learning only the other layers (for example, the layers that perform recognition).

When the weight parameters of some layers of the trained first model M1 are reset before the second learning, as in the above embodiment, the learning coefficient may be varied from layer to layer. For example, in the layers whose weight parameters have been reset, the learning coefficient may be set larger than in the other layers so that learning proceeds faster.

A so-called transfer learning method (also referred to as fine-tuning or the like) can be adopted for the second learning, including the setting of the second model.
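
The options above (resetting some layers, fixing some layers, and varying the learning coefficient per layer) can be sketched as follows. This is a minimal sketch under stated assumptions: the model is represented as a plain dictionary of weight lists, and the layer names, the function names (`init_second_model`, `per_layer_lr`, `sgd_step`), the re-initialisation range, and the coefficient values are all hypothetical.

```python
import copy
import random


def init_second_model(first_model, reset_layers, seed=0):
    """Copy the trained weights of M1 and re-initialise only the listed layers."""
    rng = random.Random(seed)
    second = copy.deepcopy(first_model)
    for name in reset_layers:
        second[name] = [rng.uniform(-0.1, 0.1) for _ in second[name]]
    return second


def per_layer_lr(model, reset_layers, frozen_layers=(), base_lr=0.001, boost=10.0):
    """Larger learning coefficient for reset layers, zero for fixed layers."""
    rates = {}
    for name in model:
        if name in frozen_layers:
            rates[name] = 0.0              # weights stay fixed
        elif name in reset_layers:
            rates[name] = base_lr * boost  # learn faster where weights were reset
        else:
            rates[name] = base_lr
    return rates


def sgd_step(model, grads, rates):
    """One gradient step using a separate learning coefficient per layer."""
    return {
        name: [w - rates[name] * g for w, g in zip(weights, grads[name])]
        for name, weights in model.items()
    }


# Hypothetical trained first model M1 with two feature layers and one output layer.
m1 = {"conv1": [0.5, -0.2], "conv2": [0.3, 0.8], "fc": [1.2, -0.7]}
m2 = init_second_model(m1, reset_layers=["fc"])
rates = per_layer_lr(m2, reset_layers=["fc"], frozen_layers=["conv1"])
grads = {name: [1.0] * len(weights) for name, weights in m2.items()}  # dummy gradients
m2_next = sgd_step(m2, grads, rates)
```

Keeping the per-layer coefficients in a separate table makes it easy to switch between the variants: an empty `reset_layers` corresponds to re-learning the whole network from the M1 weights, while a non-empty `frozen_layers` corresponds to learning only the remaining layers.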

［Endoscopic image processing device］
FIG. 12 is a block diagram showing an embodiment of the configuration of the endoscopic image processing device.

The endoscopic image processing device 100 is an example of a medical image processing device. The endoscopic image processing device 100 acquires an endoscopic image having a specific image quality, performs image recognition on the acquired endoscopic image (detection of lesions contained in the image, classification by lesion type, and so on), and outputs the result. The image recognition model generated by the learning device 1 described above is used for the image recognition.

As shown in FIG. 12, the endoscopic image processing device 100 includes an endoscopic image acquisition unit 110 that acquires the endoscopic image to be recognized, an image recognition unit 112 that performs image recognition on the acquired endoscopic image, a recognition result output unit 114 that outputs the recognition result, and an image processing control unit 116 that controls the whole device.

The endoscopic image acquisition unit 110 is an example of a medical image acquisition unit and acquires the endoscopic image (medical image) to be recognized. This endoscopic image has a specific image quality.

The image recognition unit 112 performs image recognition processing (detection of lesions contained in the image, classification by lesion type, and so on) on the endoscopic image acquired by the endoscopic image acquisition unit 110. The image recognition unit 112 is constituted by the image recognition model (trained model) generated by the learning device 1 described above. That is, it is constituted by a model (the second model) generated by performing the first learning with a learning image group whose image quality differs from the target image quality (the first endoscopic image group) and then, based on that learning result, learning with a learning image group of the target image quality (the second endoscopic image group).

The recognition result output unit 114 outputs the recognition result of the image recognition unit 112 in a predetermined format. For example, it outputs the result to a monitor in a predetermined display format.

The image processing control unit 116 performs overall control of the operation of each unit.

［Hardware configuration of the endoscopic image processing device］
FIG. 13 is a diagram showing an example of the hardware configuration of the endoscopic image processing device.

The endoscopic image processing device 100 is constituted by a computer such as a server computer or a client computer, and includes a CPU 121, a ROM 122, a RAM 123, an HDD 124, a communication interface 125, an input/output interface 126, and the like. The endoscopic image processing device 100 also includes an input device 127, a display device 128, and the like.

By executing programs, the CPU 121 controls each part of the endoscopic image processing device 100 and realizes each of its functions. The ROM 122 stores various programs executed by the CPU 121, various data, and the like. The RAM 123 provides a work area for the CPU 121. The HDD 124 stores various programs executed by the CPU 121 and various data. The communication interface 125 is an interface for connecting the endoscopic image processing device 100 to a network 59 such as a LAN. The endoscopic image processing device 100 communicates with external devices via the communication interface 125. The input/output interface 126 is an interface for connecting external equipment such as the input device 127 and the display device 128 to the endoscopic image processing device 100. The input device 127 inputs information corresponding to user operations to the endoscopic image processing device 100 and consists of, for example, a keyboard and a mouse. The display device 128 displays various information and consists of, for example, a liquid crystal display or an organic EL display.

The functions of the endoscopic image acquisition unit 110, the image recognition unit 112, and the recognition result output unit 114 are realized by the CPU 121 executing predetermined programs.

The endoscopic image to be recognized is, for example, stored in the HDD 124 and acquired from the HDD 124. Alternatively, it is stored in an external storage device connected via the network 59 and acquired from that storage device via the network 59, or it is acquired via the network 59 from an endoscope device connected to the network 59. Under the control of the image processing control unit 116, the endoscopic image acquisition unit 110 acquires the endoscopic image to be recognized from the designated source.

The recognition result is, for example, displayed on the display device 128 in a predetermined display format. Under the control of the image processing control unit 116, the recognition result output unit 114 outputs the recognition result of the image recognition unit 112 to the display device 128 in a predetermined format.

［Image processing method］
First, the endoscopic image acquisition unit 110 acquires the endoscopic image to be recognized. This endoscopic image has a specific image quality. Next, the image recognition unit 112 performs image recognition on the acquired endoscopic image. Finally, the recognition result output unit 114 outputs the recognition result.

In the endoscopic image processing device 100 of this embodiment, image recognition is performed with a model optimized for a specific image quality, so highly accurate image recognition is possible.

［Modifications of the endoscopic image processing device］
《Modification 1 of the endoscopic image processing device》
FIG. 14 is a block diagram showing a modification of the endoscopic image processing device.

As shown in the figure, the endoscopic image processing device 100A of this example differs from the endoscopic image processing device 100 of the above embodiment in that it further includes a model switching unit 130 that switches the model used for image recognition.

The image recognition unit 112 is provided with a plurality of models that perform image recognition on endoscopic images, and the model switching unit 130 switches the model to be used. Each of these models has been optimized by performing the second learning. The prepared models are stored in, for example, the ROM 122 or the HDD 124.

The model switching unit 130 switches the model to be used in response to an instruction from the image processing control unit 116. The image processing control unit 116 switches the model to be used in response to an instruction from the user.

For example, when a plurality of endoscopes with different specifications are used selectively for examinations, a model optimized for each endoscope is prepared. The model used for image recognition is then switched according to the endoscope used for the examination. This enables highly accurate image recognition.

In addition, because endoscopes may differ individually even when they are of the same model, a model optimized for each individual endoscope may be prepared, and the model used for image recognition may be switched according to the endoscope used for the examination. This enables even more accurate image recognition.

Furthermore, the processor device (the device that processes the imaging signal output from the endoscope and generates image data) is generally often shared among endoscopes of different models. By preparing a model optimized for each endoscope model and switching the model used for image recognition according to the endoscope used for the examination, more accurate image recognition becomes possible.

《Modification 2 of the endoscopic image processing device》
FIG. 15 is a block diagram showing another modification of the endoscopic image processing device.

The endoscopic image processing device 100B of this example differs from the endoscopic image processing device 100A of Modification 1 in that it further includes an endoscope information acquisition unit 140 that acquires information on the endoscope that captured the endoscopic image to be recognized. A plurality of models, each optimized for an endoscope used for examinations, are prepared for use by the image recognition unit 112.

The endoscope information acquisition unit 140 acquires information on the endoscope that captured the endoscopic image to be recognized and outputs it to the image processing control unit 116. Based on the acquired endoscope information, the image processing control unit 116 instructs the model switching unit 130 to switch so that the corresponding model is used. The model switching unit 130 switches the model to be used in response to the instruction from the image processing control unit 116. For example, a table that associates each endoscope type (model) with its corresponding recognition model is prepared, and the model is switched by referring to that table.

According to the endoscopic image processing device 100B of this example, the model suited to the image recognition is selected automatically, so highly accurate image recognition is always possible.
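
The table lookup described above can be sketched as follows. This is only an illustrative sketch: the endoscope type strings, the `make_model` stand-in for a trained second model, and the choice to raise an error for an unregistered endoscope type are assumptions, since the disclosure does not specify how the table is keyed or how a missing entry is handled.

```python
def make_model(label):
    """Hypothetical stand-in for a trained second model M2."""
    def recognize(image):
        return {"model": label, "pixels": len(image)}
    return recognize


# Table associating each endoscope type with its optimized model (keys are hypothetical).
MODEL_TABLE = {
    "scope-type-A": make_model("M2 for scope-type-A"),
    "scope-type-B": make_model("M2 for scope-type-B"),
}


def switch_model(endoscope_type, table=MODEL_TABLE):
    """Return the model associated with the reported endoscope type."""
    try:
        return table[endoscope_type]
    except KeyError:
        # Assumed behaviour: refuse to recognize with a model that was not
        # optimized for this endoscope.
        raise ValueError(
            f"no model registered for endoscope type {endoscope_type!r}"
        ) from None


def process(image, endoscope_type):
    """Acquire endoscope information, switch the model, then recognize."""
    model = switch_model(endoscope_type)  # role of the model switching unit 130
    return model(image)                   # role of the image recognition unit 112


result = process([0, 1, 2], "scope-type-B")
```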

The models that are switched between are not limited to models prepared according to endoscope specifications. Models generated by performing the second learning with learning image groups that differ from one another in resolution, models generated by performing the second learning with learning image groups that differ in amount of noise, models generated by performing the second learning with learning image groups that differ in the combination of these, and the like may also be prepared. An appropriate model is then selected according to the application.

［Other embodiments］
《Medical images》
The above embodiment has been described using the example of image recognition on endoscopic images as medical images, but the medical images to which the present invention is applicable are not limited to these.

The "medical images" to which the present invention is applicable include, in addition to endoscopic images, various types of images such as CT (Computerized Tomography) images, X-ray images, ultrasonic diagnostic images, MRI (Magnetic Resonance Imaging) images, PET (Positron Emission Tomography) images, SPECT (Single Photon Emission Computed Tomography) images, and fundus images.

The medical image processing device of the present disclosure can be used as a diagnosis support device that supports examination, treatment, diagnosis, and the like by a doctor or other practitioner. The term "diagnosis support" includes the concepts of examination support and/or treatment support.

《Hardware configuration》
The hardware that realizes the learning device and the medical image processing device can be constituted by the various processors described below.

The various processors include: a CPU (Central Processing Unit), which is a general-purpose processor that executes programs and functions as various processing units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), which is a processor whose circuit configuration can be changed after manufacture; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed specifically to execute particular processing.

One processing unit may be constituted by one of these various processors, or by two or more processors of the same type or of different types. For example, one processing unit may be constituted by a plurality of FPGAs or by a combination of a CPU and an FPGA. A plurality of processing units may also be constituted by one processor. As a first example of constituting a plurality of processing units with one processor, one processor is constituted by a combination of one or more CPUs and software, as typified by computers such as clients and servers, and this processor functions as the plurality of processing units. As a second example, a processor is used that realizes the functions of an entire system including a plurality of processing units with a single IC (Integrated Circuit) chip, as typified by a system on chip (SoC). In this way, the various processing units are constituted, as a hardware structure, by using one or more of the various processors described above.

More specifically, the hardware structure of these various processors is electric circuitry in which circuit elements such as semiconductor elements are combined.

《Endoscope》
The endoscope is not limited to a flexible endoscope and may be a rigid endoscope or a capsule endoscope.

《Observation light of the endoscope》
As the observation light (illumination light) of the endoscope, light in a wavelength band suited to the observation purpose is selected, such as white light, light in one or more specific wavelength bands, or a combination of these. White light is light in the white wavelength band or light in a plurality of wavelength bands. A "specific wavelength band" is a band narrower than the white wavelength band. Specific examples of the specific wavelength band are given below.

〈First example〉
The first example of the specific wavelength band is, for example, the blue band or the green band in the visible range. The wavelength band of the first example includes a wavelength band of 390 nm or more and 450 nm or less, or a wavelength band of 530 nm or more and 550 nm or less, and the light of the first example has a peak wavelength within the wavelength band of 390 nm or more and 450 nm or less, or within the wavelength band of 530 nm or more and 550 nm or less.

〈Second example〉
The second example of the specific wavelength band is, for example, the red band in the visible range. The wavelength band of the second example includes a wavelength band of 585 nm or more and 615 nm or less, or a wavelength band of 610 nm or more and 730 nm or less, and the light of the second example has a peak wavelength within the wavelength band of 585 nm or more and 615 nm or less, or within the wavelength band of 610 nm or more and 730 nm or less.

〈Third example〉
The third example of the specific wavelength band includes a wavelength band in which the absorption coefficient differs between oxyhemoglobin and reduced hemoglobin, and the light of the third example has a peak wavelength in a wavelength band in which the absorption coefficient differs between oxyhemoglobin and reduced hemoglobin. The wavelength band of the third example includes a wavelength band of 400 ± 10 nm, 440 ± 10 nm, or 470 ± 10 nm, or a wavelength band of 600 nm or more and 750 nm or less, and the light of the third example has a peak wavelength within the wavelength band of 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm or more and 750 nm or less.

〈Fourth example〉
The fourth example of the specific wavelength band is a wavelength band of excitation light that is used for observing the fluorescence emitted by a fluorescent substance in the living body (fluorescence observation) and that excites this fluorescent substance, for example, 390 nm to 470 nm.

〈Fifth example〉
The fifth example of the specific wavelength band is the wavelength band of infrared light. The wavelength band of the fifth example includes a wavelength band of 790 nm or more and 820 nm or less, or a wavelength band of 905 nm or more and 970 nm or less, and the light of the fifth example has a peak wavelength within the wavelength band of 790 nm or more and 820 nm or less, or within the wavelength band of 905 nm or more and 970 nm or less.

《Switching of the observation light of the endoscope》
As the light source, a laser light source, a xenon light source, an LED (Light-Emitting Diode) light source, or an appropriate combination of these can be adopted. The type and wavelength of the light source, the presence or absence of a filter, and so on are preferably configured according to the type of subject, the purpose of observation, and the like. During observation, the wavelengths of the illumination light are preferably combined and/or switched according to the type of subject, the purpose of observation, and the like. When switching the wavelength, the wavelength of the emitted light may be switched, for example, by rotating a disk-shaped filter (rotary color filter) that is placed in front of the light source and is provided with filters that transmit or block light of specific wavelengths.

The image sensor used in the endoscope is not limited to a color image sensor in which a color filter is provided for each pixel, and may be a monochrome image sensor. When a monochrome image sensor is used, images can be captured frame-sequentially (color-sequentially) by switching the wavelength of the illumination light in sequence. For example, the wavelength of the emitted illumination light may be switched in sequence among violet, blue, green, and red, or broadband light (white light) may be emitted and the wavelength of the illumination light switched by a rotary color filter (red, green, blue, and the like). Alternatively, one or more narrow-band lights may be emitted and the wavelength of the illumination light switched by the rotary color filter. The narrow-band light may be infrared light of two or more different wavelengths.
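The frame-sequential capture described above can be illustrated with a minimal Python sketch. It assumes that one monochrome frame has been captured per illumination color and that the frames are already aligned; the function name `compose_color_image`, the frame representation, and the pixel values are hypothetical and are not taken from the embodiment.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # monochrome frame: rows of pixel intensities


def compose_color_image(
    frames: Dict[str, Frame],
    order: Tuple[str, ...] = ("red", "green", "blue"),
) -> List[List[Tuple[int, ...]]]:
    """Merge monochrome frames, one per illumination color, into one color image."""
    first = frames[order[0]]
    height, width = len(first), len(first[0])
    # All frames must have the same size to be merged pixel by pixel.
    for name in order:
        frame = frames[name]
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError(f"frame '{name}' has a different size")
    # Each output pixel collects the intensities measured under each color.
    return [
        [tuple(frames[name][y][x] for name in order) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2x2 frames captured while the illumination was switched in sequence.
frames = {
    "red": [[10, 20], [30, 40]],
    "green": [[11, 21], [31, 41]],
    "blue": [[12, 22], [32, 42]],
}
color = compose_color_image(frames)
```

A real device would also have to compensate for subject motion between the sequential exposures, which this sketch ignores.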

<<Example of generating a special light image>>
The processor device that processes endoscope images may generate a special light image having information on a specific wavelength band based on a normal light image obtained by imaging with white light. "Generation" here includes the concept of "acquisition". The processor device 16 can obtain a signal in the specific wavelength band by performing an operation based on the color information of red (R), green (G), and blue (B), or of cyan (C), magenta (M), and yellow (Y), contained in the normal light image.
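The text does not specify which operation is performed on the color information. As one possible illustration only, the following Python sketch assumes a per-pixel weighted sum of the R, G, and B values; the function name `estimate_band_signal` and the weights are hypothetical placeholders, not values from the embodiment.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) values of the normal light image


def estimate_band_signal(
    image: List[List[Pixel]], weights: Sequence[float]
) -> List[List[float]]:
    """Estimate a specific-band signal as a weighted sum of R, G, and B per pixel."""
    if len(weights) != 3:
        raise ValueError("exactly three weights (R, G, B) are required")
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in image]


# Hypothetical weights emphasizing the short-wavelength side (assumption, not from the source).
HYPOTHETICAL_WEIGHTS = (0.0, 0.25, 0.75)

normal_light_image = [[(100.0, 80.0, 40.0), (0.0, 0.0, 0.0)]]
special_light_image = estimate_band_signal(normal_light_image, HYPOTHETICAL_WEIGHTS)
```

In practice the weights would have to be derived from the spectral characteristics of the sensor and the illumination; a CMY input could be handled the same way with its own weights.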

<<Program that causes a computer to realize the functions of the learning device and the medical image processing device>>
A program that causes a computer to realize the functions of the learning device and the medical image processing device described in the above embodiments can be recorded on a computer-readable medium, that is, a tangible non-transitory information storage medium such as an optical disk, a magnetic disk, or a semiconductor memory, and the program can be provided through this information storage medium. Instead of storing the program on such a tangible non-transitory information storage medium, the program signal can also be provided as a download service over a telecommunication line such as the Internet.

It is also possible to provide some or all of the functions of the learning device and the medical image processing device described in the above embodiments as an application server, and to offer a service that provides the processing functions over a telecommunication line.

1 Learning device
10 First learning unit
12 First learning data set
16 Processor device
20 Second learning unit
22 Second learning data set
30 Learning control unit
51 CPU
52 ROM
53 RAM
54 HDD
55 Communication interface
56 Input/output interface
57 Input device
58 Display device
59 Network
100 Endoscopic image processing device
100A Endoscopic image processing device
100B Endoscopic image processing device
110 Endoscopic image acquisition unit
112 Image recognition unit
114 Recognition result output unit
116 Image processing control unit
121 CPU
122 ROM
123 RAM
124 HDD
125 Communication interface
126 Input/output interface
127 Input device
128 Display device
130 Model switching unit
140 Endoscope information acquisition unit
M1 First model
M2 Second model
S1 to S4 Learning procedure
S11 to S14 Learning procedure based on the learning result obtained with a high-resolution learning image group
S21 to S24 Learning procedure based on the learning result obtained with a low-resolution learning image group
S31 to S34 Learning procedure based on the learning result obtained with a low-noise learning image group
S41 to S44 Learning procedure based on the learning result obtained with a high-noise learning image group
S51 to S54 Learning procedure based on the learning result obtained with a wide-angle learning image group
S61 to S64 Learning procedure based on the learning result obtained with a learning image group captured by a different endoscope

Claims

A learning device comprising:
a first learning unit that generates a first model for performing image recognition on medical images of a first image quality by learning using a first medical image group composed of medical images of the first image quality; and
a second learning unit that generates, based on the first model, a second model for performing image recognition on medical images of a second image quality different from the first image quality by learning using a second medical image group composed of medical images of the second image quality.

The learning device according to claim 1, wherein the first medical image group is composed of medical images of a first resolution, and the second medical image group is composed of medical images of a second resolution different from the first resolution.

The learning device according to claim 2, wherein the second resolution is lower than the first resolution.

The learning device according to claim 3, wherein the first medical image group is composed of medical images having a resolution of 4K or more, and the second medical image group is composed of medical images having a resolution of less than 4K.

The learning device according to claim 3, wherein the first medical image group is composed of medical images having a resolution of 8K or more, and the second medical image group is composed of medical images having a resolution of less than 8K.

The learning device according to claim 2, wherein the second resolution is higher than the first resolution.

The learning device according to claim 6, wherein the first medical image group is composed of medical images having a resolution of less than 4K, and the second medical image group is composed of medical images having a resolution of 4K or more.

The learning device according to claim 6, wherein the first medical image group is composed of medical images having a resolution of less than 8K, and the second medical image group is composed of medical images having a resolution of 8K or more.

The learning device according to claim 1, wherein the first medical image group is composed of medical images having a smaller amount of noise than the medical images constituting the second medical image group.

The learning device according to claim 1, wherein the first medical image group is composed of medical images having a larger amount of noise than the medical images constituting the second medical image group.

The learning device according to claim 1, wherein the first medical image group is composed of medical images having a wider angle of view than the medical images constituting the second medical image group.

The learning device according to claim 1, wherein the first medical image group is composed of medical images captured by an endoscope, and the second medical image group is composed of medical images captured by an endoscope different from the endoscope that captured the first medical image group.

The learning device according to claim 12, wherein the second medical image group is composed of medical images captured by an endoscope having specifications different from those of the endoscope that captured the medical images constituting the first medical image group.

The learning device according to any one of claims 1 to 13, wherein the first model and the second model are each composed of a convolutional neural network.

A medical image processing device comprising:
a medical image acquisition unit that acquires a medical image; and
a model that is composed of the second model generated by the learning device according to any one of claims 1 to 14 and that performs image recognition on the medical image.

The medical image processing device according to claim 15, comprising a plurality of the models, and further comprising
a model switching unit that switches the model to be used.

The medical image processing device according to claim 16, further comprising an endoscope information acquisition unit that acquires information on the endoscope that captured the medical image, wherein
the plurality of models are generated by learning in the second learning unit using second medical image groups captured by different endoscopes, and
the model switching unit switches the model to be used based on the information on the endoscope acquired by the endoscope information acquisition unit.

The medical image processing device according to claim 17, wherein the plurality of models are generated by learning in the second learning unit using second medical image groups captured by endoscopes having different specifications.

The medical image processing device according to claim 18, wherein the plurality of models are generated by learning in the second learning unit using second medical image groups captured by endoscopes that differ from one another in resolution or noise amount.

A learning method comprising:
a step of generating a first model for performing image recognition on medical images of a first image quality by learning using a first medical image group composed of medical images of the first image quality; and
a step of generating, based on the first model, a second model for performing image recognition on medical images of a second image quality different from the first image quality by learning using a second medical image group composed of medical images of the second image quality.
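The two steps of the learning method can be sketched in Python as follows. This is only an illustration under strong assumptions: a tiny logistic regression on a single scalar feature stands in for the convolutional neural network, the data are invented, and the names `train` and `predict` are hypothetical. What the sketch shows is the structure of the method: the second step starts from the parameters of the first model rather than from scratch, and uses a smaller second group.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (scalar feature standing in for an image, label 0 or 1)


def train(samples: Sequence[Sample], init: Sequence[float],
          epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Batch gradient descent for logistic regression, starting from init = [weight, bias]."""
    w, b = init
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability of label 1
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def predict(model: Sequence[float], x: float) -> int:
    w, b = model
    return 1 if w * x + b > 0 else 0


# Invented data: a larger first group and a small second group of a different "image quality".
first_group = [(-0.5, 0), (-0.4, 0), (-0.3, 0), (-0.2, 0), (-0.1, 0),
               (0.1, 1), (0.2, 1), (0.3, 1), (0.4, 1), (0.5, 1)]
second_group = [(-0.2, 0), (-0.1, 0), (0.1, 1), (0.2, 1)]

# Step 1: generate the first model from the first group.
m1 = train(first_group, [0.0, 0.0])
# Step 2: generate the second model based on the first model, using the second group.
m2 = train(second_group, m1)
```

With a neural network, the same pattern would correspond to initializing the second training run with the weights learned in the first run.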