JP7115114B2

JP7115114B2 - X-ray image object recognition system

Info

Publication number: JP7115114B2
Application number: JP2018141669A
Authority: JP
Inventors: 宏大和
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2022-08-09
Anticipated expiration: 2038-07-27
Also published as: JP2020014799A

Description

The present invention relates to an X-ray image object recognition system that recognizes an imaged object in an X-ray image.

In the medical field of X-ray imaging, the final output X-ray image is produced by preprocessing the original image (raw data) acquired at the time of radiography. Part recognition is performed on the original image, and the data is corrected using image processing parameters optimized for each target part, which improves the image quality of the output X-ray image. For example, for parts such as the humerus, hip joints, shoulder joints, ribs, and foreign bodies (metal, pacemakers), image quality is improved by correcting gray levels (for example, by applying different gamma values) and by suppressing noise. Improving image quality in this way requires recognizing the region whose data is to be corrected as accurately as possible, which in turn requires accurately extracting (recognizing) such parts from the original image.
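The per-part correction described above can be pictured as applying a gamma curve only inside the recognized region. The sketch below is a minimal illustration under stated assumptions: pixel values are normalized to [0, 1], the recognized part is supplied as a boolean mask of the same shape, and the function name `gamma_correct_region` is hypothetical. The source does not specify the correction formula or any gamma values.

```python
from typing import List

Image = List[List[float]]  # assumed: pixel values normalized to [0.0, 1.0]
Mask = List[List[bool]]    # assumed: True where the recognized part lies


def gamma_correct_region(image: Image, mask: Mask, gamma: float) -> Image:
    """Apply out = in ** gamma only to pixels inside the recognized region.

    Hypothetical helper: pixels outside the mask are returned unchanged.
    The image and mask are assumed to have identical shapes.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(image) != len(mask) or any(len(a) != len(b) for a, b in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [p ** gamma if inside else p for p, inside in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

For example, `gamma_correct_region([[0.25, 0.25]], [[True, False]], 0.5)` brightens only the first pixel (to 0.5) and leaves the second at 0.25. A different gamma per recognized part is what makes accurate part recognition a prerequisite.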

Patent Document 1 discloses one example of a method for extracting a specific part from an image. In Patent Document 1, when a medical image (X-ray image) containing a plurality of structures (e.g., bones) is processed, a prior shape model is stored in a storage unit as prior information. This model consists of a set of contour lines of a specific structure (e.g., ribs) based on anatomical position. After the prior shape model is placed in the medical image, an image feature amount is calculated by taking the first-order derivative of the pixel values at the positions in the medical image that overlap the contour lines of the placed model. Candidate points of the specific structure are then detected based on the contour lines of the prior shape model and the image feature amount. This makes it possible to accurately extract a specific structure (the contours of the ribs) from an X-ray image in which a plurality of structures overlap.
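One way to picture the feature computation of Patent Document 1 is to sample pixel values along a short profile crossing one contour line of the placed prior shape model, take the first-order derivative, and keep the position with the strongest edge response as a candidate point. The sketch below assumes exactly that simplified search. The profile sampling, the forward-difference derivative, and the function names are illustrative assumptions, not the method claimed in Patent Document 1.

```python
from typing import List, Sequence


def first_derivative(profile: Sequence[float]) -> List[float]:
    """Forward-difference first derivative of a 1-D intensity profile."""
    return [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]


def candidate_index(profile: Sequence[float]) -> int:
    """Index of the strongest edge (largest |first derivative|) along the profile.

    Assumption: `profile` holds pixel values sampled across one contour line
    of the placed prior shape model. Index i refers to the step between
    samples i and i + 1.
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least two samples")
    grad = first_derivative(profile)
    return max(range(len(grad)), key=lambda i: abs(grad[i]))
```

For example, `candidate_index([10, 10, 12, 40, 42, 42])` returns `2`, the step between the third and fourth samples, where the intensity jumps sharply.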

In recent years, driven by the development of GPUs (Graphics Processing Units) that can process large amounts of data, deep learning has attracted attention. Deep learning is learning that uses a deep neural network (DNN). A DNN is a neural network (learning network) designed for pattern recognition, in which an algorithm modeled on the neural circuits of humans and animals is stacked into multiple layers. A DNN trained in advance on a large amount of data can automatically extract features from input data and recognize objects without human intervention.

Deep learning has been applied to a variety of technologies and fields, such as the following.
(A) Speech recognition: technology that recognizes human speech and outputs it as text data, or that identifies the speaker from the characteristics of the voice.
(B) Natural language processing: technology that enables computers to process and understand the natural language (written and spoken) that people use every day, as in document summarization and machine translation.
(C) Anomaly detection: technology that detects signs of anomalies in time-series data from sensors attached to industrial equipment, as in factory monitoring (detection of failures and abnormal operation).
(D) Image recognition: technology that takes images or video as input and recognizes and detects features such as characters, faces, and general objects, in fields such as face recognition, autonomous driving, and emotion analysis.

In recent years, deep learning has been applied not only in the technologies and fields listed above but also in the medical field. Deep learning makes it possible, for example, to extract the regions of target parts such as bones from an input X-ray image. Preprocessing such as data correction can then be applied to the extracted regions to improve the image quality of the final output X-ray image.

Conventionally, part recognition for image quality improvement has been performed by a developer who, drawing on their own know-how, designs a detection (extraction) algorithm for a target part, which a machine (computer) then executes. However, extraction accuracy varies with differences in X-ray imaging conditions (radiation dose, imaging position, etc.) and with individual differences (differences in the shape of bones and other structures in the body), so complex extraction algorithms have been required. Because a DNN automatically extracts features from input data without human effort, as described above, applying deep learning to target part detection is highly effective: it eliminates the need for humans to develop complex extraction algorithms.

However, applying deep learning to target part detection requires a sufficient amount of training data. When enough training data is hard to obtain, DNN training and object recognition (inference, prediction) are still possible with a small amount of training data, but overfitting becomes more likely and recognition performance degrades. If enough data is available during training, the DNN predicts values close to the true answer during inference even when data other than the training data is input, as shown by the solid-line graph in FIG. 18. In contrast, if little data is available during training, the DNN can predict the correct answer during inference only when the training data itself is input (overfitting). When other data is input, it predicts values far from the true answer (dashed-line graph), as shown by the solid-line graph in FIG. 19. As a result, object recognition performance degrades.

Therefore, when training data is scarce, the amount of training data must be increased to suppress overfitting. A known technique for this is data augmentation, which artificially increases the number of images by applying operations such as translation, rotation, scaling (enlargement/reduction), and flipping to the original images. As shown in FIG. 20, appropriate data augmentation creates new image data (white circles) from the original image data (black circles), and training on this artificially enlarged data set prevents the DNN from overfitting the training data. As a result, during inference the DNN can predict values close to the true answer for input data, as shown by the solid-line graph in FIG. 20, which suppresses the loss of recognition performance caused by overfitting.
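The conventional, unconstrained augmentation described above can be sketched as randomly applying translation, flipping, and rotation to an image. This is a minimal, dependency-free illustration. All function names are hypothetical, and rotation is restricted to multiples of 90 degrees because arbitrary angles would need interpolation. Because nothing here limits the parameters, the sketch can produce exactly the sideways or upside-down chest images that the source later warns about.

```python
import random
from typing import List

Image = List[List[int]]


def shift(img: Image, dy: int, dx: int, fill: int = 0) -> Image:
    """Translate the image by (dy, dx), filling uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def naive_augment(img: Image, rng: random.Random) -> Image:
    """Randomly apply shift, flip and rotation with no domain constraints."""
    out = shift(img, rng.randint(-1, 1), rng.randint(-1, 1))
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randint(0, 3)):  # 0 to 270 degrees: may turn a chest image upside down
        out = rotate90(out)
    return out
```

For example, `rotate90([[1, 2], [3, 4]])` yields `[[3, 1], [4, 2]]`. Calling `naive_augment` repeatedly with a seeded `random.Random` generates a stream of variants from one original image.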

Patent Document 1: Japanese Patent Application Laid-Open No. 2018-15022 (see Claim 1, paragraphs [0008], [0018], and [0030] to [0056], FIG. 1, etc.)

However, X-ray images are difficult to obtain in large quantities because of the specialized nature of the imaging equipment, concerns about radiation exposure, personal-information concerns, and the like. Therefore, to improve the recognition performance of a DNN on input X-ray images, the data augmentation described above, which artificially increases the number of X-ray images, is indispensable.

However, X-ray images vary widely depending on the part being imaged, the imaging direction, and so on. If data augmentation is performed simply by translating, rotating, scaling, or flipping the original images, images of scenes that cannot occur in practice may be created, and the DNN may then learn imaging scenes that never actually occur. For example, suppose the original image is a frontal chest X-ray image and new images are created by in-plane rotation. Rotating the image too far produces frontal chest X-ray images that cannot occur in routine imaging, such as a sideways image (rotated 90° from upright) or an upside-down image (rotated 180° from upright). Likewise, if the original image is a frontal chest X-ray image of a child, shrinking it to create a new image yields an abnormally small frontal chest X-ray image that cannot occur in routine imaging.

If unintended learning takes place on images created by such unintended data augmentation (white circles in FIG. 21), then during inference the DNN predicts values far from the true answer (dashed-line graph) for input X-ray images, as shown by the solid-line graph in FIG. 21. As a result, object recognition performance degrades. Therefore, when augmenting X-ray image data, the data augmentation parameters must be set appropriately so that unintended augmentation does not occur. However, how to set such parameters has not been studied at all in the prior art.

In addition, because X-ray images vary widely with imaging conditions such as the part being imaged and the imaging direction, one could prepare a plurality of DNNs, one per imaging condition, and perform inference (part recognition) with the DNN corresponding to each condition. In that case, however, the operator performing the X-ray imaging (a radiological technologist) must input the imaging conditions so that the matching DNN can be selected, which places an extra burden on the operator. Also, in the X-ray imaging field, inference on input images serves as preprocessing for improving the image quality of the output X-ray image, as described above, so this processing should be efficient and impose a low processing load. In view of the above, it is desirable to prepare a single DNN rather than one DNN per imaging condition, so that the single DNN can recognize the imaged part in an X-ray image taken under any imaging conditions.

The present invention has been made to solve the above problems. Its object is to provide an X-ray image object recognition system with two properties. First, by appropriately setting the parameters used when augmenting X-ray images during training, it avoids unintended data augmentation, so the learning network can be properly trained on the augmented images and can recognize (infer) objects accurately. Second, it allows a single learning network to recognize objects in X-ray images taken under any imaging conditions.

An X-ray image object recognition system according to one aspect of the present invention includes: a learning network that performs machine learning using a learning set including an X-ray image of an object and a correct label corresponding to the object; an imaging information calculation unit that calculates, from the learning set, imaging information for deriving the imaging conditions under which the object was X-rayed; a data augmentation parameter determination unit that determines, based on the imaging information, the data augmentation parameters used in data augmentation that creates a new X-ray image from the X-ray image; and a learning processing unit that performs the data augmentation based on the data augmentation parameters and trains the learning network by machine learning using the resulting new X-ray image and the correct label. After being trained with the new X-ray image, the learning network recognizes an X-rayed object in an input X-ray image and outputs the recognition result.
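The data flow between the units named above can be sketched as a training loop in which each unit is a pluggable callable. This is a hedged skeleton, not the patented implementation. The function and type names are hypothetical, and three further choices are assumptions: the original image is also used for training, `n_aug` augmented copies are made per sample, and the augmentation step transforms the label together with the image so the two stay aligned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Label = List[List[int]]  # assumed: per-pixel class ids


@dataclass(frozen=True)
class AugParams:
    scale_range: Tuple[float, float]  # (min, max) reduction/enlargement ratio
    max_shift_px: int
    max_rotation_deg: float


def train_with_adaptive_augmentation(
    learning_set: List[Tuple[Image, Label]],
    calc_imaging_info: Callable[[Image, Label], Dict[str, float]],
    decide_params: Callable[[Dict[str, float]], AugParams],
    augment: Callable[[Image, Label, AugParams], Tuple[Image, Label]],
    train_step: Callable[[Image, Label], None],
    n_aug: int = 4,
) -> None:
    for image, label in learning_set:
        info = calc_imaging_info(image, label)  # imaging information calculation unit
        params = decide_params(info)            # data augmentation parameter determination unit
        train_step(image, label)                # assumption: also train on the original
        for _ in range(n_aug):                  # learning processing unit
            new_image, new_label = augment(image, label, params)
            train_step(new_image, new_label)
```

Passing the units in as callables keeps the skeleton independent of any particular network framework. Only `train_step` would touch the learning network itself.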

In the above X-ray image object recognition system, the imaging information calculation unit may calculate the imaging information based on the X-ray image included in the learning set and the correct label whose shape corresponds to the region of the object in the X-ray image.

In the above X-ray image object recognition system, the imaging information may be the number of pixels in the region of the object corresponding to the correct label in the X-ray image.
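When the correct label is a per-pixel class map, the imaging information described above reduces to counting the pixels of the object class. The sketch assumes the label is a 2-D list of integer class ids. The ratio helper is an optional, hypothetical addition that makes the count comparable across detector sizes.

```python
from typing import List

Label = List[List[int]]  # assumed: per-pixel class ids, 0 meaning background


def object_pixel_count(label: Label, object_class: int) -> int:
    """Number of pixels whose correct label equals `object_class`."""
    return sum(row.count(object_class) for row in label)


def object_area_ratio(label: Label, object_class: int) -> float:
    """Object pixel count as a fraction of all pixels in the image."""
    total = sum(len(row) for row in label)
    return object_pixel_count(label, object_class) / total if total else 0.0
```

For a label `[[0, 1, 1], [0, 1, 0]]`, the object class 1 covers 3 pixels, or half of the image.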

In the above X-ray image object recognition system, the imaging information calculation unit may calculate the imaging information based on a rectangular region surrounding the object in the X-ray image included in the learning set.

In the above X-ray image object recognition system, the imaging information may be the area of the rectangular region.
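If the rectangle is not annotated directly but derived from a per-pixel label, its area can be computed from the tightest axis-aligned box around the object pixels. The sketch below assumes that derivation and uses inclusive pixel bounds. The function names are hypothetical. If the annotation is already a rectangle, the area is simply its width times its height in pixels.

```python
from typing import List, Optional, Tuple

Label = List[List[int]]  # assumed: per-pixel class ids


def bounding_box(label: Label, object_class: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of the object, or None if absent."""
    coords = [
        (y, x)
        for y, row in enumerate(label)
        for x, v in enumerate(row)
        if v == object_class
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


def bounding_box_area(label: Label, object_class: int) -> int:
    """Area in pixels of the rectangle surrounding the object (0 if absent)."""
    box = bounding_box(label, object_class)
    if box is None:
        return 0
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)
```

For object pixels at (1, 1) and (2, 3), the box spans rows 1 to 2 and columns 1 to 3, giving an area of 2 × 3 = 6 pixels.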

In the above X-ray image object recognition system, the imaging information calculation unit may further calculate, from the learning set, out-of-object region information indicating a region other than the object in the X-ray image included in the learning set.

In the above X-ray image object recognition system, the out-of-object region information may be information on the area outside the X-ray irradiation field, and the imaging information calculation unit may calculate this information based on a correct label, included in the learning set, whose shape corresponds to the area outside the X-ray irradiation field.

In the above X-ray image object recognition system, the out-of-object region information may be information on the area outside the X-ray irradiation field, and the imaging information calculation unit may calculate this information based on histogram information of the X-ray image included in the learning set.

In the above X-ray image object recognition system, the out-of-object region information may be information on the area outside the X-ray irradiation field, and the imaging information calculation unit may calculate this information based on a binarized image obtained by binarizing each pixel value of the X-ray image included in the learning set.
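The histogram-based and binarization-based variants above can be combined into one sketch: build the intensity histogram, pick a threshold from it, binarize, and count the pixels classified as outside the irradiation field. The source names neither a thresholding rule nor a pixel-value convention. Here Otsu's method is swapped in as one well-known histogram-based choice, and it is assumed that pixels outside the field receive little dose and therefore have low integer values in [0, bins). Devices with the opposite convention would need the comparison reversed.

```python
from typing import List, Sequence

Image = List[List[int]]  # assumed: integer pixel values in [0, bins)


def histogram(pixels: Sequence[int], bins: int) -> List[int]:
    """Count how many pixels take each value 0..bins-1."""
    hist = [0] * bins
    for p in pixels:
        if not 0 <= p < bins:
            raise ValueError(f"pixel value {p} outside [0, {bins})")
        hist[p] += 1
    return hist


def otsu_threshold(hist: Sequence[int]) -> int:
    """Otsu's method: the threshold t maximizing between-class variance (class 0 = values <= t)."""
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def out_of_field_pixel_count(image: Image, bins: int = 256) -> int:
    """Binarize with the Otsu threshold and count pixels assumed to lie outside the field."""
    pixels = [p for row in image for p in row]
    t = otsu_threshold(histogram(pixels, bins))
    binary = [[1 if p <= t else 0 for p in row] for row in image]  # 1 = outside field (assumed low signal)
    return sum(map(sum, binary))
```

On the toy image `[[5, 5, 200, 210], [5, 5, 200, 220]]`, the threshold separates the four low-signal pixels from the rest, so four pixels are counted as outside the field.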

In the above X-ray image object recognition system, the data augmentation parameter determination unit may determine the data augmentation parameters based on the imaging information, the out-of-object region information, and preset thresholds.

In the above X-ray image object recognition system, the data augmentation parameters may be at least one of the reduction/enlargement ratio, the shift amount, and the rotation angle of the X-ray image.

In the above X-ray image object recognition system, when the total number of pixels in the object region of the X-ray image included in the learning set is equal to or greater than a first threshold, the data augmentation parameter determination unit may set the reduction/enlargement ratio, as a data augmentation parameter, to a value that keeps the X-ray image at the same size or reduces it.

In the above X-ray image object recognition system, when the total number of pixels in the object region of the X-ray image included in the learning set is equal to or less than a second threshold that is smaller than the first threshold, the data augmentation parameter determination unit may set the reduction/enlargement ratio to a value that keeps the X-ray image at the same size or enlarges it.

In the above X-ray image object recognition system, the data augmentation parameter determination unit may determine at least one of the shift amount and the rotation angle together with the reduction/enlargement ratio.

In the above X-ray image object recognition system, when the total number of pixels in the out-of-object region of the X-ray image included in the learning set is less than a third threshold, the data augmentation parameter determination unit may restrict the settable range of the data augmentation parameters.
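The threshold rules above can be sketched as a single decision function. The scaling rules follow the text. If the object pixel count is at least the first threshold, the ratio stays at or below 1. If it is at most the smaller second threshold, the ratio stays at or above 1. If the out-of-object pixel count is below the third threshold, the ranges are restricted. The base ranges in `DEFAULT_PARAMS` and the `restrict_factor` used to shrink them are hypothetical placeholders, since the source gives no concrete values or restriction rule.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AugParams:
    scale_range: Tuple[float, float]  # (min, max) reduction/enlargement ratio
    max_shift_px: int
    max_rotation_deg: float


# Hypothetical defaults; the source specifies no numeric ranges.
DEFAULT_PARAMS = AugParams(scale_range=(0.8, 1.2), max_shift_px=20, max_rotation_deg=10.0)


def decide_aug_params(
    object_pixels: int,
    outside_pixels: int,
    t1: int,
    t2: int,
    t3: int,
    base: AugParams = DEFAULT_PARAMS,
    restrict_factor: float = 0.5,  # hypothetical: how strongly to narrow the ranges
) -> AugParams:
    if not t2 < t1:
        raise ValueError("the second threshold must be smaller than the first")
    lo, hi = base.scale_range
    if object_pixels >= t1:    # object already large: keep size or reduce only
        hi = 1.0
    elif object_pixels <= t2:  # object small: keep size or enlarge only
        lo = 1.0
    shift, rot = base.max_shift_px, base.max_rotation_deg
    if outside_pixels < t3:    # little out-of-object area: narrow every range toward identity
        lo = 1.0 - (1.0 - lo) * restrict_factor
        hi = 1.0 + (hi - 1.0) * restrict_factor
        shift = int(shift * restrict_factor)
        rot = rot * restrict_factor
    return AugParams((lo, hi), shift, rot)
```

For example, with thresholds `t1=4000`, `t2=1000`, `t3=500`, an object covering 5000 pixels with ample out-of-field area gets the ratio range `(0.8, 1.0)`, so it is never enlarged beyond its original size.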

In the above X-ray image object recognition system, the object may include at least one of a low X-ray transmission region, through which a relatively small amount of X-rays passes in the person, and a high X-ray transmission region, through which a relatively large amount of X-rays passes.

In the above X-ray image object recognition system, the low X-ray transmission region may include the person's bone regions, and the high X-ray transmission region may include the person's lung field regions.

In the above X-ray image object recognition system, the learning network may be composed of a neural network.

With the above configuration, the data augmentation parameters are appropriately determined based on the imaging information, so data augmentation can be performed appropriately based on those parameters, and unintended augmentation of the X-ray images used for training is avoided. As a result, the learning network can be properly trained on the augmented images, so that during inference it can accurately recognize (infer) the object (the imaged part) in an input X-ray image. Furthermore, because the learning network is trained on data augmented with parameters determined from the imaging information, during inference it can recognize the object in an X-ray image taken under any imaging conditions. That is, during inference, a single learning network can recognize the object regardless of the imaging conditions of the input X-ray image.

FIG. 1A is an explanatory diagram schematically showing a person being X-rayed from the front.
FIG. 1B is an explanatory diagram showing an example of a chest X-ray image obtained by frontal imaging.
FIG. 2A is an explanatory diagram schematically showing a person undergoing oblique X-ray imaging.
FIG. 2B is an explanatory diagram showing an example of a chest X-ray image obtained by oblique imaging.
FIG. 3 is an explanatory diagram showing an example of an X-ray image of a person's humerus.
FIG. 4 is an explanatory diagram showing an example of an X-ray image of a person's hip joint.
FIG. 5 is an explanatory diagram showing the flow of processing for generating a learning model for frontal chest images.
FIG. 6 is an explanatory diagram showing the flow of processing for generating a learning model for oblique chest images.
FIG. 7 is an explanatory diagram showing the flow of processing for generating a learning model for humerus images.
FIG. 8 is an explanatory diagram showing the flow of processing when a learning model corresponding to the imaging conditions is loaded from among a plurality of learning models and used for inference.
FIG. 9 is a block diagram showing the schematic configuration of an X-ray image object recognition system according to an embodiment of the present invention.
FIG. 10 is a flowchart showing the flow of processing in the training method for the learning network of the X-ray image object recognition system.
FIG. 11A is an explanatory diagram showing an example of an X-ray image included in a learning set.
FIG. 11B is an explanatory diagram showing an example of a correct label created based on the X-ray image of FIG. 11A.
FIG. 12 is an explanatory diagram showing the correct label of FIG. 11B together with a correct label corresponding to the area outside the irradiation field.
FIG. 13A is an explanatory diagram showing an example of an X-ray image included in another learning set.
FIG. 13B is an explanatory diagram showing an example of a correct label created based on the X-ray image of FIG. 13A.
FIG. 14 is an explanatory diagram showing an example in which a rectangular region for object recognition in an X-ray image is used as the correct label.
FIG. 15 is an explanatory diagram schematically showing an example of a histogram of an X-ray image.
FIG. 16 is an explanatory diagram showing an example of a binarized image.
FIG. 17 is a flowchart showing the flow of processing during object recognition in the X-ray image object recognition system.
FIG. 18 is an explanatory diagram showing the relationship between training data and correct answers, and between input data for inference and predicted values, when there is sufficient training data.
FIG. 19 is an explanatory diagram showing the relationship between training data and correct answers, and between input data for inference and predicted values, when there is little training data.
FIG. 20 is an explanatory diagram showing the relationship between training data and correct answers, and between input data for inference and predicted values, when data augmentation is performed appropriately.
FIG. 21 is an explanatory diagram showing the relationship between training data and correct answers, and between input data for inference and predicted values, when unintended data augmentation is performed.

An embodiment of the present invention is described below with reference to the drawings. Before the X-ray image object recognition system of this embodiment is described, the problems outlined above are explained in more detail.

(Supplementary explanation of the problems)
Algorithms capable of object recognition by deep learning have been introduced in various papers. Among the best known are R-CNN (Regions with Convolutional Neural Networks), Fast R-CNN, Faster R-CNN, and YOLO (You Only Look Once). These algorithms were designed for images such as those used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an international contest in object recognition accuracy. For example, with the ILSVRC2012 data set, a neural network can be trained on 50,000 randomly selected images in each of 1,000 designated object categories, for a total of 50 million images. Neural networks that employ these algorithms therefore do not require complicated data augmentation: data can be augmented simply by randomly applying operations such as translation, rotation, scaling, and flipping to the original images.

Likewise, SegNet, a well-known method for semantic segmentation (understanding images at the pixel level), assumes input images limited to in-vehicle scenes, as described in its paper, so acquiring large amounts of data through data augmentation is unnecessary. Moreover, because an in-vehicle image shows the scene ahead of the vehicle, the positions of the road surface, sky, buildings, and vehicles and people ahead are constrained. The same applies to U-Net, which targets cell detection: although captured images of cells differ in position, size, and so on, their shapes do not differ drastically, so complex data augmentation is unnecessary.

In contrast, as described above, X-ray images in the medical field are difficult to obtain in large quantities, so data augmentation is essential for improving the recognition performance of a DNN. Yet the acquired images vary widely depending on the part being imaged, the imaging direction, and so on.

For example, when X-ray imaging is performed to diagnose the lung fields and ribs, frontal imaging as shown in FIG. 1A or oblique imaging as shown in FIG. 2A is performed. FIG. 1B shows an example of a chest X-ray image obtained by frontal imaging, and FIG. 2B shows an example of a chest X-ray image obtained by oblique imaging.

Similarly, when diagnosing the humerus, femur, hip joint, or the like, X-ray imaging is centered on the specific target region. To reduce radiation exposure, imaging is performed with areas other than the target region covered by a radiation protection sheet or the like (restricting the irradiated area). FIG. 3 shows an example X-ray image of a humerus, and FIG. 4 shows an example X-ray image of a hip joint. Because the position of the humerus changes in many ways as the arm rotates, humerus imaging in particular allows a high degree of freedom in positioning.

Thus, X-ray images vary widely. One straightforward way to recognize regions accurately in an input X-ray image would be to train a separate DNN (learning model) for each class (imaged region), and then, based on information the operator (for example, a radiological technologist) enters during X-ray imaging, such as the imaged region and imaging direction, load (select) the model for that class and perform region recognition (inference) with it.

For example, to generate a learning model that recognizes the imaged region in frontal chest X-ray images (a frontal chest model), as shown in FIG. 5, a learning set of frontal chest X-ray images and correct labels prepared in advance is used: data augmentation is applied to the X-ray images, and a DNN is trained on the newly generated X-ray images and their correct data. Similarly, an oblique chest model is generated from a learning set of oblique chest X-ray images and correct labels, as shown in FIG. 6, and a humerus model is generated from a learning set of humerus X-ray images and correct labels, as shown in FIG. 7, each by the same augmentation and training procedure. At inference time, as shown in FIG. 8, the model corresponding to the imaging condition (class) entered by the operator during X-ray imaging is loaded from among the learning models, the X-ray image data are fed into the loaded model, the target region is inferred, and the result is output.
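The per-condition approach of FIGS. 5 to 8 can be sketched as a lookup table keyed by the operator's input. This is a hypothetical illustration: the registry, the `(region, direction)` key format, and the stub models are assumptions, and each stub merely stands in for a separately trained DNN.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Model = Callable[[Image], str]

# Hypothetical registry: one separately trained model per imaging condition,
# keyed by (imaged region, imaging direction) as entered by the operator.
MODEL_REGISTRY: Dict[Tuple[str, str], Model] = {}


def register(region: str, direction: str) -> Callable[[Model], Model]:
    def decorator(model: Model) -> Model:
        MODEL_REGISTRY[(region, direction)] = model
        return model
    return decorator


# Stubs standing in for trained DNNs; each returns a tag only.
@register("chest", "frontal")
def chest_frontal_model(img: Image) -> str:
    return "chest/frontal result"


@register("chest", "oblique")
def chest_oblique_model(img: Image) -> str:
    return "chest/oblique result"


@register("humerus", "frontal")
def humerus_model(img: Image) -> str:
    return "humerus/frontal result"


def infer(img: Image, region: str, direction: str) -> str:
    """Select the model from the operator's input, then run inference."""
    model = MODEL_REGISTRY.get((region, direction))
    if model is None:
        raise ValueError(f"no model trained for {region}/{direction}")
    return model(img)
```

The sketch makes the drawback concrete: inference depends on the operator supplying a key that matches one of the separately trained models.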

However, preparing a learning model for each imaging condition requires the operator to enter the imaging conditions so that the right model can be selected, which burdens the operator and complicates processing. As described above, in the X-ray imaging field this inference (region recognition) serves as preprocessing for improving the quality of the output X-ray image, so it should be efficient and impose little processing load. It is therefore desirable to train a single learning network collectively, rather than one learning network (learning model) per imaging condition, so that the imaged region in any X-ray image, whatever the imaging conditions, can be recognized by feeding it into that single network.

Moreover, because X-ray images captured under different imaging conditions are diverse and vary widely, the following situations can arise:
(a) The data are irregular across X-ray images (the aspect ratios of the X-ray images differ).
(b) The angle at which the imaged object appears is not consistent across X-ray images (because of frontal imaging, oblique imaging, and so on).
(c) Foreign bodies other than the imaged object (for example, an implanted pacemaker or bolt, or a necklace) appear in the X-ray image.
(d) The number of X-ray images with correct data is unbalanced across classes (for example, frontal chest X-ray images are easy to collect as original images, but hip joint X-ray images are not).

Data augmentation that covers these situations is therefore needed, which requires applying artificial transformations that anticipate many variations. However, if augmentation is done naively by applying translation, rotation, scaling, and flipping to the original images, then, because X-ray images are scarce to begin with, images that would never occur in routine imaging may be created, as described above, such as an over-rotated frontal chest X-ray image or an X-ray image even smaller than a child's frontal chest X-ray image. If such unintended augmentation leads to unintended learning, the DNN cannot predict values close to the true answer (see the broken line) for input X-ray images, as shown in FIG. 21, and object recognition performance degrades.

Therefore, in this embodiment, for each image input during training, the imaging conditions are estimated from the input image and its correct label, data augmentation parameters are determined from the estimate, and augmentation is performed with the determined parameters. This avoids unintended augmentation and enables accurate object recognition, while training a single learning network makes object recognition possible for X-ray images captured under any of various imaging conditions. The X-ray image object recognition system of this embodiment is described below.

(Configuration of the X-ray image object recognition system)
FIG. 9 is a block diagram showing the schematic configuration of the X-ray image object recognition system 1 of this embodiment. The X-ray image object recognition system 1 includes a storage unit 2, a communication unit 3, an overall control unit 4, a learning network 5, an imaging information calculation unit 6, a data augmentation parameter determination unit 7, and a learning processing unit 8. Of these, the learning network 5, the imaging information calculation unit 6, the data augmentation parameter determination unit 7, and the learning processing unit 8 are implemented on a GPU capable of processing large amounts of data. The X-ray image object recognition system 1 can be implemented, for example, on a PC (personal computer). FIG. 9 shows only the components directly relevant to this embodiment; other components such as an input unit (for example, a mouse or keyboard) and a display unit (for example, a liquid crystal display) are omitted.

In this embodiment, an “object” is the target to be recognized (inferred or predicted) from an X-ray image. It includes at least one of a low X-ray transmission region of a person, through which relatively little X-ray radiation passes, and a high X-ray transmission region, through which relatively much passes. The low X-ray transmission region includes, for example, regions of the person's bones (skull, cervical vertebrae, vertebral bodies, scapulae, ribs, pelvis, limbs, etc.), and the high X-ray transmission region includes, for example, the person's lung fields.

The storage unit 2 is a memory that stores various kinds of information. It is composed of, for example, a hard disk, but may instead be composed of a recording medium selected as appropriate from RAM (Random Access Memory), ROM (Read Only Memory), an optical disk, a magneto-optical disk, non-volatile memory, and the like. The stored information includes learning sets each consisting of an X-ray image of an object and the correct label corresponding to the object (described in detail later), and learning sets of X-ray images and correct labels after data augmentation. The communication unit 3 is an interface for communicating with external devices and includes input/output ports as well as an antenna, a transmission/reception circuit, a modulation circuit, a demodulation circuit, and the like. Thus, for example, the X-ray images on which data augmentation is based and the correct label information for the objects can be acquired from outside via the communication unit 3 and stored in the storage unit 2. The overall control unit 4 is composed of, for example, a CPU (Central Processing Unit) and controls the operation of each unit of the X-ray image object recognition system 1.

The learning network 5 is a learning model that performs machine learning (supervised learning) using the learning sets stored in the storage unit 2 (for example, learning sets each containing an X-ray image of an object and the correct label corresponding to the object). In this embodiment, the learning network 5 is a neural network. Known networks such as R-CNN, Fast R-CNN, Faster R-CNN, FCN (Fully Convolutional Networks), SegNet, and U-Net can be used, but usable neural networks are not limited to these.

The imaging information calculation unit 6 calculates, from the learning set, imaging information from which the imaging conditions used when the object was X-rayed can be derived. Based on the imaging information calculated by the imaging information calculation unit 6, the data augmentation parameter determination unit 7 determines the data augmentation parameters used when performing data augmentation, that is, when creating new X-ray images from the training X-ray images. The learning processing unit 8 performs data augmentation based on those parameters and trains the learning network 5 using the resulting new X-ray images and the correct labels. The imaging information calculation unit 6, the data augmentation parameter determination unit 7, and the learning processing unit 8 are described in detail in the following description of operation.

(Operation of the X-ray image object recognition system (during learning))
Next, the operation of the X-ray image object recognition system 1 of this embodiment is described. In this embodiment, before object recognition (inference) is performed on input images, the learning network 5 is trained using learning sets that contain training X-ray images and correct labels. FIG. 10 is a flowchart of the training method for the learning network 5. The method includes a learning set preparation step (S1), an imaging information calculation step (S2), a data augmentation parameter determination step (S3), a data augmentation step (S4), and a machine learning step (S5). Each step is described in detail below.
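The flow of FIG. 10 can be summarized as a training loop in which each step is a pluggable function. This is a sketch under stated assumptions: the function name, the callable signatures, `copies_per_sample`, and the fixed seed are all hypothetical, and a real implementation would also shuffle and batch samples across epochs.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]
LabelMap = List[List[int]]
Sample = Tuple[Image, LabelMap]


def train_with_condition_aware_augmentation(
    learning_set: Sequence[Sample],                                          # S1
    compute_imaging_info: Callable[[Image, LabelMap], Dict[str, float]],     # S2
    decide_aug_params: Callable[[Dict[str, float], random.Random], Dict[str, float]],  # S3
    augment: Callable[[Image, LabelMap, Dict[str, float]], Sample],          # S4
    train_step: Callable[[Image, LabelMap], None],                           # S5
    copies_per_sample: int = 4,
    seed: int = 0,
) -> int:
    """Run S2 to S5 over a prepared learning set; returns the number of training steps."""
    rng = random.Random(seed)
    steps = 0
    for image, labels in learning_set:
        # S2: imaging information is derived once per (image, label) pair.
        info = compute_imaging_info(image, labels)
        for _ in range(copies_per_sample):
            # S3: parameters are sampled within ranges chosen from `info`.
            params = decide_aug_params(info, rng)
            # S4: the label map must receive the same geometric transform as the image.
            new_image, new_labels = augment(image, labels, params)
            # S5: one supervised update of the single shared network.
            train_step(new_image, new_labels)
            steps += 1
    return steps
```

Keeping S2 and S3 separate from the network mirrors the division of labor between the imaging information calculation unit 6, the data augmentation parameter determination unit 7, and the learning processing unit 8.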

<S1; Learning set preparation step>
In S1, a learning set containing training X-ray images and correct labels is prepared. Here, the learning set is created on an external PC or database (server), not shown, and its data are transmitted from that PC or the like to the X-ray image object recognition system 1. Alternatively, the learning set may be created and prepared inside the X-ray image object recognition system 1.

FIG. 11A shows an example X-ray image included in a learning set, and FIG. 11B shows an example of correct labels created from that X-ray image. From the X-ray image of FIG. 11A, a third party can identify the regions of the humerus, the ribs overlapping the lung fields, and the ribs not overlapping the lung fields. The third party then uses predetermined drawing software on an external PC to manually create correct labels L1 to L3 whose shapes correspond to those regions (either identical in shape, or not identical but very close). The learning set information, comprising the X-ray image and the correct labels L1 to L3, is transmitted from the PC to the X-ray image object recognition system 1 and stored in the storage unit 2. At this time, as shown in FIG. 12, the third party may additionally create a correct label L4 shaped to match the area outside the irradiation field, that is, the area of the X-ray image not irradiated with X-rays, and include it in the learning set. In FIGS. 11B and 12, reference symbol B denotes the background region (likewise in the other drawings).

FIG. 13A shows an example X-ray image included in another learning set, and FIG. 13B shows an example of correct labels created from it. In this example, a third party uses a PC to create, from the X-ray image of FIG. 13A, correct labels L11 and L12 whose shapes correspond to the ribs overlapping the lung fields and the ribs not overlapping the lung fields in that image. As before, the learning set information comprising the X-ray image and the correct labels L11 and L12 is transmitted from the PC to the X-ray image object recognition system 1 and stored in the storage unit 2.

The correct labels described above are created (assigned) in advance with shapes corresponding to the regions of the bones of the human body (for example, skull, cervical vertebrae, vertebral bodies, scapulae, ribs, pelvis, limbs), so each bone has a differently shaped label. Correct labels shaped to match characteristic structures other than bones (for example, organs such as the heart, or the lung fields) may also be created in advance.

Because the correct labels described so far match the object's region and shape in the X-ray image, they can be regarded as labels created (assigned) for object region extraction (segmentation). However, as shown in FIG. 14, rectangular regions for object recognition in the X-ray image may instead be used as correct labels. The figure shows an example in which a rectangular region R1 enclosing the lung fields, R2 enclosing the heart, R3 enclosing the head of the humerus, and R4 enclosing the ribs are used as correct labels.

<S2; Imaging information calculation step>
The imaging information calculation unit 6 calculates imaging information, from which the imaging conditions at the time of X-ray imaging can be derived, based on the X-ray images in the learning set prepared in S1 and the correct labels shaped to match the object regions in those images. For example, when calculating the imaging information for the X-ray image shown in FIG. 11A, the imaging information calculation unit 6 counts the pixels in the object regions corresponding to the correct labels L1 to L3 in that image. These pixel counts are specific to the imaged region and the imaging direction, so from them it can be derived that, for example, "the humerus and ribs were imaged from the front." The pixel counts therefore constitute imaging information for deriving the imaging conditions. This calculation (pixel counts of the object regions) is performed for each learning set consisting of a training X-ray image and its correct labels.
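The per-label pixel count used as imaging information can be sketched as follows, assuming the correct labels are stored as an integer label map the same size as the image, with 0 for the background region B and hypothetical IDs such as 1, 2, 3 for L1 to L3. The function name and the ID assignment are illustrative only.

```python
from collections import Counter
from typing import Dict, List

LabelMap = List[List[int]]


def label_pixel_counts(label_map: LabelMap, background: int = 0) -> Dict[int, int]:
    """Number of pixels carrying each correct-label ID, excluding the background."""
    counts = Counter(v for row in label_map for v in row)
    counts.pop(background, None)  # the background region B is not an object region
    return dict(counts)
```

For instance, `label_pixel_counts([[0, 1, 1], [2, 2, 0], [3, 0, 0]])` gives two pixels for ID 1, two for ID 2, and one for ID 3.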

The imaging information calculation unit 6 may also calculate imaging information based on rectangular regions enclosing objects in the X-ray images of the learning set. For example, when calculating the imaging information for the X-ray image shown in FIG. 14, the unit may set, based on the image data (pixel values) of the X-ray image, a rectangular region R1 enclosing the lung fields, R2 enclosing the heart, R3 enclosing the head of the humerus, and R4 enclosing the ribs, and calculate the area (or pixel count) of each of R1 to R4. These areas are likewise specific to the imaged region and the imaging direction, so from them it can be derived that, for example, "the chest was imaged from the front"; the areas therefore also constitute imaging information. This calculation (areas of the rectangular regions) is performed for each training X-ray image.

From the learning set prepared in S1, the imaging information calculation unit 6 may further calculate non-object region information indicating regions other than the object in the X-ray images. Here, the non-object region information is taken to be information about the area of the X-ray image not irradiated with X-rays, that is, the area outside the irradiation field. At least one of the following three methods can be used to calculate this outside-field information.

(1) As shown in FIG. 12, when a correct label L4 shaped to match the area outside the irradiation field has been created in advance and included in the learning set, the imaging information calculation unit 6 calculates the outside-field information based on L4. For example, the unit treats the area of the X-ray image corresponding to L4 as the outside-field area and calculates (outputs) the number of pixels in that area as the outside-field information.

(2) The imaging information calculation unit 6 calculates the outside-field information based on histogram information of the X-ray images in the learning set. FIG. 15 schematically shows an example histogram of an X-ray image. In general, regions that X-rays penetrate poorly, such as bone, appear white in an X-ray image, and regions that X-rays penetrate easily appear black. The area outside the irradiation field arises from shielding everything other than the target region so that X-rays do not pass through, to reduce radiation exposure during imaging, so it appears whitest in the X-ray image. Accordingly, as shown in FIG. 15, the imaging information calculation unit 6 creates a histogram of frequency against pixel value for the X-ray image and divides the sum of the frequencies of pixel values at or above a threshold Th by the total number of pixels in the image. This gives the fraction of the whole image occupied by the outside-field area, which serves as the outside-field information.

(3) The imaging information calculation unit 6 calculates the outside-field information based on a binarized image obtained by binarizing each pixel value of the X-ray images in the learning set. As noted in (2), the area outside the irradiation field appears whitest in an X-ray image. For example, if pixel values range from 0 (black) to 4095 (white), a threshold of 4000 can be used: pixel values 0 to 4000 become "0" and values 4001 to 4095 become "1". Applying this binarization to the X-ray image of FIG. 11A yields the binarized image shown in FIG. 16, in which the region T1 with value "1" corresponds to the area outside the irradiation field and the region T2 with value "0" corresponds to the irradiation field. By binarizing the pixel values in this way, the imaging information calculation unit 6 can identify the outside-field region T1 from the binarized image and calculate (output) the number of pixels in T1 as the outside-field information.
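The binarization in method (3) can be sketched as follows, using the 12-bit range (0 to 4095) and the threshold of 4000 from the example above. The function names are hypothetical, and the image is assumed to be a list of rows.

```python
from typing import List

Image = List[List[int]]


def binarize(img: Image, threshold: int = 4000) -> Image:
    """Values 0..threshold map to 0; values above threshold (e.g. 4001..4095) map to 1."""
    return [[1 if v > threshold else 0 for v in row] for row in img]


def outside_field_pixels(img: Image, threshold: int = 4000) -> int:
    """Pixel count of region T1, i.e. the pixels set to 1 by binarization."""
    return sum(sum(row) for row in binarize(img, threshold))
```

Note that the example in (3) uses a strict "greater than 4000" test, whereas method (2) counts values at or above Th.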

<S3; Data augmentation parameter determination step>
The data augmentation parameter determination unit 7 determines the data augmentation parameters based on the imaging information acquired in S2, the non-object region information (outside-field information), and preset thresholds, as described below. The X-ray image size is taken here to be 480 × 360 pixels as an example; the thresholds below can be adjusted to suit the image size.

First, for each X-ray image in the learning set, the data augmentation parameter determination unit 7 determines whether "background size (total number of pixels in the unlabeled region) / image size (total number of pixels in the image) ≥ 0.90" holds, or whether "the area outside the irradiation field is 30,000 pixels (third threshold) or more" holds. If either condition holds, the X-ray image can be judged to contain an outside-field area or the like, so that the object occupies only a restricted portion of the whole image.
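This first check can be sketched as a single predicate, assuming an integer label map with 0 for unlabeled (background) pixels and an outside-field pixel count obtained by one of methods (1) to (3). The default thresholds are the 480 × 360 example values above; the function and parameter names are hypothetical.

```python
from typing import List

LabelMap = List[List[int]]


def is_field_narrowed(label_map: LabelMap, outside_field_pixels: int,
                      background: int = 0,
                      bg_ratio_threshold: float = 0.90,
                      third_threshold: int = 30000) -> bool:
    """True if the object occupies only a restricted part of the image."""
    total = sum(len(row) for row in label_map)
    bg = sum(1 for row in label_map for v in row if v == background)
    return bg / total >= bg_ratio_threshold or outside_field_pixels >= third_threshold
```

Either condition alone is sufficient, matching the "or" in the text.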

Next, the unit determines whether "total number of pixels in the region corresponding to the correct label L1 indicating the humerus (see FIG. 11B) ≥ first threshold (for example, 15,000 pixels)" holds for the X-ray image. If it does, the humerus occupies a fairly large proportion of the image, so the image can be judged to be of an adult, that is, an image in which the object is large relative to the whole image. In this case, the data augmentation parameter determination unit 7 judges that enlarging the image further would likely produce an image of an implausible imaging scene (one that does not match other imaging scenes), and randomly sets the scaling factor of the X-ray image, as a data augmentation parameter, between 0.6 and 1.0. In other words, when the total pixel count of the object region in the X-ray image is at or above the first threshold, the unit sets the scaling factor to a value that keeps the image at the same size or shrinks it.

Next, the unit determines whether "total number of pixels in the region corresponding to the correct label L1 indicating the humerus ≤ second threshold (for example, 3,000 pixels)" holds for the X-ray image. If it does, the humerus is quite small, so the image can be judged to be of a child, that is, an image in which the object is small relative to the whole image. In this case, the unit judges that shrinking the image further would likely produce an image of an implausible imaging scene, and randomly sets the scaling factor between 1.0 and 1.4. In other words, when the total pixel count of the object region is at or below the second threshold, which is smaller than the first threshold, the unit sets the scaling factor to a value that keeps the image at the same size or enlarges it.

If neither condition is satisfied, the X-ray image satisfies "second threshold < total number of pixels in the region corresponding to the correct label L1 indicating the humerus < first threshold". In that case, the data augmentation parameter determination unit 7 judges that neither enlarging nor reducing the original X-ray image is likely to produce an image of an impossible imaging scene (the result still matches other imaging scenes). It therefore sets the reduction/enlargement ratio of the X-ray image, as a data augmentation parameter, randomly between 0.8 and 1.2.
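The three-way scale rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The threshold values are the examples quoted in the text, "randomly set" is assumed to mean a uniform draw, and the function names are hypothetical.

```python
import random

# Example thresholds from the text (pixel counts of the humerus label L1).
FIRST_THRESHOLD = 15000   # at or above: object is large in the frame (e.g. adult)
SECOND_THRESHOLD = 3000   # at or below: object is small in the frame (e.g. child)


def scale_range(object_pixels: int) -> tuple[float, float]:
    """Return the (min, max) reduction/enlargement ratio allowed for one image."""
    if object_pixels >= FIRST_THRESHOLD:
        return (0.6, 1.0)   # keep size or shrink only
    if object_pixels <= SECOND_THRESHOLD:
        return (1.0, 1.4)   # keep size or enlarge only
    return (0.8, 1.2)       # moderate change in either direction


def sample_scale(object_pixels: int, rng: random.Random) -> float:
    """Draw a ratio uniformly from the allowed range (uniformity is an assumption)."""
    lo, hi = scale_range(object_pixels)
    return rng.uniform(lo, hi)
```

For example, `sample_scale(16000, random.Random(0))` always returns a value between 0.6 and 1.0.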

Next, the data augmentation parameter determination unit 7 determines the other data augmentation parameters according to the scale setting (the reduction/enlargement ratio):

- When the ratio is set within 0.6 to 1, the unit sets the shift amount of the X-ray image randomly within ±40 pixels in the vertical, horizontal, and diagonal directions, and the rotation angle randomly within ±6°.
- When the ratio is set within 1 to 1.4, it sets the shift amount randomly within ±20 pixels in those directions, and the rotation angle randomly within ±2°.
- When the ratio is set within 0.8 to 1.2, it sets the shift amount randomly within ±30 pixels in those directions, and the rotation angle randomly within ±4°.
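A sketch of how the shift and rotation limits could be keyed to the chosen scale range, using the example values from the paragraph above. The sampling scheme is an assumption: the horizontal and vertical offsets are drawn independently (together they also cover diagonal shifts), and the angle is drawn uniformly. `LIMITS` and `sample_shift_rotation` are hypothetical names.

```python
import random

# (max shift in pixels, max rotation in degrees) for each scale range in the text.
LIMITS = {
    (0.6, 1.0): (40, 6.0),
    (1.0, 1.4): (20, 2.0),
    (0.8, 1.2): (30, 4.0),
}


def sample_shift_rotation(scale_range, rng):
    """Return (dx, dy, angle) drawn within the limits for the given scale range."""
    max_shift, max_angle = LIMITS[scale_range]
    dx = rng.randint(-max_shift, max_shift)   # horizontal shift, inclusive bounds
    dy = rng.randint(-max_shift, max_shift)   # vertical shift; dx and dy together give diagonals
    angle = rng.uniform(-max_angle, max_angle)
    return dx, dy, angle
```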

The shift amount and the other parameters may be set within a fixed range tied to the scale range, as above. Alternatively, they may vary with the scale actually chosen. For example, when the data augmentation parameter determination unit 7 sets the reduction/enlargement ratio to A (with A between 1 and 1.4), it may set the shift amount randomly within ±40×A pixels in the vertical, horizontal, and diagonal directions and the rotation angle randomly within ±6°×A. The shift amount and rotation angle then reflect the scale factor.
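The variable alternative, in which the limits grow with the chosen ratio A, might look like this. The base values of 40 pixels and 6° come from the example above. Everything else is assumed for illustration: the function name, uniform sampling, and fractional (sub-pixel) shifts.

```python
import random

BASE_SHIFT = 40.0  # pixels, example value from the text
BASE_ANGLE = 6.0   # degrees, example value from the text


def sample_scaled_shift_rotation(a, rng):
    """Draw (dx, dy, angle) with limits proportional to the chosen ratio a (the text's A)."""
    max_shift = BASE_SHIFT * a
    max_angle = BASE_ANGLE * a
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    angle = rng.uniform(-max_angle, max_angle)
    return dx, dy, angle
```

With A = 1.25, for instance, shifts stay within ±50 pixels and rotations within ±7.5°.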

In the present embodiment, the data augmentation parameter determination unit 7 checks the out-of-object region (the region outside the irradiation field) of the X-ray image. If that region contains fewer than 30000 pixels (a third threshold), the unit judges that the X-ray image is a frontal chest image. Frontal chest images tend to be collected in large numbers for both adults and children, so there is little need to widen the range of data augmentation. The unit therefore limits the settable range of the data augmentation parameters. It sets the reduction/enlargement ratio, shift amount, and rotation angle either to values at which no augmentation takes place or to values that produce only a small amount of augmentation. For example, it sets the reduction/enlargement ratio randomly within 0.9 to 1.1, the shift amount randomly within ±10 pixels in the vertical, horizontal, and diagonal directions, and the rotation angle randomly within ±1°.
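The chest-frontal restriction can be expressed as an override of the ranges chosen earlier. This sketch assumes the out-of-object (outside irradiation field) pixel count has already been computed, and it uses the example values from the text. The dictionary layout and the function name are hypothetical.

```python
THIRD_THRESHOLD = 30000  # out-of-object pixel count, example value from the text

# Narrow ranges used when the image looks like a frontal chest radiograph.
CHEST_FRONTAL_RANGES = {"scale": (0.9, 1.1), "shift": 10, "angle": 1.0}


def augmentation_ranges(outside_field_pixels, default_ranges):
    """Return the parameter ranges to sample from for one training image.

    default_ranges holds the 'scale', 'shift' and 'angle' limits chosen
    from the object's pixel count; they are replaced by the narrow
    chest-frontal ranges when little of the image lies outside the field.
    """
    if outside_field_pixels < THIRD_THRESHOLD:
        return dict(CHEST_FRONTAL_RANGES)
    return default_ranges
```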

<S4; data augmentation step>
The learning processing unit 8 performs data augmentation based on the data augmentation parameters determined in S3, creating new X-ray images from the X-ray images in the learning set. Types of data augmentation include:

- Horizontal Flip (flipping horizontally)
- Vertical Flip (flipping vertically)
- Crop (cropping randomly from a single image)
- Scale (cropping while varying the scale)
- Rotation (rotating the image)
- Cutout (masking part of the image to improve generalization)
- Shift (changing the image position)

Here, Scale, Rotation, and Shift augmentation is performed using the reduction/enlargement ratio, rotation angle, and shift amount determined in S3. Other types of augmentation may be applied as needed.
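For illustration, Scale, Rotation, and Shift can be combined into one affine warp. Applying the same warp to the X-ray image and to its correct-label map keeps the labels aligned. This NumPy sketch rests on assumptions not stated in the text: it scales and rotates about the image centre, and it uses inverse mapping with nearest-neighbour sampling so that label values stay integers. `affine_augment` is a hypothetical name.

```python
import numpy as np


def affine_augment(image, scale, angle_deg, dx, dy, fill=0):
    """Scale and rotate a 2-D array about its centre, then shift it by (dx, dy).

    Each output pixel is mapped back to its source position (inverse mapping)
    and takes the nearest source pixel; positions falling outside the source
    are set to `fill`.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Undo the shift and move the origin to the image centre.
    u = xs - dx - cx
    v = ys - dy - cy
    # Undo the rotation, then the scaling.
    src_x = (cos_t * u + sin_t * v) / scale + cx
    src_y = (-sin_t * u + cos_t * v) / scale + cy
    src_xi = np.rint(src_x).astype(int)
    src_yi = np.rint(src_y).astype(int)
    valid = (src_xi >= 0) & (src_xi < w) & (src_yi >= 0) & (src_yi < h)
    out = np.full_like(image, fill)
    out[valid] = image[src_yi[valid], src_xi[valid]]
    return out
```

Calling it with the same `scale`, `angle_deg`, `dx`, and `dy` on the image and on its label map yields an aligned training pair. Bilinear interpolation is often preferred for the image itself, with nearest neighbour kept for the labels.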

<S5; machine learning step>
The learning processing unit 8 trains the learning network 5 using the new X-ray images obtained by the data augmentation in S4, together with the correct labels. A standard error backpropagation algorithm can be used as the learning algorithm for the learning network 5. Backpropagation compares two values for each input image (pixel values): the value output from the final layer of the learning network 5 (a likelihood, or score) and the value indicating the correct answer. It uses the steepest descent method to adjust the weights (connection weights) of the nodes (units) of the learning network 5 so as to minimize the squared error between them. The adjustment proceeds layer by layer from the final layer toward the input layer. This machine learning yields a trained learning network 5 (learning model).
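Backpropagation with a squared-error loss and steepest-descent updates, as described above, can be illustrated on a toy two-layer sigmoid network. This is a generic textbook sketch in NumPy, not the segmentation network used in the embodiment. The network size, learning rate, and training data are arbitrary assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_two_layer(x, t, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Fit a tiny sigmoid network by backpropagating squared error; return the loss history."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, t.shape[1]))
    b2 = np.zeros(t.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = sigmoid(x @ w1 + b1)
        y = sigmoid(h @ w2 + b2)
        err = y - t
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass: error signal at the output layer first, then the hidden layer.
        d_out = err * y * (1.0 - y) / len(x)
        d_hid = (d_out @ w2.T) * h * (1.0 - h)
        # Steepest-descent updates of weights and biases.
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return losses
```

Training on the four input patterns of a logical OR, for example, drives the recorded loss down from its initial value.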

When a weighted learning network such as SegNet is used as the learning network 5, the weight (contribution rate) of each class (region) may be changed for each input image (each new, augmented X-ray image). Consider an X-ray image that contains a humerus region and a background region, where the humerus region is much smaller than the background region. Training is then dominated by the background region, and the trained network may recognize the humerus region less accurately. Setting the weight of each class per input image during training limits this loss of recognition accuracy for each region after training.

Incidentally, the SegNet paper (A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, https://arxiv.org/pdf/1511.00561.pdf) discloses calculating the weight of each class (class balancing) from all images used in training. The present embodiment differs from that method: it calculates the weight of each class for each input image rather than over all images.
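The text does not specify the weighting formula. One plausible reading, sketched below as an assumption, applies SegNet's median frequency balancing to a single label map instead of the whole training set. Under that scheme, a class's weight is the median class frequency divided by that class's frequency. Giving classes absent from the image a weight of 0 is also an assumption, and `per_image_class_weights` is a hypothetical name.

```python
import numpy as np


def per_image_class_weights(label, num_classes):
    """Median-frequency class weights computed from one integer label map."""
    counts = np.bincount(label.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    present = freq > 0
    median = np.median(freq[present])
    weights = np.zeros(num_classes)
    # Rare classes get large weights, common classes small ones.
    weights[present] = median / freq[present]
    return weights
```

These weights would multiply the per-pixel loss of each class. Every class present in the image then carries the same total weight, so a small humerus region is not outweighed by a large background region.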

(Operation of the X-ray image object recognition system (during inference))
Once the learning network 5 has been trained as described above, it can be used to recognize (infer, predict) an object contained in an input image. FIG. 17 is a flowchart showing the processing flow during object recognition in the X-ray image object recognition system 1.

An X-ray image of an object to be recognized is input to the input layer of the learning network 5 (S11; X-ray image input step). The learning network 5 then recognizes the X-rayed object from the input X-ray image (S12; inference step) and outputs the recognition result (S13; output step).

In the present embodiment, the data augmentation parameters were determined from the imaging information, and the learning network 5 was trained on data augmented with those parameters. With this approach, the IoU (Intersection over Union) was 70% or more. IoU is an index of the overlap between the correct region and the predicted region; the larger the value, the closer the prediction is to the correct answer and the higher the discrimination performance. In contrast, when the data augmentation parameters were not determined from the imaging information and the learning network was trained with randomly augmented data, the IoU was 65%. The method of the present embodiment can therefore be said to improve the recognition accuracy of objects contained in X-ray images.
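IoU as defined above is the area of the intersection of the predicted and correct regions divided by the area of their union. A minimal NumPy version for binary masks follows. Treating two empty masks as a perfect match (1.0) is a convention chosen here, not something the text specifies.

```python
import numpy as np


def iou(pred, truth):
    """Intersection over Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match
    return float(np.logical_and(pred, truth).sum() / union)
```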

In S11, the data augmentation parameter information obtained during training may be used to resize the input X-ray image according to the data augmentation parameters. This brings the input X-ray image closer to the sizes produced during data augmentation, allowing object recognition to be performed more accurately.

(Effects)
As described above, in the X-ray image object recognition system 1 of the present embodiment, the data augmentation parameter determination unit 7 determines the data augmentation parameters based on the imaging information calculated by the imaging information calculation unit 6 (S2, S3). The parameters are determined with the imaging conditions of the object's X-ray imaging taken into account. Data augmentation by the learning processing unit 8 based on those parameters (S4) therefore avoids unintended augmentation. An example of unintended augmentation is generating, from a training X-ray image such as a frontal chest X-ray taken upright, an artificial image that cannot occur in ordinary radiography, such as an upside-down frontal chest X-ray. The learning processing unit 8 then trains the learning network 5 with the correct labels and the new X-ray images obtained by this appropriate augmentation (S5). As a result, at inference (object recognition) time, the learning network 5 can properly predict the X-rayed object even from an X-ray image outside the training data. It can therefore recognize (infer) the object accurately (S11 to S13).

In this embodiment, the data augmentation parameters are determined from the imaging information of the object, data augmentation is performed with those parameters, and the learning network is trained on the augmented images. As a result, a single learning network 5 can handle X-ray images captured under various imaging conditions at inference time. In other words, the same learning network 5 can recognize the object regardless of the imaging conditions under which the input X-ray image was captured. An alternative design would prepare a separate learning network for each imaging condition, and the radiographer would then have to enter the imaging conditions to select the matching network. The present embodiment eliminates that input. This also makes object recognition more efficient when it is used as preprocessing for improving the image quality of the output X-ray image, so the preprocessing can be performed quickly.

The imaging information calculation unit 6 calculates the imaging information from the X-ray images in the learning set and their correct labels, whose shapes correspond to the object regions in those X-ray images (S2). This configuration yields the effects of the present embodiment described above.

In particular, the imaging information is the number of pixels in the object region corresponding to the correct label in the X-ray image. This pixel count reflects the imaged body part and imaging direction. It can therefore be used effectively as imaging information for deriving the imaging conditions of the X-ray imaging.

Alternatively, the imaging information calculation unit 6 may calculate the imaging information from a rectangular region surrounding the object in the X-ray images in the learning set (S2). This configuration also yields the effects of the present embodiment described above.

In this case, the imaging information is the area of the rectangular region. This area reflects the imaged body part and imaging direction. It can therefore be used effectively as imaging information for deriving the imaging conditions of the X-ray imaging.
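Both kinds of imaging information can be read off a correct-label map. The sketch below makes two assumptions. First, the label is an integer array in which the object (for example, the humerus label L1) has a known class ID. Second, the rectangle is the axis-aligned bounding box of that class. Both function names are hypothetical.

```python
import numpy as np


def label_pixel_count(label, class_id):
    """Number of pixels carrying the given class ID."""
    return int(np.count_nonzero(label == class_id))


def bounding_box_area(label, class_id):
    """Area of the axis-aligned rectangle enclosing all pixels of the class (0 if absent)."""
    ys, xs = np.nonzero(label == class_id)
    if ys.size == 0:
        return 0
    return int((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1))
```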

The imaging information calculation unit 6 may also calculate, from the learning set, out-of-object region information indicating the region other than the object in each X-ray image (S2). The data augmentation parameter determination unit 7 can then take this information into account as well. This lets it determine more accurately the data augmentation parameters that avoid unintended augmentation.

Here, the out-of-object region information may be information about the region outside the X-ray irradiation field. The imaging information calculation unit 6 may calculate this information (for example, a pixel count) from a correct label in the learning set whose shape corresponds to the region outside the irradiation field (for example, correct label L4 in FIG. 12). When the training data includes such a correct label, the information about the region outside the irradiation field can be obtained reliably from it.

The imaging information calculation unit 6 may also calculate the information about the region outside the irradiation field from histogram information of the X-ray images in the learning set (see FIG. 15). The region outside the irradiation field appears white in an X-ray image. The histogram therefore reliably yields this information, for example, the ratio of the area outside the irradiation field to the whole image area.

The imaging information calculation unit 6 may also calculate the information about the region outside the irradiation field from a binarized image, obtained by binarizing each pixel value of the X-ray images in the learning set (see FIG. 16). The region outside the irradiation field appears white in an X-ray image. The binarized image therefore reliably yields this information, for example, a pixel count.
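The two label-free ways of estimating the region outside the irradiation field could be sketched as below. The text does not state the following assumptions. Pixel values are non-negative, and larger values appear whiter. The full-scale value `max_value` is known (for example, 4095 for 12-bit data). "White" means the top 10% of the intensity range in the histogram variant, and values at or above a fixed cut-off in the binarized variant. Both function names are hypothetical.

```python
import numpy as np


def outside_field_ratio(image, max_value, white_level=0.9, bins=64):
    """Histogram variant: fraction of pixels whose bin lies in the top of the range."""
    hist, edges = np.histogram(image, bins=bins, range=(0, max_value))
    white_bins = edges[:-1] >= white_level * max_value  # bins starting in the "white" band
    return float(hist[white_bins].sum() / image.size)


def outside_field_pixels(image, threshold):
    """Binarized variant: count pixels at or above the threshold."""
    binary = image >= threshold
    return int(binary.sum())
```

Either count could then be compared with the third threshold (30000 pixels in the example) to detect frontal chest images.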

The data augmentation parameter determination unit 7 determines the data augmentation parameters from three kinds of information: the imaging information, the out-of-object region information, and preset thresholds (S3). These allow the data augmentation parameters to be determined appropriately.

Here, the data augmentation parameters may include at least one of the reduction/enlargement ratio, shift amount, and rotation angle of the X-ray image. If these parameters are not set appropriately, unintended augmentation is likely to produce unintended images. In the present embodiment, the data augmentation parameter determination unit 7 sets the data augmentation parameters appropriately based on the imaging information. Including at least one of the reduction/enlargement ratio, shift amount, and rotation angle among these parameters therefore ensures that augmentation is appropriate. That is, unintended augmentation that creates unintended images can be reliably avoided.

When the total number of pixels in the object region of an X-ray image in the learning set is equal to or greater than the first threshold, the data augmentation parameter determination unit 7 sets the reduction/enlargement ratio, as a data augmentation parameter, to a value that keeps the X-ray image at the same size or reduces it (S3). Enlarging such an image further is likely to produce an image of an impossible imaging scene. Restricting the ratio to same-size or reducing values therefore reliably prevents data augmentation from creating such images.

When the total number of pixels in the object region of an X-ray image in the learning set is equal to or less than the second threshold, which is smaller than the first threshold, the data augmentation parameter determination unit 7 sets the reduction/enlargement ratio to a value that keeps the X-ray image at the same size or enlarges it (S3). Reducing such an image further is likely to produce an image of an impossible imaging scene. Restricting the ratio to same-size or enlarging values therefore reliably prevents data augmentation from creating such images.

The data augmentation parameter determination unit 7 determines at least one of the shift amount and the rotation angle together with the reduction/enlargement ratio (S3). This allows a wider variety of augmentation to be applied to the training X-ray images, increasing the number of images.

When the total number of pixels in the out-of-object region of an X-ray image in the learning set is less than the third threshold, the data augmentation parameter determination unit 7 limits the settable range of the data augmentation parameters (S3). Such an X-ray image is considered to be a frontal chest image, a type that tends to be collected in large numbers. There is therefore little need to create new images from it by augmentation. If augmentation is performed at all, it should be kept small, to avoid producing imaging scenes that cannot occur in practice. Limiting the settable range in this way allows the data augmentation parameters to be set to values at which no augmentation takes place or only a small amount does. As a result, the system avoids both unnecessary augmentation and augmentation that would produce impossible imaging scenes.

The object described above includes at least one of two kinds of region in a person: a low X-ray transmission region, through which relatively little X-ray radiation passes, and a high X-ray transmission region, through which relatively much passes. The data augmentation of the present embodiment is applied to X-ray images of such objects, and the learning network 5 is trained on the results. This allows the objects to be recognized accurately at inference time.

The low X-ray transmission region includes a bone region of the person, and the high X-ray transmission region includes a lung field region of the person. Such objects can also be recognized accurately at inference time. The data augmentation of the present embodiment is applied to X-ray images of them, and the learning network 5 is trained on the results.

The learning network 5 is a neural network, so it can be trained by machine learning to improve object recognition accuracy.

Although embodiments of the present invention have been described above, the scope of the present invention is not limited to them. The invention can be extended or modified without departing from its gist.

INDUSTRIAL APPLICABILITY
The present invention can be used in systems that recognize an imaged object from an X-ray image.

REFERENCE SIGNS LIST
1 X-ray image object recognition system
5 Learning network
6 Imaging information calculation unit
7 Data augmentation parameter determination unit
8 Learning processing unit

Claims

1. An X-ray image object recognition system comprising:
a learning network that performs machine learning using a learning set that includes an X-ray image of an object and a correct label corresponding to the object;
an imaging information calculation unit that calculates, from the learning set, imaging information for deriving the imaging conditions under which the object was X-rayed;
a data augmentation parameter determination unit that determines, based on the imaging information, a data augmentation parameter to be used when performing data augmentation that creates a new X-ray image from the X-ray image; and
a learning processing unit that performs the data augmentation based on the data augmentation parameter and trains the learning network using the resulting new X-ray image and the correct label,
wherein the learning network, after being trained using the new X-ray image, recognizes an X-rayed object from an input X-ray image and outputs the recognition result,
the imaging information calculation unit calculates the imaging information based on the X-ray image included in the learning set and the correct label, which has a shape corresponding to the region of the object included in the X-ray image, and
the imaging information is the number of pixels in the region of the object corresponding to the correct label in the X-ray image.

2. An X-ray image object recognition system comprising:
a learning network that performs machine learning using a learning set that includes an X-ray image of an object and a correct label corresponding to the object;
an imaging information calculation unit that calculates, from the learning set, imaging information for deriving the imaging conditions under which the object was X-rayed;
a data augmentation parameter determination unit that determines, based on the imaging information, a data augmentation parameter to be used when performing data augmentation that creates a new X-ray image from the X-ray image; and
a learning processing unit that performs the data augmentation based on the data augmentation parameter and trains the learning network using the resulting new X-ray image and the correct label,
wherein the learning network, after being trained using the new X-ray image, recognizes an X-rayed object from an input X-ray image and outputs the recognition result,
the imaging information calculation unit calculates the imaging information based on a rectangular region surrounding the object in the X-ray image included in the learning set, and
the imaging information is the area of the rectangular region.

3. The X-ray image object recognition system as described, wherein the imaging information calculation unit further calculates, from the learning set, out-of-object region information indicating a region other than the object in the X-ray image included in the learning set.

4. The X-ray image object recognition system according to claim 3, wherein the out-of-object region information is information about the region outside the X-ray irradiation field, and the imaging information calculation unit calculates that information based on a correct label, included in the learning set, having a shape corresponding to the region outside the X-ray irradiation field.

5. The X-ray image object recognition system according to claim 3, wherein the out-of-object region information is information about the region outside the X-ray irradiation field, and the imaging information calculation unit calculates that information based on histogram information of the X-ray image included in the learning set.

6. The X-ray image object recognition system according to claim 3, wherein the out-of-object region information is information about the region outside the X-ray irradiation field, and the imaging information calculation unit calculates that information based on a binarized image obtained by binarizing each pixel value of the X-ray image included in the learning set.

7. The X-ray image object recognition system according to any one of claims 3 to 6, wherein the data augmentation parameter determination unit determines the data augmentation parameter based on the imaging information, the out-of-object region information, and a preset threshold.

8. The X-ray image object recognition system according to claim 7, wherein the data augmentation parameter is at least one of the reduction/enlargement ratio, shift amount, and rotation angle of the X-ray image.

9. The data augmentation parameter determination unit sets the reduction/enlargement ratio, as the data augmentation parameter, to a value that keeps the X-ray image at the same size or reduces it when the total number of pixels in the region of the object in the X-ray image included in the learning set is equal to or greater than a first threshold.

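10. The X-ray image object recognition system according to claim 9, wherein the data augmentation parameter determination unit sets the reduction/enlargement ratio to a value that keeps the X-ray image at the same size or enlarges it when the total number of pixels in the region of the object in the X-ray image included in the learning set is equal to or less than a second threshold smaller than the first threshold.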

11. The X-ray image object recognition system according to claim 9, wherein the data augmentation parameter determination unit determines at least one of the shift amount and the rotation angle together with the reduction/enlargement ratio.

12. The X-ray image object recognition system according to any one of claims 8 to 11, wherein the data augmentation parameter determination unit limits the settable range of the data augmentation parameter when the total number of pixels in the out-of-object region of the X-ray image included in the learning set is less than a third threshold.

13. The X-ray image object recognition system according to any one of claims 1 to 12, wherein the object includes at least one of a low X-ray transmission region of a person, through which a relatively small amount of X-rays passes, and a high X-ray transmission region, through which a relatively large amount of X-rays passes.

14. The X-ray image object recognition system according to claim 13, wherein the low X-ray transmission region includes a bone region of the person, and the high X-ray transmission region includes a lung field region of the person.

15. The X-ray image object recognition system according to any one of claims 1 to 14, wherein the learning network comprises a neural network.