JP7465469B2

JP7465469B2 - Learning device, estimation device, learning program, and estimation program

Info

Publication number: JP7465469B2
Application number: JP2020086303A
Authority: JP
Inventors: 昌司小橋; 淳一平本; 健人盛田
Original assignee: University of Hyogo
Current assignee: University of Hyogo
Priority date: 2020-05-15
Filing date: 2020-05-15
Publication date: 2024-04-11
Anticipated expiration: 2040-05-15
Also published as: JP2021178151A

Description

本発明は、生体内の骨組織が撮影された撮影画像における関節の位置を推定するための学習装置、推定装置、学習プログラム、及び推定プログラムに関する。 The present invention relates to a learning device, an estimation device, a learning program, and an estimation program for estimating the position of a joint in an image of bone tissue inside a living body.

近年、関節リウマチ患者の数は増加している。関節リウマチは、自己の免疫が主に手足の関節を侵すことで、関節痛、関節の変形が生じる炎症性自己免疫疾患である。 In recent years, the number of rheumatoid arthritis patients has been increasing. Rheumatoid arthritis is an inflammatory autoimmune disease in which the body's own immune system attacks the joints, mainly of the hands and feet, causing joint pain and deformation.

関節リウマチは、早期からの治療により予後の改善が可能な疾患である。従って、関節リウマチ患者に適切な治療を施すために疾患の進行度合いを正確に評価する必要がある。 Rheumatoid arthritis is a disease in which early treatment can improve the prognosis. Therefore, in order to provide appropriate treatment to rheumatoid arthritis patients, it is necessary to accurately evaluate the degree of disease progression.

関節リウマチの進行度合いは、例えばｍＴＳ（modified Total Sharp）スコアによって評価される。 The progression of rheumatoid arthritis is evaluated, for example, by the mTS (modified Total Sharp) score.

特表２０１７－５３４３９７号公報（段落０００９、００１０） Patent Document 1: JP2017-534397A (paragraphs 0009 and 0010)

従来、ｍＴＳスコアを算出するために、関節リウマチ患者の手指又は足指が撮影されたＸ線画像における関節位置を医師が手動で特定する必要があった。この関節位置を手動で特定する作業は医師にとって手間であった。また、関節位置の特定精度が医師の技量に依存するという問題もあった。 Conventionally, to calculate the mTS score, doctors had to manually identify the joint positions in X-ray images of the fingers or toes of a rheumatoid arthritis patient. This task of manually identifying joint positions was time-consuming for doctors. There was also the problem that the accuracy of identifying joint positions depended on the doctor's skill.

このため、関節位置を自動で特定できる画像処理の開発が望まれている。ここで、関節位置を自動で特定できる画像処理において演算負荷が大きければ、処理時間の増大などの新たな問題が生じる。 Therefore, there is a demand for the development of image processing that can automatically identify joint positions. However, if the computational load is large in image processing that can automatically identify joint positions, new problems such as increased processing time will arise.

なお、特許文献１には指関節の位置を自動的に識別する技術が開示されているが、手固定具、センサ等を必要とする技術であり、使い勝手の悪いものであった。 Patent Document 1 discloses a technology for automatically identifying the positions of finger joints, but this technology requires a hand fixation device, sensors, etc., and is therefore difficult to use.

本発明は、上記の状況に鑑み、画像処理における演算負荷を抑制しつつ、関節の位置を推定するために必要な学習結果を得ることができる学習装置及び学習プログラムを提供することを目的とする。 In view of the above, the present invention aims to provide a learning device and a learning program that can obtain the learning results necessary to estimate the position of joints while reducing the computational load in image processing.

また本発明は、上記の状況に鑑み、画像処理における演算負荷を抑制しつつ、関節の位置を推定することができる推定装置及び推定プログラムを提供することを目的とする。 In view of the above, the present invention also aims to provide an estimation device and an estimation program that can estimate the position of a joint while reducing the computational load in image processing.

上記目的を達成するために本発明の第１局面に係る学習装置は、複数の関節それぞれの領域を含む画像における前記複数の関節それぞれの位置を特定する第１特定部と、前記複数の関節間の相対位置をモデル位置情報として特定する第２特定部と、を備え、前記モデル位置情報は所定数のパラメータを有し、前記所定数は、前記複数の関節の個数と前記複数の関節それぞれの位置を示す各座標の次元数との乗算値よりも小さい構成（第１の構成）とする。 To achieve the above object, a learning device according to a first aspect of the present invention includes a first identification unit that identifies the position of each of the multiple joints in an image including the area of each of the multiple joints, and a second identification unit that identifies the relative positions between the multiple joints as model position information, the model position information having a predetermined number of parameters, the predetermined number being smaller than the product of the number of the multiple joints and the number of dimensions of each coordinate indicating the position of each of the multiple joints (first configuration).
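As a worked example of the parameter-count condition, take the 14 finger joints per hand used in the embodiment and assume 2-D image coordinates (the dimensionality is our assumption for planar X-ray images). The linear model form below is one illustrative way to satisfy the condition, for instance the principal component model of the third configuration; it is not stated in the first configuration itself.

```latex
J = 14,\quad D = 2 \;\Rightarrow\; J \cdot D = 28

\mathbf{x} = (x_1, y_1, \ldots, x_{14}, y_{14})^{\top}
  \approx \bar{\mathbf{x}} + \boldsymbol{\Phi}\,\mathbf{b},
\qquad \mathbf{b} \in \mathbb{R}^{P},\quad P < J \cdot D = 28
```

Searching over the P entries of b rather than all 28 coordinates is what keeps the computational load down.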

上記目的を達成するために本発明の第２局面に係る学習装置は、生体内の骨組織が撮影された第１及び第２撮影画像を取得する取得部と、前記第１撮影画像を複数の領域に分割し、分割した領域の画像特徴量を求める演算部と、前記画像特徴量を機械学習することにより、任意の領域が関節領域である第１クラスを含む少なくとも一つのクラスのうちのどのクラスに属するかを識別するための識別器を生成する生成部と、前記第２撮影画像から複数の関節間の相対位置をモデル位置情報として抽出する抽出部と、を備え、前記モデル位置情報は所定数のパラメータを有し、前記所定数は、前記複数の関節の個数と前記複数の関節それぞれの位置を示す各座標の次元数との乗算値よりも小さい構成（第２の構成）とする。 To achieve the above object, a learning device according to a second aspect of the present invention includes: an acquisition unit that acquires first and second captured images of bone tissue in a living body; a calculation unit that divides the first captured image into a plurality of regions and determines image features of the divided regions; a generation unit that, by machine learning on the image features, generates a classifier for identifying to which of at least one class, including a first class corresponding to a joint region, an arbitrary region belongs; and an extraction unit that extracts the relative positions between a plurality of joints from the second captured image as model position information. The model position information has a predetermined number of parameters, and the predetermined number is smaller than the product of the number of the plurality of joints and the number of dimensions of each coordinate indicating the position of each of the plurality of joints (second configuration).

上記第２の構成の学習装置において、前記第２撮影画像は複数枚あり、複数枚の第２撮影画像それぞれは異なる生体を被写体として撮影された画像であり、前記抽出部は、複数枚の第２撮影画像の関節位置を主成分分析することによって前記モデル位置情報を抽出する構成（第３の構成）であってもよい。 In the learning device of the second configuration, there may be a plurality of second captured images, each capturing a different living body as its subject, and the extraction unit may extract the model position information by performing principal component analysis on the joint positions in the plurality of second captured images (third configuration).

上記第２又は第３の構成の学習装置において、前記少なくとも一つのクラスは、指先領域である第２クラスを含む構成（第４の構成）であってもよい。 In the learning device of the second or third configuration, the at least one class may be configured to include a second class that is a fingertip region (fourth configuration).

上記第２～第４いずれかの構成の学習装置において、前記取得部と前記演算部及び前記抽出部との間に設けられる前処理部を備え、前記前処理部は、前記第１撮影画像及び前記第２撮影画像を水平線によって複数のブロックに分割し、前記複数のブロックそれぞれにおいて前記生体が写っている被写体領域と前記生体が写っていない背景領域との境界となる画素値を画素値ヒストグラムに基づき求め、前記複数のブロック間の前記境界となる画素値のシフトが小さくなるように前記ブロック単位で画素値を補正する構成（第５の構成）であってもよい。 The learning device of any one of the second to fourth configurations may further include a preprocessing unit provided between the acquisition unit on one side and the calculation unit and the extraction unit on the other. The preprocessing unit divides the first captured image and the second captured image into a plurality of blocks by horizontal lines, obtains for each block, based on a pixel value histogram, the pixel value that forms the boundary between the subject area in which the living body appears and the background area in which it does not, and corrects the pixel values block by block so that the shift in this boundary pixel value from block to block becomes small (fifth configuration).

上記目的を達成するために本発明の第３局面に係る学習プログラムは、コンピュータを、複数の関節それぞれの領域を含む画像における前記複数の関節それぞれの位置を特定する第１特定部、及び前記複数の関節間の相対位置をモデル位置情報として特定する第２特定部、として機能させる学習プログラムであって、前記モデル位置情報は所定数のパラメータを有し、前記所定数は、前記複数の関節の個数と前記複数の関節それぞれの位置を示す各座標の次元数との乗算値よりも小さい構成（第６の構成）とする。 To achieve the above object, the learning program according to the third aspect of the present invention is a learning program that causes a computer to function as a first identification unit that identifies the position of each of the multiple joints in an image including the area of each of the multiple joints, and a second identification unit that identifies the relative positions between the multiple joints as model position information, the model position information having a predetermined number of parameters, the predetermined number being smaller than the product of the number of the multiple joints and the number of dimensions of each coordinate indicating the position of each of the multiple joints (sixth configuration).

上記目的を達成するために本発明の第４局面に係る学習プログラムは、コンピュータを、生体内の骨組織が撮影された第１及び第２撮影画像を取得する取得部、前記第１撮影画像を複数の領域に分割し、分割した領域の画像特徴量を求める演算部、前記画像特徴量を機械学習することにより、任意の領域が関節領域である第１クラスを含む少なくとも一つのクラスのうちのどのクラスに属するかを識別するための識別器を生成する生成部、及び前記第２撮影画像から複数の関節間の相対位置をモデル位置情報として抽出する抽出部、として機能させる学習プログラムであって、前記モデル位置情報は所定数のパラメータを有し、前記所定数は、前記複数の関節の個数と前記複数の関節それぞれの位置を示す各座標の次元数との乗算値よりも小さい構成（第７の構成）とする。 To achieve the above object, a learning program according to a fourth aspect of the present invention causes a computer to function as: an acquisition unit that acquires first and second captured images of bone tissue in a living body; a calculation unit that divides the first captured image into a plurality of regions and determines image features of the divided regions; a generation unit that, by machine learning on the image features, generates a classifier for identifying to which of at least one class, including a first class corresponding to a joint region, an arbitrary region belongs; and an extraction unit that extracts the relative positions between a plurality of joints from the second captured image as model position information. The model position information has a predetermined number of parameters, and the predetermined number is smaller than the product of the number of the plurality of joints and the number of dimensions of each coordinate indicating the position of each of the plurality of joints (seventh configuration).

上記目的を達成するために本発明の第５局面に係る推定装置は、学習装置によって生成された識別器及び前記学習装置によって抽出されたモデル位置情報を有する推定装置であって、前記学習装置は、上記第２～第５いずれかの構成の学習装置であり、前記推定装置は、生体内の骨組織が撮影された推定対象画像を複数の領域に分割し、前記複数の領域それぞれに対して画像特徴量を求める第１処理部と、前記第１処理部によって求められた前記画像特徴量から、前記識別器を用いて、前記複数の領域それぞれに対して関節らしさを示す度合を求める第２処理部と、前記関節らしさを示す度合に基づき前記推定対象画像における関節の候補を求める第３処理部と、前記関節の候補に基づき前記モデル位置情報の前記パラメータを変更し、前記モデル位置情報における各関節に対応する前記識別器で求められた前記関節らしさを示す度合の合計を最大化するように基づき前記パラメータの更新を収束させる第４処理部と、を備える構成（第８の構成）とする。 To achieve the above object, an estimation device according to a fifth aspect of the present invention has a classifier generated by a learning device and model position information extracted by the learning device, the learning device being the learning device of any one of the second to fifth configurations. The estimation device includes: a first processing unit that divides an estimation target image, in which bone tissue in a living body is captured, into a plurality of regions and obtains image features for each region; a second processing unit that uses the classifier to obtain, from the image features obtained by the first processing unit, a degree of joint-likeness for each region; a third processing unit that obtains joint candidates in the estimation target image based on the degree of joint-likeness; and a fourth processing unit that changes the parameters of the model position information based on the joint candidates and converges the parameter updates so as to maximize the sum of the degrees of joint-likeness obtained by the classifier for the joints in the model position information (eighth configuration).

上記目的を達成するために本発明の第６局面に係る推定プログラムは、コンピュータを、学習装置によって生成された識別器及び前記学習装置によって抽出されたモデル位置情報を有する推定装置として機能させる推定プログラムであって、前記学習装置は、上記第１～第４いずれかの構成の学習装置であり、前記推定装置は、生体内の骨組織が撮影された推定対象画像を複数の領域に分割し、前記複数の領域それぞれに対して画像特徴量を求める第１処理部と、前記第１処理部によって求められた前記画像特徴量から、前記識別器を用いて、前記複数の領域それぞれに対して関節らしさを示す度合を求める第２処理部と、前記関節らしさを示す度合に基づき前記推定対象画像における関節の候補を求める第３処理部と、前記関節の候補に基づき前記モデル位置情報の前記パラメータを変更し、前記モデル位置情報における各関節に対応する前記識別器で求められた前記関節らしさを示す度合の合計を最大化するように前記パラメータの更新を収束させる第４処理部と、を備える構成（第９の構成）とする。 In order to achieve the above object, the estimation program according to the sixth aspect of the present invention is an estimation program that causes a computer to function as an estimation device having a classifier generated by a learning device and model position information extracted by the learning device, the learning device being a learning device having any of the first to fourth configurations described above, and the estimation device is configured to include a first processing unit that divides an estimation target image in which bone tissue in a living body is photographed into a plurality of regions and obtains image features for each of the plurality of regions, a second processing unit that uses the classifier to obtain a degree of joint-likeness for each of the plurality of regions from the image features obtained by the first processing unit, a third processing unit that obtains joint candidates in the estimation target image based on the degree of joint-likeness, and a fourth processing unit that changes the parameters of the model position information based on the joint candidates and converges the update of the parameters so as to maximize the sum of the degrees of joint-likeness obtained by the classifier corresponding to each joint in the model position information (ninth configuration).
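The text does not specify how the fourth processing unit searches the parameter space. The following is a minimal sketch under stated assumptions: a linear (PCA-style) shape model given as a mean vector plus modes, a per-pixel joint-likeness map already produced by the classifier, translation-only pose (rotation omitted), and greedy coordinate ascent as the optimizer. The starting translation `t0` stands in for the joint candidates of the third processing unit. All function names are hypothetical.

```python
import numpy as np

def sample_bilinear(score_map, x, y):
    """Bilinearly interpolate score_map at float coordinates (x = column, y = row)."""
    h, w = score_map.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    return (score_map[y0, x0] * (1 - fx) * (1 - fy) + score_map[y0, x1] * fx * (1 - fy)
            + score_map[y1, x0] * (1 - fx) * fy + score_map[y1, x1] * fx * fy)

def joint_positions(mean, modes, b, t):
    """(J, 2) joint coordinates from mean (2J,), modes (2J, P), coefficients b and shift t."""
    return (mean + modes @ b).reshape(-1, 2) + t

def fit_model(score_map, mean, modes, t0, t_step=1.0, b_step=0.5, max_iter=200):
    """Greedy coordinate ascent on the summed joint-likeness over [tx, ty, b_1..b_P]."""
    p = np.concatenate([np.asarray(t0, float), np.zeros(modes.shape[1])])
    steps = np.concatenate([[t_step, t_step], np.full(modes.shape[1], b_step)])

    def objective(q):
        pts = joint_positions(mean, modes, q[2:], q[:2])
        return float(sample_bilinear(score_map, pts[:, 0], pts[:, 1]).sum())

    best = objective(p)
    for _ in range(max_iter):
        improved = False
        for i in range(p.size):
            for sign in (1.0, -1.0):
                trial = p.copy()
                trial[i] += sign * steps[i]
                s = objective(trial)
                if s > best:
                    p, best, improved = trial, s, True
        if not improved:  # converged: no single-parameter step increases the sum
            break
    return p[:2], p[2:], best
```

Bilinear sampling avoids the flat plateaus that rounding to whole pixels would create; a practical version would also bound each coefficient (for example by a multiple of the mode's standard deviation) and include rotation.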

本発明に係る学習装置及び学習プログラムによると、画像処理における演算負荷を抑制しつつ、関節の位置を推定するために必要な学習結果を得ることができる。 The learning device and learning program of the present invention can obtain the learning results necessary to estimate the position of joints while reducing the computational load in image processing.

本発明に係る推定装置及び推定プログラムによると、画像処理における演算負荷を抑制しつつ、関節の位置を推定することができる。 The estimation device and estimation program of the present invention can estimate the position of joints while reducing the computational load in image processing.

本発明の一実施形態に係る情報処理装置の構成を示す図 FIG. 1 is a diagram showing the configuration of an information processing device according to an embodiment of the present invention.
図１に示す情報処理装置の機能の一例を示す機能ブロック図 FIG. 2 is a functional block diagram showing an example of the functions of the information processing device shown in FIG. 1.
取得部によって取得されたＸ線撮影画像の一例を示す図 FIG. 3 is a diagram showing an example of an X-ray image acquired by the acquisition unit.
取得部によって取得されたＸ線撮影画像の他の例を示す図 FIG. 4 is a diagram showing another example of an X-ray image acquired by the acquisition unit.
第１撮影画像を水平線で分割する様子を示す模式図 FIG. 5 is a schematic diagram showing how the first captured image is divided by horizontal lines.
ブロックの番号と、境界となる画素値との関係を示す図 FIG. 6 is a diagram showing the relationship between block numbers and boundary pixel values.
学習装置の概略動作例を示すフローチャート FIG. 7 is a flowchart showing an example of the general operation of the learning device.
推定装置の概略動作例を示すフローチャート FIG. 8 is a flowchart showing an example of the general operation of the estimation device.

本発明の実施形態について図面を参照して以下に説明する。 An embodiment of the present invention will be described below with reference to the drawings.

＜１．情報処理装置の構成＞ <1. Configuration of information processing device>
図１は、本発明の一実施形態に係る情報処理装置の構成を示す図である。本発明の一実施形態に係る情報処理装置１（以下、情報処理装置１という）は、制御部２、記憶部３、通信部４、表示部５、及び操作部６を備える。 FIG. 1 is a diagram showing the configuration of an information processing device according to an embodiment of the present invention. The information processing device 1 according to this embodiment (hereinafter referred to as information processing device 1) includes a control unit 2, a storage unit 3, a communication unit 4, a display unit 5, and an operation unit 6.

制御部２は、例えばマイクロコンピュータである。制御部２は、情報処理装置１の全体を統括的に制御する。制御部２は、不図示のＣＰＵ（Central Processing Unit）、ＲＡＭ（Random Access Memory）、及びＲＯＭ（Read Only Memory）を含む。 The control unit 2 is, for example, a microcomputer. The control unit 2 performs overall control of the information processing device 1. The control unit 2 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory), all of which are not shown.

記憶部３は、例えばフラッシュメモリ、ハードディスクドライブ等である。各種のデータ、情報処理装置１によって実行される学習プログラム及び推定プログラム等を記憶する。 The storage unit 3 is, for example, a flash memory, a hard disk drive, etc. It stores various data, learning programs and estimation programs executed by the information processing device 1, etc.

通信部４は、外部装置との通信を行うための通信インターフェースである。通信部４と外部装置との通信方法は、有線通信でもよく、無線通信でもよく、有線と無線とを組み合わせた通信であってもよい。外部装置としては、例えばＸ線撮影画像を撮影するＸ線撮影装置、Ｘ線撮影画像を記憶している記憶装置等を挙げることができる。 The communication unit 4 is a communication interface for communicating with an external device. The communication method between the communication unit 4 and the external device may be wired communication, wireless communication, or a combination of wired and wireless communication. Examples of the external device include an X-ray imaging device that captures X-ray images, a storage device that stores X-ray images, etc.

表示部５は、例えば液晶表示装置、有機ＥＬ（Electro Luminescence）表示装置等である。表示部５は、制御部２の制御に基づいて各種の画像を表示する。 The display unit 5 is, for example, a liquid crystal display device, an organic EL (Electro Luminescence) display device, etc. The display unit 5 displays various images based on the control of the control unit 2.

操作部６は、例えばキーボード、ポインティングデバイス等である。操作部６は、ユーザの操作内容に応じた信号を制御部２に出力する。 The operation unit 6 is, for example, a keyboard, a pointing device, etc. The operation unit 6 outputs a signal according to the user's operation to the control unit 2.

＜２．情報処理装置の機能＞ <2. Functions of the information processing device>
図２は、情報処理装置１の機能の一例を示す機能ブロック図である。情報処理装置１は、学習プログラムを実行することによって学習装置１０として機能する。情報処理装置１は、推定プログラムを実行することによって推定装置２０として機能する。なお、本実施形態では、１台の情報処理装置１が学習装置１０及び推定装置２０を機能部として含んでいるが、本実施形態とは異なり、学習装置１０を機能部として含む情報処理装置と、推定装置２０を機能部として含む情報処理装置とが別々の情報処理装置であってもよい。 FIG. 2 is a functional block diagram showing an example of the functions of the information processing device 1. The information processing device 1 functions as a learning device 10 by executing the learning program and as an estimation device 20 by executing the estimation program. In this embodiment, a single information processing device 1 includes both the learning device 10 and the estimation device 20 as functional units; alternatively, the learning device 10 and the estimation device 20 may be functional units of separate information processing devices.

＜２－１．学習装置＞ <2-1. Learning device>
学習装置１０は、取得部１１、前処理部１２、演算部１３、生成部１４、及び抽出部１５を含む。 The learning device 10 includes an acquisition unit 11, a preprocessing unit 12, a calculation unit 13, a generation unit 14, and an extraction unit 15.

取得部１１は、生体内の骨組織が撮影された第１及び第２撮影画像を取得する。具体的には、取得部１１は、通信部４（図１参照）によって外部装置から第１及び第２撮影画像を取得する。 The acquisition unit 11 acquires the first and second captured images of bone tissue in a living body. Specifically, the acquisition unit 11 acquires the first and second captured images from an external device via the communication unit 4 (see FIG. 1).

本実施形態では、第１撮影画像と第２撮影画像とは同一の画像である。なお、本実施形態とは異なり、第１撮影画像と第２撮影画像とは異なる撮影画像であってもよい。 In this embodiment, the first captured image and the second captured image are the same image. However, unlike this embodiment, the first captured image and the second captured image may be different captured images.

また、本実施形態では、取得部１１は複数の第１撮影画像（＝第２撮影画像）を取得する。より詳細には、取得部１１は、複数の第１撮影画像（＝第２撮影画像）として複数のＸ線撮影画像を取得する。本実施形態では、取得部１１によって取得された複数のＸ線撮影画像は、レントゲン撮影装置によって撮影された異なる複数人の両手の画像である。 In addition, in this embodiment, the acquisition unit 11 acquires a plurality of first captured images (= second captured images). More specifically, the acquisition unit 11 acquires a plurality of X-ray captured images as the plurality of first captured images (= second captured images). In this embodiment, the plurality of X-ray captured images acquired by the acquisition unit 11 are images of both hands of a plurality of different people captured by an X-ray imaging device.

図３は、取得部１１によって取得されたＸ線撮影画像の一例であり、関節リウマチが進行していない人の両手の画像である。図４は、取得部１１によって取得されたＸ線撮影画像の他の例であり、関節リウマチが進行した人の両手の画像である。 Figure 3 is an example of an X-ray image acquired by the acquisition unit 11, which is an image of both hands of a person with non-advanced rheumatoid arthritis. Figure 4 is another example of an X-ray image acquired by the acquisition unit 11, which is an image of both hands of a person with advanced rheumatoid arthritis.

図３及び図４から分かるように、Ｘ線撮影画像の上部は暗くなっており、Ｘ線撮影画像の下部は明るくなっている。このような上下方向における明暗の発生は、レントゲン撮影装置の特性に起因するものである。 As can be seen from FIGS. 3 and 4, the upper part of each X-ray image is dark and the lower part is bright. This vertical variation in brightness is caused by the characteristics of the X-ray imaging device.

そこで、前処理部１２は、図５に示す模式図のように第１撮影画像（＝第２撮影画像）を等間隔に並ぶ水平線Ｈ１～Ｈ９によってブロックＢ１～Ｂ１０に分割する。ここでは、図示を簡単にするために分割数を１０にしたが、実際の分割数は例えば１００～４００程度が好ましい。分割数が少なすぎると、滑らかな補正が行えず、逆に分割数が多すぎると、各ブロックの領域が狭くなり、ノイズ耐性が劣化するからである。本実施形態では、実際の分割数を２００としている。それから、前処理部１２は、ブロックＢ１～Ｂ１０それぞれにおいて手領域（前景）と非手領域（背景）との境界となる画素値（しきい値）を求める。手領域（前景）と非手領域（背景）との境界となる画素値（しきい値）を求める手法は特に限定されないが、本実施形態では大津のしきい値法を用いる。そして、前処理部１２は、ブロックＢ１～Ｂ１０間の境界となる画素値のシフトが小さくなるようにブロックＢ１～Ｂ１０単位で画素値を補正する。例えば、ブロックの番号と、境界となる画素値とが図６に示す関係であって指数関数Ｅ１で近似できる場合、当該指数関数Ｅ１が直線Ｌ１に変換されるように、ブロックＢ１～Ｂ１０毎に異なるパラメータを設定し、当該パラメータを画素値に演算することで画素値を補正する。つまり、画素値を補正した後、ブロックＢ１～Ｂ１０それぞれにおいて手領域（前景）と非手領域（背景）との境界となる画素値（しきい値）を画素値ヒストグラムに基づき大津のしきい値法で求めると、ブロックの番号と、境界となる画素値（しきい値）との関係が直線Ｌ１で近似できる。 The preprocessing unit 12 therefore divides the first captured image (= second captured image) into blocks B1 to B10 by equally spaced horizontal lines H1 to H9, as shown in the schematic diagram of FIG. 5. For simplicity of illustration, the number of divisions is 10 here, but in practice about 100 to 400 divisions is preferable: with too few divisions the correction cannot be smooth, and with too many, each block becomes narrow and robustness to noise deteriorates. In this embodiment, the actual number of divisions is 200. The preprocessing unit 12 then obtains, for each of blocks B1 to B10, the pixel value (threshold) that forms the boundary between the hand region (foreground) and the non-hand region (background). The method for obtaining this threshold is not particularly limited; in this embodiment, Otsu's thresholding method is used. The preprocessing unit 12 then corrects the pixel values block by block so that the shift in the boundary pixel value from one block to the next becomes small.
For example, if the relationship between the block number and the boundary pixel value is as shown in FIG. 6 and can be approximated by an exponential function E1, a different parameter is set for each of blocks B1 to B10 so that the exponential function E1 is converted into a straight line L1, and the pixel values are corrected by applying these parameters to them. In other words, if, after the correction, the boundary pixel value (threshold) between the hand region (foreground) and the non-hand region (background) is obtained for each of blocks B1 to B10 by Otsu's method based on the pixel value histogram, the relationship between the block number and the boundary pixel value (threshold) can be approximated by the straight line L1.
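The block-wise correction can be sketched as follows. The text fixes only the overall idea (per-block Otsu thresholds, an exponential fit E1, per-block parameters that map E1 onto a line L1). The concrete choices here are assumptions: E1 is fitted by least squares on the logarithm of the thresholds, L1 is taken as a horizontal line at the mean of the fitted values (consistent with a single threshold TH1 being read off it), and the per-block parameter is an additive offset.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's method on a 1-D sample: the cut that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)            # weight of the class below each candidate cut
    mu = np.cumsum(p * centers)  # first moment of that class
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return edges[int(np.argmax(sigma_b)) + 1]

def block_rows(n_rows, n_blocks):
    """Row indices of horizontal strips separated by equally spaced lines."""
    return np.array_split(np.arange(n_rows), n_blocks)

def block_thresholds(img, n_blocks):
    """Boundary pixel value (Otsu threshold) of every horizontal block."""
    return np.array([otsu_threshold(img[r]) for r in block_rows(img.shape[0], n_blocks)])

def flatten_vertical_shading(img, n_blocks=200):
    """Shift each block so the fitted exponential E1 becomes a flat line L1."""
    img = img.astype(float)
    t = block_thresholds(img, n_blocks)
    idx = np.arange(n_blocks)
    k, log_a = np.polyfit(idx, np.log(np.maximum(t, 1e-6)), 1)  # t ~ a * exp(k * idx)
    e1 = np.exp(log_a + k * idx)
    th1 = float(e1.mean())                     # assumed horizontal target line L1
    out = img.copy()
    for b, r in enumerate(block_rows(img.shape[0], n_blocks)):
        out[r] += th1 - e1[b]                  # per-block additive offset
    return out, th1
```

A multiplicative gain per block would be an equally plausible reading of "a different parameter for each block"; the additive form is used here only because it shifts each block's Otsu threshold by exactly the chosen offset.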

なお、上述したＸ線撮影画像に対する補正処理は、学習装置１０でしか実行できない処理ではなく、Ｘ線撮影画像に対して画像処理を行う画像処理装置全般に適用することができる。 The correction process for X-ray images described above is not a process that can only be performed by the learning device 10, but can be applied to all image processing devices that perform image processing on X-ray images.

次に、前処理部１２は、第１撮影画像（＝第２撮影画像）から手領域を自動抽出する。具体的には、前処理部１２は、図６に示す直線Ｌ１の縦軸座標値である境界となる画素値（しきい値）ＴＨ１より高い画素値を手領域とする。さらに、前処理部１２は、境界となる画素値（しきい値）ＴＨ１以下の画素値を非手領域とし、手領域及び非手領域それぞれにおいて穴や孤立点の除去、形状の平滑化、微小面積領域の除去を行って、手領域及び非手領域それぞれを確定する。 Next, the preprocessing unit 12 automatically extracts the hand region from the first captured image (= second captured image). Specifically, the preprocessing unit 12 classifies pixels whose values are higher than the boundary pixel value (threshold) TH1, the vertical-axis value of the straight line L1 shown in FIG. 6, as the hand region, and pixels whose values are equal to or lower than TH1 as the non-hand region. It then finalizes both regions by removing holes and isolated points, smoothing their shapes, and removing small regions in each of them.
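The thresholding and cleanup step can be sketched with plain connected-component labelling. Assumptions: 4-connectivity, a single hypothetical `min_area` used for both the hand and non-hand regions, and no shape smoothing (the text mentions smoothing but gives no method). The pure-Python labelling is for illustration only and is slow on full-size radiographs.

```python
from collections import deque

import numpy as np

def label_components(mask):
    """4-connected component labelling by breadth-first search (0 = outside mask)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    n = 0
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or labels[r, c]:
                continue
            n += 1
            labels[r, c] = n
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = n
                        queue.append((ny, nx))
    return labels, n

def drop_small_components(mask, min_area):
    """Keep only connected components with at least min_area pixels."""
    labels, n = label_components(mask)
    areas = np.bincount(labels.ravel(), minlength=n + 1)
    keep = areas >= min_area
    keep[0] = False  # label 0 means "not in mask"
    return keep[labels]

def extract_hand_mask(img, th1, min_area=50):
    """Pixels above TH1 form the hand; small blobs and small holes are removed."""
    hand = img > th1
    hand = drop_small_components(hand, min_area)          # isolated points, specks
    background = drop_small_components(~hand, min_area)   # small holes in the hand
    return ~background
```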

次に、前処理部１２は、左手と右手との分割を行う。右手と左手を分割する手法は特に限定されないが、本実施形態では大津のしきい値法を用いる。具体的には、本実施形態において前処理部１２は、第１撮影画像（＝第２撮影画像）から水平方向に手領域の画素値ヒストグラムを作成し、その手領域の画素値ヒストグラムに基づき大津のしきい値法で第１撮影画像（＝第２撮影画像）における左手と右手との分割ラインを決定する。右手領域は、前処理部１２によって左右反転され、以後は左手として取り扱われる。つまり、右手領域を左右反転してから後は、一つの第１撮影画像（＝第２撮影画像）に二つの左手が存在することになる。これにより、一つの第１撮影画像（＝第２撮影画像）から左手に関するデータが２つ得られる。このようにしてデータ数を２倍に増やすことで、学習装置１０における学習精度及び推定装置２０における推定精度を向上できる。 Next, the preprocessing unit 12 separates the left hand from the right hand. The method for separating the two hands is not particularly limited; in this embodiment, Otsu's thresholding method is used. Specifically, the preprocessing unit 12 creates a histogram of the hand region along the horizontal direction of the first captured image (= second captured image) and, based on that histogram, determines the dividing line between the left hand and the right hand by Otsu's method. The preprocessing unit 12 then mirrors the right-hand region left to right, and it is treated as a left hand from then on. Thus, after the mirroring, one first captured image (= second captured image) contains two left hands, so two sets of left-hand data are obtained from a single image. Doubling the amount of data in this way improves the learning accuracy of the learning device 10 and the estimation accuracy of the estimation device 20.
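One reading of "a horizontal histogram of the hand region" is the number of hand pixels in each image column; applying Otsu's method to that profile yields the dividing column. The sketch below follows that reading. Which image side shows the patient's right hand depends on how the radiograph is displayed, so the `right_hand_on_left` flag is a hypothetical parameter, not something the text specifies.

```python
import numpy as np

def split_hands(img, hand_mask, right_hand_on_left=True):
    """Split at the Otsu cut of the column profile; mirror the right hand into a left hand.

    Returns (left_hand_img, mirrored_right_hand_img, split_column).
    """
    col_hist = hand_mask.sum(axis=0).astype(float)  # hand pixels per column
    p = col_hist / col_hist.sum()
    x = np.arange(p.size)
    w0 = np.cumsum(p)
    mu = np.cumsum(p * x)
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    split = int(np.argmax(sigma_b)) + 1  # first column of the second group
    first, second = img[:, :split], img[:, split:]
    if right_hand_on_left:
        return second, first[:, ::-1], split
    return first, second[:, ::-1], split
```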

本実施形態では、演算部１３は、垂直方向１００画素×水平方向１００画素の領域を設定し、各々の第１撮影画像を複数の領域に分割する。演算部１３によって設定される領域の垂直方向画素数は１００に限定されない。同様に、演算部１３によって設定される領域の水平方向画素数も１００に限定されない。演算部１３によって設定される領域の垂直方向画素数と演算部１３によって設定される領域の水平方向画素数とは同一であってもよく同一でなくてもよい。また、本実施形態では、隣接する領域間で重複部分がない態様で各々の第１撮影画像が複数の領域に分割されるが、隣接する領域間で重複部分がある態様で各々の第１撮影画像が複数の領域に分割されてもよい。 In this embodiment, the calculation unit 13 sets an area of 100 pixels vertically by 100 pixels horizontally, and divides each of the first captured images into a plurality of areas. The number of vertical pixels of the area set by the calculation unit 13 is not limited to 100. Similarly, the number of horizontal pixels of the area set by the calculation unit 13 is not limited to 100. The number of vertical pixels of the area set by the calculation unit 13 and the number of horizontal pixels of the area set by the calculation unit 13 may or may not be the same. Also, in this embodiment, each of the first captured images is divided into a plurality of areas such that there is no overlap between adjacent areas, but each of the first captured images may be divided into a plurality of areas such that there is overlap between adjacent areas.
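The division into 100 x 100 pixel regions, non-overlapping by default with an optional smaller stride for the overlapping variant, can be sketched as below. Dropping incomplete tiles at the right and bottom edges is our assumption; the text does not say how borders are handled.

```python
import numpy as np

def tile_regions(img, size=(100, 100), stride=None):
    """Cut img into size[0] x size[1] patches; stride=None gives non-overlapping tiles.

    Returns (patches of shape (N, h, w), list of (top, left) origins).
    """
    ph, pw = size
    sh, sw = stride if stride is not None else size
    h, w = img.shape[:2]
    patches, origins = [], []
    for top in range(0, h - ph + 1, sh):
        for left in range(0, w - pw + 1, sw):
            patches.append(img[top:top + ph, left:left + pw])
            origins.append((top, left))
    if not patches:
        return np.empty((0, ph, pw)), origins
    return np.stack(patches), origins
```

For a 250 x 320 image this yields 6 non-overlapping tiles, or 20 overlapping tiles with a 50-pixel stride.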

次に、演算部１３は、領域それぞれの画像特徴量を求める。本実施形態では、画像特徴量として、ＨＯＧ（Histograms of Oriented Gradients）特徴量を使用している。ＨＯＧ特徴量は幾何学的変換に強く、第１撮影画像の明るさの変動に頑健であることが、ＨＯＧ特徴量を使用した理由である。なお、本実施形態とは異なり、演算部１３がＨＯＧ特徴量以外の画像特徴量を求めてもよい。 Next, the calculation unit 13 calculates image features for each region. In this embodiment, HOG (Histograms of Oriented Gradients) features are used as the image features. The reason for using HOG features is that they are resistant to geometric transformations and robust to variations in brightness of the first captured image. Note that, unlike this embodiment, the calculation unit 13 may calculate image features other than HOG features.
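To illustrate what a HOG feature measures, here is a simplified HOG-style descriptor: gradient-magnitude-weighted histograms of unsigned orientation per cell, concatenated and L2-normalized. The cell size and bin count are hypothetical, and full HOG additionally normalizes over overlapping blocks of cells and interpolates between bins; libraries such as scikit-image provide complete implementations.

```python
import numpy as np

def hog_like(patch, cell=10, n_bins=9, eps=1e-6):
    """Simplified HOG: per-cell orientation histograms, no block normalization."""
    p = patch.astype(float)
    gy, gx = np.gradient(p)                        # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in [0, 180)
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    h, w = p.shape
    feats = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            b = bins[top:top + cell, left:left + cell].ravel()
            m = mag[top:top + cell, left:left + cell].ravel()
            feats.append(np.bincount(b, weights=m, minlength=n_bins))
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + eps)
```

Because the histograms depend on gradient directions and are normalized, a uniform brightness offset leaves the descriptor unchanged, which is the robustness the text cites.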

生成部１４は、関節領域である第１クラスを含む複数のクラスを設定する。 The generation unit 14 sets multiple classes including a first class which is a joint region.

本実施形態では、生成部１４は、関節領域である第１クラスと、指先領域である第２クラスと、関節領域でもなく指先領域でもない領域である第３クラスと、を設定する。生成部１４が指先領域である第２クラスを設定することにより、学習装置１０が指先領域を誤って関節領域であると学習することを抑制することができる。なお、本実施形態とは異なり、生成部１４が、関節領域である第１クラス及び非関節領域であるクラスの二つのクラスのみを設定してもよい。また、本実施形態とは異なり、生成部１４が関節領域である第１クラスのみを設定してもよい。 In this embodiment, the generation unit 14 sets a first class (joint regions), a second class (fingertip regions), and a third class (regions that are neither joint regions nor fingertip regions). Setting a separate second class for fingertip regions prevents the learning device 10 from erroneously learning fingertip regions as joint regions. Unlike this embodiment, the generation unit 14 may instead set only two classes, the first class (joint regions) and a class of non-joint regions, or only the first class.

生成部１４は、各々の第１撮影画像内の各々の左手領域から、操作部６（図１参照）の出力信号に基づき第１クラスにおいて１４個の領域を選択する。つまり、第１クラスにおける領域選択は手動である。当該１４個の領域は、母指第１関節に対応する領域、母指第２関節に対応する領域、示指第１関節に対応する領域、示指第２関節に対応する領域、示指第３関節に対応する領域、中指第１関節に対応する領域、中指第２関節に対応する領域、中指第３関節に対応する領域、環指第１関節に対応する領域、環指第２関節に対応する領域、環指第３関節に対応する領域、小指第１関節に対応する領域、小指第２関節に対応する領域、及び小指第３関節に対応する領域である。演算部１３は、上述したように各々の第１撮影画像内の各々の左手領域から第１クラスにおいて１４個の領域を選択するので、複数の関節それぞれの領域を含む画像における複数の関節それぞれの位置を特定する第１特定部であるともいえる。 The generating unit 14 selects 14 regions in the first class from each left hand region in each first captured image based on the output signal of the operating unit 6 (see FIG. 1). That is, the region selection in the first class is manual. The 14 regions are a region corresponding to the first joint of the thumb, a region corresponding to the second joint of the thumb, a region corresponding to the first joint of the index finger, a region corresponding to the second joint of the index finger, a region corresponding to the third joint of the index finger, a region corresponding to the first joint of the middle finger, a region corresponding to the second joint of the middle finger, a region corresponding to the third joint of the middle finger, a region corresponding to the first joint of the ring finger, a region corresponding to the second joint of the ring finger, a region corresponding to the third joint of the ring finger, a region corresponding to the first joint of the little finger, a region corresponding to the second joint of the little finger, and a region corresponding to the third joint of the little finger. As described above, the calculating unit 13 selects 14 regions in the first class from each left hand region in each first captured image, so it can also be said to be a first identifying unit that identifies the position of each of the multiple joints in an image including the regions of each of the multiple joints.

生成部１４は、各々の第１撮影画像内の各々の左手領域から、操作部６（図１参照）の出力信号に基づき第２クラスにおいて５個の領域を選択する。つまり、第２クラスにおける領域選択は手動である。当該５個の領域は、母指の指先に対応する領域、示指の指先に対応する領域、中指の指先に対応する領域、環指の指先に対応する領域、及び小指の指先に対応する領域である。 The generation unit 14 selects five regions in the second class from each left hand region in each first captured image based on the output signal of the operation unit 6 (see FIG. 1). In other words, the region selection in the second class is manual. The five regions are a region corresponding to the tip of the thumb, a region corresponding to the tip of the index finger, a region corresponding to the tip of the middle finger, a region corresponding to the tip of the ring finger, and a region corresponding to the tip of the little finger.

生成部１４は、各々の第１撮影画像内の各々の左手領域から、第１クラスの領域及び第２クラスの領域を除いた後、第３クラスにおいて例えば２５個の領域をランダムに選択する。つまり、第３クラスにおける領域選択は自動である。本実施形態では、演算部１３は、第３クラスにおいて複数の領域を所定の条件を満たした上でランダムに選択する。当該所定の条件は、第３クラスにおいて選択される各領域が第３クラスにおいて選択される他の領域と重複せず、且つ、第３クラスにおいて選択される各領域が偏在しないように平均的に配置されるという条件である。 After excluding the first-class and second-class regions from each left-hand region in each first captured image, the generation unit 14 randomly selects, for example, 25 regions for the third class. In other words, region selection for the third class is automatic. In this embodiment, the calculation unit 13 selects the third-class regions at random subject to a predetermined condition: no selected region overlaps another selected region, and the selected regions are spread evenly rather than clustered.
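The selection condition (random, mutually non-overlapping, evenly spread) can be met with stratified sampling over a tile grid. Everything concrete here is an assumption: candidates come from a non-overlapping grid, a tile counts as inside the hand when at least `min_hand_frac` of it is hand pixels, and evenness is obtained by sorting candidates in raster order and drawing one from each of n consecutive strata.

```python
import numpy as np

def sample_background_regions(hand_mask, excluded, n=25, size=100, seed=None,
                              min_hand_frac=0.5):
    """Pick up to n non-overlapping size x size regions for the third class.

    hand_mask: bool image of the left-hand region.
    excluded: (top, left) corners of first/second-class regions of the same size.
    """
    rng = np.random.default_rng(seed)
    h, w = hand_mask.shape
    cands = []
    for top in range(0, h - size + 1, size):        # grid tiles never overlap
        for left in range(0, w - size + 1, size):
            if hand_mask[top:top + size, left:left + size].mean() < min_hand_frac:
                continue                              # mostly outside the hand
            if any(abs(top - t) < size and abs(left - l) < size for t, l in excluded):
                continue                              # overlaps a joint or fingertip region
            cands.append((top, left))
    if len(cands) <= n:
        return cands
    cands.sort()                                      # raster order
    strata = np.array_split(np.arange(len(cands)), n)
    return [cands[int(rng.choice(s))] for s in strata]
```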

Next, the generation unit 14 changes the contrast of each of the first captured images. One or several kinds of contrast change may be applied. Both the original first captured images and the contrast-changed copies are used as learning data.
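One way to realize the contrast change is gamma correction. The sketch below assumes 8-bit grayscale images and uses two illustrative gamma values that do not come from the source; the function name `contrast_variants` is hypothetical.

```python
import numpy as np


def contrast_variants(image, gammas=(0.8, 1.25)):
    """Return the original image plus gamma-adjusted copies (assumed augmentation).

    image: 2-D uint8 array. gamma < 1 brightens mid-tones, gamma > 1 darkens them;
    pure black (0) and pure white (255) are left unchanged.
    """
    norm = image.astype(np.float64) / 255.0
    variants = [image]
    for g in gammas:
        adjusted = np.power(norm, g) * 255.0
        variants.append(np.clip(np.rint(adjusted), 0, 255).astype(np.uint8))
    return variants
```

Each input image then contributes `1 + len(gammas)` training images, which matches the idea of enlarging the learning data without new annotations.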

The generation unit 14 receives the learning data described above and the image features of each region from the calculation unit 13. Using this learning data, the generation unit 14 machine-learns the image features of the first-class, second-class, and third-class regions and thereby generates a classifier that identifies which of the first to third classes an arbitrary region belongs to. The method of generating the classifier is not particularly limited; in this embodiment, a support vector machine is used.
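The source only states that a support vector machine is used. The sketch below substitutes a minimal one-vs-rest linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss, so the three-class setup is visible without a library dependency; a practical system would more likely use an established SVM library, possibly with a kernel. Labels 0, 1, 2 stand for the first (joint), second (fingertip), and third (background) classes. The softmax that maps decision values to a 0-to-1 joint-likeness is an assumed calibration, and all function names are hypothetical.

```python
import numpy as np


def train_linear_svm_ovr(X, y, n_classes=3, lam=1e-3, lr=0.1, epochs=500):
    """One-vs-rest linear SVM via subgradient descent on the regularized hinge loss.

    X: (N, D) feature matrix (e.g. HOG vectors); y: (N,) integer labels 0..n_classes-1.
    Returns weights W (n_classes, D) and biases b (n_classes,).
    """
    n, d = X.shape
    W, b = np.zeros((n_classes, d)), np.zeros(n_classes)
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)          # +1 for class k, -1 for the rest
        for _ in range(epochs):
            viol = t * (X @ W[k] + b[k]) < 1.0   # samples inside or beyond the margin
            tv = t[viol][:, None]
            W[k] -= lr * (lam * W[k] - (tv * X[viol]).sum(axis=0) / n)
            b[k] -= lr * (-t[viol].sum() / n)
    return W, b


def joint_likeness(W, b, X):
    """Softmax over the decision values; column 0 (joint class) as a 0..1 degree."""
    s = X @ W.T + b
    e = np.exp(s - s.max(axis=1, keepdims=True))  # subtract max for numerical stability
    return (e / e.sum(axis=1, keepdims=True))[:, 0]
```

The predicted class of a region is `np.argmax(X @ W.T + b, axis=1)`; `joint_likeness` is one possible way to obtain the 0-to-1 degree used later by the second processing unit.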

The extraction unit 15 receives the center position of each finger joint in the second captured images (= the first captured images) from the pre-processing unit 12. The pre-processing unit 12 selects these center positions based on the output signal of the operation unit 6 (see FIG. 1); that is, they are selected manually. The extraction unit 15 computes the mean coordinates of each joint over the second captured images and rigidly registers each second captured image to these per-joint means. It then recomputes the per-joint means from the registered images and registers each image again, repeating the mean calculation and the rigid registration until the registration converges.

The extraction unit 15 then extracts the relative positions between the joints from the converged second captured images (= the first captured images) as model position information. The model position information has a predetermined number of parameters, and this number is smaller than the product of the number of joints and the number of dimensions of each joint coordinate. Compared with model position information built directly from the coordinates of every joint, this reduces the computational load of image processing while still allowing the joint positions to be estimated. Because the extraction unit 15 extracts the relative positions between the joints from the second captured images as model position information, it can also be regarded as a second identification unit that identifies those relative positions as model position information.
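The repeated mean-then-register loop is essentially a generalized Procrustes alignment restricted to rotation and translation. The sketch below assumes the hand-annotated joint centers are stored as an (M, 14, 2) array and uses the Kabsch (SVD) solution for each rigid fit; the function names, tolerance, and iteration cap are hypothetical.

```python
import numpy as np


def rigid_fit(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # forbid reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s


def iterative_rigid_registration(shapes, tol=1e-8, max_iter=100):
    """Register every joint layout to the per-joint mean, recompute the mean, repeat.

    shapes: (M, J, dim) array of joint coordinates. Returns aligned shapes and mean.
    """
    shapes = np.asarray(shapes, dtype=float).copy()
    mean = shapes.mean(axis=0)
    for _ in range(max_iter):
        for i in range(len(shapes)):
            R, t = rigid_fit(shapes[i], mean)
            shapes[i] = shapes[i] @ R.T + t
        new_mean = shapes.mean(axis=0)
        if np.abs(new_mean - mean).max() < tol:  # converged: mean no longer moves
            break
        mean = new_mean
    return shapes, mean
```

Because only rotation and translation are removed, the inter-joint distances of each hand are preserved; whatever variation remains across hands is what the subsequent principal component analysis models.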

In this embodiment, the extraction unit 15 extracts the relative positions between the finger joints as model position information from X-ray images of both hands of several different people. The model position information has three parameters, it covers 14 finger joints, and each joint coordinate is two-dimensional, so three parameters take the place of 14 × 2 = 28 coordinate values. The number of parameters is not limited to three; as stated above, it only needs to be smaller than the product of the number of joints and the number of coordinate dimensions.

In this embodiment, the extraction unit 15 extracts the model position information by applying principal component analysis to the joint positions in the M second captured images. One hand has 14 joint points, each with a two-dimensional coordinate in the second captured image, so flattening the joint coordinates of one hand (e.g., the right hand) into a one-dimensional vector gives the vector a:

$$\mathbf{a} = [x_{1R1}, y_{1R1}, x_{1R2}, y_{1R2}, \ldots, x_{1R14}, y_{1R14}]$$

The joint positions of the M second captured images therefore form a matrix A with M rows and 28 columns, one row per image:

$$A = \begin{bmatrix} x_{1R1} & y_{1R1} & \cdots & x_{1R14} & y_{1R14} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{MR1} & y_{MR1} & \cdots & x_{MR14} & y_{MR14} \end{bmatrix}$$
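The principal component analysis of A can be sketched with NumPy's SVD, assuming each row of A follows the ordering of the vector a above (x and y of joint 1, then joint 2, and so on). The function names are hypothetical; keeping `n_components=3` mirrors the three parameters of the embodiment.

```python
import numpy as np


def fit_shape_pca(A, n_components=3):
    """PCA of the M x 28 joint matrix A: mean vector and top principal vectors."""
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
    return mean, Vt[:n_components]          # rows are unit-length principal vectors


def shape_from_scores(mean, components, scores):
    """Joint layout = mean vector + sum_k score_k * principal vector_k, as (14, 2)."""
    flat = mean + np.asarray(scores, dtype=float) @ components
    return flat.reshape(-1, 2)              # rows (x_j, y_j) in the order of vector a


def scores_from_shape(mean, components, shape):
    """Project a (14, 2) joint layout onto the principal vectors (its scores)."""
    return components @ (np.asarray(shape, dtype=float).reshape(-1) - mean)
```

Round-tripping a layout through `scores_from_shape` and `shape_from_scores` is exact only when its deviation from the mean lies in the span of the kept components; otherwise the result is the closest layout the three-parameter model can express.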

The extraction unit 15 extracts the model position information by applying principal component analysis to the matrix A. In this embodiment, the model position information is expressed as the mean vector plus the principal component scores multiplied by the principal component vectors:

$$\mathbf{s} = \bar{\mathbf{a}} + \sum_{k=1}^{n} b_k \mathbf{p}_k$$

where $\bar{\mathbf{a}}$ is the mean vector, $\mathbf{p}_k$ is the k-th principal component vector, and $b_k$ is its principal component score. The mean vector and the principal component vectors are common to all individuals. The principal component vectors are the direction of largest variance (first principal component vector), the direction of second-largest variance (second principal component vector), ..., and the direction of n-th largest variance (n-th principal component vector). In this embodiment n = 3, which gives the model position information the three parameters described above.

The principal component scores consist of a first parameter (the coefficient of the first principal component vector), a second parameter (the coefficient of the second principal component vector), and a third parameter (the coefficient of the third principal component vector). They describe how an individual's relative finger-joint positions deviate from the mean.

The learning device 10 described above operates, for example, as shown in the flowchart of FIG. 7, which is described below.

First, the learning device 10 acquires the first and second captured images (step S1).

Next, the learning device 10 applies a correction process to the first and second captured images that reduces the variation in brightness along their vertical direction (step S2).
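Claim 4 describes this correction in more detail: the image is cut into horizontal blocks, a subject/background boundary pixel value is found in each block from its histogram, and each block is corrected so that pixel values do not jump at block boundaries. The sketch below is one hedged reading of that. It uses Otsu's method as the histogram rule (the source does not name one) and shifts every block additively so that its boundary value matches the mean boundary value over all blocks. The block count and function names are assumptions.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Otsu threshold of 8-bit pixel values (an assumed histogram rule)."""
    hist = np.bincount(values.ravel(), minlength=bins).astype(float)
    levels = np.arange(bins)
    w0 = np.cumsum(hist)                     # pixels at or below each level
    w1 = w0[-1] - w0                         # pixels above each level
    m0 = np.cumsum(hist * levels)
    mu0 = np.divide(m0, w0, out=np.zeros(bins), where=w0 > 0)
    mu1 = np.divide(m0[-1] - m0, w1, out=np.zeros(bins), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2     # between-class variance (unnormalized)
    return int(np.argmax(between))


def flatten_vertical_shading(image, n_blocks=8):
    """Split the image into horizontal strips and shift each strip so its
    subject/background threshold equals the mean threshold of all strips."""
    strips = np.array_split(np.arange(image.shape[0]), n_blocks)
    thresholds = [otsu_threshold(image[rows]) for rows in strips]
    target = np.mean(thresholds)
    out = image.astype(float)
    for rows, th in zip(strips, thresholds):
        out[rows] += target - th
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

A purely additive shift is the simplest choice; a gain-plus-offset per block, or interpolating the correction between block centers, would smooth the boundaries further at the cost of more parameters.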

Next, the learning device 10 determines the hand region and the non-hand region in each of the first and second captured images (step S3).

Next, the learning device 10 separates the left hand from the right hand and mirrors the right-hand region horizontally (step S4).

Next, the learning device 10 divides the first captured images into regions and computes the image features of each region (step S5).

Next, the learning device 10 sets the first to third classes and selects the regions for each class (step S6).

Next, the learning device 10 changes the contrast of the first captured images to increase the amount of learning data (step S7).

Next, the learning device 10 uses the original and the contrast-changed first captured images as learning data to generate a classifier that identifies which of the first to third classes an arbitrary region belongs to (step S8).

Next, the learning device 10 performs rigid registration on the second captured images (step S9).

Finally, the learning device 10 extracts the relative positions between the joints from the second captured images whose rigid registration has converged, as model position information (step S10).

<2-2. Estimation device>
The estimation device 20 includes a first processing unit 21, a second processing unit 22, a third processing unit 23, and a fourth processing unit 24.

The first processing unit 21 acquires the estimation target image from an external device via the communication unit 4 (see FIG. 1). In this embodiment, the estimation target image is an X-ray image of both hands of a person.

Next, the first processing unit 21 applies the same processing as the pre-processing unit 12 to the estimation target image.

Next, the first processing unit 21 divides the pre-processed estimation target image into regions and computes image features for each region. In this embodiment, each region is 100 pixels high by 100 pixels wide. Here adjacent regions do not overlap, but the image may also be divided so that adjacent regions overlap. As in the calculation unit 13, HOG (histogram of oriented gradients) features are used as the image features.
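The tiling rule can be sketched as a list of top-left corners. With `stride` equal to the region size the regions do not overlap, as in this embodiment, and a smaller stride gives the overlapping variant. Dropping partial regions at the right and bottom edges is an assumption, the function name is hypothetical, and the HOG computation itself is left out (a library routine such as scikit-image's `hog` could fill that role).

```python
def tile_origins(height, width, size=100, stride=None):
    """Top-left (row, col) corners of size x size regions covering the image.

    stride == size (the default) gives non-overlapping regions; stride < size
    gives overlapping regions. Partial regions at the edges are dropped.
    """
    stride = size if stride is None else stride
    return [(top, left)
            for top in range(0, height - size + 1, stride)
            for left in range(0, width - size + 1, stride)]
```

For a 250 × 320 image this yields 2 × 3 = 6 non-overlapping regions, or 4 × 5 = 20 regions with a 50-pixel stride, which shows how quickly overlap multiplies the number of classifier evaluations.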

The second processing unit 22 applies the classifier generated by the generation unit 14 to the image features of each region computed by the first processing unit 21 and obtains a degree of joint-likeness for each region. The first class described above denotes joint-likeness, the second class fingertip-likeness, and the third class background-likeness. In this embodiment, the degree of joint-likeness takes a value from 0 to 1, and the more joint-like a region is, the larger the value.

The third processing unit 23 obtains joint candidates in the estimation target image based on the degree of joint-likeness.

Specifically, the third processing unit 23 takes the region with the highest degree of joint-likeness as a joint candidate and excludes the regions surrounding it from the candidates. From the remaining regions (those neither extracted as candidates nor excluded), it again takes the most joint-like region as a candidate and excludes its surrounding regions. By repeating this process, the third processing unit 23 extracts 20 joint candidates for each hand region.
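The repeated pick-and-exclude procedure is a greedy non-maximum suppression. The sketch below assumes the joint-likeness values are laid out on the region grid and treats the "surrounding regions" as the cells within one grid step, diagonals included; that neighborhood size and the function name are assumptions.

```python
import numpy as np


def extract_candidates(score_map, n_candidates=20, radius=1):
    """Greedy peak picking on a grid of joint-likeness values.

    Repeatedly take the highest-scoring cell, then suppress every cell within
    `radius` grid steps of it (including itself), until n_candidates are found
    or no cell is left.
    """
    scores = np.asarray(score_map, dtype=float).copy()
    picked = []
    while len(picked) < n_candidates and np.isfinite(scores).any():
        r, c = np.unravel_index(np.argmax(scores), scores.shape)
        picked.append((int(r), int(c)))
        scores[max(r - radius, 0):r + radius + 1,
               max(c - radius, 0):c + radius + 1] = -np.inf
    return picked
```

Suppressing neighbors keeps one strong joint from producing several adjacent candidates, so the 20 candidates are spread over distinct locations.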

The third processing unit 23 then selects 14 joint positions from the 20 candidates for each hand region. The selection method is not particularly limited; this embodiment uses the ICP (Iterative Closest Point) algorithm. Specifically, the third processing unit 23 randomly chooses 10 of the 20 candidates, compares them with the mean position, that is, the position represented by the mean vector of the model position information extracted by the extraction unit 15, checks their degree of agreement, and selects the 14 joint positions using the ICP algorithm.
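The source leaves the details of the ICP-based selection open. The sketch below is one hypothetical reading in a RANSAC-like style: repeatedly draw 10 of the 20 candidates, rigidly align the 14-joint mean shape to them with a trimmed ICP (only the closest 70 % of point-to-joint matches are used in each fit, so a few wrong candidates do not drag the alignment), keep the best-fitting alignment, and finally take the candidate nearest to each aligned model joint. The trimming ratio, trial count, and all names are assumptions, and the mean shape is assumed to be already in image coordinates.

```python
import numpy as np


def rigid_fit(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # forbid reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s


def trimmed_icp(model, points, keep=0.7, iters=20):
    """Align model joints (J, 2) to candidate points (K, 2); return R, t, residual."""
    R, t = np.eye(2), np.zeros(2)
    n_keep = max(3, int(round(keep * len(points))))
    idx = np.arange(len(points))
    for _ in range(iters):
        moved = model @ R.T + t
        dist = np.linalg.norm(points[:, None] - moved[None], axis=-1)  # (K, J)
        nearest = dist.argmin(axis=1)                    # closest joint per point
        kept = np.argsort(dist[idx, nearest])[:n_keep]   # drop the worst matches
        R, t = rigid_fit(model[nearest[kept]], points[kept])
    moved = model @ R.T + t
    d = np.sort(np.linalg.norm(points[:, None] - moved[None], axis=-1).min(axis=1))
    return R, t, d[:n_keep].mean()


def select_joints(candidates, mean_shape, n_sample=10, trials=50, seed=0):
    """Pick one candidate per model joint using the best random-subset alignment."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(trials):
        subset = candidates[rng.choice(len(candidates), size=n_sample, replace=False)]
        R, t, res = trimmed_icp(mean_shape, subset)
        if best is None or res < best[2]:
            best = (R, t, res)
    R, t, _ = best
    placed = mean_shape @ R.T + t
    dist = np.linalg.norm(placed[:, None] - candidates[None], axis=-1)  # (J, K)
    return candidates[dist.argmin(axis=1)]
```

This sketch does not prevent two model joints from claiming the same candidate; a one-to-one assignment step could be added if that matters in practice.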

The fourth processing unit 24 changes the parameters of the model position information based on the joint candidates and converges the parameter update based on the sum of the degrees of joint-likeness obtained by the classifier for each joint in the model position information. Specifically, the fourth processing unit 24 changes the first to third parameters of the model position information extracted by the extraction unit 15 based on the 14 joint positions selected from the 20 candidates, and then converges the update of the first to third parameters so as to maximize that sum. The model position information with the converged first to third parameters is the estimation result for the finger joint positions in the estimation target image.
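Because the joint-likeness values are constant over each 100 × 100-pixel region, their sum is a piecewise-constant function of the parameters, so a derivative-free search is a natural fit. The sketch below uses a simple pattern search over the three scores (try plus and minus a step on each score, keep strict improvements, halve the step when nothing improves); this is a swapped-in optimizer, not necessarily the one in the source. It assumes the mean vector and principal component vectors are already expressed in estimation-image pixel coordinates and that the joint-likeness of each region is stored in a grid; names and step sizes are hypothetical.

```python
import numpy as np


def likeness_sum(scores, mean, components, likeness_map, cell=100):
    """Sum of the joint-likeness of the regions containing the model's joints."""
    pts = (mean + scores @ components).reshape(-1, 2)      # rows (x, y) in pixels
    cols = np.clip((pts[:, 0] // cell).astype(int), 0, likeness_map.shape[1] - 1)
    rows = np.clip((pts[:, 1] // cell).astype(int), 0, likeness_map.shape[0] - 1)
    return float(likeness_map[rows, cols].sum())


def refine_scores(init, mean, components, likeness_map,
                  step=50.0, min_step=1.0, cell=100):
    """Pattern search on the principal component scores; returns (scores, sum)."""
    best = np.asarray(init, dtype=float).copy()
    best_val = likeness_sum(best, mean, components, likeness_map, cell)
    while step >= min_step:
        improved = False
        for k in range(len(best)):
            for sign in (1.0, -1.0):
                trial = best.copy()
                trial[k] += sign * step
                val = likeness_sum(trial, mean, components, likeness_map, cell)
                if val > best_val:               # accept strict improvements only
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0                          # refine the search around the optimum
    return best, best_val
```

The initial scores would come from projecting the 14 selected joint positions onto the principal component vectors; the search then only moves the model within the three-parameter shape space.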

The estimation device 20 described above operates, for example, as shown in the flowchart of FIG. 8, which is described below.

First, the estimation device 20 acquires the estimation target image (step S11).

Next, the estimation device 20 applies a correction process to the estimation target image that reduces the variation in brightness along its vertical direction (step S12).

Next, the estimation device 20 determines the hand region and the non-hand region of the estimation target image (step S13).

Next, the estimation device 20 separates the left hand from the right hand and mirrors the right-hand region horizontally (step S14).

Next, the estimation device 20 divides the estimation target image into regions and computes image features for each region (step S15).

Next, the estimation device 20 uses the classifier generated by the learning device 10 to obtain, from the image features of each region, a degree of joint-likeness for each region (step S16).

Next, the estimation device 20 obtains joint candidates in the estimation target image based on the degree of joint-likeness (step S17).

Finally, the estimation device 20 changes the parameters of the model position information based on the joint candidates and converges the parameter update based on the sum of the degrees of joint-likeness obtained by the classifier for each joint in the model position information (step S18).

<3. Other>
The configuration of the present invention may be modified in various ways besides the embodiment above without departing from the spirit of the invention. The embodiment is illustrative in all respects and not restrictive. The technical scope of the present invention is defined by the claims, not by the description of the embodiment, and includes all modifications within the meaning and scope equivalent to the claims.

For example, in the embodiment above, the first captured images, the second captured images, and the estimation target image are X-ray images, but they may instead be ultrasound images or the like, or 3D images such as MRI (Magnetic Resonance Imaging) or 3D-CT (Computed Tomography) images. When these images are 3D images, the joint coordinates are three-dimensional.

The imaging subject is not limited to humans; it may be any living organism that has bone tissue.

REFERENCE SIGNS LIST
10 Learning device
11 Acquisition unit
12 Pre-processing unit
13 Calculation unit
14 Generation unit
15 Extraction unit
20 Estimation device
21 First processing unit
22 Second processing unit
23 Third processing unit
24 Fourth processing unit

Claims

1. A learning device comprising:
an acquisition unit that acquires first and second captured images of bone tissue in a living body;
a calculation unit that divides the first captured image into a plurality of regions and obtains image feature amounts of the divided regions;
a generation unit that generates, by machine learning of the image feature amounts, a classifier for identifying to which class an arbitrary region belongs among at least one class including a first class that is a joint region; and
an extraction unit that extracts relative positions between a plurality of joints from the second captured image as model position information,
wherein the model position information has a predetermined number of parameters, and
the predetermined number is smaller than a product of the number of the plurality of joints and the number of dimensions of each coordinate indicating the position of each of the plurality of joints.

2. The learning device according to claim 1, wherein the second captured images include a plurality of images, each captured using a different living body as a subject, and
the extraction unit extracts the model position information by performing principal component analysis on joint positions in the plurality of second captured images.

3. The learning device according to claim 1 or 2, wherein the at least one class includes a second class that is a fingertip region.

4. The learning device according to claim 1, further comprising a pre-processing unit provided between the acquisition unit and both the calculation unit and the extraction unit, wherein the pre-processing unit:
divides the first captured image and the second captured image into a plurality of blocks by horizontal lines;
determines, in each of the plurality of blocks and based on a pixel value histogram, a pixel value that is a boundary between a subject region in which the living body is captured and a background region in which the living body is not captured; and
corrects pixel values on a block-by-block basis so that a shift in pixel values at the boundaries between the plurality of blocks is reduced.

5. A learning program causing a computer to function as:
an acquisition unit that acquires first and second captured images of bone tissue in a living body;
a calculation unit that divides the first captured image into a plurality of regions and obtains image feature amounts of the divided regions;
a generation unit that generates, by machine learning of the image feature amounts, a classifier for identifying to which class an arbitrary region belongs among at least one class including a first class that is a joint region; and
an extraction unit that extracts relative positions between a plurality of joints from the second captured image as model position information,
wherein the model position information has a predetermined number of parameters, and
the predetermined number is smaller than a product of the number of the plurality of joints and the number of dimensions of each coordinate indicating the position of each of the plurality of joints.

6. An estimation device having a classifier generated by a learning device and model position information extracted by the learning device, the learning device being the learning device according to any one of claims 1 to 4, the estimation device comprising:
a first processing unit that divides an estimation target image, obtained by capturing an image of bone tissue in a living body, into a plurality of regions and obtains an image feature amount for each of the plurality of regions;
a second processing unit that uses the classifier to determine, from the image feature amounts obtained by the first processing unit, a degree of joint-likeness for each of the plurality of regions;
a third processing unit that obtains candidates for the joints in the estimation target image based on the degree of joint-likeness; and
a fourth processing unit that changes the parameters of the model position information based on the joint candidates and converges the update of the parameters so as to maximize a sum of the degrees of joint-likeness determined by the classifier for each joint in the model position information.

7. An estimation program causing a computer to function as an estimation device having a classifier generated by a learning device and model position information extracted by the learning device, the learning device being the learning device according to any one of claims 1 to 4, the estimation device comprising:
a first processing unit that divides an estimation target image, obtained by capturing an image of bone tissue in a living body, into a plurality of regions and obtains an image feature amount for each of the plurality of regions;
a second processing unit that uses the classifier to determine, from the image feature amounts obtained by the first processing unit, a degree of joint-likeness for each of the plurality of regions;
a third processing unit that obtains candidates for the joints in the estimation target image based on the degree of joint-likeness; and
a fourth processing unit that changes the parameters of the model position information based on the joint candidates and converges the update of the parameters so as to maximize a sum of the degrees of joint-likeness determined by the classifier for each joint in the model position information.