JP2018067266A

JP2018067266A - Program for forecasting onset risk or recurrence risk of disease

Info

Publication number: JP2018067266A
Application number: JP2016207222A
Authority: JP
Inventors: 聡史加藤; Satoshi Kato
Original assignee: Fujirebio Inc
Current assignee: Fujirebio Inc
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2018-04-26

Abstract

PROBLEM TO BE SOLVED: To predict the onset risk or recurrence risk of a disease within a predetermined period by a simple test.

SOLUTION: A program predicts the onset risk or recurrence risk of a disease within a predetermined period from input first data based on one or both of a test sample collected from a subject and biological information obtained from the subject. The program comprises the following steps, executed by a computer having a calculation unit: a step in which the calculation unit generates and acquires a plurality of second data based on two or more types of index parameters; a step in which the calculation unit constructs a plurality of first learning models and a plurality of second learning models; and a step in which the calculation unit selects a plurality of first selection learning models and a plurality of second selection learning models from the plurality of first learning models and the plurality of second learning models, respectively.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a program for predicting the onset risk or recurrence risk of a disease within a predetermined period, a prediction device that executes the program, and a method for constructing the learning models used by the program.

Predicting the onset risk or recurrence risk of a disease is important because onset or recurrence can have serious consequences.

For example, Patent Document 1 below discloses a method for determining the risk of recurrence of a major adverse cardiac event within one year after the onset of at least one symptom of acute coronary syndrome.

Patent Document 1: U.S. Pat. No. 8,658,384

In Patent Document 1, the risk of recurrence of a major adverse cardiac event is determined using, as indicators, the amounts in a test sample of at least three biomarkers selected from the group consisting of the so-called myocardial markers cardiac troponin I (cTnI), pro-B-type natriuretic peptide (proBNP) or a cleavage product thereof, high-sensitivity C-reactive protein (hsCRP), myeloperoxidase (MPO), placental growth factor (PIGF), estimated glomerular filtration rate (eGFR), homocysteine (HCY), choline, ischemia-modified albumin (IMA), soluble CD40 ligand (sCD40L), and lipoprotein-associated phospholipase A2 (LpPLA2).

However, because the determination method of Patent Document 1 uses the amounts of myocardial markers in a test sample as indicators, it requires special and expensive tests in addition to those routinely performed at hospital admission or in health checkups, and this makes the method burdensome to carry out.

The present invention has been made in view of the above problem. The present inventors found that, by using the program of the present invention, the problem can be solved with only the simple and inexpensive tests routinely performed at hospital admission or in health checkups, without separately performing such burdensome tests, and thereby completed the present invention.

That is, the present invention provides the following [1] to [15].
[1] A program for predicting the onset risk or recurrence risk of a disease within a predetermined period from input first data that was measured using a test sample collected from a subject or was obtained from the subject, the program comprising the following steps executed by a computer having a calculation unit:
a step in which the calculation unit generates and acquires a plurality of second data based on two or more index parameters selected, on the basis of selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects who did, or did not, develop or relapse into the disease within the predetermined period;
a step in which the calculation unit constructs a plurality of first learning models that predict either "onset or recurrence of the disease" or "unknown", and a plurality of second learning models that predict either "no onset or recurrence of the disease" or "unknown"; and
a step in which the calculation unit selects a plurality of first selection learning models and a plurality of second selection learning models from the plurality of first learning models and the plurality of second learning models, respectively.
[2] The program according to [1], further comprising:
a step in which the calculation unit processes the first data with the first selection learning models and the second selection learning models, and acquires a first determination result for each first selection learning model and each second selection learning model;
a step in which the calculation unit acquires second determination results based on the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models;
a step in which the calculation unit integrates the second determination result of the first selection learning models and the second determination result of the second selection learning models to acquire a third determination result; and
a step in which the calculation unit predicts the onset risk or recurrence risk of the disease within the predetermined period based on the third determination result.
[3] The program according to [2], wherein the step of acquiring the second determination results is performed with the cut-off value for the vote rate of the first selection learning models and the cut-off value for the vote rate of the second selection learning models set to different values.
[4] The program according to [2] or [3], further comprising a step in which the calculation unit calculates, based on the second determination results, a reliability of the predicted onset risk or recurrence risk.
[5] The program according to any one of [1] to [4], wherein the step of constructing the plurality of first learning models and the plurality of second learning models is a step of: generating and preparing second data that includes a plurality of partial data, generated by selecting among the plurality of index parameters, and the complete data used to generate the partial data; dividing the second data into a plurality of data groups; and constructing the models by machine learning that uses each of the data groups under a plurality of parameter conditions that differ for each data group.
[6] The program according to any one of [2] to [5], wherein the step of acquiring the second determination results is a step of taking a vote over the plurality of first determination results of the plurality of first selection learning models and over the plurality of first determination results of the plurality of second selection learning models, and acquiring, separately for the first selection learning models and for the second selection learning models, a second determination result based on the vote rate.
[7] The program according to any one of [1] to [6], wherein the step of selecting the first selection learning models and the second selection learning models is a step of evaluating and selecting the first learning models and the second learning models by Pareto rank, using sensitivity and positive predictive value as indices.
[8] The program according to any one of [1] to [7], wherein the disease is a major adverse cardiac event, acute kidney injury, rupture of a cerebral aneurysm, or diabetic nephropathy.
[9] A prediction device for predicting the onset risk or recurrence risk of a disease within a predetermined period, comprising:
a first data generation and acquisition unit that acquires two or more index parameters measured using a test sample collected from a subject or obtained from the subject, and generates and acquires first data based on the index parameters;
a first determination result generation and acquisition unit that processes the first data acquired by the first data generation and acquisition unit with first selection learning models and second selection learning models, and generates and acquires a first determination result for each first selection learning model and each second selection learning model;
a second determination result generation and acquisition unit that generates and acquires second determination results based on the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models acquired by the first determination result generation and acquisition unit;
a third determination result generation and acquisition unit that integrates the second determination result of the first selection learning models and the second determination result of the second selection learning models acquired by the second determination result generation and acquisition unit, and generates and acquires a third determination result; and
a prediction unit that predicts the onset risk or recurrence risk of the disease within the predetermined period based on the third determination result acquired by the third determination result generation and acquisition unit.
[10] The prediction device according to [9], further comprising:
a second data generation and acquisition unit that generates and acquires second data based on two or more index parameters selected, on the basis of selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects who did, or did not, develop or relapse into the disease within the predetermined period;
a learning model construction unit that, based on the second data acquired from the second data generation and acquisition unit, constructs a plurality of first learning models that predict either "onset or recurrence of the disease" or "unknown", and a plurality of second learning models that predict either "no onset or recurrence of the disease" or "unknown"; and
a learning model selection unit that selects a plurality of first selection learning models and a plurality of second selection learning models from the plurality of first learning models and the plurality of second learning models constructed by the learning model construction unit, respectively.
[11] The prediction device according to [9] or [10], wherein the second data generation and acquisition unit is a functional unit that selects among a plurality of index parameters to generate partial data, thereby generating second data that includes a plurality of partial data and the complete data used to generate the partial data.
[12] A method for constructing learning models used in a method for predicting the onset risk or recurrence risk of a disease within a predetermined period from input first data that was measured using a test sample collected from a subject or was obtained from the subject, the method comprising:
a step of generating and acquiring a plurality of second data based on two or more index parameters selected, on the basis of selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects who did, or did not, develop or relapse into the disease within the predetermined period;
a step of constructing a plurality of first learning models that predict either "onset or recurrence of the disease" or "unknown", and a plurality of second learning models that predict either "no onset or recurrence of the disease" or "unknown"; and
a step of selecting a plurality of first selection learning models and a plurality of second selection learning models from the plurality of first learning models and the plurality of second learning models, respectively.
[13] The method for constructing learning models according to [12], wherein the step of constructing the plurality of first learning models and the plurality of second learning models is a step of: selecting among the plurality of index parameters to generate partial data, thereby generating and preparing second data that includes a plurality of partial data and the complete data used to generate the partial data; dividing the second data into a plurality of data groups; and constructing the models by machine learning that uses each of the data groups under a plurality of parameter conditions that differ for each data group.
[14] The method for constructing learning models according to [13], wherein the step of selecting the first selection learning models and the second selection learning models is a step of evaluating and selecting the first learning models and the second learning models by Pareto rank, using sensitivity and positive predictive value as indices.
[15] The method for constructing learning models according to [14], wherein the disease is a major adverse cardiac event, acute kidney injury, rupture of a cerebral aneurysm, or diabetic nephropathy.
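The vote-based determination of items [2], [3] and [6] can be illustrated with a minimal Python sketch. The claims state only that each model family is put to a vote, that the vote rate is compared with a cut-off that may differ between the two families, and that the two second determination results are integrated into a third. Everything else here is an assumption made for illustration: the label strings, the function names, the example cut-off values, and in particular the integration rule in `third_determination`, which this section does not specify.

```python
from typing import List

# Hypothetical label strings; the source names the outcomes but not their encoding.
ONSET, NO_ONSET, UNKNOWN = "onset", "no_onset", "unknown"


def vote_rate(first_results: List[str], definite_label: str) -> float:
    """Fraction of models in one family whose first determination result is the definite label."""
    if not first_results:
        raise ValueError("at least one first determination result is required")
    return sum(r == definite_label for r in first_results) / len(first_results)


def second_determination(first_results: List[str], definite_label: str, cutoff: float) -> str:
    """Second determination result for one family: the definite label if the
    vote rate reaches the family's cut-off, otherwise 'unknown'."""
    if vote_rate(first_results, definite_label) >= cutoff:
        return definite_label
    return UNKNOWN


def third_determination(first_family: str, second_family: str) -> str:
    """Assumed integration rule (not taken from the source): report a definite
    risk level only when exactly one family gives a definite answer."""
    if first_family == ONSET and second_family == UNKNOWN:
        return "high_risk"
    if first_family == UNKNOWN and second_family == NO_ONSET:
        return "low_risk"
    return UNKNOWN


# First models vote "onset"/"unknown"; second models vote "no_onset"/"unknown".
first_votes = [ONSET, ONSET, UNKNOWN, ONSET]
second_votes = [NO_ONSET, UNKNOWN, UNKNOWN, UNKNOWN]

# Different cut-offs per family, as item [3] allows (values are illustrative).
r1 = second_determination(first_votes, ONSET, cutoff=0.7)
r2 = second_determination(second_votes, NO_ONSET, cutoff=0.6)
result = third_determination(r1, r2)
```

Keeping the two cut-offs as separate arguments lets the two families be tuned independently, which is the point of item [3].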

According to the program of the present invention, the onset risk or recurrence risk of a disease can be determined more accurately and in simpler steps. This enables more accurate advice on matters such as treatment policy and frequency of hospital visits. In addition, because the program can be used with only simpler and less expensive tests, without burdensome tests such as measurement of biomarkers that require special and expensive procedures, the burden of implementation can be further reduced.

FIG. 1 is a flowchart showing the steps for predicting the onset risk or recurrence risk of a disease.
FIG. 2 is a flowchart for explaining step (S1).
FIG. 3 is a flowchart for explaining step (S0).
FIG. 4 is a table showing examples of index parameters for the second data.
FIG. 5 is a schematic diagram illustrating the structure of the complete data used to generate the second data.
FIG. 6 is a schematic diagram of the second data.
FIG. 7 is a flowchart for explaining step (S2).
FIG. 8 is a table for explaining the evaluation results for the cut-off value.
FIG. 9 is a table for explaining the evaluation results for the cut-off value.
FIG. 10 is a schematic block diagram for explaining the configuration of the computer.
FIG. 11 is a schematic block diagram for explaining the configuration of the calculation unit.

Embodiments of the present invention will be described below with reference to the drawings. The drawings only schematically show the shape, size, and arrangement of the components to the extent that the invention can be understood. The present invention is not limited by the following description, and each component may be modified as appropriate without departing from the gist of the present invention. In the drawings used in the following description, like components are denoted by the same reference numerals, and duplicate descriptions may be omitted. Furthermore, the components of the embodiments of the present invention are not necessarily manufactured or used in the arrangement shown in the drawings.

[Explanation of terms]
In this specification, "prediction of onset risk" means predicting whether the disease will develop (or that this is unknown), or whether the likelihood of onset is high or low (or unknown), within a predetermined period (for example, 3 months, 6 months, 1 year, 1 year and 6 months, 2 years, 3 years, or longer) from the initial consultation or after the first examination.

In this specification, "prediction of recurrence risk" means predicting whether a further major adverse cardiac event will recur (or that this is unknown), or whether the likelihood of recurrence is high or low (or unknown), within a predetermined period (for example, 3 months, 6 months, 1 year, 1 year and 6 months, 2 years, 3 years, or longer) after the onset of the disease.

In this specification, "subject" means a living body whose onset risk or recurrence risk of a disease is to be predicted; a specific example is a patient.

In this specification, "test sample" means any sample from which the index parameters described later can be obtained. Examples of such test samples include liquid samples (e.g., blood (whole blood) or blood-derived samples (e.g., serum, plasma), urine, saliva, ascites, tissue extracts, cell extracts) and non-liquid samples (e.g., tissue samples, cell samples). Liquid samples are preferred, blood or blood-derived samples are more preferred, and blood is more preferred still. The test sample may be pretreated before measurement. Examples of such pretreatment include centrifugation, extraction, concentration, fractionation, cell fixation, tissue fixation, tissue freezing, and tissue sectioning.

In this specification, "index parameter" means a parameter based on the results of various tests performed on the above test sample, for example the results of so-called blood tests such as biochemical tests, blood glucose tests, general blood tests, and coagulation tests (content, count, characteristics, etc. of given components), as well as on other biological information.

In this specification, "disease" means a major adverse cardiac event, acute kidney injury (acute kidney injury after surgery), rupture of a cerebral aneurysm, or diabetic nephropathy (cases that have become severe enough to require dialysis).

In this specification, "major adverse cardiac event" means acute myocardial infarction, angina pectoris treated with coronary revascularization, heart failure requiring hospitalization, atrial fibrillation, stroke (excluding transient ischemic attack (TIA)), or death from cardiovascular causes.

Examples of index parameters are now described. When the disease is, for example, a major adverse cardiac event, examples of "index parameters" include C-reactive protein (CRP), D-dimer, HDL cholesterol (HDL-C), LDL cholesterol (LDL-C), prothrombin time (international normalized ratio (INR)) (PT-INR), γ-glutamyl transpeptidase (γ-GTP), aspartate aminotransferase (AST (GOT)), amylase (AMY), alanine aminotransferase (ALT (GPT)), alkaline phosphatase (ALP), albumin (ALB), antithrombin (AT), glycohemoglobin (HbA1c), chloride (Cl), triglycerides (TG), fibrinogen (Fbg), fibrin/fibrinogen degradation products (FDP), activated partial thromboplastin time (APTT), serum creatinine (CRE), blood urea nitrogen (BUN), blood glucose (Glu), total cholesterol (CHO), total bilirubin (T·Bil), monocyte count (Mono), direct bilirubin (D·Bil), lactate dehydrogenase (LD (LDH)), urine protein (qualitative) (UP), uric acid (UA), urine sugar (qualitative) (US), pH, potassium (K), calcium (Ca), sodium (Na), red blood cell count (RBC), hematocrit (Ht), hemoglobin (Hb), lymphocyte count (Lymp), platelet count (PL), basophil count (Baso), eosinophil count (Eos), and neutrophil count (Neut). These measured values can be obtained for a test sample by conventional methods.

Further examples of "index parameters" when the disease is, for example, a major adverse cardiac event include parameters obtained by binarizing or numerically encoding biological information (attributes, qualitative data) such as history of recurrence of major adverse cardiac events within a predetermined period, sex, reason for hospitalization, present condition at admission, history of diabetes, history of hypertension, history of dyslipidemia, smoking habit, age, height, weight, and heart rate.
Of these, the "reason for hospitalization" can be determined, for example, based on the physician's diagnosis and treatment when the patient (subject) is admitted.

Specifically, for example, the index parameter may be set to 1 when the subject is a patient hospitalized with a diagnosis of acute myocardial infarction (AMI), to 2 when the subject is a patient hospitalized for angina pectoris treated with coronary revascularization, to 3 when the subject is a patient hospitalized with a diagnosis of heart failure (HF), to 4 when the subject is a patient hospitalized for atrial fibrillation (requiring myocardial ablation), to 5 when the subject is a patient hospitalized with a diagnosis of cerebral infarction (CI), and to 6 when the subject is a patient hospitalized with a diagnosis of transient ischemic attack (TIA).

As for the "present condition at admission", for example, based on pathological findings from the electrocardiogram waveform, the index parameter is set to 0 for "no atrial fibrillation" (sinus rhythm: normal) and to 1 when atrial fibrillation is present.
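The numeric encoding described in the two preceding paragraphs can be written as a small lookup, sketched here in Python. The codes 1 to 6 and the 0/1 flag are taken from the text; the dictionary keys, the field names, and the function name are hypothetical and chosen only for this illustration.

```python
# Codes 1-6 follow the text; the key strings are hypothetical identifiers.
ADMISSION_REASON_CODES = {
    "AMI": 1,                            # acute myocardial infarction
    "angina_with_revascularization": 2,  # angina treated with coronary revascularization
    "HF": 3,                             # heart failure
    "AF_requiring_ablation": 4,          # atrial fibrillation requiring myocardial ablation
    "CI": 5,                             # cerebral infarction
    "TIA": 6,                            # transient ischemic attack
}


def encode_admission(reason: str, atrial_fibrillation: bool) -> dict:
    """Turn the reason for hospitalization and the ECG finding at admission
    into numeric index parameters."""
    if reason not in ADMISSION_REASON_CODES:
        raise KeyError(f"unknown reason for hospitalization: {reason!r}")
    return {
        "admission_reason": ADMISSION_REASON_CODES[reason],
        # 0 = no atrial fibrillation (sinus rhythm: normal), 1 = atrial fibrillation present
        "af_at_admission": int(atrial_fibrillation),
    }


example = encode_admission("HF", atrial_fibrillation=False)
```

Raising an error on an unrecognized reason, rather than assigning a default code, is a design choice of this sketch; the text does not say how other reasons are handled.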

In this specification, "myocardial marker" means, among the results of tests performed on a test sample, an indicator that relates specifically to the heart (myocardium). Examples of "myocardial markers" include creatine kinase MB (CKMB), human heart-type fatty acid-binding protein, cardiac troponin I, cardiac troponin T, brain natriuretic peptide (BNP), pro-brain natriuretic peptide or a cleavage product thereof, myeloperoxidase, placental growth factor, estimated glomerular filtration rate, homocysteine, ischemia-modified albumin, soluble CD40 ligand, lipoprotein-associated phospholipase A2, choline, and high-sensitivity C-reactive protein.

[Program for predicting the onset risk or recurrence risk of a disease within a predetermined period]
Each step included in the program of this embodiment is described in detail below. In this embodiment, unless otherwise noted, a "step" is executed by a computer (details are given later).

Here, a "program" is a data processing method written in any language or notation, and may take any form, such as source code or binary code. A "program" is not necessarily a single unit; it may be distributed across multiple modules or libraries, and it may achieve its function in cooperation with a separate program, typically an OS (operating system).

The present invention also relates to a storage medium on which a program for predicting the onset risk or recurrence risk of a disease within a predetermined period is recorded.

The program is normally recorded on a recording medium corresponding to the storage unit described later, and is read as needed by the computer (prediction device) that carries out the prediction method. Well-known configurations and procedures can be used for the specific configuration by which the prediction device reads the program recorded on the recording medium, the reading procedure, the installation procedure after reading, and so on.

また、「記録媒体」は、任意の「可搬の物理媒体」、任意の「固定用の物理媒体」、「通信媒体」を含む。なお、「可搬の物理媒体」とはフレキシブルディスク、光磁気ディスク、ＲＯＭ、ＥＰＲＯＭ、ＥＥＰＲＯＭ、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤなどである。「固定用の物理媒体」とは各種コンピュータシステムに内蔵されるＲＯＭ、ＲＡＭ、ハードディスクドライブなどである。「通信媒体」は、ＬＡＮ、ＷＡＮ、インターネットなどのネットワークを介してプログラムを送信する場合における通信回線や搬送波のように、短期間、プログラムを保持する。 The “recording medium” includes any “portable physical medium”, any “fixed physical medium”, and “communication medium”. The “portable physical medium” is a flexible disk, a magneto-optical disk, a ROM, an EPROM, an EEPROM, a CD-ROM, an MO, a DVD, or the like. The “fixed physical medium” is a ROM, a RAM, a hard disk drive or the like built in various computer systems. The “communication medium” holds the program for a short period of time like a communication line or a carrier wave in the case of transmitting the program via a network such as a LAN, WAN, or the Internet.

本発明の一実施形態のプログラムは、演算部を備えるコンピュータにより実行される下記のステップを含む、被検体から採取された試験試料を用いて測定されたか、又は該被検体から得られた第１データを入力することにより、所定期間内における疾病の発症リスク又は再発リスクを予測するためのプログラムであって、演算部が、所定期間内に疾病を発症或いは再発したか、又は発症或いは再発しなかった複数の被検体についての複数の指標パラメータからなる群から選択頻度に基づいて選択された２種以上の指標パラメータに基づく複数の第２データを生成して取得するステップと、演算部が、疾病の発症或いは再発あり又は不明を予測する複数の第１学習モデル、及び疾病の発症或いは再発なし又は不明を予測する複数の第２学習モデルを構築するステップと、演算部が、複数の第１学習モデル及び複数の第２学習モデルごとに、複数の第１選抜学習モデル及び複数の第２選抜学習モデルを選抜するステップとを含む。 A program according to one embodiment of the present invention is a program for predicting the onset risk or recurrence risk of a disease within a predetermined period from input first data that was measured using a test sample collected from a subject or was obtained from the subject. The program includes the following steps, executed by a computer including a calculation unit: a step in which the calculation unit generates and acquires a plurality of second data based on two or more types of index parameters selected, on the basis of selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects who did or did not develop the disease, or experience its recurrence, within the predetermined period; a step in which the calculation unit constructs a plurality of first learning models, each predicting either “onset or recurrence of the disease” or “unknown”, and a plurality of second learning models, each predicting either “no onset or recurrence of the disease” or “unknown”; and a step in which the calculation unit selects, for the plurality of first learning models and the plurality of second learning models respectively, a plurality of first selected learning models and a plurality of second selected learning models.

以下、本実施形態にかかるプログラムが含むステップそれぞれについて具体的に説明する。なお、本実施形態においては、特に断りがない限り「ステップ」はコンピュータによって実行される（詳細は後述する。）。
なお、以下の説明においては、疾病として「主要有害心イベント」を例にとって説明する。 Hereinafter, each step included in the program according to the present embodiment will be specifically described. In this embodiment, unless otherwise specified, “steps” are executed by a computer (details will be described later).
In the following description, a “major adverse cardiac event” is used as an example of the disease.

図１を参照して、本実施形態の疾病の発症リスク又は再発リスクを予測するためのプログラムについて説明する。図１は、疾病の発症リスク又は再発リスクを予測するためのステップを示すフローチャートである。 With reference to FIG. 1, the program for predicting the onset risk or recurrence risk of the disease of this embodiment will be described. FIG. 1 is a flowchart showing steps for predicting the onset risk or recurrence risk of a disease.

（１）図１に示されるように、まず、前提として、指標パラメータに基づいて第１データを取得するステップ（Ｓ１）が実行される。 (1) As shown in FIG. 1, first, as a premise, a step (S1) of acquiring first data based on an index parameter is executed.

以下、ステップ（Ｓ１）について、図２を参照して具体的に説明する。図２は、ステップ（Ｓ１）を説明するためのフローチャートである。 Hereinafter, step (S1) will be specifically described with reference to FIG. FIG. 2 is a flowchart for explaining step (S1).

なお、ステップ（Ｓ１）に先だって、後述する学習モデルを構築しておくことが好ましい。 Prior to step (S1), it is preferable to construct a learning model to be described later.

図２に示されるように、ステップ（Ｓ１）においては、まず、試験試料を採取するステップ（Ｓ１−１）が行われる。ステップ（Ｓ１−１）における試験試料の選択、試験試料の採取の方法は、指標パラメータを取得することができることを条件として特に限定されない。 As shown in FIG. 2, in step (S1), first, a step (S1-1) of collecting a test sample is performed. The method of selecting the test sample and collecting the test sample in step (S1-1) is not particularly limited on condition that the index parameter can be acquired.

例えば、試験試料が特に血液にかかる試料である場合には、通常の採血方法により試験試料を得ることができる。 For example, when the test sample is a sample particularly related to blood, the test sample can be obtained by a normal blood collection method.

本実施形態では、指標パラメータとして、被験体から採取された試験試料に基づく検査結果及び／又は生体情報が用いられる。 In the present embodiment, an examination result and / or biological information based on a test sample collected from a subject is used as the index parameter.

例えば、主要有害心イベントについていえば、簡便性を向上させ、負荷をより低減することができるので、心筋マーカーに非由来の指標パラメータを用いることが好ましい。 For example, regarding major adverse cardiac events, it is preferable to use an index parameter that is not derived from the myocardial marker because it can improve convenience and reduce the load.

次に、得られた試験試料を用いて測定されたか、又は該被検体から得られた指標パラメータに基づく第１データを取得するステップ（Ｓ１−２）が行われる。 Next, a step (S1-2) of obtaining first data based on an index parameter measured using the obtained test sample or obtained from the subject is performed.

このステップ（Ｓ１−２）は、得られた試験試料について、従来公知の任意好適な検査手段（測定手段）及び検査方法（測定方法）を用いて分析（測定）することにより行うことができる。 This step (S1-2) can be performed by analyzing (measuring) the obtained test sample using any conventionally known suitable inspection means (measurement means) and inspection method (measurement method).

例えば、疾病が主要有害イベントである場合の心筋マーカーに非由来の指標パラメータの例としては、Ｃ反応性タンパク質量、Ｄダイマー量、ＨＤＬ−コレステロール量、ＬＤＬ−コレステロール量、プロトロンビン時間（国際標準比（ＩＮＲ））、γ−グルタミルトランスペプチターゼ量、アスパラギン酸アミノトランスフェラーゼ量、アミラーゼ量、アラニンアミノトランスフェラーゼ量、アルカリホスファターゼ量、アルブミン量、アンチトロンビン量、グリコヘモグロビン量、クロール量、トリグリセリド量、フィブリノゲン量、フィブリン／フィブリノゲン分解産物量、活性化部分トロンボプラスチン時間、血清クレアチニン量、血中尿素窒素量、血糖量、総コレステロール量、総ビルビリン量、単球数、直接ビリルビン量、乳酸脱水素酵素（ＬＤＨ）量、尿酸量、ｐＨ、カリウム量、カルシウム量、ナトリウム量、赤血球数、ヘマトクリット値、ヘモグロビン量、リンパ球数、血小板数、好塩基球数、好酸球数、及び好中球数が挙げられる。 For example, when the disease is a major adverse event, examples of index parameters not derived from myocardial markers include C-reactive protein level, D-dimer level, HDL-cholesterol level, LDL-cholesterol level, prothrombin time (international normalized ratio (INR)), γ-glutamyl transpeptidase level, aspartate aminotransferase level, amylase level, alanine aminotransferase level, alkaline phosphatase level, albumin level, antithrombin level, glycated hemoglobin level, chloride level, triglyceride level, fibrinogen level, fibrin/fibrinogen degradation product level, activated partial thromboplastin time, serum creatinine level, blood urea nitrogen level, blood glucose level, total cholesterol level, total bilirubin level, monocyte count, direct bilirubin level, lactate dehydrogenase (LDH) level, uric acid level, pH, potassium level, calcium level, sodium level, red blood cell count, hematocrit, hemoglobin level, lymphocyte count, platelet count, basophil count, eosinophil count, and neutrophil count.

第１データの形式については、後述するステップ（Ｓ２）に適用できることを条件として特に限定されない。 The format of the first data is not particularly limited on condition that it can be applied to step (S2) described later.

また、第１データを取得するステップ（Ｓ１−２）において、既に説明した「心筋マーカーに非由来の指標パラメータの群」にさらに加えて用いられる、心筋マーカーに非由来の指標パラメータの例としては、性別、糖尿病既往歴、高血圧症既往歴、脂質異常症既往歴、喫煙習慣の有無、年齢、身長、体重、心拍数、尿たんぱく（定性）、及び尿糖（定性）などの被検体から得られた生体情報が挙げられる。 Further, in the step (S1-2) of acquiring the first data, examples of the index parameter not derived from the myocardial marker used in addition to the already described “group of index parameters not derived from the myocardial marker” , Gender, history of diabetes, history of hypertension, history of dyslipidemia, presence or absence of smoking habits, age, height, weight, heart rate, urinary protein (qualitative), and urine sugar (qualitative) Biometric information obtained.

また、既に説明した心筋マーカーに非由来の指標パラメータの群にさらに加えて用いられる、心筋マーカーに非由来の指標パラメータの例としては、入院理由及び入院時現症などの被検体から得られた生体情報が挙げられる。 Further examples of index parameters not derived from myocardial markers, used in addition to the group of such index parameters already described, include biological information obtained from the subject, such as the reason for hospitalization and the present condition at admission.

さらに、疾病が主要有害イベントである場合には、第１データ（及び後述する第２データ）は、既に説明した心筋マーカーに非由来の指標パラメータの群にさらに加えて、既に説明した心筋マーカーに由来する指標パラメータをさらに含む群から選択される指標パラメータに基づくデータとしてもよい。 Furthermore, when the disease is a major adverse event, the first data (and second data described later) is further added to the already described myocardial marker in addition to the group of index parameters not derived from the already described myocardial marker. It is good also as data based on the index parameter selected from the group which further contains the derived index parameter.

このように心筋マーカーに由来する指標パラメータをさらに用いれば、例えば、主要有害心イベントの発症後「３ヵ月以内」といった比較的短期間における再発リスクの予測精度をより向上させることができる。 Thus, if the index parameter derived from the myocardial marker is further used, for example, the prediction accuracy of the recurrence risk in a relatively short period such as “within 3 months” after the onset of the major adverse cardiac event can be further improved.

第１データ（及び後述する第２データ）は、既に説明した心筋マーカーに非由来の指標パラメータの群にさらに加えて、心電図に由来する指標パラメータをさらに含む群から選択される指標パラメータに基づくデータとすることができる。 The first data (and second data described later) is data based on an index parameter selected from the group further including an index parameter derived from an electrocardiogram in addition to the index parameter group not derived from the myocardial marker already described. It can be.

このような心電図に由来する指標パラメータの例としては、Ｐ波の高さ、Ｒ波の間隔（ＲＲ間隔）、ＰＱ時間、Ｒ波の高さ、ＱＲＳ幅、ＳＴ部分の変化量（Ｓ波の高さとＴ波の高さの総和）、Ｔ波の高さ、及び心電図をフーリエ変換して得られるパワースペクトルが挙げられる。 Examples of index parameters derived from such an electrocardiogram include P wave height, R wave interval (RR interval), PQ time, R wave height, QRS width, ST portion change amount (S wave The sum of the height and the height of the T wave), the height of the T wave, and the power spectrum obtained by Fourier transforming the electrocardiogram.

このように心電図に由来する指標パラメータをさらに用いれば、指標パラメータの数をさらに増やすことができるので、主要有害心イベントの再発リスクの予測精度をより向上させることができるという効果を得ることができる。 If the index parameters derived from the electrocardiogram are further used in this way, the number of index parameters can be further increased, so that the effect of improving the prediction accuracy of the recurrence risk of the main adverse cardiac event can be obtained. .

第１データは、既に説明した複数の指標パラメータの群から選択される２種以上の指標パラメータに基づいて生成することができる。かかる指標パラメータは、後述する選択頻度に基づいて選択することが好ましい。 The first data can be generated based on two or more types of index parameters selected from the group of index parameters already described. Such an index parameter is preferably selected based on a selection frequency described later.

本実施形態において、用いられ得る指標パラメータの数は、予測に要する時間などを勘案して任意好適な数とすることができる。用いられ得る指標パラメータの数は、例えば選択頻度がより高い指標パラメータを予測精度を勘案して適宜選択することにより決定することができる。 In the present embodiment, the number of index parameters that can be used can be any suitable number in consideration of the time required for prediction. The number of index parameters that can be used can be determined, for example, by appropriately selecting an index parameter having a higher selection frequency in consideration of prediction accuracy.

本実施形態において、第１データ（及び後述する第２データ）は、既に説明した指標パラメータのうち、２種以上の指標パラメータが用いられる。しかしながら、用いられ得る指標パラメータの数は特に限定されない。指標パラメータとしては、例えば、２種のみならず、５種以上、１０種以上、１５種以上、２０種以上、２５種以上、３０種以上、３５種以上、４０種以上、４５種以上又は既に説明したすべてを用い得る。 In the present embodiment, as the first data (and second data described later), two or more types of index parameters are used among the index parameters already described. However, the number of indicator parameters that can be used is not particularly limited. Examples of the index parameter include not only two types but also 5 types or more, 10 types or more, 15 types or more, 20 types or more, 25 types or more, 30 types or more, 35 types or more, 40 types or more, 45 types or more or already All described can be used.

２種以上の指標パラメータが用いられる場合、かかる２種以上の指標パラメータの組み合わせは、相関ルール分析、クラスタリングなどの処理によって適切な組み合わせを選択することができる。 When two or more types of index parameters are used, an appropriate combination of the two or more types of index parameters can be selected by processing such as correlation rule analysis or clustering.

（２）次いで、第１データを、第２データに基づいて構築された学習モデルで処理して、疾病の発症リスク又は再発リスクを予測するステップ（Ｓ２）が行われる。 (2) Next, the first data is processed with a learning model constructed based on the second data to predict the risk of disease onset or recurrence (S2).

〔学習モデルの構築方法〕
ここで、まず、図３を参照して、ステップ（Ｓ２）に用いられる学習モデルの構築方法にかかるステップ（Ｓ０）について説明する。図３は、ステップ（Ｓ０）を説明するためのフローチャートである。 [How to build a learning model]
Here, first, with reference to FIG. 3, step (S0) concerning the construction method of the learning model used for step (S2) will be described. FIG. 3 is a flowchart for explaining the step (S0).

なお、この学習モデルを構築するステップ（Ｓ０）は、既に説明したステップ（Ｓ１）に先行して行うこともできる。 Note that the step (S0) of building the learning model can be performed prior to the step (S1) already described.

図３に示されるように、まず、複数のビットストリングを含む第２データを準備するステップ（Ｓ０−１）が行われる。 As shown in FIG. 3, first, a step of preparing second data including a plurality of bit strings (S0-1) is performed.

具体的には、疾病が主要有害イベントである場合には、既に説明した「心筋マーカーに非由来の複数の指標パラメータからなる群」に加えられた群から選択頻度に基づいて選択された２種以上の指標パラメータに基づくビットストリングである第２データを生成して、準備する。 Specifically, when the disease is a major adverse event, two types selected based on the selection frequency from the group added to the “group consisting of a plurality of index parameters not derived from the myocardial marker” already described. Second data that is a bit string based on the above index parameters is generated and prepared.

第２データは、学習モデルを構築するための教師データである。ここでは疾病が主要有害イベントである場合の第２データ及びその生成について説明する。 The second data is teacher data for constructing a learning model. Here, the second data when the disease is a major adverse event and the generation thereof will be described.

まず、予め収集された症例データ群を準備する。ここでは症例データ群は、被検体である患者にかかる複数の症例データを収集することにより構成されたデータ群である。 First, a case data group collected in advance is prepared. Here, the case data group is a data group configured by collecting a plurality of case data relating to a patient as a subject.

症例データ群は、所定期間内における疾病の発症又は再発の有無が判明している複数の症例データ、すなわち所定期間内に疾病を発症したか、又は再発した「発症又は再発あり」と分類される「発症又は再発あり」症例データ及び所定期間内に疾病を発症又は再発しなかった「発症又は再発なし」と分類される「発症又は再発なし」症例データを含む。 The case data group includes a plurality of case data for which the presence or absence of onset or recurrence of the disease within the predetermined period is known: “with onset or recurrence” case data, for cases in which the disease developed or recurred within the predetermined period, and “without onset or recurrence” case data, for cases in which it did not.

よって、症例データ群は、複数の「発症又は再発あり」症例データからなる「発症又は再発あり」症例データ群と、複数の「発症又は再発なし」症例データからなる「発症又は再発なし」症例データ群とから構成される。 Accordingly, the case data group consists of a “with onset or recurrence” case data group, made up of a plurality of “with onset or recurrence” case data, and a “without onset or recurrence” case data group, made up of a plurality of “without onset or recurrence” case data.

次に、症例データ群、すなわち、疾病の「発症又は再発あり」症例データ群及び「発症又は再発なし」症例データ群それぞれを、（ｉ）学習モデル構築用の症例データ群と（ｉｉ）評価用の症例データ群とに分割する。これらのうち、分割された「学習モデル構築用の症例データ」を用いて第２データを生成させる。 Next, each of the “with onset or recurrence” case data group and the “without onset or recurrence” case data group is divided into (i) a case data group for constructing learning models and (ii) a case data group for evaluation. The second data is generated from the resulting “case data for learning model construction”.

ここで「学習モデル構築用の症例データ群」に含まれる「発症又は再発あり」症例データの数と「発症又は再発なし」症例データの数とが同等ではなく偏りがある場合には、学習モデルの構築にあたり学習バイアスが生じてしまうおそれがある。 If the number of “with onset or recurrence” case data and the number of “without onset or recurrence” case data in the “case data group for learning model construction” are unequal, a learning bias may arise when the learning models are constructed.

よって、このように偏りがある場合には「発症又は再発あり」症例データの数と「発症又は再発なし」症例データの数とを同程度に揃える均等化処理を行って、学習モデル構築用の「均等化済み症例データ群」を調製する。具体的には、例えば「発症又は再発あり」症例データの数が「発症又は再発なし」症例データの数よりも少ない場合には、「発症又は再発あり」症例データの数と同一数の「発症又は再発なし」症例データを抽出して双方の数を揃える均等化処理を行うことが好ましい。 When such an imbalance exists, an equalization process is therefore performed to bring the two counts to about the same level, yielding an “equalized case data group” for learning model construction. Specifically, if, for example, there are fewer “with onset or recurrence” case data than “without onset or recurrence” case data, it is preferable to extract as many “without onset or recurrence” case data as there are “with onset or recurrence” case data, so that the two counts match.
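
The equalization process described above amounts to undersampling the larger class. The following is a minimal sketch of that idea, not the implementation used in the embodiment; the function name `equalize`, the fixed random seed, and the case IDs are assumptions made for illustration.

```python
import random

def equalize(with_event, without_event, seed=0):
    """Undersample the larger class so that both classes have the same size.

    `with_event` / `without_event` are lists of case identifiers
    (hypothetical names; the source does not prescribe a data layout).
    """
    rng = random.Random(seed)
    n = min(len(with_event), len(without_event))
    # Draw n cases from each class without replacement.
    return rng.sample(with_event, n), rng.sample(without_event, n)

# Hypothetical example: 3 cases with recurrence, 8 cases without.
with_event = ["PT1", "PT4", "PT7"]
without_event = ["PT2", "PT3", "PT5", "PT6", "PT8", "PT9", "PT10", "PT11"]
balanced_with, balanced_without = equalize(with_event, without_event)
```

After the call, both returned lists contain three case IDs, so the two classes contribute equally to learning model construction.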

次に、均等化済み症例データ群を、さらに複数の群に分割する。分割された群それぞれに含まれる症例データの数は同程度であればよく、同一とすることが好ましい。分割後の群の総数は特に限定されないが、例えば４程度とすることが好ましい。 Next, the equalized case data group is further divided into a plurality of groups. The number of case data included in each of the divided groups may be approximately the same, and is preferably the same. The total number of groups after division is not particularly limited, but is preferably about 4, for example.

次いで、分割された複数の均等化済み症例データ群のうちの一部を用いて「部分データ」を生成する。例えば４つの均等化済み症例データ群に分割された場合には、そのうちの３つの群（７５％）を用いて「部分データ」を生成させればよい。なお、この場合、残りの１つの群（２５％）は、「評価用データ」とされる。 Next, “partial data” is generated using a part of the divided plurality of equalized case data groups. For example, when the data is divided into four equalized case data groups, “partial data” may be generated using three groups (75%) of them. In this case, the remaining one group (25%) is set as “evaluation data”.
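
The four-way split with one group held out can be sketched as follows. This is an illustrative sketch only: the helper names `split_into_groups` and `holdout_split`, the shuffling step, and the twelve case IDs are assumptions, not details taken from the source.

```python
import random

def split_into_groups(cases, n_groups=4, seed=0):
    """Shuffle the equalized cases and deal them into n_groups groups of similar size."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

def holdout_split(groups, eval_index):
    """Use one group as evaluation data and merge the rest for partial-data generation."""
    construction = [c for i, g in enumerate(groups) if i != eval_index for c in g]
    return construction, list(groups[eval_index])

cases = [f"PT{i}" for i in range(1, 13)]   # 12 hypothetical equalized cases
groups = split_into_groups(cases)          # 4 groups of 3 cases each
construction_cases, evaluation_cases = holdout_split(groups, eval_index=3)
```

With twelve cases, nine (75%) go to partial-data generation and three (25%) are kept as evaluation data, matching the proportions given above.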

そして、この生成した複数の部分データ及び部分データの生成に用いられた元データである完全データ（症例データ）を含むデータ群である第２データを用いて、学習モデルが構築される（詳細は後述する。）。 The learning models are then constructed using the second data, a data group that includes the generated plurality of partial data and the complete data (case data) from which the partial data were generated (details are described later).

ここで、図４を参照して、疾病が主要有害心イベントである場合の第２データにかかる指標パラメータの例を説明する。
図４は、第２データにかかる指標パラメータの例を示す表である。
図４には、指標パラメータ名及び指標パラメータに加えて、指標パラメータの定義及び単位、並びに指標パラメータの種類が示されている。また、指標パラメータにはＩＤ番号として通し番号（１〜２４）が付されている。 Here, with reference to FIG. 4, an example of an index parameter related to the second data when the disease is a major adverse heart event will be described.
FIG. 4 is a table showing an example of index parameters according to the second data.
In FIG. 4, in addition to the index parameter name and the index parameter, the definition and unit of the index parameter, and the type of the index parameter are shown. The index parameters are assigned serial numbers (1 to 24) as ID numbers.

この場合の第２データについては「主要有害心イベントの発症後所定期間内に主要有害心イベントを再発したか、又は再発しなかったか（所定期間内における主要有害心イベントの再発歴）についての指標パラメータ」の選択は必須である。なお、第１データにおいては、かかる指標パラメータはそもそも存在し得ないため選択されない。 For the second data in this case, selection of the “index parameter indicating whether or not a major adverse cardiac event recurred within the predetermined period after the onset of a major adverse cardiac event (the recurrence history of major adverse cardiac events within the predetermined period)” is mandatory. In the first data, this index parameter is not selected, because it cannot exist in the first place.

この場合の第２データにかかる指標パラメータの群は、「主要有害心イベントの発症後所定期間内に主要有害心イベントを再発したか、又は再発しなかったかについての指標パラメータ」が含まれることを除き、既に説明した第１データにかかる指標パラメータの群と同一とすることができる。 The group of index parameters for the second data in this case can be the same as the group of index parameters for the first data already described, except that it includes the “index parameter indicating whether or not a major adverse cardiac event recurred within the predetermined period after the onset of a major adverse cardiac event”.

第２データにおける指標パラメータの選択において、第１データと同一の指標パラメータからなる群から選択される指標パラメータは、選択頻度に基づいて選択することが好ましい（詳細については後述する。）。 In the selection of the index parameter in the second data, it is preferable to select the index parameter selected from the group consisting of the same index parameters as the first data based on the selection frequency (details will be described later).

図５及び図６を参照して、第２データについて説明する。図５は、部分データを生成するための完全データの構成を説明する模式的な図である。図６は、部分データである第２データの模式的な図である。 The second data will be described with reference to FIGS. FIG. 5 is a schematic diagram for explaining the configuration of complete data for generating partial data. FIG. 6 is a schematic diagram of second data that is partial data.

図５及び図６に示されるように完全データ及びかかる完全データから生成される部分データを含む第２データは、機械学習時に使用される指標パラメータの選択及び症例データの選択をビット列（ビットストリング）として表現しているデータである。
図５に示される例では、部分データを生成するための完全データにおいては、１０種の指標パラメータ及び１０症例のデータを用いている。 As shown in FIG. 5 and FIG. 6, the second data including complete data and partial data generated from the complete data is a bit string (bit string) for selecting an index parameter and selecting case data used during machine learning. It is data expressed as
In the example shown in FIG. 5, the complete data for generating the partial data uses 10 kinds of index parameters and 10 cases of data.

ここで「症例データ」とは、選択された指標パラメータに対応する症例（患者）を特定するためのデータ（パラメータ）である。なお、単一の患者についての複数の症例が、別個の症例データとして存在する場合もありうる。 Here, the “case data” is data (parameters) for specifying a case (patient) corresponding to the selected index parameter. Note that a plurality of cases for a single patient may exist as separate case data.

図５及び図６に示されるように、完全データ及びその部分データを含む第２データにかかるビットストリングは、指標パラメータの選択又は非選択が記述される第１部分ＢＳＰ１と症例データの選択又は非選択が記述される第２部分ＢＳＰ２とにより構成される。この例では第１部分ＢＳＰ１の後に連続的に第２部分ＢＳＰ２が記述されて構成されている。なお、この例では第１部分ＢＳＰ１に１０種の指標パラメータ（ＩＤ：１〜１０）が記述され、第２部分ＢＳＰ２には用いられる１０症例の症例データ（ＩＤ：ＰＴ１〜ＰＴ１０）が記述されている。 As shown in FIG. 5 and FIG. 6, the bit string for the second data including the complete data and the partial data includes the first partial BSP1 in which the selection or non-selection of the index parameter is described and the selection or non-selection of the case data. It consists of a second part BSP2 in which the selection is described. In this example, the second part BSP2 is described continuously after the first part BSP1. In this example, ten index parameters (ID: 1 to 10) are described in the first part BSP1, and case data (ID: PT1 to PT10) of ten cases used are described in the second part BSP2. Yes.

図５及び図６において、最上段の数列はＩＤ番号を示しており、それより下段の数値は指標パラメータの選択又は非選択、並びに症例データの選択又は非選択を表している。 In FIG. 5 and FIG. 6, the uppermost number sequence indicates the ID number, and the numerical values on the lower level indicate selection or non-selection of the index parameter and selection or non-selection of the case data.

なお、図５に示されるビットストリングは、１０種の指標パラメータ及び１０症例の症例データをすべて用いた例を示す完全データであるので、ビットストリングを構成する数値はすべてが「１」で構成されている。ここで、仮に図４に示される指標パラメータと関連づけて考えると、かかるビットストリングは、具体的には（ＩＤ：１〜１０）にかかる指標パラメータが対照されて用いられることを意味している。 Because the bit string shown in FIG. 5 is complete data, an example in which all 10 types of index parameters and all 10 cases of case data are used, every value in the bit string is “1”. If this bit string is related to the index parameters shown in FIG. 4, it means specifically that the index parameters with IDs 1 to 10 are used.

図４に示されるように、得られた検査結果及び／又は生体情報が、例えば所定の成分の含有量、年齢、身長といった定量データ（数値データ）として取得される場合には、そのまま指標パラメータとして用いることができる。 As shown in FIG. 4, when the obtained test result and / or biological information is acquired as quantitative data (numerical data) such as the content, age, and height of a predetermined component, for example, it is used as an index parameter as it is. Can be used.

また、かかる定量データ（数値データ）は、例えば、年齢（高齢者、中高年、青年、少年、幼齢)を勘案して、順序尺度、間隔尺度、比例尺度などに変換して指標パラメータとして用いることができる。 Such quantitative data (numerical data) can also be converted to an ordinal scale, an interval scale, a ratio scale, or the like, taking into account, for example, age group (elderly, middle-aged, young adult, juvenile, child), and then used as index parameters.

また、糖尿病既往歴、高血圧症既往歴、脂質異常症既往歴などの生体情報にかかる属性データ、陽性（レベル）、陰性などの定性データについては、例えば「あり＝１、なし＝０」と２値化するなどして指標パラメータとして用いることができる。 For attribute data relating to biological information such as a history of diabetes, a history of hypertension, a history of dyslipidemia, and qualitative data such as positive (level) and negative, for example, “Yes = 1, No = 0” and 2 It can be used as an index parameter by converting it into a value.
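
The conversions described in the preceding paragraphs can be sketched as two small helpers. This is an illustration only: the function names and, in particular, the age cutoffs are hypothetical, since the source names the age categories but gives no boundaries.

```python
def binarize(present):
    """Encode a qualitative attribute as "present = 1, absent = 0"."""
    return 1 if present else 0

def age_to_ordinal(age, cutoffs=(15, 25, 45, 65)):
    """Map an age in years to an ordinal level 0..4.

    The cutoffs are hypothetical; the five levels stand for the child,
    juvenile, young adult, middle-aged, and elderly categories.
    """
    # Count how many cutoffs the age has reached.
    return sum(age >= c for c in cutoffs)

# Hypothetical subject: 70 years old, history of diabetes, no history of hypertension.
features = {
    "age_level": age_to_ordinal(70),
    "diabetes_history": binarize(True),
    "hypertension_history": binarize(False),
}
```

Quantitative values such as a measured concentration would be used as they are, without either conversion.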

次に、完全データ、及び完全データに基づく部分データを含む第２データの生成について説明する。具体的には、完全データに含まれる複数の指標パラメータ（データ群の列）、及び症例データ（データ群の行）について取捨選択を行い、もとの完全データとは異なる複数の部分データを生成することにより、完全データ、及び完全データに基づく複数の部分データを含むデータ群である第２データを準備する。 Next, generation of second data including complete data and partial data based on the complete data will be described. Specifically, multiple index parameters (data group columns) and case data (data group rows) included in complete data are selected, and multiple partial data different from the original complete data are generated. Thus, the second data which is a data group including the complete data and a plurality of partial data based on the complete data is prepared.

上述のとおり、第２データには、対応する学習モデルの予測性能が十分であることを条件として、複数の部分データに加えて、すべての指標パラメータ及びすべての症例データを含む完全データが含まれていてもよい。 As described above, the second data includes complete data including all index parameters and all case data in addition to a plurality of partial data, provided that the prediction performance of the corresponding learning model is sufficient. It may be.

ここで、図６を参照して、部分データ及び部分データの生成について説明する。
ここでは、図５を参照して既に説明した第２データを構成し得る完全データ（ビットストリング）に基づく部分データ（ビットストリング）及びその生成ステップについて説明する。 Here, the generation of partial data and partial data will be described with reference to FIG.
Here, partial data (bit string) based on complete data (bit string) that can constitute the second data already described with reference to FIG. 5 and generation steps thereof will be described.

図６に示されるように、この例では１０種の指標パラメータのうち６種（ＩＤ＝１、３、４、６、７及び９）が選択され、４種（ＩＤ＝２、５、８及び１０）が非選択とされるとともに、１０症例の症例データのうち４症例（ＩＤ＝ＰＴ１、ＰＴ４、ＰＴ７及びＰＴ１０）が選択され、６症例（ＩＤ＝ＰＴ２、ＰＴ３、ＰＴ５、ＰＴ６、ＰＴ８及びＰＴ９）が非選択とされている。 As shown in FIG. 6, in this example, 6 types (ID = 1, 3, 4, 6, 7, and 9) are selected from 10 types of index parameters, and 4 types (ID = 2, 5, 8, and 8) are selected. 10) is not selected, and 4 cases (ID = PT1, PT4, PT7 and PT10) are selected from the case data of 10 cases, and 6 cases (ID = PT2, PT3, PT5, PT6, PT8 and PT9) are selected. ) Is not selected.
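
The selection described for FIG. 6 can be written out as a bit string in which the first part (BSP1) flags the index parameters and the second part (BSP2) flags the cases. The sketch below is illustrative; the function name `encode_bit_string` and the list-based representation are assumptions.

```python
def encode_bit_string(all_params, all_cases, selected_params, selected_cases):
    """Return BSP1 followed by BSP2 as a list of 0/1 values."""
    bsp1 = [1 if p in selected_params else 0 for p in all_params]  # index parameters
    bsp2 = [1 if c in selected_cases else 0 for c in all_cases]    # case data
    return bsp1 + bsp2

all_params = list(range(1, 11))               # index parameter IDs 1..10
all_cases = [f"PT{i}" for i in range(1, 11)]  # case IDs PT1..PT10

# Partial data from the example: parameters 1, 3, 4, 6, 7, 9 and cases PT1, PT4, PT7, PT10.
partial = encode_bit_string(all_params, all_cases,
                            {1, 3, 4, 6, 7, 9},
                            {"PT1", "PT4", "PT7", "PT10"})

# Complete data: every parameter and every case selected, so every bit is 1.
complete = encode_bit_string(all_params, all_cases, set(all_params), set(all_cases))

partial_text = "".join(map(str, partial))     # "10110110101001001001"
```

The first ten characters of `partial_text` are BSP1 and the last ten are BSP2.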

部分データは、具体的には、例えば、選択された指標パラメータ及び学習モデルの組み合わせを特に考慮することなく、選択結果が重複しないようにランダムに選択して複数の部分データを生成させることにより得ることができる。そして、得られた複数の部分データを第２データに含める処理が行われる。 Specifically, for example, the partial data is obtained by generating a plurality of partial data by randomly selecting the selection results so as not to overlap without particularly considering the combination of the selected index parameter and the learning model. be able to. Then, a process of including the obtained partial data in the second data is performed.
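
Random generation of non-duplicate partial data can be sketched as below. The function name, the seed, and the minimum of one selected case are assumptions for illustration; the minimum of two selected index parameters follows the "two or more types" requirement stated earlier.

```python
import random

def generate_partial_bit_strings(n_params, n_cases, n_strings,
                                 seed=0, min_params=2, min_cases=1):
    """Randomly draw distinct bit strings (BSP1 + BSP2) that differ from the complete data.

    Assumes n_strings is small relative to the number of possible bit strings.
    """
    rng = random.Random(seed)
    length = n_params + n_cases
    complete = tuple([1] * length)
    seen = set()
    result = []
    while len(result) < n_strings:
        bits = tuple(rng.randint(0, 1) for _ in range(length))
        if bits == complete or bits in seen:
            continue  # skip the complete data and duplicate selections
        if sum(bits[:n_params]) < min_params or sum(bits[n_params:]) < min_cases:
            continue  # too few index parameters or cases selected
        seen.add(bits)
        result.append(list(bits))
    return result

# Three distinct partial-data patterns over 10 index parameters and 10 cases.
bit_strings = generate_partial_bit_strings(n_params=10, n_cases=10, n_strings=3)
```

Each returned list has 20 entries, and no two lists are identical.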

このように指標パラメータ（特徴量）をリサンプリングして得られた複数の部分データを組み入れたデータセットである第２データを得るステップを行うことにより、より大きな分散（Ｖａｒｉａｎｃｅ）を有する学習モデルを得ることができる。 A learning model having a larger variance is obtained by performing the step of obtaining second data that is a data set incorporating a plurality of partial data obtained by re-sampling the index parameter (feature value) in this way. Can be obtained.

なお、図６に示されるように本実施形態においては、部分データを含む第２データは、複数のビットストリングを含むデータセットとして管理、保存される。図６には、第２データであるデータセットに含まれる３つのパターンを有する部分データのビットストリング（ＢＳ１、ＢＳ２及びＢＳ３）が示されている。複数のビットストリングを含むデータセット（第２データ）にかかる処理の詳細については、後述する。 As shown in FIG. 6, in the present embodiment, the second data including partial data is managed and stored as a data set including a plurality of bit strings. FIG. 6 shows bit strings (BS1, BS2, and BS3) of partial data having three patterns included in the data set that is the second data. Details of the processing relating to a data set (second data) including a plurality of bit strings will be described later.

次に、得られた第２データを用いて、機械学習により学習モデルを構築するステップ（Ｓ０−２）が実施される。 Next, a step (S0-2) of constructing a learning model by machine learning is performed using the obtained second data.

本実施形態で構築される学習モデルには、複数種類の学習モデル、すなわち第１学習モデル及び第２学習モデルが含まれる。 The learning model constructed in the present embodiment includes a plurality of types of learning models, that is, a first learning model and a second learning model.

本実施形態において、第１学習モデルは、「所定期間内の疾病の発症或いは再発あり」又は「不明」を予測する学習モデルである。また、第２学習モデルは、「所定期間内の疾病の発症或いは再発なし」又は「不明」を予測する学習モデルである。 In the present embodiment, the first learning model is a learning model that predicts “the occurrence or recurrence of a disease within a predetermined period” or “unknown”. The second learning model is a learning model that predicts “no disease onset or recurrence within a predetermined period” or “unknown”.

以下、ステップ（Ｓ０−２）について具体的に説明する。
ここでは、第２データに含まれ得る完全データ、複数の部分データそれぞれを教師データとして用い、機械学習により複数の学習モデル、すなわち、複数の第１学習モデル及び複数の第２学習モデルを構築する。 Hereinafter, step (S0-2) will be specifically described.
Here, complete data that can be included in the second data and each of the plurality of partial data are used as teacher data, and a plurality of learning models, that is, a plurality of first learning models and a plurality of second learning models are constructed by machine learning. .

本実施形態において、第１学習モデル及び第２学習モデルは、サポートベクターマシン（ＳＶＭ）であることが好ましい。また、サポートベクターマシン以外の手段として、例えばニューラルネットワークなどの他の手段を用いることもできる。 In the present embodiment, the first learning model and the second learning model are preferably support vector machines (SVM). Also, as means other than the support vector machine, other means such as a neural network can be used.

本実施形態において、学習モデルを構築するための機械学習に用いられ得るサポートベクターマシンの例としては、ウェブサイト（ｈｔｔｐｓ：／／ｃｒａｎ．ｒ−ｐｒｏｊｅｃｔ．ｏｒｇ／ｗｅｂ／ｐａｃｋａｇｅｓ／ｅ１０７１）にて入手可能である「Ｒ言語（ｈｔｔｐｓ：／／ｗｗｗ．Ｒ−ｐｒｏｊｅｃｔ．ｏｒｇ）のｅ１０７１パッケージ」に基づくサポートベクターマシンが挙げられる。 In this embodiment, an example of a support vector machine that can be used for machine learning to construct a learning model is obtained from a website (https://cran.r-project.org/web/packages/e1071). A support vector machine based on the "e1071 package of R language (https://www.R-project.org)" is possible.

ここで、学習モデルは、第２データを複数のデータ群に分割し、該データ群それぞれを教師データとして用い、かつ複数の該データ群ごとに異なる複数のパラメータ条件で行われる機械学習により構築されることが好ましい。 Here, the learning model is constructed by machine learning performed by dividing the second data into a plurality of data groups, using each of the data groups as teacher data, and performing a plurality of parameter conditions different for each of the plurality of data groups. It is preferable.

本実施形態において、機械学習にはカーネル関数としてＲＢＦカーネル（ガウスカーネル）を用いたサポートベクターマシンを用いることができる。 In this embodiment, a support vector machine using an RBF kernel (Gauss kernel) as a kernel function can be used for machine learning.
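
For reference, the RBF (Gaussian) kernel in its common form, K(u, v) = exp(-γ·||u - v||²), can be computed as follows. This is a minimal sketch of the kernel function alone, not of a support vector machine; the function name and the example vectors are assumptions.

```python
import math

def rbf_kernel(u, v, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

# Identical feature vectors give the maximum similarity of 1.0.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.01)

# Vectors at distance 5 (squared distance 25) with gamma = 0.01 give exp(-0.25).
apart = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.01)
```

A larger γ makes the similarity fall off faster with distance, which is why γ controls the complexity of the decision boundary.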

機械学習は、具体的には、異なる複数のパラメータ条件で行われる。この「異なる複数のパラメータ条件」は、例えばランダムサンプリングにより最適な調整係数（ハイパーパラメータ）を調整することにより設定することができる。 Specifically, the machine learning is performed under a plurality of different parameter conditions. The “different parameter conditions” can be set by adjusting an optimum adjustment coefficient (hyper parameter) by, for example, random sampling.

例えば、上記のとおりガウスカーネルを採用したサポートベクターマシンを用いる場合には、調整係数であるγパラメータ及びＣパラメータを調整することにより、同一の部分データから複数の異なる学習モデルを構築することができる。 For example, when using a support vector machine employing a Gaussian kernel as described above, a plurality of different learning models can be constructed from the same partial data by adjusting the γ parameter and the C parameter, which are adjustment coefficients. .

この場合、例えば、識別境界線の複雑さを調節するγパラメータとして、γ＝０．０１、γ＝０．０２、γ＝０．０３、γ＝０．０４、γ＝０．０５の５種類を使用し、ソフトマージンの許容パラメータＣについては機械学習時の性能評価の際に用いられる識別関数（後述する。）によって代替されるため固定して、Ｃ＝１００として学習モデルを構築することができる。 In this case, for example, five parameters, γ = 0.01, γ = 0.02, γ = 0.03, γ = 0.04, and γ = 0.05, are used as γ parameters for adjusting the complexity of the identification boundary line. , And the allowable parameter C of the soft margin is replaced by a discriminant function (described later) used in performance evaluation during machine learning, so that the learning model can be constructed with C = 100. it can.
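
The parameter conditions in this example can be enumerated as one training configuration per combination of partial data and γ value, with C fixed at 100. The sketch below only builds the list of configurations; the dictionary keys and the identifiers BS1 to BS3 are hypothetical, and the actual training call (for example, with an SVM library) is deliberately left out.

```python
from itertools import product

GAMMAS = [0.01, 0.02, 0.03, 0.04, 0.05]  # the five gamma values given in the text
COST = 100                               # C is fixed at 100

def build_training_configs(partial_data_ids):
    """Return one configuration per (partial data, gamma) pair."""
    return [
        {"partial_data": pid, "kernel": "rbf", "gamma": gamma, "C": COST}
        for pid, gamma in product(partial_data_ids, GAMMAS)
    ]

# Three hypothetical partial-data bit strings yield 3 x 5 = 15 configurations.
configs = build_training_configs(["BS1", "BS2", "BS3"])
```

Each configuration would then produce one learning model, so a single partial data set gives five different models.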

調整係数を例えば上記のように設定することにより、より大きな分散（Ｖａｒｉａｎｃｅ）を有する学習モデルを得ることができる。 For example, by setting the adjustment coefficient as described above, a learning model having a larger variance can be obtained.

サポートベクターマシンは、本来は分散が小さくなるように構築される学習モデルである。しかしながら、本実施形態では、あえて分散の大きいサポートベクターマシンを構築し、後述する処理をさらに行っている。 A support vector machine is by nature a learning model constructed so that its variance is small. In the present embodiment, however, support vector machines with deliberately large variance are constructed and then subjected to the further processing described later.

本実施形態によれば、あえて分散の大きいサポートベクターマシンを構築することで、結果として、予測精度をより向上させることができる。以下の説明において、特に断らない限り、学習モデルとしてサポートベクターマシンを用いる処理について説明する。 According to the present embodiment, as a result, prediction accuracy can be further improved by constructing a support vector machine having a large variance. In the following description, processing using a support vector machine as a learning model will be described unless otherwise specified.

Next, a step (S0-3) of processing the evaluation data with the learning models and evaluating the prediction results by Pareto rank is performed.

In this step (S0-3), the performance of the first learning model and the second learning model is evaluated by Pareto rank, with sensitivity and positive predictive value as indices. This evaluation step is described in detail below.

1) First, prediction results are obtained for all learning models using the constructed learning models (first learning model and second learning model) and the evaluation data already described.

2) When dealing with a classification problem that presupposes a predictable region and an unpredictable region, it is important to keep the degree of misclassification as small as possible while making the number of correctly predictable cases as large as possible. The obtained prediction results are therefore evaluated using an objective function O₁ (a function for measuring prediction error within the predictable region) and an objective function O₂ (a function for counting the number of data that can be predicted correctly in the data space).

The objective functions O₁ and O₂ are now described, taking as an example the evaluation of a first learning model whose prediction target is "onset or recurrence of disease" and which uses a support vector machine as the discriminant function.

As a premise, in evaluation data (x, y), x is the set of values measured for a given case and y is the correct label, which takes one of two signed values: "onset or recurrence of disease = +1" or "no onset or recurrence of disease = −1".

Let f be the discriminant function of a first learning model constructed by learning with predetermined second data. In line with this model's classification target, "onset or recurrence of disease", the predicted value f(x) calculated when x is input to f is interpreted as follows: if f(x) > 0, "onset or recurrence of disease" is predicted; if f(x) < 0, the result is "unknown". Conversely, for a second learning model, whose classification target is "no onset or recurrence of disease", "no onset or recurrence of disease" is predicted if f(x) < 0, and the result is "unknown" if f(x) > 0.
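The one-sided decision rules just described can be sketched as follows. The function names and the string labels are illustrative assumptions. The description does not say how f(x) = 0 is handled, so this sketch treats it as "unknown".

```python
def first_model_predict(fx):
    """First learning model: commits only to 'present' (onset or recurrence)."""
    # f(x) > 0 lies in the predictable region; everything else is "unknown".
    return "present" if fx > 0 else "unknown"


def second_model_predict(fx):
    """Second learning model: commits only to 'absent' (no onset or recurrence)."""
    # f(x) < 0 lies in the predictable region; everything else is "unknown".
    return "absent" if fx < 0 else "unknown"
```

Neither model ever outputs the opposite class: a first learning model cannot answer "absent", and a second learning model cannot answer "present".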

The first learning model of this embodiment, which predicts "onset or recurrence of disease" or "unknown", always predicts "onset or recurrence of disease" for data lying inside its predictable region and returns "unknown" for data lying outside it.

Consequently, whenever data that should be judged "no onset or recurrence of disease" lies inside the predictable region of the first learning model, the prediction always fails.

The evaluation by the objective functions O₁ and O₂ in the predictable region for "onset or recurrence of disease" of the first learning model is described next.

Note that the evaluation by the objective function O₁ generalizes to any learning model based on a two-group discriminant function, even when something other than a support vector machine is used. Specifically, when a linear discriminant function, a quadratic discriminant function, or a logistic discriminant function is used, for example, the evaluation can be performed using the distance from mispredicted data to the decision boundary, as with a support vector machine.

The objective function O₁ concerns minimizing the sum of the distances (SVM Confidence Margin) from mispredicted data to the decision boundary.

For evaluation data (x, y), the SVM Confidence Margin is defined as the product y f(x) of the correct label y and the predicted value f(x) calculated by the discriminant function f.

When evaluation data is correctly predicted as "onset or recurrence of disease", the SVM Confidence Margin is the product of a predicted value f(x) > 0 and the correct label "onset or recurrence of disease = +1", and is therefore positive. In contrast, for a misprediction inside the predictable region, that is, when f(x) > 0 and "onset or recurrence of disease" is predicted although the correct label y is "no onset or recurrence of disease = −1", the product of the predicted value and the correct label is negative.

Likewise, for the second learning model, when evaluation data is correctly predicted as "no onset or recurrence of disease", the SVM Confidence Margin is the product of a predicted value f(x) < 0 and the correct label "no onset or recurrence of disease = −1", and is therefore positive; it is negative for a misprediction.

The minimization of the objective function O₁ based on the SVM Confidence Margin is expressed by the following equation (1).

[Equation (1)]

In equation (1), m(y, f(x)) is given by

[Definition of m(y, f(x))]

where abs[x] denotes the absolute value of x. That is, equation (1) serves to sum, over the SVM Confidence Margins, only the degree of misclassification within the predictable region.

When a first learning model whose prediction target is "onset or recurrence of disease" is evaluated, only the predicted distances from the decision boundary of evaluation data for which "onset or recurrence of disease" data were predicted as negative examples (f(x_i) < 0) are summed.
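Equation (1) itself is not reproduced in this text, so the following is only a sketch of one reading that is consistent with the surrounding description: O₁ sums abs[y f(x)] over evaluation data whose SVM Confidence Margin y f(x) is negative. The function names and the optional `in_scope` predicate, which restricts the sum to a chosen region, are assumptions introduced for illustration.

```python
def confidence_margin(y, fx):
    """SVM Confidence Margin: product of the correct label y (+1 or -1) and f(x)."""
    return y * fx


def objective_o1(samples, in_scope=lambda y, fx: True):
    """Sum of abs(y * f(x)) over mispredicted data (negative margin).

    `samples` is an iterable of (y, f(x)) pairs. `in_scope` is a hypothetical
    filter that limits which data are counted, e.g. a predictable region.
    Smaller values mean less severe misclassification.
    """
    return sum(
        abs(confidence_margin(y, fx))
        for y, fx in samples
        if confidence_margin(y, fx) < 0 and in_scope(y, fx)
    )


# Toy (y, f(x)) pairs, invented for illustration only.
toy = [(+1, 0.5), (-1, 0.25), (+1, -1.5), (-1, -2.0)]
total_error = objective_o1(toy)
# Restrict to the first model's predictable region, f(x) > 0.
region_error = objective_o1(toy, in_scope=lambda y, fx: fx > 0)
```

Correctly predicted data contribute nothing, and a mispredicted point contributes more the further it lies from the decision boundary.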

Next, the objective function O₂ concerns maximizing the number of evaluation data correctly predicted as "onset or recurrence of disease" while tolerating mispredictions to some extent.

Whether a prediction by the discriminant function f is correct is expressed, using the correct label y and the predicted value f(x), by the following equation (2).

[Equation (2)]

The maximization of the objective function O₂ when predictions are made for k evaluation data is expressed by the following equation (3).

[Equation (3)]

In equation (3), the second term on the right-hand side represents regularization by the total number of "no onset or recurrence of disease" data within the predictable region.

In equation (3), α (0 < α < 1), a variable that adjusts the tolerance for misprediction, is preferably set to α = 0.3.
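Equations (2) and (3) are not reproduced in this text. The sketch below therefore assumes one plausible form for a first learning model: the count of data correctly predicted as "onset or recurrence", minus α times the count of "no onset or recurrence" data that fall inside the predictable region f(x) > 0. The function name and this exact form are assumptions, not the equations of the publication.

```python
def objective_o2(samples, alpha=0.3):
    """Assumed form of O2 for a first learning model.

    `samples` is an iterable of (y, f(x)) pairs with y in {+1, -1}.
    First term: data with y = +1 predicted as positive (f(x) > 0).
    Second term: penalty for y = -1 data inside the predictable region.
    """
    correct_positive = sum(1 for y, fx in samples if y == +1 and fx > 0)
    negatives_in_region = sum(1 for y, fx in samples if y == -1 and fx > 0)
    return correct_positive - alpha * negatives_in_region


# Toy (y, f(x)) pairs, invented for illustration only.
toy = [(+1, 0.5), (+1, 1.2), (-1, 0.3), (-1, -0.7), (+1, -0.2)]
score = objective_o2(toy)  # 2 correct positives, 1 penalized negative
```

With α = 0.3, one wrongly covered negative case costs less than one correctly covered positive case gains, which is how mispredictions are tolerated to some extent.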

In this way, the prediction results of the learning models are evaluated by Pareto rank.

Next, a step (S0-4) of selecting highly rated learning models and the bit strings from which those highly rated learning models could be constructed is performed.

Specifically, the learning models (first learning model and second learning model) for which the evaluation values (O₁, 1/O₂) given by the objective functions O₁ and O₂ already described were smaller, and the bit strings (second data) from which those learning models could be constructed, are selected.
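Ranking models on the pair (O₁, 1/O₂), where smaller is better on both axes, can be sketched with a simple non-dominated sorting routine. This is a generic illustration, not the procedure of this embodiment. The function names and the toy evaluation values are assumptions.

```python
def dominates(a, b):
    """True if point a is at least as good as b on every axis and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points):
    """Assign rank 1 to the non-dominated front, rank 2 to the next front, etc."""
    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    rank = 1
    while remaining:
        # A point is in the current front if no remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Toy (O1, 1/O2) values for four models, invented for illustration only.
evaluations = [(1.0, 0.5), (2.0, 0.25), (2.0, 0.5), (3.0, 1.0)]
ranks = pareto_ranks(evaluations)
```

Models with rank 1 are those that no other model beats on both evaluation values at once, so they are the natural candidates for selection.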

Step (S0-4) is now described in detail.

First, a plurality of evaluation data are prepared that have the same index-parameter structure as the first data already described but whose index-parameter values do not coincide with those of the first data.

As the evaluation data used for selecting learning models, it is possible to use, for example, case data belonging to those of the divided equalized case data groups that were not used to generate the second data (partial data).

Next, the evaluation data are processed by each of the plurality of first learning models and the plurality of second learning models to predict the onset risk or recurrence risk of the disease within the predetermined period.

Then, for the obtained prediction results, the first learning model and the second learning model are evaluated by Pareto rank, with sensitivity and positive predictive value as indices, using the objective functions O₁ and O₂ already described.

Based on the evaluation results, the learning models (first learning model and second learning model) with both high sensitivity and high positive predictive value, that is, the highly rated learning models, and the second data (bit strings) from which those learning models could be constructed are selected and stored.

The number of learning models and corresponding bit strings to be selected can be set to any suitable number in view of the time required, the scale of implementation, and the like. In the embodiment already described, about 40 is preferable.

In the step (S0-4) of selecting the highly rated learning models (first learning model and second learning model) and the corresponding bit strings (second data), the sensitivity is preferably 1 or less and at least 0.95, at least 0.7, or at least 0.6, and the false positive rate is preferably 0 or more and at most 0.4, at most 0.3, or at most 0.05.
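The sensitivity and false positive rate thresholds can be checked with the standard confusion-matrix definitions. The sketch below assumes the loosest pair of thresholds listed above (sensitivity at least 0.6, false positive rate at most 0.4) as defaults. The function names and the counts in the example are illustrative.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives that were predicted positive."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Fraction of actual negatives that were predicted positive."""
    return false_positives / (false_positives + true_negatives)


def meets_selection_criteria(sens, fpr, min_sensitivity=0.6, max_fpr=0.4):
    """True if a model satisfies both thresholds (defaults: loosest listed values)."""
    return sens >= min_sensitivity and fpr <= max_fpr


# Toy counts, invented for illustration only.
sens = sensitivity(true_positives=7, false_negatives=3)
fpr = false_positive_rate(false_positives=1, true_negatives=9)
selected = meets_selection_criteria(sens, fpr)
```

Stricter selection only requires passing tighter thresholds, for example `min_sensitivity=0.95` and `max_fpr=0.05`.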

The result of analyzing how frequently each index parameter is selected in the learning models chosen by this step of selecting highly rated first and second learning models can be used to select the index parameters for constructing the first data and the second data.

Specifically, if index parameters with a high selection frequency are selected in advance when generating the first data and the second data for constructing the first and second learning models, the time required to carry out the prediction method can be shortened and the prediction accuracy can be further improved.

In addition, by analyzing the combinations of index parameters adopted in the bit strings from which highly rated first and second learning models could be constructed, combinations of index parameters that can further improve prediction accuracy can be found.

If the combinations of index parameters found in this way are selected in advance when generating the first data and the second data, the time required to carry out the prediction method can be shortened and the prediction accuracy can be further improved.

Next, a step (S0-5) of determining whether the learning models satisfy predetermined requirements is performed.
Specifically, it is determined whether the learning models (first learning model and second learning model) selected in step (S0-4) satisfy the predetermined requirements.

In step (S0-5), specifically, it is determined whether the first and second learning models satisfy predetermined requirements: for example, whether the performance of the bit strings, that is, the prediction accuracy of the selected first and second learning models, meets a predetermined criterion (for example, whether the improvement rate of the prediction accuracy falls below 0.1%), and whether the number of generations over which the first and second learning models (bit strings) have been updated has reached an arbitrarily set upper limit (for example, 100 generations).

First, the case in which the first and second learning models selected in step (S0-4) do not satisfy the predetermined requirements in step (S0-5) (the case of "No" in step (S0-5)) is described.

When the selected first and second learning models do not satisfy the predetermined requirements ("No" in step (S0-5); for example, when the determination in the above example finds that the prediction accuracy of the first and second learning models is below 0.1%, and/or that the number of generations of the first and second learning models (bit strings) has not reached 100), a step (S0-6) is performed in which evolutionary processing using a genetic algorithm is applied to the selected bit strings corresponding to those learning models to generate new bit strings.

Note that this bit string generation (optimization) step can be carried out not only by a genetic algorithm but also by, for example, an exhaustive search over all combinations or a random search.

Here, the step of optimizing bit strings using a genetic algorithm is described.

In the step of optimizing bit strings using a genetic algorithm, the bit strings from which learning models with better prediction performance could be constructed are selected, and the second data (data set) containing the selected bit strings is updated and stored.

(1) First, as already described, based on the evaluation of the learning models by the objective functions O₁ and O₂, the bit strings are ranked so that a bit string from which a more highly rated learning model, that is, one with a higher Pareto rank, could be constructed is placed higher.

Specifically, the ranked bit strings are edited: for example, they are sorted so that higher-ranked bit strings come first, and poorly rated bit strings are removed from the data set, taking into account, among other things, the number of bit strings the data set contains. The data set (second data) containing the updated bit strings is then stored so that the bit strings it contains can be read out.

Taking the bit strings shown in FIG. 6 as an example, bit string BS1 is the top-ranked bit string, bit string BS2 is the second-ranked bit string, and bit string BS3 is the third-ranked bit string.

(2) Next, using a genetic algorithm, evolutionary processing such as selection, crossover, introduction of mutations, and bit string evaluation is applied to the ranked bit strings.

In this embodiment, the genetic algorithm processing can be carried out using, for example, NSGA-II (Elitist Non-dominated Sorting Genetic Algorithm), which is a non-dominated sorting genetic algorithm.

The genetic algorithm processing can be performed, for example, with 500 models per generation, an archive size of 125, a mutation rate of 10% per bit string, one-point crossover, and updating for up to 80 generations.
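The two variation operators named above, one-point crossover and mutation, can be sketched on bit strings represented as lists of 0 and 1. This is not a full NSGA-II implementation. The function names are illustrative, and reading "a mutation rate of 10% per bit string" as "with probability 0.1, flip one randomly chosen bit of the string" is an assumption.

```python
import random

MUTATION_RATE = 0.10  # per bit string, from the description


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit strings at one random cut point."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(bits, rate, rng):
    """With probability `rate`, return a copy with one random bit flipped."""
    if rng.random() < rate:
        i = rng.randrange(len(bits))
        return bits[:i] + [1 - bits[i]] + bits[i + 1:]
    return list(bits)


rng = random.Random(0)  # seeded so the sketch is repeatable
a = [1, 1, 1, 1, 0, 0]
b = [0, 0, 0, 0, 1, 1]
child_a, child_b = one_point_crossover(a, b, rng)
child_a = mutate(child_a, MUTATION_RATE, rng)
```

Crossover recombines the parameter selections of two parents without changing the string length, while mutation occasionally switches a single index parameter on or off.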

The bit strings newly generated by the genetic algorithm processing are then incorporated into the original second data to update it to the latest second data (data set), which is stored so that the bit strings it contains can be read out (S0-7).

Next, using the updated second data, the process returns to the step (S0-1) of preparing second data containing a plurality of bit strings, already described, and the steps up to step (S0-5) are repeated. If the determination in step (S0-5) is again "No", steps (S0-1) through (S0-7) are repeated until the determination in step (S0-5) becomes "Yes".

In this way, bit strings from which better learning models can be constructed are selected, and a data set of better second data can be maintained.

In the bit string optimization by the genetic algorithm, variables (index data) are selected at the same time. Specifically, the importance of each variable is evaluated by comparing the variables adopted and not adopted by the second data (bit strings) with better results (prediction accuracy), for example by comparing the selection frequencies of the variables, and the variables judged to be highly important are selected and adopted.
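Comparing selection frequencies across well-performing bit strings can be sketched as below, assuming each bit marks whether the corresponding index parameter is used. The function name, the parameter names, and the bit strings are all hypothetical.

```python
def selection_frequency(bit_strings, parameter_names):
    """Fraction of bit strings in which each index parameter is switched on."""
    count = len(bit_strings)
    return {
        name: sum(bits[i] for bits in bit_strings) / count
        for i, name in enumerate(parameter_names)
    }


# Hypothetical parameter names and selected bit strings, for illustration only.
names = ["param_a", "param_b", "param_c"]
selected_bit_strings = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
frequency = selection_frequency(selected_bit_strings, names)
# Parameters sorted from most to least frequently selected.
ranked = sorted(frequency, key=frequency.get, reverse=True)
```

Parameters near the top of `ranked` are the ones the highly rated models rely on most often, which makes them candidates for preselection in later runs.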

Such variable selection makes it possible to effectively extract the variables considered to contribute more to prediction, or to narrow down the combinations of variables.

When the determination in step (S0-5) is "Yes", that is, when the predetermined requirements already described are satisfied (when the determination in the above example finds that the prediction accuracy of the first and second learning models is 0.1% or more, and/or that the number of generations of the first and second learning models (bit strings) has reached 100), updating of the bit strings (second data) ends, and the learning models based on the final bit strings are selected as the selection learning models (first selection learning model and second selection learning model).

Finally, the first selection learning model and the second selection learning model are stored (S0-8). More specifically, the finally selected first and second selection learning models are stored in a readable state. The second data (bit strings) whose updating has finished is also stored at this point so that it can be read out.
Step (S0) ends when step (S0-8) has been carried out.

Next, step (S2) is described with reference to FIG. 7. FIG. 7 is a flowchart for explaining step (S2).

First, a step (S2-1) is performed in which the first data is processed by the first selection learning models and the second selection learning models, and a first determination result is acquired for each first selection learning model and each second selection learning model.

This step (S2-1) yields a plurality of first determination results, one from each of the plurality of first selection learning models, and a plurality of first determination results from the plurality of second selection learning models.

Next, a step is performed in which a vote is taken over the plurality of first determination results of the first selection learning models and over those of the second selection learning models, and a second determination result based on the vote rate is acquired separately for the first selection learning models and for the second selection learning models. This step is described in detail below.

First, a vote is taken over the plurality of first determination results of the first selection learning models and over the plurality of first determination results of the second selection learning models (S2-2).

Specifically, based on the first determination results obtained, each of the first selection learning models votes for either "onset or recurrence of disease" or "unknown", and each of the second selection learning models votes for either "no onset or recurrence of disease" or "unknown". The votes are tallied separately for the first selection learning models and the second selection learning models.

Next, a vote rate is calculated separately for the first selection learning models and the second selection learning models. The vote rate is then compared with a cutoff value to determine whether the vote rate is equal to or greater than the cutoff value (cutoff value ≤ vote rate) (S2-3). The cutoff value is described in detail later.

If the vote rate is equal to or greater than the set cutoff value ("Yes" in step (S2-3)), a second determination result of "present" for onset or recurrence of the disease within the predetermined period is acquired for the first selection learning models, and a second determination result of "absent" for onset or recurrence of the disease within the predetermined period is acquired for the second selection learning models (S2-4).

If the vote rate is smaller than the set cutoff value ("No" in step (S2-3)), a second determination result of "unknown" for the onset risk or recurrence risk of the disease within the predetermined period is acquired for both the first selection learning models and the second selection learning models (S2-5).

An example of voting results based on the plurality of first determination results of a plurality of first selection learning models is now described with reference to Table 1.

Table 1 shows an example of vote rates and second determination results for the first selection learning models. In this example, a second determination result is obtained for each of three sets of first data concerning adverse events, using four first selection learning models. To obtain the second determination result, the vote rate required to judge recurrence of a major adverse cardiac event within one year as "present" is set to 75% (3/4); that is, the cutoff value is 0.75.

As shown in Table 1, for first data (1), all of the first learning models (1) to (4) judged recurrence of a major adverse cardiac event as "present", so the vote rate for "present" is 100% (4/4). Because this is at least the set cutoff value (cutoff value ≤ vote rate), the second determination result is "present".

For first data (2), first learning models (1) to (3) judged "present" and first learning model (4) judged "unknown", so the vote rate for "present" is 75% (3/4), and the second determination result is "present".

For first data (3), first learning models (1) to (3) judged "unknown" and first learning model (4) judged "present", so the vote rate for "present" is 25% (1/4). Because this is smaller than the set cutoff value (cutoff value > vote rate), the second determination result is "unknown".
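The three cases walked through above can be reproduced with a short voting routine. The votes and the cutoff of 0.75 come from the description of Table 1. The function name and the string labels are illustrative assumptions.

```python
def second_determination(votes, target, cutoff):
    """Return (result, vote_rate) for one ensemble of selection learning models.

    `votes` holds each model's first determination result, either `target`
    or "unknown". The result is `target` when vote_rate >= cutoff.
    """
    vote_rate = sum(1 for vote in votes if vote == target) / len(votes)
    result = target if vote_rate >= cutoff else "unknown"
    return result, vote_rate


CUTOFF = 0.75  # 3 of 4 models, as in the Table 1 example

# Votes of first learning models (1) to (4) for first data (1), (2), (3).
first_data_1 = ["present", "present", "present", "present"]
first_data_2 = ["present", "present", "present", "unknown"]
first_data_3 = ["unknown", "unknown", "unknown", "present"]

results = [
    second_determination(votes, "present", CUTOFF)
    for votes in (first_data_1, first_data_2, first_data_3)
]
```

The same routine covers the second selection learning models by passing `"absent"` as the target.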

The example in Table 1 concerns first learning models, which predict "present" or "unknown". The second determination results of second learning models, which predict "absent" or "unknown", can be explained in the same way by replacing first learning models (1) to (4) in Table 1 with second learning models (1) to (4) and changing "present" to "absent" throughout, including in the second determination results.

In this embodiment, a vote is taken over the first determination results, and a second determination result based on the vote rate is acquired. Prediction accuracy tends to improve, relative to the prediction performance of a single learning model, as the number of learning models used increases. Once the number of learning models reaches a certain number, however, the number of prediction errors tends to increase as further learning models are added.

This suggests that there is an optimum number of learning models. Therefore, to improve prediction accuracy (to reduce the number of prediction errors), one can, for example, observe how the number of prediction errors changes as the number of learning models increases, and use fewer learning models than the number at which the prediction errors begin to increase. In this embodiment, considering not only the increase in prediction errors but also the need to equalize the numbers of votes for the "onset or recurrence" prediction and the "no onset or recurrence" prediction, it is preferable to use at most about 40 first selection learning models and at most about 40 second selection learning models.

In the present embodiment, the cutoff value can be adjusted as appropriate to make the prediction better suited to its purpose. The cutoff value for the vote rate of the first selection learning models and the cutoff value for the vote rate of the second selection learning models can be set to different values.

For exclusion diagnosis targeting major adverse cardiac events or acute kidney injury, as described above, false negatives, that is, cases in which onset occurred despite a prediction of no onset, must be eliminated as far as possible. Setting the cutoff value of the first learning model smaller than that of the second learning model, in other words setting the cutoff value of the second learning model larger than that of the first learning model, lowers the false negative rate. This yields a prediction method better suited to “exclusion diagnosis.”

Conversely, setting the cutoff value of the first learning model larger than that of the second learning model lowers the false positive rate. This yields a prediction method better suited to “definitive diagnosis,” as when diabetic nephropathy is the prediction target.

The cutoff value is adjusted as appropriate in view of the target disease, the seriousness of an incorrect prediction for that disease, and similar factors.

In the present embodiment, both the prediction of the first learning model (onset or recurrence) and the prediction of the second learning model (no onset or recurrence) are decided by voting. Consequently, if the cutoff value is too high, every prediction may be “unknown” regardless of the first data applied, and if the cutoff value is too low, every prediction may instead be “onset or recurrence.” In view of this, the cutoff value in the present embodiment is preferably set in the range of 0.3 to 0.7.
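The relationship between the vote rate and the cutoff value can be sketched as follows. The function name `second_determination`, the label strings, and the use of "greater than or equal to" for the comparison are assumptions for illustration; the description states only that the second determination result is based on the vote rate and a cutoff value.

```python
def second_determination(first_results, target_label, cutoff):
    """Derive a second determination result from per-model first results.

    first_results: list of labels, one per selection learning model.
    target_label:  the label this model family can predict, e.g.
                   "onset or recurrence" for the first selection learning
                   models (label strings are illustrative).
    cutoff:        vote-rate threshold (0.3 to 0.7 is preferred in the text).
    """
    if not first_results:
        return "unknown"
    votes = sum(1 for result in first_results if result == target_label)
    vote_rate = votes / len(first_results)
    # Assumption: the target label is adopted when the rate reaches the cutoff.
    return target_label if vote_rate >= cutoff else "unknown"


# Hypothetical vote of 40 models: 16 say "onset or recurrence", 24 say "unknown".
ballots = ["onset or recurrence"] * 16 + ["unknown"] * 24
low_cutoff = second_determination(ballots, "onset or recurrence", 0.35)
high_cutoff = second_determination(ballots, "onset or recurrence", 0.45)
```

With the same ballots, the lower cutoff adopts the target label while the higher cutoff returns “unknown,” which is the effect the cutoff adjustment relies on.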

Specifically, in the present embodiment, which aims at exclusion diagnosis of major adverse cardiac events, it is preferable to set the cutoff value of the first learning model in the range of 0.3 to 0.6 and the cutoff value of the second learning model in the range of 0.4 to 0.7, so as to give a gradient to the adoption rates of the voting results of the first and second learning models.

The evaluation of the cutoff values will be described with reference to FIGS. 8 and 9. FIGS. 8 and 9 are tables showing the evaluation results for the cutoff values.

The example here uses a model with 1231 cases in total, of which 100 are positive cases (patients who had a recurrence of a major adverse cardiac event within one year) and 1131 are negative cases (patients who had no recurrence of a major adverse cardiac event within one year).

A search for the optimal cutoff value of the second learning model, with the cutoff value of the first learning model set to 0.35, found the optimal cutoff value of the second learning model to be 0.45.

As is apparent from FIGS. 8 and 9, with the cutoff value of the first learning model set to 0.35 and that of the second learning model set to 0.45 (P35_N45 in FIG. 9), the sensitivity was 0.77 and the specificity was 0.45.

In FIG. 8, the upper row shows values calculated with cases whose prediction result was “unknown” left uncounted, and the lower row shows values calculated by the conventional method.

In FIG. 8, “YY” denotes the number of cases in which the first learning model determined “recurrence” (Y) and the second learning model determined “no recurrence” (Y), and “UU” denotes the number of cases in which the first learning model determined “unknown” (U) and the second learning model also determined “unknown” (U).

Next, a step (S2-6) of integrating the second determination result of the first selection learning models with the second determination result of the second selection learning models to acquire a third determination result is performed.

In this step (S2-6), the second determination result of the first selection learning models (first learning models) for the classification problem “onset or recurrence of the disease within the predetermined period, or unknown” is integrated with the second determination result of the second selection learning models (second learning models) for the classification problem “no onset or recurrence of the disease within the predetermined period, or unknown.” The result is a third determination result that integrates the two second determination results.

Specifically, for example, (1) when the second determination result of the first selection learning models is “onset or recurrence of the disease” and the second determination result of the second selection learning models is “unknown,” the third determination result is “onset or recurrence of the disease.” (2) When the second determination result of the first selection learning models is “unknown” and that of the second selection learning models is “no onset or recurrence of the disease,” the third determination result is “no onset or recurrence of the disease.”

When the second determination result of the first selection learning models is “onset or recurrence of the disease” and that of the second selection learning models is “no onset or recurrence of the disease,” or when both second determination results are “unknown,” the third determination result is “unknown.”
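The integration rule of step (S2-6) amounts to a small decision table, which can be sketched as follows. The constant and function names are illustrative assumptions; the four input combinations and their outcomes follow the description.

```python
POSITIVE = "onset or recurrence"
NEGATIVE = "no onset or recurrence"
UNKNOWN = "unknown"


def third_determination(first_family_result, second_family_result):
    """Integrate the two second determination results.

    first_family_result:  POSITIVE or UNKNOWN (first selection learning models)
    second_family_result: NEGATIVE or UNKNOWN (second selection learning models)
    """
    if first_family_result == POSITIVE and second_family_result == UNKNOWN:
        return POSITIVE   # case (1)
    if first_family_result == UNKNOWN and second_family_result == NEGATIVE:
        return NEGATIVE   # case (2)
    # Conflicting results or two "unknown" results both yield "unknown".
    return UNKNOWN
```

Treating a conflict between the two model families as “unknown” means that a definite third determination result is issued only when one family commits and the other abstains.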

Next, a step (S2-7) of predicting the onset risk or recurrence risk of the disease based on the third determination result is performed.
Specifically, based on the third determination result obtained in step (S2-6) described above, case (1) is judged as “high risk of onset or recurrence of the disease within the predetermined period,” case (2) is judged as “low risk of onset or recurrence of the disease within the predetermined period,” and in any other case the judgment of the risk of onset or recurrence within the predetermined period is withheld.
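The mapping from the third determination result to the risk judgment in step (S2-7) can be sketched as follows. The function name `predict_risk` and the exact label strings are assumptions chosen for illustration.

```python
HIGH_RISK = "high risk of onset or recurrence within the predetermined period"
LOW_RISK = "low risk of onset or recurrence within the predetermined period"
WITHHELD = "risk judgment withheld"


def predict_risk(third_result):
    """Map a third determination result to a risk judgment."""
    if third_result == "onset or recurrence":
        return HIGH_RISK   # case (1)
    if third_result == "no onset or recurrence":
        return LOW_RISK    # case (2)
    return WITHHELD        # any other result, including "unknown"
```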

As already described, this step (S2-7) is normally performed by a computer. However, the prediction of onset risk or recurrence risk from the third determination result, the determination of hospital visit frequency from the predicted risk, and the like may also be processed according to rule definition files or algorithms based on the knowledge of, for example, physicians or consultants.

A step (S2-8) of calculating the reliability of the predicted onset risk or recurrence risk based on the second determination result described above may also be performed.

Specifically, when the second determination result is acquired, each of the learning models is given, for example, “+1” point if it determined “onset or recurrence,” “−1” point if it determined “no onset or recurrence,” and “0” points if it determined “unknown.” The scores of all models for the same subject are summed, and the ratio of the total score for “onset or recurrence” or the total score for “no onset or recurrence” to that sum is calculated as the reliability (%).

The calculated reliability (%) can take values from −100 (%) (low onset or recurrence risk) to 100 (%) (high onset or recurrence risk).
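One possible reading of the reliability calculation is sketched below. The description does not fix the denominator precisely, so this sketch assumes that the signed score total is divided by the number of models, which reproduces the stated range of −100 (%) to 100 (%). The function name and label strings are likewise assumptions.

```python
SCORES = {
    "onset or recurrence": 1,      # +1 point
    "no onset or recurrence": -1,  # -1 point
    "unknown": 0,                  # 0 points
}


def reliability_percent(first_results):
    """Signed reliability (%) for one subject.

    first_results: one label per learning model for the same subject.
    Assumption: the score total is normalized by the number of models.
    """
    if not first_results:
        return 0.0
    total = sum(SCORES[label] for label in first_results)
    return 100.0 * total / len(first_results)


# Hypothetical subject: 30 models vote positive, 5 negative, 5 unknown.
example = reliability_percent(
    ["onset or recurrence"] * 30
    + ["no onset or recurrence"] * 5
    + ["unknown"] * 5
)
```

Under this assumption, unanimous “onset or recurrence” votes give 100 (%) and unanimous “no onset or recurrence” votes give −100 (%).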

Because this reliability (%) is calculated, as already described, by linking machine-learning analysis of past cases with the onset or recurrence frequency in the data used for learning, it is well suited as an indicator of reliability.

[Prediction device]
The program of the present embodiment is executed by a computer having a calculation unit. The configuration of the computer 10, which is the prediction device, is described below with reference to FIGS. 10 and 11.

FIG. 10 is a schematic block diagram illustrating the configuration of the computer. FIG. 11 is a schematic block diagram illustrating the configuration of the calculation unit.

As shown in FIG. 10, the computer 10 includes a calculation unit 12 that processes instructions such as generating data from acquired parameters and storing that data.

The calculation unit 12 is a functional unit corresponding to, for example, a microprocessor (CPU) or a graphics processor (GPU).

The computer 10 further includes a storage unit 14 that can store input data, generated data, and the like temporarily or for a predetermined period, in a readable state.

The storage unit 14 corresponds to, for example, a memory (RAM) device, a hard disk drive, or an SSD, and is a functional unit configured to cooperate with the calculation unit 12.

Examples of data that can be stored in the storage unit 14 in a readable state include the first data, the second data (complete data, partial data, and updated bit-string data sets), evaluation data, support vector machines, the first learning models, the second learning models, the first selection learning models, and the second selection learning models.

The computer 10 further includes functional units such as an input/output unit 16, which is an interface, for example a serial or parallel connection, for exchanging data with external functional units and devices.

The computer 10 may also be configured so that so-called computer hardware resources, or dedicated hardware resources corresponding to each of these components, are connected to cooperate with the functional units of the computer 10. Such resources function by being connected to the input/output unit 16 and include an input device 22 such as a keyboard or mouse, a display device that can present data visually, a printer that can output generated data to paper or similar media, and a large-capacity readable and writable external storage device 32 that constitutes a database.

Specifically, for example, an external storage device that stores both the database for the first data and the database for the second data described above, or separate external storage devices, one storing the first data as a database and the other storing the second data as a database, may be installed at a remote location physically distant from where the computer described above is installed and connected so as to cooperate with it over a telecommunication line.

Here, “connected by a telecommunication line” means that devices are connected so that they can exchange data, control signals, and the like over a wired or wireless information line using a medium such as electricity or light, and are thereby configured to cooperate.

The prediction device that executes the program of the present embodiment may be a single computer, or a system in which a plurality of computers (including servers, operation terminals, and the like) and other peripheral devices are connected into one unit by telecommunication lines.

Of the processes described in the above embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, information including various data and parameters, screen examples, and database configurations shown in the above description and drawings may be changed as desired unless otherwise noted.

The prediction device of the present embodiment is a prediction device for predicting the onset risk or recurrence risk of a disease within a predetermined period.

As shown in FIG. 11, the prediction device 100 of the present embodiment includes a plurality of functional units that cooperate with one another within the calculation unit 12 described above. The prediction device 100 includes a first data generation acquisition unit 12a, a second data generation acquisition unit 12b, a learning model construction unit 12d, a learning model selection unit 12e, a first determination result generation acquisition unit 12f, a second determination result generation acquisition unit 12g, a third determination result generation acquisition unit 12h, and a prediction unit 12i.

The prediction device 100 according to one embodiment of the present invention includes: a first data generation acquisition unit 12a that acquires two or more index parameters measured using a test sample collected from a subject or obtained from the subject, and generates and acquires first data based on those index parameters; a first determination result generation acquisition unit 12f that processes the first data acquired by the first data generation acquisition unit 12a with the first selection learning models and the second selection learning models, and generates and acquires a first determination result for each of the first selection learning models and the second selection learning models; a second determination result generation acquisition unit 12g that generates and acquires a second determination result based on the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models acquired by the first determination result generation acquisition unit 12f; a third determination result generation acquisition unit 12h that integrates the second determination result of the first selection learning models and the second determination result of the second selection learning models acquired by the second determination result generation acquisition unit 12g, and generates and acquires a third determination result; and a prediction unit 12i that predicts the onset risk or recurrence risk of the disease based on the third determination result acquired by the third determination result generation acquisition unit 12h.

The prediction device 100 according to another embodiment preferably further includes: a second data generation acquisition unit 12b that generates and acquires second data based on two or more index parameters selected, on the basis of selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects in whom the disease did or did not develop or recur within a predetermined period; a learning model construction unit 12d that, based on the second data acquired from the second data generation acquisition unit 12b, constructs a plurality of first learning models that predict “onset or recurrence of the disease” or “unknown” and a plurality of second learning models that predict “no onset or recurrence of the disease” or “unknown”; and a learning model selection unit 12e that selects a plurality of first selection learning models and a plurality of second selection learning models for each of the plurality of first learning models and the plurality of second learning models constructed by the learning model construction unit 12d.

In the prediction device 100 according to yet another embodiment, the second data generation acquisition unit 12b is preferably a functional unit that selects among the plurality of index parameters to generate partial data, thereby generating second data that can include a plurality of partial data and the complete data (case data) used to generate the partial data.

In the prediction device 100 according to still another embodiment, the learning model construction unit 12d is preferably a functional unit that divides the second data acquired from the second data generation acquisition unit 12b into a plurality of data groups and constructs the plurality of first learning models and the plurality of second learning models by machine learning that uses each data group as teacher data and is performed under a plurality of parameter conditions that differ for each data group.
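The construction procedure of the learning model construction unit 12d can be sketched as follows. This is a structural sketch only: the function `build_models`, the interleaved split into data groups, the `train` callback, and the parameter name `"C"` in the usage example are all assumptions for illustration. In practice `train` could wrap a support vector machine, which the description lists among the stored data, but no specific library is implied here.

```python
def build_models(second_data, conditions_per_group, train):
    """Construct one learning model per (data group, parameter condition).

    second_data:          list of teacher-data records.
    conditions_per_group: one list of parameter dicts per data group, so the
                          parameter conditions can differ between groups.
    train:                callable(group, params) -> model (hypothetical hook).
    """
    n_groups = len(conditions_per_group)
    # Assumption: records are dealt out to the groups in round-robin order.
    groups = [second_data[i::n_groups] for i in range(n_groups)]
    models = []
    for group, conditions in zip(groups, conditions_per_group):
        for params in conditions:
            models.append((params, train(group, params)))
    return models


# Usage with a stub trainer that only records what it was given.
def stub_train(group, params):
    return {"n_records": len(group), "C": params["C"]}


models = build_models(
    list(range(10)),
    [[{"C": 1}, {"C": 10}], [{"C": 0.1}]],
    stub_train,
)
```

Each data group yields as many models as it has parameter conditions, which is how a large pool of first and second learning models can be built before selection.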

In the prediction device 100 according to a further embodiment, the second determination result generation acquisition unit 12g is preferably a functional unit that takes a vote over the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models, and generates and acquires a second determination result based on the vote rate for each of the first selection learning models and the second selection learning models.

DESCRIPTION OF SYMBOLS
10 Computer
12 Calculation unit
12a First data generation acquisition unit
12b Second data generation acquisition unit
12d Learning model construction unit
12e Learning model selection unit
12f First determination result generation acquisition unit
12g Second determination result generation acquisition unit
12h Third determination result generation acquisition unit
12i Prediction unit
14 Storage unit
16 Input/output unit
22 Input device
32 External storage device
100 Prediction device

Claims

A program for predicting the onset risk or recurrence risk of a disease within a predetermined period by inputting first data that is measured using a test sample collected from a subject or obtained from the subject, the program comprising the following steps executed by a computer having a calculation unit:
a step in which the calculation unit generates and acquires a plurality of second data based on two or more index parameters selected from a group consisting of a plurality of index parameters for a plurality of subjects in whom the disease did or did not develop or recur within the predetermined period;
a step in which the calculation unit constructs a plurality of first learning models that predict onset or recurrence of the disease, or unknown, and a plurality of second learning models that predict no onset or recurrence of the disease, or unknown; and
a step in which the calculation unit selects a plurality of first selection learning models and a plurality of second selection learning models for each of the plurality of first learning models and the plurality of second learning models.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 1, further comprising:
a step in which the calculation unit processes the first data with the first selection learning models and the second selection learning models and acquires a first determination result for each of the first selection learning models and the second selection learning models;
a step in which the calculation unit acquires a second determination result based on the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models;
a step in which the calculation unit integrates the second determination result of the first selection learning models and the second determination result of the second selection learning models to acquire a third determination result; and
a step in which the calculation unit predicts the onset risk or recurrence risk of the disease within the predetermined period based on the third determination result.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 2, wherein the step of acquiring the second determination result is performed with the cutoff value for the vote rate of the first selection learning models and the cutoff value for the vote rate of the second selection learning models set to different values.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 2 or 3, further comprising a step in which the calculation unit calculates the reliability of the predicted onset risk or recurrence risk based on the second determination result.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to any one of claims 1 to 4, wherein the step of constructing the plurality of first learning models and the plurality of second learning models is a step of selecting among the plurality of index parameters to generate partial data, thereby generating and preparing the second data including a plurality of partial data and the complete data used to generate the partial data, dividing the second data into a plurality of data groups, and constructing the learning models by machine learning that uses each of the data groups and is performed under a plurality of parameter conditions that differ for each of the data groups.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to any one of claims 2 to 5, wherein the step of acquiring the second determination result is a step of taking a vote over the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models, and acquiring a second determination result based on the vote rate for each of the first selection learning models and the second selection learning models.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to any one of claims 1 to 6, wherein the step of selecting the first selection learning models and the second selection learning models is a step of evaluating and selecting the first learning models and the second learning models based on Pareto rank, using sensitivity and positive predictive value as indices.

The program for predicting the onset risk or recurrence risk of a disease within a predetermined period according to any one of claims 1 to 7, wherein the disease is a major adverse cardiac event, acute kidney injury, cerebral aneurysm rupture, or diabetic nephropathy.

A prediction device for predicting the onset risk or recurrence risk of a disease within a predetermined period, comprising:
a first data generation acquisition unit that acquires two or more index parameters measured using a test sample collected from a subject or obtained from the subject, and generates and acquires first data based on the index parameters;
a first determination result generation acquisition unit that processes the first data acquired by the first data generation acquisition unit with first selection learning models and second selection learning models, and generates and acquires a first determination result for each of the first selection learning models and the second selection learning models;
a second determination result generation acquisition unit that generates and acquires a second determination result based on the plurality of first determination results of the plurality of first selection learning models and the plurality of first determination results of the plurality of second selection learning models acquired by the first determination result generation acquisition unit;
a third determination result generation acquisition unit that integrates the second determination result of the first selection learning models and the second determination result of the second selection learning models acquired by the second determination result generation acquisition unit, and generates and acquires a third determination result; and
a prediction unit that predicts the onset risk or recurrence risk of the disease within the predetermined period based on the third determination result acquired by the third determination result generation acquisition unit.

The prediction device for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 9, further comprising:
a second data generation acquisition unit that generates and acquires second data based on two or more index parameters selected, on the basis of selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects in whom the disease did or did not develop or recur within a predetermined period;
a learning model construction unit that, based on the second data acquired from the second data generation acquisition unit, constructs a plurality of first learning models that predict onset or recurrence of the disease, or unknown, and a plurality of second learning models that predict no onset or recurrence of the disease, or unknown; and
a learning model selection unit that selects a plurality of first selection learning models and a plurality of second selection learning models for each of the plurality of first learning models and the plurality of second learning models constructed by the learning model construction unit.

The prediction device for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 9 or 10, wherein the second data generation acquisition unit is a functional unit that selects among a plurality of index parameters to generate partial data, thereby generating second data including a plurality of partial data and the complete data used to generate the partial data.

A method for constructing a learning model for use in a method for predicting the onset risk or recurrence risk of a disease within a predetermined period by inputting first data that is measured using a test sample collected from a subject or obtained from the subject, the construction method comprising:
Generating and acquiring a plurality of second data based on two or more index parameters selected, based on selection frequency, from a group consisting of a plurality of index parameters for a plurality of subjects in whom the disease did or did not develop or recur within the predetermined period;
Constructing a plurality of first learning models that predict either onset or recurrence of the disease or "unknown", and a plurality of second learning models that predict either no onset or no recurrence of the disease or "unknown"; and
Selecting a plurality of first selection learning models and a plurality of second selection learning models from the plurality of first learning models and the plurality of second learning models, respectively.

The method for constructing a learning model for use in the method for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 12, wherein the step of constructing the plurality of first learning models and the plurality of second learning models comprises: generating partial data by selecting a plurality of index parameters, thereby generating and preparing second data comprising a plurality of partial data and the complete data used to generate the partial data; dividing the second data into a plurality of data groups; and constructing the learning models by machine learning using each of the data groups under a plurality of parameter conditions for each of the plurality of data groups.
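The construction step described above (partial data plus complete data, division into data groups, several parameter conditions per group) can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: records are assumed to be dictionaries with a hypothetical `"label"` key, index parameters are chosen at random for each partial data set, every data group is used for training, and the learner is passed in as a hypothetical `fit` callable because the claim names no specific algorithm.

```python
import itertools
import random


def make_second_data(records, parameters, n_partial, subset_size, seed=0):
    """Return the complete data followed by n_partial partial data sets."""
    rng = random.Random(seed)
    datasets = [("complete", list(parameters), records)]
    for i in range(n_partial):
        # Assumption: each partial data set keeps a random parameter subset.
        chosen = sorted(rng.sample(parameters, subset_size))
        rows = [{key: r[key] for key in chosen + ["label"]} for r in records]
        datasets.append((f"partial_{i}", chosen, rows))
    return datasets


def split_into_groups(rows, n_groups):
    """Divide rows into n_groups data groups of near-equal size."""
    return [rows[i::n_groups] for i in range(n_groups)]


def build_models(datasets, n_groups, param_grid, fit):
    """Fit one model per (data set, data group, parameter condition)."""
    keys = sorted(param_grid)
    conditions = [dict(zip(keys, values))
                  for values in itertools.product(*(param_grid[k] for k in keys))]
    models = []
    for name, features, rows in datasets:
        for g, group in enumerate(split_into_groups(rows, n_groups)):
            for condition in conditions:
                models.append({"data": name, "group": g, "params": condition,
                               "model": fit(group, features, condition)})
    return models
```

With one complete data set, two partial data sets, three data groups, and two parameter conditions, `build_models` returns 3 × 3 × 2 = 18 candidate models, which would then be passed to the selection step.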

The method for constructing a learning model for use in the method for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 13, wherein the step of selecting the first selection learning models and the second selection learning models is a step of evaluating and selecting the first learning models and the second learning models based on Pareto rank, using sensitivity and positive predictive value as indices.
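Pareto ranking on sensitivity and positive predictive value can be illustrated with a short sketch. It uses the common front-peeling definition (rank 1 is the set of models not dominated by any other model, rank 2 is the non-dominated set once rank 1 is removed, and so on); the claim does not say which ranking variant or rank cutoff is used, so both are assumptions. The class and function names and the `max_rank` parameter are hypothetical, and model names are assumed to be unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    name: str
    sensitivity: float  # TP / (TP + FN)
    ppv: float          # positive predictive value, TP / (TP + FP)


def dominates(a, b):
    """True if a is no worse than b on both indices and better on at least one."""
    return (a.sensitivity >= b.sensitivity and a.ppv >= b.ppv
            and (a.sensitivity > b.sensitivity or a.ppv > b.ppv))


def pareto_ranks(scores):
    """Assign rank 1 to the non-dominated front, then peel successive fronts."""
    ranks = {}
    remaining = list(scores)
    rank = 1
    while remaining:
        front = [s for s in remaining
                 if not any(dominates(other, s) for other in remaining)]
        for s in front:
            ranks[s.name] = rank
        remaining = [s for s in remaining if s.name not in ranks]
        rank += 1
    return ranks


def select_models(scores, max_rank=1):
    """Keep models whose Pareto rank is at most max_rank (assumed cutoff)."""
    ranks = pareto_ranks(scores)
    return [s.name for s in scores if ranks[s.name] <= max_rank]
```

The same routine would be applied separately to the first learning models and to the second learning models, since each group yields its own set of selection learning models.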

The method for constructing a learning model for use in the method for predicting the onset risk or recurrence risk of a disease within a predetermined period according to claim 14, wherein the disease is a major adverse cardiac event, acute kidney injury, cerebral aneurysm rupture, or diabetic nephropathy.