JP5645761B2

JP5645761B2 - Medical data analysis method, medical data analysis device, and program

Info

Publication number: JP5645761B2
Application number: JP2011139811A
Authority: JP
Inventors: 登史夫小林; 小林　康孝; 康孝小林
Original assignee: 登史夫小林
Priority date: 2011-06-23
Filing date: 2011-06-23
Publication date: 2014-12-24
Anticipated expiration: 2031-06-23
Also published as: JP2013008159A

Description

The present invention relates to a medical data analysis method, a medical data analysis device, and a program for analyzing medical data with a computer.

Periodic health checkups are common practice.
A health checkup yields many kinds of data: physical data such as sex, age, height, weight, and body mass index (BMI); lifestyle data such as smoking and drinking habits, dietary preferences and content, sleep status, and the quality and quantity of exercise; physiological data such as blood pressure, medication status, past medical history, subjective symptoms, and diagnoses of disease status by medical personnel; and test result data obtained from blood tests, urine tests, and the like.
These data are provided to the diagnosing physician and used to evaluate the examinee's health condition, to maintain health, and to prevent or detect disease early.

Apart from health checkups conducted in the community, research institutes and universities that conduct basic medical research attempt to obtain more medically relevant information by performing detailed examinations and analyses, such as component analysis and microbial analysis, on specimens obtained from the human body.
At such basic medical research institutes, details of a patient's characteristics are identified based on, for example, metabolome analysis (analysis of all components) of blood and analysis of the state of symbiotic microorganisms contained in oral mucus, feces, urine, saliva, nasal mucus, skin, vaginal fluid, and the like.

The basic medical data obtained by such examinations and analyses are used in attempts to estimate the effects that trace components in blood, symbiotic microorganisms, and the like have on the human body, and thereby to diagnose disease status and to prevent or predict disease.

The various data obtained in ordinary health checkups (hereinafter referred to as clinical data, since they are obtained directly from patients) are entirely different from basic medical data. At present, basic medical data are handled only by research institutes, university laboratories, and similar bodies that conduct basic medical research. Physicians who handle clinical data therefore have very few opportunities to encounter basic medical data.
Consequently, there have been almost no attempts to extract relationships between clinical data and basic medical data, or to obtain medically useful findings by using both.

The present invention has been made in view of these circumstances. Its object is to provide a medical data analysis method, a medical data analysis device, and a program that can obtain diagnostic techniques and medically and scientifically useful findings by using both clinical data and basic medical data.

The medical data analysis method of the first invention is a method performed by a medical data analysis device that analyzes data based on clinical medical data and basic medical data. The clinical medical data are data obtained when patients are actually examined and/or treated, and include at least one of physical data on a plurality of patients, data on the patients' lifestyles, data on the patients' disease states, and data on test results for specimens obtained from the patients. The basic medical data are data on the results of basic medical examinations and/or analyses of specimens obtained from the patients, and include data on symbiotic microorganisms obtained by performing such examinations and/or analyses on at least one of the patients' feces, urine, saliva, nasal mucus, skin, and vaginal fluid. The method has a first step in which the medical data analysis device associates the clinical medical data and the basic medical data relating to the same patient, and a second step in which the medical data analysis device performs data analysis using a data mining technique on one preselected item of the clinical medical data, based on the basic medical data associated in the first step. The second step further includes a third step of using the data mining technique to extract similar groups in the basic medical data and to generate a classification model for classifying the structural characteristics of the basic medical data as a whole.

The medical data analysis device of the second invention analyzes data based on clinical medical data and basic medical data. The clinical medical data are data obtained when patients are actually examined and/or treated, and include at least one of physical data on a plurality of patients, data on the patients' lifestyles, data on the patients' disease states, and data on test results for specimens obtained from the patients. The basic medical data are data on the results of basic medical examinations and/or analyses of specimens obtained from the patients, and include data on symbiotic microorganisms obtained by performing such examinations and/or analyses on at least one of the patients' feces, urine, saliva, nasal mucus, skin, and vaginal fluid. The device has a storage unit that stores the clinical medical data and the basic medical data, an input unit that accepts input operations, and a control unit. The control unit associates the clinical medical data and the basic medical data relating to the same patient and, based on the basic medical data associated with one item of the clinical medical data selected in advance by an input operation via the input unit, performs data analysis by using a data mining technique to extract similar groups in the basic medical data and to generate a classification model for classifying the structural characteristics of the basic medical data as a whole.

The program of the third invention is executed by a computer of a medical data analysis device that analyzes data based on clinical medical data and basic medical data. The clinical medical data are data obtained when patients are actually examined and/or treated, and include at least one of physical data on a plurality of patients, data on the patients' lifestyles, data on the patients' disease states, and data on test results for specimens obtained from the patients. The basic medical data are data on the results of basic medical examinations and/or analyses of specimens obtained from the patients, and include data on symbiotic microorganisms obtained by performing such examinations and/or analyses on at least one of the patients' feces, urine, saliva, nasal mucus, skin, and vaginal fluid. The program causes the computer to execute a first procedure of associating the clinical medical data and the basic medical data relating to the same patient; a second procedure of performing data analysis using a data mining technique on one preselected item of the clinical medical data, based on the basic medical data associated in the first procedure; and a third procedure of, within the second procedure, using the data mining technique to extract similar groups in the basic medical data and to generate a classification model for classifying the structural characteristics of the basic medical data as a whole.

According to the present invention, it is possible to provide a medical data analysis method, a medical data analysis device, and a program that can obtain diagnostic techniques and medically and scientifically useful findings by using both clinical data and basic medical data.

FIG. 1 is a diagram showing an example of a business model for the medical data analysis method.
FIG. 2 is a diagram showing a configuration example of the medical data analysis device 100.
FIG. 3 is a flowchart showing an example of a medical data analysis method using a data mining technique.
FIG. 4 is a diagram showing examples of the data subject to the medical data analysis method.
FIG. 5 is a diagram showing an example of a decision tree obtained by analysis with the C&RT method using HbA1c as the characteristic.
FIG. 6 is a table showing an example of thresholds used to categorize patients by HbA1c.
FIG. 7 is a diagram showing an example of a decision tree obtained by analysis with the C&RT method using systolic blood pressure as the characteristic.
FIG. 8 is a table showing an example of thresholds used to categorize patients by systolic blood pressure.
FIG. 9 is a diagram showing an example of a decision tree obtained by analysis with the C&RT method using LDL-C and HDL-C as the characteristics.
FIG. 10 is a table showing an example of thresholds used to categorize patients by LDL-C and HDL-C.

Embodiments of the present invention are described below.
First, the types of data handled in this embodiment are described.

(1) Clinical data
In this embodiment, the various data obtained from patients in, for example, health checkups and examinations at general medical facilities such as community hospitals are collectively referred to as clinical data.
Here, a general medical facility means a medical facility other than the research institutes, universities, and similar bodies that conduct basic medical research, described later.
Clinical data are obtained through, for example, physician interviews, patient questionnaires, measurement of physical data, and blood and urine tests performed after blood and urine collection.
Clinical data include, for example, physical data such as sex, age, height, weight, and degree of obesity; lifestyle data such as smoking and drinking habits, dietary preferences and content, sleep status, and the quality and quantity of exercise; physiological data such as blood pressure, medication status, past medical history, subjective symptoms, and examinations and diagnoses of disease status by medical personnel; and test and analysis result data obtained from blood tests, urine tests, and the like.

(2) Basic medical data
In this embodiment, basic medical data refers to data obtained, for example at research institutes or universities that conduct basic medical research, by performing metabolome analysis (examination and analysis of all components) on blood collected from patients, and to data on the state of symbiotic microorganisms obtained from patients' feces, urine, saliva, nasal mucus, skin, vaginal fluid, and the like.

This embodiment describes a medical data analysis method that uses both clinical data and basic medical data to obtain diagnostic techniques and medically and scientifically useful findings.

Business model
FIG. 1 is a diagram showing an example of a business model for the medical data analysis method of this embodiment.
As shown in FIG. 1, the business model involves a business operator 1, patients 2, and hospitals 3.

The business operator 1 collects clinical data and specimens from the patients 2 and the hospitals 3, and acquires basic medical data generated from the collected specimens by, for example, a research institute or university (not shown) that conducts basic medical research. Using the collected clinical data and the acquired basic medical data, it then performs an analysis suited to a predetermined purpose by a data mining technique and obtains an analysis result.
Alternatively, the business operator 1 may itself generate the basic medical data from the specimens collected from the patients.
Data mining is a technique that analyzes accumulated data, searches for correlations and features among the items hidden in it with respect to a target characteristic, and predicts the trend of that characteristic.
The specific analysis performed by the business operator 1 using data mining techniques is described in detail later.
Based on the analysis result, the business operator 1 can prepare specific health advice for each individual patient 2, including lifestyle improvement methods and methods for controlling symbiotic microorganisms in the body. It can then provide this advice, together with the analysis result, to the patients 2 and the hospitals 3.

The patients 2 are, for example, a plurality of people selected from the general public by the business operator 1, and are the subjects of the analysis it performs. The present invention places no particular limit on the number of patients 2; an appropriate number may be chosen for the purpose of the analysis. The conditions for selecting the patients 2 may likewise be set according to the purpose of the analysis.
The hospitals 3 examine the patients 2 and obtain clinical data, for example at the request of the business operator 1. There need not be only one hospital 3; a different hospital 3 may be used for each patient.

In the example of FIG. 1, the subjects of the analysis are human patients 2, but the subjects of analysis in this embodiment are not limited to humans and may be, for example, livestock. In that case, the hospitals 3 are veterinary hospitals or livestock farmers.

Medical data analysis device
The business operator 1 has a medical data analysis device 100, which performs analysis using the clinical data and basic medical data described above.
A configuration example of the medical data analysis device 100 is described below.
FIG. 2 is a diagram showing a configuration example of the medical data analysis device 100.
As shown in FIG. 2, the medical data analysis device 100 is a computer having a storage unit 101, a display unit 102, a control unit 103, and an input unit 104.

The storage unit 101 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or flash memory. It stores various data (including the clinical data and basic medical data described above), predetermined programs, data needed to execute those programs, and the like.
The display unit 102 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube).
The control unit 103 is a main processing unit such as a CPU (Central Processing Unit). It executes a predetermined program stored in the storage unit 101 to perform predetermined processing.
The input unit 104 is a data input device such as a keyboard, a mouse, or a scanner. The input unit 104 also serves as a data input terminal that accepts online data input.

The analysis method described below is executed by the medical data analysis device 100, a computer having the storage unit 101, the display unit 102, the control unit 103, and the input unit 104. Specifically, a program that implements the analysis method is stored in the storage unit 101, and the analysis is performed when the control unit 103 executes the program in response to an operation via the input unit 104.
In addition to the configuration described above, the medical data analysis device 100 may have an output unit that produces output such as printing.

Analysis method
Next, the specific analysis that the medical data analysis device 100 of the business operator 1 performs by data mining techniques, using both clinical data and basic medical data, is described.
FIG. 3 is a flowchart showing an example of the analysis method using a data mining technique that is performed in the medical data analysis device 100.

Step ST1:
Clinical data acquired from the plurality of patients 2 and the hospitals 3 are input.
The clinical data used are those acquired in advance from the plurality of patients 2 and the hospitals 3. The present invention does not limit how clinical data are collected from the patients 2; typically, they are generated when a patient 2 visits a hospital 3.

Step ST2:
Basic medical data generated from specimens obtained in advance from the patients 2 are input.
Here, a specimen is blood, feces, urine, saliva, nasal mucus, skin, vaginal fluid, or the like collected from a patient 2. This embodiment describes in particular the case where feces are the specimen and the analysis targets resident intestinal bacteria, the symbiotic microorganisms present in feces.
The present invention does not limit how specimens are collected from the patients 2. A specimen may be collected by the patient 2 or at a hospital 3.
The method of generating the basic medical data is described later.

Step ST3:
The clinical data input in step ST1 and the basic medical data input in step ST2 are organized so that the analysis by data mining techniques can be performed suitably.
First, the acquired clinical data and basic medical data are each arranged as two-dimensional data (a table), with the patients 2 (test subjects) along the row (vertical) direction and the data items along the column (horizontal) direction.

FIG. 4(a) shows an example of clinical data as two-dimensional data.
FIG. 4(b) shows an example of basic medical data as two-dimensional data.
In FIG. 4(a), the patient names run down the table, one row per patient, and the clinical data items for each patient 2, such as HbA1c, systolic blood pressure, LDL-C, and HDL-C, run across it.
In FIG. 4(b), the patient names likewise run down the table, and the item names of the basic medical data for each patient, such as B332, B494, B641, and B657, run across it. Here, these item names indicate the types and quantitative composition of resident intestinal bacteria. Details of item names such as B332 and B494 are given later.

To analyze clinical data and basic medical data by data mining techniques, these data sets must be represented as a single table. In this step, therefore, a single table is created from the clinical data illustrated in FIG. 4(a) and the basic medical data illustrated in FIG. 4(b).
Specifically, the clinical data and the basic medical data relating to the same patient are associated with each other.
FIG. 4(c) shows an example of the single table created from the clinical data of FIG. 4(a) and the basic medical data of FIG. 4(b).
The table in FIG. 4(c) presents both the clinical data of FIG. 4(a) and the basic medical data of FIG. 4(b) for each patient name. The patient names run down the table; across it, the clinical data item names such as HbA1c, systolic blood pressure, LDL-C, and HDL-C are followed by the basic medical data item names such as B332, B494, B641, and B657.
Because the clinical data and basic medical data of the same patient are associated, patient C, who appears in the table of FIG. 4(a) but not in that of FIG. 4(b), and patient E, who appears in the table of FIG. 4(b) but not in that of FIG. 4(a), are excluded from the single table of FIG. 4(c).
This processing makes the correspondence between each patient's clinical data and basic medical data clear.
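The association performed in step ST3 behaves like an inner join on the patient name. The following is a minimal sketch of that idea, not the implementation used in the embodiment. The function name `join_by_patient`, the dictionary layout, the abbreviation `SBP`, and all numerical values are assumptions made for illustration; they are not taken from FIG. 4.

```python
def join_by_patient(clinical, basic):
    """Inner-join two {patient: {item: value}} tables on the patient key."""
    merged = {}
    for patient, clinical_items in clinical.items():
        if patient in basic:  # keep only patients present in both tables
            # Clinical items first, then basic medical items, as in FIG. 4(c).
            merged[patient] = {**clinical_items, **basic[patient]}
    return merged


# Hypothetical values, for illustration only.
clinical = {
    "A": {"HbA1c": 5.4, "SBP": 118},
    "B": {"HbA1c": 6.8, "SBP": 142},
    "C": {"HbA1c": 5.9, "SBP": 130},    # no basic medical data
}
basic = {
    "A": {"B332": 0.12, "B494": 0.03},
    "B": {"B332": 0.01, "B494": 0.20},
    "E": {"B332": 0.07, "B494": 0.09},  # no clinical data
}

table = join_by_patient(clinical, basic)
print(sorted(table))  # patients C and E are dropped
```

Patients C and E are dropped because each is missing from one of the two input tables, which mirrors the exclusion described for FIG. 4(c).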

Step ST4:
In this step, the "characteristic" that is the target of the analysis is received via the input unit 104 of the medical data analysis device 100.
Here, a "characteristic" means one data item that corresponds to the purpose of the analysis, determined in advance by, for example, the business operator 1.
The "characteristic" is selected, for example, from among the clinical data items that correspond to the disease being analyzed.
In the example of FIG. 4, as shown in FIG. 4(a), the clinical data have various items such as HbA1c, systolic blood pressure, LDL-C, and HDL-C.
Of these, HbA1c is closely related to diabetes, systolic blood pressure is closely related to hypertension, and LDL-C and HDL-C are closely related to dyslipidemia (hyperlipidemia).

That is, when the purpose is to analyze diabetes, for example, the business operator 1 selects HbA1c as the "characteristic". Likewise, systolic blood pressure may be selected as the "characteristic" when the purpose is to analyze hypertension, and LDL-C and HDL-C when the purpose is to analyze dyslipidemia.
When the purpose is to analyze a disease other than these examples, clinical data for an item closely related to that disease may be input in advance in step ST1 and that item selected in this step.
In this way, the business operator 1 selects one item of clinical data to suit the purpose of the analysis and inputs the selected item via the input unit 104. The subsequent steps then perform the analysis according to the values of the selected item.
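Selecting a characteristic amounts to choosing one clinical item as the analysis target and using the associated basic medical items as the explanatory data. The sketch below illustrates this under stated assumptions: the mapping `CHARACTERISTIC_FOR` simply restates the disease examples given in the text, while the function `split_target`, the table layout, and all values are hypothetical.

```python
# Disease-to-item mapping restating the examples in the text.
CHARACTERISTIC_FOR = {
    "diabetes": ["HbA1c"],
    "hypertension": ["systolic blood pressure"],
    "dyslipidemia": ["LDL-C", "HDL-C"],
}


def split_target(table, characteristic, basic_items):
    """Return (patients, X, y): basic medical predictors and the chosen clinical target."""
    patients = sorted(table)
    X = [[table[p][item] for item in basic_items] for p in patients]
    y = [table[p][characteristic] for p in patients]
    return patients, X, y


# Hypothetical merged table (one row per patient), for illustration only.
table = {
    "A": {"HbA1c": 5.4, "systolic blood pressure": 118, "B332": 0.12, "B494": 0.03},
    "B": {"HbA1c": 6.8, "systolic blood pressure": 142, "B332": 0.01, "B494": 0.20},
}

target = CHARACTERISTIC_FOR["diabetes"][0]
patients, X, y = split_target(table, target, ["B332", "B494"])
print(target, patients, X, y)
```

Keeping the target separate from the basic medical items makes it straightforward to rerun the same analysis with a different characteristic.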

In this embodiment, the clinical data also include data described in text rather than as numerical or categorical values, such as a physician's diagnosis or the results of a questionnaire given to the patients 2, as mentioned above. Although not illustrated in FIG. 4(a), such descriptive data can also be set as the characteristic (details are given later).

The "characteristic" described above is selected by an operation that the business operator 1 performs via the input unit 104.
The business operator 1 may, for example, acquire from the patients 2 and the hospitals 3 clinical data for items closely related to the disease being analyzed. Alternatively, the business operator 1 may freely decide the "characteristic" to be analyzed from among clinical data acquired at random from the patients 2 and the hospitals 3.

Step ST5:
Based on the two-dimensional data generated in step ST3, data analysis by a data mining technique is performed for the "characteristic" selected in step ST4.
Data mining is a technique that analyzes accumulated data, searches for correlations and features among the items hidden in it with respect to a target characteristic, and predicts the trend of that characteristic.

Data mining techniques include calculation methods that construct a decision tree (rule set) and provide a classification model, such as the C&RT method, the CHAID (Chi-square Automatic Interaction Detection) method, the QUEST (Quick, Unbiased, Efficient, Statistical Tree) method, and the C5.0 method, and calculation methods that provide a classification model without constructing a decision tree, such as the Bayesian method, the logistic regression method, neural network algorithms, and the SVM (Support Vector Machine) method.
A decision tree uses a tree structure to express the algorithm that yields the classification result for an input pattern.
A classification model is a group of arithmetic expressions with two features. First, it uses a data mining calculation method to focus on and/or organize the features of the data contained in given numerical material with respect to the trend of a specific item, groups and classifies the data individually according to their membership in similar groups, and thereby classifies the structural characteristics of the material as a whole with good reproducibility. Second, applying other, subsequent numerical material to the model readily yields similar classification results and prediction probabilities.
A classification model can be constructed for each data mining calculation method. Moreover, even with the same numerical material and the same calculation method, a different target characteristic produces a different classification model, and each model can be saved in an electronic file or the like.

・Analysis method using data mining
The data mining technique used for the analysis may be any technique that the business operator 1 selects according to the purpose and nature of the analysis, for example the C&RT method, the CHAID method, the QUEST method, the C5.0 method, the Bayes method, the logistic regression method, a neural network algorithm, or the SVM (Support Vector Machine) method.
Among these, the C&RT, CHAID, QUEST, and C5.0 methods provide both a decision tree and a classification model, whereas the Bayes method, the logistic regression method, neural network algorithms, and the SVM method provide only a classification model and no decision tree.

First, the case of using the C&RT method is described as an example of a calculation method that provides a decision tree.
(1) C&RT method
The C&RT method builds a decision tree by repeatedly splitting the cases into two so as to create data subsets that are as homogeneous as possible with respect to the objective variable.
Specifically, an impurity of the data (the Gini coefficient) is defined. When the original data (parent node) is split into two subsets (child nodes), the improvement, which indicates how much the impurity of the child nodes is reduced relative to that of the parent node, is used as the evaluation criterion for the split.
The search for the split point (split variable and its value) that maximizes the improvement is then repeated recursively until a stopping rule is satisfied.

The impurity g(t) and the improvement f(t) are calculated by Equations 1 and 2 (the equation images are not reproduced in this text).

The terms appearing in Equations 1 and 2 are given by further expressions that are likewise not reproduced in this text.
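Since the equation images are missing here, the following is only a hedged reconstruction: the standard Gini formulation of C&RT, written with the symbols that the next paragraph defines (π(j), N_j(t), N_j, P_L, P_R). The child-node labels t_L and t_R and the intermediate quantities p(j, t), p(t), and p(j | t) are notation assumed for this sketch, not taken from the source.

```latex
\begin{align}
g(t) &= \sum_{i \neq j} p(i \mid t)\, p(j \mid t) \;=\; 1 - \sum_{j} p(j \mid t)^{2} \\
f(t) &= g(t) - P_L\, g(t_L) - P_R\, g(t_R) \\
p(j \mid t) &= \frac{p(j, t)}{p(t)}, \qquad
p(j, t) = \frac{\pi(j)\, N_j(t)}{N_j}, \qquad
p(t) = \sum_{j} p(j, t)
\end{align}
```

Under this form, choosing priors equal to the observed category frequencies makes p(j | t) simply the share of category j among the cases in node t.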

Here, π(j) is the prior probability of category j, N_j(t) is the number of cases (in this embodiment, patients) of category j at node t, and N_j is the number of cases of category j at the root node. P_L is the proportion of cases at node t sent to the first child node, and P_R is the proportion sent to the second child node.
In the C&RT method, the split is chosen so that the improvement f(t) is maximized, that is, so that the impurity decreases the most.
The decision tree is built and output in this way.
Besides the Gini coefficient described above, a decision tree can also be built by the C&RT method using Twoing, least squares deviation (LSD), or the like, and the measure may be chosen according to the target characteristic and the nature of the specimen.
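The split search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the embodiment: it assumes priors equal to the observed category frequencies (so the impurity uses in-node proportions), handles a single numeric item, and uses made-up data. The function names `gini` and `best_split` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_j p(j|t)^2, using class proportions in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, improvement) for the best split 'value <= threshold'."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        p_left, p_right = len(left) / n, len(right) / n
        improvement = parent - p_left * gini(left) - p_right * gini(right)
        if improvement > best[1]:
            best = (threshold, improvement)
    return best

# Hypothetical toy data: one OTU value per patient and a category label.
otu = [5.0, 8.0, 12.0, 25.0, 30.0, 41.0]
category = ["A", "A", "A", "D", "D", "D"]
threshold, improvement = best_split(otu, category)
```

A full tree would apply `best_split` to every item, keep the item with the largest improvement, and recurse on the two child nodes until a stopping rule is met.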

Next, the case of using the logistic regression method is described as an example of a calculation method that does not provide a decision tree.
(2) Logistic regression method
The probability π_ij of response category j in subpopulation i is given as follows (the equation is not reproduced in this text).

Here, J is the last category.
x'_i β_j is expressed as follows (the equation is not reproduced in this text),

where j = 1, ..., J.

Equation 6 is the inverse of the logit transformation. When J = 2, this model is identical to the binomial logistic regression model. The model can therefore be regarded as an extension of binomial logistic regression from a binary response to a multinomial nominal response.
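As a hedged illustration of the model just described, the sketch below computes category probabilities under the usual multinomial logit form with the last category J as reference, i.e. π_ij = exp(x'_i β_j) / (1 + Σ_{k<J} exp(x'_i β_k)). That form is an assumption here, since the source equation is not reproduced; the function name and the numbers are hypothetical.

```python
import math

def category_probabilities(x, betas):
    """Probabilities for J categories; betas holds J-1 coefficient vectors,
    and the last category J is the reference (its linear predictor is 0)."""
    scores = [sum(b_k * x_k for b_k, x_k in zip(beta, x)) for beta in betas]
    scores.append(0.0)   # reference category J
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 2.0]            # hypothetical covariates (first entry = intercept)
betas = [[0.5, -0.25]]    # J = 2: a single coefficient vector
probs = category_probabilities(x, betas)
```

With J = 2 the first probability reduces to 1 / (1 + exp(−x'β)), which is the binomial logistic case mentioned in the text.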

The log-likelihood l of this model is obtained by the following equation (not reproduced in this text).

The Newton-Raphson method is used to find the parameter B that maximizes the log-likelihood. Because the expected value of the second derivative of l with respect to B equals its observed value, this method is identical to Fisher's scoring algorithm for this model.
Let ∂l/∂B be the (J−1)p × 1 vector of first derivatives of l with respect to the parameter B.
Further, let [∂²l/∂B∂B] be the (J−1)p × (J−1)p matrix of second derivatives of l with respect to the parameter B.

Here, [∂²l/∂B∂B] is given as follows (the equation is not reproduced in this text).

Δ_i is a (J−1) × (J−1) matrix defined as follows (the equation is not reproduced in this text).

Here, π_i^(−J) = (π_i1, ..., π_i,J−1), and Diag(π_i^(−J)) is the diagonal matrix of π_i^(−J).

Let B^(ν) be the parameter estimate at iteration ν. The parameter estimate B^(ν+1) at iteration ν+1 is given by the following equation (not reproduced in this text).

ξ > 0 is a step-size scalar chosen so that l(B^(ν+1)) − l(B^(ν)) ≥ 0, and X* is a (J−1)p × (J−1) matrix of independent vectors.
If l(B^(ν+1)) − l(B^(ν)) < 0, step halving is used: with V the maximum number of steps, the set of values of ξ is {1/2^v : v = 0, ..., V−1}.

Given two convergence criteria ε_k > 0 and ε_p > 0, the iteration is considered to have converged when any one of the following is satisfied:
(1) |l(B^(ν+1)) − l(B^(ν))| < ε_k
(2) max_i |B_i^(ν+1) − B_i^(ν)| < ε_p
(3) the largest element of ∂l/∂B^(ν+1) is less than min(ε_k, ε_p)
A classification model is generated by this calculation.
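The step-halving rule and the three stopping criteria can be sketched as follows. This is an illustrative outline only: the update direction is passed in rather than computed from the derivatives, the objective is a made-up concave function standing in for l(B), and criterion (3) is applied to absolute values, which is an assumption. The names `step_halving` and `converged` are hypothetical.

```python
def step_halving(loglik, b, direction, max_steps=5):
    """Try xi = 1, 1/2, 1/4, ... and return the first update that does not
    decrease the log-likelihood; return None if no step in the set works."""
    current = loglik(b)
    for v in range(max_steps):
        xi = 1.0 / (2 ** v)
        candidate = [b_i + xi * d_i for b_i, d_i in zip(b, direction)]
        if loglik(candidate) - current >= 0:
            return candidate, xi
    return None

def converged(l_new, l_old, b_new, b_old, gradient, eps_k=1e-8, eps_p=1e-8):
    """True if any one of the three criteria is satisfied."""
    return (
        abs(l_new - l_old) < eps_k
        or max(abs(n - o) for n, o in zip(b_new, b_old)) < eps_p
        # Assumption: the gradient test uses absolute values of the elements.
        or max(abs(g) for g in gradient) < min(eps_k, eps_p)
    )

# Hypothetical concave objective standing in for the log-likelihood l(B).
loglik = lambda b: -(b[0] - 1.0) ** 2
result = step_halving(loglik, [0.0], [3.0])   # the full step overshoots
```

Here the full step (ξ = 1) lowers the objective, so the halved step ξ = 1/2 is accepted instead.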

・Method for generating basic medical data
Next, the method for generating basic medical data is described.
In this embodiment, analysis values of resident intestinal bacteria and the like contained in feces collected from a patient are used as the basic medical data.

First, the method for obtaining analysis values of resident intestinal bacteria is described in detail.
One such method is terminal restriction fragment length polymorphism analysis (T-RFLP).
In the T-RFLP method, the 16S rDNA gene derived from microorganisms is extracted from the specimen, and the template DNA is amplified by PCR (polymerase chain reaction) and digested with a restriction enzyme (an enzyme that cleaves DNA at a specific base sequence). The fragments are then analyzed. Because the cleavage sites of the restriction enzyme differ, differences in peak position and intensity can be measured.
In the T-RFLP method, each DNA fragment is classified and measured as an OTU (Operational Taxonomic Unit) derived from symbiotic microorganisms.

Such an analysis yields basic medical data such as that shown in FIG. 4(b).
Item names such as "B332" and "B494" indicate the restriction enzyme used and the peak position obtained with that enzyme.
For example, "B332" denotes the OTU whose peak position is "332" when the restriction enzyme BslI is used. The item name thus joins the initial letter of the restriction enzyme to the peak position obtained with it.
Although this embodiment uses BslI as the restriction enzyme, the present invention is not limited to this. Other restriction enzymes, such as MspI, AluI, and HaeIII, may be used.
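The naming convention can be made concrete with a small parser. This is a sketch under assumptions: the regular expression, the function name `parse_otu_item`, and the lookup table are hypothetical, and the optional `_tr` suffix is the one that appears on item names in the decision-tree figures discussed later in the description.

```python
import re

# Hypothetical lookup from initial letter to enzyme; only BslI is confirmed
# by the text as the enzyme behind the "B" prefix.
ENZYME_BY_INITIAL = {"B": "BslI"}

def parse_otu_item(name):
    """Split an item name such as 'B332' or 'B494_tr' into its parts."""
    match = re.fullmatch(r"([A-Z])(\d+)(_tr)?", name)
    if match is None:
        raise ValueError(f"unrecognized OTU item name: {name!r}")
    initial, peak, suffix = match.groups()
    return {
        "enzyme": ENZYME_BY_INITIAL.get(initial, initial),
        "peak_position": int(peak),
        "normalized": suffix is not None,
    }

item = parse_otu_item("B494_tr")
```

If several enzymes shared an initial letter, a longer prefix would be needed; the text does not say how that case is handled.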

A great many microbial species belong to each OTU, and the species names and physiological functions of most human symbiotic microorganisms remain unknown. At present, therefore, which microorganism a given OTU derives from can only be inferred, for example by using several restriction enzymes. If an inexpensive, reproducible, breakthrough DNA testing method is developed in the future, it should become possible to obtain basic medical data containing individual microbial group names, species names, and the like that are more precise than the current OTUs.
For these reasons, this embodiment adopts as item names OTUs that indicate the restriction enzyme used and the peak position obtained with it.

・Specific examples of data analysis by the data mining technique
Specific examples of data analysis by the data mining technique are shown below.
As an example, the data analysis results are shown for cases in which items shown in FIG. 4(a) are each used as the characteristic.
[1] HbA1c as the characteristic
The case where HbA1c (hemoglobin A1c) is selected as the characteristic is described.
HbA1c is hemoglobin bound to glucose and is an item closely related to diabetes.
FIG. 5 is an example of a decision tree obtained by analysis with the C&RT method using HbA1c as the characteristic. FIG. 5 is based on clinical data and basic medical data obtained from questionnaires, interviews, and various tests, including blood tests, of 121 men and women for whom lifestyle-related diseases were a concern at regular health examinations.

In FIG. 5, the decision tree grows from left to right.
The leftmost node, node 0, is called the root node and contains all patients subject to the data analysis.
All patients in node 0 are classified in advance into four categories, A to D.
This categorization is performed in advance by the business operator 1 according to the purpose and nature of the analysis, for example by an operation via the input unit 104.
In the example of FIG. 5, the categories reflect the magnitude of the HbA1c value. A patient may be assigned to a category according to, for example, whether the HbA1c value is at or below a predetermined threshold or above it.
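Threshold-based categorization of this kind can be sketched as below. The threshold values are purely hypothetical placeholders, since the actual values belong to FIG. 6 and are not reproduced in this text; the function name `categorize` is likewise an assumption.

```python
from bisect import bisect_left

# Hypothetical upper bounds for categories A, B and C; the actual thresholds
# are those of FIG. 6, which is not reproduced in this text.
UPPER_BOUNDS = [5.5, 6.0, 6.5]
CATEGORIES = ["A", "B", "C", "D"]

def categorize(value, upper_bounds=UPPER_BOUNDS, categories=CATEGORIES):
    """Return the category for value; a value equal to a bound ('at or below
    the threshold') falls in the lower category."""
    return categories[bisect_left(upper_bounds, value)]

labels = [categorize(v) for v in (5.1, 5.5, 6.2, 7.4)]
```

Using `bisect_left` keeps the "at or below the threshold" convention from the text: a value exactly on a bound stays in the lower category.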

FIG. 6 shows, as a table, an example of the thresholds used to categorize HbA1c in FIG. 5.
As shown in FIG. 6, category A contains the patients with the lowest HbA1c concentration and category D those with the highest. The patients in category D, with the highest HbA1c concentration, are severely diabetic.

In the example of FIGS. 5 and 6, patients are divided into the category of patients with normal HbA1c concentration (category A), the categories of patients requiring attention (categories B and C), and the category of patients with severe diabetes (category D). The aim is to gain insight into the relationship between symbiotic microorganisms and HbA1c concentration and, by extension, between symbiotic microorganisms and diabetes.

In FIG. 5, each node shows, in the "n" column, the number of patients in each of the categories A to D (defined by HbA1c value) and, in the "%" column, each category's share of the total number of patients as a percentage. At node 0, therefore, the "n" column sums to the total of 121 patients and the "%" column sums to 100.000%.
In the example of FIG. 5, of the 121 patients, 42 fall in category A (34.711%), 46 in category B (38.017%), 14 in category C (11.570%), and 19 in category D (15.702%).
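The percentages quoted for node 0 follow directly from the counts, as this short check illustrates. The dictionary layout is an assumption made for the example; the counts are those stated in the text.

```python
counts = {"A": 42, "B": 46, "C": 14, "D": 19}   # node 0 counts from FIG. 5
total = sum(counts.values())
# Each category's share of all patients, rounded to three decimals.
percentages = {k: round(100.0 * n / total, 3) for k, n in counts.items()}
```

The same calculation applies to any other node once its per-category counts are known.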

In FIG. 5, node 0 is split into node 1 and node 2.
The split condition is written between node 0 and nodes 1 and 2 in FIG. 5. "B494_tr", written to the right of node 0, is the OTU that determines the split. "<= 21.923", written to the left of node 1, and "> 21.923", written to the left of node 2, give the boundary value of the split.
That is, in FIG. 5 a patient at node 0 goes to node 1 if the value of the OTU "B494_tr" is at or below the boundary value 21.923, and to node 2 if it is greater than 21.923.

The OTU that determines a split is the one selected, by the improvement calculation of the C&RT method described above, to give the highest improvement. For the split from node 0 into nodes 1 and 2 in FIG. 5, it is "B494_tr".
"B494_tr" denotes the OTU whose peak position is 494 with the restriction enzyme BslI (initial letter B), and is one of the basic medical data items described above. The suffix "_tr" indicates that values standardized for each item were used in the C&RT calculation.

In the example of FIG. 5, node 1 contains 41 patients of category A and 14 of category B, and no patients of categories C or D. Node 2 contains 1 patient of category A, 32 of category B, 14 of category C, and 19 of category D.
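Using the node counts just given, the Gini improvement of the root split can be worked through numerically. This is a sketch under the assumption that priors equal the observed category frequencies, so that impurity is computed from in-node proportions; the software used for FIG. 5 may have been configured differently. The function name `gini_from_counts` is hypothetical.

```python
def gini_from_counts(counts):
    """Gini impurity from per-category counts, using in-node proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

node0 = [42, 46, 14, 19]   # categories A-D at the root (FIG. 5)
node1 = [41, 14, 0, 0]     # B494_tr <= 21.923
node2 = [1, 32, 14, 19]    # B494_tr >  21.923

p_left = sum(node1) / sum(node0)
p_right = sum(node2) / sum(node0)
improvement = (gini_from_counts(node0)
               - p_left * gini_from_counts(node1)
               - p_right * gini_from_counts(node2))
```

Under these assumptions the root impurity is roughly 0.70 and the split removes roughly 0.18 of it, mostly because node 1 becomes almost entirely category A and B.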

Also in the example of FIG. 5, node 3 contains only category A patients, and node 15 contains almost only category A patients. Node 9 contains only category B patients, and nodes 7, 16, 18, and 26 contain many category B patients. Nodes 11 and 25 contain only category C patients. All category D patients are gathered in node 12 alone.

From the above, nodes 11 and 12 gather all the patients of category D, which has the highest HbA1c concentration (severely diabetic patients), together with patients of category C, which has the next highest. According to FIG. 5, the OTUs that lead to the split into nodes 11 and 12 are "B494" and "B332", so it can readily be inferred from this analysis that these OTUs are closely related to the development of diabetes.

[2] Systolic blood pressure as the characteristic
The case where systolic blood pressure is selected as the characteristic is described.
Systolic blood pressure is a characteristic closely related to hypertension.
FIG. 7 is an example of a decision tree obtained by analysis with the C&RT method using systolic blood pressure as the characteristic. FIG. 7 is based on clinical data and basic medical data obtained from questionnaires, interviews, and various tests, including blood tests, of 121 men and women for whom lifestyle-related diseases were a concern at regular health examinations.

In the example of FIG. 7, as in FIG. 5, each split starting from node 0 is chosen to give a high improvement by the C&RT method, the decision tree grows to the right, and splitting continues until each node contains a single category or nearly so.

In the example of FIG. 7, all patients are classified in advance, for example by the business operator 1, into four categories, J to M. Each patient is assigned to one of the categories J to M according to the systolic blood pressure value. FIG. 8 shows an example of the thresholds used for this categorization.
Among the categories J to M defined by the thresholds in FIG. 8, category M contains the patients with the highest systolic blood pressure.

According to FIG. 7, all patients of category M, which has the highest systolic blood pressure, are contained in node 10, and most patients of category L, which has the next highest, are contained in node 9.
It can therefore readily be inferred from the decision tree in FIG. 7 that OTUs such as "B469", "B124", and "B366" are closely related to the development of hypertension.
[3] LDL-C and HDL-C as the characteristic
The case where LDL-C and HDL-C (cholesterol values) are selected as the characteristic is described.
LDL-C and HDL-C are characteristics closely related to dyslipidemia (hyperlipidemia).
FIG. 9 is an example of a decision tree obtained by analysis with the C&RT method using LDL-C and HDL-C as the characteristic. FIG. 9 is based on clinical data and basic medical data obtained from questionnaires, interviews, and various tests, including blood tests, of 121 men and women for whom lifestyle-related diseases were a concern at regular health examinations.

In the example of FIG. 9, as in FIGS. 5 and 7, each split starting from node 0 is chosen to give a high improvement by the C&RT method, the decision tree grows to the right, and splitting continues until each node contains a single category or nearly so.

In the example of FIG. 9, all patients are classified in advance, for example by the business operator 1, into four categories, P to S. Each patient is assigned to one of the categories P to S according to the LDL-C and HDL-C values. FIG. 10 shows an example of the thresholds used for this categorization.
Among the categories P to S defined by the thresholds in FIG. 10, category S contains the patients with the most severe dyslipidemia and category P contains the normal patients. Categories Q and R contain the patients requiring attention with respect to dyslipidemia.

According to FIG. 9, node 1 contains all patients of category S (patients with severe dyslipidemia), and node 4 contains all patients of categories Q and R (patients requiring attention).
Node 5 contains many patients of category Q, and node 12 contains all patients of category R. The OTU involved in the split from node 4 into nodes 5 and 12 is "B990". Considering also the split from node 12 into node 13 (all of whose patients belong to category P, the normal patients) and node 14 (which contains patients of categories Q and R, the patients requiring attention), it can be inferred that subtle differences in the concentration of "B990" strongly affect the onset of dyslipidemia.

As described above, the medical data analysis method of this embodiment performs data analysis by a data mining technique on two kinds of data: clinical data obtained in ordinary health examinations and tests, and basic medical data generated by testing and/or analysis, at basic medical research laboratories and the like, of specimens such as feces, urine, saliva, nasal mucus, skin, vaginal fluid, and blood. The clinical data and the basic medical data can thus be organically associated, and a decision tree can be built from which their relationships are easily grasped.
In addition, from the multiple items contained in the clinical data, an item suited to the purpose of the analysis can be selected, and the analysis can be performed on the selected item (characteristic) and the numerical data of each item of the basic medical data. A decision tree suited to the purpose can therefore be built from the same data (clinical data and basic medical data) simply by changing the target item.

Furthermore, using a calculation method that can build a decision tree (for example, the C&RT, CHAID, QUEST, or C5.0 method) as the data mining technique makes it possible to see visually and easily which node contains the patients corresponding to the target characteristic, and to readily understand which basic medical data items correspond to the disease or other condition associated with that characteristic.

Furthermore, applying a previously built decision tree and classification model to data of new patients, other than the patients whose clinical and basic medical data were used to build them, makes it possible to predict and classify the disease status of the new patients.
Specifically, suppose for example that only basic medical data is acquired for new patients. In the decision tree built for the characteristic "HbA1c" in the embodiment above (see FIG. 5), the OTU and the boundary value for each split are already known. Simply applying the already generated classification model to the new patients' data therefore makes it easy to infer which node each new patient belongs to, which in turn allows the disease status of the new patients to be predicted and classified with respect to the target characteristic.
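Applying a stored tree to new patients amounts to following the split conditions from the root. The sketch below illustrates this with the only split whose item and boundary value are stated in the text (B494_tr at 21.923); the dictionary layout, the function name `assign_node`, and the patient values are hypothetical, and the deeper levels of FIG. 5 are omitted because their conditions are not given.

```python
# Only the root split (B494_tr <= 21.923) is taken from FIG. 5; the leaf
# labels and the dictionary layout are hypothetical.
TREE = {
    "item": "B494_tr",
    "threshold": 21.923,
    "low": {"node": 1},    # value <= threshold
    "high": {"node": 2},   # value >  threshold
}

def assign_node(tree, patient):
    """Follow the splits until a leaf is reached and return its node number."""
    while "node" not in tree:
        branch = "low" if patient[tree["item"]] <= tree["threshold"] else "high"
        tree = tree[branch]
    return tree["node"]

new_patients = [{"B494_tr": 10.2}, {"B494_tr": 21.923}, {"B494_tr": 35.0}]
nodes = [assign_node(TREE, p) for p in new_patients]
```

Nesting further `item`/`threshold` dictionaries under `low` and `high` would extend the same traversal to a full tree, and the category mix recorded at the reached node then serves as the prediction for the new patient.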

The present invention is not limited to the embodiment described above.
That is, those skilled in the art may make various modifications, combinations, subcombinations, and substitutions to the components of the embodiment described above within the technical scope of the present invention or its equivalents.

In the embodiment described above, the data analyses that build the decision trees shown in FIGS. 5, 7, and 9 each divide the patients into four categories according to the value of the selected characteristic, but this is only an example and the present invention is not limited to it. Dividing the patients into more categories, for example six to eight, makes it possible to identify a category containing more severely affected patients and to identify the OTUs related to the characteristic more precisely.

The embodiment described above also assumes that, in the data analyses that build the decision trees shown in FIGS. 5, 7, and 9, the business operator 1 categorizes the patients in advance according to the value of the target characteristic, but the present invention is not limited to this. Clinical data is not necessarily numerical or categorical. It may be written as sentences or lists of phrases, as with a patient 2's questionnaire answers or a doctor's diagnosis. The business operator 1 may select such a free-text item as the characteristic and categorize the patients according to its content. For example, for an item "subjective symptoms" whose entries read "none", "pain in the abdomen", "pain in the leg", and so on, the patients may be categorized by whether they have subjective symptoms or by the site of the symptoms. The business operator 1 is thus free to define the categories according to the purpose of the analysis.
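The two ways of categorizing a free-text "subjective symptoms" item can be sketched as follows. The keyword matching is a deliberately naive, hypothetical rule chosen for illustration; the text does not specify how the descriptions are mapped to categories, and the function names are assumptions.

```python
def has_symptom(entry):
    """Category by presence: 'none' vs 'present'."""
    return "none" if entry.strip().lower() == "none" else "present"

def symptom_site(entry):
    """Category by body site, using hypothetical keyword matching."""
    text = entry.lower()
    for site in ("abdomen", "leg"):
        if site in text:
            return site
    return "none" if text.strip() == "none" else "other"

entries = ["None", "Pain in the abdomen", "Pain in the leg"]
by_presence = [has_symptom(e) for e in entries]
by_site = [symptom_site(e) for e in entries]
```

Either list of labels could then serve as the categorized characteristic for the tree-building step, in the same role that categories A to D play for HbA1c.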

The embodiment described above uses the C&RT method and the logistic regression method as examples of data mining techniques, but the data mining calculation methods usable in the present invention are not limited to these. The business operator 1 may, for example, freely select a calculation method suited to the purpose of the analysis from methods that build a decision tree and provide a classification model, such as the CHAID, QUEST, and C5.0 methods, and methods that provide a classification model without building a decision tree, such as the Bayes method, the logistic regression method, neural network algorithms, and the SVM method. With a calculation method that does not build a decision tree, the splits between nodes and their relationships cannot be grasped visually. Every calculation method nevertheless builds a classification model, so data analysis and classification with respect to the target characteristic remain possible.

When a computation method that generates a classification model without constructing a decision tree is adopted, the mathematical importance described below can additionally be calculated in order to estimate which items are closely related to the target characteristic.
The mathematical importance is calculated by, for example, nearest neighbor analysis.
The mathematical importance is used as a measure of dissimilarity: the data pattern of each component is identified on the basis of its similarity to the target characteristic, and the components are classified by their distance.
Specifically, let $FI_{(p)}$ be the importance of a component and $e$ be the error rate or sum of squared errors obtained when the patterns are compared. If the generated numerical model contains $X_{(1)}, X_{(2)}, \ldots, X_{(m)}$ $(1 \le m \le P^{0})$, the importance of the OTU $X_{(p)}$ in that model is calculated as follows.

First, the component $X_{(p)}$ is removed from the model, and the error rate or sum of squared errors $e_{(p)}$ is calculated and compared on the basis of the remaining components $X_{(1)}, X_{(2)}, \ldots, X_{(p-1)}, X_{(p+1)}, \ldots, X_{(m)}$.
Then $FI_{(p)} = e_{(p)} + 1/m$ is calculated, which finally gives the importance of the component $X_{(p)}$.
Here, $X$ is a two-dimensional $P \times N$ matrix with elements $X_{pn}$, where $p = 1, \ldots, P$ indexes the components and $n = 1, \ldots, N$ indexes the measured cases of the test subjects. $P$ is the number of dimensions of the target characteristic: the number of components for continuous characteristics, and the total number of categories across all components for categorical characteristics.
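The calculation of $FI_{(p)} = e_{(p)} + 1/m$ can be sketched in Python as follows. This is only an illustration under stated assumptions: the error $e_{(p)}$ is taken to be the leave-one-out error rate of a 1-nearest-neighbor classifier (the text says only "nearest neighbor analysis"), the data are stored with one row per measured case (the transpose of the $P \times N$ matrix $X$), and the sample values are hypothetical.

```python
def loo_error_rate(rows, labels, features):
    """Leave-one-out 1-nearest-neighbour error rate using only `features`."""
    errors = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # a case may not be its own neighbour
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        errors += int(labels[best_j] != labels[i])
    return errors / len(rows)


def importance(rows, labels):
    """FI_(p) = e_(p) + 1/m for each component p of the model."""
    m = len(rows[0])  # number of components in the model
    fi = []
    for p in range(m):
        remaining = [f for f in range(m) if f != p]  # model without X_(p)
        fi.append(loo_error_rate(rows, labels, remaining) + 1 / m)
    return fi


# Hypothetical data: component 0 separates the two classes, component 1 is noise.
rows = [[0, 3], [1, 8], [2, 5], [10, 4], [11, 7], [12, 6]]
labels = ["A", "A", "A", "B", "B", "B"]

fi = importance(rows, labels)
```

Removing the informative component 0 raises the error, so `fi[0]` exceeds `fi[1]`; a larger value indicates a component more closely related to the target characteristic.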

In the embodiment described above, the case was described in which the feces of the patient 2 are used as the specimen and data on resident intestinal bacteria are used as the basic medical data; however, the present invention is not limited to this. For example, urine, saliva, nasal mucus, skin, or vaginal fluid may be used as the specimen, and the symbiotic microorganisms contained in them may be analyzed. Also, for example, the results of metabolome analysis (whole-component analysis: comprehensive analysis of metabolites) with blood as the specimen may be used as the basic medical data.

DESCRIPTION OF SYMBOLS: 1 ... business operator, 2 ... patient, 3 ... hospital, 100 ... medical data analysis device, 101 ... storage unit, 102 ... display unit, 103 ... control unit, 104 ... input unit

Claims

A medical data analysis method for a medical data analysis device that performs data analysis based on clinical medical data, which is practical data on patients obtained during examination and/or treatment and includes at least one of physical data on a plurality of patients, data on the patients' lifestyle, data on the patients' disease status, and data on test results of specimens obtained from the patients, and on basic medical data, which is data on the results of basic medical examination and/or analysis of specimens obtained from the patients and includes data on symbiotic microorganisms obtained by performing basic medical examination and/or analysis of feces, urine, saliva, nasal mucus, skin, and vaginal fluid, the method comprising:
a first step in which the medical data analysis device associates the clinical medical data and the basic medical data for the same patient; and
a second step in which the medical data analysis device performs, for one preselected item of the clinical medical data, data analysis using a data mining technique based on the basic medical data associated in the first step,
wherein the second step further includes a third step of extracting similar groups in the basic medical data by the data mining technique and generating a classification model that classifies the structural characteristics of the entire basic medical data.

The medical data analysis method according to claim 1, wherein, in the second step, at least one of the C&RT method, the CHAID method, the QUEST method, and the C5.0 method is used as the data mining technique, and the medical data analysis device constructs a decision tree and the classification model by that technique.

The medical data analysis method according to claim 1, wherein, in the second step, at least one of a Bayesian method, a logistic regression method, a neural network algorithm, and an SVM (Support Vector Machine) is used as the data mining technique, and the medical data analysis device constructs the classification model by that technique.

The medical data analysis method according to any one of claims 1 to 3, wherein, in the second step, the medical data analysis device sets the one preselected item of the clinical medical data as an objective variable, sets each item of the associated basic medical data as an explanatory variable, and constructs the classification model by the data mining technique.

The medical data analysis method according to any one of claims 1 to 4, wherein
the physical data on the plurality of patients includes at least one of sex, age, height, weight, and degree of obesity,
the data on the patients' lifestyle includes at least one of smoking habit, drinking habit, meal preference, meal content, sleep status, exercise quality, and exercise quantity,
the data on the patients' disease status includes at least one of blood pressure, medication status, past medical history, subjective symptoms, and diagnostic results by medical personnel, and
the data on test results of specimens obtained from the patients includes at least one of a blood test result and a urine test result.

A medical data analysis device that performs data analysis based on clinical medical data, which is practical data on patients obtained during examination and/or treatment and includes at least one of physical data on a plurality of patients, data on the patients' lifestyle, data on the patients' disease status, and data on test results of specimens obtained from the patients, and on basic medical data, which is data on the results of basic medical examination and/or analysis of specimens obtained from the patients and includes data on symbiotic microorganisms obtained by performing basic medical examination and/or analysis of feces, urine, saliva, nasal mucus, skin, and vaginal fluid, the device comprising:
a storage unit that stores the clinical medical data and the basic medical data;
an input unit that accepts input operations; and
a control unit,
wherein the control unit associates the clinical medical data and the basic medical data relating to the same patient and, for one item of the clinical medical data selected in advance by an input operation via the input unit, uses a data mining technique based on the associated basic medical data to extract similar groups in the basic medical data and generate a classification model that classifies the structural characteristics of the entire basic medical data.

A program executed by a computer included in a medical data analysis device that performs data analysis based on clinical medical data, which is practical data on patients obtained during examination and/or treatment and includes at least one of physical data on a plurality of patients, data on the patients' lifestyle, data on the patients' disease status, and data on test results of specimens obtained from the patients, and on basic medical data, which is data on the results of basic medical examination and/or analysis of specimens obtained from the patients and includes data on symbiotic microorganisms obtained by performing basic medical examination and/or analysis of feces, urine, saliva, nasal mucus, skin, and vaginal fluid, the program causing the computer to execute:
a first procedure of associating the clinical medical data and the basic medical data for the same patient;
a second procedure of performing, for one item of the clinical medical data selected in advance, data analysis using a data mining technique based on the basic medical data associated in the first procedure; and
a third procedure, within the second procedure, of extracting similar groups in the basic medical data by the data mining technique and generating a classification model that classifies the structural characteristics of the entire basic medical data.