JP2009205464A

JP2009205464A - Medical information processor, medical information processing method, and medical information processing program

Info

Publication number: JP2009205464A
Application number: JP2008047420A
Authority: JP
Inventors: Satoru Hayamizu (速水悟); Keiko Yamamoto (山本けい子); Tetsutsugu Tamura (田村哲嗣); Yasuomi Kinosada (紀ノ定保臣); Masakazu Asano (浅野昌和); Akira Nakamura (中村明)
Original assignee: Gifu University NUC; Sanyo Electric Co Ltd
Current assignee: Gifu University NUC; Sanyo Electric Co Ltd
Priority date: 2008-02-28
Filing date: 2008-02-28
Publication date: 2009-09-10

Abstract

PROBLEM TO BE SOLVED: To provide a medical information processor that, by automatically clustering and modeling large amounts of data while taking individual differences fully into account, can model changes corresponding to the severity of a pathological condition or symptom and to fine-grained disease subclassifications, while simultaneously achieving more flexible time-series modeling and precise probability density estimation. SOLUTION: The medical information processor 10 classifies a set of medical information, in which examination items for a plurality of subjects are given as time series, into clusters based on the examination items. Within each classified cluster, it estimates the distribution of the multidimensional time series in the subjects' medical information as a probability density function, and determines a representative value serving as a model from the estimated probability density function. Using the representative value as a reference, it performs an optimal time-axis shift for all samples in each cluster, and repeats the processing until the results converge. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a medical information processing apparatus, a medical information processing method, and a medical information processing program, and in particular to an apparatus, method, and program that extract statistical information about probabilistic events from a large amount of data by combining time-series analysis with statistical methods.

Under the revision of the Health Insurance Act, from April 2008 medical insurers are obliged to provide health checkups and health guidance; in particular, they must provide health guidance aimed at preventing metabolic syndrome. Diagnosing and guiding each individual requires meticulous support and guidance tailored to that person's characteristics and circumstances.

Such meticulous support and guidance has required considerable cost, and it has also been difficult to grasp and manage the situation of each individual.
In response, attempts have recently been made to deliver individualized services using information technology. In particular, with the introduction of specified health checkups, there is growing interest in medical information systems that support health insurance societies, public health nurses, physicians, and individuals. Some websites now provide health guidance tailored to each pre-registered individual through a Web-based system and e-mail.

The main features of these conventional technologies are how efficiently examination data can be entered, how the data are displayed as time series, and how convenient the site is to use. None of them provides a mechanism for analyzing checkup results and interview data to calculate risks, or for numerically managing and supporting the course of health guidance.

As one such approach, Non-Patent Document 1 discloses predicting the likelihood of developing lifestyle-related diseases from an individual's checkup information. It uses data from an epidemiological study conducted for about 40 years in Hisayama, Fukuoka Prefecture. From a disease-risk formula derived from these data, together with the individual's age, weight, blood pressure, amount of exercise, electrocardiogram, cholesterol, blood glucose, and other test data, it predicts the likelihood of developing lifestyle-related diseases (cerebral infarction, ischemic heart disease, diabetes, hypertension, etc.) within the next 10 years.

Systems described in Patent Documents 1, 2, and 4, and a method for predicting the risk of developing lifestyle-related diseases described in Patent Document 3, have also been proposed.
In the system of Patent Document 1, the normal-range values and diagnostic decision values of an individual patient can be set arbitrarily for each numerical clinical-laboratory data item, either from the reference values, which differ from laboratory to laboratory, and the diagnostic decision values given in the literature, or from values expected for the test item in view of underlying diseases, complications, side effects of treatment, and the like. When the numerical laboratory data measured for the patient at the time of diagnosis are entered, the system converts them, according to these settings, into clinical diagnostic evaluation values that express diagnostic severity relative to the normal-range and diagnostic decision values.

The system of Patent Document 2 has a medical data input unit that receives medical data including the patient's test values and clinical findings; a biological model driving unit that reproduces the behavior of the body using a biological model describing diabetes-related organs and their functions as mathematical models; and a biological model unit that estimates a parameter set of the biological model from the medical data to generate a patient-specific biological model. The system further includes a pathological condition analysis unit that analyzes the patient's diabetic condition based on the parameter set of the generated model, a medical support information generation unit that generates medical support information using diagnostic criteria defined for each analyzed condition, and a medical support information output unit that outputs information obtained from the pathological condition analysis unit and/or the medical support information generation unit.

Patent Document 3 proposes a method of estimating fluctuations in a biochemical parameter selected from blood cholesterol, neutral lipids, and blood glucose by measuring the endotoxin concentration in a body fluid of a healthy person or healthy animal, and a method of predicting the risk of developing lifestyle-related diseases by that estimation. Patent Document 3 also proposes estimating fluctuations in the same biochemical parameters by detecting, or measuring the concentration of, endotoxin derived from periodontal pathogens in the body fluid of a healthy person or healthy animal, and predicting the risk of developing lifestyle-related diseases in the same way.

The diagnosis support system of Patent Document 4 has the following functional blocks: a physiological data input unit, a diabetes risk analysis unit, a metabolic syndrome risk analysis unit, a diagnosis support information generation unit, a biological model generation unit, a pathological condition simulation unit, and a diagnosis support information output unit. The diabetes risk analysis unit and the metabolic syndrome risk analysis unit analyze the risks of diabetes and metabolic syndrome, respectively, and the diagnosis support information generation unit generates diagnosis support information from the analysis results.
Patent Document 1: JP 2004-227041 A
Patent Document 2: JP 2005-267042 A
Patent Document 3: JP 2005-140618 A
Patent Document 4: JP 2006-304833 A
Non-Patent Document 1: Yutaka Kiyohara, "What is the Hisayama Study?" [online], 2006, Hisayama Research Institute, Department of Environmental Medicine, Graduate School of Medical Sciences, Kyushu University [retrieved December 18, 2007], Internet <URL: http://www.envmed.med.kyushu-u.ac.jp/about/index.html>

However, Non-Patent Document 1 covers about 2,600 subjects and calculates disease risk for individual lifestyle-related diseases. Its estimation of the statistical information used for risk calculation therefore does not yet, on the whole, model finer levels of disease and circumstance or account for individual differences. Moreover, using information-system support in health guidance by public health nurses, registered dietitians, physicians, and others is not yet common. The need to extract the optimal information that an information system should hold, in order to give detailed support matched to each person's health condition and lifestyle, is recognized, but sufficient research results have not yet been obtained.

To achieve meticulous support and health guidance tailored to individual characteristics and circumstances, time series such as test data must be modeled more precisely, and the probability of occurrence of the events of interest must be estimated in more detail from statistical information obtained from large amounts of data.

The first problem with the prior art of Patent Documents 1 to 4 is that individual differences are not sufficiently taken into account. Even when estimating the risk of a specific disease, the underlying model should in practice differ according to the severity of symptoms or finer subclassifications of the disease, but the prior art does not address this.

The second problem is the handling of data variability. Test data and data such as daily meals and amount of exercise vary widely, even within a single person's records. The prior art merely estimates an overall probability density for this variability.

Third, the prior art models temporal change only with simple patterns such as rising, falling, or an average change pattern, and finer modeling has not been achieved.
Fourth, even where the prior art proposes analyzing test data to predict risk, it does not propose how to obtain the data on which such judgments are based. That is, the prior art proposes no means of deriving the statistical knowledge needed for prediction from the original data.

The modeling methods of the prior art yield coarse models. Here, a coarse model (or coarse probability estimate) means one that roughly models test data, interview data, and the like for a small number of classes, without considering temporal changes or differences between individuals.

When modeling, or estimating a probability distribution (probability density function), in this way, the target data contain variations due to various factors that cannot be distinguished, so the result is inevitably a broad estimate that captures only the overall shape of the distribution.

For example, if temporal changes are modeled without distinguishing samples at different stages, from healthy to diseased, only the overall distribution can be estimated. Likewise, if all target samples are treated as a single group without considering individual differences, a probability distribution with a complicated shape must be estimated, which a simple model cannot represent. A complicated model could be used instead, but in terms of estimation accuracy it is generally difficult to estimate precisely with a model that contains variations from various unknown factors.

An object of the present invention is to provide a medical information processing method that, by automatically modeling large amounts of data through clustering while taking individual differences fully into account, can model changes corresponding to the severity of pathological conditions and symptoms and to fine-grained disease classifications, while simultaneously achieving more flexible time-series modeling and precise probability density estimation.

A further object of the present invention is to provide a medical information processing method that, because it uses clustering whose granularity can be set, enables more detailed modeling than the prior art, and that shows concretely how the statistical knowledge for predicting a subject's disease risk can be obtained from the original data.

A second object of the present invention is to provide a medical information processing apparatus that, by automatically modeling large amounts of data through clustering while taking individual differences fully into account, can model changes corresponding to the severity of pathological conditions and symptoms and to fine-grained disease classifications, while simultaneously achieving more flexible time-series modeling and precise probability density estimation.

A further object of the present invention is to provide a medical information processing apparatus that, because it uses clustering whose granularity can be set, enables more detailed modeling than the prior art, and that shows concretely how the statistical knowledge for predicting a subject's disease risk can be obtained from the original data.

A third object of the present invention is to provide a medical information processing program that, by automatically modeling large amounts of data through clustering while taking individual differences fully into account, can model changes corresponding to the severity of pathological conditions and symptoms and to fine-grained disease classifications, while simultaneously achieving more flexible time-series modeling and precise probability density estimation.

A further object of the present invention is to provide a medical information processing program that, because it uses clustering whose granularity can be set, enables more detailed modeling than the prior art, and that shows concretely how the statistical knowledge for predicting a subject's disease risk can be obtained from the original data.

To achieve the above objects, the invention of claim 1 is a medical information processing method comprising: a first step of classifying a set of medical information, in which data of a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data; a second step of estimating, within each classified cluster, the distribution of the multidimensional time series in the set of medical information relating to the plurality of subjects as a probability density function; a third step of determining a representative value serving as a model from the estimated probability density function in the cluster; and a fourth step of performing an optimal time-axis shift for all samples in each cluster with the representative value as a reference.

The invention of claim 2 is the method of claim 1, wherein the result obtained in the fourth step is made to converge by repeating the first to fourth steps on it.
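The four steps of claim 1, repeated until convergence as in claim 2, can be illustrated with the minimal sketch below. It is not the patent's implementation, and several choices are assumptions: step 1 assigns each subject to the nearest representative (k-means style), step 2 fits a diagonal Gaussian per time point and examination item as the probability density function, step 3 takes that Gaussian's mean trajectory (its mode) as the representative value, step 4 searches integer time shifts with edge padding, and the clusters are seeded from the first k subjects. The names `fit_clusters`, `shift`, `sq_dist`, and `gaussian` are hypothetical.

```python
Series = list[list[float]]  # one subject: T time points x d examination items


def shift(series: Series, s: int) -> Series:
    """Move a series s steps later on the time axis (s < 0 moves it earlier),
    padding with the first/last observation (an assumed boundary rule)."""
    T = len(series)
    return [series[min(max(t - s, 0), T - 1)] for t in range(T)]


def sq_dist(a: Series, b: Series) -> float:
    """Sum of squared differences over all time points and items."""
    return sum((x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb))


def gaussian(members: list[Series], var_floor: float = 1e-6):
    """Step 2 (assumed form): per-time-point, per-item mean and variance."""
    n, T, d = len(members), len(members[0]), len(members[0][0])
    mean = [[sum(m[t][j] for m in members) / n for j in range(d)] for t in range(T)]
    var = [[max(sum((m[t][j] - mean[t][j]) ** 2 for m in members) / n, var_floor)
            for j in range(d)] for t in range(T)]
    return mean, var


def fit_clusters(data: list[Series], k: int, max_shift: int = 1, max_iter: int = 20):
    reps = list(data[:k])  # assumption: the first k subjects seed the clusters
    labels, offsets = None, [0] * len(data)
    models = [None] * k
    for _ in range(max_iter):
        aligned = [shift(x, o) for x, o in zip(data, offsets)]
        # Step 1: assign each time-aligned subject to the nearest representative.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, reps[c])) for x in aligned]
        # Steps 2-3: fit a density per cluster; its mean trajectory becomes
        # the representative. An empty cluster keeps its previous model.
        for c in range(k):
            members = [x for x, lab in zip(aligned, new_labels) if lab == c]
            if members:
                models[c] = gaussian(members)
                reps[c] = models[c][0]
        # Step 4: best absolute shift of each original series against its
        # cluster's representative; ties go to the smaller shift.
        new_offsets = [
            min(range(-max_shift, max_shift + 1),
                key=lambda s: (sq_dist(shift(x, s), reps[c]), abs(s)))
            for x, c in zip(data, new_labels)
        ]
        # Claim 2: stop once assignments and shifts no longer change.
        converged = new_labels == labels and new_offsets == offsets
        labels, offsets = new_labels, new_offsets
        if converged:
            break
    return labels, offsets, models
```

With two hypothetical one-item groups, a rising trajectory that starts one step early is shifted back into line with the others, and the cluster mean becomes a sharp trajectory rather than a blurred average:

```python
A1, A2 = [[0], [0], [1], [2]], [[0], [1], [2], [2]]
labels, offsets, models = fit_clusters([A1, [[10]] * 4, A1, A2, [[11]] * 4], k=2)
# labels == [0, 1, 0, 0, 1], offsets == [0, 0, 0, 1, 0]
```

Keeping the original series and storing an absolute offset per subject (rather than re-shifting already shifted data) avoids compounding edge padding across iterations.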

The invention of claim 3 is the method of claim 2, further comprising: a step of assigning related information to the clusters after the result obtained in the fourth step has converged; a step of, when new medical information in which examination items are given as a time series is input, calculating a certainty factor indicating which cluster the examination items of the new medical information belong to; and a step of outputting the certainty factor indicating which cluster the new medical information belongs to, together with the related information of the cluster to which it belongs.
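The claim does not define how the certainty factor is computed. One common choice, assumed here, is the Bayesian posterior probability of each cluster under per-cluster Gaussian models with diagonal covariance, with each subject's time series flattened into a single vector. The helper names `cluster_certainty` and `report`, the uniform default prior, and the text labels used as "related information" are all hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def cluster_certainty(x, models, priors=None):
    """Posterior probability that x belongs to each cluster (Bayes' rule).
    models: list of (mean vector, variance vector); priors default to uniform."""
    k = len(models)
    priors = priors or [1.0 / k] * k
    logs = [math.log(p) + log_gauss_diag(x, m, v)
            for (m, v), p in zip(models, priors)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    weights = [math.exp(lg - top) for lg in logs]
    total = sum(weights)
    return [w / total for w in weights]


def report(x, models, related_info, priors=None):
    """Return (most likely cluster, its certainty, that cluster's related info)."""
    post = cluster_certainty(x, models, priors)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best], related_info[best]
```

For two unit-variance clusters centred at (0, 0) and (2, 2), a new record at (0, 0) is assigned to the first cluster with certainty 1 / (1 + e^-4), about 0.982, while a record exactly midway gets 0.5 for each.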

The invention of claim 4 is the method of any one of claims 1 to 3, wherein the second step interpolates data having missing portions based on the estimated probability density function.
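Claim 4 does not say how the density is used to fill gaps. If, as an assumption, a cluster's density over the time points of one examination item is a multivariate Gaussian N(mu, Sigma), a natural choice is the conditional mean of the missing points given the observed ones, x_m = mu_m + Sigma_mo Sigma_oo^-1 (x_o - mu_o). The sketch below implements that formula in plain Python; `impute` and `solve` are hypothetical names, and `None` marks a missing value.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def impute(x, mu, cov):
    """Fill None entries of x with their conditional mean under N(mu, cov).
    With no observed entries, the missing ones fall back to mu."""
    obs = [i for i, v in enumerate(x) if v is not None]
    mis = [i for i, v in enumerate(x) if v is None]
    out = list(x)
    s_oo = [[cov[i][j] for j in obs] for i in obs]
    alpha = solve(s_oo, [x[i] - mu[i] for i in obs])  # Sigma_oo^-1 (x_o - mu_o)
    for m in mis:
        out[m] = mu[m] + sum(cov[m][j] * a for j, a in zip(obs, alpha))
    return out
```

For three zero-mean time points with an AR(1)-style covariance (correlation 0.5 between neighbours), a missing middle value between observations 1.0 and 3.0 is filled with 1.6, which matches the closed form rho / (1 + rho^2) * (x0 + x2).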

The invention of claim 5 is the method of claim 4, wherein, when the medical information in a cluster differs in time scale, the second step aligns the time scales.
The invention of claim 6 is a medical information processing apparatus comprising: first means for classifying a set of medical information, in which data of a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data; second means for estimating, within each classified cluster, the distribution of the multidimensional time series in the set of medical information relating to the plurality of subjects as a probability density function; third means for determining a representative value serving as a model from the estimated probability density function in the cluster; and fourth means for performing an optimal time-axis shift for all samples in each cluster with the representative value as a reference.

The invention of claim 7 is the apparatus of claim 6, further comprising fifth means for causing the result obtained by the fourth means to converge by having the first to fourth means process it repeatedly.

The invention of claim 8 is the apparatus of claim 7, further comprising: related-information assigning means for assigning related information to the clusters converged by the fifth means; certainty calculation means for calculating, when new medical information in which examination items are given as a time series is input, a certainty factor indicating which cluster the examination items of the new medical information belong to; and output means for outputting the certainty factor indicating which cluster the new medical information belongs to, together with the related information of the cluster to which it belongs.

The invention of claim 9 is the apparatus of any one of claims 6 to 8, wherein the second means interpolates data having missing portions based on the estimated probability density function.

The invention of claim 10 is the apparatus of claim 9, wherein, when the medical information in a cluster differs in time scale, the second means aligns the time scales.
The invention of claim 11 is a medical information processing program that causes a computer to function as: first means for classifying a set of medical information, in which data of a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data; second means for estimating, within each classified cluster, the distribution of the multidimensional time series in the set of medical information relating to the plurality of subjects as a probability density function; third means for determining a representative value serving as a model from the estimated probability density function in the cluster; and fourth means for performing an optimal time-axis shift for all samples in each cluster with the representative value as a reference.

The invention of claim 12 is the program of claim 11, which further causes the computer to function as fifth means for causing the result obtained by the fourth means to converge by having the first to fourth means process it repeatedly.

According to the invention of claim 1, it is possible to provide a medical information processing method that, by automatically modeling large amounts of data through clustering while taking individual differences fully into account, can model changes corresponding to the severity of pathological conditions and symptoms and to fine-grained disease classifications, while simultaneously achieving more flexible time-series modeling and precise probability density estimation.

Furthermore, according to the invention of claim 1, clustering with a settable granularity enables more detailed modeling than the prior art, and it is possible to provide a medical information processing method that shows concretely how the statistical knowledge for predicting a subject's disease risk can be obtained from the original data.

According to the invention of claim 2, the result obtained in the fourth step is processed again through the first to fourth steps, so an optimal classification result or an optimal probability density function can be obtained, and the effect of claim 1 is realized more fully.

In conventional modeling or estimation of a probability distribution (probability density function), the target data contain variations due to various factors that cannot be distinguished, so the result is inevitably a broad estimate capturing only the overall shape of the distribution. According to the inventions of claims 1 and 2, however, variation due to temporal change is absorbed by the process that aligns it on the time axis.

Also, according to the inventions of claims 1 and 2, individual differences can be handled by dividing (classifying) the samples into clusters of an arbitrary number of categories. Combined with the advantage above, this makes the distribution of samples within each resulting cluster more compact.
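Why clustering makes each distribution more compact can be seen from the law of total variance: the pooled variance equals the size-weighted mean of the within-cluster variances plus the variance of the cluster means. The sketch below checks this decomposition; the function name `variance_decomposition` and the two groups of one-dimensional values are hypothetical, not data from this publication.

```python
from statistics import mean, pvariance


def variance_decomposition(clusters):
    """Return (pooled variance, mean within-cluster variance, between-cluster
    variance), all in population form and weighted by cluster size."""
    pooled = [v for c in clusters for v in c]
    n = len(pooled)
    within = sum(len(c) * pvariance(c) for c in clusters) / n
    grand = mean(pooled)
    between = sum(len(c) * (mean(c) - grand) ** 2 for c in clusters) / n
    return pvariance(pooled), within, between
```

For the groups [0, 0, 1, 1] and [10, 10, 11, 11], the pooled variance is 25.25, of which only 0.25 remains inside each cluster; the other 25.0 is the spread between cluster means that a single pooled model would have to absorb.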

This allows the phenomena of interest to be modeled more precisely and their probabilities to be estimated in more detail.
According to the invention of claim 3, when new medical information is input, the certainty factor indicating which cluster it belongs to, and the related information of that cluster, can be obtained easily and used as support when operating a health management program.

In the prior art, missing values in the time series of test values and other items in medical information are unavoidable, and only data that remain after cleansing to remove such records are used. According to the invention of claim 4, it is possible to provide a medical information processing method that can also handle medical information with missing values in the time series.

In the prior art, time series that mix units of years, months, weeks, days, and hours were difficult to handle uniformly. According to the invention of claim 5, differences in time scale among medical information can be absorbed, so it is possible to provide a medical information processing method that handles such mixed-unit time series uniformly.
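Claim 5 only states that time scales are aligned. One simple way to do this, assumed here, is to convert every timestamp to a common unit (days) and linearly interpolate the values onto a uniform grid. The conversion factors (an average Gregorian year of 365.25 days and a month of one twelfth of that), the record format `(amount, unit, value)`, and the names `to_days` and `resample` are all hypothetical.

```python
import bisect
import math

# Assumed conversion factors to days.
UNIT_IN_DAYS = {"year": 365.25, "month": 365.25 / 12, "week": 7.0,
                "day": 1.0, "hour": 1.0 / 24}


def to_days(records):
    """(amount, unit, value) records -> (time in days, value), sorted by time."""
    return sorted((amount * UNIT_IN_DAYS[unit], value)
                  for amount, unit, value in records)


def resample(points, step):
    """Linearly interpolate (time, value) points onto a uniform grid of `step` days."""
    ts = [t for t, _ in points]
    vs = [v for _, v in points]
    n = math.floor((ts[-1] - ts[0]) / step + 1e-9)  # tolerate float round-off
    out = []
    for k in range(n + 1):
        t = ts[0] + k * step
        i = bisect.bisect_right(ts, t) - 1
        if i >= len(ts) - 1:  # at or past the last observation
            out.append((t, vs[-1]))
        else:
            frac = (t - ts[i]) / (ts[i + 1] - ts[i])
            out.append((t, vs[i] + frac * (vs[i + 1] - vs[i])))
    return out
```

Records stamped at day 0, week 1, and hour 504 (day 21) resample on a 7-day grid to times 0, 7, 14, and 21; with values 100, 107, and 121 the interpolated value at day 14 is 114.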

According to the invention of claim 6, it is possible to provide a medical information processing apparatus that, by automatically modeling large amounts of data through clustering while taking individual differences fully into account, can model changes corresponding to the severity of pathological conditions and symptoms and to fine-grained disease classifications, while simultaneously achieving more flexible time-series modeling and precise probability density estimation. Furthermore, according to the invention of claim 6, clustering with a settable granularity enables more detailed modeling than the prior art, and it is possible to provide a medical information processing apparatus that shows concretely how the statistical knowledge for predicting a subject's disease risk can be obtained from the original data.

According to the invention of claim 7, the result obtained by the fourth means is processed again by the first to fourth means, so an optimal classification result or an optimal probability density function can be obtained, and the effect of claim 6 is realized more fully.

According to the invention of claim 8, it is possible to provide a medical information processing apparatus with which, when new medical information is input, the certainty factor indicating which cluster it belongs to, and the related information of that cluster, can be obtained easily and used as support when operating a health management program.

Test values contained in medical information inevitably have missing values along the time series. In the prior art, only data that have been cleansed to exclude such records are analyzed. According to the invention of claim 9, a medical information processing apparatus can be provided that also handles medical information with missing values in the time series.

Further, in the prior art it was difficult to handle uniformly a time series that mixes units of years, months, weeks, days, and hours. According to the invention of claim 10, differences in time scale between items of medical information can be absorbed. This provides a medical information processing apparatus that handles such mixed-unit time series uniformly.

According to the invention of claim 11, a medical information processing program can be provided that readily realizes the effect of the medical information processing apparatus of claim 5.
According to the invention of claim 12, a medical information processing program can be provided that readily realizes the effect of the medical information processing apparatus of claim 6.

An embodiment of the present invention will be described below with reference to FIGS. 1 to 3.
FIG. 1 is an overall schematic diagram of the medical information processing apparatus.
As shown in FIG. 1, the medical information processing apparatus 10 includes an input device 11, such as a keyboard, and a data processing device 12 that operates according to a program. A storage device 13 for storing various data and an output device 14 are connected to the data processing device 12. The output device 14 includes, for example, a display and a printer, and corresponds to the output means.

The data processing device 12 consists of a computer 12c having a ROM 12a and a RAM 12b. It performs medical information processing according to a medical information processing program stored in a storage means such as the ROM 12a.

The data processing device 12 includes the following means: clustering means 21, probability density function estimating means 22, representative value determining means 23, time axis shifting means 24, convergence means 25, related information assigning means 26, and new data processing means 27. The clustering means 21 corresponds to the first means. It classifies a set of medical information, in which test items for a plurality of subjects are recorded as time series, into clusters according to the test items.

Representative examples of medical information include the following, although the medical information is not limited to them. They include test items from clinical tests and health checkups.

Body measurements (height, weight, BMI, waist circumference); blood pressure readings; blood chemistry values (triglycerides, HDL cholesterol, LDL cholesterol); liver function values (AST (GOT), ALT (GPT), γ-GT (γ-GTP)); blood glucose values (fasting blood glucose or HbA1c); urinalysis values (urine glucose, etc.); and the electrocardiogram.

The test items of the medical information also include interview (questionnaire) items and physical examination items performed by a physician, among others. Each item carries time information giving the date and time of the examination. This time information allows the medical information (test items) to be treated as the time-series distribution described later.

The probability density function estimating means 22 corresponds to the second means. Within each classified cluster, it estimates the multidimensional time-series distribution of the set of medical information for the plurality of subjects as a probability density function. The representative value determining means 23 corresponds to the third means. It determines a representative value, which serves as the model, from the probability density functions estimated in the cluster. The time axis shifting means 24 corresponds to the fourth means. Using the representative value as the reference, it applies the optimal time-axis shift to every sample in each cluster. The convergence means 25 corresponds to the fifth means. It makes the first to fourth means repeat their processing until the result converges. The related information assigning means 26 assigns related information, such as diseases, symptoms, and drugs, to each cluster. The new data processing means 27 performs various kinds of processing on new data.

Next, the processing performed by the data processing device 12 according to the medical information processing program will be described with reference to FIG. 2.
Before the data processing device 12 performs medical information processing, thresholds Sh for the various test items are stored in the storage device 13 as preprocessing. These are, for example, the thresholds used to judge from medical information that a subject has, or is suspected of having, a disease as defined by national bodies or academic societies. Medical information on a plurality of subjects is also stored in the storage device 13 in advance via the input device 11.

(Step S10)
In S10, the clustering means 21 of the data processing device 12 reads the medical information of the plurality of subjects. This information was stored in the storage device 13 in advance via the input device 11. The clustering means 21 then clusters this set of medical information into an arbitrary number of categories according to the test items. The larger the number of subjects, the better.

In the medical information of the plurality of subjects, let t = 1, 2, …, T index the time information, i = 1, 2, …, M the test items, and j = 1, 2, …, N the persons. Each sample datum, that is, each item of medical information, is then written f(i, t, j).

Before performing this clustering, the data processing device 12 determines the cluster granularity.
The cluster granularity is a numerical index of the size of each cluster. Controlling the granularity means deciding how many samples are assigned to each cluster when the whole sample set is divided into clusters. In this embodiment, the number of samples belonging to a cluster is used as its granularity.

Clustering may use, for example, the k-means method. It is not limited to k-means, and other clustering methods may be used. S10 corresponds to the first step.
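The clustering in S10 can be sketched as follows. This is a minimal illustration built on three assumptions that the patent does not specify. First, each subject's record f(i, t, j) is flattened into one vector covering all test items at all time points. Second, the granularity (target number of samples per cluster) is converted into a cluster count k. Third, plain k-means (Lloyd's algorithm) is then run. The helper names `subject_vector`, `clusters_for_granularity`, and `kmeans` are hypothetical.

```python
import math
import random


def subject_vector(record):
    """Flatten one subject's record, record[t][i] = f(i, t, j), into a single vector."""
    return [value for row in record for value in row]


def clusters_for_granularity(n_samples, granularity):
    """Hypothetical rule: pick k so that each cluster holds about `granularity` samples."""
    return max(1, math.ceil(n_samples / granularity))


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With six subjects and a granularity of 3, this sketch asks k-means for two clusters. Any other clustering method could replace `kmeans` without changing the rest of the flow.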
(Step S20)
In S20, the probability density function estimating means 22 of the data processing device 12 estimates the distribution of the time-series data of the samples in each cluster as a probability density function. If the samples have different time scales, the time scales are first aligned before the probability density function is estimated.

Different time scales means, for example, that a given test item is measured yearly for one sample but monthly for another. In that case, the data are first aligned to the time scale set in advance for each test item via the input device 11, and then the probability density function is estimated. Aligning the time scales absorbs the differences between samples. Time series that mix units of years, months, weeks, days, and hours can therefore be handled uniformly.
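A minimal sketch of this time-scale alignment follows. It assumes dated observations of one test item and assumes that values falling in the same period are averaged; the patent does not say how values within a period are combined. `period_key` and `align_to_scale` are hypothetical names, and hour-level scales would need `datetime` instead of `date`.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def period_key(d, scale):
    """Map a date onto the period it falls in for the chosen time scale."""
    if scale == "year":
        return (d.year,)
    if scale == "month":
        return (d.year, d.month)
    if scale == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if scale == "day":
        return (d.year, d.month, d.day)
    raise ValueError(f"unsupported scale: {scale}")


def align_to_scale(observations, scale):
    """Resample (date, value) pairs onto `scale`, averaging values in the same period."""
    buckets = defaultdict(list)
    for d, value in observations:
        buckets[period_key(d, scale)].append(value)
    return [(key, mean(values)) for key, values in sorted(buckets.items())]
```

For example, a monthly series and a yearly series of the same test item can both be passed through `align_to_scale(..., "year")` so that they share one time index before density estimation.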

If the samples already share the same time scale, the probability density function is estimated directly.
A probability density distribution is usually estimated by estimating a probability density function, which gives the probability density of an event occurring at each point of a multidimensional space. The following describes a parametric estimation method. It assumes a functional form for the shape of the distribution and estimates the parameters that characterize it.

Suppose that, for a cluster c, the values of M test items (i = 1, 2, …, M) at time t have been obtained for N persons (j = 1, 2, …, N). The probability density distribution in the M-dimensional space is estimated from these N samples. For example, if a multivariate normal distribution is assumed as the shape of the distribution, the probability density function P(X) can be written in terms of the mean vector μ and the covariance matrix Σ as follows:
$$P(X) = \frac{1}{(2\pi)^{M/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(X-\mu)^{\top}\Sigma^{-1}(X-\mu)\right)$$

The probability density function is obtained by estimating the mean vector μ and the covariance matrix Σ from the N samples. Note that the mean vector and the covariance matrix must be estimated separately for each cluster c and each time t.
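The estimation of μ and Σ and the evaluation of P(X) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions. It uses the unbiased (N − 1) covariance estimate. It works with log P(X), computed via a Cholesky factorization, rather than P(X) itself, to avoid numerical underflow. Function names are illustrative.

```python
import math


def fit_gaussian(samples):
    """Estimate the mean vector and unbiased covariance matrix from N samples of dimension M."""
    n, m = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(m)]
    cov = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
            for b in range(m)] for a in range(m)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    m = len(a)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def gaussian_logpdf(x, mu, cov):
    """log P(x) for a multivariate normal with mean mu and covariance cov."""
    m = len(mu)
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L z = (x - mu); then (x-mu)^T Sigma^-1 (x-mu) = z.z
    z = []
    for i in range(m):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    return -0.5 * (m * math.log(2 * math.pi) + log_det + sum(v * v for v in z))
```

Because the parameters are needed for each cluster c and time t, one might store them in a dictionary keyed by `(c, t)`. The value of `gaussian_logpdf(Y, *models[(c, t)])` then plays the role of the risk measure for an unknown sample Y.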

When the test items of an unknown sample Y are entered via the keyboard or the like, Y is substituted into P(X) above. The resulting value is used as a numerical measure of the risk that Y is predicted to belong to cluster c at time t.

When a time series of test items for an unknown sample is given, the time axes are first matched by the same procedure as in S40, described later, to find the optimal time-axis shift. The probability is then computed with the probability density function P(X) for the corresponding time.

Although a multivariate normal distribution is used here as the probability density function, only the diagonal components may be used instead of the full covariance matrix when few samples are available for estimation. Conversely, when a reasonably large number of samples is available, a Gaussian mixture (a weighted sum of several normal distributions) may be used. These are parametric methods: they represent the distribution function by parameters such as mean vectors and covariance matrices.
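The diagonal-covariance variant can be sketched as follows. It is a minimal illustration assuming unbiased per-item variances. Its appeal with small samples is the parameter count: it estimates M means and M variances, instead of M means plus M(M + 1)/2 covariance entries. The names are hypothetical.

```python
import math


def fit_diagonal_gaussian(samples):
    """Mean vector and per-item variances only (off-diagonal covariances are ignored)."""
    n, m = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(m)]
    var = [sum((s[i] - mu[i]) ** 2 for s in samples) / (n - 1) for i in range(m)]
    return mu, var


def diagonal_logpdf(x, mu, var):
    """Log density of a normal distribution with covariance diag(var)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
               for xi, mi, v in zip(x, mu, var))


def n_parameters(m, diagonal):
    """Number of mean + covariance parameters to estimate for M test items."""
    return m + (m if diagonal else m * (m + 1) // 2)
```

For M = 10 test items, the diagonal model needs 20 parameters, whereas the full model needs 65.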

Estimation of the probability density function is not limited to parametric methods and may also be performed nonparametrically.
A nonparametric method does not assume any particular functional form for the shape of the distribution. A typical nonparametric method is kernel density estimation, which includes multivariate kernel density estimation and binned kernel estimation, among others.

For example, binned kernel estimation estimates the probability density function as follows:
$$\hat{f}(x) = \frac{1}{N}\sum_{j=1}^{B} n_j\,K_h(x - j\delta)$$

Here, B is the number of bins, N is the number of samples, n_j is the count in the j-th bin, δ is the bin width, and K_h(x − jδ) is a kernel function with bandwidth h. The Epanechnikov kernel, for example, can be used as the kernel function.
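The one-dimensional binned estimator above can be sketched as follows. This is a minimal sketch that assumes bin j is centred at jδ with the origin at 0. Each sample is assigned to its nearest grid point, and only non-empty bins are summed over. It uses K_h(u) = K(u/h)/h with the Epanechnikov kernel K(u) = 0.75(1 − u²) for |u| ≤ 1. The function names are hypothetical.

```python
from collections import Counter


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def binned_kde(samples, delta, h):
    """Return f_hat(x) = (1/N) * sum_j n_j * K_h(x - j*delta) over the non-empty bins."""
    n = len(samples)
    counts = Counter(round(x / delta) for x in samples)  # n_j for each bin index j

    def density(x):
        return sum(nj * epanechnikov((x - j * delta) / h) / h
                   for j, nj in counts.items()) / n

    return density
```

Binning means the kernel is evaluated once per occupied bin rather than once per sample, which is the point of the binned variant when N is large.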

Because kernel density estimation does not assume a particular functional form for the distribution as a whole, it allows more flexible modeling.
(When there are missing values)
After estimating the probability density function, the probability density function estimating means 22 checks whether the time series of test items of the samples in each cluster contain missing values. If they do, the missing values (that is, the missing portions) are interpolated based on the estimated probability density function. S20 corresponds to the second step.
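The patent does not say how the estimated density is used to fill a gap. One reasonable reading, assumed here, is to treat the sample vector as jointly normal and replace each missing entry by its conditional mean given the observed entries:

μ_m + Σ_mo Σ_oo⁻¹ (x_o − μ_o)

A minimal sketch with hypothetical names, where `None` marks a missing value:

```python
def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fill_missing(x, mu, cov):
    """Replace None entries of x by the Gaussian conditional mean given the observed entries."""
    obs = [i for i, v in enumerate(x) if v is not None]
    mis = [i for i, v in enumerate(x) if v is None]
    filled = list(x)
    if not mis:
        return filled
    if not obs:
        return [mu[i] for i in range(len(x))]  # nothing observed: fall back to the mean
    sigma_oo = [[cov[a][b] for b in obs] for a in obs]
    w = solve(sigma_oo, [x[i] - mu[i] for i in obs])  # Sigma_oo^-1 (x_o - mu_o)
    for i in mis:
        filled[i] = mu[i] + sum(cov[i][o] * wk for o, wk in zip(obs, w))
    return filled
```

With a positive correlation between two items, an observed value above the mean pulls the filled-in value above its mean as well.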

(Step S30)
In S30, the representative value determining means 23 of the data processing device 12 determines a representative value in each cluster. The representative value (model) serves as the reference point when the samples in the cluster are shifted in S40, described later. It may be a central value or the mean. It may also be determined from externally input information. S30 corresponds to the third step.
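A minimal sketch of S30 follows. It assumes that the representative value is a time series computed point by point over the cluster members, using either the mean or the median, and that missing points (`None`) are skipped. `representative_series` is a hypothetical name.

```python
from statistics import mean, median


def representative_series(series_list, how="mean"):
    """Pointwise mean or median over the cluster members' series, skipping None values."""
    agg = mean if how == "mean" else median
    length = len(series_list[0])
    return [agg([s[t] for s in series_list if s[t] is not None]) for t in range(length)]
```

The median is less sensitive than the mean to a single subject with an extreme test value.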

(Step S40)
In S40, the time axis shifting means 24 of the data processing device 12 finds the optimal time-axis shift and shifts the data of the test item (that is, the test data) accordingly. The optimal shift (time difference) is found by shifting a sample in the cluster through every value in a given range, for example from −τ to τ. The shift that minimizes the difference (sum of distances) from the representative value determined in S30 is taken as the optimal time difference. This shift is applied to all test data.
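The shift search of S40 can be sketched as an exhaustive search over integer shifts in [−τ, τ]. One assumption departs from the text: the cost here is the mean squared difference over the overlapping time points, rather than a plain sum. A plain sum tends to favour large shifts simply because they leave fewer overlapping points. Function names are hypothetical.

```python
def best_shift(series, reference, tau):
    """Integer s in [-tau, tau] minimising the mean squared difference between
    series[t + s] and reference[t] over the time points where both exist."""
    best = None
    for s in range(-tau, tau + 1):
        pairs = [(series[t + s], reference[t]) for t in range(len(reference))
                 if 0 <= t + s < len(series)]
        if not pairs:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if best is None or cost < best[1]:
            best = (s, cost)
    return best[0]


def shift(series, s):
    """Re-index so that new[t] = series[t + s]; positions with no data become None."""
    return [series[t + s] if 0 <= t + s < len(series) else None for t in range(len(series))]
```

A subject whose disease course started two time steps later than the representative gets s = 2, and `shift(series, 2)` lines its values up with the model.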

As a result, the axes along which the test values change due to disease become aligned.
FIG. 3(a) is a schematic diagram of medical information with a certain test item for a plurality of subjects, before clustering. In FIG. 3(a), the vertical axis shows the test value of the test item and the horizontal axis shows time. FIG. 3(b) is a schematic diagram after clustering has been performed and the time axes aligned. As shown in FIG. 3(b), clustering and time-axis alignment make a judgment possible for that test item: in a particular cluster, a subject may be judged to possibly have a disease when the time-series test values are at or below the threshold Sh (or at or above it).

Note that the test-value scales of FIG. 3(a) and FIG. 3(b) differ for convenience of explanation. S40 corresponds to the fourth step.
(Step S50)
In S50, the convergence means 25 of the data processing device 12 determines whether the data shifted in S40 have converged. The judgment uses a numerical index of how well the cluster classification and the estimated probability density functions model the data.

For example, the convergence means 25 compares this index with the model-goodness index from the previous pass (that is, the previous execution of S10 to S40). It judges that convergence has been reached when the improvement no longer reaches a predetermined value. When no index from a previous pass exists, that is, when S50 is executed for the first time, the convergence means 25 uses a predetermined initial value of the index.

The predetermined value may be absolute or relative (for example, convergence is declared when the improvement is 1% or less of the previous value).
A probabilistic quantity (likelihood) is used as the numerical index.
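The convergence rule of S50 can be sketched as follows. The sketch assumes the index is a log-likelihood (higher is better). Either an absolute threshold or a relative threshold (a fraction of the previous value's magnitude) may be used, and a predetermined initial index stands in for the previous value on the first pass. Names and the 1% default are illustrative.

```python
def has_converged(previous, current, abs_tol=None, rel_tol=None):
    """True when the improvement of the index falls below the chosen threshold."""
    improvement = current - previous
    if abs_tol is not None and improvement < abs_tol:
        return True
    if rel_tol is not None and improvement < rel_tol * abs(previous):
        return True
    return False


def iterations_until_convergence(scores, initial_score, rel_tol=0.01):
    """Pass number (1-based) at which S50 would stop, given the index after each pass."""
    previous = initial_score  # predetermined initial index for the first S50 check
    for k, score in enumerate(scores, start=1):
        if has_converged(previous, score, rel_tol=rel_tol):
            return k
        previous = score
    return None  # did not converge within the given passes
```

For log-likelihoods of −800, −600, and −595 after successive passes, the third pass improves by only 5, which is below 1% of 600. The loop would therefore stop there.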

(Example of a convergence index)
An example of an index for judging convergence is described below.
Let D(j), j = 1, …, N, denote the state in which all sample data X(j), j = 1, …, N, have been classified into clusters. For example, if the third datum (j = 3) is classified into cluster 4, then D(3) = 4. A cluster number is assigned in this way to every sample.

The probability density function is estimated for each cluster. To cover parametric and nonparametric methods with one notation, the probability density function of each cluster is written P_M(X) using a model M. For a parametric method, the model is written M(θ), where θ denotes the parameters used to estimate the probability density function. For example, when the density is estimated with a normal distribution, the mean vector and the covariance matrix are the parameters θ that define P_M(X).

Given a classification D into clusters and a probability density function model M, the probability that each sample's data arises from D and M can be computed. This probability is computed for every sample. The product is taken as the probability that all samples arise from the cluster classification D and the density model M, and this probability is used as the convergence index. For example, taking the logarithm to obtain the log-likelihood gives:
$$\log L = \log \prod_{j=1}^{N} P_{M}\big(X(j)\big) = \sum_{j=1}^{N} \log P_{M}\big(X(j)\big)$$

Here the model M is selected according to the cluster number D(j), that is, M = M(D(j)), and Π denotes the product.
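The index can be computed as follows. This is a minimal sketch in which each cluster's model is assumed to be a log-density function, shown here with one-dimensional normal densities for brevity. Summing log-densities, rather than multiplying densities, gives the same ranking as the product Π while avoiding underflow. Names are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def total_log_likelihood(data, assignment, models):
    """log L = sum_j log P_{M(D(j))}(X(j)).

    data[j] is X(j), assignment[j] is the cluster number D(j), and
    models[c] is the log-density function of cluster c.
    """
    return sum(models[assignment[j]](x) for j, x in enumerate(data))
```

Assigning each sample to the cluster whose model fits it gives a higher log L than a mismatched assignment, which is what the convergence check in S50 rewards.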

If the convergence means 25 determines in S50 that the result has not converged, the process returns to S10. If it has converged, the process proceeds to S60. S50 thus makes the first to fourth steps repeat until convergence.

(Step S60)
In S60, the related information assigning means 26 aggregates related information, such as diseases, symptoms, and drugs, by text mining medical text data and similar sources stored in advance in the storage device 13. The related information also includes other items aggregated from that text data that relate to the disease names, symptoms, and so on. Examples are guidance methods for a disease and drugs for improving or treating symptoms.

The related information assigning means 26 then assigns the aggregated related information to the clusters with which it is associated. Each cluster carries information such as the disease names and symptom severities frequently seen within it. On that basis, the aggregated items concerning those disease names and symptom severities are assigned to the cluster as related information. As a result, each cluster is given related information such as guidance methods for the disease and drugs for improving or treating the symptoms.

(Step S70)
In S70, the new data processing means 27 waits for new data (that is, medical information) to be input. When new data arrive, it calculates a certainty factor for them. The new data processing means 27 corresponds to the certainty factor calculating means.

The certainty factor can be presented in two ways: as a numerical value, or as one of several certainty classes set in advance.
When the certainty factor is expressed numerically, the new data processing means 27 calculates, for every cluster, the probability that the new data belong to that cluster. It uses that probability as the certainty factor.

The certainty factors can be displayed in three ways: for all clusters, for a fixed number of clusters in descending order of probability, or only for clusters whose probability is at or above a given value.
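One way to turn per-cluster densities into the certainty factor is sketched below. It assumes Bayes' rule with cluster priors, which are uniform unless given (for example, proportional to cluster size). The per-cluster log-likelihoods are normalized with the log-sum-exp trick. It also shows the top-k and threshold display options. All names are hypothetical.

```python
import math


def cluster_posteriors(log_likelihoods, priors=None):
    """Normalise one record's per-cluster log-likelihoods into membership probabilities."""
    clusters = list(log_likelihoods)
    if priors is None:
        priors = {c: 1.0 / len(clusters) for c in clusters}
    scores = {c: log_likelihoods[c] + math.log(priors[c]) for c in clusters}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}  # log-sum-exp for stability
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def top_k(posteriors, k):
    """The k clusters with the highest probability, in descending order."""
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)[:k]


def above(posteriors, threshold):
    """Only the clusters whose probability is at or above the threshold."""
    return {c: p for c, p in posteriors.items() if p >= threshold}
```

If the new record is three times as likely under cluster A as under cluster B, the sketch reports certainties of 0.75 and 0.25.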

When several classes are used, the new data processing means 27 defines upper and lower bounds for each certainty level in advance, in terms of the probability values above. For new data whose value falls between such bounds, only the class name (level value) is output as the certainty factor. The data processing device 12 outputs it in S80, described later. In this way the certainty factor can be shown, for example, on a five-level scale.
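A five-level display can be sketched with preset boundaries between levels. The boundaries 0.2, 0.4, 0.6, and 0.8 are hypothetical examples, not values from the patent. A probability exactly on a boundary goes to the higher level.

```python
from bisect import bisect_right


def certainty_level(p, bounds=(0.2, 0.4, 0.6, 0.8)):
    """Map a membership probability p in [0, 1] to a level 1 .. len(bounds) + 1."""
    return bisect_right(bounds, p) + 1
```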

Because the new data are time-series data, the new data processing means 27 also estimates where in the model's time series the new data lie. This information is output by the data processing device 12 in S80, described later.

The time-series position indicates which stage the new data have reached in the temporal progression of the severity of the pathological condition or symptom represented by the model.
The new data processing means 27 estimates this position when computing the above probability from the input sample data. It shifts the time axis against the model of each cluster and finds the time-axis shift that best fits the input data. This is done in the same way as in S40.

Here too, instead of computing the time-axis shift as an exact number, several shift amounts may be set in advance and the best one chosen among them. As with the certainty factor, this allows the shift to be indicated as a range with upper and lower bounds.

If the time scale of a test item in the new data differs from that of an existing cluster, the new data are first aligned to the cluster's time scale. The time-axis shift described above is then performed.

(Step S80)
The data processing device 12 outputs (displays and prints), via the output device 14, three items: the certainty factor for the new data obtained in S70, information on where in the time series of which cluster's model the new data lie, and the thresholds Sh for the various test items. A person giving guidance can therefore see immediately where the new data lie in the time series of which cluster's model, together with the thresholds Sh for the relevant test items. This allows appropriate guidance to be given to the subject concerned.

The effects of this embodiment are described below.
(1) The medical information processing method of this embodiment includes step S10. This step classifies a set of medical information, in which data of a plurality of test items for a plurality of subjects are recorded as time series, into clusters according to the test item data. The method also includes step S20, which estimates, within each classified cluster, the multidimensional time-series distribution of the set of medical information for the plurality of subjects as a probability density function. It further includes step S30, which determines a representative value serving as the model from the probability density functions estimated in each cluster. Finally, it includes step S40, which applies the optimal time-axis shift to every sample in each cluster, with the representative value as the reference.

As a result, models are built automatically by clustering a large amount of data while fully accounting for individual differences. Changes that correspond to the severity of pathological conditions or symptoms, and to fine-grained disease classifications, can therefore be modeled. More flexible time-series modeling and precise probability density estimation are achieved at the same time. Furthermore, with an appropriately chosen clustering granularity, the method allows more detailed modeling than the prior art. It also shows concretely how the statistical knowledge needed to predict a subject's disease risk can be obtained from the original data.

In addition, in S40 each individual record is shifted along the time axis to match the model. Variation in timing between subjects is therefore absorbed by this alignment.

(2) The method of this embodiment further repeats S10 to S40 via S50 on the result of step S40 until it converges. As a result, an optimal classification result or an optimal probability density function can be obtained, and the effect described in (1) is achieved more fully.

In conventional modeling or estimation of a probability distribution (probability density function), the target data contain variations arising from various factors that cannot be distinguished from one another, so the estimation is necessarily a global one that captures only the overall shape of the distribution.

With the medical information processing method of the present embodiment, by contrast, each time S10 to S40 are repeated, individual records are shifted along the time axis in S40 to match the model, so variations over time are absorbed by this alignment process.

(3) In the medical information processing method of the present embodiment, after the result obtained in S40 has converged, related information is assigned to the clusters in S60. In S70, when new medical information whose examination items are given as a time series is input, a certainty factor indicating which cluster the examination items of the new medical information belong to is calculated. Then, in S80, the method outputs the certainty factor for the cluster to which the new medical information belongs, together with the related information of that cluster.
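One way to compute the certainty factor of S70 is as a posterior probability over clusters. The sketch below assumes each converged cluster is summarized by per-time-point Gaussian means and variances, equal prior weights unless given, and a new series already aligned to the cluster time axis; the related information attached in S60 is represented by placeholder strings. All names are hypothetical, and the embodiment does not state that this particular formula is used.

```python
import math


def gaussian_loglik(x, means, variances, floor=1e-6):
    """Log-likelihood of x under an independent Gaussian at each time point."""
    ll = 0.0
    for v, m, s2 in zip(x, means, variances):
        s2 = max(s2, floor)  # guard against zero variance
        ll += -0.5 * (math.log(2 * math.pi * s2) + (v - m) ** 2 / s2)
    return ll


def certainty(x, models, priors=None):
    """S70 sketch: posterior probability of each cluster given the new series x."""
    k = len(models)
    priors = priors or [1.0 / k] * k
    logs = [math.log(p) + gaussian_loglik(x, m, v) for p, (m, v) in zip(priors, models)]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(lg - top) for lg in logs]
    total = sum(weights)
    return [w / total for w in weights]


def report(x, models, related_info):
    """S80 sketch: most probable cluster, its certainty, and its related information."""
    probs = certainty(x, models)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best], related_info[best]
```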

As a result, with the medical information processing method of this embodiment, when new medical information is input, the certainty factor for the cluster to which it belongs and the related information of that cluster can be obtained easily.

This capability can be implemented on an electronic medical record system to support doctors when they diagnose or advise individual patients, or on a health management system to support public health nurses and dietitians when they give health guidance to individual users. It can also be used for epidemiological studies in the medical field, and to support the operation of health insurance associations and health management programs.

(4) In step S20 of the medical information processing method of the present embodiment, data with missing portions are interpolated based on the estimated probability density function. In the prior art, missing values in the time series of test values contained in medical information are unavoidable, and only the data remaining after cleansing that removes such records are analyzed. The method of this embodiment, by contrast, can also handle medical information with missing values in the time series.
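Interpolation based on the estimated density can be sketched in its simplest form: the density at each time point is estimated from the observed values only, and a missing value is replaced by the expected value of that estimated density. This assumes an independent Gaussian per time point and uses `None` to mark missing values; with a full-covariance model one would instead use the conditional mean given the observed points. The function names are hypothetical.

```python
def estimate_with_missing(members):
    """S20 sketch: per-time-point Gaussian mean/variance from observed values only."""
    means, variances = [], []
    for t in range(len(members[0])):
        obs = [m[t] for m in members if m[t] is not None]
        if not obs:
            raise ValueError(f"time point {t} has no observed values")
        mu = sum(obs) / len(obs)
        means.append(mu)
        variances.append(sum((v - mu) ** 2 for v in obs) / len(obs))
    return means, variances


def interpolate(x, means):
    """Fill each missing value with the expected value of the density at that point."""
    return [means[t] if v is None else v for t, v in enumerate(x)]
```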

(5) In step S20 of the medical information processing method, when the medical information within a cluster differs in time scale, the time scales are aligned. In the prior art it was difficult to treat time series that mix units of years, months, weeks, days and hours in a unified manner; this method absorbs differences in time scale and can therefore handle such mixed time series uniformly.
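Aligning mixed time scales can be sketched as converting every timestamp to a common unit and resampling onto a shared grid. The conversion factors below (a year as 365.25 days, a month as one twelfth of that) and linear interpolation between observations are assumptions for illustration; the embodiment does not specify how the alignment is performed. `DAYS_PER_UNIT`, `to_days` and `resample` are hypothetical names.

```python
DAYS_PER_UNIT = {
    "year": 365.25,
    "month": 365.25 / 12,  # average month length (assumption)
    "week": 7.0,
    "day": 1.0,
    "hour": 1.0 / 24,
}


def to_days(records):
    """Convert (time, unit, value) records to (days, value) pairs sorted by time."""
    return sorted((t * DAYS_PER_UNIT[unit], v) for t, unit, v in records)


def resample(points, grid):
    """Linearly interpolate sorted (days, value) points onto a grid of day offsets.

    Values before the first or after the last observation are held constant.
    """
    out = []
    for g in grid:
        if g <= points[0][0]:
            out.append(points[0][1])
        elif g >= points[-1][0]:
            out.append(points[-1][1])
        else:
            for (t0, v0), (t1, v1) in zip(points, points[1:]):
                if t0 <= g <= t1:
                    frac = 0.0 if t1 == t0 else (g - t0) / (t1 - t0)
                    out.append(v0 + (v1 - v0) * frac)
                    break
    return out
```

Once every subject is resampled onto the same grid, the per-time-point estimation of S20 can treat records originally kept in different units alike.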

(6) The medical information processing apparatus 10 comprises clustering means 21 that classifies a set of medical information, in which data for a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data. The apparatus 10 further comprises probability density function estimating means 22 that estimates, within each classified cluster, the multidimensional time-series distribution of the medical information of the plurality of subjects as a probability density function, and representative value determining means 23 that determines, from the estimated probability density function in each cluster, a representative value that serves as the model. The apparatus 10 also comprises time axis moving means 24 that performs an optimal time-axis shift for every sample in each cluster with the representative value as the reference. As a result, the medical information processing apparatus 10 of the present embodiment provides the same effects as (1) above.

(7) In the medical information processing apparatus 10, convergence means 25 causes the clustering means 21, the probability density function estimating means 22, the representative value determining means 23 and the time axis moving means 24 to process the result obtained by the time axis moving means 24 repeatedly until it converges. As a result, the apparatus provides the same effects as (2) above.

(8) The medical information processing apparatus 10 comprises related information assigning means 26 that assigns related information to the converged clusters, and new data processing means 27 that, when new medical information whose examination items are given as a time series is input, calculates a certainty factor indicating which cluster the examination items of the new medical information belong to. The apparatus 10 further comprises an output device 14 that outputs the certainty factor for the cluster to which the new medical information belongs, together with the related information of that cluster.

As a result, the medical information processing apparatus 10 of the present embodiment provides the same effects as (3) above.

(9) The probability density function estimating means 22 of the medical information processing apparatus 10 interpolates data with missing portions based on the estimated probability density function. As a result, the apparatus 10 provides the same effects as (4) above.

(10) When the medical information within a cluster differs in time scale, the probability density function estimating means 22 of the medical information processing apparatus 10 aligns the time scales. As a result, the apparatus 10 provides the same effects as (5) above.

(11) The medical information processing program of the present embodiment causes a computer to function as clustering means 21 that classifies a set of medical information, in which data for a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data. The program also causes the computer to function as probability density function estimating means 22 that estimates, within each classified cluster, the multidimensional time-series distribution of the medical information of the plurality of subjects as a probability density function, and as representative value determining means 23 that determines, from the estimated probability density function in each cluster, a representative value that serves as the model. The program further causes the computer to function as time axis moving means 24 that performs an optimal time-axis shift for every sample in each cluster with the representative value as the reference. As a result, the program of the present embodiment provides the effects of (1) above.

(12) The medical information processing program of the present embodiment causes a computer to function as convergence means 25, which has the clustering means 21, the probability density function estimating means 22, the representative value determining means 23 and the time axis moving means 24 process the result obtained by the time axis moving means 24 repeatedly until it converges. As a result, the program provides the effects of (2) above.

The embodiment described above may also be modified as follows.
○ In S10, the cluster granularity is determined from the number of samples belonging to each cluster. Alternatively, a numerical measure of cluster cohesion, or a combination of the sample count and such a measure, may be used. Measures of cohesion include the sum of the distances of the samples from the representative value in the cluster, and the sum of the diagonal elements of the covariance matrix of the probability density function estimated as a normal distribution; the cluster granularity may be determined by any of these methods. The cluster granularity can thus be determined in various ways and is not limited to a particular method.
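The granularity measures listed above can be computed directly. In this sketch, cohesion is measured by the sum of Euclidean distances from the representative value and by the trace of the maximum-likelihood covariance estimate (the sum of its diagonal elements, i.e. the per-dimension variances). The combination rule `needs_split` and its thresholds are hypothetical and only illustrate combining the sample count with a cohesion measure.

```python
import math


def granularity_metrics(members, rep):
    """Sample count, summed distance to the representative, and covariance trace."""
    n = len(members)
    distance_sum = sum(math.dist(x, rep) for x in members)
    cols = list(zip(*members))
    means = [sum(col) / n for col in cols]
    # Trace of the covariance matrix = sum of per-dimension (ML) variances.
    cov_trace = sum(sum((v - m) ** 2 for v in col) / n for col, m in zip(cols, means))
    return {"count": n, "distance_sum": distance_sum, "cov_trace": cov_trace}


def needs_split(metrics, min_count=10, max_trace=1.0):
    """Hypothetical rule: split only if the cluster is large and not cohesive."""
    return metrics["count"] >= 2 * min_count and metrics["cov_trace"] > max_trace
```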

FIG. 1 is a schematic block diagram of the medical information processing apparatus 10.
FIG. 2 is a flowchart of the medical information processing program.
FIG. 3(a) is a schematic diagram of medical information having a certain examination item for a plurality of subjects before clustering, and FIG. 3(b) is a schematic diagram of the same information after clustering and alignment of the time axes.

Explanation of symbols

10 ... Medical information processing apparatus, 11 ... Input device, 12 ... Data processing device,
13 ... Storage device, 14 ... Output device (output means),
21 ... Clustering means (first means),
22 ... Probability density function estimating means (second means),
23 ... Representative value determining means (third means), 24 ... Time axis moving means (fourth means),
25 ... Convergence means (fifth means), 26 ... Related information assigning means,
27 ... New data processing means (certainty factor calculating means).

Claims

1. A medical information processing method comprising:
a first step of classifying a set of medical information, in which data for a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data;
a second step of estimating, within each classified cluster, the multidimensional time-series distribution of the medical information of the plurality of subjects as a probability density function;
a third step of determining, from the estimated probability density function in each cluster, a representative value that serves as a model; and
a fourth step of performing an optimal time-axis shift for every sample in each cluster with the representative value as the reference.

2. The medical information processing method according to claim 1, wherein the result obtained in the fourth step is made to converge by repeating the first to fourth steps.

3. The medical information processing method according to claim 2, further comprising:
a step of assigning related information to the clusters after the result obtained in the fourth step has converged;
a step of calculating, when new medical information whose examination items are given as a time series is input, a certainty factor indicating which cluster the examination items of the new medical information belong to; and
a step of outputting the certainty factor for the cluster to which the new medical information belongs, together with the related information of that cluster.

4. The medical information processing method according to any one of claims 1 to 3, wherein, in the second step, data having missing portions are interpolated based on the estimated probability density function.

5. The medical information processing method according to claim 4, wherein, when the medical information within a cluster differs in time scale, the time scales are aligned in the second step.

6. A medical information processing apparatus comprising:
first means for classifying a set of medical information, in which data for a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data;
second means for estimating, within each classified cluster, the multidimensional time-series distribution of the medical information of the plurality of subjects as a probability density function;
third means for determining, from the estimated probability density function in each cluster, a representative value that serves as a model; and
fourth means for performing an optimal time-axis shift for every sample in each cluster with the representative value as the reference.

7. The medical information processing apparatus according to claim 6, further comprising fifth means for causing the first to fourth means to process the result obtained by the fourth means repeatedly until it converges.

8. The medical information processing apparatus according to claim 7, further comprising:
related information assigning means for assigning related information to the clusters converged by the fifth means;
certainty factor calculating means for calculating, when new medical information whose examination items are given as a time series is input, a certainty factor indicating which cluster the examination items of the new medical information belong to; and
output means for outputting the certainty factor for the cluster to which the new medical information belongs, together with the related information of that cluster.

9. The medical information processing apparatus according to claim 6, wherein the second means interpolates data having missing portions based on the estimated probability density function.

10. The medical information processing apparatus according to claim 9, wherein, when the medical information within a cluster differs in time scale, the second means aligns the time scales.

11. A medical information processing program for causing a computer to function as:
first means for classifying a set of medical information, in which data for a plurality of examination items relating to a plurality of subjects are given as time series, into clusters according to the examination-item data;
second means for estimating, within each classified cluster, the multidimensional time-series distribution of the medical information of the plurality of subjects as a probability density function;
third means for determining, from the estimated probability density function in each cluster, a representative value that serves as a model; and
fourth means for performing an optimal time-axis shift for every sample in each cluster with the representative value as the reference.

12. The medical information processing program according to claim 11, further causing the computer to function as fifth means for causing the first to fourth means to process the result obtained by the fourth means repeatedly until it converges.