JP2008065836A

JP2008065836A - Method of selecting medical and biochemical diagnostic tests employing neural network-related application

Info

Publication number: JP2008065836A
Application number: JP2007235531A
Authority: JP
Inventors: Jerome Lapointe; ジエローム・ラポイント; Duane D Desieno; デユアン・デイ・デジーノ
Original assignee: Adeza Biomedical Corp
Current assignee: Adeza Biomedical Corp
Priority date: 1996-02-09
Filing date: 2007-09-11
Publication date: 2008-03-21
Also published as: AU2316297A; CA2244913A1; JP3480940B2; JP4168187B2; JP3782792B2; JP2008136874A; JP4139822B2; WO1997029447A3; JP2005319301A; EP0879449A2; JP2006172461A; JP2000501869A; WO1997029447A2; JP2004041713A

Abstract

PROBLEM TO BE SOLVED: To identify important variables that support the diagnosis of a disorder or condition.

SOLUTION: A computer-based method for selecting variables involves: (a) providing a first set of "n" candidate variables and a second set of "selected important variables", which is initially empty; (b) taking the candidate variables one at a time and evaluating each by training a decision support system on that variable combined with the current set of important variables; and (c) selecting the best variable, i.e., the one that gives the highest performance of the decision support system. If the best variable improves performance compared with the performance of the selected important variables alone, it is added to the "selected important variables" set and removed from the candidate set, and processing continues at step (b). Processing stops when the best variable no longer improves performance.

COPYRIGHT: (C)2008, JPO&INPIT
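The selection loop in steps (a) to (c) can be sketched as a greedy forward search. The following is a minimal sketch, not the applicants' implementation: `forward_select`, `toy_evaluate`, the variable names and the usefulness scores are all hypothetical, and `toy_evaluate` stands in for training a decision support system and measuring its performance.

```python
from typing import Callable, List, Sequence


def forward_select(candidates: Sequence[str],
                   evaluate: Callable[[List[str]], float]) -> List[str]:
    """Greedy forward selection following steps (a)-(c) of the abstract."""
    remaining = list(candidates)      # (a) candidate variables
    selected: List[str] = []          # (a) selected important variables, initially empty
    best_score = float("-inf")        # no selected set has been scored yet
    while remaining:
        # (b) evaluate each candidate together with the current selected set
        scored = [(evaluate(selected + [v]), v) for v in remaining]
        top_score, top_var = max(scored, key=lambda pair: pair[0])
        # (c) keep the best candidate only if it improves performance
        if top_score <= best_score:
            break
        selected.append(top_var)
        remaining.remove(top_var)
        best_score = top_score
    return selected


# Hypothetical stand-in for "train a decision support system and score it":
# each variable adds a fixed gain, and every extra input costs a small penalty.
USEFULNESS = {"age": 0.10, "pelvic_pain": 0.25, "num_pregnancies": 0.15, "noise": 0.0}


def toy_evaluate(variables: List[str]) -> float:
    return 0.5 + sum(USEFULNESS[v] for v in variables) - 0.02 * len(variables)


chosen = forward_select(list(USEFULNESS), toy_evaluate)
print(chosen)
```

In practice the evaluation step would train one or more networks per candidate, so the cost of the search grows with the number of candidates times the number of selected variables.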

Description

The subject matter of the present invention relates to the use of prediction techniques, in particular nonlinear prediction techniques, for the development of medical diagnostic aids. In particular, effective training techniques are provided for neural networks and other expert systems that take inputs from patient history information, for the development of medical diagnostic tools and methods of diagnosis.

This application is a continuation-in-part of U.S. Patent Application No. 08/599275, “METHOD FOR DEVELOPING MEDICAL AND BIOCHEMICAL DIAGNOSTIC TESTS USING NEURAL NETWORKS”, filed February 9, 1996 by Jerome Lapointe and Duane DeSieno, and claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 60/011449, “METHOD AND APPARATUS FOR AIDING IN THE DIAGNOSIS OF ENDOMETRIOSIS USING A PLURALITY OF PARAMETERS SUITED FOR ANALYSIS THROUGH A NEURAL NETWORK”, filed February 9, 1996 by Jerome Lapointe and Duane DeSieno. The subject matter of each of the above application and provisional application is incorporated herein by reference in its entirety.

Microfiche appendix.
Two computer appendices containing computer program source code for the programs described herein are filed concurrently with this application. The computer appendices can be converted to microfiche appendices pursuant to 37 C.F.R. 1.96(b). The computer appendices, hereinafter referred to as the “microfiche appendices”, are each incorporated herein by reference in their entirety. Accordingly, a portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

Data mining, decision support systems and neural networks.
Some computer decision support systems have the ability to classify information and identify patterns in input data, and are particularly useful for evaluating data sets that have large numbers of variables and complex interactions between variables. These computer decision systems, collectively known as “data mining” or “knowledge discovery in databases” (decision support systems herein), use the same basic hardware components, for example a personal computer (PC), with a processor, internal and peripheral devices, memory devices and input/output interfaces. The distinctions between systems arise in the software and, more fundamentally, in the paradigm on which the software is based. Paradigms that provide decision support functions include regression methods, decision trees, discriminant analysis, pattern recognition, Bayesian decision theory, and fuzzy logic. One of the more widely used decision support computer systems is the artificial neural network.

Artificial neural networks, or “neural nets”, are parallel information processing tools in which individual processing elements, called neurons, are arranged in layers with numerous interconnections between elements in successive layers. A processing element is modeled to approximate a biological neuron, in that its output is determined by a transfer function that is generally nonlinear. In a typical neural network model, the processing elements are arranged in an input layer of elements that receive inputs, an output layer containing one or more elements that generate outputs, and one or more hidden layers of elements in between. The hidden layers provide the means by which nonlinear problems can be solved. Within a processing element, the input signals are arithmetically weighted according to a weight coefficient associated with each input. The resulting weighted sum is transformed by a selected nonlinear transfer function, such as a sigmoid function, to produce for each processing element an output whose value ranges from 0 to 1. The learning process, called “training”, is a trial-and-error process involving a series of iterative adjustments to the processing element weights, so that a particular processing element provides an output which, when combined with the outputs of the other processing elements, minimizes the error between the output of the neural network and the desired output presented in the training data. Adjustment of the element weights is triggered by error signals. Training data are described as a number of training examples, each of which contains a set of input values to be presented to the neural network and an associated set of desired output values.
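The behavior of a single processing element (weighted sum followed by a sigmoid transfer function) and of a small layered network can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the weights are arbitrary, and the bias term is an addition for illustration that the text does not mention.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Sigmoid transfer function; its output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Weight each input, sum, then apply the nonlinear transfer function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


Layer = List[Tuple[Sequence[float], float]]  # one (weights, bias) pair per element


def forward(inputs: Sequence[float], hidden_layer: Layer,
            output_layer: Layer) -> List[float]:
    """Pass inputs through one hidden layer and then the output layer."""
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    return [neuron_output(hidden, w, b) for w, b in output_layer]


# Arbitrary illustrative weights: 2 inputs, 2 hidden elements, 1 output element.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_layer = [([1.0, -1.0], 0.0)]
outputs = forward([1.0, 0.0], hidden_layer, output_layer)
print(outputs)
```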

A common training method is back-propagation, or “backprop”, in which error signals are propagated backward through the network. The error signal is used to determine how much the weights of a given element should be changed, and to determine the error gradient. The goal is to converge to the global minimum of the mean squared error. The path toward convergence, i.e., the gradient descent, is taken in steps, each step being an adjustment of the input weights of a processing element. The size of each step is determined by the learning rate. The slope of the gradient descent includes flat and steep regions with valleys that act as local minima, which can give the false impression that convergence has been achieved and lead to an inaccurate result.

Some variants of backprop incorporate a momentum term, in which a proportion of the previous weight-change value is added to the current value. This adds momentum to the algorithm's trajectory in its gradient descent, which helps keep it from becoming “trapped” in local minima. One back-propagation method that includes a momentum term is “Quickprop”, in which the momentum rate is adaptive. The Quickprop variant is described by Fahlman (see “Fast Learning Variations on Back-Propagation: An Empirical Study”, Proceedings of the 1988 Connectionist Models Summer School, Pittsburgh, 1988, D. Touretzky et al., eds., pp. 38-51, Morgan Kaufmann, San Mateo, CA; and, with Lebiere, “The Cascade-Correlation Learning Architecture”, Advances in Neural Information Processing Systems 2 (Denver, 1989), D. Touretzky, ed., pp. 524-32, Morgan Kaufmann, San Mateo, CA). The Quickprop algorithm is publicly available, and can be downloaded over the Internet, from the Artificial Intelligence Repository maintained by the School of Computer Science at Carnegie Mellon University. In Quickprop, a dynamic momentum rate is calculated from the slope of the gradient. If the slope is smaller than, but has the same sign as, the slope following the immediately preceding weight adjustment, the weight change accelerates. The acceleration rate is determined by the magnitude of the successive differences between slope values. If the current slope is in the opposite direction from the previous slope, the weight change decelerates. The Quickprop method improves convergence speed, giving the steepest possible gradient descent, and helps prevent convergence to a local minimum.
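The effect of a momentum term on a gradient-descent step can be illustrated on a one-dimensional error surface. The sketch below uses a fixed momentum rate, which is the plain momentum variant and not Quickprop's adaptive rule; the function name, the quadratic error surface and all parameter values are assumptions chosen for illustration.

```python
from typing import Callable


def descend_with_momentum(gradient: Callable[[float], float], w0: float,
                          learning_rate: float = 0.1, momentum: float = 0.5,
                          steps: int = 200) -> float:
    """Gradient descent in which a fraction of the previous weight change
    is added to the current one (fixed momentum, not Quickprop)."""
    w = w0
    previous_change = 0.0
    for _ in range(steps):
        # Step size is set by the learning rate; the momentum term carries
        # part of the previous step forward.
        change = -learning_rate * gradient(w) + momentum * previous_change
        w += change
        previous_change = change
    return w


# Hypothetical error surface E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = descend_with_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

On this convex surface there is a single minimum at w = 3, so the sketch shows only the update rule; it does not demonstrate escape from a local minimum.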

When a neural network has been trained on sufficient training data, it acts as an associative memory that is able to generalize to a correct solution for sets of new input data that were not part of the training data. Neural networks have been shown to be able to operate even in the absence of complete data or in the presence of noise. It has also been observed that the performance of a network on new or test data tends to be lower than its performance on training data. The difference in performance on test data indicates the extent to which the network was able to generalize from the training data. A neural network, however, can be retrained and can thus learn from new data, improving the overall performance of the network.

Neural nets thus have characteristics that make them well suited to a large number of different problems, including areas that require prediction, such as medical diagnosis.

Neural nets and diagnosis.
In diagnosing and/or treating a patient, a physician uses the patient's condition, symptoms, and the results of applicable medical diagnostic tests to identify the patient's disease state or condition. The physician must carefully determine the relevance of the symptoms and test results to a particular diagnosis, and must use judgment based on experience and intuition in making that diagnosis. Medical diagnosis requires integrating information from several sources, including the medical history, a physical examination and biochemical tests. Based on the results of the examination and tests and the answers to questions, the physician uses his or her training, experience, knowledge and expertise to formulate a diagnosis. The final diagnosis requires subsequent surgical procedures for verification or formulation. The process of diagnosis therefore involves a combination of decision support, intuition and experience. The validity of a physician's diagnosis depends on his or her experience and ability.

Because of the predictive and intuitive nature of medical diagnosis, attempts have been made to develop neural networks and other expert systems that aid in this process. The application of neural networks to medical diagnosis has been reported. For example, neural networks have been used to aid in the diagnosis of cardiovascular disorders (see, e.g., Baxt (1991) “Use of an Artificial Neural Network for the Diagnosis of Myocardial Infarction”, Annals of Internal Medicine 115:843; Baxt (1992) “Improving the Accuracy of an Artificial Neural Network Using Multiple Differently Trained Networks”, Neural Computation 4:772; Baxt (1992) “Analysis of the clinical variables that drive decision in an artificial neural network trained to identify the presence of myocardial infarction”, Annals of Emergency Medicine 21:1439; and Baxt (1994) “Complexity, chaos and human physiology: the justification for non-linear neural computational analysis”, Cancer Letters 77:85). Other medical diagnostic applications include the use of neural networks for cancer diagnosis (see, e.g., Maclin et al. (1991) “Using Neural Networks to Diagnose Cancer”, Journal of Medical Systems 15:11-9; Rogers et al. (1994) “Artificial Neural Networks for Early Detection and Diagnosis of Cancer”, Cancer Letters 77:79-83; and Wilding et al. (1994) “Application of Backpropagation Neural Networks to Diagnosis of Breast and Ovarian Cancer”, Cancer Letters 77:145-53), neuromuscular disorders (see, e.g., Pattichis et al. (1995) “Neural Network Models in EMG Diagnosis”, IEEE Transactions on Biomedical Engineering 42:5:486-495) and chronic fatigue syndrome (see, e.g., Solms et al. (1996) “A Neural Network Diagnostic Tool for the Chronic Fatigue Syndrome”, International Conference on Neural Networks, Paper No. 108). These methods, however, do not address significant issues relating to the development of practical diagnostic tests for a wide range of conditions, nor do they address the selection of input variables.

Computer decision support methods other than neural networks that are applicable to medical diagnosis have been reported, including knowledge-based expert systems such as MYCIN (Davis et al., “Production Systems as a Representation for a Knowledge-based Consultation Program”, Artificial Intelligence, 1977, 8:1:15-45) and its descendants TEIRESIAS, EMYCIN, PUFF, CENTAUR, VM, GUIDON, SACON, ONCOCIN and ROGET. MYCIN is an interactive program that diagnoses certain infectious diseases and prescribes antimicrobial therapy. Such knowledge-based systems contain factual knowledge and rules or other methods for using that knowledge. All of the information and rules are pre-programmed into the system's memory; the system does not, as a neural network does, develop its own procedure for reaching the desired result from the input data. Another computerized diagnostic method is the Bayesian network, also known as a belief or causal probabilistic network, which classifies patterns based on probability density functions derived from training patterns and a priori information. Bayesian decision systems have been reported for the interpretation of mammograms in diagnosing breast cancer (Roberts et al., “Mammo Net: A Bayesian Network diagnosing Breast Cancer”, Midwest Artificial Intelligence and Cognitive Science Society Conference, Carbondale, Illinois, April 1995) and for hypertension (Blinowska et al. (1993) “Diagnostica - A Bayesian Decision-Aid System - Applied to Hypertension Diagnosis”, IEEE Transactions on Biomedical Engineering 40:230-35). Bayesian decision systems are somewhat limited by their reliance on linear relationships and by the number of input data points they can handle, and are less well suited to decision support that involves nonlinear relationships between variables. Implementing Bayesian methods using the processing elements of a neural network can overcome some of these limitations (see, e.g., Penny et al. (1996) “Neural Networks in Clinical Medicine”, Medical Decision-support, 1996, 16:4:386-98). These methods have been used, by mimicking the physician, to diagnose disorders for which the important variables are entered into the system. It would, however, be important to use such systems to improve upon existing diagnostic procedures.

Endometriosis.
Endometriosis is the growth of uterine-like tissue outside the uterus. It affects approximately 15 to 30 percent of women of reproductive age. The cause of endometriosis is unknown, but it may result from retrograde menstruation, the reflux of endometrial tissue and cells (menstrual debris) from the uterus into the peritoneal cavity. Retrograde menstruation is thought to occur in most or all women, but it is unclear why some women develop endometriosis and others do not.

Not all women with endometriosis exhibit symptoms or suffer from the disease. The extent or severity of endometriosis does not correlate with symptoms. Some women with severe disease are completely asymptomatic, while others with minimal disease suffer excruciating pain. Symptoms associated with endometriosis, such as infertility, pelvic pain, dysmenorrhea and a history of endometriosis, often occur in women who do not have endometriosis. In other cases, these symptoms are present and the woman does have endometriosis. A relationship between these symptoms and endometriosis appears to exist, but their interactions with each other and with other factors are complex. Clinicians often perform diagnostic laparoscopy on patients who, based on a combination of the above indications, are considered good candidates for having endometriosis. Endometriosis is nevertheless absent in a significant proportion of these women. Endometriosis thus represents an example of a disease state in which the physician must rely on experience to formulate a diagnosis from a complex set of information. The validity of the diagnosis is related to the experience and ability of the physician.

Accordingly, it has not been possible to determine from symptoms alone whether a woman has endometriosis. Within the medical community, a diagnosis of endometriosis is confirmed only by direct visualization of endometrial lesions during surgery. Many physicians often impose a further restriction, requiring that histology of biopsied endometrial tissue be used to verify suspected lesions as endometrial in nature (glands and stroma). A noninvasive diagnostic test for endometriosis would therefore be of considerable value.

JP-A-5-277119. JP-A-5-176932. JP-A-6-119291. JP-A-7-84981.

Accordingly, it is an object of the present invention to provide a noninvasive diagnostic aid for endometriosis. It is also an object of the present invention to provide a method for selecting the important variables to be used in decision support systems that aid in the diagnosis of endometriosis and of other disorders and conditions. It is a further object of the present invention to identify new variables, to identify new biochemical tests and markers for diseases, and to design new diagnostic tests that improve upon existing diagnostic methods.

Methods are provided that use decision support systems for the diagnosis of, and as an aid in diagnosing, diseases, disorders and other medical conditions. The methods provided in the present invention include a method of developing a diagnostic test using patient history data and the identification of important variables, a method of identifying the important selected variables, a method of designing a diagnostic test, a method of evaluating the usefulness of a diagnostic test, a method of expanding the clinical utility of a diagnostic test, and a method of selecting a course of treatment by predicting the outcome of various possible treatments. Also provided are disease parameters or variables that aid in the diagnosis of disorders, including diseases that are difficult to diagnose, such as endometriosis, the prediction of pregnancy-related events, such as the likelihood of delivery within a particular period, and other such disorders related to women's health. Although women's disorders are used as examples herein, it should be understood that the methods of the present invention are applicable to any disorder or condition.

Also provided are means of using neural network training to guide the development of tests so as to improve their sensitivity and specificity, and to select diagnostic tests that improve the overall diagnosis of, or the assessment of the potential for, a disease state or medical condition. Finally, a method of evaluating the effectiveness of a given diagnostic test is described.
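Sensitivity and specificity, the two performance measures named above, follow their standard definitions: sensitivity is the fraction of patients with the condition whom the test flags, and specificity is the fraction of patients without the condition whom the test clears. The sketch below computes both; the function name and the sample labels are hypothetical, and the data are invented purely for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(actual: Sequence[bool],
                            predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired true and predicted labels."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    # Guard against empty classes rather than dividing by zero.
    sens = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else float("nan")
    spec = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else float("nan")
    return sens, spec


# Invented example: 4 patients with the condition, 6 without.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```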

Accordingly, the present invention provides a method of identifying a variable or set of variables that aids in the diagnosis of a disorder or condition. In the method of identifying and selecting important variables and of generating a system for diagnosis, patient data or information, typically patient history or clinical data, are collected, and variables based on these data are identified. For example, the data include, for each patient, information on the number of pregnancies the patient has had. The extracted variable is thus the number of pregnancies. The variables are analyzed by a decision support system, exemplified by a neural network, to identify the important or relevant variables.

Methods are provided for developing medical diagnostic tests using computer-based decision support systems, such as neural networks and other adaptive processing systems (collectively, “data mining tools”). The neural network or other such system is trained on patient data and on observations collected from a group of test patients in whom the condition is known or suspected. A subset or subsets of relevant variables are identified using a decision support system or systems, such as a neural network or a consensus of neural networks. Another set of decision support systems is then trained on the identified subset or subsets to produce a consensus decision-support-system-based test, such as a neural-net-based test, for the condition. The use of a consensus system, such as a consensus of neural networks, minimizes the negative effects of local minima in decision support systems such as neural-network-based systems, and thereby improves the accuracy of the system.
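One simple way to form a consensus is to average the outputs of several independently trained networks, so that a single network stuck in a poor local minimum has limited influence on the result. The text does not specify how the consensus is computed, so averaging is an assumption here, as are the function name, the threshold-free output and the stand-in "networks", which are fixed functions rather than trained models.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]  # maps patient inputs to a score in [0, 1]


def consensus_output(models: Sequence[Model], inputs: Sequence[float]) -> float:
    """Average the outputs of several models for the same patient inputs."""
    if not models:
        raise ValueError("at least one model is required")
    outputs = [model(inputs) for model in models]
    return sum(outputs) / len(outputs)


# Stand-ins for networks trained from different starting weights; the third
# disagrees with the others, as a network caught in a local minimum might.
networks = [
    lambda x: 0.80,
    lambda x: 0.75,
    lambda x: 0.25,
]
score = consensus_output(networks, [1.0, 0.0, 3.0])
print(score)
```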

To enhance or improve performance, the patient data can also be augmented by increasing the number of patients used. Biochemical test data and other data can also be included, either as part of additional examples or as additional variables introduced before the variable selection process.

The resulting systems are used as aids in diagnosis. In addition, as the systems are used, patient data can be stored and then used to train the systems further and to develop systems adapted to a particular genetic population. This input of additional data into a system can be performed automatically or manually. In this way the systems learn continually and adapt to the particular environment in which they are used. Besides diagnosis, the resulting systems have numerous uses, including assessing the severity of a disease or disorder and predicting the outcome of a selected treatment protocol. The systems can also be used to assess the value of other data in a diagnostic procedure, such as biochemical test data and other such data, and to identify new tests that are useful for diagnosing a particular disease.

Accordingly, methods are also provided for improving existing biochemical tests, for identifying relevant biochemical tests, and for developing new biochemical tests that aid in the diagnosis of disorders and conditions. These methods assess the effect of a particular test or potential new test on the performance of a decision-support-system-based test. If the addition of information from the test improves performance, the test is relevant to the diagnosis.
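The relevance check described above amounts to comparing system performance with and without the test result as an input. The following is a minimal sketch under stated assumptions: `improves_performance`, the variable names and the performance figures are hypothetical, and `lookup_performance` stands in for training and scoring a decision support system on each variable set.

```python
from typing import Callable, List, Sequence


def improves_performance(evaluate: Callable[[List[str]], float],
                         baseline_variables: Sequence[str],
                         test_variable: str,
                         min_gain: float = 0.0) -> bool:
    """Return True if adding the test result raises performance by more than min_gain."""
    baseline = evaluate(list(baseline_variables))
    augmented = evaluate(list(baseline_variables) + [test_variable])
    return (augmented - baseline) > min_gain


# Invented performance figures keyed by the set of input variables.
PERFORMANCE = {
    frozenset({"history"}): 0.70,
    frozenset({"history", "assay_a"}): 0.78,
    frozenset({"history", "assay_b"}): 0.70,
}


def lookup_performance(variables: List[str]) -> float:
    return PERFORMANCE[frozenset(variables)]


print(improves_performance(lookup_performance, ["history"], "assay_a"))
print(improves_performance(lookup_performance, ["history"], "assay_b"))
```

A nonzero `min_gain` can be used to require that the improvement exceed the run-to-run variation seen when the same system is retrained.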

Disorders and conditions that are of particular interest in the present invention, and to which the methods of the present invention can readily be applied, include, but are not limited to, gynecological conditions and other conditions that affect fertility, including endometriosis, infertility, the prediction of pregnancy-related events, such as the likelihood of delivery within a particular period, and pre-eclampsia. It should be understood, however, that the methods of the present invention are applicable to any disorder or condition.

これらの方法は、ニューラルネットワークに関して例を挙げて説明するが、エキスパートシステム、ファジー論理、決定ツリー、および一般的に非線形である他の統計的意思決定支援システムなど、他のデータ収集ツールも使用できることを理解されたい。本発明において提供される変数は意思決定支援システムとともに使用するようになされているが、変数を識別した後、重要な変数の知識を備えた人、一般に医師は、それらを使用して、意思決定支援システムがない場合、またはあまり複雑でない線形分析システムを使用して診断を助けることができる。 These methods are described by way of example with respect to neural networks, but it should be understood that other data collection tools can also be used, such as expert systems, fuzzy logic, decision trees, and other statistical decision support systems, which are generally non-linear. The variables provided herein are intended for use with a decision support system, but once the variables have been identified, a person with knowledge of the important variables, typically a physician, can use them to aid diagnosis in the absence of a decision support system or with a less complex linear analysis system.

本明細書に示すように、診断を助ける際に今まで重要であることが知られていなかった変数またはその組合せが識別される。さらに、生化学テストデータを補足することなく、患者病歴データを使用して、本発明において提供されるニューラルネットなど、意思決定支援システムとともに使用したときに障害または状態を診断するか、または障害または状態の診断を助けることができる。さらに、生化学データを使用した診断または生化学データを使用しない診断の確度は、侵襲性外科診断手順が不要になるほど十分である。 As shown herein, variables or combinations thereof that have not previously been known to be important in aiding diagnosis are identified. In addition, patient history data, without supplementation by biochemical test data, can be used to diagnose a disorder or condition, or to aid in its diagnosis, when used with a decision support system such as the neural nets provided herein. Furthermore, the accuracy of diagnosis with or without biochemical data is sufficient to eliminate the need for invasive surgical diagnostic procedures.

また、本発明では、診断テストの臨床的効用を識別し、拡大する方法が提供される。特定のテストの結果、今まで注目する障害または状態に関して臨床的効用があると考えられていなかった特定のテストの結果は、変数と結合され、ニューラルネットなど、意思決定支援システムとともに使用される。システムの性能、障害を正確に診断する能力がテストの結果の追加によって改善された場合、テストは、臨床的効用または新しい効用を有することになる。 The present invention also provides methods for identifying and expanding the clinical utility of diagnostic tests. The results of a particular test, one that has not previously been considered to have clinical utility for the disorder or condition of interest, are combined with the variables and used with a decision support system such as a neural net. If the performance of the system, that is, its ability to correctly diagnose the disorder, is improved by the addition of the test results, the test has clinical utility or a new utility.

同様に、得られたシステムを使用すれば、薬品または療法の新しい効用を識別することができ、また特定の薬品および療法の用途を識別することができる。例えば、このシステムを使用すれば、特定の薬品または療法が有効である患者の副次集団を選択することができる。したがって、薬品または療法用の指示を拡大する方法、および新しい薬品および療法を識別する方法が提供される。 Similarly, the resulting system can be used to identify new utilities for drugs or therapies and to identify uses for particular drugs and therapies. For example, the system can be used to select sub-populations of patients for whom a particular drug or therapy is effective. Accordingly, methods are provided for expanding the indications for a drug or therapy and for identifying new drugs and therapies.

特定の実施形態では、ニューラルネットワークを使用して、特定の観測値およびテスト結果を評価し、生化学診断テストまたは他の診断テストの開発を案内し、テスト用の意思決定支援機能を提供する。 In certain embodiments, neural networks are used to evaluate specific observations and test results, guide the development of biochemical diagnostic tests or other diagnostic tests, and provide decision support functions for testing.

また、意思決定支援システム中で使用される重要な変数（パラメータ）またはその組を識別する方法が提供される。この方法は、本明細書では医療診断に関して例を挙げて説明するが、重要なパラメータまたは変数を複数の中から選択する、財務分析など、任意の分野において広く応用できる。 Also provided is a method for identifying important variables (parameters), or sets thereof, for use in a decision support system. Although this method is described herein by way of example with respect to medical diagnosis, it is broadly applicable in any field, such as financial analysis, in which important parameters or variables are selected from among a plurality.

特に、変数の有効な組合せを選択する方法が提供される。この方法は、（１）一組の「ｎ」個の候補変数および最初は空である一組の「選択された重要な変数」を与えるステップ、（２）カイ二乗および感度分析に基づいてすべての候補変数を順位付けするステップ、（３）最も高い「ｍ」個の順位付けされた変数（ｍは１からｎまで）を一度に取り、重要な変数の現在の組に結合された変数に基づいてニューラルネットのコンセンサスをトレーニングすることによって各変数を評価するステップ、（４）ｍ個の変数のうち最もよい変数（最もよい変数とは最も高い性能を与える変数である）を選択し、それが選択された重要な変数の性能と比較して性能を改善する場合、それを「選択された重要な変数」の組に追加し、それを候補組から除去し、ステップ（３）で処理を継続し、それ以外の場合、ステップ（５）に進むステップ、（５）候補組のすべての変数を評価した場合、プロセスを終了し、それ以外の場合、次の最も高い「ｍ」個の順位付けされた変数を一度に取り、重要な選択された変数の現在の組に結合された変数に基づいてニューラルネットのコンセンサスをトレーニングすることによって各変数を評価し、ステップ（４）を実施するステップを含む。重要な選択された変数の最終組は、複数、一般に三つから五つよりも多い変数を含む。 In particular, a method is provided for selecting an effective combination of variables. The method includes: (1) providing a set of “n” candidate variables and a set of “selected important variables”, which is initially empty; (2) ranking all candidate variables on the basis of chi-square and sensitivity analysis; (3) taking the highest “m” ranked variables (where m is from 1 to n) one at a time and evaluating each by training a consensus of neural nets on that variable combined with the current set of important variables; (4) selecting the best of the m variables, where the best variable is the one that gives the highest performance, and, if it improves performance compared with the performance of the selected important variables, adding it to the “selected important variables” set, removing it from the candidate set, and continuing processing at step (3), otherwise going to step (5); and (5) if all variables in the candidate set have been evaluated, ending the process, otherwise taking the next highest “m” ranked variables one at a time, evaluating each by training a consensus of neural nets on that variable combined with the current set of important selected variables, and performing step (4). The final set of important selected variables contains a plurality of variables, typically more than three to five.
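
Steps (1) to (5) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `select_variables`, `evaluate` and `toy_score` are hypothetical, `evaluate` stands in for training a consensus of neural nets and returning its performance, the ranking of step (2) is taken as given, and the baseline performance of the empty set is assumed to be minus infinity so that the first best variable is always accepted.

```python
from typing import Callable, List, Sequence


def select_variables(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
    m: int = 2,
) -> List[str]:
    """Greedy forward selection over a pre-ranked candidate list.

    ranked   -- candidate variables, best-ranked first (result of step 2)
    evaluate -- hypothetical callback: trains a decision support system on
                the given variables and returns its performance (higher is better)
    m        -- number of top-ranked candidates tried per pass (step 3)
    """
    candidates = list(ranked)       # step 1: the n candidate variables
    selected: List[str] = []        # step 1: initially empty
    best_score = float("-inf")      # assumption: empty set has no performance
    start = 0                       # start of the current window of m candidates
    while start < len(candidates):
        window = candidates[start:start + m]
        # step 3: evaluate each candidate combined with the current selection
        scores = {v: evaluate(selected + [v]) for v in window}
        best = max(window, key=lambda v: scores[v])
        if scores[best] > best_score:
            # step 4: improvement, so keep it and return to the top m candidates
            selected.append(best)
            candidates.remove(best)
            best_score = scores[best]
            start = 0
        else:
            # step 5: no improvement, so move on to the next m ranked candidates
            start += m
    return selected


def toy_score(variables: List[str]) -> float:
    """Toy stand-in for network performance; the values are invented."""
    useful = {"age": 0.3, "smoking": 0.2, "parity": 0.1}
    return sum(useful.get(v, -0.05) for v in variables)


ranked = ["age", "noise1", "smoking", "noise2", "parity"]
chosen = select_variables(ranked, toy_score, m=2)
print(chosen)
```

Because only m candidates are trained per pass, the number of networks trained grows far more slowly than it would if every combination of variables were tried.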

特定の実施形態では、感度分析は、（ｋ）観測データセット中の各変数ごとに平均観測値を決定するステップ、（ｌ）トレーニング例を選択し、意思決定支援システム中で例を実行して、通常の出力として指定され、記憶される出力値を発生するステップ、（ｍ）選択されたトレーニング例中の第一の変数を選択し、観測値を第一の変数の平均観測値と交換し、意思決定支援システム中で修正された例を順方向モードで実行し、出力を修正された出力として記録するステップ、（ｎ）通常の出力と修正された出力との差を二乗し、それを各変数ごとに合計として累積するステップ（この合計は各変数ごとに選択された変数合計に指定される）、（ｏ）例中の各変数ごとにステップ（ｍ）およびステップ（ｎ）を繰り返すステップ、（ｐ）データセット中の各例ごとにステップ（ｌ）からステップ（ｎ）を繰り返すステップ（選択された変数の各合計は、意思決定支援システム出力の決定に対する各変数の相対的寄与を表す）を含む。この合計は、意思決定支援システム出力の決定に対するその相対的寄与に従って各変数を順位付けするために使用される。 In certain embodiments, the sensitivity analysis includes: (k) determining an average observation value for each variable in the observation data set; (l) selecting a training example and running the example through the decision support system to produce an output value, which is designated and stored as the normal output; (m) selecting a first variable in the selected training example, replacing its observation value with the average observation value of that variable, running the modified example through the decision support system in forward mode, and recording the output as the modified output; (n) squaring the difference between the normal output and the modified output and accumulating it as a total for each variable, this total being designated the selected variable total for each variable; (o) repeating steps (m) and (n) for each variable in the example; and (p) repeating steps (l) to (n) for each example in the data set, where each total for a selected variable represents the relative contribution of that variable to the determination of the decision support system output. These totals are used to rank each variable according to its relative contribution to the determination of the decision support system output.
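
Steps (k) to (p) can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: `sensitivity_ranking` and `toy_predict` are hypothetical names, `predict` stands in for a forward pass through any trained decision support system, and the toy linear predictor and its data are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple


def sensitivity_ranking(
    examples: Sequence[Sequence[float]],
    predict: Callable[[Sequence[float]], float],
) -> Tuple[List[int], List[float]]:
    """Rank variables by the squared output change when each is set to its mean."""
    n_vars = len(examples[0])
    # step (k): average observation value of each variable
    means = [sum(ex[j] for ex in examples) / len(examples) for j in range(n_vars)]
    totals = [0.0] * n_vars
    for ex in examples:                      # step (p): every example
        normal = predict(ex)                 # step (l): normal output
        for j in range(n_vars):              # step (o): every variable
            modified = list(ex)
            modified[j] = means[j]           # step (m): substitute the mean
            # step (n): accumulate the squared difference for variable j
            totals[j] += (normal - predict(modified)) ** 2
    # variables with larger totals contribute more to the output
    ranking = sorted(range(n_vars), key=lambda j: totals[j], reverse=True)
    return ranking, totals


def toy_predict(x: Sequence[float]) -> float:
    """Hypothetical trained system: the third variable is ignored entirely."""
    return 0.5 * x[0] + 2.0 * x[1] + 0.0 * x[2]


data = [[1.0, 4.0, 7.0], [2.0, 5.0, 8.0], [3.0, 6.0, 9.0]]
ranking, totals = sensitivity_ranking(data, toy_predict)
print(ranking, totals)
```

A variable whose replacement by its mean never changes the output accumulates a total of zero and falls to the bottom of the ranking.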

本明細書で示すように、ニューラルネットワークなどコンピュータベースの意思決定支援システムは、最初に重要であると考えられていなかったいくつかの入力ファクタが結果に影響を及ぼしうることを明らかにする。関連する入力ファクタを明らかにするニューラルネットワークのこの能力により、診断テストの設計を案内するのにニューラルネットワークを使用することができる。したがって、診断テストを設計する方法、および診断テストの効用を評価する方法も提供される。各場合において、テストまたは可能なテストからのデータは、意思決定支援システムの入力に加えられる。データが入力中に含まれるときに結果が改善された場合、診断テストは臨床的効用を有する。 As shown herein, computer-based decision support systems such as neural networks reveal that certain input factors that were not initially considered important can influence the outcome. This ability of neural networks to reveal the relevant input factors permits their use in guiding the design of diagnostic tests. Accordingly, methods for designing diagnostic tests and methods for evaluating the utility of diagnostic tests are also provided. In each case, the data from the test or possible test are added to the input of the decision support system. If the results improve when the data are included in the input, the diagnostic test has clinical utility.

今まで特定の障害の診断において重要であることが知られていなかったテストが識別され、または新しいテストが開発できる。ニューラルネットワークは、スプリアスデータ点の影響を減じ、かつ代用されうる他のデータ点があればそれを識別することによって診断テストに耐性を加えることができる。 Tests that have not previously been known to be important in the diagnosis of a particular disorder are identified, or new tests can be developed. Neural networks can add robustness to diagnostic tests by discounting the effects of spurious data points and by identifying other data points, if any, that can be substituted.

ネットワークを一組の変数に対してトレーニングし、次いで診断テストデータまたは生化学テストデータからの臨床データおよび／または追加の患者情報を入力データに追加する。ない場合と比較して結果を改善する変数を選択する。したがって、今まで特定の障害を診断する際に重要であることが知られていなかった特定のテストが関連性を有することが分かる。例えば、血清抗体のウェスタンブロット上の特定のスポットの有無を疾病状態に相関させることができる。特定のスポット（すなわち抗原）の同一性に基づいて、新しい診断テストが開発できる。 The network is trained on a set of variables, and then clinical data from diagnostic or biochemical test data and/or additional patient information are added to the input data. Variables that improve the results compared with their absence are selected. In this way, particular tests that were not previously known to be important in diagnosing a particular disorder are shown to be relevant. For example, the presence or absence of particular spots on a Western blot of serum antibodies can be correlated with the disease state. New diagnostic tests can be developed on the basis of the identity of the particular spots (i.e., antigens).

疾病の診断を助けるために予測技術を適用する方法、より具体的には疾病子宮内膜症の診断を助けるために様々な情報源からの入力とともにニューラルネットワーク技法を使用する方法の一例が提供される。コンピュータシステム中のネットワークのコンセンサスに従って動作するニューラルネットワークのトレーニングされた組を使用して、その一部が一般に疾病状態に関連しない、例えば調査によって得られる特定の臨床的関連を評価する。これは、例示的な疾病状態子宮内膜症の場合に証明され、子宮内膜症の診断を助けるために使用されるファクタが提供される。ニューラルネットワークトレーニングは、本明細書で臨床データと呼ぶ、その疾病状態が外科的に検証されていないかなりの数の臨床患者の医師によって供給される答えと質問との相関に基づいている。 An example is provided of a method of applying predictive technology to aid in the diagnosis of disease, and more specifically of using neural network techniques with inputs from a variety of sources to aid in the diagnosis of the disease endometriosis. A trained set of neural networks operating according to a consensus of the networks in a computer system is used to evaluate specific clinical associations, for example those obtained by survey, some of which are not generally associated with the disease state. This is demonstrated for the exemplary disease state endometriosis, and factors used to aid in the diagnosis of endometriosis are provided. Neural network training is based on correlations between questions and the answers supplied by physicians of a significant number of clinical patients whose disease state has not been surgically verified, referred to herein as clinical data.

４０個以上の臨床データファクタの集合から抽出される特定のトレーニングされたニューラルネットワーク中の１２個から約１６個の複数のファクタ、具体的には一組の１４個のファクタが子宮内膜症の一次兆候として識別される。次のパラメータの組、すなわち年齢、パリティ（出産回数）、妊娠（妊娠回数）、流産回数、喫煙（箱／日）、過去の子宮内膜症歴、月経困難症、骨盤痛、異常ｐａｐ／形成異常症、骨盤手術歴、薬物治療歴、妊娠高血圧症、生殖器いぼ、糖尿病が重要であると識別された。他の同様のパラメータの組も識別された。これらの変数のサブセットも子宮内膜症を診断する際に使用できる。 A plurality of factors, twelve to about sixteen, and specifically a set of fourteen factors, in a particular trained neural network, extracted from a collection of forty or more clinical data factors, are identified as primary indicators of endometriosis. The following set of parameters was identified as important: age, parity (number of births), gravidity (number of pregnancies), number of miscarriages, smoking (packs/day), past history of endometriosis, dysmenorrhea, pelvic pain, abnormal Pap/dysplasia, history of pelvic surgery, medication history, pregnancy hypertension, genital warts, and diabetes. Other similar sets of parameters were also identified. Subsets of these variables can also be used in diagnosing endometriosis.

次の三つの変数の組合せのうちの一つ（または複数）を含む、選択されたパラメータの組の任意のサブセット、特に１４個の変数の組が子宮内膜症の診断用の意思決定支援システムとともに使用できる。
ａ）出産回数、子宮内膜症歴、骨盤手術歴
ｂ）糖尿病、妊娠高血圧症、喫煙
ｃ）妊娠高血圧症、異常ｐａｐしみ／形成異常症、子宮内膜症歴
ｄ）年齢、喫煙、子宮内膜症歴
ｅ）喫煙、子宮内膜症歴、月経困難症
ｆ）年齢、糖尿病、子宮内膜症歴
ｇ）妊娠高血圧症、出産回数、子宮内膜症歴
ｈ）喫煙、出産回数、子宮内膜症歴
ｉ）妊娠高血圧症、子宮内膜症歴、骨盤手術歴
ｊ）妊娠回数、子宮内膜症歴、骨盤手術歴
ｋ）出産回数、異常ＰＡＰしみ／形成異常症、子宮内膜症歴
ｌ）出産回数、異常ＰＡＰしみ／形成異常症、月経困難症
ｍ）子宮内膜症歴、骨盤手術歴、月経困難症
ｎ）妊娠回数、子宮内膜症歴、月経困難症。
Any subset of the selected set of parameters, particularly of the set of 14 variables, that includes one (or more) of the following combinations of three variables can be used with a decision support system for the diagnosis of endometriosis:
a) number of births, history of endometriosis, history of pelvic surgery
b) diabetes, pregnancy hypertension, smoking
c) pregnancy hypertension, abnormal Pap smear/dysplasia, history of endometriosis
d) age, smoking, history of endometriosis
e) smoking, history of endometriosis, dysmenorrhea
f) age, diabetes, history of endometriosis
g) pregnancy hypertension, number of births, history of endometriosis
h) smoking, number of births, history of endometriosis
i) pregnancy hypertension, history of endometriosis, history of pelvic surgery
j) number of pregnancies, history of endometriosis, history of pelvic surgery
k) number of births, abnormal Pap smear/dysplasia, history of endometriosis
l) number of births, abnormal Pap smear/dysplasia, dysmenorrhea
m) history of endometriosis, history of pelvic surgery, dysmenorrhea
n) number of pregnancies, history of endometriosis, dysmenorrhea.

子宮内膜症の診断に変数を使用する診断ソフトウェアおよび例示的なニューラルネットワークも提供される。このソフトウェアは、臨床的に有用な子宮内膜症インデックスを生成する。 Diagnostic software and exemplary neural networks that use variables to diagnose endometriosis are also provided. This software generates a clinically useful endometriosis index.

他の実施形態では、子宮内膜症のテストに使用される診断ニューラルネットワークシステムの性能は、ネットワークのトレーニングに使用されるファクタ（本明細書では生化学テストデータと呼ぶ。これは分析からのテスト、脈拍や血圧など、生命徴候などのデータを含む）の一部として関連する生化学テストからの生化学テスト結果に基づく変数を含めることによって向上する。それによって得られる例示的なネットワークは、生化学テストの結果および１４個の臨床パラメータを含めて、１５個の入力ファクタを使用する増大ニューラルネットワークである。８個の増大ニューラルネットワークの重みの組は、８個の臨床データニューラルネットワークの重みの組と異なる。例示の生化学テストは、ＥＬＩＳＡ診断テストフォーマットなど、免疫診断テストフォーマットを使用する。 In other embodiments, the performance of the diagnostic neural network system used to test for endometriosis is enhanced by including, as part of the factors used to train the networks, variables based on biochemical test results from relevant biochemical tests (referred to herein as biochemical test data, which includes tests from analyses and data such as vital signs, for example pulse and blood pressure). An exemplary network that results is an augmented neural network that uses fifteen input factors, comprising the result of the biochemical test and the fourteen clinical parameters. The sets of weights of the eight augmented neural networks differ from the sets of weights of the eight clinical data neural networks. The exemplified biochemical test uses an immunodiagnostic test format, such as the ELISA diagnostic test format.

本明細書で例示した子宮内膜症に適用した方法は、例えば、不妊症、特定の期間中の出産の可能性など妊娠関連事象の予測、子癇前症など、婦人科学障害および女性関連障害を含めたがそれだけには限られない、他の障害用のファクタを識別するために同様に適用し、使用できる。したがって、ニューラルネットワークは、疾病状態を予測し、それらを生化学データに結合する際に重要なファクタの識別に基づいて疾病状態を予測するようにトレーニングできる。 The methods applied to endometriosis as exemplified herein can similarly be applied and used to identify factors for other disorders, including, but not limited to, gynecological disorders and female-associated disorders, for example infertility, the prediction of pregnancy-related events such as the likelihood of delivery within a specific period, and preeclampsia. Neural networks can thus be trained to predict disease states on the basis of the identification of factors that are important in predicting the disease state and of the combination of those factors with biochemical data.

得られた診断システムは、状態または障害の存在だけでなく、障害の重さを診断するために、また治療方針を選択する際の補助装置として適しており、使用できる。 The resulting diagnostic system is suitable for use in diagnosing not only the presence of a condition or disorder, but also the severity of the disorder and as an aid in selecting a treatment strategy.

定義． Definitions.
別段の定義がない限り、本明細書で使用するすべての技術用語および科学用語は、一般に本発明がそれに属する技術分野の当業者が理解できるのと同じ意味を有する。本明細書で参照するすべての特許および文献は、参照により本発明の一部となる。 Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All patents and publications referred to herein are incorporated by reference.

本明細書で使用する「データ収集システム」または「データシステム中の知識発見」とも呼ばれる意思決定支援システムは、入力データを分類するためにデータに基づいてトレーニングされ、次いでトレーニングデータに基づいて後で意思決定を行うために新しい入力データとともに使用できる任意のシステム、一般にコンピュータベースのシステムである。これらのシステムは、エキスパートシステム、ファジー論理、非線形回帰分析、多変量分析、意思決定ツリー分類装置、ベイズの信念ネットワーク、および本明細書で例示するニューラルネットワークを含む。ただし、これらに限定されない。 As used herein, a decision support system, also referred to as a “data collection system” or “knowledge discovery in a data system”, is trained on data to classify input data and then later on the basis of training data. Any system that can be used with new input data to make decisions, generally a computer-based system. These systems include expert systems, fuzzy logic, nonlinear regression analysis, multivariate analysis, decision tree classifiers, Bayesian belief networks, and the neural networks exemplified herein. However, it is not limited to these.

本明細書で使用する適応機械学習プロセスは、データを使用して、予測解決策を生成する任意のシステムである。そのようなプロセスは、エキスパートシステム、ニューラルネットワーク、およびファジー論理によって実施されるプロセスである。 As used herein, an adaptive machine learning process is any system that uses data to generate a predictive solution. Such processes are those implemented by expert systems, neural networks, and fuzzy logic.

本明細書で使用するエキスパートシステムは、そのタスクの知識またはその知識を使用するための論理的な規則または手順に基づくコンピュータベースの問題解決および意思決定支援システムである。専門分野の専門家の経験からの知識ならびに論理がコンピュータ中に入力される。 As used herein, an expert system is a computer-based problem solving and decision support system based on knowledge of the task or logical rules or procedures for using that knowledge. Knowledge and logic from the expertise of specialists is input into the computer.

本明細書で使用するニューラルネットワーク、またはニューラルネットは、密に相互接続された適応処理要素から構成される並列計算モデルである。ニューラルネットワークでは、処理要素は、入力層、出力層、および少なくとも一つの隠れた層中に構成される。適切なニューラルネットワークは、当業者に知られている（例えば、米国特許第５２５１６２６号、第５４７３５３７号および第５３３１５５０号、Baxt(1991年)「Use of an Artificial Neural Network for the Diagnosis of Myocardial Infarction」、Annals of Internal Medicine 115:843;Baxt（1992年）「Improving the Accuracy of an Artificial Neural Network Using Multiple Differently Trained Networks」、Neural Computation 4:772;Baxt(1992年)「Analysis of the clinical variables that drive decision in an artificial neural network trained to identify the presence of myocardial infarction」、Annals of Emergency Medicine 21:1439; Baxt（1994 年）「Complexity, chaos and human physiology: the justification for non-linear neural computation analysis」、 Cancer Letters 77:85参照）。 A neural network, or neural net, as used herein, is a parallel computational model composed of densely interconnected adaptive processing elements. In a neural network, the processing elements are organized into an input layer, an output layer, and at least one hidden layer. Suitable neural networks are known to those skilled in the art (see, e.g., US Pat. Nos. 5,251,626, 5,473,537 and 5,331,550; Baxt (1991) "Use of an Artificial Neural Network for the Diagnosis of Myocardial Infarction", Annals of Internal Medicine 115:843; Baxt (1992) "Improving the Accuracy of an Artificial Neural Network Using Multiple Differently Trained Networks", Neural Computation 4:772; Baxt (1992) "Analysis of the clinical variables that drive decision in an artificial neural network trained to identify the presence of myocardial infarction", Annals of Emergency Medicine 21:1439; Baxt (1994) "Complexity, chaos and human physiology: the justification for non-linear neural computation analysis", Cancer Letters 77:85).

本明細書で使用するパーセプトロンまたは人工ニューロンとも呼ばれる処理要素は、複数の入力からの入力データを伝達関数に従って単一の二進出力中にマップする計算ユニットである。各処理要素は、その入力で受信された信号を掛けられて、重み付けされた入力値を発生する各入力に対応する入力重みを有する。処理要素は、各入力の重み付けされた入力値を合計して、重み付けされた合計を発生し、次いでこれが伝達関数によって定義されたしきい値と比較される。 As used herein, a processing element, also called a perceptron or artificial neuron, is a computational unit that maps input data from multiple inputs into a single binary output according to a transfer function. Each processing element has an input weight corresponding to each input that is multiplied by the signal received at that input to generate a weighted input value. The processing element sums the weighted input values for each input to generate a weighted sum that is then compared to a threshold defined by the transfer function.
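
The processing element defined above can be sketched as a weighted sum followed by a hard threshold. This is a minimal sketch under assumptions: the name `processing_element` is hypothetical, and firing when the sum is greater than or equal to the threshold (rather than strictly greater) is a choice made here, not something the text specifies.

```python
from typing import Sequence


def processing_element(
    inputs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.0,
) -> int:
    """Map several inputs to a single binary output."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # multiply each input signal by its weight and sum the weighted inputs
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # compare the weighted sum with the threshold of the transfer function
    return 1 if weighted_sum >= threshold else 0


# Example: with unit weights and a threshold of 1.5 the element acts as a logical AND.
print(processing_element([1, 1], [1.0, 1.0], threshold=1.5))
print(processing_element([1, 0], [1.0, 1.0], threshold=1.5))
```

In a trained network the weights are not set by hand as in this example; they are the quantities adjusted during training.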

本明細書で使用するしきい値関数または活性化関数とも呼ばれる伝達関数は、二つの別個のカテゴリを定義する曲線を生成する数学的関数である。伝達関数は、線形であるが、ニューラルネットワーク中で使用されるとき、より一般的には、二次関数、多項式関数、またはＳ字形関数を含めて非線形である。 A transfer function, also called a threshold function or activation function, as used herein, is a mathematical function that produces a curve defining two distinct categories. A transfer function may be linear but, as used in neural networks, is more typically non-linear, including quadratic, polynomial, or sigmoid functions.

本明細書で使用する逆方向伝搬は、ターゲット出力と実際の出力との間の誤差を訂正するためのニューラルネットワーク用のトレーニング方法である。誤差信号はニューラルネットワークの処理層中にフィードバックされて、処理要素の重みの変化により実際の出力がターゲット出力により近づく。 Back propagation as used herein is a training method for a neural network to correct an error between a target output and an actual output. The error signal is fed back into the processing layer of the neural network, and the actual output approaches the target output due to the change in the weight of the processing element.

本明細書で使用するクイックプロップは、Fahlmanが提案し、開発し、報告した逆方向伝搬方法である（「Fast Learning Variations on Back-Propagation: An Empirical Study」、Proceedings on the 1988 Connectionist Models Summer School, Pittsburgh，1988，D．Touretzky 他編，pp．38-51，Morgan Kaufmann，カリフォルニア州 San Mateo; Lebriereとの共著、「The Cascade-Correlation Learning Architecture」、Advances in Neural Information Processing Systems 2, (Denver, 1989),D．Touretzky 編，pp．524-32．Morgan Kaufmann，カリフォルニア州 San Mateo）。 Quickprop, as used herein, is a back propagation method proposed, developed, and reported by Fahlman ("Fast Learning Variations on Back-Propagation: An Empirical Study", Proceedings on the 1988 Connectionist Models Summer School, Pittsburgh, 1988, D. Touretzky et al., eds., pp. 38-51, Morgan Kaufmann, San Mateo, CA; and, with Lebriere, "The Cascade-Correlation Learning Architecture", Advances in Neural Information Processing Systems 2 (Denver, 1989), D. Touretzky, ed., pp. 524-32, Morgan Kaufmann, San Mateo, CA).

本明細書で使用する診断は、疾病、障害または他の医療状態の存在、不在、重さまたは治療方法を評価する予測プロセスである。本明細書では、診断はまた、治療から得られた結果を決定する予測プロセスを含む。 Diagnosis, as used herein, is a predictive process that assesses the presence, absence, severity, or method of treatment of a disease, disorder or other medical condition. As used herein, diagnosis also includes a predictive process that determines the results obtained from the treatment.

本明細書で使用する生化学テストデータは、免疫学的検定法、生物学的検定法、クロマトグラフィ、モニタおよびイメージャからのデータ、測定値を含む（ただしこれらに限定されない）任意の分析方法の結果であり、また、脈拍、体温、血圧、例えば、ＥＫＧ、ＥＣＧ、ＥＥＧ、バイオリズムモニタの結果、および他のそのような情報など、生命徴候および身体機能に関するデータを含む。分析は、例えば、分析物、血清マーカ、抗体、およびサンプル中の患者から得られる他のそのような材料を評価できる。 Biochemical test data as used herein is the result of any analytical method including, but not limited to, data from immunoassays, biological assays, chromatography, monitors and imagers, and measurements. And also includes data on vital signs and physical functions such as pulse, body temperature, blood pressure, eg, EKG, ECG, EEG, biorhythm monitor results, and other such information. The analysis can evaluate, for example, analytes, serum markers, antibodies, and other such materials obtained from the patient in the sample.

本明細書で使用する患者病歴データは、質問表などによって、患者から得られたデータであるが、一般に本明細書で使用する生化学テストデータを含まない。ただし、そのようなデータが病歴データである限りは、所望の解決策は、障害の診断を生成できる数または結果を生成する。 Patient history data, as used herein, is data obtained from a patient, for example by questionnaire, and generally does not include biochemical test data as used herein, except to the extent that such data is historical. A desired solution is one that produces a number or result from which a diagnosis of the disorder can be generated.

本明細書で使用するトレーニング例は、単一の診断用の観測データ、一般に一人の患者に関する観測データを含む。 The training examples used herein include observation data for a single diagnosis, generally observation data for a single patient.

本明細書で使用する患者病歴データから識別されたパラメータは、本明細書では観測ファクタまたは値または変数と呼ぶ。例えば、患者データは、個々の患者の喫煙習慣に関する情報を含む。それに関連する変数は喫煙である。 As used herein, parameters identified from patient history data are referred to herein as observation factors or values or variables. For example, patient data includes information regarding individual patient smoking habits. A related variable is smoking.

本明細書で使用する分割手段は、８０％など、データの一部を選択し、それをニューラルネットをトレーニングするために使用し、残りの部分をテストデータとして使用することを意味する。したがって、ネットワークは、データの一部以外に基づいてトレーニングされる。このプロセスは、その場合繰り返され、第二のネットワークをトレーニングできる。このプロセスは、すべての区分がテストデータおよびトレーニングデータとして使用されるまで繰り返される。 Partitioning, as used herein, means selecting a portion of the data, such as 80%, using it to train a neural net, and using the remaining portion as test data. The network is thus trained on all but one portion of the data. The process can then be repeated to train a second network. The process is repeated until every partition has been used both as test data and as training data.

本明細書で使用する使用できるデータを複数のサブセット中に分割することによるトレーニングの方法は、一般にトレーニングの「ホールドアウト方法」と呼ばれる。ホールドアウト方法は、ネットワークトレーニングに使用できるデータが制限されるときに特に有用である。 The method of training by dividing the available data into a plurality of subsets, as used herein, is commonly referred to as the “holdout method” of training. The holdout method is particularly useful when the data available for network training are limited.
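
Partitioning with the holdout method can be sketched as follows. This is a sketch under assumptions: the name `partitions` is hypothetical, examples are assigned to partitions by position without shuffling, and k = 5 is chosen so that each network trains on 80% of the data, matching the figure given above.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partitions(data: Sequence[T], k: int = 5) -> List[Tuple[List[T], List[T]]]:
    """Return k (training, test) splits; each partition is held out exactly once."""
    if k < 2 or k > len(data):
        raise ValueError("k must be between 2 and the number of examples")
    # deal the examples into k partitions by position
    folds = [list(data[i::k]) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # train on every partition except the one held out as test data
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


patients = list(range(10))          # stand-in for ten patient records
splits = partitions(patients, k=5)
for train, test in splits:
    print(len(train), len(test))
```

One network would be trained per split, so k = 5 yields five networks, each tested on data it never saw during training.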

本明細書で使用するトレーニングは、入力データを使用して、意思決定支援システムを生成するプロセスである。特に、ニューラルネットに関して、トレーニングは、特定の処理要素が、他の処理要素の出力と結合されたときに、ニューラルネットの出力とトレーニングデータ中に提示された所望の出力との間の生じた誤差を最小限に抑える結果を発生する出力を与える処理要素の重みに対する一連のインタラクティブ調整を行う試行錯誤プロセスである。 Training, as used herein, is the process of generating a decision support system from input data. In particular, with respect to neural nets, training is a trial-and-error process involving a series of iterative adjustments to the processing element weights, so that a particular processing element provides an output that, when combined with the outputs of the other processing elements, produces a result that minimizes the resulting error between the output of the neural net and the desired output as represented in the training data.

本明細書で使用する変数選択プロセスは、予測結果をもたらす変数の組合せを任意の使用できる組から選択する系統的方法である。選択は、追加の変数の追加が結果を改善しないようにサブセットの予測性能を最大にすることによって実施される。本明細書において提供される好ましい方法では、可能なすべての組合せを考慮せずに変数が選択できる。 As used herein, the variable selection process is a systematic method of selecting a combination of variables that yields a prediction result from any available set. Selection is performed by maximizing the prediction performance of the subset so that the addition of additional variables does not improve the results. In the preferred method provided herein, variables can be selected without considering all possible combinations.

本明細書で使用する候補変数は、意思決定支援システムとともに使用できる財務記録など診断実施形態または他の記録用のテスト患者のグループから収集された観測値から選択された項目である。候補変数は、患者データなどのデータを収集し、観測値を一組の変数として分類することによって得られる。 Candidate variables, as used herein, are items selected from observations collected from a group of test patients, in the diagnostic embodiments, or from other records, such as financial records, that can be used with a decision support system. Candidate variables are obtained by collecting data, such as patient data, and categorizing the observations as a set of variables.

本明細書で使用する重要な選択された変数は、手元のタスクのネットワーク性能を高める変数である。使用できるすべての変数を含めることは、最適なニューラルネットワークをもたらさない。いくつかの変数がネットワークトレーニング中に含まれるとき、ネットワーク性能は低下する。関連するパラメータのみを使用してトレーニングされるネットワークは、ネットワーク性能の向上をもたらす。これらの変数はまた、本明細書において関連する変数のサブセットとも呼ばれる。 As used herein, important selected variables are those that enhance the network performance of the task at hand. Including all variables that can be used does not result in an optimal neural network. When some variables are included during network training, network performance is degraded. A network trained using only relevant parameters results in improved network performance. These variables are also referred to herein as a subset of the relevant variables.

本明細書で使用する順位付けは、変数を選択の順序でリストするプロセスである。順位付けは、任意でよく、または整理されることが好ましい。整理は、例えば、診断などタスクに対して変数を重要度順に順位付けする統計分析によるか、意思決定支援システムベースの分析によって実施される。順位付けはまた、例えば、専門家か、規則ベースのシステムか、またはこれらの方法の任意の組合せによって実施できる。 As used herein, ranking is the process of listing variables in order of selection. The ranking may be arbitrary or preferably organized. The organization is performed, for example, by statistical analysis in which variables are ranked in order of importance for tasks such as diagnosis, or by analysis based on a decision support system. Ranking can also be performed by, for example, an expert, a rules-based system, or any combination of these methods.

本明細書で使用するニューラルネットワークのコンセンサスは、各出力の重みが任意に決定されるか、または等しい値に設定される複数のニューラルネットワークからの出力の線形組合せである。 As used herein, neural network consensus is a linear combination of outputs from multiple neural networks where the weight of each output is arbitrarily determined or set to an equal value.
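
A consensus in this sense can be sketched as a weighted sum of the individual network outputs. This is a minimal sketch under assumptions: the name `consensus` is hypothetical, and the default of equal weights corresponds to the "set to an equal value" case in the definition.

```python
from typing import Optional, Sequence


def consensus(
    outputs: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Linear combination of the outputs of several trained networks."""
    if weights is None:
        # equal weights: the consensus is the plain average of the outputs
        weights = [1.0 / len(outputs)] * len(outputs)
    if len(weights) != len(outputs):
        raise ValueError("one weight is required per network output")
    return sum(w * o for w, o in zip(weights, outputs))


# Example: four networks score the same patient (values invented for illustration).
print(consensus([0.2, 0.4, 0.6, 0.8]))
print(consensus([1.0, 0.0], weights=[0.75, 0.25]))
```

Averaging several differently trained networks tends to smooth out the quirks of any single network.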

本明細書で使用するグリーディアルゴリズムは、所与のデータセットからの点を含めるか、除くかどうかを決定することによってデータセットを最適化する方法である。このセットは、要素がない状態から始まり、部分解決策があるとすれば、目的を最も改善する他の値が選択される近視最適化によって残りの要素の実現可能なセットから要素を連続的に選択する。 A greedy algorithm, as used herein, is a method of optimizing a data set by determining whether to include or exclude points from a given data set. The set starts with no elements, and elements are selected successively from the feasible set of remaining elements by myopic optimization, in which, given a partial solution, the element that most improves the objective is chosen.

本明細書で使用するジェネティックアルゴリズムは、トレーニングサイクル中に実行され、かつ所望のターゲットに到達する際にそれらの性能に従って順位付けされるランダムに生成されるニューラルネットワークの初期分布から始まる方法である。十分に実行しないネットワークはその分布から除去され、より適切なネットワークは、親ネットワークの所望の特性を保持する子孫へのクロスオーバプロセス用に保持され、選択される。 A genetic algorithm, as used herein, is a method that begins with an initial population of randomly generated neural networks, which are run through a training cycle and ranked according to their performance in reaching the desired target. The poorly performing networks are removed from the population, and the fitter networks are retained and selected for a crossover process that breeds offspring retaining the desirable characteristics of the parent networks.

本明細書で使用するシステムの性能は、結果が特定の結果をより正確に予測または決定したときに改善される、またはより高くなると言われる。また、システムの性能は、一般により多くのトレーニング例を使用したときによりよくなることを理解されたい。したがって、本発明のシステムは、それらが使用されるときに時間とともに向上し、より多くの患者データが蓄積され、次いでトレーニングデータとしてシステムに追加される。 The performance of a system, as used herein, is said to be improved or higher when the results more accurately predict or determine a particular outcome. It should also be understood that the performance of a system generally improves as more training examples are used. The systems of the present invention therefore improve over time as they are used, as more patient data are accumulated and then added to the system as training data.

本明細書で使用する感度＝ＴＰ／（ＴＰ＋ＦＮ）、特異性はＴＮ／（ＴＮ＋ＦＰ）である。ただし、ＴＰ＝真の正、ＴＮ＝真の負、ＦＰ＝偽の正、ＦＮ＝偽の負である。臨床感度は、テストが疾病を有する患者をどのくらいよく検出するかを測定する。臨床特異性は、テストが疾病を有しない患者をどのくらいよく正確に識別するかを測定する。 As used herein, sensitivity = TP / (TP + FN), specificity is TN / (TN + FP). However, TP = true positive, TN = true negative, FP = false positive, and FN = false negative. Clinical sensitivity measures how well the test detects patients with disease. Clinical specificity measures how well the test accurately identifies patients with no disease.

本明細書で使用する正予測値（ＰＰＶ）はＴＰ／（ＴＰ＋ＦＰ）である。負予測値（ＮＰＶ）はＴＮ／（ＴＮ＋ＦＮ）である。正予測値は、正テストを有する患者が実際に疾病を有する可能性である。負予測値は、負テスト結果を有する患者が疾病を有しない可能性である。 The positive predictive value (PPV) used in this specification is TP / (TP + FP). The negative predicted value (NPV) is TN / (TN + FN). A positive predictive value is the likelihood that a patient with a positive test will actually have the disease. A negative predictive value is the likelihood that a patient with a negative test result will not have the disease.
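
The four definitions above translate directly into code. The function name `diagnostic_metrics` and the patient counts in the example are hypothetical, and the sketch assumes every denominator is non-zero.

```python
from typing import Dict


def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the four test-performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly detected
        "specificity": tn / (tn + fp),  # disease-free patients correctly identified
        "ppv": tp / (tp + fp),          # chance a positive result means disease
        "npv": tn / (tn + fn),          # chance a negative result means no disease
    }


# Invented counts: 50 diseased patients (40 detected), 50 disease-free (45 cleared).
metrics = diagnostic_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common the disease is in the tested population.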

本明細書で使用するファジー論理は、正確に記述できないシステムを処理する手法である。メンバシップ関数（データセット中のメンバシップ）はファジー論理システム中では二進ではない。代わりにメンバシップ関数は分数値をとる。したがって、要素は、セットのメンバシップの係数が異なる場合にもかかわらず、矛盾する二つのセット中に同時に含まれうる。したがって、このタイプの手法は、ｙｅｓまたは答えがない質問に答えるために有用である。したがって、このタイプの論理は、答えがしばしば一つの程度である患者病歴質問票からの返答を分類するのに適している。 Fuzzy logic, as used herein, is an approach for dealing with systems that cannot be described precisely. Membership functions (membership in a data set) are not binary in a fuzzy logic system; instead, membership functions take fractional values. An element can therefore be included simultaneously in two contradictory sets, albeit with different coefficients of set membership. This type of approach is thus useful for answering questions that have no yes-or-no answer, and this type of logic is suited to categorizing responses from patient history questionnaires, in which the answer is often one of degree.

1. General considerations and general methods.
It has been determined that a number of techniques can be used to train neural networks to analyze observation values such as patient history and/or biochemical information. Depending on the characteristics of the available data and of the problem to be analyzed, different neural network training techniques can be used. For example, where a large amount of training input is available, methods that eliminate redundant training information are employed.

As shown herein, neural networks can also reveal that certain input factors not initially considered important do influence an outcome, and that factors presumed to be important are not outcome determinative. The ability of a neural network to reveal the relevant and irrelevant input factors permits its use in guiding the design of a diagnostic test. As shown herein, neural networks and other such data mining tools are a valuable advance in diagnostics and provide an opportunity to increase the sensitivity and specificity of a diagnostic test. As also shown herein, care must be taken to avoid answers of insufficient accuracy caused by the phenomenon of local minima. The methods herein provide a means to avoid this problem, or at least to minimize it.

Several problems are addressed in developing diagnostic procedures, and in particular diagnostic tests that are based solely or in part on patient information. For example, there is generally a limited amount of data, because training data is available for only a limited number of patients. To address this, patient information is partitioned when training the networks, as described below. In addition, there is generally a large number of input observation factors available relative to the available data, so methods for ranking and selecting observations were developed.

In addition, the available patient data generally contains a large number of binary (true/false) input factors, but these factors are generally sparse in nature (that is, their values are positive, or negative, in only a small fraction of the cases available for that binary input factor). There is also a high degree of overlap between the positive and negative factors for the condition being diagnosed.

These and other characteristics affect the choice of procedures and methods used to develop a diagnostic test. These problems are addressed and solved herein.

2. Development of patient history-based diagnostic tests.
Diagnostic tests.
Methods of diagnosis based solely on patient history data are provided. As demonstrated herein, it is possible to provide decision-support systems that rely only on patient history information yet aid in diagnosis. The resulting systems can then be used to improve the predictive ability of biochemical test data, to identify new disease markers, to develop biochemical tests, and to identify tests that had not previously been considered predictive of a particular disorder.

These methods can also be used to select an appropriate course of treatment by predicting the outcome of a selected course of treatment, and to predict status following therapy. The input variables for training are derived from, for example, electronic patient records that indicate diagnoses and other available data, including selected treatments and outcomes. The resulting decision-support system is then used with all available data to, for example, categorize women into different classes that will respond to different treatments and to predict the outcome of a particular treatment. This permits selection of the treatment or protocol most likely to succeed.

Similarly, the systems can be used to identify new utilities for drugs or therapies, and to identify uses for particular drugs and therapies. For example, the systems can be used to select subpopulations of patients for whom a particular drug or therapy is effective. Thus, methods for expanding the indications for a drug or therapy, and for identifying new drugs and therapies, are provided.

Collection of patient data, generation of variables, and overview.
To exemplify the methods herein, FIG. 1 sets forth a flow chart for developing a patient history-based diagnostic test process. The process begins with the collection of patient history data (step A). Patient history data, or observation values, are obtained from patient questionnaires, clinical results, in some cases diagnostic test results, and patient medical records, and are supplied in computer-readable form to a system operating on a computer. In the digital computer, the patient history data is categorized into a set of variables of two forms: binary values (such as true/false) and quantitative (continuous) values. A binary-valued variable might hold the answer to the question, "Do you smoke?" A quantitative-valued variable might hold the answer to the question, "How many packs of cigarettes do you smoke per day?" Other values, such as membership functions, are also useful as inputs.

The patient history data also includes a target, or desired outcome, variable that is taken to indicate the presence, absence, or severity of the medical condition to be diagnosed. This desired outcome information is useful for neural network training. The data to include in the training data is selected using knowledge or assumptions about the presence, severity, or absence of the medical condition to be diagnosed. As shown herein, diagnosis also encompasses assessment of the progression of a condition and/or the effectiveness of a therapeutic treatment.

The number of variables that can be defined, and thus generated, can be unwieldy. Binary variables are typically sparse, in that the number of positive (or negative) responses is often a small fraction of the overall number of responses. Thus, where there is a large number of variables and a small number of patients available in a typical training data environment, steps are taken to isolate from the available variables a subset of variables that is important to the diagnosis (step B). The specific choice of the subset from among the available variables affects the diagnostic performance of the neural network.

The process outlined herein has been found to produce a subset of variables that is comparable or superior in sensitivity and reliability to the subset of variables typically chosen by a trained human expert, such as a physician. In some instances, the variables are prioritized, or placed in order of rank or relevance.

Thereafter, the final neural networks to be used in the diagnostic procedure are trained (step C). In preferred embodiments, a consensus of networks (that is, a plurality of networks) is trained. The resulting networks form the decision-support functionality for the completed patient history diagnostic test (step D).

Method for isolation of important variables.
A method for isolating important variables is provided herein. The method permits an effective set of variables to be selected without comparing every possible combination of variables. The important variables are used as inputs to the decision-support systems.

Isolation of important or relevant variables - ranking the variables.
FIG. 3 provides a flow chart of the process for isolating the important or relevant variables within a diagnostic test (step E). Such a process is typically conducted using a digital computer system to which potentially relevant information has been provided. This procedure ranks the variables in order of importance using two independent methods, then selects a subset of the available variables from the uppermost part of the ranking. As noted above, those of skill in the art can use other ranking methods in place of chi-square or sensitivity analysis. Also, if x is set to N (the total number of candidate variables), the ranking can be arbitrary.

As described below, the system trains a plurality of neural networks on the available data (step I), then generates a sensitivity analysis over all of the trained networks to determine the extent to which each input variable was used in the networks to perform the diagnosis (step J). A consensus sensitivity analysis for each input variable is determined by averaging the individual sensitivity analysis results of each trained network. Based on sensitivity, a ranking is determined for each variable available from the patient history information (step K).

Ranking the variables.
In preferred embodiments, the variables are ranked using a statistical analysis, such as a chi-square analysis, and/or a decision-support system-based analysis, such as a sensitivity analysis. In the exemplary embodiment, a sensitivity analysis and a chi-square analysis are used to rank the variables. Other statistical methods and/or decision-support system-based methods can also be used, including but not limited to regression analysis, discriminant analysis, and other methods known to those of skill in the art. The ranked variables can be used to train the networks and, preferably, are used in the method of variable selection provided herein.

This method employs a sensitivity analysis in which each input is varied and the corresponding change in output is measured (see also Modai et al. (1993) "Clinical Decisions for Psychiatric Inpatients and Their Evaluation by Trained Neural Networks", Methods of Information in Medicine 32:396-99; Wilding et al. (1994) "Application of Backpropagation Neural Networks to Diagnosis of Breast and Ovarian Cancer", Cancer Letters 77:145-53; Ruck et al. (1990) "Feature Selection in Feed-Forward Neural Networks", Neural Network Computing 20:40-48; Utans et al. (1993) "Selecting Neural Network Architectures Via the Prediction Risk: Application to Corporate Bond Rating Prediction", Proceedings of the First International Conference on Artificial Intelligence Applications on Wall Street, Washington, D.C., IEEE Computer Society Press, pp. 35-41; Penny et al. (1996) "Neural Networks in Clinical Medicine", Medical Decision-support 4:386-398). Such methods have not heretofore been used to select important variables as described herein. For example, sensitivity analysis has been reported for developing a statistical approach to determining relationships between variables, but not for selecting important variables (see Baxt et al. (1995) "Bootstrapping Confidence Intervals for Clinical Input Variable Effects in a Network Trained to Identify the Presence of Myocardial Infarction", Neural Computation 7:624-38). Such sensitivity analysis can be used, as described herein, as part of the selection of variables that are important as diagnostic aids.

An outline of the sensitivity analysis is shown in step K of FIG. 3. Each of a plurality of trained neural networks (networks N_1 through N_n) is run in forward mode for each training example S_x (a group of input data for which the output is known or suspected; there must be at least two training examples), where "x" is the number of training examples. The output of each network N_1 through N_n for each training example S_x is recorded, that is, stored in memory. A new training example is defined that contains the average value of each input variable over all of the training examples. One at a time, each input variable in each original training example S_x is replaced with its corresponding average value, V_1(avg) through V_y(avg), where "y" is the number of variables.

Each modified training example S_x' is run again through the plurality of networks, producing a modified output for each network for each variable. The difference between the output from the original training example S_x and the modified output for each input variable is squared and summed (accumulated) to obtain an individual sum corresponding to each input variable. To illustrate, consider ten separate neural networks N_1 through N_10, each having fifteen variables V_1 through V_15, and five different training examples S_1 through S_5. Each of the five training examples is run through the ten networks, producing a total of fifty outputs. Variable V_1 is taken from each training example and its average value V_1(avg) is calculated. This average value V_1(avg) is substituted into each of the five training examples to create modified training examples S_1' through S_5', which are again run through the ten networks. Fifty modified output values are produced by networks N_1 through N_10 and the five training examples, the modification being the result of using the average-value variable V_1(avg). The difference between each of the fifty original and modified output values is calculated; for example, the original output OUT(S_4 N_6) from training example S_4 in network N_6 is subtracted from the modified output OUT(S_4' N_6) from training example S_4 in network N_6. That difference is squared: [OUT(S_4' N_6) - OUT(S_4 N_6)]^2_V1. This value is summed with the squared differences for all combinations of networks and training examples for the iteration in which variable V_1 was replaced by its average value V_1(avg). That is, the following equation is obtained.
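The equation itself does not survive in this text. A reconstruction consistent with the worked example (five training examples indexed by i, ten networks indexed by j) is sketched below; the label SUM_{V_1} for the accumulated value is a name introduced here, not one taken from the source.

```latex
\mathrm{SUM}_{V_1} \;=\; \sum_{i=1}^{5} \sum_{j=1}^{10}
  \left[ \mathrm{OUT}(S_i' N_j) - \mathrm{OUT}(S_i N_j) \right]^2_{V_1}
```

Here S_i' denotes training example S_i with variable V_1 replaced by V_1(avg), so the double sum runs over all fifty network and training-example combinations.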

This process is then repeated for variable #2: the difference between the original and modified outputs is found for each combination of network and training example, squared, and the squared differences are summed. The process is repeated for each variable until all fifteen variables have been processed.

Each resulting sum is then normalized so that, if all variables contributed equally to the single resulting output, each normalized value would be 1.0. Following the preceding example, the summed squared differences for each variable are added together to obtain a total summed squared difference for all variables. The value for each variable is divided by the total summed squared difference to normalize the contribution of each variable. From this information, the normalized value for each variable can be ranked in order of importance, a higher relative number indicating that the corresponding variable has a greater influence on the output. The sensitivity analysis of the input variables is used to indicate which variables played the greatest role in generating the network output.
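The procedure described above (substitute each variable's average, re-run every network, accumulate squared output differences, normalize, rank) can be sketched as follows. This is an illustrative sketch under stated assumptions: each trained network is represented by a plain Python callable, the two `net_*` functions are hypothetical stand-ins rather than trained networks, and the normalized value is scaled by the number of variables so that equal contributions give 1.0, which is one reading of the normalization described in the text.

```python
def sensitivity_ranking(networks, examples):
    """Rank input variables by normalized summed squared output change.

    networks: list of callables, each mapping a list of inputs to one output.
    examples: list of training examples, each a list of input values.
    """
    n_vars = len(examples[0])
    # Average value of each input variable over all training examples.
    means = [sum(ex[v] for ex in examples) / len(examples) for v in range(n_vars)]
    # Original output for every (network, example) combination.
    original = [[net(ex) for ex in examples] for net in networks]

    sums = []
    for v in range(n_vars):
        total = 0.0
        for j, net in enumerate(networks):
            for i, ex in enumerate(examples):
                modified = list(ex)
                modified[v] = means[v]  # substitute the average for variable v
                total += (net(modified) - original[j][i]) ** 2
        sums.append(total)

    grand_total = sum(sums)
    # Assumed scaling: equal contributions from all variables each yield 1.0.
    normalized = [s * n_vars / grand_total if grand_total else 0.0 for s in sums]
    ranking = sorted(range(n_vars), key=lambda v: normalized[v], reverse=True)
    return normalized, ranking

# Hypothetical stand-ins for trained networks (the third input is ignored).
def net_a(x):
    return 3.0 * x[0] + 1.0 * x[1]

def net_b(x):
    return 2.0 * x[0] + 1.0 * x[1]

examples = [[0.0, 0.0, 5.0], [2.0, 0.0, 5.0], [0.0, 2.0, 5.0], [2.0, 2.0, 5.0]]
normalized, ranking = sensitivity_ranking([net_a, net_b], examples)
```

In this toy case the first variable dominates both stand-in networks and therefore ranks first, while the unused third variable receives a normalized value of zero.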

It has been found herein that using consensus networks to perform the sensitivity analysis improves the variable selection process. For example, if two variables are highly correlated, a single neural network trained on the data might use only one of the two variables to produce the diagnosis. Because the variables are highly correlated, little is gained by including both, and the choice of which to include depends on the initial starting conditions of the network being trained. A sensitivity analysis using a single network might show that only one, or only the other, is important. A sensitivity analysis derived from the consensus of multiple networks, each trained using different initial conditions, may reveal that both of the highly correlated variables are important. By averaging the sensitivity analysis over a set of neural networks, a consensus is formed that minimizes the effects of the initial conditions.
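The effect of averaging can be illustrated with a short sketch. The sensitivity values below are invented for the example: they represent two networks that, because of different starting conditions, each came to rely on a different member of a correlated pair of variables.

```python
def average_sensitivities(per_network):
    """Average per-network sensitivity vectors into one consensus vector."""
    n_nets = len(per_network)
    n_vars = len(per_network[0])
    return [sum(net[v] for net in per_network) / n_nets for v in range(n_vars)]

# Hypothetical normalized sensitivities for three variables.
# Variables 0 and 1 are highly correlated; each network relied on only one.
net_1 = [1.8, 0.0, 1.2]
net_2 = [0.0, 1.8, 1.2]
consensus = average_sensitivities([net_1, net_2])
```

Taken alone, either network would suggest that one of the correlated variables is irrelevant; in the averaged vector both retain a substantial value.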

Chi-square contingency table.
When dealing with sparse binary data, a positive response on a given variable might be highly correlated with the condition being diagnosed, yet occur so infrequently in the training data that the importance of the variable, as indicated by the neural network sensitivity analysis, is very low. To catch these occurrences, the chi-square contingency table is used as a secondary ranking process. A 2×2 contingency table chi-square test is performed on the binary variables, where each cell of the table is the observed frequency for a combination of the two variables (FIG. 3, step F). A 2×2 contingency table chi-square test is performed on the continuous variables using optimal thresholds, which may be determined empirically (step G). The binary and continuous variables are then ranked based on the chi-square analysis (step H).

The standard chi-square 2×2 contingency table operating on the binary variables (step F) is used to determine the significance of the relationship between a specific binary input variable and the desired output (as determined by comparing the training data with the known single output result). Variables that have a low chi-square value are typically unrelated to the desired output.

For variables that have continuous values, a 2×2 contingency table can be constructed (step G) by comparing the continuous variable to a threshold value. The threshold value is modified empirically to yield the highest possible chi-square value.
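A minimal sketch of both chi-square steps follows, assuming the ordinary Pearson statistic for a 2×2 table without continuity correction (the text does not say whether a correction is applied). The function names are hypothetical, and the threshold search simply tries every observed value as a cut point, which is one straightforward way to adjust the threshold empirically.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: no association can be measured
    return n * (a * d - b * c) ** 2 / denom

def table_for_binary(values, outcomes):
    """Count the four cells for a binary variable against a binary outcome."""
    a = sum(1 for v, o in zip(values, outcomes) if v and o)
    b = sum(1 for v, o in zip(values, outcomes) if v and not o)
    c = sum(1 for v, o in zip(values, outcomes) if not v and o)
    d = sum(1 for v, o in zip(values, outcomes) if not v and not o)
    return a, b, c, d

def best_threshold(values, outcomes):
    """Try each observed value as a cut point; keep the largest chi-square."""
    best = (0.0, None)
    for t in sorted(set(values)):
        binarized = [v >= t for v in values]
        chi2 = chi_square_2x2(*table_for_binary(binarized, outcomes))
        if chi2 > best[0]:
            best = (chi2, t)
    return best  # (chi-square value, threshold)

# Hypothetical continuous variable and known binary outcomes.
chi2, threshold = best_threshold([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Because the threshold is chosen to maximize the statistic on the same data, the resulting value is best treated as a ranking score, as in the text, rather than as a formal significance test.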

The chi-square values of the continuous variables and of the binary variables can then be combined for a common ranking (step H). A second level of ranking can then be performed that combines the chi-square-ranked variables with the sensitivity-analysis-ranked variables (step L). This combining of rankings allows variables that are significantly related to the output but are sparse (that is, values that are positive or negative in only a small fraction of cases) to be included in the set of important variables. Otherwise, important information in such a non-linear system could easily be overlooked.

Selection of important variables from among the ranked variables.
As noted above, important variables are selected from among the identified variables. Preferably, the selection is performed after the variables have been ranked, at which point the second-level ranking process is invoked. A method for identifying the important variables (parameters), or sets thereof, for use in decision-support systems is also provided. Although the method is exemplified herein with reference to medical diagnosis, it has broad applicability in any field, such as financial analysis and other endeavors that make statistically based predictions, in which important parameters or variables are selected from among a plurality.

In particular, a method for selecting an effective combination of variables is provided. The method includes: (1) providing a set of "n" candidate variables and a set of "selected important variables", which initially is empty; (2) ranking all candidate variables based on chi-square and sensitivity analysis, as described above; (3) taking the highest "m" ranked variables one at a time, where m is from 1 up to n, and evaluating each by training a consensus of neural nets on that variable combined with the current set of important variables; (4) selecting the best of the m variables, where the best variable is the one that most improves performance, and, if it improves performance, adding it to the "selected important variables" set, removing it from the candidate set, and continuing processing at step (3), otherwise continuing at step (5); and (5) if all variables on the candidate set have been evaluated, ending the process, otherwise taking the next highest "m" ranked variables one at a time, evaluating each by training a consensus of neural nets on that variable combined with the current set of selected important variables, and performing step (4).
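Steps (3) through (5) can be sketched as a greedy loop. This is a simplified illustration, not the patented implementation: it works from a single ranked list, the `evaluate` callable is a hypothetical stand-in for training a consensus of networks and measuring its test performance, and the empty starting set is assumed to have no measured performance, so the first winning variable is always accepted.

```python
def select_important_variables(ranked, evaluate, m=2):
    """Greedy forward selection over a ranked candidate list.

    ranked:   candidate variable names, most promising first.
    evaluate: callable taking a list of variables and returning a score
              (higher is better); stands in for consensus training.
    m:        number of ranked candidates tried per batch.
    """
    candidates = list(ranked)
    selected = []
    best_score = float("-inf")  # assumed: the empty set has no score yet
    start = 0  # position of the current batch in the candidate list
    while start < len(candidates):
        batch = candidates[start:start + m]
        scored = [(evaluate(selected + [v]), v) for v in batch]
        score, winner = max(scored, key=lambda pair: pair[0])
        if score > best_score:
            selected.append(winner)      # step (4): keep the best variable
            candidates.remove(winner)
            best_score = score
            start = 0                    # resume from the top of the ranking
        else:
            start += m                   # step (5): try the next m variables
    return selected, best_score

# Toy scoring: useful variables add value, every input costs a small penalty.
toy_weights = {"a": 0.3, "b": 0.2, "c": 0.0, "d": 0.1}

def toy_evaluate(variables):
    return sum(toy_weights[v] for v in variables) - 0.05 * len(variables)

selected, score = select_important_variables(["c", "a", "b", "d"], toy_evaluate)
```

With this toy scoring the uninformative variable "c" is never selected even though it is ranked first, because adding it never improves on the current best score.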

In particular, the second-level ranking process (step L) starts by adding the highest-ranked variable from the sensitivity analysis (step K) to the set of important variables (step H). Alternatively, the second-level ranking process can start with an empty set and then test the top several (x) variables from each of the two sets of rankings. This second-level ranking process uses the network training procedure (step I) on the currently selected partition, or subset, of variables from the available data to train a set of neural networks. The ranking process is a network training procedure that uses the current set of "important" variables (which generally is initially empty) plus the current variable being ranked or tested for ranking, and uses a greedy algorithm to optimize the set of input variables by myopically optimizing the input set based upon the previously identified important variables, thereby identifying the remaining variable or variables that most improve the output.

This training process is illustrated in FIG. 4. The number of inputs used by the neural network is controlled by excluding inputs that are determined not to contribute significantly to the desired output, that is, the known target output of the training data. Commercial computer programs, such as ThinksPro™ neural networks for Windows™ (or the TrainDos™ DOS version) from Logical Designs Consulting, La Jolla, California, USA, or other such programs that those of skill in the art can develop, can be used to vary the inputs and train the networks.

A number of other commercially available neural network computer programs can be used to perform any of the above operations, including Brainmaker™, sold by California Scientific Software, Nevada Adaptive Solutions, of Beaverton, Oregon, USA; Neural Network Utility/2™, sold by NeuralWare, of Pittsburgh, Pennsylvania, USA; and NeuroShell™ and NeuroWindows™, sold by Ward Systems Group, of Frederick, Maryland, USA. Other types of data mining tools that provide variable selection and network optimization functions, that is, decision-support systems, can also be designed, and other commercially available systems can be used. For example, Neuro Genetic Optimizer™, sold by BioComp Systems, of Redmond, Washington, USA, and Neuro Forecaster/GENETICA, sold by New Wave Intelligent Business Systems (NIBS), of the Republic of Singapore, use genetic algorithms modeled on natural selection to eliminate poorly performing nodes within the network population, to pass the best-performing rates on to offspring nodes so as to "grow" an optimized network, and to eliminate input variables that do not contribute significantly to the outcome. Networks based on genetic algorithms use mutation to avoid becoming trapped in local minima and use crossover processes to introduce new structures into the population.

Knowledge discovery in data (KDD) is another data mining tool, a decision-support system, designed to identify significant relationships that exist among variables, and is useful when there are many possible relationships. A number of KDD systems are commercially available, including Darwin™, sold by Thinking Machines, of Bedford, Massachusetts, USA; Mineset™, sold by Silicon Graphics, of Mountain View, California, USA; and Eikoplex™, from Ultragem Data Mining, of San Francisco, California, USA. (Eikoplex™ has been used to provide classification rules for determining the probability of the presence of heart disease.) Other systems can be developed by those of skill in the art.

Continuing the ranking procedure, if, for example, x is set to 2, the top two variables from each of the two ranked sets are tested by the process (FIG. 3, steps L, S), and the results are checked to see whether the test results show an improvement (step T). If there is an improvement, the single best-performing variable is added to the set of "important" variables and is then removed from the two rankings (FIG. 3, step U) before further testing (step S). If there is no improvement, the process is repeated with the next x variables from each set until an improvement is found or until all variables from the two sets have been tested. This process is repeated until the source set is empty, i.e., all relevant or important variables are included in the final network, or until all remaining variables in the set being tested are found to perform no better than the current list of important variables. This elimination process greatly reduces the number of subsets of the available variables that must be tested to determine the set of important variables. Even in the worst case, with 10 available variables, the process tests only 34 subsets when x = 2 and only 19 subsets of the 1024 possible combinations when x = 1. Similarly, with 100 available variables, only 394 subsets are tested when x = 2. The variables from the network with the best test performance are thus identified for use (FIG. 3, step V).
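By way of illustration only, the ranked selection loop described above might be sketched in Python as follows. The function name `select_variables`, the callback `evaluate` (which stands in for training and testing a network on a candidate variable set), and the toy scores are assumptions made for this sketch; they are not taken from FIG. 3.

```python
from typing import Callable, FrozenSet, List, Sequence


def select_variables(
    rank_a: Sequence[str],
    rank_b: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    x: int = 2,
) -> List[str]:
    """Greedy selection from two ranked candidate lists (best candidates first)."""
    remaining_a, remaining_b = list(rank_a), list(rank_b)
    selected: List[str] = []
    best_score = float("-inf")  # assumption: any first result beats an empty model

    while remaining_a or remaining_b:
        improved = False
        offset = 0
        while offset < max(len(remaining_a), len(remaining_b)):
            # Take the next x variables from each ranking, without duplicates.
            batch = list(dict.fromkeys(
                remaining_a[offset:offset + x] + remaining_b[offset:offset + x]))
            # Test each candidate together with the current important set.
            scored = [(evaluate(frozenset(selected + [v])), v) for v in batch]
            top_score, top_var = max(scored, key=lambda pair: pair[0])
            if top_score > best_score:
                # Keep only the single best variable and drop it from both rankings.
                selected.append(top_var)
                best_score = top_score
                remaining_a = [v for v in remaining_a if v != top_var]
                remaining_b = [v for v in remaining_b if v != top_var]
                improved = True
                break
            offset += x  # no improvement: move on to the next x variables
        if not improved:
            break  # nothing left that beats the current important set
    return selected


# Toy stand-in for "train a network and measure test performance" (hypothetical).
USEFUL = {"age": 0.3, "pain": 0.2}


def toy_evaluate(variables: FrozenSet[str]) -> float:
    return sum(USEFUL.get(v, -0.01) for v in variables)


chosen = select_variables(
    ["age", "weight", "pain", "height"],
    ["pain", "age", "height", "weight"],
    toy_evaluate,
    x=2,
)
print(chosen)
```

In this sketch each call to `evaluate` corresponds to one tested subset, so the number of calls, rather than the number of possible combinations, determines the cost of the search.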

A final set of networks is then trained to perform the diagnosis (FIG. 4, steps M, N, Q, R). Typically, several final neural networks are trained to perform the diagnosis. This set of neural networks can form the basis of a product that can be supplied to end users. Because different initial conditions (initial weights) can produce different outputs for a given network, it is useful to seek a consensus. (The different initial weights are used to avoid the error becoming trapped in a local minimum.) The consensus is formed by averaging the outputs of each of the trained networks, and this average then becomes the single output of the diagnostic test.

Training a consensus of networks.
FIG. 4 shows the procedure for training a consensus of neural networks. First, it is determined whether the current training cycle is the final training step (step M). If yes, all available data are placed in the training data set (i.e., P = 1) (step N). If no, the available data are divided into P equal-sized partitions, with the data for each partition selected at random (step O). In the exemplary embodiment, for example, five partitions, P₁ through P₅, are generated from the full set of available training data. Two files are then constructed (step P): one or more partitions are copied to a test file, and the remaining partitions are copied to a training file. Continuing the five-partition example, one of the partitions, e.g., P₁, representing 20% of the total data set, is copied to the test file. The remaining four partitions, P₂ through P₅, are identified as training data. A group of N neural networks is trained using the training partitions, each network having different starting weights (step Q). Thus, in the exemplary embodiment, there are 20 networks (N = 20) whose starting weights are randomly selected using 20 different random number seeds. After training is completed for each of the 20 networks, the output values of all 20 networks are averaged to give the average performance of the trained networks on the test data. The data in the test file (partition P₁) are then run through the trained networks to give an estimate of the performance of the trained networks. This performance is typically determined as the mean squared error of prediction or the misclassification rate. A final performance estimate is generated by averaging the individual performance estimates of each network, producing a completed consensus network (step R). This method of training by dividing the available data into multiple subsets is commonly referred to as the "holdout method" of training. The holdout method is particularly useful when the data available for network training are limited.
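The holdout procedure and the averaging of network outputs can be illustrated with a minimal Python sketch. The one-weight linear model in `train_network` is only a stand-in for a neural network, and the function names and the synthetic data are assumptions introduced for this example.

```python
import random
from statistics import mean


def partition(data, p, seed=0):
    """Shuffle the data and split it into p nearly equal partitions."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    return [rows[i::p] for i in range(p)]


def train_network(rows, seed, epochs=200, lr=0.1):
    """Stand-in for network training: a linear unit with seeded random starting weights."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, y in rows:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return lambda x: w * x + b


def holdout_consensus(data, p=5, n=20):
    parts = partition(data, p)
    test = parts[0]                                        # one partition held out for testing
    train = [row for part in parts[1:] for row in part]    # remaining partitions for training
    nets = [train_network(train, seed) for seed in range(n)]  # n different starting weights

    def consensus(x):
        # Consensus output: average of the outputs of all trained networks.
        return mean(net(x) for net in nets)

    # Performance estimate: average of the per-network mean squared errors on the test file.
    per_net_mse = [mean((net(x) - y) ** 2 for x, y in test) for net in nets]
    return consensus, mean(per_net_mse)


# Synthetic, noise-free data (hypothetical): y = 2x + 1 on [0, 1].
data = [(i / 49, 2 * (i / 49) + 1) for i in range(50)]
consensus, estimate = holdout_consensus(data)
print(round(consensus(0.5), 3), estimate)
```

For the final training step the same `train_network` calls would be made on all available data (P = 1), with no partition held out.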

Test set performance can be maximized empirically by performing various experiments that identify the network parameters that maximize it. The parameters that can be modified in this set of experiments are 1) the number of hidden processing elements, 2) the amount of noise added to the inputs, 3) the amount of error tolerance, 4) the choice of learning algorithm, 5) the amount of weight decay, and 6) the number of variables. An exhaustive search of all possible combinations is generally impractical because of the amount of processing time required. Test networks are therefore trained using training parameters selected empirically via a computer program, such as ThinksPro™ or a user-developed program, or using parameters taken from existing test results generated by others working in the field of interest. Once the "best" configuration has been determined, the final set of networks can be trained on the complete data set.

3. Development of biochemical diagnostic tests.
Similar techniques for isolating variables can be used to build or validate biochemical diagnostic tests, and biochemical diagnostic test data can be combined with patient history diagnostic tests to increase the reliability of a medical diagnosis.

The selected biochemical test includes any test from which useful diagnostic information can be obtained in relation to a patient and/or the patient's symptoms. The test may be instrument-based or non-instrument-based and may include analysis of a biological sample, patient signs, patient condition, and/or changes in these factors. Any of a number of analytical methods can be used, including, but not limited to, immunoassays, bioassays, chromatography, monitors, and imagers. The analysis can assess analytes, serum markers, antibodies, and the like obtained from the patient in a sample. In addition, information about the patient can be supplied in conjunction with the test. Such information includes, but is not limited to, age, weight, blood pressure, genetic history, and other such parameters or variables.

The exemplary biochemical test developed in this embodiment uses a standardized test format, such as the enzyme-linked immunosorbent assay (ELISA) test, but the information provided herein applies to the development of other biochemical or diagnostic tests and is not limited to the development of ELISA tests (for a description of ELISA tests see, e.g., Atassi et al., eds., "Molecular Immunology: A Textbook", Marcel Dekker Inc., New York and Basel, 1984). Information important to the development of an ELISA test is obtained from the Western blot test, a test format that determines the reactivity of antibodies to proteins in order to characterize antibody profiles and extract the properties of the antibodies.

A Western blot is a technique used to identify specific antigens in a mixture, for example, by separating the antigens on a polyacrylamide gel, blotting them onto nitrocellulose, and detecting them with labeled antibodies as probes (for Western blots see, e.g., Stites and Terr, eds., "Basic and Clinical Immunology", Seventh Edition, Appleton and Lange, 1991). It is, however, sometimes undesirable to use a Western blot test as a diagnostic tool. Instead, the molecular weight ranges that contain information relevant to the diagnosis can be identified in advance, and this information can then be "coded" into an equivalent ELISA test.

In this example, the development of an effective biochemical diagnostic test depends on the availability of Western blot data for patients whose disease condition is known or suspected. Referring to FIG. 5, Western blot data are used as the source (step W), and the first step in processing them is to preprocess the Western blot data for use by the neural network (step X). The images are digitized and converted by computer into training records of fixed dimensions by performing spline interpolation and image normalization. To use data from multiple Western blot tests, the images on a given gel must be aligned based only on the information in the images. Each input to the neural network must accurately represent a specific molecular weight or range of molecular weights. Normally, each gel that is produced includes a standards image for calibration. The proteins it contains are of known molecular weight, so the standards image can also be used to align the images contained in the same Western blot. For example, a standard curve can be used to estimate the molecular weight ranges of the other images on the same Western blot and thereby align the nitrocellulose strips.

The images are aligned by cubic spline interpolation, a method that guarantees smooth transitions at the data points represented by the standards. To avoid possible performance problems caused by extrapolation, the end conditions are set so that extrapolation is linear. The computerized alignment step minimizes the variation in the molecular weight estimate for a given band on the output of the Western blot.
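One way to obtain a cubic spline whose extrapolation is linear is to use "natural" end conditions (zero second derivative at both ends) and to continue the end slopes outside the range of the standards. The following Python sketch takes that approach; the choice of natural end conditions, the function name, and the standards values are assumptions made for illustration and are not stated in the specification.

```python
from bisect import bisect_right
from math import log10


def natural_cubic_spline(xs, ys):
    """Interpolate (xs, ys), xs strictly increasing, with linear extrapolation at both ends."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; m[0] = m[-1] = 0 (natural ends)
    if n > 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for k in range(1, n - 2):            # forward elimination (tridiagonal system)
            f = h[k] / diag[k - 1]
            diag[k] -= f * h[k]
            rhs[k] -= f * rhs[k - 1]
        for k in range(n - 3, -1, -1):       # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def f(x):
        if x <= xs[0]:                       # linear extrapolation below the first standard
            slope = (ys[1] - ys[0]) / h[0] - h[0] * (2 * m[0] + m[1]) / 6
            return ys[0] + slope * (x - xs[0])
        if x >= xs[-1]:                      # linear extrapolation above the last standard
            slope = (ys[-1] - ys[-2]) / h[-1] + h[-1] * (m[-2] + 2 * m[-1]) / 6
            return ys[-1] + slope * (x - xs[-1])
        i = bisect_right(xs, x) - 1
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)

    return f


# Made-up standards: band position on the strip (mm) and known molecular weight (kDa).
positions = [10.0, 25.0, 45.0, 70.0, 90.0]
weights_kda = [200.0, 116.0, 66.0, 45.0, 21.0]
log_mw_at = natural_cubic_spline(positions, [log10(w) for w in weights_kda])
print(10 ** log_mw_at(30.0))  # estimated molecular weight of a band at 30 mm
```

Mapping position to the logarithm of molecular weight, rather than to molecular weight itself, is a design choice of this sketch.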

The resulting scanned image is then processed to normalize its density by rescaling the density so that the darkest band has a scaled density of 1.0 and the lightest band is scaled to 0.0. The image is then processed into a fixed-length vector of numbers that becomes the input to the neural network, which must first be trained as described below.
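The density normalization and the conversion to a fixed-length vector might be sketched as follows. The sketch assumes that a lane has already been reduced to a one-dimensional density profile in which larger values are darker, and it uses simple linear resampling; both are assumptions of this example, as are the function names.

```python
def normalize_density(profile):
    """Rescale so that the lightest value maps to 0.0 and the darkest to 1.0."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0] * len(profile)  # flat lane: no bands to distinguish
    return [(v - lo) / (hi - lo) for v in profile]


def to_fixed_length(profile, length):
    """Resample a profile (two or more points) to `length` points by linear interpolation."""
    if length == 1:
        return [profile[0]]
    step = (len(profile) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * step
        i = min(int(pos), len(profile) - 2)
        frac = pos - i
        out.append(profile[i] * (1 - frac) + profile[i + 1] * frac)
    return out


raw_lane = [10, 30, 20]  # hypothetical raw densities along one lane
record = to_fixed_length(normalize_density(raw_lane), 5)
print(record)
```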

Training examples are constructed from the results generated by processing the Western blot data, in the same process as that described above (step Y). To minimize the recognized problems of dependence on starting weights, redundancy among interdependent variables, and desensitization resulting from overtraining a network, it is useful to train a set of neural networks (a consensus) on the data using the partitioning method discussed earlier.

From a sensitivity analysis of the training runs on the processed Western blot data, the regions of molecular weight (MW) that contribute significantly can be determined and identified (step AA). As part of the isolation step, inputs in adjacent regions are preferably combined into "bins" as long as the sign of the correlation between the input and the desired output is the same. This process reduces the typical 100-plus inputs produced by a Western blot, plus the other inputs, to a much more manageable number of fewer than about 20 inputs.
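The binning step can be illustrated with a short Python sketch that merges adjacent inputs whose correlation with the desired output has the same sign. The `threshold` parameter, which drops inputs whose correlation is close to zero, and the use of the mean as the bin value are assumptions of this sketch rather than details given in the text.

```python
from itertools import groupby


def bin_by_sign(correlations, threshold=0.0):
    """Return (start, end_exclusive, sign) for each run of adjacent same-sign inputs."""
    bins = []
    index = 0
    sign_of = lambda c: (c > threshold) - (c < -threshold)  # +1, -1, or 0
    for sign, run in groupby(correlations, key=sign_of):
        width = len(list(run))
        if sign != 0:  # inputs with negligible correlation are left out
            bins.append((index, index + width, sign))
        index += width
    return bins


def apply_bins(vector, bins):
    """Reduce an input vector to one value per bin (here: the mean of the bin)."""
    return [sum(vector[a:b]) / (b - a) for a, b, _ in bins]


# Hypothetical per-input correlations with the desired output.
corr = [0.4, 0.3, -0.2, -0.5, 0.0, 0.6]
bins = bin_by_sign(corr, threshold=0.1)
reduced = apply_bins([1, 3, 2, 4, 9, 5], bins)
print(bins, reduced)
```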

In a particular embodiment, multiple ranges of molecular weight are found to correlate with the desired output indicating the condition being diagnosed. The correlation may be positive or negative. A reduced input representation is generated using a Gaussian region centered on each of the peaks found during Western blot training. The standard deviation is chosen so that the value of the Gaussian is 0.5 or less at the edges of the region.
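As a worked illustration of the standard deviation condition, assume a Gaussian of unit peak height centered at μ and a region extending a half-width w on either side of the peak (the symbols g, μ, w, and σ are introduced here for illustration only):

```latex
g(x) = \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad
g(\mu \pm w) \le \tfrac{1}{2}
\;\Longleftrightarrow\;
\frac{w^2}{2\sigma^2} \ge \ln 2
\;\Longleftrightarrow\;
\sigma \le \frac{w}{\sqrt{2\ln 2}} \approx 0.849\,w .
```

Under these assumptions, choosing σ equal to w/√(2 ln 2) makes the Gaussian exactly 0.5 at the edges of the region, and any smaller σ makes it less than 0.5 there.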

In a particular embodiment, the basic operation for generating a neural network input is to perform a convolution between the Gaussian and the Western blot image, using the log of the molecular weight for the calculation.
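A minimal Python sketch of such a Gaussian-weighted input is given below. It evaluates the convolution at a single point, the peak center, on a log molecular weight axis, and normalizes by the sum of the weights; the normalization, the function names, and the sample values are assumptions of this sketch.

```python
import math


def sigma_for_half_height(half_width):
    """Standard deviation for which a unit-height Gaussian equals 0.5 at the region edge."""
    return half_width / math.sqrt(2 * math.log(2))


def gaussian_input(log_mw, density, center, half_width):
    """Gaussian-weighted average of the density profile around `center` (log MW units)."""
    sigma = sigma_for_half_height(half_width)
    weights = [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in log_mw]
    return sum(w * d for w, d in zip(weights, density)) / sum(weights)


# Hypothetical profile: normalized densities sampled on a log10(MW) axis.
log_mw = [1.3 + 0.05 * i for i in range(21)]          # about 20 kDa to 200 kDa
density = [0.1] * 8 + [0.9, 1.0, 0.9] + [0.1] * 10    # one dark band near index 9
print(gaussian_input(log_mw, density, center=log_mw[9], half_width=0.1))
```

One such value would be computed for each peak region, giving one network input per region.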

The data can be tested using the holdout method, as described above. For example, five partitions may be used, where in each partition 80% of the data are used for training and 20% for testing. The data are shuffled so that each partition is likely to contain examples from each gel.

After the molecular weight regions important for diagnosis have been identified (step AA), one or more tests for the selected region or regions of molecular weight are built (step AB). An ELISA biochemical test is one example. The selected region or regions of molecular weight identified as important for diagnosis are then physically identified and used as components of the ELISA biochemical test. Regions with the same correlation sign may or may not be combined into a single ELISA test, but regions with different correlation signs must not be combined into a single test. The value of such a biochemical test is then determined by comparing the biochemical test results with the known or suspected medical condition.

In this example, the development of the biochemical diagnostic test is enhanced by combining patient data with biochemical data in the process shown in FIG. 2. Under these conditions, the patient history diagnostic test is the basis of the biochemical diagnostic test. As described herein, the variables identified as important variables are combined with data derived from the Western blot data to train a set of neural networks to be used to identify the molecular weight regions important for diagnosis.

Referring to FIG. 2, Western blot data are used as the source (step W) and preprocessed for use by the neural network as described above (step X). Training examples are constructed, in a process similar to that described above, by combining the important variables from the patient history data with the results generated by processing the Western blot data, and training is performed on the combined data (step Y). In parallel, networks are trained on the patient history data as described above (step Z).

To minimize the recognized problems of dependence on starting weights, redundancy among interdependent variables, and desensitization resulting from overtraining a network, it was found preferable to train a set of neural networks (a consensus set) on the data using the partitioning method.

From a sensitivity analysis of the training runs based on patient history data alone, the regions of molecular weight that contribute significantly can be determined and identified, as described above (step AA). As a further step in the isolation process, a set of networks is then trained using the combined patient history and bin information as inputs in order to isolate the important bins for the Western blot data. The "important bins" represent the important regions of molecular weight relevant to the diagnosis, taking into account the contribution of the patient history information. These bins correlate either positively or negatively with the desired output of the diagnosis.

After the molecular weight regions important for diagnosis have been identified (step AA), one or more tests for the selected region or regions are built and validated as described above (step AB). The designed ELISA tests are then produced and used to generate ELISA data for each patient in the database (step AC). Using the ELISA data and the important patient history data as inputs, a set of networks is trained using the partitioning approach described above (step AE). The partitioning approach yields a lower-bound estimate of the performance of the biochemical test. Final training (step AE) of the set of networks, i.e., the networks to be used as the deliverable product, is performed using all available data as part of the training data. If necessary, new data can be used to validate the performance of the diagnostic test (step AF). The performance on all of the training data is an upper bound on the performance estimate of the biochemical test. The consensus of the networks represents the intended diagnostic test output (AG). This final set of neural networks can then be used for diagnosis.

4. Improvement of neural network performance.
An important feature of the decision support systems, exemplified here by neural networks, and of the methods provided herein is the ability to improve performance. The training method outlined above can be repeated as more information becomes available. During operation, all input and output variables are recorded and augment the training data in future training sessions. In this way, the diagnostic neural network can adapt to individual populations and to gradual changes in population characteristics.

If the trained neural network is contained in a device that allows the user to enter the required information and that outputs the neural network score to the user, the process of improving performance through use can be automated. Each entry and its corresponding output are retained in memory. Because the steps for retraining the network can be coded into the device, the network can be retrained at any time using data specific to the population.

5. Method for evaluating the effectiveness of diagnostic tests and treatment methods.
In general, the effectiveness or usefulness of a diagnostic test is determined by comparing the diagnostic test results with the known or suspected patient medical condition. A diagnostic test is considered effective if there is a good correlation between the diagnostic test results and the patient medical condition. The better the correlation between the diagnostic test results and the patient medical condition, the higher the rating placed on the effectiveness of the diagnostic test. In the absence of such a correlation, a diagnostic test is considered to be of little effectiveness. The systems provided herein provide a means of assessing the effectiveness of a biochemical test by determining whether the variable corresponding to that test is an important selected variable. Tests that yield data that improve the performance of the system are thereby identified.

A method by which the effectiveness of a diagnostic test can be determined independently of the correlation between the diagnostic test results and the patient medical condition (FIG. 6) is described below. A similar method can be used to assess the effectiveness of a particular treatment.

In one embodiment, the method compares the performance of a patient history diagnostic neural network trained on patient data alone with the performance of a combined neural network trained on the combination of patient history data and biochemical test data, such as ELISA data. The patient history data are used to isolate the important variables for the diagnosis (step AH) and to train the final neural networks (step AJ), all as described above. In parallel, biochemical test results are provided for all, or a subset, of the patients whose patient data are known (step AK), and a diagnostic neural network is trained on the combined patient and biochemical data by first isolating the important variables for the diagnosis (step AL) and then training the final neural networks (step AM), all as described above.

Then, in step AN, the performance of the patient history diagnostic neural network obtained from step AJ is compared with the performance of the combined diagnostic neural network obtained from step AM. The performance of a diagnostic neural network can be measured by any number of means. In one example, the correlations between each diagnostic neural network output and the known or suspected medical condition of the patients are compared, and performance is measured as a function of this correlation. There are many other ways to measure performance. In this example, the improvement in the performance of the combined diagnostic neural network obtained from step AM over that obtained from step AJ is used as a measure of the effectiveness of the biochemical test.
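As one possible illustration of the comparison in step AN, the following Python sketch measures performance as the Pearson correlation between network output and medical condition and reports the gain of the combined network over the patient-history network. The choice of Pearson correlation, the function names, and the sample outputs are assumptions of this sketch; the text leaves the performance measure open.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def effectiveness_gain(history_output, combined_output, condition):
    """Improvement in correlation when biochemical data are added to patient history."""
    return pearson(combined_output, condition) - pearson(history_output, condition)


# Hypothetical values: 1 = condition present, 0 = absent.
condition = [1, 0, 1, 0]
history_output = [0.6, 0.5, 0.5, 0.4]    # network trained on patient history only
combined_output = [0.8, 0.3, 0.7, 0.2]   # network trained on history plus test data
gain = effectiveness_gain(history_output, combined_output, condition)
print(gain)
```

A positive gain would suggest that the biochemical test contributes information beyond the patient history, even if the test alone correlates poorly with the condition.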

A biochemical test in this example, or a diagnostic test in general, that lacks sufficient correlation between its test results and the known or suspected medical condition is ordinarily considered to be of limited utility. Such a test can be shown by the method described above to have some use, which increases the effectiveness of a test that might otherwise be considered uninformative. The methods described herein thus serve two purposes: they provide a means of assessing the usefulness of a diagnostic test, and they provide a means of enhancing the effectiveness of a diagnostic test.

6. Application of these methods to the identification of variables for diagnosis and the development of diagnostic tests.
The methods and networks provided herein provide a means, for example, to identify important variables, improve existing biochemical tests, develop new tests, assess the course of therapy, and identify new disease markers. To illustrate these advantages, the methods provided have been applied to endometriosis and to pregnancy-related events, such as the likelihood of labor and delivery during a particular period.

Endometriosis.
The methods described herein provide a means of developing a non-invasive method for the diagnosis of endometriosis. In addition, the methods provide a means of developing biochemical tests that yield data indicative of endometriosis and of identifying and developing new biochemical tests.

The methods of variable selection and use of decision support systems have been applied to endometriosis. A decision support system, in this example a consensus of neural networks, has been developed for the diagnosis of endometriosis. In the course of this development, detailed in the Examples, it was found that neural networks can be developed that aid in the diagnosis of endometriosis using only patient history data, i.e., data obtained from the patient via a questionnaire format. It was found that biochemical test data can be used to enhance the performance of a particular network but are not essential to its value as a diagnostic tool. The variable selection protocol and neural nets provide a means of selecting a set of variables that can be input to a decision support system to provide a means of diagnosing endometriosis. Some of the identified variables are variables traditionally associated with endometriosis, while others are not. Moreover, as noted above, variables associated with endometriosis, such as pelvic pain and dysmenorrhea, do not correlate linearly with it in a way that would permit diagnosis.

Exemplary decision support systems are described in the Examples. For example, one neural net, designated herein as pat07, is described in Example 14. Comparison of the pat07 network output with the probability of having endometriosis yields a positive correlation (see Table 1). The pat07 network can predict the likelihood that a woman has endometriosis based on her pat07 score. For example, a woman with a pat07 score of 0.6 has a 90% probability of having endometriosis; with a pat07 score of 0.4, she has a 10% probability of having endometriosis. The dynamic range of the pat07 output when applied to the database was about 0.3 to about 0.7. In theory the output value can range from 0 to 1, but values below 0.3 or above 0.7 were not observed. More than 800 women have been evaluated using the pat07 network, and its performance can be summarized as follows.

The pat07 network score is interpreted as the likelihood of having endometriosis, not as whether a woman is diagnosed as having endometriosis. The likelihood is based on the relative incidence of endometriosis found in each score group. For example, in the group of women with a pat07 network score of 0.6 or greater, 90% of the women have endometriosis and 10% do not. This likelihood relates to the population of women attending an infertility clinic. Software programs containing the pat07 network have been developed.
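The interpretation of a score as a likelihood based on the relative incidence in each score group can be sketched in Python as follows. The group boundaries, the function name, and the sample scores and diagnoses are made up for illustration and do not reproduce the pat07 results.

```python
from bisect import bisect_right


def incidence_by_score_group(scores, diagnosed, edges):
    """Fraction of confirmed cases in each score group delimited by ascending `edges`."""
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [total, positive] per group
    for score, positive in zip(scores, diagnosed):
        group = bisect_right(edges, score)  # a score equal to an edge joins the upper group
        counts[group][0] += 1
        counts[group][1] += int(positive)
    return [pos / total if total else None for total, pos in counts]


# Made-up network scores and confirmed diagnoses (1 = condition present).
scores = [0.35, 0.38, 0.45, 0.55, 0.62, 0.65, 0.60]
diagnosed = [0, 0, 1, 0, 1, 1, 1]
incidence = incidence_by_score_group(scores, diagnosed, edges=[0.4, 0.6])
print(incidence)  # groups: below 0.4, 0.4 up to 0.6, and 0.6 or greater
```

The resulting fractions apply only to the population from which the scores were drawn, which is why the likelihood is tied to a particular clinic population.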

One program, called adezacrf.exe, provides a single-screen window interface that allows the user to obtain a woman's pat07 network score. The user enters values for all 14 variables, and the pat07 network score is recalculated after every keystroke. Another program, called adzcrf2.exe, is almost exactly the same as adezacrf.exe but accepts one additional input, namely the value of an ELISA test. This program and network are a specific example of a method for extending the clinical utility of a diagnostic test. The ELISA test results did not correlate with endometriosis; by itself, the ELISA test has no clinical utility. As an additional input parameter, however, the ELISA test improved network performance, so that incorporating the ELISA result as an input for network analysis extends the clinical utility of the ELISA test. Another program, called adzcrf2.exe (described in Appendix II herein), provides a multi-screen window interface that allows the user to obtain a woman's pat07 network score. The multiple data entry screens guide the user to enter all of the patient history data, not only the parameters required as inputs for pat07. The pat07 score is calculated after the user has entered all of the data and accepted it as correct. This program can also save the entered data in *.fdb files, import data, calculate pat07 scores on imported data, and export data. The user can edit previously entered data. All three of the above programs serve as specific examples of diagnostic software for endometriosis.

FIG. 11 shows an exemplary interface screen used in the diagnostic software. Display 1100, provided as a Microsoft Windows™-type display, provides a template for entering a numerical value for each of the important variables determined for the diagnosis of endometriosis. Data for performing the test are entered using a conventional keyboard alone or in combination with a computer mouse, trackball, or joystick. For purposes of this description, a combination of mouse and keyboard is used. Text boxes 1101-1106 are for entering numerical values representing the following important variables: age (box 1101), number of pregnancies (box 1102), number of births (box 1103), number of miscarriages (box 1104), packs of cigarettes smoked per day (box 1105), and ELISA test result (box 1106). To enter the age of the patient, the user moves the mouse so that the on-screen pointer is inside box 1101, clicks at that position, and then uses the keyboard to enter a number representing the patient's age. The remaining boxes are accessed by pointing to the selected box and clicking.

Boxes 1107-1115 are the important selected variables for which the data are binary, i.e., either "yes" or "no". The boxes and variables correspond as follows:

[Table 1]
――――――――――――――――――――――――――――――
Box    Variable
1107   Past history of endometriosis
1108   Dysmenorrhea
1109   Hypertension during pregnancy
1110   Pelvic pain
1111   Abnormal PAP/dysplasia
1112   History of pelvic surgery
1113   Medication history
1114   Genital warts
1115   Diabetes
――――――――――――――――――――――――――――――

A "yes" for any of these variables is indicated by pointing to the corresponding box and clicking the mouse button, which displays an "X" in the box.

The network automatically processes the data after every keystroke, so the output values displayed in text boxes 1118-1120 change after every entry into template 1100. Text box 1118, labeled "Endo", gives the consensus network output for the presence of endometriosis. Text box 1119, labeled "No Endo", gives the consensus network output for the absence of endometriosis. Text box 1120 gives a relative score indicating whether the patient has endometriosis. Note that the score in text box 1120 is an artificial number derived from boxes 1118 and 1119 to make it easier for the physician to interpret the results. As noted above, values in this box in the positive range, up to 25, indicate endometriosis, and values in the negative range, down to -25, indicate no endometriosis. The selected transformation allows the physician to interpret the pat07 output more easily.

As described in the Examples, pat07 is not the only network that predicts endometriosis. Other networks, designated pat08 through pat23a, have been developed, and these also predict endometriosis. All of these networks operate in much the same way and can readily be used in place of pat07. Thus, by following the method used to develop pat07, other similarly functioning neural networks can be, and have been, developed. pat08 and pat09 are the most similar to pat07. These networks were developed according to the protocol outlined above and were allowed to select the important variables from the same set used for the development of pat07.

It was found that the initial weighting of the variables affects the outcome of the variable selection procedure but not the final diagnostic result. pat08 and pat09 used the same database of patient data as pat07 to derive the disease-related parameters. pat10 through pat23a were training runs originally designed to elucidate the importance of certain parameters: history of endometriosis, history of pelvic surgery, history of dysmenorrhea, and pelvic pain. To develop these, the importance of a variable was assessed by withholding that variable from the variable selection process. It was found that, after the variable selection process had been run and the final consensus networks trained, network performance was not significantly reduced.

Thus, although a particular variable or set of variables had been thought to be important in predicting endometriosis, networks trained in the absence of such variables did not show a significantly reduced ability to predict endometriosis. These results demonstrate (1) the effectiveness of the methods for variable selection and consensus network training, and (2) the adaptability of the networks in general. In the absence of one type of data, the network found other variables from which to derive that information. In the absence of one variable, the network selected different variables in its place and maintained performance.

Patients suspected of having endometriosis generally must undergo diagnostic surgery for the disease to be diagnosed. The ability to diagnose this disorder reliably using patient history information and, optionally, biochemical test data such as Western blot data provides a highly desirable alternative to surgery. The methods and identified variables of the present invention provide a means to do so.

Data relevant to the diagnosis of endometriosis have been collected. These data include patient history data, Western blot data, and ELISA data. Application of the methods of the present invention, as shown in the Examples, demonstrated that patient history data alone can predict endometriosis.

To evaluate the performance of the variable selection protocol, and to determine how the 14-variable network (pat07) ranks in performance against all possible combinations of the 14 variables, networks were trained on every possible combination of the variables (16,384 combinations). The variable selection protocol was also applied to the set of 14 variables, and it selected five of them: pregnancy hypertension, number of births, abnormal PAP/dysplasia, history of endometriosis, and history of pelvic surgery. This combination ranked as the 68th best-performing of the 16,384 possible combinations (99.6th percentile), demonstrating the effectiveness of the variable selection protocol. The combination containing all 14 variables ranked 718th of the 16,384 possible combinations (95.6th percentile).
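The figures quoted above can be checked with a few lines of arithmetic. The following Python sketch is offered for illustration only; the function name `percentile_of_rank` is an assumption introduced here, and the percentile is taken to be the share of combinations that rank below the given one, which is consistent with the values reported.

```python
def percentile_of_rank(rank: int, total: int) -> float:
    """Percentage of combinations that rank worse than the given rank (1 = best)."""
    return 100.0 * (total - rank) / total


n_variables = 14
# Each variable is either included or excluded, giving 2**14 subsets
# (this count includes the empty subset).
total_combinations = 2 ** n_variables

selected_five = percentile_of_rank(68, total_combinations)
all_fourteen = percentile_of_rank(718, total_combinations)

print(total_combinations)        # number of possible combinations
print(round(selected_five, 1))   # percentile of the five selected variables
print(round(all_fourteen, 1))    # percentile of the full 14-variable set
```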

These results also show that subsets of the 14 variables are useful. In particular, any subset of the selected set of parameters, particularly of the set of 14 variables, that contains one (or more) of the following combinations of three variables can be used with a decision support system for the diagnosis of endometriosis.

[Table 2]
――――――――――――――――――――――――――――――――――――
a) number of births, history of endometriosis, history of pelvic surgery
b) diabetes, pregnancy hypertension, smoking
c) pregnancy hypertension, abnormal PAP smear/dysplasia, history of endometriosis
d) age, smoking, history of endometriosis
e) smoking, history of endometriosis, dysmenorrhea
f) age, diabetes, history of endometriosis
g) pregnancy hypertension, number of births, history of endometriosis
h) smoking, number of births, history of endometriosis
i) pregnancy hypertension, history of endometriosis, history of pelvic surgery
j) number of pregnancies, history of endometriosis, history of pelvic surgery
k) number of births, abnormal PAP smear/dysplasia, history of endometriosis
l) number of births, abnormal PAP smear/dysplasia, dysmenorrhea
m) history of endometriosis, history of pelvic surgery, dysmenorrhea
n) number of pregnancies, history of endometriosis, dysmenorrhea.
――――――――――――――――――――――――――――――――――――

As shown in the Examples, other sets of important selected variables that perform as well as the 14 listed variables can be obtained. Other, smaller subsets thereof can also be identified.

Prediction of pregnancy-related events, such as the likelihood of delivery within a particular time period.
The methods of the present invention can be applied to any disorder or condition, and are particularly suited to conditions for which no diagnostic test can be adequately correlated or for which no biochemical test, or no convenient biochemical test, is available. For example, the methods of the present invention have been applied to the prediction of pregnancy-related events, such as the likelihood of delivery within a particular time period.

Determining impending delivery is important, for example, for increasing neonatal survival of infants born by 34 weeks. The presence of fetal fibronectin in secretion samples from the vaginal cavity or cervical canal of a pregnant patient after week 20 of pregnancy is associated with a risk of labor and delivery before 34 weeks. Methods and devices for screening for fetal fibronectin in secretion samples from the vaginal cavity or cervical canal of a pregnant patient after week 20 of pregnancy are commercially available (see U.S. Pat. Nos. 5,516,702, 5,468,619, 5,281,522, and 5,096,830; see also U.S. Pat. Nos. 5,236,846, 5,223,440, and 5,185,270).

The correlation between the presence of fetal fibronectin in these secretions and labor and delivery before 34 weeks is not perfect; there are significant false-positive and false-negative rates. Accordingly, to address the need for methods of assessing the likelihood of labor and delivery before 34 weeks and to improve the predictive power of the available tests, the methods of the present invention have been applied to the development of decision support systems that assess the likelihood of certain pregnancy-related events. In particular, neural networks that predict delivery before (or after) 34 weeks of gestation have been developed. The neural networks and other decision support systems developed as described herein can improve the performance of the fetal fibronectin (fFN) test by reducing the number of false positives. The results shown in Example 13 demonstrate that, because predictive performance is improved, the methods of the present invention can improve the diagnostic utility of existing tests.

As noted above, these methods can be used to identify tests not previously thought to be relevant to a disease, condition, or disorder, to design new tests, and to identify new disease markers.

The following examples are included for illustrative purposes only and are not intended to limit the scope of the invention.

<Example 1>
Evaluation of patient history data for relevant variables.
This example describes the selection of candidate variables.

Requirements.
The patient histories are evaluated to determine which variables are relevant to the diagnosis. This is done by performing a sensitivity analysis on each of the variables to be used in the diagnosis. Two methods can be used for this analysis. The first is to train a network on all of the information and determine, from the network weights, the influence that each input has on the network output. The second is to compare the performance of two networks, one trained with the variable included and a second trained with the variable excluded. This training would be performed for each of the variables thought to be relevant, and those that do not contribute to performance would be eliminated. These operations are performed to reduce the dimension of the input to the network. When training with a limited amount of data, a lower input dimension increases the generalization capability of the network.

Data analysis.
The data used in this example comprised 510 patient histories. Each record contains 120 text and numeric fields. Of these fields, 45 were identified as being known before surgery and always containing information. These fields were used as the basic variables available for analysis and training of the networks. A summary of the variables used in this example follows.

Methods used.
The most commonly used method for determining the importance of variables is to train a neural network on data containing all of the variables. Using the trained network as a basis, a sensitivity analysis is performed on the network and the training data. For each training example, the network is run in forward mode (no training) and the network output is recorded. Then, for each input variable, the network is rerun with that variable replaced by its average value over the training examples. The difference in output values is squared and accumulated. This process is repeated for each training example. The resulting sums are then normalized so that the normalized values sum to the number of variables. Thus, if all variables contribute equally to the output, each normalized value is 1.0. The normalized values can then be ranked in order of importance.
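By way of illustration only, the sensitivity analysis described above may be sketched in Python as follows. The names `sensitivity_scores` and `toy_network`, the toy weights, and the four toy training examples are assumptions introduced for this sketch; `toy_network` merely stands in for a trained network and is not a network of the present disclosure.

```python
import math
from typing import Callable, List, Sequence


def sensitivity_scores(predict: Callable[[Sequence[float]], float],
                       examples: Sequence[Sequence[float]]) -> List[float]:
    """Normalized sensitivity of the output to each input variable."""
    n_vars = len(examples[0])
    # Average value of each variable over the training examples.
    means = [sum(row[j] for row in examples) / len(examples) for j in range(n_vars)]
    totals = [0.0] * n_vars
    for row in examples:
        baseline = predict(row)                 # forward pass only, no training
        for j in range(n_vars):
            perturbed = list(row)
            perturbed[j] = means[j]             # replace variable j by its mean
            totals[j] += (predict(perturbed) - baseline) ** 2
    grand_total = sum(totals)
    if grand_total == 0.0:
        raise ValueError("no variable changes the output")
    # Scale so that the normalized values sum to the number of variables.
    return [t * n_vars / grand_total for t in totals]


def toy_network(row: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained network; the third input is ignored."""
    z = 2.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]
    return 1.0 / (1.0 + math.exp(-z))


examples = [[0.0, 1.0, 5.0], [1.0, 0.0, 3.0], [1.0, 1.0, 4.0], [0.0, 0.0, 8.0]]
scores = sensitivity_scores(toy_network, examples)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

In this sketch a variable that never changes the output receives a score of zero, and a score of 1.0 corresponds to an average contribution.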

There are several problems with this approach. First, it depends on the particular neural network solution that was found; different starting weights can yield a different ranking. Second, if two variables are highly correlated, either one carries sufficient information, and depending on the training run only one of them may be identified as important. Third, an overtrained network can distort the true importance of the variables.

To minimize the effects of these problems, several networks were trained on the data. The training process was refined to produce the best possible test-set performance, so that the networks were learning the underlying relationship between the inputs and the desired outputs. By the end of this process, a good set of networks is available and the training configuration for the final trained networks has been established. A sensitivity analysis was performed on each trained network, and the normalized values were averaged. In this example, the training run comprised 15 networks trained on five partitions of the available data using the holdout method.

After the ranking of the variables had been established, test runs were made to determine the effect of eliminating variables on test-set performance. Eliminating a variable, even one with a small contribution, ordinarily reduces test-set performance. When overtraining is a problem because the training data are limited, however, eliminating variables can actually improve test-set performance. To save processing time, groups of variables can be eliminated together in tests based on the ranking.

Results.
The ranking of the variables is as follows; it is reported for the networks trained in the pat05 run.

[Table 3]
――――――――――――――――――――――――――――――
Rank   Variable no.   Variable
01     35             Medication history
02     33             Past history of Endo
03     11             Number of births
04     37             Pelvic pain
05     40             Dysmenorrhea
06     34             History of pelvic surgery
07     1              Age (preproc)
08     13             History of infertility
09     8              Packs/day
10     36             Current exogenous hormones
11     42             Infertility
12     18             Induced hormone
13     15             Anovulation
14     14             Ovulation
15     43             Adnexal mass/thickening
16     45             Other symptoms
17     30             Abnormal PAP/dysplasia
18     26             Ectopic pregnancy
19     19             Herpes
20     39             Menstrual abnormalities
21     12             Number of miscarriages
22     41             Dyspareunia
23     24             Uterine/tubal anomalies
24     31             Gynecological cancer
25     32             Other history
26     10             Number of pregnancies
27     28             Ovarian cysts
28     25             Fibroids
29     22             Vaginal infections
30     16             Unknown
31     27             Dysfunctional uterine bleeding
32     38             Abnormal pain
33     5              Pregnancy hypertension
34     9              Drug use
35     20             Genital warts
36     3              Pregnancy DM
37     4              Hypertension
38     21             Other STD
39     23             PID
40     44             Undetermined
41     2              Diabetes
42     17             Oligoovulation
43     6              Autoimmune disease
44     29             Polycystic ovary syndrome
45     7              Transplant
――――――――――――――――――――――――――――――

Subsets of the variables were tested, and a final set of 14 variables was used to train the pat07 networks (see Examples 13 and 14). Some variables not among the top 14 above were also used, which improved test-set performance.
The ranking for the pat07 networks is as follows.

[Table 4]
――――――――――――――――――――――――――――――
Rank   Variable no.   Variable
01     10             Past history of Endo
02     6              Number of births
03     14             Dysmenorrhea
04     1              Age (preproc)
05     13             Pelvic pain
06     11             History of pelvic surgery
07     4              Packs/day
08     12             Medication history
09     5              Number of pregnancies
10     7              Number of miscarriages
11     9              Abnormal PAP/dysplasia
12     3              Pregnancy hypertension
13     8              Genital warts
14     2              Diabetes
――――――――――――――――――――――――――――――

Conclusion.
The set of variables identified in this example is considered reasonable on the basis of the tests and the available information.

<Example 2>
Training of networks on patient history data.
This example shows how the various parameters are set and optimized using the 14 variables above.

Requirements.
Once the preceding example is complete, a set of networks is trained on the reduced patient histories and their performance is recorded. Experiments were run to determine the best configuration and parameters for training the networks. A performance analysis was carried out to determine the numbers of false positives and false negatives and to see whether a given subset of patients could be diagnosed reliably. Because the data are limited, performance was estimated by holding out a small portion (25%) of the database for testing and training on the remaining data. This was repeated until all of the data had been used as test data in one of the networks. The combined results on the test data then give the performance estimate. The final networks were trained using all of the available data as training data.

Methods used.
When only a small number of training examples is available, the holdout method is effective for providing test information useful in determining network configuration and parameter settings. To maximize the data available for training without greatly increasing processing time, a 20% holdout was used instead of the proposed 25%. This gave five partitions of the data instead of four, with 80% of the data used for training in each partition.
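The 20% holdout scheme described above may be sketched as follows, for illustration only. The function name `holdout_partitions`, the random shuffle, and the fixed seed are assumptions of this sketch; the disclosure does not state how the records were assigned to partitions.

```python
import random
from typing import List, Tuple


def holdout_partitions(n_records: int, n_partitions: int,
                       seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (training indices, test indices) for each holdout partition."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)        # assumed: random assignment
    # Deal the shuffled records into n_partitions disjoint test sets.
    folds = [indices[k::n_partitions] for k in range(n_partitions)]
    splits = []
    for k, test in enumerate(folds):
        # Train on every record that is not held out in this partition.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


# 510 patient histories, five partitions: 20% held out, 80% used for training.
splits = holdout_partitions(510, 5)
for train, test in splits:
    print(len(train), len(test))
```

Every record appears in exactly one test set, so the combined test results cover the whole database once.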

To minimize the effect of the random starting weights, multiple networks were trained in the full training runs. In these runs, three networks were trained on each of the five partitions of the data, each from a different random start. The outputs of the networks are averaged to form a consensus result that has a lower variance than can be obtained from a single network.
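The averaging step described above may be sketched as follows. This is a minimal illustration; the function name `consensus` and the output values are made up for the sketch and are not results from the disclosure.

```python
from statistics import mean
from typing import List, Sequence


def consensus(outputs_per_network: Sequence[Sequence[float]]) -> List[float]:
    """Average the outputs of several networks, patient by patient.

    outputs_per_network[k][i] is the output of network k for patient i.
    """
    return [mean(column) for column in zip(*outputs_per_network)]


# Three networks trained from different random starts (illustrative values only).
outputs = [
    [0.62, 0.41, 0.55],
    [0.58, 0.37, 0.49],
    [0.66, 0.45, 0.52],
]
consensus_scores = consensus(outputs)
print(consensus_scores)
```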

Several experiments were carried out to find network parameters that maximize test-set performance. The parameters modified in this process were as follows:

[Table 5]
――――――――――――――――――――――――――――――
1. Number of hidden processing elements
2. Amount of noise added to the inputs
3. Amount of error tolerance
4. Learning algorithm used
5. Amount of weight decay used
6. Number of input variables used
――――――――――――――――――――――――――――――

An exhaustive search of all possible combinations of the 45 variables is not practical because of the amount of CPU time required for the tests. Test networks were trained with parameters chosen on the basis of parameters known to those skilled in the art to be important in this area and on the basis of the results of prior tests. Other sets of variables would also be suitable. In addition, as shown elsewhere herein, all combinations of the 14 selected variables were tested. After the best configuration had been determined, a final set of networks was trained on the complete data set of 510 patients. In the final set of networks, a consensus of eight networks was formed to generate the final statistics.

Results.
The final holdout training run was pat06, with 14 variables. Performance on the test data was 68.23%. The full training run was pat07, with the same network configuration as pat06. Performance on the training data was 72.9%. Statistics for the final training run were generated on the basis of a cutoff applied to the network output values: if the network output was at or below the cutoff, the example was not considered. The following table summarizes the results for the consensus of the eight networks in pat07. A test program called adzcrf was produced to demonstrate this final training.

PPV = positive predictive value; NPV = negative predictive value
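For reference, PPV is the fraction of positive calls that are correct and NPV is the fraction of negative calls that are correct. The Python sketch below shows one way such statistics could be computed with a cutoff. It rests on an assumption that is not stated in the disclosure: the cutoff is treated as a symmetric band around an output of 0.5, and examples falling inside the band are left unclassified. The function name `ppv_npv` and all values are illustrative and are not results from the disclosure.

```python
from typing import Optional, Sequence, Tuple


def ppv_npv(outputs: Sequence[float], has_disease: Sequence[bool],
            cutoff: float) -> Tuple[Optional[float], Optional[float], int]:
    """Return (PPV, NPV, number of classified examples) for a given cutoff."""
    tp = fp = tn = fn = 0
    for score, truth in zip(outputs, has_disease):
        if score >= 0.5 + cutoff:               # assumed rule: called positive
            if truth:
                tp += 1
            else:
                fp += 1
        elif score <= 0.5 - cutoff:             # assumed rule: called negative
            if truth:
                fn += 1
            else:
                tn += 1
        # Outputs inside the band are not considered.
    ppv = tp / (tp + fp) if tp + fp else None
    npv = tn / (tn + fn) if tn + fn else None
    return ppv, npv, tp + fp + tn + fn


# Illustrative values only.
outputs = [0.68, 0.61, 0.57, 0.52, 0.47, 0.41, 0.36, 0.33]
truth = [True, True, False, True, False, True, False, False]
ppv, npv, n_classified = ppv_npv(outputs, truth, cutoff=0.05)
print(ppv, npv, n_classified)
```

Raising the cutoff in this sketch classifies fewer examples, which is the trade-off a cutoff-based summary is meant to expose.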

<Example 3>
Preprocessing and input of Western blot data.

Requirements.
The antigen data from the Western blots for the patients, as initially sent to Logical Designs, provided information only on the peak molecular weights and their associated intensities. Analysis of these data, and of the original images from which they were taken, indicated that the original images could be digitized and used so that more information could be provided to the neural networks. Examination of the original images for two experiments showed that preprocessing the image data would reduce the variability in the position of a particular molecular weight in the images. This preprocessing would use a polynomial fitted to the standards image to produce a corrected image. Preprocessing of the images would also include steps to normalize the background level and contrast of the images.

Once preprocessing is complete, the image data can be used as is, or peak molecular weights can be extracted. Inputs to the neural networks are generated from the resulting images. Because a typical image is about 1000 pixels long, methods of reducing the number of inputs would be investigated. With the image encoded directly into network inputs, using either the full image or an image of reduced dimension (resolution), the neural network is trained with supervised learning, which helps to determine the molecular weight ranges that are relevant to determining the disease. This example focuses on using the image as a whole as input to the network.

使用した方法．
相関技術を使用して、ウェスタンブロットの画像についての同様の特徴を突き合わせて、相関プロットを生成した。これらのプロットから、サンプルを正確に整合するには、二つのサンプルの相関プロットについての突合せの変動が大きすぎると結論付けられた。ネットワークの各入力は分子量値を正確に表現する必要があるので、標準画像からの情報のみを画像の整合に使用することに決定した。 The method used.
Correlation techniques were used to match similar features in the Western blot images and to generate correlation plots. From these plots it was concluded that the matches varied too much between the correlation plots of two samples for the samples to be aligned accurately. Because each network input must accurately represent a molecular weight value, it was decided to use only information from the standards image to align the images.

標準画像について二次適合を実施し、相対移動性情報を分子量に翻訳する手段を生成する。相対移動性の曲線を分子量の対数に対してプロットし、ＲＳＱＲ値を検査した後で、二次適合はこの翻訳を実施するのに十分に正確ではなかったと結論付けられた。二次適合を使用して標準分子について計算した分子量は、ゲルごとに様々である。 A second order fit is performed on the standard image to generate a means to translate relative mobility information into molecular weight. After plotting the relative mobility curve against the logarithm of molecular weight and examining the RSQR value, it was concluded that the quadratic fit was not accurate enough to perform this translation. The molecular weight calculated for the standard molecule using a quadratic fit varies from gel to gel.

相対移動性の分子量への翻訳を改善するためにいくつかの方法を試みた。三次スプライン補間法を選択した。この方法は、データ点でのなめらかな移行を保証し、迅速に計算される。唯一重要なのは、標準がカバーする区間の外側にある相対移動性の値について、この方法がどのように実施されるかということである。終了条件が適当に設定されていれば、補外法の問題は回避されるものと考えられる。これが選択した方法である。 Several methods were tried to improve the translation of relative mobility into molecular weight. Cubic spline interpolation was selected. This method guarantees a smooth transition at the data points and is quick to compute. The only concern is how the method behaves for relative mobility values that fall outside the interval covered by the standards. If the end conditions are set appropriately, extrapolation problems are expected to be avoided. This is the method that was selected.
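The translation from relative mobility to molecular weight can be sketched as follows. This is a minimal pure-Python natural cubic spline, in which the second derivative is set to zero at both ends. That end condition is an assumption: the text does not say which end conditions were used. The function names and the standards values are hypothetical illustrations, not data from the experiments.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Build a natural cubic spline through the points (xs[i], ys[i]).

    xs must be a strictly increasing list with at least two points.
    Returns a function s(x). Outside [xs[0], xs[-1]] the end segment's
    cubic is extended, so extrapolated values deserve extra scrutiny.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; both ends stay 0 ("natural")
    if n > 2:
        # Tridiagonal system for the interior second derivatives m[1..n-2].
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 2)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 2)]
        for k in range(1, n - 2):  # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 2] = rhs[n - 3] / diag[n - 3]
        for k in range(n - 4, -1, -1):  # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def s(x):
        # Locate the segment containing x, clamping to the end segments.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return s

# Hypothetical standards: relative mobility -> log10(molecular weight).
std_mobility = [0.10, 0.30, 0.50, 0.70, 0.90]
std_log_mw = [5.00, 4.80, 4.55, 4.30, 4.10]
to_log_mw = natural_cubic_spline(std_mobility, std_log_mw)

def mobility_to_mw(rf):
    """Translate a relative mobility into a molecular weight."""
    return 10.0 ** to_log_mw(rf)
```

Interpolating in the logarithm of molecular weight follows the text's use of log molecular weight elsewhere. The spline passes exactly through each standard, which a single quadratic fit across all standards does not guarantee.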

スプライン補間法を使用して、画像を一定寸法のトレーニング記録に変換した。この時点で、画像強度の正規化を考慮しなければならない。二つの選択肢が考えられる。第一は、正規化を実施しないことである。第二は、画像にわたる最大値が１．０にセットされ、最小値が０．０にセットされるように画像を処理することである。各選択肢についてネットワークをトレーニングし、その結果を比較した。入力に雑音が追加されなければ、事前処理した画像ネットワークは９７％のトレーニング例性能を有し、事前処理しなかった場合の性能は７９％であった。雑音が追加された場合には、二つの選択肢は同様の結果を与えた。さらなるトレーニング実行のために事前処理した画像を使用することを選択した。この選択により、ウェスタンブロット法を使用して達成することができる許容度の範囲内で、所与のネットワーク入力が特定の分子量と一貫して関連付けられることが保証された。 A spline interpolation method was used to convert the image into a constant size training record. At this point, normalization of image intensity must be considered. Two options are possible. The first is that no normalization is performed. The second is to process the image so that the maximum value over the image is set to 1.0 and the minimum value is set to 0.0. We trained the network for each option and compared the results. If no noise was added to the input, the preprocessed image network had 97% training example performance and 79% performance without preprocessing. When noise was added, the two options gave similar results. We chose to use preprocessed images for further training runs. This selection ensured that a given network input was consistently associated with a particular molecular weight within the tolerances that could be achieved using Western blotting.
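The second normalization option described above can be sketched as follows. The function name is hypothetical, and the handling of a completely flat image (all values equal) is an assumption, since the text does not discuss that case.

```python
def normalize_min_max(intensities):
    """Rescale an intensity profile so its minimum is 0.0 and its maximum is 1.0."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # Assumption: a flat image carries no contrast, so map it to all zeros.
        return [0.0 for _ in intensities]
    return [(v - lo) / (hi - lo) for v in intensities]
```

For example, `normalize_min_max([10, 20, 30])` gives `[0.0, 0.5, 1.0]`.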

上記の選択を使用して、一連の八つのニューラルネットワークをトレーニングし、Ｅｎｄｏ存在変数の予測に基づく様々な分子量の重要性についての情報を提供した。相関の方向の分析を可能にするために、単一の隠れた処理要素のみをトレーニングに使用した。各ネットワークについて感度分析を実施し、得られたコンセンサスをＥｘｃｅｌを使用してプロットした。 Using the above selections, a series of eight neural networks were trained to provide information about the importance of various molecular weights based on the prediction of Endo presence variables. Only a single hidden processing element was used for training to allow analysis of the direction of correlation. Sensitivity analysis was performed for each network and the resulting consensus was plotted using Excel.

次いでネットワークの重みを平均し、各重みについてのコンセンサス値を生成した。隠れた要素から出力への相互接続の重みは正にも負にもなるので、これらの重みは全ての出力接続が同じ符号を有するように変形した。次いで重みを平均し、その結果をＥｘｃｅｌを使用してプロットした。 The network weights were then averaged to generate a consensus value for each weight. Since the weight of the interconnection from the hidden element to the output can be positive or negative, these weights were modified so that all output connections have the same sign. The weights were then averaged and the results were plotted using Excel.
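The sign alignment and averaging of weights can be sketched as follows. The sketch assumes each single-hidden-element network is summarized as a pair of its input-to-hidden weights and its hidden-to-output weight. It also assumes the alignment is done by negating the input weights whenever the output weight is negative. The text states only that the weights were transformed so that all output connections had the same sign.

```python
def consensus_weights(networks):
    """Average input weights across networks after aligning output-weight signs.

    networks: list of (input_weights, output_weight) pairs, one per network.
    """
    aligned = []
    for input_weights, output_weight in networks:
        # Flip the input weights of any network whose output weight is negative,
        # so that every network is expressed with a positive output connection.
        sign = -1.0 if output_weight < 0 else 1.0
        aligned.append([sign * w for w in input_weights])
    n = len(aligned)
    return [sum(column) / n for column in zip(*aligned)]
```

Without this step, two networks that had learned the same relationship with opposite internal signs would cancel each other in the average.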

結果．
ネットワーク入力への画像整合およびＭａｘ／Ｍｉｎ画像事前処理のために三次スプライン補間法を使用して、ウェスタンブロットデータの分析を実施した。ウェスタンブロット法による、画像の整合の確度において一定量の変動性が期待できる場合には、この手法は、多項式適合が最初に使用した、より良好な結果を与えるものと考えられる。 result.
Analysis of the Western blot data was performed using cubic spline interpolation to align the images to the network inputs, together with Max/Min image preprocessing. Given that a certain amount of variability is to be expected in how accurately Western blot images can be aligned, this approach is expected to give better results than the polynomial fit used initially.

最終的コンセンサスネットワークについての感度分析および重みのプロットは、疾病の予測および診断を補助することができるウェスタンブロットの領域があることを示した。ネットワークの重みに見られる、正および負の相関の領域の幅もまた、示された結果が有意であることを示す。ピークが非常に狭い場合には、ピークは、過剰トレーニングと同様のトレーニングプロセスの人為結果であり、学習される基礎プロセスを形成しないものと結論付けなければならない。重要であると考えられる領域は以下の通りである。 Sensitivity analysis and weight plots for the final consensus network showed that there is a region of the Western blot that can aid in disease prediction and diagnosis. The width of the positive and negative correlation regions seen in the network weights also indicates that the results shown are significant. If the peak is very narrow, it must be concluded that the peak is an artifact of a training process similar to overtraining and does not form the basic process to be learned. The areas that are considered important are:

［表６］
――――――――――――――――――――――――――――――
正の相関
31503.98 - 34452.12
62548.87 - 65735.97
84279.36 - 89458.49
負の相関
19165.9 - 20142.47
50263.36 - 53352.14
67725.77 - 78614.77
―――――――――――――――――――――――――――――― [Table 6]
――――――――――――――――――――――――――――――
Positive correlation
31503.98-34452.12
62548.87-65735.97
84279.36-89458.49
Negative correlation
19165.9-20142.47
50263.36-53352.14
67725.77-78614.77
――――――――――――――――――――――――――――――

正および負のピークはいくつか存在するが、これらが、二つのＥＬＩＳＡテストに含まれる可能性が最も高いと考えられる。一方のテストは正の領域を焦点とし、もう一方は負の領域を焦点とする。次いで得られた二つの値を、ニューラルネットワークへの入力として患者病歴データと組み合わせることができる。 There are several positive and negative peaks, but these are most likely to be included in the two ELISA tests. One test focuses on the positive area and the other focuses on the negative area. The two values obtained can then be combined with patient history data as input to the neural network.

結論．
ニューラルネットワークは、ウェスタンブロットに基づいて疾病の存在と相関する領域を発見することができた。 Conclusion.
Neural networks were able to find areas that correlate with the presence of disease based on Western blots.

＜例４＞
ウェスタンブロットデータについての一定入力寸法の調査． <Example 4>
Investigation of constant input dimensions for Western blot data.

要件．
事前処理した画像から抽出したピーク分子量を使用して、患者についてのウェスタンブロットデータの変化する寸法を、ニューラルネットワークについての一定の寸法に換算する方法を調査した。この手法は、ネットワーク入力が全画像手法より大幅に少なくなるので望ましい。基本的な問題は、相互に関係する可能性のある分子量の変数がテストで生じることである。例およびこの例の結果を比較すると、分子量のパターンが存在すること、またはそれらの分子量が関連がないかどうかが示される。分子量データにはいくらか変動性があるので、ニューラルネットワークについて分類を実施しても、このデータを処理する手法はファジーメンバシップ関数と同様である。 Requirements.
Using the peak molecular weights extracted from the preprocessed images, methods were investigated for converting the variable dimension of a patient's Western blot data into a fixed dimension for the neural network. This approach is desirable because it requires far fewer network inputs than the full-image approach. The basic problem is that the test yields a variable number of molecular weights that may be interrelated. Comparing the results of the earlier example with those of this example indicates whether patterns of molecular weights exist or whether the molecular weights are unrelated. Because the molecular weight data has some variability, the way this data is processed resembles a fuzzy membership function, even though the classification itself is performed by a neural network.

追加要件．
ウェスタンブロットデータから一部分が識別される。これらの部分の積は再生可能であるので、この情報の使用の有効性は、ウェスタンブロット画像データを処理して、これらの部分の分子量に対応するｂｉｎｓにすることにより決定される。 Additional requirements.
Fragments have been identified from the Western blot data. Because production of these fragments is reproducible, the usefulness of this information is determined by processing the Western blot image data into bins corresponding to the molecular weights of these fragments.

使用した方法．
例４の結果から、分子量のいくつかの範囲が疾病と相関があるものと決定される。例５に見られる各ピークに集中するガウス領域を使用することにより、減少した入力表現が生成された。ガウスの値が領域の縁部で０．５以下になるようにガウスの標準偏差を決定した。ニューラルネットワーク入力を生成するために実施した基本操作は、ガウスとウェスタンブロット画像との間のたたみ込みである。計算は全て、分子量の対数を使用して実施した。 The method used.
From the results of Example 4, several ranges of molecular weight were determined to correlate with disease. A reduced input representation was generated by using a Gaussian region centered on each of the peaks found in Example 5. The standard deviation of each Gaussian was chosen so that the value of the Gaussian was 0.5 or less at the edges of the region. The basic operation performed to generate the neural network inputs is a convolution of the Gaussian with the Western blot image. All calculations were performed using the logarithm of molecular weight.
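The Gaussian weighting can be sketched as follows. For a Gaussian with peak value 1 centered in a region of half-width d (in log molecular weight), the value at the edge equals 0.5 when the standard deviation is d / sqrt(2 ln 2); any smaller standard deviation gives a value below 0.5. The sketch uses that limiting case. The function names are hypothetical and are not the binproc program, and base-10 logarithms are an assumption, since the text does not state the base.

```python
import math

def gaussian_sigma(log_lo, log_hi):
    """Largest sigma for which a unit-peak Gaussian is 0.5 at the region edges."""
    half_width = (log_hi - log_lo) / 2.0
    return half_width / math.sqrt(2.0 * math.log(2.0))

def bin_input(log_mw, intensity, mw_lo, mw_hi):
    """Gaussian-weighted sum of image intensity over one molecular weight region.

    log_mw:    log10 molecular weight of each image sample
    intensity: normalized image intensity at each sample
    """
    log_lo, log_hi = math.log10(mw_lo), math.log10(mw_hi)
    center = (log_lo + log_hi) / 2.0
    sigma = gaussian_sigma(log_lo, log_hi)
    total = 0.0
    for x, y in zip(log_mw, intensity):
        total += y * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
    return total
```

As a check on the edge condition, a single sample of intensity 1.0 placed exactly at the lower edge of the 31503.98 to 34452.12 region from Table 6 contributes 0.5 to that region's input.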

別々のソフトウェアプログラムが生成された。このプログラムは、正規化した画像についての分子量および強度に対するたたみ込みを実施した。ネットワーク入力の計算のパラメータは、ｂｉｎｐｒｏｃプログラム中の表に含まれる。ｂｉｎｐｒｏｃでは、平均および標準偏差はこの表に記憶される。表の値が変更されるときに、プログラムは再コンパイルされる。プログラムは、Ｅｘｃｅｌを使用して匹敵するウェスタンブロット画像にガウスをプロットすることができる出力ファイルを生成するテストモードを有する。領域のプロットはドキュメンテーションに含まれる。 Separate software programs were generated. This program performed a convolution on the molecular weight and intensity for the normalized image. The parameters for calculating the network input are included in a table in the binproc program. In binproc, the mean and standard deviation are stored in this table. The program is recompiled when the table values change. The program has a test mode that produces an output file that can plot Gaussian to comparable Western blot images using Excel. Region plots are included in the documentation.

３６個の小部分を処理する際には、小部分の位置をｂｉｎｐｒｏｃの表の値に翻訳するようにｂｉｎｐｒｏｃ．ｃを再度修正した。この修正したプログラムをｆｐｒｏｃ．ｄと呼ぶ。その目的は、分子量値を標準に基づいて正規化するのに必要なスプライン補間を実施することである。ｂｉｎｐｒｏｃからｂｉｎｐｒｏｃ２．ｃを生成し、平均偏差表および標準偏差表を、供給されたファイル中の小部分の終点に対応するｍｉｎ．表およびｍａｘ．表で置き換えた。 To process the 36 fragments, binproc.c was modified again so that the fragment positions are translated into the table values in binproc. The modified program is called fproc.d. Its purpose is to perform the spline interpolation needed to normalize molecular weight values against the standards. binproc2.c was then generated from binproc, with the mean and standard deviation tables replaced by min. and max. tables corresponding to the endpoints of the fragments in the supplied file.

上記プログラムから生成された任意のデータファイルをテストするために、データの８０％をトレーニング用に、残りの２０％をテスト用に使用して、ホールドアウト方法を使用した。ウェスタンブロットデータからトレーニングデータが生成された後で、乱数列および患者のＩＤ列をＥｘｃｅｌのスプレッドシートに追加した。次いで乱数列上でデータをソートした。これにより実際にデータがシャッフルされる。このようにして、各区分が各ゲルからの例を有する可能性が高い。これらの割合で、五つの別々のトレーニングおよびテストファイルが、組み合わせたテストセットの結果からネットワークの性能を推定することができるように生成される。 To test any data file generated from the above program, a holdout method was used, with 80% of the data used for training and the remaining 20% used for testing. After training data was generated from the Western blot data, a random number sequence and a patient ID sequence were added to the Excel spreadsheet. The data was then sorted on the random number sequence. This actually shuffles the data. In this way, each section is likely to have an example from each gel. At these rates, five separate training and test files are generated so that network performance can be estimated from the combined test set results.
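The shuffling and partitioning described above can be sketched as follows. The sketch attaches a random key to each record and sorts on it, mirroring the spreadsheet procedure, then holds out a different fifth of the shuffled data in each of five splits. Making the five test sets disjoint is an assumption based on the statement that performance is estimated from the combined test set results. The function name and the seed parameter are hypothetical.

```python
import random

def holdout_partitions(records, n_parts=5, seed=0):
    """Shuffle records by a random key, then build n_parts train/test splits.

    Each split holds out a different 1/n_parts of the shuffled data for testing,
    so with n_parts=5 every split is roughly 80% training and 20% test.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), idx) for idx in range(len(records))]
    keyed.sort()  # sorting on the random key shuffles the records
    shuffled = [records[idx] for _, idx in keyed]
    partitions = []
    for k in range(n_parts):
        test = [r for i, r in enumerate(shuffled) if i % n_parts == k]
        train = [r for i, r in enumerate(shuffled) if i % n_parts != k]
        partitions.append((train, test))
    return partitions
```

The 80/20 proportion is exact only when the number of records is divisible by five. With 200 records, each split has 160 training and 40 test records, and every record appears in exactly one test set.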

ＴｈｉｎｋｓＰｒｏ™を使用して、入力を排除することにより、ネットワークが使用する入力の数を変化させることができる。排除された入力は、トレーニング中にネットワークに提示されない。ガイドとして感度分析を使用して、重要でない入力を除去する。入力スペースのディメンションを減少させることは、トレーニング例の数が少ないときにはさらに重要になる。この方法は、患者病歴トレーニング実行中の変数を除去する際に使用したものと同じである。現在では、このプロセスは手動で行う。 With ThinksPro™, the number of inputs used by the network can be varied by excluding inputs. Excluded inputs are not presented to the network during training. Sensitivity analysis is used as a guide to remove unimportant inputs. Reducing the dimension of the input space becomes even more important when the number of training examples is small. This is the same method that was used to remove variables in the patient history training runs. At present, this process is performed manually.

結果．
例５では、全てのデータについてトレーニングされたネットワークを使用して、分類プロセスに重要な分子量の範囲を決定した。この例では、ホールドアウト方法を使用して、テストセットの性能を推定することができるようにネットワークをトレーニングした。第一のテストセットは、例５で識別された領域に基づいている。第二のテストセットは、四つのｉｓｈｇｅｌファイル中で識別された小部分を使用して作成された。 result.
In Example 5, networks trained on all of the data were used to determine the ranges of molecular weight that matter for the classification process. In this example, the holdout method was used to train the networks so that test set performance could be estimated. The first test set is based on the regions identified in Example 5. The second test set was created using the fragments identified in the four ishgel files.

例５で見られた上位六つの領域に基づく最初のコンセンサス実行の性能は低い（５０％）。生成された入力データの分析により、入力データの生成に使用された領域は、画像データから重要な情報を捕捉するには狭すぎることが示された。領域の幅を広げ、上位六つではなく、例５からの上位１０個の領域を含めた。幅を広げた１０個の領域についてのテストはわずかに良好な性能を示した。感度分析を使用して、１０個の領域のうち三つを除去し、完全なテストを実行した。幅を広げた１０個の領域のうち六つについての性能は、５４．５％に向上した。 The performance of the first consensus run based on the top six regions seen in Example 5 is low (50%). Analysis of the generated input data showed that the area used to generate the input data was too narrow to capture important information from the image data. The width of the region was expanded to include the top 10 regions from Example 5, rather than the top six. Tests on 10 wide areas showed slightly better performance. Using sensitivity analysis, three of the ten regions were removed and a complete test was performed. The performance of 6 out of 10 widened areas was improved to 54.5%.

ネットワークへの入力数がさらに減少するにつれて、テストセットの性能（ホールドアウト方法で推定）は高まり続ける。６６３９２．６５から７８６１４．７４の範囲の分子量を有するただ一つの領域しか使用しない場合に最高の性能が達成された。ホールドアウト方法を使用した、テストデータについての性能の推定値は５８．５％であった。 As the number of inputs to the network further decreases, the performance of the test set (estimated by the holdout method) continues to increase. The best performance was achieved when using only one region with a molecular weight in the range of 66392.65 to 78614.74. The estimated performance for the test data using the holdout method was 58.5%.

このプロセスを、識別された小部分に基づく３６個の領域を開始として使用して再度適用した。３６個の小部分には大量の重複が存在した。上位七つの小部分を、感度分析を使用して３６個から決定した。小部分のサブセットを使用して、５８％という同様の性能が達成された。 This process was applied again, starting from 36 regions based on the identified fragments. There was a large amount of overlap among the 36 fragments. The top seven fragments were determined from the 36 using sensitivity analysis. A similar performance of 58% was achieved using a subset of the fragments.

結論．
テストでは非常に高い結果は生じなかった。このことの主な理由は、この例で利用できるトレーニングデータの量が限られていたことである可能性が高い。以前の例から得られた結果は、トレーニングサンプル中の患者数が減少するにつれて妥当性データについての性能も低下したことを示した。この関係を以下の表に示す。 Conclusion.
The tests did not produce very high results. The main reason is most likely the limited amount of training data available for this example. Results from the earlier examples showed that performance on the validation data fell as the number of patients in the training sample decreased. This relationship is shown in the table below.

患者数が減少しても、Ｅｌｉｓａ変数を含む場合にはＥｌｉｓａ／患者病歴データについてより良好な結果が達成された。このことはＥＬＩＳＡ変数の価値を示す。 Even when the number of patients was reduced, better results were achieved for Elisa / patient history data when Elisa variables were included. This indicates the value of the ELISA variable.

いくつかの領域を、疾病の分類に重要であると決定できることは明らかである。大幅に異なる領域のセットが同様の結果を生じ、ウェスタンブロットデータ中に、疾病の存在を示すパターンが存在する可能性があることを示す。患者のデータベースが少ない場合には、これらのパターンを分離することはより困難になる。 It is clear that several areas can be determined to be important for disease classification. A significantly different set of regions yields similar results, indicating that there may be patterns in the Western blot data indicating the presence of disease. If the patient database is small, it is more difficult to separate these patterns.

ウェスタンブロットデータ用のデータベースのサイズの増加により、このデータについてトレーニングしたネットワークの性能が改善されることになることは明らかである。ウェスタンブロットデータを患者病歴データと組み合わせると、ネットワークの入力寸法が増加することになる。入力寸法が増加すると、通常は一般化を保証するためにより多くのトレーニング例が必要となる。 Clearly, increasing the size of the database for Western blot data will improve the performance of the network trained on this data. Combining Western blot data with patient history data will increase the input size of the network. Increasing input dimensions usually requires more training examples to ensure generalization.

＜例５＞
ウェスタンブロットデータを使用するトレーニングネットワーク．
この例の目的は、ウェスタンブロットデータのみを使用して診断についての性能推定を決定するようにネットワークのセットをトレーニングすることである。実験を実行し、ネットワークのトレーニングのための最良の構成およびパラメータを決定した。上記の例２に記載した方法を、この性能推定に使用する。最後のネットワークは、利用可能な全てのデータをトレーニングデータとして使用してトレーニングした。このトレーニングしたネットワークの出力（抗原指標）は、組み合わされたデータフェーズ中で生成されたネットワークへの入力として使用した。 <Example 5>
Training network using western blot data.
The purpose of this example is to train a set of networks on the Western blot data alone in order to determine a performance estimate for diagnosis. Experiments were run to determine the best configuration and parameters for training the networks. The method described in Example 2 above is used for this performance estimate. The final network was trained using all of the available data as training data. The output of this trained network (the antigen index) was used as an input to the networks generated in the combined data phase.

使用した方法．
いくつかの方法を使用して、利用可能なトレーニングデータについての最もよく実施される入力のセットを発見した。以前の例から、感度分析を使用すると、各入力変数の重要性の識別において良好な結果が生じることが分かった。その数のネットワークは、感度分析によって手動で決定された変数の組合せについてトレーニングした。 The method used.
Several methods were used to find the best-performing set of inputs for the available training data. Earlier examples showed that sensitivity analysis gives good results in identifying the importance of each input variable. A number of networks were trained on combinations of variables chosen manually with the aid of sensitivity analysis.

自動化手順を準備する際に、変数の２×２分割表カイ二乗分析を使用して、変数の重要性の代替の順位付けを提供した。入力は連続的であるので、各入力についてしきい値を使用して、分割表に必要な情報を生成した。カイ二乗値は、しきい値の設定に依存して変化する。変数の順位付けに使用するしきい値は、カイ二乗統計値を最大にするように選択した。 In preparing the automation procedure, a 2 × 2 contingency table chi-square analysis of the variables was used to provide an alternative ranking of the importance of the variables. Since the inputs are continuous, a threshold was used for each input to generate the information needed for the contingency table. The chi-square value changes depending on the threshold setting. The threshold used to rank the variables was chosen to maximize the chi-square statistic.
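The threshold search for the chi-square ranking can be sketched as follows. Each candidate threshold splits a continuous input into "above" and "at or below", which together with the binary outcome gives a 2×2 contingency table. The sketch uses the uncorrected Pearson chi-square statistic and tries every observed value as a threshold. Both choices are assumptions, as are the function names, since the text does not say how candidate thresholds were generated or whether a continuity correction was applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association information
    return n * (a * d - b * c) ** 2 / denom

def best_threshold(values, labels):
    """Return (chi_square, threshold) for the threshold that maximizes chi-square.

    values: continuous input values, one per patient
    labels: 1 if disease is present, 0 otherwise
    """
    best = (0.0, None)
    for t in sorted(set(values)):
        a = sum(1 for v, y in zip(values, labels) if v > t and y == 1)
        b = sum(1 for v, y in zip(values, labels) if v > t and y == 0)
        c = sum(1 for v, y in zip(values, labels) if v <= t and y == 1)
        d = sum(1 for v, y in zip(values, labels) if v <= t and y == 0)
        chi2 = chi_square_2x2(a, b, c, d)
        if chi2 > best[0]:
            best = (chi2, t)
    return best
```

Ranking the variables then amounts to sorting them by the chi-square value returned for each. Because the threshold is tuned to maximize the statistic, the values are useful for ranking but should not be read as ordinary significance tests.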

自動化手順の開発中に行われるトレーニング実行は、これらの順位付けから選択される。トレーニング実行が行われた時点で、自動化手順は定形化されていない。全体の処理時間を節約するために、トレーニングデータのただ一つの区分しか使用しない。次いでトレーニングおよびテストデータの第一区分中で良好に実施された変数の組合せを、残りの区分について試した。 Training runs performed during the development of the automation procedure are selected from these rankings. When the training run is performed, the automated procedure has not been formalized. Only one segment of training data is used to save overall processing time. The well-performed variable combinations in the first segment of training and test data were then tested for the remaining segments.

本文献で提案する最良の入力のセットを発見する一つの方法は、遺伝アルゴリズムを使用して、最もよく実施される入力のセットを決定するものである。ジェネティックアルゴリズムは、通常は、良好な解答に収束するには数千回も反復する必要がある。ウェスタンブロットデータの処理では、これは、トレーニング例のサイズが小さい場合でも大量のコンピュータ時間に相当することになる。１０個の変数について、全ての組合せを枚挙するには１０２４回のトレーニング実行が必要となる。ジェネティックアルゴリズムの代替の方法を試みた。この代替の方法では、選択した入力のセットに基づいてテストセットのＲＭＳ誤差を予測するように、ニューラルネットワークをトレーニングした。この実験で使用したトレーニング例は、ウェスタンブロットデータの第一区分についてのトレーニング実行の結果である。次いで全ての組合せで予測ネットワークをテストし、予測された最小の組合せを決定する。次いで入力の組合せを使用して、ウェスタンブロットデータについてネットワークをトレーニングする。この方法およびジェネティックアルゴリズム手法の主な欠点は、非常に有効であることが分かっている感度分析情報が、このプロセスにおいて無視されることである。 One proposed way of finding the best set of inputs is to use a genetic algorithm to determine the best-performing set of inputs. Genetic algorithms usually need thousands of iterations to converge to a good solution. For the Western blot data, this would amount to a large amount of computer time even though the training set is small. For 10 variables, enumerating every combination requires 1024 training runs. An alternative to the genetic algorithm was tried. In this alternative, a neural network was trained to predict the RMS error on the test set from the selected set of inputs. The training examples for this experiment were the results of the training runs on the first partition of the Western blot data. The prediction network is then tested on all combinations to find the combination with the lowest predicted error. That combination of inputs is then used to train a network on the Western blot data. The main drawback of this method, and of the genetic algorithm approach, is that the sensitivity analysis information, which has proven very effective, is ignored in the process.
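The figure of 1024 runs for 10 variables is the number of subsets of a 10-element set, 2^10. A short sketch of the enumeration follows; the function name is hypothetical, and the count includes the empty subset, which would not correspond to a useful training run.

```python
from itertools import combinations

def all_subsets(variables):
    """Return every subset of the given variables, from empty to complete."""
    variables = list(variables)
    subsets = []
    for size in range(len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

subsets = all_subsets(range(1, 11))  # bins numbered 1 to 10
```

Each added variable doubles the number of combinations, which is why exhaustive enumeration quickly becomes impractical and a guided search is attractive.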

結果．
ウェスタンブロットデータ中の１０個の変数（ｂｉｎｓ）についての基本的順位付けは、２００個の例の全データベースについてトレーニングした八つのネットワークのコンセンサスに基づく。その結果は以下の通りである。 result.
The basic ranking for the 10 bins in the Western blot data is based on a consensus of 8 networks trained on a full database of 200 examples. The results are as follows.

［表７］
――――――――――――――――――――――――――――――
7 : 1.182073
9 : 1.055611
3 : 1.053245
8 : 1.039028
6 : 1.027239
10 : 1.023135
4 : 0.978769
5 : 0.952821
2 : 0.899936
1 : 0.788143
―――――――――――――――――――――――――――――― [Table 7]
――――――――――――――――――――――――――――――
7: 1.182073
9: 1.055611
3: 1.053245
8: 1.039028
6: 1.027239
10: 1.023135
4: 0.978769
5: 0.952821
2: 0.899936
1: 0.788143
――――――――――――――――――――――――――――――

カイ二乗分析に基づく１０個の変数の順位付けは以下の通りである。 The ranking of the 10 variables based on the chi-square analysis is as follows.

［表８］
――――――――――――――――――――――――――――――
3 : 4.380517
9 : 3.751625
7 : 3.372731
2 : 3.058437
6 : 3.022164
5 : 2.787982
10 : 1.614931
4 : 1.225725
1 : 0.975502
8 : 0.711958
―――――――――――――――――――――――――――――― [Table 8]
――――――――――――――――――――――――――――――
3: 4.380517
9: 3.751625
7: 3.372731
2: 3.058437
6: 3.022164
5: 2.787982
10: 1.614931
4: 1.225725
1: 0.975502
8: 0.711958
――――――――――――――――――――――――――――――

ウェスタンブロットデータの分析中に、トレーニングデータの一つまたは複数の第一区分についていくつかのネットワークをトレーニングした。テストの結果は以下のように順位付けられ、変数がトレーニング実行に含まれることを示す。 During the analysis of western blot data, several networks were trained on one or more first segments of training data. The test results are ranked as follows, indicating that the variable is included in the training run.

変数 variable

（）は予測ネットワークトレーニングプロセスによって生成された組合せを示す () Indicates the combination generated by the predictive network training process

上記のテスト実行を参照すると、順位付け中のより重要な変数が下位のテストセット誤差に寄与すること、および含まれる変数が多くなると、テストセットの結果が低くなることは明らかである。このことは、高性能ニューラルネットワークの開発における、変数の最良のサブセットを選択することの重要性を示す。 Referring to the test execution above, it is clear that the more important variables in the ranking contribute to the lower test set error, and that the more variables included, the lower the test set results. This demonstrates the importance of selecting the best subset of variables in the development of high performance neural networks.

いくつかの組合せの変数を使用して、トレーニングデータの全ての区分についてネットワークをトレーニングした。これらの実行の結果を以下に示す。 Several combinations of variables were used to train the network for all segments of the training data. The results of these runs are shown below.

［表９］
――――――――――――――――――――――――――――――
変数時間セットの性能
3 57.5％
3， 9 53.5％
3， 7，9 53.0％
4， 6，9， 10 57.0％
―――――――――――――――――――――――――――――― [Table 9]
――――――――――――――――――――――――――――――
Variables       Test set performance
3               57.5%
3, 9            53.5%
3, 7, 9         53.0%
4, 6, 9, 10     57.0%
――――――――――――――――――――――――――――――

変数の両方の順位付けは３、７、および９が重要であると示すので、十分なトレーニングデータが存在する場合には、この組合せが５７．５％を超える可能性が高い。この組合せについてのトレーニング例の性能は６３．９％であり、これは発生した過剰トレーニングのレベルを示す。上記に示した第一区分ネットワークのいくつかは、テスト性能を予測するようにトレーニングしたニューラルネットワークによって選択された変数の組合せを有する。これらのネットワークは最後の列の番号によって示される。この番号は、テストが実行されるシーケンスを示す。番号のない組合せは、順位付けから手動で選択した。このプロセスを継続すれば、予測ネットワークは最終的に最良の組合せを発見するはずである。テストセットの性能に影響を及ぼす可能性のあるファクタは数多く存在するので、テストセットの結果には多くの「雑音」が存在する可能性が高い。この方法をより良好に働かせるために、予測されたテストセットの誤差についてトレーニング値を生成するためにコンセンサス手法が必要になることがある。この問題はコンセンサス手法を使用する際にも見られる。 Both rankings of the variables indicate that 3, 7, and 9 are important, so if there is sufficient training data, this combination is likely to exceed 57.5%. The training example performance for this combination is 63.9%, indicating the level of overtraining that occurred. Some of the first segment networks shown above have a combination of variables selected by a neural network trained to predict test performance. These networks are indicated by the number in the last column. This number indicates the sequence in which the test is performed. Combinations without numbers were manually selected from ranking. If this process continues, the prediction network will eventually find the best combination. Since there are many factors that can affect the performance of the test set, there is a high probability that there will be a lot of “noise” in the results of the test set. In order for this method to work better, a consensus approach may be required to generate training values for predicted test set errors. This problem is also seen when using consensus techniques.

結論．
変数の感度および分割表順位付けを使用するプロセスは、ニューラルネットワークの性能を最大限にするように変数のセットを選ぶための有効かつ効率的な技術である。両方の順位付けの下での上位三つの変数は同じであり、これはこれらの方法が良好に実施されることを示す。この方法は、ウェスタンブロットデータを処理することは明らかであるが、任意形態のデータについてよく作用し、これを患者病歴データにも適用可能な汎用ニューラルネットワーク技術にする。 Conclusion.
The process of using sensitivity and contingency table rankings of the variables is an effective and efficient technique for choosing a set of variables that maximizes the performance of a neural network. The top three variables are the same under both rankings, which indicates that these methods perform well. Although demonstrated here on Western blot data, the method should work well on any form of data, making it a general-purpose neural network technique that can also be applied to patient history data.
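The agreement between the two rankings can be checked directly from the values in Tables 7 and 8. The following sketch takes the top three variables under each ranking and intersects them. The dictionary and function names are hypothetical; the numbers are those reported in the tables.

```python
# Sensitivity ranking values (Table 7), keyed by bin number.
sensitivity = {7: 1.182073, 9: 1.055611, 3: 1.053245, 8: 1.039028, 6: 1.027239,
               10: 1.023135, 4: 0.978769, 5: 0.952821, 2: 0.899936, 1: 0.788143}

# Chi-square ranking values (Table 8), keyed by bin number.
chi_square = {3: 4.380517, 9: 3.751625, 7: 3.372731, 2: 3.058437, 6: 3.022164,
              5: 2.787982, 10: 1.614931, 4: 1.225725, 1: 0.975502, 8: 0.711958}

def top_k(scores, k):
    """Return the k variables with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [variable for variable, _ in ranked[:k]]

shared = set(top_k(sensitivity, 3)) & set(top_k(chi_square, 3))
```

Both rankings place bins 3, 7, and 9 in the top three, although in a different order. Below the top three the rankings diverge: bin 8 is fourth by sensitivity but last by chi-square.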

上記の結果は、データが多ければ性能レベルが向上することを示す。感度分析は、変数の相対値の変動がほとんどないことを示す。ほとんどの変数は解答に寄与する。完全ウェスタンブロット画像についてトレーニングしたニューラルネットワークの重みの分析に基づいてｂｉｎｓが選択されるので、このことが期待される。しかし、全てまたはほとんどの変数を使用することにより、ニューラルネットワークは急速に過剰トレーニング状態となる。これは、トレーニング例にデータを追加することによって回避することができる。 The above results show that the performance level improves with more data. Sensitivity analysis shows little variation in the relative values of the variables. Most variables contribute to the answer. This is expected because bins are selected based on an analysis of the weight of the neural network trained on the full Western blot image. However, by using all or most of the variables, the neural network quickly becomes overtrained. This can be avoided by adding data to the training example.

ニューラルネットワークに案内されて変数を選択するテストは、順位付け手法より有効性が低いことが分かった。順位付け手法が最も有効であることは明らかであるが、ニューラルネットワーク案内手法でも最終的には最良の変数のセットを発見することができる。これは遺伝アルゴリズムより直接的な手法であるので、同様のデータについて、ジェネティックアルゴリズムより良好に実施される可能性が高い。この方法の主な欠点は、探索の補助に感度分析情報を使用しないことである。 The tests in which a neural network guided the selection of variables proved less effective than the ranking approach. The ranking approach is clearly the most effective, but the neural-network-guided approach can also eventually find the best set of variables. Because it is a more direct approach than a genetic algorithm, it is likely to perform better than a genetic algorithm on similar data. The main drawback of this method is that it does not use sensitivity analysis information to aid the search.

＜例６＞
患者病歴およびＥＬＩＳＡデータを組み合わせる． <Example 6>
Combine patient history and ELISA data.

要件．
上記の例で開発した処理を使用して、患者病歴データおよびＥＬＩＳＡデータの組合せについてネットワークのセットをトレーニングする。抗原の全セットの使用に基づいてＥＬＩＳＡテストから生成される指標を使用して、この情報を患者病歴データと組み合わせることによって達成される性能の改善を決定することになる。 Requirements.
The process developed in the above example is used to train a set of networks for a combination of patient history data and ELISA data. An index generated from an ELISA test based on the use of the entire set of antigens will be used to determine the performance improvement achieved by combining this information with patient history data.

追加要件．
上記要件に加えて、複数のＥＬＩＳＡからのデータ、ＥＬＩＳＡ１００およびＥＬＩＳＡ２００データとＥＬＩＳＡ２データとの間の比較、ならびに変数の相互関係の分析を実施し、元のＥＬＩＳＡテストが関係する変数を決定する助けとした。 Additional requirements.
In addition to the above requirements, data from multiple ELISAs were used: the ELISA100 and ELISA200 data were compared with the ELISA2 data, and the interrelationships among the variables were analyzed to help determine which variables the original ELISA test is related to.

使用した方法．
ＥＬＩＳＡテストの結果を含めることによって達成される診断テストの性能の改善を決定するために、例２で説明したホールドアウト方法を使用していくつかのトレーニングを行った。 The method used.
To determine the improvement in diagnostic test performance achieved by including the ELISA test results, several training runs were made using the holdout method described in Example 2.

各区分中でデータの８０％がトレーニングに使用され、残りの２０％がテストに使用されるように、データの区分を作成した。 Data sections were created so that 80% of the data was used for training in each section and the remaining 20% was used for testing.

ランダム開始重みの影響を最小限に抑えるために、いくつかのネットワークは全トレーニング実行でトレーニングする。こうした実行では、三つのネットワークは、それぞれ異なるランダム開始からの、データの五つの区分のそれぞれでトレーニングした。ネットワークの出力を平均し、単一のネットワークから得られるより低い変動を有するコンセンサス結果を形成した。全ての形態のＥＬＩＳＡデータを利用することができる患者数は３２５であるので、元の１４個の変数での新しいトレーニング実行を行い、ＥＬＩＳＡデータが疾病の診断に与える影響を比較する正確な平均を提供した。ＥＬＩＳＡ２データの分析は、そのテストのための広範囲の値を示した。ＥＬＩＳＡ２のＥＬＩＳＡ１００データに対する関係を示すプロットは、ＥＬＩＳＡ２データの対数の方が未処理値より良好である可能性があることを示す。 To minimize the effect of random starting weights, several networks are trained in every training run. In these runs, three networks were trained on each of the five partitions of the data, each from a different random start. The network outputs were averaged to form a consensus result with lower variance than that obtained from a single network. Because all forms of ELISA data were available for only 325 patients, a new training run was made with the original 14 variables to provide an accurate means of comparing the effect of the ELISA data on diagnosis of the disease. Analysis of the ELISA2 data showed a wide range of values for that test. A plot of ELISA2 against the ELISA100 data suggests that the logarithm of the ELISA2 value may be a better input than the raw value.
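The consensus step can be sketched as a per-example average of the outputs of networks trained from different random starts. The function name and the sample outputs are hypothetical. The sketch covers only the averaging and does not model the networks themselves.

```python
def consensus_output(outputs_per_network):
    """Average the outputs of several networks, example by example.

    outputs_per_network: one list of outputs per network, all the same length
    and in the same example order.
    """
    n = len(outputs_per_network)
    return [sum(values) / n for values in zip(*outputs_per_network)]

# Hypothetical outputs of three networks on the same two test examples.
consensus = consensus_output([[0.2, 0.8], [0.4, 0.6], [0.6, 0.7]])
```

Averaging reduces the run-to-run variation caused by the random starting weights, so differences between input configurations are easier to attribute to the inputs themselves.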

比較トレーニング実行は以下のように構成される。 The comparative training execution is configured as follows.

実行１：ＥＬＩＳＡ１００、ＥＬＩＳＡ２００、対数（ＥＬＩＳＡ２）および元の１４個の変数
実行２：（ＥＬＩＳＡ２）および元の１４個の変数
実行３：元の１４個の変数
Run 1: ELISA100, ELISA200, log(ELISA2), and the original 14 variables
Run 2: (ELISA2) and the original 14 variables
Run 3: the original 14 variables

これらの比較実行を行った後で、ネットワークの最後のセットを３２５人の患者の完全なデータセットについてトレーニングした。ネットワークの最後のセットでは、八つのネットワークのコンセンサスを作成し、最終的な統計値を生成した。最後の実行の統計値は、トレーニングデータについてのみ報告され、真の性能の上限を表す。最後のホールドアウト実行の結果は、性能についての可能な下限を表す。 After making these comparison runs, the last set of networks was trained on a complete data set of 325 patients. The final set of networks created a consensus of eight networks and generated the final statistics. The last run statistic is reported only for training data and represents the true performance limit. The result of the last holdout execution represents a possible lower bound on performance.

トレーニングデータから、診断に利用できないものも含めた６５個の変数のそれぞれは、３２５個のトレーニング例の中のトレーニング例に組み込まれる。ＴｒａｉｎＤｏｓトレーニングプログラムは、ネットワークの生成を自動化し、変数間の関係を提供するように修正した。６５個のネットワークのそれぞれでは、一つの変数が残りの６４個によって予測される。予測を行う際の各変数の重要性を示すために、各ネットワークについて感度分析を実施した。 From the training data, each of the 65 variables, including those that are not available for diagnosis, are incorporated into the training examples among the 325 training examples. The TrainDos training program was modified to automate network generation and provide relationships between variables. In each of the 65 networks, one variable is predicted by the remaining 64. Sensitivity analysis was performed on each network to show the importance of each variable in making the prediction.

結果．
三つの比較実行についてのコンセンサス結果は以下の通りである。 result.
The consensus results for the three comparison runs are as follows:

［表１０］
――――――――――――――――――――――――――――――――――――
実行１：全てのＥＬＩＳＡ変数（ＣＲＦＥ：１） 66.46％
実行２：ＥＬＩＳＡ２の対数（ＣＲＦＥＬ２） 66.77％
実行３：ＥＬＩＳＡ変数なし（ＣＲＦＥＬ０） 62.76％
―――――――――――――――――――――――――――――――――――― [Table 10]
――――――――――――――――――――――――――――――――――――
Run 1: all ELISA variables (CRFE: 1)      66.46%
Run 2: logarithm of ELISA2 (CRFEL2)       66.77%
Run 3: no ELISA variables (CRFEL0)        62.76%
――――――――――――――――――――――――――――――――――――

実行１および実行２を比較すると、ＥＬＩＳＡ１００およびＥＬＩＳＡ２００のデータをＥＬＩＳＡ２データに追加したことの影響がないことが分かる。したがって、ＥＬＩＳＡ１００およびＥＬＩＳＡ２００の変数は除去することができる。 Comparing Run 1 and Run 2, it can be seen that there is no effect of adding the ELISA 100 and ELISA 200 data to the ELISA 2 data. Therefore, the ELISA 100 and ELISA 200 variables can be removed.

Comparing Run 2 with Run 3 shows that inputs based on the ELISA test improved diagnosis of the disease.

Comparing Run 3 with pat06 shows that test performance drops by 5.47%. This is due solely to the reduction in the number of patients available for training. It also implies that increasing the training data beyond 500 is likely to have a considerable effect on the performance of the neural network on test data.

Based on these results, the final networks were trained. Eight networks were trained on the 325 patients. Performance on this training data was 72.31%. This is similar to the result of the pat07 run, but it is clear that the improvement from the ELISA2 data is offset by the reduced amount of training data available.

The sensitivity analysis shows that the ELISA2 variable was used and ranked 7th among the 15 variables.

Plots of the hidden processing element outputs were created from the log files of the eight trained networks. Averages were computed so that the desired output could be shown on the plots. Comparing the eight networks shows that each performs the task in a different way. Some clustering of data points is visible in some of the plots. Because this does not occur consistently, no conclusion can be drawn.

Statistics were generated for the final training run based on the use of a cutoff on the network output value. If the network output is at or below the cutoff, the example is not considered.

The following table summarizes the results for the consensus of the eight networks in CRFEL2.

In general, these results are better than those for pat07.

As a demonstration of this final training, a test program called adzcrf2.exe (see Appendix II) was created. The program allows either pat07 or CRFEL2 to be run, depending on the value entered in the ELISA field. If the value in this field is 0, pat07 is used.

An analysis of the relationships between variables was performed. Based on this analysis, the variables for which Endo presence is a contributing factor were compared with the variables used in predicting Endo. The training results for two networks (PATVARSA and PATVARS3) show that, in the case of Endo, the relationships are not symmetric as they would be if correlation were used. To summarize the results, CRFVARSA.XLS was built from the results of the sensitivity analyses. These results demonstrate the non-linear nature of the relationships. The importance of a variable is affected by the other variables in the training run. This means that a means of automatically removing unimportant variables may be needed to make this analysis more useful.

Analysis of the variable relationships (CRFVAR00 to CRFVAR64) shows that in most cases the logarithm of the ELISA2 test is more useful than the raw ELISA2 value. This is especially so for predicting Endo presence and AFS Stage, for both of which the logarithmic value ranked highly.

Conclusion.

The ELISA2 test adds to the predictive power of the neural network. The ELISA2 test makes the original ELISA test unnecessary. Based on this result, processing of the Western blot data is likely to further improve the capability of the neural network diagnostic test.

The effect of increased training data is clearly seen in the comparison of Run 3 with pat06. The difference in performance indicates that increasing the training data substantially improves the performance of the neural network. It is clear from this comparison that doubling the data would improve performance by 10 to 15%. Increasing the data 8- to 10-fold may improve performance by 75 to 80%.

<Example 7>
Patient history Stage/AFS Score training.

Requirements.
The method developed in the examples above is used to identify the relevant variables for either the stage of the disease or the AFS Score. The choice of target output variable is determined by comparing test set performance from training runs that use the Phase 1 list of important patient history variables. After the list of important variables is selected, a consensus of eight neural networks is trained on the 510-patient database.

Method used.
Training examples were built with the Stage as the desired output and with the AFS score as the desired output. Seven patients had missing Stage information and 28 had missing Score information. For the stage variable, the mean value of 2.09 was used where data were missing. For the score, missing data were replaced with a value that depends on the stage variable: 3 for stage 1, 10.5 for stage 2, 28 for stage 3, and 55 for stage 4. The stage and score were then preprocessed so that the desired output lies in the range 0.0 to 1.0. The stage was scaled linearly. Two methods were used for the score. The first is the square root of the score, divided by 12.5. The second is the logarithm of (score + 1) divided by the logarithm of 150.
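The preprocessing of the desired outputs described above can be sketched as follows. This is a minimal illustration only: the function names, the use of None for missing values, and the assumption that stages run from 0 to 4 for the linear scaling are hypothetical and are not taken from the specification. The substitute values (2.09, 3, 10.5, 28, 55) and the two score transforms are the ones stated in the text.

```python
import math

# Substitute AFS scores for a missing score, keyed by stage (values from the text).
SCORE_BY_STAGE = {1: 3.0, 2: 10.5, 3: 28.0, 4: 55.0}
MEAN_STAGE = 2.09  # substitute for a missing stage (value from the text)


def fill_missing(stage, score):
    """Replace missing values (None) with the substitutes described in the text."""
    if score is None and stage in SCORE_BY_STAGE:
        # The substitute score depends on the recorded stage.
        score = SCORE_BY_STAGE[stage]
    if stage is None:
        stage = MEAN_STAGE
    return stage, score


def stage_target(stage, max_stage=4.0):
    """Linear scaling of the stage. Assumption: stages span 0..max_stage."""
    return stage / max_stage


def score_sqrt_target(score):
    """First method: square root of the score, divided by 12.5."""
    return math.sqrt(score) / 12.5


def score_log_target(score):
    """Second method: log(score + 1) divided by log(150)."""
    return math.log(score + 1) / math.log(150)
```

For instance, `fill_missing(2, None)` yields a score of 10.5, and either transform maps a score of 0 to a desired output of 0.0.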

Using the holdout method, networks were trained for the stage, the square root of the score, and the logarithm of the score. These networks were trained with 45 variables. The results were compared to decide which variable and preprocessing to use for the remainder of this example. The logarithm of the score was selected.

At this point, the procedure for isolating the set of important variables was started. Eight networks were trained on the full training set, and a consensus sensitivity analysis was generated to produce a first ranking of the variables. A chi-square contingency table was then generated to produce a second ranking of the variables. The procedure for isolating the important variables was begun manually but proved too time consuming. It was therefore implemented as a computer program, which ran for about one week.
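One way a chi-square contingency table can yield a ranking of variables is sketched below. The specification does not give the details of its chi-square analysis, so this is an assumption-laden illustration: it treats each variable and the outcome as binary (0/1), computes the Pearson chi-square statistic of the resulting 2x2 table, and ranks variables by that statistic. The function names and the dictionary input format are hypothetical.

```python
def chi_square_2x2(x, y):
    """Pearson chi-square statistic for two equal-length sequences of 0/1 values."""
    n = len(x)
    # Observed counts: table[i][j] = number of cases with x == i and y == j.
    table = [[0, 0], [0, 0]]
    for xi, yi in zip(x, y):
        table[xi][yi] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def rank_by_chi_square(columns, outcome):
    """Return variable names ordered from strongest to weakest association."""
    stats = {name: chi_square_2x2(values, outcome) for name, values in columns.items()}
    return sorted(stats, key=stats.get, reverse=True)
```

A variable that matches the outcome in every case receives the largest statistic and is ranked first, while a variable distributed evenly across the outcome classes receives a statistic of zero.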

Using the results of the variable selection, a set of eight networks was trained on the full training set. The consensus results were analyzed and compared with the Endo presence results.

Results.
The sensitivity analysis of all 45 variables gave the following ranking of the variables:

The chi-square analysis gave the following ranking of the variables:

The variables selected during the variable selection procedure are as follows, shown with their ranking from the final sensitivity analysis.

The score network can be compared with the Endo presence network by applying a threshold to the desired score output to produce an Endo presence comparison. The results for the score and pat07 networks are shown below.

Conclusion.
The set of variables identified in this example appears reasonable.

The automated variable selection method appears to work adequately. The selection of variables is well predicted by the sensitivity analysis.

Because there are two ways to predict the disease, the Endo presence network and the Score network can be combined to improve the reliability of the prediction.

<Example 8>
Patient history Adhesions training.

Requirements.
The method outlined in Example 7 is used to identify the relevant variables for the Adhesions target output variable. This target output variable is run using the Phase 1 list of important patient history variables, which also allows the new output to be compared with the Endo presence target variable used in Phase 1. After the list of important variables is selected, a consensus of eight neural networks is trained on the 510-patient database.

Method used.
Training data for the adhesions variable were generated as in Example 7. The adhesions variable was used to generate two output variables in the same manner as for Endo presence. At this point, the procedure for isolating the set of important variables was started. A set of eight networks was trained on the full training set, and a consensus sensitivity analysis was generated to produce a first ranking of the variables. A chi-square contingency table was then generated to produce a second ranking of the variables. The procedure for isolating the important variables was begun manually but proved too time consuming. It was therefore implemented as a computer program, which took about one week of computer time to complete.

The chi-square analysis gave the following ranking of the variables:

Conclusion.
The set of variables identified in this example appears reasonable. The automated variable selection method appears to work adequately. The selection of variables is well predicted by the sensitivity analysis.

<Example 9>
This example demonstrates the reproducibility of the process provided herein.

Method used.
The software used to select the important variables for Adhesions and Score was modified to handle Endo presence as the desired output. The software was further modified so that it can be run in the general case without recompiling for each specific test.

A run was made for the Endo presence variable in the same way as the runs for Adhesions and score. This included using a consensus of four networks during the variable selection process. The training data were divided into five partitions during training, producing a total of 20 networks to evaluate each current set of variables under test.

The results of runs with different random seeds showed that the number of networks in the consensus needed to be increased.

Two additional variable selection runs were made using a consensus of 10 networks in the process. In this case, a total of 50 networks are trained to evaluate a single combination of variables. The two separate runs were made identically, changing only the random starting seed.
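The network counts quoted above follow from multiplying the number of partitions by the number of networks in the consensus. The sketch below shows that arithmetic together with one simple way of forming a consensus output. The function names are hypothetical, and averaging the network outputs is an assumption about how the consensus is formed rather than a detail stated in this passage.

```python
def networks_per_evaluation(partitions, consensus_size):
    """Networks trained to evaluate one combination of variables."""
    return partitions * consensus_size


def consensus_output(outputs):
    """Assumed consensus rule: the mean of the individual network outputs."""
    return sum(outputs) / len(outputs)


# Five partitions with a consensus of four networks, then of ten networks.
four_net_total = networks_per_evaluation(5, 4)
ten_net_total = networks_per_evaluation(5, 10)
```

With five partitions, a consensus of four networks gives 20 networks per evaluated variable set and a consensus of ten gives 50, matching the totals in the text.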

From these last two variable selection runs, a set of eight networks was trained on each variable set (pat08, pat09), making it possible to evaluate their performance on new data (not included in the original 510-record database). Performance statistics for these networks were generated so that they can be compared with the original pat07 consensus net.

Results.
In each case using a different random seed, the variable selection process found a different set of important variables. When the number of networks in the consensus was increased to 10, the number of variables common to the different runs increased.

Many of the original 14 variables used for pat07 were confirmed as important in the variable selection runs using the 10-network consensus. The final runs made on the selected variables are called pat08 and pat09.

The variables used in the pat08 and pat09 consensus networks are listed below with their sensitivity analysis rankings.

Conclusion.
The variable selection process worked well and produced two alternative networks that perform as well as or better than the pat07 net. The reason for this conclusion is that the performance statistics generated on the training data alone appear slightly better for pat07 than for pat08 and pat09. Because the variable selection process chooses variables carefully on the basis of test set performance, the associated networks are unlikely to be overtrained. The typical sign of an overtrained network is that performance on the training examples improves while performance on the test set declines. The higher performance of pat07 may therefore be the result of slight overtraining.

The variable selection process clearly produced two alternative selections for the same training data, but the performance of the two selections appears very similar. This is based on the test set performance of the final variable selection in the two runs. It became clear that when the relative performance of two variables is close, random factors can affect their relative ranking. The random factors in a variable selection run include the random starting point and the use of noise added to the inputs during training. Random noise has been found to aid better generalization (that is, test set performance). As the number of networks in the consensus increases, the influence of random effects decreases.

Determining a set of variables that produces a high-quality network appears to be handled by the variable selection process. As more well-performing combinations of variables are enumerated, it becomes clear which particular variables or combinations of variables are essential for good performance.

<Example 10>
Evaluation of the effect on diagnostic performance of excluding past history of endometriosis and history of pelvic surgery.
The purpose of this example is to determine the importance of the “past history of endometriosis” and “past history of pelvic surgery” variables in assessing a patient's risk of having endometriosis, and to provide an alternative means (distinct from sensitivity analysis) of measuring the importance of any given variable in predicting the outcome.

Tasks:
1. Apply the variable selection process with “past history of endometriosis” excluded.
2. Repeat task (1) using a different random seed for the variable selection process.
3. Complete the consensus network training process for both sets of “endometriosis-related variables” identified in tasks (1) and (2) above.
4. Repeat tasks (1), (2), and (3) above with the “past history of pelvic surgery” variable excluded from the endometriosis database.
5. Repeat tasks (1), (2), and (3) above with both the “past history of endometriosis” and “past history of pelvic surgery” variables excluded from the endometriosis database.

Method used.
The variable selection software developed in Example 9 was used as the basis for generating the results for each of the tasks in Example 10. The software was modified so that the user can identify variables to be excluded from consideration, as required by Example 10. It was also modified to report the classification performance for each set of variables tested, so that the effect of a removed variable can be understood more easily.

For each variable selection run, the parameters of the variable selection process were set as follows.

[Table 11]
――――――――――――――――――――――――――――――
Number of partitions: 5
Consensus networks: 10
Training example size: 510
Number of passes: 999
――――――――――――――――――――――――――――――

The ordering of the database variables during the variable selection process is based on the sensitivity analysis and the chi-square analysis. This ordering is similar to that used for pat08 and pat09.
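The variable selection process summarized in the abstract (evaluate each candidate together with the current set of important variables, keep the best candidate if it improves performance, and stop when it no longer does) can be sketched as a greedy forward search. In this sketch the names `select_variables`, `evaluate`, and `excluded` are hypothetical. In the examples, the evaluation step corresponds to training a consensus of networks on each partition and measuring test set performance; here it is an arbitrary caller-supplied function, and a toy scoring function with made-up weights stands in for it.

```python
def select_variables(ranked_candidates, evaluate, excluded=()):
    """Greedy forward selection.

    ranked_candidates: variable names in their initial ranked order.
    evaluate: function mapping a list of variables to a performance score.
    excluded: variables removed from consideration (as in Example 10).
    """
    candidates = [v for v in ranked_candidates if v not in excluded]
    selected = []
    best_score = float("-inf")  # assumption: the first variable is always accepted
    while candidates:
        # Evaluate each remaining candidate together with the current selection.
        scored = [(evaluate(selected + [v]), v) for v in candidates]
        # max() keeps the first maximum, so ties go to the higher-ranked candidate.
        top_score, top_var = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # the best candidate no longer improves performance
        selected.append(top_var)
        candidates.remove(top_var)
        best_score = top_score
    return selected, best_score


# Toy stand-in for network training: made-up weights, small penalty per variable.
TOY_WEIGHTS = {"pain": 0.10, "history": 0.05, "noise": 0.0}


def toy_evaluate(variables):
    return 0.5 + sum(TOY_WEIGHTS[v] for v in variables) - 0.01 * len(variables)


chosen, score = select_variables(["history", "pain", "noise"], toy_evaluate)
without_pain, _ = select_variables(["history", "pain", "noise"], toy_evaluate,
                                   excluded=("pain",))
```

Passing a variable in `excluded` mirrors the software modification described in this example, where the user identifies variables to be removed from consideration before selection is run.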

The networks trained for this example are identified as follows (the two nets in each pair have different random seeds).

[Table 12]
――――――――――――――――――――――――――――――
Past history of Endo removed: pat10, pat11
Past history of pelvic surgery removed: pat12, pat13
Both variables removed: pat14, pat15
――――――――――――――――――――――――――――――

After the variable selection process was completed for each combination of variables and random seed, a set of eight networks was trained using the selected variables. Each of these networks was trained on the complete 510-record database. From these training runs, a consensus of the outputs was generated in an Excel spreadsheet so that the performance of each network could be evaluated.

Results.
The typical performance of a consensus of networks was estimated using the holdout method with five partitions. When all variables are available, as for pat08 and pat09, the classification performance was estimated at 65.23%.

When the past history of endometriosis variable was removed from consideration (pat10 and pat11), performance was estimated at 62.47%. This corresponds to a drop of 2.76%.

When the past history of pelvic surgery variable was removed from consideration (pat12 and pat13), performance was estimated at 64.52%. This corresponds to a drop of only 0.72%.

When both variables were removed from consideration (pat14 and pat15), performance was estimated at 62.43%. This corresponds to a drop of 2.80%. This is only slightly worse than when past history of endometriosis alone was removed, and it appears consistent with the other results under the assumption that the variables are independent (uncorrelated).

Conclusion.
The neural network uses the history of pelvic surgery when it is available, but the effect of removing this variable was minimal. The neural network appears able to compensate for the removal of this variable by using other information.

Removal of the past history of endometriosis is significant. This variable is always at the top of the list in any sensitivity analysis. Its removal caused a drop of about 2.76% from the average performance obtained when all variables are available. Given that the average performance is estimated at 65.23% and that 50% can be achieved by chance, this corresponds to an effective reduction of 18.12%.
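The effective reduction quoted above expresses the drop in performance as a fraction of the margin by which the full model exceeds chance. The short sketch below reproduces that arithmetic from the figures in the text; the function name and argument names are hypothetical.

```python
def effective_reduction(baseline, reduced, chance=50.0):
    """Performance drop as a percentage of the baseline's margin over chance."""
    return 100.0 * (baseline - reduced) / (baseline - chance)


# Figures from the text: 65.23% with all variables, 62.47% without the
# past history of endometriosis, and 50% achievable by chance.
endo_history_effect = round(effective_reduction(65.23, 62.47), 2)
```

Here the drop of 2.76 points is divided by the 15.23-point margin over chance, which gives the 18.12% stated in the text.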

When both variables were removed, no significant further degradation appeared, indicating that there is no interaction between these two variables. This process of removing a variable and then running the variable selection process appears to be a good way of determining the true value of a given variable. Note that when two variables are important to the diagnosis but highly correlated, removing only one of them has little effect, because the network compensates by using the other. Their value becomes apparent only when both are removed.

<Example 11>
Evaluation of the effect on diagnostic performance of removing pelvic pain and dysmenorrhea.

Requirements.
Purpose:
1. To determine the importance of the “pelvic pain” and “dysmenorrhea” variables in assessing a patient's risk of having endometriosis.
2. To provide a separate mechanism (distinct from sensitivity analysis) for measuring the importance of any given variable in predicting the outcome.
Tasks:
1. Apply the variable selection process described herein.
2. Repeat task (1) using a different random seed for the variable selection process.
3. Complete the consensus network training process for both sets of “endometriosis-related variables” identified in tasks (1) and (2) above.
4. Repeat tasks (1), (2), and (3) above with the “dysmenorrhea” variable excluded from the endometriosis database.
5. Repeat tasks (1), (2), and (3) above with both the “pelvic pain” and “dysmenorrhea” variables excluded from the endometriosis database.

Method used.
The variable selection software developed in Example 9 was used as the basis for generating the results for each of these tasks.

[Table 13]
――――――――――――――――――――――――――――――
Number of partitions: 5
Consensus networks: 10
Training example size: 510
Number of passes: 999
――――――――――――――――――――――――――――――

The ordering of the database variables during the variable selection process is based on the sensitivity analysis and the chi-square analysis. This ordering is similar to that used for pat08 and pat09. The networks trained for this task are identified as follows (the two nets in each pair have different random seeds).

[Table 14]
――――――――――――――――――――――――――――――
Pelvic pain removed: pat16, pat17, pat17A
Dysmenorrhea removed: pat18, pat19
Both variables removed: pat20, pat21
Four variables (EXs. 11 and 12): pat22, pat23, pat23A
――――――――――――――――――――――――――――――

When the pelvic pain variable was removed from consideration (pat16 and pat17), performance was estimated at 61.03%. This corresponds to a drop of 4.20%.

When the dysmenorrhea variable was removed from consideration (pat18 and pat19), performance was estimated at 63.44%. This corresponds to a drop of only 1.79%.

When both variables were removed from consideration (pat20 and pat21), performance was estimated at 61.22%. This corresponds to a drop of 4.00%. This is better than when pelvic pain alone was removed, which implies that the drop in performance attributed to pelvic pain is overstated. The best-performing network without pelvic pain has a performance of 62.29%, a drop of 2.94%. Given the performance when both variables are removed, this is the more reasonable estimate.

Conclusion.
With the four variables tested, ranking the variables in order of importance gives the following:

[Table 15]
――――――――――――――――――――――――――――――
Pelvic pain: 2.94 to 4.20% drop
Past history of endo: 2.76% drop
Dysmenorrhea: 1.79% drop
Past history of pelvic surgery: 0.72% drop
――――――――――――――――――――――――――――――

This process of removing a variable and then running the variable selection process is a good way of determining the value of a given variable. Note that when two variables are important to the diagnosis but highly correlated, removing only one of them has little effect, because the network compensates by using the other. Their true value becomes apparent only when both are removed.

<Example 12>
Training neural networks to distinguish mild from severe endometriosis.
Purpose:
1. To train a consensus of networks that distinguishes minimal/mild endometriosis from moderate/severe endometriosis.
Tasks:
1. Train the networks on the AFS score as follows:
Positive = Endo Stage III or IV
Negative = no Endo, or Endo Stage I or II
2. Apply to the endometriosis database the variable selection process described in the method for developing medical and biochemical tests using neural networks.
3. Repeat task (2) using a different random seed for the variable selection process.
4. Before proceeding, compare the variables selected in (2) and (3) above. If the selected sets of variables differ substantially, repeat task (2) using different random seed weights.
5. Train the final consensus networks on the variables selected in (2) and (3) above.
6. Repeat steps (2) to (5) using only the subset of the endometriosis database in which Endo was present in the patient.
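The target definition in task (1) and the subset used in task (6) can be sketched as follows. The record layout (a dictionary with `endo_present` and `stage` fields) and the function names are hypothetical and are used only for illustration; the positive and negative class definitions are those given in the task list.

```python
def severity_label(endo_present, stage):
    """1 = positive (Endo Stage III or IV); 0 = negative (no Endo, Stage I or II)."""
    return 1 if endo_present and stage in (3, 4) else 0


def endo_present_subset(records):
    """Records in which Endo was present, as used for task (6)."""
    return [r for r in records if r["endo_present"]]


# Hypothetical records for illustration only.
records = [
    {"endo_present": False, "stage": 0},
    {"endo_present": True, "stage": 1},
    {"endo_present": True, "stage": 3},
    {"endo_present": True, "stage": 4},
]
labels = [severity_label(r["endo_present"], r["stage"]) for r in records]
subset = endo_present_subset(records)
```

On the full database the negative class includes patients without Endo, whereas on the Endo-present subset it contains only Stage I and II cases, so the two training sets pose somewhat different classification tasks.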

Method used.
The variable selection software developed in Example 10 and modified in Example 11 was used as the basis for generating the results for each of the tasks in this example.

[Table 16]
――――――――――――――――――――――――――――――
Number of partitions: 5
Consensus networks: 20
Training example size: 510 (290 in step (6))
Number of passes: 999
――――――――――――――――――――――――――――――

The ordering of the database variables during the variable selection process is based on a sensitivity analysis and a chi-square analysis run specifically for the new target output described in Example 1. The networks trained for this example are identified as follows (the two nets in each pair have different random seeds).

[Table 17]
――――――――――――――――――――――――――――――――――――
Nets trained on the full database: AFS01 and AFS02
Nets trained on the Endo-present subset: AFSEP1 and AFSEP2
――――――――――――――――――――――――――――――――――――

After the variable selection process had been completed for each combination of variables and random seed, a set of eight networks was trained using the selected variables so identified. Each of the networks for the AFS01 and AFS02 variables was trained on the full 510-record database. Each of the networks for the AFSEP1 and AFSEP2 variables was trained on the 291 records in which the endo-present variable is positive. From these training runs, a consensus of the outputs was generated in an Excel spreadsheet so that the performance of each network could be evaluated.

Results.
Fewer variables were found in the runs on the reduced subset than in the runs on the full training set. The typical performance of the consensus of networks was estimated using the holdout method with five partitions. The typical classification performance for the AFS runs using the full training set was 77.22549%. The typical classification performance for the endo-present subset was 63.008621%. If all examples were classified as negative, performance would be 78.82% for the full training set and 65.29% for the subset. By changing the cutoff value that separates positive from negative classifications, performance better than these figures suggest can be achieved.
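
The comparison made above, between the estimated accuracy (77.23%) and the accuracy of calling every example negative (78.82%), and the remark about changing the cutoff, can be illustrated with a short Python sketch. The scores and labels below are invented for illustration only, and the helper names `accuracy_at_cutoff` and `all_negative_accuracy` are hypothetical. The sketch only shows how accuracy depends on the cutoff when the classes are unbalanced.

```python
def accuracy_at_cutoff(scores, labels, cutoff):
    """Fraction classified correctly when score >= cutoff is called positive."""
    hits = sum((s >= cutoff) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def all_negative_accuracy(labels):
    """Accuracy obtained by calling every example negative."""
    return sum(1 for y in labels if not y) / len(labels)


# Hypothetical consensus outputs and true classes (1 = positive).
scores = [0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

baseline = all_negative_accuracy(labels)
by_cutoff = {c: accuracy_at_cutoff(scores, labels, c) for c in (0.5, 0.6)}
print(baseline, by_cutoff)
```

With these invented values, moving the cutoff from 0.5 to 0.6 removes one false positive, which is the kind of adjustment the passage alludes to.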

Conclusion.
The results of the variable selection runs on the full training set and on the endo-present subset show that the size of the training set matters in determining which variables are important. Clearly, as the training set grows, more variables come to be considered important. This result can also be read as indicating that more training data would improve the overall performance of the variable selection process and of the consensus networks used to build the diagnostic test.

<Example 13>
Variable selection, development of neural nets for predicting pregnancy-related events, and improvement of the performance of the fetal fibronectin test.
Data were collected from more than 700 test patients enrolled in clinical trials of the assay described in US Pat. No. 5,468,619. Variable selection was performed without the fetal fibronectin (fFN) test data. The final networks, designated EGA1 to EGA4, were trained on the variables shown in the table below.

EGA1 to EGA4 denote the neural networks used for variable selection. For EGA1, the variable selection protocol was carried out on a network architecture with eight inputs in the input layer, three processing elements in the hidden layer, and one output in the output layer. EGA2 is the same as EGA1 except that it has nine inputs in the input layer. EGA3 has seven inputs in the input layer, three processing elements in the hidden layer, and one output in the output layer. EGA4 is described as the same as EGA1 except that it has eight inputs in the input layer.

The selected variables are as follows.

EGA = estimated gestational age.

Performance of the final consensus networks.

EGA = estimated gestational age (less than 34 weeks); TP = true positives; TN = true negatives; FP = false positives; FN = false negatives; SN = sensitivity; SP = specificity; PPV = positive predictive value; NPV = negative predictive value; OR = odds ratio (total correct / total correct answers); fFN = result of the ELISA assay for fFN.
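
For readers unfamiliar with the abbreviations above, the following Python sketch computes sensitivity, specificity, and the two predictive values from the four confusion-matrix counts using their standard definitions. The function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the study. The odds ratio is omitted because its definition as given above is unclear.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard diagnostic-test summary statistics from confusion-matrix counts."""
    return {
        "SN": tp / (tp + fn),    # sensitivity: positives correctly detected
        "SP": tn / (tn + fp),    # specificity: negatives correctly ruled out
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=40, tn=120, fp=30, fn=10)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

Note that a reduction in false positives raises both SP and PPV while leaving SN unchanged, provided the true-positive count is unaffected.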

These results show that network EGA4, a neural net that includes seven patient variables and the fFN ELISA assay and that predicts delivery at less than 34 weeks, has far fewer false positives than the fFN ELISA assay alone. Specifically, the number of false positives was reduced by 50%. Incorporating the fFN test into a neural net improved on the performance of the fFN ELISA assay. All of the neural nets performed better than the fFN test alone. Thus, the methods herein can be used to develop neural nets and other decision support systems that can be used to predict pregnancy-related events.

<Example 14>
Training consensus neural networks on specific subsets of the pat07 variables.
This example shows the results of tasks designed to gauge the contribution of the individual pat07 variables to pat07 performance and to develop endometriosis networks using the minimum number of pat07 variables.

Tasks:
1. Train final consensus networks using the following combinations of pat07 variables:
a. All 14 minus history of Endo (13 variables in total)
b. All 14 minus pelvic pain (13 variables in total)
c. All 14 minus dysmenorrhea (13 variables in total)
d. All 14 minus pelvic surgery (13 variables in total)
2. Train final consensus networks using other combinations of pat07 variables:
a. History of Endo, pelvic pain, and dysmenorrhea
b. History of Endo, pelvic pain, dysmenorrhea, and history of pelvic surgery
3. Train final consensus networks using other combinations of pat07 variables suggested by the above results.

Method used.
Using the original patient database, a training set was generated for each combination of variables to be evaluated. Each training set contains only the variables needed for the given consensus run. TrainDos™ was used in batch mode to train a set of eight neural networks for each combination of variables to be evaluated. The networks were trained using the same parameters as the pat07 training runs; the only difference was the random number seed set for each network. Each network was trained on the full 510-record database. From these training runs, a consensus of the outputs was generated in an Excel spreadsheet so that the performance of each network could be evaluated.

Results.
Because these runs are final training runs, the effect of removing a variable can be seen, but the runs do not give as clear an indication as can be obtained with the holdout method.

Conclusion.
Using the results of runs on the full training set to determine the contribution of a given set of variables is not as good a method as the evaluation method used in the variable selection process. The "holdout" method of evaluation, with 5 partitions and a consensus of 20 nets, gives considerably better statistics for comparing variables.
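
As a rough illustration of the holdout evaluation named above (5 partitions, consensus of 20 nets), the Python sketch below holds out each partition in turn, trains several models on the remaining records with different seeds, and scores the held-out records with the averaged output. Everything here is an assumption made for illustration: `holdout_consensus_accuracy` and `toy_train` are hypothetical names, `toy_train` is a trivial threshold learner standing in for neural network training, the data are synthetic, and the 0.5 decision cutoff is arbitrary.

```python
import random


def holdout_consensus_accuracy(records, labels, train_fn,
                               n_partitions=5, n_nets=20, seed=0):
    """Hold out each partition in turn and score it with the averaged output
    of n_nets models trained on the remaining partitions."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    partitions = [order[k::n_partitions] for k in range(n_partitions)]
    correct = 0
    for k, held_out in enumerate(partitions):
        train = [i for j, part in enumerate(partitions) if j != k for i in part]
        xs = [records[i] for i in train]
        ys = [labels[i] for i in train]
        # Same data and settings for every net; only the seed differs.
        nets = [train_fn(xs, ys, net_seed) for net_seed in range(n_nets)]
        for i in held_out:
            consensus = sum(net(records[i]) for net in nets) / n_nets
            correct += int((consensus >= 0.5) == bool(labels[i]))
    return correct / len(records)


def toy_train(xs, ys, net_seed):
    """Hypothetical stand-in for network training: a noisy midpoint threshold."""
    rng = random.Random(net_seed)
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    threshold += rng.uniform(-0.04, 0.04)
    return lambda x: 1.0 if x > threshold else 0.0


# Synthetic, well-separated one-variable data: 30 negatives and 30 positives.
records = [i / 100 for i in range(30)] + [0.70 + i / 100 for i in range(30)]
labels = [0] * 30 + [1] * 30
estimate = holdout_consensus_accuracy(records, labels, toy_train)
print(estimate)
```

Because every record is scored only by models that never saw it, the estimate reflects performance on unseen data, which is why it compares variable sets more reliably than accuracy measured on the training records themselves.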

<Example 15>
Method and apparatus for aiding in the diagnosis of endometriosis using a plurality of parameters suited to analysis by neural networks (pat07).
FIG. 7 is a schematic diagram of one embodiment of one type of neural network 10, trained on clinical data, of the form used in a consensus network of a plurality of neural networks (FIG. 10). The structure is stored in digital form, together with the weight values and the data to be processed, in a digital computer. This first-type neural network 10 has three layers: an input layer 12, a hidden layer 14, and an output layer 16. The input layer 12 has 14 input preprocessors 17-30, each provided with a normalizer (not shown) that generates the mean and standard deviation values used to weight the clinical factors input to the input layer. The mean and standard deviation values are specific to the network training data. The input-layer preprocessors 17-30 are each coupled, via paths 51-64 and 65-78, to the first and second processing elements 48, 50 of the hidden layer 14, so that each hidden-layer processing element 48, 50 receives a value or signal from every input preprocessor 17-30. Each path carries a unique weight resulting from training on the training data. The unique weights 80-93 and 95-108 are nonlinearly related to the output and are specific to each network structure and to the initial values of the training data. The final values of the weights depend on the initialization values assigned for network training. The combination of weights resulting from training constitutes a functional apparatus whose description, as expressed in the weights, produces the desired solution, or more specifically a preliminary indicator of a diagnosis of endometriosis.

In the endometriosis test provided herein, the factors used to train the neural network, and on which the output is based, are: past history of the disease, number of births, dysmenorrhea, age, pelvic pain, history of pelvic surgery, amount smoked per day, medication history, number of pregnancies, number of miscarriages, abnormal PAP/dysplasia, pregnancy-induced hypertension, genital warts, and diabetes. These 14 factors were determined to be the most influential (greatest sensitivity) set from an original set of more than 40 clinical factors. (Other sets of influential factors have also been derived; see the examples above.)

The hidden layer is biased by bias weights 94, 119 supplied to processing elements 48 and 50 via paths 164 and 179. The output layer 16 contains two output processing elements 120, 122. The output layer 16 receives input from both hidden-layer processing elements 48, 50 via paths 123, 124 and 125, 126. The inputs to the output-layer processing elements 120, 122 are weighted by weights 110, 112 and 114, 116. The output layer 16 is biased by bias weights 128, 130 supplied to processing elements 120 and 122 via paths 129 and 131.
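
The 14-input, 2-hidden-element, 2-output structure described above can be sketched as a forward pass in Python. This is a minimal sketch under stated assumptions: the normalizer is taken to be a z-score (subtract the mean, divide by the standard deviation), the activation is taken to be a sigmoid, and all weights and statistics below are placeholders rather than the trained values. The names `forward` and `sigmoid` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(factors, means, stds, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: normalize inputs, then hidden layer, then output layer."""
    # Input preprocessors: z-score each clinical factor (assumed normalization).
    z = [(x - m) / s for x, m, s in zip(factors, means, stds)]
    # Hidden layer: each element sums its weighted inputs plus a bias weight.
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, z)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    # Output layer: each element sums the weighted hidden outputs plus a bias.
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)
            for ws, b in zip(out_w, out_b)]


n_inputs = 14
means, stds = [0.0] * n_inputs, [1.0] * n_inputs    # placeholder statistics
hidden_w = [[0.1] * n_inputs, [-0.1] * n_inputs]    # placeholder weights
hidden_b = [0.0, 0.0]
out_w = [[1.0, -1.0], [-1.0, 1.0]]
out_b = [0.0, 0.0]

a, b = forward([1.0] * n_inputs, means, stds, hidden_w, hidden_b, out_w, out_b)
print(a, b)
```

The parameter count matches the description: 14 weights plus one bias for each of the two hidden elements, and two weights plus one bias for each of the two output elements.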

The preliminary indicator of the presence, absence, or severity of endometriosis is the output pair of values A and B from the two processing elements 120, 122. These values are always positive and lie between 0 and 1. One of the indicators signals that endometriosis is present. The other signals that endometriosis is absent. Although the output pair A, B generally provides a valid indication of disease, a consensus network of trained neural networks provides a more reliable indicator.

Referring to FIG. 10, the final indicator pair C, D is based on an analysis of the consensus of the preliminary indicator pairs from a plurality of trained neural networks, specifically eight networks 10A to 10H (FIG. 10). The members of each preliminary indicator pair A, B are supplied via paths 133-140 and 141-148 to one of two consensus processors 150, 152. The first consensus processor 150 processes all of the positive indicators. The second consensus processor 152 processes all of the negative indicators. Each consensus processor 150, 152 is an averager, that is, it simply forms a linear combination, such as an average, of the set of like preliminary indicators A or B. The resulting confidence indicator pair is the desired result, where the input is the set of clinical factors for the patient under test.
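
The consensus step described above can be sketched in a few lines of Python: the positive indicators A from all networks are averaged to give C, and the negative indicators B are averaged to give D. The function name `consensus` and the eight example pairs are hypothetical, and a plain unweighted average is assumed as the linear combination.

```python
def consensus(pairs):
    """Average the positive indicators (A) and the negative indicators (B)
    over all networks, giving the final indicator pair (C, D)."""
    n = len(pairs)
    c = sum(a for a, _ in pairs) / n   # first consensus processor
    d = sum(b for _, b in pairs) / n   # second consensus processor
    return c, d


# Hypothetical preliminary indicator pairs from eight networks.
pairs = [(0.7, 0.3), (0.8, 0.2), (0.6, 0.4), (0.9, 0.1),
         (0.7, 0.3), (0.8, 0.2), (0.6, 0.4), (0.7, 0.3)]
c, d = consensus(pairs)
print(round(c, 3), round(d, 3))
```

Averaging across networks that differ only in their random initialization reduces the influence of any single network's idiosyncrasies on the final indicator.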

FIG. 9 shows a representative processor element 120. The similar processors 48 and 50 have more input elements, and processor element 122 is substantially identical. The representative processor element 120 comprises a plurality of weight multipliers 110, 114, 128, one on each input path (numbering 15, 16, or 3 in total per element, and shown here as part of processor element 120). The weighted values from the weight multipliers are fed to an adder 156. The output of the adder 156 is fed to an activation function 158, such as a sigmoid transfer function or an arctangent transfer function. The processor elements may be implemented as dedicated hardware or as software functions.

A sensitivity analysis can be performed to determine the relative importance of the clinical factors. The sensitivity analysis is performed on a digital computer as follows. The trained neural network is run in forward mode (no training) on each training example (a set of input data for which the true output is known or presumed). The output of the network for each training example is then recorded. Next, each input variable in turn is replaced by the average value of that variable across all training examples, and the network is rerun. The difference between the two values of each output is then squared and summed (accumulated) to give a separate total for each variable.

This sensitivity analysis process is carried out for every training example. Each resulting total is then normalized in a conventional manner so that the normalized value would be 1.0 if all variables contributed equally to the single resulting output. From this information, the normalized values can be ranked in order of importance.
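
The sensitivity analysis described above can be sketched in Python as follows. This is one reading of the procedure, not a definitive implementation: the model is assumed to return a single output value, the normalization divides each total by the mean of the totals (so that equal contributions would each give 1.0), and the names `sensitivity_ranking` and `toy_model` and the example data are hypothetical.

```python
def sensitivity_ranking(predict, examples):
    """Rank input variables by how much replacing each one with its mean
    changes the model output, accumulated over all training examples."""
    n_vars = len(examples[0])
    means = [sum(ex[j] for ex in examples) / len(examples) for j in range(n_vars)]
    totals = [0.0] * n_vars
    for ex in examples:
        normal = predict(ex)               # forward mode, no training
        for j in range(n_vars):
            modified = list(ex)
            modified[j] = means[j]         # replace one variable by its mean
            totals[j] += (normal - predict(modified)) ** 2
    # Assumed normalization: 1.0 for every variable if all contribute equally.
    mean_total = sum(totals) / n_vars
    normalized = [t / mean_total for t in totals]
    order = sorted(range(n_vars), key=lambda j: normalized[j], reverse=True)
    return normalized, order


def toy_model(x):
    """Hypothetical trained model: depends strongly on x[0], weakly on x[1]."""
    return 0.6 * x[0] + 0.1 * x[1] + 0.0 * x[2]


examples = [[0.0, 0.0, 5.0], [1.0, 2.0, 5.0], [2.0, 4.0, 5.0]]
normalized, order = sensitivity_ranking(toy_model, examples)
print(order, [round(v, 3) for v in normalized])
```

A variable that never varies in the training examples, or that the model ignores, receives a total of zero and ranks last. The sketch does not guard against the degenerate case in which every total is zero.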

In the analysis of the clinical data, the order of sensitivity of the factors for this neural network system was determined to be: past history of the disease, number of births, dysmenorrhea, age, pelvic pain, history of pelvic surgery, amount smoked per day, medication history, number of pregnancies, number of miscarriages, abnormal PAP/dysplasia, pregnancy-induced hypertension, genital warts, and diabetes.

A particular neural network system has been trained and found to be an effective diagnostic tool. The neural network system shown in FIGS. 7 and 10 is described as follows.

[Table 18]
――――――――――――――――――――――――――――――
0. Bias
1. Age
2. Diabetes
3. Pregnancy-induced hypertension
4. Amount smoked per day
5. Number of pregnancies
6. Number of births
7. Number of miscarriages
8. Genital warts
9. Abnormal PAP/dysplasia
10. History of endometriosis
11. History of pelvic surgery
12. Medication history
13. Pelvic pain
14. Dysmenorrhea
――――――――――――――――――――――――――――――

The weights, listed in the identification order given above rather than in order of sensitivity, are as follows for each of the eight first-type neural networks 10.

First neural network A

First neural network B

First neural network C

First neural network D

First neural network E

First neural network F

First neural network G

First neural network H

Normalized observation values for the first-type neural network

In addition, as provided herein, the results of biochemical tests, such as tests in ELISA format, can be used to produce a trained, augmented neural network system that yields a relatively high level of confidence in terms of sensitivity and specificity. A second-type neural network of this kind is shown in FIG. 8. The numbering is the same as in FIG. 7, except for the addition of node 31 in the input layer 12 and the pair of weights 109 and 111. However, all of the weights in the network change when it is trained with the additional biochemical results. The exact set of weights depends on the particular biochemical test training examples.

The training system provided herein can be used. Alternative training techniques can also be used (see, for example, Baxt, "Use of an Artificial Neural Network for the Diagnosis of Myocardial Infarction", Annals of Internal Medicine 115, p. 843 (December 1, 1991); "Improving the Accuracy of an Artificial Neural Network Using Multiple Differently Trained Networks", Neural Computation 4, p. 772 (1992)).

In evaluating the test results, it was noted that a high score correlates with the presence of the disease and a low score with its absence, and that extreme scores increase confidence whereas intermediate scores reduce it. The presence of endometriosis is indicated by an output of 0.6 or higher, and its absence by an output of 0.4 or lower. It was also noted that a higher relative score correlates with greater relative severity of the disease. The methods herein minimize the number of patients who require further procedures, often surgery, to establish the presence, absence, or severity of a disease state.
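
The two cutoffs stated above can be expressed as a small decision rule. In the Python sketch below, the function name `interpret_score` and the returned wording are hypothetical, and labelling the band between the cutoffs as "indeterminate" is an assumption; the text says only that intermediate scores carry less confidence.

```python
def interpret_score(score, present_cutoff=0.6, absent_cutoff=0.4):
    """Map a consensus output in [0, 1] to a reading based on the two cutoffs."""
    if score >= present_cutoff:
        return "endometriosis indicated"
    if score <= absent_cutoff:
        return "endometriosis not indicated"
    return "indeterminate"   # assumed label for the low-confidence middle band


for s in (0.85, 0.50, 0.20):
    print(s, interpret_score(s))
```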

Since modifications will be apparent to those skilled in the art, it is intended that the invention be limited only by the scope of the appended claims.

FIG. 1 is a flow diagram for developing a patient-history-based diagnostic test process.
FIG. 2 is a flow diagram for developing a biochemical diagnostic test.
FIG. 3 is a flow diagram of a process for isolating important variables.
FIG. 4 is a flow diagram of a process for training one neural network or a set of neural networks, including partitioning of variables.
FIG. 5 is a flow diagram for developing a biochemical diagnostic test.
FIG. 6 is a flow diagram for determining the effectiveness of a biochemical diagnostic test.
FIG. 7 is a schematic diagram of a neural network trained on clinical data, of the form used for a consensus network of a plurality of neural networks.
FIG. 8 is a schematic diagram of a second embodiment of a neural network, trained on clinical data augmented with test result data, of the form used for a consensus of eight neural networks.
FIG. 9 is a schematic diagram of the processing element at each node of a neural network.
FIG. 10 is a schematic diagram of a consensus network of eight neural networks using either the first or the second embodiment of the neural network.
FIG. 11 is an exemplary interface screen of the user interface of the diagnostic endometriosis index.

Claims

A computer system for variable selection, comprising:
(a) means for providing a first set of n candidate variables and a second set of important selected variables that is initially empty;
(b) means for taking the candidate variables one at a time and evaluating each by training a decision support system on that variable combined with the current set of important selected variables; and
(c) means for selecting the best of the candidate variables, the best variable being the one that gives the highest performance of the decision support system, and, if the best candidate variable improves performance compared with the performance of the important selected variables, adding it to the set of important selected variables, removing it from the candidate set, and continuing the evaluation using means (b) until the best candidate variable no longer improves performance.

The computer system according to claim 1, wherein said means (a) uses candidate variables obtained from a patient and comprising history data and / or biochemical data.

A computer system that generates a test to aid diagnosis, comprising:
means for selecting a set of important selected variables using the computer system of claim 1; and
means for training a decision support system using the selected final set of important selected variables to generate a diagnostic test.

The computer system according to claim 3, wherein the computer system that generates a test to aid diagnosis evaluates the likelihood of a medical condition or disorder, evaluates the likelihood that a particular condition is ongoing or will occur in the future, selects a given course of treatment, or determines the effectiveness of a treatment.

The computer system according to claim 4, wherein the condition is a condition related to pregnancy or endometriosis.

The computer system for generating a test to aid diagnosis, wherein the system assesses the presence or severity of a medical condition or determines the likely outcome of a given course of treatment.

A computer system that improves the effectiveness of diagnostic biochemical tests, comprising:
means for selecting a set of important selected variables using the computer system of claim 1; and
means for training a decision support system using the selected final set of important selected variables together with biochemical test data, to generate a test that is more effective than the biochemical test alone.

A computer system that identifies biochemical tests to aid in the diagnosis of a disorder or condition, comprising:
(a) means for selecting a set of important selected variables using the computer system of claim 1;
(b) means for identifying a set of biochemical test data, training a decision support system using the selected final set of important selected variables combined with each element of the set of biochemical test data, and evaluating the performance of the resulting system;
(c) means for repeating the training and evaluation for each element of the set of biochemical test data until all elements have been used for training; and
(d) means for selecting the element of the set of biochemical data that results in the system operating at the best performance.

A computer system for variable selection, comprising:
(a) means for providing a first set of n candidate variables and a second set of important selected variables that is initially empty;
(b) means for ranking all of the candidate variables, either arbitrarily or in an order;
(c) means for taking the m highest-ranked variables one at a time, where m is from 1 to n, and evaluating each by training a decision support system on that variable combined with the current set of important selected variables;
(d) means for selecting, from the m variables, the best variable, being the one that gives the highest performance of the decision support system, and, if the best variable improves performance compared with the performance of the important selected variables, adding it to the set of important selected variables, removing it from the candidate set, and continuing the evaluation by means (c), or, if the variable does not improve performance compared with the performance of the important selected variables, continuing the evaluation by means (e); and
(e) means for determining whether all variables of the candidate set have been evaluated.

The computer system according to claim 3 or 9, wherein the candidate variable includes biochemical test data.

The computer system of claim 9, wherein the ranking means is based on sensitivity analysis or on other decision support system based analysis.

The computer system of claim 9, wherein the ranking means is based on a process that includes statistical analysis.

The computer system according to claim 9, wherein the ranking means is based on a process including chi-square, regression analysis or discriminant analysis.

The computer system of claim 9, wherein the means for ranking is determined by a process that uses an expert, a rule-based system, a sensitivity analysis or a combination evaluation.

The computer system according to claim 11 or 14, wherein the sensitivity analysis comprises:
(i) means for determining an average observation value for each variable in the observation data set;
(ii) means for selecting a training example and running it through the decision support system to generate an output value, which is designated and stored as the normal output;
(iii) means for selecting a first variable in the selected training example, replacing its observed value with the average observation value for that variable, running the modified example through the decision support system in forward mode, and recording the output as the modified output; and
(iv) means for squaring the difference between the normal output and the modified output and accumulating it as a total, the total for each variable being designated the selected variable total for that variable.

The computer system according to claim 1, wherein the decision support system includes neural network consensus.

The computer system according to claim 1 or 9, wherein the set of n candidate variables and the set of important selected variables are each stored in a computer.

The computer system of claim 3, further comprising means for training a final decision support system based on the completed set of important selected variables to generate a decision support system based test for the condition.

The computer system according to claim 3, wherein the condition is a gynecological condition.

The computer system of claim 19, wherein the condition is selected from among infertility, pregnancy related events, and pre-eclampsia.

A computer system that develops a decision support system based test to aid in diagnosing a medical condition, disease or disorder, comprising:
(a) means for obtaining observations from a group of test patients whose medical condition is known;
(b) means for classifying the observations obtained by means (a) into a set of candidate variables having observation values and storing the observation values in a computer as an observation data set;
(c) means for selecting a subset of important selected variables from the set of candidate variables using the computer system of claim 1 or 9; and
(d) means for training a second decision support system, using the observation data corresponding to the subset of important selected variables, such that the second decision support system constitutes a decision support based diagnostic test for the condition, disease or disorder.

The computer system of claim 21, further comprising means for, after the observations are collected from the group of test patients and before the second decision support based system is trained, obtaining test results collected from biochemical tests on at least some of the test patients whose condition is known or suspected and classifying them into a set of candidate variables, which are then added to the first set of candidate variables.

The computer system of claim 22, further comprising means for identifying one or more biochemical test data variables that end up in the final subset of important selected variables, whereby the identified one or more biochemical test data variables are useful as indicators of the disorder or condition.

A computer system according to any of claims 21-23, wherein the test assesses the presence, severity or course of treatment of a disease, disorder or other medical condition.

A computer system according to any of claims 21-23, wherein the test assists in determining the outcome of a selected treatment.

A computer system according to any of claims 21-23, wherein the decision support system comprises a neural network and the final set comprises a consensus of neural networks.

A computer system according to any of claims 21-23, wherein a first subset of important selected variables is identified using a sensitivity analysis performed on a decision support based system or a consensus thereof.

A computer system according to any of claims 21 to 23, wherein the first decision support system comprises at least one neural network.

A computer system according to any of claims 21 to 23, wherein the second decision support system comprises at least one neural network.

The computer system of claim 23, further comprising means for developing a diagnostic biochemical test for one or more identified biochemical test data variables.

A computer system according to any of claims 21-23, further comprising means for collecting additional observations from the patients and classifying them into a set of candidate variables, which are then added to the first set of candidate variables.

A computer system for developing new biochemical tests or identifying new disease markers, comprising:
the computer system of claim 23;
means for identifying biochemical data variables that are important selected variables; and
means for developing tests to detect the biochemical data from which the variables are obtained or to detect the disease markers.

The computer system according to claim 21 or 22, wherein the candidate variable includes biochemical test data.

The computer system of claim 21, wherein the ranking means is based on an analysis including sensitivity analysis or other decision support system based analysis.

The computer system of claim 21, wherein the means for ranking is based on a process that includes statistical analysis.

The computer system of claim 21, wherein the ranking means is based on a process including chi-square, regression analysis, or discriminant analysis.
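As an illustration only, a chi-square based ranking of candidate variables might be sketched as follows. The sketch assumes binary (0/1) variables and a binary known outcome, and uses the standard 2x2 chi-square statistic without continuity correction; the function names and the toy data are hypothetical and are not taken from the specification.

```python
from typing import Dict, List


def chi_square_2x2(values: List[int], outcomes: List[int]) -> float:
    """Chi-square statistic for a binary variable against a binary outcome."""
    a = sum(1 for v, o in zip(values, outcomes) if v == 1 and o == 1)
    b = sum(1 for v, o in zip(values, outcomes) if v == 1 and o == 0)
    c = sum(1 for v, o in zip(values, outcomes) if v == 0 and o == 1)
    d = sum(1 for v, o in zip(values, outcomes) if v == 0 and o == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A degenerate table (an empty row or column) carries no information.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_by_chi_square(candidates: Dict[str, List[int]],
                       outcomes: List[int]) -> List[str]:
    """Return variable names ordered from strongest to weakest association."""
    scores = {name: chi_square_2x2(vals, outcomes)
              for name, vals in candidates.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical toy data: six patients with a known outcome.
outcomes = [1, 1, 1, 0, 0, 0]
candidates = {
    "variable_a": [1, 1, 1, 0, 0, 0],  # tracks the outcome exactly
    "variable_b": [1, 0, 1, 0, 1, 0],  # weakly associated
}
ranking = rank_by_chi_square(candidates, outcomes)
```

In this toy case `variable_a` is ranked ahead of `variable_b`. Regression or discriminant analysis could be substituted for the scoring function without changing the ranking step.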

The computer system of claim 21, wherein the means for ranking is determined by a process that uses an expert, rule-based system, sensitivity analysis or a combination of evaluations.

The computer system as claimed in claim 34 or 37, wherein the sensitivity analysis comprises:
(i) means for determining an average observation value for each variable in the observation data set;
(ii) means for selecting a training example and running the example through the decision support system to generate an output value that is designated and stored as the normal output;
(iii) means for selecting a first variable in the selected training example, replacing its observed value with the average observation value of that variable, running the modified example through the decision support system in forward mode, and recording the output as the modified output;
(iv) means for squaring the difference between the normal output and the modified output and accumulating it as a total, a separate total being kept for each variable and designated the selected variable total for that variable;
(v) means for applying means (iii) and (iv) to each variable in the example; and
(vi) means for applying means (ii)-(v) to each example in the data set, wherein each selected variable total represents the relative contribution of the corresponding variable to the determination of the decision support system output.

The computer system of claim 38, further comprising: (vii) means for ranking the variables according to their relative contribution to the determination of the decision support system output.
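The sensitivity analysis steps (i)-(vii) recited above can be sketched roughly as follows. This is a minimal illustration, assuming that the trained decision support system can be called as a function mapping one example (a list of observation values) to a single numeric output; `toy_model`, the function names and the example data are hypothetical stand-ins, not part of the specification.

```python
from typing import Callable, List, Sequence


def sensitivity_totals(model: Callable[[Sequence[float]], float],
                       examples: List[List[float]]) -> List[float]:
    """Accumulate, per variable, the squared change in output caused by
    replacing that variable with its average observation value."""
    n_vars = len(examples[0])
    # (i) average observation value of each variable over the data set
    averages = [sum(ex[j] for ex in examples) / len(examples)
                for j in range(n_vars)]
    totals = [0.0] * n_vars
    for example in examples:                 # (vi) every example
        normal = model(example)              # (ii) normal output
        for j in range(n_vars):              # (v) every variable
            modified = list(example)
            modified[j] = averages[j]        # (iii) substitute the average
            diff = normal - model(modified)  # forward pass on modified example
            totals[j] += diff * diff         # (iv) accumulate squared difference
    return totals


def rank_variables(totals: List[float]) -> List[int]:
    """(vii) variable indices ordered by decreasing relative contribution."""
    return sorted(range(len(totals)), key=lambda j: totals[j], reverse=True)


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained network: variable 0 dominates,
    # variable 2 is ignored entirely.
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


examples = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 0.0, 1.0]]
totals = sensitivity_totals(toy_model, examples)
ranking = rank_variables(totals)
```

With this stand-in model the totals are 72, 2 and 0, so the variables are ranked in the order 0, 1, 2. A variable whose total is near zero has little influence on the output and is a natural candidate for removal.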

A computer system according to any one of claims 21 to 23, wherein the means for training the second decision support system comprises validation means for running a set of previously unused observation data through the second decision support system after training to provide a performance estimate for the indication of the medical condition, the set of previously unused observation data having been collected from patients whose medical condition is known.

A computer system according to any of claims 21-23, wherein the means for training the second decision support system comprises means for dividing the observation data set into a plurality of partitions including at least one test data partition and a plurality of training data partitions, the test data partition being used to provide a final performance estimate for the second decision support system after the training data partitions have been run.
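One way the partitioning recited above might be realized is sketched below. The shuffling, the `test_fraction` and `seed` parameters, and the round-robin assignment of the remaining examples to training partitions are all assumptions made for illustration; the claim only requires at least one test data partition and a plurality of training data partitions.

```python
import random
from typing import List, Sequence, Tuple


def partition_observations(data: Sequence,
                           n_training_partitions: int,
                           test_fraction: float = 0.2,
                           seed: int = 0) -> Tuple[list, List[list]]:
    """Split the observation data set into one held-out test partition
    and several training partitions."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = max(1, int(len(data) * test_fraction))
    test_partition = [data[i] for i in indices[:n_test]]
    remaining = indices[n_test:]
    # Deal the remaining examples round-robin into the training partitions.
    training_partitions = [
        [data[i] for i in remaining[k::n_training_partitions]]
        for k in range(n_training_partitions)
    ]
    return test_partition, training_partitions


observations = list(range(20))  # hypothetical stand-in for patient records
test_partition, training_partitions = partition_observations(observations, 4)
```

The test partition is set aside before any training takes place and is consulted only once, for the final performance estimate.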

The computer system of claim 41, wherein the second decision support system comprises a plurality of neural networks each having a set of unique starting weights and a performance rating value.

The computer system of claim 42, wherein the final performance estimate is generated by averaging performance rating values for a plurality of neural networks.
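The averaging of performance rating values over several networks with unique starting weights can be sketched as follows. For brevity each "network" here is a single logistic unit trained by simple gradient updates, which is an assumption made purely for illustration; the seeds, learning rate, epoch count, use of classification accuracy as the rating, and the toy data are likewise hypothetical.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]  # (observation values, known outcome 0 or 1)


def predict(weights: List[float], x: List[float]) -> float:
    # The last weight is the bias; zip stops at the end of x.
    z = weights[-1] + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_network(train: List[Example], seed: int,
                  epochs: int = 200, rate: float = 0.5) -> List[float]:
    """Train one unit starting from seed-specific random weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(train[0][0]) + 1)]
    for _ in range(epochs):
        for x, y in train:
            error = y - predict(weights, x)
            for j, v in enumerate(x):
                weights[j] += rate * error * v
            weights[-1] += rate * error
    return weights


def performance_rating(weights: List[float], test: List[Example]) -> float:
    """Fraction of test examples classified correctly at a 0.5 threshold."""
    correct = sum(1 for x, y in test
                  if (predict(weights, x) >= 0.5) == (y == 1))
    return correct / len(test)


def final_performance_estimate(train: List[Example], test: List[Example],
                               seeds: List[int]) -> Tuple[float, List[float]]:
    """Average the ratings of networks that differ only in starting weights."""
    ratings = [performance_rating(train_network(train, s), test) for s in seeds]
    return sum(ratings) / len(ratings), ratings


train = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.7], 1)]
test = [([0.15, 0.1], 0), ([0.85, 0.8], 1)]
estimate, ratings = final_performance_estimate(train, test, seeds=[1, 2, 3, 4, 5])
```

Averaging across starting weights reduces the dependence of the estimate on any one random initialization.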

A computer system according to any one of claims 21 to 23, wherein the observation values are obtained from patient medical history data and/or biochemical test results.

A computer system according to any of claims 21-23, wherein the condition is a pregnancy related condition or endometriosis.

The computer system of claim 32, wherein the disorder is endometriosis and the candidate variables comprise at least four variables selected from:
(i) past history of endometriosis, number of births, dysmenorrhea, age, pelvic pain, history of pelvic surgery, amount smoked per day, medication history, number of pregnancies, number of miscarriages, abnormal PAP/dysplasia, pregnancy hypertension, genital warts and diabetes; or
(ii) age, number of births, number of pregnancies, number of miscarriages, amount smoked per day, past history of endometriosis, dysmenorrhea, pelvic pain, abnormal PAP, history of pelvic surgery, medication history, pregnancy hypertension, genital warts and diabetes.

The computer system according to claim 46, wherein the decision support system comprises a neural network or a consensus of neural networks.

The computer system of claim 46, wherein at least five variables are selected.