WO2009024796A1 - Method for prediction and diagnosis of medical conditions and apparatus for performing the same - Google Patents


Info

Publication number
WO2009024796A1
Authority
WO
WIPO (PCT)
Prior art keywords
result
data
analysis
diagnostic
subject
Application number
PCT/GB2008/002859
Other languages
French (fr)
Inventor
Emmanuel C. Ifeachor
Viktoriya Stalbovskaya
Original Assignee
University Of Plymouth
Application filed by University Of Plymouth
Publication of WO2009024796A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • The present invention relates to a method for assisting a medical practitioner in diagnosing the condition of a subject or patient, in particular in determining the nature of the condition.
  • The present invention further relates to an apparatus for performing the method.
  • The assessment and diagnosis of the condition of a patient or subject may be divided into two aspects.
  • The first aspect is the correct determination of the particular condition affecting the subject.
  • The second aspect is the determination of the nature or severity of that condition, in particular whether the condition is malignant or benign.
  • Proper determination of the second aspect is particularly useful in deciding on the most appropriate and efficient form of treatment. This is particularly important when a health service or provider is treating a large number of patients with limited resources, as is generally the case.
  • Cancer is the second-leading cause of death in the UK after heart disease. Each year, around 130,000 people die from cancer in the UK alone and about
  • Genomics and proteomics have the potential to provide insight into individual differences between patients and an opportunity to improve diagnosis and care on an individual basis.
  • Recent studies in the genomics/proteomics of cancer have identified potentially useful cancer-specific "signatures" and biomarkers (van 't Veer et al., 'Gene expression profiling predicts clinical outcome of breast cancer', Nature, 415 (6871), pages 530-6, 2002; Petricoin, E.
  • The discovery of the marker HER-2 made it possible to identify subgroups of breast cancer patients who will benefit from adjuvant chemotherapy with trastuzumab (Romond EH et al., 'Trastuzumab plus adjuvant chemotherapy for operable HER2-positive breast cancer', N Engl J Med, 353 (16), pages 1673-84, 2005).
  • The method should preferably be able to handle a wide range of data obtained from the results of tests and examinations performed on a patient, and to accommodate data sets that may be incomplete.
  • Meeting the first prerequisite will enable an individualised approach to the care of patients, in particular oncological patients, because, in contrast to the traditional cancer staging system, a quantitative assessment of a patient's health status can handle individual peculiarities and allows fine grading of a patient's condition.
  • The method should integrate patient information from different modalities (clinical, imaging, laboratory, genomics, etc.) to produce a composite index, with an appropriate confidence measure assigned to each modality.
  • Ovarian tumours are common among women. In Europe and North America the age-adjusted standardised incidence rate of ovarian cancer is over 10 per 100,000 women. Preoperative prediction of the malignancy of ovarian tumours is very important, because it can prevent unnecessary surgery for benign functional cysts, and in the case of benign neoplastic lesions only minimal surgical intervention would be required. On the other hand, patients with malignant tumours require not only surgery but also appropriate pre-, peri- and postoperative management. Gynaecological oncologists have put a great deal of effort into developing preoperative predictive markers of ovarian malignancy.
  • The range of laboratory and instrumental diagnostic techniques for ovarian cancer is wide and includes transvaginal and transabdominal ultrasonography, serum tumour markers, laparoscopy, computed tomography and magnetic resonance imaging.
  • A key problem is the choice of necessary procedures, taking into account their diagnostic value, cost and invasiveness. Accordingly, there is a need for a method of assessing tumours and cancerous conditions of a patient in particular, and other clinical conditions in general, that meets the aforementioned criteria and needs.
  • The present invention provides a method of characterising a medical condition of a subject, the method comprising the steps of: a) provide a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculate an importance factor J_i for each result of an analysis x_i; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DC_i,max with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in step (e); and h) identify the threshold exceeded in step (g).
  • The method of the first aspect of the present invention allows the modelling of the preoperative diagnosis of conditions of a patient, in particular cancerous conditions and tumours, especially ovarian tumours.
  • The method is based on the Sequential Nonuniform Procedure (SNuP), which meets the requirements above.
  • SNuP is based on a form of Bayes classification, but with additional restrictions. In particular, the consecutive multiplication of the likelihood ratios of the input variables is interrupted when one of the diagnostic thresholds is reached. The threshold values are specified according to an acceptable level of diagnostic error.
  • SNuP operates sequentially on the variables (features) as the cases (observations) are accumulated. This is significant, as it allows the method to provide a personalised differential diagnosis.
  • In the first step of the method, a set of data obtained from an analysis and/or examination of the subject is provided.
  • The set of data contains the result of at least one test, analysis, investigation or examination carried out on or in respect of the subject. In many cases, the set of data will contain two or more such results.
  • Examples of symptom analyses x_i that may be obtained to generate the set of symptom data for a female subject suspected of suffering from ovarian cancer are set out in Table I below.
  • The set of symptom data contains at least one result from an analysis of a symptom x_i.
  • Table I:
      Analysis                     | Variable  | Data type
      Peak systolic velocity       | PSV       | continuous
      Time-averaged mean velocity  | TAMX      |
      Low-level echogenicity       | Low-level | binary
      Mixed echogenicity           | Mixed     | binary
      Risk of malignancy index     | RMI       |
  • In Table I, the form of data for each analysis is indicated and may be continuous, binary (that is, assigned 1 or 0) or nominal (that is, having a discrete value such as 0, 1, 2, etc., depending on the outcome of the analysis).
  • The method of the present invention employs purely discrete data, in particular binary data.
  • Discrete data are to be considered as data having a discrete value, such as may result from a test or investigation that produces an indication that is merely 'low', 'medium' or 'high'.
  • Binary data are to be considered as the data resulting from a test, examination or analysis of the subject that can give one of two results.
  • Included in the above list is the menopausal state of the female subject, who may be either menopausal or not menopausal.
  • Many analyses or tests conducted on a subject do not yield a simple binary result, but rather provide a continuous result that may take any value within the result range.
  • The type of result data is specified for each data set in Table I.
  • The reduction of continuous result data to a discrete data set, in particular a binary data set, may be achieved in a number of ways.
  • The analysis or test may be redefined to produce a binary result.
  • For example, the level of serum CA 125, while a continuous rather than a binary result, may be redefined by specifying a minimum level of serum CA 125 (for example 30 U/ml), allowing the result to be presented as binary, that is, either above the specified minimum level, or at or below it.
  • This manual setting of a specified value, such as a minimum or threshold value, is not preferred and can lead to inefficiencies in the system.
  • The continuous result data are converted into discrete result data, for example binary data. This conversion may be achieved using mathematical manipulations known in the art.
  • In one approach, conversion of the continuous result data to discrete data is achieved using fuzzy logic.
  • Suitable fuzzy logic techniques are known in the art.
  • One preferred method for converting the continuous result data into discrete data is the use of clustering techniques, in particular univariate and multivariate clustering.
  • A preferred clustering technique is disclosed in J. MacQueen, 'Some methods for classification and analysis of multivariate observations', Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 1967, pages 281 to 297.
  • Continuous result data may be transformed into ordinal data by partitioning the initial input space. Thereafter, these variables are analysed by SNuP in the regular way. Automatic partitioning of the input space for continuous variables may be performed by applying k-means clustering, as described by J. MacQueen (aforementioned), with three clusters.
  • Squared Euclidean distance may be used as the distance measure, with the initial centroid positions of the clusters selected randomly. If a cluster loses all of its member observations, that cluster is removed. Continuous variables from the test set may be assigned to clusters on the basis of the minimal squared Euclidean distance to one of the centroids identified at the training stage.
  • The present invention provides a method of characterising a medical condition of a subject, the method comprising the steps of: a) provide a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) convert continuous result data present in the set of symptom data into discrete result data; b) calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculate an importance factor J_i for each result of an analysis x_i; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in step (e); g) if DC_i,max does not exceed one of the thresholds determined in step (e), successively sum the diagnostic coefficient DC_i,max with
  • Each result of an analysis x_i is used to calculate a diagnostic coefficient DC_i.
  • The diagnostic coefficient DC_i is derived from the probability of the subject having a particular condition given a specific result from the analysis x_i.
  • The diagnostic coefficient DC_i may be derived from the result of an analysis
  • The key issue in diagnosis is to determine to which of two groups a given subject belongs, that is, the group having a particular condition or the group not having it, given the symptoms expressed by the subject and the laboratory data obtained from one or more tests.
  • For ovarian tumours, the key issue is which of the following two groups the subject belongs to: benign or malignant tumour.
  • P(x_i | A_k) is the conditional probability of x_i given A_k, that is, the probability of the presence of symptom x_i in group A_k;
  • P(x_i) is the prior probability of symptom x_i.
  • The posterior probability of the subject belonging to group A_k, given symptom x_i, can be defined using Bayes' theorem as follows: $P(A_k \mid x_i) = \frac{P(x_i \mid A_k)\,P(A_k)}{P(x_i)}$, where P(A_k) is the prior probability of group A_k.
  • The sequential non-uniform procedure produces a model for classification into two groups.
  • The ratio of the conditional probabilities of the groups is equal to the ratio of the symptom's occurrences in the two groups, as follows:
  • The diagnostic coefficient DC_i of symptom x_i is a score value, defined as follows:
  • The next step is to determine an importance factor J_i for each result of the analysis or symptom x_i.
  • The importance factor is required in order to rank the analytical and test results and symptoms, as described hereinafter.
  • The importance factor J_i may be determined as follows:
  • The feature selection process and ranking of input variables/symptoms are based on the calculation of the symmetrised Kullback-Leibler divergence between two distributions, P and Q, the so-called J-divergence, as described in H. Jeffreys, 'An invariant form for the prior probability in estimation problems', Proc. R. Soc. Lond. A, Vol. 186, pages 453 to 461, 1946, as follows:
  • m is the number of distinct values of the variable (for example, 'low', 'normal', 'high')
  • The importance factor J_i for the symptom or result x_i is the sum of J(x_ij) over all distinct values j of the variable, as follows:
  • The symptoms or results are ranked according to their importance factors, with the symptom or result having the highest importance factor ranked highest.
  • The subsequent processing of the diagnostic coefficients DC_i is applied to each symptom or result x_i according to the ranking of its importance factor J_i, as described hereinafter.
  • The method of the present invention requires the determination of threshold values for the diagnostic coefficients DC_i, that is, the threshold values for the ratio set out in formula (3). This in turn requires that the possible errors in the values of the symptom or result data x_i be taken into account.
  • For a subject under investigation for a given condition, two types of error may be identified: first, the subject may be diagnosed as having the given condition when this is incorrect and the condition is not present; and second, the subject may be diagnosed as not having the given condition when the condition is in fact present. These errors may be termed α and β.
  • specifies the probability of false assignment of a patient with a malignant tumour into a benign tumour group
  • specifies the probability of false assignment of a patient with a benign tumour into a malignant tumour group
  • is the rate of misclassification into group A_1
  • is the rate of misclassification into group A_2
  • The threshold for a diagnostic hypothesis is the minimum acceptable rate of correct diagnoses over incorrect ones. Denoting A+ as a correct diagnosis and A- as an incorrect diagnosis, the probabilities of correct and incorrect diagnoses in the groups are P(A_1+), P(A_1-), P(A_2+) and P(A_2-). Accordingly, the decision rule for group 1 is:
  • Thresholds for the sum of the diagnostic coefficients are defined as follows:
  • The threshold values are assigned to a particular condition or diagnosis.
  • For example, A_1 may be assigned as the condition of the tumour being benign, and A_2 as the condition of the tumour being malignant.
  • The threshold value DC_th(A_1) is the minimum value of the diagnostic coefficients required in order to diagnose the tumour as benign.
  • The threshold value DC_th(A_2) is the minimum value of the diagnostic coefficients required in order to diagnose the condition as a malignant tumour.
  • The threshold values for the diagnostic coefficients are 12.8 and -12.8. Increasing the accuracy of the analysis and symptom data to 99% provides threshold values for the diagnostic coefficients of 30 and -30.
  • The value of the diagnostic coefficient DC_i,max with the highest importance factor J_i is compared with the threshold values determined for the diagnostic coefficients. If DC_i,max exceeds one of the threshold values, the exceeded threshold value is identified and the method terminates. If it does not, DC_i,max is summed with the diagnostic coefficient DC_i having the next highest importance factor J_i. If this sum exceeds one of the threshold values, the exceeded threshold value is identified and the method terminates. If neither threshold value is exceeded, the successive summation of the diagnostic coefficients DC_i in order of decreasing importance factor J_i continues until the sum exceeds one of the threshold values. At this point, the summation ceases and the exceeded threshold value is identified.
  • The diagnostic information is accumulated by summing the diagnostic coefficients DC_i, as follows:
  • Each threshold value of the diagnostic coefficient is the minimum acceptable rate of correct diagnoses over incorrect ones.
  • Identification of the exceeded threshold in turn allows the correct diagnosis to be made. For example, A_1 may be taken to represent the subject having a benign tumour and A_2 the subject having a malignant tumour.
  • If the value of the diagnostic coefficient DC_i,max with the highest importance factor J_i, or the successive summation of the diagnostic coefficients, exceeds the threshold value for group A_1, given as DC_th(A_1) in Table II, the method indicates that the subject has a benign tumour.
  • If DC_i,max or the successive summation of the diagnostic coefficients exceeds the threshold value for group A_2, given as DC_th(A_2) in Table II, the subject is diagnosed with a malignant tumour.
  • The present invention provides a system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) means for calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DC_i,max with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range
  • The system of this aspect of the present invention is capable of processing binary analysis data.
  • The system comprises means for converting continuous analysis data into discrete data, for example binary data.
  • The present invention provides a system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means for calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DC
  • The methods of the present invention may be used to indicate, at an early stage in the diagnostic assessment of a subject, the extent to which some or all of the available tests, analyses and examinations are required for the medical practitioner to arrive at a clear diagnosis.
  • The methods may be applied to identify those tests, analyses and examinations that are not required to reach a clear diagnosis, thus reducing the time the subject spends undergoing possibly invasive procedures, the overall time taken to carry out the prediagnostic assessments, and the overall cost of the diagnostic procedure.
  • The present invention provides a method of identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the method comprising the steps of: a) provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculate an importance factor J_i for each result of an analysis x_i; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DC_i,max with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in step (e
  • The method applies the general process discussed hereinbefore to a first set of symptom data obtained from a first group of tests or analyses.
  • This first set may contain only some of the results of a complete investigation into the condition of the subject.
  • The method is applied to this partial data set. If the successive summation of the diagnostic coefficients results in one of the threshold values being exceeded, a diagnosis of the condition can be made without conducting further investigations or tests. Further investigations are required only when the successive summation of all the diagnostic coefficients in the first data set does not produce a sum that exceeds one of the thresholds.
  • The method may be applied after each analysis, test or investigation of the subject, and the procedures continued only until a threshold is exceeded and a diagnosis is possible. In this way, the method indicates whether a diagnosis of the condition is possible from a selection of tests, analyses or investigations chosen according to specified criteria, such as time taken, discomfort or risk to the subject, and/or cost.
  • The method may be applied to sets of data that contain only binary values. However, it is most advantageous if the method is also applied to data sets containing continuous values. In this case, the continuous data values are converted into discrete data, as hereinbefore described.
  • The present invention provides a method of identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the method comprising the steps of: a) provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) convert continuous result data present in the set of symptom data into discrete result data; b) calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculate an importance factor J_i for each result of an analysis x_i; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DC_i,max with the diagnostic coefficient DC_i having the next highest
  • A system for identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject comprising: a) means to provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) means to calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means for specifying an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the thresholds determined in feature (e); g) means to successively sum the diagnostic coefficient DC_i,max with the diagnostic coefficient DC
  • The system most preferably further comprises means for converting continuous data in the data set into discrete data. Accordingly, there is also provided a system for identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the system comprising: a) means to provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means to calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means for specifying an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_i,max, with the
  • Figure 1 is a graphical representation of the method of the present invention applied to four different cases of ovarian cancer.
  • Figure 1 illustrates, graphically, the method of the present invention applied to a set of data obtained from the investigation of a subject suspected of suffering from ovarian cancer.
  • A_1 is the outcome that the tumour is malignant.
  • A_2 is the outcome that the tumour is benign.
  • The thresholds for A_1 and A_2 are denoted by bold solid lines.
  • The hatched areas in Figure 1 are the determinate zones for A_1 and A_2 (malignant and benign groups).
  • Cases 1 and 2 demonstrate SNuP with definitive variables (when one variable is enough to reach a threshold). Thus, in both cases 1 and 2, the value of the diagnostic coefficient DC_i,max having the highest importance factor exceeds one of the threshold values.
  • The tumour may be diagnosed as malignant.
  • The subject is suffering from a benign tumour.
  • Case 3 shows the straightforward classification of a benign tumour in three steps, involving the successive summation of the three diagnostic coefficients having the first, second and third highest importance factors.
  • Case 4 is a difficult case of ovarian cancer. The final determination of the method applied to case 4 is that the tumour is malignant. However, the third and fourth summations suggest that the tumour may be benign, as the value of the sum tends towards the threshold value DC_th(A_2). In a conventional diagnostic procedure, a medical practitioner might have interpreted such a trend in the data as indicating a benign tumour.
  • The amount of blood flow was assessed within the septa, cyst walls, solid tumour areas or ovarian stroma.
  • Two new binary variables were added: 'Col3' and 'CoW'.
  • The variable CA125 was transformed to binary values, 0 or 1, using a threshold value of 30 U/ml: a value of 1 was assigned if CA125 > 30 U/ml, and 0 otherwise.
  • The Risk of Malignancy Index was used as a benchmark during the performance evaluation.
  • The ultrasound score (Morph) was calculated as the sum of the scores for the presence of a multilocular cyst, evidence of solid areas, evidence of metastases, presence of ascites and bilateral lesions.
  • The menopausal state (Meno) was set to 1 if premenopausal and 3 if postmenopausal.
  • The calculated diagnostic coefficients and J-divergences for all nominal input variables are presented in Table III below.
  • The last column shows the rank of the symptom.
  • The sequential non-uniform procedure for the preoperative differential diagnosis between benign and malignant forms of adnexal tumour is recommended to start from the most informative variables, i.e. the variables with the highest J rank (e.g. smooth internal wall, strong blood flow, presence of a unilocular cyst, a serum CA125 level above 30 U/ml, presence of ascites, etc.).
  • N: total number of cases in the group
  • n: number of cases in the group with the feature present
  • This case contains data missing for some variables, indicating that certain tests or analyses were not carried out on the subject.
  • the performance of the method of the present invention was assessed using ROC analysis and 3-fold cross validation. A ratio of 1:2 between the malignant and benign group sample sizes was taken from the initial data set.
  • the SNuP procedure of the present invention was applied to the ovarian tumour data set.
  • the task was to distinguish malignant and benign forms of this kind of neoplasm.
  • apart from clinical examination, the differential diagnosis of these conditions involves ultrasound methods, tumour markers, CT and MRI. It is important to find a trade-off between the cost and number of diagnostic procedures and the risk of missing a case in which urgent surgery might be required.
  • the SNuP showed a high performance on a real data set during cross validation.
  • the method is close to clinical thinking and can be used not only for research but also for educational purposes to demonstrate the inference process.
  • a second study using the method of the present invention was performed on 1066 cases of adnexal masses collected during an international multicentre clinical trial across 14 research centres.
  • the full database includes 1066 cases of ovarian tumours, 266 malignant and 800 benign. Histological diagnoses were used as the gold standard.
  • (iii) serum tumour marker CA125 was measured for 809 patients.
  • the peak systolic velocity (PSV), the time-averaged maximum velocity (TAMXV), the pulsatility index (PI) and the resistance index (RI) were substituted by 2.0 cm/sec, 1 cm/sec, 3.0 and 1.0, respectively.
  • cluster membership value: $\arg\min_j \lVert x_i - c_j \rVert^2$
  • the number of clusters: max(j) = 3
  • coordinates of the cluster centroids cj are triplets (LesD1; LesD2; LesD3): {51.0; 40.7; 40.0}, {106.8; 85.3; 82.2}, {201.2; 148.0; 142.3}
  • Table IV: Cluster 1, Cluster 2, Cluster 3
  • diagnostic coefficients of untransformed and univariately transformed variables are straightforward to interpret when there is a clear assignment of the diagnostic coefficient DCi to the level of the symptom.
  • Examples of a univariately transformed variable include log of serum CA125, which was split into three clusters, which can be described as 'low', 'medium' and 'high' levels, with an increasing degree of association with malignancy.
  • New variables created in two- or multidimensional space might have increasing values in one dimension and decreasing values in another, which may slightly complicate interpretation and require additional clinical input. Examples of these kinds of variables include the diameter of solid component (SoNdD), velocity indices (PI, RI, PSV, TAMXV), diameter of ovaries (OvD) and diameter of lesion (LesD).
  • the method of the present invention was compared with the performance of an expert medical assessment.
  • conditions 1 and 4 represent a situation when the method and the expert agree. The rest of the conditions are more interesting.
  • Conditions 2 and 6 are difficult cases for diagnosis.
  • Condition 2 is true when the expert misses something or reaches a conclusion based on a wrong assumption. This may also be due to new knowledge discovered by the method of the present invention.
  • condition 3 is true.
  • Condition 5 is possible when there is enough information for the expert to arrive at a correct conclusion but not enough for the method to provide a determinative answer.
  • Table VII contains a summary of the results of the method of the present invention compared with the performance of an expert in making the same diagnosis.
  • Modelling conditions included 3-fold cross validation, 141 cases in the test set, α and β equal to 0.10, and taking P(A) into account.
  • Table VII shows that for malignant tumours agreement was reached in 79.5% of cases (79.5% and 0.0%) and for the benign form the level of agreement was almost the same, 78.3% (73.2% and 5.1%).
  • the method of the present invention was better than the expert in 10 (10.3%) cases of benign and 1 (2.3%) case of malignant tumours.
  • the expert outperformed the method in 9 (9.3%) cases of benign neoplasm and 8 (18.2%) cases of ovarian cancer.
  • results of the method-expert comparison show that the method of the present invention compares well with the diagnoses made by a medical expert.
  • measurement of the tumour marker CA125 in serum is common in the diagnosis of ovarian cancer, as well as during and after treatment. It has been shown that an abnormally raised level of CA125 is associated with malignancy. However,
  • the method of the present invention was used to analyse a set of test data for a range of subjects and to compare the results obtained when the CA125 data were included in the data set with the results obtained when the CA125 data were omitted.
  • the results are summarised in Table VIII.
  • the CA125 result does not significantly improve the classification performance, but brings more certainty to the decision-making process by reducing the total number of undefined cases, although the median number of variables used stays the same.
  • the CA125 test has an importance factor Ji that ranks it fourth in importance. As demonstrated in Example I, the method of the present invention may not require four data points in order to reach a diagnosis. In an experiment, CA125 was used in 80 patients out of 141. In 66 (82.5%) cases the absence of CA125 did not change the outcome of the method of the present invention; the rates per group were 30 (78.9%) benign cases and 36 (85.7%) malignant cases. The exclusion of CA125 produced worse results in 9 (11.3%) cases and better results in 5 (6.3%) cases.
  • the method of the present invention may be used to provide a diagnosis in a significant number of cases on the basis of a data set that does not contain CA125 data. If the method is applied and the result is indeterminative, that is neither threshold is exceeded, this is an indication that further data are required, which may include CA125 data.
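The cluster-assignment rule listed above (nearest centroid by squared Euclidean distance) can be illustrated with the three LesD centroids reported for Table IV. This is a minimal Python sketch, not the authors' implementation; it assumes each subject's lesion diameters arrive as a (LesD1, LesD2, LesD3) triplet in the same order as the centroid coordinates, and the names `TABLE_IV_CENTROIDS`, `squared_euclidean` and `assign_cluster` are hypothetical.

```python
# Hypothetical constant: LesD cluster centroids as listed in the text (Table IV),
# assumed to be in (LesD1, LesD2, LesD3) order.
TABLE_IV_CENTROIDS = [
    (51.0, 40.7, 40.0),     # Cluster 1
    (106.8, 85.3, 82.2),    # Cluster 2
    (201.2, 148.0, 142.3),  # Cluster 3
]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_cluster(x, centroids=TABLE_IV_CENTROIDS):
    """Return the 1-based number of the centroid nearest to x (argmin_j ||x - c_j||^2)."""
    distances = [squared_euclidean(x, c) for c in centroids]
    return distances.index(min(distances)) + 1


# Example: a lesion with diameters (60, 45, 42) falls into Cluster 1.
print(assign_cluster((60.0, 45.0, 42.0)))  # prints: 1
```

The resulting cluster number is an ordinal value that can then be treated like any other discrete symptom when diagnostic coefficients are computed.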

Abstract

A method of characterising a medical condition of a subject is provided, the method comprising the steps of: a) provide a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; b) calculate a diagnostic coefficient DCi for each result of an analysis xi; c) calculate an importance factor Ji for each result of an analysis xi; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DCi using the error level specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in step (e); and h) identify the threshold exceeded in step (g). The method preferably comprises the step of converting continuous data in the symptom data set into discrete data. A system for carrying out the method is also disclosed.

Description

METHOD FOR PREDICTION AND DIAGNOSIS OF MEDICAL CONDITIONS AND APPARATUS FOR PERFORMING THE SAME
The present invention relates to a method for use in assisting a medical practitioner in making a diagnosis of the condition of a subject or patient, in particular in determining the nature of the condition. The present invention further relates to an apparatus for performing the method.
The assessment and diagnosis of the condition of a patient or subject may be divided into two aspects. The first aspect is the correct determination of the particular condition that is ailing the subject. The second aspect is the nature or severity of that condition, in particular to determine whether the condition is malignant or benign. The proper determination of the second aspect is particularly useful in deciding upon the most appropriate and efficient form of treatment. This is particularly important when a health service or provider is treating a large number of patients with limited resources, as is generally the case.
Cancer is the second-leading cause of death in the UK after heart disease. Each year, around 130,000 people die from cancer in the UK alone and about 225,000 new cases are diagnosed. These figures are currently increasing by about 1.4% per annum. In 2003, the NHS invested £639m, mainly in chemotherapy. However, the earlier a diagnosis of cancer is made, the more optimistic the prognosis can be and the less aggressive the therapy (for example, conservative surgery without adjuvant chemotherapy and radiotherapy). For example, if ovarian cancer is detected at FIGO stage I, the 5-year disease-free survival rate is over 80%. The use of the optimum treatment strategy can increase the effectiveness of treatment and minimise side effects experienced by the patient, many of which can be severe.
The trend in many clinical areas of the treatment of cancer is towards personalisation of diagnosis and treatment because of the heterogeneity of the disease and differences in individual patients. Conventional prognostic criteria for various types of cancer include histological staging, lymph node status, TNM system, proliferation index, Nottingham prognostic index and risk of malignancy index for ovarian tumours. However, the predictive power of conventional diagnostic and prognostic markers is limited and therefore not adequate for the individualisation of prediction of care and the response to treatment. Tumours with similar histopathological appearance, for example, can follow significantly different clinical courses, and patients with similar diagnoses show markedly different responses to treatment.
New and emerging high throughput technologies such as genomics and proteomics have the potential to provide an insight into individual differences in patients and an opportunity to improve diagnosis and care on an individual basis. Recent studies in genomics/proteomics of cancer have identified potentially useful cancer-specific "signatures" and biomarkers (Van't Veer et al., 'Gene expression profiling predicts clinical outcome of breast cancer', Nature, 415 (6871) pages 530-6, 2002; Petricoin, E. F., Ardekani, A.M., Hitt et al., 'Use of proteomic patterns in serum to identify ovarian cancer', Lancet, 2002; 359 (9306), pages 572-7; Golub TR, Slonim DK, Tamayo P et al., 'Molecular classification of cancer, class discovery and class prediction by gene expression monitoring', Science, 286 (5439) pages 531-7, 1999; Crijns A. P et al., 'Molecular prognostic markers in ovarian cancer: toward patient-tailored therapy', Int J G Cancer, Suppl 1, pages 152-65, 2006).
For example, the discovery of the marker HER-2 made it possible to identify subgroups of breast cancer patients who will benefit from adjuvant chemotherapy with Trastuzumab (Romond EH et al., 'Trastuzumab plus adjuvant chemotherapy for operable HER2-positive breast cancer', N Engl J Med, 353 (16), pages 1673-84, 2005).
Potentially up-regulated genes associated with breast cancer have recently been identified and are being tested as potential biomarkers of breast cancer (Piccart MJ, Loi S, Van't Veer L, et al., 'Multi-center external validation study of the Amsterdam 70-gene prognostic signature in node negative untreated breast cancer: are the results still outperforming the clinical-pathological criteria?', Breast Cancer Res. Treat., 88 (suppl 1), Abstr 38, 2004). However, there are enormous research challenges to be addressed to determine whether such methods can satisfy the high expectations (for example the ability to tailor therapy on the basis of biological findings) as well as overcome the relevant biotechnological challenges. Further, in clinical practice, decision-making is still largely based on clinical data alone and a great deal of work remains to understand the information derived from genomic/proteomics data and how to integrate this information with clinical data when appropriate. In some cases, clinical data alone are adequate for diagnosis because of the clinico-pathological signs of the malignancy. However, this is by no means always the case.
There are two key problems to be addressed, which are important prerequisites for successful patient-tailored diagnosis and treatment for cancer. First, there is a need for the development of a methodology for quantifying a patient's health/disease status. Further, it would be most advantageous to have a model for handling the diversity and complexity of the diagnostic problem. In particular, the method should preferably be able to handle a wide range of data obtained from the results of tests and examinations performed on a patient, as well as being able to accommodate data sets that may be incomplete.
Providing the first prerequisite will enable an individualised approach to the care of patients, in particular oncological patients, because, in contrast to the traditional cancer staging system, quantitative assessment of a patient's health status can handle individual peculiarities and allows fine grading of a patient's condition.
The problems associated with the second prerequisite can be illustrated with a simple example. In clinical practice, some cases of cancer have clear clinico-pathological signs of malignancy, so that only an ultrasound examination might be needed to make a diagnostic decision and a referral for operation; others might require a thorough examination, including measurement of genetic markers and invasive diagnostic procedures. Catering for these differing requirements therefore requires the development of a flexible model which can utilise a variable number of modalities.
It would be most desirable if a novel method for quantitative assessment of a patient's health status could be provided which combines multimodal data from macro-, micro- and nano-levels. Preferably, the method should integrate patient information from different modalities (clinical, imaging, laboratory, genomics, etc.) to produce a composite index, with an appropriate confidence measure assigned to each modality.
It would be particularly advantageous if such a method could be provided for the assessment of cancers and tumours, for example ovarian tumours. Ovarian tumours are common among women. In Europe and North America the age-adjusted standardised incidence rate of ovarian cancer is over 10 per 100,000 women. Preoperative prediction of malignancy of ovarian tumours is very important, because it can prevent unnecessary surgery for benign functional cysts, and in the case of benign neoplastic lesions only minimal surgical intervention would be required. On the other hand, patients with malignant forms of tumour require not only surgical operation but also appropriate pre-, peri- and postoperative management. A great deal of effort has been put in by gynaecological oncologists in order to develop preoperative predictive markers of ovarian malignancy. However, prospective testing of these markers has shown either low performance or unbalanced results (i.e. high specificity and low sensitivity). To address the limitations of previous studies, the International Ovarian Tumour Analysis (IOTA) Group has established multicentre prospective clinical trials with more than six centres working to the same protocol and collecting data from a total of 1000 patients who have a persistent adnexal mass. For clinical acceptance, a predictive model for discrimination of ovarian tumours should preferably satisfy the following requirements:
(i) have reasonably high sensitivity and specificity levels, typically 90% and 75%, respectively;
(ii) be interpretable; and
(iii) use as few diagnostic techniques/parameters as possible.
In relation to (iii), the range of laboratory and instrumental diagnostic techniques for ovarian cancer is wide and includes transvaginal and transabdominal ultrasonography, serum tumour markers, laparoscopy, computer tomography and magnetic resonance imaging. A key problem is in the choice of necessary procedures taking into account their diagnostic value, cost and invasiveness. Accordingly, there is a need for a method of assessing tumours and cancerous conditions of a patient in particular, and for assessing other clinical conditions in general, that meets the aforementioned criteria and needs.
In a first aspect, the present invention provides a method of characterising a medical condition of a subject, the method comprising the steps of: a) provide a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; b) calculate a diagnostic coefficient DCi for each result of an analysis xi; c) calculate an importance factor Ji for each result of an analysis xi; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DCi using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in step (e); and h) identify the threshold exceeded in step (g).
The method of the first aspect of the present invention allows the modelling of the preoperative diagnosis of conditions of a patient, in particular cancerous conditions and tumours, especially ovarian tumours. The method is based on the Sequential Nonuniform Procedure (SNuP), which meets the requirements above. SNuP is based on a form of Bayes classification, but with additional restrictions. In particular, the consecutive multiplication of the likelihood ratios of the input variables is interrupted when one of the diagnostic thresholds is reached. The values of the thresholds are specified according to an acceptable level of diagnostic errors. The SNuP operates sequentially on the variables (features) as the cases (observations) are accumulated. This is significant, as it allows the method to provide a personalised differential diagnosis, which is achieved by varying the number of attributes used and ranking the variables according to their discriminative relevance and the specified confidence level. In the first step of the method, a set of data obtained from an analysis and/or examination of the subject is provided. The set of data contains the result of at least one test, analysis, investigation or examination carried out on or in respect of the subject. In many cases, the set of data will contain two or more such results. Examples of symptom analyses xi that may be obtained to generate the set of symptom data for a female subject suspected of suffering from ovarian cancer are set out in Table I below. The set of symptom data contains at least one result from an analysis of a symptom xi.
Table I
ANALYSIS xi                               TYPE OF RESULT
Age (Age)                                 cont.
Menopause state (Meno)                    binary
Amount of blood flow (Col score)          nominal
Level of serum CA125 (CA125)              cont.
Pulsatility index (PI)                    cont.
Resistance index (RI)                     cont.
Peak systolic velocity (PSV)              cont.
Time-averaged mean velocity (TAMX)        cont.
Ascites (Asc)                             binary
Unilocular cyst (Un)                      binary
Unilocular solid (UnSol)                  binary
Multilocular cyst (Mul)                   binary
Multilocular solid (MulSol)               binary
Solid tumour (Sol)                        binary
Bilateral mass (Bilat)                    binary
Smooth wall (Smooth)                      binary
Irregular wall (Irreg)                    binary
Papillations (Pap)                        binary
Septa > 3 mm (Sept)                       binary
Acoustic shadows (Shadows)                binary
Anechoic cystic content (Lucent)          binary
Low level echogenicity (Low-level)        binary
Mixed echogenicity (Mixed)                binary
Ground glass cyst (G. Glass)              binary
Haemorrhagic cyst (Haem)                  binary
Output variable:
  Pathology result (Path)                 binary
Indices:
  Ultrasound score (Morph)                nominal
  Jacobs index (Jacobs)                   nominal
  Risk of malignancy index (RMI)          cont.
Transformed variables:
  Rather strong blood flow (Col3)         binary
  Very strong blood flow (Col4)           binary
  CA125 > 30 U/ml (C CA125)               binary
In Table I, the form of data for each of the analyses is indicated and may be continuous, binary (that is assigned 1 or 0) or nominal (that is having a discrete value, such as 0, 1, 2 etc. depending upon the outcome of the analysis).
Similar sets of result data may be compiled for other cancerous and noncancerous conditions.
In one embodiment, the method of the present invention employs purely discrete data, in particular binary data. In this respect, discrete data are to be considered as data that have a discrete value, such as may result from a test or investigation that produces an indication that is merely 'low', 'medium' or 'high'. Similarly, binary data are to be considered as the data resulting from a test, examination or analysis of the subject that can give one of two results. For example, included in the above list is the menopausal state of the female subject, who may either be menopausal or not menopausal. However, many analyses or tests conducted on a subject do not yield a simple binary result, but rather provide a continuous result that may take any value within the result range. The type of result data is specified for each analysis in Table I. In the present method, the reduction of the continuous result data to a discrete data set, in particular a binary data set, may be achieved in a number of ways. First, the analysis or test may be redefined to produce a binary result. For example, the level of serum CA125, while a continuous rather than a binary result, may be redefined by specifying a minimum level of serum CA125 (for example 30 U/ml), allowing the result to be presented as a binary result, that is either above the specified minimum level, or at or below it. However, this manual setting of a specified value, such as a minimum or threshold value, is not preferred and can lead to inefficiencies in the system.
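The manual cut-off approach described above amounts to a single comparison. A minimal Python sketch, assuming the 30 U/ml CA125 cut-off given as an example in the text; the function name `binarise` and the use of `None` for a test that was not carried out are illustrative assumptions.

```python
def binarise(value, threshold=30.0):
    """Reduce a continuous result to binary: 1 above the threshold, 0 at or below it.

    Returns None when the test was not carried out, so the variable can simply be
    left out of the subject's symptom data.
    """
    if value is None:
        return None
    return 1 if value > threshold else 0


# Serum CA125 in U/ml with the 30 U/ml cut-off mentioned in the text.
print(binarise(35.0), binarise(30.0), binarise(None))  # prints: 1 0 None
```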
Alternatively, and most preferably, the continuous result data are converted into discrete result data, for example binary data. This conversion may be achieved using mathematical manipulations known in the art. In one embodiment, the conversion of the continuous result data to discrete data is achieved using fuzzy logic. Suitable fuzzy logic techniques are known in the art. One preferred method for converting the continuous result data into discrete data is the use of clustering techniques, in particular univariate and multivariate clustering. A preferred clustering technique is disclosed in J. MacQueen, 'Some methods for classification and analysis of multivariate observations', Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 1967, pages 281 to 297. In particular, continuous result data may be transformed to ordinal data by partitioning the initial input space. Thereafter these variables are analysed by SNuP in the regular way. Automatic partition of the input space for continuous variables may be performed by applying k-means clustering, as described by J. MacQueen aforementioned, with three clusters. Squared Euclidean distance may be used as the distance measure, the initial centroid positions of the clusters being selected randomly. If a cluster loses all of its member observations, that cluster is removed. Continuous variables from the test set may be assigned to clusters on the basis of the minimal squared Euclidean distance to one of the centroids identified at the training stage.
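The clustering-based discretisation above can be sketched for a single continuous variable as follows. This minimal Python sketch uses the batch (Lloyd-style) form of k-means rather than MacQueen's original incremental update, with k = 3, squared Euclidean distance, randomly chosen initial centroids and removal of any cluster that loses all its members, as the text describes; the function names, the fixed random seed and the iteration limit are assumptions. Because initialisation is random, different seeds can converge to different partitions.

```python
import random


def kmeans_1d(values, k=3, max_iter=100, seed=0):
    """Batch k-means on one continuous variable; returns the sorted centroids.

    Initial centroids are drawn at random from the data. A cluster that loses all
    of its members is removed, so fewer than k centroids may be returned.
    """
    rng = random.Random(seed)
    centroids = rng.sample(list(values), k)  # needs at least k values
    for _ in range(max_iter):
        # Assignment step: each value joins the centroid with the smallest squared distance.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: (v - centroids[i]) ** 2)
            clusters[nearest].append(v)
        # Update step: drop empty clusters, move the others to the mean of their members.
        updated = [sum(c) / len(c) for c in clusters if c]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def to_ordinal(value, centroids):
    """Map a test-set value to the index of the nearest training centroid (0 = lowest)."""
    return min(range(len(centroids)), key=lambda i: (value - centroids[i]) ** 2)


train = [0.9, 1.0, 1.2, 9.8, 10.0, 10.5, 49.5, 50.0, 51.0]
centres = kmeans_1d(train)
print(centres, to_ordinal(12.0, centres))
```

Sorting the centroids means the ordinal index reads as 'low', 'medium', 'high' for a three-cluster result, which is the kind of discrete value the rest of the procedure expects.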
Accordingly, in a further aspect, the present invention provides a method of characterising a medical condition of a subject, the method comprising the steps of: a) provide a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; a1) convert continuous result data present in the set of symptom data into discrete result data; b) calculate a diagnostic coefficient DCi for each result of an analysis xi; c) calculate an importance factor Ji for each result of an analysis xi; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DCi using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in step (e); g) if DCimax does not exceed one of the thresholds determined in step (e), successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in step (e); and h) identify the threshold exceeded in step (g).
Once the set of symptom data has been compiled and provided, each result of an analysis xi is used to calculate a diagnostic coefficient DCi. The diagnostic coefficient DCi is derived from the probability of the subject having a particular condition given a specific result from the analysis xi.
The diagnostic coefficient DCi may be derived from the result of an analysis xi as follows.
The key issue in diagnosis is to determine whether a given subject belongs to one of two groups, that is the group having a particular condition or the group not having it, given the symptoms expressed by the subject and the laboratory data obtained from one or more tests. In the case of a tumour, the key issue is whether the subject belongs to one of the following two groups: benign or malignant tumour. The task can be viewed as a two-class classification problem (Ak, where k = 1, 2), given a vector of input variables x. Denote by P(Ak) the prior probability of class k, k = 1, ..., n, with n being the number of classes (groups); P(xi | Ak) is the conditional probability of xi given Ak, that is the probability of presence of symptom xi in the group Ak; and P(xi) is the prior probability of symptom xi. The posterior probability of the subject belonging to group Ak given symptom xi can be defined using Bayes' theorem as follows:
$$P(A_k \mid x_i) = \frac{P(x_i \mid A_k)\,P(A_k)}{P(x_i)} \qquad (1)$$
The sequential non-uniform procedure produces a model of classification into two groups. When the prior probabilities of the two groups are equal, the ratio of the conditional probabilities of the groups is equal to the ratio of the symptom's occurrences in the two groups, as follows:
$$\frac{P(A_1 \mid x_i)}{P(A_2 \mid x_i)} = \frac{P(x_i \mid A_1)}{P(x_i \mid A_2)} \qquad (2)$$
where P(A1 | xi)/P(A2 | xi) is the ratio of the probabilities of the groups given symptom xi, and P(xi | A1)/P(xi | A2) is the likelihood ratio of the probability of the symptom xi given the groups Ak.
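The relation between the posterior ratio and the likelihood ratio can be checked numerically. A minimal Python sketch, with a hypothetical function name, that applies Bayes' theorem to a two-group problem: the ratio of the posteriors equals the likelihood ratio multiplied by the prior ratio P(A1)/P(A2), and so reduces to the likelihood ratio alone when the two priors are equal.

```python
def posterior_odds(p_x_given_a1, p_x_given_a2, prior_a1=0.5):
    """Return P(A1|x) / P(A2|x) for a two-group problem via Bayes' theorem.

    p_x_given_a1, p_x_given_a2: probability of the symptom x in groups A1 and A2.
    prior_a1: prior probability of group A1 (A2 gets 1 - prior_a1).
    """
    prior_a2 = 1.0 - prior_a1
    # P(x) by the law of total probability over the two groups.
    p_x = p_x_given_a1 * prior_a1 + p_x_given_a2 * prior_a2
    post_a1 = p_x_given_a1 * prior_a1 / p_x
    post_a2 = p_x_given_a2 * prior_a2 / p_x
    return post_a1 / post_a2


# Equal priors: the posterior ratio is (approximately, in floating point) 0.6 / 0.2 = 3.
print(posterior_odds(0.6, 0.2))
```

With unequal priors, for example P(A1) = 0.25, the same symptom gives a posterior ratio of 3 × (0.25/0.75) = 1, which is why the prior ratio has to be carried along when the groups are not equally common.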
Accumulation of the diagnostic information given the presence of independent features/symptoms x1, x2, ..., xn is performed as follows:
$$\frac{P(A_1 \mid x_1, x_2, \ldots, x_n)}{P(A_2 \mid x_1, x_2, \ldots, x_n)} = \prod_{i=1}^{n} \frac{P(x_i \mid A_1)}{P(x_i \mid A_2)} \qquad (3)$$
In order to remove the multiplication operation in the right-hand side of formula (3), the relationship may be transformed into a summation by taking a logarithm and introducing a scaling factor of 10. The diagnostic coefficient DCi of symptom xi is a score value which is defined as follows:
$$DC_i = 10 \log_{10} \frac{P(x_i \mid A_1)}{P(x_i \mid A_2)} \qquad (4)$$
When the probability of the symptom xi is higher in group A1 than in group A2, the value of DCi is greater than 0. When the probability of the symptom xi is higher in group A2, the value of DCi is less than 0.
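The diagnostic coefficient can be evaluated directly from group counts. A minimal Python sketch, assuming DCi = 10·log10[P(xi|A1)/P(xi|A2)] with a base-10 logarithm (consistent with the ±12.8 thresholds quoted for α = β = 0.05) and assuming each conditional probability is estimated as the fraction n/N of cases in the group that show the feature; the function name is hypothetical. It also shows that summing the coefficients of independent features is the logarithmic form of multiplying their likelihood ratios.

```python
import math


def diagnostic_coefficient(n1, N1, n2, N2):
    """DC = 10 * log10(P(x|A1) / P(x|A2)), each probability estimated as n/N.

    n1, N1: cases showing the feature and total cases in group A1; likewise n2, N2 for A2.
    Zero counts make the ratio 0 or undefined; they would need smoothing (not shown).
    """
    p1 = n1 / N1
    p2 = n2 / N2
    return 10.0 * math.log10(p1 / p2)


# Feature seen in 50% of group A1 and 5% of group A2: DC = 10 * log10(10), about +10.
dc_a = diagnostic_coefficient(50, 100, 5, 100)
# Summing DCs of independent features equals 10 * log10 of the product of likelihood ratios.
dc_b = diagnostic_coefficient(40, 100, 20, 100)
print(round(dc_a, 2), round(dc_a + dc_b, 2), round(10 * math.log10(10 * 2), 2))  # prints: 10.0 13.01 13.01
```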
In the method of the present invention, the next step is to determine an importance factor Ji for each result of the analysis or symptom xi. The importance factor is required in order to rank the analytical and test results and symptoms, as described hereinafter. The importance factor Ji may be determined as follows:
The feature selection process and ranking of input variables/symptoms is based on the calculation of the symmetrised Kullback-Leibler divergence between two distributions, P and Q, the so-called J-divergence, as described in H. Jeffreys, 'An invariant form for the prior probability in estimation problems', J. R. Statist. Soc., Vol. A, pages 453 to 469, 1946, as follows:
$$J(P, Q) = D(P \,\|\, Q) + D(Q \,\|\, P) \qquad (5)$$
where D is the Kullback-Leibler divergence, so that
$$D(P \,\|\, Q) = \sum_{j=1}^{m} P_j \log \frac{P_j}{Q_j}, \qquad D(Q \,\|\, P) = \sum_{j=1}^{m} Q_j \log \frac{Q_j}{P_j}$$
and m is the number of distinct values of the variable (for example, 'low', 'normal', 'high').
The J-divergence for the distinct values Pj and Qj is defined as follows:
$$P_j \log \frac{P_j}{Q_j} + Q_j \log \frac{Q_j}{P_j} = (P_j - Q_j) \log \frac{P_j}{Q_j}$$
Substituting the conditional probabilities P(xij | A1) and P(xij | A2) for Pj and Qj and, to be consistent with the definition of DCi, scaling formula (5), the J-divergence of the distinct value j of the variable becomes:
$$J(x_{ij}) = 10 \log_{10} \frac{P(x_{ij} \mid A_1)}{P(x_{ij} \mid A_2)} \left[ P(x_{ij} \mid A_1) - P(x_{ij} \mid A_2) \right] \qquad (6)$$
The importance factor Ji for the symptom or result xi is the sum of J(xij) over all the distinct values of the variable, as follows:
$$J_i = \sum_{j=1}^{m} J(x_{ij}) \qquad (7)$$
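The importance factor can be computed from the per-value conditional probabilities. A minimal Python sketch, assuming the J-divergence of each distinct value is scaled in the same way as DCi (factor 10, base-10 logarithm), giving 10·log10(Pj/Qj)·(Pj - Qj) per value with Pj = P(xij|A1) and Qj = P(xij|A2); the function name is hypothetical, and zero probabilities would need smoothing before use.

```python
import math


def importance_factor(p_a1, p_a2):
    """Importance factor J_i of a discrete variable.

    p_a1[j] = P(x_ij | A1) and p_a2[j] = P(x_ij | A2) for each distinct value j.
    Each term is the J-divergence of one value, scaled like DC_i:
    10 * log10(P_j / Q_j) * (P_j - Q_j), which is never negative.
    """
    total = 0.0
    for p, q in zip(p_a1, p_a2):
        total += 10.0 * math.log10(p / q) * (p - q)
    return total


# Binary symptom present in 80% of group A1 but only 20% of group A2.
print(round(importance_factor([0.8, 0.2], [0.2, 0.8]), 2))  # prints: 7.22
```

Ranking the variables then amounts to sorting them by this value in decreasing order; a variable whose distribution is identical in both groups scores 0 and would be considered last.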
Once the importance factors Ji have been obtained for each symptom or result xi, the symptoms or results are ranked according to their importance factors, with the symptom or result having the highest importance factor being ranked highest. The subsequent processing of the diagnostic coefficients DCi is applied to each symptom or result xi according to the ranking of its importance factor Ji, as will be described hereinafter.
The method of the present invention requires the determination of threshold values for the diagnostic coefficients DCi, that is the threshold values for the ratio set out in formula (3). This in turn requires that the possible errors in the values of the symptom or result data xi are taken into account. In the case of a subject under investigation for a given condition, two types of error may be identified: first, the subject may be diagnosed as having the given condition when this is incorrect and the condition is not present; and, second, the subject may be diagnosed as not having the given condition when the condition is in fact present. These errors may be termed α and β. Thus, in the case of a cancer or tumour, in terms of the 'malignant-benign' classification, α specifies the probability of false assignment of a patient with a malignant tumour into the benign tumour group, and β specifies the probability of false assignment of a patient with a benign tumour into the malignant tumour group.
In terms of classification into groups A1 and A2, α is the rate of misclassification into group A1, and β is the rate of misclassification into group A2. The threshold for a diagnostic hypothesis is the minimum acceptable rate of correct diagnoses over incorrect ones. Denoting A+ as a correct diagnosis and A- as an incorrect diagnosis, the probabilities of correct and incorrect diagnoses in the groups are P(A1+), P(A1-), P(A2+) and P(A2-). Accordingly, the decision rule for group 1 is:
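$$\frac{P(A_1 \mid x_1, x_2, \ldots)}{P(A_2 \mid x_1, x_2, \ldots)} \geq \frac{P(A_1^{+})}{P(A_1^{-})}$$
and for group 2 is:
$$\frac{P(A_1 \mid x_1, x_2, \ldots)}{P(A_2 \mid x_1, x_2, \ldots)} \leq \frac{P(A_2^{-})}{P(A_2^{+})}$$
where P(A1-) and P(A2-) are the levels of acceptable classification errors.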
Using types I and Il errors, for group 1:
P(A1+) = 1 -α ; and
P(A1-) = β. The ratio of correct to incorrect diagnoses in group 1 is as follows:
Figure imgf000015_0001
Similarly for group 2:
P(A2-) = α ; and
P(A2+) = 1 - β. The ratio of correct to incorrect diagnoses in group 2 is as follows:
Figure imgf000016_0001
The threshold for a diagnostic hypothesis is the minimum acceptable rate of correct diagnoses over incorrect ones. Thresholds for the sum of the diagnostic coefficients are defined as follows:
[Formulae (8) and (9), defining the thresholds DCth(A1) and DCth(A2), appear as images in the original publication.]
The threshold values are assigned to a particular condition or diagnosis. Thus, for example, A1 may be assigned to the condition of a tumour being benign and A2 to the condition of the tumour being malignant. This in turn means that the threshold value DCth(A1) is the minimum value of the diagnostic coefficients required in order to provide a diagnosis that the tumour is benign. In contrast, the threshold value DCth(A2) is the value that the diagnostic coefficients must reach or fall below in order to diagnose the condition as being a malignant tumour.
The threshold values for different levels of α and β are given in Table II below.
Table II
[Table II appears as an image in the original publication.]
It can be seen from Table II, for example, that for an accuracy of 95% (that is, α = β = 0.05), the threshold values for the diagnostic coefficients are 12.8 and -12.8. Increasing the accuracy of the analysis and symptom data to 99% provides threshold values for the diagnostic coefficients of 30 and -30.
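Formulae (8) and (9) survive only as images. The sketch below assumes the thresholds are the decibel-scaled log ratios of correct to incorrect diagnoses, DCth(A1) = 10·log10((1 - α)/β) and DCth(A2) = 10·log10(α/(1 - β)). This assumed form reproduces the ±12.8 quoted for α = β = 0.05 and the 9.8 and -12.5 (-12.55 before rounding) quoted for Case 2 of Example 2, but it gives ±30 only at α = β = 0.001, so it should be checked against the published formulae before use. The function name `snup_thresholds` is hypothetical.

```python
import math
from typing import Tuple


def snup_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Return (DCth(A1), DCth(A2)) for acceptable error levels alpha and beta.

    ASSUMED form, not copied from formulae (8) and (9):
        DCth(A1) = 10 * log10((1 - alpha) / beta)   # upper threshold
        DCth(A2) = 10 * log10(alpha / (1 - beta))   # lower threshold
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    upper = 10.0 * math.log10((1.0 - alpha) / beta)
    lower = 10.0 * math.log10(alpha / (1.0 - beta))
    return upper, lower


for a, b in [(0.05, 0.05), (0.05, 0.10), (0.001, 0.001)]:
    up, low = snup_thresholds(a, b)
    print(f"alpha={a}, beta={b}: DCth(A1)={up:.2f}, DCth(A2)={low:.2f}")
```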
Considering the task of assigning the symptom or analysis data x to one of the groups A1 or A2, the inference rules for the SNuP are as follows:
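If $\frac{P(A_1 \mid x_1, x_2, \ldots)}{P(A_2 \mid x_1, x_2, \ldots)} \geq \frac{1-\alpha}{\beta}$, then the decision is "x ∈ Group A1".
If $\frac{P(A_1 \mid x_1, x_2, \ldots)}{P(A_2 \mid x_1, x_2, \ldots)} \leq \frac{\alpha}{1-\beta}$, then the decision is "x ∈ Group A2".
If $\frac{\alpha}{1-\beta} < \frac{P(A_1 \mid x_1, x_2, \ldots)}{P(A_2 \mid x_1, x_2, \ldots)} < \frac{1-\alpha}{\beta}$, then additional information is required to assign x to one of the groups.
If $\frac{\alpha}{1-\beta} < \frac{P(A_1 \mid x_1, x_2, \ldots)}{P(A_2 \mid x_1, x_2, \ldots)} < \frac{1-\alpha}{\beta}$ and no more features are available, then the decision is "membership of x is undefined".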
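The four inference rules can be written as a small function of the posterior ratio R = P(A1|x1, x2, ...)/P(A2|x1, x2, ...). This is a sketch only: the function name and the returned labels are hypothetical, and estimating R from the symptom data is left outside it.

```python
def snup_decision(ratio: float, alpha: float, beta: float, more_features: bool) -> str:
    """Apply the SNuP inference rules to R = P(A1|x1,x2,...) / P(A2|x1,x2,...)."""
    upper = (1.0 - alpha) / beta   # R at or above this: assign x to group A1
    lower = alpha / (1.0 - beta)   # R at or below this: assign x to group A2
    if ratio >= upper:
        return "A1"
    if ratio <= lower:
        return "A2"
    # lower < R < upper: the evidence is not yet conclusive
    return "additional information required" if more_features else "undefined"


print(snup_decision(25.0, 0.05, 0.05, more_features=True))   # upper bound here is 19
print(snup_decision(1.0, 0.05, 0.05, more_features=False))
```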
Thus, in the next step of the method of the present invention, the value of the diagnostic coefficient DCimax with the highest importance factor Ji is compared with the threshold values determined for the diagnostic coefficients. If the value of DCimax exceeds one of the threshold values, the exceeded threshold value is identified and the method terminated. If the value of DCimax does not exceed either threshold value, it is summed with the value of the diagnostic coefficient DCi having the next highest importance factor Ji. If the value of this sum exceeds one of the threshold values, the exceeded threshold value is identified and the method terminated. If neither threshold value is exceeded, the successive summation of the diagnostic coefficients DCi in order of decreasing importance factor Ji is continued until the value of the sum exceeds one of the threshold values. At this point, the summation is ceased and the exceeded threshold value identified.
The accumulation of the diagnostic information using the diagnostic coefficients DCi is performed as a sum, as follows:
$$\sum DC(n) = DC(x_1) + DC(x_2) + \ldots + DC(x_n) \qquad (10)$$
The SNuP using diagnostic coefficients DCi is performed until the following inequality is no longer true:
$$DC_{th}(A_2) < \sum DC(x_i) < DC_{th}(A_1) \qquad (11)$$
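A minimal sketch of the successive summation, assuming each result has already been reduced to a diagnostic coefficient DCi and an importance factor Ji. The `Symptom` class and `snup` function are hypothetical names; an unmeasured result (`dc=None`) is simply skipped, as with CA125 in Case 2 of Example 2. The DCi values in the usage lines are those of Cases 1 and 2 of Example 2, but the importance values are invented placeholders that only reproduce the order of the steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Symptom:
    name: str
    dc: Optional[float]   # diagnostic coefficient DCi of the observed level; None if not measured
    importance: float     # importance factor Ji


def snup(symptoms: List[Symptom], dc_th_a1: float, dc_th_a2: float) -> Tuple[str, float, List[str]]:
    """Sum DCi in decreasing order of Ji until inequality (11) no longer holds."""
    total = 0.0
    used: List[str] = []
    for s in sorted(symptoms, key=lambda s: s.importance, reverse=True):
        if s.dc is None:              # missing result: contributes nothing
            continue
        total += s.dc
        used.append(s.name)
        if total >= dc_th_a1:         # upper threshold DCth(A1) reached
            return "A1", total, used
        if total <= dc_th_a2:         # lower threshold DCth(A2) reached
            return "A2", total, used
    # every coefficient summed, sum still inside the threshold range
    return "undefined: further symptom data required", total, used


case1 = [Symptom("Smooth", -10.0, 4.0), Symptom("Col4", 11.1, 3.0),
         Symptom("Un", -10.3, 2.0), Symptom("C_CA125", -5.6, 1.0)]
case2 = [Symptom("Smooth", -10.0, 5.0), Symptom("Col4", 11.1, 4.0),
         Symptom("Un", 2.5, 3.0), Symptom("C_CA125", None, 2.0),
         Symptom("Ascites", 6.6, 1.0)]
print(snup(case1, 12.8, -12.8))
print(snup(case2, 9.8, -12.5))
```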
As noted above, each threshold value of the diagnostic coefficient is the minimum acceptable rate of correct diagnoses over incorrect ones. Thus, identification of the exceeded threshold in turn allows the correct diagnosis to be made. For example, let A1 represent the subject having a benign tumour and A2 the subject having a malignant tumour. Should the value of the diagnostic coefficient DCimax with the highest importance factor Ji, or the successive summation of the diagnostic coefficients, exceed the threshold value for the group A1, given as DCth(A1) in Table II, then the method indicates that the subject has a benign tumour. In contrast, should DCimax or the successive summation of the diagnostic coefficients exceed the threshold value for the group A2, given as DCth(A2) in Table II, the subject is to be diagnosed with a malignant tumour.
In a further aspect, the present invention provides a system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; b) means for calculating a diagnostic coefficient DCi for each result of an analysis xi; c) means for calculating an importance factor Ji for each result of an analysis xi; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DCi using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in feature (e).
The system of this aspect of the present invention is capable of processing binary analysis data. However, as noted hereinbefore, many tests, examinations and analyses conducted on or in respect of a subject provide continuous data as results. Accordingly, it is particularly preferred that the system comprises means for converting continuous analysis data into discrete data, for example binary data.
In a still further aspect, the present invention provides a system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means for calculating a diagnostic coefficient DCi for each result of an analysis xi; c) means for calculating an importance factor Ji for each result of an analysis xi; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DCi using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in feature (e).
As discussed hereinbefore, the methods of the present invention may be used to indicate, at an early stage in the diagnostic assessment of a subject, the extent to which some or all of the available tests, analyses and examinations in relation to the subject are required in order to allow the medical practitioner to arrive at a clear diagnosis. In particular, the methods may be applied to identify those tests, analyses and examinations that are not required in order to reach a clear diagnosis, thus reducing the time for which the subject undergoes possibly invasive procedures, the overall time taken to carry out the prediagnostic assessments, and the overall cost of the diagnostic procedure.
Accordingly, in a further aspect, the present invention provides a method of identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the method comprising the steps of: a) provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; b) calculate a diagnostic coefficient DCi for each result of an analysis xi; c) calculate an importance factor Ji for each result of an analysis xi; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DCi using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in step (e); h) identify the threshold exceeded in step (g); i) if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DCi in step (g), provide an indication that further symptom data for the subject are required. The method applies the general process discussed hereinbefore to a first set of symptom data obtained from a first group of tests or analyses. This first set may contain only some of the results of a complete investigation into the condition of the subject; the method is nevertheless applied to this partial data set. If the successive summation of the diagnostic coefficients results in one of the threshold values being exceeded, then a diagnosis of the condition can be made without the need for further investigations or tests.
It is only when the successive summation of all the diagnostic coefficients in the first data set does not provide a sum that exceeds one of the thresholds that further investigations are required. The method may be applied after each analysis, test or investigation into the subject and the procedures continued only until a threshold is exceeded and a diagnosis is possible. In this way, the method indicates whether a diagnosis of the condition is possible from a selection of tests, analyses or investigations selected according to specified criteria, such as time taken, discomfort or risk to the subject, and/or cost.
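The staged use of the method described above can be sketched by re-running the successive summation each time a new group of investigations is completed, and reporting which later groups were not needed. The stage grouping, the `staged_snup` name and the patient are hypothetical; the DCi and Ji values are borrowed from Table V purely for illustration and do not describe a real case.

```python
from typing import List, Optional, Tuple

# A result is (name, DCi, Ji); a stage is (stage name, results that stage yields).
Result = Tuple[str, float, float]
Stage = Tuple[str, List[Result]]


def staged_snup(stages: List[Stage], upper: float, lower: float
                ) -> Tuple[str, Optional[str], List[str]]:
    """Re-run the SNuP after each stage; report the deciding stage and the stages skipped."""
    collected: List[Result] = []
    for k, (stage_name, results) in enumerate(stages):
        collected.extend(results)
        total = 0.0
        # successive summation over all results so far, in decreasing order of Ji
        for _, dc, _ in sorted(collected, key=lambda r: r[2], reverse=True):
            total += dc
            if total >= upper or total <= lower:
                decision = "A1" if total >= upper else "A2"
                skipped = [name for name, _ in stages[k + 1:]]
                return decision, stage_name, skipped
    return "further symptom data required", None, []


# Hypothetical patient, stages ordered cheapest / least invasive first
stages = [
    ("clinical", [("age level 2", 4.8, 1.35)]),
    ("ultrasound", [("Col Score 4", 9.6, 3.91), ("Ascites 1", 11.9, 2.64)]),
    ("serum CA125", [("lg(CA125) level 3", 13.6, 1.11)]),
]
print(staged_snup(stages, upper=9.8, lower=-12.5))
```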
As noted hereinbefore, the method may be applied to sets of data that contain only binary values. However, it is most advantageous if the method is also applied to data sets containing continuous values. In this case, the continuous data values are converted into discrete data, as hereinbefore described.
In a further aspect, the present invention provides a method of identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the method comprising the steps of: a) provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; a1) convert continuous result data present in the set of symptom data into discrete result data; b) calculate a diagnostic coefficient DCi for each result of an analysis xi; c) calculate an importance factor Ji for each result of an analysis xi; d) specify an acceptable level of error in the characterisation; e) determine the thresholds for the diagnostic coefficients DCi using the error levels specified in step (d) and define a threshold range therebetween; f) compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in step (e); g) successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in step (e); h) identify the threshold exceeded in step (g); i) if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DCi in step (g), provide an indication that further symptom data for the subject are required.
A system for carrying out the method of the foregoing aspects of the invention is also provided. Accordingly, there is provided a system for identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the system comprising: a) means to provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; b) means to calculate a diagnostic coefficient DCi for each result of an analysis xi; c) means for calculating an importance factor Ji for each result of an analysis xi; d) means for specifying an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DCi using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in feature (e); g) means to successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in feature (e); and h) means that, if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DCi in feature (g), provide an indication that further symptom data for the subject are required.
The system most preferably further comprises means for converting continuous data in the data set into discrete data. Accordingly, there is also provided a system for identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the system comprising: a) means to provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis xi conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means to calculate a diagnostic coefficient DCi for each result of an analysis xi; c) means for calculating an importance factor Ji for each result of an analysis xi; d) means for specifying an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DCi using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DCi of the analysis xi with the highest importance factor Ji, DCimax, with the thresholds determined in feature (e); g) means to successively sum the diagnostic coefficient DCimax with the diagnostic coefficient DCi having the next highest importance factor Ji until the value of the sum lies outside the threshold range defined in feature (e); and h) means that, if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DCi in feature (g), provide an indication that further symptom data for the subject are required. The method and apparatus of the present invention will now be further illustrated by the following examples, having reference to the accompanying figures, in which:
Figure 1 is a graphical representation of the method of the present invention applied to four different cases of ovarian cancer.
Example 1
Figure 1 illustrates, graphically, the method of the present invention applied to sets of data obtained from the investigation of subjects suspected of suffering from ovarian cancer. The result of the successive sum of DCi at every step of the procedure is indicated by an arrow. A1 is the outcome that the tumour is malignant, while A2 is the outcome that the tumour is benign. The thresholds for A1 and A2 are denoted by bold solid lines. The hatched areas in Figure 1 are the determinate zones for A1 and A2 (malignant and benign groups). Cases 1 and 2 demonstrate the SNuP with definitive variables (when one variable is enough to reach a threshold). Thus, in both cases 1 and 2, the value of the diagnostic coefficient DCimax having the highest importance factor exceeds one of the threshold values. In case 1, the tumour may be diagnosed as being malignant. In case 2, the subject is suffering from a benign tumour. Case 3 shows the straightforward classification of a benign tumour in three steps, involving the successive summation of the three diagnostic coefficients having the first, second and third highest importance factors. Case 4 is a difficult case of ovarian cancer. The final determination of the method applied to case 4 is that the tumour is malignant. However, the third and fourth summations provide an indication that the tumour may be benign, as the value of the summation tends towards the threshold value DCth(A2). In a conventional diagnostic procedure, a medical practitioner may have interpreted such a trend in the data to indicate a benign tumour. However, proceeding further with the method of successive summation until a threshold value is exceeded demonstrates that such a diagnosis would be incorrect and that the subject is suffering from a malignant tumour.
In any case, should the result of the summation procedure after all the diagnostic coefficients had been successively summed still lie between the threshold values, this would indicate that further investigative data are required in order to provide a full diagnosis of the condition of the subject.
Example 2
A study was conducted of data obtained from the investigation of 525 patients admitted to the Department of Obstetrics and Gynecology at the University Hospitals, Katholieke Universiteit Leuven. All the patients underwent transvaginal ultrasonography with B-mode and colour Doppler imaging. The level of serum oncomarker CA125 was measured for 432 patients. A summary of the collected data is set out in Table I above. A detailed description of the data acquisition process is set out in D. Timmerman, et al., 'A comparison of methods for preoperative discrimination between malignant and benign adnexal masses: the development of a new logistic regression model', Am. J. Obstet. Gynecol., Vol. 181, No. 1, pages 57 to 65, July 1999.
As part of the ultrasound examination, the amount of blood flow was assessed within the septa, cyst walls, solid tumour areas or ovarian stroma. Depending on whether the amount of blood flow was rather strong or very strong, two new binary variables were added, 'Col3' and 'Col4'. The variable CA125 was transformed to binary values, 0 or 1, using a threshold value of 30 U/ml: a value of 1 was assigned if CA125 > 30 U/ml, and a value of 0 otherwise.
The Risk of Malignancy Index (RMI) was used as a benchmark during the performance evaluation. RMI values were calculated according to the formula RMI = Jacobs × Meno × CA125. Details of the RMI are set out in I. Jacobs, et al., 'A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer', Br. J. Obstet. Gynaecol., Vol. 97, No. 10, pages 922 to 929, October 1990. The ultrasound score (Morph) was calculated as the sum of scores for the presence of multilocular cyst, evidence of solid areas, evidence of metastases, presence of ascites and bilateral lesions. Jacobs' index was assigned a value of 0 if Morph = 0, a value of 1 if Morph = 1 and a value of 3 if Morph > 1. The menopausal status (Meno) was equal to 1 if premenopausal and 3 if postmenopausal.
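The CA125 binarisation and the RMI benchmark can be sketched as follows. The 30 U/ml cut-off and the Jacobs and Meno scores follow the text; the assumption that each of the five ultrasound features contributes a score of 1 to Morph, and the function names, are not taken from the source.

```python
def binarise_ca125(ca125_u_per_ml: float, cutoff: float = 30.0) -> int:
    """C_CA125: 1 if serum CA125 exceeds the cut-off (30 U/ml in Example 2), else 0."""
    return 1 if ca125_u_per_ml > cutoff else 0


def rmi(multilocular: bool, solid_areas: bool, metastases: bool, ascites: bool,
        bilateral: bool, postmenopausal: bool, ca125_u_per_ml: float) -> float:
    """Risk of Malignancy Index, RMI = Jacobs x Meno x CA125."""
    # Morph: ASSUMED score of 1 for each ultrasound feature present
    morph = sum([multilocular, solid_areas, metastases, ascites, bilateral])
    jacobs = 0 if morph == 0 else (1 if morph == 1 else 3)
    meno = 3 if postmenopausal else 1
    return jacobs * meno * ca125_u_per_ml


print(binarise_ca125(9.0), binarise_ca125(45.0))
print(rmi(True, True, False, False, False, postmenopausal=True, ca125_u_per_ml=50.0))
```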
The calculated diagnostic coefficients and J-divergences for all nominal input variables are presented in Table III below. The last column shows the rank of the symptom. The sequential non-uniform procedure for preoperative differential diagnosis between benign and malignant forms of adnexal tumour is recommended to start from the most informative variables, i.e. variables with the highest J rank (e.g. smooth internal wall, strong blood flow, presence of unilocular cyst, level of serum CA125 above 30 U/ml, presence of ascites, etc).
Table III
[Table III appears as images in the original publication.]
N - total number of cases in the group, n - number of cases in the group with presence of feature
In this study only binary variables were considered. A large absolute value of DCi means a high discriminative ability of the variable, and the importance factor Ji gives an indication of the reliability of this variable. Features with positive DCi values correspond to malignancy, and those with negative values to the benign group. Accumulation of the diagnostic information was carried out by summation of the diagnostic coefficients and comparison of the sum with a specified threshold.
Two cases of ovarian tumour will now be used to demonstrate the application of the method of the present invention.
Case 1
A woman with a benign adnexal mass: aged 31, premenopausal, strong blood flow, CA125 not raised (9 U/ml), no ascites, unilocular ovarian cyst, smooth internal wall, mixed echogenicity (patient 3 in the database). The acceptable levels of error are α = 0.05 and β = 0.05, that is, 95% confidence is assumed for both decisions (benign and malignant). The threshold values for the summation of the diagnostic coefficients may be taken from Table II and are DCth(A1) = 12.8 for malignant and DCth(A2) = -12.8 for benign.
The successive summation of the diagnostic coefficients DCi proceeds as follows:
Step 1: Variable 'Smooth', DC(Smooth = 0) = -10, Sum(DC) = -10. Thresholds are not reached. Conclusion: continue procedure.
Step 2: Variable 'Col4' = 1, DC(Col4 = 1) = 11.1, Sum(DC) = -10 + 11.1 = 1.1. Thresholds are not reached. Conclusion: continue procedure.
Step 3: Variable 'Un' = 1, DC(Un = 1) = -10.3, Sum(DC) = 1.1 + (-10.3) = -9.2. Thresholds are not reached. Conclusion: continue procedure.
Step 4: Variable 'C_CA125' = 0, DC(C_CA125 = 0) = -5.6, Sum(DC) = -9.2 + (-5.6) = -14.8, Sum(DC) < DCth(A2).
Threshold has been exceeded. Conclusion: Stop SNuP. Decision: benign form of tumour.
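The running sums of Case 1 can be re-checked in a few lines; this sketch only reproduces the arithmetic and threshold comparisons quoted in Steps 1 to 4.

```python
from itertools import accumulate

# Case 1 of Example 2: (variable, DCi) in the order used in Steps 1 to 4
steps = [("Smooth", -10.0), ("Col4", 11.1), ("Un", -10.3), ("C_CA125", -5.6)]
upper, lower = 12.8, -12.8  # DCth(A1) and DCth(A2) for alpha = beta = 0.05

running_sums = list(accumulate(dc for _, dc in steps))
for (name, dc), total in zip(steps, running_sums):
    if total >= upper:
        status = "DCth(A1) reached: malignant"
    elif total <= lower:
        status = "DCth(A2) reached: benign"
    else:
        status = "continue"
    print(f"{name}: DC = {dc:+.1f}, Sum(DC) = {total:+.1f} -> {status}")
```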
For case 1, four variables are enough to reach a diagnosis with a confidence level of 95%.
Case 2
This case contains data missing for some variables, indicating that certain tests or analyses were not carried out on the subject.
This is a difficult case of ovarian cancer: a woman aged 72, postmenopausal, with ascites, a multilocular cyst, strong blood flow and a smooth internal wall, and no information on the level of CA125 (patient number 216 in the database). The thresholds for Sum(DC) are 9.8 for malignant and -12.5 for benign (α = 0.05, β = 0.10).
The successive summation of the diagnostic coefficients DCi proceeded as follows:
Step 1: Variable 'Smooth', DC(Smooth = 0) = -10, Sum(DC) = -10. Thresholds are not reached. Conclusion: continue procedure.
Step 2: Variable 'Col4' = 1, DC(Col4 = 1) = 11.1, Sum(DC) = -10 + 11.1 = 1.1. Thresholds are not reached. Conclusion: continue procedure.
Step 3: Variable 'Un' = 0, DC(Un = 0) = 2.5, Sum(DC) = 1.1 + 2.5 = 3.6. Thresholds are not reached. Conclusion: continue procedure.
Step 4: Variable 'C_CA125' value unknown, Sum(DC) = 3.6. Thresholds are not reached. Conclusion: continue procedure.
Step 5: Variable 'Ascites' = 1, DC(Ascites = 1) = 6.6, Sum(DC) = 3.6 + 6.6 = 10.2, Sum(DC) > DCth(A1).
Threshold has been exceeded. Conclusion: Stop SNuP. Decision: malignant form of tumour.
Example 3
The performance of the method of the present invention was assessed using ROC analysis and 3-fold cross-validation. A 1:2 ratio between the sample sizes of the malignant and benign groups was taken from the initial data set.
The SNuP procedure of the present invention was applied to the ovarian tumour data set. The task was to distinguish between the malignant and benign forms of this kind of neoplasm. Apart from clinical examination, the differential diagnosis of these conditions involves ultrasound methods, tumour markers, CT and MRI. It is important to find a trade-off between the cost and number of diagnostic procedures on the one hand and the risk of missing a case in which urgent surgery might be required on the other.
The SNuP showed a high performance on a real data set during cross validation. The method is close to clinical thinking and can be used not only for research but also for educational purposes to demonstrate the inference process.
Example 4
A second study using the method of the present invention was performed on 1066 cases of adnexal masses collected during an international multicentre clinical trial across 14 research centres. The full database includes 1066 cases of ovarian tumours, 266 malignant and 800 benign. Histological diagnoses were used as the gold standard. There were three data modalities:
(i) clinical variables included family history of ovarian and breast cancer, age, menopausal status, previous hormonal surgery, and surgical history;
(ii) sonographic examination was performed in all cases with grey-scale and colour Doppler imaging, giving a total of over 40 morphological and blood flow velocity characteristics;
(iii) serum tumour marker CA125 was measured for 809 patients. When intratumoral blood flow velocity waveforms were not detected, the peak systolic velocity (PSV), time-averaged maximum velocity (TAMXV), pulsatility index (PI) and resistance index (RI) were set to 2.0 cm/s, 1 cm/s, 3.0 and 1.0, respectively.
At the first stage of analysis, some preprocessing was carried out in order to incorporate continuous variables into the model. Continuous variables were transformed into discrete values by automatically partitioning the input space into intervals, using univariate and multivariate k-means clustering procedures in MATLAB. For volumetric characteristics, such as the diameter of the lesion (LesD1-3), multivariate clustering was used.
Table IV below demonstrates this approach. First, it is necessary to specify the variables and the desired number of clusters. Three clusters were used by default, on the basis that values in these clusters might be described as 'low', 'medium' and 'high'. As a result, a new variable with three (or two) values was obtained. Values of the new variable were assigned by the following rule:
$$\text{value} = \arg\min_{j} \sqrt{\sum_{i=1}^{m} (x_i - c_{ij})^2}$$
where $\sqrt{\sum_{i=1}^{m} (x_i - c_{ij})^2}$ is the Euclidean distance from an observation x to the m-dimensional centroid c_j of cluster j.
In the case of a malignant-benign classification using the diameter of the lesion, the number of clusters is max(j) = 3, the number of modalities is max(i) = m = 3, and the coordinates of the cluster centroids c_j are the triplets (LesD1; LesD2; LesD3): {51.0; 40.7; 40.0}, {106.8; 85.3; 82.2} and {201.2; 148.0; 142.3}.
Table IV
Variable | Cluster 1 | Cluster 2 | Cluster 3
LesD1 | 51.0 | 106.8 | 201.2
LesD2 | 40.7 | 85.3 | 148.0
LesD3 | 40.0 | 82.2 | 142.3
LesD (new value) | 1 | 2 | 3
lg(CA125) | 1.11 | 1.83 | 3.01
lg(CA125) (new value) | 1 | 2 | 3
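The assignment rule above can be sketched as a nearest-centroid lookup using the centroids in Table IV. The sketch covers only the assignment step, not the k-means clustering that produced the centroids, and it assumes that lg denotes the base-10 logarithm of CA125 in U/ml; the names `discretise`, `LESD_CENTROIDS` and `LG_CA125_CENTROIDS` are hypothetical.

```python
import math
from typing import Sequence

# Cluster centroids taken from Table IV
LESD_CENTROIDS = [(51.0, 40.7, 40.0), (106.8, 85.3, 82.2), (201.2, 148.0, 142.3)]  # (LesD1, LesD2, LesD3)
LG_CA125_CENTROIDS = [(1.11,), (1.83,), (3.01,)]                                    # lg(CA125)


def discretise(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the 1-based index of the centroid nearest to x (Euclidean distance)."""
    distances = [math.dist(x, c) for c in centroids]
    return 1 + distances.index(min(distances))


print(discretise((60.0, 45.0, 42.0), LESD_CENTROIDS))        # small lesion
print(discretise((math.log10(100.0),), LG_CA125_CENTROIDS))  # CA125 = 100 U/ml
```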
After all input variables were converted to a discrete scale the successive summation procedure of the present invention was applied to the data. For every distinct value of a variable (e.g. strong blood flow, col score = 4) the following parameters were calculated:
the conditional probability of this event in the malignant and benign groups, P(xij | A1) and P(xij | A2);
the diagnostic coefficient for the distinct value of the variable, DC(xij), using formula (4); and
the J-divergence of the symptom's level, J(xij), using formula (6).
The values of J-divergence were then summed across all levels to produce an importance factor Ji of the symptom xi, applying formula (7). All the variables were sorted by their importance factor Ji in descending order. The most informative variables (that is, those with an importance factor J greater than 1.0) are summarised in Table V, where the total J is presented in the third column and all distinct levels of the variables are given in the fourth column, followed by the corresponding values of the diagnostic coefficients DCi. A large absolute value of DCi means a high discriminative ability of the variable, and the importance factor Ji gives an indication of the reliability of this variable. Features with positive DCi values correspond to malignancy, and those with negative values to the benign group. Accumulation of the diagnostic information was carried out by summation of the diagnostic coefficients and comparison of the sum with a specified threshold.
Table V
Rank | Variable | J | Levels of symptom | Diagnostic coefficients
1 | Locularity | 5J7 | {1 2 3 4 5 6} | {-15.9; 2.4; -6.1; 3.6; 7.4; 4.7}
2 | Col Score | 3.91 | {1 2 3 4} | {-11.2; -3.6; 2.0; 9.6}
3 | WallRegularity | 2.76 | {0 1} | {-6.4; 4.2}
4 | SolidD | 2.68 | {1 2} | {-2.6; 9.9}
5 | Ascites | 2.64 | {0 1} | {-2.1; 11.9}
6 | PI, RI, PSV, TAMXV | 2.19 | {1 2 3} | {11.6; 5.2; -3.3}
7 | RaI ioPapI it's | 2.05 | {1 2 3} | {6.5; -2.8; 7.5}
8 | NrLocules | 1.99 | {0 1 2 3 4 5 6} | {6.6; -4.0; -0.3; -2.9; -1.5; 1.8; 7.7}
9 | M u id | 1.79 | {1 2} | {8.9; -1.9}
10 | IMpNr | 1.78 | {0 1 2 3 4} | {-1.9; -0.7; 5.3; 2.7; 11.1}
11 | RipMαv | 1.68 | {0 1} | {-1.9; 8.1}
12 | Max Solid | 1.63 | {1 2 3} | {-1.4; 9.1; 11.5}
13 | age | 1.35 | {1 2 3} | {-4.4; 4.8; 0.9}
14 | OvD | 1.3 | {1 2 3} | {-3.2; 6.3; 1.8}
15 | PapSmooth | 1.21 | {0 1} | {-1.7; 6.4}
16 | Max IJCS | 1.19 | {1 2 3} | {-3.0; 6.1; 2.1}
17 | lg(CA125) | 1.11 | {1 2 3} | {0.7; 4.8; 13.6}
18 | LesD | 1.05 | {1 2 3} | {-2.8; 2.7; 5.3}
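Formulae (4), (6) and (7) are not reproduced in this section. The sketch below assumes DC(xij) = 10·log10(P(xij|A1)/P(xij|A2)) and J(xij) = 0.5·DC(xij)·(P(xij|A1) - P(xij|A2)), summed over the levels j to give Ji. With these assumed forms, back-solving the level probabilities from the DCi of the Ascites and WallRegularity rows of Table V gives importance factors within about 0.03 of the tabulated values, but the forms remain an assumption. The function name and the probabilities in the usage line are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dc_and_importance(p_a1: Sequence[float], p_a2: Sequence[float]) -> Tuple[List[float], float]:
    """Per-level diagnostic coefficients and importance factor Ji (ASSUMED forms).

    p_a1[j], p_a2[j] are P(x_ij | A1) and P(x_ij | A2) for level j of symptom x_i.
    """
    if len(p_a1) != len(p_a2):
        raise ValueError("both groups must have the same levels")
    dcs: List[float] = []
    importance = 0.0
    for p1, p2 in zip(p_a1, p_a2):
        if p1 <= 0.0 or p2 <= 0.0:
            raise ValueError("zero probability: DC is undefined without smoothing")
        dc = 10.0 * math.log10(p1 / p2)      # assumed form of formula (4)
        dcs.append(dc)
        importance += 0.5 * dc * (p1 - p2)   # assumed J(x_ij), summed over levels as in (7)
    return dcs, importance


# Binary symptom with hypothetical probabilities: level 0 = absent, level 1 = present
dcs, j_i = dc_and_importance([0.6, 0.4], [0.975, 0.025])
print([round(d, 2) for d in dcs], round(j_i, 2))
```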
In order to classify cases, the thresholds DCth(A1) and DCth(A2) had to be specified. β was fixed at 0.05, while α was varied from 0.90 to 0.001. Lower and upper thresholds for the sum of the diagnostic coefficients were calculated using formulae (8) and (9).
The performance of the method during 3-fold cross-validation is presented in Table VI. The last column of the table shows the median number of cases for which the diagnostic decision was undefined. This number increases as the level of acceptable error α decreases. As a result, a value of α = 0.10 was selected as the optimal threshold, as it produces relatively high performance (Se = 86.9%, Sp = 84.3%, Acc = 84.9%) and a low number of undefined cases (10 out of 355).
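The text does not say how undefined decisions enter Se, Sp and Acc. The sketch below assumes they are excluded from all three measures and counted separately; the function name and the string labels are hypothetical.

```python
from typing import Sequence, Tuple


def performance(truth: Sequence[str], decision: Sequence[str]) -> Tuple[float, float, float, int]:
    """Se, Sp and Acc over cases with a defined decision, plus the undefined count.

    truth: 'malignant' or 'benign'; decision: 'malignant', 'benign' or 'undefined'.
    """
    tp = fn = tn = fp = undefined = 0
    for t, d in zip(truth, decision):
        if d == "undefined":
            undefined += 1          # ASSUMED: excluded from Se, Sp and Acc
        elif t == "malignant":
            if d == "malignant":
                tp += 1
            else:
                fn += 1
        else:
            if d == "benign":
                tn += 1
            else:
                fp += 1
    nan = float("nan")
    se = tp / (tp + fn) if tp + fn else nan
    sp = tn / (tn + fp) if tn + fp else nan
    decided = tp + tn + fp + fn
    acc = (tp + tn) / decided if decided else nan
    return se, sp, acc, undefined


print(performance(["malignant", "malignant", "benign", "benign", "benign"],
                  ["malignant", "benign", "benign", "undefined", "malignant"]))
```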
The interpretation of diagnostic coefficients of untransformed and univariately transformed variables is straightforward when there is a clear assignment of the diagnostic coefficient DCi to the level of the symptom. For instance, low blood flow (ColScore = 1) is highly associated with a benign tumour, DCi = -11.2, whereas strong blood flow (ColScore = 4) is a marker of malignancy, DCi = 9.6.
An example of a univariately transformed variable is the log of serum CA125, which was split into three clusters that can be described as 'low', 'medium' and 'high' levels, with an increasing degree of association with malignancy. New variables created in two- or multidimensional space might have increasing values in one dimension and decreasing values in another, which may complicate interpretation slightly and require additional clinical input. Examples of these kinds of variables include the diameter of the solid component (SolidD), velocity indices (PI, RI, PSV, TAMXV), diameter of the ovaries (OvD) and diameter of the lesion (LesD).
Example 5
The performance of the method of the present invention was compared with that of an expert medical assessment.
Let S be the gold standard value (0 - benign tumour, 1 - malignant tumour), M the model result (0 - benign tumour, 0.5 - undefined, 1 - malignant tumour) and E the expert opinion (0 - benign tumour, 1 - malignant tumour). There are then six possible situations when comparing the model to an expert, as follows:
1) Method is correct. Expert is correct. S = M ∩ S = E.
2) Method is correct. Expert is incorrect. S = M ∩ S ≠ E.
3) Method is incorrect. Expert is correct. S ≠ M ∩ M ≠ 0.5 ∩ S = E.
4) Method is incorrect. Expert is incorrect. S ≠ M ∩ S ≠ E.
5) Method's result is undefined. Expert is correct. M = 0.5 ∩ S = E.
6) Method's result is undefined. Expert is incorrect. M = 0.5 ∩ S ≠ E.
In the above situations, conditions 1 and 4 represent a situation when the method and the expert agree. The rest of the conditions are more interesting. Conditions 2 and 6 are difficult cases for diagnosis. Condition 2 is true when the expert misses something or reaches a conclusion based on a wrong assumption. This may also be due to new knowledge discovered by the method of the present invention. When the expert outperforms the method, condition 3 is true. Condition 5 is possible when there is enough information for the expert to arrive at a correct conclusion but not enough for the method to provide a determinative answer.
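The six conditions can be encoded as a small lookup. Because condition 4 as written (S ≠ M ∩ S ≠ E) would also match an undefined result, the sketch checks M = 0.5 first; that ordering, the function name and the example triples are assumptions.

```python
from collections import Counter


def comparison_condition(s: int, m: float, e: int) -> int:
    """Map gold standard S, model result M and expert opinion E to conditions 1 to 6."""
    if m == 0.5:                    # method undefined: conditions 5 and 6
        return 5 if s == e else 6
    if s == m:                      # method correct: conditions 1 and 2
        return 1 if s == e else 2
    return 3 if s == e else 4       # method incorrect: conditions 3 and 4


# Hypothetical (S, M, E) triples, one per case
cases = [(1, 1, 1), (0, 0, 1), (1, 0, 1), (1, 0, 0), (0, 0.5, 0), (0, 0.5, 1)]
print(Counter(comparison_condition(*c) for c in cases))
```

Tallying the returned conditions over a test set in this way gives counts of the kind summarised in Table VII.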
Table VII contains a summary of the results of the method of the present invention compared with the performance of an expert in making the same diagnosis.
Table VII
Condition | Benign (number of cases) | Malignant (number of cases)
1. M correct, E correct | 71 (73.2%) | 35 (79.5%)
2. M correct, E incorrect | 10 (10.3%) | 1 (2.3%)
3. M incorrect, E correct | 6 (6.2%) | 6 (13.6%)
4. M incorrect, E incorrect | 5 (5.1%) | 0 (0.0%)
5. M undefined, E correct | 3 (3.1%) | 2 (4.6%)
6. M undefined, E incorrect | 2 (2.1%) | 0 (0.0%)
Modelling conditions included 3-fold cross-validation, 141 cases in the test set, α = β = 0.10, and P(A) was taken into account. Table VII shows that for malignant tumours agreement was reached in 79.5% of cases (conditions 1 and 4: 79.5% and 0.0%), and for benign tumours the level of agreement was almost the same, 78.3% (73.2% and 5.1%). The method of the present invention outperformed the expert (condition 2) in 10 (10.3%) benign cases and 1 (2.3%) malignant case. The expert outperformed the method (conditions 3 and 5) in 9 (9.3%) cases of benign neoplasm and 8 (18.2%) cases of ovarian cancer.
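The summary percentages can be recomputed directly from the counts in Table VII, grouping the conditions as in the paragraph above. This is a small arithmetic check; the names `TABLE_VII` and `percentage` are hypothetical.

```python
# Case counts from Table VII: condition -> (benign, malignant).
TABLE_VII = {1: (71, 35), 2: (10, 1), 3: (6, 6), 4: (5, 0), 5: (3, 2), 6: (2, 0)}

def percentage(table, conditions, group):
    """Share (%) of cases in `group` (0 = benign, 1 = malignant) falling in `conditions`."""
    total = sum(counts[group] for counts in table.values())
    return 100 * sum(table[c][group] for c in conditions) / total

for name, conditions in [("agreement", (1, 4)),
                         ("method better", (2,)),
                         ("expert better", (3, 5))]:
    benign = percentage(TABLE_VII, conditions, 0)
    malignant = percentage(TABLE_VII, conditions, 1)
    print(f"{name}: benign {benign:.1f}%, malignant {malignant:.1f}%")
```

The group totals are 97 benign and 44 malignant cases, matching the 141-case test set. Recomputed from the raw counts, benign agreement is 76/97, about 78.4%; the 78.3% quoted above is the sum of the already rounded figures 73.2% and 5.1%.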
The results of the method-expert comparison show that the method of the present invention compares well with the diagnoses made by a medical expert.
Example 6
Measurement of the tumour marker CA125 in serum is common in the diagnosis of ovarian cancer, as well as during and after treatment. An abnormally raised level of CA125 has been shown to be associated with malignancy. However, many women with a benign tumour, and even healthy women, may have raised levels of CA125, which results in a high rate of misdiagnosis. Conversely, 10 to 20 percent of ovarian cancer patients have normal levels of CA125. Analysis of CA125 is also expensive, and the laboratory results take considerable time to be produced. It is therefore important to evaluate the role of CA125 in the preoperative differential diagnosis of adnexal masses, and either to establish the conditions in which the diagnosis will definitely benefit from CA125 or to identify the conditions in which CA125 can be omitted from the diagnostic procedure.
The method of the present invention was used to analyse a set of test data for a range of subjects, and the results obtained when the CA125 data were included in the data set were compared with those obtained when the CA125 data were omitted. The results are summarised in Table VIII.
Table VIII
[Table VIII is provided as an image in the original publication.]
For the analysis summarised in Table VIII, both α and β were set at 0.05.
As can be seen from Table VIII, including the CA125 result does not significantly improve classification performance, but it brings more certainty to the decision-making process by reducing the total number of undefined cases, although the median number of variables used stays the same.
The CA125 test has an importance factor J_i that ranks it fourth in importance. As demonstrated in Example 1, the method of the present invention may reach a diagnosis with fewer than four data points. In one experiment, CA125 data were used for 80 of the 141 patients. In 66 (82.5%) of these cases, omitting CA125 did not change the outcome of the method of the present invention; the corresponding figures per group were 30 (78.9%) benign cases and 36 (85.7%) malignant cases. Excluding CA125 produced worse results in 9 (11.3%) cases and better results in 5 (6.3%) cases.
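The text reports how many outcomes were unchanged, worse or better when CA125 was excluded, but does not define "better" or "worse". The sketch below assumes an ordering correct > undefined > incorrect, with outcomes coded as in Example 5 (0 benign, 0.5 undefined, 1 malignant). The functions `score` and `compare_runs` and the sample outcomes are hypothetical.

```python
from collections import Counter

def score(gold, result):
    """Assumed ordering: correct = 2, undefined (0.5) = 1, incorrect = 0."""
    if result == 0.5:
        return 1
    return 2 if result == gold else 0

def compare_runs(gold, with_var, without_var):
    """Tally how dropping one variable changes each case's outcome."""
    tally = Counter()
    for s, a, b in zip(gold, with_var, without_var):
        if a == b:
            tally["unchanged"] += 1
        elif score(s, b) < score(s, a):
            tally["worse"] += 1
        else:
            # For a fixed gold value, different outcomes always score differently.
            tally["better"] += 1
    return tally

# Hypothetical outcomes for five patients.
gold = [1, 0, 1, 0, 1]
with_ca125 = [1, 0, 0.5, 0, 1]
without_ca125 = [1, 0.5, 1, 0, 0]
tally = compare_runs(gold, with_ca125, without_ca125)
print(dict(tally))
```

Applied to the 80 patients with CA125 data, a tally of this kind would give the 66 unchanged, 9 worse and 5 better cases reported above, provided the same notion of "better" was used.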
Accordingly, the method of the present invention may be used to provide a diagnosis in a significant number of cases from a data set that does not contain CA125 data. If the method is applied and the result is indeterminate, that is, if neither threshold is exceeded, this indicates that further data are required, which may include CA125 data.
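The decision rule described above, set out as steps (f) to (h) of the claims, can be sketched as a loop that adds diagnostic coefficients in decreasing order of importance until the running sum leaves the threshold range. The function name and argument layout are hypothetical; the thresholds are passed in rather than derived, and a sum exactly equal to a threshold is treated as not exceeding it (an assumption).

```python
def sequential_diagnosis(dcs, importance, lower, upper):
    """Successively sum DC_i in decreasing order of J_i until a threshold is exceeded.

    dcs:        diagnostic coefficient DC_i for each available result x_i
    importance: importance factor J_i for the same results
    lower, upper: thresholds bounding the undecided range
    Returns (decision, number_of_results_used).
    """
    order = sorted(range(len(dcs)), key=lambda i: importance[i], reverse=True)
    total = 0.0
    for used, i in enumerate(order, start=1):
        total += dcs[i]
        if total > upper:
            return "upper threshold exceeded", used
        if total < lower:
            return "lower threshold exceeded", used
    # All results used without leaving the range: ask for more data (e.g. CA125).
    return "undefined: further data required", len(order)

print(sequential_diagnosis([3.0, -1.0, 8.0], [0.5, 0.2, 0.9], -10.0, 10.0))
```

Which threshold corresponds to malignancy depends on the sign convention chosen for the diagnostic coefficients, so the sketch reports only which threshold was crossed.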

Claims

1. A method of characterising a medical condition of a subject, the method comprising the steps of: a) providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculating an importance factor J_i for each result of an analysis x_i; d) specifying an acceptable level of error in the characterisation; e) determining the thresholds for the diagnostic coefficients DC_i using the error level specified in step (d) and defining a threshold range therebetween; f) comparing the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in step (e); g) successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in step (e); and h) identifying the threshold exceeded in step (g).
2. The method according to claim 1, wherein the medical condition is a tumour.
3. The method according to claim 1, wherein the medical condition is ovarian cancer.
4. The method according to any preceding claim, wherein the set of symptom data comprises a plurality of results.
5. The method according to any preceding claim, wherein the data in the set of symptom data are all discrete data.
6. The method according to claim 5, wherein the symptom data are all binary data.
7. The method according to any of claims 1 to 4, wherein the set of symptom data comprises continuous data.
8. The method according to claim 7, further comprising the step of: a1) converting the continuous data in the set of symptom data into discrete data.
9. The method according to claim 8, wherein the conversion of the data is carried out using fuzzy logic.
10. The method according to claim 8, wherein the conversion of the data is carried out using clustering techniques.
11. The method according to claim 10, wherein the clustering techniques comprise univariate and multivariate clustering.
12. The method according to any preceding claim, wherein the specification of an acceptable level of error comprises specifying values for α and β.
13. A method of characterising a medical condition of a subject, the method comprising the steps of: a) providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) converting continuous result data present in the set of symptom data into discrete result data; b) calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculating an importance factor J_i for each result of an analysis x_i; d) specifying an acceptable level of error in the characterisation; e) determining the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and defining a threshold range therebetween; f) comparing the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in step (e); g) if DC_imax does not exceed one of the thresholds determined in step (e), successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in step (e); and h) identifying the threshold exceeded in step (g).
14. A system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) means for calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in feature (e).
15. The system according to claim 14, further comprising means for converting continuous data in the set of symptom data into discrete data.
16. The system according to claim 15, wherein the means for converting the continuous data employs fuzzy logic and/or clustering techniques.
17. A system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means for calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in feature (e).
18. A system for characterising a medical condition of a subject, the system comprising: a) means for providing a set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means for calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means allowing a user to specify an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in feature (e); g) means for successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in feature (e).
19. A method of identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the method comprising the steps of: a) providing a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculating an importance factor J_i for each result of an analysis x_i; d) specifying an acceptable level of error in the characterisation; e) determining the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and defining a threshold range therebetween; f) comparing the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in step (e); g) successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in step (e); h) identifying the threshold exceeded in step (g); i) if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DC_i in step (g), providing further symptom data for the subject.
20. The method according to claim 19, further comprising the step of: a1) converting continuous data contained in the first set of symptom data into discrete data.
21. A method of identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the method comprising the steps of: a) providing a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) converting continuous result data present in the set of symptom data into discrete result data; b) calculating a diagnostic coefficient DC_i for each result of an analysis x_i; c) calculating an importance factor J_i for each result of an analysis x_i; d) specifying an acceptable level of error in the characterisation; e) determining the thresholds for the diagnostic coefficients DC_i using the error levels specified in step (d) and defining a threshold range therebetween; f) comparing the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in step (e); g) successively summing the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in step (e); h) identifying the threshold exceeded in step (g); i) if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DC_i in step (g), providing an indication that further symptom data for the subject are required.
22. A system for identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the system comprising: a) means to provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; b) means to calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means for specifying an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in feature (e); g) means to successively sum the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in feature (e); and h) means that, if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DC_i in feature (g), provide an indication that further symptom data for the subject are required.
23. A system for identifying the investigative procedures required to reach a diagnosis of a medical condition of a subject, the system comprising: a) means to provide a first set of symptom data for the subject, the set of symptom data comprising at least one result of an analysis x_i conducted on the subject; a1) means for converting continuous result data present in the set of symptom data into discrete result data; b) means to calculate a diagnostic coefficient DC_i for each result of an analysis x_i; c) means for calculating an importance factor J_i for each result of an analysis x_i; d) means for specifying an acceptable level of error in the characterisation; e) means to determine the thresholds for the diagnostic coefficients DC_i using the error levels specified in feature (d) and define a threshold range therebetween; f) means to compare the value of the diagnostic coefficient DC_i of the analysis x_i with the highest importance factor J_i, DC_imax, with the thresholds determined in feature (e); g) means to successively sum the diagnostic coefficient DC_imax with the diagnostic coefficient DC_i having the next highest importance factor J_i until the value of the sum lies outside the threshold range defined in feature (e); and h) means that, if no threshold is exceeded as a result of the successive summation of all the diagnostic coefficients DC_i in feature (g), provide an indication that further symptom data for the subject are required.
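The claims name the diagnostic coefficient DC_i, the importance factor J_i and the thresholds derived from α and β, but this part of the specification does not restate their formulas. The sketch below assumes the conventional definitions from Wald-style sequential diagnosis: DC as ten times the base-10 log likelihood ratio between two classes A and B, J as a Kullback-type informativeness measure, and Wald's decision thresholds. These formulas, the function names and the example probabilities are assumptions, not taken from the text.

```python
import math

def diagnostic_coefficients(p_a, p_b):
    """Assumed DC for each discrete value k: 10 * log10(P(k | A) / P(k | B)).

    p_a, p_b map each possible value of a result x_i to its probability in
    class A and class B. Zero probabilities would need smoothing first.
    """
    return {k: 10 * math.log10(p_a[k] / p_b[k]) for k in p_a}

def importance_factor(p_a, p_b):
    """Assumed Kullback-type informativeness: sum_k DC_k * 0.5 * (P(k|A) - P(k|B))."""
    dc = diagnostic_coefficients(p_a, p_b)
    return sum(dc[k] * 0.5 * (p_a[k] - p_b[k]) for k in p_a)

def wald_thresholds(alpha, beta):
    """Assumed Wald thresholds, in the same 10 * log10 units as DC."""
    upper = 10 * math.log10((1 - beta) / alpha)  # decide for class A
    lower = 10 * math.log10(beta / (1 - alpha))  # decide for class B
    return lower, upper

# Hypothetical binary result: present in 80% of class A and 20% of class B.
p_a = {"present": 0.8, "absent": 0.2}
p_b = {"present": 0.2, "absent": 0.8}
print(diagnostic_coefficients(p_a, p_b))  # about +6.02 and -6.02
print(importance_factor(p_a, p_b))
print(wald_thresholds(0.05, 0.05))        # about (-12.79, 12.79)
```

Under these assumptions, the α = β = 0.05 setting of Example 6 gives thresholds of about ±12.79, and the α = β = 0.10 setting of Example 5 gives about ±9.54.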

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0716460.1 2007-08-23
GB0716460A GB2452067A (en) 2007-08-23 2007-08-23 Method for prediction and diagnosis of medical conditions

Publications (1)

Publication Number Publication Date
WO2009024796A1 (en) 2009-02-26

Family

ID=38599151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/002859 WO2009024796A1 (en) 2007-08-23 2008-08-22 Method for prediction and diagnosis of medical conditions and apparatus for performing the same

Country Status (2)

Country Link
GB (1) GB2452067A (en)
WO (1) WO2009024796A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435757A (en) * 2020-10-27 2021-03-02 深圳市利来山科技有限公司 Prediction device and system for acute hepatitis

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
EP0912957B1 (en) * 1996-07-12 2004-12-08 First Opinion Corporation Computerized medical diagnostic and treatment advice system including network access
US20090054740A1 (en) * 2006-03-03 2009-02-26 Mentis Cura Ehf Method and apparatus of constructing and using a reference tool to generate a discriminatory signal for indicating a medical condition of a subject

Non-Patent Citations (1)

Title
IFEACHOR ET AL.: "Preoperative ovarian cancer diagnosis using neuro-fuzzy approach", 28 September 2005 (2005-09-28) - 30 September 2005 (2005-09-30), pages 1 - 8, XP002506224, Retrieved from the Internet <URL:ftp://ftp.esat.kuleuven.ac.be/sista/ida/reports/05-196.pdf> *


Also Published As

Publication number Publication date
GB0716460D0 (en) 2007-10-03
GB2452067A (en) 2009-02-25


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 08788419; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 EP: PCT application non-entry in European phase
Ref document number: 08788419; Country of ref document: EP; Kind code of ref document: A1