CA2391289A1

CA2391289A1 - Dynamic thresholding of segmented data sets and display of similarity values in a similarity image

Info

Publication number: CA2391289A1
Application number: CA002391289A
Authority: CA
Inventors: Bradley T. Wyman; Christopher L. Stork
Original assignee: Individual
Current assignee: Confirma Inc
Priority date: 1999-11-24
Filing date: 2000-11-24
Publication date: 2001-05-31
Also published as: KR20020077345A; JP2003515368A; NO20022447L; AU2046901A; NO20022448L; NO20022447D0; JP2003515828A; NO20022448D0; EP1236178A1; WO2001039122A1; CA2391290A1; EP1236176A1; AU763454B2; AU1796001A; WO2001039123A1; KR20020079742A; AU763459B2

Abstract

A method for image processing of magnetic resonance measurements to generate object-specific enhanced display images, including obtaining magnetic resonance measurements from an MRI apparatus; generating score data based on the magnetic resonance measurements; and modifying the score data to produce an enhanced image. Modifying the score data can include applying a convolution filter, automatically generating a score data rejection threshold, and modifying the score data to dynamically alter the rejection threshold to produce a dynamically-enhanced image. The method provides tissue segmentation results that are self-normalized, avoiding object and scanner dependencies in the generation of MRI images.

Description

DYNAMIC THRESHOLDING OF SEGMENTED DATA SETS AND DISPLAY
OF SIMILARITY VALUES IN A SIMILARITY IMAGE
TECHNICAL FIELD
The present invention pertains to a method for establishing and applying thresholds to classified data. This invention pertains to a method to automatically or dynamically derive and display the acceptance threshold used to determine the sufficiency of similarity between the training set and a test sample.
BACKGROUND OF THE INVENTION
Pattern recognition involves the division of a set of samples into a number of classes, based on measurements made on one or more properties of the samples.
The set of samples within a given class are similar in terms of their measurement values, which are reflective of these samples' characteristics. Pattern recognition methods can be categorized into two broad classes: (a) supervised techniques, which utilize a priori knowledge or assumptions about the class membership of a set of samples to develop a classification rule so as to predict the class membership of unknown samples, and (b) unsupervised techniques, which without making a priori assumptions about the data, identify the class memberships of samples within the data set.
Pattern recognition is commonly performed in several disciplines, including geology, chemistry, cytology, medical imaging, as well as banking and marketing. The types of samples on which pattern recognition is performed vary widely across disciplines. While the types of samples may differ across disciplines, the basic purpose of pattern recognition remains constant across all fields - to separate samples into logical groups where each group has distinct, measurable properties.
In supervised pattern recognition, given two or more samples of known class, hereby referred to as the training set, and a sample of unknown class, hereby referred to as the test sample, one compares the measurement value or values of a test sample to the corresponding measurement values of a training set to determine if sufficient similarity exists to consider the test sample as being within the same class as the training set.

WO 01/39123 CA 02391289 2002-05-10 PCT/US00/32204
On the other hand, in unsupervised pattern recognition, the samples are divided into classes based on measures of similarity between samples in a data set without any prior designation of class membership of any data.
U.S. Patent No. 5,003,979 describes a system for the detection and display of lesions in breast tissue, using magnetic resonance imaging (MRI) techniques.
In one described example, three different types of images are obtained for a given region, and the pixels of the image are then classified by comparing their intensity patterns to known patterns for pure tissue types, such as fat, cyst or cancer. The patent indicates that three specific types of images are adequate for statistically separating MR images of breast fat, cyst, carcinoma and fibroadenoma.
Applicants have found that in many cases, comparison of the pattern of intensities of a patient's tissue to a universal set of standard patterns for different tissue types does not produce results of sufficient accuracy. The basic problem appears to be that there is too much variability from one patient to the next, as well as from one MRI machine to the next, to apply a universal set of patterns to all patients. For this reason, the use of standard patterns does not result in the high degree of confidence that one must have in order to forego a more certain diagnostic technique, such as biopsy. Consequently, cancer diagnosis based on MRI has not yet achieved widespread acceptance.
A problem that occurs frequently in cancer treatment is detecting when a primary tumor has spread to other sites in the patient's body, to produce so-called secondary tumors, known as metastases, at those sites.
Detection and correct identification of metastases, using MRI or other imaging techniques, is often complicated by the fact that a remote lesion discovered during staging could represent either a metastasis or a benign incidental finding. A number of benign lesions (such as hepatic hemangiomas and non-functioning adrenal adenomas) occur as frequently in patients with a known primary tumor as they do in the general population.
Resolving this dilemma requires additional imaging or biopsy, but often significant uncertainty persists. Biopsy may expose the patient to substantial risk when the lesion is in the brain or mediastinum, or when the patient has impaired hemostasis. Even when biopsy does not present a significant risk to the patient, it may be technically challenging, such as sampling focal lesions in vertebral marrow.

SUMMARY OF THE INVENTION
A method for establishing and applying an automatic threshold determination to a test sample is provided. A method of displaying similarity images is also provided.
According to principles of the present invention, data are collected from an object to be studied. The data which have been collected are formed into a data set which is composed of a plurality of individual data elements. The data elements represent corresponding locations in the object under investigation, with one or more properties collected for each data element. Some data within this set are segmented into classes, using either a supervised or an unsupervised segmentation technique.
After the initial segmentation has occurred, one or more of the resulting classes is selected. A training data set within this chosen class of interest is created. An acceptance threshold is automatically created based on the properties of the training set.
Using this automatic threshold as a standard, all data elements in the object are classified. By definition, data elements not in the training set are designated as test samples. The test samples are compared to the data in the training set. If a test sample is very similar to the training set, so as to have a distance value that is less than or equal to the automatic threshold, it is determined to belong to the same class as the training set. If the test sample has a distance value which exceeds the automatic threshold, it is determined to not belong to the same class as the training set, and is placed in a different class. The automatic threshold is a custom generated number that has a different value for each training set and thus uniquely characterizes a subset of data elements within an object or patient. The automatic threshold therefore provides an improved analysis of the data because it is customized to each individual patient's own medical and physical characteristics.
One or more classes may be created which contain test samples that are similar to the training set, but are not sufficiently similar to be considered within the same class as the training set. The new class or classes may contain a range of test samples, some of which are more similar to the training set than other test samples. The new classes are displayed in a manner to show the similarity that these data elements have to the training set and to display the range of similarity relationships among the test samples and the training set.
According to one embodiment of the invention, the similarity values of the test samples to the training set are displayed on a color scale. All of the test samples that have values most similar to the training set are depicted with a single color.
Accordingly, all test samples that have values that are next most similar to the training set are depicted with a second color different from that assigned to the most similar set of test samples.
This multicolor display of similarity values may be used to depict many different levels of similarity, if desired.
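The banded color display described above can be illustrated with a minimal sketch. It assumes that similarity is expressed as a distance (smaller means more similar) and that the bands are defined by a list of upper bin edges; the function name `similarity_level`, the edge values, and the palette are hypothetical and are not taken from the specification.

```python
def similarity_level(distance, bin_edges):
    """Return 0 for the most similar band, 1 for the next band, and so on."""
    for level, edge in enumerate(bin_edges):
        if distance <= edge:
            return level
    return len(bin_edges)  # beyond the last edge: least similar band

# Hypothetical palette: one color per similarity band.
PALETTE = ["red", "orange", "yellow", "gray"]

# Hypothetical band edges and test-sample distances to the training set.
bin_edges = [1.0, 2.0, 3.0]
distances = [0.4, 1.0, 2.5, 7.0]

# Every sample falling in the same band receives the same color.
colors = [PALETTE[similarity_level(d, bin_edges)] for d in distances]
```

Adding more edges and palette entries yields more levels of similarity in the displayed image.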
The images are produced using magnetic resonance imaging (MRI) which includes a variety of different pulse sequences and types of pulses. The images are displayed on a monitor composed of an array of pixels. On the display, the region of interest appears as a set of pixels in a particular location. Using the display as a guide, a user selects a region of interest to be studied with more detail. An initial segmentation of tissues within the region of interest is carried out on the imager-derived data to determine the number of distinct tissues in the region. After the initial segmentation is performed, the user selects one or more of the classes of tissue within the region of interest to study in more detail. A training set is created for each selected tissue class within the region of interest. The similarity among the data elements in the training set is determined using the maximum nearest neighbor distance among the training samples. From this, a classification threshold is established that is set as the first threshold for the maximum distance for a test sample to be considered to be in the same class as the training set. The distance between each test sample and its nearest neighbor in the training set is determined. The test sample is accepted as a member of the same class as the training set when the distance of the test sample is less than or equal to the classification threshold distance.
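The threshold rule in the preceding paragraph can be sketched as follows. This is an illustrative reading only: it assumes Euclidean distance between feature vectors and a training set of at least two samples, and the function names and the sample intensities are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def automatic_threshold(training):
    """Largest nearest-neighbor distance found among the training samples."""
    nearest = []
    for i, sample in enumerate(training):
        others = (t for j, t in enumerate(training) if j != i)
        nearest.append(min(euclidean(sample, t) for t in others))
    return max(nearest)

def distance_to_training(test_sample, training):
    """Distance from a test sample to its nearest neighbor in the training set."""
    return min(euclidean(test_sample, t) for t in training)

def in_class(test_sample, training, threshold):
    """Accept the test sample when its distance does not exceed the threshold."""
    return distance_to_training(test_sample, training) <= threshold

# Hypothetical training pixels, each with two measured properties.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
threshold = automatic_threshold(training)
```

Because the threshold is derived from the training samples themselves, a different training set (for example, from a different patient or scanner) produces a different threshold without any manual tuning.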
In accordance with another embodiment of the present invention, a method for applying a patient-specific threshold parameter is provided. The method includes using an MRI apparatus to generate image data sets and to produce a training set and a test set, as described above; measuring the extent of similarity between the pixels in the training set and each test sample to determine a threshold distance for inclusion of pixels in a display class; and then dynamically varying the classification threshold distance.
In accordance with another aspect of the foregoing embodiment, the method includes displaying an image associated with the display class of pixels and dynamically altering the display of the image by dynamically varying the threshold value that changes the number of data elements in the class of interest. The display resulting from the dynamic varying of the threshold may be refreshed after each threshold change or displayed in "real time" meaning the display changes as the threshold is changed.
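One way to picture the dynamic variation of the threshold is sketched below. It assumes that the distance of each test sample to the training set has already been computed once, so that changing the threshold only requires a new comparison pass; the function name `class_mask` and the distance values are hypothetical.

```python
def class_mask(distances, threshold):
    """Mark each data element as inside (True) or outside (False) the class."""
    return [d <= threshold for d in distances]

# Hypothetical precomputed distances of five test samples to the training set.
distances = [0.2, 0.9, 1.4, 2.7, 3.1]

# Raising the threshold admits more data elements into the class of interest.
members_by_threshold = {
    threshold: sum(class_mask(distances, threshold))
    for threshold in (1.0, 2.0, 3.0)
}
```

Since the distances are not recomputed, the mask can be rebuilt and the display refreshed each time the user moves the threshold.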

The invention will be described in one embodiment in the context of data from MRI as one example. The invention can be applied to the display of any image data collected by any acceptable technique, such as NMR, x-ray, computed tomography (CAT) scan, positron emission tomography (PET) scan, nuclear imaging, and other types, or a combination of data from two or more of the techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is an isometric view of a known data collection apparatus which may be used to collect image data for later processing according to the present invention.
Figure 2 is a functional block diagram illustrating the steps in an automatic guided specific tissue segmentation method.
Figure 3 is a flowchart illustrating the steps performed in determining the class membership of a test sample.
Figure 4 is a functional block diagram illustrating one technique to calculate and display similarity values.
Figures 5A and 5B are displays of an MRI image without and with use of the invention, respectively.
Figures 6A and 6B illustrate a similarity image according to the present invention as compared to a binary thresholded image, respectively.
Figures 7A and 7B illustrate dynamically varying the classification threshold according to the present invention.
Figures 8A, 8B and 8C are screen views that a user of the invention may see when comparing images using the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 is a known sensor and data collection device as described in U.S. Patent No. 5,644,232. It illustrates one technique by which data can be collected for analysis according to the principles of the present invention.
Details of magnetic resonance imaging methods are disclosed in U.S. Patent No. 5,311,131, entitled, "Magnetic Resonance Imaging Using Pattern Recognition"; U.S. Patent No. 5,644,232, entitled, "Quantitation and Standardization of Magnetic Resonance Measurements"; and U.S. Patent No. 5,818,231, entitled, "Quantitation and Standardization of Magnetic Resonance Measurements." The above-referenced three patents are incorporated in their entirety herein by reference. The technical descriptions in these three patents provide a background explanation of one environment for the invention and are beneficial to understand the present invention.
Pattern recognition is utilized in several disciplines and the application of thresholding as described with respect to this invention is pertinent to all of these fields.
Without the loss of generality, the examples and descriptions will all be limited to the field of magnetic resonance imaging (MRI) for simplicity. Of particular interest is the application of pattern recognition technology in the detection of similar lesions such as tumors within magnetic resonance images. Therefore, additional background on the process of MRI and the detection of tumor using MRI is beneficial to understand the invention.
Magnetic resonance (MR) is a widespread analytical method used routinely in chemistry, physics, biology, and medicine. Nuclear magnetic resonance (NMR) is a chemical analytical technique that is routinely used to determine chemical structure and purity. In NMR, a single sample is loaded into the instrument and a representative, multivariate, chemical spectrum is obtained. The magnetic resonance method has evolved from being only a chemical/physical spectral investigational tool to an imaging technique, MRI, that can be used to evaluate complex biological processes in cells, isolated organs, and living systems in a non-invasive way. In MRI, sample data are represented by an individual picture element, called a pixel, and there are multiple samples within a given image.
Magnetic resonance imaging utilizes a strong magnetic field for the imaging of matter in a specimen. MRI is used extensively in the medical field for the noninvasive evaluation of internal organs and tissues, including locating and identifying benign or malignant tumors.
As shown in Figure 1, a patient 20 is typically placed within a housing 12 having an MR scanner, which is a large, circular magnet 22 with an internal bore large enough to receive the patient. The magnet 22 creates a static magnetic field along the longitudinal axis of the patient's body 20. The magnetic field results in the precession or spinning of charged elements such as the protons. The spinning protons in the patient's tissues preferentially align themselves along the direction of the static magnetic field. A radio frequency electromagnetic pulse is applied, creating a new temporary magnetic field. The proton spins now preferentially align in the direction of the new temporary magnetic field. When the temporary magnetic field is removed, the proton spin returns to align with the static magnetic field. Movement of the protons produces a signal that is detected by an antenna 24 associated with the scanner. Using additional magnetic gradients, the positional information can be retrieved and the intensity of the signals produced by the protons can be reconstructed into a two- or three-dimensional image.
The realignment of the protons' spin with the original static magnetic field (referred to as relaxation) is measured along two axes. More particularly, the protons undergo a longitudinal relaxation (T1) and transverse relaxation (T2). Because different tissues undergo different rates of relaxation, the differences create the contrast between different internal structures as well as a contrast between normal and abnormal tissue. Thus, the signal intensity is proportional to a combination of the number of protons, the T1 and the T2 properties of the tissue. Proton density weighted images generally emphasize differences in the number of protons between different tissues, while T1-weighted images generally emphasize the difference in T1 relaxation times between different tissues. Similarly, T2-weighted images emphasize the difference in T2 relaxation times between different tissues.
By manipulating the parameters of the MR scanner, an operator can produce images that are dominated by T1 or T2 relaxation (i.e., T1-weighted and T2-weighted images) or proton density. In addition to T1, T2, and proton density, variations in the sequence selection permit the measurement of chemical shift, proton bulk motion, diffusion coefficients, and magnetic susceptibility using MR. The information obtained for the computer guided tissue segmentation may also include such features as: a spin-echo (SE) sequence; two fast spin-echo (FSE) double echo sequences; and fast stimulated inversion recovery (FSTIR) or any of a variety of sequences approved for safe use on the imager. Contrast agents are types of drugs which may be administered to the subject. If given, contrast agents typically distribute in various compartments of the body over time and provide some degree of enhanced image for interpretation by the user. In addition to the above, pre- and post-contrast sequence data sets were acquired. In other embodiments, a T1, T2, proton density, and four echo sequences were acquired. Any acceptable data acquisition method, sequences and combinations thereof, can be used to collect the data according to the present invention. Thus, by using multiple sequences, multivariate image data can be obtained. Each pixel can be considered a sample and by using different sequences to image the same physical location, each sequence produces a new measurement for the sample.

Each data element under consideration has one or more properties which describe a corresponding portion of the object which the data element represents. Each of these properties has a numerical value. For example, if the image which has been acquired is an MRI image, then the properties of each data element may include such features as the longitudinal relaxation factor, T1, or the transverse relaxation factor, T2, weighted T1 or T2 images, the proton density space, or other parameters which are normally measured in an MRI, as is known in the art. Therefore, each data element has a numerical value for each of the properties that describe it. Each data element will thus be described by several different numbers, one number for each of the properties stored. The data are thus multivariate. The numerical values may be thought of as defining the position of a data element in multi-dimensional space and reflecting the magnetic resonance properties of the tissue corresponding to that location.
Namely, each one of the parameters represents one of the dimensions for the location of the object in a Euclidean geometry field. If two properties of an object are stored for each data element, then the field becomes a two-dimensional Euclidean plane. If three parameters are stored, then the data element can be considered as being at a location in a three-dimensional Euclidean field. Similarly, if four physical parameters are represented, then the object may be considered as being at a location in a four-dimensional Euclidean field. Each data element, therefore, has a location within the multi-dimensional Euclidean field.
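As an illustration of treating each data element as a point in a multi-dimensional Euclidean field, the sketch below stacks several co-registered images of the same slice into one feature vector per pixel. The function name `stack_sequences`, the image names, and the intensity values are hypothetical; real data would come from the acquired sequences.

```python
import math

def stack_sequences(images):
    """Combine K co-registered 2-D images into one K-dimensional vector per pixel."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        tuple(image[i][j] for image in images)
        for i in range(rows)
        for j in range(cols)
    ]

# Hypothetical 2 x 2 images of the same slice from three different sequences.
t1_weighted = [[10, 12], [50, 52]]
t2_weighted = [[80, 78], [20, 22]]
proton_density = [[40, 41], [60, 61]]

# Each pixel becomes a point in a three-dimensional Euclidean field.
samples = stack_sequences([t1_weighted, t2_weighted, proton_density])

# Pixels with similar measurements lie close together in that field.
near = math.dist(samples[0], samples[1])
far = math.dist(samples[0], samples[2])
```

With three stored properties the field is three-dimensional; adding a fourth image to the list would place each pixel in a four-dimensional field without changing the code.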
In Figure 1, an object to be examined, in this case a patient's body 20, is shown. A slice 26 of the body 20 under examination is scanned and the data collected. The data are collected, organized and stored in a signal processing module 18 under control of a computer 14. A display 15 may display the data as they are collected and stored. It may also provide an interface for the user to interact with and control the system. A power supply 16 provides power for the system.
The current known clinical standard for locating tumor tissue with MRI involves having an experienced radiologist interpret the images for suspected lesions. Radiologists are skilled in detecting anatomic abnormalities and in formulating differential diagnoses to explain their findings. Unfortunately, only a small fraction of the wealth of information generated by magnetic resonance is routinely available because the human visual system is unable to correlate the complexity and volume of data. The specific problem is that radiologists try to answer clinical questions precisely regarding the location of certain tissues, but seldom can they extract enough information visually from the images to make a specific diagnosis because the tissues are very complex and therefore difficult to fully segment in the image provided. This problem is compounded for MRI which produces many different types of images during a single imaging session.
To use all of the information created by an MRI examination, radiologists have to simultaneously view several images created with different MR scanner settings and understand the simultaneous complex relationships among millions of data. The unassisted human visual system is not capable of seeing, let alone processing, all of the information.
Consequently, much of the information generated by a conventional MRI study is wasted.
There is therefore a great need to efficiently utilize more of the existing MR information to more accurately segment the various tissues and thereby improve the confidence of conclusions drawn from the interpretations of medical images. Because a proper determination of the location and the extent of a tumor (a process called staging) will determine the course of treatment and may impact the likelihood of recovery, accurate staging is important for proper patient management.
To assist the radiologist in making diagnostic decisions and to reduce the time and expense associated with the analysis of magnetic resonance (MR) data, computerized MR-based tissue segmentation methods have been developed, some of which are described in the three patents incorporated by reference. Here, segmentation refers to the specific application of pattern recognition technology in the analysis of images. The purpose of the prior art MR-based tissue segmentation methods is to divide the pixels within an MR image into different tissue types. The success of MR-based segmentation methods lies in the ability of the computer to quickly and correctly correlate all of the digital information provided by MRI.
Attempts at supervised segmentation methods have been made in the analysis of MRI images, but with limited success. Training samples are typically obtained through a manual labeling of a few pixels or contiguous regions in the image set for each tissue of interest by a qualified observer. Multiple tissues or classes are modeled and, for a given test sample, a value is calculated for each modeled class indicating the similarity of the test sample to the class in question. The test sample is then assigned to the modeled class to which it is deemed most similar. An obvious problem with this approach occurs if all classes within the image set have not been modeled. If a given pixel belongs to an unmodeled class, it is falsely assigned to one of the modeled classes utilizing current supervised segmentation techniques.
The present invention provides an MRI-based supervised segmentation method that allows the automated and accurate identification of tissues of interest, such as tumor, without the requirement of modeling all tissues within the image set.
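The difference between the two approaches can be sketched in a few lines. The example is a simplification under stated assumptions: each modeled class is reduced to a single hypothetical centroid for the conventional rule, Euclidean distance is used throughout, and the names, labels, intensities, and threshold value are invented for illustration.

```python
import math

def nearest_class(sample, class_models):
    """Conventional rule: always assign the closest modeled class."""
    return min(class_models, key=lambda name: math.dist(sample, class_models[name]))

def thresholded_class(sample, training, threshold, label):
    """Single-class rule: assign the label only if the sample is close enough."""
    distance = min(math.dist(sample, t) for t in training)
    return label if distance <= threshold else None

# Hypothetical modeled classes (one centroid each) and one tissue of interest.
class_models = {"fat": (90.0, 20.0), "muscle": (40.0, 40.0)}
tissue_training = [(40.0, 40.0), (42.0, 41.0)]

# A pixel from a tissue that was never modeled.
unmodeled_pixel = (10.0, 95.0)

forced = nearest_class(unmodeled_pixel, class_models)
rejected = thresholded_class(unmodeled_pixel, tissue_training, 2.5, "tissue of interest")
```

The conventional rule forces the unmodeled pixel into a modeled class, whereas the thresholded rule leaves it unassigned, so only the tissue of interest has to be modeled.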
One approach to classification of data according to the invention is shown in Figure 2. As shown in Figure 2, data 30 are acquired. The data can be acquired using any acceptable technique and protocol. As previously stated, the data can be collected using the prior art MRI system as shown in Figure 1. Alternatively, other systems for acquiring and storing the data may be used. For example, NMR or other medical imaging techniques may be used. The invention may also be used in the other fields previously described.
Thus, the invention is independent of the particular acquisition protocol and the equipment used to acquire and store the data. Once the data has been stored, as shown in block 30, it is now ready to be treated according to principles of the present invention.
In order to proceed to the next step, a user, usually a medical technician or physician, obtains an image from the data. On the image, the user selects a region of interest for which further examination is desired, step 38. This region of interest may be designated on a screen using a computer mouse, forming a box around a target site, that includes a known tumor site, or some other technique by which a user may designate a region of interest. Normally, a user will indicate a region of interest that includes two or more types of tissue. It is therefore necessary to divide the data elements into classes that represent the types of tissue in the region of interest.
Automatic clustering of tissues within the region of interest is carried out as shown in step 40 to divide the data elements into classes reflecting the types of tissues. If the region of interest is quite small and specific, there may be only two or three types of tissues within the region of interest. Alternatively, there may be five or more types of tissue within the region of interest. The automatic clustering provides a segmentation of the different types of tissues that are found within the region of interest which has been designated.
After the automatic clustering is performed, the user carries out the next step 42, guided tissue selection, as will now be described according to the present invention. The user selects one or more clusters which represent a type of tissue which is of particular interest for further study. In standard use of the present invention, the type of tissue being selected will normally be a tumor which is suspected of being malignant. In some applications, the tumor may have previously been confirmed as a malignant tumor and it is desirable to know the exact shape of the tumor and whether the cancer has spread outside the initial tumor boundaries. It is also desirable to know whether the cancer is present in other portions of the body. For each selected cluster, an automatic classification threshold is calculated utilizing information contained within each corresponding set of training samples.
In step 44, the specific tissue segmentation is performed. The automatic classification threshold has been applied in step 44 and now it is desired to provide an indication of those data elements in the test set which are sufficiently similar to the training set. In the case of image data, the samples or pixels which are deemed sufficiently similar to the training set at the current classification threshold are considered to be within the same class. The final segmented image is then displayed, step 34.
Steps 42 and 44 are particularly relevant to the invention and are described in more detail with reference to Figures 3 and 4, which concern the establishment of an automatic classification threshold, the dynamic modification of this classification threshold, and the creation of an image using similarity data.
Figure 2, providing an overview of the invention has now been described.
Some of the specific details for each of the steps will be provided. Step 40, the automatic clustering will be described in more detail as follows.
The tissue in each volume element (voxel) of the subject is represented as pixel data in the image. The display characteristics of a single pixel in the image thus represent the characteristics of a corresponding voxel imaged by the MRI equipment.
A data element has a numerical value in each of two, three, or four or more properties. The numerical values place each data element at a location in a Euclidean geometry composed of two, three, four or more dimensions, one dimension being used for each numerical value. The location of the data element can thus be known in a multivariate space. Data elements that are in the same general location in multivariate space represent voxels whose contents have properties very similar to each other. Namely, for two highly similar or identical tissues which occupy separate voxels, all the corresponding pixel data elements representing those voxels will have highly similar or identical numerical scores in the multivariate space and thus will be deemed highly similar to each other when measured in Euclidean geometry.
The goal of cluster analysis is to uncover any intrinsic structure within a set of samples based on measurements made on these samples. Samples are grouped together into clusters based on their proximity in multivariate measurement space, where samples within a given cluster exhibit a high degree of similarity. In MRI, each different cluster typically represents different tissue types, which thus have different signal intensity characteristics. In one embodiment, the fuzzy C-means algorithm is implemented. The key concept of the fuzzy C-means algorithm is that each sample exhibits varying degrees of membership in each of the clusters. A membership value close to one indicates a high level of similarity between the sample and a cluster, and a value near zero indicates little similarity between the sample and the cluster in question. The fuzzy C-means algorithm has proven valuable in the analysis of MRI data due to its ability to explicitly model partial voluming effects.
Mathematically, the fuzzy C-means algorithm can be summarized as follows.
Data collected during the course of an MRI study can be organized into a four-dimensional array, $F_{I \times J \times K \times L}$, where I designates the number of rows within the array, J the number of columns, K the number of MRI data sequences, and L the number of slices within the study. The intensity value for a particular array element, $f_{ijkl}$, can be accessed by specifying a row-column-sequence-slice address, where i = 1,...,I is an index indicating the row position, j = 1,...,J is an index indicating the column position, k = 1,...,K is an index for the MRI data sequence, and l = 1,...,L is an index for slice number. Elements of this four-dimensional feature array can be extracted to form other useful data structures. For a given pixel located in row i, column j and slice number l, a sample vector is composed of the values for the K measured sequences:
$$f_{K \times 1} = [f_{ij1l}, f_{ij2l}, \ldots, f_{ijKl}]^T.$$
A collection of N sample vectors organized by rows forms a sample set designated by the matrix $F_{N \times K}$. For a set of N sample vectors, $F_{N \times K}$, containing C clusters, the fuzzy C-means algorithm is based on minimization of the following objective function, $J_q$, with respect to $U_{C \times N}$, a fuzzy C-partition of the data set, and to $V_{K \times C}$, a set of C prototypes:
$$J_q(U_{C \times N}, V_{K \times C}) = \sum_{n=1}^{N} \sum_{c=1}^{C} (u_{cn})^q \, d^2(v_c, f_n),$$
where $f_n$ is the nth K-dimensional sample vector, $v_c$ is the centroid of the cth cluster, $u_{cn}$ is the degree of membership of $f_n$ in the cth cluster, q is a real number greater than 1, and $d(v_c, f_n)$ is the distance between $f_n$ and $v_c$. The weighting exponent, q, controls the fuzziness of the resulting clusters. The fuzzy partition matrix, $U_{C \times N}$, contains the membership value of each sample for each cluster. The C cluster centers are collected in the cluster center matrix, $V_{K \times C} = \{v_1, v_2, \ldots, v_C\}$. The distance between sample $f_n$ and cluster center $v_c$ is computed as
$$d(v_c, f_n) = [(f_n - v_c)^T (f_n - v_c)]^{0.5}.$$
An optimal fuzzy clustering of the sample set, $F_{N \times K}$, is defined as the fuzzy partition matrix-fuzzy cluster centers pair, $(U_{C \times N}, V_{K \times C})$, that locally minimizes the fuzzy objective function, $J_q$. Optimization is achieved through an iteration of the following steps:
(1) Initialize the fuzzy partition matrix, U, using a random number generator such that
$$\sum_{c=1}^{C} u_{cn} = 1.$$
(2) Compute the fuzzy centroid $v_c$ for c = 1,..., C using
$$v_c = \frac{\sum_{n=1}^{N} (u_{cn})^q f_n}{\sum_{n=1}^{N} (u_{cn})^q}.$$
(3) Compute an updated membership matrix, U, via
$$u_{cn} = \frac{[1/d^2(v_c, f_n)]^{1/(q-1)}}{\sum_{c'=1}^{C} [1/d^2(v_{c'}, f_n)]^{1/(q-1)}}.$$
(4) Repeat steps (2) and (3) until the reduction in $J_q$ between two successive iterations is less than a defined threshold or the number of iterations exceeds the accepted limit.
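By way of illustration only, steps (1) through (4) above may be sketched in Python as follows. The function name `fuzzy_c_means`, the default values for `q`, `tol`, `max_iter` and `seed`, and the small floor applied to zero distances are assumptions introduced for this sketch and are not taken from the disclosure.

```python
import math
import random


def fuzzy_c_means(samples, n_clusters, q=2.0, tol=1e-6, max_iter=200, seed=0):
    """Return (U, V): memberships U[c][n] and centroids V[c] for K-dimensional samples."""
    rng = random.Random(seed)
    n_samples, n_features = len(samples), len(samples[0])

    # Step (1): random memberships, normalized so each sample's memberships sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n_samples)] for _ in range(n_clusters)]
    for n in range(n_samples):
        total = sum(U[c][n] for c in range(n_clusters))
        for c in range(n_clusters):
            U[c][n] /= total

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    V = []
    previous_j = math.inf
    for _ in range(max_iter):
        # Step (2): fuzzy centroids, each sample weighted by u^q.
        V = []
        for c in range(n_clusters):
            weights = [U[c][n] ** q for n in range(n_samples)]
            total = sum(weights)
            V.append([sum(w * s[k] for w, s in zip(weights, samples)) / total
                      for k in range(n_features)])

        # Step (3): membership update from inverse squared distances.
        for n in range(n_samples):
            # Floor avoids division by zero when a sample sits on a centroid (assumption).
            d2 = [max(sq_dist(V[c], samples[n]), 1e-12) for c in range(n_clusters)]
            inv = [(1.0 / d) ** (1.0 / (q - 1.0)) for d in d2]
            total = sum(inv)
            for c in range(n_clusters):
                U[c][n] = inv[c] / total

        # Step (4): stop once the objective J_q changes by less than tol.
        j = sum(U[c][n] ** q * sq_dist(V[c], samples[n])
                for c in range(n_clusters) for n in range(n_samples))
        if abs(previous_j - j) < tol:
            break
        previous_j = j
    return U, V
```

Because the partition is initialized randomly and the minimum found is only local, different seeds may yield different cluster orderings; the sketch fixes the seed solely to make runs repeatable.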
A difficulty in the practical implementation of clustering involves determining the number of distinct groups or tissue types within the selected region of interest. As the results of clustering and, therefore, classification are dependent on the number of clusters employed, it is imperative that an accurate, automated method be implemented to assist the user in making this selection. In one embodiment, the Xie-Beni fuzzy validity criterion has been implemented to automatically select the number of clusters. The Xie-Beni fuzzy validity criterion identifies overall compact and separate fuzzy C-partitions without assumptions on the number of groups inherent in the data. For a set of N sample vectors, $F_{N \times K}$, and a given number of clusters, C, the Xie-Beni index is calculated as follows:
$$S = \frac{\sum_{c=1}^{C} \sum_{n=1}^{N} (u_{cn})^2 \, d^2(v_c, f_n)}{N (d_{min})^2},$$
where $d_{min}$ is the minimum distance between cluster centroids. The numerator of the Xie-Beni index is a measure of cluster compactness, while the denominator is indicative of the degree of separation between clusters. A smaller Xie-Beni index indicates a partition in which all the clusters are overall compact and separate. The optimal number of clusters is determined by calculating the Xie-Beni index for a range of cluster numbers and selecting the value of C for which S is a minimum. A group of clusters will thus be represented as groups of pixels, all clustered together. Usually, each cluster will be separated from each other cluster by some distance, based on the differences of the respective tissue type represented by each respective cluster of pixels.
A cluster of interest will thereafter be selected for further study.
The Xie-Beni Fuzzy Validity Criterion is well known in the art and has been published; see, for example, the article by X.L. Xie and G. Beni in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 13, pages 841-847 (1991).
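As a non-limiting sketch, the Xie-Beni index may be computed in Python as shown below. The function name `xie_beni_index` and the list-of-lists layout (`U[c][n]` for memberships, `V[c]` for centroids) are assumptions of this sketch; the squared membership weights follow the published Xie-Beni definition.

```python
def xie_beni_index(samples, U, V):
    """Ratio of cluster compactness to cluster separation; smaller is better."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n_samples, n_clusters = len(samples), len(V)
    # Numerator: membership-weighted squared distances of samples to centroids.
    compactness = sum(U[c][n] ** 2 * sq_dist(V[c], samples[n])
                      for c in range(n_clusters) for n in range(n_samples))
    # Denominator: N times the squared minimum distance between any two centroids.
    min_sep = min(sq_dist(V[a], V[b])
                  for a in range(n_clusters) for b in range(a + 1, n_clusters))
    return compactness / (n_samples * min_sep)
```

To select the number of clusters, the index would be evaluated for each candidate value of C and the value giving the smallest index retained.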
As shown in Figure 3, the user then performs a guided tissue selection in step 42 as will now be described in more detail according to the present invention.
After the clustering of tissues is performed, the user selects one or more clusters representing one or more tissue types which are of particular interest for further study. In standard use of the present invention, the cluster selected will normally be a tumor. In some patients, the lesion may have previously been confirmed as tumor and it is desirable to know the exact shape of the tumor and whether the cancer has spread outside the initial tumor boundaries. It is also desirable to know whether cancer is present in other portions of the body.
According to the present invention, a selection is made, in step 50, of a class of tissue that has been classified by the clustering in the prior step 40 as the specific tissue of which further examination is desired. In the present example, it will be assumed that a tumor is selected in step 50. After the specific tissue is selected in step 50, a training set of individual data elements is created in step 52 which is representative of this selected type of tissue for which further search and study is desired.
In designating a training set, the goal is to incorporate data elements which are representative of the tissue of interest, while excluding data elements that represent other tissues. The training set will then be used to determine which data elements within and outside the region of interest are similar to those in the training set.
In some embodiments, a data element in the stored data has a one-to-one correspondence to pixels in the display. In other embodiments, a data element may be represented by multiple pixels, or vice-versa, depending on the resolution of each, the type of display, and other factors.
In one embodiment, the set of training samples is selected from the full set of cluster samples using the fuzzy partition matrix, $U_{C \times N}$, obtained via the fuzzy C-means clustering algorithm. First, the cluster samples having a maximum membership value for the selected cluster, c, are identified. Next, for each sample within this subset, the percentage difference between the largest and second largest fuzzy membership value is calculated. Only if this difference exceeds a specified percentage level (e.g., 90%) is the cluster sample incorporated within the training set. This training sample selection procedure ensures that those cluster samples which are representative of the tissue of interest (e.g., those close to the cluster center, $v_c$) are included in the training set, while outliers are not.
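The selection procedure of this embodiment may be sketched as follows. The text does not state how the percentage difference is normalized; this sketch assumes it is taken relative to the largest membership value, and the function name `select_training_indices` is likewise hypothetical.

```python
def select_training_indices(U, cluster, min_percent_diff=90.0):
    """Indices of samples strongly assigned to `cluster`, given memberships U[c][n]."""
    selected = []
    n_clusters, n_samples = len(U), len(U[0])
    for n in range(n_samples):
        memberships = sorted((U[c][n] for c in range(n_clusters)), reverse=True)
        largest, second = memberships[0], memberships[1]
        # Keep only samples whose maximum membership is for the selected cluster.
        if U[cluster][n] != largest:
            continue
        # Assumed definition: difference expressed relative to the largest membership.
        percent_diff = 100.0 * (largest - second) / largest
        if percent_diff > min_percent_diff:
            selected.append(n)
    return selected
```

Under this assumed definition, a 90% level admits a sample only when its second largest membership is less than one tenth of its largest.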
Each of the data elements within the region of interest has been placed in the particular class to which they most closely belong in the prior step 40. One of the classes is selected to create the training set. After the class is selected from which the training set will be created, the number of data elements in the training set is refined to represent only those core elements which most closely represent the core properties of this class.
For example, all of those data elements having a low fuzzy membership value for the selected cluster are removed from the training set. These may be referred to as outliers. For example, a data element is placed within a particular cluster when its fuzzy membership value for that cluster is higher than for any other clusters within the region of interest. Thus, the data element is defined as being more closely associated with that cluster than any other cluster within the region of interest. However, once a training set is going to be created for a particular cluster, a threshold is established for data elements which are within the cluster to be included within the training set. If the fuzzy membership value for the particular data element falls below the acceptable threshold, then it is not included within the training set, even though it is still considered a member of the cluster. For example, consider the case where there are three clusters and the sum of the three fuzzy membership values for each pixel equals 1.0. The fuzzy membership value for a given cluster ranges from 0 to 1. Thus, if for a given data element one of the fuzzy membership values is 0.9 or 0.95 then this indicates the other two fuzzy membership values for that particular data element are quite low and that data element is a strong member of the class. On the other hand, if all the fuzzy membership values are approximately equal, then classification of that particular data element in one class or another is very close and using it as a member of the training set is not appropriate.
Thus, all data elements whose fuzzy membership value for the selected cluster falls below the threshold are stripped away from the class and are not placed into the training set. In one example, the threshold may be 0.7 or, alternatively, 0.8. Thus, only those data elements having a fuzzy membership value higher than the threshold, for example in the range of 0.7-0.8 or higher, will be included in the training set. The training set is thus created having a group of data elements which have characteristics strongly representative of the selected class. Those data elements which may be more closely associated with that class than any other class, but are not strong members of the class, are removed and do not become members of the training set.
After the training set is created, as shown in Figure 3, step 52, an automatic threshold value is created in step 53 for the training set that will be distinct for that particular training set. The automatic threshold of step 53 is created as follows. The distance between each member of the training set and every other member of the training set is calculated. As described elsewhere herein, each data element may be considered as being at a location in a multi-dimensional space. Thus, using Euclidean geometry the distance between data elements can be calculated. Since the location of a data element within the Euclidean field is directly related to its properties, the closer the training elements are to each other, the more similar are the physical properties of the tissue that they represent.
Therefore, a numerical value representative of the distance between data elements is a reliable determination of the similarity of the characteristics of the tissues which the data elements represent. If the distance between them is small, namely at or near zero, then the tissues are determined to be very similar to each other. On the other hand, if their distance is great, then the properties of the tissues that the data elements represent are not similar to each other.

The maximum distance that any member of the training set is from its nearest neighbor in the training set is established as the automatic threshold.
Namely, in step 53, the distance between each training set data element and the data element nearest to it within the training set is determined. Then, that distance which is the greatest for the nearest neighbor pair of data elements is selected as the custom threshold distance.
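The computation of step 53 may be illustrated by the following sketch, in which the function name `automatic_threshold` and the representation of each training element as a list of K numerical values are assumptions made for illustration.

```python
import math


def automatic_threshold(training):
    """Largest nearest-neighbor Euclidean distance among the training samples."""
    nearest = []
    for i, a in enumerate(training):
        # Distance from sample i to its closest neighbor within the training set.
        nearest.append(min(math.dist(a, b)
                           for j, b in enumerate(training) if j != i))
    return max(nearest)
```

A tightly grouped training set therefore yields a small threshold, while a more dispersed training set yields a larger one. The sketch compares every pair of samples, which is adequate for small training sets; a spatial index could be substituted for larger ones.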
As will be appreciated, this automatic threshold is created by the steps of the invention and will be different for every single training set which is created. Accordingly, it is termed an automatic threshold because for every single data collection the calculation is automatically done and the value of the threshold will be different. The concept is that the value of the threshold is based on the similarity of the data elements within the training set to each other. If all members of the training set are very similar to each other, then the threshold will be correspondingly low because the greatest distance between any two members will be low. On the other hand, if the training set is composed of data elements which are more different from each other, then the automatic threshold will be higher. Thus, the automatic threshold, as calculated, provides a determination of the relationship of the data elements to each other within the training set. It is a custom number calculated for each tissue being studied in each patient and therefore captures the individual characteristics of the patient's tissues. This same relationship will then be used to determine whether data elements in the same patient but not within the training set should also be included within the same class as the training set.
This automatic threshold number will be stored and saved for each patient and each tissue type for which a training set is created for a patient. If the patient receives another MRI, months or even years later, all this data will be available. The custom threshold number for a given training set in this patient may be useful as a comparison standard for any new measurements taken months or years later. It may also be used to calibrate and then normalize data taken by the same MRI machine years later, or by a different MRI machine.
It is known that the data collection characteristics vary from machine to machine for the many MR, NMR, CT, PET, and other imaging and non-imaging sensors now in use. Further, the characteristics of one sensor may change over time as the parts age, operating temperature changes, or other factors affecting a machine.
Automatically calculating a custom threshold for each set of data that is used as a training set provides a self calibration for each machine and each tissue sample for each patient. The subsequent classification of tissue thus has a more reliable standard than was possible in the prior art. In addition, by saving the data and the custom threshold, they can be used at a later time to compare to later tests of the same patient to assess and quantify improvements or changes over the treatment time. A later collected data set may, for example, have its own custom threshold created and this can be compared to the prior custom threshold and the prior custom threshold can be used on the new data to see if more accurate results can be obtained or to see more clearly the changes over time.
As shown in Figure 3, once the automatic threshold has been created, the next step 54 is to determine whether or not additional elements outside of the region of interest should be included within the same class. This is accomplished by comparing the distance between each member of the training set with each data element in the entire array, step 54. If the distance between a test data element and any one member of the training set is less than the automatic threshold, then that data element is included as a member of the training set class, step 58.
A comparison is carried out between each member of the training set and each data element outside of the training set. Preferably, all the data elements which compose the entire original data in step 30 are compared to determine whether or not any of the selected tissue is located outside of the region of interest, and in particular, whether it is located in other portions of the body.
Any known acceptable technique may be used for comparing the training set of data elements to the test set of data elements which represent the object under consideration. One particular acceptable comparison technique will be summarized and then described in more detail, even though other techniques may be used.
The numerical values associated with each member of the training set are compared to the numerical values associated with every other data element of the object under investigation. Since the numerical values of the data element are related to the physical properties of the object under investigation for each data element, its location in the multi-dimensional Euclidean field is related to the physical properties of the object. In the comparison of the training set to the test set, the closer that two data elements are to each other within the multi-dimensional Euclidean field, the more similar they are.
Thus, the distance between the two data elements with respect to each other becomes an accurate and reliable measure of the similarity of the properties of the object represented by the data elements.
The distance between a data element in the training set and a test sample data element can be represented as a numerical value which provides a similarity for the test sample data element. The numerical value of this similarity represents the similarity between the test sample and the training set. The smaller the distance, the more similar the data elements are to each other. Thus, a distance of 0 indicates the data elements are so similar to each other that they may be considered identical and members of the same class. Higher distances indicate less similarity. Depending on the scaling used, the numerical distance value may be a single digit number, such as a number one through ten.
Alternatively, it may have many digits such as a seven- to ten-digit number. For some embodiments, the distance number is expected to be sufficiently large for nearly all test samples in the data set, that the logarithmic value may be taken after the distance value is obtained in order to normalize the distance values. The absolute distance value is not so important as the relative distance value compared to the distance for other data elements in the test set and also the distance of a training data element when compared to other elements in the training data set.
The above concepts will now be described from a mathematical model for an example of how the invention may be carried out. In this example, data collected during the course of an MRI study is organized into a four-dimensional array, $F_{I \times J \times K \times L}$, where I designates the number of rows within the array, J the number of columns, K the number of MRI data sequences, and L the number of slices within the study. The K sequences are the features utilized in tissue segmentation. The intensity value for a particular array element, $f_{ijkl}$, can be accessed by specifying a row-column-sequence-slice address, where i = 1,...,I is an index indicating the row position, j = 1,...,J is an index indicating the column position, k = 1,...,K is an index for the MRI data sequence, and l = 1,...,L is an index for slice number. Each data element in the array can be thought of as occupying a location in a 2-, 3-, 4-, 5- or other multi-dimensional field. Such data element is sometimes termed a voxel. For a 2-dimensional screen image, each display element is usually referred to as a pixel.
Elements of this four-dimensional feature array can be extracted to form other useful data structures. For a given voxel located in row i, column j and slice number l, a sample is composed of the values for the K sequences:

$$f_{K \times 1} = [f_{ij1l}, f_{ij2l}, \ldots, f_{ijKl}]^T.$$
A collection of N voxel samples organized by rows forms a sample set designated by the matrix $F_{N \times K}$.
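The extraction of sample vectors from the four-dimensional array may be sketched as follows. The nested-list layout `F[i][j][k][l]` and the function names are assumptions of this sketch, and indices here start at zero rather than at one as in the notation above.

```python
def sample_vector(F, i, j, l):
    """K sequence intensities of the voxel at row i, column j, slice l."""
    return [F[i][j][k][l] for k in range(len(F[i][j]))]


def sample_set(F, positions):
    """N x K sample set, one row per (i, j, l) voxel position."""
    return [sample_vector(F, i, j, l) for (i, j, l) in positions]
```

For example, calling `sample_set` with the voxel positions of a user-designated region would produce the training sample matrix for that region.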
An automatic classification threshold has been established using information contained within the training set samples in step 42 as previously described.
This automatic classification threshold sets the initial cutoff level for determining which test samples are sufficiently similar to the training set to be considered a member of the class of interest. In one embodiment, the "near enough a neighbor" (NEN) criterion is utilized to establish a classification threshold. The NEN criterion determines a classification threshold based on the maximum of the nearest neighbor distances within the training set samples. The nearest neighbor distance values for the test samples are compared to this automatic classification threshold to determine which of these test samples are sufficiently similar to the training set to be considered a member of the class of interest.
The similarity values will be calculated and displayed, as shown in Figures 4 and 6A; however, in some embodiments it is preferred to perform some prior clustering and selection as will now be explained. For each voxel, segmentation is initiated by calculating a similarity value for each specified class. For a given class, an output of segmentation is a three-dimensional similarity array, $D_{I \times J \times L}$, containing the similarity value at each voxel position (i,j,l).
In one implementation of MR segmentation, the user designates a set of voxel samples, $F_{N \times K}$, representing a single tissue class. Having identified these training samples, supervised segmentation is performed utilizing the nearest neighbor method.
The classical single nearest neighbor method can be used at this stage, which is a nonparametric decision rule which classifies an unknown test sample as belonging to the same class as the nearest sample point in the training set, as described in a publication by Gose et al. in 1996. The degree of similarity between a test sample, $f_{test}$, and a training sample, $f_n$, can be defined in terms of their Euclidean distance in K-dimensional feature space:
$$d(f_{test}, f_n) = \left[ \sum_{k=1}^{K} (f_{test,k} - f_{nk})^2 \right]^{0.5}.$$
The distance from the test sample to each training sample in the training set is measured, and the minimum of these distances is selected. Namely, the distance value for the test sample is based on how close the test sample is to the nearest one of the samples in the training set. An output of nearest neighbor segmentation is a three-dimensional similarity array, $D_{I \times J \times L}$, containing the minimum distance at each voxel position (i,j,l).
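A minimal sketch of nearest neighbor segmentation producing the similarity array is given below. It assumes the study is held as a nested list `F[i][j][k][l]` and the training set as a list of K-element lists; the function name `similarity_array` is hypothetical.

```python
import math


def similarity_array(F, training):
    """D[i][j][l]: distance from voxel (i, j, l) to its nearest training sample."""
    I, J, K, L = len(F), len(F[0]), len(F[0][0]), len(F[0][0][0])
    D = [[[0.0] * L for _ in range(J)] for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for l in range(L):
                # Sample vector of the K sequence values at this voxel.
                voxel = [F[i][j][k][l] for k in range(K)]
                # Minimum Euclidean distance over all training samples.
                D[i][j][l] = min(math.dist(voxel, t) for t in training)
    return D
```

Voxels that belong to the training set itself receive a distance of zero, consistent with their being treated as members of the class of interest.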
Figures 5A and 5B illustrate the advantageous results which can be obtained by using the automatic threshold according to the present invention. Figure 5A illustrates an MRI of the subject which has been reviewed by a radiologist for suspected tumor locations.
The radiologist identified regions 60 and 62, both encircled, as likely tumor sites. Region 62 corresponds to a mass in the nasopharynx whereas region 60 corresponds to nodular masses of variable size in the fatty tissues deep to the left cheek. Using the concepts of the present invention, the mass within the nasopharynx in region 62 was selected as a region of interest.
Automatic clustering was carried out using the techniques as described herein.
Following this, the mass in the nasopharynx within circle 62 is identified as the tissue from which a training set should be created. Accordingly, a training set was created from the class of data elements that represent the large mass 62 in the nasopharynx. After the training set was created, an automatic threshold was generated by comparing members of the training set to each other and selecting as a threshold that value which was the maximum of the nearest neighbor distances as previously described herein. Once the automatic threshold was obtained, then all data elements in the MR image were compared to the training set to determine those that were within the threshold distance and thus should be classified in the same class of tissue as that found within region 62 in the nasopharynx. A
binary segmentation result was obtained by applying the threshold, such that pixels whose distance values were less than or equal to the threshold were designated as members of the training set class. After the classification of all data elements in the entire MR image was completed, the original unclassified MR image was overlaid with a display showing in black all pixels which were deemed similar to the training set. The resulting image is shown in Figure 5B.
As can be seen in Figure 5B, the regions 60 and 62 are shown completely in black, illustrating that they both belong to the same class of tissue. This supports the radiologist's initial interpretation that these two tissues likely were cancerous tissues of the same type. Of some interest, the enhanced image also showed an additional region 64 which had previously not been detected by the radiologist. This additional region 64 was also overlaid in black pixels, indicating that it is likely the same type of tissue as the regions 60 and 62 and thus is a probable additional site of the spreading cancer tissue, in this case located in the left sinus cavity. This confirms, therefore, that there are two remote lesion sites, 60 and 64, which are highly similar to the reference mass 62 and thus indicate a high likelihood of metastases of the cancer.
The image provided according to the present invention has a number of advantages. First, as can be seen by comparing Figures 5A and 5B, the specific sites of the tumor can more clearly be seen in Figure 5B. Thus, the boundaries and exact location of the tissues of interest are more exactly known by viewing Figure 5B. In the event surgery or some other invasive treatment is desired, the MR image of Figure 5B becomes extremely useful to more precisely locate the tissues which share the same characteristics. In addition, by definitely locating another tissue in the MR image as being of the same type, the spread of the tissue can be more accurately determined. Further, within the portion of the body imaged, it can also be confirmed that the cancer has not spread to other locations.
Only three sites were highlighted in Figure 5B as being the same tissue and thus, the physician has a high degree of confidence that no other tissues are of the same type and that the cancer is confined to three locations. Given the general difficulty of interpreting and reading various medical images, highlighting specific tissues which share the same property offers a number of advantages in medical diagnosis.
Figure 4 illustrates further steps that can be carried out according to principles of the present invention as part of step 44 of Figure 2. Figure 4 shows the steps for creating similarity data and then displaying the similarity data. The data are acquired, step 30, as in the prior Figure 2 and the other steps carried out to reach step 44. The next step 32 follows the creation of a training set of the same type as that of Figure 3.
Similarly, the automatic thresholding step is carried out in order to determine a threshold value.
Following this, the similarity values are determined from each data element in the test sample, step 32.
After the distance to each data element outside the training set has been determined, step 32 is carried out in which a color is assigned to each pixel based on its distance from the training set. According to this embodiment, all pixels in the training set are assigned to a particular color. If the distance between any data element in the training set and test data element is less than the automatic threshold amount, it is assigned to the same color as the training set. If the distance of the data element is just slightly greater than the threshold distance, then it is assigned a color which is just slightly different from the color of the training set. The greater the distance between the test element and the training set, the greater the difference in color. Accordingly, a color will be assigned based on the similarity of the test sample to the training set. The color changes representing the different degrees of similarity may be shades of a single color or a palette of distinct colors or a combination of the two color display methods.
According to one embodiment of the present invention, the color selection for display of similarity is based on the color spectrum for visible light. As is known, red has the lowest frequency visible to the unaided human eye. Progressing along the color spectrum of light from red, the next color encountered is orange, followed by yellow.
Continuing to progress along the color spectrum for visible light, the next colors encountered are green, greenish blue, blue, violet.
In one embodiment, the class to which the training set belongs is assigned the color red within the visible color spectrum of light. All test data elements which are within this class are also shown in red. All test data elements not within the same class as the training set are assigned to a color whose frequency is proportional to the distance that the particular test data element is from the training set. For a test data element which is quite close, it is assigned to a very similar light spectrum to that of red, namely, an orange-red color. For those spaced slightly further from the class of the training set, a color is assigned in the color spectrum which is proportionally farther from red, such as orange-yellow or yellow. This is performed for all members of the training set. These colors are then overlaid (in either opaque or a degree of transparent color or in some pattern such as cross-hatched or stippled) on top of the MR image which, in normal instruments, is a grayscale image that will appear in different shades of black and white. By overlaying a color image whose frequency spectrum varies proportional to the similarity of the tissue under study, the radiologist can very quickly and easily see the exact location of identical type tissues and, in addition, can see similar type tissues which appear to be related to the tissue under investigation.
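One possible realization of this spectrum-based color assignment is sketched below. The linear mapping from distance to hue, the upper hue limit of 0.75 (roughly violet on the HSV color wheel), and the parameter `max_distance` are all assumptions chosen for illustration; the disclosure specifies only that the color moves away from red as the distance grows.

```python
import colorsys


def similarity_color(distance, threshold, max_distance, max_hue=0.75):
    """Return (r, g, b) in [0, 1]: red for in-class, shifting toward violet with distance."""
    if distance <= threshold:
        # Within the class of the training set: pure red.
        return (1.0, 0.0, 0.0)
    span = max(max_distance - threshold, 1e-12)
    # Fraction of the way from the threshold to the largest displayed distance.
    fraction = min((distance - threshold) / span, 1.0)
    # Hue 0 is red; increasing hue passes through orange, yellow, green and blue.
    return colorsys.hsv_to_rgb(fraction * max_hue, 1.0, 1.0)
```

The resulting color could then be blended with the grayscale MR pixel, opaquely or with some transparency, to form the overlay described above.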
A binary image may also be created, step 36 as shown in Figure 4. The binary image may be a gray scale image, of the type similar to that shown in Figures 5A and 5B or, alternatively, it may be a two-color image, such as a gray scale image in combination with one other selected color which has strong contrast against gray scale, such as orange, yellow or red. In a binary image, pixels are grouped into two classes, those that are termed "on" and those that are termed "off." The "on" pixels all have a predetermined intensity and/or color that is the same for all pixels in the set. The "off" pixels retain their original display characteristics. The "on" pixels are overlaid on the MR image to provide a binary image for the user.
The selection of pixels to be "on" or "off" in the binary image can be done using the same classification system that has just been described with respect to the custom threshold.
After the custom threshold is obtained and applied to the image, those pixels within the same class as the training set are turned "on" and the remaining pixels are left off. Alternatively, a stricter standard may be used, with a more stringent threshold than the custom threshold (in distance terms, a smaller value), so as to require that the pixels be very similar to each other to be turned on in the binary image. As a further alternative, the binary image may be created using the fuzzy clustering technique previously described as the standard.
Thus, the binary image will show two colors, all tissue within the same class of the training set being in a first color and all other tissue being in a second color, with the potential that the second color may be a single solid color which has contrast with the first color or the gray scale image. This type of display is shown in Figure 6B as will be explained later herein. Alternatively, the binary image may be created using a threshold value. The threshold value may be the custom, automatic threshold value as previously described or, alternatively it may be a dynamic threshold value as explained with respect to Figures 7A and 7B. The threshold value may, therefore be a different number than previously used for the maximum similarity, particularly if a dynamic threshold is used.
When the threshold technique is used, the similarity values, having been calculated in step 32, are compared to the threshold value. Those pixels which are within the threshold value are shown as the first color, while those pixels which are not within the threshold are shown in a second color, or in a gray scale. Accordingly, the physician is provided a binary display of the data in two colors to more quickly and easily highlight the exact regions of the tissue which have an exact match within the threshold of interest.
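The thresholding of similarity values into a binary overlay may be sketched as follows for a single slice. The function name `binary_overlay`, the row-by-row list layout, and the default "on" intensity of 255 are assumptions of this sketch; "off" pixels keep their original gray level, as described above.

```python
def binary_overlay(gray_slice, distance_slice, threshold, on_value=255):
    """Copy of gray_slice with pixels within the threshold set to on_value."""
    return [[on_value if d <= threshold else g
             for g, d in zip(gray_row, dist_row)]
            for gray_row, dist_row in zip(gray_slice, distance_slice)]
```

Passing the automatic threshold reproduces the initial classification, while passing a different value corresponds to applying a dynamic threshold to the same stored similarity values without recomputing them.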
Providing to the physician, in a side-by-side format, the original MR image and then the overlaid version of the same MR image according to the present invention has significant advantages. The physician viewing the original MR image is able to locate specific regions of tissue for further investigation and note those areas for which particular study is desired. Then, the same MR image is overlaid with the pixels modified as described in the present invention. This can be displayed side-by-side with the unaltered image. The physician can quickly and easily see all changes which have been made by the processing steps of the present invention. The changes will therefore more particularly be highlighted to the physician with the realization that these are based on a comparison of the tissues of particular interest. The changes which are made between the original MR image and the modified MR image therefore provide additional information to the physician in understanding the type of tissue at each location, and the size and extent of the tissue growth at each particular location. This side-by-side pattern is shown in Figures 5A and 5B and also Figures 6A and 6B.
In the embodiment shown in Figures 6A and 6B, the side-by-side comparison is between a color similarity image and a binary similarity image. Again, providing the physician a color spectrum similarity image adjacent a binary image provides significant advantages in understanding the tissue located at each site, the tissue growth patterns and the extent of any tumors which may have spread. These also provide significant advantages in permitting the physician to ignore false positives which may occur in the original image, or possibly in the modified image.
Figures 6A and 6B illustrate an MR image which has been overlaid with a color change of those pixels which represent the data elements based on their distance from the training set, corresponding to a tumor site, as carried out in steps 34 and 36 of Figure 4.
Those pixels which are the most identical to the training set are colored red as can clearly be seen by region 72 in Figure 6A. Those regions which are quite similar to the training set, but not sufficiently similar to be within the same class, have a slight orange color, as can be seen by reviewing regions 74 of Figure 6A. Those tissues which are the next most similar have a yellow color, regions 76 of the tissue in Figure 6A. The regions next most similar have a yellowish green, followed by a green color. The tissues least similar have a dark blue, approaching a violet in color.
Segmentation results are displayed to the user in the form of similarity images. For a slice l, the corresponding similarity image, D,x,,, , is displayed using a specified color map. Figure 6A is one example of a color display presenting segmentation results using the similarity image method and binary thresholding for an MR animal study according to the invention. T1-weighted, T2-weighted and STIR (short tau inversion recovery) sequences were acquired and utilized in segmentation. In this example, the user designated a set of tumor pixels as the region of interest and a nearest neighbor distance value was calculated for each of the remaining pixels in the image set. In displaying the similarity image, a color map was used. All pixels in the training set appear red.
According to the invention, pixels with relatively small distance values (high similarity values) for the selected class appear the same color as the training set, in this case red, while pixels with relatively large distance values (small similarity values) appear dark blue.
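The nearest neighbor distance and the color mapping described above can be sketched as follows. This is a hedged illustration, not the disclosed implementation: the Euclidean metric over three intensity values per pixel, the normalization by a maximum distance `d_max`, and the hue ramp from red to blue are all assumptions, and the function names are hypothetical. The sketch ends at a fully saturated blue rather than the dark blue of the figures.

```python
import colorsys
import math


def nearest_neighbor_distance(sample, training_set):
    """Smallest Euclidean distance from one pixel's measurements to any
    training-set pixel's measurements."""
    return min(math.dist(sample, t) for t in training_set)


def distance_to_rgb(d, d_max):
    """Map a distance to a color: 0 -> red, d_max (or more) -> blue,
    passing through orange, yellow and green in between."""
    frac = min(max(d / d_max, 0.0), 1.0) if d_max > 0 else 0.0
    hue = frac * (2.0 / 3.0)               # hue 0 is red, hue 2/3 is blue
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


# Hypothetical measurements, one value per pulse sequence for each pixel.
training = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
d = nearest_neighbor_distance((3.0, 4.0, 0.0), training)
color = distance_to_rgb(d, d_max=10.0)
```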
Viewing Figure 6A, four core regions having red at a central portion thereof can easily be seen. Each of these regions has a yellow fringe around the edges indicating that these very similar tissues are located at the edges, even though they are not so similar as to be within the red class. The remainder of the image is shown in blue. Namely, it is clear that all other types of tissue are distinctly different from those tissues which have been indicated as belonging to the class of interest, in this case tumor. Since all of the portions of the MR image have been overlaid with the blue, the physician has a very high confidence that the tumor is not located in other regions beyond those which have been highlighted in red. The ease of reading the MR image is greatly improved using the color similarity image of the present invention.
Figure 6B is the same MR image of Figure 6A, however, a grayscale image is overlaid with a single color indicating only those pixels which belong to the same class as the training set. In Figure 6B, a threshold was applied in step 36 and all pixels that correspond to data elements within the threshold distance were assigned a color of yellow, regions 73, 75, 77 and 79. All other pixels not within the similarity threshold distance retained their original grayscale color of the original MR image. Thus, Figure 6B appears more as a binary image in which the tissue of interest is in one color and all other tissues are in a grayscale pattern. It thus has a binary image with only those pixels which fall within the threshold being highlighted.
The advantages of using similarity display of data can be seen by comparing Figures 6A and 6B. In Figure 6A, much larger regions are illuminated with slight variations in color as to quickly draw the radiologist's attention to those tissues of interest. Since the center of most larger tumors is often necrotic, namely dead tissue which was formerly living tumor tissue, it will have characteristics somewhat similar to the class of interest, but will be sufficiently different that it does not belong to the same class as the otherwise living tumor tissue. In the binary image, such tissue does not show as belonging to the same class, with the result that the physician may not fully appreciate the scope of the tumor.
However, in the Figure 6A image, a larger amount of tissue is highlighted, making it easier to recognize the scope of the tumor and to understand the necrotic tissue within the center of the tumor region. Further, Figure 6A, showing that all other portions of the MR image are blue, indicates that there is no similar tumor tissue anywhere else in the array. Figure 6B, in which the remainder of the MR image is on a grayscale, requires additional consideration and further study by the radiologist in order to confirm whether or not additional tumors may be present.
As shown in Figure 6B, in generating the binary thresholded image, pixels with distance values less than or equal to the designated threshold are highlighted in yellow on the T1-weighted MR image. Regions suspicious for tumor are readily apparent in the similarity image of Figure 6A, via the pixels' elevated brightness levels and different colors. In contrast, similarity information is not as easy to understand in the binary image display of Figure 6B through the application of a subjective hard threshold. As an example, in Figure 6B, while the line between the colors is sharp, in reality the tissue change may not be as abrupt. Pixels falling above the designated threshold may be very similar to the selected tumor class yet are not highlighted. The user should be more suspicious of pixels whose distance values lie just above the threshold, than those pixels whose distance values far exceed it, yet this information is not available in the binary image of Figure 6B. Figure 6A, on the other hand, clearly shows to a user the tissues that are in the same class as the training set and those that, while not the same class, are so close as to be very similar and suspected of being a tumor. On the other hand, viewing Figure 6B, a user has the advantage that stark contrast between all tumor tissue and all other tissue is easily seen.
In addition to directly presenting the similarity images to the user, as portrayed in step 34 of Figure 2 and Figure 6A, other display modes are envisioned. In one variation, the subset of pixels whose similarity values fall above or below a designated threshold level could be overlaid on the original image. Also, to indicate pixels included within a volume, a modified similarity image could be created, such that the calculated similarity values are displayed for all pixels whose values fall above or below a designated threshold level, while a constant similarity value is assigned to all other pixels. Similarity images could be reconstructed and displayed for image slices that were not originally acquired (e.g., orthogonal slices). In another variation, the similarity images could be volume rendered with arbitrary slicing and high/low transparency values. Surface rendering at particular similarity values could be implemented.
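The modified similarity image mentioned above can be sketched as follows. This is one possible reading only: it assumes the values are distances, keeps those at or below the threshold, and assigns a constant `fill_value` to every other pixel. The function name and the sample numbers are hypothetical.

```python
def masked_similarity_image(distance, threshold, fill_value):
    """Keep the calculated value where distance <= threshold; assign the
    same constant value to every other pixel."""
    return [
        [d if d <= threshold else fill_value for d in row]
        for row in distance
    ]


# Hypothetical distances for a 2x3 slice.
distance = [[0.4, 3.0, 12.0], [1.9, 2.0, 8.5]]
modified = masked_similarity_image(distance, threshold=2.0, fill_value=99.0)
```

Choosing a fill value outside the retained range lets a color map render all excluded pixels in one uniform color.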
Figures 7A and 7B illustrate a technique by which the threshold may be dynamically varied by the user during viewing of the MR image. According to this embodiment, the automatic threshold which has been previously described is assigned as the beginning threshold. A first classification is done using the automatic threshold and the techniques as previously described. In a further embodiment of the present invention, the user is given control over this threshold so that it becomes a dynamic threshold which is user-controllable while the MR image is viewed on the screen. The user thus has the capability to modify this threshold to make it larger and thus include more data elements within the selected class or, to make it smaller and thus make it more difficult for data elements to be within the class. The user can thus dynamically vary the sensitivity of the class segmentation to select that image which has the lowest number of false positives, while having an accurate count of the true positives. This would permit the tissue classification results to be viewed across the full range from maximum sensitivity to maximum specificity.
One method for giving the user access is providing a slider bar as part of a window display of the image. Movement of the slider bar dynamically alters the threshold value, the effect of which is immediately visually displayed to the user in the displayed image. Referring to Figures 7A-7B, depicted therein is a window display 81 of an image 83 having a scale 80 on the side. A slider bar 82 is on scale 80 that can be moved by the user under control of a cursor on a mouse or keyboard. At the bottom end of the scale 80 is indicated a numerical value 84 which corresponds to the threshold value at the lowest end of the scale. At the upper end is a top number 86 which corresponds to the numerical value of the threshold value at the highest end of the scale. In this case, the threshold numbers are 0.1 and 31.14, respectively, but the values can be different for each sample.
Under the automatic threshold value, the slider 82 is initially placed at the position on the scale which corresponds to that automatic threshold value. This may be a high or low value depending on the relationship of data elements in the training set to each other, as has been described.
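The relationship between slider position and threshold value can be sketched as follows. The disclosure does not state how positions map to values, so the linear scale here is an assumption, as are the function names and the automatic threshold of 4.2; the scale endpoints 0.1 and 31.14 are the example values given for Figures 7A-7B.

```python
def threshold_from_slider(position, low, high):
    """Convert a slider position in [0, 1] to a threshold on a linear scale."""
    position = min(max(position, 0.0), 1.0)   # clamp to the ends of the scale
    return low + position * (high - low)


def slider_from_threshold(threshold, low, high):
    """Convert a threshold back to a slider position in [0, 1]."""
    if high == low:
        return 0.0
    frac = (threshold - low) / (high - low)
    return min(max(frac, 0.0), 1.0)


low, high = 0.1, 31.14          # scale endpoints from the example
automatic = 4.2                 # hypothetical automatic threshold
start = slider_from_threshold(automatic, low, high)   # initial slider position
```

Each time the user moves the slider, the new threshold returned by `threshold_from_slider` would be applied to the stored similarity values and the display redrawn, without recomputing the similarity values themselves.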
Figure 7A illustrates an example in which the training set data elements are quite similar to each other and thus, slider bar 82 is near the bottom indicating a low threshold number. For this low threshold, only those tissues which are very similar to each other will be considered within the same class as the training set. Thus, using this threshold number, two sites of similar tissue, 88 and 90 were identified as potentially being the same type of tissue, and thus likely candidates for a spreading tumor at two locations. With this low threshold, the region 90 is a very small region and appears as a single speck. It is difficult for the physician to determine whether this represents similar type tissue or not and if so, the size of the tissue at region 90. According to the present invention, the physician is able to move the slider 82 upward in order to increase the threshold number, to thus dynamically vary the threshold. As the threshold is increased, the region 90 becomes significantly larger, as does region 88, see Figure 7B. This provides a clear definition to the physician that region 90 is a similar type tissue and indicates the exact size. Of course, the physician may move the slider bar higher in order to further increase the threshold number to include more tissue types. However, doing so will cause spurious signals which may easily be recognized as false positives. Accordingly, the dynamic threshold may be adjusted back downward using the slider bar 82 to that location which most accurately reflects the locations of the tissue of concern.
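The growth of highlighted regions as the threshold is raised can be sketched as follows. This is an illustrative sketch under stated assumptions: regions are taken to be 4-connected groups of pixels whose distance is at or below the threshold, and the function name `region_sizes` and the distance values are hypothetical.

```python
from collections import deque


def region_sizes(distance, threshold):
    """Sizes of 4-connected regions of pixels with distance <= threshold,
    largest first."""
    rows, cols = len(distance), len(distance[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or distance[r][c] > threshold:
                continue
            # Breadth-first flood fill from an unvisited in-class pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and distance[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical distance map with two sites of similar tissue.
distance = [
    [1.0, 2.0, 9.0, 9.0, 9.0],
    [2.0, 3.0, 9.0, 4.0, 9.0],
    [9.0, 9.0, 9.0, 3.5, 2.5],
    [9.0, 9.0, 9.0, 9.0, 4.0],
]
low_sizes = region_sizes(distance, 2.5)    # one region and a single speck
mid_sizes = region_sizes(distance, 4.0)    # both regions have grown
high_sizes = region_sizes(distance, 9.0)   # everything merges into one region
```

In this toy example the single-pixel speck at the low threshold grows into a four-pixel region at the intermediate threshold, while the highest threshold floods the whole image, which corresponds to the spurious false positives noted above.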
Of some importance, the region 92 becomes highlighted in Figure 7B when it was not shown to be in the same class of tissue in the first Figure 7A when shown as a gray scale pattern. Thus, region 92 which may have been suspected of being a tumor site in the gray scale image of Figure 7A is clearly shown to be of the same tissue, and is displayed as belonging to the same class as tissues 88 and 90. Thus, the physician has confirmed that the tumor has spread to a third location which was not apparent viewing the gray scale image of Figure 7A. Other potential sites for the tumor, locations 91 and 93 of Figure 7A, however, are not highlighted and the physician is able to confirm that the tumor has not spread to locations 91 or 93. Thus, the use of the dynamic threshold is beneficial to illustrate those areas of the tissue which are most similar to each other and represent the same tissue type while also indicating those areas of the tissues which are in fact not similar to each other even though, at a first glance of a gray scale pattern they may appear similar in many ways.
A user friendly computer screen can be provided as shown in Figures 8A-8C according to the present invention to provide other advantages for the physician. As indicated in Figure 8A, the classification threshold can be changed by moving slider bar 114. In the example shown in Figure 8A, the actual value of the threshold number is 17, which corresponds to the position represented by the slider bar 82. For example, in Figure 8A the actual threshold numeric value is 17 whereas in Figure 8B, the numeric value has been increased to 35 as shown above bar 114. The user may easily click on the arrow on slider bar 114 to move the threshold to any desired location. Also, the system may be returned to the custom threshold number previously calculated by clicking on the box 118 labeled "default" and thus return to the automatic classification technique if desired. The arrow on slider bar 114 may be moved as desired to increase or decrease the threshold values. The computer screen also illustrates the ease of use of the present invention for the radiologist. A patient name box 100 is provided which provides the patient's name. The time and date that the data was collected is shown in box 102. Indicated in box 101 is the slice number and in box 103, the number of slices available for viewing. Box 104 permits a user to sort the various slices or to select the type of sample to be studied. Box 106 permits a user to turn on or off the overlay features. Box 108 permits the user to indicate that a region of interest is going to be created corresponding to the region of interest step 38 of Figure 2. Cluster selection is carried out within this region of interest by clicking on box 108 and guided tissue selection can be carried out by clicking on box 112, corresponding to steps 40 and 42 of Figure 2.
The user can also achieve other functions using tool bar 110 such as clearing prior settings, box 108, zooming out, applying different masks, or other automatic features as indicated by the boxes. The screen is also advantageous in that it provides to the user detailed information regarding the exact view currently under consideration.
Box 1120 indicates the correspondence of the test sample color to the training set color. The screen can also show, side-by-side, the images with the overlay and the images without the overlay. Thus, the figure on the left hand side is the unmodified MR image with potential tumor sites 120, 122, 126, 128 and 140. A radiologist may have difficulty determining tumor from non-tumor locations in the conventional image. With the invention applied, the tumor sites and non-tumor sites are more readily seen.
Thus, region 120 is clearly seen as tumor. By setting the threshold to 35, as shown in Figure 8B, regions 122 and 140 can clearly be seen as tumor sites, similar to region 120 whereas regions 126 and 128 can be seen as fluid ducts, not tumor sites. The correct diagnosis can thus be made more quickly and with better certainty. Figure 8C provides a full color similarity image. A rainbow scale 115 is provided below the slider bar 114 to show the color range of the image. The tumor site clearly shows as a dark red in region 120. Of the other potential sites, regions 122 and 140 have a small red center with a fringe color of yellow then green. This indicates they are likely tumor sites. Yet, regions 126 and 128 remain the same color, indicating there is no tumor present at these locations. Thus, the present invention provides an easy to use computer screen readout which may be manipulated by the radiologist for a clear understanding of the tissues under investigation.
While different embodiments of the invention have been illustrated and described, it is to be understood that various changes may be made therein without departing from the spirit and scope of the invention. As will be evident from the foregoing, the disclosed embodiments consider a single tissue at each time or each tissue independently, instead of classifying all tissues at one time. This enables modification of the score data as described above. Prior segmentation methods attempt to segment all present tissues, thus generating a score at each pixel location for all of the tissue classes. Data is thresholded usually by classifying the tissue by the lowest distance score and/or highest membership value, and is generally fixed and not adjustable by the user. However, the embodiments of the present invention disclosed herein can be integrated within a traditional method of segmentation.
When all samples are modeled, a threshold could be set for one or more sample groups, allowing the sensitivity to be changed on a group by group basis.
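Setting a threshold on a group by group basis can be sketched as follows. This is a hedged illustration: it assumes each group's starting threshold is the maximum Euclidean distance among that group's training elements and that a test element is compared to its nearest training element in each group. The function names, group labels and measurement values are hypothetical.

```python
import math
from itertools import combinations


def automatic_threshold(training_set):
    """Maximum pairwise Euclidean distance within one training set
    (requires at least two training elements)."""
    return max(math.dist(a, b) for a, b in combinations(training_set, 2))


def classify_per_group(sample, training_sets, thresholds):
    """Return every group whose nearest training element lies within that
    group's own threshold; each group is tested independently."""
    members = []
    for name, training_set in training_sets.items():
        nearest = min(math.dist(sample, t) for t in training_set)
        if nearest <= thresholds[name]:
            members.append(name)
    return members


# Hypothetical two-channel training sets for two tissue groups.
training_sets = {
    "tumor": [(0.0, 0.0), (3.0, 4.0)],
    "fluid": [(20.0, 20.0), (20.0, 21.0)],
}
thresholds = {name: automatic_threshold(ts) for name, ts in training_sets.items()}

near_tumor = classify_per_group((1.0, 1.0), training_sets, thresholds)
before = classify_per_group((20.0, 22.5), training_sets, thresholds)

# Raise the sensitivity of one group only; the other group is unaffected.
thresholds["fluid"] = 2.0 * thresholds["fluid"]
after = classify_per_group((20.0, 22.5), training_sets, thresholds)
```

Because each group is tested independently, a test element may fall within several groups or within none, and changing one group's threshold does not alter the results for the others.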
Thus, from the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention.
Accordingly, the invention is not limited except as by the appended claims.

Claims

1. A method for establishing and applying an automatic threshold determination to a data sample, the method comprising:
collecting a plurality of data elements representative of the properties of an object;
obtaining a training set composed of a group of the data elements;
measuring the extent of similarity among the data elements in the training set to determine the maximum distance between the individual data elements in the group;
setting a threshold distance based on a measure of similarity between the pixels in the group;
comparing the thresholded measure of similarity between the data elements in the training set and test data elements not in the training set; and
classifying the test data element as a member of a class when measured similarity between the test data element and the training set is less than or equal to the threshold value.

2. The method of claim 1, wherein measuring the extent of similarity comprises determining the maximum Euclidean distance among the data elements in the training set.

3. The method of claim 2, wherein determining the maximum Euclidean distance comprises:
calculating the distance using multivariate computational methods.

4. A method for determining and applying object-specific threshold parameters to magnetic resonance measurements, the method comprising:
obtaining magnetic resonance measurements of an object from an MRI apparatus;
generating an array of pixels from the magnetic resonance measurements for displaying an image, including producing a training set comprising one or more training samples, the training set being formed from a plurality of congruent first images of a training region of the object, each first image being produced using an MRI pulse sequence different from the pulse sequences used to produce the other first images;
producing a data set comprising a plurality of test data samples, the test data set being formed from a plurality of congruent second images of a test data region of the same object, the second images being produced using the same MRI pulse sequences as the first images;
generating score data for each pixel in the array of pixels; and
modifying the score data to produce an enhanced, object-specific image.

5. The method of claim 4, wherein modifying the score data comprises automatically generating a score data rejection threshold.

6. The method of claim 4, wherein modifying the score data comprises dynamically altering the score data to produce a corresponding dynamically-enhanced image.

7. The method of claim 4, wherein modifying the score data comprises automatically generating a score data rejection threshold.

8. The method of claim 4, wherein modifying the score data further comprises dynamically altering the automatically-generated rejection threshold to produce a dynamically-enhanced image.