WO2016059385A1 - A method for optimising segmentation of a tumour on an image, and apparatus therefor

Info

Publication number
WO2016059385A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
tumour
segmentation
pet
automatic segmentation
Application number
PCT/GB2015/052981
Other languages
French (fr)
Inventor
Emiliano SPEZI
Béatrice BERTHON
Christopher Marshall
Original Assignee
University College Cardiff Consultants Limited
Velindre NHS Trust
Priority claimed from GB201418398A
Application filed by University College Cardiff Consultants Limited and Velindre NHS Trust
Publication of WO2016059385A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/143 Segmentation involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10104 Positron emission tomography [PET]
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30016 Brain
    • G06T2207/30096 Tumor; Lesion

Definitions

  • An aspect of the present invention relates to apparatus and method for optimal segmentation of PET images.
  • An aspect of the present invention relates to the development of a prediction model for optimised, preferably automatic, segmentation of H&N tumours on FDG-PET images.
  • PET positron emission tomography
  • FDG 18F-fluorodeoxyglucose
  • PET-AS PET-based automatic segmentation
  • a method for optimising segmentation of a tumour on an image comprising:
  • a calculation step comprising calculating a value for one or more parameters associated with the image;
  • a prediction step comprising making a prediction of the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting the image, wherein the prediction is based upon the one or more values calculated in the calculation step and a set of rules for the selection of the best automatic segmentation method to apply to a given tumour image;
  • a segmentation step comprising segmenting the image using the automatic segmentation method predicted as the most accurate in the prediction step.
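The three steps above lend themselves to a short sketch. The patent's implementation is in Matlab/CERR; the Python below is only an illustrative rendering, in which `extract_parameters`, the per-method regression trees and the segmentation functions are hypothetical stand-ins for the components described in this document.

```python
# Minimal sketch of the calculation -> prediction -> segmentation flow.
# Hypothetical names; the patent's own implementation used Matlab/CERR.
import numpy as np

def select_and_segment(image, roi_mask, methods, trees, extract_parameters):
    """methods: dict of name -> segmentation function(image, roi_mask) -> mask.
    trees: dict of name -> fitted regressor predicting the DSC each method
    would achieve on this image (the patent's decision trees).
    extract_parameters: function returning the image parameter vector."""
    # Calculation step: values for the image parameters (volume, COV, texture, ...)
    features = np.asarray(extract_parameters(image, roi_mask), float).reshape(1, -1)
    # Prediction step: predicted accuracy for every automatic segmentation method
    predicted_dsc = {name: float(trees[name].predict(features)[0]) for name in methods}
    best = max(predicted_dsc, key=predicted_dsc.get)
    # Segmentation step: apply only the predicted most accurate method
    return best, methods[best](image, roi_mask)
```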
  • the present invention provides a method to predict and use the best PET-AS algorithm; for example in FDG PET imaging.
  • the method of the present invention was shown to have excellent accuracy across a wide range of complex phantom data and achieved a prediction of the best (Pbest) or near-best (P10) segmentation in a very large number of cases.
  • Validation data indicates that the present invention is a robust method that is capable of using an accurate segmentation algorithm when other methods fail. It has been found that the method of the present invention has the capability to predict the best or near-best segmentation methods in a well-defined and reproducible way, thus providing optimal and robust image segmentation.
  • the present invention provides a novel, fast and reliable tool for tumour delineation; for example, in radiotherapy treatment planning of H&N cancers.
  • the present method provides a fully automated approach that is useful in delivering operator independent PET segmentation in the radiotherapy planning process.
  • PET imaging plays a valuable role in radiotherapy treatment planning and the present invention, by providing more accurate and reproducible delineation of the GTV, represents a significant improvement.
  • An accurate PET-based GTV is usually smaller than, for example, a Computed Tomography (CT)-based volume, and adjacent normal tissue, particularly the parotids, is spared, leading to reduced long-term morbidity and xerostomia and improved quality of life.
  • CT Computed Tomography
  • the one or more parameters relate to at least one of the size, homogeneity, geometry, intensity or texture of the tumour in the image.
  • the one or more parameters comprise one or more of: a) tumour volume;
  • Coefficient of Variation (COV), defined as σ/μ, where σ and μ are the standard deviation and mean of the intensity values of the tumour on the image, respectively;
  • Intensity Variability (Iv), wherein the image of the tumour is divided into voxels, and wherein Iv is defined in terms of n(i,j), the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • Sv Size-zone Variability;
  • R, the number of homogeneous regions within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities; and h) the number of homogeneous region sizes within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities.
  • the one or more parameters may additionally or alternatively comprise:
  • HDS Hausdorff Distance to Sphere, quantifying the distance between the segmented tumour surface A and the surface B of a sphere of same volume and same centre of mass using the Modified Hausdorff Distance:
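The formulas for COV, Iv, Sv and the Modified Hausdorff Distance did not survive extraction in this copy. The following LaTeX reconstruction uses the standard definitions these parameters appear to follow (size-zone texture features in the style of Thibault and Tixier et al., and the Modified Hausdorff Distance of Dubuisson and Jain); the exact normalisations should be treated as assumptions rather than the patent's verbatim formulas. Here n(i,j) is the size-zone entry defined above and N_z is the total number of homogeneous zones:

$$\mathrm{COV} = \frac{\sigma}{\mu}, \qquad \mathrm{Iv} = \frac{1}{N_z}\sum_i \Bigl(\sum_j n(i,j)\Bigr)^{2}, \qquad \mathrm{Sv} = \frac{1}{N_z}\sum_j \Bigl(\sum_i n(i,j)\Bigr)^{2}$$

$$\mathrm{MHD}(A,B) = \max\bigl(d(A,B),\, d(B,A)\bigr), \qquad d(A,B) = \frac{1}{|A|}\sum_{a\in A}\min_{b\in B}\lVert a-b\rVert$$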
  • the one or more parameters comprise one or more of:
  • Iv Intensity Variability;
  • Sv Size-zone Variability;
  • NI, the number of intensity levels within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • g) R, the number of homogeneous regions within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • NRS, the number of homogeneous region sizes within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • HDS, Hausdorff Distance to Sphere, quantifying the distance between the segmented tumour surface A and the surface B of a sphere of the same volume and same centre of mass, using the Modified Hausdorff Distance.
  • the set of rules is based upon the conformity between delineation contours produced by each of the plurality of automatic segmentation methods for a plurality of training images and reference delineation contours for the plurality of training images.
  • the conformity is represented by a Dice Similarity Coefficient (DSC), a Jaccard Conformity Index, a Van Riet Conformity Index or a Mean Distance to Conformity.
  • DSC Dice Similarity Coefficient
  • the Dice Similarity Coefficient was used in the method of the present invention to quantify the segmentation accuracy, as it provides information on the spatial conformity of the ground truth and segmented contours, combining information on the sensitivity and positive predictive value of the segmentation in a single accuracy score.
  • DSC Dice Similarity Coefficient
  • the set of rules includes information on the effect of different values for the one or more parameters on the conformity of delineation contours produced by each of the plurality of segmentation methods with reference delineation contours.
  • the set of rules is codified as a Decision Tree, using the Classification and Regression Trees statistical method, for each of the plurality of automatic segmentation methods.
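Since the set of rules is codified as one CART decision tree per segmentation method, the training step can be sketched as follows. This is a sketch only: the patent's trees were grown with Matlab's RegressionTree.fit, with the settings quoted later in this document (maximum depth 10, minimum 50 cases per node); scikit-learn's DecisionTreeRegressor is an assumed stand-in whose regression splitting criterion differs from the quoted "Gini" impurity.

```python
# Sketch: one regression tree per PET-AS method, mapping image parameters to
# the DSC that method achieved on the training images (hypothetical data layout).
from sklearn.tree import DecisionTreeRegressor

def build_trees(param_matrix, dsc_per_method):
    """param_matrix: (n_images, n_parameters) array of training image parameters.
    dsc_per_method: dict of method name -> (n_images,) array of DSC scores
    measured against the reference contours."""
    trees = {}
    for name, dsc in dsc_per_method.items():
        # Depth and node-size settings mirror those quoted in this document.
        tree = DecisionTreeRegressor(max_depth=10, min_samples_split=50)
        trees[name] = tree.fit(param_matrix, dsc)
    return trees
```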
  • the calculation step comprises using a single automatic segmentation method to calculate the one or more parameter values from the image.
  • the single automatic segmentation method comprises one of the plurality of automatic segmentation methods.
  • the single automatic segmentation method comprises the automatic segmentation method having the lowest standard deviation in its conformity for the plurality of training images.
  • the plurality of training images comprise simulated tumour images having values for at least one parameter, which are within a clinically-observed range.
  • the plurality of training images comprise simulated tumour images having values for all parameters, which are within a clinically-observed range.
  • the calculation, prediction and segmentation steps are performed as an automatic process.
  • the image is produced by Positron Emission Tomography or Single Photon Emission Tomography.
  • the tumour belongs to the class of H&N tumours.
  • the tumour is in a subject's head and/or neck area.
  • the tumour is a brain tumour.
  • the tumour is a tumour of the respiratory tract, preferably of the lung.
  • the conformity is represented by a Dice Similarity Coefficient, a Jaccard Conformity Index, a Van Riet Conformity Index, or a Mean Distance to Conformity.
  • the methods further comprise making a determination of the effect of changes in one or more parameters associated with the training target portions on the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and expected delineation contours.
  • the methods further comprise building a Decision Tree for each of the plurality of automatic segmentation methods based upon the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and the reference delineation/expected delineation contours.
  • apparatus arranged to perform a method as described herein.
  • a computer-readable medium having computer-executable instructions adapted to cause a computer system to perform a method as described herein.
  • a computer programmed to perform a method as described herein.
  • Another aspect of the present invention relates to a method of optimising segmentation of an image, the image including a target portion, the method comprising:
  • an extraction step comprising extracting, from the image, values for one or more parameters associated with the target portion;
  • a prediction step comprising making a prediction of the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting the image, wherein said prediction is based upon information relating to the accuracy of each of the plurality of automatic segmentation methods in segmenting training images of training target portions corresponding to a plurality of different values for the one or more parameters;
  • a segmentation step comprising segmenting the image using the automatic segmentation method predicted as the most accurate in the prediction step.
  • the target portion comprises an image of a lesion and the one or more parameters relate to at least one of the size, homogeneity, geometry or intensity of the lesion in the image.
  • the one or more parameters comprise one or more of:
  • SUVpeak, the peak SUV (mean value within a 1 cm³ sphere centred on the maximum), wherein SUV is the Standardised Uptake Value;
  • COV Coefficient of Variation;
  • Iv Intensity Variability;
  • Sv Size-zone Variability;
  • g) NI, the number of intensity levels within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • h) NR, the number of homogeneous regions within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • NRS, the number of homogeneous region sizes within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities.
  • the one or more parameters may additionally or alternatively comprise:
  • the one or more parameters comprise one or more of:
  • SUVpeak, the peak SUV (mean value within a 1 cm³ sphere centred on the maximum), wherein SUV is the Standardised Uptake Value;
  • COV Coefficient of Variation;
  • Iv Intensity Variability;
  • Sv Size-zone Variability;
  • g) NI, the number of intensity levels within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • NR, the number of homogeneous regions within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
  • NRS, the number of homogeneous region sizes within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities; and
  • the information relates to the conformity between delineation contours produced by each of the plurality of automatic segmentation methods for the training images and the reference delineation/expected delineation contours for the training images.
  • the conformity is represented by a Dice Similarity Coefficient (DSC), a Jaccard Conformity Index, a Van Riet Conformity Index or a Mean Distance to Conformity.
  • DSC Dice Similarity Coefficient
  • the information further comprises information on the effect of different values for the one or more parameters on the conformity of delineation contours produced by each of the plurality of segmentation methods with reference delineation contours.
  • the information is codified as a Decision Tree for each of the plurality of automatic segmentation methods.
  • the extraction step comprises using a single automatic segmentation method to extract the one or more parameter values from the image.
  • the single automatic segmentation method comprises one of the plurality of automatic segmentation methods.
  • the single automatic segmentation method comprises the automatic segmentation method having the lowest standard deviation in its conformity for the training images.
  • the training images comprise simulated lesion images having values for at least one parameter which are within a clinically-observed range.
  • the training images comprise simulated lesion images having values for all parameters which are within a clinically-observed range.
  • the extraction, prediction and segmentation steps are performed as an automated process.
  • the image is produced by Positron Emission Tomography or Single Photon Emission Tomography.
  • the target portion comprises an image of a tumour in a subject's head and/or neck area.
  • An aspect of the present invention relates to the development of a prediction model for optimised segmentation of H&N tumours on FDG-PET images.
  • the invention is however applicable to any class of tumour (e.g. H&N, lung, brain), in particular where the image comes from a scan based on the detection of radiation emitted by radionuclides injected into a body.
  • An aspect of the present invention provides a method of optimising the accuracy of segmentation of tumours (which may according to a preferable feature be H&N tumours) on images (which may according to a preferable feature be FDG-PET images) by considering/applying only the predicted best performing method.
  • Machine learning techniques provide continuously improving algorithms, which "learn" from the data they are used on.
  • Machine learning allows a given algorithm to be built and optimised using existing data for which the outcome is known, in order for it to achieve optimal performance for unknown cases.
  • Machine learning techniques include methods such as Artificial Neural Networks, Support Vector Machine, and Decision Tree learning, which have been used in the literature for the segmentation of medical imaging by classifying voxels into different categories. The main advantages of such techniques are their high predictive power and their ability to adapt to any given dataset.
  • training methods can be continually improved or updated by modifying the training dataset, which implies that a method developed for data acquired in one centre could be easily transferred to a different centre, provided training data is available.
  • decision tree methods are particularly useful, as they allow visualising the final algorithm and the set of rules learned during the training process, which then provides a way of implementing these into an optimised decision process.
  • decision tree learning could therefore be a powerful tool in the exploration of a wide range of data cases, representing clinical tumours, for the determination of optimal segmentation.
  • An aspect of the present invention provides a PET-AS method, optimised for the delineation of H&N cancer tumours.
  • the present inventors have investigated and compared the accuracy of a number of segmentation approaches, selected as the most promising in the current literature, for a wide range of FDG uptake distributions. Results showed that different approaches performed best or worst for different types of image.
  • An aspect of the present invention aims at taking forward these observations in building a single segmentation method using machine learning techniques, to achieve optimal segmentation accuracy in H&N PET data.
  • FIGURE 1 shows a flow chart according to an embodiment of a training aspect of the present invention;
  • FIGURE 2 shows a flow chart according to an embodiment of a segmentation aspect of the present invention
  • FIGURE 3 shows the interrelation between the embodiments of Figs. 1 and 2;
  • FIGURES 4A to 4F show stages in a segmentation process according to an embodiment of the present invention
  • FIGURES 5a and 5b show a comparison between binary segmentation and multiple clustering in the case of a heterogeneous tumour with high SUV peak;
  • FIGURES 5c, 5d, 5e, 5f and 5g show examples of phantom scans overlaid with reference true contours used for the validation of the method of the present invention, wherein figure 5c shows fillable spheres with thin plastic walls; figures 5d and 5e show non-spherical fillable inserts with thin plastic walls; figure 5f shows a spheroidal lesion on a printed H&N phantom and figure 5g shows an irregular lesion on a printed H&N phantom.
  • FIGURE 6a shows a printout pattern for a sphere of 58mm diameter with 4 intensity levels and FIGURE 6b shows the resulting PET image, reference contours and contours from GCM delineation method applied to 2, 4, 6 and 8 clusters;
  • FIGURE 7 shows the range of TBRpeak and volume values obtained for simulated cases, compared to recorded clinical primary (P) and lymph node (LN) cases;
  • FIGURE 8 is a graph showing the accuracy of a model developed according to an embodiment of the present invention, compared to single segmentation approaches, over a validation phantom dataset;
  • FIGURES 9 to 15a show simplified versions of the decision trees obtained for each one of the methods tested according to an embodiment of the present invention
  • Figures 15b and 15c show a representation of decision trees built according to one aspect of the method of the present invention for PET-AS methods b) KM2 and c) AT;
  • Figure 15d is a graph showing the accuracy of a model developed according to an embodiment of the method of the present invention (dashed black line) compared to individual PET-AS methods (solid line) across the validation data set, including fillable phantom and printed H&N phantom data;
  • Figures 16 to 20 show further simplified versions of the decision trees obtained for each one of the methods tested according to an embodiment of the present invention
  • FIGURES 21 to 30 describe an embodiment of the present invention.
  • FIGURES 31 to 34 illustrate the findings of a clinical feasibility and impact study, including FIGURE 33 showing GTVp volumes and DSC index for manual and ATLAAS contours and FIGURE 34 showing quantification of the changes to the final volume (growth and shrinkage) based on the ATLAAS outlines, and differences between ATLAAS and CT/MRI not taken into account in the final GTV.
  • image segmentation for example on PET images
  • image segmentation is currently done manually, but this is time consuming and involves a lack of reproducibility.
  • a number of automatic segmentation algorithms are known, but there is no real consensus on which is best.
  • An embodiment of the present invention works at a higher level, to predict which of a plurality of automatic segmentation algorithms will work best for segmenting a given image of a tumour.
  • a training dataset is built by testing a plurality of automatic segmentation algorithms on images of simulated or "phantom" targets having a known contour. As the ground/clinical truth of the target contour is known, the efficacy of a given algorithm for segmenting a particular phantom target may be determined. This has led to the identification of a number of parameters, which may be assessed for a given image of a tumour to determine which automatic segmentation algorithm of the plurality of automatic segmentation algorithms would return the most accurate segmentation of the image of the tumour.
  • parameters relate to, for example, the size, homogeneity and geometry of the tumour.
  • heterogeneity parameters appear to be particularly determinative of which automatic segmentation algorithm will return the most accurate segmentation result for a given image of a tumour. According to embodiments, this appears to particularly be the case for H&N tumours.
  • An embodiment of the present invention works by automatically examining a given image, e.g. a PET image, to determine values for one or more parameters relating to, for example, the size, homogeneity and geometry of a tumour in the image. It then selects the optimum (i.e. most accurate) automatic segmentation algorithm, from a plurality of automatic segmentation algorithms, to use for segmenting that image, based on a training dataset of accuracy results for each of the plurality of automatic segmentation algorithms in segmenting training images, and finally performs segmentation of the image using the selected automatic segmentation algorithm.
  • the steps of this process are performed as an automated sequence, and all that is really required of a user is to generally indicate, in a given image (e.g. PET image), the area which the automatic segmentation process should focus on.
  • the training dataset(s) could be extended at a later date to refine the system further. According to an embodiment, this is done by adding scans and "ground truth" data that bring new information in the training datasets e.g. by extending the range of certain imaging parameters to meet the image characteristics found in scans from other PET centres, or to focus on other anatomical sites.
  • a training aspect of the present invention relates to building a predictive model for predicting the automatic segmentation method, from amongst a plurality of automatic segmentation methods, which will return the most accurate segmentation result for a given image of a tumour.
  • a plurality of training data images e.g. simulated PET images, are produced of targets having known contours, preferably within a targeted range of values observed clinically.
  • Image parameters are determined for each of these training data images in a second step 2. These parameters may include various characteristics such as at least one of the size, homogeneity, geometry, intensity or texture of the tumour on the image, tumour dimension, tumour position, FDG uptake value, signal to background ratio and intensity distribution characteristics.
  • In a third step, shown generally at 3, for each of the training data images, the "true", i.e. known, contour of the target shown in the training image is compared against the contour produced for that target by each of a plurality of automatic segmentation methods.
  • The comparison uses a conformity metric, which in the present embodiment comprises a Dice Similarity Coefficient reflecting the conformity of contours produced by the plurality of automatic segmentation methods to the corresponding known contours.
  • the process is repeated for a plurality of training images, and the results are used in a fifth step 5 to build a predictive model for predicting the automatic segmentation method, from amongst the plurality of automatic segmentation methods, which will return the most accurate segmentation result for a given image of a tumour not belonging to the training dataset e.g. a clinical scan image.
  • results are codified as Decision Trees for each of the plurality of automatic segmentation methods.
  • a segmentation aspect of the present invention relates to employing the predictive model, built according to the training aspect, in predicting the automatic segmentation method, from amongst a group of automatic segmentation methods, which will return the most accurate segmentation result for a given clinical image of a tumour, and using the predicted most accurate automatic segmentation method to segment the clinical image.
  • An embodiment according to this segmentation aspect is shown in the flow chart of Fig. 2, in which a clinical image of a tumour is produced in a first step 6.
  • image parameters associated with that clinical image, and corresponding to the parameters derived for the training data images in the training aspect are extracted.
  • these parameters may include various characteristics such as at least one of the size, homogeneity, geometry, intensity or texture of the tumour on the image, tumour dimension, tumour position, FDG uptake value, signal to background ratio and intensity distribution characteristics.
  • the parameters extracted from the clinical image are used in conjunction with the predictive model built according to the embodiment of the training aspect described above, to predict the most accurate automatic segmentation method to segment the clinical image.
  • the clinical image is segmented using the predicted most accurate automatic segmentation method.
  • Fig. 3 shows the interrelation of the embodiments of the training and segmentation aspects described in relation to Figures 1 and 2, in which the predictive model built in the embodiment of the training aspect is employed by the embodiment of the segmentation aspect to segment a given clinical image.
  • Figures 4a to 4f show various stages in use of an embodiment of the segmentation aspect being applied to a clinical image.
  • a user opens the scan image in a viewer, as shown in Fig. 4a.
  • the area highlighted by the user can be displayed in the scan image in the viewer, as shown in Fig. 4d.
  • the user may then launch the segmentation process of the present embodiment by issuing a suitable command in the segmentation interface, as shown in Fig. 4e.
  • Segmentation is then performed using the predicted most accurate automatic segmentation method according to the present embodiment, and the result of the segmentation is displayed as a contour defining the tumour region in the viewer, as shown in Fig. 4f.
  • The embodiment of Figs. 4a to 4f is implemented in Matlab, and the following information, used during the segmentation in the illustrative example of Figs. 4a to 4f, appears in the Matlab command line:
  • Structures consisting of a wide range of intensity values can include several regions of different mean intensities.
  • Binary methods classify voxels into two categories corresponding to the tumour and the background. With such methods, tumour voxels of low intensity will therefore be classified as background, even if their intensity level is higher than the mean background value. As a consequence, a reduced sensitivity is expected for binary segmentation methods when delineating highly heterogeneous structures.
  • Clustering methods, used for classifying voxels into several clusters, have the potential to overcome this by identifying the multiple regions in a heterogeneous structure. This is illustrated in Figure 5 (5a and 5b), which shows a comparison between binary segmentation and multiple clustering in the case of a heterogeneous tumour with high SUVpeak.
  • the present inventors formulated the following hypotheses:
  • the present inventors modelled six spheres of diameters 10, 13, 22, 30, 38 and 58 mm with heterogeneous uptake distributions, by printing each sphere out as 4 concentric spheres of different diameters with different intensity levels.
  • the diameters of the concentric spheres were fractions of 1, 0.66, 0.5 and 0.33 of the outside diameter.
  • the intensities in concentric spheres of decreasing diameters were set with ratios of 1, 1.5, 2, 2.5 to model a sphere with 4 different intensity levels, and 1, 1, 2, 2 for a sphere with 2 intensity levels.
  • the intensity of the outside sphere was set to 5 times the background intensity in each case.
  • A tumour-to-background ratio (TBR) of 4 was used to print one set of homogeneous spheres for comparison purposes.
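The concentric-sphere construction above is simple to reproduce numerically. The sketch below, with assumed grid size and voxel units, builds such a heterogeneous sphere as a voxel array; it is an illustration of the described pattern, not the inventors' printing pipeline.

```python
# Sketch of the heterogeneous sphere model: four concentric spheres at diameter
# fractions 1, 0.66, 0.5, 0.33 of the outside diameter, intensity ratios
# 1, 1.5, 2, 2.5, with the outside sphere at 5x the background intensity.
import numpy as np

def concentric_sphere_phantom(shape=(64, 64, 64), outer_diam_vox=40, background=1.0):
    z, y, x = np.indices(shape)
    centre = np.array(shape) / 2.0
    r = np.sqrt((z - centre[0])**2 + (y - centre[1])**2 + (x - centre[2])**2)
    img = np.full(shape, background)
    outer = 5.0 * background  # outside sphere intensity
    # Paint from the largest sphere inwards so inner spheres overwrite outer ones.
    for frac, ratio in zip([1.0, 0.66, 0.5, 0.33], [1.0, 1.5, 2.0, 2.5]):
        img[r <= frac * outer_diam_vox / 2.0] = outer * ratio
    return img
```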
  • the methods KM, FCM and GCM were applied to all spheres modelled for the detection of 1, 2, 3, 4, 5, 6, 7 and 8 clusters.
  • the segmentation accuracy was assessed by quantifying the conformity of the contours obtained to the reference true contour, which was evaluated using the Dice Similarity Coefficient, defined as:
  • DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of voxels within the segmented contour and the reference true contour, respectively.
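For concreteness, the DSC of two binary masks can be computed directly from the definition above; this minimal Python sketch assumes the contours have already been rasterised to voxel masks.

```python
# Dice Similarity Coefficient between a segmented mask and a reference mask.
import numpy as np

def dice(segmented, reference):
    a, b = segmented.astype(bool), reference.astype(bool)
    denom = a.sum() + b.sum()
    # Convention: two empty masks are treated as perfectly conforming.
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```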
  • the reference contour was derived by generating a contour from the object's known geometry, which was positioned on the PET image.
  • PETSTEP, a fast PET simulator described previously [8c], [8d], [14], was used for the generation of a training set of realistic H&N PET images with modelled tumours from user-defined contours.
  • the PETSTEP simulator was used for the rapid generation of large training sets.
  • PETSTEP operates in the open source framework of the Matlab-based Computational Environment for Radiotherapy Research (CERR) [8e]. It uses a CT image and a map of the background FDG uptake to generate a PET image from tumour contours drawn by the user.
  • CERR Matlab-based Computational Environment for Radiotherapy Research
  • an FDG activity map, registered to a clinical CT image, was used as a simulation template to model typical background H&N activity levels.
  • the training dataset was generated by adding tumours of varying sizes and activity distributions to the background FDG uptake map. This was done automatically from initial tumour contours drawn manually on the FDG uptake map, to model different irregular shapes and locations.
  • the data was aimed at producing a large set of test cases, covering the range of tumour metrics observed for clinical H&N data at the inventors' centre.
  • two types of lesions were identified on available clinical data:
  • the present inventors used 10 different initial contours for generating the dataset; 8 corresponded to typical primary lesion locations, and were generated with targeted volumes and intensities ranging within 7-75 mL and maximum SUV values of 5-40, corresponding to the ranges clinically observed.
  • the remaining contours represented nodal lesions and were similarly generated with volumes and intensities ranging within 0.5-40 mL and maximum SUV values of 2-25.
  • a total of 100 PET images were generated for each contour geometry for regularly spaced values of the tumour volume and uptake spanning the specified range, leading to 1000 cases in total.
  • the inventors also simulated homogeneous uptake, increased uptake in the tumour centre with 2 and 3 uptake levels, and a tumour with necrotic centre.
  • global tumour heterogeneity was modelled in each homogeneous region by randomly assigning to the voxels uptake values drawn from a Gaussian distribution centred on the targeted mean uptake value, with a standard deviation of 100% of the mean value.
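The heterogeneity model described in this bullet can be sketched as follows; the region labels and target means are assumed inputs, and the clipping of negative draws to zero is an added assumption (a Gaussian with standard deviation equal to the mean produces some negative samples).

```python
# Sketch: draw voxel uptakes from a Gaussian centred on each region's target
# mean, with standard deviation equal to 100% of that mean (as stated above).
import numpy as np

def add_heterogeneity(uptake_map, region_labels, region_means, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    out = uptake_map.astype(float).copy()
    for label, mean in region_means.items():
        voxels = region_labels == label
        out[voxels] = rng.normal(loc=mean, scale=mean, size=int(voxels.sum()))
    return np.clip(out, 0.0, None)  # uptake cannot be negative (assumption)
```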
  • An additional set of 68 training images were generated with hand-drawn contours and randomly assigned uptake values.
  • a second training dataset was generated based on existing PET/CT data of a fillable phantom.
  • Target tumours covering a range of different characteristics relevant to clinical situations were added to the background of this phantom.
  • a total of 100 spherical tumours were modelled for volume and maximum uptake values in the range 0.5 mL to 50 mL and 4000 Bq/mL to 40000 Bq/mL respectively.
  • the target outline was known since it was one of the input parameters in the synthetic image generation process. We refer to this outline as "reference true contour" or "ground truth".
  • TBRpeak Ratio between the tumour SUVpeak, calculated as the mean value in a 1 cm³ sphere centred on the maximum SUV in the tumour, and the background SUV, calculated as the mean intensity in a 0.5 cm thick extension of the contour.
  • SU V is the Standardised Uptake Value.
  • COV Coefficient of Variation, defined as:
  • NRS number of homogeneous region sizes.
  • TBRpeak Ratio between the tumour SUVpeak, calculated as the mean value in a 1 cm³ sphere centred on the maximum Standardised Uptake Value (SUV) in the tumour, and the background SUV, calculated as the mean intensity in a 0.5 cm thick extension of the tumour contour.
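A sketch of this TBRpeak calculation is given below. A 1 cm³ sphere has radius (3/4π)^(1/3) cm, roughly 6.2 mm; the background mask is supplied precomputed here (the description derives it as a 0.5 cm thick extension of the contour), and all names are illustrative.

```python
# Sketch of SUVpeak (mean SUV in a 1 cm^3 sphere centred on the hottest voxel)
# and TBRpeak (SUVpeak divided by the mean background SUV).
import numpy as np

def suv_peak(suv, tumour_mask, voxel_size_mm):
    radius_mm = (3.0 / (4.0 * np.pi)) ** (1.0 / 3.0) * 10.0  # ~6.2 mm
    masked = np.where(tumour_mask, suv, -np.inf)
    centre = np.unravel_index(np.argmax(masked), suv.shape)
    coords = np.indices(suv.shape).astype(float)
    for ax in range(3):  # distances in mm from the hottest voxel
        coords[ax] = (coords[ax] - centre[ax]) * voxel_size_mm[ax]
    sphere = np.sqrt((coords ** 2).sum(axis=0)) <= radius_mm
    return suv[sphere].mean()

def tbr_peak(suv, tumour_mask, background_mask, voxel_size_mm):
    # Background mask: assumed to be a precomputed shell around the contour.
    return suv_peak(suv, tumour_mask, voxel_size_mm) / suv[background_mask].mean()
```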
  • SUV Standardised Uptake Value
  • NI a regional texture feature related to the intensity distribution in the target object.
  • the number NI of intensity levels in the tumour is obtained based on the methods described by Loh et al. [15] to calculate grey level run length, which rely on the identification of regions of connected voxels with the same intensity value.
  • NI is the number of the corresponding intensity values identified, after resampling the tumour to 64 discrete intensity levels as recommended by Tixier et al. [16].
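The counting of intensity levels and homogeneous regions after 64-level resampling can be sketched with a connected-component pass; scipy's ndimage.label is an assumed stand-in for the region identification step of Loh et al., and the quantisation scheme is illustrative.

```python
# Sketch of NI (intensity levels), NR (homogeneous regions) and NRS (distinct
# region sizes) after resampling the tumour to 64 discrete intensity levels.
import numpy as np
from scipy import ndimage

def texture_counts(suv, tumour_mask, levels=64):
    vals = suv[tumour_mask]
    edges = np.linspace(vals.min(), vals.max(), levels + 1)[1:-1]
    quantised = np.zeros(suv.shape, dtype=int)
    quantised[tumour_mask] = np.digitize(vals, edges) + 1  # levels 1..64
    region_sizes = []
    for level in np.unique(quantised[tumour_mask]):
        labelled, n_regions = ndimage.label(quantised == level)
        region_sizes.extend(np.bincount(labelled.ravel())[1:])  # skip background
    ni = len(np.unique(quantised[tumour_mask]))  # number of intensity levels
    nr = len(region_sizes)                       # number of homogeneous regions
    nrs = len(set(region_sizes))                 # number of distinct region sizes
    return ni, nr, nrs
```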
  • the workflow to build the statistical model was fully automated and computer scripts were used to (1) determine evenly spaced values of the volume and FDG uptake spanning the ranges specified, (2) generate the synthetic training scans, (3) calculate the value of corresponding tumour parameters, (4) segment the synthetic scans using the individual PET-AS methods, (5) calculate the segmentation accuracy for each method by comparing the result of the segmentation with the ground truth and (6) build the corresponding decision trees.
  • a tree was grown for the prediction of the DSC score obtained on different tumour types.
  • the present inventors used the Classification and Regression Tree (CART) growing method with all tumour and image characteristics entered into the model as prediction variables.
  • the impurity measure used was "Gini" and the maximum tree depth was set to 10 with a minimum number of 50 cases per node.
  • the software returned the trees obtained for each segmentation approach, together with its risk estimate, which represents the average classification error on the whole dataset made by assigning each case to its DSC predicted by the tree.
  • the importance to the model was also calculated for each variable, as the sum of the improvements brought to the model on all the nodes, weighted by the proportion of cases considered in each node.
  • the trees were pruned back to their optimal size with a minimum risk allowed of 1 standard deviation of the average risk.
  • the final segmentation method was built using the decision trees derived for each method. For each new case it calculates the image parameters using a contour obtained by one of the single segmentation methods, then the predicted DSC score for each approach, and selects for application the algorithm reaching the highest predicted DSC value.
  • the decision trees predicting, for each PET-AS, the DSC score obtained according to the values of the different tumour parameters were built automatically with the Matlab statistics toolbox, based on the Classification And Regression Tree (CART) growing method.
  • the Matlab method RegressionTree.fit was used with its default settings, including the impurity measure "Gini", ensuring the data is classified into homogeneous groups of values.
  • the cross-validation error was calculated as the relative difference between the true DSC and the DSC predicted by the tree, averaged on a randomly chosen subset (10%) of the training dataset.
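The quoted cross-validation check can be sketched directly: pick a random 10% of the training cases, predict their DSC with the tree, and average the relative error against the true DSC. Names and data layout below are assumed.

```python
# Sketch of the cross-validation error: mean relative difference between true
# and tree-predicted DSC over a random 10% subset of the training data.
import numpy as np

def cv_error(tree, param_matrix, true_dsc, frac=0.10, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    n = len(true_dsc)
    subset = rng.choice(n, size=max(1, int(frac * n)), replace=False)
    predicted = tree.predict(param_matrix[subset])
    # Assumes true DSC values are non-zero, as in the reported datasets.
    return float(np.mean(np.abs(predicted - true_dsc[subset]) / true_dsc[subset]))
```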
  • the performance of the decision tree method was evaluated on a set of images obtained with a fillable phantom with minimal plastic wall thickness used in a previous study [6], and a sub-resolution sandwich (SS) printed phantom also described in previous work [7].
  • the fillable phantom was imaged with hot spheres of 0.5 to 102 mL at Tumour-to-Background Ratios (TBRs) of 1.4 to 6.4.
  • TBRs Tumour-to-Background Ratios
  • the SS printed phantom method allows modelling any given FDG uptake distribution, and producing a three-dimensional object to scan.
  • the phantom was used to generate realistic H&N data with irregularly shaped tumours, located in the base of tongue, tonsils and parotid space.
  • FDG uptake patterns commonly observed for H&N squamous cell carcinoma primary tumours and malignant lymph nodes [2b] were modelled as tumour uptake distributions:
  • the image parameters used in the model were calculated for this dataset using the ground truth contour, obtained from the corresponding CT or simulation template. Only the images for which these fell into the range used in the training dataset were kept, which led to using 115 validation images for the first dataset and 85 for the second dataset.
  • the model built from the first training dataset was applied to each test case of this validation dataset, on the basis of image parameters calculated from an initial contour obtained by running the method GCM2. GCM2 was chosen in this case because it returned the lowest standard deviation in DSC values on the training dataset.
  • the average DSC was calculated for the model of the present embodiment, and compared with DSC values obtained for each segmentation approach separately.
  • Figures 5c, 5d, 5e, 5f, and 5g show examples of phantom scans overlaid with reference true contours used for the validation of the model.
  • the target objects shown are:
  • the second dataset did not include highly irregular lesions, or images with a heterogeneous background, but it is intended that this shows precisely the influence of the training dataset on the accuracy of the predictive model of the present invention. It is expected that a training dataset containing irregular ground truth data for highly irregular lesions and heterogeneous background images will further improve the performance of the present invention.
  • an automated procedure is used to build the training dataset. This allows building and using the model with data from any centre or any PET imaging system. Indeed, the method of the present invention could be built for any tracer or imaging modality (e.g. SPECT), as long as corresponding data with reference true contours are available. There is no limit to the number or type of PET-AS algorithms that can be included in the method of the present invention providing the predictive segmentation model.
  • Table 2 gives the number of clusters providing the most accurate segmentation (highest DSC) for GCM applied to spheres S1-S6 with 1, 2 or 4 activity levels. Similar results were obtained for FCM. These results show that the number of clusters is strongly correlated with the number of activity levels and with the dimension of the spheres. The highest DSC values were obtained when the segmentation was based on the identification of 2 clusters for KM, except in one case for which it was 3 clusters.
  • Figure 6a shows one transverse slice of the printed pattern used to model a 58 mm diameter sphere consisting of 4 different concentric spheres of increasing intensity levels.
  • Figure 6b shows a transverse slice of the PET image obtained by scanning the printed pattern in Figure 6a within the sub-resolution phantom.
  • the contours provided by the GCM method applied to the detection of 2, 4, 6 and 8 clusters are shown on Figure 6b.
  • the accuracy of the methods increased with increasing number of clusters.
  • Figure 7 shows a comparison between the range of SUVpeak and Volume values obtained for the different training test cases of the first training dataset, compared to the values measured for clinical patients.
  • Table 3: Comparison of the range of values measured for the parameters considered, for clinical data, data simulated for the first training dataset, and validation phantom data.
  • the simulated data used for training the model of the present invention covered TBRpeak, V and NI ranges of 0.44-59, 1.1-52 mL and 8-38 respectively.
  • the clinical data observed at the inventors' centre had TBRpeak, V and NI in the range 1.1-8.4, 0.44-59 mL and 13-63 respectively.
  • Table 5 shows the performance of the different PET-AS methods on the training dataset (mean DSC and Pbest) and the cross-validation error of the associated decision trees. All trees achieved a cross-validation error smaller than 1%, except for RG.
  • Figures 15b and 15c show the decision trees built for methods KM2 and AT using the second training dataset, including the image parameter and cut-off value at each node, and the predicted DSC at each extremity (leaf) of the tree.
  • Figure 8 shows the DSC values obtained for the model (in black) and the single segmentation approaches for each case of the validation dataset.
  • the best performing algorithm was correctly selected by the model of the present embodiment of the present invention in 35% of cases, leading to the method of the present embodiment returning the best DSC in 35% of cases.
  • the algorithm selected by the model of the present embodiment of the present invention reached a DSC value within 5% and 10% of the best performing algorithm for this case, respectively.
  • the maximum error made, i.e. the maximum difference between the DSC value achieved by the selected algorithm and that achieved by the best performing algorithm, was 33% of the best DSC.
  • Table 6 shows the average, median and range of DSCs obtained for the decision tree method applied to the phantom dataset, compared to some of the single segmentation approaches considered in this study.
  • the present inventors chose the single segmentation approach which reached the highest mean DSC (AT), and the segmentation approach which achieved the best DSC for the largest number of cases on the validation dataset (RG).
  • the results of the model of the present embodiment of the present invention were compared with the theoretical optimal segmentation achieved by choosing the best performing method for each case, named "optimal" in Table 6.
  • the decision tree method achieved a mean DSC 7% higher than every one of the single segmentation approaches.
  • the method of the present embodiment of the present invention returned a DSC higher than 0.364 in all cases, and reduced the number of cases with DSC lower than 0.5 to one.
  • the difference in average accuracy between the model of the present embodiment of the present invention and the optimal theoretical method is 0.022 DSC.
  • the phantom dataset built for the evaluation of the present invention included target objects with volumes and TBRpeak values in the range 0.58 mL to 102 mL and 1.3 to 17 respectively.
  • the results of the validation of the present invention with phantom data are given in Table 7, showing the mean and lowest DSC obtained on the dataset as well as Pbest and P10 for the dataset and all PET-AS methods considered in the model.
  • the method of the present invention returned the best DSC in 56% of cases, and a DSC within 10% of the best in 93% of cases, compared to respectively 34% and 88% obtained when using the best single PET-AS method on this dataset (AT).
  • Table 7: Performance of the method of the present invention and of the different PET-AS methods for the phantom validation data, including mean, median and minimum DSC obtained, and Pbest and P10 values. The results of the Wilcoxon signed rank test are given in the bottom rows.
  • the average accuracy of the method of the present invention, trained on the second dataset, across the fillable phantom data was 0.881 DSC compared to 0.852 for the best of the PET-AS methods.
  • the present invention reached an average 0.819 DSC compared to 0.831 DSC for the best performing PET-AS method.
  • an aim of the present inventors was to develop a model able to select and apply the best performing method among a selection of promising segmentation algorithms.
  • the combination of several algorithms has been investigated in other studies using other combination processes.
  • McGurk et al. [8] compared two voxel-wise methods for the combination of 5 different segmentation algorithms, in order to limit inconsistencies of the segmentation accuracy on a large dataset.
  • the present inventors aimed at optimising the accuracy of their segmentation by considering only the predicted best performing method. This optimisation approach is different from the approach of McGurk et al., which is focused on robust segmentation, and allows reaching the highest achievable DSC values rather than a consensus of methods.
  • an aspect of the present invention provides a method of optimising the accuracy of segmentation of tumours (which may according to a preferable feature be H&N tumours) on images (which may according to a preferable feature be FDG-PET images) by considering only the predicted best performing method.
  • the method of the present embodiment of the present invention aims for optimised segmentation accuracy, and is therefore dependent on the imaging system and on the type of lesions identified. For this reason, the method needs to be trained on a dataset targeting the type of images desired.
  • the present inventors used a fast PET simulator to automatically generate large datasets, ensuring a good accuracy for machine learning algorithms like the decision tree approach used. The process of generating the data was simple and automatic, and could be easily applied to a different centre and/or a different tumour site.
  • the final method of the present embodiment, trained on the first dataset, showed excellent accuracy across a wide range of phantom data, and achieved a prediction of the best or near-best segmentation (DSC within 5% of the best) in a large number of cases.
  • This work demonstrated both the robustness of the method of the present embodiment, which systematically returns DSC values higher than 0.36, and its high accuracy, with a mean DSC obtained of 0.82.
  • the first training dataset consisted of a number of images with lower COV values than the clinical data, and images with higher texture parameters (Iv, Sv, R and RS).
  • the difference in COV, which is an indicator of global image noise, could be caused by the difference between scanner-acquired images and simulated images.
  • the PET simulator used in testing the present embodiment was calibrated in previous work against phantom data obtained by the scanner used in testing the present embodiment, but this was done using non-TOF corrected data, as this feature was not yet available in the simulator.
  • the reconstruction process and smoothing parameters used are slightly different from the scanner software, and did not account for geographical variations within the Field Of View.
  • The difference in Haralick texture features could be due to a difference in the underlying tumour heterogeneity.
  • random intensity variations were coupled with the definition of up to three regions of different mean FDG uptake.
  • Clinical tumours are likely to bear more complex heterogeneity patterns, but this is not known at present, and there is little data available for the definition of realistic tumour uptake patterns.
  • Parameters describing the tumour heterogeneity were included in the decision trees, in particular for the clustering-based methods KM, GCM and FCM (cf. Table 4), which confirms the hypothesis of the present inventors that the accuracy of these methods is strongly correlated to the tumour uptake distribution pattern, as also shown by the results presented in section III.A.
  • the parameter NR, which describes the number of homogeneous regions identified within the tumour after resampling to 64 intensity values, was included in the decision trees for GCM2, KM3 and FCM2, with a high importance level.
  • When trained on the second dataset, the final method also showed excellent accuracy across a wide range of complex phantom data and achieved a prediction of the best (Pbest) or near-best (P10) segmentation in a very large number of cases, as shown in Table 7.
  • the data indicate that ATLAAS is a robust method that is capable of using a good segmentation algorithm when other methods fail (cf. Fig. 15d).
  • the Wilcoxon signed-rank test showed that the distribution of DSC values obtained for ATLAAS was not significantly different from those obtained with AT, RG, KM2 and GCM3. This is because these were the PET-AS methods that ATLAAS predicted and applied in many cases.
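The statistical comparison reported here is a paired test on per-case DSC values; a minimal sketch, assuming matched arrays of scores for ATLAAS and a single PET-AS method, is:

```python
# Wilcoxon signed-rank test on paired per-case DSC values.
from scipy.stats import wilcoxon

def compare_dsc(dsc_atlaas, dsc_single, alpha=0.05):
    statistic, p_value = wilcoxon(dsc_atlaas, dsc_single)
    # Significant difference between the two DSC distributions if p < alpha.
    return p_value, p_value < alpha
```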
  • the contours generated by ATLAAS are not expected to be statistically different from the contours obtained with the individual PET-AS methods, since ATLAAS is coded to use these algorithms for image segmentation.
  • the added value of ATLAAS is the capability of predicting the best or near-best segmentation methods in a well-defined and reproducible way, thus providing optimal and robust image segmentation.
  • the training dataset was generated using PETSTEP, a fast and flexible PET simulation tool.
  • PETSTEP has been previously calibrated and validated for the simulation of complex H&N uptake [14].
  • the current version of PETSTEP does not include axial filtering and Time-of-Flight correction, which was indeed used in all scans acquired in the inventors' centre. Since the quality of the ATLAAS predictive model depends largely on the quality of the training dataset, it is expected that any improvement in the quality of the images simulated with PETSTEP will result in an improved performance of ATLAAS.
  • a model for automatic selection and application of a range of PET-AS methods, which may according to embodiments provide excellent segmentation accuracy.
  • the present inventors have validated this model on a large dataset of printed and fillable phantom images, and showed that this model is an improvement on the use of a single segmentation approach.
  • a full framework for the automatic training and building of such an optimised segmentation method, which can be applied to data from any centre, positron emission imaging system and selection of automatic segmentation algorithms.
  • Figs. 21 to 30 give details of an embodiment of the present invention.
  • Fig. 21 describes PET-based Optimisation of Target Volume Delineation for Head and Neck Radiotherapy, and in particular the findings:
  • PET-AS PET Auto-Segmentation
  • an automatic decision tree learning algorithm for advanced segmentation is proposed.
  • Fig. 22 describes that ATLAAS is trained to select and apply the PET-AS method for which the highest accuracy is predicted.
  • training data is used to build a decision tree (set of rules) for each method to predict the DSC according to the value of image parameters.
  • image parameters are calculated, and the DSC is predicted for each method using the decision trees. The method with the highest predicted DSC is applied.
  • Fig. 23 describes the 25 fully automatic PET-AS methods, implemented with Matlab and CERR, that were included in the present embodiment, with reference to the table shown in the Figure.
  • Clustering methods KM, FCM and GCM were applied to the detection of 2-8 clusters in the image in the present embodiment, comprising KM2, KM3, ..., KM8, FCM2, FCM3, ..., FCM8 and GCM2, GCM3, ..., GCM8.
  • the accuracy is evaluated using the Dice Similarity Coefficient (DSC) according to the equation shown in the Figure.
  • PETSTEP novel fast PET simulator
  • Fig. 25 describes image parameters (estimated using the KM2 contour) employed in the present embodiment:
  • Fig. 26 describes evaluation data according to the present embodiment:
  • Fig. 27 shows evaluation results of the present embodiment. The data was generated within the range of TBRpeak and volumes clinically observed at the inventors' centre. All parameters except Sv were used in at least one decision tree.
  • Fig. 28 shows results for simulated data for the present embodiment, from which the following conclusions may be drawn:
  • Fig. 29 shows results for phantom data for the present embodiment, from which the following conclusions may be drawn:
  • Fig. 30 indicates the following conclusions drawn in respect of the present embodiment:
  • ATLAAS is an automatic process, which is potentially applicable to any site or data type.
  • ATLAAS is currently being further improved and is being considered for implementation at the inventors' centres.
  • an aspect of the present invention relates to a predictive model for optimal auto-segmentation of PET images.
  • PET-AS methods can provide reliable and reproducible segmentations of the GTV for radiotherapy planning.
  • a large variety of PET-AS methods have been validated on different datasets, making it difficult to recommend a single algorithm.
  • the present application presents a predictive model for optimal auto-segmentation of PET images based on machine learning technology.
  • the "ATLAAS" package Automatic decision Tree Learning Algorithm for Advanced Segmentation
  • ATLAAS predicts its accuracy in segmenting a given target lesion. This is based on the analysis of several image parameters describing the lesion. The prediction is done using a set of rules (decision tree) derived by training ATLAAS on a large number of synthetic images representing realistic data generated using a fast PET simulator. The accuracy of the PET-AS is quantified with the Dice Similarity Coefficient (DSC), using the known ground truth extracted from the synthetic lesion. A total of 25 PET-AS algorithms are currently implemented in ATLAAS. These include adaptive thresholding, region-growing, gradient-based and clustering methods. ATLAAS was optimised for H&N and was built using more than 1000 images.
  • DSC Dice Similarity Coefficient
  • the validation of ATLAAS was carried out on 115 PET scans acquired using fillable spherical and non-spherical inserts and using printed sub-resolution sandwich phantoms. Both homogeneous and heterogeneous uptakes were modelled. The segmentation of 10 H&N patients with ATLAAS was also compared to manual delineation carried out on PET/CT data.
  • ATLAAS can be a useful tool for optimal segmentation of PET images used for RT planning and can be further optimised to include different anatomical regions and tracers.
  • the PET-AS was performed using a novel segmentation method developed at the inventors' centre, named Automatic decision Tree-based Learning Algorithm for Advanced Segmentation (ATLAAS).
  • ATLAAS predictive segmentation model is designed to select the most accurate PET-AS method for the optimal segmentation of a given PET image. This is achieved using a decision tree (DT) supervised machine learning method optimised using a training dataset, for which the segmentation outcome is known, to achieve best performance for cases in which the outcome is not known.
  • DT decision tree
  • ATLAAS is described and validated elsewhere for 6 classes of advanced PET-AS methods. In this work the feasibility of using the ATLAAS approach in clinically relevant cases is investigated by implementing two different PET-AS algorithms in the decision tree, as shown in Figure 31.
  • TBRpeak: defined as the ratio between the tumour peak intensity value (mean value in a 1 cm3 sphere centred on the maximum intensity voxel) and the background intensity (mean intensity in a 1 cm thick extension of a volume thresholded at 50% of the peak intensity value); a Matlab sketch of this computation is given at the end of this list.
  • the ATLAAS predictive segmentation model was implemented for this work in the Computational Environment for Radiotherapy Research (CERR)[13].
  • CERR Computational Environment for Radiotherapy Research
  • the segmentation accuracy was evaluated by quantifying the overlap between the set of voxels of the segmentation result and of the true contour using the Dice Similarity Coefficient [5].
  • the scans were acquired 90 minutes after FDG administration in treatment position with RT immobilisation shell.
  • a registered PET and Contrast Enhanced CT (CECT) were acquired using 6-8 bed positions (top of the head to the sternum) of 3 min each.
  • the images were reconstructed to 512 x 512 voxels for CT and 256 x 256 voxels for PET, using the reconstruction algorithm Vue Point FX (24 subsets, 2 iterations, 6.4 mm cutoff) including CT-based attenuation-, scatter- and Time-Of-Flight corrections.
  • the CT scans were resampled at 3 mm interval slices for RT planning.
  • Six weeks on average separated the PET/CT planning scan and the start of RT.
  • the fit of the immobilisation shells was adjusted after induction chemotherapy and the patient was re-outlined and re-planned if necessary using the original CT/MRI/PET scan. Reporting was done by PET specialist radiologists after acquisition of the planning scans.
  • GTVs were manually outlined by three trained consultant clinical radiation oncologists on the registered PET/CT, using the software Velocity AI (Varian Medical Systems, Palo Alto, USA). The resulting GTVp PET/CT were compared with ATLAAS contours in terms of their volume and geometrical overlap, using the DSC index.
  • the use of the additional information brought by the ATLAAS contours was evaluated by comparing the different contours (GTVp CT/MRI, GTVp ATLAAS and GTVp final) for each patient, in terms of volume and geometrical overlap, using the DSC index. In addition, the clinicians were asked to report on any changes they made to the GTVp final as a result of including the ATLAAS contour in the delineation process.
  • Lymph nodes, which are well defined on CT/MRI images, were not outlined using ATLAAS. However, the PET image was used to identify involved lymph nodes independently from the CT/MRI, to assess if the use of PET contributed any additional information.
  • the patient cohort included 17 men and 3 women with a median age of 63 years. Ten patients had tonsillar tumours, 8 base of tongue tumours and 2 soft palate tumours. Two patients needed re-planning after induction chemotherapy prior to the start of radiotherapy due to weight loss.
  • A comparison between manual and ATLAAS contours of the GTVp in terms of volume and DSC index is presented in Figure 33. It can be noted that the ATLAAS volumes were smaller than the corresponding CT/MRI volumes in 7 out of 10 cases. Furthermore, volumes outlined on PET were within 10% of those outlined on the CT/MRI scans in 4 out of 10 cases. The spatial conformity of ATLAAS and manual CT/MRI outlines was 0.70 DSC on average. ATLAAS and final GTV volumes were within 30% of each other in 6 out of the 10 cases. GTV final volumes were larger than GTV ATLAAS in all cases, indicating that the consultant clinical radiation oncologists combined the information provided by the different imaging modalities.
  • Figure 32 shows how the ATLAAS generated contours were used in deriving the final volume for seven different clinical cases, by highlighting differences between the overlaid GTV CT/MRI, GTVp ATLAAS and GTVp final contours on the CT/PET scan.
  • Figure 34 reports the details of the changes to the final volume based on the ATLAAS outlines. The table also outlines the differences between ATLAAS and CT/MRI contours which have not been taken into account in the final GTV. For instance, the data in the top row of Figure 34 shows that 100% of the ATLAAS volume was included in the final GTV in 4 cases, and more than 83% of the ATLAAS volume was included in the final GTV for all patients.
  • the second row reports the proportion of the CT/MRI volume modified on the basis of the information provided by the ATLAAS generated volume. This value ranged from 6.5% to 33%.
  • the CT/MRI GTV was grown locally based on the information provided by the ATLAAS contour (cf. upper part of Figure 32) for all clinical cases, with up to 10 mL added to make the final volume (cf. row 3 in Figure 34).
  • Visual examination and reporting by the clinicians showed that this was done when additional disease extension was detected by the ATLAAS algorithm, and subsequently confirmed by clinical findings.
  • Lymph nodes: a total of 58 involved lymph nodes were detected for the 20 oropharyngeal cancer patients scanned. In total, 3 lymph nodes detected on CT/MRI were PET-negative. Additional malignant lymph nodes (12 in total) not seen on CT/MRI were detected for 6 patients when PET was included in the outlining process, one of which was a contralateral lymph node.
  • ATLAAS predictive segmentation method for fully automatic PET segmentation of H&N cancer patients undergoing radical chemoradiotherapy.
  • ATLAAS was prospectively used in combination with manual CT/MRI data to derive the final GTV.
  • ATLAAS-generated contours were smaller than the CT/MRI contours in most cases. Furthermore, the ATLAAS contours provided additional information to anatomical contours manually drawn on CT and MRI. In particular, in this study it was found that additional information from ATLAAS included (a) disease locations previously not detected on CT, including largely different extension in the superior-inferior direction and extension across the midline (e.g. Figure 32c), and (b) disease extension boundaries differing from anatomical data. ATLAAS auto-segmentation was used in all patients, which shows the confidence of the clinicians in using the method for RT planning at the inventors' centre. The clinicians' judgment and expertise and the additional clinical data available (endoscopy or clinical examination results) remained paramount in the process. Nevertheless, ATLAAS was very useful (a) in confirming the outlining when close to the CT/MRI contour, and (b) as a delineation guide when in disagreement with CT/MRI data, due for instance to different patient positioning and/or poor image registration.
  • the inventors have thoroughly investigated the impact of the ATLAAS segmentation on the generation of the final GTV volume, which was done manually by consultant clinical radiation oncologists. It was found that although ATLAAS led to shrinking the GTV CT/MRI locally for 7 patients, this led to a globally smaller final volume for only one patient. This confirms the suggestion that clinicians may not be prepared as yet to reduce the GTV volume based on PET. Indeed, PET-AS contours accurately identify the metabolically active part of the tumour rather than the whole tumour burden, which can be highly heterogeneous.
  • the ATLAAS predictive segmentation model could be useful for determining, in a consistent and operator independent way, the highly metabolically active areas of the tumour requiring a radiation boost. Correlation with additional information and clinical input would still be required in finalising the volumes for dose escalation.
  • ATLAAS was not used in this study for the delineation of nodal volumes, since involved lymph nodes were well defined on fused CT/MRI data. However, the results of this study show that the use of PET imaging can help identify additional involved lymph nodes for 30% of the patients.
  • the ATLAAS predictive segmentation model based on the decision tree machine learning method is a novel, fast and reliable tool for tumour delineation in radiotherapy treatment planning of Head and Neck cancer.
  • ATLAAS' fully automated approach can be very useful in delivering operator independent PET segmentation in the radiotherapy planning process.
  • Embodiments of the present invention may however equally be applied to other anatomical sites, e.g. pelvis, lung, prostate, brain, preferably by building training dataset(s) for those areas in the manner described above, to provide a training dataset specific to those anatomical regions.
  • PET Positron Emission Tomography
  • this may have particular utility as the resolution of PET images may be poor e.g. of the order of a few millimetres.
  • the present invention is by no means limited to the segmentation of PET images, and may according to further embodiments be applied to the segmentation of images produced by other means e.g. to the Single Photon Emission Computed Tomography (SPECT) imaging modality.
  • SPECT Single Photon Emission Computed Tomography
  • the term "tumour" is to be understood as including both benign and malignant tumours, as well as any lesion requiring segmentation.
  • the term "subject” or “patient” refers to an animal, preferably a mammal and in particular a human.
  • the subject or patient is suffering from a disease, for example, cancer.
  • references to a H&N tumour are to be understood as including any tumours belonging to the class of H&N tumours.
  • the term "about” means plus or minus 20%, more preferably plus or minus 10%, even more preferably plus or minus 5%, most preferably plus or minus 2%.
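As an illustration of the TBRpeak definition referenced in the list above, the following Matlab sketch computes the peak and background intensities for a given tumour mask. It is a minimal sketch under stated assumptions, not the ATLAAS implementation: the function name is illustrative, voxels are assumed isotropic with size voxmm (in mm), and bwdist from the Image Processing Toolbox is assumed to be available.

function tbr = tbrPeak(img, mask, voxmm)
% TBRpeak sketch: ratio of the tumour peak intensity (mean in a 1 cm3
% sphere centred on the maximum-intensity voxel) to the background
% intensity (mean in a 1 cm thick extension of the volume thresholded
% at 50% of the peak value).
[~, iMax] = max(img(:) .* mask(:));          % hottest voxel inside the mask
[mx, my, mz] = ind2sub(size(img), iMax);
[X, Y, Z] = ndgrid(1:size(img,1), 1:size(img,2), 1:size(img,3));
Dmm = sqrt((X-mx).^2 + (Y-my).^2 + (Z-mz).^2) * voxmm;
rPeak = (3 * 1000 / (4*pi))^(1/3);           % radius of a 1 cm3 sphere, in mm
peakVal = mean(img(Dmm <= rPeak));           % peak intensity
core = mask & (img >= 0.5 * peakVal);        % volume thresholded at 50% of peak
shellmm = bwdist(core) * voxmm;              % distance from the core, in mm
bg = mean(img(shellmm > 0 & shellmm <= 10)); % 1 cm thick background extension
tbr = peakVal / bg;
end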

Abstract

The method comprises a calculation step comprising calculating a value for one or more parameters associated with the image; a prediction step comprising making a prediction of the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting the image, wherein the prediction is based upon the one or more values calculated in the calculation step and a set of rules for the selection of the best automatic segmentation method to apply to a given tumour image; and a segmentation step comprising segmenting the image using the automatic segmentation method predicted as the most accurate in the prediction step.

Description

A method for optimising segmentation of a tumour on an image, and apparatus therefor
Field of the invention
An aspect of the present invention relates to apparatus and method for optimal segmentation of PET images. An aspect of the present invention relates to the development of a prediction model for optimised, preferably automatic, segmentation of H&N tumours on FDG-PET images.
II.A. Hypotheses and previous work
II.B. Generation of training datasets
II.C. Building of the decision tree method
II.D. Evaluation of the method with phantom data
III.A. Hypotheses and previous work
III.B. Generation of a training dataset
III.C. Building of decision tree
III.D. Validation of the decision tree
VI(iii) - Workflow and Analysis
VI(v) - Growing of the GTV based on ATLAAS
VI(vi) - Shrinking of the GTV based on ATLAAS
VI(vii) - ATLAAS information discarded
I. Background, introduction and hypotheses
The past decade has seen an increasing interest in the use of positron emission tomography (PET) using 18F-fluorodeoxyglucose (FDG) in clinical oncology. This technique, allowing the visualization of the metabolic activity of tumour tissue, shows great promise in applications such as radiotherapy treatment planning and the monitoring of response to therapy. However, the use of PET-based information for these purposes relies on the availability of an accurate and reliable segmentation technique for defining the tumour, which is made challenging by the low resolution of PET images. In the Head and Neck (H&N), the proximity of physiological high uptake areas [1] and the intratumour heterogeneity, due to large necrotic areas and intense focal uptakes [2a], [2b], make this task particularly difficult. The low resolution of PET coupled with other metabolically active structures makes delineation challenging. In particular in the H&N, organs such as the pharyngeal muscles, spinal cord and salivary glands, which should be spared to minimise morbidity and improve quality of life, can generate additional background FDG uptake.
Although clinician expertise is crucial in the delineation of the Gross Tumour Volume (GTV), manual delineation processes are extremely time consuming. In addition, there is a large amount of evidence in the literature suggesting that manual contours are highly dependent on the operator as well as on the delineation instance. Following a number of publications recommending the use of automatic delineation, some centres have implemented PET-based automatic segmentation (PET-AS) tools within their clinical practice. These mainly consist of low-level segmentation, such as threshold-based methods, which have been shown to lack accuracy and robustness to a number of image parameters [4]. The number of published and validated advanced PET-AS methods is currently growing, as a number of centres are aiming at implementing such methods into their planning protocol. However, the focus of the literature remains on the individual experience of different centres, with methods tested on their own data, often covering only a small range of variation in image parameters. The wide range of variation in tumour characteristics observed for clinical H&N cases and the large number of segmentation methods published independently make it difficult to recommend a single delineation method. As a consequence, there is a need for an exhaustive comparison of different segmentation approaches on a wide range of data for the development or recommendation of the optimal PET-AS method. In particular, it is important to challenge the assumption that a single segmentation algorithm exists that yields excellent delineation accuracy for every type of tumour.
It is, therefore, an object of the present invention to seek to alleviate the above identified problems.
SUMMARY OF INVENTION
According to a first aspect of the present invention, there is provided a method for optimising segmentation of a tumour on an image, the method comprising:
a calculation step comprising calculating a value for one or more parameters associated with the image; a prediction step comprising making a prediction of the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting the image, wherein the prediction is based upon the one or more values calculated in the calculation step and a set of rules for the selection of the best automatic segmentation method to apply to a given tumour image; and
a segmentation step comprising segmenting the image using the automatic segmentation method predicted as the most accurate in the prediction step.
The present invention provides a method to predict and use the best PET-AS algorithm, for example in FDG PET imaging. The method of the present invention was shown to have excellent accuracy across a wide range of complex phantom data and achieved a prediction of the best (Pbest) or near-best (P10) segmentation in a very large number of cases. Validation data indicates that the present invention is a robust method that is capable of using an accurate segmentation algorithm when other methods fail. It has been found that the method of the present invention has the capability to predict the best or near-best segmentation methods in a well-defined and reproducible way, thus providing optimal and robust image segmentation. The present invention provides a novel, fast and reliable tool for tumour delineation; for example, in radiotherapy treatment planning of H&N cancers. The present method provides a fully automated approach that is useful in delivering operator independent PET segmentation in the radiotherapy planning process.
PET imaging plays a valuable role in radiotherapy treatment planning and the present invention, by providing more accurate and reproducible delineation of the GTV, represents a significant improvement. An accurate PET-based GTV is usually smaller than, for example, a Computed Tomography (CT) based volume, and adjacent normal tissue, particularly parotids, is spared, leading to reduced long-term morbidity and xerostomia, and improved quality of life.
Preferably, the one or more parameters relate to at least one of the size, homogeneity, geometry, intensity or texture of the tumour in the image.
Preferably, the one or more parameters comprise one or more of:
a) tumour volume;
b) tumour-to-background ratio using peak tumour intensity;
c) Coefficient of Variation (COV), defined as:
COV = σ/μ
where σ and μ are the standard deviation and mean of the intensity values of the tumour on the image, respectively;
d) Intensity Variability (Iv), wherein the image of the tumour is divided into voxels, and wherein Iv is defined as:
Iv = Σi (Σj n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
e) Size-zone Variability (Sv), wherein the image of the tumour is divided into voxels, and wherein Sv is defined as:
Sv = Σj (Σi n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
f) NI: the number of intensity levels within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
g) NR: the number of homogeneous regions within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities; and
h) NRS: the number of homogeneous region sizes within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities.
Preferably, the one or more parameters may additionally or alternatively comprise:
i) HDS: Hausdorff Distance to Sphere, quantifying the distance between the segmented tumour surface A and the surface B of a sphere of same volume and same centre of mass using the Modified Hausdorff Distance:
HDS = max( (1/NA) Σa∈A d(a, B), (1/NB) Σb∈B d(b, A) )
with A and NA, B and NB the set of points and number of points within the segmented surface and sphere surface respectively, and d (a, B), the minimal Euclidian distance between point a and the points in B.
For example, in a preferred embodiment, the one or more parameters comprise one or more of:
a) tumour volume;
b) tumour-to-background ratio using peak tumour intensity;
c) Coefficient of Variation (COV), defined as:
COV = σ/μ
where σ and μ are the standard deviation and mean of the intensity values of the tumour on the image, respectively;
d) Intensity Variability (Iv), wherein the image of the tumour is divided into voxels, and wherein Iv is defined as:
Iv = Σi (Σj n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
e) Size-zone Variability (Sv), wherein the image of the tumour is divided into voxels, and wherein Sv is defined as:
Sv = Σj (Σi n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
f) NI: the number of intensity levels within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
g) NR: the number of homogeneous regions within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
h) NRS: the number of homogeneous region sizes within the image of the tumour, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities; and
i) HDS: Hausdorff Distance to Sphere, quantifying the distance between the segmented tumour surface A and the surface B of a sphere of the same volume and same centre of mass, using the Modified Hausdorff Distance:
HDS = max( (1/NA) Σa∈A d(a, B), (1/NB) Σb∈B d(b, A) )
with A and NA, B and NB the set of points and number of points within the segmented surface and sphere surface respectively, and d(a, B) the minimal Euclidian distance between point a and the points in B.
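To make the regional texture parameters d) to h) above concrete, the following Matlab sketch computes Iv, Sv, NI, NR and NRS for a tumour mask, resampling the tumour to 64 discrete intensities and taking 26-connected regions of equal intensity as the homogeneous zones. This is a minimal sketch of the definitions above, not the ATLAAS implementation; the function name is illustrative and bwconncomp from the Image Processing Toolbox is assumed.

function [Iv, Sv, NI, NR, NRS] = sizeZoneParams(img, mask)
% Regional texture parameters from homogeneous zones (connected regions
% of equal intensity) after resampling to 64 discrete intensity levels.
vals = img(mask);
q = min(64, 1 + floor(64 * (vals - min(vals)) / (max(vals) - min(vals) + eps)));
qImg = zeros(size(img));
qImg(mask) = q;                              % resampled tumour image
lvl = []; sz = [];
for i = 1:64
    cc = bwconncomp(qImg == i, 26);          % 26-connected 3D zones of level i
    s = cellfun(@numel, cc.PixelIdxList);
    lvl = [lvl, repmat(i, 1, numel(s))];     % intensity level of each zone
    sz  = [sz, s];                           % size of each zone, in voxels
end
n = accumarray([lvl', sz'], sz');            % n(i,j): voxels at level i, size j
Iv  = sum(sum(n, 2).^2);                     % Intensity Variability
Sv  = sum(sum(n, 1).^2);                     % Size-zone Variability
NI  = numel(unique(lvl));                    % number of intensity levels
NR  = numel(sz);                             % number of homogeneous regions
NRS = numel(unique(sz));                     % number of homogeneous region sizes
end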
Preferably, the set of rules is based upon the conformity between delineation contours produced by each of the plurality of automatic segmentation methods for a plurality of training images and reference delineation contours for the plurality of training images. Preferably, the conformity is represented by a Dice Similarity Coefficient (DSC), a Jaccard Conformity Index, a Van Riet Conformity index or a Mean Distance to Conformity.
The Dice Similarity Coefficient (DSC) was used in the method of the present invention to quantify the segmentation accuracy, as it provides information on the spatial conformity of the ground truth and segmented contours, combining information on the sensitivity and positive predictive value of the segmentation within a unique accuracy score. However, it is understood that the method of the present invention is independent of the output metric used and could theoretically be built with any other accuracy score.
Preferably, the set of rules includes information on the effect of different values for the one or more parameters on the conformity of delineation contours produced by each of the plurality of segmentation methods with reference delineation contours.
Preferably, the set of rules is codified as a Decision Tree, using the Classification and Regression Trees statistical method, for each of the plurality of automatic segmentation methods.
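As a sketch of this codification step, one regression tree can be fitted per automatic segmentation method, predicting the conformity score (here the DSC) from the image parameters of the training images. The variable names X (an N-by-P matrix of parameter values) and dscPerMethod (an N-by-M matrix of DSC scores, one column per method) are illustrative, and fitrtree from the Statistics and Machine Learning Toolbox is used here as a stand-in CART implementation:

% Build one regression tree per automatic segmentation method.
paramNames = {'Vol','TBRpeak','COV','Iv','Sv','NI','NR','NRS'};
trees = cell(1, size(dscPerMethod, 2));
for m = 1:size(dscPerMethod, 2)
    trees{m} = fitrtree(X, dscPerMethod(:, m), 'PredictorNames', paramNames);
end

At segmentation time, each tree is evaluated on the parameters of the new image and the method whose tree predicts the highest DSC is selected.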
Preferably, the calculation step comprises using a single automatic segmentation method to calculate the one or more parameter values from the image.
Preferably, the single automatic segmentation method comprises one of the plurality of automatic segmentation methods.
Preferably, the single automatic segmentation method comprises the automatic segmentation method having the lowest standard deviation in its conformity for the plurality of training images.
Preferably, the plurality of training images comprise simulated tumour images having values for at least one parameter, which are within a clinically-observed range. Preferably, the plurality of training images comprise simulated tumour images having values for all parameters, which are within a clinically-observed range.
Preferably, the calculation, prediction and segmentation steps are performed as an automatic process.
Preferably, the image is produced by Positron Emission Tomography or Single Photon Emission Computed Tomography.
Preferably, the tumour belongs to the class of H&N tumours. Preferably, the tumour is in a subject's head and/or neck area. Preferably, the tumour is a brain tumour.
Preferably, the tumour is a tumour of the respiratory tract, preferably of the lung.
According to another aspect of the present invention, there is provided a method of building a prediction model for predicting the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting a tumour on an image, wherein the segmentation comprises generating a contour for delineating the tumour on the image, the method comprising:
calculating a value for one or more parameters within the image of the tumour; using the plurality of automatic segmentation methods to segment a plurality of training images of training target portions; and
evaluating the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and expected delineation contours.
Preferably, the conformity is represented by a Dice Similarity Coefficient, a Jaccard Conformity Index, a Van Riet Conformity Index, or a Mean Distance to Conformity.
Preferably, the methods further comprise making a determination of the effect of changes in one or more parameters associated with the training target portions on the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and expected delineation contours.
Preferably, the methods further comprise building a Decision Tree for each of the plurality of automatic segmentation methods based upon the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and the reference delineation/expected delineation contours.
According to a further aspect of the present invention, there is provided apparatus arranged to perform a method as described herein.
According to another aspect of the present invention, there is provided a computer- readable medium having computer-executable instructions adapted to cause a computer system to perform a method as described herein.
According to a further aspect of the present invention, there is provided a computer programmed to perform a method as described herein.
Another aspect of the present invention relates to a method of optimising segmentation of an image, the image including a target portion, the method comprising:
an extraction step comprising extracting, from the image, values for one or more parameters associated with the target portion;
a prediction step comprising making a prediction of the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting the image, wherein said prediction is based upon information relating to the accuracy of each of the plurality of automatic segmentation methods in segmenting training images of training target portions corresponding to a plurality of different values for the one or more parameters; and
a segmentation step comprising segmenting the image using the automatic segmentation method predicted as the most accurate in the prediction step.
Preferably, the target portion comprises an image of a lesion and the one or more parameters relate to at least one of the size, homogeneity, geometry or intensity of the lesion in the image.
Preferably, the one or more parameters comprise one or more of:
a) lesion volume;
b) SUVpeak, the peak (mean value within a 1 cm3 sphere centred on the maximum) SUV, wherein SUV is defined as:
SUV = (Activity × Patient weight) / (Corrected injected dose);
c) a ratio between SUVpeak and the background SUV value, wherein the background comprises at least a portion of the image which does not form part of the target portion;
d) Coefficient of Variation (COV), defined as:
COV = σ/μ
where σ and μ are the standard deviation and mean of the intensity values of the lesion within the image, respectively;
e) Intensity Variability (Iv), wherein the image of the lesion is divided into voxels, and wherein Iv is defined as:
Iv = Σi (Σj n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
f) Size-zone Variability (Sv), wherein the image of the lesion is divided into voxels, and wherein Sv is defined as:
Sv = Σj (Σi n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
g) NI: the number of intensity levels within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
h) NR: the number of homogeneous regions within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities; and
i) NRS: the number of homogeneous region sizes within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities.
Preferably, the one or more parameters may additionally or alternatively comprise:
j) non-sphericity.
It is understood that a measure of the non-sphericity is the Hausdorff Distance to Sphere (HDS), as defined previously; a Matlab sketch of its computation is given after the list below.
For example, in a preferred embodiment, the one or more parameters comprise one or more of:
a) lesion volume;
b) SUVpeak, the peak (mean value within a 1 cm3 sphere centred on the maximum) SUV, wherein SUV is defined as:
SUV = (Activity × Patient weight) / (Corrected injected dose);
c) a ratio between SUVpeak and the background SUV value, wherein the background comprises at least a portion of the image which does not form part of the target portion;
d) Coefficient of Variation (COV), defined as:
COV = σ/μ
where σ and μ are the standard deviation and mean of the intensity values of the lesion within the image, respectively;
e) Intensity Variability (Iv), wherein the image of the lesion is divided into voxels, and wherein Iv is defined as:
Iv = Σi (Σj n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
f) Size-zone Variability (Sv), wherein the image of the lesion is divided into voxels, and wherein Sv is defined as:
Sv = Σj (Σi n(i,j))²
with n(i,j) the number of voxels in regions of intensity level i and size j, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
g) NI: the number of intensity levels within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
h) NR: the number of homogeneous regions within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities;
i) NRS: the number of homogeneous region sizes within the image of the lesion, preferably after resampling the image to a plurality of discrete intensities, and most preferably after resampling the image to 64 discrete intensities; and
j) non-sphericity.
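As noted above, a measure of non-sphericity is the Hausdorff Distance to Sphere; the following Matlab sketch computes it from the Modified Hausdorff Distance given earlier. The function name, the surface sampling density and the use of the surface centroid as the centre of mass are illustrative assumptions, and pdist2 from the Statistics and Machine Learning Toolbox is assumed.

function hds = hausdorffDistToSphere(surfPts, vol)
% HDS sketch: Modified Hausdorff Distance between segmented surface
% points (N-by-3 array of coordinates, in mm) and the surface of a
% sphere of equal volume centred on the same centre of mass.
c = mean(surfPts, 1);                        % centre of mass (approximation)
r = (3 * vol / (4 * pi))^(1/3);              % radius of equal-volume sphere
[az, el] = meshgrid(linspace(0, 2*pi, 72), linspace(-pi/2, pi/2, 36));
[x, y, z] = sph2cart(az(:), el(:), r);       % sampled sphere surface
spherePts = [x, y, z] + c;                   % shift sphere to centre of mass
D = pdist2(surfPts, spherePts);              % pairwise Euclidean distances
dAB = mean(min(D, [], 2));                   % mean distance, surface to sphere
dBA = mean(min(D, [], 1));                   % mean distance, sphere to surface
hds = max(dAB, dBA);                         % Modified Hausdorff Distance
end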
Preferably, the information relates to the conformity between delineation contours produced by each of the plurality of automatic segmentation methods for the training images and the reference delineation/expected delineation contours for the training images. Preferably, the conformity is represented by a Dice Similarity Coefficient (DSC), a Jaccard Conformity Index, a Van Riet Conformity index or a Mean Distance to Conformity.
Preferably, the information further comprises information on the effect of different values for the one or more parameters on the conformity of delineation contours produced by each of the plurality of segmentation methods with reference delineation contours.
Preferably, the information is codified as a Decision Tree for each of the plurality of automatic segmentation methods.
Preferably, the extraction step comprises using a single automatic segmentation method to extract the one or more parameter values from the image.
Preferably the single automatic segmentation method comprises one of the plurality of automatic segmentation methods.
Preferably, the single automatic segmentation method comprises the automatic segmentation method having the lowest standard deviation in its conformity for the training images.
Preferably, the training images comprise simulated lesion images having values for at least one parameter which are within a clinically-observed range.
Preferably, the training images comprise simulated lesion images having values for all parameters which are within a clinically-observed range.
Preferably, the extraction, prediction and segmentation steps are performed as an automated process.
Preferably, the image is produced by Positron Emission Tomography or Single Photon Emission Computed Tomography. Preferably, the target portion comprises an image of a tumour in a subject's head and/or neck area.
According to another aspect of the present invention, there is provided a method of building a prediction model for predicting the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting an image, wherein the segmentation comprises generating a contour for delineating a target portion within the image, the method comprising:
using the plurality of automatic segmentation methods to segment a plurality of training images of training target portions; and
evaluating the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and the reference delineation/expected delineation contours.
An aspect of the present invention relates to the development of a prediction model for optimised segmentation of H&N tumours on FDG-PET images. The invention is however applicable to any class of tumour (e.g. H&N, Lung, Brain), in particular for which the image comes from a scan based on the detection of radiation emitted by radionuclides injected in a body.
An aspect of the present invention provides a method of optimising the accuracy of segmentation of tumours (which may according to a preferable feature be H&N tumors) on images (which may according to a preferable feature be FDG-PET images) by considering/applying only the predicted best performing method.
Within this specification embodiments have been described in a way which enables a clear and concise specification to be written, but it is intended and will be appreciated that embodiments may be variously combined or separated without parting from the invention. For example, it will be appreciated that all preferred features described herein are equally applicable to all aspects of the invention described herein, and vice versa.
DETAILED DESCRIPTION
A small number of publications in the field of medical image segmentation have suggested the use of machine learning techniques, for continuously improving algorithms, which "learn" from the data they are used on. Machine learning allows a given algorithm to be built and optimised using existing data for which the outcome is known, in order for it to achieve optimal performance for unknown cases. Machine learning techniques include methods such as Artificial Neural Networks, Support Vector Machine, and Decision Tree learning, which have been used in the literature for the segmentation of medical imaging by classifying voxels into different categories. The main advantages of such techniques are their high predictive power and their ability to adapt to any given dataset. In addition, training methods can be continually improved or updated by modifying the training dataset, which implies that a method developed for data acquired in one centre could be easily transferred to a different centre, provided training data is available. The present inventors have appreciated that decision tree methods are particularly useful, as they allow visualising the final algorithm and the set of rules learned during the training process, which then provides a way of implementing these into an optimised decision process. The present inventors have appreciated that decision tree learning could therefore be a powerful tool in the exploration of a wide range of data cases, representing clinical tumours, for the determination of optimal segmentation.
An aspect of the present invention provides a PET-AS method, optimised for the delineation of H&N cancer tumours. In previous studies, the present inventors have investigated and compared the accuracy of a number of segmentation approaches, selected as the most promising in the current literature, for a wide range of FDG uptake distributions. Results showed that different approaches performed better or worse for different types of image. An aspect of the present invention aims at taking forward these observations in building a single segmentation method using machine learning techniques, to achieve optimal segmentation accuracy in H&N PET data.
In order that the present invention may be more readily understood, embodiments thereof will now be described, by way of example only, with reference to the Tables included in the following, and the accompanying Figures, of which: FIGURE 1 shows a flow chart according to an embodiment of a training aspect of the present invention;
FIGURE 2 shows a flow chart according to an embodiment of a segmentation aspect of the present invention;
FIGURE 3 shows the interrelation between the embodiments of Figs. 1 and 2;
FIGURES 4A to 4F show stages in a segmentation process according to an embodiment of the present invention;
FIGURES 5a and 5b show a comparison between binary segmentation and multiple clustering in the case of a heterogeneous tumour with high SUV peak;
FIGURES 5c, 5d, 5e, 5f and 5g show examples of phantom scans overlaid with reference true contours used for the validation of the method of the present invention, wherein figure 5c shows fillable spheres with thin plastic walls; figures 5d and 5e show non-spherical fillable inserts with thin plastic walls; figure 5f shows a spheroidal lesion on a printed H&N phantom and figure 5g shows an irregular lesion on a printed H&N phantom.
FIGURE 6a shows a printout pattern for a sphere of 58 mm diameter with 4 intensity levels and FIGURE 6b shows the resulting PET image, reference contours and contours from the GCM delineation method applied to 2, 4, 6 and 8 clusters;
FIGURE 7 shows the range of TBRpeak and volume values obtained for simulated cases, compared to recorded clinical primary (P) and lymph node (LN) cases;
FIGURE 8 is a graph showing the accuracy of a model developed according to an embodiment of the present invention, compared to single segmentation approaches, over a validation phantom dataset;
FIGURES 9 to 15a show simplified versions of the decision trees obtained for each one of the methods tested according to an embodiment of the present invention; Figures 15b and 15c show a representation of decision trees built according to one aspect of the method of the present invention for PET-AS methods b) KM2 and c) AT;
Figure 15d is a graph showing the accuracy of a model developed according to an embodiment of the method of the present invention (dashed black line) compared to individual PET-AS methods (solid line) across the validation data set, including fillable phantom and printed H&N phantom data;
Figures 16 to 20 show further simplified versions of the decision trees obtained for each one of the methods tested according to an embodiment of the present invention;
FIGURES 21 to 30 describe an embodiment of the present invention; and
FIGURES 31 to 34 illustrate the findings of a clinical feasibility and impact study, including FIGURE 33 showing GTVp volumes and DSC index for manual and ATLAAS contours and FIGURE 34 showing quantification of the changes to the final volume (growth and shrinkage) based on the ATLAAS outlines, and differences between ATLAAS and CT/MRI not taken into account in the final GTV.
As noted above, image segmentation, for example on PET images, is currently done manually, but this is time consuming and involves a lack of reproducibility. A number of automatic segmentation algorithms are known, but there is no real consensus on which is best.
An embodiment of the present invention works at a higher level, to predict which of a plurality of automatic segmentation algorithms will work best for segmenting a given image of a tumour. According to an embodiment, a training dataset is built by testing a plurality of automatic segmentation algorithms on images of simulated or "phantom" targets having a known contour. As the ground/clinical truth of the target contour is known, the efficacy of a given algorithm for segmenting a particular phantom target may be determined. This has led to the identification of a number of parameters, which may be assessed for a given image of a tumour to determine which automatic segmentation algorithm of the plurality of automatic segmentation algorithms would return the most accurate segmentation of the image of the tumour.
According to an embodiment, parameters relate to, for example, the size, homogeneity and geometry of the tumour. The present inventors have appreciated that, in certain embodiments, heterogeneity parameters appear to be particularly determinative of which automatic segmentation algorithm will return the most accurate segmentation result for a given image of a tumour. According to embodiments, this appears to particularly be the case for H&N tumours.
An embodiment of the present invention works by automatically examining a given image, e.g. a PET image, to determine values for one or more parameters relating to, for example, the size, homogeneity and geometry of a tumour in the image; it then selects the optimum (i.e. most accurate) automatic segmentation algorithm, from a plurality of automatic segmentation algorithms, to use for segmenting that image, based on a training dataset of accuracy results for each of the plurality of automatic segmentation algorithms in segmenting training images, and then performs segmentation of the image using the selected automatic segmentation algorithm. According to this embodiment, the steps of this process are performed as an automated sequence, and all that is really required of a user is to generally indicate, in a given image (e.g. PET image), the area on which the automatic segmentation process should focus.
According to an embodiment, the training dataset(s) could be extended at a later date to refine the system further. According to an embodiment, this is done by adding scans and "ground truth" data that bring new information in the training datasets e.g. by extending the range of certain imaging parameters to meet the image characteristics found in scans from other PET centres, or to focus on other anatomical sites.
A training aspect of the present invention relates to building a predictive model for predicting the automatic segmentation method, from amongst a plurality of automatic segmentation methods, which will return the most accurate segmentation result for a given image of a tumour.
An embodiment according to this training aspect is shown in the flow chart of Fig. 1. According to this embodiment, in a first step 1, a plurality of training data images, e.g. simulated PET images, are produced of targets having known contours, preferably within a targeted range of values observed clinically.
Image parameters are determined for each of these training data images in a second step 2. These parameters may include various characteristics such as at least one of the size, homogeneity, geometry, intensity or texture of the tumour on the image, tumour dimension, tumour position, FDG uptake value, signal to background ratio and intensity distribution characteristics.
In a third step shown generally at 3, for each of the training data images, the "true" i.e. known contour of the target shown in the training image is compared against the contour produced for that target by each of a plurality of automatic segmentation methods.
This comparison is quantified at a fourth step 4 by a conformity metric, which in the present embodiment comprises a Dice Similarity Coefficient reflecting the conformity of contours produced by the plurality of automatic segmentation methods to the corresponding known contours.
The process is repeated for a plurality of training images, and the results are used in a fifth step 5 to build a predictive model for predicting the automatic segmentation method, from amongst the plurality of automatic segmentation methods, which will return the most accurate segmentation result for a given image of a tumour not belonging to the training dataset e.g. a clinical scan image.
In an embodiment of the present invention, these results are codified as Decision Trees for each of the plurality of automatic segmentation methods. A segmentation aspect of the present invention relates to employing the predictive model, built according to the training aspect, in predicting the automatic segmentation method, from amongst a group of automatic segmentation methods, which will return the most accurate segmentation result for a given clinical image of a tumour, and using the predicted most accurate automatic segmentation method to segment the clinical image.
An embodiment according to this segmentation aspect is shown in the flow chart of Fig. 2, in which a clinical image of a tumour is produced in a first step 6.
In a second step 7 of the present embodiment, image parameters associated with that clinical image, and corresponding to the parameters derived for the training data images in the training aspect are extracted. Accordingly, these parameters may include various characteristics such as at least one of the size, homogeneity, geometry, intensity or texture of the tumour on the image, tumour dimension, tumour position, FDG uptake value, signal to background ratio and intensity distribution characteristics.
In a third step 8 of the present embodiment, the parameters extracted from the clinical image are used in conjunction with the predictive model built according to the embodiment of the training aspect described above, to predict the most accurate automatic segmentation method to segment the clinical image.
In a fourth step 9 of the present embodiment, the clinical image is segmented using the predicted most accurate automatic segmentation method.
Fig. 3 shows the interrelation of the embodiments of the training and segmentation aspects described in relation to Figures 1 and 2, in which the predictive model built in the embodiment of the training aspect is employed by the embodiment of the segmentation aspect to segment a given clinical image.
Figures 4a to 4f show various stages in use of an embodiment of the segmentation aspect being applied to a clinical image. First, a user opens the scan image in a viewer, as shown in Fig. 4a.
Next, the user selects to perform segmentation according to the present embodiment of the segmentation aspect, as shown in Fig. 4b.
This results in the appearance of the segmentation interface as shown in Fig. 4c: using this interface, the user highlights the general area of the tumour within the image to be segmented (in the present embodiment, this is done by a user drawing a box around the general area of the tumour).
The area highlighted by the user can be displayed in the scan image in the viewer, as shown in Fig. 4d.
The user may then launch the segmentation process of the present embodiment by issuing a suitable command in the segmentation interface, as shown in Fig. 4e.
Segmentation is then performed using the predicted most accurate automatic segmentation method according to the present embodiment, and the result of the segmentation is displayed as a contour defining the tumour region in the viewer, as shown in Fig. 4f.
The embodiment of Figs. 4a to 4f is implemented in Matlab, and the following information, used during the segmentation in the illustrative example of Figs. 4a to 4f, appears in the command line of Matlab:
Tumour parameters:
Vol = 9.3641
SUVpeak = 54.9772
TBRpeak = 3.7203
COV = 31.7823
Iv = 34976
Sv = 2282
NR = 254
NRS = 6
NI = 43
HDS = 1.9932
Methods included:
segM =
Columns 1 through 6
'AT' 'RG' 'KM2' 'KM3' 'FCM2' 'FCM3'
Columns 7 through 11
'FCM4' 'FCM5' 'GCM2' 'GCM3' 'GCM4'
Columns 12 through 14
'GCM5' 'AC' 'WT'
Predicted DSCs:
DSC =
Columns 1 through 5
0.7160 0.8340 0.8470 0.7250 0.5570
Columns 6 through 10
0.8000 0.7950 0.7930 0.7900 0.7940
Columns 11 through 14
0.8050 0.7010 0.8030 0.5070
Best predicted method: KM2
DSC = 0.847
Running segmentation for method KM2 ...
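The trace above can be reproduced, under the training sketch given earlier, by evaluating each method's decision tree on the extracted parameter vector and dispatching to the winning method. Here trees follows the tree-building sketch in the Summary, the parameter vector reuses the example values above (in the order used at training), and segmentWith is a hypothetical dispatcher standing in for the individual PET-AS implementations:

% Predict the DSC of each PET-AS method and run the best predicted one.
segM = {'AT','RG','KM2','KM3','FCM2','FCM3','FCM4','FCM5', ...
        'GCM2','GCM3','GCM4','GCM5','AC','WT'};
x = [9.3641, 3.7203, 31.7823, 34976, 2282, 43, 254, 6];  % Vol, TBRpeak, COV, Iv, Sv, NI, NR, NRS
predDSC = cellfun(@(t) predict(t, x), trees);            % one predicted DSC per method
[bestDSC, iBest] = max(predDSC);
fprintf('Best predicted method: %s\nDSC = %.3f\n', segM{iBest}, bestDSC);
contour = segmentWith(segM{iBest}, img, roiBox);         % hypothetical dispatcher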
In the following, embodiments of the present invention are further described.
II. Methods
II.A. Hypotheses and previous work
Several segmentation approaches were selected by the present inventors from a literature review as the most promising for GTV delineation on PET. The present inventors implemented one or more segmentation algorithms using each approach and made them fully automatic. The methods have been described in detail in a previous publication [6], and are presented in Table 1. All methods were implemented with Matlab. The clustering methods KM, FCM and GCM are written to separate the voxels of the target image into a number of homogeneous clusters. The methods can be applied to the detection of any desired number of clusters, and automatically select the lowest intensity cluster to be the background, grouping the other clusters into the region corresponding to the segmented tumour (a sketch of this grouping step is given after Table 1).
Table 1. Description of PET-AS approaches implemented and investigated in this study:
• AT: 3D adaptive iterative thresholding, using background subtraction. Example(s): Jentzen et al. [3b], Drever et al. [3c].
• RG: 3D region-growing with automatic seed finder and stopping criterion. Example(s): Day et al. [3d].
• KM: K-means iterative clustering with custom stopping criterion. Example(s): Zaidi et al. [3e].
• FCM: Fuzzy-C-means iterative clustering with custom stopping criterion. Example(s): Belhassen et al. [3f].
• GCM: 3D Gaussian Mixture Models-based FCM with custom stopping criterion. Example(s): Hatt et al. [3g].
• WT: Watershed transform-based segmentation. Example(s): Geets et al. [8a], Tylski et al. [8b].
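The cluster-grouping step described above can be sketched as follows in Matlab, where the region of interest voxels are clustered on intensity, the lowest-mean cluster is taken as background, and the remaining clusters are merged into the tumour region. The function name is illustrative and kmeans from the Statistics and Machine Learning Toolbox is assumed; it is a minimal sketch rather than the inventors' implementation.

function tumourMask = clusterSegment(img, roiMask, k)
% Cluster ROI voxel intensities into k clusters and group all clusters
% except the lowest-intensity one into the segmented tumour.
vals = img(roiMask);
[idx, C] = kmeans(vals, k, 'Replicates', 5); % cluster on intensity
[~, bgCluster] = min(C);                     % lowest mean = background cluster
tumourMask = false(size(img));
tumourMask(roiMask) = (idx ~= bgCluster);    % union of the remaining clusters
end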
Structures consisting of a wide range of intensity values can include several regions of different mean intensities. Binary methods classify voxels into two categories corresponding to the tumour and the background. With such methods, tumour voxels of low intensity will therefore be classified as background, even if their intensity level is higher than the mean background value. As a consequence, a reduced sensitivity is expected for binary segmentation methods when delineating highly heterogeneous structures. Clustering methods, used for classifying voxels into several clusters, have the potential of overcoming this by identifying the multiple regions in a heterogeneous structure. This is illustrated in Figure 5 (5a and 5b), which shows a comparison between binary segmentation and multiple clustering in the case of a heterogeneous tumour with high SUV peak. The present inventors formulated the following hypotheses:
  • Multiple clustering methods are more accurate than binary methods for the delineation of heterogeneous tumours of large volume.
  • The optimal number of clusters for methods KM, FCM and GCM is proportional to the number of uptake levels in the tumour.
To test these hypotheses, the present inventors modelled six spheres of diameters 10, 13, 22, 30, 38 and 58 mm with heterogeneous uptake distributions, by printing each sphere out as 4 concentric spheres of different diameters with different intensity levels. The diameters of the concentric spheres were fractions of 1, 0.66, 0.5 and 0.33 of the outside diameter. The intensities in concentric spheres of decreasing diameters were set with ratios of 1, 1.5, 2, 2.5 to model a sphere with 4 different intensity levels, and 1, 1, 2, 2 for a sphere with 2 intensity levels. The intensity of the outside sphere was set to 5 times the background intensity in each case. The same tumour to background ratio (TBR) of 4 was used to print one set of homogeneous spheres for comparison purposes. The methods KM, FCM and GCM were applied to all spheres modelled for the detection of 1, 2, 3, 4, 5, 6, 7 and 8 clusters.
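The heterogeneous sphere patterns described above can be reproduced numerically as in the following Matlab sketch, which paints four concentric spheres with the stated diameter fractions and intensity ratios onto a uniform background. The grid size and voxel spacing are illustrative assumptions; the inventors' patterns were physically printed rather than generated this way.

% Model one heterogeneous sphere: concentric diameters of 1, 0.66, 0.5
% and 0.33 of the outer diameter, intensity ratios 1 : 1.5 : 2 : 2.5,
% with the outer sphere set to 5 times the background intensity.
N = 128; voxmm = 1;                          % illustrative grid, 1 mm voxels
Douter = 58;                                 % outer diameter, mm
fracs  = [1, 0.66, 0.5, 0.33];               % painted from outside inwards
ratios = [1, 1.5, 2, 2.5];
bg = 1;                                      % background intensity
img = bg * ones(N, N, N);
c = (N + 1) / 2;
[X, Y, Z] = ndgrid(1:N, 1:N, 1:N);
Dmm = sqrt((X-c).^2 + (Y-c).^2 + (Z-c).^2) * voxmm;
for k = 1:numel(fracs)
    r = fracs(k) * Douter / 2;               % radius of the k-th sphere, mm
    img(Dmm <= r) = 5 * bg * ratios(k);      % smaller spheres overwrite larger
end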
The segmentation accuracy was assessed by quantifying the conformity of the contours obtained to the reference true contour, which was evaluated using the Dice Similarity Coefficient, defined as:
DSC = 2|X∩Y| / (|X| + |Y|), where |X| and |Y| are the number of voxels in the reference and test contour, respectively. A DSC above 0.7 is considered as an indicator of good overlap [5]. The reference contour was derived by generating a contour from the object's known geometry, which was positioned on the PET image.
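Given the reference and test contours as logical voxel masks, the DSC can be computed directly; a minimal Matlab sketch with an illustrative function name:

function dsc = diceCoefficient(X, Y)
% Dice Similarity Coefficient between two logical masks:
% DSC = 2|X n Y| / (|X| + |Y|); 1 is perfect overlap, 0 is no overlap.
dsc = 2 * nnz(X & Y) / (nnz(X) + nnz(Y));
end

For example, two equally sized masks sharing half of their voxels give a DSC of 0.5, below the 0.7 threshold for good overlap quoted above.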
II.B. Generation of training datasets
Large datasets were generated for the training of a decision tree method aimed at predicting the best segmentation approach to use for each new tumour. Decision tree methods take as an input a single variable corresponding to the outcome to be predicted for the training dataset and one or more predictive independent variables used to split the data into different categories leading to different outcomes. The final decision tree represents a set of rules used for the optimal prediction of the outcome on the given dataset. The present inventors used PETSTEP, a fast PET simulator described previously [8c], [8d], [14], for the generation of a training set of realistic H&N PET images with modelled tumours from user-defined contours. The PETSTEP simulator was used for the rapid generation of large training sets. PETSTEP operates in the open source framework of the Matlab-based Computational Environment for Radiotherapy Research (CERR) [8e]. It uses a CT image and a map of the background FDG uptake to generate a PET image from tumour contours drawn by the user.
For a first training dataset, an FDG activity map, registered to a clinical CT image, was used as a simulation template to model typical background H&N activity levels. The training dataset was generated by adding tumours of varying sizes and activity distributions to the background FDG uptake map. This was done automatically from initial tumour contours drawn manually on the FDG uptake map, to model different irregular shapes and locations.
The data was aimed at producing a large set of test cases, covering the range of tumour metrics observed for clinical H&N data at the inventors' centre. For this purpose, two types of lesions were identified on available clinical data:
• Primary lesions, located around the oral cavity, tonsils and base of tongue.
• Nodal lesions, located in the parotid and submandibular spaces.
The present inventors used 10 different initial contours for generating the dataset; 8 corresponded to typical primary lesion locations, and were generated with targeted volumes and intensities ranging within 7-75 mL and maximum SUV values of 5-40, corresponding to the ranges clinically observed. The remaining contours represented nodal lesions and were similarly generated with volumes and intensities ranging within 0.5-40 mL and maximum SUV values of 2-25. A total of 100 PET images were generated for each contour geometry for regularly spaced values of the tumour volume and uptake spanning the specified range, leading to 1000 cases in total. For each tumour geometry, uptake value and volume, the inventors also simulated homogeneous uptake, increased uptake in the tumour centre with 2 and 3 uptake levels, and a tumour with necrotic centre. In addition, global tumour heterogeneity was modelled in each homogeneous region by randomly assigning to the voxels uptake values extracted from a Gaussian distribution centred on the targeted mean uptake value, with standard deviation of 100% of the mean value.
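The global heterogeneity modelling described above can be sketched in Matlab as follows; the toy region labels and target means are illustrative placeholders rather than the actual simulation inputs.

% Draw voxel uptake values from a Gaussian centred on each homogeneous
% region's target mean (standard deviation of 100% of the mean, as above).
rng(1);                                      % reproducible example
regionLabels = zeros(32, 32, 32);            % toy tumour with two regions
regionLabels(8:24, 8:24, 8:24) = 1;          % outer region
regionLabels(12:20, 12:20, 12:20) = 2;       % inner high-uptake region
targetMeans = [10, 25];                      % target mean uptake per region
uptakeMap = zeros(size(regionLabels));
for r = 1:numel(targetMeans)
    idx = find(regionLabels == r);
    mu = targetMeans(r);
    uptakeMap(idx) = max(0, mu + mu * randn(numel(idx), 1));  % clipped at 0
end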
An additional set of 68 training images were generated with hand-drawn contours and randomly assigned uptake values.
A second training dataset was generated based on existing PET/CT data of a fillable phantom. Target tumours covering a range of different characteristics relevant to clinical situations were added to the background of this phantom. A total of 100 spherical tumours were modelled for volume and maximum uptake values in the ranges 0.5 mL - 50 mL and 4000 Bq/mL - 40000 Bq/mL respectively. For all the scans in the training dataset the target outline was known, since it was one of the input parameters in the synthetic image generation process. We refer to this outline as the "reference true contour" or "ground truth".
For the first training dataset, the following measurable metrics describing the simulated tumours were identified:
• Vol: tumour volume (mL)
• TBRpeak: ratio between the tumour SUVpeak, calculated as the mean value in a 1 cm3 sphere centred on the maximum SUV in the tumour, and the background SUV, calculated as the mean intensity in a 0.5 cm thick extension of the contour. SUV is the Standardised Uptake Value.
• COV: Coefficient of Variation, defined as COV = σ/μ, with σ and μ the standard deviation and mean of the lesion intensity values respectively.
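A condensed MATLAB sketch of these metrics is given below (Image Processing Toolbox assumed for strel/imdilate; the isotropic voxel size and the peak-sphere construction are simplifications of the definitions above, not the exact implementation used):

```matlab
% Simplified first-dataset metrics from a PET volume and tumour mask.
% petVol: 3D PET array; tumourMask: logical mask; voxMm: voxel size (mm).
vals = petVol(tumourMask);
Vol = nnz(tumourMask) * voxMm^3 / 1000;        % tumour volume in mL
COV = std(vals) / mean(vals);                  % coefficient of variation

[~, iMax] = max(petVol(:) .* tumourMask(:));   % hottest tumour voxel
peakMask = false(size(petVol)); peakMask(iMax) = true;
rPeak = round((1000 * 3 / (4 * pi))^(1/3) / voxMm);  % 1 cm3 sphere radius
peakMask = imdilate(peakMask, strel('sphere', rPeak));
suvPeak = mean(petVol(peakMask));

shell = imdilate(tumourMask, strel('sphere', round(5 / voxMm))) & ~tumourMask;
TBRpeak = suvPeak / mean(petVol(shell));       % 0.5 cm background extension
```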
Regional texture features were extracted to investigate the influence of the number of intensity levels in the tumour. These metrics rely on the identification of regions of connected voxels with the same intensity value, after resampling the tumour to 64 discrete intensity levels as described by Haralick et al. [9]. The following texture metrics were calculated:
• Iv: Intensity Variability, defined as Iv = Σ_i (Σ_j n(i,j))^2
• Sv: Size-zone Variability, defined as Sv = Σ_j (Σ_i n(i,j))^2
with n(i,j) the number of voxels in regions of intensity level i and size j.
• NI: number of intensity levels
• NR: number of homogeneous regions
• NRS: number of homogeneous region sizes.
For the second training dataset, only the following parameters describing the tumour uptake were identified as classifiers of the DT learning method:
• V: tumour volume (mL)
TBRpeak: Ratio between the tumour SUVpeak, calculated as the mean value in a 1 cm3 sphere centred on the maximum Standardised Uptake Value (SUV) in the tumour, and the background SUV, calculated as the mean intensity in a 0.5 cm thick extension of the tumour contour.
• NI: a regional texture feature related to the intensity distribution in the target object. The number NI of intensity levels in the tumour is obtained based on the methods described by Loh et al. [15] to calculate grey level run length, which rely on the identification of regions of connected voxels with the same intensity value. NI is the number of the corresponding intensity values identified, after resampling the tumour to 64 discrete intensity levels as recommended by Tixier et al. [16].
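One possible MATLAB sketch of these regional counts, after resampling to 64 levels, is given below (bwconncomp from the Image Processing Toolbox; variable names are illustrative):

```matlab
% Regional texture counts after resampling the tumour to 64 levels.
vals = petVol(tumourMask);
span = max(vals) - min(vals) + eps;                  % guard against /0
q = 1 + floor(63 * (vals - min(vals)) / span);       % levels 1..64
qVol = zeros(size(petVol));
qVol(tumourMask) = q;

NI = numel(unique(q));               % number of intensity levels present
NR = 0; zoneSizes = [];
for lev = unique(q)'
    cc = bwconncomp(qVol == lev);    % connected regions at this level
    NR = NR + cc.NumObjects;
    zoneSizes = [zoneSizes, cellfun(@numel, cc.PixelIdxList)]; %#ok<AGROW>
end
NRS = numel(unique(zoneSizes));      % number of distinct region sizes
```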
The workflow to build the statistical model was fully automated and computer scripts were used to (1) determine evenly spaced values of the volume and FDG uptake spanning the ranges specified, (2) generate the synthetic training scans, (3) calculate the value of corresponding tumour parameters, (4) segment the synthetic scans using the individual PET-AS methods, (5) calculate the segmentation accuracy for each method by comparing the result of the segmentation with the ground truth and (6) build the corresponding decision trees.
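Schematically, this six-step workflow can be expressed as a driver loop of the following form; every function name here is a hypothetical placeholder for the scripts described above, not the actual code used (diceCoefficient is the helper sketched earlier):

```matlab
% Hypothetical driver for the automated training workflow (steps 1-6).
[vols, uptakes] = ndgrid(linspace(vMin, vMax, nV), ...
                         linspace(uMin, uMax, nU));         % step 1
trainingRows = [];
for c = 1:numel(vols)
    scan   = simulateTrainingScan(vols(c), uptakes(c));     % step 2
    params = computeTumourParameters(scan);                 % step 3
    for m = 1:numel(petAsMethods)
        mask = petAsMethods{m}(scan);                       % step 4
        dsc  = diceCoefficient(scan.groundTruth, mask);     % step 5
        trainingRows = [trainingRows; params, m, dsc];      %#ok<AGROW>
    end
end
% step 6: one regression tree per PET-AS method is fitted to these rows.
```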
II.C. Building of the decision tree method

A prediction model was built using a decision tree technique to define a set of rules for the selection of the best segmentation method to apply to a given tumour image, on the basis of the values of the image and tumour parameters. The present inventors used the statistical analysis software SPSS 20 (IBM, Chicago, USA) for the selection of the optimal segmentation approach among the methods involved; for the first training dataset these included all methods described above: AT, RG, the clustering method KM applied to the detection of 2 or 3 clusters, and FCM and GCM applied to the detection of 2 to 8 clusters. Other software packages (e.g. Matlab) may equally be employed for this selection task. The segmentation accuracy was assessed by quantifying the conformity of the contours obtained to the reference true contour provided by the simulation template in PETSTEP, using the DSC.
For the second training dataset, only the PET-AS methods that provided non-correlated DSC values across the dataset were kept in the study. Correlated methods were identified using SPSS 20 as pairs of PET-AS for which the Pearson's r correlation coefficient was larger than 0.95 (r > 0.95) and the associated p-value smaller than 0.05 (p < 0.05). The best performing (highest average DSC) of each correlated pair was kept. Only the following uncorrelated methods were considered for further analysis: AT, RG, WT, KM2, KM3, KM4, FCM2, GCM3 and GCM4.
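A minimal sketch of this filtering step, using MATLAB's corr in place of SPSS and illustrative variable names, could read:

```matlab
% dscTable: nCases x nMethods matrix of DSC values on the training set.
[r, p] = corr(dscTable);                  % pairwise Pearson r and p-values
keep = true(1, size(dscTable, 2));
for a = 1:size(dscTable, 2)
    for b = a + 1:size(dscTable, 2)
        if keep(a) && keep(b) && r(a, b) > 0.95 && p(a, b) < 0.05
            if mean(dscTable(:, a)) >= mean(dscTable(:, b))
                keep(b) = false;          % drop the worse of the pair
            else
                keep(a) = false;
            end
        end
    end
end
```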
For each segmentation approach, a tree was grown for the prediction of the DSC score obtained on different tumour types. The present inventors used the Classification and Regression Tree (CART) growing method with all tumour and image characteristics entered into the model as prediction variables. The impurity measure used was "Gini" and the maximum tree depth was set to 10 with a minimum number of 50 cases per node. The software returned the trees obtained for each segmentation approach, together with their risk estimates, which represent the average classification error made on the whole dataset by assigning each case to the DSC predicted by the tree. The importance to the model was also calculated for each variable, as the sum of the improvements brought to the model on all the nodes, weighted by the proportion of cases considered in each node. The trees were pruned back to their optimal size with a minimum allowed risk of 1 standard deviation of the average risk. The final segmentation method was built using the decision trees derived for each method. For each new case it calculates the image parameters from an initial contour obtained with one of the single segmentation methods, then the DSC score predicted for each approach, and selects for application the algorithm reaching the highest predicted DSC value.
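The final selection step thus reduces to an argmax over the per-method predicted DSC values; sketched in MATLAB below, with the tree objects and helper functions assumed as in the previous sketches:

```matlab
% Select and apply the PET-AS method with the highest predicted DSC.
initMask = petAsMethods{initialMethod}(scan);   % e.g. an initial GCM2 contour
params   = computeTumourParameters(scan, initMask);
predDsc  = zeros(1, numel(trees));
for m = 1:numel(trees)
    predDsc(m) = predict(trees{m}, params);     % per-method DSC forecast
end
[~, best] = max(predDsc);
finalMask = petAsMethods{best}(scan);           % apply the predicted best
```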
For the second dataset, the decision trees predicting, for each PET-AS, the DSC score obtained according to the values of the different tumour parameters were built automatically with the Matlab statistics toolbox, based on the Classification And Regression Tree (CART) growing method. The Matlab method RegressionTree.fit was used with its default settings, including the impurity measure "Gini", ensuring the data is classified into homogeneous groups of values. For each tree, the cross-validation error was calculated as the relative difference between the true DSC and the DSC predicted by the tree, averaged on a randomly chosen subset (10%) of the training dataset.
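For the Matlab route, training one tree per method and estimating its cross-validation error might look like the sketch below (RegressionTree.fit is the older name of the current fitrtree; the 10% holdout mirrors the error definition above; variable names are illustrative):

```matlab
% Fit one CART regression tree for method m and estimate its error.
% X: nCases x nParams tumour parameters; dsc: nCases x nMethods DSC values.
n = size(X, 1);
holdout = randperm(n, round(0.1 * n));           % random 10% subset
train = setdiff(1:n, holdout);
tree = fitrtree(X(train, :), dsc(train, m));     % CART with default settings
predDsc = predict(tree, X(holdout, :));
cvErr = mean(abs(predDsc - dsc(holdout, m)) ./ dsc(holdout, m));
```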
II.D. Evaluation of the method with phantom data
The performance of the decision tree method was evaluated on a set of images obtained with a fillable phantom with minimal plastic wall thickness used in a previous study [6], and a sub-resolution sandwich (SS) printed phantom also described in previous work [7]. The fillable phantom was imaged with hot spheres of 0.5 to 102 mL at Tumour-to-Background Ratios (TBRs) of 1.4 to 6.4. The SS printed phantom method allows modelling any given FDG uptake distribution and producing a three-dimensional object to scan. The phantom was used to generate realistic H&N data with irregularly shaped tumours, located in the base of tongue, tonsils and parotid space. The geometry, size and location of the tumour printouts were validated by a radiologist as relevant for the modelling of tumours in this work. The following tumour FDG uptake distributions, commonly observed for H&N squamous cell carcinoma primary tumours and malignant lymph nodes [2b], were modelled:
• Homogeneous uptake with TBRs of 2, 4, 6 and 8
• Uptake distribution following a Gaussian model (higher at the centre of the tumour)
• Necrotic uptake with no uptake at the centre of the tumour
• Random distribution of intensity values across the tumour (with a standard deviation of 20% of the mean intensity)
The image parameters used in the model were calculated for this dataset using the ground truth contour, obtained from the corresponding CT or simulation template. Only the images for which these parameters fell into the range used in the training dataset were kept, which led to using 115 validation images for the first dataset and 85 for the second dataset.
The model built from the first training dataset was applied to each test case of this validation dataset, on the basis of image parameters calculated from an initial contour obtained by running the method GCM2. GCM2 was chosen in this case because it returned the lowest standard deviation in DSC values on the training dataset. The average DSC was calculated for the model of the present embodiment, and compared with the DSC values obtained for each segmentation approach separately.
For the second training dataset, the accuracy of ATLAAS was evaluated for the segmentation of a wide range of phantom data generated for previous studies, within the range of volume and TBRpeak used in the training of ATLAAS. The data included:
• 37 fillable phantom test images obtained using thin-wall spherical [6] and non-spherical [10] plastic inserts, including ellipses, tori, tubes, pear- and drop-shaped objects,
• 17 cases of heterogeneous spheres (1, 2 or 4 homogeneous regions of different uptakes) obtained with the printed SS phantom [7],
• 31 H&N homogeneous and heterogeneous (random and Gaussian-smoothed uptake) cases generated with the printed sub-resolution sandwich (SS) phantom, using a printout template representing realistic H&N background uptake.
Figures 5c, 5d, 5e, 5f, and 5g show examples of phantom scans with overlaid reference true contours used for the validation of the model. The target objects shown are:
Figure 5c - fillable spheres with thin plastic walls;
Figure 5d and Figure 5e - non-spherical fillable inserts with thin plastic walls;
Figure 5f - spheroidal lesion on a printed head and neck (H&N) phantom; and
Figure 5g - irregular lesion on a printed H&N phantom.

The ATLAAS predictive segmentation model was applied to each test case of this validation dataset. The PET-AS methods in Table 1 were also individually applied to each test case. The mean and minimum (min) DSC across the validation dataset were calculated for the method of the present invention and for each of the different PET-AS methods. In addition, for each method the percentage of cases where it returned the best DSC (Pbest), and where it returned a DSC value within 10% of the best PET-AS algorithm (P10), was determined. The Wilcoxon signed-rank test was used to determine whether the DSC distributions generated by the ATLAAS method of the present invention and by the individual PET-AS methods were significantly different.
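These summary statistics follow directly from the per-case DSC matrix; a MATLAB sketch (signrank from the Statistics Toolbox; variable names assumed for illustration) is:

```matlab
% dscVal: nCases x nMethods validation DSC matrix; column atlaasCol
% holds the scores of the proposed method.
bestPerCase = max(dscVal, [], 2);
Pbest = 100 * mean(dscVal(:, atlaasCol) == bestPerCase);
P10   = 100 * mean(dscVal(:, atlaasCol) >= 0.9 * bestPerCase);
for m = setdiff(1:size(dscVal, 2), atlaasCol)
    pVal = signrank(dscVal(:, atlaasCol), dscVal(:, m));   % paired Wilcoxon
    fprintf('method %d: p = %.3g\n', m, pVal);
end
```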
The second dataset did not include highly irregular lesions or images with heterogeneous background; this illustrates precisely the influence of the training dataset on the accuracy of the predictive model of the present invention. It is expected that a training dataset containing ground truth data for highly irregular lesions and heterogeneous background images will further improve the performance of the present invention.
It is also understood that, in a further embodiment of the present invention, an automated procedure is used to build the training dataset. This allows building and using the model with data from any centre or any PET imaging system. Indeed, the method of the present invention could be built for any tracer or imaging modality (e.g. SPECT), as long as corresponding data with reference true contours are available. There is no limit to the number or type of PET-AS algorithms that can be included in the method of the present invention providing a predictive segmentation model.
III. Results
III.A. Hypotheses and previous work
Table 2 gives the number of clusters providing the most accurate segmentation (highest DSC) for GCM applied to spheres S1-S6 with 1, 2 or 4 activity levels. Similar results were obtained for FCM. These results show that the number of clusters is strongly correlated with the number of activity levels and with the dimension of the spheres. The highest DSC values were obtained when the segmentation was based on the identification of 2 clusters for KM, except in one case for which it was 3 clusters.
           S1   S2   S3   S4   S5   S6
1 level    2    9    2    2    9
2 levels   2    4    4    6    4    4
4 levels   2    9    5    6    6    6

Table 2. Number of clusters providing the most accurate segmentation (highest DSC) for GCM applied to spheres S1-S6 with 1, 2 or 4 activity levels.
Figure 6a shows one transverse slice of the printed pattern used to model a 58 mm diameter sphere consisting of 4 different concentric spheres of increasing intensity levels. Figure 6b shows a transverse slice of the PET image obtained by scanning the printed pattern in Figure 6a within the sub-resolution phantom. The contours provided by the GCM method applied to the detection of 2, 4, 6 and 8 clusters are shown on Figure 6b. The accuracy of the methods increased with increasing number of clusters. The binary methods described in Table 1 (AT, RG, and FCM and GCM applied to the identification of 2 clusters) all had DSC values lower than 0.65, compared to a value of 0.94 reached for GCM used for 6 clusters. This is because they recovered only the most intense part of the spheres, as illustrated on Figure 6b. KM reached a high DSC when used for 2 (DSC = 0.89) and 3 clusters (DSC = 0.93).
Figure 7 shows a comparison between the range of SUVpeak and Volume values obtained for the different training test cases of the first training dataset, compared to the values measured for clinical patients.
III.B. Generation of a training dataset
Simulated training cases for which the TBRpeak, SUVpeak or Volume values were outside the observed clinical range were discarded. In particular, low TBRs were sometimes achieved due to the proximity of intense normal uptake (tonsils, brain, etc.), which hampers the delineation process. A total of 815 cases were finally analysed for the first training dataset. Table 3 presents the range of values obtained for the different metrics considered, for both simulated and clinical images. The parameter values simulated ensured good coverage of the clinical values observed, with a tendency for lower texture feature values, in particular Sv, and higher COV for clinical data.
Parameter   TBRpeak   Vol (mL)   COV      Iv             Sv            NI      NR        NRS
Clinical    1.1-8.4   0.44-67    10-51    153-12x10^5    19-72x10^3    13-63   15-1756   2-344
Simulated   1.1-9.0   0.54-77    5.5-51   298-99x10^5    22-10x10^5    18-65   20-5813   2-394
Phantom     1.1-3.5   0.3-12     8.5-39   34-5.0x10^4    8-2.9x10^3    9-50    8-305     2-19

Table 3. Comparison of the range of values measured for the parameters considered for clinical data, data simulated for the first training dataset, and validation phantom data.
For the second dataset, the simulated data used for training the model of the present invention covered TBRpeak, V and NI ranges of 0.44-59, 1.1-52 mL and 8-38 respectively. The clinical data observed at the inventors' centre had TBRpeak, V and NI in the range 1.1-8.4, 0.44-59 mL and 13-63 respectively.
III.C. Building of decision tree
Methods GCM and FCM applied to 6 or more clusters did not reach the highest DSC value for any of the training cases in the first dataset, and were therefore discarded for the rest of this study. A total of twelve trees were generated, corresponding to predictions of the delineation accuracy of AT, RG, KM2, KM3, FCM2, FCM3, FCM4, FCM5, GCM2, GCM3, GCM4 and GCM5. Table 4 provides, for each of the methods, the mean DSC obtained on the training dataset, the percentage of cases where it returned the highest DSC, the risk estimate of the tree obtained, and the normalised importance of each of the variables included in the tree. NI was not included in any of the trees, and is therefore ignored in the rest of this analysis. TBRpeak was included in all of the decision trees generated, and was the most important variable to the model for six methods corresponding to GCM and FCM clustering with more than one cluster. COV was also included in eight out of twelve trees.
Method                    AT     RG     GCM2   GCM3   GCM4   GCM5   KM2    KM3    FCM2   FCM3   FCM4   FCM5
Mean DSC                  0.718  0.775  0.792  0.814  0.767  0.687  0.839  0.772  0.630  0.795  0.757  0.684
Returned highest DSC (%)  2.7    10     10     14     1.7    0.45   34     19     1.8    0.60   5.3    1.4
Risk estimate             0.015  0.014  0.006  0.006  0.010  0.010  0.003  0.008  0.020  0.010  0.012  0.012

Relative importance (%):
TBRpeak                   91.5   95.2   42.0   100    100    100    44.1   74.2   79.3   100    100    100
Vol                       -      -      -      -      -      -      -      92.2   -      37.4   86.3   62.8
COV                       100    100    64.7   77.3   -      30.3   15.3   -      100    63.8   -      -
Iv                        83.5   66.0   100
Sv                        -      100    -      69.4   64.1   96.4   -      -      -
NR                        97.8   100    31.8
NRS                       56.8

Table 4. Results of decision tree generation for the methods returning the highest DSC in more than 50 of the training cases.
For the second training dataset, a total of 9 trees were generated using all tumour parameters described previously as classifiers. Table 5 shows the performance of the different PET-AS methods on the training dataset (mean DSC and Pbest) and the cross-validation error of the associated decision trees. All trees achieved a cross-validation error smaller than 1%, except for RG.
Method                       AT     RG     WT     KM2    KM3    KM4    FCM2   GCM3   GCM4
Mean DSC                     0.919  0.589  0.929  0.932  0.873  0.811  0.675  0.820  0.917
Pbest (% cases)              4      22     26     39     <1     <1     <1     9      9
Cross-validation error (%)   <1     9.0    <1     <1     <1     <1     <1     <1     <1

Table 5. Performance of the PET-AS methods on the training dataset and cross-validation error of the associated tree.

Figures 15c and 15b show the decision trees built for methods AT and KM2 using the second training dataset, including the image parameter and cut-off value at each node, and the predicted DSC at each extremity (leaf) of the tree.
III.D. Validation of the decision tree
A total of 115 cases were used for the validation of the model based on the first training dataset, including both data obtained with thin-wall spherical and non-spherical plastic inserts and images from a printed sub-resolution sandwich phantom. The ranges of parameter values achieved for these cases, measured from the ground truth contour, are given in the bottom line of Table 3.
Figure 8 shows the DSC values obtained for the model (in black) and the single segmentation approaches for each case of the validation dataset. The best performing algorithm was correctly selected by the model of the present embodiment in 35% of cases, i.e. the method returned the best DSC in 35% of cases. In 74% and 89% of cases respectively, the algorithm selected by the model reached a DSC value within 5% and 10% of the best performing algorithm for that case. The maximum error made, i.e. the maximum difference between the DSC value achieved by the selected algorithm and the best performing algorithm, was 33% of the best DSC.
Table 6 shows the average, median and range of DSCs obtained for the decision tree method applied to the phantom dataset, compared to some of the single segmentation approaches considered in this study. For the sake of comparison, the present inventors chose the single segmentation approach which reached the highest mean DSC (AT), and the segmentation approach which achieved the best DSC for the largest number of cases on the validation dataset (RG). In addition, the results of the model of the present embodiment were compared with the theoretical optimal segmentation achieved by choosing the best performing method for each case, named "optimal" in Table 6. The decision tree method achieved a mean DSC 7% higher than every one of the single segmentation approaches. The method of the present embodiment returned a DSC higher than 0.364 in all cases, and reduced the number of cases with DSC lower than 0.5 to one. The difference in average accuracy between the model of the present embodiment and the optimal theoretical method is 0.022 DSC.
Method    Mean DSC   Lowest DSC   Number of DSCs < 0.5
Optimal   0.853      0.412        1
AT        0.806      0.353        3
KM        0.804      0.345        2
Model     0.821      0.364        1

Table 6. Comparison of average segmentation accuracy (DSC), lowest DSC and number of DSCs lower than 0.5 obtained for the validation phantom dataset for AT, KM, the model of the present embodiment, and the optimal segmentation obtained by selecting the best method in each case.
With reference to Figure 15d, corresponding to the second training dataset, the phantom dataset built for the evaluation of the present invention included target objects with volumes and TBRpeak values in the ranges 0.58 mL - 102 mL and 1.3 - 17 respectively. The results of the validation of the present invention with phantom data are given in Table 7, showing the mean and lowest DSC obtained on the dataset as well as Pbest and P10 for all PET-AS methods considered in the model. The method of the present invention returned the best DSC in 56% of cases, and a DSC within 10% of the best in 93% of cases, compared to respectively 34% and 88% obtained when using the best single PET-AS method on this dataset (AT). It is worth noting that on average the present invention outperformed all the individual PET-AS methods on all the metrics used. On the phantom dataset, the present invention reached a minimum DSC of 0.650, which is more than twice the minimum DSC obtained using the best single PET-AS method on this dataset (AT). Results of the Wilcoxon signed-rank test, given at the bottom of Table 7, showed statistically significantly different DSC values for the present invention compared to WT, KM3, KM4, GCM3 and GCM4.
Method           AT      RG      WT       KM2     KM3      KM4      FCM2     GCM3     GCM4    Present invention
Mean DSC         0.841   0.846   0.800    0.836   0.660    0.537    0.587    0.793    0.752   0.858
Min DSC          0.353   0.057   0.214    0.202   0.079    0.088    0.142    0.104    0.099   0.650
Pbest (% cases)  34      22      3        26      1        <1       2        9                56
P10 (% cases)    88      81      58       72      10       1        10       68       54      93

Wilcoxon signed-rank test results:
U                3721    3667    4690     3722    6093     6814     6786     4333     4690
p-value          0.362   0.432   <10^-3*  0.366   <10^-6*  <10^-6*  <10^-6*  <10^-3*  0.012*

*Statistically significant

Table 7. Performance of the method of the present invention and of the different PET-AS methods for the phantom validation data, including mean and minimum DSC obtained, and Pbest and P10 values. The results of the Wilcoxon signed-rank test are given in the bottom rows.
A full comparison of the segmentation accuracy (DSC) obtained for the method of the present invention, trained on the second dataset (black dashed line), and for the single PET-AS methods (solid lines), for all the cases of the validation dataset, including fillable phantom and printed H&N phantom data, is shown in Figure 15d. It can be noted that the curve representing the method of the present invention is higher than the individual PET-AS curves in most cases. When the method of the present invention did not reach the highest DSC, its DSC was still higher than 0.7. In addition, although the performance of other PET-AS methods such as AT, KM2 and RG was good overall, there were cases in which these methods had a low DSC score (e.g. cases No. 10, 26, 32 and 38 in Figure 15d), while the present invention maintained a DSC higher than 0.7.
The average accuracy of the method of the present invention, trained on the second dataset, across the fillable phantom data (Figure 15d, left) was 0.881 DSC, compared to 0.852 for the best of the PET-AS methods. In the case of the printed H&N phantom data (Figure 15d, right), the present invention reached an average 0.819 DSC compared to 0.831 DSC for the best performing PET-AS method.
IV. Discussion
In making the present invention, an aim of the present inventors was to develop a model able to select and apply the best performing method among a selection of promising segmentation algorithms. The combination of several algorithms has been investigated in other studies using other combination processes. McGurk et al. [8] compared two voxel-wise methods for the combination of 5 different segmentation algorithms, in order to limit inconsistencies of the segmentation accuracy on a large dataset. In the present invention, the present inventors aimed at optimising the accuracy of their segmentation by considering only the predicted best performing method. This optimisation approach is different from the approach of McGurk et al., which is focused on robust segmentation, and allows reaching the highest achievable DSC values rather than a consensus of methods.
Accordingly, an aspect of the present invention provides a method of optimising the accuracy of segmentation of tumours (which may according to a preferable feature be H&N tumours) on images (which may according to a preferable feature be FDG-PET images) by considering only the predicted best performing method. The method of the present embodiment of the present invention aims for optimised segmentation accuracy, and is therefore dependent on the imaging system and on the type of lesions identified. For this reason, the method needs to be trained on a dataset targeting the type of images desired. In this embodiment, the present inventors used a fast PET simulator to automatically generate large datasets, ensuring a good accuracy for machine learning algorithms like the decision tree approach used. The process of generating the data was simple and automatic, and could be easily applied to a different centre and/or a different tumour site.
The final method of the present embodiment, trained on the first dataset, showed excellent accuracy across a wide range of phantom dataset, and achieved a prediction of the best or near-best segmentation (DSC within 5% of the best) in a large number of cases. This work demonstrated both the robustness of the method of the present embodiment, which systematically returns DSC values higher than 0.36, and its high accuracy, with a mean DSC obtained of 0.82.
Although a good coverage of the clinical range of prediction parameter values was obtained (cf. Table 3), the first training dataset consisted of a number of images with lower COV values than the clinical data, and images with higher texture parameters (Iv, Sv, NR and NRS). The difference in COV, which is an indicator of global image noise, could be caused by the difference between scanner-acquired images and simulated images. The PET simulator used in testing the present embodiment was calibrated in previous work against phantom data obtained by the scanner used in testing the present embodiment, but this was done using non-TOF corrected data, as this feature was not yet available in the simulator. In addition, the reconstruction process and smoothing parameters used are slightly different from the scanner software, and did not account for geographical variations within the Field Of View. The difference in Haralick texture features could be due to a difference in the underlying tumour heterogeneity. In the present work, random intensity variations were coupled with the definition of up to three regions of different mean FDG uptake. Clinical tumours are likely to bear more complex heterogeneity patterns, but this is not known at present, and there is little data available for the definition of realistic tumour uptake patterns.
Parameters describing the tumour heterogeneity (Iv, Sv, and NR) were included in the decision trees, in particular for the clustering-based methods KM, GCM and FCM (cf. Table 4), which confirms the hypothesis of the present inventors that the accuracy of these methods is strongly correlated to the tumour uptake distribution pattern, as also shown by the results presented in section III.A. In particular, parameter NR, which describes the number of homogeneous regions identified within the tumour after resampling to 64 intensity values, was included in the decision trees for GCM2, KM3 and FCM2, with a high importance level.
When trained on the second dataset, the final method also showed excellent accuracy across a wide range of complex phantom data and achieved a prediction of the best (Pbest) or near-best (P10) segmentation in a very large number of cases, as shown in Table 7. The data indicate that ATLAAS is a robust method that is capable of using a good segmentation algorithm when other methods fail (cf. Fig. 15d). The Wilcoxon signed-rank test showed that the distribution of DSC values obtained for ATLAAS was not significantly different from those obtained with AT, RG, KM2 and GCM3. This is due to the fact that these were the PET-AS methods that ATLAAS predicted and applied in many cases. The contours generated by ATLAAS are not expected to be statistically different from the contours obtained with the individual PET-AS methods, since ATLAAS is coded to use these algorithms for image segmentation. The added value of ATLAAS is the capability of predicting the best or near-best segmentation methods in a well-defined and reproducible way, thus providing optimal and robust image segmentation.
The results also showed that ATLAAS largely outperformed all single PET-AS methods on fillable phantom data but did not outperform the best PET-AS in the case of printed H&N phantom data. This can be explained by the fact that the training dataset for ATLAAS was built using images simulated from fillable phantom templates. Although the training dataset included non-spherical target lesions, ATLAAS was not trained on highly irregular lesions, or on images with heterogeneous background, which were found in the H&N printed data. This shows precisely the influence of the training dataset on the accuracy of the predictive model. It is expected that a training dataset containing ground truth data for highly irregular lesions and heterogeneous background images will further improve the performance of ATLAAS. The training dataset was generated using PETSTEP, a fast and flexible PET simulation tool. PETSTEP has been previously calibrated and validated for the simulation of complex H&N uptake [14]. However, the current version of PETSTEP does not include axial filtering and Time-of-Flight correction, which was indeed used in all scans acquired at the inventors' centre. Since the quality of the ATLAAS predictive model depends largely on the quality of the training dataset, it is expected that any improvement in the quality of the images simulated with PETSTEP will result in an improved performance of ATLAAS.
V. Conclusion

According to an aspect of the present invention, there is provided a model for automatic selection and application of a range of PET-AS methods, which may according to embodiments provide excellent segmentation accuracy. The present inventors have validated this model on a large dataset of printed and fillable phantom images, and showed that this model is an improvement on the use of a single segmentation approach. In addition, according to a further aspect of the present invention, there is provided a full framework for the automatic training and building of such an optimised segmentation method, which can be applied to data from any centre, positron emission imaging system and selection of automatic segmentation algorithms.
Figs. 21 to 30 give details of an embodiment of the present invention.
Fig. 21 describes PET-based Optimisation of Target Volume Delineation for Head and Neck Radiotherapy, and in particular the findings:
• There is a large number of PET Auto-Segmentation (PET-AS) methods in the literature
• There is no consensus on the best method to use
• Previous work ([6], [7], [10]) showed that PET-AS methods perform differently in different situations.
According to the present embodiment, an automatic decision tree learning algorithm for advanced segmentation (ATLAAS) is proposed.
Fig. 22 describes that ATLAAS is trained to select and apply the PET- AS method for which the highest accuracy is predicted. In a training phase, training data is used to build a decision tree (set of rules) for each method to predict the DSC according to the value of image parameters. In a segmentation phase performed on clinical data, image parameters are calculated, and the DSC is predicted for each method using the decision trees. The method with the highest predicted DSC is applied.
Fig. 23 describes the 25 fully automatic PET-AS methods, implemented with Matlab and CERR, that were included in the present embodiment, with reference to the table shown in the Figure. Clustering methods KM, FCM and GCM were applied to the detection of 2-8 clusters in the image in the present embodiment, including KM2, KM3, ..., KM8, FCM2, FCM3, ..., FCM8 and GCM2, GCM3, ..., GCM8. The accuracy is evaluated using the Dice Similarity Coefficient (DSC) according to the equation shown in the Figure.

Fig. 24 describes training data according to the present embodiment; in particular, a novel fast PET simulator (PETSTEP) was used to generate 1000 synthetic images of oropharyngeal lesions - 10 geometries x 5 sizes (ranging from 0.5 to 70 mL) x 5 intensities (ranging from 3-45 SUV) x 4 texture types = 1000 simulated PET lesions.
Fig. 25 describes image parameters (estimated using the KM2 contour) employed in the present embodiment:

• Volume - size
• TBRpeak (Tumour-to-Background Ratio using peak tumour intensity) - intensity
• Coefficient Of Variation - texture
• Haralick ([9]) texture features (after resampling to 64 intensities):
  o Intensity Variability - texture
  o Size-zone Variability - texture
  o NI (number of intensities) - texture
  o NRS (number of homogeneous intensity region sizes) - texture
  o NR (number of homogeneous regions) - texture
Fig. 26 describes evaluation data according to the present embodiment:
• Phantom: 115 phantom images obtained in previous work ([10], [6])
  o 39 non-spherical thin-wall inserts (fillable phantom)
  o 58 H&N printed phantom lesions (printed sub-resolution sandwich phantom)
  o 18 heterogeneous spheres (printed sub-resolution sandwich phantom)
• Simulated: 100 images simulated with PETSTEP for random volume/SUV values

Fig. 27 shows evaluation results of the present embodiment. The data was generated within the range of TBRpeak and volumes clinically observed at the inventors' centre. All parameters except Sv were used in at least one decision tree.
Fig. 28 shows results for simulated data for the present embodiment, from which the following conclusions may be drawn:
• Mean DSC of 0.88 (close to optimal accuracy of 0.91)
• Higher accuracy than all methods except KM2 (DSC = 0.89)
• Higher minimum DSC than all PET-AS methods (DSC = 0.67)
• Returns DSC within 10% of the best in 89% of cases
• Maximum difference to highest DSC of 22%
Fig. 29 shows results for phantom data for the present embodiment, from which the following conclusions may be drawn:
• Mean DSC of 0.83
• Higher accuracy than all PET-AS methods
• Higher lowest DSC than all PET-AS methods (DSC = 0.49)
• Returns DSC within 10% of the best in 77% of cases
• Maximum difference to highest DSC of 33%
Fig. 30 indicates the following conclusions drawn in respect of the present embodiment:
The model ATLAAS developed:
• reached high accuracy levels on a varied dataset
• showed robustness to different types of data
• eliminated large errors due to the use of a single PET-AS method
• reached near optimal (within 10% of best) accuracy in most cases
In addition, ATLAAS is an automatic process, which is potentially applicable to any site or data type.
ATLAAS is currently being further improved and is being considered for implementation at the inventors' centres. As will be appreciated, an aspect of the present invention relates to a predictive model for optimal auto-segmentation of PET images. In particular, PET-AS methods can provide reliable and reproducible segmentations of the GTV for radiotherapy planning. However, a large variety of PET-AS methods have been validated on different datasets, making it difficult to recommend a single algorithm. The present application presents a predictive model for optimal auto-segmentation of PET images based on machine learning technology. The "ATLAAS" package (Automatic decision Tree Learning Algorithm for Advanced Segmentation) is provided according to an embodiment of the present invention, and can select and apply the best PET-AS algorithm from a pool of methods on which it has been trained.
Materials and methods: For each PET-AS included in the model, ATLAAS predicts its accuracy in segmenting a given target lesion. This is based on the analysis of several image parameters describing the lesion. The prediction is done using a set of rules (decision tree) derived by training ATLAAS on a large number of synthetic images representing realistic data generated using a fast PET simulator. The accuracy of the PET-AS is quantified with the Dice Similarity Coefficient (DSC), using the known ground truth extracted from the synthetic lesion. A total of 25 PET-AS algorithms are currently implemented in ATLAAS. These include adaptive thresholding, region-growing, gradient-based and clustering methods. ATLAAS was optimised for H&N and was built using more than 1000 images. The validation of ATLAAS was carried out on 115 PET scans acquired using fillable spherical and non-spherical inserts and using printed sub-resolution sandwich phantoms. Both homogeneous and heterogeneous uptakes were modelled. The segmentation of 10 H&N patients with ATLAAS was also compared to manual delineation carried out on PET/CT data.
Results: Nine image parameters were used in the ATLAAS predictive model, including the lesion volume, peak intensity, peak-to-background ratio, coefficient of variation, regional Haralick texture features and geometrical complexity indices. The mean accuracy of the decision tree method was 0.82 DSC for the phantom dataset, with a range of 0.36-0.96. In 80% of the cases, the method selected an algorithm providing a DSC within 5% of the best DSC across algorithms. The average DSC comparing the results of ATLAAS to manual PET/CT segmentation was 0.70, ranging between 0.65 and 0.92. Differences were mainly due to the absence of a CT component in ATLAAS.
Conclusions: The inventors have developed a method that predicts the best auto-segmentation algorithm for FDG PET images. The accuracy and robustness of ATLAAS have been shown on a large range of synthetic and experimental data. ATLAAS can be a useful tool for optimal segmentation of PET images used for RT planning and can be further optimised to include different anatomical regions and tracers.
VI - Clinical Feasibility and Impact Study: The feasibility and impact of using the novel, fully automated PET auto-segmentation method of the present invention in the radiotherapy planning process:
Further to the validation of the method of the present invention discussed above, set out below is a summary of a feasibility and impact study of including this novel PET-AS method in the RT planning process.
VI(i) - The Predictive Segmentation Model
The PET-AS was performed using a novel segmentation method developed at the inventors' centre, named Automatic decision Tree-based Learning Algorithm for Advanced Segmentation (ATLAAS). The ATLAAS predictive segmentation model is designed to select the most accurate PET-AS method for the optimal segmentation of a given PET image. This is achieved using a decision tree (DT) supervised machine learning method optimised using a training dataset, for which the segmentation outcome is known, to achieve best performance for cases in which the outcome is not known. ATLAAS is described and validated elsewhere for 6 classes of advanced PET-AS methods. In this work, the feasibility of using the ATLAAS approach in clinically relevant cases was investigated by implementing two different PET-AS algorithms in the decision tree, as shown in Figure 31.
The two algorithms used in this study are the Adaptive Thresholding method (AT) and the Gaussian mixture model Clustering Method using 5 clusters (GCM5), described in previous work [6]. In this implementation of the DT model, the best method is predicted on the basis of TBRpeak, defined as the ratio between the tumour peak intensity value (mean value in a 1 cm3 sphere centred on the maximum intensity voxel) and the background intensity (mean intensity in a 1 cm thick extension of a volume thresholded at 50% of the peak intensity value). A training dataset of 65 H&N tumours modelled with a sub-resolution printed sandwich phantom [7] was used to determine the optimal cut-off value of TBRpeak for this work.
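Schematically, this two-method model reduces to a single threshold test; in the sketch below both the numerical cut-off and the assignment of methods to branches are hypothetical placeholders, since the trained values are not reproduced here:

```matlab
% Two-method ATLAAS decision rule on TBRpeak (illustrative values only).
tbrCutoff = 3.0;                        % hypothetical trained cut-off
if tbrPeak <= tbrCutoff
    gtvMask = segmentAT(petVol);        % Adaptive Thresholding branch
else
    gtvMask = segmentGCM(petVol, 5);    % GCM with 5 clusters branch
end
```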
The ATLAAS predictive segmentation model was implemented for this work in the Computational Environment for Radiotherapy Research (CERR)[13]. The segmentation accuracy was evaluated by quantifying the overlap between the set of voxels of the segmentation result and of the true contour using the Dice Similarity Coefficient [5].
VI(ii) - Acquisition of clinical data
This work was part of the POSITIVE (Optimisation of Positron Emission Tomography based Target Volume Delineation in Head and Neck Radiotherapy) project (REC No. 12/WA/0083) carried out at Velindre Cancer Centre, a regional oncology centre in South Wales, UK. Twenty oropharyngeal stage III/IV cancer patients were recruited and consented to the study. The patients were treated with neoadjuvant chemotherapy followed by radical chemoradiotherapy, 66 Gy in 30 fractions delivered over 6 weeks using IMRT. A planning FDG PET/CT scan was carried out on a GE Discovery 690 PET/CT scanner before chemotherapy to avoid changes in tumour volumes prior to outlining. The scans were acquired 90 minutes after FDG administration in the treatment position with an RT immobilisation shell. A registered PET and Contrast Enhanced CT (CECT) were acquired using 6-8 bed positions (top of the head to the sternum) of 3 min each. The images were reconstructed to 512 x 512 voxels for CT and 256 x 256 voxels for PET, using the reconstruction algorithm Vue Point FX (24 subsets, 2 iterations, 6.4 mm cutoff) including CT-based attenuation, scatter and Time-Of-Flight corrections. The CT scans were resampled at 3 mm interval slices for RT planning. Six weeks on average separated the PET/CT planning scan and the start of RT. The fit of the immobilisation shells was adjusted after induction chemotherapy and the patient was re-outlined and re-planned if necessary using the original CT/MRI/PET scan. Reporting was done by PET specialist radiologists after acquisition of the planning scans.
VI(iii) - Workflow and Analysis
MRI scans acquired before recruitment, available for all patients, were fused to the planning PET/CT scan using the Mutual Information registration in ProSoma (MedCom GmbH, Darmstadt, Germany).
Planning scans for the first 10 patients recruited were used to validate the workflow and verify that the ATLAAS method provided relevant contours for use. GTVs were manually outlined by three trained consultant clinical radiation oncologists on the registered PET/CT, using the software Velocity AI (Varian Medical Systems, Palo Alto, USA). The resulting GTVpPET/CT contours were compared with ATLAAS contours in terms of their volume and geometrical overlap, using the DSC index.
Once the ATLAAS method was validated, another 10 oropharyngeal cancer patients were recruited for evaluating the use of ATLAAS in RT outlining. Manual delineation was performed by the trained radiation oncologists on the fused MRI and CT images in ProSoma (GTVpCT/MRI). PET-AS contours were automatically generated using ATLAAS (GTVpATLAAS). GTVpATLAAS and GTVpCT/MRI contours were then imported into ProSoma, where the final GTV contour (GTVpfinal) was generated manually by the treating clinician using the available contour data.
The use of the additional information brought by the ATLAAS contours was evaluated by comparing the different contours (GTVpCT/MRI, GTVpATLAAS and GTVpfinal) for each patient, in terms of volume and geometrical overlap, using the DSC index. In addition, the clinicians were asked to report on any changes they made to the GTVpfinal as a result of including the ATLAAS contour in the delineation process. Lymph nodes, which are well defined on CT/MRI images, were not outlined using ATLAAS. However, the PET image was used to identify involved lymph nodes independently from the CT/MRI, to assess if the use of PET contributed any additional information.
VI(iv) - Results
The patient cohort included 17 men and 3 women with a median age of 63 years. Ten patients had tonsillar tumours, 8 base of tongue tumours and 2 soft palate tumours. Two patients needed re-planning after induction chemotherapy prior to the start of radiotherapy due to weight loss.
The comparison of ATLAAS and manual PET/CT contours on the preliminary group of 10 oropharyngeal cancer patients showed that ATLAAS successfully delineated the PET-avid tumour for all patients. The ATLAAS volumes were smaller than the PET/CT volume for 7 out of 10 patients. The mean DSC between manual PET/CT and ATLAAS contours was 0.82, where 0.7 is considered to be an indicator of good overlap [5]. On the basis of these results it was decided that only ATLAAS and CT/MRI contours would be used in the outlining of the subsequent 10 patients recruited.
A comparison between manual and ATLAAS contours of the GTVp in terms of volume and DSC index is presented in Figure 33. It can be noted that the ATLAAS volumes were smaller than the corresponding CT/MRI volumes in 7 out of 10 cases. Furthermore, volumes outlined on PET were within 10% of those outlined on the CT/MRI scans in 4 out of 10 cases. The spatial conformity of ATLAAS and manual CT/MRI outlines was 0.70 DSC on average. ATLAAS and final GTV volumes were within 30% of each other in 6 out of the 10 cases. GTVpfinal volumes were larger than GTVpATLAAS in all cases, indicating that the consultant clinical radiation oncologists combined the information provided by the different imaging modalities. However, the ATLAAS volumes showed good conformity to the final contour, with an average DSC of 0.77. Figure 32 shows how the ATLAAS-generated contours were used in deriving the final volume for seven different clinical cases, by highlighting differences between the overlaid GTVpCT/MRI, GTVpATLAAS and GTVpfinal contours on the CT/PET scan. Figure 34 reports the details of the changes to the final volume based on the ATLAAS outlines. The table also outlines the differences between ATLAAS and CT/MRI contours which have not been taken into account in the final GTV. For instance, the data in the top row of Figure 34 shows that 100% of the ATLAAS volume was included in the final GTV in 4 cases, and more than 83% of the ATLAAS volume was included in the final GTV for all patients. The second row reports the proportion of the CT/MRI volume modified on the basis of the information provided by the ATLAAS-generated volume. This value ranged from 6.5% to 33%.
VI(v) - Growing of the GTV based on ATLAAS
As reported in the third row of Figure 34, the CT/MRI GTV was grown locally based on the information provided by the ATLAAS contour (cf. upper part of Figure 32) for all clinical cases, with up to 10 mL added to make the final volume (cf. row 3 in Figure 34). Visual examination and reporting by the clinicians showed that this was done when additional disease extension was detected by the ATLAAS algorithm, and subsequently confirmed by clinical findings. This included larger superior-inferior disease extension (for five patients and up to 1.1 cm, as reported for instance in Figure 34 row 3a), and extension of the disease identified across the midline (cf. Fig. 32c).
VI(vi) - Shrinking of the GTV based on ATLAAS
Local shrinking (on one or more transverse slices) of the CT/MRI volume based on the information provided by ATLAAS was observed for 7 patients, and was more than 2 mL for two patients. Contour shrinking was carried out when the smaller extension of the disease indicated by the ATLAAS contour was in agreement with the clinical findings and the CT or the MRI information. The volume was also shrunk in the superior-inferior direction for two patients (1.5 cm for patient No 16). In cases of largely conflicting information between image modalities, the CT/MRI volume was shrunk down to a compromise following the edge of the anatomical structures, as depicted in Figure 32(d).

VI(vii) - ATLAAS information discarded
Differences between CT/MRI and ATLAAS contours were not considered in the final GTV when they included:
a) bone tissue (0.1 mL for patient No 11, cf. Figure 32e),
b) areas encompassing air regions (for 5 patients, up to 6.6 mL for patient No 12, cf. Figure 32f),
c) different superior-inferior disease extension in the ATLAAS contour which was not confirmed by anatomical imaging or clinical examinations (for 6 patients, up to 6.4 mL for patient No 13, cf. Figure 32g),
d) different transverse disease extension unconfirmed by anatomical imaging or clinical examinations (cf. some regions in Figure 32f).
In these cases, the amount in mL of the differences between the ATLAAS and the GTVpfinal contours, including both regions where ATLAAS was outside and inside the final contour, is given in row 5 and detailed in rows 5a-5d of Figure 34.
A total of 58 involved lymph nodes were detected for the 20 oropharyngeal cancer patients scanned. In total, 3 lymph nodes detected on CT/MRI were PET-negative. Additional malignant lymph nodes (12 in total) not seen on CT/MRI were detected for 6 patients when PET was included in the outlining process, one of which was a contralateral lymph node.
VI(viii) - Discussion
In this work the inventors have investigated the feasibility of using the ATLAAS predictive segmentation method for fully automatic PET segmentation of H&N cancer patients undergoing radical chemoradiotherapy. Following an initial validation study showing that the system provided useful and reliable contours for RT planning, ATLAAS was prospectively used in combination with manual CT/MRI data to derive the final GTV.
The ATLAAS-generated contours were smaller than the CT/MRI contours in most cases. Furthermore, the ATLAAS contours provided additional information to anatomical contours manually drawn on CT and MRI. In particular, in this study it was found that additional information from ATLAAS included (a) disease locations previously not detected on CT, including largely different superior-inferior extension, and extension across the midline (e.g. Figure 32c), and (b) disease extension boundaries differing from anatomical data. ATLAAS auto-segmentation was used in all patients, which shows the confidence of the clinicians in using the method for RT planning at the inventors' centre. The clinician's judgment and expertise and the additional clinical data available (endoscopy or clinical examination results) remained paramount in the process. Nevertheless, ATLAAS was very useful (a) in confirming the outlining when close to the CT/MRI contour, and (b) as a delineation guide when in disagreement with CT/MRI data, due for instance to different patient positioning and/or poor image registration.
The inventors have thoroughly investigated the impact of the ATLAAS segmentation on the generation of the final GTV volume, which was done manually by consultant clinical radiation oncologists. It was found that although ATLAAS led to shrinking the GTVpCT/MRI locally for 7 patients, this led to a globally smaller final volume for only one patient. This confirms the suggestion that clinicians may not yet be prepared to reduce the GTV volume based on PET. Indeed, PET-AS contours accurately identify the metabolically active part of the tumour rather than the whole tumour burden, which can be highly heterogeneous. The ATLAAS predictive segmentation model could be useful for determining, in a consistent and operator-independent way, the highly metabolically active areas of the tumour requiring a radiation boost. Correlation with additional information and clinical input would still be required in finalising the volumes for dose escalation.
Differences between the GTVpfinal and GTVpATLAAS volumes, corresponding to ATLAAS information discarded when deriving the final volume, amounted to between 0.6 and 45 mL (cf. row 5 in Figure 34). This was due to: (a) the absence of PET signal in abnormal mucosa (Figure 32g), leading to PET-AS volumes being much smaller than the CT/MRI volumes, and (b) high PET uptake observed in and around air cavities and/or bone (Figure 32c and h), which can be due to signal spill-out or inflammation. Spill-out effects can be corrected with simple CT-based thresholding, whereas unconfirmed soft tissue extensions of the disease, which represent a large part of the differences observed between CT/MRI and PET contours (cf. rows 5c and 5d in Figure 34), are inherent to the difference between modalities.
ATLAAS was not used in this study for the delineation of nodal volumes, since involved lymph nodes were well defined on fused CT/MRI data. However, the results of this study show that the use of PET imaging can help identify additional involved lymph nodes for 30% of the patients.
In this work the fully automated ATLAAS predictive segmentation model replaced manual PET/CT outlining. This considerably reduced the clinicians' workload, considering that PET-AS with ATLAAS took only a few seconds per patient. This can represent a very important change in routine clinical practice. In addition, the use of ATLAAS eliminates inter-observer variability, as the volumes derived using auto-segmentation are fully reproducible.
VI(ix) - Conclusions
The ATLAAS predictive segmentation model based on the decision tree machine learning method is a novel, fast and reliable tool for tumour delineation in radiotherapy treatment planning of Head and Neck cancer. ATLAAS' fully automated approach can be very useful in delivering operator independent PET segmentation in the radiotherapy planning process.
The embodiment illustrated in the figures referred to herein is described in relation to the segmentation of H&N tumours; these are often difficult to segment. Embodiments of the present invention may however equally be applied to other anatomical sites, e.g. pelvis, lung, prostate, brain, preferably by building a training dataset(s) for those areas in the manner described above, to provide a training dataset specific to those anatomical regions.
The illustrated embodiment has been described in relation to segmenting images produced by Positron Emission Tomography (PET). According to embodiments, this may have particular utility as the resolution of PET images may be poor e.g. of the order of a few millimetres. It is however to be understood that the present invention is by no means limited to the segmentation of PET images, and may according to further embodiments be applied to the segmentation of images produced by other means e.g. to the Single Photon Emission Computed Tomography (SPECT) imaging modality.
In the present application, "tumour" is to be understood as including both benign and malignant tumours, as well as any lesion requiring segmentation.
In the present application, the term "subject" or "patient" refers to an animal, preferably a mammal and in particular a human. In a particular aspect, the subject or patient is suffering from a disease, for example, cancer.
In the present application, references to a H&N tumour are to be understood as including any tumours belonging to the class of H&N tumours.
The following documents, which are referred to in the foregoing, are incorporated herein by reference in their entirety:
[1] T. M. Blodgett, M. B. Fukui, C. H. Snyderman, B. F. Branstetter, B. M. McCook, W. Dave, and C. C. Meltzer, "Combined PET-CT in the Head and Neck. Part 1. Physiologic, Altered Physiologic and Artifactual FDG Uptake," RadioGraphics, vol. 25, pp. 897-912, 2005.

[2a] U. Haberkorn, L. G. Strauss, C. Reisser, D. Haag, A. Dimitrakopoulou, S. Ziegler, F. Oberdorfer, V. Rudat, and G. van Kaick, "Glucose uptake, perfusion, and cell proliferation in head and neck tumors: relation of positron emission tomography to flow cytometry," J. Nucl. Med., vol. 32, no. 8, pp. 1548-55, Aug. 1991.

[2b] E. Henriksson, E. Kjellen, P. Wahlberg, T. Ohlsson, J. Wennerberg, and E. Brun, "2-Deoxy-2-[18F]fluoro-D-glucose uptake and correlation to intratumoral heterogeneity," Anticancer Res., vol. 27, no. 4B, pp. 2155-9, 2007.
W. Jentzen, L. Freudenberg, E. G. Eising, M. Heinze, W. Brandau, and A. Bockisch, "Segmentation of PET volumes by iterative image thresholding," J. Nucl. Med., vol. 48, no. 1, pp. 108-14, Jan. 2007.

L. Drever, W. Roa, A. McEwan, and D. Robinson, "Iterative threshold segmentation for PET target volume delineation," Med. Phys., vol. 34, no. 4, pp. 1253-1265, 2007.

E. Day, J. Betler, D. Parda, B. Reitz, A. Kirichenko, S. Mohammadi, and M. Miften, "A region growing method for tumor volume segmentation on PET images for rectal and anal cancer patients," Med. Phys., vol. 36, no. 10, pp. 4349-4358, 2009.

H. Zaidi and I. El Naqa, "PET-guided delineation of radiation therapy treatment volumes: a survey of image segmentation techniques," Eur. J. Nucl. Med. Mol. Imaging, vol. 37, no. 11, pp. 2165-87, Nov. 2010.

S. Belhassen and H. Zaidi, "A novel fuzzy C-means algorithm for unsupervised heterogeneous tumor quantification in PET," Med. Phys., vol. 37, no. 3, pp. 1309-1324, 2010.

M. Hatt, C. Cheze Le Rest, A. Turzo, C. Roux, and D. Visvikis, "A Fuzzy Locally Adaptive Bayesian Segmentation Approach for Volume Determination in PET," IEEE Trans. Med. Imaging, vol. 28, no. 6, pp. 881-893, 2009.

[4] E. C. Ford, P. E. Kinahan, L. Hanlon, A. Alessio, J. Rajendran, D. L. Schwartz, and M. Phillips, "Tumor delineation using PET in head and neck cancers: Threshold contouring and lesion volumes," Med. Phys., vol. 33, no. 1, pp. 4280-4288, 2006.
[5] L. Dice, "Measures of the mount of ecologic association between species.," Ecology;, vol. 26, pp. 297-302, 1945.
[6] B. Berthon, C. Marshall, A. Edwards, M. Evans, and E. Spezi, "Influence of cold walls on PET image quantification and volume segmentation.," Med. Phys., vol. 40, no. 8, pp. 1-13, 2013.
[7] B. Berthon, C. Marshall, and E. Spezi, "A novel phantom technique for evaluating the performance of PET auto-segmentation methods in delineating heterogeneous and irregular lesions" Eur. ], Nucl. Med. Moi. Imaging. Physics, vol. 2 (13), 2015
[8] R. J. McGurk, J. Bowsher, J. a Lee, and S. K. Das, "Combining multiple FDG- PET radiotherapy target segmentation methods to reduce the effect of variable performance of individual segmentation methods.," ed Phys., vol. 40, no. 4, pp. 1-9, Apr. 2013 ,
[8a] X. Geets, J. a Lee, A. Boi, M. Lonneux, and V. Gregoire, "A gradient -based method for segmenting FDG-PET images: methodology and validation ," Eur. J. Nucl. Med. Mol. Imaging, vol. 34, no. 9, pp. 1427-38, Sep. 2007.
[8b] P, Tylski, G, Bonniaud, E, Decenciere, J. Stawiaski, J, Couiot, and D.
Lefkopoulos, "F-FDG PET images segmentation using morphological watershed : a phantom study," Nucl. Sci. Symp. Conf. Rec. 2006, IEEE, vol. 4. IEEE, pp. 2063-2067, 2006. B. Berthon, E. Spezi, C. R. Schmidtlein, A. Apte, P. Gal avis, H. Zaidi, E. De Bernards, J. A. Lee, A. Kirov, and inc. Other contributors, "Development of a software platform for evaluating automatic PET segmentation methods: the PET- AS set," Radiother. OncoL, vol. I l l, no. Suppl. l, p. 177, 2014.
B. Berthon, I. Haggstrom, A. Apte, B. J. Beattie, A. S. Kirov, J. L. Humm, C. Marshall, E. Spezi, A. Larsson, and C. R, Schmidtlein, "PETSTEP: A Fast Positron Emission Tomography Simulator for Synthetic Lesion Simulation," Eur. J. Niicl. Med. Mol. Imaging, no. Suppl 1 ., 2014,
J. O. Deasy, A. I Blanco, and V. H. Clark, "CERR: A computational environment for radiotherapy research," Med. Phys., vol. 30, no. 5, pp. 979- 985, 2003.
R. M. Haralick, K. Shanmugam, and I. Dinstein, "Texturai Features for Image Classification," IEEE Trans. Syst. Man Cyhern., vol. SMC-3, no. 6, pp. 610- 621, 1973.
B. Berthon, C. Marshall, M. Evans, E. Spezi, "Evaluation of advanced automatic PET segmentation methods using non-spherical thin-wall inserts", Med Phys. 2014, 41(2):022502.
B. Berthon, R, B. Holmes, C. Marshall, S. Jayaprakasam, M. Evans, E, Spezi. "Evaluation of several automatic PET-segmentation algorithms for radiotherapy treatment planning in H&N using a printed sub-resolution sandwich phantom", in Proc. EANM 2013, Lyon.
B. Berthon, R. B. Holmes, C. Marshall, M. Evans, A. Edwards, E. Spezi. "Validating 8F-FDG PET automated segmentation methods for clinical use in H&N", In Proc. ICMP 2013, Brighton.
J. O. Deasy, A. I. Blanco, and V. H. Clark, "CERR: A computational environment for radiotherapy research," Med. Phys., vol. 30, no. 5, pp. 979- 985, 2003. [14] B. Berthon, I. Haggstrom, A. Apte, B. J. Beattie, A. S. Kirov, J. L. Humm, C. Marshall, E. Spezi, A. Larsson, and C. R. Schmidtlein, "PETSTEP: Generation of synthetic PET lesions for fast evaluation of segmentation methods," Physica M di a: European Journal of Medica Physics, In Press, 2015.
[15] H.-H. Loh, J.-G. Leu, and R. C. Luo, "The analysis of natural textures using run length features," IEEE Trans. Ind, Electron,, vol. 35, no. 2, pp. 323-328, May 1988.
[16] F. Tixier, C. C. Le Rest, M, Hatt, N, Albarghach, O. Pradier, J.-P, Metges, L.
Corcos, and D. Visvikis, "Intratumor heterogeneity characterized by textural features on baseline 18F-FDG PET images predicts response to concomitant radiochemotherapy in esophageal cancer.," J. Nucl. Med,, vol. 52, no. 3, pp. 369-78, Mar. 2011.
Within this specification, the term "about" means plus or minus 20%, more preferably plus or minus 10%, even more preferably plus or minus 5%, most preferably plus or minus 2%.
It should be understood that various changes and modifications to the presently preferred embodiments described herein will be apparent to those skilled in the art. Such changes and modifications can be made without departing from the spirit and scope of the present invention and without diminishing its attendant advantages. It is therefore intended that such changes and modifications are covered by the appended claims.

Claims

1. A method for optimising segmentation of a tumour on an image, the method comprising:
a calculation step comprising calculating a value for one or more parameters associated with the image;
a prediction step comprising making a prediction of the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting the image, wherein the prediction is based upon the one or more values calculated in the calculation step and a set of rules for the selection of the best automatic segmentation method to apply to a given tumour image; and
a segmentation step comprising segmenting the image using the automatic segmentation method predicted as the most accurate in the prediction step.
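By way of illustration only, and not forming part of the claims, the three claimed steps could be sketched in Python along the following lines. Every name here (the feature extractors, the trained prediction model and the dictionary of candidate PET-AS methods) is a hypothetical placeholder, not terminology taken from the specification:

```python
def optimise_segmentation(image, feature_extractors, prediction_model, pet_as_methods):
    """Minimal sketch of the claimed pipeline; all arguments are hypothetical.

    image              -- 3D numpy array of PET intensities around the tumour
    feature_extractors -- mapping of parameter name -> callable(image) -> float
    prediction_model   -- object with predict_best(features) -> method name
    pet_as_methods     -- mapping of method name -> callable(image) -> binary mask
    """
    # Calculation step: compute a value for each image-derived parameter.
    features = {name: f(image) for name, f in feature_extractors.items()}

    # Prediction step: apply the set of rules (e.g. decision trees) to pick
    # the automatic segmentation method expected to be most accurate.
    best_method = prediction_model.predict_best(features)

    # Segmentation step: delineate the tumour with the predicted method.
    return pet_as_methods[best_method](image)
```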
2. A method according to claim 1, wherein the one or more parameters relate to at least one of the size, homogeneity, geometry, intensity or texture of the tumour on the image.
3. The method according to claim 2, wherein the one or more parameters comprise one or more of:
a) tumour volume;
b) tumour-to-background ratio using peak tumour intensity;
c) Coefficient of Variation (COV), defined as:
$COV = \frac{\sigma}{\mu}$
where $\sigma$ and $\mu$ are the standard deviation and mean of the intensity values of the tumour on the image, respectively;
d) Intensity Variability ($I_V$), wherein the image of the tumour is divided into voxels, and wherein $I_V$ is defined as:
$I_V = \sum_i \Big( \sum_j n(i,j) \Big)^2$
with $n(i,j)$ the number of voxels in regions of intensity level $i$ and size $j$;
e) Size-zone Variability ($S_V$), wherein the image of the tumour is divided into voxels, and wherein $S_V$ is defined as:
$S_V = \sum_j \Big( \sum_i n(i,j) \Big)^2$
with $n(i,j)$ the number of voxels in regions of intensity level $i$ and size $j$;
f) NI: the number of intensity levels within the image of the tumour;
g) NR: the number of homogeneous regions within the image of the tumour; and
h) NRS: the number of homogeneous region sizes within the image of the tumour.
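Purely as an illustrative aid (again outside the claims), the zone-based parameters c) to h) above could be computed as sketched below. The quantisation into a fixed number of intensity levels and the use of 26-connectivity to define homogeneous regions are assumptions made for the sketch, not requirements of the claims:

```python
import numpy as np
from scipy import ndimage

def tumour_zone_features(image, mask, n_levels=8):
    """Sketch of the parameters of claim 3: COV, IV, SV, NI, NR, NRS.

    image, mask -- 3D PET intensities and a boolean tumour delineation
    n_levels    -- assumed quantisation depth (hypothetical choice)
    """
    vals = image[mask]
    cov = float(vals.std() / vals.mean())  # c) Coefficient of Variation

    # Quantise tumour intensities into discrete levels 1..n_levels.
    edges = np.linspace(vals.min(), vals.max() + 1e-9, n_levels + 1)
    levels = np.digitize(image, edges)
    levels[~mask] = 0

    # Collect every homogeneous region as a (level, size) pair.
    regions = []
    conn = np.ones((3, 3, 3))  # 26-connectivity: an implementation choice
    for i in range(1, n_levels + 1):
        binary = levels == i
        labelled, n = ndimage.label(binary, structure=conn)
        if n:
            sizes = ndimage.sum(binary, labelled, index=range(1, n + 1))
            regions += [(i, int(s)) for s in sizes]

    # n(i,j): number of voxels in regions of intensity level i and size j.
    n_ij = {}
    for i, size in regions:
        n_ij[(i, size)] = n_ij.get((i, size), 0) + size

    iv = sum(sum(v for (i, _), v in n_ij.items() if i == lev) ** 2
             for lev in {i for i, _ in n_ij})   # d) Intensity Variability
    sv = sum(sum(v for (_, j), v in n_ij.items() if j == sz) ** 2
             for sz in {j for _, j in n_ij})    # e) Size-zone Variability
    ni = len({i for i, _ in regions})           # f) number of intensity levels
    nr = len(regions)                           # g) number of homogeneous regions
    nrs = len({j for _, j in regions})          # h) number of region sizes
    return dict(COV=cov, IV=iv, SV=sv, NI=ni, NR=nr, NRS=nrs)
```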
4. The method according to claim 2, wherein the one or more parameters comprise one or more of:
a) HDS: Hausdorff Distance to Sphere, quantifying the distance between the segmented tumour surface A and the surface B of a sphere of the same volume and same centre of mass, using the Modified Hausdorff Distance:
$HDS = HD(A,B) = \max\left( \frac{1}{N_A} \sum_{a \in A} d(a,B),\ \frac{1}{N_B} \sum_{b \in B} d(b,A) \right)$
with $A$ and $N_A$, $B$ and $N_B$ the set of points and number of points within the segmented surface and the sphere surface respectively, and $d(a,B)$ the minimal Euclidean distance between point $a$ and the points in $B$; and
b) non-sphericity.
5. The method according to claim 3, wherein the one or more parameters additionally comprise one or more of:
i) HDS: Hausdorff Distance to Sphere, quantifying the distance between the segmented tumour surface A and the surface B of a sphere of the same volume and same centre of mass, using the Modified Hausdorff Distance:
$HDS = HD(A,B) = \max\left( \frac{1}{N_A} \sum_{a \in A} d(a,B),\ \frac{1}{N_B} \sum_{b \in B} d(b,A) \right)$
with $A$ and $N_A$, $B$ and $N_B$ the set of points and number of points within the segmented surface and the sphere surface respectively, and $d(a,B)$ the minimal Euclidean distance between point $a$ and the points in $B$; and
j) non-sphericity.
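As a non-limiting illustration of the HDS parameter, the sketch below represents the equal-volume sphere by points sampled on its surface (the Fibonacci-spiral sampling and its density are assumptions of the sketch) and takes the segmented surface as the centres of boundary voxels:

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def hausdorff_distance_to_sphere(mask, spacing=(1.0, 1.0, 1.0), n_sphere=2000):
    """Sketch of HDS: Modified Hausdorff Distance between the segmented
    surface and an equal-volume, equal-centroid sphere.

    mask     -- boolean 3D tumour delineation
    spacing  -- voxel size in mm (assumed)
    n_sphere -- number of points sampled on the sphere surface (assumed)
    """
    mask = np.asarray(mask, bool)
    spacing = np.asarray(spacing, dtype=float)

    # Surface A: centres of mask voxels that touch the background.
    surf = np.argwhere(mask & ~ndimage.binary_erosion(mask)) * spacing

    # Sphere B: same volume and same centre of mass as the segmentation.
    volume = mask.sum() * spacing.prod()
    radius = (3.0 * volume / (4.0 * np.pi)) ** (1.0 / 3.0)
    centre = np.asarray(ndimage.center_of_mass(mask)) * spacing
    k = np.arange(n_sphere)  # roughly uniform Fibonacci-spiral sampling
    phi = np.arccos(1.0 - 2.0 * (k + 0.5) / n_sphere)
    theta = np.pi * (1.0 + 5 ** 0.5) * k
    sphere = centre + radius * np.stack(
        [np.sin(phi) * np.cos(theta), np.sin(phi) * np.sin(theta), np.cos(phi)],
        axis=1)

    # Modified Hausdorff Distance: max of the two mean nearest-point distances.
    d_a = cKDTree(sphere).query(surf)[0].mean()   # mean over a in A of d(a, B)
    d_b = cKDTree(surf).query(sphere)[0].mean()   # mean over b in B of d(b, A)
    return max(d_a, d_b)
```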
6. The method of any one of the preceding claims, wherein the set of rules is based upon the conformity between delineation contours produced by each of the plurality of automatic segmentation methods for a plurality of training images and reference delineation contours for the plurality of training images.
7. The method of claim 6, wherein the conformity is represented by a Dice Similarity Coefficient (DSC).
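For illustration, the Dice Similarity Coefficient of claim 7, $DSC = 2|A \cap B| / (|A| + |B|)$, is straightforward to compute on a pair of boolean delineation masks:

```python
import numpy as np

def dice_similarity_coefficient(mask_a, mask_b):
    """DSC = 2|A intersect B| / (|A| + |B|) for two boolean masks."""
    mask_a, mask_b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / (mask_a.sum() + mask_b.sum())
```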
8. The method of claim 6 or 7, wherein the set of rules includes information on the effect of different values for the one or more parameters on the conformity of delineation contours produced by each of the plurality of segmentation methods with reference delineation contours.
9. The method of any one of the preceding claims, wherein the set of rules is codified as a Decision Tree for each of the plurality of automatic segmentation methods.
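One illustrative, non-limiting way to codify the set of rules of claim 9 is a regression tree per segmentation method, each mapping the image parameters to an expected conformity (e.g. DSC); the prediction step then selects the method whose tree predicts the highest score. The use of scikit-learn and the hyper-parameters shown are assumptions of the sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class DecisionTreeRuleSet:
    """Hypothetical rule set: one tree per PET-AS method predicting DSC."""

    def __init__(self, method_names):
        self.trees = {m: DecisionTreeRegressor(max_depth=5) for m in method_names}

    def fit(self, features, dsc_per_method):
        # features       -- (n_images, n_parameters) array from training images
        # dsc_per_method -- mapping method -> (n_images,) conformity scores
        for m, tree in self.trees.items():
            tree.fit(features, dsc_per_method[m])

    def predict_best(self, feature_vector):
        # Return the method whose tree predicts the highest conformity.
        x = np.asarray(feature_vector, dtype=float).reshape(1, -1)
        scores = {m: float(t.predict(x)[0]) for m, t in self.trees.items()}
        return max(scores, key=scores.get)
```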
10. The method of any one of the preceding claims, wherein the calculation step comprises using a single automatic segmentation method to calculate the one or more parameter values from the image.
11. The method of claim 10, wherein the single automatic segmentation method comprises one of the plurality of automatic segmentation methods.
12. The method of claim 11 as dependent directly or indirectly upon claim 6, wherein the single automatic segmentation method comprises the automatic segmentation method having the lowest standard deviation in its conformity for the plurality of training images.
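The selection in claim 12 of the single bootstrap method used by the calculation step could be expressed as below, where dsc_per_method is a hypothetical mapping from each method to its conformity scores over the training images:

```python
import numpy as np

def most_consistent_method(dsc_per_method):
    """Pick the PET-AS method with the lowest standard deviation in its
    conformity (e.g. DSC) across the training images, per claim 12."""
    return min(dsc_per_method, key=lambda m: np.std(dsc_per_method[m]))
```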
13. The method of any one of the preceding claims, wherein the plurality of training images comprise simulated tumour images having values for the parameters which are within a clinically-observed range.
14. The method of any one of the preceding claims, wherein the calculation, prediction and segmentation steps are performed as an automatic process.
15. The method of any one of the preceding claims, wherein the image is produced by Positron Emission Tomography or Single Photon Emission Computed Tomography.
16. The method of any one of the preceding claims, wherein the tumour is in a subject's head and/or neck area.
17. A method of building a prediction model for predicting the most accurate automatic segmentation method, from amongst a plurality of automatic segmentation methods, for segmenting a tumour on an image, wherein the segmentation comprises generating a contour for delineating the tumour on the image, the method comprising:
a calculation step comprising calculating a value for one or more parameters associated with the image;
using the plurality of automatic segmentation methods to segment a plurality of training images of training target portions;
evaluating the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and expected delineation contours; and
making a determination of the effect of changes in one or more parameters associated with the training target portions on the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and the reference/expected delineation contours, such that the prediction model is for use in a method of optimising segmentation of a tumour on an image by making a prediction of the most accurate automatic segmentation method based on the determination of the effect of the changes in the one or more parameters associated with the training target portions.
18. The method of claim 17, wherein the conformity is represented by a Dice Similarity Coefficient.
19. The method of any one of claims 17 to 18, further comprising building a Decision Tree for each of the plurality of automatic segmentation methods based upon the conformity between the delineation contours produced for the training images by the plurality of automatic segmentation methods and expected delineation contours.
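Tying the above together, a non-authoritative sketch of claims 17 to 19 might look as follows; it reuses the hypothetical helpers from the earlier sketches (feature_extractors, dice_similarity_coefficient, DecisionTreeRuleSet), and assumes reference delineations are supplied as boolean masks:

```python
import numpy as np

def build_prediction_model(training_images, reference_masks, pet_as_methods,
                           feature_extractors):
    """Sketch of claims 17-19: segment each training image with every method,
    score conformity against the reference contour with the DSC, and codify
    the outcome as one decision tree per method."""
    features = []
    dsc = {m: [] for m in pet_as_methods}
    for image, reference in zip(training_images, reference_masks):
        # Calculation step on each training image.
        features.append([f(image) for f in feature_extractors.values()])
        # Segment with every method and evaluate conformity via the DSC.
        for m, segment in pet_as_methods.items():
            dsc[m].append(dice_similarity_coefficient(segment(image), reference))
    # Codify the rules as one decision tree per method (claim 19).
    model = DecisionTreeRuleSet(list(pet_as_methods))
    model.fit(np.asarray(features), {m: np.asarray(v) for m, v in dsc.items()})
    return model
```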
20. Apparatus arranged to perform the method of any one of the preceding claims.
21. A computer-readable medium having computer-executable instructions adapted to cause a computer system to perform the method of any one of claims 1 to 20.
22. A computer programmed to perform the method of any one of claims 1 to 19.
23. A method substantially as hereinbefore described with reference to the accompanying drawings.