ITUB20150390A1

ITUB20150390A1 - METHOD FOR THE DIAGNOSIS OF ENDOMETRIAL CARCINOMA

Info

Publication number: ITUB20150390A1
Application number: ITUB2015A000390A
Authority: IT
Inventors: Jacopo Troisi; Giovanni Scala; Pietro Campiglia; Fulvio Zullo; Maurizio Guida
Original assignee: Hosmotic Srl
Priority date: 2015-02-27
Filing date: 2015-02-27
Publication date: 2016-08-27
Also published as: WO2016135119A1

Description

"METHOD FOR THE DIAGNOSIS OF ENDOMETRIAL CARCINOMA"

DESCRIPTION

The present invention relates to a method for the diagnosis of endometrial carcinoma by means of a metabolomic analysis of the blood and the bioinformatic processing of the metabolic profiles by means of classification models.

Endometrial carcinoma is the most common invasive cancer of the female genital tract and accounts for 7% of all invasive cancers in women (excluding skin cancers).

Endometrial carcinoma is rare in women under 40 years of age. Incidence peaks between 55 and 65 years. Clinicopathological studies and molecular analyses support the classification of endometrial carcinoma into two broad categories: Type I and Type II.

Type I is the most frequent, accounting for more than 80% of cases. It mimics proliferative endometrial glands and is therefore referred to as endometrioid carcinoma. It generally arises against a background of endometrial hyperplasia and, like hyperplasia, is associated with obesity, diabetes, hypertension, infertility and unopposed estrogen stimulation. Recent studies have provided further evidence that endometrial hyperplasia is a precursor of endometrial carcinoma (Mutter GL et al. Allelotype mapping of unstable microsatellites establishes direct lineage continuity between endometrial precancers and cancers. Cancer Res 56:4483, 1996). Type II endometrial carcinoma generally affects women a decade later than type I (65-75 years) and, unlike type I, develops mainly against a background of endometrial atrophy.

Type II accounts for less than 15% of endometrial carcinoma cases and is poorly differentiated (G3). The most common subtype is serous carcinoma, so called because of its biological and morphological overlap with ovarian carcinoma. Less common histological subtypes also belong to this category: clear cell carcinoma and malignant mixed Müllerian tumor.

Currently, mass screening of an asymptomatic premenopausal and postmenopausal population for the early diagnosis of endometrial carcinoma, as is done for cervical carcinoma by means of the Pap test, is not feasible.

Studies on exocervical samples have shown a false-negative rate of about 40-50%, because exfoliated endometrial cells, having been exposed to the vaginal environment, show alterations that cause them to lose the features that allow a tumor cell to be distinguished from a normal one. On the other hand, prognosis is closely linked to early diagnosis: 5-year survival drops drastically from 78-98% when the disease is diagnosed at stage I to 3-10% when it is diagnosed at stage IV.

Several thousand human serum metabolites have been identified to date, and the application of metabolomics has allowed the development of biomarkers for numerous disorders such as schizophrenia (Kaddurah-Daouk R., Metabolic profiling of patients with schizophrenia, PLOS Med 2006; 8:e363), meningitis (Subramanian A. et al., Proton MR/CSF analysis and a new software as predictors for the differentiation of meningitis in children, NMR Biomed 2005; 18:213-25) and colon cancer (Denkert C. et al., Metabolite profiling of human colon carcinoma - deregulation of TCA cycle and amino acid turnover, Mol. Cancer 2008; 7:1-15). However, the use of metabolomics in gynecology has so far been limited to studies on ovarian carcinoma (Fan L. et al., Identification of metabolic biomarkers to diagnose epithelial ovarian cancer using a UPLC/QTOF/MS platform, Acta Oncologica 2012; 51:473-479). To date, no studies using gas chromatography coupled with mass spectrometry and chemometric techniques for the diagnosis of endometrial carcinoma have been reported in the literature.

There is therefore a strongly felt need today for a non-invasive diagnostic system that makes it possible to screen the population at risk, by age or by known risk factors, in order to identify this fearsome female neoplasm at an early stage.

Advantageously, the present invention solves the above problems by means of a non-invasive method for the diagnosis of endometrial carcinoma. There are currently no other non-invasive diagnostic methods that allow such histological discrimination of this type of tumor.

The subject matter of the invention is described in detail below.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows the result of the OPLS-DA analysis based on the metabolomic profile data of patients with endometrial carcinoma and of healthy controls.

The scores plot discriminates between the two classes without overlap. Triangles represent patients with endometrial carcinoma, while circles represent healthy subjects. The principal components PC1 and PC2 shown on the axes describe 16.5% and 14.9% of the total variance, respectively.

Figure 2 shows, according to the invention, the histological classification (type I carcinoma vs type II carcinoma) obtained with the PLS-DA model. Dots represent the metabolomic profiles of women with type I endometrial carcinoma, while triangles represent those of patients with type II endometrial carcinoma. Only one of these samples is placed by the model in a zone that cannot be unambiguously attributed to the correct area.

DEFINITIONS

Metabolomics commonly means the analysis of cellular processes through the study of the metabolic profile of the small molecules of an organism. By metabolomic analysis, the inventors mean a process aimed at identifying and determining the concentration of the largest possible number of metabolites within a biological sample.

Metabolites commonly means the small molecules derived from the anabolic or catabolic biological processes of a cell or a set of cells. By the term metabolites, the inventors mean all molecules with a molecular weight below 1000 Dalton that can potentially be identified and measured within a biological sample.

Metabolic profile means the specific pattern that the metabolites assume in the patient's blood according to their relative proportions.

PLS-DA (Partial Least Squares Discriminant Analysis) is a supervised method that uses multivariate regression techniques to extract, through linear combinations of the original variables (X), the information that can predict membership of a given class (Y). To evaluate how effectively the classes are discriminated, a permutation test is performed. In each permutation, a PLS-DA model is built between the data (X) and the permuted class labels (Y), using the optimal number of components determined by cross-validation for the model based on the original class assignment. Two types of test statistic are used to measure the ability to discriminate between the classes. The first is based on the prediction accuracy during model training. The second is based on the separation distance, expressed as the ratio of the between-class to the within-class sum of squared distances (B/W ratio).
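The permutation test with the B/W ratio as test statistic can be illustrated with a minimal sketch. This is a simplification under stated assumptions: the statistic is computed directly on a small data matrix rather than on the output of a fitted PLS-DA model, and the function names `bw_ratio` and `permutation_p_value` are hypothetical, introduced only for illustration.

```python
import random


def bw_ratio(X, y):
    """Between-class over within-class sum of squared distances (B/W ratio)."""
    n_features = len(X[0])
    overall = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    between = 0.0
    within = 0.0
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # Squared distance of the class centroid from the overall mean,
        # weighted by the number of samples in the class.
        between += len(rows) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        # Squared distances of the class members from their own centroid.
        within += sum((v - c) ** 2 for r in rows for v, c in zip(r, centroid))
    return between / within


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Fraction of label permutations whose B/W ratio reaches the observed one."""
    rng = random.Random(seed)
    observed = bw_ratio(X, y)
    labels = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # X is left intact, only the class labels are permuted
        if bw_ratio(X, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

A small p-value suggests that the observed class separation is unlikely to arise from a random assignment of labels.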

OPLS-DA (Orthogonal Partial Least Squares - Discriminant Analysis) is an important development of the PLS-DA technique, proposed to handle class-orthogonal variation in the data matrix. OPLS-DA improves the classification performance of PLS-DA models. Classification performance is estimated by "k-fold cross validation", in which the data matrix is divided into k random subsets. In each computation cycle, one of the k subsets is set aside as the test set and the remaining k-1 subsets serve as the training set. Each of the k subsets is used once as the test set, generating k accuracy values. The classification accuracy is calculated as the mean of the accuracy rates over the k subsets. The model is also cross-validated with the "leave one out cross validation" (LOOCV) method in order to validate it.

Before being split into k subsets, the data matrix is scaled to zero mean and unit variance. In other words, the mean and standard deviation of the training data are used to center and scale the test data. Once the model has been trained, it is checked for "overfitting". To do this, a validation set with known class labels is created and it is checked whether it yields an accuracy rate comparable to that obtained on the training data.

Another method is an R2/Q2 validation plot, which helps assess the risk that the current model is spurious, that is, that the model fits the training subsets well but does not predict Y equally well for new observations. The R2 value represents the percentage of variance of the training set that can be explained by the model. The Q2 value is a cross-validated measure of R2. This validation compares the goodness of fit of the original model with the goodness of fit of several models based on data in which the order of the Y observations has been randomly permuted, while the X matrix is kept intact. The criteria for the validity of the model are as follows:

1. All Q2 values on the permuted datasets must be lower than the Q2 value estimated on the actual dataset. If this is not the case, the model is overfitted.

2. The regression line (the line joining the actual Q2 point to the centroid of the cluster of permuted Q2 values) has a negative y-axis intercept.
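The two validity criteria can be checked with a few lines of code. The sketch below assumes the usual layout of such a validation plot, which the text does not spell out: the x-axis is the correlation between the permuted and the original Y, so the actual model sits at x = 1. The function names are hypothetical.

```python
def q2_intercept(q2_real, permuted):
    """Y-axis intercept of the line joining the actual Q2 point to the
    centroid of the permuted points.

    permuted is a list of (correlation_with_original_y, q2) pairs.
    Assumption: the actual model is plotted at x = 1.0.
    """
    cx = sum(c for c, _ in permuted) / len(permuted)
    cy = sum(q for _, q in permuted) / len(permuted)
    slope = (q2_real - cy) / (1.0 - cx)
    return q2_real - slope  # value of the line at x = 0


def model_is_valid(q2_real, permuted):
    """Apply both criteria: every permuted Q2 is lower than the actual Q2,
    and the intercept is negative."""
    all_lower = all(q2 < q2_real for _, q2 in permuted)
    return all_lower and q2_intercept(q2_real, permuted) < 0
```

For example, with an actual Q2 of 0.6 and permuted Q2 values clustered around -0.15 at a correlation of about 0.2, the intercept is about -0.34 and both criteria are met.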

Support Vector Machines (SVMs) are relatively new supervised machine learning techniques used for classification. SVMs were first proposed by Vapnik in 1982 (Vapnik, V. Estimation of Dependences Based on Empirical Data; Springer Verlag: New York, 1982). The basic principle of SVMs, which are essentially binary classifiers, is as follows: given a dataset with two classes, a linear classifier is constructed in the form of a maximum-margin hyperplane, which simultaneously minimizes the empirical classification error and maximizes the geometric margin. For datasets that are not linearly separable, the original data are mapped into a higher-dimensional space and a linear classifier is constructed in this new space (this is known as the "kernel" approach).

Considering a set of training data x_i ∈ R^n, i = 1, ..., m, where each x_i falls into one of the two categories y_i ∈ {-1, 1}, the SVM determines the hyperplane whose parameters are given by (w, b), obtained as the solution of the following convex optimization problem:

subject to the conditions

where c is the regularization parameter, which sets the trade-off between training accuracy and the prediction term, and ξ is a measure of the number of misclassification errors. Including the regularization term reduces the problem of overfitting.
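For reference, the textbook soft-margin SVM problem that matches the symbols described above (w, b, c and the slack variables ξ) is usually written as follows. This is the standard formulation, given here as an assumption about what the description refers to, not a reproduction of the formula in the original filing.

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\,\lVert w \rVert^{2} + c \sum_{i=1}^{m} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m
```

In this form, a larger c penalizes misclassified training points more heavily, while a smaller c favors a wider margin.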

DECISION TREES. Decision trees build classification models based on recursive partitioning of the data. Typically, a decision tree algorithm starts with the entire dataset, splits the data into two or more subsets based on the values of one or more attributes, and then repeatedly splits each subset into smaller subsets until the size of each subset reaches an appropriate level. The whole modeling process can be represented as a tree structure, and the resulting model can be summarized as a set of "if-then" rules. Decision trees are easy to interpret, computationally undemanding and able to cope with noisy data. Most decision trees address classification problems, such as the one that is the subject of this invention. In this context, the technique is also referred to as a classification tree. In the tree representation, a node represents a set of data, and the entire dataset is represented as the root node.
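Recursive partitioning can be sketched in a few dozen lines. The example below is an illustration only: it assumes binary splits on a single numeric attribute chosen by Gini impurity, which is one common choice and not necessarily the algorithm used by the inventors. All names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best split,
    or None when no attribute has more than one distinct value."""
    best = None
    for j in range(len(rows[0])):
        # Dropping the largest value guarantees a non-empty right subset.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(rows, labels, min_size=2):
    """Split recursively until a subset is pure or has at most min_size rows.
    A leaf is a class label; an inner node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(rows) <= min_size:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, j, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[j] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in left], [lab for _, lab in left], min_size),
            build_tree([r for r, _ in right], [lab for _, lab in right], min_size))


def predict(tree, row):
    """Follow the if-then rules from the root node down to a leaf."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree
```

Each inner node reads as a rule of the form "if attribute j is at most t, go left, otherwise go right", which is what makes the model easy to interpret.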

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a method for the diagnosis of endometrial carcinoma based on metabolomic analysis of the blood and on the integration of the results by means of a multivariate analysis using discriminant analysis models chosen from the group consisting of PLS-DA and OPLS-DA, or computer learning models chosen from the group consisting of SVM and decision tree.

The object of the invention is a method for the diagnosis of endometrial carcinoma based on metabolomic analysis of the blood, said method comprising the following phases:

(I) a training phase, comprising:

- GCMS or GCxGCMS analysis of blood samples taken from patients with endometrial carcinoma and from healthy controls;

- integration of the results by means of a multivariate analysis using at least one discriminant analysis model or one computer learning model to train at least one classification model;

(II) an attribution phase, comprising the GCMS or GCxGCMS analysis of an unknown blood sample and its attribution to a class on the basis of the classification model built in the training phase.

The multivariate analysis, carried out on the collected chromatograms using

- at least one discriminant analysis model selected from the group consisting of: PLS-DA and OPLS-DA, or

- a computer learning model selected from the group consisting of: SVM and decision tree;

has advantageously allowed the satisfactory dichotomous classification ("Healthy patient" vs "Patient with endometrial carcinoma") of unknown samples. The classification model obtained with PLS-DA multivariate analysis has even allowed the histological discrimination of the carcinoma (type I carcinoma vs type II carcinoma); there are currently no other non-invasive diagnostic methods that allow such histological discrimination of this type of tumor.

In said training phase (I), samples taken from patients with endometrial carcinoma and from healthy women with similar physical characteristics (BMI, age, comorbidities) and social characteristics (level of education, socio-economic status) are analyzed, and the classification models are trained on them. Said training phase is aimed at establishing and delimiting the characteristics of the metabolic profile present in the blood of the two groups. To obtain good predictivity of the classification model, the number of samples subjected to multivariate analysis must be at least 80% of the number of variables identified, said samples belonging to at least 2 different classes.
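The sample-size requirement stated above can be expressed as a simple check. The function name and signature are hypothetical; the 80% threshold and the two-class minimum are taken from the text.

```python
def enough_samples(class_labels, n_variables, min_fraction=0.8):
    """True when the training set meets both stated requirements:
    at least min_fraction * n_variables samples, drawn from at least 2 classes."""
    n_samples = len(class_labels)
    return n_samples >= min_fraction * n_variables and len(set(class_labels)) >= 2
```

For instance, a data matrix with 100 identified variables would call for at least 80 samples spread over both groups.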

In said attribution phase (II), the unknown samples are subjected to GCMS analysis and the resulting chromatograms are classified according to the previously trained models, which estimate their most probable class.

The method for the diagnosis of endometrial carcinoma of the present invention is not based on measuring the concentration of individual metabolites. Instead, the entire cluster of metabolites (the metabolic profile) is considered as the biomarker: because the metabolites are present in different proportions in the 2 groups, they allow a sample to be assigned to one of two different classes.

Preferably, said training phase (I) further comprises the following sub-phases:

- extraction and derivatization of the metabolites from blood samples taken from patients with endometrial carcinoma and from healthy controls;

- GCMS or GCxGCMS analysis of the extracted and derivatized metabolites to obtain a chromatogram for each sample, each chromatogram being a metabolic profile;

- creation of a data matrix of the metabolic profiles of the patients with endometrial carcinoma and of the healthy controls;

- strutturazione di almeno un modello di classificazione in seguito ad analisi multivariata della matrice di dati; detta analisi multivariata effettuata utilizzando almeno un modello di analisi discriminante oppure un modello di computer learning per addestrare almeno un modello di classificazione. Diversi modelli di classificazione si prestano allo scopo di cui alla presente invenzione; preferibilmente detti modelli di classificazione sono selezionati dal gruppo costituito da: PLS-DA, OPLS-DA, SVM e Decision Tree. - structuring of at least one classification model following multivariate analysis of the data matrix; said multivariate analysis carried out using at least one discriminant analysis model or a computer learning model to train at least one classification model. Different classification models lend themselves to the purpose of the present invention; preferably said classification models are selected from the group consisting of: PLS-DA, OPLS-DA, SVM and Decision Tree.

Preferably, said attribution phase (II) further comprises the following sub-phases:

- application of the first three sub-phases of said phase (I) to the unknown sample; and

- attribution of the metabolic profile to a class based on the classification model trained in phase (I).
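The two phases can be illustrated with a minimal sketch. It deliberately uses a nearest-centroid rule as a simple stand-in for the classifiers named in the text (PLS-DA, OPLS-DA, SVM, decision tree); the function names `train` and `attribute` and the toy profiles are hypothetical and not taken from the invention.

```python
from math import dist

def train(profiles, labels):
    """Phase (I): compute one mean profile (centroid) per class.

    Stand-in for PLS-DA/OPLS-DA/SVM/decision tree, for illustration only.
    """
    grouped = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }

def attribute(model, profile):
    """Phase (II): assign an unknown profile to the class with the nearest centroid."""
    return min(model, key=lambda label: dist(model[label], profile))

# Toy metabolic profiles (two metabolites each), purely illustrative.
profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["healthy", "healthy", "carcinoma", "carcinoma"]
model = train(profiles, labels)
print(attribute(model, [0.8, 0.2]))
```

The point of the sketch is the separation of concerns: the model is built once from labelled profiles, and unknown samples only pass through the same preprocessing followed by `attribute`.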

Preferably, the method of the present invention provides a classification model trained for a dichotomous classification: "Healthy patient" or "Patient with endometrial carcinoma". Even more preferably, said classification model is also trained for a histological classification of the carcinoma as "type I" or "type II".

Preferably, said extraction of the metabolites is carried out after adding a known aliquot of a reference compound to the sample; preferably, said reference compound is ribitol.

Preferably, said extraction is carried out using an extraction mixture consisting of an aqueous mixture of an alcohol and a polar aprotic solvent, preferably CH3OH/H2O/CHCl3, even more preferably in a volume ratio of 2-3/0.5-0.5/0.5-1.

In a preferred embodiment, said extraction and derivatization sub-phase comprises:

i) agitation of the sample obtained from the treatment with the extraction mixture;

ii) centrifugation of said sample obtained from i);

iii) derivatization of the supernatant obtained from ii) by treatment with methoxylamine hydrochloride in pyridine;

iv) silanization of the supernatant obtained from iii) with a silanizing agent selected from the group consisting of: N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA), N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA), hexamethyldisilazane (HMDS), 1-(trimethylsilyl)imidazole (TMSI), N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (MTBSTFA) and 1-(tert-butyldimethylsilyl)imidazole (TBDMSIM), optionally in the presence of trimethylchlorosilane (TMCS).

To obtain a separation of the metabolites suitable for the purposes of this invention, it is possible to operate with either one-dimensional or two-dimensional gas chromatography. Two-dimensional gas chromatography is preferable, because the higher resolving power of the technique gives better classification accuracy. However, as shown in the EXAMPLES section, it is also possible to operate with the more common one-dimensional gas chromatography.

The gas chromatograms obtained, preferably in SCAN mode, are integrated so as to identify all peaks with an area greater than 10 times the background noise of the gas chromatographic trace.

Using the peak of the reference compound (preferably ribitol) as the reference both for quantitative analysis and for aligning retention times, each peak is identified on the basis of one quantification m/z signal and at least 2 qualifier m/z signals. After integration, quantification is carried out using the normalized percentage area method. The results of this quantification (normalized percentage areas) are transferred to a matrix in which each sample is a row and the columns are the various metabolites, uniquely identified by their gas chromatographic retention time relative to the retention time of the reference compound. The first column of the matrix is used to define the class of the sample. In the simplest scenario only two classes are provided, "Healthy patient" and "Patient with endometrial carcinoma"; evidence that the invention works on the basis of this dichotomous classification is reported further on.
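A minimal sketch of how one matrix row could be built from an integrated chromatogram follows. It assumes that "normalized percentage area" means each retained peak area expressed as a percentage of the total retained area, and that columns are keyed by retention time relative to ribitol; both readings, the function name `build_profile_row` and the example numbers are assumptions made for illustration.

```python
def build_profile_row(label, peaks, ribitol_rt, noise, min_ratio=10.0):
    """Build one row of the data matrix from a list of (retention_time, area) peaks.

    - keeps only peaks whose area exceeds min_ratio times the background noise
    - expresses each retained area as a percentage of the total retained area
    - keys each column by retention time relative to the ribitol peak
    The class label is stored first, mirroring the "first column" of the matrix.
    """
    kept = [(rt, area) for rt, area in peaks if area > min_ratio * noise]
    total = sum(area for _, area in kept)
    row = {"class": label}
    for rt, area in kept:
        relative_rt = round(rt / ribitol_rt, 3)
        row[relative_rt] = 100.0 * area / total
    return row

# Hypothetical peaks: the third one is below 10x noise and is discarded.
peaks = [(10.0, 500.0), (20.0, 1500.0), (30.0, 5.0)]
print(build_profile_row("carcinoma", peaks, ribitol_rt=20.0, noise=1.0))
```

Rows built this way for all samples can then be stacked into the matrix described above, with missing metabolites filled according to whatever convention the statistical software expects.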

Multivariate statistical analysis of the data (PLS-DA and OPLS-DA) and machine learning (SVM and decision tree) are carried out on the normalized and corrected chromatograms (corrected to the ribitol peak area) using SIMCA-P 13.0 (Umetrics), RapidMiner 5.3 (Rapid-I) and R (R Foundation for Statistical Computing, Vienna). The values are mean-centered and the variance is normalized.
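Mean-centering combined with variance normalization is commonly called autoscaling. The sketch below shows the operation on a small matrix in plain Python; it assumes the sample standard deviation (n-1 denominator) is used, which the text does not specify, and the function name `autoscale` is hypothetical.

```python
from statistics import mean, stdev

def autoscale(matrix):
    """Mean-center each column and scale it to unit variance.

    matrix: list of rows (samples), each a list of metabolite values.
    Columns with zero spread are set to 0.0 to avoid division by zero.
    """
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # assumption: sample standard deviation
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, sds)]
        for row in matrix
    ]

scaled = autoscale([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(scaled)
```

After autoscaling, abundant and scarce metabolites contribute on the same scale, so the classifier is not dominated by the largest peaks.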

For the metabolic profile, the OPLS-DA model showed satisfactory modeling and predictive capability using one predictive component and three orthogonal components (R2Ycum = 0.995, Q2cum = 0.985). Figure 1 shows the class separation obtained with the OPLS-DA model.

A classification based on the histology of the carcinoma was also constructed using a PLS-DA model. As shown in Figure 2, only one sample falls in an ambiguous region of the class definition space.

The present invention may be better understood in the light of the following non-limiting embodiment examples.

EXAMPLES

The diagnostic methodology that is the object of the present invention was developed starting from a metabolomic analysis carried out on blood samples collected from patients with a confirmed diagnosis of endometrial carcinoma, before hysterectomy, and from a group of control women with similar physical and socio-economic characteristics but with a healthy uterus. Information on the histotype and grade of the neoplasm was collected after hysterectomy, on the basis of the pathological evidence obtained from analysis of the removed organ.

Collection of samples

The samples were taken from 88 women with endometrial carcinoma and 80 healthy women, who voluntarily donated blood samples. The study was approved by the ethics committee of the University of Magna Graecia in Catanzaro, and the patients and healthy volunteers signed an informed consent regarding the purposes of the study. Blood samples were taken immediately before hysterectomy using BD Vacutainer® tubes; the serum was frozen at -80 °C until the time of analysis. The diagnostic suspicion of endometrial carcinoma, raised by hysteroscopic examination with biopsy of the endometrial lesion, was confirmed by pathological examination of the uterus after hysterectomy. A control group was also set up by taking blood samples from women with no signs of endometrial carcinoma and with similar physical and socio-economic characteristics (weight, height, body mass index, age, marital status, educational level, etc.).

The demographic and clinical characteristics of the cases and controls are reported in Table 1, while Table 2 lists the pathological characteristics of the tumors investigated.

Table 1: characteristics of the study population

Table 2: pathological characteristics of the tumors investigated

Extraction and derivatization of metabolites

Fifty microliters of serum were transferred into 2 mL Eppendorf tubes, and 20 µL of a 1 g/L ribitol solution and 200 µL of a mixture of 2.5 parts methanol, 1 part water and 1 part chloroform (CH3OH:H2O:CHCl3, 2.5:1:1) were added.

The solution was vortexed for 30 seconds.

The samples were then centrifuged at 16000 rpm for 10 minutes at 4 °C. A 200 µL aliquot of the supernatant was collected, transferred into new 2 mL Eppendorf tubes, combined with 200 µL of H2O, vortexed for 30 seconds and centrifuged again at 16000 rpm for 5 minutes at 4 °C.

A 350 µL aliquot of the supernatant was again collected, transferred into 1.5 mL glass vials and lyophilized.

The lyophilized sample was treated with 50 µL of methoxylamine hydrochloride, 20 mg/mL in pyridine. The reaction was carried out at 37 °C under agitation (350 rpm) for 90 minutes. At the end, 50 µL of N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) with 1% trimethylchlorosilane (TMCS) were added to each vial, and the silanization reaction was carried out at 37 °C for 60 minutes under agitation (350 rpm).

MDGCMS analysis

For two-dimensional gas chromatography, a primary column (placed in the first oven) of the SLB-5ms type, 30.0 m x 0.25 mm ID with 1 µm film thickness [silphenylene polymer, practically equivalent in polarity to poly(5% diphenyl/95% methylsiloxane)] (J&W Agilent), was used and connected to position 1 of the 7-port interface (SGE). A BPX-50 column, 5.0 m x 0.50 mm ID with 0.25 µm film thickness, was connected to position 7 of the interface. A BPX-50 column, 1.5 m x 0.25 mm ID, 0.25 µm, was fixed at position 6 and connected to a flame ionization detector (FID) set at 320 °C, while the 5.0 m analytical column (chemically identical to the one connected to the FID) was connected to the qMS system.

The column connected to the FID was used to reduce the flows in the second dimension and to verify that a poorly represented compound was not the result of a random fluctuation of the chromatography.

A 40 µL external capillary (20 cm x 0.71 mm OD x 0.51 mm ID, stainless steel) was used to connect ports 3 and 4 of the SGE interface.
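As a quick consistency check on the stated dimensions, the internal volume of a cylindrical capillary can be computed from its length and inner diameter. The helper name `tube_volume_ul` is hypothetical; the formula is the ordinary cylinder volume.

```python
from math import pi

def tube_volume_ul(length_cm, inner_diameter_mm):
    """Internal volume of a cylindrical tube in microliters."""
    radius_cm = inner_diameter_mm / 10.0 / 2.0   # mm -> cm, diameter -> radius
    volume_cm3 = pi * radius_cm ** 2 * length_cm
    return volume_cm3 * 1000.0                   # 1 cm^3 = 1000 µL

# 20 cm long, 0.51 mm inner diameter
print(round(tube_volume_ul(20.0, 0.51), 1))
```

The result is close to 41 µL, which agrees with the nominal 40 µL capillary described above.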

The temperature program, identical for the two ovens, was: 80 °C for 1 minute, then heating to 320 °C at 3 °C/minute, held for 4 minutes.

The initial helium pressure (constant linear velocity) was set at 129.6 kPa. The initial auxiliary helium pressure of the APC (advanced pressure control), also operating under constant linear velocity conditions, was set at 90.4 kPa.

The injection volume was 1 µL with a split ratio of 1:5. The modulation period was set at 4.1 s (accumulation period 4.0 seconds, injection period 0.1 seconds). The conditions of the quadrupole mass spectrometer were: ionization mode: electron impact (70 eV); mass range: 40-600 m/z; scan speed: 10,000 amu/second.

GCMS analysis

For one-dimensional gas chromatography, a CP-Sil 8 CB GC column, 30 m, 0.25 mm, 1.00 µm (Agilent J&W), was used. The GC temperature program had an initial temperature of 100 °C for 1 minute, then heating to 320 °C at 4 °C/minute and a hold time of 4 minutes, for a total run time of 60 minutes.
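The stated total run time follows directly from the temperature program: initial hold, plus ramp duration, plus final hold. A small sketch, with a hypothetical helper name `run_time_min`, reproduces the arithmetic for both programs described in this section.

```python
def run_time_min(start_c, end_c, rate_c_per_min, initial_hold_min, final_hold_min):
    """Total run time of a single-ramp GC temperature program, in minutes."""
    ramp_min = (end_c - start_c) / rate_c_per_min
    return initial_hold_min + ramp_min + final_hold_min

# One-dimensional GC: 100 °C (1 min) -> 320 °C at 4 °C/min, 4 min hold
print(run_time_min(100, 320, 4, 1, 4))  # 60.0, matching the stated run time

# Two-dimensional GC: 80 °C (1 min) -> 320 °C at 3 °C/min, 4 min hold
print(run_time_min(80, 320, 3, 1, 4))   # 85.0
```

The 85-minute figure for the two-dimensional program is derived here from the stated program and is not given explicitly in the text.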

The initial helium pressure (constant linear velocity of 39 cm/s) was set at 83.7 kPa. The injection volume was 2 µL with a split ratio of 1:5. The conditions of the quadrupole mass spectrometer were: ionization mode: electron impact (70 eV); mass range: 35-600 m/z; scan speed: 3,333 amu/second, with a solvent cut time of 4.5 minutes.

Creation of a data matrix

More than 250 signals are normally detected in a TIC chromatogram. Some of these peaks were not investigated further because no matches were found in other samples, because their concentration was too low, or because their spectral quality was too poor for them to be confirmed as metabolites.

A total of 198 endogenous metabolites, such as amino acids, organic acids, carbohydrates, fatty acids and steroids, were detected. For peak identification, the linear retention index (LRI) was used, with a maximum tolerated difference of 10 between the tabulated Kovats index and the experimental one, while the minimum match for the library search was set at 85%. Two libraries were used: NIST11 and a purpose-built library developed by derivatizing over 500 metabolites under the same conditions as the analyzed samples. The peak areas were normalized and corrected to the ribitol signal. The results were summarized in a comma-separated values (CSV) matrix file and loaded into appropriate software for statistical processing.
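The two acceptance criteria for peak identification can be expressed as a simple filter. This is a sketch under the assumption that both criteria must hold at once; the function name `accept_identification` and the example values are hypothetical.

```python
def accept_identification(lri_experimental, lri_tabulated, match_percent,
                          max_lri_diff=10.0, min_match=85.0):
    """Accept a library hit only if both criteria from the text are met:
    - the experimental and tabulated retention indices differ by at most 10
    - the spectral library match is at least 85%
    """
    lri_ok = abs(lri_experimental - lri_tabulated) <= max_lri_diff
    match_ok = match_percent >= min_match
    return lri_ok and match_ok

# Hypothetical library hits for one peak
print(accept_identification(1502.0, 1498.0, 91.0))  # True
print(accept_identification(1502.0, 1480.0, 95.0))  # False: index too far off
print(accept_identification(1502.0, 1498.0, 80.0))  # False: match too low
```

Requiring both criteria guards against spectra that look similar but elute at the wrong position, and the reverse.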

The gas chromatograms obtained in SCAN mode were integrated so as to identify all peaks with an area greater than 10 times the background noise of the gas chromatographic trace. Each peak was identified on the basis of one quantification m/z signal and at least 2 qualifier m/z signals. After integration, quantification was carried out using the normalized percentage area method; the ribitol peak was used as the reference both for quantitative analysis and for aligning retention times. The results of this quantification (normalized percentage areas) were transferred to a matrix in which each sample is a row and the columns are the various metabolites, uniquely identified by their gas chromatographic retention time. The first column of the matrix is used to define the class of the sample. In the simplest scenario only two classes are provided, "Healthy patient" and "Patient with endometrial carcinoma"; evidence that the invention works on the basis of this dichotomous classification is reported further on. Further evidence was obtained that the different classification models tested can also predict the histotype and the grading of the neoplasm.

Statistical analysis

Multivariate statistical analysis of the data (PLS-DA and OPLS-DA) and machine learning (SVM and decision tree) were carried out on the normalized and corrected chromatograms (corrected to the ribitol peak area) using SIMCA-P 13.0 (Umetrics), RapidMiner 5.3 (Rapid-I) and R (R Foundation for Statistical Computing, Vienna).

The values were mean-centered and the variance was normalized.

Results

For the metabolic profile, the OPLS-DA model showed satisfactory modeling and predictive capability using one predictive component and three orthogonal components (R2Ycum = 0.995, Q2cum = 0.985). The other classification models showed good classification capability, although lower than that of OPLS-DA. Different approaches are possible for the final attribution of the class of the unknown sample: the response of a single model can be used, or the responses of the various models can be integrated in a more complex decision algorithm.
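The text does not specify how the responses of several models would be combined. One simple possibility, shown purely as an assumption, is a majority vote that leaves the sample undecided in case of a tie; the function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model class predictions by majority vote.

    predictions: dict mapping model name -> predicted class.
    Returns the winning class, or None if there is no prediction or a tie.
    """
    if not predictions:
        return None
    ranked = Counter(predictions.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent classes
    return ranked[0][0]

votes = {"PLS-DA": "carcinoma", "OPLS-DA": "carcinoma",
         "SVM": "healthy", "decision tree": "carcinoma"}
print(majority_vote(votes))
```

A weighted vote, for example giving more weight to the best-performing model, would be an equally plausible decision rule.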

Table 3 reports some indices of diagnostic performance used to evaluate the investigated models. Sensitivity was calculated as TP/(TP+FN), where TP is the number of true positives, i.e. samples correctly diagnosed by the proposed model as affected by endometrial carcinoma, and FN is the number of false negatives, i.e. samples erroneously identified as negative. Specificity was estimated as TN/(TN+FP), where TN is the number of true negatives, i.e. samples correctly diagnosed as healthy, and FP is the number of false positives, i.e. healthy subjects erroneously diagnosed as affected. The positive likelihood ratio (PLR) was calculated as Sensitivity/(1-Specificity), and the negative likelihood ratio (NLR) as (1-Sensitivity)/Specificity. The negative predictive value (NPV) was estimated as TN/(TN+FN), and the positive predictive value (PPV) as TP/(TP+FP). Accuracy is the proportion of all correct assignments and was estimated as (TP+TN)/(TP+FP+TN+FN), while reproducibility was estimated as the number of correct reassignments in 10 replicate analyses of one sample.
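The indices defined above can be computed from the four confusion-matrix counts as follows. The counts in the usage example are arbitrary illustrative numbers, not results of the study, and the function name `diagnostic_indices` is hypothetical.

```python
def diagnostic_indices(tp, fp, tn, fn):
    """Compute the diagnostic performance indices from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # PLR is unbounded when specificity is exactly 1
        "PLR": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "NLR": (1 - sensitivity) / specificity,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Arbitrary illustrative counts (not taken from Table 3)
for name, value in diagnostic_indices(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.3f}")
```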

Table 3 - Diagnostic performance of the investigated models

To identify the metabolites that contributed most to the class separation, the variable importance in projection (VIP) score was calculated for each component. VIP scores are the weighted sum of the squared PLS loadings, taking into account the amount of y-variance explained in each dimension. Two peaks showed a VIP score greater than 2 in both the PLS-DA and OPLS-DA models (in the endometrial carcinoma vs. control classification as well as in the type I vs. type II classification). These peaks were also identified as important nodes in the decision tree; together, these observations suggest that these variables carry great weight in the classification processes (data not shown). The first metabolite (VIP score = 2.3; spectrometric similarity = 91%; LRI = 11) was found to be a signal attributable to the amino acid glutamine, while the second (VIP score = 2.1; spectrometric similarity = 89%; LRI = 16) was found to be a signal attributable to -glucono lactone.
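The VIP calculation described above can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the squared, normalized PLS weights of each variable are weighted by the y-variance explained by each component; the description itself speaks of "loadings", so the use of weights here is an assumption. The function name `vip_scores` and the toy numbers are hypothetical and are not data from this study.

```python
import math

def vip_scores(weights, scores, y_loadings):
    """Hypothetical helper: VIP score of each variable in a PLS model.

    weights    -- p rows (variables) x A columns (components)
    scores     -- n rows (samples)   x A columns (components)
    y_loadings -- A values, one per component
    """
    n_vars = len(weights)
    n_comp = len(y_loadings)
    # y-variance explained by component a: q_a^2 * (t_a' t_a)
    explained = [
        y_loadings[a] ** 2 * sum(row[a] ** 2 for row in scores)
        for a in range(n_comp)
    ]
    total = sum(explained)
    # squared norm of each weight vector, used to normalize the weights
    norms_sq = [
        sum(weights[j][a] ** 2 for j in range(n_vars)) for a in range(n_comp)
    ]
    vip = []
    for j in range(n_vars):
        weighted = sum(
            explained[a] * weights[j][a] ** 2 / norms_sq[a]
            for a in range(n_comp)
        )
        vip.append(math.sqrt(n_vars * weighted / total))
    return vip

# Toy model (illustrative values only): 3 variables, 2 components, 4 samples
toy_weights = [[0.8, 0.0], [0.6, 0.0], [0.0, 1.0]]
toy_scores = [[1.0, 0.5], [-1.0, 0.5], [2.0, -0.5], [-2.0, -0.5]]
toy_y_loadings = [1.0, 0.2]
print(vip_scores(toy_weights, toy_scores, toy_y_loadings))
```

With this formulation the squared VIP scores average to 1 across variables, which is why thresholds above 1 (here, 2) are used to single out the most influential peaks.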

Claims

1. A method for the diagnosis of endometrial cancer based on metabolomic analysis of blood, said method comprising the following steps: (I) a training phase comprising: - GCMS or GCxGCMS analysis of blood samples taken from patients with endometrial cancer and from healthy controls; - integration of the results obtained through a multivariate analysis using at least one discriminant analysis model or a machine learning model to train at least one classification model; (II) an attribution phase comprising the GCMS or GCxGCMS analysis of an unknown blood sample and its attribution to a class on the basis of the classification model formulated in the training phase (I).

2. Method according to claim 1, wherein - said discriminant analysis model is selected from the group consisting of: PLS-DA and OPLS-DA, or - said machine learning model is selected from the group consisting of: SVM and decision tree.

3. Method according to one or more of claims 1-2, wherein phase (I) comprises the following sub-steps: a) extraction and derivatization of metabolites from blood samples taken from patients with endometrial carcinoma and from healthy controls; b) GCMS or GCxGCMS analysis of the extracted and derivatized metabolites to obtain a chromatogram for each sample; c) creation of a data matrix of the metabolic profiles of patients with endometrial cancer and healthy controls; d) structuring of at least one classification model following multivariate analysis of the data matrix; wherein said multivariate analysis is performed using at least one discriminant analysis model or a machine learning model to train at least one classification model.

4. Method according to one or more of the preceding claims, wherein said phase (II) further comprises the following sub-steps: a) extraction and derivatization of metabolites from at least one unknown blood sample; b) GCMS or GCxGCMS analysis of the extracted and derivatized metabolites to obtain at least one chromatogram of the unknown sample; c) creation of a metabolic profile from said chromatogram of the unknown sample; d) attribution of the metabolic profile to a class based on the classification model trained in phase (I).

5. Method according to one or more of the preceding claims, wherein the number of blood samples taken from patients with endometrial cancer and from healthy controls is equal to at least 80% of the number of metabolic profile variables.

6. Method according to one or more of the preceding claims, wherein said classification model is trained for a dichotomous classification: "healthy patient" or "patient with endometrial cancer".

7. Method according to one or more of the preceding claims, wherein said classification model is also trained for a histological classification of the carcinoma: "type I" or "type II".

8. Method according to one or more of the preceding claims, wherein said extraction and derivatization comprises: i) agitation of the sample obtained by adding an extraction mixture; ii) centrifugation of said sample obtained from i); iii) derivatization of the supernatant obtained from ii) by treatment with methoxylamine hydrochloride in pyridine; iv) silylation of the supernatant obtained from iii) with a silylating agent selected from the group consisting of: N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA), N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA), hexamethyldisilazane (HMDS), 1-(trimethylsilyl)imidazole (TMSI), N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (MTBSTFA), 1-(tert-butyldimethylsilyl)imidazole (TBDMSIM); wherein said extraction mixture consists of an aqueous mixture of an alcohol and a polar aprotic solvent.

9. Method according to one or more of the preceding claims, wherein said metabolite extraction is carried out with the addition of an aliquot of a reference compound, preferably ribitol.

10. Method according to one or more of the preceding claims, further comprising the following steps: - integration of the chromatograms obtained according to one or more of the preceding claims, wherein said integration provides for the identification of all peaks with an area greater than 10 times the background noise of the chromatographic trace, using the peak of the reference compound as a reference both for quantitative analysis and for centering retention times, wherein each peak is identified on the basis of: - one quantification m/z signal; and - at least 2 qualification m/z signals; - quantification by the method of normalized percentage areas; - transfer of the data obtained from said quantification to a matrix in which each sample represents a row and the columns represent the various metabolites, uniquely identified by their chromatographic retention time.
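By way of illustration only, the data handling recited in claim 10 (peak filtering, retention-time centering, normalized percentage areas, and transfer to a sample-by-metabolite matrix) could be sketched as follows. The function names `build_profile` and `build_matrix`, the representation of background noise as a single area-equivalent number, the rounding of retention times to two decimals for matching, and all numeric values are assumptions made for this sketch; they are not specified in the claims.

```python
def build_profile(peaks, noise_level, ref_rt_observed, ref_rt_nominal,
                  min_ratio=10.0):
    """Hypothetical helper: one chromatogram -> normalized percentage areas.

    peaks -- list of (retention_time, area) pairs for one sample
    """
    # Center retention times on the reference compound peak (e.g. ribitol)
    shift = ref_rt_nominal - ref_rt_observed
    # Keep only peaks whose area exceeds min_ratio times the background noise
    kept = {
        round(rt + shift, 2): area
        for rt, area in peaks
        if area > min_ratio * noise_level
    }
    total = sum(kept.values())
    # Normalized percentage area: each peak as a percentage of the total
    return {rt: 100.0 * area / total for rt, area in kept.items()}

def build_matrix(profiles):
    """Rows are samples; columns are metabolites keyed by retention time."""
    columns = sorted({rt for profile in profiles for rt in profile})
    rows = [[profile.get(rt, 0.0) for rt in columns] for profile in profiles]
    return columns, rows

# Illustrative values only: two samples, reference peak nominally at 5.00 min
sample_1 = build_profile([(5.02, 1000.0), (7.50, 3000.0), (9.00, 50.0)],
                         noise_level=10.0, ref_rt_observed=5.02,
                         ref_rt_nominal=5.00)
sample_2 = build_profile([(5.00, 2000.0), (7.48, 2000.0)],
                         noise_level=10.0, ref_rt_observed=5.00,
                         ref_rt_nominal=5.00)
columns, rows = build_matrix([sample_1, sample_2])
print(columns)
print(rows)
```

In this sketch a metabolite absent from a sample is entered as 0.0; how missing peaks are treated in practice is not stated in the claims.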