EP2771483A1

EP2771483A1 - Method for diagnosing a disease based on plasma-dna distribution

Info

Publication number: EP2771483A1
Application number: EP12780167.8A
Authority: EP
Inventors: Jochen Geigl; Ellen HEITZER; Eva-Maria Hoffmann; Michael Speicher
Original assignee: ONCOTYROL - Center for Personalized Cancer Medicine GmbH
Current assignee: ONCOTYROL - Center for Personalized Cancer Medicine GmbH
Priority date: 2011-10-25
Filing date: 2012-10-25
Publication date: 2014-09-03
Also published as: WO2013060762A1

Abstract

The present invention relates to a method of diagnosing a disease, particularly a disease associated with increased apoptosis, and the use of size distribution of plasma DNA in the diagnosis of the disease.

Description

Method for diagnosing a disease based on plasma-DNA distribution

The present invention relates to a method of diagnosing a disease, particularly a disease associated with increased apoptosis, and the use of size distribution of plasma DNA in the diagnosis of the disease. Methods of diagnosis usually aim at being minimally invasive, technically robust and universally applicable. Blood analysis has been used as a diagnostic tool for many years, as blood is easily obtainable and the analysis of its components is usually relatively easy and automatable. Additionally, the composition of the blood components usually reflects a variety of diseases. For cancer diagnosis, state-of-the-art methods as well as certain technical problems are detailed in the following:

There is considerable interest in the exploitation of somatic mutations as tools for improving the detection of disease and, ultimately, for affecting individual outcomes positively. In colorectal cancer (CRC) the paradigm is mutation of KRAS in exon 2 (codons 12 and 13), which has been established as a predictive marker for treatment with epidermal growth factor receptor (EGFR) inhibitors, such as cetuximab and panitumumab (Walther et al. 2009). For the clinical oncologist, it has therefore become increasingly urgent to have access to accurate and sensitive methods for the detection of such predictive biomarkers. However, the question remains how serial monitoring of tumor genotypes, which are prone to changes, can be performed. The evaluation of patient blood samples for mutant DNA molecules or circulating tumor cells (CTCs) is a particularly attractive approach because blood is easily accessible.

Early reports suggesting that the simple presence or absence of circulating DNA itself, or its concentration, was of diagnostic value (Leon et al. 1977; Stroun et al. 1989) have been called into question because the results of numerous studies, which attempted to identify abnormal forms or quantities of DNA in plasma or serum, were contradictory (Sidransky 2002; Pinzani et al. 2010; van der Vaart and Pretorius 2010; Schwarzenbach et al. 2011). Other methods aimed at the analysis of plasma-/serum-DNA for loss of heterozygosity (LOH) (Müller et al. 2008; Schwarzenbach et al. 2009) or for tumor-related methylation patterns (Gormally et al. 2007). However, these tests lack the necessary specificity and are, therefore, not used as diagnostic tools. Moreover, the detection of mutations which had previously been identified in the corresponding primary tumor from the same patient should provide, in theory, a specific biomarker of disease burden (Nawroz et al. 1996; Diehl et al. 2005, 2008a, 2008b; Yung et al. 2009). However, sensitive and specific detection of a mutated base in a vast excess of normal DNA requires specialized techniques currently beyond the scope of many diagnostic laboratories (Diehl et al. 2005, 2008a, 2008b; Yung et al. 2009). Furthermore, the copy number of selected regions may be determined in plasma DNA with quantitative PCR. A well-known method is real-time PCR (RT-PCR), wherein primers with fluorescence markers are used and the increase of fluorescence depending on the number of cycles is monitored. A disadvantage of this procedure is that it is costly and has limited capability for multiplex high-throughput screenings. In addition, our own analyses show that with a low amount of starting material, quantitative PCR does not provide reliable results and that the actual copy numbers are often not determined correctly.

The BEAMing (BEAM: beads, emulsion, amplification, magnetic) technique has also been suggested as a tool. The technique comprises four essential steps (Diehl et al, 2005), which are exemplified here for a mutation in the APC gene: (i) the total number of plasma DNA fragments comprising APC is determined by real-time PCR; (ii) BEAMing is then used in order to link the amplified plasma DNA to beads, wherein within the beads an emulsion PCR is performed, in order to amplify each individual plasma DNA fragment within a bead to obtain so-called "extended beads" which are subsequently broken ("emulsification"); (iii) the mutation status of the extended beads is determined by single base extension (incorporation of fluorochrome-labeled ddNTPs, causing termination of the reaction) so that plasma fragments with and without mutation differ by the respective fluorochrome; and (iv) the fluorochromes are measured by flow cytometry and the ratio of mutant and wild type alleles is determined (Diehl et al, 2005). In later publications, the authors showed that the method may be used to monitor therapy of cancer patients (Diehl et al, 2008a, 2008b). The disadvantages of this procedure are high costs, high effort and complex laboratory equipment. An advantage of the procedure is the potential to identify cancer in early stages due to the analysis of plasma DNA. On the other hand, the mutation to be detected must be known in advance, and the BEAMing technique cannot be carried out without that knowledge. Due to these essential disadvantages, BEAMing has not been found suitable for universal application.

Further methods have been developed based on the design of biomarkers for genomic changes within the tumor, which may be used specifically with plasma in order to monitor the disease. A prerequisite is that the primary tumor is examined by means of new next-generation sequencing procedures (Shendure and Ji, 2008) for tumor-specific rearrangements, such as translocations. The biomarker consists essentially of region-spanning breaks, whose presence and amounts in plasma may be monitored. As translocation break points may vary from tumor to tumor, each primary tumor has to be assessed individually and therefore this approach was referred to as PARE (personalized analysis of rearranged ends) (Leary et al., 2010).

For the malignant hematologic disease chronic myeloid leukemia (CML) it is known that the translocation between chromosomes 9 and 22 causes fusion of the BCR and ABL genes. Also with other types of leukemia, specific translocations occur regularly, so that highly sensitive assays specifically recognizing translocation fusions even with very low amounts of material (therefore referred to as minimal residual disease (MRD)) have been developed. However, an essential difference between solid tumors and leukemia is that the great majority of solid tumors do not show regular translocations, so that for each tumor a sequencing procedure has to be developed individually. This implies a further dilemma of the approach: whereas with leukemia it could be shown that the translocation is the reason for leukemia and therefore a "driver mutation", the majority of patient-specific assays have been designed for translocation breakpoints which are most probably not causal for the development of the tumor, as they are passenger mutations rather than driver mutations (McBride et al, 2010). As long as the cells of the recrudescence also show the mutation, the monitoring of the therapy is not a problem. However, it is not known whether or not a relapse arising from a newly developing clone still comprises the passenger mutation, particularly in view of the different selection pressures. Moreover, solid tumors are usually highly heterogeneous, which means that the translocation might be present only in a subpopulation of tumor cells. This is a further significant difference to leukemia, as the translocation between chromosomes 9 and 22 is present in all CML cells. A further disadvantage is the high effort for designing biomarkers and the high costs associated therewith. A potential advantage might be that, under suitable conditions, even minimal amounts of tumor might be diagnosed.

Another option is the analysis of tumor cells in the circulation of individuals with advanced cancers (Cristofanilli et al. 2004; Nagrath et al. 2007). However, CTCs constitute as few as 1 cell per 1 × 10⁹ hematologic cells in the blood of patients with metastatic cancer, making it difficult to identify and isolate these cells (Pantel et al. 2008). Therefore, cells qualifying as CTCs are at present mainly enumerated without further analyses as they are routinely performed in tissue-based diagnostics.

Additionally, the diagnosis of fetal diseases in the maternal blood is of increasing interest. Here, the safety of the procedure as compared to e.g. amniocentesis and chorionic villus sampling is evidently of particular interest, as is compliance with legal standards and moral requirements; a comparison to preimplantation diagnostics is likewise of particular interest. It is estimated that 10% to 20% of the plasma DNA of a pregnant female is from the fetus. The development of the so-called next-generation sequencing methods gave rise to further methods for examining fetal DNA in maternal blood. By sequencing the complete plasma DNA of a pregnant woman and applying an additional complex calculation method, it is possible to identify the proportion ratio of chromosome 21 in order to identify fetal trisomy 21 (Chiu et al., 2011). In the meantime, it could be shown that the complete genome of the fetus is present in the maternal plasma DNA and that, therefore, it is not only possible to identify aneuploidy, but also to establish a mutation profile, i.e., to simultaneously test for a variety of genetic diseases or variants (Lo et al., 2010). In a new multi-center study with 753 pregnant females, next-generation plasma sequencing showed that, indeed, fetal trisomy 21 was diagnosed in maternal blood with a sensitivity of 100% and a specificity of 97.9% (Chiu et al., 2011). These studies suggest that the analysis of plasma DNA has the potential to substitute traditional prenatal diagnosis. However, a disadvantage is the high effort needed so far. In summary, the methods for the diagnosis of a disease according to the state of the art, particularly for diagnosing cancer or fetal aneuploidy, pose technical problems.

Accordingly, it was an object of the present invention to provide new methods for diagnosing a disease or for monitoring therapy. Particularly, it was intended to avoid problems of the methods detailed above. The methods of the present invention are particularly suitable in the predictive or prognostic diagnosis of cancer as well as in monitoring cancer therapy. Additionally, the methods may be employed with pregnant females in order to identify fetal aneuploidy. The object was solved by a method involving the determination of size distribution of plasma DNA, wherein an increased non-apoptosis peak in the size distribution relative to a control is indicative of a disease. As shown in the examples, the size distribution of healthy subjects is different from that of diseased subjects. Exemplary size distributions of plasma DNA are shown in Figures 3 a) and b). Figure 3 a) shows a peak around 170 base pairs (bp) referred to as "region 1". Figure 3 b) shows the plasma size distribution of a diseased subject. When comparing Figure 3 b) to Figure 3 a), a further peak around 340 base pairs can be identified, which is indicative of the disease. The second peak is referred to as non-apoptotic peak. Our observations are the basis for straightforward, sensitive and specific methods to quantify disease burden in serial blood samples from potentially diseased subjects, particularly patients suspected of having or suffering from cancer and pregnant females. In contrast to the methods of the state of the art, which are laborious, expensive and usually allow only the analysis of few selected regions within a genome, the present invention provides a method which allows genome-wide copy number changes in relevant genomes, e.g., the genome of the tumor in cancer patients or the genome of the fetus in a pregnant female, to be established from plasma DNA in a cheap, simple and fast manner.
The methods of the present invention are advantageous, because they are easy to carry out and to implement. The obtained data are robust and easy to interpret. The methods are cost-efficient as compared to other methods. They are characterized by high sensitivity and provide information about the complete relevant genome. The methods do not depend on the patient's history, in particular with tumor patients; they do not depend on the knowledge about primary tumors, and are independent from clonality and heterogeneity of the tumor. Finally, significant results are obtained faster than with other methods.

With respect to the use in the medical field, the methods are particularly useful to provide information about genomic aberrations such as copy numbers of the genome (especially with cancer patients of the tumor genome and with pregnant females of the genome of the fetus) from the plasma DNA. Additionally, the DNA is prepared in a manner to allow for easy determination of the mutation status of a selected gene or by next-generation sequencing of the complete genome. Furthermore and very remarkably, data provide the basis in order to conclude how probable it is to detect circulating tumor cells (CTCs) in the blood of a cancer patient.

As detailed above, the present invention is particularly useful in the field of oncology. In Europe, in 2006 alone, about 3,191,600 new cancer cases were diagnosed and about 1,700,000 deaths could be assigned to tumor diseases. Cancer is still the biggest challenge in the health system. As detailed above, this development is accompanied by the discovery of new molecular genetic biomarkers providing essential predictive and prognostic information. Mutations in the epidermal growth factor receptor are prominent examples; they frequently occur in patients with lung carcinoma, who are usually treated with tyrosine kinase inhibitors such as Gefitinib. However, it is a prerequisite to clarify the mutation status and to monitor the treatment, in order to be able to react quickly to possible resistances developing in the meantime. It is very likely that a huge number of further biomarkers will be identified in the coming years and the need for efficient noninvasive methods for the determination of biomarkers at the time of diagnosis as well as during therapy will increase significantly. The method of the present invention has the additional advantage that changes throughout the complete tumor genome can be detected very quickly, so that the occurrence of further new clones with other characteristics can be diagnosed in "real time". Accordingly, the present invention will have a great benefit in the field of oncology.

Additionally, the invention is also particularly useful in the field of fetal diagnosis. In 2009, there were 70,344 live births in Austria. In the 1980s, the age of parturients was 26.4 years on the average, whereas it was 30.0 years in 2009. Due to this demographic development, the need to identify fetal aneuploidy is of increasing relevance.

In a first aspect, the present invention relates to a method of diagnosing a disease, comprising

a) providing a plasma sample from a subject's blood; and

b) determining size distribution of the plasma-DNA,

wherein an increased non-apoptosis peak in the size distribution relative to a control is indicative of the disease.

Particularly, the present invention relates to a method of diagnosing a disease associated with increased apoptosis, comprising

a) providing a plasma sample from a subject's blood; and

b) determining size distribution of the plasma-DNA, wherein the size distribution is biphasic having an apoptosis peak and a non-apoptosis peak, wherein the apoptosis peak is characterized by a maximum in the range of from 150 bp to 180 bp and wherein the non-apoptosis peak is characterized by a maximum in the range of from 300 bp to 350 bp,

"Diagnosing a disease" in the context of the present invention has a broad meaning. In the medical field, it is used for a process of attempting to determine and/or identify a possible disease, done in order to clarify whether or not a subject is diseased or is going to be diseased. Alternatively or additionally, the extent of the disease may be determined. Accordingly, it may or may not be known whether the subject is suffering or is going to suffer from a disease. This is further exemplified with cancer, but is also true for other diseases. In one alternative, it could be checked whether a subject is suffering from cancer or is going to suffer from cancer (without showing clinical symptoms yet). The diagnosis is carried out in order to find out whether the subject is healthy or suffering from cancer. If the subject is suffering from cancer, the degree and extent of the disease could be determined, too. In a second alternative, it is known that the subject is or was diseased. The method could be used in the monitoring of the health status of a previously or still diseased subject. This could be in the context of the monitoring of a therapy or of a recurrence.

A disease is an abnormal condition affecting the body of an organism. It is often construed to be a medical condition associated with specific symptoms and signs. It may be caused by external factors, such as infectious disease, or it may be caused by internal dysfunctions, such as autoimmune diseases. In humans, "disease" is often used more broadly to refer to any condition that causes pain, dysfunction, distress, social problems, and/or death to the person afflicted, or similar problems for those in contact with the person. In this broader sense, it includes injuries, disabilities, disorders, syndromes, infections, isolated symptoms, deviant behaviors, and atypical variations of structure and function. In the context of the present invention the terms disease, disorder, morbidity and illness are used interchangeably. Plasma-DNA is intended to relate to DNA present in blood plasma. In vertebrates, blood is composed of blood cells suspended in the blood plasma. Plasma, which constitutes 55% of blood fluid, is mostly water (92% by volume), and contains dissipated proteins, glucose, mineral ions, hormones, carbon dioxide (plasma being the main medium for excretory product transportation) and importantly for the present invention DNA. Plasma is obtained from blood by removal of cells.

Accordingly, plasma-DNA is from a subject's blood sample. As used herein the term "subject" can mean either a human or non-human animal, preferably vertebrates such as mammals, especially primates, such as humans. The sample is a limited quantity of blood which is intended to be similar to and represent a larger amount of the same. For providing a plasma sample from a subject's blood, a blood sample may be taken from the subject. Particularly for mammals, this may be conveniently performed by taking venous blood from the subject. Venous blood may be obtained by venipuncture from the mammal, e.g. a patient suspected of having cancer or a pregnant woman, wherein usually only a small sample of blood, e.g. a 3 ml to 10 ml sample, is adequate for the method of the present invention. Blood is most commonly obtained from the median cubital vein, on the anterior forearm (the side within the fold of the elbow). This vein lies close to the surface of the skin, and there is not a large nerve supply. Most blood collection in the industrialized countries is done with an evacuated tube system consisting of a plastic hub, a hypodermic needle, and a vacuum tube. However, blood may also be obtained by any other method known to the skilled person.

After isolation of the blood, plasma is isolated from the blood sample. For this, cells may be removed by any suitable method. Conventionally, blood plasma is prepared by spinning a tube of blood, usually containing an anti-coagulant, in a centrifuge until the (blood) cells precipitate on the bottom of the tube. The plasma is then poured or drawn off. An exemplary method is centrifugation at 1600g for 10 min and microcentrifugation at 16,000g for 10 min. After having been obtained and optionally further purified, the plasma may be immediately used for analysis or frozen for storage as known to the person skilled in the art. The plasma may be frozen at liquid nitrogen temperatures and stored for long periods of time, being thawed and capable of being reused. In a preferred embodiment, blood plasma is isolated as follows: Blood is taken from a subject and collected in a container intended for blood collection. Such containers are commercially available and may be used in the method of the present invention. Usually, they comprise an anti-coagulant such as EDTA. Exemplary containers are routine EDTA Vacutainer tubes (BD Biosciences, Heidelberg, Germany). In order to stabilize cell membranes and to impede cell lysis, suitable agents known to the skilled person such as formaldehyde (e.g. 4% weight per volume) may be added. Furthermore, a buffer solution adapted for stabilisation at neutral conditions may be present (e.g. 10% neutral buffered solution containing formaldehyde (4% weight per volume)). Blood samples may be gently inverted. Thereafter, the sample may be immediately used or stored until further processing (stored at 4 °C and further processed within two hours).

Removal of cells is usually carried out by sedimentation, especially centrifugation techniques. Often, the centrifugation step may encompass several centrifugation steps, after which cells are separated from the liquid phase, in order to improve removal of cells and therefore the quality of the plasma. However, centrifugation should be rather mild in order to avoid loss of larger DNA molecules. Suitably, centrifugation may be carried out at about 1000g to 3000g for several minutes, repeatedly or only once. In a preferred method, tubes containing the blood may be centrifuged at 200g for 10 min, e.g. with the brake and acceleration powers set to zero, with a subsequent centrifugation step at 1600g for 10 min. The supernatant is collected, transferred to a new tube and spun at 1600g for 10 min. Further details are described in Dhallan et al. (Dhallan et al. 2004, 2007). Thereafter, the obtained plasma can be immediately analyzed or stored (e.g. at -80°C).

Before further analysis, it might be necessary to isolate DNA from the plasma. Suitable methods for this are known in the art. Exemplarily, the method may be carried out by using the QIAamp DNA Blood Mini Kit (#51306, Qiagen, Hilden, Germany) or the Qiagen circulating nucleic acids Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions.

In the method of the present invention a plasma sample from a subject's blood is provided (see above) and the size distribution of the plasma-DNA is determined. A size distribution is a list of values or a mathematical function that defines the relative amounts of particles present, sorted according to size. In the present invention, the DNA molecules of blood plasma are analyzed for their sizes (e.g. in base pairs (bp)) and the amount or number of molecules (DNA or DNA fragments) of the various sizes is determined. Usually the size distribution is determined over a list of size ranges that covers nearly all the sizes present in the plasma. The amount of DNA fragments in each size range is preferably listed in order or presented in a coordinate system, e.g. a two-dimensional space (such as a Euclidean space). An exemplary illustration is shown as Fig. 3 a) and 3 b). Preferably and typically, the size distribution is shown in a Cartesian coordinate system, specifying each point uniquely in a plane by a pair of numerical coordinates, representing the size of the DNA molecule and the amount of the DNA molecules having the respective size. Typically, the size will be shown on the X axis and the amount (or a value reflecting or corresponding to the same) of DNA molecules having this size on the Y axis. In order to establish or determine the size distribution, it is required to determine the size of the DNA molecules present in the plasma sample. Any suitable method known to the skilled person may be used. However, care should be taken that the selected method is sufficiently sensitive, which might also depend on the amount of sample accessible to analysis. Usually, the DNA molecules will be separated according to their sizes and analyzed for their amount or number. Suitable methods include electrophoresis, next-generation DNA sequencing technology, PCR such as real-time PCR, and Multiplex Ligation-dependent Probe Amplification (MLPA).
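The notion of a size distribution as a list of relative amounts per size range can be illustrated with a minimal Python sketch. It assumes that individual fragment lengths in bp are already available (for example from sequencing); the function name, the bin width of 10 bp and the example values are hypothetical and are not taken from the present description.

```python
from collections import Counter

BIN_WIDTH_BP = 10  # assumed width of each size range; not specified in the description


def size_distribution(fragment_sizes_bp, bin_width=BIN_WIDTH_BP):
    """Return sorted (bin_start_bp, relative_amount) pairs for the given fragment sizes."""
    if not fragment_sizes_bp:
        return []
    # Assign every fragment to the size range (bin) that contains it.
    counts = Counter((size // bin_width) * bin_width for size in fragment_sizes_bp)
    total = sum(counts.values())
    # Size goes on the X axis, relative amount on the Y axis.
    return [(start, counts[start] / total) for start in sorted(counts)]


# Hypothetical fragment lengths in bp, invented purely for illustration.
example_sizes = [166, 168, 171, 165, 340, 172, 335, 169]
for start, fraction in size_distribution(example_sizes):
    print(f"{start}-{start + BIN_WIDTH_BP - 1} bp: {fraction:.3f}")
```

Each returned pair corresponds to one point in the Cartesian representation described above.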

The term "next-generation sequencing" is known to the skilled person, e.g. from Metzker, 2010. Next-generation or high-throughput sequencing technologies parallelize the sequencing process, producing thousands or even millions of sequences at the same time. DNA electrophoresis is an analytical technique used to separate DNA fragments by size. DNA molecules which are to be analyzed are set upon a viscous medium, the gel, where an electric field induces the DNA to migrate toward the anode, due to the net negative charge of the sugar-phosphate backbone of the DNA chain. The separation of these fragments is accomplished by exploiting the mobilities with which different sized molecules are able to pass through the gel. Longer molecules migrate more slowly because they experience more resistance within the gel. Because the size of the molecule affects its mobility, smaller fragments end up nearer to the anode than longer ones in a given period. After some time, the voltage is removed and the fragmentation gradient is analyzed. For larger separations between similar sized fragments, either the voltage or run time can be increased. Extended runs across a low voltage gel yield the most accurate resolution. The DNA fragments of different lengths may be visualized using a fluorescent dye specific for DNA, such as ethidium bromide, wherein the intensity reflects the amount of molecules. The gel shows bands corresponding to different populations of DNA molecules with different molecular weight. Fragment size determination is typically done by comparison to commercially available DNA markers containing linear DNA fragments of known length. The types of gel most commonly used for DNA electrophoresis are agarose (for relatively long DNA molecules) and polyacrylamide (for high resolution of short DNA molecules). Gels have conventionally been run in a "slab" format, but capillary electrophoresis has become important for applications such as high-throughput DNA sequencing. Electrophoresis techniques used in the assessment of DNA include alkaline gel electrophoresis and pulsed field gel electrophoresis. The measurement and analysis are mostly aided by specialized gel analysis software. Next-generation DNA sequencing technology may also be employed for size analysis. The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. Exemplary methods include Massively Parallel Signature Sequencing (MPSS), polony sequencing, 454 pyrosequencing and Illumina (Solexa) sequencing.
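Fragment size determination by comparison to DNA markers of known length, as mentioned above, can be sketched as follows. The sketch assumes that the logarithm of fragment size is approximately linear in migration distance between two neighbouring marker bands, which is a common working approximation and not a statement from the present description. The function name and the ladder values are hypothetical.

```python
import math


def estimate_size_bp(ladder, distance):
    """Estimate fragment size from migration distance using a marker ladder.

    ladder: list of (migration_distance, size_bp) pairs for marker bands with
    distinct distances. log10(size) is interpolated linearly between the two
    marker bands that bracket the measured distance (assumed approximation).
    """
    points = sorted(ladder)
    if len(points) < 2:
        raise ValueError("at least two marker bands are required")
    if not points[0][0] <= distance <= points[-1][0]:
        raise ValueError("distance lies outside the range covered by the ladder")
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if d0 <= distance <= d1:
            fraction = (distance - d0) / (d1 - d0)
            log_size = math.log10(s0) + fraction * (math.log10(s1) - math.log10(s0))
            return 10 ** log_size


# Hypothetical ladder: (migration distance in mm, marker size in bp).
example_ladder = [(12.0, 1000), (25.0, 500), (41.0, 200), (55.0, 100)]
print(round(estimate_size_bp(example_ladder, 45.0)))
```

Longer fragments migrate more slowly, so smaller distances map to larger sizes; no extrapolation beyond the ladder is attempted.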

For size distribution analysis of plasma DNA with PCR, primers for different-sized amplicons can be designed. These PCR assays use different reverse and/or forward primers, and one common or different binding probe(s). PCRs may be set up in a reaction tube, followed by amplification in the tube. Amplification data may be collected and analyzed by suitable software, e.g. Sequence Detection System Software (Ver. 1.9; Applied Biosystems). In real time PCR, different thermal profiles may be established for the assays.

If necessary or suitable, DNA could be extracted before analysis, e.g. with a QIAamp Blood Kit (Qiagen). After determining the size distribution, it is assessed whether the size distribution shows an increased non-apoptosis peak relative to a control, which would be indicative of the disease.

As detailed above, the plasma DNA size distribution shows one peak representing apoptotic cells (Diehl et al, 2005; Lo et al, 2010). The peak is therefore referred to as apoptosis peak. It is characterized by DNA fragments of about 166 bp. The non-apoptosis peak is a further (or second) peak in the size distribution which usually represents larger DNA fragments than the apoptosis peak. The second peak is referred to as the "non-apoptosis peak" (which means that it is not the apoptosis peak) or "viability peak" (as it does not predominantly represent apoptotic cells).

The peak is to be regarded as increased if its size is increased. An increase may be an increased area underneath the part of the graph which is considered to constitute the peak, i.e. the area integral of the peak. Alternatively or additionally, the maximum of the peak, i.e. the height of the peak, may be increased. It is evident that the increase in comparison to the control should be significant. The person skilled in the art knows statistical procedures to assess whether two values or areas are significantly different from each other, such as Student's t-test or chi-square tests. Based on this, a threshold may be determined by the skilled person, which will evidently depend on the method used. If a value (e.g. peak area or maximum) is at or above that threshold, the value is to be regarded as increased relative to the control. It is evident for the skilled person that any background signal has to be subtracted when analyzing the data. In specific embodiments, the increase is at least about 10 % relative to the control. In other embodiments, the increase is at least 20 %, 30 %, 40 %, 50 % or 100 %, especially 150 %, 200 %, 250 %, or 300 %.
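The comparison of a peak area with a control can be sketched in Python as follows. The sketch assumes a size distribution sampled at equally spaced sizes, so that a sum of background-subtracted signals serves as a stand-in for the area integral. It uses the 300 bp to 350 bp window and the 10 % minimum increase named in this description, but the function names, the background value and all data points are hypothetical. A statistical significance test, as mentioned above, is not implemented here.

```python
def peak_area(distribution, lower_bp, upper_bp, background=0.0):
    """Sum the background-subtracted signal for sizes within [lower_bp, upper_bp]."""
    return sum(max(signal - background, 0.0)
               for size, signal in distribution
               if lower_bp <= size <= upper_bp)


def is_increased(sample_area, control_area, min_increase=0.10):
    """Return True if sample_area exceeds control_area by at least min_increase (a fraction)."""
    if control_area <= 0:
        # Without any control signal, any positive sample signal counts as increased.
        return sample_area > 0
    return (sample_area - control_area) / control_area >= min_increase


# Hypothetical (size_bp, signal) pairs, invented purely for illustration.
control = [(160, 50.0), (170, 60.0), (320, 4.0), (330, 5.0), (340, 4.0)]
sample = [(160, 48.0), (170, 58.0), (320, 9.0), (330, 14.0), (340, 10.0)]

control_area = peak_area(control, 300, 350, background=2.0)
sample_area = peak_area(sample, 300, 350, background=2.0)
print(is_increased(sample_area, control_area))
```

The same helper can be applied to the peak maximum instead of the area by passing the two heights to `is_increased`.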

The control may be a sample from a healthy subject or may be determined from a group of healthy subjects. Alternatively, it may be a known and/or pre-determined reference value or area or size distribution. Furthermore, the skilled person knows how to select a suitable control.

In accordance with the present invention an increased non-apoptosis peak is indicative of the disease. Accordingly, it can be assumed that the subject (or in case of a pregnant woman the fetus) whose plasma DNA is analyzed is diseased, if the peak is increased. Further, the DNA comprised under (represented by) the non-apoptosis peak provides information useful in characterizing the disease. The disease may particularly be a disease involving genomic aberrations, particularly chromosome aberrations (e.g. cancer or fetal aneuploidy). A genomic abnormality or aberration is a change in the genome of a subject, i.e. the genome of one or more cells or all cells of the subject.

A genomic abnormality (= anomaly, = aberration) reflects an atypical number of chromosomes or a structural abnormality in one or more chromosomes. Chromosome anomalies usually occur when there is an error in cell division following meiosis or mitosis. There are many types of chromosome anomalies. They can be organized into two basic groups, numerical and structural anomalies. Numerical abnormalities are called aneuploidy (an abnormal number of chromosomes), and occurs when an individual is missing either a chromosome from a pair (monosomy) or has more than two chromosomes of a pair (Trisomy, Tetrasomy, etc.). In humans an example of a condition caused by a numerical anomaly is Down Syndrome, also known as Trisomy 21 (an individual with Down Syndrome has three copies of chromosome 21 or a critical portion thereof, rather than two). Turner Syndrome is an example of a monosomy where the individual is born with only one X-chromosome, also referred to as an X0 genotype. Structural abnormalities are those abnormalities in which the chromosome's structure is altered. This can take several forms including deletions, duplications, translocations, inversions, rings and isochromosomes. In deletions, a portion of the affected chromosome is missing or deleted. Known disorders in humans include Wolf-Hirschhorn syndrome, which is caused by partial deletion of the short arm of chromosome 4; and Jacobsen syndrome, also called the terminal 1 lq deletion disorder. In duplications, a portion of the chromosome is duplicated, resulting in extra genetic material. Known human disorders include Charcot-Marie-Tooth disease type 1A which may be caused by duplication of the gene encoding peripheral myelin protein 22 (PMP22) on chromosome 17. In translocations, a portion of one chromosome is transferred to another chromosome. There are two main types of translocations. In a reciprocal translocation, segments from two different chromosomes have been exchanged. 
In a Robertsonian translocation, an entire chromosome has attached to another at the centromere; in humans these only occur with chromosomes 13, 14, 15, 21 and 22. In inversions, a portion of the chromosome has broken off, turned upside down and reattached, so that the genetic material is inverted. In rings, a portion of a chromosome has broken off and formed a circle or ring. This can happen with or without loss of genetic material. Isochromosomes are formed by the mirror image copy of a chromosome segment including the centromere. Chromosome aberrations often lead to an increased tendency to develop certain types of malignancies.

The genomic abnormality or aberration may be a mutation, which is a change in the genomic sequence (here the DNA sequence). Mutations are caused by radiation, viruses, transposons and mutagenic chemicals, as well as errors that occur during meiosis or DNA replication. They can also be induced by the organism itself, by cellular processes such as hypermutation. The sequence of a gene can be altered in a number of ways. Gene mutations have varying effects on health depending on where they occur and whether they alter the function of essential proteins. Mutations in the structure of genes can be classified as small-scale mutations, such as point mutations, missense mutations, nonsense mutations, insertions and deletions, and large-scale mutations in chromosomal structure, including amplifications (or gene duplications), deletions of large chromosomal regions, chromosomal translocations, interstitial deletions, chromosomal inversions, or loss of heterozygosity. Changes in DNA caused by mutation can cause errors in protein sequence, creating partially or completely non-functional proteins. To function correctly, each cell depends on thousands of proteins to function in the right places at the right times. When a mutation alters a protein that plays a critical role in the body, a medical condition can result.
Mutations in a somatic cell of an organism will be present in all descendants of this cell within the same organism, and certain mutations can cause the cell to become malignant, and thus cause cancer.

In a preferred embodiment, the size distribution of the plasma-DNA is detected as being biphasic, wherein the first and second peaks are the apoptosis peak and the non-apoptosis peak, respectively.

Healthy as well as diseased subjects show an apoptosis peak in the size distribution of the plasma-DNA. A peak means that there is a significant accumulation of DNA in a particular size range as compared to the surrounding part(s). As already detailed above, the apoptosis peak is typically in the size range of from 80 bp to 240 bp with a maximum around 166 bp. Also typically the peak is bell-shaped. It reflects the apoptotic cells (see above).

As evidenced here, in a diseased subject, there is a second peak, referred to as the "non-apoptosis" or "viability peak". This peak typically immediately follows the apoptosis peak and has an asymmetric shape.

The non-apoptosis peak may be characterized by further details, particularly

- by a maximum above 250 base pairs (bp), particularly in the range of from about 280 bp to about 400 bp, more particularly from 300 bp to 350 bp; and/or

- by representing DNA in the size of above 200 bp, particularly in the range of from 200 bp to about 1000 bp, more particularly of from 240 bp to 600 bp, especially from 250 bp to 400 bp.

The apoptosis peak may be characterized by further details, particularly

- by a maximum in the range of from 150 bp to 180 bp, more particularly of from 160 bp to 170 bp; and/or

- representing DNA in the range of from 50 bp to 250 bp, more particularly from 80 to 240 bp.

As detailed above, an increased non-apoptosis peak is indicative of a disease. In a preferred embodiment the non-apoptosis peak is understood as being increased

- if the non-apoptosis peak contains at least 5 % of the total plasma-DNA, particularly at least 10 %, more particularly at least 20 %; and/or

- if the ratio of the maximal height of the non-apoptosis peak to the maximal height of the apoptosis peak is at least 20 %, preferably at least 30 %, more preferably at least 33 %.

An increased non-apoptosis peak may also be detected based on its effect within the size distribution of one sample. As detailed above, the non-apoptosis peak is absent in the sample of a healthy subject. The presence of the peak requires that a particular fraction of the total DNA is represented by the peak. Accordingly, the peak is increased if it contains at least 5 % of the total plasma-DNA, particularly at least 10 %, more particularly at least 20 %. Alternatively or additionally, the increased peak may also be detected based on its relation to the apoptosis peak. In accordance with this, the non-apoptosis peak is increased if the ratio of the maximal height of the non-apoptosis peak to the maximal height of the apoptosis peak is at least 20 %, preferably at least 30 %, more preferably at least 33 %.
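The two criteria above can be illustrated with a minimal Python sketch. It assumes the size distribution is available as paired lists of fragment sizes (bp) and signal intensities, for example exported from an electropherogram. The function names, the trapezoidal integration, the window boundaries (taken from the ranges of 80 bp to 240 bp and 240 bp to 600 bp mentioned in this description) and the synthetic curves are illustrative assumptions and not part of the described method.

```python
import math

def trapezoid_area(x, y):
    """Area under y(x) using the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))

def assess_non_apoptosis_peak(sizes_bp, signal,
                              apo_window=(80, 240),       # apoptosis peak range from the text
                              non_apo_window=(240, 600),  # non-apoptosis peak range from the text
                              min_fraction=0.05,          # "at least 5 % of the total plasma-DNA"
                              min_height_ratio=0.20):     # "at least 20 %" height ratio
    """Return the fraction, the height ratio and whether either criterion is met."""
    apo = [(s, v) for s, v in zip(sizes_bp, signal) if apo_window[0] <= s <= apo_window[1]]
    non = [(s, v) for s, v in zip(sizes_bp, signal) if non_apo_window[0] < s <= non_apo_window[1]]
    if not apo or not non:
        raise ValueError("size distribution does not cover both windows")
    total = trapezoid_area(sizes_bp, signal)
    non_area = trapezoid_area([s for s, _ in non], [v for _, v in non])
    fraction = non_area / total if total > 0 else 0.0
    apo_max = max(v for _, v in apo)
    height_ratio = max(v for _, v in non) / apo_max if apo_max > 0 else 0.0
    # The text joins the criteria with "and/or", so either one is sufficient here.
    increased = fraction >= min_fraction or height_ratio >= min_height_ratio
    return {"fraction": fraction, "height_ratio": height_ratio, "increased": increased}

# Synthetic curves (hypothetical data, not taken from the Examples).
sizes = list(range(50, 1001, 10))
healthy = [math.exp(-(((s - 166) / 25.0) ** 2) / 2.0) for s in sizes]
patient = [h + 0.3 * math.exp(-(((s - 320) / 40.0) ** 2) / 2.0) for s, h in zip(sizes, healthy)]

healthy_result = assess_non_apoptosis_peak(sizes, healthy)
patient_result = assess_non_apoptosis_peak(sizes, patient)
```

Background subtraction and marker normalization are deliberately left out of this sketch; in practice they would precede the calculation.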

The method of the present invention may comprise a further diagnostic step, particularly if an increased non-apoptosis peak in the size distribution relative to a control is detected. Accordingly, the present invention may provide a two-step approach, wherein first a biphasic size distribution of the plasma DNA is established and second a further diagnostic process carried out. Particular reference is made to the flow sheet presented in Figure 4. The first step allows an efficient and low-cost pre-selection of individuals which are subsequently subjected to more specific testing. Particularly, detection of circulating tumor cells (CTCs) and optionally subsequent correlation of the biphasic size distribution with the presence of CTCs is envisioned.

The method of the present invention provides the means to reconstruct the tumor genome using the DNA of the non-apoptotic peak, particularly in the case of a cancer patient. The concept aims at detecting in this DNA fraction genetic aberrations that are indicative of disease status and that allow therapeutic decisions to be made. Thus, DNA of the non-apoptotic peak can be used in array-based analysis. To this end, genome-wide copy number changes can be established. Accordingly, the method makes it possible to establish genome-wide copy number changes in the genome of the patient to be diagnosed, particularly the genome of a cancer patient or the genome of the fetus in a pregnant female, from plasma DNA; and/or to provide information about genomic aberrations such as copy numbers of the genome from the plasma DNA.

The size distribution may be determined by any suitable means (see also above). Microfluidics-based electrophoresis has been shown to be particularly suitable. As detailed above, electrophoresis, which relies on inducing detectable differences in migration behavior between charged species under the influence of an applied electric field, has proven to be a highly versatile analytical technique owing to a favorable combination of characteristics including relatively simple hardware design and compatibility with a wide range of analytes including biological macromolecules (e.g., DNA, proteins). Therefore, it is a suitable tool for determining the size of DNA. More recently, electrophoresis technology has been miniaturized into microfluidic formats with the aim of producing portable low-cost versions of conventional benchtop-scale instrumentation. A microfluidic device can usually be identified by the fact that it has one or more channels with at least one dimension less than 1 mm. Common fluids used in microfluidic devices include whole blood samples, bacterial cell suspensions, protein or antibody solutions and various buffers. Because the volume of fluids within these channels is very small, usually several nanoliters, the amount of reagents and analytes used is quite small. An exemplary suitable device is the Agilent 2100 Bioanalyzer, which allows for automatically sizing and quantitating DNA samples (Agilent Technologies, Inc., Santa Clara, CA, USA).

In a further embodiment, the method further comprises determining the total plasma-DNA level, wherein an increased total plasma-DNA level is also indicative of the disease. As detailed above, it is known that for certain diseases such as cancer and fetal aneuploidy the total plasma-DNA is increased relative to a control (e.g. a healthy subject), since particularly DNA of apoptotic cells is released into the blood (Leon et al., 1977; Stroun et al., 1989; Diehl et al., 2005). This has also been confirmed in the Examples (see e.g. Figure 1). Accordingly, the determination of total plasma-DNA may also be part of the method of the present invention. Total plasma-DNA may be determined as known in the art. In the present case, total plasma-DNA may be determined as the area integral of the size distribution, i.e. the area underneath the complete graph. It is evident that the increase in comparison to the control should be significant. The person skilled in the art knows statistical procedures to assess whether two values or areas are significantly different from each other, such as Student's t-test or chi-square tests. It is evident for the skilled person that any background signal has to be subtracted when analyzing the data. In specific embodiments, the increase is at least about 10 %. In other embodiments, the increase is at least 20 %, 30 %, 40 %, 50 % or 100 %, especially 150 % or 200 %.
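The area-integral determination and the percentage increase relative to a control can be sketched as follows. This is a minimal Python illustration under stated assumptions: the size distribution is given as lists of sizes and signal values, the background is a single constant value, and integration uses the trapezoidal rule. The function names and the numbers in the usage lines are hypothetical.

```python
def total_dna_area(sizes_bp, signal, background=0.0):
    """Area under the background-subtracted size distribution (trapezoidal rule)."""
    # Subtract the background and clip at zero so that noise cannot add negative area.
    corrected = [max(v - background, 0.0) for v in signal]
    return sum(
        (sizes_bp[i + 1] - sizes_bp[i]) * (corrected[i] + corrected[i + 1]) / 2.0
        for i in range(len(sizes_bp) - 1)
    )

def percent_increase(sample_area, control_area):
    """Increase of the sample relative to the control, in percent."""
    if control_area <= 0:
        raise ValueError("control area must be positive")
    return (sample_area - control_area) / control_area * 100.0

# Hypothetical numbers for illustration only.
control = total_dna_area([0, 10, 20], [1.0, 3.0, 1.0], background=1.0)
sample = total_dna_area([0, 10, 20], [1.0, 3.4, 1.0], background=1.0)
increase = percent_increase(sample, control)
```

Whether an observed increase is statistically significant would still have to be assessed on replicate measurements, for example with the tests named above; this sketch only computes the point estimate.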

The method of the present invention may comprise further steps. Particularly, the method may further comprise amplifying the plasma-DNA and analyzing the amplified DNA, particularly by comparative genomic hybridization (CGH), especially by microarray-based CGH.

CGH is a molecular-cytogenetic method for the analysis of copy number changes (gains/losses) in the DNA content of a given subject's DNA. CGH can detect unbalanced chromosomal changes. In general, DNA from a subject and from a control (or reference) are each labeled with different tags for later analysis by fluorescence. After mixing subject and control DNA along with e.g. unlabeled human cot-1 DNA (placental DNA that is enriched for repetitive DNA sequences such as the Alu and Kpn family) to suppress repetitive DNA sequences, the mix is hybridized to normal metaphase chromosomes or, for microarray- or matrix-CGH, to an array or matrix containing hundreds or thousands of defined DNA probes. Using epifluorescence microscopy and quantitative image analysis, regional differences in the fluorescence ratio of gains/losses vs. control DNA can be detected and used for identifying abnormal regions in the genome. Chromosomal CGH is capable of detecting loss, gain and amplification of the copy number at the level of chromosomes. In order to provide sufficient nucleic acid for CGH, plasma DNA is amplified before CGH according to methods known to the skilled person. Exemplary methods for amplification and CGH are also detailed in the Examples.

A highly preferred protocol is detailed in the following: The laboratory part comprises a clearly defined protocol, specifying how to prepare plasma DNA and the parameters that may be used to diagnose a disease, necessary method steps and further suitable analyses. An exemplary procedure is shown in Figure 4 and also exemplified in the following. The steps may be carried out as follows or in a different manner, wherein it is evident to the skilled person that each of the following steps can be modified and combined with the methods described in other parts of the description including the Examples:

1. Preparation of plasma DNA from a blood sample. This step may be carried out as follows: About 80-90% of the plasma DNA of a healthy human is derived from hematopoietic cells and 10-20% is derived from non-hematopoietic cells. As the total amount of plasma DNA is usually increased in tumor patients and pregnant females, the ratios may vary as the additional DNA results from tumor or fetal cells. It is evident for the skilled person that during the preparation of DNA, enrichment of hematopoietic cells should be avoided. This may be achieved by using specific tubes for collecting blood and preparation techniques which may include the addition of formaldehyde in order to stabilize the cell membranes of blood cells and thereby reduce lysis. Evidently, it is also advantageous that plasma rather than serum is used.

2. Determination of DNA concentration and analysis of the size distribution of plasma DNA.

This step may be carried out as detailed above or in the Examples.

3. Within the size distribution, two regions in which enrichment may be monitored are particularly relevant, as they are indicative of diseases:

a) The first region reflects enrichment at about 166 bp, which is referred to as the "apoptosis peak". If the plasma-DNA is, for example, assessed by a bioanalyzer, the size distribution of DNA may be shown on the X axis in the unit base pairs. The apoptosis peak is usually obtained as a bell-shaped curve typical of a Gaussian distribution, wherein the maximum is usually at about 166 base pairs (which can vary by several base pairs). The area around 166 base pairs is of relevance, as this size corresponds to a fundamental packaging form of DNA in human cells, namely a nucleosome. A nucleosome is a complex of DNA and histones. A nucleosome core consists of two copies of each of the histones H2a, H2b, H3 and H4, around which 147 bp of DNA are wound in 1.65 coils. The linker region between two nucleosomes has a length of 50-60 base pairs in humans. According to established theories, the DNA of tumor cells or fetal cells is predominantly derived from apoptotic cells. During apoptosis, a DNAse enzyme cleaves genomic DNA at internucleosomal regions, resulting in fragments of 147 to 200 bp with a peak usually at about 166 base pairs. The assessment of the regions allows for a first conclusion.

b) The second area consists of DNA fragments having more than 300 base pairs. The size distribution of this area usually does not show a Gaussian distribution curve, but may be asymmetric, usually with a steep increase to a maximum at about 308 base pairs, followed by a slow decrease towards larger DNA fragments, which may be up to 10,000 bp in size. These DNA fragments are significantly larger than those mentioned under item a). It is reasonable to assume that the DNA fragments having a length of about 300 base pairs are composed of the DNA of two subsequent nucleosomes. Larger fragments reflect the corresponding longer DNA molecules, such as several subsequent nucleosomes. The second peak therefore reflects the extent to which the DNA was digested by DNAse and, therefore, whether the DNA is derived from apoptotic cells only, or from a combination of apoptotic cells and viable cells. Therefore, the peak is referred to as "viability peak" or "non-apoptotic peak". So far, the heights of the viability peaks have always been lower than those of the apoptosis peaks and the ratio of the prevailing heights was often in the range of about 6:1 (apoptosis peak : non-apoptosis peak); however, this ratio is not fixed and may vary. The viability peak also correlates with the occurrence of circulating tumor cells (CTCs).

Earlier reports found that tumor DNA (Diehl et al., 2005) and fetal DNA (Lo et al., 2010) are fragmented to a larger extent, that this fragmentation results in fragments of about 166 bp, and that it is related to apoptosis. A viability peak was not disclosed in these reports. It is essential for the method of the present invention that the size distribution is determined in a reliable manner, in order to correctly detect apoptosis and viability peaks.

4. Depending on the status of the apoptosis and viability peaks, the further process is decided upon, e.g., whether copy numbers of tumor cells are reflected in plasma DNA in an appropriate manner and whether plasma DNA should be further analyzed, whether sequencing of plasma DNA should be contemplated and whether CTCs are likely to be found, etc. Accordingly, further tests may be added in order to increase the informative value of the results of the size distribution. Threshold values may be determined for apoptosis and viability peaks which are to be reached in order to trigger further procedure steps.

5. Thereafter, a DNA library is prepared (e.g., by adding linkers to the ends of the DNA fragments), and the fragments are then amplified by PCR. A commercial kit such as the WGA2 kit (Sigma-Aldrich Chemie GmbH, Munich, Germany) may be used.

6. -7. As the next steps, the size distribution of the amplification product may be again established and depending on the size of the fragments of plasma DNA a further amplification may be carried out.

8. The amplification product may be analyzed by different methods such as CGH (comparative genomic hybridization), particularly array CGH. Here, the amplification product and the reference DNA, each labeled with different fluorochromes, are immobilized on an array by hybridization to immobilized sequences whose physical address in the genome is known. Accordingly, a high resolution analysis of the copy number of the plasma DNA may be reached (similar techniques are detailed in Fiegler et al., 2007; Geigl et al., 2009). For data analysis, the algorithm for array CGH as detailed in Geigl et al., 2009, has been completely revised and new properties have been added. For example, the algorithm automatically identifies and corrects common artefacts which may occur due to strong fragmentation of the plasma DNA. The calculation to identify over- and underrepresented regions was completely revised. For prenatal application, a specific calculation procedure in order to establish copy numbers of chromosomes 13, 18, and 21 was added. Furthermore, the prepared plasma DNA may be sequenced by next-generation sequencing, or individual genes may be analyzed by Sanger sequencing or other procedures such as the SNaPshot assay (Dias-Santagata et al., 2010). Depending on the obtained plasma distribution, plasma DNA fragments might be further analyzed by deep sequencing in order to detect MRDs.

9. Our algorithm provides information about the probability to detect CTCs in blood.

This correlates with the viability peak; the more prominent the peak, the higher the probability that CTCs are found in blood.

As detailed above, the method of the present invention is intended for the diagnosis of a disease. The method is particularly suitable for a disease associated with increased apoptosis, particularly wherein the disease is cancer or fetal aneuploidy.

Apoptosis is the process of programmed cell death (PCD) that may occur in multicellular organisms. Biochemical events lead to characteristic cell changes (morphology) and death. These changes include blebbing, cell shrinkage, nuclear fragmentation, chromatin condensation, and chromosomal DNA fragmentation. Altered apoptosis can result in a number of cancers, autoimmune diseases, inflammatory diseases, and viral infections. Loss of control of cell death can lead to neurodegenerative diseases, hematologic diseases, and tissue damage. Preferably, the disease to be diagnosed is cancer or fetal aneuploidy.

Cancer is a term for a large group of different diseases. Cancers create malignant tumors, cells that divide and grow uncontrollably and invade nearby parts of the body. The cancer may also spread to more distant parts of the body through the lymphatic system or bloodstream. Not all tumors are cancerous. Benign tumors do not grow uncontrollably, do not invade neighbouring tissues, and do not spread through the body. Cancer is fundamentally a disease of failure of regulation of tissue growth. In order for a normal cell to transform into a cancer cell, the genes which regulate cell growth and differentiation must be altered. In general, the method of the present invention is applicable to any cancer associated with increased apoptosis and genetic aberration. In particular, inventors were able to detect colorectal cancer, breast cancer, prostate cancer, and lung cancer with the method of the present invention.

In a preferred embodiment of the present invention, the method of the present invention further comprises the step of detecting the presence of CTCs, particularly if the non-apoptosis peak is increased. It could be shown that an increased non-apoptosis peak may be associated with an increased number of CTCs and that the size of the non-apoptosis peak may be correlated with the number of CTCs. CTCs are cells that have detached from a primary tumor and circulate in the bloodstream, and are therefore an indicator for cancer. Cancer research has demonstrated the critical role circulating tumor cells play in the metastatic spread of carcinomas, and CTCs may constitute seeds for subsequent growth of additional tumors (metastasis) in different tissues. Methods for detecting CTCs with the requisite sensitivity and reproducibility in patients with metastatic disease are known in the art.

Fetal aneuploidy is an aneuploidy (see above) in an unborn child. Fetal aneuploidy may be detected by analyzing the mother's plasma DNA which also comprises fetal DNA (see below).

The methods of the present invention may be used in the diagnosis of a disease, wherein the diagnosis may be a predictive or prognostic diagnosis or therapy monitoring, particularly of cancer.

Prognostic biomarkers can be separated into two groups: biomarkers that give information on recurrence in patients who receive curative treatment and biomarkers that correlate with the duration of (progression free) survival in patients with metastatic disease. According to an NIH Consensus Conference, a clinically useful prognostic marker must be a proven independent, significant factor that is easy to determine and interpret and has therapeutic consequences. A biomarker with predictive value gives information on the effect of a therapeutic intervention in a patient. A predictive biomarker can also be a target for therapy. One can distinguish upfront and early predictive markers. The first can be used for patient selection and the second provides information early during therapy (see Oldenhuis, 2008).

The plasma may be obtained from any suitable subject, as also detailed above. Preferably, the subject is a mammal. Evidently, subjects of particular interest include animals of commercial value (e.g. domestic animals such as horses) or personal value (e.g. pets such as dogs, cats). The method is especially preferred with human subjects, for which diagnostic methods including pre-natal diagnostic methods are commonly employed.

The method of the present invention may also be used in the detection of a fetal disease, wherein the plasma sample is from the pregnant mother. As detailed herein, it is estimated that 10 % to 20 % of the plasma DNA of a pregnant female are from the fetus. If the fetus is diseased the non-apoptosis peak in the pregnant mother's plasma DNA distribution is increased relative to a control (a healthy pregnant or non-pregnant subject). In a preferred embodiment of the present invention, the plasma sample is from the pregnant mother's blood if the disease is aneuploidy of the respective fetus. Particularly, trisomies 13, 18 and 21 are of clinical relevance. Of these, trisomy 21 and trisomy 18 are the most common. In rare cases, a fetus with trisomy 13 can survive. Accordingly, if an increased non-apoptosis peak is identified in the plasma distribution, preferably the fetus or fetal DNA is further tested for trisomy 13, 18 and/or 21.

Quantification of fetal DNA is typically based on Y chromosome specific sequences. In this case, analysis is limited to pregnancies carrying a male fetus. For detection of Y-chromosomal DNA, qRT-PCR of the Y-specific SRY gene and a housekeeping gene, e.g. GAPDH or β-globin, can be used. For qRT-PCR, 5 µl of isolated plasma DNA was amplified in duplicates using a standard SYBR Green protocol. Primers for SRY were designed according to a publication from Dennis Lo et al., 1998. For quantitation of total plasma DNA concentration a calibration curve was prepared. Genomic male DNA was diluted to concentrations of 66 ng, 6 ng, 660 pg, 66 pg, 6.6 pg and run in parallel and in duplicates with each analysis. The amount of 6.6 pg of DNA corresponds to one genome equivalent (GE) representing the DNA content of one single cell and was used for calculation of copy numbers of the specific target gene. The concentration, expressed in copies per milliliter, was calculated using the following equation:

$$c = Q \times \frac{V_{DNA}}{V_{PCR}} \times \frac{1}{V_{ext}}$$

[c, concentration in copies/ml; Q, target quantity (GE); V_DNA, total volume of DNA obtained after extraction; V_PCR, volume of input DNA for PCR; V_ext, volume of plasma extracted]

Because the SRY gene is found in all nucleated cells of males only, whereas the β-globin gene is present in the male fetus and the pregnant female, we calculated the percentage of male DNA in a particular plasma sample, denoted as Y%, using the following equation (Lui et al., 2002):

$$Y\% = \frac{\mathrm{SRY}}{\beta\text{-globin}} \times 100$$
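As a worked illustration of the two equations, the following Python sketch converts a measured DNA mass into genome equivalents (6.6 pg per GE, as stated above), computes the concentration in copies per milliliter and derives Y%. The function names are hypothetical, and the example values for the elution volume (50 µl) and the extracted plasma volume (1 ml) are assumptions chosen for illustration; only the PCR input of 5 µl is taken from the text.

```python
PG_PER_GENOME_EQUIVALENT = 6.6  # one genome equivalent (GE), as stated in the text

def genome_equivalents(mass_pg):
    """Convert a DNA mass in pg into genome equivalents (GE)."""
    return mass_pg / PG_PER_GENOME_EQUIVALENT

def copies_per_ml(q_ge, v_dna_ul, v_pcr_ul, v_ext_ml):
    """c = Q x (V_DNA / V_PCR) x (1 / V_ext), in copies per ml of plasma."""
    if v_pcr_ul <= 0 or v_ext_ml <= 0:
        raise ValueError("volumes must be positive")
    return q_ge * (v_dna_ul / v_pcr_ul) * (1.0 / v_ext_ml)

def percent_male_dna(sry_copies, beta_globin_copies):
    """Y% = SRY / beta-globin x 100."""
    if beta_globin_copies <= 0:
        raise ValueError("beta-globin quantity must be positive")
    return sry_copies / beta_globin_copies * 100.0

# Hypothetical run: 66 pg of target DNA measured in a 5 µl PCR input,
# 50 µl total eluate (assumed), 1 ml plasma extracted (assumed).
q = genome_equivalents(66.0)
c_sry = copies_per_ml(q, v_dna_ul=50.0, v_pcr_ul=5.0, v_ext_ml=1.0)
y_percent = percent_male_dna(sry_copies=c_sry, beta_globin_copies=1000.0)
```

Because V_DNA and V_PCR enter only as a ratio, they may be given in any common unit as long as both use the same one.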

Other approaches to overcome the limitation of using Y-specific markers use epigenetic markers that differ between maternal and fetal DNA. Some of these use the difference in methylation patterns between fetal and maternal cells. DNA methylation relates to the presence of methyl groups at the carbon-5 position of a cytosine that is followed by a guanine (a so-called CpG dinucleotide). CpG methylation in the promoter regions of genes is part of the regulatory system of gene expression and is tissue-specific. Accordingly, genes may be selected for analysis of fetal cells in maternal blood whose methylation differs between fetal cells (usually placenta cells, as the majority of fetal cells in maternal blood is derived from the placenta) and maternal blood cells (Chiu RW, Lo YM (2011)). In one example, the methylation profile of the SERPINB5 (Serpin peptidase inhibitor, clade B (ovalbumin), member 5) promoter may be used, which is hypomethylated and hypermethylated in fetal and maternal cells, respectively (Chim SS, et al. (2005)). In another approach, a 5-methylcytidine-specific antibody may be used to detect methylated sequences and to enrich for fetal methylated DNA (see Papageorgiou EA, et al. (2011)).

In another aspect, the present invention relates to the use of the size distribution of plasma DNA in the diagnosis of a disease, wherein an increased non-apoptosis peak in the size distribution relative to a control is indicative of the disease. Particularly, the present invention relates to the use of size distribution of plasma DNA in the diagnosis of a disease associated with increased apoptosis, wherein the size distribution is biphasic having an apoptosis peak and a non-apoptosis peak, wherein the apoptosis peak is characterized by a maximum in the range of from 150 bp to 180 bp and wherein the non-apoptosis peak is characterized by a maximum in the range of from 300 bp to 350 bp, and wherein an increased non-apoptosis peak in the size distribution relative to a control is indicative of the disease. The use may be further characterized as detailed above in the context of the method of the present invention.

The invention is not limited to the particular methodology, protocols, and reagents described herein because they may vary. Further, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present invention. As used herein and in the appended claims, the singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. Similarly, the words "comprise", "contain" and "encompass" are to be interpreted inclusively rather than exclusively. If not stated otherwise, it is understood that the term "about" and the character "~" in combination with a numerical value n ("about n", "~n") indicates a value x in the interval given by the numerical value ±5 % of the value, i.e. n - 0.05 * n ≤ x ≤ n + 0.05 * n. In case the term "about" or the character "~" in combination with a numerical value n describes a particular embodiment of the invention, the value of n is most preferred, if not indicated otherwise.
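The definition of "about n" as the interval n ± 5 % can be made concrete with a short Python sketch. The function names are hypothetical and serve only to illustrate the arithmetic of the definition.

```python
def about_interval(n, tolerance=0.05):
    """Return (lower, upper) for 'about n', i.e. n - 0.05 * n and n + 0.05 * n."""
    delta = abs(n) * tolerance
    return (n - delta, n + delta)

def is_about(x, n, tolerance=0.05):
    """True if x lies within the closed interval defined by 'about n'."""
    lower, upper = about_interval(n, tolerance)
    return lower <= x <= upper

# Example: "about 166 bp" covers roughly 157.7 bp to 174.3 bp.
lower_166, upper_166 = about_interval(166)
```

Under this definition, a maximum measured at 170 bp would still count as "about 166 bp", whereas one at 180 bp would not.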

Unless defined otherwise, all technical and scientific terms and any acronyms used herein have the same meanings as commonly understood by one of ordinary skill in the art in the field of the invention. Although any methods and materials similar or equivalent to those described herein can be used in the practice of the present invention, the preferred methods, and materials are described herein. The following Figures and Examples are intended to illustrate various embodiments of the invention. As such, the specific modifications discussed are not to be construed as limitations on the scope of the invention. It will be apparent to the person skilled in the art that various equivalents, changes, and modifications may be made without departing from the scope of the invention, and it is thus to be understood that such equivalent embodiments are to be included herein.

FIGURES

Figure 1 shows characteristics of plasma-DNA from healthy controls and CRC patients: (a) Comparison of plasma-DNA quantities between healthy controls (1) and tumor patients (2) (p=0.018). (b) to (d) Sizing of plasma-DNA fragments with the microfluidics-based Agilent 2100 Bioanalyzer platform. Normalization was performed to two internal markers, i.e. a lower marker at 35 bp and an upper marker at 10,380 bp; for each analysis we used 800 pg of DNA. (b) Distribution of plasma-DNA from a healthy donor (1st peak: 140-238 bp and 88 %). (c) Sizing of plasma-DNA from patient 11 (1st peak: 100-268 bp and 89 %; 2nd peak: 288-503 bp and 10 %). (d) Plasma-DNA analysis from patient 22 (1st peak: 81-241 bp and 67 %; 2nd peak: 249-400 bp and 25 %). (e) Patients with the second, non-apoptotic peak (2) have higher plasma-DNA concentrations (mean: 604 ng/ml; median: 562 ng/ml; range: 260-1037 ng/ml) as compared to the patients without this second peak (1) (mean: 103 ng/ml; median: 89 ng/ml; range: 22-201 ng/ml) (p<0.0001).

Figure 2 shows a combined copy number profile representing an average profile from all individual CTCs.

Figure 3 shows exemplary plasma-DNA size distributions. (a) Smear analysis with the 2100 Expert software of the plasma-DNA of healthy control M3. Sliders were placed at the beginning and end of the peak to determine the % of total. (b) Smear analysis of patient #18 (Mini Kit, 800 pg) with sliders positioned at the first (101 bp - 247 bp) and second (254 bp - 590 bp) peak.

Figure 4 shows the workflow of a preferred method of the present invention.

EXAMPLES

Material and Methods

Patients (32 patients)

Twenty-three patients were male, 9 female, and the mean age at diagnosis was 68 years (range: 45-81 years). All patients (n=32) had advanced stage (Dukes D/stage IV), progressive disease at the time of blood collection. The period between diagnosis of the primary tumor and blood collection to obtain both plasma-DNA and circulating tumor cells (CTCs) varied extensively with a range from 1 month (patients 2, 9, 18, 23) to more than 12 years (151 months; patient 28) (mean: 31 months; median: 24 months).

The study was approved by the local ethics committee, and written informed consent was obtained from all patients. All patients were seen at the Department of Internal Medicine at the University Hospital at the Medical University of Graz.

Clinical evidence for progression was based on one of the following three criteria:

a) newly diagnosed stage IV disease without previous treatment and before the start of a palliative treatment (n=6);

b) no recent chemotherapy and during one of the regular check-ups done every 8 to 12 weeks progression was noted [usually by an imaging procedure, such as ultrasound or X-ray] (n=15);

c) a tumor failed to respond to a given therapy and blood was collected before the start of another treatment (n=11).

Group a) contains one patient (i.e. #38), who was initially treated with surgery alone; however, progression was noted 44 months later. Therefore, this patient is not "newly diagnosed", but unlike the patients in groups b) and c) he did not receive any chemotherapy prior to our blood collection and was therefore included in group a).

The interval between the last chemotherapy and blood collection depended on the above definition of the three different groups with progressive disease and accordingly did not exist for group a) (the mean interval between the diagnosis of the primary and the blood collection was for this group: 8.5 months; median: 1 month; range: 1-45 months; this extensive range was caused by patient #38, as mentioned above).

The mean interval between last chemotherapy and our blood collection was 198 days (median: 121 days; range: 67-688 days) for group b) (interval between diagnosis of the primary and blood collection: mean: 49 months; median: 41 months; range: 8-151 months), and 25 days (median: 21 days; range: 12-48 days) for group c) (interval between diagnosis of the primary and blood collection: mean: 20 months; median: 16 months; range: 2-47 months).

Preparation of cell-free tumor DNA using a standardized protocol

It has been well established that serum contains a higher concentration of DNA than plasma, possibly because of DNA released from the blood cells during the clotting process (Lo et al. 1998). Thus, studying plasma DNA rather than serum DNA is more reflective of the in vivo situation in the circulation. Factors that may influence the yield of circulating DNA include the isolation and quantification methods; the blood processing methods, including the time elapsed until the isolation procedure is started; and the centrifugation conditions. Therefore, we standardized our protocol as follows:

Nine ml whole blood was collected in routine EDTA Vacutainer tubes (BD Biosciences, Heidelberg, Germany). To stabilize cell membranes and to impede cell lysis, 0.225 mL of a 10% neutral buffered solution containing formaldehyde (4% weight per volume) (Sigma-Aldrich, Vienna, Austria) was added immediately after blood had been drawn. Blood samples were gently inverted, stored at 4°C and further processed within two hours. Cell-free tumor DNA was prepared according to Dhallan et al. (Dhallan et al. 2004, 2007). In brief, tubes were centrifuged at 200g for 10 min with the brake and acceleration powers set to zero, followed by a centrifugation step at 1600g for 10 min. The supernatant was collected, transferred to a new 15 ml tube and spun at 1600g for 10 min. The plasma was carefully transferred to a new 2 ml Eppendorf tube and stored at -80°C.

DNA was isolated from plasma samples using the QIAamp DNA Blood Mini Kit (#51306, Qiagen, Hilden, Germany) or the Qiagen circulating nucleic acids Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. DNA was eluted in 30 μl of distilled water. Quantification and quality of the extracted DNA were determined using the NanoDrop Spectrometer ND-1000 (Peqlab Biotechnologie, Erlangen, Germany).

Quantitative analysis of plasma-DNA

For quantification of cell-free DNA in plasma we used the Quant-iT™ PicoGreen® dsDNA Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. Briefly, for a standard curve (ranging from 500 pg/μl to 16 pg/μl) we used bacteriophage lambda DNA. 50 μl of the standard dilutions were transferred into disposable cuvettes and mixed with 50 μl of Quant-iT™ PicoGreen® reagent working solution. Experimental DNA samples were diluted in 1x TE (1:25) to a final volume of 50 μl and also mixed with 50 μl of Quant-iT™ PicoGreen® reagent working solution in disposable cuvettes. Fluorescence of the samples was then measured using a QuantiFluor™ Fluorometer (Promega, Madison, WI, USA). The fluorometer was calibrated using the blank and the highest concentration of the standard dilutions. DNA concentration of the samples was determined from the standard curve generated using a spreadsheet software such as Excel (Microsoft Corp.).
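The standard-curve step described above can be sketched as a straight-line fit followed by inversion. This is only an illustration: the fluorescence readings are invented, the two-fold dilution series is an assumption consistent with the stated 500 to 16 pg/μl range, the "1:25" dilution is read as a 25-fold dilution, and the function names `fit_line` and `sample_concentration` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sample_concentration(fluorescence, slope, intercept, dilution_factor=25):
    """Concentration of the undiluted sample in pg/ul (dilution factor is an assumption)."""
    diluted = (fluorescence - intercept) / slope
    return diluted * dilution_factor


# Assumed two-fold lambda-DNA standards (pg/ul) and invented fluorescence readings.
standards = [500, 250, 125, 62.5, 31.25, 15.625]
readings = [2.0 * c + 10.0 for c in standards]

slope, intercept = fit_line(standards, readings)
print(sample_concentration(110.0, slope, intercept))
```

A linear fit is used here because PicoGreen fluorescence is generally treated as linear over this range; a spreadsheet trend line, as mentioned in the text, would give the same result.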

Based on the DNA amount estimates we used 800 pg for a qualitative bioanalyzer analysis and 50 ng for the whole genome amplification.

Qualitative analysis of plasma-DNA (sizing / Bioanalyzer)

The minimal amount of plasma-DNA needed for the Bioanalyzer was defined by the amount of plasma-DNA from healthy donors needed to visualize the apoptosis peak. For each sample the same amount of DNA, i.e. 800 pg, was employed to reveal relative differences between the samples.

The length of the native plasma-DNA was determined by electrophoresis on an Agilent 2100 Bioanalyzer using the DNA series II 1000 kit / Agilent High Sensitivity DNA kit. The Agilent 2100 Expert software (version B.02.07 or higher) offers a smear analysis with an integrator allowing size adjustments of the smear region. The software automatically determines the average size (bp), size distribution in CV (%), concentration (pg/μl), % of total, and molarity (pmol/L) for each defined smear region.

Estimation of cell numbers releasing DNA fragments into the blood circulation

The Agilent 2100 Bioanalyzer (Agilent Technologies) allows smear analysis for DNA sizing and quantification. In each sample we focused on the range from 80 to 600 bp as we invariably observed merely a flat line without measurable fluorescence intensities below 80 bp or above 600 bp, indicating that no measurable DNA is present in this range. We applied sliders within the electropherogram to determine the percentage of the total DNA within the respective smear region. The sliders were positioned at the beginning and the end of the apoptosis peak and if applicable also to the second peak. The Agilent software determines the average size (bp), size distribution in CV (%) and % of total for each defined smear region.

For example, in healthy control M3, we positioned the sliders at 105 bp and 180 bp, i.e. beginning and end of the first peak (Fig. 3 a)). As there was no second peak, no other sliders were placed. The DNA within the thus defined region corresponds to 90% of the total DNA. Using PicoGreen, the plasma-DNA concentration of M3 was determined to be 19.51 ng/ml; 90% are 17.559 ng/ml. Assuming that the blood volume of a male is about 6 l, this translates to 105 μg of DNA in the entire circulation. As a diploid cell has 6.6 pg of DNA, 105 μg translate to 16x10⁶ apoptotic cells, which contributed their DNA to the circulation at the time when the blood was drawn.

In patient #18 sliders were positioned at the first (101 bp - 247 bp) and second (254 bp - 590 bp) peak (Fig. 3 b)), revealing that the first peak contains 69% and the second peak 31% of plasma-DNA. The plasma-DNA concentration of patient 18 was 384.88 ng/ml, thus 69% correspond to 266 ng/ml and 31% to 119 ng/ml. In a 6 l blood volume (patient 18 was male) the first peak then translates to 1.6 mg of DNA in the entire circulation, or 241x10⁶ apoptotic cells, whereas the second peak corresponds to 716 μg of DNA or 108x10⁶ non-apoptotic cells. These calculations again relate to the time point of the blood collection because the half-life of tumor DNA is estimated at 16 minutes (Lo et al. 1999) and therefore no estimates about tumor DNA release for an entire day are possible.

All calculations were based on these considerations; for females we used 5 l as average blood volume.
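The arithmetic behind these estimates can be reproduced with a few lines of code. The following is a minimal sketch that uses only the figures given above (6.6 pg of DNA per diploid cell, 6 l blood volume for males); the function name `estimate_cells` is hypothetical.

```python
# Mass of DNA in one diploid cell, as stated in the text (picograms).
DIPLOID_DNA_PG = 6.6


def estimate_cells(conc_ng_per_ml, peak_fraction, blood_volume_l):
    """Return (DNA in the circulation in micrograms, estimated number of cells).

    conc_ng_per_ml: plasma-DNA concentration measured with PicoGreen
    peak_fraction:  share of total DNA inside the smear region (0..1)
    blood_volume_l: assumed blood volume (6 l for males, 5 l for females)
    """
    peak_conc = conc_ng_per_ml * peak_fraction        # ng/ml within the peak
    total_ng = peak_conc * blood_volume_l * 1000.0    # ng in the whole volume
    total_ug = total_ng / 1000.0
    cells = total_ng * 1000.0 / DIPLOID_DNA_PG        # ng -> pg, then pg per cell
    return total_ug, cells


# Healthy control M3: 19.51 ng/ml, 90% in the apoptosis peak.
m3_ug, m3_cells = estimate_cells(19.51, 0.90, 6.0)
# Patient 18: 384.88 ng/ml, 69% in the first and 31% in the second peak.
p18_first_ug, p18_first_cells = estimate_cells(384.88, 0.69, 6.0)
p18_second_ug, p18_second_cells = estimate_cells(384.88, 0.31, 6.0)

print(round(m3_ug), round(m3_cells / 1e6))
print(round(p18_first_ug), round(p18_first_cells / 1e6))
print(round(p18_second_ug), round(p18_second_cells / 1e6))
```

The sketch simply follows the text in multiplying the plasma concentration by the whole blood volume; it is an order-of-magnitude estimate for the time point of the blood draw, not a measurement.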

Amplification of cell-free tumor DNA / Generation of random DNA libraries

We adjusted the PCR procedure to minimize any further degradation of DNA and to allow an unbiased amplification of all fragments irrespective of their size according to our previous experience (Geigl and Speicher 2007; Geigl et al. 2009).

For our purposes, i.e. unbiased amplification of plasma-DNA with fragments of different lengths, it turned out that the GenomePlex Complete Whole Genome Amplification Kit (WGA2, Sigma-Aldrich, Vienna, Austria) is, with some modifications, best suited. In brief, we omitted the fragmentation step and directly prepared libraries. Amplification was performed by adding 7.5 μl of 10x Amplification Master Mix, 47.5 μl of nuclease-free water and 5 μl WGA DNA Polymerase. Samples were amplified using an initial denaturation of 95°C for 3 min followed by 14 cycles, each consisting of a denaturation step at 94°C for 15 s and an annealing/extension step at 65°C for 5 min. After purification using the GenElute PCR Clean-up Kit (#NA1020; Sigma-Aldrich, Germany), DNA concentration was determined by a Nanodrop spectrophotometer. We used 50 ng of plasma-DNA as starting template for the whole genome amplification. After amplification the mean concentration of DNA was 136.01 ng/μl (min 57.32 ng/μl; max 241.12 ng/μl).

DNA Isolation of tumor DNA from FFPE sections

Formalin-fixed paraffin embedded (FFPE) tissue samples from primary tumors and, if available, from metastases were cut and mounted on a microscope slide. Hematoxylin and eosin stained slides were reviewed by experienced pathologists (C.L. and S.L.). Areas with a high tumor cell infiltration were microdissected from parallel sections, and DNA was isolated using the QIAamp DNA FFPE tissue kit (#56404, Qiagen, Hilden, Germany) following the manufacturer's instructions.

CTC-isolation / Veridex

Blood samples (7.5 ml each) were collected into CellSave tubes (Veridex, Raritan, NJ, USA). The Epithelial Cell Kit (Veridex) was applied for CTC enrichment and enumeration with the CellSearch system as described previously (Riethdorf et al. 2007). In brief, in a first step CTCs were captured by anti-epithelial cell adhesion molecule (EpCAM)-antibody-bearing ferrofluid. Subsequent identification of CTCs was based on cytokeratin (CK)-positivity and negativity for the leukocyte common antigen CD45. In addition, 4',6-diamidino-2-phenylindole (DAPI) staining was done to evaluate the integrity of the nucleus.

For further analysis of CTCs, the EpCAM-enriched cell fraction in a volume of 300 μl was transferred from the CellSearch cartridge onto slides. Nucleated CK-positive, CD45-negative CTCs were isolated with the help of a micromanipulation device comprising the microinjector CellTram vario and the micromanipulator TransferMan NK2 (Eppendorf AG, Hamburg, Germany) connected to an Axiovert 200 inverted microscope (Carl Zeiss AG, Jena, Germany). Single CTCs were placed into a 2.5 μl water drop in a 200 μl PCR reaction tube and stored overnight at -20°C prior to whole genome amplification. To propagate DNA from single CTCs, either the GenomePlex Single Cell Whole Genome Amplification Kit (Sigma-Aldrich, St. Louis, MO, USA) as described previously (Fiegler et al. 2006, Geigl et al. 2007) or the GenomiPhi DNA amplification kit (GE Healthcare, Chalfont St. Giles, UK) was applied.

For WGA using the GenomiPhi DNA amplification kit, 9 μl sample buffer and 1 μl protease (Qiagen, Hilden, Germany, dissolved in distilled water, final concentration 10.7 μAU/μl) were added for cell lysis (50°C for 15 min, 70°C for 15 min). The DNA was denatured (95°C for 2 min) and 10 μl reaction mix (9 μl reaction buffer + 1 μl enzyme mix) was added. The amplification reaction was performed at 30°C for 2.5 h, followed by a final inactivation step at 65°C for 10 min.

WGA products were purified using the GenElute PCR Clean-up Kit (Sigma-Aldrich, St. Louis, MO, USA) for DNA obtained with the GenomePlex Single Cell WGA Kit (Sigma-Aldrich) or NucleoSeq Columns (Macherey-Nagel, Düren, Germany) for GenomiPhi amplified DNA. The final DNA concentration was determined by a NanoDrop ND-1000 Spectrometer (Peqlab Biotechnologie, Erlangen, Germany).

When a KRAS mutation was present in the primary tumor, we also tested the CTC amplification products for it by Sanger sequencing.

Array-CGH

Array-CGH was carried out using a genome-wide oligonucleotide microarray platform (Human genome CGH 60K microarray kit, Agilent Technologies, Santa Clara, CA, USA) following the instructions of the manufacturer (protocol version 6.0). As reference DNA we used commercially available male reference DNA (Promega, Madison, WI, USA), which was amplified for the hybridization with WGA2. Samples were labeled with the Bioprime array-CGH genomic labeling system (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. In brief, 500 ng test DNA and reference DNA were differentially labeled with dCTP-Cy5 or dCTP-Cy3 (GE Healthcare, Milwaukee, WI, USA). Slides were scanned using a microarray scanner, and images were analyzed using Feature Extraction and DNA Workbench 5.0.14 (Agilent Technologies) with the statistical algorithm ADM-2; the sensitivity threshold was 6.0.

Array-CGH evaluation

Evaluation of our array-CGH was done based on our previously published algorithm (Geigl et al. 2009) in R (R Development Core Team 2008). In brief, data normalization and calculation of ratio values were conducted employing the Feature Extraction software 9.1 from Agilent Technologies. The algorithm focuses on detecting which ratio values differ significantly from the ratio profile's mean and should therefore be considered as over- or underrepresented. The concept of the algorithm includes the employment of running means with different window sizes and analyses at progressively greater levels of smoothing and then combining these analyses. Consecutive data points are combined and their mean ratio values are presented in graphs of array-CGH results. The algorithm iterates through the profile by changing window positions, employing a sliding window approach. The algorithm calculates the mean ratio value for each window based on the respective ratio values. Assuming that a window's ratio values are distributed normally we estimate the standard deviation (SD) by considering the outmost value that is within ± 34.1% of the mean. Thresholds were defined as ± 1.25 times the SD in single cell experiments and due to the higher noise with plasma-DNA thresholds had to be defined more stringently as ± 1.5 times the SD. The obtained values were assigned to all oligos of the respective window. If a window shows a significantly increased or decreased mean ratio value the mean position of that window will be displayed above or below the respective region of the ratio profile. We used 7 different window sizes consisting of 10, 25, 50, 100, 250, 500 and 750 adjacent oligos. Depending on the window size it will be labeled with a different color and distance to the X-axis and thus generates a color bar code. As we use sliding windows, the assessment of the copy number status of each oligo is based on 44 different calculations. 
The final assessment is indicated in a single green or red bar for gained or lost, respectively, regions, which is generated if at least 39 (90%) of the 44 repetitive calculations consistently result in the same copy number change. Furthermore, the algorithm generates a table with all localizations of significant calls which allows detailed mapping of each CNV. All ratio profiles shown in the center of the images were calculated using a 500 oligonucleotide window.
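The sliding-window idea can be illustrated with a strongly simplified sketch. It is not the published algorithm: it estimates one SD for the whole profile (taken as the absolute deviation below which about 68.2% of values fall, which is one possible reading of the SD estimate described above), it uses only two small window sizes instead of seven, and the function names and the 90% agreement parameter are chosen for illustration.

```python
from statistics import mean


def robust_sd(values):
    """SD estimate: absolute deviation from the mean that ~68.2% of values do not exceed."""
    m = mean(values)
    deviations = sorted(abs(v - m) for v in values)
    index = max(0, int(round(0.682 * len(deviations))) - 1)
    return deviations[index]


def window_votes(ratios, window, k):
    """Slide one window over the profile; every covered oligo collects a vote per window
    (+1 gain, -1 loss, 0 balanced), based on the window mean versus mean +/- k * SD."""
    m = mean(ratios)
    sd = robust_sd(ratios)
    votes = [[] for _ in ratios]
    for start in range(len(ratios) - window + 1):
        w_mean = mean(ratios[start:start + window])
        if w_mean > m + k * sd:
            call = 1
        elif w_mean < m - k * sd:
            call = -1
        else:
            call = 0
        for i in range(start, start + window):
            votes[i].append(call)
    return votes


def consensus_calls(ratios, windows=(10, 25), k=1.25, agree=0.9):
    """Call an oligo gained or lost only if at least `agree` of all its votes concur."""
    pooled = [[] for _ in ratios]
    for w in windows:
        for i, v in enumerate(window_votes(ratios, w, k)):
            pooled[i].extend(v)
    calls = []
    for v in pooled:
        if v and v.count(1) >= agree * len(v):
            calls.append(1)
        elif v and v.count(-1) >= agree * len(v):
            calls.append(-1)
        else:
            calls.append(0)
    return calls


# Toy profile: a 20-oligo gain embedded in balanced flanks.
profile = [0.0] * 40 + [1.0] * 20 + [0.0] * 40
calls = consensus_calls(profile, windows=(5, 10))
print(calls[50], calls[10])
```

Requiring agreement across overlapping windows of several sizes is what makes a call robust against single outlier oligos; in the sketch, oligos at the edges of the gained block receive mixed votes and therefore stay uncalled.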

Plasma-DNA array-CGH artifact correction

Copy number estimations after WGA may have to deal with potential amplification artifacts, which may result in under- (e.g. allele drop out) or over-representations (e.g. preferential amplifications). Such a systematic amplification bias was not observed with the single cell amplification products as reported before by us (Geigl et al. 2009; Geigl and Speicher 2009) and therefore we had not employed corrections for single cell analyses. However, with the plasma-DNA WGA products we noted amplification biases and since a large part of this amplification bias was systematic (e.g. correlated with GC-content of the DNA) we could correct for it. In a first step we mapped the systematic amplification biases. In a second step, we introduced corrections for these regions in our algorithm for plasma-DNA. In brief, for the regions found in the first step, we increased the number of repetitive calculations, which had to consistently result in the same copy number change from 39 (90%) of the 44 calculations to 41 (93%) requesting that the copy number change is also indicated at least in the 500 or 750 window size calculation. For the representation of the ratio profile in the center of our array-CGH illustrations these regions were adjusted with a correction factor, which depended on the aforementioned calculations. The same corrections were applied in all plasma-DNA analyses of the tumor patients.

Establishment of integer copy numbers

Single cells should have integer copy number states. We inferred them from the array-CGH ratio values as follows: A nullosomy corresponds to a ratio of 0, for which the log₂ value is in theory infinitely negative and which is in practice usually indicated by a very small (<-2) log₂ ratio value. A heterozygous deletion in a diploid cell results in a ratio of 1:2, which translates to a log₂(0.5) ratio value of -1. With the exception of nullosomies there are in a triploid cell two (i.e. 2:3, log₂ ratio: -0.585; and 1:3, log₂ ratio: -1.585) and in a tetraploid cell three (i.e. 3:4, log₂ ratio: -0.415; 2:4, log₂ ratio: -1; 1:4, log₂ ratio: -2) different possible deletion states. There are even more discrete ratio values for copy number gains, which may even reach exceptional values in cases of high level amplifications. Therefore, our algorithm focuses on ratio values for the limited number of deleted conditions to establish first the ploidy of a cell and second to infer subsequently the integer copy number of increased ratio values. We subdivided ratio values from 1.2 (which approximately corresponds in a diploid cell to a 5:2 copy number ratio, in a triploid cell to a 7:3, and in a tetraploid cell to a 10:4 copy number ratio) to less than -1.5, which covers the monosomies in di- (i.e. 1:2 copies), tri- (1:3) and tetraploid (1:4) cells, into 20 different bins. We then used non-parametric Gaussian kernel density estimators using the bkde2D function in R (R Development Core Team 2008) to establish the distribution of ratio values into the various bins. Such a high number of bins was necessary because the higher the ploidy level, the smaller are the log₂ ratio differences between different integer copy number states. Thus, depending on the ploidy level of a cell adjacent bins may represent the same copy number.
Furthermore, due to smoothing operations and the various window sizes employed during the evaluation process, some ratio values representing the same integer copy number may also be assigned to neighboring bins. To confirm whether the integer copy number estimations are correct, array-CGH offers the option to compare the respective array-CGH ratio profile with the integer copy number profile. Thus, for each cell we simulated integer copy number profiles assuming a diploid, triploid, or tetraploid state and compared those to the initial array-CGH profile.
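The expected log₂ ratio values for the deletion states follow directly from log₂(remaining copies / ploidy). A short sketch, with a hypothetical helper name, that lists them for diploid, triploid, and tetraploid cells:

```python
import math


def deletion_log2_ratios(ploidy):
    """Expected log2 ratio for every loss state of a cell with the given ploidy,
    excluding the nullosomy (0 copies), whose log2 ratio is undefined."""
    return {
        f"{copies}:{ploidy}": round(math.log2(copies / ploidy), 3)
        for copies in range(ploidy - 1, 0, -1)
    }


for ploidy in (2, 3, 4):
    print(ploidy, deletion_log2_ratios(ploidy))
```

The output illustrates why the spacing between neighboring states shrinks with increasing ploidy, which is the reason given above for using a comparatively large number of bins.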

Reconstruction of copy number profiles from individual CTCs

Common copy number profiles were reconstructed after establishing the integer copy number status. For each oligonucleotide and each cell we added up the previously established integer copy numbers and divided those by the number of cells included in the analysis.
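As a minimal sketch of this averaging step, assuming each cell is represented as a list of integer copy numbers in the same oligonucleotide order (the function name `combined_profile` and the toy values are illustrative only):

```python
def combined_profile(cells):
    """Average the integer copy numbers of all cells, oligonucleotide by oligonucleotide.

    cells: list of per-cell lists, each holding one integer copy number per oligo.
    """
    n_cells = len(cells)
    return [sum(per_oligo) / n_cells for per_oligo in zip(*cells)]


# Three hypothetical CTCs, three oligonucleotides each.
ctcs = [
    [2, 3, 1],
    [2, 4, 1],
    [2, 2, 2],
]
print(combined_profile(ctcs))
```

Non-integer values in the combined profile indicate positions at which the individual cells differ in copy number.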

Enrichment of genes

A total of 68 genes (Table 3), corresponding to 696 kb, which were frequently mutated in CRC (www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=byhist&ss= colon&sn=large intestine&s=3) according to the COSMIC database (Catalogue Of Somatic Mutations In Cancer), were enriched using a SeqCap EZ Choice Library (Roche Nimblegen, Madison, WI, USA), following the manufacturer's instructions. In brief, a SeqCap EZ Oligo pool was designed and synthesized against the above mentioned target regions in the genome. A standard shotgun library was made from genomic (tumor and metastasis of patients 6 and 26) or amplified (CTCs from patients 6 and 26) DNA. Four libraries were pooled equimolarly (250 ng each) (FFPE pool: tumor and metastasis of patients 6 and 26; CTC pool 1: CTC 7, 13, 14 from patient 6, CTC 5 from patient 26; CTC pool 2: CTC 21, 22, 24, 28 from patient 26) and hybridized to the SeqCap EZ Oligo pool for 72 h. Streptavidin beads were used to pull down the complex of captured oligos and genomic DNA fragments, whereas unbound fragments were removed by washing. Finally, the enriched fragment pools were amplified by PCR. For quality control of enrichment, qPCR was performed at control loci included in the Choice Library and for one target gene (TP53). Average fold-enrichment was 4321-fold for the control loci and 9210-fold for the selected target gene TP53.

Subsequently, enriched library pools were ready for high throughput (next generation) sequencing.

454 Life Sciences/ Roche Diagnostics; Sequencing of target enriched library pools

For sequencing of the target enriched library pools of tumors, metastases, and CTCs we followed the Lib-L protocol (emPCR Method Manual - Lib-L LV). To achieve sufficient coverage for all samples and due to low quality reads of FFPE samples (tumors and metastases) we performed four sequencing runs for a total of three library pools.

Additional information on next generation sequencing

A total of 959,790 and 2,070,381 sequences were generated for the FFPE pool (tumor and metastasis of patients 6 and 26) and the two CTC pools (pool 1: CTC 7, 13, 14 from patient 6 and CTC 5 from patient 26; pool 2: CTC 21, 22, 24, 28 from patient 26), respectively. Read length was remarkably consistent throughout the experiment. The average read length was 232 bp for the FFPE material and 329 bp for the CTCs, indicating that the total yield from the sequencing run was 223 Mb for the 4 FFPE samples and 683 Mb for the 8 CTCs. Since FFPE materials are more uneven and difficult to apply to genome-wide assays, the average coverage of tumors and metastases (24x) was lower as compared to the CTCs (41x). In addition, sequencing of enriched CTCs achieved more on-target reads (55.6%) than sequencing of the FFPE material (41.5%).
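The stated yields can be checked as read count times average read length. This sketch reads the two read counts as 959,790 and 2,070,381; the helper name `yield_mb` is hypothetical.

```python
def yield_mb(n_reads, mean_read_length_bp):
    """Total sequenced bases in megabases (1 Mb = 1e6 bases)."""
    return n_reads * mean_read_length_bp / 1e6


ffpe_mb = yield_mb(959_790, 232)    # FFPE pool
ctc_mb = yield_mb(2_070_381, 329)   # both CTC pools together

print(round(ffpe_mb, 1), round(ctc_mb, 1))
```

With the rounded average read lengths the product is about 223 Mb for the FFPE pool and about 681 Mb for the CTC pools; the small difference to the reported 683 Mb is presumably due to rounding of the average read length.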

454 Life Sciences/ Roche Diagnostics; KRAS deep sequencing

"Picotiter plate pyrosequencing", a massively parallel sequencing-by-synthesis ("next generation sequencing") approach, relies on emulsion PCR-based clonal amplification of a DNA library adapted to micron-sized beads and subsequent pyrosequencing-by-synthesis of each clonally amplified template in a picotiter plate, generating up to 1,000,000 unique clonal sequencing reads per experiment (Margulies et al. 2005). Sequence variants that represent a fraction of a complex sample can be vastly oversampled, thus enabling statistically meaningful quantification of low-abundance variants.

For amplicon preparation of KRAS we used a proof-reading enzyme (Fast Start High Fidelity System, Roche Diagnostics) according to the manufacturer's instructions with 35 cycles of amplification. Primers were designed resulting in 3 different amplicon lengths (KRAS specific length 119 bp, 168 bp, and 323 bp), including the Roche-compatible adaptors A and B with a length of 21 bp, plus a 4 bp TCAG key sequence (each read has to start with the key sequence) and a 10 bp MID (multiplex identifier; barcode). Amplicons from patient and control samples (cell-free native and whole genome amplified DNA from plasma) were purified, quantified with PicoGreen, and pooled equimolarly (6-8 samples per pool). Amplicon pools were quality checked on an Agilent Bioanalyzer using the Agilent High Sensitivity DNA kit. Emulsion PCR was performed according to the emPCR Method Manual - Lib-A MV. In doing so, two emulsion PCRs were performed for forward and reverse sequencing. After clonal amplification, microbeads were collected and DNA-carrying beads were enriched and deposited onto PicoTiterPlates provided for the 454-FLX instrument (Roche Diagnostics). Massively parallel pyrosequencing was performed according to the manufacturer's protocol, base calls and quality scores were generated using the GS Run Processor software on an HPC cluster, and variants were extracted using the GS Amplicon Variant Analysis 2.6 software provided with the platform. Read lengths strongly corresponded to amplicon lengths. The average coverage was 57045x for the 119 bp fragment, 12201x for the 168 bp fragment, and 13540x for the 323 bp fragment.

Hierarchical cluster analyses

We used copy number changes to build neighbor-joining trees by hierarchical (centroid linkage) cluster analysis. Oligonucleotides with gained ratio values were set to 1, balanced oligonucleotides to 0 and lost oligonucleotides to -1. Thus, cluster analysis was based on the values of 59012 oligonucleotides per tumor, metastasis, and CTC, respectively. We used the Gene Cluster 3.0 software, which is an enhanced version of the original Cluster/Tree View program, developed by Michael Eisen (Eisen et al. 1998). Cluster 3.0 was created by Michiel de Hoon together with Seiya Imoto and Satoru Miyano and is available under the URL (http://bonsai.hgc.jp/~mdehoon/software/cluster/).
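The encoding of copy number calls as 1, 0 and -1 can be sketched as follows. The call labels, the function names, and the use of the Euclidean distance are assumptions for illustration; the text does not state which distance measure was selected in Cluster 3.0.

```python
import math

# Numeric codes for the three copy number states, as described in the text.
CALL_CODES = {"gain": 1, "balanced": 0, "loss": -1}


def encode(calls):
    """Turn a list of per-oligonucleotide calls into the 1 / 0 / -1 vector used for clustering."""
    return [CALL_CODES[c] for c in calls]


def euclidean(a, b):
    """Euclidean distance between two encoded profiles (assumed metric, for illustration)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


tumor = encode(["gain", "balanced", "loss"])
ctc = encode(["gain", "gain", "gain"])
print(tumor, ctc, euclidean(tumor, ctc))
```

Pairwise distances of this kind between the encoded profiles of tumors, metastases and CTCs are the input a hierarchical clustering program works from.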

Statistics

Statistical analysis was done using Microsoft Excel. Unpaired one-sided Student's t test was used to calculate P values for the data sets.

Results

Sizing of plasma-DNA fragments reveals a biphasic distribution in a subset of patients with CRC

Plasma-DNA concentrations of healthy controls were within a narrow range (mean: 15.21 ng/ml; median: 14.37 ng/ml; range: 12.20-19.51 ng/ml) whereas plasma-DNA of patients with CRC showed invariably higher values with a substantial variability (mean: 275.35 ng/ml; median: 139.0 ng/ml; range: 22.44-1,037.49 ng/ml) (P=0.018) (Fig. 1 a)).

As DNA fragments in the blood circulation are degraded (Diehl et al. 2005), we employed the microfluidics-based Agilent 2100 Bioanalyzer platform for sizing. In healthy controls we observed an enrichment of plasma-DNA fragments within a range of 85-230 bp and a maximum around 166 bp (Fig. 1 b)). Previously, plasma-DNA fragments within this size range were associated with the release of DNA from apoptotic cells after enzymatic processing (Diehl et al. 2005; Lo et al. 2010), because the length of these fragments corresponds approximately to the DNA wrapped around a nucleosome (~142 bp) plus a linker fragment (~20 bp) (Lewin 2008). We estimate that about 90% of plasma-DNA in healthy controls is derived from apoptotic cells, which translates to ~16x10⁶ apoptotic cells releasing DNA fragments into the circulation.

There was no significant qualitative difference regarding the size distribution of plasma-DNA fragments between healthy controls and 21 (65.6%) of our CRC cases (#1, 2, 3, 7, 11, 12, 14, 15, 16, 17, 19, 21, 23, 24, 28, 29, 30, 32, 34, 35, 37) despite the higher concentrations of plasma-DNA in the latter group (Fig. 1 c) #11). In these cases still about 90% of DNA was contributed by apoptotic cells and due to the increased DNA concentrations this translates to a DNA-release from ~18x10⁶-164x10⁶ apoptotic cells.

However, in 11 (#6, 9, 10, 18, 20, 22, 25, 26, 27, 33, 38) (34.4%) patients, we observed an additional, second peak consisting of DNA fragments with a typical size range starting from about 240 bp and frequently extending to 400 bp or in some cases even longer (Fig. 1 d) #22). As we did not observe this second peak in healthy controls, it is likely tumor related. Furthermore, the size distribution was always biphasic, suggesting that the DNA from the second peak may be derived from a different cell source, most likely non-apoptotic cells. Hence, we refer to this second peak as "non-apoptosis peak". In these cases we estimate that about 65-86% of the DNA is derived from apoptotic cells (corresponding to 241x10⁶ - 613x10⁶ apoptotic cells) and about 11-33% from non-apoptotic cells (54x10⁶ - 311x10⁶ non-apoptotic cells) (numbers based on cases #9, 18 and 22).

Deep sequencing confirms high percentages of mutated DNA fragments in both plasma-DNA fractions

In order to test whether these two peaks indeed reflect tumor related events, we determined the percentage of plasma-DNA fragments with and without KRAS mutations in patients with KRAS mutations in their corresponding primary tumors. We used ultra-deep pyrosequencing, and by varying the size of the sequencing reaction from 119 bp to 323 bp (119 bp, 168 bp, 323 bp), it was possible to determine both the number and size distribution of normal and mutant gene fragments. As we did not know what percentage of mutated DNA fragments to expect, we deliberately oversampled and tried to obtain more than 10,000 sequence reads per reaction. To assess sequencing error rates, we included plasma-DNA from three healthy male donors as control. We observed low levels (<5%) of KRAS mutations (G12D and G12V), which is within the range of false positives occurring at a coverage between 10,000 and 20,000 (Table 1). While we always obtained a high number of sequence reads for the 119 bp and 168 bp fragments (i.e. between 14,000 and 21,000 reads), we invariably obtained only very low sequence read coverage (<30x) for the 323 bp fragments (Table 1). This suggests that long DNA fragments are a rarity in plasma-DNA of healthy individuals.

Table 1: Ultra-deep pyrosequencing with sequencing reaction sizes of 119 bp, 168 bp, and 323 bp, in patients without non-apoptosis peak (7, 11, 15, and 16) and with non-apoptosis peak (6, 10, 25, 38).

Deep sequencing was performed in four patients without (i.e. #7, #11, #15, and #16) and in four patients with the non-apoptosis peak (i.e. #6, #10, #25, and #38). In patients without non-apoptosis peak, deep sequencing identified mutated KRAS fragments at low levels in two patients (#11, #15), which was within the range of false-positives, and in the other two patients (#7, #16) none at all (Table 1). However, with the exception of patient 15, deep sequencing of 323 bp fragments was possible (Table 1), suggesting that these patients may have longer DNA fragments in their plasma than healthy controls.

In contrast, the four patients with the non-apoptotic peak (i.e. #6, #10, #25, and #38) showed high percentages of mutated DNA fragments (Table 1). With the exception of patient 25 the fraction of mutant molecules [number of total KRAS fragments (WT plus mutant)] was not dependent on the size of the amplicon, and was about the same over the entire size range tested, including 323bp fragments.

The presence of the non-apoptotic cell plasma-DNA peak correlates with higher plasma-DNA concentrations and genomic plasma-DNA imbalances

Patients with the second, non-apoptotic peak had higher plasma-DNA concentrations (mean: 604 ng/ml; median: 562 ng/ml; range: 260-1037 ng/ml) as compared to the patients without this second peak (mean: 103 ng/ml; median: 89 ng/ml; range: 22-201 ng/ml) (P<0.0001) (Fig. 1 e)).

We reasoned that in addition to KRAS mutated DNA fragments also other tumor DNA fragments should be present in the circulation of these patients and may provide insights about the tumor genome. To address this question, we generated random DNA libraries by converting the plasma-DNA fragments to PCR-amplifiable OmniPlex Library molecules flanked by universal priming sites for whole genome amplification (WGA). We subjected the WGA products to array-CGH on a 60K microarray platform.

We started the analyses with healthy controls (3 females and 3 males). We mapped systematic amplification biases, which occurred due to the severe fragmentation of plasma- DNA and corrected for those in our evaluation algorithm. This allowed us to establish in each case the sex of the plasma-DNA donor based on the ratio shifts observed on the sex chromosomes with ease. On the autosomes on average 4.026 (7.2%; range 3.606-4.321; 6.4%-7.7%) oligonucleotides showed aberrant ratio values. As the distribution of these oligonucleotides was random, they most likely represent artifacts due to the fragmentation and/or amplification process.

We subjected the random DNA libraries generated during the WGA process from CRC patients [#6, #7, #10, #11, #15, #16, #25, and #38] again to deep sequencing to estimate whether shifts between the ratios of mutated versus non-mutated DNA fragments had occurred during the amplification process. Not surprisingly, we did not detect KRAS mutations in the WGA products of the four patients with low frequency or absent KRAS mutations (i.e. #7, #11, #15, and #16) (Table 2). The percentage of mutated DNA fragments was about the same as in the native DNA in patients 10 and 25; however, it was lower in patient 6 and notably lower in patient 38 (Table 2). Overall this suggests that WGA may cause a shift towards a higher number of non-mutated DNA fragments in a subset of cases.

Table 2: Ultra-deep pyrosequencing with sequencing reaction sizes of 119 bp, 168 bp, and 323 bp, in patients without non-apoptosis peak (7, 11, 15, and 16) and with non-apoptosis peak (6, 10, 25, 38) after WGA.

Patient | Mutation | 119 bp: % mutated | 119 bp: reads | 168 bp: % mutated | 168 bp: reads | 323 bp: % mutated | 323 bp: reads
7 | G13D (c.38G>A) | 0 | 9939 | 0 | 22059 | 0 | 0
11 | G12V (c.35G>T) | 0 | 7845 | 0 | 8498 | 0 | 3342
15 | G12D (c.35G>A) | 0 | 13401 | 0 | 22527 | 0 | 29546
16 | G12D (c.35G>A) | 0.62 | 13271 | 0 | 11745 | 0.36 | 6049
6 | G12V (c.35G>T) | 23.42 | 5479 | 9.61 | 12894 | 19.74 | 2989
10 | G12D (c.35G>A) | 49.08 | 12270 | 28.42 | 21641 | 25.47 | 14447
25 | G12D (c.35G>A) | 13.23 | 12732 | 8.59 | 13956 | 0 | 10537
38 | G12D (c.35G>A) | 10.56 | 14596 | 4.56 | 14292 | 3.88 | 20450
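The percentages in Table 2 can be converted back into approximate mutated read counts, which shows how much sequencing depth underlies each value. The following is a minimal Python sketch using values copied from Table 2 for the four patients with the non-apoptosis peak; the function names and the data layout are illustrative assumptions and not part of the original analysis.

```python
# Values copied from Table 2 (after WGA):
# patient -> {amplicon size in bp: (% mutated, total reads)}.
TABLE_2 = {
    "6": {119: (23.42, 5479), 168: (9.61, 12894), 323: (19.74, 2989)},
    "10": {119: (49.08, 12270), 168: (28.42, 21641), 323: (25.47, 14447)},
    "25": {119: (13.23, 12732), 168: (8.59, 13956), 323: (0.0, 10537)},
    "38": {119: (10.56, 14596), 168: (4.56, 14292), 323: (3.88, 20450)},
}


def approx_mutated_reads(percent_mutated, total_reads):
    """Approximate number of mutated reads implied by a percentage and a read count."""
    return round(percent_mutated / 100.0 * total_reads)


def summarize(table):
    """Return {patient: {amplicon size: approximate mutated reads}}."""
    return {
        patient: {
            size: approx_mutated_reads(pct, reads)
            for size, (pct, reads) in sizes.items()
        }
        for patient, sizes in table.items()
    }


if __name__ == "__main__":
    for patient, sizes in summarize(TABLE_2).items():
        print(patient, sizes)
```

Because the table reports rounded percentages, the recovered counts are approximations only.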

When we performed array-CGH with patients who had only the apoptosis peak, we observed an increase of oligonucleotides with aberrant ratio values to a mean of 5,572 (10.0%; range: 4,587-7,469; 8.2%-13.4%; compared to healthy controls: P=0.001). However, these were usually single, non-adjacent oligonucleotides that were not at recurrent locations and therefore did not indicate a specific loss or gain of a particular chromosomal region.

In contrast, the plasma-DNA in 10 of the 11 patients (i.e. 6, 9, 10, 18, 20, 22, 25, 26, 27, 33; exception: 38) with the additional non-apoptosis peak had a mean number of oligonucleotides with copy number changes of 12,476 (22.3%; range: 9,117-14,915; 16.3%-26.7%), which was highly significant compared to both the aforementioned CRC cases and the healthy controls (P<0.0001 each).
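As a quick consistency check on the three groups, the reported means can be read as whole-number counts (4,026, 5,572, and 12,476 oligonucleotides) and combined with their percentages to estimate the number of autosomal oligonucleotides evaluated. The sketch below is illustrative only: the exact number of evaluated oligonucleotides is not stated in the text, and the group labels are shorthand.

```python
# Mean counts and percentages as reported in the text; labels are shorthand.
REPORTED = {
    "healthy controls": (4026, 7.2),
    "CRC, apoptosis peak only": (5572, 10.0),
    "CRC, additional non-apoptosis peak": (12476, 22.3),
}


def implied_total(count, percent):
    """Total number of oligonucleotides implied by a count and its percentage."""
    return count / (percent / 100.0)


def fold_change(count, reference_count):
    """Ratio of a group's mean count to a reference group's mean count."""
    return count / reference_count


if __name__ == "__main__":
    reference = REPORTED["healthy controls"][0]
    for group, (count, percent) in REPORTED.items():
        total = implied_total(count, percent)
        print(f"{group}: implied total ~{total:,.0f}, "
              f"fold change vs. controls {fold_change(count, reference):.2f}")
```

All three groups imply a similar total of roughly 56,000 oligonucleotides, which is compatible with the 60K microarray platform mentioned above.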

The chromosomal imbalances in the plasma-DNA are tumor specific

To check whether the observed copy number changes in the latter group are tumor specific, we compared copy number changes between the plasma-DNA and primary tumor and/or metastasis in 4 patients (i.e. 6, 9, 26, 33) (Fig. 2).

In patient 6 the primary tumor and its brain metastasis showed marked copy number differences; however, in the plasma-DNA we could attribute many chromosomal regions to the primary tumor and/or metastasis by shared copy number alterations. For the plasma-DNA, the copy number status of 46.3% of the oligonucleotides on our array platform was omnipresent across all three lesions; 18.1% was partially shared by the primary tumor, 18.0% by the metastasis, and 17.6% was unique to the plasma-DNA.

In patient 9 copy number changes within the primary and the peritoneal carcinomatosis were almost identical, and some of these changes, such as losses of chromosome 4 and 5 material were also reflected in the plasma-DNA. In contrast, copy number changes of the plasma-DNAs of patients 26 and 33 resembled clearly more those observed in the primary tumor than those from the liver metastases.

Taken together, in these four plasma-DNA samples, an average of 54.7% (median: 54.3%; range: 46.3-63.6%) of all oligonucleotides displayed an identical copy number status (i.e. lost, balanced, gained) in all three samples (primary, metastasis, and plasma-DNA). About 10% (median: 9.4%; range: 3.1-18.1%) of plasma-DNA copy number changes were found only in the metastasis and, conversely, approximately 14.5% (median: 16.2%; range: 2.9-22.7%) only in the primary tumor. About 20.9% (median: 18.7%; range: 15.7-30.4%) of plasma-DNA changes were observed only in the plasma-DNA but not in the primary tumor or metastasis. However, none of the 4 patients had just one metastasis; each had several at various sites, which were not accessible to us. Thus, copy number changes observed only in the plasma-DNA could be derived from a metastatic site not included in our analysis.
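The kind of per-oligonucleotide comparison summarized above can be sketched as follows. This is a minimal Python illustration under stated assumptions: each oligonucleotide carries one of the three states named in the text (lost, balanced, gained) in each sample, and the four category names and the toy data are hypothetical; the actual evaluation algorithm is not described in this section.

```python
from collections import Counter

# Hypothetical category names for the comparison of three samples.
CATEGORIES = (
    "shared_by_all",
    "shared_with_primary_only",
    "shared_with_metastasis_only",
    "plasma_only",
)


def classify(plasma, primary, metastasis):
    """Assign one oligonucleotide to a category based on its three states."""
    if plasma == primary == metastasis:
        return "shared_by_all"
    if plasma == primary:
        return "shared_with_primary_only"
    if plasma == metastasis:
        return "shared_with_metastasis_only"
    return "plasma_only"


def concordance(plasma, primary, metastasis):
    """Return the percentage of oligonucleotides falling into each category."""
    if not (len(plasma) == len(primary) == len(metastasis)):
        raise ValueError("all three profiles must have the same length")
    if not plasma:
        raise ValueError("profiles must not be empty")
    counts = Counter(classify(a, b, c) for a, b, c in zip(plasma, primary, metastasis))
    return {cat: 100.0 * counts.get(cat, 0) / len(plasma) for cat in CATEGORIES}


# Toy profiles (not patient data): one state per oligonucleotide.
PLASMA = ["gained", "lost", "balanced", "balanced", "gained",
          "lost", "balanced", "gained", "lost", "balanced"]
PRIMARY = ["gained", "lost", "balanced", "balanced", "gained",
           "balanced", "balanced", "balanced", "balanced", "balanced"]
METASTASIS = ["gained", "lost", "balanced", "balanced", "balanced",
              "lost", "balanced", "balanced", "gained", "balanced"]

if __name__ == "__main__":
    for category, percent in concordance(PLASMA, PRIMARY, METASTASIS).items():
        print(f"{category}: {percent:.1f}%")
```

The four categories are mutually exclusive, so the percentages always sum to 100%.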

In patient 27 the comparison between changes in the primary tumor and plasma-DNA revealed a shared copy number status in 73.8% of all oligonucleotides. In two patients (#18, #22) we could compare the plasma-DNA copy number changes only with those from CTCs as described below. In three patients (#10, #20, and #25) no further material for comparison was available. However, we observed again copy number changes, which have been frequently described in colorectal cancer, such as 8p loss or gains of 8q and 20q. Overall this suggests that the genomic alterations, which we identified in the plasma-DNAs are indeed tumor related, and may reflect contributions of a specific tumor cell population, which released its DNA into the circulation around the time of blood collection.

The only exception was patient 38 whose plasma-DNA concentration was elevated but did not show copy number changes of larger contiguous chromosomal regions or entire chromosomes, which unequivocally corresponded to those from the primary and/or metastasis.

The presence of the non-apoptotic cell plasma-DNA peak correlates with the occurrence of CTCs

We also observed a clear correlation between the presence of the non-apoptosis peak and the occurrence of CTCs. We applied the FDA-approved Veridex system for CTC detection in 30 of the 32 patients [NAs: 10, 35] (Riethdorf et al. 2007). In 10 of these 30 cases no CTCs were identified. In the patient group with non-apoptosis peak (n=9; 6, 9, 18, 20, 22, 26, 27, 33, 38; NA: 10), we found altogether 524 CTCs (mean: 52; median: 35; range: 0-181) whereas patients without second peak (n=21; NA: 35) had altogether only 30 CTCs (mean: 1.5; median: 1; range: 0-7) (P=0.0003).

CTCs represent a tremendously heterogeneous population of tumor cells

We successfully isolated 37 CTCs for further analyses from 6 patients (i.e. 6 [9 CTCs], 9 [4 CTCs], 18 [4 CTCs], 22 [4 CTCs], 26 [10 CTCs], and 38 [6 CTCs]) and processed them as previously described (Geigl and Speicher 2007; Geigl et al. 2009). Although the number of CTCs that we could analyze was limited, we already made some intriguing observations.

For example, we had 10 CTCs from patient 26. In order to investigate the CTC population structure and assuming that single cells will have discrete copy number states, we calculated integer copy number profiles for each CTC. These analyses revealed that all CTCs [i.e. 05, 17, 19, 21, 22, 24, 25, 26, 27, and 28] were within the tetraploid range.

All CTCs showed aberrant profiles with diverse chromosomal gains and losses, and we did not identify two CTCs with an identical profile of gains and losses. On average for all CTCs together, 52.0% (median: 53.4%; range: 37.3-61.5%) of the copy number status (i.e. gains, losses and balanced regions) was omnipresent across all CTCs, primary tumors and metastases; 8.3% (median: 8.2%; range: 6.7-10.4%) was partially shared between CTCs and metastases only; 14.2% (median: 14.2%; range: 12.0-16.3%) was partially shared between CTCs and primary tumors only; and 25.6% (median: 24.8%; range: 17.6-41.9%) of all CTC copy number changes were not observed in the primary tumors or metastases.

In order to establish the most frequently occurring copy number changes within the CTC population, we constructed an average copy number profile from all individual CTCs (Fig. 2). This "main CTC lineage" revealed that, compared to the primary tumor and its metastasis, the CTCs had acquired several novel copy number changes, such as gains of chromosomes 1q, 6q, 12p, high level gain of 13, and loss of chromosome 14. This was also reflected when we calculated distances between the CTC profiles using hierarchical cluster analysis and constructed neighbor-joining trees from the copy number profiles, because the main CTC lineage did not cluster close to the primary tumor or metastasis. Furthermore, these analyses revealed that the integer copy number profile of one cell (i.e. CTC28) clearly deviated from the other cells, suggesting the presence of cells with a "private" pattern of alterations.
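The idea of an average profile and of a cell that deviates from it can be illustrated with a small Python sketch. Note the simplifications: instead of hierarchical cluster analysis and neighbor-joining trees, the sketch uses a plain per-region mean and a Manhattan distance to that mean, and the cell names and integer copy numbers are hypothetical toy values, not data from the patients.

```python
def average_profile(profiles):
    """Per-region mean copy number across all cells."""
    cells = list(profiles.values())
    if not cells:
        raise ValueError("at least one profile is required")
    n_regions = len(cells[0])
    if any(len(cell) != n_regions for cell in cells):
        raise ValueError("all profiles must cover the same regions")
    return [sum(cell[i] for cell in cells) / len(cells) for i in range(n_regions)]


def manhattan(profile_a, profile_b):
    """Sum of absolute per-region copy number differences."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def most_divergent(profiles):
    """Name of the cell whose profile lies farthest from the average profile."""
    average = average_profile(profiles)
    return max(profiles, key=lambda name: manhattan(profiles[name], average))


# Toy integer copy number profiles over five regions (hypothetical values).
TOY_PROFILES = {
    "cell_A": [4, 5, 4, 3, 4],
    "cell_B": [4, 5, 4, 3, 5],
    "cell_C": [4, 5, 3, 3, 4],
    "cell_D": [2, 4, 6, 5, 2],  # deliberately different: a "private" pattern
}

if __name__ == "__main__":
    print("average profile:", average_profile(TOY_PROFILES))
    print("most divergent cell:", most_divergent(TOY_PROFILES))
```

A distance to the mean is only a crude stand-in for a tree-based analysis, but it conveys why a single cell with a private pattern stands apart from the main lineage.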

We investigated CTCs from further patients to determine whether these findings extend. In patient 6 we had 9 cells [3, 03, 07, 8, 11, 12, 13, 14, 22] available for analyses. For this patient we used the G12V KRAS mutation status to confirm whether these cells were indeed derived from the tumor. The KRAS mutation was identified in 6 cells [i.e. 3, 03, 07, 12, 13, 14]. The three cells without KRAS mutation did not show any tumor-related copy number changes, suggesting that some cells identified by the Veridex system may be epithelial stromal cells.

Establishing integer copy number profiles revealed that 2 of these CTCs [i.e. 3 and 12] were in the diploid range with few copy number changes; therefore we classified these CTCs as pseudodiploid. Four CTCs [i.e. 03, 07, 13, and 14] were tetraploid and displayed multiple copy number changes, again with a high cell-to-cell variability. The average copy number profile revealed that, compared to the primary tumor, the gain of chromosome 8q had newly evolved in the CTC population. The CTCs had clearly more chromosomal copy number changes in common with the primary tumor than with the metastasis. We again found a cell with a "private" copy number profile (i.e. CTC14) showing no clear relationship to the main CTC lineage and a large distance from the other cells in our cluster analysis.

These observations also extended to the CTCs of patient 9. Three of them (i.e. CTC 4, 10, 13) had copy number changes very similar to those in both the primary tumor and the metastasis; however, in these CTCs the overrepresentation of chromosome 3, which was present in both the primary tumor and the metastasis, was lost. CTC2 had a private pattern of aberrations.

In patient 38 the Veridex system identified 6 cells. However, none of them had the G12D KRAS mutation, which had been identified in the primary tumor and which we had used for deep sequencing. Moreover, all of them had a balanced profile suggesting that these cells were likely not tumor cells but epithelial stromal cells.

CTCs and plasma-DNA analysis in the absence of material from the primary tumor or metastasis

In two cases (i.e. #18 and #22) only biopsies were taken at time of diagnosis so that no material was available for our analyses. In patient 22 our plasma-DNA analysis revealed losses of chromosomes 3, 4, 5, 8p, and 18 and gains of chromosomes 7p, 17q, and 20, suggesting that these were predominant changes of the tumor cell population releasing DNA into the circulation. In fact, we found one CTC with almost identical copy number changes, suggesting that tumor cells with these copy number changes indeed exist. However, the other analyzed CTCs displayed again various other, heterogeneous copy number changes.

In patient 18 our plasma-DNA analysis revealed again typical CRC-related copy number changes, such as loss of 8p and gains of 8q and 20. However, the identified copy number changes in the CTCs differed from those in the plasma-DNA.

Mutation spectrum in primary tumors, metastases, and CTCs

We used enrichment technologies for the analysis of 68 genes (see Table 3), corresponding to 696 kb, which were frequently (>3%) mutated in CRC according to the Catalogue Of Somatic Mutations In Cancer (COSMIC database). We performed these analyses for the primary tumors and their metastases in patients #6 and #26. In addition, we sequenced three CTCs [7, 13, 14] from patient #6 and five CTCs [5, 21, 22, 24, 28] from patient #26. Furthermore, constitutional DNA was analyzed to determine whether mutations were somatic. In addition, using ultra-conserved and extremely-conserved regions, we performed control experiments to estimate the experimental sequencing error rate and found that sequencing errors are very rare.

Table 3: List of 68 genes, corresponding to 696 kb, which according to the COSMIC database are frequently mutated in CRC and were analyzed in our study (www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=byhist&ss=colon&sn=large_intestine&s=3).

Gene | Description | Gene | Description
ABCA1 | ATP-binding cassette, sub-family A (ABC1), member 1 | MSH3 | MutS homolog 3 (E. coli)
ADAMTSL3 | ADAMTS-like protein 3 | MSH6 | MutS homolog 6 (E. coli)
ALK | Anaplastic lymphoma receptor tyrosine kinase | MYH/MUTY | MutY homolog (E. coli)
APC | Adenomatous polyposis coli | NAV3 | Neuron navigator 3
BMPR1A | Bone morphogenetic protein receptor, type IA | NEB | Nebulin
BRAF | V-raf murine sarcoma viral oncogene homolog B1 | NF1 | Neurofibromin 1
C10orf137 | Chromosome 10 open reading frame 137 | NRAS | Neuroblastoma RAS viral (v-ras) oncogene homolog
CACNA2D3 | Calcium channel, voltage-dependent, alpha 2/delta subunit 3 | OBSCN | Obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF
CDKN2A | Cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) | OR51E1 | Olfactory receptor, family 51, subfamily E, member 1
CDKN2a(p14) | Cyclin-dependent kinase inhibitor 2A (p14ARF) | PIK3CA | Phosphoinositide-3-kinase, catalytic, alpha polypeptide
CSMD1 | CUB and Sushi multiple domains 1 | PIK3R1 | Phosphoinositide-3-kinase, regulatory subunit 1 (alpha)
CSMD3 | CUB and Sushi multiple domains 3 | PLCG2 | Phospholipase C, gamma 2 (phosphatidylinositol-specific)
CTNNB1 | Catenin (cadherin-associated protein), beta 1 | PMS2 | PMS2 postmeiotic segregation increased 2 (S. cerevisiae)
DNAH1 | Dynein, axonemal, heavy chain 1 | PTEN | Phosphatase and tensin homolog
EPHA3 | EPH receptor A3 | RET | Ret proto-oncogene
EPHB6 | EPH receptor B6 | RYR2 | Ryanodine receptor 2 (cardiac)
ERCC6L | Excision repair cross-complementing rodent repair deficiency, complementation group 6-like | SDK1 | Sidekick homolog 1, cell adhesion molecule (chicken)
EVC2 | Ellis van Creveld syndrome 2 | SMAD2 | SMAD family member 2
FBN2 | Fibrillin 2 | SMAD4 | SMAD family member 4
FBXW7 | F-box and WD repeat domain containing 7 | SMARCA4 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 4
FNDC1 | Fibronectin type III domain containing 1 | SMO | Smoothened homolog (Drosophila)
FRAS1 | Fraser syndrome 1 | SRC | V-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian)
GNAS | GNAS complex locus | STAB1 | Stabilin 1
GUCY1A2 | Guanylate cyclase 1, soluble, alpha 2 | STK11 | Serine/threonine kinase 11
HUWE1 | HECT, UBA and WWE domain containing 1 | SYNE1 | Spectrin repeat containing, nuclear envelope 1
KDM6A | Lysine (K)-specific demethylase 6A | TBX22 | T-box 22
KIAA1409 | KIAA1409 | TCF7L2 | Transcription factor 7-like 2 (T-cell specific, HMG-box)
KRAS | V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog | TGFBR2 | Transforming growth factor, beta receptor II (70/80kDa)
LAMA1 | Laminin, alpha 1 | TGM3 | Transglutaminase 3 (E polypeptide, protein-glutamine-gamma-glutamyltransferase)
MAP2K4 | Mitogen-activated protein kinase kinase 4 | TNN | Tenascin N
MAP2K7 | Mitogen-activated protein kinase kinase 7 | TP53 | Tumor protein p53
MLH1 | MutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) | TTN | Titin
MLL3 | Myeloid/lymphoid or mixed-lineage leukemia 3 | UBR5 | Ubiquitin protein ligase E3 component n-recognin 5

Description according to GeneCards V3 (http://www.genecards.

With the exception of an exonic deletion in ERCC6L in patient #6 and a 1 bp insertion in MLH1 in patient #26 (CTC28), all identified insertions and deletions were encountered only in introns. Hence, we focused our analysis on exonic, non-synonymous variants not included in the NCBI dbSNP build 132 database. Furthermore, we concentrated on mutations in established candidate cancer genes (CAN genes), which had previously been described as true driver genes in CRC (Wood et al. 2007), and on some other CRC-associated genes such as BRAF, CTNNB1, and MLH1. KRAS was Sanger sequenced as it had low coverage in next-generation sequencing.
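The variant selection described above (exonic, non-synonymous, not listed in dbSNP) amounts to a simple filter. The following Python sketch shows one way such a filter could look; the `Variant` record, its field names, and the example entries with placeholder gene names are hypothetical and are not taken from the study's pipeline, and "non-synonymous" is approximated here as any effect other than "synonymous".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record for illustration."""
    gene: str
    region: str      # e.g. "exonic" or "intronic"
    effect: str      # e.g. "nonsynonymous", "synonymous", "frameshift"
    in_dbsnp: bool   # True if the variant is listed in the dbSNP build used


def passes_filter(variant):
    """Keep exonic, non-synonymous variants that are absent from dbSNP."""
    return (
        variant.region == "exonic"
        and variant.effect != "synonymous"
        and not variant.in_dbsnp
    )


def filter_variants(variants):
    """Return the variants that pass the filter, preserving input order."""
    return [variant for variant in variants if passes_filter(variant)]


# Placeholder records (not study data).
EXAMPLE_VARIANTS = [
    Variant("GENE_A", "exonic", "nonsynonymous", False),  # kept
    Variant("GENE_B", "intronic", "indel", False),        # dropped: intronic
    Variant("GENE_C", "exonic", "synonymous", False),     # dropped: synonymous
    Variant("GENE_D", "exonic", "nonsynonymous", True),   # dropped: known SNP
]

if __name__ == "__main__":
    for variant in filter_variants(EXAMPLE_VARIANTS):
        print(variant)
```

A restriction to a gene list, such as the CAN genes mentioned above, could be added as a further condition in `passes_filter`.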

In a first screen we found 25 mutations in 17 CAN genes in at least one of the analyzed samples of patient #6 and 22 mutations in 10 CAN genes in patient #26 (Table 4). Sanger sequencing confirmed 16 and 15 mutations in patients #6 and #26, respectively. In patient #6 four of these mutations were constitutional and therefore found throughout all samples. In addition, this patient had at least 3 somatic "founder" mutations (Yachida et al. 2010) (i.e. APC, KRAS, PIK3CA), which were present in all tumor samples including all CTCs. The other mutations were "progressor mutations" (Yachida et al. 2010), i.e. present only in the metastases and/or CTCs examined, but not in the primary tumor.

Table 4: Mutations in CAN-genes

Gene | Nucleotide change¹ | Amino acid change¹ | Accession No.² | Patient | Sample³
ABCA1 | c.C3625T | p.P1209S | NM_005502.3 | #6 | CTC7
ABCA1 | c.C5140T | p.A1714T | NM_005502.3 | #26 | CTC21
ADAMTSL3 | c.C2266T | p.Q756X | NM_207517.2 | #26 | CTC24
APC | c.C994T | p.R332X | NM_000038.5 | #6 | PT, MT, CTC7, CTC13, CTC14
C10orf137 | c.C2969A | p.A990E | NM_015608.2 | #6 | CTC14
CACNA2D3 | c.G250A | p.G84S | NM_018398.2 | #6 | CTC14
CTNNB1 | c.C478A | p.L160I | NM_001904.3 | #26 | CTC5
CTNNB1 | c.G445A | p.A149T | NM_001904.3 | #6 | CTC14
CTNNB1 | c.G830A | p.G277D | NM_001904.3 | #26 | CTC5
CTNNB1 | c.G1286A | p.C429Y | NM_001904.3 | #26 | CTC24
CSMD3 | c.A39T | p.E13D | NM_198123.1 | #6 | CT, PT, MT, CTC7, CTC13, CTC14
CSMD3 | c.G5167A | p.V1723I | NM_198123.1 | #26 | CTC22
ERCC6L | c.G3382A | p.E1128K | NM_017669.2 | #6 | CT, PT, MT, CTC7, CTC13, CTC14
ERCC6L | c.3598_3599del | p.Y1200X | NM_017669.2 | #6 | CT, PT, MT, CTC7, CTC13, CTC14
GNAS | c.G2606A | p.G869D | NM_080425.2 | #6 | CTC14
GNAS | c.C218T | p.A73V | NM_016592.2 | #26 | CTC5
GUCY1A2 | c.C1315T | p.H439Y | NM_000855.1 | #6 | CTC14
KIAA1409 | c.T4030C | p.S1344P | NM_020818.3 | #26 | CTC21
LAMA1 | c.C4240T | p.P1414S | NM_005559.3 | #6 | CTC7
LAMA1 | c.C3004T | p.H1002Y | NM_005559.3 | #6 | CTC7
LAMA1 | c.G2453A | p.W818X | NM_005559.3 | #26 | CTC5
MLH1 | c.1484insC | p.R497Pfs*6 | NM_000249.3 | #26 | CTC28
NAV3 | c.G5761A | p.G1921S | NM_014903.4 | #26 | CTC21
NAV3 | c.C1781T | p.S594F | NM_014903.4 | #26 | CTC5
NF1 | c.C403T | p.R135W | NM_001042492.2 | #6 | MT, CTC7, CTC13, CTC14
OR51E1 | c.G587A | p.R196Q | NM_152430.3 | #26 | CT, PT, MT, CTC5, CTC21, CTC22, CTC24
OR51E1 | c.C935T | p.A312V | NM_152430.3 | #26 | CTC21
PIK3CA | c.G1624A | p.E542K | NM_006218.2 | #6 | PT, MT, CTC7, CTC13, CTC14
STAB1 | c.C3730T | p.R1244C | NM_015136.2 | #6 | CT, PT, MT, CTC7, CTC13, CTC14
TP53 | c.C421T | p.R141C | NM_001126115.2 | #6 | MT, CTC7, CTC13, CTC14

¹ HGVS mutation nomenclature; ² Accession number according to NCBI

³ CT, constitutional DNA; PT, primary tumor; MT, metastasis
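The distinction between constitutional, founder, and progressor mutations follows from the samples in which a mutation was found. The sketch below applies that logic to a subset of the patient #6 rows of Table 4. It is an illustrative Python reading of the definitions given in the text, not the authors' procedure; the function name, the "unclassified" fallback, and the tuple keys are assumptions.

```python
# CTCs of patient #6 that were sequenced according to the text.
PATIENT_6_CTCS = {"CTC7", "CTC13", "CTC14"}

# Subset of patient #6 rows from Table 4: (gene, nucleotide change) -> samples.
PATIENT_6_MUTATIONS = {
    ("ABCA1", "c.C3625T"): {"CTC7"},
    ("APC", "c.C994T"): {"PT", "MT", "CTC7", "CTC13", "CTC14"},
    ("CSMD3", "c.A39T"): {"CT", "PT", "MT", "CTC7", "CTC13", "CTC14"},
    ("ERCC6L", "c.G3382A"): {"CT", "PT", "MT", "CTC7", "CTC13", "CTC14"},
    ("ERCC6L", "c.3598_3599del"): {"CT", "PT", "MT", "CTC7", "CTC13", "CTC14"},
    ("NF1", "c.C403T"): {"MT", "CTC7", "CTC13", "CTC14"},
    ("PIK3CA", "c.G1624A"): {"PT", "MT", "CTC7", "CTC13", "CTC14"},
    ("STAB1", "c.C3730T"): {"CT", "PT", "MT", "CTC7", "CTC13", "CTC14"},
    ("TP53", "c.C421T"): {"MT", "CTC7", "CTC13", "CTC14"},
}


def classify_mutation(samples, ctcs):
    """Classify a mutation from the set of samples in which it was detected."""
    if "CT" in samples:
        return "constitutional"   # present in constitutional DNA
    if {"PT", "MT"} <= samples and ctcs <= samples:
        return "founder"          # present in all tumor samples and all CTCs
    if "PT" not in samples:
        return "progressor"       # metastasis and/or CTCs only
    return "unclassified"         # assumption: anything not covered above


if __name__ == "__main__":
    for (gene, change), samples in PATIENT_6_MUTATIONS.items():
        print(gene, change, classify_mutation(samples, PATIENT_6_CTCS))
```

With these rows, the four constitutional mutations of patient #6 and the founder status of APC and PIK3CA are recovered, while the TP53 and NF1 mutations fall into the progressor class.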

In patient #26 we identified one constitutional mutation, however within our panel of genes we did not identify driver mutations according to the aforementioned definition. The CTCs had again acquired numerous progressor mutations.

Altogether, we found that metastases and CTCs were superimposed by an accumulation of progressor mutations associated with clonal evolution.

Correlation with clinical parameters

We started to correlate our plasma-DNA and CTC findings with some clinical parameters. The level of the established CRC tumor markers CEA and CA19-9 was not correlated with increased plasma-DNA concentration or the number of CTCs. The percentage of patients with a non-apoptosis peak in our plasma analysis was 67%, 44%, and 38% for patients with metastases in bone (n=6), liver (n=25), and peritoneal carcinomatosis (n=8), respectively, but only 9% for patients with lung metastasis (n=11) (Tab. 5). These percentages were almost identical for patients with more than 6 CTCs, i.e. 67%, 36%, 9%, and 38% for patients with metastasis in bone, liver, lung, and peritoneal carcinomatosis, respectively. In fact, there was only one of 11 patients with lung metastasis with both the non-apoptosis peak and a high "CTC" number according to the Veridex system, and this exception was patient 38. Whether lung metastasis may have a filtering capacity reducing both the number of mutant DNA fragments and CTCs is at present very speculative and warrants further investigation.

Table 5: Sites of metastases and their association with the non-apoptosis peak and number of CTCs.

Discussion

A major goal of cancer medicine is to progress from fixed treatment regimens to bespoke therapy tailored to a patient's tumor. Efficient monitoring of the response to anti-cancer therapy is a prerequisite for individualizing treatment choices. Here we addressed some fundamental questions regarding the use of both tumor plasma-DNA and CTCs as predictive and prognostic biomarkers.

Variable detection rates and integrity of tumor associated plasma-DNA

Despite the use of similar techniques and patient cohorts, substantial differences in the detection rates of abnormal forms or quantities of DNA in plasma or serum were described (Sidransky 2002; Pinzani et al. 2010; van der Vaart and Pretorius 2010; Schwarzenbach et al. 2011). Furthermore, some studies found that circulating mutant tumor DNA fragments are degraded (Diehl et al. 2005; Müller et al. 2008), whereas an increased integrity of free circulating DNA in patients with various cancers was also reported (Wang et al. 2003; Umetani et al. 2006). Our results help to explain these reported discrepancies.

Based on our observations patients with CRC can be subdivided into two groups. The first group has elevated plasma-DNA levels compared to healthy controls, but as confirmed by deep sequencing and array-CGH a very low percentage of mutated DNA fragments. This is consistent with necrotic neoplastic cells being engulfed by macrophages, which involves the killing of neoplastic cells and surrounding stromal and inflammatory cells (Diehl et al. 2005). The released DNA will contain multiple wild type DNA sequences and may thus explain the increase in total, non-mutant circulating DNA.

The second group is characterized by the non-apoptosis peak, which indicates a distinct biological process because its occurrence was associated with very high plasma-DNA levels, elevated percentages of mutated DNA fragments in the circulation and an increased number of CTCs. The non-apoptosis peak likely reflects massive cell destruction with direct shedding of DNA from tumor cells and cellular fragments into the bloodstream. Schwarzenbach et al. (2009) suggested a link between the presence of CTCs and allelic imbalances of 3 microsatellite markers in patients with prostate cancer. However, whether this referred to a similar scenario as described here remains elusive as the authors analyzed plasma-DNA only with a very small set of microsatellite markers and did not investigate essential parameters, such as plasma-DNA sizing, characterization of the primary tumor and/or metastasis and characterization of CTCs.

The reason for the reported variable detection rates and different plasma-DNA fragment lengths is that the release of tumor DNA into the circulation is likely not a continuous but a stochastic process, which critically depends on the current rate of tumor cell destruction and their proximity to blood vessels. As the half-life of tumor plasma-DNA had been estimated to be merely 16 minutes (Lo et al. 1999), the amount of plasma-DNA may not always reflect the true tumor burden but rather events during the last 30 minutes prior to the blood collection. The same should apply to the number of CTCs, which is apparently linked to the release of tumor DNA into the circulation. This also explains why the plasma-DNA and CTCs varied significantly in our cohort although all of our patients had advanced stage disease based on well-defined and established clinical parameters, which was progressive at the time of blood collection. Furthermore, this is in line with a report by Diehl et al. (2005), who observed that the ability of mutant APC fragments to get into the circulation was clearly not related to tumor load (including metastatic deposits).
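The consequence of such a short half-life can be made concrete with a small calculation. The following Python sketch assumes simple first-order (exponential) clearance with the 16-minute half-life cited above and no further release of DNA during the interval; both the model and the function names are simplifying assumptions for illustration.

```python
import math

HALF_LIFE_MIN = 16.0  # half-life cited in the text (Lo et al. 1999)


def fraction_remaining(minutes, half_life=HALF_LIFE_MIN):
    """Fraction of an initial plasma-DNA amount left after the given time."""
    if minutes < 0:
        raise ValueError("time must be non-negative")
    return 0.5 ** (minutes / half_life)


def minutes_until_fraction(fraction, half_life=HALF_LIFE_MIN):
    """Time needed until only the given fraction of the initial amount remains."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return half_life * math.log2(1.0 / fraction)


if __name__ == "__main__":
    for t in (16, 30, 60, 120):
        print(f"after {t:3d} min: {100 * fraction_remaining(t):5.1f}% remaining")
```

Under these assumptions, roughly a quarter of the DNA released at a given moment is still present 30 minutes later and only a small percentage after two hours, which is why a single measurement is dominated by recent events.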

Plasma-DNA and CTC analysis

Reconstruction of tumor genomes from plasma-DNA as done here has not been described before and significantly extends previous studies using only small marker sets for analysis (Müller et al. 2008; Schwarzenbach et al. 2009). In addition, direct demonstration of multiple genetically related subclones within a CTC population and their relationships to each other has previously been hampered by the lack of tools for the detection of rare genetic variants and their analyses. We employed here our advanced single cell techniques (Geigl and Speicher 2007; Geigl et al. 2009), which we recently also applied for the analysis of small cell numbers (<10 cells) from tissue sections (Begus-Nahrmann et al. 2009; Aleksic et al. 2011).

CTCs reflect a significant heterogeneity of the tumor, fulfilling all criteria for chromosomal instability (Geigl et al. 2008), and thus provide unique insights into the tumor cell population substructure. In contrast, results from the plasma-DNA may rather reflect the most predominant changes of the tumor genome at the time of blood collection. The ease with which plasma-DNA can be analyzed may make it an especially attractive tool for disease monitoring.

We showed that subclones of CTCs have a mixture of shared and private somatic changes. The subclones with the best evolutionary fitness will, in time, come to dominate the cancer cell population, and each will be marked by the presence of mutations that provide a direct competitive advantage (driver mutations) and by others acquired during clonal evolution that contribute nothing to the subclone's oncogenic potential (passenger mutations). Our results suggest that next-generation sequencing technologies are applicable even to single cell amplification products, making the search for novel driver mutations and the identification of cell clones with novel characteristics feasible. Whether cells with private copy number changes represent "passenger cells" whereas cells within the main tumor lineage are the real "driver cells" will have to be elucidated by further studies.

Disease monitoring with plasma-DNA and CTCs and future strategies

Serial monitoring of residual disease using plasma-DNA has been especially successful in hematological diseases (Jabbour et al. 2008; Flohr et al. 2008). Therefore, there is currently enthusiasm that this approach can be transferred to solid tumors (Leary et al. 2010; McBride et al. 2010).

However, our results suggest some significant differences between hematologic and solid malignancies: First, analyses based on plasma-DNA measurements and CTCs may critically depend on relatively acute events, such as the amount of dying/destroyed tumor cells, and therefore may provide only a snapshot of events that happened shortly before the blood draw. It is obvious that in the presence of large tumor masses the chances for such events are higher than with small tumor lesions. However, as outlined in our model, disease monitoring in solid tumors will to a large extent be influenced by such stochastic events.

Second, translocations monitored in hematologic diseases are well-established driver mutations whereas relapsing cells in solid tumors do not have to carry a driver rearrangement found in the primary tumor. Instead, it may be possible for a relapsing clone to have lost the rearrangement and still be malignant. Thus, solid tumors will likely require a higher number of molecular targets for disease monitoring, which may make approaches providing data on the entire genome as presented here, especially attractive.

Patient monitoring using plasma-DNA and CTCs may evolve to a routine laboratory test to detect the development of an aggressive tumor subclone. Our analyses demonstrate that progression of chromosomal copy number changes and acquisition of novel driver mutations during tumor evolution can be established from both plasma-DNA and CTCs. This may pave the way for new options for disease monitoring and may represent another step in the progress to "personalized genomics".

REFERENCES

Aleksic K et al. (2011) Evolution of genomic instability in diethylnitrosamine-induced hepatocarcinogenesis in mice. Hepatology 53:895-904

Begus-Nahrmann Y, et al. (2009) p53 deletion impairs clearance of chromosomal-instable stem cells in aging telomere-dysfunctional mice. Nat Genet 41:1138-43. Epub 2009 Aug 30.

Chiu RW, et al. (2011) Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. BMJ 342:c7401.

Chiu RW, Lo YM (2011) Non-invasive prenatal diagnosis by fetal nucleic acid analysis in maternal plasma: the coming of age. Semin Fetal Neonatal Med 16:88-93. Epub 2010 Nov 12.

Chim SS, et al. (2005) Detection of the placental epigenetic signature of the maspin gene in maternal plasma. Proc Natl Acad Sci USA 102:14753-8. Epub 2005 Oct 3.

Cristofanilli M, et al. (2004) Circulating tumor cells, disease progression, and survival in metastatic breast cancer. N Engl J Med 351:781-91.

Dhallan R, et al. (2004) Methods to increase the percentage of free fetal DNA recovered from the maternal circulation. JAMA 291:1114-9.

Dhallan R, et al. (2007) A non-invasive test for prenatal diagnosis based on fetal DNA present in maternal blood: a preliminary study. Lancet 369:474-81.

Diehl F, et al. (2005) Detection and quantification of mutations in the plasma of patients with colorectal tumors. Proc Natl Acad Sci USA 102:16368-73. Epub 2005 Oct 28.

Diehl F, et al. (2008a) Analysis of mutations in DNA isolated from plasma and stool of colorectal cancer patients. Gastroenterology 135:489-98. Epub 2008 May 15.

Diehl F, et al. (2008b) Circulating mutant DNA to assess tumor dynamics. Nat Med 14:985-90. Epub 2007 Jul 31.

Eisen MB, et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863-8.

Flohr T, et al.; International BFM Study Group (I-BFM-SG) (2008) Minimal residual disease-directed risk stratification using real-time quantitative PCR analysis of immunoglobulin and T-cell receptor gene rearrangements in the international multicenter trial AIEOP-BFM ALL 2000 for childhood acute lymphoblastic leukemia. Leukemia 22:771-82. Epub 2008 Jan 31.

Geigl JB, et al. (2008) Defining 'chromosomal instability'. Trends Genet 24:64-9. Epub 2008 Jan 14.

Geigl JB, et al. (2009) Identification of small gains and losses in single cells after whole genome amplification on tiling oligo arrays. Nucleic Acids Res 37:e105. Epub 2009 Jun 18.

Geigl JB, Speicher MR (2007) Single-cell isolation from cell suspensions and whole genome amplification from single cells to provide templates for CGH analysis. Nat Protoc 2:3173-84.

Gormally E, et al. (2007) Circulating free DNA in plasma or serum as biomarker of carcinogenesis: practical aspects and biological significance. Mutat Res 635:105-17. Epub 2007 Jan 25.

Jabbour E, et al. (2008) Molecular monitoring in chronic myeloid leukemia: response to tyrosine kinase inhibitors and prognostic implications. Cancer 112:2112-8.

Leary RJ, et al. (2010) Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med 2:20ra14.

Leon SA, et al. (1977) Free DNA in the serum of cancer patients and the effect of therapy. Cancer Res 37:646-50.

Lewin B, in Gene IX (Jones and Bartlett, Sudbury, MA, 2008), pp. 757-795

Lo YM, et al. (1998) Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis. Am J Hum Genet 62:768-75.

Lo YM, et al. (1999) Rapid clearance of fetal DNA from maternal plasma. Am J Hum Genet 64:218-24.

Lo YM, et al. (2010) Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci Transl Med 2:61ra91.

Margulies M, et al, (2005) Genome sequencing in micro fabricated high-density picolitre reactors. Nature 437:376-80. Epub 2005 Jul 31.

McBride DJ, et al. (2010) Use of cancer-specific genomic rearrangements to quantify disease burden in plasma from patients with solid tumors. Genes Chromosomes Cancer 49:1062-9.

Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31

Müller I, et al. (2008) Identification of loss of heterozygosity on circulating free DNA in peripheral blood of prostate cancer patients: potential and technical improvements. Clin Chem 54:688-96. Epub 2008 Feb 15.

Nagrath S, et al. (2007) Isolation of rare circulating tumour cells in cancer patients by microchip technology. Nature 450:1235-9.

Nawroz H, et al. (1996) Microsatellite alterations in serum DNA of head and neck cancer patients. Nat Med 2:1035-7.

Oldenhuis CNAM, et al. (2008) Prognostic versus predictive value of biomarkers in oncology. Eur J Cancer 44:946-53.

Pantel K, et al. (2008) Detection, clinical relevance and specific biological properties of disseminating tumour cells. Nat Rev Cancer 8:329-40.

Papageorgiou EA, et al. (2011) Fetal-specific DNA methylation ratio permits noninvasive prenatal diagnosis of trisomy 21. Nat Med 17:510-3. Epub 2011 Mar 6.

Pinzani P, et al. (2010) Circulating nucleic acids in cancer and pregnancy. Methods 50:302-7. Epub 2010 Feb 8.

R Development Core Team (2008) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org

Riethdorf S, et al. (2007) Detection of circulating tumor cells in peripheral blood of patients with metastatic breast cancer: a validation study of the CellSearch System. Clin Cancer Res 13:920-8.

Schwarzenbach H, et al. (2009) Cell-free tumor DNA in blood plasma as a marker for circulating tumor cells in prostate cancer. Clin Cancer Res 15:1032-8.

Schwarzenbach H, et al. (2011) Cell-free nucleic acids as biomarkers in cancer patients. Nat Rev Cancer 11:426-37. Epub 2011 May 12.

Sidransky D (2002) Emerging molecular markers of cancer. Nat Rev Cancer 2:210-9.

Stroun M, et al. (1989) Neoplastic characteristics of the DNA found in the plasma of cancer patients. Oncology 46:318-22.

Umetani N, et al. (2006) Prediction of breast tumor progression by integrity of free circulating DNA in serum. J Clin Oncol 24:4270-6.

van der Vaart M, Pretorius PJ (2010) Is the role of circulating DNA as a biomarker of cancer being prematurely overrated? Clin Biochem 43:26-36. Epub 2009 Sep 9.

Walther A, et al. (2009) Genetic prognostic and predictive markers in colorectal cancer. Nat Rev Cancer 9:489-99. Epub 2009 Jun 18.

Wang BG, et al. (2003) Increased plasma DNA integrity in cancer patients. Cancer Res 63:3966-8.

Wood LD, et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science 318:1108-13. Epub 2007 Oct 11.

Yachida S, et al. (2010) Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467:1114-7.

Yung TK, et al. (2009) Single-molecule detection of epidermal growth factor receptor mutations in plasma by microfluidics digital PCR in non-small cell lung cancer patients. Clin Cancer Res 15:2076-84. Epub 2009 Mar 10.

Claims

1. A method of diagnosing a disease associated with increased apoptosis, comprising a) providing a plasma sample from a subject's blood; and

2. The method of claim 1, wherein the non-apoptosis peak represents DNA in the size of above 200 bp, particularly in the range of from 200 bp to 1000 bp, more particularly of from 240 bp to 600 bp, especially from 250 bp to 400 bp.

3. The method of any of claims 1 or 2, wherein the apoptosis peak is characterized by a maximum in the range of from 160 bp to 170 bp and/or wherein the apoptosis peak represents DNA in the range of from 50 bp to 250 bp, more particularly from 80 bp to 240 bp.

4. The method of any of claims 1 to 3, wherein the non-apoptosis peak is increased

- if the non-apoptosis peak contains at least 5 % of the total plasma-DNA, particularly at least 10 %, more particularly at least 20 %; and/or

- if the ratio of the maximal height of the non-apoptosis peak to the maximal height of the apoptosis peak is at least 20 %, preferably at least 30 %, more preferably at least 33 %.

5. The method of any of claims 1 to 4, wherein the method comprises a further diagnostic step, if an increased non-apoptosis peak in the size distribution relative to a control is detected, particularly wherein in the further diagnostic step the copy number of tumor cells is determined and/or plasma DNA is further analyzed and/or plasma DNA is sequenced and/or the presence of circulating tumor cells (CTCs) is detected.

6. The method of any of claims 1 to 5, wherein the method allows to establish genome-wide copy number changes in the genome of the patient to be diagnosed, particularly the genome of a cancer patient or the genome of the fetus in a pregnant female, from plasma DNA; and/or wherein the method provides information about genomic aberrations such as copy numbers of the genome from the plasma DNA.

7. The method of any of claims 1 to 6, wherein the determining of the size distribution is by microfluidics-based electrophoresis.

8. The method of any of claims 1 to 7, wherein the method further comprises determining the total plasma-DNA level, wherein an increased total plasma-DNA level is also indicative of the disease.

9. The method of any of claims 1 to 8, wherein the method further comprises amplifying the plasma-DNA and analyzing the amplified DNA, particularly by comparative genomic hybridization (CGH), especially by microarray-based CGH.

10. The method of any of claims 1 to 9, wherein the disease is cancer, particularly colorectal cancer, breast cancer, prostate cancer or lung cancer, or fetal aneuploidy.

11. The method of any of claims 1 to 10, wherein the method is for predictive or prognostic diagnosis or therapy monitoring, particularly of cancer.

12. The method of any of claims 1 to 11, wherein the subject is a mammal, particularly a human.

13. The method of any of claims 1 to 12, wherein the plasma sample is from the pregnant mother's blood if the disease is aneuploidy of the respective fetus.

14. Use of size distribution of plasma DNA in the diagnosis of a disease associated with increased apoptosis, wherein the size distribution is biphasic having an apoptosis peak and a non-apoptosis peak, wherein the apoptosis peak is characterized by a maximum in the range of from 150 bp to 180 bp and wherein the non-apoptosis peak is characterized by a maximum in the range of from 300 bp to 350 bp, and wherein an increased non-apoptosis peak in the size distribution relative to a control is indicative of the disease.

15. The use of claim 14 further characterized as defined in any of claims 2 to 13.
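
By way of non-limiting illustration only, the two criteria recited in claim 4 could be evaluated on a digitized size distribution roughly as sketched below. The function and class names, the use of the 80 bp to 240 bp and 240 bp to 600 bp windows (taken from the narrower ranges in claims 3 and 2) as half-open intervals, and the use of electropherogram signal intensity as a proxy for the amount of plasma-DNA are all assumptions made for this sketch and are not part of the claimed method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed size windows (bp), treated as half-open intervals [low, high).
APOPTOSIS_WINDOW_BP: Tuple[int, int] = (80, 240)
NON_APOPTOSIS_WINDOW_BP: Tuple[int, int] = (240, 600)


@dataclass(frozen=True)
class PeakSummary:
    non_apoptosis_fraction: float  # share of total signal in the non-apoptosis window
    height_ratio: float            # max non-apoptosis height / max apoptosis height
    increased: bool                # True if either criterion of claim 4 is met


def _signal_in_window(pairs: Sequence[Tuple[float, float]],
                      window: Tuple[int, int]) -> List[float]:
    """Return the signal values whose fragment size falls inside the window."""
    low, high = window
    return [s for bp, s in pairs if low <= bp < high]


def summarize_peaks(sizes_bp: Sequence[float],
                    signal: Sequence[float],
                    min_fraction: float = 0.05,
                    min_height_ratio: float = 0.20) -> PeakSummary:
    """Apply the 'at least 5 %' and 'at least 20 %' thresholds of claim 4."""
    if len(sizes_bp) != len(signal):
        raise ValueError("sizes_bp and signal must have the same length")
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")

    pairs = list(zip(sizes_bp, signal))
    apoptosis = _signal_in_window(pairs, APOPTOSIS_WINDOW_BP)
    non_apoptosis = _signal_in_window(pairs, NON_APOPTOSIS_WINDOW_BP)
    if not apoptosis or max(apoptosis) <= 0:
        raise ValueError("no apoptosis peak found in the assumed window")

    fraction = sum(non_apoptosis) / total
    ratio = max(non_apoptosis, default=0.0) / max(apoptosis)
    # Claim 4 joins the two criteria with "and/or", so either one suffices here.
    increased = fraction >= min_fraction or ratio >= min_height_ratio
    return PeakSummary(fraction, ratio, increased)


if __name__ == "__main__":
    # Hypothetical, made-up trace used only to show the call.
    print(summarize_peaks([100, 166, 200, 300, 330, 400],
                          [5, 100, 10, 20, 40, 10]))
```

The default thresholds correspond to the broadest values in claim 4; the stricter alternatives (10 % or 20 % of total plasma-DNA, 30 % or 33 % height ratio) can be passed as `min_fraction` and `min_height_ratio`.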