WO2021110987A1 - Methods and apparatuses for diagnosing cancer from cell-free nucleic acids - Google Patents

Methods and apparatuses for diagnosing cancer from cell-free nucleic acids

Info

Publication number
WO2021110987A1
WO2021110987A1 PCT/EP2020/084760 EP2020084760W WO2021110987A1 WO 2021110987 A1 WO2021110987 A1 WO 2021110987A1 EP 2020084760 W EP2020084760 W EP 2020084760W WO 2021110987 A1 WO2021110987 A1 WO 2021110987A1
Authority
WO
WIPO (PCT)
Prior art keywords
cancer
subject
sequence reads
sequencing
nucleic acids
Prior art date
Application number
PCT/EP2020/084760
Other languages
English (en)
Inventor
Ségolène DIRY
Emmanuel GILSON
Eric GINOUX
Virginie CHESNAIS
Original Assignee
Life & Soft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Life & Soft
Publication of WO2021110987A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present invention relates to methods and apparatuses for estimating the probability of a subject to be affected with cancer, diagnosing cancer, determining the origin of a tumor in a subject and determining a personalized course of treatment in a subject affected or likely to be affected with cancer; based on the sequencing of cell-free nucleic acids and identification therein of genetic, epigenetic, transcriptomic, metabolic and metagenomic biomarkers.
  • Non-invasive detection methods based on imaging, such as mammography for the detection of breast cancer, or on protein assays, such as Prostate-Specific Antigen (PSA) measurement for the detection of prostate cancer, are already used routinely.
  • These current methods are tumor site-specific and are reported to have poor sensitivity.
  • Carcinoembryonic Antigen (CEA) measurement, used for the detection of colorectal cancer, is reported to have a sensitivity of 41-52% and a specificity of 85-95%.
  • Cell-free circulating DNA (cfDNA) extracted from plasma helps diagnose patients at an early cancer stage. Indeed, many tumors, even at an early stage, release cfDNA with the same genetic background as the primary tumor. Recently, a combination of markers has been used to detect and localize 8 major cancers in a cohort of more than 1817 samples with high accuracy. Somatic point mutations identified in cfDNA were combined with plasma protein measurements to determine the presence of a cancer with a specificity greater than 99% and a sensitivity ranging between 30% and 99% depending on the cancer type. Other studies have pursued the same goal using only genomic information from cfDNA sequencing, but achieved lower accuracy.
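For readers comparing the figures quoted for existing tests, sensitivity and specificity are simple ratios over counts of correctly and incorrectly classified subjects. The sketch below is only an illustration of those two definitions; the function names and the counts in the usage note are hypothetical and do not come from the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of subjects who have cancer and are flagged positive by the test."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no affected subjects in the cohort")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of subjects without cancer who are correctly reported negative."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no unaffected subjects in the cohort")
    return true_neg / total
```

With hypothetical counts of 45 true positives, 55 false negatives, 900 true negatives and 100 false positives, `sensitivity(45, 55)` is 0.45 and `specificity(900, 100)` is 0.9, which falls in the ranges reported above for CEA.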
  • Standard next generation sequencing technologies such as, e.g., Illumina®, involve clonal amplification of DNA and require a specific experimental protocol for each biomarker.
  • Bisulfite treatment is required in addition to the sequencing, while chromatin accessibility evaluation relies on a PCR-free or single-stranded library.
  • Third-generation sequencing technologies such as, e.g., Nanopore® technologies, are characterized by the sequencing of native DNA that passes through the nanopore and changes the ion current.
  • This long-read sequencing technology can be combined with a shotgun PCR-free library to allow the detection of genomic alterations from point mutations to larger abnormalities like copy number variation (CNV) or rearrangement, the presence of viral specific sequence, the detection of methylated CpG or nucleosome position and chromatin remodeling.
  • the invention described hereafter overcomes the limitations of currently known non-invasive methods, by offering a fast and efficient diagnosis of cancer from cell-free nucleic acids.
  • the present invention relates to a method for estimating the probability of a subject to be affected with cancer, comprising the steps of:
  • the present invention also relates to a method for diagnosing cancer in a subject in need thereof, comprising the steps of:
  • the present invention also relates to a method for determining the origin of a tumor in a subject in need thereof, comprising the steps of:
  • the present invention also relates to a method for determining a personalized course of treatment in a subject affected or likely to be affected with cancer, comprising the steps of:
  • the sample is a bodily fluid.
  • the sample is selected from the group comprising blood, lymph, ascitic fluid, cystic fluid, urine, gastric juices, pancreatic juices, bile, nipple exudate, synovial fluid, bronchoalveolar lavage fluid, mucus, sputum, amniotic fluid, peritoneal fluid, cerebrospinal fluid, pleural fluid, pericardial fluid, semen, milk, saliva, sweat, tears, feces, stools, and alveolar macrophages.
  • the sample is selected from the group comprising whole blood, plasma and serum.
  • the nucleic acids are cell-free nucleic acids (cfNAs). In one embodiment, the nucleic acids are cell-free circulating DNA (cfDNA). In one embodiment, the extracted nucleic acids are sequenced by single-molecule nucleic acid sequencing. In one embodiment, the extracted nucleic acids are sequenced by a sequencing method selected from the group comprising nanopore sequencing, single molecule real-time sequencing (SMRT), annular dark-field scanning transmission electron microscopy sequencing, Heliscope sequencing, and nano-knife-edge probe sequencing. In one embodiment, the extracted nucleic acids are sequenced by nanopore sequencing.
  • assigning the plurality of sequence reads at step (c) of the methods of the invention comprises: c1) aligning the plurality of sequence reads on the human genome, thereby obtaining human-mapped sequence reads; c2) discarding sequence reads that did not match with the human genome at step c1); c3) optionally, aligning sequence reads discarded at step c2) on at least one further reference genome or a portion thereof; preferably aligning sequence reads discarded at step c2) on at least one pathogen genome; preferably on a pathogen database; more preferably aligning sequence reads discarded at step c2) on at least one bacterial and/or viral genome; preferably on a bacterial and/or viral genome database; thereby obtaining exogenous-mapped sequence reads; c4) discarding sequence reads that did not match with the at least one further reference genome or a portion thereof at step c3).
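The two-pass assignment described in this step can be sketched as a small routine. This is a minimal illustration under stated assumptions, not the applicant's implementation: `assign_reads`, `substring_matcher` and the toy reference strings are hypothetical, and the exact-substring test merely stands in for a real aligner.

```python
from typing import Callable, Dict, Iterable, List, Optional

Matcher = Callable[[str], bool]  # returns True when a read matches a reference


def substring_matcher(reference: str) -> Matcher:
    """Toy stand-in for an aligner: a read 'maps' if it occurs verbatim."""
    return lambda read: read in reference


def assign_reads(
    reads: Iterable[str],
    maps_to_human: Matcher,
    maps_to_exogenous: Optional[Matcher] = None,
) -> Dict[str, List[str]]:
    human_mapped: List[str] = []
    set_aside: List[str] = []
    for read in reads:
        # First pass: align on the human genome.
        if maps_to_human(read):
            human_mapped.append(read)
        else:
            # Reads that did not match the human genome are set aside.
            set_aside.append(read)

    exogenous_mapped: List[str] = []
    discarded: List[str] = []
    if maps_to_exogenous is None:
        discarded = set_aside
    else:
        # Optional second pass: align the set-aside reads on a further reference.
        for read in set_aside:
            if maps_to_exogenous(read):
                exogenous_mapped.append(read)
            else:
                # Reads matching neither reference are discarded.
                discarded.append(read)
    return {
        "human": human_mapped,
        "exogenous": exogenous_mapped,
        "discarded": discarded,
    }
```

For example, with the toy references `"ACGTACGTTAGC"` (human) and `"GGGCCCAAATTT"` (viral), the reads `["ACGTAC", "CCCAAA", "TTTTTTTT"]` end up in the human, exogenous and discarded groups respectively.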
  • genetic, epigenetic, transcriptomic, metabolic and metagenomic biomarkers of cancer include genomic alterations, telomere length, retrotransposon sequence, DNA hypermethylation or hypomethylation, nucleosome footprint, nucleic acid fragment size, mitochondria quantity, cancer-inducing virus sequences and cancer-associated bacteria sequences.
  • genomic alterations include base pair mutations, differential trinucleotide frequencies, mutational signatures, copy number alterations, gene rearrangements, short tandem repeat polymorphism, and/or chromosomal abnormalities.
  • computer-processing the plurality of mapped sequence reads at step d) of the methods of the invention comprises correlating the mapped sequence reads with information available in databases and/or with information obtained from at least one reference subject, preferably from a reference population.
  • at least one reference subject is a substantially healthy subject; or the at least one reference subject is a cancer subject.
  • the present invention also relates to a method for treating a subject affected with cancer, comprising the steps of:
  • 2) treating said subject depending on the estimation, diagnosis, or determination of step 1).
  • treating said subject is carried out by any one of, or a combination of two or more of: surgery, radiation therapy, chemotherapy, activation immunotherapy, targeted therapy, hormone therapy, and stem cell transplant.
  • the present invention also relates to a computer system for: estimating the probability of a subject to be affected with cancer; or diagnosing cancer in a subject in need thereof; or determining the origin of a tumor in a subject in need thereof; or determining a personalized course of treatment in a subject affected with cancer; comprising: a) a processor and b) a storage medium that stores code readable by the processor; wherein the code stored on the storage medium, when executed by the processor, causes the computer system to: a.
  • At least one raw sequencing signal from a sequencing experiment of nucleic acids, preferably of cell-free nucleic acids (cfNAs), more preferably of cell-free circulating DNA (cfDNA), previously extracted from a sample from the subject; b. optionally, base-call and demultiplex said at least one raw sequencing signal, thereby obtaining at least one sequence read or a plurality of sequence reads; c. assign said at least one sequence read or the plurality of sequence reads to at least one reference genome or a portion thereof, thereby obtaining at least one mapped sequence read or a plurality of mapped sequence reads; d.
  • the term “about”, when set in front of a numerical value, means that said numerical value is approximate and small variations would not significantly affect the practice of the disclosed embodiments. Such small variations are, e.g., of ± 1 %, ± 2 %, ± 3 %, ± 4 %, ± 5 %, ± 6 %, ± 7 %, ± 8 %, ± 9 %, ± 10 % or more.
  • the term “subject” refers to a mammal, preferably a human.
  • a subject may be a “patient”, i.e., a warm-blooded animal, more preferably a human, who/which is awaiting the receipt of, or is receiving medical care or was/is/will be the object of a medical procedure, or is monitored for the development of a disease, such as cancer.
  • patient refers here to any mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, cats, cattle, horses, sheep, pigs, goats, rabbits, etc.
  • the mammal is a primate, more preferably a human.
  • the present invention relates to a method for estimating the probability of a subject to be affected with cancer.
  • It also relates to a method for diagnosing cancer in a subject in need thereof. It also relates to a method for evaluating the origin of a tumor in a subject in need thereof.
  • It also relates to a method for determining a personalized course of treatment in a subject affected or likely to be affected with cancer.
  • 1) estimating the probability of said subject to be affected with cancer, or diagnosing cancer in said subject, or determining the origin of a tumor in said subject; and 2) treating said subject depending on the estimation, diagnosis, or determination of step 1).
  • the methods according to the present invention are not limited to a specific type of cancer, and therefore apply to “cancer” in its broadest sense. Alternatively, the methods according to the present invention may also be adapted to a given type or subtype of cancer.
  • the cancer is an early cancer. In one embodiment, the cancer is an advanced cancer. In one embodiment, the cancer is a metastatic cancer. In one embodiment, the cancer is a recurrent cancer. In one embodiment, the cancer is a stage 0, stage I, stage II, stage III, or stage IV cancer.
  • stage of a cancer describes the size of a tumour and how far it has spread from where it originated.
  • the cancer is a stage 0 cancer.
  • Stage 0 cancer describes cancer in situ. Stage 0 cancers are still located in the place they started and have not spread to nearby tissues. This stage of cancer is often highly curable, usually by removing the entire tumor with surgery.
  • the cancer is a stage I cancer.
  • Stage I cancer describes a small cancer or tumor that has not grown deeply into nearby tissues. It also has not spread to the lymph nodes or other parts of the body.
  • the cancer is a stage II cancer. “Stage II cancer” indicates that the cancer has grown, but hasn’t spread.
  • the cancer is a stage III cancer.
  • Stage III cancer indicates that the cancer is larger and may have spread to the surrounding tissues and/or the lymph nodes.
  • the cancer is a stage IV cancer.
  • Stage IV cancer describes a cancer that has spread to other organs or parts of the body.
  • the cancer is a grade I, grade II, or grade III cancer.
  • the “grade” of a cancer describes the appearance of the cancerous cells. In general, a lower grade indicates a slower-growing cancer and a higher grade indicates a faster-growing one.
  • the cancer is a grade I cancer. “Grade I cancer” indicates that the cancer comprises cancer cells that resemble normal cells, which aren’t growing rapidly.
  • the cancer is a grade II cancer. “Grade II cancer” indicates that the cancer comprises cancer cells that don’t look like normal cells, which are growing faster than normal cells.
  • the cancer is a grade III cancer.
  • “Grade III cancer” indicates that the cancer comprises cancer cells that look abnormal, which may grow or spread more aggressively.
  • cancers include those listed in the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD), under chapter II, blocks C00 to D48.
  • Further examples of cancers include, but are not limited to, adenofibroma, adenoma, agnogenic myeloid metaplasia, AIDS-related malignancies, ameloblastoma, anal cancer, angiofollicular mediastinal lymph node hyperplasia, angiokeratoma, angiolymphoid hyperplasia with eosinophilia, angiomatosis, anhidrotic ectodermal dysplasia, anterofacial dysplasia, apocrine metaplasia, apudoma, asphyxiating thoracic dysplasia, astrocytoma (including, e.g, cerebellar astrocytoma and cerebral astrocytoma), atriodigital dysplasia, atypical mel
  • the cancer is a liquid cancer.
  • liquid cancer refers to cancer cells that are present in body fluids, such as blood, lymph and bone marrow. Lymphomas and leukemias are common types of such liquid cancers. In one embodiment, the cancer is a common cancer.
  • the term “common cancer” refers to one of the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more cancer that is clinically diagnosed with the greatest frequency in a population.
  • the term “common cancer” refers to a cancer that is diagnosed with an annual incidence rate above about 1 in 50 000 people, such as about 1 in 40 000 people, about 1 in 30 000 people, about 1 in 20 000 people, about 1 in 10 000 people, about 1 in 9 500 people, about 1 in 9 000 people, about 1 in 8 750 people, about
  • 1 in 7 750 people, about 1 in 7 500 people, about 1 in 7 250 people, about 1 in 7 000 people, about 1 in 6 750 people, about 1 in 6 500 people, about
  • Examples of common cancers include, but are not limited to, breast cancer, lung and bronchus cancer, prostate cancer, colorectal cancer, melanoma, bladder cancer, non-Hodgkin’s lymphoma, kidney cancer, endometrial cancer, leukemia, pancreatic cancer, thyroid cancer, and liver cancer.
  • the cancer is selected from the group comprising or consisting of breast cancer, lung and bronchus cancer, prostate cancer, colorectal cancer, melanoma, bladder cancer, kidney cancer, and endometrial cancer.
  • the methods comprise a step of extracting nucleic acids from a sample.
  • the sample is a body tissue sample or a bodily fluid sample.
  • the sample is a body tissue sample.
  • body tissues include, but are not limited to, muscle, nerve, brain, heart, lung, liver, pancreas, spleen, thymus, esophagus, stomach, intestine, kidney, testis, prostate, ovary, hair, skin, bone, breast, uterus, bladder and spinal cord.
  • a body tissue sample may be recovered from the subject, e.g, by biopsy or during a surgical operation.
  • the sample is not a body tissue sample.
  • the sample is a bodily fluid.
  • bodily fluids include, but are not limited to, blood (including whole blood, plasma and serum), lymph, ascitic fluid, cystic fluid, urine, gastric juices, pancreatic juices, bile, nipple exudate, synovial fluid, bronchoalveolar lavage fluid, mucus, sputum, amniotic fluid, peritoneal fluid, cerebrospinal fluid, pleural fluid, pericardial fluid, semen, milk, saliva, sweat, tears, feces, stools, and alveolar macrophages.
  • the sample is blood, such as whole blood, plasma or serum.
  • the sample is whole blood.
  • the term “whole blood” is as conventionally defined.
  • the sample is readily obtainable by minimally invasive methods or non-invasive methods, allowing the removal or isolation of the whole blood from the subject.
  • the sample is plasma.
  • plasma is as conventionally defined. Plasma is usually obtained from a sample of whole blood, provided or contacted with an anticoagulant (such as, e.g, heparin, citrate, oxalate or EDTA). Subsequently, cellular components of the whole blood sample are separated from the liquid component (i.e., the plasma) by an appropriate technique, typically by centrifugation.
  • the sample is serum.
  • serum is as conventionally defined. Serum can be usually obtained from a sample of whole blood, by (1) allowing clotting to take place in the whole blood sample and (2) subsequently separating the so-formed clot and cellular components of the blood sample from the liquid component (i.e., the serum) by an appropriate technique, typically by centrifugation. Alternatively, serum can be obtained from plasma by removing the anticoagulant and fibrin. The term “serum” therefore refers to a composition which does not form part of a human or animal body.
  • the sample was previously taken from the subject, i.e., the method of the invention does not comprise a step of recovering a sample from the subject. Consequently, according to this embodiment, the methods of the invention are non-invasive methods.
  • nucleic acids refers to both DNA and RNA. Nucleic acids can be single-stranded or double-stranded. In one embodiment, the nucleic acid is DNA. In one embodiment, the nucleic acid is RNA.
  • the nucleic acids are cell-free nucleic acids (cfNAs).
  • the terms “cell-free nucleic acid” or “cfNA”, sometimes referred to as “cell-free circulating nucleic acid” or “circulating nucleic acid”, are commonly used in the art to describe nucleic acid fragments that circulate in a subject’s bodily fluid and originate from one or more healthy cells and/or from one or more cancer cells from said subject.
  • the cfNA is a cell-free circulating DNA (cfDNA). In one embodiment, the cfNA is a cell-free circulating RNA (cfRNA).
  • Means and methods for extracting nucleic acids from a sample are well known to the one skilled in the art. Such means and methods include, e.g., the phenol-chloroform extraction method, or commercially available nucleic acid extraction reagents. Extraction can be carried out using commercially available kits.
  • depending on the cfNAs to be extracted (cfDNA, cfRNA or a combination of both), several means and methods can be carried out.
  • means and methods for extracting cfDNAs are well known in the art and commercial kits are readily available, e.g., the phenol-chloroform extraction method, the sodium iodide extraction method, the guanidine-resin extraction method, the “QIAamp® MinElute ccfDNA” kit from Qiagen, the “QIAamp® Circulating Nucleic Acids” kit from Qiagen, the “QIAamp® DNA Blood” kit from Qiagen, the “Gentra Puregene Blood” kit from Qiagen, the “MagMAX™ Cell-Free DNA Isolation” kit from Applied Biosystems, the “Quick-cfDNA Serum & Plasma” kit from Zymo Research, and the like.
  • means and methods for extracting cfRNAs can be adapted from the art and commercial kits are readily available, e.g., the TRIzol extraction method, the “RNeasy Mini” kit from Qiagen, the “QIAamp® Circulating Nucleic Acids” kit from Qiagen, the “MagMAX™-96 Blood RNA Isolation” kit from Thermo Fisher Scientific, and the like.
  • means and methods for extracting total cfNAs can be adapted from the art and commercial kits are readily available, e.g., the “AllPrep DNA/RNA Mini” kit from Qiagen, the “MagMAX™ Cell-Free Total Nucleic Acid Isolation” kit from Thermo Fisher Scientific, and the like.
  • the methods comprise a step of sequencing the extracted nucleic acids, preferably the extracted cfNAs.
  • sequence refers to any method by which the identity of at least about 5, about 10, about 15, about 20, about 25, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100, about 125, about 150, about 175, about 200 or more nucleotides of a nucleic acid molecule is obtained.
  • sequence encompasses methods by which epigenetic information may also be obtained, such as, e.g, nucleotide modifications.
  • the term “nucleotide modifications” refers to any modification of a nucleotide which does not affect the nucleic acid sequence itself. Examples of such modifications include, but are not limited to, methylation (such as, e.g., cytosine methylation leading to 5-methylcytosine; adenosine methylation leading to N6-methyladenosine), oxidation (such as, e.g., 5-methylcytosine oxidation leading to 5-hydroxymethylcytosine; 5-hydroxymethylcytosine oxidation leading to 5-formylcytosine; 5-formylcytosine oxidation leading to 5-carboxylcytosine). These modifications are well known to the one skilled in the art.
  • the extracted nucleic acids preferably the extracted cfNAs, are sequenced by a sequencing method detecting nucleotide-specific physicochemical features, including size, optical, electrical, and/or magnetic properties.
  • the extracted nucleic acids are sequenced by a sequencing method detecting nucleotide size.
  • the extracted nucleic acids, preferably the extracted cfNAs are sequenced by a sequencing method detecting nucleotide optical properties (such as, e.g, fluorescence or absorption spectrum).
  • the extracted nucleic acids, preferably the extracted cfNAs are sequenced by a sequencing method detecting nucleotide electrical properties.
  • the extracted nucleic acids, preferably the extracted cfNAs are sequenced by a sequencing method detecting nucleotide magnetic properties.
  • the extracted nucleic acids are not sequenced by a first- or second-generation sequencing method.
  • first-generation sequencing refers to Sanger sequencing, i.e., a sequencing method based on the selective incorporation of chain-terminating di-deoxynucleotides by DNA polymerase during in vitro DNA replication.
  • second-generation sequencing also termed “massive parallel sequencing”, “massively parallel sequencing” or “next-generation sequencing”, refers to methods of “sequencing-by-synthesis”, wherein nucleic acid molecules to be sequenced are amplified, then sequenced in batch through nucleic acid neostrand synthesis.
  • second-generation sequencing methods include, but are not limited to, pyrosequencing (such as, e.g., using the 454 platform from Roche, or the GS FLX Titanium platform from 454 Life Sciences), sequencing by reversible terminator chemistry (such as, e.g., using the MiSeq platform, the HiSeq platform or the Genome Analyzer IIX platform from Illumina), and sequencing by ligation (such as, e.g., using the SOLiD4 platform from Life Technologies, now Thermo Fisher Scientific; or the Complete Genomics platform from Complete Genomics).
  • the extracted nucleic acids preferably the extracted cfNAs, are sequenced by a third-generation sequencing method.
  • third-generation sequencing also termed “single-molecule nucleic acid sequencing”, refers to sequencing methods, wherein the nucleotide sequence is read at the single nucleic acid molecule level.
  • third-generation sequencing methods include, but are not limited to, nanopore sequencing (such as, e.g, from Oxford Nanopore Technology, from Quantapore, or from Stratos Genomics Inc.); single molecule real-time sequencing (SMRT) (such as, e.g.
  • nanopore sequencing is sometimes referred to in the literature as fourth-generation sequencing.
  • Third-generation sequencing methods are well known in the art. For a review, see, e.g., Niedringhaus et al. (2011. Anal Chem. 83(12):4327-41) or Xu et al. (2009. Small. 5(23):2638-49).
  • the sequencing method provides, beside the identity of the nucleotides of a nucleic acid molecule, epigenetic information, such as, e.g., nucleotide modifications.
  • the extracted nucleic acids are sequenced by nanopore sequencing.
  • raw sequencing data are obtained upon sequencing the extracted nucleic acids, preferably the extracted cfNAs.
  • raw sequencing data refers to the output of a sequencing run.
  • Raw sequencing data are represented by the signal measured by the sequencer.
  • raw sequencing data may be pictures of fluorescent signal or recording of electric signal.
  • raw sequencing data are pre-processed to obtain sequence reads.
  • the terms “pre-process”, “base-call”, “base-called” and “base-calling” refer to the transformation of the raw sequencing data (e.g., the fluorescent signal, electric signal, or the like) into corresponding nucleotides; in other words, to the assignment of nucleotides to a raw sequencing signal.
  • a plurality of sequence reads is obtained upon sequencing the extracted nucleic acids, preferably the extracted cfNAs. In one embodiment, a plurality of sequence reads is obtained upon base-calling of the raw sequencing data.
  • the term “sequence read” refers to the output of a sequencing run after pre-processing of the raw signal. Sequence reads are represented by a string of nucleotides. Sequence reads may be accompanied by metrics about the quality of the sequence. The quality is determined during the base-calling step and indicates the accuracy of the base called. For example, each nucleotide in a sequence read may be associated with the confidence of the base-call, i.e., a determination of whether a nucleotide is a G, A, T or C, for that position. In one embodiment, a plurality of “sequence reads” can include unique or substantially unique nucleic acid sequences.
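Per-base confidence of this kind is commonly stored as a Phred-scaled quality string alongside the read. The text does not name an encoding, so the sketch below assumes Phred qualities with the usual ASCII offset of 33; the function names are hypothetical.

```python
from typing import List


def error_probabilities(quality: str, offset: int = 33) -> List[float]:
    """Convert a Phred-encoded quality string into per-base error probabilities.

    Assumes Q = ord(char) - offset and P(error) = 10 ** (-Q / 10).
    """
    return [10 ** (-(ord(char) - offset) / 10) for char in quality]


def mean_error_probability(quality: str, offset: int = 33) -> float:
    """Average error probability over a read; raises on an empty string."""
    probabilities = error_probabilities(quality, offset)
    if not probabilities:
        raise ValueError("empty quality string")
    return sum(probabilities) / len(probabilities)
```

Under this assumption the character `I` encodes Q40 (error probability 0.0001), `5` encodes Q20 (0.01) and `+` encodes Q10 (0.1), so a read could be filtered on `mean_error_probability` before mapping.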
  • a plurality of “sequence reads” can include redundant sequences of the same parent molecule, generated, e.g, by an amplification step carried out before and/or during sequencing.
  • “consensus sequence reads” can be generated from sequence reads after comparing redundant sequence reads and selecting the most common nucleotide observed at a given position, after comparing to a reference genome or a portion thereof or other approaches.
  • Unique or non-unique molecular tags (UMIs) can be added to the nucleic acids to be sequenced before an amplification step, to label each nucleic acid molecule.
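One way to picture consensus building from tagged, redundant reads is a per-position majority vote within each tag group. The sketch below is an illustration under simplifying assumptions: reads sharing a tag have equal length and are already in register, and the names `consensus` and `consensus_by_tag` are hypothetical. The reference-based and other approaches mentioned above are not covered.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def consensus(reads: List[str]) -> str:
    """Select the most common nucleotide at each position of redundant reads."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    # zip(*reads) yields one column of nucleotides per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def consensus_by_tag(tagged_reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Group (tag, sequence) pairs by molecular tag and build one consensus each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tag, sequence in tagged_reads:
        groups[tag].append(sequence)
    return {tag: consensus(sequences) for tag, sequences in groups.items()}
```

For instance, three reads tagged `AAT` with sequences `ACGT`, `ACGT` and `ACCT` collapse to the single consensus `ACGT`, which removes the isolated error at the third position.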
  • sequencing run refers to any step or portion of a sequencing experiment performed to determine some information related to at least one nucleic acid molecule.
  • the term “plurality”, when referring to sequence reads, means that more than one, such as, e.g., at least 2, sequence reads are obtained. In certain cases, a plurality of sequence reads may have at least about 10, at least about 100, at least about 1000, at least about 10 000, at least about 100 000, at least about 10^6, at least about 10^7, at least about 10^8, at least about 10^9 or more sequence reads.
  • the methods comprise a step of assigning the plurality of sequence reads to at least one reference genome or a portion thereof.
  • two nucleic acid sequences (such as, e.g., a sequence read and a reference genome sequence or a portion thereof) are “mapped” when they share sequence identity, e.g., at least about 50% sequence identity, such as at least about 60%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 96%, about 97%, about 98%, or about 99% sequence identity.
  • a sequence read is assigned to a reference genome sequence or a portion thereof when said sequence read is “aligned” on said reference genome sequence or a portion thereof.
  • an alignment may comprise a mismatch, i.e., a site at which a nucleotide in one sequence read and a nucleotide in the - or in a portion of the - reference genome with which it is aligned are not complementary.
  • an alignment may comprise 1, 2, 3, 4, 5 or more, contiguous or non-contiguous, mismatches.
  • an alignment may comprise 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10% or more, contiguous or non-contiguous, mismatches.
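The notions of mismatch count and percent sequence identity can be made concrete for the simplest case, a gapless alignment of a read against a reference segment of the same length. This is a sketch under that assumption only (real aligners also handle insertions and deletions); the function names and the default threshold are hypothetical, the latter borrowing the lowest identity value quoted in the text.

```python
from typing import List


def mismatch_positions(read: str, reference_segment: str) -> List[int]:
    """Return the 0-based positions where the aligned sequences differ."""
    if len(read) != len(reference_segment) or not read:
        raise ValueError("sequences must be non-empty and of equal length")
    return [i for i, (a, b) in enumerate(zip(read, reference_segment)) if a != b]


def percent_identity(read: str, reference_segment: str) -> float:
    """Percentage of aligned positions carrying the same nucleotide."""
    mismatches = mismatch_positions(read, reference_segment)
    return 100.0 * (len(read) - len(mismatches)) / len(read)


def is_mapped(read: str, reference_segment: str, min_identity: float = 50.0) -> bool:
    """Treat the read as mapped when identity reaches the chosen threshold."""
    return percent_identity(read, reference_segment) >= min_identity
```

Aligning `ACGTACGTAC` against `ACGTTCGTAA` gives two non-contiguous mismatches (positions 4 and 9) and 80% identity, so the read counts as mapped at a 50% threshold but not at a 90% one.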
  • mapped sequence read refers to a sequence read that has been assigned to (such as, e.g, “mapped” or “aligned”) a matching sequence in the at least one reference genome.
  • Assigning a sequence read to a reference genome or a portion thereof can be done manually or by a computer (e.g., using a software, program, computer program component, or algorithm or machine learning algorithm or deep learning algorithm).
  • Various computational methods can be used to assign sequence reads to a reference genome.
  • Sequence reads can be mapped by a mapping component or by a machine or computer comprising a mapping component (e.g., a suitable mapping and/or alignment and/or classification program), which mapping component generally compares reads to a reference genome or segment thereof.
  • Sequence reads can be mapped to or aligned with a reference genome or a portion thereof by use of a suitable mapping and/or alignment program.
  • Examples of suitable mapping and/or alignment programs include, but are not limited to, BWA (Li H. and Durbin R. (2009) Bioinformatics 25, 1754-60), Novoalign [Novocraft (2010)], Bowtie (Langmead B, et al., (2009) Genome Biol. 10:R25), SOAP2 (Li R, et al., (2009) Bioinformatics 25, 1966-67), BFAST (Homer N, et al., (2009) PLoS ONE, e7767), GASSST (Rizk, G. and Lavenier, D.
  • Sequence reads can be assigned to (such as, e.g., mapped or aligned) a reference genome or a portion thereof using a suitable short read alignment program.
  • Examples of such programs include, but are not limited to, BarraCUDA, BFAST, BLASTN, BLAST, BLAT, BLITZ, Bowtie (e.g., BOWTIE 1, BOWTIE 2), BWA, CASHX, CUDA-EC, CUSHAW, CUSHAW2, deSALT, drFAST, FASTA, ELAND, ERNE, GNUMAP, GEM, GensearchNGS, GMAP, Geneious Assembler, GraphMap, iSAAC, LAST, MAQ, marginAlign, minimap, minimap2, minialign, mrFAST, mrsFAST, MOSAIK, MPscan, NanoBLASTer, Novoalign, NovoalignCS, Novocraft, NextGENe, Omixon, PALMapper, Partek, PASS, PerM, PROBEMATCH, QPalma, RazerS, REAL, cREAL, RMAP, rNA, RTG, Segemehl, Se
  • Sequence reads can be assigned to (such as, e.g., mapped or aligned) a reference genome or a portion thereof by use of a suitable machine learning or deep learning algorithm.
  • Examples of such algorithms include, but are not limited to, fastText (Joulin, 2016. arXiv:1607.01759 [cs.CL]; Joulin et al., 2017. In 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017): Valencia, Spain, 3-7 April 2017. Stroudsburg, PA: Association for Computational Linguistics), fastDNA (Menegaux & Vert, 2019. J Comput Biol. 26(6):509-518), large scale linear model by learning continuous low-dimensional representations of the k-mers.
  • a mapping component can map sequence reads by a suitable method known in the art or described herein.
  • a mapping component or a machine or computer comprising a mapping component is required to provide mapped sequence reads.
  • a mapping component often comprises a suitable mapping and/or alignment program or algorithm.
  • a plurality of sequence reads and/or information associated with a plurality of sequence reads are stored on and/or accessed from a non-transitory computer-readable storage medium in a suitable computer-readable format.
  • Information stored on a non-transitory computer-readable storage medium is sometimes referred to as a “file” or “data file”.
  • a file or data file often comprises a format.
  • a sequence read or a plurality of sequence reads is sometimes stored in a format that includes information about one or more sequence reads, non-limiting examples of which include a complete or partial nucleic acid sequence, mappability, a mappability score, a mapped location, a relative location or distance from other mapped or unmapped reads (e.g., estimated distance between read mates), orientation relative to a reference genome or to other reads (e.g., relative to read mates), an estimated or precise location of read mates, G/C content, nucleotide modification (e.g., methylation), the like or combinations thereof.
  • a “computer-readable format” is sometimes referred to generally herein as a “format”.
  • sequence reads are stored and/or accessed in a suitable binary format, a text format, the like or a combination thereof.
  • a binary format is sometimes a BAM format.
  • a text format is sometimes a sequence alignment/map (SAM) format.
  • binary and/or text formats include, but are not limited to, BAM, sorted BAM, SAM, SRF, FASTA, FASTQ, Gzip, the like, or combinations thereof.
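To make the text formats above concrete, here is a minimal sketch of reading four-line FASTQ records in Python. It assumes the common unwrapped layout (header, sequence, "+" separator, quality string); the `FastqRecord` and `parse_fastq` names are hypothetical, and production pipelines would normally rely on an established parser instead.

```python
import io
from typing import NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(handle):
    """Yield FastqRecord objects from a text handle holding 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.strip()
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        yield FastqRecord(header[1:], sequence, quality)


# Usage with an in-memory example (read names and bases are made up).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n##\n")
records = list(parse_fastq(example))
```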
  • a program is configured to instruct a microprocessor to obtain or retrieve one or more files.
  • a program is configured to instruct a microprocessor to obtain or retrieve one or more FASTQ files (e.g., a FASTQ file for a first read and a second read) and/or one or more reference files (e.g., a FASTA or FASTQ file).
  • a program instructs a microprocessor to call a computer program component and/or transfers data and/or information (e.g., files) to or from one or more computer program components (e.g., an adapter trimmer component, BWA-MEM aligner, insert size distribution component, samtools, and the like).
  • a program instructs a processor to call a computer program component which creates new files and formats for input into another processing step.
  • the plurality of sequence reads is assigned to (such as, e.g, mapped or aligned) at least one reference genome or a portion thereof, such as on 1 reference genome, on 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000 or more reference genomes or a portion thereof.
  • a sequence read in the plurality of sequence reads may uniquely or non-uniquely map to a reference genome or a portion thereof.
  • a sequence read is considered as “uniquely mapped” if it is assigned to (such as, e.g., mapped or aligned with) - completely or partially - a single sequence in the at least one reference genome or a portion thereof.
  • a sequence read is considered as “non-uniquely mapped” if it is assigned to (such as, e.g., mapped or aligned with) - completely or partially - two or more sequences in the at least one reference genome or a portion thereof.
  • non-uniquely mapped sequence reads may be eliminated from further analysis.
  • a certain degree of mismatch between the reference genome or a portion thereof and the sequence reads may be allowed to account for, e.g, single nucleotide polymorphisms or sequencing errors.
  • no degree of mismatch between the reference genome or a portion thereof and the sequence reads may be allowed.
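The distinction between uniquely mapped, non-uniquely mapped and unmapped reads, with or without a mismatch allowance, can be sketched with a brute-force scan. This is only an illustration under simplifying assumptions (ungapped, forward strand only, a single short reference string); the function names and the toy sequences are hypothetical, and real aligners use indexed data structures rather than a linear scan.

```python
def find_hits(read, reference, max_mismatches=0):
    """Return start positions where the read matches with at most max_mismatches."""
    hits = []
    n = len(read)
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def classify_read(read, reference, max_mismatches=0):
    """Label a read as unmapped, uniquely mapped or non-uniquely mapped."""
    hits = find_hits(read, reference, max_mismatches)
    if not hits:
        return "unmapped"
    return "uniquely mapped" if len(hits) == 1 else "non-uniquely mapped"


toy_reference = "ACGTTTGACCAACGTA"  # made-up sequence for illustration
```

Setting `max_mismatches=0` corresponds to the embodiment in which no mismatch is allowed, while a positive value tolerates, e.g., single nucleotide polymorphisms or sequencing errors.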
  • reference genome can refer to any particular known, sequenced or characterized genome, whether partial or complete, of any organism or virus which may be used to reference identified sequences in the plurality of sequence reads.
  • a “reference genome” may refer to a portion of a genome (e.g., a chromosome or part thereof, e.g., one or more portions of a genome).
  • Human genomes, human genome assemblies and/or genomes from any other organisms or virus can be used as a reference genome.
  • One or more human genomes, human genome assemblies as well as genomes of other organisms or viruses can be found, e.g, at the National Center for Biotechnology Information (NCBI) at www.ncbi.nlm.nih.gov/genome.
  • a “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
  • a reference genome or a portion thereof often is an assembled or partially assembled genomic sequence from a subject or multiple subjects.
  • a reference genome or a portion thereof is an assembled or partially assembled genomic sequence from one or more human subjects.
  • a reference genome or a portion thereof comprises sequences assigned to chromosomes. In one embodiment, a reference genome or a portion thereof comprises sequences obtained from a reference subject or sample. In one embodiment, a reference genome or a portion thereof comprises sequences, an assembly of sequences, and/or a consensus sequence (e.g, a sequence contig). In one embodiment, a reference genome or a portion thereof is obtained from a reference subject or sample substantially free of a genetic variation. In one embodiment, a reference genome or a portion thereof is obtained from a reference subject or sample comprising a known genetic variation.
  • sequence reads can be assigned to (such as, e.g, mapped or aligned) sequences in nucleic acid databases known in the art.
  • databases include, but are not limited to, the International Nucleotide Sequence Database (at www.insdc.org), GenBank (at www.ncbi.nlm.nih.gov), the European Nucleotide Archive (at www.ebi.ac.uk/ena/browser/home), and the DNA Data Bank of Japan (at www.ddbj.nig.ac.jp).
  • Suitable examples include, without limitation, 23andMe, 1000 Genomes Project, ArrayExpress, Bioinformatic Harvester, ClinVar, COSMIC, dbSNP, ENCODE, Ensembl, Ensembl Genomes, Gene Disease Database, Gene Expression Omnibus (GEO), GTEx, HapMap, Human Microbiome Project (HMP), Human Protein Atlas (HPA), Online Mendelian Inheritance in Man (OMIM), Personal Genome Project, RefSeq, SNPedia, and TCGA.
  • BLAST or similar tools can be used to search sequence reads against a sequence database.
  • the mappability is assessed for a genomic region (e.g, one or more portions of a genome).
  • mappability refers to the ability to unambiguously assign a sequence read to a portion of a reference genome, typically up to a specified number of mismatches (such as, e.g, 1, 2, 3, 4, 5 or more mismatches).
  • mappability is provided as a score or value, where the score or value is generated by a suitable mapping algorithm or computer-mapping software.
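One simple way to picture a mappability score for a genomic region is the fraction of k-mers starting in that region that occur exactly once in the reference. The sketch below assumes exact matching only (zero mismatches) and a reference small enough to hold in memory; the `kmer_mappability` name and the scoring rule are illustrative assumptions, not the scoring used by any particular mapping software.

```python
from collections import Counter


def kmer_mappability(reference, start, end, k):
    """Fraction of k-mers lying within reference[start:end] that are unique genome-wide."""
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    region_kmers = [reference[i:i + k] for i in range(start, end - k + 1)]
    if not region_kmers:
        raise ValueError("region shorter than k")
    unique = sum(1 for kmer in region_kmers if counts[kmer] == 1)
    return unique / len(region_kmers)


toy_genome = "ACGTACGTTTGA"  # made-up sequence; "ACGT" occurs twice
```

A score of 1.0 means every k-mer in the region can be placed unambiguously, while lower values flag repetitive stretches.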
  • the plurality of sequence reads is compared to one reference genome or a portion thereof.
  • the reference genome is the human (Homo sapiens sapiens) genome or a portion thereof.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g., the human genome, are discarded from further analysis.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one further reference genome or a portion thereof or genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g., the human genome, are compared to at least one pathogen genome or genome database, such as, e.g., an archaeal, bacterial, fungal, protist, protozoal, and/or viral reference genome or a portion thereof or genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one archaeal genome or a portion thereof or with an archaeal genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one bacterial genome or a portion thereof or with a bacterial genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one fungal genome or a portion thereof or with a fungal genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one protist genome or a portion thereof or with a protist genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one protozoal genome or a portion thereof or with a protozoal genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to at least one viral genome or a portion thereof or with a viral genome database.
  • sequence reads that did not match with the reference genome or a portion thereof, e.g, the human genome are compared to a bacterial and/or viral genome database.
  • sequence reads that matched neither the first reference genome, e.g., the human genome, nor the further reference genome(s) or genome database(s), e.g., pathogen genome(s) or genome database(s), are discarded from further analysis.
  • sequence reads that matched with the at least one reference genome or a portion thereof are kept for further analysis.
  • mapped sequence reads may be classified into sequence reads that mapped with a first reference genome or a portion thereof, e.g., the human genome (i.e., “human-mapped sequence reads”), and sequence reads that mapped with the further reference genome(s) or a portion thereof, e.g., a non-human genome, such as pathogen genome(s) or genome database(s) (i.e., “exogenous-mapped sequence reads”).
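The tiered comparison described above (human reference first, then further reference genomes, then discard) can be sketched as follows. Exact substring matching stands in for a real aligner here, and the function name, dictionary keys and toy sequences are assumptions made for illustration only.

```python
def classify_reads(reads, human_reference, exogenous_references):
    """Split reads into human-mapped, exogenous-mapped and discarded groups.

    exogenous_references maps a (hypothetical) genome name to its sequence.
    """
    result = {"human-mapped": [], "exogenous-mapped": [], "discarded": []}
    for read in reads:
        if read in human_reference:  # stand-in for alignment to the human genome
            result["human-mapped"].append(read)
            continue
        # Only reads that failed the first comparison reach the further genomes.
        source = next(
            (name for name, seq in exogenous_references.items() if read in seq),
            None,
        )
        if source is not None:
            result["exogenous-mapped"].append((read, source))
        else:
            result["discarded"].append(read)
    return result


toy_result = classify_reads(
    ["GTAC", "TCCC", "GGGGGG"],
    human_reference="ACGTACGTGG",
    exogenous_references={"virus_x": "TTTTCCCCAA"},
)
```

Keeping the source genome name with each exogenous-mapped read makes the later search for metagenomic biomarkers straightforward.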
  • the methods comprise a step of computer-processing the mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic, transcriptomic, metabolic and/or metagenomic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - genetic and epigenetic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic and transcriptomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic and metabolic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic and metagenomic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - epigenetic and transcriptomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - epigenetic and metabolic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - epigenetic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - transcriptomic and metabolic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - transcriptomic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - metabolic and metagenomic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic and transcriptomic biomarkers. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic and metabolic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - epigenetic, transcriptomic and metabolic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - epigenetic, transcriptomic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - transcriptomic, metabolic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic, transcriptomic and metabolic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic, transcriptomic and metagenomic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic, metabolic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - genetic, transcriptomic, metabolic and metagenomic biomarkers of cancer in said mapped sequence reads. In one embodiment, computer-processing comprises identifying - or assessing the presence of - epigenetic, transcriptomic, metabolic and metagenomic biomarkers of cancer in said mapped sequence reads.
  • computer-processing comprises identifying - or assessing the presence of - genetic, epigenetic, transcriptomic, metabolic and metagenomic biomarkers of cancer in said mapped sequence reads.
  • genetic biomarker is broader than a gene.
  • a “genetic biomarker” refers to a fragment of nucleic acid, such as a fragment of DNA, with an identifiable physical location on a chromosome whose inheritance can be followed.
  • a genetic biomarker can have a function and thus be, or be a fragment of, a gene.
  • a genetic biomarker can be a fragment of nucleic acid, such as a fragment of DNA, with no known function.
  • genetic biomarkers of cancer include, but are not limited to, genomic alterations (such as, e.g., somatic mutations, single-nucleotide polymorphism), telomere length evaluation, and retrotransposon sequence detection.
  • identifying - or assessing the presence of - genetic biomarkers of cancer in said mapped sequence reads may comprise analyzing the position, gene and impact of genomic alterations and/or nucleosome footprint in said mapped sequence reads.
  • genomic alteration refers to a change (or mutation) in the nucleotide sequence of the genome of a cancer cell, which change is not present in a non-cancer cell genome.
  • genomic alterations include, but are not limited to, base pair substitutions (such as, e.g., single-nucleotide polymorphism), base pair insertions, base pair deletions, copy number alteration, gene rearrangement (such as, e.g., gene fusion), short tandem repeat polymorphism (such as, e.g., STR), chromosomal abnormalities and any combination thereof.
  • base pair substitutions, insertions and deletions are commonly included under the general term “base pair mutation”.
  • genomic alteration may be defined relative to the locus or gene in which it is present in cancer cells relative to a non-cancer cell genome. In one embodiment, the genomic alteration may be defined relative to trinucleotide frequencies in the genome of cancer cells relative to a non-cancer cell genome.
  • cancer-specific base pair mutations include, but are not limited to, mutations in any one of the following genes: ACVR2A, AFF3, ALK, APC, AR, ARID1A, ARID2, ATM, ATRX, BAP1, BCOR, BRAF, CAMTA1, CDH1, CDKN2A, CREBBP, CTCF, CTNNB1, EBF1, EGFR, EP300, ERBB2, ERBB3, ERBB4, ERCC2, ESR1, FAT1, FAT4, FBXW7, FGFR2, FGFR3, FHIT, FOXP1, GATA3, GRIN2A, HRAS, KDM6A, KDM5C, KDR, KEAP1, KIT, KMT2A, KMT2C, KMT2D, KRAS, LPP, LRP1B, MAP3K1, MET, MTOR, MSH6, NF1, NF2, NRAS, PBRM1, PIK3CA, PIK3R1, POLE, PPP2R1A
  • Table 1 provides examples of base pair mutation occurrences (expressed in %) observed in certain types of cancers. Table 1. Extracted from GRCh38 COSMIC v90.
  • Differential trinucleotide frequencies in the genome of cancer cells relative to a non-cancer cell genome have been termed “mutational signature” in the art. Examples of such mutational signatures are described, e.g., in Alexandrov et al., 2013. Nature. 500(7463):415-21. For example, certain mutational signatures are known to be associated with certain types of cancers. Table 2 provides examples of mutational signatures and their correlation with certain types of cancers.
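Taking the description above at face value, differential trinucleotide frequencies between two sequences can be sketched as below. This is a simplified illustration: published mutational signatures are usually built from substitution types in their trinucleotide context rather than from raw trinucleotide counts, and the function names here are hypothetical.

```python
from collections import Counter


def trinucleotide_frequencies(sequence):
    """Relative frequency of each overlapping trinucleotide made only of A, C, G, T."""
    sequence = sequence.upper()
    counts = Counter(
        sequence[i:i + 3]
        for i in range(len(sequence) - 2)
        if set(sequence[i:i + 3]) <= set("ACGT")  # skip windows containing N etc.
    )
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


def differential_frequencies(sample_sequence, reference_sequence):
    """Sample minus reference frequency for every trinucleotide seen in either."""
    sample = trinucleotide_frequencies(sample_sequence)
    reference = trinucleotide_frequencies(reference_sequence)
    return {
        tri: sample.get(tri, 0.0) - reference.get(tri, 0.0)
        for tri in sorted(set(sample) | set(reference))
    }
```

Positive values mark trinucleotides enriched in the sample relative to the reference, negative values mark depleted ones.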
  • cancer-specific copy number alteration examples include, but are not limited to, increased copy number (i.e., gain of gene copy number in cancer cells relative to non-cancer cells) of any one of the following genes: AARD, APCDD1L, ATP11B, ATP5F1E, CSMD3, CTSZ, DCUN1D1, DPP6, EIF3H, EXT1, GNAS, HDAC9, LAMP3, MAL2, MCCC1, PRELID3B, RAB22A, RAD21, SAMD12, SLC30A8, SOX2, TG, TOX2,
  • cancer-specific copy number alteration examples include, but are not limited to, decreased copy number (i.e., loss of gene copy number in cancer cells relative to non-cancer cells) of any one of the following genes: CDKN2A, CDKN2B, CSMD1, DNAAF5, EML4/ALK, FHIT, MTAP, RBFOX1, and SCAPER.
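A common way to picture a gain or loss of copy number from mapped reads is a log2 ratio of depth-normalized read counts per gene between a test sample and a reference sample. The sketch below assumes that approach; it is not stated in the source, the function name is hypothetical, and the read counts are made up.

```python
from math import log2


def copy_number_log2_ratios(sample_counts, reference_counts):
    """log2 of (sample share of reads) / (reference share of reads) per gene."""
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    ratios = {}
    for gene, ref_count in reference_counts.items():
        sample_count = sample_counts.get(gene, 0)
        if sample_count == 0 or ref_count == 0:
            continue  # ratio undefined; a real pipeline would handle this explicitly
        ratios[gene] = log2(
            (sample_count / sample_total) / (ref_count / reference_total)
        )
    return ratios


# Illustrative counts only: positive values suggest gain, negative values loss.
toy_ratios = copy_number_log2_ratios(
    {"SOX2": 200, "CDKN2A": 50, "GENE_X": 150},
    {"SOX2": 100, "CDKN2A": 100, "GENE_X": 200},
)
```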
  • cancer-specific gene rearrangement examples include, but are not limited to, ETV6/NTRK3 fusion, MYB/NFIB fusion, TMPRSS2/ERG fusion, and TRPS1 fusion.
  • cancer-specific short tandem repeat polymorphism examples include, but are not limited to, IGF-I and AR.
  • the presence of any one of the following genomic alterations is known to be associated with breast cancer: ETV6/NTRK3 fusion, MYB/NFIB fusion, TRPS1 fusion; increased copy number of AARD, CSMD3, EIF3H, EXT1, MAL2, RAD21, SAMD12, SLC30A8, and/or UTP23; and decreased copy number of CSMD1, DNAAF5, and/or SCAPER.
  • the presence of any one of the following genomic alterations is known to be associated with colorectal cancer: increased copy number of APCDD1L, ATP5F1E, CTSZ, GNAS, PRELID3B, RAB22A, TOX2, and/or VAPB; and decreased copy number of FHIT and/or RBFOX1.
  • the presence of any one of the following genomic alterations is known to be associated with kidney cancer: ETV6/NTRK3 fusion; and decreased copy number of SCAPER.
  • the presence of any one of the following genomic alterations is known to be associated with lung and bronchus cancer: increased copy number of ATP11B, DCUN1D1, LAMP3, MCCC1, and/or SOX2; and decreased copy number of CDKN2B, EML4/ALK, and/or SCAPER.
  • the presence of any one of the following genomic alterations is known to be associated with melanoma: increased copy number of DPP6, HDAC9, and/or TG; and decreased copy number of CDKN2A, CDKN2B, MTAP, and/or SCAPER.
  • the presence of the following genomic alteration is known to be associated with prostate cancer: TMPRSS2/ERG fusion.
  • the presence of the following genomic alteration is known to be associated with prostate cancer: CAG repeats length in AR.
  • Assessing the size - or size distribution - of telomeres in said mapped sequence reads may provide information on the presence of cancer cells.
  • small-size telomeres are indicative of cancer.
  • Identifying - or assessing the presence of - retrotransposon sequences in said mapped sequence reads may comprise analyzing the number, position and impact of retrotransposon sequences.
  • retrotransposon sequences include, but are not limited to, short interspersed nuclear elements (such as, e.g., Alu sequences and mammalian-wide interspersed repeats), long interspersed nuclear elements (such as, e.g, LINE1 and LINE2), and long terminal repeats (such as, e.g, HERV, MER4 and retroposons).
  • epigenetic biomarker refers to a modification in a nucleic acid, such as in a DNA molecule, by a process or processes that do not change the nucleic acid sequence itself.
  • epigenetic biomarkers of cancer include, but are not limited to, DNA hypermethylation or hypomethylation (when taken in comparison to a substantially healthy, i.e., non-cancerous, sample), nucleosome footprint and nucleic acid fragment size.
  • identifying - or assessing the presence of - epigenetic biomarkers of cancer in said mapped sequence reads may comprise analyzing the position, CpG count and methylation status of said mapped sequence reads.
  • assessing the presence (or absence) of DNA hypermethylation or hypomethylation in a sample may provide information on cancer-specific methylation status.
  • the methylation status may be defined relative to a locus or a gene.
  • cancer-specific DNA hypermethylation (i.e., increased presence of methylated nucleotides in cancer cells relative to non-cancer cells)
  • examples of cancer-specific DNA hypermethylation include, but are not limited to, hypermethylation of any one of the following loci: 1:147545131, 1:159010051, 1:184867071, 1:234772479, 1:234772634, 1:9626465, 2:111494677,
  • cancer-specific DNA hypomethylation (i.e., decreased presence of methylated nucleotides in cancer cells relative to non-cancer cells)
  • DNA hyper- or hypomethylation is known to be associated with bladder cancer: DNA hypermethylation at locus 1:147545131, 2:2318016, 4:113355678, 4:1494607, 5:179354562, 6:30163104,
  • DNA hypermethylation at locus 1:234772479, 1:234772634, 2:111494677, 4:6322902, 5:112329851, 5:40841488, 6:30769291, 6:70312472, 7:17234713, 7:82805693, 13:101706760, 15:67150555, and/or
  • DNA hypermethylation at locus 1:9626465, 4:19455540, 4:634860, 6:10528259, 3:1163104, 3:13224, 7:1177297, 7:158428678, 8:88957006, 10:130045534, 12:122898852, 13:24511163, 13:24511531, and/or 18:74499068; and DNA hypomethylation at locus 1:227561011, 1:227561018, 1:54781467, 2:208124524, 2:239309155, 5:141419191, 6:104940793, 6:104953110, 6:104953118, 6:27582968, 7:149692578, 10:45427926, 14:20435452, 17:2238547
  • kidney cancer: 1:159010051, 1:234772634, 2:111699234, 2:237687894, 7:17234713, 10:11685287, 10:133259456, 12:76183708, 14:75124193, 16:80027393, 16:80027393, and/or 16:80027460; and DNA hypomethylation at locus 1:10673454, 4:79964827, 6:31728646, 10:11275824, 19:1907973, 21:45425245, and/or X:118499399.
  • DNA hyper- or hypomethylation is known to be associated with lung and bronchus cancer: DNA hypermethylation at locus 1:184867071, 5:3764427, 3:1163224, 3:1163224, 7:17234713, 7:75776649,
  • DNA hyper- or hypomethylation is known to be associated with prostate cancer: DNA hypermethylation at locus 2:239052858, 10:26642983, 17:79979217, and/or 17:79979289; and DNA hypomethylation at locus 2:208124524, 2:231396296, 6:104953110, 6:104953118, 7:16465977, 12:54259580, 19:38211354, and/or 19:46297345.
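The comparison of a sample's methylation status at a locus with a substantially healthy sample can be sketched as follows. The per-locus methylated and unmethylated read counts, the `min_delta` threshold and the function names are all assumptions introduced for illustration; the source does not specify how hyper- or hypomethylation is called.

```python
def methylation_fraction(methylated_reads, unmethylated_reads):
    """Fraction of reads covering a locus that carry the methylated state."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("locus has no coverage")
    return methylated_reads / total


def call_methylation_status(sample_fraction, healthy_fraction, min_delta):
    """Compare a sample with a healthy baseline using a hypothetical threshold."""
    delta = sample_fraction - healthy_fraction
    if delta >= min_delta:
        return "hypermethylated"
    if delta <= -min_delta:
        return "hypomethylated"
    return "unchanged"
```

For example, a locus given in the chromosome:position form used above could be stored as a dictionary key mapped to its call, so that the resulting pattern can be compared against the cancer-specific loci listed.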
  • nucleosome footprint refers to the mapping of nucleosome occupancy, which correlates with nuclear architecture, gene structure and gene expression observed in a given type of cell. Hence, nucleosome footprinting makes it possible to identify the cell type of origin based on the fragmentation pattern of cfNA, expression of genes, presence of mitochondrial DNA, and the like.
  • Assessing the size - or size distribution - of said mapped sequence reads may provide information on the type of cell death responsible for the release of the cfNAs.
  • small-size mapped sequence reads are indicative of apoptosis.
  • large-size mapped sequence reads are indicative of necrosis.
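A size distribution of mapped fragments can be summarized in a few lines. In this sketch the cutoff separating "small" from "large" fragments is a caller-supplied, hypothetical parameter, since the source gives no numeric boundary, and the function name and example sizes are likewise illustrative.

```python
from statistics import median


def summarize_fragment_sizes(sizes_bp, cutoff_bp):
    """Return count, median size and fraction of fragments at or below cutoff_bp."""
    if not sizes_bp:
        raise ValueError("no fragment sizes given")
    short = sum(1 for size in sizes_bp if size <= cutoff_bp)
    return {
        "n": len(sizes_bp),
        "median_bp": median(sizes_bp),
        "fraction_short": short / len(sizes_bp),
    }


# Made-up sizes: mostly short fragments plus one long fragment.
toy_summary = summarize_fragment_sizes([150, 160, 170, 180, 2000], cutoff_bp=200)
```

A high `fraction_short` would point toward apoptosis-derived cfNA under the interpretation given above, and a low one toward necrosis.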
  • transcriptomic biomarker refers to a nucleic acid fragment, such as RNA fragment, with an identified physical location.
  • a transcriptomic biomarker can be a count of nucleic acid fragments aligned at a position to represent gene expression level, the determination of alternative transcript expression, or the identification of small RNAs of interest, such as miRNAs implicated in the regulation of gene expression.
  • a “metabolic biomarker” refers to the mitochondria quantity. Mitochondria quantity can be readily evaluated by quantifying sequencing reads aligned on the mitochondrial chromosome (chrM).
  • the mitochondrial chromosome is a closed circular molecule that contains 16,569 bp.
  • Each mitochondrial chromosome in a mitochondrion normally contains a full set of all the mitochondrial genes.
  • a human mitochondrion contains approximately 5 such mitochondrial chromosomes, with a quantity usually ranging from 1 to 15.
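Quantifying reads aligned on the mitochondrial chromosome can be sketched as a simple fraction of mapped reads. The input is assumed to be one reference-sequence name per read (as found in the RNAME field of a SAM record, where "*" denotes an unmapped read); the function name and the default label "chrM" are assumptions, since reference naming varies between assemblies.

```python
def mitochondrial_read_fraction(reference_names, mito_name="chrM"):
    """Fraction of mapped reads whose reference name equals mito_name."""
    mapped = [name for name in reference_names if name != "*"]  # drop unmapped reads
    if not mapped:
        raise ValueError("no mapped reads")
    mito = sum(1 for name in mapped if name == mito_name)
    return mito / len(mapped)
```

For instance, `mitochondrial_read_fraction(["chr1", "chrM", "chr2", "chrM", "*"])` considers four mapped reads, two of which lie on chrM.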
  • a “metagenomic biomarker” refers to a microbial sequence, such as, e.g, a nucleic acid sequence matching with an archaeal, bacterial, fungal, protist, protozoal, or viral reference genome or a portion thereof.
  • a “metagenomic biomarker” refers to a nucleic acid sequence matching with a bacterial and/or viral reference genome or a portion thereof.
  • assessing the presence (or absence) of pathogenic biomarkers of cancer in a sample may provide information on viral sequences originating, e.g, from cancer-inducing viruses; or on bacterial sequences originating, e.g, from cancer-associated bacteria.
  • the step of computer-processing the mapped sequence reads, and identifying - or assessing the presence of - pathogenic biomarkers of cancer in said mapped sequence reads is preferably performed on exogenous-mapped sequence reads identified in previous steps of the methods.
  • cancer-inducing viruses include, but are not limited to, cytomegalovirus (CMV), Epstein-Barr virus (EBV), hepatitis B virus (HBV), hepatitis C virus (HCV), Kaposi’s sarcoma-associated herpesvirus (KSHV, formally known as HHV-8), human immunodeficiency virus (HIV), human papillomavirus (HPV), and human T-lymphotropic virus (also known as human T-cell lymphotropic virus or human T-cell leukemia-lymphoma virus, HTLV).
  • EBV is known to be associated with Hodgkin’s and non-Hodgkin’s lymphoma, nasopharyngeal cancer, and Burkitt lymphoma.
  • HBV is known to be associated with hepatocellular carcinoma.
  • HCV is known to be associated with hepatocellular carcinoma.
  • HHV-8 is known to be associated with Kaposi sarcoma.
  • HIV is known to be associated with several cancers.
  • HPV is known to be associated with endometrial cancer.
  • HTLV is known to be associated with lymphoma and leukemia.
  • CMV is known to be associated with colorectal cancer.
  • cancer-associated bacteria examples include, but are not limited to, Bacteroides fragilis, Borrelia burgdorferi, Campylobacter jejuni, Chlamydia pneumonia,
  • Opisthorchis viverrini, Salmonella enterica serovar Typhimurium, Salmonella enterica serovar Paratyphi, Salmonella Typhi, Schistosoma haematobium, Streptococcus bovis, and Treponema pallidum.
  • the presence of Helicobacter hepaticus, Salmonella enterica serovar Typhimurium, Salmonella enterica serovar Paratyphi and/or Opisthorchis viverrini in a sample is known to be associated with bile duct cancer.
  • the presence of Neisseria gonorrhoeae, Cutibacterium acnes and/or Treponema pallidum is known to be associated with prostate cancer.
  • the presence of Neisseria gonorrhoeae, Cutibacterium acnes and/or Treponema pallidum, Helicobacter bilis, Salmonella Typhi and/or Schistosoma haematobium is known to be associated with bladder cancer.
  • the presence of Bacteroides fragilis, Clostridium spp., Mycoplasma fermentans, Mycoplasma hyorhinis, Mycoplasma penetrans and/or Streptococcus bovis is known to be associated with colorectal cancer.
  • the presence of Chlamydia trachomatis is known to be associated with endometrial cancer.
  • Chlamydophila psittaci is known to be associated with eye cancer.
  • the presence of Borrelia burgdorferi, Helicobacter bizzozeronii, Helicobacter felis, Helicobacter heilmannii, Helicobacter pylori, Helicobacter salomonis, Helicobacter suis, Mycoplasma fermentans, Mycoplasma hyorhinis and/or Mycoplasma penetrans is known to be associated with gastric cancer.
  • the presence of Chlamydia pneumoniae, Chlamydia pneumonia, Mycoplasma fermentans, Mycoplasma hyorhinis and/or Mycoplasma penetrans is known to be associated with lung cancer.
  • the presence of Mycoplasma fermentans, Mycoplasma hyorhinis and/or Mycoplasma penetrans is known to be associated with ovarian cancer.
  • the presence of Campylobacter jejuni is known to be associated with small intestine cancer.
  • biomarkers of cancer are identified in the methods of the invention based on results obtained after sequencing and comparatively analyzing multiple samples labeled as cancer samples and substantially healthy samples (i.e., without any evidence of cancers).
  • biomarkers of cancer are identified in the methods of the invention based on known information available in databases.
  • databases include, but are not limited to, the International Nucleotide Sequence Database (at www.insdc.org), GenBank (at www.ncbi.nlm.nih.gov), the European Nucleotide Archive (at www.ebi.ac.uk/ena/browser/home), and the DNA Data Bank of Japan (at www.ddbj.nig.ac.jp).
  • Suitable examples include, without limitation, 23andMe, 1000 Genomes Project, ArrayExpress, Bioinformatic Harvester, ClinVar, COSMIC, dbSNP, ENCODE, Ensembl, Ensembl Genomes, Gene Disease Database, Gene Expression Omnibus (GEO), GTEx, HapMap, Human Microbiome Project (HMP), Human Protein Atlas (HPA), Online Mendelian Inheritance in Man (OMIM), Personal Genome Project, RefSeq, SNPedia, and TCGA.
  • biomarkers of cancer are identified in the methods of the invention using a learning algorithm.
  • The terms “learning algorithm” or “machine learning algorithm” refer to computer-executed algorithms that automate analytical model building, e.g., for clustering, classification or profile recognition. Learning algorithms perform analyses on training datasets provided to the algorithm. Learning algorithms output a “model”, also referred to as a “classifier”, “classification algorithm” or “diagnostic algorithm”. Models receive test data as input and produce, as output, an inference or a classification of the input data as belonging to one or another class, cluster group or position on a scale, such as diagnosis, stage, prognosis, disease progression, responsiveness to a drug, etc.
  • a variety of learning algorithms can be used to infer a condition or state of a subject. Machine learning algorithms may be supervised or unsupervised.
  • Examples of learning algorithms include, but are not limited to, artificial neural networks (e.g., back propagation networks), discriminant analyses (e.g., Bayesian classifier, Fisher analysis), support vector machines, decision trees (e.g., recursive partitioning processes, such as classification and regression trees [CART]), random forests, linear classifiers (e.g., multiple linear regression [MLR], partial least squares [PLS] regression, principal components regression [PCR]), hierarchical clustering and cluster analysis.
  • the learning algorithm generates a model or classifier that can be used to make an inference, e.g., an inference about a disease state of a subject.
  • the learning algorithm was previously trained with at least one training dataset.
  • the training dataset comprises information relating to genetic, epigenetic, metagenomic and/or pathogenic biomarkers of cancer from samples obtained from at least one reference subject.
  • the reference subject is an animal, preferably a mammal.
  • mammals include, but are not limited to, humans, non-human primates (such as, e.g., chimpanzees, and other apes and monkey species), farm animals (such as, e.g., cattle, horses, sheep, goats, and swine), domestic animals (such as, e.g., rabbits, dogs, and cats), laboratory animals (such as, e.g., rats, mice and guinea pigs), and the like.
  • the reference subject is a primate, including human and non-human primates. In one embodiment, the reference subject is a human.
  • the reference subject is a substantially healthy subject.
  • a “substantially healthy subject” has not been previously or will not be diagnosed or identified as having or suffering from cancer.
  • the training dataset comprises information relating to genetic, epigenetic, metagenomic and/or pathogenic biomarkers of cancer from samples obtained from a healthy reference population.
  • the term “healthy reference population” refers to a group of substantially healthy subjects, either of similar or different origin, ethnical background, gender, age, etc., such as a group of at least 10, preferably at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500 or more substantially healthy subjects.
  • the reference subject is a cancer subject.
  • a “cancer subject” has been previously or will be diagnosed or identified as having or suffering from cancer.
  • the training dataset comprises information relating to genetic, epigenetic, metagenomic and/or pathogenic biomarkers of cancer from samples obtained from a cancer reference population.
  • The term “cancer reference population” refers to a group of cancer subjects, either of similar or different origin, ethnical background, gender, age, etc., such as a group of at least 10, preferably at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500 or more cancer subjects.
  • the cancer reference population may comprise cancer subjects who have been previously or will be diagnosed or identified as having or suffering from one type of cancer.
  • the cancer reference population may comprise cancer subjects who have been previously or will be diagnosed or identified as having or suffering from any type of cancer.
  • the methods comprise a step of assigning a score to each biomarker of cancer identified in previous steps of the methods.
  • The term “score” refers to a value computed to summarize multiple results into a single one.
  • Examples of scoring methods include, but are not limited to, the mean, the median, the sum, or the average of the probabilities of the positive class (pi) for all features (n) associated with a sample, multiplied by the number of positively detected features, as illustrated, e.g., by the following formula:
  • the scoring method is applied to each group of biomarkers of cancer independently. In one embodiment, the scoring method is applied to several groups of biomarkers of cancer based on their functional impact. In one embodiment, the scoring method is applied to each detected biomarker of cancer.
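  • As a minimal sketch of the last scoring method described above (the average of the positive-class probabilities over all features, multiplied by the number of positively detected features), the computation could look as follows. The function name `combined_score` and the 0.5 cut-off used to decide that a feature is "positively detected" are assumptions made for illustration; they are not taken from the description.

```python
from statistics import mean


def combined_score(probabilities, threshold=0.5):
    """Average positive-class probability times the number of positive features.

    probabilities: positive-class probabilities p_i, one per feature.
    threshold: hypothetical cut-off above which a feature counts as detected.
    """
    if not probabilities:
        return 0.0
    # Number of features considered positively detected (assumed rule).
    n_positive = sum(1 for p in probabilities if p >= threshold)
    return mean(probabilities) * n_positive


if __name__ == "__main__":
    # Four features, three of which exceed the assumed 0.5 cut-off.
    print(combined_score([0.9, 0.8, 0.1, 0.6]))
```

  • Applied per group of biomarkers, such a function would yield one score per group, which can then feed the classification step.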
  • the methods comprise a step of classifying the subject’s sample based on the scores assigned to each identified biomarker of cancer in previous steps of the methods.
  • the methods comprise a step of concluding, based on classification of the subject’s sample: on the probability of the subject to be affected with cancer; or on the diagnosis of cancer in the subject; or on the determination of the origin of a tumor in the subject; or on the determination of a personalized course of treatment for the subject.
  • the methods may comprise a further step of treating the subject.
  • The terms “treating” or “treatment” or “alleviation” refer to therapeutic treatment, excluding prophylactic or preventative measures, wherein the object is to slow down (lessen) a given disease, such as, e.g., cancer.
  • Those in need of treatment include those already with the disease (such as, e.g., cancer) as well as those suspected to have the disease (such as, e.g., cancer).
  • a subject is successfully “treated” for a given disease (such as, e.g., cancer) if, after receiving a therapeutic amount of a therapeutic agent, said subject shows observable and/or measurable reduction in or absence of one or more of the following: one or more of the symptoms associated with the disease (such as, e.g., cancer); reduced morbidity and mortality; and/or improvement in quality of life issues.
  • the above parameters for assessing successful treatment and improvement in a given disease are readily measurable by routine procedures familiar to a physician.
  • treating the subject for cancer is carried out by any of - or a combination of two or more of - surgery, radiation therapy, chemotherapy, activation immunotherapy, targeted therapy, hormone therapy, and stem cell transplant.
  • The term “radiation therapy”, also termed “radiotherapy” and often abbreviated as “RT”, “RTx” or “XRT”, refers to a therapy using ionizing radiation to control or kill malignant cells.
  • radiation therapies include, but are not limited to, external beam radiotherapy (such as, e.g., superficial X-ray therapy, orthovoltage X-ray therapy, megavoltage X-ray therapy, radiosurgery, stereotactic radiation therapy, cobalt therapy, electron therapy, fast neutron therapy, neutron-capture therapy, proton therapy, and the like); brachytherapy; unsealed source radiotherapy; tomotherapy; and the like.
  • The term “chemotherapy” refers to a therapy using a chemotherapeutic agent, i.e., any molecule that is effective in inhibiting tumor growth.
  • chemotherapeutic agents include those described under subgroup L01 of the Anatomical Therapeutic Chemical Classification System.
  • chemotherapeutic agents include, but are not limited to: alkylating agents, such as, e.g.:
  • nitrogen mustards including chlormethine, cyclophosphamide, ifosfamide, trofosfamide, chlorambucil, melphalan, prednimustine, bendamustine, uramustine, chlornaphazine, cholophosphamide, estramustine, mechlorethamine, mechlorethamine oxide hydrochloride, novembichin, phenesterine, uracil mustard and the like;
  • nitrosoureas including carmustine, lomustine, semustine, fotemustine, nimustine, ranimustine, streptozocin, chlorozotocin, and the like;
  • alkyl sulfonates including busulfan, mannosulfan, treosulfan, and the like;
  • aziridines including carboquone, thiotepa, triaziquone, triethylenemelamine, benzodopa, meturedopa, uredopa, and the like; hydrazines, including procarbazine, and the like;
  • triazenes including dacarbazine, temozolomide, and the like; ethylenimines and methylamelamines, including altretamine, triethylenemelamine, triethylenephosphoramide, triethylenethiophosphoramide, trimethylolomelamine and the like;
  • mitobronitol, pipobroman, actinomycin, bleomycin, mitomycins (including mitomycin C, and the like), plicamycin, and the like; acetogenins, such as, e.g., bullatacin, bullatacinone, and the like; benzodiazepines, such as, e.g., 2-oxoquazepam, 3-hydroxyphenazepam, bromazepam, camazepam, carburazepam, chlordiazepoxide, cinazepam, cinolazepam, clonazepam, cloniprazepam, clorazepate, cyprazepam, delorazepam, demoxepam, desmethylflunitrazepam, devazepide, diazepam, diclazepam, difludiazepam, doxefazepam,
  • antifolates including aminopterin, methotrexate, pemetrexed, pralatrexate, pteropterin, raltitrexed, denopterin, trimetrexate, and the like;
  • purine analogues including pentostatin, cladribine, clofarabine, fludarabine, nelarabine, tioguanine, mercaptopurine, and the like;
  • pyrimidine analogues including fluorouracil, capecitabine, doxifluridine, tegafur, tegafur/gimeracil/oteracil, carmofur, floxuridine, cytarabine, gemcitabine, azacytidine, decitabine, and the like; and
  • hydroxycarbamide
  • anti-adrenals such as, e.g., aminoglutethimide, mitotane, trilostane, and the like
  • folic acid replenishers such as, e.g., folinic acid, and the like
  • maytansinoids such as, e.g., maytansine, ansamitocins, and the like
  • platinum analogs such as, e.g., platinum, carboplatin, cisplatin, dicycloplatin, nedaplatin, oxaliplatin, satraplatin, and the like
  • trichothecenes such as, e.g., T-2 toxin, verracurin A, roridin A, anguidine and the like
  • taxoids such as, e.g., cabazitaxel, docetaxel, larotaxel, ortataxel, paclitaxel, tesetaxel, and the like
  • eleutherobin; pancratistatin; sarcodictyin; spongistatin; aclacinomysins; authramycin; azaserine; bleomycin; cactinomycin; carabicin; carminomycin; carzinophilin; chromomycins; dactinomycin; daunorubicin; detorubicin; 6-diazo-5-oxo-L-norleucine; doxorubicin (including morpholino-doxorubicin, cyanomorpholino-doxorubicin, 2-pyrrolino-doxorubicin, deoxydoxorubicin, and the like); epirubicin; esorubicin; idarubicin; marcellomycin; mycophenolic acid; nogalamycin; olivomycins; peplomycin; potfiromycin; puromycin; quelamycin; rodorubicin
  • The term “activation immunotherapy” refers to the artificial stimulation of the immune system to treat cancer, using activation immunotherapeutic agents (or immunostimulatory agents), such as, e.g., monoclonal antibodies, oncolytic viruses, CAR T-cells, dendritic cells, cancer vaccines, cytokines (including interferons and interleukins), and the like.
  • immune checkpoint inhibitors such as, e.g., inhibitors of CTLA4, PD-1, PD-L1, LAG-3, B7-H3, B7-H4, TIM3, A2AR, and/or IDO, including nivolumab, pembrolizumab, pidilizumab, AMP-224, MPDL32
  • CD1 ligands; growth hormone; immunocyanin; pegademase; prolactin; tasonermin; female sex steroids; histamine dihydrochloride; poly ICLC; vitamin D; lentinan; plerixafor; roquinimex; mifamurtide; glatiramer acetate; thymopentin; thymosin α1; thymulin; polyinosinic:polycytidylic acid; pidotimod; Bacillus Calmette-Guérin; melanoma vaccine; sipuleucel-T; and the like
  • The term “targeted therapy” refers to a therapy using a targeted therapy agent, i.e., any molecule which aims at one or more particular target molecules (such as, e.g., proteins) involved in tumor genesis, tumor progression, tumor metastasis, tumor cell proliferation, cell repair, and the like.
  • targeted therapy agents include, but are not limited to, tyrosine-kinase inhibitors, serine/threonine kinase inhibitors, monoclonal antibodies and the like.
  • targeted therapy agents include, but are not limited to, HER1/EGFR inhibitors (such as, e.g., brigatinib, erlotinib, gefitinib, olmutinib, osimertinib, rociletinib, vandetanib, and the like); HER2/neu inhibitors (such as, e.g., afatinib, lapatinib, neratinib, and the like); C-kit and PDGFR inhibitors (such as, e.g., axitinib, masitinib, pazopanib, sunitinib, sorafenib, toceranib, and the like); FLT3 inhibitors (such as, e.g.,
  • anti-CD33 monoclonal antibodies such as, e.g., gemtuzumab, and the like
  • anti-CD52 monoclonal antibodies such as, e.g., alemtuzumab, and the like.
  • The term “hormone therapy” refers to the artificial manipulation of the endocrine system through exogenous or external administration of specific hormones, in particular steroid hormones, or drugs which inhibit the production or activity of such hormones (i.e., inhibitors of hormone synthesis and hormone receptor antagonists).
  • hormones include, but are not limited to, androgens (such as, e.g., androstenediol dipropionate, boldenone undecylenate, clostebol, clostebol acetate, clostebol caproate, clostebol propionate, cloxotestosterone acetate, prasterone, prasterone enanthate, prasterone sulfate, quinbolone, testosterone, testosterone cypionate, testosterone enanthate, testosterone propionate, testosterone undecanoate, testosterone ester mixtures, deposterona, omnadren, sustanon, testoviron depot, androstanolone, androstanolone esters, bolazine capronate, drostanolone propionate, epitiostanol, mepitiostane, mesterolone, metenolone acetate, metenolone enanthat
  • inhibitors of hormone synthesis and hormone receptor antagonists include, but are not limited to, anti-estrogens (such as, e.g., tamoxifen, raloxifene, aromatase inhibiting 4(5)-imidazoles, 4-hydroxytamoxifen, trioxifene, keoxifene, LY117018, onapristone, toremifene, and the like), and anti-androgens (such as, e.g., flutamide, nilutamide, bicalutamide, leuprolide, goserelin, and the like).
  • The term “stem cell transplant” refers to a transplantation of stem cells (either autologous or allogeneic) aiming at replacing or reinforcing pre-existing bone marrow cells that may have been partially or totally destroyed by cancer or by therapy.
  • the present invention also relates to a computer system for estimating the probability of a subject to be affected with cancer.
  • It also relates to a computer system for diagnosing cancer in a subject in need thereof.
  • It also relates to a computer system for determining the origin of a tumor in a subject in need thereof.
  • It also relates to a computer system for determining a personalized course of treatment in a subject affected with cancer.
  • The term “computer system” refers to any and all devices capable of storing and processing information and/or capable of using the stored information to control the behavior or execution of the device itself, regardless of whether such devices are electronic, mechanical, logical, or virtual in nature.
  • The term “computer system” can refer to a single computer, but also to a plurality of computers working together to perform the function described as being performed on or by a computer system.
  • the computer system according to the present invention comprises:
  • The term “processor” is meant to include any integrated circuit or other electronic device capable of performing an operation on at least one instruction word, such as, e.g., executing instructions, codes, computer programs, and scripts which it accesses from a storage medium.
  • processors include, but are not limited to, central processing units (CPUs), microprocessors, digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and other equivalent integrated or discrete logic circuitry.
  • the code stored on the storage medium when executed by the processor, causes the computer system to: a. optionally, receive at least one raw sequencing signal from a sequencing experiment of nucleic acids, preferably cfNAs previously extracted from a sample from the subject, as described hereinabove; b. optionally, base-call and demultiplex said at least one raw sequencing signal, thereby obtaining at least one sequence read or a plurality of sequence reads; c. assign said at least one sequence read or the plurality of sequence reads to at least one reference genome or a portion thereof, thereby obtaining at least one mapped sequence read or a plurality of mapped sequence reads, as described hereinabove; d.
  • the learning algorithm was previously trained with at least one training dataset, as described hereinabove.
  • the training dataset comprises information relating to genetic, epigenetic, metagenomic and/or pathogenic biomarkers of cancer from samples obtained from at least one reference subject, as described hereinabove.
  • Figure 1 is a flowchart illustrating the bioinformatic steps of the methods carried out after the sequencing step.
  • Figures 2A-B represent the “Silico mix1” description.
  • Figure 2A shows the ratio of THP1 DNA mixed into HeLa DNA for all samples, including controls (samples without THP1 DNA mixed into HeLa DNA);
  • Figure 2B shows the distribution of simulated depth for controls (annotated as 0) and THP1-positive samples (annotated as 1).
  • Figures 3A-B represent the “Silico mix2” description.
  • Figure 3A shows the ratio of THP1 or HeLa DNA mixed into normal plasma DNA from a healthy donor for all samples, including controls (samples without HeLa or THP1 DNA);
  • Figure 3B shows the simulated depth for controls (annotated as 0) and THP1- or HeLa-positive samples (annotated as 1).
  • Figures 4A-D represent the methylation biomarkers analysis.
  • Figure 4A shows the distribution of the methylation score regarding ratio of abnormal DNA in the sample from “Silico mix1”.
  • Figure 4B shows the ROC analysis of methylation scores that illustrates the diagnostic ability of the score to discriminate samples with and without abnormal DNA in “Silico mix1”.
  • Figure 4C shows the distribution of the methylation score regarding ratio of abnormal DNA in the sample from “Silico mix2”.
  • Figure 4D shows the ROC analysis of methylation scores that illustrates the diagnostic ability of the score to discriminate samples with and without abnormal DNA in “Silico mix2”.
  • Figures 5A-D represent the variants biomarkers analysis.
  • Figure 5A shows the distribution of the variant score regarding ratio of abnormal DNA in the sample from “Silico mix1”.
  • Figure 5B shows the ROC analysis of variant scores that illustrates the diagnostic ability of the score to discriminate samples with and without abnormal DNA in “Silico mix1”.
  • Figure 5C shows the distribution of the variant score regarding ratio of abnormal DNA in the sample from “Silico mix2”.
  • Figure 5D shows the ROC analysis of variant scores that illustrates the diagnostic ability of the score to discriminate samples with and without abnormal DNA in “Silico mix2”.
  • Figures 6A-D represent the nucleosome footprint biomarkers analysis.
  • Figure 6A shows the distribution of the nucleosome score regarding ratio of abnormal DNA in the sample from “Silico mix1”.
  • Figure 6B shows the ROC analysis of nucleosome scores that illustrates the diagnostic ability of the score to discriminate samples with and without abnormal DNA in “Silico mix1”.
  • Figure 6C shows the distribution of the nucleosome score regarding ratio of abnormal DNA in the sample from “Silico mix2”.
  • Figure 6D shows the ROC analysis of nucleosome scores that illustrates the diagnostic ability of the score to discriminate samples with and without abnormal DNA in “Silico mix2”.
  • Figure 7 represents the distribution of the transposons score regarding ratio of abnormal DNA in the sample for “Silico mix2”.
  • Figure 8 represents the distribution of the telomere length score regarding ratio of abnormal DNA in the sample for “Silico mix2”.
  • Figure 9 represents the biomarker performances to discriminate samples. Performance for all samples (left panel) and for samples with an “abnormal” DNA ratio below 5 % (right panel) is shown.
  • Figures 10A-B represent the detection performance of THP1 “abnormal” DNA in “Silico mix1” after classification of samples based on different combinations of 1 to 3 different biomarkers (M: methylation; V: variant; N: nucleosome footprint).
  • Figure 11 represents the performance of “abnormal” DNA detection for THP1 and HeLa samples in the two silico mixes for different biomarkers’ combinations (M: methylation; V: variant; N: nucleosome footprint).
  • Figures 12A-C represent the detection performances of THP1 or HeLa “abnormal” DNA in “Silico mix2” after classification of samples based on different combinations of 1 to 6 different biomarkers (M: methylation; V: variant; N: nucleosome footprint; T: transposon; Mi: mitochondria; Tel: telomeres length).
  • Figure 12A shows the balanced accuracy.
  • Figure 12B shows the precision rate.
  • Figure 12C shows the recall rate.
  • Figures 13A-B represent the quantification of “abnormal” DNA in “Silico mix1”.
  • Figure 13A shows the correlation factor (Pearson) for different biomarkers’ combinations (M: methylation; V: variant; N: nucleosome footprint).
  • Figure 13B is the correlation plot for the best biomarkers’ combination.
  • Figures 14A-B represent the quantification of “abnormal” DNA in “Silico mix1”.
  • Figure 14A shows the correlation factor (Pearson) for different biomarkers’ combinations for both HeLa DNA (left panel) and THP1 DNA (right panel).
  • Figure 14B shows the correlation plot for the best biomarkers’ combinations for both HeLa DNA and THP1 DNA.
  • Figure 15 represents the ROC curve of sample classification accuracy, based on the type of abnormal DNA.
  • Figure 16 represents the throughput obtained from each sequencing test of cfDNA. Test1 corresponds to cfDNA obtained from in vitro culture of cell lines. Test2 was performed using the same protocol as for the in vitro test (default Nanopore protocol). Test3 was performed by adapting the beads-to-DNA ratio to optimize the capture of small reads.
  • Figure 17 represents the percentage of reads with a length below 1,000 bp from each sequencing test of cfDNA.
  • Test1 corresponds to cfDNA obtained from in vitro culture of cell lines.
  • Test2 was performed using the same protocol as for the in vitro test (default Nanopore protocol).
  • Test3 was performed by adapting the beads-to-DNA ratio to optimize the capture of small reads.
  • Figures 18A-C represent the description of sequencing data obtained from DNA extracted with two different commercial kits.
  • Figure 18A shows the sequencing throughput in read counts.
  • Figure 18B shows the percentage of small reads (size under 1,000 bp).
  • Figure 18C shows the quality of reads estimated by mean BASEQ of sequenced nucleotides.
  • Figures 19A-D represent the read size distribution for all samples sequenced in Test3, performed by adapting the beads-to-DNA ratio to optimize the capture of small reads.
  • Figure 19A: sample 1.
  • Figure 19B: sample 2.
  • Figure 19C: sample 3.
  • Figure 19D: sample 4.
  • Figure 20 represents the methylated fraction of CpG for all samples sequenced in Test3, performed by adapting the beads-to-DNA ratio to optimize the capture of small reads.
  • Figure 21 represents the correlation between methylation frequencies in all samples sequenced in Test3, performed by adapting the beads-to-DNA ratio to optimize the capture of small reads.
  • Methylation frequencies indicate the methylation status of the position: a value above 0.5 indicates that the site is more frequently methylated.
  • First row and column: sample 1; second row and column: sample 2; third row and column: sample 3; fourth row and column: sample 4.
  • Figures 22A-C represent the nucleosome analysis for all samples sequenced in Test3, performed by adapting the beads-to-DNA ratio to optimize the capture of small reads.
  • Figure 22A shows the number of regions that have a large coverage and could indicate nucleosomes.
  • Figure 22B shows the proportion of nucleosomes found in all other samples.
  • Figure 22C shows the proportion of nucleosome found in common between samples 2 and 3 that have the same count of total identified nucleosomes.
  • Figure 23 represents the expression of nucleosomes in samples sequenced in Test3, performed by adapting the beads-to-DNA ratio to optimize the capture of small reads. Expression is described by the mean read depth at the nucleosome position. First row and column: sample 1; second row and column: sample 2; third row and column: sample 3; fourth row and column: sample 4.
  • EXAMPLES
  • Example 1 In vitro cultures
  • the goal of this study was to validate in vitro the ability of the Nanopore® device to sequence cfDNA extracted from cell cultures.
  • the reads were then mixed in silico to create artificial samples with an increasing ratio of abnormal DNA (THP1) mixed into background DNA (HeLa).
  • Plasma samples stored at -80 °C were thawed at room temperature. Cell-free DNA was extracted from 200 µL of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen®).
  • shotgun sequencing libraries were prepared from 50 ng of DNA using the library kit SGK-LSK009 (Nanopore®). The protocol was adapted to select short reads. Several samples were multiplexed using the native barcoding kit EXP-NBD104 (Nanopore®) and sequenced on a MinION or GridION device (Nanopore®). Sequencing was performed for 48 hours, until all pores were inactivated. The raw signal was base-called and demultiplexed with high accuracy using the latest version of Guppy.
  • In silico samples generation
  • Reads generated by sequencing from two cell lines were mixed in silico to simulate samples containing DNA from different origins.
  • a first group of samples, hereafter named “Silico mix1”, was generated by mixing THP1 reads into a background of HeLa reads at different ratios.
  • a second group of samples, hereafter named “Silico mix2”, was generated by mixing either THP1 reads or HeLa reads into “normal” cfDNA reads obtained from the sequencing of cfDNA extracted from a healthy donor.
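  • The in silico mixing described above can be sketched as random sampling from two pools of reads. This is only an illustration under stated assumptions: the function `mix_reads`, its parameters, and the use of read identifiers as stand-ins for reads are hypothetical, and the description does not specify how reads were actually drawn.

```python
import random


def mix_reads(abnormal_reads, background_reads, ratio, total, seed=0):
    """Draw `total` reads, a fraction `ratio` of them from the abnormal pool.

    ratio: fraction of "abnormal" reads (e.g. 0.05 for 5 %).
    total: number of reads in the simulated sample (controls the depth).
    seed: fixed seed so that a simulated sample can be regenerated.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)
    n_abnormal = round(total * ratio)
    n_background = total - n_abnormal
    if n_abnormal > len(abnormal_reads) or n_background > len(background_reads):
        raise ValueError("not enough reads in one of the pools")
    # Sample without replacement from each pool, then shuffle the mixture.
    mixed = rng.sample(abnormal_reads, n_abnormal)
    mixed += rng.sample(background_reads, n_background)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    thp1 = [f"thp1_{i}" for i in range(100)]    # placeholder read identifiers
    hela = [f"hela_{i}" for i in range(1000)]   # placeholder read identifiers
    sample = mix_reads(thp1, hela, ratio=0.05, total=200)
    print(sum(r.startswith("thp1") for r in sample), "abnormal reads out of", len(sample))
```

  • Varying `ratio` and `total` independently mirrors the two axes of the simulation: the proportion of abnormal DNA and the sequencing depth.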
  • Reads were filtered based on their quality, estimated by the BASEQ score (Figure 1). Reads with a global quality below 10 were filtered out with NanoFilt. High-quality reads were then aligned to the human genome (version hg38 - RefSeq assembly accession: GCF_000001405.26) with minimap2. From these alignment results, several biomarkers were analyzed:
  • methylation of reads aligned to the human genome was evaluated with NanoPolish by integrating the raw signal from sequencing. Methylated cytosines were identified in CpG islands by NanoPolish, and methylation confidence was evaluated by a log ratio score;
  • the presence of nucleosomes was identified from the alignment files: over-covered regions, identified by analyzing the coverage at each genome position, corresponded to regions where a nucleosome was present (nucleosome footprint);
  • variants were analyzed by a genotyping approach. The composition of bases at each position was analyzed using samtools, and target variants were searched directly in these results. No variant caller algorithm was used for this step;
  • mitochondria quantity was evaluated by computing the alignment depth on the mitochondrial chromosome;
  • telomere length was evaluated by searching for the telomere pattern in the reads using the TelSeq tool;
  • transposons were searched for in the alignment file. Long insertions and deletions were identified using sniffles, and transposon-like abnormalities were identified using the TLDR tool.
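  • The nucleosome footprint step above relies on finding over-covered regions from per-position coverage. A minimal sketch follows, assuming the per-position depths are already available as a list (for example as reported by a coverage tool). The definition of "over-covered" used here (depth above `fold` times the median depth, over at least `min_length` consecutive positions) is an assumption; the description does not state the actual criterion.

```python
from statistics import median


def overcovered_regions(depths, fold=2.0, min_length=3):
    """Return half-open (start, end) intervals of over-covered positions.

    depths: per-position coverage along one reference sequence.
    fold, min_length: hypothetical parameters defining "over-covered".
    """
    if not depths:
        return []
    cutoff = fold * median(depths)
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth > cutoff and start is None:
            start = pos                      # a candidate region opens
        elif depth <= cutoff and start is not None:
            if pos - start >= min_length:    # keep only long enough regions
                regions.append((start, pos))
            start = None
    if start is not None and len(depths) - start >= min_length:
        regions.append((start, len(depths)))
    return regions


if __name__ == "__main__":
    toy_depths = [2, 2, 2, 9, 9, 9, 9, 2, 2, 8, 2, 2]
    print(overcovered_regions(toy_depths))
```

  • In this toy profile the single high position at index 9 is discarded because it is shorter than `min_length`, while the four-position plateau is reported.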
  • biomarkers were identified by comparing the full sets of biomarkers from the different DNA sequencing results. Transposons, methylated CpG islands, genetic variants and nucleosome footprints were analyzed to find biomarkers specific to a cell type. Two sets of biomarkers were identified, corresponding to the two simulated groups of samples (“Silico mix1” and “Silico mix2”). A first comparison identified THP1-specific biomarkers by comparing THP1 and HeLa results: biomarkers present in the THP1 sample but absent from the HeLa samples were selected. A second set of biomarkers was identified by comparing both THP1 and HeLa results with normal human DNA. The count of biomarkers in each group is summarized in Table 3.
  • Table 3: Count of specific biomarkers used for each group of in silico simulated samples.
  • The methylation score was computed from the log-likelihood ratio of each read that displayed a CpG of interest. The mean of the positive log-likelihood ratios only was computed and weighted based on the depth at the position. The sum of all weighted means was then computed, and the score was finally weighted using the global depth of the sample.
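  • One possible reading of the methylation score described above is sketched below. The description does not say how the weighting by depth is done; this sketch assumes that "weighted" means divided by the local depth at the CpG and then by the global depth of the sample. The function name and the input layout are likewise hypothetical.

```python
def methylation_score(sites, global_depth):
    """Sum of depth-weighted means of positive log-likelihood ratios.

    sites: iterable of (log_lik_ratios, depth_at_site), one entry per CpG
           of interest; log_lik_ratios holds one value per covering read.
    global_depth: global sequencing depth of the sample.
    """
    if global_depth <= 0:
        raise ValueError("global_depth must be positive")
    total = 0.0
    for log_lik_ratios, depth in sites:
        positive = [x for x in log_lik_ratios if x > 0]
        if not positive or depth <= 0:
            continue  # no read supports methylation at this site
        mean_positive = sum(positive) / len(positive)
        # Assumption: weighting is a division by the depth at the position.
        total += mean_positive / depth
    # Assumption: final weighting is a division by the global depth.
    return total / global_depth


if __name__ == "__main__":
    toy_sites = [([2.0, 4.0, -1.0], 3), ([-0.5], 2), ([6.0], 2)]
    print(methylation_score(toy_sites, global_depth=2.0))
```

  • If the intended weighting is a multiplication rather than a division, only the two lines marked as assumptions need to change.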
  • Variant score was computed from the variant specific detection on the alignment files. Count of found variants was performed and normalized using the global depth of the sample.
  • Nucleosome footprint was determined by the presence of high coverage clusters on the reference genome. Coverage depth was evaluated for each cluster specific to a cell line and then ponder by the global depth of the sample. The sum of normalized depth of all clusters was finally performed.
  • The transposon score was determined with the same approach as the variant score: the transposons identified in the alignment file were counted, and the count was normalized by the sequencing depth of the sample.
  • The mitochondria score is the number of reads aligned to chrM, normalized by the sequencing depth of the sample.
  • The telomere score was derived from an estimate of telomere length obtained by searching for “TTAGGG” motifs (SEQ ID NO: 1) in the aligned reads.
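The motif search can be illustrated with a simple repeat counter. This is not the TelSeq algorithm used in the source: it is a minimal sketch that flags a read as telomeric when it contains at least `min_repeats` copies of the motif on either strand, and that uses the fraction of telomeric reads as a crude proxy for telomere length. The threshold value and function names are assumptions.

```python
import re

# Telomere repeat (SEQ ID NO: 1 in the text) and its reverse complement.
TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")


def count_motif(read):
    """Return the highest non-overlapping motif count over both strands."""
    sequence = read.upper()
    return max(len(re.findall(motif, sequence)) for motif in TELOMERE_MOTIFS)


def telomeric_read_fraction(reads, min_repeats=4):
    """Fraction of reads carrying at least min_repeats telomere motifs."""
    reads = list(reads)
    if not reads:
        return 0.0
    telomeric = sum(1 for read in reads if count_motif(read) >= min_repeats)
    return telomeric / len(reads)


# Toy reads (hypothetical sequences).
reads = ["TTAGGG" * 5, "ACGT" * 6, "CCCTAA" * 4 + "GG", "TTAGGGAC"]
fraction = telomeric_read_fraction(reads)
```

Comparing this fraction between a sample and negative controls gives the direction of the difference (shorter or longer telomeres), which is how the results are reported further on.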
  • Two sets of data were generated in silico.
  • In the first group (“Silico mix1”), THP1 DNA was mixed into HeLa DNA at various ratios ranging from 0.66 % to 50 % (Fig. 2A).
  • Several sequencing depths were simulated to assess the detection threshold of the methods (Fig. 2B).
  • A second group of samples (“Silico mix2”) was generated by mixing either THP1 or HeLa DNA into normal human DNA obtained from a healthy donor. Ratios ranged from 1 % to 20 % for each cell line (Fig. 3A). Various depths were also simulated (Fig. 3B).
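An in silico mixture at a chosen ratio and depth can be sketched by subsampling reads from two pools. The source does not describe how the mixing was implemented, so this is an assumption: the ratio is applied to read counts, the depth is controlled through the total number of reads, and the function name and read identifiers are hypothetical.

```python
import random


def mix_reads(abnormal_reads, background_reads, abnormal_fraction, total_reads, seed=0):
    """Draw total_reads reads, a given fraction of them from the abnormal pool.

    random.Random.sample raises ValueError if a pool holds fewer reads
    than requested.
    """
    if not 0.0 <= abnormal_fraction <= 1.0:
        raise ValueError("abnormal_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    n_abnormal = round(total_reads * abnormal_fraction)
    n_background = total_reads - n_abnormal
    mixed = rng.sample(abnormal_reads, n_abnormal)
    mixed += rng.sample(background_reads, n_background)
    rng.shuffle(mixed)
    return mixed


# Toy read pools (hypothetical identifiers standing in for sequencing reads).
thp1_pool = [f"thp1_{i}" for i in range(1000)]
normal_pool = [f"normal_{i}" for i in range(1000)]

# 5 % abnormal DNA at a simulated depth of 200 reads.
mixture = mix_reads(thp1_pool, normal_pool, abnormal_fraction=0.05, total_reads=200)
```

Repeating the call over a grid of `abnormal_fraction` and `total_reads` values would produce a ratio-by-depth design similar to the one summarized in Fig. 2 and Fig. 3.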
  • The methylation score was computed for each sample of each silico mix group. The score ranged from 0 to 6 and was correlated with the ratio of abnormal DNA in all samples. The ROC analysis showed that the score was a good tool for discriminating samples with and without abnormal DNA (Fig. 4). For “Silico mix1” (Fig. 4A-B), a threshold of 0.54 enabled high-accuracy discrimination of samples, with a false positive rate (FPR) of 0.00 and a true positive rate (TPR) of 0.83. For “Silico mix2” (Fig. 4C-D), the accuracy was lower because HeLa samples were more difficult to discriminate from negative samples; however, a threshold of 1.83 enabled high-accuracy discrimination of samples, with an FPR of 0.33 and a TPR of 0.68.
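The FPR and TPR reported at a given threshold correspond to one point on the ROC curve. The sketch below shows how such a point can be computed, assuming a sample is called positive when its score is greater than or equal to the threshold. The scores and labels are made up and do not reproduce the values in Fig. 4.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, has_abnormal_dna in zip(scores, labels):
        called_positive = score >= threshold
        if has_abnormal_dna and called_positive:
            tp += 1
        elif has_abnormal_dna:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) for every distinct score used as a threshold."""
    thresholds = sorted(set(scores), reverse=True)
    return [(t, *rates_at_threshold(scores, labels, t)) for t in thresholds]


# Hypothetical scores; True marks a sample containing abnormal DNA.
scores = [0.1, 0.3, 0.6, 0.9, 1.4, 2.0]
labels = [False, False, True, False, True, True]
fpr, tpr = rates_at_threshold(scores, labels, threshold=0.54)
```

Scanning `roc_points` for the threshold with the best trade-off is one way to arrive at operating points such as those quoted for each score.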
  • Variant analysis in in silico samples. The variant score was computed for each sample of each silico mix group. The score ranged from 0 to 1.6 and was correlated with the ratio of abnormal DNA in all samples. The ROC analysis showed that the score was a good tool for discriminating samples with and without abnormal DNA (Fig. 5). For “Silico mix1” (Fig. 5A-B), a threshold of 0.22 enabled high-accuracy discrimination of samples, with an FPR of 0.2 and a TPR of 0.6. For “Silico mix2” (Fig. 5C-D), the accuracy was lower because HeLa samples were more difficult to discriminate from negative samples.
  • However, a threshold of 0.22 enabled high-accuracy discrimination of samples, with an FPR of 0.5 and a TPR of 0.82.
  • The variant score showed lower performance than the methylation score (Fig. 4A and 4B).
  • One limitation was the recall, which was lower with this biomarker than with the methylation score.
  • The recall decreased from 0.91 to 0.38 in “Silico mix1” and from 0.73 to 0.62 in “Silico mix2”, resulting in low sensitivity for detecting samples with a low proportion of abnormal DNA.
  • The nucleosome score was computed for each sample of each silico mix group. The score ranged from 0 to 80 and was correlated with the ratio of abnormal DNA in all samples. The ROC analysis showed that the score was a good tool for discriminating samples with and without abnormal DNA (Fig. 6). For “Silico mix1” (Fig. 6A-B), a threshold of 7.11 enabled high-accuracy discrimination of samples, with an FPR of 0.13 and a TPR of 0.83. For “Silico mix2” (Fig. 6C-D), the accuracy was lower because HeLa samples were more difficult to discriminate from negative samples; however, a threshold of 60.5 enabled high-accuracy discrimination of samples, with an FPR of 0.25 and a TPR of 0.85.
  • Transposon analysis in in silico samples. Transposons were searched for in all samples of “Silico mix2”. There were not enough specific transposon biomarkers for each cell line to perform correlation or ROC analysis, but the presence of a transposon was observed to be highly specific to the presence of “abnormal” DNA (Fig. 7).
  • Telomere length analysis in in silico samples. The telomere length was computed for each sample of “Silico mix2” and compared with negative controls. HeLa samples had shorter telomeres and THP1 samples had longer telomeres than negative controls (Fig. 8).
  • DNA extraction. cfDNA was extracted using two different commercial kits: QIAamp MinElute (Qiagen®) and Apostle MiniMax™ (Beckman®). Both methods are based on the capture of small DNA fragments using magnetic beads.
  • Library preparation was performed using the ligation protocol developed by Nanopore®. Briefly, this method requires ligating barcodes to both ends of the DNA fragments after an initial end-preparation step. After barcode ligation, sequencing adaptors were attached to enable sequencing of the fragments. Between these steps, the DNA was washed to remove the reagents used at each step. Washing was performed by capturing the DNA with magnetic beads. The bead-to-DNA ratio affected the size of the DNA fragments retained after washing, and we adjusted this step during our tests to preferentially capture cfDNA (data not shown).
  • Throughput obtained from different runs. The throughput of each run was estimated by counting the total number of reads sequenced per sample. For each run, we used a quantity of DNA ranging from 10 ng to 30 ng. The read count was not correlated with the quantity of DNA used for the library. We observed an increase in the number of reads obtained in our third test (Fig. 16).
  • test1, i.e., cfDNA obtained from in vitro culture of cell lines
  • test2, for which no protocol modification was made
  • Methylation analysis in plasma samples. We analyzed the methylation pattern in the 4 previously described samples. For each sample, we observed a majority of methylated CpGs, as expected for blood cells (Fig. 20).
  • Nucleosome analysis in plasma samples. The nucleosome pattern was analyzed next.

Abstract

The present invention relates to methods and apparatuses for estimating the probability that a subject is affected by cancer, diagnosing cancer, determining the origin of a tumor in a subject, and determining a personalized course of treatment for a subject affected by, or at risk of being affected by, cancer, based on sequencing cell-free nucleic acids and identifying therein genetic, epigenetic, transcriptomic, metabolic and metagenomic biomarkers.
PCT/EP2020/084760 2019-12-06 2020-12-04 Methods and apparatuses for diagnosing cancer from cell-free nucleic acids WO2021110987A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962944502P 2019-12-06 2019-12-06
US62/944,502 2019-12-06

Publications (1)

Publication Number Publication Date
WO2021110987A1 (fr) 2021-06-10

Family

ID=73793183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/084760 WO2021110987A1 (fr) 2019-12-06 2020-12-04 Methods and apparatuses for diagnosing cancer from cell-free nucleic acids

Country Status (1)

Country Link
WO (1) WO2021110987A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114574576A (zh) * 2021-12-24 2022-06-03 南京世和医学检验有限公司 Use of bile cfDNA in the diagnosis and treatment of gallbladder metastatic cancer
WO2023067597A1 (fr) * 2021-10-18 2023-04-27 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Use of nanopore sequencing for determining the origin of circulating DNA
WO2023197825A1 (fr) * 2022-04-15 2023-10-19 南京世和基因生物技术股份有限公司 Multi-cancer early screening model construction method and detection device
WO2024010875A1 (fr) * 2022-07-06 2024-01-11 The Regents Of The University Of California Repeat-aware profiling of cell-free RNA

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20823750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20823750

Country of ref document: EP

Kind code of ref document: A1