WO2021115906A1 - Method and system for determining a CNV profile of a tumor using sparse whole genome sequencing - Google Patents

Method and system for determining a CNV profile of a tumor using sparse whole genome sequencing

Info

Publication number
WO2021115906A1
Authority
WO
WIPO (PCT)
Prior art keywords
cnv
profile
range
adjusted
ploidy
Prior art date
Application number
PCT/EP2020/084403
Other languages
English (en)
Inventor
Jie Wu
Yee Him CHEUNG
Nevenka Dimitrova
Original Assignee
Koninklijke Philips N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips N.V.
Priority to US17/779,624 (US20230011085A1)
Publication of WO2021115906A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection

Definitions

  • the present disclosure is directed generally to methods and systems for characterizing an accurate copy number variation (CNV) profile of tumor cells from a tumor sample using sparse whole genome sequencing.
  • CNV copy number variation
  • Copy number variation is a class of somatic mutational events that is of importance clinically from a diagnostic, prognostic, and therapeutic point of view.
  • CNV data can be an essential component of cancer diagnosis and prognosis.
  • CNV data can also be used to guide targeted therapy and risk-directed therapy.
  • the clinical utility of copy number information is widely acknowledged in cancers such as acute myeloid leukemia and breast cancer, and is increasingly being recognized for its importance in other cancer disease entities.
  • CNV analysis can also be used to uncover clinically-actionable genetic aberrations in other cancers such as in melanoma, non-small-cell lung carcinoma and colorectal cancer.
  • CNV information for tumor cells can be complicated.
  • Clinical tumor samples are typically mixtures of tumor cells and other cells such as stromal cells, and thus a deconvolution is necessary for a better understanding of the tumor.
  • contamination of the tumor sample by non-tumor cells must be accounted for by adjusting the initial CNV results to absolute copy numbers. Accounting for purity can make CNV detection more accurate.
  • the present disclosure is directed to inventive methods and systems for characterizing copy number variation for a tumor cell using sparse whole genome sequencing.
  • Various embodiments and implementations herein are directed to a system and method that determines, from sparse genome data, an initial unadjusted CNV profile comprising a plurality of CNV calls for a plurality of chromosomes. The system then normalizes that unadjusted CNV profile to a mean value of 1.
  • the system comprises a predetermined range for ploidy for the genome data, and a predetermined range for a contamination rate for the genome data.
  • the system uses that information to determine adjusted segmentation values for the plurality of CNV calls, and then determines a plurality of adjustment scores each comprising a distance between the adjusted segmentation values and closest whole integers for a CNV.
  • the determined plurality of adjustment scores are compared to one or more factors that influence the selection of a CNV profile best fit, such as CNV profiles previously observed and preferred by a clinician, and/or ploidy and contamination distributions from previous data, among other possible factors. Based on that comparison, the system selects one of the plurality of adjustment scores as a best fit for the copy number variation profile of the tumor cells of the tumor.
  • the system generates an adjusted CNV profile report using the selected best fit adjustment score and provides the generated adjusted CNV profile report, such as to a user, user interface, or other display or system.
  • a method for determining a copy number variation (CNV) profile of target cells from a sample using a CNV profiling system includes: (i) receiving sparse genome sequencing data comprising sequencing from both target and non-target cells from the sample; (ii) determining, from the received sparse genome data, an unadjusted CNV profile comprising a plurality of CNV calls for a plurality of chromosomes; (iii) normalizing the unadjusted CNV profile; (iv) receiving a range for possible ploidy for the CNV profile, and/or receiving a range for a possible contamination rate for the CNV profile; (v) determining, using the received ploidy range and/or received contamination rate range, adjusted segmentation values for the plurality of CNV calls; (vi) determining a plurality of adjustment scores comprising a distance between adjusted segmentation values and closest whole integers for a CNV profile; (vii) comparing the determined pluralit
  • the unadjusted CNV profile is normalized to a mean value of one.
  • the range for possible ploidy for the CNV profile and the range for a possible contamination rate for the CNV profile is received from a user of the CNV profiling system.
  • determining a plurality of adjustment scores comprises the equation where D is a calculated distance between an adjusted segmentation value (S_adj) and a closest whole integer, S_adj^i is the adjusted segmentation value of the i-th segment, and n is the number of autosome segments.
  • one of the one or more predetermined factors for selecting a CNV profile best fit is a CNV profile previously observed by a user, a ploidy value or range previously observed by a user, a contamination value or range previously observed by a user, and/or ploidy or contamination information from a previous analysis.
  • the target cells are tumor cells.
  • a second aspect is a system for determining a copy number variation (CNV) profile of target cells from a sample.
  • the system includes: sparse genome sequencing data comprising sequencing from both target and non-target cells from the sample; a processor configured to: (i) determine, from the received sparse genome data, an unadjusted CNV profile comprising a plurality of CNV calls for a plurality of chromosomes; (ii) determine, using a received ploidy range and/or received contamination rate range, adjusted segmentation values for the plurality of CNV calls; (iii) determine a plurality of adjustment scores comprising a distance between adjusted segmentation values and closest whole integer for a CNV profile; (iv) compare the determined plurality of adjustment scores to one or more predetermined factors for selecting a CNV profile best fit; (v) select, based at least in part on the comparison, one of the plurality of adjustment scores as a best fit for the copy number variation profile of the tumor cells of the tumor; and (
  • the user interface is further configured to receive a range for possible ploidy for the CNV profile, and/or receive a range for a possible contamination rate for the CNV profile.
  • a processor or controller may be associated with one or more storage media (generically referred to herein as “memory,” e.g., volatile and non-volatile computer memory such as RAM, PROM, EPROM, and EEPROM, floppy disks, compact disks, optical disks, magnetic tape, etc.).
  • the storage media may be encoded with one or more programs that, when executed on one or more processors and/or controllers, perform at least some of the functions discussed herein.
  • Various storage media may be fixed within a processor or controller or may be transportable, such that the one or more programs stored thereon can be loaded into a processor or controller so as to implement various aspects as discussed herein.
  • program or “computer program” are used herein in a generic sense to refer to any type of computer code (e.g., software or microcode) that can be employed to program one or more processors or controllers.
  • FIG. 1 is a flowchart of a method for determining a copy number variation profile, in accordance with an embodiment.
  • FIG. 2A is an example of an initial unadjusted CNV profile, in accordance with an embodiment.
  • FIG. 2B is an example of an initial unadjusted CNV profile, in accordance with an embodiment.
  • FIG. 2C is an example of an initial unadjusted CNV profile, in accordance with an embodiment.
  • FIG. 3A is an example of an adjusted CNV profile, in accordance with an embodiment.
  • FIG. 3B is an example of an adjusted CNV profile, in accordance with an embodiment.
  • FIG. 3C is an example of an adjusted CNV profile, in accordance with an embodiment.
  • FIG. 4A is an example of a best fit adjusted CNV profile, in accordance with an embodiment.
  • FIG. 4B is an example of a best fit adjusted CNV profile, in accordance with an embodiment.
  • FIG. 4C is an example of a best fit adjusted CNV profile, in accordance with an embodiment.
  • FIG. 5A is a preferred fit graph, in accordance with an embodiment.
  • FIG. 5B is an adjustment score graph, in accordance with an embodiment.
  • FIG. 6A is a preferred fit graph, in accordance with an embodiment.
  • FIG. 6B is an adjustment score graph, in accordance with an embodiment.
  • FIG. 7A is a preferred fit graph, in accordance with an embodiment.
  • FIG. 7B is an adjustment score graph, in accordance with an embodiment.
  • FIG. 8 is a comparison of an unadjusted CNV profile (top panel) and a generated best fit CNV profile (bottom panel), in accordance with an embodiment.
  • FIG. 9 is an example of an adjustment score graph, in accordance with an embodiment.
  • FIG. 10 is an example of a preferred fit graph, in accordance with an embodiment.
  • FIG. 11A is an example of an adjustment score graph, in accordance with an embodiment.
  • FIG. 11B is an example of a preferred fit graph, in accordance with an embodiment.
  • FIG. 11C is a generated best fit CNV profile, in accordance with an embodiment.
  • FIG. 12 is a schematic representation of a system for determining a copy number variation profile, in accordance with an embodiment.
  • the present disclosure describes various embodiments of a system and method for determining a copy number variation profile of tumor cells from a tumor sample using sparse genome data. More generally, Applicant has recognized and appreciated that it would be beneficial to provide a method and system that can characterize an accurate copy number variation profile of tumor cells using faster and more cost-effective methods.
  • the system uses sparse genome data to generate an unadjusted CNV profile comprising a plurality of CNV calls for a plurality of chromosomes, which can then be normalized.
  • the system comprises a range for ploidy for the genome data, and a range for a contamination rate for the genome data.
  • the system uses that information to determine an adjusted segmentation value for at least one of the plurality of CNV calls, and then determines a plurality of adjustment scores comprising distances between the adjusted segmentation value and different closest whole integers for a CNV profile.
  • the determined plurality of adjustment scores are compared to one or more factors that influence the selection of a CNV profile best fit, such as CNV profiles preferred by a clinician, and/or ploidy and contamination distributions from previous data, among other possible factors. Based on that comparison, the system selects one of the plurality of adjustment scores as a best fit for the copy number variation profile of the tumor cells of the tumor.
  • the system generates an adjusted CNV profile report using the selected best fit adjustment score and provides the generated adjusted CNV profile report, such as to a user, user interface, or other display or system.
  • sparse whole genome sequencing has been overlooked by research and healthcare communities.
  • sparse whole genome sequencing is a cost-effective technique to retrieve genome-wide cytogenetic information.
  • CNV information, unlike smaller variants such as single nucleotide variants, can be retrieved via sparse whole genome sequencing data.
  • the nature of this approach makes it highly cost effective (an order of magnitude or more cheaper). It also yields a much more uniform read distribution than whole exome sequencing and covers the whole genome, which enables detection across a larger spectrum. It is also far more sensitive than array-based methods.
  • One of the many advantages of the methods and systems described or otherwise envisioned herein is that they enable non-tumor cell contamination analysis and adjustment without a control.
  • the system only utilizes the measured copy number data for purity estimation, and no other variant data (such as single nucleotide variant) is utilized.
  • FIG. 1 in one embodiment, is a flowchart of a method 100 for determining copy number variation of tumor cells using sparse genome data and a CNV profiling system.
  • the CNV profiling system may be any of the systems described or otherwise envisioned herein, and may comprise any of the components described or otherwise envisioned herein.
  • the CNV profiling system receives or generates sparse whole genome sequencing data from a sample.
  • sparse whole genome sequencing data comprises much less information than high-depth next-generation whole genome sequencing data.
  • the sparse whole genome sequencing data may comprise fewer than 10 million reads for a human genome, corresponding to approximately 0.1x coverage of that genome.
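The relationship between read count and coverage can be sketched with the usual back-of-the-envelope formula (sequenced bases divided by genome size). The read length of 100 bp, the genome size of 3.1 Gb, and the function name below are illustrative assumptions and do not come from the disclosure.

```python
def estimated_coverage(n_reads: int, read_length_bp: int,
                       genome_size_bp: float = 3.1e9) -> float:
    """Approximate mean depth: total sequenced bases / genome size.

    genome_size_bp defaults to an assumed human genome size of ~3.1 Gb.
    """
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical sparse run: 3 million reads of 100 bp each.
cov = estimated_coverage(n_reads=3_000_000, read_length_bp=100)
print(f"approximate coverage: {cov:.3f}x")
```

Under these assumed numbers the run lands close to the 0.1x figure mentioned above; other read lengths shift the read count needed for the same depth.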
  • the sample is a tumor sample from an individual such as a patient or other person, and comprises both tumor and non-tumor cells.
  • the sample can be any genetic sample from any organism, including humans, pathogenic and non-pathogenic organisms, and many others. It is recognized that there is no limitation to the source of the genetic sample.
  • the CNV profiling system comprises a DNA sequencing platform configured to obtain sparse whole genome sequencing data from the genetic sample.
  • the sequencing platform can be any sequencing platform, including but not limited to any system described or otherwise envisioned herein.
  • a sample and/or the nucleic acids therein may be prepared for sequencing using any method for preparation, which may be at least in part dependent upon the sequencing platform.
  • the nucleic acids may be extracted, purified, and/or amplified, among many other preparations or treatments.
  • the nucleic acid may be fragmented using any method for nucleic acid fragmentation, such as shearing, sonication, enzymatic fragmentation, and/or chemical fragmentation, among other methods, and may be ligated to a sequencing adaptor or any other molecule or ligation partner.
  • the CNV profiling system receives the sparse whole genome sequencing data from the genetic sample.
  • the CNV profiling system may be in communication with, or otherwise receive the sequencing data from, a database comprising sequencing data for one or more genetic samples.
  • the generated and/or received sparse sequencing data may comprise a complete or mostly complete genome, or may be a partial genome.
  • the generated and/or received sparse whole genome sequencing data may be stored in a local or remote database for use by the CNV profiling system.
  • the CNV profiling system may comprise a database to store the sequencing data for the genetic sample, and/or may be in communication with a database storing the sequencing data. These databases may be located with or within the CNV profiling system or may be located remote from the CNV profiling system, such as in cloud storage and/or other remote storage.
  • the CNV profiling system determines an initial unadjusted CNV profile from the sparse genome data, comprising a plurality of CNV calls for a plurality of chromosomes or other genomic regions or breakdowns.
  • the initial unadjusted CNV profile determination may be performed using any of a wide variety of different CNV analysis platforms or methods. FIGS. 2A, 2B, and 2C are examples of initial unadjusted CNV profiles determined or received by the CNV profiling system.
  • the copy number profile for a DNA sample comprising a number of cells from a tumor section will be composed of a component from the normal diploid cells and the tumor cells within that sample. Some or many of the normal diploid cells in the sample may be, for example, stromal cells among other types of cells. As the tumor cells are likely to be mostly from a single copy number clone, the normal and tumor components of the copy number profile can be separated and an estimate of the percentage of normal cells in the tumor section sequenced.
  • the total copy number profile will be composed of a variable copy number tumor profile plus a constant normal cell profile. As described below, after subtracting the constant profile it is possible to compute the best possible ploidy estimate for the remaining tumor component and then an error for that profile.
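The mixture described above can be illustrated with a small forward model. This is a sketch under an assumption: the contamination rate C is read as the fraction of the normalized signal contributed by normal cells, which is the interpretation obtained by algebraically inverting Eq. 1 of this disclosure. The function name and the sample values are hypothetical.

```python
def mix_profile(tumor_copy_numbers, contamination):
    """Normalized mixed profile: constant normal part + scaled tumor part.

    Assumes `contamination` is the fraction of normalized signal coming
    from normal cells (an interpretation, not stated in the text).
    """
    ploidy = sum(tumor_copy_numbers) / len(tumor_copy_numbers)
    return [contamination + (1.0 - contamination) * cn / ploidy
            for cn in tumor_copy_numbers]


tumor = [1, 2, 2, 3, 4]          # integer tumor copy numbers, mean 2.4
mixed = mix_profile(tumor, 0.3)  # 30% of the signal from normal cells
print([round(v, 3) for v in mixed])
```

The mixed values average to 1 and no longer sit on integers, which is the situation the deconvolution is meant to undo.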
  • the system normalizes the unadjusted CNV profile.
  • the system can be configured to normalize the unadjusted CNV profile in any of a wide variety of ways and methods, including existing normalization methods.
  • the system may be configured to normalize the unadjusted CNV profile to a mean value of one.
  • the system may be configured by a user to normalize the unadjusted CNV profile depending on the needs or goals of the user.
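Normalization to a mean value of one can be sketched as follows. The unweighted mean over segments is an assumption made for brevity; a length-weighted mean would be an equally plausible reading. The function name and input values are hypothetical.

```python
def normalize_to_mean_one(values):
    """Scale segment values so that their (unweighted) mean equals 1."""
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("cannot normalize a profile whose mean is zero")
    return [v / mean for v in values]


raw = [1.8, 2.0, 2.0, 2.6, 3.1]   # hypothetical unadjusted segment values
norm = normalize_to_mean_one(raw)
print([round(v, 3) for v in norm])
```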
  • these graphs of unadjusted CNV profiles comprise contamination, such as from non-cancer cells, which results in both an incorrect ploidy number as well as non-integer ploidy numbers.
  • the analysis and deconvolution described or otherwise envisioned herein results in the correct ploidy number, including ploidy shifts upward or downward to increase accuracy.
  • the system receives a range for ploidy for the genome data, and/or receives a range for a contamination rate for the genome data.
  • ranges can be predetermined, including being pre-programmed or otherwise received by or provided to the system.
  • a user such as a researcher, clinician, technician, or other user can provide the ranges as a setting, selection, or other information via a user interface or any other communication method.
  • the one or more received ranges allow the system to process the normalized unadjusted CNV profiles as described below.
  • the received range for ploidy (P) comprises one or more values which can be used to process the normalized unadjusted CNV profile.
  • the received range for ploidy (P) comprises a range between and possibly including 1.5 to 4.5, although other ranges are possible. Indeed, measured ploidy can be much higher and thus can require a larger range, possibly depending on the sample and/or cause of the CNV, among other variables.
  • the value or values may be utilized with an interval that samples the range.
  • the interval for the ploidy range may be 0.1 such that sampling a range of 1.5 to 4.5 may be 1.5, 1.6, 1.7, and so on.
  • the received range for contamination rate (C) comprises one or more values which can be used to process the normalized unadjusted CNV profile.
  • the received range for contamination rate (C) comprises a range between and possibly including 0% to 100%, with much smaller ranges possible.
  • the value or values may be utilized with an interval that samples the range.
  • the interval for the contamination rate range may be 1% such that sampling the range of 0% to 100% may comprise values of 0%, 1%, 2%, and so on.
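Sampling the two ranges at fixed intervals can be sketched as below. The helper name `sample_range` is hypothetical, and contamination is expressed as a fraction (0.0 to 1.0) rather than a percentage. Values are generated from an integer step index to avoid accumulating floating-point error.

```python
def sample_range(start, stop, step):
    """Return start, start+step, ... up to and including stop."""
    n_steps = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(n_steps + 1)]


ploidy_grid = sample_range(1.5, 4.5, 0.1)          # 1.5, 1.6, ..., 4.5
contamination_grid = sample_range(0.0, 1.0, 0.01)  # 0%, 1%, ..., 100%
print(len(ploidy_grid), len(contamination_grid))
```

Note that a contamination rate of exactly 100% makes the denominator (1 - C) of Eq. 1 zero, so a practical grid would likely stop short of that endpoint.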
  • a user such as a researcher or clinician may be utilizing the methods and systems described or otherwise envisioned herein to determine a copy number variation (CNV) profile of tumor cells from a tumor sample.
  • the user will provide an input comprising a selected or default ploidy (P) range or value, and/or an input comprising a selected or default contamination rate (C) range or value, optionally as one or more settings for the CNV profiling system.
  • the CNV profiling system utilizes the received ranges or values to inform one or more downstream steps of the analysis.
  • the system uses the received one or more ranges to determine adjusted segmentation values for the plurality of CNV calls.
  • the adjusted segmentation values may be determined in a variety of different ways and methods. According to one embodiment, an adjusted segmentation value is determined using the unadjusted segmentation value for a CNV segment, a ploidy value based on input from step 140, and/or a contamination rate value based on input from step 140.
  • an adjusted segmentation value may be determined using the following equation:
  • $S_{adj} = \dfrac{P\,(S - C)}{1 - C}$ (Eq. 1)
  • where S_adj is the adjusted segmentation value for a CNV segment
  • P is a ploidy value
  • C is a cell contamination rate value
  • S is the segmentation value for the CNV segment before adjustment.
  • because the unadjusted profile is normalized to a mean of 1, the mean value of S_adj is P rather than 1.
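Eq. 1 can be written as a short function. The function name, the guard on the contamination value, and the sample inputs are illustrative assumptions; only the formula itself comes from the disclosure.

```python
def adjust_segment(s, ploidy, contamination):
    """Eq. 1: S_adj = P * (S - C) / (1 - C).

    `contamination` is a fraction in [0, 1); C = 1 would divide by zero.
    """
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    return ploidy * (s - contamination) / (1.0 - contamination)


normalized = [0.6, 0.9, 0.9, 1.2, 1.4]   # hypothetical profile, mean 1
adjusted = [adjust_segment(s, ploidy=3.0, contamination=0.2)
            for s in normalized]
print(adjusted, sum(adjusted) / len(adjusted))
```

Because the input profile has mean 1, the adjusted values average to the chosen ploidy, as the text states.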
  • an adjusted segmentation value is determined for each CNV segment using the full received ploidy range, the full received contamination rate range, or both received ranges.
  • the system may comprise or receive, such as from a user, a ploidy range of 2 to 4.5, and a contamination rate of 10% to 25%.
  • the numbers provided in this example are provided only as possible ranges, and are not limiting.
  • the system determines an adjusted segmentation value for each CNV segment using a sampling rate for the ploidy range and/or for the contamination rate.
  • if the sampling interval for ploidy is 0.1 and the range is 2 to 4.5, the system will determine an adjusted segmentation value for each CNV segment using 2.0, 2.1, 2.2, and so on at 0.1 intervals through and including 4.5. If the sampling rate for the contamination rate is 1% and the range is 10% to 25%, the system will determine an adjusted segmentation value for each CNV segment using 10%, 11%, 12%, and so on at 1% intervals through and including 25%. There may thus be hundreds or thousands of determined adjusted segmentation values for a CNV segment.
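The example ranges above can be enumerated to see how many adjusted values arise per segment. This is a sketch: the three normalized segment values and the variable names are hypothetical, and every ploidy value is paired with every contamination value, which is one reading of "both received ranges".

```python
# Ploidy 2.0 to 4.5 in steps of 0.1; contamination 10% to 25% in steps of 1%.
ploidies = [round(2.0 + 0.1 * k, 1) for k in range(26)]
contaminations = [round(0.10 + 0.01 * k, 2) for k in range(16)]

segments = [0.7, 1.0, 1.3]   # hypothetical normalized segment values

# Apply Eq. 1 to every segment for every (ploidy, contamination) pair.
adjusted_by_pair = {
    (p, c): [p * (s - c) / (1.0 - c) for s in segments]
    for p in ploidies
    for c in contaminations
}
print(len(adjusted_by_pair), "adjusted values per segment")
```

With these ranges each segment receives 26 x 16 = 416 adjusted values, consistent with the "hundreds or thousands" noted above.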
  • the determined adjusted segmentation values for each CNV segment may be used by the system immediately or in the short-term, or may be stored in a local or remote database for future or other downstream use by the CNV profiling system.
  • the system uses the adjusted segmentation values (S_adj) for the CNV segments to determine a plurality of adjustment scores, for example comprising a distance between the adjusted segmentation value (S_adj) and different closest whole integers for a CNV profile.
  • the adjustment scores may be determined in a variety of different ways and methods.
  • an adjustment score measures, and may allow for the minimization of, the difference between adjusted segmentation values and whole integers closest to the values.
  • the system may be designed such that CNV segments are likely to be clonal, meaning they are likely to be an integer such as 1, 2, 3, etc. rather than a value such as 1.4, 2.6, 3.1, etc.
  • an adjustment score may be determined for an entire CNV profile, or the adjustment score may be determined for a sub-set of the CNV profile. This may be selected or otherwise determined by a user, may be selected or determined by the system, and/or may be selected or otherwise determined by other input or selection mechanism.
  • an adjustment score may be determined using the following equation:
  • the adjustment score may be determined by multiplying the above distance by the lengths of the segments to account for segment sizes, among other methods.
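The exact scoring equation is not reproduced in this text, so the following is only a sketch of the idea, assuming the score is the mean absolute distance of each adjusted segment value to its nearest integer. The optional length weighting, and its normalization by total length, are likewise assumptions. The function name is hypothetical.

```python
def adjustment_score(adjusted, lengths=None):
    """Mean distance from each adjusted value to its nearest integer.

    If `lengths` is given, distances are weighted by segment length
    (an assumed form of the length-based variant mentioned in the text).
    """
    distances = [abs(v - round(v)) for v in adjusted]
    if lengths is None:
        return sum(distances) / len(distances)
    return sum(d * w for d, w in zip(distances, lengths)) / sum(lengths)


adjusted = [1.02, 2.0, 2.95, 4.1]        # hypothetical adjusted values
plain = adjustment_score(adjusted)
weighted = adjustment_score(adjusted, lengths=[10, 20, 30, 40])
print(round(plain, 4), round(weighted, 4))
```

A lower score means the adjusted profile sits closer to integer copy numbers; the weighted form lets long segments dominate over short, noisier ones.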
  • FIGS. 3A, 3B, and 3C are adjusted CNV profiles with adjusted segmentation values, prior to the best fit analysis in step 170 of the method. These profiles are rejected by the CNV profiling system as they do not comprise the best fit for CNV profile.
  • the system compares the results of the adjustment score analysis to one or more factors to facilitate selection of a best fit CNV profile. This may result in one or more parameters or factors that may be used to select or influence selection of a best fit CNV profile, from among the profiles represented by the adjustment scores.
  • these factors include CNV profiles or profile variables previously clinically observed by a user such as a clinician or researcher and determined to be more meaningful according to the user’s experience, including factors such as likely CNV segment integers, among others.
  • other factors include ploidy distributions determined from previous data or analyses and/or contamination distributions from previous data or analyses, including but not limited to analyses where one or more parameters of the sample or analysis were similar to the current sample or analysis.
  • the system may utilize prior information to prioritize certain solutions. For example, the system may use the distribution of contamination rate or ploidy from similar samples obtained by other techniques. In some cases, a ploidy closer to two may be more favorable as the best solution. Copy number distributions can also be used. For instance, when the predicted ploidy/contamination results in a CNV profile with all copy number bigger than two, without any lower copy numbers, the system may reject that solution (in the next step) and use another solution. Many other preferences from the clinicians can also be incorporated into the selection procedure.
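One of the preferences described above, rejecting a solution whose copy numbers are all greater than two, can be sketched as a simple filter. The function name and the threshold parameter are hypothetical; a real system could combine several such rules or use prior distributions instead.

```python
def is_plausible(adjusted, max_lowest_copy=2):
    """Reject profiles in which every rounded copy number exceeds 2."""
    copy_numbers = [round(v) for v in adjusted]
    return min(copy_numbers) <= max_lowest_copy


print(is_plausible([3.1, 4.0, 5.2]))   # all segments above copy number 2
print(is_plausible([1.9, 3.0, 4.1]))   # has a segment at copy number 2
```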
  • a final CNV profile is selected as a best fit for the sample, based at least in part on the one or more factors from step 170 of the method.
  • the combination of contamination rate (C) and ploidy estimate (P) that best minimizes error in the unadjusted CNV profile, and thus generates the most likely adjusted CNV profile, is selected as the best solution.
  • if the tumor is a single copy number clone, the segments will fall very close to integer values when the contamination rate and tumor ploidy values are correct.
  • the combination of contamination rate and tumor ploidy values that generate an adjustment score and adjusted CNV profile with the highest likelihood of accuracy is selected.
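The selection step can be sketched as a grid search over (P, C) that keeps the pair with the lowest adjustment score. This is a minimal sketch under stated assumptions: the score is the mean distance to the nearest integer (the disclosure's exact scoring equation is not reproduced in this text), the synthetic profile is generated by inverting Eq. 1, clinician preferences and prior distributions are left out, and all names are hypothetical.

```python
def adjust(s, p, c):
    """Eq. 1: S_adj = P * (S - C) / (1 - C)."""
    return p * (s - c) / (1.0 - c)


def score(values):
    """Assumed score: mean distance to the nearest whole integer."""
    return sum(abs(v - round(v)) for v in values) / len(values)


def best_fit(normalized, ploidy_grid, contamination_grid):
    """Return (score, ploidy, contamination) with the lowest score."""
    best = None
    for p in ploidy_grid:
        for c in contamination_grid:
            d = score([adjust(s, p, c) for s in normalized])
            if best is None or d < best[0]:
                best = (d, p, c)
    return best


# Synthetic clonal tumor with known answer (ploidy 2.4, contamination 0.3).
tumor = [1, 2, 2, 3, 4]
true_p = sum(tumor) / len(tumor)
true_c = 0.3
observed = [true_c + (1 - true_c) * t / true_p for t in tumor]

ploidy_grid = [round(1.5 + 0.1 * k, 1) for k in range(31)]    # 1.5 to 4.5
contamination_grid = [round(0.01 * k, 2) for k in range(61)]  # 0% to 60%
best_score, best_p, best_c = best_fit(observed, ploidy_grid,
                                      contamination_grid)
print(best_p, best_c)
```

On real data several (P, C) pairs can score almost equally well, which is why the text combines the score with preferred-fit factors rather than relying on the minimum alone.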
  • the system generates the best fit adjusted CNV profile using the selected adjustment. This can be performed by the system via a variety of methods and systems, to generate a final adjusted CNV profile that can be saved, reported, or otherwise stored or used by the CNV profiling system.
  • FIGS. 4A, 4B, and 4C are best fit adjusted CNV profiles generated by the CNV profiling system. These best fit adjusted CNV profiles correspond to the examples of initial unadjusted CNV profiles in FIGS. 2A/3A, 2B/3B, and 2C/3C, respectively.
  • the example in FIG. 4A utilized the score graph in FIG. 5A and the preferred fit graph in FIG. 5B.
  • FIG. 5A is a graph of adjustment score results for a given ploidy range (1.5 to 4) and a contamination range (0% to 100%).
  • FIG. 5B is a graph of acceptable or preferred results for the given ploidy versus contamination ranges. For the example in FIG. 4A, the ploidy was shifted down by the analysis as there were no events at copy number 1 and 2.
  • FIG. 4B utilized the score graph in FIG. 6A and the preferred fit graph in FIG. 6B.
  • FIG. 6A is a graph of adjustment score results for a given ploidy range (1.5 to 4) and a contamination range (0% to 100%).
  • FIG. 6B is a graph of acceptable or preferred results for the given ploidy versus contamination ranges.
  • the best fit provided the improvement of at least two integer copy numbers as shown in FIG. 4B.
  • FIG. 4C utilized the score graph in FIG. 7A and the preferred fit graph in FIG. 7B.
  • FIG. 7A is a graph of adjustment score results for a given ploidy range (1.5 to 4) and a contamination range (0% to 100%).
  • FIG. 7B is a graph of acceptable or preferred results for the given ploidy versus contamination ranges.
  • the ploidy was shifted down by the analysis as it was unlikely that the majority of the genome would be at copy number 3.
  • FIG. 8 is a comparison of an unadjusted CNV profile (top panel) and a generated best fit CNV profile (bottom panel).
  • in the unadjusted CNV profile, the copy numbers are not integers due to contamination. See, for example, the circled copy number in the top panel.
  • in the generated best fit CNV profile, the copy numbers are integers due to the process described or otherwise envisioned herein. See, for example, the circled copy number for the same segment in the bottom panel.
  • FIG. 9 is an example of a graph of adjustment score results for a given ploidy range (1.5 to 4) and a contamination range (0% to 100%), where the arrows show regions with favorable scores corresponding to the scale to the right of the graph.
  • FIG. 10 is a graph of acceptable or preferred results for the given ploidy versus contamination ranges, where the arrows correspond to the more favorable results according to the scale to the right of the graph.
  • the adjustment score shown by the arrow in the lower right side of the adjustment score graph in FIG. 9 corresponds to a preferred region in the acceptable or preferred results graph in FIG. 10, thus indicating the adjustment score as a possible best fit to generate a best fit CNV profile.
  • FIGS. 11A through 11C are an example of a best fit adjusted CNV profile selected using the methods and systems described or otherwise envisioned herein.
  • FIG. 11A is a plot of adjustment scores for a ploidy range (1.5 to 4) and a contamination range (0% to 100%), where the arrows show regions with favorable scores, or in other words three potential solutions.
  • FIG. 11B is a graph of acceptable or preferred results for ploidy versus contamination ranges. For example, the central region shown by the arrow corresponds to the more favorable result on the scale.
  • the circled favorable score from FIG. 11A is selected as the best fit as it corresponds to a preferred result region in FIG. 11B, shown by the circled region in FIG. 11B.
  • the system provides the generated adjusted CNV profile report.
  • the report may comprise, for example, one or more of the original unadjusted CNV profile, the generated adjusted CNV profile, the received ploidy range (P) and interval, the received contamination rate (C) and interval, one or more calculated adjusted segmentation values, one or more calculated adjustment scores, a best fit adjustment score, information about the factor or factors that influenced selection of the best fit CNV profile, and/or other information.
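The report contents listed above could be held in a simple record and serialized to a text-based format. The class name, field names, JSON format, and sample values are all illustrative assumptions; the disclosure does not prescribe a schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CnvReport:
    """Hypothetical container for the report items named in the text."""
    unadjusted_profile: list
    adjusted_profile: list
    ploidy_range: tuple
    ploidy_interval: float
    contamination_range: tuple
    contamination_interval: float
    best_fit_score: float
    selection_notes: str = ""


report = CnvReport(
    unadjusted_profile=[0.59, 0.88, 0.88, 1.18, 1.47],
    adjusted_profile=[1, 2, 2, 3, 4],
    ploidy_range=(1.5, 4.5),
    ploidy_interval=0.1,
    contamination_range=(0.0, 0.6),
    contamination_interval=0.01,
    best_fit_score=0.0,
    selection_notes="lowest score within the preferred region",
)
text = json.dumps(asdict(report), indent=2)
print(text)
```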
  • the report may be electronic or printed, and may be stored.
  • the report may comprise a text-based file or other format.
  • the report may be sortable or otherwise configured for organization to allow easy analysis and extraction of information.
  • the CNV profiling system may visually display information about the generated adjusted CNV profile and/or any of the elements, scores, parameters, or factors described or otherwise envisioned herein.
  • a clinician, researcher, or other user may only be interested in one piece of information such as the generated adjusted CNV profile, and thus the CNV profiling system may be instructed or otherwise designed or programmed to only display this information.
  • the report or information may be stored in temporary and/or long-term memory or other storage. Additionally and/or alternatively, the report or information may be communicated or otherwise transmitted to another system, recipient, process, device, and/or other local or remote location.
  • the report or information can be provided to a researcher, clinician, or other user to review and implement an action or response based on the provided information.
  • a researcher, clinician or other user may utilize the information to quantify clinically actionable CNVs based on the report as generated from sparse whole genome sequencing data. That this is generated from sparse whole genome sequencing data represents a novel and non-obvious improvement in the field, as prior studies teach away from this use either explicitly or by suggesting that sparse whole genome sequencing data is not data-rich or robust enough to provide the necessary amount of information.
  • identifying causal CNVs can be an essential component of disease diagnosis and treatment.
  • Clinically actionable CNVs present an important piece of information for disease, as well as a possible treatment point for disease. This is true not only in cancers but in many other disorders and phenotypes.
  • CNV evaluation can help improve diagnosis, monitoring, and treatment of neurological disorders. This may include scenarios where the neurological disorder is so rare that there is no diagnostic test in existence.
  • many other conditions may be diagnosed, monitored, and treated based on the identification of a causal CNV obtained by analysis of sparse whole genome sequencing.
  • a user such as a healthcare professional or researcher receives a generated adjusted CNV profile report and identifies, based on the report, one or more causal CNVs for the phenotype.
  • the user may identify a causal CNV for a cancer or cancer phenotype, a neurological disorder, or any of a wide variety of other phenotypes.
  • the user identifies a treatment or other intervention for the individual based on the identified causal CNV, and applies that treatment to the individual.
  • the identification of the CNV profile, the identification of an intervention, and the application of that intervention are based entirely on the ability of the CNV profiling system to generate an adjusted CNV profile using only the results of sparse whole genome sequencing.
  • the use of sparse whole genome sequencing by the CNV profiling system has thereby significantly decreased cost, increased speed and efficiency of the CNV profiling system, and improved care of the individual.
  • a researcher, clinician or other user may utilize the information to quantify tumor purity, which may be a piece of information provided in the report or otherwise provided by the system.
  • the system is also thereby determining a purity, or rather the contamination, of the sample as measured by the initial unadjusted CNV profile.
  • Many other downstream uses are possible.
  • FIG. 12, in one embodiment, is a schematic representation of a CNV profiling system 1200 configured to determine copy number variation of tumor cells using sparse genome data.
  • System 1200 may be any of the systems described or otherwise envisioned herein, and may comprise any of the components described or otherwise envisioned herein.
  • system 1200 comprises one or more of a processor 1220, memory 1230, user interface 1240, communications interface 1250, and storage 1260, interconnected via one or more system buses 1212. It will be understood that FIG. 12 constitutes, in some respects, an abstraction and that the actual organization of the components of the system 1200 may be different and more complex than illustrated.
  • the hardware may include additional sequencing hardware 1215.
  • the sequencing platform is configured to generate sparse whole genome sequencing data from a sample.
  • sparse whole genome sequencing data comprises much less information than high-depth next-generation whole genome sequencing data.
  • the sparse whole genome sequencing data may comprise fewer than 10 million reads for a human genome, comprising approximately 0.1x coverage of that genome.
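  • As a rough illustration of how read count relates to coverage, mean depth can be approximated as total sequenced bases divided by genome size. The sketch below is a back-of-the-envelope calculation only; the function name, the 50 bp read length, and the 3.1 Gb genome size are assumptions chosen for illustration and are not taken from the disclosure.

```python
def estimated_coverage(n_reads: int, read_length_bp: int,
                       genome_size_bp: float = 3.1e9) -> float:
    """Approximate mean depth of coverage: total sequenced bases / genome size.

    genome_size_bp defaults to an assumed ~3.1 Gb human genome.
    """
    if n_reads < 0 or read_length_bp <= 0 or genome_size_bp <= 0:
        raise ValueError("read count must be >= 0; lengths and sizes must be > 0")
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical sparse run: 6 million single-end reads of 50 bp each.
coverage = estimated_coverage(6_000_000, 50)
print(f"{coverage:.3f}x")  # prints 0.097x
```

  • Under these assumed values, a run well below 10 million reads lands near the 0.1x figure mentioned above.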
  • system 1200 comprises a processor 1220 capable of executing instructions stored in memory 1230 or storage 1260 or otherwise processing data to, for example, perform one or more steps of the method.
  • processor 1220 may be formed of one or multiple modules.
  • Processor 1220 may take any suitable form, including but not limited to a microprocessor, microcontroller, multiple microcontrollers, circuitry, field programmable gate array (FPGA), application-specific integrated circuit (ASIC), a single processor, or plural processors.
  • Memory 1230 can take any suitable form, including a non-volatile memory and/or RAM.
  • the memory 1230 may include various memories such as, for example, L1, L2, or L3 cache or system memory.
  • the memory 1230 may include static random access memory (SRAM), dynamic RAM (DRAM), flash memory, read only memory (ROM), or other similar memory devices.
  • the memory can store, among other things, an operating system.
  • the RAM is used by the processor for the temporary storage of data.
  • an operating system may contain code which, when executed by the processor, controls operation of one or more components of system 1200. It will be apparent that, in embodiments where the processor implements one or more of the functions described herein in hardware, the software described as corresponding to such functionality in other embodiments may be omitted.
  • User interface 1240 may include one or more devices for enabling communication with a user.
  • the user interface can be any device or system that allows information to be conveyed and/or received, and may include a display, a mouse, and/or a keyboard for receiving user commands.
  • user interface 1240 may include a command line interface or graphical user interface that may be presented to a remote terminal via communication interface 1250.
  • the user interface may be located with one or more other components of the system, or may be located remotely from the system and in communication via a wired and/or wireless communications network.
  • Communication interface 1250 may include one or more devices for enabling communication with other hardware devices.
  • communication interface 1250 may include a network interface card (NIC) configured to communicate according to the Ethernet protocol.
  • communication interface 1250 may implement a TCP/IP stack for communication according to the TCP/IP protocols.
  • Various alternative or additional hardware or configurations for communication interface 1250 will be apparent.
  • Storage 1260 may include one or more machine-readable storage media such as read only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, or similar storage media.
  • storage 1260 may store instructions for execution by processor 1220 or data upon which processor 1220 may operate.
  • storage 1260 may store an operating system 1261 for controlling various operations of system 1200.
  • in embodiments where system 1200 implements a sequencer and includes sequencing hardware 1215, storage 1260 may include sequencing instructions 1262 for operating the sequencing hardware 1215, and sparse whole genome sequencing data 1263 obtained by the sequencing hardware 1215, although sparse whole genome sequencing data 1263 may be obtained from a source other than an associated sequencing platform.
  • memory 1230 may also be considered to constitute a storage device and storage 1260 may be considered a memory.
  • memory 1230 and storage 1260 may both be considered to be non-transitory machine-readable media.
  • non-transitory will be understood to exclude transitory signals but to include all forms of storage, including both volatile and non-volatile memories.
  • processor 1220 may include multiple microprocessors that are configured to independently execute the methods described herein or are configured to perform steps or subroutines of the methods described herein such that the multiple processors cooperate to achieve the functionality described herein.
  • processor 1220 may include a first processor in a first server and a second processor in a second server. Many other variations and configurations are possible.
  • storage 1260 of CNV profiling system 1200 may store one or more algorithms and/or instructions to carry out one or more functions or steps of the methods described or otherwise envisioned herein.
  • processor 1220 may comprise unadjusted CNV profile instructions 1264, adjusted segmentation values instructions 1265, adjustment score instructions 1266, selection instructions 1267, and reporting instructions 1268, among many other algorithms and/or instructions to carry out one or more functions or steps of the methods described or otherwise envisioned herein.
  • unadjusted CNV profile instructions or software 1264 direct the system to generate or determine an initial unadjusted CNV profile from the sparse genome data received or generated by the system, comprising a plurality of CNV calls for a plurality of chromosomes or other genomic regions or breakdowns.
  • the initial unadjusted CNV profile determination may be performed using any of a wide variety of different CNV analysis platforms or methods.
  • the unadjusted CNV profile instructions or software may direct the system to further process the initial unadjusted CNV profile.
  • the instructions or software or other instructions or software may direct the system to normalize the unadjusted CNV profile in any of a wide variety of ways and methods, including existing normalization methods.
  • the system may be configured to normalize the unadjusted CNV profile to a mean value of one.
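  • A minimal sketch of normalizing a profile to a mean value of one is shown below. It assumes the unadjusted profile is available as a plain list of per-segment values and uses an unweighted mean; the function name is hypothetical, and a length-weighted mean would be an equally plausible choice.

```python
from statistics import fmean


def normalize_to_mean_one(values):
    """Scale per-segment values so that their (unweighted) mean equals 1.0."""
    if not values:
        raise ValueError("profile is empty")
    mean = fmean(values)
    if mean <= 0:
        raise ValueError("profile mean must be positive")
    return [v / mean for v in values]


# Hypothetical unadjusted segment values (arbitrary units).
raw_profile = [0.8, 1.0, 1.2, 2.0]
normalized = normalize_to_mean_one(raw_profile)
print(normalized)
```

  • After this step each value can be read as a ratio relative to the sample-wide average signal.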
  • adjusted segmentation values instructions or software 1265 direct the system to determine adjusted segmentation values for the plurality of CNV calls.
  • the adjusted segmentation values may be determined in a variety of different ways and methods.
  • the adjusted segmentation values instructions or software receive one or more input parameters for analysis.
  • input can include a range for ploidy for the genome data and/or a range for a contamination rate for the genome data. These ranges can be predetermined, including being pre-programmed or otherwise received by or provided to the system.
  • a user such as a researcher, clinician, technician, or other user can provide the ranges as a setting, selection, or other information via a user interface or any other communication method.
  • adjusted segmentation values may be determined using the unadjusted segmentation value for a CNV segment, a ploidy value based on input, and/or a contamination rate value based on input.
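  • This section does not state the formula used for the adjustment, so the following sketch substitutes a commonly used two-component mixture model as an assumption: a fraction C of diploid normal cells mixed with tumor cells of average ploidy P, applied to a profile normalized to a mean of one. The function name and the model itself are illustrative and may differ from the computation actually used.

```python
def adjusted_segment_value(ratio: float, ploidy: float, contamination: float) -> float:
    """Estimate the tumor copy number of one segment from its normalized ratio.

    Assumed model (not taken from the source):
        ratio = (2*C + (1 - C)*n) / (2*C + (1 - C)*P)
    where C is the normal-cell contamination rate, P the tumor ploidy and
    n the tumor copy number. Solving for n gives the adjusted value.
    """
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    average_copies = 2.0 * contamination + (1.0 - contamination) * ploidy
    return (ratio * average_copies - 2.0 * contamination) / (1.0 - contamination)


# A segment with 3 tumor copies, tumor ploidy 2, and 50% contamination would
# show a ratio of (1.0 + 1.5) / (1.0 + 1.0) = 1.25 under this model.
print(adjusted_segment_value(1.25, ploidy=2.0, contamination=0.5))
```

  • A contamination rate of exactly 100% is excluded here because no tumor signal remains to be rescaled in that case.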
  • adjustment score instructions or software 1266 direct the system to determine a plurality of adjustment scores using adjusted segmentation values for the CNV segments.
  • the adjustment scores may be determined in a variety of different ways and methods.
  • an adjustment score measures, and may allow for the minimization of, the difference between an adjusted segmentation value and a whole integer closest to the value.
  • the system may be designed such that CNV segments are likely to be clonal, meaning they are likely to be an integer such as 1, 2, 3, etc. rather than a value such as 1.4, 2.6, 3.1, etc. This may represent an underlying assumption that CNV segments are likely to be an integer.
  • an adjustment score may be determined for an entire CNV profile, or the adjustment score may be determined for a sub-set of the CNV profile. This may be selected or otherwise determined by a user, may be selected or determined by the system, and/or may be selected or otherwise determined by other input or selection mechanism.
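  • The integer-distance idea described above can be sketched as follows. This is one possible scoring function under stated assumptions: the score is the (optionally weighted) mean absolute distance of each adjusted value from its nearest whole number, and the optional weights, for example segment lengths, are a hypothetical addition.

```python
def adjustment_score(adjusted_values, weights=None):
    """Mean absolute distance between adjusted values and their nearest integers.

    Lower is better: 0.0 means every segment sits exactly on an integer.
    weights (hypothetical, e.g. segment lengths) default to equal weighting.
    """
    if not adjusted_values:
        raise ValueError("no segments to score")
    if weights is None:
        weights = [1.0] * len(adjusted_values)
    if len(weights) != len(adjusted_values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    distance_sum = sum(w * abs(v - round(v)) for v, w in zip(adjusted_values, weights))
    return distance_sum / total_weight


near_integer = [1.02, 2.0, 2.97]   # segments close to 1, 2, 3
off_integer = [1.4, 2.6, 3.1]      # the non-clonal-looking values from the text
print(adjustment_score(near_integer), adjustment_score(off_integer))
```

  • Scoring only a subset of the profile amounts to passing a subset of values, or setting the weights of excluded segments to zero.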
  • selection instructions or software 1267 direct the system to identify a best fit adjusted CNV profile.
  • the combination of contamination rate (C) and ploidy estimate (P) that best minimizes error in the unadjusted CNV profile, and thus generates the most likely adjusted CNV profile, is selected as the best solution.
  • the segments will fall very close to integer values when the contamination rate and tumor ploidy values are correct.
  • the combination of contamination rate and tumor ploidy values that generates an adjustment score and adjusted CNV profile with the highest likelihood of accuracy is selected.
  • identifying a best fit adjusted CNV profile comprises comparison of the results of the adjustment score analysis to one or more factors to facilitate selection of a best fit CNV profile. This may result in one or more parameters or factors that may be used to select or influence selection of a best fit CNV profile, from among the profiles represented by the adjustment scores.
  • the parameters or factors may include, for example, preferences such as likely CNV segment integers, among others.
  • Other factors include ploidy distributions determined from previous data or analyses, and/or contamination distributions from previous data or analyses, including but not limited to analyses where one or more parameters of the sample or analysis were similar to the current sample or analysis. In other words, the system may utilize prior information to prioritize certain solutions.
  • selection instructions or software further direct the system to generate the best fit adjusted CNV profile using the selected adjustment. This can be performed by the system via a variety of methods and systems, to generate a final adjusted CNV profile that can be saved, reported, or otherwise stored or used by the CNV profiling system.
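  • The selection step can be illustrated as a grid search over the received ploidy and contamination ranges. The sketch below is not the disclosed implementation: it assumes a two-component mixture model for the adjustment, an integer-distance score, and a hypothetical `prior_penalty` hook standing in for prior ploidy or contamination distributions. All names are illustrative.

```python
from itertools import product


def adjust(ratio, ploidy, contamination):
    """Assumed mixture model: rescale a normalized ratio to tumor copy number."""
    average_copies = 2.0 * contamination + (1.0 - contamination) * ploidy
    return (ratio * average_copies - 2.0 * contamination) / (1.0 - contamination)


def score(ratios, ploidy, contamination):
    """Mean absolute distance of adjusted values from their nearest integers."""
    adjusted = [adjust(r, ploidy, contamination) for r in ratios]
    return sum(abs(v - round(v)) for v in adjusted) / len(adjusted)


def frange(start, stop, step):
    """Inclusive, evenly spaced grid; rounding avoids float drift like 0.30000000000000004."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def best_fit(ratios, ploidies, contaminations, prior_penalty=lambda p, c: 0.0):
    """Return (total_score, ploidy, contamination) with the lowest total score.

    prior_penalty is a hypothetical hook: it can add a cost to (P, C) pairs
    that prior data make unlikely, to break ties between candidate solutions.
    """
    candidates = [
        (score(ratios, p, c) + prior_penalty(p, c), p, c)
        for p, c in product(ploidies, contaminations)
    ]
    return min(candidates)


# Simulate ratios for tumor copy numbers 1..4 at ploidy 2.5 and 30% contamination.
true_ploidy, true_contamination = 2.5, 0.3
avg = 2.0 * true_contamination + (1.0 - true_contamination) * true_ploidy
simulated = [(2.0 * true_contamination + (1.0 - true_contamination) * n) / avg
             for n in (1, 2, 3, 4)]

ploidy_grid = frange(1.5, 4.0, 0.25)          # ploidy range with an assumed interval
contamination_grid = frange(0.0, 0.95, 0.05)  # 100% is excluded (no tumor signal)
print(best_fit(simulated, ploidy_grid, contamination_grid))
```

  • Real profiles can yield several near-zero scores, which is why a prior-based penalty or other tie-breaking factor is useful for choosing among them.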
  • reporting instructions or software 1268 direct the system to generate a user report comprising information about the analysis performed by the system.
  • a report may comprise one or more of the original unadjusted CNV profile, the generated adjusted CNV profile, the received ploidy range (P) and interval, the received contamination rate (C) and interval, one or more calculated adjusted segmentation values, one or more calculated adjustment scores, a best fit adjustment score, information about the factor or factors that influenced selection of the best fit CNV profile, and/or other information.
  • the reporting instructions or software 1268 may direct the system to store the generated report or information in temporary and/or long-term memory or other storage. This may be local storage within system 1200 or associated with system 1200, or may be remote storage which received the report or information from or via system 1200. Additionally and/or alternatively, the report or information may be communicated or otherwise transmitted to another system, recipient, process, device, and/or other local or remote location.
  • the reporting instructions or software 1268 may direct the system to provide the generated report to a user or other system.
  • the CNV profiling system may visually display information about the best fit CNV profile and/or any other generated information on the user interface, which may be a screen or other display.
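  • One way a text-based report of the kind described above could be assembled is sketched below. The field names, the `CnvReport` class, and the choice of JSON are all assumptions made for illustration; the disclosure does not prescribe a format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CnvReport:
    """Hypothetical container for the report contents listed in the text."""
    unadjusted_profile: list
    adjusted_profile: list
    ploidy_range: tuple
    ploidy_interval: float
    contamination_range: tuple
    contamination_interval: float
    best_fit_ploidy: float
    best_fit_contamination: float
    best_fit_score: float
    selection_notes: str = ""


def report_to_json(report: CnvReport) -> str:
    """Serialize the report as sorted, indented JSON text for storage or transmission."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


# Illustrative values only; they do not come from any real sample.
example = CnvReport(
    unadjusted_profile=[0.55, 0.85, 1.15, 1.45],
    adjusted_profile=[1.0, 2.0, 3.0, 4.0],
    ploidy_range=(1.5, 4.0),
    ploidy_interval=0.25,
    contamination_range=(0.0, 0.95),
    contamination_interval=0.05,
    best_fit_ploidy=2.5,
    best_fit_contamination=0.3,
    best_fit_score=0.0,
    selection_notes="lowest integer-distance score on the grid",
)
print(report_to_json(example))
```

  • Sorting the keys keeps the output stable between runs, which makes stored reports easier to compare.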
  • the CNV profiling system and approach described or otherwise envisioned herein enables a researcher, clinician, or other user to more accurately determine the CNV profile of the genetic sample, and thus to implement that information in research, diagnosis, treatment, and/or other decisions. This significantly improves the research, diagnosis, and/or treatment decisions of the researcher, clinician, or other user.
  • the methods and systems described herein comprise different limitations each comprising and analyzing millions of pieces of information.
  • sparse whole genome sequencing data comprises reads that number in the millions.
  • analyzing the data to generate an initial CNV profile requires millions of points of information.
  • the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
  • This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
  • inventive embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed.
  • inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein.

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method (100) for determining a copy number variation (CNV) profile, comprising: (i) receiving (110) sparse genome sequencing data; (ii) determining (120) an unadjusted CNV profile; (iii) normalizing (130) the unadjusted CNV profile; (iv) receiving (140) a range for a possible ploidy and for a possible contamination rate; (v) determining (150) adjusted segmentation values for the CNV profile; (vi) determining (160) a plurality of adjustment scores comprising a distance between an adjusted segmentation value and a nearest integer for a CNV call; (vii) comparing (170) the determined plurality of adjustment scores to one or more predetermined factors to select a best fit CNV profile; (viii) selecting (180) one of the plurality of adjustment scores as a best fit for the copy number variation profile of the tumor cells of the tumor; (ix) generating (190) an adjusted CNV profile report; and (x) reporting (192) the generated adjusted CNV profile report.
PCT/EP2020/084403 2019-12-12 2020-12-03 Method and system for determining a CNV profile for a tumor using sparse whole genome sequencing WO2021115906A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/779,624 US20230011085A1 (en) 2019-12-12 2020-12-03 Method and system for determining a cnv profile for a tumor using sparse whole genome sequencing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962947023P 2019-12-12 2019-12-12
US62/947,023 2019-12-12

Publications (1)

Publication Number Publication Date
WO2021115906A1 true WO2021115906A1 (fr) 2021-06-17

Family

ID=73726811

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/084403 WO2021115906A1 (fr) 2019-12-12 2020-12-03 Method and system for determining a CNV profile for a tumor using sparse whole genome sequencing

Country Status (2)

Country Link
US (1) US20230011085A1 (fr)
WO (1) WO2021115906A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014190286A2 (fr) * 2013-05-24 2014-11-27 Sequenom, Inc. Methods and systems for non-invasive assessment of genetic variations
WO2018136882A1 (fr) * 2017-01-20 2018-07-26 Sequenom, Inc. Methods for non-invasive assessment of copy number variations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEFKOWITZ ROY B ET AL: "Clinical validation of a noninvasive prenatal test for genomewide detection of fetal copy number variants", AMERICAN JOURNAL OF OBSTETRICS & GYNECOLOGY, MOSBY, ST LOUIS, MO, US, vol. 215, no. 2, 17 February 2016 (2016-02-17), XP029663242, ISSN: 0002-9378, DOI: 10.1016/J.AJOG.2016.02.030 *

Also Published As

Publication number Publication date
US20230011085A1 (en) 2023-01-12

Similar Documents

Publication Publication Date Title
US20200255909A1 (en) Integrated machine-learning framework to estimate homologous recombination deficiency
US7324926B2 (en) Methods for predicting chemosensitivity or chemoresistance
US20220130488A1 (en) Methods for detecting copy-number variations in next-generation sequencing
KR101828052B1 (ko) Method and apparatus for analyzing copy number variation (CNV) of a gene
US20240105282A1 (en) Methods for detecting bialllic loss of function in next-generation sequencing genomic data
JP2022516152A (ja) Transcriptome deconvolution of metastatic tissue samples
CN108804876A (zh) Method and apparatus for calculating cancer sample purity and chromosome ploidy
Madubata et al. Identification of potentially oncogenic alterations from tumor-only samples reveals Fanconi anemia pathway mutations in bladder carcinomas
US20230011085A1 (en) Method and system for determining a cnv profile for a tumor using sparse whole genome sequencing
US20210238689A1 (en) Tumor functional mutation and epitope loads as improved predictive biomarkers for immunotherapy response
US20230061214A1 (en) Guided analysis of single cell sequencing data using bulk sequencing data
US20210319849A1 (en) Method for assessing genome alignment basis
US20210158900A1 (en) A method and system for gene signature marker selection
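JP2021515569A (ja) Systems and methods using local unique features to interpret transcript expression levels in RNA sequencing data
CN114093417B (zh) Method and apparatus for identifying loss of heterozygosity in chromosome arms
EP4297037A1 (fr) Device for determining an indicator of the presence of HRD in a genome of a subject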
US20220399079A1 (en) Method and system for combined dna-rna sequencing analysis to enhance variant-calling performance and characterize variant expression status
Gafurov et al. Probabilistic Models of k-mer Frequencies
Fan et al. Performance comparison of Accucopy, Sequenza, and ControlFreeC
Reiser et al. Can matching improve the performance of boosting for identifying important genes in observational studies?
CN114242164A (zh) Method, apparatus, and storage medium for analyzing whole genome duplication
WANG et al. Bayesian Graphical Models for Integrating Multiplatform Genomics Data
Tolosi Analysis of Array CGH Data for the Estimation of Genetic Tumor Progression
Kamath et al. Toward a measure of classification complexity in gene expression signatures
WO2019016353A1 (fr) Classification of somatic mutations from a heterogeneous sample

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20820103

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20820103

Country of ref document: EP

Kind code of ref document: A1