CN110895959B - Method, apparatus, system and computer readable medium for evaluating gene copy number - Google Patents


Info

Publication number
CN110895959B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911089855.5A
Other languages
Chinese (zh)
Other versions
CN110895959A (en
Inventor
张水荣
施巍炜
王凯
柳文进
黄璐嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Origimed Technology Shanghai Co ltd
Original Assignee
Origimed Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Origimed Technology Shanghai Co ltd
Priority to CN201911089855.5A
Publication of CN110895959A
Application granted
Publication of CN110895959B
Legal status: Active

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10Ploidy or copy number detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search

Abstract

The invention provides a gene copy number evaluation method, device, system and computer-readable medium that improve the accuracy of gene copy number evaluation. The method evaluates the gene copy number in a detection sample from the tumor cell content of the sample and from sequencing analysis data obtained by sequencing and analyzing the detection sample and a control sample. The data include the detection corrected sequencing depth and the control corrected sequencing depth, obtained by correcting the sequencing depth of each sample. The method comprises the following steps: acquiring the detection corrected sequencing depth and the control corrected sequencing depth, and calculating a corrected sequencing depth ratio using formula (I); and calculating the corresponding gene copy number from the corrected sequencing depth ratio using formula (II).

Description

Method, apparatus, system and computer readable medium for evaluating gene copy number
Technical Field
The invention belongs to the field of biology, and particularly relates to a method, a device and a system for evaluating gene copy number and a computer readable medium.
Background
Gene copy number often needs to be analyzed when genetic variation is detected in tumor tissue. However, tumor tissue contains normal cells in addition to tumor cells. If the actual tumor cell content is not taken into account, the contribution of the normal cells is included in the analysis, which strongly affects the gene copy number result.
Disclosure of Invention
The invention provides a method, a device and a system for evaluating gene copy number and a computer readable medium, aiming at improving the accuracy of evaluating the gene copy number.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention provides a gene copy number evaluation method that evaluates the gene copy number in a detection sample from the tumor cell content of the sample and from sequencing analysis data obtained by sequencing and analyzing the detection sample and a control sample, the data containing the detection corrected sequencing depth and the control corrected sequencing depth obtained by correcting the sequencing depth of each sample. The method is characterized by comprising the following steps: acquiring the detection corrected sequencing depth and the control corrected sequencing depth, and calculating a corrected sequencing depth ratio using formula (I); and calculating the corresponding gene copy number from the corrected sequencing depth ratio using formula (II), wherein formula (I) and formula (II) are respectively as follows:
$$\mathrm{ratio} = \frac{TD}{CD} \qquad \text{(I)}$$
[Formula (II): equation image not reproduced]
In formula (I), ratio is the corrected sequencing depth ratio,
TD is the detection corrected sequencing depth of the detection sample,
CD is the control corrected sequencing depth of the control sample;
in formula (II), log2(ratio) is the base-2 logarithm of ratio,
and purity is the tumor cell content of the detection sample.
The gene copy number evaluation method provided by the present invention is characterized by further comprising a step of evaluating the tumor cell content of the sample to obtain the evaluated tumor cell content of the detection sample, which is used as the tumor cell content of the detection sample in formula (II). In this case the sequencing analysis data further include, for each mutation site, the average sample mutation frequency and the control mutation frequency obtained by sequencing and analyzing the detection sample and the control sample respectively. The evaluation comprises the following steps: obtaining, one site at a time, the average sample mutation frequency and the control mutation frequency of each mutation site that satisfies a predetermined mutation condition, and calculating the single tumor cell content corresponding to that site using formula (1); once the single tumor cell contents of all mutation sites satisfying the predetermined mutation condition have been obtained, calculating the tumor cell content of the detection sample from them using formula (2); and evaluating the tumor cell content calculated by formula (2) according to a predetermined evaluation rule to obtain the evaluated tumor cell content of the detection sample. Formula (1) and formula (2) are respectively as follows:
$$P = \frac{VAF_a - VAF_n}{VAF_t - VAF_n} \qquad (1)$$
$$p = \frac{P_1 + P_2 + \cdots + P_n}{n} \qquad (2)$$
In formula (1), for a mutation site satisfying the predetermined mutation condition, P is the calculated single tumor cell content corresponding to the mutation site,
VAFa is the average sample mutation frequency corresponding to the mutation site in the detection sample,
VAFt is the tumor cell mutation frequency corresponding to the mutation site in the detection sample, and VAFn is the control mutation frequency corresponding to the mutation site in the control sample;
in formula (2), p is the tumor cell content of the detection sample, n is the total number of mutation sites satisfying the predetermined mutation condition, and P1, P2, P3, P4 … Pn are the single tumor cell contents P corresponding to each of the n mutation sites.
The method for evaluating a gene copy number according to the present invention is characterized in that the predetermined mutation condition is such that the control mutation frequency corresponding to one mutation site is 0.4 or more and 0.6 or less.
The gene copy number evaluation method provided by the invention is characterized in that the predetermined evaluation rule is: the tumor cell content of the detection sample calculated by formula (2) is used directly as the evaluated tumor cell content of the detection sample.
The gene copy number evaluation method provided by the invention is alternatively characterized in that the predetermined evaluation rule is: the tumor cell content of the detection sample calculated by formula (2) is corrected using the following tumor cell content correction model, and the corrected tumor cell content is used as the evaluated tumor cell content of the detection sample,
y=ax-b (3)
in the formula (3), y is the content of tumor cells in the detection sample obtained after correction;
x is the calculated content p of the tumor cells in the detection sample,
a and b are model parameters, the value of a is 1.4, and the value range of b is 0.23-0.26.
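The correction model of formula (3) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name and the default b = 0.25 are assumptions (the text only states that b lies between 0.23 and 0.26), and the input 0.285 is an illustrative value.

```python
def correct_tumor_content(x: float, a: float = 1.4, b: float = 0.25) -> float:
    """Apply the linear correction y = a*x - b of formula (3).

    x is the tumor cell content calculated by formula (2).
    a = 1.4 comes from the text; b = 0.25 is an assumed default chosen
    inside the stated range 0.23 to 0.26.
    """
    if not 0.23 <= b <= 0.26:
        raise ValueError("b is expected to lie between 0.23 and 0.26")
    return a * x - b


# Illustrative input: a calculated tumor cell content of 0.285.
corrected = correct_tumor_content(0.285)
print(round(corrected, 3))
```

The sketch does not clamp the result to the interval [0, 1]; the text does not say whether such clamping is applied.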
The present invention also provides a gene copy number evaluation device, comprising: a corrected sequencing depth calculating unit that obtains the detection corrected sequencing depth and the control corrected sequencing depth and calculates a corrected sequencing depth ratio using formula (I); and a gene copy number calculating unit that calculates the corresponding gene copy number from the corrected sequencing depth ratio using formula (II),
wherein, the formula (one) and the formula (two) are respectively as follows:
$$\mathrm{ratio} = \frac{TD}{CD} \qquad \text{(I)}$$
[Formula (II): equation image not reproduced]
In formula (I), ratio is the corrected sequencing depth ratio,
TD is the detection corrected sequencing depth of the detection sample,
CD is the control corrected sequencing depth of the control sample;
in formula (II), log2(ratio) is the base-2 logarithm of ratio;
and purity is the tumor cell content of the detection sample.
The gene copy number evaluation device according to the present invention is characterized by further comprising a tumor cell content evaluation unit that evaluates the tumor cell content of the sample to obtain the evaluated tumor cell content of the detection sample, which is used as the tumor cell content of the detection sample in formula (II). The tumor cell content evaluation unit comprises: a single tumor cell content calculation part that obtains, one site at a time, the average sample mutation frequency and the control mutation frequency of each mutation site satisfying a predetermined mutation condition, and calculates the single tumor cell content corresponding to that site using formula (1); a tumor cell content calculation part that, once the single tumor cell contents of all mutation sites satisfying the predetermined mutation condition have been obtained, calculates the tumor cell content of the detection sample from them using formula (2); and a content evaluation part that evaluates the tumor cell content calculated by formula (2) according to a predetermined evaluation rule to obtain the evaluated tumor cell content of the detection sample. Formula (1) and formula (2) are respectively as follows:
$$P = \frac{VAF_a - VAF_n}{VAF_t - VAF_n} \qquad (1)$$
$$p = \frac{P_1 + P_2 + \cdots + P_n}{n} \qquad (2)$$
In formula (1), for a mutation site satisfying the predetermined mutation condition, P is the calculated single tumor cell content corresponding to the mutation site,
VAFa is the average sample mutation frequency corresponding to the mutation site in the detection sample,
VAFt is the tumor cell mutation frequency corresponding to the mutation site in the detection sample, and VAFn is the control mutation frequency corresponding to the mutation site in the control sample;
in formula (2), p is the calculated tumor cell content of the detection sample, n is the total number of mutation sites satisfying the predetermined mutation condition, and P1, P2, P3, P4 … Pn are the single tumor cell contents P corresponding to each of the n mutation sites.
The gene copy number evaluation device according to the present invention is further characterized in that the predetermined evaluation rule is: the tumor cell content of the detection sample calculated by formula (2) is used directly as the evaluated tumor cell content of the detection sample.
The gene copy number evaluation device according to the present invention is alternatively characterized in that the predetermined evaluation rule is: the tumor cell content of the detection sample calculated by formula (2) is corrected using the following tumor cell content correction model, and the corrected tumor cell content is used as the evaluated tumor cell content of the detection sample,
y=ax-b (3)
in the formula (3), y is the content p' of the tumor cells in the corrected detection sample;
x is the calculated content p of the tumor cells in the detection sample,
a and b are the parameters of the model,
the value of a is 1.4, and the value of b ranges from 0.23 to 0.26.
The present invention also provides a gene copy number evaluation system, comprising: the sequencing analysis device is used for respectively carrying out sequencing analysis on the detection sample and the control sample to obtain sequencing analysis data for gene copy number evaluation; and a gene copy number evaluation device for evaluating the gene copy number in the detection sample according to the sequencing analysis data, wherein the gene copy number evaluation device is the above-mentioned gene copy number evaluation device.
The present invention also provides a gene copy number evaluation apparatus, characterized by comprising: a memory for storing computer program instructions; and a processor for executing computer program instructions, wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the steps of the gene copy number evaluation method described above.
The present invention also provides a computer-readable medium, characterized in that the computer-readable medium stores a computer program, wherein the computer program is executable by a processor to implement the steps of the gene copy number evaluation method according to claim.
Action and Effect of the invention
Because the gene copy number evaluation method, device, system and computer-readable medium provided by the invention take the tumor cell content into account, the evaluation result of the gene copy number is more reliable than one obtained without considering the tumor cell content.
Drawings
FIG. 1 is a block diagram showing the construction of a gene copy number evaluation system according to example 1 of the present invention;
FIG. 2 is a block diagram showing the structure of a tumor cell content evaluation module according to example 1 of the present invention;
FIG. 3 is a flowchart showing the operation of the gene copy number evaluation system according to example 1 of the present invention;
FIG. 4 is a block diagram showing the construction of a gene copy number evaluation system according to example 2 of the present invention;
FIG. 5 shows the verification results of the tumor cell content evaluation unit and method according to example 1 of the present invention.
Detailed Description
Embodiments of the present invention are described below with reference to the drawings. Based on the technical idea of the invention and on existing technologies, those skilled in the art may make routine substitutions for the specific methods or materials used in the embodiments; the invention is not limited to the specific descriptions of the embodiments.
The methods used in the examples are conventional methods unless otherwise specified; the materials, reagents and the like used are commercially available unless otherwise specified.
In the following embodiments, the test sample is from the subject to be tested, in particular from, for example, tumor tissue; and a control sample refers to a sample from the same subject to be tested as a control, in particular from, for example, blood or tissue adjacent to cancer.
The sampling and sequencing processes for the various samples referred to in the various examples below are all conventional methods and processes.
Example 1
This example is a corresponding illustration of a tumor cell content assessment system and corresponding processing.
FIG. 1 is a block diagram showing the construction of a gene copy number evaluation system according to example 1 of the present invention.
As shown in fig. 1, the gene copy number evaluation system 100 includes: a sequencing analyzer 1 and a gene copy number estimating apparatus 2 communicatively connected to the sequencing analyzer 1 via a communication network 3.
The sequencing analysis device 1 performs sequencing analysis on the detection sample and the control sample to obtain the sequencing analysis data used for gene copy number evaluation. Specifically, it analyzes the sequencing results of the two samples to obtain sequencing analysis data that include the detection corrected sequencing depth and the control corrected sequencing depth, that is, the sequencing depths of the detection sample and the control sample after correction.
Specifically, the detection corrected sequencing depth and the control corrected sequencing depth are obtained as follows.
First, the gene sequencing depth in the tumor sample and the control sample is normalized as follows:
[Normalization formula: equation image not reproduced]
where:
Dnor: sequencing depth after normalization;
Dgene: sequencing depth of the gene;
Dmin: minimum sequencing depth among all genes;
Dmax: maximum sequencing depth among all genes.
Second, the gene sequencing depth in the tumor sample and the control sample is corrected for GC content. Each gene consists of the four bases A, G, C and T, and the GC content is the proportion of the bases G and C among the four bases. A local polynomial regression (LOESS/LOWESS) model is used to fit the normalized depth Dnor against GC content, and the fitted model is then used to obtain a predicted value Dpre. The ratio of Dnor to Dpre is the sequencing depth after GC correction, that is, TD (the depth of the tumor sample after normalization and GC correction, the detection corrected sequencing depth) or CD (the depth of the control sample after normalization and GC correction, the control corrected sequencing depth).
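The two correction steps can be sketched as follows. This is only an illustration under stated assumptions: the normalization formula is not reproduced in this text, so the sketch assumes the common min-max form (Dgene - Dmin) / (Dmax - Dmin) suggested by the listed symbols, and it takes the LOESS predictions Dpre as an input rather than fitting them. All function names are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(depths: Sequence[float]) -> List[float]:
    """Normalize per-gene depths; ASSUMED form (Dgene - Dmin) / (Dmax - Dmin)."""
    d_min, d_max = min(depths), max(depths)
    if d_max == d_min:
        raise ValueError("all depths are equal; normalization is undefined")
    return [(d - d_min) / (d_max - d_min) for d in depths]


def gc_corrected_depth(d_nor: Sequence[float], d_pre: Sequence[float]) -> List[float]:
    """Return Dnor / Dpre per gene, as described for the GC correction.

    d_pre holds the values predicted by the LOESS model fitted against
    GC content; fitting that model is outside this sketch.
    """
    if len(d_nor) != len(d_pre):
        raise ValueError("d_nor and d_pre must have the same length")
    if any(p == 0 for p in d_pre):
        raise ValueError("predicted depth must be non-zero")
    return [n / p for n, p in zip(d_nor, d_pre)]


# Hypothetical depths for three genes and hypothetical LOESS predictions.
d_nor = min_max_normalize([100.0, 200.0, 300.0])
corrected = gc_corrected_depth(d_nor, [0.5, 0.5, 0.5])
print(d_nor, corrected)
```

The same two functions would be applied separately to the tumor sample (giving TD) and to the control sample (giving CD).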
FIG. 2 is a block diagram showing the structure of a gene copy number evaluating apparatus according to example 1 of the present invention.
As shown in fig. 2, the gene copy number evaluating apparatus 2 evaluates the gene copy number in a test sample based on the above-mentioned sequencing analysis data, and includes: a corrected sequencing depth calculating unit 10, a tumor cell content calculating unit 20, a gene copy number calculating unit 30, a gene copy number evaluation side temporary storage unit 40, a gene copy number evaluation side communication unit 50, and a gene copy number evaluation side control unit 60.
The gene copy number evaluation side communication unit 50 receives the above-mentioned sequencing analysis data from the sequencing analysis apparatus 1 via the communication network 3.
The corrected sequencing depth calculating unit 10 obtains the detection corrected sequencing depth and the control corrected sequencing depth from the received sequencing analysis data and calculates the corrected sequencing depth ratio using formula (I), which is as follows:
$$\mathrm{ratio} = \frac{TD}{CD} \qquad \text{(I)}$$
In formula (I), ratio is the corrected sequencing depth ratio,
TD is the detection corrected sequencing depth of the detection sample,
CD is the control corrected sequencing depth of the control sample.
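A minimal sketch of this step, assuming that formula (I) is the plain quotient TD / CD (the formula image is not reproduced in this text, so the quotient is inferred from the term "ratio"). The function names and input values are hypothetical.

```python
import math


def depth_ratio(td: float, cd: float) -> float:
    """Corrected sequencing depth ratio, ASSUMED to be TD / CD."""
    if cd <= 0:
        raise ValueError("control corrected depth must be positive")
    return td / cd


def log2_depth_ratio(td: float, cd: float) -> float:
    """Base-2 logarithm of the ratio, the quantity that formula (II) refers to."""
    ratio = depth_ratio(td, cd)
    if ratio <= 0:
        raise ValueError("ratio must be positive to take a logarithm")
    return math.log2(ratio)


# Hypothetical corrected depths for one gene.
print(depth_ratio(1.5, 1.0), log2_depth_ratio(2.0, 1.0))
```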
The gene copy number calculation unit 30 calculates the corresponding gene copy number according to the corrected sequencing depth ratio by using a formula (two), wherein the formula (two) is specifically as follows:
[Formula (II): equation images not reproduced]
In formula (II), log2(ratio) is the base-2 logarithm of ratio;
and purity is the tumor cell content of the detection sample.
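Formula (II) itself is not reproduced in this text, so the following sketch does not claim to implement it. As a stand-in, it uses a commonly used two-component mixture model in which the normal cells are assumed to carry 2 copies: ratio = (purity × CN + (1 - purity) × 2) / 2, solved for CN. The function name and this model are assumptions, shown only to illustrate how log2(ratio) and purity can be combined.

```python
def purity_adjusted_copy_number(log2_ratio: float, purity: float,
                                normal_copies: float = 2.0) -> float:
    """Tumor copy number under an ASSUMED mixture model (not formula (II)).

    observed ratio = (purity * CN + (1 - purity) * normal_copies) / normal_copies
    so CN = (normal_copies * ratio - normal_copies * (1 - purity)) / purity
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in the interval (0, 1]")
    ratio = 2.0 ** log2_ratio
    return (normal_copies * ratio - normal_copies * (1.0 - purity)) / purity


# With no depth change (log2 ratio of 0) the model returns the normal 2 copies.
print(purity_adjusted_copy_number(0.0, 0.5))
```

Under this assumed model, the lower the purity, the more a given depth ratio is scaled up, which is the effect of normal-cell contamination that the Background section describes.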
In this embodiment, the tumor cell content evaluation unit 20 is further included to evaluate the tumor cell content in the sample to obtain the tumor cell content in the evaluated sample, which is used as the purity (the content of tumor cells in the test sample) in the formula (ii). The sequencing analysis data further includes average sample mutation frequencies corresponding to the mutation sites and control mutation frequencies corresponding to the mutation sites, which are obtained by sequencing analysis of the detection sample and the control sample, respectively, and the tumor cell content evaluation unit 20 evaluates the tumor cell content in the detection sample according to the average sample mutation frequencies and the control mutation frequencies.
Specifically, the sequencing analysis device 1 also obtains the above average sample mutation frequency and control mutation frequency as follows: the detection sample and the control sample are sequenced on a high-throughput second-generation sequencing platform, and the resulting sequences are aligned to the human reference genome. For each mutation site, suppose that a total of M sequences cover the site, of which N sequences differ from the human reference genome at the site and M-N sequences agree with it. The average sample mutation frequency of the detection sample, or the control mutation frequency of the control sample, at that site is then
$$VAF = \frac{N}{M}$$
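The mutation frequency described above is the fraction of covering sequences that differ from the reference, N / M. A minimal sketch, with a hypothetical function name and hypothetical read counts:

```python
def mutation_frequency(n_mismatch: int, m_total: int) -> float:
    """Fraction of the M covering sequences that differ from the reference (N / M)."""
    if m_total <= 0:
        raise ValueError("at least one sequence must cover the site")
    if not 0 <= n_mismatch <= m_total:
        raise ValueError("N must be between 0 and M")
    return n_mismatch / m_total


# Hypothetical counts: 40 of 100 covering sequences differ from the reference.
print(mutation_frequency(40, 100))  # 0.4
```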
As shown in fig. 2, specifically, in the present embodiment, the tumor cell content evaluation unit 20 has: a single tumor cell content calculation unit 21, a tumor content calculation unit 22, a content evaluation unit 23, and an information storage unit 24.
The information storage unit 24 stores the received average sample mutation frequencies corresponding to the respective mutation sites and the control mutation frequencies corresponding to the respective mutation sites in association with each other, and the specific association is shown in table 1.
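[Table 1: table image not reproduced; it lists the average sample mutation frequency and the control mutation frequency for each mutation site]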
The single tumor cell content calculation section 21 acquires the average sample mutation frequency and the control mutation frequency of the mutation sites satisfying the predetermined mutation conditions one by one, and calculates to obtain the single tumor cell content corresponding to the mutation site by using the formula (1).
Specifically, in this embodiment, the single tumor cell content calculation section 21 reads the correspondingly stored information from the information storage unit 24 and, for each mutation site judged to satisfy the predetermined mutation condition, obtains the average sample mutation frequency and the control mutation frequency corresponding to that site. The predetermined mutation condition is that the control mutation frequency corresponding to the mutation site is greater than or equal to 0.4 and less than or equal to 0.6, that is, 0.4 ≤ VAFn ≤ 0.6.
For example, in Table 1, the control mutation frequency of mutation site 1 is 0.5, which satisfies the predetermined mutation condition, so the corresponding average sample mutation frequency (0.4) and control mutation frequency (0.5) are obtained. For mutation site 2, VAFn = 0.8 > 0.6, so the site does not satisfy the predetermined mutation condition and is discarded.
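The site filter can be sketched as a simple predicate on the control mutation frequency. The record layout and function name are hypothetical; site 1 uses the values quoted in the text, while the sample mutation frequency given for site 2 is a made-up placeholder because the text only states its control frequency.

```python
def satisfies_mutation_condition(vaf_n: float, low: float = 0.4, high: float = 0.6) -> bool:
    """Predetermined mutation condition: low <= VAFn <= high."""
    return low <= vaf_n <= high


sites = [
    {"site": 1, "vaf_a": 0.4, "vaf_n": 0.5},  # values quoted in the text
    {"site": 2, "vaf_a": 0.7, "vaf_n": 0.8},  # vaf_a here is hypothetical
]

# Keep only the sites whose control mutation frequency is in [0.4, 0.6].
kept = [s for s in sites if satisfies_mutation_condition(s["vaf_n"])]
print([s["site"] for s in kept])  # [1]
```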
Then formula (1) is used to calculate the tumor cell content corresponding to each mutation site; for convenience of expression, this per-site value is referred to as the single tumor cell content. That is, for a mutation site satisfying the predetermined mutation condition, the obtained average sample mutation frequency and control mutation frequency are substituted into formula (1) to obtain the single tumor cell content corresponding to that mutation site, wherein formula (1) is as follows:
$$P = \frac{VAF_a - VAF_n}{VAF_t - VAF_n} \qquad (1)$$
In formula (1), for a mutation site satisfying the predetermined mutation condition,
P is the calculated single tumor cell content corresponding to the mutation site,
VAFa is the average sample mutation frequency corresponding to the mutation site in the detection sample,
VAFt is the tumor cell mutation frequency corresponding to the mutation site in the detection sample:
when VAFa ≤ VAFn, VAFt = VAFa/2;
when VAFa > VAFn, VAFt = (VAFa + 1)/2;
VAFn is the control mutation frequency corresponding to the mutation site in the control sample.
The content of single tumor cells corresponding to each mutation site meeting the preset mutation condition can be obtained by acquiring each mutation site meeting the preset mutation condition one by one and calculating by adopting a formula (1).
For example, in Table 1, if there are 5 mutation sites in total (m = 5), mutation sites 1, 3, 4 and 5 satisfy the predetermined mutation condition, and VAFt takes the following values according to the rule for VAFt:
for mutation site 1, VAFt = 0.4/2 = 0.2;
for mutation site 3, VAFt = (0.55 + 1)/2 ≈ 0.78;
for mutation site 4, VAFt = (0.5 + 1)/2 = 0.75;
for mutation site 5, VAFt = 0.45/2 ≈ 0.23.
Accordingly, the individual tumor cell content of each of the several mutation sites was calculated as: 0.33, 0.18, 0.22, and 0.41.
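The per-site calculation can be sketched as follows. The VAFt rule is taken from the text. The form of formula (1) used here, P = (VAFa - VAFn) / (VAFt - VAFn), is an assumption inferred from the worked example, since the formula image is not reproduced: for mutation site 1 (VAFa = 0.4, VAFn = 0.5) it gives 0.33, matching the first value listed above. Function names are hypothetical.

```python
def tumor_vaf(vaf_a: float, vaf_n: float) -> float:
    """VAFt rule from the text: VAFa/2 if VAFa <= VAFn, else (VAFa + 1)/2."""
    return vaf_a / 2 if vaf_a <= vaf_n else (vaf_a + 1) / 2


def single_site_content(vaf_a: float, vaf_n: float) -> float:
    """Single tumor cell content P; ASSUMED form (VAFa - VAFn) / (VAFt - VAFn)."""
    vaf_t = tumor_vaf(vaf_a, vaf_n)
    denominator = vaf_t - vaf_n
    if denominator == 0:
        raise ValueError("VAFt equals VAFn; P is undefined for this site")
    return (vaf_a - vaf_n) / denominator


# Mutation site 1 from the text: VAFa = 0.4, VAFn = 0.5.
print(round(tumor_vaf(0.4, 0.5), 2), round(single_site_content(0.4, 0.5), 2))
```

The assumed form corresponds to treating the observed frequency as a mixture, VAFa = P × VAFt + (1 - P) × VAFn, and solving for P.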
Once all the single tumor cell contents corresponding to all the mutation sites satisfying the predetermined mutation condition are obtained, the tumor content calculation section 22 obtains all the calculated single tumor cell contents and calculates the tumor cell content in the test sample by using the formula (2), wherein the formula (2) is as follows:
$$p = \frac{P_1 + P_2 + \cdots + P_n}{n} \qquad (2)$$
In formula (2), p is the calculated tumor cell content of the detection sample, n is the total number of mutation sites satisfying the predetermined mutation condition, and P1, P2, P3, P4 … Pn are the single tumor cell contents P corresponding to the n mutation sites.
For example, in Table 1, a total of 4 (n = 4) single tumor cell contents were calculated, namely 0.33 (P1), 0.18 (P2), 0.22 (P3) and 0.41 (P4), so the tumor cell content of the detection sample is p = (0.33 + 0.18 + 0.22 + 0.41)/4 = 0.285.
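The averaging step of this worked example can be checked with a short sketch; the function name is hypothetical and the four inputs are the per-site values quoted in the text.

```python
from typing import Sequence


def tumor_cell_content(single_contents: Sequence[float]) -> float:
    """Arithmetic mean of the per-site single tumor cell contents."""
    if not single_contents:
        raise ValueError("no mutation site satisfied the predetermined condition")
    return sum(single_contents) / len(single_contents)


# Per-site values P1..P4 quoted in the text.
p = tumor_cell_content([0.33, 0.18, 0.22, 0.41])
print(round(p, 3))  # 0.285
```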
The content evaluation unit evaluates the tumor cell content of the detection sample calculated by formula (2) according to a predetermined evaluation rule to obtain the final tumor cell content of the detection sample. For convenience of description, this final value is referred to in this embodiment as the evaluated tumor cell content of the detection sample. In this embodiment, the predetermined evaluation rule is: the tumor cell content calculated by formula (2) is used directly as the evaluated tumor cell content of the detection sample. Accordingly, the value 0.285 calculated above is taken directly as the evaluated tumor cell content of the detection sample.
The gene copy number evaluation side temporary storage unit 40 temporarily stores relevant data or parameters generated by the operation of the gene copy number evaluation apparatus 2.
The gene copy number evaluation side control unit 60 includes a computer program that controls the operation of the post-correction sequencing depth calculation unit 10, the tumor cell content calculation unit 20, the gene copy number calculation unit 30, the gene copy number evaluation side temporary storage unit 40, and the gene copy number evaluation side communication unit 50.
FIG. 3 is a flowchart showing the operation of the gene copy number evaluation system according to example 1 of the present invention.
As shown in fig. 3, in the present embodiment, the operation flow of the gene copy number evaluation system 100 includes the following steps:
step S1, the sequencing analysis device 1 obtains the sequencing analysis data required for gene copy number evaluation and sends it to the gene copy number evaluation device 2 through the communication network 3, and the process then proceeds to step S2;
step S2, the gene copy number evaluation side communication unit 50 receives the sequencing analysis data from the sequencing analysis device 1 via the communication network 3, the information storage 24 stores each average sample mutation frequency and each control mutation frequency in association with the corresponding mutation site, and the process then proceeds to step S3;
step S3, the single tumor cell content calculation section 21 obtains, one site at a time, the average sample mutation frequency and the control mutation frequency of a mutation site satisfying the predetermined mutation condition, calculates the single tumor cell content corresponding to that mutation site using formula (1), and the process then proceeds to step S4;
step S4, the gene copy number evaluation side control unit 60 judges whether the single tumor cell contents corresponding to all mutation sites satisfying the predetermined mutation condition have been obtained; if yes, the process proceeds to step S5, and if no, the process returns to step S3;
step S5, the tumor content calculation section 22 obtains all the calculated single tumor cell contents, calculates the tumor cell content in the test sample using formula (2), and the process then proceeds to step S6;
step S6, the content evaluation part evaluates the tumor cell content in the test sample calculated by formula (2) according to the predetermined evaluation rule to obtain the evaluated tumor cell content in the test sample, and the process then proceeds to step S7;
step S7, the gene copy number calculation unit 30 calculates the corresponding gene copy number using formula (II), based on the corrected sequencing depth ratio and the evaluated tumor cell content in the test sample.
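The control flow of steps S3 to S6 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: formulas (1) and (2) appear in this document only as figures, so they are passed in as caller-supplied functions (`formula_1`, `formula_2`) rather than reproduced. The names `MutationSite`, `meets_mutation_condition` and `evaluate_tumor_cell_content` are hypothetical. The default bounds 0.4 and 0.6 follow the predetermined mutation condition stated in claim 3, and the default identity rule corresponds to taking the formula (2) result directly as the evaluated content.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class MutationSite:
    """One mutation site with the two frequencies stored in step S2."""
    site_id: str
    sample_vaf: float   # average sample mutation frequency (test sample)
    control_vaf: float  # control mutation frequency (control sample)


def meets_mutation_condition(site: MutationSite,
                             low: float = 0.4, high: float = 0.6) -> bool:
    """Predetermined mutation condition: control frequency within [low, high]."""
    return low <= site.control_vaf <= high


def evaluate_tumor_cell_content(
    sites: Iterable[MutationSite],
    formula_1: Callable[[MutationSite], float],
    formula_2: Callable[[List[float]], float],
    evaluation_rule: Callable[[float], float] = lambda content: content,
) -> float:
    """Run steps S3 to S6 with caller-supplied formulas (assumed interfaces)."""
    # Steps S3 and S4: compute the single tumor cell content for every
    # mutation site that satisfies the predetermined mutation condition.
    per_site = [formula_1(s) for s in sites if meets_mutation_condition(s)]
    if not per_site:
        raise ValueError("no mutation site satisfies the predetermined condition")
    # Step S5: combine the per-site values into one sample-level value.
    combined = formula_2(per_site)
    # Step S6: apply the predetermined evaluation rule.
    return evaluation_rule(combined)
```

Passing the formulas as functions keeps the sketch independent of their exact form; step S7 would then feed the returned value into formula (II) as purity.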
Example 2
The following is a description of example 2.
In embodiment 2, the same components as those in embodiment 1 are given the same reference numerals, and the same descriptions are omitted.
FIG. 4 is a block diagram showing the structure of a gene copy number evaluating apparatus according to example 2 of the present invention.
As shown in fig. 4, in the present embodiment, the gene copy number evaluation apparatus 4 has a post-correction sequencing depth calculation unit 10, a tumor cell content calculation unit 220, a gene copy number calculation unit 30, a gene copy number evaluation side temporary storage unit 40, a gene copy number evaluation side communication unit 50, and a gene copy number evaluation side control unit 60.
In this embodiment, the gene copy number evaluation apparatus 4 differs from embodiment 1 in the predetermined evaluation rule used by the content evaluation section 223 of the tumor cell content calculation unit 220. The predetermined evaluation rule in this embodiment is: the tumor cell content in the test sample calculated by formula (2) is corrected using the following tumor cell content correction model, and the corrected tumor cell content is taken as the evaluated tumor cell content in the test sample,
y=ax-b (3)
in the formula (3), y is the tumor cell content in the test sample obtained after correction;
x is the tumor cell content P in the test sample calculated using formula (2);
a and b are model parameters, the value of a is 1.4, and the value of b ranges from 0.23 to 0.26.
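As a minimal sketch of the correction model in formula (3), the function below applies y = ax - b with a = 1.4 and b restricted to the stated range 0.23 to 0.26. The function name `correct_tumor_cell_content` and the default b = 0.25 (one of the values used in example 4) are illustrative choices, and no clamping of y to the interval 0 to 1 is applied because the text does not specify any.

```python
def correct_tumor_cell_content(x: float, a: float = 1.4, b: float = 0.25) -> float:
    """Apply the linear correction model y = a*x - b of formula (3).

    x is the tumor cell content calculated by formula (2), as a fraction.
    """
    # The text gives a = 1.4 and b in the range 0.23 to 0.26.
    if not 0.23 <= b <= 0.26:
        raise ValueError("b is expected to lie between 0.23 and 0.26")
    return a * x - b


# Example: a formula (2) result of 0.50 is corrected to roughly 0.45.
corrected = correct_tumor_cell_content(0.50)
```

Because a is greater than 1 and b is positive, the model raises calculated values above roughly 0.6 and lowers values below that point.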
Example 3
This example verifies the reliability of evaluating the tumor cell content in a sample using the tumor cell content calculation unit 20 of the tumor cell content evaluation system of example 1 and the corresponding processing. The details are as follows.
Sample selection: the paired tumor cell line samples and normal cell line samples listed in Table 2 were purchased as commercial cell lines. The tumor cell lines consist entirely of tumor cells (100% tumor cells), and the normal cell lines consist entirely of normal cells (100% normal cells).
Figure BDA0002266527520000171
The samples were mixed by two methods:
Method one: before sequencing (experimental stage), DNA extracted from the tumor cell lines and DNA extracted from the normal cell lines are mixed in different proportions, for example as shown in Table 3, and targeted sequencing of the target region is then performed on the mixed samples.
Figure BDA0002266527520000172
Method two: at the data analysis stage, mixing is performed by data extraction. Specifically, the 6 pure cell lines in Table 1 are first sequenced to obtain sequencing data, and data are then extracted from the sequencing data in proportion to form the mixed samples; the specific ratios are shown in Table 4.
Figure BDA0002266527520000181
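The data extraction mixing of method two can be illustrated by the following sketch, which draws reads from a pure tumor data set and a pure normal data set in a chosen proportion. It rests on assumptions not stated in the text: reads are represented as simple list items, the mixing ratio is applied to read counts, and both cell lines are taken to contribute comparable amounts of DNA per cell. The function name `mix_reads` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def mix_reads(tumor_reads: Sequence[str], normal_reads: Sequence[str],
              tumor_fraction: float, total_reads: int, seed: int = 0) -> List[str]:
    """Draw reads from two pure samples to simulate a mixed sample."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # Number of reads to take from each pure data set.
    n_tumor = round(tumor_fraction * total_reads)
    n_normal = total_reads - n_tumor
    if n_tumor > len(tumor_reads) or n_normal > len(normal_reads):
        raise ValueError("not enough reads to draw the requested mixture")
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    # Sample without replacement from each source, then shuffle together.
    mixed = rng.sample(list(tumor_reads), n_tumor) + rng.sample(list(normal_reads), n_normal)
    rng.shuffle(mixed)
    return mixed
```

In practice the extraction would operate on sequencing files rather than in-memory lists; the sketch only shows how the proportion determines the number of reads taken from each source.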
Mixing by method one and method two simulates tumor tissue samples whose actual tumor cell content is known. Because the actual tumor cell content of each cell line mixture is known from the mixing ratio, it can serve as the reference standard for testing the reliability of the tumor cell content evaluation system and of the tumor cell content evaluation in a sample.
Therefore, in this example, the existing FACETS and PureCN algorithms, together with the tumor cell content evaluation system and corresponding processing procedure of example 1 (named oriprecision for convenience of description), are each used to calculate or evaluate the tumor cell content of the samples mixed by method one and method two. The normal cell line serves as the control sample, and the mixture of the normal cell line and its paired tumor cell line serves as the test sample.
FIG. 5 shows the verification results for the tumor cell content evaluation unit and method according to example 1 of the present invention.
The tumor cell contents (x) of the test samples obtained by each of the above three methods are shown in Table 5.
Figure BDA0002266527520000191
The results were regressed against the actual tumor cell content after mixing (i.e., the mixed tumor cell content in Tables 3 and 4), and the regressions are shown in FIG. 5.
In FIG. 5, the left column shows the regression between the tumor cell content calculated or evaluated for the method-one mixtures (TestPurity) and the actual tumor cell content after method-one mixing (TruePurity, the mixed tumor cell content in Table 3); the right column shows the regression between the tumor cell content calculated or evaluated for the method-two mixtures (TestPurity) and the actual tumor cell content after method-two mixing (TruePurity, the mixed tumor cell content in Table 4). The correlation coefficient R lies between -1 and 1 inclusive: when R is less than 0, x and y are negatively correlated; when R equals 0, x and y are uncorrelated; and when R is greater than 0, x and y are positively correlated. The larger the absolute value of R, the stronger the correlation between x and y and the better the regression fit. The smaller the p value, the more statistically reliable the correlation between x and y.
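For illustration, the correlation coefficient R described above can be computed as in the sketch below. It assumes that R is the Pearson correlation coefficient, which the text does not state explicitly, and the function name `pearson_r` is hypothetical. The accompanying p value is not computed here; a statistics library such as SciPy (`scipy.stats.pearsonr`) returns both quantities.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between estimated (x) and actual (y) contents."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A value of R close to 1 between TestPurity and TruePurity indicates that the estimates rise in step with the actual mixed tumor cell content.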
As can be seen from FIG. 5, for both the method-one and the method-two mixtures, the apparatus and method provided in example 1 give the largest R and the smallest p. That is, compared with the other two methods, the tumor cell content obtained by the apparatus and method of example 1 is closest to the actual value and is therefore the most accurate and reliable.
Example 4
In this example, after samples 1 to 5 provided in example 3 were mixed by method one, the tumor cell content evaluation units 20 and corresponding processing procedures of example 1 and example 2 were each used to obtain the evaluated tumor cell content in the test samples. The results are shown in Table 6. Each sample was repeated 5 times; in example 2, b was set to 0.23, 0.25, and 0.26, respectively; and the actual tumor cell content is the mixed tumor cell content of the samples mixed by method one above.
Figure BDA0002266527520000201
Figure BDA0002266527520000211
As can be seen from Table 6, the average closeness of the tumor cell contents obtained in example 1 to the actual tumor cell content was 59%, whereas the average closeness of the tumor cell contents obtained in example 2 with the different values of b was 79%. The corrected tumor cell content of example 2 is therefore closer to the actual tumor cell content, an improvement of 20 percentage points over the uncorrected result of example 1.
Next, the copy number evaluation system 200 and corresponding processes of example 1 and example 2 were each used to evaluate the copy number of the genes with copy number variation in the corresponding test samples. For comparison, the copy number was also evaluated without considering the tumor cell content, in which case the purity value in formula (II) is 1. The results are shown in Table 7.
Figure BDA0002266527520000212
Figure BDA0002266527520000221
As can be seen from Table 7, the average closeness of the copy number results to the actual copy number was 74% for example 1, 77% for example 2 with the different b values, and 53% when the tumor cell content was not considered.
Thus, copy number evaluation using the tumor cell content obtained in example 1 or example 2 is at least 20 percentage points closer to the actual copy number than evaluation that does not consider the tumor cell content, and is therefore more reliable. The copy number obtained with the corrected tumor cell content of example 2 is closer still to the actual copy number, an improvement of 3 percentage points over the uncorrected result of example 1.
Functions and effects of the embodiments
Example 1 and example 2 relate to a method, apparatus and system for evaluating gene copy number. As can be seen from example 4, compared with copy number evaluation that does not consider the tumor cell content, taking the tumor cell content into account improves the reliability of the gene copy number evaluation result;
as can be seen from example 3, compared with the two existing methods for calculating the tumor cell content, the tumor cell content evaluation of example 1 shows a better regression between the evaluated tumor cell content of the test sample and the actual value, that is, its analysis of the tumor cell content in the test sample is more reliable. The tumor cell content evaluation result provided by the gene copy number evaluation system and corresponding method of the present invention can therefore improve the gene copy number evaluation result;
the tumor cell content evaluation of example 2 is in effect a further correction of the evaluation result of example 1. As can be seen from example 4, this correction yields a more accurate tumor cell content than example 1, and the gene copy number obtained from that tumor cell content is correspondingly more accurate than in example 1. That is, the tumor cell content evaluation of example 2 analyzes the tumor cell content in the test sample more reliably, and is therefore also more reliable when applied to gene copy number evaluation that considers the tumor cell content.
Correspondingly, the invention also discloses a gene copy number evaluation device, which comprises: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, cause the device to perform the steps of the method performed by the gene copy number evaluation device of the embodiments. For technical details, reference may be made to the embodiments above, which are not repeated here.
The present invention also discloses a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of operating the gene copy number evaluation device described above. For details, reference may be made to the embodiments above, which are not repeated here.

Claims (12)

1. A gene copy number evaluation method is used for evaluating the gene copy number in a detection sample according to the content of tumor cells in the sample and sequencing analysis data, wherein the sequencing analysis data comprises a detection correction sequencing depth and a control correction sequencing depth after sequencing analysis and sequencing depth correction are respectively carried out on the detection sample and a control sample, and the method is characterized by comprising the following steps of:
obtaining the detection correction sequencing depth and the control correction sequencing depth, and calculating by adopting a formula (I) to obtain a corrected sequencing depth ratio;
calculating by adopting a formula (II) according to the corrected sequencing depth ratio to obtain the corresponding gene copy number,
wherein formula (I) and formula (II) are respectively as follows:
Figure FDA0003554620650000011
Figure FDA0003554620650000012
in the formula (I), ratio is the corrected sequencing depth ratio,
TD is the detection correction sequencing depth of the detection sample,
CD is the control correction sequencing depth of the control sample;
in the formula (II), log2ratio is the logarithm of ratio with base 2,
and purity is the content of the tumor cells in the detection sample.
2. The method for evaluating a gene copy number according to claim 1, further comprising:
the tumor cell content in the sample is the tumor cell content in the evaluated detection sample obtained after evaluation,
wherein the sequencing analysis data further comprises each average sample mutation frequency corresponding to each mutation site and each control mutation frequency corresponding to each mutation site, which are obtained by sequencing analysis of the detection sample and the control sample respectively,
evaluating the tumor cell content in the sample by the following steps to obtain the tumor cell content in the detection sample after evaluation, and using it as the tumor cell content in the formula (II):
acquiring the average sample mutation frequency and the control mutation frequency of the mutation sites meeting the predetermined mutation condition one by one, and calculating by adopting a formula (1) to obtain the single tumor cell content corresponding to the mutation site;
once all the single tumor cell contents corresponding to all the mutation sites meeting the predetermined mutation condition are obtained, obtaining all the single tumor cell contents and calculating by adopting a formula (2) to obtain the tumor cell content in the detection sample;
evaluating the tumor cell content in the detection sample obtained by calculation in the formula (2) according to a predetermined evaluation rule to obtain the tumor cell content in the detection sample after evaluation,
wherein, the formula (1) and the formula (2) are respectively shown as follows,
Figure FDA0003554620650000021
Figure FDA0003554620650000022
formula (1) is directed to a mutation site satisfying the predetermined mutation condition,
p is the calculated content of the single tumor cell corresponding to the mutation site,
VAFa is the average mutation frequency of the sample corresponding to the mutation site in the detection sample,
VAFt is the mutation frequency of the tumor cells corresponding to the mutation site in the sample,
VAFn is the corresponding control mutation frequency of the mutation site in the control sample,
in the formula (2), P is the content of tumor cells in the test sample, n is the total number of mutation sites satisfying the predetermined mutation condition, and p1, p2, p3, p4 … pn respectively represent the single tumor cell contents p corresponding to each of the n mutation sites.
3. The method for evaluating a gene copy number according to claim 2, wherein:
wherein the predetermined mutation condition is that the control mutation frequency corresponding to one of the mutation sites is 0.4 or more and 0.6 or less.
4. The method for evaluating a gene copy number according to claim 2 or 3, wherein:
the predetermined evaluation rule is: directly evaluating the tumor cell content in the detection sample obtained by calculation in the formula (2) as the tumor cell content in the detection sample after evaluation.
5. The method for evaluating a gene copy number according to claim 2 or 3, wherein:
wherein the predetermined evaluation rule is: correcting the content of the tumor cells in the detection sample obtained by calculation by adopting the following tumor cell content correction model, and evaluating the content of the tumor cells obtained after correction as the content of the tumor cells in the detection sample after evaluation,
y=ax-b (3)
in the formula (3), y is the content of tumor cells in the detection sample obtained after the correction;
x is the tumor cell content P in the detection sample obtained by calculation,
a and b are model parameters, the value of a is 1.4, and the value range of b is 0.23-0.26.
6. A gene copy number evaluation device for evaluating the gene copy number in a test sample based on the tumor cell content in the sample and sequencing analysis data, wherein the sequencing analysis data includes a detection correction sequencing depth and a control correction sequencing depth after sequencing analysis and sequencing depth correction are performed on the test sample and a control sample, respectively, the device is characterized by comprising:
a corrected sequencing depth calculation unit, which obtains the detection correction sequencing depth and the control correction sequencing depth and calculates a corrected sequencing depth ratio by adopting a formula (I);
a gene copy number calculation unit, which calculates the corresponding gene copy number by adopting a formula (II) according to the corrected sequencing depth ratio,
wherein formula (I) and formula (II) are respectively as follows:
Figure FDA0003554620650000041
Figure FDA0003554620650000042
in the formula (I), the ratio is the corrected sequencing depth ratio,
TD is the detection correction sequencing depth of the test sample,
CD is the control correction sequencing depth of the control sample;
in the formula (II), log2ratio is the logarithm of ratio with base 2;
and purity is the content of the tumor cells in the detection sample.
7. The apparatus for evaluating a gene copy number according to claim 6, further comprising:
a tumor cell content evaluation unit for evaluating the tumor cell content in the sample to obtain the tumor cell content in the detected sample after evaluation,
wherein the sequencing analysis data further comprises each average sample mutation frequency corresponding to each mutation site and each control mutation frequency corresponding to each mutation site, which are obtained by sequencing analysis of the detection sample and the control sample respectively,
using the tumor cell content in the test sample after evaluation as the tumor cell content in the test sample in formula (II), the tumor cell content evaluation unit having:
a single tumor cell content calculation part, which acquires the average sample mutation frequency and the control mutation frequency of the mutation sites meeting the predetermined mutation condition one by one and calculates the single tumor cell content corresponding to the mutation site by adopting a formula (1);
a tumor cell content calculation unit for obtaining the contents of all the single tumor cells corresponding to all the mutation sites satisfying the predetermined mutation conditions and calculating the contents of the tumor cells in the test sample by using the formula (2),
a content evaluation section for evaluating the tumor cell content in the test sample calculated by the formula (2) according to a predetermined evaluation rule to obtain the tumor cell content in the evaluated test sample,
wherein, the formula (1) and the formula (2) are respectively shown as follows,
Figure FDA0003554620650000051
Figure FDA0003554620650000052
formula (1) is directed to a mutation site satisfying the predetermined mutation condition,
p is the calculated content of the single tumor cell corresponding to the mutation site,
VAFa is the average mutation frequency of the sample corresponding to the mutation site in the detection sample,
VAFt is the mutation frequency of the tumor cells corresponding to the mutation site in the sample,
VAFn is the corresponding control mutation frequency of the mutation site in the control sample,
in the formula (2), P is the calculated tumor cell content in the test sample, n is the total number of mutation sites satisfying the predetermined mutation condition, and p1, p2, p3, p4 … pn respectively represent the single tumor cell contents p corresponding to each of the n mutation sites.
8. The gene copy number evaluation device according to claim 7, wherein:
wherein the predetermined evaluation rule is: directly evaluating the tumor cell content in the detection sample obtained by calculation in the formula (2) as the tumor cell content in the detection sample after evaluation.
9. The gene copy number evaluation device according to claim 7, wherein: wherein the predetermined evaluation rule is: correcting the content of the tumor cells in the detection sample obtained by calculation by adopting the following tumor cell content correction model, and evaluating the content of the tumor cells obtained after correction as the content of the tumor cells in the detection sample after evaluation,
y=ax-b (3)
in the formula (3), y is the content P' of the tumor cells in the corrected detection sample;
x is the tumor cell content P in the detection sample obtained by calculation,
a and b are the parameters of the model,
the value of a is 1.4, and the value of b ranges from 0.23 to 0.26.
10. A gene copy number evaluation system, comprising:
the sequencing analysis device is used for respectively carrying out sequencing analysis on the detection sample and the control sample to obtain sequencing analysis data for gene copy number evaluation;
a gene copy number evaluating device for evaluating the gene copy number in the test sample based on the sequencing analysis data,
wherein the gene copy number evaluation device is the gene copy number evaluation device according to any one of claims 6 to 8.
11. A gene copy number evaluation apparatus, comprising:
a memory for storing computer program instructions; and
a processor for executing the instructions of the computer program,
wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the steps of the gene copy number evaluation method of any one of claims 1 to 5.
12. A computer-readable medium, characterized in that:
the computer-readable medium stores a computer program,
wherein the computer program is executable by a processor to implement the steps of the gene copy number evaluation method according to any one of claims 1 to 5.
CN201911089855.5A 2019-11-08 2019-11-08 Method, apparatus, system and computer readable medium for evaluating gene copy number Active CN110895959B (en)

Publications (2)

Publication Number Publication Date
CN110895959A CN110895959A (en) 2020-03-20
CN110895959B true CN110895959B (en) 2022-05-20
