CN114182022B - Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution - Google Patents
- Publication number
- CN114182022B (application CN202210110195.XA)
- Authority
- CN
- China
- Prior art keywords
- mutation
- cfdna
- sequencing
- tumor
- frequency distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a method for detecting liver cancer-specific mutations based on the cfDNA base mutation frequency distribution. It is a screening strategy for gene mutations in blood that does not rely on tumor tissue sampling, and it provides a completely noninvasive tumor mutation detection scheme for accurately screening tumor-derived mutations. Preoperative and postoperative blood samples are collected from tumor patients, cfDNA is extracted from each sample, and high-throughput UMI sequencing is performed. After high-precision sequence processing and merging, mutation detection is carried out on each blood sample. Once the mutations are identified, all somatic mutations are grouped into different types according to their frequency change between the two time points before and after surgery. The screened tumor-derived mutations are then integrated, and these tumor-specific mutations can further be used for dynamic evaluation of tumor burden in time-series samples.
Description
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for detecting the base mutation frequency distribution in cfDNA and its application in tumor prognosis evaluation.
Background
Cancer is one of the major scientific problems endangering human health. Statistics from 2020 show about 19.3 million new cancer cases and about 10 million cancer deaths each year, and the incidence is still rising year by year. Surgical resection is currently the main treatment for many tumors, but tumor patients still face an unsatisfactory resection rate and a high probability of recurrence. Minimal residual lesions remaining after surgery are the main cause of postoperative recurrence, and detecting them promptly after surgery is of great value for adjusting the treatment plan and improving patient prognosis. Conventional imaging and blood markers have very limited ability to detect minimal residual lesions and cannot solve the problem of prognosis evaluation that is constantly faced in the clinic. Liquid biopsy, which has developed rapidly in recent years, can dynamically evaluate a patient's tumor burden with higher sensitivity, and it provides a new tool and strategy for detecting minimal residual lesions after tumor surgery.
Among these approaches, dynamic tumor burden assessment that uses sequencing to track tumor genomic features, based on the genomic information of circulating tumor DNA (ctDNA) within plasma cell-free DNA (cfDNA), has become one of the hot spots of clinical application. Several clinical research institutions in China and abroad are currently running clinical trials that use cfDNA tracking to detect residual tumor lesions in postoperative patients in order to guide the optimization of treatment strategies. In 2020 the FDA approved the first liquid biopsy companion diagnostic based on high-throughput sequencing, and in 2021 the FDA granted SIGNATERA a Breakthrough Device designation, confirming its value in assessing minimal residual lesions and monitoring prognosis. However, in solid tumors, and liver cancer in particular, detection of minimal residual lesions still often relies on tissue genomic information obtained beforehand by tissue sampling. How to screen tumor-specific mutations and evaluate tumor burden noninvasively, without relying on tissue, remains one of the important problems to be solved in the clinic.
cfDNA contains DNA fragments from different tissues and different cell populations. During a patient's clinical course, the proportions of DNA fragments from different tissues in cfDNA change dynamically with tumor burden and with the patient's physiological and pathological state. Analyzing how mutation frequencies vary across cfDNA samples from multiple time points therefore makes it possible to screen for mutations from a specific tissue. However, no scheme has yet been established that uses time-series cfDNA samples to screen tumor mutations, determine tumor burden, and evaluate the presence of minimal residual lesions.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for detecting the base mutation frequency distribution in cfDNA. It is a screening strategy for gene mutations in blood that does not rely on tumor tissue sampling, and it is a completely noninvasive tumor mutation detection scheme for accurately screening tumor-derived mutations.
The technical scheme adopted by the invention is as follows: a method for detecting liver cancer-specific mutations based on the cfDNA base mutation frequency distribution, comprising the following steps:
Step 1, extracting cfDNA from the plasma of the subject;
Step 2, performing end repair on the cfDNA extracted in step 1 and adding an A base at its 3' end; ligating truncated Y-shaped sequencing adapters carrying molecular tags (UMIs) to both ends of the treated cfDNA to obtain a ligation product; purifying the ligation product with magnetic beads; amplifying and enriching it with primers containing index tag sequences for distinguishing different samples; purifying with magnetic beads; adding biotin-labeled probes for the hybridization reaction; capturing the biotin-labeled molecules with streptavidin-labeled magnetic beads and eluting the library; amplifying and purifying with magnetic beads to construct a high-throughput sequencing library; and finally sequencing;
Step 3, extracting the UMI reads from the raw sequence file: separating the 3-base UMI information at the two ends of each read, adding the extracted UMI information to the read name for storage, trimming the UMI bases from the original sequence to obtain the UMI-extracted sequencing reads, and aligning the UMI-extracted sequencing reads to the human reference genome;
Step 4, extracting all UMI-extracted sequencing reads that bwa aligns to the same position of the human reference genome; if several sequencing reads have the same UMI information but inconsistent base arrangements, removing the sequencing reads with that UMI information; if several sequencing reads have the same UMI information and consistent base arrangements, marking them as a consensus (common) sequence and retaining them; if only one sequencing read carries a given UMI at that position, marking it as an isolated sequence and retaining it;
Step 5, applying noise reduction to the isolated sequences, recovering those that pass, and merging them with the consensus sequences to form the final output bam file;
Step 6, converting the bam file into a base frequency distribution file using the iDES algorithm, and polishing the base frequency distribution file with a reference data set of the human reference genome to obtain the noise-reduced base frequency distribution;
Step 7, using sequencing data of PBMCs as a control sample, detecting mutation frequencies in the data processed in step 6, and removing false-positive mutations to obtain the mutation frequency distribution of the subject's cfDNA.
Further, the conditions for calling a mutation in the mutation frequency detection of step 7 are: when the site is covered by sequencing reads from both the forward and reverse strands, more than 2 reads support the mutation; or, when the site is covered by sequencing reads from only the forward strand or only the reverse strand, more than 1 read supports the mutation; in addition, the mutation frequency is greater than -log(0.01)/depth, and the mutation frequency in the control sample is less than 0.005.
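The calling conditions above can be sketched as a single predicate. This is only an illustrative sketch: the function and parameter names are hypothetical, the base of the logarithm is not stated in the text (natural log is assumed by default and exposed as a parameter), and "depth" is taken to be the sequencing depth at the site.

```python
import math


def passes_call_criteria(alt_reads, depth, control_vaf, both_strands,
                         alpha=0.01, control_max=0.005, log_base=math.e):
    """Return True if a candidate mutation meets the stated calling conditions.

    alt_reads    -- number of reads supporting the mutation
    depth        -- total sequencing depth at the site
    control_vaf  -- mutation frequency observed in the PBMC control sample
    both_strands -- True if the site is covered by forward and reverse strand reads
    log_base     -- ASSUMPTION: the text does not state the base of "log"
    """
    if depth <= 0:
        return False
    # "more than 2" reads with both strands, "more than 1" with a single strand
    min_reads = 2 if both_strands else 1
    if alt_reads <= min_reads:
        return False
    vaf = alt_reads / depth
    min_vaf = -math.log(alpha, log_base) / depth
    return vaf > min_vaf and control_vaf < control_max
```

For example, at a depth of 1000 with the natural-log assumption, the frequency floor is about 0.0046, so 6 supporting reads pass while 4 do not; with base 10 the floor drops to 0.002.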
Further, recovering the isolated sequences after noise reduction comprises: calculating the base distribution at every position of the human reference genome that is covered only by isolated sequences, and estimating the level of sequencing noise based on a binomial distribution; when the base carried by an isolated sequence differs from the major base at that position and its base frequency is lower than the noise level, the isolated sequence is discarded; otherwise the isolated sequence is recovered.
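One possible reading of this binomial noise test is sketched below. The text does not give the error rate, the significance level, or the exact definition of the noise level, so `error_rate`, `alpha` and the rule "noise level = smallest frequency that random errors alone would rarely reach" are all assumptions made for illustration, and the function names are hypothetical.

```python
def noise_frequency(n, error_rate=0.001, alpha=0.05):
    """Smallest frequency k/n such that P(X >= k) < alpha for X ~ Binomial(n, error_rate).

    ASSUMPTION: error_rate and alpha are illustrative values, not taken from the text.
    """
    pmf = (1.0 - error_rate) ** n  # P(X = 0)
    tail = 1.0                     # P(X >= 0)
    for k in range(n + 1):
        if tail < alpha:
            return k / n
        tail -= pmf                # tail becomes P(X >= k + 1)
        pmf *= (n - k) / (k + 1) * error_rate / (1.0 - error_rate)
    return 1.0


def keep_isolated_read(read_base, base_counts, error_rate=0.001, alpha=0.05):
    """Decide whether an isolated sequence is recovered at one genomic position.

    base_counts -- base -> count over all isolated sequences covering the position
    """
    n = sum(base_counts.values())
    if n == 0:
        return False
    major = max(base_counts, key=base_counts.get)
    if read_base == major:
        return True  # agrees with the major base: always recovered
    freq = base_counts.get(read_base, 0) / n
    # discard only when the minor base is rarer than the estimated noise level
    return freq >= noise_frequency(n, error_rate, alpha)
```

With 1000 isolated reads and the assumed error rate of 0.001, the noise level works out to 0.004, so a minor base seen twice is discarded and one seen eight times is recovered.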
Further, step 7 further comprises: detecting, according to steps 1-7, the mutation frequency distribution of the preoperative cfDNA of a tumor patient and that of the matched tumor tissue sample, identifying the mutations shared by the two, comparing the mutation frequency distributions of the shared mutations, and taking the lower quartile of the mutation frequency distribution of the shared mutations as the frequency threshold.
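As a rough illustration, the threshold could be derived as follows. The sketch assumes that the quartile is taken over the cfDNA frequencies of the mutations shared with tissue (the text does not say which of the two frequency sets is used), and the dictionary layout and function name are hypothetical.

```python
import statistics


def frequency_threshold(cfdna_vaf, tissue_vaf):
    """Return (shared mutations, lower quartile of their cfDNA frequencies).

    cfdna_vaf, tissue_vaf -- dicts mapping a mutation key to its frequency.
    ASSUMPTION: the quartile is computed on the cfDNA frequencies.
    """
    shared = sorted(set(cfdna_vaf) & set(tissue_vaf))
    if len(shared) < 2:
        raise ValueError("need at least two shared mutations to compute a quartile")
    vafs = [cfdna_vaf[m] for m in shared]
    lower_quartile = statistics.quantiles(vafs, n=4, method="inclusive")[0]
    return shared, lower_quartile
```

The "inclusive" quantile method is a design choice for the sketch; other quartile conventions would give slightly different thresholds on small mutation sets.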
Further, the mutation frequency distribution of the postoperative cfDNA of the tumor patient is detected according to steps 1-7; the preoperative and postoperative cfDNA mutation frequency distributions are filtered using the obtained frequency threshold to give the screened mutation frequency distributions; the mutation frequency change ratio is then calculated and a peak (density) plot is drawn.
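A minimal sketch of the filtering and ratio step follows. Several details are assumptions because the text does not fix them: mutations are kept when their preoperative frequency reaches the threshold, the change ratio is defined as postoperative over preoperative frequency, a mutation absent after surgery is given frequency 0, and a coarse binned count stands in for the peak plot. All names are hypothetical.

```python
from collections import Counter


def change_ratios(pre_vaf, post_vaf, threshold):
    """Postoperative/preoperative frequency ratio for mutations passing the threshold.

    ASSUMPTIONS: filter on the preoperative frequency; ratio = post / pre;
    mutations not detected after surgery count as frequency 0.
    """
    ratios = {}
    for mutation, pre in pre_vaf.items():
        if pre < threshold or pre <= 0:
            continue
        ratios[mutation] = post_vaf.get(mutation, 0.0) / pre
    return ratios


def binned_counts(values, bin_width=0.5):
    """Count values per bin index (value // bin_width) as a crude density summary."""
    return Counter(int(v // bin_width) for v in values)
```

Under these assumptions, a ratio near 0 marks a mutation that largely disappears after resection, while a ratio near 1 marks one whose frequency is unchanged; peaks in the binned counts correspond to such groups.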
Preoperative and postoperative blood samples are collected from liver cancer patients, cfDNA is extracted from each sample, and high-throughput UMI sequencing is performed. After high-precision sequence processing and merging, mutation detection is carried out on each blood sample. Once the mutations are identified, all somatic mutations are grouped into different types according to their frequency change between the two time points before and after surgery. The screened tumor-derived mutations are then integrated, and these tumor-specific mutations can further be used for dynamic evaluation of tumor burden in time-series samples.
The invention analyzes the cfDNA mutation spectrum of tumor patients in depth and finds for the first time that the dynamic change of plasma cfDNA before and after surgery can be used to separate tumor-derived mutations, and that these mutations can be used to evaluate the effect of tumor resection surgery. Tumor-specific mutations can therefore be identified noninvasively using only the patient's preoperative and postoperative cfDNA.
In summary, by evaluating the frequencies of the somatic mutations detected in the preoperative and postoperative plasma samples of tumor surgery patients, tumor-specific mutations can be screened.
The beneficial effects of the invention are mainly as follows: the technical scheme eliminates isolated-sequence noise, mutation background noise and false-positive mutations from the cfDNA of the subject, yielding high-precision mutation frequency distribution data. The high-precision mutation frequency distribution data are further screened with a frequency threshold to remove non-tumor-specific mutations.
Drawings
FIG. 1. Distribution of tumor tissue-specific mutations in plasma;
FIG. 2. Frequency distribution of tumor tissue-specific mutations in preoperative plasma;
FIG. 3. Frequency distribution in postoperative plasma of the tumor tissue-specific mutations found in preoperative plasma;
FIG. 4. Density distribution of preoperative mutation frequency ratios for tissue-free samples;
FIG. 5. Overlap of the two groups of mutated genes, within and outside the threshold, with tumor-specific mutated genes (chi-square test, p value 0.00017);
FIG. 6. Proportion of mutation reads with a length within 150 bp in the groups above and below the threshold;
FIG. 7. The grouping is significantly correlated with recurrence-free survival (RFS) time.
Detailed Description
The invention is further described below with reference to specific examples, but the scope of the invention is not limited thereto. The terms "preoperative" and "postoperative" as used herein refer to before and after surgical removal of the tumor, respectively.
Unless otherwise specified, the experimental materials used in the following examples were purchased from conventional biochemical reagent suppliers.
Materials and equipment:
(1) The QIAamp Circulating Nucleic Acid extraction kit was purchased from QIAGEN;
(2) Streptavidin-labeled magnetic beads were purchased from Integrated DNA Technologies (IDT), catalog number 1080589. The streptavidin magnetic beads are formed by covalently coupling magnetic particles to high-purity streptavidin, and can be used to capture biotin complexes, including biotin-labeled antigens, antibodies and nucleic acids. The biotin-streptavidin interaction is strong and the nonspecific binding rate is low, so the captured substrate meets the requirements of subsequent experiments;
(3) The biotin-labeled probe panel was purchased from Nanodigmbio (Nanjing) Biotechnology Co., Ltd., catalog number 1001111E. The probe panel covers 578 genes of broad interest in solid tumor research, spanning a region of approximately 2.6 Mb of the genome, and supports the enrichment of multiple kinds of variant information, including base substitutions, insertions/deletions, gene rearrangements, gene amplifications and microsatellite instability;
(4) ConsensusCruncher is a tool that suppresses the error rate of next-generation sequencing data by using Unique Molecular Identifiers (UMIs) to deduplicate reads derived from the same DNA template.
Example 1:
1. Preoperative and postoperative plasma samples and tumor tissue samples were collected from 100 hepatocellular carcinoma patients; samples were also collected from 38 cirrhosis patients and 30 healthy volunteers.
2. For the enrolled hepatocellular carcinoma patients, 8mL of whole blood was drawn for each preoperative, postoperative and follow-up blood sample; plasma was collected by two-step centrifugation, and plasma cfDNA was extracted according to the instructions of the QIAamp Circulating Nucleic Acid extraction kit.
1. To a 50mL centrifuge tube were added 3mL of plasma, 300μL of proteinase K and 2.4mL of Buffer ACL (containing carrier RNA); the tube was vortexed for 30s and incubated at 60℃ for 30min.
2. The tube was removed, 5.4mL of Buffer ACB was added, and the tube was vortexed for 30s and incubated on ice for 5min.
3. The VacConnector, Mini column and 20mL tube extender were inserted in sequence onto the QIAvac Plus.
4. The sample mixture was added to the tube extender and the vacuum pump was switched on; after the sample had passed through the column, the vacuum pump was switched off and the 20mL tube extender was removed. The Mini column was then washed sequentially with 600μL of Buffer ACW1, 750μL of Buffer ACW2 and 750μL of absolute ethanol.
5. After washing, the Mini column was removed, placed in a new 1.5mL EP tube and centrifuged at 20000g for 3min; the flow-through was discarded, the lid was opened, and the column was incubated in an incubator at 56℃ for 10min.
6. 30-50μL of Buffer AVE was added to the Mini column, which was incubated at room temperature for 3min and centrifuged at 20000g for 3min to collect the cfDNA sample.
7. The cfDNA concentration was accurately quantified with Qubit, and the cfDNA fragment size was checked with a Qsep 100 capillary electrophoresis system. The prepared cfDNA was stored at -80℃ until use.
3. Construction of the genomic library (NanoPrep TM DNA library construction kit (for), used together with the Duplex Seq Adapters Kit)
1. End repair of the cfDNA and addition of an A base at the 3' end;
1) The reaction mixture was prepared as shown in the following table:
Reagent | Amount |
cfDNA | 10-20ng |
End Repair&A-Tailing Buffer | 6μL |
End Repair&A-Tailing Enzyme | 4μL |
H2O | To 50μL |
After mixing with 40μL of cfDNA sample, the mixture was placed in a PCR instrument under the following reaction conditions:
Temperature | Time |
20℃ | 30min |
65℃ | 30min |
10℃ | ∞ |
2. Truncated Y-shaped sequencing adapters carrying a molecular tag (UMI, a random molecular tag sequence) were ligated to both ends of the cfDNA.
1) The reaction mixture was prepared as shown in the following table:
Reagent | Amount |
IDT Duplex adapter (15μM) | 2μL |
Ligation Buffer | 26μL |
Total | 28μL |
After mixing with 50μL of the end repair product from the previous step, 2μL of DNA Ligase was added last, and the mixture was placed in a PCR instrument for reaction under the following conditions:
Temperature | Time |
20℃ | 15min |
4℃ | ∞ |
2) Purification of the ligation product:
40μL of NanoPrep TM SP Beads was added to the ligation product and mixed well; the mixture was incubated at room temperature for 5-10min and placed on a magnetic rack, and the supernatant was discarded once the solution had cleared. The beads were rinsed twice with 150μL of 80% ethanol, incubating for 30s at room temperature each time. Residual ethanol was removed and the beads were air-dried at room temperature for 5-10min. 21μL of Nuclease-Free Water was added and incubated at 25℃ for 2min; the beads were removed on the magnetic rack, and 20μL of supernatant was transferred to a new 0.2mL PCR tube to collect the purified ligation product.
3. Amplification and enrichment with primers containing index tag sequences for distinguishing different samples
1) The reaction mixture was prepared as shown in the following table:
Reagent | Amount |
HiFi PCR Master Mix,2x | 25μL |
UDI Primer Mix(5uM) | 5μL |
Total | 30μL |
After mixing with 20μL of the purified ligation product from the previous step, the mixture was placed in a PCR instrument under the following reaction conditions:
2) 50μL of NanoPrep TM SP Beads was added to the PCR product and mixed well; the mixture was incubated at room temperature for 5-10min and placed on a magnetic rack, and the supernatant was discarded once the solution had cleared. The beads were rinsed twice with 150μL of 80% ethanol, incubating for 30s at room temperature each time. Residual ethanol was removed and the beads were air-dried at room temperature for 5-10min. 21μL of Nuclease-Free Water was added and incubated at 25℃ for 2min; the beads were removed on the magnetic rack, and 20μL of supernatant was transferred to a new 0.2mL PCR tube to collect the purified PCR product, which was then quantified with Qubit and checked by capillary electrophoresis.
4. Hybridization reaction
1) 8-12 constructed libraries, 500ng each, were mixed into one pool; 5μL of COT Human DNA and 2μL of Blocker were added, and the pool was dried under vacuum.
2) Hybridization reagent preparation
Reagent | Amount |
2X Hybridization Buffer | 8.5μL |
Hybridization Buffer Enhancer | 2.7μL |
Probes | 4μL |
Nuclease-Free Water | 1.8μL |
Total | 17μL |
The vacuum-dried product was resuspended in the prepared buffer and placed in a PCR instrument for incubation under the following conditions:
Temperature | Time |
95℃ | 30s |
65℃ | 16h |
5. After hybridization, streptavidin-labeled magnetic beads were used to capture the biotin-labeled nucleic acid molecules and elute the library
1) Preparing the hybridization wash buffers
Reagent | Amount | H2O | Total |
2X Bead Wash Buffer | 160μL | 160μL | 320μL |
10X Wash Buffer 1 | 28μL | 252μL | 280μL |
10X Wash Buffer 2 | 16μL | 144μL | 160μL |
10X Wash Buffer 3 | 16μL | 144μL | 160μL |
10X Stringent Wash Buffer | 32μL | 288μL | 320μL |
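The volumes in the table above are consistent with diluting each concentrated stock to a 1X working solution. The helper below is a small sketch of that arithmetic; the 1X target is inferred from the volumes rather than stated in the text, and the function name is hypothetical.

```python
def dilution_volumes(stock_x, total_ul, final_x=1.0):
    """Return (stock volume, water volume) in microlitres for a simple dilution.

    stock_x  -- concentration factor of the stock (e.g. 10 for a 10X buffer)
    total_ul -- desired final volume in microlitres
    final_x  -- ASSUMPTION: working concentration of 1X, inferred from the table
    """
    if stock_x <= 0 or final_x <= 0 or final_x > stock_x:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    stock_ul = total_ul * final_x / stock_x
    return stock_ul, total_ul - stock_ul


# Rows of the table: (stock concentration, total volume in microlitres)
for stock_x, total_ul in [(2, 320), (10, 280), (10, 160), (10, 320)]:
    print(stock_x, total_ul, dilution_volumes(stock_x, total_ul))
```

Scaling the total volume for a different number of captures only changes `total_ul`; the stock-to-water proportion stays fixed by the concentration factor.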
2) Preparation of bead resuspension Mix
Reagent | Amount |
2X Hybridization Buffer | 8.5μL |
Hybridization Buffer Enhancer | 2.7μL |
Nuclease-Free Water | 5.8μL |
Total | 17μL |
3) Elution of hybridization products
50μL of Capture Beads was transferred to a 1.5mL low-adsorption tube, 100μL of Bead Wash Buffer was added, and the beads were mixed gently by pipetting; the tube was placed on a magnetic rack for 1min and the supernatant was removed. The beads were resuspended in 17μL of bead resuspension Mix, the prepared Capture Beads were added to the hybridization product and mixed by pipetting, and the mixture was incubated at 65℃ for 45min to capture the hybridization library. The captured hybridization library was then washed sequentially with Wash Buffer 1, Stringent Wash Buffer, Wash Buffer 2 and Wash Buffer 3, and finally the library was eluted with 20μL of Nuclease-Free Water.
6. PCR amplification of the hybridization library
1) Preparing PCR reagent.
Reagent | Amount |
HiFi PCR Master Mix,2x | 25μL |
Library Amplification Primer Mix(5uM) | 5μL |
Total | 30μL |
After mixing with the hybridization elution product from the previous step, PCR amplification was carried out under the following reaction conditions:
2) PCR product purification
50μL of NanoPrep TM SP Beads was added to the PCR product and mixed well; the mixture was incubated at room temperature for 5-10min and placed on a magnetic rack, and the supernatant was discarded once the solution had cleared. The beads were rinsed twice with 150μL of 80% ethanol, incubating for 30s at room temperature each time. Residual ethanol was removed and the beads were air-dried at room temperature for 5-10min. 21μL of Nuclease-Free Water was added and incubated at 25℃ for 2min; the beads were removed on the magnetic rack, and 20μL of supernatant was transferred to a new 0.2mL PCR tube to collect the purified PCR product. The library concentration was accurately quantified with Qubit, and the library fragment size was checked with a Qsep 100 capillary electrophoresis system; the fragments were mainly concentrated around 350 bp.
7. Finally, sequencing was performed on the Illumina HiSeq X Ten platform.
4. Extract UMI (unique molecular identifier, i.e., molecular tag) reads from the raw sequence files using ConsensusCruncher:
Set the UMI pattern to NNN, so that 3 bases of UMI information are read from each of the two ends of each read. The extracted UMI information is appended to the read name for storage, and the UMI bases are trimmed from the original sequence, giving UMI-extracted sequencing reads that are then aligned to the reference genome hg19.
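The UMI extraction described above can be illustrated with a minimal Python sketch. It is not ConsensusCruncher's implementation: the function name `extract_pair`, the tuple layout `(name, sequence, quality)`, and the `|UMI1.UMI2` read-name suffix are hypothetical choices made only for illustration.

```python
def extract_pair(read1, read2, umi_len=3):
    """Move the leading UMI bases of both mates into the read names.

    Each read is a (name, sequence, quality) tuple. The "|UMI1.UMI2" suffix
    is a hypothetical naming scheme, not ConsensusCruncher's actual format.
    """
    name1, seq1, qual1 = read1
    name2, seq2, qual2 = read2
    if len(seq1) <= umi_len or len(seq2) <= umi_len:
        raise ValueError("read is not longer than the UMI pattern")
    # Pattern NNN: the first umi_len bases of each mate form the UMI.
    umi = seq1[:umi_len] + "." + seq2[:umi_len]
    # Trim the UMI bases (and their qualities) from the reads themselves.
    trimmed1 = (f"{name1}|{umi}", seq1[umi_len:], qual1[umi_len:])
    trimmed2 = (f"{name2}|{umi}", seq2[umi_len:], qual2[umi_len:])
    return trimmed1, trimmed2


r1 = ("frag1", "ACGTTTGACA", "IIIIIIIIII")
r2 = ("frag1", "TTACCGGTAA", "IIIIIIIIII")
out1, out2 = extract_pair(r1, r2)
print(out1)
print(out2)
```

Storing the UMI in the read name keeps it attached to the read through alignment, so that reads can later be grouped by position and UMI.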
5. Continuing with ConsensusCruncher, align the UMI-extracted sequencing reads to the hg19 genome with bwa, record the position information, and then identify UMI consensus sequences and separate out isolated sequences.
Here, a consensus sequence (also called a common sequence) is one supported by multiple sequencing reads that align to the same position and have identical UMI information and identical base sequences; an isolated sequence is one supported by only a single sequencing read.
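The distinction between consensus and isolated sequences can be sketched in Python as follows. This is a simplified illustration that follows the wording of this description (families with inconsistent bases are dropped as PCR errors); it is not ConsensusCruncher's code, and the function name and the `(chrom, pos, umi, sequence)` record layout are hypothetical.

```python
from collections import defaultdict


def classify_umi_families(reads):
    """Split aligned reads into consensus and isolated sequences.

    reads: iterable of (chrom, pos, umi, sequence) tuples (hypothetical layout).
    Returns (consensus, isolated), each a list of (chrom, pos, umi, sequence).
    """
    families = defaultdict(list)
    for chrom, pos, umi, seq in reads:
        families[(chrom, pos, umi)].append(seq)

    consensus, isolated = [], []
    for (chrom, pos, umi), seqs in families.items():
        if len(set(seqs)) > 1:
            # Same position and UMI but different bases: treated as PCR error.
            continue
        record = (chrom, pos, umi, seqs[0])
        if len(seqs) == 1:
            isolated.append(record)   # supported by a single read
        else:
            consensus.append(record)  # supported by several identical reads
    return consensus, isolated


reads = [
    ("chr1", 100, "ACG.TTA", "AAAA"),
    ("chr1", 100, "ACG.TTA", "AAAA"),
    ("chr1", 100, "GGG.CCC", "AAAT"),
    ("chr1", 200, "TTT.AAA", "CCCC"),
    ("chr1", 200, "TTT.AAA", "CCCG"),
]
consensus, isolated = classify_umi_families(reads)
print(consensus)
print(isolated)
```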
The specific separation process is as follows. First, all UMI sequencing reads aligned to the same genomic position are extracted, and the number of distinct UMIs among them is counted. For each UMI, the base sequences of all reads carrying that UMI at that position are compared. If they are inconsistent, the reads carrying that UMI at that position are considered to contain PCR errors and are removed. If they are consistent and there is only one read, the read is marked as an isolated sequence and retained. If they are consistent and there is more than one read, the reads are marked as a consensus sequence and retained. Consensus sequences are retained directly for further analysis, whereas isolated sequences are first subjected to noise filtering: the base distribution is calculated for every genomic position supported only by isolated sequences, and the magnitude of the sequencing noise is estimated from a binomial distribution. When the base carried by an isolated sequence differs from the major base at that position and its frequency is below the noise level, the isolated sequence is discarded; otherwise it is recovered and passed to the next analysis step. The recovered isolated sequences are merged with the consensus sequences to form the final bam file.
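One way to read the binomial noise filter for isolated sequences is sketched below in Python. The description does not give the error rate or the significance level, so `error_rate=0.01` and `alpha=0.05` are assumptions, as are the function names; the sketch only shows how a binomial model can turn a per-base error rate into a noise level for a given depth.

```python
from math import comb


def noise_count(depth, error_rate, alpha=0.05):
    """Smallest read count k for which P(X >= k) < alpha, X ~ Binomial(depth, error_rate).

    Counts below k are considered explainable by sequencing noise alone.
    """
    cdf = 0.0  # running P(X < k)
    for k in range(depth + 1):
        if 1.0 - cdf < alpha:
            return k
        cdf += comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
    return depth + 1


def keep_isolated(base, base_counts, error_rate=0.01, alpha=0.05):
    """Decide whether an isolated sequence carrying `base` at a position is recovered.

    base_counts maps each base to its read count at a position that is
    supported only by isolated sequences.
    """
    depth = sum(base_counts.values())
    major = max(base_counts, key=base_counts.get)
    if base == major:
        return True  # agrees with the major base: always recovered
    # Non-major base: discard when its count is below the noise level.
    return base_counts.get(base, 0) >= noise_count(depth, error_rate, alpha)


print(noise_count(100, 0.01))
print(keep_isolated("G", {"A": 98, "G": 2}))
print(keep_isolated("G", {"A": 90, "G": 10}))
```

Comparing read counts against `noise_count` is equivalent to comparing the base frequency against `noise_count / depth`, which matches the frequency-versus-noise wording of the description.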
6. Convert the bam file into a base frequency distribution file within the sequenced region using iDES.
For each genomic position, the iDES algorithm uses the sequence information of all sequencing reads in the bam file to extract the base distribution at that position. The base frequency distribution file is then polished against the genome-wide base distribution of a pre-established hg19 reference dataset. Polishing means building a distribution model from the genome-wide base distribution of the hg19 reference dataset, performing a hypothesis test on each candidate mutation in the sequencing reads against that model to judge whether it is genuine, and estimating the probability that the mutant base is a false detection. Candidates judged to be false are removed as background errors, which lowers the background error rate of the polished base distribution file.
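The idea of polishing can be illustrated with a simplified stand-in written in Python. It is not the iDES implementation: it assumes that the background for a given position and substitution is summarized by allele frequencies observed in a reference dataset (`background_afs`, a hypothetical input), models them with a normal distribution, and zeroes out candidates that the model can explain. The significance level `alpha=0.05` is also an assumption.

```python
from statistics import NormalDist, mean, stdev


def polish(candidate_af, background_afs, alpha=0.05):
    """Return candidate_af if it stands out from the background model, else 0.0.

    background_afs: allele frequencies of the same substitution at the same
    position in the reference dataset (at least two values are required).
    """
    mu = mean(background_afs)
    sigma = stdev(background_afs)
    if sigma == 0:
        # Degenerate background: keep only candidates above the constant level.
        return candidate_af if candidate_af > mu else 0.0
    # Probability that background noise alone yields a frequency this high.
    p_background = 1.0 - NormalDist(mu, sigma).cdf(candidate_af)
    return candidate_af if p_background < alpha else 0.0


background = [0.001, 0.002, 0.0015, 0.001, 0.0025]
print(polish(0.02, background))   # well above the background: kept
print(polish(0.002, background))  # within the background: removed
```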
7. Finally, perform high-precision mutation detection using the differences in frequency distribution between samples.
Construct a gene library from PBMCs (peripheral blood mononuclear cells), obtain sequencing data according to steps 4 to 6, and use it as the control sample for detecting mutation frequencies in the preoperative plasma sample. A mutation site must meet the following requirements: when it is covered on both strands (i.e., supported by reads from both the forward and reverse strands), it has more than 2 supporting mutant reads; when it is covered on a single strand (i.e., supported by reads from only the forward or only the reverse strand), it has more than 1 supporting mutant read; its mutation frequency is greater than -log(0.01)/depth; and, to remove false positives, its mutation frequency in the control sample is less than 0.005.
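The site-level requirements above can be written as a single predicate, shown here as a Python sketch. Two points are assumptions rather than statements of the description: the logarithm is taken as the natural logarithm, and a site counts as covered on both strands when mutant reads are seen on both the forward and the reverse strand. The function and parameter names are hypothetical.

```python
import math


def passes_filters(alt_fwd, alt_rev, depth, control_freq):
    """Apply the read-support, frequency and control-sample conditions to one site.

    alt_fwd / alt_rev: mutant reads on the forward / reverse strand.
    depth: total read depth at the site in the plasma sample.
    control_freq: mutation frequency at the site in the PBMC control.
    """
    if depth <= 0:
        return False
    alt = alt_fwd + alt_rev
    both_strands = alt_fwd > 0 and alt_rev > 0
    # More than 2 mutant reads with both-strand support, more than 1 otherwise.
    min_reads = 2 if both_strands else 1
    if alt <= min_reads:
        return False
    # Frequency must exceed -log(0.01)/depth (natural log assumed here).
    if alt / depth <= -math.log(0.01) / depth:
        return False
    # The same site must be (nearly) absent from the control sample.
    return control_freq < 0.005


print(passes_filters(3, 3, 1000, 0.0))
print(passes_filters(2, 1, 1000, 0.0))
print(passes_filters(3, 3, 1000, 0.01))
```

With the natural logarithm, -log(0.01) is about 4.6, so the frequency condition amounts to requiring at least 5 mutant reads; a base-10 reading would instead give a bound of 2 reads.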
8. The mutations detected before surgery are cross-compared with the mutations from tumor tissue, the overlapping mutations are identified, and the mutation frequency distribution of the overlap (mutations detected in both blood and tissue, i.e., tumor-specific mutations detected in blood) is examined.
The frequency distribution of the overlapping mutations is shown in FIG. 2. The figure shows that tumor-specific mutations have a characteristic frequency distribution in blood, whereas non-tumor-specific mutations in blood are concentrated in the low-frequency region. Therefore, the lower quartile of the tumor-specific mutation frequency distribution in blood, 0.02611, is selected as the basis for mutation frequency screening. This frequency threshold serves as the first condition for filtering mutations without relying on tissue.
To obtain quartiles, all values are sorted by magnitude and divided into four equal parts; the values at the three division points are the quartiles. The smallest of them is called the lower quartile.
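The overlap and lower-quartile computation can be sketched in Python as follows. The mutation key `(chrom, pos, ref, alt)`, the function names, and the toy frequencies are hypothetical. `statistics.quantiles` is used with its default "exclusive" method; the description does not say which quartile convention produced 0.02611, so a different convention may give slightly different values.

```python
from statistics import quantiles


def overlap_frequencies(plasma, tissue):
    """Plasma frequencies of the mutations that are also found in tumor tissue.

    plasma: dict mapping (chrom, pos, ref, alt) to plasma mutation frequency.
    tissue: set of (chrom, pos, ref, alt) keys detected in tumor tissue.
    """
    return {m: f for m, f in plasma.items() if m in tissue}


def lower_quartile(values):
    """First of the three quartile cut points of the values."""
    return quantiles(values, n=4)[0]


def filter_by_threshold(plasma, threshold):
    """Keep plasma mutations whose frequency is above the threshold."""
    return {m: f for m, f in plasma.items() if f > threshold}


# Toy data: seven overlapping mutations and one plasma-only mutation.
plasma = {("chr1", 100 + i, "A", "G"): 0.01 * (i + 1) for i in range(7)}
plasma[("chr2", 500, "C", "T")] = 0.004
tissue = {m for m in plasma if m[0] == "chr1"}

threshold = lower_quartile(overlap_frequencies(plasma, tissue).values())
print(threshold)
print(len(filter_by_threshold(plasma, threshold)))
```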
9. All mutations detected before surgery are filtered using the frequency threshold of 0.02611 determined in the previous step, and the mutations that pass are taken as candidate tumor-derived mutations. For the sites above the frequency threshold, the mutation frequencies are extracted from the three-seven paired postoperative blood samples; the resulting frequency distribution of the preoperative plasma tumor-tissue-specific mutations in postoperative plasma is shown in FIG. 3.
10. The ratio of mutation frequency change between the preoperative (FIG. 2) and postoperative (FIG. 3) samples is calculated, and a peak plot is drawn as shown in FIG. 4. In the peak plot of the mutation frequency changes, the threshold that clearly separates the two mutation peaks is selected as the diagnostic threshold; it lies at the lowest point of the valley between the two peaks. This divides the mutations detected before surgery into two distinct groups: the group whose frequency drops markedly reflects the effect of tumor resection on tumor burden and should consist of tumor-specific mutations, while the other group should consist of mutations from other (i.e., non-tumor-specific) sources.
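A rough Python sketch of the valley-based threshold follows. The description does not give a formula for the change ratio, so the relative drop (pre - post) / pre is an assumption, and the histogram-based valley search with 10 bins is only one simple way to locate the lowest point between two peaks; both function names are hypothetical.

```python
def change_ratio(pre_freq, post_freq):
    """Relative drop in mutation frequency from preoperative to postoperative plasma."""
    if pre_freq <= 0:
        raise ValueError("preoperative frequency must be positive")
    return (pre_freq - post_freq) / pre_freq


def valley_threshold(ratios, bins=10):
    """Center of the emptiest histogram bin between the two tallest peaks."""
    if bins < 3:
        raise ValueError("at least 3 bins are needed to find a valley")
    lo, hi = min(ratios), max(ratios)
    if hi == lo:
        raise ValueError("all ratios are identical; no valley exists")
    width = (hi - lo) / bins
    counts = [0] * bins
    for r in ratios:
        counts[min(int((r - lo) / width), bins - 1)] += 1

    first = max(range(bins), key=counts.__getitem__)
    # Second peak: the tallest bin at least two bins away from the first peak.
    others = [i for i in range(bins) if abs(i - first) >= 2]
    second = max(others, key=counts.__getitem__)
    left, right = sorted((first, second))
    valley = min(range(left + 1, right), key=counts.__getitem__)
    return lo + (valley + 0.5) * width


ratios = [0.0, 0.04, 0.08, 0.0, 0.02, 1.0, 0.95, 0.9, 1.0]
threshold = valley_threshold(ratios)
tumor_like = [r for r in ratios if r > threshold]
print(threshold, len(tumor_like))
```

Under this definition, mutations whose ratio lies above the threshold are those that dropped markedly after surgery.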
11. This group of mutations was extracted for gene overlap analysis (FIG. 5) and length distribution analysis (FIG. 6), and the results confirmed that it has the characteristics of typical tumor-derived mutations. This group can therefore be regarded as tumor-derived specific mutations.
13. The presence or absence of minimal residual lesions of the tumor in a hepatocellular carcinoma patient was judged from the presence or absence of this group of mutations in the postoperative sample, and KM survival analysis was performed using R (FIG. 7). The results showed that hepatocellular carcinoma patients with detected minimal residual lesions had significantly worse survival outcomes than those without. These results show that tumor-derived mutations can be screened from the change in somatic mutation frequency between preoperative and postoperative samples without relying on tumor tissue, and that minimal residual lesions of the tumor can thereby be identified.
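For readers unfamiliar with KM (Kaplan-Meier) analysis, the estimator behind such survival curves can be sketched in a few lines of Python. The analysis in this example was done in R; the code below is only an illustration of the estimator, with a hypothetical function name and toy data, and it does not include the comparison between patient groups.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times: follow-up time of each patient.
    events: 1 if the event of interest was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if observed:
            # Multiply by the fraction of at-risk patients without the event at t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


print(kaplan_meier([1, 2, 3], [1, 0, 1]))
```

Running the estimator separately for patients with and without detected minimal residual lesions gives the two curves that a KM plot compares.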
Example 2:
ctDNA detection and analysis were performed as in Example 1 on the patient's preoperative and postoperative plasma samples, looking for mutations whose frequency change ratio between the preoperative and postoperative samples exceeded the diagnostic threshold (0.2). Analysis of the distribution of these mutations across genes and of their sequence length distribution showed that the mutated genes included important genes related to tumorigenesis, so the mutations were tumor-related. Tumor-related mutations were found to remain after surgery, and the patient was judged to have minimal residual lesions. Follow-up found imaging recurrence on day 84 after surgery.
Example 3:
ctDNA detection and analysis were performed as in Example 1 on the patient's preoperative and postoperative plasma samples, looking for mutations whose frequency change ratio between the preoperative and postoperative samples exceeded the threshold (0.2). Analysis of the gene distribution and sequence length distribution supported this group of mutations being tumor-specific. In the postoperative sample, the frequencies of this group of mutations were all 0, and the patient was judged to have no minimal residual lesions. Follow-up found no recurrence within one year after surgery.
Claims (1)
1. A device for detecting liver cancer specific mutations based on the cfDNA base mutation frequency distribution, the device being used for sequencing a high-throughput sequencing library, wherein the high-throughput sequencing library is constructed by the following steps: Step 1, extracting cfDNA from the plasma of the subject;
Step 2, performing end repair on the cfDNA extracted in Step 1 and adding an A base to its 3' end; ligating truncated Y-shaped sequencing adapters carrying UMI molecular tags to both ends of the treated cfDNA to obtain a ligation product; purifying the ligation product with magnetic beads; amplifying and enriching it with primers containing index tag sequences that distinguish different samples, followed by magnetic bead purification; adding biotin-labeled probes for a hybridization reaction; eluting the library with streptavidin-labeled magnetic beads to capture the biotin-labeled nucleic acid molecules; and amplifying and purifying with magnetic beads to construct the high-throughput sequencing library;
wherein the device for detecting liver cancer specific mutations based on the cfDNA base mutation frequency distribution is characterized by comprising the following execution modules for obtaining the mutation frequency distribution of the cfDNA of a subject:
Module one: extracting UMI reads from the raw sequence file, separating the 3 bases of UMI information at each of the two ends of each read, adding the extracted UMI information to the read name for storage, and trimming the UMI information from the original sequence to obtain UMI-extracted sequencing reads, which are aligned to the human reference genome;
Module two: extracting all UMI-extracted sequencing reads aligned by bwa to the same position of the human reference genome; if sequencing reads share the same UMI information but their base sequences are inconsistent, removing the sequencing reads carrying that UMI information; if multiple sequencing reads share the same UMI information and their base sequences are consistent, marking them as a consensus sequence and retaining them; if only 1 sequencing read carries the UMI information, marking it as an isolated sequence and retaining it;
Module three: denoising the isolated sequences, recovering those that pass, and merging them with the consensus sequences to form the final output bam file; the recovery of isolated sequences after noise reduction comprises: calculating the base distribution of all human reference genome positions supported only by isolated sequences, estimating the magnitude of the sequencing noise based on a binomial distribution, and discarding an isolated sequence when its base differs from the major base at the human reference genome position and the base frequency is less than the noise, and otherwise recovering the isolated sequence;
Module four: converting the bam file into a base frequency distribution file using the iDES algorithm, and polishing the base frequency distribution file with a reference dataset of the human reference genome to obtain the base frequency distribution after noise reduction;
Module five: using the sequencing data of PBMCs as a control sample, detecting mutation frequencies in the data processed by Module four, and removing false-positive mutations to obtain the mutation frequency distribution of the cfDNA of the subject; the conditions for identifying a mutation in the mutation frequency detection of Module five are: there are more than 2 mutant reads when the site is covered by sequencing reads from both the forward and reverse strands, or more than 1 mutant read when the site is covered by sequencing reads from only the forward strand or only the reverse strand; and at the same time the mutation frequency is greater than -log(0.01)/depth, and the mutation frequency in the sequencing of the control sample is less than 0.005;
Module six: detecting, according to Modules one to five, the mutation frequency distribution of the preoperative cfDNA of a tumor patient and the mutation frequency distribution of a tumor tissue sample of the same origin, identifying the overlapping part, comparing the mutation frequency distribution of the overlapping part, and selecting the lower quartile of the mutation frequency distribution of the overlapping part as the frequency threshold;
Module seven: detecting, according to Modules one to five, the mutation frequency distribution of the postoperative cfDNA of the tumor patient, filtering the mutation frequency distributions of the preoperative and postoperative cfDNA of the tumor patient based on the obtained frequency threshold to obtain the screened mutation frequency distributions, calculating the mutation frequency change ratio, and drawing a peak plot.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210110195.XA CN114182022B (en) | 2022-01-29 | Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114182022A | 2022-03-15 |
CN114182022B | 2024-07-09 |
Non-Patent Citations (3)
Title |
---|
Detection of Rare Mutations in CtDNA Using Next Generation Sequencing;Xiaoxing Lv et al;《Journal of Visualized Experiments》;1-8 * |
Integrated digital error suppression for improved detection of circulating tumor DNA;Aaron M Newman et al;《nature biotechnology》;547-560 * |
Urine as a non-invasive alternative to blood for germline and somatic mutation detection in hepatocellular carcinoma;Amy K. Kim et al;《medRxiv》;1-32 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |