CN114182022A - Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution - Google Patents

Info

Publication number
CN114182022A
Authority
CN
China
Prior art keywords
mutation
cfdna
tumor
umi
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210110195.XA
Other languages
Chinese (zh)
Other versions
CN114182022B (en)
Inventor
刘小龙
陈耕
刘景丰
彭放
蔡志雄
张虎勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases)
Xian Jiaotong University
Original Assignee
Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases)
Xian Jiaotong University
Application filed by Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases) and Xian Jiaotong University
Priority to CN202210110195.XA
Publication of CN114182022A
Application granted
Publication of CN114182022B
Legal status: Active

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)

Abstract

The invention discloses a method for detecting liver cancer specific mutations based on the cfDNA base mutation frequency distribution. It is a tumor mutation detection scheme that screens gene mutations in blood, does not rely on tumor tissue sampling, is completely noninvasive, and accurately screens tumor-derived mutations. Pre- and post-operative blood samples are collected from tumor patients, and cfDNA is extracted from each for high-throughput UMI sequencing. After high-precision sequence processing and merging, mutation detection is performed on each blood sample separately. Once the mutations are identified, the frequency changes of all somatic mutations between the two time points, before and after surgery, are used to group the different types of gene mutations. The screened tumor-derived mutations are then integrated, and these tumor-specific mutations can further be used to dynamically evaluate the tumor burden of time-series samples.

Description

Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for detecting the base mutation frequency distribution in cfDNA and its application in tumor prognosis evaluation.
Background
Cancer is one of the major scientific problems endangering human health. According to statistics for 2020, there are about 19.3 million new cancer cases and about 10 million cancer deaths each year, and incidence is still rising year by year. At present, surgical resection is the main treatment for many tumors, but tumor patients still face an unsatisfactory resection rate and a high probability of recurrence. Minimal residual lesions remaining after surgery are the main cause of postoperative recurrence, so detecting them promptly after surgery is of great value for adjusting the treatment plan and improving the prognosis of tumor patients. The ability of traditional imaging and blood markers to detect minimal residual lesions is very limited, and they can hardly solve the problem of prognosis evaluation that clinicians continually face. Liquid biopsy technology, which has developed rapidly in recent years, can dynamically evaluate a patient's tumor burden with higher sensitivity, and provides a new tool and strategy for detecting minimal residual lesions after tumor surgery.
Among these approaches, dynamic tumor burden evaluation, which uses sequencing to track tumor genomic features based on the circulating tumor DNA (ctDNA) present in plasma cell-free DNA (cfDNA), has become one of the hot areas of clinical application. Clinical trials that use cfDNA to track postoperative residual tumor lesions and guide the optimization of treatment strategies are being run by a number of clinical research institutions in China and abroad. In 2020, the FDA approved the first liquid biopsy companion diagnostic based on high-throughput sequencing technology, and in 2021 the FDA granted the Signatera test Breakthrough Device designation, confirming its value in assessing minimal residual lesions and monitoring prognosis. However, in solid tumors, and especially in liver cancer, the detection of minimal residual lesions often requires tissue genomic information obtained in advance by tissue sampling. How to screen tumor-specific mutations noninvasively and evaluate tumor burden without relying on tissue remains one of the major unsolved clinical problems.
cfDNA contains DNA fragments from different tissues and different cell populations. Over a patient's clinical course, the proportion of cfDNA fragments originating from each tissue changes dynamically with the tumor burden and the patient's pathophysiological state. Analyzing mutation frequency changes across cfDNA samples taken at multiple time points therefore makes it possible to screen for mutations from a specific tissue. However, no protocol has yet been established that uses time-series cfDNA samples to screen tumor mutations, identify tumor burden, and assess the presence of minimal residual lesions.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for detecting the base mutation frequency distribution in cfDNA. It is a tumor mutation detection scheme that accurately screens tumor-derived mutations by screening gene mutations in blood; it does not rely on tumor tissue sampling and is completely noninvasive.
The technical scheme adopted by the invention is as follows: a method for detecting liver cancer specific mutations based on the cfDNA base mutation frequency distribution, comprising the following steps:
step 1, extracting cfDNA from plasma collected from a subject;
step 2, performing end repair on the cfDNA extracted in step 1 and adding an A base at its 3' end; ligating truncated Y-shaped sequencing adapters carrying a molecular tag (UMI) to both ends of the processed cfDNA to obtain a ligation product; purifying the ligation product with magnetic beads; performing amplification enrichment with primers containing index tag sequences that distinguish different samples, followed by magnetic bead purification; adding biotin-labeled probes for a hybridization reaction; eluting the library with streptavidin-labeled magnetic beads to capture the biotin-labeled nucleic acid molecules; amplifying and purifying with magnetic beads to construct a high-throughput sequencing library; and finally sequencing;
step 3, extracting the UMI reads in the raw sequence file: separating the 3-base UMI information at both ends of each read, adding the extracted UMI information to the read name for storage, trimming the UMI bases from the original read sequence to obtain the sequencing reads after UMI extraction, and aligning the sequencing reads after UMI extraction to the human reference genome;
step 4, extracting all sequencing reads after UMI extraction that are aligned by bwa to the same position of the human reference genome; if reads share the same UMI information but their base sequences are inconsistent, removing the reads carrying that UMI information; if multiple reads share the same UMI information and their base sequences are consistent, marking them as a common sequence and retaining them; if only 1 read carries a given UMI information, marking it as an isolated sequence and retaining it;
step 5, recovering the isolated sequences after noise reduction and merging them with the common sequences to form the final output bam file;
step 6, converting the bam file into a base frequency distribution file using the iDES algorithm, and then polishing the base frequency distribution file with a reference data set of the human reference genome to obtain the base frequency distribution after noise reduction;
step 7, taking the sequencing data of PBMCs as a control sample, performing mutation frequency detection on the data processed in step 6, and removing false positive mutations to obtain the mutation frequency distribution of the subject's cfDNA.
Further, in the mutation frequency detection of step 7, a site is identified as a mutation when all of the following hold: the site is covered by sequencing reads from both the positive and the negative strand and is supported by more than 2 mutant reads, or the site is covered by sequencing reads from only the positive or only the negative strand and is supported by more than 1 mutant read; the mutation frequency is greater than -log(0.01)/depth; and the mutation frequency in the control sample is less than 0.005.
Further, recovering the isolated sequences after noise reduction comprises: calculating the base distribution at all human reference genome positions that are covered only by isolated sequences, and estimating the magnitude of sequencing noise based on the binomial distribution; an isolated sequence is discarded when its base is inconsistent with the major base at that reference genome position and its base frequency is below the noise; otherwise the isolated sequence is recovered.
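The noise-reduction rule for isolated sequences can be illustrated with a minimal sketch. The source does not say how the binomial noise level is derived, so the per-base error rate (`error_rate=0.001`), the significance level (`alpha=0.05`), and the function names are all assumptions made for illustration. Here the noise level at a position is taken to be the smallest base frequency that sequencing errors alone would reach with probability below `alpha`.

```python
def noise_frequency(depth, error_rate=0.001, alpha=0.05):
    """Smallest k/depth with P(X >= k) < alpha, X ~ Binomial(depth, error_rate).

    error_rate and alpha are illustrative assumptions, not values from the source.
    """
    pmf = (1.0 - error_rate) ** depth  # P(X = 0)
    tail = 1.0                         # P(X >= 0)
    for k in range(depth + 1):
        if tail < alpha:
            return k / depth
        tail -= pmf                    # tail is now P(X >= k + 1)
        pmf *= (depth - k) / (k + 1) * error_rate / (1.0 - error_rate)
    return 1.0


def keep_isolated(base, base_counts, error_rate=0.001, alpha=0.05):
    """Apply the rule: discard only if the base is non-major AND below the noise."""
    depth = sum(base_counts.values())
    major = max(base_counts, key=base_counts.get)
    if base == major:
        return True
    freq = base_counts.get(base, 0) / depth
    return freq >= noise_frequency(depth, error_rate, alpha)


# Base counts at one position covered only by isolated sequences (made-up data).
counts = {"A": 997, "G": 2, "T": 1}
print(noise_frequency(1000))        # 0.004 under the assumed error model
print(keep_isolated("A", counts))   # major base: recovered
print(keep_isolated("T", counts))   # frequency 0.001 is below the noise: discarded
```

Under these assumed parameters, a depth of 1000 gives a noise level of 0.004, so a non-major base seen in 1 of 1000 isolated reads is discarded while one seen in 10 of 1000 is recovered.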
Further, step 7 is followed by: detecting, according to steps 1-7, the mutation frequency distribution of the preoperative cfDNA of a tumor patient and the mutation frequency distribution of the tumor tissue sample of the same patient, identifying the overlapping part, comparing the mutation frequency distribution of the overlapping part, and selecting the lower quartile of that distribution as the frequency threshold.
Further, the mutation frequency distribution of the postoperative cfDNA of the tumor patient is detected according to steps 1-7; the mutation frequency distributions of the preoperative and postoperative cfDNA are filtered with the obtained frequency threshold to give the screened mutation frequency distribution; the mutation frequency change ratio is calculated; and a peak map is drawn.
A preoperative blood sample and a postoperative blood sample are collected from a liver cancer patient, and cfDNA is extracted from each for high-throughput UMI sequencing. After high-precision sequence processing and merging, mutation detection is performed on each blood sample separately. Once the mutations are identified, the frequency changes of all somatic mutations between the two time points, before and after surgery, are used to group the different types of gene mutations. The screened tumor-derived mutations are then integrated, and these tumor-specific mutations can further be used to dynamically evaluate the tumor burden of time-series samples.
The invention analyzes the cfDNA mutation spectrum of tumor patients in depth and finds, for the first time, that the dynamic change of a tumor patient's cfDNA before and after surgery can be used to separate out the mutations that originate from the tumor, and that these mutations can be used to evaluate the effect of the tumor resection. Tumor-specific mutations can therefore be identified noninvasively using only the cfDNA extracted from a patient before and after surgery.
In summary, by assessing the frequencies of detected somatic mutations in preoperative and postoperative plasma samples from tumor surgery patients, mutations specific to the tumor can be screened.
The invention has the following beneficial effects: the technical scheme removes isolated sequence noise, mutation background noise, and false positive mutations from the subject's cfDNA data to obtain high-precision mutation frequency distribution data. These data are further screened with a frequency threshold to exclude non-tumor-specific mutation frequencies.
Drawings
FIG. 1. Distribution of tumor tissue specific mutations in plasma;
FIG. 2. Frequency distribution of tumor tissue specific mutations in preoperative plasma;
FIG. 3. Frequency distribution in postoperative plasma of the tumor tissue specific mutations found in preoperative plasma;
FIG. 4. Density distribution of the ratio of mutation frequencies before and after surgery for samples analyzed without tissue;
FIG. 5. Intersection of the two groups of mutated genes, inside and outside the threshold, with the tumor-specific mutated genes; chi-square test, p value 0.00017;
FIG. 6. Proportion of mutant reads within 150 bp in length for the two groups of mutations above and below the threshold;
FIG. 7. The grouping is significantly associated with relapse-free survival (RFS).
Detailed Description
The present invention is further described with reference to the following specific examples, but the scope of the invention is not limited to them. The terms "before operation" and "after operation" used herein refer to before and after the operation that removes the tumor.
The experimental materials used in the following examples were purchased from conventional biochemical reagent suppliers unless otherwise specified.
Materials and equipment:
(1) the QIAamp Circulating Nucleic Acid extraction kit was purchased from QIAGEN;
(2) streptavidin-labeled magnetic beads, formed by covalently bonding magnetic microparticles to high-purity streptavidin, were purchased from Integrated DNA Technologies (IDT) Inc., cat # 1080589. The magnetic beads can be used to capture biotin complexes, including biotin-labeled antigens, antibodies, and nucleic acids. The biotin-streptavidin interaction is strong and the nonspecific binding rate is low, so the captured substrate meets the requirements of subsequent experiments;
(3) the biotin-labeled probe panel was purchased from nanoonta (Nanjing) Biotechnology Ltd, cat # 1001111E. The probe panel covers 578 genes of widespread interest in solid tumor studies, spanning approximately 2.6 Mb of the genome, and supports the enrichment of multiple kinds of variant information, including base substitution, insertion/deletion, gene rearrangement, gene amplification, and microsatellite instability;
(4) ConsensusCruncher is a tool for suppressing the error rate of next-generation sequencing data; reads derived from the same DNA template are de-duplicated using Unique Molecular Identifiers (UMIs).
Example 1:
First, 100 pre- and post-operative plasma samples and tumor tissue samples from hepatocellular carcinoma patients, 38 plasma samples from cirrhosis patients, and 30 plasma samples from healthy volunteers were collected.
Second, for the pre- and post-operative blood samples and the follow-up blood samples of the hepatocellular carcinoma patients, 8 mL of whole blood was taken for each sample, plasma was collected by a two-step centrifugation method, and plasma cfDNA was extracted according to the instructions of the QIAamp Circulating Nucleic Acid extraction kit.
1. 3 mL of plasma, 300 μL of proteinase K, and 2.4 mL of Buffer ACL (containing carrier RNA) were added to a 50 mL centrifuge tube, vortexed for 30 s, and incubated at 60 ℃ for 30 min.
2. The tube was removed, 5.4 mL of Buffer ACB was added, and the mixture was vortexed for 30 s and incubated on ice for 5 min.
3. A VacConnector, a Mini column, and a 20 mL tube extender were inserted in sequence onto the QIAvac 24 Plus.
4. The sample mixture was added to the tube extender and the vacuum pump was turned on. After the sample had passed through the column, the vacuum pump was turned off and the 20 mL tube extender was removed. 600 μL of Buffer ACW1, 750 μL of Buffer ACW2, and 750 μL of absolute ethanol were then passed through the Mini column in sequence to wash it.
5. After washing was complete, the Mini column was removed and placed in a new 1.5 mL EP tube and centrifuged at 20000 g for 3 min; the flow-through was discarded, the lid was opened, and the column was incubated in an incubator at 56 ℃ for 10 min.
6. 30-50 μL of Buffer AVE was added to the Mini column, which was incubated at room temperature for 3 min and centrifuged at 20000 g for 3 min to collect the cfDNA sample.
7. The cfDNA concentration was accurately quantified with a Qubit, and the cfDNA fragment size was checked with a Qsep 100 capillary electrophoresis system. The prepared cfDNA was stored at -80 ℃ until use.
Third, constructing the genomic library (NanoPrep™ DNA library construction kit (for [Figure BDA0003494862380000041]) used together with the Duplex Seq Adapters Kit)
1. Performing end repair on cfDNA, and adding an A base at the 3' end;
1) the reaction mixture was prepared as follows:
Reagent | Amount
cfDNA | 10-20 ng
End Repair & A-Tailing Buffer | 6 μL
End Repair & A-Tailing Enzyme | 4 μL
H2O | Make up to 50 μL
After being mixed with 40 μL of cfDNA sample, the mixture was placed in a PCR instrument under the following reaction conditions:
Temperature | Time
20℃ | 30 min
65℃ | 30 min
10℃ |
2. Truncated Y-shaped sequencing adapters carrying a molecular tag (UMI), which contains a random molecular tag sequence, are ligated to both ends of the cfDNA.
1) The reaction mixture was prepared as follows:
Reagent | Amount
IDT Duplex adapter (15 μM) | 2 μL
Ligation Buffer | 26 μL
Total | 28 μL
After mixing with 50 μL of the end-repair product from the previous step, 2 μL of DNA ligase was added, and the mixture was placed in a PCR instrument under the following reaction conditions:
Temperature | Time
20℃ | 15 min
4℃ |
2) Purification of the ligation product:
Add 40 μL of NanoPrep™ SP Beads to the ligation product, mix well, incubate at room temperature for 5-10 min, place on a magnetic rack, and remove the supernatant once the solution clears; add 150 μL of 80% ethanol, incubate at room temperature for 30 s, and rinse the beads twice in this way; remove the residual ethanol and air-dry the beads at room temperature for 5-10 min; add Nuclease-Free Water, incubate at 25 ℃ for 2 min, separate the beads on the magnetic rack, transfer 20 μL of the supernatant to a new 0.2 mL PCR tube, and collect the purified ligation product.
3. Amplification enrichment is performed with primers containing index tag sequences that distinguish different samples.
1) The reaction mixture was prepared as follows:
Reagent | Amount
HiFi PCR Master Mix, 2x | 25 μL
UDI Primer Mix (5 μM) | 5 μL
Total | 30 μL
After mixing with 20 μL of the ligation product purified in the previous step, the mixture was placed in a PCR instrument and reacted under the following conditions:
[Figure BDA0003494862380000051: PCR reaction conditions]
2) PCR product purification: add 50 μL of NanoPrep™ SP Beads to the PCR product, mix well, incubate at room temperature for 5-10 min, place on a magnetic rack, and remove the supernatant once the solution clears; add 150 μL of 80% ethanol, incubate at room temperature for 30 s, and rinse the beads twice in this way; remove the residual ethanol and air-dry the beads at room temperature for 5-10 min; add 21 μL of Nuclease-Free Water, incubate at 25 ℃ for 2 min, separate the beads on the magnetic rack, transfer 20 μL of the supernatant to a new 0.2 mL PCR tube, and collect the purified PCR product for Qubit quantification and capillary electrophoresis.
4. Hybridization reaction
1) 500 ng of each of 8-12 constructed libraries was combined into one pool, 5 μL of COT Human DNA and 2 μL of Blocker were added, and the mixture was vacuum dried.
2) Hybridization reagent preparation
Reagent | Amount
2X Hybridization Buffer | 8.5 μL
Hybridization Buffer Enhancer | 2.7 μL
Probes | 4 μL
Nuclease-Free Water | 1.8 μL
Total | 17 μL
The vacuum-dried product was resuspended in the prepared buffer and incubated in a PCR instrument under the following conditions:
Temperature | Time
95℃ | 30 s
65℃ | 16 h
5. After the hybridization reaction, the library is eluted with streptavidin-labeled magnetic beads, which capture the biotin-labeled nucleic acid molecules.
1) Preparing the hybridization elution buffers
Reagent | Amount | H2O | Total
2X Bead Wash Buffer | 160 μL | 160 μL | 320 μL
10X Wash Buffer 1 | 28 μL | 252 μL | 280 μL
10X Wash Buffer 2 | 16 μL | 144 μL | 160 μL
10X Wash Buffer 3 | 16 μL | 144 μL | 160 μL
10X Stringent Wash Buffer | 32 μL | 288 μL | 320 μL
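The volumes in the table above are consistent with diluting each concentrated buffer to a 1X working solution, although the source does not state the target concentration explicitly. The following sketch, with the hypothetical helper `dilute_to_1x`, reproduces the water and total volumes under that assumption.

```python
def dilute_to_1x(stock_ul, stock_factor):
    """Return (water_ul, total_ul) for diluting a concentrated stock to 1X.

    Assumes a 1X working concentration, which the source does not state.
    """
    total_ul = stock_ul * stock_factor
    return total_ul - stock_ul, total_ul


# (stock volume in μL, concentration factor) as listed in the table.
buffers = {
    "2X Bead Wash Buffer": (160, 2),
    "10X Wash Buffer 1": (28, 10),
    "10X Wash Buffer 2": (16, 10),
    "10X Wash Buffer 3": (16, 10),
    "10X Stringent Wash Buffer": (32, 10),
}

for name, (stock_ul, factor) in buffers.items():
    water_ul, total_ul = dilute_to_1x(stock_ul, factor)
    print(f"{name}: {stock_ul} μL stock + {water_ul} μL H2O = {total_ul} μL")
```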
2) Preparation of the bead resuspension mix
Reagent | Amount
2X Hybridization Buffer | 8.5 μL
Hybridization Buffer Enhancer | 2.7 μL
Nuclease-Free Water | 5.8 μL
Total | 17 μL
3) Elution of the hybridization product
Take 50 μL of Capture Beads into a 1.5 mL low-adsorption tube, add 100 μL of Bead Wash Buffer, mix gently by pipetting, place on the magnetic rack for 1 min, and remove the supernatant. Resuspend the beads in the prepared 17 μL of bead resuspension mix, add the resuspended Capture Beads to the hybridization product, mix by pipetting, and incubate at 65 ℃ for 45 min to capture the hybridization library. The captured hybridization library is then washed sequentially with the prepared Wash Buffer 1, Stringent Wash Buffer, Wash Buffer 2, and Wash Buffer 3, and finally the library is eluted with 20 μL of Nuclease-Free Water.
6. PCR amplification of the hybridization library
1) Preparing the PCR reagents.
Reagent | Amount
HiFi PCR Master Mix, 2x | 25 μL
Library Amplification Primer Mix (5 μM) | 5 μL
Total | 30 μL
The reagents were mixed with the hybridization elution product from the previous step, and PCR amplification was carried out under the following reaction conditions:
[Figure BDA0003494862380000061: PCR reaction conditions]
2) PCR product purification
Add 50 μL of NanoPrep™ SP Beads to the PCR product, mix well, incubate at room temperature for 5-10 min, place on a magnetic rack, and remove the supernatant once the solution clears; add 150 μL of 80% ethanol, incubate at room temperature for 30 s, and rinse the beads twice in this way; remove the residual ethanol and air-dry the beads at room temperature for 5-10 min; add 21 μL of Nuclease-Free Water, incubate at 25 ℃ for 2 min, separate the beads on the magnetic rack, transfer 20 μL of the supernatant to a new 0.2 mL PCR tube, and collect the purified PCR product. The library concentration is accurately quantified with a Qubit, and the library fragment size is checked with a Qsep 100 capillary electrophoresis system; the fragments are mainly concentrated around 350 bp.
7. Finally, sequencing is performed on an Illumina HiSeq X Ten platform.
Fourth, the UMI (unique molecular identifier) reads in the raw sequence file are extracted using ConsensusCruncher:
The UMI pattern is set to NNN. The 3-base UMI information at both ends of each read is read out, the extracted UMI information is added to the read name for storage, and the UMI bases are trimmed from the original read sequence, giving the sequencing reads after UMI extraction, which are then aligned to the reference genome hg19.
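The UMI extraction step can be sketched as follows. This is not ConsensusCruncher's implementation: the `Read` type, the `extract_umi` helper, and the `name|UMI1.UMI2` naming format are hypothetical, and the sketch assumes the 3-base UMI sits at the very start of each mate with no spacer base.

```python
from typing import NamedTuple

UMI_LEN = 3  # "NNN" pattern: 3 UMI bases at the start of each mate


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def extract_umi(r1, r2, umi_len=UMI_LEN):
    """Move the leading UMI bases of both mates into the shared read name."""
    umi = f"{r1.seq[:umi_len]}.{r2.seq[:umi_len]}"
    tagged = f"{r1.name}|{umi}"  # hypothetical read-name format
    return (
        Read(tagged, r1.seq[umi_len:], r1.qual[umi_len:]),
        Read(tagged, r2.seq[umi_len:], r2.qual[umi_len:]),
    )


# Made-up read pair for illustration.
r1 = Read("read1", "ACGTTTGCA", "IIIIIIIII")
r2 = Read("read1", "TGACCGTAA", "IIIIIIIII")
out1, out2 = extract_umi(r1, r2)
print(out1.name, out1.seq, out2.seq)  # read1|ACG.TGA TTTGCA CCGTAA
```

Keeping the UMI in the read name means it survives alignment, so reads can later be grouped by alignment position and UMI together.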
Fifth, continuing with ConsensusCruncher, the sequencing reads after UMI extraction are aligned to the hg19 genome using bwa, the position information is recorded, and the common sequences of each UMI are identified and the isolated sequences are separated.
Here, a common sequence is a sequence jointly identified by multiple sequencing reads whose alignment position, UMI information, and base sequence are all identical; an isolated sequence is a sequence identified by only a single sequencing read.
The specific separation process is as follows. First, all UMI sequencing reads aligned to the same genomic position are extracted, and the number of distinct UMIs among them is counted. For the reads that share the same UMI information at the same genomic position, it is checked whether their base sequences are consistent. If they are not, the reads carrying that UMI at that position are regarded as containing PCR errors and are removed. If the base sequences are consistent and there is only one such read, it is regarded as an isolated sequence, marked, and retained; if the base sequences are consistent and there is more than one such read, they are regarded as a common sequence, marked, and retained. Common sequences are kept directly for the next analysis, whereas isolated sequences must undergo noise reduction: the base distribution is calculated for the genomic positions supported only by isolated sequences, the magnitude of sequencing noise is estimated based on the binomial distribution, and an isolated sequence is discarded when its base is inconsistent with the major base at that position and its base frequency is below the noise; otherwise the isolated sequence is recovered for the next analysis. The recovered isolated sequences and the common sequences are merged to form the final bam file.
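The grouping logic described above can be sketched in a few lines. This is an illustration only, not ConsensusCruncher's code: the function name `split_families` and the tuple layout `(chrom, pos, umi, seq)` are assumptions, and real reads would come from a bam file rather than a Python list.

```python
from collections import defaultdict


def split_families(reads):
    """Group reads by (chrom, pos, umi) and classify each group.

    Returns (common, isolated); groups with inconsistent bases are dropped.
    """
    families = defaultdict(list)
    for chrom, pos, umi, seq in reads:
        families[(chrom, pos, umi)].append(seq)

    common, isolated = {}, {}
    for key, seqs in families.items():
        if len(set(seqs)) > 1:
            continue  # same UMI, different bases: treated as PCR error
        if len(seqs) > 1:
            common[key] = seqs[0]    # several identical reads
        else:
            isolated[key] = seqs[0]  # a single supporting read
    return common, isolated


# Made-up reads, all aligned to the same position.
reads = [
    ("chr1", 100, "ACG.TGA", "TTTGCA"),
    ("chr1", 100, "ACG.TGA", "TTTGCA"),
    ("chr1", 100, "GGT.CAT", "TTTGCA"),
    ("chr1", 100, "CCA.TTG", "TTTGCA"),
    ("chr1", 100, "CCA.TTG", "TTAGCA"),
]
common, isolated = split_families(reads)
print(sorted(common), sorted(isolated))
```

In this example the `ACG.TGA` family becomes a common sequence, `GGT.CAT` is an isolated sequence, and the `CCA.TTG` family is removed because its two reads disagree.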
Sixth, the bam file is converted into a base frequency distribution file for the sequenced region using iDES.
The iDES algorithm uses the sequence information of all sequencing reads in the bam file and, for each genomic position, extracts the base distribution at that position. The base frequency distribution file is then polished against the genome-wide base distribution of a previously established reference data set for the hg19 genome. Polishing means that a distribution model is built from the genome-wide base distribution of the hg19 reference data set, and each candidate mutation in the sequencing reads is subjected to a hypothesis test against this model to judge whether it is genuine and to estimate the probability that the mutant base was detected in error. If the mutation is a background error, it is removed, so the background error rate in the base distribution file is reduced after polishing.
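The idea of extracting a per-position base distribution and testing candidates against a background model can be sketched as below. This does not reproduce the iDES background model; it substitutes a plain one-sided binomial test, and the `pileup` and `background` inputs, the function names, and `alpha=0.01` are all illustrative assumptions.

```python
from collections import Counter


def base_frequencies(pileup):
    """pileup: {position: string of observed bases} -> per-base frequencies."""
    table = {}
    for pos, bases in pileup.items():
        counts = Counter(bases)
        table[pos] = {b: counts[b] / len(bases) for b in "ACGT"}
    return table


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), accumulated from P(X = 0) upward."""
    pmf = (1.0 - p) ** n
    below = 0.0
    for i in range(k):
        below += pmf
        pmf *= (n - i) / (i + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - below)


def polish(pileup, background, alpha=0.01):
    """Keep a candidate only if it is unlikely under the background error rate."""
    kept = {}
    for pos, bases in pileup.items():
        counts = Counter(bases)
        for base, rate in background.get(pos, {}).items():
            alt = counts.get(base, 0)
            if alt and binom_tail(alt, len(bases), rate) < alpha:
                kept[(pos, base)] = alt / len(bases)
    return kept


# Made-up pileups and background error rates for two positions.
pileup = {100: "A" * 995 + "T" * 5, 101: "C" * 999 + "G"}
background = {100: {"T": 0.0005}, 101: {"G": 0.001}}
print(base_frequencies(pileup)[100]["T"])  # 0.005
print(polish(pileup, background))          # only the candidate at position 100 survives
```

Five mutant reads at a position whose assumed background rate predicts 0.5 are kept, while a single mutant read at a position whose background rate predicts 1 is removed as background error.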
Seventh, high-precision mutation detection is performed using the frequency distribution differences between samples.
A gene library of PBMCs (peripheral blood mononuclear cells) is constructed, and sequencing data are obtained according to the fourth to sixth steps and used as the control sample. Mutation frequency detection is then performed on the preoperative plasma sample, and a mutation site must meet the following requirements: when covered by double-ended sequences (that is, supported by reads from both the positive and negative strands), it is supported by more than 2 mutant reads; when covered by single-ended sequences (that is, supported by reads from only the positive or only the negative strand), it is supported by more than 1 mutant read; the mutation frequency is greater than -log(0.01)/depth; and the mutation frequency in the control sample is less than 0.005, so that false positive results are removed.
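These calling criteria can be written as a small filter. The sketch below is one reading of the text, not the authors' code: the `Candidate` fields and function name are hypothetical, the mutation frequency is taken as mutant reads divided by depth, and the logarithm is assumed to be the natural log (the source does not state the base), exposed as the `log` parameter so that another base can be swapped in.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    fwd_alt: int         # mutant reads on the positive strand
    rev_alt: int         # mutant reads on the negative strand
    depth: int           # total read depth at the site
    control_freq: float  # mutation frequency in the PBMC control


def passes_filter(c, log=math.log, control_max=0.005):
    """Apply the read-support, frequency, and control-sample criteria."""
    alt = c.fwd_alt + c.rev_alt
    if alt == 0 or c.depth == 0:
        return False
    both_strands = c.fwd_alt > 0 and c.rev_alt > 0
    enough_reads = alt > 2 if both_strands else alt > 1
    min_freq = -log(0.01) / c.depth  # natural log assumed by default
    return enough_reads and alt / c.depth > min_freq and c.control_freq < control_max


print(passes_filter(Candidate(3, 3, 1000, 0.0)))   # True
print(passes_filter(Candidate(2, 1, 1000, 0.0)))   # False: frequency too low
print(passes_filter(Candidate(3, 3, 1000, 0.01)))  # False: present in control
```

With the natural log, -log(0.01)/depth is roughly the frequency at which a mutation would be missed entirely about 1% of the time at that depth, which under this reading amounts to requiring at least 5 mutant reads. Passing `log=math.log10` instead would make the frequency criterion equivalent to more than 2 mutant reads.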
Eighthly, cross-aligning the mutation detected before the operation with the mutation from the tumor tissue, identifying the overlapped part, and comparing the mutation frequency distribution of the overlapped part (the mutation detected in both blood and tissue, namely the tumor specific mutation detected in blood).
The frequency distribution of the overlapping mutations is shown in figure 2. The figure shows that tumor-specific mutations in blood follow a characteristic frequency distribution, whereas non-tumor-specific mutations in blood are concentrated in the low-frequency region. Therefore the lower quartile of the frequency distribution of tumor-specific mutations in blood, 0.02611, is selected as the basis for mutation frequency screening. This frequency threshold serves as the first condition for filtering mutations without relying on tissue.
All values are sorted by size and divided into four equal parts; the values at the three dividing points are the quartiles. The smallest of these is called the lower quartile.
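As a small illustration of this thresholding step, the lower quartile can be computed with Python's standard library and then used to filter candidate sites. This is a sketch, not the patent's code: the text does not say which interpolation rule was used to obtain 0.02611, so the default rule of `statistics.quantiles` is an assumption, and the helper names and the `{site: frequency}` dictionary layout are hypothetical.

```python
from statistics import quantiles

def lower_quartile(freqs):
    """First quartile of a list of mutation frequencies.

    Uses the default ('exclusive') interpolation of statistics.quantiles;
    the interpolation rule used in the source is not stated.
    """
    return quantiles(freqs, n=4)[0]

def filter_by_threshold(mutations, threshold):
    """Keep sites whose frequency is strictly greater than the threshold.

    `mutations` is a hypothetical {site: frequency} mapping.
    """
    return {site: f for site, f in mutations.items() if f > threshold}
```

In the workflow described here, `lower_quartile` would be applied to the plasma frequencies of the overlapping (blood and tissue) mutations, and `filter_by_threshold` to all mutations detected in preoperative plasma.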
Ninth, all mutations detected before the operation are filtered with the frequency threshold of 0.02611 determined in the previous step, and the mutations that pass are taken as candidate tumor-derived mutations. For the sites whose frequency exceeds the threshold, the mutation frequencies are then extracted from the postoperative blood samples according to steps three to seven; the postoperative plasma frequency distribution of these preoperative plasma tumor-tissue-specific mutations is shown in figure 3.
Tenth, the ratio of change in mutation frequency between the preoperative (figure 2) and postoperative (figure 3) samples is calculated and plotted as a peak map, shown in figure 4. In the peak map, the threshold that clearly separates the two mutation peaks, located at the lowest point between them, is selected as the diagnostic threshold. The mutations detected before the operation thus fall into two distinct groups: the group whose frequency drops markedly reflects the effect of tumor resection on tumor burden and consists of tumor-specific mutations; the other group consists of mutations of other (non-tumor-specific) origin.
Eleventh, this group of mutations is extracted for gene overlap analysis (figure 5) and length distribution analysis (figure 6), which confirm that the group has the characteristics of typical tumor-derived mutations. This group of mutations can therefore be regarded as tumor-derived specific mutations.
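The valley-picking idea can be sketched as follows. This is a rough illustration under several assumptions that the text does not confirm: the change ratio is taken to be (preoperative frequency minus postoperative frequency) divided by preoperative frequency, the peak map is approximated by a fixed-width histogram, and the valley is the emptiest bin between the two tallest non-adjacent bins. The function names and the `bins` parameter are hypothetical.

```python
def change_ratio(pre, post):
    """Assumed definition: relative drop in mutation frequency after surgery."""
    return (pre - post) / pre

def valley_threshold(ratios, bins=10):
    """Centre of the lowest histogram bin between the two tallest peaks."""
    lo, hi = min(ratios), max(ratios)
    if hi == lo or bins < 3:
        raise ValueError("need a non-zero value range and at least 3 bins")
    width = (hi - lo) / bins
    counts = [0] * bins
    for r in ratios:
        counts[min(int((r - lo) / width), bins - 1)] += 1

    first = max(range(bins), key=counts.__getitem__)        # tallest bin
    others = [i for i in range(bins) if abs(i - first) >= 2]
    second = max(others, key=counts.__getitem__)             # tallest non-adjacent bin
    left, right = sorted((first, second))
    valley = min(range(left + 1, right), key=counts.__getitem__)
    return lo + (valley + 0.5) * width
```

A histogram is sensitive to bin width, so on real data a smoothed density estimate would likely give a more stable valley; the fixed bins here only keep the example short.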
Thirteenth, the presence or absence of this group of mutations in postoperative samples is used to judge whether a hepatocellular carcinoma patient has minimal residual disease, and Kaplan-Meier (KM) survival analysis is performed in R (figure 7). The results show that hepatocellular carcinoma patients with minimal residual disease had a significantly worse prognosis than patients without detectable minimal residual disease. These results show that tumor-derived mutations can be screened from the change in somatic mutation frequency before and after the operation, without relying on tumor tissue, and that minimal residual disease can thereby be identified.
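The survival analysis in the source was run in R and the script is not given. For readers who want to see what the Kaplan-Meier product-limit estimate computes, the following pure-Python sketch may help. It is an illustration only, with hypothetical names: `times` holds follow-up durations and `events` holds 1 for an observed event (such as recurrence) and 0 for a censored patient.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each observed event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]   # everyone leaving at time t
        observed = sum(1 for e in tied if e)       # events, excluding censored
        if observed:
            survival *= 1 - observed / at_risk     # product-limit update
            curve.append((t, survival))
        at_risk -= len(tied)
        i += len(tied)
    return curve
```

Computing one curve for patients with detectable residual mutations and one for patients without would give the two curves that a KM plot compares; a significance test such as the log-rank test is a separate step not shown here.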
Example 2:
ctDNA detection analysis was performed on the patient's preoperative and postoperative plasma samples by the method of example 1, and mutations were found whose frequency change rate before and after the operation was greater than the diagnostic threshold (0.2). Analysis of the distribution of these mutations across genes and of their sequence length distribution showed that the mutated genes include important genes associated with tumorigenesis, so these are tumor-associated mutations. The analysis showed that tumor-related mutations were still present after the operation, and the patient was judged to have minimal residual disease. On follow-up, imaging-confirmed recurrence was found 84 days after the operation.
Example 3:
ctDNA detection analysis was performed on the patient's preoperative and postoperative plasma samples by the method of example 1, and mutations were found whose frequency change rate before and after the operation was greater than the threshold (0.2). Analysis of the gene distribution and sequence length distribution supports this group of mutations being tumor-specific. In the postoperative sample the frequency of every mutation in this group was 0, so the patient was judged to have no minimal residual disease. On follow-up, no recurrence was found within one year after the operation.
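The decision logic shared by examples 2 and 3 can be summarized in a short sketch. It is illustrative only: the change rate is again assumed to be the relative drop (preoperative minus postoperative, divided by preoperative), sites missing from the postoperative data are treated as frequency 0, and the function names and `{site: frequency}` dictionaries are hypothetical. The 0.2 default is the diagnostic threshold quoted in the examples.

```python
def tumor_specific_sites(pre, post, threshold=0.2):
    """Sites whose assumed relative frequency drop exceeds the threshold.

    `pre` and `post` are hypothetical {site: frequency} mappings; `pre`
    frequencies are expected to be positive.
    """
    return [s for s, f in pre.items()
            if (f - post.get(s, 0.0)) / f > threshold]

def residual_disease_detected(sites, post):
    """True if any tumor-specific site is still detectable after surgery."""
    return any(post.get(s, 0.0) > 0 for s in sites)
```

A patient whose tumor-specific sites all read 0 after surgery would be called negative, as in example 3, while any persisting site gives a positive call, as in example 2.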

Claims (5)

1. A method for detecting liver cancer specific mutations based on the cfDNA base mutation frequency distribution, characterized by comprising the following steps:
step 1, extracting cfDNA from plasma collected from a subject;
step 2, performing end repair on the cfDNA extracted in step 1 and adding an A base to its 3' end; ligating truncated Y-shaped sequencing adapters carrying a molecular tag (UMI) to both ends of the processed cfDNA to obtain a ligation product; purifying the ligation product with magnetic beads; performing amplification enrichment with primers containing an index tag sequence for distinguishing different samples, followed by magnetic bead purification; adding biotin-labeled probes for a hybridization reaction; eluting the library with streptavidin-labeled magnetic beads to capture the biotin-labeled nucleic acid molecules; performing amplification and magnetic bead purification to construct a high-throughput sequencing library; and finally sequencing;
step 3, extracting the UMI reads from the original sequence file, separating the 3-base UMI information at both ends of each read, adding the extracted UMI information to the read name for storage, cutting the UMI information out of the original read sequence to obtain UMI-extracted sequencing reads, and aligning the UMI-extracted sequencing reads to a human reference genome;
step 4, extracting all UMI-extracted sequencing reads aligned by bwa to the same position of the human reference genome; if sequencing reads share the same UMI information but their base sequences are inconsistent, removing the sequencing reads of that UMI information; if a plurality of sequencing reads share the same UMI information and their base sequences are consistent, marking them as common sequences and retaining them; if only 1 sequencing read carries a given UMI information, marking it as an isolated sequence and retaining it;
step 5, denoising and recovering the isolated sequences, and merging them with the common sequences to form the final output bam file;
step 6, converting the bam file into a base frequency distribution file using the iDES algorithm, and then polishing the base frequency distribution file with a reference data set of the human reference genome to obtain the denoised base frequency distribution;
step 7, using PBMC sequencing data as the control sample, performing mutation frequency detection on the data processed in step 6, and removing false-positive mutations to obtain the mutation frequency distribution of the subject's cfDNA.
2. The method of claim 1, wherein the conditions for identifying a mutation in the mutation frequency detection of step 7 are: there are more than 2 mutant reads when the site is covered by sequencing reads from both the positive strand and the negative strand, or more than 1 mutant read when the site is covered by sequencing reads from only the positive strand or only the negative strand; the mutation frequency is greater than -log(0.01)/depth; and the mutation frequency in the control sample is less than 0.005.
3. The method of claim 1, wherein denoising and recovering the isolated sequences comprises: calculating the base distribution at all human reference genome positions covered only by isolated sequences, estimating the magnitude of sequencing noise based on a binomial distribution, discarding an isolated sequence when its base is inconsistent with the major base at the human reference genome position and the base frequency is less than the noise, and otherwise recovering the isolated sequence.
4. The method of claim 1, further comprising, after step 7: detecting, according to steps 1 to 7, the mutation frequency distribution of the preoperative cfDNA of a tumor patient and the mutation frequency distribution of a tumor tissue sample from the same patient, identifying the overlapping part, comparing the mutation frequency distribution of the overlapping part, and selecting the lower quartile of the mutation frequency distribution of the overlapping part as the frequency threshold.
5. The method according to claim 4, wherein the mutation frequency distribution of the postoperative cfDNA of the tumor patient is detected according to steps 1 to 7, the mutation frequency distributions of the preoperative and postoperative cfDNA of the tumor patient are filtered with the obtained frequency threshold to obtain filtered mutation frequency distributions, the mutation frequency change ratio is calculated, and a peak map is drawn.
CN202210110195.XA 2022-01-29 2022-01-29 Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution Active CN114182022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210110195.XA CN114182022B (en) 2022-01-29 2022-01-29 Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution

Publications (2)

Publication Number Publication Date
CN114182022A true CN114182022A (en) 2022-03-15
CN114182022B CN114182022B (en) 2024-07-09

Family

ID=80545834

Country Status (1)

Country Link
CN (1) CN114182022B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021788A (en) * 2017-12-06 2018-05-11 深圳市新合生物医疗科技有限公司 The method and apparatus of deep sequencing data extraction biomarker based on cell free DNA
CN113903401A (en) * 2021-12-10 2022-01-07 臻和(北京)生物科技有限公司 ctDNA length-based analysis method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AARON M NEWMAN ET AL: "Integrated digital error suppression for improved detection of circulating tumor DNA", 《NATURE BIOTECHNOLOGY》, pages 547 - 560 *
AMY K. KIM ET AL: "Urine as a non-invasive alternative to blood for germline and somatic mutation detection in hepatocellular carcinoma", 《MEDRXIV》, pages 1 - 32 *
XIAOXING LV ET AL: "Detection of Rare Mutations in CtDNA Using Next Generation Sequencing", 《JOURNAL OF VISUALIZED EXPERIMENTS》, pages 1 - 8 *
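GUO XIAODONG; JIANG XIAOFENG; GAO ZHUO: "Research progress of circulating tumor DNA in non-small cell lung cancer", Chinese Journal of Laboratory Diagnosis, no. 02, pages 156 - 161 *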

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898802A (en) * 2022-07-14 2022-08-12 臻和(北京)生物科技有限公司 Terminal sequence frequency distribution characteristic determination method, evaluation method and device based on plasma free DNA methylation sequencing data
CN116356001A (en) * 2023-02-07 2023-06-30 江苏先声医学诊断有限公司 Dual background noise mutation removal method based on blood circulation tumor DNA
CN116356001B (en) * 2023-02-07 2023-12-15 江苏先声医学诊断有限公司 Dual background noise mutation removal method based on blood circulation tumor DNA

Also Published As

Publication number Publication date
CN114182022B (en) 2024-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant