CN114182022B

CN114182022B - Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution

Info

Publication number: CN114182022B
Application number: CN202210110195.XA
Authority: CN
Inventors: 刘小龙; 陈耕; 刘景丰; 彭放; 蔡志雄; 张虎勤
Original assignee: Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases); Xian Jiaotong University
Current assignee: Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases); Xian Jiaotong University
Filing date: 2022-01-29
Publication date: 2024-07-09
Anticipated expiration: 2042-01-29

Abstract

The invention discloses a method for detecting liver cancer specific mutations based on the cfDNA base mutation frequency distribution. It is a screening strategy for gene mutations in blood that does not rely on tumor tissue sampling, and is a completely noninvasive tumor mutation detection scheme for accurately screening tumor-derived mutations. Preoperative and postoperative blood samples of tumor patients are collected, cfDNA is extracted from each, and high-throughput UMI sequencing is performed. After high-precision sequence processing and merging, mutation detection is carried out on each blood sample. Once mutations are identified, all somatic mutations are grouped into different types according to their frequency changes between the two time points before and after surgery. The screened tumor-derived mutations are then integrated, and these tumor-specific mutations can further be used for dynamic evaluation of tumor burden in time-series samples.

Description

Method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution

Technical Field

The invention relates to the technical field of bioinformatics, in particular to a detection method of base mutation frequency distribution in cfDNA and application thereof in tumor prognosis evaluation.

Background

Cancer is one of the major problems endangering human health. Statistics from 2020 show about 19.3 million new cancer cases and about 10 million cancer deaths each year, and the incidence is still rising year by year. At present, surgical resection is the main treatment for many tumors, but patients still face unsatisfactory resection rates and a high probability of recurrence. Minimal residual lesions remaining after surgery are the main cause of postoperative recurrence, and detecting them promptly after surgery is of great value for adjusting the treatment plan and improving prognosis. Conventional imaging and blood markers have very limited ability to detect minimal residual lesions and can hardly solve the prognosis evaluation problem that is continually faced in the clinic. Liquid biopsy, which has developed rapidly in recent years, can dynamically evaluate a patient's tumor burden with higher sensitivity, and provides a new tool and strategy for detecting minimal residual lesions after tumor surgery.

Among these approaches, dynamic tumor burden assessment that uses sequencing to track tumor genomic features, based on the genomic information of circulating tumor DNA (ctDNA) within plasma cell-free DNA (cfDNA), has become one of the hot spots of clinical application. Many clinical research institutions at home and abroad are currently running clinical trials that use cfDNA tracking to detect residual tumor lesions after surgery and so guide the optimization of treatment strategies. In 2020, the FDA approved the first liquid biopsy companion diagnostic test based on high-throughput sequencing, and in 2021 the FDA granted Signatera a breakthrough medical device designation, confirming its value in assessing minimal residual lesions and monitoring prognosis. However, in solid tumors, especially liver cancer, detection of minimal residual lesions still usually relies on tissue genomic information obtained in advance by tissue sampling. How to screen for tumor-specific mutations and evaluate tumor burden noninvasively, without relying on tissue, remains one of the significant problems to be solved in the clinic.

cfDNA contains DNA fragments from different tissues and different cell populations. Over a patient's clinical course, the proportions of DNA fragments from different tissues in cfDNA change dynamically with tumor burden and with the patient's physiological and pathological state. Analysis of mutation frequency changes across cfDNA samples from multiple time points therefore makes it possible to screen for mutations from specific tissues. However, no scheme has yet been established that uses time-series cfDNA samples to screen tumor mutations, identify tumor burden and evaluate the presence of minimal residual lesions.

Disclosure of Invention

In order to solve the technical problems, the invention develops a detection method for the base mutation frequency distribution in cfDNA, which is a screening strategy for gene mutation in blood, does not need to rely on tumor tissue sampling, and is a completely noninvasive tumor mutation detection scheme for accurately screening tumor-derived mutation.

The technical scheme adopted by the invention is as follows: a method for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution, comprising the following steps:

Step 1, extracting cfDNA from the plasma of blood drawn from the subject;

Step 2, repairing the ends of the cfDNA extracted in step 1 and adding an A base at the 3' end; ligating truncated Y-shaped sequencing adapters carrying molecular tags (UMI) to both ends of the treated cfDNA to obtain a ligation product; purifying the ligation product with magnetic beads; amplifying and enriching it with primers containing index tag sequences for distinguishing different samples; purifying with magnetic beads; adding biotin-labeled probes for the hybridization reaction; eluting the library with streptavidin-labeled magnetic beads to capture the biotin-labeled nucleic acid molecules; amplifying and purifying with magnetic beads to construct a high-throughput sequencing library; and finally sequencing;

Step 3, extracting UMI reads from the raw sequence file: separating the 3-base UMI information at the two ends of each read, appending the extracted UMI information to the read name for storage, and trimming the UMI bases from the raw sequence to obtain UMI-extracted sequencing reads; then aligning the UMI-extracted sequencing reads to the human reference genome;

Step 4, extracting all UMI-extracted sequencing reads aligned by bwa to the same position of the human reference genome; if several sequencing reads have the same UMI information but inconsistent base arrangements, removing the sequencing reads with that UMI information; if several sequencing reads have the same UMI information and consistent base arrangements, marking them as a common sequence and retaining them; if only 1 sequencing read carries a given UMI information, marking it as an isolated sequence and retaining it;

Step 5, recovering the isolated sequences after noise reduction and merging them with the common sequences to form the final output bam file;

Step 6, converting the bam file into a base frequency distribution file using the iDES algorithm, and polishing the base frequency distribution file with a reference data set of the human reference genome to obtain the noise-reduced base frequency distribution;

Step 7, using PBMC sequencing data as a control sample, performing mutation frequency detection on the data processed in step 6 and removing false positive mutations to obtain the mutation frequency distribution of the subject's cfDNA.

Further, the conditions for calling a mutation in the mutation frequency detection of step 7 are: for sites covered by sequencing reads from both the positive and negative strands, more than 2 mutant reads; for sites covered by sequencing reads from only the positive strand or only the negative strand, more than 1 mutant read; in addition, the mutation frequency is greater than -log(0.01)/depth, and the mutation frequency at the site in the control sample is less than 0.005.

Further, recovering the isolated sequences after noise reduction includes: calculating the base distribution at every human reference genome position covered only by isolated sequences, and estimating the magnitude of sequencing noise from a binomial distribution; when the base carried by an isolated sequence differs from the main base at that reference genome position and its base frequency is lower than the noise, the isolated sequence is discarded; otherwise the isolated sequence is recovered.
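The recovery rule above can be illustrated with a minimal Python sketch. The patent does not state how the binomial noise level is computed, so the per-base error rate (`error_rate`), the significance level (`alpha`) and all function names below are assumptions chosen for illustration only, not the parameters of ConsensusCruncher or of the inventors' pipeline.

```python
from collections import Counter
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def noise_frequency(depth, error_rate=0.001, alpha=0.01):
    """Smallest base frequency that sequencing error alone is unlikely to reach.

    error_rate and alpha are assumed values, not taken from the patent.
    """
    for k in range(1, depth + 1):
        if binomial_tail(depth, k, error_rate) < alpha:
            return k / depth
    return 1.0


def rescue_isolated(bases, error_rate=0.001, alpha=0.01):
    """Decide, for one genomic position, which isolated reads to recover.

    bases: the base each isolated read shows at this position.
    Returns one boolean per read (True = recover, False = discard).
    """
    if not bases:
        return []
    counts = Counter(bases)
    depth = len(bases)
    main_base = counts.most_common(1)[0][0]
    noise = noise_frequency(depth, error_rate, alpha)
    # Discard only when the base differs from the main base AND its
    # frequency is below the estimated noise level.
    return [b == main_base or counts[b] / depth >= noise for b in bases]
```

With 100 isolated reads and the assumed error rate, a base seen once is treated as noise, while a base seen three times is recovered.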

Further, step 7 further comprises: detecting, according to steps 1-7, the mutation frequency distribution of the preoperative cfDNA of a tumor patient and the mutation frequency distribution of the matched tumor tissue sample, identifying the overlapping mutations, comparing the mutation frequency distribution of the overlapping mutations, and selecting the lower quartile of that distribution as a frequency threshold.

Further, the mutation frequency distribution of the postoperative cfDNA of the tumor patient is detected according to steps 1-7; the preoperative and postoperative cfDNA mutation frequency distributions of the tumor patient are filtered with the obtained frequency threshold to give the screened mutation frequency distribution; the mutation frequency change ratio is calculated, and a peak plot is drawn.

Preoperative and postoperative blood samples of liver cancer patients are collected, and cfDNA is extracted from each for high-throughput UMI sequencing. After high-precision sequence processing and merging, mutation detection is carried out on each blood sample. Once mutations are identified, all somatic mutations are grouped into different types according to their frequency changes between the two time points before and after surgery. The screened tumor-derived mutations are then integrated, and these tumor-specific mutations can further be used for dynamic evaluation of tumor burden in time-series samples.

The invention analyzes the cfDNA mutation spectrum of tumor patients in depth and finds for the first time that the dynamic change of plasma cfDNA before and after surgery can be used to separate tumor-derived mutations, and that these mutations can be used to evaluate the effect of tumor resection. Tumor-specific mutations can therefore be identified noninvasively using only the patient's preoperative and postoperative cfDNA.

In summary, the frequency of the detected somatic mutations is evaluated by using the preoperative and postoperative plasma samples of the tumor surgery patients, so that the specific mutations of the tumor can be screened.

The beneficial effects of the invention are mainly as follows: the technical scheme eliminates isolated sequence noise, mutation background noise and false positive mutations from the subject's cfDNA, yielding high-precision mutation frequency distribution data. These high-precision data are further screened with a frequency threshold to remove non-tumor-specific mutations.

Drawings

FIG. 1 Distribution of tumor tissue specific mutations in plasma;

FIG. 2 Frequency distribution of tumor tissue specific mutations in preoperative plasma;

FIG. 3 Frequency distribution of preoperative plasma tumor tissue specific mutations in postoperative plasma;

FIG. 4 Density distribution of preoperative-to-postoperative mutation frequency ratios for tissue-free samples;

FIG. 5 Overlap of the two groups of mutant genes (within and outside the threshold) with tumor-specific mutant genes; chi-square test, p value 0.00017;

FIG. 6 Proportion of mutant reads with a length within 150 bp in the groups above and below the threshold;

FIG. 7 The grouping is significantly correlated with recurrence-free survival (RFS) time.

Detailed Description

The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto. The terms "pre-operative" and "post-operative" as used herein refer to before and after surgical removal of the tumor, respectively.

The experimental materials used in the examples described below, unless otherwise specified, were purchased from conventional biochemical reagent stores.

Materials and equipment:

(1) QIAamp Circulating Nucleic Acid extraction kit was purchased from QIAGEN company;

(2) Streptavidin-labeled magnetic beads were purchased from Integrated DNA Technologies (IDT), catalog number 1080589. Streptavidin magnetic beads are formed by covalently coupling magnetic particles with high-purity streptavidin. The beads can be used to capture biotinylated complexes, including biotin-labeled antigens, antibodies and nucleic acids. The biotin-streptavidin interaction is strong and the non-specific binding rate is low, so the captured substrate meets the requirements of subsequent experiments;

(3) The biotin-labeled probe panel was purchased from Naonda (Nanjing) Biotechnology Limited, catalog number 1001111E. The probe panel covers 578 genes of broad interest in solid tumor research, spanning approximately 2.6 Mb of the genome, and supports enrichment of a variety of variant types including base substitutions, insertions/deletions, gene rearrangements, gene amplifications and microsatellite instability;

(4) ConsensusCruncher is a tool that suppresses the error rate of next-generation sequencing data by using Unique Molecular Identifiers (UMI) to deduplicate reads derived from the same DNA template.

Example 1:

1. Preoperative and postoperative plasma samples and tumor tissue samples were collected from 100 hepatocellular carcinoma patients, together with samples from 38 cirrhosis patients and 30 healthy volunteers.

2. For the enrolled hepatocellular carcinoma patients, 8 mL of whole blood was drawn for each preoperative, postoperative and follow-up blood sample; plasma was collected by two-step centrifugation, and plasma cfDNA was extracted according to the instructions of the QIAamp Circulating Nucleic Acid extraction kit.

1. To a 50 mL centrifuge tube were added 3 mL of plasma, 300 μL of proteinase K and 2.4 mL of Buffer ACL (containing carrier RNA); the tube was vortexed for 30 s and incubated at 60 ℃ for 30 min.

2. The tube was removed, 5.4 mL of Buffer ACB was added, and the tube was vortexed for 30 s and incubated on ice for 5 min.

3. The VacConnector, Mini column and 20 mL tube extender were fitted in sequence onto the QIAvac Plus.

4. The sample mixture was added to the tube extender and the vacuum pump was switched on. After the sample had passed through the column, the vacuum pump was switched off and the 20 mL tube extender was removed; the Mini column was then washed sequentially with 600 μL of Buffer ACW1, 750 μL of Buffer ACW2 and 750 μL of absolute ethanol.

5. After washing, the Mini column was removed and placed in a new 1.5 mL EP tube and centrifuged at 20000 g for 3 min; the flow-through was discarded, the lid was opened, and the column was incubated at 56 ℃ for 10 min.

6. 30-50 μL of Buffer AVE was added to the Mini column, which was incubated at room temperature for 3 min and centrifuged at 20000 g for 3 min to collect the cfDNA sample.

7. The cfDNA concentration was accurately quantified with Qubit, and the cfDNA fragment size was checked with the Qsep 100 capillary electrophoresis system. The prepared cfDNA was stored at -80 ℃ until use.

3. Construction of the genomic library (NanoPrep™ DNA library construction kit (for), used together with the Duplex Seq Adapters Kit)

1. End repair of the cfDNA and addition of an A base at the 3' end;

1) The reaction mixture was prepared as shown in the following table:

Reagent	Amount
cfDNA	10-20ng
End Repair & A-Tailing Buffer	6μL
End Repair & A-Tailing Enzyme	4μL
H₂O	Make up to 50μL

After mixing with 40 μL of the cfDNA sample, the mixture was placed in a PCR instrument under the following reaction conditions:

Temperature	Time
20℃	30min
65℃	30min
10℃	∞

2. Truncated Y-shaped sequencing adapters carrying a molecular tag (UMI), which contains a random molecular tag sequence, were ligated to both ends of the cfDNA.

1) The reaction mixture was prepared as shown in the following table:

Reagent	Amount
IDT Duplex adapter (15μM)	2μL
Ligation Buffer	26μL
Total	28μL

After mixing with 50 μL of the end repair product from the previous step, 2 μL of DNA ligase was added last, and the mixture was placed in a PCR instrument for reaction under the following conditions:

Temperature	Time
20℃	15min
4℃	∞

2) Purification of the ligation product:

40 μL of NanoPrep™ SP Beads was added to the ligation product and mixed well; the mixture was incubated at room temperature for 5-10 min and placed on a magnetic rack, and the supernatant was discarded once it had cleared. The beads were rinsed twice, each time with 150 μL of 80% ethanol and a 30 s incubation at room temperature. Residual ethanol was removed and the beads were air-dried at room temperature for 5-10 min. 21 μL of Nuclease-Free Water was added and incubated at 25 ℃ for 2 min; the beads were removed on the magnetic rack, and 20 μL of supernatant was transferred to a new 0.2 mL PCR tube to collect the purified ligation product.

3. Amplification enrichment with primers containing index tag sequences for discriminating between different samples

1) The reaction mixture was prepared as shown in the following table:

Reagent	Amount
HiFi PCR Master Mix, 2x	25μL
UDI Primer Mix (5μM)	5μL
Total	30μL

After mixing with 20 μL of the ligation product purified in the previous step, the mixture was placed in a PCR instrument under the following reaction conditions:

2) 50 μL of NanoPrep™ SP Beads was added to the PCR product and mixed well; the mixture was incubated at room temperature for 5-10 min and placed on a magnetic rack, and the supernatant was discarded once it had cleared. The beads were rinsed twice, each time with 150 μL of 80% ethanol and a 30 s incubation at room temperature. Residual ethanol was removed and the beads were air-dried at room temperature for 5-10 min. 21 μL of Nuclease-Free Water was added and incubated at 25 ℃ for 2 min; the beads were removed on the magnetic rack, and 20 μL of supernatant was transferred to a new 0.2 mL PCR tube to collect the purified PCR product, which was then quantified with Qubit and checked by capillary electrophoresis.

4. Hybridization reaction

1) 8-12 constructed libraries, 500 ng each, were pooled; 5 μL of COT Human DNA and 2 μL of Blocker were added, and the pool was dried under vacuum.

2) Hybridization reagent preparation

Reagent	Amount
2X Hybridization Buffer	8.5μL
Hybridization Buffer Enhancer	2.7μL
Probes	4μL
Nuclease-Free Water	1.8μL
Total	17μL

The vacuum-dried product was resuspended in the prepared buffer and placed in a PCR instrument for incubation under the following conditions:

Temperature	Time
95℃	30s
65℃	16h

5. After hybridization, the library is eluted with streptavidin-labeled magnetic beads to capture the biotin-labeled nucleic acid molecules

1) Preparing hybridization elution buffer

Reagent	Stock volume	H₂O	Total volume
2X Bead Wash Buffer	160μL	160μL	320μL
10X Wash Buffer 1	28μL	252μL	280μL
10X Wash Buffer 2	16μL	144μL	160μL
10X Wash Buffer 3	16μL	144μL	160μL
10X Stringent Wash Buffer	32μL	288μL	320μL

2) Preparation of bead resuspension Mix

Reagent	Amount
2X Hybridization Buffer	8.5μL
Hybridization Buffer Enhancer	2.7μL
Nuclease-Free Water	5.8μL
Total	17μL

3) Elution of hybridization products

50 μL of Capture Beads was transferred to a 1.5 mL low-binding tube, 100 μL of Bead Wash Buffer was added and mixed gently by pipetting, the tube was placed on a magnetic rack for 1 min, and the supernatant was removed. The beads were resuspended in 17 μL of bead resuspension Mix. The prepared Capture Beads were added to the hybridization product, mixed by pipetting, and incubated at 65 ℃ for 45 min to capture the hybridized library. The captured library was then washed sequentially with Wash Buffer 1, Stringent Wash Buffer, Wash Buffer 2 and Wash Buffer 3, and finally eluted with 20 μL of Nuclease-Free Water.

6. PCR amplification of the hybridization library

1) Preparing PCR reagent.

Reagent	Amount
HiFi PCR Master Mix, 2x	25μL
Library Amplification Primer Mix (5μM)	5μL
Total	30μL

The reagents were mixed with the hybridization elution product from the previous step, and PCR amplification was carried out under the following reaction conditions:

2) PCR product purification

50 μL of NanoPrep™ SP Beads was added to the PCR product and mixed well; the mixture was incubated at room temperature for 5-10 min and placed on a magnetic rack, and the supernatant was discarded once it had cleared. The beads were rinsed twice, each time with 150 μL of 80% ethanol and a 30 s incubation at room temperature. Residual ethanol was removed and the beads were air-dried at room temperature for 5-10 min. 21 μL of Nuclease-Free Water was added and incubated at 25 ℃ for 2 min; the beads were removed on the magnetic rack, and 20 μL of supernatant was transferred to a new 0.2 mL PCR tube to collect the purified PCR product. The library concentration was accurately quantified with Qubit, and the library fragment size was checked with the Qsep 100 capillary electrophoresis system; the fragments were mainly concentrated at about 350 bp.

7. Finally, sequencing was performed on the Illumina HiSeq X Ten platform.

4. UMI (unique molecular identifier, i.e. molecular tag) reads were extracted from the raw sequence file using ConsensusCruncher:

The UMI pattern was set to NNN, so that the 3-base UMI information at the two ends of each read was read. The extracted UMI information was appended to the read name for storage, and the UMI bases were trimmed from the raw sequence to obtain the UMI-extracted sequencing reads, which were then aligned to the reference genome hg19.
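The UMI extraction described above can be sketched in Python as follows. This is only an illustration of the idea (take the first 3 bases of each mate, store them in the read name, trim them from the sequence); the function name, the `|` and `.` separators in the read name, and the choice to trim the base qualities alongside the bases are assumptions and do not reproduce ConsensusCruncher's actual output format.

```python
def extract_umi(name, seq1, qual1, seq2, qual2, umi_len=3):
    """Move a fixed-length UMI from the start of each mate into the read name.

    Returns (new_name, trimmed_seq1, trimmed_qual1, trimmed_seq2, trimmed_qual2).
    The name layout "<name>|<umi1>.<umi2>" is a hypothetical convention.
    """
    if len(seq1) < umi_len or len(seq2) < umi_len:
        raise ValueError("read shorter than the UMI pattern")
    umi1, umi2 = seq1[:umi_len], seq2[:umi_len]
    new_name = f"{name}|{umi1}.{umi2}"
    # Trim the UMI bases and their base qualities from both mates.
    return (new_name,
            seq1[umi_len:], qual1[umi_len:],
            seq2[umi_len:], qual2[umi_len:])


if __name__ == "__main__":
    print(extract_umi("read1", "ACGTTTGA", "IIIIIIII", "TTGCCCAA", "IIIIIIII"))
```

Storing the UMI in the read name lets it survive alignment, so reads can later be grouped by alignment position and UMI.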

5. ConsensusCruncher was further used: the UMI-extracted sequencing reads were aligned to the hg19 genome with bwa and their positions recorded, after which UMI common sequences were identified and isolated sequences were separated.

Here, a common sequence is one for which several sequencing reads are assigned to the same alignment position with completely identical UMI information and base sequence, and an isolated sequence is one supported by a single sequencing read.

The specific separation process is as follows. First, all UMI-extracted sequencing reads aligned to the same genomic position are extracted, and the number of distinct UMIs among them is counted. For each UMI, it is then judged whether the base arrangements of all sequencing reads carrying that UMI at that position are consistent. If they are not, the sequencing reads with that UMI at that position are considered to contain PCR errors and are removed. If the base arrangement is consistent and there is only one such read, the read is regarded as an isolated sequence, marked and retained; if the base arrangement is consistent and there is more than one such read, the reads are regarded as a common sequence, marked and retained. Common sequences are retained directly for further analysis, whereas isolated sequences first undergo noise processing: the base distribution is calculated at every genomic position supported only by isolated sequences, and the magnitude of sequencing noise is estimated from a binomial distribution. When the base carried by an isolated sequence differs from the main base at that position and its base frequency is lower than the noise, the isolated sequence is discarded; otherwise it is recovered and passed on to the next analysis. The recovered isolated sequences are merged with the common sequences to form the final bam file.
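The three-way classification in the separation process can be written as a short Python sketch. It assumes each read has already been reduced to a tuple of alignment position, UMI and base sequence; the tuple layout, the function name and the labels "common", "isolated" and "discarded" are illustrative choices, not ConsensusCruncher's internal representation.

```python
from collections import defaultdict


def classify_umi_families(reads):
    """Classify reads sharing an alignment position and UMI.

    reads: iterable of (position, umi, sequence) tuples.
    Returns {(position, umi): label} where label is
      "discarded" - same UMI but inconsistent bases (treated as PCR error),
      "common"    - several reads, all with identical bases,
      "isolated"  - exactly one read.
    """
    families = defaultdict(list)
    for position, umi, sequence in reads:
        families[(position, umi)].append(sequence)

    labels = {}
    for key, sequences in families.items():
        if len(set(sequences)) > 1:
            labels[key] = "discarded"
        elif len(sequences) > 1:
            labels[key] = "common"
        else:
            labels[key] = "isolated"
    return labels
```

Isolated families would then go through the noise check described in the text before being merged with the common sequences.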

6. The bam file was converted to a base frequency distribution file inside the sequencing region using iDES.

The iDES algorithm uses the sequence information of all sequencing reads in the bam file to extract the base distribution at each genomic position. The base frequency distribution file is then polished using a pre-established reference data set of base distributions across the hg19 genome. Polishing means constructing a distribution model from the base distributions of the hg19 reference data set at the whole-genome level, performing a hypothesis test on each candidate mutation in the sequencing reads against this model to judge whether it is genuine, estimating the probability that the mutant base is a false detection, and removing mutations attributed to background error; after polishing, the background error rate in the base distribution file is reduced.
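As an illustration of what a base frequency distribution file contains, the following Python sketch builds a per-position table of base frequencies from already piled-up bases. It is not the iDES implementation and does not perform polishing; the input layout (a mapping from position to the observed bases) and the function name are assumptions made for the example.

```python
from collections import Counter

BASES = "ACGT"


def base_frequency_table(pileup):
    """Compute per-position base frequencies.

    pileup: {position: iterable of observed bases at that position}.
    Returns {position: {"A": f, "C": f, "G": f, "T": f, "depth": n}}.
    Bases other than A/C/G/T (for example N) are ignored.
    """
    table = {}
    for position, observed in pileup.items():
        counts = Counter(b for b in observed if b in BASES)
        depth = sum(counts.values())
        row = {b: (counts[b] / depth if depth else 0.0) for b in BASES}
        row["depth"] = depth
        table[position] = row
    return table
```

Each row gives the depth and the fraction of reads supporting each base, which is the quantity the later mutation frequency filters operate on.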

7. Finally, high-precision mutation detection was carried out using the frequency distribution differences among samples.

A PBMC (peripheral blood mononuclear cell) library was constructed and sequencing data were obtained according to steps 4 to 6; these data were used as the control sample for mutation frequency detection in the preoperative plasma sample. A mutation site is required to meet the following conditions: for sites with double-strand coverage (i.e. supported by reads from both the positive and negative strands), more than 2 supporting mutant reads; for sites with single-strand coverage (i.e. supported by reads from only the positive or only the negative strand), more than 1 supporting mutant read; a mutation frequency greater than -log(0.01)/depth; and a mutation frequency in the control sample of less than 0.005, in order to remove false positive results.
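The calling conditions above can be expressed as a single predicate, sketched here in Python. The text does not specify the base of the logarithm; the sketch assumes the natural logarithm (under which -log(0.01)/depth is about 4.6/depth) and exposes it as a parameter so that base 10 can be substituted. The function and parameter names are illustrative.

```python
from math import log


def passes_mutation_filters(mutant_reads, depth, double_strand, control_freq,
                            log_func=log, alpha=0.01, max_control_freq=0.005):
    """Apply the read-support, frequency and control-sample conditions.

    mutant_reads:   number of reads supporting the mutation
    depth:          total read depth at the site
    double_strand:  True if the site is covered by reads from both strands
    control_freq:   mutation frequency at the same site in the PBMC control
    log_func:       logarithm used in -log(alpha)/depth (natural log assumed)
    """
    if depth <= 0:
        return False
    min_reads = 2 if double_strand else 1   # "more than 2" / "more than 1"
    frequency = mutant_reads / depth
    min_frequency = -log_func(alpha) / depth
    return (mutant_reads > min_reads
            and frequency > min_frequency
            and control_freq < max_control_freq)
```

Under the natural-log assumption the frequency condition amounts to requiring at least 5 mutant reads, which is stricter than either read-count condition.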

8. The mutations detected before surgery were cross-compared with the mutations from tumor tissue, the overlapping mutations were identified, and the mutation frequency distribution of the overlapping mutations (mutations detected in both blood and tissue, i.e. tumor-specific mutations detected in blood) was examined.

The frequency distribution of the overlapping mutations is shown in FIG. 2. The figure shows that the frequencies of tumor-specific mutations in blood follow a specific pattern, whereas non-tumor-specific mutations in blood lie mostly in the low-frequency region. The lower quartile of the tumor-specific mutation frequency distribution in blood, 0.02611, was therefore selected as the basis for mutation frequency screening. This frequency threshold serves as the first condition for filtering mutations without relying on tissue.

All values are sorted by magnitude and divided into four equal parts; the values at the three division points are the quartiles. The smallest of these is called the lower quartile.
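A minimal Python sketch of the lower-quartile threshold and the subsequent frequency filter is given below. The patent does not state which quartile interpolation method was used, so the "inclusive" method of the standard library is an assumption, and the frequencies in the usage example are toy values rather than data from the study.

```python
from statistics import quantiles


def lower_quartile(frequencies):
    """Return the first quartile (Q1) of the overlapping-mutation frequencies.

    The interpolation method ("inclusive") is an assumed choice.
    """
    return quantiles(frequencies, n=4, method="inclusive")[0]


def filter_by_threshold(mutation_freqs, threshold):
    """Keep mutations whose preoperative frequency is at or above the threshold.

    mutation_freqs: {mutation_id: preoperative frequency}.
    """
    return {m: f for m, f in mutation_freqs.items() if f >= threshold}


if __name__ == "__main__":
    overlap = [0.01, 0.02, 0.03, 0.04, 0.05]          # toy values
    threshold = lower_quartile(overlap)
    candidates = filter_by_threshold({"m1": 0.015, "m2": 0.06}, threshold)
    print(threshold, candidates)
```

In the example of the patent the threshold obtained this way from the overlapping mutations was 0.02611; once fixed, it is applied to plasma mutations without any tissue data.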

9. All mutations detected before surgery were filtered with the frequency threshold of 0.02611 determined in the previous step, and the retained mutations were taken as candidate tumor-derived mutations. The postoperative blood samples were then processed according to steps 3 to 7, and the mutation frequencies at the sites above the frequency threshold were extracted; the frequency distribution of the preoperative plasma tumor tissue specific mutations in postoperative plasma is shown in FIG. 3.

10. The ratio of mutation frequency change between the preoperative (FIG. 2) and postoperative (FIG. 3) samples was calculated, and a peak plot was drawn as shown in FIG. 4. The peak plot was inspected, and a threshold that clearly separates the two mutation peaks was selected as the diagnostic threshold; it lies at the lowest point of the valley between the two peaks. With this threshold, the mutations detected before surgery fall into two distinct groups. The group whose frequency drops markedly reflects the effect of tumor resection on tumor burden and should consist of tumor-specific mutations; the other group should consist of mutations from other (non-tumor-specific) sources.
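The grouping step can be sketched in Python as follows. The text does not give the formula for the frequency change ratio, so the fractional drop (pre - post) / pre is an assumption, as is the automatic valley search: the text describes choosing the threshold by inspecting the peak plot, and the histogram heuristic below only mimics that inspection. All names and the bin count are illustrative.

```python
def change_ratio(pre_freq, post_freq):
    """Fractional drop in mutation frequency after surgery (assumed definition)."""
    return (pre_freq - post_freq) / pre_freq


def valley_threshold(ratios, bins=10):
    """Locate the valley between the two main peaks of a ratio histogram.

    Ratios are clamped to [0, 1]. The two peaks are taken as the fullest bin
    and the fullest bin at least two bins away from it; the threshold is the
    centre of the emptiest bin between them (middle one if several tie).
    """
    width = 1.0 / bins
    counts = [0] * bins
    for r in ratios:
        idx = int(min(max(r, 0.0), 1.0) / width)
        counts[min(idx, bins - 1)] += 1
    peak1 = max(range(bins), key=counts.__getitem__)
    distant = [i for i in range(bins) if abs(i - peak1) >= 2]
    peak2 = max(distant, key=counts.__getitem__)
    left, right = sorted((peak1, peak2))
    between = range(left + 1, right)
    lowest = min(counts[i] for i in between)
    ties = [i for i in between if counts[i] == lowest]
    valley = ties[len(ties) // 2]
    return (valley + 0.5) * width
```

Mutations with a ratio above the returned threshold would form the markedly decreasing group, and the rest the group attributed to other sources.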

11. This group of mutations was extracted for gene overlap analysis (FIG. 5) and fragment length distribution analysis (FIG. 6); the results confirmed that the group has the characteristics typical of tumor-derived mutations. The group can therefore be regarded as tumor-derived specific mutations.

13. The presence or absence of minimal residual lesions in each hepatocellular carcinoma patient was judged by whether mutations of this group were present in the postoperative sample, and KM survival analysis was performed in R (FIG. 7). The results showed that hepatocellular carcinoma patients in whom minimal residual lesions were detected had a significantly worse outcome than those in whom none were detected. These results show that frequency changes of somatic mutations between preoperative and postoperative samples can be used to screen tumor-derived mutations without relying on tumor tissue, and thereby to identify minimal residual lesions.
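The residual-lesion call and the survival comparison can be illustrated with a small Python sketch. The original analysis was done in R; the code below is an independent, simplified re-expression, with a plain product-limit (Kaplan-Meier) estimator and no significance test. The function names and input layouts are assumptions.

```python
def has_residual_lesion(tumor_mutations, postop_freqs):
    """True if any tumor-derived mutation is still detected after surgery.

    tumor_mutations: iterable of mutation identifiers from the grouping step.
    postop_freqs:    {mutation_id: postoperative frequency}; missing means 0.
    """
    return any(postop_freqs.get(m, 0.0) > 0 for m in tumor_mutations)


def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times:  follow-up time per patient.
    events: 1 if recurrence was observed at that time, 0 if censored.
    Returns [(time, survival_probability)] at each time with an event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        recurrences = sum(1 for e in same_time if e)
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)
        i += len(same_time)
    return curve
```

One curve would be computed for patients with a positive call and one for patients with a negative call, and the two compared as in FIG. 7.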

Example 2:

ctDNA detection analysis was performed as in Example 1 on preoperative and postoperative plasma samples of the patient, and mutations with a preoperative-to-postoperative frequency change ratio greater than the diagnostic threshold (0.2) were found. Analysis of the gene distribution and sequence length distribution of these mutations showed that the mutant genes include important genes related to tumorigenesis, so they are tumor-related mutations. Further analysis showed that tumor-related mutations remained after surgery, and it was judged that minimal residual lesions were present. On follow-up, the patient was found to have recurrence on imaging 84 days after surgery.

Example 3:

ctDNA detection analysis was performed as in Example 1 on preoperative and postoperative plasma samples of the patient, and mutations with a preoperative-to-postoperative frequency change ratio greater than the threshold (0.2) were found. Analysis of gene distribution and sequence length distribution supported this group of mutations being tumor-specific mutations. In the postoperative sample the frequencies of this group of mutations were all 0, and it was judged that there were no minimal residual lesions. On follow-up, the patient had not relapsed within one year after surgery.

Claims

1. A device for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution, the device being used for sequencing a high-throughput sequencing library, wherein the high-throughput sequencing library is constructed by the following steps: Step 1, extracting cfDNA from plasma collected from the subject;

Step 2, performing end repair on the cfDNA extracted in step 1 and adding an A base at its 3' end; ligating truncated Y-shaped sequencing adapters carrying molecular tags (UMIs) to both ends of the treated cfDNA to obtain a ligation product; purifying the ligation product with magnetic beads; amplifying and enriching it with primers containing index tag sequences for distinguishing different samples; purifying again with magnetic beads; adding biotin-labeled probes for a hybridization reaction; capturing the biotin-labeled nucleic acid molecules with streptavidin-labeled magnetic beads and eluting the library; and amplifying and purifying with magnetic beads to construct the high-throughput sequencing library;

The device for detecting liver cancer specific mutation based on cfDNA base mutation frequency distribution is characterized by comprising the following execution modules for obtaining the mutation frequency distribution of the cfDNA of a subject:

Module one: extracting the UMI-containing reads from the raw sequence file, separating the 3-base UMI information at the two ends of each read, appending the extracted UMI information to the read name for storage, trimming the UMI bases from the raw sequence to obtain UMI-extracted sequencing reads, and aligning the UMI-extracted sequencing reads to the human reference genome;
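The UMI extraction described above can be sketched as follows. This is a minimal illustration, assuming paired-end data in which the 3-base UMI sits at the start of each mate; the function name `extract_umi` and the read-name format are assumptions, not the device's actual implementation.

```python
UMI_LEN = 3  # the claim specifies 3 UMI bases at each end

def extract_umi(name, seq1, qual1, seq2, qual2, umi_len=UMI_LEN):
    """Move the UMI bases from the read sequences into the read name.

    Returns the new read name plus the trimmed (sequence, quality) pair
    for each mate. Qualities are trimmed in step with the sequences.
    """
    umi = f"{seq1[:umi_len]}-{seq2[:umi_len]}"
    new_name = f"{name}_{umi}"        # assumed naming convention
    mate1 = (seq1[umi_len:], qual1[umi_len:])
    mate2 = (seq2[umi_len:], qual2[umi_len:])
    return new_name, mate1, mate2

# Hypothetical read pair
new_name, mate1, mate2 = extract_umi(
    "read1", "ACGTTTGA", "IIIIIIII", "GGCAATCC", "JJJJJJJJ"
)
```

Keeping the UMI in the read name lets it survive alignment, so later steps can group reads by position and UMI.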

Module two: extracting all UMI-extracted sequencing reads aligned by bwa to the same position of the human reference genome; if sequencing reads carry the same UMI information but their base sequences are inconsistent, removing the sequencing reads with that UMI information; if multiple sequencing reads carry the same UMI information and their base sequences are consistent, marking them as common sequences and retaining them; if only 1 sequencing read carries a given UMI information, marking it as an isolated sequence and retaining it;
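The grouping rule above can be sketched as follows. This is a simplified illustration in which each aligned read is reduced to a tuple of chromosome, position, UMI and sequence; the function name `classify_families` and the tuple layout are assumptions made for the sketch.

```python
from collections import defaultdict

def classify_families(reads):
    """Split reads into common and isolated sequences by (position, UMI).

    reads: iterable of (chrom, pos, umi, seq) tuples.
    Families whose members disagree in sequence are discarded.
    """
    families = defaultdict(list)
    for chrom, pos, umi, seq in reads:
        families[(chrom, pos, umi)].append(seq)

    common, isolated = [], []
    for key, seqs in families.items():
        if len(seqs) == 1:
            isolated.append((key, seqs[0]))   # single read with this UMI
        elif len(set(seqs)) == 1:
            common.append((key, seqs[0]))     # several reads, all identical
        # otherwise: same UMI but inconsistent bases, so the family is dropped
    return common, isolated

# Hypothetical reads at one position
reads = [
    ("chr1", 100, "ACG-TTA", "AAAA"),
    ("chr1", 100, "ACG-TTA", "AAAA"),
    ("chr1", 100, "GGC-CAT", "AAAT"),
    ("chr1", 100, "TTT-AAA", "AAAA"),
    ("chr1", 100, "TTT-AAA", "AACA"),
]
common, isolated = classify_families(reads)
```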

Module three: denoising and recovering the isolated sequences, then merging the recovered isolated sequences with the common sequences to form the final output bam file; the denoising and recovery of the isolated sequences comprises: calculating the base distribution at all human reference genome positions covered only by isolated sequences, and estimating the level of sequencing noise based on a binomial distribution; when the base of an isolated sequence is inconsistent with the major base at the corresponding human reference genome position and its base frequency is below the noise level, discarding the isolated sequence; otherwise, recovering the isolated sequence;
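The claim does not spell out how the binomial noise level is computed. One possible reading, sketched below, treats the noise level as the smallest base frequency that is unlikely to arise from sequencing error alone. The per-base error rate, the significance level `alpha` and both function names are assumptions of this sketch.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def noise_frequency(depth, error_rate=0.001, alpha=0.05):
    """Smallest frequency k/depth with P(X >= k) < alpha under pure error.

    error_rate and alpha are assumed values, not taken from the patent.
    """
    for k in range(depth + 1):
        if binom_sf(k, depth, error_rate) < alpha:
            return k / depth
    return 1.0  # no count is significant (e.g. depth == 0)

def keep_isolated(base, major_base, base_freq, noise):
    """Discard only when the base is non-major AND below the noise level."""
    return not (base != major_base and base_freq < noise)

noise = noise_frequency(100)                  # noise level at depth 100
kept = keep_isolated("T", "C", 0.05, noise)   # non-major base above the noise level
```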

Module four: converting the bam file into a base frequency distribution file using the iDES algorithm, and polishing the base frequency distribution file using a reference data set of the human reference genome to obtain the noise-reduced base frequency distribution;

Module five: using PBMC sequencing data as a control sample, performing mutation frequency detection on the data processed by module four and removing false positive mutations to obtain the mutation frequency distribution of the cfDNA of the subject; the conditions for identifying a mutation in the mutation frequency detection of module five are: more than 2 mutant reads when the covering sequencing reads come from both the positive and negative strands, or more than 1 mutant read when the covering sequencing reads come from only the positive strand or only the negative strand; and, at the same time, a mutation frequency greater than -log(0.01)/depth and a mutation frequency in the control sample of less than 0.005;
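The calling conditions above can be sketched as a single predicate. The translated wording of the strand rule is ambiguous, so the code follows one literal reading and says so in its comments. The base of the logarithm is not stated in the claim; the natural logarithm is assumed here and exposed as a parameter. The function name `is_mutation` is hypothetical.

```python
import math

def is_mutation(alt_fwd, alt_rev, depth, control_freq,
                max_control_freq=0.005, log_base=math.e):
    """Apply the read-support, frequency and control-sample conditions.

    alt_fwd / alt_rev : mutant reads on the positive / negative strand
    depth             : total sequencing depth at the position
    control_freq      : frequency of the same mutation in the PBMC control
    """
    if depth <= 0:
        return False
    alt = alt_fwd + alt_rev
    if alt_fwd > 0 and alt_rev > 0:
        enough_reads = alt > 2      # assumed reading: support from both strands
    else:
        enough_reads = alt > 1      # assumed reading: support from one strand only
    min_freq = -math.log(0.01, log_base) / depth   # log base is an assumption
    return (enough_reads
            and alt / depth > min_freq
            and control_freq < max_control_freq)

# Hypothetical position: 3 + 3 mutant reads at depth 1000, absent from the control
called = is_mutation(3, 3, 1000, 0.0)
```

With the natural logarithm, the frequency cut-off is the frequency at which a Poisson-distributed mutant read count would be zero with probability 0.01 at the given depth, which is one plausible motivation for the formula.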

Module six: detecting, according to modules one to five, the mutation frequency distribution of the preoperative cfDNA of a tumor patient and the mutation frequency distribution of a matched tumor tissue sample, identifying the overlapping mutations, comparing the mutation frequency distributions of the overlapping mutations, and taking the lower quartile of the mutation frequency distribution of the overlapping mutations as the frequency threshold;
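The threshold selection above can be sketched as follows, assuming that the "lower quarter" means the first quartile and that it is taken over the plasma frequencies of the mutations shared with the tissue sample. Both points are interpretations, and the function name `frequency_threshold` is hypothetical.

```python
from statistics import quantiles

def frequency_threshold(plasma_freq, tissue_freq):
    """First quartile of plasma frequencies over mutations found in both samples.

    plasma_freq / tissue_freq: mapping of mutation identifier -> frequency.
    Needs at least two shared mutations for the quartile to be defined.
    """
    shared = plasma_freq.keys() & tissue_freq.keys()
    values = [plasma_freq[m] for m in shared]
    return quantiles(values, n=4, method="inclusive")[0]

# Hypothetical frequencies; "f" is plasma-only and is ignored
plasma = {"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.04, "e": 0.05, "f": 0.5}
tissue = {"a": 0.20, "b": 0.25, "c": 0.30, "d": 0.35, "e": 0.40}
threshold = frequency_threshold(plasma, tissue)
```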

Module seven: detecting the mutation frequency distribution of the postoperative cfDNA of the tumor patient according to modules one to five, filtering the preoperative and postoperative cfDNA mutation frequency distributions of the tumor patient based on the obtained frequency threshold to obtain the screened mutation frequency distributions, calculating the mutation frequency change ratio, and drawing a peak diagram.
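The filtering and change-ratio calculation above can be sketched as follows. The claim as quoted here does not define the change ratio, so the sketch assumes the relative decrease (pre - post) / pre and assumes that filtering keeps mutations whose preoperative frequency reaches the threshold. The function name `change_ratios` is hypothetical.

```python
def change_ratios(pre_freq, post_freq, threshold):
    """Relative frequency change per mutation after threshold filtering.

    pre_freq / post_freq: mapping of mutation identifier -> frequency.
    Assumed definition: ratio = (pre - post) / pre.
    """
    ratios = {}
    for mutation, f_pre in pre_freq.items():
        if f_pre < threshold or f_pre <= 0:
            continue                          # filtered out or undefined ratio
        f_post = post_freq.get(mutation, 0.0) # undetected after surgery -> 0
        ratios[mutation] = (f_pre - f_post) / f_pre
    return ratios

# Hypothetical frequencies
pre = {"a": 0.10, "b": 0.04, "c": 0.001}
post = {"a": 0.0, "b": 0.03}
ratios = change_ratios(pre, post, threshold=0.01)
```

A histogram or density plot of the resulting ratios would correspond to the peak diagram, with mutations that vanish after surgery clustering near 1 under the assumed definition.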