CN112735531B - Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium - Google Patents



Publication number
CN112735531B
Authority
CN
China
Prior art keywords
nucleosome
methylation
sample
active region
window
Legal status
Active
Application number
CN202110337436.XA
Other languages
Chinese (zh)
Other versions
CN112735531A (en)
Inventor
吕芳
宋小凤
于佳宁
裴志华
张琦
洪媛媛
李宇龙
何骥
陈维之
杜波
Current Assignee
Wuxi Precision Medical Laboratory Co ltd
Zhenhe Beijing Biotechnology Co ltd
Original Assignee
Wuxi Precision Medical Laboratory Co ltd
Zhenhe Beijing Biotechnology Co ltd
Application filed by Wuxi Precision Medical Laboratory Co ltd and Zhenhe Beijing Biotechnology Co ltd
Priority to CN202110337436.XA
Publication of CN112735531A
Application granted
Publication of CN112735531B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Abstract

The invention provides a methylation analysis method and device for active regions of circulating cell-free nucleosomes, a terminal device and a storage medium. The method comprises: acquiring capture sequencing data of a plasma sample to be detected and extracting cfDNA molecule fragments from the data; sliding a window across the genomic intervals covered by the extracted fragments and calculating, for each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window; based on the calculated ratios, screening out by the Kolmogorov-Smirnov test the intervals that differ significantly from baseline nucleosome activity difference regions created from healthy human samples, to obtain nucleosome active regions; and calculating the methylation phenotype features of the screened nucleosome active regions. This completes the methylation analysis of circulating cell-free nucleosome active regions and effectively assists in distinguishing the source of the plasma sample to be detected.

Description

Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium
Technical Field
The invention relates to the technical field of biomedicine, in particular to a methylation analysis method and device of an active region of a circulating cell-free nucleosome, terminal equipment and a storage medium.
Background
In recent years, the application of circulating cell-free DNA (cfDNA) in biology and diagnosis has attracted much attention. For example, cfDNA sequencing is widely used in the clinic for noninvasive prenatal testing of fetal chromosomal aneuploidy, and tumor-specific mutations in cfDNA hold great promise for cancer diagnosis and monitoring. DNA methylation is a covalent modification that plays an important role in regulating gene expression. The programming and reprogramming of DNA methylation patterns during embryonic development and somatic cell division are fundamental epigenetic processes. The central role of DNA methylation in epigenetics stems from its covalent attachment to the genome and from the persistently high activity of maintenance DNA methyltransferases during cell division. Hypermethylation of tumor suppressor genes and hypomethylation of proto-oncogenes are highly correlated with tumor initiation and progression. Analysis of aberrant methylation in tumor cells can reveal information such as tumor grade and stage, invasion and metastasis, and even survival. Tumors release cfDNA into the blood, and aberrant methylation of many genes has been found in the plasma cfDNA of patients with different types of cancer. Because early diagnosis is crucial for effective cancer treatment, it is very important to develop accurate and sensitive auxiliary diagnostic tools.
cfDNA derived from tumor cells is generally called ctDNA (circulating tumor DNA); it differs markedly from cfDNA released by apoptosis of normal somatic cells in fragment length, fragment end sequence, and genomic position. Nucleosomes are the basic unit of DNA packaging in eukaryotes: each consists of about 147 bp of DNA wrapped about 1.7 superhelical turns around a histone octamer. Besides packaging DNA, nucleosome assembly plays a crucial role in controlling the accessibility of regulatory elements on chromosomes to DNA-binding proteins, and nucleosome positioning affects gene expression regulation, DNA replication, and DNA recombination. It has been shown that ctDNA is more prone to fragmentation in nucleosome active regions than cfDNA produced by normal apoptosis. Under specific physiological conditions and during the course of disease, the distribution of cfDNA released into plasma from different sources differs greatly from that in the healthy state. At present, most studies are limited to the average methylation level of individual sites or of runs of contiguous sites, and the methylation status of nucleosome active regions is rarely analyzed. A methylation analysis method targeting nucleosome active regions is therefore urgently needed in the biomedical field.
Disclosure of Invention
In view of the above problems, the present invention provides a methylation analysis method and apparatus for circulating cell-free nucleosome active region, a terminal device and a storage medium, which are used for analyzing the methylation state of the nucleosome active region.
The technical scheme provided by the invention is as follows:
In one aspect, the invention provides a methylation analysis method for circulating cell-free nucleosome active regions, comprising:
acquiring capture sequencing data of a plasma sample to be detected and extracting cfDNA molecular fragments from the capture sequencing data;
sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA molecule fragments, and calculating, for each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window;
based on the calculated ratios, screening out by the Kolmogorov-Smirnov test the intervals that differ significantly from baseline nucleosome activity difference regions created from healthy human samples, to obtain nucleosome active regions;
and calculating the methylation phenotype characteristics of the nucleosome active region obtained by screening, and judging the methylation heterogeneity level of the nucleosome active region according to the methylation phenotype characteristics to finish methylation analysis of the circulating cell-free nucleosome active region.
In this solution, blood flows through the circulatory system and exchanges substances with the organs of the body, so it carries abundant information about the state of those organs. DNA fragments (cfDNA) released from organs through apoptosis or disease-induced necrosis enter the blood. Plasma cfDNA is therefore a mixture of cfDNA from different sources, and cfDNA released from tumor cells carries the genetic information of those cells. Patients with advanced cancer have large amounts of tumor-derived cfDNA in their blood, but in patients with early-stage cancer tumor-derived cfDNA makes up only a small fraction of the mixture, which makes cancer signals hard to detect at an early stage. Studies have shown that, because of nucleosome occupancy, cfDNA from different sources is released into the blood in source-specific patterns. The distribution and length of cfDNA differ greatly between healthy people and cancer patients. This technical solution therefore provides a methylation analysis method for circulating cell-free nucleosome active regions that helps distinguish the sources of cfDNA molecule fragments by studying the methylation phenotype features of specific nucleosome active regions in the DNA mixture, thereby improving detection efficiency.
Further preferably, the step of sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA molecule fragments and calculating, for each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window comprises:
sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA fragments;
calculating, for each window, the ratio S of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window, and finding the peak values S_p of S within fixed-length genomic intervals in each cfDNA sample;
extending the genomic position of each peak S_p upstream and downstream by a preset length to form a candidate nucleosome active region;
normalizing and smoothing the S values within each candidate nucleosome active region to obtain the smoothed ratio S_il for each candidate nucleosome active region in the interval.
The step of screening out, based on the calculated ratios and by the Kolmogorov-Smirnov test, the intervals that differ significantly from baseline nucleosome activity difference regions created from healthy human samples to obtain nucleosome active regions comprises: based on the calculated ratios S_il, screening out by the Kolmogorov-Smirnov test the candidate nucleosome active regions that differ significantly from the baseline nucleosome activity difference regions, as the final nucleosome active regions.
Further preferably, after sliding the window of preset length with a preset step size across the genomic intervals of the extracted cfDNA molecule fragments and calculating, for each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window, the method further comprises creating baseline nucleosome activity difference regions from healthy human samples:
obtaining healthy human samples and dividing them into a baseline sample group, a training sample group and a test sample group;
screening out, in the baseline sample group, the candidate nucleosome active regions i_s whose peak S_p and window ratio S_il satisfy preset conditions;
calculating by the Kolmogorov-Smirnov test the difference in the trend ratio S_m of each candidate nucleosome active region in the set i_s between the baseline sample group and the training sample group, and then calculating the p value of each candidate nucleosome active region in each sample;
screening out, according to the results, the nucleosome activity difference regions whose p value is greater than a preset threshold to obtain the baseline nucleosome activity difference regions, and validating these regions with the test sample group.
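The baseline-creation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes each region is described by a list of smoothed window ratios per sample, that the Kolmogorov-Smirnov test compares the pooled baseline-group profile with each training sample's profile, and that a region is kept only when every training sample gives p above the threshold. The names `ks_2samp` (a hand-written asymptotic approximation, used here to keep the sketch dependency-free; in practice `scipy.stats.ks_2samp` is the usual choice) and `build_baseline` are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def ks_2samp(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs, checked at every observed value.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # the Kolmogorov tail probability is ~1 here; the series converges slowly
    # Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    p = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


def build_baseline(baseline: Dict[str, List[List[float]]],
                   training: Dict[str, List[List[float]]],
                   threshold: float = 0.05) -> List[str]:
    """Keep candidate regions (hypothetical region IDs) whose smoothed-ratio profile
    does NOT differ between the pooled baseline group and any training sample."""
    kept = []
    for region, profiles in baseline.items():
        pooled = [v for profile in profiles for v in profile]
        p_values = [ks_2samp(pooled, t)[1] for t in training.get(region, [])]
        if p_values and all(p > threshold for p in p_values):
            kept.append(region)
    return kept
```

Keeping regions with p above the threshold selects intervals that are stable among healthy samples, so a later significant deviation in a plasma sample is less likely to reflect normal healthy variation. The asymptotic p value is crude for very small profiles.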
Further preferably, screening out, based on the calculated ratios and by the Kolmogorov-Smirnov test, the intervals that differ significantly from the baseline nucleosome activity difference regions created from healthy human samples to obtain nucleosome active regions comprises: testing between the healthy human samples and the plasma sample to be detected by the Kolmogorov-Smirnov test, and taking the intervals whose p value is below a preset threshold as the nucleosome active regions.
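Screening a plasma sample against the baseline can be sketched in the same way. As an assumption, the healthy reference for each baseline region is the pooled list of smoothed window ratios from healthy samples, and the plasma sample contributes its own smoothed ratios for that region; regions with p below the threshold are reported as differential. `ks_2samp` (an asymptotic hand-written approximation) and `differential_regions` are hypothetical names.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def ks_2samp(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # tail probability is ~1 for small lam
    p = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


def differential_regions(baseline_pooled: Dict[str, List[float]],
                         sample: Dict[str, List[float]],
                         threshold: float = 0.05) -> List[str]:
    """Return baseline regions whose smoothed-ratio profile in the plasma sample
    differs significantly (p < threshold) from the pooled healthy profile."""
    hits = []
    for region, healthy in baseline_pooled.items():
        values = sample.get(region)
        if values and ks_2samp(healthy, values)[1] < threshold:
            hits.append(region)
    return hits
```

The length of the returned list is the number of differential nucleosome active regions, which is the first methylation phenotype feature described next.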
Further preferably, calculating the methylation phenotype features of the screened nucleosome active regions and judging the methylation heterogeneity level of the nucleosome active regions from these features, to complete the methylation analysis of circulating cell-free nucleosome active regions, comprises:
counting the number of differential nucleosome active regions in the plasma sample to be detected;
and judging the methylation heterogeneity level of the plasma sample to be detected against a preset threshold on the number of nucleosome active regions, the threshold being the number of differential nucleosome active regions at which the Youden index reaches its maximum.
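Choosing a threshold at the maximum Youden index (J = sensitivity + specificity - 1) can be sketched as below. The sketch assumes labelled development samples (1 = cancer, 0 = healthy) and that higher feature values point toward the cancer group; both the direction and the name `youden_threshold` are assumptions, not details from the source. The same function applies unchanged to the other features (methylation density, methylation level, methylation entropy).

```python
from typing import Sequence, Tuple


def youden_threshold(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its value >= threshold.
    labels: 1 = case (e.g. cancer), 0 = control (healthy).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both case (1) and control (0) samples")
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):  # every observed value is a candidate cut-off
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= t)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

For example, `youden_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])` returns `(10, 1.0)`: at a cut-off of 10 differential regions, all cases and all controls are separated.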
Further preferably, calculating the methylation phenotype features of the screened nucleosome active regions and judging the methylation heterogeneity level of the nucleosome active regions from these features, to complete the methylation analysis of circulating cell-free nucleosome active regions, comprises:
for each of the obtained nucleosome active regions, calculating the ratio MD of the number of methylated CpG sites to the number of all CpG sites in the region;
calculating the methylation density of the sample from the site methylation ratios MD of the nucleosome active regions in the plasma sample to be detected;
and judging the methylation heterogeneity level of the plasma sample to be detected against a preset methylation density threshold, the threshold being the methylation density at which the Youden index reaches its maximum.
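The methylation density calculation can be sketched as follows. The source does not say how per-region MD values are combined into a sample value, so this sketch assumes a simple mean over regions; it also assumes each region is given as a list of CpG calls (True = methylated) collected from the reads in that region. The function names are hypothetical.

```python
from typing import Dict, List


def region_md(cpg_calls: List[bool]) -> float:
    """MD: methylated CpG calls divided by all CpG calls in one region."""
    return sum(cpg_calls) / len(cpg_calls)


def sample_methylation_density(regions: Dict[str, List[bool]]) -> float:
    """Assumed aggregation: mean MD over regions that have at least one CpG call."""
    mds = [region_md(calls) for calls in regions.values() if calls]
    if not mds:
        raise ValueError("no CpG calls in any region")
    return sum(mds) / len(mds)
```

Averaging per-region MD gives each nucleosome active region equal weight; pooling all calls instead would weight regions by coverage, which is a design choice the source leaves open.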
Further preferably, calculating the methylation phenotype features of the screened nucleosome active regions and judging the methylation heterogeneity level of the nucleosome active regions from these features, to complete the methylation analysis of circulating cell-free nucleosome active regions, comprises:
for the CpG sites in the obtained nucleosome active regions, counting the number of methylated molecule fragments at each CpG site and the number of all molecule fragments covering that site, and calculating a Beta value;
calculating the methylation level of the sample from the Beta values of the CpG sites in the plasma sample to be detected;
and judging the methylation heterogeneity level of the plasma sample to be detected against a preset methylation level threshold, the threshold being the methylation level at which the Youden index reaches its maximum.
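A per-CpG Beta value (methylated fragments over all covering fragments) and a sample-level summary can be sketched as below. Two assumptions are not from the source: CpG sites below a hypothetical `min_depth` are skipped, and the sample methylation level is the mean Beta over the remaining sites.

```python
from typing import Dict, Hashable, Tuple


def beta_values(counts: Dict[Hashable, Tuple[int, int]],
                min_depth: int = 1) -> Dict[Hashable, float]:
    """Beta = methylated fragments / all fragments covering the CpG site.

    counts maps a CpG site key (e.g. (chrom, pos)) to (methylated, total).
    """
    return {site: meth / total
            for site, (meth, total) in counts.items()
            if total >= min_depth}


def sample_methylation_level(counts: Dict[Hashable, Tuple[int, int]],
                             min_depth: int = 1) -> float:
    """Assumed aggregation: mean Beta over sufficiently covered CpG sites."""
    betas = beta_values(counts, min_depth)
    if not betas:
        raise ValueError("no CpG site reaches min_depth")
    return sum(betas.values()) / len(betas)
```
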
Further preferably, calculating the methylation phenotype features of the screened nucleosome active regions and judging the methylation heterogeneity level of the nucleosome active regions from these features, to complete the methylation analysis of circulating cell-free nucleosome active regions, comprises:
calculating the methylation entropy of each of the obtained nucleosome active regions;
calculating the methylation entropy of the sample from the methylation entropy of each nucleosome active region in the plasma sample to be detected;
and judging the methylation heterogeneity level of the plasma sample to be detected against a preset methylation entropy threshold, the threshold being the methylation entropy at which the Youden index reaches its maximum.
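The source does not define methylation entropy, so the sketch below assumes a commonly used pattern-based form: the Shannon entropy (in bits) of the methylation patterns observed on reads in a region, divided by the number of CpGs per pattern. Patterns are strings such as `"1101"` (1 = methylated, 0 = unmethylated), all of equal length within a region; the sample value is assumed to be the mean over regions. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def methylation_entropy(patterns: List[str]) -> float:
    """Shannon entropy (bits) of read-level methylation patterns in one region,
    normalized by the number of CpGs per pattern."""
    if not patterns:
        raise ValueError("no patterns")
    b = len(patterns[0])
    if b == 0 or any(len(p) != b for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(patterns)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(patterns).values())
    return h / b


def sample_methylation_entropy(regions: Dict[str, List[str]]) -> float:
    """Assumed aggregation: mean entropy over regions with at least one pattern."""
    values = [methylation_entropy(p) for p in regions.values() if p]
    return sum(values) / len(values)
```

A region where every read carries the same pattern scores 0, while a region split evenly between two patterns over 2 CpGs scores 0.5, so higher values indicate more heterogeneous methylation.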
In another aspect, the present invention provides a methylation analysis device for circulating cell-free nucleosome active regions, which applies the above methylation analysis method, the device comprising:
the circulating cell-free gene fragment acquisition module is used for acquiring capture sequencing data of a plasma sample to be detected and extracting cfDNA molecular fragments from the capture sequencing data;
the different-condition cfDNA molecule number ratio calculation module, used for sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA molecule fragments, and calculating, for each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window;
the nucleosome active region screening module, used for screening out, based on the calculated ratios and by the Kolmogorov-Smirnov test, the intervals that differ significantly from baseline nucleosome activity difference regions created from healthy human samples, to obtain nucleosome active regions;
and the methylation analysis module is used for calculating the methylation phenotype characteristics of the nucleosome active region obtained by screening, judging the methylation heterogeneity level of the nucleosome active region according to the methylation phenotype characteristics, and completing the methylation analysis of the circulating cell-free nucleosome active region.
Further preferably, the different-condition cfDNA molecule number ratio calculation module comprises:
a window sliding unit, used for sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA fragments;
a first calculation unit, used for calculating, for each window, the ratio S of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window, and for finding the peak values S_p of S within fixed-length genomic intervals in each cfDNA sample;
a data processing unit, used for extending the genomic position of each peak S_p upstream and downstream by a preset length to form a candidate nucleosome active region, and for normalizing and smoothing the S values within each candidate region to obtain the smoothed ratio S_il for each candidate nucleosome active region in the interval;
and the nucleosome active region screening module is used for screening out, based on the calculated ratios S_il and by the Kolmogorov-Smirnov test, the candidate nucleosome active regions that differ significantly from the baseline nucleosome activity difference regions created from healthy human samples, as the final nucleosome active regions.
Further preferably, the methylation analysis device for circulating cell-free nucleosome active regions further comprises a baseline creation module for creating baseline nucleosome activity difference regions from healthy human samples, comprising:
a sample acquisition unit, used for obtaining healthy human samples and dividing them into a baseline sample group, a training sample group and a test sample group;
a window set screening unit, used for screening out, in the baseline sample group, the candidate nucleosome activity difference regions i_s whose peak S_p and window ratio S_il satisfy preset conditions;
a second calculation unit, used for calculating by the Kolmogorov-Smirnov test the difference in the trend ratio S_m of each candidate nucleosome active region in the set i_s between the baseline sample group and the training sample group, and then calculating the p value of each candidate nucleosome active region in each sample;
and a baseline nucleosome activity difference region screening unit, used for screening out, according to the results, the nucleosome activity difference regions whose p value is greater than a preset threshold.
Further preferably, the nucleosome active region screening module is further used for testing between the healthy human samples and the plasma sample to be detected by the Kolmogorov-Smirnov test, and taking the intervals whose p value is below a preset threshold as the nucleosome active regions.
In another aspect, the present invention further provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above methylation analysis method of circulating cell-free nucleosome active region when executing the computer program.
In another aspect, the present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method for methylation analysis of circulating cell-free nucleosome active regions.
The methylation analysis method and device, terminal device and storage medium provided by the invention calculate the ratio with a sliding window, use the ratio to screen out intervals that differ significantly between the plasma sample to be detected and healthy human samples, take these intervals as nucleosome active regions, and finally calculate the methylation phenotype features of these regions to perform the methylation analysis of circulating cell-free nucleosome active regions. The method uses next-generation sequencing to analyze phenotypic information carried by plasma cfDNA molecules and obtains the methylation heterogeneity level of the plasma sample to be detected, which gives a physician part of the basis for judging the source of the sample (a cancer sample or a healthy human sample). Experiments show that the method and device can be applied to early-stage cancer plasma samples and yield methylation heterogeneity levels that differ from those of healthy human samples, thereby effectively assisting early cancer diagnosis and screening and improving screening efficiency and precision. In addition, the method is non-invasive, is more readily accepted by patients than other methods, and reduces the cost of primary healthcare.
Drawings
The foregoing features, technical features, advantages and embodiments are further described in the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.
FIG. 1 is a schematic flow chart of a methylation analysis method of circulating cell-free nucleosome active regions according to the present invention;
FIG. 2 is a schematic diagram of the structure of a methylation analysis apparatus for circulating cell-free nucleosome active regions according to the present invention;
FIG. 3 is a box plot of the number of nucleosome active regions, taken as an example methylation phenotype feature, according to the present invention;
FIG. 4 is a receiver operating characteristic (ROC) curve for the number of nucleosome active regions as an example methylation phenotype feature;
FIG. 5 is a box plot of methylation density as an example methylation phenotype feature;
FIG. 6 is an ROC curve for methylation density as an example methylation phenotype feature;
FIG. 7 is a box plot of methylation level as an example methylation phenotype feature;
FIG. 8 is an ROC curve for methylation level as an example methylation phenotype feature;
FIG. 9 is a box plot of methylation entropy as an example methylation phenotype feature;
FIG. 10 is an ROC curve for methylation entropy as an example methylation phenotype feature;
FIG. 11 is a schematic structural diagram of a terminal device according to the present invention.
Reference numerals:
100: methylation analysis device; 110: circulating cell-free gene fragment acquisition module; 120: different-condition cfDNA molecule number ratio calculation module; 130: nucleosome active region screening module; 140: methylation analysis module.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
As shown in FIG. 1, the methylation analysis method of the circulating cell-free nucleosome active region provided by the invention comprises the following steps:
S10: acquiring capture sequencing data of a plasma sample to be detected and extracting cfDNA molecule fragments from the data;
S20: sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA molecule fragments, and calculating, for each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window;
S30: based on the calculated ratios, screening out by the Kolmogorov-Smirnov test the intervals that differ significantly from baseline nucleosome activity difference regions created from healthy human samples, to obtain nucleosome active regions;
S40: calculating the methylation phenotype features of the screened nucleosome active regions, and judging the methylation heterogeneity level of the nucleosome active regions from these features, to complete the methylation analysis of circulating cell-free nucleosome active regions.
In step S10, the plasma sample to be detected is any plasma sample that requires methylation analysis of circulating cell-free nucleosome active regions; in practice it may come from a healthy person or from a tumor patient. After the capture sequencing data are acquired, the method further obtains the features of the cfDNA fragments as follows: for the sequence files produced by sequencing, obtain the start and end positions of the paired reads belonging to the same DNA molecule, and determine from the alignment information whether each read maps to the forward or reverse strand; then determine the start and end positions of each cfDNA molecule from the strands and the start and end positions of the paired reads, which gives the start and end positions of the cfDNA fragments on the human reference genome hg19, and calculate the fragment length L.
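The fragment-extraction step can be sketched as below. This is a minimal illustration, assuming 0-based half-open coordinates and properly paired reads whose mates align to opposite strands; the `Read` record is a hypothetical stand-in for an aligned read (in practice alignments would be read from a BAM file, for example with pysam), and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Read:
    """Hypothetical minimal aligned-read record (not a pysam object)."""
    name: str
    chrom: str
    start: int        # 0-based leftmost aligned position
    end: int          # 0-based exclusive rightmost aligned position
    is_reverse: bool  # True if aligned to the reverse strand


@dataclass
class Fragment:
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start  # fragment length L


def fragment_from_pair(r1: Read, r2: Read) -> Optional[Fragment]:
    """The forward-strand mate marks the molecule's left end and the
    reverse-strand mate marks its right end."""
    if r1.name != r2.name or r1.chrom != r2.chrom or r1.is_reverse == r2.is_reverse:
        return None  # not a proper pair on opposite strands
    fwd, rev = (r2, r1) if r1.is_reverse else (r1, r2)
    if rev.end <= fwd.start:
        return None  # mates point away from each other
    return Fragment(fwd.chrom, fwd.start, rev.end)


def extract_fragments(reads: List[Read]) -> List[Fragment]:
    """Group reads by name and return one Fragment per valid pair."""
    by_name: Dict[str, List[Read]] = {}
    for r in reads:
        by_name.setdefault(r.name, []).append(r)
    frags = []
    for mates in by_name.values():
        if len(mates) == 2:
            f = fragment_from_pair(*mates)
            if f is not None:
                frags.append(f)
    return frags
```

Reads without a mate (for example `m2` in the test) are dropped, since the molecule's far end cannot be located.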
Step S20 includes: S21, sliding a window of preset length with a preset step size across the genomic intervals of the extracted cfDNA fragments; S22, calculating, for each window, the ratio S of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window, and finding the peak values S_p of S within fixed-length genomic intervals in each cfDNA sample; S23, extending the genomic position of each peak S_p upstream and downstream by a preset length to form a candidate nucleosome active region; S24, normalizing and smoothing the S values within each candidate nucleosome active region to obtain the smoothed ratio S_il' for each candidate nucleosome active region in the interval.
In the sliding operation over the cfDNA genomic region, the window size and sliding step can be adjusted to the application: for example, the window may be 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 bp or larger, and the step 1, 2, 3 or 4 bp or larger. A molecule that spans the entire window end to end is one whose length exceeds the window size and whose start and end both lie outside the window. The molecules covering the window in all configurations comprise, besides those spanning the entire window, the cfDNA molecules whose start or end lies inside the window, i.e., all cfDNA molecules within the window's coverage.
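Steps S21 and S22 can be illustrated with a direct, unoptimized sketch. It assumes fragments on one chromosome given as 0-based, end-exclusive `(start, end)` pairs; `window_ratios` is a hypothetical helper. It rescans every fragment for every window, which is acceptable for illustration, whereas real capture data would call for an interval index or a sweep over sorted fragment ends.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start, end), 0-based, end-exclusive


def window_ratios(fragments: List[Fragment], region_start: int, region_end: int,
                  window: int, step: int) -> List[Tuple[int, float]]:
    """Slide [w, w + window) across [region_start, region_end) and return (w, S),
    S = fragments spanning the whole window / fragments overlapping the window."""
    results = []
    for w in range(region_start, region_end - window + 1, step):
        w_end = w + window
        covering = spanning = 0
        for start, end in fragments:
            if start < w_end and end > w:  # fragment overlaps the window at all
                covering += 1
                # both ends outside: first base before w, last base after w_end - 1
                if start < w and end > w_end:
                    spanning += 1
        results.append((w, spanning / covering if covering else 0.0))
    return results


frags = [(0, 200), (50, 150), (90, 300)]
print(window_ratios(frags, 60, 160, window=100, step=1))
# [(60, 0.3333333333333333)]: only (0, 200) spans the window; all three overlap it
```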
After the cfDNA molecule ratios S have been calculated, the peak value S_p of S is found within each fixed-length genomic interval of the target sequencing region, and the genomic position of each peak S_p is extended by a preset length upstream and downstream to form a candidate nucleosome active region. The fixed-length genomic interval can be chosen according to the actual circumstances (including the window size and the sliding step), for example the genomic intervals designed for targeted sequencing or intervals of a defined fixed size. The extension length can likewise be set as required, for example 60 bp.
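Peak selection and extension (step S23) might look like the sketch below. It assumes the fixed-length interval grid starts at coordinate 0, that `pos` is whatever coordinate the caller uses to represent a window (for example its center), and that ties keep the first position; `candidate_regions` is a hypothetical name, and the 60 bp extension is the example value given above.

```python
from typing import Dict, List, Tuple


def candidate_regions(ratios: List[Tuple[int, float]], interval: int,
                      extend: int) -> List[Tuple[int, int, float]]:
    """Keep the peak S_p inside each fixed-length interval and extend its position
    by `extend` bp on both sides, returning (region_start, region_end, S_p)."""
    best: Dict[int, Tuple[int, float]] = {}
    for pos, s in ratios:
        key = pos // interval  # index of the fixed-length interval holding pos
        if key not in best or s > best[key][1]:  # strict '>' keeps the first of tied peaks
            best[key] = (pos, s)
    regions = []
    for key in sorted(best):
        pos, s_p = best[key]
        regions.append((max(0, pos - extend), pos + extend, s_p))
    return regions


ratios = [(10, 0.2), (20, 0.5), (30, 0.1), (110, 0.4), (150, 0.9)]
print(candidate_regions(ratios, interval=100, extend=60))
# [(0, 80, 0.5), (90, 210, 0.9)]
```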
Then the values within each candidate region are normalized and smoothed. Specifically, the window ratio S associated with each peak S_p is normalized by formula (1) to obtain the normalized ratio S':
[Formula (1) is not reproduced in the source text.]
Next, the normalized ratio S' is smoothed within each window by a locally weighted average to obtain the smoothed ratio S_il of each window. Based on the calculated S_il, the Kolmogorov-Smirnov test is then used to select, as the final nucleosome active regions, the candidate nucleosome active regions that differ significantly from the baseline nucleosome activity difference regions created from healthy-person samples.
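The text does not specify the weights of the locally weighted average, and formula (1) is not reproduced, so the sketch below covers only the smoothing step and assumes a triangular kernel: each window's value becomes a weighted mean of its neighbors within `half_width` windows, with weights falling off linearly with distance, and edges use whatever neighbors exist. Both the kernel and `half_width` are illustrative assumptions, and `local_weighted_average` is a hypothetical helper.

```python
from typing import List


def local_weighted_average(values: List[float], half_width: int) -> List[float]:
    """Triangular-kernel moving average: point i becomes a weighted mean of
    values[i - h .. i + h] with weight h + 1 - |j - i| (nearer windows count more)."""
    n = len(values)
    smoothed = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            w = half_width + 1 - abs(j - i)
            num += w * values[j]
            den += w
        smoothed.append(num / den)
    return smoothed


print(local_weighted_average([0.0, 0.0, 3.0, 0.0, 0.0], half_width=1))
# [0.0, 0.75, 1.5, 0.75, 0.0]
```

A LOESS-style fit (local regression rather than a local mean) would be an equally plausible reading of "locally weighted"; the simple weighted mean is used here because it preserves constant series exactly and is easy to check.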
Before screening for significant differences, baseline nucleosome activity difference regions must be created from healthy-person samples as follows. First, healthy-person samples are obtained and divided into a baseline sample group, a training sample group and a test sample group. Next, the set i_s of candidate nucleosome active regions in the baseline sample group whose peak S_p and window ratios S_il satisfy a predetermined condition (for example, lying between the lower and upper quartiles) is selected. Then the Kolmogorov-Smirnov test is used to compare, for each candidate nucleosome active region in the set i_s, the trend ratio S_m between the baseline sample group and the training sample group, and a p value is calculated for each candidate nucleosome active region contained in each sample. Finally, the candidate nucleosome active regions whose p value is greater than a preset threshold are retained as the baseline nucleosome activity difference regions, which are then verified with the test sample group. On this basis, the Kolmogorov-Smirnov test is applied between healthy persons and the plasma sample to be detected, and the candidate nucleosome active regions whose p value is below the preset threshold are taken as the nucleosome active regions. The preset p-value threshold is usually 0.05, and the trend ratio S_m can be expressed as a median, a mean or a similar statistic; both can be adjusted to the application.
The trend ratio S_m is determined from the smoothed ratios S_il of the candidate nucleosome active regions in the set i_s. For example, the median of the S_il values of each candidate nucleosome active region can be used as the trend ratio S_m, or their mean can be used instead.
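The Kolmogorov-Smirnov comparison can be sketched without external dependencies. In the sketch below, `ks_2samp_asymptotic` computes the two-sample statistic D exactly but approximates the p value with the asymptotic Kolmogorov distribution, using the effective-sample-size correction found in Numerical Recipes; this is rough for small groups, and `scipy.stats.ks_2samp` offers exact p values where SciPy is available. Which per-sample quantity is compared for a region (here one trend ratio S_m per sample, taken as the median of that sample's S_il values) is an assumption, and all function names are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def trend_ratio(s_il: Sequence[float]) -> float:
    """Trend ratio S_m of one candidate region in one sample (median variant)."""
    return median(s_il)


def ks_2samp_asymptotic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic two-sided p value."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:  # advance past ties in both samples together
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))  # gap between the two empirical CDFs at v
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:  # survival function is ~1 here and the series converges slowly
        return d, 1.0
    p = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def region_differs(group_a: Sequence[float], group_b: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """True if the S_m values of two sample groups differ at level alpha."""
    return ks_2samp_asymptotic(group_a, group_b)[1] < alpha


print(ks_2samp_asymptotic([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # (0.0, 1.0)
```

Under the description above, a baseline region would be kept when `region_differs` is False (p above the threshold), while a region in the sample to be detected is reported as active when it is True.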
After the nucleosome active regions are obtained, their methylation phenotype features are calculated and used to judge the methylation heterogeneity level of the regions. The methylation phenotype features include the number of nucleosome active regions, methylation density, methylation level and methylation entropy.
When the number of nucleosome active regions is used as a methylation phenotype characteristic, the following steps are included:
S11: count the number of differential nucleosome active regions in the plasma sample to be detected;
S12: judge the methylation heterogeneity level of the plasma sample to be detected against a predetermined threshold on the number of nucleosome active regions. The threshold is the number of differential nucleosome active regions at which the Youden index, calculated by the Youden index method, is maximal.
Specifically, to set the threshold on the number of nucleosome active regions, the calculated numbers of nucleosome active regions are used to classify healthy-person samples and tumor-patient plasma samples, giving the numbers of false positives, false negatives, true positives and true negatives. The Youden index is then calculated, and the number N of nucleosome active regions at which the Youden index is maximal is taken as the threshold. When the number of differential nucleosome active regions in the plasma sample to be detected is greater than this threshold, the sample is judged to have a high methylation heterogeneity level; otherwise its level is low. The Youden index J is calculated by formula (2):
$$J = \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1 \qquad (2)$$
where TP is the number of true positives, i.e., samples predicted positive that are actually positive; TN is the number of true negatives, i.e., samples predicted negative that are actually negative; FN is the number of false negatives, i.e., samples predicted negative that are actually positive; and FP is the number of false positives, i.e., samples predicted positive that are actually negative.
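Threshold selection by the Youden index can be sketched as follows. The sketch assumes one score per training sample (for example its number of differential nucleosome active regions), labels of 1 for tumor and 0 for healthy samples, and the convention stated above that a score greater than the threshold is called high heterogeneity. Candidate thresholds are simply the observed scores; `youden_threshold` is a hypothetical helper, and the same routine applies unchanged to methylation density, level or entropy scores.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = TP/(TP+FN) + TN/(TN+FP) - 1,
    where a sample is called positive when its score > threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both tumor (1) and healthy (0) samples")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 0)
        j = tp / pos + tn / neg - 1  # tp/pos = sensitivity, tn/neg = specificity
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


print(youden_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (3, 1.0)
```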
When methylation density is used as a methylation phenotype characteristic, the following steps are included:
S21: for each obtained nucleosome active region, calculate the methylation density MD, i.e., the ratio of the number of methylated CpG sites in the region to the number of all CpG sites, by formula (3):
$$MD = \frac{M}{M + U} \qquad (3)$$
where M is the number of methylated CpG sites and U is the number of unmethylated CpG sites.
S22: from the methylation density MD of each nucleosome active region in the plasma sample to be detected, calculate the methylation density MD_s of the sample by formula (4):
[Formula (4) is not reproduced in the source text.]
where N is the number of nucleosome active regions and i denotes the i-th nucleosome active region.
S23: judge the methylation heterogeneity level of the plasma sample to be detected against a preset methylation density threshold, namely the methylation density at which the Youden index, calculated by formula (2), is maximal. When the methylation density of the plasma sample to be detected is greater than the threshold, the sample is judged to have a high methylation heterogeneity level; otherwise its level is low.
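Formulas (3) and (4) can be sketched as below. Formula (3) follows directly from the definitions of M and U. Formula (4) is not legible in the source text, so `sample_methylation_density` assumes MD_s is the arithmetic mean of MD over the N nucleosome active regions and skips regions with no CpG calls; the per-region `(M, U)` input pairs and both function names are hypothetical.

```python
from typing import List, Tuple


def methylation_density(m: int, u: int) -> float:
    """Formula (3): MD = M / (M + U) for one nucleosome active region."""
    if m + u == 0:
        raise ValueError("region has no CpG calls")
    return m / (m + u)


def sample_methylation_density(regions: List[Tuple[int, int]]) -> float:
    """ASSUMED form of formula (4): mean MD over regions that have CpG calls."""
    values = [methylation_density(m, u) for m, u in regions if m + u > 0]
    if not values:
        raise ValueError("no nucleosome active regions with CpG calls")
    return sum(values) / len(values)


print(sample_methylation_density([(3, 1), (1, 1), (0, 0)]))  # 0.625
```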
When using methylation levels as a methylation phenotype characteristic, the following steps are included:
S31: for the CpG sites in the obtained nucleosome active regions, count at each CpG site the number of methylated molecular fragments and the number of all molecular fragments covering the site, and calculate the Beta value by formula (5):
$$B = \frac{MF}{AF} \qquad (5)$$
where B is the Beta value, MF is the number of methylated molecular fragments at the CpG site, and AF is the number of all molecular fragments covering the CpG site;
S32: from the Beta value of each CpG site in the plasma sample to be detected, calculate the methylation level ML of the sample by formula (6):
[Formula (6) is not reproduced in the source text.]
where k denotes all CpG sites in the region and i denotes the i-th region.
S33: judge the methylation heterogeneity level of the plasma sample to be detected against a preset methylation level threshold, namely the methylation level at which the Youden index, calculated by formula (2), is maximal. When the methylation level of the plasma sample to be detected is greater than the threshold, the sample is judged to have a high methylation heterogeneity level; otherwise its level is low.
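Formula (5), together with one plausible reading of formula (6), is sketched below. Formula (6) is not legible in the source, so `methylation_level` assumes ML first averages the Beta values of the k covered CpG sites within each region and then averages those region means over all regions. The input layout (for each region, a list of `(MF, AF)` counts per CpG site) and the function names are hypothetical.

```python
from typing import List, Tuple


def beta_value(mf: int, af: int) -> float:
    """Formula (5): B = MF / AF for one CpG site."""
    if af == 0:
        raise ValueError("CpG site is not covered by any fragment")
    return mf / af


def methylation_level(regions: List[List[Tuple[int, int]]]) -> float:
    """ASSUMED form of formula (6): mean over regions of the mean Beta of their covered CpG sites."""
    region_means = []
    for sites in regions:
        betas = [beta_value(mf, af) for mf, af in sites if af > 0]
        if betas:  # regions with no covered CpG sites are skipped
            region_means.append(sum(betas) / len(betas))
    if not region_means:
        raise ValueError("no covered CpG sites")
    return sum(region_means) / len(region_means)


print(methylation_level([[(1, 2), (2, 2)], [(0, 4)]]))  # 0.375
```

Pooling all sites across regions into one mean is the other obvious reading; it weights CpG-dense regions more heavily, which is why the choice matters.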
When using methylation entropy as a methylation phenotype feature, the following steps are included:
S41: calculate the methylation entropy within each of the obtained nucleosome active regions;
S42: calculate the methylation entropy of the sample from the methylation entropy of each nucleosome active region in the plasma sample to be detected;
S43: judge the methylation heterogeneity level of the plasma sample to be detected against a preset methylation entropy threshold, namely the methylation entropy at which the Youden index, calculated by formula (2), is maximal.
The methylation entropy is calculated as follows. Take n adjacent methylation sites in a region as a locus (Loc), at which 2^n possible methylation states can be observed. For a locus Loc_i, the methylation entropy EP_i reflects the probability that two epialleles sampled at random from the locus differ (the quantities EP_i1 and EP_i2), and the methylation entropy ET of a sample is the sum of the methylation scores over all loci Loc_i, as shown in formulas (7) to (9):
[Formulas (7), (8) and (9) are not reproduced in the source text.]
where R_w denotes the cfDNA molecules covering n methylation sites; w denotes a window of n consecutive methylation sites; x_{i,r} denotes the methylation state of the i-th methylation site on a cfDNA molecule fragment; a further symbol (not reproduced in the source text) denotes the combination of the states of the methylation sites on the cfDNA molecule fragment; c_k denotes the k-th methylation state of the i-th methylation site (for n = 4 the possible states are listed in an expression not reproduced in the source text); prop_k denotes the frequency of occurrence of c_k; r denotes the position of the i-th methylation site on the cfDNA molecule fragment; EP is the methylation entropy of one methylation site; ET is the sum of the EP values of all methylation sites; and a final symbol (not reproduced in the source text) denotes a probability distribution function. When the methylation entropy of the plasma sample to be detected is greater than the methylation entropy threshold, the sample is judged to have a high methylation heterogeneity level; otherwise its level is low.
This embodiment provides four ways of judging the methylation heterogeneity level of a plasma sample to be detected, based on the number of nucleosome active regions, methylation density, methylation level and methylation entropy. In practice, one or more of them can be selected according to actual requirements for a combined judgment. If the methylation heterogeneity level of the plasma sample to be detected is judged to be high, the sample may come from a cancer patient; if it is judged to be low, the sample may come from a healthy person. On this basis, the method can assist doctors in making a comprehensive judgment during subsequent diagnosis, provide partial support for the diagnostic result, and assist cancer screening, particularly the diagnosis and screening of early cancers.
The present invention also provides a methylation analysis device 100 for circulating cell-free nucleosome active regions, applied to the above methylation analysis method. As shown in FIG. 2, the device 100 comprises: a circulating cell-free gene fragment acquisition module 110, configured to acquire capture sequencing data of the plasma sample to be detected and to extract cfDNA molecule fragments from the data; a module 120 for calculating the ratio of cfDNA molecules in different cases, configured to slide a window of preset length with a preset step over the genomic intervals of the extracted cfDNA fragments and to calculate, in each window, the ratio of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window; a nucleosome active region screening module 130, configured to screen, based on the calculated ratios and by the Kolmogorov-Smirnov test, the intervals that differ significantly from the baseline nucleosome activity difference regions created from healthy-person samples, thereby obtaining the nucleosome active regions; and a methylation analysis module 140, configured to calculate the methylation phenotype features of the screened nucleosome active regions and to judge their methylation heterogeneity level from these features, completing the methylation analysis of the circulating cell-free nucleosome active regions.
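The module structure of device 100 can be mirrored as a small composition of callables, sketched below to show how data flows from module 110 to module 140. The class name, the field names and the toy stand-in functions in the usage lines are all hypothetical; in a real system each field would hold that module's processing step.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class MethylationAnalysisDevice:
    """Hypothetical wiring of modules 110-140; each field is one module's processing step."""
    acquire_fragments: Callable[[Any], List[Any]]            # module 110
    compute_ratios: Callable[[List[Any]], List[Any]]         # module 120
    screen_active_regions: Callable[[List[Any]], List[Any]]  # module 130
    analyse_methylation: Callable[[List[Any]], str]          # module 140

    def run(self, sequencing_data: Any) -> str:
        fragments = self.acquire_fragments(sequencing_data)
        ratios = self.compute_ratios(fragments)
        regions = self.screen_active_regions(ratios)
        return self.analyse_methylation(regions)


# Toy stand-ins: pass data through and call "high" when more than two regions survive.
device = MethylationAnalysisDevice(
    acquire_fragments=lambda data: list(data),
    compute_ratios=lambda frags: frags,
    screen_active_regions=lambda ratios: [r for r in ratios if r > 0.5],
    analyse_methylation=lambda regions: "high" if len(regions) > 2 else "low",
)
print(device.run([0.9, 0.8, 0.7, 0.1]))  # high
```

Keeping each module as an injected callable makes it easy to swap, for example, one methylation phenotype feature for another in module 140 without touching the rest of the pipeline.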
Specifically, the plasma sample to be detected is a plasma sample on which methylation analysis of circulating cell-free nucleosome active regions is to be performed; in practice it may come from a healthy person or from a tumor patient. The circulating cell-free gene fragment acquisition module 110 is further configured to obtain the cfDNA fragment features, as follows. For the sequence files obtained by sequencing, it obtains the start and end positions of the paired reads belonging to the same DNA molecule and determines from each read's alignment information whether it lies on the forward or the reverse strand. It then determines the start and end positions of the cfDNA molecule from the strands and the start and end positions of the paired reads, thereby obtaining the start and end coordinates of the cfDNA fragment on the human reference genome hg19, and calculates the fragment length L.
The module 120 for calculating the ratio of cfDNA molecules in different cases comprises: a window sliding unit, configured to slide a window of preset length with a preset step over the extracted circulating cell-free genomic intervals; a first calculating unit, configured to calculate, in each window, the ratio S of the number of cfDNA molecules spanning the entire window end to end to the number of all cfDNA molecules covering the window, and to find the peak value S_p of S within each fixed-length genomic interval; and a data processing unit, configured to extend the genomic position of each peak S_p by a preset length upstream and downstream to form a candidate nucleosome active region, and to normalize and smooth the S values within the candidate nucleosome active regions to obtain the smoothed ratio S_il of each candidate nucleosome active region in the interval.
In the sliding operation of the window sliding unit over the cfDNA genomic region, the window size and sliding step can be adjusted to the application: for example, the window may be 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 bp or larger, and the step 1, 2, 3 or 4 bp or larger. For the first calculating unit, a molecule spanning the entire window end to end is one whose length exceeds the window size and whose start and end both lie outside the window. The molecules covering the window comprise, besides those spanning the entire window, the cfDNA molecules whose start or end lies inside the window, i.e., all cfDNA molecules within the window's coverage.
After the ratios S of cfDNA molecules in different cases have been calculated, the first calculating unit finds the peak value S_p of S within each fixed-length genomic interval, and the genomic position of each calculated peak S_p is extended by a preset length upstream and downstream to form a candidate nucleosome active region. As above, the fixed-length genomic interval can be chosen according to the actual circumstances (including the window size and the sliding step), for example the genomic intervals designed for targeted sequencing or intervals of a defined fixed size. The extension length can likewise be set as required, for example 60 bp.
The data processing unit then normalizes and smooths the values within the windows: the window ratio S associated with each peak S_p is normalized by formula (1) to obtain the normalized ratio S', and S' is then smoothed within each window by a locally weighted average to obtain the smoothed ratio S_il of each window. Based on the smoothed ratios S_il of the windows in the interval, the nucleosome active region screening module uses the Kolmogorov-Smirnov test to select, as the final nucleosome active regions, the candidate nucleosome active regions that differ significantly from the baseline nucleosome activity difference regions created from healthy-person samples.
Before the screening for significant differences, a baseline creation module creates the baseline nucleosome activity difference regions of healthy-person samples. It comprises: a sample acquisition unit, configured to acquire healthy-person samples and divide them into a baseline sample group, a training sample group and a test sample group; a window set screening unit, configured to select, in the baseline sample group, the set i_s of candidate nucleosome activity difference regions whose peak S_p and window ratios S_il satisfy a predetermined condition (for example, S_p lying between the lower and upper quartiles); a second calculating unit, configured to use the Kolmogorov-Smirnov test to calculate, for each candidate nucleosome active region in the set i_s, the difference in trend ratio S_m between the baseline sample group and the training sample group, and then the p value of each candidate nucleosome active region contained in each sample; and a baseline nucleosome activity difference region screening unit, configured to retain, according to the calculation results, the regions whose p value is greater than a preset threshold as nucleosome activity difference regions. The nucleosome active region screening module then applies the Kolmogorov-Smirnov test between healthy persons and the plasma sample to be detected and takes the intervals whose p value is below the preset threshold as the nucleosome active regions. The preset p-value threshold is usually 0.05 and can be adjusted to the application.
After the nucleosome active regions are obtained, the methylation analysis module calculates their methylation phenotype features and judges their methylation heterogeneity level; the features include the number of nucleosome active regions, methylation density, methylation level and methylation entropy. When the number of nucleosome active regions is used as the feature, the analysis follows steps S11-S12; for methylation density, steps S21-S23; for methylation level, steps S31-S33; and for methylation entropy, steps S41-S43. These steps are not repeated here.
The methylation analysis of circulating cell-free nucleosome active regions and its beneficial effects are illustrated below by an example.
1 cfDNA extraction
Samples from 79 liver cancer patients and 81 ctDNA samples were selected for library construction, target-region capture and sequencing, and each was processed as follows:
1.1 Plasma treatment
1.1.2 After the samples have thawed, add 15 μL of proteinase K (20 mg/mL) and 50 μL of sodium dodecyl sulfate (SDS) solution (20%) per 1 mL of sample. If the plasma volume is less than 4 mL, make it up to 4 mL with phosphate-buffered saline (PBS).
1.1.3 Invert to mix, incubate at 60 °C for 20 min, then place on ice for 5 min.
1.2 Add reagents to the deep-well plates:
1.2.1 Add the corresponding reagents to the deep-well plates; the reagents and amounts added to each plate are shown in Table 1:
Table 1: Reagents added to the deep-well plates [table not reproduced in the source text]
1.3 Run the KingFisher FLEX magnetic bead extractor:
1.3.1 Before the program runs, place a clean magnetic head sleeve at the position specified by the program; during the run, the program checks whether the sleeve has fallen off.
1.3.2 After the deep-well plates have been loaded, press START on the automatic extractor and place the magnetic head sleeve and the corresponding deep-well plates in order as prompted on the display. Press START again to begin the run.
1.4 Aspirate the DNA sample:
After the automatic extractor has finished, first remove deep-well plate No. 7, then press STOP. Pipette the DNA sample into the correspondingly labeled centrifuge tube.
2 Library construction
2.1 Preparation of the internal reference
Add Lambda DNA to a 50 μL shearing tube, fragment it with an M220 shearing instrument, dilute the fragmented internal reference DNA, and add the diluted DNA to the samples during library construction.
2.2 Preparation of DNA samples
2.2.1 cfDNA samples do not require fragmentation.
2.2.2 Combine the DNA extracted from plasma with the fragmented internal reference to prepare the library.
2.3 Library preparation procedure:
2.3.1 EZ conversion
2.3.1.1 The initial sample volume is 20 μL; if it is less than 20 μL, make up the volume with water.
2.3.1.2 Add 130 μL of Lightning Conversion Reagent from the kit to the DNA sample, vortex to mix, centrifuge briefly, place in a PCR instrument, and run the reaction under the conditions shown in Table 2:
Table 2: PCR reaction conditions [table not reproduced in the source text]
2.3.1.3 Add 600 μL of M-Binding Buffer from the kit to a Zymo-Spin™ IC Column from the kit, add the product of the previous step to the column, mix by pipetting, and let stand for 2 min. Centrifuge at 12,000 rpm for 1 min.
2.3.1.4 Return the liquid in the collection tube to the column, let stand for 2 min, centrifuge at 12,000 rpm for 1 min, and discard the flow-through.
2.3.1.5 Add 100 μL of M-Wash Buffer from the kit, centrifuge at 12,000 rpm for 1 min, and discard the flow-through.
2.3.1.6 Add 200 μL of L-Desulphonation Buffer from the kit, incubate at room temperature (20-30 °C) for 15-20 min, centrifuge at 12,000 rpm for 1 min, and discard the flow-through.
2.3.1.7 Add 200 μL of M-Wash Buffer from the kit, centrifuge at 12,000 rpm for 1 min, and discard the flow-through.
2.3.1.8 Repeat the previous step: add 200 μL of M-Wash Buffer from the kit, centrifuge at 12,000 rpm for 1 min, and discard the flow-through.
2.3.1.9 Return the column to the collection tube, centrifuge at 12,000 rpm for 2 min, and discard the flow-through. Open the column lid and leave it at room temperature for 2-5 min so that the residual wash buffer in the membrane dries completely.
2.3.1.10 Transfer the column to a clean centrifuge tube, drop 20 μL of elution buffer TE onto the center of the membrane, let stand at room temperature for 2-5 min, and centrifuge at 12,000 rpm for 1 min.
2.3.1.11 Return the eluate in the collection tube to the column, let stand at room temperature for 2-5 min, and centrifuge at 12,000 rpm for 1 min. Store the tube containing the converted DNA at -20 °C (use the converted DNA as soon as possible).
2.3.2 DNA pretreatment
2.3.2.1 Preheat the PCR instrument to 95 °C in advance, with the heated lid at 105 °C.
2.3.2.2 Place the converted fragmented DNA in a 0.2 mL PCR tube and dilute it to a total volume of 15 μL with low-EDTA TE buffer (Low EDTA TE).
2.3.2.3 Place the PCR tube in the PCR instrument, incubate at 95 °C for 2 min, then immediately place on ice for 2 min.
2.3.3 Adding the T7 adapter
2.3.3.1 Preheat the PCR instrument to 37 °C in advance, with the heated lid at 105 °C.
2.3.3.2 Prepare the reaction system according to Table 3; the reagents are from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (Swift Biosciences).
Table 3: Reagents [table not reproduced in the source text]
2.3.3.3 Add 25 μL of the reagent mix to the PCR tube containing the pretreated DNA sample on ice, mix by pipetting, and centrifuge briefly.
2.3.3.4 Place the PCR tube in the PCR instrument and run the reaction under the conditions shown in Table 4.
Table 4: Reaction conditions [table not reproduced in the source text]
2.3.4 Second-strand synthesis reaction
2.3.4.1 Preheat the PCR instrument to 98 °C in advance, with the heated lid at 105 °C.
2.3.4.2 Prepare the reagents according to Table 5; they are from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (Swift Biosciences).
Table 5: Reagents [table not reproduced in the source text]
2.3.4.3 Add … μL of the reagent shown in Table 5 to the reaction system from the previous step, mix by pipetting, and centrifuge briefly.
2.3.4.4 Place the PCR tube in the PCR instrument and perform second-strand synthesis under the conditions shown in Table 6.
Table 6: Second-strand synthesis reaction conditions [table not reproduced in the source text]
2.3.4.5 Take the purification magnetic beads out of 4 °C storage and let them equilibrate at room temperature for 30 min.
2.3.4.6 After the reaction in the previous step, add 101 μL of magnetic beads to the product and mix by pipetting.
2.3.4.7 Let stand at room temperature for 5 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.3.4.8 Add 200 μL of 80% ethanol, incubate for 30 s, and discard. Note: prepare the 80% ethanol fresh. Repeat the 200 μL 80% ethanol wash once.
2.3.4.9 Remove residual ethanol from the bottom of the tube with a 10 μL pipette tip and air-dry at room temperature until the ethanol has completely evaporated.
2.3.4.10 Remove the tube from the magnetic rack, add 16 μL of ultrapure water, and vortex to mix. Incubate at room temperature for 2 min.
2.3.4.11 Centrifuge briefly, place on a magnetic rack until the liquid is clear, and transfer 15 μL of the sample to a new centrifuge tube.
2.3.5 Adding the T5 adapter
2.3.5.1 Prepare the reagents according to Table 7; they are from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (Swift Biosciences). Add 15 μL of the reaction system to the sample from the previous step, mix by pipetting, and centrifuge briefly.
Table 7: Reagents [table not reproduced in the source text]
2.3.5.2 Place the PCR tube in the PCR instrument and run the PCR reaction under the conditions shown in Table 8.
Table 8: PCR reaction conditions [table not reproduced in the source text]
2.3.5.3 Take the purification magnetic beads out of 4 °C storage and let them equilibrate at room temperature for 30 min.
2.3.5.4 After the ligation reaction is complete, add 36 μL of magnetic beads and mix by pipetting.
2.3.5.5 Let stand at room temperature for 5 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.3.5.6 Add 200 μL of 80% ethanol, incubate for 30 s, and discard. Note: prepare the 80% ethanol fresh. Repeat the 200 μL 80% ethanol wash once.
2.3.5.7 Remove residual ethanol from the bottom of the tube with a 10 μL pipette tip and air-dry at room temperature until the ethanol has completely evaporated.
2.3.5.8 Remove the tube from the magnetic rack, add 20 μL of ultrapure water, and vortex to mix. Incubate at room temperature for 2 min.
2.3.5.9 Centrifuge briefly, place on a magnetic rack until the liquid is clear, and transfer 20 μL of the sample to a new centrifuge tube.
2.3.6 Amplification
2.3.6.1 Prepare the reaction reagents according to Table 9 (from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT, Swift Biosciences), add 30 μL of the reaction system to the sample from the previous step, mix by pipetting, and centrifuge briefly.
Table 9: list of reagents
2.3.6.2 Place the PCR tubes in a PCR instrument and run the PCR under the conditions shown in Table 10.
Table 10: PCR reaction conditions
2.3.6.3 Take the purification magnetic beads out of 4 ℃ storage and equilibrate them at room temperature for 30 min.
2.3.6.4 After the amplification reaction is complete, add 60 μL of magnetic beads and mix by pipetting.
2.3.6.5 Stand at room temperature for 5 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.3.6.6 Add 200 μL of 80% ethanol, incubate for 30 s, and discard. Note: prepare the 80% ethanol fresh. Repeat the 200 μL 80% ethanol wash once.
2.3.6.7 Remove the residual ethanol at the bottom of the centrifuge tube with a 10 μL pipette tip and air-dry at room temperature until the ethanol has completely evaporated.
2.3.6.8 Remove the tube from the magnetic rack, add 50 μL of ultrapure water, and vortex to mix. Incubate at room temperature for 2 min.
2.3.6.9 Centrifuge briefly, place on a magnetic rack until the liquid is clear, and transfer 50 μL of the eluate to a new centrifuge tube.
2.4 Library capture
2.4.1 Hybridization library:
2.4.1.1 Use 1 μg of library in total for each capture.
2.4.1.3 Add the hybridization reagents to the above system, vortex to mix, and centrifuge briefly.
2.4.2 Seal the EP tube with sealing film, place it in a vacuum centrifugal concentrator, and evaporate to dryness (60 ℃, about 20 min to 1 h). Check periodically whether the sample has evaporated to dryness.
2.4.3 DNA denaturation:
2.4.3.1 Once the samples have evaporated completely to dryness, add 7.5 μL of 2× Hybridization Buffer (vial 5) and 3 μL of Hybridization Component A (vial 6) to each capture, vortex to mix, and centrifuge briefly. Both reagents in this step are from the SeqCap Hyb and Wash Kit (Roche).
2.4.3.2 Denature at 95 ℃ for 10 min.
2.4.4 Hybridization of the library to the probes:
2.4.4.1 Take out the probe and centrifuge briefly.
2.4.4.2 Briefly centrifuge the denatured DNA (kept at 95 ℃ until this point) and quickly transfer it to the PCR tube containing the probe; vortex to mix and centrifuge briefly.
2.4.4.3 Place in a PCR instrument and hybridize at 47 ℃.
2.4.5 Preparation of purification reagents:
2.4.5.1 Prepare the purification buffers required for capture as listed in Table 11, scaling the volumes to the number of captures. The reagents in the table are from the SeqCap Hyb and Wash Kit (Roche).
Table 11: Purification reagents required for capture
2.4.5.2 Pre-incubation of the Capture Beads and Wash Buffer working solutions:
Before use, equilibrate the Capture Beads at room temperature for 30 min.
Before use, incubate the Wash Buffer working solutions (vial 4 and vial 1) at 47 ℃ for 2 h.
2.4.6 Post-hybridization purification:
2.4.6.1 Dispense 100 μL of capture beads per capture, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.6.2 Add … μL of 1× Bead Wash Buffer (vial 7) and vortex to mix. Place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.6.3 Add … μL of 1× Bead Wash Buffer (vial 7) and vortex to mix. Place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.6.4 Add … μL of 1× Bead Wash Buffer (vial 7) and vortex to mix. Place on a magnetic rack until the liquid is clear, and discard the supernatant completely. Bead pretreatment is now complete; proceed immediately to the next step.
2.4.6.5 Transfer the overnight hybridization reaction of each capture onto the washed beads and mix by pipetting 10 times. Place in a PCR instrument and incubate at 47 ℃ for 45 min (heated lid set to 57 ℃), vortexing every 15 min to keep the beads in suspension.
2.4.7 Washing:
2.4.7.1 After incubation, add 100 μL of 1× Wash Buffer I (vial 1) pre-warmed to 47 ℃ to each tube and vortex to mix. Place on a magnetic rack until the liquid is clear, and discard the supernatant. The reagents used in steps 2.4.7.1 through 2.4.7.6 are from the SeqCap Hyb and Wash Kit (Roche).
2.4.7.2 Add … μL of 1× Stringent Wash Buffer (vial 4) pre-warmed to 47 ℃ and mix by pipetting 10 times. Incubate at 47 ℃ for 5 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.7.3 Add … μL of 1× Stringent Wash Buffer (vial 4) pre-warmed to 47 ℃ and mix by pipetting 10 times. Incubate at 47 ℃ for 5 min, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.7.4 Add … μL of room-temperature 1× Wash Buffer I (vial 1), vortex for 2 min, centrifuge briefly, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.7.5 Add … μL of room-temperature 1× Wash Buffer II (vial 2), vortex for 1 min, centrifuge briefly, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.7.6 Add … μL of room-temperature 1× Wash Buffer III (vial 3), vortex for 30 s, centrifuge briefly, place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.7.7 Add 36 μL of ultrapure water to the centrifuge tube to elute, vortex to mix, and proceed to the amplification step.
2.4.8 PCR reaction:
2.4.8.1 Prepare the mixed solution for the number of captures as listed in Table 12 and vortex to mix. The reagents in the table are from the SeqCap Hyb and Wash Kit (Roche).
Table 12: Reagents for preparing the mixed solution
2.4.8.2 Centrifuge briefly and dispense the mixed solution into PCR tubes at 30 μL per tube. Split each captured sample into two tubes for PCR amplification, with 20 μL of sample per tube.
2.4.8.3 Add the samples to the PCR tubes containing the mixed solution, vortex to mix, and centrifuge briefly.
2.4.8.4 Place in a PCR instrument and run the PCR under the conditions shown in Table 13.
Table 13: PCR reaction conditions
2.4.9 Purification after amplification:
2.4.9.1 Take out the purification magnetic beads (DNA Purification Beads) and equilibrate them at room temperature for 30 min.
2.4.9.2 Add … μL of purification beads to a 1.5 mL centrifuge tube, add 100 μL of the amplified capture DNA library, vortex to mix, and incubate at room temperature for 15 min.
2.4.9.3 Place on a magnetic rack until the liquid is clear, and discard the supernatant.
2.4.9.4 Add 200 μL of 80% ethanol, incubate for 30 s, and discard. Note: prepare the 80% ethanol fresh. Repeat the 200 μL 80% ethanol wash once.
2.4.9.5 Remove the residual ethanol at the bottom of the centrifuge tube with a 10 μL pipette tip and air-dry at room temperature until the ethanol has completely evaporated.
2.4.9.6 Remove the tube from the magnetic rack, add 120 μL of ultrapure water, and vortex to mix. Incubate at room temperature for 2 min.
2.4.9.7 Centrifuge briefly, place on a magnetic rack until the liquid is clear, and transfer the captured library to a new centrifuge tube.
2.5 Library pooling and sequencing
Calculate the mass of each captured library to be pooled according to its share of the planned data volume, and pool the different captures into one sample in that proportion. Add a PhiX library to the pool to form the loading sample, and sequence. PhiX is a bacteriophage library that compensates for base imbalance (low base diversity) and serves as a reference for evaluating sequencing quality.
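As an illustration of the proportional pooling in 2.5, the following Python sketch splits a total pool mass across captures in proportion to their planned data volumes. It assumes that each capture's mass share equals its data-volume share (that is, comparable fragment sizes across captures); the helper name `pooling_masses`, the 400 ng total and the per-capture data volumes are hypothetical, and the PhiX spike-in is not modeled.

```python
def pooling_masses(data_volumes, total_mass_ng):
    """Split the total pooled mass across captures in proportion to planned data volume.

    Hypothetical helper: assumes each capture's mass share equals its data-volume share.
    """
    total_volume = sum(data_volumes.values())
    if total_volume <= 0:
        raise ValueError("total data volume must be positive")
    return {name: total_mass_ng * volume / total_volume
            for name, volume in data_volumes.items()}


# Hypothetical example: three captures planned at 10, 10 and 20 Gb, 400 ng pooled in total.
masses = pooling_masses({"capture_A": 10, "capture_B": 10, "capture_C": 20}, 400)
for name, mass in masses.items():
    # Expected: capture_A 100.0 ng, capture_B 100.0 ng, capture_C 200.0 ng
    print(f"{name}: {mass:.1f} ng")
```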
Identifying nucleosome distribution intervals in a sample
3.1 For the genomic intervals obtained by capture sequencing, slide a 120 bp window in 2 bp steps and, for each window, calculate the ratio S of the number of cfDNA fragments spanning the entire window to the number of all cfDNA fragments covering the window.
3.2 Using the start position of each window as its unique identifier, find the peaks S_p of the ratio S within each fixed-length genomic interval.
3.3 Extend the genomic position of each peak S_p by 60 bp upstream and downstream to define a candidate nucleosome active region, and for each peak S_p, normalize the ratios S within candidate nucleosome active region i to obtain the normalized ratio S_i.
3.4 Smooth the normalized ratios S_i within each candidate nucleosome active region by locally weighted regression to obtain the smoothed ratio S_il for each window.
3.5 Obtain 81 healthy-person samples and divide them into a baseline sample group (25 samples), a training sample group (32 samples) and a test sample group (24 samples). In the baseline sample group, select the set of windows i_s whose S_il values at the peak windows S_p lie between the lower and upper quartiles, and calculate the median S_m at each site in the test sample group.
3.6 For these windows, use the Kolmogorov-Smirnov test to compare the S_m values of each window between the training and baseline samples, calculate a p value for each sample in each window, and keep the windows whose p value is greater than 0.05 in all samples.
3.7 For the liver cancer patients, calculate the normalized value S' of each window in i_s by the method above, perform a one-sided Kolmogorov-Smirnov test on each window between healthy persons and liver cancer patients, select for each sample the windows with a p value below 0.05, and record their count as N.
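Steps 3.1 to 3.3 and the Kolmogorov-Smirnov screening of steps 3.6 and 3.7 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: fragments are half-open (start, end) coordinates on one chromosome; a peak is assumed to be a window whose S exceeds both neighbouring windows; the "normalization" of step 3.3 is assumed to mean dividing by the mean S of the candidate region; the LOESS smoothing of step 3.4 and the quartile filter of step 3.5 are omitted (a library smoother such as statsmodels' `lowess` could be applied to the normalized ratios); and the p values use standard large-sample approximations. All function names are hypothetical.

```python
import math


def window_ratios(fragments, start, end, window=120, step=2):
    """Step 3.1: S = fragments spanning the whole window / fragments overlapping it.

    Windows that no fragment touches are skipped.
    """
    ratios = {}
    for w_start in range(start, end - window + 1, step):
        w_end = w_start + window
        covering = [(s, e) for s, e in fragments if s < w_end and e > w_start]
        if covering:
            spanning = sum(1 for s, e in covering if s <= w_start and e >= w_end)
            ratios[w_start] = spanning / len(covering)
    return ratios


def find_peaks(ratios, step=2):
    """Step 3.2 (assumed definition): window starts whose S exceeds both neighbours."""
    return sorted(
        pos for pos, s in ratios.items()
        if pos - step in ratios and pos + step in ratios
        and s > ratios[pos - step] and s > ratios[pos + step]
    )


def candidate_region(ratios, peak, flank=60):
    """Step 3.3: windows within +/- flank bp of a peak, divided by the region mean (assumed)."""
    region = {p: s for p, s in ratios.items() if abs(p - peak) <= flank}
    mean = sum(region.values()) / len(region)
    return {p: s / mean for p, s in region.items()} if mean > 0 else region


def _ecdf(sample, t):
    """Empirical CDF of `sample` evaluated at t."""
    return sum(1 for v in sample if v <= t) / len(sample)


def ks_pvalue(x, y, one_sided=False):
    """Asymptotic two-sample Kolmogorov-Smirnov p value.

    one_sided=True tests whether values in y tend to be larger than in x,
    using D+ = max(F_x - F_y); otherwise the two-sided D = max|F_x - F_y|.
    """
    diffs = [_ecdf(x, t) - _ecdf(y, t) for t in sorted(set(x) | set(y))]
    ne = len(x) * len(y) / (len(x) + len(y))
    if one_sided:
        d_plus = max(0.0, max(diffs))
        return math.exp(-2 * ne * d_plus ** 2)
    lam = math.sqrt(ne) * max(abs(d) for d in diffs)
    if lam < 0.2:  # Kolmogorov survival function is 1 to machine precision here
        return 1.0
    if lam < 1.18:  # small-lambda series of the Kolmogorov CDF
        q = math.exp(-math.pi ** 2 / (8 * lam ** 2))
        cdf = math.sqrt(2 * math.pi) / lam * sum(q ** ((2 * k - 1) ** 2) for k in range(1, 6))
        return 1.0 - cdf
    r = math.exp(-2 * lam ** 2)  # large-lambda alternating series
    return 2 * sum((-1) ** (k - 1) * r ** (k * k) for k in range(1, 6))


def count_differential_windows(healthy_by_window, patient_by_window, alpha=0.05):
    """Step 3.7 (sketch): count windows whose one-sided KS p value is below alpha (N)."""
    return sum(
        1 for w, healthy in healthy_by_window.items()
        if w in patient_by_window
        and ks_pvalue(healthy, patient_by_window[w], one_sided=True) < alpha
    )
```

In `count_differential_windows`, each window maps to a list of values (for example the healthy reference S' values and the patient's S' values for that window); this data layout is an assumption. For exact small-sample p values, a library routine such as SciPy's `scipy.stats.ks_2samp`, which supports one-sided alternatives, would normally be preferred over the asymptotic formulas used here.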
Using the number of nucleosome active regions as a methylation phenotype feature
Based on the counting results, the numbers of differential nucleosome active regions in the healthy-person samples and the liver cancer samples were plotted as a box plot, as shown in fig. 3, with the sample type on the horizontal axis and the number of differential nucleosomes (that is, the number of nucleosome active regions) on the vertical axis. The number of differential nucleosomes lies between 6 and 100 in the liver cancer samples and is almost zero in the healthy-person samples, a clear difference. The differential nucleosome counts N calculated for the healthy-person and liver cancer samples were then used for classification, and a receiver operating characteristic (ROC) curve was drawn. The horizontal axis is 1 - specificity, the proportion of healthy-person samples that are predicted to be liver cancer samples; the vertical axis is sensitivity, the proportion of liver cancer samples that are correctly predicted to be liver cancer samples.
Using the Youden index method, the nucleosome active region number threshold was calculated to be 8: when the number N of differential nucleosomes is greater than 8, the methylation heterogeneity level of the sample is judged to be high; when N is 8 or less, it is judged to be low. At a threshold of 8, the specificity was 1 and the sensitivity was 0.9625.
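The Youden index method used above selects the cutoff that maximizes J = sensitivity + specificity - 1. The Python sketch below assumes that the candidate cutoffs are the observed scores, that a score strictly greater than the cutoff is called high heterogeneity (as in the rule above), and that ties in J keep the lowest cutoff. The function name and the example scores are hypothetical; the sketch does not reproduce the reported threshold of 8.

```python
def youden_threshold(healthy_scores, cancer_scores):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    J = sensitivity + specificity - 1. A sample is called high heterogeneity
    (predicted liver cancer) when its score is strictly greater than the cutoff.
    Candidate cutoffs are the observed scores; on ties the lowest cutoff is kept.
    """
    best = None
    for cut in sorted(set(healthy_scores) | set(cancer_scores)):
        sensitivity = sum(s > cut for s in cancer_scores) / len(cancer_scores)
        specificity = sum(s <= cut for s in healthy_scores) / len(healthy_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cut, sensitivity, specificity)
    return best[1:]


# Hypothetical scores (e.g. differential-region counts N) for four healthy and four cancer samples.
cutoff, sens, spec = youden_threshold([1, 2, 3, 10], [5, 8, 12, 20])
print(cutoff, sens, spec)  # 3 1.0 0.75
```

The same function applies unchanged to any scalar feature for which larger values indicate higher heterogeneity.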
Using methylation density as a methylation phenotype feature
For the screened differential nucleosome active regions, the methylation density of each region was calculated and plotted as a box plot, as shown in fig. 5, with the sample type on the horizontal axis and the methylation density on the vertical axis. The methylation density is 22-360 in the liver cancer samples and 36-86 in the healthy-person samples, and the difference is significant. The methylation densities calculated for the healthy-person and liver cancer samples were then used for classification, and an ROC curve was drawn. The horizontal axis is 1 - specificity, the proportion of healthy-person samples that are predicted to be liver cancer samples; the vertical axis is sensitivity, the proportion of liver cancer samples that are correctly predicted to be liver cancer samples.
Using the Youden index method, the methylation density threshold was calculated to be 70.15235352: when the methylation density is greater than 70.15235352, the methylation heterogeneity level of the sample is judged to be high; when it is 70.15235352 or less, it is judged to be low. At a threshold of 70.15235352, the specificity was 0.913580247 and the sensitivity was 0.696202532.
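Claim 4 defines the site methylation ratio MD of a region as the number of methylated CpG sites divided by the number of all CpG sites in the region. The sketch below computes MD per region from site-level calls. How individual CpG sites are called methylated, and how the per-region MD values are combined into the sample's methylation density (summed here by default), are not specified in this passage and are assumptions; the function names are hypothetical.

```python
def region_methylation_ratio(cpg_calls):
    """MD = methylated CpG sites / all CpG sites in one nucleosome active region.

    cpg_calls maps CpG position -> True (methylated) or False (unmethylated).
    """
    if not cpg_calls:
        return 0.0
    return sum(cpg_calls.values()) / len(cpg_calls)


def sample_methylation_density(regions, aggregate=sum):
    """Combine per-region MD values into one sample-level score.

    The aggregation rule is an assumption; pass another function to change it.
    """
    values = [region_methylation_ratio(calls) for calls in regions]
    return aggregate(values)


# Hypothetical site calls for two screened regions.
regions = [{100: True, 104: False, 110: True, 118: True}, {500: False, 502: False}]
print(sample_methylation_density(regions))  # 0.75
```

Passing `statistics.mean` as `aggregate` averages the per-region values instead of summing them.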
Using methylation levels as methylation phenotype features
For the screened differential nucleosome active regions, the methylation level of each CpG site in the region was calculated and plotted as a box plot, as shown in FIG. 7, with the sample type on the horizontal axis and the sum of the Beta values of the CpG sites on the vertical axis. The methylation level is between 20 and 353 in the liver cancer samples and between 37 and 88 in the healthy-person samples, a clear difference. The methylation levels calculated for the healthy-person and liver cancer samples were then used for classification, and an ROC curve was drawn, as shown in fig. 8. The horizontal axis is 1 - specificity, the proportion of healthy-person samples that are predicted to be liver cancer samples; the vertical axis is sensitivity, the proportion of liver cancer samples that are correctly predicted to be liver cancer samples.
Using the Youden index method, the methylation level threshold was calculated to be 61.5472871434711: when the methylation level is greater than 61.5472871434711, the methylation heterogeneity level of the sample is judged to be high; when it is 61.5472871434711 or less, it is judged to be low. At a threshold of 61.5472871434711, the specificity was 0.839506172839506 and the sensitivity was 0.772151898734177.
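The methylation level above is the sum of the Beta values of the CpG sites in the screened regions. A minimal Python sketch, assuming the sequencing-style Beta value (methylated reads divided by methylated plus unmethylated reads) at each CpG site and treating uncovered sites as contributing 0; the input layout and function names are hypothetical.

```python
def beta_value(methylated_reads, unmethylated_reads):
    """Per-CpG Beta value: methylated / (methylated + unmethylated) reads (assumed definition)."""
    total = methylated_reads + unmethylated_reads
    return methylated_reads / total if total else 0.0


def methylation_level(regions):
    """Sum of per-CpG Beta values over all CpG sites in the screened regions.

    regions: list of dicts mapping CpG position -> (methylated_reads, unmethylated_reads).
    """
    return sum(beta_value(m, u) for region in regions for m, u in region.values())


# Hypothetical read counts: three CpG sites in two regions.
print(methylation_level([{10: (3, 1), 14: (0, 4)}, {50: (2, 2)}]))  # 1.25
```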
Using entropy of methylation as a methylation phenotype feature
For the screened differential nucleosome active regions, the methylation entropy in each region was calculated and plotted as a box plot, as shown in fig. 9, with the sample type on the horizontal axis and the methylation entropy on the vertical axis. The methylation entropy is between 2 and 10 in the liver cancer samples and between 2 and 5 in the healthy-person samples, and the difference is significant. The methylation entropies calculated for the healthy-person and liver cancer samples were then used for classification, and an ROC curve was drawn. The horizontal axis is 1 - specificity, the proportion of healthy-person samples that are predicted to be liver cancer samples; the vertical axis is sensitivity, the proportion of liver cancer samples that are correctly predicted to be liver cancer samples.
Using the Youden index method, the methylation entropy threshold was calculated to be 4.981215: when the methylation entropy is greater than 4.981215, the methylation heterogeneity level of the sample is judged to be high; when it is 4.981215 or less, it is judged to be low. At a threshold of 4.981215, the specificity was 0.95 and the sensitivity was 0.692307692.
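The text does not give the methylation entropy formula. One common choice, assumed here, is the Shannon entropy (in bits) of the read-level methylation patterns observed across the CpG sites of a region, optionally divided by the number of CpG sites. Summing per-region values into a sample score is likewise a hypothetical choice, and the function names are illustrative.

```python
import math
from collections import Counter


def region_methylation_entropy(read_patterns, normalize_by_cpgs=False):
    """Shannon entropy (bits) of read-level methylation patterns in one region.

    read_patterns: one string per read over the same CpG sites, e.g. "1101"
    (1 = methylated, 0 = unmethylated). normalize_by_cpgs divides by the
    number of CpG sites; which form the method uses is an assumption.
    """
    if not read_patterns:
        return 0.0
    total = len(read_patterns)
    counts = Counter(read_patterns)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / len(read_patterns[0]) if normalize_by_cpgs else entropy


def sample_methylation_entropy(regions, normalize_by_cpgs=False):
    """Hypothetical sample-level score: sum of per-region entropies."""
    return sum(region_methylation_entropy(p, normalize_by_cpgs) for p in regions)


# Hypothetical reads: patterns "11" x2, "00" x1, "10" x1 give probabilities 0.5, 0.25, 0.25.
print(region_methylation_entropy(["11", "11", "00", "10"]))  # 1.5
```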
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of program modules is illustrated, and in practical applications, the above-described distribution of functions may be performed by different program modules, that is, the internal structure of the apparatus may be divided into different program units or modules to perform all or part of the above-described functions. Each program module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one processing unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software program unit. In addition, the specific names of the program modules are only used for distinguishing the program modules from one another, and are not used for limiting the protection scope of the application.
Fig. 11 is a schematic structural diagram of a terminal device provided in an embodiment of the present invention. As shown, the terminal device 200 includes a processor 220, a memory 210, and a computer program 211 stored in the memory 210 and executable on the processor 220, for example a program for methylation analysis of circulating cell-free nucleosome active regions. When executing the computer program 211, the processor 220 implements the steps of the above embodiments of the methylation analysis method for circulating cell-free nucleosome active regions, or the functions of the modules in the above embodiments of the methylation analysis device for circulating cell-free nucleosome active regions.
The terminal device 200 may be a notebook computer, a handheld computer, a tablet computer, a mobile phone, or the like, and may include, but is not limited to, the processor 220 and the memory 210. Those skilled in the art will appreciate that fig. 11 is merely an example of the terminal device 200 and does not limit it; the terminal device 200 may include more or fewer components than shown, a combination of certain components, or different components. For example, the terminal device 200 may also include input/output devices, display devices, network access devices, buses, and the like.
The processor 220 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 210 may be an internal storage unit of the terminal device 200, such as a hard disk or memory of the terminal device 200. The memory 210 may also be an external storage device of the terminal device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 200. Further, the memory 210 may include both an internal storage unit and an external storage device of the terminal device 200. The memory 210 stores the computer program 211 and other programs and data required by the terminal device 200, and may also temporarily store data that has been output or is to be output.
In the above embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated modules/units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments of the present invention may also be completed by instructing relevant hardware through the computer program 211. The computer program 211 may be stored in a computer-readable storage medium, and when executed by the processor 220, it implements the steps of the above method embodiments. The computer program 211 comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or device capable of carrying the code of the computer program 211, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable storage medium may be increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for persons skilled in the art, numerous modifications and adaptations can be made without departing from the principle of the present invention, and such modifications and adaptations should be considered as within the scope of the present invention.

Claims (10)

1. A method for methylation analysis of the active region of circulating cell-free nucleosomes, comprising:
acquiring capture sequencing data of a plasma sample to be tested and extracting cfDNA fragments from the capture sequencing data;
sliding a window of preset length with a preset step over the genomic intervals of the extracted cfDNA fragments, and calculating, for each window, the ratio of the number of cfDNA molecules spanning the whole window from end to end to the number of all cfDNA molecules covering the window;
screening, based on the calculated ratios and by the Kolmogorov-Smirnov test, intervals that differ significantly from a baseline nucleosome activity difference region established from healthy-person samples, to obtain nucleosome active regions;
calculating a methylation phenotype feature of the screened nucleosome active regions and determining the methylation heterogeneity level of the nucleosome active regions according to the methylation phenotype feature, thereby completing the methylation analysis of the circulating cell-free nucleosome active regions;
wherein the cfDNA fragments are obtained as follows: for the sequence files obtained by sequencing, obtaining the start and end positions of the paired reads belonging to the same DNA molecule, and determining the forward or reverse strand of each read from its alignment information; then determining the start and end positions of the cfDNA molecule from the strands and the start and end positions of the paired reads, thereby obtaining the start and end positions of the cfDNA fragment on the human genome hg19 and calculating the fragment length L;
wherein sliding a window of preset length with a preset step over the genomic intervals of the extracted cfDNA fragments and calculating, for each window, the ratio of the number of cfDNA molecules spanning the whole window from end to end to the number of all cfDNA molecules covering the window comprises:
sliding a window of preset length with a preset step over the genomic intervals of the extracted circulating cell-free DNA;
calculating, for each window, the ratio S of the number of molecules spanning the whole window from end to end to the number of all molecules covering the window, and finding the peaks S_p of the ratio S within each fixed-length genomic interval;
extending the genomic position of each peak S_p upstream and downstream by a preset length to serve as a candidate nucleosome active region;
normalizing and smoothing the S values within each candidate nucleosome active region to obtain the smoothed ratio S_il for each candidate nucleosome active region in the interval;
wherein screening, based on the calculated ratios and by the Kolmogorov-Smirnov test, intervals that differ significantly from the baseline nucleosome activity difference region established from healthy-person samples to obtain nucleosome active regions comprises: screening, based on the calculated ratios S_il and by the Kolmogorov-Smirnov test, candidate nucleosome active regions that differ significantly from the baseline nucleosome activity difference region established from healthy-person samples, and taking these candidate nucleosome active regions as the final nucleosome active regions;
wherein, in addition to sliding a window of preset length with a preset step over the genomic intervals of the extracted cfDNA fragments and calculating, for each window, the ratio of the number of cfDNA molecules spanning the whole window from end to end to the number of all cfDNA molecules covering the window, the method further comprises establishing the baseline nucleosome activity difference region from healthy-person samples by:
obtaining healthy-person samples and dividing them into a baseline sample group, a training sample group and a test sample group;
screening, in the baseline sample group, candidate nucleosome active regions i_s for which the ratio S_il of the peak window S_p satisfies a preset condition;
calculating, by the Kolmogorov-Smirnov test, the difference in the trend ratio S_m of each candidate nucleosome active region in i_s between the baseline sample group and the training sample group, and then calculating a p value for each candidate nucleosome active region in each sample;
screening, according to the calculation results, the nucleosome activity difference regions whose p value is greater than a preset threshold to obtain the baseline nucleosome activity difference region, and testing it with the test sample group.
2. The method for methylation analysis of circulating cell-free nucleosome active regions according to claim 1, wherein the step of screening out regions having significant difference in activity from a baseline nucleosome activity difference region created from a sample of a healthy person by a kolmogorov-smirnov test based on the calculated ratio comprises: testing between healthy person and blood plasma sample to be tested by using Kolmogorov-Similnov test method to obtainpThe value is lower than the interval of the preset threshold value, and then the nucleosome active region is obtained.
3. The method for methylation analysis of circulating cell-free nucleosome active regions according to claim 1, wherein calculating the methylation phenotypic characteristics of the screened nucleosome active regions, judging the methylation heterogeneity level of the nucleosome active regions according to the methylation phenotypic characteristics, and thereby completing the methylation analysis of the circulating cell-free nucleosome active regions comprises:
counting the number of differential nucleosome active regions in the plasma sample to be tested;
judging the methylation heterogeneity level of the plasma sample to be tested according to a preset nucleosome active region count threshold, wherein the threshold is the differential nucleosome active region count at which the Youden index, calculated by the Youden index method, is maximal.
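The threshold in claim 3 is chosen where the Youden index J = sensitivity + specificity - 1 is maximal. A minimal sketch, assuming reference samples labeled 1 (expected high heterogeneity) or 0, and that a sample is called positive when its score (here, the count of differential nucleosome active regions) is at least the threshold; `youden_threshold` is a hypothetical name.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called positive when score >= threshold.
    labels: 1 = reference positive, 0 = reference negative (assumed encoding).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative reference samples")
    best_t, best_j = float("nan"), -1.0
    # Every observed score is tried as a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

With counts `[1, 2, 3, 10, 12, 15]` and labels `[0, 0, 0, 1, 1, 1]`, the cut-off 10 separates the groups perfectly (J = 1.0). The same routine could pick the density, level and entropy thresholds of claims 4 to 6, provided the direction "higher score means positive" holds for that feature.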
4. The method for methylation analysis of circulating cell-free nucleosome active regions according to claim 1, wherein calculating the methylation phenotypic characteristics of the screened nucleosome active regions, judging the methylation heterogeneity level of the nucleosome active regions according to the methylation phenotypic characteristics, and thereby completing the methylation analysis of the circulating cell-free nucleosome active regions comprises:
for each obtained nucleosome active region, calculating the site methylation ratio $MD$, i.e., the ratio of the number of methylated CpG sites to the number of all CpG sites in the region;
calculating the methylation density of the sample from the site methylation ratios $MD$ of the nucleosome active regions in the plasma sample to be tested;
judging the methylation heterogeneity level of the plasma sample to be tested according to a preset methylation density threshold, wherein the methylation density threshold is the methylation density at which the Youden index, calculated by the Youden index method, is maximal.
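The per-region ratio $MD$ of claim 4 and a sample-level methylation density can be sketched as follows. The claim does not say how each CpG site is called methylated or how region values are combined, so this sketch assumes per-site boolean calls are already available and takes the unweighted mean of $MD$ over regions; both function names are hypothetical.

```python
from typing import Mapping, Sequence


def region_md(site_calls: Sequence[bool]) -> float:
    """MD = methylated CpG sites / all CpG sites in one nucleosome active region."""
    if not site_calls:
        raise ValueError("region has no CpG sites")
    return sum(site_calls) / len(site_calls)


def sample_methylation_density(regions: Mapping[str, Sequence[bool]]) -> float:
    """Assumed aggregation: unweighted mean of MD over regions that contain CpGs."""
    mds = [region_md(calls) for calls in regions.values() if calls]
    return sum(mds) / len(mds)
```

A region with calls `[True, False, True, True]` has MD = 0.75, one with `[False, False]` has MD = 0, and the sample density is their mean, 0.375.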
5. The method for methylation analysis of circulating cell-free nucleosome active regions according to claim 1, wherein calculating the methylation phenotypic characteristics of the screened nucleosome active regions, judging the methylation heterogeneity level of the nucleosome active regions according to the methylation phenotypic characteristics, and thereby completing the methylation analysis of the circulating cell-free nucleosome active regions comprises:
for the CpG sites in the obtained nucleosome active regions, counting, at each CpG site, the number of methylated molecular fragments and the number of all molecular fragments covering that site, and calculating a Beta value;
calculating the methylation level of the sample from the Beta values of the CpG sites in the plasma sample to be tested;
judging the methylation heterogeneity level of the plasma sample to be tested according to a preset methylation level threshold, wherein the methylation level threshold is the methylation level at which the Youden index, calculated by the Youden index method, is maximal.
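The Beta value of claim 5 is the number of methylated fragments at a CpG divided by all fragments covering it. A minimal sketch, assuming counts keyed by a hypothetical `(chromosome, position)` tuple, a hypothetical `min_depth` coverage filter, and a sample-level methylation level taken as the mean Beta over retained sites (the claim leaves the aggregation unspecified).

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # hypothetical key: (chromosome, CpG position)


def beta_value(methylated: int, total: int) -> float:
    """Beta = fragments methylated at the CpG / all fragments covering it."""
    if total <= 0 or not 0 <= methylated <= total:
        raise ValueError("invalid counts")
    return methylated / total


def sample_methylation_level(counts: Dict[Site, Tuple[int, int]], min_depth: int = 1) -> float:
    """Assumed aggregation: mean Beta over CpG sites with coverage >= min_depth."""
    betas = [beta_value(m, t) for m, t in counts.values() if t >= min_depth]
    return sum(betas) / len(betas)
```

With counts `(3, 4)`, `(1, 4)` and `(0, 1)` and `min_depth=2`, the third site is dropped and the sample level is (0.75 + 0.25) / 2 = 0.5.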
6. The method for methylation analysis of circulating cell-free nucleosome active regions according to claim 1, wherein calculating the methylation phenotypic characteristics of the screened nucleosome active regions, judging the methylation heterogeneity level of the nucleosome active regions according to the methylation phenotypic characteristics, and thereby completing the methylation analysis of the circulating cell-free nucleosome active regions comprises:
calculating the methylation entropy within each of the obtained nucleosome active regions;
calculating the methylation entropy of the sample from the methylation entropy of each nucleosome active region in the plasma sample to be tested;
judging the methylation heterogeneity level of the plasma sample to be tested according to a preset methylation entropy threshold, wherein the methylation entropy threshold is the methylation entropy at which the Youden index, calculated by the Youden index method, is maximal.
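The claim does not define methylation entropy. A commonly used definition is the Shannon entropy (in bits) of the per-fragment methylation patterns over the b CpGs of a region, divided by b. The sketch below assumes that definition, assumes each fragment's pattern is a string such as `"1011"` covering the same b CpGs, and takes the sample entropy as the mean over regions; function names are hypothetical.

```python
import math
from collections import Counter
from typing import Mapping, Sequence


def methylation_entropy(patterns: Sequence[str]) -> float:
    """ME = (1/b) * sum_i -(n_i/N) * log2(n_i/N) over distinct CpG patterns.

    Each pattern gives the call of every CpG in the region for one fragment
    ("1" methylated, "0" unmethylated).
    """
    if not patterns:
        raise ValueError("no fragments")
    b = len(patterns[0])
    if b == 0 or any(len(p) != b for p in patterns):
        raise ValueError("all patterns must cover the same b > 0 CpGs")
    n = len(patterns)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(patterns).values())
    return entropy / b


def sample_methylation_entropy(regions: Mapping[str, Sequence[str]]) -> float:
    """Assumed aggregation: mean methylation entropy over regions."""
    values = [methylation_entropy(p) for p in regions.values()]
    return sum(values) / len(values)
```

A region where half the fragments read `"1111"` and half `"0000"` has 1 bit of pattern entropy over 4 CpGs, i.e. ME = 0.25; a region where every fragment shows the same pattern has ME = 0.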
7. A methylation analysis device for circulating cell-free nucleosome active regions, applying the methylation analysis method for circulating cell-free nucleosome active regions according to any one of claims 1 to 6, the device comprising:
a circulating cell-free DNA fragment acquisition module, configured to acquire capture sequencing data of the plasma sample to be tested and to extract cfDNA molecule fragments from the capture sequencing data, by: for the sequence files obtained by sequencing, obtaining the start and end positions of the paired reads belonging to the same DNA molecule, and determining from the alignment information of each read whether it maps to the forward or the reverse strand; then determining the start and end positions of the cfDNA molecule from the strands and the start and end positions of the paired reads, thereby obtaining the start and end positions of the cfDNA fragment on the human genome hg19, and calculating the cfDNA fragment length L;
a cfDNA molecule count ratio calculation module, configured to slide a window of preset length, by a preset step length, across the genomic interval of the extracted cfDNA molecule fragments, and to calculate, in each window, the ratio of the number of cfDNA molecules spanning the entire window from end to end to the number of all cfDNA molecules covering the window;
a nucleosome active region screening module, configured to screen out, based on the calculated ratio and by the Kolmogorov-Smirnov test, intervals that differ significantly from the baseline nucleosome activity difference regions established from healthy-person samples, to obtain the nucleosome active regions;
a methylation analysis module, configured to calculate the methylation phenotypic characteristics of the screened nucleosome active regions, to judge the methylation heterogeneity level of the nucleosome active regions according to those characteristics, and thereby to complete the methylation analysis of the circulating cell-free nucleosome active regions;
wherein the cfDNA molecule count ratio calculation module comprises:
a window sliding unit, configured to slide a window of preset length, by a preset step length, across the extracted circulating cell-free genomic interval;
a first calculation unit, configured to calculate, in each window, the ratio $S$ of the number of molecules spanning the entire window from end to end to the number of all molecules covering the window, and to find the peak value $S_p$ of $S$ within each fixed-length circulating cell-free genomic interval;
a data processing unit, configured to extend the genomic position of each peak value $S_p$ by a preset length upstream and downstream to form a candidate nucleosome active region, and to homogenize and smooth the $S$ values of each candidate nucleosome active region to obtain the smoothed ratio $S_{il}$ for each candidate nucleosome active region in the interval;
and the nucleosome active region screening module is configured to screen out, based on the calculated ratio $S_{il}$ and by the Kolmogorov-Smirnov test, the candidate nucleosome active regions that differ significantly from the baseline nucleosome activity difference regions established from healthy-person samples, and to take them as the final nucleosome active regions;
the methylation analysis device for circulating cell-free nucleosome active regions further comprises a baseline creation module for establishing the baseline nucleosome activity difference regions from healthy-person samples, the baseline creation module comprising:
a sample acquisition unit, configured to acquire healthy-person samples and divide them into a baseline sample group, a training sample group and a test sample group;
a window set screening unit, configured to screen out, in the baseline sample group, candidate nucleosome activity difference regions $i_s$ whose peak value $S_p$ and window ratio $S_{il}$ satisfy preset conditions;
a second calculation unit, configured to use the Kolmogorov-Smirnov test to compare the trend ratio $S_m$ of each candidate nucleosome active region in the set $i_s$ between the baseline sample group and the training sample group, and then to calculate a p-value for each candidate nucleosome active region contained in each sample;
a baseline nucleosome activity difference region screening unit, configured to screen, according to the calculation results, the nucleosome activity difference regions whose p-value is greater than the preset threshold.
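The fragment acquisition step of claim 7 (pairing reads, using strand to find the fragment ends, and computing length L) can be sketched without a BAM parser. `AlignedRead` is a hypothetical minimal record; in practice its fields could be filled from pysam's `AlignedSegment` (`query_name`, `reference_name`, `reference_start`, `reference_end`, `is_reverse`). The sketch assumes 0-based half-open coordinates and a forward-mate/reverse-mate orientation, and it ignores bisulfite-specific strand bookkeeping.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class AlignedRead:
    """Hypothetical minimal alignment record."""
    name: str
    chrom: str
    ref_start: int   # 0-based leftmost aligned position
    ref_end: int     # 0-based exclusive end of the alignment
    is_reverse: bool


def fragments_from_pairs(reads: Iterable[AlignedRead]) -> List[Tuple[str, int, int, int]]:
    """Return (chrom, start, end, length L) for each properly oriented read pair."""
    by_name: Dict[str, List[AlignedRead]] = defaultdict(list)
    for read in reads:
        by_name[read.name].append(read)
    fragments = []
    for pair in by_name.values():
        fwd = [r for r in pair if not r.is_reverse]
        rev = [r for r in pair if r.is_reverse]
        # Keep only pairs with exactly one forward and one reverse mate on the same chromosome.
        if len(pair) != 2 or len(fwd) != 1 or len(rev) != 1 or fwd[0].chrom != rev[0].chrom:
            continue
        # Fragment starts at the forward mate's left end and stops at the reverse mate's right end.
        start, end = fwd[0].ref_start, rev[0].ref_end
        if end <= start:
            continue  # reverse mate ends before the forward mate starts
        fragments.append((fwd[0].chrom, start, end, end - start))
    return fragments
```

A forward mate at `chr1:1000-1100` paired with a reverse mate at `chr1:1067-1167` yields the fragment `("chr1", 1000, 1167, 167)`; unpaired reads are skipped.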
8. The methylation analysis device for circulating cell-free nucleosome active regions according to claim 7, wherein the nucleosome active region screening module is further configured to perform the Kolmogorov-Smirnov test between healthy-person samples and the plasma sample to be tested, and to take the intervals whose p-value is below a preset threshold as the nucleosome active regions.
9. A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor when running said computer program implements the steps of the method for methylation analysis of circulating cell-free nucleosome active regions according to any of claims 1-6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method for methylation analysis of circulating cell-free nucleosome active regions according to any one of claims 1 to 6.
CN202110337436.XA 2021-03-30 2021-03-30 Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium Active CN112735531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110337436.XA CN112735531B (en) 2021-03-30 2021-03-30 Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110337436.XA CN112735531B (en) 2021-03-30 2021-03-30 Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112735531A CN112735531A (en) 2021-04-30
CN112735531B true CN112735531B (en) 2021-07-02

Family

ID=75597059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110337436.XA Active CN112735531B (en) 2021-03-30 2021-03-30 Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112735531B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903401B (en) * 2021-12-10 2022-04-08 Zhenhe (Beijing) Biotechnology Co., Ltd. ctDNA length-based analysis method and system
CN115132274B (en) * 2022-09-01 2022-11-25 Zhenhe (Beijing) Biotechnology Co., Ltd. Methylation level analysis method and device for circulating cell-free DNA transcription factor binding site
CN116168761B (en) * 2023-04-18 2023-06-30 珠海圣美生物诊断技术有限公司 Method and device for determining characteristic region of nucleic acid sequence, electronic equipment and storage medium
CN116153418B (en) * 2023-04-18 2023-07-18 Zhenhe (Beijing) Biotechnology Co., Ltd. Method, apparatus, device and storage medium for correcting whole genome methylation sequencing data batch effect

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111183145A (en) * 2017-03-08 2020-05-19 University of Chicago High-sensitivity DNA methylation analysis method
CN111433855A (en) * 2017-07-18 2020-07-17 康捷尼科有限公司 Screening system and method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN111032868A (en) * 2017-06-30 2020-04-17 The Regents of the University of California Methods and systems for assessing DNA methylation in cell-free DNA
CN110189798A (en) * 2019-06-26 2019-08-30 广州市雄基生物信息技术有限公司 Clustering method based on differences in peripheral blood plasma DNA nucleosome footprints, and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DETECTION OF METHYLATED CELL-FREE DNA FOR DIAGNOSIS AND; Yu-Jen Cheng et al.; 2021 IEEE 34th International Conference on Micro Electro Mechanical Systems (MEMS); 2021-01-29; pp. 310-313 *
Detection and application of methylated circulating tumor DNA (甲基化循环肿瘤DNA的检测及应用); 吴羽灵 (Wu Yuling) et al.; Labeled Immunoassays & Clin Med; 2021-01-31; Vol. 28, No. 1; pp. 152-158 *

Also Published As

Publication number Publication date
CN112735531A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112735531B (en) Methylation analysis method and device of circulating cell-free nucleosome active region, terminal equipment and storage medium
CN112029861B (en) Tumor mutation load detection device and method based on capture sequencing technology
CN112397150B (en) ctDNA methylation level prediction device and method based on target region capture sequencing
CN110129441B (en) Detection panel for brain glioma based on second-generation sequencing, detection kit and application of detection panel
CN112397151B (en) Methylation marker screening and evaluating method and device based on target capture sequencing
TW201840853A (en) Diagnostic applications using nucleic acid fragments
CN103305618A (en) Screening method of inherited metabolic disorder gene
CN105653898A (en) Cancer detection kit based on large-scale data mining and detection method
CN114317762B (en) Three-marker composition for detecting early liver cancer and kit thereof
WO2020224159A1 (en) Next generation sequencing-based panel for detecting glioma, detection kit, detection method, and application thereof
CN105132407A (en) Method for low-frequency mutant-enriched sequencing of DNA of exfoliative cells
CN107893116A (en) Primer pair combination, kit and library construction method for detecting gene mutations
CN105779435A (en) Kit and application thereof
CN112609015A (en) Microbial marker for predicting colorectal cancer risk and application thereof
CN114574587B (en) Marker composition for colorectal cancer detection and application thereof
CN108070658A (en) Non-diagnostic method for detecting MSI
CN103290136A (en) Screening method of leukoencephalopathy genes
CN108949979A (en) Method for determining whether a lung nodule is benign or malignant from a blood sample
CN113362893A (en) Construction method and application of tumor screening model
CN115341031A (en) Screening method of pan-cancer methylation biomarker, biomarker and application
CN110993025B (en) Method and device for quantifying fetal concentration and method and device for genotyping fetus
CN105779433A (en) Kit and applications thereof
CN115678964B (en) Noninvasive screening method of embryo before implantation based on embryo culture solution
US20210214799A1 (en) Method and kit for the classification of thyroid nodules
CN110724743A (en) Methylated biomarker related to colorectal cancer diagnosis in human blood and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant