CN115851930A - Methylation marker for detecting benign and malignant lung nodules and application thereof - Google Patents


Info

Publication number
CN115851930A
Authority
CN
China
Prior art keywords
methylation
benign
malignant
seq
sequence
Prior art date
Legal status
Pending
Application number
CN202211065863.8A
Other languages
Chinese (zh)
Inventor
孙加源
李营
谢芳芳
李威
苏志熙
何其晔
Current Assignee
Shanghai Chest Hospital
Original Assignee
Shanghai Chest Hospital
Priority date
Filing date
Publication date
Application filed by Shanghai Chest Hospital
Priority to CN202211065863.8A
Publication of CN115851930A
Current legal status: Pending

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a methylation marker for detecting benign and malignant lung nodules, and applications thereof. The methylation marker comprises any of the following: a fragment of at least one of the sequences shown in Seq ID No. 1 to Seq ID No. 43, or the complete complement thereof; a fragment comprising the sequence shown together with 1 kb of sequence upstream and downstream of it; a DNA methylation site within the region of the sequence shown; a variant that has at least 90% sequence identity with the sequence shown and the same methylation sites as that sequence; or a DNA methylation haplotype covered by the region of the sequence shown, together with the abundance of that haplotype. The markers were identified from methylation sequencing data of benign and malignant lung nodule samples by screening for markers whose methylation levels differ significantly between benign and malignant nodules. They can therefore be used to distinguish benign from malignant lung nodules effectively, and they are used to build a benign/malignant lung nodule (lung cancer) risk prediction and assessment model.

Description

Methylation marker for detecting benign and malignant lung nodules and application thereof
Technical Field
The invention relates to the technical field of biomedicine, in particular to a methylation marker for detecting benign and malignant pulmonary nodules and application thereof.
Background
Low-dose computed tomography (LDCT) is widely used and is the most effective method for detecting early-stage lung cancer. The landmark National Lung Screening Trial (NLST) screened populations at high risk of lung cancer. It showed that the high sensitivity of LDCT for small early malignant lung nodules (< 3 cm) reduced lung cancer mortality by 20% compared with single-view posteroanterior chest radiography (National Lung Screening Trial Research Team et al., 2011). However, LDCT also detects benign lung nodules and therefore has a high false-positive rate (96.4% in the NLST) (National Lung Screening Trial Research Team et al., 2011).
To minimize the misdiagnosis of benign nodules as malignant, detected lung nodules must be accurately stratified by malignancy risk. Liquid biopsy has recently been proposed as an alternative method for the differential diagnosis of lung nodules. It samples and analyzes non-solid biological tissues, mainly blood, to identify and quantify cancer-derived biomarkers such as circulating tumor cells, nucleic acids or proteins. These biomarkers are used to assess the presence and/or status of tumors for detection, diagnosis, prognosis and monitoring (Wan, J.C.M., et al., Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nature Reviews Cancer, 2017). Studies have shown that body fluids (blood, urine, bronchoalveolar lavage fluid, etc.) contain large amounts of cell-free DNA (cfDNA). This cfDNA carries tissue-specific DNA methylation signatures that enable tissue-of-origin tracing and cancer detection (Lo YMD, et al., 2021). cfDNA methylation detection has been applied to various cancers, such as colorectal and liver cancer, and can sensitively detect early cancers at high specificity. Some studies have found that cfDNA methylation can be applied to the early diagnosis and screening of lung cancer (AUC = 0.90) (Liang N et al., 2021). However, its performance in distinguishing benign from malignant lung nodules is considerably poorer (AUC = 0.72-0.81) (Liang W et al., 2021). Liquid biopsy methylation biomarkers with higher stability and specificity are therefore needed for the diagnosis of benign and malignant lung nodules.
In view of this, the invention is particularly proposed.
Disclosure of Invention
The invention aims to provide a methylation marker for detecting benign and malignant pulmonary nodules, and applications thereof. The invention detects the methylation level of the methylation marker in plasma and uses it to distinguish patients with benign pulmonary nodules from patients with malignant pulmonary nodules. This enables noninvasive diagnosis of pulmonary nodules with higher accuracy and at lower cost.
In order to achieve the above purpose of the present invention, the following technical solutions are adopted:
In one aspect, the invention provides methylation markers for detecting benign and malignant lung nodules. The methylation markers comprise the following:
(a1) a fragment of at least one of the sequences Seq ID No. 1 to Seq ID No. 43, or the complete complement thereof;
(a2) a fragment comprising the sequence of (a1) and 1 kb of sequence upstream and downstream thereof;
(a3) a DNA methylation site in the region of the sequence set forth in (a1) or (a2);
(a4) a variant having at least 90% sequence identity to the sequence of (a1) or (a2) and having the same methylation sites as (a1) or (a2); or
(a5) a DNA methylation haplotype covered by the region of the sequence set forth in (a1) or (a2), and the abundance of the DNA methylation haplotype.
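The coordinate and sequence criteria in (a2) and (a4) can be illustrated with a small Python sketch. It is an illustration only and is not part of the disclosed method. The function names are hypothetical, and coordinates are assumed to be 0-based and half-open. The identity check assumes the two sequences are already aligned without gaps, so that "the same methylation sites" can be approximated as CpG dinucleotides at the same offsets.

```python
from typing import List, Tuple


def flanked_region(chrom: str, start: int, end: int, flank: int = 1000) -> Tuple[str, int, int]:
    """Extend a 0-based, half-open region by `flank` bp on each side (clipped at 0)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, max(0, start - flank), end + flank


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def cpg_offsets(seq: str) -> List[int]:
    """0-based offsets of CpG dinucleotides, used here as candidate methylation sites."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def is_candidate_variant(ref: str, var: str, min_identity: float = 90.0) -> bool:
    """(a4)-style check on an ungapped alignment: enough identity and identical CpG offsets."""
    return percent_identity(ref, var) >= min_identity and cpg_offsets(ref) == cpg_offsets(var)


# Hypothetical examples: a region near the chromosome start (left flank clipped at 0),
# and a 10-bp sequence whose last base differs (90% identity, CpG sites unchanged).
print(flanked_region("chr1", 500, 2000))
print(is_candidate_variant("ACGTACGTAC", "ACGTACGTAA"))
```

A stricter implementation would use a real pairwise alignment. In that case the CpG comparison would have to map positions through the gaps rather than compare raw offsets.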
In one embodiment, the lung nodule benign/malignant methylation markers comprise one or more, or all, of the sequences selected from the group consisting of Seq ID No: 1 to Seq ID No: 43. The positions of the sequences in the genome are, respectively: chr1:29586780, chr1:44015730.
Preferably, the lung nodule benign/malignant methylation markers comprise Seq ID No: 33 and one or more, or all, of the following sequences: Seq ID No: 1 to Seq ID No: 32, and Seq ID No: 34 to Seq ID No: 43.
More preferably, the lung nodule benign/malignant methylation markers comprise the sequences shown in Seq ID No: 1 to Seq ID No: 43, or the complete complements thereof.
In one embodiment, the lung nodule benign/malignant methylation markers comprise any of the following:
- a fragment comprising a sequence shown in Seq ID No. 1 to Seq ID No. 43 and 1 kb of sequence upstream and downstream thereof;
- a DNA methylation site in the region of the sequence shown;
- a variant having at least 90% sequence identity to the sequence shown and the same methylation sites as that sequence; or
- the DNA methylation haplotypes covered by the region of the sequence shown, and the abundance of those haplotypes.
Preferably, the variant of (a4) has at least 95%, 98%, 99% or 99.9% sequence identity to the sequence of (a1) or (a2) and the same methylation sites as (a1) or (a2).
In the present invention, patients with benign lung nodules are distinguished from patients with malignant lung nodules on the basis of the methylation state/level of the CpG island-containing regions, fragments or methylation sites described herein. The methylation level of the marker can be obtained by a variety of detection methods known in the art, including, but not limited to, next-generation sequencing.
The genomic positions of the sequences described in the present invention are numbered according to the UCSC hg19 reference genome (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz).
The methylation biomarker in the invention can be used as a lung cancer related methylation molecular marker alone or in combination for detecting or assisting in detecting or identifying benign and malignant lung nodules and/or lung cancer.
In another aspect, the present invention also provides a reagent for detecting the methylation level of the aforementioned methylation marker, for example a primer or a probe for that marker. The primer takes the nucleotide sequence in which the methylation marker is located as its target sequence and specifically amplifies that target sequence. The probe specifically captures the nucleotide sequence in which the methylation marker is located.
In another aspect, the present invention also provides a kit for detecting benign and malignant pulmonary nodules, the kit comprising a reagent for detecting the methylation marker of benign and malignant pulmonary nodules.
In another aspect, the invention also provides the use of a reagent for detecting the methylation level of the lung nodule benign/malignant methylation marker in the preparation of a diagnostic kit for identifying benign and malignant lung nodules in a sample.
The detection reagent may further include reagents used in any one, or any combination, of the following methods:
- PCR amplification;
- fluorescent quantitative PCR;
- digital PCR;
- liquid-phase chip;
- second-generation sequencing;
- third-generation sequencing;
- bisulfite sequencing;
- whole-genome methylation sequencing;
- methylation chip.

For example, the detection reagent may be selected from the following:
- bisulfite and its derivatives;
- PCR buffer, polymerase and dNTPs;
- primers and probes;
- methylation-sensitive or methylation-insensitive restriction endonucleases and enzyme digestion buffer;
- fluorescent dyes, fluorescence quenchers and fluorescence reporters;
- exonucleases and alkaline phosphatase;
- internal standards, control materials, etc.
In one embodiment, the biomarkers are obtained from liquid samples, which are mammalian blood and liquid samples of other biological origin, such as peripheral blood, serum, plasma, ascites, urine, cerebrospinal fluid, sputum, saliva, and the like. The mammals include rats, mice and humans; preferably a human.
In one embodiment, the sample is a fine-needle biopsy or plasma and comprises genomic DNA or cfDNA. Preferably, the sample is plasma cfDNA. cfDNA refers to circulating free DNA or cell-free DNA, i.e., degraded DNA fragments released into the plasma.
In one embodiment, the invention provides markers, screened on the basis of cfDNA methylation, for the diagnosis of benign and malignant lung nodules.
In the present invention, "benign" and "malignant" refer to the nature of the lung nodule. Lung nodules include solid, part-solid and ground-glass nodules, preferably part-solid or ground-glass nodules. A "malignant" lung nodule generally refers to a nodule with a cancerous lesion.
On the other hand, the invention also provides a construction method of the lung nodule benign and malignant prediction evaluation model, which comprises the following steps:
(a) Collecting benign samples and malignant samples of lung nodules, and dividing the benign samples and the malignant samples into a training set and a testing set;
(b) Extracting cfDNA of a sample, and performing library building and sequencing;
(c) Performing methylation conversion and sequence alignment of the sequencing data, and calculating the Probability of Discordant Reads (PDR) value of the sample;
(d) Constructing a feature matrix and an algorithm model from the sample data, and screening benign and malignant pulmonary nodule methylation markers based on the training set sample data;
(e) Verifying the effect of the model by using the sample data of the test set;
(f) Methylation markers are determined for eventual predictive assessment of benign and malignant lung nodules.
In the invention, 80 stable methylation markers were screened by comparing reduced representation bisulfite sequencing (RRBS) data of benign and malignant nodule tissue samples. These were combined with the previously reported PanSeer panel (Chen et al., 2020) to form a final panel of 760 methylation markers. Methylation sequencing data of plasma cfDNA were obtained with the Methyl-Titan method (patent number: CN 201910515830), and the methylation levels of the 760 markers were determined. Key methylation marker combinations were then screened to assist in the diagnosis of benign and malignant pulmonary nodules.
In one embodiment, the test sets include a validation set, a single-center test set, and a multi-center test set.
In step (d), a grid search on the training set is used to find the optimal combination of classifier and parameters. Logistic regression, support vector machine (SVM) and random forest models are first pre-selected as candidate classifiers. A grid search with 3-fold cross-validation on all features is then used to choose among them, and it selects logistic regression as the optimal classifier. Next, a grid search is run over combinations of modeling parameters and thresholds. The logistic regression model identifies the features with the best classification ability, and 43 features are finally selected for modeling. These procedures were carried out using Python (version 3.6.13) and scikit-learn (version 0.24.2).
Further, the algorithmic model comprises a machine learning model; the machine learning model includes any one of a principal component analysis model, a logistic regression analysis model, a nearest neighbor analysis model, a support vector machine, and a neural network model.
Preferably, the machine learning model is a logistic regression analysis model.
In another aspect, the present invention further provides a risk assessment model for benign and malignant pulmonary nodules, the risk assessment model being obtained according to the above construction method, and the risk assessment model includes:
the data acquisition module is at least used for acquiring a sample data set;
a sequencing module at least for obtaining sequencing data;
a data alignment module, at least for aligning the sequencing data to a reference sequence and determining the methylation results of the markers in the sequencing data based on the alignment results;
and a result determination module, at least for calculating a prediction score threshold through statistical model analysis and determining whether the lung nodule in the sample to be tested is benign or malignant.
The risk assessment model may further include a methylation processing module at least for performing methylation processing to obtain methylated data.
Regression analysis is performed on the methylation levels of the screened markers against the benign/malignant status of the nodules to construct a regression model. This yields a risk assessment model or diagnostic model for benign and malignant pulmonary nodules. The methylation levels of the 43 methylation markers of the present invention were used to construct a logistic regression machine learning model for identifying benign and malignant lung nodules.
In another aspect, the present invention also provides an information data processing terminal, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the following steps:
(a) Obtaining the methylation level of at least one region of the sequences Seq ID No. 1 to Seq ID No. 43 in a sample to be tested; (b) calculating a score using a constructed logistic regression diagnostic model; and (c) identifying benign and malignant lung nodules according to the score.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, performs the steps of:
(a) Obtaining the methylation level of at least one region of the sequences Seq ID No: 1 to Seq ID No: 43 in a sample to be tested; (b) calculating a score using a constructed logistic regression diagnostic model; and (c) identifying benign and malignant lung nodules according to the score.
A method for detecting benign and malignant lung nodules for non-disease diagnostic purposes, comprising the steps of:
s1: detecting the methylation level of at least one region of the sequences Seq ID No. 1 to Seq ID No. 43 in the sample;
s2: calculating to obtain a score according to the lung nodule benign and malignant risk assessment model;
s3: identifying whether the lung nodule is benign or malignant according to the score.
Specifically, when the methylation level of the target sequence in the test sample satisfies a certain threshold, a malignant nodule is identified. For example, for the sample to be detected, when the score is greater than the threshold value, the result is determined to be positive, i.e., malignant nodules, otherwise, the result is determined to be negative, i.e., benign nodules.
The invention has the following beneficial effects:
1. the invention provides 43 methylation markers with very high relevance to benign and malignant lung nodules based on plasma cfDNA high-throughput methylation sequencing, the markers can effectively identify the benign and malignant lung nodules and have high sensitivity and specificity, and a benign and malignant lung nodule/lung cancer risk prediction and evaluation model can be established by utilizing the markers and is used for identifying the benign and malignant lung nodules;
2. The application also provides a risk prediction and assessment model (diagnostic model), together with a specific method for constructing it. The model is built from the relationship between the methylation levels of the markers and the benign/malignant status of pulmonary nodules. Its prediction score distinguishes benign from malignant pulmonary nodules, and it offers noninvasive, safe and convenient detection, high throughput and high detection accuracy;
3. Using the methylation markers and their risk prediction and assessment model to identify benign and malignant lung nodules achieves good detection performance while effectively controlling detection cost.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 shows the distribution of prediction scores of the AllModel diagnostic model in the training set and validation set;
FIG. 2 shows the ROC curves of the AllModel diagnostic model on the training set and validation set;
FIG. 3 shows the distribution of prediction scores of the AllModel diagnostic model in the single-center and multi-center test sets;
FIG. 4 shows the ROC curves of the AllModel diagnostic model on the single-center and multi-center test sets.
Detailed Description
The present disclosure will now be described in detail by way of example with reference to the accompanying drawings, in order to more clearly illustrate the general concepts of the present disclosure. In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer.
In the following examples, reagents or instruments for which no manufacturer is indicated are conventional commercially available products, unless otherwise specified.
In the present application, 80 stable methylation markers were selected by comparing reduced representation bisulfite sequencing (RRBS) data of benign and malignant nodule tissue samples. These were combined with the previously reported PanSeer panel (Chen et al., 2020) to form a final panel of 760 methylation markers.
Blood samples were collected from patients with lung nodules.
Inclusion criteria: adult patients aged 40-80 years whose lung nodules (especially nodules of 5-30 mm) were detected by LDCT screening.
Exclusion criteria: lung nodules accompanied by hilar or mediastinal lymph node enlargement; suspected lung cancer or other malignant intrapulmonary metastases; and other conditions that the investigators considered inappropriate for this study.
The samples were divided into a training set (Training), a validation set (Validation), a single-center test set (Single-center Test) and a multi-center test set (Multi-center Test). Detailed sample information is given in Table 1:
TABLE 1 sample information statistics
Methylation sequencing data of cfDNA of sample plasma are obtained by a method of Methyl-Titan (patent number: CN 201910515830), and methylation markers in the sample plasma are screened out. The specific screening process is as follows:
1. Extraction of plasma cfDNA samples:
Whole blood (10 mL) was collected from each patient in Streck blood collection tubes (Streck, 218962). The samples were transported at ambient temperature within 12 hours of collection, and plasma was separated promptly (within 3 days). To separate the plasma, each blood sample was first centrifuged at 1600 g for 10 min at 4 ℃. The supernatant was then collected and centrifuged again at 16000 g for 10 min at 4 ℃. cfDNA was then extracted with the HiPure Circulating DNA Midi Kit C (Magen).
2. Sequencing and data preprocessing:
a) The libraries were sequenced in paired-end 150 bp mode on an Illumina NextSeq 500 sequencer, with a sequencing throughput of no less than 5M.
b) PEAR (v0.6.0) was used to merge the two paired-end reads of each fragment into a single sequence. The reads came from paired-end 150 bp sequencing on an Illumina HiSeq X10/NextSeq 500/NovaSeq sequencer. The minimum overlap length was 20 bp, and the minimum merged length was 30 bp.
c) Adapters were trimmed from the merged sequencing data with Trim Galore v0.6.0 and cutadapt v1.8.1. The adapter sequence "AGATCGGAAGCAC" was removed from the 5' end of the sequence, and bases with a sequencing quality value below 20 were removed from both ends.
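The read-level operations in b) and c) can be sketched in Python as follows. This is a simplified stand-in, not PEAR's or cutadapt's actual algorithm. PEAR scores overlaps statistically, and cutadapt uses a partial-sum quality-trimming rule. This hypothetical sketch instead merges only on the longest exact overlap and clips low-quality bases one at a time from each end. Phred+33 quality encoding is assumed, and adapter removal is not shown.

```python
from typing import Optional, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def merge_pair(r1: str, r2: str, min_overlap: int = 20, min_length: int = 30) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 at their longest exact suffix/prefix overlap.

    Returns None when no overlap of at least `min_overlap` bp exists or the merged
    sequence is shorter than `min_length` bp. Fragments shorter than a read
    (read-through into the adapter) are not handled in this sketch.
    """
    r1 = r1.upper()
    r2rc = reverse_complement(r2)
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        if r1[-olen:] == r2rc[:olen]:
            merged = r1 + r2rc[olen:]
            return merged if len(merged) >= min_length else None
    return None


def quality_trim(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Clip bases with Phred quality below `min_q` from both ends (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]
```

Searching from the longest possible overlap downwards prefers the most conservative merge. The 20 bp and 30 bp defaults mirror the values stated in b).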
3. Sequencing data alignment
The reference genome used herein was obtained from the UCSC database (UCSC hg19, http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz).
a) Bismark was first used to create cytosine-to-thymine (CT) and guanine-to-adenine (GA) converted versions of hg19, and each converted genome was indexed separately with Bowtie2.
b) The preprocessed reads were likewise subjected to CT and GA conversion.
c) Bowtie2 was used to align the converted reads to the respective converted hg19 reference genomes, with a minimum seed length of 20 and no mismatches allowed in the seed.
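A short Python sketch shows why the in silico conversions allow bisulfite reads to align. The helper names are hypothetical, and the sketch assumes top-strand reads with an ungapped alignment at a known offset. After bisulfite treatment, unmethylated cytosines read as T while methylated CpG cytosines stay C. C-to-T-converted reads therefore match the C-to-T-converted genome, and the original read base at each reference CpG gives the methylation call. The GA conversion, used for the opposite strand, is defined but not exercised here.

```python
from typing import Dict


def c_to_t(seq: str) -> str:
    """In silico C->T conversion (the 'CT' genome / reads)."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """In silico G->A conversion (the 'GA' genome / reads, for the opposite strand)."""
    return seq.upper().replace("G", "A")


def call_cpg_states(ref: str, read: str, offset: int = 0) -> Dict[int, bool]:
    """Methylation calls for a top-strand read aligned without gaps at ref[offset:].

    At each reference CpG cytosine, read C -> methylated (True), read T ->
    unmethylated (False); other bases are ignored.
    """
    ref, read = ref.upper(), read.upper()
    calls: Dict[int, bool] = {}
    for i, base in enumerate(read):
        pos = offset + i
        if ref[pos:pos + 2] == "CG" and base in ("C", "T"):
            calls[pos] = base == "C"
    return calls


# Hypothetical reference with two CpGs; the read keeps the first C (methylated)
# and shows T at the second (unmethylated, converted by bisulfite).
ref = "ACGTTCGA"
read = "ACGTTTGA"
assert c_to_t(read) == c_to_t(ref)  # the converted read matches the converted reference
print(call_cpg_states(ref, read))  # {1: True, 5: False}
```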
4. Calculation of PDR for each sample
The Probability of Discordant Reads (PDR) describes how consistent methylation is within a target region. A read is classified as concordant if all CpG sites on it are methylated or all are unmethylated; otherwise it is discordant. At each CpG site, the PDR equals the number of discordant reads covering the site divided by the total number of reads covering the site. For a target methylation interval l, it is calculated as:
$$\mathrm{PDR}_l = \frac{N_{l,d}}{N_l}$$
where $l$ denotes the target methylation interval, $N_{l,d}$ denotes the number of reads with discordant methylation in the target interval, and $N_l$ denotes the total number of reads located in the target methylation interval.
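The PDR formula can be computed per interval as shown below. This Python sketch is illustrative rather than the authors' code. Each read is represented only by its list of CpG calls within the interval (True = methylated). Reads without any CpG call in the interval are skipped, which is an assumption the text does not specify. Under the definition above, a read with a single CpG call is always concordant.

```python
import math
from typing import Iterable, Sequence


def is_discordant(read_calls: Sequence[bool]) -> bool:
    """A read is discordant when its CpG calls are not all methylated or all unmethylated."""
    return len(set(read_calls)) > 1


def pdr(reads: Iterable[Sequence[bool]]) -> float:
    """PDR_l = N_{l,d} / N_l for one target interval l.

    Each element of `reads` holds one read's CpG calls inside the interval
    (True = methylated). Reads with no CpG call are skipped (an assumption);
    returns NaN when no read covers the interval.
    """
    n_total = 0  # N_l
    n_discordant = 0  # N_{l,d}
    for calls in reads:
        if not calls:
            continue
        n_total += 1
        if is_discordant(calls):
            n_discordant += 1
    return n_discordant / n_total if n_total else math.nan


# Hypothetical interval covered by four reads, two of them discordant.
reads = [[True, True, True], [False, False], [True, False, True], [False, True]]
print(pdr(reads))  # 0.5
```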
5. Feature matrix construction
a) The PDR values of each target methylation interval for every sample were assembled into a training set feature matrix and a test set feature matrix. Target methylation intervals covered by fewer than 100 reads were treated as missing values.
b) Target methylation intervals with a missing-value ratio higher than 10% were removed.
c) A KNN imputer was fitted on the training set matrix and then used to fill in missing values in both the training set and test set feature matrices.
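Steps a) to c) map naturally onto numpy and scikit-learn's KNNImputer. The sketch below rests on several assumptions. The function name is hypothetical, and the missing-value ratio is computed on the training matrix only. n_neighbors=5 is scikit-learn's default rather than a value stated in the text. Matrices are samples × intervals, with a parallel matrix of read depths.

```python
import numpy as np
from sklearn.impute import KNNImputer


def build_feature_matrices(train_pdr, train_depth, test_pdr, test_depth,
                           min_reads=100, max_missing=0.10, n_neighbors=5):
    """Return imputed train/test PDR matrices (samples x intervals) and the kept-column mask."""
    train = np.asarray(train_pdr, dtype=float)
    test = np.asarray(test_pdr, dtype=float)
    # a) intervals covered by fewer than `min_reads` reads become missing values
    train = np.where(np.asarray(train_depth) < min_reads, np.nan, train)
    test = np.where(np.asarray(test_depth) < min_reads, np.nan, test)
    # b) drop intervals whose missing ratio (measured on the training set) exceeds the limit
    keep = np.isnan(train).mean(axis=0) <= max_missing
    train, test = train[:, keep], test[:, keep]
    # c) fit the KNN imputer on the training matrix only, then apply it to both matrices
    imputer = KNNImputer(n_neighbors=n_neighbors).fit(train)
    return imputer.transform(train), imputer.transform(test), keep
```

Fitting the imputer only on training data keeps test-set information out of the imputation, consistent with c).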
6. Screening benign and malignant pulmonary nodule methylation markers on the training set samples, and modeling
A grid search on the training set was used to find the optimal combination of classifier and parameters. Logistic regression, support vector machine (SVM) and random forest models were first pre-selected as candidate classifiers. A grid search with 3-fold cross-validation on all features was then used to choose among them, and it selected logistic regression as the optimal classifier. Next, a grid search was run over combinations of modeling parameters and thresholds. The logistic regression model identified the features with the best classification ability, and 43 features were finally selected for modeling. The 43 methylation feature regions and their adjacent genes are shown in Table 2. These procedures were carried out using Python (version 3.6.13) and scikit-learn (version 0.24.2).
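The classifier-selection stage can be expressed with scikit-learn's GridSearchCV. The sketch uses a pipeline whose final step is swapped among the three candidate models. The hyperparameter grids, the roc_auc scoring metric and the function name are assumptions for illustration, since the text does not state them. The subsequent feature-selection search that yields the 43 features is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def select_classifier(X, y, cv=3, scoring="roc_auc"):
    """Grid-search three candidate classifiers with k-fold CV; returns the fitted search."""
    pipe = Pipeline([("clf", LogisticRegression())])  # placeholder step, replaced by the grid
    param_grid = [  # hypothetical grids; the patent does not list its search space
        {"clf": [LogisticRegression(penalty="l2", max_iter=1000)],
         "clf__C": [0.1, 1, 10]},
        {"clf": [SVC()],
         "clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
        {"clf": [RandomForestClassifier(random_state=99)],
         "clf__n_estimators": [100, 300]},
    ]
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring=scoring)
    search.fit(X, y)
    return search


# Synthetic stand-in for a samples x features PDR matrix (not study data).
X, y = make_classification(n_samples=60, n_features=10, random_state=0)
search = select_classifier(X, y)
print(type(search.best_estimator_.named_steps["clf"]).__name__, round(search.best_score_, 3))
```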
TABLE 2 genomic locations of 43 methylation markers and associated genes
7. Modeling effect assessment
Model performance was evaluated on the training set (Training), validation set (Validation), single-center test set (Single-center Test) and multi-center test set (Multi-center Test).
a) Evaluation on the training and validation sets:
The methylation levels of the 43 methylation markers were used to build a logistic regression machine learning model for identifying benign and malignant lung nodules. The model was then validated on the other data sets, as follows:
The logistic regression model from the Python sklearn package was used: AllModel = LogisticRegression(C=10, penalty='l2', random_state=99).
The model was trained on the training set samples: AllModel.fit(TrainData, TrainPhono), where TrainData is the training set data and TrainPhono is the nodule status of the training samples (malignant nodule = 1, benign nodule = 0). The model's decision threshold was determined from the training set samples.
The model was tested on the validation set, single-center test set and multi-center test set samples: TestPred = AllModel.predict_proba(TestData)[:, 1], where TestData is the validation or test set data and TestPred is the model prediction score. Each sample was classified as a malignant nodule or not by comparing its prediction score with the threshold.
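The training and scoring calls above can be assembled into a runnable Python sketch. The LogisticRegression arguments and the 0.63 cut-off come from the text. The helper names, the string labels and the toy data are illustrative choices. The rule that a score strictly above the threshold is called malignant follows the description of the threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

THRESHOLD = 0.63  # prediction-score cut-off reported for the model


def train_all_model(train_data, train_pheno):
    """Fit the logistic regression with the hyperparameters given in the text."""
    model = LogisticRegression(C=10, penalty="l2", random_state=99)
    model.fit(train_data, train_pheno)  # train_pheno: 1 = malignant, 0 = benign
    return model


def predict_nodules(model, test_data, threshold=THRESHOLD):
    """Return prediction scores and benign/malignant calls for the test samples."""
    scores = model.predict_proba(test_data)[:, 1]  # column 1 = probability of class 1 (malignant)
    calls = np.where(scores > threshold, "malignant", "benign")
    return scores, calls


# Toy one-feature example (hypothetical data, not the study's methylation levels).
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
all_model = train_all_model(X_train, y_train)
scores, calls = predict_nodules(all_model, np.array([[0.5], [12.5]]))
print(calls)
```

Because predict_proba orders its columns by model.classes_, which is [0, 1] for these labels, column 1 is the malignant-class probability.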
FIG. 1 shows the model prediction scores for the training and validation sets, and FIG. 3 shows them for the single-center and multi-center test sets. In both figures the scores of benign and malignant lung nodule samples differ significantly. The corresponding ROC curves are shown in FIG. 2 and FIG. 4. The AUC is 0.973 for the training set and 0.810 for the validation set, and 0.815 and 0.761 for the single-center and multi-center test sets, respectively. A prediction score threshold of 0.63 was set: samples scoring above it are predicted to be malignant lung nodules, and all others benign lung nodules. At this threshold, the specificities of the single-center and multi-center test sets were 0.857 and 0.846, and their sensitivities were 0.600 and 0.391, respectively (see Table 3).
TABLE 3 Performance of the machine learning diagnostic model
Data set                 AUC     Threshold   Accuracy   Specificity   Sensitivity
Training set             0.973   0.63        0.918      0.949         0.903
Validation set           0.810   0.63        0.710      0.789         0.674
Single-center test set   0.815   0.63        0.689      0.857         0.600
Multi-center test set    0.761   0.63        0.516      0.846         0.391
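Given the true labels and prediction scores for one data set, the columns of Table 3 can be recomputed as sketched below. roc_auc_score is scikit-learn's function, and the helper name is hypothetical. As in the training labels, the positive class (1) is taken to be a malignant nodule, and a score strictly above the threshold is counted as a malignant call.

```python
import math
from typing import Dict, Sequence

from sklearn.metrics import roc_auc_score


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.63) -> Dict[str, float]:
    """AUC plus accuracy, sensitivity and specificity at a fixed score threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_malignant = score > threshold
        if label == 1 and predicted_malignant:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_malignant:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return {
        "auc": float(roc_auc_score(y_true, scores)),  # threshold-independent
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,  # malignant recall
        "specificity": tn / (tn + fp) if tn + fp else math.nan,  # benign recall
    }


# Hypothetical four-sample example (not study data).
print(threshold_metrics([0, 0, 1, 1], [0.1, 0.7, 0.8, 0.5]))
```

Accuracy depends on the benign/malignant mix of each data set, whereas sensitivity and specificity do not. This is why accuracy can vary across the rows of Table 3 while specificity stays near 0.85.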
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. Methylation markers for detecting benign and malignant lung nodules, wherein the methylation markers comprise the following markers:
(a1) a fragment of at least one of the sequences Seq ID No. 1 to Seq ID No. 43, or the complete complement thereof;
(a2) a fragment comprising the sequence of (a1) and 1 kb of sequence upstream and downstream thereof;
(a3) a DNA methylation site in the region of the sequence set forth in (a1) or (a2);
(a4) a variant having at least 90% sequence identity to the sequence of (a1) or (a2) and having the same methylation sites as (a1) or (a2); or
(a5) a DNA methylation haplotype covered by the region of the sequence set forth in (a1) or (a2), and the abundance of the DNA methylation haplotype.
2. A primer or probe for detecting the methylation marker according to claim 1, wherein the primer takes the nucleotide sequence in which the methylation marker is located as its target sequence, for specific amplification of the target sequence, and the probe specifically captures the nucleotide sequence in which the methylation marker is located.
3. A kit for detecting benign/malignant pulmonary nodules, comprising a reagent for detecting the methylation marker of benign/malignant pulmonary nodules according to claim 1.
4. Use of a reagent for detecting the methylation level of the lung nodule benign/malignant methylation marker according to claim 1 in the preparation of a diagnostic kit for distinguishing benign from malignant lung nodules in a sample.
5. The use according to claim 4, wherein the detection reagent further comprises a reagent used in any one or a combination of the following: PCR amplification, fluorescent quantitative PCR, digital PCR, liquid-phase chip methods, second-generation sequencing, third-generation sequencing, bisulfite sequencing, whole-genome methylation sequencing, and methylation chip methods.
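Several of the listed techniques, and the methylation-specific primers or probes of claim 2, depend on bisulfite treatment turning methylation status into a sequence difference: unmethylated cytosines read as thymine after conversion and amplification, while methylated cytosines remain cytosine. The following is a minimal top-strand sketch. It assumes complete conversion, methylation only at CpG sites, and a target that is either fully methylated or fully unmethylated. `bisulfite_convert` is a hypothetical helper, not part of the patent.

```python
def bisulfite_convert(seq: str, cpg_methylated: bool) -> str:
    """In silico bisulfite conversion of the top strand.

    Every C becomes T, except a C followed by G when `cpg_methylated` is True
    (methylated CpG cytosines are protected from conversion).
    """
    s = seq.upper()
    converted = []
    for i, base in enumerate(s):
        if base != "C":
            converted.append(base)
            continue
        in_cpg = i + 1 < len(s) and s[i + 1] == "G"
        converted.append("C" if in_cpg and cpg_methylated else "T")
    return "".join(converted)


target = "ACGTCCGA"  # hypothetical target, not one of Seq ID No. 1 to 43
methylated = bisulfite_convert(target, cpg_methylated=True)
unmethylated = bisulfite_convert(target, cpg_methylated=False)
# Positions where the two converted templates differ are the ones a
# methylation-specific primer or probe could discriminate on.
diff = [i for i, (a, b) in enumerate(zip(methylated, unmethylated)) if a != b]
print(methylated, unmethylated, diff)
```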
6. A method for constructing a prediction and evaluation model for benign and malignant lung nodules, characterized by comprising the following steps:
(a) Collecting benign and malignant pulmonary nodule samples and dividing them into a training set and a test set;
(b) Extracting cell-free DNA (cfDNA) from the samples, and performing library construction and sequencing;
(c) Performing methylation conversion on the sequences, aligning the sequencing data, and calculating the Probability of Discordant Reads (PDR) value of each sample;
(d) Constructing a feature matrix and an algorithmic model from the sample data, and screening out benign/malignant pulmonary nodule methylation markers using the training-set data;
(e) Verifying the performance of the model using the test-set data;
(f) Determining the methylation markers used for the final prediction and evaluation of benign and malignant lung nodules.
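Step (c) does not spell out how PDR is computed. A common definition counts a read as discordant when it carries both methylated and unmethylated CpG calls, and takes PDR as the discordant fraction among reads with enough CpGs. The sketch below makes several assumptions. Reads are already converted, aligned and reduced to per-CpG call strings ('1'/'0'). A read needs at least 4 CpG calls to count, a hypothetical cut-off. The samples-by-regions feature matrix of step (d) holds NaN where a region has no qualifying reads. It is an illustration, not the authors' pipeline.

```python
from typing import Dict, Iterable, List, Optional, Tuple

MIN_CPGS = 4  # assumed minimum number of CpG calls for a read to count


def is_discordant(calls: str) -> bool:
    """A read is discordant if it has both methylated ('1') and unmethylated ('0') CpGs."""
    return "1" in calls and "0" in calls


def region_pdr(reads: Iterable[str], min_cpgs: int = MIN_CPGS) -> Optional[float]:
    """Fraction of qualifying reads in a region that are discordant; None if no read qualifies."""
    eligible = [r for r in reads if len(r) >= min_cpgs]
    if not eligible:
        return None
    return sum(is_discordant(r) for r in eligible) / len(eligible)


def pdr_feature_matrix(
    samples: Dict[str, Dict[str, List[str]]], regions: List[str], min_cpgs: int = MIN_CPGS
) -> Tuple[List[str], List[List[float]]]:
    """Rows = samples (sorted by ID), columns = marker regions; NaN marks missing data."""
    sample_ids = sorted(samples)
    matrix = []
    for sid in sample_ids:
        row = []
        for region in regions:
            value = region_pdr(samples[sid].get(region, []), min_cpgs)
            row.append(float("nan") if value is None else value)
        matrix.append(row)
    return sample_ids, matrix


# Toy data: per-sample, per-region read call strings (hypothetical)
samples = {
    "s2": {"r1": ["1111", "0101"]},
    "s1": {"r1": ["0000"], "r2": ["1010", "1111", "10"]},
}
ids, X = pdr_feature_matrix(samples, ["r1", "r2"])
print(ids, X)
```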
7. The construction method according to claim 6, wherein the algorithmic model comprises a machine learning model, and the machine learning model is any one of a principal component analysis model, a logistic regression model, a nearest-neighbor model, a support vector machine, and a neural network model.
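Claim 7 leaves the choice of model open. The sketch below illustrates steps (a) and (d) of claim 6 with the logistic regression option: a stratified train/test split, then a logistic regression fitted by batch gradient descent on the mean log-loss, using only the standard library. The learning rate, epoch count, L2 penalty, split fraction and toy feature values are assumed for illustration and are not parameters from the patent.

```python
import math
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Step (a): split sample indices into training/test sets, keeping class balance."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * test_fraction)
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000,
                              l2: float = 0.0) -> Tuple[List[float], float]:
    """Step (d): batch gradient descent on mean log-loss (optional L2 on weights)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy usage: one PDR-like feature per sample (hypothetical values)
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
train_idx, test_idx = stratified_split(y, test_fraction=1 / 3, seed=42)
w, b = train_logistic_regression([X[i] for i in train_idx], [y[i] for i in train_idx])
print(train_idx, test_idx, w, b)
```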
8. A risk assessment model for benign and malignant pulmonary nodules, obtained by the construction method according to claim 6 or 7, comprising: a data acquisition module, at least for acquiring a sample data set; a sequencing module, at least for obtaining sequencing data; a data alignment module, at least for aligning the sequencing data with a reference sequence and determining the methylation status of the markers in the sequencing data based on the alignment result; and a result determination module, at least for calculating a prediction score threshold through statistical model analysis and determining whether the lung nodule in the sample to be tested is benign or malignant.
9. An information data processing terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
(a) Obtaining the methylation level of at least one region of the sequences Seq ID No. 1 to Seq ID No. 43 in a sample to be tested; (b) calculating a score using a constructed logistic regression diagnostic model; and (c) identifying the lung nodule as benign or malignant according to the score.
10. A computer-readable storage medium, having a computer program stored thereon, which, when executed by a processor, performs the steps of:
(a) Obtaining the methylation level of at least one region of the sequences Seq ID No. 1 to Seq ID No. 43 in the sample to be tested; (b) calculating a score using a constructed logistic regression diagnostic model; and (c) identifying the lung nodule as benign or malignant according to the score.
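Steps (b) and (c) of claims 9 and 10 amount to passing the region-level methylation values through a fitted logistic function and comparing the result with a cut-off. The sketch assumes the weights and bias come from an already-trained model; the values in the usage lines are hypothetical. It reuses the 0.63 threshold reported for Table 3 and calls scores strictly above it malignant. The function names are illustrative, not from the patent.

```python
import math
from typing import Sequence

THRESHOLD = 0.63  # cut-off reported for Table 3; scores above it are called malignant


def malignancy_score(levels: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Step (b): logistic-regression score sigmoid(bias + sum(w_i * x_i))."""
    if len(levels) != len(weights):
        raise ValueError("expected one methylation value per model weight")
    z = bias + sum(w * x for w, x in zip(weights, levels))
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(score: float, threshold: float = THRESHOLD) -> str:
    """Step (c): call the nodule malignant when the score exceeds the threshold."""
    return "malignant" if score > threshold else "benign"


# Hypothetical fitted model over two marker regions (weights are illustrative only)
weights, bias = [2.0, -1.0], 0.0
print(classify(malignancy_score([0.0, 0.0], weights, bias)))  # score 0.5 -> benign
print(classify(malignancy_score([1.0, 0.0], weights, bias)))  # score ~0.88 -> malignant
```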
Application CN202211065863.8A (priority date 2022-09-01, filing date 2022-09-01), published as CN115851930A; status: Pending. Title: Methylation marker for detecting benign and malignant lung nodules and application thereof

Publications (1)

Publication Number    Publication Date
CN115851930A          2023-03-28

Family

ID=85660733

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination