CN117344014A - Pancreatic cancer early diagnosis kit, method and device thereof - Google Patents


Info

Publication number
CN117344014A
Authority
CN
China
Prior art keywords
pancreatic cancer
sample
model
machine learning
learning model
Legal status
Granted
Application number
CN202310887408.4A
Other languages
Chinese (zh)
Other versions
CN117344014B (en)
Inventor
沈柏用
邹思奕
李凡露
石涵
杨峰
Current Assignee
Shanghai Ruijing Biotechnology Co ltd
Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Original Assignee
Shanghai Ruijing Biotechnology Co ltd
Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Application filed by Shanghai Ruijing Biotechnology Co ltd and Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority to CN202310887408.4A
Publication of CN117344014A
Application granted
Publication of CN117344014B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Biotechnology (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biochemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Microbiology (AREA)
  • Pathology (AREA)
  • Bioethics (AREA)
  • Hospice & Palliative Care (AREA)
  • Mathematical Physics (AREA)
  • Oncology (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)

Abstract

The invention relates to the field of molecular diagnosis, and in particular to a kit, a method and a device for the early diagnosis of pancreatic cancer. The kit comprises a probe composition for capturing differential methylation regions of ctDNA of pancreatic cancer-related genes, the genes being any one or more of the following: BNC1, TFPI2, FBXL7, DBX1, IGF1R, TAF, ADAMTS16, ZNF710, ANKRD60, UTF1, OR1F1 and GALNT17. The diagnostic method comprises the steps of: S1, acquiring sequencing data of ctDNA of a sample to be tested; and S2, using a machine learning model to output, from the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and classifying the sample as healthy or pancreatic cancer according to a probability cut-off value. The invention enables highly accurate, non-invasive early diagnosis of pancreatic cancer.

Description

Pancreatic cancer early diagnosis kit, method and device thereof
Technical Field
The invention relates to the field of molecular diagnosis, and in particular to a kit, a method and a device for the early diagnosis of pancreatic cancer.
Background
Pancreatic cancer is a highly lethal malignancy. It is often asymptomatic in its early stages and difficult to detect early, so most patients have already reached an advanced stage at the time of diagnosis and have limited treatment options. Pancreatic cancer also responds poorly to conventional treatments, and its five-year survival rate is below 10%. In addition, it is often characterized by invasive growth and early metastasis, which reduce the feasibility of surgical resection and other treatments. Clinical detection of pancreatic cancer is difficult for several reasons. First, its early symptoms are atypical and subtle, and patients often mistake them for common gastrointestinal disorders, which delays diagnosis. Second, the pancreas lies deep in the abdominal cavity, which makes it difficult to assess by physical examination and routine palpation. Third, standard imaging examinations such as CT and MRI have a relatively low detection rate for early-stage pancreatic cancer. Finally, the serum markers used for pancreatic cancer, such as CA 19-9 and CEA, have limited diagnostic value and can give false-positive or false-negative results. In summary, the danger of pancreatic cancer lies in its late diagnosis and its resistance to conventional treatments, and its clinical detection is hampered by atypical early symptoms, the limitations of existing examination methods and the inaccuracy of available markers. More accurate and reliable early detection methods and treatment strategies are therefore needed to improve the prognosis and survival of pancreatic cancer patients.
DNA methylation is a biological process, and an important biomarker, that plays a key role in cell development, genomic imprinting and chromosomal stability. Abnormal methylation levels in specific genomic regions can indicate the occurrence and progression of various cancers, including pancreatic cancer. Circulating tumor DNA (ctDNA) is fragmented DNA that is released from tumor cells and present in the blood. The methylation state of ctDNA in blood samples can provide a "molecular fingerprint" for different types of cancer. Numerous studies have shown that detecting specific ctDNA methylation states in blood samples is an effective, non-invasive method for early cancer detection.
DNA methylation detection based on next-generation sequencing has developed rapidly in recent years and can greatly advance early cancer screening and diagnosis. In this technique, the DNA is first treated to convert unmethylated cytosine to uracil while leaving methylated cytosine unchanged, and is then sequenced. This allows the methylation status of each cytosine in the genome to be determined, giving a comprehensive view of the methylation landscape. DNA methylation sequencing is particularly valuable in cancer detection, because cancer cells often exhibit abnormal DNA methylation patterns that can serve as distinctive disease markers. By sequencing the ctDNA released into the blood, these methylation patterns can be captured non-invasively and with high sensitivity. Furthermore, DNA methylation sequencing can not only detect the presence of cancer but also provide information about its characteristics: different types of cancer, and even different stages of the same cancer, may have different methylation patterns. Analyzing these patterns can help determine the type and stage of a cancer and guide the choice of treatment.
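The conversion chemistry described above can be illustrated with a toy sketch. It assumes perfect conversion, forward-strand reads already aligned to the reference without gaps, and that PCR copies uracil as thymine, so a converted cytosine is read as T. The function names are illustrative only and are not part of any real analysis pipeline.

```python
"""Toy model of bisulfite conversion and per-CpG methylation calling.

Assumptions (not from the source): perfect conversion, gap-free forward-strand
alignment, and uracil read as thymine after PCR.
"""


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Return the read produced from `seq`: every C becomes T unless methylated."""
    return "".join(
        base if base != "C" or i in methylated_positions else "T"
        for i, base in enumerate(seq)
    )


def call_cpg_methylation(reference: str, reads: list[str]) -> dict[int, float]:
    """Methylation rate at each reference CpG cytosine.

    At a CpG position, a C in the read counts as methylated and a T as unmethylated.
    """
    rates = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        meth = sum(1 for r in reads if r[i] == "C")
        unmeth = sum(1 for r in reads if r[i] == "T")
        if meth + unmeth:
            rates[i] = meth / (meth + unmeth)
    return rates


if __name__ == "__main__":
    ref = "ACGTTCGACCGA"  # CpG cytosines at positions 1, 5 and 9
    # One molecule methylated only at position 1, one molecule fully unmethylated.
    reads = [bisulfite_convert(ref, {1}), bisulfite_convert(ref, set())]
    print(call_cpg_methylation(ref, reads))  # {1: 0.5, 5: 0.0, 9: 0.0}
```

Non-CpG cytosines are always converted in this model, which is why only CpG positions are reported.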
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a kit, a method and a device for the early diagnosis of pancreatic cancer that solve the problems of the prior art.
To achieve the above and other related objects, the present invention provides a probe composition for capturing a differential methylation region of ctDNA of a pancreatic cancer-related gene selected from any one or more of the following: BNC1, TFPI2, FBXL7, DBX1, IGF1R, TAF, ADAMTS16, ZNF710, ANKRD60, UTF1, OR1F1, GALNT17.
The differential methylation regions include regions between corresponding sites on any one or more of the following (coordinates based on the GRCh38 human reference genome):
The invention also provides the use of the probe composition in preparing a pancreatic cancer diagnosis kit.
In the present invention, the pancreatic cancer is Pancreatic Acinar Cell Carcinoma (PACC) or Pancreatic Ductal Adenocarcinoma (PDAC). The pancreatic cancer diagnosis kit is an early diagnosis kit, i.e., a kit for diagnosing stage I or stage II pancreatic cancer.
The invention also provides a pancreatic cancer diagnosis kit, wherein the kit comprises the probe composition.
The invention also provides a pancreatic cancer detection method, which comprises the following steps:
S1, acquiring sequencing data of ctDNA of a sample to be tested;
S2, using a machine learning model to output, from the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and classifying the sample as healthy or pancreatic cancer according to a probability cut-off value; the machine learning model is constructed as follows:
S21, obtaining sequencing data of a pancreatic cancer group and a healthy group, and dividing the data into a training-validation data set and a test data set;
S22, training and validating the machine learning model with the training-validation data set, and evaluating the resulting model;
S23, testing and adjusting the machine learning model obtained in step S22 with the test data set until the model's predictions agree with the actual diagnoses, to obtain the optimal model.
The invention also provides a pancreatic cancer detection device, which comprises the following modules:
1) Test data acquisition module: acquires sequencing data of ctDNA of the sample to be tested;
2) Detection module: uses the machine learning model to output, from the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and classifies the sample as healthy or pancreatic cancer according to the probability cut-off value.
In some embodiments of the invention, the machine learning model includes the following sub-modules:
1) Data set acquisition sub-module: obtains sequencing data of pancreatic cancer patients and healthy individuals, and divides the data into a training-validation data set and a test data set;
2) Model construction sub-module: trains and validates the machine learning model with the training-validation data set, and evaluates the resulting model;
3) Model optimization sub-module: tests and adjusts the machine learning model obtained by the model construction sub-module with the test data set until the model's predictions agree with the actual diagnoses, to obtain the optimal model.
The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method.
The invention also provides an electronic terminal comprising a processor and a memory; the memory stores a computer program, and the processor executes the computer program stored in the memory so that the terminal performs the method.
As described above, the kit, method and device for the early diagnosis of pancreatic cancer of the present invention have the following beneficial effects:
1. The invention can detect signals of tumor occurrence and development at a very early stage, in some cases before a tumor is evident by conventional histopathological methods. For highly malignant cancers such as pancreatic cancer, this can effectively improve prognosis and survival.
2. The invention is non-invasive; compared with methods such as endoscopic puncture, it is more convenient and safer, and patient compliance is higher.
3. Detection requires only trace amounts of ctDNA in plasma, and tumor signals of early pancreatic cancer can be detected from its methylation levels, giving the method very high detection sensitivity.
4. The methylation regions were obtained with a region-screening method specific to pancreatic cancer in the Chinese population, so detection accuracy for pancreatic cancer in Chinese patients is very high.
Drawings
FIG. 1 shows a basic flow chart of pancreatic cancer detection according to the present invention.
FIG. 2 shows the ability of the device of the present invention to detect pancreatic cancer.
FIG. 3 is a schematic diagram of the pancreatic cancer detection device of the present invention.
FIG. 4 is a schematic diagram of the electronic terminal of the present invention.
FIG. 5 shows the specificity of the device of the present invention for pancreatic cancer detection.
Detailed Description
The invention provides a pancreatic cancer circulating tumor DNA (ctDNA) detection model based on high-throughput sequencing. The model measures the methylation levels of methylation markers in a biological sample from a subject, i.e., the changes in ctDNA methylation state across a number of specific genomic regions, and from these determines whether the subject has pancreatic cancer.
The present invention provides a probe composition for capturing a differentially methylated region of ctDNA of a pancreatic cancer-related gene selected from any one or more of: BNC1, TFPI2, FBXL7, DBX1, IGF1R, TAF, ADAMTS16, ZNF710, ANKRD60, UTF1, OR1F1, GALNT17.
The differential methylation region is a genomic region covering differential methylation sites, i.e., the methylation sites most significantly associated with the cancer versus non-cancer classification.
The differentially methylated regions are located in the exons, introns, untranslated regions (UTRs) and/or promoter regions of the genes.
Differential methylation sites or differential methylation regions can be obtained by techniques known in the art. However, although such screening techniques are known per se, identifying a set of sites with high sensitivity and accuracy remains very difficult, because different samples or different screening methods can yield different differential methylation sites, and therefore different differential methylation regions for diagnosis or detection.
In certain embodiments of the invention, the differential methylation region is obtained by the following method:
Tissue samples from multiple clinically diagnosed pancreatic cancer patients and the corresponding normal samples are collected, and the methylation levels of CpG sites across the genome are measured to build a comprehensive methylation level matrix across the samples. Then, for each site, a logistic regression model is built on the methylation level of that site, with sample type as the response variable and sequencing depth as the weight. The models identify the methylation sites most significantly associated with the cancer versus non-cancer classification, i.e., the differential methylation sites. Adjacent differential methylation sites are then merged to obtain the most effective characteristic differential methylation regions.
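The per-site screening and merging steps can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: one logistic regression per CpG site with the methylation rate as the single predictor (cancer = 1), sequencing depth rescaled to mean 1 and used as observation weights, a Wald test for significance, and a hypothetical significance threshold `alpha` and merge distance `max_gap` (in bp). All sites are assumed to lie on one chromosome. A genome-wide screen would normally also apply multiple-testing correction, which is omitted here.

```python
import math

import numpy as np


def weighted_logit_pvalue(x, y, depth, n_iter=50):
    """Fit logit P(cancer) = b0 + b1 * x by weighted IRLS; return (b1, two-sided Wald p).

    Assumption: depth is rescaled to mean 1 so deeper samples weigh more without
    inflating the effective sample size.
    """
    w = depth / depth.mean()
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ ((w * p * (1.0 - p))[:, None] * X)  # weighted Fisher information
        score = X.T @ (w * (y - p))                        # weighted score vector
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ ((w * p * (1.0 - p))[:, None] * X)
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    z = beta[1] / se
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))


def merge_sites(positions, max_gap=100):
    """Merge site positions into (start, end) regions when neighbours are <= max_gap apart."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos
        else:
            regions.append([pos, pos])
    return [tuple(r) for r in regions]


def find_dmrs(meth, depth, labels, positions, alpha=1e-3, max_gap=100):
    """meth, depth: (n_samples, n_sites) arrays; labels: 1 = cancer, 0 = normal."""
    hits = []
    for j, pos in enumerate(positions):
        _, pval = weighted_logit_pvalue(meth[:, j], labels, depth[:, j])
        if pval < alpha:
            hits.append(pos)
    return merge_sites(hits, max_gap)
```

Fitting one small model per site keeps the screen easy to parallelize; the Wald test is used here only because it falls directly out of the IRLS fit.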
The differential methylation regions include regions between corresponding sites on any one or more of the following (coordinates based on the GRCh38 human reference genome):
In certain embodiments of the invention, the probe composition comprises a first composition for capturing sequences in the methylated state and a second composition for capturing sequences in the unmethylated state.
In certain embodiments of the invention, the first composition comprises a plurality of probes identical or complementary to the methylated-state sequences of the differential methylation regions, and/or the second composition comprises a plurality of probes identical or complementary to the unmethylated-state sequences of the differential methylation regions.
Each probe is 80-120 bp long. The probe composition covers the target regions, i.e., the chromosomal regions listed in the table above.
Because the nucleotide sequences of the probes are identical or complementary to the target regions, the probes bind to these regions and enable their capture. The target regions are thus enriched in the sequencing library for detailed analysis, and the resulting sequencing data provide detailed information on the methylation status of each genomic site in the target regions.
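The two probe compositions can be illustrated with a short sketch. It assumes the probes are designed against the bisulfite-converted top strand, that the methylated-state probes keep every CpG cytosine as C (a fully methylated region) while the unmethylated-state probes convert every C to T, and that probes are tiled with a hypothetical 60 bp step. The function names are illustrative; strand choice, partially methylated probe variants and repeat masking are not modelled.

```python
def converted(seq: str, methylated: bool) -> str:
    """Bisulfite-converted top strand.

    Non-CpG C always becomes T; a CpG C stays C only in the methylated state.
    """
    out = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (methylated and in_cpg):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def tile_probes(region: str, probe_len: int = 120, step: int = 60) -> list[tuple[int, str]]:
    """Return (offset, sequence) windows of length probe_len covering the whole region."""
    if not 80 <= probe_len <= 120:
        raise ValueError("probe length outside the 80-120 bp range")
    if len(region) <= probe_len:
        return [(0, region)]
    starts = list(range(0, len(region) - probe_len + 1, step))
    if starts[-1] + probe_len < len(region):
        starts.append(len(region) - probe_len)  # last probe ends flush with the region
    return [(s, region[s:s + probe_len]) for s in starts]


def design_probe_sets(region: str, probe_len: int = 120, step: int = 60):
    """First composition: methylated-state probes; second: unmethylated-state probes.

    The whole region is converted before tiling so CpGs spanning a window edge are kept.
    """
    first = tile_probes(converted(region, methylated=True), probe_len, step)
    second = tile_probes(converted(region, methylated=False), probe_len, step)
    return first, second
```

Overlapping windows (step smaller than probe length) give each base at least two chances of capture; the step value is a design choice, not something the text specifies.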
The invention also provides application of the probe composition in preparing a pancreatic cancer diagnosis kit.
In the present invention, the pancreatic cancer is Pancreatic Acinar Cell Carcinoma (PACC) or Pancreatic Ductal Adenocarcinoma (PDAC). The pancreatic cancer diagnosis kit is an early diagnosis kit, i.e., a kit for diagnosing stage I or stage II pancreatic cancer.
The invention also provides a pancreatic cancer diagnosis kit, wherein the kit comprises the probe composition.
The kit further comprises a ctDNA extraction reagent.
The ctDNA extraction reagent may be a commercial extraction reagent.
The kit further comprises a ctDNA pretreatment reagent, i.e., a reagent that converts unmethylated cytosine in DNA into uracil while leaving methylated cytosine unchanged.
ctDNA extracted from a patient's blood sample carries the same genetic and epigenetic changes as the tumor cells. After conversion with the pretreatment reagent, methylated and unmethylated DNA sequences in the ctDNA can be distinguished, and the pretreated ctDNA is used to prepare a sequencing library.
The ctDNA pretreatment reagent is a bisulfite, for example any one or more of sodium bisulfite, calcium bisulfite, potassium bisulfite or ammonium bisulfite. The working concentration of the bisulfite is 1-5 mol/L, for example 1-2 mol/L, 2-3 mol/L, 3-4 mol/L or 4-5 mol/L.
The kit further comprises library construction reagents, which are commercial reagents.
The library construction reagents include an end repair reagent, a sequencing adapter reagent, an extension reagent, a product purification reagent, a library purification reagent, a quality control reagent and the like.
The invention also provides a pancreatic cancer detection method, which comprises the following steps:
S1, acquiring sequencing data of ctDNA of a sample to be tested;
S2, using a machine learning model to output, from the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and classifying the sample as healthy or pancreatic cancer according to a probability cut-off value; the machine learning model is constructed as follows:
S21, obtaining sequencing data of a pancreatic cancer group and a healthy group, and dividing the data into a training-validation data set and a test data set;
S22, training and validating the machine learning model with the training-validation data set, and evaluating the resulting model;
S23, testing and adjusting the machine learning model obtained in step S22 with the test data set until the model reaches its maximum prediction accuracy, to obtain the optimal model.
In certain embodiments of the invention, the sequencing data is obtained by sequencing a sequencing library constructed from a sample using the probe composition or the kit.
In some embodiments, the sample to be tested is a tissue sample or a blood sample.
In some embodiments, the sample to be tested is a plasma sample, and it comprises ctDNA isolated from the plasma.
In certain embodiments of the invention, the sequencing data in S1 is pre-processed sequencing data.
In some embodiments of the invention, preprocessing includes cleaning the data, deleting irrelevant features, handling missing values and normalizing the resulting data.
In some embodiments of the invention, data cleaning refers to removing sequencing results that do not meet quality requirements, such as data with insufficient read length, an insufficient Q30 proportion, an excessive error rate, an insufficient alignment rate or other significant methylation-rate bias. The cleaned data are of high quality and low bias.
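A minimal sketch of such a quality filter is shown below. The metric names and every threshold value are hypothetical, chosen only for illustration, because the text names the criteria but not their limits; the methylation-rate bias check is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-sample sequencing metrics (field names are hypothetical)."""
    sample_id: str
    mean_read_length: float  # bp
    q30_fraction: float      # fraction of bases with Phred quality >= 30
    error_rate: float        # fraction of mismatched bases
    alignment_rate: float    # fraction of reads aligned to GRCh38


# Hypothetical thresholds for illustration only; not taken from the source.
THRESHOLDS = {"min_read_length": 100.0, "min_q30": 0.85, "max_error": 0.01, "min_alignment": 0.80}


def passes_qc(s: SampleQC, t: dict = THRESHOLDS) -> bool:
    """True if the sample meets every length, quality, error and alignment limit."""
    return (
        s.mean_read_length >= t["min_read_length"]
        and s.q30_fraction >= t["min_q30"]
        and s.error_rate <= t["max_error"]
        and s.alignment_rate >= t["min_alignment"]
    )


def clean(samples: list[SampleQC]) -> list[SampleQC]:
    """Keep only samples that pass every quality check."""
    return [s for s in samples if passes_qc(s)]
```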
In some embodiments of the present invention, missing values are handled as follows. The methylation-rate results are checked, and where the analysis pipeline has produced no methylation rate for a site (because of insufficient sequencing depth, insufficient coverage, poor quality, technical failure, etc.), it is decided whether to remove the methylation site or to fill in the missing values. If a methylation site has missing values in more than 20% of the data, the site is removed and not used in the subsequent modeling. Otherwise, the median methylation rate of that site in the existing data is calculated and used to fill in its missing values.
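The missing-value rule can be sketched with NumPy as follows, assuming missing methylation rates are stored as `np.nan` in a samples-by-sites matrix. The function name is illustrative, and the median is taken over the observed values in the same matrix; a separate historical reference set could be used instead.

```python
import numpy as np


def handle_missing(matrix: np.ndarray, max_missing: float = 0.20):
    """Drop sites missing in more than max_missing of samples; median-fill the rest.

    matrix: (n_samples, n_sites) methylation rates with np.nan where none was reported.
    Returns (filled_matrix, indices_of_kept_sites).
    """
    missing_frac = np.isnan(matrix).mean(axis=0)
    kept = np.flatnonzero(missing_frac <= max_missing)  # "more than 20%" is removed
    sub = matrix[:, kept].copy()
    medians = np.nanmedian(sub, axis=0)                 # per-site median of observed values
    rows, cols = np.where(np.isnan(sub))
    sub[rows, cols] = medians[cols]
    return sub, kept
```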
In some embodiments of the invention, each dimension of the feature vector is normalized using the z-score, so that each dimension has a mean of 0 and a standard deviation of 1. The z-score normalization function is:

$$z_i = \frac{x_i - \bar{x}}{s}$$

where $x_i$ is the methylation rate measured at methylation site $i$, $\bar{x}$ is the mean of the corresponding dimension over the sample data, and $s$ is its standard deviation.
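The normalization can be sketched as a small fit/transform class. Two choices here are assumptions rather than statements from the text: the mean and standard deviation are estimated on the training data only and reused for test samples (to avoid information leaking from the test set), and $s$ is the sample standard deviation (`ddof=1`). Sites with zero variance are left centred but unscaled.

```python
import numpy as np


class ZScore:
    """Per-dimension z-score z = (x - mean) / s, with statistics from training data."""

    def fit(self, X: np.ndarray) -> "ZScore":
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0, ddof=1)   # sample standard deviation s
        self.std_[self.std_ == 0] = 1.0     # constant sites: avoid division by zero
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) / self.std_
```

Usage: `scaler = ZScore().fit(X_train)`, then apply `scaler.transform` to both `X_train` and `X_test`.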
In some embodiments of the invention, the machine learning model is a support vector machine (Support Vector Machine, SVM) model.
The SVM model is a machine learning algorithm that can be used for classification and regression analysis. In the present invention, the SVM model classifies ctDNA test samples as healthy samples or pancreatic cancer samples based on the methylation status of specific genomic loci. The model is trained on ctDNA sample data sets from patients clinically diagnosed with pancreatic cancer and from healthy individuals. The methylation status of each specific genomic locus is determined from the sequencing results and used as an input feature of the SVM model, and the model learns the methylation patterns that distinguish pancreatic cancer ctDNA samples from normal ones. A strength of the SVM model is that it can predict the presence of pancreatic cancer accurately from the methylation pattern of ctDNA. In addition, SVM models can handle high-dimensional data, such as the methylation status of many genomic loci, which makes them particularly suitable for the present invention. Because an SVM maximizes the margin between classes, it is also robust and accurate in pancreatic cancer prediction.
In certain embodiments of the present invention, the kernel function of the SVM model is the radial basis function (RBF) kernel. A radial basis function is a radially symmetric scalar function, typically defined as a monotonic function of the Euclidean distance between a point in space and a center; as a kernel, it measures the similarity between two samples.
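A sketch of an RBF-kernel SVM with probability output, using scikit-learn's `SVC`, is shown below. scikit-learn's RBF kernel is the Gaussian form $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$; the hand-written `rbf` function reproduces it for comparison. `probability=True` makes `SVC` calibrate its decision values into probabilities with Platt scaling fitted by internal cross-validation. The values of `C` and `gamma` are library defaults, not parameters stated in the text, and the training data would be the normalized methylation features.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def rbf(x: np.ndarray, z: np.ndarray, gamma: float) -> float:
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))


def train_svm(X_train, y_train, C=1.0, gamma="scale"):
    """RBF-kernel SVM; labels 0 = healthy, 1 = pancreatic cancer (assumed encoding)."""
    model = SVC(kernel="rbf", C=C, gamma=gamma, probability=True, random_state=0)
    return model.fit(X_train, y_train)
```

For a trained model, `model.predict_proba(X)[:, 1]` gives the per-sample probability of the pancreatic cancer class, which is the score compared with the cut-off.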
In one embodiment, the probability cut-off value is 0.58: samples with a probability score above 0.58 are classified as pancreatic cancer samples, and samples with a score below 0.58 are classified as healthy samples. This cut-off is chosen to make sensitivity as high as possible while maintaining adequate specificity, so that a negative result can reliably rule out disease in early diagnosis. This further enhances the clinical utility of the model and provides a reliable tool for the early detection of PDAC.
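The decision rule and its effect on sensitivity and specificity can be sketched as follows. The 0.58 value comes from the embodiment; treating a score exactly equal to the cut-off as healthy is an assumption, since the text only covers scores above and below 0.58. Function names are illustrative.

```python
CUTOFF = 0.58  # probability cut-off stated in the embodiment


def classify(prob_cancer: float, cutoff: float = CUTOFF) -> str:
    """Label a sample from its predicted probability of pancreatic cancer.

    Assumption: a score exactly equal to the cut-off is labelled healthy.
    """
    if not 0.0 <= prob_cancer <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "pancreatic cancer" if prob_cancer > cutoff else "healthy"


def sensitivity_specificity(probs, labels, cutoff=CUTOFF):
    """labels: 1 = pancreatic cancer, 0 = healthy. Returns (sensitivity, specificity)."""
    tp = sum(p > cutoff and y == 1 for p, y in zip(probs, labels))
    fn = sum(p <= cutoff and y == 1 for p, y in zip(probs, labels))
    tn = sum(p <= cutoff and y == 0 for p, y in zip(probs, labels))
    fp = sum(p > cutoff and y == 0 for p, y in zip(probs, labels))
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering `cutoff` moves samples from the healthy to the cancer side, which can only raise sensitivity and lower specificity; the chosen value balances the two for the rule-out use described above.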
In certain embodiments of the invention, in S21 the sequencing data are split into a training-validation data set and a test data set at a ratio of 80%:20%.
In some embodiments of the invention, K-fold cross-validation is used to select model parameters during training and validation of the machine learning model. K is 3 by default, and five- or ten-fold cross-validation is also commonly used. The K-fold cross-validation process is: (1) randomly divide the sample data set into K folds; (2) hold out one fold as the validation set and use the remaining K-1 folds as the training set; train a model on the training set, evaluate it on the validation set, and record its evaluation metrics; (3) repeat step (2) K times so that each fold serves as the validation set exactly once; (4) take the mean of the K sets of metrics as the estimate of model performance, i.e., the performance index of the model under the current K-fold cross-validation. The optimal model parameters are those giving the highest accuracy, precision, recall and F1 score in K-fold cross-validation.
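Steps (1) to (4) can be sketched with scikit-learn as below. Several choices are assumptions: K = 5, a hypothetical grid of three values for the SVM penalty `C`, and mean F1 as the single selection criterion, since the text lists four metrics without saying how they are combined.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import KFold
from sklearn.svm import SVC


def kfold_scores(model, X, y, k=5, seed=0):
    """Mean accuracy, precision, recall and F1 of `model` over K folds."""
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)  # (1) random K-way split
    per_fold = []
    for train_idx, val_idx in folds.split(X):                   # (3) each fold validates once
        fitted = clone(model).fit(X[train_idx], y[train_idx])   # (2) train on K-1 folds
        pred = fitted.predict(X[val_idx])
        per_fold.append([
            accuracy_score(y[val_idx], pred),
            precision_score(y[val_idx], pred, zero_division=0),
            recall_score(y[val_idx], pred, zero_division=0),
            f1_score(y[val_idx], pred, zero_division=0),
        ])
    acc, prec, rec, f1 = np.mean(per_fold, axis=0)              # (4) average over folds
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


def select_C(X, y, grid=(0.1, 1.0, 10.0), k=5):
    """Return the C with the highest mean F1 (hypothetical grid and rule) and all scores."""
    results = {C: kfold_scores(SVC(kernel="rbf", C=C), X, y, k) for C in grid}
    best = max(results, key=lambda C: results[C]["f1"])
    return best, results
```

`clone` gives each fold a fresh, unfitted copy of the estimator, so no fold's model sees another fold's training state.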
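In certain embodiments of the present invention, the mean absolute error (MAE), root mean square error (RMSE) and/or coefficient of determination ($R^2$) are used as metrics for evaluating the machine learning model. Specifically:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values, and $N$ is the number of validation samples.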
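The three metrics translate directly into NumPy. Applying them to 0/1 class labels and predicted cancer probabilities, as in the check below, is an assumption about how the regression-style metrics are used for this classifier.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) following the formulas above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2
```

For example, true labels `[1, 0, 1, 0]` and predicted probabilities `[0.9, 0.2, 0.6, 0.1]` give MAE = 0.2, RMSE = sqrt(0.055) and R^2 = 0.78.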
In certain embodiments of the present invention, the maximum prediction accuracy in S23 is assessed by the area under the ROC curve (AUC); the model is considered to have reached its maximum prediction accuracy when the AUC peaks.
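The AUC can be computed without plotting the ROC curve, because it equals the probability that a randomly chosen cancer sample scores higher than a randomly chosen healthy sample (the Mann-Whitney formulation, with ties counted as one half). The sketch below uses that identity; it compares all pairs, which is fine for test sets of a few hundred samples.

```python
import numpy as np


def roc_auc(labels, scores) -> float:
    """AUC = P(score of a cancer sample > score of a healthy sample), ties counted as 1/2."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one cancer and one healthy sample")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

In the tuning loop of S23, each candidate model's test-set AUC would be computed this way and the candidate with the highest value kept.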
The invention also provides a pancreatic cancer detection device, which comprises the following modules:
1) Test data acquisition module 11: acquires sequencing data of ctDNA of the sample to be tested;
2) Detection module 12: uses the machine learning model to output, from the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and classifies the sample as healthy or pancreatic cancer according to the probability cut-off value.
In some embodiments of the invention, the machine learning model includes the following sub-modules:
1) Data set acquisition sub-module: obtains sequencing data of pancreatic cancer patients and healthy individuals, and divides the data into a training-validation data set and a test data set;
2) Model construction sub-module: trains and validates the machine learning model with the training-validation data set, and evaluates the resulting model;
3) Model optimization sub-module: tests and adjusts the machine learning model obtained by the model construction sub-module with the test data set until the model's predictions agree with the actual diagnoses, to obtain the optimal model.
Since the pancreatic cancer detection device works on essentially the same principle as the foregoing method, the definitions of shared features, the calculation methods, and the listed embodiments and preferred embodiments apply to both the method and the device, and are not repeated here.
It should be understood that the division of the above apparatus into modules is merely a logical division; in an actual implementation the modules may be fully or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. For example, the data acquisition module may be a separately provided processing element, may be integrated in a chip of the above apparatus, or may be stored in a memory of the above apparatus as program code that a processing element of the apparatus calls to execute the module's functions. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method or each of the above modules may be carried out by an integrated hardware logic circuit in the processing element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). As another example, when a module is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program code. As a further example, the modules may be integrated together and implemented as a system-on-a-chip (SoC).
In some embodiments of the present invention, there is also provided an electronic terminal, whose structure is shown schematically in fig. 4. The electronic terminal includes a processor 41, a memory 42, a communicator 43, a communication interface 44 and a system bus 45. The memory 42 and the communication interface 44 are connected to the processor 41 and the communicator 43 via the system bus 45 and communicate with one another; the memory 42 stores a computer program, the communication interface 44 communicates with other devices, and the processor 41 and the communicator 43 run the computer program to cause the terminal to perform the method.
The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is shown in the figure as a single thick line, which does not mean that there is only one bus or one type of bus. The communication interface enables communication between the database access apparatus and other devices (e.g., clients, read-write databases and read-only databases). The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The electronic terminal may be a mobile phone, a computer, a tablet, a personal digital assistant, a factory back-end processing device, or the like.
In some embodiments of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the foregoing detection method.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium; when executed, the program performs steps including those of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
In some embodiments of the present invention, there is also provided a computer processing device including a processor and the aforementioned computer-readable storage medium, the processor executing a computer program on the computer-readable storage medium to implement the steps of the aforementioned method.
The embodiments of the present invention are described below by way of specific examples, from which those skilled in the art can readily understand other advantages and effects of the invention. The invention may also be implemented or applied through other, different embodiments, and the details of this description may be modified or varied in various ways without departing from the spirit and scope of the present invention. It should be noted that, where they do not conflict, the following embodiments and the features in them may be combined with one another.
It is noted that the following description refers to the accompanying drawings, which describe several embodiments of the present application. It is to be understood that other embodiments may be used and that mechanical, structural, electrical and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of the embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Spatially relative terms, such as "upper," "lower," "left," "right," and the like, may be used herein to describe the relationship of one element or feature to another as illustrated in the figures.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," "held," and the like are to be construed broadly: for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. The specific meaning of these terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions or operations is in some way inherently mutually exclusive.
Examples
Screening of differentially methylated regions:
Based on collected cancer tissue samples and the corresponding paracancerous normal samples from patients clinically diagnosed with pancreatic cancer, the methylation level of each CpG site in the genome was determined as meth = C/(C+T), where C and T are the numbers of reads reporting a methylated (unconverted C) and an unmethylated (converted to T) cytosine at that site, respectively. The per-sample levels were integrated into a multi-sample methylation level matrix.
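Building that matrix from per-site read counts can be sketched as below. The site-identifier format and the input layout are hypothetical; the depth threshold of 10 is taken from the filtering rule in the next paragraph, so low-coverage sites are recorded as missing (`None`) at this stage.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout: site id -> (C reads, T reads) for one sample.
SampleCounts = Dict[str, Tuple[int, int]]


def methylation_level(c: int, t: int, min_depth: int = 10) -> Optional[float]:
    """meth = C / (C + T); sites covered by fewer than min_depth reads are missing (None)."""
    depth = c + t
    if depth < min_depth:
        return None
    return c / depth


def build_matrix(samples: List[SampleCounts],
                 min_depth: int = 10) -> Dict[str, List[Optional[float]]]:
    """Site x sample matrix of methylation levels; sites absent from a sample are None."""
    sites = sorted({site for counts in samples for site in counts})
    return {
        site: [methylation_level(*counts[site], min_depth=min_depth) if site in counts else None
               for counts in samples]
        for site in sites
    }
```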
Sites with a sequencing depth below 10 were treated as missing values, sites with more than 40% missing values were filtered out, and the missing values at the remaining sites were imputed by the K-nearest-neighbour method. A logistic regression model was built using the methylation level of each methylation site as the predictor, with sequencing depth as the weight and sample type as the response variable. This model identifies the methylation sites most significantly associated with the cancer/paracancerous classification, i.e. the sites whose methylation levels differ significantly between cancer and paracancerous samples. Adjacent such sites were then merged to obtain the most informative characteristic differentially methylated regions.
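The missing-value handling described above (drop sites with more than 40% missing, then K-nearest-neighbour imputation) might look like the following sketch. The text does not specify how neighbours are defined; here, as an assumption, neighbours are other samples, the distance is the root-mean-square difference over co-observed sites, and `k=3` is a hypothetical default. The depth-weighted logistic regression is not reproduced.

```python
import math
from typing import Dict, List, Optional

Matrix = Dict[str, List[Optional[float]]]  # site -> methylation level per sample (None = missing)


def filter_sites(matrix: Matrix, max_missing: float = 0.4) -> Matrix:
    """Drop sites whose fraction of missing values exceeds max_missing (40% in the text)."""
    return {site: row for site, row in matrix.items()
            if sum(v is None for v in row) / len(row) <= max_missing}


def knn_impute(matrix: Matrix, k: int = 3) -> Matrix:
    """Fill each missing value with the mean of that site in the k most similar samples."""
    if not matrix:
        return {}
    sites = list(matrix)
    n_samples = len(matrix[sites[0]])
    # One vector per sample, in the same site order.
    vectors = [[matrix[site][j] for site in sites] for j in range(n_samples)]

    def distance(a: List[Optional[float]], b: List[Optional[float]]) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = {site: list(row) for site, row in matrix.items()}
    for j, target in enumerate(vectors):
        for i, site in enumerate(sites):
            if target[i] is not None:
                continue
            # Candidate donors: other samples in which this site is observed.
            donors = sorted((distance(target, other), other[i])
                            for jj, other in enumerate(vectors)
                            if jj != j and other[i] is not None)[:k]
            if donors:
                imputed[site][j] = sum(value for _, value in donors) / len(donors)
    return imputed
```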
Establishing the pancreatic cancer-specific probe capture panel:
ProbeTools software was used to assist in designing capture probes for the 13 genomic regions of the panel of the invention, enabling efficient enrichment and subsequent sequencing of the regions of interest. Alignment logic based on the BLAST algorithm was used to ensure that the designed probes have no significant homology to other regions of the genome, thereby reducing non-specific binding. The melting temperature (Tm) of the probe with the target DNA was set at 60 °C, and the software optimized the probe sequences accordingly, so that each probe binds its target with sufficient affinity for efficient capture, yet can be eluted in subsequent steps without being displaced during sequencing. The software also takes probe length into account, so that the probes provide high specificity and affinity while minimizing the risk of forming secondary structures that interfere with capture.
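As a rough illustration of how GC content drives the Tm values that the probe design balances, the sketch below uses the widely used basic GC-content approximation Tm = 64.9 + 41 × (nGC − 16.4) / N. This formula is an assumption for illustration only and is not the thermodynamic model used by ProbeTools; it ignores salt concentration, probe concentration and nearest-neighbour stacking effects.

```python
def basic_tm(seq: str) -> float:
    """Approximate Tm (deg C) with the GC-content formula Tm = 64.9 + 41 * (nGC - 16.4) / N.

    A first-pass screen only: salt, probe concentration and nearest-neighbour
    stacking are ignored.
    """
    s = seq.upper()
    if not s or set(s) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    n_gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(s)
```

Comparing `basic_tm` across candidate probes of equal length shows why GC-rich probes bind more tightly; length and buffer effects require a fuller model.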
Probes are tiled at 100 bp intervals to ensure coverage of at least the following regions to be detected (coordinates are based on the GRCh38 human reference genome):
Blood ctDNA extraction and bisulfite conversion:
1. Sample processing
1.1 Whole blood collection: Collect 5 mL of venous blood using EDTA anticoagulant tubes or cell-free DNA blood collection tubes. Blood collected in standard EDTA vacuum tubes should be separated into plasma immediately; if immediate separation is not possible, store the blood at 2–8 °C for no more than 4 hours. Samples collected in cell-free DNA blood collection tubes can be stored at room temperature for up to 4 days. Frozen whole blood samples must not be used.
1.2 Plasma preparation: Centrifuge the blood collection tube containing whole blood at 1350 ± 150 × g for 12 minutes. Plasma samples can be stored at −20 ± 5 °C for no more than 30 days, or at 2–8 °C for no more than 12 hours.
1.3 Bisulfite-converted DNA (BisDNA): Extract cfDNA according to the instructions of the Qiagen QIAamp Circulating Nucleic Acid Kit, and perform bisulfite conversion according to the instructions of the Thermo Scientific EpiJET kit. If the BisDNA is not used immediately, store it at 2–8 °C for up to 16 hours or at −20 ± 5 °C for up to 4 days.
The specific procedure for bisulfite treatment of DNA is as follows:
(1) Weigh 1 g of sodium bisulfite powder and add water to prepare a 3 M bisulfite buffer.
(2) Prepare the protection buffer: weigh 1 g of hydroquinone and add water to prepare a 0.5 M protection buffer.
(3) Mix … μL of DNA solution (containing 100 ng of DNA), 200 μL of bisulfite buffer and 50 μL of protection buffer, and vortex to mix.
(4) Thermal cycling: 95 °C for 5 min, 50 °C for 30 min, 95 °C for 5 min, 50 °C for 2 h, 95 °C for 5 min, 50 °C for 5 h, then hold at 4 °C.
(5) Add 1 mL of DNA binding buffer to the bisulfite-treated DNA solution, add 50 μL of magnetic beads, and incubate with shaking for 1 h.
(6) Collect the beads on a magnetic separator and discard the supernatant.
(7) Add 0.5 mL of wash buffer A to resuspend the beads and wash with shaking for 1 min.
(8) Collect the beads on a magnetic separator and discard the supernatant.
(9) Add 0.5 mL of wash buffer B to resuspend the beads and wash with shaking for 1 min.
(10) Collect the beads on a magnetic separator and discard the supernatant.
(11) Centrifuge at 10,000 rpm for 1 min, collect the beads on a magnetic separator, and remove the residual supernatant.
(12) Place the tube containing the beads in a 55 °C metal bath with the cap open and dry for 10 min.
(13) Add 50 μL of elution buffer to resuspend the beads, and elute with shaking in a 65 °C metal bath for 10 min.
(14) Collect the beads on a magnetic separator, transfer the buffer containing the target DNA, quantify the DNA, and label the tube.
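Steps (1) and (2) fix the mass (1 g) and the target molarity, but not the final volume; that volume follows from V = m / (M × c). The sketch below works this out assuming pure, anhydrous NaHSO3 (104.06 g/mol) and hydroquinone (110.11 g/mol). Commercial "sodium bisulfite" is often a mixture with sodium metabisulfite, so the volume actually used may differ; the names here are illustrative.

```python
# Molar masses in g/mol (assumes pure, anhydrous reagents).
MOLAR_MASS_G_PER_MOL = {
    "sodium bisulfite (NaHSO3)": 104.06,
    "hydroquinone (C6H6O2)": 110.11,
}


def final_volume_ml(mass_g: float, molar_mass: float, molarity_m: float) -> float:
    """Volume (mL) that brings mass_g of solute to the target molarity: V = n / c."""
    moles = mass_g / molar_mass
    return moles / molarity_m * 1000.0


bisulfite_ml = final_volume_ml(1.0, MOLAR_MASS_G_PER_MOL["sodium bisulfite (NaHSO3)"], 3.0)
hydroquinone_ml = final_volume_ml(1.0, MOLAR_MASS_G_PER_MOL["hydroquinone (C6H6O2)"], 0.5)
print(f"{bisulfite_ml:.2f} mL, {hydroquinone_ml:.2f} mL")
```

Under these assumptions, 1 g of bisulfite makes about 3.2 mL of 3 M solution and 1 g of hydroquinone about 18.2 mL of 0.5 M solution.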
Construction of the NGS sequencing library:
All reagents used in the following library construction steps are from the Ai Jitai kang ssDNA Library Prep Kit.
1. End repair of sample DNA: Prepare the end repair reaction on ice as follows:
Component                      Volume
Bisulfite-converted cfDNA      11 μL
1-1 Polishing Buffer           1 μL
1-2 Polishing Enzyme           1 μL
Total                          13 μL
Mix gently by pipetting (do not vortex vigorously), spin down briefly, and run the following program on a PCR instrument:
Temperature (heated lid 85 °C)    Time
37 °C                             30 min
65 °C                             5 min
4 °C                              Hold
2. 3' adapter ligation: Dilute Adapter 1 (20 μM) in advance to the appropriate concentration according to the DNA input:
DNA input    Adapter 1 concentration    Adapter 1 dilution factor
50 ng        4 μM                       5×
10–20 ng     2 μM                       10×
5 ng         0.2 μM                     100×
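Each dilution in the adapter and primer tables follows C1 × V1 = C2 × V2. The sketch below computes the dilution factor and the stock and diluent volumes for a hypothetical 100 μL batch of diluted adapter; the batch volume and function name are illustrative assumptions, not part of the kit instructions.

```python
from typing import Tuple


def dilution_plan(stock_um: float, target_um: float, final_ul: float) -> Tuple[float, float, float]:
    """C1*V1 = C2*V2: return (dilution factor, stock volume uL, diluent volume uL)."""
    if not 0 < target_um <= stock_um:
        raise ValueError("target concentration must be positive and not exceed the stock")
    factor = stock_um / target_um
    stock_ul = final_ul / factor  # V1 = C2 * V2 / C1
    return factor, stock_ul, final_ul - stock_ul


# Adapter 1 rows of the table (20 uM stock), with a hypothetical 100 uL final volume.
for input_ng, target in [("50 ng", 4.0), ("10-20 ng", 2.0), ("5 ng", 0.2)]:
    factor, stock, diluent = dilution_plan(20.0, target, 100.0)
    print(f"{input_ng}: {factor:g}x -> {stock:g} uL adapter + {diluent:g} uL diluent")
```

The same function covers the Adapter 2 and Extension Primer tables by changing the stock concentration.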
Prepare the 3' adapter ligation reaction on ice as follows:
Component                Volume
End repair product       13 μL
2-1 Ligation Buffer 1    4.5 μL
2-3 Adapter 1            1 μL
2-2 Ligation Enzyme      1.5 μL
Total                    20 μL
Mix gently by pipetting (do not vortex vigorously), spin down briefly, and run the following program on a PCR instrument:
Temperature (heated lid 105 °C)    Time
37 °C                              30 min
95 °C                              2 min
4 °C                               Hold
3. Second-strand extension: Dilute the Extension Primer (10 μM) in advance to the appropriate concentration according to the DNA input:
DNA input    Extension Primer concentration    Extension Primer dilution factor
10–50 ng     10 μM                             Undiluted
5 ng         5 μM                              2×
Prepare the second-strand extension reaction on ice as follows:
Component                 Volume
Ligation product          20 μL
3-1 Extension Mix         21 μL
3-2 Extension Primer      1 μL
Total                     42 μL
Mix gently by pipetting (do not vortex vigorously), spin down briefly, and run the following program on a PCR instrument:
4. Purification of the second-strand extension product:
Pipette 31.2 μL of XP beads into 24 μL of the second-strand extension product, mix thoroughly, and incubate at room temperature for 5 min.
Place the tube on a magnetic rack for 2–5 min until the liquid is clear, then carefully aspirate and discard the supernatant.
Keeping the tube on the magnetic rack, add 150 μL of freshly prepared 80% ethanol to rinse the beads, let stand for 30 s, then carefully aspirate and discard the supernatant.
Repeat the ethanol rinse once, removing as much liquid from the tube as possible; if a small amount of liquid remains on the tube wall, spin the tube briefly, return it to the magnetic rack to separate, and remove the residual liquid at the bottom with a low-volume pipette.
Keeping the tube on the magnetic rack, open the cap and air-dry at room temperature until the bead surface is no longer reflective but has not cracked.
Remove the tube from the magnetic rack, add 11 μL of nuclease-free water (NFW) to elute the DNA, and pipette gently at least 10 times to mix thoroughly. Incubate at room temperature for 5 min.
Place the tube on the magnetic rack for 2–5 min until the liquid is completely clear, and transfer 10 μL of supernatant to a new 0.2 mL PCR tube.
The DNA product may be purified with Beckman Coulter AMPure XP beads or with other beads of similar function; besides bead-based purification, kits or other methods of similar function, such as gel extraction or column purification, may also be used.
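Bead clean-ups in this protocol are specified by volumes, but what matters for bead-based purification is the bead-to-sample volume ratio, since it sets which fragment sizes are retained (for SPRI-type beads such as AMPure XP, a lower ratio generally removes more of the short fragments). The sketch below works out the ratios implied by the volumes given in this protocol, taking the sample volume as the stated product or reaction volume; the source does not state the ratios explicitly, and the names are illustrative.

```python
def bead_ratio(bead_ul: float, sample_ul: float) -> float:
    """Bead-to-sample volume ratio (e.g. 1.3 means 1.3x beads)."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return bead_ul / sample_ul


# (step, bead volume uL, sample volume uL) as given in this protocol.
steps = [
    ("second-strand extension clean-up", 31.2, 24.0),
    ("pre-library purification", 60.0, 50.0),
    ("final library purification", 55.0, 50.0),
]
for name, beads, sample in steps:
    print(f"{name}: {bead_ratio(beads, sample):.1f}x")
```

Scaling the bead volume by the same ratio keeps the size selection unchanged if the sample volume differs.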
5. 5' adapter ligation:
Dilute Adapter 2 (20 μM) in advance to the appropriate concentration according to the DNA input:
DNA input    Adapter 2 concentration    Adapter 2 dilution factor
50 ng        20 μM                      Undiluted
10–20 ng     4 μM                       5×
5 ng         0.4 μM                     50×
Prepare the 5' adapter ligation reaction on ice as follows:
Component                     Volume
Purified extension product    13 μL
4-1 Ligation Buffer 2         4.5 μL
4-2 Adapter 2                 1 μL
2-2 Ligation Enzyme           1.5 μL
Total                         20 μL
Mix gently by pipetting (do not vortex vigorously), spin down briefly, and run the following program on a PCR instrument:
Temperature (heated lid 105 °C)    Time
37 °C                              30 min
95 °C                              2 min
4 °C                               Hold
6. Pre-library PCR:
Prepare the pre-library PCR reaction on ice as follows:
Component             Volume
Ligation product      20 μL
5-1 Pre PCR Mix       25 μL
UDI-XXX               5 μL
Total                 50 μL
Note: Use a different UDI-XXX for each sample within the same batch.
Mix gently by pipetting (do not vortex vigorously), spin down briefly, and run the following program on a PCR instrument:
Note: The recommended number of PCR cycles depends on the DNA input used for library construction:
DNA input    Cycles N (yield approx. 2 μg)
50 ng        8
20 ng        10
10 ng        11
5 ng         12
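The cycle table above can be encoded as a small lookup, for example in a sample-sheet script. How to handle inputs that fall between table rows is not stated; the sketch assumes, hypothetically, that they use the row for the next lower tabulated input (more cycles), and inputs above 50 ng are mapped to 8 cycles, which is also an assumption.

```python
# (minimum DNA input in ng, recommended pre-library PCR cycles), from the table above.
PRE_PCR_CYCLES = [(50.0, 8), (20.0, 10), (10.0, 11), (5.0, 12)]


def recommended_cycles(input_ng: float) -> int:
    """Assumption: inputs between rows use the next lower tabulated input (more cycles)."""
    for min_input, cycles in PRE_PCR_CYCLES:  # ordered from highest to lowest input
        if input_ng >= min_input:
            return cycles
    raise ValueError("input below the smallest tabulated amount (5 ng)")
```

Rounding toward more cycles errs on the side of sufficient library yield, at the cost of slightly more PCR duplication.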
7. Pre-library purification:
Equilibrate the Selection Beads at room temperature for 30 min and vortex to mix.
Add 60 μL of Selection Beads to the PCR product, mix by pipetting or gentle vortexing, and let stand at room temperature for 5 min.
Spin the PCR tube briefly and place it on a magnetic rack; once the solution is clear, discard the supernatant.
Keeping the PCR tube on the magnetic rack, add 150 μL of freshly prepared 80% ethanol to rinse the beads, let stand for 30 s and discard the supernatant; repeat for a total of two rinses.
Spin the PCR tube briefly, place it on the magnetic rack, and remove the residual ethanol at the bottom (taking care not to aspirate the beads); open the cap and air-dry until no ethanol remains (the beads are no longer reflective), taking care not to over-dry the beads to the point of cracking.
Add 30 μL of Nuclease-Free Water, mix by pipetting or gentle vortexing, and let stand at room temperature for 5 min.
Spin the PCR tube briefly and place it on the magnetic rack; once the solution is clear, transfer 29 μL of supernatant to a new PCR tube to obtain the purified pre-library.
8. Library quality control:
Quantify library concentration with a Qubit 2.0, Qubit 3.0 or Qubit 4.0.
Analyze library fragments with an instrument such as a PE LabChip, Qsep100, Qsep400 or Agilent 2100.
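The two quality-control readouts are often combined into a molar concentration, which is convenient for pooling and sequencer loading; the text does not say that this conversion is performed, so the sketch below is an illustrative addition. It uses the standard average of 660 g/mol per base pair of double-stranded DNA, with the concentration from the Qubit and the mean fragment length from the fragment analyzer.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """dsDNA molarity (nM) = conc (ng/uL) * 1e6 / (660 g/mol per bp * fragment length bp)."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)
```

For example, a library at 10 ng/μL with a 300 bp mean fragment length is roughly 50.5 nM.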
9. Liquid-phase hybridization:
All reagents used in the following hybridization and capture steps are from the Ai Jitai kang ssDNA Library Prep Kit.
Thaw the Hyb Buffer at room temperature and preheat it to 65 °C until fully dissolved (clear, with no precipitate).
Add 750 ng of pre-library to a PCR tube (when multiple pre-libraries are hybridized together, add 500 ng of each; pooling no more than 4 pre-libraries is recommended), then add 10 μL of Enhancer to each PCR tube.
Add 3 volumes of Selection Beads to the PCR tube, mix by pipetting or vortexing, and incubate at room temperature for 5 min.
Spin the PCR tube briefly and place it on a magnetic rack; once the solution is clear, discard the supernatant.
Keeping the PCR tube on the magnetic rack, add 200 μL of freshly prepared 80% ethanol to rinse the beads, let stand for 30 s and discard the supernatant; repeat for a total of two rinses.
Spin the PCR tube briefly, place it on the magnetic rack, and remove the residual ethanol at the bottom (taking care not to aspirate the beads); open the cap and air-dry until no ethanol remains (the beads are no longer reflective), taking care not to over-dry the beads to the point of cracking.
Add the hybridization reagents to the PCR tube as follows:
Component                               Volume
6-1 Hyb Buffer (preheated to 65 °C)     13 μL
6-2 Hyb Human Block                     5 μL
6-3 Hyb Adapter Block                   2 μL
6-4 RNase Block                         5 μL
Target Probes                           2 μL
Nuclease-Free Water                     3 μL
Total                                   30 μL
Mix by pipetting and let stand at room temperature for 3 min. Spin the PCR tube briefly and place it on the magnetic rack; once the solution is clear, transfer 28 μL of supernatant to a new PCR tube, mix by gentle pipetting, spin briefly, and run the following program on a PCR instrument:
Temperature (heated lid 105 °C)    Time
95 °C                              5 min
65 °C                              Hold
Note: A hybridization time of 12 to 18 hours is recommended.
10. Capture of hybridization products:
Equilibrate the Capture Beads at room temperature for 30 min and vortex to mix.
Add 50 μL of Capture Beads to a new PCR tube and place it on a magnetic rack; once the solution is clear, discard the supernatant.
Add 180 μL of Binding Buffer and resuspend the beads by pipetting or gentle vortexing. Spin the PCR tube briefly and place it on the magnetic rack; once the solution is clear, discard the supernatant. Repeat this step twice more, for a total of three washes with Binding Buffer.
Add another 180 μL of Binding Buffer, resuspend the beads by pipetting or gentle vortexing, and proceed immediately to the next step.
Keeping the hybridization product from step 9 on the PCR instrument, add the resuspended Capture Beads to it and mix by pipetting.
Transfer the entire mixture from the PCR tube to a 1.5 mL centrifuge tube and bind on a vertical rotary mixer (no more than 10 rpm) at room temperature for 30 min (if no vertical rotary mixer is available, let stand at room temperature for 30 min, inverting 10 times every 5 min).
Spin the tube briefly and place it on a magnetic rack; once the solution is clear, discard the supernatant.
Add 150 μL of Wash Buffer 1, resuspend the beads by gentle pipetting, and wash on a vertical rotary mixer (no more than 10 rpm) at room temperature for 15 min.
Spin the tube briefly and place it on the magnetic rack; once the solution is clear, discard the supernatant.
Add 150 μL of Wash Buffer 2 preheated to 50 °C, resuspend the beads by gentle pipetting, spin briefly, and incubate in a metal bath at 50 °C for 10 min. Spin briefly and place the tube on the magnetic rack; once the solution is clear, discard the supernatant. Repeat this step twice more, for a total of three washes with Wash Buffer 2 preheated to 50 °C.
Keeping the tube on the magnetic rack, add 200 μL of freshly prepared 80% ethanol to rinse the beads, let stand for 30 s, and discard the supernatant.
Spin the tube briefly, place it on the magnetic rack, and remove the residual ethanol at the bottom (taking care not to aspirate the beads); open the cap and air-dry until no ethanol remains (the beads are no longer reflective), taking care not to over-dry the beads to the point of cracking.
Add 24 μL of Nuclease-Free Water, vortex to mix, and resuspend the beads to obtain the captured library (with beads).
11. Amplification of the captured library:
Prepare the captured library amplification reaction on ice as follows:
Component                          Volume
Captured library (with beads)      24 μL
8-1 Post PCR Mix                   25 μL
8-2 Post PCR Primer                1 μL
Total                              50 μL
Mix by pipetting and immediately place on a PCR instrument to run the following program (do not vortex or spin down to mix):
*Note: 16 cycles are recommended when the total library input is < 1.5 μg; 15 cycles are recommended when the total library input is ≥ 1.5 μg.
12. Library purification:
Equilibrate the Selection Beads at room temperature for 30 min and vortex to mix.
Add 55 μL of Selection Beads to the PCR product, mix by pipetting or gentle vortexing, and let stand at room temperature for 5 min.
Spin the PCR tube briefly and place it on a magnetic rack; once the solution is clear, discard the supernatant.
Keeping the PCR tube on the magnetic rack, add 150 μL of freshly prepared 80% ethanol to rinse the beads, let stand for 30 s and discard the supernatant; repeat for a total of two rinses.
Spin the PCR tube briefly, place it on the magnetic rack, and remove the residual ethanol at the bottom (taking care not to aspirate the beads); open the cap and air-dry until no ethanol remains (the beads are no longer reflective), taking care not to over-dry the beads to the point of cracking.
Add 25 μL of Nuclease-Free Water, mix by pipetting or gentle vortexing, and let stand at room temperature for 5 min.
Spin the PCR tube briefly and place it on the magnetic rack; once the solution is clear, transfer 24 μL of supernatant to a new PCR tube to obtain the purified library.
13. Library quality control:
Quantify the PCR product with a Qubit and check its fragment quality with a fragment analyzer.
Libraries that pass quality control are sequenced on an Illumina sequencing platform.
Obtaining targeted NGS methylation profiles
After sequencing results are obtained by the above process for peripheral blood ctDNA samples from confirmed pancreatic cancer patients and healthy individuals, the methylation features of each region are obtained through a pre-established analysis pipeline comprising the following steps: checking raw-data quality (QC) with FastQC; trimming reads of insufficient length or low quality with Trimmomatic; aligning the bisulfite-converted reads with Bismark and extracting the methylation data of each methylation site; and merging adjacent differentially methylated cytosines (DMCs) into differentially methylated regions (DMRs) with methylKit. The average methylation level of each DMR is taken as the methylation feature of that region.
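The last step, grouping neighbouring sites into regions and taking each region's mean methylation level as its feature, is illustrated by the Python sketch below. methylKit has its own region-level procedures, which this does not reproduce; the `Site`/`Region` types and the 100 bp maximum gap are hypothetical choices.

```python
from typing import Iterable, List, NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int      # 1-based CpG position
    level: float  # methylation level at this site


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    mean_level: float  # the region's methylation feature
    n_sites: int


def _summarize(group: List[Site]) -> Region:
    levels = [s.level for s in group]
    return Region(group[0].chrom, group[0].pos, group[-1].pos,
                  sum(levels) / len(levels), len(levels))


def merge_sites(sites: Iterable[Site], max_gap: int = 100, min_sites: int = 1) -> List[Region]:
    """Merge sites on the same chromosome lying <= max_gap bp apart into regions."""
    regions: List[Region] = []
    group: List[Site] = []
    for site in sorted(sites):
        if group and (site.chrom != group[-1].chrom or site.pos - group[-1].pos > max_gap):
            if len(group) >= min_sites:
                regions.append(_summarize(group))
            group = []
        group.append(site)
    if group and len(group) >= min_sites:
        regions.append(_summarize(group))
    return regions
```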
Construction of a pancreatic cancer prediction model based on a support vector machine
The data from the previous step are preprocessed and then used to train and test the SVM model. Preprocessing comprises cleaning the data, removing irrelevant features, handling missing values and outliers, and normalizing the resulting data. A z-score transformation is applied to each dimension of the feature vector so that all features are on a comparable scale, and the normalized data are divided into a training-validation dataset (80%) and a test dataset (20%) for building the SVM model. A radial basis function (RBF) kernel is used as the kernel function of the SVM model.
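The normalization and 80/20 split can be sketched as follows. The z-score step standardizes each feature to mean 0 and standard deviation 1; stratifying the split by class, the seed and the function names are assumptions not stated in the text. The text normalizes before splitting; a common refinement is to compute the means and standard deviations on the training portion only and reuse them for the test set, which avoids leaking test-set information into training.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each feature (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    # Constant features carry no information and are mapped to 0.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (training-validation indices, test indices), keeping class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```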
In this embodiment, five-fold cross-validation is used for model parameter selection: the training-validation dataset is divided into 5 parts, 1 of which serves as the validation set and the remaining 4 as the training set, and the model is trained and validated on each split in turn. Validation uses data separate from those used for training, which helps ensure the accuracy and reliability of the model. The optimal model parameters are those that maximize accuracy, precision, recall and F1 score.
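The four selection metrics can be computed from a fold's predictions as in the sketch below, with pancreatic cancer as the positive class. Returning 0.0 when a ratio is undefined (no predicted or no actual positives) is an assumption; the text also does not say how the metrics are combined when they disagree, so in practice one would pick a primary metric, for example F1, to rank candidate parameter sets.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with one class treated as positive (cancer)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```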
In model evaluation, the mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination ($R^2$) are used as evaluation indices.
The model's predictions on the test dataset, based on methylation levels, are compared with the actual health status of each individual. Any inconsistencies are recorded, and the model is adjusted accordingly to improve its predictive power. This iterative process of testing and tuning continues until the model reaches maximum prediction accuracy. After testing, the SVM model can be applied in a clinical setting: given ctDNA methylation data from a new patient, the model predicts the likelihood that the patient has stage I or II pancreatic cancer.
Evaluation of benign and malignant properties
A data set of 20 pancreatic ductal adenocarcinoma (PDAC) samples and 20 normal samples was input into the above model to test the predictive power of the established logistic regression model. For these samples, ctDNA methylation levels at three specific gene loci were measured and entered into the model, which then generated a probability score for each sample indicating the likelihood that the sample is cancerous. These probability scores were used to construct a receiver operating characteristic (ROC) curve showing the predictive capability of the model, as follows:
The ROC curve shows an area under the curve (AUC) of about 0.95, indicating high accuracy in distinguishing PDAC from normal samples; an AUC of 0.95 indicates that the model has excellent predictive power. Based on the ROC curve, a cutoff value of 0.58 was determined for the logistic regression probability score: samples scoring above 0.58 are classified as malignant (PDAC), and samples scoring below 0.58 are classified as benign (normal).
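The AUC and the 0.58 cutoff can be illustrated with a small, dependency-free sketch. It computes the AUC in its rank form (the probability that a randomly chosen PDAC sample scores higher than a randomly chosen normal sample, with ties counted as one half), which equals the area under the empirical ROC curve. The text does not say how a score of exactly 0.58 is treated; the sketch assigns it to the benign class, which is an assumption. The function names and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classify(score, cutoff=0.58):
    """Label a sample malignant (PDAC) above the cutoff, benign (normal) otherwise."""
    return "PDAC" if score > cutoff else "normal"


labels = [1, 1, 1, 0, 0, 0]  # 1 = PDAC, 0 = normal (hypothetical)
scores = [0.91, 0.74, 0.45, 0.52, 0.20, 0.08]
print(round(roc_auc(labels, scores), 3))  # 0.889
print([classify(s) for s in scores])
```

The pairwise loop is quadratic in the number of samples, which is harmless at the scale of 20 + 20 samples; for larger cohorts a sort-based rank computation would be preferable.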
The model was then used to classify 20 pancreatic cancer samples, 20 samples of other cancers and 20 normal samples; the distribution of the resulting predicted values is shown in Fig. 5. It can be seen that the methylation sites selected in the present invention can distinguish pancreatic cancer from non-pancreatic cancers.
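A per-group summary is one simple way to inspect a distribution like the one in Fig. 5. The sketch below is illustrative only: the function name, group names and scores are hypothetical, and it reuses the 0.58 cutoff described earlier, treating a score equal to the cutoff as negative (an assumption).

```python
from statistics import median


def summarize_by_group(scores_by_group, cutoff=0.58):
    """Per group: sample count, median score, and fraction of samples scoring above the cutoff."""
    summary = {}
    for group, scores in scores_by_group.items():
        called = sum(1 for s in scores if s > cutoff)
        summary[group] = {
            "n": len(scores),
            "median": median(scores),
            "frac_above_cutoff": called / len(scores),
        }
    return summary


# Hypothetical predicted values for three small groups.
groups = {
    "pancreatic cancer": [0.92, 0.81, 0.67, 0.40],
    "other cancer": [0.31, 0.22, 0.63],
    "normal": [0.05, 0.12, 0.18],
}
for name, stats in summarize_by_group(groups).items():
    print(name, stats)
```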
In summary, the present invention effectively overcomes the disadvantages of the prior art and has high industrial utility value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or variations made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (16)

1. A probe composition for capturing a differentially methylated region of ctDNA of a pancreatic cancer-associated gene, wherein the pancreatic cancer-associated gene is selected from any one or more of: BNC1, TFPI2, FBXL7, DBX1, IGF1R, TAF, ADAMTS16, ZNF710, ANKRD60, UTF1, OR1F1, GALNT17.
2. The probe composition of claim 1, wherein the differentially methylated region is located in an exon region, an intron region, an untranslated region, and/or a promoter region of a gene.
3. The probe composition of claim 1, wherein the differential methylation region comprises a region between corresponding sites on any one or more of the following chromosomes, the reference genome being GRCh38:
4. The probe composition of claim 1, wherein the probe composition comprises a first composition for capturing sequences in a methylated state and a second composition for capturing sequences in a non-methylated state.
5. The probe composition of claim 4, wherein the first composition comprises a plurality of probes that are identical or complementary to the methylated-state sequences in the differentially methylated regions, and/or the second composition comprises a plurality of probes that are identical or complementary to the unmethylated-state sequences in the differentially methylated regions.
6. Use of the probe composition of any one of claims 1-5 in the preparation of a pancreatic cancer diagnostic kit.
7. The use according to claim 6, wherein the pancreatic cancer is selected from pancreatic acinar cell carcinoma or pancreatic ductal adenocarcinoma; preferably, the pancreatic cancer diagnostic kit is a pancreatic cancer early-stage diagnostic kit.
8. A kit for diagnosing pancreatic cancer, comprising the probe composition according to any one of claims 1 to 5.
9. The kit of claim 8, further comprising any one or more of the following:
1) ctDNA extraction reagent;
2) ctDNA pretreatment reagent; preferably, the ctDNA pretreatment reagent is selected from bisulphite; more preferably, the bisulphite is any one or more of sodium bisulphite, calcium bisulphite, potassium bisulphite or ammonium bisulphite;
3) Library construction reagents; preferably, the library construction reagents comprise any one or more of an end-repair reagent, a sequencing adapter reagent, an extension reagent, a product purification reagent, a library purification reagent or a quality control reagent.
10. A method for detecting pancreatic cancer, comprising the steps of:
S1: obtaining sequencing data of ctDNA of a sample to be tested, the sequencing data being obtained by sequencing a sequencing library constructed from the sample to be tested using the probe composition according to any one of claims 1 to 5 or the kit according to any one of claims 8 to 9;
S2: outputting, by a machine learning model based on the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and determining, according to a probability cutoff value, whether the sample to be tested is a healthy sample or a pancreatic cancer sample; wherein the machine learning model is constructed by the following method:
S21: obtaining sequencing data of a pancreatic cancer group and a healthy group, and dividing the sequencing data into a training-validation data set and a test data set;
S22: training and validating a machine learning model using the training-validation data set, and evaluating the resulting machine learning model;
S23: testing and tuning the machine learning model obtained in step S22 using the test data set until the model achieves maximum prediction accuracy, thereby obtaining the optimal model.
11. The method of claim 10, further comprising any one or more of the following features:
1) The sample to be tested is selected from a tissue sample or a blood sample;
2) The sequencing data in S1 are preprocessed sequencing data; preferably, the preprocessing method comprises cleaning the data, deleting irrelevant features or handling missing values, and normalizing the resulting data;
3) The machine learning model is an SVM model; preferably, the kernel function of the SVM model is a radial basis kernel function;
4) The probability cutoff value is 0.58, samples with probability scores higher than 0.58 are classified as pancreatic cancer samples, and samples with probability scores lower than 0.58 are classified as healthy samples;
5) In training and validating the machine learning model, K-fold cross-validation is used to select the model parameters;
6) The mean absolute error, root mean square error and/or coefficient of determination are used as indices for evaluating the machine learning model.
12. A pancreatic cancer detection device, comprising:
1) Data acquisition module for the sample to be tested: for obtaining sequencing data of ctDNA of a sample to be tested, the sequencing data being obtained by sequencing a sequencing library constructed from the sample to be tested using the probe composition of any one of claims 1 to 5 or the kit of any one of claims 8 to 9;
2) Detection module: for outputting, by a machine learning model based on the acquired sequencing data, the probability that the sample to be tested is a healthy sample or a pancreatic cancer sample, and determining, according to a probability cutoff value, whether the sample to be tested is a healthy sample or a pancreatic cancer sample.
13. The pancreatic cancer detection apparatus according to claim 12, wherein said machine learning model comprises the following sub-modules:
1) Data set acquisition sub-module: for obtaining sequencing data of a pancreatic cancer population and a healthy population, and dividing the sequencing data into a training-validation data set and a test data set;
2) Model construction sub-module: for training and validating the machine learning model using the training-validation data set, and evaluating the resulting machine learning model;
3) Model optimization sub-module: for testing and tuning the machine learning model obtained in step S22 using the test data set until the model's determinations are consistent with the actual status, thereby obtaining the optimal model.
14. The pancreatic cancer detection device of claim 13, further comprising any one or more of the following features:
1) The sample to be tested is selected from a tissue sample or a blood sample;
2) The sequencing data in the data acquisition module for the sample to be tested or in the data set acquisition sub-module are preprocessed sequencing data; preferably, the preprocessing method comprises cleaning the data, deleting irrelevant features, handling missing values and handling outliers, and normalizing the resulting data; preferably, the normalization is performed using a z-score;
3) The machine learning model is an SVM model; preferably, the kernel function of the SVM model is a radial basis kernel function;
4) A probability cutoff value in the detection module is 0.58, a sample with a probability score higher than 0.58 is classified as a pancreatic cancer sample, and a sample with a probability score lower than 0.58 is classified as a healthy sample;
5) The model construction sub-module uses K-fold cross-validation to select model parameters when training and validating the machine learning model;
6) The model construction sub-module uses the mean absolute error, root mean square error and/or coefficient of determination as indices for evaluating the machine learning model.
15. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of claim 10 or 11.
16. An electronic terminal, comprising: a processor and a memory; the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so as to cause the terminal to perform the method of claim 10 or 11.
CN202310887408.4A 2023-07-19 2023-07-19 Pancreatic cancer early diagnosis kit, method and device thereof Active CN117344014B (en)

Priority Applications (1)

Application number: CN202310887408.4A (granted as CN117344014B)
Priority date: 2023-07-19; filing date: 2023-07-19
Title: Pancreatic cancer early diagnosis kit, method and device thereof

Publications (2)

CN117344014A, published 2024-01-05
CN117344014B, published 2024-06-28

Family

ID=89363883

Country Status (1)

Country Link
CN (1) CN117344014B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant