CN114334142A - SNP (Single nucleotide polymorphism) locus combination for colorectal cancer morbidity risk prediction, morbidity risk prediction model and system
- Publication number
- CN114334142A (CN202110753257.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a colorectal cancer onset risk prediction model and a colorectal cancer onset risk prediction system. Based on the results of colorectal cancer susceptibility gene screening in the Chinese population, the invention constructs a colorectal cancer onset risk prediction system and a kit that use typing information for 19 SNP loci to predict and report the degree of risk that a subject will develop colorectal cancer. The system performs stably across different populations, with good repeatability, high reliability and a convenient detection technology. It makes it possible to evaluate the onset risk level of the general-risk population, and it can be used for large-scale population screening to identify people at high risk of colorectal cancer without age restriction and without regard to sex, enabling risk prediction and early warning of colorectal cancer and improving the early diagnosis rate. The method can serve as the first step of an individualized early screening and prevention strategy for colorectal cancer in China, helps identify individuals at high risk of colorectal cancer, and has the potential to be recommended as an auxiliary tool in colorectal cancer screening programs.
Description
Technical Field
The invention belongs to the technical field of biomedicine. More particularly, the invention relates to an SNP locus combination for colorectal cancer onset risk prediction, a colorectal cancer onset risk prediction model and a colorectal cancer onset risk prediction system.
Background
Colorectal cancer ranks third among malignant tumors in global incidence and second in mortality. With socioeconomic development and changes in dietary structure, China has become a region with a high incidence of colorectal cancer. Colorectal cancer is characterized by marked genetic heterogeneity, phenotypic complexity and ethnic differences. Genetic factors can fundamentally promote its occurrence. Hereditary colorectal cancer accounts for about 5% of all colorectal cancer and is mainly caused by rare germline mutations in APC and the DNA mismatch repair genes (MSH2, MSH6, MLH1, PMS2). The remaining genetic risk of colorectal cancer may result from the cumulative effect of common genetic variants.
For cancer, prevention is more important than treatment. Primary prevention of tumors, such as screening for mutations in genetic susceptibility genes, risk assessment and preventive intervention, has already shown significant effects in the prevention and control of breast and ovarian cancer, and is endorsed and recommended in some developed countries in Europe and America as the population-level preventive measure that most effectively improves survival and reduces mortality among patients with these tumors.
Population screening for colorectal cancer can identify high-risk groups and improve the early diagnosis rate, and is one of the key strategies for improving the prevention and treatment of colorectal cancer. Prevention and control of colorectal cancer in China has improved to some extent over the past decades, but a clear gap remains compared with developed countries in Europe and America. At present, improvement in the early clinical diagnosis rate depends mainly on greater personal health awareness and on findings from voluntary physical examinations. If a new screening tool could predict colorectal cancer onset risk and identify the population at high risk, it could guide individualized health management and tumor prevention for that population and enable early tumor screening.
China has already launched colorectal cancer screening in individual provinces and cities. Current large-scale community screening in China first identifies high-risk groups by questionnaire and then offers them colonoscopy for detailed screening. Because of China's large population, using colonoscopy directly for population-wide screening would consume large amounts of manpower, materials and funds, and colonoscopy carries some risk of complications. Moreover, no existing screening program includes an assessment of colorectal cancer genetic susceptibility genes, although the role of genetic factors in tumorigenesis cannot be neglected. For some common malignant tumors, predicting an individual's disease risk from the susceptibility genes that the individual carries has already achieved some success. For example, under standardized screening and prevention, the incidence of breast cancer is steadily decreasing in developed countries in Europe and America, and the proportion of breast cancer patients who are in the middle or advanced stage at first diagnosis has fallen to about 9%. In China, the incidence of colorectal cancer is rising, and most patients are already in the middle or late stage when diagnosed, so the best opportunity for treatment is lost. Colorectal cancer onset risk assessment is therefore of positive and important significance for the prevention and treatment of colorectal cancer in China, and it can be an effective and feasible cancer prevention strategy for individualized prevention and intervention, particularly for individuals at high onset risk in cancer families.
In China, there is currently no practical genetic model for predicting colorectal cancer onset risk in the general population, and the screening performance of such a model in the population remains unclear.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing colorectal cancer screening and onset risk prediction technologies by developing a new means of stratifying the general-risk population in China into different risk levels, so that high-risk groups can be identified and then referred for detailed colorectal cancer screening.
The invention aims to provide an SNP locus combination for colorectal cancer onset risk prediction.
Another objective of the invention is to provide a colorectal cancer onset risk prediction system (model).
The invention also aims to provide a colorectal cancer onset risk prediction kit.
The above purpose of the invention is realized by the following technical scheme:
The invention carries out an external validation study of 75 common genetic variants identified by existing colorectal cancer genome-wide association studies (GWAS), and uses the least absolute shrinkage and selection operator (LASSO) statistical method to construct, through comprehensive analysis, a colorectal cancer risk prediction model containing 19 variant sites.
That is, the invention provides an SNP locus combination for colorectal cancer onset risk prediction, which comprises rs10411210, rs10774214, rs10795668, rs10936599, rs11903757, rs12603526, rs12953717, rs1321311, rs16969681, rs1801133, rs2423279, rs3802842, rs4813802, rs6061231, rs6469656, rs647161, rs69877267, rs704017 and rs7315438.
The use of the SNP locus combination in constructing a colorectal cancer onset risk prediction model or system, and its use in preparing a colorectal cancer onset risk prediction kit, also fall within the scope of protection of the invention.
Based on the above, the application provides a colorectal cancer onset risk prediction model based on the above 19 variant sites. Genotype data for the above SNP locus combination are obtained from the sample to be tested, and the prediction is made with the following formula: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = -0.396; exp is the natural exponential function; SNP1, SNP2, …, SNP19 are assigned the genotype scores of the corresponding SNPs in Table 4; and β1, β2, …, β19 take the values given in the coefficient column of Table 4.
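The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Table 4 is not reproduced in this text, so the coefficients and genotype scores below are made-up placeholders, the function name `risk_score` is hypothetical, and the intercept is taken as -0.396, the value given with the formula in Example 1.

```python
import math

# Intercept as given with the formula in Example 1 of this document.
BETA_0 = -0.396


def risk_score(genotype_scores, betas, beta_0=BETA_0):
    """Return exp(beta_0 + sum(beta_i * SNP_i)).

    genotype_scores: one numeric genotype score per SNP (Table 4 in the patent).
    betas: one coefficient per SNP (Table 4 in the patent).
    """
    if len(genotype_scores) != len(betas):
        raise ValueError("need exactly one genotype score per coefficient")
    linear = beta_0 + sum(b * g for b, g in zip(betas, genotype_scores))
    return math.exp(linear)


# Placeholder values for 3 SNPs only; the real model uses 19 SNPs and the
# Table 4 values, which are NOT shown here.
example_betas = [0.10, 0.20, -0.05]
example_genotypes = [1, 0, 2]
print(risk_score(example_genotypes, example_betas))
```

Because the score is the exponential of a linear predictor, each additional unit of a genotype score multiplies the score by exp(β) for that SNP.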
Based on this, a colorectal cancer onset risk prediction system includes:
(1) a data acquisition module: used to collect SNP genotype data from the sample to be predicted, the SNPs being the above SNP locus combination;
(2) a score calculation module: used to analyze the data from (1) by entering them into a scoring system (i.e., the onset risk prediction model) to compute a risk score;
(3) a risk comparison module: used to compare the risk score against the risk stratification standard and give a risk determination;
(4) a risk result display module: used to display the risk determination.
The specific method of step (2) is as follows: the data from step (1) are entered into the calculation formula of the scoring system to calculate the onset risk score. The calculation formula is: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = -0.396; exp is the natural exponential function; SNP1, SNP2, …, SNP19 are assigned the genotype scores of the corresponding SNPs in Table 4; and β1, β2, …, β19 take the values given in the coefficient column of Table 4.
The risk stratification standard in step (3) is as follows: when Score > 1.6, the subject is classified as high risk, i.e., the relative risk is at least 3 times that of the general population; when 0.9 < Score ≤ 1.6, the subject is classified as medium risk, i.e., the relative risk is 1.5 to 3 times; when Score ≤ 0.9, the subject is classified as low risk, i.e., the relative risk is within 1.5 times.
Examples: The SNP detection results for patient A are: rs10936599: TC; rs6061231: CA; rs10774214: CC; rs10795668: GA; rs11903757: TC; rs12603526: TC; rs1321311: GG; rs2423279: TC; rs3802842: CA; rs4813802: TT; rs6469656: GG; rs647161: CC; rs704017: AA; rs7315438: CC; rs10411210: CC; rs12953717: CC; rs16969681: TT; rs1801133: AA; rs69883267: TG. The risk score is 0.56, so patient A is classified as low risk.
The SNP detection results for patient B are: rs10936599: TC; rs6061231: CA; rs10774214: CC; rs10795668: AA; rs11903757: TT; rs12603526: TC; rs1321311: TG; rs2423279: TC; rs3802842: CA; rs4813802: TT; rs6469656: GG; rs647161: CA; rs704017: GA; rs7315438: CC; rs10411210: CC; rs12953717: CC; rs16969681: CC; rs1801133: AG; rs69883267: TG. The risk score is 1.14, so patient B is classified as medium risk.
The SNP detection results for patient C are: rs10936599: TC; rs6061231: CA; rs10774214: TC; rs10795668: AA; rs11903757: TC; rs12603526: TC; rs1321311: TG; rs2423279: TT; rs3802842: CA; rs4813802: TG; rs6469656: GA; rs647161: CA; rs704017: GA; rs7315438: TC; rs10411210: CC; rs12953717: TC; rs16969681: CC; rs1801133: AA; rs69883267: TT. The risk score is 1.97, so patient C is classified as high risk.
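The stratification rule applied to the three example patients can be sketched as a small function. The cutoffs 0.9 and 1.6 and the three scores come from the text above; the function name `stratify` and the label strings are illustrative assumptions.

```python
def stratify(score, low_cutoff=0.9, high_cutoff=1.6):
    """Map a risk score to a risk group using the cutoffs stated in the text.

    Score > 1.6          -> "high"
    0.9 < Score <= 1.6   -> "medium"
    Score <= 0.9         -> "low"
    """
    if score > high_cutoff:
        return "high"
    if score > low_cutoff:
        return "medium"
    return "low"


# Scores of the three example patients given in the text.
for patient, score in [("A", 0.56), ("B", 1.14), ("C", 1.97)]:
    print(patient, score, stratify(score))
```

Both boundaries are inclusive on the lower group, so a score of exactly 0.9 is low risk and a score of exactly 1.6 is medium risk.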
For the colorectal cancer onset risk prediction system provided by the invention, the area under the curve (AUC) for predicting colorectal cancer onset risk is 0.61 in the training population and 0.59-0.61 in three independent validation populations from southern, eastern and northern China. Under the prediction model of the invention, individuals in the highest quartile of risk score have more than twice (2.12 to 2.90 times) the risk of developing colorectal cancer of individuals in the lowest quartile. The method performs stably in predicting colorectal cancer onset risk in different populations, uses a convenient detection technology, makes it possible to evaluate the onset risk level of the general-risk population, can serve as the first step of an individualized early screening and prevention strategy for colorectal cancer in China, and helps identify individuals at high risk of colorectal cancer.
In addition, the invention provides a colorectal cancer onset risk prediction kit, which comprises reagents for detecting the above SNP locus combination and the above prediction system.
In an optional preferred scheme, the reagents are PCR specific recognition primers and extension primers, whose sequences are shown in SEQ ID NO. 20-76.
In an optional preferred scheme, the kit further comprises the reagents required for PCR detection, which include Taq enzyme, dNTP mix, DNA sample diluent and buffer.
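A comparison between the highest and lowest score quartiles can be illustrated as follows. The text does not say how the 2.12 to 2.90 figures were estimated, so this sketch simply computes a crude odds ratio from a 2×2 table; the function name and the toy data are assumptions, not data from the patent.

```python
import statistics


def quartile_odds_ratio(scores, is_case):
    """Crude odds ratio of being a case: top score quartile vs bottom quartile."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    top_cases = top_controls = bottom_cases = bottom_controls = 0
    for score, case in zip(scores, is_case):
        if score > q3:
            if case:
                top_cases += 1
            else:
                top_controls += 1
        elif score <= q1:
            if case:
                bottom_cases += 1
            else:
                bottom_controls += 1
    if top_controls == 0 or bottom_cases == 0:
        raise ValueError("odds ratio undefined: a required cell is empty")
    # (cases/controls in top quartile) / (cases/controls in bottom quartile)
    return (top_cases * bottom_controls) / (top_controls * bottom_cases)


# Made-up toy data: 16 subjects with scores 1..16 and case/control labels.
toy_scores = list(range(1, 17))
toy_is_case = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1]
print(quartile_odds_ratio(toy_scores, toy_is_case))
```

In a real analysis the quartile cut points would more likely be defined in controls and the odds ratio estimated with a regression model; the crude ratio is used here only to keep the example short.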
In an optional preferred scheme, the PCR reaction system is as follows:
In an optional preferred scheme, the PCR reaction procedure is as follows:
If the typing result is not ideal, the PCR reaction is optimized by adding the following conditions:
The colorectal cancer onset risk prediction kit provided by the invention can simultaneously detect 19 common colorectal cancer susceptibility variant sites, and can provide a reference for assessing an individual's degree of colorectal cancer risk, for population screening in areas with a high incidence of colorectal cancer, for identifying people at high risk of colorectal cancer onset, and for the corresponding preventive measures.
Therefore, the use of the above SNP locus combination, prediction model, prediction system and kit in colorectal cancer screening, or in the preparation of colorectal cancer screening products, also falls within the scope of the invention.
The colorectal cancer risk prediction model is robust and has the potential to help clinicians determine which subjects have higher genetic susceptibility. Individuals in the high-risk group (the 75%-100% range of the risk distribution) should undergo colonoscopy, which can improve the efficiency of detailed and early colorectal cancer screening. Compared with current gene-based cancer risk models applicable to Western populations, the model achieves a higher level of discrimination for colorectal cancer onset risk. Colorectal cancer screening programs currently conducted in China include only risk assessment questionnaires and fecal occult blood tests, followed by a recommendation of colonoscopy for high-risk groups. Adding genetic susceptibility screening to assess onset risk in the population would make stratified risk assessment programs better suited to the general population and to predicting the future cancer risk of asymptomatic individuals. The screening results of such programs may further motivate behavioral changes, so that people prevent colorectal cancer through a healthier lifestyle.
So far, only a small number of studies have developed colorectal cancer onset risk prediction models based on genetic variants, and these were carried out in European populations. For example, one study (Hsu L, Jeon J, Brenner H, et al. A model to determine colorectal cancer risk using common genetic susceptibility loci [J]. Gastroenterology, 2015, 148(7): 1330-1339.) constructed a prediction model containing 27 SNPs, with an AUC of 0.55. Another study (Dunlop M G, Tenesa A, Farrington S M, et al. Cumulative impact of common genetic variants and other risk factors on colorectal cancer risk in 42 103 individuals [J]. Gut, 2013, 62(6): 871-81.) built a prediction model containing 10 SNPs with an AUC of 0.57, while the AUC of the model that included genotype, age, sex and family history was 0.59. A weighted genetic risk score (G-score) based on 63 SNPs indicated a 1.30- to 1.34-fold colorectal cancer risk for the high-risk population. The AUC of the risk prediction model established by the invention is at least 0.59 (range 0.59 to 0.61), and risk stratification indicates that the colorectal cancer risk of the high-risk group is more than twice that of general individuals. The model was subjected to a two-stage analysis in four independent cohorts, with internal and external validation, and its discrimination is stable. This stability is also one of the decisive conditions for its potential application in clinical and population screening for colorectal cancer onset risk.
The invention has the following beneficial effects:
The invention provides a colorectal cancer onset risk prediction model built from the typing information of 19 single nucleotide polymorphism (SNP) loci, selected on the basis of earlier research on colorectal cancer susceptibility gene screening in the Chinese population, with a primer sequence specifically screened for each locus. The kit provided by the invention performs genotyping on genomic DNA from the peripheral blood of the subject, detects the single nucleotide sites by the competitive allele-specific PCR (KASP) method, and estimates the subject's degree of colorectal cancer risk with a genetic risk scoring model. This overcomes the limitation of judging colorectal cancer onset risk only from fecal occult blood testing or from colonoscopy and similar examinations, which have not yet been rolled out on a large scale.
The model provided by the invention sets no age limit for the screened population and does not take sex into account. It can be applied at any age to indicate the subject's degree of colorectal cancer risk, can be used for large-scale population screening to identify people at high risk of colorectal cancer, enables risk prediction and early warning of colorectal cancer, improves the early diagnosis rate, and has the potential to be recommended as an auxiliary tool in colorectal cancer screening programs.
Evaluating colorectal cancer genetic susceptibility through the detection of genetic variants is a new auxiliary means of screening for high-risk populations. The invention is a genetic prediction model for colorectal cancer onset risk established from large-sample Chinese population data. It screens the susceptibility sites discovered by existing international genome-wide association studies (GWAS) and integrates those closely related to colorectal cancer onset risk in the Chinese population. Compared with current gene-based colorectal cancer onset risk prediction models for Western populations, the risk prediction model of the invention has higher discrimination.
Repeatability and stability are measures of the reliability of a genetic prediction model. They are particularly important when the same tumor onset risk prediction model is applied to different populations, and they are among the important factors that determine whether a method can be generalized to the population and how reliable it is. In earlier work, simply detecting a single site or a few sites limited the application of colorectal cancer susceptibility genes in early tumor screening practice. In the invention, in addition to the cohort data used to construct the genetic model, the reliability of the developed model is established through one internal validation cohort and two external validation cohorts. The validation cohorts come from southern, eastern and northern China; populations in these three regions are likely to be reasonably representative of China and meet the requirements for evaluating the external predictive validity of the invention. Good repeatability is a prerequisite for extending the method to colorectal cancer genetic susceptibility screening in different populations in China. The scoring system derived from the prediction model can serve as one of the screening tools for establishing an individualized colorectal cancer (CRC) prevention strategy in China. In the scoring system, the risk of the upper quartile is at least twice that of the general population, which makes it possible to distinguish different risk levels within the average-risk population according to the risk prediction model. The invention may serve as the first step of a personalized prevention strategy for colorectal cancer in China.
Drawings
FIG. 1 shows the binary logistic regression model using the least absolute shrinkage and selection operator (LASSO) in the invention. (A) Selection of the LASSO tuning parameter (λ) by 10-fold cross-validation using the minimum criterion. The area under the receiver operating characteristic curve (AUC) is plotted against λ. The dashed line marks the optimal value under the 1-standard-error rule of the minimum criterion (1-SE criterion). By 10-fold cross-validation, the λ value was 0.012 (1-SE criterion). (B) LASSO logistic regression coefficients for the 75 SNPs. The coefficient profiles are plotted against λ. The vertical dashed line marks the optimal value under the 1-SE criterion, which yields 19 non-zero coefficients (highlighted).
FIG. 2 shows the SNP locus association analysis in the model-construction cohort studied by the inventors; the 19 selected loci are associated with colorectal cancer onset.
FIG. 3 shows the area under the receiver operating characteristic curve (AUC) of the 19-SNP colorectal cancer onset risk prediction model of the invention. SYSUCC-training is the result for the model-construction cohort from southern China, SYSUCC-training is the result for the internal validation cohort from southern China, FUSCC-validation is the result for the external validation cohort from eastern China, and CMU-validation is the result for the external validation cohort from northern China.
Detailed Description
The model of the invention requires the typing results of 19 related SNPs. SNP typing can be performed by Sanger sequencing, time-of-flight mass spectrometry, the TaqMan probe method, the competitive allele-specific PCR (KASP) method, and so on. The KASP method can improve analysis throughput and sensitivity, achieving a higher detection rate with smaller sample and reagent amounts and at lower cost, and it has clear advantages in throughput, sensitivity and flexibility. The invention is further described below with reference to the drawings and specific examples, which are not intended to limit the invention in any way. Unless otherwise indicated, the reagents, methods and apparatus used in the invention are conventional in the art.
Unless otherwise indicated, reagents and materials used in the following examples are commercially available.
Example 1 SNP site combination for the colorectal cancer onset risk prediction model, and model construction
Starting from 75 key susceptibility variant sites identified by international colorectal cancer GWAS studies (Table 1), an association analysis was carried out in a large sample of Chinese colorectal cancer patients and healthy controls (3049 colorectal cancer patients and 2557 healthy control individuals from a population cohort in southern China). The 75 SNPs were fitted using the LASSO statistical method, and the model was trained with 10-fold cross-validation, the test data set being selected at random each time.
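The 1-SE criterion used to choose λ can be sketched as follows. This is a minimal illustration, assuming the per-fold AUC values for each candidate λ are already available from 10-fold cross-validation. The function name `select_lambda_1se` and all AUC values are hypothetical and are not data from the invention; the LASSO fitting itself is not shown.

```python
from math import sqrt
from statistics import mean, stdev


def select_lambda_1se(cv_auc):
    """Pick lambda by the 1-SE criterion.

    cv_auc maps each candidate lambda to its list of per-fold AUC values.
    Returns the largest lambda (sparsest model) whose mean AUC is within
    one standard error of the best mean AUC.
    """
    summary = {}
    for lam, fold_aucs in cv_auc.items():
        standard_error = stdev(fold_aucs) / sqrt(len(fold_aucs))
        summary[lam] = (mean(fold_aucs), standard_error)

    # Lambda with the best mean AUC (the "minimum criterion" optimum).
    best_lam = max(summary, key=lambda lam: summary[lam][0])
    best_mean, best_se = summary[best_lam]
    threshold = best_mean - best_se

    # Among lambdas that are "good enough", prefer the strongest penalty.
    candidates = [lam for lam, (m, _) in summary.items() if m >= threshold]
    return max(candidates)


# Hypothetical per-fold AUCs for three candidate lambdas (10 folds each).
example_cv_auc = {
    0.001: [0.60, 0.62, 0.61, 0.63, 0.59, 0.61, 0.62, 0.60, 0.61, 0.61],
    0.012: [0.60, 0.61, 0.61, 0.62, 0.60, 0.61, 0.61, 0.60, 0.61, 0.61],
    0.050: [0.57, 0.58, 0.59, 0.58, 0.58, 0.57, 0.59, 0.58, 0.58, 0.58],
}
print(select_lambda_1se(example_cv_auc))
```

Choosing the largest admissible λ rather than the one with the best mean AUC trades a small loss in cross-validated performance for a model with fewer non-zero coefficients.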
TABLE 1 The 75 GWAS-derived SNPs on which the prediction model of the present invention is based
During the analysis, models containing different numbers of SNPs were fitted, and a comprehensive analysis finally yielded an optimized SNP site combination of 19 SNPs for colorectal cancer onset risk prediction, together with a model (FIG. 1). The model formula is Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = -0.396; exp denotes the natural exponential function; SNP1, SNP2, …, SNP19 are assigned values according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2, …, β19 take the values given in the coefficient column of Table 4. The sequences of the 19 SNP sites are shown in Table 2.
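The score formula can be written as a short function. The following is a sketch only: the intercept -0.396 is the value stated above, but the coefficients and genotype scores of Table 4 are not reproduced here, so `PLACEHOLDER_COEFFICIENTS` and the example genotype scores are hypothetical stand-ins, and only three of the 19 SNPs are shown.

```python
import math

BETA_0 = -0.396  # intercept stated in the description

# Hypothetical placeholder coefficients, NOT the Table 4 values.
PLACEHOLDER_COEFFICIENTS = {
    "rs10411210": 0.10,
    "rs10774214": 0.15,
    "rs10795668": 0.20,
}


def risk_score(genotype_scores, coefficients, intercept=BETA_0):
    """Score = exp(beta0 + sum(beta_i * SNP_i)).

    genotype_scores maps each rsID to the numeric score assigned to the
    sample's genotype; coefficients maps each rsID to its beta value.
    """
    missing = set(coefficients) - set(genotype_scores)
    if missing:
        raise ValueError(f"missing genotype scores for: {sorted(missing)}")
    linear_predictor = intercept + sum(
        beta * genotype_scores[rsid] for rsid, beta in coefficients.items()
    )
    return math.exp(linear_predictor)


# Hypothetical sample: genotype scores coded 0 or 1 for illustration only.
sample = {"rs10411210": 1, "rs10774214": 0, "rs10795668": 1}
print(round(risk_score(sample, PLACEHOLDER_COEFFICIENTS), 3))
```

Raising an error on a missing genotype, rather than silently treating it as 0, avoids underestimating the score when typing of a site fails.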
The effect of 19 SNPs in the model on the risk of colorectal cancer onset is shown in figure 2.
When the effect of the risk allele on colorectal cancer onset risk is calculated, each single SNP in the model carries a 1.13- to 1.63-fold risk compared with the reference genotype. In the training phase, the area under the receiver operating characteristic curve (AUC) of the 19-SNP model was 0.61 (FIG. 3).
TABLE 2 sequences of 19 SNP sites significantly related to colorectal cancer onset determined by the present invention
Example 2 colorectal cancer onset risk prediction kit
A kit for colorectal cancer onset risk prediction, comprising: PCR amplification primers and single-base extension primers for detecting 19 SNP sites (19 SNP sites shown in Table 2) of the human genome.
The specific kit is prepared as follows:
1. Design and synthesize the PCR specific recognition primers and extension primers for the SNP sites, as shown in Table 3
TABLE 3 PCR amplification primers (Primer Allele_FAM, Primer Allele_HEX) and common extension primers (Primer Common) for specific recognition of the two alleles at the 19 SNP sites to be tested
2. Construction of the kit
Other components of the kit include: taq enzyme, dNTP mixed solution, diluent, buffer solution and the like, and the details are shown in the following detection method.
3. The detection method comprises the following steps:
(1) DNA sample extraction
Genomic DNA is extracted from peripheral blood using the Qiagen DNA midi kit (100) or a similar product, dried at 65 ℃ for 30 min, quantified with a spectrophotometer, and stored at -20 ℃ until use.
(2) DNA sample dilution and addition (96-well reaction System)
The total reaction volume of the 96-well plate reaction system is 10 μL, and the sample concentration in the reaction system is 8-10 ng/μL, i.e. the total amount of sample in each well is about 80-100 ng. The corresponding amount of sample is added to a 96-well PCR reaction plate and centrifuged.
(3) Construction of PCR reaction System
The total amount of DNA added is 80-100 ng. Volumes are given per single reaction.
Component | Volume (96-well plate) |
---|---|
2× KASP master mix | 5 μL |
72× Assay mix | 0.14 μL |
DNA sample | 4.86 μL |
H2O | 0 μL |
Total | 10 μL |
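The per-reaction volumes above scale linearly with the number of wells. The sketch below checks that the components add up to 10 μL, computes the shared reagent volumes for a plate, and derives the DNA stock concentration implied by adding 4.86 μL per well. The function names and the `overage` pipetting allowance are hypothetical and are not specified in the source.

```python
# Per-reaction volumes in microlitres, as listed above.
PER_REACTION_UL = {
    "2x KASP master mix": 5.0,
    "72x Assay mix": 0.14,
    "DNA sample": 4.86,
    "H2O": 0.0,
}


def premix_volumes(n_reactions, overage=0.10):
    """Volumes (uL) of the shared reagents needed for n_reactions wells.

    overage is a hypothetical pipetting allowance (10% by default).
    """
    factor = n_reactions * (1.0 + overage)
    shared = ("2x KASP master mix", "72x Assay mix")
    return {name: round(PER_REACTION_UL[name] * factor, 2) for name in shared}


def required_stock_ng_per_ul(target_ng):
    """DNA stock concentration needed so the DNA volume delivers target_ng."""
    return target_ng / PER_REACTION_UL["DNA sample"]


total_ul = sum(PER_REACTION_UL.values())
print(f"total per reaction: {total_ul:.2f} uL")
print(premix_volumes(96))
print(f"stock range: {required_stock_ng_per_ul(80):.1f}-"
      f"{required_stock_ng_per_ul(100):.1f} ng/uL")
```

Note that 0.14 μL of a 72× assay mix in a 10 μL reaction corresponds to a 1× final concentration, which is consistent with the listed volumes.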
(4) Seal the 96-well PCR reaction plate, then shake and centrifuge it so that the reaction system is mixed uniformly.
(5) After centrifugation, PCR was carried out under the following reaction conditions:
(6) After the reaction, read the plate on a PHERAstar microplate reader to record the fluorescent signal of the reaction plate.
(7) If the typing result is not ideal, the PCR reaction is optimized by adding the following conditions:
Example 3 Internal validation of the prediction efficiency of the prediction model constructed by the invention
The predictive performance of the model constructed by the invention was verified in an independent validation cohort from southern China comprising 3017 colorectal cancer patients and 2488 healthy control individuals. The AUC for predicting colorectal cancer onset risk was 0.59 (FIG. 3), showing that the model has good internal validity.
Example 4 External validation of the prediction efficiency of the prediction model constructed by the invention
The prediction model constructed by the invention was further verified in two independent cohorts from northern China and eastern China. The northern China cohort comprised 1242 colorectal cancer patients and 1027 healthy control individuals, and the eastern China cohort comprised 1414 colorectal cancer patients and 1152 healthy control individuals. The AUCs for colorectal cancer onset risk prediction were 0.59 and 0.61, respectively, showing that the prediction model has good external validity (FIG. 3).
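The AUC values reported for the training and validation cohorts can be read as the probability that a randomly chosen patient has a higher score than a randomly chosen control. A minimal sketch of that calculation follows; the function name `auc_from_scores` and the example scores are hypothetical, and the source does not state which software was used to compute the AUC.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 when the case scores higher than the control,
    0.5 when the two scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, for illustration only.
cases = [1.2, 0.8, 1.7, 1.0]
controls = [0.7, 1.0, 0.9, 1.3]
print(auc_from_scores(cases, controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so values around 0.6 indicate modest discrimination from genotype data alone.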
Example 5 model-derived risk stratification constructed according to the invention
We assigned values to the risk and reference genotypes according to the genetic model of each SNP and its coefficient in the risk prediction model (Table 4), calculated the magnitude of the risk of developing the tumor, and obtained the distribution of colorectal cancer onset risk in the cohort populations involved in the research and development stage of the invention (Table 5). Risk scores were stratified by quartile, with group 1 (risk score below the 25th percentile) as the reference. In the model-construction cohort and the three validation cohorts, the colorectal cancer onset risk increased with increasing risk score. The relative risk of onset in group 2 was slightly increased, 1.22-1.61 times that of group 1. The relative risk in group 3 was moderately increased, 1.73-2.00 times that of group 1. The relative risk was highest in group 4, more than twice that of group 1 (2.12-2.90 times).
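The quartile stratification can be sketched as follows. Two points are assumptions, since the source does not specify them: the quartile cut points are taken from the control scores, and the relative risk of each group is expressed as an odds ratio against group 1. The function names and example scores are hypothetical.

```python
from statistics import quantiles


def quartile_group(score, cuts):
    """Return group 1-4 given the three ascending quartile cut points."""
    return 1 + sum(score > cut for cut in cuts)


def odds_ratios_by_quartile(case_scores, control_scores):
    """Odds ratio of each quartile group relative to group 1."""
    # Assumption: cut points are the 25th/50th/75th percentiles of controls.
    cuts = quantiles(control_scores, n=4)
    cases = [0, 0, 0, 0]
    controls = [0, 0, 0, 0]
    for score in case_scores:
        cases[quartile_group(score, cuts) - 1] += 1
    for score in control_scores:
        controls[quartile_group(score, cuts) - 1] += 1
    if 0 in controls or cases[0] == 0:
        raise ValueError("every group needs controls, and group 1 needs cases")
    reference_odds = cases[0] / controls[0]
    return [(cases[i] / controls[i]) / reference_odds for i in range(4)]


# Hypothetical scores, for illustration only.
example_controls = [1, 2, 3, 4, 5, 6, 7, 8]
example_cases = [1, 3, 3, 5, 5, 5, 7, 7, 7, 7]
print(odds_ratios_by_quartile(example_cases, example_controls))
```

Group 1 always has an odds ratio of 1 by construction, so a rising sequence across groups 2 to 4 reflects risk increasing with the score.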
TABLE 4 The 19 SNPs included in the model of the present invention and their contributions to the model
TABLE 5 Distribution of colorectal cancer onset risk in the cohort populations involved in the present invention
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (10)
1. An SNP locus combination for colorectal cancer onset risk prediction, characterized by comprising rs10411210, rs10774214, rs10795668, rs10936599, rs11903757, rs12603526, rs12953717, rs1321311, rs16969681, rs1801133, rs2423279, rs3802842, rs4813802, rs6061231, rs6469656, rs647161, rs69877267, rs704017 and rs7315438.
2. Use of the combination of SNP sites according to claim 1 for constructing a model or system for predicting colorectal cancer onset risk.
3. Use of the SNP site combination according to claim 1 for preparing a kit for predicting colorectal cancer onset risk.
4. A model for predicting colorectal cancer onset risk, characterized in that the genotype data of the SNP locus combination obtained from the sample to be tested are used for prediction by the following calculation formula: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = 0.396; exp denotes the natural exponential function; SNP1, SNP2, …, SNP19 are assigned values according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2, …, β19 take the values given in the coefficient column of Table 4.
5. A colorectal cancer onset risk prediction system, comprising:
(1) a data acquisition module, for collecting SNP genotype data of a sample to be predicted, the SNPs being the SNP site combination of claim 1;
(2) a score calculation module, for analyzing the data from (1) and entering them into a scoring system to compute a risk score;
(3) a risk comparison module, for comparing the risk score against the risk stratification criteria and giving a risk judgment result; and
(4) a risk result display module, for displaying the risk judgment result.
6. The colorectal cancer onset risk prediction system of claim 5, wherein the specific method of step (2) is: substituting the data from step (1) into the calculation formula of the scoring system to calculate the onset risk score; the calculation formula is: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = 0.396; exp denotes the natural exponential function; SNP1, SNP2, …, SNP19 are assigned values according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2, …, β19 take the values given in the coefficient column of Table 4.
7. The system for predicting colorectal cancer onset risk according to claim 5, wherein the risk stratification criteria in step (3) are: when Score > 1.6, the sample is judged to belong to the high-risk group, i.e. the relative risk is 3 times or more that of the general population; when 0.9 < Score ≤ 1.6, it is judged to belong to the intermediate-risk group, i.e. the relative risk is 1.5-3 times; when Score ≤ 0.9, it is judged to belong to the low-risk group, i.e. the relative risk is less than 1.5 times.
8. A kit for predicting colorectal cancer onset risk, comprising a reagent for detecting the combination of SNP sites according to claim 1, and the prediction system according to any one of claims 5 to 7.
9. The kit according to claim 8, wherein the reagent for detecting the SNP site combination according to claim 1 is a PCR specific recognition primer and an extension primer, and the sequences of the primers are shown as SEQ ID NO: 20-76 in sequence.
10. The use of the SNP site combination according to claim 1, the model for predicting colorectal cancer onset risk according to claim 4, the prediction system according to claim 5 or the kit according to any one of claims 8 to 9 for colorectal cancer screening, or for preparing a colorectal cancer screening product.
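The risk comparison and display modules of claims 5 and 7 can be sketched as follows, assuming the Score has already been calculated. The thresholds 0.9 and 1.6 are those of claim 7; the function names, the group labels, and the report format are hypothetical.

```python
LOW_CUTOFF = 0.9   # Score <= 0.9: low risk (claim 7)
HIGH_CUTOFF = 1.6  # Score > 1.6: high risk (claim 7)


def classify_risk(score):
    """Risk comparison module: map a Score to its risk group."""
    if score > HIGH_CUTOFF:
        return "high risk"
    if score > LOW_CUTOFF:
        return "intermediate risk"
    return "low risk"


def format_result(sample_id, score):
    """Risk result display module: one-line report for a sample."""
    return f"{sample_id}: Score={score:.2f}, {classify_risk(score)}"


# Hypothetical sample identifiers and scores, for illustration only.
for sample_id, score in [("S1", 0.85), ("S2", 1.25), ("S3", 1.90)]:
    print(format_result(sample_id, score))
```

The boundary values follow the claim wording: a Score of exactly 0.9 is low risk and a Score of exactly 1.6 is intermediate risk.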
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110753257.4A CN114334142A (en) | 2021-07-02 | 2021-07-02 | SNP (Single nucleotide polymorphism) locus combination for colorectal cancer morbidity risk prediction, morbidity risk prediction model and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114334142A (en) | 2022-04-12 |
Family
ID=81044294
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110753257.4A (Pending) CN114334142A (en) | 2021-07-02 | 2021-07-02 | SNP (Single nucleotide polymorphism) locus combination for colorectal cancer morbidity risk prediction, morbidity risk prediction model and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114334142A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117210568A (en) * | 2023-10-30 | 2023-12-12 | 云南省肿瘤医院(昆明医科大学第三附属医院) | SNP marker for detecting familial hereditary colorectal cancer and application thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105821147A (en) * | 2016-05-26 | 2016-08-03 | 成都中创清科医学检验所有限公司 | Primer and method for detecting rectal-cancer-susceptibility-related SNP site |
CN107557460A (en) * | 2017-10-20 | 2018-01-09 | 武汉赛云博生物科技有限公司 | Method based on nucleic acid mass-spectrometric technique detection colorectal cancer driving gene and susceptible SNP |
CN109072308A (en) * | 2016-01-28 | 2018-12-21 | 墨尔本大学 | For assessing the method for suffering from Risk of Colorectal Cancer |
CN111394466A (en) * | 2020-04-29 | 2020-07-10 | 盐城师范学院 | Kit for detecting susceptibility of colorectal cancer |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109072308A (en) * | 2016-01-28 | 2018-12-21 | 墨尔本大学 | For assessing the method for suffering from Risk of Colorectal Cancer |
US20190161802A1 (en) * | 2016-01-28 | 2019-05-30 | The University Of Melbourne | Methods for assessing risk of developing colorectal cancer |
CN105821147A (en) * | 2016-05-26 | 2016-08-03 | 成都中创清科医学检验所有限公司 | Primer and method for detecting rectal-cancer-susceptibility-related SNP site |
CN107557460A (en) * | 2017-10-20 | 2018-01-09 | 武汉赛云博生物科技有限公司 | Method based on nucleic acid mass-spectrometric technique detection colorectal cancer driving gene and susceptible SNP |
CN111394466A (en) * | 2020-04-29 | 2020-07-10 | 盐城师范学院 | Kit for detecting susceptibility of colorectal cancer |
Non-Patent Citations (1)
Title |
---|
CAI-YUN HE ET AL.: "Performance of common genetic variants in risk prediction for colorectal cancer in Chinese: A two-stage and multicenter study", GENOMICS, vol. 113, 28 February 2021 (2021-02-28), pages 868 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||