CN114334142A

CN114334142A - SNP (Single nucleotide polymorphism) locus combination for colorectal cancer morbidity risk prediction, morbidity risk prediction model and system

Info

Publication number: CN114334142A
Application number: CN202110753257.4A
Authority: CN
Inventors: 徐瑞华; 何彩云; 陈乐宗
Original assignee: Sun Yat Sen University Cancer Center
Current assignee: Sun Yat Sen University Cancer Center
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2022-04-12

Abstract

The invention provides a colorectal cancer morbidity risk prediction model and a colorectal cancer morbidity risk prediction system. Based on research into colorectal cancer susceptibility gene screening results in the Chinese population, the invention constructs a colorectal cancer morbidity risk prediction system and a kit that use the typing information of 19 SNP loci, and can predict and indicate a subject's degree of risk of developing colorectal cancer. In onset risk prediction across different populations, the system shows stable performance, good repeatability and high reliability, and the detection technology is convenient. It makes it possible to evaluate the onset risk level of average-risk populations and, without age restriction or consideration of gender differences, to carry out large-scale population surveys and screen for populations at high risk of colorectal cancer, thereby enabling risk prediction and early warning of colorectal cancer and improving the early diagnosis rate. The method can serve as the first step of an individualized early screening and prevention strategy for colorectal cancer in China, helps identify individuals at high risk of colorectal cancer, and has the prospect of being recommended as an auxiliary means for colorectal cancer population screening programs.

Description

SNP (Single nucleotide polymorphism) locus combination for colorectal cancer morbidity risk prediction, morbidity risk prediction model and system

Technical Field

The invention belongs to the technical field of biomedicine. More particularly, the invention relates to an SNP locus combination for colorectal cancer onset risk prediction, a colorectal cancer onset risk prediction model and a colorectal cancer onset risk prediction system.

Background

Globally, colorectal cancer ranks 3rd in incidence and 2nd in mortality among malignant tumors. With socioeconomic development and changes in dietary structure, China has become a high-incidence region for colorectal cancer worldwide. Colorectal cancer is characterized by marked genetic heterogeneity, phenotypic complexity and ethnic differences. Genetic factors can fundamentally promote the occurrence of colorectal cancer. Hereditary colorectal cancer accounts for about 5% of all colorectal cancer and is mainly caused by rare germline mutations in APC and the DNA mismatch repair genes (MSH2, MSH6, MLH1, PMS2). The remaining genetic risk of colorectal cancer may result from the cumulative effects of common genetic variants.

For cancer, prevention matters more than treatment. Primary prevention of tumors, such as screening for mutations in genetic susceptibility genes, risk assessment and preventive intervention, has already shown significant effect in the prevention and control of breast cancer and ovarian cancer, and has been approved and recommended in some developed countries in Europe and America as a population-level preventive measure that most effectively improves survival and reduces mortality in patients with the related tumors.

Population screening for colorectal cancer can identify high-risk groups and improve the early diagnosis rate, and is one of the key strategies for improving the prevention and treatment of colorectal cancer. Colorectal cancer prevention and control in China has improved to some extent over the past decades, but an obvious gap remains compared with developed countries in Europe and America. At present, improvement of the early diagnosis rate in clinical practice depends mainly on greater personal health awareness and on findings from voluntary physical examinations. If a novel screening means could be developed to identify high-risk populations by predicting colorectal cancer onset risk, it could guide individualized health management and tumor prevention in those populations and enable early tumor screening.

China has already carried out colorectal cancer screening in individual provinces and cities. At present, large-scale community screening for colorectal cancer in China first identifies high-risk groups by questionnaire and then performs colonoscopy on those groups. Because China's population is large, using colonoscopy directly for population-wide screening would consume a large amount of manpower, material and financial resources, and colonoscopy carries a certain risk of complications. Moreover, no existing screening program includes assessment of colorectal cancer genetic susceptibility genes, although the role of genetic factors in tumorigenesis cannot be neglected. For some common malignant tumors, predicting an individual's disease risk by detecting the susceptibility genes the individual carries has achieved a certain effect. For example, under standardized screening and prevention, the incidence of breast cancer is steadily decreasing in developed countries in Europe and America, and the proportion of breast cancer patients at middle and advanced stages at initial diagnosis has fallen to about 9%. In China, the incidence of colorectal cancer is rising, and most patients are already at middle or late stages when diagnosed, so the optimal opportunity for treatment is lost. Therefore, colorectal cancer onset risk assessment is of positive and important significance for colorectal cancer prevention and treatment in China, and can be an effective and feasible cancer prevention strategy for individualized prevention and intervention, particularly for individuals at high onset risk in cancer families.

In China, there is as yet no feasible genetic model for predicting colorectal cancer onset risk in the general population, and the screening performance of such a model in the population remains unclear.

Disclosure of Invention

The invention aims to overcome the shortcomings of existing colorectal cancer screening and morbidity risk prediction techniques, and to develop a new means of risk stratification for average-risk groups in China, so that high-risk groups can be identified and then undergo fine screening for colorectal cancer.

The invention aims to provide an SNP locus combination for colorectal cancer onset risk prediction.

Another objective of the invention is to provide a colorectal cancer onset risk prediction system (model).

The invention also aims to provide a colorectal cancer onset risk prediction kit.

The above purpose of the invention is realized by the following technical scheme:

The invention carries out an external validation study of 75 common genetic variants identified by existing colorectal cancer genome-wide association studies (GWAS), and uses the least absolute shrinkage and selection operator (LASSO) statistical method to comprehensively analyze them and construct a colorectal cancer risk prediction model containing 19 variant sites.

Namely, the invention provides an SNP locus combination for colorectal cancer onset risk prediction, which comprises rs10411210, rs10774214, rs10795668, rs10936599, rs11903757, rs12603526, rs12953717, rs1321311, rs16969681, rs1801133, rs2423279, rs3802842, rs4813802, rs6061231, rs6469656, rs647161, rs69877267, rs704017 and rs7315438.

The application of the SNP locus combination in constructing a colorectal cancer onset risk prediction model or system and the application in preparing a colorectal cancer onset risk prediction kit also belong to the protection scope of the invention.

Based on the above, the application provides a colorectal cancer onset risk prediction model based on the above 19 variant sites. Prediction is performed on the genotype data of the above SNP site combination obtained from the sample to be tested, using the following calculation formula: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = 0.396; exp denotes the natural exponential function; SNP1, SNP2 … SNP19 are assigned according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2 … β19 take the values in the coefficient column of Table 4.

Based on this, a colorectal cancer onset risk prediction system includes:

(1) a data acquisition module: used for collecting SNP genotype data of a sample to be predicted, the SNPs being the above SNP locus combination;

(2) a score calculation module: used for analyzing the data from (1) and entering them into a scoring system (i.e., the morbidity risk prediction model) to calculate a risk score;

(3) a risk comparison module: used for comparing the risk score against the risk stratification standard and giving a risk judgment result;

(4) a risk result display module: used for displaying the risk judgment result.

The specific method of step (2) is as follows: the data from step (1) are entered into the calculation formula of the scoring system to calculate the morbidity risk score. The calculation formula is: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = 0.396; exp denotes the natural exponential function; SNP1, SNP2 … SNP19 are assigned according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2 … β19 take the values in the coefficient column of Table 4.

The risk stratification standard in step (3) is as follows: when Score > 1.6, the subject is judged to belong to the high-risk group, i.e. the relative risk is 3 times or more that of the general population; when 0.9 < Score ≤ 1.6, the subject is judged to belong to the medium-risk group, i.e. the relative risk is 1.5 to 3 times; when Score ≤ 0.9, the subject is judged to belong to the low-risk group, i.e. the relative risk is within 1.5 times.

Examples: The SNP detection results of patient A are: rs10936599: TC; rs6061231: CA; rs10774214: CC; rs10795668: GA; rs11903757: TC; rs12603526: TC; rs1321311: GG; rs2423279: TC; rs3802842: CA; rs4813802: TT; rs6469656: GG; rs647161: CC; rs704017: AA; rs7315438: CC; rs10411210: CC; rs12953717: CC; rs16969681: TT; rs1801133: AA; rs69883267: TG. The risk score is 0.56, and the patient is judged to belong to the low-risk group.

The SNP detection results of patient B are: rs10936599: TC; rs6061231: CA; rs10774214: CC; rs10795668: AA; rs11903757: TT; rs12603526: TC; rs1321311: TG; rs2423279: TC; rs3802842: CA; rs4813802: TT; rs6469656: GG; rs647161: CA; rs704017: GA; rs7315438: CC; rs10411210: CC; rs12953717: CC; rs16969681: CC; rs1801133: AG; rs69883267: TG. The risk score is 1.14, and the patient is judged to belong to the medium-risk group.

The SNP detection results of patient C are: rs10936599: TC; rs6061231: CA; rs10774214: TC; rs10795668: AA; rs11903757: TC; rs12603526: TC; rs1321311: TG; rs2423279: TT; rs3802842: CA; rs4813802: TG; rs6469656: GA; rs647161: CA; rs704017: GA; rs7315438: TC; rs10411210: CC; rs12953717: TC; rs16969681: CC; rs1801133: AA; rs69883267: TT. The risk score is 1.97, and the patient is judged to belong to the high-risk group.
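The scoring formula and the stratification thresholds stated above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the coefficients and genotype scores of Table 4 are not reproduced here, so the numeric inputs in the first example are placeholders rather than model values, and the intercept is passed explicitly because the text prints it both with and without a minus sign.

```python
import math

def risk_score(genotype_scores, betas, beta0):
    """Score = exp(beta0 + sum(beta_i * SNP_i)), as in the formula above."""
    if len(genotype_scores) != len(betas):
        raise ValueError("one coefficient is needed per SNP")
    linear = beta0 + sum(b * g for b, g in zip(betas, genotype_scores))
    return math.exp(linear)

def stratify(score):
    """Apply the stated thresholds: >1.6 high, (0.9, 1.6] medium, <=0.9 low."""
    if score > 1.6:
        return "high"
    if score > 0.9:
        return "medium"
    return "low"

# Placeholder inputs for illustration only; NOT the Table 4 values.
placeholder_betas = [0.10, -0.20, 0.05]
placeholder_genotypes = [1, 0, 2]
placeholder_score = risk_score(placeholder_genotypes, placeholder_betas, beta0=0.0)

# The three risk scores reported for patients A, B and C in the text.
patient_groups = [stratify(s) for s in (0.56, 1.14, 1.97)]
```

Applied to the three reported scores, the thresholds give the low, medium and high tiers in that order.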

With the colorectal cancer onset risk prediction system provided by the invention, the area under the curve (AUC) for predicting colorectal cancer onset risk is 0.61 in the training population and 0.59-0.61 in three independent validation populations from southern, eastern and northern China. Based on the prediction model of the invention, the risk of developing colorectal cancer in individuals in the highest quartile of risk score is more than 2 times (2.12-2.90 times) that of individuals in the lowest quartile. The method performs stably in colorectal cancer onset risk prediction across different populations, uses a convenient detection technology, makes it possible to evaluate the onset risk level of average-risk populations, can serve as the first step of an individualized early screening and prevention strategy for colorectal cancer in China, and helps identify individuals at high risk of colorectal cancer.

In addition, the invention provides a colorectal cancer onset risk prediction kit, which comprises reagents for detecting the above SNP site combination and the above prediction system.

As an alternative preferred scheme, the reagents are PCR specific recognition primers and extension primers, whose sequences are shown in SEQ ID NO: 20-76.

As an optional preferred scheme, the kit further comprises reagents required by PCR detection, wherein the reagents required by PCR detection comprise Taq enzyme, dNTP mixed liquor, DNA sample diluent and buffer solution.

As an alternative preferred scheme, the PCR reaction system is as follows:

As an alternative preferred embodiment, the PCR reaction procedure is as follows:

If the typing result is not ideal, the PCR reaction is optimized by adding the following conditions:

The colorectal cancer morbidity risk prediction kit provided by the invention can simultaneously detect the common variant sites of 19 colorectal cancer susceptibility genes, and can provide a reference for assessing an individual's colorectal cancer risk, for population surveys in areas of high colorectal cancer incidence, for screening populations at high onset risk, and for taking corresponding preventive measures.

Therefore, the application of the above-mentioned SNP site combination, the prediction model, the prediction system, and the kit in colorectal cancer screening, or in the preparation of colorectal cancer screening products, also fall within the scope of the present invention.

The colorectal cancer risk prediction model is robust and has the potential to help clinicians determine which subjects have higher genetic susceptibility. Individuals in the high-risk group (the 75%-100% range of the risk distribution) should undergo colonoscopy, which can improve the efficiency of fine screening and early screening for colorectal cancer. Compared with current gene-based cancer risk models applicable to Western populations, the model achieves a considerably higher level of performance in screening for colorectal cancer onset risk. Colorectal cancer screening programs currently conducted in China include only risk assessment questionnaires and fecal occult blood tests, followed by a recommendation of colonoscopic screening for high-risk groups. Adding population genetic susceptibility screening to assess onset risk makes stratified risk assessment programs more applicable to the general population, allowing future cancer risk to be predicted for asymptomatic individuals. The screening results of such programs can further motivate behavioral changes, so that people prevent colorectal cancer through a healthier lifestyle.

By comparison, a small number of studies have developed colorectal cancer onset risk prediction models based on genetic variants in European populations. For example, one study (Hsu Li, Jeon Jihyoun, Brenner Hermann, et al. A Model to Determine Colorectal Cancer Risk Using Common Genetic Susceptibility Loci [J]. Gastroenterology, 2015, 148(7): 1330-1339.) constructed a prediction model containing 27 SNPs, with an AUC of 0.55. Another study (Dunlop M G, Tenesa A, Farrington S M, et al. Cumulative impact of common genetic variants and other risk factors on colorectal cancer risk in 42103 individuals [J]. Gut, 2013, 62(6): 871-81.) reported a prediction model containing 10 SNPs with an AUC of 0.57, while the AUC of its model including genotype, age, gender and family history was 0.59. A weighted genetic risk score (G-score) based on 63 SNPs suggested that the colorectal cancer risk of the high-risk population was 1.30 to 1.34 times that of the reference. The AUC of the risk prediction model established by the invention reaches at least 0.59 (range 0.59 to 0.61), and risk stratification indicates that the colorectal cancer risk of the high-risk group is more than 2 times that of general individuals. The risk prediction model established by the invention has undergone two-stage analysis in four independent cohorts, with internal and external validation, and its discriminative ability is stable. This stability is also one of the decisive conditions for potential clinical and population-level application in colorectal cancer onset risk screening.

The invention has the following beneficial effects:

The invention provides a colorectal cancer morbidity risk prediction model constructed from the typing information of 19 single nucleotide polymorphism (SNP) loci, selected on the basis of earlier research into colorectal cancer susceptibility gene screening results in the Chinese population, with primer sequences specifically screened for each locus. The kit provided by the invention performs genotyping on genomic DNA from the subject's peripheral blood, detects the single nucleotide sites by the competitive allele-specific PCR (KASP) method, and estimates the subject's degree of risk of colorectal cancer with a genetic risk scoring model. This overcomes the shortcoming that colorectal cancer onset risk is currently judged only by fecal occult blood testing or by the results of colonoscopy and similar examinations, which are few in number and have yet to be deployed on a large scale.

The model provided by the invention places no age limit on the screened population and does not consider gender differences. It can be applied at any age of the subject to indicate the subject's degree of risk of colorectal cancer, can be used for large-scale population surveys and for screening populations at high risk of colorectal cancer, enables risk prediction and early warning of colorectal cancer, improves the early diagnosis rate, and has the prospect of being recommended as one of the auxiliary means for colorectal cancer screening programs.

The evaluation of colorectal cancer genetic susceptibility through genetic variant detection is a novel auxiliary screening means for high-risk populations. The invention is a genetic prediction model of colorectal cancer onset risk established on large-sample population data from China. The invention screens the susceptibility sites discovered by existing international genome-wide association studies (GWAS) and integrates those closely related to colorectal cancer onset risk in the Chinese population. Compared with current gene-based colorectal cancer onset risk prediction models for Western populations, the risk prediction model of the invention has higher discriminative ability.

Repeatability and stability are measures of the reliability of a genetic prediction model. They are particularly important when the same tumor onset risk prediction model is applied to different populations, and are among the important factors determining whether the method can be popularized across populations. In previous work, detecting only a single site or a few sites in samples limited the application of colorectal cancer susceptibility genes in early tumor screening practice. In the invention, besides the cohort data used to construct the genetic model, the reliability of the developed model was established through one internal validation cohort and two external validation cohorts. The validation cohorts came from southern, eastern and northern China respectively; populations from these three regions are likely to be reasonably representative of China and meet the requirement for evaluating the external predictive validity of the invention. Good repeatability is a prerequisite for popularizing and applying the method in colorectal cancer genetic susceptibility screening of different populations in China. The scoring system derived from the prediction model can serve as one of the screening tools for establishing an individualized CRC prevention strategy in China. In the scoring system, the risk in the upper quartile is at least twice that of the general population, which makes it possible to divide the average-risk population into different risk levels according to the risk prediction model. The invention may serve as the first step of a personalized prevention strategy for colorectal cancer in China.

Drawings

FIG. 1 shows the binary logistic regression model using the least absolute shrinkage and selection operator (LASSO) in accordance with the present invention. (A): Selection of the LASSO tuning parameter (λ) using 10-fold cross-validation with the minimum criterion. The area under the receiver operating characteristic curve (AUC) is plotted against λ. The dashed line is drawn at the optimal value given by 1 standard error of the minimum criterion (the 1-SE criterion). By 10-fold cross-validation, the λ value was 0.012 (1-SE criterion). (B): LASSO logistic regression coefficients of the 75 SNPs. The coefficient profile is plotted against λ. The vertical dashed line is drawn at the optimal value given by the 1-SE criterion, resulting in 19 non-zero coefficients (highlighted).

FIG. 2 shows the SNP locus association analysis performed in the model construction cohort studied by the inventors; the 19 selected loci are associated with colorectal cancer onset.

FIG. 3 shows the area under the receiver operating characteristic curve (AUC) of the colorectal cancer onset risk prediction model based on the 19 SNPs of the invention. SYSUCC-training is the result for the model construction cohort from southern China, SYSUCC-training is the result for the internal validation cohort from southern China, FUSCC-validation is the result for the external validation cohort from eastern China, and CMU-validation is the result for the external validation cohort from northern China.

Detailed Description

The model of the invention requires the typing results of the 19 relevant SNPs. SNP typing may be performed by Sanger sequencing, time-of-flight mass spectrometry, the TaqMan probe method, the competitive allele-specific PCR (KASP) method and the like. The KASP method improves analysis throughput and sensitivity and achieves a higher detection rate with smaller amounts of sample and reagent and at lower cost, giving it obvious advantages in throughput, sensitivity and flexibility. The invention is further described with reference to the drawings and the following detailed description, which are not intended to limit the invention in any way. Reagents, methods and apparatus used in the present invention are conventional in the art unless otherwise indicated.

Unless otherwise indicated, reagents and materials used in the following examples are commercially available.

Example 1 SNP site combination related to model for predicting colorectal cancer onset risk and model construction

Starting from 75 key susceptibility variant sites discovered by international colorectal cancer GWAS research (Table 1), and based on association analysis of a large sample of Chinese colorectal cancer patients and healthy controls (a southern China cohort of 3049 colorectal cancer patients and 2557 healthy control individuals), the 75 SNPs were fitted using the LASSO statistical method. The model was trained with 10-fold cross-validation, with the test data set randomly selected each time.
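The description of FIG. 1 states that the tuning parameter λ was chosen by a 1-SE criterion over 10-fold cross-validation. One common reading of that criterion, sketched below in Python, picks the largest λ whose mean cross-validated AUC lies within one standard error of the best mean AUC. The function name is hypothetical and the cross-validation summary is made up for illustration; it is not data from the invention, and the source does not state which software was used for the fit.

```python
def select_lambda_1se(lambdas, mean_auc, se_auc):
    """Return the largest lambda whose mean CV AUC is within 1 SE of the best.

    lambdas, mean_auc and se_auc are parallel lists: one candidate penalty,
    its mean AUC over the folds, and the standard error of that mean.
    """
    best = max(range(len(lambdas)), key=lambda i: mean_auc[i])
    threshold = mean_auc[best] - se_auc[best]
    eligible = [lam for lam, auc in zip(lambdas, mean_auc) if auc >= threshold]
    # A larger penalty gives a sparser model, so take the largest eligible one.
    return max(eligible)

# Made-up cross-validation summary, for illustration only.
toy_grid = [0.001, 0.005, 0.02, 0.08]
toy_mean_auc = [0.612, 0.613, 0.608, 0.590]
toy_se_auc = [0.006, 0.006, 0.006, 0.007]
toy_choice = select_lambda_1se(toy_grid, toy_mean_auc, toy_se_auc)
```

Preferring the larger penalty trades a small loss in cross-validated AUC for fewer non-zero coefficients, which is consistent with a 75-SNP candidate set being reduced to a smaller panel.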

TABLE 1 The 75 GWAS-derived SNPs on which the prediction model of the present invention is based

The invention fitted models containing different numbers of SNPs during the analysis and, through comprehensive analysis, finally obtained an optimized SNP site combination of 19 SNPs for colorectal cancer onset risk prediction and the corresponding model (FIG. 1). The model formula is Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = -0.396; exp denotes the natural exponential function; SNP1, SNP2 … SNP19 are assigned according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2 … β19 take the values in the coefficient column of Table 4. The sequences of the 19 SNP sites are shown in Table 2.

The effect of 19 SNPs in the model on the risk of colorectal cancer onset is shown in figure 2.

When the effect of the risk allele on colorectal cancer onset risk is calculated, each single SNP in the model carries a 1.13- to 1.63-fold risk compared with the reference genotype. In the training phase, the area under the receiver operating characteristic curve (AUC) of the 19-SNP genetic model was 0.61 (FIG. 3).
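For readers unfamiliar with the AUC figure quoted here, it can be read as the probability that a randomly chosen patient has a higher risk score than a randomly chosen control. The short Python sketch below computes it by direct pairwise comparison; the function name and the toy scores are illustrative assumptions, not the procedure or data used in the invention.

```python
def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as half a win, which matches the usual rank-based AUC.
    """
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy scores, for illustration only.
toy_cases = [1.9, 1.2, 0.8]
toy_controls = [1.0, 0.8, 0.5]
toy_auc = auc(toy_cases, toy_controls)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so a value near 0.6 indicates modest but non-random discrimination.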

TABLE 2 sequences of 19 SNP sites significantly related to colorectal cancer onset determined by the present invention

Example 2 colorectal cancer onset risk prediction kit

A kit for colorectal cancer onset risk prediction, comprising: PCR amplification primers and single-base extension primers for detecting 19 SNP sites (19 SNP sites shown in Table 2) of the human genome.

The specific kit is prepared as follows:

1. Design and synthesize PCR specific recognition primers and extension primers for the SNP sites, as shown in Table 3.

TABLE 3 PCR amplification primers (Primer Allele_FAM, Primer Allele_HEX) for specific recognition of the two alleles, and common extension primers (Primer Common), for the 19 SNP sites to be tested

2. Construction of the kit

Other components of the kit include Taq enzyme, dNTP mixed solution, diluent, buffer solution and the like; for details, see the detection method below.

3. The detection method comprises the following steps:

(1) DNA sample extraction

Genomic DNA is extracted from peripheral blood using the Qiagen DNA midi kit (100) or a similar product, dried at 65 ℃ for 30 min, quantified with a spectrophotometer, and stored at -20 ℃ until use.

(2) DNA sample dilution and addition (96-well reaction System)

The total reaction volume of the 96-well plate reaction system is 10 μL, and the sample concentration in the reaction system is 8-10 ng/μL, i.e. the total amount of sample in each well is about 80-100 ng. Sample corresponding to this total amount is added to a 96-well PCR reaction plate and centrifuged.

(3) Construction of PCR reaction System

The total amount of DNA added is 80-100 ng.

Plate format: 96-well plate (volumes per 1 reaction)
Component	Volume
2× KASP master mix	5 μL
72× assay mix	0.14 μL
DNA sample	4.86 μL
H₂O	0 μL
Total	10 μL
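The per-well quantities above imply some simple arithmetic, sketched below in Python as a plausibility check. The volumes and the 8-10 ng/μL range come from the text; the function names are hypothetical, and the derived stock concentration is only what these numbers imply, not a value stated in the source.

```python
REACTION_VOLUME_UL = 10.0  # total volume per well, from the text
DNA_VOLUME_UL = 4.86       # DNA sample volume per well, from the table

def dna_mass_ng(final_conc_ng_per_ul, reaction_volume_ul=REACTION_VOLUME_UL):
    """Total DNA per well for a given concentration in the final reaction."""
    return final_conc_ng_per_ul * reaction_volume_ul

def required_stock_conc(total_ng, dna_volume_ul=DNA_VOLUME_UL):
    """Stock concentration needed to deliver total_ng in the DNA volume."""
    return total_ng / dna_volume_ul

# 8-10 ng/uL in a 10 uL reaction corresponds to 80-100 ng per well.
low_ng, high_ng = dna_mass_ng(8.0), dna_mass_ng(10.0)
stock_low = required_stock_conc(low_ng)    # about 16.5 ng/uL
stock_high = required_stock_conc(high_ng)  # about 20.6 ng/uL

# The listed component volumes should add up to the total reaction volume.
component_sum = 5.0 + 0.14 + DNA_VOLUME_UL + 0.0
```

Under these numbers, a diluted DNA sample of roughly 16.5-20.6 ng/μL would deliver 80-100 ng in the 4.86 μL sample volume.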

(4) Seal the 96-well PCR reaction plate, then shake and centrifuge it so that the reaction system is uniformly mixed.

(5) After centrifugation, PCR was carried out under the following reaction conditions:

(6) After the reaction, read the plate on a PHERAstar microplate reader to obtain the fluorescent signal of the reaction plate.

(7) If the typing result is not ideal, the PCR reaction is optimized by adding the following conditions:

Example 3 internal verification of prediction efficiency of prediction model constructed by the invention

The prediction performance of the model constructed by the invention was validated in an independent validation cohort from southern China comprising 3017 colorectal cancer patients and 2488 healthy control individuals. The AUC for predicting colorectal cancer onset risk was 0.59 (FIG. 3), indicating that the model has good internal validity.

Example 4 external verification of prediction efficiency of prediction model constructed by the invention

The prediction model constructed by the invention was further validated in two independent cohorts from northern and eastern China. The northern China cohort comprised 1242 colorectal cancer patients and 1027 healthy control individuals, and the eastern China cohort comprised 1414 colorectal cancer patients and 1152 healthy control individuals. The AUCs for colorectal cancer onset risk prediction were 0.59 and 0.61 respectively, showing that the prediction model has good external validity (FIG. 3).

Example 5 model-derived risk stratification constructed according to the invention

We assigned the risk and reference genotypes according to the genetic model of each SNP and its coefficient in the risk prediction model (Table 4) and calculated the magnitude of the onset risk. The distribution of colorectal cancer onset risk in the cohort populations involved in the research and development stage of the invention is shown in Table 5. Risk scores were stratified by quartile, with group 1 (risk scores below the 25th percentile) as the reference. In the model construction cohort and the three validation cohorts, colorectal cancer onset risk increased with increasing risk score. The relative onset risk in group 2 was slightly elevated, 1.22-1.61 times that of group 1. The relative onset risk in group 3 was moderately elevated, 1.73-2.00 times that of group 1. The relative onset risk was highest in group 4, more than 2 times (2.12-2.90 times) that of group 1.
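The quartile comparison described above can be sketched as follows in Python. Several points are assumptions made for illustration: the cut points are taken from the control scores, the relative risk is approximated by an odds ratio against group 1 (the usual estimate in a case-control design), and the function names and toy scores are hypothetical. The source does not specify these details.

```python
import statistics

def quartile_group(score, cuts):
    """Return 1-4 according to the three quartile cut points."""
    return 1 + sum(score > c for c in cuts)

def odds_ratios_by_quartile(case_scores, control_scores):
    """Odds ratio of each quartile group relative to group 1."""
    # Assumption: quartile cut points are derived from the control scores.
    cuts = statistics.quantiles(control_scores, n=4)
    cases = [0, 0, 0, 0]
    controls = [0, 0, 0, 0]
    for s in case_scores:
        cases[quartile_group(s, cuts) - 1] += 1
    for s in control_scores:
        controls[quartile_group(s, cuts) - 1] += 1
    # Every group must contain at least one control, and group 1 one case.
    ref_odds = cases[0] / controls[0]
    return [(cases[i] / controls[i]) / ref_odds for i in range(4)]

# Toy scores, for illustration only.
toy_controls = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
toy_cases = [0.5, 0.9, 1.0, 1.3, 1.3, 1.5, 1.7, 1.7, 1.9, 2.0]
toy_ors = odds_ratios_by_quartile(toy_cases, toy_controls)
```

In this toy data the odds ratio rises steadily from group 1 to group 4, which is the qualitative pattern the text reports for the four cohorts.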

TABLE 4 19 SNPs involved in the model studied by the present invention and their contribution in the model

TABLE 5 colorectal cancer incidence Risk size distribution for cohort groups to which the present invention relates

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. An SNP locus combination for colorectal cancer onset risk prediction, characterized by comprising rs10411210, rs10774214, rs10795668, rs10936599, rs11903757, rs12603526, rs12953717, rs1321311, rs16969681, rs1801133, rs2423279, rs3802842, rs4813802, rs6061231, rs6469656, rs647161, rs69877267, rs704017 and rs7315438.

2. Use of the combination of SNP sites according to claim 1 for constructing a model or system for predicting colorectal cancer onset risk.

3. Use of the SNP site combination according to claim 1 for preparing a kit for predicting colorectal cancer onset risk.

4. A model for predicting colorectal cancer onset risk, characterized in that prediction is performed on the genotype data of the SNP locus combination according to claim 1 obtained from the sample to be tested, using the following calculation formula: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = 0.396; exp denotes the natural exponential function; SNP1, SNP2 … SNP19 are assigned according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2 … β19 take the values in the coefficient column of Table 4.

5. A colorectal cancer onset risk prediction system, comprising:

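(1) a data acquisition module: used for collecting SNP genotype data of a sample to be predicted, the SNPs being the SNP locus combination according to claim 1;

(2) a score calculation module: used for analyzing the data from (1) and entering them into a scoring system to calculate a risk score;

(3) a risk comparison module: used for comparing the risk score against the risk stratification standard and giving a risk judgment result;

(4) a risk result display module: used for displaying the risk judgment result.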

6. The colorectal cancer onset risk prediction system of claim 5, wherein the specific method of step (2) is: the data from step (1) are entered into the calculation formula of the scoring system to calculate the morbidity risk score; the calculation formula is: Score = exp(β0 + β1 × SNP1 + β2 × SNP2 + … + β19 × SNP19), where β0 = 0.396; exp denotes the natural exponential function; SNP1, SNP2 … SNP19 are assigned according to the genotype scores of the corresponding SNPs in Table 4; and β1, β2 … β19 take the values in the coefficient column of Table 4.

7. The system for predicting colorectal cancer onset risk according to claim 5, wherein the risk stratification standard in step (3) is: when Score > 1.6, the subject is judged to belong to the high-risk group, i.e. the relative risk is 3 times or more that of the general population; when 0.9 < Score ≤ 1.6, the subject is judged to belong to the medium-risk group, i.e. the relative risk is 1.5 to 3 times; when Score ≤ 0.9, the subject is judged to belong to the low-risk group, i.e. the relative risk is within 1.5 times.

8. A kit for predicting colorectal cancer onset risk, comprising a reagent for detecting the combination of SNP sites according to claim 1, and the prediction system according to any one of claims 5 to 7.

9. The kit according to claim 8, wherein the reagent for detecting the SNP site combination according to claim 1 is a PCR specific recognition primer and an extension primer, and the sequences of the primers are shown as SEQ ID NO: 20-76 in sequence.

10. The use of the SNP site combination according to claim 1, the model for predicting colorectal cancer onset risk according to claim 4, the prediction system according to claim 5 or the kit according to any one of claims 8 to 9 for colorectal cancer screening, or for preparing a colorectal cancer screening product.