CN114283883A - Liver cancer tumor screening model based on molecular marker and application - Google Patents

Info

Publication number
CN114283883A
Authority
CN
China
Prior art keywords
liver cancer
screening
snps
model
snp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111611165.9A
Other languages
Chinese (zh)
Other versions
CN114283883B (en
Inventor
付小斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Huace Aipu Medical Laboratory Co ltd
Original Assignee
First Affiliated Hospital Of Hebei North University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Affiliated Hospital Of Hebei North University filed Critical First Affiliated Hospital Of Hebei North University
Priority to CN202111611165.9A priority Critical patent/CN114283883B/en
Priority to CN202211249896.8A priority patent/CN115719613A/en
Priority to CN202211250017.3A priority patent/CN115762635A/en
Publication of CN114283883A publication Critical patent/CN114283883A/en
Application granted granted Critical
Publication of CN114283883B publication Critical patent/CN114283883B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The application discloses a method for establishing a liver cancer tumor screening model based on molecular markers, which comprises the steps of obtaining an SNP data set associated with liver cancer; based on the data set, screening to obtain SNPs for modeling as model variables; calculating relative risk values of different genotypes of each model variable; and obtaining the tumor screening model based on the relative risk value. The application discloses a liver cancer screening, risk prediction and/or diagnosis method based on a tumor screening model constructed by the method, and a related kit, system, device, computer readable storage medium and equipment, so that early accurate screening of liver cancer of each stage is realized.

Description

Liver cancer tumor screening model based on molecular marker and application
Technical Field
The application relates to the technical field of gene detection, in particular to a liver cancer tumor screening model based on molecular markers and application thereof.
Background
Primary prevention of liver cancer is etiological prevention. The traditional etiological factors of liver cancer mainly include viral infection (hepatitis B virus and hepatitis C virus), exposure to aflatoxin and microcystin, and unhealthy habits such as smoking and alcohol abuse. Non-alcoholic fatty liver disease has been found to have become the main cause of liver cancer in developed countries. In addition, studies have found that diabetes is an independent risk factor for liver cancer, and that the prevalence of liver cancer in the high-BMI population is 5 times higher than that in the normal population. In response to the traditional causes, the country has taken corresponding measures: in 2005, hepatitis B vaccination became free for all newborns; chronic hepatitis B can be effectively controlled after standard treatment; and HCV can be completely eliminated after antiviral treatment of chronic hepatitis C.
Tertiary prevention of liver cancer mainly refers to the improvement of clinical treatment methods and the research and development of novel medicines. The liver cancer diagnosis method currently in common clinical use combines clinical symptoms with imaging examinations, including ultrasound, X-ray computed tomography (CT), magnetic resonance imaging (MRI), digital subtraction angiography (DSA) and nuclear medicine imaging (PET-CT, SPECT-CT), with liver puncture biopsy performed when necessary. These clinical diagnostic methods are insensitive, expensive or invasive, and can hardly meet the needs of early screening of the population. The international health organization recommends that patients with liver cirrhosis undergo B-mode ultrasound and alpha-fetoprotein (AFP) examinations twice a year so as to achieve early diagnosis of liver cancer. However, current research shows that AFP has low sensitivity: the AFP level is not elevated in about 40% of liver cancer patients, and the proportion is even higher among early liver cancer patients. The European Association for the Study of the Liver (EASL) does not recommend AFP as a diagnostic index for liver cancer [19-21]. Ultrasound, as an imaging diagnosis method, depends strongly on the technical level of the doctor; in addition, a tumor can be detected by ultrasound only after it has grown to a certain volume, so the sensitivity for diagnosing early liver cancer, especially small liver cancer, is low. Therefore, there is an urgent need for an objective liver cancer screening method that is noninvasive, easy to popularize and suitable for early diagnosis of liver cancer.
Disclosure of Invention
In view of the above, the present application aims to provide at least one improved liver cancer screening model, so as to realize a noninvasive, easily popularized and suitable early diagnosis method for liver cancer.
In a first aspect, the embodiment of the present application discloses a method for establishing a liver cancer tumor screening model based on molecular markers, comprising:
obtaining an SNP dataset associated with liver cancer;
based on the data set, screening to obtain SNPs for modeling as model variables;
calculating relative risk values of different genotypes of each model variable; and
obtaining the tumor screening model based on the relative risk value.
In the embodiment of the present application, the step of screening SNPs for modeling specifically includes: according to the linkage disequilibrium analysis result of each SNP on the different chromosomes, selecting SNPs whose spacing is within 50 Mb and whose $r^2$ in consecutive analysis is greater than 0.9 as the SNPs for model construction.
In the embodiment of the present application, after the step of screening SNPs for modeling, the method further includes:
obtaining, from the individual effect of each SNP in the data set, the individual effect value and the phenotypic parameter of each SNP locus for the occurrence of liver cancer; wherein the individual effect value is the statistical probability, in the data set, of liver cancer occurring for a single SNP; and the phenotypic parameter is the statistical frequency, in the data set, with which individuals in whom the single SNP is dominantly expressed during inheritance suffer from liver cancer; and
calculating the individual effect value of each individual by Logistic regression analysis, then correcting and weighting it to obtain the genetic score; and
calculating, from the individual effect values, the phenotypic parameters and the genetic scores, the SNPs-based weighted liver cancer risk screening score of each individual, from which the cancer risk of each individual can be judged.
In the examples of the present application, SNPs screened for modeling include at least one of TAGA rs15945924, FBXW rs11744825, RANBP1 rs17033807, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, AT rs10819989, ATP7 rs5251533, MUTY rs4579862.
In the examples of the present application, SNPs screened for modeling included TAGA rs15945924, FBXW rs11744825, RANBP1 rs17033807, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, AT rs10819989, ATP7 rs5251533, MUTY rs4579862.
In the examples of the present application, SNPs screened for modeling included FBXW rs11744825, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, ATP7 rs5251533, MUTY rs4579862.
In the examples of the present application, SNPs screened for modeling included TAGA rs15945924, FBXW rs11744825, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, AT rs10819989, ATP7 rs5251533, MUTY rs4579862.
In a second aspect, embodiments of the present application disclose a method for screening, risk prediction and/or diagnosis of liver cancer, the method comprising the step of using a liver cancer tumor screening model constructed by the construction method of the first aspect.
In a third aspect, the present application discloses a kit for screening, risk prediction and/or diagnosis of liver cancer, which comprises a reagent for detecting genotyping in the tumor screening model constructed according to the construction method of the first aspect.
In a fourth aspect, embodiments of the present application disclose a system or device for liver cancer screening, risk prediction and/or diagnosis, the system or device comprising:
the acquisition module is used for acquiring an SNP data set associated with the liver cancer;
the screening module is used for screening SNPs used for modeling based on the data set to serve as model variables;
the calculation module is used for calculating the relative risk value of different genotypes of each model variable;
the construction module is used for constructing and obtaining the liver cancer tumor screening model based on the relative risk value; and
and the data analysis module is used for inputting the relative risk value of the SNP used for modeling of the individual to be tested into the liver cancer tumor screening model constructed according to the construction method of the first aspect so as to obtain a prediction result.
Compared with the prior art, the application has at least the following beneficial effects:
the liver cancer tumor screening model provided by the application not only provides a tumor screening model construction method capable of obtaining more accurate prediction results, but also provides a new tumor prediction index combination, and achieves a prediction effect superior to that of the prior art. In addition, the method for screening the liver cancer tumor by using the model does not depend on the progress degree of the tumor, has no obvious difference in prediction effect in tumor patients in different stages, is applicable to various stages of the tumor, and can solve the problem that the early and extremely early tumors are difficult to screen.
Drawings
FIG. 1 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 6 provided in the examples of the present application.
FIG. 2 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 12 as provided in the examples of the present application.
FIG. 3 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 5 provided in the examples of the present application.
FIG. 4 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 10 provided in the examples of the present application.
FIG. 5 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 19 provided in the examples of the present application.
FIG. 6 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 20 provided in the examples of the present application.
FIG. 7 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 11 provided in the examples of the present application.
FIG. 8 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 1 provided in the examples of the present application.
FIG. 9 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 7 provided in the examples of the present application.
FIG. 10 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 15 provided in the examples of the present application.
FIG. 11 is a graph of r2 distribution from linkage disequilibrium analysis of 30 SNPs on chromosome X provided in the examples of the present application.
FIG. 12 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 9 provided in the examples of the present application.
FIG. 13 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 17 provided in the examples of the present application.
FIG. 14 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 13 provided in the examples of the present application.
FIG. 15 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 12 provided in the examples of the present application.
FIG. 16 is a r2 distribution graph of linkage disequilibrium analysis of 30 SNPs on chromosome 18 provided in the examples of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. Reagents not individually specified in detail in this application are conventional and commercially available; methods not specifically described in detail are all routine experimental methods and are known from the prior art.
Liver cancer gene data acquisition
1. Data retrieval
(1) Database for search
Literature searches were performed using the NCBI, GenBank, PubMed, EMBASE, Cochrane Library, Web of Science, CNKI (Chinese), Wanfang (Chinese), VIP (Chinese) and CBM (Chinese) databases. To further avoid omitting potential risk factors, the present application also searched relevant bibliographies, such as Novel markers in viral live disease and platelet cancer (author: rode, Anthony Philip; publication date: 2013).
(2) retrieval strategy
The combination of the searched Chinese key words is as follows: "risk factors", "cirrhosis", "liver cancer" or "liver", "cancer" or "tumor"; "polymorphism", "single nucleotide polymorphism" and "genetic variation"; "fusion gene", "clinical index" and "gene molecular marker".
The English keywords for retrieval are combined as follows: "risk factor" and "liver cancer", "hazard" and "liver cancer", "pharmaceutical disease", "pharmaceutical", "tum of cancer", "Neoplasms", "carcinoma", "tumor", "polymorphism", "single nucleotide polymorphism", "SNP", "fused gene", "Clinical index", "Clinical parameters", "variant", "variation", "China", "Chinese", "assay".
(3) Data inclusion criteria
Inclusion criteria: studies with human subjects that analyze the relationship between genetic factors and liver cancer susceptibility by the odds ratio (OR);
Exclusion criteria: meta-analyses or reviews, studies published only in abstract form, a control or case group with a sample size of less than 10, and a control-group minor allele frequency (MAF) of less than 1%.
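The exclusion criteria above can be sketched as a simple predicate. This is a minimal illustration only: the `Study` record and its field names are hypothetical, and it assumes that each listed criterion independently excludes a study.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record for one candidate study (field names are illustrative)."""
    is_meta_or_review: bool  # meta-analysis or review article
    abstract_only: bool      # published only in abstract/summary form
    n_cases: int             # sample size of the case group
    n_controls: int          # sample size of the control group
    control_maf: float       # minor allele frequency in controls, as a fraction


def is_excluded(study: Study) -> bool:
    """Return True if the study meets any of the exclusion criteria."""
    return (
        study.is_meta_or_review
        or study.abstract_only
        or study.n_cases < 10
        or study.n_controls < 10
        or study.control_maf < 0.01
    )
```

A study with 50 cases, 60 controls and a control MAF of 5% passes this filter, while the same study with a control MAF of 0.5% is excluded.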
(4) Document prescreening
The literature entries obtained from the online public databases were imported into NoteExpress and stored as a bibliography, and duplicate entries across databases were removed with the duplicate-checking module. Literature that did not meet the inclusion criteria was then eliminated on the basis of the title, abstract and keywords, in combination with the full text.
2. Document data extraction
Data were extracted from the literature meeting the inclusion criteria. The main information extracted comprises: sample size, study area, sample origin, gross morphological type of liver cancer (massive, nodular and diffuse), histological type of liver cancer (hepatocellular, cholangiocellular and mixed), study design, and genetic factors (genotype, allele, molecular marker, etc.).
3. Document data quality assessment
Quality evaluation of the included literature data was performed using the Venice criteria (Ioannidis JP, Boffetta P, Little J, et al. Assessment of cumulative evidence on genetic associations: interim guidelines [J]. Int J Epidemiol, 2008, 37(1): 120-132.). The confidence level of the literature data is determined by three parameters: amount of literature evidence, heterogeneity and bias (A is strong, B is moderate and C is weak). The grades are classified as: ① AAA: strong evidence; ② AAB, ABA, ABB, BAA, BBA, BBB, BAB: moderate evidence. The remaining classifications are treated as low-confidence evidence.
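The grading rule just described can be written as a small lookup. This is a sketch under the assumption that the three letters are given in a fixed order (amount, heterogeneity, bias); the function name and the returned labels are illustrative.

```python
# Grade combinations listed in the text as moderate evidence.
MODERATE_GRADES = {"AAB", "ABA", "ABB", "BAA", "BBA", "BBB", "BAB"}


def venice_evidence_level(grades: str) -> str:
    """Map a three-letter Venice grade such as 'AAB' to an evidence level."""
    g = grades.upper()
    if len(g) != 3 or any(letter not in "ABC" for letter in g):
        raise ValueError("expected three letters, each one of A, B or C")
    if g == "AAA":
        return "strong"
    if g in MODERATE_GRADES:
        return "moderate"
    # Every remaining combination is treated as low-confidence evidence.
    return "low"
```

Any combination containing a C, for example "AAC", falls through to the low-confidence level.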
In addition to the Venice criteria, significant association results are evaluated by calculating the false positive report probability (FPRP) (Wacholder S, Chanock S, Garcia-Closas M, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies [J]. J Natl Cancer Inst, 2004, 96(6): 434-442.), so as to exclude associations for which the study is well conducted and the sample size is large but the calculated false-positive probability is still high.
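The text does not spell out the FPRP calculation. As a hedged sketch, the function below uses the form given in the cited FPRP paper, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the significance level (or observed P value), 1 − β is the statistical power and π is the prior probability of a true association. The parameter names are illustrative, and the inputs in the usage note are made-up numbers, not values from this application.

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability for one significant association.

    alpha: significance level or observed P value
    power: statistical power (1 - beta) to detect the assumed effect
    prior: prior probability that the association is real
    """
    if not (0.0 < prior < 1.0):
        raise ValueError("prior must lie strictly between 0 and 1")
    if not (0.0 < power <= 1.0) or not (0.0 <= alpha <= 1.0):
        raise ValueError("alpha must be in [0, 1] and power in (0, 1]")
    false_positive_mass = alpha * (1.0 - prior)
    true_positive_mass = power * prior
    return false_positive_mass / (false_positive_mass + true_positive_mass)
```

With a low prior such as 0.01, even a result at α = 0.05 and 80% power has a high FPRP, which is the situation the paragraph above guards against.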
4. Statistical analysis
Review Manager 5.3.5 (Cochrane Collaboration, Oxford, UK) was used for pooled analysis to evaluate the association between genetic factors, non-genetic factors and liver cancer risk. Genetic and non-genetic factors are treated as different variables, and a pooled analysis is carried out for a variable when three or more independent data sets are available for it. For a genetic-factor variable, an allele model can be calculated, using the genetic models of genome-wide association studies (GWAS) to discover and validate the independent data sets available for that variable.
Finally, for each available independent data set included, observed and expected values were analyzed on Q-Q (quantile-quantile) plots and tested for normal distribution using SPSS 21.0 software. A general trend distribution analysis was performed using Visual Studio 2013 to observe the pooled OR values and the cumulative frequency distribution of the possible combinations of all variables.
The relative risk (RR) of a given exposure is obtained by dividing the risk in the part of the population exposed to the risk factor by the risk in the unexposed part. If the incidence is very low (rare disease, generally an incidence of less than 1/10000), RR is approximately equal to OR (RR ≈ OR), and OR is used in place of the RR estimate for the meta-analysis.
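The rare-disease approximation can be checked numerically from a 2 × 2 table. This is an illustrative sketch that reads the effect measure above as the odds ratio; the cell names a, b, c, d follow the usual textbook layout and the counts in the usage note are invented for illustration.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)
```

For a rare outcome (2 cases among 20000 exposed versus 1 case among 20000 unexposed) the two measures nearly coincide, whereas for a common outcome (40 of 100 versus 20 of 100) the OR is noticeably larger than the RR.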
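The epidemiological effect of each risk factor is evaluated using the attributable risk percent (ARP) and the population attributable risk percent (PARP) as indexes:
$$\mathrm{ARP} = \left|\frac{\mathrm{OR}-1}{\mathrm{OR}}\right| \times 100\%, \qquad \mathrm{PARP} = \left|\frac{P_e(\mathrm{OR}-1)}{P_e(\mathrm{OR}-1)+1}\right| \times 100\%$$
In the formula, $P_e$ is the risk allele (factor) frequency in the control group or population.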
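The two indexes can be computed directly from the formulas above. This is a minimal sketch that reads the effect measure in the formulas as the pooled odds ratio; the function and parameter names are illustrative, and the numbers in the usage note are arbitrary.

```python
def arp(odds_ratio: float) -> float:
    """Attributable risk percent: |(OR - 1) / OR| x 100."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return abs((odds_ratio - 1.0) / odds_ratio) * 100.0


def parp(odds_ratio: float, pe: float) -> float:
    """Population attributable risk percent: |Pe(OR - 1) / (Pe(OR - 1) + 1)| x 100.

    pe: frequency of the risk allele (factor) in controls or the population.
    """
    if not (0.0 <= pe <= 1.0):
        raise ValueError("pe must be a frequency between 0 and 1")
    excess = pe * (odds_ratio - 1.0)
    return abs(excess / (excess + 1.0)) * 100.0
```

For an odds ratio of 2 and a risk allele frequency of 0.25, this gives an ARP of 50% and a PARP of about 20%.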
The average risk of a single SNP in the population, i.e. the genetic score, is calculated from the genotype frequencies of the genetic variation in the human genome haplotype map (HapMap) and the pooled OR:
$$\text{Genetic score} = (1-P)^2\,\mathrm{OR}^2 + 2P(1-P)\,\mathrm{OR} + P^2$$
where P is the risk allele frequency.
The Q-Q plot is used to determine whether two data sets come from populations with a common distribution. All P values are two-sided, with P < 0.05 considered statistically significant. Regression analysis, sensitivity analysis and publication bias analysis were performed using STATA 13.1 (StataCorp, College Station, TX, USA).
II. Results
Table 1 shows the statistics of genetic factors and liver cancer risk and evidence rank analysis. Tables 2 and 3 show the statistical results of the correlation analysis of the SNPs of each gene and the risk of liver cancer. From table 2, of the 30 SNPs associated with liver cancer onset, 3 SNPs (LT × rs5246916, ZNF35 × rs5246916, ARFGAP × rs4718842) were rated as high quality, 27 SNPs (TAGA × rs15945924, FBXW × rs11744825, HAPLN rs8294854, RHOBTB rs 6267676767282, NLRP rs5545282, MSH × rs 95235, RANBP1 × rs 33807, GNA × rs5741536, TY × rs8896114, CSMD × rs3411226, CASS rs5502816, NRD rs 90934, TFR × 22522591, TGM 9848, csm × rs 451226, CASS × 5648 rs, ag417948 rs, agus × 434742, agus × 4348, agus × 48, agus 5648 × rs 4705, TFR × 22591, TGM 9848 × rs, tga × 4548 × 435648 × 4348 × rs × 48 × pgs × 48 × 434705, agus × 48 × wt × rs × 48 × wt × rs, TSC × 48 × wt. The 30 SNPs are named sequentially in Table 2, see column 1 brackets.
TABLE 1 Analysis of SNPs, liver cancer risk and evidence grade
[Table provided as images in the original publication]
TABLE 2 Analysis of the association of SNPs with liver cancer risk (1)
[Table provided as an image in the original publication]
TABLE 3 Analysis of the association of SNPs with liver cancer risk (2)
[Table provided as images in the original publication]
Establishment and evaluation of liver cancer tumor risk screening model
Method and device
1. Data selection for correlation analysis
After quality control, the remaining 235 individuals (each dataset in table 1 as one individual) and 30 valid SNPs were used for linkage disequilibrium analysis in subsequent studies to obtain association data.
LD measurement: the degree of linkage disequilibrium is usually measured by D' and $r^2$; this study uses $r^2$ as the measure of LD. $r^2$ indicates the degree of statistical and genetic correlation between two loci ($0 < r^2 < 1$); it is insensitive to changes in gene frequency and performs stably. $r^2$ is calculated as:
$$r^2 = \frac{(P_{A1B1} - P_{A1} P_{B1})^2}{P_{A1}(1-P_{A1})\,P_{B1}(1-P_{B1})}$$
In the formula, $P_{A1}$ and $P_{B1}$ are the frequencies of the first allele at the two marker loci, and $P_{A1B1}$ is the frequency of the haplotype formed by these alleles.
$r^2$ among the SNPs was calculated with Haploview software, and subsequent statistical analysis was performed with R software.
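The LD measure defined above can be computed for a single pair of loci as follows. This is only an illustrative sketch of the formula, not a substitute for the Haploview calculation; the function and argument names are assumptions.

```python
def ld_r2(p_a1b1: float, p_a1: float, p_b1: float) -> float:
    """Linkage disequilibrium r^2 between two biallelic loci.

    p_a1b1: frequency of the haplotype carrying allele 1 at both loci
    p_a1:   frequency of allele 1 at the first locus
    p_b1:   frequency of allele 1 at the second locus
    """
    denominator = p_a1 * (1.0 - p_a1) * p_b1 * (1.0 - p_b1)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    d = p_a1b1 - p_a1 * p_b1  # deviation from independent assortment
    return d * d / denominator
```

When the haplotype frequency equals the product of the allele frequencies the loci are in equilibrium and the result is 0; when the two alleles always occur together (for example all three frequencies equal to 0.3) the result is 1.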
2. Model building process
(1) According to the SNP spacing and $r^2$ distribution maps on chromosomes 6, 12, 5, 10, 19, 20, 11, 1, 7, 15, X, 9, 17, 13, 15, 12 and 18, SNPs whose spacing is within 50 Mb and whose $r^2$ in consecutive analysis is greater than 0.9 are selected as the basis for model construction.
(2) From the individual effect of each SNP in the data set, the individual effect value (l) and the phenotypic parameter (f) of each SNP locus for the occurrence of liver cancer are obtained, where the individual effect value is the statistical probability, in the data set, of liver cancer occurring for the single SNP, and the phenotypic parameter is the statistical frequency, in the data set, with which individuals in whom the single SNP is dominantly expressed during inheritance suffer from liver cancer.
(3) The individual effect value of each individual is calculated by Logistic regression analysis, then corrected and weighted to obtain the genetic score (W).
(4) The genotype of a single SNP of an individual is taken as a variable; the genotype comprises allelic, heterozygous, homozygous, dominant and recessive types, giving five variables. For a given SNP the genotypes are AA dominant, AA recessive, AB dominant, AB recessive and BB, where A is the risk allele and B is the non-risk allele, and the corresponding risk values are taken as $l^2 f^2$, $l^2 (1-f)^2$, $l(1-l)(1-f)$, $l(1-l)$ and $(1-l)^2$, respectively.
(5) The relative risk value of each SNP is calculated as
$$\frac{l^2 f^2 + l^2 (1-f)^2 + l(1-l)(1-f) + l(1-l) + (1-l)^2}{W}$$
(6) The SNPs-based weighted liver cancer risk screening score M of each individual is then
$$M = \mathrm{SNP}_1 \times \mathrm{SNP}_2 \times \mathrm{SNP}_3 \times \cdots \times \mathrm{SNP}_n$$
where $\mathrm{SNP}_n$ is the relative risk value of the n-th screened SNP.
Through the steps, the liver cancer weighted risk screening scores of the n SNPs of each individual can be obtained, and the cancer risk of each individual can be judged according to the liver cancer weighted risk screening scores.
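Steps (4) to (6) can be sketched in code as follows. This is a hedged illustration of one reading of those steps, not the applicant's implementation: it takes l, f and W for each SNP as given inputs (their estimation by Logistic regression is not shown), computes the five genotype risk values, forms the per-SNP relative risk value as their sum divided by W, and multiplies the per-SNP values to obtain M. All function names are hypothetical.

```python
from math import prod
from typing import Dict, Iterable, Tuple


def genotype_risk_values(l: float, f: float) -> Dict[str, float]:
    """Risk values of the five genotype variables of step (4).

    l: individual effect value of the SNP, f: phenotypic parameter.
    """
    return {
        "AA_dominant": l ** 2 * f ** 2,
        "AA_recessive": l ** 2 * (1 - f) ** 2,
        "AB_dominant": l * (1 - l) * (1 - f),
        "AB_recessive": l * (1 - l),
        "BB": (1 - l) ** 2,
    }


def snp_relative_risk(l: float, f: float, w: float) -> float:
    """Relative risk value of one SNP as written in step (5): sum of risk values / W."""
    if w == 0:
        raise ValueError("genetic score W must be non-zero")
    return sum(genotype_risk_values(l, f).values()) / w


def weighted_risk_score(snps: Iterable[Tuple[float, float, float]]) -> float:
    """Score M of step (6): product of the relative risk values of the n SNPs.

    snps: iterable of (l, f, W) triples, one per screened SNP.
    """
    return prod(snp_relative_risk(l, f, w) for l, f, w in snps)
```

The resulting M would then be compared with the risk cut-offs derived from the model population.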
3. Logistic regression analysis
The Logistic regression model is suitable for data whose dependent variable is categorical. The model takes the logarithm of the ratio of the probability that an event occurs to the probability that it does not occur as the dependent variable for linear fitting, and the regression coefficients are estimated by the maximum likelihood method.
In this study, Logistic regression fitting was performed on the training set with the glm function in R 3.6.2; variable selection for the optimal model was performed with the bestglm function in the bestglm package, using ten-fold cross-validation and the minimum BIC (Bayesian information criterion) method, and the two models were compared with the anova function. Diagnostic prediction was then performed on the validation set samples with the predict function, a 2 × 2 table was formed with the confusionMatrix and misClassError functions in the InformationValue package, and the error rate or coincidence rate was calculated. Finally, a nomogram of the model was drawn with the regplot function in the regplot package, and ROC curves were drawn with the pROC package. Two ROC curves are compared as follows:
(1) group comparison: two ROC curves were obtained from different observers, the two samples used were completely independent, and the test formula was:
[Formula provided as an image in the original publication]
(2) pairing and comparing: the two diagnostic methods use the same sample, each subject performs the two tests simultaneously, and then compares their diagnostic effects. The test formula is as follows:
[Formula provided as images in the original publication]
In the formulas, A1 and A2 are the areas under the ROC curves of the two samples, SE²(A1) and SE²(A2) are the squares of the standard errors of those areas, and Cov(A1, A2) is the covariance of the two area estimates, which can be calculated by the nonparametric method given by DeLong. For large samples, Z approximately follows the standard normal distribution; at test level α, if Z > Zα/2 the two diagnostic methods can be considered different. The test can be carried out with MedCalc software.
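The two test formulas themselves are not reproduced in this text. As a hedged sketch, the code below assumes they take the standard forms used to compare two areas under ROC curves, built only from the quantities defined above: for independent samples Z = (A1 − A2) / √(SE²(A1) + SE²(A2)), and for paired samples the same expression with 2·Cov(A1, A2) subtracted under the square root. The covariance is taken as an input (its DeLong estimation is not shown), and the function names are illustrative.

```python
from math import erfc, sqrt


def z_unpaired(a1: float, a2: float, se1: float, se2: float) -> float:
    """Z statistic for two AUCs estimated on independent samples."""
    return (a1 - a2) / sqrt(se1 ** 2 + se2 ** 2)


def z_paired(a1: float, a2: float, se1: float, se2: float, cov: float) -> float:
    """Z statistic for two AUCs estimated on the same subjects.

    cov: covariance of the two AUC estimates (for example from the DeLong method).
    """
    variance = se1 ** 2 + se2 ** 2 - 2.0 * cov
    if variance <= 0:
        raise ValueError("variance of the AUC difference must be positive")
    return (a1 - a2) / sqrt(variance)


def two_sided_p(z: float) -> float:
    """Two-sided P value of Z under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2.0))
```

A positive covariance shrinks the variance of the difference, so the paired test yields a larger Z than the unpaired test for the same AUCs and standard errors.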
4. Validation subjects
Cases recruited from southern, central and northern China were divided into a model population and validation populations 1-3. The model population was used to divide the M scores obtained from the established model into risk ranges, and the validation populations were used to verify the accuracy of the model. The inclusion criteria for liver cancer patients in all cases were a pathologically verified diagnosis of primary liver cancer and no prior radiotherapy or chemotherapy. Individuals in the case and control populations are unrelated. All patients signed informed consent, and the study was approved by the ethics committee.
TABLE 4
[Table provided as images in the original publication]
5. DNA extraction and genotyping
5-10 μg of DNA was extracted from the serum of each patient or healthy control, fragmented to 100-400 bp, and used to build a library with SeqCap EZ Human Exome Library v3.0 (Roche); the library type is a 2 × 180 bp small-fragment DNA library. Genotyping was performed by the PCR-RFLP method, and the primers for the 30 corresponding SNPs were provided by Shanghai Ministry of Engineers.
II. Results
Based on the genes and SNPs retrieved in Tables 1 to 3, the $r^2$ distribution of the SNPs on the different chromosomes was plotted, as shown in the figures. For the SNPs on the different chromosomes, those whose spacing is within 50 Mb and whose $r^2$ in consecutive analysis is greater than 0.9 were counted, and the results are shown in Table 5.
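The selection rule can be sketched as a filter over the SNPs of one chromosome. The text does not define "consecutive analysis" precisely, so this sketch assumes that it means comparing each SNP with its next neighbor along the chromosome; the function name, the data layout and the SNP identifiers in the usage note are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_snps(
    snps: Sequence[Tuple[str, int]],
    r2_adjacent: Sequence[float],
    max_gap_bp: int = 50_000_000,
    r2_min: float = 0.9,
) -> List[str]:
    """Keep SNPs that have a neighbor within max_gap_bp and with r^2 above r2_min.

    snps: (identifier, position in bp) pairs sorted by position on one chromosome.
    r2_adjacent: r2_adjacent[i] is the r^2 between snps[i] and snps[i + 1].
    """
    if len(r2_adjacent) != max(len(snps) - 1, 0):
        raise ValueError("need exactly one r^2 value per adjacent SNP pair")
    selected = set()
    for i, r2 in enumerate(r2_adjacent):
        (id_a, pos_a), (id_b, pos_b) = snps[i], snps[i + 1]
        if pos_b - pos_a <= max_gap_bp and r2 > r2_min:
            selected.update((id_a, id_b))
    # Preserve chromosomal order in the output.
    return [snp_id for snp_id, _ in snps if snp_id in selected]
```

For four SNPs at 1 Mb, 20 Mb, 90 Mb and 95 Mb with adjacent $r^2$ values of 0.95, 0.99 and 0.5, only the first two are kept: the second pair is more than 50 Mb apart and the third pair is in weak LD.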
TABLE 5 Linkage disequilibrium analysis of SNPs
[Table provided as images in the original publication]
Different weighted genetic risk score models can be constructed based on the SNPs selected in table 5, as shown in table 6.
TABLE 6 Weighted genetic risk scoring models
[Table provided as an image in the original publication]
TABLE 7 M-score distribution of the model population
[Table provided as images in the original publication]
The M scores of the model population were calculated with the models generated in examples 1-3 and comparative examples 1-2, and the distribution of the M scores was tallied; the results are shown in Table 7. In Table 7, a range such as 0.35-0.45 excludes 0.35 but includes 0.45; "/" denotes no case; "+" indicates the number of actual cancer cases of the corresponding model population in Table 4, and "-" indicates the number of actual healthy cases of the corresponding model population in Table 4. In Table 7, the predicted positive rate of example 1 is the percentage of cases with M values exceeding 0.35 in the total number of cases of the model population, and the predicted positive rate of examples 2-3 and comparative examples 1-2 is the percentage of cases with M values exceeding 0.3 in the total number of cases of the model population. The positive coincidence rate is the predicted positive rate of each example or comparative example expressed as a percentage of the real positive rate of liver cancer in the model population (59.39%).
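The two rates defined above reduce to simple arithmetic. In the sketch below the function names are illustrative and the M scores in the usage note are made-up values; only the cut-offs (0.3 and 0.35) and the real positive rate of 59.39% come from the text.

```python
from typing import Sequence


def predicted_positive_rate(m_scores: Sequence[float], cutoff: float) -> float:
    """Percentage of cases whose M score strictly exceeds the cutoff."""
    if not m_scores:
        raise ValueError("at least one M score is required")
    n_positive = sum(1 for m in m_scores if m > cutoff)
    return 100.0 * n_positive / len(m_scores)


def positive_coincidence_rate(predicted_rate: float, real_rate: float) -> float:
    """Predicted positive rate expressed as a percentage of the real positive rate."""
    if real_rate <= 0:
        raise ValueError("real positive rate must be positive")
    return 100.0 * predicted_rate / real_rate
```

For four hypothetical M scores 0.2, 0.31, 0.36 and 0.5 with a cut-off of 0.35, the predicted positive rate is 50%, which corresponds to a positive coincidence rate of about 84.2% against a real positive rate of 59.39%.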
As can be seen from the results in Table 7, the positive coincidence rates of comparative examples 1-2 exceed 90%, but truly healthy cases occur in comparative examples 1-2 at M values of 0.5-0.6, which indicates that the accuracy of comparative examples 1-2 is lower than that of examples 1-3. For the models constructed from SNPs in the embodiments of the present application, a calculated M value greater than 0.3 or 0.35 indicates that the individual is at risk of cancer; otherwise the individual is not.
Further, the models obtained in examples 1-3 and comparative examples 1-2 were verified using validation population 1, and the results are shown in Table 8 (the predicted positive rate is calculated for M values larger than 0.3). For validation population 1, the positive coincidence rates of examples 1-3 and comparative examples 1-2 are all high. However, in the M-score distribution given by comparative examples 1-2, truly positive liver cancer cases appear among the cases with M values less than or equal to 0.3, and a large number of healthy control cases appear in the range of M values of 0.3-0.35. The M-score models of comparative examples 1-2 therefore show a certain error in the risk evaluation of liver cancer positive cases and healthy negative cases, whereas the M-score models provided by examples 1-3 have smaller errors.
TABLE 8 M-score distribution of validation population 1
[Table provided as an image in the original publication]
Further, the models obtained in examples 1-3 and comparative examples 1-2 were verified using validation population 2, and the results are shown in Table 9 (the predicted positive rate is calculated for M values larger than 0.3). For validation population 2, the positive coincidence rates of examples 1-3 are all high. Moreover, in the M-score distribution given by comparative examples 1-2, truly positive liver cancer cases appear among the cases with M values less than or equal to 0.3, and a large number of healthy control cases appear in the range of M values of 0.3-0.35. The M-score models of comparative examples 1-2 therefore show a certain error in the risk evaluation of liver cancer positive cases and healthy negative cases, whereas the M-score models provided by examples 1-3 have smaller errors.
Table 9 verifies M quantile distribution for population 2
Figure BDA0003434931050000192
Analysis of the Logistic regression model provided in Comparative Example 3 shows that its positive coincidence rates for liver cancer in verification populations 1 and 2 are 84.36% and 81.24%, respectively, which are lower than those of the evaluation model provided in the embodiments of the present application.
Therefore, the SNP-based model provided in the embodiments of the present application is shown to be the optimal model for liver cancer risk assessment and screening, with higher screening accuracy.
The above describes only preferred embodiments of the present application, but the scope of protection of the present application is not limited thereto; any change or substitution that can be readily conceived by those skilled in the art within the technical scope disclosed in the present application shall fall within the scope of protection of the present application.
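The quantile comparison above rests on counting how many confirmed cases and healthy controls fall into each M value range. A minimal sketch of that tabulation follows; the bin edges, labels and sample values are hypothetical, chosen only to mirror the ranges named in the text (M less than or equal to 0.3, and 0.3-0.35), and the actual bins of Tables 8 and 9 are not reproduced here.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Tuple

# Hypothetical bin edges (upper bounds, inclusive) and matching labels.
EDGES = [0.3, 0.35, 0.5]
LABELS = ["M<=0.3", "0.3<M<=0.35", "0.35<M<=0.5", "M>0.5"]


def m_bin(m_value: float) -> str:
    """Map an M value to the label of the bin that contains it."""
    return LABELS[bisect_left(EDGES, m_value)]


def bin_distribution(
    samples: Iterable[Tuple[float, bool]]
) -> Dict[str, Dict[str, int]]:
    """Count confirmed cases and healthy controls in every M bin.

    samples: (m_value, is_confirmed_case) pairs.
    """
    counts = {label: {"case": 0, "control": 0} for label in LABELS}
    for m_value, is_case in samples:
        counts[m_bin(m_value)]["case" if is_case else "control"] += 1
    return counts


# Hypothetical M values, not data from the application.
demo_samples = [
    (0.25, True),   # a confirmed case that falls below the 0.3 cut-off
    (0.32, False),  # a healthy control that falls in the 0.3-0.35 range
    (0.34, False),
    (0.45, True),
    (0.61, True),
    (0.10, False),
]
distribution = bin_distribution(demo_samples)
```

In such a table, cases counted in the lowest bin and controls counted in the bins above 0.3 correspond to the two kinds of error that the text attributes to the comparative models.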

Claims (10)

1. A method for establishing a liver cancer tumor screening model based on molecular markers, characterized by comprising the following steps:
obtaining an SNP dataset associated with liver cancer;
based on the data set, screening to obtain SNPs for modeling as model variables;
calculating relative risk values of different genotypes of each model variable; and
obtaining the tumor screening model based on the relative risk value.
2. The method according to claim 1, wherein the step of screening SNPs obtained for modeling specifically comprises:
according to the linkage disequilibrium analysis result of each SNP on the different chromosomes, selecting SNPs whose interval is within 50 Mb and whose linkage analysis gives r² > 0.9 as the SNPs for model construction.
3. The method of claim 2, wherein the step of screening SNPs for modeling further comprises:
obtaining the individual effect value and the phenotypic parameter of each SNP locus for the occurrence of liver cancer according to the individual effect of each SNP in the data set; wherein the individual effect value is the statistical probability, in the data set, that a single SNP is associated with liver cancer; and the phenotypic parameter is the statistical frequency, in the data set, of liver cancer among individuals in whom the single SNP is inherited with a dominant phenotype;
calculating the individual effect values of single individuals by Logistic regression analysis, and correcting and weighting them to obtain genetic scores; and
calculating, according to the individual effect values, the phenotypic parameters and the genetic scores, an SNP-based liver cancer weighted risk screening score for each individual, and judging the cancer risk of each individual according to the liver cancer weighted risk screening score.
4. The method of claim 3, wherein the SNPs screened for modeling include at least one of TAGA rs15945924, FBXW rs11744825, RANBP1 rs17033807, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, AT rs10819989, ATP7 rs5251533, MUTY rs4579862.
5. The method of claim 4, wherein the SNPs screened for modeling include TAGA rs15945924, FBXW rs11744825, RANBP1 rs17033807, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, AT rs10819989, ATP7 rs5251533, MUTY rs7945862.
6. The method of claim 4, wherein the SNPs screened for modeling include FBXW rs11744825, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, ATP7 rs5251533, MUTY rs4579862.
7. The method of claim 4, wherein the SNPs screened for modeling include TAGA rs15945924, FBXW rs11744825, GNA rs5741536, TY rs8896114, TGM rs239809, DUOX rs4539964, RE rs4362209, AT rs10819989, ATP7 rs5251533, MUTY rs4579862.
8. A method for screening, risk prediction and/or diagnosis of liver cancer, comprising the step of using a screening model of liver cancer tumor, wherein the screening model of liver cancer tumor is constructed by the construction method of any one of claims 1 to 7.
9. A kit for screening, risk prediction and/or diagnosis of liver cancer, comprising a reagent for detecting genotyping in a tumor screening model constructed according to the construction method of any one of claims 1 to 7.
10. A system or device for liver cancer screening, risk prediction and/or diagnosis, comprising:
an acquisition module for acquiring an SNP data set associated with liver cancer;
a screening module for screening, based on the data set, SNPs used for modeling to serve as model variables;
a calculation module for calculating the relative risk values of the different genotypes of each model variable;
a construction module for constructing the liver cancer tumor screening model based on the relative risk values; and
a data analysis module for inputting the relative risk values of the modeling SNPs of an individual to be tested into the liver cancer tumor screening model constructed according to the construction method of any one of claims 1 to 7, so as to obtain a prediction result.
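Two of the steps recited in the claims can be sketched in Python: the linkage disequilibrium selection of claim 2 and the per-genotype relative risk calculation of claim 1. The sketch rests on stated assumptions. It reads claim 2 as keeping SNPs that belong to at least one pair lying within 50 Mb with r² > 0.9, and it uses the textbook relative risk (proportion of cases among carriers of a genotype divided by the proportion among carriers of a reference genotype), which may differ from the definition used in the description. All function names, SNP identifiers and counts are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

MAX_DISTANCE_BP = 50_000_000  # 50 Mb interval named in claim 2
MIN_R2 = 0.9                  # r^2 threshold named in claim 2


def select_ld_snps(pairs: Iterable[Tuple[str, str, int, float]]) -> Set[str]:
    """Keep SNPs found in at least one pair that meets both claim 2 limits.

    pairs: (snp_a, snp_b, distance_bp, r2) for SNP pairs on one chromosome.
    """
    selected: Set[str] = set()
    for snp_a, snp_b, distance_bp, r2 in pairs:
        if distance_bp <= MAX_DISTANCE_BP and r2 > MIN_R2:
            selected.update((snp_a, snp_b))
    return selected


def genotype_relative_risks(
    counts: Dict[str, Tuple[int, int]], reference: str
) -> Dict[str, float]:
    """Relative risk of each genotype against a reference genotype.

    counts: genotype -> (number of cases, number of controls).
    """
    def case_fraction(genotype: str) -> float:
        cases, controls = counts[genotype]
        total = cases + controls
        if total == 0:
            raise ValueError(f"no individuals with genotype {genotype}")
        return cases / total

    reference_fraction = case_fraction(reference)
    if reference_fraction == 0:
        raise ValueError("reference genotype has no cases")
    return {g: case_fraction(g) / reference_fraction for g in counts}


# Hypothetical inputs, not data from the application.
demo_pairs = [
    ("snpA", "snpB", 1_200_000, 0.95),   # passes both limits
    ("snpC", "snpD", 60_000_000, 0.97),  # too far apart
    ("snpE", "snpF", 500_000, 0.40),     # r^2 too low
]
kept = select_ld_snps(demo_pairs)
risks = genotype_relative_risks(
    {"AA": (20, 80), "AG": (30, 70), "GG": (40, 60)}, reference="AA"
)
```

How the per-genotype values are then combined into the screening model is defined in the description and is not reproduced in this sketch.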
CN202111611165.9A 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application Active CN114283883B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111611165.9A CN114283883B (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application
CN202211249896.8A CN115719613A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application
CN202211250017.3A CN115762635A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111611165.9A CN114283883B (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202211249896.8A Division CN115719613A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application
CN202211250017.3A Division CN115762635A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application

Publications (2)

Publication Number Publication Date
CN114283883A (en) 2022-04-05
CN114283883B (en) 2022-11-22

Family

ID=80876214

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202211250017.3A Withdrawn CN115762635A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application
CN202111611165.9A Active CN114283883B (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application
CN202211249896.8A Withdrawn CN115719613A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202211250017.3A Withdrawn CN115762635A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211249896.8A Withdrawn CN115719613A (en) 2021-12-27 2021-12-27 System for screening and risk prediction of liver cancer based on molecular marker and application

Country Status (1)

Country Link
CN (3) CN115762635A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101437960A (en) * 2006-03-01 2009-05-20 佩勒根科学有限公司 Markers for addiction
US20160222469A1 (en) * 2014-09-30 2016-08-04 Genetic Technologies Limited Methods for Assessing Risk of Developing Breast Cancer
CN109585017A (en) * 2019-01-31 2019-04-05 上海宝藤生物医药科技股份有限公司 A kind of the risk prediction algorithms model and device of age-related macular degeneration
CN110338762A (en) * 2019-07-09 2019-10-18 上海宝藤生物医药科技股份有限公司 Assist method, apparatus, terminal and the server of vitamin D
CN110382712A (en) * 2017-01-24 2019-10-25 基因技术有限公司 The improved method of risk for assessment development breast cancer
CN110527719A (en) * 2019-08-27 2019-12-03 北京天平永达生物科技发展有限公司 A method of establishing the early screening scale of gestational diabetes mellitus risk assessment
CN113517020A (en) * 2021-08-04 2021-10-19 华中农业大学 Rapid and accurate animal genome matching analysis method
CN113637741A (en) * 2021-09-29 2021-11-12 成都二十三魔方生物科技有限公司 Early-onset leukotrichia genetic risk gene detection kit, and early-onset leukotrichia genetic risk assessment system and method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101437960A (en) * 2006-03-01 2009-05-20 佩勒根科学有限公司 Markers for addiction
US20160222469A1 (en) * 2014-09-30 2016-08-04 Genetic Technologies Limited Methods for Assessing Risk of Developing Breast Cancer
CN107002138A (en) * 2014-09-30 2017-08-01 基因技术有限公司 Method for assessment development mammary cancer risk
CN110382712A (en) * 2017-01-24 2019-10-25 基因技术有限公司 The improved method of risk for assessment development breast cancer
US20200102617A1 (en) * 2017-01-24 2020-04-02 Genetic Technologies Limited Improved Methods For Assessing Risk of Developing Breast Cancer
CN109585017A (en) * 2019-01-31 2019-04-05 上海宝藤生物医药科技股份有限公司 A kind of the risk prediction algorithms model and device of age-related macular degeneration
CN110338762A (en) * 2019-07-09 2019-10-18 上海宝藤生物医药科技股份有限公司 Assist method, apparatus, terminal and the server of vitamin D
CN110527719A (en) * 2019-08-27 2019-12-03 北京天平永达生物科技发展有限公司 A method of establishing the early screening scale of gestational diabetes mellitus risk assessment
CN113517020A (en) * 2021-08-04 2021-10-19 华中农业大学 Rapid and accurate animal genome matching analysis method
CN113637741A (en) * 2021-09-29 2021-11-12 成都二十三魔方生物科技有限公司 Early-onset leukotrichia genetic risk gene detection kit, and early-onset leukotrichia genetic risk assessment system and method

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
KL AYERS等: "SNP selection in genome‐wide and candidate gene studies via penalized logistic regression", 《GENETIC EPIDEMIOLOGY》 *
PAOLA SEBASTIANI等: "Naïve Bayesian classifier and genetic risk score for genetic risk prediction of a categorical trait: not so different after all", 《FRONTIERS IN GENETICS》 *
余彩裙等: "阿尔茨海默病患病风险评估模型研究", 《昆明学院学报》 *
王继英等: "全基因组关联分析在畜禽中的研究进展", 《中国农业科学》 *
薛付忠: "健康医疗大数据驱动的健康管理学理论方法体系", 《山东大学学报(医学版)》 *
赵静: "慢性肾病相关生物标志物的筛选及预测模型的建立", 《中国优秀硕士学位论文全文数据库 (医药卫生科技辑)》 *
陈琦等: "TOX3基因rs3803662位点多态性与乳腺癌免疫标志物的相关性", 《中华医学杂志》 *

Also Published As

Publication number Publication date
CN114283883B (en) 2022-11-22
CN115762635A (en) 2023-03-07
CN115719613A (en) 2023-02-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20221026
Address after: Room 101, 201, No. 309, Jiangchang Third Road, Jing'an District, Shanghai 200040
Applicant after: Shanghai Huace Aipu Medical Laboratory Co.,Ltd.
Address before: 075000 No. 12, Changqing Road, Qiaoxi West, Hebei, Zhangjiakou
Applicant before: The First Affiliated Hospital of Hebei North University
GR01 Patent grant