CN109182505B - Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis - Google Patents
- Publication number
- CN109182505B (application number CN201811146220.XA)
- Authority
- CN
- China
- Prior art keywords
- snp
- snps
- mastitis
- analysis
- sites
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Abstract
The invention relates to the key SNP locus rs75762330 for dairy cow mastitis and to a 2b-RAD genotyping and analysis method, which comprises the following steps: library construction and sequencing; and bioinformatics analysis, consisting of data filtering, extraction of the reads containing the restriction site, read alignment, SNP typing and genome-wide association analysis. A Bayesian model and a logistic regression model are used to perform a genome-wide association study (GWAS) of the clinical mastitis phenotype in dairy cows. Compared with the prior-art RADseq technology, the 2b-RAD sequencing technology has the following advantages: 1. the digested fragments are uniform in length and need no subsequent size selection; 2. no Y-shaped adapter needs to be added to the digested fragments; 3. the steps are simple; 4. the sequencing cost per sample is low; 5. the sequencing time is short. The invention also constructs two genome-wide association analysis models (BayesA and logistic regression) and identifies a key SNP locus for mastitis in Chinese Holstein cows together with the corresponding gene (PTK2B).
Description
Technical Field
The invention relates to the key SNP locus rs75762330 for dairy cow mastitis and to a 2b-RAD genotyping and analysis method.
Background
Restriction-site-associated DNA sequencing (RADseq) uses a restriction enzyme to digest the genome into DNA fragments of a certain size, constructs a sequencing library from them, and then performs high-throughput sequencing of the RAD tags produced by the digestion. RADseq is considered one of the most important scientific breakthroughs of the past decade: it can detect thousands of single nucleotide polymorphism (SNP) markers across the whole genome in a single, simple and cost-effective experiment, and has thereby advanced genomics research. Compared with other sequencing technologies, it offers high throughput, good accuracy, a short experimental cycle and high cost-effectiveness, and it does not depend on the existence of a reference genome sequence. It has been applied successfully to population genetic structure and phylogenetic analysis, quantitative trait locus (QTL) mapping of economically important traits in animals and plants, assisted genetic breeding, genetic map construction, SNP marker detection and related fields.
The RADseq workflow comprises: restriction digestion of genomic DNA (endonuclease); library construction (adapter ligation, fragment size selection, fragment end modification, addition of a Y-shaped adapter to the ends, PCR amplification); sequencing on an instrument (mainly the Illumina GAII or HiSeq platforms); and bioinformatics analysis (common analysis software: Stacks, pyRAD, UNEAK and the like). The workflow is shown in FIG. 1.
The prior art has the following disadvantages: 1. the digested fragments differ in size and must be size-selected; 2. different adapters must be added to the ends of the digested fragments in two separate steps; 3. a special A-tail and a Y-shaped adapter must be added to the digested fragments; 4. the steps are relatively complicated, technically demanding and time-consuming; 5. the sequencing cost per sample is high.
Disclosure of Invention
To overcome these shortcomings, the invention provides a 2b-RAD genotyping and analysis method in which the digested DNA fragments are uniform in length, so that no subsequent size selection is needed; adapters do not have to be added multiple times; the steps are simple; the sequencing time is shortened; and the sequencing cost per sample is reduced.
The invention also provides a key SNP locus for dairy cow mastitis: the key SNP locus rs75762330 is located in an intron region of the gene PTK2B on chromosome AC_000165.1, and the SNP is C > T.
The 2b-RAD genotyping and analyzing method for screening the key SNPs sites of the mastitis of the dairy cattle comprises the following steps:
1) Library construction and sequencing: digestion: at least 200 ng of genomic DNA is digested with a type IIB restriction enzyme; adapter ligation: 5 different sets of adapters are added to the digestion products, respectively, and ligated with T4 DNA ligase;
amplification; concatenation; pooling; sequencing: the DNA libraries that pass quality inspection are sequenced on the instrument;
2) Bioinformatics analysis:
(1) data filtering: quality control of the Clean Reads;
(2) extraction of digested sequences: the sequences containing the restriction enzyme recognition site are extracted for subsequent analysis;
(3) alignment: the digested sequences are aligned to the constructed reference sequence with the SOAP software;
(4) SNP typing: typing is carried out on the alignment results by the maximum likelihood (ML) method;
(5) analysis: construction of an evolutionary tree, principal component analysis, population genetic structure analysis or genome-wide association analysis.
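The maximum likelihood typing in step (4) can be illustrated with a minimal sketch. It assumes a simple binomial read-count model with a fixed sequencing error rate; the function name `call_genotype`, the error rate and the model itself are illustrative assumptions and are not taken from the patent or from the software it uses.

```python
import math


def call_genotype(ref_reads: int, alt_reads: int, error_rate: float = 0.01) -> str:
    """Pick the diploid genotype with the highest read-count likelihood.

    Hypothetical model: each read shows the alternate allele with probability
    error_rate (ref/ref), 0.5 (ref/alt) or 1 - error_rate (alt/alt).
    """
    p_alt = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }

    def log_likelihood(p: float) -> float:
        # The binomial coefficient is the same for every genotype, so it is omitted.
        return ref_reads * math.log(1.0 - p) + alt_reads * math.log(p)

    return max(p_alt, key=lambda genotype: log_likelihood(p_alt[genotype]))


if __name__ == "__main__":
    for counts in [(20, 0), (10, 10), (0, 15)]:
        print(counts, call_genotype(*counts))
```

A production caller would typically also model base quality and coverage thresholds; this sketch only shows the likelihood comparison itself.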
The digested sequences are aligned to the reference sequence with the SOAP software, SNP markers are then typed by the maximum likelihood (ML) method, and after typing the results are further filtered by the following steps 1) to 5):
1) removing sites that can be typed in fewer than 80% of the individuals across all samples;
2) removing sites with a minor allele frequency (MAF) below 0.01;
3) removing single nucleotide polymorphism (SNP) sites containing 1 or 4 base types;
4) removing sites with more than 1 SNP within the tag;
5) removing sites with fewer than 2 genotypes within the tag.
A Bayesian model and a logistic regression model are used to perform a genome-wide association study (GWAS) of the clinical mastitis phenotype in dairy cows.
Before the GWAS, a linear regression model based on the dairy cow mastitis phenotype is first constructed:

$$y_i = \mu + \sum_{k=1}^{M} X_{ik}\,\alpha_k + e_i$$

where $y_i$ is the phenotype of the $i$-th individual; $M$ is the total number of SNPs; $\mu$ is the overall phenotypic mean; $\alpha_k$ is the additive association effect of the $k$-th SNP; $X_{ik}$ is the genotype of the $k$-th SNP in the $i$-th individual; $e_i$ is the residual effect; and $k$ indexes the SNP sites.
The Bayesian model assumes that each SNP effect has a normal prior distribution with zero mean and SNP-specific variance $\sigma_k^2$, where $k = 1, 2, \ldots, M$ indexes the SNP sites. The SNP effect variances are mutually independent, and each is independently and identically distributed (IID) with a scaled inverse chi-square prior:

$$P(\sigma_k^2 \mid \nu, S^2) = \chi^{-2}(\nu, S^2)$$

where $\nu$ is the degrees-of-freedom parameter, $S^2$ is the scale parameter, $P$ denotes the IID prior of each variance, and $\chi^{-2}$ is the inverse chi-square distribution. The marginal prior distribution of each SNP effect is a t-distribution:

$$P(\alpha_k \mid \nu, S^2) = t(0, \nu, S^2)$$

where $P(\alpha_k \mid \nu, S^2)$ is the marginal prior of each SNP effect and $\alpha_k$ is the additive association effect of the $k$-th SNP. The prior of $\alpha_k$ depends on the variance of each SNP, and each variance has an inverse chi-square prior. With probability $\pi$ a SNP has zero effect, and with probability $(1-\pi)$ its effect follows a normal distribution:

$$\alpha_k \mid \pi, \sigma_\alpha^2 \;\begin{cases} = 0 & \text{with probability } \pi \\ \sim N(0, \sigma_\alpha^2) & \text{with probability } (1-\pi) \end{cases}$$

where $N$ denotes the normal distribution and $\sigma_\alpha^2$ is the common variance of all non-zero SNP effects, which has a scaled inverse chi-square prior:

$$P(\sigma_\alpha^2 \mid \nu_a, S_a^2) = \chi^{-2}(\nu_a, S_a^2)$$

The unknown value of $\pi$ in the model is estimated from its prior distribution, which is taken to be uniform between 0 and 1: $\pi \sim \mathrm{Uniform}(0, 1)$.
$\nu_a$ is set to 4, and $S_a^2$ is calculated from the additive variance:

$$S_a^2 = \frac{\tilde{\sigma}_\alpha^2\,(\nu_a - 2)}{\nu_a} \quad\text{and}\quad \tilde{\sigma}_\alpha^2 = \frac{\tilde{\sigma}_s^2}{(1-\pi)\sum_{k=1}^{K} 2p_k(1-p_k)}$$

where $p_k$ is the allele frequency of the $k$-th SNP; $\tilde{\sigma}_\alpha^2$ is the variance of a given marker; $\tilde{\sigma}_s^2$ is the additive genetic variance explained by the SNPs; $\chi^{-2}(\nu_a, S_a^2)$ is the scaled inverse chi-square prior; and $K$ is the total number of SNPs.
Logistic regression analysis model: assuming that single nucleotide polymorphisms influence the clinical mastitis phenotype of dairy cows, a logistic regression model is established to predict the likelihood of clinical mastitis. A fitted logistic regression equation is first constructed:

$$\ln\frac{P_j}{1-P_j} = \mu + \beta_j X_{ij}$$

where $P_j$ is the probability that the clinical mastitis phenotype occurs under condition $X_j$; $(1-P_j)$ is the probability that the clinical mastitis phenotype does not occur under condition $X_j$; $j$ denotes the $j$-th SNP site; $X_j = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{mj})$, with $X_{ij}$ the genotype (0, 1 or 2) of the $i$-th individual at site $j$; $\beta_j$ is the effect of the $j$-th SNP; $m$ is the number of samples; and $\mu$ is the overall phenotypic mean. Writing $Y = \mu + \sum \beta_i X_i$, the equation is converted to another form:

$$P = \frac{e^{Y}}{1 + e^{Y}}$$

where $Y$ represents the mastitis phenotype of the $i$-th individual and $P$ is the probability of the clinical mastitis phenotype; $X_i$ is the genotype of the $i$-th individual; and $\exp(\beta_i)$ is the odds ratio (OR):

$$OR = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}$$

The 95% confidence interval (CI) of the OR is $\exp\big(\beta_i \pm 1.96\,SE(\beta_i)\big)$, where $p_1$ is the probability of occurrence of a given SNP site in the case group, $p_0$ is the probability of occurrence of the corresponding site in the control group, and $SE(\beta_i)$ is the standard error of $\beta_i$.
The invention obtains 1 key SNPs locus of mastitis of dairy cows by two analysis models, as shown in tables 1 and 2:
TABLE 1 Bayesian analytical model results
TABLE 2 results of logistic regression analysis model
Compared with the prior-art RADseq technology, the 2b-RAD sequencing technology of the invention has the following advantages: 1. the digested fragments are uniform in length and need no subsequent size selection; 2. no Y-shaped adapter needs to be added to the digested fragments; 3. the steps are simple; 4. the sequencing cost per sample is low; 5. the sequencing time is short. The invention also constructs two genome-wide association analysis models (BayesA and logistic regression) and identifies a key SNP locus for mastitis in Chinese Holstein cows together with the corresponding gene (PTK2B).
Drawings
FIG. 1 is a flow diagram of a prior art RADSeq sequencing technique;
FIG. 2 is a flowchart of the 2b-RAD sequencing of the present invention;
FIG. 3 shows the alignment of the directly sequenced PCR amplicons with the NCBI reference sequence: (A) and (B) are chromatograms of the directly sequenced PCR amplicons viewed in Chromas; in (C), 1 is the NCBI reference sequence, and a and b are the direct sequencing sequences; the grey box marks the single nucleotide polymorphism marker site.
Detailed Description
The invention is further illustrated by the following examples and figures.
2b-RAD is a simplified RAD genotyping method based on type IIB restriction enzymes, and provides a powerful technique for studying population genomics. In this study, Chinese Holstein cows were the research subjects. A population consisting of a clinical mastitis group and a normal healthy control group was established, whole-genome DNA was extracted from these cows, and the genomic DNA of all cow samples was digested with the BaeI endonuclease to obtain digested fragments of standard length, which were then sequenced and analyzed. The library construction and sequencing workflow is as follows (FIG. 2):
(1) digestion: at least 200 ng of genomic DNA is digested with a type IIB restriction enzyme;
(2) adapter ligation: 5 different sets of adapters are added to the digestion products, respectively, and ligated with T4 DNA ligase;
(3) amplification: the ligation products are amplified by polymerase chain reaction (PCR);
(4) concatenation: the five tags are joined in order according to the information of the 5 adapter sets;
(5) pooling: a barcode sequence is added to the concatenated products, and the libraries are pooled;
(6) sequencing: the high-quality libraries that pass quality inspection are sequenced on the instrument.
The above library construction and sequencing workflow follows "Serial sequencing of isolength RAD tags for cost-effective genome-wide profiling of genetic and epigenetic variations" by Shi Wang et al., published online on October 6, 2016.
Bioinformatics analysis:
The invention takes the cattle genome (https://www.ncbi.nlm.nih.gov/genome/) as the reference. The analysis flow is as follows:
(1) data filtering: quality control of the Clean Reads;
(2) extraction of digested sequences (Enzyme Reads): the reads containing the restriction enzyme recognition site, called Enzyme Reads, are extracted for subsequent analysis;
(3) alignment: the Enzyme Reads are aligned to the constructed reference sequence with the SOAP software;
(4) SNP typing: typing is carried out on the alignment results by the maximum likelihood (ML) method;
(5) analysis: construction of an evolutionary tree, principal component analysis, population genetic structure analysis, genome-wide association analysis and the like.
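The extraction of Enzyme Reads in step (2) can be sketched as a pattern search on both strands. The sketch assumes the BaeI recognition sequence is ACNNNNGTAYC (the commonly cited form); the pattern, the function names and the in-memory list of reads are illustrative assumptions, not details given in the patent.

```python
import re
from typing import Iterable, List

# Assumed BaeI recognition sequence ACNNNNGTAYC (N = any base, Y = C or T).
BAEI_SITE = re.compile(r"AC[ACGT]{4}GTA[CT]C")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_baei_site(read: str) -> bool:
    """True if the recognition site occurs on either strand of the read."""
    read = read.upper()
    return bool(
        BAEI_SITE.search(read) or BAEI_SITE.search(reverse_complement(read))
    )


def extract_enzyme_reads(reads: Iterable[str]) -> List[str]:
    """Keep only the reads that contain the recognition site."""
    return [read for read in reads if has_baei_site(read)]


if __name__ == "__main__":
    tag = "TCCCCTTGATACTCATGTATTCCAATAA"  # reference tag quoted in claim 1
    print(extract_enzyme_reads([tag, "A" * 28]))
```

Searching the reverse complement matters because a tag can be sequenced from either strand; the 28-base reference tag in claim 1 carries the assumed site only on its reverse strand.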
The Enzyme Reads are aligned to the reference sequence with the SOAP software, and SNP markers are then typed by the maximum likelihood (ML) method. The RAD typing software package (RADtyping) used in this process comprises more than 10 software components and covers the whole workflow from data preprocessing to output of the final typing results. To ensure the accuracy of the subsequent analysis, the typing results are further filtered by the following criteria after typing is complete:
1) removing sites that can be typed in fewer than 80% of the individuals across all samples;
2) removing sites with a minor allele frequency (MAF) below 0.01;
3) removing single nucleotide polymorphism (SNP) sites containing 1 or 4 base types;
4) removing sites with more than 1 SNP within the tag;
5) removing sites with fewer than 2 genotypes within the tag.
A total of 10058 SNP markers were obtained from all samples.
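The call-rate, MAF and genotype-count filters above can be sketched as follows. The data layout (a dictionary mapping a site name to per-individual genotypes coded 0, 1, 2, with `None` for untyped individuals) and all function names are hypothetical; the tag-level filters 3) and 4) need per-tag information and are not covered here.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # 0/1/2 = allele copies, None = not typed


def call_rate(genos: Genotypes) -> float:
    """Fraction of individuals that could be typed at this site."""
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_frequency(genos: Genotypes) -> float:
    """MAF computed from the typed individuals only."""
    typed = [g for g in genos if g is not None]
    if not typed:
        return 0.0
    freq = sum(typed) / (2 * len(typed))
    return min(freq, 1.0 - freq)


def passes_filters(
    genos: Genotypes, min_call_rate: float = 0.80, min_maf: float = 0.01
) -> bool:
    """Apply filters 1), 2) and 5) from the list above to a single site."""
    if call_rate(genos) < min_call_rate:
        return False
    if minor_allele_frequency(genos) < min_maf:
        return False
    return len({g for g in genos if g is not None}) >= 2


def filter_sites(sites: Dict[str, Genotypes]) -> Dict[str, Genotypes]:
    """Return only the sites that pass every implemented filter."""
    return {name: g for name, g in sites.items() if passes_filters(g)}


if __name__ == "__main__":
    demo = {
        "site_a": [0, 1, 2, 1, 0],
        "site_b": [0, 0, 0, 0, 0],
        "site_c": [0, 1, None, None, None],
    }
    print(sorted(filter_sites(demo)))
```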
Statistical analysis model
The study used a Bayesian model and a logistic regression model to perform a genome-wide association study (GWAS) of the clinical mastitis phenotype in dairy cows.
A linear regression model based on the dairy cow mastitis phenotype is first constructed:

$$y_i = \mu + \sum_{k=1}^{M} X_{ik}\,\alpha_k + e_i$$

where $y_i$ is the phenotype of the $i$-th individual; $M$ is the total number of SNPs; $\mu$ is the overall phenotypic mean; $\alpha_k$ is the additive association effect of the $k$-th SNP; $X_{ik}$ is the genotype (0, 1 or 2) of the $k$-th SNP in the $i$-th individual; and $e_i$ is the residual effect.
The Bayesian model assumes that each SNP effect has a normal prior distribution with zero mean and SNP-specific variance $\sigma_k^2$, where $k = 1, 2, \ldots, M$. The SNP effect variances are mutually independent, and each is independently and identically distributed (IID) with a scaled inverse chi-square prior, where $\nu$ is the degrees-of-freedom parameter and $S^2$ is the scale parameter:

$$P(\sigma_k^2 \mid \nu, S^2) = \chi^{-2}(\nu, S^2)$$

The marginal prior distribution of each SNP effect is a t-distribution:

$$P(\alpha_k \mid \nu, S^2) = t(0, \nu, S^2)$$

The prior of $\alpha_k$ depends on the variance of each SNP, and each variance has an inverse chi-square prior. With probability $\pi$ a SNP has zero effect, and with probability $(1-\pi)$ its effect follows a normal distribution:

$$\alpha_k \mid \pi, \sigma_\alpha^2 \;\begin{cases} = 0 & \text{with probability } \pi \\ \sim N(0, \sigma_\alpha^2) & \text{with probability } (1-\pi) \end{cases}$$

where $\sigma_\alpha^2$ is the common variance of all non-zero SNP effects, which has a scaled inverse chi-square prior:

$$P(\sigma_\alpha^2 \mid \nu_a, S_a^2) = \chi^{-2}(\nu_a, S_a^2)$$

The unknown value of $\pi$ in the model is estimated from its prior distribution, which is taken to be uniform between 0 and 1: $\pi \sim \mathrm{Uniform}(0, 1)$.
$\nu_a$ is set to 4, and $S_a^2$ is calculated from the additive variance:

$$S_a^2 = \frac{\tilde{\sigma}_\alpha^2\,(\nu_a - 2)}{\nu_a} \quad\text{and}\quad \tilde{\sigma}_\alpha^2 = \frac{\tilde{\sigma}_s^2}{(1-\pi)\sum_{k=1}^{K} 2p_k(1-p_k)}$$

where $p_k$ is the allele frequency of the $k$-th SNP; $\tilde{\sigma}_\alpha^2$ is the variance of a given marker; and $\tilde{\sigma}_s^2$ is the additive genetic variance explained by the SNPs.
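As a worked illustration of how the scale parameter follows from the additive variance, the sketch below assumes the usual form of these hyperparameters: the per-marker variance equals the additive genetic variance divided by (1 − π) Σ 2p_k(1 − p_k), and the scale parameter equals that variance times (ν_a − 2)/ν_a. The function names and the example numbers are illustrative and not taken from the patent.

```python
from typing import Sequence


def marker_variance(
    additive_variance: float, allele_freqs: Sequence[float], pi: float
) -> float:
    """Assumed per-marker variance: sigma_s^2 / ((1 - pi) * sum(2 p (1 - p)))."""
    heterozygosity = sum(2.0 * p * (1.0 - p) for p in allele_freqs)
    return additive_variance / ((1.0 - pi) * heterozygosity)


def scale_parameter(sigma_alpha2: float, nu_a: float = 4.0) -> float:
    """Assumed scale of the inverse chi-square prior: sigma^2 (nu - 2) / nu."""
    return sigma_alpha2 * (nu_a - 2.0) / nu_a


if __name__ == "__main__":
    # Two hypothetical SNPs with allele frequency 0.5, half of all SNPs non-zero.
    sigma_alpha2 = marker_variance(1.0, [0.5, 0.5], pi=0.5)
    print(sigma_alpha2, scale_parameter(sigma_alpha2))
```

With ν_a = 4, the scale parameter is simply half the per-marker variance, which is why the choice of ν_a directly controls how strongly the prior shrinks the common effect variance.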
Logistic regression analysis model: assuming that single nucleotide polymorphisms influence the clinical mastitis phenotype of dairy cows, a logistic regression model is established to predict the likelihood of clinical mastitis. A fitted logistic regression equation is constructed:

$$\ln\frac{P_j}{1-P_j} = \mu + \beta_j X_{ij}$$

where $P_j$ is the probability that the clinical mastitis phenotype occurs under condition $X_j$; $(1-P_j)$ is the probability that the clinical mastitis phenotype does not occur; $X_j = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{mj})$, with $X_{ij}$ the genotype (0, 1 or 2) of the $i$-th individual at site $j$, e.g. AA coded as 0, TT as 2 and AT as 1; or CC as 0, GG as 2 and CG as 1; or AA as 0, CC as 2 and AC as 1, and so on; $\beta_j$ is the effect of the $j$-th SNP; $m$ is the number of samples; and $\mu$ is the overall phenotypic mean. Writing $Y = \mu + \sum \beta_i X_i$, the equation can be converted to another form:

$$P = \frac{e^{Y}}{1 + e^{Y}}$$

where $Y$ represents the mastitis phenotype of the $i$-th individual and $P$ is the probability of the clinical mastitis phenotype; $X_i$ is the genotype of the $i$-th individual; and $\exp(\beta_i)$ is the odds ratio (OR), with 95% confidence interval (CI) $\exp\big(\beta_i \pm 1.96\,SE(\beta_i)\big)$.
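The relationship between the fitted coefficient, the predicted probability and the odds ratio can be sketched in a few lines. The sketch assumes a single-SNP model with linear predictor μ + β·X and the inverse-logit form P = 1/(1 + e^(−Y)); the function names and the example coefficient and standard error are hypothetical.

```python
import math
from typing import Tuple


def mastitis_probability(mu: float, beta: float, genotype: int) -> float:
    """Inverse logit of the linear predictor Y = mu + beta * genotype."""
    y = mu + beta * genotype
    return 1.0 / (1.0 + math.exp(-y))


def odds_ratio_with_ci(
    beta: float, se: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) as exp(beta) and exp(beta -/+ z * SE)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    beta, se = math.log(2.0), 0.25
    print([round(mastitis_probability(-1.0, beta, g), 3) for g in (0, 1, 2)])
    print(odds_ratio_with_ci(beta, se))
```

Because the interval is built on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio, and a lower limit above 1 corresponds to a significant risk-increasing allele.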
In this study, 1 key SNPs site of mastitis in dairy cows was obtained by two analytical models, as shown in tables 1 and 2:
TABLE 1 Bayesian analytical model results
TABLE 2 results of logistic regression analysis model
Note: the significance markers denote, respectively, a p-value calculated from the chi-square test (< 0.05) and a p-value of the t-statistic of the logistic regression model (< 0.05). CHISQ: the chi-square value under the chi-square test. STAT: the t-statistic under the logistic regression model. OR: odds ratio. L95: lower limit of the 95% confidence interval. U95: upper limit of the 95% confidence interval.
To verify the association between the SNP marker and dairy cow mastitis, a case-control study was used to compare the exposure rate of the key SNP locus between the case group and the control group. If the difference between the two groups is statistically significant, the SNP site can be considered to be associated with dairy cow mastitis. Interference from external matching factors is excluded from the comparison, so that only the association between the SNPs and mastitis is considered. A matched design with unequal numbers of cases and controls (case : control = 1 : h) was used to determine the number of validation samples.
OR=ad/bc
Here n is the number of clinical mastitis cases required in the validation population, and N is the total number of cows in the validation population. P0 is the exposure rate of the SNP site mutation in the normal control population, P1 is the exposure rate of the SNP site mutation in the clinical mastitis population, OR is the odds ratio (the expected association strength of the SNP site), α is the probability of a type I error in the hypothesis test (the expected significance level), β is the probability of a type II error in the hypothesis test, (1-β) is the expected power of the test, OR 95% CI is the 95% confidence interval of the OR, and χ² is the chi-square statistic of the key SNP locus. a is the number of individuals carrying the SNP site mutation in the clinical mastitis group, b is the number of individuals carrying the SNP site mutation in the normal control group, c is the number of individuals without the SNP site mutation in the clinical mastitis group, and d is the number of individuals without the SNP site mutation in the normal control group, as shown in Table 3.
rs75762330
SNP site base | Clinical mastitis | Normal control | Total |
T | 36 (a) | 89 (b) | 162 |
C | 37 (c) | 221 (d) | 221 |
Total | 73 | 310 | 383 |
TABLE 3 Verification of the association between the SNP marker and mastitis in cows
With degrees of freedom df = 1, OR = ad/bc = 2.416. An OR value > 1 indicates that the C > T variant at site rs75762330 is a risk factor for clinical mastitis in Chinese Holstein cows, i.e. there is a positive association between T and mastitis. The chi-square value χ² = 10.279 ≥ 7.879, P < 0.005, so the null hypothesis is rejected; that is, the difference at SNP site rs75762330 is statistically significant.
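The odds ratio above can be reproduced directly from the four cell counts a, b, c and d of Table 3. The sketch below also computes a 95% confidence interval from the standard error of ln(OR), sqrt(1/a + 1/b + 1/c + 1/d); that interval method and the function names are assumptions added for illustration and are not stated in the patent.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = ad / bc for a 2 x 2 case-control table."""
    return (a * d) / (b * c)


def log_or_confidence_interval(
    a: int, b: int, c: int, d: int, z: float = 1.96
) -> Tuple[float, float]:
    """Assumed interval: exp(ln(OR) -/+ z * sqrt(1/a + 1/b + 1/c + 1/d))."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Cell counts a, b, c, d as labelled in Table 3.
    a, b, c, d = 36, 89, 37, 221
    print(round(odds_ratio(a, b, c, d), 3))
    print(log_or_confidence_interval(a, b, c, d))
```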
The examples described are illustrative of the invention and are not to be construed as limiting the invention, and any variations and modifications which come within the meaning and range of equivalency of the invention are to be considered within the scope of the invention.
Claims (8)
1. Use of a detection reagent for detecting a key SNP locus of dairy cow mastitis in the preparation of a dairy cow mastitis kit, characterized in that the key SNP locus is located in an intron region of the gene PTK2B on chromosome AC_000165.1, the reference sequence in NCBI is TCCCCTTGATACTCATGTATTCCAATAA, the 5th position is the single nucleotide polymorphism marker site, and the SNP is C > T.
2. The use according to claim 1, wherein the 2b-RAD genotyping and analysis method for the key SNP locus of dairy cow mastitis comprises the following steps:
library construction and sequencing;
bioinformatics analysis:
(1) data filtering: quality control of the Clean Reads;
(2) extraction of digested sequences: the sequences containing the restriction enzyme recognition site are extracted for subsequent analysis;
(3) alignment: the digested sequences are aligned to the constructed reference sequence with the SOAP software;
(4) SNP typing: typing is carried out on the alignment results by the maximum likelihood method;
(5) analysis: construction of an evolutionary tree, principal component analysis, population genetic structure analysis or genome-wide association analysis.
3. The use according to claim 2, wherein, after the digested sequences are aligned to the reference sequence with the SOAP software, SNP markers are typed by the maximum likelihood method, and after typing the results are further filtered by the following steps 1) to 5):
1) removing sites that can be typed in fewer than 80% of the individuals across all samples;
2) removing sites with a minor allele frequency (MAF) below 0.01;
3) removing single nucleotide polymorphism sites containing 1 or 4 base types;
4) removing sites with more than 1 SNP within the tag;
5) removing sites with fewer than 2 genotypes within the tag.
4. The use according to claim 2, wherein a Bayesian model and a Logistic regression model are used to perform genome-wide association analysis of the clinical mastitis phenotypic traits of dairy cows;
before the genome-wide association analysis is carried out, a linear regression model equation is first constructed based on the mastitis phenotypic traits of the dairy cows: $y_i = \mu + \sum_{k=1}^{M} X_{ik}\alpha_k + e_i$, wherein $y_i$ represents the phenotypic trait of the i-th individual; $M$ is the total number of SNPs; $\mu$ is the overall mean of the phenotypic trait; $\alpha_k$ is the additive association effect of the k-th SNP; $X_{ik}$ is the genotype of the k-th SNP in the i-th individual; $e_i$ is the residual effect; and $k$ indexes the SNP sites.
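The symbol definitions in claim 4 describe an additive model, read here as y_i = mu + sum over k of X_ik * alpha_k + e_i. A minimal sketch of evaluating the fitted values and residuals follows; the 0/1/2 genotype coding (copies of one allele) and the function names are assumptions for illustration.

```python
def fitted_phenotypes(genotypes, effects, mean):
    """Return mu + sum_k X_ik * alpha_k for each individual i.

    genotypes: one row per individual, one column per SNP (assumed 0/1/2).
    effects:   additive effect alpha_k for each SNP.
    mean:      overall phenotypic mean mu.
    """
    return [mean + sum(x * a for x, a in zip(row, effects))
            for row in genotypes]


def residuals(phenotypes, genotypes, effects, mean):
    """Return e_i = y_i minus the fitted value for each individual."""
    fitted = fitted_phenotypes(genotypes, effects, mean)
    return [y - f for y, f in zip(phenotypes, fitted)]
```

With two SNPs of effect 0.1 and -0.2 and a mean of 0.5, an individual with genotypes (0, 1) and one with genotypes (2, 2) both have a fitted value of 0.3.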
5. The use according to claim 4,
wherein the Bayesian model assumes that each SNP effect follows a prior normal distribution with zero mean and SNP-specific variance $\sigma_k^2$, where $k = 1, 2, \ldots, M$ indexes the SNP sites; the SNP effect variances are mutually independent, and each variance is independently and identically distributed (IID) with a scaled inverse chi-square prior: $P(\sigma_k^2 \mid v, S^2) = v S^2 \chi_v^{-2}$, where $v$ is the degrees-of-freedom parameter, $S^2$ is the scale parameter, $P$ denotes the IID scaled inverse chi-square prior of each variance, and $\chi^{-2}$ is the inverse chi-square; the marginal prior distribution of each SNP effect then follows a t-distribution: $P(\alpha_k \mid v, S^2) = t(0, v, S^2)$, wherein $P(\alpha_k \mid v, S^2)$ is the marginal prior distribution of each SNP effect and $\alpha_k$ is the additive association effect of the k-th SNP; $\alpha_k$ depends on the variance of each SNP, which has a scaled inverse chi-square prior; each SNP has zero effect with probability $\pi$, or follows a normal distribution with probability $(1-\pi)$: $\alpha_k \mid \pi, \sigma_\alpha^2 = 0$ with probability $\pi$, and $\alpha_k \mid \pi, \sigma_\alpha^2 \sim N(0, \sigma_\alpha^2)$ with probability $(1-\pi)$, wherein $\sigma_\alpha^2$ represents the common variance of all non-zero SNP effects, which has a scaled inverse chi-square prior: $\sigma_\alpha^2 \mid v_a, S_a^2 \sim v_a S_a^2 \chi_{v_a}^{-2}$;
$v_a$ is set to 4, and $S_a^2$ is calculated from the additive variance: $S_a^2 = \tilde{\sigma}_\alpha^2 (v_a - 2)/v_a$ and $\tilde{\sigma}_\alpha^2 = \tilde{\sigma}_s^2 \big/ \left[(1-\pi)\sum_{k=1}^{K} 2p_k(1-p_k)\right]$, wherein $p_k$ is the allele frequency of the k-th SNP; $\tilde{\sigma}_\alpha^2$ is the variance of a given marker; $\tilde{\sigma}_s^2$ is the additive genetic variance explained by the SNPs; $\chi_{v_a}^{-2}$ is the inverse chi-square prior distribution; and $K$ is the total number of SNPs.
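Claim 5 describes a mixture prior in which an SNP has zero effect with probability pi and a normally distributed effect with probability (1 - pi). The sketch below only draws effects from such a prior to make the idea concrete; it is not the patent's sampler. The function name, the seed handling, and the example values are assumptions.

```python
import random


def draw_snp_effects(n_snps, pi, common_variance, seed=0):
    """Draw SNP effects from a zero/normal mixture prior.

    With probability `pi` an effect is exactly zero; otherwise it is drawn
    from N(0, common_variance). A fixed seed keeps the draw reproducible.
    """
    rng = random.Random(seed)
    sd = common_variance ** 0.5
    return [0.0 if rng.random() < pi else rng.gauss(0.0, sd)
            for _ in range(n_snps)]


effects = draw_snp_effects(n_snps=10000, pi=0.9, common_variance=0.01)
zero_fraction = sum(e == 0.0 for e in effects) / len(effects)
```

With pi = 0.9, roughly nine in ten of the simulated SNPs carry no effect, which is the sparsity assumption that lets the model concentrate on a small number of associated sites.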
6. The use according to claim 4,
wherein, in the Logistic regression analysis model, it is assumed that the single nucleotide polymorphisms influence the clinical mastitis phenotypic trait of dairy cows, and a Logistic regression model is established to predict the probability of clinical mastitis in dairy cows; a fitted Logistic regression equation is first constructed: $\mathrm{logit}(P_j) = \ln\frac{P_j}{1-P_j} = \mu + \beta_j X_{ij}$, wherein $P_j$ is the probability that the clinical mastitis phenotype occurs under condition $X_j$, $(1-P_j)$ is the probability that the clinical mastitis phenotype does not occur under condition $X_j$, $j$ denotes the j-th SNP site, $X_{ij} = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{Mj})$ is the genotype of the i-th individual at site $j$, $\beta_j$ is the effect of the j-th SNP, $M$ is the number of samples, and $\mu$ is the overall mean of the phenotypic trait; in the Logistic regression analysis model, $\mathrm{logit}(P) = \mu + \sum_i \beta_i X_i$, and the equation is converted to another form: $P(Y = 1 \mid X) = \frac{\exp(\mu + \sum_i \beta_i X_i)}{1 + \exp(\mu + \sum_i \beta_i X_i)}$, wherein $Y$ represents the mastitis phenotype of the i-th individual and $P$ represents the probability of the clinical mastitis phenotype; $X_i$ is the genotype of the i-th individual; $\exp(\beta_i)$ is the odds ratio (OR); the relationship between $P$ and the variables is transformed by the equation $OR = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}$, with 95% confidence interval (CI) $\exp(\beta_i \pm 1.96\,SE(\beta_i))$, wherein $p_1$ represents the probability of occurrence of a given SNP site in the case group and $p_0$ represents the probability of occurrence of the corresponding site in the control group; CI refers to the 95% confidence interval; and $SE(\beta_i)$ is the standard error of $\beta_i$.
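The quantities named in claim 6 can be computed as in the sketch below: the predicted probability from the logistic form, the odds ratio with its 95% confidence interval exp(beta ± 1.96·SE(beta)), and the odds ratio from case and control frequencies p1 and p0. The function names and the example numbers are hypothetical; no values are taken from the patent.

```python
import math


def mastitis_probability(intercept, effects, genotypes):
    """Logistic form: P = 1 / (1 + exp(-(mu + sum_i beta_i * X_i)))."""
    eta = intercept + sum(b * x for b, x in zip(effects, genotypes))
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Return (OR, lower, upper) with OR = exp(beta) and a 95% CI."""
    return (math.exp(beta),
            math.exp(beta - z * standard_error),
            math.exp(beta + z * standard_error))


def odds_ratio_from_frequencies(p1, p0):
    """Odds of the site in cases (p1) divided by odds in controls (p0)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Illustrative values only.
odds, lower, upper = odds_ratio_with_ci(beta=0.0, standard_error=1.0)
```

A coefficient of zero gives an odds ratio of 1, meaning no association. An interval that excludes 1 is the usual indication that the site is associated with the phenotype at the 5% level.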
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811146220.XA CN109182505B (en) | 2018-09-29 | 2018-09-29 | Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109182505A (en) | 2019-01-11 |
CN109182505B (en) | 2022-01-04 |
Family
ID=64906890
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811146220.XA Expired - Fee Related CN109182505B (en) | 2018-09-29 | 2018-09-29 | Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109182505B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112164424B (en) * | 2020-08-03 | 2024-04-09 | 南京派森诺基因科技有限公司 | Group evolution analysis method based on no-reference genome |
JP7465485B2 (en) | 2022-03-24 | 2024-04-11 | 国立大学法人東京農工大学 | DNA marker for use in determining risk of developing mastitis and method for determining risk of mastitis using the same |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102899395B (en) * | 2012-06-20 | 2014-12-10 | 山东省农业科学院奶牛研究中心 | Breed selection method for improving mastitis resistance of dairy cow and use thereof |
CN102899396B (en) * | 2012-07-25 | 2015-02-18 | 山东省农业科学院奶牛研究中心 | Core promoter for influencing cow mastitis infectibility/resistance HMGB3 gene and functional molecule mark and application |
CN103146821B (en) * | 2013-02-25 | 2015-06-17 | 安徽农业大学 | Method for evaluating inheritance effect of SNP (Single Nucleotide Polymorphism) sites to traits and application thereof |
CN104232627B (en) * | 2013-06-13 | 2017-05-10 | 深圳华大基因科技有限公司 | 2b-RAD pooling technology |
CN105925680B (en) * | 2016-05-06 | 2019-06-18 | 中国农业科学院蔬菜花卉研究所 | A kind of method and its application of Tetraploid Potatoes high-flux sequence exploitation label |
CN108004340B (en) * | 2016-10-27 | 2021-04-16 | 河南农业大学 | Method for developing SNP (single nucleotide polymorphism) of whole genome of peanut |
Also Published As
Publication number | Publication date |
---|---|
CN109182505A (en) | 2019-01-11 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109182538B (en) | Method for genotyping and analyzing key SNPs sites rs88640083 and 2b-RAD of dairy cow mastitis | |
EP2805280B1 (en) | Diagnostic processes that factor experimental conditions | |
US20120184449A1 (en) | Fetal genetic variation detection | |
JP7497879B2 (en) | Methods and Reagents for Analysing Nucleic Acid Mixtures and Mixed Cell Populations and Related Uses - Patent application | |
US20220106642A1 (en) | Multiplexed Parallel Analysis Of Targeted Genomic Regions For Non-Invasive Prenatal Testing | |
Liu et al. | A comprehensive catalogue of regulatory variants in the cattle transcriptome | |
CN109182505B (en) | Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis | |
CN109182504B (en) | Method for genotyping and analyzing key SNPs sites rs20438858 and 2b-RAD of dairy cow mastitis | |
AU2020296108B2 (en) | Systems and methods for determining pattern of inheritance in embryos | |
US11649500B2 (en) | Target-enriched multiplexed parallel analysis for assessment of fetal DNA samples | |
US20200399701A1 (en) | Systems and methods for using density of single nucleotide variations for the verification of copy number variations in human embryos | |
US20230279494A1 (en) | Methods for non-invasive assessment of fetal genetic variations that factor experimental conditions | |
Wojciechowska et al. | Nowak-Zyczy nska | |
Morgan | 14 Considerations in Estimating Genotype in Nutrigenetic Studies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220104 |