CN109182505B - Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis

Publication number
CN109182505B
Authority
CN
China
Prior art keywords
snp
snps
mastitis
analysis
sites
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201811146220.XA
Other languages
Chinese (zh)
Other versions
CN109182505A (en
Inventor
蔡亚非
杨帆
李君�
陈芳慧
江孝俊
袁露
马腾月
吕成龙
李莲
李惠侠
王根林
韩兆玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Agricultural University
Original Assignee
Nanjing Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Agricultural University
Priority to CN201811146220.XA
Publication of CN109182505A
Application granted
Publication of CN109182505B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 - Oligonucleotides characterized by their use
    • C12Q2600/156 - Polymorphic or mutational markers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the key SNP locus rs75762330 of dairy cow mastitis and to a 2b-RAD genotyping and analysis method, which comprises the following steps: library construction and sequencing; and bioinformatics analysis, namely data filtering, extraction of enzyme-digested sequences, data alignment, SNP typing and genome-wide association analysis. A Bayesian model and a logistic regression model are used to perform genome-wide association analysis (GWAS) on the clinical mastitis phenotype of dairy cows. Compared with the prior-art RADseq technology, the 2b-RAD sequencing technology has the following advantages: 1. the digested fragments are uniform in length and need no subsequent size selection; 2. no Y-shaped adaptor needs to be added to the digested fragments; 3. the steps are simple; 4. the sequencing cost per sample is low; 5. the sequencing time is short. The invention also constructs two genome-wide association analysis models (BayesA and logistic regression), and screens out a key SNP locus for mastitis in Chinese Holstein cows and its corresponding gene (PTK2B).

Description

Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis
Technical Field
The invention relates to the key SNP locus rs75762330 of dairy cow mastitis and to a 2b-RAD genotyping and analysis method.
Background
Restriction-site-associated DNA sequencing (RADseq) uses a restriction enzyme to digest the genome into DNA fragments of a certain size, then builds a sequencing library and performs high-throughput sequencing of the RAD markers generated by the digestion. RADseq is considered one of the most important scientific breakthroughs of the past decade: thousands of single nucleotide polymorphism (SNP) markers across the whole genome can be detected at once with a single, simple and cost-effective method, which has advanced genomics research. Compared with other sequencing technologies, it offers high throughput, good accuracy, a short experimental cycle and good cost performance, and it does not depend on the availability of a reference genome sequence. It has been successfully applied to population genetic structure and phylogenetic analysis, quantitative trait locus (QTL) mapping of important economic traits in animals and plants, assisted genetic breeding, genetic map construction, SNP marker detection and other research fields.
The RADseq workflow comprises: enzyme digestion of genomic DNA (endonuclease); library construction (adaptor ligation, fragment size selection, fragment end modification, addition of a Y-shaped adaptor at the ends, PCR amplification); on-machine sequencing (mainly on the Illumina GAII or HiSeq platforms); and bioinformatics analysis (common analysis software: Stacks, pyRAD, UNEAK and the like). The specific flow chart is shown in FIG. 1.
The prior art has the following disadvantages: 1. the digested fragments differ in size and must be size-selected; 2. different adaptors must be added twice at the ends of the digested fragments; 3. a special A-tail and a Y-shaped adaptor must be added to the digested fragments; 4. the steps are relatively complicated, technically demanding and time-consuming; 5. the sequencing cost per sample is high.
Disclosure of Invention
To overcome these defects, the invention provides a 2b-RAD genotyping and analysis method in which the enzyme-digested DNA fragments are uniform in length, no subsequent size selection is needed, adaptors do not have to be added multiple times, the steps are simple, the sequencing time is shortened, and the sequencing cost per sample is reduced.
The invention also provides a key SNP locus for dairy cow mastitis: the key locus rs75762330 is located in an intron region of the gene PTK2B on chromosome AC_000165.1, and the polymorphism is C > T.
The 2b-RAD genotyping and analyzing method for screening the key SNPs sites of the mastitis of the dairy cattle comprises the following steps:
1) library construction and sequencing: enzyme digestion: at least 200 ng of genomic DNA is digested with a type IIB restriction enzyme; adaptor addition: 5 groups of different adaptors are added to the digestion products and ligated with T4 DNA ligase;
amplification; concatenation; pooling; sequencing: the DNA library that passes quality inspection is sequenced on the machine;
2) bioinformatics analysis:
(1) data filtering: performing quality control on Clean Reads;
(2) enzyme-digested sequence extraction: extracting the sequences that contain the restriction enzyme recognition site for subsequent analysis;
(3) data alignment: aligning the enzyme-digested sequences to the constructed reference sequence with SOAP software;
(4) SNP typing: typing by the maximum likelihood (ML) method according to the alignment result;
(5) analysis: construction of an evolutionary tree, principal component analysis, population genetic structure analysis or genome-wide association analysis.
The enzyme-digested sequences are aligned to the reference sequence with SOAP software, SNP markers are then typed by the maximum likelihood (ML) method, and after typing the result is further filtered by the following steps 1) to 5):
1) removing sites that can be typed in fewer than 80% of individuals across all samples;
2) removing sites with a minor allele frequency (MAF) below 0.01;
3) removing single nucleotide polymorphism (SNP) sites containing 1 or 4 base types;
4) removing sites with more than 1 SNP within the tag;
5) removing sites with fewer than 2 genotypes within the tag.
Carrying out genome-wide association analysis (GWAS) on the clinical mastitis phenotypic characters of the dairy cows by adopting a Bayesian model and a Logistic regression model;
before the genome-wide association analysis (GWAS) is carried out, a linear regression model equation based on the dairy cow mastitis phenotype is constructed first,
$$y_i = \mu + \sum_{k=1}^{M} X_{ik}\,\alpha_k + e_i$$
where $y_i$ is the phenotype of the ith individual; M is the total number of SNPs; $\mu$ is the overall mean of the phenotypic trait; $\alpha_k$ is the additive association effect of the kth SNP; $X_{ik}$ is the genotype of the kth SNP for the ith individual; $e_i$ is the residual effect; and k indexes the SNP sites.
The Bayesian model assumes that each SNP effect follows a normal prior distribution with zero mean and a SNP-specific variance $\sigma_k^2$, where k = 1, 2, ..., M indexes the SNP sites; the SNP effect variances are mutually independent, and each variance is independently and identically distributed (IID) with an inverse chi-square prior distribution:
Figure BDA0001816861040000031
where $\nu$ is the degrees-of-freedom parameter, $S^2$ is the scale parameter, P denotes the IID inverse chi-square prior distribution of each variance, and $\chi^{-2}$ is the inverse chi-square; the marginal prior distribution of each SNP effect follows a t-distribution:
Figure BDA0001816861040000032
Figure BDA0001816861040000033
where N denotes the normal distribution of the nonzero SNP effects,
Figure BDA0001816861040000034
$P(\alpha_k \mid \nu, S^2)$ is the marginal prior distribution of each SNP effect, and $\alpha_k$ is the additive association effect of the kth SNP, which depends on the variance of that SNP, each variance having an inverse chi-square prior; with probability $\pi$ a SNP has zero effect, and with probability $(1-\pi)$ its effect follows a normal distribution,
Figure BDA0001816861040000035
$\alpha_k \mid \pi$,
Figure BDA0001816861040000036
Figure BDA0001816861040000037
where
Figure BDA0001816861040000038
represents the common variance of all nonzero SNP effects, which is assigned a scaled inverse chi-square prior distribution:
Figure BDA0001816861040000039
The unknown value of $\pi$ in the model is estimated under a prior distribution that is uniform between 0 and 1, i.e. $\pi \sim \mathrm{Uniform}(0, 1)$.
$\nu_a$ is set to 4, and
Figure BDA00018168610400000310
is calculated from the additive variance:
Figure BDA00018168610400000311
and
Figure BDA00018168610400000312
where $p_k$ is the allele frequency of the kth SNP;
Figure BDA00018168610400000313
is the variance for a given marker; the additive genetic variance explained by the SNPs is
Figure BDA00018168610400000314
and
Figure BDA00018168610400000315
is the chi-square prior distribution; $p_k$ is the allele frequency of the kth SNP; and k indexes the SNPs.
Logistic regression analysis model: assuming that single nucleotide polymorphisms influence the clinical mastitis phenotype of dairy cows, a logistic regression model is established to predict the likelihood of clinical mastitis in dairy cows. A fitted logistic regression equation is constructed first,
Figure BDA00018168610400000316
where $P_j$ is the probability that the clinical mastitis phenotype occurs under condition $X_j$, and $(1-P_j)$ is the probability that it does not occur under condition $X_j$; j denotes the jth SNP site; $X_{ij} = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{mj})$ is the genotype of the ith individual at site j (0, 1 or 2); $\beta_j$ is the effect of the jth SNP; M is the number of samples; and $\mu$ is the overall mean of the phenotypic trait. In the logistic regression analysis model ($Y = \mu + \sum \beta_i X_i$), the equation is converted to another form:
Figure BDA00018168610400000317
where Y represents the mastitis phenotype of the ith individual and P represents the probability of the clinical mastitis phenotype; $X_i$ is the genotype of the ith individual; $\beta_i$ is the regression coefficient, and $\exp(\beta_i)$ is the odds ratio (OR). The relation between P and the variables is transformed by the equations:
Figure BDA00018168610400000318
Figure BDA00018168610400000319
The 95% confidence interval (CI) is $\exp(\beta_i \pm 1.96\,SE(\beta_i))$, where p1 is the probability of occurrence of a given SNP site in the case group, p0 is the probability of occurrence of the corresponding site in the control group, and $SE(\beta_i)$ is the standard error of $\beta_i$.
The invention obtains 1 key SNPs locus of mastitis of dairy cows by two analysis models, as shown in tables 1 and 2:
TABLE 1 Bayesian analytical model results
Figure BDA0001816861040000041
TABLE 2 results of logistic regression analysis model
Figure BDA0001816861040000042
Compared with the prior art, the invention has the following beneficial effects. Compared with RADseq, the 2b-RAD sequencing technology has the following advantages: 1. the digested fragments are uniform in length and need no subsequent size selection; 2. no Y-shaped adaptor needs to be added to the digested fragments; 3. the steps are simple; 4. the sequencing cost per sample is low; 5. the sequencing time is short. The invention also constructs two genome-wide association analysis models (BayesA and logistic regression), and screens out a key SNP locus for mastitis in Chinese Holstein cows and its corresponding gene (PTK2B).
Drawings
FIG. 1 is a flow diagram of a prior art RADSeq sequencing technique;
FIG. 2 is a flowchart of the 2b-RAD sequencing of the present invention;
FIG. 3 is a diagram of the alignment of the direct sequencing of PCR amplified fragments with the NCBI reference sequence, (A) and (B) are diagrams of direct sequencing of PCR amplified fragments in Chromas; (C)1 is the NCBI reference sequence, a and b are direct sequencing sequences; the grey box is the single nucleotide polymorphic marker site.
Detailed Description
The invention is further illustrated by the following examples and figures.
2b-RAD is a simplified RAD genotyping method based on type IIB restriction enzymes, and it provides a powerful technique for population genomics research. In this study, Chinese Holstein cows were taken as the research object. A clinical mastitis group and a normal healthy control group were assembled, whole-genome DNA was extracted from the cows of both groups, and the genomic DNA of all cow samples was digested with the BaeI endonuclease to obtain standard digestion fragments, which were then sequenced on the machine and analyzed. The specific library construction and sequencing flow is as follows (FIG. 2):
(1) enzyme digestion: at least 200 ng of genomic DNA is digested with a type IIB restriction enzyme;
(2) adaptor addition: 5 groups of different adaptors are added to the digestion products and ligated with T4 DNA ligase;
(3) amplification: the ligation products are amplified by polymerase chain reaction (PCR);
(4) concatenation: the five tags are concatenated in order according to the information of the 5 adaptor groups;
(5) pooling: a barcode sequence is added to the concatenated products, and the libraries are mixed;
(6) sequencing: the high-quality library that passes quality inspection is sequenced on the machine.
The above library construction and sequencing procedure is described in "Serial sequencing of isolength RAD tags for cost-effective genome-wide profiling of genetic and epigenetic variations" by Shi Wang et al., published online on October 6, 2016.
Bioinformatics analysis:
The invention takes the cattle genome (https://www.ncbi.nlm.nih.gov/genome/) as the reference. The analysis flow is as follows:
(1) data filtering: performing quality control on Clean Reads;
(2) enzyme-digested read (Enzyme Reads) extraction: extracting the reads that contain the restriction enzyme recognition site, referred to as Enzyme Reads, for subsequent analysis;
(3) data alignment: aligning the Enzyme Reads to the constructed reference sequence with SOAP software;
(4) SNP typing: typing by the maximum likelihood (ML) method according to the alignment result;
(5) analysis content: construction of an evolutionary tree, principal component analysis, population genetic structure analysis, genome-wide association analysis and the like.
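Step (2) of the flow above can be illustrated with a minimal Python sketch. It assumes that the BaeI recognition motif is AC(N)4GTAYC; that motif is not stated in this document and should be checked against the enzyme supplier's documentation. The function names are hypothetical, and the sketch is not the pipeline actually used in the study.

```python
import re

# Assumed BaeI recognition motif AC(N)4GTAYC, where Y is C or T.
# This is an assumption for illustration, not taken from the patent text.
BAEI_FORWARD = re.compile(r"AC[ACGT]{4}GTA[CT]C")
# Reverse complement of the same motif: GRTAC(N)4GT, where R is A or G.
BAEI_REVERSE = re.compile(r"G[AG]TAC[ACGT]{4}GT")


def has_recognition_site(read: str) -> bool:
    """Return True if the read carries the motif on either strand."""
    seq = read.upper()
    return bool(BAEI_FORWARD.search(seq) or BAEI_REVERSE.search(seq))


def extract_enzyme_reads(reads):
    """Keep only the reads that contain the recognition site."""
    return [read for read in reads if has_recognition_site(read)]
```

Scanning both strands matters because a sequenced tag may be read from either orientation of the digested fragment.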
Enzyme Reads are aligned to the reference sequence with SOAP software, and SNP markers are then typed by the maximum likelihood (ML) method. The RAD typing software package (RADtyping) used in this process comprises more than 10 software components and covers the whole workflow from data preprocessing to output of the final typing result. To ensure the accuracy of subsequent analysis, the typing result is further filtered by the following criteria after typing is completed:
1) removing sites that can be typed in fewer than 80% of individuals across all samples;
2) removing sites with a minor allele frequency (MAF) below 0.01;
3) removing single nucleotide polymorphism (SNP) sites containing 1 or 4 base types;
4) removing sites with more than 1 SNP within the tag;
5) removing sites with fewer than 2 genotypes within the tag.
In total, 10058 SNP markers were obtained from all samples.
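The first two filtering criteria (typing rate of at least 80% and MAF of at least 0.01) can be sketched in Python as follows. The data layout is an assumption made for illustration: each site maps to a list of genotypes coded as 0, 1 or 2 copies of one allele, with None for an untyped individual. The tag-level criteria 3) to 5) need per-tag information that this sketch does not model, and all function names are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype at one site."""
    if not genotypes:
        return 0.0
    typed = [g for g in genotypes if g is not None]
    return len(typed) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as allele copies (0, 1, 2)."""
    typed = [g for g in genotypes if g is not None]
    if not typed:
        return 0.0
    freq = sum(typed) / (2 * len(typed))
    return min(freq, 1.0 - freq)


def filter_sites(sites, min_call_rate=0.80, min_maf=0.01):
    """Return the IDs of sites passing the call-rate and MAF thresholds."""
    kept = []
    for site_id, genotypes in sites.items():
        if call_rate(genotypes) < min_call_rate:
            continue  # typed in too few individuals
        if minor_allele_frequency(genotypes) < min_maf:
            continue  # minor allele too rare
        kept.append(site_id)
    return kept
```

For example, a site typed in 3 of 5 individuals has a call rate of 0.6 and is removed, and a site where every individual has genotype 0 has a MAF of 0 and is removed as well.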
Statistical analysis model
The study used a Bayesian model and a logistic regression model to perform genome-wide association analysis (GWAS) on the clinical mastitis phenotype of dairy cows.
A linear regression model equation based on the dairy cow mastitis phenotype is constructed first,
$$y_i = \mu + \sum_{k=1}^{M} X_{ik}\,\alpha_k + e_i$$
where $y_i$ is the phenotype of the ith individual; M is the total number of SNPs; $\mu$ is the overall mean of the phenotypic trait; $\alpha_k$ is the additive association effect of the kth SNP; $X_{ik}$ is the genotype of the kth SNP for the ith individual (0, 1 or 2); and $e_i$ is the residual effect.
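The symbol definitions above can be turned into a small Python sketch. It assumes the usual additive form, in which the phenotype of an individual equals the overall mean plus the sum over SNPs of genotype times effect, plus a residual; the function name and the numbers in the usage note are hypothetical.

```python
def predicted_phenotype(mu, genotypes, effects):
    """Return mu + sum_k X_ik * alpha_k for one individual.

    genotypes: genotype codes X_ik (0, 1 or 2), one per SNP.
    effects:   additive effects alpha_k, one per SNP.
    The residual term is not part of the prediction.
    """
    if len(genotypes) != len(effects):
        raise ValueError("genotypes and effects must have the same length")
    return mu + sum(x * a for x, a in zip(genotypes, effects))
```

With a hypothetical mean of 0.2, genotypes [0, 1, 2] and effects [0.5, -0.1, 0.05], the two nonzero contributions are -0.1 and 0.1, which cancel, so the prediction equals the mean.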
The Bayesian model assumes that each SNP effect follows a normal prior distribution with zero mean and a SNP-specific variance $\sigma_k^2$, where k = 1, 2, ..., M; the SNP effect variances are mutually independent, and each variance is independently and identically distributed (IID) with an inverse chi-square prior distribution, where $\nu$ is the degrees-of-freedom parameter and $S^2$ is the scale parameter:
Figure BDA0001816861040000063
The marginal prior distribution of each SNP effect follows a t-distribution:
Figure BDA0001816861040000064
$\alpha_k$ depends on the variance of each SNP, with each variance having an inverse chi-square prior. With probability $\pi$ a SNP has zero effect, and with probability $(1-\pi)$ its effect follows a normal distribution,
Figure BDA0001816861040000065
$\alpha_k \mid \pi$,
Figure BDA0001816861040000066
Figure BDA0001816861040000067
where
Figure BDA0001816861040000068
represents the common variance of all nonzero SNP effects, which is assigned a scaled inverse chi-square prior distribution:
Figure BDA0001816861040000069
The unknown value of $\pi$ in the model is estimated under a prior distribution that is uniform between 0 and 1, i.e. $\pi \sim \mathrm{Uniform}(0, 1)$.
$\nu_a$ is set to 4, and
Figure BDA00018168610400000610
is calculated from the additive variance:
Figure BDA00018168610400000611
and
Figure BDA00018168610400000612
where $p_k$ is the allele frequency of the kth SNP;
Figure BDA00018168610400000613
is the variance for a given marker; the additive genetic variance explained by the SNPs is
Figure BDA00018168610400000614
Logistic regression analysis model: assuming that single nucleotide polymorphisms influence the clinical mastitis phenotype of dairy cows, a logistic regression model is established to predict the likelihood of clinical mastitis in dairy cows. A fitted logistic regression equation is constructed,
Figure BDA00018168610400000615
where $P_j$ is the probability that the clinical mastitis phenotype occurs under condition $X_j$, and $(1-P_j)$ is the probability that the clinical mastitis phenotype does not occur; $X_{ij} = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{mj})$ is the genotype of the ith individual at site j (0, 1 or 2), for example AA coded as 0, TT as 2 and AT as 1; or CC as 0, GG as 2 and CG as 1; or AA as 0, CC as 2 and AC as 1, and so on; $\beta_j$ is the effect of the jth SNP; M is the number of samples; and $\mu$ is the overall mean of the phenotypic trait. In the logistic regression analysis model ($Y = \mu + \sum \beta_i X_i$), the equation can be converted to another form:
Figure BDA0001816861040000071
Figure BDA0001816861040000072
where Y represents the mastitis phenotype of the ith individual and P represents the probability of the clinical mastitis phenotype; $X_i$ is the genotype of the ith individual; $\beta_i$ is the regression coefficient, and $\exp(\beta_i)$ is the odds ratio (OR). The relation between P and the variables can be transformed by the equations:
Figure BDA0001816861040000073
Figure BDA0001816861040000074
The 95% confidence interval (CI) is $\exp(\beta_i \pm 1.96\,SE(\beta_i))$.
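The confidence-interval expression above can be sketched in Python. The sketch assumes that the odds ratio is the exponential of the fitted logistic regression coefficient, which is the standard relation and is consistent with the interval formula; the function name and the example values in the usage note are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    beta: fitted coefficient for one SNP.
    se:   standard error of beta.
    z:    normal quantile; 1.96 gives the 95% interval used in the text.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper
```

A call such as `odds_ratio_with_ci(0.5, 0.2)` returns the odds ratio together with its lower and upper 95% limits. An interval that excludes 1 indicates an association at the 5% level.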
In this study, 1 key SNPs site of mastitis in dairy cows was obtained by two analytical models, as shown in tables 1 and 2:
TABLE 1 Bayesian analytical model results
Figure BDA0001816861040000075
TABLE 2 results of logistic regression analysis model
Figure BDA0001816861040000076
Note: the marked p-values are the p-value calculated from the chi-square test (< 0.05) and the p-value of the t-statistic under the logistic regression model (< 0.05). CHISQ is the chi-square value under the chi-square test. STAT is the t-statistic under the logistic regression model. OR: odds ratio. L95: lower limit of the 95% confidence interval. U95: upper limit of the 95% confidence interval.
To verify the association between the SNP marker and dairy cow mastitis, a case-control study was used to compare the exposure rate of the key SNP site between the case group and the control group. Statistically, if the two groups differ significantly, the site can be considered a SNP site associated with dairy cow mastitis. The comparison excludes interference from external matching factors and considers only the association between the SNPs and mastitis. A matched design with an unequal case-control ratio (case/control = 1/h) was used to determine the number of validation samples.
Figure BDA0001816861040000077
Figure BDA0001816861040000081
Figure BDA0001816861040000082
OR=ad/bc
Figure BDA0001816861040000083
Figure BDA0001816861040000084
n is the number of clinical mastitis cases required in the validation population, and N is the total number of cows in the validation population. P0 is the exposure rate of the SNP site mutation in the normal control population, and P1 is the exposure rate of the SNP site mutation in the clinical mastitis population. OR is the odds ratio (the expected strength of association of the SNP site), α is the probability of a type I error in the hypothesis test (the expected significance level), β is the probability of a type II error in the hypothesis test, (1-β) is the expected power of the test, OR 95% CI is the 95% confidence interval of the odds ratio, and χ² is the chi-square statistic for the key SNP site. a is the number of individuals carrying the SNP site mutation in the clinical mastitis group, b is the number of individuals carrying the SNP site mutation in the normal control group, c is the number of non-mutant individuals in the clinical mastitis group, and d is the number of non-mutant individuals in the normal control group, as shown in Table 3.
TABLE 3 Correlation verification of SNP marker rs75762330 with mastitis in cows
SNP site base    Clinical mastitis    Normal control    Total
T                36 (a)               89 (b)            162
C                37 (c)               221 (d)           221
Total            73                   310               383
With degrees of freedom df = 1, OR = ad/bc = 2.416. An OR value greater than 1 indicates that the C>T variant at site rs75762330 is a risk factor for clinical mastitis in Chinese Holstein cows, with a positive association between the T allele and mastitis. The chi-square value χ² = 10.279 ≥ 7.879 gives P < 0.005, so the null hypothesis is rejected; that is, the difference at SNP site rs75762330 is statistically significant.
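The odds ratio from the 2x2 counts can be checked with a short Python sketch. The chi-square function below is the uncorrected Pearson statistic for a 2x2 table, which is an assumption: the text does not say which variant of the test produced its reported value, so this function is not expected to reproduce that number exactly. Function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio OR = ad / bc for a 2x2 case-control table."""
    return (a * d) / (b * c)


def pearson_chi_square(a, b, c, d):
    """Uncorrected Pearson chi-square statistic for a 2x2 table (df = 1)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cell counts a, b, c, d as labelled in Table 3.
a, b, c, d = 36, 89, 37, 221
print(round(odds_ratio(a, b, c, d), 3))  # 2.416
print(pearson_chi_square(a, b, c, d) > 7.879)  # compare with the df = 1 cutoff used in the text
```

The odds ratio depends only on the four cell counts, so it can be verified independently of whichever chi-square variant is applied.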
The examples described are illustrative of the invention and are not to be construed as limiting the invention, and any variations and modifications which come within the meaning and range of equivalency of the invention are to be considered within the scope of the invention.

Claims (8)

1. Use of a detection reagent for detecting a key SNP site of dairy cow mastitis in the preparation of a dairy cow mastitis kit, characterized in that the key SNP site is located in an intron region of the gene PTK2B on chromosome AC_000165.1, the reference sequence in NCBI is TCCCCTTGATACTCATGTATTCCAATAA, the 5th position is the single nucleotide polymorphism marker site, and the polymorphism is C > T.
2. The use according to claim 1, wherein the method for genotyping and analyzing 2b-RAD at the sites of the key SNPs of mastitis in dairy cattle comprises the steps of:
library construction and sequencing;
bioinformatics analysis:
(1) data filtering: performing quality control on Clean Reads;
(2) enzyme-digested sequence extraction: extracting the sequences that contain the restriction enzyme recognition site for subsequent analysis;
(3) data alignment: aligning the enzyme-digested sequences to the constructed reference sequence with SOAP software;
(4) SNP typing: typing by the maximum likelihood method according to the alignment result;
(5) analysis: construction of an evolutionary tree, principal component analysis, population genetic structure analysis or genome-wide association analysis.
3. The use of claim 2, wherein, after the enzyme-digested sequences are aligned to the reference sequence with SOAP software, SNP markers are typed by the maximum likelihood method, and after typing the result is further filtered by the following steps 1) to 5):
1) removing sites that can be typed in fewer than 80% of individuals across all samples;
2) removing sites with a minor allele frequency (MAF) below 0.01;
3) removing single nucleotide polymorphism sites containing 1 or 4 base types;
4) removing sites with more than 1 SNP within the tag;
5) removing sites with fewer than 2 genotypes within the tag.
4. The use of claim 2, wherein a Bayesian model and a logistic regression model are used to perform genome-wide association analysis of the clinical mastitis phenotype of dairy cows;
before the genome-wide association analysis is carried out, a linear regression model equation based on the dairy cow mastitis phenotype is constructed first,
$$y_i = \mu + \sum_{k=1}^{M} X_{ik}\,\alpha_k + e_i$$
where $y_i$ is the phenotype of the ith individual; M is the total number of SNPs; $\mu$ is the overall mean of the phenotypic trait; $\alpha_k$ is the additive association effect of the kth SNP; $X_{ik}$ is the genotype of the kth SNP for the ith individual; $e_i$ is the residual effect; and k indexes the SNP sites.
5. The use according to claim 4,
the Bayesian model assumes that each SNP effect follows a normal prior distribution with zero mean and a SNP-specific variance $\sigma_k^2$, where k = 1, 2, ..., M indexes the SNP sites; the SNP effect variances are mutually independent, and each variance is independently and identically distributed (IID) with an inverse chi-square prior distribution:
Figure FDA0003257270470000021
where $\nu$ is the degrees-of-freedom parameter, $S^2$ is the scale parameter, P denotes the IID inverse chi-square prior distribution of each variance, and $\chi^{-2}$ is the inverse chi-square; the marginal prior distribution of each SNP effect follows a t-distribution:
Figure FDA0003257270470000022
Figure FDA0003257270470000023
where $P(\alpha_k \mid \nu, S^2)$ is the marginal prior distribution of each SNP effect, and $\alpha_k$ is the additive association effect of the kth SNP, which depends on the variance of that SNP, each variance having an inverse chi-square prior; with probability $\pi$ a SNP has zero effect, and with probability $(1-\pi)$ its effect follows a normal distribution,
Figure FDA0003257270470000024
$\alpha_k \mid \pi$,
Figure FDA0003257270470000025
Figure FDA0003257270470000026
where
Figure FDA0003257270470000027
represents the common variance of all nonzero SNP effects, which is assigned a scaled inverse chi-square prior distribution:
Figure FDA0003257270470000028
$\nu_a$ is set to 4, and
Figure FDA0003257270470000029
is calculated from the additive variance:
Figure FDA00032572704700000210
and
Figure FDA00032572704700000211
where $p_k$ is the allele frequency of the kth SNP;
Figure FDA00032572704700000212
is the variance for a given marker; the additive genetic variance explained by the SNPs is
Figure FDA00032572704700000213
and
Figure FDA00032572704700000214
is the chi-square prior distribution; $p_k$ is the allele frequency of the kth SNP; and k indexes the SNPs.
6. The use according to claim 4,
logistic regression analysis model: assuming that the single nucleotide polymorphism has influence on the clinical phenotypic character of the mastitis of the dairy cow, establishing a Logistic regression model to predict the possibility of the clinical mastitis of the dairy cow, firstly constructing a fitted Logistic regression equation,
Figure FDA00032572704700000215
wherein P_j is the probability that the clinical mastitis phenotype occurs under condition X_j, (1 − P_j) is the probability that the clinical mastitis phenotype does not occur under condition X_j, j denotes the jth SNP site, X_ij = (X_1j, X_2j, X_3j, …, X_mj) is the genotype of the ith individual at the jth site, β_j is the effect of the jth SNP, M is the number of samples, and μ is the vector of the overall mean of the phenotypic trait; in the logistic regression analysis model, Y = (μ + Σ β_i X_i), and the equation is converted to another form:
Figure FDA00032572704700000216
wherein Y represents the mastitis phenotype of the ith individual and P represents the probability of the clinical mastitis phenotype; X_i is the genotype of the ith individual; β_i is the regression coefficient, whose exponential exp(β_i) is the odds ratio (OR); the relationship between P and the variables is transformed by the equations:
Figure FDA0003257270470000031
Figure FDA0003257270470000032
Figure FDA0003257270470000033
Figure FDA0003257270470000034
95% confidence interval (CI) = exp(β_i ± 1.96 SE(β_i)), wherein p1 represents the probability of occurrence of a given SNP site in the case group, and p0 represents the probability of occurrence of the corresponding site in the control group; CI refers to the 95% confidence interval; SE(β_i) is the standard error of β_i.
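The odds ratio and its confidence interval described above can be illustrated with a short Python sketch. It assumes that β_i is the fitted log odds ratio with standard error SE(β_i), and that an odds ratio can also be formed from the case and control probabilities p1 and p0 as (p1/(1 − p1)) / (p0/(1 − p0)); the latter form is a standard definition and is an assumption here, since the equation images are not reproduced in this text. All function names are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_from_beta(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def ci95_from_beta(beta: float, se: float) -> Tuple[float, float]:
    """95% confidence interval exp(beta ± 1.96 * SE(beta))."""
    z = 1.96  # normal quantile used in the text for a 95% interval
    return math.exp(beta - z * se), math.exp(beta + z * se)


def odds_ratio_from_freqs(p1: float, p0: float) -> float:
    """Odds ratio from case (p1) and control (p0) probabilities.

    Assumed standard definition: (p1 / (1 - p1)) / (p0 / (1 - p0)).
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p0 < 1.0):
        raise ValueError("p1 and p0 must lie strictly between 0 and 1")
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
```

An interval that excludes 1 would suggest that the SNP site is associated with the clinical mastitis phenotype at the 5% level.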
7. The use according to claim 5, characterized in that the result of the Bayesian analysis model is
Figure FDA0003257270470000035
8. The use according to claim 6,
the result of the logistic regression analysis model is
Figure FDA0003257270470000036

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811146220.XA CN109182505B (en) 2018-09-29 2018-09-29 Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis

Publications (2)

Publication Number Publication Date
CN109182505A CN109182505A (en) 2019-01-11
CN109182505B CN109182505B (en) 2022-01-04

Family

ID=64906890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811146220.XA Expired - Fee Related CN109182505B (en) 2018-09-29 2018-09-29 Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis

Country Status (1)

Country Link
CN (1) CN109182505B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164424B (en) * 2020-08-03 2024-04-09 南京派森诺基因科技有限公司 Group evolution analysis method based on no-reference genome
JP7465485B2 (en) 2022-03-24 2024-04-11 国立大学法人東京農工大学 DNA marker for use in determining risk of developing mastitis and method for determining risk of mastitis using the same

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102899395B (en) * 2012-06-20 2014-12-10 山东省农业科学院奶牛研究中心 Breed selection method for improving mastitis resistance of dairy cow and use thereof
CN102899396B (en) * 2012-07-25 2015-02-18 山东省农业科学院奶牛研究中心 Core promoter for influencing cow mastitis infectibility/resistance HMGB3 gene and functional molecule mark and application
CN103146821B (en) * 2013-02-25 2015-06-17 安徽农业大学 Method for evaluating inheritance effect of SNP (Single Nucleotide Polymorphism) sites to traits and application thereof
CN104232627B (en) * 2013-06-13 2017-05-10 深圳华大基因科技有限公司 2b-RAD pooling technology
CN105925680B (en) * 2016-05-06 2019-06-18 中国农业科学院蔬菜花卉研究所 A kind of method and its application of Tetraploid Potatoes high-flux sequence exploitation label
CN108004340B (en) * 2016-10-27 2021-04-16 河南农业大学 Method for developing SNP (single nucleotide polymorphism) of whole genome of peanut

Also Published As

Publication number Publication date
CN109182505A (en) 2019-01-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220104