CN115418409A - SNP site related to upland cotton boll opening character and application thereof - Google Patents

Publication number
CN115418409A
CN115418409A CN202210974010.XA CN202210974010A CN115418409A CN 115418409 A CN115418409 A CN 115418409A CN 202210974010 A CN202210974010 A CN 202210974010A CN 115418409 A CN115418409 A CN 115418409A
Authority
CN
China
Prior art keywords
boll opening
germplasm
allelic variation
snp
loci
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210974010.XA
Other languages
Chinese (zh)
Other versions
CN115418409B (en)
Inventor
宿俊吉
王彩香
谢晓宇
袁文敏
海涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gansu Agricultural University
Original Assignee
Gansu Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gansu Agricultural University
Priority to CN202210974010.XA
Publication of CN115418409A
Application granted
Publication of CN115418409B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/13: Plant traits
    • C12Q2600/156: Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Botany (AREA)
  • Mycology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses SNP sites associated with the boll opening trait of upland cotton and applications thereof, and relates to the field of biotechnology. Genome-wide association analysis of SNP loci against boll opening rate phenotypic values from five single environments and their best linear unbiased prediction (BLUP) identified 53 SNP loci significantly associated with boll opening rate. The boll opening rates of the allelic variation types at each of these 53 stably associated loci were compared for significance. On the principle that, where germplasm carrying different allelic variation types differs significantly in boll opening rate, the type with the higher boll opening rate is the favorable allelic variation, favorable allelic variation was identified at 50 loci, laying a foundation for future marker-assisted selection of early-maturing, machine-harvestable cotton.

Description

SNP (Single nucleotide polymorphism) site related to boll opening character of upland cotton and application thereof
Technical Field
The invention relates to the technical field of biology, in particular to an SNP (single nucleotide polymorphism) site related to the boll opening character of upland cotton and application thereof.
Background
Cotton (Gossypium spp.) is one of the major cash crops providing natural fiber and plays an important role in China's national economy. The boll opening rate is the ratio of the number of opened bolls to the total number of bolls at a specific time during the boll opening period; it is an important index of cotton earliness and of the optimal time for spraying defoliant. However, the genetic mechanism underlying boll opening rate has so far received little study.
In cotton production, the boll opening stage is a key index of cotton earliness; it is the date on which half of the plants in a field have opened their first boll. Practice shows, however, that some varieties enter the boll opening stage early but open their bolls over a long, unconcentrated period, so the boll opening stage alone cannot fully reflect the earliness of a variety. Moreover, if the boll opening period of a variety is too long and boll opening is not concentrated enough, mechanical harvesting is impaired; boll opening concentration has therefore become a key earliness trait. The boll opening rate, i.e. the ratio of opened bolls to total bolls during the boll opening period, is an important index of boll opening concentration: the earlier and more concentrated the boll opening, the higher the boll opening rate, and vice versa. Identifying SNP sites associated with boll opening rate and mining the related candidate genes can therefore provide technical support for breeding early-maturing, high-quality, machine-harvestable cotton varieties.
Disclosure of Invention
The invention aims to provide SNP sites associated with the boll opening trait of upland cotton and applications thereof, so as to solve the problems in the prior art. The invention identifies 50 SNP sites significantly associated with boll opening rate, laying a foundation for future marker-assisted selection of early-maturing, machine-harvestable cotton.
In order to achieve the purpose, the invention provides the following scheme:
The invention provides SNP sites associated with the boll opening trait of upland cotton; the position information of the SNP sites is as follows:
[Table of SNP site positions, provided as images in the original publication: Figures BDA0003798010910000011 and BDA0003798010910000021]
The invention also provides the use of the SNP sites in identifying the boll opening trait of upland cotton.
The invention discloses the following technical effects:
On the principle that sites identified jointly across multiple environments and multiple models are regarded as sites stably associated with boll opening rate, 53 SNP sites significantly associated with boll opening rate were found through genome-wide association analysis (GWAS) of SNP sites against boll opening rate phenotypic values from five single environments and their best linear unbiased prediction (BLUP). The boll opening rates of the allelic variation types at each of the 53 stably associated sites were compared for significance. On the principle that, where germplasm carrying different allelic variation types differs significantly in boll opening rate, the type with the higher boll opening rate is the favorable allelic variation, favorable allelic variation was identified at 50 sites. The invention thus identifies 50 SNP sites with favorable allelic variation, laying a foundation for future marker-assisted selection of early-maturing, machine-harvestable cotton.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a three-dimensional plot of the principal component analysis;
FIG. 2 shows the results of GWAS analysis using the FarmCPU model;
FIG. 3 shows the results of GWAS analysis using the GLM model;
FIG. 4 shows the results of GWAS analysis using the MLM model;
FIG. 5 compares the boll opening rates of different allelic variations at the significantly associated SNP sites on chromosomes A01-A05 of the At subgenome;
FIG. 6 compares the boll opening rates of different allelic variations at the significantly associated SNP sites on chromosomes A10-A13 of the At subgenome;
FIG. 7 compares the boll opening rates of different allelic variations at the significantly associated SNP sites on chromosomes D03-D08 of the Dt subgenome;
FIG. 8 compares the boll opening rates of different allelic variations at the significantly associated SNP sites on chromosomes D09-D12 of the Dt subgenome.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the present invention, which should not be construed as limiting the invention but rather as providing a more detailed description of certain aspects, features and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that each intervening value, between the upper and lower limit of that range, is also specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The description and examples are intended to be illustrative only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms, i.e., meaning including, but not limited to.
Example 1 Genome-wide association analysis of the boll opening rate trait in an upland cotton population
1. Materials and methods
1.1 Experimental materials
A total of 619 upland cotton varieties (lines) were used, grouped by origin: (1) Uzbekistan group (FSU, 18 accessions); (2) United States group (USA, 60 accessions); (3) northwest inland group (NIR, 171 accessions); (4) northern ultra-early-maturing group (NSEMR, 17 accessions); (5) Yellow River basin group (YRR, 213 accessions); (6) Yangtze River basin group (YZRR, 113 accessions); see Table 1 for details. By period, the materials can be divided into five groups: (1) up to 1979 (25 accessions); (2) 1980-1989 (20 accessions); (3) 1990-1999 (42 accessions); (4) 2000-2009 (121 accessions); (5) 2010 onward (76 accessions).
From these 619 upland cotton accessions, 418 varieties (lines) were selected to form a natural population (Table 1), comprising 317 representative varieties (lines) from the different cotton-growing regions of China and 101 varieties (lines) introduced from abroad.
TABLE 1 Details of the 418 upland cotton accessions
[Table 1 is provided as images in the original publication: Figures BDA0003798010910000041 to BDA0003798010910000091]
1.2 phenotypic data statistics and analysis
Descriptive statistics and analysis of variance of the boll opening rate phenotypic values were carried out with Excel 2019 and IBM SPSS Statistics 26.0. The basic statistical parameters were: maximum (Max), minimum (Min), mean, standard deviation (SD), coefficient of variation (CV), skewness and kurtosis. Figures and tables were drawn and assembled with Origin 2018, Adobe Photoshop CS6 and Adobe Illustrator 2018. The best linear unbiased prediction (BLUP) was calculated with the R package "lme4". Broad-sense heritability ($H^2$) was calculated as $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_{gy}^2/n + \sigma_e^2/(nr))$, where $\sigma_g^2$, $\sigma_{gy}^2$ and $\sigma_e^2$ are the genetic variance, the genotype-by-environment interaction (G×E) variance and the error variance, respectively, $n$ is the number of environments and $r$ is the number of replicates.
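As a worked illustration of the broad-sense heritability formula, the following minimal Python sketch computes H² from variance components. The variance values and the number of replicates are hypothetical and are not taken from the patent; only the number of environments (five) comes from the text. The function name is likewise chosen for illustration.

```python
def broad_sense_heritability(var_g, var_gy, var_e, n_env, n_rep):
    """H^2 = var_g / (var_g + var_gy / n + var_e / (n * r)).

    var_g  : genetic variance
    var_gy : genotype-by-environment interaction variance
    var_e  : error variance
    n_env  : number of environments (n)
    n_rep  : number of replicates (r)
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    denominator = var_g + var_gy / n_env + var_e / (n_env * n_rep)
    if denominator <= 0:
        raise ValueError("variance components must give a positive denominator")
    return var_g / denominator


# Hypothetical variance components; five environments as in the text,
# three replicates assumed for the example.
h2 = broad_sense_heritability(var_g=4.0, var_gy=2.0, var_e=6.0, n_env=5, n_rep=3)
print(round(h2, 3))
```

In practice the variance components would come from a mixed model fit (for example with lme4, as named in the text) rather than being typed in by hand.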
1.3 extraction of genomic DNA
In April 2020, five representative plants with good and uniform growth were selected from each variety in the natural population; shoot tips and young leaf tissue were collected in the field, kept in a prepared ice box, and then stored long-term in a -80 °C freezer. Genomic DNA was extracted from the young leaf tissue of all materials by a modified cetyltrimethylammonium bromide (CTAB) method, as follows:
(1) Take the cotton sample out of the -80 °C freezer, transfer it to a mortar pre-cooled with liquid nitrogen, add liquid nitrogen and grind to a powder; quickly transfer the powder to a centrifuge tube, add 800 μL of CTAB lysis buffer and mix well; incubate in a 65 °C water bath for 30 min, gently inverting the tube once every 10 min to mix.
(2) Remove the centrifuge tube from the water bath and add 800 μL of chloroform:isoamyl alcohol (24. After centrifugation at 12000 rpm for 10 min at 4 °C, aspirate the supernatant and transfer it to a new centrifuge tube.
(3) Add 0.8 volume of ice-cold isopropanol (stored at -20 °C) to the new centrifuge tube, shake vigorously on an oscillator for 1 min until a flocculent precipitate forms, and let stand for 30 min.
(4) Pick the DNA out into a 1.5 mL centrifuge tube, wash twice with an appropriate amount of 70% ethanol, and finally wash once with absolute ethanol.
(5) Pour off the absolute ethanol and leave the centrifuge tube on a clean bench overnight until the DNA is completely dry; then add ddH2O (containing 0.50% RNase) and dissolve the DNA completely in a 37 °C water bath for 1 h. Store the DNA samples in a -20 °C freezer until use.
1.4 Beijing Novogene Bioinformatics Technology Co., Ltd. was commissioned to perform whole-genome next-generation resequencing of the 418 upland cotton germplasm accessions. The most recently published Gossypium hirsutum TM-1 genome from Zhejiang University (http://ibi.zju.edu.cn/cotton/) was used as the reference. Reads were aligned with BWA, SNP markers were called from the aligned data with GATK and Samtools, and genotypes were filtered with VCFtools using a minor allele frequency (MAF) threshold of 10% and a missing-rate threshold of 100%.
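To make the genotype filtering step concrete, the sketch below computes the minor allele frequency and missing rate of a single SNP and applies two cut-offs. It is a plain-Python illustration of the idea, not the VCFtools implementation; the genotype coding (0, 1, 2 alternate-allele copies, None for missing), the function names and the cut-off values are all assumptions made for the example.

```python
def maf_and_missing_rate(genotypes):
    """Return (minor allele frequency, missing rate) for one SNP.

    genotypes: per-sample alternate-allele counts (0, 1 or 2),
               with None marking a missing call.
    """
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per sample
    return min(alt_freq, 1 - alt_freq), missing_rate


def keep_snp(genotypes, min_maf, max_missing):
    """Keep a SNP only if it is common enough and sufficiently well genotyped."""
    maf, missing_rate = maf_and_missing_rate(genotypes)
    return maf >= min_maf and missing_rate <= max_missing


# Hypothetical cut-offs and genotypes, chosen only for illustration.
MIN_MAF = 0.05
MAX_MISSING = 0.2
common_snp = [0, 0, 1, 2, None, 0, 0, 0, 0, 0]
rare_snp = [0] * 19 + [1]
print(keep_snp(common_snp, MIN_MAF, MAX_MISSING))
print(keep_snp(rare_snp, MIN_MAF, MAX_MISSING))
```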
1.5 principal component analysis
Before genome-wide association analysis, the population structure of a natural population is usually analyzed so that accounting for it reduces false-positive sites in the association results. In this experiment, principal component analysis was performed with VCFtools and PLINK.
1.6 genome wide association analysis (GWAS)
Genome-wide association analysis was carried out between the 4452629 SNP markers obtained by resequencing and the phenotypic data for the boll opening rate trait using three methods: FarmCPU, GLM and MLM. Sites with -lg(P) ≥ 5.00 were considered significantly associated with the target trait and were counted. The significant SNP sites from the different environments under the three association models were compared in Excel 2019, and sites significant in two or more models were summarized to screen for SNP sites significantly associated with the boll opening rate trait. Manhattan and Q-Q plots were drawn with the qqman package in R.
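The screening rule described here (a -lg(P) threshold of 5.00, followed by keeping sites that are significant in two or more models) can be sketched in a few lines of Python. The site labels and P values below are hypothetical placeholders, and the function names are chosen for illustration; the text describes this comparison as being done in Excel.

```python
import math

THRESHOLD = 5.0  # -lg(P) >= 5.00, as stated in the text


def significant_sites(p_values, threshold=THRESHOLD):
    """Return the sites whose -log10(P) meets the threshold."""
    return {
        site
        for site, p in p_values.items()
        if p > 0 and -math.log10(p) >= threshold
    }


def sites_in_at_least(sites_by_model, min_models=2):
    """Return the sites that are significant in at least min_models models."""
    counts = {}
    for sites in sites_by_model.values():
        for site in sites:
            counts[site] = counts.get(site, 0) + 1
    return {site for site, n in counts.items() if n >= min_models}


# Hypothetical P values for three placeholder sites under each model.
p_by_model = {
    "FarmCPU": {"site_1": 2e-7, "site_2": 3e-6, "site_3": 4e-4},
    "GLM": {"site_1": 1e-6, "site_2": 6e-6, "site_3": 5e-6},
    "MLM": {"site_1": 8e-6, "site_2": 9e-5, "site_3": 1e-3},
}
sig_by_model = {model: significant_sites(p) for model, p in p_by_model.items()}
shared = sites_in_at_least(sig_by_model, min_models=2)
all_three = sites_in_at_least(sig_by_model, min_models=3)
print(sorted(shared), sorted(all_three))
```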
2. Results
2.1 development of SNP markers
SNP markers were developed by whole-genome resequencing. Enzyme digestion of the reference genome was predicted with digestion prediction software, using the latest quality-upgraded upland cotton (TM-1) genome. After quality control, a total of 4452629 high-quality SNP markers were obtained from the 418 upland cotton accessions by resequencing. After population-level SNP filtering, the distribution and density of SNP markers on each chromosome were counted. The SNP markers covered all 26 chromosomes of upland cotton, and the number of markers differed greatly among chromosomes: the most SNP markers were detected on chromosome A08 (39496 markers) and the fewest on chromosome D04 (84976 markers). The chromosome with the highest marker density was A04, with on average one SNP marker every 1.05 kb, and the chromosome with the lowest marker density was D09, with on average one SNP marker every 3.27 kb (Table 2). In the SNP density map of the individual chromosomes, most regions fall in the dark green to light yellow range, i.e. 1-7761 SNPs/Mb, and the SNPs are distributed essentially uniformly along each chromosome. The D01-D13 chromosomes are shorter than the A01-A13 chromosomes; 2541551 SNP markers were located on chromosomes A01-A13, whereas only 1911078 were located on chromosomes D01-D13.
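The subgenome totals quoted above can be checked against the overall marker count, and the "one marker every x kb" density measure can be expressed as a small helper. In the Python sketch below the three marker counts are taken from the text, while the chromosome length and marker count passed to the density helper are hypothetical, since the per-chromosome values are only given in Table 2.

```python
TOTAL_SNPS = 4452629  # total high-quality SNP markers (from the text)
AT_SNPS = 2541551     # markers on chromosomes A01-A13 (from the text)
DT_SNPS = 1911078     # markers on chromosomes D01-D13 (from the text)


def kb_per_marker(chrom_length_bp, n_markers):
    """Average spacing between markers on one chromosome, in kb."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return chrom_length_bp / n_markers / 1000


# The two subgenome counts add up to the reported total.
subgenome_sum = AT_SNPS + DT_SNPS
at_share = AT_SNPS / TOTAL_SNPS
print(subgenome_sum == TOTAL_SNPS, round(at_share, 3))

# Hypothetical chromosome: 80 Mb long with 40000 markers.
example_spacing = kb_per_marker(80_000_000, 40_000)
print(example_spacing)
```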
TABLE 2 Distribution of SNPs on chromosomes
[Table 2 is provided as images in the original publication: Figures BDA0003798010910000111 and BDA0003798010910000121]
2.2 principal component analysis and linkage disequilibrium
To clarify the population structure for the GWAS analysis, principal component analysis was performed on the 418 upland cotton accessions (FIG. 1). The accessions fell into two subpopulations, denoted G1 and G2. G1 comprised 354 upland cotton varieties, including those from regions such as the United States and the Yangtze River basin of China; G2 comprised 64 upland cotton varieties, mainly those introduced from Uzbekistan and those from the northwest inland cotton region of China. Each subpopulation contained several subgroups, and each subgroup included, to varying degrees, upland cotton varieties of different geographic origin, indicating extensive genetic exchange within the 418-accession population used in this experiment. Linkage disequilibrium (LD) analysis showed that the LD decay distance, at which r² fell to half of its maximum value of 0.46, was approximately 500 kb.
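The LD decay distance is read off as the physical distance at which mean r² has fallen to half of its maximum. A minimal Python sketch of that lookup follows; only the maximum r² of 0.46 comes from the text, the remaining points of the decay curve are hypothetical, and the function name is chosen for illustration.

```python
def ld_decay_distance(curve):
    """Return the first distance at which r^2 is at most half of its maximum.

    curve: iterable of (distance_kb, mean_r2) pairs.
    Returns None if r^2 never falls to half of the maximum.
    """
    points = sorted(curve)  # order by distance
    if not points:
        raise ValueError("curve must not be empty")
    half_max = max(r2 for _, r2 in points) / 2
    for distance_kb, r2 in points:
        if r2 <= half_max:
            return distance_kb
    return None


# Hypothetical decay curve; only the maximum r^2 (0.46) is from the text.
decay_curve = [(0, 0.46), (100, 0.38), (300, 0.28), (500, 0.22), (800, 0.18)]
print(ld_decay_distance(decay_curve))
```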
2.3 Genome-wide association analysis results for the boll opening rate trait
In 2020, the experimental materials were planted at three sites: the test field of the Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, Xinjiang (SHZ); the test field of the Kuerle Experimental Station, Xinjiang Academy of Agricultural Sciences, Kuerle, Xinjiang (KEL); and the test field of the Dunhuang Cotton Experimental Station, Gansu Academy of Agricultural Sciences, Dunhuang, Gansu (DH). These environments are designated SHZ-20, KEL-20 and DH-20, respectively. In 2021, the materials were planted in the Shihezi and Kuerle test fields; these environments are designated SHZ-21 and KEL-21.
GWAS was performed with the FarmCPU model on the five single environments (DH-20, SHZ-20, SHZ-21, KEL-20 and KEL-21) and the BLUP values (FIG. 2). In DH-20, 382 significant SNP sites were identified on 20 chromosomes, with the largest number (145) on chromosome D03; site A08-22508631 had the highest -lg(P) value, 6.67. In KEL-20, 313 SNP sites were identified on 24 chromosomes (all except A09 and D08), with the largest number (154) on chromosome D10; site A10-112524934 had the highest -lg(P) value, 6.84. In KEL-21, 64 SNP sites were identified on 20 chromosomes, with the largest number (18) on chromosome A04; site D12-14407079 had the highest -lg(P) value, 7.06. In SHZ-20, 15 SNP sites were identified on chromosomes A01, A02, A10, A12, D02, D04, D09 and D11; site A02-42555624 had the highest -lg(P) value, 5.67. In SHZ-21, 140 SNP sites were identified on 16 chromosomes, 76 of them on chromosome A04; site A03-106583399 had the highest -lg(P) value, 6.94. For BLUP, 146 SNP sites were identified on 20 chromosomes, with the largest number (29) on chromosome D03; site A04-4671281 had the highest -lg(P) value, 7.37.
GWAS was performed with the GLM model on the five single environments and the BLUP values (FIG. 3). In DH-20, 81 significant SNP sites were identified, unevenly distributed on 14 chromosomes, with the largest number (52) on chromosome D03; site D03-48988739 had the highest -lg(P) value, 6.82. In KEL-20, 113 SNP sites were identified on 17 chromosomes, with the largest number (52) on chromosome A10; site A13-85385814 had the highest -lg(P) value, 6.90. In KEL-21, 84 SNP sites were identified on 18 chromosomes, with the largest number (18) on chromosome A04; site D12-14407079 had the highest -lg(P) value, 7.27. In SHZ-20, 12 SNP sites were identified on chromosomes A02, A12, D02, D04, D09 and D11; site A02-42555624 had the highest -lg(P) value, 5.69. In SHZ-21, 130 SNP sites were identified on 17 chromosomes, 76 of them on chromosome A04; site A03-106583399 had the highest -lg(P) value, 7.04. For BLUP, 84 SNP sites were identified on 21 chromosomes, with the largest number (29) on chromosome D03; site A04-4671281 had the highest -lg(P) value, 6.47.
GWAS was performed with the MLM model on the five single environments and the BLUP values (FIG. 4). In DH-20, 15 significant SNP sites were identified on 6 chromosomes (A05, A06, D03, D05, D09 and D12); site D09-6482165 had the highest -lg(P) value, 6.00. In KEL-20, 17 SNP sites were identified on 5 chromosomes (A02, A05, A10, A13 and D02), with the largest number (12) on chromosome A10; site A10-112630022 had the highest -lg(P) value, 5.70. In KEL-21, 30 SNP sites were identified on 8 chromosomes (A01, A04, A08, D02, D05, D07, D11 and D12), with the largest number (12) on chromosome A04; site D12-14407079 had the highest -lg(P) value, 6.95. In SHZ-20, 7 SNP sites were identified on 5 chromosomes (A02, A12, D02, D09 and D11); site A02-42555624 had the highest -lg(P) value, 5.37. In SHZ-21, 34 SNP sites were identified on 15 chromosomes; site A03-106583399 had the highest -lg(P) value, 6.51. For BLUP, 14 SNP sites were identified on 7 chromosomes (A01, A03, A04, A05, A11, A12 and D09); site D09-368352 had the highest -lg(P) value, 5.76.
2.4 pleiotropic SNP markers
Sites identified in multiple environments were regarded as pleiotropic, stable sites associated with boll opening rate. Comparing the sites across the five single environments and the BLUP values yielded 53 such SNP sites, distributed on 15 chromosomes: A01, A03, A04, A05, A10, A11, A13, D03, D05, D06, D08, D09, D10, D11 and D12. Of these, site A04-4671281 had the highest -lg(P) value (7.37) and was detected in KEL-21, SHZ-21 and BLUP; site D06-1876732 was also detected in KEL-21, SHZ-21 and BLUP; site D12-59755182 was detected in both the KEL-20 and KEL-21 environments; and sites A04-4671281, D12-14407079, D12-14407074, A10-112524934, A04-77988939, A10-112630022, D12-59755182, A01-109010159, A11-44468966, D11-23824283, D11-23824366 and D11-23824373 were detected by all three models, FarmCPU, GLM and MLM (Table 3).
TABLE 3 Sites integrated across the five single environments and BLUP
[Table 3 is provided as images in the original publication: Figures BDA0003798010910000141 and BDA0003798010910000151]
Taking the intersection of the sites detected by the three models (FarmCPU, GLM and MLM) improves the accuracy of predicting boll opening rate trait loci; the SNP sites detected by all three models are shown in Table 4.
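For downstream use, the site identifiers can be split into chromosome and position and grouped by chromosome. The Python sketch below does this for the twelve sites that section 2.4 lists as detected by all three models; the identifiers are copied from that list, while the helper names are chosen for illustration.

```python
from collections import defaultdict

# The twelve sites listed in section 2.4 as detected by FarmCPU, GLM and MLM.
THREE_MODEL_SITES = [
    "A04-4671281", "D12-14407079", "D12-14407074", "A10-112524934",
    "A04-77988939", "A10-112630022", "D12-59755182", "A01-109010159",
    "A11-44468966", "D11-23824283", "D11-23824366", "D11-23824373",
]


def parse_site(site_id):
    """Split 'A04-4671281' into ('A04', 4671281)."""
    chromosome, _, position = site_id.partition("-")
    return chromosome, int(position)


def group_by_chromosome(site_ids):
    """Map each chromosome to the sorted positions of its sites."""
    grouped = defaultdict(list)
    for site_id in site_ids:
        chromosome, position = parse_site(site_id)
        grouped[chromosome].append(position)
    return {chrom: sorted(pos) for chrom, pos in sorted(grouped.items())}


grouped_sites = group_by_chromosome(THREE_MODEL_SITES)
for chromosome, positions in grouped_sites.items():
    print(chromosome, positions)
```

Grouping in this way makes clusters of neighbouring sites easy to spot, such as the three D11 positions that lie within about 100 bp of each other.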
The boll opening rate of the 418 upland cotton varieties (lines) showed wide phenotypic variation and was readily influenced by external environmental factors such as temperature, light and effective accumulated temperature. GWAS analysis using the phenotypic values of the single environments and the BLUP values identified SNP sites stably associated with the target trait, giving 53 SNP sites significantly associated with boll opening rate. Comparing these 53 sites with previously reported QTL for earliness-related traits showed that sites D03-12470173 and D03-44138315 overlap QTL reported in earlier studies. For example, Ma et al. found 34 SNP sites for FT within 11.04-14.17 Mb on chromosome D03; Shen et al. found SNP sites for FFBN within 11.75-14.38 Mb on chromosome D03; and Jia et al. found SNP sites for WGP, FT, PH, NFFB and HFFBN within 40.24-44.20 Mb on chromosome D03. These earlier findings support the reliability of the present results and indicate that these sites are key sites controlling earliness-related traits in upland cotton, significantly associated not only with WGP, FT, PH, NFFB, FFBN and HFFBN but also with boll opening rate. The remaining 51 sites are presumably newly discovered.
TABLE 4 SNP sites detected by the three models
[Table 4 is provided as an image in the original publication: Figure BDA0003798010910000161]
Example 2 Identification of favorable allelic variation and candidate gene mining
1. Materials and methods
1.1 identification of superior allelic variation of stably associated sites
SNP sites detected jointly in multiple models and multiple environments were regarded as sites significantly associated with the boll opening rate trait. The mean phenotypic value for each variation type at these sites was compiled in Excel 2019, a two-tailed t-test was performed in IBM SPSS Statistics 26.0 to determine the favorable allelic variation type at each SNP site, and the corresponding box plots were drawn with Origin 2018.
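The comparison behind this step can be illustrated with a short Python sketch that computes the mean boll opening rate per genotype group, picks the group with the highest mean, and computes a t statistic between two groups. The sketch assumes Welch's unequal-variance form of the t-test, which the text does not specify, and it stops at the t statistic and degrees of freedom; the two-tailed P value would be obtained from a statistics package such as the SPSS software named above. All phenotype values and names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two samples."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


def highest_mean_genotype(rates_by_genotype):
    """Return the genotype whose germplasm has the highest mean rate."""
    return max(rates_by_genotype, key=lambda g: mean(rates_by_genotype[g]))


# Hypothetical boll opening rates (%) for germplasm carrying each genotype.
rates = {
    "GG": [62, 58, 65, 60, 63],
    "AA": [50, 47, 53, 49, 51],
}
candidate = highest_mean_genotype(rates)
t_stat, dof = welch_t(rates["GG"], rates["AA"])
print(candidate, round(t_stat, 2), round(dof, 1))
```

A genotype would be called favorable only if the difference is significant, so the highest mean alone is not sufficient.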
2. Results
2.1 identification of superior allelic variation of stably associated sites
To identify the favorable allelic variation type at each of the 53 stably associated sites, the boll opening rates corresponding to each allelic variation were compared for significance. On the principle that, where germplasm carrying different allelic variation types differs significantly in boll opening rate, the type with the higher boll opening rate is the favorable allelic variation, favorable allelic variation was identified at 50 stably and significantly associated sites.
The At genome has 26 stably and significantly associated loci in total, and superior allelic variation was identified at 24 of them. As shown in Figs. 5-6, at A01_109010136, germplasm carrying AG (n = 254) and GG (n = 168) had a significantly higher boll opening rate than germplasm lines carrying AA (n = 41); at A01_109010159, germplasm carrying CC (n = 153) and CT (n = 258) had a significantly higher boll opening rate than germplasm lines carrying TT (n = 40); at A03_1750892, germplasm carrying CG (n = 4) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 443) and GG (n = 37); at A04_4671281, germplasm carrying AG (n = 14) and GG (n = 224) had a significantly higher boll opening rate than germplasm lines carrying AA (n = 326); at A04_4711992, germplasm carrying TT (n = 365) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 156) and CT (n = 9); at A04_77961350, germplasm carrying AA (n = 533) and CC (n = 48) had a significantly higher boll opening rate than germplasm lines carrying AC (n = 3); at A05_37230382, germplasm carrying AA (n = 76) and GG (n = 527) had a significantly higher boll opening rate than germplasm lines carrying AG (n = 3); at A10_111302977, germplasm carrying CT (n = 87) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 4) and TT (n = 487); at A10_111302993, germplasm carrying AG (n = 88) had a significantly higher boll opening rate than germplasm lines carrying AA (n = 4) and GG (n = 483); at A10_111500488, germplasm carrying TT (n = 349) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 251) and CT (n = 8); at A10_112595448, germplasm carrying AA (n = 482) and CC (n = 54) had a significantly higher boll opening rate than germplasm lines carrying AC (n = 61); at A10_112595449, germplasm carrying AA (n = 481) had a significantly higher boll opening rate than germplasm lines carrying AG (n = 61) and GG (n = 54); at A10_112630022, germplasm carrying CC (n = 534) and TT (n = 55) had a significantly higher boll opening rate than germplasm lines carrying CT (n = 7); at A11_44468944, germplasm carrying CC (n = 381) had a significantly higher boll opening rate than germplasm lines carrying CG (n = 222) and GG (n = 3); at A11_44468966, germplasm carrying TT (n = 399) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 3) and CT (n = 207); at A11_44469012, germplasm carrying AA (n = 421) had a significantly higher boll opening rate than germplasm lines carrying AG (n = 174) and GG (n = 9); at A11_44469049, germplasm carrying GG (n = 437) had a significantly higher boll opening rate than germplasm lines carrying GT (n = 160) and TT (n = 8); at A13_95631102, germplasm carrying TT (n = 583) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 30) and CT (n = 2); at A05_5861460, germplasm carrying TT (n = 241) had a significantly higher boll opening rate than germplasm lines with insertions or deletions; and the boll opening rate of germplasm with inserted or deleted bases at the 5 loci A04_77988939, A05_109681925, A05_5861460, A05_37278564 and A10_111302978 was significantly improved.
There was no significant difference between the allelic variations at the 2 loci A10_112519655 and A10_112524934.
The Dt genome has 22 stably and significantly associated loci in total, and superior allelic variation was identified at 21 of them. As shown in Figs. 7-8, at D03_6021314, germplasm carrying CT (n = 2) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 569) and TT (n = 43); at D03_7799462, germplasm carrying CT (n = 2) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 568) and TT (n = 42); at D03_44138315, germplasm carrying CT (n = 3) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 549) and TT (n = 61); at D03_44303526, germplasm carrying AA (n = 54) had a significantly higher boll opening rate than germplasm lines carrying AG (n = 4) and GG (n = 554); at D03_45220363, germplasm carrying CG (n = 3) and GG (n = 67) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 490); at D05_60334237, germplasm carrying AC (n = 7) had a significantly higher boll opening rate than germplasm lines carrying AA (n = 407) and CC (n = 31); at D05_63511270, germplasm carrying AT (n = 11) and TT (n = 388) had a significantly higher boll opening rate than germplasm lines carrying AA (n = 195); at D05_63547789, germplasm carrying AA (n = 434) and AG (n = 10) had a significantly higher boll opening rate than germplasm lines carrying GG (n = 149); at D05_63548956, germplasm carrying CT (n = 9) and TT (n = 428) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 145); at D06_1876732, germplasm carrying AA (n = 121) had a significantly higher boll opening rate than germplasm lines carrying AG (n = 4) and GG (n = 441); at D06_1889989, germplasm carrying CT (n = 4) and TT (n = 145) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 449); at D08_2931370, germplasm carrying GT (n = 4) had a significantly higher boll opening rate than germplasm lines carrying GG (n = 318) and TT (n = 181); at D09_51961124, germplasm carrying TT (n = 49) had a significantly higher boll opening rate than germplasm lines carrying GG (n = 527); at D10_60745942, germplasm carrying GG (n = 36) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 562); at D11_23725967, germplasm carrying GG (n = 235) had a significantly higher boll opening rate than germplasm lines carrying AA (n = 348) and AG (n = 11); at D11_23741773, germplasm carrying AA (n = 235) and AG (n = 16) had a significantly higher boll opening rate than germplasm lines carrying GG (n = 347); at D11_23797129, germplasm carrying GG (n = 232) and GT (n = 8) had a significantly higher boll opening rate than germplasm lines carrying TT (n = 340); at D11_23824283, germplasm carrying CC (n = 229) and CT (n = 10) had a significantly higher boll opening rate than germplasm lines carrying TT (n = 334); at D11_23824366, germplasm carrying CC (n = 205) and CT (n = 3) had a significantly higher boll opening rate than germplasm lines carrying TT (n = 307); at D11_23824373, germplasm carrying CC (n = 205) and CT (n = 3) had a significantly higher boll opening rate than germplasm lines carrying TT (n = 304); at D12_14407074, germplasm carrying GG (n = 474) had a significantly higher boll opening rate than germplasm lines carrying CC (n = 3) and CG (n = 137); at D12_59755182, germplasm carrying CC (n = 467) and TT (n = 75) had a significantly higher boll opening rate than germplasm lines carrying CT (n = 1); at D10_61212190, germplasm carrying AA (n = 48) had a significantly higher boll opening rate than germplasm lines with insertions or deletions; at D12_14407079, germplasm carrying TT (n = 473) had a significantly higher boll opening rate than germplasm lines with insertions or deletions; and the boll opening rate of germplasm with inserted or deleted bases at D03_12470173 and D10_61212197 was significantly improved.
There was no significant difference between the allelic variations at locus D06_36533573.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention may be made by those skilled in the art without departing from the spirit of the present invention, which is defined by the claims.

Claims (2)

1. SNP loci significantly associated with the boll opening trait of upland cotton, characterized in that the position information of the SNP loci is as follows:
Figure FDA0003798010900000011
Figure FDA0003798010900000021
2. Use of the SNP loci of claim 1 for identifying the boll opening trait of upland cotton.
CN202210974010.XA 2022-08-15 2022-08-15 SNP locus related to upland cotton boll opening character and application thereof Active CN115418409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210974010.XA CN115418409B (en) 2022-08-15 2022-08-15 SNP locus related to upland cotton boll opening character and application thereof

Publications (2)

Publication Number Publication Date
CN115418409A true CN115418409A (en) 2022-12-02
CN115418409B CN115418409B (en) 2023-04-25

Family

ID=84198249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210974010.XA Active CN115418409B (en) 2022-08-15 2022-08-15 SNP locus related to upland cotton boll opening character and application thereof

Country Status (1)

Country Link
CN (1) CN115418409B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070166707A1 (en) * 2002-12-27 2007-07-19 Rosetta Inpharmatics Llc Computer systems and methods for associating genes with traits using cross species data
CN105238866A (en) * 2015-11-02 2016-01-13 中国农业科学院棉花研究所 SNP site related to early-maturing traits in upland cotton and application of SNP site
CN113940248A (en) * 2021-10-14 2022-01-18 新疆农垦科学院 Two-stage accurate identification method for boll opening concentration of cotton variety

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
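Junji Su et al.: "Genome-wide association study identified genetic variations and candidate genes for plant architecture component traits in Chinese upland cotton" *
Xie Xiaoyu et al.: "Restricted two-stage multi-locus genome-wide association analysis and candidate gene prediction for boll opening rate in upland cotton" *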

Also Published As

Publication number Publication date
CN115418409B (en) 2023-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant