CN117467793A - Soybean protein content-related molecular marker located on soybean chromosome 17 and application thereof - Google Patents
- Publication number: CN117467793A
- Application number: CN202311332463.3A
- Authority: CN (China)
- Legal status: Pending
Classifications
- C12Q1/6895: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
- C12Q1/6858: Allele-specific amplification
- C12Q2600/13: Plant traits
- C12Q2600/156: Polymorphic or mutational markers
- C12Q2600/172: Haplotypes
Abstract
The invention provides a molecular marker related to soybean protein content, located on soybean chromosome 17, and its applications. It belongs to the field of plant identification and aims to enable rapid and accurate screening of high-protein, high-quality soybean varieties. The marker lies in the gene Glyma.17G074400, where nucleotide 2279 is C or T. The invention also covers the use of the marker in preparing a kit for detecting high protein content in soybean, and a corresponding screening method. Selecting for the marker achieves selection for the trait, which greatly improves breeding efficiency and enables directed improvement of soybean varieties, so that high-protein varieties can be selected.
Description
Technical Field
The invention belongs to the field of plant identification, and particularly relates to a soybean protein content-related molecular marker located on a soybean chromosome 17 and application thereof.
Background
Soybean is highly nutritious, with a protein content of about 40%. Soybean protein contains the 8 amino acids essential to humans; eating soybean supplies needed nutrients and can help prevent cardiovascular disease. Soybean is also an important oil crop: it can be processed into edible oil to meet dietary needs, and its oil consists mainly of five fatty acids, which can help prevent heart disease, cancer and other conditions. As living standards rise, more and more people pay attention to the health and nutritional value of their food, so demand for soybean is large. China, however, relies heavily on imported soybean, so there is an urgent need in China to improve soybean protein and to breed high-protein and high-oil varieties that meet daily demand.
Seed protein content is a quality trait and a relatively complex quantitative trait controlled by multiple genes. Its improvement has long been limited by its genetic characteristics and by breeding methods, and traditional breeding is too slow. With advances in technology, marker-assisted selection has been proposed on top of conventional cross breeding: molecular markers tightly linked to the genes that determine the target trait allow the trait to be selected by selecting the marker. This greatly improves breeding efficiency and enables directed improvement of soybean varieties, so that high-protein varieties can be selected.
Disclosure of Invention
The invention aims to rapidly and accurately screen for high-protein, high-quality soybean varieties.
The invention provides a molecular marker related to soybean protein content: the marker lies in the gene Glyma.17G074400, and the nucleotide at position 2279 is C or T.
The invention provides primers for amplifying the molecular marker: the forward primer sequence is shown in SEQ ID NO.2 or SEQ ID NO.3, and the reverse primer sequence is shown in SEQ ID NO.1.
The invention provides an SNP site related to soybean protein content, located at position 5830450 on soybean chromosome 17; the base at this site is C or T.
The invention provides primers for amplifying the SNP site: the forward primer sequence is shown in SEQ ID NO.2 or SEQ ID NO.3, and the reverse primer sequence is shown in SEQ ID NO.1.
The invention provides use of the molecular marker, the SNP site, or the primers for either of them in preparing a kit for identifying high-protein or low-protein soybean.
The invention provides a kit for identifying high-protein or low-protein soybean, comprising the above primers.
Further, the kit also comprises a Master Mix and water.
The invention provides a method for identifying soybean protein content, comprising the following steps:
step 1: extract DNA from the soybean to be tested;
step 2: perform PCR with the primers for the molecular marker or the primers for the SNP site. A variety with the CC genotype is identified as low-protein soybean, and a variety with the TT genotype is identified as high-protein soybean.
Further, the PCR conditions in step 2 are: (1) Hot Start: 95 °C for 30 s, 1 cycle; (2) Touchdown: 95 °C for 60 s, then 63 °C for 20 s, lowering the temperature by 0.8 °C per cycle, for 10 cycles in total from 63 °C to 55 °C; (3) PCR amplification: 95 °C for 60 s and 55 °C for 20 s, 30 cycles; (4) Plate Read: 37 °C for 60 s, 1 cycle.
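To make the touchdown arithmetic explicit, the following Python sketch expands the program into individual steps. It assumes the 0.8 °C decrement is applied to the annealing step after every touchdown cycle starting at 63 °C, so the tenth touchdown cycle anneals at 55.8 °C before the 55 °C amplification cycles begin. The function and step names are illustrative, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    stage: str
    temp_c: float
    seconds: int


def kasp_program(start_anneal=63.0, drop=0.8, touchdown_cycles=10,
                 amp_cycles=30, amp_anneal=55.0):
    """Expand the hot start, touchdown, amplification and plate-read stages."""
    steps = [Step("hot start", 95.0, 30)]
    for i in range(touchdown_cycles):
        steps.append(Step("touchdown denature", 95.0, 60))
        # Assumed: annealing drops by `drop` degrees C after each touchdown cycle.
        # round() hides floating-point noise such as 62.199999...
        steps.append(Step("touchdown anneal", round(start_anneal - i * drop, 1), 20))
    for _ in range(amp_cycles):
        steps.append(Step("amplify denature", 95.0, 60))
        steps.append(Step("amplify anneal", amp_anneal, 20))
    steps.append(Step("plate read", 37.0, 60))
    return steps


program = kasp_program()
anneals = [s.temp_c for s in program if s.stage == "touchdown anneal"]
hold_seconds = sum(s.seconds for s in program)  # hold times only, no ramping
print(anneals)
print(f"{len(program)} steps, {hold_seconds / 60:.1f} min of hold time")
```

With the default arguments the touchdown annealing temperatures run from 63.0 °C down to 55.8 °C, and the program has 82 steps with about 55 minutes of hold time.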
Further, the CC genotype in step 2 means that the base at the SNP site is C, and the TT genotype means that the base at the SNP site is T.
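The decision rule in step 2 can be written as a small Python function. This is a sketch: the name `protein_class` is hypothetical, and treating heterozygous (CT) or failed calls as undetermined is an assumption, since the method only defines the CC and TT outcomes.

```python
def protein_class(genotype: str) -> str:
    """Classify a KASP call at Chr17:5830450 (nucleotide 2279 of Glyma.17G074400)."""
    call = genotype.strip().upper().replace("/", "")
    if call == "TT":
        return "high protein"
    if call == "CC":
        return "low protein"
    # Assumed: heterozygotes (CT/TC) and failed calls are not classified.
    return "undetermined"


for g in ["TT", "C/C", "CT", ""]:
    print(g or "<no call>", "->", protein_class(g))
```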
Beneficial effects: the invention used 1029 germplasm materials from 5 hybrid populations as the experimental population. Mutated genes were first screened by sequence comparison between the corresponding parents; primers were then designed against the mutation sites to select effective molecular markers, which were validated in the germplasm materials with KASP (Kompetitive Allele-Specific PCR), an SNP marker technology. Finally, the genes were re-validated, and their pyramiding (polymerization) effect assessed, by haplotype analysis in a resequenced population. The aim was to identify effective functional markers and important candidate genes. The main results are as follows:
(1) Five SNP markers related to protein content were obtained and developed: Chr2:47624936, Chr3:43490555, Chr7:18219120, Chr14:33480275 and Chr17:5830450.
(2) The important candidate genes near these protein-related SNP markers are Glyma.02G274900, Glyma.03G219900, Glyma.07G151300, Glyma.14G119000 and Glyma.17G074400.
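Soybean gene IDs from the Williams 82 (Wm82.a2) annotation encode the chromosome in their first two digits (Glyma.17G074400 lies on chromosome 17), so each marker above can be matched to its listed candidate gene. The Python sketch below does that lookup. It only pairs markers and genes by chromosome, which works here because each chromosome carries one marker; it does not compute physical distances, and the function names are illustrative.

```python
import re

MARKERS = ["Chr2:47624936", "Chr3:43490555", "Chr7:18219120",
           "Chr14:33480275", "Chr17:5830450"]
GENES = ["Glyma.02G274900", "Glyma.03G219900", "Glyma.07G151300",
         "Glyma.14G119000", "Glyma.17G074400"]


def parse_marker(marker: str) -> tuple[int, int]:
    """'Chr17:5830450' -> (17, 5830450); the 'chr' prefix is case-insensitive."""
    m = re.fullmatch(r"chr(\d+):(\d+)", marker, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized marker: {marker!r}")
    return int(m.group(1)), int(m.group(2))


def gene_chromosome(gene_id: str) -> int:
    """Wm82.a2-style IDs carry the chromosome: 'Glyma.17G074400' -> 17."""
    m = re.fullmatch(r"Glyma\.(\d{2})G\d+", gene_id)
    if m is None:
        raise ValueError(f"unrecognized gene ID: {gene_id!r}")
    return int(m.group(1))


def pair_by_chromosome(markers, genes):
    """Map each marker to the listed gene on the same chromosome (or None)."""
    genes_by_chrom = {gene_chromosome(g): g for g in genes}
    return {mk: genes_by_chrom.get(parse_marker(mk)[0]) for mk in markers}


for marker, gene in pair_by_chromosome(MARKERS, GENES).items():
    print(marker, gene)
```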
Drawings
FIG. 1 shows the sequence alignment of protein-related genes;
FIG. 2 shows the expression profiles of candidate genes near the protein-associated SNP sites;
FIG. 3 shows the KASP genotyping results of the protein-associated SNP markers in the parental materials;
FIG. 4 is a histogram of protein and oil content distributions in materials from the different populations;
FIG. 5 shows the KASP genotyping results of the protein-associated SNP markers in the germplasm materials;
FIG. 6 shows the mean protein phenotypes of high-protein and low-protein haplotypes at the protein-associated SNP sites;
FIG. 7 shows the mean high-protein and low-protein phenotypes illustrating the pyramiding effect of protein-associated alleles;
FIG. 8 is a histogram of protein and oil content distributions in the resequenced population.
Detailed Description
Example 1.
1. A total of 1029 materials from 5 hybrid populations were used. First, seven varieties, Suinong 76, Suinong 69, Suinong 49, Suinong 35, Suinong 42, Dongsheng No.1 and Suiwuxingdou No.3, were selected as parents for crossing; their characteristics are shown in Table 1. The cross combinations were Suinong 69 × Suinong 76, Suinong 35 × Suinong 76, Dongsheng No.1 × Suinong 76, Suiwuxingdou No.3 × Suinong 42, and Suinong 49 × Suinong 76. 1209 soybean germplasm accessions from the F6 generations of these 5 cross combinations were selected as the experimental population and planted in 2022 in the experimental field of the Suihua Branch of the Heilongjiang Academy of Agricultural Sciences under conventional field management. During vegetative growth, the three youngest leaves at the top of each plant were sampled for DNA extraction and KASP genotyping; after the seeds matured, they were threshed and used to measure protein and oil content.
Next, haplotype analysis and gene pyramiding (polymerization) analysis were performed using 643 accessions with completed genome resequencing from the Soybean Improvement Genetics Laboratory.
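One way to summarize a gene-pyramiding analysis is to count how many favorable alleles each accession carries and compare mean protein content across those groups, as in FIG. 7. The Python sketch below assumes that approach. Apart from T at Chr17:5830450, the favorable alleles and all accession records are invented placeholders (`snp_B` is a hypothetical second site), not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Favorable (high-protein) allele per SNP. Only Chr17:5830450 -> T is stated
# in this document; "snp_B" -> G is a hypothetical placeholder.
FAVORABLE = {"Chr17:5830450": "T", "snp_B": "G"}


def favorable_count(genotypes: dict, favorable: dict) -> int:
    """Number of SNPs at which the accession is homozygous for the favorable allele."""
    return sum(1 for snp, allele in favorable.items()
               if genotypes.get(snp) == allele * 2)


def mean_protein_by_count(accessions, favorable):
    """Group accessions by favorable-allele count and average their protein (%)."""
    groups = defaultdict(list)
    for acc in accessions:
        groups[favorable_count(acc["genotypes"], favorable)].append(acc["protein"])
    return {count: mean(values) for count, values in sorted(groups.items())}


# Invented example records, not measurements from the resequenced population.
accessions = [
    {"genotypes": {"Chr17:5830450": "TT", "snp_B": "GG"}, "protein": 43.0},
    {"genotypes": {"Chr17:5830450": "TT", "snp_B": "AA"}, "protein": 41.0},
    {"genotypes": {"Chr17:5830450": "CC", "snp_B": "GG"}, "protein": 41.0},
    {"genotypes": {"Chr17:5830450": "CC", "snp_B": "AA"}, "protein": 39.0},
]
print(mean_protein_by_count(accessions, FAVORABLE))
```

In a real analysis the groups would usually also be compared statistically (for example with a one-way ANOVA) rather than by means alone.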
TABLE 1 Quality traits of the hybrid parents
Variety | Quality traits |
Suinong 76 | High-protein variety; protein content 46.78%, fat content 16.86% |
Suinong 69 | Disease-resistant variety; protein content 40.57%, fat content 19.46% |
Suinong 49 | Specialty (large-seeded) variety; protein content 41.24%, fat content 21.57% |
Suinong 35 | High-oil soybean; protein content 42.17%, fat content 22.00% |
Suinong 42 | High-fatty-acid soybean variety; protein content 40.68%, fat content 20.00% |
Dongsheng No.1 | Protein content 41.30%, fat content 19.97% |
Suiwuxingdou No.3 | Beany-flavor-free variety; protein content 37.37%, fat content 21.81% |
2. Important allele mining
Important allele mining screens for sites significantly associated with a target trait by analyzing the correlation between genomic data and the trait, and then uses phenotypes to further determine each allele's effect on the trait. In this study, the coincidence rate was used to represent the effect of a site.
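The document does not define the coincidence rate formally. A common reading, assumed here, is the percentage of accessions whose marker-predicted class agrees with their measured phenotype. The Python sketch below applies that assumption to the Chr17:5830450 marker, using a hypothetical protein threshold to separate high- from low-protein accessions and made-up records.

```python
def coincidence_rate(records, threshold):
    """Percent of classifiable accessions whose Chr17:5830450 genotype
    (TT -> high, CC -> low) agrees with the phenotype class, where
    protein >= threshold counts as high. Heterozygotes and missing calls
    are skipped. This definition is an assumption, not taken from the source."""
    predicts_high = {"TT": True, "CC": False}
    matches = total = 0
    for genotype, protein in records:
        if genotype not in predicts_high:
            continue
        total += 1
        if predicts_high[genotype] == (protein >= threshold):
            matches += 1
    return 100.0 * matches / total if total else float("nan")


# Made-up (genotype, protein %) pairs and a hypothetical 41.0% threshold.
records = [("TT", 43.1), ("TT", 40.2), ("CC", 39.5), ("CC", 40.0), ("CT", 41.0)]
print(f"{coincidence_rate(records, threshold=41.0):.1f}%")
```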
To preliminarily screen candidate genes related to soybean oil and protein, soybean gene sequences were first downloaded from the SoyBase data platform (https://www.soybase.org/) as templates. The gene sequences of the cross parents were extracted from sequencing data of 634 soybean core germplasm accessions from Northeast China and aligned with DNAMAN. SNP sites located in the CDS and causing amino acid changes were selected, candidate genes were preliminarily identified, and their functions were preliminarily explored. Second, the SNP sites distinguishing each pair of parents were determined, specific primers were designed for these sites, and the primers were validated in the parental materials using the KASP SNP marker technology, in order to select effective primers that distinguish the alleles. Finally, the effective primers were prepared before the reactions with the germplasm samples. The KASP Master Mix is a kit from LGC (UK) containing two universal quenched fluorescent reporters, FAM and HEX, together with core components such as Taq polymerase. The KASP assay mix contains two forward primers and one reverse primer; the two forward primers are designed from the sequence flanking the SNP site, each ending on one allele. Each forward primer binds a different allele and, during the KASP reaction, generates signal from either the FAM or the HEX reporter, so different alleles give different fluorescence. A sample homozygous at a given SNP produces a green or blue fluorescent signal, while a heterozygous sample produces a red signal.
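The homozygous and heterozygous fluorescence outcomes described above can be illustrated with a simple ratio rule on end-point FAM and HEX intensities. The Python sketch below is an assumption for illustration only: the thresholds are arbitrary, which allele is tied to FAM depends on the assay design, and instrument software calls genotypes by clustering rather than by fixed cut-offs.

```python
def call_kasp(fam: float, hex_signal: float, min_total: float = 0.2,
              low: float = 0.3, high: float = 0.7) -> str:
    """Call a genotype from end-point FAM/HEX fluorescence with a ratio rule.

    All thresholds are arbitrary placeholders; real calls come from clustering.
    """
    total = fam + hex_signal
    if total < min_total:
        return "no call"  # too little signal, e.g. a failed or no-template well
    fam_fraction = fam / total
    if fam_fraction >= high:
        return "homozygous (FAM allele)"
    if fam_fraction <= low:
        return "homozygous (HEX allele)"
    return "heterozygous"


for fam, hex_signal in [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5), (0.05, 0.05)]:
    print(fam, hex_signal, call_kasp(fam, hex_signal))
```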
Following this principle, the 50 bp sequences upstream and downstream of each screened SNP site were selected, and KASP primers were designed with Premier 5.0 software. Each primer set comprises two allele-specific forward primers (F1/F2) and a common reverse primer (R). Besides discriminating between alleles, each forward primer carries at its 5' end the tail sequence for a differently colored fluorescent label, FAM (GAAGGTGACCAAGTTCATGCT) or HEX (GAAGGTCGGAGTCAACGGATT), so that the PCR products can be distinguished. The primer sequences are shown in Table 2. Reactions were run in 384-well plates containing the required reagents on a Roche LightCycler 480 II real-time fluorescence quantitative PCR instrument, with the following program: (1) Hot Start: 95 °C for 30 s, 1 cycle; (2) Touchdown: 95 °C for 60 s, then 63 °C for 20 s, lowering the temperature by 0.8 °C per cycle, for 10 cycles in total from 63 °C to 55 °C; (3) PCR amplification: 95 °C for 60 s and 55 °C for 20 s, 30 cycles; (4) Plate Read: 37 °C for 60 s, 1 cycle. After the PCR is complete, the end-point fluorescence signal is read.
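The tail-plus-allele structure of the two forward primers can be sketched in Python. The FAM and HEX tail sequences come from the text above; the gene-specific upstream sequence in the example is a hypothetical placeholder, not one of SEQ ID NO.1 to 3, and assigning C to FAM and T to HEX is an assumption.

```python
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"


def kasp_forward_primers(upstream: str, fam_allele: str, hex_allele: str):
    """Return (F1, F2): 5' tail + gene-specific sequence + 3' SNP allele."""
    upstream = upstream.upper()
    if not upstream or set(upstream) - set("ACGT"):
        raise ValueError("upstream must be a non-empty A/C/G/T sequence")
    f1 = FAM_TAIL + upstream + fam_allele.upper()
    f2 = HEX_TAIL + upstream + hex_allele.upper()
    return f1, f2


# Hypothetical 20-nt sequence ending just before the SNP (not SEQ ID NO.2/3).
f1, f2 = kasp_forward_primers("ACGTTGCAAGGCTTACAGTC", fam_allele="C", hex_allele="T")
print("F1:", f1)
print("F2:", f2)
```

Because the two primers differ only in the tail and the final 3' base, which primer is extended, and therefore which fluorescent signal is generated, depends on the allele carried by the template.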
Table KASP reaction System
TABLE 2 primer sequences
Results: to preliminarily identify candidate genes related to soybean protein content, this study screened important protein-pathway genes collected and curated by the research group, combined with genes obtained through meta-QTL analysis and from the group's published articles ("Meta-analysis and transcriptome profiling reveal hub genes for soybean seed storage composition during seed development" and "Soybean protein- and oil-related elite allele mining, breeding evaluation and screening"). First, the sequences of these important protein-related genes were extracted from the SoyBase data platform (https://www.soybase.org/). The sequences of the cross parents Suinong 76, Suinong 69, Suinong 49, Suinong 35 and Suinong 42 were extracted from the sequencing resources of 634 soybean core germplasm accessions from Northeast China, and those of Dongsheng No.1 and Suiwuxingdou No.3 were taken from published articles. The parental sequences were then aligned, SNP sites located in the CDS and causing amino acid changes were selected, and candidate genes were preliminarily identified.
The gene annotations involve functions such as fatty acid synthesis, growth and development, and protein binding, and protein types such as 7S, 40S and 60S. For 103 important genes, the reference genome of the American soybean cultivar Williams 82, downloaded from SoyBase, was aligned with the extracted parental sequences in DNAMAN. The result for one SNP site is shown in FIG. 1: in the cross Suinong 69 × Suinong 76, the female parent Suinong 69 carries a C-to-G mutation that changes the encoded amino acid. In total, 65 SNP sites were screened out, located in 27 protein-related candidate genes. Glyma.02G090800 carries the most sites (7), followed by Glyma.16G018400 (5); the remaining sites are distributed fairly evenly across the other genes. The functional annotations of the 27 candidate genes were examined preliminarily. For example, Glyma.02G090800 is involved in protein translation, including translation initiation, translation initiation factor activity and protein binding; Glyma.03G232000 and Glyma.07G151300 are involved in protein folding and the cellular stress response; Glyma.08G316700 plays an important role at different stages of protein synthesis; and Glyma.07G151300 is a member of the FAD8 family, can catalyze the production of palmitoleic acid and linolenic acid, and is closely related to soybean fatty acid content. Candidate genes such as Glyma.02G151500, Glyma.02G274900 and Glyma.03G219900 are functionally related to protein translation, transcription and binding (see Table 3).
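The screening criterion used here (an SNP in the CDS that changes the encoded amino acid) can be sketched in Python for two aligned, in-frame CDS sequences that differ only by substitutions. The function name and toy sequences are hypothetical; the sketch uses the standard genetic code and assumes both sequences start at the start codon and contain only A, C, G and T.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def missense_snps(ref_cds: str, alt_cds: str):
    """Return (1-based position, ref base, alt base, ref aa, alt aa) for each
    substitution that changes the amino acid of its codon."""
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3:
        raise ValueError("expects equal-length, in-frame CDS sequences (SNPs only)")
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    hits = []
    for pos, (r, a) in enumerate(zip(ref_cds, alt_cds)):
        if r == a:
            continue
        start = pos - pos % 3  # first base of the codon containing this SNP
        ref_aa = CODON_TABLE[ref_cds[start:start + 3]]
        alt_aa = CODON_TABLE[alt_cds[start:start + 3]]
        if ref_aa != alt_aa:
            hits.append((pos + 1, r, a, ref_aa, alt_aa))
    return hits


print(missense_snps("ATGGCTTAA", "ATGGGTTAA"))  # [(5, 'C', 'G', 'A', 'G')]
print(missense_snps("ATGGCTTAA", "ATGGCCTAA"))  # [] (synonymous change)
```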
TABLE 3 candidate genes near protein-associated SNP markers
Gene ID | Gene annotation |
Glyma.02G090800 | translation initiation factor IF2/IF5 |
Glyma.02G151500 | protein SLOW WALKER 1 |
Glyma.02G274900 | Chromosome and associated proteins |
Glyma.03G219900 | DELLA protein |
Glyma.03G232000 | probable protein disulfide-isomerase A6 |
Glyma.03G244800 | OAS-TL1, cysteine synthase |
Glyma.07G051500 | transcription factor MYC2 |
Glyma.07G102800 | vacuolar protein sorting-associated protein 26C |
Glyma.07G151300 | omega-3 fatty acid desaturase |
Glyma.07G261900 | U3 small nucleolar RNA-associated protein 6 homolog |
Glyma.08G069000 | HSP17.3-B, 17.3 kDa class I heat shock protein |
Glyma.08G316700 | protein translation factor SUI1 homolog 2 |
Glyma.09G018300 | vacuolar protein sorting-associated protein 53A |
Glyma.09G230700 | DNA-directed RNA polymerase III subunit RPC4 isoform X1 |
Glyma.10G037100 | glycinin G4, GY4 |
Glyma.12G018300 | tRNA aminoacylation for protein translation |
Glyma.12G230900 | transport protein Sec61 subunit alpha |
Glyma.13G171200 | ribosomal RNA-processing protein 7 homolog A |
Glyma.13G176000 | HSP17.6-L, 17.6 kDa class I heat shock protein |
Glyma.14G048800 | vacuolar sorting protein |
Glyma.14G119000 | myb domain protein 56 |
Glyma.15G089800 | eukaryotic translation initiation factor 4E-1 |
Glyma.16G018400 | vacuolar protein sorting-associated protein 8 homolog |
Glyma.16G178800 | HSP90A2, heat shock protein 90-A2 |
Glyma.17G074400 | delta-12 desaturase |
Glyma.19G164800 | glycinin subunit G7, GY7 |
Glyma.20G146200 | beta-conglycinin beta-subunit, CG-BETA-1 |
Expression patterns were analyzed using the RNA-seq dataset in SoyBase (https://www.soybase.org/) and plotted with TBtools software (FIG. 2); colors from orange to green indicate expression levels from high to low. Of the 27 candidate genes, 20 are expressed at every stage of seed development. Among them, 10 genes, such as Glyma.03G219900, Glyma.03G232000 and Glyma.03G244800, show low expression; 6 genes, such as Glyma.03G232000, Glyma.03G244800 and Glyma.07G102800, show higher expression, with the highest levels in roots; and 4 genes, Glyma.02G151500, Glyma.07G051500, Glyma.15G089800 and Glyma.16G018400, show the most pronounced differences in expression. Glyma.07G051500 has its lowest expression at Seed 42 DAF during seed development, and its peak at Seed 25 DAF is 103.6 times the Seed 42 DAF level; Glyma.10G037100 has its lowest expression at Seed 21 DAF, and its peak at Seed 25 DAF is 24 times the Seed 21 DAF level; Glyma.15G089800 has its lowest expression at Seed 21 DAF, and its peak at Seed 35 DAF is 2.4 times the Seed 21 DAF level; and Glyma.16G018400 has its lowest expression at Seed 21 DAF, with its peak at Seed 10 DAF being 3.7 times the Seed 21 DAF level.
To preliminarily identify SNP sites related to soybean protein content, KASP primers were designed with Primer 5.0 software (http://www.premierbiosoft.com/index.html) for the 65 SNP sites in the 27 protein-related candidate genes of Table 1, using 50 bp of sequence extracted upstream and downstream of each site. The primers were validated on the parents of the corresponding crosses, and each primer was tested at least three times on each parental pair to improve the reliability of the results. FIG. 3 shows one representative result, in which green and blue represent the two homozygous genotypes and red represents the heterozygous genotype. A primer was judged to have good typing performance when it consistently resolved the two parents as different homozygous genotypes. Based on the KASP typing results, 19 primers with good typing performance were finally selected, as shown in Table 4.
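The flanking-sequence step can be sketched as follows. This is an illustrative helper, not part of the described workflow: the name `snp_context`, the 1-based position convention and the bracketed `[REF/ALT]` notation (commonly used when submitting SNP sequences for KASP assay design) are assumptions, and in practice the input would be the Williams 82 chromosome sequence rather than the toy string used here.

```python
# Minimal sketch: extract +/- `flank` bp around a SNP and mark it as [REF/ALT].
# Assumptions (hypothetical): 1-based SNP positions and a plain-string reference.

def snp_context(seq: str, pos: int, ref: str, alt: str, flank: int = 50) -> str:
    """Return `flank` bp on each side of a 1-based SNP position in [REF/ALT] form."""
    i = pos - 1                                  # convert to a 0-based index
    if not 0 <= i < len(seq):
        raise IndexError("SNP position outside the sequence")
    if seq[i].upper() != ref.upper():
        raise ValueError(f"reference base is {seq[i]!r}, expected {ref!r}")
    left = seq[max(0, i - flank):i]              # shortened near the sequence start
    right = seq[i + 1:i + 1 + flank]             # shortened near the sequence end
    return f"{left}[{ref.upper()}/{alt.upper()}]{right}"


if __name__ == "__main__":
    toy = "AAAAACGGGGG"                          # hypothetical toy sequence
    print(snp_context(toy, 6, "C", "T", flank=3))  # AAA[C/T]GGG
```

Checking the reference base first guards against coordinate mismatches between the SNP table and the sequence file.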
TABLE 4 SNP molecular marker loci associated with protein content
SNP no. | Gene ID | Alleles | Chromosome | Position (bp) |
56 | Glyma.02G090800 | C/T | Chr2 | 7982944 |
58 | Glyma.02G090800 | G/A | Chr2 | 7983144 |
30 | Glyma.02G274900 | T/A | Chr2 | 47624936 |
31 | Glyma.02G274900 | T/C | Chr2 | 47625857 |
63 | Glyma.03G219900 | T/C | Chr3 | 43490555 |
64 | Glyma.03G232000 | A/T | Chr3 | 44494958 |
71 | Glyma.07G051500 | C/T | Chr7 | 4424835 |
40 | Glyma.07G151300 | G/T | Chr7 | 18219120 |
73 | Glyma.07G261900 | G/A | Chr7 | 43998064 |
74 | Glyma.07G261900 | T/C | Chr7 | 43998083 |
79 | Glyma.12G018300 | C/A | Chr12 | 1280727 |
80 | Glyma.12G230900 | G/A | Chr12 | 40525342 |
18 | Glyma.14G119000 | A/G | Chr14 | 33480275 |
19 | Glyma.14G119000 | A/C | Chr14 | 33480631 |
67 | Glyma.15G089800 | A/G | Chr15 | 6903356 |
50 | Glyma.17G074400 | C/A | Chr17 | 5830396 |
51 | Glyma.17G074400 | T/C | Chr17 | 5830450 |
52 | Glyma.17G074400 | T/C | Chr17 | 5831106 |
68 | Glyma.19G164800 | A/C | Chr19 | 43002797 |
These 19 superior loci are distributed across the 20 soybean chromosomes as follows: Chr02 (4), Chr03 (2), Chr07 (4), Chr12 (2), Chr14 (2), Chr15 (1), Chr17 (3) and Chr19 (1); Chr02 and Chr07 carry the most SNPs. By cross, the numbers are Suinong 69 × Suinong 76 (9), Dongsheng 1 × Suinong 76 (7), Suinong 35 × Suinong 76 (6), Suiwuxing 3 × Suinong 42 (11) and Suinong 49 × Suinong 76 (11), so the crosses Suiwuxing 3 × Suinong 42 and Suinong 49 × Suinong 76 pyramid the most SNPs.
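The per-chromosome counts can be rechecked directly from Table 4. The tally below simply transcribes the SNP number, gene ID and chromosome columns of Table 4; the variable names are illustrative.

```python
from collections import Counter

# (SNP no., gene ID, chromosome) transcribed from Table 4
TABLE4 = [
    (56, "Glyma.02G090800", "Chr2"), (58, "Glyma.02G090800", "Chr2"),
    (30, "Glyma.02G274900", "Chr2"), (31, "Glyma.02G274900", "Chr2"),
    (63, "Glyma.03G219900", "Chr3"), (64, "Glyma.03G232000", "Chr3"),
    (71, "Glyma.07G051500", "Chr7"), (40, "Glyma.07G151300", "Chr7"),
    (73, "Glyma.07G261900", "Chr7"), (74, "Glyma.07G261900", "Chr7"),
    (79, "Glyma.12G018300", "Chr12"), (80, "Glyma.12G230900", "Chr12"),
    (18, "Glyma.14G119000", "Chr14"), (19, "Glyma.14G119000", "Chr14"),
    (67, "Glyma.15G089800", "Chr15"), (50, "Glyma.17G074400", "Chr17"),
    (51, "Glyma.17G074400", "Chr17"), (52, "Glyma.17G074400", "Chr17"),
    (68, "Glyma.19G164800", "Chr19"),
]

# Number of selected SNPs per chromosome and per gene
per_chrom = Counter(chrom for _, _, chrom in TABLE4)
per_gene = Counter(gene for _, gene, _ in TABLE4)

if __name__ == "__main__":
    # Sort chromosomes numerically ("Chr2" before "Chr12")
    for chrom, n in sorted(per_chrom.items(), key=lambda kv: int(kv[0][3:])):
        print(chrom, n)
    print("total SNPs:", sum(per_chrom.values()), "genes:", len(per_gene))
```

The 19 loci fall in 13 distinct genes, with Chr2 and Chr7 each carrying 4.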
KASP genotyping experiment
The prepared KASP reaction mix was added to a 384-well plate and run on a Roche LightCycler 480 II instrument; the results were then imported into an Excel spreadsheet and analyzed together with the phenotype data. The basic procedure is as follows:
(1) Classify the materials by soybean seed protein or oil phenotype: calculate the mean and standard deviation of each group of data, and set the cut-off values at the mean plus and minus one standard deviation.
(2) Materials above the upper cut-off are designated high-protein or high-oil materials, and materials below the lower cut-off are designated low-protein or low-oil materials. These classes are used to calculate the coincidence rate of each locus, i.e., the proportion of materials in the population whose genotype agrees with their phenotypic class.
(3) Count the KASP genotyping results separately for the high-protein/oil and low-protein/oil materials, sum the two to obtain the total, and divide the number of coincident samples by the total to obtain the coincidence rate. Then construct a four-fold table of coincidence rates (see Table 5) with the high- and low-protein/oil materials and the x- and y-containing alleles as its two dimensions. Use the table to analyze the accuracy and reliability of the detection method and possible misclassifications, and make any necessary improvements.
(4) Use a hypothesis test to judge whether the SNP locus is associated with soybean protein or oil content: the null hypothesis H0 states that content is independent of the x/y allele, and the alternative hypothesis HA states that the two variables are associated. Calculate the coincidence rates P1 and P2, and decide whether to reject H0 in favor of HA based on P1, P2 and the preset significance level α (60%).
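Steps (1) and (2) can be sketched as below. This is one reading of the text, stated as an assumption: "high" means strictly above mean + SD, "low" strictly below mean − SD, the SD is the sample (n − 1) standard deviation, and materials in between are left out of the coincidence analysis. The function name `classify_by_sd` is hypothetical.

```python
from statistics import mean, stdev


def classify_by_sd(phenotypes: dict) -> dict:
    """Label each material 'high', 'low' or 'intermediate' relative to mean +/- 1 SD.

    phenotypes: material name -> protein (or oil) content in %.
    Assumed cut-offs: high > mean + SD, low < mean - SD, otherwise intermediate.
    """
    values = list(phenotypes.values())
    m, s = mean(values), stdev(values)   # sample standard deviation (n - 1)
    upper, lower = m + s, m - s
    labels = {}
    for name, v in phenotypes.items():
        if v > upper:
            labels[name] = "high"
        elif v < lower:
            labels[name] = "low"
        else:
            labels[name] = "intermediate"
    return labels


if __name__ == "__main__":
    # Hypothetical protein contents (%) for five lines
    demo = {"L1": 38.0, "L2": 40.0, "L3": 42.0, "L4": 40.0, "L5": 40.0}
    print(classify_by_sd(demo))
```

Whether the source used the population or sample SD is not stated; with hundreds of lines per population the difference is negligible.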
Table 5 Four-fold table of coincidence rates
 | High-protein/oil material | Low-protein/oil material |
Containing x allele | a | c |
Containing y allele | b | d |
Total | M | N |
Note: x and y are the genotypes called by KASP typing with the primers designed for the SNP locus; a is the number of x alleles in the typing results of the high-protein or high-oil materials, b is the number of x alleles in the typing results of the low-protein or low-oil materials, c is the number of y alleles in the typing results of the high-protein or high-oil materials, d is the number of y alleles in the typing results of the low-protein or low-oil materials, M is the total number of high-protein or high-oil materials, and N is the total number of low-protein or low-oil materials.
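The coincidence rates of steps (3) and (4) can be sketched as P1 = a/M (high materials carrying the x allele) and P2 = d/N (low materials carrying the y allele). Assumptions in this sketch: only homozygous calls count as coincident, heterozygous or failed calls count as non-coincident, and the text does not specify exactly how P1 and P2 are compared with α, so the sketch only reports them. As a swapped-in alternative not described in the source, it also computes a Pearson chi-square statistic on the 2 × 2 table; 3.841 is the 5% critical value for 1 degree of freedom. The example counts reuse the Chr17:5830450 results reported below (84 of the high and 34 of the low materials coincident); the totals 99 and 59 are back-calculated from the reported percentages, and the remaining genotypes are hypothetical.

```python
def coincidence_rates(records, x: str, y: str):
    """records: iterable of (phenotype_class, genotype) pairs, e.g. ("high", "TT").

    Returns (a, M, d, N, P1, P2) with P1 = a / M and P2 = d / N.
    Only homozygous x (in high) or y (in low) calls count as coincident.
    """
    records = list(records)
    high = [g for c, g in records if c == "high"]
    low = [g for c, g in records if c == "low"]
    a = sum(g == x + x for g in high)
    d = sum(g == y + y for g in low)
    M, N = len(high), len(low)
    return a, M, d, N, a / M, d / N


def chi_square_2x2(a, M, d, N):
    """Pearson chi-square on [[a, M - a], [N - d, d]], 1 df, no continuity correction."""
    table = [[a, M - a], [N - d, d]]
    row = [M, N]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    total = M + N
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Hypothetical split reproducing the Chr17:5830450 rates (T = x, C = y)
    recs = ([("high", "TT")] * 84 + [("high", "CC")] * 15
            + [("low", "CC")] * 34 + [("low", "TT")] * 25)
    a, M, d, N, p1, p2 = coincidence_rates(recs, "T", "C")
    print(f"P1 = {p1:.2%}, P2 = {p2:.2%}")
    chi2 = chi_square_2x2(a, M, d, N)
    print(f"chi-square = {chi2:.2f}; significant at 5%: {chi2 > 3.841}")
```

Because a and d sit on the same diagonal in either reading of Table 5, the rates do not depend on how b and c are labeled.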
Results:
The study used F5 materials from 5 cross populations, planted in the Suihua area in 2022. Population 28 is the cross Suinong 69 × Suinong 76: the male parent is a high-protein variety (protein content 46.78%) and the female parent is a disease-resistant variety; 390 plants were planted, and phenotype data were obtained for 245 harvested plants. Population 122 is the cross Suinong 35 × Suinong 76: the male parent is a high-protein variety (protein content 46.78%) and the female parent is a high-oil variety (oil content 22.00%); 390 plants were planted, and phenotype data were obtained for 274 harvested plants. Population 163 is the cross Suiwuxing 3 × Suinong 42: the male parent is a non-fishy variety and the female parent is a high-oleic-acid variety; 390 plants were planted, and 205 plants were harvested and measured. Information on the other crosses is given in Table 6; in total, 1029 progeny lines were harvested and measured.
TABLE 6 Information on the soybean resource materials
Population no. | Female parent | Protein content | Oil content | Male parent | Protein content | Oil content | No. of lines measured |
28 | Suinong 69 | 40.57% | 19.64% | Suinong 76 | 46.78% | 16.86% | 245 |
119 | Dongsheng 1 | 41.30% | 19.97% | Suinong 76 | 46.78% | 16.86% | 212 |
122 | Suinong 35 | 42.17% | 22.00% | Suinong 76 | 46.78% | 16.86% | 274 |
163 | Suiwuxing 3 (non-fishy) | 37.37% | 21.81% | Suinong 42 | 40.68% | 20.00% | 205 |
167 | Suinong 49 | 42.17% | 19.97% | Suinong 76 | 46.78% | 16.86% | 272 |
Soybean protein and oil phenotypes of the 2022 materials were measured with a Foss grain analyzer, and descriptive statistics were computed with SPSS software. In the 2022 materials, protein content ranged from a maximum of 46.6% to a minimum of 31.53%, with population means between 36.5% and 40.5%; oil content ranged from a maximum of 25.64% to a minimum of 16.16%, with population means between 18.46% and 21.49%. The phenotypic data are widely distributed and differ markedly, consistent with the genetic behavior of quantitative traits. Skewness and kurtosis analysis showed that protein and oil contents are moderately skewed, so the materials are suitable for subsequent research.
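The statistics reported in Table 7 can be reproduced with a short function. This is a sketch, not the SPSS procedure itself; it assumes the bias-corrected sample skewness (G1) and excess kurtosis (G2), which to our understanding are the estimators SPSS reports, and expresses the coefficient of variation as sample SD / mean × 100. The name `describe` is hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def describe(values):
    """Max, min, mean, median, SD, skewness (G1), excess kurtosis (G2) and CV (%)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for kurtosis")
    m = mean(values)
    dev = [v - m for v in values]
    m2 = sum(d ** 2 for d in dev) / n          # central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * sqrt(n * (n - 1)) / (n - 2)                        # G1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))      # G2
    sd = stdev(values)                                             # n - 1 denominator
    return {
        "max": max(values), "min": min(values), "mean": m,
        "median": median(values), "sd": sd,
        "skewness": skew, "kurtosis": kurt, "cv_percent": sd / m * 100,
    }


if __name__ == "__main__":
    print(describe([1, 2, 3, 4, 5]))  # symmetric toy data: skewness 0, kurtosis -1.2
```

As a consistency check against Table 7, SD / mean reproduces the listed coefficients of variation (e.g. 2.40 / 40.49 ≈ 5.93% for Suinong 49 × Suinong 76 protein).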
Comparing the populations in Table 7, the Suinong 49 × Suinong 76 population has the highest protein content of the five, with a maximum of 46.63% and a mean of 40.49%, and the Suiwuxing 3 × Suinong 42 population has the highest oil content, with a maximum of 25.64% and a mean of 21.48%. The standard deviation of protein content ranges from 1.24 to 2.40 and its coefficient of variation from 3.39% to 5.92%; the standard deviation of oil content ranges from 0.72 to 1.37 and its coefficient of variation from 3.57% to 6.84%. Overall the standard deviations are small, with no large fluctuations.
TABLE 7 Descriptive statistics of protein and oil quality traits in the different populations
Population, trait | Maximum | Minimum | Mean | Median | Standard deviation | Skewness | Kurtosis | Coefficient of variation |
Suinong 69 × Suinong 76, protein (%) | 44.15 | 33.52 | 40.17 | 40.54 | 2.16 | -0.74 | 0.35 | 5.37% |
Suinong 69 × Suinong 76, oil (%) | 21.70 | 16.16 | 18.46 | 18.25 | 1.23 | 0.64 | -0.14 | 6.64% |
Suinong 35 × Suinong 76, protein (%) | 43.88 | 27.69 | 39.84 | 39.99 | 1.89 | -1.30 | 6.13 | 4.73% |
Suinong 35 × Suinong 76, oil (%) | 24.00 | 17.04 | 19.89 | 19.93 | 1.07 | 0.23 | 1.27 | 5.37% |
Dongsheng 1 × Suinong 76, protein (%) | 43.47 | 32.36 | 39.49 | 39.86 | 1.95 | -0.93 | 1.15 | 4.94% |
Dongsheng 1 × Suinong 76, oil (%) | 22.57 | 18.42 | 20.28 | 20.22 | 0.72 | 0.38 | 0.24 | 3.57% |
Suiwuxing 3 × Suinong 42, protein (%) | 39.59 | 32.65 | 36.50 | 36.49 | 1.24 | -0.12 | -0.19 | 3.39% |
Suiwuxing 3 × Suinong 42, oil (%) | 25.64 | 19.25 | 21.49 | 21.56 | 0.78 | 0.31 | 3.49 | 3.65% |
Suinong 49 × Suinong 76, protein (%) | 46.63 | 31.53 | 40.49 | 40.65 | 2.40 | -0.50 | 1.01 | 5.92% |
Suinong 49 × Suinong 76, oil (%) | 24.51 | 16.95 | 20.04 | 19.90 | 1.37 | 0.71 | 0.69 | 6.84% |
Frequency histograms of the protein and oil phenotype data of the five populations were drawn with GraphPad Prism 8 software, with protein content binned from 30% to 48% at a bin width of 1 and oil content from 16% to 25% at a bin width of 0.5. As FIG. 4 shows, the seed protein and oil contents of each population are continuously distributed, with a clear tendency toward a normal distribution. The figure also shows that protein content in the Suinong 49 × Suinong 76 and Suinong 69 × Suinong 76 populations is higher overall, concentrated above 40%, while their oil content is relatively lower; the Suiwuxing 3 × Suinong 42 population has higher oil content, concentrated above 21%, and lower protein content overall; and in the Suinong 35 × Suinong 76 population, protein content is concentrated at 38%-41% and oil content at 19%-20.5%.
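The binning behind these histograms can be sketched as follows. The bin convention is an assumption (half-open bins [lo, lo + width), with a value equal to the upper limit placed in the last bin, and values outside the range skipped); GraphPad Prism's own convention may differ. `bin_counts` is a hypothetical name.

```python
def bin_counts(values, start, stop, width):
    """Count values into bins [start, start + width), ..., with `stop` in the last bin.

    Returns (edges, counts); values outside [start, stop] are ignored.
    """
    n_bins = round((stop - start) / width)
    counts = [0] * n_bins
    for v in values:
        if v < start or v > stop:
            continue                                   # outside the plotted range
        k = min(int((v - start) // width), n_bins - 1)  # clamp v == stop into last bin
        counts[k] += 1
    edges = [round(start + i * width, 6) for i in range(n_bins + 1)]
    return edges, counts


if __name__ == "__main__":
    protein = [30.0, 30.5, 31.2, 47.9, 48.0]           # hypothetical values (%)
    edges, counts = bin_counts(protein, 30, 48, 1)     # 18 protein bins of width 1
    print(list(zip(edges[:-1], counts)))
    oil_edges, _ = bin_counts([], 16, 25, 0.5)         # 18 oil bins of width 0.5
    print(oil_edges[:3], "...", oil_edges[-1])
```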
4. Verification of SNP loci related to protein content
To verify the superior protein-associated alleles, the 19 selected primers were used for genotyping on the KASP platform: the fluorescent signal appears green or blue when the genotype at a given SNP is homozygous and red when it is heterozygous. The KASP results for the protein-associated SNP sites are shown in FIG. 5.
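KASP genotype calls are derived from the balance of two allele-specific fluorescent reporter signals (conventionally FAM and HEX). The sketch below illustrates that logic only; the thresholds `min_signal` and `hom_cut`, the normalized-intensity inputs and the X/Y allele labels are hypothetical, and the LightCycler software used in the study applies its own clustering rather than fixed cut-offs.

```python
def call_genotype(fam: float, hex_signal: float,
                  min_signal: float = 0.2, hom_cut: float = 0.7) -> str:
    """Hypothetical KASP call from normalized end-point fluorescence.

    fam, hex_signal: normalized reporter intensities for the two allele-specific primers.
    Returns "X:X" or "Y:Y" (homozygous), "X:Y" (heterozygous) or "no call".
    """
    total = fam + hex_signal
    if total < min_signal:
        return "no call"            # too little signal, e.g. failed well or no template
    frac = fam / total              # share of signal from the allele-X reporter
    if frac >= hom_cut:
        return "X:X"
    if frac <= 1 - hom_cut:
        return "Y:Y"
    return "X:Y"


if __name__ == "__main__":
    # Hypothetical wells: two homozygous clusters, one heterozygote, one blank
    for well in [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5), (0.05, 0.05)]:
        print(well, call_genotype(*well))
```

In the figures, the two homozygous clusters correspond to the green and blue signals and the heterozygous cluster to red.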
The results show that all 19 primers type well. Combining the KASP results with the protein phenotypes: at Chr2:47624936 (FIG. 5 (3)), 24 high-protein materials have the AA genotype (coincidence rate 58.54%) and 20 low-protein materials have the TT genotype (coincidence rate 83.33%); at Chr3:43490555 (FIG. 5 (5)), 24 high-protein materials have the AA genotype (60.98%) and 15 low-protein materials have the TT genotype (53.57%); at Chr7:18219120 (FIG. 5 (8)), 53 high-protein materials have the AA genotype (81.54%) and 27 low-protein materials have the GG genotype (54.00%); at Chr14:33480275 (FIG. 5 (13)), 24 high-protein materials have the GG genotype (58.54%) and 34 low-protein materials have the AA genotype (70.53%); and at Chr17:5830450 (FIG. 5 (17)), 84 high-protein materials have the TT genotype (84.85%) and 34 low-protein materials have the CC genotype (57.63%). These five SNP markers type successfully, show different genotypes in high- and low-protein materials, and distinguish high- from low-protein materials well (see Table 8).
TABLE 8 screening results of SNP markers related to the soybean protein content
Note: the marked entries are the superior SNP loci with better screening performance.
Finally, the 5 SNP loci confirmed in the resource materials as related to soybean protein content are located in 5 important candidate genes: Glyma.02G274900 (Chr2:47624936), Glyma.03G219900 (Chr3:43490555), Glyma.07G151300 (Chr7:18219120), Glyma.14G119000 (Chr14:33480275) and Glyma.17G074400 (Chr17:5830450), as shown in Table 9. The cross Suinong 69 × Suinong 76 carries 3 mutation sites (SNP nos. 30, 63 and 51); Suinong 35 × Suinong 76 carries 3 (SNP nos. 63, 40 and 18); Dongsheng 1 × Suinong 76 carries 3 (SNP nos. 30, 40 and 18); Suiwuxing 3 × Suinong 42 carries 2 (SNP nos. 63 and 40); and Suinong 49 × Suinong 76 carries 5 (SNP nos. 30, 63, 40, 18 and 51). The Suinong 49 × Suinong 76 cross pyramids the most protein-related SNP loci, and phenotyping of this resource population shows that its overall protein content is the highest of the 5 populations, concentrated above 40% with a maximum of 46.63%, so phenotype and genotype agree.
TABLE 9 SNP genotypes associated with Soy seed protein content
5. Haplotype analysis
The mined candidate genes were verified in a resequencing population by haplotype analysis of the candidate genes in 643 resequenced materials, as follows:
(1) Using the SoyBase data platform (https://www.soybase.org/), extract the genomic sequences of the soybean protein/oil candidate genes; combine them with the genomic information of the resequencing population to search all candidate genes, and screen out the important candidate genes carrying the SNP loci.
(2) Group nearby SNP loci together for haplotype analysis, and analyze the relationship between haplotype and phenotype using the sequence information of the important candidate genes.
(3) Draw box plots with GraphPad Prism 8 software and analyze the significance of phenotypic differences between the haplotypes of each important candidate gene. For the significance analysis, homogeneity of variance was tested, and multiple comparisons were made with the least significant difference (LSD) method under a one-way ANOVA model.
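Steps (2) and (3) can be sketched in plain Python: join the alleles at grouped SNPs into a haplotype string, summarize each haplotype's frequency and mean phenotype, run a one-way ANOVA, and compare pairs with Fisher's LSD. This is an illustration under stated assumptions, not the software used in the study: the function names and toy data are hypothetical, the variance-homogeneity test is not included, and the critical t value must be looked up by the caller for the within-group degrees of freedom.

```python
from collections import defaultdict
from math import sqrt


def group_by_haplotype(genotypes, phenotypes):
    """genotypes: accession -> tuple of alleles at the grouped SNPs of one gene.
    phenotypes: accession -> protein content (%). Returns haplotype -> values."""
    groups = defaultdict(list)
    for acc, alleles in genotypes.items():
        if acc in phenotypes:                    # skip accessions without a phenotype
            groups["".join(alleles)].append(phenotypes[acc])
    return dict(groups)


def haplotype_summary(groups):
    """Frequency (share of accessions) and mean phenotype of each haplotype."""
    total = sum(len(g) for g in groups.values())
    return {h: (len(g) / total, sum(g) / len(g)) for h, g in groups.items()}


def one_way_anova(groups):
    """Return (F, MS_within, df_within) for a one-way ANOVA across haplotypes."""
    values = [v for g in groups.values() for v in g]
    grand = sum(values) / len(values)
    means = {h: sum(g) / len(g) for h, g in groups.items()}
    ss_between = sum(len(g) * (means[h] - grand) ** 2 for h, g in groups.items())
    ss_within = sum((v - means[h]) ** 2 for h, g in groups.items() for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, ms_within, df_within


def lsd_pairs(groups, ms_within, t_crit):
    """Fisher's LSD: a pair differs if |mean_i - mean_j| > t_crit * sqrt(MSE * (1/n_i + 1/n_j))."""
    names = sorted(groups)
    results = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ga, gb = groups[a], groups[b]
            diff = sum(ga) / len(ga) - sum(gb) / len(gb)
            lsd = t_crit * sqrt(ms_within * (1 / len(ga) + 1 / len(gb)))
            results.append((a, b, diff, lsd, abs(diff) > lsd))
    return results


if __name__ == "__main__":
    # Hypothetical toy data: two haplotypes at two grouped SNPs
    geno = {"a1": ("A", "C"), "a2": ("A", "C"), "a3": ("A", "C"),
            "b1": ("T", "C"), "b2": ("T", "C"), "b3": ("T", "C")}
    pheno = {"a1": 43.0, "a2": 44.0, "a3": 42.0, "b1": 41.0, "b2": 42.0, "b3": 40.0}
    groups = group_by_haplotype(geno, pheno)
    print(haplotype_summary(groups))
    f, mse, df = one_way_anova(groups)
    print(f, mse, df)
    print(lsd_pairs(groups, mse, t_crit=2.776))  # two-sided 5% t value for df = 4
```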
For the haplotype analysis, subsequent work used two years (2018 and 2019) of phenotype data and the population resequencing genotype data of the 643-accession resequencing resource population provided by our laboratory. The two-year BLUP values of protein and oil content are shown in FIG. 8; the coefficients of variation of the two quality traits are between 4.7% and 4.9%, so the BLUP values are stable without large fluctuations. Protein content shows a moderately skewed distribution and oil content a highly skewed distribution, and the data are suitable for subsequent experiments.
Results: To further determine the association between the important candidate genes and soybean protein content, haplotype analysis was performed on the SNP sites of the 5 important candidate genes, with nearby sites grouped together for joint analysis; each group yields several haplotypes. The frequency of each haplotype in the 643 sequenced materials was determined, and the mean phenotype of each haplotype was calculated for analysis of variance. The analysis identified superior high-protein haplotypes and low-protein haplotypes, and the mean protein phenotypes of these haplotypes differed significantly in 4 of the groups (see FIG. 6).
For Glyma.02G274900, the analysis identified the superior high-protein haplotype Hap_2 (ACTTT), which accounts for 24.8% of the materials with a mean protein content of 43.1%, and the low-protein haplotype Hap_3 (TCTTT), which accounts for 59.3% with a mean of 42.2%. For Glyma.07G151300, the superior high-protein haplotype is Hap_3 (GAA) (10.3%, mean 42.9%) and the low-protein haplotype is Hap_4 (AGT) (19.5%, mean 41.5%). For Glyma.14G119000, the superior high-protein haplotype is Hap_3 (TCTAC) (56.5%, mean 42.9%) and the low-protein haplotype is Hap_1 (TCCAC) (13.2%, mean 41.8%). For Glyma.17G074400, the superior high-protein haplotype is Hap_5 (TTAGTCCCG) (10.1%, mean 43.3%) and the low-protein haplotype is Hap_4 (TCAGTCCCG) (55.1%, mean 42.0%). In these four genes, protein content differs significantly between the haplotypes. For Glyma.03G219900, the superior high-protein haplotype is Hap_1 (CCGAGTAAGC) (84.3%, mean 42.5%) and the low-protein haplotype is Hap_2 (CCGAGTTAGC) (6.7%, mean 41.7%).
To further determine whether the superior haplotypes act synergistically in high-protein materials, a pyramiding analysis was performed in the 643-accession resequencing population (see FIG. 6). For the 156 high-protein materials with protein content above 44%, haplotype frequencies were counted and the pyramiding effect analyzed: at Glyma.02G274900, 50 materials carry the high-protein haplotype Hap_2 (ACTTT), 30.1% of the high-protein materials; at Glyma.07G151300, 19 carry the high-protein haplotype Hap_1 (TACCC), 12.2%; at Glyma.14G119000, 109 carry Hap_3 (TCTAC), 69.9%; at Glyma.17G074400, 25 carry Hap_5 (TTAGTCCCG), 16.0%; and at Glyma.03G219900, 141 carry Hap_1 (CCGAGTAAGC), 90.0%. Of the 156 materials, 1 pyramids all 5 superior haplotypes, 11 pyramid 4, 39 pyramid 3 and 71 pyramid 2. In 97 materials, 62.1% of the high-protein materials, the high-protein haplotypes Glyma.14G119000 Hap_3 (TCTAC) and Glyma.03G219900 Hap_1 (CCGAGTAAGC) occur together, indicating a stronger pyramiding effect for this combination.
Likewise, 154 low-protein materials with protein content below 41% were selected, haplotype frequencies counted and the pyramiding effect analyzed: at Glyma.02G274900, 99 materials carry the low-protein haplotype Hap_3 (TCTTT), 64.3% of the low-protein materials; at Glyma.07G151300, 38 carry Hap_4 (AGT), 24.7%; at Glyma.14G119000, 54 carry Hap_1 (TCCAC), 35.1%; at Glyma.17G074400, 101 carry Hap_4 (TCAGTCCCG), 65.6%; and at Glyma.03G219900, 15 carry Hap_2 (CCGAGTTAGC), 10.0%. Of the 154 materials, 9 pyramid 4 low-protein haplotypes, 46 pyramid 3 and 44 pyramid 2. In 82 materials, 53.3% of the low-protein materials, the low-protein haplotypes Glyma.02G274900 Hap_3 (TCTTT) and Glyma.17G074400 Hap_4 (TCAGTCCCG) occur together, indicating a stronger pyramiding effect for this combination.
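The pyramiding counts above amount to tallying, for each accession, how many genes carry the listed haplotype, and finding accessions that carry two given haplotypes together. The sketch below uses the high-protein haplotypes as listed in Table 10; the function names and the three-accession panel are hypothetical, and the low-protein analysis would pass the low-protein haplotypes as `favorable` instead.

```python
from collections import Counter

# High-protein haplotypes per gene, as listed in Table 10
HIGH_PROTEIN_HAPS = {
    "Glyma.02G274900": "ACTTT",
    "Glyma.07G151300": "GAA",
    "Glyma.14G119000": "TCTAC",
    "Glyma.17G074400": "TTAGTCCCG",
    "Glyma.03G219900": "CCGAGTAAGC",
}


def n_favorable(accession_haps, favorable=HIGH_PROTEIN_HAPS):
    """Number of genes at which one accession carries the listed haplotype."""
    return sum(accession_haps.get(gene) == hap for gene, hap in favorable.items())


def pyramiding_summary(panel, favorable=HIGH_PROTEIN_HAPS):
    """panel: accession -> {gene: haplotype}. Returns Counter(#haplotypes -> #accessions)."""
    return Counter(n_favorable(haps, favorable) for haps in panel.values())


def co_carriers(panel, gene_a, gene_b, favorable=HIGH_PROTEIN_HAPS):
    """Accessions carrying the listed haplotype at both genes."""
    return sorted(acc for acc, haps in panel.items()
                  if haps.get(gene_a) == favorable[gene_a]
                  and haps.get(gene_b) == favorable[gene_b])


if __name__ == "__main__":
    panel = {  # hypothetical accessions
        "acc1": {"Glyma.14G119000": "TCTAC", "Glyma.03G219900": "CCGAGTAAGC",
                 "Glyma.02G274900": "TCTTT"},
        "acc2": {"Glyma.14G119000": "TCCAC", "Glyma.03G219900": "CCGAGTAAGC"},
        "acc3": dict(HIGH_PROTEIN_HAPS),
    }
    print(pyramiding_summary(panel))
    print(co_carriers(panel, "Glyma.14G119000", "Glyma.03G219900"))
```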
Table 10 Protein-related haplotypes used in the pyramiding analysis
Gene | High-protein haplotype | No. of materials | Low-protein haplotype | No. of materials |
Glyma.02G274900 | Hap_2(ACTTT) | 50 | Hap_3(TCTTT) | 99 |
Glyma.07G151300 | Hap_3(GAA) | 19 | Hap_4(AGT) | 38 |
Glyma.14G119000 | Hap_3(TCTAC) | 109 | Hap_1(TCCAC) | 52 |
Glyma.17G074400 | Hap_5(TTAGTCCCG) | 25 | Hap_4(TCAGTCCCG) | 101 |
Glyma.03G219900 | Hap_1(CCGAGTAAGC) | 141 | Hap_2(CCGAGTTAGC) | 15 |
The protein content phenotypes of the pyramided materials are shown in FIG. 7: the 97 materials pyramiding the high-protein haplotypes have a maximum protein content of 48.06%, a minimum of 44.01% and a mean of 45.30%, while the 82 materials pyramiding the low-protein haplotypes have a maximum of 40.98%, a minimum of 37.95% and a mean of 40.10%.
Example 2.
1. A kit for screening high protein soybeans:
the forward primer sequence for amplifying the molecular marker is shown in SEQ ID NO. 2 or SEQ ID NO. 3, and the reverse primer sequence is shown in SEQ ID NO. 1; the nucleotide sequence of the downstream primer for amplifying SNP1 is shown in SEQ ID NO. 1;
the screening method is as follows: take a sample with unknown soybean protein content and run the following PCR program with the kit for screening high-protein soybeans from step 1: (1) Hot Start: 95 ℃ for 30 s, 1 cycle; (2) Touchdown: 95 ℃ for 60 s, then 63 ℃ for 20 s, with the annealing temperature lowered by 0.8 ℃ per cycle, for 10 cycles in total from 63 ℃ to 55 ℃; (3) PCR amplification: 95 ℃ for 60 s and 55 ℃ for 20 s, 30 cycles; (4) Plate Read: 37 ℃ for 60 s, 1 cycle. The steps after KASP analysis are as follows:
2. A method for identifying soybeans with high protein content, comprising the following steps:
(1) Extracting DNA of soybean to be detected;
(2) Carry out the PCR reaction with the primers of the molecular marker; if the soybean of the variety to be tested is detected as the CC genotype, it has low protein content, and if it is detected as the TT genotype, it has high protein content.
Results: samples with unknown soybean protein content were tested with the marker. Samples detected as the TT genotype had high protein content, consistent with the genotype detected by the marker, and the low protein content of soybeans was likewise consistent with the genotype detected by the marker.
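The touchdown program in the screening method above can be expanded cycle by cycle to make the temperature arithmetic explicit. The sketch assumes one reading of "cooled by 0.8 ℃ per cycle, 10 cycles from 63 ℃ to 55 ℃": the 10 touchdown cycles anneal at 63.0, 62.2, ..., 55.8 ℃, and the following 30 amplification cycles anneal at 55 ℃. The function names are hypothetical, and the hold-time total ignores instrument ramp times.

```python
def kasp_program(td_start=63.0, td_step=0.8, td_cycles=10, anneal=55.0, amp_cycles=30):
    """Expand the thermal program into one row per cycle.

    Each row is (stage, step1_temp_C, step1_s, step2_temp_C, step2_s);
    single-step stages use None for step 2.
    """
    rows = [("hot start", 95.0, 30, None, None)]
    for i in range(td_cycles):
        # annealing temperature drops by td_step each cycle: 63.0, 62.2, ..., 55.8
        rows.append(("touchdown", 95.0, 60, round(td_start - i * td_step, 1), 20))
    for _ in range(amp_cycles):
        rows.append(("amplification", 95.0, 60, anneal, 20))
    rows.append(("plate read", 37.0, 60, None, None))
    return rows


def total_hold_seconds(rows):
    """Sum of hold times only; ramp times between temperatures are ignored."""
    return sum(r[2] + (r[4] or 0) for r in rows)


if __name__ == "__main__":
    program = kasp_program()
    print([r[3] for r in program if r[0] == "touchdown"])
    print(total_hold_seconds(program), "s of hold time")
```

Under this reading the run has 42 programmed rows and 3290 s of hold time (30 + 10 × 80 + 30 × 80 + 60).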
Claims (10)
1. A soybean protein content-related molecular marker, characterized in that the gene of the molecular marker is Glyma.17G074400 and the nucleotide at position 2279 is C or T.
2. A primer sequence for amplifying the molecular marker of claim 1, wherein the forward primer sequence is shown in SEQ ID NO. 2 or SEQ ID NO. 3, and the reverse primer sequence is shown in SEQ ID NO. 1.
3. A soybean protein content-related SNP locus, wherein the SNP locus is located at position 5830450 on soybean chromosome 17 and the base at the locus is C or T.
4. A primer sequence for amplifying the SNP locus of claim 3, wherein the forward primer sequence is shown in SEQ ID NO. 2 or SEQ ID NO. 3, and the reverse primer sequence is shown in SEQ ID NO. 1.
5. Use of the molecular marker of claim 1, the primer sequence of claim 2, the SNP site of claim 3 or the primer sequence of claim 4 for the preparation of a kit for identifying high protein content soybeans or low protein content soybeans.
6. A kit for identifying high protein soybean or low protein soybean, comprising the primer sequence of claim 2 or claim 4.
7. The kit of claim 6, further comprising a Master Mix and water.
8. A method for identifying soybean protein content, characterized by comprising the following steps:
step 1: extracting DNA of soybean to be detected;
step 2: carrying out PCR reaction by using the primer sequence of the molecular marker of claim 2 or the primer sequence of the SNP locus of claim 4, detecting that the soybean of the to-be-detected variety is of CC genotype, and if the soybean of the to-be-detected variety is of TT genotype, the soybean of the to-be-detected variety is of high protein content.
9. The method according to claim 8, wherein the conditions of the PCR reaction in step 2 are: the conditions for the PCR reaction in step 2 are: (1) Hot Start (Hot Start): maintaining at 95deg.C for 30s for 1 cycle; (2) gradual cooling (Touch down): at 95℃for 60s and then at 63℃for 20s, each cycle was cooled by 0.8℃and a total of 10 cycles were performed from 63℃to 55 ℃. (3) PCR amplification (PCR): the reaction was carried out at 95℃for 60s and at 55℃for 20s for 30 cycles. (4) Plate Read: the reaction was maintained at 37℃for 60s and 1 cycle was performed.
10. The method according to claim 8, wherein the CC genotype in step 2 is C at the base of the SNP site, and the TT gene is T at the base of the SNP site.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311332463.3A CN117467793A (en) | 2023-10-16 | 2023-10-16 | Soybean protein content-related molecular marker located on soybean chromosome 17 and application thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311332463.3A CN117467793A (en) | 2023-10-16 | 2023-10-16 | Soybean protein content-related molecular marker located on soybean chromosome 17 and application thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117467793A true CN117467793A (en) | 2024-01-30 |
Family
ID=89632241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311332463.3A Pending CN117467793A (en) | 2023-10-16 | 2023-10-16 | Soybean protein content-related molecular marker located on soybean chromosome 17 and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117467793A (en) |
- 2023-10-16 CN CN202311332463.3A patent/CN117467793A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||