CN117467793A

CN117467793A - Soybean protein content-related molecular marker located on soybean chromosome 17 and application thereof

Info

Publication number: CN117467793A
Application number: CN202311332463.3A
Authority: CN
Inventors: 齐照明; 禹国龙; 冯学珍; 胡利民; 杨明亮; 刘春燕; 陈庆山; 武小霞; 辛大伟; 王锦辉
Original assignee: Northeast Agricultural University
Current assignee: Northeast Agricultural University
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2024-01-30

Abstract

The invention provides a soybean protein content-related molecular marker located on soybean chromosome 17 and an application thereof, and belongs to the field of plant identification. The aim is to screen high-protein, high-quality soybean varieties rapidly and accurately. The molecular marker lies in the gene Glyma.17G074400, in which the nucleotide at position 2279 is C or T. The invention also provides the use of the marker in preparing a kit for detecting high protein content in soybean, and a screening method. Selecting for the marker achieves selection for the trait, which greatly improves breeding efficiency and allows directed improvement of soybean varieties, so that high-protein soybean varieties can be selected.

Description

Soybean protein content-related molecular marker located on soybean chromosome 17 and application thereof

Technical Field

The invention belongs to the field of plant identification, and particularly relates to a soybean protein content-related molecular marker located on a soybean chromosome 17 and application thereof.

Background

Soybean is highly nutritious, with a protein content of about 40%. Soybean protein contains the 8 amino acids essential to the human body, so eating soybean supplies needed nutrients and can help prevent cardiovascular disease. Soybean is also an important oil crop that can be processed into edible oil to meet dietary needs; the oil consists mainly of five fatty acids, which can help prevent heart disease, cancer and other conditions. As living standards rise, more and more people pay attention to healthy eating and the nutritional value of food, so the demand for soybean is large. However, China depends heavily on imported soybean, so there is an urgent need to improve soybean protein and to breed high-protein, high-oil soybean varieties to meet people's daily needs.

Soybean seed protein content is a quality trait and a relatively complex quantitative trait controlled by multiple genes. Breeding for it has long been constrained by its genetic characteristics and by breeding methods, and traditional methods are too slow. As technology has advanced, molecular marker-assisted selection has been proposed on the basis of traditional cross breeding: because molecular markers are closely linked to the genes that determine the target trait, selecting for the marker achieves selection for the trait. This greatly improves breeding efficiency and allows directed improvement of soybean varieties, so that high-protein soybean varieties can be selected.

Disclosure of Invention

The invention aims to rapidly and accurately screen high-protein high-quality soybean varieties.

The invention provides a soybean protein content-related molecular marker, wherein the gene of the molecular marker is Glyma.17G074400, and the nucleotide site at 2279 is C or T.

The invention provides a primer sequence for amplifying the molecular marker, wherein the forward primer sequence is shown as SEQ ID NO.2 or SEQ ID NO. 3; the reverse primer sequence is shown in SEQ ID NO. 1.

The invention provides a SNP locus related to soybean protein content, wherein the SNP locus is positioned at 5830450 position on chromosome 17 of soybean, and the base of the locus is C or T.

The invention provides a primer sequence for amplifying the SNP locus, and the forward primer sequence is shown as SEQ ID NO.2 or SEQ ID NO. 3; the reverse primer sequence is shown in SEQ ID NO. 1.

The invention provides application of the molecular marker, the primer sequence, the SNP locus or the primer sequence in preparation of a kit for identifying high-protein soybean or low-protein soybean.

The invention provides a kit for identifying high-protein soybean or low-protein soybean, which comprises the primer sequence.

Further defined, the kit further comprises a Master Mix and water.

The invention provides a method for identifying soybean protein content, which comprises the following specific steps:

step 1: extracting DNA of soybean to be detected;

step 2: carrying out a PCR (polymerase chain reaction) using the primer sequence of the molecular marker or the primer sequence of the SNP locus; if the soybean to be detected has the CC genotype, it is identified as a soybean with low protein content, and if it has the TT genotype, it is identified as a soybean with high protein content.

Further defined, the conditions of the PCR reaction in step 2 are: (1) Hot start: 95 °C for 30 s, 1 cycle; (2) Touchdown: 95 °C for 60 s, then 63 °C for 20 s, with the temperature lowered by 0.8 °C per cycle, for 10 cycles in total from 63 °C to 55 °C; (3) PCR amplification: 95 °C for 60 s and 55 °C for 20 s, 30 cycles; (4) Plate read: 37 °C for 60 s, 1 cycle.
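The touchdown step can be made concrete with a short sketch. The Python below only illustrates the program as stated above; the function and variable names are hypothetical, and it assumes the 0.8 °C decrement is applied after each of the 10 touchdown cycles, so that the amplification stage continues at 55 °C.

```python
# Illustrative sketch only: names are hypothetical; temperatures, times and
# cycle counts are the ones stated in the reaction conditions.

def touchdown_annealing_temps(start=63.0, step=0.8, cycles=10):
    """Annealing temperature (deg C) used in each touchdown cycle.

    Assumption: the decrement is applied after every cycle, starting at 63.
    """
    return [round(start - step * i, 1) for i in range(cycles)]

# Each stage: list of (temperature in deg C, hold time in s) and a cycle count.
PROGRAM = [
    {"stage": "hot start", "steps": [(95.0, 30)], "cycles": 1},
    {"stage": "touchdown", "steps": [(95.0, 60), ("63.0 minus 0.8 per cycle", 20)], "cycles": 10},
    {"stage": "amplification", "steps": [(95.0, 60), (55.0, 20)], "cycles": 30},
    {"stage": "plate read", "steps": [(37.0, 60)], "cycles": 1},
]

temps = touchdown_annealing_temps()
total_cycles = sum(stage["cycles"] for stage in PROGRAM)
print(temps)
print(total_cycles)
```

Under this reading the tenth touchdown cycle anneals at 55.8 °C, one decrement above the 55 °C used in the amplification stage.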

Further defined, the CC genotype in step 2 means that the base at the SNP site is C, and the TT genotype means that the base at the SNP site is T.
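The decision rule of step 2 can be written as a small lookup. This is a minimal sketch, not part of the claimed method: the function name is hypothetical, and the handling of heterozygous or failed calls (returned as "undetermined") is an assumption, since the text only defines the CC and TT cases.

```python
# Minimal sketch of the step 2 decision rule at Chr17:5830450.
# Assumption: any call other than CC or TT is reported as "undetermined".

def classify_protein(genotype: str) -> str:
    """Map a KASP genotype call to the protein class stated in the method."""
    calls = {"TT": "high protein", "CC": "low protein"}
    return calls.get(genotype.strip().upper(), "undetermined")

for call in ["TT", "CC", "CT"]:
    print(call, "->", classify_protein(call))
```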

The beneficial effects are as follows: the invention uses 1029 materials from 5 hybrid populations as the experimental population. Mutated genes are first screened by sequence alignment of the corresponding parents; effective molecular markers are selected by designing primers for the mutation sites; the markers are verified in the resource materials using KASP, an SNP molecular marker technology; and finally the genes are re-verified and pyramided in a resequencing population by haplotype analysis. The aim is to identify effective functional markers and important candidate genes. The main results are as follows:

(1) 5 SNP markers related to protein content were obtained and developed: Chr2:47624936, Chr3:43490555, Chr7:18219120, Chr14:33480275, Chr17:5830450.

(2) The important candidate genes near the SNP markers related to soybean protein content were obtained: Glyma.02G274900, Glyma.03G219900, Glyma.07G151300, Glyma.14G119000, Glyma.17G074400.

Drawings

FIG. 1 is a diagram showing the sequence alignment of genes related to proteins;

FIG. 2 is a graph showing the expression results of candidate genes at SNP sites associated with proteins;

FIG. 3 is a graph showing the genotyping results of KASP in the parent material for protein-associated SNP markers;

FIG. 4 is a histogram of protein and oil content distribution for different populations of materials;

FIG. 5 is a graph showing the result of KASP genotyping of SNP markers related to protein content in resource materials;

FIG. 6 is a graph of the mean high protein haplotype and low protein haplotype phenotype results for protein-associated SNP sites;

FIG. 7 is a graph of the mean high-protein and low-protein phenotypes showing the pyramiding effect of the protein-associated loci;

FIG. 8 is a distribution histogram of the resequencing population proteins and oil content.

Detailed Description

Example 1.

1. A total of 1029 materials from 5 hybrid populations were used. First, seven varieties, Suinong 76, Suinong 69, Suinong 49, Suinong 35, Suinong 42, Dongsheng No.1 and Sui non-fishy bean No.3, were selected as parents and crossed; the characteristics of the varieties are shown in Table 1. The cross combinations are Suinong 69 × Suinong 76, Suinong 35 × Suinong 76, Dongsheng No.1 × Suinong 76, Sui non-fishy bean No.3 × Suinong 42 and Suinong 49 × Suinong 76. 1209 soybean germplasm resources from the F6 generation of these 5 cross combinations were selected as the experimental population and planted in 2022 in the Suihua branch experimental field of the academy of sciences of Heilongjiang Province, with field management following standard field practice. During vegetative growth, the three young leaves at the top of each plant were collected for DNA extraction and KASP typing; after maturity the seeds were threshed and used to measure protein and oil content.

Next, haplotype analysis and gene pyramiding were carried out using 643 materials with completed whole-genome resequencing from the soybean improvement genetics laboratory.

TABLE 1 Quality traits of the hybrid parents

Variety	Quality traits
Suinong 76	High-protein variety, protein content 46.78%, fat content 16.86%
Suinong 69	Disease-resistant variety, protein content 40.57%, fat content 19.46%
Suinong 49	Special-purpose variety (large-seeded), protein content 41.24%, fat content 21.57%
Suinong 35	High-oil soybean, protein content 42.17%, fat content 22.00%
Suinong 42	High fatty acid soybean variety, protein content 40.68%, fat content 20.00%
Dongsheng No.1	Protein content 41.30%, fat content 19.97%
Sui non-fishy bean No.3	Non-fishy bean variety, protein content 37.37%, fat content 21.81%

2. Important allele mining

Important allele mining is a method of screening for sites that are significantly associated with a trait of interest by analyzing the correlation between genomic data and the trait of interest, and then combining phenotypes to further determine the effect of the allele on the trait of interest. In this study, coincidence rate was used to represent the effect of a site.

To preliminarily screen candidate genes related to soybean oil and protein content, soybean gene sequences were first downloaded from the SoyBase data platform (https://www.soybase.org/) as templates, and the gene sequences of the hybrid parents were extracted from the sequencing resources of 634 soybean core germplasm materials from Northeast China. The sequences were aligned with DNAMAN, SNP loci located in the CDS region and causing an amino acid change were screened out, candidate genes were preliminarily identified, and their functions were preliminarily explored. Second, the SNP loci corresponding to each pair of parents were determined, specific primers were designed for these loci, and the primers were verified in the parent materials using KASP, an SNP molecular marker technology, to screen effective primers capable of distinguishing the alleles. Finally, the effective primers were prepared before the resource material samples were run. The KASP master mix is a kit from LGC (UK) that contains two universal quenched fluorescent probes, FAM and HEX, and core components such as Taq enzyme; the KASP assay mix contains two forward primers and one reverse primer, the two forward primers being designed specifically for the sequence around the SNP site. During the KASP procedure these primers bind to the different alleles and pair with the FAM and HEX fluorescent probes, producing different fluorescent results. If a given SNP genotype is homozygous, a green or blue fluorescent signal is generated; if the genotype is heterozygous, a red fluorescent signal is shown.

On this principle, the 50 bp of sequence upstream and downstream of each screened SNP locus was selected and KASP primers were designed with Primer Premier 5.0. Each primer set comprises two allele-specific forward primers (F1/F2) and one common reverse primer (R). Besides recognizing the different alleles, each forward primer carries at one end the sequence for a fluorescent label of a different color, FAM (GAAGGTGACCAAGTTCATGCT) or HEX (GAAGGTCGGAGTCAACGGATT), so that the PCR amplification products can be distinguished. The primer sequences are shown in Table 2. The PCR was run in 384-well plates, to which the required reagents were added, on a Roche LightCycler 480 II real-time fluorescence quantitative PCR instrument. The reaction program consisted of the following parts: (1) Hot start: 95 °C for 30 s, 1 cycle; (2) Touchdown: 95 °C for 60 s, then 63 °C for 20 s, with the temperature lowered by 0.8 °C per cycle, for 10 cycles in total from 63 °C to 55 °C; (3) PCR amplification: 95 °C for 60 s and 55 °C for 20 s, 30 cycles; (4) Plate read: 37 °C for 60 s, 1 cycle. After the PCR is completed, the end-point fluorescent signal is read.
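How the two labelled forward primers relate to each other can be sketched as follows. This is an illustration under stated assumptions, not the primers of Table 2: the flanking sequence and core length are hypothetical, the assignment of FAM to the first allele and HEX to the second is arbitrary, and the sketch assumes the usual KASP layout in which the allele-specific base sits at the 3' end of the forward primer.

```python
# Illustrative sketch only. The tail sequences are those given in the text;
# the flanking sequence, core length and tail-to-allele assignment are
# hypothetical assumptions, not the primers of Table 2.

FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"

def kasp_forward_primers(upstream, allele_x, allele_y, core_len=20):
    """Build two allele-specific forward primers (5'->3').

    upstream: sequence immediately 5' of the SNP on the primer strand.
    The last base of each primer is the allele it is meant to recognize.
    """
    core = upstream[-core_len:]
    f1 = FAM_TAIL + core + allele_x
    f2 = HEX_TAIL + core + allele_y
    return f1, f2

hypothetical_upstream = "ACGT" * 10  # placeholder, not a soybean sequence
f1, f2 = kasp_forward_primers(hypothetical_upstream, "C", "T")
print(f1)
print(f2)
```

The two primers share the same core and differ only in the tail and the final base, which is what allows the FAM and HEX signals to report the two alleles separately.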

Table KASP reaction System

TABLE 2 primer sequences

Results: To preliminarily determine candidate genes related to soybean protein content, the study screened important genes in soybean protein-related pathways that had been collected and curated by the research group, combined with Meta-QTL analysis and with articles published by the research group: "Meta-analysis and transcriptome profiling reveal hub genes for soybean seed storage composition during seed development" and "Soybean protein and oil-related excellent allele mining, breeding evaluation and screening". First, the sequences of the important genes related to soybean protein content were extracted from the SoyBase data platform (https://www.soybase.org/); the sequences of the hybrid parents Suinong 76, Suinong 69, Suinong 49, Suinong 35 and Suinong 42 were extracted from the sequencing resources of the 634 Northeast China soybean core germplasm materials; and the sequences of Dongsheng No.1 and Sui non-fishy bean No.3 were taken from the relevant gene sequences in published articles. The parental sequences were then aligned, SNP loci located in the CDS region and causing an amino acid change were screened, and candidate genes were preliminarily identified.

The gene annotations cover functions such as fatty acid synthesis, growth and development, and protein binding, and protein types such as 7S, 40S and 60S. For 103 important genes, the soybean reference genome of the American variety Williams 82, downloaded from SoyBase, was aligned in DNAMAN with the extracted parental sequences. The alignment result for one SNP locus is shown in FIG. 1: in the cross Suinong 69 × Suinong 76, the female parent Suinong 69 carries a C-to-G mutation that changes the amino acid. In total, 65 SNP loci were screened out, located in 27 protein-related candidate genes. Of these, 7 SNP loci lie in Glyma.02G090800 and 5 in Glyma.16G018400, which are relatively large numbers; the remaining loci are distributed more evenly across the genes. A preliminary study of the functional annotations of the 27 candidate genes showed that, for example, Glyma.02G090800 is involved in protein translation, including translation initiation, translation initiation factor activity and protein binding; Glyma.03G232000 and Glyma.07G151300 are involved in protein folding and cellular stress response; Glyma.08G316700 plays an important role at different stages of protein synthesis; and Glyma.07G151300 is a member of FAD8, can catalyze the production of palmitoleic acid and linolenic acid, and is closely related to the fatty acid content of soybean. The candidate genes also include Glyma.02G151500, Glyma.02G274900, Glyma.03G219900 and others, which are functionally related to the translation, transcription and binding of proteins (see Table 3).

TABLE 3 candidate genes near protein-associated SNP markers

Gene ID	Gene annotation
Glyma.02G090800	translation initiation factor IF2/IF5
Glyma.02G151500	protein SLOW WALKER 1
Glyma.02G274900	Chromosome and associated proteins
Glyma.03G219900	DELLA protein
Glyma.03G232000	probable protein disulfide-isomerase A6
Glyma.03G244800	OAS-TL1, cysteine synthase
Glyma.07G051500	transcription factor MYC2
Glyma.07G102800	vacuolar protein sorting-associated protein 26C
Glyma.07G151300	omega-3 fatty acid desaturase
Glyma.07G261900	U3 small nucleolar RNA-associated protein 6 homolog
Glyma.08G069000	HSP17.3-B, 17.3 kDa class I heat shock protein
Glyma.08G316700	protein translation factor SUI1 homolog 2
Glyma.09G018300	vacuolar protein sorting-associated protein 53A
Glyma.09G230700	DNA-directed RNA polymerase III subunit RPC4 isoform X1
Glyma.10G037100	glycinin G4, GY4
Glyma.12G018300	tRNA aminoacylation for protein translation
Glyma.12G230900	transport protein Sec61 subunit alpha
Glyma.13G171200	ribosomal RNA-processing protein 7 homolog A
Glyma.13G176000	HSP17.6-L, 17.6 kDa class I heat shock protein
Glyma.14G048800	vacuolar sorting protein
Glyma.14G119000	myb domain protein 56
Glyma.15G089800	eukaryotic translation initiation factor 4E-1
Glyma.16G018400	vacuolar protein sorting-associated protein 8 homolog
Glyma.16G178800	HSP90A2, heat shock protein 90-A2
Glyma.17G074400	delta-12 desaturase
Glyma.19G164800	glycinin subunit G7, GY7
Glyma.20G146200	beta-conglycinin beta-subunit, CG-BETA-1

Expression patterns were analyzed using the RNA-seq dataset in SoyBase (https://www.soybase.org/) and plotted with TBtools (FIG. 2), where colors from orange to green indicate expression levels from high to low. Of the 27 genes, 20 candidate genes are expressed at every stage of seed development. Among them, 10 genes, such as Glyma.03G219900, Glyma.03G232000 and Glyma.03G244800, have low expression levels; 6 genes, such as Glyma.03G232000, Glyma.03G244800 and Glyma.07G102800, have higher expression levels, with the highest expression in roots; and 4 genes, such as Glyma.02G151500, Glyma.07G051500, Glyma.15G089800 and Glyma.16G018400, show the most marked differences in expression. During seed development, Glyma.07G051500 has its lowest expression at Seed 42 DAF, and its peak at Seed 25 DAF is 103.6 times the Seed 42 DAF level; Glyma.10G037100 has its lowest expression at Seed 21 DAF, and its peak at Seed 25 DAF is 24 times the Seed 21 DAF level; Glyma.15G089800 has its lowest expression at Seed 21 DAF, and its peak at Seed 35 DAF is 2.4 times the Seed 21 DAF level; and Glyma.16G018400 has its lowest expression at Seed 21 DAF, and its peak at Seed 10 DAF is 3.7 times the Seed 21 DAF level.

To preliminarily determine the SNP sites related to soybean protein content, KASP primers were designed with Primer Premier 5.0 (http://www.premierbiosoft.com/index.html) for the 65 SNP sites in the 27 soybean protein-related candidate genes of Table 3, using the 50 bp of sequence extracted upstream and downstream of each site. The primers were verified in the parents of the corresponding cross, and each primer was tested on each pair of parents at least three times to improve the reliability of the results. FIG. 3 shows one of the many results, in which green and blue represent the two different homozygous genotypes and red represents the heterozygous genotype. A primer was judged to type well when it consistently showed different homozygous genotypes in the two parents. According to the KASP typing results, 19 excellent primers with good typing performance were finally selected, as shown in Table 4.

TABLE 4 SNP molecular marker loci associated with protein content

SNP number	Gene ID	Alleles	Chromosome	Position
56	Glyma.02G090800	C/T	Chr2	7982944
58	Glyma.02G090800	G/A	Chr2	7983144
30	Glyma.02G274900	T/A	Chr2	47624936
31	Glyma.02G274900	T/C	Chr2	47625857
63	Glyma.03G219900	T/C	Chr3	43490555
64	Glyma.03G232000	A/T	Chr3	44494958
71	Glyma.07G051500	C/T	Chr7	4424835
40	Glyma.07G151300	G/T	Chr7	18219120
73	Glyma.07G261900	G/A	Chr7	43998064
74	Glyma.07G261900	T/C	Chr7	43998083
79	Glyma.12G018300	C/A	Chr12	1280727
80	Glyma.12G230900	G/A	Chr12	40525342
18	Glyma.14G119000	A/G	Chr14	33480275
19	Glyma.14G119000	A/C	Chr14	33480631
67	Glyma.15G089800	A/G	Chr15	6903356
50	Glyma.17G074400	C/A	Chr17	5830396
51	Glyma.17G074400	T/C	Chr17	5830450
52	Glyma.17G074400	T/C	Chr17	5831106
68	Glyma.19G164800	A/C	Chr19	43002797

These 19 excellent sites are distributed over the 20 soybean chromosomes as follows: Chr02 (4), Chr03 (2), Chr07 (4), Chr12 (2), Chr14 (2), Chr15 (1), Chr17 (3) and Chr19 (1); Chr02 and Chr07 carry the most SNPs. The distribution across cross combinations is Suinong 69 × Suinong 76 (9), Dongsheng No.1 × Suinong 76 (7), Suinong 35 × Suinong 76 (6), Sui non-fishy bean No.3 × Suinong 42 (11) and Suinong 49 × Suinong 76 (11); the crosses Sui non-fishy bean No.3 × Suinong 42 and Suinong 49 × Suinong 76 pyramid the most SNPs.
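The per-chromosome counts can be checked directly against Table 4. The short Python tally below is only a cross-check; the variable names are hypothetical, and the (SNP number, chromosome) pairs are copied from Table 4.

```python
from collections import Counter

# (SNP number, chromosome) pairs copied from Table 4.
TABLE4 = [
    (56, "Chr2"), (58, "Chr2"), (30, "Chr2"), (31, "Chr2"),
    (63, "Chr3"), (64, "Chr3"),
    (71, "Chr7"), (40, "Chr7"), (73, "Chr7"), (74, "Chr7"),
    (79, "Chr12"), (80, "Chr12"),
    (18, "Chr14"), (19, "Chr14"),
    (67, "Chr15"),
    (50, "Chr17"), (51, "Chr17"), (52, "Chr17"),
    (68, "Chr19"),
]

# Count how many of the 19 selected SNPs fall on each chromosome.
counts = Counter(chrom for _, chrom in TABLE4)
for chrom, n in counts.items():
    print(chrom, n)
```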

3. KASP typing experiment

The prepared KASP reaction system was added to a 384-well plate, the experimental results were obtained on the Roche LightCycler 480 II instrument, and the results were then imported into an Excel table and processed and analyzed together with the phenotype data. The basic method is as follows:

(1) Materials are classified by soybean seed protein or oil phenotype: the mean and standard deviation of each data set are calculated, and a critical value is determined from the mean plus or minus the standard deviation.

(2) Materials above the critical value are called high-protein or high-oil materials, and materials below it are called low-protein or low-oil materials. This criterion is used to calculate the coincidence rate of a gene locus, i.e., the proportion of materials in the population that match a phenotypic characteristic.

(3) The sample data from the KASP results are counted separately for the high protein/oil materials and the low protein/oil materials, and the two counts are added to give the total; the coincidence rate is the number of coinciding materials divided by the total. Then, with the high protein/oil materials and the low protein/oil materials as rows and the x-containing and y-containing alleles as columns, a four-grid table of coincidence rates is constructed (see Table 5). From the data in the four-grid table, the accuracy and reliability of the detection method and possible misjudgments are analyzed, and necessary improvements are made.

(4) A hypothesis test is used to judge whether the SNP locus is related to soybean protein or oil content: the null hypothesis H0 is that the content is independent of the x/y allele, and the alternative hypothesis HA is that the two variables are correlated. The coincidence rates P1 and P2 are calculated, and whether to reject H0 and accept HA is decided from P1, P2 and the set significance level α (60%).
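Steps (1) and (2) can be sketched in a few lines of Python. This is one possible reading rather than the procedure actually used: it assumes materials above mean + SD are called high and materials below mean − SD are called low, it uses the sample standard deviation, and the phenotype values are hypothetical.

```python
from statistics import mean, stdev

def split_by_phenotype(values):
    """Split phenotype values into high and low groups.

    Assumption: high means above mean + SD, low means below mean - SD,
    with SD taken as the sample standard deviation.
    """
    m, s = mean(values), stdev(values)
    high = [v for v in values if v > m + s]
    low = [v for v in values if v < m - s]
    return high, low, (m - s, m + s)

# Hypothetical protein contents (%), for illustration only.
protein = [36.0, 38.0, 40.0, 42.0, 44.0]
high, low, cutoffs = split_by_phenotype(protein)
print(high, low, cutoffs)
```

Materials between the two cutoffs are left unclassified in this sketch.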

Table 5 Four-grid table of coincidence rates

	High protein/oil material	Low protein/oil material
Containing x allele	a	c
Containing y allele	b	d
Total	M	N

Note: x and y are the genotypes obtained by KASP typing with the primers designed for the SNP locus; a is the number of x alleles in the typing results of the high-protein or high-oil materials, b is the number of x alleles in the typing results of the low-protein or low-oil materials, c is the number of y alleles in the typing results of the high-protein or high-oil materials, d is the number of y alleles in the typing results of the low-protein or low-oil materials, M is the total number of high-protein or high-oil materials, and N is the total number of low-protein or low-oil materials.
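The coincidence rate itself is a simple proportion, which the sketch below makes explicit. The function and argument names are hypothetical. The counts of 84 and 34 are the TT and CC counts reported for Chr17:5830450, while the remaining two counts (15 and 25) are not given in the text and are assumed here only so that the totals reproduce the reported rates. How P1 and P2 are combined with the 60% level into a final decision is not specified in the text, so the sketch only reports which rates reach that level.

```python
def coincidence_rates(x_high, y_high, x_low, y_low):
    """Return (P1, P2) in percent.

    P1: share of high-protein materials carrying the x genotype.
    P2: share of low-protein materials carrying the y genotype.
    """
    total_high = x_high + y_high
    total_low = x_low + y_low
    p1 = 100.0 * x_high / total_high
    p2 = 100.0 * y_low / total_low
    return p1, p2

# x = TT, y = CC at Chr17:5830450. 84 and 34 are reported in the text;
# 15 and 25 are ASSUMED values chosen to match the reported rates.
p1, p2 = coincidence_rates(x_high=84, y_high=15, x_low=25, y_low=34)

ALPHA = 60.0  # level stated in the text, in percent
print(round(p1, 2), p1 >= ALPHA)
print(round(p2, 2), p2 >= ALPHA)
```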

Results:

The study used F5 materials from the 5 hybrid populations, planted in the Suihua area in 2022. Population 28 is the cross Suinong 69 × Suinong 76; the male parent is a high-protein variety with a protein content of 46.78% and the female parent is a disease-resistant variety; 390 plants were planted, and phenotype data were harvested and measured for 245 plants. Population 122 is the cross Suinong 35 × Suinong 76; the male parent is a high-protein variety with a protein content of 46.78% and the female parent is a high-oil variety with an oil content of 22.00%; 390 plants were planted, and phenotype data were harvested and measured for 274 plants. Population 163 is the cross Sui non-fishy bean No.3 × Suinong 42; the male parent is a non-fishy bean variety and the female parent is a high oleic acid variety; 390 plants were planted, and 205 plants were harvested and measured. Information on the other crosses is shown in Table 6; in total, 1029 progeny lines were harvested and measured.

TABLE 6 Soybean resource Material information

Population number	Female parent	Protein content	Oil content	Male parent	Protein content	Oil content	Number of materials
28	Suinong 69	40.57%	19.64%	Suinong 76	46.78%	16.86%	245
119	Dongsheng No.1	41.30%	19.97%	Suinong 76	46.78%	16.86%	212
122	Suinong 35	42.17%	22.00%	Suinong 76	46.78%	16.86%	274
163	Sui non-fishy bean No.3	37.37%	21.81%	Suinong 42	40.68%	20.00%	205
167	Suinong 49	42.17%	19.97%	Suinong 76	46.78%	16.86%	272

Soybean protein and oil phenotype data for 2022 were measured with a Foss grain analyzer and subjected to descriptive statistical analysis in SPSS. For the 2022 materials, protein content had a maximum of 46.6% and a minimum of 31.53%, with population means between 36.5% and 40.5%; oil content had a maximum of 25.64% and a minimum of 16.16%, with population means between 18.46% and 21.49%. The phenotype data are widely distributed and differ clearly, consistent with the genetic characteristics of quantitative traits. Analysis of kurtosis and skewness shows that the protein and oil contents of the materials are moderately skewed, so the materials are suitable for subsequent research.

Analysis of the different hybrid populations (Table 7) shows that, among the five populations, the cross Suinong 49 × Suinong 76 has the highest protein content, with a maximum of 46.63% and a mean of 40.49%; the cross Sui non-fishy bean No.3 × Suinong 42 has the highest oil content, with a maximum of 25.64% and a mean of 21.48%. The standard deviation of protein content is between 1.24 and 2.40 and the coefficient of variation between 3.39% and 5.92%; the standard deviation of oil content is between 0.72 and 1.37 and the coefficient of variation between 3.27% and 6.84%. Overall the standard deviations are small, with no large fluctuations.

TABLE 7 descriptive analysis of different populations of protein, oil quality traits

Population and trait	Maximum	Minimum	Mean	Median	Standard deviation	Skewness	Kurtosis	Coefficient of variation
Suinong 69 × Suinong 76, protein	44.15	33.52	40.17	40.54	2.16	-0.74	0.35	5.37%
Suinong 69 × Suinong 76, oil	21.70	16.16	18.46	18.25	1.23	0.64	-0.14	6.64%
Suinong 35 × Suinong 76, protein	43.88	27.69	39.84	39.99	1.89	-1.30	6.13	4.73%
Suinong 35 × Suinong 76, oil	24.00	17.04	19.89	19.93	1.07	0.23	1.27	5.37%
Dongsheng No.1 × Suinong 76, protein	43.47	32.36	39.49	39.86	1.95	-0.93	1.15	4.94%
Dongsheng No.1 × Suinong 76, oil	22.57	18.42	20.28	20.22	0.72	0.38	0.24	3.57%
Sui non-fishy bean No.3 × Suinong 42, protein	39.59	32.65	36.50	36.49	1.24	-0.12	-0.19	3.39%
Sui non-fishy bean No.3 × Suinong 42, oil	25.64	19.25	21.49	21.56	0.78	0.31	3.49	3.65%
Suinong 49 × Suinong 76, protein	46.63	31.53	40.49	40.65	2.40	-0.50	1.01	5.92%
Suinong 49 × Suinong 76, oil	24.51	16.95	20.04	19.90	1.37	0.71	0.69	6.84%
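The coefficient of variation column follows from the mean and standard deviation columns as CV = SD / mean × 100%. The sketch below recomputes it from the rounded Table 7 values; the names are hypothetical, and because the inputs are already rounded to two decimals, small differences from the reported percentages are to be expected.

```python
def cv_percent(sd, mean):
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean

# (label, mean, standard deviation, reported CV %) copied from Table 7.
ROWS = [
    ("Suinong 69 x Suinong 76, protein", 40.17, 2.16, 5.37),
    ("Suinong 69 x Suinong 76, oil", 18.46, 1.23, 6.64),
    ("Suinong 49 x Suinong 76, protein", 40.49, 2.40, 5.92),
    ("Suinong 49 x Suinong 76, oil", 20.04, 1.37, 6.84),
]

# Difference between the recomputed and the reported CV for each row.
diffs = []
for label, m, sd, reported in ROWS:
    cv = cv_percent(sd, m)
    diffs.append(abs(cv - reported))
    print(f"{label}: recomputed {cv:.2f}%, reported {reported:.2f}%")
```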

Frequency distribution histograms of the protein and oil phenotype data of the five populations were drawn with GraphPad Prism 8, with protein content ranging from 30% to 48% at a bin width of 1 and oil content ranging from 16% to 25% at a bin width of 0.5. As FIG. 4 shows, the measured seed protein and oil contents of each population are continuously distributed with a clear normal-distribution trend. The figure also shows that the Suinong 49 × Suinong 76 and Suinong 69 × Suinong 76 populations have higher overall protein content, concentrated above 40%, and relatively low oil content; the Sui non-fishy bean No.3 × Suinong 42 population has higher oil content, concentrated above 21%, and lower overall protein content; and in the Suinong 35 × Suinong 76 population protein content is concentrated at 38%-41% and oil content at 19%-20.5%.

4. Verification of SNP locus related to protein content

To verify the excellent protein-associated alleles, the 19 screened primers were typed on the KASP platform. The fluorescent signal is green or blue if the genotype at a given SNP is homozygous and red if it is heterozygous. The KASP results for the protein-associated SNP sites are shown in FIG. 5.

The results show that the 19 primers type well. Combining the KASP results with the protein phenotype: at Chr2:47624936 (FIG. 5 (3)), 24 high-protein materials have the AA genotype, a coincidence rate of 58.54%, and 20 low-protein materials have the TT genotype, a coincidence rate of 83.33%; at Chr3:43490555 (FIG. 5 (5)), 24 high-protein materials have the AA genotype, a coincidence rate of 60.98%, and 15 low-protein materials have the TT genotype, a coincidence rate of 53.57%; at Chr7:18219120 (FIG. 5 (8)), 53 high-protein materials have the AA genotype, a coincidence rate of 81.54%, and 27 low-protein materials have the GG genotype, a coincidence rate of 54.00%; at Chr14:33480275 (FIG. 5 (13)), 24 high-protein materials have the GG genotype, a coincidence rate of 58.54%, and 34 low-protein materials have the AA genotype, a coincidence rate of 70.53%; at Chr17:5830450 (FIG. 5 (17)), 84 high-protein materials have the TT genotype, a coincidence rate of 84.85%, and 34 low-protein materials have the CC genotype, a coincidence rate of 57.63%. These five SNP markers type successfully, show different genotypes in high-protein and low-protein materials, and distinguish high-protein from low-protein materials well (see Table 8).

TABLE 8 screening results of SNP markers related to the soybean protein content

Note that: "shows excellent SNP locus with better screening effect

Finally, the 5 SNP loci related to soybean protein content that were confirmed in the resource materials lie in 5 important candidate genes: Glyma.02G274900 (Chr2:47624936), Glyma.03G219900 (Chr3:43490555), Glyma.07G151300 (Chr7:18219120), Glyma.14G119000 (Chr14:33480275) and Glyma.17G074400 (Chr17:5830450), as shown in Table 9. The cross Suinong 69 × Suinong 76 contains 3 mutation sites, at SNP numbers 30, 63 and 51; the cross Suinong 35 × Suinong 76 contains 3 mutation sites, at SNP numbers 63, 40 and 18; the cross Dongsheng No.1 × Suinong 76 contains 3 mutation sites, at SNP numbers 30, 40 and 18; the cross Sui non-fishy bean No.3 × Suinong 42 contains 2 mutation sites, at SNP numbers 63 and 40; and the cross Suinong 49 × Suinong 76 contains 5 mutation sites, at SNP numbers 30, 63, 40, 18 and 51. The cross Suinong 49 × Suinong 76 pyramids the largest number of SNP loci related to protein content, and phenotyping of the resource populations shows that its overall protein content is the highest of the 5 populations, concentrated above 40% with a maximum of 46.63%, so the phenotype is consistent with the genotype.

TABLE 9 SNP genotypes associated with Soy seed protein content

5. Haplotype analysis

The mined candidate genes were verified in a resequencing population, and haplotype analysis of the candidate genes was carried out in 643 resequenced materials using software. The specific method is as follows:

(1) The SoyBase (https://www.soybase.org/) data platform was used to extract the genome sequences of the candidate genes for soybean protein and oil content. Combined with the genome information of the resequencing population, all candidate genes were searched and the important candidate genes containing SNP loci were screened out.

(2) Neighboring SNP loci were then grouped into one set for haplotype analysis, and the relationship between haplotype and phenotype was analyzed using the sequence information of the important candidate genes.

(3) Boxplots were drawn with GraphPad Prism 8 software, and the significance of the differences between haplotypes and their phenotypes was analyzed for each important candidate gene. For the significance analysis, homogeneity of variance was tested and multiple comparisons were made using the Least Significant Difference (LSD) method in a one-way ANOVA model.
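The source performed the significance analysis in GraphPad Prism 8. Purely as an illustration of what step (3) computes, the sketch below implements a one-way ANOVA F statistic and Fisher's LSD comparison in plain Python. The haplotype values are hypothetical, the critical t value is supplied by the caller rather than computed, and the homogeneity-of-variance test is not included.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, mse) for a dict of name -> values."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between, df_within = k - 1, n - k
    mse = ss_within / df_within
    return (ss_between / df_between) / mse, df_between, df_within, mse

def lsd_pairs(groups, mse, t_crit):
    """Fisher's LSD: for each pair return (mean difference, LSD, significant)."""
    out = {}
    for a, b in combinations(groups, 2):
        lsd = t_crit * sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        diff = abs(mean(groups[a]) - mean(groups[b]))
        out[(a, b)] = (diff, lsd, diff > lsd)
    return out

# Hypothetical protein contents (%) per haplotype, for illustration only.
groups = {
    "Hap_1": [41.0, 42.0, 43.0],
    "Hap_2": [43.0, 44.0, 45.0],
    "Hap_3": [41.5, 42.5, 42.0],
}
f_stat, df_b, df_w, mse = one_way_anova(groups)
# 2.447 is the two-sided 5% critical t value for 6 degrees of freedom.
results = lsd_pairs(groups, mse, t_crit=2.447)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
for pair, (diff, lsd, significant) in results.items():
    print(pair, f"diff={diff:.2f}", f"LSD={lsd:.2f}", significant)
```

The LSD threshold uses the pooled within-group mean square from the ANOVA, which is what distinguishes it from running independent pairwise t tests.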

For haplotype analysis, subsequent studies used the two-year (2018 and 2019) phenotype data of the 643-accession resequenced resource population provided by this laboratory, together with the resequencing genotype data of the population. The BLUP values of protein and oil content over the two years are shown in FIG. 8. The coefficients of variation of the 2 quality traits were between 4.7% and 4.9%, indicating that the BLUP values are stable without large fluctuations; the protein trait showed a moderately skewed distribution and the oil trait a highly skewed distribution, so the data are suitable for subsequent experiments.
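The two descriptive statistics mentioned above, the coefficient of variation and the skewness of the trait distribution, can be computed as in the following minimal sketch. The input values are hypothetical, and the population (moment-based) skewness formula is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

def skewness(values):
    """Moment-based (population) skewness; 0 for a symmetric sample."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical protein contents (%), for illustration only.
protein = [40.0, 41.0, 42.0, 43.0, 44.0]
cv = coefficient_of_variation(protein)
skew = skewness(protein)
print(f"CV = {cv:.2f}%, skewness = {skew:.3f}")
```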

Results: To further determine the correlation of the important candidate genes with soybean protein content, haplotype analysis was performed on the SNP sites of the 5 important candidate genes, with neighboring sites grouped into one set for joint analysis; each set yielded different haplotypes. The proportion of each haplotype among the 643 resequenced materials was determined, and the phenotypic means of the different haplotypes were calculated for analysis of variance. The analysis finally identified high-protein excellent haplotypes and low-protein haplotypes, with significant differences in the mean protein phenotype in 4 groups (see FIG. 6).

Analysis of the gene Glyma.02G274900 gave the high-protein excellent haplotype Hap_2 (ACTTT) and the low-protein haplotype Hap_3 (TCTTT); the high-protein excellent haplotype accounted for 24.8% of the materials, with a mean protein content of 43.1%, and the low-protein haplotype accounted for 59.3%, with a mean protein content of 42.2%. Analysis of the gene Glyma.07G151300 gave the high-protein excellent haplotype Hap_3 (GAA) and the low-protein haplotype Hap_4 (AGT); the high-protein excellent haplotype accounted for 10.3%, with a mean protein content of 42.9%, and the low-protein haplotype accounted for 19.5%, with a mean protein content of 41.5%. Analysis of the gene Glyma.14G119000 gave the high-protein excellent haplotype Hap_3 (TCTAC) and the low-protein haplotype Hap_1 (TCCAC); the high-protein excellent haplotype accounted for 56.5%, with a mean protein content of 42.9%, and the low-protein haplotype accounted for 13.2%, with a mean protein content of 41.8%. Analysis of the gene Glyma.17G074400 gave the high-protein excellent haplotype Hap_5 (TTAGTCCCG) and the low-protein haplotype Hap_4 (TCAGTCCCG); the high-protein excellent haplotype accounted for 10.1%, with a mean protein content of 43.3%, and the low-protein haplotype accounted for 55.1%, with a mean protein content of 42.0%. The haplotypes of these four genes differed significantly in protein content. Analysis of the gene Glyma.03G219900 gave the high-protein excellent haplotype Hap_1 (CCGAGTAAGC) and the low-protein haplotype Hap_2 (CCGAGTTAGC); the high-protein excellent haplotype accounted for 84.3%, with a mean protein content of 42.5%, and the low-protein haplotype accounted for 6.7%, with a mean protein content of 41.7%.

To further determine whether the excellent haplotypes act synergistically in high-protein materials, a pyramiding analysis was carried out in the 643-accession resequencing population (see FIG. 6). The 156 high-protein materials with protein content above 44% were selected, the haplotype proportions were counted, and the pyramiding effect was analyzed: at the gene Glyma.02G274900, 50 materials carried the high-protein genotype Hap_2 (ACTTT), accounting for 30.1% of the high-protein materials; at the gene Glyma.07G151300, 19 materials carried the high-protein genotype Hap_3 (GAA), accounting for 12.2% of the high-protein materials; at the gene Glyma.14G119000, 109 materials carried the high-protein genotype Hap_3 (TCTAC), accounting for 69.9% of the high-protein materials; at the gene Glyma.17G074400, 25 materials carried the high-protein genotype Hap_5 (TTAGTCCCG), accounting for 16.0% of the high-protein materials; at the gene Glyma.03G219900, 141 materials carried the high-protein genotype Hap_1 (CCGAGTAAGC), accounting for 90.0% of the high-protein materials. Of the 156 materials, 1 pyramided all 5 excellent genotypes, 11 pyramided 4 excellent genotypes, 39 pyramided 3 excellent genotypes, and 71 pyramided 2 excellent genotypes. In 97 materials, the high-protein genotypes Glyma.14G119000 Hap_3 (TCTAC) and Glyma.03G219900 Hap_1 (CCGAGTAAGC) were pyramided together, accounting for 62.1% of the high-protein materials, which was judged to be a relatively strong pyramiding effect.

The 154 low-protein materials with protein content below 41% were selected, the haplotype proportions were counted, and the pyramiding effect was analyzed: at the gene Glyma.02G274900, 99 materials carried the low-protein genotype Hap_3 (TCTTT), accounting for 64.3% of the low-protein materials; at the gene Glyma.07G151300, 38 materials carried the low-protein genotype Hap_4 (AGT), accounting for 24.7% of the low-protein materials; at the gene Glyma.14G119000, 54 materials carried the low-protein genotype Hap_1 (TCCAC), accounting for 35.1% of the low-protein materials; at the gene Glyma.17G074400, 101 materials carried the low-protein genotype Hap_4 (TCAGTCCCG), accounting for 65.6% of the low-protein materials; at the gene Glyma.03G219900, 15 materials carried the low-protein genotype Hap_2 (CCGAGTTAGC), accounting for 10.0% of the low-protein materials. Of the 154 materials, 9 pyramided 4 low-protein genotypes, 46 pyramided 3 low-protein genotypes, and 44 pyramided 2 low-protein genotypes. In 82 materials, the low-protein genotypes Glyma.02G274900 Hap_3 (TCTTT) and Glyma.17G074400 Hap_4 (TCAGTCCCG) were pyramided together, accounting for 53.3% of the low-protein materials, which was judged to be a relatively strong pyramiding effect.

Table 10 Superior haplotypes involved in pyramiding for protein content

Gene	High protein genotype	Number of materials	Low protein genotype	Number of materials
Glyma.02G274900	Hap_2(ACTTT)	50	Hap_3(TCTTT)	99
Glyma.07G151300	Hap_3(GAA)	19	Hap_4(AGT)	38
Glyma.14G119000	Hap_3(TCTAC)	109	Hap_1(TCCAC)	54
Glyma.17G074400	Hap_5(TTAGTCCCG)	25	Hap_4(TCAGTCCCG)	101
Glyma.03G219900	Hap_1(CCGAGTAAGC)	141	Hap_2(CCGAGTTAGC)	15

The protein content phenotypes after pyramiding are shown in FIG. 7. Among the 97 materials pyramiding the high-protein genotypes, the highest protein content was 48.06%, the lowest was 44.01% and the mean was 45.30%; among the 82 materials pyramiding the low-protein genotypes, the highest protein content was 40.98%, the lowest was 37.95% and the mean was 40.10%.
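The pyramiding (polymerization) analysis above amounts to counting, for each material, how many of the five genes carry the high-protein haplotype. A minimal sketch of that count follows; the favorable haplotypes are those listed in Table 10, while the accession names and their haplotype assignments are hypothetical.

```python
from collections import Counter

# High-protein haplotype per gene, as listed in Table 10.
FAVORABLE = {
    "Glyma.02G274900": "Hap_2",
    "Glyma.07G151300": "Hap_3",
    "Glyma.14G119000": "Hap_3",
    "Glyma.17G074400": "Hap_5",
    "Glyma.03G219900": "Hap_1",
}

def favorable_count(haplotypes):
    """Number of genes at which a material carries the high-protein haplotype."""
    return sum(haplotypes.get(gene) == hap for gene, hap in FAVORABLE.items())

def pyramiding_summary(accessions):
    """Map 'number of favorable haplotypes' -> 'number of materials'."""
    return Counter(favorable_count(h) for h in accessions.values())

# Hypothetical accessions, for illustration only.
accessions = {
    "acc_A": dict(FAVORABLE),  # carries all five
    "acc_B": {"Glyma.14G119000": "Hap_3", "Glyma.03G219900": "Hap_1",
              "Glyma.02G274900": "Hap_3", "Glyma.17G074400": "Hap_4"},
    "acc_C": {"Glyma.02G274900": "Hap_3", "Glyma.07G151300": "Hap_4"},
}
summary = pyramiding_summary(accessions)
for n in sorted(summary, reverse=True):
    print(f"{summary[n]} material(s) pyramid {n} favorable haplotype(s)")
```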

Example 2.

1. A kit for screening high protein soybeans:

The forward primer sequence for amplifying the molecular marker is shown as SEQ ID NO.2 or SEQ ID NO.3, and the reverse (downstream) primer sequence for amplifying the SNP is shown as SEQ ID NO.1.

The screening method is as follows: a sample of unknown soybean protein content is selected, and PCR amplification is performed with the above kit for screening high-protein soybeans using the following program:

(1) Hot start: 95 °C for 30 s, 1 cycle;

(2) Touchdown: 95 °C for 60 s, then 63 °C for 20 s, with the temperature lowered by 0.8 °C per cycle, for a total of 10 cycles from 63 °C to 55 °C;

(3) PCR amplification: 95 °C for 60 s and 55 °C for 20 s, 30 cycles;

(4) Plate read: 37 °C for 60 s, 1 cycle.

The steps after KASP analysis are as follows:

2. A method for identifying soybeans with high protein content, comprising the following specific steps:

(1) Extracting DNA of the soybean to be tested;

(2) Carrying out a PCR reaction using the primers of the molecular marker: if the soybean variety to be tested is detected as the CC genotype, it has low protein content; if it is detected as the TT genotype, it has high protein content.
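The decision rule in step (2) can be written as a small lookup. This is only a sketch of the rule as stated: the function name is hypothetical, and returning "undetermined" for heterozygous or failed calls is an assumption, since the text describes only the CC and TT cases.

```python
def classify_protein(genotype):
    """Map a genotype call at Chr17:5830450 to a predicted protein class.

    Only CC and TT are described in the text; any other call (for example
    a heterozygous CT or a failed call) is reported as undetermined, which
    is an assumption of this sketch.
    """
    calls = {"TT": "high protein", "CC": "low protein"}
    if genotype is None:
        return "undetermined"
    return calls.get(genotype.strip().upper(), "undetermined")

# Hypothetical sample names and calls, for illustration only.
samples = {"sample_1": "TT", "sample_2": "cc", "sample_3": "CT", "sample_4": None}
predictions = {name: classify_protein(call) for name, call in samples.items()}
for name, label in predictions.items():
    print(name, label)
```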

Results: The samples of unknown soybean protein content were tested with the genotype marker. For samples detected as the TT genotype, the measured high protein content was consistent with the genotype detected by the marker; for samples detected as the CC genotype, the measured low protein content was likewise consistent with the genotype detected by the marker.
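The thermal cycling program given for the kit in this example can be expanded into an explicit step list, which makes the touchdown phase easier to check. The sketch below assumes that the first touchdown cycle anneals at 63 °C and each following cycle is 0.8 °C lower, so the tenth cycle anneals at 55.8 °C before the 55 °C amplification phase; that reading, and all function and step names, are assumptions rather than details given in the text.

```python
def touchdown_annealing(start=63.0, step=0.8, cycles=10):
    """Annealing temperature (deg C) for each touchdown cycle."""
    return [round(start - step * i, 1) for i in range(cycles)]

def build_program():
    """Return the program as a list of (step name, temperature in C, seconds)."""
    program = [("hot start", 95.0, 30)]
    for temp in touchdown_annealing():
        program.append(("touchdown denature", 95.0, 60))
        program.append(("touchdown anneal", temp, 20))
    for _ in range(30):
        program.append(("amplification denature", 95.0, 60))
        program.append(("amplification anneal", 55.0, 20))
    program.append(("plate read", 37.0, 60))
    return program

temps = touchdown_annealing()
program = build_program()
hold_seconds = sum(seconds for _, _, seconds in program)
print("touchdown annealing temperatures:", temps)
print(f"{len(program)} steps, {hold_seconds} s of programmed hold time")
```

The hold time excludes ramping between temperatures, so the real run time on an instrument would be longer.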

Claims

1. A soybean protein content-related molecular marker, characterized in that the gene of the molecular marker is Glyma.17G074400, and the nucleotide at position 2279 is C or T.

2. A primer sequence for amplifying the molecular marker of claim 1, wherein the forward primer sequence is shown in SEQ ID NO.2 or SEQ ID NO.3, and the reverse primer sequence is shown in SEQ ID NO.1.

3. A soybean protein content-related SNP locus, wherein the SNP locus is positioned at 5830450 on chromosome 17 of soybean, and the base of the locus is C or T.

4. A primer sequence for amplifying the SNP locus of claim 3, wherein the forward primer sequence is shown in SEQ ID NO.2 or SEQ ID NO.3, and the reverse primer sequence is shown in SEQ ID NO.1.

5. Use of the molecular marker of claim 1, the primer sequence of claim 2, the SNP site of claim 3 or the primer sequence of claim 4 for the preparation of a kit for identifying high protein content soybeans or low protein content soybeans.

6. A kit for identifying high protein soybean or low protein soybean, comprising the primer sequence of claim 2 or claim 4.

7. The kit of claim 6, further comprising a Master Mix and water.

8. A method for identifying the content of soy protein, which is characterized by comprising the following specific steps:

Step 1: extracting DNA of the soybean to be tested;

Step 2: carrying out a PCR reaction using the primer sequence of the molecular marker of claim 2 or the primer sequence of the SNP locus of claim 4; if the soybean variety to be tested is detected as the CC genotype, it has low protein content, and if it is detected as the TT genotype, it has high protein content.

9. The method according to claim 8, wherein the conditions of the PCR reaction in step 2 are: (1) Hot start: 95 °C for 30 s, 1 cycle; (2) Touchdown: 95 °C for 60 s, then 63 °C for 20 s, with the temperature lowered by 0.8 °C per cycle, for a total of 10 cycles from 63 °C to 55 °C; (3) PCR amplification: 95 °C for 60 s and 55 °C for 20 s, 30 cycles; (4) Plate read: 37 °C for 60 s, 1 cycle.

10. The method according to claim 8, wherein the CC genotype in step 2 means that the base at the SNP site is C, and the TT genotype means that the base at the SNP site is T.