CN112289384A - Construction method and application of whole citrus genome KASP marker library - Google Patents


Info

Publication number
CN112289384A
Authority
CN
China
Prior art keywords
genome
filtering
genotype
citrus
kasp marker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011104524.7A
Other languages
Chinese (zh)
Other versions
CN112289384B (en)
Inventor
邓秀新
宋谢天
王楠
叶俊丽
曹榛
张斯淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Agricultural University
Original Assignee
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Agricultural University
Priority to CN202011104524.7A
Publication of CN112289384A
Application granted
Publication of CN112289384B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/13 Plant traits
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Bioethics (AREA)
  • Biochemistry (AREA)
  • Genetics & Genomics (AREA)
  • Mycology (AREA)
  • General Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Botany (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for constructing a citrus whole-genome KASP marker library and an application thereof, and belongs to the technical field of KASP marker libraries. The construction method comprises the following steps: importing DNA samples of the male and female parents of a population and DNA samples of individual hybrid progeny plants; filtering out segregation-distortion sites in the samples; performing genotype imputation on unknown genotype data in the filtered genome VCF file; filtering the single-nucleotide variation (SNP) data in the imputed genome VCF file to obtain a final genome VCF file; and importing the final genome VCF file into KASP marker generation software to generate the citrus KASP marker library. The invention also discloses an application of the constructed citrus KASP marker library. The KASP marker library can be used for identifying phenotypes and has the advantages of high efficiency and accuracy.

Description

Construction method and application of whole citrus genome KASP marker library
Technical Field
The invention relates to the technical field of marker library construction, and in particular to a method for constructing a citrus whole-genome KASP marker library and an application thereof.
Background
Citrus is one of the most important fruit crops in the world, and China's annual citrus output exceeds 40 million tons, ranking first in the world. However, progress in citrus cross breeding has been slow because of apomixis, a phenomenon characteristic of citrus. On the one hand, polyembryony is widely exploited in citrus production: in rootstock breeding in particular, apomictic progeny germinate and emerge at a high rate and the seedlings are uniform and vigorous, an advantage that other fruit trees lack. On the other hand, in cross breeding, interference from nucellar embryos prevents sexual progeny from being obtained after hybridization, which seriously reduces the efficiency of citrus cross breeding. Molecular marker-assisted breeding is of great value in citrus: in theory, hybrids with superior genotypes can be identified at the seedling stage using molecular markers and the phenotype predicted in advance, but in practice few useful molecular markers are available. Selecting efficient molecular markers is therefore essential for citrus cross breeding and rootstock breeding.
In recent years, with the development of molecular biology, variety identification has moved to the genetic level. DNA marker technology is accurate, rapid and automatable, and its cost is falling year by year, so it is the future trend in variety identification; SSR and SNP markers are particularly suitable for this purpose. As third-generation molecular markers, SNPs have many advantages and are considered a marker technology with great application prospects. The KASP technique (Kompetitive Allele Specific PCR) genotypes SNPs and detects InDels (insertions and deletions) based on the specific matching of the terminal bases of the primers. It can make accurate biallelic calls for SNPs and InDels at specific sites in a wide range of genomic DNA samples (even samples with complex genomes), and is highly stable and accurate. As a high-throughput, low-cost, low-error-rate SNP genotyping method, it plays an important role in marker-assisted crop breeding.
SNPs are widely distributed across the genome at high density and are of various types. Methods exist for developing SSR markers from a genome, but no method exists for constructing a whole-genome KASP marker library, because sequencing and alignment errors lead to a high false-positive rate during construction; in particular, a method for constructing a KASP marker library for the whole citrus genome is lacking.
In the prior art there are two ways of obtaining single-site KASP markers from a genome. The first is to perform high-throughput sequencing of natural populations and screen KASP markers on the basis of linkage between markers; because natural populations are highly variable, this yields few markers with a high false-positive rate, and it is usually combined with association analysis for phenotype identification. The second is to resequence the parents of a genetic population, or to sequence them after PCR amplification; markers obtained by resequencing have an extremely high false-positive rate because of sequencing errors, limited alignment accuracy and variant-detection errors. Moreover, there is no standard for assessing the accuracy of the markers obtained, and verification relies on PCR amplification and sequencing.
In view of this, a method for constructing a citrus whole-genome KASP marker library and an application thereof are needed to remedy the deficiencies of the prior art.
Disclosure of Invention
One object of the invention is to provide a method for constructing a citrus whole-genome KASP marker library. The invention performs variant detection on a citrus genetic population and filters out segregation-distortion sites to obtain highly reliable information on the variant sites of the parents across the genome, thereby obtaining a citrus whole-genome KASP marker library that can be used for identifying phenotypes and has the advantages of high efficiency and accuracy.
The technical solution of the invention to the above technical problem is as follows: a method for constructing a citrus whole-genome KASP marker library, comprising the following steps:
importing DNA samples of the male and female parents of a population and DNA samples of individual hybrid progeny plants, wherein the parental DNA samples and the progeny DNA samples are extracted from the male and female parent plants of the population and from a plurality of individual hybrid progeny plants, respectively, by a micro-scale DNA extraction method;
filtering out segregation-distortion sites in the parental DNA samples and the progeny DNA samples to obtain a filtered genome VCF file;
performing genotype imputation on unknown genotype data in the filtered genome VCF file using genotype imputation software;
filtering the single-nucleotide variation (SNP) data in the imputed genome VCF file to obtain a final genome VCF file that can be used in KASP marker generation software;
and importing the final genome VCF file into the KASP marker generation software to generate the citrus KASP marker library.
The number of individual hybrid progeny plants selected in the invention is 231.
The parents are sequenced to obtain the parental genotypes, and the 231 hybrid progeny are sequenced to obtain the progeny genotypes; from the sequencing data of the parents and the progeny, the genetic relationship between the progeny genotypes and the parental genotypes is established.
The method for constructing the citrus whole-genome KASP marker library has the following advantages:
the invention performs variant detection on a citrus genetic population and filters out segregation-distortion sites to obtain highly reliable information on the variant sites of the parents across the genome, thereby obtaining a citrus whole-genome KASP marker library that can be used for identifying phenotypes and has the advantages of high efficiency and accuracy.
On the basis of the above technical solution, the invention can be further improved as follows.
Further, the process of filtering out the segregation-distortion sites in the parental DNA samples and the progeny DNA samples comprises:
constructing libraries from the parental DNA samples and the progeny DNA samples and loading them onto an Illumina NovaSeq 6000 sequencing platform to obtain raw parental data and raw progeny data in paired-end fastq format; paired-end sequencing, in which each fragment is read from both ends, is one of the modes of the Illumina NovaSeq 6000 sequencing platform;
removing sequencing adapters from the raw parental data and the raw progeny data using the fastp data quality-control software to obtain parental sequencing data and progeny sequencing data;
aligning the parental sequencing data and the progeny sequencing data to a reference genome using the bwa sequence alignment software, and sorting the aligned parental data and the aligned progeny data, respectively, with samtools according to preset genomic position information;
removing PCR duplicates from the sorted parental sequencing data and the sorted progeny sequencing data, respectively, using the picard toolkit for high-throughput sequencing data formats;
merging the genotype data of the filtered parental sequencing data and the filtered progeny sequencing data to obtain a genome VCF file to be processed, counting the progeny genotype coverage number of each site in that file, and filtering each site according to the progeny genotype coverage number and a preset filter value;
and labeling the type of each site remaining in the site-filtered genome VCF file, determining segregation-distortion sites from the label information of each site, and filtering them out to obtain the filtered genome VCF file.
The beneficial effect of adopting this further scheme is that variant sites conforming to Mendel's laws of inheritance can be obtained.
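The read-processing steps above (adapter removal with fastp, alignment with bwa, sorting with samtools, duplicate removal with picard) can be sketched as a list of command lines. The Python sketch below only builds the argument lists and does not run anything; the file-naming scheme, the function name and the choice of `bwa mem` and `MarkDuplicates` are illustrative assumptions, since the patent names the tools but not the exact invocations.

```python
from typing import List


def build_preprocessing_commands(sample: str, reference: str) -> List[List[str]]:
    """Build (but do not run) one command per preprocessing step for a sample.

    File names are hypothetical; only the tool names come from the text.
    """
    raw_r1, raw_r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    clean_r1, clean_r2 = f"{sample}_R1.clean.fastq.gz", f"{sample}_R2.clean.fastq.gz"
    return [
        # Adapter trimming of paired-end reads
        ["fastp", "-i", raw_r1, "-I", raw_r2, "-o", clean_r1, "-O", clean_r2],
        # Alignment of the cleaned reads to the reference genome
        ["bwa", "mem", "-o", f"{sample}.sam", reference, clean_r1, clean_r2],
        # Coordinate sorting of the alignments
        ["samtools", "sort", "-o", f"{sample}.sorted.bam", f"{sample}.sam"],
        # Removal of PCR duplicates
        ["picard", "MarkDuplicates", f"I={sample}.sorted.bam",
         f"O={sample}.dedup.bam", f"M={sample}.metrics.txt",
         "REMOVE_DUPLICATES=true"],
    ]


commands = build_preprocessing_commands("P1", "ref.fa")
```

Each list could be passed to `subprocess.run` once the tools and input files are in place; variant calling and VCF merging would follow as separate steps.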
Further, the process of filtering each site according to the progeny genotype coverage number and a preset filter value comprises:
letting the progeny genotype coverage number of a site be m, the total number of individual progeny plants be n, and the preset filter value be 0.95, calculating the ratio of m to n, and filtering out the site if the ratio is less than the preset filter value of 0.95.
The beneficial effect of adopting this further scheme is that the data obtained are more accurate.
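The coverage rule is a simple ratio test, which the following Python sketch illustrates. It assumes that a progeny genotype counts as covered when it is not a missing call (`./.`, `.|.` or `.`); the function name and that definition of missing are assumptions made for illustration.

```python
MISSING_CALLS = {"./.", ".|.", "."}


def passes_coverage_filter(progeny_genotypes, min_ratio=0.95):
    """Keep a site only if m / n >= min_ratio.

    m = number of progeny with a called genotype, n = total number of progeny.
    """
    n = len(progeny_genotypes)
    if n == 0:
        return False
    m = sum(1 for gt in progeny_genotypes if gt not in MISSING_CALLS)
    return m / n >= min_ratio


# With the 231 progeny of the patent: 220 called genotypes pass, 219 do not.
site_ok = passes_coverage_filter(["0/1"] * 220 + ["./."] * 11)
site_bad = passes_coverage_filter(["0/1"] * 219 + ["./."] * 12)
```

With n = 231, the threshold of 0.95 corresponds to at least 220 called progeny genotypes per site.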
Further, the process of labeling the type of each site remaining in the site-filtered genome VCF file comprises:
the marker type is determined from the sequenced genotypes: a site is labeled heterozygous if the genotype is "0/1" and homozygous if the genotype is "0/0"; it is labeled nn × np if the maternal genotype is "0/0" and the paternal genotype is "0/1", lm × ll if the maternal genotype is "0/1" and the paternal genotype is "0/0", and hk × hk if the maternal genotype is "0/1" and the paternal genotype is "0/1".
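The labeling rule maps a pair of parental genotypes to one of three cross types. A minimal Python sketch follows, assuming VCF-style genotype strings and treating phased or reversed heterozygous calls (`0|1`, `1/0`) the same as `0/1`; the function names and the returned label strings are illustrative.

```python
def _zygosity(genotype):
    """Return 'hom_ref', 'het' or None for a diploid VCF genotype string."""
    alleles = genotype.replace("|", "/").split("/")
    if alleles == ["0", "0"]:
        return "hom_ref"
    if sorted(alleles) == ["0", "1"]:
        return "het"
    return None  # missing, homozygous-alternate or multi-allelic calls are not labeled


def cross_type(maternal, paternal):
    """Label a site from the maternal and paternal genotypes."""
    pair = (_zygosity(maternal), _zygosity(paternal))
    labels = {
        ("hom_ref", "het"): "nn x np",  # only the male parent is heterozygous
        ("het", "hom_ref"): "lm x ll",  # only the female parent is heterozygous
        ("het", "het"): "hk x hk",      # both parents are heterozygous
    }
    return labels.get(pair)
```

Sites for which `cross_type` returns `None` (for example both parents `0/0`) do not segregate in the progeny and would not yield a marker.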
Further, the process of determining segregation-distortion sites from the label information of each site and filtering them out comprises:
for every site labeled nn × np, counting the number a of nn genotypes and the number b of np genotypes, and performing a chi-square test of the observed ratio a : b against the expected ratio (a + b)/2 : (a + b)/2 within a set confidence interval to obtain a first p value; if the first p value is less than 0.01, the site is judged to be a segregation-distortion site and is filtered out, wherein the confidence interval is 0.05;
for every site labeled lm × ll, counting the number c of lm genotypes and the number d of ll genotypes, and performing a chi-square test of the observed ratio c : d against the expected ratio (c + d)/2 : (c + d)/2 within a set confidence interval to obtain a second p value; if the second p value is less than 0.01, the site is judged to be a segregation-distortion site and is filtered out, wherein the confidence interval is 0.05;
for every site labeled hk × hk, counting the number e of hh genotypes, the number f of hk genotypes and the number g of kk genotypes, and performing a chi-square test of the observed ratio e : f : g against the expected ratio (e + f + g)/4 : (e + f + g)/2 : (e + f + g)/4 within a set confidence interval to obtain a third p value; if the third p value is less than 0.01, the site is judged to be a segregation-distortion site and is filtered out, wherein the confidence interval is 0.05.
The specific genetic relationships of the three marker types under Mendel's laws of inheritance are as follows: nn × np means that the female parent is aa and the male parent is Aa, so the site segregates in the first-generation progeny at a ratio of 1:1; lm × ll means that the female parent is Aa and the male parent is aa, so the site also segregates in the first-generation progeny at a ratio of 1:1; hk × hk means that both the male and female parents are Aa, and the progeny segregate at a ratio of 1:2:1.
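The three tests above are chi-square goodness-of-fit tests against 1:1 or 1:2:1 expectations. The Python sketch below is one way to compute them with the standard library only, using the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom; the function names are illustrative, and the 0.01 cutoff is the p-value threshold stated in the text.

```python
import math


def chi_square_p_value(observed, expected_ratio):
    """Goodness-of-fit p value for observed counts against an expected ratio."""
    total = sum(observed)
    if total == 0:
        raise ValueError("no genotype counts at this site")
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    if degrees_of_freedom == 1:
        return math.erfc(math.sqrt(statistic / 2))  # chi-square tail, df = 1
    if degrees_of_freedom == 2:
        return math.exp(-statistic / 2)             # chi-square tail, df = 2
    raise ValueError("only 2 or 3 genotype classes are supported in this sketch")


def is_segregation_distorted(observed, expected_ratio, p_cutoff=0.01):
    """A site is filtered out when its p value falls below the cutoff."""
    return chi_square_p_value(observed, expected_ratio) < p_cutoff


# nn x np site with a = 120, b = 111 (expected 1:1): close to expectation, kept
keep_1 = not is_segregation_distorted([120, 111], [1, 1])
# nn x np site with a = 160, b = 71: strongly skewed, filtered out
drop_1 = is_segregation_distorted([160, 71], [1, 1])
# hk x hk site with e = 58, f = 115, g = 58 (expected 1:2:1): kept
keep_2 = not is_segregation_distorted([58, 115, 58], [1, 2, 1])
```

A library routine such as `scipy.stats.chisquare` would give the same p values; the closed forms are used here only to keep the sketch dependency-free.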
Further, the process of filtering the single-nucleotide variation (SNP) data in the imputed genome VCF file comprises:
filtering the single-nucleotide variations (SNPs) in the imputed genome VCF file using a VCF file manipulation software package.
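The patent does not name the VCF manipulation package or the filter criteria. As a hedged illustration only, the Python sketch below keeps header lines and biallelic single-base substitutions from tab-separated VCF records; both the criterion and the function names are assumptions.

```python
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(vcf_record):
    """True if REF and ALT (columns 4 and 5) are each a single nucleotide."""
    fields = vcf_record.rstrip("\n").split("\t")
    ref, alt = fields[3].upper(), fields[4].upper()
    return ref in BASES and alt in BASES


def filter_snps(vcf_lines):
    """Keep header lines and biallelic SNP records; drop InDels and multi-allelic sites."""
    return [line for line in vcf_lines
            if line.startswith("#") or is_biallelic_snp(line)]


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr4\t100\t.\tC\tA\t50\tPASS\t.",      # SNP, kept
    "chr4\t200\t.\tCT\tC\t50\tPASS\t.",     # deletion, dropped
    "chr4\t300\t.\tG\tA,T\t50\tPASS\t.",    # multi-allelic, dropped
]
kept = filter_snps(example)
```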
The second object of the invention is to provide an application of the citrus whole-genome KASP marker library. The library is highly accurate and high-throughput, covers the whole citrus genome, and has a wide range of applications.
The technical solution of the invention to the above technical problem is as follows: use of the citrus whole-genome KASP marker library in citrus cluster genotyping.
The application of the citrus whole-genome KASP marker library has the following advantages:
the library is highly accurate and high-throughput, covers the whole citrus genome, and has a wide range of applications. Specifically, it can be used in the following fields:
Field 1: when the sequenced varieties in the library are used to set up other cross-pollination combinations, only the unknown parent needs to be tested, after which the library can be used directly for marker analysis.
Field 2: the application of the library can be extended to natural citrus varieties; only resequencing of the other wild species is required, after which the required marker sites are further filtered according to the KASP sites.
Field 3: immediately after QTL or GWAS mapping of citrus material, the required marker sites within the candidate segment can be further filtered according to the KASP sites.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the specific method of the application is as follows:
Step 1: QTL mapping is used to determine that the candidate segment is a 1.23 Mb segment on chromosome 4;
Step 2: Extraction of citrus leaf DNA
Leaf tissue of the citrus variety to be tested is taken and whole-genome DNA is extracted;
Step 3: PCR amplification reaction
The reaction system is as follows: 5 μl of the whole-genome DNA obtained in step 2 at a concentration of 0.5 ng/μl, 5 μl of 2× KASP reaction mixture and 0.14 μl of primer mixture, in a total volume of 10.14 μl. The reaction program is as follows: ① 94 °C for 15 min; ② 94 °C for 20 s; ③ 60 s at an annealing temperature that starts at 61 °C and decreases by 0.6 °C per cycle to 55 °C, with steps ② and ③ repeated for 10 cycles; ④ 94 °C for 20 s and 55 °C for 60 s, for 26 cycles. A PCR amplification product is obtained;
Step 4: Cluster genotyping
The PCR amplification product obtained in step 3 is read on a fluorescence quantitative PCR instrument, with FAM and HEX selected as the fluorescence types and allele genotyping selected as the analysis, to obtain the cluster genotype of the citrus variety to be tested.
The beneficial effect of adopting this further scheme is that the cluster type of the citrus variety to be tested can be determined.
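The touchdown stage of the reaction program can be made concrete with a short calculation. The Python sketch below lists the annealing temperature of each of the 10 touchdown cycles, assuming the first cycle anneals at 61 °C and each later cycle is 0.6 °C lower; the function name and this reading of the schedule are illustrative assumptions.

```python
def touchdown_annealing_temps(start_c=61.0, step_c=0.6, cycles=10):
    """Annealing temperature (deg C) of each touchdown cycle, first cycle at start_c."""
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


temps = touchdown_annealing_temps()
# 10 touchdown cycles are followed by 26 cycles annealed at 55 deg C
total_cycles = len(temps) + 26
```

Under this reading the tenth touchdown cycle anneals at 55.6 °C, one step above the 55 °C used for the remaining 26 cycles, giving 36 amplification cycles in total.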
In step 1, QTL stands for quantitative trait locus, which refers to the position in the genome of a gene controlling a quantitative trait.
If the phenotype is inherited as a paternally dominant trait, an nn × np-type KASP marker is selected from the citrus whole-genome KASP marker library; if the phenotype is inherited as a maternally dominant trait, an lm × ll-type KASP marker is selected from the library; if the experiment uses an F2 population, an hk × hk-type KASP marker is selected from the library. The marker site selected within the 1.23 Mb segment is KASP_chr4_29357745; the 60 bp flanking sequences of the marker site are obtained, and primer 1, primer 2 and primer 3 are designed using primer3.0 software.
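Retrieving the flanking sequence of a marker such as KASP_chr4_29357745 is a small coordinate calculation. The Python sketch below assumes that marker identifiers have the form `KASP_<chromosome>_<position>` with a 1-based position, and that 60 bp are taken on each side of the site; both assumptions, and the function names, are illustrative rather than taken from the patent.

```python
def parse_marker_id(marker_id):
    """Split an identifier like 'KASP_chr4_29357745' into (chromosome, position)."""
    _, rest = marker_id.split("_", 1)
    chromosome, position = rest.rsplit("_", 1)
    return chromosome, int(position)


def flanking_sequences(chromosome_sequence, position, flank=60):
    """Return (left flank, variant base, right flank) for a 1-based position."""
    index = position - 1
    left = chromosome_sequence[max(0, index - flank):index]
    right = chromosome_sequence[index + 1:index + 1 + flank]
    return left, chromosome_sequence[index], right


chrom, pos = parse_marker_id("KASP_chr4_29357745")
window = (pos - 60, pos + 60)  # 1-based, inclusive window around the site

# Toy sequence standing in for a chromosome: the site of interest is base 101
toy = "A" * 100 + "C" + "G" * 100
left, base, right = flanking_sequences(toy, 101)
```

The two flanks, with the variant written between them, are the kind of input that primer design software expects for an allele-specific assay.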
In step 3, the DNA is diluted to 5 ng/μl so that the amount of DNA in each subsequent reaction matches the genome size; for the sample to be tested, a genome of 300 Mb corresponds to 5 ng.
The 2× KASP reaction mixture was purchased from LGC (UK) under catalog number KBS-1016-002 (suitable for 96/384-well plates).
Furthermore, in step 2, the whole-genome DNA is extracted by the CTAB method.
The beneficial effect of adopting this further scheme is that whole-genome DNA can be extracted.
Furthermore, in step 3, the primer mixture is formed by mixing equal volumes of primer 1, primer 2 and primer 3, wherein the nucleotide sequence of primer 1 is shown as SEQ ID NO. 1, the nucleotide sequence of primer 2 is shown as SEQ ID NO. 2, and the nucleotide sequence of primer 3 is shown as SEQ ID NO. 3.
Primer 1 (SEQ ID NO. 1): gaaggtgaccaagttcatgctatcacttagtgcactacaac;
Primer 2 (SEQ ID NO. 2): gaaggtcggagtcaacggattatcacttagtgcactacaaa;
Primer 3 (SEQ ID NO. 3): tcgaacatgcctggtcatt.
The beneficial effect of adopting this further scheme is that PCR amplification products carrying FAM and HEX fluorescent signals can be obtained.
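Primers 1 and 2 share a common structure: a 21-base 5' tail followed by an allele-specific sequence that differs only in its last base. The Python sketch below splits each primer accordingly. The two tail constants are the sequences commonly published as the KASP FAM and HEX reporter tails, which is an assumption added here; the patent lists only the full primer sequences.

```python
# Commonly published KASP reporter tails (assumption, not stated in the patent)
FAM_TAIL = "gaaggtgaccaagttcatgct"
HEX_TAIL = "gaaggtcggagtcaacggatt"


def split_kasp_primer(primer):
    """Return (dye, allele-specific part); dye is None if no known tail is found."""
    for dye, tail in (("FAM", FAM_TAIL), ("HEX", HEX_TAIL)):
        if primer.startswith(tail):
            return dye, primer[len(tail):]
    return None, primer


primer_1 = "gaaggtgaccaagttcatgctatcacttagtgcactacaac"
primer_2 = "gaaggtcggagtcaacggattatcacttagtgcactacaaa"

dye_1, specific_1 = split_kasp_primer(primer_1)
dye_2, specific_2 = split_kasp_primer(primer_2)
# The allele-specific parts are identical except for the final base
differ_only_at_end = specific_1[:-1] == specific_2[:-1] and specific_1[-1] != specific_2[-1]
```

Under this reading, primer 1 reports the allele ending in c on the FAM channel and primer 2 reports the allele ending in a on the HEX channel, while primer 3 is the common primer without a tail.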
Drawings
FIG. 1 is a QTL mapping chart in accordance with an embodiment of the present invention.
FIG. 2 is a linkage diagram of the citrus polyembryony KASP marker and the citrus polyembryony INDEL marker in an embodiment of the invention.
FIG. 3 shows the results of genotyping multiple genetic populations with the citrus polyembryony KASP marker obtained from the whole-genome KASP marker library in an embodiment of the invention.
Detailed Description
The principles and features of the invention are described below with reference to the drawings; the examples are given for illustration only and are not intended to limit the scope of the invention.
The method for constructing a citrus whole-genome KASP marker library is described below by way of a specific embodiment, and comprises the following steps:
importing DNA samples of the male and female parents of a population and DNA samples of individual hybrid progeny plants, wherein the parental DNA samples and the progeny DNA samples are extracted from the male and female parent plants of the population and from a plurality of individual hybrid progeny plants, respectively, by a micro-scale DNA extraction method;
filtering out segregation-distortion sites in the parental DNA samples and the progeny DNA samples to obtain a filtered genome VCF file;
performing genotype imputation on unknown genotype data in the filtered genome VCF file using genotype imputation software;
filtering the single-nucleotide variation (SNP) data in the imputed genome VCF file to obtain a final genome VCF file that can be used in KASP marker generation software;
and importing the final genome VCF file into the KASP marker generation software to generate the citrus KASP marker library.
Specifically, the process of filtering out the segregation-distortion sites in the parental DNA samples and the progeny DNA samples comprises:
constructing libraries from the parental DNA samples and the progeny DNA samples and loading them onto an Illumina NovaSeq 6000 sequencing platform to obtain raw parental data and raw progeny data in paired-end fastq format;
removing sequencing adapters from the raw parental data and the raw progeny data using the fastp data quality-control software to obtain parental sequencing data and progeny sequencing data;
aligning the parental sequencing data and the progeny sequencing data to a reference genome using the bwa sequence alignment software, and sorting the aligned parental data and the aligned progeny data, respectively, with samtools according to preset genomic position information;
removing PCR duplicates from the sorted parental sequencing data and the sorted progeny sequencing data, respectively, using the picard toolkit for high-throughput sequencing data formats;
merging the genotype data of the filtered parental sequencing data and the filtered progeny sequencing data to obtain a genome VCF file to be processed, counting the progeny genotype coverage number of each site in that file, and filtering each site according to the progeny genotype coverage number and a preset filter value;
and labeling the type of each site remaining in the site-filtered genome VCF file, determining segregation-distortion sites from the label information of each site, and filtering them out to obtain the filtered genome VCF file.
In the above embodiment, variant sites conforming to Mendel's laws of inheritance can be obtained.
Specifically, the process of filtering each site according to the progeny genotype coverage number and a preset filter value comprises:
letting the progeny genotype coverage number of a site be m, the total number of individual progeny plants be n, and the preset filter value be 0.95, calculating the ratio of m to n, and filtering out the site if the ratio is less than the preset filter value of 0.95.
Specifically, the process of labeling the type of each site remaining in the site-filtered genome VCF file comprises:
the marker type is determined from the sequenced genotypes: a site is labeled heterozygous if the genotype is "0/1" and homozygous if the genotype is "0/0"; it is labeled nn × np if the maternal genotype is "0/0" and the paternal genotype is "0/1", lm × ll if the maternal genotype is "0/1" and the paternal genotype is "0/0", and hk × hk if the maternal genotype is "0/1" and the paternal genotype is "0/1".
Specifically, the process of determining segregation-distortion sites from the label information of each site and filtering them out comprises:
for every site labeled nn × np, counting the number a of nn genotypes and the number b of np genotypes, and performing a chi-square test of the observed ratio a : b against the expected ratio (a + b)/2 : (a + b)/2 within a set confidence interval to obtain a first p value; if the first p value is less than 0.01, the site is judged to be a segregation-distortion site and is filtered out, wherein the confidence interval is 0.05;
for every site labeled lm × ll, counting the number c of lm genotypes and the number d of ll genotypes, and performing a chi-square test of the observed ratio c : d against the expected ratio (c + d)/2 : (c + d)/2 within a set confidence interval to obtain a second p value; if the second p value is less than 0.01, the site is judged to be a segregation-distortion site and is filtered out, wherein the confidence interval is 0.05;
for every site labeled hk × hk, counting the number e of hh genotypes, the number f of hk genotypes and the number g of kk genotypes, and performing a chi-square test of the observed ratio e : f : g against the expected ratio (e + f + g)/4 : (e + f + g)/2 : (e + f + g)/4 within a set confidence interval to obtain a third p value; if the third p value is less than 0.01, the site is judged to be a segregation-distortion site and is filtered out, wherein the confidence interval is 0.05.
Specifically, the process of filtering the single-nucleotide variation (SNP) data in the imputed genome VCF file comprises:
filtering the single-nucleotide variations (SNPs) in the imputed genome VCF file using a VCF file manipulation software package.
In this embodiment, variant detection is performed on a citrus genetic population and segregation-distortion sites are filtered out to obtain highly reliable information on the variant sites of the parents across the genome, thereby obtaining a citrus whole-genome KASP marker library that can be used for identifying phenotypes and has the advantages of high efficiency and accuracy.
An example of part of the contents of the citrus whole-genome KASP marker library obtained in this embodiment is shown in Table 1.
Table 1. Example of part of the contents of the citrus whole-genome KASP marker library
[Table 1 appears as images in the original publication (figures BDA0002726496530000121 through BDA0002726496530000191); its contents are not reproduced here.]
The application of the whole citrus genome KASP marker library in citrus cluster typing is provided.
The specific method of the application is as follows:
Step 1: QTL mapping is used to determine that the candidate segment is a 1.23 Mb segment of chromosome 4.
Step 2: extraction of citrus leaf DNA
Leaf tissue of the citrus variety to be tested is taken, and whole genome DNA is extracted by the CTAB method.
Step 3: PCR amplification reaction
The reaction system is as follows: 5 μl of the whole genomic DNA obtained in step 2 at a concentration of 0.5 ng/μl, 5 μl of 2× KASP reaction mixture and 0.14 μl of primer mixture, in a total volume of 10.14 μl. The primer mixture is prepared by mixing primer 1, primer 2 and primer 3 in equal volumes, and the 2× KASP reaction mixture is purchased from LGC (UK), catalog number KBS-1016-002 (suitable for 96/384-well plates). The nucleotide sequence of primer 1 is shown as SEQ ID NO.1, the nucleotide sequence of primer 2 is shown as SEQ ID NO.2, and the nucleotide sequence of primer 3 is shown as SEQ ID NO.3.
Primer 1 (SEQ ID NO.1): gaaggtgaccaagttcatgctatcacttagtgcactacaac;
Primer 2 (SEQ ID NO.2): gaaggtcggagtcaacggattatcacttagtgcactacaaa;
Primer 3 (SEQ ID NO.3): tcgaacatgcctggtcatt.
The reaction program is as follows: ① 94 ℃ for 15 min; ② 94 ℃ for 20 s; ③ starting from 61 ℃, 60 s per cycle with the temperature reduced to 55 ℃ at a rate of 0.6 ℃/cycle, for 10 cycles; ④ 94 ℃ for 20 s and 55 ℃ for 60 s, for 26 cycles. The PCR amplification product is thereby obtained.
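The reaction volumes and cycling program of step 3 can be checked with a short sketch. It rests on two assumptions that the text does not spell out: the template DNA volume is taken as whatever remains of the stated 10.14 μl total after the other two components, and steps ② and ③ are read as one touchdown cycle (denaturation followed by annealing) whose first annealing temperature is 61 ℃. All names are hypothetical.

```python
TOTAL_UL = 10.14       # total reaction volume stated in step 3
KASP_MIX_UL = 5.0      # 2x KASP reaction mixture
PRIMER_MIX_UL = 0.14   # primer mixture

def dna_volume_ul():
    """Volume left for template DNA once the other components are subtracted."""
    return round(TOTAL_UL - KASP_MIX_UL - PRIMER_MIX_UL, 2)

def touchdown_temperatures(start=61.0, step=0.6, cycles=10):
    """Annealing temperature assumed for each of the touchdown cycles."""
    return [round(start - step * i, 1) for i in range(cycles)]

def thermal_program():
    """Fully unrolled list of (temperature in deg C, seconds) steps."""
    program = [(94.0, 15 * 60)]                 # initial 94 deg C, 15 min
    for anneal in touchdown_temperatures():     # 10 touchdown cycles
        program += [(94.0, 20), (anneal, 60)]
    for _ in range(26):                         # 26 cycles at 55 deg C
        program += [(94.0, 20), (55.0, 60)]
    return program

def hold_time_minutes():
    """Sum of programmed hold times, ignoring temperature ramping."""
    return sum(seconds for _, seconds in thermal_program()) / 60
```

Under these assumptions the annealing temperature falls from 61.0 ℃ to 55.6 ℃ over the ten touchdown cycles and is then held at 55 ℃ for the remaining 26 cycles.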
Step 4: Cluster typing
The PCR amplification product obtained in step 3 is read on a fluorescence quantitative PCR instrument, with FAM and HEX selected as the fluorescence types and allele typing selected as the analysis mode, to obtain the cluster typing of the citrus variety to be tested. The single-embryo and multi-embryo types cluster into two different groups, wherein the Aa type is single-embryo citrus, and the Aa type is multi-embryo citrus.
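To illustrate what allele typing from the two fluorescence channels amounts to, the sketch below assigns a sample to a cluster from the angle of its (FAM, HEX) signal. This is a simplified stand-in for the instrument software, not the method used in the embodiment: the thresholds, the cluster labels and the function name are all hypothetical, and the text does not say which allele is reported by which dye.

```python
import math

def call_cluster(fam, hex_, min_signal=0.2, low_deg=30.0, high_deg=60.0):
    """Assign one sample to a cluster from its normalized FAM and HEX signals."""
    if math.hypot(fam, hex_) < min_signal:
        return "no_call"                      # too little signal (e.g. blank well)
    angle = math.degrees(math.atan2(hex_, fam))
    if angle < low_deg:
        return "FAM_homozygous"               # signal mostly on the FAM axis
    if angle > high_deg:
        return "HEX_homozygous"               # signal mostly on the HEX axis
    return "heterozygous"                     # both dyes contribute

def tally(samples):
    """Count cluster assignments for an iterable of (fam, hex) pairs."""
    counts = {}
    for fam, hex_ in samples:
        label = call_cluster(fam, hex_)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

For an nn × np marker, progeny are expected to fall into only two of these clusters (one homozygous and the heterozygous one), which is what allows the two embryo types to be separated.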
Analysis:
In the prior art, multi-embryo and single-embryo types are generally identified by eye under a microscope: the number of embryos can only be observed in seeds after the citrus fruit has matured, and the embryo type is determined from that count. Because the embryo is a fine structure, observation is subject to error, so the prior-art method is time-consuming, laborious and of low accuracy. Meanwhile, the development of molecular markers that co-segregate with the citrus polyembryony trait has been slow for many years. INDEL markers have been developed in recent years, but they require agarose gel electrophoresis, the number of samples per run is limited, and they are poorly suited to high-throughput detection. In addition, although the single/multi-embryo phenotype has been mapped to chr4, the INDEL marker lies in a region of complex DNA structure, so amplification efficiency is low and the product tends to form higher-order structures.
In the sequencing experiment of this embodiment, covering 231 individuals and their parents, the population segregates for several traits, including the single/multi-embryo trait. QTL mapping with this population, as shown in FIG. 1, successfully located the major gene within a 1.23 Mb interval of chr4; recombinant individuals were then located using nn × np markers, narrowing the segment to a 313 Kb region. The mapped segment is consistent with that reported in the prior art, indicating high accuracy. By searching the citrus whole genome KASP marker library, the invention found a KASP marker, Kasp_chr4_29357745, that is completely linked with the INDEL marker and lies at the same locus; as shown in FIG. 2, it is an nn × np marker.
Furthermore, the invention also adopts the kit KBS-1016- [text rendered as an image in the original: Figure BDA0002726496530000201] 480II real-time fluorescence quantitative PCR system, which confirms that the citrus whole genome KASP marker library is usable and works well.
Furthermore, the invention tested the marker both in the population used in this study and in a second population sharing the same male parent. Because the male parent is the same, the KASP marker performs well in both populations: it completely distinguishes single-embryo from multi-embryo types, is accurate with no false positives, and has high throughput, completing the detection of nearly 700 samples within 3 hours. Part of the detection results is shown in FIG. 3.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Sequence listing
<110> Huazhong Agricultural University
<120> construction method and application of citrus whole genome KASP marker library
<160> 3
<170> SIPOSequenceListing 1.0
<210> 1
<211> 41
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
gaaggtgacc aagttcatgc tatcacttag tgcactacaa c 41
<210> 2
<211> 41
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
gaaggtcgga gtcaacggat tatcacttag tgcactacaa a 41
<210> 3
<211> 19
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
tcgaacatgc ctggtcatt 19
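The structure of the three primers in the sequence listing can be examined with a small sketch. It assumes that the 5' ends of SEQ ID NO.1 and SEQ ID NO.2 are the standard KASP FAM and HEX tail sequences; the listing itself does not label them, so the dye assignment here is an assumption. Function and variable names are hypothetical.

```python
# Standard KASP tail sequences (assumed; not labeled in the sequence listing).
FAM_TAIL = "gaaggtgaccaagttcatgct"
HEX_TAIL = "gaaggtcggagtcaacggatt"

PRIMERS = {
    "SEQ ID NO.1": "gaaggtgaccaagttcatgctatcacttagtgcactacaac",
    "SEQ ID NO.2": "gaaggtcggagtcaacggattatcacttagtgcactacaaa",
    "SEQ ID NO.3": "tcgaacatgcctggtcatt",
}

def split_tail(primer):
    """Return (dye, allele-specific part); dye is None for an untailed primer."""
    for dye, tail in (("FAM", FAM_TAIL), ("HEX", HEX_TAIL)):
        if primer.startswith(tail):
            return dye, primer[len(tail):]
    return None, primer

def allele_bases(primer_a, primer_b):
    """3'-terminal bases of two allele-specific primers sharing the same body."""
    _, body_a = split_tail(primer_a)
    _, body_b = split_tail(primer_b)
    if body_a[:-1] != body_b[:-1]:
        raise ValueError("primers do not share a common body")
    return body_a[-1], body_b[-1]
```

Read this way, primers 1 and 2 share the same 19-base body after their tails and differ only in the final base (c versus a), while primer 3 is the untailed common primer.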

Claims (10)

1. A method for constructing a citrus whole genome KASP marker library, characterized by comprising the following steps:
importing population parental (male and female parent) DNA samples and hybrid progeny individual DNA samples, wherein these samples are obtained by extracting DNA from the male and female parent plants of the population and from a plurality of hybrid progeny individuals, respectively, by a micro-scale DNA extraction method;
filtering out the partial separation sites in the population parental DNA samples and the hybrid progeny individual DNA samples to obtain a filtered genome VCF file;
performing genotype filling on the unknown genotype data in the filtered genome VCF file using genotype filling software;
filtering the single-base variation (SNP) data in the filled genome VCF file to obtain a final genome VCF file usable by KASP marker generation software;
and importing the final genome VCF file into the KASP marker generation software to generate the citrus KASP marker library.
2. The method of constructing a citrus whole genome KASP marker library according to claim 1, wherein the process of filtering out the partial separation sites in the population parental DNA samples and the hybrid progeny individual DNA samples comprises:
constructing sequencing libraries from the population parental DNA samples and the hybrid progeny individual DNA samples and sequencing them on an Illumina NovaSeq 6000 platform to obtain parental raw (off-machine) data and hybrid progeny individual raw data in paired-end fastq format;
removing sequencing adapters from the parental raw data and the hybrid progeny individual raw data using the fastp data quality control software to obtain parental sequencing data and hybrid progeny individual sequencing data;
aligning the parental sequencing data and the hybrid progeny individual sequencing data to a reference genome using the bwa sequence alignment software, and sorting the aligned parental sequencing data and the aligned hybrid progeny individual sequencing data according to preset genome position information using the samtools software;
filtering PCR duplicate fragments from the sorted parental sequencing data and the sorted hybrid progeny individual sequencing data using the picard high-throughput sequencing data toolkit;
merging the genotype data of the filtered parental sequencing data and the filtered hybrid progeny individual sequencing data to obtain a genome VCF file to be processed, counting the progeny genotype coverage number of each site in the genome VCF file to be processed, and filtering each site according to its progeny genotype coverage number and a preset filter value;
and performing type marking on the sites remaining in the site-filtered genome VCF file, determining the partial separation sites according to the marking information of each site, and filtering out the partial separation sites to obtain the filtered genome VCF file.
3. The method of constructing a citrus whole genome KASP marker library according to claim 1, wherein the filtering of each locus according to the progeny genotype coverage and a preset filter value comprises:
letting the progeny genotype coverage number of a site be m, the total number of hybrid progeny individuals be n, and the preset filter value be 0.95, calculating the ratio of m to n, and filtering out the site if the ratio is less than the preset filter value of 0.95.
4. The method of constructing a citrus whole genome KASP marker library according to claim 1, wherein the process of type-labeling the remaining sites in the site-filtered genomic VCF file comprises:
determining the marker type from the sequenced genotypes: a site is labeled heterozygous if its genotype is "0/1" and homozygous if its genotype is "0/0"; a site is labeled nn × np if the maternal genotype is "0/0" and the paternal genotype is "0/1", lm × ll if the maternal genotype is "0/1" and the paternal genotype is "0/0", and hk × hk if the maternal genotype is "0/1" and the paternal genotype is "0/1".
5. The method for constructing a citrus whole genome KASP marker library according to claim 4, wherein the process of determining a partial separation site according to marker information of each site and filtering the partial separation site comprises:
for all sites marked nn × np, counting the number a of nn genotypes and the number b of np genotypes, and performing a chi-square test of the observed a : b against the expected (a + b)/2 : (a + b)/2 within a set confidence interval to obtain a first p value; if the first p value is less than 0.01, the site is judged to be a partial separation site and is filtered out, wherein the confidence interval is 0.05;
for all sites marked lm × ll, counting the number c of lm genotypes and the number d of ll genotypes, and performing a chi-square test of the observed c : d against the expected (c + d)/2 : (c + d)/2 within the set confidence interval to obtain a second p value; if the second p value is less than 0.01, the site is judged to be a partial separation site and is filtered out, wherein the confidence interval is 0.05;
for all sites marked hk × hk, counting the number e of hh genotypes, the number f of hk genotypes and the number g of kk genotypes, and performing a chi-square test of the observed e : f : g against the expected (e + f + g)/4 : (e + f + g)/2 : (e + f + g)/4 within the set confidence interval to obtain a third p value; if the third p value is less than 0.01, the site is judged to be a partial separation site and is filtered out, wherein the confidence interval is 0.05.
6. The method for constructing a citrus whole genome KASP marker library according to claim 1, wherein the process of filtering the single-base variation (SNP) data in the filled genome VCF file comprises:
filtering the single-base variations (SNPs) in the filled genome VCF file using a VCF file manipulation software package.
7. Use of a citrus whole genome KASP marker library according to any one of claims 1 to 6 for citrus cluster typing.
8. The use according to claim 7, wherein the specific method of the use is:
Step 1: using QTL mapping to determine that the candidate segment is a 1.23 Mb segment of chromosome 4;
Step 2: extraction of citrus leaf DNA
taking leaf tissue of the citrus variety to be tested and extracting whole genome DNA;
Step 3: PCR amplification reaction
the reaction system is as follows: 5 μl of the whole genomic DNA obtained in step 2 at a concentration of 0.5 ng/μl, 5 μl of 2× KASP reaction mixture and 0.14 μl of primer mixture, in a total volume of 10.14 μl; the reaction program is as follows: ① 94 ℃ for 15 min; ② 94 ℃ for 20 s; ③ starting from 61 ℃, 60 s per cycle with the temperature reduced to 55 ℃ at a rate of 0.6 ℃/cycle, for 10 cycles; ④ 94 ℃ for 20 s and 55 ℃ for 60 s, for 26 cycles; a PCR amplification product is obtained;
Step 4: cluster typing
reading the PCR amplification product obtained in step 3 on a fluorescence quantitative PCR instrument, selecting FAM and HEX as the fluorescence types and allele typing as the analysis mode, to obtain the cluster typing of the citrus variety to be tested.
9. The use of claim 8, wherein in step 2, the extraction of whole genome DNA is performed by CTAB method.
10. The use of claim 8, wherein in step 3, the primer mixture is prepared by mixing primer 1, primer 2 and primer 3 in equal volumes, wherein the nucleotide sequence of primer 1 is shown as SEQ ID NO.1, the nucleotide sequence of primer 2 is shown as SEQ ID NO.2, and the nucleotide sequence of primer 3 is shown as SEQ ID NO.3.
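For orientation only, the site-level rules recited in claims 3 and 4 (coverage filtering and marker-type labeling) can be sketched as follows. This is an illustrative reading of the claims and not part of them: the function names are hypothetical, and the hk × hk case is interpreted as both parents being heterozygous ("0/1").

```python
def passes_coverage(m, n, preset=0.95):
    """Keep a site when the share of progeny with a genotype call (m of n)
    is not below the preset filter value."""
    return n > 0 and m / n >= preset

# Parental genotype pair (maternal, paternal) -> marker type.
MARKER_TYPES = {
    ("0/0", "0/1"): "nnxnp",
    ("0/1", "0/0"): "lmxll",
    ("0/1", "0/1"): "hkxhk",
}

def marker_type(maternal_gt, paternal_gt):
    """Label a site from the parental genotypes; None if no type applies."""
    return MARKER_TYPES.get((maternal_gt, paternal_gt))
```

With 231 progeny, for instance, a site genotyped in 220 individuals is kept and one genotyped in 219 is filtered; a site where both parents are "0/0" receives no marker type because it cannot segregate in the progeny.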
CN202011104524.7A 2020-10-15 2020-10-15 Construction method and application of citrus whole genome KASP marker library Active CN112289384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011104524.7A CN112289384B (en) 2020-10-15 2020-10-15 Construction method and application of citrus whole genome KASP marker library

Publications (2)

Publication Number Publication Date
CN112289384A true CN112289384A (en) 2021-01-29
CN112289384B CN112289384B (en) 2024-02-20

Family

ID=74496978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011104524.7A Active CN112289384B (en) 2020-10-15 2020-10-15 Construction method and application of citrus whole genome KASP marker library

Country Status (1)

Country Link
CN (1) CN112289384B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant