CN118098348B - Method and device for detecting genotype of hybrid parent, electronic equipment and medium - Google Patents
- Publication number
- CN118098348B CN202410093508.4A
- Authority
- CN
- China
- Prior art keywords
- genotype
- data
- endosperm
- hybrid
- parent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/20—Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/13—Plant traits
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Abstract
The application discloses a method, a device, electronic equipment and a medium for detecting the genotype of a hybrid parent. The detection method includes: obtaining first genome data of a hybrid; comparing the first genome data to a reference genome and processing the comparison result to obtain second genome data; extracting the mutation site data in the second genome data; and judging the genotype of the hybrid parent according to the mutation site data. Using sequencing data from the endosperm of hybrid seeds, the parent genotype at each sequencing site is inferred directly by bioinformatics analysis. This overcomes a series of limitations of the traditional methods and remedies the inability of the prior art to trace the parents directly from the hybrid seeds.
Description
Technical Field
The application relates to the technical field of hybrid seeds, in particular to a method and a device for detecting the genotype of a hybrid parent, electronic equipment and a medium.
Background
Hybrid seeds produced by crossing homozygous parents are of great economic significance. They have stronger resistance, adaptability and productivity, can markedly improve crop yield and quality, and are an important resource for agricultural production. In modern agriculture, breeding a new variety requires a great deal of time and resources. Tracing the parent genotype provides strong support for the intellectual property protection of varieties, helps ensure their genetic stability and uniqueness, protects the work of breeders, and prevents unauthorized planting and sales. Inferring the parent genotype from hybrid seeds therefore plays an important role in paternity identification, intellectual property protection of varieties and breeding practice. Liquid-phase chip sequencing is a tool that combines chip technology with second-generation sequencing. By fixing nucleic acid molecules on the surface of a tiny chip, it sequences a large number of DNA or RNA fragments in parallel, offers high throughput, high resolution and diversity, and has become an important tool in genomics research. Hybrid seeds carry the genetic information of both parents, and the endosperm is triploid (two maternal haplotypes and one paternal haplotype), which makes it a good resource for inferring the parental genotypes.
Conventional methods for identifying hybrid parents typically rely on molecular markers, isozymes, protein markers and the like. These methods have limitations in sample processing, analysis efficiency and cost, may require a long experimental period and a large amount of labor, and may require comparison with information from the parents before a judgment can be made. For example, molecular marker technology covers only a limited number of sites, so it is difficult to obtain a full picture of the parents' genetic information; it requires specialized equipment and reagents, which makes it relatively costly, especially for large-scale sample analysis; and it usually relies on specific known sites, so it may not detect polymorphisms at unknown sites. Isozyme methods identify parents by analyzing differences in specific cleavage sites on a DNA sequence; their resolution is relatively low, so they may fail to distinguish certain similar genotypes; they are highly sensitive to genome structure, so they may be unstable or inapplicable in some genomes; and they may make experiments more difficult when complex samples (such as polygenic families) are processed. Protein labelling techniques are strongly affected by environmental factors, which may lead to unstable results, and may lack sufficient sensitivity to discriminate between parents with similar genotypes. These limitations affect the accurate prediction of the parental genotype. At present, no method has been reported that directly infers the parent genotype from hybrid seeds to achieve tracing. More advanced and efficient technical means are therefore needed, and liquid-phase chip sequencing is expected to make up for these shortcomings.
Disclosure of Invention
Based on the above, the embodiment of the application utilizes the sequencing data of endosperm of hybrid seeds to process the first genome data to obtain the second genome data, and can directly infer the parent genotype of the sequencing site according to the mutation site data in the second genome data. The detection method, the detection device, the electronic equipment and the medium thus overcome a series of limitations of the traditional method, realize direct estimation of the parent genotype and make up for the defect that the prior art cannot directly trace the parent from the hybrid.
Based on the above, the embodiment of the application at least discloses the following technical scheme:
In a first aspect, embodiments disclose a method of detecting a genotype of a hybrid parent. The detection method comprises the following steps: sequencing to obtain first genome data of the hybrid; comparing the first genome data to a reference genome, and processing a comparison result to obtain second genome data; extracting mutation site data in the second genome data; judging the parent genotype of the hybrid according to the mutation site data.
In a second aspect, embodiments disclose a device for detecting a genotype of a hybrid parent. The detection device comprises: a sequencing unit for sequencing to obtain first genome data of the hybrid; the data processing unit is used for comparing the first genome data to a reference genome and processing a comparison result to obtain second genome data; an extraction unit for extracting the mutation site data in the second genome data; and the judging unit is used for judging the genotype of the hybrid parent according to the mutation site data.
In a third aspect, an embodiment discloses an electronic device. The electronic device includes a processor and a memory. Stored in the memory are computer program instructions which, when executed by the processor, cause the processor to perform the detection method of the first aspect.
In a fourth aspect, embodiments disclose a computer readable storage medium. The computer readable storage medium has stored thereon computer program instructions which, when executed by a computing device, are operable to perform the detection method of the first aspect.
The method, device, electronic equipment and medium for detecting the genotype of a hybrid parent provided by the embodiments use the processed hybrid sequencing data and make the judgment according to the data amount (number of reads) of each endosperm genotype, so that the parent genotype of the hybrid at a given mutation site or sequencing site can be accurately detected. The embodiments overcome a series of limitations of the traditional methods, infer the parent genotype directly, remedy the inability of the prior art to trace the parents directly from the hybrid, and achieve high detection accuracy.
Drawings
FIG. 1 is a flow chart of a method for detecting the genotype of a hybrid parent provided in the examples.
Fig. 2 is a flowchart illustrating step S200 provided in the embodiment.
Fig. 3 is a flowchart illustrating step S300 according to an embodiment.
Fig. 4 is a flowchart illustrating step S400 provided in the embodiment.
FIG. 5 is a flow chart of a method for detecting the genotype of a hybrid parent provided in the examples.
FIG. 6 is a schematic structural diagram of a device for detecting the genotype of a hybrid parent provided in the examples.
Fig. 7 is a schematic structural diagram of an electronic device for detecting a genotype of a hybrid parent provided in the embodiment.
Fig. 8 is a schematic diagram of the gt table extracted from a VCF file using the R package "vcfR" according to an embodiment. A1667-P and A1667-R are the data columns of two hybrid seeds. The FORMAT column lists the data types recorded for each hybrid seed, including GT, AD, DP, etc., separated by colons and corresponding one-to-one with the values in the sample columns that follow.
FIG. 9 is a schematic diagram of the Allele Depth (AD) parameters extracted from the gt table according to an embodiment. The FORMAT column indicates AD (Allele Depth), and A1667-P, etc. are the samples. In each sample column, the number before the comma is the number of reads supporting GT1, and the number after the comma is the number of reads supporting GT2.
FIG. 10 shows, for one hybrid seed according to an embodiment, the percent identity between the inferred male parent genotype and the genotypes of all 2026 male parents. The DataN column gives the names of the 2026 male parent materials and X is the percent identity, sorted in descending order; the true male parent is marked on the graph, and the density plot of the percent identity for the 2026 male parents is shown on the right. Percent identity = (number of inferred loci whose genotype equals the actual male parent genotype) / (total number of inferred loci).
FIG. 11 shows, for one hybrid seed according to an embodiment, the percent identity between the inferred male parent genotype and the genotypes of all 2026 male parents. The DataN column gives the names of the 2026 male parent materials and X is the percent identity, sorted in descending order; the true male parent is marked on the graph, and the density plot of the percent identity for the 2026 male parents is shown on the right. Percent identity = (number of inferred loci whose genotype equals the actual male parent genotype) / (total number of inferred loci).
FIG. 12 shows, for one hybrid seed according to an embodiment, the percent identity between the inferred male parent genotype and the genotypes of all 2026 male parents. The DataN column gives the names of the 2026 male parent materials and X is the percent identity, sorted in descending order; the true male parent is marked on the graph, and the density plot of the percent identity for the 2026 male parents is shown on the right. Percent identity = (number of inferred loci whose genotype equals the actual male parent genotype) / (total number of inferred loci).
FIG. 13 shows, for one hybrid seed according to an embodiment, the percent identity between the inferred male parent genotype and the genotypes of all 2026 male parents. The DataN column gives the names of the 2026 male parent materials and X is the percent identity, sorted in descending order; the true male parent is marked on the graph, and the density plot of the percent identity for the 2026 male parents is shown on the right. Percent identity = (number of inferred loci whose genotype equals the actual male parent genotype) / (total number of inferred loci).
FIG. 14 shows, for one hybrid seed according to an embodiment, the percent identity between the inferred male parent genotype and the genotypes of all 2026 male parents. The DataN column gives the names of the 2026 male parent materials and X is the percent identity, sorted in descending order; the true male parent is marked on the graph, and the density plot of the percent identity for the 2026 male parents is shown on the right. Percent identity = (number of inferred loci whose genotype equals the actual male parent genotype) / (total number of inferred loci).
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the following examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. The reagents not specifically and individually described in the present application are all conventional reagents and are commercially available; methods which are not specifically described in detail are all routine experimental methods and are known from the prior art.
The endosperm cells of a hybrid seed develop from the two polar nuclei of the female parent and one sperm of the male parent. For example, when the parents' genotypes at a locus differ, if the female parent is AA and the male parent is aa, the polar nucleus genotype of the female parent is A and the sperm genotype of the male parent is a, so the endosperm DNA genotype is AAa. Whether A and a come from the male parent or the female parent can then be judged from the numbers of reads supporting genotype A and genotype a at that locus in second-generation sequencing of the endosperm.
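The dosage argument above can be written out as a small calculation. The following is a minimal sketch, assuming that read counts are roughly proportional to allele copy number in the triploid endosperm; the function name `endosperm_allele_fractions` is hypothetical and not part of the application.

```python
from fractions import Fraction


def endosperm_allele_fractions(maternal_allele: str, paternal_allele: str) -> dict:
    """Expected allele fractions in triploid endosperm.

    Assumption: the endosperm carries two copies of the maternal allele
    (two polar nuclei) and one copy of the paternal allele (one sperm).
    """
    copies = {}
    copies[maternal_allele] = copies.get(maternal_allele, 0) + 2
    copies[paternal_allele] = copies.get(paternal_allele, 0) + 1
    total = sum(copies.values())
    return {allele: Fraction(n, total) for allele, n in copies.items()}


# Female parent AA x male parent aa gives endosperm AAa.
fractions = endosperm_allele_fractions("A", "a")
print(fractions["A"], fractions["a"])  # 2/3 1/3
```

Under this assumption the maternal allele is expected at about twice the read depth of the paternal allele, which is why a read ratio near 2 points to the more abundant allele being maternal.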
The embodiments of the application disclose a method, a device, electronic equipment and a medium for detecting the genotype of a hybrid parent. The detection method uses the data amount of each endosperm genotype to judge the parent genotype of the hybrid, and can accurately detect the parent genotype at a given mutation site or sequencing site. The embodiments overcome a series of limitations of the traditional methods, infer the parent genotype directly, remedy the inability of the prior art to trace the parents directly from the hybrid, and achieve high detection accuracy.
Compared with the traditional method, the detection device, the electronic equipment and the detection medium provided by the embodiment directly utilize the hybrid seeds to infer the parent genotype, so that the complicated step of comparing with the parent is omitted, the whole process is developed on the basis of the hybrid seeds, the process of the parent genotype inference is greatly simplified, and the analysis efficiency is improved.
The detection method, the detection device, the electronic equipment and the detection medium provided by the embodiment directly estimate the genotypes of the male parent and the female parent by setting simple and visual judgment conditions such as the relation between the reading depth proportion and the number, and simplify the interpretation process of the results.
The detection method, the detection device, the electronic equipment and the detection medium provided by the embodiment are not only suitable for parent-child relationship identification, but also can play a role in various aspects such as variety intellectual property protection, breeding practice and the like, and have application potential in the field of agriculture.
Illustrative method for detecting the genotype of a hybrid parent
FIG. 1 shows a schematic flow chart of a method for detecting the genotype of a maize hybrid parent according to an embodiment. The detection method comprises the following steps:
S100, obtaining first genome data of the hybrid seeds through sequencing;
S200, comparing the first genome data to a reference genome, and processing a comparison result to obtain second genome data;
S300, extracting mutation site data in the second genome data;
S400, judging the genotype of the hybrid parent according to the mutation site data.
In these examples, the parent genotype of the hybrid is judged using the data amount of endosperm genotype, the judgment logic is simple and the judgment result is accurate, and the computational burden for the electronic equipment and medium processing the detection method is reduced.
In some embodiments of step S100, the hybrid seed is soaked in warm water for 24 hours and the endosperm is separated using forceps, blades, etc. The separated endosperm is dried in a 65 ℃ oven for 24 hours, and the dried endosperm is ground using a ball mill (100 r/s, 30 s). Genomic DNA is extracted from the ground endosperm using the CTAB method, and the sites to be identified are subjected to second-generation DNA sequencing to obtain sequencing data (namely the first genome data), wherein the sequencing depth is not less than 250X.
Fig. 2 shows a flow chart of step S200 according to an embodiment. Step S200 includes: S201, comparing the first genome data to the reference genome; S202, removing the duplicate sequences produced by PCR amplification; S203, generating a variant file in VCF format.
In some embodiments, the Fastq files (first genome data) from the sequencing are aligned to the reference genome using BWA software, and after PCR duplicates are removed, GATK software is used to generate a variant file in VCF format. For example, hard filtering is performed using GATK with "--filter-expression 'QUAL < 30.0 || MQ < 50.0 || QD < 2.0'", and variant quality control is then performed using VCFtools with the parameters "--max-alleles 2 --min-meanDP", to obtain a usable variant file in VCF format.
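To make the hard-filtering condition concrete, the sketch below restates it as a plain predicate, reading the quoted expression as QUAL < 30.0 || MQ < 50.0 || QD < 2.0. It is an illustration only, not a replacement for GATK; the function name and the variant records are made up for the example.

```python
def fails_hard_filter(qual: float, mq: float, qd: float) -> bool:
    """Return True if a variant would be flagged by the hard filter.

    A variant is flagged when ANY of the three conditions holds,
    mirroring the '||' (logical OR) in the filter expression.
    """
    return qual < 30.0 or mq < 50.0 or qd < 2.0


# Hypothetical variant annotations, for illustration only.
variants = [
    {"id": "site1", "QUAL": 812.4, "MQ": 60.0, "QD": 18.3},  # passes
    {"id": "site2", "QUAL": 25.0, "MQ": 60.0, "QD": 12.0},   # low QUAL
    {"id": "site3", "QUAL": 400.0, "MQ": 42.0, "QD": 15.0},  # low MQ
    {"id": "site4", "QUAL": 150.0, "MQ": 58.0, "QD": 1.2},   # low QD
]

kept = [
    v["id"]
    for v in variants
    if not fails_hard_filter(v["QUAL"], v["MQ"], v["QD"])
]
print(kept)  # ['site1']
```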
Fig. 3 shows a flow chart of step S300 according to an embodiment. Step S300 includes: S301, extracting the gt table from the variant file; S302, extracting the Allele Depth (AD) parameters from the gt table, wherein the AD parameters comprise, for each mutation site, the number of reads supporting the first endosperm genotype and the number of reads supporting the second endosperm genotype.
In some embodiments, after the VCF file is read using the R language package "vcfR", the gt table is extracted, and the Allele Depth (AD) parameters are extracted from the gt table. They contain, for each mutation site, the number of reads supporting endosperm genotype one (GT1) and the number of reads supporting endosperm genotype two (GT2).
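The embodiment uses the R package "vcfR" for this step. As a language-neutral illustration of what is being extracted, the sketch below parses the AD field from one VCF data line in plain Python. The function name `parse_ad` and the example record are assumptions made for illustration; the sample names follow FIG. 8. Missing values (".") are not handled.

```python
def parse_ad(header_line: str, record_line: str) -> dict:
    """Map each sample name to (GT1 reads, GT2 reads) taken from the AD field."""
    samples = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT, e.g. "GT:AD:DP"; find where AD sits.
    ad_index = fields[8].split(":").index("AD")
    depths = {}
    for sample, value in zip(samples, fields[9:]):
        ad = value.split(":")[ad_index]  # e.g. "340,165"
        gt1_reads, gt2_reads = (int(x) for x in ad.split(",")[:2])
        depths[sample] = (gt1_reads, gt2_reads)
    return depths


# Hypothetical header and record, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA1667-P\tA1667-R"
record = "1\t12345\t.\tA\tG\t900\tPASS\t.\tGT:AD:DP\t0/1:340,165:505\t0/1:30,420:450"

depths = parse_ad(header, record)
print(depths)  # {'A1667-P': (340, 165), 'A1667-R': (30, 420)}
```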
Fig. 4 shows a flow chart of step S400 according to an embodiment. Step S400 includes: S401, obtaining the data amount of the first endosperm genotype and the data amount of the second endosperm genotype of each mutation site in the mutation site data; S402, judging the parent genotype of the hybrid according to the data amount of the first endosperm genotype and the data amount of the second endosperm genotype.
In some embodiments, the data amount of the first endosperm genotype is the number of reads of the first endosperm genotype at each mutation site in the second genome data, and the data amount of the second endosperm genotype is the number of reads of the second endosperm genotype at each mutation site in the second genome data.
In some embodiments, as shown in fig. 5, the step of S400 further includes: S403, if the data amount of the first endosperm genotype is larger than the data amount of the second endosperm genotype and the ratio of the data amount of the first endosperm genotype to the data amount of the second endosperm genotype is larger than 10, the male parent genotype of the hybrid corresponds to the first endosperm genotype, and the female parent genotype of the hybrid corresponds to the first endosperm genotype.
In some embodiments, as shown in fig. 5, the step of S400 further includes: S404, if the data amount of the second endosperm genotype is larger than the data amount of the first endosperm genotype and the ratio of the data amount of the second endosperm genotype to the data amount of the first endosperm genotype is larger than 10, the male parent genotype of the hybrid corresponds to the second endosperm genotype, and the female parent genotype of the hybrid corresponds to the second endosperm genotype.
In some embodiments, as shown in fig. 5, the step of S400 further includes: S405, if the ratio of the data amount of the first endosperm genotype to the data amount of the second endosperm genotype is more than 1.5 and less than 2.5, the male parent genotype of the hybrid corresponds to the second endosperm genotype, and the female parent genotype of the hybrid corresponds to the first endosperm genotype.
In some embodiments, as shown in fig. 5, the step of S400 further includes: S406, if the ratio of the data amount of the second endosperm genotype to the data amount of the first endosperm genotype is more than 1.5 and less than 2.5, the male parent genotype of the hybrid corresponds to the first endosperm genotype, and the female parent genotype of the hybrid corresponds to the second endosperm genotype.
In some embodiments, as shown in fig. 5, the detection method further comprises: S500, deleting the mutation site data which does not meet the judgment conditions of the steps S403 to S406.
In some embodiments, as shown in fig. 5, the detection method further comprises: S600, taking the non-mutation site data in the second genome data as the genotype of the reference genome.
In some embodiments, as shown in fig. 5, the detection method further comprises: S700, outputting a detection result, wherein the detection result comprises genotypes of male parent and female parent of each sequencing site.
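The judgment conditions of steps S403 to S406 and the deletion in step S500 can be sketched as follows. This is an illustrative reading of the rules, not the applicant's implementation. The function name `infer_parents` and the example read counts are hypothetical, and treating a zero read count in the denominator as an infinite ratio is an assumption, since the text does not say how that case is handled.

```python
from typing import Optional, Tuple


def infer_parents(gt1_reads: int, gt2_reads: int) -> Optional[Tuple[str, str]]:
    """Return (male parent, female parent) as 'GT1'/'GT2', or None.

    None means no rule in S403-S406 applies, so the site is deleted (S500).
    """
    def ratio(a: int, b: int) -> float:
        # Assumption: a zero denominator gives an infinite ratio.
        if b == 0:
            return float("inf") if a > 0 else 0.0
        return a / b

    r12 = ratio(gt1_reads, gt2_reads)
    r21 = ratio(gt2_reads, gt1_reads)
    if gt1_reads > gt2_reads and r12 > 10:  # S403: both parents GT1
        return ("GT1", "GT1")
    if gt2_reads > gt1_reads and r21 > 10:  # S404: both parents GT2
        return ("GT2", "GT2")
    if 1.5 < r12 < 2.5:                     # S405: GT1 maternal, GT2 paternal
        return ("GT2", "GT1")
    if 1.5 < r21 < 2.5:                     # S406: GT2 maternal, GT1 paternal
        return ("GT1", "GT2")
    return None


# Hypothetical read counts per site: (GT1 reads, GT2 reads).
site_depths = {"site1": (340, 165), "site2": (30, 420), "site3": (250, 250)}
calls = {site: infer_parents(*ad) for site, ad in site_depths.items()}
# S500: drop sites that satisfied none of the conditions.
kept_calls = {site: call for site, call in calls.items() if call is not None}
print(kept_calls)  # {'site1': ('GT2', 'GT1'), 'site2': ('GT2', 'GT2')}
```

The 1.5 to 2.5 window brackets the 2:1 maternal-to-paternal dosage expected in triploid endosperm, while a ratio above 10 is read as both parents carrying the same genotype.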
In a specific embodiment, the detection method comprises:
1. Material to be measured
2026 maize inbred lines were each used as a male parent and crossed with the same female parent. Five hybrid seeds were randomly selected from the resulting hybrids, and for each the male parent genotype was inferred by the method provided in the above embodiments and compared with the actual genotype of the true male parent.
2. Separating corn seed endosperm and grinding
The seeds are soaked in cold water for 24 hours, the endosperm is separated with forceps and blades, and the endosperm is ground to powder with a ball mill. DNA is extracted using the CTAB method, and the total amount of DNA is not less than 4 μg.
3. Detection of
S100, performing second generation sequencing, wherein the sequencing depth is 500X, and obtaining sequencing data (first genome data).
S200, aligning the sequencing data to the maize B73 reference genome using BWA software, removing PCR duplicates from the BAM file using GATK, and calling SNPs against the B73 reference genome from the deduplicated BAM file using GATK. The resulting VCF file is then filtered. Hard filtering is first performed using GATK with "--filter-expression 'QUAL < 30.0 || MQ < 50.0 || QD < 2.0'". Variant quality control is then performed using VCFtools with the parameters "--max-alleles 2 --min-meanDP". A variant file in VCF format (second genome data) is obtained.
S300, as shown in FIG. 8, after the VCF file is read using the R package vcfR, the gt table is extracted. As shown in FIG. 9, the allele depth (AD) parameter is extracted from the gt table; for each mutation site, it contains the number of reads supporting endosperm genotype one (GT1) and the number of reads supporting endosperm genotype two (GT2).
S400, judging the genotypes of the male parent and the female parent at each mutation site as follows, where GT1 and GT2 denote the numbers of reads supporting the respective endosperm genotypes: if GT1 > GT2 and GT1/GT2 > 10, the male parent is GT1 and the female parent is GT1. If GT2 > GT1 and GT2/GT1 > 10, the male parent is GT2 and the female parent is GT2. If GT1/GT2 > 1.5 and GT1/GT2 < 2.5, the male parent is GT2 and the female parent is GT1. If GT2/GT1 > 1.5 and GT2/GT1 < 2.5, the male parent is GT1 and the female parent is GT2.
S500, deleting the mutation site data which does not meet the judgment conditions of the steps S403 to S406.
S600, based on the sequencing position information, assigning the genotype of the reference genome to every sequenced position that has no variation information (that is, is absent from the VCF file); at such positions, both the male parent and the female parent carry the genotype of the reference genome.
S700, outputting a detection result, wherein the detection result comprises genotypes of male parent and female parent of each sequencing site.
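Steps S300 to S600 can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation (which reads the VCF with the R package vcfR). The function names `parse_ad`, `call_parents` and `infer_site` are hypothetical, sites are assumed to be biallelic so that the AD field holds exactly two read counts, and treating a zero read count as an unbounded ratio is an assumption, since the source does not say how a zero denominator is handled. The 1.5 to 2.5 window is consistent with the triploid maize endosperm, which carries two maternal genome copies and one paternal copy.

```python
from typing import Dict, Optional, Tuple

# Thresholds taken from step S400; the constant names are illustrative.
HOMO_RATIO = 10.0
HET_LOW, HET_HIGH = 1.5, 2.5


def parse_ad(format_field: str, sample_field: str) -> Optional[Tuple[int, int]]:
    """Return (GT1 reads, GT2 reads) from one VCF sample column, or None."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys or len(values) <= keys.index("AD"):
        return None
    depths = values[keys.index("AD")].split(",")
    # Assumption: biallelic site, so AD has exactly two integer counts.
    if len(depths) != 2 or not all(d.isdigit() for d in depths):
        return None
    return int(depths[0]), int(depths[1])


def call_parents(gt1: int, gt2: int) -> Optional[Tuple[str, str]]:
    """Apply the S400 rules; return (male, female) or None if no rule fits (S500)."""
    # Assumption: a zero minority count is treated as a ratio above 10.
    if gt1 > gt2 and (gt2 == 0 or gt1 / gt2 > HOMO_RATIO):
        return ("GT1", "GT1")
    if gt2 > gt1 and (gt1 == 0 or gt2 / gt1 > HOMO_RATIO):
        return ("GT2", "GT2")
    if gt1 == 0 or gt2 == 0:
        return None
    if HET_LOW < gt1 / gt2 < HET_HIGH:
        return ("GT2", "GT1")  # majority allele is attributed to the female parent
    if HET_LOW < gt2 / gt1 < HET_HIGH:
        return ("GT1", "GT2")
    return None


def infer_site(pos: int, ad_by_pos: Dict[int, Tuple[int, int]]) -> Optional[Tuple[str, str]]:
    """S600: positions absent from the variant file take the reference genotype."""
    if pos not in ad_by_pos:
        return ("REF", "REF")
    return call_parents(*ad_by_pos[pos])
```

For example, a site with AD "30,15" has a GT1/GT2 ratio of 2.0, so the sketch assigns GT2 to the male parent and GT1 to the female parent, whereas a site with AD "50,10" (ratio 5.0) matches no rule and is dropped.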
4. Results
Figures 10-14 show, for each of the 5 hybrid seeds, the percent identity between the inferred male parent genotype and the genotypes of all 2,026 candidate male parents. The true male parent genotype was 92.2% to 94.9% identical to the inferred male parent genotype, while consistency with the other inbred lines was lower. Among the 5 samples, the true male parent ranked first in 4 and second in 1.
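The comparison against the candidate male parents can be illustrated with a short Python sketch. The source does not state how the percent identity was computed; the sketch assumes it is the share of jointly genotyped sites at which the inferred and candidate alleles are identical. The names `percent_identity` and `rank_candidates` and the toy data in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Genotype = Dict[int, str]  # position -> allele call


def percent_identity(inferred: Genotype, candidate: Genotype) -> float:
    """Percentage of shared sites at which both genotypes carry the same allele."""
    shared = [pos for pos in inferred if pos in candidate]
    if not shared:
        return 0.0
    matches = sum(inferred[pos] == candidate[pos] for pos in shared)
    return 100.0 * matches / len(shared)


def rank_candidates(inferred: Genotype, panel: Dict[str, Genotype]) -> List[Tuple[str, float]]:
    """Sort candidate male parents by identity to the inferred genotype, best first."""
    scores = {name: percent_identity(inferred, geno) for name, geno in panel.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With an inferred genotype `{1: "A", 2: "C", 3: "G", 4: "T"}` and a two-line panel in which `line_a` differs at one site and `line_b` at three, `rank_candidates` places `line_a` first.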
Illustrative device for detecting hybrid parent genotypes
FIG. 6 shows a device for detecting the genotype of a hybrid parent. The detection device comprises: a sequencing unit 701 for sequencing to obtain first genome data of the hybrid; a data processing unit 702, configured to compare the first genome data to a reference genome, and process a comparison result to obtain second genome data; an extraction unit 703 for extracting the mutation site data in the second genome data; and a judging unit 704, configured to judge the genotype of the hybrid parent according to the mutation site data.
In some embodiments, the data processing unit 702 further includes an alignment module 712, a removal module 722, and a generation module 732. The alignment module 712 is used to align the first genome data to the reference genome. The removal module 722 is used to remove the repeated sequences amplified by PCR. The generation module 732 is configured to generate a variant file in VCF format.
In some embodiments, the extraction unit 703 includes a first extraction module 713 and a second extraction module 723. The first extraction module 713 is configured to extract the gt table in the variant file. The second extraction module 723 is configured to extract the allele depth parameter in the gt table, the allele depth parameter including, for each mutation site, the number of reads supporting the first endosperm genotype and the number of reads supporting the second endosperm genotype.
In some embodiments, the judging unit 704 includes an acquisition module 714 and a judging module 724. The acquisition module 714 is configured to obtain a data amount of a first endosperm genotype and a data amount of a second endosperm genotype for each mutation site in the mutation site data. The judging module 724 is used for judging the hybrid parent genotype according to the data amount of the first endosperm genotype and the data amount of the second endosperm genotype.
In some embodiments, the detection apparatus further includes a data deletion unit 705 for deleting the mutation site data that does not meet the judgment conditions of steps S403 to S406.
In some embodiments, the detection device further comprises a reference genotype extraction unit 706 for taking the mutation-free locus data in the second genome data as the genotype of the reference genome.
Illustrative electronic device
Fig. 7 illustrates a block diagram of an electronic device as disclosed in an embodiment. The electronic device includes a processor 800 and a memory 900. Stored in the memory are computer program instructions that, when executed by the processor, cause the processor to perform the method for detecting the genotype of a hybrid parent disclosed in the above embodiments.
Processor 800 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in an electronic device to perform desired functions.
Memory 900 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 800 to implement the detection method and/or other desired functions of the various embodiments of the application described above. Various contents, such as data tables, can also be stored in the computer-readable storage medium.
In one example, the electronic device may further include an input device 901 and an output device 902, which are interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 901 may be, for example, a keyboard or a mouse, and is used for inputting the first genome data of the hybrid obtained by sequencing. The output device 902 may include, for example, a display, a speaker, a printer, and a communication network with remote output apparatus connected thereto, and is configured to output the detection result, which includes the genotypes of the male parent and the female parent at each sequencing site.
Of course, only some of the components of the electronic device relevant to the present application are shown in fig. 7 for simplicity, components such as buses, input/output interfaces, etc. being omitted. In addition, the electronic device may include any other suitable components depending on the particular application.
Illustrative computer program product and computer readable storage medium
In addition to the methods and apparatus described above, embodiments of the application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps of the detection method in accordance with the various embodiments of the application described in the section "exemplary method" above in this specification.
The program code of the computer program product for performing operations of embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having a computer program product stored thereon. The computer program product, when executed by a processor, causes the processor to perform the steps of the detection method according to the various embodiments of the application described in the section "exemplary method" above in this specification.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be construed as necessarily possessed by the various embodiments of the application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in the present application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.
Claims (8)
1. A method for detecting a genotype of a hybrid parent, comprising:
sequencing to obtain first genome data of the hybrid;
comparing the first genome data to a reference genome, and processing a comparison result to obtain second genome data;
extracting mutation site data in the second genome data;
judging the parent genotype of the hybrid according to the mutation site data;
wherein the step of "comparing the first genome data to a reference genome, and processing a comparison result" comprises: comparing the first genome data to the reference genome, removing the repeated sequence amplified by PCR, and generating a variation file in a VCF format;
wherein the step of "extracting mutation site data in the second genome data" comprises: extracting a gt table in the variation file; extracting the allele depth parameter in the gt table, wherein the allele depth parameter comprises, for each mutation site, the number of reads supporting the first endosperm genotype and the number of reads supporting the second endosperm genotype;
wherein the step of "judging the parent genotype of the hybrid according to the mutation site data" comprises: obtaining a data amount of a first endosperm genotype and a data amount of a second endosperm genotype for each mutation site in the mutation site data; judging the parent genotype of the hybrid according to the data amount of the first endosperm genotype and the data amount of the second endosperm genotype; the data amount of the first endosperm genotype is the number of reads of the first endosperm genotype at each mutation site in the second genome data, and the data amount of the second endosperm genotype is the number of reads of the second endosperm genotype at each mutation site in the second genome data;
the detection method further includes taking the non-mutated site data in the second genomic data as the genotype of the reference genome.
2. The method of claim 1, wherein if the amount of data of the first endosperm genotype is greater than the amount of data of the second endosperm genotype and the ratio of the amount of data of the first endosperm genotype to the amount of data of the second endosperm genotype is greater than 10, the male parent genotype of the hybrid corresponds to the first endosperm genotype and the female parent genotype of the hybrid corresponds to the first endosperm genotype.
3. The method of claim 1, wherein if the amount of data of the second endosperm genotype is greater than the amount of data of the first endosperm genotype and the ratio of the amount of data of the second endosperm genotype to the amount of data of the first endosperm genotype is greater than 10, the male parent genotype of the hybrid corresponds to the second endosperm genotype and the female parent genotype of the hybrid corresponds to the second endosperm genotype.
4. The method of claim 1, wherein if the ratio of the amount of data of the first endosperm genotype to the amount of data of the second endosperm genotype is greater than 1.5 and less than 2.5, the male parent genotype of the hybrid corresponds to the second endosperm genotype and the female parent genotype of the hybrid corresponds to the first endosperm genotype.
5. The method of claim 1, wherein if the ratio of the amount of data of the second endosperm genotype to the amount of data of the first endosperm genotype is greater than 1.5 and less than 2.5, the male parent genotype of the hybrid corresponds to the first endosperm genotype and the female parent genotype of the hybrid corresponds to the second endosperm genotype.
6. A device for detecting a genotype of a hybrid parent, comprising:
a sequencing unit for sequencing to obtain first genome data of the hybrid;
the data processing unit is used for comparing the first genome data to a reference genome and processing a comparison result to obtain second genome data;
an extraction unit for extracting mutation site data in the second genome data;
the judging unit is used for judging the parental genotypes of the hybrid seeds according to the mutation site data;
wherein the step of "comparing the first genome data to a reference genome and processing a comparison result" comprises: comparing the first genome data to the reference genome, removing the repeated sequence amplified by PCR, and generating a variation file in a VCF format;
wherein the step of "extracting mutation site data in the second genome data" comprises: extracting a gt table in the variation file; extracting the allele depth parameter in the gt table, wherein the allele depth parameter comprises, for each mutation site, the number of reads supporting the first endosperm genotype and the number of reads supporting the second endosperm genotype;
wherein the step of "judging the parental genotypes of the hybrid seeds according to the mutation site data" comprises: obtaining a data amount of a first endosperm genotype and a data amount of a second endosperm genotype for each mutation site in the mutation site data; judging the parent genotype of the hybrid according to the data amount of the first endosperm genotype and the data amount of the second endosperm genotype; the data amount of the first endosperm genotype is the number of reads of the first endosperm genotype at each mutation site in the second genome data, and the data amount of the second endosperm genotype is the number of reads of the second endosperm genotype at each mutation site in the second genome data;
the detecting further includes taking the non-mutated site data in the second genome data as the genotype of the reference genome.
7. An electronic device, comprising:
A processor; and
A memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the detection method of any of claims 1-5.
8. A computer readable storage medium having stored thereon computer program instructions which, when executed by a computing device, are operable to perform the detection method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410093508.4A CN118098348B (en) | 2024-01-23 | 2024-01-23 | Method and device for detecting genotype of hybrid parent, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118098348A (en) | 2024-05-28 |
CN118098348B (en) | 2024-08-06 |
Family
ID=91148373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410093508.4A Active CN118098348B (en) | 2024-01-23 | 2024-01-23 | Method and device for detecting genotype of hybrid parent, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118098348B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118703693A (en) * | 2024-08-28 | 2024-09-27 | 中山大学 | Hybrid allele parent specific expression identification method based on biparental map genome |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102747153A (en) * | 2012-07-03 | 2012-10-24 | 董志平 | Millet waxy gene cosegregation molecular marker and detection method thereof |
CN107177665A (en) * | 2016-03-09 | 2017-09-19 | 中国科学院上海生命科学研究院 | Function linked marker 0707-1 and its application in corn germplasm improvement |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8487167B2 (en) * | 2009-08-10 | 2013-07-16 | The United States Of America, As Represented By The Secretary Of Agriculture | Non-transgenic soft textured tetraploid wheat plants having grain with soft textured endosperm, endosperm therefrom and uses thereof |
US20170124250A1 (en) * | 2015-11-03 | 2017-05-04 | International Business Machines Corporation | Estimating multiple parents from a matrix of f1 hybrid progeny |
CN106755300B (en) * | 2016-11-17 | 2019-08-23 | 中国科学院华南植物园 | A method of identification Kiwi berry hybrid parent is to filial generation genome contribution proportion |
KR102010859B1 (en) * | 2017-11-10 | 2019-08-16 | 대한민국 | Markers for identifying floury endosperm characteristics and use thereof |
CN110964845B (en) * | 2020-01-03 | 2022-06-07 | 江苏省农业科学院 | Method for tracing hybrid source of corn pollination and InDel molecular marker |
CN115052994A (en) * | 2020-05-22 | 2022-09-13 | 深圳华大智造科技股份有限公司 | Method for determining base type of predetermined site in chromosome of embryonic cell and application thereof |
CN112931183A (en) * | 2021-02-05 | 2021-06-11 | 江苏省农业科学院 | Efficient corn breeding method based on single plant evaluation and whole genome selection technology |
CN113053459A (en) * | 2021-03-17 | 2021-06-29 | 扬州大学 | Hybrid prediction method for integrating parental phenotypes based on Bayesian model |
CN115161408A (en) * | 2022-05-25 | 2022-10-11 | 华中农业大学 | DNA methylation detection of maize genomic target segments |
CN114990256A (en) * | 2022-06-23 | 2022-09-02 | 中国科学院植物研究所 | Molecular marker of sorghum sub-aleurone layer thickness related gene and application |
CN115623984B (en) * | 2022-11-02 | 2023-06-23 | 中国林业科学研究院经济林研究所 | Apricot plant distant hybridization high-affinity backbone parent selection method and cotyledon abortive hybrid embryo rescue method based on genome heterozygosity |
CN116004893A (en) * | 2022-11-07 | 2023-04-25 | 广东省中医院(广州中医药大学第二附属医院、广州中医药大学第二临床医学院、广东省中医药科学院) | Interseed hybrid of spring sand and Hainan sand and identification method of parent thereof |
CN115995262B (en) * | 2023-03-21 | 2023-05-23 | 济南大学 | Method for analyzing corn genetic mechanism based on random forest and LASSO regression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||