CN112885408A - Method and device for detecting SNP marker locus based on low-depth sequencing

Publication number
CN112885408A
Authority
CN
China
Prior art keywords
sequencing
depth
genome
low
haplotype
Prior art date
Legal status
Pending
Application number
CN202110199054.5A
Other languages
Chinese (zh)
Inventor
胡晓湘
王宇哲
朱迪
任江丽
李宁
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
2021-02-22
Filing date
2021-02-22
Publication date
2021-06-01
Application filed by China Agricultural University
Priority to CN202110199054.5A
Publication of CN112885408A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/50: Mutagenesis
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30: Data warehousing; Computing architectures

Abstract

The invention relates to the field of genetics, in particular to a method and a device for detecting SNP marker sites based on low-depth sequencing. The method comprises the following steps: obtaining genomic DNA of an individual to be detected; performing low-depth whole genome sequencing on the genomic DNA of the individual, and aligning the sequencing result to a reference genome to obtain polymorphic site information; and genotyping the polymorphic site information based on a hidden Markov model, using a reference haplotype database, wherein the reference haplotype database comprises variant site information of the breeding population to which the individual belongs. Using the low-depth sequencing data of a single sample, the invention achieves highly accurate, standardized genotyping of on the order of ten million SNP sites across the whole genome in a very short time.

Description

Method and device for detecting SNP marker locus based on low-depth sequencing
Technical Field
The invention relates to the field of genetics, in particular to a method and a device for detecting SNP marker sites based on low-depth sequencing.
Background
Single nucleotide polymorphisms (SNPs) are currently the most widely used genetic markers; they are numerous, widely distributed across the genome, and genetically stable. In human, animal, and plant research, SNPs are widely applied to dissecting the genetic mechanisms of traits, studying selection and evolution, genomic prediction, and related directions.
The number of genetic markers required depends on the research question. Work that needs high-density genome-wide markers mainly comprises genome-wide association analysis and genomic selection in animals and plants. In genome-wide association analysis, a higher density of genome-wide markers allows the true causal mutations of the target phenotype to be identified more accurately. Genomic selection (GS), which has emerged in animal and plant breeding in recent years, uses high-density SNPs covering the whole genome to construct a genomic relationship matrix, calculates each individual's genomic estimated breeding value, and selects individuals accordingly. Notably, genomic selection is applied research: the number of samples used for selection and breeding by genomic selection grows substantially every year, and in actual production the approach is very sensitive to three factors, namely the accuracy, turnaround time, and price of marker genotyping.
Current genome-wide SNP genotyping methods fall into two main categories: commercial SNP chips and genome sequencing. Commercial SNP chips were the mainstream method for early genome-wide genotyping because they are highly standardized, accurate, and simple to operate. As research has expanded, however, their shortcomings have become apparent. Most chips carry tens of thousands to hundreds of thousands of SNP markers, which cannot meet every type of research need; a given chip detects only a fixed set of variant sites and is hard to extend; commercial chips are designed from a subset of mainstream breeds, so some marker sites fail in particular populations; and, as sequencing technology continues to develop, the cost advantage of chip genotyping has gradually disappeared. On the other hand, although the cost of whole genome sequencing keeps falling, it is still far from practical for large-scale population breeding, which has given rise to a number of targeted sequencing alternatives. Reduced-representation genome sequencing, for example, lowers cost by enriching and sequencing a small fraction of the genome. Compared with chips, these methods have made great progress in marker density and cost, but targeted sequencing does not truly cover the whole genome and its analysis requires considerable bioinformatics expertise, so it has not brought a qualitative breakthrough to genome-wide genotyping in breeding practice.
To achieve higher-density genotyping, the main strategy at present is genotype imputation, such as imputing a low-density chip up to a high-density chip, or imputing chip data with high-depth sequencing data. These methods, however, rely heavily on a high-quality reference haplotype data set (reference panel): the panel must be large and its own genotypes highly reliable, and it must also be closely related genetically to the population being imputed. At present, most studies in livestock and poultry build the reference haplotype database from high-depth sequencing of a small number of samples. Numerous reports show that panels of this quality cannot guarantee highly accurate imputation, which means tens or even hundreds of thousands of wrong genotypes when the number of markers is in the millions. The strategy is also computationally complex and slow, so it remains ill-suited to breeding practice.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a device for detecting SNP marker sites based on low-depth sequencing. The invention utilizes the low-depth sequencing data of the sample to be tested and carries out gene typing based on the reference haplotype database, thereby shortening the time of SNP locus typing to a greater extent and having extremely high accuracy.
In a first aspect, the present invention provides a method for detecting SNP marker loci based on low-depth sequencing, comprising:
obtaining genome DNA of an individual to be detected;
performing first low-depth whole genome sequencing on the genome DNA, comparing a sequencing result to a reference genome, and then performing genotyping;
the genotyping is based on a hidden Markov model, and polymorphic sites in a sequencing result are genotyped by utilizing a reference haplotype database;
the reference haplotype database comprises haplotype information of a breeding population to which the individual to be detected belongs, which is obtained by sequencing a second low-depth whole genome.
Further, the sequencing depth of the first low-depth whole genome sequencing is between 0.1X and 1X.
Further, the genotyping is:
and predicting the probability of the mutation site belonging to each haplotype source in the reference haplotype database by a hidden Markov model aiming at each mutation site in the sequencing result, and outputting the genotyping result of the mutation site according to the information of the haplotype with the highest probability.
Further, the method for constructing the reference haplotype database comprises the following steps:
obtaining genomic DNAs of a plurality of individuals of the breeding population, and performing the second low-depth sequencing to obtain sequencing data;
comparing the sequencing data to a reference genome, and judging and screening population polymorphic sites to obtain position information of each polymorphic site in the breeding population;
and processing the mutation site information of the breeding population through an EM iterative algorithm to construct a reference haplotype database.
Further, the second low depth whole genome sequencing has a population sequencing depth of between 300X and 600X.
Further, the plurality of individuals is 1500 or more individuals.
Further, the method for constructing the reference haplotype database further comprises:
after completing the detection of the SNP marker sites, haplotype data obtained by the detection result is incorporated into the reference haplotype database.
The method for detecting SNP marker loci based on low-depth sequencing fits the practical breeding workflow: breeding first requires a large reference population, which matches the construction of the reference haplotype database; and the individuals to be evaluated accumulate gradually, in small batches over many rounds, which matches the single-sample analysis used in the SNP detection step.
In a second aspect, the present invention provides an electronic device comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the steps of the program instructions to enable performance of the method as provided by the first aspect.
In a third aspect, the invention provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the steps of the method as provided in the first aspect.
The invention has the following beneficial effects:
according to the invention, a reference haplotype database suitable for a large-scale sample source of a target group is established at low cost by using low-depth sequencing data, and a database construction link and a detection link are independently operated, so that the high-density SNP genotyping of a single low-depth sample, which is rapid, economic, accurate and covers the whole genome, is realized.
In addition, the reference haplotype database provided by the invention also has updating iteration, namely, after the samples obtained by detection reach a certain number, the information of the new samples is updated into the reference haplotype database at one time, so that the high accuracy of the typing of the samples produced subsequently is ensured.
Drawings
FIG. 1 is a flow chart of the method for detecting SNP marker loci based on low-depth sequencing provided by the invention.
Fig. 2 is a schematic physical structure diagram of an electronic device provided in the present invention.
FIG. 3 is a graph showing the results of the relationship between different reference sample sizes and sequencing depths and genotyping accuracy provided in example 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow chart of a method for detecting SNP marker loci based on low-depth sequencing, and as shown in FIG. 1, the invention provides a method for detecting SNP marker loci based on low-depth sequencing, which comprises:
s1, obtaining genome DNA of the individual to be detected;
Specifically, in practical application, genomic DNA of the individual to be detected can be obtained by common methods in the art. For example, the whole genome can be randomly fragmented enzymatically or by sonication; DNA fragments obtained by any method that randomly fragments the whole genome can be used in subsequent sequencing and other steps.
S2, performing first low-depth whole genome sequencing on the genomic DNA;
specifically, based on the above scheme, low-depth whole genome sequencing can be performed on a second generation sequencing platform by a conventional method in the field, and the sequencing depth is preferably between 0.1X and 1X.
S3, comparing the sequencing result with a reference genome, and then carrying out genotyping;
the genotyping is specifically that based on a hidden Markov model, polymorphic sites in a sequencing result are genotyped by utilizing a reference haplotype database;
Further, the reference genome should be from the same species as the individual to be detected; for example, the pig reference genome is used when genotyping pigs, and the chicken reference genome when genotyping chickens.
Further, aligning the sequencing data to the reference genome yields an alignment file (BAM file) for each individual.
Further, the genotyping is as follows:
for each variant site in the sequencing result, a hidden Markov model predicts the probability that the site derives from each source haplotype in the reference haplotype database, and the genotype of the site is output according to the haplotype with the highest probability.
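The genotyping step can be illustrated with a deliberately simplified sketch. It is not the patent's implementation: it treats a single haploid target, assumes a uniform switch probability between source haplotypes and a fixed read error rate, and all function names and parameter values are hypothetical. It computes, per site, the posterior probability of each source haplotype with the forward-backward algorithm and reports the allele carried by the most probable one.

```python
from typing import List, Optional


def _normalise(values: List[float]) -> List[float]:
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def haplotype_posteriors(
    ref_haps: List[List[int]],   # K reference haplotypes, S alleles each (0 = ref, 1 = alt)
    reads: List[Optional[int]],  # allele seen in a read at each site, None if uncovered
    switch: float = 0.01,        # assumed per-site probability of changing source haplotype
    error: float = 0.01,         # assumed per-read sequencing error rate
) -> List[List[float]]:
    """Forward-backward posterior P(source haplotype = k | all reads), per site."""
    n_haps, n_sites = len(ref_haps), len(reads)

    def emission(k: int, s: int) -> float:
        if reads[s] is None:     # no read: the site carries no information
            return 1.0
        return 1.0 - error if ref_haps[k][s] == reads[s] else error

    # Forward pass; each column is normalised, so the previous column sums to 1.
    fwd = [_normalise([emission(k, 0) / n_haps for k in range(n_haps)])]
    for s in range(1, n_sites):
        prev = fwd[-1]
        fwd.append(_normalise([
            ((1.0 - switch) * prev[k] + switch / n_haps) * emission(k, s)
            for k in range(n_haps)
        ]))

    # Backward pass with the same transition model.
    bwd = [[1.0] * n_haps for _ in range(n_sites)]
    for s in range(n_sites - 2, -1, -1):
        weighted = [bwd[s + 1][k] * emission(k, s + 1) for k in range(n_haps)]
        jump = switch * sum(weighted) / n_haps
        bwd[s] = _normalise([(1.0 - switch) * w + jump for w in weighted])

    return [
        _normalise([f * b for f, b in zip(fwd[s], bwd[s])])
        for s in range(n_sites)
    ]


def call_alleles(ref_haps: List[List[int]], reads: List[Optional[int]]) -> List[int]:
    """Report, per site, the allele of the most probable source haplotype."""
    calls = []
    for s, site_post in enumerate(haplotype_posteriors(ref_haps, reads)):
        best = max(range(len(site_post)), key=site_post.__getitem__)
        calls.append(ref_haps[best][s])
    return calls


# Toy panel of three haplotypes; reads cover only sites 0, 2 and 4.
panel = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]
reads = [1, None, 1, None, 1]
print(call_alleles(panel, reads))
```

In this toy case the uncovered sites are filled in from the haplotype that best explains the covered ones, which is the intuition behind genotyping sparse low-depth data against a panel. A diploid model would track pairs of source haplotypes instead.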
The reference haplotype database provided by the invention is constructed by the following method: obtaining genomic DNAs of a plurality of individuals of the breeding population, and performing the second low-depth sequencing to obtain sequencing data; comparing the sequencing data to a reference genome, and judging and screening population polymorphic sites to obtain position information of each polymorphic site in the breeding population; and processing the mutation site information of the breeding population through an EM iterative algorithm to construct a reference haplotype database.
In this step, the sample size of the reference haplotype database construction link should be more than 1500, and the population sequencing depth (sample size for constructing database x sequencing depth of each individual) of a polymorphic site should be more than 300, so as to ensure the accuracy of detection. In practical applications, the sequencing depth can be adjusted according to the number of samples, for example, when the number of samples is 1500, the average sequencing depth is guaranteed to be more than 0.2 x, and when the number of samples is 3000, the average sequencing depth is guaranteed to be more than 0.1 x.
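The relationship stated above (population sequencing depth = sample size × per-sample depth) can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function names are hypothetical and the default of 300 is the threshold quoted in the text.

```python
def population_depth(n_samples: int, per_sample_depth: float) -> float:
    """Population sequencing depth = sample size x average per-sample depth."""
    return n_samples * per_sample_depth


def min_per_sample_depth(n_samples: int, target_population_depth: float = 300.0) -> float:
    """Average per-sample depth needed to reach the target population depth."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return target_population_depth / n_samples


for n in (1500, 3000):
    print(n, "samples ->", min_per_sample_depth(n), "x per sample")
```

For 1500 and 3000 samples this gives the 0.2× and 0.1× figures mentioned in the text.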
Furthermore, conventional software in the prior art, such as BaseVar software, can be adopted in the step to judge and screen the population polymorphic sites to obtain corresponding polymorphic site information, and certain screening standards can be set, such as EAF ≥ 0.01.
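To make the screening criterion concrete, the sketch below estimates an allele frequency from pooled low-depth reads with a small EM loop and then applies the EAF ≥ 0.01 threshold. It is a toy model, not BaseVar's algorithm: it reads EAF as an estimated alternate-allele frequency (an assumption), treats every read as an independent draw from the population with a fixed error rate, and uses hypothetical function names.

```python
def estimate_allele_frequency(n_alt: int, n_ref: int, error: float = 0.01,
                              tol: float = 1e-12, max_iter: int = 1000) -> float:
    """EM estimate of the alt-allele frequency from pooled read counts at one site."""
    if n_alt < 0 or n_ref < 0 or n_alt + n_ref == 0:
        raise ValueError("need non-negative counts and at least one read")
    if not 0.0 < error < 0.5:
        raise ValueError("error must lie strictly between 0 and 0.5")
    n_reads = n_alt + n_ref
    freq = 0.5  # starting guess
    for _ in range(max_iter):
        # E-step: probability that a read truly comes from an alt allele,
        # given that it was observed as alt or as ref.
        alt_given_alt = freq * (1 - error) / (freq * (1 - error) + (1 - freq) * error)
        alt_given_ref = freq * error / (freq * error + (1 - freq) * (1 - error))
        # M-step: the new frequency is the expected fraction of true alt reads.
        new_freq = (n_alt * alt_given_alt + n_ref * alt_given_ref) / n_reads
        if abs(new_freq - freq) < tol:
            return new_freq
        freq = new_freq
    return freq


def passes_eaf_filter(n_alt: int, n_ref: int, min_eaf: float = 0.01) -> bool:
    """Keep a site when its estimated alt-allele frequency reaches the threshold."""
    return estimate_allele_frequency(n_alt, n_ref) >= min_eaf


print(round(estimate_allele_frequency(10, 90), 4), passes_eaf_filter(10, 90))
print(passes_eaf_filter(0, 100))
```

Because each individual contributes at most a read or two at low depth, the frequency is estimated from the pooled reads rather than from individual genotype calls.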
It should be noted that the EM iterative algorithm in this step can be run with existing software, such as STITCH or fastPHASE.
Further, after the detection of the SNP marker sites is completed, haplotype data obtained by the detection result can be also incorporated into the reference haplotype database. For example, in practical applications, since constructing the reference haplotype database is a rate-limiting step, it is preferable to incorporate the detected haplotype data into the reference haplotype database after accumulating a certain number of samples for each detection, such as once after accumulating 1500 samples, which can ensure the rapidity of the detection process.
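The batched update described above can be sketched as a small bookkeeping class. This is a hypothetical illustration only: the class and method names are invented, the "rebuild" is represented by a counter rather than an actual EM run, and the default batch size of 1500 is the example figure from the text.

```python
from typing import List


class ReferencePanelUpdater:
    """Accumulate genotyped samples and merge them into the panel in batches."""

    def __init__(self, batch_size: int = 1500) -> None:
        self.batch_size = batch_size
        self.panel_samples: List[str] = []  # samples already in the reference panel
        self.pending: List[str] = []        # genotyped samples awaiting the next merge
        self.rebuilds = 0                   # number of (expensive) panel rebuilds so far

    def add_genotyped_sample(self, sample_id: str) -> bool:
        """Queue a sample; return True when this call triggered a panel rebuild."""
        self.pending.append(sample_id)
        if len(self.pending) < self.batch_size:
            return False
        # Merge the whole batch at once so detection is never blocked per sample.
        self.panel_samples.extend(self.pending)
        self.pending.clear()
        self.rebuilds += 1
        return True


updater = ReferencePanelUpdater(batch_size=3)
flags = [updater.add_genotyped_sample(f"sample_{i}") for i in range(7)]
print(flags, updater.rebuilds, len(updater.panel_samples), len(updater.pending))
```

Keeping the merge separate from per-sample detection reflects the point in the text that panel construction is the rate-limiting step.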
Fig. 2 is a schematic physical structure diagram of an electronic device provided in the present invention. Referring to Fig. 2, the electronic device includes a processor 31, a memory 32, and a bus 33, wherein the processor 31 and the memory 32 communicate with each other through the bus 33. The processor 31 is configured to call program instructions in the memory 32 to perform the methods provided by the above method embodiments, for example: obtaining genomic DNA of an individual to be detected; performing low-depth whole genome sequencing on the genomic DNA of the individual, and aligning the sequencing result to a reference genome to obtain polymorphic site information; and genotyping the polymorphic site information based on a hidden Markov model, using a reference haplotype database.
Furthermore, the logic instructions in the memory 32 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the detection method provided by the above embodiments, for example: obtaining genomic DNA of an individual to be detected; performing low-depth whole genome sequencing on the genomic DNA of the individual, and aligning the sequencing result to a reference genome to obtain polymorphic site information; and genotyping the polymorphic site information based on a hidden Markov model, using a reference haplotype database.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The present invention is further illustrated below based on more specific examples.
Example 1
1. Experimental Material
Ear tissue samples from 3000 individuals of a Duroc core pig herd were used; genomic DNA was extracted and diluted to 40 ng/μL.
2. Experimental methods
2.1 Low depth DNA library construction and sequencing
In this example, the DNA library was constructed by Tn5 enzymatic fragmentation, as follows:
(1) Tn5 transposase was loaded with the specific Tn5ME-A/Tn5Merev and Tn5ME-B/Tn5Merev adapters for 2 h at 72 ℃ to obtain working Tn5 enzyme with cut-and-paste activity. The working enzyme was diluted to 16.5 ng/μL and combined with 4 μL of 5× TAPS-MgCl2, 2 μL of dimethylformamide (DMF), and nuclease-free water to fragment 50 ng of genomic DNA at 55 ℃ for 10 min.
(2) To each reaction, 3.5 μL of 0.2% SDS was added, and the reaction was incubated at 55 ℃ for another 10 min. A PCR reaction was then performed, with 96 different index sequences in the primers to distinguish individuals.
The PCR program was: 1 × (72 ℃, 9 min); 1 × (98 ℃, 30 sec); 9 × (98 ℃, 30 sec; 63 ℃, 30 sec; 72 ℃, 3 min).
(3) The PCR product of each individual was quantified with a Qubit fluorometer (Invitrogen), and equal amounts from 96 individuals were pooled. The pool was purified with AMPure XP beads (Beckman) under two conditions: 0.55×, retaining the supernatant, and 0.1×, retaining the beads. After the concentration of the purified product was measured, library fragment size was checked on an Agilent Bioanalyzer 2100 to confirm that library quality was acceptable.
Paired-end 2 × 100 bp whole genome resequencing was performed on all samples on the MGISEQ-2000 platform, with an average sequencing depth of 0.7× per sample.
2.2 polymorphic site identification screening
The filtered raw sequencing data were aligned on an FPGA-based acceleration server using BWA, with the pig reference genome Sscrofa11.1 (ftp://ftp.ensembl.org/pub/release-99/fasta/sus_scrofa/dna/). Alignment took about 2-3 min per sample. In this example, BaseVar was used to identify polymorphic sites. Sites were screened at EAF ≥ 0.01, the population sequencing depth of each site was evaluated with a boxplot, and sites with a sequencing depth ≥ 1.5 IQR were retained as the population variant site set. This yielded 11.6M candidate polymorphic sites across the pig genome.
2.3 reference haplotype database construction
In this example, STITCH was used for the iterative EM computation, with the number of founder haplotypes preset to 10. The preliminary genotyping results served as the filtering criteria for database haplotypes, with the specific parameters imputation info score > 0.4 and Hardy-Weinberg equilibrium (HWE) p-value > 1e-6.
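The two filtering thresholds can be illustrated with a short sketch. The thresholds (info score > 0.4, HWE p-value > 1e-6) come from the text; everything else is an assumption for illustration, including the function names and the use of a one-degree-of-freedom chi-square approximation for the HWE test, which is not necessarily how STITCH computes its p-value.

```python
import math
from typing import Tuple


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square (1 d.f.) p-value for departure from Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                         # monomorphic site: no departure measurable
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_site(info_score: float, genotype_counts: Tuple[int, int, int],
              min_info: float = 0.4, min_hwe_p: float = 1e-6) -> bool:
    """Apply both thresholds quoted in the text to one site."""
    return info_score > min_info and hwe_chi2_pvalue(*genotype_counts) > min_hwe_p


print(keep_site(0.9, (25, 50, 25)))  # well imputed and in HWE
print(keep_site(0.3, (25, 50, 25)))  # fails the info score threshold
print(keep_site(0.9, (50, 0, 50)))   # no heterozygotes: strong HWE departure
```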
2.4 candidate sample mutation typing and accuracy assessment
The samples to be typed were processed with the same DNA library construction, sequencing, and alignment methods. Using the constructed reference haplotype database, the raw sequencing data of each sample to be typed were read, and genotypes at all candidate polymorphic sites were identified and typed with a hidden Markov model (HMM), finally yielding genotypes for 11.6M SNPs across the individual's whole genome. Accuracy was then assessed with the GeneSeek Genomic Profiler Porcine 80K SNP Array: genotyping results of 42 samples were evaluated, taking chromosome 13 as an example. At the sites shared by the two methods, genotype concordance reached 99.67%, demonstrating that the method is highly accurate.
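The concordance figure is the fraction of sites typed by both methods at which the two genotypes agree. A minimal sketch follows, assuming genotypes are coded as alternate-allele counts (0, 1, 2) and keyed by chromosome and position; the function name and the toy data are hypothetical and are not results from the patent.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def genotype_concordance(calls_a: Dict[Site, int], calls_b: Dict[Site, int]) -> float:
    """Fraction of shared sites at which both methods report the same genotype."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matches / len(shared)


# Toy data: three sites are shared, two of them agree.
low_depth = {("13", 100): 0, ("13", 200): 1, ("13", 300): 2, ("13", 400): 1}
chip = {("13", 100): 0, ("13", 200): 1, ("13", 300): 1, ("13", 500): 2}
print(f"{genotype_concordance(low_depth, chip):.2%}")
```

Restricting the comparison to shared sites matters because the chip covers far fewer positions than the low-depth call set.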
Example 2
This example is intended to illustrate the accuracy and timeliness of the method for detecting SNP marker sites provided by the present invention.
1. Experimental Material
Blood samples from 3000 individuals of a distant, deep-crossbred line of Huiyang Bearded chickens and Lingnan Yellow chickens were used; genomic DNA was extracted and diluted to 40 ng/μL.
2. Experimental methods
The basic methods of low-depth DNA library construction and sequencing, polymorphic site identification and screening, reference haplotype database construction, and candidate sample variant typing and accuracy evaluation were the same as in Example 1, with the following differences. The average sequencing depth per individual was about 0.8×. The reference genome was chicken GRCg6a (INSDC Assembly GCA_000002315.5, Mar 2018). Because the genomic heterozygosity and complexity of a hybrid population are much higher than those of an inbred population, the number of founder haplotypes was preset to 24. The reference haplotype database yielded 7.9M candidate polymorphic sites on the chicken autosomes (on average about one SNP per 96 bp, evenly distributed across the genome). Accuracy was evaluated using chicken Chr11 as an example: 28 individuals were analyzed, and genotypes at all 288895 SNP sites on Chr11 were successfully obtained for every individual. The same 28 individuals were additionally subjected to ultra-deep whole genome sequencing (average depth of 80× per sample) and genotyped with the standardized GATK 4.1 SNP calling workflow.
In this example, construction of the reference haplotype database used 40 cores; aligning the whole genome sequencing data of each sample to the genome took about 1-2 min, and building the database from 3000 samples took 4 h in total. In the detection step, for every 100 samples, going from raw sequencing data to genotypes for the hundreds of thousands of SNPs on one chromosome took only 8-10 min, and genotypes for all SNPs across the whole genome (on the order of ten million) can be produced by processing different chromosomes in parallel. For the 28 individuals used to evaluate accuracy, genotype concordance between the high-depth data and this method exceeded 99.71%, showing that the method remains highly accurate in a crossbred population.
In conclusion, using low-depth sequencing data of a single sample, the method achieves highly accurate, standardized genotyping of on the order of ten million SNP loci across the whole genome in a very short time.
Example 3
This example illustrates how the per-sample sequencing depth and the sample size used to construct the reference haplotype database affect genotyping accuracy.
The experimental materials and experimental methods used in this example were the same as those of Example 2. In the reference haplotype database construction step, different reference sample sizes (200, 500, 1000, 1500, 2000, 3000, 4000) and per-sample sequencing depths (0.05×, 0.1×, 0.2×, 0.3×, 0.5×) were subsampled, and the resulting genotypes were compared with the high-depth data to evaluate accuracy.
The results are shown in FIG. 3. When the average sequencing depth per sample reaches 0.2× or more and the sample size exceeds 1500, genotyping accuracy is essentially stable (remaining above 98.78%) and no longer changes appreciably as sequencing depth or sample number increases. At 0.2×, once the sample size exceeds 2000, accuracy exceeds 99%, reaching 99.13%.
Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (9)

1. A method for detecting SNP marker sites based on low-depth sequencing is characterized by comprising the following steps:
obtaining genome DNA of an individual to be detected;
performing first low-depth whole genome sequencing on the genome DNA, comparing a sequencing result to a reference genome, and then performing genotyping;
the genotyping is based on a hidden Markov model, and polymorphic sites in a sequencing result are genotyped by utilizing a reference haplotype database;
the reference haplotype database comprises haplotype information of a breeding population to which the individual to be detected belongs, which is obtained by sequencing a second low-depth whole genome.
2. The method of claim 1, wherein the first low depth whole genome sequencing has a sequencing depth between 0.1 x and 1 x.
3. The method of claim 1 or 2, wherein the genotyping is:
and predicting the probability of the mutation site belonging to each haplotype source in the reference haplotype database by a hidden Markov model aiming at each mutation site in the sequencing result, and outputting the genotyping result of the mutation site according to the information of the haplotype with the highest probability.
4. The method of claim 1, wherein the reference haplotype database is constructed by a method comprising the following steps:
obtaining genomic DNA of a plurality of individuals of the breeding population, and performing the second low-depth whole genome sequencing to obtain sequencing data;
aligning the sequencing data to a reference genome, and identifying and screening population polymorphic sites to obtain position information of each polymorphic site in the breeding population;
and processing the mutation site information of the breeding population with an EM iterative algorithm to construct the reference haplotype database.
5. The method of claim 4, wherein the second low-depth whole genome sequencing has a population sequencing depth between 300× and 600×.
6. The method of claim 4, wherein the plurality of individuals is 1500 or more individuals.
7. The method of any one of claims 4 to 6, wherein the method for constructing the reference haplotype database further comprises:
after detection of the SNP marker sites is completed, incorporating haplotype data obtained from the detection result into the reference haplotype database.
8. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 7.
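The hidden Markov model genotyping recited in claim 3 can be illustrated with a deliberately simplified sketch. The code below is not the implementation of the invention: it assumes a haploid target that copies from one reference haplotype at a time (a Li-Stephens-style copying model), biallelic sites coded 0/1, and hypothetical parameters `switch_rate` (probability of changing the copied haplotype between adjacent sites) and `error_rate` (per-read error probability). The function name `impute_haploid` and the toy data are likewise illustrative. A real diploid pipeline would consider pairs of haplotypes and use genetic distances between sites.

```python
from typing import List, Tuple


def _normalize(values: List[float]) -> List[float]:
    """Rescale to sum to 1 so long chains do not underflow."""
    total = sum(values)
    return [v / total for v in values]


def impute_haploid(
    ref_haps: List[List[int]],     # K reference haplotypes x M sites, alleles 0/1
    reads: List[Tuple[int, int]],  # per site: (reference reads, alternate reads)
    switch_rate: float = 0.01,     # assumed haplotype-switch probability per site
    error_rate: float = 0.01,      # assumed per-read error probability
) -> List[int]:
    """Return, per site, the allele of the most probable source haplotype."""
    n_haps, n_sites = len(ref_haps), len(reads)
    if n_sites == 0:
        return []

    def emission(k: int, m: int) -> float:
        # Probability of the observed reads if haplotype k is being copied.
        n_ref, n_alt = reads[m]
        if ref_haps[k][m] == 1:
            n_match, n_mismatch = n_alt, n_ref
        else:
            n_match, n_mismatch = n_ref, n_alt
        return (1.0 - error_rate) ** n_match * error_rate ** n_mismatch

    stay = 1.0 - switch_rate

    # Forward pass: evidence from the first site up to site m.
    fwd = [_normalize([emission(k, 0) / n_haps for k in range(n_haps)])]
    for m in range(1, n_sites):
        prev = fwd[-1]  # normalized, so the jump term is switch_rate / K
        jump = switch_rate / n_haps
        fwd.append(_normalize(
            [emission(k, m) * (stay * prev[k] + jump) for k in range(n_haps)]
        ))

    # Backward pass: evidence from site m + 1 to the last site.
    bwd = [[1.0] * n_haps for _ in range(n_sites)]
    for m in range(n_sites - 2, -1, -1):
        weighted = [emission(k, m + 1) * bwd[m + 1][k] for k in range(n_haps)]
        jump = switch_rate * sum(weighted) / n_haps
        bwd[m] = _normalize([stay * w + jump for w in weighted])

    # Posterior is proportional to fwd * bwd; report the best haplotype's allele.
    calls = []
    for m in range(n_sites):
        best = max(range(n_haps), key=lambda k: fwd[m][k] * bwd[m][k])
        calls.append(ref_haps[best][m])
    return calls


if __name__ == "__main__":
    panel = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]
    sparse_reads = [(0, 1), (0, 0), (0, 1), (0, 0), (0, 0)]  # reads at 2 of 5 sites
    print(impute_haploid(panel, sparse_reads))
```

In the toy example only two of five sites are covered by a read, yet the remaining sites are filled in from the reference haplotype that best explains the covered sites, which is the property that makes genotyping at 0.1× to 1× depth feasible.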

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110199054.5A CN112885408A (en) 2021-02-22 2021-02-22 Method and device for detecting SNP marker locus based on low-depth sequencing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110199054.5A CN112885408A (en) 2021-02-22 2021-02-22 Method and device for detecting SNP marker locus based on low-depth sequencing

Publications (1)

Publication Number Publication Date
CN112885408A true CN112885408A (en) 2021-06-01

Family

ID=76056870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110199054.5A Pending CN112885408A (en) 2021-02-22 2021-02-22 Method and device for detecting SNP marker locus based on low-depth sequencing

Country Status (1)

Country Link
CN (1) CN112885408A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104053789A (en) * 2012-05-14 2014-09-17 深圳华大基因医学有限公司 Method, System And Computer Readable Medium For Determining Base Information In Predetermined Area Of Fetus Genome
CN108090324A (en) * 2018-01-16 2018-05-29 深圳市泰康吉音生物科技研发服务有限公司 Pathogenic Microorganisms On Tropical method based on high-throughput gene sequencing data
CN108220403A (en) * 2017-12-26 2018-06-29 北京科迅生物技术有限公司 Detection method, detection device, storage medium and the processor in specific mutation site
CN108256289A (en) * 2018-01-17 2018-07-06 湖南大地同年生物科技有限公司 A kind of method based on target area capture sequencing genomes copy number variation
CN108376210A (en) * 2018-02-12 2018-08-07 中国农业科学院作物科学研究所 A kind of breeding parent selection method excavated based on the advantageous haplotypes of full-length genome SNP of genomic information auxiliary breeding means II-
CN109033752A (en) * 2018-08-13 2018-12-18 上海科穹生物信息技术有限公司 It is a kind of to read the long polygenes fusion detection method being sequenced based on long
CN109063417A (en) * 2018-07-09 2018-12-21 福建国脉生物科技有限公司 A kind of genotype complementing method constructing hidden Markov chain
CN109416928A (en) * 2016-06-07 2019-03-01 伊路米纳有限公司 For carrying out the bioinformatics system, apparatus and method of second level and/or tertiary treatment
CN110093406A (en) * 2019-05-27 2019-08-06 新疆农业大学 A kind of argali and its filial generation gene research method
CN110349631A (en) * 2019-07-30 2019-10-18 苏州亿康医学检验有限公司 Determine the analysis method and device of the haplotype of descendant object
CN110714082A (en) * 2019-09-03 2020-01-21 中国农业大学 SNP (Single nucleotide polymorphism) locus related to number of pig breasts as well as detection method and application thereof
CN110951889A (en) * 2018-09-26 2020-04-03 中国农业大学 Haplotype molecular marker related to chicken body weight and application thereof

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113539357B (en) * 2021-06-10 2024-04-30 阿里巴巴达摩院(杭州)科技有限公司 Gene detection method, model training method, device, equipment and system
CN113517022A (en) * 2021-06-10 2021-10-19 阿里巴巴新加坡控股有限公司 Gene detection method, feature extraction method, device, equipment and system
CN113539357A (en) * 2021-06-10 2021-10-22 阿里巴巴新加坡控股有限公司 Gene detection method, model training method, device, equipment and system
CN113450871A (en) * 2021-06-28 2021-09-28 广东博奥医学检验所有限公司 Method for identifying sample identity based on low-depth sequencing
CN113832252A (en) * 2021-11-02 2021-12-24 华南农业大学 Method for detecting SNP locus genotype of indica-japonica rice
CN114242164A (en) * 2021-12-21 2022-03-25 苏州吉因加生物医学工程有限公司 Analysis method, device and storage medium for whole genome replication
CN114783527A (en) * 2022-05-23 2022-07-22 广州鸿溪见杉科技有限公司 Construction method of various human haplotype ancestor source databases
CN114783527B (en) * 2022-05-23 2024-05-03 宋清 Construction method of haplotype progenitor source database of various people
CN116377086A (en) * 2023-03-30 2023-07-04 山东省农业科学院家禽研究所(山东省无特定病原鸡研究中心) Chicken whole genome low-density chip and manufacturing method and application thereof
CN116377086B (en) * 2023-03-30 2024-03-15 山东省农业科学院家禽研究所(山东省无特定病原鸡研究中心) Chicken whole genome low-density chip and manufacturing method and application thereof
CN117542418A (en) * 2023-06-14 2024-02-09 河北农业大学 Method for evaluating seed conservation effect of seed conservation group based on low-depth whole genome resequencing technology
CN117637020B (en) * 2024-01-25 2024-04-30 鲁东大学 Tetraploid oyster whole genome SNP typing method based on deep learning
CN117637020A (en) * 2024-01-25 2024-03-01 鲁东大学 Tetraploid oyster whole genome SNP typing method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination