CN116168763A - Method and device for grouping and assembling autotetraploid genome, method and device for constructing chromosome and application of method and device - Google Patents

Publication number
CN116168763A
Authority
CN
China
Prior art keywords
genome
data set
sequencing data
autotetraploid
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211691347.6A
Other languages
Chinese (zh)
Other versions
CN116168763B (en)
Inventor
李志民
杨伟飞
王娟
张雪梅
李晓波
涂成芳
刘涛
王众司
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Annoroad Gene Technology Beijing Co ltd
Original Assignee
Annoroad Gene Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Annoroad Gene Technology Beijing Co ltd
Priority to CN202211691347.6A
Publication of CN116168763A
Application granted
Publication of CN116168763B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method and a device for typing assembly of an autotetraploid genome, a method and a device for constructing chromosomes, and applications thereof. The typing assembly method comprises the following steps: step 1, aligning a sequencing data set of a sample to each of the typed reference genomes of a kindred diploid species; identifying genomic variation information from the alignment results and typing, to obtain a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed; step 2, extracting a sequencing data set IV whose single-base depth is greater than or equal to 1/2 of the average depth; step 3, merging and assembling sequencing data set I with sequencing data set III; and merging and assembling sequencing data set II with sequencing data set III and sequencing data set IV. The method and the device give good typing and assembly results, can be applied to highly homologous autotetraploid samples, are low in cost, and use samples that are easy to collect.

Description

Method and device for grouping and assembling autotetraploid genome, method and device for constructing chromosome and application of method and device
Technical Field
The invention relates to the field of biotechnology, and in particular to a method and a device for typing assembly of an autotetraploid genome, a method and a device for constructing chromosomes, the genomes and chromosome sequences obtained by the methods and/or devices, and applications thereof.
Background
The application of genome assembly technology has greatly advanced basic life science and medical research. Because traditional genome assembly strategies ignore the differences between homologous chromosomes, they inevitably produce a chimeric genome and cannot resolve differences in allele expression between homologous chromosomes, differences in modification between homologous chromosomes, and the like. Haplotype genome assembly emerged to overcome this limitation and has become a breakthrough technique for highly accurate genome assembly and precise site screening.
At present there are two approaches to typing assembly of autotetraploid genomes, and each has its own limitations.
The first approach is tetraploid genome typing based on the ALLHiC method, which so far has been applied successfully mainly to the tetraploids sugarcane and alfalfa. In the first step, the studied species is clustered according to the annotation of a kindred species; in the second step, Hi-C interactions between homologous chromosomes are removed according to the constructed homologous-fragment file; in the third step, the contigs are clustered according to the pruned BAM file; in the fourth step, contigs that were not clustered are identified from the original BAM file and assigned to the corresponding clusters according to their Hi-C interaction signals; in the fifth step, the clustering results are ordered; in the final step, an AGP file, sequence information and a Hi-C interaction heat map are constructed for each chromosome. The advantage of this method is that it depends on little additional information and can complete high-quality genome typing, yielding a high-quality genome. Its disadvantage is that typing is poor for tetraploids with high homology: in highly homologous regions only one set of sequences is assembled, which strongly affects construction of the homology information table in the first step, and for some highly homologous tetraploids the chromosomes cannot be constructed completely. A flow chart of the principle of autotetraploid typing by ALLHiC is shown in FIG. 5.
The second approach is gametophyte single-cell sequencing-assisted typing, which combines single-cell sequencing of gametophytes with third-generation sequencing and has been applied successfully to typed genome assembly in humans and in autotetraploid potato. First, pollen and tissue material are obtained; the tissue material undergoes third-generation sequencing and Hi-C sequencing, and the pollen (haploid) undergoes 10X single-cell sequencing. The third-generation sequencing data are assembled into a preliminary assembly while the pollen single-cell sequencing data are typed; the preliminary assembly is divided into 4 sets according to the typing result; the genome of each set is joined into chromosomes using the Hi-C data, finally giving a chromosome-level genome. A flow chart of the principle of single-cell sequencing-assisted typing is shown in FIG. 6.
In summary, the prior art types and assembles tetraploids with high homology poorly; single-cell sequencing of gametes and similar steps is expensive; and pollen and tissue material samples are difficult to collect.
Disclosure of Invention
In view of the above problems with the prior art, the present invention provides a method and a device for typing assembly of an autotetraploid genome, and a method and a device for constructing chromosomes. The method and the device can be applied to highly homologous autotetraploid samples and give good typing assembly results. They require no additional sequencing technology and no single-cell sequencing of gametes, which reduces cost, and they do not require pollen, tissue material or the like to be collected, so sampling is easy.
In one aspect, the invention provides a method for typing assembly of an autotetraploid genome, comprising:
step 1, aligning a sequencing data set of an autotetraploid genome sample to each of the typed reference genomes of a kindred diploid species of the sample; identifying genomic variation information from the alignment results and typing, to obtain a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed;
step 2, aligning the sequencing data set of the sample to either one of the reference genomes to obtain the single-base depth, and extracting a sequencing data set IV whose single-base depth is greater than or equal to 1/2 of the average depth;
step 3, merging and assembling sequencing data set I with sequencing data set III to obtain a first genome and a second genome of the autotetraploid genome; and merging and assembling sequencing data set II with sequencing data set III and sequencing data set IV to obtain a third genome and a fourth genome of the autotetraploid genome.
Further, the sequencing dataset of the sample comprises long reads of the autotetraploid genome.
Further, the long reads of the autotetraploid genome are long reads obtained by a third-generation sequencing method.
Further, the third-generation sequencing method is selected from PacBio and/or Nanopore.
Further, the PacBio sequencing is HiFi sequencing.
Further, before step 1, the method further comprises: typing the kindred diploid species of the sample to obtain a reference genome A and a reference genome a.
Further, step 2 includes: aligning the long reads of the sequencing data of the autotetraploid genome sample to reference genome A or reference genome a to obtain the single-base depth, and extracting the sequencing data set IV whose single-base depth is 1/2 to 1 times the average depth.
Further, the genomic variation information is selected from one or more of SNP, indel, and SV.
Further, the typing method includes one or more of whatshap and longphase.
In yet another aspect, the invention provides four sets of genomes of an autotetraploid sample obtained according to the typing assembly method described above.
In yet another aspect, the invention provides a method of constructing chromosomes, the method comprising: performing chromosome construction on the four sets of genomes of the autotetraploid sample obtained by the typing assembly method.
Further, the chromosome construction uses Hi-C.
In yet another aspect, the present invention provides a chromosomal sequence of an autotetraploid sample prepared according to the method described above.
In yet another aspect, the present invention provides a device for typing assembly of an autotetraploid genome, for use in the above typing assembly method, the device comprising: a first comparison unit, a second comparison unit and an assembly unit, wherein,
the first comparison unit is used for aligning a sequencing data set of an autotetraploid genome sample to each of the typed reference genomes of a kindred diploid species of the sample; identifying genomic variation information from the alignment results and typing, to obtain a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed;
the second comparison unit is used for aligning the sequencing data set of the sample to either one of the reference genomes to obtain the single-base depth, and extracting a sequencing data set IV whose single-base depth is greater than or equal to 1/2 of the average depth;
the assembly unit is used for merging and assembling sequencing data set I with sequencing data set III to obtain a first genome and a second genome of the autotetraploid genome; and merging and assembling sequencing data set II with sequencing data set III and sequencing data set IV to obtain a third genome and a fourth genome of the autotetraploid genome.
In still another aspect, the present invention provides a device for constructing chromosomes, the device comprising the above device for typing assembly of an autotetraploid genome and a construction unit, wherein the construction unit is used for chromosome construction on the four sets of genomes of the autotetraploid sample obtained by the typing assembly device.
Further, the chromosome construction unit is a Hi-C construction unit.
In a further aspect, the invention provides the use, in species evolution and molecular breeding, of the above method for typing assembly of an autotetraploid genome, the above four sets of genomes of an autotetraploid sample, the above method of constructing chromosomes, the above chromosomal sequence of an autotetraploid sample, the above device for typing assembly of an autotetraploid genome, or the above device for constructing chromosomes.
Further, the use is in genome assembly, more preferably in haplotype genome assembly.
The invention has the following advantages:
1. In the method and device of the present invention, a sequencing data set of an autotetraploid genome sample is aligned to each of the typed reference genomes of a kindred diploid species of the sample; genomic variation information is identified from the alignment results and typed, and the typed sequencing data are extracted to obtain a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed. Because sequencing data set I and sequencing data set II are highly homologous, sequencing data set I, which is close to the kindred species, contains part of the sequencing data belonging to sequencing data set II. The sequencing data whose single-base depth against one reference genome of the kindred species is greater than or equal to 1/2 of the average depth are therefore merged in, which markedly improves the accuracy, completeness and consistency of the data.
2. Compared with the single-cell sequencing-assisted typing method (single-cell typing technology), the method and device of the invention require no additional sequencing technology and no single-cell sequencing of gametes, so cost is reduced; pollen, tissue material and the like do not need to be collected, so sampling is easy.
3. Compared with the ALLHiC method, the method and device of the invention can be applied to typing of highly homologous autotetraploids with good results.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a schematic diagram of the method for typing assembly of the autotetraploid genome provided in example 1 of the present invention.
FIG. 2a is a Hi-C interaction heat map of the H1 genome chromosomes among the four sets of genomes of example 2 of the invention.
FIG. 2b is a Hi-C interaction heat map of the H2 genome chromosomes among the four sets of genomes of example 2 of the invention.
FIG. 2c is a Hi-C interaction heat map of the H3 genome chromosomes among the four sets of genomes of example 2 of the invention.
FIG. 2d is a Hi-C interaction heat map of the H4 genome chromosomes among the four sets of genomes of example 2 of the invention.
FIG. 3 is the overall interaction heat map of the 4 sets of chromosomes of example 2 of the present invention.
FIG. 4 is a collinearity plot of the 4 sets of chromosomes of the present invention against the closely related species in example 3.
FIG. 5 is a schematic diagram of the principle of autotetraploid typing by ALLHiC.
FIG. 6 is a schematic diagram of the principle of single-cell sequencing-assisted typing.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The first aspect of the present invention provides a method for typing assembly of an autotetraploid genome, comprising:
step 1, aligning a sequencing data set of an autotetraploid genome sample to each of the typed reference genomes of a kindred diploid species of the sample; identifying genomic variation information from the alignment results and typing, to obtain a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed;
step 2, aligning the sequencing data set of the sample to either one of the reference genomes to obtain the single-base depth, and extracting a sequencing data set IV whose single-base depth is greater than or equal to 1/2 of the average depth;
step 3, merging and assembling sequencing data set I with sequencing data set III to obtain a first genome and a second genome of the autotetraploid genome; and merging and assembling sequencing data set II with sequencing data set III and sequencing data set IV to obtain a third genome and a fourth genome of the autotetraploid genome.
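The merging in step 3 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each sequencing data set is represented as a set of read IDs; the function name `build_assembly_inputs` and the read IDs are hypothetical and do not come from the patent. The sketch only forms the two read pools; the assembly itself would be done by a separate assembler.

```python
def build_assembly_inputs(set_i, set_ii, set_iii, set_iv):
    """Return the two read-ID pools described in step 3.

    Pool 1 (first and second genomes): data set I plus data set III.
    Pool 2 (third and fourth genomes): data set II plus data sets III and IV.
    """
    pool_first_second = set(set_i) | set(set_iii)
    pool_third_fourth = set(set_ii) | set(set_iii) | set(set_iv)
    return pool_first_second, pool_third_fourth


# Hypothetical read IDs, for illustration only.
set_i = {"read1", "read2"}      # typed, similar to the reference
set_ii = {"read3"}              # typed, dissimilar to the reference
set_iii = {"read4"}             # cannot be typed
set_iv = {"read2", "read5"}     # selected by single-base depth

pool_first_second, pool_third_fourth = build_assembly_inputs(
    set_i, set_ii, set_iii, set_iv
)
print(sorted(pool_first_second))
print(sorted(pool_third_fourth))
```

Using set union means a read that appears in more than one data set (here `read2`) is passed to the assembler only once per pool.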
In the present invention, an autotetraploid is an organism whose additional chromosome sets are derived from the same species, so that each cell contains four chromosome sets. Under natural conditions, autotetraploid plants usually arise in one of two ways. In the first, during mitosis in the growth and development of a diploid seedling, somatic cells with doubled chromosomes form for unknown reasons, and these cells undergo normal mitosis to form a tetraploid plant; because the doubled chromosomes come from the same diploid species, such plants are known as autotetraploids. In the second, a diploid plant forms unreduced gametes during meiosis for unknown reasons, and two unreduced gametes fuse at fertilization to form an autotetraploid.
According to the typing assembly method of the invention, preferably, the sequencing data set of the sample comprises long reads of the autotetraploid genome; further, the long reads of the autotetraploid genome are long reads obtained by a third-generation sequencing method. In the technical field of the invention, short-read sequencing, long-read sequencing and direct sequencing are all common general knowledge. The long reads (long read length) defined in the present application are common knowledge in the art and are not described in detail herein.
According to the typing assembly method of the present invention, preferably, the third-generation sequencing method is selected from PacBio and/or Nanopore. More preferably, the PacBio sequencing is HiFi sequencing.
According to the typing assembly method of the present invention, preferably, before step 1, the method further comprises: typing the kindred diploid species of the sample to obtain a reference genome A and a reference genome a. In the present invention, winnowmap can be used to align the long reads of the autotetraploid sample to reference genomes A and a of the kindred diploid species, and samtools can be used to sort the alignment files.
According to the typing assembly method of the present invention, preferably, step 2 includes: aligning the long reads of the sequencing data of the autotetraploid genome sample to reference genome A or reference genome a to obtain the single-base depth, and extracting the sequencing data set IV whose single-base depth is 1/2 to 1 times the average depth. In the present invention, samtools can be used to obtain the single-base depth.
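As a hedged illustration of step 2, the Python sketch below parses per-base depth given as tab-separated `chromosome, position, depth` lines (the three-column layout written by `samtools depth`) and keeps positions whose depth lies in a chosen multiple of the average depth. How reads are then assigned to sequencing data set IV is not spelled out in the text; the sketch assumes a read is selected when it overlaps at least one qualifying position. All function names, read IDs and coordinates are hypothetical.

```python
def parse_depth(text):
    """Parse 'chrom<TAB>pos<TAB>depth' lines into {(chrom, pos): depth}."""
    depths = {}
    for line in text.strip().splitlines():
        chrom, pos, depth = line.split("\t")
        depths[(chrom, int(pos))] = int(depth)
    return depths


def mean_depth(depths):
    """Average depth over the listed positions only."""
    return sum(depths.values()) / len(depths)


def select_positions(depths, low=0.5, high=None):
    """Positions with low*mean <= depth (and depth <= high*mean if given)."""
    mean = mean_depth(depths)
    return {
        key
        for key, depth in depths.items()
        if depth >= low * mean and (high is None or depth <= high * mean)
    }


def reads_over_positions(reads, positions):
    """Assumption: a read qualifies if it overlaps any selected position.

    reads: iterable of (read_id, chrom, start, end), 1-based inclusive.
    """
    selected = set()
    for read_id, chrom, start, end in reads:
        if any((chrom, pos) in positions for pos in range(start, end + 1)):
            selected.add(read_id)
    return selected


# Hypothetical depth table and read alignments.
depth_text = "chr1\t1\t8\nchr1\t2\t8\nchr1\t3\t4\nchr1\t4\t4\nchr1\t5\t2\nchr1\t6\t2\n"
reads = [("a", "chr1", 1, 2), ("b", "chr1", 3, 4), ("c", "chr1", 5, 6)]

depths = parse_depth(depth_text)
set_iv_min = reads_over_positions(reads, select_positions(depths, low=0.5))
set_iv_window = reads_over_positions(reads, select_positions(depths, low=0.5, high=1.0))
print(sorted(set_iv_min), sorted(set_iv_window))
```

The `high` argument lets the same function express both thresholds that appear in the text: depth of at least 1/2 of the average (`high=None`) and depth between 1/2 and 1 times the average (`high=1.0`).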
According to the typing assembly method of the invention, preferably, the genomic variation information is selected from one or more of SNP, indel and SV. Genomic variation may be identified using, but not limited to, longshot, which is based on a pair hidden Markov model. For example, genomic SNP variation information is identified using the longshot model.
According to the typing assembly method of the present invention, preferably, typing may be performed by selecting reads whose similarity to the reference genome is higher than a set threshold, or by using a mathematical model, preferably a mathematical model; more preferably, the typing method may be, but is not limited to, whatshap or longphase. The typing method yields a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed. That is, the typing method determines whether a sequencing data set can be typed and, if so, whether it is similar to the reference genome. For example, with whatshap, a sequencing data set that can be typed and is similar to the reference genome shows 0, a sequencing data set that can be typed and is dissimilar to the reference genome shows 1, and a sequencing data set that cannot be typed shows unise.
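The label convention described above (0, 1, or neither) maps directly onto the three sequencing data sets. The following Python sketch shows that mapping under the assumption that the typing result is already available as `(read_id, label)` pairs; this input layout and the function name `split_by_label` are hypothetical and are not the output format of any particular tool.

```python
def split_by_label(read_labels):
    """Assign reads to data sets I, II and III from their typing label.

    Label "0": typed and similar to the reference      -> data set I
    Label "1": typed and dissimilar to the reference   -> data set II
    Any other label: cannot be typed                   -> data set III
    """
    datasets = {"I": [], "II": [], "III": []}
    for read_id, label in read_labels:
        if label == "0":
            datasets["I"].append(read_id)
        elif label == "1":
            datasets["II"].append(read_id)
        else:
            datasets["III"].append(read_id)
    return datasets


# Hypothetical typing results.
labels = [("r1", "0"), ("r2", "1"), ("r3", "unise"), ("r4", "0")]
datasets = split_by_label(labels)
print(datasets)
```

Treating every label other than 0 or 1 as untyped keeps the sketch independent of the exact string a tool prints for untyped reads.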
According to the typing assembly method of the invention, preferably, the merging and assembly may use an assembly algorithm to obtain every complete set of genomic genetic information in a species; for a tetraploid species, for example, 4 sets of genomic genetic information are finally assembled. In the present invention, the merging and assembly method may be, but is not limited to, one or more of hifiasm, falcon-unzip and falcon-phase.
According to the typing assembly method of the present invention, preferably, after step 1 and before step 2, the variation results are filtered to obtain high-quality variant typing results. High quality may mean variant typing results with a genotype quality value (GQ value) > 70.
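The GQ filter can be sketched in Python as follows, assuming the variant calls are available as single-sample VCF text lines in which the per-sample `GQ` value is listed in the FORMAT column. The function names and the example records are hypothetical; the threshold of 70 is the one stated above.

```python
def genotype_quality(vcf_line):
    """Return the GQ value of the first sample in a VCF record, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")      # FORMAT column, e.g. GT:GQ
    values = fields[9].split(":")    # first sample column
    gq = dict(zip(keys, values)).get("GQ", ".")
    return None if gq == "." else float(gq)


def filter_by_gq(lines, threshold=70):
    """Keep header lines and records whose GQ is strictly above threshold."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        gq = genotype_quality(line)
        if gq is not None and gq > threshold:
            kept.append(line)
    return kept


# Hypothetical VCF content.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "chr01\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0|1:99",
    "chr01\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQ\t0/1:35",
    "chr01\t300\t.\tG\tA\t50\tPASS\t.\tGT:GQ\t0|1:70",
]
high_quality = filter_by_gq(vcf_lines)
print(len(high_quality))
```

The comparison is strict, so a record with GQ exactly 70 is dropped, matching the "> 70" wording.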
According to the typing assembly method of the invention, preferably, the kindred diploid species may be a kindred diploid species known in the art or a kindred diploid species of the autotetraploid sample. With this method, tetraploid samples with high homology can be typed and assembled well.
In the present invention, the term "average depth" refers to the ratio of the total number of bases obtained in a given region to the length of that region. For example, if a region 10 bases long is covered by 4 sequences of 10 bases each, the sequences contribute 40 bases and the average depth is 40/10 = 4; a single base covered by 2 aligned sequences then has a single-base depth of 2, which is 1/2 of the average depth.
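The definition of average depth can be checked with a small worked example in Python. The read intervals below are hypothetical and were chosen so that the region holds 40 aligned bases over 10 positions, giving an average depth of 4, while two positions are covered by only 2 reads.

```python
def per_base_depth(reads, region_length):
    """Count how many reads cover each position of a region.

    reads: iterable of (start, end) intervals, 1-based inclusive.
    """
    depth = [0] * region_length
    for start, end in reads:
        for pos in range(start, end + 1):
            depth[pos - 1] += 1
    return depth


# Hypothetical alignments on a 10-base region.
reads = [(1, 10), (1, 10), (1, 8), (1, 8), (1, 4)]
depth = per_base_depth(reads, 10)

# Average depth = total aligned bases / region length.
average_depth = sum(depth) / len(depth)

# Ratio of each single-base depth to the average depth.
ratios = [d / average_depth for d in depth]
print(depth, average_depth, ratios[-1])
```

Positions 9 and 10 have a ratio of exactly 1/2, so they sit on the boundary of the "greater than or equal to 1/2 of the average depth" criterion and are retained by it.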
In a second aspect, the invention provides four sets of genomes of an autotetraploid sample obtained according to the typing assembly method described above.
In a third aspect, the invention provides a method of constructing chromosomes, the method comprising: performing chromosome construction on the four sets of genomes of the autotetraploid sample obtained by the typing assembly method.
According to the method of the invention, preferably, the chromosome construction uses Hi-C.
In a fourth aspect, the present invention provides a chromosomal sequence of an autotetraploid sample prepared according to the method described above.
In a fifth aspect, the present invention provides a device for typing assembly of an autotetraploid genome, for use in the above typing assembly method, the device comprising: a first comparison unit, a second comparison unit and an assembly unit, wherein,
the first comparison unit is used for aligning a sequencing data set of an autotetraploid genome sample to each of the typed reference genomes of a kindred diploid species of the sample; identifying genomic variation information from the alignment results and typing, to obtain a sequencing data set I that can be typed and is similar to the reference genome, a sequencing data set II that can be typed and is dissimilar to the reference genome, and a sequencing data set III that cannot be typed;
the second comparison unit is used for aligning the sequencing data set of the sample to either one of the reference genomes to obtain the single-base depth, and extracting a sequencing data set IV whose single-base depth is greater than or equal to 1/2 of the average depth;
the assembly unit is used for merging and assembling sequencing data set I with sequencing data set III to obtain a first genome and a second genome of the autotetraploid genome; and merging and assembling sequencing data set II with sequencing data set III and sequencing data set IV to obtain a third genome and a fourth genome of the autotetraploid genome.
In a sixth aspect, the present invention provides a device for constructing chromosomes, the device comprising the above device for typing assembly of an autotetraploid genome and a construction unit, the construction unit being used for chromosome construction on the four sets of genomes of the autotetraploid sample obtained by the typing assembly device.
According to the device of the invention, preferably, the chromosome construction unit is a Hi-C construction unit.
In a seventh aspect, the invention provides the use, in species evolution and molecular breeding, of the above method for typing assembly of an autotetraploid genome, the above four sets of genomes of an autotetraploid sample, the above method of constructing chromosomes, the above chromosomal sequence of an autotetraploid sample, the above device for typing assembly of an autotetraploid genome, or the above device for constructing chromosomes.
Preferably, the use according to the invention is in genome assembly, more preferably in haplotype genome assembly.
The invention will now be described with reference to specific examples, which are intended to be illustrative only and are not to be construed as limiting the invention.
Example 1
A method and device for typing assembly of an autotetraploid genome; a schematic diagram of the principle of the typing assembly method is shown in FIG. 1. In FIG. 1, hps1 represents reference genome A obtained by typing the kindred diploid species; hps2 represents reference genome a obtained by typing the kindred diploid species; HapA1 reads and HapA2 reads represent reads that can be typed after alignment to reference genome A; untyped A reads represent reads that cannot be typed after alignment to reference genome A; HapB1 reads and HapB2 reads represent reads that can be typed after alignment to reference genome a; untyped B reads represent reads that cannot be typed after alignment to reference genome a; untyped A, B reads represent the sum of the untyped A reads and the untyped B reads; the H1, H2, H3 and H4 genomes represent the four genomes of the highly homologous tetraploid potato Atlantic sample.
Step 1, the autotetraploid genome sample was a highly homologous tetraploid potato Atlantic sample; 71 Gb of sequencing data were obtained by PacBio third-generation HiFi sequencing, together with 130 G of Hi-C data. The kindred diploid species was diploid potato RH89-039-16, which was typed to obtain reference genome A and reference genome a.
Step 2, the long reads of the autotetraploid potato sample were aligned to reference genomes A and a of the kindred diploid species using winnowmap, and the alignments were sorted with samtools. The alignment rates of the tetraploid potato sample to A and a are shown in Table 1.
Table 1. Alignment rates to reference genomes A and a
(Table 1 is provided as an image in the original publication and is not reproduced here.)
Step 3, genomic SNP variation information was identified from the alignment results using the longshot model. The number of identified variations is shown in Table 2.
Table 2. Number of identified variations
(Table 2 is provided as an image in the original publication and is not reproduced here.)
Step 4, the variation results were filtered to obtain high-quality variant typing results (GQ value > 70), which are shown in Table 3.
Table 3. High-quality variant typing results
(Table 3 is provided as an image in the original publication and is not reproduced here.)
Step 5, the reads were typed using whatshap according to the alignment results and the variant typing results, to obtain a sequencing data set I that can be typed and is similar to the reference genome A, a sequencing data set II that can be typed and is dissimilar to the reference genome A, a, and a sequencing data set III that cannot be typed.
The sequencing data set of the autotetraploid potato sample was aligned to reference genome A, the single-base depth was obtained using samtools, and the sequencing data set IV whose single-base depth is greater than or equal to 1/2 of the average depth was extracted. The read counts and data volumes of each sequencing data set after typing are shown in Table 4.
Table 4. Read counts and data volumes of each sequencing data set after typing
(Table 4 is provided as an image in the original publication and is not reproduced here.)
Step 6, sequencing data set I and sequencing data set III were merged and assembled using hifiasm to obtain a first genome (H1 genome) and a second genome (H2 genome) of the autotetraploid genome; sequencing data set II was merged and assembled with sequencing data set III and sequencing data set IV to obtain a third genome (H3 genome) and a fourth genome (H4 genome) of the autotetraploid genome. Four genomes were obtained in total, and the results are shown in Table 5.
Table 5. Assembly results and BUSCO results for the 4 haplotype genomes
(Table 5 is provided as an image in the original publication and is not reproduced here.)
Here, BUSCO (Benchmarking Universal Single-Copy Orthologs) is a benchmark based on universal single-copy orthologous genes. The H1, H2, H3 and H4 genomes are the four highly homologous genome sets of the tetraploid potato Atlantic sample. Contig_len (bp) is the contig length in base pairs, and Contig_num is the number of contigs.
Table 5 shows that the assembly has good contiguity and completeness.
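The contig columns of Table 5 can be reproduced from a list of contig lengths. The sketch below treats Contig_len as the summed length of all contigs, which is an assumption about the table, and also reports N50, a contiguity measure that is commonly quoted but is not mentioned in the source.

```python
def contig_stats(lengths):
    """Summarise an assembly from its contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # first contig at which half the total is covered
            n50 = length
            break
    return {"Contig_len": total, "Contig_num": len(ordered), "N50": n50}


# Hypothetical contig lengths.
stats = contig_stats([40, 30, 20, 10])
```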
Example 2
Step 1: Chromosome construction was performed on the four genome sets (H1, H2, H3 and H4 genomes) obtained in Example 1 using the Hi-C-assisted assembly software Lachesis. The results are shown in FIG. 2a, FIG. 2b, FIG. 2c and FIG. 2d.
Step 2: An overall interaction heat map of the 4 chromosome sets was drawn with the plotting tool of the Hi-C-assisted assembly software ALLHiC; the result is shown in FIG. 3.
The heat map indicates that the assembled chromosomes are of high quality and that the typing of the four chromosome sets of the autotetraploid is reliable.
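What an interaction heat map tabulates can be shown with a small binning routine. This is a generic sketch, not the ALLHiC plotting tool: it assumes each Hi-C read pair has already been reduced to two coordinates on a single concatenated axis, and the function name and example coordinates are hypothetical.

```python
def contact_matrix(pairs, bin_size, n_bins):
    """Count Hi-C contacts between fixed-size bins as a symmetric matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if i >= n_bins or j >= n_bins:
            continue  # ignore pairs that fall outside the plotted range
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


# Hypothetical read-pair coordinates, binned at 100 bp into 3 bins.
matrix = contact_matrix([(10, 20), (15, 120), (210, 250), (110, 130)], 100, 3)
```

In a well-ordered assembly most counts are expected near the diagonal, because loci that are close on a chromosome tend to contact each other more often.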
Example 3
All chromosomes of the highly homologous tetraploid potato Atlantic sample assembled in Example 2 were aligned to one haplotype of the closely related diploid potato RH89-039-16 using minimap2, and the collinearity results were plotted in R, as shown in FIG. 4.
In FIG. 4, the x-axis represents one chromosome set (12 chromosomes) of the closely related diploid potato, and the y-axis represents the chromosomes of each genome set of the autotetraploid potato. Each chromosome of the related species on the x-axis corresponds to 4 chromosomes of the autotetraploid sample of the invention, giving 48 chromosomes (4×12) on the y-axis in total. This shows that the typing assembly method of the invention achieves high accuracy, consistency and completeness.
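The 4-to-1 correspondence read from FIG. 4 can also be checked numerically from the alignment output. The sketch below assumes PAF-format lines in which the assembled tetraploid chromosomes are the queries and the diploid reference chromosomes are the targets, and it assigns each query to the target with the most matching bases (PAF column 10). The sequence names and the helper `paf_line` are hypothetical.

```python
from collections import defaultdict


def assign_chromosomes(paf_lines):
    """Map each reference chromosome to the query chromosomes that best match it."""
    matched = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        query, target, n_match = fields[0], fields[5], int(fields[9])
        matched[query][target] += n_match
    groups = defaultdict(list)
    for query, per_target in matched.items():
        best_target = max(per_target, key=per_target.get)
        groups[best_target].append(query)
    return {target: sorted(queries) for target, queries in groups.items()}


def paf_line(query, target, n_match):
    """Build a minimal 12-column PAF record for the example."""
    return "\t".join([query, "1000", "0", "900", "+", target, "1200", "0", "900",
                      str(n_match), "900", "60"])


example_paf = [paf_line(f"H{i}_chr01", "RH_chr01", 800) for i in range(1, 5)]
example_paf.append(paf_line("H1_chr01", "RH_chr02", 50))  # minor off-target hit
groups = assign_chromosomes(example_paf)
```

For the result described in the text, every one of the 12 reference chromosomes would be expected to collect exactly 4 assembled chromosomes.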
Comparative example 1
The same autotetraploid potato Atlantic sample and closely related diploid potato RH89-039-16 as in Example 1 were used. Tetraploid genome typing was performed with the ALLHiC-based method (principle shown in FIG. 5).
With this method, chromosomes could not be constructed.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A four-group genome of an autotetraploid sample, wherein the four-group genome of the autotetraploid sample is obtained by a typing assembly method for an autotetraploid genome, the typing assembly method comprising:
step 1, aligning a sequencing data set of an autotetraploid genome sample separately to the reference genomes obtained by typing a closely related diploid species of the sample; identifying genomic variation information according to the alignment results and performing typing to obtain a sequencing data set I which can be typed and is similar to a reference genome, a sequencing data set II which can be typed and is dissimilar to the reference genome, and a sequencing data set III which cannot be typed;
step 2, aligning the sequencing data set of the sample to any one of the reference genomes to obtain the single-base depth, and extracting a sequencing data set IV in which the single-base depth is greater than or equal to 1/2 of the average depth;
step 3, combining and assembling the sequencing data set I and the sequencing data set III to obtain a first genome and a second genome of the autotetraploid genome; and combining and assembling the sequencing data set II with the sequencing data set III and the sequencing data set IV to obtain a third genome and a fourth genome of the autotetraploid genome.
2. The four-group genome of claim 1, wherein the sequencing data set of the sample comprises long reads of an autotetraploid genome.
3. The four-group genome of claim 2, wherein the long reads of the autotetraploid genome are long reads obtained by a third-generation sequencing method; preferably, the third-generation sequencing method is selected from Pacbio and/or Nanopore; more preferably, the Pacbio is selected from HiFi.
4. The four-group genome according to any one of claims 1-3, wherein the method further comprises, prior to step 1: typing the closely related diploid species of the sample to obtain a reference genome A and a reference genome a.
5. The four-group genome of claim 4, wherein step 2 comprises: aligning the long reads of the sequencing data of the autotetraploid genome sample to the reference genome A or the reference genome a to obtain the single-base depth, and extracting the sequencing data set IV in which the single-base depth is 1/2 to 1 times the average depth.
6. A four-group genome according to any of claims 1-3, wherein the genomic variation information is selected from one or more of SNPs, indels and SVs.
7. The four-group genome according to any one of claims 1-3, wherein the typing is performed with one or more of whatshap and longphase.
8. A method of constructing a chromosome, the method comprising: performing chromosome construction on the four-group genome of the autotetraploid sample of any one of claims 1-7; preferably, the chromosome construction is Hi-C construction.
9. A chromosome sequence of an autotetraploid sample, produced according to the method of claim 8.
10. Use of the four-group genome of the autotetraploid sample according to any one of claims 1-7, the method of constructing a chromosome according to claim 8, or the chromosome sequence of the autotetraploid sample according to claim 9 in species evolution and molecular breeding, preferably in genome assembly.
CN202211691347.6A 2022-09-06 2022-09-06 Method and device for constructing chromosome and application thereof Active CN116168763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211691347.6A CN116168763B (en) 2022-09-06 2022-09-06 Method and device for constructing chromosome and application thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211081173.1A CN115148289B (en) 2022-09-06 2022-09-06 Method and device for assembling autotetraploid gene component types and device for constructing chromosome
CN202211691347.6A CN116168763B (en) 2022-09-06 2022-09-06 Method and device for constructing chromosome and application thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202211081173.1A Division CN115148289B (en) 2022-09-06 2022-09-06 Method and device for assembling autotetraploid gene component types and device for constructing chromosome

Publications (2)

Publication Number Publication Date
CN116168763A true CN116168763A (en) 2023-05-26
CN116168763B CN116168763B (en) 2024-08-13

Family

ID=83415271

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211081173.1A Active CN115148289B (en) 2022-09-06 2022-09-06 Method and device for assembling autotetraploid gene component types and device for constructing chromosome
CN202211691347.6A Active CN116168763B (en) 2022-09-06 2022-09-06 Method and device for constructing chromosome and application thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202211081173.1A Active CN115148289B (en) 2022-09-06 2022-09-06 Method and device for assembling autotetraploid gene component types and device for constructing chromosome

Country Status (1)

Country Link
CN (2) CN115148289B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115762633B (en) * 2022-11-23 2024-01-23 哈尔滨工业大学 Genome structure variation genotype correction method based on three-generation sequencing

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046105A (en) * 2015-07-09 2015-11-11 天津诺禾医学检验所有限公司 Haplotype map of chromosome span, and construction method thereof
CN107153777A (en) * 2017-05-03 2017-09-12 武汉菲沙基因信息有限公司 A kind of method for the diplodization degree for estimating tetraploid species gene group
CN108138231A (en) * 2015-09-29 2018-06-08 路德维格癌症研究有限公司 Parting and assembling split gene set of pieces
CN108350495A (en) * 2016-02-26 2018-07-31 深圳华大生命科学研究院 The method and apparatus assembled to separating long segment sequence
CN110997936A (en) * 2017-09-08 2020-04-10 深圳华大生命科学研究院 Method and device for genotyping based on low-depth genome sequencing and application of method and device
CN111816248A (en) * 2020-05-22 2020-10-23 武汉菲沙基因信息有限公司 Complete genome typing method based on Pacbio libraries and Hi-C reads
CN113496760A (en) * 2020-04-01 2021-10-12 深圳华大基因科技服务有限公司 Polyploid genome assembling method and device based on third-generation sequencing

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPO780297A0 (en) * 1997-07-09 1997-07-31 Wu Li Dance Company Pty Ltd Determination of genetic sex in equine species by analysis of y-chromosomal DNA sequences
WO2010044923A1 (en) * 2008-10-21 2010-04-22 Morehouse School Of Medicine Methods for haplotype determination by haplodissection
US20210280269A1 (en) * 2020-03-06 2021-09-09 Laboratory Corporation Of America Holdings Assay for Hemoglobin A (HBA) Detection and Genotyping
CN112397149B (en) * 2020-11-11 2023-06-09 天津现代创新中药科技有限公司 Transcriptome analysis method and system without reference genome sequence
CN112820354B (en) * 2021-02-25 2022-07-22 深圳华大基因科技服务有限公司 Method and device for assembling diploid and storage medium
CN112908413A (en) * 2021-03-22 2021-06-04 深圳市血液中心(深圳市输血医学研究所) Blood typing method based on ABO gene
CN113817725B (en) * 2021-10-15 2024-05-14 西安浩瑞基因技术有限公司 HLA gene amplification primer, kit, sequencing library construction method and sequencing method
CN114678071A (en) * 2021-12-31 2022-06-28 杭州芯原力生物科技有限公司 HLA gene comprehensive analysis method based on high-throughput sequencing data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HEQUAN SUN: "Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar", 《NATURE PORTFOLIO》, 3 March 2022 (2022-03-03) *

Also Published As

Publication number Publication date
CN115148289B (en) 2023-01-24
CN115148289A (en) 2022-10-04
CN116168763B (en) 2024-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant