WO2011071382A1 - Polymorphic whole genome profiling - Google Patents

Polymorphic whole genome profiling

Info

Publication number
WO2011071382A1
Authority
WO
WIPO (PCT)
Prior art keywords
adapter
sequence
fragment
genome
fragments
Application number
PCT/NL2010/050836
Other languages
English (en)
Inventor
An Michels
Adriaan Jan Van Oeveren
Original Assignee
Keygene N.V.
Application filed by Keygene N.V.
Publication of WO2011071382A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • the present invention relates to the field of molecular biology and biotechnology.
  • the invention relates to the field of nucleic acid detection and identification. More in particular the invention relates to the generation of a physical map of a genome, or part thereof, using high-throughput sequencing technology, combined with the identification of polymorphic markers that are mapped to the physical map.
  • The invention further relates to the development of marker assays of the discovered polymorphic markers and the use of the markers in the generation of a genetic map. More in particular, the invention relates to the identification of sequence tags, from which a physical map is created, and polymorphic sequence tags, from which a genetic map is made, and which can be integrated into a genome wide high density physical map.

Background of the invention

  • Integrated genetic and physical genome maps are extremely valuable for map-based gene isolation, comparative genome analysis and as sources of sequence-ready clones for genome sequencing projects.
  • the effect of the availability of an integrated map of physical and genetic markers of a species for genome research is enormous.
  • Integrated maps allow for precise and rapid gene mapping and precise mapping of microsatellite loci and SNP markers.
  • Various methods have been developed for assembling physical maps of genomes of varying complexity.
  • One of the better characterized approaches uses restriction enzymes to generate large numbers of DNA fragments from genomic subclones (Brenner et al., Proc. Natl. Acad. Sci., (1989), 86, 8902-8906; Gregory et al., Genome Res.
  • A physical map is generated from a combination of pooling of the clones in a library, restriction enzyme digestion, adapter ligation, (selective) amplification and high-throughput sequencing. Deconvolution of the resulting sequences results in BAC-clone-specific sets that can be used to assemble physical maps.
  • The assembly of the clones into contigs is based on the co-presence of terminal nucleotide sequences of the sequenced fragments, which can be used as sequence-based anchor points for additional linkage of sequence data. More in detail, the technology disclosed in WO2008007951 is Whole Genome Profiling (WGP), KeyGene's recently developed proprietary approach for sequence-based physical mapping.
  • a BAC library is constructed from a single homozygous individual and BAC clones are pooled in a multi-dimensional format.
  • BAC pools are characterized by pool specific tags to allow assignment of sequences to individual BAC clones based on the coordinates in the multidimensional pool screening.
  • DNA is extracted from each BAC pool and digested with restriction enzymes, for instance EcoRI and MseI.
  • The EcoRI ends of the restriction fragments are analyzed on a next-generation sequencer such as the Illumina Genome Analyzer, and in this way these relatively short (20-100 base pairs) sequenced fragments, called WGP tags, can be assigned to individual BACs.
  • BACs can be assembled based on overlapping WGP tag patterns using a contiging software tool such as FPC.
  • the WGP method is unique in providing sequence based anchor points instead of fragment lengths for assembly of BAC contigs. Sequence based anchors are more accurate and provide the basis for assembly of Whole Genome Shotgun data.
  • sequencing refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA.
  • Many techniques are available, such as Sanger sequencing and high-throughput sequencing technologies (also known as next-generation sequencing technologies) such as the GS FLX platform offered by Roche Applied Science, which is based on pyrosequencing, and the Genome Analyzer from Illumina, which is based on sequencing by synthesis with reversible terminators.
  • Restriction endonuclease: a restriction endonuclease or restriction enzyme is an enzyme that recognizes a specific nucleotide sequence (target site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every target site, leaving a blunt or a staggered end.
  • Frequent cutters and rare cutters: restriction enzymes typically have recognition sequences that vary in number of nucleotides from 3 or 4 (such as MseI) to 6 (EcoRI) and even 8 (NotI).
  • the restriction enzymes used can be frequent and rare cutters. The term 'frequent' in this respect is typically used in relation to the term 'rare'.
  • Frequent cutting endonucleases are restriction endonucleases that have a relatively short recognition sequence. Frequent cutters typically have 3-5 nucleotides that they recognise and subsequently cut. Thus, a frequent cutter on average cuts a DNA sequence every 64-1024 nucleotides.
  • Rare cutters are restriction endonucleases that have a relatively long recognition sequence. Rare cutters typically have 6 or more nucleotides that they recognise and subsequently cut. Thus, a rare 6-cutter on average cuts a DNA sequence every 4096 nucleotides, leading to longer fragments. It is observed again that the definition of frequent and rare is relative to each other, meaning that when a 4 bp restriction enzyme, such as MseI, is used in combination with a 5-cutter such as AvaII, AvaII is seen as the rare cutter and MseI as the frequent cutter.
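
The average spacings quoted above (64-1024 nucleotides for 3- to 5-cutters, 4096 for a 6-cutter) follow from a simple idealization: if the four bases are assumed to occur independently and with equal frequency, a recognition site of n bases is expected once every 4^n positions. The sketch below only illustrates that arithmetic; the function name `expected_cut_interval` is hypothetical, and real genomes deviate from the equal-frequency assumption.

```python
def expected_cut_interval(site_length: int) -> int:
    """Expected spacing (in nucleotides) between recognition sites of the
    given length, assuming equiprobable, independent bases (an idealization)."""
    if site_length < 1:
        raise ValueError("site_length must be positive")
    return 4 ** site_length


# Recognition-site lengths mentioned in the text: 3-5 for frequent cutters,
# 6 for a rare cutter such as EcoRI, 8 for NotI.
for n in (3, 4, 5, 6, 8):
    print(f"{n}-cutter: one site per {expected_cut_interval(n)} nucleotides on average")
```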
  • Restriction fragments: the DNA molecules produced by digestion with a restriction endonuclease are referred to as restriction fragments. Any given genome (or nucleic acid, regardless of its origin) will be digested by a particular restriction endonuclease into a discrete set of restriction fragments.
  • the DNA fragments that result from restriction endonuclease cleavage can be further used in a variety of techniques and can for instance be detected by gel electrophoresis.
  • Ligation: the enzymatic reaction catalyzed by a ligase enzyme in which two double-stranded DNA molecules are covalently joined together is referred to as ligation.
  • both DNA strands are covalently joined together, but it is also possible to prevent the ligation of one of the two strands through chemical or enzymatic modification of one of the ends of the strands. In that case the covalent joining will occur in only one of the two DNA strands.
  • Synthetic oligonucleotide: single-stranded DNA molecules having preferably from about 10 to about 50 bases, which can be synthesized chemically, are referred to as synthetic oligonucleotides.
  • synthetic DNA molecules are designed to have a unique or desired nucleotide sequence, although it is possible to synthesize families of molecules having related sequences and which have different nucleotide compositions at specific positions within the nucleotide sequence.
  • synthetic oligonucleotide will be used to refer to DNA molecules having a designed or desired nucleotide sequence.
  • Adapters: short double-stranded DNA molecules with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which are designed such that they can be ligated to the ends of restriction fragments.
  • Adapters are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure.
  • one end of the adapter molecule is designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adapter can be designed so that it cannot be ligated, but this need not be the case (double ligated adapters).
  • Adapter-ligated restriction fragments: restriction fragments that have been capped by adapters.
  • Primers: in general, the term primers refers to DNA strands which can prime the synthesis of DNA.
  • DNA polymerase cannot synthesize DNA de novo without primers: it can only extend an existing DNA strand in a reaction in which the complementary strand is used as a template to direct the order of nucleotides to be assembled.
  • We will refer to the synthetic oligonucleotide molecules which are used in a polymerase chain reaction (PCR) as primers.
  • DNA amplification: the term DNA amplification or amplification will typically be used to denote the in vitro synthesis of double-stranded DNA molecules using PCR. It is noted that other amplification methods exist and they may be used in the present invention without departing from the gist.
  • Tagging refers to the addition of a sequence tag to a nucleic acid sample in order to be able to distinguish it from a second or further nucleic acid sample.
  • Tagging can e.g. be performed by the addition of a sequence identifier during complexity reduction or by any other means known in the art such as a separate ligation step.
  • A sequence identifier can e.g. be a unique base sequence of varying but defined length uniquely used for identifying a specific nucleic acid sample. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization (Iannone et al., Cytometry 39:131-140, 2000).
  • Using nucleotide-based tags, the origin of a sample, a clone or an amplified product can be determined upon further processing.
  • the different nucleic acid samples should be identified using different tags.
  • Identifier: a short sequence that can be added to an adapter or a primer or included in its sequence or otherwise used as label to provide a unique identifier (aka barcode or index).
  • identifiers can be sample specific, pool specific, clone specific, amplicon specific etc.
  • the different nucleic acid samples are generally identified using different identifiers.
  • Identifiers preferably differ from each other by at least two base pairs and preferably do not contain two identical consecutive bases to prevent misreads.
  • the identifier function can sometimes be combined with other functionalities such as adapters or primers and can be located at any convenient position.
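
The two design preferences stated above (identifiers differing from each other by at least two bases, and no two identical consecutive bases) can be checked mechanically. The following is a minimal sketch, assuming all identifiers in a set have the same length; the function names are hypothetical and are not taken from the source.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("identifiers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_repeat(identifier: str) -> bool:
    """True if any two consecutive bases are identical (e.g. 'AA')."""
    return any(x == y for x, y in zip(identifier, identifier[1:]))


def valid_identifier_set(identifiers: list[str], min_distance: int = 2) -> bool:
    """Check both preferences: no identical consecutive bases, and every
    pair of identifiers differing in at least `min_distance` positions."""
    if any(has_repeat(i) for i in identifiers):
        return False
    return all(hamming(a, b) >= min_distance
               for a, b in combinations(identifiers, 2))


print(valid_identifier_set(["ACGT", "AGCT", "TCGA"]))  # all pairs differ at >= 2 positions
print(valid_identifier_set(["ACGT", "ACGA"]))          # only one differing position
```

With a minimum distance of two, a single miscalled base in an identifier cannot turn one valid identifier into another, which is the misread protection the text refers to.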
  • Tagged library refers to a library of tagged nucleic acids.
  • Aligning and alignment: with the terms "aligning" and "alignment" is meant the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Several methods for alignment of nucleotide sequences are known in the art, as will be further explained below.
  • Contig: the term contig is used in connection with DNA sequence analysis, and refers to assembled contiguous stretches of DNA derived from two or more DNA fragments having contiguous nucleotide sequences.
  • A contig is a set of overlapping DNA fragments that provides a partial contiguous sequence of a genome.
  • The term "contig" is also used to indicate a contiguous stretch of, for instance, BACs, a "BAC contig".
  • A BAC contig can also be made based on marker analysis, i.e. a more indirect way of sequence analysis.
  • A "scaffold" is defined as a series of contigs that are in the correct order, but are not connected in one continuous sequence, i.e. contain gaps.
  • Contig maps also represent the structure of contiguous regions of a genome by specifying overlap relationships among a set of clones.
  • The term "contigs" encompasses a series of cloning vectors which are ordered in such a way as to have each sequence overlap that of its neighbours.
  • the linked clones can then be grouped into contigs, either manually or, preferably, using appropriate computer programs such as FPC, PHRAP, CAP3 etc.
  • Complexity reduction is used to denote a method wherein the complexity of a nucleic acid sample, such as genomic DNA, is reduced by the generation or selection of a subset of the sample.
  • This subset can be representative for the whole (i.e. complex) sample and is preferably a reproducible subset. Reproducible means in this context that when the same sample is reduced in complexity using the same method and experimental conditions, the same, or at least comparable, subset is obtained.
  • The method used for complexity reduction may be any method for complexity reduction known in the art. Examples include AFLP® (Keygene N.V., the Netherlands; see e.g. EP 0 534 858), the methods described by Dong (see e.g. WO 03/012118, WO 00/24939) and indexed linking (Unrau et al., vide infra).
  • The complexity reduction methods used in the present invention have in common that they are reproducible: when the same sample is reduced in complexity in the same manner, the same subset of the sample is obtained. This is in contrast to more random complexity reduction such as microdissection, random shearing, or the use of mRNA (cDNA), which represents a portion of the genome transcribed in a selected tissue and whose reproducibility depends on the selection of tissue, time of isolation, etc.
  • High-throughput screening is a method for scientific experimentation especially relevant to the fields of biology and chemistry. Through a combination of modern robotics and other specialised laboratory hardware, it allows a researcher to effectively screen large amounts of samples.
  • Artificial clone library: a population of hosts (bacteria, yeast), each of which carries a DNA molecule that was inserted into a cloning vector, such that a representation of the genome of an organism (usually an entire genome) is present.
  • BAC: Bacterial Artificial Chromosome, a DNA construct, usually based on a functional fertility plasmid (or F-plasmid), used for transforming and cloning in bacteria, usually E. coli.
  • The usual insert size is 100-350 kb but can also be 700 kb.
  • BACs are often used to sequence the genomes of organisms whereby a short piece of the organism's DNA is amplified as an insert in BACs and then sequenced. Rearrangement of the sequenced parts in silico provides the genome sequence of the organism.
  • polymorphism refers to the presence of two or more variants of a nucleotide sequence in a population.
  • a polymorphism may comprise one or more base changes, an insertion, a repeat, or a deletion.
  • A polymorphism includes e.g. a simple sequence repeat (SSR) and a single nucleotide polymorphism (SNP), which is a variation occurring when a single nucleotide (adenine (A), thymine (T), cytosine (C) or guanine (G)) is altered.
  • A variation must generally occur in at least 1% of the population to be considered a SNP.
  • SNPs make up e.g. 90% of all human genetic variations, and occur every 100 to 300 bases along the human genome. Two of every three SNPs substitute cytosine (C) with thymine (T).
  • Variations in the DNA sequences of e.g. humans or plants can affect how they handle diseases, bacteria, viruses, chemicals, drugs, etc.
  • Heterozygous: an organism is heterozygous for a particular gene when different alleles occupy the gene's position on the homologous chromosomes.
  • Homozygous: an organism is homozygous for a particular gene when identical alleles are present on both homologous chromosomes.

Summary of the invention

  • The present inventors found that 'Whole Genome Profiling' or WGP has proven to work well in providing high-quality physical maps. However, from analysis and validation on several data sets, it was observed that sometimes 'gaps' (missing WGP tags) in assembled BACs occur: one overlapping BAC might not show the same set of WGP tags as another BAC covering the same region. These gaps could result from sequence errors or incomplete deconvolution due to inadequate sequencing depth; however, the inventors realised that they could also have a biological background. If there were a SNP or short indel within the WGP tag region, this would result in a polymorphic tag: either a present/absent tag or a tag with a SNP or indel. Exactly these latter SNP variants provide a unique opportunity to use them as genetic markers.
  • The present inventors have found that, by using artificial chromosome libraries (particularly BAC libraries) from a heterozygous genome sample or a combination of two or more homozygous genome samples, this observed effect can be used in an efficient way to create a physical map and, based on the same data set, to screen the data for the presence of polymorphisms within the WGP tags. This will significantly reduce the effort in SNP discovery and in performing a large-scale genotyping experiment.
  • the addition of a (rough scale) genotyping and genetic mapping effort provides a link of BAC contigs into linkage groups. This genetic map can then be extended to a high density, high resolution map by adding all SNP tags from their positions as known from the BAC contigs, resulting in an integrated physical and genetic map.
  • Figure 6: Various adapter-primer combinations containing identifiers to yield tagged fragments. Legend: A, adapter nucleotide; F, fragment nucleotide; P, primer nucleotide.
  • 1. Adapter-ligated fragment. 2. Adapter-ligated fragment containing identifier: a. amplification with primer directed against identifier; b. amplification with primer directed against adapter. 3. Adapter-ligated fragment containing degenerate identifier section, amplification with primer directed against degenerate identifier and introducing identifier in amplified fragment. 4. Adapter-ligated fragment containing no identifier, amplification with primer introducing identifier in amplified fragment.
  • Figure 7: Integration of genetic and physical map.
  • Figure 8: Alignment of BAC clones for contig 293. Indicated by the rectangular dotted box is the likely position of the polymorphic WGP tag SNP1.
  • Figure 9: Alignment of BAC clones for contig 307. Indicated by the rectangular dotted box is the likely position of the polymorphic WGP tag SNP2.
  • the invention relates to a method for the generation of a physical map of a sample genome and identification of polymorphisms, comprising the steps of:
  • an artificial chromosome e.g. BAC, YAC
  • each artificial chromosome clone contains DNA from a sample genome, wherein the sample genome is selected from the group consisting of
  • step (h) aligning the sequenced fragments based on the determined sequences in step (e);
  • step (i) determining polymorphisms between the aligned sequences of step (h).
  • an artificial clone bank is provided.
  • the library can be a Bacterial Artificial Chromosome library (BAC) or based on yeast (YAC). Other libraries such as based on fosmids, cosmids, PAC, TAC or MAC are also possible.
  • The BAC library is preferably of a high quality and preferably is a high insert size genomic library. This means that the individual BAC contains a relatively large insert of the genomic DNA under investigation (typically > 100 kbp). The size of the preferred large insert is species-dependent.
  • Throughout, reference is made to BACs as examples of artificial chromosomes. However, the present invention is not limited thereto, and other artificial chromosomes can be used without departing from the gist of the invention.
  • the libraries contain at least five genome equivalents, more preferably at least 7, most preferably at least 8. Particularly preferred is at least 10. The higher the number of genome equivalents in the library, the more reliable the resulting contigs and physical map will be.
  • The sample genome DNA is thus selected from the group consisting of: a heterozygous sample genome; a combination of two or more homozygous sample genomes; and a combination of at least one heterozygous and at least one homozygous sample genome.
  • the individual clones in the library are pooled to form pools containing a multitude of artificial chromosomes or clones.
  • the pooling may be the simple combination of a number of individual clones into one sample (for example, 100 clones into 10 pools, each containing 10 clones), but also more elaborate pooling strategies may be used.
  • the distribution of the clones over the pools is preferably such that each clone is present in at least two or more of the pools.
  • the pools contain from 10 to 10000 clones per pool, preferably from 100 to 1000, more preferably from 250 to 750. It is observed that the number of clones per pool can vary widely, and this variation is related to, for instance, the size of the genome under investigation.
  • the maximum size of a pool or a sub-pool is governed by the ability to uniquely identify a clone in a pooling set by a set of identifiers.
  • a typical range for a genome equivalent in a pool set is in the order of 0.2 - 0.3, and this may again vary per genome.
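
The genome-equivalent figures above (at least five for the library, in the order of 0.2 - 0.3 for a pool set) are simple ratios of cloned DNA to genome size. A minimal sketch of that arithmetic follows; the genome size, insert size and clone counts are hypothetical illustration values, not figures from the source.

```python
def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Total cloned DNA divided by the genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


# Hypothetical example: a 500 Mbp genome and BACs with 125 kbp inserts.
GENOME_SIZE = 500_000_000
MEAN_INSERT = 125_000

library = genome_equivalents(40_000, MEAN_INSERT, GENOME_SIZE)   # whole library
pool_set = genome_equivalents(1_000, MEAN_INSERT, GENOME_SIZE)   # one set of pooled clones

print(f"library: {library:.1f} genome equivalents")
print(f"pool set: {pool_set:.2f} genome equivalents")
```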
  • The pools are generated based on pooling strategies well known in the art. The skilled man is capable of selecting the optimal pooling strategy based on factors such as genome size etc.
  • The resulting pooling strategy will depend on the circumstances, and examples thereof are plate pooling, N-dimensional pooling such as 2D-pooling, 3D-pooling, 6D-pooling or complex pooling.
  • The pools may, in turn, be combined in super-pools (i.e. super-pools are pools of pools of clones) or divided into sub-pools.
  • Deconvolution is the correct identification of the individual clone in a library by detection of the presence of a known associated indicator (i.e. label or identifier) of the clone in one or more pools or subpools.
  • The pooling strategy is preferably such that every clone in the library is distributed over the pools in such a way that a unique combination of pools is made for every clone. The result thereof is that a certain combination of (sub)pools uniquely identifies a clone.
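
As an illustration of how a pooling scheme can give every clone a unique combination of pools, the sketch below models a simple 3D scheme in which each clone sits at a (plate, row, column) coordinate and is placed in one plate pool, one row pool and one column pool. The dimensions, pool labels and function names are assumptions made for the example; the source does not prescribe a particular layout.

```python
from itertools import product


def pools_for_clone(plate: int, row: int, col: int) -> frozenset[str]:
    """The three pools (one per dimension) that contain a given clone."""
    return frozenset({f"P{plate}", f"R{row}", f"C{col}"})


def build_pooling(n_plates: int, n_rows: int, n_cols: int) -> dict:
    """Map every clone coordinate to its combination of pools."""
    return {
        (p, r, c): pools_for_clone(p, r, c)
        for p, r, c in product(range(n_plates), range(n_rows), range(n_cols))
    }


# Hypothetical layout: 4 plates of 8 x 12 wells.
pooling = build_pooling(4, 8, 12)
n_pools = 4 + 8 + 12

# Every clone has a distinct pool combination, so a combination identifies a clone.
assert len(set(pooling.values())) == len(pooling)
print(f"{len(pooling)} clones addressed with {n_pools} pools")
```

The design point is that the number of pools grows with the sum of the dimensions while the number of addressable clones grows with their product.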
  • the pools are digested with restriction endonucleases to yield restriction fragments.
  • Each pool is preferably separately subjected to an endonuclease digest.
  • Each pool is treated with the same (combination of) endonuclease(s).
  • Restriction endonucleases may be frequent cutters (4 or 5 cutters, such as MseI or PstI) or rare cutters (6 and more cutters such as EcoRI, HindIII).
  • restriction endonucleases are selected such that restriction fragments are obtained that are, on average, present in an amount or have a certain length distribution that is adequate for the subsequent steps.
  • two or more restriction endonucleases can be used and in certain embodiments, combinations of rare and frequent cutters can be used. For large genomes the use of, for instance, three or more restriction endonucleases can be used advantageously.
  • adapters are ligated in step (d) to provide for adapter-ligated restriction fragments.
  • adapters are synthetic oligonucleotides as defined herein elsewhere.
  • The adapters used in the present invention preferably contain an identifier section, in essence as defined herein elsewhere, to provide for 'tagged adapters'.
  • The adapter contains a pool-specific identifier, i.e. an adapter containing a unique identifier is used that unequivocally indicates the pool.
  • the adapter contains a degenerate identifier section which is used in combination with a primer containing a pool-specific identifier.
  • the adapter-ligated restriction fragments can be combined in larger groups, in particular when the adapters contain a pool-specific identifier. This combination in larger groups may aid in reducing the number of parallel amplifications of each set of adapter-ligated restriction fragments obtained from a pool.
  • the adapters that are ligated do not contain an identifier or a degenerate identifier section.
  • The adapter-ligated fragments are subsequently amplified using primers that contain identifiers (tags), for instance at their 5' end. The result is that amplified, tagged adapter-ligated fragments are obtained.
  • the adapters can be the same for a plurality (or all) of pools and the amplification using tagged primers creates the distinction between the pools that can later be used in the deconvolution.
  • the tagged adapter-ligated fragment can be amplified.
  • The amplification may serve to reduce the complexity or to increase the amount of DNA available for analysis.
  • The amplification can be performed using a set of primers that are at least partly complementary to the adapters and/or the tags/identifiers. This amplification may be independent of the amplification described herein above that introduces the tags into the adapters. In certain embodiments, the amplification may serve several purposes at a time, i.e. reduce complexity, increase DNA amount and introduce tags in the adapter-ligated fragments in the pools.
  • The adapter-ligated fragments can be combined in larger groups, in particular when the adapters contain a pool-specific identifier. This combination in larger groups may aid in reducing the number of parallel amplifications of each set of adapter-ligated restriction fragments obtained from a pool.
  • the adapter-ligated fragments can be amplified using a set of primers of which at least one primer amplifies the pool-specific identifier at the position of the pool-specific or degenerate identifier in the adapter.
  • the primer may contain (part of) the identifier, but the primer may also be complementary to a section of the adapter that is located outside the tag, i.e. downstream in the adapter. Amplification then also amplifies the tag. See in this respect Fig 6 for various embodiments.
  • In step (e), part of the sequence of the tagged adapter-ligated fragment is determined.
  • the tagged adapter-ligated fragments are subjected to sequencing, preferably high throughput sequencing as described herein elsewhere. During sequencing, at least part of the nucleotide sequence of the (amplified) tagged adapter-ligated fragment is determined.
  • At least the sequence of the pool-specific identifier and part of the fragment (i.e. derived from the sample genome) of the (amplified) tagged adapter-ligated fragment is determined.
  • a sequence of at least 10 nucleotides of the fragment is determined.
  • at least 15, 20, 25, 30 or 35 nucleotides of the fragment (i.e. derived from the sample genome) are determined.
  • The number of nucleotides that are to be determined minimally will be, again, genome- as well as sequencing platform-dependent. For instance, in plants more repetitive sequences are present, hence longer sequences (50-150 nucleotides) may need to be determined for a contig of comparable quality.
  • The sequence library may be sequenced with an average redundancy level (aka oversampling rate) of at least 5.
  • the sequence is determined of at least 5 amplicons obtained from the amplification of one specific adapter- ligated fragment.
  • each fragment is (statistically) sequenced on average at least five times.
  • Increased redundancy is preferred as it improves the fraction of fragments that are sampled in each pool and the accuracy of these sequences, so preferably the redundancy level is at least 7, more preferably at least 10.
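
The effect of redundancy on the fraction of fragments that are sampled can be illustrated with a standard Poisson approximation, which is an assumption of this sketch rather than a model stated in the source: if each fragment is sequenced on average c times, the number of reads per fragment is taken to be Poisson-distributed with mean c. The function name `fraction_sampled` is hypothetical.

```python
import math


def fraction_sampled(redundancy: float, min_reads: int = 1) -> float:
    """Probability that a fragment is observed at least `min_reads` times,
    assuming the read count per fragment is Poisson(redundancy)."""
    p_fewer = sum(
        math.exp(-redundancy) * redundancy ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Redundancy levels mentioned in the text.
for c in (5, 7, 10):
    print(f"redundancy {c}: seen at least once {fraction_sampled(c):.5f}, "
          f"at least twice {fraction_sampled(c, min_reads=2):.5f}")
```

Under this approximation the unsampled fraction falls off as exp(-c), which is one way to see why raising the redundancy from 5 to 10 is worthwhile when a fragment must be detected in several pools for deconvolution.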
  • Increased average sequencing redundancy levels are used to compensate for a phenomenon that is known as 'sampling variation', i.e.
  • sequencing is performed using high-throughput sequencing methods, such as the methods disclosed in WO 03/004690, WO 03/054142, WO
  • In step (f), the (partly) sequenced (amplified) tagged adapter-ligated fragments are correlated to the corresponding clone, typically in silico by means of computerized methods.
  • the (amplified) tagged adapter-ligated fragments are selected that contain identical sections of nucleotides in the restriction fragment-derived part.
  • the different pool-specific identifiers (tags) are identified that are present in those (amplified) tagged adapter-ligated fragments.
  • the combination of the different pool-specific identifiers and hence the sequence of the restriction fragment can be uniquely assigned to a specific clone (a process indicated as 'deconvolution').
  • For example, each clone in the library is uniquely addressed by a combination of 3 pool-specific identifiers with the same restriction fragment-derived section.
  • a restriction fragment-derived section originating from a clone will be tagged with 3 different identifiers.
  • Unique restriction fragment-derived sections when observed in combination with the 3 identifiers can be assigned to a single BAC clone. This can be repeated for each (amplified) tagged adapter-ligated fragment that contains other unique sections of nucleotides in the restriction fragment-derived part.
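
The deconvolution described above can be sketched as a lookup: group the reads by their fragment-derived section, collect the pool-specific identifiers seen with each section, and assign the section to the clone whose pool combination matches. This is a simplified illustration, not the method as implemented by the inventors; the read representation, the `pool_to_clone` table and all names are assumptions.

```python
from collections import defaultdict


def deconvolute(reads, pool_to_clone):
    """Assign fragment-derived sections (tags) to clones.

    reads: iterable of (pool_identifier, fragment_sequence) pairs.
    pool_to_clone: dict mapping a frozenset of pool identifiers to a clone name.
    Returns a dict mapping clone name to the set of tags assigned to it.
    """
    pools_by_fragment = defaultdict(set)
    for pool_id, fragment in reads:
        pools_by_fragment[fragment].add(pool_id)

    tags_by_clone = defaultdict(set)
    for fragment, pools in pools_by_fragment.items():
        clone = pool_to_clone.get(frozenset(pools))
        if clone is not None:  # incomplete or ambiguous patterns are skipped
            tags_by_clone[clone].add(fragment)
    return dict(tags_by_clone)


# Hypothetical 3D pooling table and reads.
pool_to_clone = {
    frozenset({"X1", "Y2", "Z1"}): "BAC_A",
    frozenset({"X2", "Y1", "Z1"}): "BAC_B",
}
reads = [
    ("X1", "GAATTCAAGT"), ("Y2", "GAATTCAAGT"), ("Z1", "GAATTCAAGT"),
    ("X2", "GAATTCTTGA"), ("Y1", "GAATTCTTGA"), ("Z1", "GAATTCTTGA"),
    ("X1", "GAATTCCCCC"),  # seen in one pool only: cannot be assigned
]
print(deconvolute(reads, pool_to_clone))
```

Note that in this sketch a tag shared by two overlapping clones would appear in more than three pools and be skipped; resolving such cases needs additional logic that is not shown here.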
  • the clones are combined and ordered into clone contigs in step (g) of the method.
  • the grouping and ordering can be performed by fingerprint contiging software for this purpose such as FPC software (Soderlund et al (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput. Appl. Biosci., 13:523-535.) essentially as described herein elsewhere.
  • the alignment of the clones into contigs and the corresponding order of WGP tags generates a physical map of the sample genome.
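
To make the idea of grouping clones by overlapping WGP tag patterns concrete, the sketch below joins clones that share at least a minimum number of tags and reports the resulting groups. It is a deliberately simplified stand-in and not the FPC algorithm: it only groups clones and does not order them. The threshold `min_shared` and all names are hypothetical.

```python
def group_into_contigs(tags_by_clone, min_shared=2):
    """Group clones into candidate contigs: two clones are linked when they
    share at least `min_shared` tags, and linked clones are merged transitively."""
    clones = list(tags_by_clone)
    parent = {c: c for c in clones}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(clones):
        for b in clones[i + 1:]:
            if len(tags_by_clone[a] & tags_by_clone[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = {}
    for c in clones:
        groups.setdefault(find(c), set()).add(c)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical tag sets per clone.
tags = {
    "A": {"t1", "t2", "t3"},
    "B": {"t2", "t3", "t4"},
    "C": {"t4", "t5", "t6"},
    "D": {"t9"},
}
print(group_into_contigs(tags, min_shared=2))
print(group_into_contigs(tags, min_shared=1))
```

Lowering `min_shared` corresponds loosely to the less stringent settings mentioned elsewhere in the text: more clones are joined, at the cost of more false joins.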
  • the above steps can be performed independently for each genome sample. More in particular, in certain embodiments, for each heterozygous and/or homozygous sample genome steps (a) - (g) are performed independently. The steps relating to the screening for polymorphisms in the subsequent steps are taken from the separate steps (e).
  • sequenced fragments are aligned based on the determined sequences in step (e) of the method.
  • sequenced fragments comprise a sample identification tag, identifying the pool from which the BAC was derived (and hence the fragment).
  • the sequenced fragments further comprise the remains of the restriction site and a sequence that is derived from the sample genome (sometimes indicated as the 'sequence tag').
  • step (i) the polymorphisms are determined between the aligned sequences of the WGP tags of step (h).
  • a polymorphism in the WGP tag region will result in two variants, with a SNP (or indel) somewhere in the 65 nt non-restriction-site part of a WGP tag (at a 75 nt sequence read length). Such a tag is termed a 'SNP Tag', i.e. a WGP tag that contains a SNP or an indel between its two variants (see Fig. 3). Given equal portions of BACs and pool samples, the two variants ('alleles') of each SNP Tag are expected to be found in a 50/50 distribution.
  • in those cases where a polymorphism is present in the restriction site region, it will cause the WGP tag to be present in only 50% of the BACs from the corresponding region in the genome. These tags are less useful for discriminating between alleles, as it is uncertain whether a tag is missing because the region was not covered by a BAC or because of a polymorphism.
  • the low number of SNP tags will still allow the formation of contigs from overlapping BACs, as obtained in a standard WGP approach, albeit with less stringent alignment (FPC) settings.
  • a binning step is executed first to combine allelic variants (and possibly sequence errors) into a single WGP tag, followed by an FPC step as for a homozygous line (i.e. with more stringent settings).
  • an additional analysis will be done to identify "SNP tags" and their position on the contigs or physical map.
  • the discovered polymorphisms and SNPs are converted into conventional SNP assays using conventional technology.
  • owing to the available sequence information, SNP tags can be converted to PCR assays, Invader assays, Golden Gate assays and the like.
  • the developed assay can be validated on the sample genome(s) used to generate the SNPs.
  • the discovered SNP tags are both physical markers (as their position on the physical map is known) and genetic markers, as they differ between sample genomes. This allows genetic and physical maps to be coupled (see Fig. 7).
  • SNPs can be followed in subsequent crosses, for instance in the offspring of crosses between parents of which at least one of the parents was used to create the physical map.
  • the SNPs can serve as genetic markers that are linked to the physical map.
  • the BACs used to generate the physical map can be anchored to the genetic map, resulting in a high resolution map based on a scaffold of genetic markers (SNP tags) and supplemented with WGP tags. It is further possible to also link genetic markers obtained via other ways to the map to further complete the high resolution map.
  • a further aspect of the invention relates to a method for the generation of a linked genetic map of genetic markers (SNP tags) and a physical map (of WGP tags) comprising
  • step (i) providing two parents, wherein at least one of the parents has been used as a sample genome in the generation of the physical map that provided the SNP tags of step (i);
  • the genetic linkage map is constructed using the markers discovered using the method of the invention.
  • the map can be constructed by observing how frequently pairs of SNP Tag markers are inherited together after selfing an F1 population and analyzing the F2 individuals (about 100).
  • the thus obtained genetic map shows the relative locations of these SNPs along the chromosome, thus providing the link with the position of the SNP tags on the BAC contigs of the physical map.
  • This example is to provide an illustration for the polymorphic Whole Genome Profiling (pWGP) concept described herein.
  • the invention aims to perform physical mapping and SNP discovery at the same time, by executing a WGP project on a BAC library derived from a heterozygous individual or a combination of two polymorphic individuals.
  • data have been used from both Arabidopsis thaliana Columbia and Landsberg erecta ecotypes.
  • WGP tags were identified which were specific to either the Col or the Ler data.
  • if a single Col tag matches a single Ler tag with only one nucleotide difference, then such a tag is a putative SNP marker.
  • Two such candidates and the corresponding BAC clones were identified. They are presented in Table 3.
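
The deconvolution described above, in which a restriction fragment-derived sequence is assigned to a clone through its combination of three pool-specific identifiers, can be illustrated with a minimal sketch. The function name `deconvolute`, the pool labels and the input layout are assumptions made for illustration only; they are not taken from the application. The sketch only assigns tags whose observed pool set equals the three-pool address of exactly one clone, and leaves every other tag unassigned.

```python
from collections import defaultdict


def deconvolute(reads, clone_pools):
    """Assign sequence tags to clones via their pool-identifier combination.

    reads       -- iterable of (pool_id, tag_sequence) pairs (hypothetical layout)
    clone_pools -- dict mapping clone name -> set of its 3 pool identifiers
    """
    # Collect, for every tag sequence, the set of pools in which it was seen.
    pools_by_tag = defaultdict(set)
    for pool_id, tag in reads:
        pools_by_tag[tag].add(pool_id)

    # Each clone is addressed by one unique combination of pool identifiers.
    clone_by_address = {frozenset(p): c for c, p in clone_pools.items()}

    # A tag is assigned only if its pool set matches one clone address exactly.
    assignments = {}
    for tag, pools in pools_by_tag.items():
        clone = clone_by_address.get(frozenset(pools))
        if clone is not None:
            assignments[tag] = clone
    return assignments


if __name__ == "__main__":
    pools = {"BAC1": {"P1", "R1", "C1"}, "BAC2": {"P1", "R2", "C2"}}
    observed = [("P1", "ACGT"), ("R1", "ACGT"), ("C1", "ACGT")]
    print(deconvolute(observed, pools))
```

Tags observed in fewer or more than three matching pools (for example through sequencing errors or overlapping clones) are deliberately not resolved in this sketch; a real pipeline would need a policy for them.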
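
The selection of putative SNP markers in the example (a single Col-specific tag matching a single Ler-specific tag with only one nucleotide difference) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure actually used for the Arabidopsis data: the function names are hypothetical, tags are compared by a simple all-against-all Hamming distance, and indels are not considered.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def putative_snp_tags(col_tags, ler_tags):
    """Return (col_tag, ler_tag) pairs that differ by exactly one nucleotide.

    Only tags specific to one data set are considered, and a pair is kept
    only when the match is one-to-one in both directions.
    """
    col_only = set(col_tags) - set(ler_tags)
    ler_only = set(ler_tags) - set(col_tags)

    col_hits = defaultdict(list)
    ler_hits = defaultdict(list)
    # Quadratic comparison; adequate for a sketch, not for genome-scale data.
    for c in col_only:
        for l in ler_only:
            if len(c) == len(l) and hamming(c, l) == 1:
                col_hits[c].append(l)
                ler_hits[l].append(c)

    return sorted(
        (c, hits[0])
        for c, hits in col_hits.items()
        if len(hits) == 1 and len(ler_hits[hits[0]]) == 1
    )


if __name__ == "__main__":
    print(putative_snp_tags({"AAAA", "CCCC"}, {"AAAT", "CCCC"}))
```

Requiring a one-to-one match discards tags from repeated regions, where one tag may lie a single mismatch away from several others and the allelic relationship is ambiguous.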

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for generating a physical map of a sample genome combined with the identification of polymorphisms. The method comprises generating a physical map of a BAC (bacterial artificial chromosome) library of a heterozygous sample genome, based on high-throughput sequencing of tagged adapter-ligated restriction fragments from pooled BAC clones, and determining the polymorphisms between the tagged adapter-ligated restriction fragments.
PCT/NL2010/050836 2009-12-10 2010-12-09 Polymorphic whole genome profiling WO2011071382A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US28533109P 2009-12-10 2009-12-10
NL2003932 2009-12-10
US61/285,331 2009-12-10
NL2003932 2009-12-10

Publications (1)

Publication Number Publication Date
WO2011071382A1 true WO2011071382A1 (fr) 2011-06-16

Family

ID=42272720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2010/050836 WO2011071382A1 (fr) 2009-12-10 2010-12-09 Polymorphic whole genome profiling

Country Status (1)

Country Link
WO (1) WO2011071382A1 (fr)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0534858A1 (fr) 1991-09-24 1993-03-31 Keygene N.V. Selective restriction fragment amplification: a general method for DNA fingerprinting
WO2000024939A1 (fr) 1998-10-27 2000-05-04 Affymetrix, Inc. Complexity management and analysis of genomic DNA
WO2003004690A2 (fr) 2001-07-06 2003-01-16 454 Corporation Method using a porous filter for isolating independent chemical micro-reactions in parallel
WO2003012118A1 (fr) 2001-07-31 2003-02-13 Affymetrix, Inc. Complexity management of genomic DNA
WO2003027311A2 (fr) * 2001-09-24 2003-04-03 Seqwright, Inc Clone-array pooled shotgun strategy for nucleic acid sequencing
WO2003054142A2 (fr) 2001-10-30 2003-07-03 454 Corporation Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase
WO2004063323A2 (fr) * 2003-01-10 2004-07-29 Keygene N.V. AFLP-based method for integrating physical and genetic maps
WO2004070005A2 (fr) 2003-01-29 2004-08-19 454 Corporation Double-ended sequencing
WO2006137734A1 (fr) * 2005-06-23 2006-12-28 Keygene N.V. Improved strategies for sequencing complex genomes using high-throughput sequencing technologies
WO2008007951A1 (fr) 2006-07-12 2008-01-17 Keygene N.V. High-throughput physical mapping using AFLP

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0969102A2 (fr) * 1991-09-24 2000-01-05 Keygene N.V. Selective restriction fragment amplification: a general method for DNA fingerprinting
EP0534858A1 (fr) 1991-09-24 1993-03-31 Keygene N.V. Selective restriction fragment amplification: a general method for DNA fingerprinting
WO2000024939A1 (fr) 1998-10-27 2000-05-04 Affymetrix, Inc. Complexity management and analysis of genomic DNA
WO2003004690A2 (fr) 2001-07-06 2003-01-16 454 Corporation Method using a porous filter for isolating independent chemical micro-reactions in parallel
WO2003012118A1 (fr) 2001-07-31 2003-02-13 Affymetrix, Inc. Complexity management of genomic DNA
US6975943B2 (en) 2001-09-24 2005-12-13 Seqwright, Inc. Clone-array pooled shotgun strategy for nucleic acid sequencing
WO2003027311A2 (fr) * 2001-09-24 2003-04-03 Seqwright, Inc Clone-array pooled shotgun strategy for nucleic acid sequencing
WO2003054142A2 (fr) 2001-10-30 2003-07-03 454 Corporation Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase
WO2004063323A2 (fr) * 2003-01-10 2004-07-29 Keygene N.V. AFLP-based method for integrating physical and genetic maps
WO2004070005A2 (fr) 2003-01-29 2004-08-19 454 Corporation Double-ended sequencing
WO2004069849A2 (fr) 2003-01-29 2004-08-19 454 Corporation Bead emulsion nucleic acid amplification
WO2004070007A2 (fr) 2003-01-29 2004-08-19 454 Corporation Method for preparing single-stranded DNA libraries
WO2005003375A2 (fr) 2003-01-29 2005-01-13 454 Corporation Method for amplifying and sequencing nucleic acids
WO2006137734A1 (fr) * 2005-06-23 2006-12-28 Keygene N.V. Improved strategies for sequencing complex genomes using high-throughput sequencing technologies
WO2008007951A1 (fr) 2006-07-12 2008-01-17 Keygene N.V. High-throughput physical mapping using AFLP

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
BRENNER ET AL., PROC. NATL. ACAD. SCI., vol. 86, 1989, pages 8902 - 8906
CAI W-W ET AL: "A clone-array pooled shotgun strategy for sequencing large genomics", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, WOODBURY, NY, US LNKD- DOI:10.1101/GR.198101, vol. 11, 1 January 2001 (2001-01-01), pages 1619 - 1623, XP002967818, ISSN: 1088-9051 *
GREGORY ET AL., GENOME RES., vol. 7, 1997, pages 1162 - 1168
KLEIN ET AL., GENOME RESEARCH, vol. 10, 2000, pages 798 - 807
KLEIN P E ET AL: "A high-throughput AFLP-based method for constructing integrated genetic and physical maps: Progress toward a sorghum genome map", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, WOODBURY, NY, US LNKD- DOI:10.1101/GR.10.6.789, vol. 10, no. 6, 1 June 2000 (2000-06-01), pages 789 - 807, XP002240094, ISSN: 1088-9051 *
IANNONE ET AL., CYTOMETRY, vol. 39, 2000, pages 131 - 140
MARRA ET AL., GENOME RES., vol. 7, 1997, pages 1072 - 1084
SEO ET AL., PROC. NATL. ACAD. SCI. USA, vol. 101, 2004, pages 5488 - 93
SODERLUND ET AL.: "FPC: a system for building contigs from restriction fingerprinted clones", COMPUT. APPL. BIOSCI., vol. 13, 1997, pages 523 - 535

Similar Documents

Publication Publication Date Title
US10538806B2 (en) High throughput screening of populations carrying naturally occurring mutations
JP5389638B2 (ja) High-throughput detection of molecular markers based on restriction fragments
EP2663655B1 (fr) Paired-end random sequence based genotyping
EP2427569B1 (fr) Use of class IIB restriction endonucleases in second-generation sequencing applications
EP2379751B1 (fr) Novel genome sequencing strategies
US8975028B2 (en) Method for the identification of the clonal source of a restriction fragment
EP2513333A1 (fr) Restriction enzyme based whole genome sequencing
US20200102612A1 (en) Method for identifying the source of an amplicon
WO2011071382A1 (fr) Polymorphic whole genome profiling
US20150329906A1 (en) Novel genome sequencing strategies

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10796186

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10796186

Country of ref document: EP

Kind code of ref document: A1