WO2014059370A1 - Improved high throughput system for genetic studies - Google Patents

Improved high throughput system for genetic studies

Info

Publication number
WO2014059370A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
tetrad
barcode
acid molecule
tetrads
Prior art date
Application number
PCT/US2013/064694
Other languages
French (fr)
Inventor
Aimee M. DUDLEY
Adrian Scott
Gareth CROMIE
Catherine LUDLOW
Original Assignee
Institute For Systems Biology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute For Systems Biology
Publication of WO2014059370A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80 Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81 Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing

Definitions

  • the invention relates to improvements in techniques for studying the effects of genetic mutations on phenotypes and interactions between genes and environmental factors.
  • A classic method for such study, based on sporulation of tetrads in yeast and other microorganisms that undergo similar behaviors, is adapted for high throughput analysis by providing meiotic-based fluorescent labeling and the use of barcodes.
  • Meiotic mapping is a linkage-based method for analyzing the recombinant progeny of a cross that has long been a cornerstone of genetics.
  • the method is possible in a wide range of eukaryotes, including genetically facile yeasts and less tractable microorganisms, such as the filamentous fungus Neurospora crassa and the unicellular green alga Chlamydomonas reinhardtii.
  • the approach is enabled by tetrad dissection, a technique for isolating and cultivating "with complete certainty all of the spores [meiotic progeny] derived from individual asci [tetrads]" that was first developed in S.
  • The third strategy, “bulk segregant analysis” (ref. 7) or, more recently, “extreme QTL (X-QTL) mapping”, has been used in organisms ranging from yeast to plants.
  • a common feature of bulk segregant methods is the use of a pooled genotyping strategy to identify regions of DNA common to the majority of progeny under a specific selection criterion, e.g., growth under high drug concentrations. While the three strategies have been effectively applied to specific problems, they each have limitations that fall short of the broad applicability of conventional tetrad analysis.
  • the limited throughput of manual tetrad dissection stands in stark contrast to the need for extremely large numbers in two research areas where meiotic mapping can be powerfully applied.
  • The first area is the genetic mapping of complex traits resulting from combinations of naturally occurring polymorphisms. A better understanding of how complex interactions between genes and environmental factors give rise to phenotypic variation is essential for human health, agriculture, and bioengineering. Unfortunately, most genetic studies in these areas are currently limited by the power to detect only a small fraction of the genetic loci that contribute to a trait (ref. 9).
  • the second area is the study of the molecular mechanisms of recombination, in which the segregation pattern of DNA serves as an assay for molecular processes.
  • the present invention takes advantage of the availability of reporter genes, barcodes, and efficient sequencing techniques to modernize the use of tetrads in studies of genetic interactions and phenotypic characteristics.
  • Diploids obtained by crossing two parental haploid mating strains are provided with nucleic acids containing an expression system for a fusion protein that comprises a protein that participates in meiosis and a reporter such as a fluorescent protein and are also provided with a barcode.
  • a library of such barcodes is used to transfect a culture of diploids that have been obtained by genetic cross of haploid parents.
  • the expression system and barcode can be supplied on the same plasmid and a library of plasmids containing a multiplicity of barcodes used to transform the culture.
  • The culture is subjected to a sporulation stimulus to form tetrads, each containing four haploid spores.
  • the spores are then separated, but sister spores originating from the same tetrad can be identified by the barcode. Further, the tetrads themselves can be separated from the remainder of the culture by FACS based on the reporter fusion protein.
  • Second is the incorporation of a highly complex pool of DNA barcodes in a form that transmits the same unique sequence to all four spores of a tetrad and can be read by DNA sequencing of the recombinant progeny. This identifies which progeny come from the same tetrad.
  • Third is the genotyping step, which in one embodiment uses RAD-tag sequencing of a consistent 2-3% subset of the genome, including the tetrad-specific barcode, to genotype progeny strains. The recovery of tetrad relationships along with the empirically derived genotyping data from the cross allows the accurate inference of missing information, including markers with low sequence coverage as well as the complete genotype of inviable (and therefore unrecoverable) individuals.
  • The method is illustrated in the most commonly used microorganism for meiotic mapping, the yeast S. cerevisiae. However, with minor substitutions of organism-specific reagents, e.g., different sporulation-specific proteins fused to GFP, the method should be readily transferable to other microorganisms, including organisms in which meiotic mapping is significantly more labor intensive or currently intractable.
  • the invention is directed to an improved method for isolating and sequencing spores from a tetrad-forming organism wherein said improvements comprise providing diploids subject to tetrad formation and sporulation which diploids contain a nucleic acid molecule comprising an expression system for a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein and/or said diploids contain unique barcodes.
  • the invention is directed to a nucleic acid molecule which contains an expression system operable in an organism that forms spores from tetrads said system to produce a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein.
  • the nucleic acid molecules may also contain a unique barcode and/or a selection marker.
  • the invention is directed to a culture of cells or a library comprising nucleic acid molecule which contains an expression system operable in an organism that forms spores from tetrads said system to produce a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein.
  • the nucleic acid molecules may also contain a unique barcode and/or a selection marker.
  • Figure 1 shows an outline of the method of the invention.
  • Figure 2 shows the genetic patterns obtained when spores are sorted to a single tetrad.
  • Figures 3A-3D show the results of FACS separation of tetrads from other cells.
  • Tetrad dissection in yeast has two critical steps that are difficult to automate because they are performed manually with a micromanipulator mounted to a microscope. The first is the isolation of tetrads away from unsporulated cells in the culture, which often outnumber tetrads (99 to 1 in the commonly used FY strain background) (Swain Lenz and Fay, unpublished result). The second is physically separating the spores of a tetrad and arranging them in a grid. In S. cerevisiae, spores are held together by both an outer ascus, the remnant of the cell wall of the original diploid cell, and a set of interspore bridges (ref. 13).
  • enzymatic digestion removes the ascus and a researcher uses a micromanipulator to break the interspore bridges and array the spores in a gridded pattern.
  • the grid separates spores to prevent interspore mating and also preserves the knowledge of which spores came from the same tetrad.
  • Tetrads are isolated from unsporulated cells in the culture using a meiosis-specific fluorescent reporter and FACS. Tetrads are then disrupted, physically separating the spores, which are randomly arrayed on an agar plate. Because sister spores share a unique molecular barcode that can be read during the genotyping of the strains, the tetrad relationship between sisters is maintained even among randomly arrayed cells.
  • Both the meiosis-specific fluorescent reporter and the molecular barcode are introduced and transmitted to the recombinant progeny by means of a plasmid library and maintained by drug selection. Barcodes are oligonucleotides of 4-20 or intermediate numbers of bases. The complexity of the barcode system is determined by the number of randomized bases in the sequence.
  • FIG. 1 depicts a typical high copy plasmid with barcodes used to transform the diploid cells.
  • a library with multiple plasmids with different barcodes is used.
  • the expression system for the reporter protein is on the same plasmid, though it need not be.
  • the diploids are sporulated to create tetrads that express the fluorescent protein reporter (shown as EGFP in the plasmid and GFP in the tetrad) and the tetrads are then isolated by FACS.
  • the spores are then separated on agar plates and sequenced using the currently available high throughput sequencing techniques.
  • the data are then grouped according to spores originating from the same tetrad.
  • FACS sorting permits an easy and rapid separation of 4-spore tetrads out of a mixed population that includes vegetative cells, dead cells, clumped cells, and 2-spore dyads.
  • Several reporter genes have recently been used to fluorescently label tetrads (refs. 14-16).
  • SPS2-GFP fusion because it has been successfully used to quantitate sporulation in a number of genetically diverse, non-laboratory strains.
  • a molecular barcoding strategy is employed to identify spores from the same tetrad.
  • the strategy satisfies four main criteria.
  • the pool of barcodes must be complex enough to ensure that most individuals recovered share a common barcode because they were members of the same tetrad.
  • the barcode must be reliably transmitted to all four tetrad spores.
  • the presence of the barcode should be phenotypically neutral.
  • The barcode should be compatible with the method used to determine the progeny genotypes, allowing the barcode to be read as part of the strain genotyping workflow. No existing barcoding resource satisfies all of these criteria. For example, strategies that integrate barcodes at a neutral genomic location (ref. 17) will be heterozygous in a diploid genome and thus only present in half of the tetrad's spores.
  • plasmids are maintained in high copy (10-40 copies per cell) and stably segregate during cell division (ref. 18), greatly increasing the likelihood of the plasmid's transmission to all four spores.
  • The presence of engineered 2-micron plasmids should have a relatively neutral impact on most traits. However, because the plasmid is no longer required after the strain's genotype is determined, direct counter selection or simple failure to maintain selection would facilitate plasmid loss.
  • a 2-micron-based plasmid library that contains the SPS2-GFP sporulation-specific fluorescent reporter, a complex DNA barcode flanked by restriction sites compatible with our RAD-tag sequencing protocol and a drug resistance marker for plasmid maintenance is transformed into heterozygous diploid cells resulting from a cross of two haploid strains.
  • The complexity of the library is conferred by the presence of a randomized 15-nucleotide sequence, which permits a theoretical 4^15, or approximately 10^9, unique sequences.
  • The number of different barcodes may vary from 10 to 10^10 and all intervening integers, e.g., 100, 1000, 10^5, etc.
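As a rough check on the complexity figures above, the number of distinct sequences for n randomized bases is 4^n, and the chance that two tetrads on one plate carry the same barcode can be estimated with a birthday-problem calculation. The sketch below is illustrative only: it assumes every barcode in the theoretical pool is equally likely, which a real library only approximates, and the function names are hypothetical. The figure of 25 tetrads per plate is the sorting density given in the methods.

```python
import math


def barcode_space(n_random_bases: int) -> int:
    """Number of distinct sequences for a fully randomized stretch of DNA."""
    return 4 ** n_random_bases


def collision_probability(n_tetrads: int, n_barcodes: int) -> float:
    """Birthday-problem estimate that at least two tetrads share a barcode.

    Assumes each tetrad draws its barcode uniformly and independently
    from n_barcodes possibilities (an idealization of a real library).
    """
    # Sum logs of (1 - k/N) to avoid losing precision for tiny probabilities.
    log_p_all_unique = sum(
        math.log1p(-k / n_barcodes) for k in range(n_tetrads)
    )
    return -math.expm1(log_p_all_unique)


if __name__ == "__main__":
    space = barcode_space(15)
    p_shared = collision_probability(25, space)
    print(f"15 random bases -> {space:,} possible barcodes")
    print(f"P(shared barcode among 25 tetrads) = {p_shared:.2e}")
```

Under these idealized assumptions a shared barcode on a single plate is very unlikely, which is consistent with the choice to analyze each plate independently.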
  • each spore is sequenced using an efficient sequencing technique. Any such technique may be used. However, for illustration, and for convenience, a sequencing strategy that permits the simultaneous determination of the genotype itself and the barcode is used.
  • the plasmid-borne tetrad barcode is flanked by the same restriction sites used in our RAD-tag method, its sequence is present in the genotyping reads.
  • strains arising from the same FACS-sorted plate that share a common plasmid barcode sequence are grouped together as members of the same tetrad, a hypothesis that is confirmed by a series of quality control metrics.
  • the small proportion of strains (5%) that lack a clear tetrad barcode in their sequence reads can later be assigned to tetrads based on the expectation of 2:2 allele segregation of markers within tetrads.
  • strains lacking this sequence still have the potential to be assigned to tetrads.
  • the method of the present invention permits inference of a complete genome even though some sequence information is missing.
  • Missing markers can also be inferred probabilistically based on genetic linkage, i.e., an untyped marker that is close to a typed marker has a high probability of carrying the same allele as the typed marker. This probability can be calculated based purely on the genetic distances between markers. However, the use of both genetic distance and the known haplotypes of all spores in the tetrad can improve the accuracy of inference, sometimes greatly, by incorporating the probability of all possible recombination patterns at the tetrad level. In the pilot crosses below, ~12% of the final set of allele calls were made using these inference methods.
  • An exciting extension of tetrad-based genotype inference is the ability to infer the full genome sequence of non-viable spores. This permits the discovery of synthetic interactions, like those seen in synthetic lethal screens, except that they result from natural variation. Using the invention method, synthetic lethal screens should be significantly less limited by strain background and the number of interacting genes than current methods. For example, it should be possible to uncover a synthetic interaction between four genes in two previously
  • YPS163 (ref. 23) gave 86% spore viability. This is more typical of crosses between genetically distant strains. In the high viability (FY x Σ1278b) cross, 71% of progeny were assembled into 3 or 4 spore tetrads. For the lower viability (S288c x YPS163) cross, 64% were assembled into 3-4 spore tetrads. Using this "percent in tetrads" metric as a measure of efficiency and correcting for the 15% difference in spore viability, the resulting efficiency of the method is equivalent between the two crosses.
  • Enormous numbers of recombinant progeny are required to gain a full understanding of the mechanisms involved in the complex interplay between genotype, phenotype and environment, and the invention method provides a high-throughput approach combining tetrad dissection and genotyping the progeny of yeast crosses or crosses of other microorganisms, such as Neurospora crassa and Chlamydomonas reinhardtii.
  • Genotype information for each progeny strain is obtained while tetrad relationships are preserved by means of unique tetrad barcodes.
  • the invention most closely recapitulates the information provided by a manually dissected yeast cross. This tetrad information allows for use of the expected 2:2 allele segregation pattern to infer missing markers and permits reconstruction of the full genotypes of spores that are inviable and therefore unrecoverable.
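The 2:2 segregation rule mentioned above can be sketched for a single marker as follows. This is a minimal illustration, not the procedure used in the source: it assumes strict Mendelian 2:2 segregation (so it ignores gene conversion), treats an inviable spore simply as a spore whose calls are all missing, and uses hypothetical names throughout.

```python
from typing import List, Optional

Allele = Optional[str]  # "P1", "P2", or None when the call is missing


def infer_by_segregation(calls: List[Allele]) -> List[Allele]:
    """Fill missing calls at one marker using expected 2:2 segregation.

    calls holds the four spores of one tetrad. Missing entries are filled
    only when the observed calls already contain two copies of one parental
    allele; otherwise the input is returned unchanged.
    """
    if len(calls) != 4:
        raise ValueError("a tetrad has exactly four spores")
    n_p1, n_p2 = calls.count("P1"), calls.count("P2")
    if n_p1 > 2 or n_p2 > 2:
        return list(calls)  # missegregation: do not guess
    if n_p1 == 2:
        fill = "P2"
    elif n_p2 == 2:
        fill = "P1"
    else:
        return list(calls)  # too little information at this marker
    return [fill if call is None else call for call in calls]


if __name__ == "__main__":
    # Fourth spore inviable or untyped at this marker.
    print(infer_by_segregation(["P1", "P2", "P1", None]))
```

Applying the function marker by marker to a tetrad with one unrecovered spore reconstructs that spore's genotype wherever the three surviving sisters are fully typed.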
  • Ehrenreich I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039-1042, doi: 10.1038/nature08923 (2010).
  • Fogel, S., Mortimer, R., Lusnak, K. & Tavares, F. Meiotic gene conversion a signal of the basic recombination event in yeast. Cold Spring Harb Symp Quant Biol 43 Pt 2, 1325-1341 (1979).
  • S. cerevisiae strains and genome sequences used in this study are as follows.
  • a genome assembly for YPS163 was generated by assembling a maximal consistent set of polymorphisms relative to S288c and applying these polymorphisms to the reference sequence.
  • Gapl.l_F CTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATAG
  • Gapl.l_R CTCCTTACGCATCTGTGCGGTATTTCACACCGCATAGATCTTA
  • P1 4-base barcode sequences used in this study: GGAT, TGCA, CGTT, AGGC, GGTA, TGGT, CGCG, AGAA, GGCC, TGAC, CGGA, AGTG, GCGT, TCTT, CCAA, ACCA, GCAC, TCCG, CCTC, ACGG, GCTG, TCGA, CCCT, ACAT, GTCA, TTAA, CTGG, ATTT, GTGC, TTTC, CTAT, ATCG, GTAG, TTCT, CTTA, ATGA, GATT, TAGC, CACC, AAAC, GACG, TAAT, CAGT, AATA, GAGA, TATG, CAAG, AACT
  • pCL2_BC The plasmid-based barcode library
  • pCL2 plasmid backbone was constructed by gap repair in yeast as follows: the yeast 2-micron ADE2 plasmid, pRS422, was cut with BglII. The ADE2-containing fragment was discarded and the remaining plasmid backbone was treated with Antarctic
  • SPS2::EGFP::kanMX4 cassette was amplified from BC257 (gift of Barak Cohen) using primers Gapl.l_F and Gapl.l_R that bear homology to both the SPS2 genomic and plasmid DNA sequences.
  • the resulting PCR product was co-transformed along with the plasmid fragment into yeast.
  • Transformants were selected on YPD agar containing 200 µg/ml G418. G418-resistant clones were scraped and pooled; DNA was prepared and transformed into OneShot TOP10 chemically competent bacteria (Life Technologies). Bacterial transformants were selected on LB-carbenicillin plates and analyzed by restriction digestion to identify the repaired plasmid.
  • a complex library of random barcodes was inserted as follows: 20 nmoles of a 200-mer oligo, including a high complexity 15-base degenerate region, was amplified by 20 rounds of PCR using Phusion ® High-Fidelity DNA Polymerase (Thermo Fisher) with BC_F and BC_R primers at a final concentration of 20 pM each.
  • The DNA from 24 separate reactions was pooled and ligated to the linearized pCL2 at its unique SmaI site using the In-Fusion® HD Cloning System (Clontech). To maintain complexity, five ligation reactions were carried out and used for 18 independent bacterial transformations onto LB-carbenicillin selection plates.
  • The barcode complexity of the pCL2_BC library was assessed by Illumina DNA sequence analysis. Briefly, 1.5 µg of the plasmid library was fragmented by digestion with MfeI and Sau3AI (a DAM-methylation insensitive isoschizomer of MboI). Digests were incubated for 2 hrs at 37°C in a 20 µl reaction with 2 units of Sau3AI and 10 units of MfeI (NEB), followed by heat inactivation at 65°C for 20 min.
  • The P2 adaptor and four sets of barcoded P1 adaptors were then ligated onto the plasmid fragments at room temperature for 20 min in a single 25 µl reaction containing 1 µg of digested plasmid, 400 units T4 DNA ligase (NEB), 2.5 µl 10x T4 ligase buffer and 6 µl of a combined P1 (25 nM), P2 (1 µM) adaptor mix.
  • The T4 ligase was heat inactivated for 20 min at 65°C.
  • The ligated plasmid DNA was concentrated to 10 µl using a MinElute® PCR Purification Kit (Qiagen). The DNA was size selected and extracted, as below. Approximately 10 ng of the purified plasmid DNA library was enriched with a PCR reaction and sequenced in a single flow cell lane of a Genome Analyzer IIx (Illumina).
  • Heterozygous diploids resulting from crosses between two parental strains were grown to ~2 x 10 cells/ml and transformed with ~2 µg of the pCL2_BC barcoded plasmid library using a standard protocol (ref. 9) modified to include 8% DMSO in the transformation mix. After the 30 min 42°C heat shock step, the transformed cells were gently washed with 1 ml of YPD, resuspended in 1 ml of YPD, and allowed to recover by sitting at room temperature for 3 hours. Transformants were then selected by plating 200 µl of the recovered culture per YPD + 200 µg/ml G418 plate, a total of five plates per transformation.
  • Tetrads were isolated from the sporulation culture by FACS with a FACSAria II equipped with an Automated Cell Deposition Unit (BD Biosciences). GFP fluorescence was detected using the 488 nm laser and 530/30 filter. To achieve a reproducibly high proportion of tetrads we implemented a series of gating steps. The results are shown in Figure 3. Selecting a narrow width of the FSC and SSC signals, while permitting a large range of FSC and SSC heights, filtered out events containing cell or media debris as well as those containing multiple cells per droplet (Figure 3A,B). A GFP vs. FSC area gate was used to identify fluorescent (and therefore sporulated) cells (Figure 3C).
  • The population selected by these steps consisted of two subpopulations: one subpopulation was composed of clumps of tetrads and tetrads with a small bud attached, while the other subpopulation was primarily composed of isolated tetrads. These subpopulations were distinguished from each other on the basis of their FSC signal. The clumps and budded tetrads had a higher FSC than the isolated tetrads, though the distribution of FSC in these two subpopulations did overlap, as indicated by the overlapping peaks in Figure 3D. To enrich for isolated tetrads, we set a final gate to include events with a low FSC. During gate assignment, tetrad recovery was assessed by sorting 1000 events onto a microscope slide and manually counting tetrads.
  • Tetrads were sorted directly onto YPD + 200 µg/ml G418 agar plates with a 25 µl drop of 1 mg/ml zymolyase in 0.7 M sorbitol on top of the agar. Tetrads were sorted into the drop by positioning the plate on top of the 96-well plate adaptor and directly under the sorting stream. To reduce the chance of recovering two tetrads with the same plasmid barcode on the same plate and to ensure the development of single, isolated colonies, only 25 tetrads were sorted per plate. Each plate was inverted immediately after being removed from the sorter and incubated at 37°C for 30 min.
  • Yeast genomic DNA was isolated for RAD-tag sequencing as follows. 96-well format plates were used to seed 0.5 ml cultures in 2 ml deep-well plates containing YPD with 200 µg/ml of G418. These were then grown overnight at 30°C on a VibraTranslator® electromagnetic shaker (Union Scientific Corp.). Yeast cells were pelleted at 1000 x g for 5 min. Yeast genomic DNA was extracted in 96-well format using the ZR-96 Fungal/Bacterial DNA Kit™ (ZymoResearch).
  • Each cell pellet was resuspended in 50 µl H2O, 400 µl of ZR lysis buffer was added, and the suspension was transferred to the kit's ZR lysis rack, containing 0.5 mm beads.
  • The racks were processed at 1300 rpm for 2 min in a 96-well block bead beater (Geno/Grinder® 2010, SPEX Sample Prep). After centrifugation, supernatants were transferred to a 96 deep-well block and DNA binding, washing and elution procedures were followed as specified in the manufacturer's protocol, except that DNA was eluted in 35 µl of DNA elution buffer.
  • The genotype and barcode of each strain were determined using a multiplexed RAD-tag (ref. 11) sequencing strategy.
  • ~50 ng genomic DNA was fragmented by restriction enzyme digestion with MfeI and MboI (New England Biolabs). The digests were incubated 1 hr at 37°C in a 12.5 µl reaction containing 2.5 units of each enzyme, then heat inactivated at 65°C for 20 min.
  • Adaptors were ligated onto the fragments in a 25 µl reaction containing the entire digest, 400 units T4 DNA ligase (New England Biolabs), 2.5 µl 10x T4 ligase buffer and 5 µl of a combined P1 (25 nM), P2 (1 µM) adaptor mix (IDT) at room temperature for 20 min.
  • The P1 adaptor contains the Illumina PCR Forward sequencing primer sequence followed by one of 48 unique 4-nucleotide barcodes and finally the MfeI restriction enzyme compatible overhang sequence.
  • The P2 adaptor contains the Illumina PCR Reverse primer sequence followed by the MboI restriction enzyme compatible overhang sequence.
  • The DNA library was then enriched with a PCR reaction using Illumina PCR Forward and Reverse primers and Phusion® HF PCR Master Mix polymerase (Finnzymes). Thermocycler conditions were as follows: 98°C / 1 min; 14 cycles of 98°C / 10 sec, 60°C / 30 sec, 72°C / 30 sec; final extension at 72°C / 4 min.
  • Raw read sequences were split into 48 pools based on their strain barcode sequences, which are contained in the first four bases of the read. Reads with unexpected strain barcodes, or with barcodes having Phred (-10 log10 P_error) quality scores less than 20 or ambiguous ("N") calls at any barcode base, were discarded. Reads with more than 2 "N" calls outside the barcode were also discarded. In each of the resulting strain-specific pools of reads, the barcode sequences were removed and the remaining 36 base pairs of sequence were searched for reads carrying the plasmid (tetrad) barcode.
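The read-splitting rules described above can be sketched as a small filter. This is a hedged illustration rather than the pipeline used in the source: the function name and the (sequence, quality string) input format are hypothetical, records are assumed to be well formed, and the Phred+33 quality encoding is an assumption (older Illumina pipelines commonly used an offset of 64).

```python
from typing import Dict, List, Set, Tuple


def demultiplex(
    reads: List[Tuple[str, str]],
    strain_barcodes: Set[str],
    min_quality: int = 20,
    max_n_outside: int = 2,
    phred_offset: int = 33,  # assumption; older Illumina output used 64
) -> Dict[str, List[str]]:
    """Split (sequence, quality) reads into per-strain pools.

    The strain barcode is the first four bases of each read. A read is
    dropped if its barcode is not an expected one (this also rejects any
    barcode containing "N"), if any barcode base has Phred quality below
    min_quality, or if the rest of the read has more than max_n_outside
    "N" calls. The barcode is trimmed from reads that are kept.
    """
    pools: Dict[str, List[str]] = {bc: [] for bc in strain_barcodes}
    for seq, qual in reads:
        barcode, insert = seq[:4], seq[4:]
        if barcode not in strain_barcodes:
            continue
        if min(ord(ch) - phred_offset for ch in qual[:4]) < min_quality:
            continue
        if insert.count("N") > max_n_outside:
            continue
        pools[barcode].append(insert)
    return pools


if __name__ == "__main__":
    example = [("GGAT" + "A" * 36, "I" * 40), ("NGAT" + "A" * 36, "I" * 40)]
    print({bc: len(p) for bc, p in demultiplex(example, {"GGAT", "TGCA"}).items()})
```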
  • Tetrad barcodes were identified using the pattern <read-start>NNNNNTGCCGACCC<barcode>GCAGG, where the barcode is restricted to a length of 11-19 nucleotides. A single mismatch or nucleotide deletion was allowed in the pattern match outside the barcode. A consensus length and sequence for the tetrad barcode were derived from the set of all plasmid barcode reads coming from each strain. [0047] The strain-specific read pools were then used to infer the genotypes of the progeny strains. From each strain pool, the sequence reads that did not correspond to the plasmid barcodes (above) were aligned to both fully sequenced parental genome sequences.
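One way to express the tetrad barcode pattern is as a regular expression. The sketch below reads "NNNNN" as five arbitrary bases at the read start, which is an interpretation of the pattern rather than something the text states. It performs exact matching only, so the single mismatch or deletion tolerated outside the barcode is not implemented, and the consensus is simplified to the most frequent barcode among a strain's reads. All names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional

# Five arbitrary bases, the fixed left flank, an 11-19 base barcode,
# then the fixed right flank. Anchored at the start of the read.
TETRAD_BARCODE = re.compile(r"^[ACGTN]{5}TGCCGACCC([ACGT]{11,19})GCAGG")


def extract_tetrad_barcode(read: str) -> Optional[str]:
    """Return the barcode if the read matches the plasmid pattern exactly."""
    match = TETRAD_BARCODE.match(read)
    return match.group(1) if match else None


def consensus_barcode(reads: List[str]) -> Optional[str]:
    """Most frequent barcode among a strain's reads (a simplification)."""
    found = [bc for bc in map(extract_tetrad_barcode, reads) if bc is not None]
    if not found:
        return None
    return Counter(found).most_common(1)[0][0]


if __name__ == "__main__":
    read = "AATTC" + "TGCCGACCC" + "ACGTACGTACGTACG" + "GCAGG" + "TT"
    print(extract_tetrad_barcode(read))
```

Reads for which `extract_tetrad_barcode` returns `None` would be the ones passed on to genome alignment in the workflow described above.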
  • P1: parent 1
  • P2: parent 2
  • MfeI site polymorphisms: two classes of informative MfeI markers were observed: loci in which the MfeI site was present in both parental genomes with adjacent sequence polymorphisms, and loci in which the MfeI site was present in only one parent (restriction site polymorphisms).
  • Scores supporting the P1 and P2 alleles for that MfeI marker were generated as follows. For all polymorphic nucleotides within each read aligned at the MfeI site, the read was allowed to increase support for the P1 or P2 score of that MfeI marker by the base quality (Phred) of the allele called. No read was allowed to increase the total P1 or P2 support by more than its higher alignment quality. In cases where only one parent had an MfeI site (restriction site polymorphisms), each aligned read increased the relevant allele support for that MfeI marker by the alignment quality of the read.
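The scoring rule for markers present in both parents can be sketched as below. The reading of the cap is an assumption: each read's contribution to either allele is limited to the higher of its two parental alignment qualities. The record layout (`snp_calls`, `mapq_p1`, `mapq_p2`) and the function name are hypothetical and do not come from the source.

```python
from typing import Dict, List


def marker_support(reads: List[dict]) -> Dict[str, int]:
    """Accumulate P1 and P2 support for one MfeI marker.

    Each read is a dict with:
      snp_calls: list of (allele, base_quality) pairs, one per polymorphic
                 nucleotide covered by the read, allele being "P1" or "P2"
      mapq_p1, mapq_p2: alignment quality against each parental genome
    """
    support = {"P1": 0, "P2": 0}
    for read in reads:
        per_read = {"P1": 0, "P2": 0}
        for allele, base_quality in read["snp_calls"]:
            per_read[allele] += base_quality
        # Assumed cap: the better of the read's two alignment qualities.
        cap = max(read["mapq_p1"], read["mapq_p2"])
        for allele in support:
            support[allele] += min(per_read[allele], cap)
    return support


if __name__ == "__main__":
    reads = [
        {"snp_calls": [("P1", 30), ("P1", 35)], "mapq_p1": 40, "mapq_p2": 20},
        {"snp_calls": [("P2", 25)], "mapq_p1": 10, "mapq_p2": 30},
    ]
    print(marker_support(reads))
```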
  • Strains were then grouped into tetrads based on common tetrad barcode sequences. Strains derived from each plate of 25 sorted tetrads were analyzed independently to reduce the risk of encountering more than 1 tetrad with the same plasmid barcode. Duplicate strains were identified (>90% identical allele calls across at least 100 markers) and the lower coverage strain removed. Strains where the number of tetrad barcode reads was <0.15% of the number of aligned reads used in genotyping (cutoff determined empirically) had their tetrad barcode removed and were relabeled as un-barcoded.
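The grouping step can be illustrated with a short sketch that keys tetrads on plate plus barcode and applies a barcode-read cutoff of 0.15% of aligned reads. The record fields and function name are hypothetical, and duplicate-strain removal is left out for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_into_tetrads(
    strains: List[dict], min_fraction: float = 0.0015
) -> Tuple[Dict[Tuple[str, str], List[str]], List[str]]:
    """Group strains by (plate, tetrad barcode).

    Each strain is a dict with keys: name, plate, barcode (str or None),
    barcode_reads, aligned_reads. Strains whose barcode reads fall below
    min_fraction of their aligned reads are treated as un-barcoded.
    """
    tetrads: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    unbarcoded: List[str] = []
    for strain in strains:
        barcode = strain["barcode"]
        too_few = strain["barcode_reads"] < min_fraction * strain["aligned_reads"]
        if barcode is None or too_few:
            unbarcoded.append(strain["name"])
        else:
            # Plates are analyzed independently, so the plate is part of the key.
            tetrads[(strain["plate"], barcode)].append(strain["name"])
    return dict(tetrads), unbarcoded
```

Strains returned in the un-barcoded list are the candidates for later assignment by 2:2 segregation, as described elsewhere in the text.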
  • Strains, tetrads and markers were then assessed for several quality metrics. Strains were assessed for heterozygosity based on the proportion of allele 3 calls. Next, 3 and 4-spore tetrads were assessed for the frequency of marker missegregation (>2 P1 or P2 alleles). A 10% threshold was used to define "high quality" strains and tetrads. Finally, low quality markers were identified and removed. Unless they were called in >10% but not >60% of strains, mono-allelic markers (MfeI restriction site polymorphisms) were removed. All other markers were also removed unless they were called as P1 or P2 in >10% of strains and showed a P1/P2 segregation ratio across all strains within the range of 0.8:1.25.
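The filter for markers present in both parents can be sketched as a single predicate. This is an illustrative reading of the thresholds above (call rate above 10% and a P1/P2 ratio between 0.8 and 1.25, with inclusive bounds assumed); the function name and input format are hypothetical, and the separate rule for mono-allelic markers is not covered.

```python
from typing import List, Optional, Tuple


def keep_biallelic_marker(
    calls: List[Optional[str]],
    min_call_rate: float = 0.10,
    ratio_range: Tuple[float, float] = (0.8, 1.25),
) -> bool:
    """Decide whether a marker passes the call-rate and segregation filters.

    calls holds one entry per strain: "P1", "P2", or None if uncalled.
    """
    n_p1, n_p2 = calls.count("P1"), calls.count("P2")
    if not calls or n_p2 == 0:
        return False  # no strains, or ratio undefined
    call_rate = (n_p1 + n_p2) / len(calls)
    ratio = n_p1 / n_p2
    low, high = ratio_range
    return call_rate > min_call_rate and low <= ratio <= high
```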
  • missing alleles were inferred based on the relative probability of all possible local crossover patterns within the tetrad, anchored at flanking positions with allele calls in all 4 spores.
  • Recombination frequencies were calculated from the physical distance between markers, using a genome-wide regression of genetic on physical distance with genetic distances calculated using Haldane's mapping function (ref. 12). Allele calls with probabilities greater than 0.99 were then accepted.
  • Estimated recombination frequencies between missing and flanking typed markers within each strain individually were used to infer the probability of P1 vs. P2 alleles at the missing markers. Genetic distance was calculated from physical distance using the same method as previously and the same probability cutoff of 0.99 was employed.
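The single-strain inference described above can be sketched with Haldane's mapping function, r = (1 - e^(-2d)) / 2 for a genetic distance d in Morgans. The sketch assumes no crossover interference, equal prior probability for the two parental alleles, positive genetic distances supplied directly by the caller (the source derives them from physical distance by regression, which is not reproduced here), and hypothetical function names.

```python
import math
from typing import Optional


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Haldane's mapping function: genetic distance to recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def prob_p1_at_missing(
    left_allele: str, right_allele: str, d_left: float, d_right: float
) -> float:
    """Posterior probability that an untyped marker carries the P1 allele.

    left_allele and right_allele are the calls ("P1" or "P2") at the
    flanking typed markers; d_left and d_right are the genetic distances
    (Morgans, assumed > 0) from the missing marker to each flank.
    """
    r_left = haldane_recombination_fraction(d_left)
    r_right = haldane_recombination_fraction(d_right)

    def likelihood(candidate: str) -> float:
        # No recombination if the candidate matches the flank, else one.
        p_left = 1.0 - r_left if candidate == left_allele else r_left
        p_right = 1.0 - r_right if candidate == right_allele else r_right
        return p_left * p_right

    like_p1, like_p2 = likelihood("P1"), likelihood("P2")
    return like_p1 / (like_p1 + like_p2)


def call_allele(p_p1: float, cutoff: float = 0.99) -> Optional[str]:
    """Accept a call only when one allele exceeds the probability cutoff."""
    if p_p1 > cutoff:
        return "P1"
    if 1.0 - p_p1 > cutoff:
        return "P2"
    return None
```

With concordant flanks a short distance away the call is accepted, whereas a missing marker midway between discordant flanks stays uncalled, which is the situation where tetrad-level information adds the most.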
  • Haplotypes were first generated without linkage-based inference and then analyzed using R/qtl (version 1.21-2). Markers with abnormal linkage patterns (linked to no other marker, linked to another chromosome or distant region of the same chromosome, etc.) were identified and flagged. Haplotypes were then generated a second time allowing the use of linkage-based inference, after removing the flagged markers.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Mycology (AREA)
  • Biomedical Technology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

Improvements in methods to obtain and characterize the genome of tetrad spores by providing fluorescent markers and/or barcodes are disclosed. The invention relates to improvements in techniques for studying the effects of genetic mutations on phenotypes and interactions between genes and environmental factors. A classic method for such study based on sporulation of tetrads in yeast and other microorganisms that undergo similar behaviors is adapted for high throughput analysis by providing meiotic-based fluorescent labeling and the use of barcodes.

Description

IMPROVED HIGH THROUGHPUT SYSTEM FOR GENETIC STUDIES
Statement of Rights to Inventions Made Under Federally Sponsored Research
[0001] This invention was supported in part by grants from the National Institutes of Health and the National Human Genome Research Institute. The U.S. government has certain rights in this invention.
Technical Field
[0002] The invention relates to improvements in techniques for studying the effects of genetic mutations on phenotypes and interactions between genes and environmental factors. A classic method for such study based on sporulation of tetrads in yeast and other microorganisms that undergo similar behaviors is adapted for high throughput analysis by providing meiotic-based fluorescent labeling and the use of barcodes.
Background Art
[0003] Meiotic mapping is a linkage-based method for analyzing the recombinant progeny of a cross that has long been a cornerstone of genetics. The method is possible in a wide range of eukaryotes, including genetically facile yeasts and less tractable microorganisms, such as the filamentous fungus Neurospora crassa and the unicellular green alga Chlamydomonas reinhardtii. The approach is enabled by tetrad dissection, a technique for isolating and cultivating "with complete certainty all of the spores [meiotic progeny] derived from individual asci [tetrads]" that was first developed in S. cerevisiae by Winge and Laustsen1. In the 75 years since that publication, the method has catalyzed yeast genetic research. However, the manual process of dissecting tetrads severely limits its throughput, even for experienced researchers with access to specialized equipment (i.e., a microscope equipped with a micromanipulator). One prominent yeast geneticist, Cora Styles, documented 12,157 yeast crosses over a career that spanned 30 years (Gerald Fink, personal communication), a number few could hope to replicate.
[0004] Many approaches have tried to circumvent this bottleneck. While details differ between methods and organisms, they generally employ one of three strategies. The first strategy, "random spore analysis", initially enriches for tetrads by relying on properties of the ascus, which protects spores from a variety of insults that kill vegetative cells. Spores are then randomly dispersed on solid media to recover the recombinant progeny. A second strategy avoids much of the high variability and low specificity of random spore analysis by using a selectable reporter gene (HIS3) under the control of a mating-type-specific transcriptional promoter. This approach has been applied with great success to generate specific classes of recombinant progeny needed to test synthetic growth defects3-5 and linkage between traits and gene deletions6. The third strategy, "bulk segregant analysis"7 or more recently "extreme QTL (X-QTL) mapping", has been used in organisms ranging from yeast to plants. A common feature of bulk segregant methods is the use of a pooled genotyping strategy to identify regions of DNA common to the majority of progeny under a specific selection criterion, e.g., growth under high drug concentrations. While the three strategies have been effectively applied to specific problems, they each have limitations that fall short of the broad applicability of conventional tetrad analysis. Chief amongst these are the inability to recover all viable meiotic progeny (either due to the progeny generation method or the phenotypic selection imposed for bulk comparisons) and the loss of the tetrad relationship between progeny (i.e., knowledge of which sister spores were members of the same original tetrad).
[0005] The limited throughput of manual tetrad dissection stands in stark contrast to the need for extremely large numbers in two research areas where meiotic mapping can be powerfully applied. The first area is the genetic mapping of complex traits resulting from combinations of naturally occurring polymorphisms. A better understanding of how complex interactions between genes and environmental factors give rise to phenotypic variation is essential for human health, agriculture, and bioengineering. Unfortunately, most genetic studies in these areas are currently limited by the power to detect only a small fraction of the genetic loci that contribute to a trait9. The second area is the study of the molecular mechanisms of recombination, in which the segregation pattern of DNA serves as an assay for molecular processes. The study of some recombination and segregation processes depends on capturing all the events of an individual meiosis. A striking example is the study of gene conversion, for which one seminal paper utilized over 19,000 tetrads10. In the absence of high-throughput alternatives, such fields are unable to effectively leverage the current revolution in DNA sequencing technology, the costs of which have decreased 100,000-fold over the past decade and continue to outpace Moore's Law11.
Disclosure of the Invention
[0006] The present invention takes advantage of the availability of reporter genes, barcodes, and efficient sequencing techniques to modernize the use of tetrads in studies of genetic interactions and phenotypic characteristics. Diploids obtained by crossing two parental haploid mating strains are provided with nucleic acids containing an expression system for a fusion protein that comprises a protein that participates in meiosis and a reporter such as a fluorescent protein and are also provided with a barcode. A library of such barcodes is used to transfect a culture of diploids that have been obtained by genetic cross of haploid parents. The expression system and barcode can be supplied on the same plasmid and a library of plasmids containing a multiplicity of barcodes used to transform the culture. The culture is subjected to sporulation stimulus to form tetrads each tetrad containing four haploid spores. The spores are then separated, but sister spores originating from the same tetrad can be identified by the barcode. Further, the tetrads themselves can be separated from the remainder of the culture by FACS based on the reporter fusion protein.
[0007] In somewhat more detail, a new high-throughput tetrad genotyping method called Barcode Enabled Sequencing of Tetrads (BEST) has been developed. BEST expands upon current methods in high-throughput genetics by enabling the generation and genotyping of large numbers of progeny that are isolated, genotyped, and maintained as individuals in a manner that allows the sister spore relationships of all four meiotic products to be recovered. BEST combines tetrad dissection and progeny genotyping and is based on three main strategies. First is the introduction of a reporter construct that labels cells that have undergone meiosis with GFP and allows them to be isolated by fluorescence-activated cell sorting (FACS). Second is the incorporation of a highly complex pool of DNA barcodes in a form that transmits the same unique sequence to all four spores of a tetrad and can be read by DNA sequencing of the recombinant progeny. This identifies which progeny come from the same tetrad. Third is the genotyping step, which in one embodiment uses RAD-tag sequencing of a consistent 2-3% subset of the genome, including the tetrad-specific barcode, to genotype progeny strains. The recovery of tetrad relationships along with the empirically-derived genotyping data from the cross allows the accurate inference of missing information, including markers with low sequence coverage as well as the complete genotype of inviable (and therefore unrecoverable) individuals. The method is illustrated in the most commonly used microorganism for meiotic mapping, the yeast S. cerevisiae. However, with minor substitutions of organism-specific reagents, e.g., different sporulation-specific proteins fused to GFP, the method should be readily transferable to other microorganisms, including organisms in which meiotic mapping is significantly more labor intensive or currently intractable.
[0008] Thus in one aspect, the invention is directed to an improved method for isolating and sequencing spores from a tetrad-forming organism wherein said improvements comprise providing diploids subject to tetrad formation and sporulation which diploids contain a nucleic acid molecule comprising an expression system for a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein and/or said diploids contain unique barcodes.
[0009] In another aspect, the invention is directed to a nucleic acid molecule which contains an expression system operable in an organism that forms spores from tetrads said system to produce a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein. The nucleic acid molecules may also contain a unique barcode and/or a selection marker.
[0010] In still another aspect, the invention is directed to a culture of cells or a library comprising nucleic acid molecule which contains an expression system operable in an organism that forms spores from tetrads said system to produce a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein. The nucleic acid molecules may also contain a unique barcode and/or a selection marker.
Brief Description of the Drawings
[0011] Figure 1 shows an outline of the method of the invention.
[0012] Figure 2 shows the genetic patterns obtained when spores are sorted to a single tetrad.
[0013] Figures 3A-3D show the results of FACS separation of tetrads from other cells.
Modes of Carrying Out the Invention
[0014] Tetrad dissection in yeast has two critical steps that are difficult to automate because they are performed manually with a micromanipulator mounted to a microscope. The first is the isolation of tetrads away from unsporulated cells in the culture, which often out-number tetrads (99 to 1 in the commonly used FY strain background) (Swain Lenz and Fay, unpublished result). The second is physically separating the spores of a tetrad and arranging them in a grid. In S. cerevisiae, spores are held together by both an outer ascus, the remnant of the cell wall of the original diploid cell, and a set of interspore bridges 13. In conventional tetrad dissection, enzymatic digestion removes the ascus and a researcher uses a micromanipulator to break the interspore bridges and array the spores in a gridded pattern. The grid separates spores to prevent interspore mating and also preserves the knowledge of which spores came from the same tetrad.
[0015] The present invention overcomes both of these bottlenecks. Tetrads are isolated from unsporulated cells in the culture using a meiosis-specific fluorescent reporter and FACS. Tetrads are then disrupted, physically separating the spores, which are randomly arrayed on an agar plate. Because sister spores share a unique molecular barcode that can be read during the genotyping of the strains, the tetrad relationship between sisters is maintained even among randomly arrayed cells. Both the meiosis-specific fluorescent reporter and the molecular barcode are introduced and transmitted to the recombinant progeny by means of a plasmid library and maintained by drug selection. Barcodes are oligonucleotides of 4-20 or intermediate numbers of bases. The complexity of the barcode system is determined by the number of randomized bases in the sequence.
[0016] The general process is shown in Figure 1 which depicts a typical high copy plasmid with barcodes used to transform the diploid cells. As noted above, a library with multiple plasmids with different barcodes is used. In this depiction the expression system for the reporter protein is on the same plasmid, though it need not be. The diploids are sporulated to create tetrads that express the fluorescent protein reporter (shown as EGFP in the plasmid and GFP in the tetrad) and the tetrads are then isolated by FACS. The spores are then separated on agar plates and sequenced using the currently available high throughput sequencing techniques. The data are then grouped according to spores originating from the same tetrad.
[0017] FACS sorting permits an easy and rapid separation of 4-spore tetrads out of a mixed population that includes vegetative cells, dead cells, clumped cells, and 2-spore dyads. Several reporter genes have recently been used to fluorescently label tetrads14-16. We chose the well characterized SPS2-GFP fusion, because it has been successfully used to quantitate sporulation in a number of genetically diverse, non-laboratory strains. We then established a series of FACS gating parameters that reproducibly yield 95% 4-spore tetrads, even from strains with relatively poor sporulation efficiency (Scott, Sirr, and Dudley, unpublished results).
[0018] The inclusion of this FACS sorting step is where the invention achieves its largest gain in throughput. Because a FACS sorter is able to query thousands of events per second, the identification and isolation of 10 tetrads can be accomplished by FACS in less than a second, while the equivalent manual process takes an experienced yeast geneticist several minutes. In crosses with poor sporulation efficiency, the burden of manual dissection increases, with the researcher required to hunt through a large excess of unsporulated cells to find tetrads. Consequently, cell sorting provides an even greater advantage in such strain backgrounds.
[0019] To prevent spore loss during liquid transfer steps, we developed a procedure for sorting tetrads into a pool of zymolyase solution that enzymatically digests the ascus directly on the agar plates. We then separate spores by agitation with glass beads, which applies the mechanical force necessary to break the interspore bridges connecting sister spores. This process also physically disperses spores randomly across the agar plate, far enough apart that colonies are pure clonal isolates of each recombinant spore. In this way, a rapid, high throughput method replaces the visual and manual processes of identifying, disrupting and spacing 4-spore tetrads. The result is that in the ~15 minutes of hands-on time required for an experienced yeast researcher to manually dissect 10 tetrads, the method of the invention can yield progeny from over 150 tetrads.
[0020] A molecular barcoding strategy is employed to identify spores from the same tetrad. The strategy satisfies four main criteria. First, the pool of barcodes must be complex enough to ensure that most individuals recovered share a common barcode because they were members of the same tetrad. Second, the barcode must be reliably transmitted to all four tetrad spores. Third, the presence of the barcode should be phenotypically neutral. Finally, the barcode should be compatible with the method used to determine the progeny genotypes, allowing the barcode to be read as part of the strain genotyping workflow. No existing barcoding resource satisfies all of these criteria. For example, strategies that integrate barcodes at a neutral genomic location 17 will be heterozygous in a diploid genome and thus only present in half of the tetrad's spores.
[0021] In S. cerevisiae a highly complex, barcoded 2-micron plasmid library can satisfy all of these requirements. A random barcode sequence, flanked by restriction sites that ensure its representation in sequencing reads of the chosen genotyping method, can be placed on a 2-micron plasmid. Such plasmids are maintained in high copy (10-40 copies per cell) and stably segregate during cell division 18, greatly increasing the likelihood of the plasmid's transmission to all four spores. Like the native 2-micron circle 18, the presence of engineered 2-micron plasmids should have a relatively neutral impact on most traits. However, because the plasmid is no longer required after the strain's genotype is determined, direct counter selection or simple failure to maintain selection would facilitate plasmid loss.
[0022] A 2-micron-based plasmid library that contains the SPS2-GFP sporulation-specific fluorescent reporter, a complex DNA barcode flanked by restriction sites compatible with our RAD-tag sequencing protocol and a drug resistance marker for plasmid maintenance is transformed into heterozygous diploid cells resulting from a cross of two haploid strains. The complexity of the library is conferred by the presence of a randomized 15 nucleotide sequence, which permits a theoretical 10^9 unique sequences. The number of different barcodes may vary from 10 to 10^10 and all intervening integers, e.g., 100, 1000, 10^5, etc. By pooling thousands of yeast transformants, we create a mixed population of barcoded diploid cells that fluoresce only when cells have undergone sporulation and that pass on a tetrad-specific barcode to each spore of every tetrad. Because the pool of barcodes, determined by the number of initial yeast transformants, is much larger than the number of tetrads that will be mixed together on the same plate, the probability of having the same barcode in two tetrads is low.
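
The complexity argument in this paragraph can be illustrated with a short calculation. The Python sketch below computes the number of sequences available from a fully randomized barcode and a birthday-style estimate of the chance that two tetrads on one plate share a barcode. It assumes every tetrad draws its barcode uniformly at random from the effective pool (the number of transformants), which is a simplification; the function names are illustrative.

```python
import math


def barcode_space(n_random_bases: int) -> int:
    """Number of distinct sequences for n fully randomized bases (A, C, G, T)."""
    return 4 ** n_random_bases


def collision_probability(pool_size: int, n_tetrads: int) -> float:
    """Probability that at least two of n_tetrads carry the same barcode when
    each tetrad draws uniformly at random from pool_size distinct barcodes."""
    if n_tetrads > pool_size:
        return 1.0  # pigeonhole: more tetrads than barcodes forces a repeat
    # Sum logs of the per-draw "no repeat so far" probabilities for stability.
    log_no_collision = sum(math.log1p(-i / pool_size) for i in range(n_tetrads))
    return 1.0 - math.exp(log_no_collision)
```

For example, `barcode_space(15)` evaluates to 4^15 = 1,073,741,824, consistent with the roughly 10^9 theoretical sequences stated above. In practice the relevant `pool_size` is the number of pooled transformants rather than the theoretical maximum.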
[0023] Once the individual spores are separated, each spore is sequenced using an efficient sequencing technique. Any such technique may be used. However, for illustration, and for convenience, a sequencing strategy that permits the simultaneous determination of the genotype itself and the barcode is used.
[0024] Because recombination in yeast generates relatively few crossover events per chromosome per meiosis19, the majority of a recombinant genome sequence can be imputed from relatively sparse genetic markers as shown in Figure 2. A highly multiplexed genome reduction strategy known as RAD-tag 12, which directs sequencing to positions in the genome that contain a specific restriction site pattern, is employed. Our choice of restriction enzymes and 40 base Illumina single-end reads allows us to sequence the same 2-3% of every yeast strain by a multiplexed sequencing strategy. For a cross between strains with a sufficient number of DNA polymorphisms, this provides a high-density set of genotype markers and drastically reduces costs relative to the complete genome sequencing of each progeny strain. If the complete genome sequences of both of the parental strains are known, RAD-tag genotype markers permit the imputation of essentially the entire genome sequence of each recombinant individual. However, the rapid decrease of sequencing costs will soon enable the cost effective determination of complete genome sequences of large numbers of S. cerevisiae progeny.
[0025] Because the plasmid-borne tetrad barcode is flanked by the same restriction sites used in our RAD-tag method, its sequence is present in the genotyping reads. After RAD-tag sequencing, strains arising from the same FACS-sorted plate that share a common plasmid barcode sequence are grouped together as members of the same tetrad, a hypothesis that is confirmed by a series of quality control metrics. The small proportion of strains (5%) that lack a clear tetrad barcode in their sequence reads can later be assigned to tetrads based on the expectation of 2:2 allele segregation of markers within tetrads. Thus, while the presence of a usable tetrad barcode in the sequencing data simplifies tetrad assignment, strains lacking this sequence still have the potential to be assigned to tetrads.
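
The grouping step described in this paragraph can be sketched as follows. This Python example groups strains by plate and barcode and sets aside strains without a readable barcode for later segregation-based assignment. The input layout (tuples of strain, plate and barcode) and the handling of groups with more than four members are assumptions made for illustration, not details from the source.

```python
from collections import defaultdict


def group_into_tetrads(strains):
    """Group strains that share a plate and a plasmid barcode.

    `strains` is an iterable of (strain_id, plate_id, barcode) tuples, where
    barcode is None if no clear barcode was found in the sequencing reads.
    Returns (candidates, oversized, unassigned).
    """
    groups = defaultdict(list)
    unassigned = []
    for strain_id, plate_id, barcode in strains:
        if barcode is None:
            unassigned.append(strain_id)  # assign later via 2:2 segregation
        else:
            groups[(plate_id, barcode)].append(strain_id)
    # A tetrad has at most four spores; larger groups suggest a shared barcode
    # between tetrads and are flagged rather than accepted (an assumption here).
    candidates = {k: v for k, v in groups.items() if len(v) <= 4}
    oversized = {k: v for k, v in groups.items() if len(v) > 4}
    return candidates, oversized, unassigned
```

Each candidate group is only a hypothesis of tetrad membership; as the text notes, it would still need to pass quality control checks such as the expected 2:2 segregation of markers.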
[0026] The method of the present invention permits inference of a complete genome even though some sequence information is missing.
[0027] Missing data arising from stochastic lack of sequence coverage is a common problem in large-scale genetic analysis, even in samples with otherwise high sequence representation. Tetrad information offers a powerful solution because the expected 2:2 allele segregation of each marker can be used to infer the values of markers that are not confidently assigned (see Figure 2). Except in the case of rare gene conversion events, it will always be possible to correctly infer a missing marker from a complete (4-spore) tetrad if the status of that marker in the other three members of the tetrad is known. Similarly, if a marker is missing in two members of a tetrad, it is possible to infer their values from the other two spores 50% of the time. Missing markers can also be inferred probabilistically based on genetic linkage, i.e., an untyped marker that is close to a typed marker has a high probability of carrying the same allele as the typed marker. This probability can be calculated based purely on the genetic distances between markers. However, the use of both genetic distance and the known haplotypes of all spores in the tetrad can improve the accuracy of inference, sometimes greatly, by incorporating the probability of all possible recombination patterns at the tetrad level. In the pilot crosses below, ~12% of the final set of allele calls were made using these inference methods.
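
The 2:2 segregation rule described above lends itself to a small worked example. The Python sketch below fills in missing calls at a single marker in a 4-spore tetrad whenever the known calls already contain two copies of one parental allele. It deliberately ignores gene conversion and the linkage-based probabilities, and the allele labels `"P1"` and `"P2"` and the function name are illustrative assumptions.

```python
def infer_marker(calls):
    """Infer missing allele calls at one marker from 2:2 segregation.

    `calls` is a list of four entries, one per spore: "P1", "P2", or None
    when the marker was not confidently typed. Returns a new list.
    """
    if len(calls) != 4:
        raise ValueError("expected calls for a 4-spore tetrad")
    n_p1 = calls.count("P1")
    n_p2 = calls.count("P2")
    if n_p1 > 2 or n_p2 > 2:
        # Not 2:2 (possible gene conversion or genotyping error): do not infer.
        return list(calls)
    if n_p1 == 2:
        fill = "P2"  # both P1 alleles are accounted for
    elif n_p2 == 2:
        fill = "P1"  # both P2 alleles are accounted for
    else:
        return list(calls)  # not enough information at this marker
    return [c if c is not None else fill for c in calls]
```

With three typed spores the missing call is always resolved; with two typed spores it is resolved only when both carry the same allele, and otherwise the marker is left for linkage-based inference.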
[0028] An exciting extension of tetrad-based genotype inference is the ability to infer the full genome sequence of non-viable spores. This permits the discovery of synthetic interactions, like those seen in synthetic lethal screens, except that they result from natural variation. Using the invention method, synthetic lethal screens should be significantly less limited by strain background and the number of interacting genes than current methods. For example, it should be possible to uncover a synthetic interaction between four genes in two previously uncharacterized wild strains. In the pilot crosses below, removing all sequence information for one member of a 4-spore tetrad and reconstructing its genotype by inference (i.e., simulating the inference of a dead spore) recovered ~97% of the original allele calls with ~0.5% error.
[0029] Two crosses similar in size to those currently being published for complex trait studies in yeast (~100 tetrads) were used to illustrate the invention as set forth in the examples below. The first cross was between two well characterized and commonly used laboratory strain backgrounds, FY 20 and Σ1278b 21. By manual dissection, this cross showed high (98%) spore viability. A second cross between the laboratory strain S288c 22 and the wild oak isolate YPS163 23 gave 86% spore viability. This is more typical of crosses between genetically distant strains. In the high viability (FY x Σ1278b) cross, 71% of progeny were assembled into 3- or 4-spore tetrads. For the lower viability (S288c x YPS163) cross, 64% were assembled into 3- or 4-spore tetrads. Using this "percent in tetrads" metric as a measure of efficiency and correcting for the 15% difference in spore viability, the resulting efficiency of the method is equivalent between the two crosses. For both crosses, a dense set of genetic markers was generated, with an average marker separation of 18 kb for the FY x Σ1278b cross and 10 kb for the S288c x YPS163 cross. The larger number of markers available in the S288c x YPS163 cross reflects the greater sequence divergence of these two parent strains.
[0030] The modest reduction in efficiency of the invention compared to conventional tetrad dissection appears to be attributable to an increased loss of otherwise viable spores, which could result from adhesion to the glass beads during spreading or increased spore death due to the mechanical stresses of the process, or to colonies with mixed genotypes that are therefore unusable. In our pilot crosses ~5% of colonies were heterozygous across much of their genome and likely represent failures of sister-spore separation. These decreases in efficiency are easily overcome by the large advance in throughput that the invention method affords. The rapid fluorescence-based identification of tetrads allows large numbers of tetrads to be collected quickly, and the glass-bead based separation method allows those tetrads to be dissected in parallel. As a result, an individual researcher can prepare separated spores from over 500 tetrads per hour.
[0031] Enormous numbers of recombinant progeny are required to gain a full understanding of the mechanisms involved in the complex interplay between genotype, phenotype and environment. The invention method provides a high-throughput approach that combines tetrad dissection with genotyping of the progeny of yeast crosses or crosses of other microorganisms, such as Neurospora crassa and Chlamydomonas reinhardtii. In conjunction with multiplexed RAD-tag sequencing, the method yields genotype information for each progeny strain while preserving tetrad relationships by means of unique tetrad barcodes. Of all currently used high throughput methods, the invention most closely recapitulates the information provided by a manually dissected yeast cross. This tetrad information allows for use of the expected 2:2 allele segregation pattern to infer missing markers and permits reconstruction of the full genotypes of spores that are inviable and therefore unrecoverable.
References
1 Winge, O. & Laustsen, O. On two types of spore germination, and on genetic segregations in Saccharomyces demonstrated through single spore cultures. C.R. Trav. Lab. Carlsberg Ser. Physiol. 24, 263-315 (1937).
2 Amberg, D. C., Burke, D. J. & Strathern, J. N. Random spore analysis in yeast. CSH Protoc 2006, doi:10.1101/pdb.prot4162 (2006).
3 Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364-2368, doi:10.1126/science.1065810 (2001).
4 Pan, X. et al. A robust toolkit for functional profiling of the yeast genome. Mol Cell 16, 487-496, doi:10.1016/j.molcel.2004.09.035 (2004).
5 Schuldiner, M. et al. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123, 507-519, doi:10.1016/j.cell.2005.08.031 (2005).
6 Snitkin, E. S. et al. Model-driven analysis of experimentally determined growth phenotypes for 465 yeast gene deletion mutants under 16 different conditions. Genome Biol 9, R140, doi:10.1186/gb-2008-9-9-r140 (2008).
7 Michelmore, R. W., Paran, I. & Kesseli, R. V. Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proceedings of the National Academy of Sciences of the United States of America 88, 9828-9832 (1991).
8 Ehrenreich, I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039-1042, doi:10.1038/nature08923 (2010).
9 Mackay, T. F., Stone, E. A. & Ayroles, J. F. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10, 565-577, doi:10.1038/nrg2612 (2009).
10 Fogel, S., Mortimer, R., Lusnak, K. & Tavares, F. Meiotic gene conversion: a signal of the basic recombination event in yeast. Cold Spring Harb Symp Quant Biol 43 Pt 2, 1325-1341 (1979).
11 Lander, E. S. Initial impact of the sequencing of the human genome. Nature 470, 187-197, doi:10.1038/nature09792 (2011).
12 Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3, e3376, doi:10.1371/journal.pone.0003376 (2008).
13 Coluccio, A. & Neiman, A. M. Interspore bridges: a new feature of the Saccharomyces cerevisiae spore wall. Microbiology 150, 3189-3196, doi:10.1099/mic.0.27253-0 (2004).
14 Chin, B. L., Frizzell, M. A., Timberlake, W. E. & Fink, G. R. FASTER MT: Isolation of Pure Populations of a and alpha Ascospores from Saccharomyces cerevisiae. G3 (Bethesda) 2, 449-452, doi:10.1534/g3.111.001826 (2012).
15 Gerke, J. P., Chen, C. T. & Cohen, B. A. Natural isolates of Saccharomyces cerevisiae display complex genetic variation in sporulation efficiency. Genetics 174, 985-997, doi:10.1534/genetics.106.058453 (2006).
16 Thacker, D., Lam, I., Knop, M. & Keeney, S. Exploiting spore-autonomous fluorescent protein expression to quantify meiotic chromosome behaviors in Saccharomyces cerevisiae. Genetics 189, 423-439, doi:10.1534/genetics.111.131326 (2011).
17 Yan, Z. et al. Yeast Barcoders: a chemogenomic application of a universal donor-strain collection carrying bar-code identifiers. Nat Methods 5, 719-725, doi:10.1038/nmeth.1231 (2008).
18 Armstrong, K. A., Som, T., Volkert, F. C., Rose, A. & Broach, J. R. Propagation and expression of genes in yeast using 2-micron circle vectors. Biotechnology 13, 165-192 (1989).
19 Mancera, E., Bourgon, R., Brozzi, A., Huber, W. & Steinmetz, L. M. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454, 479-485, doi:10.1038/nature07135 (2008).
20 Winston, F., Dollard, C. & Ricupero-Hovasse, S. L. Construction of a set of convenient Saccharomyces cerevisiae strains that are isogenic to S288C. Yeast 11, 53-55, doi:10.1002/yea.320110107 (1995).
21 Dowell, R. D. et al. Genotype to phenotype: a complex problem. Science 328, 469, doi:10.1126/science.1189015 (2010).
22 Mortimer, R. K. & Johnston, J. R. Genealogy of principal strains of the yeast genetic stock center. Genetics 113, 35-43 (1986).
23 Sniegowski, P. D., Dombrowski, P. G. & Fingerman, E. Saccharomyces cerevisiae and Saccharomyces paradoxus coexist in a natural woodland site in North America and display different levels of reproductive isolation from European conspecifics. FEMS Yeast Res 1, 299-306 (2002).
[0032] The following examples are offered to illustrate but not to limit the invention.
Preparation A
Yeast Strains, Media, and Manipulation
[0033] Unless noted, standard media and methods were used for growth and genetic manipulation of yeast1.
[0034] S. cerevisiae strains and genome sequences used in this study are as follows.
Strain    Genotype    Genome Build
FY4       MATa        SGD R64-1-1_20110203 (a)
Σ1278b    MATa        Genbank: ACVY01
YPS163    MATa        This work (b)
S288c     MATa        SGD R64-1-1_20110203 (a)
(a) on the Web at yeastgenome.org/
(b) on the Web at ncbi.nlm.nih.gov/sra (accession number upon manuscript acceptance)
Preparation B
Whole Genome Sequence of YPS163
[0035] A genome assembly for YPS163 was generated by assembling a maximal consistent set of polymorphisms relative to S288c and applying these polymorphisms to the reference sequence. YPS163 was sequenced by 40 bp paired-end sequencing using a Genome Analyzer IIx (Illumina). Polymorphisms relative to the S288c reference genome were detected by two methods. First, read pairs were aligned to the S288c reference sequence (listed above) using BWA (v5.8) allowing 6 mismatches and using quality trimming with a threshold of Phred=20. SAMtools (v0.1.17) was then used to call SNPs using the mpileup and view commands. Potential indels were also identified by this method and confirmed by local de novo assembly of read pairs with at least one end mapped in a 300 bp window around the candidate indel. De novo assembly was carried out using Velvet (v1.0.13)⁴ with a kmer length of 15. The assembled contigs were aligned to the S288c reference using the MUMmer (v3.22)⁵ nucmer command, and only indels supported by this alignment (using the show-snps command) were accepted. Second, the complete set of read pairs was quality filtered and error corrected using QUAKE (v0.3)⁶ with a kmer length of 16 and a coverage cutoff of 2. These read pairs were assembled into contigs using Velvet (v1.0.13) with kmer lengths of 15, 19, 23, 27, and 31, and the resulting assemblies were aligned to the S288c reference using the nucmer command of MUMmer (v3.22). Consensus SNPs and short indels were identified using the MUMmer show-snps command, and longer indels were identified by analysis of split alignments using the MUMmer show-coords command. The sets of polymorphisms identified by the resequencing and de novo assembly approaches were then combined after resolving incompatible polymorphisms in favor first of the largest indels and then of SNPs identified by SAMtools over those identified by de novo assembly. Upon manuscript acceptance, raw DNA sequencing reads for the whole genome sequencing of YPS163 will be deposited in the NCBI Sequence Read Archive (on the Web at ncbi.nlm.nih.gov/sra/).
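The conflict-resolution rule described above (largest indels first, then SAMtools SNPs over de novo SNPs) can be sketched as a priority sort followed by a greedy, non-overlapping selection. This is only an illustration: the `Variant` class, the `merge_polymorphisms` function, and the reading of "incompatible" as "overlapping on the reference" are assumptions, not details taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int      # 0-based start on the reference (hypothetical convention)
    ref: str      # reference allele
    alt: str      # alternate allele
    source: str   # "samtools" or "denovo"

    @property
    def end(self) -> int:
        # Half-open end coordinate; insertions still occupy one position.
        return self.pos + max(len(self.ref), 1)

    @property
    def indel_size(self) -> int:
        # 0 for SNPs, otherwise the length difference between alleles.
        return abs(len(self.ref) - len(self.alt))


def priority(v: Variant):
    # Larger indels sort first; ties go to SAMtools calls, then by position.
    return (-v.indel_size, 0 if v.source == "samtools" else 1, v.pos)


def merge_polymorphisms(variants):
    """Keep the highest-priority variant wherever reference intervals overlap."""
    accepted = []
    for v in sorted(variants, key=priority):
        if all(v.end <= a.pos or a.end <= v.pos for a in accepted):
            accepted.append(v)
    return sorted(accepted, key=lambda v: v.pos)
```

With this ordering, a SNP that falls inside an accepted deletion is dropped, and where the two methods call different SNPs at one site the SAMtools call is the one kept.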
Example 1
pCL2 BC barcode library construction
[0036] The oligonucleotides used in this study are listed in Table 1.
Table 1
Primer    Sequence
Gap1.1_F    CTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATAGATCTCACTAAGAATTGAAGC
Gap1.1_R    CTCCTTACGCATCTGTGCGGTATTTCACACCGCATAGATCTTAACCCTAAGGAAGAACCG
BC_F    GAATTCCTGCAGCCCCCTGC
BC_R    ACTAGTGGATCCCCCCCTAACTCACGTAAT
P1 Adaptor-sense (a)    ACACTCTTTCCCTACACGACGCTCTTCCGATCT[4 base barcode]
P1 Adaptor-antisense (a)    /5Phos/AATT[4 base barcode]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P2 Adaptor-sense    /5Phos/GATCCTCAGGCATCACTCGATTCCTCCGAGAACAA
P2 Adaptor-antisense    CAAGCAGAAGACGGCATACGACGGAGGAATCGAGTGATGCCTGAG
Illumina PCR Forward    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCT
Illumina PCR Reverse    CAAGCAGAAGACGGCATACGA
BC_longmer    GAATTCCTGCAGCCCCCTGCNNNNNNNNNNNNNNNGGGTCGGCACAATTGGCTAACCTTCATCCTTATCAAAGCTTGGAGCCAATGATGAGGATTATTGCCTTGCGACAGACTTCCTACTCACAGTCGCTCACATTGAGCTACTCGATGGGTCATCAGCTTGACCCGGTCTGTTGGGCCGCGATTACGTGAGTTAGGGGG
(a) P1 4-base barcode sequences (strain barcodes) used in this study: GGAT, TGCA, CGTT, AGGC, GGTA, TGGT, CGCG, AGAA, GGCC, TGAC, CGGA, AGTG, GCGT, TCTT, CCAA, ACCA, GCAC, TCCG, CCTC, ACGG, GCTG, TCGA, CCCT, ACAT, GTCA, TTAA, CTGG, ATTT, GTGC, TTTC, CTAT, ATCG, GTAG, TTCT, CTTA, ATGA, GATT, TAGC, CACC, AAAC, GACG, TAAT, CAGT, AATA, GAGA, TATG, CAAG, AACT
[0037] The plasmid-based barcode library (pCL2_BC) was constructed in two steps.
[0038] First, the pCL2 plasmid backbone was constructed by gap repair in yeast as follows: the yeast 2-micron ADE2 plasmid, pRS422⁷, was cut with BglII. The ADE2-containing fragment was discarded and the remaining plasmid backbone was treated with Antarctic Phosphatase (New England Biolabs) to prevent re-ligation, then gel-purified. An SPS2::EGFP::kanMX4 cassette was amplified from BC257⁸ (gift of Barak Cohen) using primers Gap1.1_F and Gap1.1_R, which bear homology to both the SPS2 genomic and plasmid DNA sequences. The resulting PCR product was co-transformed along with the plasmid fragment into yeast. Transformants were selected on YPD agar containing 200 μg/ml G418. G418-resistant clones were scraped and pooled; DNA was prepared and transformed into OneShot TOP10 chemically competent bacteria (Life Technologies). Bacterial transformants were selected on LB-carbenicillin plates and analyzed by restriction digestion to identify the repaired plasmid.
[0039] Next, a complex library of random barcodes was inserted as follows: 20 nmoles of a 200-mer oligo (BC_longmer, Table 1), including a high complexity 15-base degenerate region, was amplified by 20 rounds of PCR using Phusion® High-Fidelity DNA Polymerase (Thermo Fisher) with BC_F and BC_R primers at a final concentration of 20 pM each. The DNA from 24 separate reactions was pooled and ligated to the linearized pCL2 at its unique SmaI site using the In-Fusion® HD Cloning System (Clontech). To maintain complexity, five ligation reactions were carried out and used for 18 independent bacterial transformations onto LB-carbenicillin selection plates. Each transformation produced an average of 3.5 × 10⁴ colonies. A pilot ligation, transformed and assessed by X-gal blue-white screening, showed a low plasmid re-ligation background of ~5%. The transformants were scraped from the plates, re-suspended and divided into 5 separate pools. Plasmid DNA was extracted and purified from each pool using a Qiagen Plasmid Maxi Purification kit (Qiagen).
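As a rough sense of scale, the figures in this paragraph can be compared directly: the number of sequences a 15-base degenerate region can encode versus the number of bacterial colonies recovered. The short calculation below reads the reported average as 3.5 × 10⁴ colonies per transformation; the variable names are illustrative only, and the colony count is an upper bound on distinct barcodes, not a measured complexity.

```python
# Sequence space of a 15-base fully degenerate region (4 bases per position).
possible_barcodes = 4 ** 15

# Upper bound on distinct clones: 18 transformations at ~3.5 x 10^4 colonies each.
transformations = 18
colonies_per_transformation = 35_000
total_colonies = transformations * colonies_per_transformation

# Fraction of the sequence space that the library could occupy at most.
max_fraction_sampled = total_colonies / possible_barcodes

print(possible_barcodes, total_colonies, f"{max_fraction_sampled:.2e}")
```

Because the library samples well under 0.1% of the available sequence space, two independent clones are unlikely to carry the same barcode by chance if the degenerate positions are close to uniformly random, which is itself an assumption.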
[0040] The barcode complexity of the pCL2_BC library was assessed by Illumina DNA sequence analysis. Briefly, 1.5 μg of the plasmid library was fragmented by digestion with MfeI and Sau3AI (a DAM-methylation insensitive isoschizomer of MboI). Digests were incubated for 2 hrs at 37°C in a 20 μl reaction with 2 units of Sau3AI and 10 units of MfeI (NEB), followed by heat inactivation at 65°C for 20 min. The P2 adaptor and four sets of barcoded P1 adaptors were then ligated onto the plasmid fragments at room temperature for 20 min in a single 25 μl reaction containing 1 μg of digested plasmid, 400 units T4 DNA ligase (NEB), 2.5 μl 10× T4 ligase buffer and 6 μl of a combined P1 (25 nM), P2 (1 μM) adaptor mix. The T4 ligase was heat inactivated for 20 min at 65°C. The ligated plasmid DNA was concentrated to 10 μl using a MinElute® PCR Purification Kit (Qiagen). The DNA was size selected and extracted as described below (Example 4). Approximately 10 ng of the purified plasmid DNA library was enriched with a PCR reaction and sequenced in a single flow cell lane of a Genome Analyzer IIx (Illumina).
Example 2
Generation of Barcoded Yeast Tetrads and Tetrad Isolation by FACS
[0041] Heterozygous diploids resulting from crosses between two parental strains were grown to ~2 x 10 cells/ml and transformed with ~2 μg of the pCL2_BC barcoded plasmid library using a standard protocol⁹ modified to include 8% DMSO in the transformation mix. After the 30 min 42°C heat shock step, the transformed cells were gently washed with 1 ml of YPD, resuspended in 1 ml of YPD, and allowed to recover at room temperature for 3 hours. Transformants were then selected by plating 200 μl of the recovered culture per YPD + 200 μg/ml G418 plate, for a total of five plates per transformation. This protocol yielded a library of ~10⁴ single colonies. Transformants were pooled by scraping the plates. A portion of the pool was saved as a frozen glycerol stock to set up sporulation cultures at a later date. All crosses described in this work were performed with frozen stocks revived by overnight growth in liquid YPD + 200 μg/ml G418. These cells were washed and transferred to liquid sporulation medium¹⁰ that also included 200 μg/ml G418. Sporulation was performed at room temperature with agitation and monitored daily. Cultures were deemed ready for sorting when sporulation had progressed to well-formed tetrads without significant numbers of dyads.
[0042] Tetrads were isolated from the sporulation culture by FACS with a FACSAria II equipped with an Automated Cell Deposition Unit (BD Biosciences). GFP fluorescence was detected using the 488 nm laser and 530/30 filter. To achieve a reproducibly high proportion of tetrads we implemented a series of gating steps. The results are shown in Figure 3. Selecting a narrow width of the FSC and SSC signals, while permitting a large range of FSC and SSC heights, filtered out events containing cell or media debris as well as those containing multiple cells per droplet (Figure 3A,B). A GFP vs. FSC area gate was used to identify fluorescent (and therefore sporulated) cells (Figure 3C). The population selected by these steps consisted of two subpopulations: one was composed of clumps of tetrads and tetrads with a small bud attached, while the other was primarily composed of isolated tetrads. These subpopulations were distinguished from each other on the basis of their FSC signal. The clumps and budded tetrads had a higher FSC than the isolated tetrads, though the distributions of FSC in these two subpopulations did overlap, as indicated by the overlapping peaks in Figure 3D. To enrich for isolated tetrads, we set a final gate to include events with a low FSC. During gate assignment, tetrad recovery was assessed by sorting 1000 events onto a microscope slide and manually counting tetrads.
Example 3
Spore Separation by On-Plate Digestion and Glass Bead Spreading
[0043] To prevent spore loss during liquid handling, tetrads were sorted directly onto YPD + 200 μg/ml G418 agar plates with a 25 μl drop of 1 mg/ml zymolyase in 0.7 M sorbitol on top of the agar. Tetrads were sorted into the drop by positioning the plate on top of the 96-well plate adaptor and directly under the sorting stream. To reduce the chance of recovering two tetrads with the same plasmid barcode on the same plate and to ensure the development of single, isolated colonies, only 25 tetrads were sorted per plate. Each plate was inverted immediately after being removed from the sorter and incubated at 37°C for 30 min to digest the asci. After digestion, 100 μl PBS and 15-25 glass beads (Sigma-Aldrich, 425-600 μm) were added to each plate. Plates were shaken vigorously for 4 min in stacks of 5 plates and then incubated face up (with glass beads in place) at 30°C for 2 days. After colonies appeared, plates were carefully inverted to remove the glass beads without disturbing the colonies, and the number of single colonies on each plate was counted to assess the success of the spore separation treatment. In our hands, different sporulation conditions have different, strain-specific effects on the ability to disrupt tetrads. Each colony was picked into a well of a 96-well plate containing liquid YPD + 200 μg/ml G418. Information about which colonies came from which agar plate was recorded. These colonies were cultured for genotyping and preserved as frozen glycerol stocks.
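The reason for limiting each plate to 25 tetrads can be illustrated with a birthday-problem estimate of how often at least two tetrads on a plate would share a plasmid barcode. The sketch below assumes every barcode in the pool is equally represented, which the text does not state, and the library size passed in is a free parameter used for illustration only; `p_shared_barcode` is a hypothetical helper name.

```python
def p_shared_barcode(tetrads_per_plate: int, library_size: int) -> float:
    """Probability that at least two tetrads on one plate carry the same
    barcode, assuming barcodes are drawn uniformly from the library."""
    p_all_unique = 1.0
    for i in range(tetrads_per_plate):
        p_all_unique *= (library_size - i) / library_size
    return 1.0 - p_all_unique


# Illustrative values only: 25 tetrads per plate, 10,000 equally likely barcodes.
print(round(p_shared_barcode(25, 10_000), 4))
# Sorting more tetrads per plate raises the collision risk quickly.
print(round(p_shared_barcode(100, 10_000), 4))
```

Under these assumptions, roughly 3% of 25-tetrad plates would contain a shared barcode, and the risk rises steeply as more tetrads are sorted per plate.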
Example 4
RAD-tag Progeny Genotyping
[0044] Yeast genomic DNA was isolated for RAD-tag sequencing as follows. 96-well format plates were used to seed 0.5 ml cultures in 2 ml deep-well plates containing YPD with 200 μg/ml of G418. These were then grown overnight at 30°C on a VibraTranslator® electromagnetic shaker (Union Scientific Corp.). Yeast cells were pelleted at 1000 × g for 5 min. Yeast genomic DNA was extracted in 96-well format using the ZR-96 Fungal/Bacterial DNA Kit™ (ZymoResearch). Briefly, each cell pellet was re-suspended with 50 μl H2O, 400 μl of ZR lysis buffer was added and the suspension was transferred to the kit's ZR lysis rack, containing 0.5 mm beads. The racks were processed at 1300 rpm for 2 min in a 96-well block bead beater (Geno/Grinder® 2010, SPEX Sample Prep). After centrifugation, supernatants were transferred to a 96 deep-well block and DNA binding, washing and elution procedures were followed as specified in the manufacturer's protocol, except that DNA was eluted in 35 μl of DNA elution buffer.
[0045] The genotype and barcode of each strain were determined using a multiplexed RAD-tag¹¹ sequencing strategy. For each progeny strain, ~50 ng genomic DNA was fragmented by restriction enzyme digestion with MfeI and MboI (New England Biolabs). The digests were incubated 1 hr at 37°C in a 12.5 μl reaction containing 2.5 units of each enzyme, then heat inactivated at 65°C for 20 min. Adaptors were ligated onto the fragments in a 25 μl reaction containing the entire digest, 400 units T4 DNA ligase (New England Biolabs), 2.5 μl 10× T4 ligase buffer and 5 μl of a combined P1 (25 nM), P2 (1 μM) adaptor mix (IDT) at room temperature for 20 min. The P1 adaptor contains the Illumina PCR Forward sequencing primer sequence followed by one of 48 unique 4-nucleotide barcodes and finally the MfeI restriction enzyme compatible overhang sequence. The P2 adaptor contains the Illumina PCR Reverse primer sequence followed by the MboI restriction enzyme compatible overhang sequence. Because an edit distance of ≥2 separates all 48 P1 barcode sequences, a single base sequencing miscall in the barcode cannot generate another expected barcode sequence. After ligation, the T4 enzyme was heat inactivated for 20 min at 65°C, and the barcoded ligation products from 12 strains were pooled, concentrated to 10 μl using a MinElute® PCR Purification Kit (Qiagen), and size selected on a 2% Certified Low Range Ultra Agarose gel (BioRad). A range of fragments from 150 to 500 base pairs was excised from the gel and extracted with several MinElute® Gel Extraction Kit (Qiagen) columns. DNA extracted from four gel lanes was then pooled to multiplex 48 samples in one sequencing library. The DNA library was then enriched with a PCR reaction using Illumina PCR Forward and Reverse primers and Phusion® HF PCR Master Mix polymerase (Finnzymes). Thermocycler conditions were as follows: 98°C/1 min; 14 cycles of 98°C/10 sec, 60°C/30 sec, 72°C/30 sec; final extension at 72°C/4 min. Reactions were then purified with QIAquick® PCR Purification Kit columns (Qiagen). Sequencing runs were performed on the Genome Analyzer IIx (Illumina) for 40 base pair single-end reads, with one library of 48 multiplexed samples per flow cell lane, which yields 20-40 million reads.
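The statement that a single miscall cannot turn one strain barcode into another can be checked directly against the 48 barcodes listed under Table 1. The sketch below uses Hamming distance (substitutions only) as a stand-in for the "edit distance" mentioned in the text, which is an assumption; the function names are illustrative.

```python
from itertools import combinations

# The 48 four-base strain barcodes listed in the footnote to Table 1.
STRAIN_BARCODES = """
GGAT TGCA CGTT AGGC GGTA TGGT CGCG AGAA GGCC TGAC CGGA AGTG
GCGT TCTT CCAA ACCA GCAC TCCG CCTC ACGG GCTG TCGA CCCT ACAT
GTCA TTAA CTGG ATTT GTGC TTTC CTAT ATCG GTAG TTCT CTTA ATGA
GATT TAGC CACC AAAC GACG TAAT CAGT AATA GAGA TATG CAAG AACT
""".split()


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes) -> int:
    """Smallest Hamming distance over all pairs of barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


print(len(STRAIN_BARCODES), min_pairwise_distance(STRAIN_BARCODES))
```

A minimum pairwise distance of 2 means one substitution always produces a sequence outside the expected set, so such reads can be discarded but not corrected.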
Example 5
Strain Genotype and Tetrad Determination
[0046] For each lane of sequencing, raw read sequences were split into 48 pools based on their strain barcode sequences, which are contained in the first four bases of the read. Reads with unexpected strain barcodes or with barcodes having Phred (-10 log10 P_error) quality scores less than 20 or ambiguous ("N") calls at any barcode base were discarded. Reads with more than 2 "N" calls outside the barcode were also discarded. In each of the resulting strain-specific pools of reads, the barcode sequences were removed and the remaining 36 base pairs of sequence were searched for reads carrying the plasmid (tetrad) barcode. Tetrad barcodes were identified using the pattern <read-start>NNNNNTGCCGACCC<barcode>GCAGG, where the barcode is restricted to a length of 11-19 nucleotides. A single mismatch or nucleotide deletion was allowed in the pattern match outside the barcode. A consensus length and sequence for the tetrad barcode were derived from the set of all plasmid barcode reads coming from each strain.
[0047] The strain-specific read pools were then used to infer the genotypes of the progeny strains. From each strain pool, the sequence reads that did not correspond to the plasmid barcodes (above) were aligned to both fully sequenced parental genome sequences. To simplify the sequence alignment, these parental genomes were reduced to include only sequences within one read length of an MfeI site, the only relevant portion of the genome that should be detectable by the RAD-tag method. These reduced parental genomes were additionally aligned to the FY reference genome so that the MfeI restriction sites, and any sites polymorphic between the parents, could be assigned positions within each parental genome and also related to a position in the reference genome. Read alignment was carried out using BWA (v5.8) allowing 6 mismatches and using quality trimming (threshold of Phred=20). Each MfeI site was used as a potential genotyping marker that could be called as having the parent 1 (P1) or parent 2 (P2) allele in each strain. Two classes of informative MfeI markers were observed: loci in which the MfeI site was present in both parental genomes with adjacent sequence polymorphisms, and loci in which the MfeI site was present in only one parent (restriction site polymorphisms). In cases where both parental strains had an MfeI site at the same location (relative to S288c), scores supporting the P1 and P2 alleles for that MfeI marker were generated as follows. For all polymorphic nucleotides within each read aligned at the MfeI site, the read was allowed to increase support for the P1 or P2 score of that MfeI marker by the base quality (Phred) of the allele called. No read was allowed to increase the total P1 or P2 support by more than its higher alignment quality. In cases where only one parent had an MfeI site (restriction site polymorphisms), each aligned read increased the relevant allele support for that MfeI marker by the alignment quality of the read.
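The read-splitting and tetrad-barcode search described above might look roughly like the following. This is a simplified sketch: it matches the flanking pattern exactly and does not implement the single mismatch or deletion tolerated in the text, it treats "quality below 20" as applying to any of the four barcode bases, and it takes the most frequent exact sequence as the consensus. All function names are illustrative.

```python
import re
from collections import Counter

# <read-start>NNNNN TGCCGACCC <barcode of 11-19 nt> GCAGG, exact matching only.
TETRAD_RE = re.compile(r"^[ACGTN]{5}TGCCGACCC([ACGT]{11,19})GCAGG")


def split_read(seq, quals, strain_barcodes, min_q=20):
    """Return (strain_barcode, remainder), or None if the read is discarded."""
    barcode, rest = seq[:4], seq[4:]
    if barcode not in strain_barcodes:   # unexpected barcode, including any "N"
        return None
    if min(quals[:4]) < min_q:           # low-quality barcode base
        return None
    if rest.count("N") > 2:              # too many ambiguous calls elsewhere
        return None
    return barcode, rest


def tetrad_barcode(rest):
    """Return the plasmid (tetrad) barcode in a trimmed read, or None."""
    match = TETRAD_RE.match(rest)
    return match.group(1) if match else None


def consensus_barcode(barcodes):
    """Most common barcode and the fraction of barcode reads supporting it."""
    sequence, count = Counter(barcodes).most_common(1)[0]
    return sequence, count / len(barcodes)
```

The supporting fraction returned by `consensus_barcode` is the kind of quantity that a consensus-purity cutoff could be applied to; handling of the permitted mismatch or deletion would need an approximate matcher in place of the regular expression.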
[0048] After read processing, a consensus allele call was made at each informative MfeI marker in each strain, using the P1 and P2 scores along with the number of aligned reads. For markers where the site was present in both parents, this was done as follows: when there was only one supporting read, or two reads resulting in both P1 and P2 scores greater than 0, an error allele of 0 (insufficient evidence) was assigned. When there were more than two reads with (min(p1_score, p2_score))/(p1_score+p2_score)>=0.1, an error allele of 3 (heterozygous) was assigned. Otherwise, a greater P1 score led to an allele score of 1 and a greater P2 score led to an allele score of 2. For markers where the site was present in only one parent (restriction site polymorphisms), more than 5 supporting reads were required to make the allele call; otherwise an error allele of 0 (insufficient evidence) was assigned.
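The allele-calling rules in this paragraph translate fairly directly into two small functions. The sketch below follows the stated thresholds; the handling of markers with no reads or zero total score (returned as 0) and the function names are assumptions added so the code is total.

```python
def call_shared_marker(p1_score: float, p2_score: float, n_reads: int) -> int:
    """Allele call for a marker whose MfeI site is present in both parents.

    Returns 0 (insufficient evidence), 1 (P1), 2 (P2) or 3 (heterozygous).
    """
    total = p1_score + p2_score
    if n_reads <= 1 or total == 0:
        return 0                      # one read (or none): not enough evidence
    if n_reads == 2 and p1_score > 0 and p2_score > 0:
        return 0                      # two reads that disagree
    if n_reads > 2 and min(p1_score, p2_score) / total >= 0.1:
        return 3                      # minor allele has >= 10% of the support
    return 1 if p1_score > p2_score else 2


def call_single_parent_marker(n_reads: int, parent_with_site: int) -> int:
    """Allele call for a restriction site polymorphism (site in one parent only).

    More than 5 supporting reads are required; otherwise the call is 0.
    """
    return parent_with_site if n_reads > 5 else 0
```

For example, scores of 100 and 20 from five reads give a minor fraction of about 0.17 and are flagged as heterozygous, while 200 and 10 from six reads give about 0.05 and are called as P1.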
[0049] The strains were then grouped into tetrads based on common tetrad barcode sequences. Strains derived from each plate of 25 sorted tetrads were analyzed independently to reduce the risk of encountering more than 1 tetrad with the same plasmid barcode. Duplicate strains were identified (>90% identical allele calls across at least 100 markers) and the lower coverage strain removed. Strains where the number of tetrad barcode reads was <0.15% of the number of aligned reads used in genotyping (cutoff determined empirically) had their tetrad barcode removed and were relabeled as un-barcoded. Strains where the consensus sequence made up <85% of all plasmid barcode sequences were also reclassified as having no tetrad barcode. An attempt was then made to assign the un-barcoded strains to any 2- or 3-spore tetrads from the same sorting plate. This was done by examining the frequency of non-2:2 allele segregation in each existing tetrad when each single un-barcoded strain was added. Un-barcoded strains with <10% missegregation across at least 200 total informative markers were then added to the tetrad in increasing missegregation order, to a maximum of 4 strains in the tetrad. For any "tetrads" with >4 spores, including the remaining set of strains lacking an accepted tetrad barcode (treated as a single "tetrad"), real tetrads were recovered based on the same allele missegregation analysis. Specifically, the frequency of non-2:2 segregation across all informative markers was calculated for each subset of 4 strains. Any subset with <10% missegregation across a total of more than 200 markers was accepted as a true tetrad. After testing all subsets of 4 strains and removing any true tetrads, all subsets of 3 were tested in the same way. Low coverage (<5000 reads) un-barcoded strains were not considered for assignment to tetrads.
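The subset test for recovering true tetrads can be sketched as follows. Several details are assumptions rather than statements from the text: a marker is counted as informative only when every strain in the subset has a P1 or P2 call, a marker missegregates when more than two strains share an allele, and subsets are accepted greedily in enumeration order. The function names are illustrative.

```python
from itertools import combinations


def missegregation(strains):
    """Return (frequency of non-2:2 markers, number of markers compared).

    `strains` is a list of dicts mapping marker id -> allele call (1 or 2;
    any other call makes that marker uninformative for this subset).
    """
    shared = set(strains[0])
    for s in strains[1:]:
        shared &= set(s)
    informative = [m for m in shared if all(s[m] in (1, 2) for s in strains)]
    bad = 0
    for m in informative:
        n_p1 = sum(s[m] == 1 for s in strains)
        if n_p1 > 2 or len(strains) - n_p1 > 2:
            bad += 1
    n = len(informative)
    return (bad / n if n else 1.0), n


def find_tetrads(genotypes, size=4, max_freq=0.10, min_markers=200):
    """Test subsets of `size` strains and accept those that segregate 2:2."""
    remaining = dict(genotypes)
    found = []
    for combo in combinations(sorted(genotypes), size):
        if not all(name in remaining for name in combo):
            continue                  # a member was already placed in a tetrad
        freq, n = missegregation([remaining[name] for name in combo])
        if freq < max_freq and n > min_markers:
            found.append(combo)
            for name in combo:
                del remaining[name]
    return found
```

Calling `find_tetrads` first with `size=4` and then with `size=3` on the strains left over mirrors the order of testing described above; the number of subsets grows combinatorially, which is one more reason to analyze each 25-tetrad plate separately.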
Example 6
Genotype Quality Control and Missing Data Inference
[0050] Having generated an initial haplotype for each strain and assigned strains to tetrads, the strains, tetrads and markers were then assessed for several quality metrics. Strains were assessed for heterozygosity based on the proportion of allele 3 calls. Next, 3- and 4-spore tetrads were assessed for the frequency of marker missegregation (>2 P1 or P2 alleles). A 10% threshold was used to define "high quality" strains and tetrads. Finally, low quality markers were identified and removed. Unless they were called in >10% but not >60% of strains, mono-allelic markers (MfeI restriction site polymorphisms) were removed. All other markers were also removed unless they were called as P1 or P2 in >10% of strains and showed a P1/P2 segregation ratio across all strains within the range of 0.8 to 1.25.
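The two marker filters at the end of this paragraph can be written as a single predicate. This is a sketch under the assumption that "called" means a P1 or P2 call (allele 1 or 2) and that the range limits are inclusive; `keep_marker` is a hypothetical name.

```python
def keep_marker(calls, mono_allelic: bool) -> bool:
    """Decide whether a marker passes the quality filters.

    `calls` holds one allele call per strain (0, 1, 2 or 3).
    """
    n = len(calls)
    if n == 0:
        return False
    n_p1 = calls.count(1)
    n_p2 = calls.count(2)
    called = (n_p1 + n_p2) / n
    if mono_allelic:
        # Restriction site polymorphisms: called in >10% but not >60% of strains.
        return 0.10 < called <= 0.60
    if called <= 0.10 or n_p2 == 0:
        return False
    # Other markers: P1/P2 ratio must lie between 0.8 and 1.25.
    return 0.8 <= n_p1 / n_p2 <= 1.25
```

A marker called P1 in 70 strains and P2 in 30 has a ratio of about 2.3 and is dropped, whereas a 50:50 marker is kept.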
[0051] The superset of high quality markers called in any strain was then compiled and used to identify missing allele calls in each strain (including error allele calls), which were then inferred, where possible. For this purpose, all markers in heterozygous strains were temporarily reinitialized to allele 0. Similarly, temporary genotypes were initialized for the missing spores from good quality 3-spore tetrads. The missing allele values were then inferred in three ways, in order. First, for strains in good quality tetrads, any marker called as P1 or P2 in 2 of the strains allowed assignment of missing marker alleles in either of the other strains with essentially complete confidence. Second, for good quality tetrads, missing alleles were inferred based on the relative probability of all possible local crossover patterns within the tetrad, anchored at flanking positions with allele calls in all 4 spores. Recombination frequencies were calculated from the physical distance between markers, using a genome-wide regression of genetic on physical distance with genetic distances calculated using Haldane's mapping function¹². Allele calls with probabilities greater than 0.99 were then accepted. Third, estimated recombination frequencies between missing and flanking typed markers within each strain individually were used to infer the probability of P1 vs. P2 alleles at the missing markers. Genetic distance was calculated from physical distance using the same method as previously and the same probability cutoff of 0.99 was employed.
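The third inference step can be illustrated with Haldane's mapping function, r = (1 - e^(-2d))/2 for a genetic distance d in Morgans, applied to a missing marker lying between two typed flanking markers in a single strain. The sketch assumes no crossover interference (as Haldane's function does) and a linear conversion from physical to genetic distance; the `cm_per_kb` value in the example is a placeholder, not the regression estimate used in the study.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for a genetic distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_d(r: float) -> float:
    """Genetic distance in Morgans for a recombination fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)


def p_matches_left(left, right, bp_to_left, bp_to_right, cm_per_kb):
    """Probability that a missing marker carries the left flanking allele.

    `left` and `right` are the flanking allele calls (1 or 2); distances are
    in base pairs from the missing marker to each flanking marker.
    """
    r1 = haldane_r(bp_to_left / 1000.0 * cm_per_kb / 100.0)
    r2 = haldane_r(bp_to_right / 1000.0 * cm_per_kb / 100.0)
    if left == right:
        same, diff = (1 - r1) * (1 - r2), r1 * r2   # no crossover vs. double
    else:
        same, diff = (1 - r1) * r2, r1 * (1 - r2)   # crossover right vs. left
    if same + diff == 0:
        return 0.5                                   # no information
    return same / (same + diff)


# Placeholder rate of 0.35 cM/kb, flanking markers 1 kb away on each side.
print(p_matches_left(1, 1, 1000, 1000, cm_per_kb=0.35))
```

When the flanking markers agree and are close, the probability exceeds the 0.99 cutoff; when they disagree and the missing marker sits midway, the probability is 0.5 and no call would be accepted.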
[0052] Because linkage-based inference methods require error-free genetic maps, haplotypes were first generated without linkage-based inference and then analyzed using R/qtl (version 1.21-2). Markers with abnormal linkage patterns (linked to no other marker, linked to another chromosome or distant region of the same chromosome, etc.) were identified and flagged. Haplotypes were then generated a second time allowing the use of linkage-based inference, after removing the flagged markers.
[0053] In ~5% of the strains in each of our test crosses, no plasmid barcode sequence was detected, despite high read coverage of the genomic DNA sequence. In all 89 tested cases, plasmid DNA could be isolated from these strains. Capillary sequencing (Beckman Coulter Genomics) of this DNA identified two major classes of events accounting for the missing barcode information. In the first class, a base change inactivated the MfeI site. In the second class, a new MboI site was generated within the random tetrad barcode sequence.
References
1. Rose, M., Winston, F. & Hieter, P. Methods in Yeast Genetics: A Laboratory Course Manual. (Cold Spring Harbor Laboratory Press, 1990).
2. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).
3. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
4. Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18, 821-829 (2008).
5. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome biology 5, R12 (2004).
6. Kelley, D. R., Schatz, M. C. & Salzberg, S. L. Quake: quality-aware detection and correction of sequencing errors. Genome biology 11, R116 (2010).
7. Brachmann, C. B. et al. Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14, 115-132 (1998).
8. Gerke, J. P., Chen, C. T. & Cohen, B. A. Natural isolates of Saccharomyces cerevisiae display complex genetic variation in sporulation efficiency. Genetics 174, 985-997 (2006).
9. Gietz, R. D. & Woods, R. A. Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol 350, 87-96 (2002).
10. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364-2368 (2001).
11. Baird, N. A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3, e3376 (2008).
12. Haldane, J. The combination of linkage values and the calculation of distance between loci of linked factors. Journal of Genetics 8, 299-309 (1919).

Claims
1. An improved method for isolating and sequencing spores from a tetrad-forming organism wherein said improvements comprise providing diploids subject to tetrad formation and sporulation which diploids contain a nucleic acid molecule comprising an expression system for a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein and/or said diploids contain unique barcodes.
2. The improved method of claim 1 wherein the fusion protein is an SPS2:green fluorescent protein fusion.
3. The improved method of claim 1 wherein the barcode is an oligonucleotide of 10-20 nucleotides.
4. The improved method of claim 1 wherein said diploid is provided with both said nucleic acid molecule comprising the expression system for said fusion protein and the barcode.
5. The method of claim 4 wherein said expression system and said barcode are present in the same nucleic acid molecule.
6. The method of claim 5 wherein said nucleic acid molecule is a 2 micron plasmid.
7. The method of claim 1 wherein said isolating is accomplished by FACS.
8. The method of any of claims 1-7 wherein the organism is yeast.
9. A nucleic acid molecule which contains an expression system operable in an organism that forms spores from tetrads which expression system produces a fusion protein wherein said fusion protein comprises a fluorescent marker fused to a meiosis-dependent protein.
10. The nucleic acid molecule of claim 9 which further contains a barcode.
11. The nucleic acid molecule of claim 10 wherein the barcode is an oligonucleotide of 10-20 nucleotides.
12. The nucleic acid molecule of claim 9 wherein the fusion protein is an SPS2:green fluorescent protein fusion.
13. The nucleic acid molecule of claim 9 which is a 2 micron plasmid.
14. The nucleic acid molecule of claim 10 which is a 2 micron plasmid.
15. The nucleic acid molecule of claim 9 which is operable in yeast.
16. The nucleic acid molecule of claim 10 which is operable in yeast.
17. A culture of cells of a tetrad-forming organism wherein said cells comprise the nucleic acid molecule of any of claims 9-16.
18. A library of nucleic acid molecules of any of claims 9-16 which contains a multiplicity of said nucleic acid molecules with different barcodes.
19. The library of claim 18 which comprises at least 10 different barcodes.
20. The library of claim 18 which comprises at least 100 different barcodes.
21. The library of claim 18 which comprises at least 1,000 different barcodes.
22. The library of claim 18 which comprises at least 10⁶ different barcodes.
PCT/US2013/064694 2012-10-12 2013-10-11 Improved high throughput system for genetic studies WO2014059370A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261713427P 2012-10-12 2012-10-12
US61/713,427 2012-10-12

Publications (1)

Publication Number Publication Date
WO2014059370A1 true WO2014059370A1 (en) 2014-04-17

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040121324A1 (en) * 2001-01-18 2004-06-24 Brenner Charles M. Barcoded synthetic lethal screening to identify drug targets
US20080287317A1 (en) * 2001-08-15 2008-11-20 Charles Boone Yeast arrays, methods of making such arrays, and methods of analyzing such arrays
WO2011031319A2 (en) * 2009-09-10 2011-03-17 Whitehead Institute For Biomedical Research Rnai in budding yeast

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN ET AL.: "Generation and analysis of a barcode-tagged insertion mutant library in the fission yeast Schizosaccharomyces pombe.", BMC GENOMICS, vol. 13, no. ISS. 1, 3 May 2012 (2012-05-03), pages 1 - 18 *
GERKE ET AL.: "Natural Isolates of Saccharomyces cerevisiae Display Complex Genetic Variation in Sporulation Efficiency.", GENETICS, vol. 174, no. ISS. 2, 1 September 2006 (2006-09-01), pages 985 - 97 *
LUDLOW ET AL.: "High-throughput Tetrad Analysis.", NATURE METHODS., vol. 10, no. 7, 12 May 2013 (2013-05-12), pages 1 - 16 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3822365A1 (en) * 2015-05-11 2021-05-19 Illumina, Inc. Platform for discovery and analysis of therapeutic agents
US11795581B2 (en) 2015-05-11 2023-10-24 Illumina, Inc. Platform for discovery and analysis of therapeutic agents
WO2018014002A1 (en) * 2016-07-15 2018-01-18 Pacific Northwest Diabetes Research Institute Systems and methods to facilitate genetic research
EP3485044A4 (en) * 2016-07-15 2020-01-22 Pacific Northwest Diabetes Research Institute Systems and methods to facilitate genetic research
US20190367904A1 (en) * 2016-11-07 2019-12-05 Zymo Research Corporation Automated method for release of nucleic acids from microbial samples
WO2019056927A1 (en) * 2017-09-25 2019-03-28 江苏中新医药有限公司 Method and biological indicator for rapidly determining sterilization effect

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13845786

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13845786

Country of ref document: EP

Kind code of ref document: A1