WO2012019765A1

WO2012019765A1 - Methods and systems for tracking samples and sample combinations

Info

Publication number: WO2012019765A1
Application number: PCT/EP2011/004010
Authority: WO
Inventors: Julien Gagneur; Vicente José PELECHANO GARCIA; Lars Steinmetz; Christoph Merten
Original assignee: European Molecular Biology Laboratory (Embl)
Priority date: 2010-08-10
Filing date: 2011-08-10
Publication date: 2012-02-16

Abstract

The present invention relates to a method of tracking combinations of components of chemical or biological libraries using a molecular barcode system. It further relates to libraries containing molecular barcodes. The invention also relates to a molecular barcode system.

Description

METHODS AND SYSTEMS FOR TRACKING SAMPLES AND SAMPLE COMBINATIONS

BACKGROUND OF THE INVENTION

High-throughput screening (HTS) is a well-established and rapidly growing technology for use as a primary screening method in drug discovery in the Pharma and Biotech industry and is now also being employed for academic research. Essentially it comprises the screening of large chemical and biological libraries for activity against biological targets. Virtually all pharmaceutical companies maintain or build libraries of compounds, extracts or dried plant material for their own use, for exchange and for licensing. The use of automation, miniaturised assays and large-scale data analysis allows an ever-increasing size of library capacity. The average compound library size for instance has increased from about 100 000 specimen samples in 1994 to more than 500 000 specimen samples in 2000 while companies like Pfizer and Pharmacopeia possess libraries of 1.1 and 3.3 million specimen samples, respectively (Mayr and Bojanic, Curr Opin Pharmacol. 2009 Oct;9(5):580-8; Zhu and Cuozzo, J Biomol Screen. 2009 Dec;14(10):1157-64; ten Kate and Laird, The Commercial Use of Biodiversity: Access to Genetic Resources and Benefit-sharing, Earthscan 2003).

Combinatorial therapy, in which drug cocktails instead of single drugs are used, is a promising strategy to overcome the compensatory mechanisms and unwanted off-target effects of individual drugs (Joseph Lehar et al., Synergistic drug combinations tend to improve therapeutically relevant selectivity. Nat Biotechnol (2009) vol. 27 (7) pp. 659). Combination therapy has also proven to be efficient to overcome drug resistance, as in the case of the HIV virus. When trying to identify a combination of drugs with a desirable effect, the number of samples to screen increases immensely. One solution to screen large numbers of samples is the use of microfluidic technology.

Droplet-based microfluidic systems allow the compartmentalization of biological and chemical samples at very high frequencies. Up to several thousand samples per second can be processed. The resulting aqueous microcompartments (pico - nanoliter volumes) can be used as independent reaction vessels for a variety of applications, including chemical synthesis, (bio)chemical assays and cell-based screens (Schaerli and Hollfelder, The potential of microfluidic water-in-oil droplets in experimental biology. Mol Biosyst (2009) vol. 5 (12) pp. 1392-404). When performing droplet fusion, droplet-based microfluidic systems also allow mixing individual samples at very high rates (up to kilohertz frequencies), thus generating samples of new/modified composition on-chip (Mazutis et al., A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672). In theory, this can be exploited to screen combinatorial mixtures of compounds or reactants for a desired property (e.g. the therapeutic effect of a drug cocktail) at very high throughput. This can be achieved by pooling droplets of different composition and fusing them in a random combinatorial fashion. Subsequently, the fused droplets can be screened and sorted for a desired property (e.g. a fluorescence signal resulting from a biochemical assay). Consequently, droplets containing a compound mixture mediating the desired readout signal can be selected. Since the mixtures are generated in a random combinatorial fusion process, the identity of their content still has to be determined, for example using molecular barcodes.

Molecular barcodes are unique molecules associated with a particular library specimen sample and they are transmitted with the library component comprised in this specimen sample. To determine the identity of library components in a sample containing components of one or more libraries, one identifies the molecular barcodes using a preferably simple method applicable to all molecular barcodes. This approach is clearly superior to identifying the library components themselves as the latter may require elaborate methods and also a variety of different methods if the library components are not of the same class of substances. The most commonly used barcode molecules are oligonucleotides due to simple unique design possibilities and cost-effective and established identification technologies such as sequencing or microarray hybridization. However, the application of oligonucleotides as molecular barcodes reaches its limits when libraries are very large and specimen samples of different libraries are combined. This is because for each combination of library components, separate steps necessary for identification have to be performed. In the case of employing sequencing for instance this would involve PCR amplification as well as the actual sequencing reaction for each combination to be identified. If, for example, specimen samples of an average compound library with 500 000 specimen samples are combined with a small cell line library of 20 specimen samples, then there are 10 million combinations to be identified. If the combinations are assayed for an effect and only positive combinations are to be identified, then there are still 10 000 to 100 000 combinations to be identified if an average hit rate of 0.1-1% for an unbiased library is assumed (Bleicher et al., GPCR HIT DISCOVERY BEYOND HTS, Beilstein Institut The Chemical Theatre of Biological Systems, May 2004, Bozen, Italy). Hence, up to 100 000 separate PCR and sequencing reactions are required. 
It goes without saying that this number is even greater when larger or more than two libraries are combined. Thus, there is a need in the art for less laborious and less expensive molecular barcode systems and methods of using the same, enabling the identification of a large number of combinations of specimen samples of different libraries.
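The numbers in the example above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the variable names are not part of the specification, and the library sizes and hit rates are the figures quoted in the preceding paragraph.

```python
# Worked arithmetic for the example in the text. Library sizes and hit rates
# are the figures quoted above; the variable names are illustrative only.
compound_library = 500_000   # specimen samples in an average compound library
cell_line_library = 20       # specimen samples in a small cell line library

# Every compound paired with every cell line.
combinations = compound_library * cell_line_library

# Hit rates of 0.1 % and 1 %, written as "1 in N" to stay in integer arithmetic.
hits_low = combinations // 1000    # 0.1 % hit rate
hits_high = combinations // 100    # 1 % hit rate

print(combinations, hits_low, hits_high)
```

With one PCR and one sequencing reaction per positive combination, the upper figure corresponds to the 100 000 separate reactions mentioned above.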

SUMMARY OF THE INVENTION

The present invention relates to the tracking of combinatorial samples for parallel analysis and in particular to a method of mixing library members of different composition, each one including a unique identifier, in a random combinatorial fashion. Subsequently, the samples, i.e. the combined library members can be screened and selected for a desired property (e.g. a fluorescence signal resulting from a biochemical assay). However, since the samples are generated in a random combinatorial fashion, the identity of their content, i.e. the library members, still has to be determined. This can be done by analysing each sample individually and decoding the two or more identifiers present in the sample. However, this approach is very slow for large sample numbers. In contrast, a parallel analysis of the identifiers of all selected samples drastically increases the throughput. This can be achieved by pooling all samples, as long as the identifiers of each individual sample are stably modified and/or linked to each other and as long as no further linking and/or modification of identifiers occurs after pooling, thus keeping track of the identity of the individual samples entering the pool.

In a first aspect, the present invention relates to a method of determining the identity of a sample in a set of samples, comprising the steps of:

(A) providing a set of samples wherein each sample comprises:

(aa) a member of a first library, wherein each member of said first library comprises a library component and a library member identifier,

(bb) a member of a second library, wherein each member of said second library comprises a library component and a library member identifier, and optionally

(cc) one or more members of one or more supplemental libraries, wherein each member comprises a library component and a library member identifier,

(B) in each sample, combining identifiers by generating direct or indirect specific interactions between identifiers resulting in linkage of two or more identifiers and/or identifier modification,

(C) combining at least a subset of samples from the set of samples into a sample pool, under conditions which prevent further linkage and/or modifications, and

(D) determining identifier linkage and/or modifications in the sample pool by analysing the linkage and/or modifications generated in (B).
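Why linkage has to happen before pooling can be illustrated with a small simulation of steps (A) to (D). This is a minimal sketch under simplifying assumptions: identifiers are modelled as plain strings, linkage as a tuple of two strings, and the function `run_screen` and its parameters are hypothetical names introduced only for this illustration.

```python
import random

def run_screen(first_ids, second_ids, n_samples, link_before_pooling, seed=0):
    """Toy model of steps (A)-(D); returns (true pairs, pairs decoded from the pool)."""
    rng = random.Random(seed)
    # (A) each sample receives one random member of each library
    samples = [(rng.choice(first_ids), rng.choice(second_ids))
               for _ in range(n_samples)]
    pool = []
    for first, second in samples:
        if link_before_pooling:
            # (B) the two identifiers are linked inside the sample
            pool.append((first, second))
        else:
            # no linkage: the identifiers enter the pool as separate molecules
            pool.extend([(first,), (second,)])
    # (C) pooling discards any information about which sample a molecule came from
    rng.shuffle(pool)
    # (D) only linked identifiers can be read out as a combination
    decoded = sorted(item for item in pool if len(item) == 2)
    return sorted(samples), decoded

first_library = ["cmpd-%03d" % i for i in range(5)]
second_library = ["cell-%02d" % i for i in range(3)]

truth, decoded = run_screen(first_library, second_library, 10, link_before_pooling=True)
_, unlinked = run_screen(first_library, second_library, 10, link_before_pooling=False)

print(decoded == truth)  # linked identifiers recover every combination
print(len(unlinked))     # without linkage no combination can be assigned
```

In this toy model the combinations are recovered from the pool only when the identifiers were linked within each sample first, which is the reason step (C) requires conditions that prevent further linkage.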

In a second aspect, the present invention relates to a set of a first and a second library for carrying out the method of the present invention, wherein each member of the first and second library comprises a different library component and a different terminal library identifier and optionally one or more supplemental libraries, wherein each member of the one or more supplemental libraries comprises a different library component and an internal library identifier or set thereof, wherein

(aa) each library identifier of said first library comprises the following oligonucleotide elements:

(i) a priming region (P1),

(ii) a unique region (U1), and

(iii) an annealing region (A1), and

(bb) each library identifier of said second library comprises the following oligonucleotide elements:

(i) a priming region (P2),

(ii) a unique region (U2), and

(iii) an annealing region (A2)

and optionally

(cc) each unit of two internal library identifiers of one or more supplemental libraries comprises a first internal library identifier which comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(iii) a linking region (L1) that anneals to the optional linking region (L2) of the second member of the unit,

and a second internal library identifier which comprises the following oligonucleotide elements:

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

(iii) a linking region (L2) that anneals to the optional linking region (L1) of the first member of the unit,

(dd) each internal library identifier of one or more supplemental libraries comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library

and/or

(ee) one or more bridging oligonucleotides each comprising the following elements:

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, A_internal1, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide.

In a third aspect, the present invention provides a set of groups comprising a group of first terminal library identifiers and a group of second terminal library identifiers for carrying out the method of the present invention, wherein each member of the group of the first and second terminal library identifiers is different from each other and wherein each terminal library identifier of the first and second group comprises the following oligonucleotide elements:

(i) a priming region (P1, P2),

(ii) a unique region (U1, U2), and

(iii) an annealing region (A1, A2), wherein A1 anneals to A2 or optionally to the annealing region of an internal library identifier or unit thereof (A_internal1,

and the length and the sequence of the oligonucleotides is such that the T_m of complementary annealing regions is lower than the T_m of a primer or primers to P1 and/or P2.

In a fourth aspect, the present invention provides a computer implemented method of designing a group of first terminal library identifiers, a group of second terminal library identifiers and optionally one or more groups of internal library identifiers for carrying out the method of the present invention, comprising the following steps:

(i) registering an input comprising the number of libraries and the number of members per library to be assayed,

(ii) determining the number of groups of internal library identifiers required according to the method of the present invention,

(iii) generating sequences for P1 and P2, respectively, wherein each sequence

is a naturally occurring sequence, is generated randomly or is selected from a set of pre-existing priming sequences, and

allows annealing of a primer sequence at a specific and/or preset annealing temperature which is essentially identical for P1 and P2 of the group of first and group of second terminal library identifiers, respectively,

(iv) generating annealing sequences for A1 and A2 and optionally A_internal1 and A_internal2, wherein each sequence:

is a naturally occurring sequence, is generated randomly or is selected from a set of pre-existing annealing sequences,

is identical for each annealing region of the same group of terminal library identifiers,

if internal library identifiers are present, A_internal1 and A_internal2, respectively, is identical for one annealing region of the same group of internal library identifiers or of the same group of units of two internal library identifiers,

wherein A1 anneals to A2 or A_internal1 or to the annealing region of another internal library identifier of a further supplemental library and A2 anneals to A1 or A_internal2 or to the annealing region of another internal library identifier of a further supplemental library and wherein the length and sequence of all annealing regions (A1, A2, A_internal1 and A_internal2) and linking regions (L1 and L2) is such that the annealing temperature is lower than the annealing temperature of (a) primer(s) annealing to P1 and/or P2 or a sequence complementary thereto, does neither anneal to the annealing region of members of the same library nor to the priming regions of the terminal library identifiers or their corresponding primer sequences at the lowest linking temperature,

(v) generating unique sequences for the unique regions of the groups of terminal and internal library identifiers, wherein each unique sequence

is a naturally occurring sequence, is generated randomly or is selected from a set of pre-existing unique sequences,

each differ in at least one nucleotide from the other unique sequences of the same group of terminal or internal library identifiers,

do not allow annealing to the priming region, annealing region or unique region of any other terminal and internal library identifiers at the linking temperature and optionally anneals to the unique sequence of another member of a unit of internal library identifiers.
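The design steps of the fourth aspect can be sketched in code for the simplest case of two libraries without internal identifiers. The sketch below is not the method of the specification: it assumes a 5'-to-3' element order of priming, unique and annealing region, uses randomly generated sequences only, takes A2 as the reverse complement of A1, and estimates melting temperatures with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough approximation for short oligonucleotides. Cross-annealing checks between regions are omitted, and all function names and region lengths are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def wallace_tm(seq):
    """Rough melting temperature in deg C (Wallace rule, an assumption of this sketch)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def random_seq(rng, length):
    return "".join(rng.choice("ACGT") for _ in range(length))

def design_group(rng, n_members, priming, annealing, unique_len):
    """Build identifiers sharing priming and annealing regions, with distinct unique regions."""
    if n_members > 4 ** unique_len:
        raise ValueError("unique region too short for the requested library size")
    uniques = set()
    while len(uniques) < n_members:
        # a set guarantees each unique region differs in at least one nucleotide
        uniques.add(random_seq(rng, unique_len))
    return [priming + unique + annealing for unique in sorted(uniques)]

rng = random.Random(1)
p1, p2 = random_seq(rng, 20), random_seq(rng, 20)   # priming regions (length assumed)
a1 = random_seq(rng, 8)                             # annealing region (length assumed)
a2 = reverse_complement(a1)                         # so that A1 anneals to A2

# With these lengths the Wallace estimate of an 8-mer (at most 32) is always
# below that of a 20-mer (at least 40), so annealing regions melt first.
assert wallace_tm(a1) < min(wallace_tm(p1), wallace_tm(p2))

first_group = design_group(rng, 50, p1, a1, unique_len=10)
second_group = design_group(rng, 20, p2, a2, unique_len=10)
print(len(first_group), len(second_group))
```

A practical design tool would replace the Wallace estimate with a nearest-neighbour model and add the cross-annealing exclusions listed in steps (iv) and (v).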

This summary of the invention does not necessarily describe all features of the invention.

DETAILED DESCRIPTION OF THE INVENTION

Before the present invention is described in detail below, it is to be understood that this invention is not limited to the particular methodology, protocols and reagents described herein as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art.

Preferably, the terms used herein are defined as described in "A multilingual glossary of biotechnological terms: (IUPAC Recommendations)", Leuenberger, H.G.W., Nagel, B. and Kölbl, H. eds. (1995), Helvetica Chimica Acta, CH-4010 Basel, Switzerland.

Several documents are cited throughout the text of this specification. Each of the documents cited herein (including all patents, patent applications, scientific publications, manufacturer's specifications, instructions, GenBank Accession Number sequence submissions etc.), whether supra or infra, is hereby incorporated by reference in its entirety. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

In the following, the elements of the present invention will be described. These elements are listed with specific embodiments, however, it should be understood that they may be combined in any manner and in any number to create additional embodiments. The variously described examples and preferred embodiments should not be construed to limit the present invention to only the explicitly described embodiments. This description should be understood to support and encompass embodiments which combine the explicitly described embodiments with any number of the disclosed and/or preferred elements. Furthermore, any permutations and combinations of all described elements in this application should be considered disclosed by the description of the present application unless the context indicates otherwise.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", are to be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.

As used in this specification and the appended claims, the singular forms "a", "an", and "the" include plural referents, unless the content clearly dictates otherwise.

The present invention relates to a method of determining the identity of a sample in a set of samples, comprising the steps of:

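(A) providing a set of samples wherein each sample comprises:

(aa) a member of a first library, wherein each member of said first library comprises a library component and a library member identifier,

(bb) a member of a second library, wherein each member of said second library comprises a library component and a library member identifier, and optionally

(cc) one or more members of one or more supplemental libraries, wherein each member comprises a library component and a library member identifier,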

(B) in each sample, combining identifiers by generating direct or indirect specific interactions between identifiers resulting in linkage of two or more identifiers and/or identifier modification,

(C) combining at least a subset of samples from the set of samples into a sample pool, under conditions which prevent further linkage and/or modifications, and

(D) determining identifier linkage and/or modifications in the sample pool by analysing the linkage and/or modifications generated in (B).

In the context of the present invention, the term "library" refers to a collection of specimen samples, which are preferably separately stored. It is preferred that information is associated with each specimen sample. The associated information is preferably stored in and retrievable from a database. A database is structured information. Preferably, this information relates to properties of the specimen sample and optionally further constituents comprised therein, in particular to the properties of the library component comprised in a given specimen sample, such as structure, purity, quantity and/or physical/chemical/biological characteristics. It is preferred that each specimen sample, preferably each library component comprised therein, differs in at least one property of the specimen sample, preferably in at least one property of the library component, with respect to the other specimen samples/library components of the library. Preferably, the library comprises 10, 100, 500, 1,000, 5,000, 10,000, or more specimen samples. It is particularly preferred that each of these specimen samples forming the library comprises a library component differing in one property from all other library components in that library. It is also preferred that each specimen sample of a library comprises at least one, two or three library constituents. Constituents can be library components, library identifiers and substances required for storage or function of the library component or identifier, such as buffer components, enzyme inhibitors etc. The terms "first library", "second library" and "supplemental library" do not indicate qualitative differences of the libraries referred to, but are merely used to differentiate between libraries having different further constituents, e.g. identifiers. In a preferred embodiment, a library comprises as library components different chemical compounds or different biological entities or a mixture thereof. Preferred library components are cells, viruses, bacteria, unicellular organisms, multicellular organisms, genetically modified cells, proteins, peptides, hormones, antibodies, RNA, preferably siRNA or dsRNA, small organic or inorganic compounds, preferably less than 5000 Da, more preferably less than 2000 Da, drugs, pharmaceutically active substances, metabolites, natural compounds, culture media, body fluids such as urine, blood or lymph, tissues, plant seeds or samples of soil, plants or marine origin.

The term "library member" refers to a set of constituents comprising at least a library component and a library identifier. The term "sample" refers to a set of constituents comprising at least a member of one library and a member of at least one other library. It is different from the term "specimen sample", which refers to a set of substances comprising no more than one different library component. The term "library identifier" refers to any molecule which allows the identification of a library member it is comprised in. Generally, it comprises or consists of a unique composition or chemical moiety, allowing unambiguous identification. For example, a library identifier can be a sequencing identifier, e.g. an oligonucleotide, a spectroscopy identifier, preferably a mass spectroscopy identifier, preferably a peptide, or an optical identifier, preferably an oligonucleotide (Geiss et al., Direct multiplexed measurement of gene expression with color-coded probe pairs. Nature Biotechnology (2008) vol. 26 (3) pp. 317-25) or a quantum dot. Other examples are graphical or electronic identifiers. An identifier which is useful in the present invention preferably possesses one or more of the following attributes:

1) It is capable of being distinguished from all other identifiers.

2) The identifier is capable of being detected when present in low amounts, for example at 10^-22 to 10^-6 mol.

3) The identifier possesses a chemical handle through which it can be attached to a substrate, wherein the substrate can be a carrier or another identifier. The attachment may be made directly or indirectly.

4) The identifier is chemically stable towards all manipulations to which it is subjected and which are not meant to degrade it, in particular those that occur during steps of the method of the present invention described herein.

5) The identifier does not significantly interfere with the manipulations performed on the substrate it can be attached to. For instance, if the identifier is attached to an oligonucleotide, the identifier must not significantly and unintentionally interfere with any hybridisation or enzymatic reactions (e.g., PCR sequencing reactions) performed on the oligonucleotide. Similarly, if the identifier is attached to an antibody, it must not significantly interfere with antigen recognition by the antibody.

The term "spectroscopy identifier" refers to an identifier, which can be detected with a spectroscopic method. Preferably, it should possess properties which enhance the sensitivity and specificity of detection by that method. Spectroscopic methods by which identifiers are usefully distinguished include mass spectroscopy (MS), infrared (IR), ultraviolet (UV), and fluorescence.

a. Characteristics of MS identifiers

Where an identifier is analysable by mass spectrometry (i.e., is a MS-readable identifier, also referred to herein as a MS identifier), the essential feature of the identifier is that it is able to be ionized. It is thus a preferred element in the design of MS-readable identifiers to incorporate therein a chemical functionality which can carry a positive or negative charge under conditions of ionization in the MS. This feature confers improved efficiency of ion formation and greater overall sensitivity of detection. Factors that can increase the relative sensitivity of an analyte being detected by mass spectrometry are discussed in, e.g., Sunner J. et al. (Anal. Chem. 60:1300-1307, 1988).

A preferred functionality to facilitate the carrying of a negative charge is an organic acid, such as phenolic hydroxyl, carboxylic acid, phosphonate, phosphate, tetrazole, sulfonyl urea, perfluoro alcohol and sulfonic acid.

Preferred functionality to facilitate the carrying of a positive charge under ionization conditions are aliphatic or aromatic amines. Examples of amine functional groups which give enhanced detectability of MS identifiers include quaternary amines (i.e., amines that have four bonds, each to carbon atoms, see Aebersold, U.S. Pat. No. 5,240,859) and tertiary amines (i.e., amines that have three bonds, each to carbon atoms, which includes C=N— C groups such as are present in pyridine, see Hess et al., Anal. Biochem. 224:373, 1995; Bures et al., Anal. Biochem. 224:364, 1995). Hindered tertiary amines are particularly preferred. Tertiary and quaternary amines may be alkyl or aryl. An MS identifier must bear at least one ionizable species, but may possess more than one ionizable species. The preferred charge state is a single ionized species per identifier. Accordingly, it is preferred that each MS identifier contains only a single hindered amine or organic acid group.

The identification of an identifier by mass spectrometry is preferably based upon its molecular mass to charge ratio (m/z). The preferred molecular mass range of MS identifiers is from about 100 to 2,000 daltons. It is generally difficult for mass spectrometers to distinguish among moieties having parent ions below about 200-250 daltons (depending on the precise instrument), and thus MS identifiers preferably have masses above that range, i.e. preferably from 200 to 2,000, more preferably from 250 to 2,000 daltons.

It is relatively difficult to distinguish identifiers by mass spectrometry when those identifiers incorporate atoms that have more than one isotope in significant abundance. Accordingly, preferred groups which are intended for mass spectroscopic identification contain carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. While other atoms may be present, their presence can render analysis of the mass spectral data somewhat more difficult. Preferably, the MS identifiers have only carbon, nitrogen and oxygen atoms, in addition to hydrogen and/or fluoride. Fluoride is an optional, yet preferred atom to have in an MS identifier. In comparison to hydrogen, fluoride is, of course, much heavier. Thus, the presence of fluoride atoms rather than hydrogen atoms leads to higher mass, thereby allowing the MS identifier to reach and exceed a mass of greater than 250 daltons, which is desirable as explained above. In addition, the replacement of hydrogen with fluoride confers greater volatility, and greater volatility of the analyte enhances sensitivity when mass spectrometry is being used as the detection method.
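The effect of replacing hydrogen by fluoride on the mass of a candidate MS identifier can be illustrated numerically. The sketch below uses standard average atomic masses; the function names, the two example compositions (a C12 amine with 27 hydrogens and its fully fluorinated counterpart) and the acceptance window of 250 to 2,000 daltons are assumptions chosen for illustration from the ranges discussed above, not compounds named in this specification.

```python
# Standard average atomic masses in daltons (rounded).
AVERAGE_MASS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "P": 30.974, "I": 126.904,
}

def formula_mass(counts):
    """Average molecular mass for a composition given as {element: atom count}."""
    return sum(AVERAGE_MASS[element] * n for element, n in counts.items())

def in_preferred_range(counts, low=250.0, high=2000.0):
    """True if the mass lies in the window assumed here (250 to 2,000 daltons)."""
    return low <= formula_mass(counts) <= high

# Hypothetical compositions: the same carbon/nitrogen skeleton with H or with F.
hydrogen_form = {"C": 12, "H": 27, "N": 1}
fluorinated_form = {"C": 12, "F": 27, "N": 1}

for name, composition in [("H form", hydrogen_form), ("F form", fluorinated_form)]:
    print(name, round(formula_mass(composition), 3), in_preferred_range(composition))
```

In this example only the fluorinated composition exceeds the 250 dalton threshold, in line with the reasoning given above for preferring fluoride over hydrogen.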

The molecular formula of MS identifiers falls within the scope of C_{1-500} N_{0-100} O_{0-100} S_{0-10} P_{0-10} H_α F_β I_δ, wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms. The designation C_{1-500} N_{0-100} O_{0-100} S_{0-10} P_{0-10} H_α F_β I_δ means that an MS identifier contains at least one, and may contain any number from 1 to 500 carbon atoms, in addition to optionally containing as many as 100 nitrogen atoms ("N_{0-}" means that MS identifiers need not contain any nitrogen atoms), and as many as 100 oxygen atoms, and as many as 10 sulfur atoms and as many as 10 phosphorus atoms. The symbols α, β and δ represent the number of hydrogen, fluoride and iodide atoms in MS identifiers, where any two of these numbers may be zero, and where the sum of these numbers equals the total of the otherwise unsatisfied valencies of the C, N, O, S and P atoms. Preferably, MS identifiers have a molecular formula that falls within the scope of C_{1-50} N_{0-10} O_{0-10} H_α F_β, where the sum of α and β equals the number of hydrogen and fluoride atoms, respectively, present in the moiety. Mass spectrometry identifiers can comprise or consist of peptides.

b. Characteristics of IR identifiers

There are two primary forms of IR detection of organic chemical groups: Raman scattering IR and absorption IR. Raman scattering IR spectra and absorption IR spectra are complementary spectroscopic methods. In general, Raman excitation depends on bond polarisability changes, whereas IR absorption depends on bond dipole moment changes. Weak IR absorption lines become strong Raman lines and vice versa. Wavenumber is the characteristic unit for IR spectra. There are three spectral regions for IR identifiers which have separate applications: near IR at 12500 to 4000 cm^-1, mid IR at 4000 to 600 cm^-1, far IR at 600 to 30 cm^-1. For the uses described herein where a compound is to serve as an identifier, the mid spectral regions would be preferred. For example, the carbonyl stretch (1850 to 1750 cm^-1) would be measured for carboxylic acids, carboxylic esters and amides, and alkyl and aryl carbonates, carbamates and ketones. N-H bending (1750 to 160 cm^-1) would be used to identify amines, ammonium ions, and amides. At 1400 to 1250 cm^-1, R-OH bending is detected as well as the C-N stretch in amides. Aromatic substitution patterns are detected at 900 to 690 cm^-1 (C-H bending, N-H bending for ArNH2). Saturated C-H, olefins, aromatic rings, double and triple bonds, esters, acetals, ketals, ammonium salts, N-O compounds such as oximes, nitro, N-oxides, and nitrates, azo, hydrazones, quinones, carboxylic acids, amides, and lactams all possess vibrational infrared correlation data (see Pretsch et al., Spectral Data for Structure Determination of Organic Compounds, Springer-Verlag, New York, 1989). Preferred compounds would include an aromatic nitrile which exhibits a very strong nitrile stretching vibration at 2230 to 2210 cm^-1. Other useful types of compounds are aromatic alkynes which have a strong stretching vibration that gives rise to a sharp absorption band between 2140 and 2100 cm^-1. A third compound type is the aromatic azides which exhibit an intense absorption band in the 2160 to 2120 cm^-1 region. Thiocyanates are representative of compounds that have a strong absorption at 2275 to 2263 cm^-1.

c. Characteristics of UV identifiers

A compilation of organic chromophore types and their respective UV-visible properties is given in Scott (Interpretation of the UV Spectra of Natural Products, Pergamon Press, New York, 1962). A chromophore is an atom or group of atoms or electrons that is responsible for the particular light absorption. Empirical rules exist for the π to π* maxima in conjugated systems (see Pretsch et al., Spectral Data for Structure Determination of Organic Compounds, p. B65 and B70, Springer-Verlag, New York, 1989). Preferred compounds (with conjugated systems) would possess n to π* and π to π* transitions. Such compounds are exemplified by Acid Violet 7, Acridine Orange, Acridine Yellow G, Brilliant Blue G, Congo Red, Crystal Violet, Malachite Green oxalate, Metanil Yellow, Methylene Blue, Methyl Orange, Methyl Violet B, Naphthol Green B, Oil Blue N, Oil Red O, 4-phenylazophenol, Safranine O, Solvent Green 3, and Sudan Orange G, all of which are commercially available (Aldrich, Milwaukee, Wis.). Other suitable compounds are listed in, e.g., Jane et al. (J. Chrom. 323:191-225, 1985).

d. Characteristics of fluorescent identifiers

Fluorescent probes are identified and quantified most directly by their absorption and fluorescence emission wavelengths and intensities. Emission spectra (fluorescence and phosphorescence) are much more sensitive and permit more specific measurements than absorption spectra. Other photophysical characteristics such as excited-state lifetime and fluorescence anisotropy are less widely used. The most generally useful intensity parameters are the molar extinction coefficient (ε) for absorption and the quantum yield (QY) for fluorescence. The value of ε is specified at a single wavelength (usually the absorption maximum of the probe), whereas QY is a measure of the total photon emission over the entire fluorescence spectral profile. A narrow optical bandwidth (<20 nm) is usually used for fluorescence excitation (via absorption), whereas the fluorescence detection bandwidth is much more variable, ranging from full spectrum for maximal sensitivity to narrow band (~20 nm) for maximal resolution. Fluorescence intensity per probe molecule is proportional to the product of ε and QY. The range of these parameters among fluorophores of current practical importance is approximately 10,000 to 100,000 cm⁻¹ M⁻¹ for ε and 0.1 to 1.0 for QY. Compounds that can serve as fluorescent identifiers are as follows: fluorescein, rhodamine, lambda blue 470, lambda green, lambda red 664, lambda red 665, acridine orange, and propidium iodide, which are commercially available from Lambda Fluorescence Co. (Pleasant Gap, Pa.). Fluorescent compounds such as nile red, Texas Red, lissamine™ and BODIPY™ dyes are available from Molecular Probes (Eugene, Oreg.). Preferred fluorescent identifiers are quantum dots (Medintz et al., Quantum dot bioconjugates for imaging, labelling and sensing, Nature Materials, Vol. 4, June 2005). The term "quantum dot" refers to a quantum dot as defined in Han et al. (Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules. Nature Biotechnology (2001) vol. 19 (7) pp. 631-5).
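Since the fluorescence intensity per probe molecule is proportional to the product of ε and QY, candidate fluorescent identifiers can be compared by that product. The following is a minimal Python sketch of this comparison; the function name is hypothetical, and the only inputs used are the range endpoints quoted above, not measured values for any particular fluorophore.

```python
def relative_brightness(extinction_coefficient, quantum_yield):
    """Return epsilon * QY, a quantity proportional to per-molecule intensity.

    extinction_coefficient: molar extinction coefficient in cm^-1 M^-1
    quantum_yield: fluorescence quantum yield, between 0 and 1
    """
    if extinction_coefficient <= 0:
        raise ValueError("extinction coefficient must be positive")
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return extinction_coefficient * quantum_yield


# Endpoints of the practical ranges stated in the text (illustrative only).
dimmest = relative_brightness(10_000, 0.1)
brightest = relative_brightness(100_000, 1.0)
dynamic_range = brightest / dimmest
```

Under these endpoint values, the dimmest and brightest practical fluorophores differ by about two orders of magnitude in per-molecule intensity, which is worth bearing in mind when several fluorescent identifiers are to be read out together.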

Identifiers can also be identified by non-spectroscopic methods. Examples are identifiers that can be identified by sequencing, by imaging or electronically.

a. Characteristics of sequencing identifiers

A sequencing identifier is any molecule which is characterised by a sequence of subunits that can be identified by sequencing methods, for example an oligonucleotide (e.g. RNA or DNA), which can be analysed, for example, by classical sequencing (e.g. Sanger sequencing) or next-generation sequencing (Metzker, Sequencing technologies - the next generation. Nat Rev Genet (2009) vol. 11 (1) pp. 31-46).

b. Characteristics of graphical identifiers

A graphical identifier is an element with a unique spatial pattern which can be identified by microscopy imaging (Wilson et al., Encoded microcarriers for high-throughput multiplexed detection. Angew Chem Int Ed Engl (2006) vol. 45 (37) pp. 6104-17).

c. Characteristics of electronic identifiers

An electronic identifier is an element that can be identified by electronic methods such as the transmission of a radio-frequency code. One example is the use of microtransponders, e.g. microscopic read-only memory devices that prompt the transmission of a radio-frequency code when excited by a laser (Wilson et al., Encoded microcarriers for high-throughput multiplexed detection. Angew Chem Int Ed Engl (2006) vol. 45 (37) pp. 6104-17).

In a preferred embodiment, however, the term "library identifier" refers to a double stranded, partially double stranded (i.e. comprising protruding 5' and/or 3' single stranded regions), or single stranded oligonucleotide, which is associated with a library component in a sample.

The term "modification" refers to a stable alteration of an identifier, which is due to the presence of another identifier from a different library and/or a carrier. This alteration can be conformational as well as physical/chemical, which includes the addition, i.e. direct or indirect binding of one or more other molecules to the identifier, as well as removing a part of the identifier, e.g. cleaving the identifier. Examples of interactions resulting in a modification are given below. Further, the terms "identifier modification" or "modification of an identifier" refer not only to a modification of an identifier itself, but also to the modification of another, non-identifier molecule due to the presence of an identifier. An example is the elongation of non-identifier oligonucleotides dependent on an identifier as a template (see Fig. 12). The term "stable" means that an identifier modification or linkage is preserved throughout the method of the invention, unless measures are taken which specifically aim at altering/reversing the modification. In the case of reaction vessels being microfluidic water-in-oil droplets, the modification of a library identifier remains stable even after disruption of the droplets and the pooling of the samples contained in the droplets. An example of a modification is the template-dependent elongation of oligonucleotides by polymerisation, wherein the template can be, for example, at least part of another identifier of a different library or an oligonucleotide attached to a carrier. Another example is the cleavage of an identifier via an agent such as a DNA restriction enzyme or a proteinase, wherein this agent can be an identifier of a different library and the cleavage site is specific to the identifier combination (see Fig. 13).

Direct interactions between the identifiers resulting in a linkage of one or more identifiers are such that identifiers of different libraries are linked together via physical links such as the ones mentioned below. Indirect interactions between identifiers are such that identifiers are linked to other identifiers via binding to the same carrier. The linkage is generally stable, but need not be stable if it is accompanied by a stable modification. The interaction preferably is by covalent or non-covalent binding. Binding to the carrier can be by any of the physical links such as the ones mentioned below. The linkage can be a physical link between identifiers by a covalent interaction, e.g. phosphodiester bond (DNA ligation), click chemistry (see "Click chemistry, a powerful tool for pharmaceutical sciences." Hein CD et al., Pharm Res. 25(10):2216-2230, 2008), disulfide bond, peptide bond, or protein-protein, protein-DNA or DNA-DNA cross-linking; by a non-covalent interaction, e.g. nucleic acid annealing, protein-protein interaction (for instance antibody-epitope interactions), protein-ligand interaction (for instance streptavidin-biotin), or electrostatic interaction (for instance carboxylic acid coated superparamagnetic beads and DNA); or by topology, e.g. catenated DNA or compartmentalisation (for instance vesicularisation). Accordingly, the linkage of identifiers can also be considered to be a modification of each identifier, since these interactions alter an identifier and can be or result in a stable modification. Identifiers can also undergo a modification and a linkage at the same time, wherein modification and linkage are separate incidents. For example, a single-stranded oligonucleotide identifier can be linked to another oligonucleotide identifier via annealing, and subsequently each identifier can be modified by elongation via a polymerisation, wherein another linked identifier serves as a template. In this particular case, the modification, but not the linkage, would be stable in terms of the present invention.

In a preferred embodiment, the set of samples is assayed for assay-dependent effects due to the presence of the two or more library components, preferably during or after step (A) or after step (B). The assay-specific effect can for example be a desired effect of a compound on a cell, and allows the later selection and pooling of the samples, i.e. library member combinations, exhibiting the specific effect (hereafter also called "positive samples"). The assay is preferably carried out before samples are pooled (see step (C) of the above-described method and below). At this stage, the specific combination (resulting in identifier linkage and/or modification) of the library identifiers within the selected samples, and in consequence the combination of the library components, can be identified. However, this approach of identification requires a separate analysis for each combination to be identified. Instead of this laborious identification approach, the method of the invention comprises a further step after step (B) and before step (C), comprising means of preventing further combinations or modifications of identifiers, particularly in the sample pool. All identifier combinations are then pooled under conditions which prevent further linkage and/or modification. These conditions can be the result of the aforementioned means. Pooled samples can then be analysed in parallel, saving a significant amount of both time and resources.

The assay determines the strength or activity of a library component by comparing its effects with those of a standard. The standard can have no or a negligible effect, but can also have a significant effect if, for example, substances or substance combinations with a particularly strong effect are to be found. The assay can be any assay suitable for indicating the occurrence of a reaction or process directly or indirectly using reporters, such as reporter genes (e.g. β-gal, luciferase, GFP etc.) in cell-based assays. It can be a quantal (all or none response) assay or a graded assay. Non-limiting examples are fluorescence assays using a fluorophore (if an optical identifier is used as an identifier or part of an identifier, preferably emitting light at a different wavelength than the optical identifier). For instance, this can be a cell-based reporter assay in which fluorescence provides a positive read-out for the desirable effect of the combined treatment (e.g. activation of a particular pathway such as apoptosis). Other examples of assay types are radioimmunoassays, microbiological assays, antibody-based assays (e.g. ELISA), detection of antigens at the surface of the cell, metabolite detection (e.g. enzymatic detection of ATP by luciferin-luciferase assay), assays for cell viability, proliferation, cell fusion, endocytosis, presence of receptors, activity of ion channels, signal transduction, presence of ions or reactive oxygen species, colorimetric assays using e.g. pH indicators, and membrane-potential assays using potentiometric probes. The assay can also be a microscopic assay of cell or organism morphology (e.g. live-cell imaging).

The prevention of further linkage and/or modifications of identifiers in a sample pool, wherein at least a subset of samples from the set of samples is combined, is achieved by means or conditions which depend on the type of identifier and/or carrier and the type of linkage and/or modification. Thus, the means mentioned in the following are to be construed as merely non-limiting examples. Covalent or non-covalent interactions can be prevented for example by inhibiting or inactivating the enzyme (e.g. DNA polymerase) if the interaction is catalysed enzymatically, for example by changing the temperature (e.g. heat inactivation), the pH, the buffer composition or by adding enzyme-specific inhibitors to achieve conditions under which the enzyme does not function. Also, the identifier or carrier can be altered so that further combinations or modifications do not occur, for example by phosphatase treatment of DNA to prevent ligation or by saturation of binding sites of identifiers and/or carriers, for example by providing an excess of biotin to bind streptavidin beads or an excess of DNA capable of annealing to annealing regions. Further, topological interactions can be prevented by inhibiting the enzyme (e.g. DNA ligase or topoisomerase for catenated DNA) with the afore-mentioned means if the reaction is enzymatically catalysed. Such preventative means or conditions are applied preferably before samples are pooled, i.e. between steps (B) and (C) of the above-described method.

The term "carrier" refers to a physical entity which is capable of binding identifiers, preferably on its surface. Examples of carriers are beads (e.g. polystyrene beads, magnetic beads etc.), single or double stranded oligonucleotide sequences that can act as scaffold for the binding of other molecules (e.g. other oligonucleotides, transcription factors, RNA binding proteins, etc.), circular molecules that topologically trap catenated circular oligonucleotides, molecules that are able to react or bind to other molecules using multiple binding sites (e.g. immunoglobulins such as IgM, IgA, IgE, IgG and IgD), microscope slides, wells of a multi-well plate or beads covered by reactive groups (e.g. covered with streptavidin or aminosilane or poly-L-lysine to bind oligonucleotides), vesicles or synthetic cells that incorporate the identifiers in their surface or interior, and virus particles or proteins that can assemble around library identifiers such as oligonucleotides. Generally, carriers carry binding sites on their surface to which all identifiers of the libraries can bind. In one embodiment, carriers, particularly beads, are used which carry oligonucleotides to which oligonucleotide identifiers of all samples can anneal. In another embodiment, carriers carry antibodies or the antigen-recognising variable part thereof, to which the identifiers can bind. In a preferred embodiment, streptavidin beads are used as a carrier for biotinylated identifiers, e.g. oligonucleotides or quantum dots.

Generally, carriers can be comprised in a sample or they can be added to samples at any point before step (C) of the afore-described method. Preferably, carriers are added in step (B), in which they can bind the different identifiers comprised in the sample. Preferably after identifier binding and before pooling of at least a subset of samples, i.e. between steps (B) and (C), the carriers are brought into contact with an excess of molecules which bind to the same binding sites of the carrier as the identifiers, thereby saturating the binding sites of the carrier (e.g. an excess of biotin if streptavidin beads are used, non-identifier oligonucleotides annealing to oligonucleotides carried by beads, or antigen binding to antibodies carried by beads). In case binding of identifiers to carriers requires a (bio)chemical reaction, further binding can also be prevented by avoiding this reaction by any of the afore-mentioned means, such as changing temperature, pH, buffer composition or by adding specific inhibitors of the reaction (e.g. of required enzymes). This ensures that any artifactual binding of other identifiers in the sample pool is prevented.

Carriers and the identifiers bound to them can be analysed using technologies known in the art, including imaging techniques such as microscopy, ELISA, multiplex bead analysis (e.g. Cytometric Bead Assay, Luminex xMap Technology and Coupled Particle Light Scattering; Mohamed et al., Multiplex Bead Array Assays: Performance Evaluation and Comparison of Sensitivity to ELISA, Methods, 2006 April; 38(4): 317-323), DNA sequencing methods, mass spectrometry analysis, fluorescence or bioluminescent analysis, graphical methods, spectroscopic methods or electronic methods and generally all methods mentioned herein for the analysis of identifiers.

In a preferred embodiment, beads containing immobilized identifiers are distributed in a bead-array platform (Fan et al., Illumina universal bead arrays. Meth Enzymol (2006) vol. 410 pp. 57-73). The identifiers bound to each bead can then be identified by sequential hybridization of fluorescently labeled oligonucleotides of known sequences (Gunderson et al., Decoding randomly ordered DNA arrays. Genome Research (2004) vol. 14 (5) pp. 870-7). In another preferred embodiment, streptavidin beads are used to carry biotinylated quantum dots as identifiers (commercially available, e.g. from Quantum Dot Corporation), which have a unique emission spectrum for each sample (Han et al., Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules, Nature Biotechnology (2001) vol. 19 (7) pp. 631-5).

Carriers may also be capable of binding other molecules of the sample which are not identifiers, in particular naturally occurring molecules for instance in a sample comprising biological material such as cells. In a preferred embodiment, carriers also bind naturally occurring polynucleotides, e.g. mRNA. Preferably, prevention of binding in step (C) also includes the prevention of binding of further such molecules in this step as well.

In one embodiment, the set of samples in step (A) of the above-described method is provided by separately combining each member of a first library with each member of a second library and optionally with each member of the one or more supplementary libraries, thereby generating a discrete sample for each combination.

In another embodiment, prior to this step (A), each member of the first and second and optionally the one or more supplementary libraries may be comprised in a separate droplet, on a separate bead or in a separate well of a multi-well plate. In a preferred embodiment, microfluidics is employed, i.e. said member is comprised in a microfluidic droplet, e.g. of a water-in-oil emulsion (Schaerli and Hollfelder, The potential of microfluidic water-in-oil droplets in experimental biology. Mol Biosyst (2009) vol. 5 (12) pp. 1392-404). In another preferred embodiment, said microfluidic droplet also contains, at any stage of the above-described method, carriers to which identifiers can bind.

In a preferred embodiment, microfluidic droplets of one member are fused in a random fashion with droplets of other members in order to achieve a combination of members from one or more different libraries. Generally, each droplet is fused with a single, randomly picked droplet of another member. This can be achieved by generating an emulsion consisting of a large number of droplets for each member of a library. Subsequently, droplets of this emulsion are injected into a micro-channel of a chip. Because no particular order is maintained when generating the emulsions, the series of droplets in the micro-channel contains a random order of members of the library. This process is done separately for each library, starting with two libraries. Subsequently, a one-to-one fusion according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672) of the droplets from the first library with those of the second library is performed. Because of the random sequence of the library members, the droplets of the first library are fused with droplets containing a random member of the second library. The same technique can be applied to generate higher order combinations by merging these fused droplets with those of further libraries similarly generated. The number of droplets created for each member thus depends on the number of desired member combinations and must be sufficiently large to ensure that all combinations occur as a result of random fusion. To ensure all desired combinations, for each combination of one sample, at least 10, at least 100 or preferably at least 1000 droplets are generated from this sample.
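How many droplets per member are "sufficiently large" can be estimated with a simple probabilistic sketch. The Python example below is an illustration only and rests on assumptions not stated in the text: each droplet of a first-library member fuses with a uniformly random member of the second library, fusions are independent, and the chance that any combination is missing is bounded by a union bound. All function names are hypothetical.

```python
import math


def prob_combination_missing(n_second, droplets_per_member):
    """Probability that one given pair never forms among k random fusions."""
    return (1.0 - 1.0 / n_second) ** droplets_per_member


def droplets_needed(n_first, n_second, max_failure=0.01):
    """Smallest k (droplets per first-library member) such that the
    union bound n_first * n_second * (1 - 1/n_second)**k <= max_failure."""
    if n_second == 1:
        return 1  # only one possible partner, every fusion gives the pair
    n_pairs = n_first * n_second
    k = math.log(max_failure / n_pairs) / math.log(1.0 - 1.0 / n_second)
    return max(1, math.ceil(k))


# Two libraries of 10 members each, at most a 1 % chance of a missing pair.
k = droplets_needed(10, 10, max_failure=0.01)
bound_at_k = 10 * 10 * prob_combination_missing(10, k)
```

For this small assumed case the bound is met with fewer than 100 droplets per member; larger libraries push the required number up, roughly in proportion to the size of the partner library times the logarithm of the number of pairs.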

The assay for the selection of positive samples or member combinations, in this case positive droplets, can be carried out using an appropriate microfluidic device, e.g. microfluidic devices in poly(dimethylsiloxane) (PDMS) fabricated by soft lithography (Squires and Quake, Microfluidics: Fluid physics at the nanoliter scale, Reviews of Modern Physics, 2005, vol. 77). The droplets may already comprise necessary assay components or they may be merged with droplets containing the assay components (e.g. chemicals, proteins, cells, see Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008). In one embodiment, this assay is a cell-based fluorescent assay. In a preferred embodiment, this cell-based assay is a reporter assay in which fluorescence provides a positive read-out for the desirable effect of the combined samples on a cell (e.g. activation of a particular pathway such as apoptosis, assays for cell-viability, for proliferation, for cell fusion, endocytosis, presence of receptors, activity of ion channels, signal transduction, presence of ions, of reactive oxygen species, pH indicator and membrane-potential using potentiometric probes). It can also be based on other fluorescent or non-fluorescent methods, including detection of antigens at the surface of the cell or metabolite detection (e.g. enzymatic detection of ATP by luciferin-luciferase assay). The assay can also be a microscopic assay of cell or organism morphology (e.g. using live-cell imaging). The droplets, which contain both the assay components and the library samples including the identifiers, are incubated for a time period sufficient for the response of the cell and the generation of the readout signal (e.g. change in fluorescence). In one embodiment, the sample droplets are first fused with droplets containing cells, incubated for a time period sufficient for a cellular response and then fused with further droplets containing components of a fluorescence assay and incubated for a time period sufficient for the generation of the readout signal. In a preferred embodiment, sorting is achieved based on the fluorescence intensity of the droplets (Baret et al., Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip (2009) vol. 9 (13) pp. 1850-1858). In one embodiment, droplets with a fluorescent signal above a certain threshold are defined as positive samples. In another embodiment, more than one group of positive samples is defined, i.e. a grading of the level of "being positive", using different thresholds for the fluorescent intensity or measuring simultaneously the fluorescence of different cell assays at different emission wavelengths. Positive samples (showing the desired fluorescence intensity) can thus be selected for pooling. Pooling, with respect to microfluidics, refers to pooling the samples (i.e. the droplet contents but not the droplets), i.e. breaking droplets which have been collected in a common recipient. Before pooling, measures are taken as described above to prevent further combination or modification of identifiers. In as far as these measures require further components (e.g. biotin for saturating streptavidin beads), these components can be contained in separate droplets which are fused with the positive droplets using a microfluidic device (see above for appropriate devices). After this step, selected and collected positive droplets are broken by using any means known in the art, for example by addition of a destabilising agent, for example using Emulsion Destabilizer A 104 (RainDance Technologies) as described in Clausell-Tormos et al. (Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008) or by the application of an electric field, to release the combined identifiers into the aqueous phase. Preferably, identifiers of the positive sample which failed to link and/or to be modified are degraded or removed, for example by enzymatic degradation of single-stranded oligonucleotide identifiers or by washing carriers (e.g. wells or immobilised beads). The subsequent identification of linked and/or modified identifiers depends on the type of identifier and is achieved by standard imaging techniques for the identification of optical identifiers, for example Nanostring technology (Geiss et al., Direct multiplexed measurement of gene expression with color-coded probe pairs. Nature Biotechnology (2008) vol. 26 (3) pp. 317-25), the use of DNA sequencing methods, mass spectrometry or other detection methods (Wilson et al., Encoded microcarriers for high-throughput multiplexed detection, Angew Chem Int Ed Engl (2006) vol. 45 (37) pp. 6104-17) and by methods described elsewhere in the present application.
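The threshold-based definition of positive samples, including the graded variant with several thresholds, can be illustrated in software terms. The Python sketch below is not the sorting logic of any particular device; the function names, droplet labels, intensity values and thresholds are all hypothetical and chosen only to show the grading rule.

```python
from bisect import bisect_left


def grade_droplet(intensity, thresholds):
    """Return 0 for a negative droplet, otherwise the number of
    thresholds that the fluorescence signal lies strictly above."""
    # bisect_left counts the thresholds strictly below the intensity.
    return bisect_left(sorted(thresholds), intensity)


def select_positive(intensities, thresholds):
    """Group droplet ids by grade, keeping only droplets of grade >= 1."""
    selected = {}
    for droplet_id, intensity in intensities.items():
        grade = grade_droplet(intensity, thresholds)
        if grade >= 1:
            selected.setdefault(grade, []).append(droplet_id)
    return selected


# Hypothetical readings in arbitrary fluorescence units.
readings = {"d1": 50.0, "d2": 100.0, "d3": 250.0, "d4": 900.0}
groups = select_positive(readings, thresholds=[100.0, 500.0])
```

With a single threshold this reduces to the simple positive/negative definition; each grade can then be collected and pooled separately, so that the later identifier analysis is carried out per grade.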

In a preferred embodiment using microfluidics, identifiers are biotinylated quantum dots which combine indirectly via streptavidin beads (Han et al., Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules. Nature Biotechnology (2001) vol. 19 (7) pp. 631-5).

In another embodiment, the samples of the set of samples further comprise enzymes, enzymatic buffers, nucleotides, reporter gene expression system, fluorescent molecules, bioluminescent reagents, colorimetric reagents, radioactive molecules, DNA- and RNA- dependent polymerases, oligonucleotides and/or detergents.

In one embodiment, the first library is identical with the second library and with the optional one or more supplemental libraries. This embodiment enables a combination of each library component with every other component and not only of components of distinct libraries. Also, it is not limited to combinations wherein all components of the combination are different. This is particularly relevant in cases in which combinations comprising different numbers of distinct components are to be tested. For example, if three identical libraries are used, combinations of three, two or one components are tested in the same setup. In other words, if three identical libraries X, Y and Z are used, combinations of components such as x1, y2 and z3 (which are all different) are possible, but also x1, y1 and z3 (wherein components x1 and y1 are the same) and x1, y1 and z1 (wherein all components are the same). This embodiment is preferred in cases in which all library components are associated with identifiers all having the same linkage and/or modification properties (i.e. linking or binding site), for example biotinylated identifiers (e.g. oligonucleotides or quantum dots).
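The consequence of using identical libraries can be made concrete by enumerating the draws. The following Python sketch is an illustration under the assumption of three identical libraries of three components each; the function name and component labels are hypothetical.

```python
from itertools import combinations_with_replacement, product


def draws_by_distinct_count(components, n_libraries):
    """Count ordered draws (one member per identical library) by the
    number of distinct components each draw contains."""
    counts = {}
    for draw in product(components, repeat=n_libraries):
        n_distinct = len(set(draw))
        counts[n_distinct] = counts.get(n_distinct, 0) + 1
    return counts


components = ["1", "2", "3"]  # hypothetical component labels
counts = draws_by_distinct_count(components, n_libraries=3)

# Distinct treatments when the library of origin does not matter.
unordered = list(combinations_with_replacement(components, 3))
```

In this small case a single setup covers draws containing one, two or three distinct components, which is the behaviour described for libraries X, Y and Z.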

In another embodiment, the library components of the first library are identical with the library components of the second library and with the library components of the optional one or more supplemental libraries. This embodiment achieves the same as the preceding embodiment, but it is preferred in cases in which the linkage and/or modification properties of identifiers are dependent on other identifiers, e.g. if identifiers are linked in a sequential fashion and then elongated, for example oligonucleotide identifiers which anneal only to oligonucleotide identifiers of another library to form a chain. Thus, in this embodiment, the library components of the different libraries are identical; the library identifiers, however, are different, at least in the part that determines their linkage. The unique part which allows analysing their identity can, but need not, be identical for those identifiers that are associated with the identical library components.

In one embodiment, a further assay is carried out during step (D) of the above-described method, i.e. after combining the samples in a sample pool. This further assay combines the identification of library identifiers with a molecular read-out of other constituents of the samples, for example library components or parts thereof or of components of assays carried out before step (C) of the above-described method, such as cells or parts thereof, e.g. DNA, RNA, or proteins. The further assay is preferably carried out simultaneously for all pooled samples or a subset thereof. One example of a further assay is genotyping of a locus of interest, for which the locus is amplified and the amplicons are combined with library identifiers (Fig. 8). Another example is measuring the expression level of a specific gene (Fig. 9 and 10) or a full transcriptome analysis (Fig. 11), in which case cDNAs are generated and combined with library identifiers. In both of these examples, sequencing of the modified identifiers can provide both the identity of the sample combinations as well as the molecular outcome of the assay (amplicon or cDNA).

In one embodiment, the present invention can be applied to performing combinatorial synthetic chemistry and assaying the reaction products for a desired property. For example, the first, second and the optional one or more supplemental libraries contain components comprising chemical entities with non-uniform functional groups, wherein the non-uniform functional groups of one library can undergo a chemical reaction with non-uniform functional groups of one or more other libraries, resulting in the generation of new chemical groups. Thus, by combining the samples new chemical molecules can be generated. In a preferred embodiment, the functional group of the first library is an alkyne and the functional group of the second library is an azide (see also "Click chemistry, a powerful tool for pharmaceutical sciences." Hein CD et al., Pharm Res. 25(10):2216-2230, 2008).

In a preferred embodiment, the present invention relates to a method of determining the identity of a sample in a set of samples, comprising the steps of:

(A) providing a set of samples wherein each sample comprises:

(aa) a member of a first library, wherein each member of said first library comprises a library component and a first terminal library identifier, which comprises the following oligonucleotide elements:

(i) a priming region (P1),

(ii) a unique region (U1), and

(iii) an annealing region (A1), and

(bb) a member of a second library, wherein each member of said second library comprises a library component and a second terminal library identifier, which comprises the following oligonucleotide elements:

(i) a priming region (P2)

(ii) a unique region (U2), and

(iii) an annealing region (A2)

and optionally

(cc) one or more members of one or more supplemental libraries, wherein each member comprises a library component and a unit of two internal library identifiers, wherein the first internal library identifier comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(iii) a linking region (L1) that anneals to the optional linking region (L2) of the second member of the unit,

and the second internal library identifier comprises the following oligonucleotide elements:

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

(iii) a linking region (L2) that anneals to the optional linking region (L1) of the first member of the unit

(dd) one or more members of one or more supplemental libraries, wherein each member comprises a library component and an internal library identifier, wherein the internal library identifier comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

and/or

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, A_internal1, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

wherein A1 anneals to A2 or A_internal1 or to the annealing region of another internal library identifier of a further supplemental library and A2 anneals to A1 or A_internal2 or to the annealing region of another internal library identifier of a further supplemental library;

(B) exposing the set of samples to conditions allowing annealing of the annealing regions and optionally the linking regions, and ligating and/or elongating the annealed library identifiers with a nucleotide polymerase generating double stranded annealed oligonucleotides,

(C) combining at least a subset of samples from the set of samples into a sample pool, and

(D) selecting oligonucleotides wherein at least two identifiers are linked and/or at least one identifier is elongated.

Therein, a group of "first library identifiers" is associated with members of the first library, a group of "second library identifiers" is associated with members of the second library and a group of "internal library identifiers" or a group of "units of two internal library identifiers" can be associated with an optional supplemental library, i.e. each group of library identifiers is associated with a library. In a preferred embodiment, oligonucleotides according to the invention comprise DNA and/or RNA.

Oligonucleotide library identifiers according to this embodiment are subdivided into regions, preferably designated contiguous sections of the nucleotide sequence. A region may overlap with another region, i.e. two regions may share at least a part of their nucleotide sequences. In a preferred embodiment, regions do not overlap. In another preferred embodiment the regions are contiguous. The regions may be present on the same strand of an oligonucleotide or may be present separately on a double or partially double stranded oligonucleotide. In the case of a double stranded region it is understood that the complement of the respectively defined region will also be comprised in the library identifier. First library identifiers comprise a priming region (P1), a unique region (U1), and an annealing region (A1), preferably on one single oligonucleotide and preferably in this order in 5' to 3' or 3' to 5' orientation, preferably in 5' to 3' orientation. Second library identifiers comprise a priming region (P2), a unique region (U2), and an annealing region (A2), preferably on one single oligonucleotide and preferably in this order in 5' to 3' or 3' to 5' orientation, preferably in 5' to 3' orientation. First internal library identifiers comprise an annealing region (A_internal1), a unique region (U_internal1), and optionally a linking region (L1), preferably on one single oligonucleotide and preferably in this order in 5' to 3' or 3' to 5' orientation, preferably in 3' to 5' orientation. Second internal library identifiers comprise an annealing region (A_internal2), a unique region (U_internal2), and optionally a linking region (L2), preferably on one single oligonucleotide and preferably in this order in 5' to 3' or 3' to 5' orientation, preferably in 3' to 5' orientation. A first internal library identifier and a second internal library identifier form a unit of two internal identifiers.
Internal library identifiers, which are different from first internal library identifiers and second internal library identifiers, comprise an annealing region (A_internal1), a unique region (U_internal3), and another annealing region (A_internal2), preferably on one single oligonucleotide and preferably in this order in 5' to 3' or 3' to 5' orientation.
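
By way of illustration only, the region layout of a terminal library identifier described above can be sketched in Python. The class name, the toy sequences and the choice of half-open coordinates are assumptions made for this sketch and are not part of the disclosure; real identifiers would be considerably longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalIdentifier:
    """Hypothetical model of a terminal library identifier, regions given 5' to 3'."""
    priming: str    # P1 or P2, preferably identical for all members of a library
    unique: str     # U1 or U2, differs between members of the same library
    annealing: str  # A1 or A2, preferably identical for all members of a library

    def sequence(self) -> str:
        # Contiguous, non-overlapping regions in the preferred order P-U-A.
        return self.priming + self.unique + self.annealing

    def regions(self) -> dict:
        # Half-open [start, end) coordinates of each region on the strand.
        p, u, a = len(self.priming), len(self.unique), len(self.annealing)
        return {"P": (0, p), "U": (p, p + u), "A": (p + u, p + u + a)}


# Toy sequences chosen only for readability (assumption).
first = TerminalIdentifier(priming="GATTACAGATTACA", unique="ACGT", annealing="GGATCCTT")
print(first.sequence())
print(first.regions())
```

Internal library identifiers could be modelled analogously with the fields A_internal1, U_internal3 and A_internal2.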

As set out above, the order of the regions in the library identifiers can be either in 5'-3' or in 3'-5' direction. It is preferred that there is a priming region or reverse complement of a priming region located at each end of the double-stranded annealed oligonucleotide of step (B) of above-described method.
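
The following minimal sketch illustrates why a priming region, or its reverse complement, ends up at each end of the annealed and elongated oligonucleotide. It assumes that both terminal identifiers are single strands laid out 5'-P-U-A-3' and that A2 is the exact reverse complement of A1; the function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overlap_extend(first: str, second: str, overlap: int) -> str:
    """Top strand after the 3' annealing regions pair and both 3' ends are elongated."""
    if first[-overlap:] != reverse_complement(second[-overlap:]):
        raise ValueError("the 3' annealing regions are not complementary")
    # Read on the top strand, the second identifier appears as its reverse
    # complement; its annealing region coincides with A1 and is not repeated.
    return first + reverse_complement(second)[overlap:]


P1, U1, A1 = "GATTACAGATTACA", "ACGT", "GGATCCTT"
P2, U2 = "CCTGAAGGTCAACT", "TTGA"
A2 = reverse_complement(A1)  # assumption: perfect complementarity

top = overlap_extend(P1 + U1 + A1, P2 + U2 + A2, overlap=len(A1))
print(top.startswith(P1), top.endswith(reverse_complement(P2)))
```

Under these assumptions the top strand reads P1, U1, A1, followed by the reverse complements of U2 and P2, so a primer pair corresponding to P1 and P2 flanks both unique regions.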

In one embodiment, the primer or primers annealing to a sequence complementary to the priming region of one or both terminal library identifiers, a nucleotide mix, a buffer, reagents and/or the nucleotide polymerase is added to each member of the library prior to providing the set of samples, or during or after above-described step (A), preferably the primer or primers annealing to a sequence complementary to the priming region of one or both terminal library identifiers is added in step (D) of above-described method.

In yet another embodiment, the above-described method comprises in step (D) the further step of analysing the oligonucleotide sequence of the unique sequences of the double stranded annealed oligonucleotides.

The term "bridging nucleotide" refers to an oligonucleotide comprising an annealing region (A_bridging1), a connector region (C), and another annealing region (A_bridging2), preferably in this order.

The term "priming region" refers to a part of the nucleotide sequence of a terminal library identifier, the sequence of which or the reverse complement of that sequence is capable of sequence specifically annealing to an oligonucleotide primer. Preferably, the priming region has a length of at least 10 nucleotides, preferably of at least 15, more preferably of at least 20 nucleotides. Whether a primer anneals sequence specifically to a given priming region or its reverse complement is determined by the nucleotide sequence, the temperature, salt concentration and pH, at which the priming region or its reverse complement and the primer are contacted. To determine whether two given oligonucleotide sequences anneal, it is common to use the melting point (T_m) of the oligonucleotide sequence. Thus, the length and the nucleotide sequence of the priming region or reverse complement thereof is chosen such that it has a melting point of at least 55°C, preferably of at least 60°C, more preferably of at least 65°C, or most preferably of 70°C under the conditions commonly used for PCR amplification reactions, e.g. 5 minutes of denaturing at 94°C, 25 to 40 amplification cycles (30 seconds of denaturing at 94°C, 20 seconds of annealing at 70°C, 10 seconds of elongation at 72°C) and 5 minutes of final extension at 72°C. The annealing temperature is generally at least 1°C, at least 2°C, at least 3°C, at least 4°C, or preferably at least 5°C below the T_m of the sequences complementary to each other.
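
As a rough illustration of these thresholds, the sketch below estimates T_m with a widely used rule-of-thumb approximation (2°C per A/T and 4°C per G/C for oligonucleotides shorter than 14 nucleotides, and 64.9 + 41 x (GC - 16.4) / N for longer ones). This approximation ignores salt concentration and is chosen here only for brevity; it is an assumption of this sketch, not a statement of the method intended by the text. The function names and default values are likewise hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Rule-of-thumb melting point in degrees Celsius (salt effects ignored)."""
    seq = seq.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if n < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


def annealing_temperature(seq: str, margin: float = 5.0) -> float:
    # The text prefers an annealing temperature at least 5 degrees below T_m.
    return basic_tm(seq) - margin


def priming_region_ok(seq: str, min_length: int = 20, min_tm: float = 55.0) -> bool:
    # Defaults mirror the "at least 20 nucleotides" and "at least 55 degrees" options.
    return len(seq) >= min_length and basic_tm(seq) >= min_tm


candidate = "GC" * 6 + "AT" * 4  # toy 20-mer, 12 G/C and 8 A/T
print(basic_tm(candidate), annealing_temperature(candidate), priming_region_ok(candidate))
```

A nearest-neighbour calculation would be the more accurate choice for an actual design.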

Furthermore, the 5' end of the primer may have additional nucleotides attached which do not correspond to the sequence of the priming region. In the most preferred embodiment, the sequence of the primer is identical with the part of the priming region it corresponds to. In one embodiment of the invention, the nucleotide sequence of the priming region P1 of each member of the first library is identical and the nucleotide sequence of P2 of each member of the second library is identical, and the nucleotide sequences of P1 and P2 are different or identical. Primers anneal to the sequence complementary to the priming region or part of the same at a particular annealing temperature, which depends on the length and nucleotide composition of the sequences annealing to each other. Said annealing temperature is also referred to throughout the application as the "annealing temperature of the priming region".

The term "unique region" refers to a part of the nucleotide sequence of a terminal, an internal, a first internal, or a second internal library identifier. A unique region comprises a sequence which is unique within a group of library identifiers, i.e. it differs in at least one nucleotide from the sequences of all other unique regions of the same group of library identifiers. Its sequence is or is not different from the sequences of all unique regions of the library identifiers of other groups.
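
The uniqueness requirement within a group can be checked mechanically. The sketch below assumes equal-length unique regions and compares them pairwise; the optional `min_distance` parameter greater than 1 is an addition of this sketch (to tolerate read errors) and is not required by the definition above.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length unique regions")
    return sum(x != y for x, y in zip(a, b))


def group_is_unique(regions: list, min_distance: int = 1) -> bool:
    # Every pair within the group must differ in at least min_distance nucleotides.
    return all(hamming(a, b) >= min_distance for a, b in combinations(regions, 2))


print(group_is_unique(["ACGT", "ACGA", "TCGT"]))
print(group_is_unique(["ACGT", "ACGT"]))
```

Unique regions of different groups are not compared, since the definition allows them to coincide.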

The term "annealing region" refers to a part of the nucleotide sequence of a terminal, a first internal, or a second internal library identifier, all of which comprise one annealing region, or of an internal library identifier or a bridging nucleotide, both of which comprise two annealing regions (A_internal1/A_internal2 and A_bridge1/A_bridge2, respectively). In a preferred embodiment, the nucleotide sequence of the annealing region A1 of each member of the first library is identical, the nucleotide sequence of the annealing region A2 of each member of the second library is identical and optionally the nucleotide sequence of the annealing region or regions of each member of a supplementary library (A_internal1 and A_internal2) is or are identical. In other words, the annealing regions of a group of library identifiers are preferably identical, wherein for internal library identifiers, all annealing regions A_internal1 within the same group are preferably identical and all annealing regions A_internal2 within the same group are preferably identical. However, A_internal1 may be different from A_internal2 within the same group. An annealing region of one group of library identifiers comprises a nucleotide sequence which enables it to anneal to the annealing region of only one other group of library identifiers, wherein A_internal1 and A_internal2 each anneal to an annealing region of a different group of library identifiers. The annealing regions A_bridge1 and A_bridge2 have the same requirements and properties as the annealing regions A_internal1 and A_internal2 as far as they are applicable. This principle applies throughout the present application, particularly to embodiments which do not mention the annealing regions A_bridge1 and A_bridge2.

The term "linking region" refers to a part of the nucleotide sequence of a first internal or a second internal library identifier. In a preferred embodiment, the nucleotide sequence of the linking region L1 of each member of the first internal library identifier of a supplementary library is identical and the nucleotide sequence of the linking region L2 of each member of the second internal library identifier is identical. In other words, the linking regions L1 of first internal identifiers of the same group are preferably identical and the linking regions L2 of second internal identifiers of the same group are preferably identical. The linking regions L1 of a first internal identifier of one unit of internal library identifiers comprise a sequence which anneals to the sequence of the linking regions L2 of the second internal identifier of the same unit of internal library identifiers, i.e. the linking regions L1 comprise a sequence which anneals to the sequence of the linking regions L2 of the same unit of two internal library identifiers.

The term "connector region" refers to a part of the nucleotide sequence of a bridging nucleotide. The connector region can be any artificial or naturally occurring nucleotide sequence.

In a preferred embodiment, the elements of the terminal and internal library identifiers within a library fulfil one or more of the following criteria:

(a) annealing regions of the terminal library identifiers, of the optional internal library identifiers, or of the bridging nucleotides, preferably also the linking regions of the internal library identifiers, do not anneal to a unique region and/or to themselves under the conditions of step (B) of the above-described method;

(b) the unique regions of the terminal library identifiers within a library do not anneal to each other or to themselves under the conditions of step (B) of the above-described method;

(c) the unique regions of a set of internal library identifiers within a supplementary library anneal to each other or not but do not anneal to themselves under the conditions of step (B) of the above-described method; and/or

(d) the terminal library identifiers and the optional internal library identifiers in the libraries comprised in the set of samples do not anneal to themselves under the conditions of step (B) of the above-described method.

In other words, preferably the annealing and optional linking regions of one group of library identifiers do not anneal to a unique region of a library identifier of the same group and/or to themselves under the conditions of step (B) of the above-described method or at the lowest annealing temperature. The unique regions of the terminal library identifiers of the same group preferably do not anneal to each other or to themselves under the conditions of step (B) of the above-described method or at the lowest annealing temperature. The unique regions of internal library identifiers of the same group preferably do not anneal to themselves or to each other under the conditions of step (B) of the above-described method or at the lowest annealing temperature. However, the unique regions of the internal library identifiers of the same unit may anneal to each other under the conditions of step (B) of the above-described method or at the lowest annealing temperature. Library identifiers preferably do not anneal to themselves under the conditions of step (B) of the above-described method or at the lowest annealing temperature.
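
The criteria above could be screened computationally during identifier design. The sketch below uses a deliberately crude proxy, assumed only for illustration: two regions are flagged as potentially annealing if any stretch of `k` nucleotides in one is perfectly complementary to a stretch in the other. The threshold `k`, the function names and the sequences are hypothetical; an actual design would evaluate annealing thermodynamically under the conditions of step (B).

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def may_anneal(a: str, b: str, k: int = 6) -> bool:
    """Crude proxy: a k-nucleotide stretch of a is fully complementary to a stretch of b."""
    rc_b = reverse_complement(b)
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(rc_b[i:i + k] in kmers for i in range(len(rc_b) - k + 1))


def criteria_violations(annealing: str, uniques: list, k: int = 6) -> list:
    """Checks one group of terminal identifiers against criteria (a) and (b)."""
    problems = []
    if may_anneal(annealing, annealing, k):
        problems.append("annealing region may anneal to itself")
    for u in uniques:
        if may_anneal(annealing, u, k):
            problems.append(f"annealing region may anneal to unique region {u}")
        if may_anneal(u, u, k):
            problems.append(f"unique region {u} may anneal to itself")
    for u, v in combinations(uniques, 2):
        if may_anneal(u, v, k):
            problems.append(f"unique regions {u} and {v} may anneal to each other")
    return problems


print(criteria_violations("ACACACACCA", ["AAAAACCCCC"]))
print(criteria_violations("AAGGATCCAA", []))
```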

The term "anneals to" means that two oligonucleotides with complementary sequences anneal to each other or hybridise at a particular annealing temperature. Thus, said term is defined by the complementarity and the annealing temperature.

"Complementary" or the synonymous "essentially complementary" means that a nucleotide sequence is complementary to another nucleotide sequence to an extent of 70%, 75%, 80%, 85%, 90%, or 95%. Alternatively, it is complementary with the exception of up to 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide(s). In the most preferred embodiment, the sequence is 100% complementary. The term "complementary" also requires that the complementary sequences anneal at a particular annealing temperature. The annealing temperature generally is at least 1°C, at least 2°C, at least 3°C, at least 4°C, or preferably at least 5°C below the melting temperature T_m of the sequences complementary to each other. The melting temperature is the temperature at which 50% of an oligonucleotide or part of the same (e.g. an annealing or linking region) is in duplex with an essentially complementary oligonucleotide or part of the same (e.g. a different annealing or linking region). The melting temperature depends on the length and the nucleotide composition of the oligonucleotide or part of the same and on the salt concentration, and it can be calculated using standard algorithms known to the skilled person, for example the basic method (Marmur and Doty. Journal of Molecular Biology (1962) vol. 5 pp. 109-18), the salt adjusted method (Howley et al., J Biol Chem (1979) vol. 254 (11) pp. 4876-83), the base-stacking or the nearest-neighbour method (SantaLucia. Proc Natl Acad Sci USA (1998) vol. 95 (4) pp. 1460-5), or combinations of other known methods (Panjkovich and Melo. Bioinformatics (2005) vol. 21 (6) pp. 711-22). The term "lowest annealing temperature" refers to the annealing temperature of an annealing or linking region which is the lowest of the annealing temperatures of all annealing regions and linking regions of the library identifiers of all groups. 
The annealing temperature of an annealing or linking region is up to 5°C, up to 10°C, up to 15°C, up to 20°C, up to 25°C, up to 30°C, up to 35°C, up to 40°C, up to 45°C, up to 50°C, up to 55°C, up to 60°C, up to 65°C, or up to 70°C or more, and the annealing temperature of a priming region is at least 25°C, at least 30°C, at least 35°C, at least 40°C, at least 45°C, at least 50°C, at least 55°C, at least 60°C, at least 65°C, or at least 70°C.
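
The percentage-based definition of "complementary" given above can be illustrated with a short sketch. It assumes two regions of equal length compared in antiparallel orientation without gaps; the function names and the 70% default are taken from, or invented for, this illustration only, and the additional requirement that the sequences actually anneal at the annealing temperature is not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def percent_complementarity(a: str, b: str) -> float:
    """Share of positions of a that base-pair with b in antiparallel alignment."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes regions of equal length")
    rc_b = reverse_complement(b)
    matches = sum(x == y for x, y in zip(a, rc_b))
    return 100.0 * matches / len(a)


def essentially_complementary(a: str, b: str, threshold: float = 70.0) -> bool:
    return percent_complementarity(a, b) >= threshold


region = "GGATCCTTAC"
perfect = reverse_complement(region)
one_mismatch = perfect[:-1] + "A"  # last nucleotide altered (C to A)
print(percent_complementarity(region, perfect))
print(percent_complementarity(region, one_mismatch))
```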

In a preferred embodiment, the length and sequence of all annealing regions (A1, A2, A_internal1, A_internal2, A_bridge1, and A_bridge2) and linking regions (L1 and L2) is such that the annealing temperature is lower than the annealing temperature of (a) primer(s) annealing to P1 and/or P2 or a sequence complementary thereto. In other words, the annealing temperatures of all annealing and linking regions of all groups of library identifiers and of all optional bridging nucleotides are below the annealing temperatures of the priming regions of the terminal library identifiers. The annealing temperature of the priming region P1 may be different from the annealing temperature of the priming region P2. In a preferred embodiment, the annealing temperatures of the priming regions P1 and P2 are similar, i.e. they differ by up to 10°C. More preferably, they differ by up to 5°C, and most preferably they are identical to the integer. The annealing temperatures of the annealing or linking regions may be different from each other. In a preferred embodiment, the annealing temperatures of the annealing and linking regions are similar, i.e. they differ by up to 20°C. More preferably, they differ by up to 10°C, even more preferably by up to 5°C, and most preferably they are identical to the integer.
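
The temperature relationships of this preferred embodiment can be expressed as a simple consistency check. In the sketch below the annealing temperatures are supplied as plain numbers in °C; how they are obtained is left open. The function names, dictionary layout and example values are assumptions of this sketch.

```python
def lowest_annealing_temperature(joining: dict) -> float:
    """Lowest annealing temperature over all annealing and linking regions."""
    return min(joining.values())


def temperature_problems(priming: dict, joining: dict,
                         priming_spread: float = 10.0,
                         joining_spread: float = 20.0) -> list:
    problems = []
    # Annealing and linking regions must anneal below every priming region.
    if max(joining.values()) >= min(priming.values()):
        problems.append("annealing/linking temperature not below priming temperature")
    # Priming regions P1 and P2 should have similar annealing temperatures.
    if max(priming.values()) - min(priming.values()) > priming_spread:
        problems.append("priming-region temperatures differ too much")
    # Annealing and linking regions should have similar annealing temperatures.
    if max(joining.values()) - min(joining.values()) > joining_spread:
        problems.append("annealing/linking temperatures differ too much")
    return problems


priming = {"P1": 65.0, "P2": 66.0}                          # example values only
joining = {"A1": 45.0, "A2": 45.0, "L1": 40.0, "L2": 40.0}  # example values only
print(lowest_annealing_temperature(joining))
print(temperature_problems(priming, joining))
```

Tightening `priming_spread` and `joining_spread` to 5.0 corresponds to the more preferred ranges stated above.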

In one embodiment, one or more of the first terminal library identifier, the second terminal library identifier, the internal library identifier, and the bridging nucleotide are at least in part double stranded oligonucleotides and/or the set of two internal library identifiers forms at least in part a double stranded oligonucleotide, such that under the conditions of step (B) a free 3' position of an identifier is juxtaposed to a free 5' position of another identifier, so that at least one oligonucleotide strand of one identifier can be ligated to an oligonucleotide strand of another identifier.

The term "in part double-stranded" refers to an oligonucleotide consisting of a strand of nucleotides which is single stranded for one or more terminal nucleotides at the 5' and/or 3' end of an oligonucleotide comprising an annealing or linking region at said end(s). The number of nucleotides of this single-stranded part is not limited. Preferably, the number is at least 1, at least 2, at least 3, at least 5, at least 10, or at least 15. Even more preferably, the single-stranded part comprises the annealing or linking region and most preferably, the single-stranded part consists of the annealing or linking region.

The term "elongating the annealed library identifiers" refers to the template directed addition of nucleotide building blocks to the free 3' OH end of a polynucleotide. This addition preferably occurs through a DNA or RNA polymerase. "Template directed" refers to the addition of nucleotides as determined by the commonly known pairing rules (e.g. A pairs with T or U, G pairs with C) on the basis of the nucleotide sequence of the complementary strand. Preferably such elongation leads to the closure of a gap between two polynucleotides, which annealed spaced apart to complementary regions on the same polynucleotide (gap-filling process) or to the extension until the end of the template is reached (blunt-ending process). At the end of the gap-filling process the free 3'-OH end of the last nucleotide added is usually not linked with the subsequent 5' moiety of the annealed polynucleotide. In these cases it is preferred that a subsequent ligation step is carried out, which leads to the formation of a covalent bond between the end of the polynucleotide formed by polymerase fill-in and the previously annealed polynucleotide. Alternatively, a polymerase with 5'-3' exonuclease activity such as DNA Polymerase I can be used for the gap-filling process. The blunt-ending process usually leads to a double-stranded region with a free 3' and 5' end. It is known, however, that some RNA or DNA polymerases have terminal transferase activity and will add one or more nucleotides to the blunt end, thereby creating a single strand overhang of one or more bases, which is also referred to as "sticky end".
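
The gap-filling process followed by ligation can be pictured with the following sketch. It assumes that the upstream polynucleotide anneals at the 3' end of the template and that all pairing is perfect; sequences are written 5' to 3', and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gap_fill(template: str, upstream: str, downstream: str):
    """Elongate the 3' end of upstream along the template until downstream is reached."""
    copy = reverse_complement(template)  # the strand a polymerase would synthesise
    if not copy.startswith(upstream):
        raise ValueError("upstream polynucleotide does not anneal at the template 3' end")
    start = copy.find(downstream, len(upstream))
    if start < 0:
        raise ValueError("downstream polynucleotide does not anneal to the template")
    added = copy[len(upstream):start]  # template-directed additions
    # A nick remains between the two polynucleotides until they are ligated.
    return upstream + added, downstream


def ligate(five_prime_part: str, three_prime_part: str) -> str:
    return five_prime_part + three_prime_part


full = "ACGGTCAAGGATCCTT"  # toy sequence (assumption)
template = reverse_complement(full)
extended, nicked_partner = gap_fill(template, upstream=full[:5], downstream=full[11:])
print(extended, nicked_partner, ligate(extended, nicked_partner))
```

Terminal transferase activity and strand displacement are not modelled here.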

The embodiment of the invention using unique oligonucleotide identifiers is based on the linkage of library identifiers of different groups, i.e. library identifiers which are associated with different libraries, to an oligonucleotide chain via the annealing of the annealing regions and optionally the linking regions. In its simplest form with two different libraries (e.g. a compound library and a cell line library), a sample comprising a component of the first library and a first terminal library identifier is brought together with a sample comprising a component of the second library and a second terminal library identifier (see e.g. Fig. 2). At an adequate temperature, i.e. the annealing temperature of A1 and A2 or the lower of the two, i.e. the lowest annealing temperature, A1 and A2 anneal, leading to the formation of an oligonucleotide chain comprising the first library identifier and the second library identifier. Single stranded regions, e.g. gaps, in the double-strand of the oligonucleotide thus combined are filled in with a polymerase. The polymerase used can be any polymerase known in the art to be appropriate for synthesizing a complementary nucleotide strand. In a preferred embodiment, the nucleotide polymerase in step (B) of the above-described method is selected from the group consisting of mesophilic polymerases, thermophilic polymerases, DNA-dependent DNA polymerases if the oligonucleotides consist of DNA, and RNA-dependent DNA polymerases if the oligonucleotides consist of RNA. 
Examples of polymerases are E. coli DNA polymerase, Klenow fragment DNA polymerase, Taq polymerase, Thermococcus kodakaraensis DNA polymerase, Thermococcus litoralis DNA polymerase, Pfu DNA polymerase, Pyrococcus DNA polymerase, Phusion DNA Polymerase, Thermus brockianus DNA polymerase, T4-DNA polymerase, T7-DNA polymerase, Bacillus stearothermophilus DNA polymerase, Sulfolobus DNA Polymerase IV, phi29 DNA Polymerase, M-MLV reverse transcriptase, HIV-1 reverse transcriptase, AMV reverse transcriptase, phi6 RNA polymerase or PyroPhage DNA Polymerase. If necessary, for example when partly double-stranded identifiers are combined (see e.g. Fig. 1) or when elongation stops on reaching a double-stranded part because a polymerase without strand displacement or 5'-3' exonuclease activity is used (see e.g. Fig. 4; polymerases having such activity are, for example, phi29 DNA Polymerase and DNA Polymerase I), the adjoining nucleotides of the library identifiers are ligated, preferably using a ligase which is stable at the annealing temperature. All pair-wise combinations of samples of the first and the second library run through the same process. In a preferred embodiment, the set of samples is assayed for assay-dependent effects due to the presence of the two or more library components, preferably during or after step (A) or after step (B). The assay-specific effect can for example be a desired effect of a compound on a cell, and allows the selection and pooling of the sample combinations exhibiting the specific effect. The assay is preferably carried out during or after step (A) or after step (B) of the above-described method, i.e. before samples are pooled (see step (C) of the above-described method and below). 
At this stage, the specific linkage/elongation of the library identifiers within the collected samples, and in consequence the combination of the library components, can be identified for example by amplifying the oligonucleotide comprising the library identifiers using the polymerase chain reaction (PCR) using primers corresponding to P1 and P2 and subsequently sequencing the oligonucleotide. However, this approach of identification requires a separate analysis, i.e. for example a separate PCR and a separate sequencing step for each combination to be identified. Instead of this laborious identification approach, the method of the invention comprises a further step after step (B) and before step (C) of increasing the temperature to or above the melting point of the annealing regions and optionally the linking regions. Subsequently, the collected samples are pooled and the annealed oligonucleotides are selected (step (D) of above-described method). Selecting preferably comprises one of the following:

(i) amplifying the double-stranded annealed oligonucleotides in the pool using oligonucleotide primers annealing to sequences complementary to the priming regions of the two terminal library identifiers, wherein the amplification is carried out under conditions that do not allow annealing of the annealing regions and optional linking regions, preferably at the primer region annealing temperature;

(ii) degrading single-stranded, i.e. non-annealed oligonucleotides, and

(iii) size selection, e.g. by gel, column or gradient separation, and

(iv) affinity purification of the linked or the unlinked identifiers.

As laid out above, preferably the temperature is increased to or above the melting point of the annealing and the optional linking regions before pooling. This has the effect that single non-annealed library identifiers, i.e. those that are not in combination with another library identifier, cannot anneal to other library identifiers anymore. This is crucial since the pooling step (C) brings together all samples selected for and thereby creates other sample combinations not selected for. Thus, in the pooling and selecting steps (C) and (D), no new combinations of library identifiers can be formed and only those which were originally formed are selected for and can then be identified (see e.g. Fig. 3). This approach allows the identification of any number of sample combinations in one step and does not require analysing each sample combination separately. Identification of the oligonucleotides can be performed by any means available in the art and suitable for nucleotide identification, for example sequencing, mass spectrometry, fluorescence reading or hybridization technologies such as microarray hybridization depending on the unique regions.
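
Selection option (i) followed by identification can be sketched in silico as follows. The sketch assumes elongated products of the form P1-U1-A1 followed by the reverse complements of U2 and P2, unique regions of a fixed known length, and error-free reads; all names and sequences are hypothetical. Molecules lacking one of the two priming sites stand in for unlinked identifiers, which would not be amplified exponentially.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def select_and_decode(pool: list, p1: str, p2: str, a1: str, u_len: int) -> list:
    """Return (U1, U2) pairs for pooled molecules carrying both priming sites."""
    rc_p2 = reverse_complement(p2)
    pairs = []
    for strand in pool:
        if not (strand.startswith(p1) and strand.endswith(rc_p2)):
            continue  # unlinked identifier: only one priming site present
        core = strand[len(p1):len(strand) - len(rc_p2)]
        if core[u_len:-u_len] != a1:
            continue  # unexpected structure between the unique regions
        pairs.append((core[:u_len], reverse_complement(core[-u_len:])))
    return pairs


P1, P2, A1, U_LEN = "GATTACAGATTACA", "CCTGAAGGTCAACT", "GGATCCTT", 4
linked = P1 + "ACGT" + A1 + reverse_complement("TTGA") + reverse_complement(P2)
unlinked_first = P1 + "CCCC" + A1
unlinked_second = P2 + "GGGG" + reverse_complement(A1)

pairs = select_and_decode([linked, unlinked_first, unlinked_second], P1, P2, A1, U_LEN)
print(pairs)
```

Each decoded pair of unique regions identifies one combination of a first-library member with a second-library member.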

The method of the invention is also suitable for tracking the combination of samples of more than two libraries, i.e. of a first library, a second library and one or more supplemental libraries. In one embodiment, the samples of the supplemental library comprise a unit of two internal library identifiers: a first and a second internal library identifier. In case of one supplemental library, the annealing region A1 of the first terminal library identifier anneals to the annealing region A_internal1 of the first internal library identifier and the annealing region A2 of the second terminal library identifier anneals to the annealing region A_internal2 of the second internal library identifier. Also, the linking region L1 of the first internal library identifier anneals to the linking region L2 of the second internal library identifier, thereby creating an oligonucleotide chain in the order: first terminal library identifier - first internal library identifier - second internal library identifier - second terminal library identifier. This oligonucleotide chain can be analysed as described above, i.e. the sample combinations of interest can be pooled and analysed together. Further supplemental libraries can be introduced by designing the annealing regions so that the library identifiers of all different libraries can form an oligonucleotide chain comprising one library identifier or one unit of two library identifiers of each group (see, for an example, Fig. 7).

In an alternative embodiment, the two internal identifiers of the same unit may lack their linking regions if their unique regions are designed to anneal to each other, thereby substituting for the function of the linking regions, i.e. the unique regions are also linking regions and, thus, fulfil the requirements of linking regions. In another embodiment, the supplemental libraries are associated with groups of internal library identifiers. In this embodiment, a sample of a supplemental library comprises a single internal library identifier and not a unit of two internal library identifiers as in above-described embodiment. In case of an even number of supplemental libraries, for example two (see Fig. 6), the annealing region A1 of the first terminal library identifier anneals to the annealing region A_internal1 of one internal library identifier, the annealing region A_internal2 of the same internal library identifier anneals to the annealing region A_internal1 of the other internal library identifier, and the annealing region A_internal2 of that other internal library identifier anneals to the annealing region A2 of the second terminal library identifier. The formed oligonucleotide chains of different sample combinations are then pooled and selected as described above. Further supplemental libraries can be introduced by designing the annealing regions so that the library identifiers of all different libraries can form an oligonucleotide chain comprising one library identifier of each group, i.e. one annealing region of each library identifier anneals to the annealing region of exactly one library identifier of a different group. However, to ensure the formation of an oligonucleotide chain capable of being amplified correctly if an uneven number of supplemental libraries is used, a bridging nucleotide may be introduced. A bridging nucleotide is capable of annealing to the annealing regions of two different library identifiers, i.e. 
it comprises at least two annealing regions which fulfil the requirements of the annealing regions of a hypothetical library identifier of a further, different group of library identifiers. Instead of introducing a bridging nucleotide, one or more library identifiers can be in part double-stranded (see, e.g., Fig. 5).
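
The rule that each annealing region anneals to the annealing region of exactly one library identifier of a different group can be checked, and the resulting chain order derived, with a small sketch. The group names, region labels and data layout below are assumptions of this sketch; the example mirrors the case of two supplemental libraries.

```python
def chain_order(groups: dict, pairs: list, start: str = "first terminal") -> list:
    """groups: group name -> its annealing regions; pairs: regions designed to anneal."""
    owner = {region: name for name, regions in groups.items() for region in regions}
    partner = {}
    for a, b in pairs:
        if owner[a] == owner[b]:
            raise ValueError(f"{a} and {b} belong to the same group")
        if a in partner or b in partner:
            raise ValueError("an annealing region has more than one partner")
        partner[a], partner[b] = b, a

    order, region = [start], groups[start][0]
    while True:
        if region not in partner:
            raise ValueError(f"{region} has no annealing partner")
        arrived = partner[region]
        group = owner[arrived]
        if group in order:
            raise ValueError("pairings form a cycle")
        order.append(group)
        remaining = [r for r in groups[group] if r != arrived]
        if not remaining:  # a terminal identifier has a single annealing region
            return order
        region = remaining[0]


groups = {
    "first terminal": ["A1"],
    "internal a": ["Aa_internal1", "Aa_internal2"],
    "internal b": ["Ab_internal1", "Ab_internal2"],
    "second terminal": ["A2"],
}
pairs = [("A1", "Aa_internal1"), ("Aa_internal2", "Ab_internal1"), ("Ab_internal2", "A2")]
print(chain_order(groups, pairs))
```

A bridging nucleotide could be represented in the same layout as a further group with two annealing regions and no unique region.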

In one embodiment of the present invention, identifiers and/or bridging nucleotides comprise oligonucleotides derived from molecular readouts resulting in nucleotides, such as genotyping, RNA expression, protein interaction with proximity ligation.

In one embodiment of the invention, oligonucleotides or polynucleotides naturally occurring in a sample can be introduced into one or more bridging nucleotides. These naturally occurring oligonucleotides or polynucleotides may constitute a part of the bridging nucleotide, preferably the connector region, or the complete bridging nucleotide(s).

In another embodiment, oligonucleotides or polynucleotides naturally occurring in a sample can be introduced into a library identifier. These naturally occurring oligonucleotides or polynucleotides may constitute a part of the library identifier, i.e. one or more of the unique, priming, annealing and linking regions, or the complete library identifier. They may also be introduced adjacent to one of the aforementioned regions. "Naturally occurring" in the context of the invention means being present in or being produced in a library component. Said naturally occurring oligonucleotides or polynucleotides can for example be derived from molecular readouts resulting in nucleotides, such as genotyping, RNA expression profiling or protein interaction with proximity ligation. Thus, in a preferred embodiment of the invention, identifiers and/or bridging nucleotides comprise oligonucleotides derived from molecular readouts resulting in nucleotides, such as genotyping, RNA expression, protein interaction with proximity ligation.

There are numerous approaches of introducing a naturally occurring oligonucleotide into an identifier or bridging nucleotide and the present invention is not limited to the following examples. For instance, the naturally occurring sequence can be amplified using primers with sequences comprising the annealing regions (or an annealing region and a primer region in case of a terminal identifier) of the identifier or bridging nucleotide (see, e.g., Fig. 8). The double-stranded amplicon can be dissociated, e.g. by raising the temperature above the melting temperature of said amplicon, into two single strands and the single strand constituting the identifier can be annealed, optionally subsequently to an isolation step, to a different identifier. Alternatively, the double-stranded amplicon can be digested with one or more restriction enzymes, which preferably do not cut within any other identifier, so that a double-stranded oligonucleotide with single-stranded ends (one single-stranded end in case of a terminal identifier) is generated which can anneal to other identifiers. Moreover, the part of the identifiers generated from a naturally occurring sequence is not necessarily unique in each sample. For example, identifiers can be generated using primers or partial identifiers with priming function that can anneal to at least a subpopulation of naturally occurring sequences (e.g. poly(A) tail of the mRNAs as shown in Fig. 11, common repetitive genomic sequences, DNA binding site motifs, etc.). A known part of the naturally occurring sequence, in the example of mRNA the poly(A) tail of the mRNA, can anneal to a partial identifier and the partial identifier can be completed by DNA polymerisation (reverse transcription in the mRNA example) and subsequent cutting of the resulting double-stranded part to produce a single-stranded end capable of annealing to another identifier. 
Also, a mixture of primer pairs or partial identifiers with priming function annealing to distinct naturally occurring sequences can be used to generate non-identical identifiers within the same sample, which, however, preferably all still possess a common unique region or anneal to an identifier comprising a unique region in the same sample. The amplicons resulting from a PCR using said primer pairs can be processed as described above, i.e. dissociated into single strands or subjected to digestion with said restriction enzymes to produce a single-stranded end capable of annealing. Also, the naturally occurring oligonucleotide can already comprise or consist of all elements of the identifier or bridging nucleotide if the naturally occurring oligonucleotide or a part of it is unique for the sample it is comprised in (for example if one library has only one member as in Fig. 9). Generally, the naturally occurring oligonucleotide can comprise or consist of all elements of an oligonucleotide without a unique region, e.g. one member if a unit of internal library identifiers, i.e. a first or second internal library identifier (for example as in Fig. 10), or a bridging nucleotide. A double-stranded oligonucleotide comprising air elements requires digestion with a restriction enzyme as described above or a denaturation step.

An alternative way of incorporating a naturally occurring sequence is that one end of the naturally occurring oligonucleotide anneals to non-identical P1 or P2 and the respective P1 or P2 primer is substituted by a primer corresponding to the other end, i.e. non-annealing end, of the naturally occurring oligonucleotide, enabling the amplification of the double-stranded annealed oligonucleotides including the naturally occurring oligonucleotide in the selecting step of the above-described method.

The present invention also relates to a set of a first and a second library for carrying out the method of claims 1 to 20, wherein each member of the first and second library comprises a different library component and a different terminal library identifier and optionally one or more supplemental libraries, wherein each member of the one or more supplemental libraries comprises a different library component and an internal library identifier or set thereof, wherein

(i) a priming region (P1),

(ii) a unique region (U1), and

(iii) an annealing region (A1), and

(i) a priming region (P2)

(ii) a unique region (U2), and

(iii) an annealing region (A2)

and optionally

(cc) each unit of two internal library identifiers of one or more supplemental libraries comprises a first internal library identifier which comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(iii) a linking region (L1) that anneals to the optional linking region (L2) of the second member of the unit,

and a second internal library identifier which comprises the following oligonucleotide elements:

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

(iii) a linking region (L2) that anneals to the optional linking region (L1) of the first member of the unit,

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library

and/or

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, A_internal1, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide.

The present invention also relates to a set of groups comprising a group of first terminal library identifiers and a group of second terminal library identifiers for carrying out the method of the invention, wherein each member of the group of the first and second terminal library identifiers is different from each other and wherein each terminal library identifier of the first and second group comprises the following oligonucleotide elements:

(i) a priming region (P1, P2),

(ii) a unique region (U1, U2), and

(iii) an annealing region (A1, A2), wherein A1 anneals to A2 or optionally to the annealing region of an internal library identifier or unit thereof (A_internal1, A_internal2) and the length and the sequence of the oligonucleotides is such that the T_m of complementary annealing regions is lower than the T_m of a primer or primers to P1 and/or P2.

In one embodiment, the set of groups of the invention further comprises one or more groups of different internal library identifiers or units thereof, wherein each internal library identifier or unit thereof comprises

(a) a first internal library identifier which comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

(b) an internal library identifier of one or more supplemental libraries comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

and/or

(c) one or more bridging oligonucleotides each comprising the following elements:

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, A_internal1, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide.

The present invention also relates to a computer implemented method of designing a group of first terminal library identifiers, a group of second terminal library identifiers and optionally one or more groups of internal library identifiers for carrying out the method of claims 1 to 20, comprising the following steps:

(ii) determining the number of groups of internal library identifiers required according to the method of the invention,

(iii) generating sequences for P1 and P2, respectively, wherein each sequence

is generated randomly or selected from a set of pre-existing priming sequences, and

allows annealing of a primer sequence at a specific and/or preset annealing temperature which is essentially identical for P1 and P2 of the group of first and group of second terminal library identifiers, respectively,

(iv) generating annealing sequences for A1 and A2 and optionally A_internal1 and A_internal2, wherein each sequence:

is generated randomly or selected from a set of pre-existing annealing sequences,

is identical for each annealing region of the same group of terminal library identifiers and, if internal library identifiers are present, A_internal1 and A_internal2, respectively, is identical for one annealing region of the same group of internal library identifiers or of the same group of units of two internal library identifiers, wherein A1 anneals to A2 or A_internal1 or to the annealing region of another internal library identifier of a further supplemental library and A2 anneals to A1 or A_internal2 or to the annealing region of another internal library identifier of a further supplemental library and wherein the length and sequence of all annealing regions (A1, A2, A_internal1 and A_internal2) and linking regions (L1 and L2) is such that the annealing temperature is lower than the annealing temperature of (a) primer(s) annealing to P1 and/or P2 or a sequence complementary thereto

does neither anneal to the annealing region of members of the same library nor to the priming regions of the terminal library identifiers or their corresponding primer sequences at the lowest linking temperature,

is a naturally occurring sequence, generated randomly or selected from a set of pre-existing unique sequences,
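The random generation of unique regions can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the starting length is the smallest integer L with 4^L > N for a library of N members, a new random set is drawn when a set is rejected, and the length is increased by one after a limited number of failed trials. The acceptance test `acceptable`, the trial limit `max_trials` and the `forbidden` list (for instance annealing or priming sequences) are hypothetical stand-ins for the annealing criteria, which would in practice be evaluated thermodynamically.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def minimal_length(n_members):
    """Smallest integer L with 4**L > n_members, i.e. L > log4(n_members)."""
    length = 1
    while 4 ** length <= n_members:
        length += 1
    return length


def acceptable(candidates, forbidden):
    """Hypothetical acceptance test (an assumption of this sketch):
    all candidates are distinct and none overlaps completely with a
    forbidden sequence or its reverse complement."""
    if len(set(candidates)) != len(candidates):
        return False
    for seq in candidates:
        for f in forbidden:
            for target in (f, reverse_complement(f)):
                if target in seq or seq in target:
                    return False
    return True


def design_unique_regions(n_members, forbidden=(), max_trials=5, seed=0):
    """Draw random sets; lengthen the unique region when no set passes."""
    rng = random.Random(seed)
    length = minimal_length(n_members)
    while True:
        for _ in range(max_trials):
            candidates = [
                "".join(rng.choice("ACGT") for _ in range(length))
                for _ in range(n_members)
            ]
            if acceptable(candidates, forbidden):
                return candidates
        length += 1  # no acceptable set after a few trials


unique_regions = design_unique_regions(500, forbidden=("CACGAGGTCATT",))
```

For a library of 500 members the search starts at length 5, since 4^4 = 256 is not larger than 500 whereas 4^5 = 1024 is; the length actually returned is usually larger because random draws at the minimal length tend to collide.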

In one embodiment, to generate unique sequences for a library with N members, the minimal sequence length of the unique regions is the smallest integer greater than the logarithm in base 4 of N. For a given length of the unique regions, the algorithm generates N random sequences and tests that the generated set fulfils the above criteria. If it does not, another set is generated. If after a few trials no set is obtained, the length is incremented by one and the whole process is reiterated as many times as necessary until a set is obtained.

BRIEF DESCRIPTION OF THE FIGURES

The following figures are merely illustrative of the present invention and should not be construed to limit the scope of the invention as indicated by the appended claims in any way.

Fig. 1: Pair-wise combination of double-stranded library identifiers. A first library sample comprising a partly double-stranded first terminal identifier is combined with a second library sample comprising a partly double-stranded second terminal library identifier. The identifiers anneal via their single-stranded annealing regions and the adjacent nucleotides of the two identifiers are ligated as indicated by the semicircles. Arrows indicate 5' to 3' direction.

Fig. 2: Pair-wise combination of single-stranded library identifiers. A first library sample comprising a single-stranded first terminal identifier is combined with a second library sample comprising a single-stranded second terminal library identifier. The identifiers anneal via their annealing regions; some identifiers, however, may remain unbound. The resulting partial double-strand can be elongated using a polymerase. Arrows indicate 5' to 3' direction.

Fig. 3: Selecting of two pair-wise combinations of single-stranded library identifiers.

Two first library samples comprising single-stranded first terminal identifiers are each combined with a different second library sample comprising a single-stranded second terminal library identifier (as in Fig. 2). Identifiers anneal via their single-stranded annealing regions and the resulting partial double-strands are elongated/extended using a polymerase. The sample combinations are pooled and primers corresponding to P1 and P2 anneal to the combined library identifiers. Only the specific combinations will be amplified by PCR at an annealing temperature which does not allow annealing of the annealing regions (selecting). Arrows indicate 5' to 3' direction and dotted arrows the nucleotides generated by elongation in 5' to 3' direction. The prime symbol "'" in the identifier designation indicates reverse complement sequence.

Fig. 4: Combination of three samples, one comprising units of two internal identifiers.

One first library sample comprising single-stranded first terminal identifiers is combined with a supplemental library sample comprising a unit of two internal identifiers and a second library sample comprising a single-stranded second terminal library identifier. The asterisk above the second internal library identifier indicates that the identifier may also comprise a unique region U_int2. The two internal library identifiers may or may not be in an annealed state. The samples are combined and the identifiers anneal at the annealing temperatures of the annealing/linking regions. The resulting partial double-strand can be elongated/extended using a polymerase (dotted arrows). Nucleotides are ligated where necessary as indicated by the semicircles. Arrows indicate 5' to 3' direction. Instead of a unit of internal identifiers, the supplemental library sample could also comprise a single-stranded internal identifier and a bridging nucleotide. L1 would then be A_int2, L2 would be A_bridge1 and A_int2 would be A_bridge2.

Fig. 5: Combination of three samples, one comprising single-stranded internal identifiers. One first library sample comprising partly double-stranded first terminal identifiers is combined with a supplemental library sample comprising single-stranded internal identifiers and a second library sample comprising partly double-stranded second terminal library identifiers. The samples are combined and the identifiers anneal at the annealing temperatures of the annealing regions. The resulting partial double-strand can be elongated/extended using a polymerase (dotted arrows). Nucleotides are ligated where necessary as indicated by the semicircles. Arrows indicate 5' to 3' direction.

Fig. 6: Combination of four samples, each comprising single-stranded identifiers. One first library sample comprising single-stranded first terminal identifiers is combined with two supplemental library samples of different libraries, each comprising single-stranded internal identifiers, and a second library sample comprising a single-stranded second terminal library identifier. The samples are combined and the identifiers anneal at the annealing temperatures of the annealing regions. The resulting partial double-strand can be elongated/extended using a polymerase (dotted arrows). Nucleotides are ligated where necessary as indicated by the semicircles. Arrows indicate 5' to 3' direction.

Fig. 7: Combination of four samples, one comprising single-stranded internal identifiers and one comprising units of two internal identifiers. One first library sample comprising partly double-stranded first terminal identifiers is combined with a supplemental library sample comprising single-stranded internal identifiers, a supplemental library sample comprising units of two internal identifiers and a second library sample comprising a single-stranded second terminal library identifier. The asterisk above the second internal library identifier indicates that the identifier may also comprise a unique region U_int2. The two internal library identifiers may or may not be in an annealed state. The samples are combined and the identifiers anneal at the annealing temperatures of the annealing regions. The resulting partial double-strand can be elongated/extended using a polymerase (dotted arrows). Nucleotides are ligated where necessary as indicated by the semicircles. Arrows indicate 5' to 3' direction.

Fig. 8: Pair-wise combination of single-stranded library identifiers, one comprising a naturally occurring sequence derived from DNA. The first terminal library identifier is generated in the sample using a molecular read-out, in this example genotyping of an SNP indicated by the asterisk. The SNP locus is amplified using primers P1 and A1' (the reverse complement of A1) flanking the locus, generating the first library identifier P1-*-A1. Samples are then combined and the identifiers anneal at the annealing temperatures of the annealing regions. The resulting partial double-strand can be elongated/extended using a polymerase (dotted arrows). Arrows indicate 5' to 3' direction. Said generation of the identifier can also take place after the sample combination.

Fig. 9: Pair-wise combination of single-stranded library identifiers, one comprising a naturally occurring sequence derived from RNA. The first terminal library identifier is generated in the only sample of the first library using a molecular readout, in this example expression profiling of the gene x, of which a region is used as the first terminal identifier. The depicted cell comprising gene x can respond to different stimuli by changing the expression of gene x. The resulting mRNA molecules (indicated by striped arrows) can anneal via the annealing region A1 to the annealing region A2 of the second terminal library identifier when the samples are combined. The second terminal library identifier serves as a primer for reverse transcription of the part of gene x comprising P1 and U1 and the generated cDNA is amplified by PCR using the primers P1 and P2, which results in a double-stranded combination of both library identifiers. This result tracks the sample combination and gives information about the transcriptional state of gene x. Arrows indicate 5' to 3' direction.

Fig. 10: Generation of an internal library identifier from a naturally occurring sequence. The first internal library identifier is generated in the sample using a molecular read-out, in this example expression profiling of the gene x, of which a region is used as the first terminal identifier. The depicted cell comprising gene x can respond to different stimuli by changing the expression of gene x. The resulting mRNA molecules (indicated by striped arrows) can anneal via the linking region L1 to the linking region L2 of the second internal library identifier within the same sample. The second internal library identifier serves as a primer for reverse transcription of the part of gene x comprising A_int1. The generated cDNA can then be amplified by PCR using primers annealing to A_int1 and A_int2, which will result in a double-stranded combination of a unit of internal library identifiers (not shown). The double-stranded amplicon can be digested with one or more restriction enzymes, which preferably do not cut within any other identifier, so that a double-stranded oligonucleotide with single-stranded ends is generated, which can anneal to other identifiers (not shown). This allows tracking the sample combination and obtaining information about the transcriptional state of gene x. Arrows indicate 5' to 3' direction.

Fig. 11: Pair-wise combination of four samples, one comprising a naturally occurring sequence derived from the whole population of polyadenylated RNA. The first terminal library identifier, which is partially double-stranded, is ligated to the first internal library identifier. The first internal library identifier possesses a linking region (L1, composed of 16 T nucleotides followed by a non-T nucleotide (V) and a random nucleotide (N) [T₁₆VN]) that is able to anneal to the poly(A) tail of the naturally occurring polyadenylated mRNAs. The naturally occurring polyadenylated mRNAs, which can respond to different stimuli by changing their expression level, are defined as the second internal library identifiers. The naturally occurring mRNAs possess an internal annealing region that is defined by the recognition sequence of a restriction enzyme (A_int2), a unique internal region specific for each kind of mRNA (U_int2) and the linking region (L2, which corresponds to the poly(A) tail). The ligated first terminal identifier and first internal identifier are used to prime the reverse transcription of the naturally occurring polyadenylated mRNAs (second internal library identifiers). In a subsequent step the annealed library identifiers are converted to double-stranded DNA molecules (using either RNaseH or random hexamer priming). These molecules are cut with a restriction enzyme that only cuts inside the naturally occurring mRNA sequences. In a final step the library of second terminal identifiers is fused using an annealing region (A2) that is complementary to the protruding DNA sequence (A_int2) produced by the restriction enzyme digestion.

Fig. 12: Identifier-dependent DNA polymerisation of oligonucleotides immobilized on beads.

The beads are covered by single-stranded DNA ending with a common annealing region A2. The library identifiers are single-stranded DNA with a unique region and a common annealing region A1 for all libraries that can anneal with A2. Samples are prepared so that they contain all components necessary for a DNA polymerisation reaction. During the step of carrier modification, the single-stranded DNA that covers the bead is extended by polymerase using as template the annealed library identifiers. Then, the DNA polymerase is inactivated, thereby preventing any further elongations. Identification of the combinations of library identifiers (hit identification) can be done by sequential hybridization of fluorescently labelled oligonucleotides of known sequence (Gunderson et al., Decoding randomly ordered DNA arrays. Genome Research (2004) vol. 14 (5) pp. 870-7).

Fig. 13: Combination of double-stranded oligonucleotide identifiers immobilized on beads and restriction enzyme identifiers. Identifiers of the first library are double-stranded oligonucleotides immobilized on beads. The oligonucleotides contain a unique sequence (here U1 and U2) followed by a sequence common to all samples of the library with a defined sequence of distinct restriction enzyme sites (here three sites S1, S2 and S3). These three sites are specific to distinct restriction enzymes (for example enzymes E1, E2, E3). The identifiers of the second library are the restriction enzymes. Upon merging of samples of the two libraries, the enzyme modifies specifically the first library identifier by cutting it at its restriction site. The sequence of the modified first library identifier, which consists of the unique sequence U followed by the common region until the site where it has been cut by the second library identifier, uniquely identifies the sample combination.

Fig. 14: Pair-wise combination of library identifiers, fusion by elongation and identification by parallel sequencing. A) Experimental Design: A group of ten first terminal library identifiers and a group of ten second terminal library identifiers were designed, each comprising a priming region, a unique region and an annealing region. The first five identifiers of each library were mixed in a pair-wise manner, and the last five identifiers of each library were also mixed in a pair-wise manner. The samples are fused by elongation, producing the 50 possible combinations (positive barcode fusions, marked as +). The samples are pooled and the fusions identified by parallel sequencing. B) Results: The resulting matrix of identified barcode combinations is presented in gray scale intensity. The positive combinations are represented in the upper right and lower left quadrants, while the false positive combinations are presented in the lower right and upper left quadrants. The number of times each individual identifier is sequenced (independently of the fused identifier) is represented by bars in the upper or right part of the graph, corresponding to the first or the second terminal library respectively. The values are expressed in millions of reads (megareads).
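The count of 50 positive fusions in the design of Fig. 14 can be checked with a short enumeration. The sketch below is illustrative only; the zero-based indices and the function name `expected_positive_pairs` are assumptions made for the example.

```python
def expected_positive_pairs(n_per_library=10, block=5):
    """Enumerate (first, second) identifier index pairs that were mixed.

    Identifiers are mixed only within the same block: the first five of
    one library with the first five of the other, and likewise for the
    last five.
    """
    positives = set()
    for i in range(n_per_library):
        for j in range(n_per_library):
            if i // block == j // block:
                positives.add((i, j))
    return positives


positives = expected_positive_pairs()
n_positive = len(positives)          # 25 + 25 expected fusions
n_false_positive_cells = 10 * 10 - n_positive
```

Any fusion observed outside this set, for example between identifier 0 of the first library and identifier 5 of the second, would count as a false positive in the result matrix.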

EXAMPLES

The following examples are for illustrative purposes only and do not limit the invention described above in any way.

Example 1: Pair-wise combination of library identifiers

1.1 Oligonucleotide design

A group of two first terminal library identifiers and a group of two second terminal library identifiers were designed, each comprising a priming region, a unique region and an annealing region.

The priming region P1 of the first terminal identifiers was identical to the priming region P2 of the second terminal identifier. The sequence of the priming region was:

P1 = P2 = 5' CAAGCAGAAGACGGCATACGAGATC 3' (SEQ ID NO: 1)

These priming regions have a calculated melting temperature of 67°C using the salt adjusted method (Kibbe. OligoCalc: an online oligonucleotide properties calculator, Nucleic Acids Res (2007) vol. 35 (Web Server issue) pp. W43-6). The primer P had a sequence identical to the sequence of the priming regions P1 and P2.

The sequence of the annealing region A1 of the first terminal identifiers was:

A1 = 5' CACGAGGTCATT 3' (SEQ ID NO: 2)

The annealing region A2 of the second terminal library identifier was complementary to A1 and had the sequence:

A2 = 5' AATGACCTCGTG 3' (SEQ ID NO: 3)

The melting temperature of A1 and A2 as calculated by the salt adjusted method was 36°C.
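The two melting temperatures can be reproduced with a short calculation. The sketch below assumes the salt-adjusted expressions as they are commonly documented for OligoCalc (one expression for sequences of up to 13 nucleotides, another for longer ones) and a sodium concentration of 50 mM; both the formulas and the salt concentration are assumptions of this example and are not stated in the text.

```python
import math


def salt_adjusted_tm(seq, na_molar=0.05):
    """Approximate salt-adjusted melting temperature in degrees Celsius.

    Assumed formulas (attributed to OligoCalc's salt-adjusted method):
      up to 13 nt: 2*(A+T) + 4*(G+C) - 16.6*log10(0.05) + 16.6*log10([Na+])
      longer:      100.5 + 41*(G+C)/N - 820/N + 16.6*log10([Na+])
    """
    seq = seq.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    salt_term = 16.6 * math.log10(na_molar)
    if n <= 13:
        return 2 * at + 4 * gc - 16.6 * math.log10(0.05) + salt_term
    return 100.5 + 41.0 * gc / n - 820.0 / n + salt_term


tm_priming = salt_adjusted_tm("CAAGCAGAAGACGGCATACGAGATC")  # P1 = P2
tm_annealing = salt_adjusted_tm("CACGAGGTCATT")             # A1
```

Under these assumptions the priming region evaluates to about 67°C and the annealing region to 36°C, so that a PCR annealing step well above 36°C favours primer binding over re-annealing of A1 and A2.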

The unique regions of the library identifiers were as follows:

- U1₁ = GAACCA

- U2₁ = ACGCTA

- U1₂ = AGACCT

- U2₂ = GGAGAC

Thus, the sequences of the library identifiers were as follows:

First library identifiers:

P1-U1₁-A1: 5' CAAGCAGAAGACGGCATACGAGATC-GAACCA-CACGAGGTCATT (SEQ ID NO: 4)

P1-U2₁-A1: 5' CAAGCAGAAGACGGCATACGAGATC-ACGCTA-CACGAGGTCATT (SEQ ID NO: 5)

Second library identifiers:

P1-U1₂-A2: 5' CAAGCAGAAGACGGCATACGAGATC-AGACCT-AATGACCTCGTG (SEQ ID NO: 6)

P1-U2₂-A2: 5' CAAGCAGAAGACGGCATACGAGATC-GGAGAC-AATGACCTCGTG (SEQ ID NO: 7)

None of these library identifiers shows self-complementarity and no hairpins were predicted.

1.2 Combination of the library identifiers

The first library identifier P1-U1₁-A1 and the second library identifier P1-U1₂-A2 on the one hand, and P1-U2₁-A1 and P1-U2₂-A2 on the other hand were mixed in standard test tubes. The final concentration of each library identifier oligonucleotide in the reaction volume was 50 nM. Identifiers were combined and strands were extended by fill-in using the following reaction mix and conditions:

- 5 µL identifier mix

- 0.2 µL dNTPs 10 mM

- 2 µL NEBuffer 2 10x

- 0.2 µL Klenow (5 U/µL from NEB)

- 7.5 µL H₂O

The library identifiers were combined at an annealing temperature of 37°C for 60 min, followed by enzyme inactivation for 20 min at 75°C.
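The strand expected from annealing and fill-in can be derived from the identifier sequences alone. The following sketch is a simplified model that treats annealing as exact reverse complementarity of the 3' annealing regions and ignores thermodynamics; the function name `fill_in_product` is a hypothetical helper introduced for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in_product(first, second, anneal_len):
    """Top strand after the 3' annealing regions pair and both strands
    are extended. Returns None if the 3' ends are not complementary.

    first:  5' P-U-A1 3'   second: 5' P-U-A2 3'
    """
    a1 = first[-anneal_len:]
    a2 = second[-anneal_len:]
    if reverse_complement(a1) != a2:
        return None  # no duplex at the 3' ends, so nothing to extend
    # Extension of the first identifier copies the remainder of the
    # second identifier as its reverse complement.
    return first + reverse_complement(second[:-anneal_len])


P = "CAAGCAGAAGACGGCATACGAGATC"
A1 = "CACGAGGTCATT"
A2 = "AATGACCTCGTG"
first = P + "GAACCA" + A1    # first library identifier with unique region GAACCA
second = P + "AGACCT" + A2   # second library identifier with unique region AGACCT

product = fill_in_product(first, second, len(A1))
same_library = fill_in_product(first, first, len(A1))  # A1 cannot pair with A1
```

In this model the product has the layout priming region, first unique region, A1, reverse complement of the second unique region, reverse complement of the priming region, and two identifiers from the same library yield no product.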

1.3 Pooling and amplification

0.5 µL of each mix were pooled. A PCR was performed on the 1 µL of pooled library identifiers using the primer P and the following cycles:

3 min 94°C

30x (30 s 94°C, 20 s 70°C, 10 s 72°C)

5 min 72°C.

1.4 Identification

The PCR amplification products were sequenced and cloned using standard methods. Six different clones were sequenced and found to contain the following oligonucleotides:

Two oligonucleotides had the sequence: 5' CAAGCAGAAGACGGCATACGAGATC-GAACCA-CACGAGGTCATT-AGGTCT-GATCTCGTATGCCGTATTCTGCTTG (SEQ ID NO: 8), which is exactly the sequence of P1-U1₁-A1-U1₂'-P', wherein the symbol "'" stands for a complementary sequence.

Four oligonucleotides had the sequence: 5' CAAGCAGAAGACGGCATACGAGATC-ACGCTA-CACGAGGTCATT-GTCTCC-GATCTCGTATGCCGTATTCTGCTTG (SEQ ID NO: 9), which is exactly the sequence of P1-U2₁-A1-U2₂'-P', wherein the symbol "'" stands for a complementary sequence.
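Assigning a sequenced clone to a pair of identifiers amounts to locating the annealing region and reading the unique regions on either side of it. The sketch below is one possible way to do this and is not taken from the example itself; the labels `U1_1`, `U2_1`, `U1_2` and `U2_2` and the function `decode` are hypothetical names, and exact matching without sequencing-error tolerance is assumed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


A1 = "CACGAGGTCATT"
U_LEN = 6
FIRST_LIBRARY = {"GAACCA": "U1_1", "ACGCTA": "U2_1"}
SECOND_LIBRARY = {"AGACCT": "U1_2", "GGAGAC": "U2_2"}


def decode(read):
    """Return (first label, second label) for a read, or None."""
    read = read.replace("-", "").replace(" ", "").upper()
    pos = read.find(A1)
    if pos < U_LEN:
        return None  # annealing region missing or too close to the 5' end
    u_first = read[pos - U_LEN:pos]
    # The second unique region is read as its reverse complement.
    downstream = read[pos + len(A1):pos + len(A1) + U_LEN]
    u_second = reverse_complement(downstream)
    if u_first in FIRST_LIBRARY and u_second in SECOND_LIBRARY:
        return FIRST_LIBRARY[u_first], SECOND_LIBRARY[u_second]
    return None


clone_a = decode("CAAGCAGAAGACGGCATACGAGATC-GAACCA-CACGAGGTCATT-AGGTCT-"
                 "GATCTCGTATGCCGTATTCTGCTTG")
clone_b = decode("CAAGCAGAAGACGGCATACGAGATC-ACGCTA-CACGAGGTCATT-GTCTCC-"
                 "GATCTCGTATGCCGTATTCTGCTTG")
```

A spurious pair would be decoded in the same way and would show up as a label combination that was never mixed.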

No artifactual oligonucleotides were obtained, neither spurious pairs (such as P1-U1₁-A1-U2₂'-P' or P1-U2₁-A1-U1₂'-P') nor any other combinations.

Controls:

Control experiments were performed using all possible combinations of library identifiers. Only those predicted to combine, i.e. first library identifiers with second library identifiers, did so. Combinations were analysed subsequently to the fill-in step by assessing their migration in agarose gels or after the amplification step by sequencing.

Different DNA polymerases were used to perform the fill-in (T4 DNA polymerase, Taq polymerase and the Klenow fragment). All were able to perform the fill-in step. In the case of Klenow polymerase the fill-in step could be reduced to 5 min of incubation at 37°C.

A negative control without adding polymerase to perform the fill-in step was also performed and failed to produce a PCR amplification product.

Example 2: Pair-wise combination of library identifiers using microfluidics droplets

2.1 Oligonucleotide design

The oligonucleotides of Example 1 were used.

2.2 Combination of the library identifiers, pooling and amplification

The first library identifier P1-U1₁-A1 and the second library identifier P1-U1₂-A2 on the one hand, and P1-U2₁-A1 and P1-U2₂-A2 on the other hand were mixed as in Example 1 and kept on ice. Aqueous droplets in an immiscible oil carrier phase were generated simultaneously for each mix using a microfluidic drop maker and applying flow rates of 250 µL/h for both aqueous samples and 6000 µL/h for fluorinated oil supplemented with 0.5% DMP-PFPE surfactant (Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008). 500 µL of the emulsion containing the mix of the two kinds of droplets were collected on ice. Subsequently the emulsion was incubated at 37°C for 15 min to allow the fill-in reaction within the droplets before the enzyme was inactivated by incubating the emulsion for 20 min at 75°C. As the next step, 100 µL of 20 mM EDTA was added and the droplets were disrupted by the addition of 15% 1H,1H,2H,2H-perfluorooctanol. Subsequently the aqueous phase was collected and the samples were ethanol-precipitated. A PCR was performed on the resulting pooled samples as described in Example 1.

2.3 Identification

Ten clones were obtained and sequenced as described in Example 1. Two oligonucleotides had the sequence: 5' CAAGCAGAAGACGGCATACGAGATC-GAACCA-CACGAGGTCATT-AGGTCT-GATCTCGTATGCCGTATTCTGCTTG (SEQ ID NO: 8), which is exactly the sequence of P1-U1₁-A1-U1₂'-P', wherein the symbol "'" stands for a complementary sequence.

Eight oligonucleotides had the sequence: 5' CAAGCAGAAGACGGCATACGAGATC-ACGCTA-CACGAGGTCATT-GTCTCC-GATCTCGTATGCCGTATTCTGCTTG (SEQ ID NO: 9), which is exactly the sequence of P1-U2₁-A1-U2₂'-P', wherein the symbol "'" stands for a complementary sequence.

No artifactual oligonucleotides were obtained, neither spurious pairs (such as P1-U1₁-A1-U2₂'-P' or P1-U2₁-A1-U1₂'-P') nor any other combinations.

Controls:

A control experiment was performed using the same conditions described above, except for using droplets containing P1-U1₁-A1 and P1-U1₁-A1 for one combination and P1-U1₂-A2 and P1-U1₂-A2 for another combination. In those conditions the fill-in is theoretically not possible, and will only happen if there is a non-specific fusion of droplets during the experiment. The PCR amplification yielded no specific product detectable by gel electrophoresis.

Example 3: Screening combinatorial mixtures of biological and chemical factors for their effect on human cells, using partially complementary oligonucleotides for tracking the samples

The current invention is applicable to identify a combination of two cell-treatments (incubation of a cell with drugs and siRNAs) that together have a phenotypic effect on cells. In this example, the identifiers comprise single-stranded DNA and the samples are comprised in micro-fluidics droplets. The identification of combinations is achieved by parallel sequencing of modified (elongated) identifiers.

3.1 Library preparation

In a first step the two libraries, "library 1" containing 500 drugs and "library 2" containing 500 siRNAs, are prepared and unique identifiers are added to the components in a one-to-one fashion. The identifiers for library 1 each consist of a priming region P1 identical for all identifiers of library 1, a unique region U1 specific to each identifier of library 1 and an annealing region A1 identical for all identifiers of library 1. The sequence of P1 is AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. The identifiers for library 2 each consist of a priming region P2 identical for all identifiers of library 2, a unique region U2 specific to each identifier of library 2, and an annealing region A2 identical for all identifiers of library 2. The sequence of P2 is CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. A1 and A2 are the reverse complement of each other. The complementary sequences A1 and A2 are designed to have an annealing temperature lower than the one of the priming regions P1 and P2 and their respective primers. Also, an elongation mix (Klenow polymerase, Klenow polymerase buffer and dNTPs) is added to all members of one library, in this example library 1.

3.2 Sample combination

About 5,000 to 50,000 micro-fluidic droplets are generated for each library member according to Clausell-Tormos et al. (Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008). Then each droplet from library 1 is fused according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672) with a single, randomly picked, droplet from library 2. This random pairing enables rapid generation of all possible pair-wise combinations. Each merged droplet thus contains identifiers related to a drug and an siRNA. Subsequently, the fused droplets are incubated at a temperature allowing hybridization of the annealing regions A1 and A2 and an oligonucleotide extension by the encapsulated polymerase. Thereby, double-stranded oligonucleotides comprising the elongated identifiers are generated, each representing a specific drug/siRNA combination by their unique sequence pairing U1/U2.
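By way of illustration only, the annealing and fill-in step can be sketched in Python. P1 and P2 are the sequences given above; the annealing regions and the two 6-nucleotide unique regions are borrowed from Example 8 as assumptions, and the function names are hypothetical. The sketch models only the strand obtained by extending the library 1 identifier along the library 2 identifier.

```python
# P1 and P2 are taken from the text. A1, A2 and the 6-nt unique regions are
# ASSUMPTIONS borrowed from Example 8 for illustration.
P1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P2 = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT"
A1 = "CACGAGGTCATT"
A2 = "AATGACCTCGTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def elongated_identifier(u1: str, u2: str) -> str:
    """Strand obtained by extending identifier 1 along identifier 2 (5'->3')."""
    id1 = P1 + u1 + A1  # library 1 identifier
    id2 = P2 + u2 + A2  # library 2 identifier
    # The 3' ends pair through A1/A2, so they must be reverse complements.
    assert reverse_complement(A1) == A2
    assert id2.endswith(A2)
    # Fill-in copies the remainder of id2 (U2 then P2) as its reverse complement.
    return id1 + reverse_complement(P2 + u2)


product = elongated_identifier("CGTGAT", "AAGCTA")
print(product)
```

The resulting strand is flanked by P1 at its 5' end and by the reverse complement of P2 at its 3' end, which is why only elongated identifiers can be amplified with the two corresponding primers.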

3.3 Assay

The drug/siRNA combinations are assayed in a microfluidic device, such as microfluidic devices in poly(dimethylsiloxane) (PDMS) fabricated by soft lithography (Squires and Quake. Microfluidics: Fluid physics at the nanoliter scale. Reviews of Modern Physics, 2005, vol. 77) by merging the droplets described above, herein termed "combinatorial droplets", with droplets containing all assay components (e.g. chemicals, proteins, cells, see Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol. 15, pg 427, 2008) necessary to screen the effect of the combinatorial treatment. The effect of the library members is then assayed using a reporter assay in which fluorescence provides a positive read-out for the desirable effect of the combined treatment (e.g. the use of a cell with a reporter gene GFP tagged will give a read-out of the expression of a specific cellular pathway or the use of GFP tagged cells in constitutively expressed genes will give a read-out of the cellular proliferation). The droplets, which now contain both the assay components and the drug/siRNA combinations, are incubated for a time period sufficient for the response of the cell and the generation of the readout signal, i.e. change in fluorescence. The incubation can be done inside or outside the microfluidics device depending on the time and the conditions necessary to obtain the fluorescence read-out. The selection of the droplets is done based on the fluorescence intensity of the droplets (Baret et al., Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip (2009) vol. 9 (13) pp. 1850-1858), i.e. droplets with a fluorescence signal above a certain threshold are defined as positive samples. Positive samples are selected and then gathered in a common recipient.
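The threshold-based selection can be pictured with the following minimal Python sketch. It is not a model of the sorting hardware; the `Droplet` record, the arbitrary fluorescence units and the threshold value are all hypothetical and serve only to show the decision rule.

```python
from dataclasses import dataclass


@dataclass
class Droplet:
    droplet_id: int
    fluorescence: float  # arbitrary units (hypothetical)


def select_positives(droplets, threshold):
    """Keep droplets whose signal is strictly above the threshold."""
    return [d for d in droplets if d.fluorescence > threshold]


droplets = [Droplet(1, 0.8), Droplet(2, 3.1), Droplet(3, 2.9), Droplet(4, 5.4)]
positives = select_positives(droplets, threshold=3.0)
print([d.droplet_id for d in positives])
```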

3.4 Hit identification

The "hit identification" step analyses the identifiers of the pairwise drug/siRNA combinations that yield the positive readout signal. In a first step the pooled droplets are incubated at a high temperature to inactivate the polymerase activity, e.g. at 75°C for 20 minutes. Then, the droplets are disrupted by the addition of a destabilising agent, for example the emulsion destabilizer A104 (RainDance Technologies, as described in Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008), releasing the elongated identifiers of positive samples into the aqueous phase, together with identifiers that may not be elongated. In a further step, elongated identifiers, which are flanked by the priming sequences P1 and P2, are PCR-amplified using the corresponding primers. This PCR is performed at an annealing temperature higher than the annealing temperature of A1/A2 to prevent false positive combinations of identifiers due to the annealing regions A1 and A2. Finally, the amplicons undergo parallel sequencing, for example using the Illumina platform (Illumina, Inc.). The obtained sequences of the elongated identifiers reveal the drug/siRNA combinations that lead to the positive readout signal.

Example 4: Screening combinatorial mixtures of biological and chemical factors for their effect on human cells, using oligonucleotide barcodes immobilized on beads.

The current invention can also be used to identify a combination of three cell treatments (incubation of a cell with three drugs) that together have a phenotypic effect on cells. The number of combinations can rapidly become very large. For example, there are 161,700 different combinations of three drugs among a library of 100 drugs. In this example, a particular embodiment of the invention is described in which a three-way combination of all drugs of the same library is achieved. All samples are aqueous droplets of a water-in-oil emulsion, identifiers are single-stranded DNA oligonucleotides attachable to carrier beads and the identification is performed by sequential hybridization.
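The figure quoted above can be checked with a short calculation. The sketch below counts the unordered combinations of three different drugs among 100 and, as an additional illustration, the number of unordered triples when the same drug may appear more than once (the one-way and two-way combinations mentioned for this example). The variable names are hypothetical.

```python
from math import comb

n_drugs = 100

# Unordered triples of three different drugs.
distinct_triples = comb(n_drugs, 3)

# Unordered triples allowing repeats (multisets of size 3).
with_repeats = comb(n_drugs + 2, 3)

print(distinct_triples)                 # 161700
print(with_repeats - distinct_triples)  # 10000 one-way and two-way combinations
```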

4.1 Library preparation

The library of 100 drugs is considered to be three libraries (library 1, library 2 and library 3) which all contain the same 100 drugs. Unique oligonucleotide identifiers are added to the library components in a one-to-one fashion. These identifiers each comprise a unique nucleotide sequence that allows their unambiguous identification by sequential hybridization of fluorescent labeled oligonucleotides of known sequences (Gunderson et al., Decoding randomly ordered DNA arrays, Genome Research (2004) vol. 14 (5) pp. 870-7) and a biotin modification.

4.2 Sample combination

About 5,000 to 500,000 droplets are generated for each sample, according to Clausell-Tormos et al. (Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008). Then each droplet from library 1 is fused with a single, randomly picked, droplet from library 2 and a single, randomly picked droplet from library 3. Droplet fusion is obtained according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672). This random pairing enables rapid generation of all possible combinations of all library members. It also allows for all one-way and all two-way combinations (i.e. three times the same library member from each library or two times the same and one different library member). Each fused droplet thus contains three identifiers which may or may not be different.

4.3 Identifier immobilization

The droplets containing library members including identifiers are merged according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion, Lab Chip (2009) vol. 9 (18) pp. 2665-2672) with droplets containing beads (as described in Lim and Zhang, Bead-based microfluidic immunoassays: the next generation. Biosens Bioelectro, 2007, vol. 22 (7) pp. 1197-204) with a surface covered with streptavidin, to which the biotinylated identifiers can bind. The combination of identifiers linked to the beads uniquely identifies the particular drug combination of the droplet. Subsequently, the droplets are fused with droplets containing an excess of biotin, for example 10 to 100 times more concentrated than the library identifiers, for example 1 mM free biotin, to saturate the carrier beads, thus avoiding any possible artifactual linking of identifiers to the beads after droplet disruption.

4.4 Assay

The drug combinations are assayed on a microfluidic device by merging the droplets described above, herein termed "combinatorial droplets", with droplets containing all assay components (e.g. chemicals, proteins, cells, see Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008) necessary to screen the effect of the combinatorial treatment. The assay is a cell-based reporter assay in which fluorescence provides a positive read-out for the desirable effect of the combined treatment. The droplets, which now contain both the assay components and the drug combinations, are incubated for a time period sufficient for the response of the cell and the generation of the readout signal, i.e. change in fluorescence. Droplets are then sorted according to the readout signal. Droplets with a fluorescent signal above a certain threshold are defined as positive samples and sorted based on the fluorescence intensity of the droplets (Baret et al., Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip (2009) vol. 9 (13) pp. 1850-1858). Positive samples are gathered in a common recipient.

4.5 Hit identification

The "hit identification" step analyses the identifiers of the drug combinations that yield the positive readout signal. The droplets are disrupted by the addition of a destabilizing agent, for example the emulsion destabilizer A104 (RainDance Technologies, as described in Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008), releasing the beads with the immobilized identifiers of all positive sample combinations into the aqueous phase, together with identifiers that may not have combined with compatible identifiers of the same sample. Because the beads' surface has been saturated with biotin before droplet disruption, artifactual linking with possible free identifiers is avoided at this stage. The beads containing the immobilized identifiers of the positive samples are distributed in a bead-array platform (Fan et al., Illumina universal bead arrays. Meth Enzymol (2006) vol. 410 pp. 57-73). The identifiers bound to each bead are identified by sequential hybridization of fluorescent labeled oligonucleotides of known sequence (Gunderson et al., Decoding randomly ordered DNA arrays. Genome Research (2004) vol. 14 (5) pp. 870-7).

Example 5: Screening combinatorial mixtures of biological and chemical factors for their effect on human cells, using optical barcodes immobilized on beads.

The current invention can also be used to identify a combination of three cell treatments (incubation of a cell with three drugs) that together have a phenotypic effect on cells. The number of combinations can rapidly become very large. For example, there are 161,700 different combinations of three drugs among a library of 100 drugs. In this example a particular embodiment of the invention is described in which a three-way combination of all drugs of the same library is achieved. All samples are aqueous droplets of a water-in-oil emulsion, identifiers are quantum dots (Han et al., Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules. Nature Biotechnology (2001) vol. 19 (7) pp. 631-5) attached to beads and the identification is performed using imaging techniques.

5.1 Library preparation

The library of 100 drugs is considered to be three libraries (library 1, library 2 and library 3) which all contain the same 100 drugs. Unique identifiers are added to the library components in a one-to-one fashion. These identifiers are distinct biotinylated quantum dot identifiers (commercially available from Quantum Dot Corporation), each of which has a unique emission spectrum, allowing their unambiguous identification by imaging.

5.2 Sample combination

About 5,000 to 500,000 droplets are generated for each library member. Then each droplet from library 1 is fused with a single, randomly picked, droplet from library 2 and a single, randomly picked droplet from library 3. Droplet fusion is obtained according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672). This random pairing enables rapid generation of all possible combinations of all library members. It also allows for all one-way and all two-way combinations (i.e. three times the same library member from each library or two times the same and one different library member). Each fused droplet thus contains three identifiers which may or may not be different.

5.3 Identifier immobilization

The droplets containing library components and identifiers are merged with droplets containing beads (as described in Lim and Zhang, Bead-based microfluidic immunoassays: the next generation. Biosens Bioelectro, 2007, vol. 22 (7) pp. 1197-204) with a surface covered with streptavidin, to which the biotinylated quantum dots can bind. The combination of quantum dots linked to the beads uniquely identifies the particular drug combination of the droplet. Subsequently, the droplets are fused with droplets containing an excess of biotin, for example 10 to 100 times more concentrated than the library identifiers, for example 1 mM free biotin, to saturate the carrier beads, thus avoiding any possible artifactual addition of identifiers to the beads after droplet disruption.

5.4 Assay

The drug and/or siRNA combinations are assayed in a microfluidic device by merging the droplets described above, herein termed "combinatorial droplets", with droplets containing all assay components (e.g. chemicals, proteins, cells, see Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008) necessary to screen the effect of the combinatorial treatment. The assay is a cell-based reporter assay in which fluorescence provides a positive read-out for the desirable effect of the combined treatment (e.g. the use of a cell with a reporter gene GFP tagged will give a read-out of the expression of a specific cellular pathway or the use of GFP tagged cells in constitutively expressed genes will give a read-out of the cellular proliferation). The cell-based assay uses a fluorophore, for example GFP, Cy3 or Cy5. The quantum dot identifiers and the fluorophore are chosen so that they emit at different wavelengths. The droplets, which now contain both the assay components and the drug combinations, are incubated for a time period sufficient for the response of the cell and the generation of the readout signal, i.e. change in fluorescence. Droplets are then selected according to the readout signal. Droplets with a fluorescent signal above a certain threshold are defined as positive samples and selected based on the fluorescence intensity of the droplets (Baret et al., Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip (2009) vol. 9 (13) pp. 1850-1858). Positive samples are gathered in a common recipient.

5.5 Hit identification

The "hit identification" step analyses the identifiers of the drug combinations that yield the positive readout signal. The droplets are disrupted by the addition of a destabilising agent, for example the emulsion destabilizer A104 (RainDance Technologies, as described in Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008), releasing the beads with the immobilized quantum dot identifiers of all positive samples into the aqueous phase, together with quantum dot identifiers that may not be linked to a bead. Because the beads' surface has been saturated with biotin before droplet disruption, artifactual linking with possible free identifiers is avoided at this stage. The beads containing the immobilized mixture identifiers are subsequently analysed to identify the optical identifiers. This can be done by fluorescence spectroscopy as described in Han et al. (Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules. Nature Biotechnology (2001) vol. 19 (7) pp. 631-5), or using techniques such as flow cytometry (Wilson et al., Encoded microcarriers for high-throughput multiplexed detection. Angew Chem Int Ed Engl (2006) vol. 45 (37) pp. 6104-17).

Example 6: Performing combinatorial synthetic chemistry on-chip and screening the reaction products for a desired property.

The current invention is applicable to perform combinatorial synthetic chemistry and to screen for a desired effect of the synthesised combinatorial compounds. In this example, as in example 3, the identifiers comprise single-stranded DNA and the samples are comprised in micro-fluidics droplets. The identification of combinations is achieved by parallel sequencing of modified (elongated) identifiers. The samples are then combined, assayed and identified as described in Example 3, the only difference being that by mixing the samples in a combinatorial fashion new chemical molecules are generated.

6.1 Library preparation

Two libraries are prepared in which all samples of library 1 contain chemical entities with a non-uniform functional group A and all samples of library 2 contain chemical entities with a non-uniform functional group B. The functional groups A and B can undergo a chemical reaction resulting in the generation of a new chemical group. For example, the functional group A is an alkyne and the functional group B is an azide (see also "Click chemistry, a powerful tool for pharmaceutical sciences." Hein CD et al., Pharm Res. 25(10):2216-2230, 2008). Moreover, unique identifiers are added to the components of each library in a one-to-one fashion. The identifiers for library 1 each consist of a priming region P1 identical for all identifiers of library 1, a unique region U1 specific to each identifier of library 1 and an annealing region A1 identical for all identifiers of library 1. The sequence of P1 is AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. The identifiers for library 2 each consist of a priming region P2 identical for all identifiers of library 2, a unique region U2 specific to each identifier of library 2, and an annealing region A2 identical for all identifiers of library 2. The sequence of P2 is CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. A1 and A2 are the reverse complement of each other. The complementary sequences A1 and A2 are designed to have an annealing temperature lower than the one of the priming regions P1 and P2 and their respective primers. Also, an elongation mix (Klenow polymerase, Klenow polymerase buffer and dNTPs) is added to all members of one library, in this example library 1.

6.2 Sample combination

Micro-fluidics droplets are generated for each library member according to Clausell-Tormos et al. (Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008). The number of droplets for each member is 10 to 100 times the number of different members of the other library, so that every possible combination will likely occur, possibly in replicates. Then each droplet from library 1 is fused according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672) with a single, randomly picked droplet from library 2. This random pairing enables rapid generation of all possible pair-wise combinations. Each merged droplet thus contains an identifier combination according to one chemical entity of library 1 and to one chemical entity of library 2. Subsequently, the fused droplets are incubated at a temperature allowing hybridization of the annealing regions A1 and A2 and an oligonucleotide extension by the encapsulated polymerase. Thereby, double-stranded oligonucleotides comprising the elongated identifiers are generated, each representing a specific chemical entity combination by their unique sequence pairing U1/U2.
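The statement that a 10- to 100-fold droplet excess makes every combination likely to occur can be illustrated with a simple probability sketch. It assumes that each droplet of library 1 is paired with a library 2 member chosen uniformly and independently at random, which is an idealisation, and it uses a hypothetical library size of 500 members per library, since no size is stated for this example. The function names are likewise hypothetical.

```python
def prob_pair_missing(n_other: int, fold: int) -> float:
    """Probability that one given pair never forms.

    A library 1 member has fold * n_other droplets; each droplet meets a
    given library 2 member with probability 1 / n_other (ASSUMED uniform
    and independent pairing).
    """
    droplets = fold * n_other
    return (1.0 - 1.0 / n_other) ** droplets


def expected_missing_pairs(n1: int, n2: int, fold: int) -> float:
    """Expected number of the n1 * n2 pairs that never form."""
    return n1 * n2 * prob_pair_missing(n2, fold)


# Hypothetical sizes: 500 members per library.
for fold in (10, 100):
    p = prob_pair_missing(500, fold)
    print(fold, p, expected_missing_pairs(500, 500, fold))
```

Under these assumptions the probability of missing a given pair is close to exp(-fold), so a 10-fold excess still leaves a handful of the 250,000 pairs unsampled on average, whereas a 100-fold excess makes a missing pair extremely unlikely.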

At the same time, the chemical entity of library 1 links to the chemical entity of library 2, forming a compound. This can be achieved using click chemistry as described in Hein et al. (Click chemistry, a powerful tool for pharmaceutical sciences. Pharmaceutical research (2008) vol. 25 (10) pp. 2216-30).

6.3 Assay

The compounds that were combinatorially synthesised as described above are assayed in a microfluidic device, such as microfluidic devices in poly(dimethylsiloxane) (PDMS) fabricated by soft lithography (Squires and Quake. Microfluidics: Fluid physics at the nanoliter scale. Reviews of Modern Physics, 2005, vol. 77), by merging the droplets described above, herein termed "combinatorial droplets", with droplets containing all assay components (e.g. chemicals, proteins, cells, see Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008) necessary to screen an effect of the new compound. The effect of the compounds is then assayed using a reporter assay in which, for example, fluorescence provides a positive read-out for the desirable effect of the new treatment (e.g. the use of an enzymatic activity assay with a fluorescent read-out, the use of a cell with a reporter gene GFP tagged will give a read-out of the expression of a specific cellular pathway or the use of GFP tagged cells in constitutively expressed genes will give a read-out of the cellular proliferation). The droplets, which now contain both the assay components and the compounds, are incubated for a time period sufficient for the generation of the readout signal, i.e. change in fluorescence. The incubation can be done inside or outside the microfluidics device depending on the time and the conditions necessary to obtain the read-out. The selection of the droplets is done based on the fluorescence intensity of the droplets (Baret et al., Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip (2009) vol. 9 (13) pp. 1850-1858), i.e. droplets with a fluorescence signal above a certain threshold are defined as positive samples. Positive samples are selected and then gathered in a common recipient.

6.4 Hit identification

The "hit identification" step analyses the identifiers of the combinatorially synthesised compounds that yield the positive readout signal. In a first step the pooled droplets are incubated at a high temperature to inactivate the polymerase activity, e.g. at 75°C for 20 minutes. Then, the droplets are disrupted by the addition of a destabilising agent, for example the emulsion destabilizer A104 (RainDance Technologies, as described in Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008), releasing the elongated identifiers of positive samples into the aqueous phase, together with identifiers that may not be elongated. In a further step, elongated identifiers, which are flanked by the priming sequences P1 and P2, are PCR-amplified using the corresponding primers. This PCR is performed at an annealing temperature higher than the annealing temperature of A1/A2 to prevent false positive combinations of identifiers due to the annealing regions A1 and A2. Finally, the amplicons undergo parallel sequencing, for example using the Illumina platform (Illumina, Inc.). The obtained sequences of the elongated identifiers reveal combinatorially synthesised compounds that lead to the positive readout signal.
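As a hedged illustration of how a sequenced elongated identifier might be mapped back to a pair of library members, the following Python sketch parses a full-length strand of the form P1-U1-A1-rc(U2)-rc(P2). The annealing region and the 6-nucleotide unique regions are borrowed from Example 8 as assumptions, the lookup table is a hypothetical subset, and real reads would additionally require handling of sequencing errors and of the portion of the amplicon actually covered by the read.

```python
P1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P2 = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT"
A1 = "CACGAGGTCATT"  # ASSUMPTION: annealing region taken from Example 8
U_LEN = 6            # ASSUMPTION: unique regions are 6 nt long

# Hypothetical lookup: unique region -> library member index.
UNIQUE = {"CGTGAT": 1, "AAGCTA": 2, "GTAGCC": 3}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def decode(strand: str):
    """Return (library 1 index, library 2 index), or None if the strand
    does not match the expected P1-U1-A1-rc(U2)-rc(P2) layout."""
    expected_len = len(P1) + U_LEN + len(A1) + U_LEN + len(P2)
    if len(strand) != expected_len:
        return None
    if not strand.startswith(P1) or not strand.endswith(reverse_complement(P2)):
        return None
    i = len(P1)
    u1 = strand[i:i + U_LEN]
    j = i + U_LEN + len(A1)
    if strand[i + U_LEN:j] != A1:
        return None
    u2 = reverse_complement(strand[j:j + U_LEN])
    if u1 not in UNIQUE or u2 not in UNIQUE:
        return None
    return UNIQUE[u1], UNIQUE[u2]


# Build a synthetic strand for members 2 (library 1) and 3 (library 2).
read = P1 + "AAGCTA" + A1 + reverse_complement("GTAGCC") + reverse_complement(P2)
print(decode(read))
```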

Example 7: Screening combinatorial mixtures of drugs for their effect on human cells, using template-dependent DNA polymerization immobilized on beads.

This example follows example 4, where combinations of drugs are assayed, but uses template-dependent DNA polymerization immobilized on beads for combinatorial sample tracking (see Fig. 12) and is described for two-way drug combinations.

7.1 Library preparation

The library of 100 drugs is considered to be two libraries (library 1 and library 2) which both contain the same 100 drugs. Unique oligonucleotide identifiers are added to the library components in a one-to-one fashion. The library identifiers comprise single-stranded DNA with a unique region and a common annealing region A2 that can anneal to A1. The unique nucleotide sequence of the identifiers allows their unambiguous identification by sequential hybridization of fluorescent labeled oligonucleotides of known sequences (Gunderson et al., Decoding randomly ordered DNA arrays, Genome Research (2004) vol. 14 (5) pp. 870-7) and a biotin modification.

7.2 Sample combination

About 500 to 50,000 droplets are generated for each sample, according to Clausell-Tormos et al. (Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008). Then each droplet from library 1 is fused with a single, randomly picked, droplet from library 2. Droplet fusion is obtained according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion. Lab Chip (2009) vol. 9 (18) pp. 2665-2672). This random pairing enables rapid generation of all possible combinations of all library members. It also allows for all one-way combinations (i.e. two times the same library member from each library). Each fused droplet thus contains two identifiers which may or may not be different.

The droplets containing library members including identifiers are merged according to Mazutis et al. (A fast and efficient microfluidic system for highly selective one-to-one droplet fusion, Lab Chip (2009) vol. 9 (18) pp. 2665-2672) with droplets containing beads (as described in Lim and Zhang, Bead-based microfluidic immunoassays: the next generation. Biosens Bioelectro, 2007, vol. 22 (7) pp. 1197-204). The beads are covered by single-stranded DNA ending with a common annealing region A1. Subsequently, the fused droplets are incubated at a temperature allowing hybridization of the annealing regions A1 and A2 and an oligonucleotide extension by the encapsulated polymerase. Thereby, each single-stranded oligonucleotide immobilized on beads will be elongated using as template one of the identifiers present in the droplet (Fig. 12). Because every bead is covered by thousands of oligonucleotides, each bead will carry multiple reverse complement copies of each library identifier. The combination of reverse complement copies of identifiers carried by the beads uniquely identifies the particular drug combination of the droplet.

7.3 Assay

The drug combinations are assayed on a microfluidic device by merging the droplets described above, herein termed "combinatorial droplets", with droplets containing all assay components (e.g. chemicals, proteins, cells, see Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008) necessary to screen the effect of the drug combination. The assay is, for example, a cell-based reporter assay in which fluorescence provides a positive read-out for the desirable effect of the drug combination. The droplets, which now contain both the assay components and the drug combinations, are incubated for a time period sufficient for the response of the cell and the generation of the readout signal, i.e. change in fluorescence. Droplets are then sorted according to the readout signal. Droplets with a fluorescent signal above a certain threshold are defined as positive samples and sorted based on the fluorescence intensity of the droplets (Baret et al., Fluorescence-activated droplet sorting (FADS): efficient microfluidic cell sorting based on enzymatic activity. Lab Chip (2009) vol. 9 (13) pp. 1850-1858). Positive samples are gathered in a common recipient.

7.4 Hit identification

The "hit identification" step analyses the identifiers of the drug combinations that yield the positive readout signal. First, the DNA polymerase is inactivated, thereby preventing further elongations of the oligonucleotides immobilized on beads. The droplets are disrupted by the addition of a destabilizing agent, for example the emulsion destabilizer A104 (RainDance Technologies, as described in Clausell-Tormos et al., Droplet-based microfluidic platforms for the encapsulation and screening of mammalian cells and multicellular organisms, Chem Biol, vol 15, pg 427, 2008), releasing the beads with the elongated immobilized oligonucleotides of all positive sample combinations into the aqueous phase, together with possible free identifiers. Because the DNA polymerase has been inactivated before droplet disruption, artifactual elongation of immobilized oligonucleotides using as template possible free identifiers is avoided at this stage. Optionally, the beads are washed in a denaturing condition to wash out non-immobilized DNA. The beads containing the immobilized identifiers of the positive samples are distributed in a bead-array platform (Fan et al., Illumina universal bead arrays. Meth Enzymol (2006) vol. 410 pp. 57-73). For each bead, the reverse complement sequences of the identifiers it carries are identified by sequential hybridization of fluorescent labeled oligonucleotides of known sequence (Gunderson et al., Decoding randomly ordered DNA arrays. Genome Research (2004) vol. 14 (5) pp. 870-7).

Example 8: Pair-wise combination of library identifiers, fusion by elongation and identification by parallel sequencing

The experimental design of this Example as well as the results thereof are schematically illustrated in Fig. 14.

8.1 Oligonucleotide design

A group of ten first terminal library identifiers and a group of ten second terminal library identifiers were designed, each comprising a priming region, a unique region and an annealing region.

The identifiers for library 1 each consist of a priming region P1 identical for all identifiers of library 1, a unique region U1 specific to each identifier of library 1 and an annealing region A1 identical for all identifiers of library 1. The sequence of P1 is AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. The identifiers for library 2 each consist of a priming region P2 identical for all identifiers of library 2, a unique region U2 specific to each identifier of library 2, and an annealing region A2 identical for all identifiers of library 2. The sequence of P2 is CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. These priming regions have a calculated melting temperature of 86.1°C for P1 and 76.5°C for P2 using the salt adjusted method (Kibbe. OligoCalc: an online oligonucleotide properties calculator, Nucleic Acids Res (2007) vol. 35 (Web Server issue) pp. W43-6).

A1 and A2 are the reverse complement of each other. The complementary sequences A1 and A2 are designed to have an annealing temperature lower than the one of the priming regions P1 and P2 and their respective primers.

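The sequences of the annealing region A1 of the first terminal identifiers and of the annealing region A2 of the second terminal identifiers were:

A1 = 5' CACGAGGTCATT 3' (SEQ ID NO: 2)

A2 = 5' AATGACCTCGTG 3' (SEQ ID NO: 3)

The annealing regions have a calculated melting temperature of 36°C.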
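The relationship between the two annealing regions, and the 36°C figure, can be checked with a short Python sketch. The melting temperature is estimated here with the basic rule of 2°C per A/T and 4°C per G/C, a rough approximation for short oligonucleotides that is used as a stand-in and is not necessarily the calculation performed by the authors. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def basic_tm(seq: str) -> int:
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C.

    ASSUMPTION: this simple rule is only a stand-in; it is meaningful
    for short oligonucleotides only.
    """
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


A1 = "CACGAGGTCATT"
A2 = "AATGACCTCGTG"

print(reverse_complement(A1) == A2)  # True
print(basic_tm(A1))                  # 36
```

The estimate is well below the 86.1°C and 76.5°C quoted for P1 and P2, consistent with the design requirement that A1/A2 anneal at a lower temperature than the priming regions.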

The unique regions of the library identifiers were as follows:

- U1₁ = U1₂ = CGTGAT

- U2₁ = U2₂ = AAGCTA

- U3₁ = U3₂ = GTAGCC

- U4₁ = U4₂ = TACAAG

- U5₁ = U5₂ = ACATCG

- U6₁ = U6₂ = GCCTAA

- U7₁ = U7₂ = TGGTCA

- U8₁ = U8₂ = ATTGGC

- U9₁ = U9₂ = GATCTG

- U10₁ = U10₂ = CCTCCC

Thus, the sequences of the library identifiers were as follows:

First library identifiers:

P1-U1₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-CGTGAT-CACGAGGTCATT (SEQ ID NO: 10)

P1-U2₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-AAGCTA-CACGAGGTCATT (SEQ ID NO: 11)

P1-U3₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-GTAGCC-CACGAGGTCATT (SEQ ID NO: 12)

P1-U4₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-TACAAG-CACGAGGTCATT (SEQ ID NO: 13)

P1-U5₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-ACATCG-CACGAGGTCATT (SEQ ID NO: 14)

P1-U6₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-GCCTAA-CACGAGGTCATT (SEQ ID NO: 15)

P1-U7₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-TGGTCA-CACGAGGTCATT (SEQ ID NO: 16)

P1-U8₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-ATTGGC-CACGAGGTCATT (SEQ ID NO: 17)

P1-U9₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-GATCTG-CACGAGGTCATT (SEQ ID NO: 18)

P1-U10₁-A1: 5' AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-CCTCCC-CACGAGGTCATT (SEQ ID NO: 19)

Second library identifiers:

P2-U1₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-CGTGAT-AATGACCTCGTG (SEQ ID NO: 20)

P2-U2₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-AAGCTA-AATGACCTCGTG (SEQ ID NO: 21)

P2-U3₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-GTAGCC-AATGACCTCGTG (SEQ ID NO: 22)

P2-U4₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-TACAAG-AATGACCTCGTG (SEQ ID NO: 23)

P2-U5₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-ACATCG-AATGACCTCGTG (SEQ ID NO: 24)

P2-U6₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-GCCTAA-AATGACCTCGTG (SEQ ID NO: 25)

P2-U7₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-TGGTCA-AATGACCTCGTG (SEQ ID NO: 26)

P2-U8₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-ATTGGC-AATGACCTCGTG (SEQ ID NO: 27)

P2-U9₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-GATCTG-AATGACCTCGTG (SEQ ID NO: 28)

P2-U10₂-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-CCTCCC-AATGACCTCGTG (SEQ ID NO: 29)

8.2 Sample combination

The first five library identifiers from the group of first terminal library identifiers (P1-U1₁-A1, P1-U2₁-A1, P1-U3₁-A1, P1-U4₁-A1 and P1-U5₁-A1) were mixed in a pair-wise manner with the first five library identifiers from the group of second terminal library identifiers (P2-U1₂-A2, P2-U2₂-A2, P2-U3₂-A2, P2-U4₂-A2 and P2-U5₂-A2). Likewise, the last five library identifiers from the group of first terminal library identifiers (P1-U6₁-A1, P1-U7₁-A1, P1-U8₁-A1, P1-U9₁-A1 and P1-U10₁-A1) were mixed in a pair-wise manner with the last five library identifiers from the group of second terminal library identifiers (P2-U6₂-A2, P2-U7₂-A2, P2-U8₂-A2, P2-U9₂-A2 and P2-U10₂-A2). The mixes were prepared in standard 96-well plates. The final concentration of each library identifier oligonucleotide in the reaction volume was 50 nM. Identifiers were combined and strands were extended by fill-in using the following reaction mix and conditions:

- 10 μL identifier mix

- 0.2 μL dNTPs 10 mM

- 2 μL NEBuffer 2 10x

- 0.1 μL Klenow (5 U/μL, from NEB)

- 7.7 μL H₂O

The library identifiers were combined at an annealing temperature of 37°C for 15 min, followed by enzyme inactivation for 20 min at 75°C.
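The combination scheme above can be sketched in Python as follows. This is a minimal illustration, not the procedure used by the inventors: it enumerates the pair-wise mixes within each block of five identifiers and builds the strand that would be expected after annealing of A1 to A2 and fill-in, under the assumption that the product reads P1-U-A1 followed by the reverse complement of the second identifier's unique and priming regions. The names `fused_top_strand` and `expected_pairs` are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]


P1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P2 = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT"
A1 = "CACGAGGTCATT"
A2 = "AATGACCTCGTG"
# Unique regions U1..U10, shared by both libraries in this Example.
UNIQUE = ["CGTGAT", "AAGCTA", "GTAGCC", "TACAAG", "ACATCG",
          "GCCTAA", "TGGTCA", "ATTGGC", "GATCTG", "CCTCCC"]


def fused_top_strand(i: int, j: int) -> str:
    """Assumed top strand after annealing and fill-in of identifiers i and j (1-based)."""
    first = P1 + UNIQUE[i - 1] + A1
    second = P2 + UNIQUE[j - 1] + A2
    # The reverse complement of `second` starts with A1, which is already
    # present at the end of `first`, so that overlap is written only once.
    return first + reverse_complement(second)[len(A2):]


FIRST_BLOCK = range(1, 6)    # identifiers 1 to 5
SECOND_BLOCK = range(6, 11)  # identifiers 6 to 10
expected_pairs = (list(product(FIRST_BLOCK, FIRST_BLOCK))
                  + list(product(SECOND_BLOCK, SECOND_BLOCK)))

print(len(expected_pairs))     # number of pair-wise mixes set up
print(fused_top_strand(1, 2))  # assumed product of P1-U1-A1 with P2-U2-A2
```

Any read pairing an identifier from one block with an identifier from the other block would not correspond to a mix that was actually prepared.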

8.3 Pooling and amplification

Volumes of 0.8 μL of each combination were pooled in a standard test tube and 1 μL of Exonuclease I from E. coli (20 U/μL, from NEB) was added to the 40 μL of pooled mix. The sample was incubated for 1 hour at 37°C to degrade the non-fused library identifiers, followed by enzyme inactivation for 15 min at 75°C.

A PCR was performed on 3 μL of the pooled library identifiers using the primers P1 and P2. Specifically, the following were mixed:

- 3 μL of the Exonuclease I treated pooled library

- 10 μL 5x Phusion HF Buffer (Finnzymes)

- 4 μL P1 oligonucleotide (10 μM)

- 4 μL P2 oligonucleotide (10 μM)

- 1 μL dNTPs (10 mM)

- 27 μL H₂O

- 1 μL Phusion® High-Fidelity DNA Polymerase (2 U/μL, from Finnzymes)
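The volumes listed above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary keys are descriptive labels chosen here, and `final_concentration` is a hypothetical helper applying the usual dilution relation (stock concentration times added volume divided by total volume).

```python
# Volumes in microlitres, as listed for the PCR mix of this Example.
pcr_mix_ul = {
    "exonuclease-treated pool": 3.0,
    "5x Phusion HF Buffer": 10.0,
    "P1 oligonucleotide (10 uM)": 4.0,
    "P2 oligonucleotide (10 uM)": 4.0,
    "dNTPs (10 mM)": 1.0,
    "water": 27.0,
    "Phusion polymerase (2 U/uL)": 1.0,
}

total_ul = sum(pcr_mix_ul.values())


def final_concentration(stock: float, volume_ul: float, total: float) -> float:
    """Concentration after dilution, in the same unit as `stock`."""
    return stock * volume_ul / total


primer_um = final_concentration(10.0, 4.0, total_ul)  # each primer, in uM
dntp_mm = final_concentration(10.0, 1.0, total_ul)    # dNTP stock, in mM
buffer_x = final_concentration(5.0, 10.0, total_ul)   # buffer strength, in x

print(total_ul, primer_um, dntp_mm, buffer_x)
```

The components add up to a 50 μL reaction in which the 5x buffer is diluted to 1x.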

The following cycles were performed in a DNA Engine Tetrad 2 PCR machine (Bio-Rad):

30 s 98°C

15x: 10 s 98°C, 20 s 70°C, 10 s 72°C

5 min 72°C.

The PCR product was purified using Agencourt AMPure XP beads and the quality was checked by agarose gel migration and spectrophotometry.

8.4 Identification

The PCR amplification products were used for Illumina GAIIx sequencing. After filtering for good-quality reads, 25,779,678 sequences were assigned to one of the 100 possible combinations. Of these, 25,587,713 reads (99.255%) correspond to the expected sequences produced during the sample combination step, while 191,965 reads (0.745%) correspond to non-expected sequences (false positive combinations). Among the expected combinations, 13,226,802 reads (51.307%) correspond to the fusion of the first five library identifiers from the group of first terminal library identifiers (P1-U1₁-A1, P1-U2₁-A1, P1-U3₁-A1, P1-U4₁-A1 and P1-U5₁-A1) with the first five library identifiers from the group of second terminal library identifiers (P2-U1₂-A2, P2-U2₂-A2, P2-U3₂-A2, P2-U4₂-A2 and P2-U5₂-A2), and 12,360,911 reads (47.948%) correspond to the fusion of the last five library identifiers from the group of first terminal library identifiers (P1-U6₁-A1, P1-U7₁-A1, P1-U8₁-A1, P1-U9₁-A1 and P1-U10₁-A1) with the last five library identifiers from the group of second terminal library identifiers (P2-U6₂-A2, P2-U7₂-A2, P2-U8₂-A2, P2-U9₂-A2 and P2-U10₂-A2). Among the false positive combinations, 103,231 reads (0.4%) correspond to the fusion of the first five library identifiers from the group of first terminal library identifiers (P1-U1₁-A1, P1-U2₁-A1, P1-U3₁-A1, P1-U4₁-A1 and P1-U5₁-A1) with the last five library identifiers from the group of second terminal library identifiers (P2-U6₂-A2, P2-U7₂-A2, P2-U8₂-A2, P2-U9₂-A2 and P2-U10₂-A2), and 88,734 reads (0.344%) correspond to the fusion of the last five library identifiers from the group of first terminal library identifiers (P1-U6₁-A1, P1-U7₁-A1, P1-U8₁-A1, P1-U9₁-A1 and P1-U10₁-A1) with the first five library identifiers from the group of second terminal library identifiers (P2-U1₂-A2, P2-U2₂-A2, P2-U3₂-A2, P2-U4₂-A2 and P2-U5₂-A2). Together, the precision is 99.255%, with precision defined as the proportion of true positives among all obtained results (both true positives and false positives).
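The precision figure can be recomputed from the four read counts reported above. The sketch below is a plain arithmetic check; the block labels used as dictionary keys are chosen here for illustration only.

```python
# Read counts reported for Example 8, keyed by
# (block of first terminal identifier, block of second terminal identifier).
counts = {
    ("first five", "first five"): 13_226_802,
    ("last five", "last five"): 12_360_911,
    ("first five", "last five"): 103_231,
    ("last five", "first five"): 88_734,
}

# A combination is expected only when both identifiers come from the same block.
true_positives = sum(n for (a, b), n in counts.items() if a == b)
false_positives = sum(n for (a, b), n in counts.items() if a != b)

# Precision = true positives / (true positives + false positives).
precision = true_positives / (true_positives + false_positives)

print(true_positives, false_positives)
print(f"{precision:.5%}")
```

The two sums reproduce the 25,587,713 expected and 191,965 non-expected reads, which together give the 25,779,678 assigned sequences.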

Example 9: Pair-wise combination of library identifiers, fusion by ligation and identification by parallel sequencing

9.1 Oligonucleotide design

A group of ten first terminal library identifiers, a bridging oligonucleotide, a group of ten internal library identifiers and a second terminal library identifier were designed.

The identifiers for the first terminal library each consist of a priming region P1 identical for all identifiers of the first terminal library, a unique region U1 specific to each identifier of the first terminal library and an annealing region A1 identical for all identifiers of the first terminal library. The sequence of P1 is AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. The sequence of the annealing region A1 of the first terminal identifiers was CTGACAT and the unique regions of the first terminal identifiers were as follows:

- U1₁ = GATATCAA

- U2₁ = ACTCCACT

- U3₁ = AAGTACCT

- U4₁ = CACTCACC

- U5₁ = GTACACAT

- U6₁ = CATCGTCC

- U7₁ = GTCTGTCT

- U8₁ = ACATCCTA

- U9₁ = CTATATCT

- U10₁ = ATCCTTCA

The bridging oligonucleotide consists of two annealing regions, A_bridge2 and A_bridge1. The sequence of A_bridge1 is ATGTCAG and the sequence of A_bridge2 is TCTAGGT.

The identifiers of the internal library each consist of an annealing region A_internal1 identical for all identifiers of the internal library, a unique region U_internal specific to each identifier of the internal library, and an annealing region A_internal2 identical for all identifiers of the internal library. The sequence of A_internal1 is ACCTAGA and the sequence of A_internal2 is TTTTTTTTTTTTTTTTVN, where V = (A, C or G) and N = (A, C, G or T). The unique regions of the internal library identifiers were as follows:

- U_internal1 = GTCATGTA

- U_internal2 = TCACCATG

- U_internal3 = CTAGTTCA

- U_internal4 = CACGATCG

- U_internal5 = CGACACCA

- U_internal6 = CTCACTTG

- U_internal7 = GATCACTG

- U_internal8 = TGATACTA

- U_internal9 = TGTCTACT

- U_internal10 = TGACTGTC
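The degenerate annealing region TTTTTTTTTTTTTTTTVN described above stands for a small family of concrete sequences. As a minimal sketch, the following Python code expands the V and N positions according to the definitions given in the text (V = A, C or G; N = A, C, G or T); the function name `expand` and the lookup table name are hypothetical.

```python
from itertools import product

# Degenerate base codes as defined in the text: V = A, C or G; N = A, C, G or T.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def expand(seq: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(DEGENERATE[b] for b in seq))]


A_INTERNAL2 = "T" * 16 + "VN"  # TTTTTTTTTTTTTTTTVN
variants = expand(A_INTERNAL2)

print(len(variants))  # 3 choices for V times 4 choices for N
for v in variants:
    print(v)
```

Because V excludes T, the base directly following the run of sixteen T residues is never itself a T in any variant.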

The second terminal library identifier consists of a priming region P2 and an annealing region A2. As it is formed by only one oligonucleotide, it does not contain a unique region. The sequence of the priming region P2 is CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT, which is compatible with Illumina parallel sequencing. The sequence of the annealing region A2 of the second terminal identifier was AAAAAAAAAAAAAAAAAA.

A1 is reverse complementary to A_bridge1 and A_internal1 is reverse complementary to A_bridge2. Both have a melting temperature of 20°C using the salt adjusted method (Kibbe. OligoCalc: an online oligonucleotide properties calculator, Nucleic Acids Res (2007) vol. 35 (Web Server issue) pp. W43-6). To allow the enzymatic ligation of the first terminal library identifiers with the internal library identifiers mediated by the bridging oligonucleotide, the 5' ends of the internal library identifiers are phosphorylated. The complementary sequences A1 - A_bridge1 and A_internal1 - A_bridge2 are designed to have an annealing temperature lower than that of the priming regions P1 and P2 and their respective primers. The melting temperature of P1 is 86.1°C and that of P2 is 76.5°C using the salt adjusted method.
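The complementarity relations stated above can be verified with a short Python sketch. It additionally checks that the bridging oligonucleotide, written 5' A_bridge2-A_bridge1 3', is the reverse complement of the junction formed when A1 is placed directly upstream of A_internal1, which is the arrangement a bridge-mediated ligation would need. The helper names are hypothetical, and the 2°C/4°C rule used for the melting temperature is an assumption standing in for the calculator cited in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]


def basic_tm(seq: str) -> int:
    """Assumed rule of thumb: 2 degrees C per A/T plus 4 degrees C per G/C."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


A1 = "CTGACAT"           # 3' end of the first terminal identifiers
A_INTERNAL1 = "ACCTAGA"  # phosphorylated 5' end of the internal identifiers
A_BRIDGE1 = "ATGTCAG"
A_BRIDGE2 = "TCTAGGT"
BRIDGE = A_BRIDGE2 + A_BRIDGE1  # 5' TCTAGGT-ATGTCAG 3'

# Junction created when the two identifiers lie side by side on the bridge.
junction = A1 + A_INTERNAL1

print(reverse_complement(A1) == A_BRIDGE1)
print(reverse_complement(A_INTERNAL1) == A_BRIDGE2)
print(reverse_complement(junction) == BRIDGE)
print(basic_tm(A1), basic_tm(A_INTERNAL1))
```

Under the assumed rule each 7-nucleotide annealing region evaluates to 20, in line with the value quoted in the text.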

Thus, the sequences of the library identifiers were as follows:

First terminal library identifiers:

P1-U1₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-GATATCAA-CTGACAT (SEQ ID NO: 30)

P1-U2₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-ACTCCACT-CTGACAT (SEQ ID NO: 31)

P1-U3₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-AAGTACCT-CTGACAT (SEQ ID NO: 32)

P1-U4₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-CACTCACC-CTGACAT (SEQ ID NO: 33)

P1-U5₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-GTACACAT-CTGACAT (SEQ ID NO: 34)

P1-U6₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-CATCGTCC-CTGACAT (SEQ ID NO: 35)

P1-U7₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-GTCTGTCT-CTGACAT (SEQ ID NO: 36)

P1-U8₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-ACATCCTA-CTGACAT (SEQ ID NO: 37)

P1-U9₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-CTATATCT-CTGACAT (SEQ ID NO: 38)

P1-U10₁-A1: 5' ATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-ATCCTTCA-CTGACAT (SEQ ID NO: 39)

Bridging oligonucleotide:

A_bridge2-A_bridge1: 5' TCTAGGT-ATGTCAG (SEQ ID NO: 40)

Internal library identifiers:

A_internal1-U_internal1-A_internal2: 5' [Phos]ACCTAGA-GTCATGTA-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 41)

A_internal1-U_internal2-A_internal2: 5' [Phos]ACCTAGA-TCACCATG-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 42)

A_internal1-U_internal3-A_internal2: 5' [Phos]ACCTAGA-CTAGTTCA-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 43)

A_internal1-U_internal4-A_internal2: 5' [Phos]ACCTAGA-CACGATCG-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 44)

A_internal1-U_internal5-A_internal2: 5' [Phos]ACCTAGA-CGACACCA-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 45)

A_internal1-U_internal6-A_internal2: 5' [Phos]ACCTAGA-CTCACTTG-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 46)

A_internal1-U_internal7-A_internal2: 5' [Phos]ACCTAGA-GATCACTG-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 47)

A_internal1-U_internal8-A_internal2: 5' [Phos]ACCTAGA-TGATACTA-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 48)

A_internal1-U_internal9-A_internal2: 5' [Phos]ACCTAGA-TGTCTACT-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 49)

A_internal1-U_internal10-A_internal2: 5' [Phos]ACCTAGA-TGACTGTC-TTTTTTTTTTTTTTTTVN (SEQ ID NO: 50)

Second terminal library identifier:

P2-A2: 5' CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-AAAAAAAAAAAAAAAAAA (SEQ ID NO: 51)

9.2 Sample combination

The first five library identifiers from the group of first terminal library identifiers (P1-U1₁-A1, P1-U2₁-A1, P1-U3₁-A1, P1-U4₁-A1 and P1-U5₁-A1) were mixed in a pair-wise manner with the first five library identifiers from the group of internal library identifiers (A_internal1-U_internal1-A_internal2, A_internal1-U_internal2-A_internal2, A_internal1-U_internal3-A_internal2, A_internal1-U_internal4-A_internal2 and A_internal1-U_internal5-A_internal2) in the presence of the bridging oligonucleotide (A_bridge2-A_bridge1). Likewise, the last five library identifiers from the group of first terminal library identifiers (P1-U6₁-A1, P1-U7₁-A1, P1-U8₁-A1, P1-U9₁-A1 and P1-U10₁-A1) were mixed in a pair-wise manner with the last five library identifiers from the group of internal library identifiers (A_internal1-U_internal6-A_internal2, A_internal1-U_internal7-A_internal2, A_internal1-U_internal8-A_internal2, A_internal1-U_internal9-A_internal2 and A_internal1-U_internal10-A_internal2) in the presence of the bridging oligonucleotide (A_bridge2-A_bridge1). The mixes were prepared in standard 96-well plates. The final concentration of each library identifier and of the bridging oligonucleotide in the reaction volume was 100 nM. Identifiers were combined and strands were fused by ligation in the presence of 50 mM NaCl using the following reaction mix and conditions:

- 3 μL identifier mix (in 50 mM NaCl)

- 0.5 μL T4 DNA ligase buffer 10x (NEB)

- 0.25 μL T4 Ligase (5 U/μL, from NEB)

- 1.25 μL NaCl 50 mM

The library identifiers were combined at 25°C for 30 min to allow the ligation.

To fuse the second terminal identifier to each combination, the following reaction mix and conditions were used:

- 5 μL ligated identifiers mix

- 0.5 μL DNA polymerase I buffer 10x (NEB)

- 0.1 μL dNTPs 10 mM

- 1 μL P2-A2 second terminal identifier 0.5 μM

- 0.05 μL DNA polymerase I (NEB)

- 3.35 μL H₂O

The samples were incubated:

- 10 minutes at 32°C

- 10 minutes at 37°C

- 10 minutes at 42°C

- 10 minutes at 80°C to inactivate the DNA polymerase

9.3 Pooling and amplification

5 μL of each combination were pooled in a standard test tube. A PCR was performed on 4 μL of the pooled library identifiers using the primers P1 and P2. Specifically, the following were mixed:

- 4 μL of the pooled library

- 5 μL 10x AmpliTaq Gold polymerase buffer (Applied Biosystems)

- 4 μL MgCl₂ 25 mM

- 5 μL P1 oligonucleotide (10 μM)

- 5 μL P2 oligonucleotide (10 μM)

- 1 μL dNTPs (10 mM)

- 25.3 μL H₂O

- 0.7 μL AmpliTaq Gold polymerase (5 U/μL, from Applied Biosystems)

The following cycles were performed in a DNA Engine Tetrad 2 PCR machine (Bio-Rad):

2 min 96°C

15x: 10 s 96°C, 30 s 65°C, 30 s 72°C

5 min 72°C.

9.4 Identification

The PCR amplification products were used for Illumina GAIIx sequencing. After filtering for good-quality reads, 337,009 sequences were assigned to one of the 100 possible combinations. Of these, 334,393 reads (99.224%) correspond to the expected sequences produced during the sample combination step, while 2,616 reads (0.776%) correspond to non-expected sequences (false positive combinations). Among the expected combinations, 72,233 reads (21.434%) correspond to the fusion of the first five library identifiers from the group of first terminal library identifiers (P1-U1₁-A1, P1-U2₁-A1, P1-U3₁-A1, P1-U4₁-A1 and P1-U5₁-A1) with the first five library identifiers from the group of internal library identifiers (A_internal1-U_internal1-A_internal2, A_internal1-U_internal2-A_internal2, A_internal1-U_internal3-A_internal2, A_internal1-U_internal4-A_internal2 and A_internal1-U_internal5-A_internal2), and 262,160 reads (77.79%) correspond to the fusion of the last five library identifiers from the group of first terminal library identifiers (P1-U6₁-A1, P1-U7₁-A1, P1-U8₁-A1, P1-U9₁-A1 and P1-U10₁-A1) with the last five library identifiers from the group of internal library identifiers (A_internal1-U_internal6-A_internal2, A_internal1-U_internal7-A_internal2, A_internal1-U_internal8-A_internal2, A_internal1-U_internal9-A_internal2 and A_internal1-U_internal10-A_internal2). Among the false positive combinations, 756 reads (0.224%) correspond to the fusion of the first five library identifiers from the group of first terminal library identifiers (P1-U1₁-A1, P1-U2₁-A1, P1-U3₁-A1, P1-U4₁-A1 and P1-U5₁-A1) with the last five library identifiers from the group of internal library identifiers (A_internal1-U_internal6-A_internal2, A_internal1-U_internal7-A_internal2, A_internal1-U_internal8-A_internal2, A_internal1-U_internal9-A_internal2 and A_internal1-U_internal10-A_internal2), and 1,860 reads (0.552%) correspond to the fusion of the last five library identifiers from the group of first terminal library identifiers (P1-U6₁-A1, P1-U7₁-A1, P1-U8₁-A1, P1-U9₁-A1 and P1-U10₁-A1) with the first five library identifiers from the group of internal library identifiers (A_internal1-U_internal1-A_internal2, A_internal1-U_internal2-A_internal2, A_internal1-U_internal3-A_internal2, A_internal1-U_internal4-A_internal2 and A_internal1-U_internal5-A_internal2). Together, the precision is 99.22%, with precision defined as the proportion of true positives among all obtained results (both true positives and false positives).
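How a pair of unique regions recovered from a read might be assigned to a combination can be sketched as follows. This is a hedged illustration only: it assumes the two unique regions have already been extracted from the read in the orientation listed in section 9.1, which the text does not specify, and the function names `classify` and `precision` are hypothetical.

```python
# Unique regions of Example 9, in the order U1..U10 given in section 9.1.
FIRST_TERMINAL = ["GATATCAA", "ACTCCACT", "AAGTACCT", "CACTCACC", "GTACACAT",
                  "CATCGTCC", "GTCTGTCT", "ACATCCTA", "CTATATCT", "ATCCTTCA"]
INTERNAL = ["GTCATGTA", "TCACCATG", "CTAGTTCA", "CACGATCG", "CGACACCA",
            "CTCACTTG", "GATCACTG", "TGATACTA", "TGTCTACT", "TGACTGTC"]


def classify(u_first: str, u_internal: str):
    """Return (i, j, expected) for a pair of unique regions, or None if unknown.

    `expected` is True when both identifiers belong to the same block of five,
    i.e. when the pair was actually mixed during sample combination.
    """
    if u_first not in FIRST_TERMINAL or u_internal not in INTERNAL:
        return None
    i = FIRST_TERMINAL.index(u_first) + 1
    j = INTERNAL.index(u_internal) + 1
    return i, j, (i <= 5) == (j <= 5)


def precision(pairs):
    """Fraction of assignable pairs that correspond to an expected combination."""
    calls = [c for c in (classify(a, b) for a, b in pairs) if c is not None]
    if not calls:
        return None
    return sum(1 for c in calls if c[2]) / len(calls)


# Toy input, not real sequencing data.
toy_pairs = [("GATATCAA", "GTCATGTA"), ("ATCCTTCA", "TGACTGTC"),
             ("CATCGTCC", "GATCACTG"), ("GATATCAA", "CTCACTTG")]
print(precision(toy_pairs))
```

Applied to the reported counts, the same definition gives 334,393 / 337,009 for the precision of this Example.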

Claims

1. A method of determining the identity of one or more samples of a set of samples, comprising the steps of:

(A) providing a set of samples wherein each sample comprises:

(bb) a member of a second library, wherein each member of said second library comprises a library component and a library member identifier,

and optionally

2. The method according to claim 1, wherein the library components of the first library are identical with the library components of the second library and with the library components of the optional one or more supplemental libraries.

3. The method according to any one of claims 1 to 2, wherein the first library is identical with the second library and with the optional one or more supplemental libraries.

4. The method according to any one of claims 1 to 3, wherein the set of samples in step (A) is provided by separately combining each member of a first library with each member of a second library and with each member of the optional one or more supplemental library identifiers, thereby generating a discrete sample for each combination.

5. The method according to any one of claims 1 to 4, wherein the set of samples is assayed for assay-dependent effects due to the presence of the two or more library components, preferably during or after step (A) or after step (B).

6. The method according to any one of claims 1 to 5, wherein each member of the first and second and optionally the one or more supplementary library is comprised in a separate droplet, preferably a micro-fluidic aqueous droplet of a water-in-oil emulsion, a separate bead or a separate well of a multi-well plate prior to step (A).

7. The method according to any one of claims 1 to 6, wherein the library component of a library is selected from the group consisting of cells, viruses, bacteria, unicellular organisms, multicellular organisms, genetically modified cells, proteins, peptides, hormones, antibodies, RNA (siRNA, dsRNA), small compounds, drugs, pharmaceutically active substances, metabolites, natural compounds, culture media, body fluids such as urine, blood or lymph, tissues, plant seeds or samples of soil, plants or marine origin.

8. The method according to any one of claims 1 to 7, wherein the identifiers are sequencing identifiers, preferably oligonucleotides, spectroscopy identifiers, preferably a mass spectroscopy identifier, optical identifiers, preferably quantum dots, graphical identifiers or electronic identifiers.

9. The method according to any one of claims 1 to 8, wherein the identifiers are oligonucleotides and interact by annealing.

10. The method according to any one of claims 1, 2 and 4 to 7, wherein the library identifier of the first library is a first terminal library identifier comprising the following oligonucleotide elements:

(i) a priming region (P1),

(ii) a unique region (U1), and

(iii) an annealing region (A1);

wherein the library identifier of the second library is a second terminal library identifier comprising the following oligonucleotide elements:

(i) a priming region (P2),

(ii) a unique region (U2), and

(iii) an annealing region (A2);

and wherein the library identifier of the optional one or more supplemental libraries is the following:

(a) a unit of two internal library identifiers, wherein the first internal library identifier comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

and/or

(b) an internal library identifier, comprising the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library;

wherein A1 anneals to A2 or A_internal1 or to the annealing region of another internal library identifier of a further supplemental library and A2 anneals to A1 or A_internal2 or to the annealing region of another internal library identifier of a further supplemental library.

11. The method according to claim 10, wherein each sample optionally further comprises one or more bridging oligonucleotides each comprising the following elements:

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, A_internal1, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

12. The method according to any one of claims 10 to 11, wherein step (B) comprises exposing the set of samples to conditions allowing annealing of the annealing regions and optionally the linking regions and ligating and/or elongating the annealed library identifiers with a nucleotide polymerase generating double stranded annealed oligonucleotides, and wherein step (D) comprises selecting oligonucleotides wherein at least two identifiers are linked.

13. Method according to any one of claims 10 to 12, wherein the length and sequence of all annealing regions (A1, A2, A_internal1, A_internal2, A_bridge1 and A_bridge2) and linking regions (L1 and L2) is such that the annealing temperature is lower than the annealing temperature of (a) primer(s) annealing to P1 and/or P2 or a sequence complementary thereto.

14. Method according to any one of claims 12 to 13, wherein the selecting in step (D) comprises one of the following:

(i) amplifying the double stranded annealed oligonucleotides in the pool using oligonucleotide primers annealing to sequences complementary to the priming regions of the two terminal library identifiers, wherein the amplification is carried out under conditions that do not allow annealing of the annealing regions and optional linking regions,

(ii) degrading single stranded oligonucleotides,

(iii) size selection, and

(iv) affinity purification of the linked or the unlinked identifiers, and wherein step (D) preferably further comprises analysing the oligonucleotide sequence of the unique sequences of the double stranded annealed oligonucleotides.

15. Method according to any of claims 10 to 14, wherein the nucleotide sequence of P1 of each member of the first library is identical and the nucleotide sequence of P2 of each member of the second library is identical, and wherein the nucleotide sequences of P1 and P2 are different or identical.

16. Method according to any of claims 10 to 15, wherein the nucleotide sequence of A1 of each member of the first library is identical, the nucleotide sequence of A2 of each member of the second library is identical and optionally the nucleotide sequence of the annealing region or regions of each member of a supplementary library (A_internal1 and A_internal2) is or are identical.

17. Method according to any of claims 10 to 16, wherein the nucleotide sequence of L1 of each member of the first internal library identifier of a supplementary library is identical and the nucleotide sequence of L2 of each member of the second internal library identifier is identical.

18. Method according to any of claims 10 to 17, wherein the nucleotide sequence of A_internal1 of each member of the first internal library identifier of a given supplemental library is identical and the nucleotide sequence of A_internal2 of each member of the second internal library identifier of a given supplemental library is identical.

19. Method according to any of claims 10 to 18, wherein the elements of the terminal and internal library identifiers within a library fulfil one or more of the following criteria:

(a) annealing regions of the terminal library identifiers, of the optional internal library identifiers, or of the bridging nucleotides, preferably also the linking regions of the internal library identifiers, do not anneal to a unique region and/or to themselves under the conditions of step (B);

(b) the unique regions of the terminal library identifiers within a library do not anneal to each other or to themselves under the conditions of step (B);

(c) the unique regions of a unit of internal library identifiers within a supplementary library anneal to each other or not but do not anneal to themselves under the conditions of step (B); and/or

(d) the terminal library identifiers and the optional internal library identifiers in the libraries comprised in the set of samples do not anneal to themselves under the conditions of step (B).

20. Method according to any of claims 10 to 19, in which identifiers and/or bridging nucleotides comprise oligonucleotides derived from molecular readouts resulting in nucleotides, such as genotyping, RNA expression, protein interaction with proximity ligation.

21. Method according to any of claims 10 to 20, comprising a further step after step (B) and before step (C) of increasing the temperature to or above the melting point of the annealing regions and optionally the linking regions.

22. Method according to any of claims 10 to 21, wherein the nucleotide polymerase in step (B) is selected from the group consisting of mesophilic polymerases, thermophilic polymerases, DNA-dependent DNA polymerases if the oligonucleotides consist of DNA, RNA-dependent DNA polymerases if the oligonucleotides consist of RNA.

23. Method according to any of claims 10 to 22, wherein one or more of the first terminal library identifier, the second terminal library identifier, the internal library identifier, and the bridging nucleotide are at least in part double stranded oligonucleotides, and/or the unit of two internal library identifiers forms at least in part a double stranded oligonucleotide, such that under the conditions of step (B) a free 3' position of an identifier is juxtaposed to a free 5' position of another identifier, such that at least one oligonucleotide strand of an identifier can be ligated to an oligonucleotide strand of another identifier.

24. Method according to any of claims 10 to 23, wherein the primer or primers annealing to a sequence complementary to the priming region of one or both terminal library identifiers, a nucleotide mix, a buffer, reagents and/or the nucleotide polymerase is added to each member of the library prior to providing the set of samples, or during or after step (A), preferably the primer or primers annealing to a sequence complementary to the priming region of one or both terminal library identifiers is added in step (D).

25. Method according to any of claims 10 to 24, wherein step (D) further comprises after selecting analysing the nucleotide sequence of the unique sequences of the double stranded annealed oligonucleotides.

26. Method according to any of the claims 10 to 25, wherein the oligonucleotides comprise RNA or DNA.

27. The method according to any of claims 1 to 26, wherein the samples of the set of samples further comprise enzymes, enzymatic buffers, nucleotides, reporter gene expression system, fluorescent molecules, bioluminescent reagents, colorimetric reagents, radioactive molecules, DNA- and RNA-dependent polymerases, oligonucleotides and/or detergents.

28. Set of a first and a second library for carrying out the method of claims 10 to 27, wherein each member of the first and second library comprises a different library component and a different terminal library identifier and optionally one or more supplemental libraries, wherein each member of the one or more supplemental libraries comprises a different library component and an internal library identifier or set thereof, wherein

(i) a priming region (P1),

(ii) a unique region (U1),

(iii) an annealing region (A1), and

(i) a priming region (P2),

(ii) a unique region (U2),

(iii) an annealing region (A2),

and optionally

(cc) each unit of two internal library identifiers of one or more supplemental libraries comprises a first internal library identifier which comprises the following oligonucleotide elements:

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library

and/or

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, A_internal1, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide.

29. Set of groups comprising a group of first terminal library identifiers and a group of second terminal library identifiers for carrying out the method of claims 10 to 27, wherein each member of the group of the first and second terminal library identifiers is different from each other and wherein each terminal library identifier of the first and second group comprises the following oligonucleotide elements:

(i) a priming region (P1, P2),

(ii) a unique region (U1, U2), and

(iii) an annealing region (A1, A2), wherein A1 anneals to A2 or optionally to the annealing region of an internal library identifier or unit thereof (A_internal1),

and the length and the sequence of the oligonucleotides is such that the T_m of complementary annealing regions is lower than the T_m of a primer or primers to P1 and/or P2.

30. Set of groups according to claim 29 further comprising one or more groups of different internal library identifiers or units thereof, wherein each internal library identifier or unit thereof comprises

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal1), and optionally

(i) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal2), and/or

(iii) a linking region (L2) that anneals to the optional linking region (L1) of the first member of the unit,

(i) an annealing region (A_internal1), wherein A_internal1 anneals to A1 or to the annealing region of another internal library identifier of a further supplemental library,

(ii) a unique region (U_internal3), and

(iii) an annealing region (A_internal2), wherein A_internal2 anneals to A2 or to the annealing region of another internal library identifier of a further supplemental library,

and/or

(i) an annealing region (A_bridge1), wherein A_bridge1 anneals to A1, A_internal2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide,

(ii) optionally a connector region (C), and

(iii) an annealing region (A_bridge2), wherein A_bridge2 anneals to A2, to the annealing region of another internal library identifier or to the annealing region of another bridging oligonucleotide.

31. A computer implemented method of designing a group of first terminal library identifiers, a group of second terminal library identifiers and optionally one or more groups of internal library identifiers for carrying out the method of claims 10 to 27, comprising the following steps:

(ii) determining the number of groups of internal library identifiers required according to the method of claims 1-20,

(iii) generating sequences for P1 and P2, respectively, wherein each sequence

is a naturally occurring sequence, is generated randomly or is selected from a set of pre-existing priming sequences, and allows annealing of a primer sequence at a specific and/or preset annealing temperature which is essentially identical for P1 and P2 of the group of first and group of second terminal library identifiers, respectively,

(iv) generating annealing sequences for A1 and A2 and optionally A_internal1 and A_internal2, wherein each sequence:

if internal library identifiers are present, A_internal1 and A_internal2, respectively, is identical for one annealing region of the same group of internal library identifiers or of the same group of units of two internal library identifiers, wherein A1 anneals to A2 or A_internal1 or to the annealing region of another internal library identifier of a further supplemental library and A2 anneals to A1 or A_internal2 or to the annealing region of another internal library identifier of a further supplemental library and wherein the length and sequence of all annealing regions (A1, A2, A_internal1 and A_internal2) and linking regions (L1 and L2) is such that the annealing temperature is lower than the annealing temperature of (a) primer(s) annealing to P1 and/or P2 or a sequence complementary thereto,