CN112746332A - Nucleic acid coding compound library composed of non-natural nucleotides

Info

Publication number: CN112746332A
Application number: CN202011177047.7A
Authority: CN
Inventors: 李进; 巩晓明; 窦登峰
Original assignee: Hitgen Inc
Current assignee: Hitgen Inc
Priority date: 2019-10-29
Filing date: 2020-10-29
Publication date: 2021-05-04

Abstract

The invention discloses a nucleic acid-encoded compound and a compound library composed of non-natural nucleotides. By introducing five artificial bases (Z, P, S, B and S'), the nucleic acid-encoded compounds and compound libraries of the invention overcome the limitations on the use of nucleic acid-encoded compound libraries built from natural bases.

Description

Nucleic acid coding compound library composed of non-natural nucleotides

Technical Field

The invention relates in particular to a nucleic acid-encoded compound and/or compound library composed of non-natural nucleotides.

Background

In new drug development, high-throughput screening against biological targets is one of the main ways of rapidly obtaining lead compounds. However, traditional high-throughput screening based on individual molecules takes a long time, requires heavy investment in equipment, and is limited to libraries of a few million compounds; building such compound libraries also takes decades of accumulation, which limits the efficiency and likelihood of discovering lead compounds. The more recently developed DNA-encoded compound library technologies (WO2005058479, WO2018166532, CN103882532) combine combinatorial chemistry with molecular biology. Each compound is labeled with a DNA tag at the molecular level, so a library of up to hundreds of millions of compounds can be synthesized in a very short time and the compounds can be identified by gene sequencing. This greatly increases the size and synthesis efficiency of compound libraries, and the technology is becoming the trend for next-generation compound library screening. DNA-encoded compound library technology is beginning to be widely used in the pharmaceutical industry and has produced many positive results (Accounts of Chemical Research, 2014, 47, 1247-).

As the technology has been applied more widely, the DNA coding tag has also shown limitations when screening certain biological targets:

1) When the disease-regulating target is a protein that interacts with DNA sequences (such as a transcription factor) or a ribonucleic acid (RNA), conventional screening of DNA-encoded compound libraries generates a large number of background binding signals. These signals may arise from the affinity of the compound's DNA tag for the transcription factor protein, or from hybridization of the DNA tag to the RNA target, rather than from binding of the compound structure itself to the biological target.

2) When conventional screening of DNA-encoded compound libraries is applied to certain biological samples (for example, screening against membrane protein targets in situ on living cells), endogenous genomic DNA from the sample readily interferes with the amplification and detection of the DNA tags of the enriched molecules, which reduces amplification efficiency and produces false-positive amplification signals from mismatches.

Shuichi Hoshika et al. disclosed a nucleic acid coding system based on non-natural bases (Science, 2019, 363: 884-887). The present invention applies non-natural bases to DNA-encoded compound library technology, using a coding system different from that of natural coding nucleotides, thereby overcoming the above limitations and broadening the range of application of encoded compound library technology.

Disclosure of Invention

The invention discloses a nucleic acid-encoded compound comprising a functional moiety and a nucleic acid portion, wherein the bases of the nucleic acid portion are selected from the non-natural bases Z, P, S, B and S';

wherein the base Z is

Base P is

Base S is

Base B is

The base S' is

Further, the nucleic acid-encoded compound further comprises a linking group, through which the functional moiety and the nucleic acid portion are linked.

Further, the nucleic acid portion includes single-stranded nucleic acid and/or double-stranded nucleic acid.

Further, in the double-stranded nucleic acid, the base Z pairs with the base P, the base S pairs with the base B, and the base S' pairs with the base B.

Further, the nucleic acid portion is composed of ribonucleotides and/or deoxyribonucleotides.

Further, the bases of the ribonucleotides are selected from Z, P, S' and B, and the bases of the deoxyribonucleotides are selected from Z, P, S and B.
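The pairing and alphabet rules above can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function name `reverse_complement` is hypothetical, the antiparallel (reversed) reading of the partner strand is an assumption, and so is the choice that B is partnered by S in a deoxyribonucleotide strand and by S' in a ribonucleotide strand, which is inferred from the two base alphabets.

```python
# Pairing rules stated in the text: Z-P, S-B and S'-B.
# Bases are string tokens so that the two-character base S' can be represented.
DNA_BASES = {"Z", "P", "S", "B"}
RNA_BASES = {"Z", "P", "S'", "B"}


def reverse_complement(strand, kind="DNA"):
    """Return the pairing strand for a list of non-natural base tokens.

    Assumption (not stated in the text): strands pair antiparallel, so the
    partner strand is read in reverse order.
    """
    if kind == "DNA":
        # Assumption: in a DNA strand the partner of B is S.
        allowed, partner_of_b = DNA_BASES, "S"
    elif kind == "RNA":
        # Assumption: in an RNA strand the partner of B is S'.
        allowed, partner_of_b = RNA_BASES, "S'"
    else:
        raise ValueError(f"unknown strand kind: {kind!r}")

    pair = {"Z": "P", "P": "Z", "S": "B", "S'": "B", "B": partner_of_b}
    for base in strand:
        if base not in allowed:
            raise ValueError(f"base {base!r} is not allowed in {kind}")
    return [pair[base] for base in reversed(strand)]
```

Representing bases as tokens rather than single characters avoids ambiguity between S and S' when a sequence is written out.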

Further, the nucleic acid portion is longer than 10 bp.

Further, nucleotides having natural bases may be inserted into the nucleic acid portion, provided that no run of 3 or more consecutive natural-base nucleotides is inserted and that the natural-base nucleotides make up less than 30% of the total number of nucleotides in the nucleic acid portion.
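The length and composition constraints above can be checked mechanically. The following Python sketch is illustrative only: the function name `check_tag` and the returned problem codes are hypothetical, the natural bases are taken to be A, C, G, T and U, and length is counted in nucleotides of one strand.

```python
# Natural bases (A, C, G, T, U) versus the non-natural bases named in the text.
NATURAL = {"A", "C", "G", "T", "U"}
NON_NATURAL = {"Z", "P", "S", "B", "S'"}


def check_tag(tokens):
    """Return a list of rule violations for a tag given as base tokens.

    An empty list means the tag satisfies all three constraints:
    length > 10, no run of 3 or more natural bases, natural fraction < 30%.
    """
    problems = []
    if any(t not in NATURAL | NON_NATURAL for t in tokens):
        problems.append("unknown_base")
    if len(tokens) <= 10:
        problems.append("length")

    # Track the longest stretch of consecutive natural-base nucleotides.
    run = longest = 0
    for t in tokens:
        run = run + 1 if t in NATURAL else 0
        longest = max(longest, run)
    if longest >= 3:
        problems.append("natural_run")

    natural_count = sum(1 for t in tokens if t in NATURAL)
    if tokens and natural_count / len(tokens) >= 0.30:
        problems.append("natural_fraction")
    return problems
```

Returning every violated rule, rather than stopping at the first, makes it easier to see why a candidate tag sequence is rejected.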

Further, the nucleic acid-encoded compound has the structure shown in formula I:

wherein:

X is an atom or molecular framework having a valence of at least 3;

L₁ is a linking group to which the 5' end of a nucleic acid can be linked;

L₂ is a linking group to which the 3' end of a nucleic acid can be linked;

Z₁ is a first nucleic acid portion;

Z₂ is a second nucleic acid portion, wherein the bases of the second nucleic acid portion are at least partially complementary to the bases of the first nucleic acid portion;

M is a linking group to which a functional moiety can be attached;

Y is a functional moiety consisting of one or more synthons.

Further, X is a carbon atom.

Still further, M is selected from an alkylene chain or a poly(ethylene glycol) chain.

Further, L₁ and L₂ are each selected from an alkylene chain or a poly(ethylene glycol) chain.

More specifically, the alkylene chain or poly(ethylene glycol) chain bears a phosphate linking group.

Further, Z₁ and Z₂ each further comprise a PCR primer binding site sequence.

The invention also discloses a library of nucleic acid-encoded compounds comprising at least 10² different nucleic acid-encoded compounds as described above.

Obviously, many modifications, substitutions, and variations are possible in light of the above teachings of the invention, without departing from the basic technical spirit of the invention, as defined by the following claims.

The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. All the technologies realized based on the above contents of the present invention belong to the scope of the present invention.

Drawings

FIG. 1 shows compound 550, known to bind to TAR RNA, having two sites to which nucleic acid codes can be ligated.

Detailed Description

Example 1: Construction of nucleic acid-encoded compounds

1) Compound 550 (FIG. 1, Ki = 0.039 μM), which binds TAR RNA, was used to construct nucleic acid-encoded compounds 1-3 bearing a nucleic acid tag, together with control compounds 4-6, by the method described in WO2005058479 or WO2018166532. The compounds are as follows:

No.	Compound moiety	Nucleic acid tag
1	Compound 550	Natural DNA sequence with TAR RNA binding function
2	Compound 550	Non-natural nucleic acid sequence of the invention
3	Compound 550	Natural DNA sequence without TAR RNA binding function
4	None	Natural DNA sequence with TAR RNA binding function
5	None	Non-natural nucleic acid sequence of the invention
6	None	Natural DNA sequence without TAR RNA binding function

Example 2: Validation of the screening method for the nucleic acid-encoded compounds of the invention

The 3' end of the TAR RNA sequence was modified with biotin for immobilization, and the terminus of the Tat peptide was labeled with FAM. The TAR RNA was immobilized on NeutrAvidin magnetic beads, incubated with FAM-Tat and washed once; the TAR RNA and magnetic beads were then heated and the FAM fluorescence in the supernatant was measured, confirming that FAM-Tat bound to the immobilized TAR RNA and therefore that the TAR RNA was active.

Compounds 1-6 were incubated with TAR RNA in screening buffer (50 mM Tris, 80 mM KCl, 0.3 mg/mL ssDNA, 0.01% Tween 20, pH 7.5) for 1 h. NeutrAvidin magnetic beads were then added and incubated at room temperature for 30 min to immobilize the TAR RNA, and the beads were washed with screening buffer. The beads were then transferred to elution buffer (50 mM Tris, 160 mM KCl, pH 7.5) and heated at 95 °C for 10 min, and the supernatant was collected. The nucleic acid content of the elution buffer was quantified by qPCR, and the degree of nucleic acid enrichment was compared between groups.

Compounds 1-6 were spiked into a conventional DNA-encoded compound library at 10⁵ to 10⁹ molecules and screened against TAR RNA following the procedure described above. The encoded compounds recovered from the first round of screening were subjected to a second round of screening against TAR RNA, and this was repeated until the total number of eluted molecules was approximately 10⁸. The recovered encoded compounds were amplified by PCR and sequenced, the sequencing results were decoded, and the final enriched copy numbers of compounds 1-6 were compared.
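One way to compare enrichment between compounds is to normalize each tag's share of the recovered pool by its share of the input pool. The Python sketch below illustrates that calculation only; the text does not specify a formula, so the fold-enrichment definition used here is an assumption, and the function name `fold_enrichment` and its dictionary inputs are hypothetical.

```python
def fold_enrichment(input_counts, output_counts):
    """Compute fold enrichment per tag from copy numbers.

    Assumed definition (not given in the text):
        (output copies / total output) / (input copies / total input)

    input_counts and output_counts map a tag name to its copy number
    before and after screening; tags missing from the output count as 0.
    """
    total_in = sum(input_counts.values())
    total_out = sum(output_counts.values())
    if total_in <= 0 or total_out <= 0:
        raise ValueError("input and output pools must both be non-empty")

    result = {}
    for name, n_in in input_counts.items():
        if n_in <= 0:
            raise ValueError(f"input copy number for {name!r} must be positive")
        fraction_in = n_in / total_in
        fraction_out = output_counts.get(name, 0) / total_out
        result[name] = fraction_out / fraction_in
    return result
```

Under this definition a value above 1 means the tag is over-represented after screening relative to the input, and a value below 1 means it was depleted.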

Claims

1. A nucleic acid-encoded compound comprising a functional moiety and a nucleic acid portion, wherein the bases of the nucleic acid portion are selected from the non-natural bases Z, P, S, B and S';

wherein the base Z is

Base P is

Base S is

Base B is

The base S' is

2. The compound of claim 1, wherein: the nucleic acid-encoded compound further comprises a linking group, through which the functional moiety and the nucleic acid portion are linked.

3. The compound of claim 1, wherein: the nucleic acid portion includes single-stranded nucleic acid and/or double-stranded nucleic acid.

4. The compound of claim 3, wherein: in the double-stranded nucleic acid, the base Z pairs with the base P, the base S pairs with the base B, and the base S' pairs with the base B.

5. The compound of claim 1, wherein: the nucleic acid portion is composed of ribonucleotides and/or deoxyribonucleotides.

6. The compound of claim 5, wherein: the bases of the ribonucleotides are selected from Z, P, S' and B, and the bases of the deoxyribonucleotides are selected from Z, P, S and B.

7. The compound of claim 1, wherein: the nucleic acid portion is longer than 10 bp.

8. The compound of claim 5, wherein: nucleotides having natural bases may be inserted into the nucleic acid portion, provided that no run of 3 or more consecutive natural-base nucleotides is inserted and that the natural-base nucleotides make up less than 30% of the total number of nucleotides in the nucleic acid portion.

9. The compound of claim 1, wherein: the nucleic acid-encoded compound has the structure shown in formula I:

wherein:

X is an atom or molecular framework having a valence of at least 3;

L₁ is a linking group to which the 5' end of a nucleic acid can be linked;

L₂ is a linking group to which the 3' end of a nucleic acid can be linked;

Z₁ is a first nucleic acid portion;

M is a linking group to which a functional moiety can be attached;

Y is a functional moiety consisting of one or more synthons.

10. The compound of claim 9, wherein: X is a carbon atom.

11. The compound of claim 9, wherein: M is selected from an alkylene chain or a poly(ethylene glycol) chain.

12. The compound of claim 9, wherein: L₁ and L₂ are each selected from an alkylene chain or a poly(ethylene glycol) chain.

13. The compound of claim 12, wherein: the alkylene chain or poly(ethylene glycol) chain bears a phosphate linking group.

14. The compound of claim 9, wherein: Z₁ and Z₂ each further comprise a PCR primer binding site sequence.

15. A library of nucleic acid-encoded compounds, comprising at least 10² different nucleic acid-encoded compounds according to any one of claims 1-14.