CN108959855B - Computer coding method for screening reagent of DNA coding compound library - Google Patents


Info

Publication number
CN108959855B
Authority
CN
China
Prior art keywords
reagent
small molecule
organic small
organic
functional groups
Legal status
Active
Application number
CN201810378969.0A
Other languages
Chinese (zh)
Other versions
CN108959855A (en
Inventor
吴阿亮
刘世恩
张在红
李科
陈雯婷
邢莉
杨洪芳
彭宣嘉
Current Assignee
Wuxi Apptec Co Ltd
Original Assignee
Wuxi Apptec Co Ltd
Application filed by Wuxi Apptec Co Ltd
Priority to CN201810378969.0A
Publication of CN108959855A
Application granted
Publication of CN108959855B
Legal status: Active
Anticipated expiration

Abstract

The present invention relates to a computer screening method for selecting reagents for DNA-encoded compound libraries. The method comprises the following steps. First, reagents are checked for explicit duplicates. Next, reagents are classified by functional group or scaffold. Reagents are then checked for implicit duplicates, meaning duplicates that appear after protecting groups or reactive groups are removed. Finally, the classified reagents are screened against defined rules, and reagents that do not meet the requirements are removed. The computer programs obtained by this computer coding method can efficiently, simply and quickly de-duplicate and classify thousands of reagents. The result is a set of reagents that are suitable for constructing DNA-encoded compound libraries and that carry one or more specified types of functional groups or scaffolds. The method therefore has broad application prospects in the construction of DNA-encoded compound libraries.

Description

Computer coding method for screening reagent of DNA coding compound library
Technical Field
The invention belongs to the field of computing. It relates to a method of writing four computer programs, using a computer coding method that follows defined rules, to de-duplicate and classify a catalogue of organic small-molecule reagents. The programs efficiently, simply and quickly process thousands of reagents. They yield reagents that can be used to construct DNA-encoded compound libraries and that carry one or more specified types of functional groups or scaffolds.
Background
The concept of synthesizing and screening DNA-encoded compound libraries (DEL) was proposed in 1992 by Professors Sydney Brenner and Richard Lerner of the Scripps Research Institute (references: Proc. Natl. Acad. Sci., 1992, 89, 5381; U.S. Pat. No. 5573905).

In this approach, each organic small-molecule reagent is linked at the molecular level to a unique DNA sequence; that is, the reagent is DNA-labeled. Using the split-and-pool strategy of combinatorial chemistry over two or more cycles, a large library is built rapidly. Each compound in the library consists of residues of different organic small-molecule reagents and is labeled with DNA of a corresponding unique base sequence.

A very small amount of the library is then affinity-screened against a target. Library molecules that do not bind the target are washed away, and the library molecules bound to the target are then eluted. The eluted molecules are present at low concentration and are difficult to analyze and identify by conventional methods. However, their DNA portion can be copied and amplified by the polymerase chain reaction (PCR), which is specific to DNA, until there is enough DNA for a DNA sequencer to identify.

The sequencing data are decoded using the table, created during library construction, that relates each organic small-molecule reagent to its specific DNA base sequence. This reveals the organic small-molecule reagents that make up a specific compound recognized as a potentially active molecule. These reagents are then combined by conventional organic synthesis to obtain the screened target molecule, and its physiological activity is tested and confirmed.
There are three main methods for constructing DNA-encoded compound libraries:

1. DNA-templated chemical library synthesis (DTCL). This is based mainly on the DNA-templating technology of Ensemble (USA).
2. The DNA-recorded chemical library (DRCL). This is based mainly on the DNA-tagging technology used by GSK, X-Chem and the domestic company Leader.
3. The encoded self-assembling chemical library (ESAC). This is based mainly on the fragment-based drug design (FBDD) technology of Philogen (Switzerland).

The second method is currently the most widely used in industry. It is simple to operate and relatively low in cost. It can also quickly produce, by combinatorial chemistry, DNA-encoded compound libraries containing vast numbers of small-molecule compounds.
Besides a starting DNA fragment (see the company's invention patent applications 201711263372.3 and 201711318894.9), constructing a DNA-encoded compound library requires two further inputs: a large number of DNA tags, and organic small-molecule reagents that can react in a defined sequence. The codes of the DNA tags can be generated by a computer program, and the specific DNA base sequences are then produced on a DNA synthesizer (see the company's invention patent application 201711247220.4).

Sourcing the organic small-molecule reagents depends on the company. Large multinational pharmaceutical companies usually maintain their own small-molecule compound repositories and can draw reagents directly according to their requirements. For smaller pharmaceutical companies without such a collection, the simplest route is to purchase reagents from reagent suppliers. In either case, the company faces a vast catalogue of organic small-molecule reagents from which specific reagents must be selected. Screening out a suitable reagent list quickly and efficiently is therefore a troublesome problem. No detailed published report on screening organic small-molecule reagents for the construction of DNA-encoded compound libraries has been found.
Disclosure of Invention
The technical problem solved by the invention is to provide a computer coding method for screening organic small-molecule reagents for DNA-encoded compound libraries. The method can efficiently, simply and quickly de-duplicate and classify thousands of reagents. It yields reagents that can be used to construct DNA-encoded compound libraries and that carry one or more specified types of functional groups or scaffolds.
Definitions:
Lipinski's Rule of Five: a set of rules that Christopher Lipinski, a medicinal chemist at Pfizer, summarized from screened drug molecules. Compounds that comply with it are more likely to have good pharmacokinetic properties and oral bioavailability. In drug discovery, the Rule of Five is used for preliminary screening of compound libraries. It eliminates molecules unlikely to become drugs, which narrows the screening range and reduces development cost. As applied here, its criteria are:
- molecular weight not more than 500;
- no more than 5 hydrogen-bond donors (including hydroxyl, amino and similar groups);
- no more than 10 hydrogen-bond acceptors;
- a logarithm of the octanol/water partition coefficient (logP) between -2 and 5;
- no more than 10 rotatable bonds.

The simplified "rule of four" drops the limit on rotatable bonds. The "rule of three" additionally drops the limit on hydrogen-bond acceptors.
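The Rule of Five criteria listed above can be expressed as a simple filter. The sketch below is a minimal illustration, not the patent's code. It assumes the descriptor values have already been computed, for example with RDKit's Descriptors.MolWt, Crippen.MolLogP, Lipinski.NumHDonors, Lipinski.NumHAcceptors and Lipinski.NumRotatableBonds. The names MolProps and passes_rule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MolProps:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_wt: float
    h_donors: int
    h_acceptors: int
    log_p: float
    rotatable_bonds: int

def passes_rule(p: MolProps, variant: str = "five") -> bool:
    """Apply the criteria as listed in the text.

    "five": all five criteria; "four": drops the rotatable-bond limit;
    "three": additionally drops the hydrogen-bond acceptor limit.
    """
    if variant not in ("five", "four", "three"):
        raise ValueError(f"unknown variant: {variant}")
    checks = [p.mol_wt <= 500, p.h_donors <= 5, -2 <= p.log_p <= 5]
    if variant in ("five", "four"):
        checks.append(p.h_acceptors <= 10)
    if variant == "five":
        checks.append(p.rotatable_bonds <= 10)
    return all(checks)

flexible = MolProps(mol_wt=350.4, h_donors=2, h_acceptors=6, log_p=2.1, rotatable_bonds=12)
print(passes_rule(flexible, "five"), passes_rule(flexible, "four"))  # False True
```

Here the molecule fails only the rotatable-bond criterion, so it is rejected by the full rule but accepted by the simplified "rule of four".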
Active aryl halides: aryl halides that can undergo SNAr reactions with organic amines in the presence of base at a suitable temperature. Typically the halogen (Cl or F) is ortho or para to a strong electron-withdrawing group on the aromatic ring. Alternatively, the Cl or F may sit ortho to the ring nitrogen of an N-containing aromatic ring.
International Chemical Identifier (InChI): a character string that uniquely identifies a chemical compound. It was developed by the International Union of Pure and Applied Chemistry (IUPAC) and the US National Institute of Standards and Technology (NIST).
In order to solve the above technical problems, the technical solution adopted by the present invention is described as follows:
I. Reagent types that can be used for constructing DNA-encoded compound libraries:
The starting DNA fragment of a DNA-encoded compound library is typically prepared on a DNA synthesizer. Some fragments are further modified by On-DNA chemistry specifically for library construction. Owing to the constraints of the DNA synthesizer and of On-DNA chemical reactions, the starting or modified fragment can carry amino, carboxyl, aldehyde, aryl halide, azide, alkyne, alkene, diene, active halide and similar groups.

The On-DNA chemistry used to build DNA-encoded compound libraries is carried out in the aqueous phase. Nearly 50 such methods have been reported, which is tiny compared with the enormous number of conventional organic synthesis methods. Based on the characteristics and limitations of the modified starting DNA fragment and of On-DNA chemistry, the inventors carried out intensive research and careful analysis. They found that reagents usable for constructing DNA-encoded compound libraries must satisfy the following conditions:
1. Molecular weight limits for organic small-molecule reagents
An organic small-molecule reagent here means an organic reagent with a molecular weight below 1000. Two constraints limit the usable size further: the Rule of Five in medicinal chemistry, and the limited number of cycles in a DNA-encoded compound library (typically three to four). Accordingly, after removal of protecting groups (mainly amino and ester protecting groups) and reactive functional groups (mainly boronic acid, ester and halogen), the reagent's molecular weight should be within 350. Reagents exceeding 350 are removed directly.

In summary:
- the molecular weight of the reagent is less than 1000;
- its molecular weight after removal of protecting groups, salt and reactive functional groups is at most 350;
- only hydrochloride and hydrobromide salts are accepted, and other salt forms are not.
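The molecular-weight condition can be checked with simple arithmetic once the removable parts of a reagent are known. The following is a minimal sketch under stated assumptions. The group labels, the mass table and the function names are hypothetical. The masses are approximate average masses computed from standard atomic weights. In practice, the groups to strip would be identified structurally, for instance with a cheminformatics toolkit.

```python
# Approximate average-mass losses (g/mol) when a group is removed and
# replaced by H, computed from standard atomic weights. This table is an
# illustrative assumption, not taken from the patent.
MASS_LOSS = {
    "Boc": 100.117,               # C5H8O2
    "Fmoc": 222.243,              # C15H10O2
    "methyl ester": 14.027,       # CH2 (ester -> free acid)
    "ethyl ester": 28.054,        # C2H4
    "tert-butyl ester": 56.108,   # C4H8
    "boronic acid": 43.816,       # B(OH)2 replaced by H: net loss of BO2H
}
# Only hydrochloride and hydrobromide are accepted (one equivalent assumed).
ALLOWED_SALTS = {"HCl": 36.458, "HBr": 80.912}

def residual_mw(mw, removable_groups, salt=None):
    """Molecular weight left after stripping the salt and removable groups."""
    loss = sum(MASS_LOSS[g] for g in removable_groups)
    if salt is not None:
        loss += ALLOWED_SALTS[salt]
    return mw - loss

def passes_mw_rule(mw, removable_groups, salt=None):
    """MW < 1000 and residual MW <= 350; other salt forms are rejected."""
    if salt is not None and salt not in ALLOWED_SALTS:
        return False
    return mw < 1000 and residual_mw(mw, removable_groups, salt) <= 350

# Boc-glycine (MW about 175.18) leaves a glycine residue of about 75.06.
print(round(residual_mw(175.18, ["Boc"]), 2))  # 75.06
```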
2. The organic small-molecule reagent must contain at least one reactive functional group that can react with the modified starting DNA fragment
The organic small-molecule reagent has at least one reactive functional group selected from one or more of: carboxyl, primary amino, secondary amino, boronic acid (ester), sulfonyl chloride, isocyanate, isothiocyanate, acyl chloride, anhydride, aldehyde, alkyne, azide, active aryl halide, phenol, alcohol and thiol. The reactive functional group is the site at which the reagent reacts with the modified starting DNA fragment.
3. Organic small-molecule reagents without any of the reactive functional groups listed in Condition 2 must be removed
Several groups are not reactive functional groups in the sense of Condition 2: alkanes, aromatic hydrocarbons, nitro, nitrile, azo, non-reactive aryl halides, sulfonic acids, esters, protected amines, ureas and thioethers. Organic small-molecule reagents that carry only one or more of these groups cannot be used to construct DNA-encoded compound libraries.
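Conditions 2 and 3 amount to a membership test once each reagent's functional groups have been labelled. The sketch below is minimal and assumes the functional groups are already detected and represented as the hypothetical string labels shown. It does not reproduce the patent's own detection code.

```python
# Reactive functional groups accepted by Condition 2 (labels are hypothetical).
REACTIVE_GROUPS = frozenset({
    "carboxyl", "primary amine", "secondary amine", "boronic acid/ester",
    "sulfonyl chloride", "isocyanate", "isothiocyanate", "acyl chloride",
    "anhydride", "aldehyde", "alkyne", "azide", "active aryl halide",
    "phenol", "alcohol", "thiol",
})

def has_reactive_group(groups):
    """Return True if at least one detected group is a Condition 2 group.

    Reagents returning False carry only non-reactive groups such as nitro,
    nitrile, ester or protected amine, and are removed under Condition 3.
    """
    return any(g in REACTIVE_GROUPS for g in groups)

print(has_reactive_group(["nitro", "nitrile"]))   # False
print(has_reactive_group(["ester", "aldehyde"]))  # True
```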
4. Organic small-molecule reagents with two or more identical reactive functional groups, for which the reaction site cannot be defined, must be removed
Some di- or polyfunctional reagents carry the same reactive functional group more than once, whether exposed (unprotected), protected, or both. For such reagents it cannot be determined which occurrence will serve as the reaction site, so they are removed. Examples include:
- reagents with two or more Boc-protected amines;
- reagents carrying both a methyl ester and an ethyl ester;
- unprotected di-/polyamines, di-/polyacids and di-/polyaldehydes;
- reagents with several active aryl halides in the same chemical environment. These can sometimes serve as scaffolds, as with cyanuric chloride: once one active Cl has reacted, the reactivity of the remaining Cl atoms is reduced;
- reagents with two or more aryl iodides or aryl bromides, or aryl chlorides ortho to a ring N.

In short, an organic small-molecule reagent cannot contain two or more identical reactive functional groups.
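Condition 4 can be sketched as counting how often each reactive group occurs. Before counting, different ester protecting groups on a carboxylic acid are treated as the same group, because the text counts a methyl ester plus an ethyl ester as a duplicate. This is a hypothetical illustration: the labels and the EQUIVALENT mapping are assumptions. Exceptions such as cyanuric chloride used as a scaffold would need separate handling.

```python
from collections import Counter

# Labels treated as the same reactive group (hypothetical mapping).
EQUIVALENT = {
    "methyl ester": "ester-protected carboxyl",
    "ethyl ester": "ester-protected carboxyl",
    "tert-butyl ester": "ester-protected carboxyl",
}

def has_duplicate_reactive_group(groups):
    """groups: one label per occurrence of a reactive (or protected reactive)
    group in the molecule; non-reactive groups should be filtered out first."""
    counts = Counter(EQUIVALENT.get(g, g) for g in groups)
    return any(n > 1 for n in counts.values())

print(has_duplicate_reactive_group(["Boc amine", "Boc amine"]))      # True
print(has_duplicate_reactive_group(["methyl ester", "ethyl ester"]))  # True
print(has_duplicate_reactive_group(["carboxyl", "Boc amine"]))        # False
```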
5. Reagents with several different exposed functional groups that can react with each other under certain conditions must be removed
Some reagents contain an unprotected amino group together with a carboxyl, aldehyde or ester group. For these, the decision depends on the reagent's chemical stability, whether it forms a salt, and its reactivity. Such reagents (free amine plus carboxyl, free amine plus aldehyde, free amine plus ester) cannot simply be removed outright.

Very unstable reagents, by contrast, can be removed directly. Examples are reagents carrying:
- an Fmoc-protected amine together with a free amine;
- an unsaturated hydrocarbon together with an amine;
- an alkyl halide together with an amine;
- an aryl bromide or iodide together with a boronic acid (ester).

In short, an organic small-molecule reagent cannot contain several different exposed reactive functional groups that can react with each other under certain conditions.
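Condition 5 distinguishes two kinds of combinations: those removed directly, and those that need case-by-case judgement. A minimal sketch of that two-tier check follows. The pairs come from the examples in the text, but the labels, the pair sets and the three verdict strings are hypothetical.

```python
from itertools import combinations

# Pairs from the text, expressed with hypothetical labels.
REMOVE_PAIRS = {
    frozenset({"Fmoc amine", "free amine"}),
    frozenset({"unsaturated hydrocarbon", "free amine"}),
    frozenset({"alkyl halide", "free amine"}),
    frozenset({"aryl bromide", "boronic acid/ester"}),
    frozenset({"aryl iodide", "boronic acid/ester"}),
}
REVIEW_PAIRS = {
    frozenset({"free amine", "carboxyl"}),
    frozenset({"free amine", "aldehyde"}),
    frozenset({"free amine", "ester"}),
}

def pair_verdict(groups):
    """Return 'remove', 'review' or 'keep' for one reagent's group labels."""
    pairs = {frozenset(p) for p in combinations(sorted(set(groups)), 2)}
    if pairs & REMOVE_PAIRS:
        return "remove"  # very unstable: discard directly
    if pairs & REVIEW_PAIRS:
        return "review"  # judge stability, salt formation and reactivity
    return "keep"

print(pair_verdict(["Fmoc amine", "free amine"]))  # remove
print(pair_verdict(["free amine", "carboxyl"]))    # review
print(pair_verdict(["carboxyl", "alkyne"]))        # keep
```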
6. Selection of protecting group-containing reagents
When an organic small-molecule reagent contains protecting groups, the following can be used widely:
- amino groups: Fmoc, Boc, Nvoc and Alloc;
- carboxylic acids: methyl, ethyl and tert-butyl esters.

The following cannot be used:
- amino protecting groups: Bn, Dmb, PMB, Cbz, Ac, CF3CO, Teoc, Pht, Tos, Trt and SEM;
- carboxyl protecting groups: n-butyl, isopropyl, benzyl, allyl and larger esters.

These unusable protecting groups are difficult to remove under the existing reaction conditions. During reagent selection, reagents whose amines carry a protecting group that cannot be removed are discarded. Reagents with esters that cannot be hydrolyzed are, in principle, discarded as well. Thus, an amino protecting group on the reagent must be Fmoc, Boc, Nvoc or Alloc, and a carboxylic acid protecting group must be a methyl, ethyl or tert-butyl ester. Reagents whose protecting groups have no deprotection method yet developed for On-DNA chemistry must be removed.
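Condition 6 reduces to checking each protecting group against a whitelist. The sketch below is minimal and assumes the protecting groups of each reagent have already been identified and recorded as the hypothetical labels shown.

```python
# Whitelists from the text; the label spellings are assumptions.
ALLOWED_AMINE_PG = frozenset({"Fmoc", "Boc", "Nvoc", "Alloc"})
ALLOWED_ACID_PG = frozenset({"methyl", "ethyl", "tert-butyl"})

def protecting_groups_ok(amine_pgs, acid_pgs):
    """True only if every amine and carboxyl protecting group is whitelisted.

    Anything else (e.g. Cbz, Trt, benzyl or allyl esters) is treated as
    not removable under the available reaction conditions.
    """
    return set(amine_pgs) <= ALLOWED_AMINE_PG and set(acid_pgs) <= ALLOWED_ACID_PG

print(protecting_groups_ok(["Boc"], ["methyl"]))  # True
print(protecting_groups_ok(["Cbz"], []))          # False
```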
7. Reagents with very low reactivity must be removed
Acids, aldehydes, amines and similar reagents with very low reactivity are excluded. Examples are acids, aldehydes or amines at a tertiary carbon, or with a bulky, sterically hindering group at the ortho position. Amines whose base type and position resemble those in DNA (pyrimidine and purine amines) are likewise not considered. In short, the reactive functional group of an organic small-molecule reagent cannot be a low-reactivity carboxyl, aldehyde, primary amino or secondary amino group.
II. Reagent screening method and screening modules used in constructing a DNA-encoded compound library:
Before a DNA-encoded compound library is synthesized, the chemical structure information of the organic small-molecule reagents is collected. The reagents are then screened and classified on that basis, generally following the flow of FIG. 1. Each module runs a self-written computer program, and the input and output of each step are sdf files. Each resulting sdf file is imported into Excel, where de-duplication, screening and sorting are carried out with Excel's built-in functions. In practice, duplicate checking and classification can be interleaved in either order, depending on the number of reagents.
An sdf file is passed to one of these programs as follows:
1. Place the program and the sdf file to be processed in the same folder.
2. In that folder, hold Shift, right-click and choose "Open command window here".
3. At the command line, type the program name, then -i followed by the input file name, then -o followed by the output file name (for example, "<program>.exe -i <input>.sdf -o <output>.sdf").
4. Press Enter to start.

An sdf file with the output file name then appears in the folder.
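The command line described above takes an input file after -i and an output file after -o. The skeleton below shows one way such an interface could be declared in Python with argparse. It is a hypothetical sketch: the program name and the run function are illustrative. The actual SDF reading and writing is deliberately left out; a real tool might do it with RDKit's Chem.SDMolSupplier and Chem.SDWriter.

```python
import argparse

def build_parser(prog="InchiKeyCalc"):
    """Declare the -i/-o interface described in the text."""
    parser = argparse.ArgumentParser(prog=prog, description="Process an sdf file.")
    parser.add_argument("-i", dest="input", required=True, help="input .sdf file")
    parser.add_argument("-o", dest="output", required=True, help="output .sdf file")
    return parser

def run(argv=None):
    """Parse arguments; a real tool would read args.input and write args.output."""
    args = build_parser().parse_args(argv)
    if not args.input.lower().endswith(".sdf"):
        raise SystemExit(f"expected an .sdf input file, got {args.input}")
    return args.input, args.output

print(run(["-i", "reagents.sdf", "-o", "reagents_out.sdf"]))
```

When such a script is invoked from the command window, calling run() with no argument lets argparse read sys.argv directly.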
1. Collecting and merging the list of commercially available organic small molecule reagents
Reagent lists are obtained from different suppliers, for example the websites of reagent companies such as Expo network, Amine and Alfa. They are first organized according to a uniform rule to capture information on each small-molecule reagent: structural formula, SMILES, stock, price, supplier, MDL number, CAS number, MW and so on. All reagent lists are then merged into a single Excel workbook.
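Merging supplier lists mainly means mapping each supplier's column names onto one uniform set of fields. The sketch below is a hypothetical illustration of that step in plain Python. The field names, supplier names and per-supplier header maps are assumptions. The merged rows could then be written out with csv.DictWriter for opening in Excel.

```python
UNIFORM_FIELDS = ("supplier", "SMILES", "CAS", "MDL", "MW", "price", "stock")

def normalize_rows(supplier, rows, header_map):
    """Rename one supplier's columns (header_map: their name -> our name)."""
    out = []
    for row in rows:
        rec = dict.fromkeys(UNIFORM_FIELDS)  # missing fields stay None
        rec["supplier"] = supplier
        for src, dst in header_map.items():
            if src in row:
                rec[dst] = row[src]
        out.append(rec)
    return out

def merge_catalogs(catalogs):
    """catalogs: iterable of (supplier, rows, header_map) tuples."""
    merged = []
    for supplier, rows, header_map in catalogs:
        merged.extend(normalize_rows(supplier, rows, header_map))
    return merged

merged = merge_catalogs([
    ("SupplierA", [{"Smiles": "CC(=O)O", "CAS No.": "64-19-7"}],
     {"Smiles": "SMILES", "CAS No.": "CAS"}),
    ("SupplierB", [{"structure": "NCC(=O)O", "cas": "56-40-6", "USD": 12.5}],
     {"structure": "SMILES", "cas": "CAS", "USD": "price"}),
])
print(len(merged), merged[1]["CAS"], merged[1]["price"])  # 2 56-40-6 12.5
```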
2. Reagent duplicate-checking module: use the reagent duplicate-checking module to find duplicate entries of organic small-molecule reagents and remove them
An sdf file containing structural formula information is imported into the reagent duplicate-checking program "InchiKeyCalc.exe". The returned sdf file has an added "InChIKey" column, and duplicate entries are removed using Excel's COUNTIF function.
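The COUNTIF-based de-duplication on the InChIKey column can be mirrored in a few lines of Python. The sketch below is minimal and assumes each record already carries an InChIKey; RDKit's Chem.MolToInchiKey is one way to compute it. The placeholder key strings and function names are hypothetical.

```python
from collections import Counter

def add_duplicate_counts(records, key="InChIKey"):
    """Like an Excel COUNTIF column: how often each record's key occurs."""
    counts = Counter(r[key] for r in records)
    return [dict(r, dup_count=counts[r[key]]) for r in records]

def drop_explicit_duplicates(records, key="InChIKey"):
    """Keep the first record for each key, preserving the original order."""
    seen, kept = set(), []
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            kept.append(r)
    return kept

rows = [
    {"id": "A1", "InChIKey": "KEY-1"},  # placeholder keys, not real InChIKeys
    {"id": "B7", "InChIKey": "KEY-2"},
    {"id": "C3", "InChIKey": "KEY-1"},
]
print([r["dup_count"] for r in add_duplicate_counts(rows)])  # [2, 1, 2]
print([r["id"] for r in drop_explicit_duplicates(rows)])     # ['A1', 'B7']
```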
The specific code of one reagent duplicate-checking program, "InchiKeyCalc.exe", executed by the reagent duplicate-checking module is as follows:
[Figures BDA0001640561180000051, BDA0001640561180000061 and BDA0001640561180000071: code listing of "InchiKeyCalc.exe"]
Some reagents with different structural formulas ultimately contribute the same residue to the small-molecule portion of a DNA-encoded compound library. These are implicit duplicates and must also be removed. Examples include:
- the same organic amine carrying different amine protecting groups;
- different esters of the same carboxylic acid;
- the aryl bromide and aryl iodide of the same compound used in Suzuki, Buchwald or similar reactions.

Implicit duplicate checking therefore covers two comparisons: the reagent skeleton after protecting groups are removed, and the residual skeleton after the reacting functional groups are removed. An sdf file containing structural formula information is imported into the program "uniqInchiKey.exe", which checks for duplicates after removing protecting groups or functional groups. The returned sdf file has an added "uniqInChI" column in which all structures are described in the same InChI language. Duplicates are then removed by sorting in Excel and using the COUNTIF function.
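Once every record carries a uniqInChI value for its deprotected or de-functionalized core, implicit duplicates are simply records that share that value. The text does not specify which member of a duplicate group to keep. Purely for illustration, the sketch below assumes the cheapest offer is kept. Field names other than uniqInChI, and the placeholder core strings, are hypothetical.

```python
def pick_representatives(records, key="uniqInChI", price="price"):
    """Keep one record per implicit-duplicate group (cheapest, by assumption)."""
    best = {}
    for r in records:
        k = r[key]
        if k not in best or r[price] < best[k][price]:
            best[k] = r
    return list(best.values())

reagents = [
    {"name": "Boc-Gly-OH", "uniqInChI": "CORE-GLY", "price": 50.0},
    {"name": "Fmoc-Gly-OH", "uniqInChI": "CORE-GLY", "price": 30.0},
    {"name": "Boc-Ala-OH", "uniqInChI": "CORE-ALA", "price": 40.0},
]
print([r["name"] for r in pick_representatives(reagents)])  # ['Fmoc-Gly-OH', 'Boc-Ala-OH']
```

The same function can be applied to two concatenated reagent lists when implicit duplicates must be found across lists rather than within one.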
The reagent duplicate-checking module executes the program "uniqInchiKey.exe" to check for duplicates after removal of protecting groups or functional groups. Its specific code is as follows:
[Figures BDA0001640561180000072, BDA0001640561180000081, BDA0001640561180000091 and BDA0001640561180000101: code listing of "uniqInchiKey.exe"]
3. Reagent functional-group classification module: the module classifies organic small-molecule reagents by the type and/or number of functional groups and selects reagents whose functional groups meet the requirements. It also removes reagents with undesirable functional groups. The classification module may apply the following screening rules:
- The organic small-molecule reagent has at least one reactive functional group, selected from one or more of carboxyl, primary amino, secondary amino, boronic acid (ester), sulfonyl chloride, isocyanate, isothiocyanate, acyl chloride, anhydride, aldehyde, alkyne, azide, active aryl halide, phenol, alcohol and thiol. Reagents without any of these groups are removed.
- The reagent cannot have two or more identical reactive functional groups; reagents that do are removed.
- The reagent cannot have two or more different exposed functional groups that can react with each other under certain conditions; reagents that do are removed.
- An amino protecting group on the reagent must be Fmoc, Boc, Nvoc or Alloc. A carboxylic acid protecting group must be a methyl, ethyl or tert-butyl ester.
- The reactive functional group of the reagent cannot be a low-reactivity carboxyl, aldehyde, primary amino or secondary amino group.
An sdf file containing structural formula information is imported into the functional-group classification program "FG_classification.exe". The returned sdf file has two added columns, "Class" and "Type", and the required reagents are selected by sorting and filtering in Excel.
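The added "Class" and "Type" columns can be pictured as a count-based class plus a canonical string of group names. This is a hypothetical sketch. The class names, the alphabetical ordering and the joining character are assumptions, chosen so that the output resembles type names used in the example such as "acid-alkyne-Boc amine".

```python
from collections import defaultdict

CLASS_NAMES = {1: "monofunctional", 2: "bifunctional", 3: "trifunctional"}

def classify(groups):
    """Return (Class, Type) for one reagent's list of functional-group labels."""
    n = len(groups)
    cls = CLASS_NAMES.get(n, f"{n}-functional")
    typ = "-".join(sorted(groups, key=str.lower))  # order-independent name
    return cls, typ

def group_by_type(reagents):
    """reagents: mapping of reagent id -> list of group labels."""
    table = defaultdict(list)
    for rid, groups in reagents.items():
        table[classify(groups)].append(rid)
    return dict(table)

print(classify(["Boc amine", "alkyne", "acid"]))  # ('trifunctional', 'acid-alkyne-Boc amine')
```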
The functional-group classification program "FG_classification.exe" executed by the reagent functional-group classification module is as follows:
[Figures BDA0001640561180000111, BDA0001640561180000121, BDA0001640561180000131 and BDA0001640561180000141: code listing of "FG_classification.exe"]
4. Reagent scaffold classification module: the module classifies organic small-molecule reagents by small-molecule scaffold
A DNA-encoded compound library can be built in two ways. It can use reagents with the same type of functional group and the same type of chemical reaction. Alternatively, it can use reagents with the same type of small-molecule scaffold and different types of chemical reaction. The latter requires all reagents to be classified by scaffold.

To do this, an sdf file containing structural formula information is imported into the scaffold classification program "Scaffold_classification.exe". The returned sdf file has an added "Scaffold ID" column, and reagents with the required scaffold are selected by sorting and filtering in Excel. Classification by scaffold groups together reagents that share the same or similar fragments, typically fragments with some biological activity.
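Assigning a "Scaffold ID" can be sketched as numbering distinct scaffold strings in order of first appearance. The patent does not say how scaffolds are defined. One common choice, assumed here only for illustration, is the Bemis-Murcko scaffold, which RDKit can compute with rdkit.Chem.Scaffolds.MurckoScaffold.MurckoScaffoldSmiles. The sketch takes precomputed scaffold strings so that it runs without RDKit, and the function name is hypothetical.

```python
def assign_scaffold_ids(records, key="scaffold"):
    """Add a 'Scaffold ID' column: 1, 2, 3... for each new scaffold string."""
    ids = {}
    out = []
    for r in records:
        sid = ids.setdefault(r[key], len(ids) + 1)  # new scaffolds get the next ID
        rec = dict(r)
        rec["Scaffold ID"] = sid
        out.append(rec)
    return out

rows = [
    {"id": "R1", "scaffold": "c1ccccc1"},
    {"id": "R2", "scaffold": "c1ccncc1"},
    {"id": "R3", "scaffold": "c1ccccc1"},
]
print([r["Scaffold ID"] for r in assign_scaffold_ids(rows)])  # [1, 2, 1]
```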
The reagent scaffold classification module executes the program "Scaffold_classification.exe" to classify reagents by small-molecule scaffold. Its specific code is as follows:
[Figures BDA0001640561180000151, BDA0001640561180000161 and BDA0001640561180000171: code listing of "Scaffold_classification.exe"]
The four screening and classification steps above produce a list of screened and classified organic small-molecule reagents. This list can be sent to suppliers for purchase, or used to requisition reagents in batches, for synthesizing DNA-encoded compound libraries.
5. The list of organic small-molecule reagents obtained in the steps above can be checked and verified manually using organic chemistry and medicinal chemistry knowledge.
The computer coding method of the invention screens organic small-molecule reagents through three modules: a reagent duplicate-checking module, a reagent functional-group classification module and a reagent scaffold classification module. The four duplicate-checking and classification programs are one specific embodiment. They set rules that draw on organic chemistry, medicinal chemistry and chemoinformatics, and they are written in Python with the RDKit chemoinformatics toolkit. They allow a single person to collect, classify and de-duplicate thousands of reagents conveniently. The small-molecule reagents usable for synthesizing DNA-encoded compound libraries can be listed by category for convenient requisition, purchase and subsequent use.
The computer programs are written in Python with the RDKit toolkit and similar tools. Those skilled in the art could obtain equivalent programs without inventive effort by using other, similar computer tools with the same code, and all such programs fall within the scope of protection of the invention.
With its four computer programs, the method of the invention can efficiently, simply and rapidly de-duplicate and classify thousands of reagents. It obtains reagents that carry one or more specified types of functional groups or scaffolds for constructing DNA-encoded compound libraries.
Drawings
FIG. 1 is a flow chart of an embodiment of the computer screening method of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below, and it should be apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Example 1: Screening the existing reagents of the Expo network with the method of the invention to obtain a list of organic small-molecule reagents usable for synthesizing DNA-encoded compound libraries
1. Downloading the existing reagent list from the Expo network
After logging in to the website "https://www.labnetwork.com/frontend-app/p/#!/screening-sets", the sdf files of all small-molecule reagents offered online by the site can be downloaded: 4,786,438 reagents in total as of 29 March 2018. Because of the computation time involved, only 1% of the data (i.e., 47,864 reagents) were randomly sampled in this example. This sample was processed with the method, and the results are shown below.
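A reproducible 1% random sample, as used in this example, could be drawn as follows. The sketch samples row indices with Python's random module. The seed and function name are assumptions, since the text does not say how the sample was drawn.

```python
import random

def sample_fraction(n_total, fraction=0.01, seed=2018):
    """Return sorted, distinct row indices for a random sample of n_total rows."""
    k = int(n_total * fraction)  # 4,786,438 rows -> 47,864 at 1%
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), k))

indices = sample_fraction(4_786_438)
print(len(indices))  # 47864
```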
2. Classifying the 1% sample of Expo network reagents by functional group
The reagent functional-group classification module classified the 47,864 sampled reagents with the program "FG_classification.exe" of the invention. It yielded the following reagents usable for constructing DNA-encoded compound libraries: 9 types of monofunctional reagents, 21,216 types of bifunctional reagents, 3,110 types of trifunctional reagents and 482 types of trifunctional reagents.
[Figures BDA0001640561180000191, BDA0001640561180000192, BDA0001640561180000193 and BDA0001640561180000201: functional-group classification results]
3. Checking the Expo network reagents for duplicates
The classified organic small-molecule reagents are then checked for duplicates by the reagent duplicate-checking module. Duplicate checking is of two kinds, explicit and implicit. The module first uses the program "InchiKeyCalc.exe" for explicit duplicate checking, then the program "uniqInchiKey.exe" for implicit duplicate checking. The results are as follows:
[Figures BDA0001640561180000202, BDA0001640561180000203, BDA0001640561180000204 and BDA0001640561180000211: duplicate-checking results]
The duplicate checking above is carried out only within a single reagent list. Sometimes several different kinds of bifunctional or trifunctional reagents are used together to build one DNA-encoded compound library. In that case, implicit duplicates among these kinds must also be found. For example, if acid-alkyne-Boc amine and acid-alkyne-Fmoc amine reagents are used for the same library, the two lists are merged and checked for implicit duplicates with the program "uniqInchiKey.exe". Repeated reagent skeletons are removed before library production.
After explicit and implicit duplicates were removed, the 1% sample from the Expo network yielded a total of 1,028 reagents of types such as "Boc amine-acid" and "Fmoc amine-acid". Suppose these were used to build a three-cycle tripeptide DNA-encoded compound library, without considering the validated On-DNA reaction yields of these reagents. The library would then theoretically contain 1028^3 = 1,086,373,952 compounds.
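The theoretical library size is the product of the building-block counts used in each cycle, ignoring On-DNA reaction yields as the text does. The sketch below is a minimal check of that figure, and the function name is hypothetical.

```python
from math import prod

def library_size(blocks_per_cycle):
    """Theoretical number of compounds: product of building blocks per cycle."""
    return prod(blocks_per_cycle)

print(library_size([1028, 1028, 1028]))  # 1086373952
```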
Microsoft Office Excel 2013 and MDL ISIS/Draw 2.5 SP4 were used in this work.
The invention uses the Python language and the RDKit toolkit. Computer programs with the same or highly similar coding methods and functions that are obtained with other, similar software should be understood as falling within the scope of the invention.
In summary, the above embodiments are merely preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A computer coding method for screening organic small molecule chemical reagents for constructing a DNA coding compound library is characterized in that,
collecting chemical structure information of the organic small molecule chemical reagent, searching repeated data of the organic small molecule reagent by using a reagent duplication checking module according to the chemical structure information, wherein the repeated data comprises dominant duplication checking and invisible duplication checking, and removing the repeated data; wherein, dominant repetition of the organic small molecule reagent refers to a reagent with completely consistent structure; invisible duplication of organic small molecule agents refers to agents whose structures are not completely identical, but whose residues or backbones provided in the construction of the small molecule compound portion of the final DNA-encoding compound library are modular;
using a reagent functional-group classification module to classify the organic small-molecule reagents according to the type and/or number of their functional groups;
using a reagent skeleton classification module to classify the organic small-molecule reagents according to their small-molecule skeleton;
screening out the organic small-molecule reagents whose functional groups meet the requirements;
the screening rules are as follows: the molecular weight of the organic small-molecule reagent is less than 1000, and its molecular weight after removal of protecting groups, salts and reactive functional groups is less than or equal to 350;
the organic small-molecule reagent has at least one reactive functional group, selected from one or more of carboxyl, primary amino, secondary amino, boronic ester, sulfonyl chloride, isocyanate, isothiocyanate, acyl chloride, anhydride, aldehyde, alkyne, azide, activated aryl halide, phenol, alcohol and thiol; reagents having none of these reactive functional groups are removed;
the organic small-molecule reagent may not have two or more identical reactive functional groups; reagents having two or more identical reactive functional groups are removed;
the organic small-molecule reagent may not have two or more different exposed reactive functional groups that can react with each other under certain conditions; reagents having such mutually reactive exposed functional groups are removed;
when the organic small-molecule reagent contains an amino protecting group, the amino protecting group is selected from Fmoc, Boc, Nvoc and Alloc; when it contains a carboxylic acid protecting group, the carboxylic acid protecting group is selected from methyl ester, ethyl ester and tert-butyl ester;
when the organic small-molecule reagent is in salt form, it may only be a hydrochloride or hydrobromide;
the reactive functional group of the organic small-molecule reagent may not be a carboxyl, aldehyde, amino or activated aryl halide group of low reactivity.
2. The method of claim 1, wherein the computer programs for the explicit and implicit duplicate checks can be integrated to perform both types of check together, or used separately to perform each type of check.
3. The method of claim 1, wherein, in addition to checking for identical structural formulas, the reagent duplicate check further comprises checking for duplicate reagent backbones remaining after removal of protecting groups and salts.
4. The method of claim 1, wherein, in addition to checking for identical structural formulas, the reagent duplicate check further comprises checking for duplicate skeletons remaining after removal of the functional groups that take part in the reaction.
5. The method of claim 2, wherein the computer program, written in a computer language, for classifying organic small-molecule reagents can classify the reagents both according to their functional groups and according to their skeletons;
wherein classification according to functional groups specifically means classification by the number and types of functional groups, and classification according to the reagent skeleton means grouping reagents whose fragments are the same or similar and all have a certain biological activity.
6. The method of claim 5, wherein the computer programs for classification by functional group and for classification by skeleton are integrated to perform both classifications simultaneously, or are used separately to perform each classification.
7. The method of claim 5, wherein the reagent classification is by number of functional groups.
8. The method of claim 5, wherein the reagent classification is by functional group type.
9. The method of claim 5, wherein the reagent classification is by molecular framework.
10. The method of claim 5, wherein the reagent classification is by molecular framework and functional group type.
11. The method of claim 5, wherein the reagent classification is by molecular framework and number of functional groups.
12. The method of claim 1, wherein reagents of the same class, obtained after duplicate checking and classification of the organic small-molecule reagents, are used to construct DNA-encoded compound libraries.
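The screening rules of claim 1 and the classification of claims 5 to 11 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented program: each reagent is assumed to arrive with precomputed properties (molecular weights, counts of exposed reactive groups, protecting groups, salt form, a skeleton key and a low-reactivity flag), which in practice would come from a cheminformatics toolkit such as RDKit. The group labels, the `MUTUALLY_REACTIVE` pair list and all function and field names are hypothetical; the claims do not enumerate which groups react with each other or how low reactivity is judged.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional

# Reactive functional groups listed in claim 1 (labels are illustrative).
ALLOWED_REACTIVE = {
    "carboxyl", "primary_amine", "secondary_amine", "boronic_ester",
    "sulfonyl_chloride", "isocyanate", "isothiocyanate", "acyl_chloride",
    "anhydride", "aldehyde", "alkyne", "azide", "activated_aryl_halide",
    "phenol", "alcohol", "thiol",
}
ALLOWED_AMINE_PG = {"Fmoc", "Boc", "Nvoc", "Alloc"}
ALLOWED_ACID_PG = {"methyl_ester", "ethyl_ester", "tert_butyl_ester"}
ALLOWED_SALTS = {None, "HCl", "HBr"}  # None = not a salt
# Groups that disqualify a reagent when flagged as having low reactivity.
LOW_REACTIVITY_CHECKED = {
    "carboxyl", "aldehyde", "primary_amine", "secondary_amine", "activated_aryl_halide",
}
# Hypothetical, non-exhaustive set of exposed group pairs assumed to react with each other.
MUTUALLY_REACTIVE = {
    frozenset(pair) for pair in [
        ("primary_amine", "acyl_chloride"), ("primary_amine", "aldehyde"),
        ("primary_amine", "isocyanate"), ("primary_amine", "sulfonyl_chloride"),
        ("secondary_amine", "acyl_chloride"), ("thiol", "isothiocyanate"),
    ]
}


@dataclass
class Reagent:
    reagent_id: str
    mw: float               # molecular weight as supplied
    core_mw: float          # MW after removing protecting groups, salt and reactive groups
    reactive_groups: dict   # exposed (unprotected) reactive group -> count
    scaffold: str = ""      # hypothetical skeleton key used for classification
    amine_pgs: list = field(default_factory=list)
    acid_pgs: list = field(default_factory=list)
    salt: Optional[str] = None
    low_reactivity: set = field(default_factory=set)  # groups judged poorly reactive


def failed_rules(r):
    """Return the screening rules a reagent violates; an empty list means keep it."""
    groups = {g for g, n in r.reactive_groups.items() if n > 0}
    failures = []
    if r.mw >= 1000:
        failures.append("mw>=1000")
    if r.core_mw > 350:
        failures.append("core_mw>350")
    if not groups & ALLOWED_REACTIVE:
        failures.append("no_allowed_reactive_group")
    if any(n >= 2 for n in r.reactive_groups.values()):
        failures.append("repeated_reactive_group")
    if any(frozenset(p) in MUTUALLY_REACTIVE for p in combinations(sorted(groups), 2)):
        failures.append("mutually_reactive_groups")
    if not (set(r.amine_pgs) <= ALLOWED_AMINE_PG and set(r.acid_pgs) <= ALLOWED_ACID_PG):
        failures.append("disallowed_protecting_group")
    if r.salt not in ALLOWED_SALTS:
        failures.append("disallowed_salt")
    if r.low_reactivity & LOW_REACTIVITY_CHECKED:
        failures.append("low_reactivity")
    return failures


def classify(reagents):
    """Group reagents by (functional-group types, number of groups, skeleton key)."""
    classes = defaultdict(list)
    for r in reagents:
        types = tuple(sorted(g for g, n in r.reactive_groups.items() if n > 0))
        classes[(types, sum(r.reactive_groups.values()), r.scaffold)].append(r.reagent_id)
    return dict(classes)


# Illustrative records only; the numbers are not real measurements.
candidates = [
    Reagent("R1", 175.2, 60.0, {"carboxyl": 1}, "acyclic", amine_pgs=["Boc"]),
    Reagent("R2", 420.0, 380.0, {"carboxyl": 1}, "biaryl"),                      # residue too heavy
    Reagent("R3", 210.0, 120.0, {"primary_amine": 1, "aldehyde": 1}, "phenyl"),  # self-reactive
    Reagent("R4", 250.0, 150.0, {"carboxyl": 2}, "phenyl"),                      # repeated group
    Reagent("R5", 180.0, 90.0, {"azide": 1}, "phenyl", salt="TFA"),              # disallowed salt
    Reagent("R6", 160.0, 80.0, {"alkyne": 1}, "phenyl", salt="HCl"),
]
kept = [r for r in candidates if not failed_rules(r)]
print([r.reagent_id for r in kept])  # ['R1', 'R6']
print(classify(kept))
```

Returning the list of violated rules rather than a bare boolean makes it easy to report why a reagent was removed. Reducing the classification key to one or two of its components gives the single- and two-criterion classifications of claims 7 to 11.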
CN201810378969.0A 2018-04-25 2018-04-25 Computer coding method for screening reagent of DNA coding compound library Active CN108959855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810378969.0A CN108959855B (en) 2018-04-25 2018-04-25 Computer coding method for screening reagent of DNA coding compound library

Publications (2)

Publication Number Publication Date
CN108959855A (en) 2018-12-07
CN108959855B (en) 2021-05-18

Family

ID=64498882

Country Status (1)

Country Link
CN (1) CN108959855B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109680342B (en) * 2018-12-18 2021-09-28 上海药明康德新药开发有限公司 Method for reducing On-DNA aromatic nitro compound in DNA coding compound library into On-DNA aromatic amine compound

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1151793A (en) * 1994-04-13 1997-06-11 纽约市哥伦比亚大学信托人 Complex combinatorial chemical libraries encoded with tags
CN102971434A (en) * 2010-08-11 2013-03-13 中国科学院心理研究所 High-throughput sequencing method for methylated DNA and use thereof
CN103664986A (en) * 2013-11-28 2014-03-26 江苏康缘药业股份有限公司 Antineoplastic compound extracted from gamboges, preparation method and application of antineoplastic compound
US20150274755A1 (en) * 2012-09-25 2015-10-01 Shane W. Krska Compound diversification using late stage functionalization
CN106776755A (en) * 2016-11-16 2017-05-31 盐城工学院 A subject-oriented information control system
CN107766530A (en) * 2017-10-27 2018-03-06 北京再塑宝科技有限公司 A method and device for distributing collected data
CN107832577A (en) * 2017-10-30 2018-03-23 中国农业大学 A method for screening inhibitors of chitinase OfCht I

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant