CN110400607A - Method for expanding a molecular formula library - Google Patents
- Publication number
- CN110400607A
- Authority
- CN
- China
- Prior art keywords
- library
- molecular formula
- molecules
- fingerprint
- bblist
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
- G16C20/62—Design of libraries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medicinal Chemistry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for expanding a molecular formula library, applied to an own molecular library and a new molecular library. The own molecular library is the library already held; it stores the complete molecular formula and the fingerprint of each molecule. The new molecular library is a newly uploaded library made up of several sub-libraries. Because splicing multiple sub-libraries into a complete molecular library is time-consuming, the invention, when comparing the own molecular library with the new molecular library, splices only the molecular formulas that must be spliced and filters out those that can be shown, without splicing, not to overlap with the own molecular library. The molecular formulas identical to ones in the own molecular library are then identified, and the remaining, different molecular formulas are added to the own molecular library, achieving rapid expansion of the molecular formula library.
Description
Technical Field
The invention belongs to the technical field of molecular compound management, and in particular relates to a method for expanding a molecular formula library.
Background Art
In managing molecular compounds, one needs a thorough knowledge of the molecular formulas in the database: for each newly added chemical formula, it must be known whether it already exists in the own molecular library. In a production environment, new chemical formulas are usually supplied as multiple sub-libraries, each typically containing thousands to tens of thousands of molecular formulas. With current molecule splicing methods, splicing sub-libraries of this scale on a single machine often takes tens of days, and running in parallel on multiple machines also consumes a large amount of computing resources, so production requirements cannot be met. So far there is no optimization scheme aimed at this scenario.
Summary of the Invention
In view of the above deficiencies of the prior art, the object of the invention is to provide an online method for expanding a chemical molecular formula library.
The object of the invention is achieved through the following technical solution: a method for expanding a molecular formula library, applied to an own molecular library and a new molecular library. The own molecular library consists of mol_i {i = 1, 2, …, n}, where n is the number of chemical molecular formulas in the own molecular library. The new molecular library consists of BBlist_j {j = 1, 2, …, m}, where m is the number of sub-libraries BBlist_j in the new molecular library; each sub-library BBlist_j consists of M_j chemical molecular formulas, denoted BBlist_j_k {k = 1, 2, …, M_j}. The expansion process comprises the following steps:
(1) The own molecular library stores the complete molecular formula mol_i {i = 1, 2, …, n} and the fingerprint fp_i {i = 1, 2, …, n} of each molecule, together with the fingerprint fp_all obtained by a bitwise OR over all of the fingerprints.
(2) For a new molecular library, compute the fingerprint fp_j_k of each molecular formula BBlist_j_k in each BBlist_j, j = 1, 2, …, m, k = 1, 2, …, M_j.
(3) Screen the molecular formulas in the BBlists: compare each fingerprint fp_j_k with fp_all. If fp_j_k has a bit that fp_all lacks, i.e. the bit is 1 in fp_j_k and 0 in fp_all, remove BBlist_j_k and fp_j_k so that they take no part in subsequent computation. During screening, if a fingerprint fp_j_k is not filtered out, compare it with each fp_i and record the set FP_K of fingerprints fp_i that contain all of the bits of fp_j_k.
(4) Perform one splicing step on the molecular formulas in the screened BBlists, and continue screening during splicing. Specifically: splice two BBlist_j; after each splice, merge the FP_K sets of the two molecular formulas to obtain the new FP_K set of the spliced molecular formula, and carry out the screening of step (3) within the new FP_K set.
(5) Repeat step (4) until the complete new molecular library is obtained, together with the set FP_K in the own molecular library that corresponds to each molecular formula of the new molecular library.
(6) Screen out the molecular formulas in the new molecular library that duplicate ones in the own molecular library, and add the remaining, different molecular formulas to the own molecular library, thereby expanding the molecular formula library. The screening proceeds as follows: for each molecular formula remaining in the new molecular library, search its corresponding FP_K for a duplicate. First check whether the fingerprint of the molecular formula is identical to the fingerprint of some molecular formula in FP_K. If the fingerprints are identical, the two molecular formulas are matched exactly; otherwise that candidate is filtered out. During exact matching, the candidate is likewise filtered out if the match fails.
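Steps (1) to (3) can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes each fingerprint is held as a Python integer used as a bit vector, and the function names (`build_fp_all`, `passes_fp_all`, `candidates`, `screen_sublibrary`) and the toy 4-bit values are hypothetical.

```python
from functools import reduce
from operator import or_


def build_fp_all(own_fps):
    """Step (1): bitwise OR of every fingerprint in the own library."""
    return reduce(or_, own_fps, 0)


def passes_fp_all(fp, fp_all):
    """Step (3), first test: False if fp sets any bit that fp_all lacks."""
    return (fp & ~fp_all) == 0


def candidates(fp, own_fps):
    """Step (3), second test: indices i whose fp_i contains every bit of fp."""
    return {i for i, fp_i in enumerate(own_fps) if (fp & fp_i) == fp}


def screen_sublibrary(bb_fps, own_fps, fp_all):
    """Map each surviving building-block index k to its candidate set FP_K."""
    return {
        k: candidates(fp, own_fps)
        for k, fp in enumerate(bb_fps)
        if passes_fp_all(fp, fp_all)
    }


# Toy 4-bit fingerprints (illustrative values, not taken from the source).
own_fps = [0b1010, 0b0110]
fp_all = build_fp_all(own_fps)      # 0b1110
bb_fps = [0b0010, 0b0001, 0b1000]   # the second sets a bit that fp_all lacks
kept = screen_sublibrary(bb_fps, own_fps, fp_all)
```

Here the second building block is dropped by the fp_all test, while the other two keep a set of own-library indices that later steps narrow down.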
Further, in step (4), only molecular formulas bearing the same site label can be spliced. A site in a molecular formula is written [i], i = 0, 1, …, N; each [i] corresponds to a reaction type and is obtained by carrying out a chemical reaction of that type with a reagent molecule, and N is the total number of reaction types.
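The site-label rule can be illustrated with a short check. This is a sketch under assumptions: building blocks are taken to be strings in which a site label is a purely numeric bracket such as [1], and the helper names `site_labels` and `can_splice` and the example strings are hypothetical. It only decides whether two building blocks share a label; it does not perform the splice itself.

```python
import re

# A site label is assumed to be a bracket holding digits only, e.g. "[1]".
SITE_LABEL = re.compile(r"\[(\d+)\]")


def site_labels(formula):
    """Return the set of site labels found in a building-block string."""
    return {int(label) for label in SITE_LABEL.findall(formula)}


def can_splice(a, b):
    """Two building blocks are spliceable only if they share a site label."""
    return bool(site_labels(a) & site_labels(b))


# Illustrative strings following the [i] notation.
print(can_splice("[1]C=O", "[2]CCO[1]"))   # shares label 1
print(can_splice("[1]C=O", "[2]CCO[3]"))   # no label in common
```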
Further, in step (4), the screening of step (3) within the new FP_K set is specifically: compute a new fingerprint for the spliced molecular formula and first screen it against fp_all; if it is not filtered out, select from the new FP_K set the subset FP_K_sub of molecular formulas that contain all of the bits of the new fingerprint.
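The screening applied after each splice might look like the sketch below. Several points are assumptions: fingerprints are Python integers used as bit vectors, each FP_K is a set of indices into the own library, "merged" is read as set union, and the function name `screen_spliced` is hypothetical. How the new fingerprint of the spliced formula is computed is outside the sketch.

```python
def screen_spliced(new_fp, fp_k_a, fp_k_b, own_fps, fp_all):
    """Return FP_K_sub for a spliced formula, or None if it is filtered out.

    new_fp           fingerprint computed for the spliced formula
    fp_k_a, fp_k_b   candidate index sets (FP_K) of the two spliced parts
    own_fps          fingerprints fp_i of the own library
    fp_all           bitwise OR of all fp_i
    """
    # First test: a bit present in new_fp but absent from fp_all rules it out.
    if new_fp & ~fp_all:
        return None
    # Merge the two FP_K sets (union is an assumption), then keep only the
    # own-library entries whose fingerprint contains every bit of new_fp.
    merged = fp_k_a | fp_k_b
    return {i for i in merged if (new_fp & own_fps[i]) == new_fp}


# Toy values (illustrative only).
own_fps = [0b1010, 0b0110, 0b1110]
fp_all = 0b1110
fp_k_sub = screen_spliced(0b1010, {0, 2}, {0, 1, 2}, own_fps, fp_all)
rejected = screen_spliced(0b1011, {0, 2}, {0, 1, 2}, own_fps, fp_all)
```

Restricting the containment test to the merged set, rather than the whole own library, is what keeps the per-splice cost small.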
Further, in step (6), exact matching uses SMARTS pattern matching.
The beneficial effects of the invention are as follows: the method screens a newly uploaded new molecular library and adds the molecular formulas that remain after screening to the own molecular library already held, thereby expanding the molecular library. A new molecular library usually consists of several sub-libraries, which must be spliced into a complete molecular library before screening can take place; because splicing multiple sub-libraries is time-consuming, the invention provides a way to accelerate this process.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawing and an embodiment. It should be understood that the specific embodiment described here merely illustrates the invention and does not limit it.
As shown in Fig. 1, the method for expanding a molecular formula library provided by the invention is as follows:
The method involves two molecular libraries, each storing a large number of chemical molecular formulas: the own molecular library and the new molecular library. The own molecular library consists of mol_i {i = 1, 2, …, n}, where n is the number of chemical molecular formulas in the own molecular library. The new molecular library consists of BBlist_j {j = 1, 2, …, m}, where m is the number of sub-libraries BBlist_j in the new molecular library; each sub-library BBlist_j consists of M_j chemical molecular formulas, and each molecular formula is a building block, i.e. BBlist_j_k {k = 1, 2, …, M_j}. In the actual online service, the own molecular library is the library already held and stores the complete molecular formula and fingerprint of each molecule; the new molecular library is a newly uploaded library consisting of several sub-libraries BBlist_j. Because splicing multiple BBlist_j into a complete molecular library is time-consuming, when the own molecular library is compared with the new molecular library only the molecular formulas that must be spliced are spliced, and those that can be shown, without splicing, not to overlap are filtered out. The molecular formulas identical to ones in the own molecular library are identified, and the remaining, different molecular formulas are added to the own molecular library, expanding the molecular formula library. The expansion process comprises the following steps:
1. The own molecular library stores the complete molecular formula mol_i {i = 1, 2, …, n} and the fingerprint fp_i {i = 1, 2, …, n} of each molecule, together with the fingerprint fp_all obtained by a bitwise OR over all of the fingerprints.
2. For a new molecular library, compute the fingerprint fp_j_k of each molecular formula BBlist_j_k in each BBlist_j, j = 1, 2, …, m, k = 1, 2, …, M_j.
3. Screen the molecular formulas in the BBlists: compare each fingerprint fp_j_k with fp_all. If fp_j_k has a bit that fp_all lacks, i.e. the bit is 1 in fp_j_k and 0 in fp_all, remove BBlist_j_k and fp_j_k so that they take no part in subsequent computation. A bit set to 1 in a fingerprint indicates that the molecule has a particular atom sequence or attribute. For a given bit, a value of 1 in fp_j_k and 0 in fp_all therefore means that fp_j_k has an atom sequence or attribute that fp_all lacks; that is, this atom sequence or attribute does not occur anywhere in the own molecular library, so fp_j_k cannot correspond to any mol_i.
During screening, if a fingerprint fp_j_k is not filtered out, it is compared with each fp_i, and the set FP_K of fingerprints fp_i that contain all of the bits of fp_j_k is recorded.
4. Perform one splicing step on the molecular formulas in the screened BBlists, and continue screening during splicing:
A complete molecular formula is obtained by splicing all of BBlist_j {j = 1, 2, …, m}: each molecular formula in BBlist_1 is spliced with every molecular formula in BBlist_2 with which it can be spliced, giving BBlist_1_2; each molecular formula in BBlist_1_2 is then spliced with every spliceable molecular formula in BBlist_3, giving BBlist_1_2_3. A site in a molecular formula is written [i], i = 0, 1, …, N; each [i] corresponds to a reaction type and is obtained by carrying out a chemical reaction of that type with a reagent molecule, and N is the total number of reaction types. Only molecular formulas bearing the same site label can be spliced. For example, the molecule [1]C=O in BBlist_1 and [2]CCO[1] in BBlist_2 share the site label [1] and can be spliced, whereas [1]C=O cannot be spliced with [2]CCO[3].
In this step, two BBlist_j are spliced, for example BBlist_1 and BBlist_2. Assume that the splicing order that forms the complete molecular library is 1, 2, …, m: 1 is first spliced with 2, the result is then spliced with 3, and so on until the m-th BBlist has been spliced, giving the complete molecular library.
After each splice, the FP_K sets of the two molecular formulas are merged to obtain the new FP_K set of the spliced molecular formula, and the screening of step 3 is carried out within the new FP_K set: a new fingerprint is computed for the spliced molecular formula and first screened against fp_all; if it is not filtered out, the subset FP_K_sub of molecular formulas that contain all of the bits of the new fingerprint is selected from the new FP_K set.
5. Repeat step 4 until the complete new molecular library is obtained, together with the set FP_K in the own molecular library that corresponds to each molecular formula of the new molecular library.
6. Screen out the molecular formulas in the new molecular library that duplicate ones in the own molecular library, and add the remaining, different molecular formulas to the own molecular library, thereby expanding the molecular formula library. The screening proceeds as follows: for each molecular formula remaining in the new molecular library, search its corresponding FP_K for a duplicate. First check whether the fingerprint of the molecular formula is identical to the fingerprint of some molecular formula in FP_K. If the fingerprints are identical, the two molecular formulas are matched exactly; otherwise that candidate is filtered out. During exact matching, the candidate is likewise filtered out if the match fails. The exact matching method may use SMARTS pattern matching.
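The final duplicate check and library extension of step 6 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the own library is a list of (formula, fingerprint) pairs, each FP_K is a set of indices into that list, and `extend_library` and its `exact_match` callback are hypothetical names. Plain string equality stands in for the exact matching stage; in practice a SMARTS-based matcher from a cheminformatics toolkit would be passed in instead.

```python
def extend_library(own, new_entries, exact_match=lambda a, b: a == b):
    """Append to `own` every new entry that has no duplicate among its FP_K.

    own          list of (formula, fingerprint) pairs already held
    new_entries  list of (formula, fingerprint, fp_k) triples, where fp_k
                 is a set of indices into `own`
    exact_match  hypothetical callback deciding true identity; the default
                 string comparison is only a placeholder
    Returns the list of (formula, fingerprint) pairs that were added.
    """
    added = []
    for formula, fp, fp_k in new_entries:
        # A candidate counts as a duplicate only if its fingerprint is
        # identical AND the exact match succeeds.
        duplicate = any(
            own[i][1] == fp and exact_match(formula, own[i][0])
            for i in fp_k
        )
        if not duplicate:
            added.append((formula, fp))
    own.extend(added)
    return added


# Toy data (illustrative formulas and 4-bit fingerprints).
own = [("CCO", 0b1010), ("CC=O", 0b0110)]
new_entries = [
    ("CCO", 0b1010, {0}),      # same fingerprint, exact match: duplicate
    ("CCN", 0b1010, {0}),      # same fingerprint, exact match fails: new
    ("CCC", 0b0010, {0, 1}),   # no identical fingerprint: new
]
added = extend_library(own, new_entries)
```

Comparing fingerprints for equality first means the more expensive exact match runs only on the few candidates that survive.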
It should be understood that the summary and the specific embodiment are intended to demonstrate the practical application of the technical solution provided by the invention and should not be construed as limiting its scope of protection. Any modification or change made to the invention within its spirit and the scope of protection of the claims falls within the scope of protection of the invention.
Claims (4)
1. A method for expanding a molecular formula library, characterized in that it is applied to an own molecular library and a new molecular library; the own molecular library consists of mol_i {i = 1, 2, …, n}, where n is the number of chemical molecular formulas in the own molecular library; the new molecular library consists of BBlist_j {j = 1, 2, …, m}, where m is the number of sub-libraries BBlist_j in the new molecular library; each sub-library BBlist_j consists of M_j chemical molecular formulas, denoted BBlist_j_k {k = 1, 2, …, M_j}; and the expansion process comprises the following steps:
(1) The own molecular library stores the complete molecular formula mol_i {i = 1, 2, …, n} and the fingerprint fp_i {i = 1, 2, …, n} of each molecule, together with the fingerprint fp_all obtained by a bitwise OR over all of the fingerprints.
(2) For a new molecular library, compute the fingerprint fp_j_k of each molecular formula BBlist_j_k in each BBlist_j, j = 1, 2, …, m, k = 1, 2, …, M_j.
(3) Screen the molecular formulas in the BBlists: compare each fingerprint fp_j_k with fp_all; if fp_j_k has a bit that fp_all lacks, i.e. the bit is 1 in fp_j_k and 0 in fp_all, remove BBlist_j_k and fp_j_k so that they take no part in subsequent computation. During screening, if a fingerprint fp_j_k is not filtered out, compare it with each fp_i and record the set FP_K of fingerprints fp_i that contain all of the bits of fp_j_k.
(4) Perform one splicing step on the molecular formulas in the screened BBlists, and continue screening during splicing, specifically: splice two BBlist_j; after each splice, merge the FP_K sets of the two molecular formulas to obtain the new FP_K set of the spliced molecular formula, and carry out the screening of step (3) within the new FP_K set.
(5) Repeat step (4) until the complete new molecular library is obtained, together with the set FP_K in the own molecular library that corresponds to each molecular formula of the new molecular library.
(6) Screen out the molecular formulas in the new molecular library that duplicate ones in the own molecular library, and add the remaining, different molecular formulas to the own molecular library, thereby expanding the molecular formula library, wherein the screening proceeds as follows: for each molecular formula remaining in the new molecular library, search its corresponding FP_K for a duplicate; first check whether the fingerprint of the molecular formula is identical to the fingerprint of some molecular formula in FP_K; if the fingerprints are identical, the two molecular formulas are matched exactly, and otherwise that candidate is filtered out; during exact matching, the candidate is likewise filtered out if the match fails.
2. The method for expanding a molecular formula library according to claim 1, characterized in that, in step (4), only molecular formulas bearing the same site label can be spliced; a site in a molecular formula is written [i], i = 0, 1, …, N; each [i] corresponds to a reaction type and is obtained by carrying out a chemical reaction of that type with a reagent molecule; and N is the total number of reaction types.
3. The method for expanding a molecular formula library according to claim 1, characterized in that, in step (4), the screening of step (3) within the new FP_K set is specifically: compute a new fingerprint for the spliced molecular formula and first screen it against fp_all; if it is not filtered out, select from the new FP_K set the subset FP_K_sub of molecular formulas that contain all of the bits of the new fingerprint.
4. The method for expanding a molecular formula library according to claim 1, characterized in that, in step (6), exact matching uses SMARTS pattern matching.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910645529.1A CN110400607B (en) | 2019-07-17 | 2019-07-17 | Method for expanding molecular library |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110400607A true CN110400607A (en) | 2019-11-01 |
CN110400607B CN110400607B (en) | 2020-06-02 |
Family
ID=68325775
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910645529.1A Active CN110400607B (en) | 2019-07-17 | 2019-07-17 | Method for expanding molecular library |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110400607B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1908774A1 (en) * | 2006-10-06 | 2008-04-09 | sanofi-aventis | Antibacterial and antiviral peptides from Actinomadura namibiensis |
CN104750761A (en) * | 2013-12-31 | 2015-07-01 | 上海致化化学科技有限公司 | Method for creating molecular structure databases and method for searching same |
CN108149325A (en) * | 2016-12-02 | 2018-06-12 | 杭州阿诺生物医药科技股份有限公司 | The synthesis of DNA encoding dynamic library of molecules and screening technique |
CN109686413A (en) * | 2018-12-24 | 2019-04-26 | 杭州费尔斯通科技有限公司 | A kind of chemical molecular formula search method based on es inverted index |
Also Published As
Publication number | Publication date |
---|---|
CN110400607B (en) | 2020-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108509969B (en) | Data labeling method and terminal | |
CN105446723B (en) | Method and apparatus for identifying the semantic differential between source code version | |
CN106033436B (en) | Database merging method | |
CN104461842A (en) | Log similarity based failure processing method and device | |
CN103337113B (en) | Method and device for intelligently analyzing electronic day-to-day journals, as well as processor | |
CN109144882A (en) | A kind of software fault positioning method and device based on program invariants | |
CN105224543A (en) | For the treatment of seasonal effect in time series method and apparatus | |
CN104636130B (en) | For generating the method and system of event tree | |
CN110515986B (en) | Processing method and device of social network diagram and storage medium | |
CN106462620A (en) | Distance queries on massive networks | |
CN105335246B (en) | A kind of program crashing defect self-repairing method based on question and answer web analytics | |
CN110674360B (en) | Tracing method and system for data | |
CN101882135B (en) | Data processing method and device | |
CN108415846A (en) | A kind of method and apparatus generating minimal automation test use cases | |
CN110990520B (en) | Address coding method and device, electronic equipment and storage medium | |
KR20190039758A (en) | Data analysis support device and data analysis support system | |
CN105279089B (en) | A kind of method and device for obtaining page elements | |
CN108363684A (en) | List creation method, device and server | |
CN110750588A (en) | Multi-source heterogeneous data fusion method, system, device and storage medium | |
CN108345658B (en) | Decomposition processing method of algorithm calculation track, server and storage medium | |
CN110737779B (en) | Knowledge graph construction method and device, storage medium and electronic equipment | |
Ashraf et al. | WeFreS: weighted frequent subgraph mining in a single large graph | |
CN104748757A (en) | Data updating method and device for navigation electronic map | |
Bi et al. | MM-GNN: Mix-moment graph neural network towards modeling neighborhood feature distribution | |
CN107426610A (en) | Video information synchronous method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CP01 | Change in the name or title of a patent holder | Address after: 7 / F, building B, 482 Qianmo Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province 310000; Patentee after: Huoshi Creation Technology Co.,Ltd.; Address before: 7 / F, building B, 482 Qianmo Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province 310000; Patentee before: HANGZHOU FIRESTONE TECHNOLOGY Co.,Ltd. |