CN114078570A - Chemical molecular structure retrieval system - Google Patents
- Publication number
- CN114078570A
- Authority
- CN
- China
- Prior art keywords
- module
- molecular
- chemical
- retrieval
- molecular structure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/40—Searching chemical structures or physicochemical data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
Abstract
The invention discloses a chemical molecular structure retrieval system comprising an input module, a processing module, a retrieval module, a storage module and an output module. The input module receives the user's drawing operations for a chemical molecular structure and sends the drawn input to the processing module. The processing module processes the drawn input, calculates the molecular fingerprint of the drawn structure, and sends the result to the retrieval module. The retrieval module compares this fingerprint against the molecular fingerprints of all molecular structures in the storage module, and the retrieved results are output together with the associated data in the storage module. The retrieval system can search for compounds quickly and accurately, supports exact retrieval, substructure retrieval and similarity retrieval of chemical structures, and effectively addresses the unintuitive, inaccurate and inefficient nature of text retrieval.
Description
Technical Field
The invention belongs to the technical field of information search, and particularly relates to a chemical molecular structure retrieval system.
Background
Among chemical information, the structure of a compound is one of the most important pieces of information. Common chemical database retrieval methods include name retrieval, molecular formula retrieval and CAS number retrieval; name retrieval and molecular formula retrieval do not return unique results, and CAS number retrieval does not visually reflect the structure of a compound. Because some new compounds or intermediates described in chemical and pharmaceutical patent documents have no corresponding substance name or CAS number at all, they can only be searched by chemical structural formula.
Common chemical information databases offer only text retrieval, which is unintuitive, inaccurate and inefficient. Because the variety and number of chemicals are enormous and their names are complex, traditional searching by Chinese or English name cannot adequately meet users' needs.
To meet the need of chemical research and development personnel to search by chemical structure, the language they use most often, a chemical molecular structure retrieval system needs to be developed.
Disclosure of Invention
In view of the above, the present invention provides a chemical molecular structure retrieval system that solves many problems of existing text retrieval and enables fast and accurate searching of compound structures.
To achieve this purpose, the invention provides the following technical solution:
a chemical molecular structure retrieval system, comprising: an input module, a processing module, a retrieval module, a storage module and an output module;
the input module is used for receiving drawing operations for a chemical molecular structure and sending the received drawing input to the processing module;
the processing module processes the user's drawing input, calculates the molecular fingerprint of the drawn structure, and sends the processed result to the retrieval module;
the retrieval module compares the processed result against the molecular fingerprints of all molecular structures in the storage module and retrieves matches, wherein comparing the molecular fingerprint of the drawn chemical molecular structure with that of a molecular structure in the storage module comprises: comparing each corresponding character position of the two molecular fingerprint character strings, and dividing the number of positions at which the characters are identical by the total number of characters to obtain the similarity of the molecular fingerprints;
and the output module outputs the retrieved results.
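The position-by-position comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming both fingerprints are equal-length strings of '0' and '1'; the function name `fingerprint_similarity` and the error handling are assumptions, not part of the disclosure.

```python
def fingerprint_similarity(fp_a: str, fp_b: str) -> float:
    """Fraction of positions at which two fingerprint strings hold the same character."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    if not fp_a:
        raise ValueError("fingerprints must not be empty")
    # Count positions where the characters are identical (both '0' or both '1').
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    # Divide by the total number of characters, as the text describes.
    return matches / len(fp_a)


# Two 8-character fingerprints that differ at two positions.
print(fingerprint_similarity("10110010", "10100011"))  # 6 matching positions out of 8
```

Note that this measure counts positions where both strings hold '0' as matches, which differs from set-based measures such as the Tanimoto coefficient that consider only the '1' bits.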
Preferably, the input module is a chemical molecular structural formula editor.
Preferably, the storage module comprises a chemical molecule database, and the retrieval module retrieves the chemical molecule database.
Preferably, the chemical molecule database comprises: chemical molecule English name, CAS number, molecular formula, SMILES code, IUPAC standard name, EINECS number, InChI number, UNII number, alias, and molecular fingerprint.
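One possible way to picture a record in such a database is the following Python sketch. The class name `ChemicalRecord`, the field names and the choice of which fields are optional are assumptions made for illustration; the source lists only the kinds of data stored, not a schema. The fingerprint value in the example is a made-up placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChemicalRecord:
    """Hypothetical record holding the fields listed for the chemical molecule database."""
    english_name: str
    molecular_formula: str
    smiles: str
    cas_number: Optional[str] = None
    iupac_name: Optional[str] = None
    einecs_number: Optional[str] = None
    inchi: Optional[str] = None
    unii: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    fingerprint: str = ""  # bit string of '0' and '1'


# Example entry; the fingerprint is a placeholder, not a computed value.
ethanol = ChemicalRecord(
    english_name="ethanol",
    molecular_formula="C2H6O",
    smiles="CCO",
    cas_number="64-17-5",
    aliases=["ethyl alcohol"],
    fingerprint="0100100001",
)
```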
Compared with the prior art, the invention has the following beneficial effects: it provides a system for quickly and accurately searching compound structures, supports exact retrieval, substructure retrieval and similarity retrieval of chemical structures, and effectively addresses the unintuitive, inaccurate and inefficient nature of text retrieval.
Drawings
To make the purpose, technical solution and beneficial effects of the invention clearer, the following drawings are provided:
FIG. 1 is a schematic diagram of a chemical molecular structure retrieval system according to the present invention.
In the figure: 101, input module; 102, processing module; 103, retrieval module; 104, storage module; 105, output module.
FIG. 2 is a schematic diagram of the structure of an exemplary chemical molecule.
FIG. 3 shows the substructure search results for the chemical molecular structure of the example.
Detailed Description
The present invention is further described with reference to the following drawings and specific examples so that those skilled in the art can better understand the present invention and can practice the present invention, but the examples are not intended to limit the present invention.
Example 1
The present embodiment provides a chemical molecular structure retrieval system 10 as shown in fig. 1, including: an input module 101, a processing module 102, a retrieval module 103, a storage module 104 and an output module 105.
(1) The input module 101 is a chemical molecular structural formula editor, configured to receive drawing operations for a chemical molecular structure and send the received drawing input to the processing module 102. The chemical molecular structure editor used in this example is the Ketcher structure editor.
(2) The processing module 102 processes the drawing operation input of the user, performs molecular fingerprint calculation on the input structure, and sends the processed result to the retrieval module 103.
The molecular fingerprint is calculated according to the Chemical Hashed Fingerprint calculation method (https:// docs. chemaxon. com/display/docs/Chemical Hashed _ finger. html). The molecular fingerprint represents the structural information of a chemical molecule as a bit string composed of '0' and '1'.
(3) The retrieval module 103 compares and retrieves the molecular fingerprints of all molecular structures in the storage module according to the processed result, and outputs the retrieved result by combining the data in the storage module.
The storage module mainly comprises a chemical molecule database, and the retrieval module performs retrieval in the chemical molecule database. The chemical molecule database includes: chemical molecule English name, CAS number, molecular formula, SMILES code, IUPAC standard name, EINECS number, InChI number, UNII number, alias, molecular fingerprint, etc.
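The bit-string representation mentioned in step (2) can be illustrated with a toy hashed fingerprint. This is a rough sketch and not the Chemical Hashed Fingerprint algorithm itself: real hashed fingerprints enumerate fragments of the molecular graph, whereas this example hashes substrings of a SMILES string as a stand-in. The function name `toy_hashed_fingerprint`, the 64-bit length, the fragment length limit and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def toy_hashed_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> str:
    """Hash every substring of up to max_len characters into a fixed-length bit string."""
    bits = ["0"] * n_bits
    for length in range(1, max_len + 1):
        for start in range(len(smiles) - length + 1):
            fragment = smiles[start:start + length]
            # Map the fragment to a bit position with a stable hash.
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            position = int.from_bytes(digest[:4], "big") % n_bits
            bits[position] = "1"
    return "".join(bits)


# Each fragment switches on one bit; different fragments may collide on the same bit.
print(toy_hashed_fingerprint("CCO"))
```

Because the output always has the same length, two fingerprints produced this way can be compared position by position regardless of the size of the input molecules.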
For a clearer understanding of the retrieval system, the following example performs a substructure search on the structure shown in formula (I):
a) the input module 101: the molecular structure of formula (I) is drawn in the molecular editor, as shown in FIG. 2;
b) the processing module 102: a molecular fingerprint is calculated for the input molecular structure of formula (I);
c) the retrieval module 103: the molecular fingerprint of the molecular structure of formula (I) is compared against the molecular fingerprints of all molecular structures in the storage module 104, with the retrieval condition that the similarity is greater than 80%;
d) the output module 105: the results that satisfy the condition are output as molecular structures sorted by similarity from high to low, as shown in FIG. 3.
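Steps c) and d) above can be sketched as a filter followed by a sort. This is a minimal illustration under stated assumptions: the database is modeled as a plain dictionary from a name to a fingerprint string, the 10-character fingerprints and entry names are invented, and the functions `similarity` and `search` are hypothetical helpers rather than the system's actual interface.

```python
from typing import Dict, List, Tuple


def similarity(fp_a: str, fp_b: str) -> float:
    """Fraction of positions at which two equal-length fingerprints agree."""
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / len(fp_a)


def search(query_fp: str, database: Dict[str, str],
           threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return entries whose similarity exceeds the threshold, highest first."""
    scored = [(name, similarity(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score > threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Invented fingerprints for illustration only.
toy_database = {
    "compound_B": "1111000010",
    "compound_D": "0000111100",
    "compound_A": "1111000011",
    "compound_C": "1111000000",
}
print(search("1111000011", toy_database))
# -> [('compound_A', 1.0), ('compound_B', 0.9)]
```

Here `compound_C` scores exactly 0.8 and is excluded, because the retrieval condition in the example is a similarity greater than 80%, not greater than or equal to it.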
The above embodiments are merely preferred embodiments intended to fully illustrate the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within the protection scope of the invention. The protection scope of the invention is defined by the claims.
Claims (4)
1. A chemical molecular structure retrieval system, characterized by comprising: an input module, a processing module, a retrieval module, a storage module and an output module;
the input module is used for receiving drawing operations for a chemical molecular structure and sending the received drawing input to the processing module;
the processing module processes the drawn chemical molecular structure, calculates the molecular fingerprint of the chemical molecular structure and sends the processed result to the retrieval module;
the retrieval module compares the processed result against the molecular fingerprints of all molecular structures in the storage module and retrieves matches, wherein comparing the molecular fingerprint of the drawn chemical molecular structure with the molecular fingerprint of a molecular structure in the storage module comprises: comparing each corresponding character position of the molecular fingerprint character strings of the two chemical molecular structures, and dividing the number of positions at which the characters are identical by the total number of characters to obtain the similarity of the molecular fingerprints;
and the output module outputs the retrieved results.
2. The chemical molecular structure retrieval system according to claim 1, wherein the input module is a chemical molecular structural formula editor.
3. The chemical molecular structure retrieval system according to claim 1, wherein the storage module comprises a chemical molecule database, and the retrieval module performs retrieval in the chemical molecule database.
4. The chemical molecular structure retrieval system according to claim 3, wherein the chemical molecule database comprises: chemical molecule English name, CAS number, molecular formula, SMILES code, IUPAC standard name, EINECS number, InChI number, UNII number, alias, and molecular fingerprint.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010796802.3A CN114078570A (en) | 2020-08-10 | 2020-08-10 | Chemical molecular structure retrieval system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010796802.3A CN114078570A (en) | 2020-08-10 | 2020-08-10 | Chemical molecular structure retrieval system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114078570A (en) | 2022-02-22 |
Family
ID=80279992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010796802.3A Pending CN114078570A (en) | 2020-08-10 | 2020-08-10 | Chemical molecular structure retrieval system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114078570A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116705189A (en) * | 2023-08-09 | 2023-09-05 | 北京慧采通科技有限公司 | Method, device and storage medium for searching chemical |
CN116705189B (en) * | 2023-08-09 | 2023-10-10 | 北京慧采通科技有限公司 | Method, device and storage medium for searching chemical |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108932294B (en) | Resume data processing method, device, equipment and storage medium based on index | |
US9922102B2 (en) | Templates for defining fields in machine data | |
US8515684B2 (en) | System and method for identifying similar molecules | |
US11768892B2 (en) | Method and apparatus for extracting name of POI, device and computer storage medium | |
CN103823838A (en) | Method for inputting and comparing multi-format documents | |
US10169208B1 (en) | Similarity scoring of programs | |
WO2021258848A1 (en) | Data dictionary generation method and apparatus, data query method and apparatus, and device and medium | |
CN113407785B (en) | Data processing method and system based on distributed storage system | |
US11741064B2 (en) | Fuzzy search using field-level deletion neighborhoods | |
CN111400323A (en) | Data retrieval method, system, device and storage medium | |
WO2020037794A1 (en) | Index building method for english geographical name, and query method and apparatus therefor | |
CN102867049A (en) | Chinese PINYIN quick word segmentation method based on word search tree | |
US20070028168A1 (en) | Phonetic searching using multiple readings | |
CN111190920A (en) | Data interactive query method and system based on natural language | |
CN114078570A (en) | Chemical molecular structure retrieval system | |
Nakashima et al. | Constructing LZ78 tries and position heaps in linear time for large alphabets | |
WO2008038416A1 (en) | Document searching device and document searching method | |
CN111984745A (en) | Dynamic expansion method, device, equipment and storage medium for database field | |
Tian | A mathematical indexing method based on the hierarchical features of operators in formulae | |
US11822530B2 (en) | Augmentation to the succinct trie for multi-segment keys | |
WO2006058476A1 (en) | Prime number replacing character string search technology | |
CN114462413B (en) | User entity matching method, device, computer equipment and readable storage medium | |
Zhou et al. | Adjacency matrix based full-text indexing models | |
CN115934884B (en) | Medical insurance catalog medicine rapid comparison method, device, equipment and storage medium | |
US20230409620A1 (en) | Non-transitory computer-readable recording medium storing information processing program, information processing method, information processing device, and information processing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||