CN114078570A - Chemical molecular structure retrieval system - Google Patents


Info

Publication number
CN114078570A
Authority
CN
China
Prior art keywords
module
molecular
chemical
retrieval
molecular structure
Prior art date
2020-08-10
Legal status
Pending
Application number
CN202010796802.3A
Other languages
Chinese (zh)
Inventor
杨建明
李天泉
罗元平
李雪梅
陈浩
Current Assignee
Chongqing Kangzhou Big Data Co ltd
Original Assignee
Chongqing Kangzhou Big Data Co ltd
Priority date
2020-08-10
Filing date
2020-08-10
Publication date
2022-02-22
2020-08-10 Application filed by Chongqing Kangzhou Big Data Co ltd
2020-08-10 Priority to CN202010796802.3A
2022-02-22 Publication of CN114078570A
Status: Pending (current legal status)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/40 Searching chemical structures or physicochemical data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying

Abstract

The invention discloses a chemical molecular structure retrieval system comprising an input module, a processing module, a retrieval module, a storage module and an output module. The input module receives a drawing operation for a chemical molecular structure and sends the drawn structure to the processing module; the processing module processes the user's drawing input, calculates the molecular fingerprint of the drawn structure, and sends the result to the retrieval module; the retrieval module compares this result against the molecular fingerprints of all molecular structures in the storage module and outputs the retrieved results in combination with the data held in the storage module. The retrieval system can search compounds quickly and accurately, supports exact search, substructure search and similarity search of chemical structures, and effectively overcomes the non-intuitive, inaccurate and inefficient nature of text-based retrieval.

Description

Chemical molecular structure retrieval system
Technical Field
The invention belongs to the technical field of information search, and particularly relates to a chemical molecular structure retrieval system.
Background
Among all chemical information, the structure of a compound is one of the most important items. Common chemical database search methods include name search, molecular formula search and CAS number search. Name and molecular formula searches do not return unique results, and a CAS number does not visually reflect the structure of a compound. Moreover, some new compounds or intermediates disclosed in chemical and pharmaceutical patent documents have no corresponding substance name or CAS number at all, so they can only be searched by chemical structural formula.
Common chemical information databases offer only text search, which is non-intuitive, inaccurate and inefficient. Because chemicals are enormous in both variety and number, and their names are complex, conventional Chinese- and English-name search cannot adequately meet users' needs.
To meet the search needs of chemical R&D personnel, for whom the chemical structure is the most common working language, a chemical molecular structure search system needs to be developed.
Disclosure of Invention
In view of the above, the present invention provides a chemical molecular structure search system that overcomes the shortcomings of existing text search and enables fast and accurate searching of compound structures.
To achieve this purpose, the invention provides the following technical solution:
a chemical molecular structure retrieval system, comprising an input module, a processing module, a retrieval module, a storage module and an output module;
the input module is configured to receive a drawing operation of a chemical molecular structure and to send the received drawing input to the processing module;
the processing module processes the user's drawing input, calculates the molecular fingerprint of the drawn structure, and sends the processed result to the retrieval module;
the retrieval module compares the processed result against the molecular fingerprints of all molecular structures in the storage module, where comparing the fingerprint of the drawn chemical molecular structure with that of a stored molecular structure comprises: comparing the two fingerprint character strings position by position, and dividing the number of positions holding the same character by the total number of characters to obtain the fingerprint similarity;
and the output module outputs the search results.
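The comparison rule above (count the positions at which both fingerprint strings hold the same character, then divide by the total number of characters) can be sketched in Python as follows. This is a minimal illustration, not the applicant's code; the function name `fingerprint_similarity` is hypothetical, and requiring both strings to have equal length is an assumption, since the text does not say how strings of different length are handled.

```python
def fingerprint_similarity(fp_a: str, fp_b: str) -> float:
    """Fraction of positions at which two fingerprint strings hold the same character.

    Assumption: both fingerprints are non-empty strings of equal length.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    if not fp_a:
        raise ValueError("fingerprints must be non-empty")
    # Count positions where the characters agree ('0'=='0' counts as well as '1'=='1').
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / len(fp_a)


print(fingerprint_similarity("10110010", "10100011"))  # 0.75 (6 of 8 positions agree)
```

As described, a position where both fingerprints hold '0' counts as a match, so this measure (the simple matching coefficient) differs from the Tanimoto coefficient commonly used for fingerprint similarity, which considers only bits set to '1'.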
Preferably, the input module is a chemical molecular structural formula editor.
Preferably, the storage module comprises a chemical molecule database, and the retrieval module retrieves the chemical molecule database.
Preferably, the chemical molecule database comprises: chemical molecule English name, CAS number, molecular formula, SMILES code, IUPAC standard name, EINECS number, InChI number, UNII number, alias, molecular fingerprint.
Compared with the prior art, the invention has the following beneficial effects: it provides a system for quickly and accurately searching compound structures, supports exact search, substructure search and similarity search of chemical structures, and effectively overcomes the non-intuitive, inaccurate and inefficient nature of text search.
Drawings
To make the objectives, technical solution and beneficial effects of the invention clearer, the following drawings are provided:
FIG. 1 is a schematic diagram of a chemical molecular structure retrieval system according to the present invention.
In the figure: 101 input module; 102 processing module; 103 retrieval module; 104 storage module; 105 output module.
FIG. 2 is a schematic diagram of the structure of an exemplary chemical molecule.
FIG. 3 shows the substructure search results for the chemical molecular structure of the example.
Detailed Description
The present invention is further described with reference to the following drawings and specific examples so that those skilled in the art can better understand the present invention and can practice the present invention, but the examples are not intended to limit the present invention.
Example 1
The present embodiment provides a chemical molecular structure retrieval system 10 as shown in fig. 1, including: an input module 101, a processing module 102, a retrieval module 103, a storage module 104 and an output module 105.
(1) The input module 101 is a chemical molecular structural formula editor configured to receive a drawing operation of a chemical molecular structure and to send the received drawing input to the processing module 102. In this example, the Ketcher structure editor is used.
(2) The processing module 102 processes the drawing operation input of the user, performs molecular fingerprint calculation on the input structure, and sends the processed result to the retrieval module 103.
The molecular fingerprint is calculated using the Chemical Hashed Fingerprint method (https://docs.chemaxon.com/display/docs/Chemical Hashed_finger.html). The molecular fingerprint encodes the structural information of a chemical molecule as a bit string of '0's and '1's.
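The general idea behind a hashed fingerprint can be illustrated with a simplified sketch: enumerate linear atom paths in a molecular graph, hash each path, and set a few bits per path in a fixed-length bit string. This is not ChemAxon's implementation. The graph representation (element symbols plus a bond list, with bond order ignored), the use of SHA-256, and the parameters `n_bits`, `max_bonds` and `bits_per_pattern` are all illustrative assumptions, as are the function names.

```python
import hashlib
from typing import Dict, List, Sequence, Set, Tuple


def linear_paths(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                 max_bonds: int) -> Set[str]:
    """Collect element-label paths of 0..max_bonds bonds (assumption: bond order ignored)."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    patterns: Set[str] = set()

    def walk(path: List[int]) -> None:
        labels = [atoms[k] for k in path]
        # A path read forwards or backwards is the same pattern; keep the smaller string.
        patterns.add(min("-".join(labels), "-".join(reversed(labels))))
        if len(path) - 1 == max_bonds:
            return
        for nb in adj[path[-1]]:
            if nb not in path:  # simple paths only, so rings cannot loop forever
                walk(path + [nb])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def hashed_fingerprint(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                       n_bits: int = 64, max_bonds: int = 3,
                       bits_per_pattern: int = 2) -> str:
    """Fold every path pattern into a fixed-length '0'/'1' string (hypothetical parameters)."""
    if not 1 <= bits_per_pattern <= 8:
        raise ValueError("bits_per_pattern must be between 1 and 8")
    bits = [0] * n_bits
    for pattern in linear_paths(atoms, bonds, max_bonds):
        # SHA-256 is deterministic across runs, unlike Python's salted hash() for str.
        digest = hashlib.sha256(pattern.encode("utf-8")).digest()
        for k in range(bits_per_pattern):
            index = int.from_bytes(digest[4 * k:4 * k + 4], "big") % n_bits
            bits[index] = 1
    return "".join(str(b) for b in bits)


# Ethanol heavy atoms C-C-O (hydrogens omitted).
fp = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
print(fp)
```

Because different structural patterns can hash to the same bit, the bit string is a lossy summary: two molecules with identical fingerprints are not guaranteed to be identical structures.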
(3) The retrieval module 103 compares and retrieves the molecular fingerprints of all molecular structures in the storage module according to the processed result, and outputs the retrieved result by combining the data in the storage module.
The storage module mainly comprises a chemical molecule database, which the retrieval module searches. The chemical molecule database includes: chemical molecule English name, CAS number, molecular formula, SMILES code, IUPAC standard name, EINECS number, InChI number, UNII number, alias, molecular fingerprint, etc.
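One possible in-memory shape for a database record holding the fields listed above is sketched below. The class, field and function names are hypothetical, and the patent does not specify a storage engine or schema. Only the ethanol values shown (name, CAS number, formula, SMILES, IUPAC name, alias) are real data; the fingerprint string is a placeholder rather than a computed value.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class MoleculeRecord:
    """One row of the chemical molecule database (hypothetical schema)."""
    english_name: str
    cas_number: Optional[str]  # may be missing for new compounds or intermediates
    molecular_formula: str
    smiles: str
    iupac_name: Optional[str] = None
    einecs_number: Optional[str] = None
    inchi: Optional[str] = None
    unii: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    fingerprint: str = ""  # '0'/'1' string, assumed to be precomputed when stored


def index_by_cas(records: Iterable[MoleculeRecord]) -> Dict[str, MoleculeRecord]:
    """Map CAS number to record, skipping compounds that have no CAS number."""
    return {r.cas_number: r for r in records if r.cas_number}


ethanol = MoleculeRecord(
    english_name="Ethanol",
    cas_number="64-17-5",
    molecular_formula="C2H6O",
    smiles="CCO",
    iupac_name="ethanol",
    aliases=["ethyl alcohol"],
    fingerprint="0110100101",  # placeholder only, not a real fingerprint
)
by_cas = index_by_cas([ethanol])
print(by_cas["64-17-5"].smiles)  # CCO
```

Making `cas_number` optional reflects the background section's point that some new compounds and intermediates have no CAS number and can only be found by structure.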
For a clearer understanding of the search system, the following example performs a substructure search on the structure shown in formula (I):
[Formula (I): chemical structure image not reproduced (Figure BDA0002625930090000041)]
a) the input module 101: the molecular structure of formula (I) is drawn in the molecular editor, as shown in FIG. 2;
b) the processing module 102: the molecular fingerprint of the input structure of formula (I) is calculated;
c) the retrieval module 103: the molecular fingerprint of formula (I) is compared against the molecular fingerprints of all molecular structures in the storage module 104, with the retrieval condition that the similarity is greater than 80%;
d) the output module 105: the results meeting the condition are output as molecular structures sorted by similarity from high to low, as shown in FIG. 3.
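Steps c) and d) amount to a linear scan with a threshold followed by a descending sort. The sketch below assumes fingerprints are equal-length '0'/'1' strings and that stored records are simple (name, fingerprint) pairs; `search_similar` and the sample data are hypothetical. The strict comparison `score > threshold` follows the "greater than 80%" condition.

```python
from typing import List, Sequence, Tuple


def fingerprint_similarity(fp_a: str, fp_b: str) -> float:
    """Share of positions at which the two fingerprint strings hold the same character."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def search_similar(query_fp: str, database: Sequence[Tuple[str, str]],
                   threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs above threshold, highest similarity first."""
    hits = []
    for name, stored_fp in database:  # compare against every stored structure
        score = fingerprint_similarity(query_fp, stored_fp)
        if score > threshold:
            hits.append((name, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


database = [
    ("A", "1111100000"),
    ("B", "1111100001"),
    ("C", "1111100011"),
    ("D", "0000011111"),
]
print(search_similar("1111100000", database))
# [('A', 1.0), ('B', 0.9)]
```

Structure C scores exactly 0.8 and is therefore excluded by the strict "greater than" condition.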
The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within the protection scope of the invention. The protection scope of the invention is defined by the claims.

Claims (4)

1. A chemical molecular structure retrieval system, characterized by comprising an input module, a processing module, a retrieval module, a storage module and an output module;
the input module is configured to receive a drawing operation of a chemical molecular structure and to send the received drawing input to the processing module;
the processing module processes the drawn chemical molecular structure, calculates its molecular fingerprint, and sends the processed result to the retrieval module;
the retrieval module compares the processed result against the molecular fingerprints of all molecular structures in the storage module, wherein comparing the molecular fingerprint of the drawn chemical molecular structure with that of a molecular structure in the storage module comprises: comparing the molecular fingerprint character strings of the two chemical molecular structures position by position, and dividing the number of positions holding the same character by the total number of characters to obtain the similarity of the molecular fingerprints;
and the output module outputs the search results.
2. The system for retrieving chemical molecular structures of claim 1, wherein the input module is a chemical molecular structural formula editor.
3. The chemical molecule structure retrieval system of claim 1, wherein the storage module comprises a chemical molecule database, and the retrieval module retrieves the chemical molecule database.
4. The chemical molecule structure retrieval system according to claim 3, wherein the chemical molecule database comprises: chemical molecule English name, CAS number, molecular formula, SMILES code, IUPAC standard name, EINECS number, InChI number, UNII number, alias, molecular fingerprint.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010796802.3A CN114078570A (en) 2020-08-10 2020-08-10 Chemical molecular structure retrieval system

Publications (1)

Publication Number Publication Date
CN114078570A (en) 2022-02-22

Family

ID=80279992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010796802.3A Pending CN114078570A (en) 2020-08-10 2020-08-10 Chemical molecular structure retrieval system

Country Status (1)

Country Link
CN (1) CN114078570A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116705189A (en) * 2023-08-09 2023-09-05 北京慧采通科技有限公司 Method, device and storage medium for searching chemical
CN116705189B (en) * 2023-08-09 2023-10-10 北京慧采通科技有限公司 Method, device and storage medium for searching chemical

Similar Documents

Publication Publication Date Title
CN108932294B (en) Resume data processing method, device, equipment and storage medium based on index
US9922102B2 (en) Templates for defining fields in machine data
US8515684B2 (en) System and method for identifying similar molecules
US11768892B2 (en) Method and apparatus for extracting name of POI, device and computer storage medium
CN103823838A (en) Method for inputting and comparing multi-format documents
US10169208B1 (en) Similarity scoring of programs
WO2021258848A1 (en) Data dictionary generation method and apparatus, data query method and apparatus, and device and medium
CN113407785B (en) Data processing method and system based on distributed storage system
US11741064B2 (en) Fuzzy search using field-level deletion neighborhoods
CN111400323A (en) Data retrieval method, system, device and storage medium
WO2020037794A1 (en) Index building method for english geographical name, and query method and apparatus therefor
CN102867049A (en) Chinese PINYIN quick word segmentation method based on word search tree
US20070028168A1 (en) Phonetic searching using multiple readings
CN111190920A (en) Data interactive query method and system based on natural language
CN114078570A (en) Chemical molecular structure retrieval system
Nakashima et al. Constructing LZ78 tries and position heaps in linear time for large alphabets
WO2008038416A1 (en) Document searching device and document searching method
CN111984745A (en) Dynamic expansion method, device, equipment and storage medium for database field
Tian A mathematical indexing method based on the hierarchical features of operators in formulae
US11822530B2 (en) Augmentation to the succinct trie for multi-segment keys
WO2006058476A1 (en) Prime number replacing character string search technology
CN114462413B (en) User entity matching method, device, computer equipment and readable storage medium
Zhou et al. Adjacency matrix based full-text indexing models
CN115934884B (en) Medical insurance catalog medicine rapid comparison method, device, equipment and storage medium
US20230409620A1 (en) Non-transitory computer-readable recording medium storing information processing program, information processing method, information processing device, and information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination