KR101801226B1 - Classification Algorithm for Chemical Compound Using InChI - Google Patents
Classification Algorithm for Chemical Compound Using InChI
- Publication number
- KR101801226B1
- Authority
- KR
- South Korea
- Prior art keywords
- layer
- compound
- class
- atomic
- classified
- Prior art date
Classifications
- G06F19/707—
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
- G06F19/709—
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The method of classifying a compound according to the present invention comprises the steps of: (a) inputting an InChI (International Chemical Identifier) identifier of a compound; (b) classifying the layers using the identifier to confirm whether the compound is an organic compound or an inorganic compound; (c) determining the atomic class by identifying the constituent atoms in the case of an organic compound, without performing classification in the case of an inorganic compound; (d) determining a main class by identifying a functional group; (e) determining a subclass by identifying the structure; and (f) generating a classification string including the atomic class, the main class and the subclass. By identifying the functional group and the structure using InChI among the compound identifiers, organic compounds in a database can be classified accurately in a short time, and the physical properties of the compound can be easily predicted.
Description
The present invention relates to a method for classifying a compound using the molecular structure of the compound, and more particularly, to a method for classifying a compound by identifying its atomic composition, functional group and structure using an InChI (International Chemical Identifier).
As science and technology developed, a large number of materials came into use, and their physical and chemical properties, measured through experiments, were entered into databases. A database thus accumulates experimental data for a large number of compounds, and its users can verify the properties of a compound before use by searching for the compound of interest.
For this reason, analyzing, organizing and storing the compounds in a database broadens the ways in which the data can be used and improves convenience for the user. Thus, a database of compounds usually stores, in addition to the name of each compound, an identifier or a three-dimensional structure that encodes the structure of the compound, which makes it easier to group similar compounds and to distinguish one compound from another.
The best way to distinguish the compounds contained in the database is to convert the three-dimensional structure of the compound into a one-dimensional character array and compare them. The most commonly used methods are SMILES (Simplified Molecular-Input Line-Entry System) and InChI (International Chemical Identifier). The character array generated by these two conversion methods has the advantage of reducing the size of the DB as compared with the method of storing the three-dimensional structure.
SMILES indicates the arrangement and bonding of atoms by linearly representing the atoms contained in the molecule. Therefore, the three-dimensional structure represented by SMILES is generally readable and has an advantage in showing a simple three-dimensional structure. However, since SMILES does not take into account the direction and order of atoms, it has various disadvantages in that it is difficult to distinguish complex structures.
InChI is a one-dimensional character string developed by IUPAC and NIST. It records information such as the composition, arrangement and bonding of the atoms contained in a molecule in separate layers. Although it is less readable than SMILES, it can distinguish complex structures and compounds with otherwise identical structures, and it can accurately express hydrogen bonding or resonance structures that are not included in SMILES.
In actual research, there are many cases where the physical properties of all compounds sharing a specific feature are required, in addition to a search for a single compound. In particular, the physical properties and reactivity of an organic compound are greatly influenced by its functional group and structure. Thus, organic compounds need to be classified according to functional groups or structures. The common classification methods identify the functional group from the name of the compound or the structure from its identifier. However, all of these methods require much time to analyze the structure contained in names and identifiers and to enumerate every possible case, so the efficiency of the database is poor.
Conventional techniques classify compounds by the name, structure and identifier stored in the database. However, to classify a single structure, these techniques directly search every possible case contained in the name and the identifier. This is not only time consuming; if any case is omitted, the classification is not performed.
As a result of intensive efforts to solve the above problems, the present inventors have found that functional groups and structures can be identified using InChI among the compound identifiers for organic compounds, have developed this into a program that effectively classifies organic compounds in a database in a short time, and have thereby completed the present invention.
It is an object of the present invention to provide a method of classifying a compound having a high searching speed, excellent accuracy and easy prediction of physical properties of a compound.
In order to accomplish the above object, the present invention provides a method for classifying a compound, comprising: (a) inputting an InChI (International Chemical Identifier) identifier of a compound; (b) classifying the layers using the identifier to identify an organic compound or an inorganic compound; (c) determining the atomic class by identifying the constituent atoms in the case of an organic compound, without performing classification in the case of an inorganic compound; (d) determining a main class by identifying a functional group; (e) determining a subclass by checking the structure; and (f) generating a classification string including the atomic class, the main class and the subclass.
According to the present invention, compounds that have similar structures but differ in functional group or structure can be classified more accurately, which allows a database user to identify the characteristics of a compound more easily and to analyze the physical properties that result from them.
Figure 1 is a view schematically showing the steps of classifying a compound using an InChI identifier according to the present invention.
Figure 2 is a flow chart illustrating steps for determining an atomic class in accordance with the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In general, the nomenclature used herein is well known and commonly used in the art.
The object of the present invention is to search for the functional group or structure of an organic substance present in a database containing the IUPAC compound identifier InChI, and to perform a classification that shows the functional group and structure of the substance.
Accordingly, in one aspect, the present invention provides a method for classifying a compound, comprising: (a) inputting an InChI (International Chemical Identifier) identifier of a compound; (b) classifying the layers using the identifier to identify an organic compound or an inorganic compound; (c) determining the atomic class by identifying the constituent atoms in the case of an organic compound, without performing classification in the case of an inorganic compound; (d) determining a main class by identifying a functional group; (e) determining a subclass by checking the structure; and (f) generating a classification string comprising the atomic class, the main class and the subclass.
Unlike conventional classification methods, the present invention is a method of classifying data through functional groups and structures of compounds using InChI, which is an identifier uniquely assigned to a compound.
In the present invention, classification is carried out into an atomic class, a main class, and a sub class using the InChI identifier of a compound. An atomic class is a classification of an atomic structure. A main class is a classification of a functional group, and a sub class is a classification of a structure.
Figure 1 shows the step of classifying compounds using InChI identifiers. As shown in FIG. 1, the method of classifying a compound according to the present invention includes an
The compound identifier used in the present invention can be input as an InChI as a character string via the
The input InChI is decomposed into the respective layers by the
The classification through the
The distinction (300) between an inorganic compound and an organic compound is made by examining the chemical formula layer. If there is no carbon in the chemical formula layer, or if no hydrogen layer is present, the compound is classified as an inorganic compound. Among organic compounds, those bonded to metals are classified as organic salts. At this stage, a compound classified as an organic compound has an atomic class as its classification; when it has no structural specificity and no functional group, it is in most cases not classified further.
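As an illustration only, the layer decomposition and the organic/inorganic test described above can be sketched in Python. The function names `split_inchi_layers` and `is_organic` are hypothetical and do not come from the patent; the sketch assumes a standard InChI string whose chemical formula layer carries no prefix letter and whose other layers start with a one-letter prefix such as `c` or `h`.

```python
import re


def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into its version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    rest = parts[1:]
    # The formula layer has no prefix letter: it starts with an uppercase
    # element symbol or with a digit (component multiplier).
    if rest and (rest[0][:1].isupper() or rest[0][:1].isdigit()):
        layers["formula"] = rest[0]
        rest = rest[1:]
    for part in rest:
        if part:
            # Keep the first occurrence of each prefix (assumption of this sketch).
            layers.setdefault(part[0], part[1:])
    return layers


def is_organic(layers: dict) -> bool:
    """Organic only if the formula contains carbon AND a hydrogen layer exists."""
    formula = layers.get("formula", "")
    # "C" not followed by a lowercase letter is carbon (excludes Ca, Cl, Cu, ...).
    has_carbon = re.search(r"C(?![a-z])", formula) is not None
    return has_carbon and "h" in layers


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"], is_organic(ethanol))
```

Under this rule carbon dioxide (`InChI=1S/CO2/c2-1-3`) is treated as inorganic because it has no hydrogen layer, which matches the criterion stated in the text.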
Compounds classified as organic compounds through the distinction (300) between inorganic compounds and organic compounds are classified into atomic classes according to the atoms they contain. The atomic class is determined by the presence of atoms other than carbon and hydrogen, which is established by analyzing the chemical formula layer using the method given in Figure 2: (a1) loading the chemical formula layer (i = 0); (a2) identifying the element after setting i = i + 2 if character (i + 1) of the chemical formula layer is a lowercase letter, and identifying the element after setting i = i + 1 if it is not; (a3) if character (i) is a digit, setting i = i + 1 and returning to the beginning of (a3); and (a4) terminating the entire procedure if the end of the string has been reached, and otherwise repeating from (a2).
Each atomic class is shown in Table 2, but the atomic classes are not limited to those shown in Table 2.
In Fig. 2, the atomic composition is confirmed by reading the characters of the chemical formula layer. The layer is loaded (410) and read from the first character of the string. Each element is identified by checking its element symbol (420). The first letter of an element symbol is always an uppercase letter, so if the following character is a lowercase letter, the symbol has two letters; if the following character is an uppercase letter or a digit, the symbol has one letter. The element is identified from the symbol thus obtained (430). The character immediately after the symbol is then checked (440). If it is a digit, the number of atoms of that element is read. By repeating the above procedure until the end of the string (460), the composition of all the atoms contained in the compound can be confirmed.
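The character-by-character scan of Fig. 2 can be sketched as follows. This is a minimal illustration, not the patented program: the names `parse_formula_layer` and `heteroatoms` are hypothetical, and the sketch assumes a single-component formula layer (no `.` separators or leading multipliers).

```python
def parse_formula_layer(formula: str) -> dict:
    """Return {element symbol: atom count} for a single-component formula layer."""
    counts = {}
    i, n = 0, len(formula)
    while i < n:
        if not formula[i].isupper():
            raise ValueError(f"unexpected character {formula[i]!r} at index {i}")
        # A lowercase letter after the uppercase one means a two-letter symbol.
        if i + 1 < n and formula[i + 1].islower():
            symbol = formula[i:i + 2]
            i += 2
        else:
            symbol = formula[i]
            i += 1
        # Read the digits that follow the symbol; no digits means one atom.
        j = i
        while j < n and formula[j].isdigit():
            j += 1
        count = int(formula[i:j]) if j > i else 1
        counts[symbol] = counts.get(symbol, 0) + count
        i = j
    return counts


def heteroatoms(counts: dict) -> list:
    """Elements other than carbon and hydrogen, which drive the atomic class."""
    return sorted(set(counts) - {"C", "H"})


composition = parse_formula_layer("C6H5Cl")
print(composition, heteroatoms(composition))
```

The mapping from the heteroatom list to a concrete atomic class depends on Table 2, which is not reproduced here, so the sketch stops at the list of elements.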
The atomic class is identified through the
Table 2 shows the classification criteria for nitrogen, oxygen and chlorine, which belong to groups 15, 16 and 17, respectively; other elements of the same group can be classified in the same way.
In the main class determination step, the most important information is the position at which each atom other than hydrogen and carbon is connected, which can be confirmed by counting the atoms other than hydrogen in the chemical formula layer. The number of hydrogen atoms bonded to each atom can also be determined through the hydrogen layer.
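A hedged sketch of these two counts is given below. The function names `heavy_atom_count` and `hydrogens_per_atom` are hypothetical. The sketch assumes a single-component InChI and reads only fixed hydrogens from the hydrogen layer; mobile-hydrogen groups written in parentheses, such as `(H,3,4)`, are deliberately ignored.

```python
import re


def heavy_atom_count(formula: str) -> int:
    """Count the atoms other than hydrogen in a single-component formula layer."""
    total = 0
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol != "H":
            total += int(digits) if digits else 1
    return total


def hydrogens_per_atom(h_layer: str) -> dict:
    """Map heavy-atom number -> number of fixed hydrogens listed in the /h layer."""
    # Drop mobile-hydrogen groups such as "(H,3,4)"; they are not handled here.
    fixed = re.sub(r"\([^)]*\)", "", h_layer)
    result = {}
    pending = []  # atom numbers waiting for the next "H<n>" marker
    for m in re.finditer(r"(\d+)(?:-(\d+))?(?:(H)(\d*))?", fixed):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        pending.extend(range(start, end + 1))
        if m.group(3):
            n_h = int(m.group(4)) if m.group(4) else 1
            for atom in pending:
                result[atom] = n_h
            pending = []
    return result


# Ethanol: InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
print(heavy_atom_count("C2H6O"), hydrogens_per_atom("3H,2H2,1H3"))
```

For ethanol this reports one hydrogen on atom 3 (the oxygen), which is the kind of information needed to recognise a hydroxyl group; the actual functional-group rules of the patent are not given in this text.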
The main class obtained through the step of identifying functional groups (500) represents only functional groups and does not distinguish structures. Thus, the structure of the compound is achieved through the
That is, the structure is identified from the branches in the atomic connection layer or from the hydrogen layer, and is classified into a subclass such as N-, Branched-, Unsaturated-, Poly-, Cyclic- or Aromatic-.
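One possible way to read branching and ring closure out of the atomic connection layer is sketched below. It is an assumption-laden illustration rather than the patent's rule set: the names `parse_connections` and `structure_flags` are hypothetical, only a single connected component is handled, and "branched" is taken here to mean that some atom has three or more non-hydrogen neighbours. Unsaturated, poly and aromatic subclasses would additionally need the hydrogen counts and are not covered.

```python
import re
from collections import defaultdict


def parse_connections(c_layer: str) -> set:
    """Turn a connection layer such as "1-4(2)3" into a set of bonds."""
    edges = set()
    stack = []   # atoms at which a parenthesised branch was opened
    prev = None  # atom the next number will be bonded to
    for tok in re.findall(r"\d+|[(),-]", c_layer):
        if tok == "(":
            stack.append(prev)
        elif tok == ",":
            prev = stack[-1]      # next branch starts from the same atom
        elif tok == ")":
            prev = stack.pop()    # continue the chain from the branch atom
        elif tok == "-":
            continue
        else:
            atom = int(tok)
            if prev is not None:
                edges.add(frozenset((prev, atom)))
            prev = atom
    return edges


def structure_flags(c_layer: str) -> dict:
    """Report whether the heavy-atom skeleton is branched and/or cyclic."""
    edges = parse_connections(c_layer)
    degree = defaultdict(int)
    for edge in edges:
        for atom in edge:
            degree[atom] += 1
    n_atoms = len(degree)
    return {
        "branched": any(d >= 3 for d in degree.values()),
        # A connected graph contains a ring when it has at least as many bonds as atoms.
        "cyclic": n_atoms > 0 and len(edges) >= n_atoms,
    }


print(structure_flags("1-4(2)3"))          # isobutane skeleton
print(structure_flags("1-2-4-6-5-3-1"))    # benzene skeleton
```

Counting neighbours instead of looking for parentheses is a deliberate choice in this sketch, because parentheses in the connection layer can also mark ring closures.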
The atomic class, the main class, and the sub class obtained through the
Hereinafter, the present invention will be described in more detail with reference to Examples. It is to be understood by those skilled in the art that these embodiments are for illustrative purposes only and that the scope of the present invention is not limited by these embodiments.
[Example]
Example 1
To aid in the description of the invention, illustrative compounds were classified according to the steps of the method described above. Table 4 shows the compounds and the classification result at each step in Example 1.
[Table 4]
As shown in Table 4, organic compounds can be effectively classified in a database in a short time by confirming functional groups and structures using InChI among the compound identifiers for organic compounds. It is possible to improve the retrieval speed and accuracy and also to easily predict the physical properties of the compound.
While the present invention has been particularly shown and described with reference to specific embodiments thereof, those skilled in the art will appreciate that such specific embodiments are merely preferred embodiments and that the scope of the present invention is not limited thereto. Accordingly, the actual scope of the present invention will be defined by the appended claims and their equivalents.
Claims (9)
(a) a compound identifier input means provided in a computer receives an International Chemical Identifier (InChI) identifier of the compound as a string;
(b) a layer classification means classifies the layers using the identifier to confirm whether the compound is an organic compound or an inorganic compound;
(c) an atomic class determining means determines the atomic class by identifying the constituent atoms in the case of an organic compound, without performing classification in the case of an inorganic compound, through the steps of: (a1) loading the chemical formula layer (i = 0); (a2) identifying the element after setting i = i + 2 when character (i + 1) of the chemical formula layer is a lowercase letter, and identifying the element after setting i = i + 1 when it is not; (a3) if character (i) is a digit, setting i = i + 1 and returning to the beginning of (a3); and (a4) terminating the entire procedure if the end of the string has been reached, and otherwise repeating from (a2);
(d) a main class determining means determines the main class by identifying the functional group;
(e) a subclass determining means determines the subclass by identifying the structure; and
(f) a classification string generating means generates a classification string including the atomic class, the main class and the subclass.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150120964A KR101801226B1 (en) | 2015-08-27 | 2015-08-27 | Classification Algorithm for Chemical Compound Using InChI |
PCT/KR2016/009273 WO2017034280A1 (en) | 2015-08-27 | 2016-08-23 | Method of classifying compounds by using molecular structure of compounds |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150120964A KR101801226B1 (en) | 2015-08-27 | 2015-08-27 | Classification Algorithm for Chemical Compound Using InChI |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170025071A KR20170025071A (en) | 2017-03-08 |
KR101801226B1 KR101801226B1 (en) | 2017-11-24 |
Family
ID=58101018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150120964A KR101801226B1 (en) | 2015-08-27 | 2015-08-27 | Classification Algorithm for Chemical Compound Using InChI |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101801226B1 (en) |
WO (1) | WO2017034280A1 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101236966B1 (en) * | 2011-11-14 | 2013-02-26 | 숭실대학교산학협력단 | Apparatus and method for string expression of compound distinguishing isomer, apparatus and method for searching compound using the same |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8140267B2 (en) * | 2006-06-30 | 2012-03-20 | International Business Machines Corporation | System and method for identifying similar molecules |
US8468001B2 (en) * | 2007-03-22 | 2013-06-18 | Infosys Limited | Ligand identification and matching software tools |
KR101375672B1 (en) * | 2011-10-27 | 2014-03-20 | 주식회사 켐에쎈 | Method for Predicting a Property of Compound and System for Predicting a Property of Compound |
- 2015-08-27 KR KR1020150120964A patent/KR101801226B1/en active IP Right Grant
- 2016-08-23 WO PCT/KR2016/009273 patent/WO2017034280A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
Sun, Bingjun, et al. "Extraction and search of chemical formulae in text documents on the web." Proceedings of the 16th international conference on World Wide Web. ACM, 2007. |
Paper 1: CEUR Workshop |
Website |
Also Published As
Publication number | Publication date |
---|---|
KR20170025071A (en) | 2017-03-08 |
WO2017034280A1 (en) | 2017-03-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
GRNT | Written decision to grant |