CN112131244A - Chemical reaction search method, device and system and graphic processor - Google Patents

Chemical reaction search method, device and system and graphic processor

Info

Publication number
CN112131244A
Authority
CN
China
Prior art keywords
chemical reaction
reaction
target
superstructure
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010989051.7A
Other languages
Chinese (zh)
Inventor
夏宁
万钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhihua Technology Co ltd
Original Assignee
Wuhan Zhihua Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zhihua Technology Co ltd filed Critical Wuhan Zhihua Technology Co ltd
Priority to CN202010989051.7A priority Critical patent/CN112131244A/en
Publication of CN112131244A publication Critical patent/CN112131244A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/40 Searching chemical structures or physicochemical data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/64 Screening of libraries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a chemical reaction search method, device, and system, and a graphics processor. The chemical reaction search method can be applied to a graphics processor. The graphics processor obtains the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions, and based on these attributes calculates in parallel the superstructure matching degree of the target chemical reaction relative to the known chemical reactions. The superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of a known chemical reaction. Because the graphics processor processes data in parallel, the superstructure matching degrees of different target chemical reactions relative to the known chemical reactions can be computed in parallel.

Description

Chemical reaction search method, device and system and graphic processor
Technical Field
The invention relates to the field of computers, in particular to a chemical reaction searching method, a chemical reaction searching device, a chemical reaction searching system and a graphic processor.
Background
Chemical reaction search refers to retrieving chemical reactions that meet preset search conditions from a large-scale chemical reaction database. It is widely used in chemical information retrieval and in retrosynthetic analysis of compounds. Common search criteria include reactants, products, reaction conditions, and catalysts.
For example, a target chemical reaction may be classified by searching for known chemical reactions whose reactants and products are contained in those of the target chemical reaction. The target chemical reaction then includes the characteristics of the retrieved chemical reaction while being more complex than it.
However, as chemical research discovers more and more new molecules, and as techniques for constructing virtual molecules by computer emerge, the number of molecules in known molecular databases keeps growing. The amount of data in known chemical reaction databases has likewise grown from millions to tens of millions of entries and continues to increase. Conventional calculation methods for finding chemical reactions that meet given conditions take a long time and cannot meet practical requirements at this data volume, and the accuracy of conventional chemical reaction search is not high.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present application provide a chemical reaction search method, apparatus, and system, and a graphics processor, so as to improve efficiency and accuracy of chemical reaction search.
The embodiment of the application provides a chemical reaction searching method, which is applied to a graphic processor and comprises the following steps:
acquiring the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions;
calculating a superstructure matching degree of the target chemical reaction relative to the known chemical reaction in parallel based on the reaction attribute of the target chemical reaction and the reaction attribute of the known chemical reaction, wherein the superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of the known chemical reaction.
Optionally, the reaction attribute of the target chemical reaction includes a first chemical reaction fingerprint sequence obtained by encoding the reactants and products of the target chemical reaction; the reaction attribute of the known chemical reaction includes a second chemical reaction fingerprint sequence obtained by encoding a reaction center of the known chemical reaction, wherein the reaction center of the known chemical reaction is the reactant molecular fragment and the corresponding product molecular fragment that change in the known chemical reaction.
Optionally, the first and second chemical reaction fingerprint sequences are one of the following molecular fingerprints: Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints, extended connectivity fingerprints.
Optionally, the superstructure matching degree is calculated according to a superstructure search algorithm.
Optionally, the method further includes:
and determining a chemical reaction with the superstructure matching degree larger than a preset value from the known chemical reactions as a search result, wherein the target chemical reaction belongs to the reaction category of the search result.
The embodiment of the present application further provides a chemical reaction search apparatus, which is applied to a graphics processor, and the apparatus includes:
an attribute acquisition unit for acquiring a reaction attribute of a target chemical reaction and reaction attributes of a plurality of known chemical reactions;
a matching degree calculation unit for calculating, in parallel, a superstructure matching degree of the target chemical reaction with respect to the known chemical reaction, the superstructure matching degree reflecting a probability that the target chemical reaction belongs to a reaction category of the known chemical reaction, based on a reaction attribute of the target chemical reaction and a reaction attribute of the known chemical reaction.
Optionally, the reaction attribute of the target chemical reaction includes a first chemical reaction fingerprint sequence obtained by encoding a reaction center of the target chemical reaction, where the reaction center of the target chemical reaction is a reactant molecular fragment and a corresponding product molecular fragment that change in the target chemical reaction; the reaction attributes of the known chemical reaction include a second chemical reaction fingerprint sequence encoding reactants and products of the known chemical reaction.
Optionally, the first and second chemical reaction fingerprint sequences are one of the following molecular fingerprints: Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints, extended connectivity fingerprints.
Optionally, the superstructure matching degree is calculated according to a superstructure search algorithm.
Optionally, the apparatus further comprises:
and the search result determining unit is used for determining a chemical reaction with the superstructure matching degree larger than a preset value from the known chemical reactions as a search result, and the target chemical reaction belongs to the reaction category of the search result.
An embodiment of the present application further provides a graphics processor, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the chemical reaction search method.
The embodiment of the application also provides a chemical reaction searching system which comprises at least one graphic processor.
The embodiments of the application provide a chemical reaction search method, device, and system, and a graphics processor. The chemical reaction search method can be applied to a graphics processor. The graphics processor obtains the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions, and based on these attributes calculates in parallel the superstructure matching degree of the target chemical reaction relative to the known chemical reactions, where the superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of a known chemical reaction. Because the graphics processor processes data in parallel, the superstructure matching degrees of different target chemical reactions relative to the known chemical reactions can be computed in parallel. Compared with sequential execution, this reduces the time consumed in calculating the superstructure matching degree and improves the calculation efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a flow chart of a chemical reaction search method provided in an embodiment of the present application;
FIG. 2 is a schematic illustration of reactants and products of a chemical reaction provided by an embodiment of the present application;
FIG. 3 is a schematic view of a reaction center provided in an embodiment of the present application;
FIG. 4 is a block diagram of a chemical reaction search apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, chemical reactions meeting preset search conditions can be retrieved from a large-scale chemical reaction database. Common search conditions include reactants, products, reaction conditions, and catalysts. For example, the reactants and products of a target chemical reaction may be used to search for known chemical reactions whose reactants and products are contained in those of the target chemical reaction. The target chemical reaction includes the characteristics of the retrieved chemical reaction and is more complex than it, so the target chemical reaction may be considered to belong to the reaction category of the retrieved chemical reaction, which achieves classification of the target chemical reaction.
However, with the increasing number of new molecules discovered by chemical research and the advent of computer-based virtual molecule construction technology, the number of molecules in the known molecular database is gradually increased, the data volume in the known chemical reaction database is also gradually increased, from millions to tens of millions, and is still increasing.
The inventors found that the main reason conventional chemical reaction search calculations take so long is that conventional algorithms depend on the read speed of the known chemical reaction database and the computing speed of the central processing unit (CPU). Because improvements in disk read speed and CPU performance are gradually slowing, relying on hardware upgrades alone yields a very limited speedup and cannot keep pace with the rapidly growing number of chemical reactions to be searched. The CPU cannot perform fast chemical reaction search calculations because it processes data serially: within a given time period it calculates the matching degree between only one known chemical reaction and the target chemical reaction, and all matching degree calculations are completed one after another. The calculation time therefore grows linearly with the number of chemical reactions in the known chemical reaction database, which clearly cannot meet practical requirements.
Based on this, embodiments of the present application provide a chemical reaction search method, device, and system, and a graphics processor. The chemical reaction search method may be applied to a graphics processor. The graphics processor obtains the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions, and based on these attributes calculates in parallel the superstructure matching degree of the target chemical reaction relative to the known chemical reactions, where the superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of a known chemical reaction. Because the graphics processor processes data in parallel, the superstructure matching degrees of different target chemical reactions relative to the known chemical reactions can be computed in parallel. Compared with sequential execution, this reduces the time consumed in calculating the superstructure matching degree and improves the calculation efficiency.
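The contrast between sequential and parallel scoring can be sketched on an ordinary CPU. The minimal Python sketch below uses a thread pool purely as a stand-in for the graphics processor's computing units. Fingerprints are modeled as Python integers used as bit sets, and the names `match_degree` and `score_all`, as well as the particular definition of the matching degree (fraction of the known reaction's bits found in the target), are illustrative assumptions rather than anything specified by the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def match_degree(target_fp: int, known_fp: int) -> float:
    """Assumed score: fraction of the known reaction's set bits present in the target."""
    known_bits = bin(known_fp).count("1")
    if known_bits == 0:
        return 0.0
    shared_bits = bin(target_fp & known_fp).count("1")
    return shared_bits / known_bits


def score_all(target_fp: int, known_fps: Sequence[int], workers: int = 4) -> List[float]:
    """Score one target against every known reaction; each pair is independent work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so result i belongs to known_fps[i].
        return list(pool.map(lambda fp: match_degree(target_fp, fp), known_fps))


# Hypothetical 6-bit fingerprints, for illustration only.
scores = score_all(0b101101, [0b001101, 0b110000, 0b000000])
```

Because no pair depends on any other pair, the same work can be distributed over as many computing units as the hardware offers, which is the property the graphics processor exploits.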
The following describes in detail a specific implementation manner of a chemical reaction search method, apparatus, and system provided by the embodiments of the present application with reference to the accompanying drawings.
FIG. 1 is a flowchart of a chemical reaction search method provided in an embodiment of the present application. The method may be applied to a graphics processing unit (GPU). The GPU may have a plurality of computing units that process data in parallel; for example, a GPU may include thousands of computing units, so that thousands of data items can be processed at the same time. The chemical reaction search method provided by the embodiment of the application may include the following steps:
S101: acquiring the reaction attribute of the target chemical reaction and the reaction attributes of a plurality of known chemical reactions.
In this embodiment, the graphics processor may search the known chemical reactions, for example for a chemical reaction that represents the category of the target chemical reaction, screening the known chemical reactions by their matching degree with the target chemical reaction. Specifically, the chemical reaction corresponding to the category to which the target chemical reaction belongs may be determined by calculating the superstructure matching degree of the target chemical reaction relative to each known chemical reaction. The target chemical reaction may be a newly studied chemical reaction or a chemical reaction with unknown reaction characteristics. A known chemical reaction may be one that has already been studied, for example a chemical reaction with known reaction characteristics, and may be stored in a database of known chemical reactions. After the target chemical reaction is determined to be a superstructure of a known chemical reaction, the target chemical reaction can be analyzed according to the chemical characteristics of that known chemical reaction.
Searching for the chemical reaction that represents the category of the target chemical reaction can be treated as analogous to a compound superstructure search. In the field of chemical information, a compound superstructure relationship means that part of the molecular structure of one compound corresponds completely to another compound; that is, the former contains the structure of the latter, and the former is a superstructure of the latter. Complete correspondence means that the atoms correspond one to one and the chemical bonds between the atoms also correspond one to one. Similarly, if part of the reaction attribute of one chemical reaction corresponds completely to the reaction attribute of another chemical reaction, that is, the former contains the reaction attribute of the latter and is more complex than the latter, the former may be considered to belong to the reaction category of the latter. If the reaction attribute includes chemical molecules in the reaction, complete correspondence here means that the atoms of the chemical molecules correspond one to one and the chemical bonds between the atoms also correspond one to one.
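The one-to-one correspondence of atoms and bonds described here amounts to a subgraph matching problem. The following Python sketch checks it with a plain backtracking search over heavy atoms only. The function name `is_superstructure`, the dictionary-based molecule representation, and the ethanol example are illustrative assumptions and are not taken from the application.

```python
from typing import Dict, FrozenSet

Atoms = Dict[int, str]             # atom index -> element symbol
Bonds = Dict[FrozenSet[int], int]  # frozenset({i, j}) -> bond order


def is_superstructure(sup_atoms: Atoms, sup_bonds: Bonds,
                      sub_atoms: Atoms, sub_bonds: Bonds) -> bool:
    """True if every atom and bond of `sub` maps one to one into `sup`."""
    sub_order = list(sub_atoms)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(sub_order):
            return True  # every atom of the smaller structure is placed
        a = sub_order[len(mapping)]
        for b in sup_atoms:
            if b in mapping.values() or sup_atoms[b] != sub_atoms[a]:
                continue
            # Each bond from `a` to an already-placed atom must exist in
            # `sup` between the corresponding atoms, with the same order.
            consistent = all(
                sup_bonds.get(frozenset((b, mapping[n]))) == order
                for bond, order in sub_bonds.items()
                if a in bond
                for n in bond - {a}
                if n in mapping
            )
            if consistent:
                mapping[a] = b
                if extend(mapping):
                    return True
                del mapping[a]  # backtrack and try the next candidate
        return False

    return extend({})


# Hypothetical example: ethanol heavy atoms (C-C-O) and a C-O fragment.
ethanol_atoms = {0: "C", 1: "C", 2: "O"}
ethanol_bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 1}
co_atoms = {0: "C", 1: "O"}
co_bonds = {frozenset((0, 1)): 1}
contains_fragment = is_superstructure(ethanol_atoms, ethanol_bonds, co_atoms, co_bonds)
```

In this example ethanol is a superstructure of the C-O fragment, while the reverse check fails because the fragment has too few atoms.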
The reaction attributes of the target chemical reaction and of the known chemical reactions may include at least one of reactants, products, reaction conditions, catalysts, solvents, and the like. For example, they may include the chemical molecules of the reactants and products, so that the superstructure matching degree can be calculated from the reactants and products of the target chemical reaction and the reactants and products of the known chemical reaction.
Generally, the chemical molecules of the reactants and products are represented by chemical formulas, which describe the atoms and chemical bonds that make up each molecule; the chemical bonds express the connections between the atoms.
Therefore, the reaction attribute of the target chemical reaction may include chemical molecules of the reactant and the product of the target chemical reaction, and specifically may include a molecular fingerprint of the reactant and a molecular fingerprint of the product of the target chemical reaction, where the molecular fingerprint of the reactant may be obtained by performing molecular coding on atoms and chemical bonds of the reactant, and the molecular fingerprint of the product may be obtained by performing molecular coding on atoms and chemical bonds of the product. Similarly, the reaction attribute of the known chemical reaction may include chemical molecules of the reactant and the product of the known chemical reaction, and specifically may include a molecular fingerprint of the reactant and a molecular fingerprint of the product of the known chemical reaction, where the molecular fingerprint of the reactant may be obtained by molecular coding of atoms and chemical bonds of the reactant, and the molecular fingerprint of the product may be obtained by molecular coding of atoms and chemical bonds of the product.
Molecular fingerprints can be of various types, for example Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints, and extended connectivity fingerprints. It should be noted that the molecular fingerprints of the reactants and products in the target chemical reaction and those of the reactants and products in the known chemical reactions are all of the same fingerprint type, which facilitates calculating the superstructure matching degree between the target chemical reaction and the known chemical reactions.
Specifically, the molecular fingerprint of the reactant of the target chemical reaction and the molecular fingerprint of the product may be spliced to obtain the molecular fingerprint of the target chemical reaction as the reaction attribute of the target chemical reaction, and the molecular fingerprint of the reactant of the known chemical reaction and the molecular fingerprint of the product of the known chemical reaction may be spliced to obtain the molecular fingerprint of the known chemical reaction as the reaction attribute of the known chemical reaction.
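The splicing step can be illustrated with a short Python sketch. It assumes, for readability, that each molecular fingerprint is a fixed-length string of `0` and `1` characters; the length `FP_BITS`, the function name `splice_fingerprints`, and the sample bit patterns are hypothetical.

```python
FP_BITS = 8  # assumed per-molecule fingerprint length, for illustration only


def splice_fingerprints(reactant_fp: str, product_fp: str) -> str:
    """Concatenate reactant and product fingerprints into one reaction fingerprint."""
    for fp in (reactant_fp, product_fp):
        if len(fp) != FP_BITS or set(fp) - {"0", "1"}:
            raise ValueError("expected a bit string of length %d" % FP_BITS)
    # Reactant bits always come first, so bit positions line up across reactions.
    return reactant_fp + product_fp


reaction_fp = splice_fingerprints("10110000", "00101100")
```

Keeping the reactant half and the product half at fixed positions means that two reaction fingerprints built this way can be compared bit by bit.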
In recent years, as more and more new compounds are synthesized and new natural compounds are continually discovered, the number of chemical structure fragments, functional groups, and structures with point ion groups associated with these new compounds and their new functions and characteristics has grown larger and larger. In practice, therefore, computing the superstructure matching degree from the full information of the reactants and products of the target chemical reaction and of the known chemical reactions requires a large amount of computation.
In fact, study of target and known chemical reactions shows that the reactants and products of a chemical reaction differ only in some molecular fragments. The reactants of many chemical reactions are not identical, yet the reactions share the same reaction mechanism. At the atomic level, a chemical reaction is a transfer of electrons between atoms, and chemical reactions with the same mechanism have exactly the same electron transfer mechanism. Whether a given chemical reaction can occur, that is, whether the corresponding electron transfer process can take place, is therefore determined by whether certain local atoms of the reactants, together with the chemical bond relationships between those atoms and their surrounding atoms, satisfy the requirements of that type of chemical reaction.
That is, only some molecular fragments of the reactants change. Using the full molecular fingerprints of the reactants and products as the reaction attribute of a chemical reaction therefore tends to introduce a good deal of information irrelevant to the reaction, which reduces the proportion of information describing what actually changes and affects the accuracy of the superstructure search.
Therefore, in the embodiment of the present application, a reaction center of a chemical reaction can be determined, where the reaction center is a reactant molecular fragment and a corresponding product molecular fragment that change in the chemical reaction, and in a microscopic view, the reaction center is a molecular fragment formed by atoms that have a decisive influence on the chemical reaction and surrounding atoms, and then a reaction attribute of the chemical reaction can be determined according to the reaction center. It is to be appreciated that at least one of the reaction properties of the target chemical reaction and the reaction properties of the known chemical reactions can be determined based on their corresponding reaction centers, facilitating an increase in the efficiency of computation of the superstructure matching.
Specifically, the reaction property of the target chemical reaction may be determined according to a reaction center of the target chemical reaction, where the reaction center of the target chemical reaction is a reactant molecular fragment and a corresponding product molecular fragment that change in the target chemical reaction, and the reaction center of the target chemical reaction may be used as a simplified reaction formula of the target chemical reaction. The reaction properties of a known chemical reaction can be determined from the reaction centers of the known chemical reactions, which are the reactant molecular fragments and the corresponding product molecular fragments that change in the known chemical reaction, as a simplified reaction formula for the known chemical reaction.
For example, FIG. 2 is a schematic diagram of the reactants and product of a chemical reaction, here a Suzuki coupling reaction, provided in an embodiment of the present application. The arrow indicates the direction of the reaction; the reactants C7H7Cl and C6H7O2B are to the left of the arrow, and the product C13H11 is to the right. Comparing the atoms and chemical bonds of the reactants C7H7Cl and C6H7O2B with those of the product C13H11 yields the reaction center of the chemical reaction. FIG. 3 is a schematic diagram of a reaction center provided in an embodiment of the present application. The arrow indicates the direction of the reaction; the reactant molecular fragments C3H4Cl and C3H6O2B of the reaction center are to the left of the arrow, and the product molecular fragment C6H8 of the reaction center is to the right.
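One way to picture the comparison of atoms and bonds is the Python sketch below. It assumes that reactant and product atoms share a common numbering (an atom mapping), treats the reaction center as the atoms of every bond that is broken or formed plus their neighbors within a chosen number of bonds, and ignores changes in bond order. The function names, the `radius` parameter, and the toy coupling data are hypothetical and are not the procedure prescribed by the application.

```python
from typing import FrozenSet, Set, Tuple

Bond = FrozenSet[int]


def bonds(*pairs: Tuple[int, int]) -> Set[Bond]:
    """Build a set of undirected bonds from (atom, atom) pairs."""
    return {frozenset(pair) for pair in pairs}


def reaction_center(reactant_bonds: Set[Bond], product_bonds: Set[Bond],
                    radius: int = 1) -> Set[int]:
    """Atoms whose bonding changes, plus neighbors within `radius` bonds."""
    changed = reactant_bonds ^ product_bonds  # bonds broken or formed
    center = {atom for bond in changed for atom in bond}
    every_bond = reactant_bonds | product_bonds
    for _ in range(radius):
        # Grow the fragment by one shell of directly bonded atoms.
        center |= {atom for bond in every_bond if bond & center for atom in bond}
    return center


# Toy coupling: atom 3 (a halogen) and atom 6 (a boron group) leave,
# and a new bond forms between atoms 2 and 5.
reactant_bonds = bonds((0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (4, 7))
product_bonds = bonds((0, 1), (1, 2), (2, 5), (4, 5), (4, 7))
center_atoms = reaction_center(reactant_bonds, product_bonds)
```

With `radius=0` only atoms 2, 3, 5, and 6 are returned; the default radius of 1 adds their direct neighbors 1 and 4, while the remote atoms 0 and 7 stay outside the fragment.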
For example, the reaction attribute of the target chemical reaction may include a first chemical reaction fingerprint sequence, which may be encoded by reactants and products of the target chemical reaction. Specifically, the first reactant molecular fingerprint may be obtained by encoding a reactant in the target chemical reaction, the first product molecular fingerprint may be obtained by encoding a product in the target chemical reaction, and the first reactant molecular fingerprint and the first product molecular fingerprint may be spliced to obtain the first chemical reaction fingerprint sequence.
The reaction attribute of the known chemical reaction may include a second chemical reaction fingerprint sequence, which may be obtained by encoding the reaction center of the known chemical reaction. The reaction center of the known chemical reaction may include the reactant molecular fragment that changes in the known chemical reaction and the corresponding product molecular fragment. Specifically, a second reactant molecular fingerprint can be obtained by encoding the reactant molecular fragment that changes in the known chemical reaction, a second product molecular fingerprint can be obtained by encoding the product molecular fragment corresponding to the changed reactant fragment, and the second reactant molecular fingerprint and the second product molecular fingerprint can be spliced to obtain the second chemical reaction fingerprint sequence.
Specifically, the first chemical reaction fingerprint sequence of the target chemical reaction and the second chemical reaction fingerprint sequence of the known chemical reaction are one of the following molecular fingerprints: Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints, extended connectivity fingerprints.
In this embodiment, the number of target chemical reactions whose reaction attributes the graphics processor obtains may depend on the processing capacity of the graphics processor itself. The graphics processor may obtain the reaction attribute of one target chemical reaction and the reaction attributes of a plurality of known chemical reactions, so as to calculate the superstructure matching degree of that target chemical reaction relative to each known chemical reaction. It may also obtain the reaction attributes of a plurality of target chemical reactions and of a plurality of known chemical reactions, so as to calculate the superstructure matching degree of each target chemical reaction relative to each known chemical reaction.
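When several targets are scored against several known reactions, every (target, known) pair is an independent work item. The Python sketch below shows one common way to number such pairs with a single flat index, as a GPU thread index would; the loop stands in for threads that would run concurrently on real hardware. The names `pair_for_work_item`, `score_pairs`, and `contained_fraction`, and the scoring rule itself, are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def contained_fraction(target_fp: int, known_fp: int) -> float:
    """Assumed score: share of the known reaction's set bits found in the target."""
    known_bits = bin(known_fp).count("1")
    if known_bits == 0:
        return 0.0
    return bin(target_fp & known_fp).count("1") / known_bits


def pair_for_work_item(index: int, n_known: int) -> Tuple[int, int]:
    """Map a flat work-item index to (target row, known column)."""
    return divmod(index, n_known)


def score_pairs(target_fps: Sequence[int], known_fps: Sequence[int]) -> List[List[float]]:
    n_known = len(known_fps)
    scores = [[0.0] * n_known for _ in target_fps]
    # On a GPU each index would be handled by its own thread; this loop
    # visits the same indices one after another.
    for index in range(len(target_fps) * n_known):
        t, k = pair_for_work_item(index, n_known)
        scores[t][k] = contained_fraction(target_fps[t], known_fps[k])
    return scores


# Hypothetical 4-bit fingerprints: two targets against three known reactions.
scores = score_pairs([0b1111, 0b0011], [0b0011, 0b1100, 0b0110])
```

The single-target case is simply the special case with one row.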
The reaction attributes of the known chemical reactions may be stored in a database in a storage device. For example, the second chemical reaction fingerprint sequences of the known chemical reactions may be stored in a library of known chemical reaction fingerprints, in which case obtaining the reaction attribute of a known chemical reaction means obtaining its second chemical reaction fingerprint sequence from that library. In a specific implementation, the reaction attributes of the known chemical reactions may first be read into the memory of a central processing unit (CPU) and then transferred to the memory of the graphics processor, so that the graphics processor can obtain the reaction attributes of the known chemical reactions from its own memory.
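The first stage of the transfer described above (library to CPU memory) might be sketched as follows. The SQLite table `known_reactions`, its columns, and the two-word fingerprint layout are hypothetical and chosen only for illustration; the application does not specify a database or a storage format. The copy from host memory to graphics processor memory depends on the GPU library in use and is therefore only indicated by a comment.

```python
import sqlite3
from array import array

WORDS_PER_FP = 2  # assumed: each fingerprint is stored as two 64-bit words


def load_fingerprint_library(conn: sqlite3.Connection):
    """Read every known-reaction fingerprint into one contiguous host buffer."""
    ids = []
    words = array("Q")  # unsigned 64-bit words, contiguous in CPU memory
    query = "SELECT id, fp FROM known_reactions ORDER BY id"
    for reaction_id, blob in conn.execute(query):
        if len(blob) != 8 * WORDS_PER_FP:
            raise ValueError(f"unexpected fingerprint size for reaction {reaction_id}")
        ids.append(reaction_id)
        words.frombytes(blob)
    return ids, words


# Build a tiny in-memory library with two hypothetical fingerprints
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE known_reactions (id INTEGER PRIMARY KEY, fp BLOB)")
conn.executemany(
    "INSERT INTO known_reactions VALUES (?, ?)",
    [
        (1, array("Q", [0b1011, 0]).tobytes()),
        (2, array("Q", [0, 1 << 36]).tobytes()),
    ],
)
ids, host_buffer = load_fingerprint_library(conn)
# A GPU library would now copy host_buffer into graphics processor memory (not shown).
```

A single flat buffer is used here because one large host-to-device copy is generally cheaper than many small ones.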
S102, calculating in parallel the superstructure matching degree of the target chemical reaction relative to the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
After obtaining the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions, the graphics processor may calculate the superstructure matching degree of the target chemical reaction relative to each known chemical reaction based on these reaction attributes, where the superstructure matching degree may represent the probability that the target chemical reaction belongs to the reaction category of the known chemical reaction. For example, the matching degree between the first chemical reaction fingerprint sequence of the target chemical reaction and the second chemical reaction fingerprint sequence of the known chemical reaction may be calculated. This matching degree may be calculated with a fingerprint similarity calculation method, or by treating the first chemical reaction fingerprint sequence and the second chemical reaction fingerprint sequence as two compounds and applying a superstructure search algorithm, which may include, for example, the Ullmann algorithm, the VF algorithm, and the VF2 algorithm.
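The application does not fix a formula for the matching degree. One plausible fingerprint-based choice, shown below purely as an assumption, is a containment ratio: the fraction of bits set in the known reaction's fingerprint that are also set in the target's fingerprint. A value of 1.0 means every feature of the known reaction center appears in the target, which is the superstructure condition at the fingerprint level. The function names are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def superstructure_matching_degree(target_fp: int, known_fp: int) -> float:
    """Fraction of the known reaction's fingerprint bits that also appear in the target.

    Assumed containment measure; not a formula taken from the application.
    """
    known_bits = popcount(known_fp)
    if known_bits == 0:
        return 0.0  # an empty fingerprint carries no evidence of a match
    return popcount(target_fp & known_fp) / known_bits


known = 0b01101001           # hypothetical reaction-center fingerprint
full_target = 0b11101101     # contains every bit of `known`
partial_target = 0b00001001  # contains two of the four bits of `known`

full = superstructure_matching_degree(full_target, known)        # 1.0
partial = superstructure_matching_degree(partial_target, known)  # 0.5
```

Note that a fingerprint-level score of 1.0 is a necessary condition rather than a proof of a structural match, because distinct fragments can set the same bits. An exact check would require a graph-matching algorithm such as the Ullmann or VF2 algorithms mentioned above.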
When the reaction attribute of the known chemical reaction is determined from the reaction center of the known chemical reaction, the reaction center accounts for a high proportion of that reaction attribute. The superstructure matching degree determined from the reaction attribute of the target chemical reaction and the reaction attribute of the known chemical reaction is therefore strongly related to the reaction center of the known chemical reaction, and may even mainly reflect how well the reaction center of the known chemical reaction matches the target chemical reaction, so the obtained superstructure matching degree is more accurate.
Because the graphics processor has a plurality of computing units, each of which can process data independently, the computing units can be used to calculate the superstructure matching degrees of target chemical reactions relative to known chemical reactions in parallel. That is, within the same time period, the superstructure matching degrees of the same target chemical reaction relative to a plurality of known chemical reactions can be calculated, the superstructure matching degrees of a plurality of target chemical reactions relative to the same known chemical reaction can be calculated, and the superstructure matching degrees of different target chemical reactions relative to different known chemical reactions can be calculated, which improves the calculation efficiency of the superstructure matching degree. It can be appreciated that the greater the number of computing units in the graphics processor, the higher the parallel processing efficiency of the superstructure matching.
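The work decomposition described above can be illustrated with a short sketch in which each (target, known) pair is an independent task identified by a flat index, much as a GPU thread might derive its pair from its thread index. Here a Python thread pool stands in for the computing units of the graphics processor, so the example shows only how the work is divided, not any real speed-up. The containment-style `matching_degree` function and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def matching_degree(target_fp: int, known_fp: int) -> float:
    """Assumed containment ratio between two bit-vector fingerprints."""
    known_bits = bin(known_fp).count("1")
    if known_bits == 0:
        return 0.0
    return bin(target_fp & known_fp).count("1") / known_bits


def all_pairs_matching(target_fps, known_fps, workers=4):
    """Compute the matching degree of every target against every known reaction.

    Each flat index maps to one (target, known) pair; pairs share no state,
    so they can be evaluated in any order or simultaneously.
    """
    n_known = len(known_fps)

    def work(idx):
        t, k = divmod(idx, n_known)  # flat index -> (target row, known column)
        return matching_degree(target_fps[t], known_fps[k])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(work, range(len(target_fps) * n_known)))
    # Reshape the flat result into one row per target reaction
    return [flat[t * n_known:(t + 1) * n_known] for t in range(len(target_fps))]


targets = [0b1111, 0b0011]        # hypothetical target fingerprints
knowns = [0b0011, 0b1100, 0b0101]  # hypothetical known-reaction fingerprints
matrix = all_pairs_matching(targets, knowns)
```

The same index mapping covers all three cases named in the text: one target against many known reactions is a single row, many targets against one known reaction is a single column, and the general case is the full matrix.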
After the graphics processor calculates the superstructure matching degree of the target chemical reaction relative to the known chemical reactions, the calculation result can be stored in the memory for subsequent use. For example, according to the superstructure matching degrees of the target chemical reaction relative to the plurality of known chemical reactions, a known chemical reaction whose superstructure matching degree with the target chemical reaction is higher than a preset value may be determined from the plurality of known chemical reactions as a search result, in which case the target chemical reaction belongs to the reaction category of the chemical reaction in the search result.
The known chemical reaction can be represented by a simplified reaction (namely, a reaction center), and the simplified reaction can represent a class of chemical reactions. When the target chemical reaction contains the simplified reaction in the search result, the target chemical reaction can belong to the category corresponding to the search result, so that the chemical reaction superstructure search is realized and the target chemical reaction is classified. The category corresponding to the search result is obtained in advance by classifying the known chemical reactions according to their reaction centers.
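The selection of the search result and the resulting classification may be sketched as follows. The reaction identifiers, category names, and the preset value of 0.8 are hypothetical values chosen for illustration, and the category table is assumed to have been built in advance from the reaction centers of the known chemical reactions.

```python
def classify_target(degrees, categories, preset_value):
    """Return (search_result, reaction_categories) for one target chemical reaction.

    degrees: maps a known-reaction id to its superstructure matching degree.
    categories: maps a known-reaction id to its precomputed reaction category.
    """
    # Keep only known reactions whose matching degree exceeds the preset value,
    # ordered from best match to worst
    search_result = sorted(
        (rid for rid, degree in degrees.items() if degree > preset_value),
        key=lambda rid: degrees[rid],
        reverse=True,
    )
    # Collect the categories of the hits, without duplicates, in ranking order
    reaction_categories = []
    for rid in search_result:
        if categories[rid] not in reaction_categories:
            reaction_categories.append(categories[rid])
    return search_result, reaction_categories


degrees = {"R1": 1.0, "R2": 0.4, "R3": 0.9}
categories = {"R1": "esterification", "R2": "reduction", "R3": "esterification"}
hits, classes = classify_target(degrees, categories, preset_value=0.8)
```

In this example `hits` is `["R1", "R3"]` and `classes` is `["esterification"]`, so the target would be assigned to the esterification category.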
The embodiment of the application provides a chemical reaction search method that can be applied to a graphics processor. The graphics processor can acquire the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions, and can then calculate in parallel, based on these reaction attributes, the superstructure matching degree of the target chemical reaction relative to the known chemical reactions, where the superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of the known chemical reaction. Because the graphics processor processes data in parallel, the superstructure matching degrees of different target chemical reactions relative to the known chemical reactions can be calculated in parallel. Compared with sequential execution, this reduces the time consumed by calculating the superstructure matching degree and improves the calculation efficiency.
Based on the above chemical reaction search method, an embodiment of the present application further provides a chemical reaction search apparatus. Fig. 4 is a structural block diagram of the chemical reaction search apparatus provided in the embodiment of the present application, where the apparatus includes:
an attribute acquiring unit 110 for acquiring a reaction attribute of a target chemical reaction, and reaction attributes of a plurality of known chemical reactions;
a matching degree calculation unit 120 configured to calculate, in parallel, a superstructure matching degree of the target chemical reaction with respect to the known chemical reaction, the superstructure matching degree reflecting a probability that the target chemical reaction belongs to a reaction category of the known chemical reaction, based on the reaction attribute of the target chemical reaction and the reaction attribute of the known chemical reaction.
Optionally, the reaction attribute of the target chemical reaction includes a first chemical reaction fingerprint sequence obtained by encoding a reaction center of the target chemical reaction, where the reaction center of the target chemical reaction is a reactant molecular fragment and a corresponding product molecular fragment that change in the target chemical reaction; the reaction attributes of the known chemical reaction include a second chemical reaction fingerprint sequence encoding reactants and products of the known chemical reaction.
Optionally, the first and second chemical reaction fingerprint sequences are one of the following molecular fingerprints: molecular access system fingerprints, Morgan fingerprints, extended connectivity fingerprints.
Optionally, the superstructure matching degree is calculated according to a superstructure search algorithm.
Optionally, the apparatus further comprises:
a search result determining unit, configured to determine, from the known chemical reactions, a chemical reaction whose superstructure matching degree is larger than a preset value as a search result, where the target chemical reaction belongs to the reaction category of the search result.
The embodiment of the application provides a chemical reaction search apparatus that can be applied to a graphics processor. The graphics processor can acquire the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions, and can then calculate in parallel, based on these reaction attributes, the superstructure matching degree of the target chemical reaction relative to the known chemical reactions, where the superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of the known chemical reaction. Because the graphics processor processes data in parallel, the superstructure matching degrees of different target chemical reactions relative to the known chemical reactions can be calculated in parallel. Compared with sequential execution, this reduces the time consumed by calculating the superstructure matching degree and improves the calculation efficiency.
An embodiment of the present application further provides a graphics processor, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the chemical reaction search method.
The embodiment of the application also provides a computer-readable storage medium that stores instructions which, when run on a terminal device, cause the graphics processor to execute the chemical reaction search method.
In addition, the embodiment of the application also provides a chemical reaction search system that comprises at least one graphics processor. Specifically, a plurality of graphics processors may be installed on a single computer to obtain a higher operation speed.
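The application states only that several graphics processors on one computer yield a higher operation speed; it does not say how the work is divided among them. One simple possibility, shown here as an assumption, is to split the library of known chemical reactions into contiguous shards of nearly equal size, one per graphics processor, so that each device scores the target against its own shard. The function name is hypothetical.

```python
def shard_library(known_ids, n_devices):
    """Split the known-reaction list into n_devices contiguous, near-equal shards."""
    if n_devices < 1:
        raise ValueError("at least one graphics processor is required")
    base, extra = divmod(len(known_ids), n_devices)
    shards, start = [], 0
    for device in range(n_devices):
        # The first `extra` devices take one additional reaction each
        size = base + (1 if device < extra else 0)
        shards.append(known_ids[start:start + size])
        start += size
    return shards


# Ten hypothetical known reactions divided among three graphics processors
shards = shard_library(list(range(10)), 3)
```

With ten reactions and three devices, the shards hold four, three, and three reactions respectively, and the per-device results can be concatenated in shard order to recover the ordering of the original library.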
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, system embodiments and device embodiments are substantially similar to method embodiments and are therefore described in a relatively simple manner, where relevant reference may be made to some descriptions of method embodiments. The above-described embodiments of the apparatus and system are merely illustrative, wherein modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the scope of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (10)

1. A chemical reaction search method applied to a graphics processor, the method comprising:
acquiring the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions;
calculating a superstructure matching degree of the target chemical reaction relative to the known chemical reaction in parallel based on the reaction attribute of the target chemical reaction and the reaction attribute of the known chemical reaction, wherein the superstructure matching degree represents the probability that the target chemical reaction belongs to the reaction category of the known chemical reaction.
2. The method of claim 1, wherein the reaction attributes of the target chemical reaction include a first chemical reaction fingerprint sequence encoding reactants and products of the target chemical reaction; the reaction attribute of the known chemical reaction comprises a second chemical reaction fingerprint sequence obtained by encoding a reaction center of the known chemical reaction, wherein the reaction center of the known chemical reaction is a reactant molecular fragment and a corresponding product molecular fragment which are changed in the known chemical reaction.
3. The method of claim 2, wherein the first and second chemical reaction fingerprint sequences are one of the following molecular fingerprints: molecular access system fingerprints, Morgan fingerprints, extended connectivity fingerprints.
4. The method of any of claims 1-3, wherein the superstructure matching degree is calculated according to a superstructure search algorithm.
5. The method of any one of claims 1-3, further comprising:
determining, from the known chemical reactions, a chemical reaction whose superstructure matching degree is larger than a preset value as a search result, wherein the target chemical reaction belongs to the reaction category of the search result.
6. A chemical reaction search apparatus applied to a graphics processor, the apparatus comprising:
an attribute acquisition unit for acquiring a reaction attribute of a target chemical reaction and reaction attributes of a plurality of known chemical reactions;
a matching degree calculation unit for calculating, in parallel, a superstructure matching degree of the target chemical reaction with respect to the known chemical reaction, the superstructure matching degree reflecting a probability that the target chemical reaction belongs to a reaction category of the known chemical reaction, based on a reaction attribute of the target chemical reaction and a reaction attribute of the known chemical reaction.
7. The apparatus of claim 6, wherein the reaction attribute of the target chemical reaction comprises a first chemical reaction fingerprint sequence encoded by the reaction center of the target chemical reaction, the reaction center of the target chemical reaction being a reactant molecular fragment and a corresponding product molecular fragment that change in the target chemical reaction; the reaction attributes of the known chemical reaction include a second chemical reaction fingerprint sequence encoding reactants and products of the known chemical reaction.
8. The apparatus of claim 6, further comprising:
a search result determining unit, configured to determine, from the known chemical reactions, a chemical reaction whose superstructure matching degree is larger than a preset value as a search result, wherein the target chemical reaction belongs to the reaction category of the search result.
9. A graphics processor, comprising: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the chemical reaction search method of any of claims 1-5.
10. A chemical reaction search system comprising at least one graphics processor as claimed in claim 9.
CN202010989051.7A 2020-09-18 2020-09-18 Chemical reaction search method, device and system and graphic processor Pending CN112131244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010989051.7A CN112131244A (en) 2020-09-18 2020-09-18 Chemical reaction search method, device and system and graphic processor

Publications (1)

Publication Number Publication Date
CN112131244A true CN112131244A (en) 2020-12-25

Family

ID=73841504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010989051.7A Pending CN112131244A (en) 2020-09-18 2020-09-18 Chemical reaction search method, device and system and graphic processor

Country Status (1)

Country Link
CN (1) CN112131244A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030182094A1 (en) * 2002-02-14 2003-09-25 Broughton Howard Barff Methods for classifying and searching chemical reactions
CN104750761A (en) * 2013-12-31 2015-07-01 上海致化化学科技有限公司 Method for creating molecular structure databases and method for searching same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
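袁小龙: "A new algorithm using the GPU to accelerate molecular fingerprint pre-screening and structural similarity calculation", Proceedings of the 12th National Conference on Computational (Computer) Chemistry, pages 89 *
贺巧鑫: "Automatic construction and visualization of chemical reaction networks from ReaxFF MD simulation results", University of Chinese Academy of Sciences (Institute of Process Engineering, Chinese Academy of Sciences), no. 2019, pages 12 *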

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115240785A (en) * 2022-07-21 2022-10-25 苏州沃时数字科技有限公司 Chemical reaction prediction method, system, device and storage medium
CN115240785B (en) * 2022-07-21 2023-09-12 苏州沃时数字科技有限公司 Chemical reaction prediction method, system, device and storage medium
CN116226472A (en) * 2022-11-17 2023-06-06 上海药明康德新药开发有限公司 Vectorization-based reference reaction query method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination