CN112086136A - Data processing method, device and system and graphics processor - Google Patents

Info

Publication number
CN112086136A
Authority
CN
China
Prior art keywords
reaction
chemical reaction
target
known chemical
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010986444.2A
Other languages
Chinese (zh)
Inventor
夏宁
万钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhihua Technology Co ltd
Original Assignee
Wuhan Zhihua Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zhihua Technology Co ltd filed Critical Wuhan Zhihua Technology Co ltd
Priority to CN202010986444.2A priority Critical patent/CN112086136A/en
Publication of CN112086136A publication Critical patent/CN112086136A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/10 Analysis or design of chemical reactions, syntheses or processes

Abstract

The embodiments of the present application disclose a data processing method, device and system, and a graphics processor. The data processing method can be applied to a graphics processor: the graphics processor determines the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions, and calculates, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on these reaction attributes.

Description

Data processing method, device and system and graphics processor
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method, apparatus and system, and a graphics processor.
Background
Chemical reaction similarity is a concept in cheminformatics that describes how alike two chemical reactions are. It usually takes into account several important factors of a chemical reaction, such as the reactants, products, reaction conditions, catalysts and solvents. In practical applications, the similarity between two chemical reactions can be determined from the similarity between their reactants and the similarity between their products; that is, two chemical reactions are considered similar when their reactants are similar and their products are similar.
However, as chemical research discovers more new molecules and as techniques for constructing virtual molecules by computer develop, the number of molecules in known molecular databases keeps increasing, and so does the data volume of known chemical reaction databases, which has grown from about one million to about ten million and is still increasing.
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present application provide a data processing method, an apparatus and a system, and a graphics processor, so as to improve the calculation efficiency of the reaction similarity.
The embodiment of the application provides a data processing method, which is applied to a graphics processor and comprises the following steps:
acquiring the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions;
calculating, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
Optionally, the reaction attribute of the target chemical reaction is determined according to a first reaction center, where the first reaction center is a reactant molecular fragment and a corresponding product molecular fragment that change in the target chemical reaction; the reaction attribute of the known chemical reaction is determined according to a second reaction center, where the second reaction center is a reactant molecular fragment and a corresponding product molecular fragment that change in the known chemical reaction.
Optionally, the reaction attribute of the target chemical reaction includes a chemical reaction fingerprint sequence of the target chemical reaction, and the chemical reaction fingerprint sequence of the target chemical reaction is obtained by encoding the first reaction center; the reaction attribute of the known chemical reaction includes a chemical reaction fingerprint sequence of the known chemical reaction obtained by encoding the second reaction center.
Optionally, the chemical reaction fingerprint sequence of the target chemical reaction and the chemical reaction fingerprint sequence of the known chemical reaction are one of the following molecular fingerprints: molecular access system fingerprints, Morgan fingerprints, extended connectivity fingerprints.
Optionally, the method further includes:
determining, from the plurality of known chemical reactions, a similar reaction having the highest reaction similarity with the target chemical reaction.
An embodiment of the present application further provides a data processing apparatus, which is applied to a graphics processor, and the apparatus includes:
an attribute acquisition unit for determining a reaction attribute of a target chemical reaction and reaction attributes of a plurality of known chemical reactions;
a similarity calculation unit for calculating, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
Optionally, the reaction attribute of the target chemical reaction is determined according to a first reaction center, where the first reaction center is a reactant molecular fragment and a product molecular fragment that change in the target chemical reaction; the reaction attribute of the known chemical reaction is determined according to a second reaction center, where the second reaction center is a reactant molecular fragment and a product molecular fragment that change in the known chemical reaction.
Optionally, the reaction attribute of the target chemical reaction includes a chemical reaction fingerprint sequence of the target chemical reaction, and the chemical reaction fingerprint sequence of the target chemical reaction is obtained by encoding the first reaction center; the reaction attribute of the known chemical reaction includes a chemical reaction fingerprint sequence of the known chemical reaction obtained by encoding the second reaction center.
Optionally, the chemical reaction fingerprint sequence of the target chemical reaction and the chemical reaction fingerprint sequence of the known chemical reaction are one of the following molecular fingerprints: molecular access system fingerprints, Morgan fingerprints, extended connectivity fingerprints.
Optionally, the apparatus further comprises:
a unit for determining, from the plurality of known chemical reactions, a similar reaction having the highest reaction similarity with the target chemical reaction.
An embodiment of the present application further provides a graphics processor, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the data processing method.
The embodiment of the application also provides a data processing system which comprises at least one graphics processor.
The embodiments of the present application provide a data processing method, apparatus and system, and a graphics processor. The data processing method can be applied to a graphics processor: the graphics processor determines the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions and, based on these reaction attributes, calculates the reaction similarities between the target chemical reaction and the known chemical reactions in parallel.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic illustration of reactants and products of a chemical reaction provided by an embodiment of the present application;
FIG. 3 is a schematic view of a reaction center provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a chemical reaction fingerprint sequence provided in an embodiment of the present application;
FIG. 5 is a block diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In practical applications, the similarity between two chemical reactions can be determined from the similarity between their reactants and the similarity between their products; that is, two chemical reactions are considered similar when their reactants are similar and their products are similar.
However, as chemical research discovers more new molecules and as techniques for constructing virtual molecules by computer develop, the number of molecules in known molecular databases keeps increasing, and so does the data volume of known chemical reaction databases, which has grown from about one million to about ten million and is still increasing.
The inventors found that conventional chemical reaction similarity calculation is time-consuming mainly because the conventional algorithm depends on the read speed of the known chemical reaction database and on the calculation speed of the central processing unit (CPU). Because disk read speed and CPU performance currently improve only gradually, relying on hardware upgrades alone yields a very limited gain in calculation speed, which cannot keep up with similarity calculation over a rapidly growing number of chemical reactions. The CPU cannot calculate chemical reaction similarities quickly because it processes data serially: within a given period it calculates the similarity between the target chemical reaction and only one known chemical reaction, so all similarity calculations are completed one after another. As the number of chemical reactions in the known chemical reaction database keeps growing, the calculation time grows linearly, which clearly cannot meet practical requirements.
Based on this, embodiments of the present application provide a data processing method, an apparatus and a system, and a graphics processor, and in particular, the data processing method may be applied to the graphics processor, and the graphics processor may determine a reaction attribute of a target chemical reaction and a reaction attribute of a plurality of known chemical reactions, and concurrently calculate a reaction similarity between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attribute of the known chemical reactions.
The following describes a specific implementation manner of a data processing method, an apparatus and a system provided by the embodiments of the present application in detail by embodiments with reference to the accompanying drawings.
Referring to FIG. 1, which is a flowchart of a data processing method provided in an embodiment of the present application, the method may be applied to a graphics processing unit (GPU). The GPU may have a plurality of computing units that can process data in parallel; for example, a GPU may include thousands of computing units and can therefore process thousands of data items at the same time. The data processing method provided by the embodiment of the present application may include the following steps:
S101, acquiring the reaction attribute of the target chemical reaction and the reaction attributes of a plurality of known chemical reactions.
In this embodiment, the graphics processor calculates the similarity between the target chemical reaction and the known chemical reactions, and therefore obtains the reaction attribute of the target chemical reaction and the reaction attributes of the plurality of known chemical reactions. The target chemical reaction may be a newly studied chemical reaction or a chemical reaction whose reaction characteristics are unknown. A known chemical reaction may be a chemical reaction that has already been studied, for example one whose reaction characteristics are known, and may be stored in a known chemical reaction database. Once the target chemical reaction is determined to be similar to a known chemical reaction, the target chemical reaction can be analyzed according to the chemical characteristics of that known chemical reaction.
The similarity between chemical reactions may take into account several important factors of a chemical reaction, such as the reactants, products, reaction conditions, catalysts and solvents, so the reaction attributes of the target chemical reaction and of the known chemical reactions may include features that reflect at least one of these factors. In particular, the reaction attributes may reflect the features of the reactants and products, so that the similarity between the target chemical reaction and a known chemical reaction can be determined from the reactants and products of the target chemical reaction and the reactants and products of the known chemical reaction.
Generally, chemical molecules of reactants and products are represented by chemical formulas, which represent atoms and molecular bonds constituting the chemical molecules, and the molecular bonds can represent connections between atoms constituting the chemical molecules.
Therefore, the reaction attribute of the target chemical reaction can embody the characteristics of the reactant and the product, and can include the molecular fingerprint of the reactant and the molecular fingerprint of the product of the target chemical reaction, wherein the molecular fingerprint of the reactant can be obtained by performing molecular coding on the atom and the chemical bond of the reactant, and the molecular fingerprint of the product can be obtained by performing molecular coding on the atom and the chemical bond of the product. Similarly, the reaction attribute of the known chemical reaction may embody the characteristics of the reactant and the product, and may include a molecular fingerprint of the reactant and a molecular fingerprint of the product of the known chemical reaction, where the molecular fingerprint of the reactant may be obtained by performing molecular coding on atoms and chemical bonds of the reactant, and the molecular fingerprint of the product may be obtained by performing molecular coding on atoms and chemical bonds of the product.
Molecular fingerprints can be of various types, for example Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints and extended connectivity fingerprints. It should be noted that the molecular fingerprints of the reactant and of the product in the target chemical reaction and the molecular fingerprints of the reactant and of the product in the known chemical reaction are all of the same fingerprint type, which makes it possible to compare the target chemical reaction with the known chemical reaction.
Specifically, the molecular fingerprint of the reactant of the target chemical reaction and the molecular fingerprint of the product may be spliced to obtain the molecular fingerprint of the target chemical reaction as the reaction attribute of the target chemical reaction, and the molecular fingerprint of the reactant of the known chemical reaction and the molecular fingerprint of the product of the known chemical reaction may be spliced to obtain the molecular fingerprint of the known chemical reaction as the reaction attribute of the known chemical reaction.
In fact, research on the target chemical reaction and known chemical reactions shows that the reactants and products of a chemical reaction differ only in some molecular fragments; that is, only some molecular fragments of the reactants change. Using the molecular fingerprints of the whole reactants and the whole products as the reaction attribute therefore tends to introduce a large amount of information that is irrelevant to the reaction, which reduces the proportion of the fingerprint that describes the actual change and impairs the accuracy of the chemical reaction similarity.
Therefore, in the embodiment of the present application, a reaction center of a chemical reaction can be determined, where the reaction center is a reactant molecular fragment and a corresponding product molecular fragment that change in the chemical reaction, and then a reaction attribute of the chemical reaction is determined according to the reaction center. That is, the reaction property of the target chemical reaction may be determined based on a first reaction center, which is a reactant molecular fragment and a corresponding product molecular fragment that change in the target chemical reaction, and the reaction property of the known chemical reaction may be determined based on a second reaction center, which is a reactant molecular fragment and a corresponding product molecular fragment that change in the known chemical reaction.
For example, referring to FIG. 2, which is a schematic diagram of the reactants and products of a chemical reaction provided in an embodiment of the present application, the reactant C6H8N2O is shown on the left and the product C6H6N2O3 on the right. Comparing the atoms and chemical bonds of the reactant C6H8N2O and the product C6H6N2O3 shows that the "H2N-" fragment in the reactant C6H8N2O changes to the "O2N-" fragment in the product C6H6N2O3, while the remainder stays unchanged. The fragment of the reactant C6H8N2O that includes "H2N-" and the fragment of the product C6H6N2O3 that includes "O2N-" can therefore be used as the reaction center. FIG. 3 is a schematic diagram of a reaction center provided in an embodiment of the present application: the left side shows the reactant molecular fragment in the reaction center, "C3H6N", and the right side shows the product molecular fragment in the reaction center, "C3H4NO2".
The reaction attribute of the target chemical reaction may include a chemical reaction fingerprint sequence of the target chemical reaction, which may be obtained by encoding the first reaction center. Since the first reaction center includes the reactant molecular fragment that changes in the target chemical reaction and the corresponding product molecular fragment, the chemical reaction fingerprint sequence is obtained by encoding these two fragments. Specifically, the reactant molecular fragment that changes in the target chemical reaction may be encoded to obtain a first reactant molecular fingerprint, the corresponding product molecular fragment may be encoded to obtain a first product molecular fingerprint, and the first reactant molecular fingerprint and the first product molecular fingerprint may be spliced to obtain the chemical reaction fingerprint sequence of the target chemical reaction.
Similarly, the reaction attribute of the known chemical reaction may include a chemical reaction fingerprint sequence of the known chemical reaction, which may be obtained by encoding the second reaction center. Since the second reaction center includes the reactant molecular fragment that changes in the known chemical reaction and the corresponding product molecular fragment, the chemical reaction fingerprint sequence is obtained by encoding these two fragments. Specifically, the reactant molecular fragment that changes in the known chemical reaction may be encoded to obtain a second reactant molecular fingerprint, the corresponding product molecular fragment may be encoded to obtain a second product molecular fingerprint, and the second reactant molecular fingerprint and the second product molecular fingerprint may be spliced to obtain the chemical reaction fingerprint sequence of the known chemical reaction.
Specifically, the chemical reaction fingerprint sequence of the target chemical reaction and the chemical reaction fingerprint sequence of the known chemical reaction are one of the following molecular fingerprints: molecular access system fingerprints, Morgan fingerprints, extended connectivity fingerprints.
Referring to FIG. 4, which is a schematic diagram of a chemical reaction fingerprint sequence provided in an embodiment of the present application, the molecular fingerprint of the reactant molecular fragment "C3H6N" can be represented as "1010100101100000" and the molecular fingerprint of the product molecular fragment "C3H4NO2" can be represented as "0010001000000100"; splicing the two yields the chemical reaction fingerprint sequence "10101001011000000010001000000100".
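The splicing step can be illustrated with a minimal Python sketch. It assumes the fingerprints are held as strings of "0" and "1" characters; the helper name `splice_fingerprints` is hypothetical and does not come from the application, which does not prescribe a data layout. Only the two 16-bit values are taken from FIG. 4.

```python
def splice_fingerprints(reactant_fp: str, product_fp: str) -> str:
    """Concatenate a reactant fingerprint and a product fingerprint.

    Hypothetical helper for illustration; the bit-string representation
    is an assumption, not something the application specifies.
    """
    for fp in (reactant_fp, product_fp):
        if set(fp) - {"0", "1"}:
            raise ValueError(f"not a bit string: {fp!r}")
    return reactant_fp + product_fp


reactant_fp = "1010100101100000"  # fragment C3H6N, value from FIG. 4
product_fp = "0010001000000100"   # fragment C3H4NO2, value from FIG. 4
reaction_fp = splice_fingerprints(reactant_fp, product_fp)
print(reaction_fp)
```

Because the reactant half always comes first, two spliced sequences of the same fingerprint type line up bit for bit and can be compared directly.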
In this embodiment, the number of the reaction attributes of the target chemical reaction obtained by the graphics processor may be related to the calculation process of the graphics processor itself, and the graphics processor may obtain the reaction attribute of one target chemical reaction and the reaction attributes of a plurality of known chemical reactions, so as to calculate the reaction similarity between the target chemical reaction and each of the known chemical reactions, and may also obtain the reaction attributes of a plurality of target chemical reactions and the reaction attributes of a plurality of known chemical reactions, so as to calculate the reaction similarity between the plurality of target chemical reactions and the plurality of known chemical reactions.
The reaction attributes of the known chemical reactions may be stored in a database on a storage device. For example, the chemical reaction fingerprint sequences of the known chemical reactions may be stored in a known chemical reaction fingerprint library, from which they are read as the reaction attributes of the known chemical reactions. In a specific implementation, the reaction attributes of the known chemical reactions may be read into main memory and then transferred to the video memory of the graphics processor, so that the graphics processor obtains them from the video memory.
S102, calculating, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
After obtaining the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions, the graphics processor may calculate the reaction similarity between the target chemical reaction and each known chemical reaction based on these attributes. For example, the similarity between the chemical reaction fingerprint sequence of the target chemical reaction and the chemical reaction fingerprint sequence of a known chemical reaction may be calculated and used as the reaction similarity between the two reactions. This similarity may be calculated with a molecular similarity calculation method, for example the Tanimoto similarity formula.
When the reaction attribute of the target chemical reaction is determined according to the first reaction center and the reaction attribute of the known chemical reaction is determined according to the second reaction center, the features of the first reaction center account for a larger share of the features of the target chemical reaction, and the features of the second reaction center account for a larger share of the features of the known chemical reaction. The reaction similarity determined from these reaction attributes is therefore more strongly correlated with, or even mainly reflects, the similarity between the first reaction center and the second reaction center, so the calculated reaction similarity is more accurate.
Because the graphics processor has a plurality of computing units, each of which can process data independently, the computing units can be used to calculate the reaction similarities between the target chemical reaction and the known chemical reactions in parallel. That is, within the same time period, the reaction similarities between one target chemical reaction and a plurality of known chemical reactions can be calculated, the reaction similarities between a plurality of target chemical reactions and one known chemical reaction can be calculated, or the reaction similarities between different target chemical reactions and different known chemical reactions can be calculated, which improves the calculation efficiency of the reaction similarity. It can be understood that the more computing units the graphics processor has, the more efficient the parallel calculation of reaction similarities.
After the graphics processor calculates the reaction similarities between the target chemical reaction and the known chemical reactions, the calculation results can be stored in memory for subsequent use. For example, based on the reaction similarities between the target chemical reaction and the plurality of known chemical reactions, the similar chemical reaction with the highest reaction similarity to the target chemical reaction can be determined from the plurality of known chemical reactions; this similar chemical reaction is the known chemical reaction closest to the target chemical reaction.
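As an illustration of the Tanimoto calculation mentioned above, the following Python sketch computes the ratio of bits set in both fingerprints to bits set in either fingerprint. The function name `tanimoto`, the bit-string representation, the choice to return 0.0 when both fingerprints are all zeros, and the value of `known_fp` are assumptions made for this example; only `target_fp` is taken from FIG. 4.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto similarity of two equal-length bit strings:
    (bits set in both) / (bits set in at least one)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a, b = int(fp_a, 2), int(fp_b, 2)
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return bin(a & b).count("1") / union


target_fp = "10101001011000000010001000000100"  # spliced sequence from FIG. 4
known_fp = "10101001011000000010001000010100"   # invented: one extra bit set
similarity = tanimoto(target_fp, known_fp)
print(similarity)
```

Here the target fingerprint has 9 bits set and the invented known fingerprint has the same 9 bits plus one more, so 9 bits are shared out of 10 set in total.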
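The one-target-against-many-known-reactions case can be sketched in Python as follows. This is only a CPU stand-in: a thread pool plays the role of the GPU computing units to show that each comparison is independent of the others, and it should not be expected to deliver the GPU speed-up the application describes. The function names, the worker count and the three known fingerprints are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints packed into integers."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def parallel_similarities(target_fp, known_fps, workers=4):
    """Score one target against every known fingerprint.

    Each known reaction is an independent work item, which mirrors
    assigning one GPU computing unit per known reaction (assumed mapping).
    """
    target = int(target_fp, 2)
    known = [int(fp, 2) for fp in known_fps]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the known reactions
        return list(pool.map(lambda k: tanimoto_bits(target, k), known))


target_fp = "10101001011000000010001000000100"
known_fps = [
    "10101001011000000010001000000100",  # invented: identical to the target
    "10101001011000000000000000000000",  # invented: reactant half only
    "00000000000000000010001000000100",  # invented: product half only
]
scores = parallel_similarities(target_fp, known_fps)
print(scores)
```

No work item reads or writes another item's result, which is the property that lets the calculation be spread over as many computing units as are available.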
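Selecting the most similar known reaction from the stored results amounts to taking the index of the maximum similarity. The sketch below assumes the similarities are held in a list whose positions match the known chemical reactions; the function name `most_similar` and the scores are invented for illustration, and returning the first index on a tie is a choice made here, not something the application specifies.

```python
def most_similar(similarities):
    """Return (index, score) of the highest similarity.

    On ties the first index wins (an assumption of this sketch).
    """
    if not similarities:
        raise ValueError("no similarities to choose from")
    best_index = max(range(len(similarities)), key=similarities.__getitem__)
    return best_index, similarities[best_index]


similarities = [0.42, 0.90, 0.31, 0.90]  # invented scores
best_index, best_score = most_similar(similarities)
print(best_index, best_score)
```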
The embodiment of the present application provides a data processing method that can be applied to a graphics processor. The graphics processor determines the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions and, based on these reaction attributes, calculates the reaction similarities between the target chemical reaction and the known chemical reactions in parallel.
Based on the above data processing method, an embodiment of the present application further provides a data processing apparatus, and referring to fig. 5, a block diagram of a structure of the data processing apparatus provided in the embodiment of the present application is shown, where the apparatus includes:
an attribute obtaining unit 110 for determining a reaction attribute of a target chemical reaction, and reaction attributes of a plurality of known chemical reactions;
a similarity calculation unit 120 configured to calculate, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
Optionally, the reaction attribute of the target chemical reaction is determined according to a first reaction center, where the first reaction center is a reactant molecular fragment and a product molecular fragment that change in the target chemical reaction; the reaction attribute of the known chemical reaction is determined according to a second reaction center, where the second reaction center is a reactant molecular fragment and a product molecular fragment that change in the known chemical reaction.
Optionally, the reaction attribute of the target chemical reaction includes a chemical reaction fingerprint sequence of the target chemical reaction, and the chemical reaction fingerprint sequence of the target chemical reaction is obtained by encoding the first reaction center; the reaction attribute of the known chemical reaction includes a chemical reaction fingerprint sequence of the known chemical reaction obtained by encoding the second reaction center.
Optionally, the chemical reaction fingerprint sequence of the target chemical reaction and the chemical reaction fingerprint sequence of the known chemical reaction are one of the following molecular fingerprints: Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints, or extended-connectivity fingerprints (ECFP).
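To make the idea of encoding a reaction center into a fingerprint sequence concrete, the following is a simplified sketch. It is not MACCS, Morgan, or ECFP; it is a hypothetical hashed-fragment scheme chosen only because it needs no chemistry toolkit. The function name `encode_reaction_center`, the 64-bit length, and the fragment strings are all assumptions for illustration.

```python
import hashlib
from typing import Iterable

N_BITS = 64  # hypothetical fingerprint length, chosen for illustration


def encode_reaction_center(reactant_fragments: Iterable[str],
                           product_fragments: Iterable[str],
                           n_bits: int = N_BITS) -> int:
    """Fold the changed fragments of a reaction into a fixed-length bit set.

    Each fragment string is hashed together with its role (reactant or
    product) and sets one bit, so the same fragment on different sides of
    the reaction is hashed separately.
    """
    fingerprint = 0
    for role, fragments in (("R", reactant_fragments), ("P", product_fragments)):
        for fragment in fragments:
            digest = hashlib.sha256(f"{role}:{fragment}".encode("utf-8")).digest()
            bit = int.from_bytes(digest[:8], "big") % n_bits
            fingerprint |= 1 << bit
    return fingerprint


# Illustrative reaction center (an esterification written as SMILES-like strings).
fp_a = encode_reaction_center(["C(=O)O", "CO"], ["C(=O)OC"])
fp_b = encode_reaction_center(["C(=O)O", "CO"], ["C(=O)OC"])
```

The property that matters for the later similarity step is that the encoding is deterministic and fixed-length, so that every reaction in the known set can be compared with the target bit by bit.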
Optionally, the apparatus further comprises:
a unit for determining, from the plurality of known chemical reactions, the similar reaction having the highest reaction similarity with the target chemical reaction.
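Selecting the similar reaction with the highest reaction similarity amounts to an arg-max over the computed scores. The sketch below assumes the scores are already available as a list aligned with a list of reaction identifiers; the function name `most_similar` and the identifiers are hypothetical.

```python
from typing import List, Sequence, Tuple


def most_similar(scores: Sequence[float], reaction_ids: List[str]) -> Tuple[str, float]:
    """Return the identifier and score of the highest-scoring known reaction.

    When several reactions share the top score, the earliest one wins.
    """
    if not scores or len(scores) != len(reaction_ids):
        raise ValueError("scores and reaction_ids must be non-empty and aligned")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return reaction_ids[best_index], scores[best_index]


# Illustrative values only.
best_id, best_score = most_similar([0.2, 0.9, 0.5], ["rxn-1", "rxn-2", "rxn-3"])
```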
The embodiment of the application provides a data processing apparatus that can be applied to a graphics processor. The graphics processor determines the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions and, based on these attributes, calculates the reaction similarities between the target chemical reaction and the known chemical reactions in parallel.
An embodiment of the present application further provides a graphics processor, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the data processing method.
The embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the instructions cause the graphics processor to execute the data processing method.
In addition, the embodiment of the application also provides a data processing system that comprises at least one graphics processor. Specifically, a plurality of graphics processors can be installed in a single computer to obtain a higher computation speed.
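One way several graphics processors could share the work is to split the known reactions into shards, let each device find its own best match, and merge the partial results. The text does not say how work is divided, so this partitioning is an assumption; worker threads stand in for devices, Tanimoto similarity over integer bit fingerprints is assumed, and every function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit fingerprints stored as integers."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def split_into_shards(items: List[int], n_shards: int) -> List[Tuple[int, List[int]]]:
    """Split items into contiguous, near-equal shards, keeping each start offset."""
    size, remainder = divmod(len(items), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        end = start + size + (1 if i < remainder else 0)
        shards.append((start, items[start:end]))
        start = end
    return shards


def best_in_shard(target: int, offset: int, shard: List[int]) -> Tuple[float, int]:
    """Best (score, global index) within one shard; (-1.0, -1) if it is empty."""
    best = (-1.0, -1)
    for i, fp in enumerate(shard):
        score = tanimoto(target, fp)
        if score > best[0]:
            best = (score, offset + i)
    return best


def best_match_across_devices(target: int, known: List[int], n_devices: int = 2) -> Tuple[int, float]:
    """Run one worker per simulated device, then merge the per-shard winners."""
    shards = split_into_shards(known, n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        partial = list(pool.map(lambda s: best_in_shard(target, s[0], s[1]), shards))
    score, index = max(partial, key=lambda p: p[0])
    return index, score


# Illustrative fingerprints only.
result = best_match_across_devices(0b1111, [0b0001, 0b0011, 0b0111, 0b1111, 0b1000])
```

Only the per-shard winner needs to travel back to the host, so the merge cost grows with the number of devices rather than with the number of known reactions.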
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments and apparatus embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, reference may be made to the description of the method embodiments. The above-described apparatus and system embodiments are merely illustrative: modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the scope of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (10)

1. A data processing method, applied to a graphics processor, the method comprising:
acquiring the reaction attribute of a target chemical reaction and the reaction attributes of a plurality of known chemical reactions;
calculating, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
2. The method of claim 1, wherein the reaction attribute of the target chemical reaction is determined according to a first reaction center, the first reaction center being the reactant molecular fragments and corresponding product molecular fragments that change in the target chemical reaction; the reaction attribute of the known chemical reaction is determined according to a second reaction center, the second reaction center being the reactant molecular fragments and corresponding product molecular fragments that change in the known chemical reaction.
3. The method of claim 2, wherein the reaction attribute of the target chemical reaction comprises a chemical reaction fingerprint sequence of the target chemical reaction obtained by encoding the first reaction center; the reaction attribute of the known chemical reaction includes a chemical reaction fingerprint sequence of the known chemical reaction obtained by encoding the second reaction center.
4. The method of claim 3, wherein the chemical reaction fingerprint sequence of the target chemical reaction and the chemical reaction fingerprint sequence of the known chemical reaction are one of the following molecular fingerprints: Molecular ACCess System (MACCS) fingerprints, Morgan fingerprints, or extended-connectivity fingerprints (ECFP).
5. The method according to any one of claims 1-4, further comprising:
determining, from the plurality of known chemical reactions, the similar reaction having the highest reaction similarity with the target chemical reaction.
6. A data processing apparatus, for use in a graphics processor, the apparatus comprising:
an attribute acquisition unit for determining a reaction attribute of a target chemical reaction and reaction attributes of a plurality of known chemical reactions;
a similarity calculation unit for calculating, in parallel, the reaction similarities between the target chemical reaction and the known chemical reactions based on the reaction attribute of the target chemical reaction and the reaction attributes of the known chemical reactions.
7. The apparatus of claim 6, wherein the reaction attribute of the target chemical reaction is determined according to a first reaction center, the first reaction center being the reactant molecular fragments and product molecular fragments that change in the target chemical reaction; the reaction attribute of the known chemical reaction is determined according to a second reaction center, the second reaction center being the reactant molecular fragments and product molecular fragments that change in the known chemical reaction.
8. The apparatus of claim 7, wherein the reaction attribute of the target chemical reaction comprises a chemical reaction fingerprint sequence of the target chemical reaction, the chemical reaction fingerprint sequence of the target chemical reaction being obtained by encoding the first reaction center; the reaction attribute of the known chemical reaction includes a chemical reaction fingerprint sequence of the known chemical reaction obtained by encoding the second reaction center.
9. A graphics processor, comprising: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the data processing method of any of claims 1-5.
10. A data processing system comprising at least one graphics processor as claimed in claim 9.
CN202010986444.2A 2020-09-18 2020-09-18 Data processing method, device and system and graphics processor Pending CN112086136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010986444.2A CN112086136A (en) 2020-09-18 2020-09-18 Data processing method, device and system and graphics processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010986444.2A CN112086136A (en) 2020-09-18 2020-09-18 Data processing method, device and system and graphics processor

Publications (1)

Publication Number Publication Date
CN112086136A true CN112086136A (en) 2020-12-15

Family

ID=73738239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010986444.2A Pending CN112086136A (en) 2020-09-18 2020-09-18 Data processing method, device and system and graphics processor

Country Status (1)

Country Link
CN (1) CN112086136A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913931A (en) * 2021-02-09 2022-08-16 重庆博腾制药科技股份有限公司 Inter-reaction similarity quantification method, system and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030182094A1 (en) * 2002-02-14 2003-09-25 Broughton Howard Barff Methods for classifying and searching chemical reactions
US20090024575A1 (en) * 2007-07-17 2009-01-22 Alain Wagner Methods for similarity searching of chemical reactions
WO2018121866A1 (en) * 2016-12-29 2018-07-05 Pharmacelera, S.L. Calculating molecular similarity
US20200027528A1 (en) * 2017-09-12 2020-01-23 Massachusetts Institute Of Technology Systems and methods for predicting chemical reactions
CN111243660A (en) * 2020-01-06 2020-06-05 中国海洋大学 Parallel marine drug screening method based on heterogeneous many-core architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HENDRICKSON,ET AL.: "Reaction indexing for reaction databases", 《JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES》 *
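YUAN XIAOLONG: "A new algorithm using the GPU to accelerate molecular fingerprint pre-screening and structural similarity calculation", 《Proceedings of the 12th National Conference on Computational (Computer) Chemistry》 *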

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201215