CN114464267A - Method and device for model training and product prediction - Google Patents

Method and device for model training and product prediction

Info

Publication number
CN114464267A
Authority
CN
China
Prior art keywords
product
matrix
reactant
noise value
ith
Prior art date
Legal status
Pending
Application number
CN202111000478.0A
Other languages
Chinese (zh)
Inventor
孟子乔
赵沛霖
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111000478.0A
Publication of CN114464267A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application provide a method and a device for model training and product prediction. The training method includes: obtaining a first reactant and a first product of the first reactant; performing noise addition on the first product to obtain the first product under different noise values; inputting the first reactant and the first product under different noise values into a prediction model to obtain gradient field information of the adjacency matrix of the first product output by the prediction model under different noise values; and jointly training the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model. In other words, the prediction model is jointly trained using the first reactant and the first product under different noise values, and the resulting prediction model is used to predict the gradient field information of the adjacency matrix of the product of a second reactant, so that the products of different types of reactants can be predicted.

Description

Method and device for model training and product prediction
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for model training and product prediction.
Background
In the chemical field and the pharmaceutical field, the prediction of products from reactants is of great importance.
However, organic chemical reactions are numerous and varied, and as chemical research techniques develop, new chemical reactions keep emerging; how to accurately predict the products of reactants is therefore a technical problem to be solved in the field.
Disclosure of Invention
The embodiments of the application provide a method and a device for model training and product prediction, in which the product of an arbitrary reactant is predicted by a prediction model, realizing accurate prediction of the products of different reactants.
In a first aspect, an embodiment of the present application provides a model training method, including:
obtaining a first reactant, and a first product of the first reactant;
performing noise addition on the first product to obtain the first product under different noise values;
inputting the first reactant and the first product under different noise values into a prediction model to obtain gradient field information of the adjacency matrix of the first product output by the prediction model under different noise values;
and training the prediction model according to gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model, wherein the trained prediction model is used for predicting the gradient field information of the adjacency matrix of the product of the second reactant, and the gradient field information is used for indicating the variation trend of the generation probability of the product of the second reactant.
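The disclosure states only that the model is trained jointly according to the gradient field information under different noise values; one standard way to realize such training is denoising score matching. A minimal Python sketch under that assumption follows; `model`, `reactant` and `sigmas` are placeholder names, not terms from the original text:

```python
import torch

def dsm_loss(model, reactant, A_p, sigmas):
    """Denoising score matching over L noise levels (an assumed objective):
    perturb the product adjacency matrix A_p with Gaussian noise of scale
    sigma and regress the predicted gradient field onto the score of the
    perturbation kernel, -(A_noisy - A_p) / sigma**2."""
    loss = 0.0
    for sigma in sigmas:
        A_noisy = A_p + torch.randn_like(A_p) * sigma
        target = -(A_noisy - A_p) / sigma**2       # score of N(A_p, sigma^2)
        pred = model(reactant, A_noisy, sigma)     # predicted gradient field
        loss = loss + sigma**2 * ((pred - target) ** 2).mean()
    return loss / len(sigmas)
```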
In some embodiments, the first feature extraction subunit and the second feature extraction subunit are both transformer encoders, and the feature decoding unit is a transformer decoder.
In some embodiments, the above-mentioned noise addition to the adjacency matrix of the first product, to obtain the adjacency matrix of the first product under different noise values, includes:
adding different noise values to the adjacency matrix of the first product by means of Gaussian noise, to obtain the adjacency matrix of the first product under different noise values.
In some embodiments, the above-mentioned p-th molecular feature extraction unit is a multilayer perceptron MLP.
In some embodiments, the correspondence between the noise value and the model parameter is a correspondence between the noise value and a parameter in the MLP.
In a second aspect, an embodiment of the present application provides a product prediction method, including:
obtaining a second reactant to be predicted and preset K noise values, wherein K is a positive integer less than or equal to L;
aiming at the ith noise value in the K noise values, determining a prediction model corresponding to the ith noise value, wherein the prediction model is obtained by training through the training method of any one of the claims 1-16, and i is a positive integer from 1 to K;
sampling from the gradient field information predicted by the prediction model corresponding to the i-th noise value, according to the second reactant and the target adjacency matrix of the second product at the (i-1)-th noise value, to obtain the target adjacency matrix of the second product at the i-th noise value, wherein the second product is the product of the second reactant;
and determining the second product according to the target adjacency matrix of the second product at the K-th noise value, wherein the K-th noise value is the minimum of the K noise values.
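The steps above iterate from the largest noise value to the smallest. Annealed Langevin dynamics is one standard technique for realizing the per-level sampling step described here; the following Python sketch is illustrative only, and the function and parameter names (`score_models`, `steps`, `eps`) are assumptions rather than terms from the disclosure:

```python
import numpy as np

def sample_product_adjacency(score_models, sigmas, N, steps=20, eps=1e-4, rng=None):
    """Annealed Langevin sketch: starting from a random symmetric matrix
    (the i = 1 case), refine the target adjacency matrix at each of the K
    noise values by moving along the gradient field predicted by the model
    for that level plus a small Gaussian kick. score_models[i](A) stands for
    the trained prediction model corresponding to the i-th noise value."""
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((N, N))
    A = np.triu(A, 1) + np.triu(A, 1).T           # keep the matrix symmetric
    for i, sigma in enumerate(sigmas):            # sigmas sorted large to small
        alpha = eps * (sigma / sigmas[-1]) ** 2   # scale step to noise level
        for _ in range(steps):
            z = rng.standard_normal((N, N))
            z = np.triu(z, 1) + np.triu(z, 1).T
            A = A + 0.5 * alpha * score_models[i](A) + np.sqrt(alpha) * z
    return A                                      # target matrix at K-th level
```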
In some embodiments, the encoding module includes a first encoding submodule and a second encoding submodule;
the first coding submodule is used for processing the adjacency matrix and the node feature matrix of the second reactant to obtain first feature information of the second reactant;
the second coding submodule is used for processing the (t-1)-th adjacency matrix of the second product at the i-th noise value to obtain first feature information of the (t-1)-th adjacency matrix of the second product.
In some embodiments, the first encoding submodule includes a first atomic feature extraction unit and a first molecular feature extraction unit;
the first atom feature extraction unit is used for processing the adjacency matrix and the node feature matrix of the second reactant to obtain the embedded representation of each atom in the second reactant;
the first molecular feature extraction unit is used for performing feature interaction on the embedded representation of each atom in the second reactant to obtain first feature information of the second reactant.
In some embodiments, the second encoding submodule includes a second atomic feature extraction unit and a second molecular feature extraction unit;
the second atomic feature extraction unit is used for processing the (t-1)-th adjacency matrix of the second product at the i-th noise value to obtain the embedded representation of each atom corresponding to the (t-1)-th adjacency matrix of the second product;
the second molecular feature extraction unit is used for performing feature interaction on the embedded representation of each atom corresponding to the (t-1)-th adjacency matrix of the second product to obtain first feature information of the (t-1)-th adjacency matrix of the second product.
In some embodiments, the decoding module includes a feature extraction unit and a feature decoding unit;
the feature extraction unit is used for obtaining, respectively, second feature information of the second reactant and second feature information of the (t-1)-th adjacency matrix of the second product according to the first feature information of the second reactant and the first feature information of the (t-1)-th adjacency matrix of the second product;
the feature decoding unit is used for obtaining the (t-1)-th gradient field information of the adjacency matrix of the second product at the i-th noise value according to the second feature information of the second reactant and the second feature information of the (t-1)-th adjacency matrix of the second product.
In some embodiments, the feature extraction unit includes a first feature extraction subunit and a second feature extraction subunit;
the first feature extraction subunit is configured to obtain second characteristic information of the second reactant according to first feature information of the second reactant;
the second feature extraction subunit is configured to obtain second feature information of the (t-1)-th adjacency matrix of the second product according to the first feature information of the (t-1)-th adjacency matrix of the second product.
In some embodiments, the first feature extraction subunit and the second feature extraction subunit are both transformer encoders, and the feature decoding unit is a transformer decoder.
In some embodiments, if i is 1, each element of the target adjacency matrix of the second product at the (i-1)-th noise value conforms to the first normal distribution.
Optionally, the variance of the first normal distribution is a positive number less than or equal to 3.
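For instance, under the stated bound on the variance, the initial matrix for i = 1 might be drawn as follows; this is a hedged illustration, and the symmetrization step is inferred from the symmetry of adjacency matrices elsewhere in the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                                           # assumed number of atoms
A0 = rng.normal(0.0, np.sqrt(3.0), size=(N, N))  # variance <= 3
A0 = np.triu(A0, 1) + np.triu(A0, 1).T           # keep the matrix symmetric
```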
In a third aspect, an embodiment of the present application provides a model training apparatus, including:
an acquisition unit for acquiring a first reactant and a first product of the first reactant;
the noise adding unit is used for adding noise to the first product to obtain the first product under different noise values;
the prediction unit is used for inputting the first reactant and the first product under different noise values into a prediction model to obtain gradient field information of the adjacency matrix of the first product output by the prediction model under different noise values;
and the training unit is used for training the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model, wherein the trained prediction model is used for predicting the gradient field information of the adjacency matrix of the product of the second reactant, and the gradient field information is used for indicating the change trend of the generation probability of the product of the second reactant.
In a fourth aspect, an embodiment of the present application provides a product prediction apparatus, including:
the acquisition unit is used for acquiring a second reactant and preset K noise values;
a determining unit, configured to determine, for an ith noise value of the K noise values, a prediction model corresponding to the ith noise value, where the prediction model is obtained by training through the training method, and i is a positive integer from 1 to K;
a sampling unit, configured to sample from the gradient field information predicted by the prediction model corresponding to the i-th noise value, according to the second reactant and the target adjacency matrix of the second product at the (i-1)-th noise value, to obtain the target adjacency matrix of the second product at the i-th noise value, where the second product is the product of the second reactant;
and the prediction unit is used for determining the second product according to a target adjacency matrix of the second product under a K-th noise value, wherein the K-th noise value is the minimum value of the K noise values.
In a fifth aspect, embodiments of the present application provide a computing device, comprising a processor and a memory;
the memory for storing a computer program;
the processor is configured to execute the computer program to implement the method of any of the first to second aspects.
In a sixth aspect, the present application provides a computer-readable storage medium, which includes computer instructions, which when executed by a computer, cause the computer to implement the method according to any one of the first aspect to the second aspect.
In a seventh aspect, this application embodiment provides a computer program product, which includes a computer program, the computer program being stored in a readable storage medium, the computer program being readable from the readable storage medium by at least one processor of a computer, the at least one processor executing the computer program to cause the computer to implement the method according to any one of the first aspect to the second aspect.
According to the method and the device for model training and product prediction, the first reactant and the first product of the first reactant are obtained; noise addition is performed on the first product to obtain the first product under different noise values; the first reactant and the first product under different noise values are input into a prediction model to obtain gradient field information of the adjacency matrix of the first product output by the prediction model under different noise values; and the prediction model is jointly trained according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model. In other words, in the embodiments of the present application, prediction models under different noise values can be obtained by jointly training the prediction model with the first reactant and the first product under different noise values. The prediction models are used to predict the gradient field information of the adjacency matrix of the product of the second reactant, and the gradient field information indicates the change trend of the generation probability of the product of the second reactant, so that when predicting the product, it can be predicted accurately from the gradient field information of its adjacency matrix. In addition, the prediction model of the application does not restrict the types of reactants, and can therefore predict the products of different types of reactants.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a system architecture diagram according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of a training of a predictive model according to an embodiment of the present disclosure;
FIG. 3B is a schematic diagram of a prediction model according to an embodiment of the present application;
FIG. 3C is a schematic diagram of a prediction model according to an embodiment of the present disclosure;
FIG. 3D is a schematic diagram of a prediction model according to an embodiment of the present disclosure;
FIG. 3E is a schematic diagram of a structure of a prediction model according to an embodiment of the present application;
FIG. 3F is a schematic diagram of a prediction model according to an embodiment of the present disclosure;
FIG. 3G is a schematic diagram of a prediction model according to an embodiment of the present application;
FIG. 3H is a schematic diagram of a prediction model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a training process of a prediction model according to an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating a method for predicting a product according to an embodiment of the present disclosure;
FIG. 6A is a schematic diagram of a prediction model according to an embodiment of the present application;
FIG. 6B is a diagram illustrating another example of prediction of a prediction model according to an embodiment of the present application;
FIG. 6C is a diagram illustrating another prediction of a prediction model according to an embodiment of the present application;
FIG. 6D is a diagram illustrating another example of prediction of a prediction model according to an embodiment of the present application;
FIG. 6E is a diagram illustrating another example of prediction of a prediction model according to an embodiment of the present application;
FIG. 6F is a diagram illustrating a prediction process of a prediction model according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a training apparatus for a prediction model according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a structure of a product prediction device provided in an embodiment of the present application;
fig. 9 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be understood that, in the present embodiment, "B corresponding to A" means that B is associated with A. In one implementation, B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In the description of the present application, "plurality" means two or more than two unless otherwise specified.
In addition, to facilitate a clear description of the technical solutions of the embodiments of the present application, terms such as "first" and "second" are used to distinguish identical or similar items having substantially the same functions and effects. Those skilled in the art will appreciate that the terms "first", "second", and the like do not denote any order or importance, but merely distinguish one item from another.
The embodiments of the application are applied to the technical field of artificial intelligence, and in particular to the prediction of the products of chemical reactants, so that products can be predicted stably and efficiently for given reactants.
To facilitate understanding of the embodiments of the present application, first, the related concepts related to the embodiments of the present application are briefly described as follows:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Natural language models convert large numbers of human-language words into machine language through a statistical model, which is then used for cognition, understanding and generation. Specific applications include machine translation, automatic question answering, and the like.
The greatest advantage of AI technology is that a large amount of training data can be digested in a short time through a self-learning process, achieving learning without explicit teaching.
Based on this, the present embodiments use AI techniques to predict the products of reactants. Specifically, a large amount of organic chemical reaction data is used to train the prediction model; for example, the first reactant and the first product of the first reactant are used to train the prediction model so that it learns the hidden rules of the reactions. In later prediction, the product of a reactant can then be predicted accurately, including products that cannot be predicted by reaction templates or that are previously unknown, which greatly improves the efficiency of drug research and development.
The application scenarios of the present application include, but are not limited to, medical, biological, scientific and other fields, such as drug production, drug research and development, vaccine research and development, etc.
In some embodiments, the system architecture of embodiments of the present application is shown in fig. 1.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application, which includes a user device 101, a data acquisition device 102, a training device 103, an execution device 104, a database 105, and a content library 106.
The data acquisition device 102 is configured to read training data from the content library 106 and store the read training data in the database 105. The training data referred to in the embodiments of the present application includes a plurality of reactants and their products for model training; for convenience of description, a reactant used for model training is referred to herein as a first reactant, and its known product is referred to as a first product.
In some embodiments, the user device 101 is configured to perform annotation operations on training data in the database 105.
The training device 103 trains the prediction model based on the training data maintained in the database 105 so that the trained prediction model can accurately predict the products of reactants. The prediction model derived by the training device 103 may be applied to different systems or devices.
In one possible implementation, in fig. 1, the execution device 104 is configured with an I/O interface 107 for data interaction with external devices, for example, receiving the reactant to be predicted (e.g., the second reactant) sent by the user device 101 via the I/O interface. The calculation module 109 in the execution device 104 processes the input second reactant using the trained prediction model, outputs the gradient field information of the adjacency matrix of the product of the second reactant, determines the product of the second reactant according to the gradient field information, and sends the product of the second reactant to the user device 101 through the I/O interface.
The user device 101 may include a mobile phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), or other terminal devices with a browser installation function.
The execution device 104 may be a server.
For example, the server may be a rack server, a blade server, a tower server, or a cabinet server. The server may be an independent test server, or a test server cluster composed of a plurality of test servers.
In this embodiment, the execution device 104 is connected to the user device 101 via a network. The network may be a wireless or wired network, such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another communication network.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and the positional relationship between the devices, modules, and the like shown in the figure does not constitute any limitation. In some embodiments, the data acquisition device 102 may be the same device as the user device 101, the training device 103, or the execution device 104. The database 105 may be distributed over one server or a plurality of servers, and the content library 106 may be distributed over one server or a plurality of servers.
The technical solutions of the embodiments of the present application are described in detail below with reference to some embodiments. The following several embodiments may be combined with each other and may not be described in detail in some embodiments for the same or similar concepts or processes.
First, a training process of the prediction model according to an embodiment of the present application will be described with reference to fig. 2.
Fig. 2 is a schematic flow chart of a model training method provided in an embodiment of the present application, and as shown in fig. 2, the method includes:
s201, obtaining a first reactant and a first product of the first reactant.
The execution subject of the embodiment of the present application is an apparatus having a model training function, for example, a model training apparatus, which may be a computing device or a part of a computing device, for example, a processor in a computing device. Illustratively, the model training device may be the training apparatus in fig. 1. Wherein the training device in fig. 1 may be understood as a computing device, or a processor or the like in a computing device.
For convenience of description, the following embodiments are described taking an execution subject as an example of a computing device.
In one example, a number of reactants with known products are collected from the Observed Antibody Space (OAS) database for model training. For convenience of description, a reactant used for model training is referred to as the first reactant, and the product of the first reactant is referred to as the first product.
It should be noted that the model training in the embodiment of the present application is an iterative process: a first training pass is performed on the model using one first reactant and its corresponding first product; then a second pass is performed on the model using another first reactant and its corresponding first product, and so on until the model training is completed.
The process of training the model with each first reactant and its product is the same; for convenience of description, the process of training the model with one first reactant and its corresponding first product is described as an example.
In some embodiments, (Gr, Gp) denotes a chemical reaction, where Gr denotes the reactant and Gp denotes the product. Here Gr = (V, Ar, Zr), where V denotes the atom set of the reactant, i.e., the kinds of atoms the reactant includes; for example, for H2O the atom set includes the two atoms H (hydrogen) and O (oxygen), and the size of the atom set equals the number of atoms, |V| = N. Ar ∈ R^(N×N) denotes the adjacency matrix, and Zr denotes the atom feature matrix of the reactant; for example, the atom feature matrix of the reactant includes the type of each atom, the chemical characteristics of the atom (e.g., whether it is aromatic), and the charge of the atom. The product Gp = (V, Ap, Zp) is defined analogously to the reactant.
The goal of organic chemical reaction prediction is: given reactant Gr, predict product Gp. Since the atoms in the reactant and the product correspond one to one in an organic chemical reaction, the only change before and after the reaction is in the connectivity of the chemical bonds; viewed as a molecular graph, the chemical reaction changes mainly the adjacency matrix A, while the atom set is unchanged. Therefore, the product can be predicted by predicting its adjacency matrix alone, and the structure of the whole product can be recovered from the adjacency matrix.
It should be noted that the adjacency matrix of the present application may differ from the conventional adjacency matrix: the values in Ar and Ap are not restricted to 0 or 1 (0 representing no bond between the corresponding atom pair, 1 representing a bond), but may take several discrete or continuous values, for example the four values 0, 1, 2, and 3, representing no bond, single bond, double bond, and triple bond, respectively.
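To make the representation concrete, here is a small Python sketch that builds such a bond-order adjacency matrix from a molecule. The disclosure does not name any tooling, so the use of RDKit and SMILES input here is purely illustrative:

```python
from rdkit import Chem
import numpy as np

def bond_order_adjacency(smiles: str) -> np.ndarray:
    """Build an N x N adjacency matrix whose entries are bond orders
    (0 = no bond, 1 = single, 2 = double, 3 = triple; aromatic -> 1.5)."""
    mol = Chem.MolFromSmiles(smiles)
    n = mol.GetNumAtoms()
    A = np.zeros((n, n))
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        A[i, j] = A[j, i] = bond.GetBondTypeAsDouble()
    return A

print(bond_order_adjacency("C=O"))  # formaldehyde: one double bond
```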
S202, carrying out noise adding processing on the first product to obtain the first product under different noise values.
S203, inputting the first reactant and the first product under different noise values into a prediction model to obtain gradient field information of the adjacency matrix of the first product output by the prediction model under different noise values.
The gradient field information is used to indicate the change trend of the generation probability of the first product.
As shown in fig. 3A, after the first product is subjected to noise addition, the first reactant and the first product at different noise values are input into the prediction model, and the gradient field information of the adjacency matrix of the first product corresponding to the different noise values, output by the prediction model, is obtained; this gradient field information indicates the change trend of the generation probability of the first product.
As can be seen from the above description, the first product is Gp = (V, Ap, Zp), where V denotes the atom set whose size is the number of atoms, |V| = N, Ap ∈ R^(N×N) denotes the adjacency matrix of the first product, and Zp denotes the atom feature matrix of the first product.
In some embodiments, the noise addition to the first product in S202 may be a noise addition to at least one of V, Ap, and Zp.
In some embodiments, performing the noise addition on the first product in S202 includes: obtaining the adjacency matrix of the first product; and performing the noise addition on the adjacency matrix of the first product to obtain the adjacency matrix of the first product under different noise values. For example, the noise addition is applied to Ap above.
The present application does not limit the specific embodiment of the above-described noise addition to the adjacency matrix of the first product.
In a possible implementation, different noise values are added to the adjacency matrix of the first product by means of Gaussian noise, to obtain the adjacency matrix of the first product under different noise values.
For example, different noise values are added to the adjacency matrix of the first product according to the following formula (1), yielding the adjacency matrix of the first product at different noise values:

Ãp = triu(Âp) + triu(Âp)^T, where Âp ~ N(Ap, σ²)   (1)

where Ãp is the adjacency matrix of the first product obtained after adding noise through the Gaussian distribution N(Ap, σ). μ and σ are the two parameters of the Gaussian distribution, μ controlling the mean of the distribution and σ controlling its standard deviation; here the adjacency matrix Ap of the first product serves as the μ parameter of the Gaussian distribution.

Since the adjacency matrix of the first product is a symmetric matrix, formula (1) indicates that only the upper triangular part is perturbed with noise, and the lower triangular part is mirrored from it in order to preserve the symmetry of the adjacency matrix of the first product.

With the above noise addition, noise values of L different levels σ1 > σ2 > … > σL are added to the adjacency matrix Ap of the first product (i.e., noise is added with Gaussian distributions of different standard deviations σ), yielding L sets of noised data in total; each set comprises the adjacency matrix of the first reactant and the adjacency matrix of the first product at one noise value. The prediction model is then jointly trained using the L sets of data.
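A minimal Python sketch of this perturbation, assuming the σ levels form a decreasing schedule; the geometric schedule below is an assumption, as the disclosure only requires L distinct levels:

```python
import numpy as np

def perturb_adjacency(A_p: np.ndarray, sigma: float, rng=None) -> np.ndarray:
    """Formula (1): add Gaussian noise N(A_p, sigma^2) to the upper triangle
    and mirror it so the noised adjacency matrix stays symmetric."""
    rng = rng or np.random.default_rng()
    noisy = rng.normal(loc=A_p, scale=sigma)        # elementwise N(A_p, sigma^2)
    upper = np.triu(noisy, k=1)                     # perturb upper triangle only
    return upper + upper.T + np.diag(np.diag(A_p))  # mirror; diagonal unchanged

sigmas = np.geomspace(1.0, 0.01, num=10)            # L = 10 noise levels
# noised = [perturb_adjacency(A_p, s) for s in sigmas]
```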
After the adjacency matrix of the first product has been noise-processed through the above steps, the corresponding step S203 includes steps S203-A and S203-B:
S203-A, acquiring the adjacency matrix and the node feature matrix of the first reactant, and the node feature matrix of the first product.
S203-B, inputting the adjacency matrix and the node feature matrix of the first reactant, together with the node feature matrix of the first product and its adjacency matrices under different noise values, into the prediction model to obtain the gradient field information of the adjacency matrix of the first product, corresponding to the different noise values, output by the prediction model.
The prediction model of the embodiment of the present application does not fit the product adjacency matrix directly; instead, it fits the change trend of the product generation probability, i.e., the gradient field information of the product adjacency matrix, expressed as the gradient of the log-probability density of the product adjacency matrix, ∇_A log p(A). In this way, in the subsequent actual product prediction process, optimization can proceed along the direction of the gradient field information of the product adjacency matrix predicted by the prediction model to obtain the product adjacency matrix; that is, by sliding along the gradient field direction of the product adjacency matrix, the adjacency matrix moves closer to the high-probability-density region, finally generating a product adjacency matrix of better quality.
The present application does not limit the specific network structure of the prediction model; for example, the prediction model may be a graph attention network, a transformer, a variational autoencoder, etc.
In some embodiments, as shown in fig. 3B, the prediction model includes an encoding module and a decoding module, in this case, S203-B includes:
S203-B1, for the i-th noise value among the different noise values, inputting the adjacency matrix and the atom feature matrix of the first reactant, the atom feature matrix of the first product, and the adjacency matrix of the first product at the i-th noise value into the encoding module, to obtain the first feature information of the first reactant and the first feature information of the first product at the i-th noise value output by the encoding module.
S203-B2, inputting the first feature information of the first reactant and the first feature information of the first product at the i-th noise value into the decoding module, to obtain the gradient field information of the adjacency matrix of the first product at the i-th noise value output by the decoding module.
The model training process is described below in connection with the network structure of the coding modules.
In some embodiments, as shown in fig. 3C, the encoding module includes a first encoding sub-module and a second encoding sub-module, in this case, S203-B1 includes:
S203-B11, inputting the adjacency matrix and the node feature matrix of the first reactant into the first coding submodule to obtain the first feature information of the first reactant.
S203-B12, inputting the atom feature matrix of the first product and its adjacency matrix at the i-th noise value into the second coding submodule to obtain the first feature information of the first product at the i-th noise value.
In the embodiment of the present application, the network structures of the first encoding sub-module and the second encoding sub-module are not limited, and optionally, the network structures of the first encoding sub-module and the second encoding sub-module may be different.
In some embodiments, the network structure of the first encoding submodule is the same as that of the second encoding submodule. For convenience of description, the first encoding submodule and the second encoding submodule are referred to collectively as the p-th encoding submodule: when p is 1, the p-th encoding submodule is the first encoding submodule, and when p is 2, the p-th encoding submodule is the second encoding submodule.
The p-th coding submodule comprises a p-th atomic feature extraction unit and a p-th molecular feature extraction unit. In this case, the above-mentioned S203-B11 or S203-B12 can be implemented by the following steps:
S203-B01, inputting the adjacency matrix and the node feature matrix of the target object into the p-th atom feature extraction unit to obtain the embedded representation of each atom in the target object;
S203-B02, inputting the embedded representation of each atom in the target object into the pth molecular feature extraction unit for feature interaction to obtain first feature information of the target object;
when p is 1, the p-th encoding submodule is the first encoding submodule and the adjacency matrix and node feature matrix of the target object are the adjacency matrix and node feature matrix of the first reactant; when p is 2, the p-th encoding submodule is the second encoding submodule and the adjacency matrix and node feature matrix of the target object are the atom feature matrix of the first product and the adjacency matrix of the first product at the i-th noise value.
For example, as shown in fig. 3D, if the first encoding submodule includes a first atomic feature extraction unit and a first molecular feature extraction unit, S203-B11 includes: inputting the adjacency matrix and the node feature matrix of the first reactant into the first atomic feature extraction unit for atomic feature extraction to obtain an embedded representation (embedding) of each atom in the first reactant; then inputting the embedded representation of each atom in the first reactant into the first molecular feature extraction unit for atomic feature interaction to obtain the first feature information of the first reactant, which can be understood as the molecular structure feature information of the first reactant.
With continued reference to fig. 3D, if the second encoding submodule includes a second atomic feature extraction unit and a second molecular feature extraction unit, S203-B12 includes: inputting the atom feature matrix of the first product and its adjacency matrix at the i-th noise value into the second atomic feature extraction unit for atomic feature extraction to obtain an embedded representation (embedding) of each atom in the first product; then inputting the embedded representation of each atom in the first product into the second molecular feature extraction unit for atomic feature interaction to obtain the first feature information of the first product at the i-th noise value, which can be understood as the molecular structure feature information of the first product after being perturbed by the i-th noise value.
The embodiment of the present application does not limit the network structure of the p-th atomic feature extraction unit, that is, of the first atomic feature extraction unit and the second atomic feature extraction unit.
In one possible implementation, as shown in fig. 3E, the p-th atomic feature extraction unit includes M GNN layers, where M is a positive integer. The output of each GNN layer is added to the accumulated prediction of the preceding layers to form the prediction at that layer, so the final output of the p-th atomic feature extraction unit composed of M GNN layers is the sum of the predictions of all GNN layers.
At this time, the above S203-B01, inputting the adjacency matrix and the node feature matrix of the target object into the p-th atomic feature extraction unit to obtain the embedded representation of each atom in the target object, includes S203-B021 to S203-B023:
S203-B021, inputting the adjacency matrix and the node feature matrix of the target object into the p-th atomic feature extraction unit to obtain the bond information of the j-th atom in the target object extracted by the m-th GNN layer, where m is a positive integer less than or equal to M.
The adjacency matrix of the embodiment of the present application includes the four values 0, 1, 2, and 3, respectively representing no bond, single bond, double bond, and triple bond between atoms. The adjacency matrix and the node feature matrix of the target object are input into the p-th atomic feature extraction unit, so that the unit obtains the bond information of each atom in the target object extracted by the m-th GNN layer according to the adjacency matrix and the node feature matrix. For convenience of description, the following embodiments take the bond information of the j-th atom in the target object as an example; the process of determining the bond information of other atoms is analogous.
In one example, the adjacency matrix and the node feature matrix of the target object are input into the p-th atom feature extraction unit, and the information of the j-th atom and the information of the neighbor atoms of different bond types of the j-th atom are aggregated through the m-th GNN layer, so that the bond information of the j-th atom extracted by the m-th GNN layer is obtained.
For example, as shown in fig. 3F, taking the m-th GNN layer as an example, assume that the j-th atom has 4 neighbor atoms, of which two are single-bonded to the j-th atom and the other two are double-bonded to it. First, the adjacency matrix and the node feature matrix of the object are input into the m-th GNN layer. For the j-th atom in the object, the m-th GNN layer aggregates the information of the two single-bonded neighbor atoms with the information of the j-th atom to obtain the single-bond information of the j-th atom extracted by the m-th GNN layer. Similarly, the information of the two double-bonded neighbor atoms is aggregated with the information of the j-th atom to obtain the double-bond information of the j-th atom extracted by the m-th GNN layer.
The process of extracting the bond information of the j-th atom is essentially the same in each of the M GNN layers.
Optionally, the bond information of the j-th atom in the target object extracted by the m-th GNN layer may be determined according to the following formula (2):

b_j^(m,C) = GNN(A[C,·,·], Z^(m-1))_j   (2)

where C denotes a bond type, A[C,·,·] denotes the adjacency matrix corresponding to bond type C, Z^(m-1) is the atom feature matrix corresponding to the (m-1)-th GNN layer, and b_j^(m,C) is the bond information of the j-th atom in the target object extracted by the m-th GNN layer.
The meaning of the above formula (2) is to aggregate neighbor information within each bond type separately. For example, if an atom has 4 neighbors, 2 of which are double-bonded and the other 2 single-bonded, then the double-bonded neighbor information is aggregated first and the single-bonded neighbor information is aggregated separately, i.e., aggregation is performed per bond type.
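A toy Python illustration of this per-bond-type aggregation for a single atom; the feature values and the sum aggregator are made up for illustration:

```python
import numpy as np

Z = np.array([[1., 0.],               # atom j
              [0., 1.], [1., 1.],     # single-bonded neighbors of atom j
              [2., 0.], [0., 2.]])    # double-bonded neighbors of atom j
b_single = Z[[1, 2]].sum(axis=0) + Z[0]  # aggregate the single-bond channel
b_double = Z[[3, 4]].sum(axis=0) + Z[0]  # aggregate the double-bond channel
print(b_single, b_double)
```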
S203-B022, fusing the bond information of the j-th atom with the (m-1)-th embedded representation of the j-th atom corresponding to the (m-1)-th GNN layer to obtain the m-th embedded representation of the j-th atom corresponding to the m-th GNN layer.
Continuing with fig. 3F, the single-bond information and the double-bond information of the j-th atom are fused and concatenated with the (m-1)-th embedded representation of the j-th atom corresponding to the (m-1)-th GNN layer (i.e., the embedding of the previous step in fig. 3F), and then an MLP operation is performed to obtain the m-th embedded representation of the j-th atom corresponding to the m-th GNN layer. Then, according to the above steps, the bond information of the j-th atom extracted by the (m+1)-th GNN layer is obtained, fused and concatenated with the m-th embedded representation, and passed through the MLP to obtain the (m+1)-th embedded representation of the j-th atom corresponding to the (m+1)-th GNN layer. This proceeds in turn until the M-th embedded representation of the j-th atom corresponding to the M-th GNN layer is obtained.
In one example, the m-th embedded representation of the j-th atom corresponding to the m-th GNN layer can be determined according to the following formula (3):

Z_j^(m) = MLP(concat(Z_j^(m-1), b_j^(m)))   (3)

where Z_j^(m-1) is the (m-1)-th embedded representation of the j-th atom corresponding to the (m-1)-th GNN layer, b_j^(m) is the bond information of the j-th atom extracted by the m-th GNN layer, and Z_j^(m) is the m-th embedded representation of the j-th atom corresponding to the m-th GNN layer.
S203-B023, concatenating the embedded representations of the j-th atom corresponding to each of the M GNN layers to obtain the embedded representation of the j-th atom.
According to the above steps, the embedded representation of the j-th atom corresponding to each of the M GNN layers can be obtained, and these embedded representations are concatenated to obtain the embedded representation of the j-th atom.
In one example, the embedded representation of the j-th atom is determined according to the following formula (4):

(Z_out)_j = concat(Z_j^(1), Z_j^(2), …, Z_j^(M))   (4)

where (Z_out)_j is the embedded representation of the j-th atom and Z_j^(m) is the embedded representation of the j-th atom corresponding to the m-th GNN layer.
In one example, the p-th atomic feature extraction unit can be written compactly as the following formula (5):

Z_out = MultiChannelGNN(A, Z_in)   (5)

where Z_out denotes the output of the p-th atomic feature extraction unit, A is the adjacency matrix input to the p-th atomic feature extraction unit, and Z_in is the node feature matrix input to the p-th atomic feature extraction unit. Formula (5) indicates that the p-th atomic feature extraction unit is a multi-channel GNN, and its output is formed from the feature information extracted by each GNN layer.
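Putting formulas (2) to (5) together, the following PyTorch sketch shows one plausible shape for such a multi-channel GNN. The tensor layout (A as a (C, N, N) stack of per-bond-type adjacency channels), the sum aggregator, and the MLP sizes are assumptions for illustration, not details taken from the disclosure:

```python
import torch
import torch.nn as nn

class MultiChannelGNN(nn.Module):
    """Sketch of formulas (2)-(4): per layer, neighbor features are aggregated
    separately per bond type (2), fused with the previous embedding by an MLP
    (3), and the per-layer embeddings are concatenated at the end (4)."""
    def __init__(self, num_bond_types: int, feat_dim: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(feat_dim * (num_bond_types + 1), feat_dim),
                nn.ReLU(),
            )
            for _ in range(num_layers)
        ])

    def forward(self, A: torch.Tensor, Z: torch.Tensor) -> torch.Tensor:
        # A: (C, N, N), one adjacency channel per bond type; Z: (N, feat_dim)
        outs = []
        for mlp in self.layers:
            per_bond = [A[c] @ Z for c in range(A.shape[0])]  # formula (2)
            Z = mlp(torch.cat([Z] + per_bond, dim=-1))        # formula (3)
            outs.append(Z)
        return torch.cat(outs, dim=-1)                        # formula (4)
```

Called as Z_out = MultiChannelGNN(3, 64, M)(A, Z_in), this mirrors the interface of formula (5).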
The embodiments of the present application do not limit the network structure of the p-th molecular feature extraction unit, that is, of the first molecular feature extraction unit and the second molecular feature extraction unit.
In the above manner, after the embedded representation of each atom in the target object is obtained, S203-B02 is executed: the embedded representation of each atom in the target object is input into the p-th molecular feature extraction unit for feature interaction to obtain the first feature information of the target object. For example, the embedded representation of each atom in the first reactant is input into the first molecular feature extraction unit to obtain the first feature information xR of the first reactant, and the embedded representation of each atom in the first product is input into the second molecular feature extraction unit to obtain the first feature information xP of the first product at the i-th noise value.
The first characteristic information may be understood as a low-dimensional vector representation of the target object.
The embodiment of the present application does not limit the network structure of the p-th molecular feature extraction unit.
In one example, the network structure of the above-mentioned p-th molecular feature extraction unit, i.e., of the first molecular feature extraction unit and the second molecular feature extraction unit, is a multilayer perceptron (MLP).
The model training process has been described above in connection with the network structure of the encoding module; it is described below in connection with the decoding module.
In some embodiments, as shown in fig. 3G, the decoding module includes a feature extraction unit and a feature decoding unit. The feature extraction unit in the decoding module can be used to perform interactive learning again on the features extracted by the encoding module, and the feature decoding unit can be understood as decoding the features extracted by the feature extraction unit to generate the gradient field information of the adjacency matrix of the first product.
On the basis of the network structure shown in FIG. 3G, the above-mentioned S203-B2 includes S203-B21 to S203-B22:
S203-B21, inputting the first feature information of the first reactant and the first feature information of the first product at the i-th noise value into the feature extraction unit to obtain, respectively, the second feature information of the first reactant and the second feature information of the first product at the i-th noise value.
The present application does not limit the network structure of the above-described feature extraction unit.
In some embodiments, as shown in fig. 3H, the feature extraction unit includes a first feature extraction sub-unit and a second feature extraction sub-unit, and in this case, the above S203-B21 includes:
S203-B211, inputting the first feature information of the first reactant into the first feature extraction subunit to obtain the second feature information of the first reactant;
S203-B212, inputting the first feature information of the first product at the i-th noise value into the second feature extraction subunit to obtain the second feature information of the first product at the i-th noise value.
The network structure of the first feature extraction subunit and the second feature extraction subunit is not limited in the present application.
S203-B22, inputting the second feature information of the first reactant and the second feature information of the first product at the i-th noise value into the feature decoding unit to obtain the gradient field information of the adjacency matrix of the first product at the i-th noise value.
In some embodiments, the first feature extraction subunit and the second feature extraction subunit are both transformer encoders, and the feature decoding unit is a transformer decoder. Specifically, the second feature information hR of the first reactant extracted by the first feature extraction subunit is input as the query, and the second feature information hP of the first product at the i-th noise value extracted by the second feature extraction subunit is input as the key and value; together they are fed into the transformer decoder, which outputs the gradient field information of the adjacency matrix of the first product at the i-th noise value.
In one example, the gradient field information S of the adjacency matrix of the first product at the ith noise value can be obtained according to the following formula:

S = TransformerDecoder(h_R, h_P)    (7)

where

h_R = TransformerEncoder(x_R)
h_P = TransformerEncoder(x_P)

The above h_R is the second feature information of the first reactant, and h_P is the second feature information of the first product at the ith noise value.
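As a non-authoritative sketch of formula (7), the fragment below wires two transformer encoders and a transformer decoder using PyTorch's built-in blocks, with h_R as the query (tgt) and h_P as the key/value (memory). The model dimension, head count, and layer counts are illustrative assumptions; in the embodiment the feature dimension of x_R and x_P is N itself.

```python
import torch
import torch.nn as nn

d_model, nhead, N = 16, 4, 16     # toy sizes; in the embodiment d_model = N

enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
encoder_r = nn.TransformerEncoder(enc_layer, num_layers=2)  # first feature extraction subunit
encoder_p = nn.TransformerEncoder(enc_layer, num_layers=2)  # second feature extraction subunit
decoder = nn.TransformerDecoder(dec_layer, num_layers=2)    # feature decoding unit

x_r = torch.randn(1, N, d_model)  # first feature information of the reactant
x_p = torch.randn(1, N, d_model)  # first feature information of the noised product

h_r = encoder_r(x_r)              # second feature information h_R
h_p = encoder_p(x_p)              # second feature information h_P

# Formula (7): h_R attends to h_P (query vs. key/value) to produce S.
s = decoder(tgt=h_r, memory=h_p)  # gradient field information, shape (1, N, d_model)
```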
Based on the network structure of the prediction model described in the above embodiments, in a specific embodiment, the prediction model and the training process of the prediction model related to the embodiments of the present application are as shown in fig. 4. The adjacency matrix and the atomic feature matrix of the first reactant are input into the first atomic feature extraction unit (a GNN encoder) for atomic feature extraction, so as to obtain the embedded representation of each atom in the first reactant, where the adjacency matrix of the first reactant ∈ R^(N×N×C), the atomic feature matrix of the first reactant ∈ R^(N×F), and the GNN encoder includes M GNN layers. The adjacency matrix of the first product is subjected to noise processing, and the atomic feature matrix of the first product and the noised adjacency matrix are input into the second atomic feature extraction unit (a GNN encoder) for atomic feature extraction, so as to obtain the embedded representation of each atom in the first product, where the atomic feature matrix of the first product ∈ R^(N×F), the noised adjacency matrix of the first product ∈ R^(N×N×C), and the GNN encoder includes M GNN layers. Then, the embedded representation of each atom in the first reactant is input into the first molecular feature extraction unit (an MLP) for molecular feature extraction, so as to obtain the first feature information x_R of the first reactant, where x_R ∈ R^(N×N). The embedded representation of each atom in the first product is input into the second molecular feature extraction unit (an MLP) for molecular feature extraction, so as to obtain the first feature information x_P of the first product at the ith noise value, where x_P ∈ R^(N×N). Then, the first feature information x_R of the first reactant is input into the first feature extraction subunit (a transformer encoder) to obtain the second feature information h_R of the first reactant, where h_R ∈ R^(N×N). The first feature information x_P of the first product at the ith noise value is input into the second feature extraction subunit to obtain the second feature information h_P of the first product at the ith noise value, where h_P ∈ R^(N×N). Finally, the second feature information h_R of the first reactant and the second feature information h_P of the first product at the ith noise value are input into the feature decoding unit (a transformer decoder) to obtain the gradient field information S ∈ R^(N×N) of the adjacency matrix of the first product predicted by the prediction model at the ith noise value.
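The GNN encoder in fig. 4 is only named, not specified, so the following is a rough sketch under assumptions: each of the M layers aggregates neighbor features separately per bond type (one channel of the N×N×C adjacency), mixes them with a linear layer, fuses the result with the previous embedding through a residual connection, and the per-layer outputs are concatenated as the final embedded representation. The aggregation rule, dimensions, and class name are all illustrative.

```python
import torch
import torch.nn as nn

class GNNEncoder(nn.Module):
    """Toy atom feature extractor: each layer aggregates neighbor
    features per bond type (channel of the N x N x C adjacency),
    then mixes them; outputs of all M layers are concatenated as
    the embedded representation of each atom (an assumed design)."""
    def __init__(self, in_dim: int, hid: int, num_bond_types: int, num_layers: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid)
        self.layers = nn.ModuleList(
            nn.Linear(hid * num_bond_types, hid) for _ in range(num_layers)
        )

    def forward(self, adj: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # adj: (N, N, C) bond-type adjacency; feat: (N, F) atomic features.
        h = torch.relu(self.proj(feat))                    # (N, hid)
        outs = []
        for layer in self.layers:
            # Aggregate neighbor features separately for each bond type.
            msg = torch.einsum('ijc,jd->icd', adj, h)      # (N, C, hid)
            h = torch.relu(layer(msg.reshape(h.size(0), -1))) + h  # fuse + residual
            outs.append(h)
        return torch.cat(outs, dim=-1)                     # (N, M * hid)
```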
The above describes how the prediction model predicts the gradient field information of the adjacency matrix of the first product at the ith noise value. The gradient field information corresponding to the adjacency matrix of the first product at each noise value is predicted in the same way; for details, reference is made to the above description for the ith noise value, which is not repeated here.
After the prediction model predicts the gradient field information of the adjacency matrix of the first product under different noise values according to the above steps, the following S204 is executed to implement the training of the prediction model.
And S204, training the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model.
The implementation manners of S204 include, but are not limited to, the following:
In the first mode, the prediction model is trained by back-propagating the loss between the gradient field information of the adjacency matrix of the first product under different noise values predicted by the prediction model and the true values of that gradient field information, so as to obtain the trained prediction model. Optionally, the true values of the gradient field information under different noise values may be predicted by another trained gradient field information prediction model: the first reactant and the first product under different noise values are input into that gradient field information prediction model, the gradient field information of the adjacency matrix of the first product output by it under different noise values is obtained, and this output is taken as the true values of the gradient field information of the adjacency matrix of the first product under different noise values.
In a second mode, the S204 comprises S204-A1 and S204-A2:
S204-A1, determining the loss of the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values;
and S204-A2, adjusting parameters in the prediction model according to the loss of the prediction model to obtain the trained prediction model.
The embodiment of the present application does not limit the way of determining the loss of the prediction model.
In one example, the gradient field information of the adjacency matrix of the first product at different noise values is substituted into an existing loss function, and the loss of the prediction model is calculated.
In one example, the loss of the prediction model is determined according to the following formula (8):

L(θ) = (1/L) Σ_{i=1}^{L} || S_i − S*_i ||²    (8)

where θ represents the parameters of the prediction model, L represents the number of noise values, S_i is the gradient field information of the adjacency matrix of the first product predicted by the prediction model at the ith noise value, and S*_i is the actual gradient field information of the adjacency matrix of the first product at the ith noise value. The above loss function is used for constraining the gradient field information of the adjacency matrix of the first product predicted by the prediction model, so that the gradient field information predicted by the prediction model gradually approaches the true value of the gradient field information of the adjacency matrix of the first product.

In some embodiments, the actual gradient field information of the adjacency matrix of the first product at the ith noise value can be derived according to the noise processing mode. For example, if Gaussian distribution noise is adopted in S202, the noised adjacency matrix satisfies Ã_i = A + σ_i·ε with ε ~ N(0, I), so that the actual gradient field is

S*_i = −(Ã_i − A)/σ_i²

At this time, the loss of the prediction model can be determined according to the following formula (9):

L(θ) = (1/(2L)) Σ_{i=1}^{L} σ_i² · || S_i + (Ã_i − A)/σ_i² ||²    (9)
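Under the Gaussian noising of formulas (8)-(9), the score target for Ã = A + σ·ε is −(Ã − A)/σ², and each noise level's squared error is weighted by σ_i². The sketch below writes this out; `model(noisy_adj, reactant, i)` is a hypothetical stand-in for the prediction model at the ith noise value, not an interface defined by the application.

```python
import torch

def dsm_loss(model, adj_true, reactant, sigmas):
    """Denoising-score-matching style loss over L noise values,
    a sketch of formulas (8)-(9); `model(noisy_adj, reactant, i)`
    is assumed to return the predicted gradient field information."""
    total = 0.0
    for i, sigma in enumerate(sigmas):
        noise = torch.randn_like(adj_true)
        noisy = adj_true + sigma * noise            # noised product adjacency
        target = -(noisy - adj_true) / sigma**2     # true gradient field S*_i
        pred = model(noisy, reactant, i)            # predicted gradient field S_i
        total = total + sigma**2 * ((pred - target) ** 2).sum()
    return total / (2 * len(sigmas))
```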
As can be seen from the above, the present application adds different noise values to the first product and performs joint training on the prediction model using the first product at different noise values. The trained prediction model can predict the gradient field information of the product adjacency matrix (i.e., the variation trend of the product generation probability), and the product can then be obtained based on the predicted gradient field information of the product adjacency matrix, for example, by sampling in the predicted gradient field information of the product adjacency matrix to obtain the product adjacency matrix, thereby realizing prediction of the product.
In some embodiments, in order to select a prediction model under different noise values in the subsequent prediction of the product, the embodiments of the present application further include: and generating a corresponding relation between the noise values and the model parameters, wherein the corresponding relation comprises the model parameters of the prediction model corresponding to each noise value in different noise values.
In some embodiments, if the pth molecular feature extraction unit is an MLP layer, different model parameters are set in the MLP layer for different noise values. During model training, the model parameters corresponding to different noise values in the MLP layer are adjusted by back-propagating the calculated loss function, so as to obtain the model parameters corresponding to each noise value in the MLP layer. In this way, when the prediction model corresponding to a given noise value is subsequently used, the parameters in the MLP layer of the prediction model are directly replaced by the model parameters corresponding to the target noise value, which improves the prediction efficiency of the model.
Illustratively, the correspondence between the noise values and the model parameters in the MLP layer is shown in Table 1:

Table 1
Noise value    Parameters of the MLP layer
σ1             α1, β1
σ2             α2, β2
……             ……
σL             αL, βL
where α and β are parameters of the MLP layer; for example, the Lth noise value σL corresponds to the MLP layer parameters αL and βL. The other parameters in the prediction model are fixed and do not vary with the noise value. Therefore, when the gradient field information of the product adjacency matrix is subsequently predicted under different noise values, the parameters in the MLP layer of the prediction model are set to the parameters corresponding to the given noise value, and the prediction model under that noise value is obtained.
In some embodiments, when sampling is performed using a method such as noise annealing in the product prediction process, it is desirable to quickly and easily obtain the prediction models at different noise values. Thus, as shown in Table 1, if the prediction model under σ1 is needed, the parameters α1 and β1 corresponding to σ1 are looked up from Table 1 and used as the parameters of the MLP layer, and the prediction model corresponding to σ1 is obtained.
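A sketch of the Table 1 mechanism under assumptions: one (α, β) pair per noise value acts as a per-level scale and shift inside an otherwise shared MLP layer, so switching the prediction model to a target noise value only means indexing a different row of the table. The scale-and-shift role of α and β is an assumption; the application does not pin down how they enter the MLP.

```python
import torch
import torch.nn as nn

class NoiseConditionedMLP(nn.Module):
    """Shared MLP body with per-noise-value (alpha, beta) parameters,
    mirroring Table 1; all other parameters stay fixed across noise values."""
    def __init__(self, dim: int, num_noise_levels: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)                        # shared, fixed w.r.t. sigma
        self.alpha = nn.Parameter(torch.ones(num_noise_levels, dim))
        self.beta = nn.Parameter(torch.zeros(num_noise_levels, dim))

    def forward(self, x: torch.Tensor, i: int) -> torch.Tensor:
        # Select the i-th group of parameters (row i of Table 1).
        return self.alpha[i] * torch.relu(self.linear(x)) + self.beta[i]
```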
According to the model training method provided by the embodiment of the present application, a first reactant and a first product of the first reactant are obtained; the first product is subjected to noise processing to obtain the first product under different noise values; the first reactant and the first product under different noise values are input into a prediction model to obtain the gradient field information of the adjacency matrix of the first product output by the prediction model under different noise values; and the prediction model is jointly trained according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model. In other words, in the embodiment of the present application, the prediction models under different noise values can be obtained by jointly training the prediction model with the first reactant and the first product under different noise values. The trained prediction model is used to predict the gradient field information of the adjacency matrix of the product of a second reactant, and the gradient field information is used to indicate the variation trend of the generation probability of the product of the second reactant, so that when the product is predicted, it can be predicted accurately according to the gradient field information of the adjacency matrix of the product. In addition, the prediction model of the present application does not limit the types of the reactants, and can therefore realize the prediction of products for different types of reactants.
The above describes the training process of the prediction model, and the following describes the use process of the prediction model.
Fig. 5 is a schematic flowchart of a product prediction method according to an embodiment of the present application, as shown in fig. 5, including:
s501, obtaining a second reactant and preset K noise values.
The second reactant is understood to mean a reactant whose product is to be predicted.
K is a positive integer less than or equal to L.
Optionally, the K noise values are the same as part of the L noise values used in training the prediction model.
Optionally, at least one of the K noise values is different from each of the L noise values used in training the prediction model.
S502, aiming at the ith noise value in the K noise values, determining a prediction model corresponding to the ith noise value, wherein the prediction model is obtained by training through the model training method, and i is a positive integer from 1 to K.
According to the embodiment, when the prediction model is trained, the training samples are subjected to noise adding processing by using the L noise values, so that the prediction model under the L noise values is obtained. Based on the above, in actual use, the prediction model under a specific noise value can be selected according to actual needs for use.
For example, if the ith noise value among the K noise values is the same as the noise value a among the L noise values at the time of the training, the prediction model corresponding to the noise value a is determined as the prediction model corresponding to the ith noise value.
For another example, if the i-th noise value is different from each of the L noise values used in the training but is closest to the noise value b, the prediction model corresponding to the noise value b is determined as the prediction model corresponding to the i-th noise value.
In the above training, the prediction model is trained for each noise value to obtain a set of parameters, so that the prediction model at a certain noise value described herein can be understood as a model parameter corresponding to the noise value in the prediction model.
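When the requested noise value was not used during training, the text above falls back to the closest trained noise value. A one-line sketch of that selection (the function name is hypothetical):

```python
def nearest_noise_index(sigma, trained_sigmas):
    """Return the index of the trained noise value closest to `sigma`,
    whose model parameters are then loaded from the correspondence table."""
    return min(range(len(trained_sigmas)), key=lambda j: abs(trained_sigmas[j] - sigma))
```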
In some embodiments, the step S502 includes the steps S502-A1 through S502-A3:
and S502-A1, acquiring the corresponding relation between the noise value and the model parameter.
As shown in table 1, the correspondence relationship includes model parameters of the prediction model corresponding to each of the different noise values.
S502-A2, according to the ith noise value, inquiring the ith group of model parameters corresponding to the ith noise value from the corresponding relation.
And S502-A3, taking the ith group of model parameters as the parameters of the prediction model to obtain the prediction model corresponding to the ith noise value.
The prediction models of this embodiment include prediction models at different noise values, for example, when gradient field information at the i-th noise value is predicted, the prediction model at the i-th noise value is used for prediction. Specifically, from the correspondence shown in table 1, an ith group of model parameters corresponding to the ith noise value is queried, and the ith group of model parameters is used as parameters of the prediction model, so as to obtain the prediction model corresponding to the ith noise value. And predicting the gradient field information under the ith noise value by using a prediction model corresponding to the ith noise value.
S503, sampling in the gradient field information predicted by the prediction model corresponding to the ith noise value according to the second reactant and the target adjacency matrix of the second product at the (i-1)th noise value to obtain the target adjacency matrix of the second product at the ith noise value, where i is a positive integer from 1 to K and the second product is the product of the second reactant.
The process of this step is a loop iteration process. For example, the K noise values are sorted from large to small. Starting from i = 1, an adjacency matrix Ã_0 is first randomly initialized for the second product as the target adjacency matrix of the second product at the 0th noise value. The initial adjacency matrix of the second product and the adjacency matrix of the second reactant are equal in size, each being an N × N matrix, where N is the number of atoms in the second reactant. The initial adjacency matrix records the linking relationship of all atoms in the second product, and thus can represent the structure of the second product. According to the second reactant and the initial adjacency matrix Ã_0 of the second product, the gradient field information at the 1st noise value predicted by the prediction model is sampled to obtain the target adjacency matrix of the second product at the 1st noise value. Then, with i = i + 1, the gradient field information at the 2nd noise value predicted by the prediction model is sampled according to the second reactant and the target adjacency matrix of the second product at the 1st noise value, so as to obtain the target adjacency matrix of the second product at the 2nd noise value, and so on until the target adjacency matrix of the second product at the Kth noise value is obtained. The Kth noise value is the minimum of the K noise values.
In the embodiment of the application, the target adjacency matrix of the second product is sampled in the gradient field information predicted by the prediction models corresponding to the K noise values. As the gradient field information changes, the target adjacency matrix gradually approaches the real adjacency matrix of the second product, and the target adjacency matrix obtained by sampling in the gradient field information corresponding to the Kth noise value, the smallest of the noise values, is used as the final adjacency matrix of the second product, thereby realizing accurate prediction of the product of the second reactant.
The method for obtaining the target adjacency matrix of the second product at the ith noise value in S503 includes, but is not limited to, the following methods:
In the first embodiment, the target adjacency matrix of the second product at the (i-1)th noise value and the second reactant are input into the prediction model corresponding to the ith noise value, so as to obtain the gradient field information of the ith target adjacency matrix of the second product output by that prediction model, which reflects the variation trend of the ith target adjacency matrix of the second product.
In a second embodiment, the target adjacency matrix of the second product at the ith noise value is generated through T iterations; specifically, S503 includes the following steps S503-A1 to S503-A3:
S503-A1, inputting the (t-1)th adjacency matrix of the second product at the ith noise value and the second reactant into the prediction model corresponding to the ith noise value to obtain the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, where t is a positive integer less than or equal to T, and when t is 1, the (t-1)th adjacency matrix of the second product at the ith noise value is the target adjacency matrix of the second product at the (i-1)th noise value.

S503-A2, updating the (t-1)th adjacency matrix of the second product at the ith noise value by using the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value to obtain the tth adjacency matrix of the second product at the ith noise value, and repeating the above until t is T.

S503-A3, determining the Tth adjacency matrix of the second product at the ith noise value as the target adjacency matrix of the second product at the ith noise value.
For example, the target adjacency matrix of the second product at the (i-1)th noise value and the second reactant are input into the prediction model corresponding to the ith noise value, and the 0th gradient field information corresponding to the adjacency matrix of the second product at the ith noise value is obtained. The target adjacency matrix of the second product at the (i-1)th noise value is updated by using the 0th gradient field information to obtain the 1st adjacency matrix of the second product at the ith noise value. The 1st adjacency matrix of the second product at the ith noise value and the second reactant are then input into the prediction model corresponding to the ith noise value to obtain the 1st gradient field information of the adjacency matrix of the second product at the ith noise value, and the 1st adjacency matrix is updated with the 1st gradient field information to obtain the 2nd adjacency matrix of the second product at the ith noise value. Likewise, the 2nd adjacency matrix and the second reactant yield the 2nd gradient field information, which is used to obtain the 3rd adjacency matrix. By analogy, the (T-1)th adjacency matrix of the second product at the ith noise value and the second reactant are input into the prediction model corresponding to the ith noise value to obtain the (T-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, and the (T-1)th adjacency matrix is updated with the (T-1)th gradient field information until the Tth adjacency matrix of the second product at the ith noise value is obtained. Finally, the Tth adjacency matrix of the second product at the ith noise value is determined as the target adjacency matrix of the second product at the ith noise value.
The above takes the ith noise value as an example; the same steps are repeated sequentially for each of the K noise values until the target adjacency matrix of the second product at the Kth noise value is obtained.
In some embodiments, in the above S503-A2, the manner of updating the (t-1)th adjacency matrix of the second product at the ith noise value by using the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value to obtain the tth adjacency matrix of the second product at the ith noise value includes, but is not limited to, the following:
In the first mode, the sum of the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value and the (t-1)th adjacency matrix of the second product at the ith noise value is used as the tth adjacency matrix of the second product at the ith noise value.
In a second mode, the updating of the adjacency matrix in S503-A2 includes the following steps:
and step A1, determining the size of the update step corresponding to the ith noise value according to the K noise values and the ith noise value.
For example, the ratio of the ith noise value to the Kth noise value is determined as the size of the update step corresponding to the ith noise value.
For another example, the size of the update step corresponding to the ith noise value is determined according to the following equation (10):

α_i = ε · σ_i² / σ_K²    (10)

where ε is a preset parameter, σ_K² is the square of the Kth (smallest) noise value, σ_i² is the square of the ith noise value, and α_i is the size of the update step corresponding to the ith noise value.
And step A2, determining the noise adding value corresponding to the tth adjacency matrix.

Wherein the distribution of the noise adding value conforms to a normal distribution, for example a standard normal distribution: the noise adding value z_t satisfies z_t ~ N(0, 1), where "~" indicates conformity to the distribution and N(0, 1) denotes the standard normal distribution.
And step A3, determining the tth adjacency matrix of the second product at the ith noise value according to the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, the (t-1)th adjacency matrix of the second product at the ith noise value, the size of the update step corresponding to the ith noise value, and the noise adding value corresponding to the tth adjacency matrix.
In one example, the size of the update step corresponding to the ith noise value is multiplied by the noise adding value corresponding to the tth adjacency matrix to obtain a noise term. The sum of the (t-1)th adjacency matrix of the second product at the ith noise value, the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, and the noise term is taken as the tth adjacency matrix of the second product at the ith noise value.
In another example, the tth adjacency matrix of the second product at the ith noise value may be determined according to the following equation (11):

Ã_t = Ã_{t-1} + (α_i/2) · s_θ(Ã_{t-1}, R) + √α_i · z_t    (11)

where Ã_t is the tth adjacency matrix of the second product at the ith noise value, Ã_{t-1} is the (t-1)th adjacency matrix of the second product at the ith noise value, R is the second reactant, s_θ(Ã_{t-1}, R) is the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, α_i is the size of the update step corresponding to the ith noise value, and z_t is the noise adding value corresponding to the tth adjacency matrix.
It should be noted that the above formula (11) is only an example; equivalent modifications of formula (11), or the addition, subtraction, multiplication, or division of one or more parameters in formula (11), also fall within the protection scope of the present application.
If i is 1, the target adjacency matrix of the second product at the (i-1)th noise value, that is, the initial adjacency matrix Ã_0 of the second product, conforms to a first normal distribution.

Since the number of links between atoms recorded in the adjacency matrix is 0, 1, 2, or 3, the variance of the first normal distribution is set to a positive number not greater than 3 in order to cover all link values.
In some embodiments, the sampling is performed using Langevin annealing sampling, and the specific sampling process is as follows:

Input: noise values σ1, ..., σK (sorted from large to small), ε (the minimum optimization step), T (the number of updates per noise value), and R (the second reactant);
(1) initialize Ã_0 (randomly initialize an adjacency matrix);
(2) for noise level i ← 1 to K:
(3)   α_i ← ε · σ_i²/σ_K² (set the size of the update step);
(4)   for number of times t ← 1 to T: (number of update steps)
(5)     sample z_t ~ N(0, 1);
(6)     Ã_t ← Ã_{t-1} + (α_i/2) · s_θ(Ã_{t-1}, R) + √α_i · z_t (perform a sliding update along the estimated gradient field information);
(7)   end the loop;
(8)   Ã_0 ← Ã_T (take the obtained result as the initial adjacency matrix under the next noise level);
(9) end the loop;
(10) return Ã_T (the final predicted product adjacency matrix).
According to the above Langevin annealing sampling method, sampling is performed in the gradient field information of the adjacency matrix of the second product predicted by the prediction model to obtain the target adjacency matrix of the second product at the Kth noise value, and the second product is obtained according to the connection relationship of the atoms in the target adjacency matrix.
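A compact sketch of steps (1)-(10) above; as before, `model(adj, reactant, i)` is a hypothetical stand-in for the prediction model at the ith noise value, and the |N(0, 3)| edge initialization follows the formula for step (1) described next.

```python
import torch

@torch.no_grad()
def annealed_langevin_sample(model, reactant, sigmas, eps=2e-5, T=100, n_atoms=10):
    """Langevin annealing over K noise values sorted large -> small:
    alpha_i = eps * sigma_i^2 / sigma_K^2, then T updates per level."""
    adj = torch.randn(n_atoms, n_atoms).mul(3 ** 0.5).abs()  # step (1): |N(0, 3)| per edge
    for i, sigma in enumerate(sigmas):
        alpha = eps * sigma**2 / sigmas[-1]**2               # step (3): update step size
        for _ in range(T):
            z = torch.randn_like(adj)                        # step (5): z_t ~ N(0, 1)
            score = model(adj, reactant, i)                  # estimated gradient field
            adj = adj + 0.5 * alpha * score + (alpha ** 0.5) * z  # step (6)
    return adj                                               # step (10): predicted adjacency
```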
In some embodiments, the initial adjacency matrix of the second product in step (1) above satisfies the following formula:

Ã_0[i,j] = |ε_[i,j]|,  ε_[i,j] ~ N(0, 3)

where i and j are atoms in the second product; that is, the present application initializes each edge between atoms in the second product using the portion of the normal distribution N(0, 3) that is greater than 0.
The specific sampling process related to the embodiment of the present application is described above. The following describes, in combination with the network structure of the prediction model, the process in S503-A1 of inputting the (t-1)th adjacency matrix of the second product at the ith noise value and the second reactant into the prediction model corresponding to the ith noise value to obtain the (t-1)th gradient field information of the second product at the ith noise value.
In some embodiments, as shown in fig. 6A, the prediction model includes an encoding module and a decoding module, in which case S503-a1 includes:
S503-A11, acquiring an adjacency matrix and a node characteristic matrix of a second reactant;
S503-A12, as shown in FIG. 6A, inputting the adjacency matrix and the node feature matrix (the node feature matrix is also called as an atomic feature matrix) of the second reactant and the t-1 adjacent matrix of the second product under the ith noise value into the prediction model, so that the coding module processes the adjacent matrix and the node characteristic matrix of the second reactant and the t-1 adjacent matrix of the second product under the ith noise value to obtain the first characteristic information of the second reactant and the first characteristic information of the t-1 adjacent matrix of the second product, so that the decoding module processes the first characteristic information of the second reactant and the first characteristic information of the t-1 th adjacent matrix of the second product to obtain t-1 th gradient field information corresponding to the second product under the ith noise value.
In some embodiments, as shown in fig. 6B, the encoding module includes a first encoding sub-module and a second encoding sub-module. The first coding submodule is used for processing the adjacent matrix and the node characteristic matrix of the second reactant to obtain first characteristic information of the second reactant. The second coding submodule is used for processing the t-1 adjacent matrix of the second product under the ith noise value to obtain the first characteristic information of the t-1 adjacent matrix of the second product.
In some embodiments, as shown in fig. 6C, the first encoding submodule includes a first atomic feature extraction unit and a first molecular feature extraction unit,
the first atom feature extraction unit is used for processing the adjacency matrix and the node feature matrix of the second reactant to obtain the embedded representation of each atom in the second reactant.
The first molecular feature extraction unit is used for performing feature interaction on the embedded representation of each atom in the second reactant to obtain first feature information of the second reactant.
In some embodiments, and as shown with continued reference to fig. 6C, the second encoding submodule includes a second atomic feature extraction unit and a second molecular feature extraction unit,
the second atomic feature extraction unit is used for processing the t-1 th adjacent moment of the second product under the ith noise value to obtain the embedded representation of each atom corresponding to the t-1 th adjacent moment of the second product.
The second molecular feature extraction unit is used for performing feature interaction on the embedded representation of each atom corresponding to the (t-1)th adjacency matrix of the second product to obtain the first feature information of the (t-1)th adjacency matrix of the second product.
In some embodiments, the network structure of the first and second atomic feature extraction units includes M GNN layers as shown in fig. 3E, each of which determines an embedded representation of an atom according to the method shown in fig. 3F. For the specific process, reference is made to the description of the above training process embodiment, which is not described herein again.
In some embodiments, as shown in fig. 6D, the decoding module includes a feature extraction unit and a feature decoding unit;
the feature extraction unit is used for obtaining second characteristic information of the second reactant and second characteristic information of the t-1 th adjacent matrix of the second product according to the first characteristic information of the second reactant and the first characteristic information of the t-1 th adjacent matrix of the second product;
the characteristic decoding unit is used for obtaining t-1 gradient field information corresponding to the second product under the ith noise value according to the second characteristic information of the second reactant and the second characteristic information of the t-1 adjacent matrix of the second product.
In some embodiments, as shown in fig. 6E, the feature extraction unit includes a first feature extraction sub-unit and a second feature extraction sub-unit;
the first characteristic extraction subunit is used for obtaining second characteristic information of the second reactant according to the first characteristic information of the second reactant;
the second feature extraction subunit is used for obtaining second feature information of the t-1 th adjacent matrix of the second product according to the first feature information of the t-1 th adjacent matrix of the second product.
In one possible implementation manner, the first feature extraction subunit and the second feature extraction subunit are both transformer encoders, and the feature decoding unit is a transformer decoder.
Optionally, the first molecular feature extraction unit and the second molecular feature extraction unit are MLP layers. At this time, the correspondence between the noise value and the model parameter is the correspondence between the noise value and the parameter in the MLP layer.
It should be noted that the model for predicting the t-1 th gradient field information corresponding to the ith noise value of the second product is the prediction model obtained by the training method, and the specific network structure of the prediction model refers to the specific description in the training step and is not described herein again.
In a specific embodiment, the prediction model and the prediction process of the prediction model according to the embodiments of the present application are shown in fig. 6F. As shown in fig. 6F, S503-A1 includes: inputting the adjacency matrix and the atomic feature matrix of the second reactant into the first atomic feature extraction unit (a GNN encoder) for atomic feature extraction to obtain the embedded representation of each atom in the second reactant, where the adjacency matrix of the second reactant ∈ R^(N×N×C), the atomic feature matrix of the second reactant ∈ R^(N×F), and the GNN encoder includes M GNN layers; and inputting the (t-1)th adjacency matrix of the second product at the ith noise value into the second atomic feature extraction unit (a GNN encoder) for atomic feature extraction to obtain the embedded representation of each atom corresponding to the (t-1)th adjacency matrix of the second product. Then, the embedded representation of each atom in the second reactant is input into the first molecular feature extraction unit (an MLP) for molecular feature extraction to obtain the first feature information x_R of the second reactant, where x_R ∈ R^(N×N). The embedded representation of each atom corresponding to the (t-1)th adjacency matrix of the second product is input into the second molecular feature extraction unit (an MLP) for molecular feature extraction to obtain the first feature information x_P of the (t-1)th adjacency matrix of the second product at the ith noise value, where x_P ∈ R^(N×N). Then, the first feature information x_R of the second reactant is input into the first feature extraction subunit (a transformer encoder) to obtain the second feature information h_R of the second reactant, where h_R ∈ R^(N×N). The first feature information x_P of the (t-1)th adjacency matrix of the second product at the ith noise value is input into the second feature extraction subunit to obtain the second feature information h_P of the (t-1)th adjacency matrix of the second product at the ith noise value, where h_P ∈ R^(N×N). Then, the second feature information h_R of the second reactant and the second feature information h_P of the (t-1)th adjacency matrix of the second product at the ith noise value are input into the feature decoding unit (a transformer decoder) to obtain the (t-1)th gradient field information S ∈ R^(N×N) corresponding to the second product predicted by the prediction model at the ith noise value.
According to the above manner, after the (t-1)th gradient field information corresponding to the second product predicted by the prediction model at the ith noise value is obtained, the (t-1)th adjacency matrix of the second product at the ith noise value is updated by using the (t-1)th gradient field information corresponding to the second product at the ith noise value to obtain the tth adjacency matrix of the second product at the ith noise value, and the process is repeated until t is T. The Tth adjacency matrix of the second product at the ith noise value is then determined as the target adjacency matrix of the second product at the ith noise value.
The above steps are executed iteratively in sequence for each of the K noise values until the target adjacency matrix of the second product at the Kth noise value is determined, where the Kth noise value is the smallest of the K noise values. Next, the following S504 is executed.
And S504, predicting the second product according to the target adjacency matrix of the second product at the Kth noise value, where the Kth noise value is the minimum value of the K noise values.
For example, the product corresponding to the target adjacency matrix at the Kth noise value can be obtained by reverse derivation based on the connection relationship between atoms in the target adjacency matrix of the second product at the Kth noise value, and this product can be determined as the product of the second reactant, that is, the second product.
The product prediction method provided by the embodiment of the application predicts, through the prediction model, the gradient field information of the adjacency matrix of the second product corresponding to the second reactant, where the gradient field information represents the variation trend of the generation probability of the second product; the adjacency matrix of the second product is then sampled in this gradient field information to obtain the final adjacency matrix of the second product, thereby realizing accurate prediction of the second product corresponding to the second reactant. Moreover, the second reactant can be any type of reactant; that is, the embodiment of the present application can realize the prediction of the product of any type of reactant.
The preferred embodiments of the present application have been described in detail with reference to the accompanying drawings, however, the present application is not limited to the details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the technical idea of the present application, and these simple modifications are all within the protection scope of the present application. For example, the various features described in the foregoing detailed description may be combined in any suitable manner without contradiction, and various combinations that may be possible are not described in this application in order to avoid unnecessary repetition. For example, various embodiments of the present application may be arbitrarily combined with each other, and the same should be considered as the disclosure of the present application as long as the concept of the present application is not violated.
It should also be understood that, in the various method embodiments of the present application, the sequence numbers of the above-mentioned processes do not imply an execution sequence, and the execution sequence of the processes should be determined by their functions and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Method embodiments of the present application are described in detail above with reference to fig. 2 to 6, and apparatus embodiments of the present application are described in detail below with reference to fig. 7 to 9.
Fig. 7 is a schematic structural diagram of a training apparatus for a prediction model according to an embodiment of the present disclosure. The training apparatus 20 may be a computing device or a component of a computing device (e.g., an integrated circuit, a chip, etc.) for performing the model training method described above.
An acquisition unit 21 for acquiring a first reactant and a first product of the first reactant;
a noise adding unit 22, configured to perform noise adding processing on the first product to obtain first products under different noise values;
the prediction unit 23 is configured to input the first reactant and a first product at different noise values into a prediction model, and obtain gradient field information of an adjacency matrix of the first product output by the prediction model at different noise values;
the training unit 24 is configured to train the prediction model according to gradient field information of the adjacency matrix of the first product under different noise values, so as to obtain the trained prediction model, where the trained prediction model is used to predict gradient field information of the adjacency matrix of the product of a second reactant, and the gradient field information is used to indicate the variation trend of the generation probability of the product of the second reactant.
In some embodiments, the training unit 24 is specifically configured to determine a loss of the prediction model according to gradient field information of the adjacency matrix of the first product under different noise values; and adjusting parameters in the prediction model according to the loss of the prediction model to obtain the trained prediction model.
In some embodiments, the noise adding unit 22 is specifically configured to obtain an adjacency matrix of the first product; and perform noise processing on the adjacency matrix of the first product to obtain the adjacency matrix of the first product under different noise values;
correspondingly, the prediction unit 23 is specifically configured to obtain an adjacency matrix and a node feature matrix of the first reactant, and a node feature matrix of the first product; inputting the adjacency matrix and the node characteristic matrix of the first reactant, the node characteristic matrix of the first product and the adjacency matrix under different noise values into the prediction model to obtain gradient field information of the adjacency matrix of the first product under different noise values, which is output by the prediction model.
In some embodiments, the prediction model includes an encoding module and a decoding module, and the prediction unit 23 is specifically configured to, for the ith noise value in the different noise values, input the adjacency matrix and the node feature matrix of the first reactant, and the atomic feature matrix of the first product and the adjacency matrix at the ith noise value, into the encoding module, and obtain the first feature information of the first reactant and the first feature information of the first product at the ith noise value output by the encoding module;

and input the first feature information of the first reactant and the first feature information of the first product at the ith noise value into the decoding module to obtain the gradient field information of the adjacency matrix of the first product at the ith noise value output by the decoding module.
In some embodiments, the encoding module includes a first encoding submodule and a second encoding submodule, and the prediction unit 23 is specifically configured to input the adjacency matrix and the node feature matrix of the first reactant into the first encoding submodule to obtain first feature information of the first reactant; and inputting the atomic characteristic matrix of the first product and the adjacent matrix under the ith noise value into the second coding submodule to obtain first characteristic information of the first product under the ith noise value.
In some embodiments, the pth encoding submodule includes a pth atomic feature extraction unit and a pth molecular feature extraction unit, and the prediction unit 23 is further configured to input an adjacency matrix and a node feature matrix of the object into the pth atomic feature extraction unit, so as to obtain an embedded representation of each atom in the object; inputting the embedded representation of each atom in the target object into the p-th molecular feature extraction unit for feature interaction to obtain first feature information of the target object;
when p is 1, the pth encoding submodule is the first encoding submodule, and the adjacency matrix and the node feature matrix of the target object are the adjacency matrix and the node feature matrix of the first reactant; when p is 2, the pth encoding submodule is the second encoding submodule, and the adjacency matrix and the node feature matrix of the target object are the atomic feature matrix of the first product and the adjacency matrix at the ith noise value.
In some embodiments, the pth atomic feature extraction unit includes M graph neural network (GNN) layers, where M is a positive integer, and the prediction unit 23 is specifically configured to input the adjacency matrix and the node feature matrix of the target object into the pth atomic feature extraction unit to obtain the bond information of the jth atom in the target object extracted by the mth GNN layer, where m is a positive integer less than or equal to M; fuse the bond information of the jth atom with the (m-1)th embedded representation of the jth atom corresponding to the (m-1)th GNN layer to obtain the mth embedded representation of the jth atom corresponding to the mth GNN layer; and splice the embedded representations of the jth atom corresponding to each of the M GNN layers to obtain the embedded representation of the jth atom.
In some embodiments, the prediction unit 23 is specifically configured to input the adjacency matrix and the node feature matrix of the target object into the p-th atom feature extraction unit, and aggregate the information of the j-th atom and the information of the neighbor atoms of different bond types through the m-th GNN layer to obtain the bond information of the j-th atom extracted by the m-th GNN layer about different bond types.
In some embodiments, the decoding module includes a feature extraction unit and a feature decoding unit, and the prediction unit 23 is specifically configured to input first feature information of the first reactant and first feature information of the first product at an ith noise value into the feature extraction unit, so as to obtain second feature information of the first reactant and second feature information of the first product at the ith noise value; and inputting the second characteristic information of the first reactant and the second characteristic information of the first product under the ith noise value into the feature decoding unit to obtain the gradient field information of the adjacency matrix of the first product under the ith noise value.
In some embodiments, the feature extraction unit includes a first feature extraction subunit and a second feature extraction subunit, and the prediction unit 23 is specifically configured to input first feature information of the first reactant into the first feature extraction subunit to obtain second characteristic information of the first reactant; and inputting the first feature information of the first product under the ith noise value into the second feature extraction subunit to obtain the second feature information of the first product under the ith noise value.
In some embodiments, the first feature extraction subunit and the second feature extraction subunit are both transformer encoders, and the feature decoding unit is a transformer decoder.
In some embodiments, the noise adding unit 22 is specifically configured to add different noise values to the adjacency matrix of the first product by using a gaussian distribution noise adding manner, so as to obtain an adjacency matrix of the first product under different noise values.
In some embodiments, the training unit 24 is further configured to generate a correspondence between the noise values and the model parameters, where the correspondence includes the model parameters of the prediction model corresponding to each of the different noise values.
Optionally, the pth molecular feature extraction unit is MLP.
Optionally, the correspondence between the noise value and the model parameter is a correspondence between the noise value and a parameter in the MLP.
It is to be understood that apparatus embodiments and method embodiments may correspond to one another and that similar descriptions may refer to method embodiments. To avoid repetition, further description is omitted here. Specifically, the training device shown in fig. 7 may correspond to a corresponding main body in executing the model training method according to the embodiment of the present application, and the foregoing and other operations and/or functions of each module in the training device are respectively for implementing corresponding processes in each method in the above model training, and are not described herein again for brevity.
Fig. 8 is a schematic structural diagram of a product prediction apparatus according to an embodiment of the present application. The prediction apparatus 30 may be a computing device, or may be a component (e.g., an integrated circuit, a chip, etc.) of a computing device, for executing the product prediction method.
An obtaining unit 31, configured to obtain a second reactant and preset K noise values, where K is a positive integer less than or equal to L;
a determining unit 32, configured to determine, for an ith noise value of the K noise values, a prediction model corresponding to the ith noise value, where the prediction model is obtained through the training method, and i is a positive integer from 1 to K;
a sampling unit 33, configured to sample gradient field information predicted by a prediction model corresponding to an i-th noise value according to the second reactant and a target adjacency matrix of a second product at the i-1 th noise value, to obtain a target adjacency matrix of the second product at the i-th noise value, where the second product is a product of the second reactant;
a prediction unit 34, configured to determine the second product according to a target adjacency matrix of the second product under a K-th noise value, where the K-th noise value is a minimum value of the K noise values.
In some embodiments, the determining unit 32 is specifically configured to obtain a correspondence between noise values and model parameters, where the correspondence includes a model parameter of the prediction model corresponding to each noise value in different noise values; inquiring an ith group of model parameters corresponding to the ith noise value from the corresponding relation according to the ith noise value; taking the ith group of model parameters as parameters of the prediction model to obtain a prediction model corresponding to the ith noise value;
the sampling unit 33 is specifically configured to sample the gradient field information predicted by the prediction model corresponding to the ith noise value according to the second reactant and the target adjacency matrix of the second product at the (i-1)th noise value, so as to obtain the target adjacency matrix of the second product at the ith noise value.
In some embodiments, the sampling unit 33 is specifically configured to input the (t-1)th adjacency matrix of the second product at the ith noise value and the second reactant into the prediction model corresponding to the ith noise value, so as to obtain the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, where t is a positive integer less than or equal to T, and when t is 1, the (t-1)th adjacency matrix of the second product at the ith noise value is the target adjacency matrix of the second product at the (i-1)th noise value; update the (t-1)th adjacency matrix of the second product at the ith noise value by using the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value to obtain the tth adjacency matrix of the second product at the ith noise value, and repeat the operation until t is T; and determine the Tth adjacency matrix of the second product at the ith noise value as the target adjacency matrix of the second product at the ith noise value.
In some embodiments, the sampling unit 33 is specifically configured to determine, according to the K noise values and the ith noise value, a size of an update step corresponding to the ith noise value; determining a noise adding value corresponding to the t-th adjacency matrix; and determining the t adjacent matrix of the second product under the ith noise value according to t-1 gradient field information corresponding to the adjacent matrix of the second product under the ith noise value, the t-1 adjacent matrix of the second product under the ith noise value, the size of the updating step corresponding to the ith noise value and the noise adding value corresponding to the t adjacent matrix.
In some embodiments, the prediction model includes an encoding module and a decoding module, and the sampling unit 33 is specifically configured to obtain an adjacency matrix and a node feature matrix of the second reactant; inputting the adjacency matrix and the node characteristic matrix of the second reactant and the t-1 adjacent matrix of the second product under the ith noise value into the prediction model, so that the coding module processes the adjacency matrix and the node characteristic matrix of the second reactant and the t-1 adjacent matrix of the second product under the ith noise value to obtain the first characteristic information of the second reactant and the first characteristic information of the t-1 adjacent matrix of the second product, so that the decoding module processes the first characteristic information of the second reactant and the first characteristic information of the t-1 th adjacent matrix of the second product to obtain the t-1 th gradient field information of the adjacent matrix of the second product under the ith noise value.
In some embodiments, the encoding module comprises a first encoding sub-module and a second encoding sub-module;
the first coding submodule is used for processing the adjacent matrix and the node characteristic matrix of the second reactant to obtain first characteristic information of the second reactant;
the second coding submodule is used for processing the t-1 adjacent matrix of the second product under the ith noise value to obtain the first characteristic information of the t-1 adjacent matrix of the second product.
In some embodiments, the first encoding submodule comprises a first atomic feature extraction unit and a first molecular feature extraction unit;
the first atom feature extraction unit is used for processing the adjacency matrix and the node feature matrix of the second reactant to obtain the embedded representation of each atom in the second reactant;
the first molecular feature extraction unit is used for performing feature interaction on the embedded representation of each atom in the second reactant to obtain first feature information of the second reactant.
In some embodiments, the second encoding submodule comprises a second atomic feature extraction unit and a second molecular feature extraction unit;
the second atomic feature extraction unit is used for processing the (t-1)th adjacency matrix of the second product at the ith noise value to obtain an embedded representation of each atom corresponding to the (t-1)th adjacency matrix of the second product;
the second molecular feature extraction unit is used for performing feature interaction on the embedded representation of each atom corresponding to the (t-1)th adjacency matrix of the second product to obtain the first feature information of the (t-1)th adjacency matrix of the second product.
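To make the encoder structure concrete, here is a minimal sketch of one encoding submodule, assuming a plain GCN-style atomic feature extraction unit and self-attention for the molecular feature interaction; all layer sizes, class names, and parameter names are illustrative assumptions, not the patent's implementation:

```python
import torch
import torch.nn as nn

class EncodingSubmodule(nn.Module):
    """Sketch of one encoding submodule: an atomic feature extraction
    unit (a small GCN-style stack) followed by a molecular feature
    extraction unit (self-attention). Sizes are illustrative."""

    def __init__(self, node_dim, hidden_dim=128, gnn_layers=3, heads=4):
        super().__init__()
        self.input_proj = nn.Linear(node_dim, hidden_dim)
        self.gnn = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(gnn_layers))
        self.interact = nn.MultiheadAttention(hidden_dim, heads, batch_first=True)

    def forward(self, adj, node_feat):
        # adj: (batch, n_atoms, n_atoms); node_feat: (batch, n_atoms, node_dim)
        h = self.input_proj(node_feat)
        # Row-normalized adjacency with self-loops, as in a basic GCN.
        a = adj + torch.eye(adj.size(-1), device=adj.device)
        a = a / a.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        for layer in self.gnn:
            # Aggregate neighbor information, then transform: this yields
            # the embedded representation of each atom.
            h = torch.relu(layer(a @ h))
        # Feature interaction among the per-atom embedded representations.
        out, _ = self.interact(h, h, h)
        return out  # first feature information: one vector per atom
```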
In some embodiments, the decoding module comprises a feature extraction unit and a feature decoding unit;
the feature extraction unit is used for obtaining second feature information of the second reactant and second feature information of the (t-1)th adjacency matrix of the second product according to the first feature information of the second reactant and the first feature information of the (t-1)th adjacency matrix of the second product, respectively;
the feature decoding unit is used for obtaining the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value according to the second feature information of the second reactant and the second feature information of the (t-1)th adjacency matrix of the second product.
In some embodiments, the feature extraction unit comprises a first feature extraction sub-unit and a second feature extraction sub-unit;
the first feature extraction subunit is configured to obtain the second feature information of the second reactant according to the first feature information of the second reactant;
the second feature extraction subunit is configured to obtain the second feature information of the (t-1)th adjacency matrix of the second product according to the first feature information of the (t-1)th adjacency matrix of the second product.
In some embodiments, the first feature extraction subunit and the second feature extraction subunit are both Transformer encoders, and the feature decoding unit is a Transformer decoder.
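A minimal sketch of such a decoding module, assuming standard PyTorch Transformer layers and a pairwise read-out that turns per-atom features into a score over the adjacency matrix; the dimensions, names, and the read-out itself are illustrative assumptions:

```python
import torch.nn as nn

class DecodingModule(nn.Module):
    """Sketch of the decoding module: two Transformer-encoder feature
    extraction subunits and one Transformer-decoder feature decoding
    unit. All dimensions and the pairwise read-out are assumptions."""

    def __init__(self, dim=128, heads=4, layers=2):
        super().__init__()

        def encoder():
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            return nn.TransformerEncoder(layer, layers)

        self.reactant_extractor = encoder()   # first feature extraction subunit
        self.product_extractor = encoder()    # second feature extraction subunit
        dec_layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.feature_decoder = nn.TransformerDecoder(dec_layer, layers)
        self.score_head = nn.Linear(dim, 1)

    def forward(self, reactant_feat, product_feat):
        # Second feature information of the reactant and of the noisy
        # product adjacency matrix (per-atom vectors).
        r = self.reactant_extractor(reactant_feat)
        p = self.product_extractor(product_feat)
        # Feature decoding unit: cross-attend product features to the reactant.
        h = self.feature_decoder(p, r)              # (batch, n_atoms, dim)
        # Pairwise combination yields one score per atom pair, i.e. the
        # gradient field information over the adjacency matrix.
        pair = h.unsqueeze(2) + h.unsqueeze(1)      # (batch, n, n, dim)
        return self.score_head(pair).squeeze(-1)    # (batch, n, n)
```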
Optionally, if i is 1, each element in the target adjacency matrix of the second product at the (i-1)th noise value follows a first normal distribution.
Optionally, the variance of the first normal distribution is a positive number less than or equal to 3.
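Putting the pieces together, the prediction of the second product can be sketched as annealed sampling over the K preset noise values, initialized from the normal distribution just described. The sketch below reuses langevin_update from the earlier snippet; model_for, the reactant encoding, and the final rounding to bond orders are all illustrative assumptions:

```python
import torch

def sample_product(model_for, reactant, sigmas, T=20, init_var=1.0):
    """Annealed sampling sketch over K preset, decreasing noise values.

    Assumptions: model_for(sigma) returns the prediction model loaded
    with the parameters corresponding to that noise value; reactant is
    a dict holding the reactant's adjacency and node feature matrices;
    langevin_update is the helper sketched earlier.
    """
    n = reactant["adj"].size(-1)
    # i == 1 case: every element of the initial target adjacency matrix
    # follows a normal distribution whose variance is at most 3.
    adj = torch.randn(1, n, n) * init_var ** 0.5
    sigma_K = sigmas[-1]                    # the smallest noise value
    for sigma_i in sigmas:                  # i = 1 .. K
        model = model_for(sigma_i)
        for _ in range(T):                  # t = 1 .. T inner updates
            score = model(reactant, adj)    # gradient field information
            adj = langevin_update(adj, score, sigma_i, sigma_K)
    # The target adjacency matrix at the K-th noise value is rounded to
    # discrete bond orders to read off the predicted second product.
    return adj.round().clamp(min=0)
```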
It should be understood that the apparatus embodiments and the method embodiments may correspond to one another, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the prediction apparatus shown in Fig. 8 may correspond to the corresponding execution body in the prediction method of the embodiments of the present application, and the foregoing and other operations and/or functions of the modules in the prediction apparatus respectively implement the corresponding flows in the prediction method; for brevity, they are not described here again.
The apparatus of the embodiments of the present application is described above in connection with the drawings from the perspective of functional modules. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in a processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.
Fig. 9 is a block diagram of a computing device according to an embodiment of the present application. The computing device is configured to execute the method of the foregoing embodiments; for details, refer to the description in the foregoing method embodiments.
The computing device 200 shown in Fig. 9 includes a memory 201, a processor 202, and a communication interface 203, which are communicatively connected to one another, for example, through a network connection. Alternatively, the computing device 200 may further include a bus 204, in which case the memory 201, the processor 202, and the communication interface 203 are connected to one another through the bus 204, as depicted in Fig. 9.
The memory 201 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 201 may store a program; when the program stored in the memory 201 is executed by the processor 202, the processor 202 and the communication interface 203 are configured to perform the methods described above.
The processor 202 may be implemented as a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits.
The processor 202 may also be an integrated circuit chip with signal processing capabilities. In implementation, the method of the present application may be performed by integrated logic circuits of hardware in the processor 202 or by instructions in the form of software. The processor 202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 201, and the processor 202 reads the information in the memory 201 and completes the method of the embodiments of the present application in combination with its hardware.
The communication interface 203 enables communication between the computing device 200 and other devices or communication networks using transceiver modules such as, but not limited to, transceivers. For example, the data set may be acquired through the communication interface 203.
When computing device 200 includes bus 204, as described above, bus 204 may include a pathway to transfer information between various components of computing device 200 (e.g., memory 201, processor 202, communication interface 203).
There is also provided according to the present application a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, the present application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method of the above method embodiment.
There is also provided according to the present application a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of the above-described method embodiment.
In other words, when implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a logical function division, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or modules may be electrical, mechanical, or in other forms.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In addition, the method embodiments and the apparatus embodiments may refer to each other, and the same or corresponding content in different embodiments may be cross-referenced without being described in detail.

Claims (20)

1. A method of model training, comprising:
obtaining a first reactant, and a first product of the first reactant;
performing noise adding processing on the first product to obtain the first product under different noise values;
inputting the first reactant and the first product under different noise values into a prediction model to obtain gradient field information, output by the prediction model, of an adjacency matrix of the first product under different noise values;
and training the prediction model according to gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model, wherein the trained prediction model is used for predicting the gradient field information of the adjacency matrix of the product of the second reactant, and the gradient field information is used for indicating the variation trend of the generation probability of the product of the second reactant.
2. The method according to claim 1, wherein the training the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model comprises:
determining the loss of the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values;
and adjusting parameters in the prediction model according to the loss of the prediction model to obtain the trained prediction model.
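For illustration only: claims 1 and 2 specify that the loss is determined from the gradient field information at different noise values but do not spell out its form. The sketch below assumes a standard denoising score matching objective of the kind used by noise-conditional score networks; the model signature and the weighting are assumptions:

```python
import torch

def dsm_loss(model, reactant, product_adj, sigmas):
    """Assumed denoising score matching loss for one training pair.

    The claims only state that the loss is determined from the gradient
    field information at different noise values; the per-noise-value
    objective below is one standard choice, not the patent's exact loss.
    """
    losses = []
    for sigma in sigmas:
        # Noise adding processing on the first product's adjacency matrix.
        noise = torch.randn_like(product_adj) * sigma
        noisy_adj = product_adj + noise
        # The prediction model outputs gradient field information (a score).
        score = model(reactant, noisy_adj, sigma)   # assumed signature
        # For Gaussian perturbation the score target is -noise / sigma^2.
        target = -noise / sigma ** 2
        losses.append(sigma ** 2 * ((score - target) ** 2).mean())
    # Parameters are then adjusted by gradient descent on this loss.
    return torch.stack(losses).mean()
```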
3. The method of claim 2, wherein the performing noise adding processing on the first product to obtain the first product under different noise values comprises:
acquiring an adjacency matrix of the first product;
performing noise adding processing on the adjacency matrix of the first product to obtain adjacency matrices of the first product under different noise values;
and the inputting the first reactant and the first product under different noise values into the prediction model to obtain the gradient field information, output by the prediction model, of the adjacency matrix of the first product under different noise values comprises:
acquiring an adjacency matrix and a node feature matrix of the first reactant, and a node feature matrix of the first product;
inputting the adjacency matrix and the node feature matrix of the first reactant, the node feature matrix of the first product, and the adjacency matrices under different noise values into the prediction model to obtain the gradient field information, output by the prediction model, of the adjacency matrix of the first product under different noise values.
4. The method of claim 3, wherein the prediction model comprises an encoding module and a decoding module, and the inputting the adjacency matrix and the node feature matrix of the first reactant, the node feature matrix of the first product, and the adjacency matrices under different noise values into the prediction model to obtain the gradient field information, output by the prediction model, of the adjacency matrix of the first product under different noise values comprises:
for an ith noise value among the different noise values, inputting the adjacency matrix and the node feature matrix of the first reactant, and the node feature matrix of the first product and the adjacency matrix at the ith noise value, into the encoding module to obtain first feature information of the first reactant and first feature information of the first product at the ith noise value, which are output by the encoding module;
and inputting the first feature information of the first reactant and the first feature information of the first product at the ith noise value into the decoding module to obtain the gradient field information, output by the decoding module, of the adjacency matrix of the first product at the ith noise value.
5. The method according to claim 4, wherein the encoding module comprises a first encoding submodule and a second encoding submodule, and the inputting the adjacency matrix and the node feature matrix of the first reactant, and the node feature matrix of the first product and the adjacency matrix at the ith noise value, into the encoding module to obtain the first feature information of the first reactant and the first feature information of the first product at the ith noise value output by the encoding module comprises:
inputting the adjacency matrix and the node feature matrix of the first reactant into the first encoding submodule to obtain the first feature information of the first reactant;
and inputting the node feature matrix of the first product and the adjacency matrix at the ith noise value into the second encoding submodule to obtain the first feature information of the first product at the ith noise value.
6. The method of claim 5, wherein a p-th encoding submodule comprises a p-th atomic feature extraction unit and a p-th molecular feature extraction unit, the method further comprising:
inputting an adjacency matrix and a node feature matrix of a target object into the p-th atomic feature extraction unit to obtain an embedded representation of each atom in the target object;
inputting the embedded representation of each atom in the target object into the p-th molecular feature extraction unit for feature interaction to obtain first feature information of the target object;
wherein when p is 1, the p-th encoding submodule is the first encoding submodule, and the adjacency matrix and the node feature matrix of the target object are the adjacency matrix and the node feature matrix of the first reactant; and when p is 2, the p-th encoding submodule is the second encoding submodule, and the adjacency matrix and the node feature matrix of the target object are the node feature matrix of the first product and the adjacency matrix at the ith noise value.
7. The method according to claim 6, wherein the p-th atomic feature extraction unit comprises M graph neural network (GNN) layers, M being a positive integer, and the inputting the adjacency matrix and the node feature matrix of the target object into the p-th atomic feature extraction unit to obtain the embedded representation of each atom in the target object comprises:
inputting the adjacency matrix and the node feature matrix of the target object into the p-th atomic feature extraction unit to obtain connecting bond information of a j-th atom in the target object extracted by an m-th GNN layer, wherein m is a positive integer less than or equal to M;
fusing the connecting bond information of the j-th atom with an (m-1)-th embedded representation of the j-th atom corresponding to an (m-1)-th GNN layer to obtain an m-th embedded representation of the j-th atom corresponding to the m-th GNN layer;
and splicing the embedded representations of the j-th atom corresponding to the M GNN layers to obtain the embedded representation of the j-th atom.
8. The method according to claim 7, wherein the inputting the adjacency matrix and the node feature matrix of the target object into the p-th atomic feature extraction unit to obtain the connecting bond information of the j-th atom in the target object extracted by the m-th GNN layer comprises:
inputting the adjacency matrix and the node feature matrix of the target object into the p-th atomic feature extraction unit, and aggregating, through the m-th GNN layer, information of the j-th atom and information of its neighbor atoms under different connecting bond types to obtain the connecting bond information of the j-th atom extracted by the m-th GNN layer, wherein the connecting bond information is related to the different connecting bond types.
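Claim 8's aggregation over neighbor atoms of different connecting bond types can be sketched as a relational, bond-type-conditioned GNN layer. This is one plausible reading, not the patent's exact layer; the number of bond types and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BondTypeGNNLayer(nn.Module):
    """One GNN layer that aggregates neighbor information separately per
    connecting bond type (a relational-GCN-style reading of claim 8; the
    number of bond types and all names are illustrative assumptions)."""

    def __init__(self, dim, n_bond_types=4):
        super().__init__()
        self.per_type = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_bond_types))
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, adj, h):
        # adj: (batch, n_bond_types, n_atoms, n_atoms), one 0/1 slice per
        #      connecting bond type; h: (batch, n_atoms, dim) embeddings.
        out = self.self_loop(h)             # the j-th atom's own information
        for b, lin in enumerate(self.per_type):
            # Connecting bond information contributed by neighbors of type b.
            out = out + adj[:, b] @ lin(h)
        return torch.relu(out)
```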
9. The method according to claim 4, wherein the decoding module comprises a feature extraction unit and a feature decoding unit, and the inputting the first feature information of the first reactant and the first feature information of the first product at the ith noise value into the decoding module to obtain the gradient field information, output by the decoding module, of the adjacency matrix of the first product at the ith noise value comprises:
inputting the first feature information of the first reactant and the first feature information of the first product at the ith noise value into the feature extraction unit to obtain second feature information of the first reactant and second feature information of the first product at the ith noise value;
and inputting the second feature information of the first reactant and the second feature information of the first product at the ith noise value into the feature decoding unit to obtain the gradient field information of the adjacency matrix of the first product at the ith noise value.
10. The method according to claim 9, wherein the feature extraction unit comprises a first feature extraction subunit and a second feature extraction subunit, and the inputting the first feature information of the first reactant and the first feature information of the first product at the ith noise value into the feature extraction unit to obtain the second feature information of the first reactant and the second feature information of the first product at the ith noise value comprises:
inputting the first feature information of the first reactant into the first feature extraction subunit to obtain the second feature information of the first reactant;
and inputting the first feature information of the first product at the ith noise value into the second feature extraction subunit to obtain the second feature information of the first product at the ith noise value.
11. The method according to any one of claims 1-10, further comprising:
generating a correspondence between noise values and model parameters, wherein the correspondence comprises model parameters of the prediction model corresponding to each of the different noise values.
12. A method for predicting a product, comprising:
obtaining a second reactant to be predicted and K preset noise values, wherein K is a positive integer less than or equal to L;
determining, for the ith noise value among the K noise values, a prediction model corresponding to the ith noise value, wherein the prediction model is obtained by training according to the training method of any one of claims 1-11, and i is a positive integer from 1 to K;
sampling from gradient field information predicted by the prediction model corresponding to the ith noise value according to the second reactant and a target adjacency matrix of a second product at the (i-1)th noise value, to obtain a target adjacency matrix of the second product at the ith noise value, wherein the second product is a product of the second reactant;
and determining the second product according to a target adjacency matrix of the second product under a K-th noise value, wherein the K-th noise value is the minimum value of the K noise values.
13. The method of claim 12, wherein determining the predictive model for the i-th noise value comprises:
acquiring a correspondence between noise values and model parameters, wherein the correspondence comprises model parameters of the prediction model corresponding to each of the different noise values;
querying, according to the ith noise value, the ith group of model parameters corresponding to the ith noise value from the correspondence;
and using the ith group of model parameters as the parameters of the prediction model to obtain the prediction model corresponding to the ith noise value.
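Claims 11 and 13 maintain a correspondence between noise values and model parameters: one group of parameters is stored per noise value, and the ith group is loaded on demand. A minimal sketch, assuming the groups are kept as PyTorch state dicts keyed by noise value (class and method names are illustrative):

```python
class NoiseConditionedParams:
    """Sketch of the correspondence between noise values and model
    parameters: each noise value keys one group of parameters, and the
    i-th group is loaded into the prediction model on demand.
    Class and method names are illustrative assumptions."""

    def __init__(self):
        self.table = {}  # noise value -> group of model parameters

    def store(self, sigma, model):
        # Record the model parameters corresponding to this noise value.
        self.table[round(float(sigma), 8)] = {
            k: v.detach().clone() for k, v in model.state_dict().items()}

    def load_into(self, sigma, model):
        # Query the i-th group of model parameters for the i-th noise
        # value and use them as the parameters of the prediction model.
        model.load_state_dict(self.table[round(float(sigma), 8)])
        return model
```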
14. The method of claim 13, wherein the sampling from the gradient field information predicted by the prediction model corresponding to the ith noise value according to the second reactant and the target adjacency matrix of the second product at the (i-1)th noise value to obtain the target adjacency matrix of the second product at the ith noise value comprises:
inputting the (t-1)th adjacency matrix of the second product at the ith noise value and the second reactant into the prediction model corresponding to the ith noise value to obtain the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, wherein t is a positive integer less than or equal to T, and when t is 1, the (t-1)th adjacency matrix of the second product at the ith noise value is the target adjacency matrix of the second product at the (i-1)th noise value;
updating the (t-1)th adjacency matrix of the second product at the ith noise value by using the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value to obtain the t-th adjacency matrix of the second product at the ith noise value, and repeating this operation until t is equal to T;
and determining the T-th adjacency matrix of the second product at the ith noise value as the target adjacency matrix of the second product at the ith noise value.
15. The method of claim 14, wherein the updating the (t-1)th adjacency matrix of the second product at the ith noise value by using the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value to obtain the t-th adjacency matrix of the second product at the ith noise value comprises:
determining, according to the K noise values and the ith noise value, the size of the update step corresponding to the ith noise value;
determining the noise adding value corresponding to the t-th adjacency matrix;
and determining the t-th adjacency matrix of the second product at the ith noise value according to the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value, the (t-1)th adjacency matrix of the second product at the ith noise value, the size of the update step corresponding to the ith noise value, and the noise adding value corresponding to the t-th adjacency matrix.
16. The method of claim 14, wherein the prediction model comprises an encoding module and a decoding module, and the inputting the (t-1)th adjacency matrix of the second product at the ith noise value and the second reactant into the prediction model corresponding to the ith noise value to obtain the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value comprises:
acquiring an adjacency matrix and a node feature matrix of the second reactant;
and inputting the adjacency matrix and the node feature matrix of the second reactant and the (t-1)th adjacency matrix of the second product at the ith noise value into the prediction model, so that the encoding module processes the adjacency matrix and the node feature matrix of the second reactant and the (t-1)th adjacency matrix of the second product at the ith noise value to obtain first feature information of the second reactant and first feature information of the (t-1)th adjacency matrix of the second product, and the decoding module processes the first feature information of the second reactant and the first feature information of the (t-1)th adjacency matrix of the second product to obtain the (t-1)th gradient field information of the adjacency matrix of the second product at the ith noise value.
17. A model training apparatus, comprising:
an acquisition unit for acquiring a first reactant and a first product of the first reactant;
the noise adding unit is used for adding noise to the first product to obtain the first product under different noise values;
the prediction unit is used for inputting the first reactant and the first product under different noise values into a prediction model to obtain gradient field information, output by the prediction model, of an adjacency matrix of the first product under different noise values;
and the training unit is used for training the prediction model according to the gradient field information of the adjacency matrix of the first product under different noise values to obtain the trained prediction model, wherein the trained prediction model is used for predicting gradient field information of an adjacency matrix of a product of a second reactant, and the gradient field information is used for indicating a variation trend of a generation probability of the product of the second reactant.
18. A product prediction device, characterized by comprising:
the acquisition unit is used for acquiring a second reactant and preset K noise values;
the determining unit is used for determining, for the ith noise value among the K noise values, a prediction model corresponding to the ith noise value, wherein the prediction model is obtained by training according to the foregoing training method, and i is a positive integer from 1 to K;
the sampling unit is used for sampling from gradient field information predicted by the prediction model corresponding to the ith noise value according to the second reactant and a target adjacency matrix of a second product at the (i-1)th noise value, to obtain a target adjacency matrix of the second product at the ith noise value, wherein the second product is a product of the second reactant;
and the prediction unit is used for determining the second product according to a target adjacency matrix of the second product under a K-th noise value, wherein the K-th noise value is the minimum value of the K noise values.
19. A computing device, comprising: a processor and a memory;
the memory for storing a computer program;
the processor for executing the computer program to implement the method of any one of claims 1 to 11 or 12 to 16.
20. A computer-readable storage medium, characterized in that the storage medium comprises computer instructions which, when executed by a computer, cause the computer to carry out the method according to any one of claims 1 to 11 or 12 to 16.
CN202111000478.0A 2021-08-27 2021-08-27 Method and device for model training and product prediction Pending CN114464267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111000478.0A CN114464267A (en) 2021-08-27 2021-08-27 Method and device for model training and product prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111000478.0A CN114464267A (en) 2021-08-27 2021-08-27 Method and device for model training and product prediction

Publications (1)

Publication Number Publication Date
CN114464267A true CN114464267A (en) 2022-05-10

Family

ID=81406107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111000478.0A Pending CN114464267A (en) 2021-08-27 2021-08-27 Method and device for model training and product prediction

Country Status (1)

Country Link
CN (1) CN114464267A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024011983A1 (en) * 2022-07-14 2024-01-18 腾讯科技(深圳)有限公司 Reaction product prediction model training method and apparatus, application method and apparatus, and device

Similar Documents

Publication Publication Date Title
Pinaya et al. Autoencoders
CN112487182B (en) Training method of text processing model, text processing method and device
KR102071582B1 (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN111755078B (en) Drug molecule attribute determination method, device and storage medium
Dai et al. Incremental learning using a grow-and-prune paradigm with efficient neural networks
CN116415654A (en) Data processing method and related equipment
CN109710915A (en) Repeat sentence generation method and device
CN112883149B (en) Natural language processing method and device
Xia et al. Fully dynamic inference with deep neural networks
WO2021238333A1 (en) Text processing network, neural network training method, and related device
CN114860893B (en) Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning
EP4107668A1 (en) Adversarial autoencoder architecture for methods of graph to sequence models
WO2021057884A1 (en) Sentence paraphrasing method, and method and apparatus for training sentence paraphrasing model
WO2020211611A1 (en) Method and device for generating hidden state in recurrent neural network for language processing
CN112347756A (en) Reasoning reading understanding method and system based on serialized evidence extraction
CN113764037A (en) Method and device for model training, antibody modification and binding site prediction
CN110704668A (en) Grid-based collaborative attention VQA method and apparatus
CN114464267A (en) Method and device for model training and product prediction
CN108475346A (en) Neural random access machine
WO2022125181A1 (en) Recurrent neural network architectures based on synaptic connectivity graphs
CN116821113A (en) Time sequence data missing value processing method and device, computer equipment and storage medium
CN116303961A (en) Knowledge graph question-answering method and system based on multi-hop combined graph fingerprint network
KR102388215B1 (en) Apparatus and method for predicting drug-target interaction using deep neural network model based on self-attention
CN114332469A (en) Model training method, device, equipment and storage medium
CN115206421A (en) Drug repositioning method, and repositioning model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40070820

Country of ref document: HK