CN114822721A - Molecular diagram generation method and device - Google Patents


Info

Publication number
CN114822721A
CN114822721A (application CN202210554539.6A)
Authority
CN
China
Prior art keywords
atom
candidate
sequence
chemical bond
prediction
Prior art date
Legal status
Pending
Application number
CN202210554539.6A
Other languages
Chinese (zh)
Inventor
陈致远
方晓敏
王凡
何径舟
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202210554539.6A
Publication: CN114822721A
Legal status: Pending

Classifications

    • G - PHYSICS
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 - Computing arrangements based on biological models
                    • G06N 3/02 - Neural networks
                        • G06N 3/04 - Architecture, e.g. interconnection topology
                        • G06N 3/08 - Learning methods
        • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16C - COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
                • G16C 20/00 - Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
                    • G16C 20/30 - Prediction of properties of chemical compounds, compositions or mixtures
                    • G16C 20/50 - Molecular design, e.g. of drugs
                    • G16C 20/80 - Data visualisation


Abstract

The disclosure provides a molecular diagram generation method and device, relating to the field of computer technology and, in particular, to the fields of biological computing and artificial intelligence. The specific implementation scheme is as follows: first, a molecular diagram generation model is obtained, the model comprising an atom prediction submodel and a chemical bond prediction submodel; next, a start symbol is input into the atom prediction submodel for atom prediction, determining a candidate atom sequence; finally, the candidate atom sequence is input into the chemical bond prediction submodel for chemical bond prediction, generating the molecular diagram corresponding to the candidate atom sequence. Because generation is completed in two steps, generating the candidate atom sequence and then generating the chemical bonds between the candidate atoms, the quality of the generated molecular diagram can be effectively improved, the difficulty of generation is reduced, and operation efficiency is improved.

Description

Molecular diagram generation method and device
Technical Field
The present disclosure relates to the field of computer technology, in particular to the fields of biological computing and artificial intelligence, and more particularly to a method and an apparatus for generating a molecular diagram.
Background
In chemistry and drug discovery, screening molecules with drug-like properties from an enormous chemical space is a central problem in drug design. Current virtual screening methods consume large amounts of computing resources and time, and struggle to produce novel molecular structures. De novo drug design can narrow the target chemical molecule search space, greatly improve the efficiency of drug design, and reduce its cost.
With the continuous development of artificial intelligence, a large number of molecular generation models have emerged and become effective tools for de novo drug design. In their architecture design, existing molecular generation models typically generate molecules by sampling from a latent space.
Disclosure of Invention
The present disclosure provides a molecular graph generation method, apparatus, electronic device, storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided a molecular graph generation method, including: obtaining a molecular diagram generation model, wherein the molecular diagram generation model comprises an atom prediction submodel and a chemical bond prediction submodel; inputting a start symbol into the atom prediction submodel for atom prediction, and determining a candidate atom sequence; and inputting the candidate atom sequence into the chemical bond prediction submodel for chemical bond prediction, to generate a molecular diagram corresponding to the candidate atom sequence.
According to another aspect of the present disclosure, there is provided a molecular graph generating apparatus, including: an obtaining module configured to obtain a molecular diagram generation model, wherein the molecular diagram generation model comprises an atom prediction submodel and a chemical bond prediction submodel; an atom prediction module configured to input a start symbol into the atom prediction submodel for atom prediction and determine a candidate atom sequence; and a chemical bond prediction module configured to input the candidate atom sequence into the chemical bond prediction submodel for chemical bond prediction and generate a molecular diagram corresponding to the candidate atom sequence.
According to another aspect of the present disclosure, there is provided an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the molecular diagram generation method described above.
According to another aspect of the present disclosure, there is provided a computer-readable medium storing computer instructions for causing a computer to execute the molecular diagram generation method described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the molecular diagram generation method described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of one embodiment of a molecular diagram generation method according to the present disclosure;
FIG. 2 is a schematic diagram of one application scenario of a molecular diagram generation method according to the present disclosure;
FIG. 3 is a flow diagram for one embodiment of determining candidate atomic sequences, according to the present disclosure;
FIG. 4 is a flow diagram for one embodiment of generating a molecular graph corresponding to a candidate atomic sequence, according to the present disclosure;
FIG. 5 is a flow diagram for one embodiment of optimizing a molecular graph according to the present disclosure;
FIG. 6 is a flow diagram for one embodiment of obtaining an atomic predictor model according to the present disclosure;
FIG. 7 is a flow diagram of one embodiment of obtaining a chemical bond predictor model according to the present disclosure;
FIG. 8 is a schematic structural diagram of one embodiment of a molecular diagram generation apparatus according to the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a molecular diagram generation method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, fig. 1 shows a flow 100 of an embodiment of a molecular diagram generation method according to the present disclosure. The molecular diagram generation method comprises the following steps:
Step 110: obtain a molecular diagram generation model.
In this embodiment, the execution subject of the molecular diagram generation method (for example, a server) may read, over a network or locally, a molecular diagram generation model for generating molecular diagrams. A molecular diagram here may be a compound molecular diagram composed of a number of atoms and the chemical bonds between them. The molecular diagram generation model may be used for an untargeted, randomly sampled molecule generation task: it samples and predicts atoms and chemical bonds to generate random molecular diagrams, so that a wide variety of molecular diagrams can be obtained.
The molecular diagram generation model comprises an atom prediction submodel and a chemical bond prediction submodel, and may be obtained by training based on a natural-language pre-training model (UniLM). The atom prediction submodel may be connected to the chemical bond prediction submodel, so that the output of the former can be fed directly into the latter. The atom prediction submodel may be a neural network containing a Transformer encoder; it performs atom prediction to obtain a random candidate atom sequence and passes that sequence to the chemical bond prediction submodel. The chemical bond prediction submodel may be a neural network containing a Transformer encoder and an attention mechanism; it predicts the chemical bonds of each candidate atom in the candidate atom sequence, and its attention mechanism can take into account the relationship between the currently generated chemical bond and previously generated chemical bonds.
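The two-submodel structure described above can be sketched in outline as follows. This is a toy, untrained illustration: the class names (AtomPredictionSubmodel, ChemicalBondPredictionSubmodel, MolecularGraphGenerationModel) and the sampling and bonding rules are hypothetical stand-ins for the Transformer-based submodels, not the patented implementation.

```python
import random

START, END = "<cls>", "<sp>"

class AtomPredictionSubmodel:
    """Emits candidate atoms one by one until the end symbol is sampled."""
    def __init__(self, vocab=("C", "N", "O"), max_len=8, seed=0):
        self.vocab = vocab
        self.max_len = max_len
        self.rng = random.Random(seed)

    def predict(self, start_symbol):
        assert start_symbol == START  # the start symbol triggers prediction
        atoms = []
        while len(atoms) < self.max_len:
            token = self.rng.choice(self.vocab + (END,))
            if token == END:          # the end symbol stops prediction
                break
            atoms.append(token)
        return atoms                  # the candidate atom sequence

class ChemicalBondPredictionSubmodel:
    """Predicts bonds between each atom and the atoms placed before it."""
    def predict(self, atoms):
        # Toy rule standing in for the learned predictor: single bonds
        # joining consecutive candidate atoms into a chain.
        return [(i - 1, i, 1) for i in range(1, len(atoms))]

class MolecularGraphGenerationModel:
    """The atom submodel's output feeds directly into the bond submodel."""
    def __init__(self):
        self.atom_submodel = AtomPredictionSubmodel()
        self.bond_submodel = ChemicalBondPredictionSubmodel()

    def generate(self):
        atoms = self.atom_submodel.predict(START)
        bonds = self.bond_submodel.predict(atoms)
        return {"atoms": atoms, "bonds": bonds}

graph = MolecularGraphGenerationModel().generate()
```

The point of the sketch is the data flow: the candidate atom sequence produced by the first submodel is passed directly to the second, mirroring the connection between the two submodels described above.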
Step 120: input the start symbol into the atom prediction submodel for atom prediction, and determine a candidate atom sequence.
In this embodiment, after obtaining the molecular diagram generation model comprising the atom prediction submodel and the chemical bond prediction submodel, the execution subject may receive a start symbol set by an operator. The start symbol may serve as a special node in the molecular diagram, may be expressed in various forms such as <cls>, and may be connected to atoms in the molecular diagram. The start symbol serves as the input of the atom prediction submodel and triggers the atom prediction operation: after obtaining it, the execution subject inputs it into the atom prediction submodel. Upon receiving the start symbol, the atom prediction submodel begins atom prediction, randomly predicts a number of candidate atoms based on what it learned autonomously from training samples, and the predicted candidate atoms are determined as the candidate atom sequence.
The candidate atom sequence may include a plurality of randomly predicted candidate atoms and may be expressed in forms such as <cls>CCC. For example, having learned autonomously from training samples, the atom prediction submodel may, after receiving the start symbol, randomly predict a number of candidate atoms that can later form chemical bonds. The present disclosure does not specifically limit the number or types of the candidate atoms.
Step 130: input the candidate atom sequence into the chemical bond prediction submodel for chemical bond prediction, and generate a molecular diagram corresponding to the candidate atom sequence.
In this embodiment, after determining a candidate atom sequence with the atom prediction submodel, the execution subject inputs the sequence into the chemical bond prediction submodel. Upon receiving the candidate atom sequence, the chemical bond prediction submodel may take the current atom from the sequence in order of arrangement and predict the chemical bonds between the current atom and the other candidate atoms in the sequence, thereby determining whether a chemical bond exists between the current atom and each of the other candidate atoms. The execution subject may then take the candidate atom ranked after the current atom as the new current atom and predict the chemical bonds between it and the other candidate atoms, repeating until the chemical bonds of every candidate atom in the sequence have been determined.
After the chemical bonds of every candidate atom have been determined through the chemical bond prediction submodel, the execution subject can connect the candidate atoms according to those bonds to generate the molecular diagram corresponding to the candidate atom sequence. The molecular diagram comprises each candidate atom in the sequence and the chemical bonds between the candidate atoms.
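The bond-prediction loop of step 130 can be sketched as follows, assuming a hypothetical pairwise predictor has_bond in place of the trained chemical bond prediction submodel. Each candidate atom becomes the current atom in turn, bonds to earlier atoms are predicted, and the results are collected into an adjacency matrix from which the molecular diagram is assembled.

```python
def has_bond(atoms, i, j):
    # Hypothetical stand-in for the chemical bond prediction submodel: in the
    # real model this decision comes from a Transformer encoder with an
    # attention mechanism. Here consecutive atoms are joined into a chain.
    return j == i - 1

def predict_bonds(atoms):
    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):          # the current atom, taken in sequence order
        for j in range(i):      # every candidate atom placed before it
            if has_bond(atoms, i, j):
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency

# Assemble the molecular diagram: the atoms plus the bonds connecting them.
molecular_graph = {
    "atoms": ["C", "C", "O"],
    "adjacency": predict_bonds(["C", "C", "O"]),
}
```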
With continued reference to fig. 2, fig. 2 is a schematic diagram of an application scenario of the molecular diagram generation method according to the present embodiment.
In the application scenario of fig. 2, the server 201 may locally store a molecular diagram generation model comprising an atom prediction submodel and a chemical bond prediction submodel. The terminal 202 may send a start symbol to the server 201. The server 201 reads the locally stored model, receives the start symbol sent by the terminal 202, and inputs it into the atom prediction submodel for atom prediction, determining a candidate atom sequence. The server 201 may then input the candidate atom sequence into the chemical bond prediction submodel for chemical bond prediction, generate the molecular diagram corresponding to the candidate atom sequence, and send the generated molecular diagram to the terminal 202.
In the molecular diagram generation method provided by this embodiment of the disclosure, a molecular diagram generation model comprising an atom prediction submodel and a chemical bond prediction submodel is obtained; a start symbol is input into the atom prediction submodel for atom prediction, determining a candidate atom sequence; and the candidate atom sequence is input into the chemical bond prediction submodel for chemical bond prediction, generating the corresponding molecular diagram. The required candidate atom sequence is predicted directly from the start symbol, without sampling an atom feature matrix from a latent space of the molecular diagram, which improves the search efficiency for candidate atom sequences. The corresponding chemical bonds are predicted from the candidate atom sequence by the chemical bond prediction submodel, without sampling an adjacency matrix of chemical bond features from the latent space, which improves the efficiency of chemical bond generation. Completing molecular diagram generation in two steps, candidate atom sequence generation followed by chemical bond generation between the candidate atoms, effectively improves the quality of the generated molecular diagram, reduces the difficulty of generation, and improves operation efficiency.
Referring to fig. 3, fig. 3 shows a flow 300 of an embodiment of determining a candidate atom sequence; that is, step 120 above, inputting the start symbol into the atom prediction submodel for atom prediction and determining the candidate atom sequence, may include the following steps:
Step 310: input the start symbol into the atom prediction submodel for atom prediction, and obtain a predicted atom sequence.
In this embodiment, after obtaining the molecular diagram generation model comprising the atom prediction submodel and the chemical bond prediction submodel, the execution subject may receive a start symbol set by an operator. The start symbol may serve as a special node in the molecular diagram, may be expressed in various forms such as <cls>, and may be connected to atoms in the molecular diagram.
The start symbol serves as the input of the atom prediction submodel, and the execution subject inputs it into the atom prediction submodel after obtaining it. Upon receiving the start symbol, the atom prediction submodel begins atom prediction and randomly predicts a number of atoms based on what it learned autonomously from training samples. The execution subject may arrange the predicted atoms in the order in which they were predicted, obtaining the predicted atom sequence corresponding to them.
Step 320: in response to predicting the end symbol, determine the predicted atom sequence as the candidate atom sequence.
In this embodiment, while the execution subject performs atom prediction with the atom prediction submodel, each atom node may be predicted at random and output directly once predicted. When the predicted node is the end symbol, the atom prediction submodel stops atom prediction, and the predicted atom sequence before the end symbol is taken as the candidate atom sequence. The end symbol may serve as a special node in the molecular diagram, may be expressed in various forms such as <sp>, may be connected to atoms in the molecular diagram, and is used to terminate the atom prediction of the atom prediction submodel.
When the atom prediction submodel is trained, it can learn from molecular diagrams labeled with the start symbol and the end symbol, where the start symbol starts the submodel's atom prediction and the end symbol terminates it. The atom prediction submodel may begin atom prediction and output predicted candidate atoms when it receives the start symbol, and may stop atom prediction and output all predicted candidate atoms once it predicts the end symbol; the result may be expressed in forms such as <cls>CCC<sp>. For example, having learned autonomously from training samples, the atom prediction submodel may, after receiving the start symbol, randomly predict a number of candidate atoms that can later form chemical bonds, and stop atom prediction after predicting the end symbol. The present disclosure does not specifically limit the number or types of the candidate atoms.
In this embodiment, special symbols, the start symbol and the end symbol, are introduced to construct the input of the atom prediction submodel and to start and stop its atom prediction. This avoids endless atom prediction, improves the accuracy and flexibility of atom prediction, and improves the efficiency and accuracy of candidate atom prediction.
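The start/end-symbol mechanics above can be sketched as follows. The next_atom function is a scripted, hypothetical stand-in for the trained atom prediction submodel; in the real submodel each token would be sampled from learned probabilities.

```python
START, END = "<cls>", "<sp>"

def next_atom(prefix):
    # Scripted stand-in for the atom prediction submodel's next-token output.
    scripted = ["C", "C", "C", END]
    return scripted[len(prefix) - 1]   # the prefix includes <cls>

def predict_atom_sequence(max_len=16):
    prefix = [START]                   # the start symbol begins prediction
    while len(prefix) <= max_len:
        token = next_atom(prefix)
        prefix.append(token)
        if token == END:               # the end symbol stops prediction
            break
    return prefix

sequence = predict_atom_sequence()     # e.g. <cls>CCC<sp>
candidate_atoms = sequence[1:-1]       # the atoms between <cls> and <sp>
```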
Referring to fig. 4, fig. 4 shows a flow 400 of an embodiment of generating a molecular diagram corresponding to a candidate atom sequence; that is, step 130 above, inputting the candidate atom sequence into the chemical bond prediction submodel for chemical bond prediction and generating the molecular diagram corresponding to the candidate atom sequence, may include the following steps:
Step 410: input the candidate atom sequence into the encoding layer for encoding processing, to obtain the encoding vector corresponding to the candidate atom sequence.
The chemical bond prediction submodel may include an encoding layer and an attention mechanism. The encoding layer may be a Transformer encoder, may be connected with the attention mechanism, and may be followed by a residual layer (add & norm).
In this embodiment, after obtaining the candidate atom sequence through the atom prediction submodel, the execution subject may input it into the encoding layer of the chemical bond prediction submodel. The encoding layer encodes each candidate atom in the sequence to obtain the encoding vector corresponding to the candidate atom sequence.
Step 420: input the encoding vector into the attention mechanism for attention processing, and predict the chemical bonds of the candidate atom sequence.
In this embodiment, after obtaining the encoding vector corresponding to the candidate atom sequence through the encoding layer, the execution subject may input it into the attention mechanism of the chemical bond prediction submodel. The attention mechanism performs attention processing on the encoding vector, taking into account the relationship between the currently generated chemical bond and previously generated chemical bonds, performs residual processing through the residual layer, and predicts the chemical bonds of each candidate atom in the candidate atom sequence.
As an optional implementation, step 420 of inputting the encoding vector into the attention mechanism for attention processing and predicting the chemical bonds of the candidate atom sequence may include the following steps: performing first attention processing on the encoding vector through the attention mechanism to obtain a first processing result corresponding to the encoding vector; and performing second attention processing on the first processing result through the attention mechanism to predict the chemical bonds of the candidate atom sequence.
In this implementation, the attention mechanism comprises a first attention process and a second attention process. The first attention process attends to the correlations between the current candidate atom and the other candidate atoms, determining the correlation between the current candidate atom and each of them. The second attention process attends to the correlations between the current candidate atom and the relationships already predicted in the current subgraph, determining the correlation between the current candidate atom and the chemical bonds determined so far.
After obtaining the encoding vector corresponding to the candidate atom sequence through the encoding layer of the chemical bond prediction submodel, the execution subject may input it into the attention mechanism. The attention mechanism first performs the first attention processing on the encoding vector, analyzing the correlation between the current candidate atom and each other candidate atom in the sequence to obtain the corresponding first processing result. It then performs the second attention processing on the first processing result, analyzing the correlation between the current candidate atom and the historical chemical bonds in the part of the molecular diagram generated so far, thereby predicting the chemical bonds of the candidate atom sequence.
As an example, suppose the execution subject obtains the candidate atom sequence <cls>abc<sp> through the atom prediction submodel and inputs it into the encoding layer of the chemical bond prediction submodel to obtain the corresponding encoding vector. The execution subject may input that encoding vector into the attention mechanism, which may analyze the correlations between candidate atoms a and b, between a and c, and between b and c. After determining the correlations between each candidate atom and the others, it further analyzes the correlations between each candidate atom and the chemical bonds already determined among a, b, and c, so as to determine the chemical bonds corresponding to the candidate atom sequence <cls>abc<sp>.
Concretely, if candidate atoms a and b and the chemical bond between them are already known and a new candidate atom c is obtained, then when predicting the chemical bond between c and a, the mechanism analyzes the correlation between c and a together with the correlation between a and b; when predicting the chemical bond between c and b, it analyzes the correlation between a and b together with the already predicted chemical bond between c and a.
In this implementation, performing attention processing on the encoding vector through the attention mechanism allows the chemical bond prediction process to consider the relationship between each current candidate atom and the chemical bonds generated before it. This enriches the information used in chemical bond prediction, makes the predicted chemical bonds more accurate, and improves the quality of the generated molecular diagram.
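The two attention passes described in this implementation can be illustrated numerically with plain scaled dot-product attention over toy 2-dimensional embeddings. The embeddings and the encoding of the already-predicted bond are invented for illustration; in the submodel they would come from the Transformer encoder.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

# Toy 2-dimensional embeddings for candidate atoms a, b, c.
atom_embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
current = atom_embeddings[2]            # the current candidate atom, c

# First attention pass: correlations between c and the other candidate atoms.
first_out, atom_weights = attend(current, atom_embeddings[:2], atom_embeddings[:2])

# Second attention pass: correlation between the first result and encodings of
# chemical bonds predicted so far (here a single invented a-b bond encoding).
bond_history = [[0.9, 0.3]]
second_out, bond_weights = attend(first_out, bond_history, bond_history)
```

With these toy numbers, c's embedding is closer to b's than to a's, so the first pass assigns b the larger attention weight; the second pass, with only one historical bond, collapses to that bond's encoding.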
Step 430: generate the molecular diagram corresponding to the candidate atom sequence based on the chemical bonds of the candidate atom sequence.
In this embodiment, after predicting the chemical bonds of the candidate atom sequence through the attention mechanism in the chemical bond prediction submodel, the execution subject may connect the candidate atoms based on their chemical bonds to generate the molecular diagram corresponding to the candidate atom sequence.
In this embodiment, adding an attention mechanism to process the encoding vector enriches the information used in chemical bond prediction, makes the predicted chemical bonds more accurate, and improves the quality of the generated molecular diagram.
Referring to FIG. 5, FIG. 5 illustrates a flow diagram 500 of one embodiment of optimizing a molecular graph, which may include the steps of:
Step 510: obtain a molecular diagram to be optimized.
In this embodiment, the execution subject may obtain the molecular diagram to be optimized by reading it over a network or receiving a user upload. The molecular diagram to be optimized may include a plurality of reference atoms joined by reference chemical bonds, and its reference atoms may be marked with the start symbol and the end symbol.
Step 520: input the molecular diagram to be optimized into the atom prediction submodel for atom prediction, and determine an optimized atom sequence corresponding to the molecular diagram to be optimized.
In this embodiment, the execution subject inputs the obtained molecular diagram to be optimized into the atom prediction submodel, which performs prediction and recognition on it. Upon detecting the start symbol in the molecular diagram to be optimized, the atom prediction submodel begins to detect its reference atoms and performs atom prediction based on them, determining the optimized atom sequence corresponding to the molecular diagram to be optimized. The optimized atom sequence includes both the reference atoms contained in the molecular diagram to be optimized and the atoms predicted from those reference atoms. When the predicted node is the end symbol, the atom prediction submodel stops atom prediction and takes the reference atoms and predicted atoms before the end symbol as the optimized atom sequence.
Step 530, inputting the optimized atomic sequence into a chemical bond predictor model for chemical bond prediction to generate an optimized molecular diagram corresponding to the molecular diagram to be optimized.
In this embodiment, after the execution subject obtains the optimized atomic sequence, the optimized atomic sequence may be input into the chemical bond predictor model. After the chemical bond predictor model receives the optimized atom sequence, the current atom can be determined from the optimized atom sequence according to the arrangement sequence of the atoms, and then the chemical bonds between the current atom and other optimized atoms in the optimized atom sequence are predicted, so that whether chemical bonds exist between the current atom and other optimized atoms can be determined. Then, the execution subject may use the optimized atom in the optimized atom sequence ordered after the current atom as a new current atom, and predict chemical bonds between the new current atom and other optimized atoms in the optimized atom sequence until determining the chemical bond of each optimized atom in the optimized atom sequence.
After the execution subject determines the chemical bond of each optimized atom in the optimized atom sequence through the chemical bond predictor model, the optimized atoms can be connected according to their chemical bonds to generate an optimized molecular diagram corresponding to the optimized atom sequence. The optimized molecular diagram includes each optimized atom in the optimized atom sequence and the chemical bonds between them, and is the molecular diagram obtained after the molecular structure of the molecular diagram to be optimized has been optimized.
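The pairwise bond-prediction loop and the final graph assembly described above can be sketched as follows. `bond_submodel`, the `toy_bonds` stand-in, and the edge-list representation are illustrative assumptions, not the disclosed model.

```python
# Minimal sketch: for each "current atom" in order, predict whether a bond
# exists to every atom ordered before it, then assemble the results into a
# simple graph structure (atom list plus edge list).
def build_molecular_graph(bond_submodel, atoms):
    edges = []
    for i in range(len(atoms)):      # current atom, taken in sequence order
        for j in range(i):           # atoms ordered before the current atom
            bond = bond_submodel(atoms, i, j)
            if bond is not None:     # None stands for "no bond"
                edges.append((j, i, bond))
    return {"atoms": list(atoms), "bonds": edges}

# Toy predictor: a single bond between consecutive atoms only.
def toy_bonds(atoms, i, j):
    return "single" if i - j == 1 else None

graph = build_molecular_graph(toy_bonds, ["C", "C", "O"])
print(graph["bonds"])  # [(0, 1, 'single'), (1, 2, 'single')]
```

In practice the predictor would emit a bond type (single, double, triple, none) per atom pair; only the traversal order matters for this sketch.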
In this embodiment, the molecular structure of the molecular diagram to be optimized is optimized through the atom prediction submodel and the chemical bond prediction submodel, so that a molecular structure with better properties is generated and the quality of the resulting molecular diagram is improved.
Referring to FIG. 6, FIG. 6 shows a flow diagram 600 of one embodiment of obtaining an atomic predictor model, which may include the steps of:
at step 610, a first set of training samples is obtained.
In this step, the first training sample set may include sample molecular diagrams labeled with a start symbol and an end symbol, and the sample atomic sequences corresponding to the sample molecular diagrams. The execution subject can acquire a plurality of sample molecular diagrams from a preset database over a network; each sample molecular diagram includes a plurality of sample atoms. The sample atoms in each sample molecular diagram are labeled, and the start symbol and end symbol corresponding to each sample molecular diagram are determined. The execution subject uses the sample molecular diagrams labeled with the start and end symbols, together with their corresponding sample atomic sequences, as the first training sample set.
At step 620, a first initial model is constructed that includes an input layer and a Transformer encoder.
In this step, after the execution subject obtains the first training sample set, a first initial model including an input layer and a Transformer encoder may be constructed.
Step 630, training the first initial model by using a machine learning method, taking the sample molecular diagram as the input of the input layer and the sample atomic sequence corresponding to the sample molecular diagram as the expected output of the Transformer encoder, to obtain the atom prediction submodel.
In this step, after the execution subject obtains the first training sample set and constructs the first initial model, a machine learning method may be used to train the first initial model based on the first training sample set, using the training mode of the Unified Pre-training Language Model (UniLM, "Unified Language Model Pre-training for Natural Language Understanding and Generation"), so as to obtain the atom prediction submodel.
Specifically, the execution subject may input the sample molecular diagram into the first initial model as the input of the input layer; through the processing of the first initial model, a predicted atomic sequence corresponding to the sample molecular diagram can be obtained. The network structure of the first initial model may be a network framework in the related art that includes an input layer and a Transformer encoder, and the processing flow of the parameters in the other network layers may refer to the processing flow of the UniLM model in the related art.
In the training process, the execution subject may use the sample atomic sequence corresponding to the sample molecular diagram as the expected output of the Transformer encoder, compare the output predicted atomic sequence with the expected output, and determine whether the predicted atomic sequence meets a constraint condition. If the predicted atomic sequence does not meet the constraint condition, the network parameters of the first initial model are adjusted and the sample molecular diagram is input again to continue training. If the predicted atomic sequence meets the constraint condition, model training is completed and the atom prediction submodel is obtained. The constraint condition may be that the difference between the predicted atomic sequence and the sample atomic sequence in the first training sample set is within a preset threshold, where the preset threshold may be set in advance according to experience; this is not specifically limited by the present disclosure.
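The train-until-threshold procedure described above can be illustrated with a deliberately tiny stand-in: a one-parameter "model" is adjusted until the difference between its prediction and the expected output is within a preset threshold. The gradient-style update rule, learning rate, and threshold value are assumptions for the example only, not the disclosed training configuration.

```python
# Toy illustration of "compare output with expected output; if the constraint
# is not met, adjust parameters and train again; otherwise stop".
def train_until_threshold(weight, target, threshold=1e-3, lr=0.1, max_steps=1000):
    for step in range(max_steps):
        prediction = weight              # trivially simple one-parameter model
        error = prediction - target      # difference from the expected output
        if abs(error) <= threshold:      # constraint condition satisfied
            return weight, step
        weight -= lr * error             # adjust the parameter and retry
    return weight, max_steps

w, steps = train_until_threshold(weight=0.0, target=1.0)
print(round(w, 2))  # 1.0
```

A real model would update millions of network parameters via backpropagation, but the stopping criterion works the same way.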
In this implementation, the first initial model is trained with the acquired sample molecular diagrams and sample atomic sequences to obtain the atom prediction submodel, which can improve the efficiency and accuracy of atom prediction and, in turn, the accuracy and efficiency of molecular diagram generation.
Referring to FIG. 7, FIG. 7 shows a flow diagram 700 of one embodiment of obtaining a chemical bond predictor model, which may include the steps of:
step 710, a second training sample set is obtained.
In this step, the second training sample set may include sample atomic sequences and the sample chemical bonds corresponding to the sample atomic sequences. The execution subject can acquire the sample atomic sequences from a preset database over a network and determine the chemical bonds of the sample atoms in each sample atomic sequence to obtain the corresponding sample chemical bonds. The execution subject uses the sample atomic sequences and their corresponding sample chemical bonds as the second training sample set for training the model.
At step 720, a second initial model is constructed that includes a transform encoder and an attention mechanism.
In this step, after the execution subject acquires the second training sample set, a second initial model including a Transformer encoder and an attention mechanism may be constructed.
Step 730, training the second initial model by using a machine learning method, taking the sample atomic sequence as the input of the Transformer encoder and the sample chemical bond corresponding to the sample atomic sequence as the expected output of the attention mechanism, to obtain the chemical bond predictor model.
In this step, after the execution subject obtains the second training sample set and constructs the second initial model, a machine learning method may be used to train the second initial model based on the second training sample set, using the training mode of the Unified Pre-training Language Model (UniLM, "Unified Language Model Pre-training for Natural Language Understanding and Generation"), so as to obtain the chemical bond prediction submodel.
Specifically, the execution subject may input the sample atomic sequence into the second initial model as the input of the Transformer encoder; through the processing of the second initial model, a predicted chemical bond corresponding to the sample atomic sequence can be obtained. The network structure of the second initial model may be a network framework in the related art that includes a Transformer encoder and an attention mechanism, and the processing flow of the parameters in the other network layers may refer to the processing flow of the UniLM model in the related art.
In the training process, the execution subject may use the sample chemical bond corresponding to the sample atomic sequence as the expected output of the attention mechanism, compare the output predicted chemical bond with the expected output, and determine whether the predicted chemical bond meets a constraint condition. If the predicted chemical bond does not meet the constraint condition, the network parameters of the second initial model are adjusted and the sample atomic sequence is input again to continue training. If the predicted chemical bond meets the constraint condition, model training is completed and the chemical bond prediction submodel is obtained. The constraint condition may be that the difference between the predicted chemical bond and the sample chemical bond in the second training sample set is within a preset threshold, where the preset threshold may be set in advance according to experience; this is not specifically limited by the present disclosure.
In this implementation, the second initial model is trained with the acquired sample atomic sequences and sample chemical bonds to obtain the chemical bond prediction submodel, which can improve the efficiency and accuracy of chemical bond prediction and, in turn, the accuracy and efficiency of molecular diagram generation.
With further reference to fig. 8, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a molecular graph generation apparatus, which corresponds to the method embodiment shown in fig. 1, and which may be applied in various electronic devices.
As shown in fig. 8, the molecular diagram generating apparatus 800 of the present embodiment includes: an acquisition module 810, an atom prediction module 820, and a chemical bond prediction module 830.
The obtaining module 810 is configured to obtain a molecular diagram generation model, wherein the molecular diagram generation model includes an atom predictor model and a chemical bond predictor model;
an atom prediction module 820 configured to input the start symbol into an atom prediction submodel for atom prediction, and determine a candidate atom sequence;
and a chemical bond prediction module 830 configured to input the candidate atom sequence into a chemical bond predictor model for chemical bond prediction, and generate a molecular diagram corresponding to the candidate atom sequence.
In some alternatives of this embodiment, the atom prediction module 820 is further configured to: inputting the initial character into an atom prediction submodel to perform atom prediction to obtain a predicted atom sequence; in response to predicting the terminator, the predicted atomic sequence is determined as a candidate atomic sequence.
In some alternatives of this embodiment, the chemical bond predictor model includes an encoding layer and an attention mechanism; and, a chemical bond prediction module 830, comprising: the encoding unit is configured to input the candidate atomic sequence into an encoding layer for encoding processing to obtain an encoding vector corresponding to the candidate atomic sequence; an attention unit configured to input the encoding vector to an attention mechanism for attention processing, and predict chemical bonds of the candidate atom sequence; and the generating unit is configured to generate a molecular diagram corresponding to the candidate atom sequence based on the chemical bonds of the candidate atom sequence.
In some optional ways of this embodiment, the attention mechanism processing includes a first attention process for attending to the correlations between the current candidate atom and the other candidate atoms, and a second attention process for attending to the correlations between the current candidate atom and the atoms already placed in the predicted subgraph; and the attention unit is further configured to: perform first attention processing on the encoding vector through the attention mechanism to obtain a first processing result corresponding to the encoding vector; and perform second attention processing on the first processing result through the attention mechanism to predict the chemical bonds of the candidate atom sequence.
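The two attention passes described above (first over all candidate atoms, then over the atoms already placed in the predicted subgraph) can be sketched with plain single-head scaled dot-product attention. All shapes, the random vectors, and the single-head formulation are illustrative assumptions, not the disclosed architecture.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
d = 8
current = rng.normal(size=d)          # encoding of the current candidate atom
candidates = rng.normal(size=(5, d))  # encodings of all candidate atoms
subgraph = rng.normal(size=(3, d))    # atoms already in the predicted subgraph

first = attention(current, candidates, candidates)  # first attention pass
second = attention(first, subgraph, subgraph)       # second attention pass
print(second.shape)  # (8,)
```

The resulting vector would then be mapped to bond-type scores for the current atom pair; that projection head is omitted here.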
In some optional manners of this embodiment, the obtaining module 810 is further configured to obtain a molecular map to be optimized; the atom prediction module 820 is further configured to input the molecular diagram to be optimized into an atom prediction submodel for atom prediction, and determine an optimized atom sequence corresponding to the molecular diagram to be optimized; and the chemical bond prediction module 830 is further configured to input the optimized atomic sequence into the chemical bond predictor model for chemical bond prediction, and generate an optimized molecular diagram corresponding to the molecular diagram to be optimized.
In some alternatives of this embodiment, the atomic predictor model is obtained based on the following steps: acquiring a first training sample set, wherein the first training sample set comprises a sample molecular diagram marked with a start character and an end character and a sample atomic sequence corresponding to the sample molecular diagram; constructing a first initial model comprising an input layer and a Transformer encoder; and training the first initial model by using a machine learning method, taking the sample molecular diagram as the input of the input layer and the sample atomic sequence corresponding to the sample molecular diagram as the expected output of the Transformer encoder, to obtain the atom prediction submodel.
In some alternatives of this embodiment, the chemical bond predictor model is obtained based on the following steps: acquiring a second training sample set, wherein the second training sample set comprises a sample atomic sequence and a sample chemical bond corresponding to the sample atomic sequence; constructing a second initial model comprising a Transformer encoder and an attention mechanism; and training the second initial model by using a machine learning method, taking the sample atomic sequence as the input of the Transformer encoder and the sample chemical bond corresponding to the sample atomic sequence as the expected output of the attention mechanism, to obtain the chemical bond predictor model.
The molecular diagram generation apparatus provided by the embodiment of the disclosure obtains a molecular diagram generation model that includes an atom prediction submodel and a chemical bond prediction submodel, inputs an initiator into the atom prediction submodel for atom prediction to determine a candidate atom sequence, and finally inputs the candidate atom sequence into the chemical bond prediction submodel for chemical bond prediction to generate the molecular diagram corresponding to the candidate atom sequence. The apparatus can predict the required candidate atom sequence from the initiator without sampling an atom feature matrix from the hidden space of the molecular diagram, which improves the search efficiency of the candidate atom sequence. It can also predict the corresponding chemical bonds based on the candidate atom sequence and the chemical bond prediction submodel without collecting an adjacency matrix with chemical bond features from the hidden space of the molecular diagram, which improves the efficiency of chemical bond generation. Because molecular diagram generation is completed in two steps, candidate atom sequence generation and generation of the chemical bonds between candidate atoms, the generation effect of the molecular diagram can be effectively improved, the difficulty of generating the molecular diagram is reduced, and the operation efficiency is improved.
Those skilled in the art will appreciate that the above-described apparatus may also include some other well-known structure, such as a processor, memory, etc., which is not shown in fig. 8 in order not to unnecessarily obscure embodiments of the present disclosure.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 901 performs the respective methods and processes described above, such as the molecular diagram generation method. For example, in some embodiments, the molecular diagram generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the molecular diagram generation method described above can be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the molecular diagram generation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A molecular graph generation method, comprising:
obtaining a molecular diagram generation model, wherein the molecular diagram generation model comprises an atom predictor model and a chemical bond predictor model;
inputting the initial character into the atom prediction submodel to perform atom prediction, and determining candidate atom sequences;
and inputting the candidate atomic sequence into the chemical bond predictor model for chemical bond prediction to generate a molecular diagram corresponding to the candidate atomic sequence.
2. The method of claim 1, wherein inputting an initiator into the atom predictor model for atom prediction to determine candidate atom sequences comprises:
inputting the initial character into the atom prediction submodel to perform atom prediction to obtain a predicted atom sequence;
in response to predicting the terminator, determining the predicted atomic sequence as the candidate atomic sequence.
3. The method of claim 1, wherein the chemical bond predictor model comprises an encoding layer and an attention mechanism; and,
inputting the candidate atomic sequence into the chemical bond predictor model for chemical bond prediction to generate a molecular diagram corresponding to the candidate atomic sequence, wherein the method comprises the following steps:
inputting the candidate atomic sequence into the coding layer for coding to obtain a coding vector corresponding to the candidate atomic sequence;
inputting the coding vector into the attention mechanism for attention processing, and predicting chemical bonds of the candidate atom sequence;
and generating a molecular diagram corresponding to the candidate atom sequence based on the chemical bonds of the candidate atom sequence.
4. The method of claim 3, wherein the attention mechanism process includes a first attention process for attending to correlations between the current candidate atom and other candidate atoms and a second attention process for attending to correlations between the current candidate atom and the atoms already placed in the predicted subgraph; and,
inputting the coding vector into the attention mechanism for attention processing, and predicting the chemical bonds of the candidate atom sequence, comprising:
performing first attention processing on the coding vector through the attention mechanism to obtain a first processing result corresponding to the coding vector;
and performing second attention processing on the first processing result through the attention mechanism to predict the chemical bonds of the candidate atom sequence.
5. The method of any of claims 1-4, further comprising:
obtaining a molecular diagram to be optimized;
inputting the molecular diagram to be optimized into the atomic prediction submodel for atomic prediction, and determining an optimized atomic sequence corresponding to the molecular diagram to be optimized;
and inputting the optimized atomic sequence into the chemical bond predictor model for chemical bond prediction to generate an optimized molecular diagram corresponding to the molecular diagram to be optimized.
6. The method of any of claims 1-4, wherein the atomic predictor model is obtained based on:
acquiring a first training sample set, wherein the first training sample set comprises a sample molecular graph marked with a start character and an end character and a sample atomic sequence corresponding to the sample molecular graph;
constructing a first initial model comprising an input layer and a Transformer encoder;
and training the first initial model by using a machine learning method and taking the sample molecular diagram as the input of the input layer, taking a sample atomic sequence corresponding to the sample molecular diagram as the expected output of the Transformer encoder, and obtaining the atomic prediction submodel.
7. The method of any one of claims 1-4, wherein the chemical bond predictor model is obtained based on:
obtaining a second training sample set, wherein the second training sample set comprises a sample atomic sequence and a sample chemical bond corresponding to the sample atomic sequence;
constructing a second initial model comprising a Transformer encoder and an attention mechanism;
and training the second initial model by using a machine learning method and taking the sample atomic sequence as the input of the Transformer encoder and the sample chemical bond corresponding to the sample atomic sequence as the expected output of the attention mechanism to obtain the chemical bond predictor model.
8. A molecular graph generation apparatus, comprising:
an obtaining module configured to obtain a molecular graph generation model, wherein the molecular graph generation model comprises an atom predictor model and a chemical bond predictor model;
an atom prediction module configured to input an initiator to the atom prediction submodel for atom prediction, and determine a candidate atom sequence;
and the chemical bond prediction module is configured to input the candidate atom sequence into the chemical bond predictor model for chemical bond prediction, and generate a molecular diagram corresponding to the candidate atom sequence.
9. The apparatus of claim 8, wherein the atomic prediction module is further configured to:
inputting the initial character into the atom prediction submodel to perform atom prediction to obtain a predicted atom sequence;
in response to predicting the terminator, determining the predicted atomic sequence as the candidate atomic sequence.
10. The apparatus of claim 8, wherein the chemical bond predictor model comprises an encoding layer and an attention mechanism; and, the chemical bond prediction module comprising:
the encoding unit is configured to input the candidate atomic sequence into the encoding layer for encoding processing, so as to obtain an encoding vector corresponding to the candidate atomic sequence;
an attention unit configured to input the encoding vector to the attention mechanism for attention processing, predicting chemical bonds of the candidate atom sequence;
and the generating unit is configured to generate a molecular diagram corresponding to the candidate atom sequence based on the chemical bonds of the candidate atom sequence.
11. The apparatus of claim 10, wherein the attention processing comprises first attention processing for attending to correlations between a current candidate atom and the other candidate atoms, and second attention processing for attending to correlations between the current candidate atom and the atoms already predicted in the current subgraph; and the attention unit is further configured to:
perform the first attention processing on the encoding vector through the attention mechanism to obtain a first processing result corresponding to the encoding vector;
and perform the second attention processing on the first processing result through the attention mechanism to predict the chemical bonds of the candidate atom sequence.
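Claim 11's two attention passes can be sketched with masking: the first pass lets each candidate atom attend to all candidate atoms, while the second pass restricts attention to atoms already placed in the partial graph. The prefix (lower-triangular) mask below is an assumption about how "already predicted in the current subgraph" is enforced; the real model may use a different masking scheme.

```python
# Sketch of the two-pass attention in claim 11 (shapes hypothetical).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x, mask=None):
    """Scaled dot-product self-attention; a False mask entry blocks a pair."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ x

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))                         # 4 candidate atoms, dim 8

first = attend(x)                                   # pass 1: all candidate atoms
prefix_mask = np.tril(np.ones((4, 4), dtype=bool))  # pass 2: only already-predicted atoms
second = attend(first, mask=prefix_mask)
print(first.shape, second.shape)  # → (4, 8) (4, 8)
```

Note that under the prefix mask the first atom can attend only to itself, so its second-pass output equals its first-pass output; later atoms mix in progressively more of the predicted subgraph.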
12. The apparatus of any one of claims 8-11,
the obtaining module is further configured to obtain a molecular graph to be optimized;
the atom prediction module is further configured to input the molecular graph to be optimized into the atom prediction submodel for atom prediction and determine an optimized atom sequence corresponding to the molecular graph to be optimized;
and the chemical bond prediction module is further configured to input the optimized atom sequence into the chemical bond prediction submodel for chemical bond prediction and generate an optimized molecular graph corresponding to the molecular graph to be optimized.
13. The apparatus of any one of claims 8-11, wherein the atom prediction submodel is obtained by:
acquiring a first training sample set, wherein the first training sample set comprises a sample molecular graph marked with a start character and an end character, and a sample atom sequence corresponding to the sample molecular graph;
constructing a first initial model comprising an input layer and a Transformer encoder;
and training the first initial model with a machine learning method, taking the sample molecular graph as the input of the input layer and the sample atom sequence corresponding to the sample molecular graph as the expected output of the Transformer encoder, to obtain the atom prediction submodel.
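The training recipe in claim 13 (sequences wrapped in start/end characters, model supervised to reproduce the sample atom sequence) amounts to a next-token objective. The sketch below illustrates only the loss computation with a toy weight table standing in for the Transformer encoder; all names are hypothetical.

```python
# Hedged sketch of claim 13's training target: teacher-forced
# next-token cross-entropy over a sample atom sequence marked with
# start and end characters. W is a toy stand-in for the model.
import numpy as np

VOCAB = ["<start>", "<end>", "C", "O", "N"]
IDX = {t: i for i, t in enumerate(VOCAB)}

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(len(VOCAB), len(VOCAB)))  # toy "model" logits

def next_token_loss(sequence):
    """At each step the model sees the current character and is
    penalized for not assigning probability to the next one."""
    loss = 0.0
    for cur, nxt in zip(sequence[:-1], sequence[1:]):
        logits = W[IDX[cur]]
        log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
        loss -= log_probs[IDX[nxt]]
    return loss / (len(sequence) - 1)

sample = ["<start>", "C", "C", "O", "<end>"]
print(next_token_loss(sample))
```

Minimizing this loss over the first training sample set is what lets the trained submodel later generate atom sequences autoregressively, stopping when it predicts the end character.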
14. The apparatus of any one of claims 8-11, wherein the chemical bond prediction submodel is obtained by:
acquiring a second training sample set, wherein the second training sample set comprises a sample atom sequence and sample chemical bonds corresponding to the sample atom sequence;
constructing a second initial model comprising a Transformer encoder and an attention mechanism;
and training the second initial model with a machine learning method, taking the sample atom sequence as the input of the Transformer encoder and the sample chemical bonds corresponding to the sample atom sequence as the expected output of the attention mechanism, to obtain the chemical bond prediction submodel.
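One way to supervise the bond predictor of claim 14 is to write the sample chemical bonds as a pairwise label matrix and score predicted pair logits against it. The binary bond/no-bond encoding and the loss below are illustrative assumptions; the patent does not specify the loss function.

```python
# Sketch of claim 14's supervision signal (hypothetical encoding): the
# sample chemical bonds become a symmetric pairwise label matrix, and a
# binary cross-entropy compares predicted pair scores against it.
import numpy as np

def bond_label_matrix(n_atoms, bonds):
    """bonds: iterable of (i, j, order) taken from the sample molecular graph."""
    y = np.zeros((n_atoms, n_atoms), dtype=int)
    for i, j, order in bonds:
        y[i, j] = y[j, i] = order
    return y

def pairwise_bce(scores, labels):
    """Binary cross-entropy between sigmoid(scores) and bond/no-bond labels."""
    p = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-9
    return float(-np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps)))

y = bond_label_matrix(3, [(0, 1, 1), (1, 2, 1)])   # C-C-O single bonds
scores = np.zeros((3, 3))                           # an untrained predictor
print(round(pairwise_bce(scores, (y > 0).astype(float)), 4))
# → 0.6931 (ln 2: zero logits give p = 0.5 for every pair)
```

With multiple bond orders, the binary target would generalize to a per-pair categorical label (none/single/double/triple) and a softmax cross-entropy.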
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of claim 1.
CN202210554539.6A 2022-05-20 2022-05-20 Molecular diagram generation method and device Pending CN114822721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210554539.6A CN114822721A (en) 2022-05-20 2022-05-20 Molecular diagram generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210554539.6A CN114822721A (en) 2022-05-20 2022-05-20 Molecular diagram generation method and device

Publications (1)

Publication Number Publication Date
CN114822721A true CN114822721A (en) 2022-07-29

Family

ID=82517215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210554539.6A Pending CN114822721A (en) 2022-05-20 2022-05-20 Molecular diagram generation method and device

Country Status (1)

Country Link
CN (1) CN114822721A (en)

Similar Documents

Publication Publication Date Title
CN113553864B (en) Translation model training method and device, electronic equipment and storage medium
CN112560496A (en) Training method and device of semantic analysis model, electronic equipment and storage medium
CN112466288A (en) Voice recognition method and device, electronic equipment and storage medium
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
CN114564390A (en) Performance test method, device, equipment and product of application program
CN117032938A (en) Operator parallel scheduling method and device, electronic equipment and storage medium
CN115293149A (en) Entity relationship identification method, device, equipment and storage medium
JP2022095895A (en) Traffic data prediction method, traffic data prediction device, electronic device, storage medium, computer program product, and computer program
CN114818913A (en) Decision generation method and device
CN113204614A (en) Model training method, method and device for optimizing training data set
CN114141236B (en) Language model updating method and device, electronic equipment and storage medium
CN115601042A (en) Information identification method and device, electronic equipment and storage medium
CN114822721A (en) Molecular diagram generation method and device
CN113850686B (en) Method and device for determining application probability, storage medium and electronic equipment
CN114998649A (en) Training method of image classification model, and image classification method and device
CN114897183A (en) Problem data processing method, and deep learning model training method and device
CN115186738A (en) Model training method, device and storage medium
CN109285559B (en) Role transition point detection method and device, storage medium and electronic equipment
CN113051479A (en) File processing and recommendation information generation method, device, equipment and storage medium
CN113010571A (en) Data detection method, data detection device, electronic equipment, storage medium and program product
CN113361712B (en) Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN113553863B (en) Text generation method, device, electronic equipment and storage medium
CN115878783B (en) Text processing method, deep learning model training method and sample generation method
CN112579842A (en) Model searching method, model searching apparatus, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination