CN116168775A

CN116168775A - Molecular multi-mode model training and application method, storage medium and chip

Info

Publication number: CN116168775A
Application number: CN202211099018.2A
Authority: CN
Inventors: 苏冰; 文继荣; 杜大钊; 杨钊; 周彧杰; 李江梦; 孙浩; 卢志武
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2022-09-07
Filing date: 2022-09-07
Publication date: 2023-05-26

Abstract

The invention realizes a molecular multi-modal model training and application method, a storage medium, a chip and a system by a method in the field of network security. First, the candidate document set interacts with the sub-topics or queries through the Encoder structure of a Transformer. After representations of the documents and sub-topics are obtained, combination weights are modeled from the selected documents, all candidate documents and the sub-topics; explicit and implicit scores are obtained through interaction; and finally the explicit and implicit scores are combined into the final diversification score using the updated weights. The method designs an explicit and implicit feature combination model that dynamically adjusts the weights at different steps of different queries, so as to improve search result diversification. The model is trained with a list-pairwise loss function in the LambdaRank style, and experimental results demonstrate the effectiveness and interpretability of the model.

Description

Molecular multi-mode model training and application method, storage medium and chip

Technical Field

The invention relates to the technical field of machine learning, in particular to a molecular multi-modal model training and application method, a storage medium and a chip.

Background

Understanding molecules and discovering their properties is critical to scientific exploration in fields such as biomedicine, chemistry and materials. Traditional exploration methods require professionals to carry out large numbers of wet-lab biochemical experiments, which are both expensive and time-consuming. With the progress of deep learning, scientific exploration using artificial intelligence, such as predicting molecular properties and generating candidate molecules, has become possible, and considerable progress has been made.

However, unlike humans, who understand molecules through multiple modalities, most existing artificial intelligence models address a single modality of the molecule (e.g., molecular graph, SMILES string, text) and a single cognitive ability (e.g., property prediction, molecular generation, literature understanding). These models fall into two main categories. Language-based models take as input natural language related to molecular knowledge and/or SMILES strings. For example, molecular property prediction models were designed for SMILES strings in Sheng Wang et al., "SMILES-BERT: Large scale unsupervised pre-training for molecular property prediction", and Seyone Chithrananda et al., "ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction". The work of Iz Beltagy et al., "SciBERT: A pretrained language model for scientific text", Diya Li et al., "Biomedical event extraction based on knowledge-driven tree-LSTM", and Jinhyuk Lee et al., "BioBERT: a pre-trained biomedical language representation model for biomedical text mining", focused on learning from biochemical text. Zheni Zeng et al., in "A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals", developed a deep learning system that jointly learns from molecule-related text and molecular SMILES strings to establish relationships between them. In the molecular generation models of Jike Wang et al., "Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning", Jeff Guo et al., "Improving de novo molecular design with curriculum learning", Daniel Flam-Shepherd et al., "Language models can learn complex molecular distributions", and Samuel C. Hoffman et al., "Optimizing molecules using efficient queries from property evaluations", the generated molecule is represented as a SMILES string. Graph-based models, by contrast, can only handle molecular graphs.
Current molecular property prediction models based on graph neural networks (GNNs) learn from molecular graphs, and graph-based generative models learn directly from graph data to generate molecular graphs. Training these models requires a large number of manual annotations or a collection of specific properties, while other properties of the molecule and related conditions are typically ignored. Thus, these models can only handle one form of the molecule and cannot obtain a comprehensive understanding of it.

Disclosure of Invention

Therefore, the invention provides a molecular multi-modal model training and application method, a storage medium and a chip, in which a graph encoder and a text encoder are jointly learned from multi-modal molecular data so as to associate a molecular graph with its biomedical text description, thereby overcoming the limitations of existing machine learning models that are based on single-modality molecular data.

The invention firstly provides a molecular multi-mode model training and application method, which comprises the following steps:

S100, establishing a data collection unit: molecular graphs and their weakly semantically related text data are extracted from published SCI papers, and a molecular graph-text data set is constructed;

S200, establishing a molecular multi-modal model and its pre-training unit: a molecular multi-modal model comprising a graph encoder and a text encoder is constructed, and the model is trained by contrastive learning;

S300, establishing a text-based molecular graph generation unit: the model is applied to different downstream tasks such as cross-modal retrieval and molecular graph generation based on text descriptions.

Wherein the method of constructing the data set of the data collection unit comprises:

S201, collecting the names, synonyms and SMILES strings of the first 50K molecular compounds in PubChem;

S202, for each collected molecule, converting the SMILES string into a molecular graph using the smiles2graph function provided by OGB;

S203, using the name of the molecule as a query, searching for sentences containing the name in the abstract, introduction and conclusion sections of published scientific papers in the medical, biological, chemical and computer science fields of the S2ORC database; recording each retrieved sentence together with its adjacent sentences as a paragraph in a document; if fewer than two paragraphs are retrieved by name, searching again using the synonyms or aliases of the molecule as queries; and terminating the retrieval for the molecule early once a specified number of paragraphs or a specified document size has been reached;

S204, obtaining molecular graph-document pairs to form a multi-modal molecular data set.
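The retrieval logic of step S203 can be illustrated with a minimal sketch. It assumes the paper text has already been split into sentences, uses a window of one neighbouring sentence on each side as the "adjacent sentences" (an assumption, since the window size is not stated), and omits the document-size limit. The function names `retrieve_paragraphs` and `build_document` are hypothetical.

```python
from typing import List, Sequence


def retrieve_paragraphs(sentences: Sequence[str], queries: Sequence[str],
                        max_paragraphs: int = 5000) -> List[str]:
    """Return each sentence mentioning a query, joined with its neighbours."""
    lowered = [q.lower() for q in queries]
    paragraphs: List[str] = []
    for i, sentence in enumerate(sentences):
        if len(paragraphs) >= max_paragraphs:  # early termination
            break
        if any(q in sentence.lower() for q in lowered):
            # Assumed window: one sentence before and one after the hit.
            start, end = max(0, i - 1), min(len(sentences), i + 2)
            paragraphs.append(" ".join(sentences[start:end]))
    return paragraphs


def build_document(sentences: Sequence[str], name: str, synonyms: Sequence[str],
                   min_paragraphs: int = 2, max_paragraphs: int = 5000) -> List[str]:
    """Search by name first; fall back to synonyms if too few paragraphs are found."""
    paragraphs = retrieve_paragraphs(sentences, [name], max_paragraphs)
    if len(paragraphs) < min_paragraphs:
        remaining = max_paragraphs - len(paragraphs)
        extra = retrieve_paragraphs(sentences, synonyms, remaining)
        paragraphs.extend(p for p in extra if p not in paragraphs)
    return paragraphs
```

Matching here is a plain case-insensitive substring test; a real pipeline would likely need tokenization and entity disambiguation, which the source does not describe.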

The molecular multi-modal model consists of a graph encoder and a text encoder, which extract a molecular graph representation and a text representation, respectively. A Graph Isomorphism Network (GIN) is used as the graph encoder and the language model BERT is used as the text encoder. During training, the model additionally uses a similarity calculation module, which uses two mapping heads to map the molecular graph representation and the text representation into a joint representation space and then calculates the cosine similarity of the mapped features.
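As a rough illustration of the similarity calculation module, the sketch below projects a graph feature and a text feature of different sizes into a shared space and takes the cosine similarity there. It assumes each mapping head is a single linear layer without bias, a simplification chosen for brevity; the names `project` and `similarity` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(weights: Matrix, features: Vector) -> List[float]:
    """Apply a linear mapping head: one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def similarity(graph_feat: Vector, text_feat: Vector,
               graph_head: Matrix, text_head: Matrix) -> float:
    """Map both modalities into the joint space, then compare by cosine."""
    return cosine_similarity(project(graph_head, graph_feat),
                             project(text_head, text_feat))
```

Because the two heads have independent weights, the graph and text encoders may emit features of different dimensions; only the head outputs need to match.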

The molecular multi-mode model pre-training method specifically comprises the following steps:

S401, initializing the graph encoder with the self-supervised pre-trained weights of a Graph Isomorphism Network, and initializing the text encoder with the pre-trained BERT weights of Sci-BERT or KV-PLM;

S402, in each training epoch, sequentially sampling mini-batches of N molecular graph-text pairs from the training set;

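S403, for each mini-batch $\{G_1,\dots,G_N\}$, generating two different augmentations of each graph by random node dropping and random subgraph sampling, respectively, producing 2N augmented graphs in total, and inputting these graphs into the graph encoder to obtain their representation vectors $\{z_1^G,\tilde{z}_1^G,\dots,z_N^G,\tilde{z}_N^G\}$, wherein $z_i^G$ and $\tilde{z}_i^G$ denote the representations of the two augmented versions of the i-th graph $G_i$;

S404, randomly extracting two different sentences from the document corresponding to each molecule, and denoting the representations obtained by the text encoder for the two different sentences describing the i-th graph $G_i$ as $z_i^T$ and $\tilde{z}_i^T$; each molecular graph in the mini-batch corresponds to two different sentences, yielding 2N text representations $\{z_1^T,\tilde{z}_1^T,\dots,z_N^T,\tilde{z}_N^T\}$;

S405, for the i-th graph $G_i$, the total multi-view loss comprises four contrastive losses between four pairs of representations from the two modalities, namely $(z_i^G,z_i^T)$, $(\tilde{z}_i^G,z_i^T)$, $(z_i^G,\tilde{z}_i^T)$ and $(\tilde{z}_i^G,\tilde{z}_i^T)$, wherein

the contrastive loss corresponding to $(z_i^G,z_i^T)$ is:

$$\ell_i^{(z_i^G,z_i^T)} = -\log\frac{\exp\left(\mathrm{sim}(z_i^G,z_i^T)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(z_i^G,z_j^T)/\tau\right)}$$

the contrastive loss corresponding to $(\tilde{z}_i^G,z_i^T)$ is:

$$\ell_i^{(\tilde{z}_i^G,z_i^T)} = -\log\frac{\exp\left(\mathrm{sim}(\tilde{z}_i^G,z_i^T)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(\tilde{z}_i^G,z_j^T)/\tau\right)}$$

the contrastive loss corresponding to $(z_i^G,\tilde{z}_i^T)$ is:

$$\ell_i^{(z_i^G,\tilde{z}_i^T)} = -\log\frac{\exp\left(\mathrm{sim}(z_i^G,\tilde{z}_i^T)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(z_i^G,\tilde{z}_j^T)/\tau\right)}$$

the contrastive loss corresponding to $(\tilde{z}_i^G,\tilde{z}_i^T)$ is:

$$\ell_i^{(\tilde{z}_i^G,\tilde{z}_i^T)} = -\log\frac{\exp\left(\mathrm{sim}(\tilde{z}_i^G,\tilde{z}_i^T)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(\tilde{z}_i^G,\tilde{z}_j^T)/\tau\right)}$$

where τ is a temperature parameter and $\mathrm{sim}(\cdot,\cdot)$ is the similarity calculation module, which first projects its two arguments, e.g. $z_i^G$ and $z_i^T$, to the same dimension and then calculates the cosine similarity between the projected vectors;

S406, calculating the contrastive loss of the graph modality, the graph-modality contrastive loss for the i-th graph being:

$$\ell_i^{(z_i^G,\tilde{z}_i^G)} = -\log\frac{\exp\left(\mathrm{sim}(z_i^G,\tilde{z}_i^G)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(z_i^G,\tilde{z}_j^G)/\tau\right)}$$

S407, calculating the sum of the losses of all samples in the batch:

$$L = \sum_{i=1}^{N}\left(\ell_i^{(z_i^G,z_i^T)} + \ell_i^{(\tilde{z}_i^G,z_i^T)} + \ell_i^{(z_i^G,\tilde{z}_i^T)} + \ell_i^{(\tilde{z}_i^G,\tilde{z}_i^T)} + \lambda\,\ell_i^{(z_i^G,\tilde{z}_i^G)}\right)$$

wherein λ is the balance factor between the cross-modal loss and the graph-modality loss, and is a hyper-parameter;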

S408, for each batch, updating the parameters of the graph encoder, the text encoder and the mapping heads by back-propagating the total loss L, until all batches in the current epoch have been processed;

S409, repeating S402-S408 until the preset maximum number of epochs is reached.
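The per-batch loss of steps S405-S407 can be sketched in plain Python. The sketch assumes the representations have already been projected into the joint space, that the negatives for sample i are the other samples of the same mini-batch, and that the total is the sum of the four cross-modal terms plus λ times the graph-modality term. The function names and the default `lam=1.0` are assumptions made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(anchors: Sequence[Vector], targets: Sequence[Vector],
                     i: int, tau: float = 0.1) -> float:
    """-log softmax of sim(anchor_i, target_j)/tau, evaluated at j = i."""
    logits = [cosine(anchors[i], t) / tau for t in targets]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_denominator - logits[i]


def batch_loss(zg, zg_aug, zt, zt_aug, lam: float = 1.0, tau: float = 0.1) -> float:
    """Sum over the batch of four cross-modal terms plus lam * graph-modality term."""
    total = 0.0
    for i in range(len(zg)):
        cross = (contrastive_loss(zg, zt, i, tau)
                 + contrastive_loss(zg_aug, zt, i, tau)
                 + contrastive_loss(zg, zt_aug, i, tau)
                 + contrastive_loss(zg_aug, zt_aug, i, tau))
        total += cross + lam * contrastive_loss(zg, zg_aug, i, tau)
    return total
```

In an actual training run this quantity would be computed with an automatic-differentiation framework so that it can be back-propagated as in S408; the pure-Python version only shows the arithmetic.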

The text-based molecular graph generation method comprises the following steps:

the method uses a molecular multi-modal model trained by the above pre-training method and a pre-trained molecular generator that is based on random seed sampling and allows gradient back-propagation, wherein the parameters of both the molecular multi-modal model and the molecular generator are fixed;

S501, inputting a text $x^T$ describing a molecule;

S502, initializing the generation seed q and setting q as a learnable parameter;

S503, generating a molecular graph $x^G$ from q using the trained molecular generator;

S504, sending $x^G$ and $x^T$ to the graph encoder and the text encoder of the pre-trained molecular multi-modal model, respectively, and extracting the corresponding graph representation $z^G$ and text representation $z^T$; the similarity between them is computed by the similarity calculation module of the molecular multi-modal model, and the negative similarity is used as the loss function:

$$\ell_q = -\mathrm{sim}(z^G, z^T)/\tau,$$

S505, back-propagating the loss $\ell_q$ and updating the seed q by gradient descent;

S506, repeating steps S503-S505 until the preset maximum number of iterations is reached;

S507, sending the optimized q to the molecular generator, generating the final molecular graph and outputting it.
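The seed-optimization loop of steps S502-S506 can be sketched with a toy stand-in. Here the frozen generator and graph encoder are collapsed into a single fixed linear map `W` (purely an assumption to keep the example self-contained), the text representation `z_text` is a fixed vector, and the gradient of the cosine similarity is written out by hand instead of using automatic differentiation. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def matvec(W: Sequence[Sequence[float]], q: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, q)) for row in W]


def cosine_and_grad(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[float]]:
    """Return cos(a, b) and its gradient with respect to a."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    grad = [y / (na * nb) - cos * x / (na * na) for x, y in zip(a, b)]
    return cos, grad


def optimize_seed(W, z_text, q0, steps: int = 500, lr: float = 0.1, tau: float = 0.1):
    """Gradient descent on l_q = -sim(z_graph, z_text) / tau with frozen W."""
    q = list(q0)                       # S502: q is the only learnable quantity
    losses = []
    for _ in range(steps):             # S506: repeat S503-S505
        z_graph = matvec(W, q)         # S503 + S504 (toy generator and encoder)
        cos, dcos_dz = cosine_and_grad(z_graph, z_text)
        losses.append(-cos / tau)
        # Chain rule through the linear map: dl/dq = W^T (dl/dz).
        grad_q = [sum(W[r][c] * (-dcos_dz[r] / tau) for r in range(len(W)))
                  for c in range(len(q))]
        q = [x - lr * g for x, g in zip(q, grad_q)]   # S505
    return q, losses
```

For example, `optimize_seed([[2.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [1.0, -1.0])` drives the loss down from its initial value as q rotates toward the direction whose image under `W` aligns with `z_text`. The only point of the sketch is that everything except q stays frozen.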

The invention also provides a storage medium storing the molecular multi-modal model training and application method, and a chip using the storage medium.

The invention has the technical effects that:

The invention provides a molecular multi-modal model (MoMu), which implicitly establishes a connection between molecular structure and language description. Because it can handle multiple modalities of a molecule, the model can be applied to a very wide range of downstream tasks.

Using the text-based molecular graph generation method, new molecules can be generated directly from a text description of the required conditions. For molecular function descriptions, this solves the prior-art problem that valid molecules cannot be generated from the description, and for descriptions containing several conditions, molecular structures satisfying as many of the conditions as possible can be generated. In contrast to existing AI-based molecular generation methods that can only generate molecules with pre-specified properties, the present method adaptively generates molecular candidates from input text that can describe any desired condition or conditions. Owing to its strong generalization capability and imagination, the pre-trained molecular multi-modal model can promote scientific exploration in many molecule-related fields such as biology, chemistry, materials and medicine.

The molecular multi-modal model MoMu provided by the invention is pre-trained on paired multi-modal data consisting of molecular graphs and their weakly related biochemical descriptions retrieved from publicly available SCI papers. Because it can process molecules in multiple modalities, the pre-trained model can be applied to a very wide range of downstream tasks; the invention therefore provides a zero-shot molecular generation method and a molecular captioning method based on the pre-trained model. Experimental results show that the pre-trained model generalizes well across a wide range of downstream tasks, including cross-modal molecular retrieval, molecular captioning, zero-shot molecular generation and molecular property prediction.

Drawings

FIG. 1 is a flow chart of the collection of the graph-text data set of the present invention;

FIG. 2 is a schematic diagram of the architecture and pre-training principle of the molecular multi-modal model of the present invention;

FIG. 3 is a schematic block diagram of a text-based molecular diagram generation method of the present invention;

FIG. 4 is a graphical representation of the results of the text-based molecular diagram generation method of the present invention for some text input with respect to a functional description.

FIG. 5 is a graphical representation of the results of the text-based molecular graph generation method of the present invention for some text input with respect to structural descriptions.

Detailed Description

The following is a preferred embodiment of the present invention and a technical solution of the present invention is further described with reference to the accompanying drawings, but the present invention is not limited to this embodiment.

The principles and features of the present invention are described below with reference to the drawings, the examples are illustrated for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention. The invention is more particularly described by way of example in the following paragraphs with reference to the drawings. Advantages and features of the invention will become more apparent from the following description and from the claims. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for convenience and clarity in aiding in the description of embodiments of the invention.

The invention provides a molecular multi-mode model training and application method, and a storage medium and a chip realized based on the method.

The molecular multi-modal model training and application method comprises three constituent units: data collection, the molecular multi-modal (MoMu) model and its pre-training, and text-based molecular graph generation.

A data collection unit:

The data collection unit constructs a data set of molecular graph-text pairs, which is used for pre-training the molecular multi-modal model.

The data set is constructed as follows. First, the names, synonyms and SMILES strings of the first 50K molecular compounds in PubChem are collected; the PubChem database contains basic information on more than 150 million chemicals. To obtain molecular graphs of the collected compounds, each SMILES string is converted into a molecular graph using the smiles2graph function provided by OGB. Text related to the corresponding molecules in published scientific papers is then retrieved from the S2ORC database as weak semantic supervision. S2ORC is a corpus containing 136 million papers from different fields; only papers from the medical, biological, chemical and computer science fields are extracted, as they are more likely to contain descriptions related to molecules. To avoid, as far as possible, special characters related to experimental data, retrieval is performed only on the abstract, introduction and conclusion sections of each extracted paper.

As shown in FIG. 1, for each molecule, sentences containing its name are first retrieved using the name as a query. Each retrieved sentence and its neighboring sentences are recorded as a paragraph in a document. If fewer than two paragraphs are retrieved by name, the search is repeated using the synonyms or aliases of the molecule as queries. Retrieval for a molecule is terminated early once 5,000 paragraphs have been retrieved or the document size exceeds 500 MB. Corresponding textual descriptions cannot be retrieved for all 50,000 molecules. In the end, 15,613 molecular graph-document pairs were obtained to form the multi-modal molecular data set, and the collected documents contain about 37 million paragraphs in total. In each pair, the sentences in the document contain weakly related semantic information about the corresponding molecular graph.

Molecular multi-modal (MoMu) model and pre-training unit thereof:

The overall architecture and pre-training process of the molecular multi-modal (MoMu) model is shown in FIG. 2. MoMu consists of a graph encoder and a text encoder, which encode the molecular graph and the text, respectively, into a joint representation space. A Graph Isomorphism Network (GIN) is used as the graph encoder, and the widely used language model BERT is used as the text encoder.

Unlike the general image-text data used to train image-text multi-modal foundation models, graph-text data about molecules is relatively scarce and is insufficient for training molecular graph and text encoders from scratch. Just as humans need the ability to recognize graphs and language before learning specialized knowledge, an artificial intelligence that learns specialized molecular knowledge needs to be built on already trained generic graph and text encoders. Thus, the graph encoder is initialized with self-supervised pre-trained GIN weights, and the text encoder is initialized with the pre-trained BERT weights provided by Sci-BERT and KV-PLM, respectively. MoMu models initialized with the weights of Sci-BERT and KV-PLM are denoted MoMu-S and MoMu-K, respectively.

MoMu is then trained on the collected paired data set. For each graph-document pair in a mini-batch, two separate graphs are created from the molecular graph using two different types of graph augmentation, following the data augmentation introduced by GraphCL. Specifically, two types of augmentation are considered: node dropping and subgraph sampling. Node dropping randomly discards a portion of the vertices of the original graph; for molecular graphs, the absence of certain atoms (e.g., some hydrogen atoms in a compound) does not change the semantic information. Subgraph sampling draws a subgraph from the original graph using a random walk; the properties of a molecule are to some extent similar to those of the molecule formed by its subgraph, e.g. both may contain the same functional groups. Two different sentences are then randomly extracted from the document. Thus, each modality has two samples containing the same semantic information.
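The two augmentations can be sketched on a graph stored as an adjacency dictionary. This is a simplified illustration and not the GraphCL implementation: node dropping removes a fixed fraction of vertices uniformly at random, and subgraph sampling follows a single random walk until the target number of distinct vertices has been visited. Atom and bond features are ignored, and the function names are hypothetical.

```python
import random
from typing import Dict, Optional, Set

Graph = Dict[int, Set[int]]  # node -> set of neighbouring nodes


def induced_subgraph(graph: Graph, keep: Set[int]) -> Graph:
    """Keep only the chosen nodes and the edges between them."""
    return {n: graph[n] & keep for n in graph if n in keep}


def drop_nodes(graph: Graph, drop_ratio: float = 0.1,
               rng: Optional[random.Random] = None) -> Graph:
    rng = rng or random.Random()
    nodes = sorted(graph)
    dropped = set(rng.sample(nodes, int(len(nodes) * drop_ratio)))
    return induced_subgraph(graph, set(nodes) - dropped)


def random_walk_subgraph(graph: Graph, keep_ratio: float = 0.8,
                         rng: Optional[random.Random] = None,
                         max_steps: int = 10000) -> Graph:
    rng = rng or random.Random()
    target = max(1, int(len(graph) * keep_ratio))
    current = rng.choice(sorted(graph))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= target:
            break
        neighbours = sorted(graph[current])
        if not neighbours:      # isolated node: the walk cannot continue
            break
        current = rng.choice(neighbours)
        visited.add(current)
    return induced_subgraph(graph, visited)
```

Calling both functions on the same molecular graph yields the two views that form a positive pair in the graph-modality contrastive loss.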

For the graph modality, given a mini-batch $\{G_1,\dots,G_N\}$ of N molecular graph-document pairs, two different augmentations are generated for each graph, producing 2N augmented graphs in total. These graphs are then input into the graph encoder to obtain their representation vectors $\{z_1^G,\tilde{z}_1^G,\dots,z_N^G,\tilde{z}_N^G\}$, where $z_i^G$ and $\tilde{z}_i^G$ denote the representations of the two augmented versions of the i-th graph $G_i$. Meanwhile, the representations obtained by the text encoder for the two different sentences describing $G_i$ are denoted $z_i^T$ and $\tilde{z}_i^T$.

Thus, for the i-th graph $G_i$, the total multi-view loss comprises four contrastive losses between four pairs of representations from the two modalities, i.e. $(z_i^G,z_i^T)$, $(\tilde{z}_i^G,z_i^T)$, $(z_i^G,\tilde{z}_i^T)$ and $(\tilde{z}_i^G,\tilde{z}_i^T)$. For simplicity, only the contrastive loss of $(z_i^G,z_i^T)$ is written out:

$$\ell_i^{(z_i^G,z_i^T)} = -\log\frac{\exp\left(\mathrm{sim}(z_i^G,z_i^T)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(z_i^G,z_j^T)/\tau\right)}$$

where τ is a temperature parameter and $\mathrm{sim}(\cdot,\cdot)$ first projects $z_i^G$ and $z_i^T$ to the same dimension and then calculates the cosine similarity between the projected vectors. The other three cross-modal contrastive losses have the same form.

To further enhance the representation capability of the graph encoder, contrastive learning is also applied within the graph modality. Specifically, by minimizing a normalized cross-entropy loss, the features of a positive pair are pulled together while the features of negative pairs are pushed apart, where a positive pair consists of two augmentations of the same molecular graph and a negative pair comes from different molecular graphs. Based on the previous definitions, the graph-modality contrastive loss of the i-th graph is:

$$\ell_i^{(z_i^G,\tilde{z}_i^G)} = -\log\frac{\exp\left(\mathrm{sim}(z_i^G,\tilde{z}_i^G)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\mathrm{sim}(z_i^G,\tilde{z}_j^G)/\tau\right)}$$

where τ is the temperature parameter. The final loss is computed over all samples in the mini-batch.

The pre-trained MoMu can process both molecular graphs and natural language text in a unified manner and acquires generic, transferable knowledge from these heterogeneous data, which generalizes readily to different downstream tasks.

Implementation details:

A 5-layer GIN with a hidden dimension of 300 is used as the graph encoder, and a BERT with a hidden size of 768 is used as the text encoder. Two multi-layer perceptrons project the graph and sentence features into the same feature space, each with an output dimension of 256. For the two graph augmentations, the node drop rate is 10% and the sampled subgraph is 80% of the size of the original graph. The input of the text modality consists of two sentences randomly selected from the document paired with the graph. The model is pre-trained for 300 epochs using the AdamW optimizer with a learning rate of 0.0001 and a weight decay of 1e-5. τ is set to 0.1 and the batch size to 256. The entire pre-training process is implemented in PyTorch and run on 8 NVIDIA Tesla V100 PCIe 32GB GPUs.
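The hyper-parameters listed above can be gathered into a single configuration object, as in the sketch below. The class and field names are hypothetical; only the values come from the text. The helper `steps_per_epoch` assumes that a final partial batch is kept, which the source does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    gin_layers: int = 5
    gin_hidden_dim: int = 300
    bert_hidden_dim: int = 768
    projection_dim: int = 256        # output size of each projection MLP
    node_drop_ratio: float = 0.10    # node-dropping augmentation
    subgraph_ratio: float = 0.80     # subgraph size relative to the original
    learning_rate: float = 1e-4      # AdamW
    weight_decay: float = 1e-5
    epochs: int = 300
    temperature: float = 0.1         # tau in the contrastive losses
    batch_size: int = 256

    def steps_per_epoch(self, num_pairs: int) -> int:
        """Number of mini-batches per epoch, keeping a final partial batch."""
        return math.ceil(num_pairs / self.batch_size)


config = PretrainConfig()
```

With the 15,613 graph-document pairs reported for the data set, `config.steps_per_epoch(15613)` gives 61 mini-batches per epoch under this assumption.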

Application of the molecular multi-modal model to the cross-modal retrieval task. Because the MoMu model provided by the invention is pre-trained by matching weakly related text with the corresponding molecular graphs, it can process both the graph and the text modalities of molecules.

Its performance is evaluated on cross-modal retrieval. Given a molecular graph, graph-to-text (G-T) retrieval aims to retrieve the text description most relevant to the molecule. Conversely, given a paragraph of text, text-to-graph (T-G) retrieval aims to retrieve the molecular graph that it most closely describes. MoMu is evaluated on the PCdes data set, which contains SMILES strings and paired text descriptions for 15K molecules in PubChem. The data set is divided into a training set of 10,500 pairs, a validation set of 1,500 pairs and a test set of 3,000 pairs (two SMILES strings in the test set cannot be converted to graphs by RDKit, so the remaining 2,998 pairs are used for testing). The SMILES string in each pair is converted into a molecular graph. In the G-T/T-G task, the MoMu graph/text encoder extracts a representation of the query graph/text, the MoMu text/graph encoder extracts representations of all candidate texts/graphs to be retrieved, the cosine similarity between the query representation and every candidate representation is calculated, and the candidate texts/graphs are ranked in descending order of similarity. Following the experimental setup in [11], retrieval is performed both within randomly sampled mini-batches (64 pairs per batch) and over all test pairs, and the average top-1 accuracy and the top-20 recall (mean ± standard deviation) are reported, respectively. As in [11], a sentence is randomly drawn from the text corresponding to each molecule for retrieval; this setting is referred to as sentence-level retrieval. A setting that uses the complete paragraph description of each molecule is also evaluated and is referred to as paragraph-level retrieval.
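The ranking-based metrics can be computed from a query-by-candidate similarity matrix, as in the following sketch. It assumes that the correct candidate for query i sits at index i and that ties are resolved in favour of the correct candidate; both are conventions chosen for this illustration, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def rank_of_match(similarities: Sequence[float], true_index: int) -> int:
    """1-based rank of the correct candidate under descending similarity."""
    target = similarities[true_index]
    return 1 + sum(1 for s in similarities if s > target)


def retrieval_metrics(sim_matrix: Sequence[Sequence[float]], k: int = 20) -> Dict[str, float]:
    """Top-1 accuracy and recall@k, with candidate i being correct for query i."""
    ranks = [rank_of_match(row, i) for i, row in enumerate(sim_matrix)]
    n = len(ranks)
    return {
        "acc@1": sum(r == 1 for r in ranks) / n,
        f"recall@{k}": sum(r <= k for r in ranks) / n,
    }
```

For G-T retrieval the rows hold graph queries and the columns text candidates; transposing the matrix gives the T-G direction.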

The results of comparing the method of the present invention with Sci-BERT [8], KV-PLM and KV-PLM* (KV-PLM* differs from KV-PLM in how SMILES strings are tokenized) are shown in Table 1. For a fair comparison, all of these methods are fine-tuned on the PCdes training set. Across the different settings of the G-T and T-G tasks, MoMu-S and MoMu-K outperform the other methods, which retrieve directly from SMILES strings. Compared with KV-PLM, which jointly models molecular SMILES strings and language text, the MoMu of the present invention links molecular structure and natural language description more effectively.

Table 1. Performance of different methods on the PCdes data set in graph-to-text (G-T) retrieval and text-to-graph (T-G) retrieval, where the sentence-level retrieval results of Sci-BERT and KV-PLM are those reported in the literature.

Given that some of the 15K molecular graph-text pairs in PCdes may have been collected as pre-training data for MoMu, 5,562 graph-text pairs with molecule IDs between 50,000 and 100,000, none of which were used for pre-training, were additionally collected from PubChem. The comparison with Sci-BERT and KV-PLM on this zero-shot retrieval test set is shown in Table 2. MoMu-S and MoMu-K perform significantly better than Sci-BERT and KV-PLM, further demonstrating the generalization ability of MoMu. On both data sets, MoMu-S and MoMu-K perform comparably, i.e., initializing the text encoder of MoMu with KV-PLM does not give better performance than initializing it with Sci-BERT. This suggests that structural information learned from one-dimensional SMILES strings does not transfer easily to the structured molecular graph, whereas MoMu captures the structural information directly with the graph neural network under the supervision of the language descriptions.

Table 2. Performance of different methods on the collected data set in zero-shot graph-to-text (G-T) retrieval and zero-shot text-to-graph (T-G) retrieval.

A text-based molecular diagram generation unit:

As shown in FIG. 3, the zero-shot text-to-graph molecule generation method consists of the MoMu similarity measurement module and a molecular generator that is based on random seed sampling and allows gradient back-propagation. The invention is illustrated using the flow-based molecular generator MoFlow as an example. MoFlow defines a parameterized invertible mapping from a Gaussian distribution to the molecular distribution. A molecular graph G consists of an atom matrix $V\in\{0,1\}^{N\times C_a}$ and a bond tensor $E\in\{0,1\}^{N\times N\times C_b}$, where N is the number of atoms in the molecule and $C_a$ and $C_b$ are the numbers of atom types and bond types. If the n-th atom belongs to the c-th atom type, $V_{n,c}=1$; otherwise $V_{n,c}=0$. If the bond between the n-th atom and the n'-th atom is of the c'-th bond type, $E_{n,n',c'}=1$; otherwise $E_{n,n',c'}=0$. MoFlow contains a graph conditional flow $q_v=f_c(V\mid E)$, which encodes the atom matrix V given the bond tensor E into a latent variable $q_v$, and a gflow $q_e=f_g(E)$, which encodes the bond tensor E into a latent variable $q_e$. $f_c$ and $f_g$ are implemented by graph convolutional neural networks based on graph coupling layers. The concatenation $q=[q_v;q_e]$ of $q_v$ and $q_e$ follows a Gaussian distribution P(q). After MoFlow has been trained, a variable q can be sampled from P(q) and decomposed into the two parts $q_v$ and $q_e$, which are input to the inverse graph conditional flow $f_c^{-1}$ and the inverse gflow $f_g^{-1}$ to obtain the probability matrices:

$$\tilde{E}=f_g^{-1}(q_e)\in\mathbb{R}^{N\times N\times C_b},\qquad \tilde{V}=f_c^{-1}\left(q_v\mid \mathrm{GN}(\tilde{E})\right)\in\mathbb{R}^{N\times C_a},$$

where $\tilde{E}_{n,n',c'}$ is the predicted probability that the bond between the n-th atom and the n'-th atom belongs to the c'-th bond type, and $\tilde{V}_{n,c}$ is the probability that the n-th atom belongs to the c-th atom type. V and E are obtained by taking the argmax over the last dimension of $\tilde{V}$ and $\tilde{E}$. GN is the graph normalization layer. By sampling different q from P(q), MoFlow can generate different novel and valid molecules.
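The one-hot encoding of atoms and bonds, and the argmax step that turns predicted probabilities back into a discrete graph, can be sketched with nested lists. The atom-type and bond-type indices in the usage note are arbitrary examples, and the function names are hypothetical; this is not MoFlow's actual data layout.

```python
from typing import Dict, List, Sequence, Tuple


def atom_matrix(atom_types: Sequence[int], num_atom_types: int) -> List[List[int]]:
    """V[n][c] = 1 if atom n has type c, else 0."""
    return [[1 if c == t else 0 for c in range(num_atom_types)] for t in atom_types]


def bond_tensor(num_atoms: int, bonds: Dict[Tuple[int, int], int],
                num_bond_types: int) -> List[List[List[int]]]:
    """E[n][m][c] = 1 if atoms n and m share a bond of type c, else 0."""
    E = [[[0] * num_bond_types for _ in range(num_atoms)] for _ in range(num_atoms)]
    for (n, m), c in bonds.items():
        E[n][m][c] = 1
        E[m][n][c] = 1   # bonds are undirected
    return E


def argmax_one_hot(prob_rows: Sequence[Sequence[float]]) -> List[List[int]]:
    """Discretize each row of probabilities by an argmax over the last dimension."""
    result = []
    for row in prob_rows:
        best = max(range(len(row)), key=row.__getitem__)
        result.append([1 if c == best else 0 for c in range(len(row))])
    return result
```

For a three-atom chain with atom types `[0, 0, 2]` and two single bonds, `atom_matrix([0, 0, 2], 3)` yields the matrix V, and applying `argmax_one_hot` to a probability matrix whose row maxima fall on the same types recovers that V.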

The zero-shot text-to-graph molecule generation method takes a descriptive text $x^T$ as input. The only learnable parameter in the method is q, which is initialized by random sampling from P(q). All parameters of the pre-trained MoMu and MoFlow are frozen. The input $x^T$ is fed into the MoMu text encoder to obtain the text representation $z^T$, and q is input to MoFlow to obtain $\tilde{V}$ and $\tilde{E}$. To keep all operations differentiable, and thus allow the gradient to back-propagate, $\tilde{V}$ rather than V is input into the MoMu graph encoder to obtain the graph representation $z^G$. The trained graph encoder GIN contains embeddings for all atom and bond types. Originally, V serves as an index for selecting the corresponding atom-type embedding in the first layer. When $\tilde{V}$ is used instead, the representation obtained for each atom is in fact a weighted sum of all atom embeddings, with the probabilities in $\tilde{V}$ over the atom types of each node acting as attention scores. The loss function is the negative cosine similarity between the projected $z^T$ and $z^G$:

$$\ell_q = -\mathrm{sim}(z^G, z^T)/\tau,$$

where $\mathrm{sim}(z^G,z^T)$ is the module used in MoMu to compute the similarity between the projected representations. q is updated by back-propagating the gradient of $\ell_q$, using the Adam optimizer. After repeated updates for up to 500 iterations, the optimized q is obtained and input into MoFlow to obtain $\tilde{V}$ and $\tilde{E}$. Finally, taking the argmax over the last dimension of $\tilde{V}$ and $\tilde{E}$ gives the molecular graph G = (V, E).
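The difference between indexing the atom embedding table with a discrete atom type and feeding in a probability vector can be sketched as follows. The embedding table is a toy example and the function names are hypothetical; the point is only that a probability-weighted sum of embeddings is differentiable with respect to the probabilities, whereas an index lookup is not.

```python
from typing import List, Sequence


def hard_lookup(atom_types: Sequence[int],
                embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Index the embedding table with discrete atom types (not differentiable)."""
    return [list(embeddings[t]) for t in atom_types]


def soft_lookup(atom_probs: Sequence[Sequence[float]],
                embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per atom, sum all type embeddings weighted by the predicted probabilities."""
    dim = len(embeddings[0])
    return [[sum(p * emb[d] for p, emb in zip(row, embeddings)) for d in range(dim)]
            for row in atom_probs]
```

When every row of `atom_probs` is one-hot, `soft_lookup` returns the same vectors as `hard_lookup`, so the soft version is a relaxation of the original first layer of the graph encoder.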

The molecules generated from the description of molecular functions by the method of the invention are shown in FIG. 4. For the description of "fluorescent molecules", molT5 cannot produce an effective molecule, whereas the method of the present invention produces a molecule having a conjugated double bond or conjugated molecule. For the description that "the molecule includes a hydroxyl group and a carboxyl group, is capable of decomposing to produce ammonia gas, and has an oxygen content exceeding 20%", various molecules having a hydroxyl group, a high oxygen content, and a nitrogen atom to produce ammonia are successfully produced, and thus three-quarters of the conditions are satisfied. Unlike existing AI-based molecular generation methods that can only generate specified attributes, the present method adaptively generates molecular candidates based on input text that may describe any desired condition or conditions. In the last description, three desired molecular properties are specified, including high water solubility, high barrier permeability and low toxicity, which can be evaluated by a fine-tuning property prediction model. The present invention is based on the process of Momu-S and Momu-K, which allows the generation of different molecules with high penetrability, low toxicity and high water solubility.

The molecules generated from the molecular structure description are shown in fig. 5. For descriptions containing nucleophilic groups, the methods of the invention produce different molecules with amino groups, hydroxyl groups, or double bonds. For descriptions containing electrophilic groups, the method of the present invention is capable of generating different molecules having carbonyl groups, alkyl-like groups or halogen atoms, despite inhibiting formal charge. For descriptions containing hydrophilic groups, the method of the present invention is capable of producing molecules containing hydroxyl, amino or aldehyde groups of different structures. For descriptions containing lipophilic groups, the method of the present invention produces molecules containing alkyl groups having different structures, halogen atoms, or benzene rings.

Parts of the present invention that are not described in detail are well known to those skilled in the art.

Claims

1. A molecular multi-modal model training and application method, characterized in that the method comprises the following steps:

S500, establishing a text-based molecular graph generation unit, applying the model to different downstream tasks such as cross-modal retrieval and molecular graph generation from text descriptions, and finally outputting the generated molecular graph.

2. The molecular multi-modal model training and application method according to claim 1, wherein the method of constructing the data set of the data collection unit comprises:

3. The molecular multi-modal model training and application method according to claim 2, wherein: the molecular multi-modal model consists of a graph encoder and a text encoder, which extract a molecular graph representation and a text representation, respectively; a graph isomorphism network is used as the graph encoder and the language model BERT is used as the text encoder; in the training stage the model additionally uses a similarity calculation module, which uses two mapping heads to map the molecular graph representation and the text representation, respectively, into a joint representation space and then calculates the cosine similarity of the mapped features.
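As an illustration of the similarity calculation module described in claim 3, the following is a minimal sketch in plain Python. It assumes bias-free linear mapping heads with random weights and hypothetical dimensions (6 for the graph representation, 8 for the text representation, 4 for the joint space); the claim does not specify the form of the mapping heads, and the encoder outputs here are random stand-ins rather than outputs of a graph isomorphism network or BERT.

```python
import math
import random


def linear(weights, vec):
    """Apply a bias-free linear mapping head; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def random_head(out_dim, in_dim, rng):
    """Hypothetical mapping head with Gaussian-initialised weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def similarity(graph_repr, text_repr, graph_head, text_head):
    """Map both representations into the joint space, then compare them."""
    z_graph = linear(graph_head, graph_repr)
    z_text = linear(text_head, text_repr)
    return cosine_similarity(z_graph, z_text)


rng = random.Random(0)
graph_head = random_head(4, 6, rng)  # graph dim 6 -> joint dim 4 (assumed sizes)
text_head = random_head(4, 8, rng)   # text dim 8 -> joint dim 4 (assumed sizes)
graph_repr = [rng.gauss(0.0, 1.0) for _ in range(6)]  # stand-in for graph encoder output
text_repr = [rng.gauss(0.0, 1.0) for _ in range(8)]   # stand-in for text encoder output

score = similarity(graph_repr, text_repr, graph_head, text_head)
print(score)
```

Because the two encoders may have different output dimensions, each modality gets its own head; only the mapped vectors need to share a dimension.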

4. The molecular multi-modal model training and application method according to claim 3, wherein the pre-training method of the molecular multi-modal model specifically comprises the following steps:

S403, for each mini-batch of data $\{G_1, \ldots, G_N\}$, generating two different augmentations of each graph by random node dropping and random subgraph extraction, respectively, so that 2N augmented graphs are generated in total, and inputting these graphs into the graph encoder to obtain their representation vectors, which include, for the i-th graph $G_i$, the representations of its two augmented graphs (the symbols for these vectors are not reproduced in the source text);

S405, for the i-th graph $G_i$, the total multi-view loss comprises four contrastive losses between four pairs of representations from the multiple modalities (the representation pairs and the four corresponding loss formulas are not reproduced in the source text), where $\tau$ is a temperature parameter and $\mathrm{sim}(\cdot,\cdot)$ is the similarity calculation module, which first maps its two input representations (the remainder of this definition is not reproduced in the source text);

S407, calculating the sum of the losses of all samples in the batch:
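The loss formulas of claim 4 are not legible in the text, so the following is only a sketch of a common temperature-scaled contrastive (InfoNCE-style) loss that matches the description: a similarity divided by a temperature τ, one loss per pair of views, and a sum over all samples in the batch. The choice of the four view pairs, the toy 2-dimensional vectors, and the function names are assumptions made for illustration, not the patent's definitions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchors, candidates, i, tau):
    """InfoNCE-style loss for sample i.

    (anchors[i], candidates[i]) is the positive pair; every other
    candidates[j] in the batch acts as a negative.
    """
    logits = [cosine_similarity(anchors[i], c) / tau for c in candidates]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )
    return -(logits[i] - log_denominator)


def batch_loss(view_pairs, tau):
    """Sum the per-sample losses over every pair of views and every sample."""
    total = 0.0
    for anchors, candidates in view_pairs:
        for i in range(len(anchors)):
            total += contrastive_loss(anchors, candidates, i, tau)
    return total


# Toy batch of N = 2 samples: two augmented graph views and one text view.
graph_view_1 = [[1.0, 0.0], [0.0, 1.0]]
graph_view_2 = [[0.9, 0.1], [0.1, 0.9]]
text_view = [[1.0, 0.0], [0.0, 1.0]]

# Illustrative choice of four pairs (an assumption, see the lead-in).
view_pairs = [
    (graph_view_1, text_view),
    (graph_view_2, text_view),
    (text_view, graph_view_1),
    (text_view, graph_view_2),
]

total = batch_loss(view_pairs, tau=0.1)
print(total)
```

A smaller τ sharpens the softmax over the batch, so mismatched pairs are penalised more heavily relative to matched ones.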

5. The molecular multi-modal model training and application method according to claim 4, wherein the text-based molecular graph generation method comprises the following steps:

S501, inputting a text $x^T$ describing a molecule;

S502, initializing a generation seed $q$ and setting $q$ as a learnable parameter;

S504, sending $x^G$ and $x^T$ to the graph encoder and the text encoder of the pre-trained molecular multi-modal model, respectively, extracting the corresponding graph representation $z^G$ and text representation $z^T$, and using the similarity calculation module of the molecular multi-modal model to compute their negative similarity as the loss function:

$l_q = -\mathrm{sim}(z^G, z^T)/\tau$,
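The idea behind steps S502 and S504, treating the seed q as a learnable parameter and minimising $l_q = -\mathrm{sim}(z^G, z^T)/\tau$, can be sketched with a toy example. Everything below other than the loss itself is an assumption: `toy_graph_embedding` is a hypothetical fixed linear map standing in for "generate a graph from q, then encode it", the text representation is a fixed vector, plain cosine similarity stands in for the similarity calculation module, and gradients are estimated by finite differences rather than backpropagation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_graph_embedding(q):
    """Hypothetical stand-in for generator + graph encoder: seed q -> z^G."""
    weights = [
        [1.0, 0.5, 0.0],
        [-0.5, 1.0, 0.3],
        [0.2, -0.1, 1.0],
    ]
    return [sum(w * x for w, x in zip(row, q)) for row in weights]


def loss(q, z_text, tau):
    """l_q = -sim(z^G, z^T) / tau, with cosine similarity as sim."""
    return -cosine_similarity(toy_graph_embedding(q), z_text) / tau


def numerical_gradient(q, z_text, tau, eps=1e-5):
    """Central finite-difference estimate of d l_q / d q."""
    grad = []
    for k in range(len(q)):
        plus = q[:k] + [q[k] + eps] + q[k + 1:]
        minus = q[:k] + [q[k] - eps] + q[k + 1:]
        grad.append((loss(plus, z_text, tau) - loss(minus, z_text, tau)) / (2 * eps))
    return grad


def optimize_seed(q, z_text, tau=0.1, lr=0.01, steps=200):
    """Gradient descent on the seed q; returns the final seed and loss history."""
    history = [loss(q, z_text, tau)]
    for _ in range(steps):
        grad = numerical_gradient(q, z_text, tau)
        q = [x - lr * g for x, g in zip(q, grad)]
        history.append(loss(q, z_text, tau))
    return q, history


z_text = [0.0, 1.0, 0.0]  # stand-in for the text representation z^T
q_initial = [1.0, 0.0, 0.0]  # stand-in for the initial seed q
q_final, history = optimize_seed(q_initial, z_text)
print(history[0], history[-1])
```

Only q is updated in this loop; the stand-in encoder stays fixed, which mirrors the use of a pre-trained model as a frozen scoring function.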

6. A storage medium, characterized in that it stores the molecular multi-modal model training and application method according to any one of claims 1-5.

7. A chip, characterized in that it uses the storage medium of claim 6.