CN116013407A - Method for generating property-decoupled proteins based on a language model


Info

Publication number: CN116013407A
Application number: CN202211686617.4A
Authority: CN (China)
Prior art keywords: amino acid, property, language model, properties, sequence
Legal status: Pending (assumed status; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: Zhang Qiang (张强), Wang Zeyuan (王泽元), Chen Huajun (陈华钧)
Current assignee: ZJU Hangzhou Global Scientific and Technological Innovation Center
Original assignee: ZJU Hangzhou Global Scientific and Technological Innovation Center
Filing date: 2022-12-26, application filed by ZJU Hangzhou Global Scientific and Technological Innovation Center
Priority to: CN202211686617.4A
Publication date: 2023-04-25 (publication of CN116013407A)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing


Abstract

The invention discloses a method for generating property-decoupled proteins based on a language model, comprising the following steps: constructing an amino acid property knowledge graph from the amino acid properties; obtaining protein data, decoupling each piece of protein data into an amino acid property sequence according to the knowledge graph, and mapping the amino acid property sequence from property space to vector space to obtain a vector representation of the sequence; modeling and training the language model on a causal prediction task using the vector representation of the amino acid property sequence, so as to optimize the parameters of the language model; and generating proteins with the parameter-optimized language model, which enables specific proteins to be generated on the basis of amino acid properties.

Description

Method for generating property-decoupled proteins based on a language model
Technical Field
The invention relates to the technical field of proteins, and in particular to a method for generating property-decoupled proteins based on a language model.
Background
Proteins are sequence data composed of amino acids and, from this point of view, bear a certain similarity to natural language; a great deal of research has therefore migrated natural language methods to protein sequences. The language model is currently the most prominent paradigm for modeling language; its core idea is to use a known sequence to obtain the probability distribution of an unknown sequence. The two most common language models are the masked language model and the causal language model. The masked language model predicts the probability distribution of words at masked positions from the surrounding context, which is very effective for text understanding. In sequence generation, the causal language model dominates: it models the probability of what follows from what precedes, and generates text through successive iterations. Modeling of natural language has shifted from statistics based on word-to-word co-occurrence frequencies to neural network fits based on word vectors, and experiments show that the distributed representations and nonlinear mappings of neural networks generalize more strongly. GPT-3 scaled its parameters to 175B, and at such scale the model can generate sentences hardly distinguishable from human writing. Inspired by this, several research teams have applied the paradigm to proteins, training models such as ProGen2, ProtGPT2 and RITA, and found that the larger the parameter scale, the better the modeling of protein sequences and the more natural the generated proteins; such models can be expected to yield, by sampling, sequences that differ from natural ones yet have the desired functions.
However, an amino acid language model built on 20 mutually independent amino acid symbols cannot model the properties of the amino acids themselves well, such as the steric hindrance or hydrophilicity of an amino acid, which increases the difficulty of model learning. Secondly, because the properties cannot be decoupled, the embedding of an amino acid symbol is a superposition of the probabilities of each property appearing at the current position, as is the predicted probability of each amino acid appearing there. Proteins with different functions should, however, be distinguishable, so the inability to decouple properties makes it impossible to generate function-specific proteins in a targeted manner, limiting the flexibility of the model in use.
Current generation methods based on causal language models focus mainly on the design of the sampler. Conventional maximization-based samplers aim to generate the sequence that best matches the model's expectation, including greedy generation and its improvement, beam search. Maximization-based methods, however, lead to text degeneration: output that is bland, incoherent, or stuck in repetitive loops. To give generated text more flexibility, researchers designed the Top-k and Nucleus sampling schemes, whose main idea is to first select a set of candidate words and then choose among them according to probability. Note, however, that these sampling methods are all purely probability-based: they cannot determine what meaning the generated sequence carries, nor steer generation toward a desired sequence, which reduces the practicality of the model.
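For concreteness, the following is a minimal NumPy sketch of the Top-k plus Nucleus (Top-p) filtering just described; the function name, default thresholds and interface are illustrative assumptions, not taken from the patent or any specific library.

```python
import numpy as np

def top_k_top_p_sample(logits, k=50, p=0.9, rng=None):
    """Filter a next-token distribution with Top-k, then Nucleus (Top-p),
    and sample from what remains. Defaults are illustrative."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over the vocabulary
    order = np.argsort(probs)[::-1][:k]       # Top-k: keep the k best candidates
    kept = probs[order]
    # Nucleus: smallest prefix whose cumulative mass reaches p
    cutoff = int(np.searchsorted(np.cumsum(kept), p)) + 1
    order, kept = order[:cutoff], kept[:cutoff]
    return int(rng.choice(order, p=kept / kept.sum()))
```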
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a method for generating proteins by decoupling amino acid properties based on a language model: a plurality of language models are trained on a protein data set, and samplers for specific families are trained on family-specific amino acid data sets, taking the property representations output by the language models as input, so as to produce proteins for different domains.
To achieve the above object, an embodiment provides a method for generating property-decoupled proteins based on a language model, comprising the following steps:
constructing an amino acid property knowledge graph according to the amino acid properties;
obtaining protein data, decoupling each piece of protein data into an amino acid property sequence according to the amino acid property knowledge graph, and mapping the amino acid property sequence from property space to vector space to obtain a vector representation of the amino acid property sequence;
modeling and training the language model on a causal prediction task using the vector representation of the amino acid property sequence, so as to optimize the parameters of the language model;
predicting the probability distribution of the next amino acid property from the vector representation of the known amino acid property sequence using the parameter-optimized language model; predicting the amino acid from that probability distribution using a sampler; appending the predicted amino acid's properties to the known amino acid property sequence; repeating until generation is complete; and converting the final amino acid property sequence into an amino acid sequence as the generated protein.
Preferably, in the amino acid property knowledge graph, each amino acid and one of its properties are expressed as a triplet (amino acid, property strength, property category); the knowledge graph is constructed from these triplets, and in property space the property strength is represented by the modulus of a vector and the property category by its direction, yielding the embedding of each property in the knowledge graph.
Preferably, decoupling each piece of protein data into an amino acid property sequence according to the amino acid property knowledge graph comprises:
looking up, in the amino acid property knowledge graph, the properties corresponding to each amino acid in the protein data, and replacing each amino acid by its properties to obtain the amino acid property sequence.
Preferably, mapping the amino acid property sequence from property space to vector space comprises:
adopting different mapping modes for different amino acid properties: a dictionary embedding for discrete properties; for continuous properties, representing the property by the vector direction and the property value by the vector magnitude; and for properties represented as graphs, a graph neural network embedding.
Preferably, the language model is a pluggable model capable of encoding sequences, such as LSTM, Transformer or GPT-3;
the language model predicts the probability distribution of the next amino acid property representation from the vector representation of the known amino acid properties; in the language model, the top and bottom layers encode the different amino acid properties independently without sharing parameters, the middle layers share information across the embeddings of the different properties through sparse self-attention, and the representation of each single amino acid property is predicted through information interaction among the multiple properties, enhanced by the known property information.
Preferably, when training the language model, a loss function is constructed by minimizing the error between the property labels and the property predictions, and the parameters of the language model are updated according to the loss function;
for discrete amino acid properties, the constructed loss function $\ell_b(p_{0:m})$ is:

$$\ell_b(p_{0:m}) = -\sum_{i=1}^{m} \sum_{c=1}^{C} \mathbb{1}[c = y_i]\, \log p^{i}(c) = -\sum_{i=1}^{m} \log p^{i}(y_i)$$

wherein $b$ denotes the batch index, $y$ the property label, $c$ a predicted property class, $C$ the total number of property classes, $p^{i}(c)$ the probability that the model assigns class $c$ at the $i$-th position, $p^{i}(y)$ the probability it assigns the label class, and $p_{0:m}$ the model's predictions over the amino acid property sequence of length $m$.
For continuous amino acid properties, the constructed loss function is:

$$\ell_b(x_{0:m}) = -\sum_{i=1}^{m} \log \mathcal{N}\left(x_i;\, \hat{\mu}_i,\, \hat{\sigma}_i^2\right)$$

wherein $m$ denotes the total number of amino acids, $\mathcal{N}(x; \hat{\mu}, \hat{\sigma}^2)$ the normal probability density over amino acid property values with mean $\hat{\mu}$ and variance $\hat{\sigma}^2$, $x$ the input amino acid property sequence, and $\mu$ and $\sigma$ the amino acid property mean and variance; $\mathcal{N}(x; \hat{\mu}, 1)$ denotes the density with the variance fixed to 1.
Preferably, the output of the language model is a predicted amino acid property representation, which is then mapped to property space using a single-layer linear network, and the loss function is calculated on the predicted amino acid properties.
Preferably, the sampler adopts a neural network, namely a multi-layer perceptron.
Preferably, the predicted amino acid is obtained by the sampler from the probability distribution over the amino acid properties; the amino acid property knowledge graph is used to determine the properties of the predicted amino acid, and those properties are appended to the known amino acid property sequence.
Compared with the prior art, the invention has at least the following beneficial effects:
(1) An amino acid property knowledge graph is constructed for the first time from existing amino acid data, providing finer-grained prior knowledge for protein representation.
(2) Amino acid property probabilities are predicted by the multi-property language model, and domain-specific proteins are generated by a domain sampler. This differs from existing generative models based on amino acid symbols, whose generated sequences are fixed symbols that cannot reflect the properties required at the current position and whose generation space is limited to the 20 natural amino acids. The language model of the invention can describe the properties required at the current position and is interpretable; biologists can even design new artificial amino acids from the properties the model describes, improving the diversity of biological materials. The sampler takes the multiple amino acid property signals as input and can better weigh which amino acid to select to meet the requirements of the current position.
(3) The property embedding scheme of the invention uses knowledge-graph enhancement to map continuous and discrete properties into vector space for use by the language model.
(4) Unlike existing single-stream language modeling generation models, the invention proposes a mixture-of-experts system over the different amino acid properties, allowing limited communication between the properties so as to learn property-specific sequence patterns that guide generation.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for generating a property decoupling protein based on a language model provided in the examples;
FIG. 2 is a schematic representation of the amino acid properties provided in the examples;
FIG. 3 is a schematic diagram of a pre-training and fine-tuning process for a language model provided by an embodiment.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions and advantages more apparent. It should be understood that the detailed description is given by way of example only and is not intended to limit the scope of the invention.
FIG. 1 is a flow chart of the method for generating property-decoupled proteins based on a language model according to an embodiment. As shown in FIG. 1, the method comprises the following steps:
and 1, constructing an amino acid quality knowledge graph according to the amino acid properties.
In the embodiment, the physicochemical properties that play an important role in protein function, and their degrees of importance, are obtained from chemical experiments on amino acids, and the amino acid property knowledge graph is constructed from them as the basis for decoupling the amino acids.
As shown in FIG. 2, the amino acid properties include category, solubility, radius, charge, polarity, composition, etc. In the amino acid property knowledge graph, each amino acid and one of its properties are expressed as a triplet (amino acid, property strength, property category); the knowledge graph is constructed from these triplets, and in property space the property strength is represented by the modulus of a vector and the property category by its direction, yielding the embedding of each property in the knowledge graph.
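To make the triplet scheme concrete, here is a small sketch of how such (amino acid, property strength, property category) triples could be stored and embedded so that the vector direction encodes the category and the modulus encodes the strength; the amino acids, categories, strengths and dimensions below are illustrative placeholders, not data from the patent.

```python
import numpy as np

# Illustrative (amino acid, property strength, property category) triples;
# the strengths are placeholders, not measured physicochemical data.
TRIPLES = [
    ("A", 0.31, "hydrophobicity"),
    ("A", 0.52, "radius"),
    ("R", 0.95, "charge"),
    ("R", 0.12, "hydrophobicity"),
]

DIM = 16
rng = np.random.default_rng(0)

def unit_vector(dim, rng):
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# One fixed unit direction per property category: direction encodes the category.
category_dirs = {cat: unit_vector(DIM, rng) for cat in {t[2] for t in TRIPLES}}

def property_embedding(strength, category):
    """Vector modulus encodes the property strength, direction the category."""
    return strength * category_dirs[category]

# The knowledge graph as an adjacency map: amino acid -> its property vectors.
kg = {}
for aa, strength, cat in TRIPLES:
    kg.setdefault(aa, []).append(property_embedding(strength, cat))
```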
Step 2, obtain protein data and construct a pre-training data set.
In the embodiment, the protein data are derived from protein sequence corpora, covering protein data measured by protein sequencing experiments in the biological domain. Each piece of protein data is an amino acid sequence composed of amino acids. Protein data that cannot serve as samples, namely those whose amino acid sequences are longer than 2048, are removed; the remaining sequences of similar length are grouped into samples whose total length does not exceed 2048 amino acids.
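As an illustration of this data preparation step, the sketch below drops over-length sequences and packs the remainder into samples of at most 2048 amino acids; the greedy length-sorted packing is an assumption about the intended grouping, not the patent's stated recipe.

```python
MAX_LEN = 2048

def build_samples(sequences):
    """Drop sequences longer than MAX_LEN, then greedily pack the remaining
    length-sorted sequences into samples whose total length stays within
    MAX_LEN. The packing scheme is an assumed illustration."""
    kept = sorted((s for s in sequences if len(s) <= MAX_LEN), key=len)
    samples, current, total = [], [], 0
    for seq in kept:
        if current and total + len(seq) > MAX_LEN:
            samples.append(current)
            current, total = [], 0
        current.append(seq)
        total += len(seq)
    if current:
        samples.append(current)
    return samples
```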
Step 3, decouple the protein data based on the amino acid property knowledge graph and form vector representations.
In the embodiment, each piece of protein data is decoupled into an amino acid property sequence according to the amino acid property knowledge graph, and the property sequence is mapped from property space to vector space to obtain its vector representation.
Specifically, decoupling each piece of protein data into an amino acid property sequence according to the amino acid property knowledge graph comprises: looking up, in the knowledge graph, the properties corresponding to each amino acid in the protein data, and replacing each amino acid by its properties to obtain the amino acid property sequence.
For a given piece of protein data, each amino acid can be mapped, via the amino acid property knowledge graph, into three vectors relating to, for example, solubility, radius and polarity; such a piece of protein data is thus mapped into three amino acid property sequence vectors.
In the embodiment, to feed the amino acid properties into the language model, they must also be mapped from property space to a dense vector space. Different mapping modes are adopted for different amino acid properties: a dictionary embedding for discrete properties; for continuous properties, the vector direction represents the property and the vector magnitude the property value; and for properties represented as graphs, a graph neural network embedding. A conversion from property space to vector space is thereby established through each property and its embedding mode, yielding the vector representation of the amino acid property sequence.
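A compact PyTorch sketch of the three embedding modes follows; the class name, sizes, and the linear stand-in for the graph neural network are assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn

class PropertyEmbedder(nn.Module):
    """Sketch of the three mapping modes: dictionary lookup for discrete
    properties, direction-times-magnitude for continuous properties, and a
    graph encoder (stubbed by a linear layer) for graph-typed properties."""

    def __init__(self, n_discrete, dim):
        super().__init__()
        self.discrete = nn.Embedding(n_discrete, dim)    # dictionary mode
        self.direction = nn.Parameter(torch.randn(dim))  # continuous mode
        self.graph_encoder = nn.Linear(dim, dim)         # stand-in for a GNN

    def embed_discrete(self, idx):
        return self.discrete(idx)

    def embed_continuous(self, value):
        d = self.direction / self.direction.norm()       # direction encodes the property
        return value.unsqueeze(-1) * d                   # magnitude encodes its value

    def embed_graph(self, node_feats):
        # A real implementation would run a graph neural network over the
        # property graph; a pooled linear layer keeps the sketch self-contained.
        return self.graph_encoder(node_feats.mean(dim=-2))
```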
Step 4, model and train the language model on a causal prediction task using the vector representation of the amino acid property sequence, to optimize the parameters of the language model.
In the embodiment, the language model predicts the probability distribution of the next amino acid's property representation from the vector representation of the known amino acid properties; the property representation at the current position is obtained, on the basis of the known property information, by exchanging information between the properties. The language model is a pluggable model capable of encoding sequences, such as LSTM, Transformer or GPT-3; the embodiment adopts GPT-3, a decoder-only Transformer. As shown in FIG. 3, in the language model, the top and bottom layers encode the different amino acid properties independently without sharing parameters; the middle layers share information across the embeddings of the different properties through sparse self-attention; and the representation of each single amino acid property is predicted through information interaction among the multiple properties, enhanced by the known property information.
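The layer layout can be sketched in PyTorch as follows; dense self-attention and linear per-property encoders stand in for the sparse self-attention and GPT-3 blocks named above, so this is a structural illustration under stated assumptions rather than the patent's exact model.

```python
import torch
import torch.nn as nn

class MultiPropertyLM(nn.Module):
    """Private bottom and top layers per property (no parameter sharing)
    around a shared middle stack in which the property streams exchange
    information via self-attention. A causal attention mask would be
    required for actual causal language modeling; it is omitted here."""

    def __init__(self, n_props, dim=64, n_mid=2):
        super().__init__()
        self.bottoms = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_props)])
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.middle = nn.TransformerEncoder(layer, num_layers=n_mid)
        self.tops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_props)])

    def forward(self, prop_seqs):
        # prop_seqs: one (batch, seq_len, dim) tensor per property stream.
        hs = [bottom(x) for bottom, x in zip(self.bottoms, prop_seqs)]
        mixed = self.middle(torch.cat(hs, dim=1))   # streams attend to each other
        chunks = mixed.chunk(len(hs), dim=1)
        return [top(h) for top, h in zip(self.tops, chunks)]
```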
In the embodiment, to compute the loss function, the amino acid property representations produced by the language model are mapped back to the corresponding property spaces according to how each property is expressed. Specifically, a single-layer linear network is used as the mapping head for both discrete and continuous properties, mapping each property representation to its property space to obtain the amino acid property.
In the embodiment, when training the language model, a loss function is constructed by minimizing the error between the property labels and the property predictions, and the parameters of the language model are updated according to the loss function.
In causal language modeling of amino acid properties, a training batch consists of N pieces of protein data; within each piece, the current position can only access the property information of the preceding positions. The model predicts the amino acid properties at each position one by one, reducing the model's perplexity over the whole amino acid property sequence.
The causal language modeling loss function for the discrete amino acid properties of a protein is:

$$\ell_b(p_{0:m}) = -\sum_{i=1}^{m} \sum_{c=1}^{C} \mathbb{1}[c = y_i]\, \log p^{i}(c) = -\sum_{i=1}^{m} \log p^{i}(y_i)$$

where $b$ is the batch index, $y$ is the property label, $c$ is a predicted property class, $C$ is the total number of property classes, $p^{i}(c)$ is the probability that the model assigns class $c$ at the $i$-th position, $p^{i}(y)$ is the probability it assigns the label class, and $p_{0:m}$ denotes the model's predictions over the length-$m$ amino acid property sequence. The total loss is the sum of the prediction losses over a training batch.
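In code, this discrete loss is the usual causal cross-entropy with targets shifted one position; the tensor shapes and function name below are illustrative.

```python
import torch
import torch.nn.functional as F

def discrete_property_loss(logits, labels):
    """Causal cross-entropy for one discrete property.
    logits: (batch, seq_len, C) scores after the single-layer mapping head;
    labels: (batch, seq_len) property classes. Position i is predicted from
    positions < i, so the targets are shifted one step to the left."""
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = labels[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)
```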
The causal language modeling loss function for the continuous amino acid properties of a protein is:

$$\ell_b(x_{0:m}) = -\sum_{i=1}^{m} \log \mathcal{N}\left(x_i;\, \hat{\mu}_i,\, \hat{\sigma}_i^2\right)$$

where $m$ is the total number of amino acids, $\mathcal{N}(x; \hat{\mu}, \hat{\sigma}^2)$ is the normal probability density over amino acid property values with mean $\hat{\mu}$ and variance $\hat{\sigma}^2$, $x$ is the input amino acid property sequence, and $\mu$ and $\sigma$ are the amino acid property mean and variance; $\mathcal{N}(x; \hat{\mu}, 1)$ denotes the density with the variance fixed to 1. The total loss is the sum of the prediction losses over a training batch.
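The continuous loss corresponds to a Gaussian negative log-likelihood, sketched below with PyTorch's built-in helper; passing no variance fixes it to 1, matching the unit-variance density above, in which case the loss reduces to squared error up to a constant.

```python
import torch
import torch.nn.functional as F

def continuous_property_loss(mu, x, sigma=None):
    """Causal Gaussian negative log-likelihood for one continuous property.
    mu: (batch, seq_len) predicted means; x: (batch, seq_len) observed values;
    sigma: optional predicted standard deviations (None fixes variance to 1)."""
    var = torch.ones_like(mu) if sigma is None else sigma ** 2
    pred_mu, pred_var = mu[:, :-1], var[:, :-1]   # causal shift: predict position i+1
    target = x[:, 1:]
    return F.gaussian_nll_loss(pred_mu, target, pred_var)
```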
Step 5, generate proteins with the parameter-optimized language model.
In the downstream task, as shown in FIG. 3, a domain-specific protein family is used as the fine-tuning data set; the parameter-optimized language model generates amino acid property representations on this data set, and the parameters of the domain-specific sampler are optimized using the property representations and the corresponding amino acids as samples, so that the optimized sampler can generate proteins with the desired function.
In the embodiment, the protein generation process comprises: predicting the probability distribution of the next amino acid property from the vector representation of the known amino acid property sequence using the parameter-optimized language model; predicting the amino acid from that distribution using the sampler; appending the predicted amino acid's properties to the known property sequence; repeating until generation is complete; and converting the final amino acid property sequence into an amino acid sequence as the generated protein, thereby generating domain-specific proteins with the language model and the sampler working together.
In the embodiment, the sampler employs a neural network, namely a multi-layer perceptron.
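Putting the pieces together, the following is a hedged sketch of the generation loop; the model, sampler and knowledge-graph lookup interfaces are assumptions for illustration, not the patent's exact API.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def generate(model, sampler, seed_props, max_len, prop_table):
    """Sketch of the generation loop. `model` maps a property prefix of shape
    (1, t, dim) to per-position property representations, `sampler` is the MLP
    mapping a property representation to logits over the amino acid vocabulary,
    and `prop_table` plays the role of the knowledge graph, mapping an amino
    acid id to its property vector."""
    seq, prefix = [], seed_props
    for _ in range(max_len):
        prop_repr = model(prefix)[:, -1]    # next-property prediction
        logits = sampler(prop_repr)         # sampler scores the amino acids
        aa = int(torch.distributions.Categorical(logits=logits).sample())
        seq.append(aa)                      # emit the amino acid...
        nxt = prop_table[aa].view(1, 1, -1) # ...and look up its properties
        prefix = torch.cat([prefix, nxt], dim=1)  # extend the known prefix
    return seq
```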
The foregoing has described in detail the preferred embodiments and advantages of the invention. It should be appreciated that the description is merely illustrative of the presently preferred embodiments; all changes, additions, substitutions and equivalents made within the spirit and principle of the invention are intended to be included within its scope.

Claims (9)

1. A method for generating property-decoupled proteins based on a language model, comprising the following steps:
constructing an amino acid property knowledge graph according to the amino acid properties;
obtaining protein data, decoupling each piece of protein data into an amino acid property sequence according to the amino acid property knowledge graph, and mapping the amino acid property sequence from property space to vector space to obtain a vector representation of the amino acid property sequence;
modeling and training the language model on a causal prediction task using the vector representation of the amino acid property sequence, so as to optimize the parameters of the language model;
predicting the probability distribution of the next amino acid property from the vector representation of the known amino acid property sequence using the parameter-optimized language model; predicting the amino acid from that probability distribution using a sampler; appending the predicted amino acid's properties to the known amino acid property sequence; repeating until generation is complete; and converting the final amino acid property sequence into an amino acid sequence as the generated protein.
2. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein in the amino acid property knowledge graph each amino acid and one of its properties are expressed as a triplet (amino acid, property strength, property category); the knowledge graph is constructed from these triplets, the property strength is represented in property space by the modulus of a vector and the property category by its direction, yielding the embedding of each property in the knowledge graph.
3. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein decoupling each piece of protein data into an amino acid property sequence according to the amino acid property knowledge graph comprises:
looking up, in the amino acid property knowledge graph, the properties corresponding to each amino acid in the protein data, and replacing each amino acid by its properties to obtain the amino acid property sequence.
4. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein mapping the amino acid property sequence from property space to vector space comprises:
adopting different mapping modes for different amino acid properties: a dictionary embedding for discrete properties; for continuous properties, representing the property by the vector direction and the property value by the vector magnitude; and for properties represented as graphs, a graph neural network embedding.
5. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein the language model is a pluggable model capable of encoding sequences, such as LSTM, Transformer or GPT-3;
the language model predicts the probability distribution of the next amino acid property representation from the vector representation of the known amino acid properties; in the language model, the top and bottom layers encode the different amino acid properties independently without sharing parameters, the middle layers share information across the embeddings of the different properties through sparse self-attention, and the representation of each single amino acid property is predicted through information interaction among the multiple properties, enhanced by the known property information.
6. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein when training the language model, a loss function is constructed by minimizing the error between the property labels and the property predictions, and the parameters of the language model are updated according to the loss function;
for discrete amino acid properties, the constructed loss function $\ell_b(p_{0:m})$ is:

$$\ell_b(p_{0:m}) = -\sum_{i=1}^{m} \sum_{c=1}^{C} \mathbb{1}[c = y_i]\, \log p^{i}(c) = -\sum_{i=1}^{m} \log p^{i}(y_i)$$

wherein $b$ denotes the batch index, $y$ the property label, $c$ a predicted property class, $C$ the total number of property classes, $p^{i}(c)$ the probability that the model assigns class $c$ at the $i$-th position, $p^{i}(y)$ the probability it assigns the label class, and $p_{0:m}$ the model's predictions over the amino acid property sequence of length $m$;
for continuous amino acid properties, the constructed loss function is:

$$\ell_b(x_{0:m}) = -\sum_{i=1}^{m} \log \mathcal{N}\left(x_i;\, \hat{\mu}_i,\, \hat{\sigma}_i^2\right)$$

wherein $m$ denotes the total number of amino acids, $\mathcal{N}(x; \hat{\mu}, \hat{\sigma}^2)$ the normal probability density over amino acid property values with mean $\hat{\mu}$ and variance $\hat{\sigma}^2$, $x$ the input amino acid property sequence, and $\mu$ and $\sigma$ the amino acid property mean and variance; $\mathcal{N}(x; \hat{\mu}, 1)$ denotes the density with the variance fixed to 1.
7. The method for generating property-decoupled proteins based on a language model according to claim 6, wherein the output of the language model is a predicted amino acid property representation, which is then mapped to property space using a single-layer linear network, and the loss function is calculated on the predicted amino acid properties.
8. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein the sampler adopts a neural network, namely a multi-layer perceptron.
9. The method for generating property-decoupled proteins based on a language model according to claim 1, wherein the sampler obtains the predicted amino acid from the probability distribution over the amino acid properties, the amino acid property knowledge graph is used to determine the properties of the predicted amino acid, and those properties are appended to the known amino acid property sequence.

Priority Applications (1)

Application number: CN202211686617.4A
Priority date / filing date: 2022-12-26
Title: Method for generating property-decoupled proteins based on a language model

Publications (1)

Publication number: CN116013407A
Publication date: 2023-04-25

Family

ID: 86026113



Cited By (2)

* Cited by examiner, † Cited by third party

* CN116935952A, filed 2023-09-18, published 2023-10-24, ZJU Hangzhou Global Scientific and Technological Innovation Center (浙江大学杭州国际科创中心): Method and device for training protein prediction model based on graph neural network
* CN116935952B, filed 2023-09-18, published 2023-12-01, ZJU Hangzhou Global Scientific and Technological Innovation Center (浙江大学杭州国际科创中心): Method and device for training protein prediction model based on graph neural network


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination