WO2023109714A1 - Multi-mode information fusion method and system for protein representative learning, and terminal and storage medium - Google Patents

Multi-mode information fusion method and system for protein representative learning, and terminal and storage medium

Info

Publication number
WO2023109714A1
WO2023109714A1 (PCT/CN2022/138208)
Authority
WO
WIPO (PCT)
Prior art keywords
protein
multimodal
modal
learning model
feature extractor
Application number
PCT/CN2022/138208
Other languages
French (fr)
Chinese (zh)
Inventor
胡奕绅
殷鹏
胡帆
Original Assignee
深圳先进技术研究院
Application filed by 深圳先进技术研究院
Publication of WO2023109714A1 publication Critical patent/WO2023109714A1/en

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Definitions

  • the application belongs to the technical field of medical data processing, and specifically relates to a multimodal information fusion method, system, terminal and storage medium for protein representation learning.
  • Protein representation learning is a very important research topic in the field of bioinformatics. It plays a key role in predicting protein-protein interactions, protein-drug interactions, and protein-gene interactions. A good data representation should be able to cover the information of the object itself in multiple directions, so that the reasoning process of downstream tasks has more available feature support.
  • in the computational study of proteins, proteins must be converted into data that computers can process, and features must be extracted from the raw data before it is input into a model; this process is called representation learning, and good representation learning is of major help to the performance of downstream tasks.
  • the representation learning of proteins can be divided into single-modal representation and multi-modal representation.
  • the sequence of a protein is similar to a text sequence, which can be modeled using techniques from the NLP field.
  • some studies used CNNs to perform one-dimensional convolution on protein sequences and extract sequence features for subsequent tasks; other studies used RNN models, which excel at time-series data, and likewise achieved good results.
  • recently, many researchers have tried the Transformer, which has made breakthroughs in the NLP and CV fields, pre-training it on large-scale protein sequences and achieving better results in downstream tasks.
  • the structural modality of the protein is also crucial to understanding the protein itself.
  • there are fewer studies on modeling protein structure than on sequences: some studies convert 3D protein structures into images and then use CNNs to extract features that represent the protein, while other studies flatten the 3D structure into an adjacency matrix of amino acid nodes and then model it with graph neural network algorithms.
  • the key is how to fuse unimodal information.
  • most studies use different feature extractors to extract the unimodal information and then concatenate or sum the embeddings of the different modalities to obtain a new embedding as the multimodal representation; some instead feed the concatenated or summed embedding into a new interaction network, such as a Transformer, to obtain an interactive embedding.
  • One of the purposes of the present application is to provide a multimodal information fusion method for protein representation learning, including the following steps:
  • the unimodal feature extractor is used as a feature extractor for protein sequences
  • the multimodal fusion module updates the amino acid token embedding of the single-modal feature extractor, so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractor;
  • the training set trains the learning model; the validation set measures the effect of the learning model and selects the best-performing parameters as the parameters of the learning model; and the test set independently tests the generalization ability of the learning model.
  • the step of preprocessing the open source protein data specifically includes the following steps:
  • the sequence data of the protein are extracted from the open-source protein data sets; each sequence is composed of 20 letters representing 20 kinds of amino acids, and the 3D structure of the protein is converted into an adjacency matrix graph.
  • in the step of constructing a single-modal feature extractor, the method specifically includes:
  • the unimodal feature extractor is a pre-trained Transformer model.
  • in the step of constructing the multimodal fusion module, the following steps are specifically included:
  • average pooling is performed on the sequence feature matrix and the structure feature matrix so that each amino acid's feature vector is reduced to one representative value, where D_seq denotes the feature dimension of each amino acid in the sequence and D_struc denotes the feature dimension of each amino acid in the structure;
  • the pooled vectors of the sequence and the structure are concatenated and then transformed by a fully connected network into a vector M_comp containing multimodal information;
  • the multimodal information compression vector M_comp is redistributed to each modality to calibrate the single-modal information, the redistribution introducing a fully connected transformation layer for each modality, with the formulas T_seq = W_seq · M_comp + b_seq and T_struc = W_struc · M_comp + b_struc;
  • the redistributed modal vectors are activated through an activation function and used as gating switches that limit the contribution of each amino acid to the overall task, where σ refers to the sigmoid function and ⊙ refers to the Hadamard product;
  • after multiplication with the activated gating vectors, the reconstructed unimodal vectors are obtained as the input of the next unimodal feature extractor layer.
  • the following steps are specifically included:
  • the original protein data passes through N_e layers of early unimodal feature extractors: the sequence passes through the encoding layers of the Transformer model, the structure passes through graph attention network layers, and the output represents unimodal vector representations from which high-level semantics have been extracted;
  • the single modality is calibrated by the multimodal information and continues through N_l feature extractor layers for further feature mining on the calibrated representations;
  • the [cls] vectors of the two modalities after the calibrated feature mining are concatenated, passed through a feedforward neural network, and then concatenated with the [cls] vectors obtained from the early unimodal feature extractors;
  • after a second feedforward neural network, the learning model is obtained.
  • An auxiliary loss is added to update the parameters of the learning model.
  • the second purpose of this application is to provide a multimodal information fusion system for protein representation learning, including:
  • Data processing unit: used to preprocess open-source protein data;
  • Classification unit: used to divide the protein data set into a training set, a validation set and a test set;
  • Single-modal feature extractor construction unit: used to construct a single-modal feature extractor that serves as the feature extractor for protein sequences;
  • Multimodal fusion module construction unit: used to construct a multimodal fusion module, which updates the amino acid token embedding of the single-modal feature extractor so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractor;
  • Learning model construction unit: used to build a learning model based on the multimodal fusion module;
  • Training unit: the training set trains the learning model; the validation set measures the effect of the learning model and selects the best-performing parameters as the parameters of the learning model; and the test set independently tests the generalization ability of the learning model.
  • the third purpose of the present application is to provide a terminal, the terminal including a processor and a memory coupled to the processor, wherein
  • the memory stores program instructions for implementing the multimodal information fusion method for protein representation learning;
  • the processor is configured to execute the program instructions stored in the memory to control the multimodal information fusion.
  • the fourth purpose of the present application is to provide a storage medium storing program instructions executable by a processor, the program instructions being used to execute the multimodal information fusion method for protein representation learning.
  • the multimodal information fusion method, system, terminal and storage medium for protein representation learning provided by this application use a strategy of early extraction, mid-term fusion and late prediction, so that each single-modal model can fully extract the high-level semantic information of its own modality before fusion, after which a feedforward neural network performs task prediction at the late stage; at the same time, a multimodal fusion module is proposed that enables fine-grained interaction of the different modal information at every network layer during mid-term fusion, so that the modalities are better fused and propagated;
  • at the last layer of the feature extractor in the late prediction stage, the fused multimodal embedding and the earlier single-modal embeddings are concatenated together as the representation of the protein itself, so that the original single-modal information is preserved to the greatest extent.
  • in addition, when the loss function is designed, the feature extraction networks of the different layers in the late prediction stage each predict a result, which serves as an auxiliary loss added to the final loss; the introduction of the auxiliary loss helps the model converge faster and reach better performance.
  • Fig. 1 is a flow chart of the steps of the multimodal information fusion method for protein representation learning provided by the embodiment of the present application.
  • Fig. 2 is an adjacency matrix diagram of proteins provided in the examples of the present application.
  • Fig. 3 is a schematic diagram of a multi-modal fusion module provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a learning module provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of the multi-modal information fusion system for protein representation learning provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a terminal structure provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a storage medium provided by an embodiment of the present application.
  • first and second are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features. Thus, a feature defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • “plurality” means two or more, unless otherwise specifically defined.
  • Figure 1 is a flow chart of the steps of the multimodal information fusion method for protein representation learning provided by this application, which includes the following steps:
  • Step S110: Preprocess the open-source protein data.
  • these data sets cover various tasks, including predicting protein fluorescence, protein secondary structure, remote protein homology and protein stability; protein sequence data are extracted from these data sets, each sequence consisting of 20 letters (representing 20 amino acids), and the 3D structure of the protein is converted into an adjacency matrix, also known as a contact map.
  • Step S120: Divide the protein data set into a training set, a validation set and a test set.
  • the processed data set is divided into training, validation and test sets: the training set is used to adjust the model parameters to fit the target, the validation set is used to select the optimal parameters, and the test set is used to evaluate the final effect of the model.
  • Step S130: Construct a single-modal feature extractor, which serves as the feature extractor for protein sequences.
  • this application selects TAPE, a pre-trained Transformer model, as the feature extractor for protein sequences.
  • the pre-training strategy gives the model prior information before training, which benefits the model's inference; the Transformer can capture amino acid relationships across the whole sequence and supports parallelization.
  • the topological nature of protein structure makes it well suited to graph algorithms.
  • this application uses an effective graph neural network, specifically the graph attention network GAT, which likewise uses the attention mechanism to capture the relationships between neighbor nodes and target nodes.
  • Step S140: Construct a multimodal fusion module; the multimodal fusion module updates the amino acid token embedding of the single-modal feature extractor, so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractor.
  • Step 1: perform average pooling on the sequence feature matrix and the structure feature matrix, so that each amino acid's feature vector is reduced to one representative value, where
  • D_seq represents the feature dimension of each amino acid in the sequence,
  • D_struc represents the feature dimension of each amino acid in the structure;
  • Step 2: concatenate the pooled vectors of the sequence and the structure, and transform them through a fully connected network into a vector containing multimodal information, with the formula M_comp = W[M_seq, M_struc] + b;
  • this step is a process of multimodal information interaction and compression.
  • Step 3: redistribute the multimodal information compression vector M_comp to each modality to calibrate the single-modal information; the redistribution introduces a fully connected transformation layer for each modality, with the formulas T_seq = W_seq · M_comp + b_seq and T_struc = W_struc · M_comp + b_struc;
  • Step 4: activate the redistributed modal vectors through an activation function and use them as gating switches to limit the contribution of each amino acid to the overall task, where
  • σ refers to the sigmoid function,
  • ⊙ refers to the Hadamard product;
  • Step 5: after multiplication with the activated gating vectors, the reconstructed unimodal vectors are obtained as the input of the next unimodal feature extractor layer.
  • in the step of constructing the multimodal fusion module, this application applies a method of calibration and reconstruction; specifically, multimodal information interaction is used to update the amino acid token embedding of each single modality, so that single-modal information that might originally be ambiguous carries multimodal guidance and becomes clearer for pattern recognition.
  • Step S150: Construct a learning model based on the multimodal fusion module.
  • Figure 4 is a schematic diagram of the principle of building a learning model based on the multimodal fusion module, which specifically includes the following steps:
  • Step S151: Add a special token, named [cls], to the original inputs of the protein sequence and structure; the [cls] of the sequence is placed at the front of the entire sequence, and the [cls] of the structure establishes a virtual full connection with all amino acids.
  • Step S152: The original protein data passes through N_e layers of early unimodal feature extractors; the sequence passes through the encoding layers of the Transformer model, the structure passes through graph attention network layers, and the output represents unimodal vector representations from which high-level semantics have been extracted.
  • Step S153: Insert the multimodal fusion module for mid-term fusion.
  • each layer adds interaction between the modalities by inserting the multimodal fusion network described in Fig. 3, passing through a total of N_m layers.
  • Step S154: After the mid-term fusion, the single modality has been calibrated by the multimodal information and continues through N_l feature extractor layers for further feature mining on the calibrated representations.
  • Step S155: Concatenate the [cls] vectors of the two modalities after the calibrated feature mining, pass them through a feedforward neural network, and then concatenate the result with the [cls] vectors obtained from the early unimodal feature extractors.
  • the concatenated feature vector, passed through a learnable feedforward neural network, yields a more holistic feature vector and more accurate prediction results.
  • since multimodal processing may lose some single-modal information during transmission, concatenation with the single-modal vectors replenishes that information.
  • Step S156: Obtain the learning model through a second feedforward neural network.
  • the multimodal fusion strategy provided by the embodiments of the present application, through early extraction, mid-term fusion and late prediction, lets the model learn single-modal and multimodal information more fully;
  • the multimodal representation obtained at the late stage is not used directly for prediction; instead, the early unimodal representations are added, so that the unimodal information lost during network propagation can be supplemented at the end.
  • Step S157: Add an auxiliary loss to update the parameters of the learning model.
  • each feature extraction layer in the later prediction stage of this application will output the results to predict the final goal.
  • the resulting loss is used as an auxiliary loss, which is added to the main loss to update the parameters of the model.
  • Step S160: the training set trains the learning model; the validation set measures the effect of the learning model and selects the best-performing parameters as the parameters of the learning model; and the test set independently tests the generalization ability of the learning model.
  • FIG. 5 is a schematic structural diagram of the multimodal information fusion system for protein representation learning provided by this application, including: a data processing unit 110, used to preprocess open-source protein data; a classification unit 120, used to divide the protein data set into a training set, a validation set and a test set; a single-modal feature extractor construction unit 130, used to construct a single-modal feature extractor that serves as the feature extractor for protein sequences; a multimodal fusion module construction unit 140, used to construct a multimodal fusion module that updates the amino acid token embedding of the single-modal feature extractor so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractor; a learning model construction unit 150, used to build a learning model based on the multimodal fusion module; and a training unit 160, in which the training set trains the learning model, the validation set measures the effect of the learning model and selects the best-performing parameters as the parameters of the learning model, and the test set independently tests the generalization ability of the learning model.
  • FIG. 6 is a schematic diagram of a terminal structure according to an embodiment of the present application.
  • the terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51 .
  • the memory 52 stores program instructions for implementing the multimodal information fusion method for protein representation learning.
  • the processor 51 is configured to execute the program instructions stored in the memory 52 to control the multimodal information fusion.
  • the processor 51 may also be referred to as a CPU (central processing unit).
  • the processor 51 may be an integrated circuit chip with signal processing capabilities.
  • the processor 51 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • FIG. 7 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
  • the storage medium of the embodiment of the present application stores a program file 61 capable of implementing all of the above methods; the program file 61 can be stored in the storage medium in the form of a software product and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks or optical discs, as well as terminal devices such as computers, servers, mobile phones and tablets.

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Physiology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present application provides a multimodal information fusion method and system for protein representation learning, together with a terminal and a storage medium. A strategy of early extraction, mid-term fusion and late prediction is used, so that each single-modal model fully extracts the high-level semantic information of its own modality before fusion, and task prediction is then performed by a feedforward neural network at the late stage. A multimodal fusion module is further provided, so that during mid-term fusion the different modal information at each network layer can interact in a fine-grained way and the modalities are better fused and propagated. At the last layer of the feature extractor in the late prediction stage, the fused multimodal embedding and the earlier single-modal embeddings are concatenated together to represent the protein itself, so that the original single-modal information is retained to the greatest extent.

Description

Multimodal information fusion method, system, terminal and storage medium for protein representation learning

Technical Field
The present application belongs to the technical field of medical data processing, and specifically relates to a multimodal information fusion method, system, terminal and storage medium for protein representation learning.
Background Art
Protein representation learning is a very important research topic in bioinformatics; it plays a key role in predicting protein-protein, protein-drug and protein-gene interactions. A good data representation should cover the information of the object itself from multiple directions, so that the reasoning process of downstream tasks has more usable features to rely on.
In the computational study of proteins, proteins must be converted into data that computers can process, and features must be extracted from the raw data before it is input into a model. This process is called representation learning, and good representation learning is of major help to the performance of downstream tasks. The representation learning of proteins can be divided into single-modal representation and multimodal representation.
On the single-modal side, the features of sequences and structures are mainly learned separately. The sequence of a protein resembles a text sequence and can be modeled with techniques borrowed from the NLP field. In the past, some studies applied CNNs to perform one-dimensional convolution on protein sequences and used the extracted sequence features for subsequent tasks; other studies used RNN models, which excel at time-series data, and likewise achieved good results. Recently, many researchers have tried the Transformer, which has made breakthroughs in the NLP and CV fields, pre-training it on large-scale protein sequences and achieving better results in downstream tasks. In contrast to the sequence modality, the structural modality of a protein is equally crucial to understanding the protein itself. Modeling studies of protein structure are fewer than those of sequences: some convert the 3D protein structure into images and then use CNNs to extract features that represent the protein, while others flatten the 3D structure into an adjacency matrix of amino acid nodes and model it with graph neural network algorithms.
On the multimodal side, the key is how to fuse the single-modal information. Most studies use different feature extractors to extract the single-modal information and then concatenate or sum the embeddings of the different modalities to obtain a new embedding as the multimodal representation; some instead feed the concatenated or summed embedding into a new interaction network, such as a Transformer, to obtain an interactive embedding.
Many current multimodal fusion methods simply concatenate or sum the single-modal representations; such methods cannot learn the interaction information between modalities in a fine-grained way, and the resulting representation vector loses a great deal of information. Some studies do consider learning the interaction between modalities: they concatenate the data of the two modalities in the initial embedding layer of the raw data and then pass it into the Transformer's encoding layers to learn the relationships between tokens. Nevertheless, because this approach fuses the modalities at an early stage, each modality is fused with the others before its high-level semantic information has been fully extracted, and the performance on subsequent tasks is unsatisfactory. In addition, essentially all studies use the extracted multimodal representation directly downstream, but no matter how well the multimodal features are learned, some single-modal information is always lost in the transmission process.
Summary of the Invention
In view of this, it is necessary, given the defects of the prior art, to provide a multimodal information fusion method for protein representation learning that can preserve the original single-modal information to the greatest extent.
To solve the above problems, the present application adopts the following technical solutions:
One purpose of the present application is to provide a multimodal information fusion method for protein representation learning, including the following steps:
preprocessing open-source protein data;

dividing the protein data set into a training set, a validation set and a test set;

constructing a single-modal feature extractor, the single-modal feature extractor serving as the feature extractor for protein sequences;

constructing a multimodal fusion module, the multimodal fusion module updating the amino acid token embeddings of the single-modal feature extractors so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractors;

building a learning model based on the multimodal fusion module;

the training set training the learning model, the validation set measuring the effect of the learning model and selecting the best-performing parameters as the parameters of the learning model, and the test set being used to independently test the generalization ability of the learning model.
In some of these embodiments, the step of preprocessing the open-source protein data specifically includes the following step:

extracting protein sequence data from the open-source protein data sets, each sequence being composed of 20 letters representing 20 kinds of amino acids, and converting the 3D structure of the protein into an adjacency matrix graph.
In some of these embodiments, the step of constructing the single-modal feature extractor specifically includes:

the single-modal feature extractor is a pre-trained Transformer model.
In some of these embodiments, the step of constructing the multimodal fusion module specifically includes the following steps:

Average pooling is performed on the sequence feature matrix and the structure feature matrix, so that the feature vector of each amino acid is reduced to one representative value. Let $F_{seq} \in \mathbb{R}^{L_{seq} \times D_{seq}}$ and $F_{struc} \in \mathbb{R}^{L_{struc} \times D_{struc}}$ respectively denote the sequence feature matrix and the structure feature matrix before they enter the multimodal module, where $D_{seq}$ denotes the feature dimension of each amino acid in the sequence, $D_{struc}$ denotes the feature dimension of each amino acid in the structure, and $L_{seq}$ and $L_{struc}$ denote the amino acid lengths of the sequence and the structure; the two are in fact equal, i.e., $L_{seq} = L_{struc} = L$. The pooling is

$$M_{seq} = \operatorname{AvgPool}(F_{seq}) \in \mathbb{R}^{L_{seq}}, \qquad M_{struc} = \operatorname{AvgPool}(F_{struc}) \in \mathbb{R}^{L_{struc}}.$$

The pooled vectors of the sequence and the structure are concatenated and then transformed by a fully connected network into a vector containing multimodal information:

$$M_{comp} = W[M_{seq}, M_{struc}] + b,$$

where $W \in \mathbb{R}^{D_{comp} \times (L_{seq} + L_{struc})}$ and $b \in \mathbb{R}^{D_{comp}}$, with $D_{comp} = (L_{seq} + L_{struc})/5$.

The multimodal information compression vector $M_{comp}$ is then redistributed to each modality to calibrate the single-modal information; the redistribution introduces a fully connected transformation layer for each modality:

$$T_{seq} = W_{seq} M_{comp} + b_{seq} \in \mathbb{R}^{L_{seq}}, \qquad T_{struc} = W_{struc} M_{comp} + b_{struc} \in \mathbb{R}^{L_{struc}}.$$

The redistributed modal vectors are activated through an activation function and used as gating switches that limit the contribution of each amino acid to the overall task:

$$\widetilde{F}_{seq} = F_{seq} \odot \sigma(T_{seq}), \qquad \widetilde{F}_{struc} = F_{struc} \odot \sigma(T_{struc}),$$

where $\sigma$ denotes the sigmoid function and $\odot$ denotes the Hadamard product, the gate value of each amino acid being broadcast across its feature dimension.

After multiplication with the activated gating vectors, the reconstructed unimodal vectors are obtained and serve as the input of the next unimodal feature extractor layer.
In some of these embodiments, the step of building a learning model based on the multimodal fusion module specifically includes the following steps:

A special token named [cls] is added to the original inputs of the protein sequence and structure; the [cls] of the sequence is placed at the front of the entire sequence, and the [cls] of the structure establishes a virtual full connection with all amino acids.

The original protein data passes through $N_e$ layers of early unimodal feature extractors; the sequence passes through the encoding layers of the Transformer model, the structure passes through graph attention network layers, and the output represents unimodal vector representations from which high-level semantics have been extracted.

The multimodal fusion module is inserted for mid-term fusion.

After the mid-term fusion, the single modality has been calibrated by the multimodal information and continues through $N_l$ feature extractor layers for further feature mining on the calibrated representations.

The [cls] vectors of the two modalities after the calibrated feature mining are concatenated, passed through a feedforward neural network, and then concatenated with the [cls] vectors obtained from the early unimodal feature extractors.

A second feedforward neural network then yields the learning model.
In some of these embodiments, after the step of building a learning model based on the multimodal fusion module is completed, the following step is further included:

An auxiliary loss is added to update the parameters of the learning model.
The second purpose of the present application is to provide a multimodal information fusion system for protein representation learning, including:

a data processing unit, used to preprocess open-source protein data;

a classification unit, used to divide the protein data set into a training set, a validation set and a test set;

a single-modal feature extractor construction unit, used to construct a single-modal feature extractor that serves as the feature extractor for protein sequences;

a multimodal fusion module construction unit, used to construct a multimodal fusion module that updates the amino acid token embeddings of the single-modal feature extractors, so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractors;

a learning model construction unit, used to build a learning model based on the multimodal fusion module;

a training unit, in which the training set trains the learning model, the validation set measures the effect of the learning model and selects the best-performing parameters as the parameters of the learning model, and the test set independently tests the generalization ability of the learning model.
The third purpose of the present application is to provide a terminal, the terminal including a processor and a memory coupled to the processor, wherein

the memory stores program instructions for implementing the multimodal information fusion method for protein representation learning, and

the processor is configured to execute the program instructions stored in the memory to control the multimodal information fusion.
The fourth purpose of the present application is to provide a storage medium storing program instructions executable by a processor, the program instructions being used to execute the multimodal information fusion method for protein representation learning.
The above technical solutions of the present application have the following effects:

The multimodal information fusion method, system, terminal and storage medium for protein representation learning provided by the present application use a strategy of early extraction, mid-term fusion and late prediction, so that each single-modal model fully extracts the high-level semantic information of its own modality before fusion, after which a feedforward neural network performs task prediction at the late stage. At the same time, a multimodal fusion module is proposed that enables fine-grained interaction of the different modal information at every network layer during mid-term fusion, so that the modalities are better fused and propagated. At the last layer of the feature extractor in the late prediction stage, the fused multimodal embedding and the earlier single-modal embeddings are concatenated together as the representation of the protein itself, which preserves the original single-modal information to the greatest extent.

In addition, when the loss function is designed, the feature extraction networks of the different layers in the late prediction stage each predict a result, which serves as an auxiliary loss added to the final loss; the introduction of the auxiliary loss helps the model converge faster and reach better performance.
Description of Drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the accompanying drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the steps of the multimodal information fusion method for protein representation learning provided by an embodiment of the present application.

Fig. 2 is an adjacency matrix diagram of a protein provided by an embodiment of the present application.

Fig. 3 is a schematic diagram of the multimodal fusion module provided by an embodiment of the present application.

Fig. 4 is a schematic diagram of the learning module provided by an embodiment of the present application.

Fig. 5 is a schematic structural diagram of the multimodal information fusion system for protein representation learning provided by an embodiment of the present application.

Fig. 6 is a schematic diagram of a terminal structure provided by an embodiment of the present application.

Fig. 7 is a schematic structural diagram of a storage medium provided by an embodiment of the present application.
Detailed Description

The embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements or elements with identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they should not be construed as limiting it.

In the description of the present application, it should be understood that orientation or position terms such as "upper", "lower", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings, are only for convenience and simplification of the description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present application.

In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "plurality" means two or more, unless otherwise specifically defined.
Please refer to Fig. 1, a flow chart of the steps of the multimodal information fusion method for protein representation learning provided by the present application, which includes the following steps:

Step S110: Preprocess the open-source protein data.
In this embodiment, the open-source protein data sets cover various tasks, including predicting protein fluorescence, protein secondary structure, remote protein homology and protein stability. Protein sequence data are extracted from these data sets; each sequence consists of 20 letters (representing 20 amino acids), and the 3D structure of each protein is converted into an adjacency matrix, also called a contact map.

As shown in Fig. 2, the contact map indicates whether amino acids are in contact with each other in space: white regions indicate contact and black regions indicate no contact.
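The patent does not specify the contact criterion; a common convention in the literature is to mark two residues as contacting when their Cα atoms lie within a fixed distance such as 8 Å. Below is a minimal sketch under that assumption; the function name and threshold are illustrative, not taken from the application.

```python
import numpy as np

def contact_map(ca_coords: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Binary adjacency matrix from (L, 3) C-alpha coordinates: entry (i, j)
    is 1 when residues i and j lie within `threshold` angstroms of each other."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]  # (L, L, 3) displacements
    dist = np.linalg.norm(diff, axis=-1)                  # (L, L) pairwise distances
    return (dist < threshold).astype(np.float32)          # diagonal is 1 (self-contact)
```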
Step S120: Divide the protein data set into a training set, a validation set and a test set.

It can be understood that the processed data set is divided into training, validation and test sets: the training set is used to adjust the model parameters to fit the target, the validation set is used to select the optimal parameters, and the test set is used to evaluate the final effect of the model.
Step S130: Construct single-modal feature extractors; one single-modal feature extractor serves as the feature extractor for protein sequences.

In this embodiment, the application selects TAPE, a pre-trained Transformer model, as the feature extractor for protein sequences. The pre-training strategy gives the model prior information before training, which benefits the model's inference; the Transformer can capture amino acid relationships across the whole sequence and supports parallelization. For protein structure, the topological nature of the data makes it well suited to graph algorithms; this application selects an effective graph neural network, specifically the graph attention network GAT, which likewise uses the attention mechanism to capture the relationships between neighbor nodes and target nodes.
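As an illustration of the structure branch, the sketch below implements one dense graph-attention layer over the contact map in PyTorch. It follows the general GAT formulation (project node features, score every residue pair, mask to contacting pairs, softmax, aggregate) rather than the exact network used in this application; the class name and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGATLayer(nn.Module):
    """One graph-attention layer over a dense (L, L) contact map."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out, bias=False)
        self.att_src = nn.Linear(d_out, 1, bias=False)
        self.att_dst = nn.Linear(d_out, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (L, d_in) amino-acid features; adj: (L, L) binary contact map.
        h = self.proj(x)                                      # (L, d_out)
        scores = self.att_src(h) + self.att_dst(h).T          # (L, L) pair logits
        scores = F.leaky_relu(scores, negative_slope=0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend to contacts only
        alpha = torch.softmax(scores, dim=-1)                 # attention weights
        return F.elu(alpha @ h)                               # aggregate neighbors
```

Because the contact map's diagonal is nonzero (a residue always contacts itself), every softmax row has at least one valid entry.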
Step S140: Construct a multimodal fusion module; the multimodal fusion module updates the amino acid token embeddings of the single-modal feature extractors, so that each single modality carries multimodal information, which then serves as the input of the single-modal feature extractors.

Please refer to Fig. 3. The step of constructing the multimodal fusion module consists mainly of four stages, pooling, compression, redistribution and reconstruction, which specifically include the following steps:
Step 1: Perform average pooling on the sequence feature matrix and the structure feature matrix, so that the feature vector of each amino acid is reduced to one representative value. Assume $F_{seq} \in \mathbb{R}^{L_{seq} \times D_{seq}}$ and $F_{struc} \in \mathbb{R}^{L_{struc} \times D_{struc}}$ respectively denote the sequence feature matrix and the structure feature matrix before they enter the multimodal module, where $D_{seq}$ denotes the feature dimension of each amino acid in the sequence, $D_{struc}$ denotes the feature dimension of each amino acid in the structure, and $L_{seq}$ and $L_{struc}$ denote the amino acid lengths of the sequence and the structure; the two are in fact equal, i.e., $L_{seq} = L_{struc} = L$. The pooling is

$$M_{seq} = \operatorname{AvgPool}(F_{seq}) \in \mathbb{R}^{L_{seq}}, \qquad M_{struc} = \operatorname{AvgPool}(F_{struc}) \in \mathbb{R}^{L_{struc}}.$$

Step 2: Concatenate the pooled vectors of the sequence and the structure, and transform them through a fully connected network into a vector containing multimodal information:

$$M_{comp} = W[M_{seq}, M_{struc}] + b,$$

where $W \in \mathbb{R}^{D_{comp} \times (L_{seq} + L_{struc})}$ and $b \in \mathbb{R}^{D_{comp}}$, with $D_{comp} = (L_{seq} + L_{struc})/5$.

It can be understood that these choices limit the size of the model and improve its generalization ability; this step is a process of multimodal information interaction and compression.

Step 3: Redistribute the multimodal information compression vector $M_{comp}$ back to each modality to calibrate the single-modal information. The redistribution introduces a fully connected transformation layer for each modality:

$$T_{seq} = W_{seq} M_{comp} + b_{seq} \in \mathbb{R}^{L_{seq}}, \qquad T_{struc} = W_{struc} M_{comp} + b_{struc} \in \mathbb{R}^{L_{struc}}.$$

Step 4: Activate the redistributed modal vectors through an activation function and use them as gating switches that limit the contribution of each amino acid to the overall task:

$$\widetilde{F}_{seq} = F_{seq} \odot \sigma(T_{seq}), \qquad \widetilde{F}_{struc} = F_{struc} \odot \sigma(T_{struc}),$$

where $\sigma$ refers to the sigmoid function and $\odot$ refers to the Hadamard product, the gate value of each amino acid being broadcast across its feature dimension.
Step 5: After multiplication with the activated gating vectors, the reconstructed unimodal vectors are obtained and serve as the input of the next unimodal feature extractor layer.

It can be understood that the step of constructing the multimodal fusion module in this application is a method of calibration and reconstruction: multimodal information interaction is used to update the amino acid token embedding of each single modality, so that single-modal information that might originally be ambiguous carries multimodal guidance and becomes clearer for pattern recognition.
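To make the five steps concrete, here is a minimal PyTorch sketch of the module, assuming $L_{seq} = L_{struc} = L$ and using nn.Linear for the fully connected layers; the class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Pooling -> compression -> redistribution -> gated reconstruction
    (Steps 1-5 above), for one sequence/structure feature-matrix pair."""
    def __init__(self, L: int):
        super().__init__()
        d_comp = (2 * L) // 5                     # D_comp = (L_seq + L_struc) / 5
        self.compress = nn.Linear(2 * L, d_comp)  # W, b            (Step 2)
        self.to_seq = nn.Linear(d_comp, L)        # W_seq, b_seq    (Step 3)
        self.to_struc = nn.Linear(d_comp, L)      # W_struc, b_struc

    def forward(self, f_seq: torch.Tensor, f_struc: torch.Tensor):
        # f_seq: (L, D_seq), f_struc: (L, D_struc) per-amino-acid features.
        m = torch.cat([f_seq.mean(dim=1), f_struc.mean(dim=1)])       # Step 1: pool
        m_comp = self.compress(m)                                     # Step 2: compress
        g_seq = torch.sigmoid(self.to_seq(m_comp)).unsqueeze(-1)      # Steps 3-4:
        g_struc = torch.sigmoid(self.to_struc(m_comp)).unsqueeze(-1)  # gating vectors
        return f_seq * g_seq, f_struc * g_struc                       # Step 5: recalibrate
```

The module returns recalibrated feature matrices with the same shapes as its inputs, so it can be slotted between consecutive unimodal extractor layers, as the following steps describe.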
Step S150: Build a learning model based on the multimodal fusion module.

Please refer to Fig. 4, a schematic diagram of the principle of building a learning model based on the multimodal fusion module, which specifically includes the following steps:

Step S151: Add a special token, named [cls], to the original inputs of the protein sequence and structure; the [cls] of the sequence is placed at the front of the entire sequence, and the [cls] of the structure establishes a virtual full connection with all amino acids.

It can be understood that the purpose of introducing [cls] is to let [cls] represent the entire modality in the subsequent prediction.
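A sketch of this step, assuming per-modality learnable [cls] embeddings; all names are illustrative:

```python
import torch

def add_cls(f_seq, f_struc, adj, cls_seq, cls_struc):
    """Prepend [cls] embeddings (cls_seq: (D_seq,), cls_struc: (D_struc,))
    and give the structure [cls] a virtual full connection to all residues."""
    f_seq = torch.cat([cls_seq.unsqueeze(0), f_seq], dim=0)        # (L+1, D_seq)
    f_struc = torch.cat([cls_struc.unsqueeze(0), f_struc], dim=0)  # (L+1, D_struc)
    L = adj.shape[0]
    adj_cls = torch.ones(L + 1, L + 1, dtype=adj.dtype)            # row/col 0 = [cls]
    adj_cls[1:, 1:] = adj                                          # keep real contacts
    return f_seq, f_struc, adj_cls
```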
Step S152: The original protein data passes through $N_e$ layers of early unimodal feature extractors; the sequence passes through the encoding layers of the Transformer model, the structure passes through graph attention network layers, and the output represents unimodal vector representations from which high-level semantics have been extracted.

Step S153: Insert the multimodal fusion module for mid-term fusion.

It can be understood that in the multimodal fusion stage, i.e. the mid-term fusion stage, building on the earlier early extraction, every layer adds interaction between the modalities by inserting the multimodal fusion network described in Fig. 3; a total of $N_m$ such layers are passed.

Step S154: After the mid-term fusion, each single modality has been calibrated by the multimodal information and continues through $N_l$ feature extractor layers for further feature mining on the calibrated representations.

Step S155: Concatenate the [cls] vectors of the two modalities after the calibrated feature mining, pass them through a feedforward neural network, and then concatenate the result with the [cls] vectors obtained from the early unimodal feature extractors.

It can be understood that, because the concatenated vector is relatively fragmented, passing it through a learnable feedforward neural network yields a more holistic feature vector and more accurate prediction results.

It can be understood that, because multimodal processing may lose some single-modal information during transmission, concatenation with the single-modal vectors replenishes that information.

Step S156: Pass the result through a second feedforward neural network to obtain the learning model.

It can be understood that the multimodal fusion strategy provided by the embodiments of the present application, through early extraction, mid-term fusion and late prediction, lets the model learn single-modal and multimodal information more fully; the multimodal representation obtained at the late stage is not used directly for prediction, but is combined with the early unimodal representations, so that the unimodal information lost during network propagation can be supplemented at the end.
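Putting the stages together, below is a hedged sketch of the overall forward pass that reuses the MultimodalFusion sketch above. It assumes a sequence layer callable as layer(features) and a structure layer callable as layer(features, adj), with the [cls] tokens occupying row 0; the layer counts and all names are illustrative, not taken from the patent.

```python
import copy
import torch
import torch.nn as nn

class ProteinFusionModel(nn.Module):
    """Early extraction -> mid-term fusion -> late prediction (Fig. 4 sketch)."""
    def __init__(self, seq_layer, struc_layer, L, d_seq, d_struc, d_out,
                 n_e=4, n_m=4, n_l=4):
        super().__init__()
        def clone(m, n):
            return nn.ModuleList([copy.deepcopy(m) for _ in range(n)])
        self.seq_early, self.struc_early = clone(seq_layer, n_e), clone(struc_layer, n_e)
        self.seq_mid, self.struc_mid = clone(seq_layer, n_m), clone(struc_layer, n_m)
        self.seq_late, self.struc_late = clone(seq_layer, n_l), clone(struc_layer, n_l)
        self.fusions = nn.ModuleList([MultimodalFusion(L + 1) for _ in range(n_m)])
        self.ffn1 = nn.Linear(d_seq + d_struc, d_seq + d_struc)    # first FFN (S155)
        self.ffn2 = nn.Linear(2 * (d_seq + d_struc), d_out)        # second FFN (S156)

    def forward(self, f_seq, f_struc, adj):
        for ls, lt in zip(self.seq_early, self.struc_early):       # early extraction
            f_seq, f_struc = ls(f_seq), lt(f_struc, adj)
        cls_early = torch.cat([f_seq[0], f_struc[0]])              # early [cls] snapshot
        for fuse, ls, lt in zip(self.fusions, self.seq_mid, self.struc_mid):
            f_seq, f_struc = fuse(ls(f_seq), lt(f_struc, adj))     # mid-term fusion
        for ls, lt in zip(self.seq_late, self.struc_late):         # calibrated mining
            f_seq, f_struc = ls(f_seq), lt(f_struc, adj)
        cls_late = self.ffn1(torch.cat([f_seq[0], f_struc[0]]))    # fused [cls] pair
        return self.ffn2(torch.cat([cls_late, cls_early]))         # supplement + predict
```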
In some embodiments, after the step of building the learning model based on the multimodal fusion module, the method further includes the following step:
Step S157: add auxiliary losses to update the parameters of the learning model.
It can be understood that, because the main network has many parameters and the model is complex, training converges slowly. Therefore, in this application every feature extraction layer in the late prediction stage also outputs its result to predict the final target; each resulting loss serves as an auxiliary loss that is added to the main loss to update the model parameters.
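A sketch of step S157 under the assumption of a binary prediction target; the head shapes and the unweighted summation of losses are illustrative choices.

```python
import torch
import torch.nn as nn

D, N_l = 128, 3
aux_heads = nn.ModuleList(nn.Linear(D, 1) for _ in range(N_l))   # one head per late layer

def total_loss(main_pred, layer_cls, target, criterion=nn.BCEWithLogitsLoss()):
    loss = criterion(main_pred, target)                   # main loss
    for head, cls_vec in zip(aux_heads, layer_cls):
        loss = loss + criterion(head(cls_vec), target)    # auxiliary losses, summed in
    return loss

# Usage: layer_cls holds the [cls] output of each late-stage extractor layer.
target = torch.ones(2, 1)
loss = total_loss(torch.randn(2, 1), [torch.randn(2, D) for _ in range(N_l)], target)
```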
Step S160: train the learning model on the training set, use the validation set to measure the learning model's performance and select the best-performing parameters as the learning model's parameters, and use the test set to independently test the learning model's generalization ability.
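A minimal, self-contained sketch of this protocol with toy stand-ins for the model and data loaders; only the select-by-validation logic is the point here.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the protocol itself runs; the real model and data loaders
# come from the preceding steps.
model = nn.Linear(8, 1)
criterion = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
make = lambda n: [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(n)]
train_loader, val_loader, test_loader = make(8), make(2), make(2)

def train_one_epoch(loader):
    for x, y in loader:
        opt.zero_grad()
        criterion(model(x), y).backward()
        opt.step()

def evaluate(loader):
    with torch.no_grad():
        return sum(criterion(model(x), y).item() for x, y in loader) / len(loader)

best_val, best_state = float("inf"), None
for epoch in range(5):
    train_one_epoch(train_loader)                  # train on the training set
    val = evaluate(val_loader)                     # measure on the validation set
    if val < best_val:                             # keep the best-performing parameters
        best_val = val
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)
test_metric = evaluate(test_loader)                # independent test of generalization
```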
Referring to Figure 5, a schematic structural diagram of the multimodal information fusion system for protein representation learning provided by this application, which includes: a data processing unit 110, for preprocessing open-source protein data; a classification unit 120, for dividing the protein dataset into a training set, a validation set, and a test set; a unimodal feature extractor construction unit 130, for constructing a unimodal feature extractor that serves as the feature extractor for protein sequences; a multimodal fusion module construction unit 140, for constructing a multimodal fusion module that updates the amino acid token embeddings of the unimodal feature extractor so that each single modality carries multimodal information, which then serves as the input to the unimodal feature extractor; a learning model construction unit 150, for building the learning model based on the multimodal fusion module; and a training unit 160, which trains the learning model on the training set, measures its performance on the validation set, selects the best-performing parameters as the model's parameters, and uses the test set to independently test the model's generalization ability. The detailed implementation has been described in the method description above and is not repeated here.
Referring to Figure 6, a schematic diagram of a terminal structure according to an embodiment of this application. The terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the above multimodal information fusion method for protein representation learning.
The processor 51 is configured to execute the program instructions stored in the memory 52 to control the multimodal information fusion.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or any conventional processor, or the like.
Referring to Figure 7, a schematic structural diagram of a storage medium according to an embodiment of this application. The storage medium of this embodiment stores a program file 61 capable of implementing all of the above methods. The program file 61 may be stored in the storage medium in the form of a software product and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or terminal devices such as computers, servers, mobile phones, and tablets.
The above are merely embodiments of this application and are not intended to limit it. Various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the scope of its claims.

Claims (9)

  1. A multimodal information fusion method for protein representation learning, characterized in that it comprises the following steps:
    preprocessing open-source protein data;
    dividing the protein dataset into a training set, a validation set, and a test set;
    constructing a unimodal feature extractor, the unimodal feature extractor serving as the feature extractor for protein sequences;
    constructing a multimodal fusion module, the multimodal fusion module updating the amino acid token embeddings of the unimodal feature extractor so that each single modality carries multimodal information, which then serves as the input to the unimodal feature extractor;
    building a learning model based on the multimodal fusion module; and
    training the learning model on the training set, measuring the learning model's performance with the validation set and selecting the best-performing parameters as the learning model's parameters, and independently testing the learning model's generalization ability with the test set.
  2. The multimodal information fusion method for protein representation learning according to claim 1, characterized in that the step of preprocessing open-source protein data specifically comprises the following step:
    extracting protein sequence data from the open-source protein dataset, the sequence consisting of 20 English letters representing the 20 amino acids, and converting the 3D structure of the protein into an adjacency matrix graph.
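Outside the claim language, a minimal sketch of this preprocessing, assuming C-alpha coordinates and an 8 Å contact cutoff (a common convention, not a value stated in the claim):

```python
import torch

AA = "ACDEFGHIKLMNPQRSTVWY"                       # the 20 amino acid letters
aa_to_id = {a: i for i, a in enumerate(AA)}

def encode_sequence(seq: str) -> torch.Tensor:
    """Map a protein sequence string to integer tokens."""
    return torch.tensor([aa_to_id[a] for a in seq])

def contact_map(coords: torch.Tensor, cutoff: float = 8.0) -> torch.Tensor:
    """coords: (L, 3) C-alpha positions; adjacency is 1 within the cutoff."""
    dist = torch.cdist(coords, coords)
    return (dist < cutoff).float()
```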
  3. The multimodal information fusion method for protein representation learning according to claim 2, characterized in that in the step of constructing a unimodal feature extractor, specifically:
    the unimodal feature extractor is a pre-trained Transformer model.
  4. The multimodal information fusion method for protein representation learning according to claim 3, characterized in that the step of constructing a multimodal fusion module specifically comprises the following steps:
    performing average pooling on the sequence feature matrix and the structure feature matrix so that each amino acid's feature vector yields one representative value:

    $M_{seq} = \mathrm{AvgPool}(X_{seq}) \in \mathbb{R}^{L_{seq}}, \quad M_{struc} = \mathrm{AvgPool}(X_{struc}) \in \mathbb{R}^{L_{struc}}$

    where $X_{seq} \in \mathbb{R}^{L_{seq} \times D_{seq}}$ and $X_{struc} \in \mathbb{R}^{L_{struc} \times D_{struc}}$ denote the sequence feature matrix and the structure feature matrix before they enter the multimodal module, $D_{seq}$ and $D_{struc}$ denote the feature dimension of each amino acid in the sequence and in the structure, respectively, and $L_{seq}$ and $L_{struc}$ denote the amino acid length in the sequence and in the structure; the two are in fact equal, i.e. $L_{seq} = L_{struc} = L$;
    concatenating the pooled vectors of the sequence and the structure and transforming them through a fully connected network into a vector containing multimodal information:

    $M_{comp} = W[M_{seq}, M_{struc}] + b$

    where $[\cdot,\cdot]$ denotes concatenation, $W \in \mathbb{R}^{D_{comp} \times (L_{seq}+L_{struc})}$, $b \in \mathbb{R}^{D_{comp}}$, and $D_{comp} = (L_{seq}+L_{struc})/5$;
    redistributing the multimodal compressed vector $M_{comp}$ to each modality to calibrate the unimodal information, the redistribution introducing a separate fully connected transformation layer for each modality:

    $T_{seq} = W_{seq} M_{comp} + b_{seq}, \quad T_{seq} \in \mathbb{R}^{L_{seq}}$

    $T_{struc} = W_{struc} M_{comp} + b_{struc}, \quad T_{struc} \in \mathbb{R}^{L_{struc}}$
    activating the redistributed modal vectors through an activation function that acts as a gate limiting each amino acid's contribution to the overall task:

    $\hat{X}_{seq} = \sigma(T_{seq}) \odot X_{seq}$

    $\hat{X}_{struc} = \sigma(T_{struc}) \odot X_{struc}$

    where $\sigma$ denotes the sigmoid function and $\odot$ the Hadamard product; after multiplication with the activated gating vectors, the reconstructed unimodal vectors serve as the input to the next layer of unimodal feature extractors.
  5. The multimodal information fusion method for protein representation learning according to claim 4, characterized in that the step of building a learning model based on the multimodal fusion module specifically comprises the following steps:
    adding a special token named [cls] to the raw inputs of the protein sequence and structure, the sequence [cls] being placed at the very front of the sequence and the structure [cls] being given a virtual full connection to all amino acids;
    passing the raw protein data through N_e layers of early unimodal feature extractors, the sequence passing through the encoding layers of the Transformer model and the structure passing through graph attention network layers, the output representing unimodal vector representations from which high-level semantics have been extracted;
    inserting the multimodal fusion module to perform mid-term fusion;
    after the mid-term fusion, each single modality having been calibrated by multimodal information, continuing through N_l layers of feature extractors for further feature mining on the calibrated representations;
    concatenating the [cls] vectors of the two modalities from the calibrated feature mining, passing the result through a feedforward neural network, and then concatenating it with the [cls] vectors obtained from the early unimodal feature extractors; and
    passing the result through a second feedforward neural network to obtain the learning model.
  6. The multimodal information fusion method for protein representation learning according to claim 5, characterized in that, after the step of building a learning model based on the multimodal fusion module, the method further comprises the following step:
    adding auxiliary losses to update the parameters of the learning model.
  7. A multimodal information fusion system for protein representation learning, characterized in that it comprises:
    a data processing unit, configured to preprocess open-source protein data;
    a classification unit, configured to divide the protein dataset into a training set, a validation set, and a test set;
    a unimodal feature extractor construction unit, configured to construct a unimodal feature extractor that serves as the feature extractor for protein sequences;
    a multimodal fusion module construction unit, configured to construct a multimodal fusion module that updates the amino acid token embeddings of the unimodal feature extractor so that each single modality carries multimodal information, which then serves as the input to the unimodal feature extractor;
    a learning model construction unit, configured to build a learning model based on the multimodal fusion module; and
    a training unit, configured to train the learning model on the training set, measure the learning model's performance with the validation set, select the best-performing parameters as the learning model's parameters, and independently test the learning model's generalization ability with the test set.
  8. A terminal, characterized in that it comprises a processor and a memory coupled to the processor, wherein
    the memory stores program instructions for implementing the multimodal information fusion method for protein representation learning according to any one of claims 1-6; and
    the processor is configured to execute the program instructions stored in the memory to control multimodal information fusion.
  9. A storage medium, characterized in that it stores program instructions executable by a processor, the program instructions being used to execute the multimodal information fusion method for protein representation learning according to any one of claims 1 to 6.
PCT/CN2022/138208 2021-12-15 2022-12-09 Multi-mode information fusion method and system for protein representative learning, and terminal and storage medium WO2023109714A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111536668.4A CN114388064A (en) 2021-12-15 2021-12-15 Multi-modal information fusion method, system, terminal and storage medium for protein characterization learning
CN202111536668.4 2021-12-15

Publications (1)

Publication Number Publication Date
WO2023109714A1

Family

ID=81197386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138208 WO2023109714A1 (en) 2021-12-15 2022-12-09 Multi-mode information fusion method and system for protein representative learning, and terminal and storage medium

Country Status (2)

Country Link
CN (1) CN114388064A (en)
WO (1) WO2023109714A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114388064A (en) * 2021-12-15 2022-04-22 深圳先进技术研究院 Multi-modal information fusion method, system, terminal and storage medium for protein characterization learning
CN115984622B (en) * 2023-01-10 2023-12-29 深圳大学 Multi-mode and multi-example learning classification method, prediction method and related device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200279156A1 (en) * 2017-10-09 2020-09-03 Intel Corporation Feature fusion for multi-modal machine learning analysis
CN108052911A (en) * 2017-12-20 2018-05-18 上海海洋大学 Multi-modal remote sensing image high-level characteristic integrated classification method based on deep learning
CN111584073A (en) * 2020-05-13 2020-08-25 山东大学 Artificial intelligence fusion multi-modal information-based diagnosis model for constructing multiple pathological types of benign and malignant pulmonary nodules
CN112837753A (en) * 2021-02-07 2021-05-25 中国科学院新疆理化技术研究所 MicroRNA-disease associated prediction method based on multi-mode stacking automatic coding machine
CN114388064A (en) * 2021-12-15 2022-04-22 深圳先进技术研究院 Multi-modal information fusion method, system, terminal and storage medium for protein characterization learning

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116913383A (en) * 2023-09-13 2023-10-20 鲁东大学 T cell receptor sequence classification method based on multiple modes
CN116913383B (en) * 2023-09-13 2023-11-28 鲁东大学 T cell receptor sequence classification method based on multiple modes
CN116935952A (en) * 2023-09-18 2023-10-24 浙江大学杭州国际科创中心 Method and device for training protein prediction model based on graph neural network
CN116935952B (en) * 2023-09-18 2023-12-01 浙江大学杭州国际科创中心 Method and device for training protein prediction model based on graph neural network
CN116933046A (en) * 2023-09-19 2023-10-24 山东大学 Deep learning-based multi-mode health management scheme generation method and system
CN116933046B (en) * 2023-09-19 2023-11-24 山东大学 Deep learning-based multi-mode health management scheme generation method and system
CN117173692A (en) * 2023-11-02 2023-12-05 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN117173692B (en) * 2023-11-02 2024-02-02 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device

Also Published As

Publication number Publication date
CN114388064A (en) 2022-04-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906461

Country of ref document: EP

Kind code of ref document: A1