WO2021227091A1 - A multi-modal classification method based on graph convolutional neural networks - Google Patents
A multi-modal classification method based on graph convolutional neural networks
- Publication number
- WO2021227091A1 (PCT/CN2020/090879)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- modal
- convolutional neural
- graph
- neural network
- graph convolutional
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- The invention belongs to the technical field of artificial intelligence within computer science and specifically relates to a multi-modal classification method based on graph convolutional neural networks.
- In recent years, more and more multi-modal data have appeared in practical applications. For example, multimedia data on the Internet often contains multiple modalities: videos, images, and surrounding text; webpage data likewise contains multiple modalities: the text of the webpage itself and the hyperlinks pointing to it.
- These multi-modal data contain huge economic value, and using them often yields better results than single-modal data. For example, in feed-based content recommendation, information from different modalities in the feed (such as pictures and text) can be considered simultaneously to recommend content of interest to the user. In practical applications, structural information about the data can readily be found in each of the different modalities.
- Graph convolutional neural networks can embed graph-structure information into neural networks and are suitable for processing large-scale data, but they cannot be directly applied to multi-modal scenarios. Objects in practical applications are often multi-modal, yet traditional multi-modal methods merely train a learner on each modality and then integrate them, which easily ignores useful structural information in the different modalities. For this reason, we propose a multi-modal classification method based on graph convolutional neural networks.
- the purpose of the present invention is to provide a multi-modal classification method based on graph convolutional neural network to solve the above-mentioned problems in the background art.
- A multi-modal classification method based on graph convolutional neural networks includes the following steps:
- each object contains V modalities.
- a class label is provided for a small number of objects in the library by manual labeling. These class-labeled objects are called the initial labeled training data, and they form the training data set together with the remaining large number of unlabeled objects.
- the objects in the training object library are converted into corresponding feature representations, that is, the features of the objects in the object library are extracted, and all objects are converted into corresponding feature vectors. Since the object contains V modalities, the final feature vector of each object is also divided into V parts.
- In the bimodal case, each object corresponds to two feature vectors, lying in d_1- and d_2-dimensional Euclidean spaces respectively.
- The user adds the object to be tested to the graphs by finding its k nearest neighbors in the object library under each of the V modalities according to its feature vectors; the new graphs and the obtained feature vectors are then input to the V trained classifiers respectively.
- Each classifier returns a prediction for the object, and the prediction with the highest confidence among the V results is selected as the final label output.
- The beneficial effect of the present invention is that it comprehensively considers the graph-structure information of the different modalities through an innovative multi-modal graph convolutional neural network; by assigning trainable weights in each layer, the representation learned for each modality gradually incorporates the structural information of the other modalities.
- Although the present invention requires graph construction, it can be used in inductive learning scenarios: the samples to be tested need not be available during training.
- FIG. 1 is a flowchart of the present invention
- Fig. 2 is a flowchart of the training algorithm of the multi-modal graph convolutional neural network in the present invention
- Fig. 3 is a flow chart of the prediction algorithm of the multi-modal graph convolutional neural network in the present invention.
- Step 1: Establish an object library containing n objects as the training object library, and assign a class label to a small number of objects in the library by manual labeling, using y_i to denote the class label of the i-th object.
- For example, military news webpages form the first category and entertainment news webpages the second; if the content of a webpage is entertainment news, then y_i = 0 and the webpage belongs to the second category.
- In the bimodal case, the feature-vector pair of the i-th object is written x_i = (x_{1,i}, x_{2,i}) and may also be called the sample x_i; the bimodal features can be collected into matrices X_1 and X_2.
- Step 3: Let the user select the value of k and the distance metric to be used, which can be any common metric, including Euclidean distance, cosine distance, etc., and then build a k-nearest-neighbor graph according to the selected k and metric.
- For modality v, the adjacency matrix is denoted A_v. With the chosen distance metric d(x_i, x_j), the edge weights are set as A_v(ij) = exp(-d(x_i, x_j)/σ²), where σ is a hyperparameter, usually selected from {0.01, 0.1, 1}.
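The k-nearest-neighbor graph construction with Gaussian edge weights described in step 3 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name `knn_gaussian_adjacency`, the use of squared Euclidean distance, and the symmetrization rule are illustrative choices, not the patent's specification.

```python
import numpy as np

def knn_gaussian_adjacency(X, k, sigma):
    """Build a k-NN adjacency matrix for one modality, weighting
    edge (i, j) by exp(-d(x_i, x_j) / sigma^2) as in the text.
    Hypothetical helper; squared Euclidean distance is assumed."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of i; index 0 of argsort is i itself
        # (distance 0), so it is skipped.
        neighbors = np.argsort(d[i])[1:k + 1]
        A[i, neighbors] = np.exp(-d[i, neighbors] / sigma ** 2)
    # Symmetrize so the graph is undirected.
    return np.maximum(A, A.T)
```

In the bimodal case this would be called once per modality to obtain A_1 and A_2 from X_1 and X_2.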
- Step 4 Use the multi-modal graph convolutional neural network training algorithm to train the classifier, where the specific structure of the multi-modal graph convolutional neural network is:
- The hidden layers are applied for k ∈ {1, 2, ..., K_v - 1}, using the degree matrix D_v with diagonal entries D_v(ii) = Σ_j A_v(ij), where A_v(ij) denotes the element in the i-th row and j-th column of A_v.
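A single propagation step using the degree quantities D_v defined above might look like the following sketch. Hedged: the source elides the exact hidden-layer formula, so row normalization D⁻¹A and a ReLU activation, both common choices for graph convolutions, are assumed here rather than taken from the patent.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step, sketched with row normalization
    D^{-1} A, where D_ii = sum_j A_ij as defined in the text.
    The patent's exact propagation rule may differ (e.g. symmetric
    normalization D^{-1/2} A D^{-1/2})."""
    d = A.sum(axis=1)
    d = np.where(d == 0, 1.0, d)   # guard isolated nodes
    H_next = (A / d[:, None]) @ H @ W
    return np.maximum(H_next, 0.0)  # ReLU activation (assumed)
```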
- Step 5: Obtain the samples to be predicted, extract their features with the same feature-extraction algorithm as in step 2, and build a new graph with the same distance metric as in step 3: all original edges are kept, and for each sample to be tested its k nearest neighbors in the original object library are found and the corresponding edges added.
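The inductive graph extension of step 5, keeping the original edges and linking each test sample to its k nearest training neighbors, can be sketched as below. Assumptions: `extend_graph` is a hypothetical helper name, and the same Gaussian weighting exp(-d/σ²) as in step 3 is reused for the new edges.

```python
import numpy as np

def extend_graph(A, X_train, X_test, k, sigma):
    """Add t test samples to an existing n-node graph: original
    edges are kept unchanged, and each test sample is connected to
    its k nearest neighbors in the original object library."""
    n, t = X_train.shape[0], X_test.shape[0]
    A_new = np.zeros((n + t, n + t))
    A_new[:n, :n] = A  # original edges unchanged
    # Squared Euclidean distances from each test to each train sample.
    d = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    for i in range(t):
        nn = np.argsort(d[i])[:k]
        w = np.exp(-d[i, nn] / sigma ** 2)
        A_new[n + i, nn] = w
        A_new[nn, n + i] = w  # undirected graph
    return A_new
```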
- Step 6: Input the features and the new graphs into the multi-modal graph convolutional neural network trained in step 4; the predicted label is then inferred from the output values.
- the training process of the weighted multimodal graph convolutional neural network method is:
- Step 7: Given the maximum number of iteration rounds T and the number of graph-convolution layers, initialize the multi-modal graph convolutional neural networks f_1, f_2, ..., f_V, and initialize the parameters of the graph convolutional layers.
- Step 8: If t > T, go to step 11; otherwise continue training and go to step 9.
- Steps 9-10: For each modality v = 1, 2, ..., V in turn, fix the networks of the other modalities, compute the loss on the labeled data, and update the parameters of f_v according to the corresponding gradients with an optimizer such as SGD or Adam; then increase the iteration counter t by 1 and go to step 8.
- Step 11 Output the obtained network f 1 , f 2 ,..., f V.
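The control flow of steps 7 through 11, alternately updating one modality's network while the others are held fixed, can be sketched as a schedule skeleton. This is a structural sketch only: `update_fns` stands in for the per-modality optimization steps, which the patent realizes with a loss on labeled data and an optimizer such as SGD or Adam.

```python
def alternating_training(update_fns, T):
    """Skeleton of the training schedule in steps 7-11: for
    t = 1..T, each modality's network is updated in turn while the
    others are fixed. update_fns[v]() performs one optimization step
    for modality v and returns its current loss value."""
    history = []
    t = 1                 # step 7: iteration counter
    while t <= T:         # step 8: stop when t > T
        for v, step in enumerate(update_fns):  # steps 9-10
            history.append((t, v, step()))
        t += 1            # increment iteration counter
    return history        # step 11: networks are now trained
```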
- the prediction process of the weighted multimodal graph convolutional neural network method is:
- Step 12 For the t samples to be predicted, first use the method in step 2 to extract features
- Step 13: Using the same distance metric as in step 3, find the k nearest neighbors in the object library for each sample to be predicted and form the corresponding new graphs A_v'.
- Step 15: First integrate the prediction results of each modality.
- Step 16: Then output the result according to the predicted value of each category, where i ∈ {n+1, ..., n+t} corresponds to the samples to be predicted.
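Steps 15 and 16 can be read as summing the per-modality predicted values for each category and outputting the category with the largest total. The following is one hedged sketch of that reading; `fuse_predictions` and the sum-then-argmax rule are assumptions, since the text does not spell out the integration formula (the summary alternatively describes picking the single most confident modality).

```python
import numpy as np

def fuse_predictions(probs_per_modality):
    """Combine per-modality class-probability outputs (a list of V
    arrays, each of shape t x C) by summing over modalities and
    taking the argmax per sample -- one plausible reading of
    steps 15-16, not the patent's verbatim rule."""
    stacked = np.stack(probs_per_modality)  # V x t x C
    summed = stacked.sum(axis=0)            # step 15: integrate modalities
    return summed.argmax(axis=1)            # step 16: label per sample
```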
Abstract
Description
Claims (4)
- 1. A multi-modal classification method based on graph convolutional neural networks, comprising the following steps: (1) Establish an object library as the training data set, where the library contains n objects; assign a class label to a small number of objects in the library, with l denoting the number of labeled objects and u the number of unlabeled objects. (2) Extract, via a feature extraction algorithm, the features corresponding to the different modalities in the object library; assuming V modalities, generate a feature-vector tuple (feature 1, feature 2, ..., feature V) for each object. (3) Build a k-nearest-neighbor graph for the features of each modality; for modality v, denote its adjacency matrix by A_v. (4) Input the feature vectors of the data and the k-nearest-neighbor graph of each modality into the multi-modal graph convolutional neural network, and train one classifier for each modality. (5) Obtain the objects to be tested, with t denoting their number; obtain their feature-vector tuples by the same method as in step (2), and add the new samples to the graphs by the graph-construction method of step (3). (6) Input the feature vectors of each modality and all updated k-nearest-neighbor graphs into the corresponding classifiers trained in step (4), obtain V predicted labels, and output the one with the highest confidence as the final label.
- 2. The multi-modal classification method based on graph convolutional neural networks according to claim 1, wherein in step (4) a multi-modal graph convolutional neural network is used as the classifier, with the following specific steps: S2: if t > T, go to step 5); otherwise continue training and go to step 3). S4: for each modality v = 1, 2, ..., V in turn, fix the other networks, compute the loss on the labeled data using the loss function, and update the network parameters with an optimizer such as SGD or Adam, where the parameters are updated according to the corresponding gradients; then increase the iteration counter t by 1 and go to step 2). S5: output the resulting networks f_1, f_2, ..., f_V.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010412886.6A CN111985520B (zh) | 2020-05-15 | 2020-05-15 | A multi-modal classification method based on graph convolutional neural networks
CN202010412886.6 | 2020-05-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021227091A1 true WO2021227091A1 (zh) | 2021-11-18 |
Family
ID=73442010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/090879 WO2021227091A1 (zh) | 2020-05-15 | 2020-05-18 | A multi-modal classification method based on graph convolutional neural networks
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111985520B (zh) |
WO (1) | WO2021227091A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114238752A (zh) * | 2021-11-30 | 2022-03-25 | 湖南大学 | Item recommendation method, device and storage medium
CN114359627A (zh) * | 2021-12-15 | 2022-04-15 | 南京视察者智能科技有限公司 | Graph-convolution-based object detection post-processing method and device
CN114662033A (zh) * | 2022-04-06 | 2022-06-24 | 昆明信息港传媒有限责任公司 | Multi-modal harmful-link identification based on text and images
CN115018010A (zh) * | 2022-07-11 | 2022-09-06 | 东南大学 | Multi-modal product matching method based on images and text
CN116049597A (zh) * | 2023-01-10 | 2023-05-02 | 北京百度网讯科技有限公司 | Pre-training method and device for a multi-task webpage model, and electronic equipment
CN116130089A (zh) * | 2023-02-02 | 2023-05-16 | 湖南工商大学 | Multi-modal depression detection system, device and medium based on hypergraph neural networks
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113283578A (zh) * | 2021-04-14 | 2021-08-20 | 南京大学 | A data denoising method based on label risk control
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190325342A1 (en) * | 2018-04-20 | 2019-10-24 | Sri International | Embedding multimodal content in a common non-euclidean geometric space |
CN110782015A (zh) * | 2019-10-25 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Training method, apparatus and storage medium for a network-architecture optimizer for neural networks
CN111046227A (zh) * | 2019-11-29 | 2020-04-21 | 腾讯科技(深圳)有限公司 | Video duplicate-checking method and apparatus
CN111046664A (zh) * | 2019-11-26 | 2020-04-21 | 哈尔滨工业大学(深圳) | Fake-news detection method and system based on multi-granularity graph convolutional neural networks
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106934055B (zh) * | 2017-03-20 | 2020-05-19 | 南京大学 | Semi-supervised automatic webpage classification method based on insufficient modal information
CN109583519A (zh) * | 2018-12-27 | 2019-04-05 | 中国石油大学(华东) | Semi-supervised classification method based on p-Laplacian graph convolutional neural networks
CN109766935A (zh) * | 2018-12-27 | 2019-05-17 | 中国石油大学(华东) | Semi-supervised classification method based on hypergraph p-Laplacian graph convolutional neural networks
CN110046656B (zh) * | 2019-03-28 | 2023-07-11 | 南京邮电大学 | Multi-modal scene recognition method based on deep learning
-
2020
- 2020-05-15 CN CN202010412886.6A patent/CN111985520B/zh active Active
- 2020-05-18 WO PCT/CN2020/090879 patent/WO2021227091A1/zh active Application Filing
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114238752A (zh) * | 2021-11-30 | 2022-03-25 | 湖南大学 | Item recommendation method, device and storage medium |
CN114359627A (zh) * | 2021-12-15 | 2022-04-15 | 南京视察者智能科技有限公司 | Graph-convolution-based object detection post-processing method and device |
CN114359627B (zh) * | 2021-12-15 | 2024-06-07 | 南京视察者智能科技有限公司 | Graph-convolution-based object detection post-processing method and device |
CN114662033A (zh) * | 2022-04-06 | 2022-06-24 | 昆明信息港传媒有限责任公司 | Multi-modal harmful-link identification based on text and images |
CN114662033B (zh) * | 2022-04-06 | 2024-05-03 | 昆明信息港传媒有限责任公司 | Multi-modal harmful-link identification based on text and images |
CN115018010A (zh) * | 2022-07-11 | 2022-09-06 | 东南大学 | Multi-modal product matching method based on images and text |
CN116049597A (zh) * | 2023-01-10 | 2023-05-02 | 北京百度网讯科技有限公司 | Pre-training method and device for a multi-task webpage model, and electronic equipment |
CN116049597B (zh) * | 2023-01-10 | 2024-04-19 | 北京百度网讯科技有限公司 | Pre-training method and device for a multi-task webpage model, and electronic equipment |
CN116130089A (zh) * | 2023-02-02 | 2023-05-16 | 湖南工商大学 | Multi-modal depression detection system, device and medium based on hypergraph neural networks |
CN116130089B (zh) * | 2023-02-02 | 2024-01-02 | 湖南工商大学 | Multi-modal depression detection system, device and medium based on hypergraph neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN111985520A (zh) | 2020-11-24 |
CN111985520B (zh) | 2022-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021227091A1 (zh) | A multi-modal classification method based on graph convolutional neural networks | |
Hussain et al. | A deep neural network and classical features based scheme for objects recognition: an application for machine inspection | |
CN106202256B (zh) | Web image retrieval method based on semantic propagation and hybrid multi-instance learning | |
CN111291212A (zh) | Zero-shot sketch image retrieval method and system based on graph convolutional neural networks | |
CN110717526A (zh) | An unsupervised transfer learning method based on graph convolutional networks | |
CN112380435A (zh) | Literature recommendation method and system based on heterogeneous graph neural networks | |
Rad et al. | Image annotation using multi-view non-negative matrix factorization with different number of basis vectors | |
CN112287170B (zh) | Short-video classification method and device based on multi-modal joint learning | |
CN112417097B (zh) | Multi-modal data feature extraction and association method for public-opinion analysis | |
CN110598018B (zh) | A sketch image retrieval method based on collaborative attention | |
CN112308115B (zh) | Multi-label image deep-learning classification method and device | |
CN114067385A (zh) | Cross-modal face retrieval hashing method based on metric learning | |
Chen et al. | RRGCCAN: Re-ranking via graph convolution channel attention network for person re-identification | |
CN115588122A (zh) | A news classification method based on multi-modal feature fusion | |
CN115687760A (zh) | User learning-interest tag prediction method based on graph neural networks | |
CN116258990A (zh) | Few-shot referring video object segmentation method based on cross-modal affinity | |
Gong et al. | Unsupervised RGB-T saliency detection by node classification distance and sparse constrained graph learning | |
CN113642602B (zh) | Multi-label image classification method based on global and local label relations | |
CN113886615A (zh) | Real-time hand-drawn image retrieval method based on multi-granularity associative learning | |
Wu | Application of improved boosting algorithm for art image classification | |
CN114896514B (zh) | A Web API tag recommendation method based on graph neural networks | |
CN116883751A (zh) | Unsupervised domain-adaptive image recognition method based on prototypical-network contrastive learning | |
CN113516118B (zh) | Multi-modal cultural-resource processing method with joint image-text embedding | |
CN115797642A (zh) | Semi-supervised domain-adaptive image semantic segmentation algorithm based on consistency regularization | |
Qi et al. | Scalable graph based non-negative multi-view embedding for image ranking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20935278 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20935278 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 080523) |