CN110046656A - Multi-modal scene recognition method based on deep learning - Google Patents
- Publication number: CN110046656A (application CN201910242039.7A)
- Authority: CN (China)
- Prior art keywords
- scene recognition
- layer
- text
- deep learning
- classification
- Prior art date: 2019-03-28
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a multi-modal scene recognition method based on deep learning, comprising the following steps: S1, performing word segmentation on short texts; S2, feeding a set of images, the segmented short texts, and the corresponding labels into their respective convolutional neural networks for training; S3, training a short-text classification model; S4, training an image classification model; S5, computing the cross-entropy between each fully connected layer output from S3 and S4 and the standard classification result, computing the average Euclidean distance as the loss value, feeding it back into the respective convolutional neural networks, and finally obtaining a complete multi-modal scene recognition model; S6, adding the text and image prediction result vectors to obtain the final classification result; S7, feeding the short text and image to be recognized into the trained multi-modal scene recognition model for scene recognition. The invention proposes a multi-modal scene search approach that provides users with more accurate and convenient scene recognition.
Description
Technical Field
The invention relates to a multi-modal scene recognition method, in particular to a multi-modal scene recognition method based on deep learning, and belongs to the fields of artificial intelligence and pattern recognition.
Background Art
Deep learning is a comparatively new branch of machine learning whose goal is to bring machine learning closer to human intelligence. The convolutional neural network, a representative deep learning algorithm, is structurally simple, highly adaptable, and has few trainable parameters relative to its many connections; for these reasons it has long been widely applied in image processing and pattern recognition.
Specifically, a convolutional neural network is a hierarchical model that takes raw data as input. Through stacked layers of convolution, pooling, and nonlinear activation functions, high-level semantic information is extracted and abstracted layer by layer from the raw input. This process is called the "feedforward" pass. The final layer of the network outputs the objective; a loss function is designed to measure the error between the predicted and true values, and the backpropagation algorithm feeds this error back from the last layer toward the first, updating the parameters of each layer before feeding forward again. This cycle repeats until the network converges, completing model training.
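As a minimal sketch of this feedforward/backpropagation cycle (PyTorch is used for illustration only; the toy model, dummy data, and hyperparameters below are placeholders, not the patent's architecture):

```python
import torch
import torch.nn as nn

# Placeholder network: any nn.Module ending in a classification head would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()                 # error between prediction and ground truth
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 1, 28, 28)                  # a dummy mini-batch of raw data
y = torch.randint(0, 10, (32,))                 # dummy labels

for step in range(100):                         # repeat until the model converges
    logits = model(x)                           # feedforward: raw input -> prediction
    loss = loss_fn(logits, y)                   # loss between predicted and true values
    optimizer.zero_grad()
    loss.backward()                             # backpropagation: error fed back layer by layer
    optimizer.step()                            # update each layer's parameters
```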
At present, the commonly used modality fusion approaches are decision fusion and feature fusion.
Decision fusion first obtains the classification results of the two modalities and then combines the two results by weighting to produce the final result. Meng-Ju Han et al. proposed a decision fusion strategy that normalizes the average Euclidean distance between the training samples and the decision plane and uses it as the fusion weight, achieving a recognition rate roughly 5% higher than a single modality. Decision fusion is simple to carry out, but the results it obtains are not objective enough.
Feature fusion, by contrast, fuses the features extracted from the two modalities and then classifies the fused features. S. Emerich et al. fused extracted facial expression features and speech features, and both the recognition rate and the robustness of the fused features improved over a single modality. Feature fusion yields more objective results, but its implementation is overly complex.
In summary, how to propose a new multi-modal scene recognition method that preserves, as far as possible, the respective advantages of decision fusion and feature fusion while overcoming their respective shortcomings has become an urgent problem for those skilled in the art.
Summary of the Invention
In view of the above defects in the prior art, the purpose of the present invention is to propose a multi-modal scene recognition method based on deep learning, comprising the following steps:
S1. Perform word segmentation on short texts.
S2. Feed a set of images, the segmented short texts, and the corresponding labels into their respective convolutional neural networks for training.
S3. Train a short-text classification model.
S4. Train an image classification model.
S5. Compute the cross-entropy between each fully connected layer output from S3 and S4 and the standard classification result, compute the average Euclidean distance as the loss value, and feed it back into the respective convolutional neural networks; repeat the training until the model converges, finally obtaining a complete multi-modal scene recognition model.
S6. Add the trained text and image prediction result vectors to obtain the final classification result.
S7. Feed the short text and image to be recognized into the trained multi-modal scene recognition model, respectively, to perform scene recognition.
Preferably, S1 specifically comprises the following step: using the jieba segmentation tool to segment the short texts.
Preferably, S3 specifically comprises the following steps:
S31. Quantize the segmented short text and feed it into three parallel convolutional layers.
S32. Pass the outputs of the three parallel convolutional layers through a rectified linear unit (ReLU) layer and a pooling layer in turn, obtaining several pooled outputs.
S33. Concatenate the pooled outputs, apply random dropout, and use the result as the input of a fully connected layer; finally compute the fully connected layer to obtain the text classification prediction result vector.
Preferably, the three parallel convolutional layers comprise a first, a second, and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3*128, the second has 256 kernels of size 4*128, and the third has 128 kernels of size 5*128.
Preferably, S4 specifically comprises the following steps:
S41. Feed the input image into the first convolutional layer, extract the corresponding number of features via the designed number of convolution kernels, and output the convolutional layer result.
S42. Pool the output of the convolutional layer, compressing the amount of data and parameters to reduce overfitting; feed the pooled result into the next convolutional layer, and repeat convolution and pooling four times in total; the weights in each convolution kernel are initialized to random values and trained continuously to obtain the model parameters.
S43. Feed the last pooling result into a fully connected layer, apply random dropout, and compute the image classification prediction result vector.
Preferably, computing the average Euclidean distance as the loss value in S5 specifically comprises the following step: computing the loss value with a loss function S defined over three cross-entropies,
where h1 = H(p1, q1), h2 = H(p2, q2), h3 = H(p1, p2); p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the text classification standard result vector, q2 is the image classification standard result vector, and H(·) is the cross-entropy function.
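The formula for S itself does not survive in this text. One reading consistent with "the average Euclidean distance" of the three cross-entropies — an assumption, not the confirmed original formula — is

$$S = \frac{1}{3}\sqrt{h_1^2 + h_2^2 + h_3^2},$$

i.e. the Euclidean norm of the vector (h1, h2, h3) scaled by the number of components.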
Preferably, S6 specifically comprises the following step: using the Softmax function, add the trained text and image prediction result vectors to obtain the final classification result.
Compared with the prior art, the advantages of the present invention are mainly reflected in the following aspects:
The deep learning based multi-modal scene recognition method provided by the invention proposes a new multi-modal scene search approach and gives users a more accurate and convenient means of scene recognition. The method comprehensively extracts the features of text and images, designs a new loss function, and exploits the information of multiple modalities, improving the accuracy of scene recognition.
The invention also provides a reference for other related problems in the same field and can be extended on this basis to other technical solutions related to scene recognition methods, and therefore has very broad application prospects.
The specific embodiments of the present invention are described in further detail below in conjunction with the accompanying drawings, so as to make the technical solutions of the invention easier to understand and master.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the multi-modal scene recognition model constructed by the present invention.
Detailed Description
Aiming at the inaccurate results and high complexity of existing scene recognition methods, the present invention provides a new multi-modal scene recognition method based on deep learning: convolutional neural networks extract the feature information of the image and text modalities from the multi-modal input respectively, and the multi-modal feature information is then fused to improve the accuracy of scene recognition.
More specifically, the deep learning based multi-modal scene recognition method of the present invention comprises the following steps.
S1. Use the jieba segmentation tool to segment the short texts.
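A minimal usage sketch of the jieba ("结巴") segmentation library named here (the sample sentence is illustrative):

```python
import jieba

text = "我在海边看日落"      # "I am watching the sunset at the seaside"
tokens = jieba.lcut(text)    # exact segmentation mode, returns a list of words
print(tokens)                # e.g. ['我', '在', '海边', '看', '日落']
```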
S2. Feed a set of images, the segmented short texts, and the corresponding labels into their respective convolutional neural networks for training.
S3. Train the short-text classification model. This specifically comprises the following steps:
S31. During training of the short-text classification model, quantize the segmented short text and feed it into three parallel convolutional layers.
The three parallel convolutional layers comprise a first, a second, and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3*128, the second has 256 kernels of size 4*128, and the third has 128 kernels of size 5*128.
S32. Pass the outputs of the three parallel convolutional layers through a rectified linear unit (ReLU) layer and a pooling layer in turn, obtaining several pooled outputs.
S33. Concatenate the pooled outputs, apply random dropout, and use the result as the input of a fully connected layer; finally compute the fully connected layer to obtain the text classification prediction result vector.
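A sketch of this text branch in PyTorch, assuming 128-dimensional word embeddings to match the kernel sizes above; the vocabulary size, number of classes, and dropout rate are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, num_classes=10, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Three parallel convolutional layers: 384 kernels of 3*128,
        # 256 kernels of 4*128, 128 kernels of 5*128.
        self.convs = nn.ModuleList([
            nn.Conv2d(1, out_ch, (k, embed_dim))
            for out_ch, k in [(384, 3), (256, 4), (128, 5)]
        ])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(384 + 256 + 128, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len), seq_len >= 5
        x = self.embed(token_ids).unsqueeze(1)          # (batch, 1, seq_len, 128)
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(x)).squeeze(3)              # ReLU after each convolution
            h = F.max_pool1d(h, h.size(2)).squeeze(2)   # pool over the whole sequence
            pooled.append(h)
        z = self.dropout(torch.cat(pooled, dim=1))      # concatenate + random dropout
        return self.fc(z)                               # text prediction vector p1
```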
S4. Train the image classification model. This specifically comprises the following steps:
S41. Feed the input image into the first convolutional layer, extract the corresponding number of features via the designed number of convolution kernels, and output the convolutional layer result.
S42. Pool the output of the convolutional layer, compressing the amount of data and parameters to reduce overfitting; feed the pooled result into the next convolutional layer, and repeat convolution and pooling four times in total; the weights in each convolution kernel are initialized to random values and trained continuously to obtain the model parameters used by the method of the invention.
S43. Feed the last pooling result into a fully connected layer, apply random dropout, and compute the image classification prediction result vector.
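A corresponding sketch of the image branch with four convolution-pooling rounds; the channel counts, 3*3 kernels, 64*64 input resolution, and class count are illustrative assumptions (the description specifies only the four rounds, random weight initialization, and dropout before the fully connected layer):

```python
import torch.nn as nn

class ImageCNN(nn.Module):
    def __init__(self, num_classes=10, dropout=0.5):
        super().__init__()
        blocks = []
        channels = [3, 32, 64, 128, 256]                # four conv + pool rounds
        for c_in, c_out in zip(channels, channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # feature extraction
                nn.ReLU(),
                nn.MaxPool2d(2),                        # pooling compresses data/parameters
            ]
        self.features = nn.Sequential(*blocks)          # weights random-initialized by default
        self.dropout = nn.Dropout(dropout)
        # For 64*64 inputs, four 2*2 poolings leave a 4*4 feature map.
        self.fc = nn.Linear(256 * 4 * 4, num_classes)

    def forward(self, img):                             # (batch, 3, 64, 64)
        z = self.features(img).flatten(1)
        return self.fc(self.dropout(z))                 # image prediction vector p2
```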
S5. Compute the cross-entropy between each fully connected layer output from S3 and S4 and the standard classification result, compute the average Euclidean distance as the loss value, and feed it back into the respective convolutional neural networks; repeat the training until the model converges, finally obtaining the complete multi-modal scene recognition model. The model structure is shown in FIG. 1.
Computing the average Euclidean distance as the loss value specifically comprises the following step: computing the loss value with the loss function S defined above,
where h1 = H(p1, q1), h2 = H(p2, q2), h3 = H(p1, p2); p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the text classification standard result vector, q2 is the image classification standard result vector, and H(·) is the cross-entropy function.
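A sketch of this loss under the reading of S given above (the exact normalization of the "average Euclidean distance" is an assumption, and the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def multimodal_loss(p1_logits, p2_logits, target):
    """Combine the three cross-entropies h1, h2, h3 into one loss value.

    p1_logits: text-branch fully connected layer output
    p2_logits: image-branch fully connected layer output
    target:    ground-truth class indices (q1 == q2 for a paired sample)
    """
    h1 = F.cross_entropy(p1_logits, target)      # text prediction vs. standard result
    h2 = F.cross_entropy(p2_logits, target)      # image prediction vs. standard result
    # Cross-entropy between the two predicted distributions, H(p1, p2):
    h3 = -(F.softmax(p1_logits, dim=1) *
           F.log_softmax(p2_logits, dim=1)).sum(dim=1).mean()
    # "Average Euclidean distance" of the three values (normalization assumed):
    return torch.sqrt(h1**2 + h2**2 + h3**2) / 3
```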
S6. Use the Softmax function to add the trained text and image prediction result vectors, obtaining the final classification result.
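A sketch of this decision step; whether Softmax is applied to each branch before the addition or to the sum is not spelled out here, so per-branch Softmax is assumed:

```python
import torch.nn.functional as F

def predict(text_model, image_model, token_ids, img):
    p1 = F.softmax(text_model(token_ids), dim=1)   # text prediction vector
    p2 = F.softmax(image_model(img), dim=1)        # image prediction vector
    return (p1 + p2).argmax(dim=1)                 # final classification result
```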
S7. Feed the short text and image to be recognized into the trained multi-modal scene recognition model, respectively, to perform scene recognition.
Overall, the present invention fuses an image convolutional neural network with a short-text convolutional neural network and adopts a new decision fusion approach: the two classification results obtained from training are each compared with the standard results by cross-entropy, the cross-entropy between the classification results of the two modalities is then computed, and finally the average Euclidean distance of the three values is returned to the feedforward network as the loss value to update the parameters. Compared with the prior art, this achieves a higher recognition rate.
The deep learning based multi-modal scene recognition method provided by the invention proposes a new multi-modal scene search approach and gives users a more accurate and convenient means of scene recognition. The method comprehensively extracts the features of text and images, designs a new loss function, and exploits the information of multiple modalities, improving the accuracy of scene recognition.
The invention also provides a reference for other related problems in the same field and can be extended on this basis to other technical solutions related to scene recognition methods, and therefore has very broad application prospects.
It will be apparent to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-limiting; the scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced within the invention. No reference sign in the claims shall be construed as limiting the claim concerned.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description serves clarity only; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be suitably combined to form other implementations understandable to those skilled in the art.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242039.7A CN110046656B (en) | 2019-03-28 | 2019-03-28 | Multi-mode scene recognition method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242039.7A CN110046656B (en) | 2019-03-28 | 2019-03-28 | Multi-mode scene recognition method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110046656A true CN110046656A (en) | 2019-07-23 |
CN110046656B CN110046656B (en) | 2023-07-11 |
Family
ID=67275472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910242039.7A Active CN110046656B (en) | 2019-03-28 | 2019-03-28 | Multi-mode scene recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046656B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866092A (en) * | 2019-11-25 | 2020-03-06 | 三角兽(北京)科技有限公司 | Information searching method and device, electronic equipment and storage medium |
CN111079813A (en) * | 2019-12-10 | 2020-04-28 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
CN111310795A (en) * | 2020-01-19 | 2020-06-19 | 中国科学院动物研究所 | Multi-modal fruit fly recognition system and method based on image and molecular data |
CN111985520A (en) * | 2020-05-15 | 2020-11-24 | 南京智谷人工智能研究院有限公司 | A Multimodal Classification Method Based on Graph Convolutional Neural Networks |
CN112115806A (en) * | 2020-08-28 | 2020-12-22 | 河海大学 | Accurate classification method of remote sensing image scene based on Dual-ResNet small sample learning |
CN112527858A (en) * | 2020-11-26 | 2021-03-19 | 微梦创科网络科技(中国)有限公司 | Marketing account identification method, device, medium and equipment based on social content |
CN112884074A (en) * | 2021-03-22 | 2021-06-01 | 杭州太火鸟科技有限公司 | Image design method, equipment, storage medium and device based on decision tree |
CN113177961A (en) * | 2021-06-07 | 2021-07-27 | 傲雄在线(重庆)科技有限公司 | Multi-mode depth model training method for seal image-text comparison |
CN113393833A (en) * | 2021-06-16 | 2021-09-14 | 中国科学技术大学 | Audio and video awakening method, system, device and storage medium |
CN113554021A (en) * | 2021-06-07 | 2021-10-26 | 傲雄在线(重庆)科技有限公司 | Intelligent seal identification method |
CN114090780A (en) * | 2022-01-20 | 2022-02-25 | 宏龙科技(杭州)有限公司 | Prompt learning-based rapid picture classification method |
CN114241279A (en) * | 2021-12-30 | 2022-03-25 | 中科讯飞互联(北京)信息科技有限公司 | Image-text joint error correction method, device, storage medium and computer equipment |
CN114266938A (en) * | 2021-12-23 | 2022-04-01 | 南京邮电大学 | A scene recognition method based on multimodal information and global attention mechanism |
CN114581861A (en) * | 2022-03-02 | 2022-06-03 | 北京交通大学 | A Track Region Recognition Method Based on Deep Learning Convolutional Neural Networks |
CN114757287A (en) * | 2022-04-19 | 2022-07-15 | 王荣 | An automated testing method based on multimodal fusion of text and images |
CN114942857A (en) * | 2021-11-11 | 2022-08-26 | 北京电信发展有限公司 | Multi-mode service intelligent diagnosis system |
CN115115868A (en) * | 2022-04-13 | 2022-09-27 | 之江实验室 | Triple-modal collaborative scene recognition method based on triples |
WO2023056889A1 (en) * | 2021-10-09 | 2023-04-13 | 百果园技术(新加坡)有限公司 | Model training and scene recognition method and apparatus, device, and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
WO2018213841A1 (en) * | 2017-05-19 | 2018-11-22 | Google Llc | Multi-task multi-modal machine learning model |
CN109146849A (en) * | 2018-07-26 | 2019-01-04 | 昆明理工大学 | A kind of road surface crack detection method based on convolutional neural networks and image recognition |
- 2019-03-28: application CN201910242039.7A granted as patent CN110046656B (en) — active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018213841A1 (en) * | 2017-05-19 | 2018-11-22 | Google Llc | Multi-task multi-modal machine learning model |
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
CN109146849A (en) * | 2018-07-26 | 2019-01-04 | 昆明理工大学 | A kind of road surface crack detection method based on convolutional neural networks and image recognition |
Non-Patent Citations (1)
Title |
---|
Liang Mengmeng et al., "Multi-modal lung tumor image recognition based on randomized fusion and CNN", Journal of Nanjing University (Natural Science) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866092A (en) * | 2019-11-25 | 2020-03-06 | 三角兽(北京)科技有限公司 | Information searching method and device, electronic equipment and storage medium |
CN111079813A (en) * | 2019-12-10 | 2020-04-28 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
CN111079813B (en) * | 2019-12-10 | 2023-07-07 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
CN111310795A (en) * | 2020-01-19 | 2020-06-19 | 中国科学院动物研究所 | Multi-modal fruit fly recognition system and method based on image and molecular data |
CN111985520A (en) * | 2020-05-15 | 2020-11-24 | 南京智谷人工智能研究院有限公司 | A Multimodal Classification Method Based on Graph Convolutional Neural Networks |
CN111985520B (en) * | 2020-05-15 | 2022-08-16 | 南京智谷人工智能研究院有限公司 | Multi-mode classification method based on graph convolution neural network |
CN112115806A (en) * | 2020-08-28 | 2020-12-22 | 河海大学 | Accurate classification method of remote sensing image scene based on Dual-ResNet small sample learning |
CN112115806B (en) * | 2020-08-28 | 2022-08-19 | 河海大学 | Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning |
CN112527858A (en) * | 2020-11-26 | 2021-03-19 | 微梦创科网络科技(中国)有限公司 | Marketing account identification method, device, medium and equipment based on social content |
CN112884074A (en) * | 2021-03-22 | 2021-06-01 | 杭州太火鸟科技有限公司 | Image design method, equipment, storage medium and device based on decision tree |
CN113177961A (en) * | 2021-06-07 | 2021-07-27 | 傲雄在线(重庆)科技有限公司 | Multi-mode depth model training method for seal image-text comparison |
CN113554021A (en) * | 2021-06-07 | 2021-10-26 | 傲雄在线(重庆)科技有限公司 | Intelligent seal identification method |
CN113554021B (en) * | 2021-06-07 | 2023-12-15 | 重庆傲雄在线信息技术有限公司 | Intelligent seal identification method |
CN113393833A (en) * | 2021-06-16 | 2021-09-14 | 中国科学技术大学 | Audio and video awakening method, system, device and storage medium |
CN113393833B (en) * | 2021-06-16 | 2024-04-02 | 中国科学技术大学 | Audio and video wake-up methods, systems, devices and storage media |
WO2023056889A1 (en) * | 2021-10-09 | 2023-04-13 | 百果园技术(新加坡)有限公司 | Model training and scene recognition method and apparatus, device, and medium |
CN114942857A (en) * | 2021-11-11 | 2022-08-26 | 北京电信发展有限公司 | Multi-mode service intelligent diagnosis system |
CN114266938A (en) * | 2021-12-23 | 2022-04-01 | 南京邮电大学 | A scene recognition method based on multimodal information and global attention mechanism |
CN114241279A (en) * | 2021-12-30 | 2022-03-25 | 中科讯飞互联(北京)信息科技有限公司 | Image-text joint error correction method, device, storage medium and computer equipment |
CN114090780A (en) * | 2022-01-20 | 2022-02-25 | 宏龙科技(杭州)有限公司 | Prompt learning-based rapid picture classification method |
CN114581861A (en) * | 2022-03-02 | 2022-06-03 | 北京交通大学 | A Track Region Recognition Method Based on Deep Learning Convolutional Neural Networks |
CN115115868A (en) * | 2022-04-13 | 2022-09-27 | 之江实验室 | Triple-modal collaborative scene recognition method based on triples |
CN115115868B (en) * | 2022-04-13 | 2024-05-07 | 之江实验室 | A triplet-based multimodal collaborative scene recognition method |
CN114757287A (en) * | 2022-04-19 | 2022-07-15 | 王荣 | An automated testing method based on multimodal fusion of text and images |
Also Published As
Publication number | Publication date |
---|---|
CN110046656B (en) | 2023-07-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||