CN110046656A - Multi-modal scene recognition method based on deep learning - Google Patents

Multi-modal scene recognition method based on deep learning

Info

Publication number
CN110046656A
CN110046656A
Authority
CN
China
Prior art keywords
scene recognition
layer
text
modal
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910242039.7A
Other languages
Chinese (zh)
Other versions
CN110046656B (en)
Inventor
吴家皋
刘源
孙璨
郑剑刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201910242039.7A priority Critical patent/CN110046656B/en
Publication of CN110046656A publication Critical patent/CN110046656A/en
Application granted granted Critical
Publication of CN110046656B publication Critical patent/CN110046656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention discloses a multi-modal scene recognition method based on deep learning, comprising the following steps: S1, performing word segmentation on a short text; S2, inputting a group of pictures together with the segmented short text and the corresponding labels into their respective convolutional neural networks for training; S3, training a short-text classification model; S4, training a picture classification model; S5, computing the cross entropy between each of the fully connected layer outputs of S3 and S4 and the standard classification result, computing the average Euclidean distance and using it as the loss value, then feeding the loss back to the respective convolutional neural networks, finally obtaining the complete multi-modal scene recognition model; S6, adding the text and image prediction result vectors to obtain the final classification result; S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition. The invention proposes a multi-modal scene retrieval mode that provides users with more accurate and convenient scene recognition.

Description

Multi-modal scene recognition method based on deep learning
Technical field
The present invention relates to a multi-modal scene recognition method, and in particular to a multi-modal scene recognition method based on deep learning, belonging to the fields of artificial intelligence and pattern recognition.
Background technique
Deep learning is a relatively new field of machine learning whose purpose is to bring machine learning closer to human intelligence. Convolutional neural networks are a representative algorithm of deep learning; they feature a simple structure, strong adaptability, few training parameters, and many connections, and have therefore been widely used for years in fields such as image processing and pattern recognition.
Specifically, a convolutional neural network is a hierarchical model whose input is raw data. Through a stacked sequence of operations such as convolution, pooling, and nonlinear activation functions, high-level semantic information is extracted layer by layer from the raw-data input layer and progressively abstracted; this process is called the "feed-forward operation". Finally, the last layer of the network formalizes an objective function via a specified loss function that measures the error between the predicted value and the true value. The error is then propagated backward from the last layer through every layer by the backpropagation algorithm, updating each layer's parameters, after which the feed-forward pass is run again with the updated parameters. This repeats until the network model converges, thereby achieving the goal of model training.
Currently, common modality-fusion approaches mainly include decision fusion and feature fusion.
Decision fusion refers to weighting and combining the classification results of the two modalities, once those results have been obtained, to produce the final result. Meng-Ju Han et al. proposed a decision-fusion strategy that uses the normalized average Euclidean distance between the training samples and the decision plane as the fusion weight, achieving a recognition rate roughly 5% higher than a single modality. Although the decision-fusion approach is fairly simple to implement, the results it obtains are not objective enough.
Feature fusion, by contrast, fuses the features extracted from the two modalities before classifying. S. Emerich et al. fused extracted facial-expression features with speech features; the fused features improved both recognition rate and robustness compared with a single modality. The feature-fusion approach yields more objective results, but its implementation is overly complex.
In conclusion how to propose a kind of completely new multi-modal scene recognition method on the basis of existing technology, to the greatest extent may be used Can ground retain the respective advantage of Decision fusion and Fusion Features two ways, overcome its respective deficiency, also just become ability Technical staff's urgent problem to be solved in domain.
Summary of the invention
In view of the above drawbacks of the prior art, the purpose of the present invention is to propose a multi-modal scene recognition method based on deep learning, comprising the following steps:
S1, performing word segmentation on a short text;
S2, inputting a group of pictures together with the segmented short text and the corresponding labels into their respective convolutional neural networks for training;
S3, training a short-text classification model;
S4, training a picture classification model;
S5, computing the cross entropy between each of the fully connected layer outputs of S3 and S4 and the standard classification result, computing the average Euclidean distance and using it as the loss value, then feeding the loss back to the respective convolutional neural networks, and repeating training until the model converges, finally obtaining the complete multi-modal scene recognition model;
S6, adding the trained text and image prediction result vectors to obtain the final classification result;
S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition.
Preferably, S1 specifically comprises the following step: performing word segmentation on the short text using the jieba word-segmentation tool.
Preferably, S3 specifically comprises the following steps:
S31, vectorizing the input short-text word-segmentation result and feeding it into three parallel convolutional layers;
S32, sending the outputs of the three parallel convolutional layers in turn through a rectified linear unit layer and a pooling layer, obtaining multiple pooled output results;
S33, concatenating the multiple pooled output results and, after random dropout, using them as the input of a fully connected layer, finally computing the fully connected layer to obtain the text classification prediction result vector output.
Preferably, the three parallel convolutional layers comprise a first convolutional layer, a second convolutional layer and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3×128, the second convolutional layer has 256 convolution kernels of size 4×128, and the third convolutional layer has 128 convolution kernels of size 5×128.
Preferably, S4 specifically comprises the following steps:
S41, feeding the input picture into the first convolutional layer, extracting the corresponding number of features from the picture according to the designed number of convolution kernels, and outputting the convolutional-layer result;
S42, pooling the convolutional-layer output to compress the data and kernel parameters and reduce overfitting, then feeding the pooled result into the next convolutional layer; after four such rounds of convolution and pooling, the randomly initialized weights in the convolution kernels are trained continuously to obtain the model parameters;
S43, feeding the last pooled result into a fully connected layer and, after random dropout, computing the image classification prediction result vector output.
Preferably, computing the average Euclidean distance in S5 and using it as the loss value specifically comprises the following step: computing the loss value using a loss function S built from the terms h1 = H(p1, q1), h2 = H(p2, q2) and h3 = H(p1, p2), where p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the text classification standard result vector, q2 is the image classification standard result vector, and H(·) is the cross-entropy function.
Preferably, S6 specifically comprises the following step: adding the trained text and image prediction result vectors using the Softmax function to obtain the final classification result.
Compared with the prior art, the advantages of the present invention are mainly reflected in the following aspects:
The multi-modal scene recognition method based on deep learning provided by the present invention proposes a completely new multi-modal scene retrieval mode, providing users with more accurate and convenient means of scene recognition. The method of the invention comprehensively extracts the features of text and image and designs a new loss function, using the information of multiple modalities to improve the accuracy of scene recognition.
The present invention also provides a reference for other related problems in the same field; it can be expanded and extended on this basis and applied to other technical solutions related to scene recognition methods, and thus has very broad application prospects.
The embodiments of the present invention are described in further detail below in conjunction with the accompanying drawings, so that the technical solution of the invention is easier to understand and grasp.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the multi-modal scene recognition model constructed by the present invention.
Detailed description of the embodiments
To address the problems of inaccurate results and high complexity in existing scene recognition methods, the present invention provides a new multi-modal scene recognition method based on deep learning, which uses convolutional neural networks to separately extract the feature information of the image and text modalities from the multi-modal input, and fuses the multi-modal feature information, thereby improving the accuracy of scene recognition.
Specifically, the multi-modal scene recognition method based on deep learning of the present invention comprises the following steps.
S1, word segmentation is performed on the short text using the jieba word-segmentation tool, as in the sketch below.
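A minimal sketch of step S1, assuming the segmentation tool referred to here (rendered literally as the "stammerer" tool, 结巴) is the open-source jieba library; the sample sentence is an illustrative placeholder, not taken from the patent:

```python
# Sketch of step S1: word segmentation of a short text with jieba.
import jieba

short_text = "周末和朋友在海边沙滩上烧烤"  # hypothetical short text describing a beach scene
tokens = jieba.lcut(short_text)            # precise-mode segmentation into a word list
print(tokens)                              # e.g. ['周末', '和', '朋友', '在', '海边', ...]
```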
S2, a group of pictures together with the segmented short text and the corresponding labels are input into their respective convolutional neural networks for training.
S3, the short-text classification model is trained, specifically comprising the following steps (a code sketch of this branch follows the steps below):
S31, during the training of the short-text classification model, the input short-text word-segmentation result is vectorized and fed into three parallel convolutional layers.
The three parallel convolutional layers comprise a first convolutional layer, a second convolutional layer and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3×128, the second convolutional layer has 256 convolution kernels of size 4×128, and the third convolutional layer has 128 convolution kernels of size 5×128.
S32, the outputs of the three parallel convolutional layers are sent in turn through a rectified linear unit (ReLU) layer and a pooling layer, obtaining multiple pooled output results.
S33, the multiple pooled output results are concatenated and, after random dropout, used as the input of the fully connected layer; the fully connected layer is finally computed to obtain the text classification prediction result vector output.
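The following is a minimal PyTorch sketch of this text branch (S31 to S33), a sketch under stated assumptions rather than a definitive implementation: the kernel counts and sizes (384 of 3×128, 256 of 4×128, 128 of 5×128) follow the description above, while the vocabulary size, class count, dropout rate, and use of max pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, num_classes=10, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Three parallel convolutional layers over the word-vector sequence:
        # kernel height = n-gram width, kernel width = embedding dimension.
        self.convs = nn.ModuleList([
            nn.Conv2d(1, 384, (3, embed_dim)),
            nn.Conv2d(1, 256, (4, embed_dim)),
            nn.Conv2d(1, 128, (5, embed_dim)),
        ])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(384 + 256 + 128, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).unsqueeze(1)     # (batch, 1, seq_len, embed_dim)
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(x)).squeeze(3)             # ReLU layer after each convolution
            h = F.max_pool1d(h, h.size(2)).squeeze(2)  # pooling layer over the sequence
            pooled.append(h)
        feats = self.dropout(torch.cat(pooled, dim=1)) # concatenation + random dropout
        return self.fc(feats)                          # text classification prediction vector
```

Because the 3×128, 4×128 and 5×128 kernels span the full 128-dimensional word vector, each convolution slides only along the word sequence, acting as a 3-, 4- or 5-gram detector over the segmented text.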
S4, the picture classification model is trained, specifically comprising the following steps (see the sketch after these steps):
S41, the input picture is fed into the first convolutional layer, which extracts the corresponding number of features from the picture according to the designed number of convolution kernels and outputs the convolutional-layer result.
S42, the convolutional-layer output is pooled to compress the data and kernel parameters and reduce overfitting, and the pooled result is then fed into the next convolutional layer. After four such rounds of convolution and pooling, the randomly initialized weights in the convolution kernels are trained continuously to obtain the model parameters used in the method of the present invention.
S43, the last pooled result is fed into the fully connected layer and, after random dropout, the image classification prediction result vector output is computed.
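A minimal PyTorch sketch of the image branch (S41 to S43). The four convolution-pooling rounds and the dropout before the fully connected layer follow the description above; the channel counts, 3×3 kernels, 224×224 input resolution, and class count are illustrative assumptions not specified in the patent.

```python
import torch
import torch.nn as nn

class ImageCNN(nn.Module):
    def __init__(self, num_classes=10, dropout=0.5):
        super().__init__()
        blocks, in_ch = [], 3
        for out_ch in (32, 64, 128, 256):                # 4 convolution-pooling rounds
            blocks += [
                nn.Conv2d(in_ch, out_ch, 3, padding=1),  # randomly initialized kernels
                nn.ReLU(),
                nn.MaxPool2d(2),                         # pooling compresses the data
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(256 * 14 * 14, num_classes)  # assumes 224x224 input

    def forward(self, images):                     # (batch, 3, 224, 224)
        feats = self.features(images).flatten(1)   # last pooled result, flattened
        return self.fc(self.dropout(feats))        # image classification prediction vector
```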
S5, the cross entropy between each of the fully connected layer outputs of S3 and S4 and the standard classification result is computed, the average Euclidean distance is computed and used as the loss value, which is then fed back to the respective convolutional neural networks; training is repeated until the model converges, finally yielding the complete multi-modal scene recognition model. The model structure is shown in Fig. 1.
Computing the average Euclidean distance and using it as the loss value specifically comprises the following step: computing the loss value using a loss function S built from the terms h1 = H(p1, q1), h2 = H(p2, q2) and h3 = H(p1, p2), where p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the text classification standard result vector, q2 is the image classification standard result vector, and H(·) is the cross-entropy function.
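The closed form of S appears as a formula image in the original publication and is not reproduced on this page. The sketch below therefore assumes one plausible reading of "the average Euclidean distance of the three", namely S = sqrt(h1² + h2² + h3²) / 3, and likewise assumes the argument order of H(p1, p2); both are assumptions, not a confirmed reconstruction.

```python
# Sketch of the S5 loss under the stated assumptions.
import torch
import torch.nn.functional as F

def fusion_loss(text_logits, image_logits, labels):
    h1 = F.cross_entropy(text_logits, labels)   # h1 = H(p1, q1): text vs. standard result
    h2 = F.cross_entropy(image_logits, labels)  # h2 = H(p2, q2): image vs. standard result
    # h3 = H(p1, p2): cross entropy between the two modalities' predictions
    # (argument order assumed).
    p1 = F.softmax(text_logits, dim=1)
    h3 = -(p1 * F.log_softmax(image_logits, dim=1)).sum(dim=1).mean()
    # Assumed "average Euclidean distance" form; the patent's own formula
    # image may differ.
    return torch.sqrt(h1 ** 2 + h2 ** 2 + h3 ** 2) / 3
```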
S6, the trained text and image prediction result vectors are added using the Softmax function to obtain the final classification result (see the sketch below).
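A minimal sketch of this decision-fusion step, assuming element-wise addition of the softmax-normalized prediction vectors followed by an argmax over the combined vector:

```python
# Sketch of step S6: softmax the two prediction vectors, add them,
# and take the largest entry as the final scene class.
import torch
import torch.nn.functional as F

def fuse_predictions(text_logits, image_logits):
    combined = F.softmax(text_logits, dim=1) + F.softmax(image_logits, dim=1)
    return combined.argmax(dim=1)  # final classification result per sample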
S7, the short text and image to be recognized are input into the trained multi-modal scene recognition model to perform scene recognition.
In general, the present invention fuses the image convolutional neural network and the short-text convolutional neural network using a new decision-fusion mode: it first trains the networks to obtain the two classes of classification results and computes the cross entropy of each against the standard results, then computes the cross entropy between the classification results of the two modalities, and finally computes the average Euclidean distance of the three as the loss value, which is returned to the feed-forward network to update the parameters. This achieves a higher recognition rate than the prior art.
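Pulling the sketches above together, the following is a hedged outline of one training step of this feedback loop; the optimizer choice, learning rate, and batch variables are assumptions not taken from the patent.

```python
# Hypothetical training step wiring TextCNN, ImageCNN and fusion_loss
# (defined in the sketches above) into the feedback loop of S5.
import torch

text_net, image_net = TextCNN(), ImageCNN()
optimizer = torch.optim.Adam(
    list(text_net.parameters()) + list(image_net.parameters()), lr=1e-3)

def train_step(token_ids, images, labels):
    optimizer.zero_grad()
    loss = fusion_loss(text_net(token_ids), image_net(images), labels)
    loss.backward()   # loss value fed back to both convolutional networks
    optimizer.step()  # layer parameters updated; training repeats until convergence
    return loss.item()
```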
The multi-modal scene recognition method based on deep learning provided by the present invention proposes a completely new multi-modal scene retrieval mode, providing users with more accurate and convenient means of scene recognition. The method of the invention comprehensively extracts the features of text and image and designs a new loss function, using the information of multiple modalities to improve the accuracy of scene recognition.
The present invention also provides a reference for other related problems in the same field; it can be expanded and extended on this basis and applied to other technical solutions related to scene recognition methods, and thus has very broad application prospects.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit and essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and not restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and scope of equivalents of the claims are intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claims involved.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for clarity; those skilled in the art should treat the specification as a whole, and the technical solutions in the various embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.

Claims (7)

1. A multi-modal scene recognition method based on deep learning, characterized by comprising the following steps:
S1, performing word segmentation on a short text;
S2, inputting a group of pictures together with the segmented short text and the corresponding labels into their respective convolutional neural networks for training;
S3, training a short-text classification model;
S4, training a picture classification model;
S5, computing the cross entropy between each of the fully connected layer outputs of S3 and S4 and the standard classification result, computing the average Euclidean distance and using it as the loss value, then feeding the loss back to the respective convolutional neural networks, and repeating training until the model converges, finally obtaining the complete multi-modal scene recognition model;
S6, adding the trained text and image prediction result vectors to obtain the final classification result;
S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition.
2. The multi-modal scene recognition method based on deep learning according to claim 1, characterized in that S1 specifically comprises the following step: performing word segmentation on the short text using the jieba word-segmentation tool.
3. The multi-modal scene recognition method based on deep learning according to claim 1, characterized in that S3 specifically comprises the following steps:
S31, vectorizing the input short-text word-segmentation result and feeding it into three parallel convolutional layers;
S32, sending the outputs of the three parallel convolutional layers in turn through a rectified linear unit layer and a pooling layer, obtaining multiple pooled output results;
S33, concatenating the multiple pooled output results and, after random dropout, using them as the input of a fully connected layer, finally computing the fully connected layer to obtain the text classification prediction result vector output.
4. The multi-modal scene recognition method based on deep learning according to claim 3, characterized in that: the three parallel convolutional layers comprise a first convolutional layer, a second convolutional layer and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3×128, the second convolutional layer has 256 convolution kernels of size 4×128, and the third convolutional layer has 128 convolution kernels of size 5×128.
5. The multi-modal scene recognition method based on deep learning according to claim 3, characterized in that S4 specifically comprises the following steps:
S41, feeding the input picture into the first convolutional layer, extracting the corresponding number of features from the picture according to the designed number of convolution kernels, and outputting the convolutional-layer result;
S42, pooling the convolutional-layer output to compress the data and kernel parameters and reduce overfitting, then feeding the pooled result into the next convolutional layer; after four such rounds of convolution and pooling, the randomly initialized weights in the convolution kernels are trained continuously to obtain the model parameters;
S43, feeding the last pooled result into a fully connected layer and, after random dropout, computing the image classification prediction result vector output.
6. The multi-modal scene recognition method based on deep learning according to claim 5, characterized in that computing the average Euclidean distance in S5 and using it as the loss value specifically comprises the following step: computing the loss value using a loss function S built from the terms
h1 = H(p1, q1), h2 = H(p2, q2) and h3 = H(p1, p2), where p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the text classification standard result vector, q2 is the image classification standard result vector, and H(·) is the cross-entropy function.
7. The multi-modal scene recognition method based on deep learning according to claim 1, characterized in that S6 specifically comprises the following step: adding the trained text and image prediction result vectors using the Softmax function to obtain the final classification result.
CN201910242039.7A 2019-03-28 2019-03-28 Multi-mode scene recognition method based on deep learning Active CN110046656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910242039.7A CN110046656B (en) 2019-03-28 2019-03-28 Multi-mode scene recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910242039.7A CN110046656B (en) 2019-03-28 2019-03-28 Multi-mode scene recognition method based on deep learning

Publications (2)

Publication Number Publication Date
CN110046656A true CN110046656A (en) 2019-07-23
CN110046656B CN110046656B (en) 2023-07-11

Family

ID=67275472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910242039.7A Active CN110046656B (en) 2019-03-28 2019-03-28 Multi-mode scene recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN110046656B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018213841A1 (en) * 2017-05-19 2018-11-22 Google Llc Multi-task multi-modal machine learning model
CN107679491A (en) * 2017-09-29 2018-02-09 华中师范大学 A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN109146849A (en) * 2018-07-26 2019-01-04 昆明理工大学 A kind of road surface crack detection method based on convolutional neural networks and image recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG Mengmeng et al., "Multi-modal lung tumor image recognition based on randomized fusion and CNN", Journal of Nanjing University (Natural Science) *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866092A (en) * 2019-11-25 2020-03-06 三角兽(北京)科技有限公司 Information searching method and device, electronic equipment and storage medium
CN111079813A (en) * 2019-12-10 2020-04-28 北京百度网讯科技有限公司 Classification model calculation method and device based on model parallelism
CN111079813B (en) * 2019-12-10 2023-07-07 北京百度网讯科技有限公司 Classification model calculation method and device based on model parallelism
CN111310795A (en) * 2020-01-19 2020-06-19 中国科学院动物研究所 Multi-modal fruit fly recognition system and method based on image and molecular data
CN111985520A (en) * 2020-05-15 2020-11-24 南京智谷人工智能研究院有限公司 Multi-mode classification method based on graph convolution neural network
CN111985520B (en) * 2020-05-15 2022-08-16 南京智谷人工智能研究院有限公司 Multi-mode classification method based on graph convolution neural network
CN112115806A (en) * 2020-08-28 2020-12-22 河海大学 Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning
CN112115806B (en) * 2020-08-28 2022-08-19 河海大学 Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning
CN112527858A (en) * 2020-11-26 2021-03-19 微梦创科网络科技(中国)有限公司 Marketing account identification method, device, medium and equipment based on social content
CN112884074A (en) * 2021-03-22 2021-06-01 杭州太火鸟科技有限公司 Image design method, equipment, storage medium and device based on decision tree
CN113177961A (en) * 2021-06-07 2021-07-27 傲雄在线(重庆)科技有限公司 Multi-mode depth model training method for seal image-text comparison
CN113554021A (en) * 2021-06-07 2021-10-26 傲雄在线(重庆)科技有限公司 Intelligent seal identification method
CN113554021B (en) * 2021-06-07 2023-12-15 重庆傲雄在线信息技术有限公司 Intelligent seal identification method
CN113393833A (en) * 2021-06-16 2021-09-14 中国科学技术大学 Audio and video awakening method, system, device and storage medium
CN113393833B (en) * 2021-06-16 2024-04-02 中国科学技术大学 Audio and video awakening method, system, equipment and storage medium
WO2023056889A1 (en) * 2021-10-09 2023-04-13 百果园技术(新加坡)有限公司 Model training and scene recognition method and apparatus, device, and medium
CN114942857A (en) * 2021-11-11 2022-08-26 北京电信发展有限公司 Multi-mode service intelligent diagnosis system
CN114090780A (en) * 2022-01-20 2022-02-25 宏龙科技(杭州)有限公司 Prompt learning-based rapid picture classification method
CN114581861A (en) * 2022-03-02 2022-06-03 北京交通大学 Track area identification method based on deep learning convolutional neural network
CN115115868A (en) * 2022-04-13 2022-09-27 之江实验室 Triple-modal collaborative scene recognition method based on triples
CN115115868B (en) * 2022-04-13 2024-05-07 之江实验室 Multi-mode collaborative scene recognition method based on triples

Also Published As

Publication number Publication date
CN110046656B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN110046656A (en) Multi-modal scene recognition method based on deep learning
CN106709461B (en) Activity recognition method and device based on video
Fathallah et al. Facial expression recognition via deep learning
CN107679526B (en) Human face micro-expression recognition method
CN108615010B (en) Facial expression recognition method based on parallel convolution neural network feature map fusion
Chai et al. Two streams recurrent neural networks for large-scale continuous gesture recognition
CN106529503B (en) A kind of integrated convolutional neural networks face emotion identification method
Liu et al. Multi-channel pose-aware convolution neural networks for multi-view facial expression recognition
CN109241255A (en) A kind of intension recognizing method based on deep learning
CN103268495B (en) Human body behavior modeling recognition methods based on priori knowledge cluster in computer system
CN108846350A (en) Tolerate the face identification method of change of age
CN106651830A (en) Image quality test method based on parallel convolutional neural network
Zhou et al. Convolutional neural networks based pornographic image classification
CN108537120A (en) A kind of face identification method and system based on deep learning
Kindiroglu et al. Temporal accumulative features for sign language recognition
Tan et al. Style interleaved learning for generalizable person re-identification
CN110490028A (en) Recognition of face network training method, equipment and storage medium based on deep learning
Sang et al. Discriminative deep feature learning for facial emotion recognition
CN117195148A (en) Ore emotion recognition method based on expression, electroencephalogram and voice multi-mode fusion
Cheng et al. Student action recognition based on deep convolutional generative adversarial network
Fan et al. Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition.
CN113159002B (en) Facial expression recognition method based on self-attention weight auxiliary module
Tian et al. 3D facial expression recognition using deep feature fusion CNN
CN110287938A (en) Event recognition method, system, equipment and medium based on critical segment detection
CN103440332B (en) A kind of image search method strengthening expression based on relational matrix regularization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant