CN110046656A - Multi-modal scene recognition method based on deep learning - Google Patents
Multi-modal scene recognition method based on deep learning
- Publication number
- CN110046656A (application CN201910242039.7A)
- Authority
- CN
- China
- Prior art keywords
- scene recognition
- layer
- text
- modal
- deep learning
- Prior art date: 2019-03-28
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The present invention discloses a multi-modal scene recognition method based on deep learning, comprising the following steps: S1, performing word segmentation on a short text; S2, feeding a group of pictures, the segmented short text, and the corresponding labels into their respective convolutional neural networks for training; S3, training a short-text classification model; S4, training a picture classification model; S5, computing the cross entropy between each fully connected layer output from S3 and S4 and the standard classification result, computing the average Euclidean distance of these values as the loss, feeding the loss back into the respective convolutional neural networks, and finally obtaining a complete multi-modal scene recognition model; S6, adding the text and image prediction result vectors to obtain the final classification result; S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition. The invention proposes a multi-modal scene retrieval mode that provides users with more accurate and convenient scene recognition.
Description
Technical field
The present invention relates to multi-modal scene recognition methods, and in particular to a multi-modal scene recognition method based on deep learning, belonging to the fields of artificial intelligence and pattern recognition.
Background technique
Deep learning is a relatively new field of machine learning whose purpose is to bring machine learning closer to human intelligence. Convolutional neural networks are representative deep learning algorithms, featuring a simple structure, strong adaptability, few training parameters, and many connections. For these reasons, such networks have long been widely used in fields such as image processing and pattern recognition.
Specifically, a convolutional neural network is a hierarchical model. Its input is raw data, which passes layer by layer through a sequence of operations such as convolution, pooling, and nonlinear activation functions, progressively extracting and abstracting high-level semantic information from the raw input. This process is called the "feedforward pass". Finally, the last layer of the network outputs an objective value; a loss function computes the error between the predicted value and the true value, and the backpropagation algorithm propagates this error from the last layer back through each layer to update its parameters, after which the network feeds forward again. This cycle repeats until the network model converges, achieving the goal of model training.
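As a minimal illustration of this feedforward and backpropagation cycle (a generic PyTorch sketch; the model, data loader, and hyperparameters are placeholders, not the specific networks of the present invention):

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """One feedforward pass, loss computation, backpropagation, parameter update."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):                    # repeat until the model converges
        for inputs, labels in loader:
            optimizer.zero_grad()
            outputs = model(inputs)            # feedforward pass
            loss = criterion(outputs, labels)  # error between prediction and ground truth
            loss.backward()                    # backpropagate the error layer by layer
            optimizer.step()                   # update each layer's parameters
```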
Currently, the commonly used modality fusion approaches are decision fusion and feature fusion.
Decision fusion refers to first obtaining the classification results of the two modalities and then computing a weighted combination of the two results to produce the final output. Meng-Ju Han et al. proposed a decision fusion strategy that uses the normalized average Euclidean distance between the training samples and the decision plane as the fusion weight, achieving a recognition rate roughly 5% higher than a single modality. Although the decision fusion procedure is fairly simple, the results it produces are not objective enough.
Feature fusion, by contrast, merges the features extracted from the two modalities before classifying. S. Emerich et al. fused extracted facial expression features with speech features; the fused features improved both recognition rate and robustness over a single modality. Feature fusion yields more objective results, but its implementation is considerably more complex.
In conclusion how to propose a kind of completely new multi-modal scene recognition method on the basis of existing technology, to the greatest extent may be used
Can ground retain the respective advantage of Decision fusion and Fusion Features two ways, overcome its respective deficiency, also just become ability
Technical staff's urgent problem to be solved in domain.
Summary of the invention
In view of the above drawbacks of the prior art, the purpose of the present invention is to propose a multi-modal scene recognition method based on deep learning, comprising the following steps:
S1, performing word segmentation on a short text;
S2, feeding a group of pictures, the segmented short text, and the corresponding labels into their respective convolutional neural networks for training;
S3, training a short-text classification model;
S4, training a picture classification model;
S5, computing the cross entropy between each fully connected layer output from S3 and S4 and the standard classification result, computing the average Euclidean distance of these values as the loss, then feeding the loss back into the respective convolutional neural networks, and repeating training until the model converges, finally obtaining a complete multi-modal scene recognition model;
S6, adding the trained text and image prediction result vectors to obtain the final classification result;
S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition.
Preferably, S1 specifically comprises the following step: performing word segmentation on the short text using the jieba word segmentation tool.
Preferably, S3 specifically comprises the following steps:
S31, vectorizing the input short-text word segmentation result and feeding it into three parallel convolutional layers;
S32, sending the outputs of the three parallel convolutional layers in turn through a rectified linear unit layer and a pooling layer to obtain multiple pooling output results;
S33, concatenating the multiple pooling output results, applying random dropout, using the result as the input of the fully connected layer, and finally computing the fully connected layer to obtain the text classification prediction result vector.
Preferably, the three parallel convolutional layers comprise a first convolutional layer, a second convolutional layer, and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3*128, the second convolutional layer has 256 convolution kernels of size 4*128, and the third convolutional layer has 128 convolution kernels of size 5*128.
Preferably, S4 specifically comprises the following steps:
S41, feeding the input picture into the first convolutional layer, extracting the corresponding number of features from the picture according to the designed number of convolution kernels, and outputting the convolutional layer result;
S42, pooling the convolutional layer output, compressing the amount of data and kernel parameters to reduce overfitting, then feeding the pooling result into the next convolutional layer; after 4 repeated convolution-pooling stages, the convolution kernel weights, initialized to random values, are continuously trained to obtain the model parameters;
S43, feeding the last pooling result into the fully connected layer and, after random dropout, computing the image classification prediction result vector as output.
Preferably, computing the average Euclidean distance as the loss in S5 specifically comprises the following step: computing the loss using a loss function S defined in terms of h1 = H(p1, q1), h2 = H(p2, q2), and h3 = H(p1, p2), where p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the standard text classification result vector, q2 is the standard image classification result vector, and H(·) is the cross entropy function.
Preferably, S6 specifically comprises the following step: adding the trained text and image prediction result vectors using the Softmax function to obtain the final classification result.
Compared with the prior art, the advantages of the present invention are mainly reflected in the following aspects:
The multi-modal scene recognition method based on deep learning provided by the present invention proposes a completely new multi-modal scene retrieval mode, providing users with more accurate and convenient means of scene recognition. The method of the invention comprehensively extracts the features of text and images and designs a new loss function; by using the information of multiple modalities, it improves the accuracy of scene recognition.
The present invention also provides a reference for other related problems in the same field, which can be expanded and extended on this basis and applied to other technical solutions related to scene recognition methods, and therefore has very broad application prospects.
The embodiments of the present invention are described in further detail below in conjunction with the accompanying drawings, so that the technical solution of the invention is easier to understand and grasp.
Description of the drawings
Fig. 1 is a structural schematic diagram of the multi-modal scene recognition model constructed by the present invention.
Specific embodiment
Aiming at the problems of inaccurate results and high complexity in existing scene recognition methods, the present invention provides a new multi-modal scene recognition method based on deep learning, which uses convolutional neural networks to separately extract the feature information of the image and text modalities from the multi-modal input and fuses the multi-modal feature information, improving the accuracy of scene recognition.
Specifically, the multi-modal scene recognition method based on deep learning of the present invention comprises the following steps.
S1, performing word segmentation on the short text using the jieba word segmentation tool.
S2, feeding a group of pictures, the segmented short text, and the corresponding labels into their respective convolutional neural networks for training.
S3, training the short-text classification model, which specifically comprises the following steps:
S31, during the training of the short-text classification model, vectorizing the input short-text word segmentation result and feeding it into three parallel convolutional layers.
The three parallel convolutional layers comprise a first convolutional layer, a second convolutional layer, and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3*128, the second convolutional layer has 256 convolution kernels of size 4*128, and the third convolutional layer has 128 convolution kernels of size 5*128.
S32, sending the outputs of the three parallel convolutional layers in turn through a rectified linear unit (ReLU) layer and a pooling layer to obtain multiple pooling output results.
S33, concatenating the multiple pooling output results, applying random dropout, using the result as the input of the fully connected layer, and finally computing the fully connected layer to obtain the text classification prediction result vector.
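A minimal PyTorch sketch of this three-branch text network is given below; the kernel counts and sizes follow the specification above, while `vocab_size`, `num_classes`, and the dropout rate are illustrative assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Three parallel convolutional layers over 128-dim word embeddings (S31-S33)."""
    def __init__(self, vocab_size, num_classes, embed_dim=128, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 384 kernels of size 3x128, 256 of 4x128, 128 of 5x128, as specified above
        self.convs = nn.ModuleList([
            nn.Conv2d(1, 384, (3, embed_dim)),
            nn.Conv2d(1, 256, (4, embed_dim)),
            nn.Conv2d(1, 128, (5, embed_dim)),
        ])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(384 + 256 + 128, num_classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = self.embed(tokens).unsqueeze(1)          # (batch, 1, seq_len, 128)
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(x)).squeeze(3)           # ReLU layer (S32)
            pooled.append(F.max_pool1d(h, h.size(2)).squeeze(2))  # pooling layer (S32)
        x = self.dropout(torch.cat(pooled, dim=1))   # concatenate + random dropout (S33)
        return self.fc(x)                            # text classification prediction vector
```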
S4, training the picture classification model, which specifically comprises the following steps:
S41, feeding the input picture into the first convolutional layer, extracting the corresponding number of features from the picture according to the designed number of convolution kernels, and outputting the convolutional layer result.
S42, pooling the convolutional layer output, compressing the amount of data and kernel parameters to reduce overfitting, then feeding the pooling result into the next convolutional layer; after 4 repeated convolution-pooling stages, the convolution kernel weights, initialized to random values, are continuously trained to obtain the model parameters used in the method of the present invention.
S43, feeding the last pooling result into the fully connected layer and, after random dropout, computing the image classification prediction result vector as output.
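Similarly, a hedged sketch of the picture branch is given below; the four repeated convolution-pooling stages, the random (default) weight initialization, the dropout, and the final fully connected layer follow S41-S43, while the channel widths, 224*224 input size, and 3*3 kernels are assumptions, since the patent does not specify them:

```python
import torch.nn as nn

class ImageCNN(nn.Module):
    """Four repeated convolution + pooling stages (S41-S42), then dropout + FC (S43)."""
    def __init__(self, num_classes, dropout=0.5):
        super().__init__()
        blocks, in_ch = [], 3
        for out_ch in (32, 64, 128, 256):            # 4 conv-pool repetitions; widths assumed
            blocks += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # kernels start random
                       nn.ReLU(),
                       nn.MaxPool2d(2)]              # pooling compresses the representation
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(256 * 14 * 14, num_classes)  # 224 -> 14 after four 2x poolings

    def forward(self, images):                       # images: (batch, 3, 224, 224)
        x = self.features(images).flatten(1)
        return self.fc(self.dropout(x))              # image classification prediction vector
```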
S5, computing the cross entropy between each fully connected layer output from S3 and S4 and the standard classification result, computing the average Euclidean distance of these values as the loss, then feeding the loss back into the respective convolutional neural networks, and repeating training until the model converges, finally obtaining the complete multi-modal scene recognition model. The model structure is shown in Fig. 1.
Computing the average Euclidean distance as the loss specifically comprises the following step: computing the loss using a loss function S defined in terms of h1 = H(p1, q1), h2 = H(p2, q2), and h3 = H(p1, p2), where p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the standard text classification result vector, q2 is the standard image classification result vector, and H(·) is the cross entropy function.
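Reading S as the average Euclidean distance of the three cross entropies, one plausible reconstruction of the loss (an assumption for illustration, not the patent's verbatim formula) is:

```latex
S = \frac{1}{3}\sqrt{h_1^2 + h_2^2 + h_3^2}, \qquad
h_1 = H(p_1, q_1), \quad h_2 = H(p_2, q_2), \quad h_3 = H(p_1, p_2)
```

Under this assumption, a sketch of the combined loss (the direction of the cross entropy H(p1, p2) between the two prediction vectors is likewise assumed):

```python
import torch
import torch.nn.functional as F

def multimodal_loss(text_logits, image_logits, labels):
    """h1, h2: cross entropy of each modality against the standard labels;
    h3: cross entropy between the two modality predictions; combined as the
    (assumed) average Euclidean distance of the three values."""
    h1 = F.cross_entropy(text_logits, labels)       # H(p1, q1)
    h2 = F.cross_entropy(image_logits, labels)      # H(p2, q2)
    p2 = F.softmax(image_logits, dim=1)
    h3 = F.cross_entropy(text_logits, p2)           # H(p1, p2), soft probability targets
    return torch.sqrt(h1**2 + h2**2 + h3**2) / 3
```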
S6, adding the trained text and image prediction result vectors using the Softmax function to obtain the final classification result.
S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition.
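At inference time, S6 and S7 reduce to adding the two Softmax vectors and taking the argmax; a minimal sketch, reusing the hypothetical TextCNN and ImageCNN sketches above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize_scene(text_model, image_model, tokens, image):
    """S6-S7: add the Softmax prediction vectors of both modalities, take the argmax."""
    p_text = F.softmax(text_model(tokens), dim=1)    # text classification prediction vector
    p_image = F.softmax(image_model(image), dim=1)   # image classification prediction vector
    return (p_text + p_image).argmax(dim=1)          # fused final classification result
```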
In general, the present invention fuses an image convolutional neural network with a short-text convolutional neural network through a new kind of decision fusion: training first yields the two classification results, whose cross entropies against the standard results are computed; then the cross entropy between the classification results of the two modalities is computed; finally, the average Euclidean distance of the three values serves as the loss returned to the feedforward networks to update their parameters. Compared with the prior art, this achieves a higher recognition rate.
The multi-modal scene recognition method based on deep learning provided by the present invention proposes a completely new multi-modal scene retrieval mode, providing users with more accurate and convenient means of scene recognition. The method of the invention comprehensively extracts the features of text and images and designs a new loss function; by using the information of multiple modalities, it improves the accuracy of scene recognition.
The present invention also provides a reference for other related problems in the same field, which can be expanded and extended on this basis and applied to other technical solutions related to scene recognition methods, and therefore has very broad application prospects.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. Therefore, in every respect, the present embodiments are to be considered illustrative and not restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and scope of equivalency of the claims are intended to be embraced within the present invention. Any reference signs in the claims shall not be construed as limiting the claims concerned.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for clarity; those skilled in the art should regard the specification as a whole, and the technical solutions in the respective embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.
Claims (7)
1. A multi-modal scene recognition method based on deep learning, characterized by comprising the following steps:
S1, performing word segmentation on a short text;
S2, feeding a group of pictures, the segmented short text, and the corresponding labels into their respective convolutional neural networks for training;
S3, training a short-text classification model;
S4, training a picture classification model;
S5, computing the cross entropy between each fully connected layer output from S3 and S4 and the standard classification result, computing the average Euclidean distance of these values as the loss, then feeding the loss back into the respective convolutional neural networks, and repeating training until the model converges, finally obtaining a complete multi-modal scene recognition model;
S6, adding the trained text and image prediction result vectors to obtain the final classification result;
S7, inputting the short text and image to be recognized into the trained multi-modal scene recognition model to perform scene recognition.
2. The multi-modal scene recognition method based on deep learning according to claim 1, characterized in that S1 specifically comprises the following step: performing word segmentation on the short text using the jieba word segmentation tool.
3. The multi-modal scene recognition method based on deep learning according to claim 1, characterized in that S3 specifically comprises the following steps:
S31, vectorizing the input short-text word segmentation result and feeding it into three parallel convolutional layers;
S32, sending the outputs of the three parallel convolutional layers in turn through a rectified linear unit layer and a pooling layer to obtain multiple pooling output results;
S33, concatenating the multiple pooling output results, applying random dropout, using the result as the input of the fully connected layer, and finally computing the fully connected layer to obtain the text classification prediction result vector.
4. The multi-modal scene recognition method based on deep learning according to claim 3, characterized in that the three parallel convolutional layers comprise a first convolutional layer, a second convolutional layer, and a third convolutional layer; the first convolutional layer has 384 convolution kernels of size 3*128, the second convolutional layer has 256 convolution kernels of size 4*128, and the third convolutional layer has 128 convolution kernels of size 5*128.
5. The multi-modal scene recognition method based on deep learning according to claim 3, characterized in that S4 specifically comprises the following steps:
S41, feeding the input picture into the first convolutional layer, extracting the corresponding number of features from the picture according to the designed number of convolution kernels, and outputting the convolutional layer result;
S42, pooling the convolutional layer output, compressing the amount of data and kernel parameters to reduce overfitting, then feeding the pooling result into the next convolutional layer; after 4 repeated convolution-pooling stages, the convolution kernel weights, initialized to random values, are continuously trained to obtain the model parameters;
S43, feeding the last pooling result into the fully connected layer and, after random dropout, computing the image classification prediction result vector as output.
6. The multi-modal scene recognition method based on deep learning according to claim 5, characterized in that computing the average Euclidean distance as the loss in S5 specifically comprises the following step: computing the loss using a loss function S defined in terms of h1 = H(p1, q1), h2 = H(p2, q2), and h3 = H(p1, p2), where p1 is the text classification prediction result vector output in S3, p2 is the image classification prediction result vector output in S4, q1 is the standard text classification result vector, q2 is the standard image classification result vector, and H(·) is the cross entropy function.
7. The multi-modal scene recognition method based on deep learning according to claim 1, characterized in that S6 specifically comprises the following step: adding the trained text and image prediction result vectors using the Softmax function to obtain the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242039.7A CN110046656B (en) | 2019-03-28 | 2019-03-28 | Multi-mode scene recognition method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910242039.7A CN110046656B (en) | 2019-03-28 | 2019-03-28 | Multi-mode scene recognition method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110046656A true CN110046656A (en) | 2019-07-23 |
CN110046656B CN110046656B (en) | 2023-07-11 |
Family
ID=67275472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910242039.7A Active CN110046656B (en) | 2019-03-28 | 2019-03-28 | Multi-mode scene recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046656B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018213841A1 (en) * | 2017-05-19 | 2018-11-22 | Google Llc | Multi-task multi-modal machine learning model |
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
CN109146849A (en) * | 2018-07-26 | 2019-01-04 | 昆明理工大学 | A kind of road surface crack detection method based on convolutional neural networks and image recognition |
Non-Patent Citations (1)
Title |
---|
LIANG Mengmeng et al.: "Multimodal lung tumor image recognition based on randomized fusion and CNN", Journal of Nanjing University (Natural Science) * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866092A (en) * | 2019-11-25 | 2020-03-06 | 三角兽(北京)科技有限公司 | Information searching method and device, electronic equipment and storage medium |
CN111079813A (en) * | 2019-12-10 | 2020-04-28 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
CN111079813B (en) * | 2019-12-10 | 2023-07-07 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
CN111310795A (en) * | 2020-01-19 | 2020-06-19 | 中国科学院动物研究所 | Multi-modal fruit fly recognition system and method based on image and molecular data |
CN111985520A (en) * | 2020-05-15 | 2020-11-24 | 南京智谷人工智能研究院有限公司 | Multi-mode classification method based on graph convolution neural network |
CN111985520B (en) * | 2020-05-15 | 2022-08-16 | 南京智谷人工智能研究院有限公司 | Multi-mode classification method based on graph convolution neural network |
CN112115806A (en) * | 2020-08-28 | 2020-12-22 | 河海大学 | Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning |
CN112115806B (en) * | 2020-08-28 | 2022-08-19 | 河海大学 | Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning |
CN112527858A (en) * | 2020-11-26 | 2021-03-19 | 微梦创科网络科技(中国)有限公司 | Marketing account identification method, device, medium and equipment based on social content |
CN112884074A (en) * | 2021-03-22 | 2021-06-01 | 杭州太火鸟科技有限公司 | Image design method, equipment, storage medium and device based on decision tree |
CN113177961A (en) * | 2021-06-07 | 2021-07-27 | 傲雄在线(重庆)科技有限公司 | Multi-mode depth model training method for seal image-text comparison |
CN113554021A (en) * | 2021-06-07 | 2021-10-26 | 傲雄在线(重庆)科技有限公司 | Intelligent seal identification method |
CN113554021B (en) * | 2021-06-07 | 2023-12-15 | 重庆傲雄在线信息技术有限公司 | Intelligent seal identification method |
CN113393833A (en) * | 2021-06-16 | 2021-09-14 | 中国科学技术大学 | Audio and video awakening method, system, device and storage medium |
CN113393833B (en) * | 2021-06-16 | 2024-04-02 | 中国科学技术大学 | Audio and video awakening method, system, equipment and storage medium |
WO2023056889A1 (en) * | 2021-10-09 | 2023-04-13 | 百果园技术(新加坡)有限公司 | Model training and scene recognition method and apparatus, device, and medium |
CN114942857A (en) * | 2021-11-11 | 2022-08-26 | 北京电信发展有限公司 | Multi-mode service intelligent diagnosis system |
CN114090780A (en) * | 2022-01-20 | 2022-02-25 | 宏龙科技(杭州)有限公司 | Prompt learning-based rapid picture classification method |
CN114581861A (en) * | 2022-03-02 | 2022-06-03 | 北京交通大学 | Track area identification method based on deep learning convolutional neural network |
CN115115868A (en) * | 2022-04-13 | 2022-09-27 | 之江实验室 | Triple-modal collaborative scene recognition method based on triples |
CN115115868B (en) * | 2022-04-13 | 2024-05-07 | 之江实验室 | Multi-mode collaborative scene recognition method based on triples |
Also Published As
Publication number | Publication date |
---|---|
CN110046656B (en) | 2023-07-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |