CN112329604A - Multi-modal emotion analysis method based on multi-dimensional low-rank decomposition
- Publication number
- CN112329604A (application number CN202011209001.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- features
- modal
- emotion analysis
- rank decomposition
- Prior art date: 2020-11-03
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253—Fusion techniques of extracted features
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G10L25/03—Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/63—Speech or voice analysis techniques specially adapted for estimating an emotional state
Abstract
The invention discloses a multi-modal emotion analysis method based on multi-dimensional low-rank decomposition, which fuses high-dimensional multi-modal features into low-dimensional vectors and then uses those vectors for video emotion analysis. The method comprises the following steps: acquiring a video data set for training a multi-modal emotion analysis model, wherein the video data set comprises a plurality of sample videos, and defining the algorithm target; extracting image features, audio features and text features from the video data set; establishing a multi-modal emotion analysis model based on a multi-dimensional low-rank decomposition mechanism from the extracted image, audio and text features; and performing emotion analysis on an input video using the multi-modal emotion analysis model. The method suits multi-modal emotion analysis of real video scenes and retains good accuracy and robustness under a variety of complex conditions.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a multi-modal emotion analysis method based on multi-dimensional low-rank decomposition.
Background
In modern society, video has become an indispensable, practically ubiquitous part of daily life. This environment has driven significant research into the semantic content of video. Multi-modal emotion analysis is an important branch of video analysis, and it matters especially in the current era of rapidly growing short video and live streaming: it can judge, in real time, changes in a speaker's emotion from the speaker's expression, language and voice, and feed subsequent applications.
Most existing tensor-based multi-modal emotion analysis methods mean-pool the features of each modality and then map the pooled features into a high-order tensor representation for subsequent tasks. Such processing discards the rich temporal information in the video.
Disclosure of Invention
To solve the above problems, the present invention provides a multi-modal emotion analysis method based on multi-dimensional low-rank decomposition for predicting the emotion score of a speaker in a video. The method first extracts the multi-modal features of the video, including images, audio and text. It then fuses the multi-modal features across multiple dimensions with a tensor low-rank approximation, which reduces model complexity. The method makes full use of the modalities in the video data, overcomes the existing tensor methods' neglect of temporal information, and offers good extensibility.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a multi-modal emotion analysis method based on multi-dimensional low-rank decomposition comprises the following steps:
S1, acquiring a video data set for training a multi-modal emotion analysis model, wherein the video data set comprises a plurality of sample videos, and defining the algorithm target;
S2, extracting image features, audio features and text features from the video data set;
S3, establishing a multi-modal emotion analysis model based on a multi-dimensional low-rank decomposition mechanism from the extracted image features, audio features and text features;
and S4, performing emotion analysis on the input video by using the multi-modal emotion analysis model.
Further, in step S1, the video data set includes videos X_train and manually labeled emotion scores Y_train;
The algorithm target is defined as: given a video x = {x_1, x_2, ..., x_L}, where x_i denotes the i-th video block, each video block contains a fixed number of video frames and L denotes the total number of video blocks, predict the emotion score y of the video segment, where y is a continuous-valued score.
Further, step S2 specifically comprises:
S21, inputting all images of each video block x_i into a two-dimensional convolutional neural network, extracting the image features of the video, and computing their mean vector, recorded as s_i;
S22, extracting the text in each video block x_i, representing it with word vectors, and computing their mean vector, recorded as t_i;
S23, extracting conventional MFCC audio features for each video block, recorded as a_i;
S24, obtaining from these extraction results the image features S = {s_1, ..., s_L}, text features T = {t_1, ..., t_L} and audio features A = {a_1, ..., a_L} of all video blocks.
Further, in step S3, the multi-modal emotion analysis model based on multi-dimensional low-rank decomposition is composed of a series of linear layers, dot-product layers and mean pooling layers, and its video representation o is calculated by a multi-dimensional low-rank fusion formula (reproduced as an image in the original publication), wherein V_m denotes the features of one modality (image S, audio A or text T), the weight factors are training parameters, and R_1 and R_2 denote the tensor ranks, which are set manually;
the emotion score of the speaker in the video is predicted from the video representation o as
p = W_o o + b_o
Further, the multi-modal emotion analysis model is trained with an L1 loss between the predicted value p and the label value y,
Loss = |y - p|_1
where the entire model is trained under the loss function Loss using the Adam optimization algorithm and back-propagation.
Compared with existing multi-modal emotion analysis methods, the multi-modal emotion analysis method based on multi-dimensional low-rank decomposition has the following beneficial effects:
First, temporal features are introduced into the tensor fusion method, remedying a major shortcoming of existing methods.
Second, the invention is the first to propose tensor fusion over multiple dimensions and derives a low-rank decomposition approximation over those dimensions, improving model performance without losing efficiency.
The method has good application value in short-video and live-streaming systems and can effectively improve the accuracy of multi-modal emotion analysis.
Drawings
FIG. 1 is a flow chart of the multi-modal emotion analysis method based on multi-dimensional low-rank decomposition according to the present invention.
FIG. 2 is a framework diagram of the multi-modal emotion analysis model based on multi-dimensional low-rank decomposition according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to fig. 1, in a preferred embodiment of the present invention, a multi-modal sentiment analysis method based on multi-dimensional low rank decomposition includes the following steps:
first, a video data set for training a multimodal emotion analysis model is acquired. Wherein the video data set used for training the emotion analysis model comprises a video XtrainArtificially labeled video description sentence Ytrain;
The algorithm targets are defined as: given video x ═ x1,x2,...,xL},xiRepresenting the ith block, each video block containing a fixed number of video frames, L representing the total number of video blocks, and the sentiment score y for the predicted video segment, y being a continuous value score.
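For concreteness, the sketch below shows one way to form the block structure x = {x_1, ..., x_L}; the block size of 16 frames is an illustrative assumption, as the patent only requires that it be fixed.

```python
# A minimal sketch of building the block structure x = {x_1, ..., x_L}.
# The block size of 16 frames is an assumption; the patent only states
# that each block contains a fixed number of frames.
def split_into_blocks(frames, block_size=16):
    """frames: list of decoded video frames -> list of L equal-size blocks."""
    num_blocks = len(frames) // block_size  # L; any trailing partial block is dropped
    return [frames[i * block_size:(i + 1) * block_size]
            for i in range(num_blocks)]
```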
Second, the multi-modal features in the video data set are extracted, specifically by the following steps (a code sketch follows this list):
First, all images of each video block x_i are input into a two-dimensional convolutional neural network, the image features of the video are extracted, and their mean vector is computed and recorded as s_i.
Second, the text in each video block x_i is extracted and represented with word vectors, and their mean vector is computed and recorded as t_i.
Third, conventional MFCC audio features are extracted for each video block and recorded as a_i.
Fourth, from these extraction results the image features S = {s_1, ..., s_L}, text features T = {t_1, ..., t_L} and audio features A = {a_1, ..., a_L} of all video blocks are obtained.
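A minimal sketch of the three per-block extractors follows. The concrete choices are assumptions, not fixed by the patent: ResNet-18 as the two-dimensional CNN, a pretrained `word_vectors` lookup (e.g. GloVe) for the text, and librosa with 40 MFCC coefficients for the audio.

```python
# Per-block feature extraction; ResNet-18, the word-vector lookup and the
# MFCC settings are illustrative choices, not specified by the patent.
import numpy as np
import torch
import torchvision.models as models
import librosa

cnn = models.resnet18(weights="IMAGENET1K_V1")
cnn.fc = torch.nn.Identity()  # drop the classifier, keep 512-d features
cnn.eval()

def image_feature(frames: torch.Tensor) -> torch.Tensor:
    """Mean CNN feature s_i over all frames of one block; frames: (F, 3, 224, 224)."""
    with torch.no_grad():
        feats = cnn(frames)        # (F, 512)
    return feats.mean(dim=0)       # s_i

def text_feature(tokens, word_vectors) -> np.ndarray:
    """Mean word vector t_i over the tokens spoken in one block."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0)   # t_i

def audio_feature(waveform: np.ndarray, sr: int) -> np.ndarray:
    """Mean MFCC vector a_i over one block's audio."""
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=40)  # (40, frames)
    return mfcc.mean(axis=1)       # a_i
```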
Then, a multi-modal emotion analysis model based on a multi-dimensional low-rank decomposition mechanism is established from the extracted image features, audio features and text features.
The multi-modal emotion analysis model based on multi-dimensional low-rank decomposition is composed of a series of linear layers, dot-product layers and mean pooling layers. Its video representation o is calculated by a multi-dimensional low-rank fusion formula (reproduced as an image in the original publication), wherein V_m denotes the features of one modality (image S, audio A or text T), the weight factors are training parameters, and R_1 and R_2 denote the tensor ranks, which are set manually.
The emotion score of the speaker in the video is predicted from the video representation o as
p = W_o o + b_o
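Because the fusion formula itself appears only as an image in the source, the module below is a sketch rather than the patent's exact formula: it follows the low-rank multimodal fusion scheme of the cited Liu et al. reference (arXiv:1806.00064), with the two manually set ranks R_1 and R_2 and the mean pooling over blocks described above; all dimensions are illustrative.

```python
# A sketch of a multi-dimensional low-rank fusion module: linear layers
# produce per-modality low-rank factors, an elementwise (dot-product)
# layer multiplies them across modalities, both rank dimensions are
# summed out, and mean pooling over the L blocks yields o; p = W_o o + b_o.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims, d_out, rank1, rank2):
        """dims: input feature sizes per modality, e.g. {"S": 512, "T": 300, "A": 40}."""
        super().__init__()
        self.factors = nn.ModuleDict({
            m: nn.Linear(d, rank1 * rank2 * d_out, bias=False)
            for m, d in dims.items()
        })
        self.rank1, self.rank2, self.d_out = rank1, rank2, d_out
        self.head = nn.Linear(d_out, 1)  # p = W_o o + b_o

    def forward(self, feats):
        """feats: dict modality -> (L, d_m) tensor of per-block features."""
        prod = None
        for m, v in feats.items():
            f = self.factors[m](v).view(-1, self.rank1, self.rank2, self.d_out)
            prod = f if prod is None else prod * f  # dot-product layer
        o = prod.sum(dim=(1, 2))  # sum out both ranks: low-rank approximation
        o = o.mean(dim=0)         # mean pooling over the L video blocks
        return self.head(o)       # predicted emotion score p
```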
Further, the training of the multi-modal emotion analysis model uses an L1 loss between the predicted value p and the label value y,
Loss = |y - p|_1
where the entire model is trained under the loss function Loss using the Adam optimization algorithm and back-propagation.
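A minimal training step under this loss might look as follows; the data loader, learning rate and dimensions are assumptions, and LowRankFusion is the sketch given above.

```python
# One pass of training under Loss = |y - p|_1 with Adam and back-propagation.
# A `train_loader` yielding (per-block features, labeled score) pairs is assumed.
model = LowRankFusion(dims={"S": 512, "T": 300, "A": 40},
                      d_out=64, rank1=4, rank2=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for feats, y in train_loader:
    p = model(feats)               # predicted emotion score
    loss = torch.abs(y - p).sum()  # L1 loss between label and prediction
    optimizer.zero_grad()
    loss.backward()                # back-propagation
    optimizer.step()
```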
In the above embodiments, the multi-modal emotion analysis method of the present invention uses the image features, audio features and text features in the video. On this basis, a multi-dimensional low-rank decomposition mechanism is established, and finally the trained model performs emotion analysis on unlabeled video.
Through this technical scheme, the embodiment of the invention develops, based on deep learning, a multi-modal emotion analysis algorithm applicable to unprocessed video. Temporal information is introduced into the existing tensor fusion method, tensor fusion is applied over multiple dimensions, and low-rank decomposition improves model efficiency, making emotion analysis more accurate and faster.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (5)
1. A multi-modal emotion analysis method based on multi-dimensional low-rank decomposition is characterized by comprising the following steps:
S1, acquiring a video data set for training a multi-modal emotion analysis model, wherein the video data set comprises a plurality of sample videos, and defining an algorithm target;
S2, extracting image features, audio features and text features from the video data set;
S3, establishing a multi-modal emotion analysis model based on a multi-dimensional low-rank decomposition mechanism from the extracted image features, audio features and text features;
and S4, performing emotion analysis on the input video by using the multi-modal emotion analysis model.
2. The multi-modal emotion analysis method based on multi-dimensional low-rank decomposition of claim 1, wherein in step S1 the video data set comprises videos X_train and manually labeled emotion scores Y_train;
The algorithm target is defined as: given a video x = {x_1, x_2, ..., x_L}, where x_i denotes the i-th video block, each video block contains a fixed number of video frames and L denotes the total number of video blocks, predict the emotion score y of the video segment, where y is a continuous-valued score.
3. The multi-modal emotion analysis method based on multi-dimensional low-rank decomposition of claim 2, wherein step S2 specifically comprises:
S21, inputting all images of each video block x_i into a two-dimensional convolutional neural network, extracting the image features of the video, and computing their mean vector, recorded as s_i;
S22, extracting the text in each video block x_i, representing it with word vectors, and computing their mean vector, recorded as t_i;
S23, extracting conventional MFCC audio features for each video block, recorded as a_i;
S24, obtaining from these extraction results the image features S = {s_1, ..., s_L}, text features T = {t_1, ..., t_L} and audio features A = {a_1, ..., a_L} of all video blocks.
4. The multi-modal emotion analysis method based on multi-dimensional low-rank decomposition of claim 3, wherein in step S3 the multi-modal emotion analysis model based on multi-dimensional low-rank decomposition is composed of a series of linear layers, dot-product layers and mean pooling layers, and its video representation o is calculated by a multi-dimensional low-rank fusion formula (reproduced as an image in the original publication), wherein V_m denotes the features of one modality (image S, audio A or text T), the weight factors are training parameters, and R_1 and R_2 denote the tensor ranks, which are set manually; the emotion score of the speaker in the video is predicted from the video representation o as
p = W_o o + b_o
5. The multi-modal emotion analysis method based on multi-dimensional low-rank decomposition of claim 4, wherein the training of the multi-modal emotion analysis model uses an L1 loss between the predicted value p and the label value y,
Loss = |y - p|_1
where the entire model is trained under the loss function Loss using the Adam optimization algorithm and back-propagation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011209001.9A | 2020-11-03 | 2020-11-03 | Multi-modal emotion analysis method based on multi-dimensional low-rank decomposition
Publications (2)
Publication Number | Publication Date |
---|---
CN112329604A (en) | 2021-02-05
CN112329604B (en) | 2022-09-20
Family
ID=74322845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011209001.9A | Multi-modal emotion analysis method based on multi-dimensional low-rank decomposition | 2020-11-03 | 2020-11-03
Country Status (1)
Country | Link |
---|---
CN (1) | CN112329604B (en)
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103310229A (en) * | 2013-06-15 | 2013-09-18 | 浙江大学 | Multitask machine learning method and multitask machine learning device both used for image classification |
CN104299216A (en) * | 2014-10-22 | 2015-01-21 | 北京航空航天大学 | Multimodality medical image fusion method based on multiscale anisotropic decomposition and low rank analysis |
CN106056082A (en) * | 2016-05-31 | 2016-10-26 | 杭州电子科技大学 | Video action recognition method based on sparse low-rank coding |
CN107292858A (en) * | 2017-05-22 | 2017-10-24 | 昆明理工大学 | A kind of multimode medical image fusion method based on low-rank decomposition and rarefaction representation |
CN108197629A (en) * | 2017-12-30 | 2018-06-22 | 北京工业大学 | A kind of Multimodal medical image feature extracting method based on label correlation constraint tensor resolution |
CN109934135A (en) * | 2019-02-28 | 2019-06-25 | 北京航空航天大学 | A kind of rail foreign matter detecting method decomposed based on low-rank matrix |
CN110188770A (en) * | 2019-05-17 | 2019-08-30 | 重庆邮电大学 | A kind of non-convex low-rank well-marked target detection method decomposed based on structure matrix |
CN110222213A (en) * | 2019-05-28 | 2019-09-10 | 天津大学 | A kind of image classification method based on isomery tensor resolution |
CN111178389A (en) * | 2019-12-06 | 2020-05-19 | 杭州电子科技大学 | Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling |
Non-Patent Citations (1)
Title
---
ZHUN LIU et al.: "Efficient Low-rank Multimodal Fusion with Modality-Specific Factors", arXiv:1806.00064v1 [cs.AI]
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023163383A1 (en) * | 2022-02-28 | 2023-08-31 | 에스케이텔레콤 주식회사 | Multimodal-based method and apparatus for recognizing emotion in real time |
CN114999006A (en) * | 2022-05-20 | 2022-09-02 | 南京邮电大学 | Multi-modal emotion analysis method, device and equipment based on uncertainty estimation |
CN117688936A (en) * | 2024-02-04 | 2024-03-12 | 江西农业大学 | Low-rank multi-mode fusion emotion analysis method for graphic fusion |
CN117688936B (en) * | 2024-02-04 | 2024-04-19 | 江西农业大学 | Low-rank multi-mode fusion emotion analysis method for graphic fusion |
Also Published As
Publication number | Publication date |
---|---
CN112329604B (en) | 2022-09-20
Similar Documents
Publication | Title
---|---
CN112329604B (en) | Multi-modal emotion analysis method based on multi-dimensional low-rank decomposition
CN112465008B (en) | Voice and visual relevance enhancement method based on self-supervision course learning
WO2023065617A1 (en) | Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN110083729B (en) | Image searching method and system
CN112004111A (en) | News video information extraction method for global deep learning
Bilkhu et al. | Attention is all you need for videos: Self-attention based video summarization using universal transformers
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism
CN110866129A (en) | Cross-media retrieval method based on cross-media uniform characterization model
WO2023134085A1 (en) | Question answer prediction method and prediction apparatus, electronic device, and storage medium
CN114896434A (en) | Hash code generation method and device based on center similarity learning
CN117540007B (en) | Multi-mode emotion analysis method, system and equipment based on similar mode completion
CN117152851B (en) | Face and human body collaborative clustering method based on large model pre-training
CN117668262A (en) | Sound image file utilization system based on artificial intelligent voice and image recognition technology
CN117609548A (en) | Video multi-mode target element extraction and video abstract synthesis method and system based on pre-training model
CN112084788A (en) | Automatic marking method and system for implicit emotional tendency of image captions
CN110674265A (en) | Unstructured information oriented feature discrimination and information recommendation system
CN115797642A (en) | Self-adaptive image semantic segmentation algorithm based on consistency regularization and semi-supervision field
CN113823271B (en) | Training method and device for voice classification model, computer equipment and storage medium
CN114842301A (en) | Semi-supervised training method of image annotation model
CN115019137A (en) | Method and device for predicting multi-scale double-flow attention video language event
CN114565804A (en) | NLP model training and recognizing system
CN118093936B (en) | Video tag processing method, device, computer equipment and storage medium
CN112016540B (en) | Behavior identification method based on static image
TWI830604B (en) | Video topic analysis system, method and computer readable medium thereof
Rasi et al. | Image Description Generator Using Deep Learning
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant