CN114359768B - A video dense event description method based on multimodal heterogeneous feature fusion - Google Patents
A video dense event description method based on multimodal heterogeneous feature fusion
- Publication number
- CN114359768B (application CN202111159640.3A)
- Authority
- CN
- China
- Prior art keywords
- video
- event
- feature
- description
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111159640.3A CN114359768B (zh) | 2021-09-30 | 2021-09-30 | A video dense event description method based on multimodal heterogeneous feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111159640.3A CN114359768B (zh) | 2021-09-30 | 2021-09-30 | A video dense event description method based on multimodal heterogeneous feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114359768A (zh) | 2022-04-15 |
CN114359768B (zh) | 2024-04-16 |
Family
ID=81095830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111159640.3A Active CN114359768B (zh) | A video dense event description method based on multimodal heterogeneous feature fusion | 2021-09-30 | 2021-09-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114359768B (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
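CN114708472B (zh) * | 2022-06-06 | 2022-09-09 | Zhejiang University | Multimodal dataset annotation method, apparatus, and electronic device for AI practical training |
CN115496134B (zh) * | 2022-09-14 | 2023-10-03 | Beijing Union University | Traffic scene video description generation method and apparatus based on multimodal feature fusion |
CN115438197B (zh) * | 2022-11-07 | 2023-03-24 | Chaohu University | Method and system for relation completion in an event-logic knowledge graph based on a two-layer heterogeneous graph |
CN116089654B (zh) * | 2023-04-07 | 2023-07-07 | Hangzhou Dongshang Intelligent Technology Co., Ltd. | Transferable audio-visual text generation method and system based on audio supervision |
CN117196096A (zh) * | 2023-08-21 | 2023-12-08 | CETC New Smart City Research Institute Co., Ltd. | Target event prediction method, apparatus, terminal device, and storage medium |
CN117113281B (zh) * | 2023-10-20 | 2024-01-26 | Guanglun Intelligent (Beijing) Technology Co., Ltd. | Multimodal data processing method, device, agent, and medium |
CN117407507A (zh) * | 2023-10-27 | 2024-01-16 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Event processing method, apparatus, device, and medium based on a large language model |
CN117556220B (zh) * | 2024-01-09 | 2024-03-22 | Jilin University | Intelligent assistance system and method for rehabilitation nursing |
CN117726977B (zh) * | 2024-02-07 | 2024-04-12 | Nanjing Bailunsi Intelligent Technology Co., Ltd. | DCNN-based method and system for scoring key nodes of experimental operations |
CN118279803A (zh) * | 2024-05-08 | 2024-07-02 | Zhuhai UM Science and Technology Research Institute | A video description method based on semantic-disambiguation structured encoding |
CN118395195A (zh) * | 2024-06-28 | 2024-07-26 | Inspur Electronic Information Industry Co., Ltd. | Model training method, video localization method, system, device, product, and medium |
CN118444620A (zh) * | 2024-07-08 | 2024-08-06 | Qingdao University of Science and Technology | A terminal-device-oriented intelligent scene generation method and smart home system |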
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
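CN108648746A (zh) * | 2018-05-15 | 2018-10-12 | Nanjing University of Aeronautics and Astronautics | An open-domain video natural language description generation method based on multimodal feature fusion |
WO2020190112A1 (en) * | 2019-03-21 | 2020-09-24 | Samsung Electronics Co., Ltd. | Method, apparatus, device and medium for generating captioning information of multimedia data |
CN113392717A (zh) * | 2021-05-21 | 2021-09-14 | Hangzhou Dianzi University | A dense video description generation method based on a temporal feature pyramid |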
Non-Patent Citations (1)
Title |
---|
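Research on a video description generation algorithm based on multiple features; Cao Lei; Wan Wanggen; Hou Li; Electronic Measurement Technology; 20200823(16); full text *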
Also Published As
Publication number | Publication date |
---|---|
CN114359768A (zh) | 2022-04-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
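CN114359768B (zh) | A video dense event description method based on multimodal heterogeneous feature fusion | |
WO2023050295A1 (zh) | A video dense event description method based on multimodal heterogeneous feature fusion | |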
Le-Khac et al. | Contrastive representation learning: A framework and review | |
Zhang et al. | Learning affective features with a hybrid deep model for audio–visual emotion recognition | |
US11657230B2 (en) | Referring image segmentation | |
Pouyanfar et al. | Multimodal deep learning based on multiple correspondence analysis for disaster management | |
Zhou et al. | Facial depression recognition by deep joint label distribution and metric learning | |
Guanghui et al. | Multi-modal emotion recognition by fusing correlation features of speech-visual | |
Praveen et al. | Audio–visual fusion for emotion recognition in the valence–arousal space using joint cross-attention | |
Hazourli et al. | Multi-facial patches aggregation network for facial expression recognition and facial regions contributions to emotion display | |
Wu et al. | Facial kinship verification: A comprehensive review and outlook | |
Zhang | Voice keyword retrieval method using attention mechanism and multimodal information fusion | |
Lee et al. | Timeconvnets: A deep time windowed convolution neural network design for real-time video facial expression recognition | |
Gurnani et al. | Saf-bage: Salient approach for facial soft-biometric classification-age, gender, and facial expression | |
Chauhan et al. | Analysis of Intelligent movie recommender system from facial expression | |
Singh et al. | Age, gender prediction and emotion recognition using convolutional neural network | |
de Lima Costa et al. | High-level context representation for emotion recognition in images | |
Cheng et al. | Multimodal sentiment analysis based on attentional temporal convolutional network and multi-layer feature fusion | |
Saleem et al. | Stateful human-centered visual captioning system to aid video surveillance | |
Zhao et al. | SWRR: Feature Map Classifier Based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | |
Zhao et al. | Multi-view dimensionality reduction via subspace structure agreement | |
Sharma et al. | Machine learning techniques for real-time emotion detection from facial expressions | |
Chen et al. | Learning an attention-aware parallel sharing network for facial attribute recognition | |
Irene et al. | Person search over security video surveillance systems using deep learning methods: A review | |
Wang et al. | Audiovisual emotion recognition via cross-modal association in kernel space |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Liu Jin, Zhu Xiaorong, Han Bing, Li Ying, Wu Zhongdai, Gong Peizhu, Zhang Xiliang, Wang Junxiang, Guo Lei, Hu Rong. Inventors before: Liu Jin, Gong Peizhu, Zhang Xiliang, Wu Zhongdai, Wang Junxiang, Guo Lei, Hu Rong, Han Bing, Zhu Xiaorong. |
GR01 | Patent grant | |