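WO2023249556A3 - Method, apparatus, device, and medium for processing video based on contrastive learning - Google Patents
Method, apparatus, device, and medium for processing video based on contrastive learning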
- Publication number
- WO2023249556A3 (PCT/SG2023/050421)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- contrastive
- frame
- features
- video
- medium
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Image Analysis (AREA)
Abstract
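A method, apparatus, device, and medium for processing video based on contrastive learning are provided. At least one first object and at least one second object are extracted from a first frame and a second frame, respectively, of a training video in training data. For a first object among the at least one first object, at least one positive sample object and at least one negative sample object associated with the first object are selected from the at least one second object based on the training data. A contrastive model is generated based on the at least one positive sample object and the at least one negative sample object. The contrastive model describes the association between an object in a frame of a video and a contrastive feature of that object, such that the similarity between the contrastive feature and another contrastive feature of another object in another frame of the video indicates whether the object and the other object represent the same object. The contrastive features distinguish whether objects in different frames represent the same object, thereby improving the accuracy of object tracking across frames.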
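The training and matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not name a loss function or a similarity measure, so the InfoNCE-style loss, the cosine similarity, the `temperature` and `threshold` values, and all function names below are assumptions chosen for illustration, not the method claimed in the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss for one first-frame object (assumed form).

    anchor: contrastive feature of an object in the first frame.
    positives / negatives: contrastive features of second-frame objects
    selected as positive / negative samples for that anchor.
    """
    pos = [math.exp(cosine_similarity(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives]
    # The loss is small when the positives dominate the total similarity mass.
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))


def match_object(feature, candidates, threshold=0.5):
    """Return the index of the most similar candidate feature,
    or None if no candidate exceeds the (hypothetical) threshold."""
    best_index, best_score = None, threshold
    for i, candidate in enumerate(candidates):
        score = cosine_similarity(feature, candidate)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

In this sketch, minimizing `contrastive_loss` over many anchors would pull the features of the same object in different frames together and push different objects apart; at tracking time, `match_object` treats a high similarity between features from two frames as evidence that they represent the same object.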
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210714416.4 | 2022-06-22 | ||
CN202210714416.4A CN117315521A (zh) | 2022-06-22 | 2022-06-22 | Method, apparatus, device, and medium for processing video based on contrastive learning |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023249556A2 (zh) | 2023-12-28 |
WO2023249556A3 (zh) | 2024-03-07 |
Family
ID=89241258
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2023/050421 WO2023249556A2 (zh) | 2022-06-22 | 2023-06-14 | Method, apparatus, device, and medium for processing video based on contrastive learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117315521A (zh) |
WO (1) | WO2023249556A2 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
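CN105976397A (zh) * | 2016-04-28 | 2016-09-28 | Xidian University | Target tracking method based on semi-nonnegative optimization ensemble learning |
CN109740665A (zh) * | 2018-12-29 | 2019-05-10 | Zhuhai Dahengqin Technology Development Co., Ltd. | Method and system for detecting ship targets in occluded images based on expert knowledge constraints |
CN110110670A (zh) * | 2019-05-09 | 2019-08-09 | Hangzhou Dianzi University | Data association method in pedestrian tracking based on the Wasserstein metric |
CN113642472A (zh) * | 2021-08-13 | 2021-11-12 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method for a discriminator model and action recognition method |
CN113762231A (zh) * | 2021-11-10 | 2021-12-07 | CETC New-Type Smart City Research Institute Co., Ltd. | End-to-end multi-pedestrian pose tracking method and apparatus, and electronic device |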
- 2022-06-22: CN application CN202210714416.4A, published as CN117315521A (zh); status: active, pending
- 2023-06-14: WO application PCT/SG2023/050421, published as WO2023249556A2 (zh); status: unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023249556A2 (zh) | 2023-12-28 |
CN117315521A (zh) | 2023-12-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP3913542A3 (en) | Method and apparatus of training model, device, medium, and program product | |
CN108090857B (zh) | Multimodal system and method for analyzing student classroom behavior | |
EP3836077A3 (en) | Product defect detection method and apparatus, electronic device, storage medium and program | |
EP3933686A3 (en) | Video processing method, apparatus, electronic device, storage medium, and program product | |
PH12020550588A1 (en) | Target detection method and apparatus, training method, electronic device and medium | |
CN107491435B (zh) | Method and apparatus for automatically recognizing user emotion by computer | |
US20180322416A1 (en) | Feature extraction and classification method based on support vector data description and system thereof | |
EP3843031A3 (en) | Face super-resolution realization method and apparatus, electronic device and storage medium | |
EP3907666A3 (en) | Method, apparatus, electronic device, readable storage medium and program for constructing key-point learning model | |
CN112820322B (zh) | Semi-supervised audio event labeling method based on self-supervised contrastive learning | |
EP3913532A3 (en) | Object area measurement method, apparatus, storage medium and computer product | |
EP3872760A3 (en) | Method and apparatus of training depth estimation network, and method and apparatus of estimating depth of image | |
EP3872761A3 (en) | Analysing objects in a set of frames | |
EP3998583A3 (en) | Method and apparatus of training cycle generative networks model, and method and apparatus of building character library | |
CN114722822B (zh) | Named entity recognition method, apparatus, device, and computer-readable storage medium | |
Morfi et al. | Data-efficient weakly supervised learning for low-resource audio event detection using deep learning | |
Liu et al. | Synthvsr: Scaling up visual speech recognition with synthetic supervision | |
Dong et al. | CML: A contrastive meta learning method to estimate human label confidence scores and reduce data collection cost | |
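WO2023249556A3 (zh) | Method, apparatus, device, and medium for processing video based on contrastive learning | |
CN109697982A (zh) | Speaker speech recognition system for lecture scenarios | |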
Xiao et al. | Power-spectral analysis of head motion signal for behavioral modeling in human interaction | |
EP4187504A8 (en) | Method for training text classification model, apparatus, storage medium and computer program product | |
EP4134920A3 (en) | Entity recognition method and apparatus, and computer program product | |
EP4030424A3 (en) | Method and apparatus of processing voice for vehicle, electronic device and medium | |
EP3842961A3 (en) | Method and apparatus for mining tag, device, storage medium and computer program product |