WO2023249556A3 - Method, apparatus, device, and medium for processing video based on contrastive learning - Google Patents

Method, apparatus, device, and medium for processing video based on contrastive learning

Info

Publication number
WO2023249556A3
Authority
WO
WIPO (PCT)
Prior art keywords
contrastive
frame
features
video
medium
Prior art date
Application number
PCT/SG2023/050421
Other languages
English (en)
French (fr)
Other versions
WO2023249556A2 (zh)
Inventor
柏松
吴俊峰
刘启昊
江毅
卢宾
Original Assignee
脸萌有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 脸萌有限公司
Publication of WO2023249556A2
Publication of WO2023249556A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/48 Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

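A method, apparatus, device, and medium for processing video based on contrastive learning are provided. At least one first object and at least one second object are extracted from a first frame and a second frame, respectively, of a training video in training data. For a first object among the at least one first object, at least one positive sample object and at least one negative sample object associated with the first object are selected from the at least one second object based on the training data. A contrastive model is generated based on the at least one positive sample object and the at least one negative sample object. The contrastive model describes the association between an object in a frame of a video and a contrastive feature of that object, such that the similarity between the contrastive feature and another contrastive feature of another object in another frame of the video indicates whether the object and the other object represent the same object. The contrastive features distinguish whether objects in different frames represent the same object, thereby improving the accuracy of object tracking across frames.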
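
The abstract does not state a specific loss function or matching rule. As a minimal sketch only, the idea can be illustrated with an InfoNCE-style contrastive loss and a greedy cosine-similarity match between the objects of two frames. Everything here is an assumption made for illustration and is not taken from the patent: the function names, the temperature of 0.1, the similarity threshold of 0.5, the greedy matching rule, and the representation of contrastive features as plain lists of floats with nonzero norm.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two nonzero feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss for one first-frame object (assumed form).

    anchor: feature of the first-frame object.
    positives / negatives: features of second-frame objects selected as
    positive and negative samples for that anchor.
    The loss is small when the anchor is close to its positives and far
    from its negatives.
    """
    pos = [math.exp(cosine_similarity(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))


def match_objects(first_feats, second_feats, threshold=0.5):
    """Greedy association (a simplification, not a one-to-one assignment).

    For each first-frame feature, return the index of the most similar
    second-frame feature, or None if that similarity is below threshold.
    """
    matches = []
    for f in first_feats:
        sims = [cosine_similarity(f, s) for s in second_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append(best if sims[best] >= threshold else None)
    return matches
```

With hypothetical two-dimensional features, `contrastive_loss([1.0, 0.0], [[0.9, 0.1]], [[0.0, 1.0]])` is close to zero, while swapping the positive and negative sets gives a much larger value. A real tracker would typically replace the greedy rule with a one-to-one assignment, since the greedy version can map two first-frame objects to the same second-frame object.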
PCT/SG2023/050421 2022-06-22 2023-06-14 Method, apparatus, device, and medium for processing video based on contrastive learning WO2023249556A2 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210714416.4 2022-06-22
CN202210714416.4A CN117315521A (zh) 2022-06-22 2022-06-22 Method, apparatus, device, and medium for processing video based on contrastive learning

Publications (2)

Publication Number Publication Date
WO2023249556A2 (zh) 2023-12-28
WO2023249556A3 (zh) 2024-03-07

Family

ID=89241258

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2023/050421 WO2023249556A2 (zh) 2022-06-22 2023-06-14 Method, apparatus, device, and medium for processing video based on contrastive learning

Country Status (2)

Country Link
CN (1) CN117315521A (zh)
WO (1) WO2023249556A2 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
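CN105976397A (zh) * 2016-04-28 2016-09-28 西安电子科技大学 Target tracking method based on semi-nonnegative optimization ensemble learning
CN109740665A (zh) * 2018-12-29 2019-05-10 珠海大横琴科技发展有限公司 Method and system for detecting ship targets in occluded images based on expert knowledge constraints
CN110110670A (zh) * 2019-05-09 2019-08-09 杭州电子科技大学 Data association method in pedestrian tracking based on the Wasserstein metric
CN113642472A (zh) * 2021-08-13 2021-11-12 北京百度网讯科技有限公司 Training method for discriminator model and action recognition method
CN113762231A (zh) * 2021-11-10 2021-12-07 中电科新型智慧城市研究院有限公司 End-to-end multi-pedestrian pose tracking method, apparatus, and electronic device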

Also Published As

Publication number Publication date
WO2023249556A2 (zh) 2023-12-28
CN117315521A (zh) 2023-12-29

Similar Documents

Publication Publication Date Title
EP3913542A3 (en) Method and apparatus of training model, device, medium, and program product
CN108090857B (zh) Multimodal system and method for analyzing student classroom behavior
EP3836077A3 (en) Product defect detection method and apparatus, electronic device, storage medium and program
EP3933686A3 (en) Video processing method, apparatus, electronic device, storage medium, and program product
PH12020550588A1 (en) Target detection method and apparatus, training method, electronic device and medium
CN107491435B (zh) Method and apparatus for automatically recognizing user emotion by computer
US20180322416A1 (en) Feature extraction and classification method based on support vector data description and system thereof
EP3843031A3 (en) Face super-resolution realization method and apparatus, electronic device and storage medium
EP3907666A3 (en) Method, apparatus, electronic device, readable storage medium and program for constructing key-point learning model
CN112820322B (zh) Semi-supervised audio event labeling method based on self-supervised contrastive learning
EP3913532A3 (en) Object area measurement method, apparatus, storage medium and computer product
EP3872760A3 (en) Method and apparatus of training depth estimation network, and method and apparatus of estimating depth of image
EP3872761A3 (en) Analysing objects in a set of frames
EP3998583A3 (en) Method and apparatus of training cycle generative networks model, and method and apparatus of building character library
CN114722822B (zh) Named entity recognition method, apparatus, device, and computer-readable storage medium
Morfi et al. Data-efficient weakly supervised learning for low-resource audio event detection using deep learning
Liu et al. Synthvsr: Scaling up visual speech recognition with synthetic supervision
Dong et al. CML: A contrastive meta learning method to estimate human label confidence scores and reduce data collection cost
WO2023249556A3 (zh) Method, apparatus, device, and medium for processing video based on contrastive learning
CN109697982A (zh) Speaker speech recognition system for teaching scenarios
Xiao et al. Power-spectral analysis of head motion signal for behavioral modeling in human interaction
EP4187504A8 (en) Method for training text classification model, apparatus, storage medium and computer program product
EP4134920A3 (en) Entity recognition method and apparatus, and computer program product
EP4030424A3 (en) Method and apparatus of processing voice for vehicle, electronic device and medium
EP3842961A3 (en) Method and apparatus for mining tag, device, storage medium and computer program product