WO2023287360A3 - Multimedia processing method and apparatus, electronic device, and storage medium - Google Patents

Multimedia processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023287360A3
Authority
WO
WIPO (PCT)
Prior art keywords
text content
multimedia resource
invalid
voice data
electronic device
Prior art date
Application number
PCT/SG2022/050494
Other languages
English (en)
French (fr)
Other versions
WO2023287360A2 (zh)
Inventor
郑鑫
朱聪慧
夏瑞
尚楚翔
钟德建
蒋泳森
屠明
邓乐来
Original Assignee
脸萌有限公司
Priority date
2021-07-15
Filing date
2022-07-14
Publication date
2023-04-13
Application filed by 脸萌有限公司
Priority to JP2023576228A (JP2024527483A, ja)
Priority to EP22842574.0A (EP4336854A4, en)
Priority to BR112023026041A (BR112023026041A2, pt)
Publication of WO2023287360A2 (zh)
Publication of WO2023287360A3 (zh)
Priority to US18/535,891 (US20240105234A1, en)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/433 Query formulation using audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/05 Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

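Embodiments of the present disclosure provide a multimedia processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a first multimedia resource; performing speech recognition on the audio data of the first multimedia resource to determine initial text content corresponding to the first multimedia resource, where the audio data of the first multimedia resource contains the voice data of the initial text content; determining invalid text content within the initial text content, the invalid text content being text content that semantically conveys no information; determining a first playback position, in the first multimedia resource, of the voice data of the invalid text content; and cropping the first multimedia resource based on the first playback position to obtain a second multimedia resource, where the audio data of the second multimedia resource contains the voice data of target text content and does not contain the voice data of the invalid text content. The embodiments of the present disclosure automatically crop invalid content out of multimedia resources, improving both cropping efficiency and cropping quality.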
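The steps in the abstract can be illustrated with a short Python sketch. It is not the applicants' implementation. It assumes a speech recognizer has already produced a word-level transcript with timestamps, sorted by start time, and it swaps in a fixed filler-word list for the semantic invalid-text detection that the abstract describes. The `Word` type, the `FILLERS` set, and both function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word with its playback interval in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


# Assumption: a fixed filler list stands in for semantic invalid-text detection.
FILLERS = {"um", "uh", "er", "hmm"}


def invalid_spans(words, fillers=FILLERS):
    """Return merged (start, end) playback intervals covered by filler words.

    `words` must be sorted by start time.
    """
    spans = []
    for w in words:
        if w.text.lower() not in fillers:
            continue
        if spans and w.start <= spans[-1][1]:
            # Touching or overlapping the previous invalid span: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], w.end))
        else:
            spans.append((w.start, w.end))
    return spans


def keep_segments(duration, cut):
    """Return the intervals of [0, duration] that remain after removing `cut`."""
    kept, cursor = [], 0.0
    for start, end in cut:
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


if __name__ == "__main__":
    transcript = [
        Word("um", 0.0, 0.4),
        Word("hello", 0.5, 0.9),
        Word("uh", 1.0, 1.3),
        Word("world", 1.4, 1.9),
    ]
    cut = invalid_spans(transcript)
    print(cut)                      # [(0.0, 0.4), (1.0, 1.3)]
    print(keep_segments(2.0, cut))  # [(0.4, 1.0), (1.3, 2.0)]
```

The kept intervals would then be passed to whatever media tool performs the actual cut and concatenation; that step is outside this sketch.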
PCT/SG2022/050494 2021-07-15 2022-07-14 Multimedia processing method and apparatus, electronic device, and storage medium WO2023287360A2 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2023576228A JP2024527483A (ja) 2021-07-15 2022-07-14 Multimedia processing method and apparatus, electronic device, and storage medium
EP22842574.0A EP4336854A4 (en) 2021-07-15 2022-07-14 MULTIMEDIA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM
BR112023026041A BR112023026041A2 (pt) 2021-07-15 2022-07-14 Multimedia processing method, devices, electronic equipment, and storage media
US18/535,891 US20240105234A1 (en) 2021-07-15 2023-12-11 Multimedia processing method and apparatus, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110802038.0A CN115623279A (zh) 2021-07-15 2021-07-15 Multimedia processing method and apparatus, electronic device, and storage medium
CN202110802038.0 2021-07-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/535,891 Continuation US20240105234A1 (en) 2021-07-15 2023-12-11 Multimedia processing method and apparatus, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
WO2023287360A2 (zh) 2023-01-19
WO2023287360A3 (zh) 2023-04-13

Family

ID=84855225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2022/050494 WO2023287360A2 (zh) 2021-07-15 2022-07-14 Multimedia processing method and apparatus, electronic device, and storage medium

Country Status (6)

Country Link
US (1) US20240105234A1 (zh)
EP (1) EP4336854A4 (zh)
JP (1) JP2024527483A (zh)
CN (1) CN115623279A (zh)
BR (1) BR112023026041A2 (zh)
WO (1) WO2023287360A2 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118368478A (zh) * 2023-01-19 2024-07-19 北京字跳网络技术有限公司 Video editing method and apparatus, electronic device, and storage medium
EP4429256A1 (en) 2023-01-19 2024-09-11 Beijing Zitiao Network Technology Co., Ltd. Video editing method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202502737U (zh) * 2012-03-12 2012-10-24 中国人民解放军济南军区司令部第二部 Intelligent editing system for video and audio information
US20140207449A1 (en) * 2013-01-18 2014-07-24 R Paul Johnson Using speech to text for detecting commercials and aligning edited episodes with transcripts
CN108259965A (zh) * 2018-03-31 2018-07-06 湖南广播电视台广播传媒中心 Video editing method and editing system
CN112733654A (zh) * 2020-12-31 2021-04-30 支付宝(杭州)信息技术有限公司 Method and apparatus for splitting a video into segments

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070219778A1 (en) * 2006-03-17 2007-09-20 University Of Sheffield Speech processing system
US11861906B2 (en) * 2014-02-28 2024-01-02 Genius Sports Ss, Llc Data processing systems and methods for enhanced augmentation of interactive video content
US20190215464A1 (en) * 2018-01-11 2019-07-11 Blue Jeans Network, Inc. Systems and methods for decomposing a video stream into face streams
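CN109241332B (zh) * 2018-10-19 2021-09-24 广东小天才科技有限公司 Method and system for determining semantics from speech
CN110297907B (zh) * 2019-06-28 2022-03-08 谭浩 Method for generating an interview report, computer-readable storage medium, and terminal device
CN110853621B (zh) * 2019-10-09 2024-02-13 科大讯飞股份有限公司 Speech smoothing method and apparatus, electronic device, and computer storage medium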
US11875781B2 (en) * 2020-08-31 2024-01-16 Adobe Inc. Audio-based media edit point selection

Also Published As

Publication number Publication date
BR112023026041A2 (pt) 2024-03-05
US20240105234A1 (en) 2024-03-28
WO2023287360A2 (zh) 2023-01-19
JP2024527483A (ja) 2024-07-25
CN115623279A (zh) 2023-01-17
EP4336854A4 (en) 2024-09-25
EP4336854A2 (en) 2024-03-13

Similar Documents

Publication Publication Date Title
WO2023287360A3 (zh) Multimedia processing method and apparatus, electronic device, and storage medium
US10282162B2 (en) Audio book smart pause
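KR102451034B1 (ko) Speaker diarization
CN105244026B (zh) Speech processing method and apparatus
CN103327181B (zh) Voice chat method that improves the efficiency with which users obtain voice information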
US9240180B2 (en) System and method for low-latency web-based text-to-speech without plugins
CN104702791A (zh) Smartphone for long-duration recording with synchronous text transcription, and information processing method thereof
SG179091A1 (en) Multifunction multimedia device
CN104252861A (zh) Video speech conversion method, apparatus, and server
CN107680584B (zh) Method and apparatus for segmenting audio
US8620670B2 (en) Automatic realtime speech impairment correction
US20160118063A1 (en) Deep tagging background noises
CN108074570A (zh) Speech recognition method with automatic cutting, transmission, and saving
TW200605040A (en) Storage medium including metadata and reproduction apparatus and method therefor
CN104732975A (zh) Voice instant messaging method and apparatus
EP4033484A3 (en) Recognition of semantic information of a speech signal, training a recognition model
US20230343325A1 (en) Audio processing method and apparatus, and electronic device
CN109920413A (zh) Method for implementing touch-screen voice dialogue in a kitchen scenario, and storage medium
EP4033483A3 (en) Method and apparatus for testing vehicle-mounted voice device, electronic device and storage medium
CN108962228B (zh) Model training method and apparatus
WO2013180600A3 (ru) Method for re-voicing audio materials and device for implementing it
KR101611224B1 (ko) 오디오 인터페이스
WO2021017302A1 (zh) Data extraction method and apparatus, computer system, and readable storage medium
US20240304208A1 (en) Audio playback and captioning
US10007724B2 (en) Creating, rendering and interacting with a multi-faceted audio cloud

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 2022842574

Country of ref document: EP

Ref document number: 22842574.0

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 11202309454P

Country of ref document: SG

ENP Entry into the national phase

Ref document number: 2023576228

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2022842574

Country of ref document: EP

Effective date: 20231207

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112023026041

Country of ref document: BR

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22842574

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 112023026041

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20231211