CN113056908A - Video subtitle synthesis method, apparatus, storage medium, and electronic device - Google Patents

Video subtitle synthesis method, apparatus, storage medium, and electronic device

Info

Publication number
CN113056908A
Authority
CN
China
Prior art keywords
voice
vector
recognized
voiceprint
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201980076343.7A
Other languages
English (en)
Other versions
CN113056908B (zh)
Inventor
叶青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-29
Filing date
2019-01-29
Publication date
2021-06-29
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd and Shenzhen Huantai Technology Co Ltd
Publication of CN113056908A
Application granted
Publication of CN113056908B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278 Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

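This application discloses a video subtitle synthesis method, comprising: acquiring voice information from a video; obtaining a to-be-recognized voice according to features of the voice information; inputting the to-be-recognized voice into a d-vector voiceprint recognition model to obtain a voiceprint identifier corresponding to the to-be-recognized voice, the voiceprint identifier containing d-vector features; performing speech recognition on the to-be-recognized voice to obtain the corresponding text information; and synthesizing the voiceprint identifier and the text information to generate subtitles for the to-be-recognized voice.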
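The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the published text does not say how a voiceprint identifier is derived from the d-vector or what subtitle format is produced. The sketch assumes that d-vectors and recognized text are already available for each segment, that speakers are matched by cosine similarity against enrolled reference vectors, and that subtitles are emitted as SRT-style blocks. All names (`Segment`, `identify_speaker`, `compose_subtitles`, the `0.7` threshold) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float     # segment start time in seconds
    end: float       # segment end time in seconds
    d_vector: list   # utterance-level embedding (assumed already extracted)
    text: str        # speech recognition result (assumed already computed)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(d_vector, enrolled, threshold=0.7):
    """Return the enrolled name with the highest similarity, or "Unknown".

    Matching by cosine similarity with a fixed threshold is an assumption
    made for this sketch, not something stated in the abstract.
    """
    best_name, best_score = "Unknown", threshold
    for name, reference in enrolled.items():
        score = cosine(d_vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def compose_subtitles(segments, enrolled):
    """Combine a speaker label and recognized text into SRT-style blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = identify_speaker(seg.d_vector, enrolled)
        timing = f"{srt_time(seg.start)} --> {srt_time(seg.end)}"
        blocks.append(f"{index}\n{timing}\n[{speaker}] {seg.text}")
    return "\n\n".join(blocks)
```

Calling `compose_subtitles(segments, enrolled)` with a dictionary of enrolled reference vectors yields one numbered block per segment, each line prefixed with the matched speaker label. A real system would replace the precomputed `d_vector` and `text` fields with the outputs of a trained voiceprint model and a speech recognizer.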

Description

PCT application in the national phase; the description has been published.

Claims (20)

  1. PCT application in the national phase; the claims have been published.
CN201980076343.7A 2019-01-29 2019-01-29 Video subtitle synthesis method, apparatus, storage medium, and electronic device Active CN113056908B (zh)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/073770 WO2020154916A1 (zh) 2019-01-29 2019-01-29 Video subtitle synthesis method, apparatus, storage medium, and electronic device

Publications (2)

Publication Number Publication Date
CN113056908A (zh) 2021-06-29
CN113056908B (zh) 2024-04-05

Family

ID=71840280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980076343.7A Active CN113056908B (zh) Video subtitle synthesis method, apparatus, storage medium, and electronic device 2019-01-29 2019-01-29

Country Status (2)

Country Link
CN (1) CN113056908B (zh)
WO (1) WO2020154916A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620310A (zh) * 2022-11-30 2023-01-17 Hangzhou NetEase Cloud Music Technology Co Ltd Image recognition method, model training method, medium, apparatus, and computing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
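CN104811733A (zh) * 2010-05-04 2015-07-29 LG Electronics Inc Method and apparatus for processing a video signal
WO2017048008A1 (ko) * 2015-09-17 2017-03-23 LG Electronics Inc Inter prediction method and apparatus in an image coding system
CN107911646A (zh) * 2016-09-30 2018-04-13 Alibaba Group Holding Ltd Method and apparatus for sharing a conference and generating conference records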

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160293167A1 (en) * 2013-10-10 2016-10-06 Google Inc. Speaker recognition using neural networks
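CN104123115B (zh) * 2014-07-28 2017-05-24 Lenovo (Beijing) Co Ltd Audio information processing method and electronic device
CN106782545B (zh) * 2016-12-16 2019-07-16 Guangzhou Shiyuan Electronic Technology Co Ltd System and method for converting audio and video data into text records
CN108630207B (zh) * 2017-03-23 2021-08-31 Fujitsu Ltd Speaker verification method and speaker verification device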

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴明辉: "Research on text-independent speaker verification based on deep learning", CNKI Excellent Master's Theses Full-text Database, pages 47-50 *

Also Published As

Publication number Publication date
CN113056908B (zh) 2024-04-05
WO2020154916A1 (zh) 2020-08-06

Similar Documents

Publication Publication Date Title
US11727914B2 (en) Intent recognition and emotional text-to-speech learning
US10878824B2 (en) Speech-to-text generation using video-speech matching from a primary speaker
EP3824462B1 (en) Electronic apparatus for processing user utterance and controlling method thereof
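WO2020098115A1 (zh) Subtitle adding method, apparatus, electronic device, and computer-readable storage medium
CN110675886B (zh) Audio signal processing method, apparatus, electronic device, and storage medium
WO2022033556A1 (zh) Electronic device, speech recognition method thereof, and medium
WO2019242414A1 (zh) Speech processing method, apparatus, storage medium, and electronic device
CN114401417B (zh) Live stream object tracking method and apparatus, device, and medium thereof
US20210168460A1 (en) Electronic device and subtitle expression method thereof
CN109346057A (zh) Speech processing system for an intelligent children's toy
CN111640434A (zh) Method and apparatus for controlling a voice device
KR20200027331A (ko) Speech synthesis device
CN115798459B (zh) Audio processing method, apparatus, storage medium, and electronic device
CN113903338A (zh) Face-to-face signing method, apparatus, electronic device, and storage medium
CN113056908B (zh) Video subtitle synthesis method, apparatus, storage medium, and electronic device
CN113299309A (zh) Speech translation method and apparatus, computer-readable medium, and electronic device
CN110337030B (zh) Video playback method, apparatus, terminal, and computer-readable storage medium
WO2023040658A1 (zh) Voice interaction method and electronic device
CN111696566B (zh) Speech processing method, apparatus, and medium
CN110083392B (zh) Audio wake-up pre-recording method, storage medium, terminal, and Bluetooth headset thereof
US20210082427A1 (en) Information processing apparatus and information processing method
CN117153166B (zh) Voice wake-up method, device, and storage medium
CN111696564B (zh) Speech processing method, apparatus, and medium
CN113056784B (zh) Voice information processing method, apparatus, storage medium, and electronic device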
US20230267934A1 (en) Display apparatus and operating method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant