CN113056908A - Video subtitle synthesis method and apparatus, storage medium, and electronic device - Google Patents
- Publication number
- CN113056908A (application CN201980076343.7A)
- Authority
- CN
- China
- Prior art keywords
- voice
- vector
- recognized
- voiceprint
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
- H04N5/278—Subtitling
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
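The present application discloses a video subtitle synthesis method, comprising: acquiring voice information from a video; obtaining a to-be-recognized voice according to features of the voice information; inputting the to-be-recognized voice into a d-vector voiceprint recognition model to obtain a voiceprint identifier corresponding to the to-be-recognized voice, the voiceprint identifier containing a d-vector feature; performing speech recognition on the to-be-recognized voice to obtain the corresponding text information; and synthesizing the voiceprint identifier and the text information to generate a subtitle for the to-be-recognized voice.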
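The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the full description is not reproduced on this page, so the sketch assumes that a d-vector is obtained by averaging per-frame network outputs, that a speaker is identified by cosine similarity against enrolled d-vectors, and that the subtitle is emitted in an SRT-like layout. All function names, the threshold value, and the `[speaker] text` layout are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def average_dvector(frame_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame network outputs into one utterance-level vector.

    Assumption: the frame outputs come from some speaker-embedding network;
    here they are plain lists of floats supplied by the caller.
    """
    if not frame_outputs:
        raise ValueError("need at least one frame")
    dim = len(frame_outputs[0])
    sums = [0.0] * dim
    for frame in frame_outputs:
        for i, value in enumerate(frame):
            sums[i] += value
    return [s / len(frame_outputs) for s in sums]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    dvector: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the enrolled speaker most similar to dvector, or None."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(dvector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def make_subtitle(index: int, start: float, end: float,
                  speaker: Optional[str], text: str) -> str:
    """Combine the speaker label and recognized text into one subtitle cue."""
    label = speaker if speaker is not None else "Unknown"
    return (f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"[{label}] {text}\n")


# Usage: two toy "frames", two enrolled speakers, one recognized sentence.
enrolled = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
dvec = average_dvector([[1.0, 0.0], [0.8, 0.2]])
speaker = identify_speaker(dvec, enrolled)
cue = make_subtitle(1, 1.5, 3.25, speaker, "Hello")
```

Speech recognition itself is left out: the `text` argument stands in for whatever recognizer produces the transcript. Keeping speaker identification and cue formatting as separate functions means either one can be swapped without touching the other.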
Description
PCT national-phase application; the description has been published.
Claims (20)
- PCT national-phase application; the claims have been published.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/073770 WO2020154916A1 (zh) | 2019-01-29 | 2019-01-29 | Video subtitle synthesis method and apparatus, storage medium, and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113056908A (zh) | 2021-06-29 |
CN113056908B (zh) | 2024-04-05 |
Family
ID=71840280
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201980076343.7A | Active | CN113056908B (zh) | 2019-01-29 | 2019-01-29 | Video subtitle synthesis method and apparatus, storage medium, and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113056908B (zh) |
WO (1) | WO2020154916A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
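CN115620310A (zh) * | 2022-11-30 | 2023-01-17 | Hangzhou NetEase Cloud Music Technology Co., Ltd. | Image recognition method, model training method, medium, apparatus, and computing device |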
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
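CN104811733A (zh) * | 2010-05-04 | 2015-07-29 | LG Electronics Inc. | Method and apparatus for processing a video signal |
WO2017048008A1 (ko) * | 2015-09-17 | 2017-03-23 | LG Electronics Inc. | Inter-prediction method and apparatus in an image coding system |
CN107911646A (zh) * | 2016-09-30 | 2018-04-13 | Alibaba Group Holding Limited | Method and apparatus for sharing a conference and generating a conference record |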
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160293167A1 (en) * | 2013-10-10 | 2016-10-06 | Google Inc. | Speaker recognition using neural networks |
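CN104123115B (zh) * | 2014-07-28 | 2017-05-24 | Lenovo (Beijing) Co., Ltd. | Audio information processing method and electronic device |
CN106782545B (zh) * | 2016-12-16 | 2019-07-16 | Guangzhou Shiyuan Electronic Technology Co., Ltd. | System and method for converting audio and video data into a text record |
CN108630207B (zh) * | 2017-03-23 | 2021-08-31 | Fujitsu Limited | Speaker verification method and speaker verification device |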
2019
- 2019-01-29 CN application CN201980076343.7A, patent CN113056908B (zh): Active
- 2019-01-29 WO application PCT/CN2019/073770, publication WO2020154916A1 (zh): Application Filing
Non-Patent Citations (1)
Title |
---|
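Wu Minghui (吴明辉): "Research on text-independent speaker verification based on deep learning", CNKI Outstanding Master's Theses Full-text Database, pages 47-50 * |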
Also Published As
Publication number | Publication date |
---|---|
CN113056908B (zh) | 2024-04-05 |
WO2020154916A1 (zh) | 2020-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
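US11727914B2 (en) | | Intent recognition and emotional text-to-speech learning |
US10878824B2 (en) | | Speech-to-text generation using video-speech matching from a primary speaker |
EP3824462B1 (en) | | Electronic apparatus for processing user utterance and controlling method thereof |
WO2020098115A1 (zh) | | Subtitle adding method and apparatus, electronic device, and computer-readable storage medium |
CN110675886B (zh) | | Audio signal processing method and apparatus, electronic device, and storage medium |
WO2022033556A1 (zh) | | Electronic device and speech recognition method and medium thereof |
WO2019242414A1 (zh) | | Voice processing method and apparatus, storage medium, and electronic device |
CN114401417B (zh) | | Live stream object tracking method and apparatus, device, and medium thereof |
US20210168460A1 (en) | | Electronic device and subtitle expression method thereof |
CN109346057A (zh) | | Voice processing system for an intelligent children's toy |
CN111640434A (zh) | | Method and apparatus for controlling a voice device |
KR20200027331A (ko) | | Speech synthesis device |
CN115798459B (zh) | | Audio processing method and apparatus, storage medium, and electronic device |
CN113903338A (zh) | | Face-to-face signing method and apparatus, electronic device, and storage medium |
CN113056908B (zh) | | Video subtitle synthesis method and apparatus, storage medium, and electronic device |
CN113299309A (zh) | | Speech translation method and apparatus, computer-readable medium, and electronic device |
CN110337030B (zh) | | Video playback method and apparatus, terminal, and computer-readable storage medium |
WO2023040658A1 (zh) | | Voice interaction method and electronic device |
CN111696566B (zh) | | Voice processing method, apparatus, and medium |
CN110083392B (zh) | | Audio wake-up pre-recording method, storage medium, terminal, and Bluetooth headset thereof |
US20210082427A1 (en) | | Information processing apparatus and information processing method |
CN117153166B (zh) | | Voice wake-up method, device, and storage medium |
CN111696564B (zh) | | Voice processing method, apparatus, and medium |
CN113056784B (zh) | | Voice information processing method and apparatus, storage medium, and electronic device |
US20230267934A1 (en) | | Display apparatus and operating method thereof |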
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |