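CN109525787B - Real-time subtitle translation for live-streaming scenarios and system implementation method - Google Patents
Real-time subtitle translation for live-streaming scenarios and system implementation method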
- Publication number
- CN109525787B
- Authority
- CN
- China
- Prior art keywords
- time
- neural network
- spectrogram
- voice signal
- real
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Concepts (machine-extracted)
ID | Concept | Type | Score | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 20 |
238000013519 | translation | Methods | 0.000 | title, claims, abstract, description | 17 |
238000013527 | convolutional neural network | Methods | 0.000 | claims, abstract, description | 23 |
238000012549 | training | Methods | 0.000 | claims, abstract, description | 20 |
238000013528 | artificial neural network | Methods | 0.000 | claims, abstract, description | 8 |
238000011176 | pooling | Methods | 0.000 | claims, description | 21 |
238000011478 | gradient descent method | Methods | 0.000 | claims, description | 6 |
238000010586 | diagram | Methods | 0.000 | claims, description | 4 |
238000009432 | framing | Methods | 0.000 | claims, description | 3 |
230000000737 | periodic effect | Effects | 0.000 | claims, description | 3 |
238000013518 | transcription | Methods | 0.000 | abstract, description | 3 |
230000035897 | transcription | Effects | 0.000 | abstract, description | 3 |
238000001228 | spectrum | Methods | 0.000 | description | 4 |
238000005516 | engineering process | Methods | 0.000 | description | 3 |
239000011159 | matrix material | Substances | 0.000 | description | 3 |
238000011161 | development | Methods | 0.000 | description | 2 |
230000006870 | function | Effects | 0.000 | description | 2 |
230000001629 | suppression | Effects | 0.000 | description | 2 |
230000002457 | bidirectional effect | Effects | 0.000 | description | 1 |
238000013135 | deep learning | Methods | 0.000 | description | 1 |
230000000694 | effects | Effects | 0.000 | description | 1 |
230000007774 | long-term | Effects | 0.000 | description | 1 |
230000006403 | short-term memory | Effects | 0.000 | description | 1 |
238000010183 | spectrum analysis | Methods | 0.000 | description | 1 |
230000003068 | static effect | Effects | 0.000 | description | 1 |
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/278—Subtitling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Machine Translation (AREA)
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811523195.2A (CN109525787B, zh) | 2018-12-13 | 2018-12-13 | Real-time subtitle translation for live-streaming scenarios and system implementation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811523195.2A (CN109525787B, zh) | 2018-12-13 | 2018-12-13 | Real-time subtitle translation for live-streaming scenarios and system implementation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109525787A (zh) | 2019-03-26 |
CN109525787B (zh) | 2021-03-16 |
Family ID: 65795550
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201811523195.2A (CN109525787B, zh) | Active | 2018-12-13 | 2018-12-13 | Real-time subtitle translation for live-streaming scenarios and system implementation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109525787B (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
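CN110008927A (zh) * | 2019-04-15 | 2019-07-12 | Henan Dahua Security Technology Co., Ltd. | Automatic alarm-incident determination method based on an improved Fast-RCNN deep learning model |
CN115938385A (zh) * | 2021-08-17 | 2023-04-07 | China Mobile (Suzhou) Software Technology Co., Ltd. | Speech separation method and apparatus, and storage medium |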
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
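CN104077336A (zh) * | 2013-05-09 | 2014-10-01 | Tencent Technology (Shenzhen) Co., Ltd. | Method and apparatus for retrieving audio file information by dragging an audio file |
CN106952649A (zh) * | 2017-05-14 | 2017-07-14 | Beijing University of Technology | Speaker recognition method based on a convolutional neural network and spectrograms |
WO2017196931A1 (en) * | 2016-05-10 | 2017-11-16 | Google LLC | Frequency based audio analysis using neural networks |
CN108281139A (zh) * | 2016-12-30 | 2018-07-13 | Shenzhen Kuang-Chi Hezhong Technology Ltd. | Speech transcription method and apparatus, and robot |
CN108564940A (zh) * | 2018-03-20 | 2018-09-21 | Ping An Technology (Shenzhen) Co., Ltd. | Speech recognition method, server, and computer-readable storage medium |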
2018
- 2018-12-13: CN application CN201811523195.2A (patent CN109525787B, zh), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN109525787A (zh) | 2019-03-26 |
Similar Documents
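Publication | Title |
---|---|
JP6019108B2 (ja) | Character-based video generation |
WO2010081395A1 (zh) | Method and device for changing lip shape in voice-driven animation and for obtaining lip animation |
JP2014519082A5 (zh) | |
CN112309365B (zh) | Training method and apparatus for a speech synthesis model, storage medium, and electronic device |
CN110085244A (zh) | Live-streaming interaction method and apparatus, electronic device, and readable storage medium |
KR20070020252A (ko) | Method and system for modifying a message |
CN1639738A (zh) | Method and system for generating a cartoonized talking head |
EP1203352A1 (en) | Method of animating a synthesised model of a human face driven by an acoustic signal |
CN109525787B (zh) | Real-time subtitle translation for live-streaming scenarios and system implementation method |
WO2023011221A1 (zh) | Method for outputting blendshape values, storage medium, and electronic device |
CN110675886A (zh) | Audio signal processing method and apparatus, electronic device, and storage medium |
JP2014215461A (ja) | Speech processing device and method, and program |
CN113436609B (zh) | Voice conversion model and training method therefor, and voice conversion method and system |
CN111460094B (zh) | TTS-based audio splicing optimization method and apparatus |
CN113823323B (zh) | Audio processing method and apparatus based on a convolutional neural network, and related device |
CN106327555A (zh) | Method and device for obtaining lip-shape animation |
CN116051692B (zh) | Voice-driven method for generating facial animation of a 3D digital human |
CN116580720A (zh) | Speaker visual-activation explanation method and system based on audio-visual speech separation |
CN116912375A (zh) | Facial animation generation method and apparatus, electronic device, and storage medium |
CN112466306A (zh) | Meeting minutes generation method and apparatus, computer device, and storage medium |
CN110505405A (zh) | Video shooting system and method based on motion-sensing technology |
CN115223224A (zh) | Method and system for generating talking videos of digital humans, terminal device, and medium |
KR100849027B1 (ko) | Lip-sync synchronization method and apparatus for voice signals |
CN115883869B (zh) | Processing method, apparatus, and processing device for a Swin Transformer-based video frame interpolation model |
Kumar et al. | Towards robust speech recognition model using Deep Learning |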
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2022-10-20. Address after: Room A1430, Rooms 01, 02, 03, 04, 10, 11, 18/F, Building A, Wuhan Optics Valley International Business Center, No. 111 Guanshan Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province 430000 (Wuhan area of the Free Trade Zone). Patentee after: Wuhan Ruidimu Network Technology Co., Ltd. Address before: 66 Xinmofan Road, Gulou District, Nanjing, Jiangsu 210003. Patentee before: Nanjing University of Posts and Telecommunications |
TR01 | Transfer of patent right | Effective date of registration: 2022-12-20. Address after: Building B2 (except Room 101), Phase I, Longshan Innovation Park, Future City, No. 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province 430070 (Wuhan area of the Free Trade Zone). Patentee after: TRANSN IOL TECHNOLOGY Co., Ltd. Address before: Room A1430, Rooms 01, 02, 03, 04, 10, 11, 18/F, Building A, Wuhan Optics Valley International Business Center, No. 111 Guanshan Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province 430000 (Wuhan area of the Free Trade Zone). Patentee before: Wuhan Ruidimu Network Technology Co., Ltd. |