CN109671422A - A recording method for obtaining pure speech - Google Patents

A recording method for obtaining pure speech

Info

Publication number
CN109671422A
Authority
CN
China
Prior art keywords
coefficient
frame
frequency spectrum
recording
spectrum energy
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910017762.5A
Other languages
English (en)
Other versions
CN109671422B (zh)
Inventor
陆成刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-09
Filing date
2019-01-09
Publication date
2019-04-23
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201910017762.5A
Publication of CN109671422A
Application granted
Publication of CN109671422B
Legal status
Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

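A recording method for obtaining pure speech, comprising the following steps: 1) the user inputs the text to be recorded into a text-to-speech (TTS) synthesis engine and selects a timbre parameter; 2) the synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients; 3) the spectral energy of the frame is computed; 4) unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5); 5) some of the formant coefficients of the voiced frame's spectral energy are replaced with the formant coefficients of the user's own voiced spectrum, recorded by the user in advance, yielding the corrected spectral energy coefficients of the frame; 6) the spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients; 7) an inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore the time-domain speech signal. The invention achieves highly pure speech recording without relying on any microphone device.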

Description

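A recording method for obtaining pure speech
Technical field
The invention belongs to the technical field of speech recording and relates to a recording method for obtaining pure speech.
Background art
Ordinary recording inevitably adds background noise to the captured sound. This noise may include thermal noise from the circuitry or ambient acoustic noise; its strength simply varies with the quality of the recording equipment and the quietness of the recording environment. To obtain a speech recording of high purity, one can build a dedicated recording studio and use a dedicated recording microphone. To reduce acoustic reflections inside the room, all wall and furniture surfaces of the studio are finished with sound-absorbing materials, and the microphone is an expensive electronic device that suppresses circuit thermal noise well and has a wide, flat frequency response. Another common technique for obtaining pure recordings is a noise filter implemented in software or hardware. This in turn follows two technical routes: first, single-microphone processing, in which the captured sound is A/D-converted into a digital signal and blind noise reduction is applied to estimate the noise spectrum and filter out the noise components; second, an acoustic array of multiple microphones, in which the microphone channels serve as reference signals for one another, so that the direction and strength of the recording source can be determined and an adaptive beam pointing at that source can be formed, which avoids capturing background noise. However, these techniques depend either on a special recording venue or recording equipment, or on special noise-reduction instruments; to date there is still no recording method that relies only on ordinary equipment, let alone one that needs no microphone device at all.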
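Summary of the invention
To overcome the drawbacks of existing approaches to pure recording, namely their dependence on a special recording venue, recording equipment or noise-reduction instruments and their cumbersome operation, the invention provides a recording method for obtaining pure speech that achieves highly pure speech recording without relying on any microphone device.
The technical solution adopted by the invention to solve this technical problem is:
A recording method for obtaining pure speech, comprising the following steps:
1) The user inputs the text to be recorded into the text-to-speech (TTS) synthesis engine and selects a timbre parameter;
2) The synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients;
3) The spectral energy of the frame is computed, i.e. the sum of the squares of the frequency-domain coefficients;
4) Unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5);
5) Some of the formant coefficients (F3, F4, F5) of the voiced frame's spectral energy are replaced with the formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded by the user in advance, to obtain the corrected spectral energy coefficients of the frame;
6) The spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients;
7) An inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore the time-domain speech signal.
Further, in step 1), the selected timbre parameter must be a timbre whose speech is synthesized from studio-grade clean samples.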
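Steps 2) to 4) can be illustrated with a minimal sketch. The patent does not specify the frame length, hop size, window, or the decision rule for unvoiced/voiced detection, so the values below (512-sample frames, 256-sample hop, Hann window, and a threshold relative to the loudest frame) are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(x)] = x
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return padded[idx]

def frame_spectra(frames):
    """Step 2: frame-by-frame Fourier transform (Hann window is an assumption)."""
    window = np.hanning(frames.shape[1])
    return np.fft.rfft(frames * window, axis=1)

def frame_energy(spectra):
    """Step 3: spectral energy as the sum of squared coefficient magnitudes."""
    return np.sum(np.abs(spectra) ** 2, axis=1)

def is_voiced(energy, ratio=0.1):
    """Step 4 (assumed rule): a frame counts as voiced when its energy
    exceeds `ratio` times the energy of the loudest frame."""
    return energy > ratio * np.max(energy)
```

In this sketch a frame flagged `False` by `is_voiced` would skip the formant replacement of step 5) and go straight to step 6). A practical detector would likely combine energy with other cues such as the zero-crossing rate, which the patent does not mention.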
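The invention takes an entirely different approach to pure speech recording: the formant spectrum of voiced sounds is transplanted, the frequency-domain representation is then recovered with the Griffin-Lim algorithm, and the time-domain speech signal is generated by an inverse Fourier transform.
The principle is as follows. The text to be recorded is sent to the TTS speech synthesis engine (the phoneme library of a TTS engine is generally built from studio-grade clean samples), and a timbre for the synthesized output is selected (male, female or child voice). After the audio output by the TTS engine is received, the audio signal is Fourier-transformed to produce frame-by-frame frequency-domain coefficients, and the spectral energy coefficients of each frame are computed. Unvoiced/voiced detection is performed on each frame based on the energy coefficients. For voiced segments, some of the formant coefficients (F3, F4, F5) of the spectral energy are replaced with the corresponding formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded in advance. The corrected spectral coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients, and an inverse Fourier transform then yields pure speech with the user's own timbre characteristics.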
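The formant replacement of step 5) might be sketched as a band-wise swap of spectral magnitudes. The patent does not say how the F3 to F5 coefficients are located or aligned, so this sketch makes a loud assumption: the higher formants are taken to lie in a fixed frequency band (2500 to 5000 Hz here, a hypothetical range), and the user's pre-recorded voiced spectrum is assumed to be available as a magnitude vector of the same length. The function name `replace_band` and all of its parameters are illustrative.

```python
import numpy as np

def replace_band(tts_mag, user_mag, sample_rate, n_fft, band_hz=(2500.0, 5000.0)):
    """Return a copy of `tts_mag` whose bins inside `band_hz` are taken
    from `user_mag`. Both inputs are one-sided magnitude spectra of a
    single voiced frame with n_fft // 2 + 1 bins.

    Assumption: F3..F5 fall inside `band_hz`; the patent gives no range.
    """
    tts_mag = np.asarray(tts_mag, dtype=float)
    user_mag = np.asarray(user_mag, dtype=float)
    if tts_mag.shape != user_mag.shape:
        raise ValueError("spectra must have the same number of bins")
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    out = tts_mag.copy()          # leave the TTS spectrum untouched
    out[in_band] = user_mag[in_band]
    return out
```

Keeping the bins below the band unchanged means the lower formants of the TTS output, which mostly carry vowel identity, are preserved in this sketch, while the replaced upper band carries the speaker-specific colouring the patent attributes to F3, F4 and F5.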
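The main beneficial effect of the invention is that recording is achieved without relying on any microphone device, and the resulting speech recording is highly pure.
Brief description of the drawings
Fig. 1 is a flowchart of a recording method for obtaining pure speech.
Detailed description
The invention is further described below with reference to the accompanying drawing.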
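Referring to Fig. 1, a recording method for obtaining pure speech comprises the following steps:
1) The user inputs the text to be recorded into the text-to-speech (TTS) synthesis engine and selects a timbre parameter;
2) The synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients;
3) The spectral energy of the frame is computed, i.e. the sum of the squares of the frequency-domain coefficients;
4) Unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5);
5) Some of the formant coefficients (F3, F4, F5) of the voiced frame's spectral energy are replaced with the formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded by the user in advance, to obtain the corrected spectral energy coefficients of the frame;
6) The spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients;
7) An inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore the time-domain speech signal.
Further, in step 1), the selected timbre parameter must be a timbre whose speech is synthesized from studio-grade clean samples.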
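Steps 6) and 7) rely on the Griffin-Lim algorithm, which estimates a phase for a magnitude-only spectrogram by alternating between the inverse and forward short-time Fourier transform. The following is a minimal NumPy sketch under stated assumptions: the input is a magnitude spectrogram (the square root of the per-bin energy), and the FFT size, hop size, Hann window, iteration count and random initial phase are illustrative choices that the patent does not specify.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Forward STFT with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT: windowed overlap-add, normalised by
    the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i]
        norm[start:start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_fft=512, hop=128, n_iter=50, seed=0):
    """Recover a time-domain signal whose STFT magnitude approximates
    `magnitude`, starting from a random phase (an assumption)."""
    rng = np.random.default_rng(seed)
    spec = magnitude * np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(spec, n_fft, hop)        # step 7 inside the loop
        reproj = stft(signal, n_fft, hop)       # consistent spectrogram
        spec = magnitude * np.exp(1j * np.angle(reproj))  # keep magnitude
    return istft(spec, n_fft, hop)
```

A typical call would pass the corrected magnitudes of all frames at once, for example `y = griffin_lim(corrected_mag)`, since the phase estimate depends on the overlap between neighbouring frames. Libraries such as librosa provide a maintained implementation that would normally be preferred over this sketch.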

Claims (2)

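1. A recording method for obtaining pure speech, characterized in that the method comprises the following steps:
1) The user inputs the text to be recorded into the text-to-speech (TTS) synthesis engine and selects a timbre parameter;
2) The synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients;
3) The spectral energy of the frame is computed, i.e. the sum of the squares of the frequency-domain coefficients;
4) Unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5);
5) Some of the formant coefficients (F3, F4, F5) of the voiced frame's spectral energy are replaced with the formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded by the user in advance, to obtain the corrected spectral energy coefficients of the frame;
6) The spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients;
7) An inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore the time-domain speech signal.
2. The recording method for obtaining pure speech according to claim 1, characterized in that, in step 1), the selected timbre parameter is a timbre whose speech is synthesized from studio-grade clean samples.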
CN201910017762.5A 2019-01-09 2019-01-09 A recording method for obtaining pure speech Active CN109671422B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910017762.5A CN109671422B (zh) 2019-01-09 2019-01-09 A recording method for obtaining pure speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910017762.5A CN109671422B (zh) 2019-01-09 2019-01-09 A recording method for obtaining pure speech

Publications (2)

Publication Number Publication Date
CN109671422A true CN109671422A (zh) 2019-04-23
CN109671422B CN109671422B (zh) 2022-06-17

Family

ID=66149428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910017762.5A Active CN109671422B (zh) 2019-01-09 2019-01-09 A recording method for obtaining pure speech

Country Status (1)

Country Link
CN (1) CN109671422B (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
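CN110246502A (zh) * 2019-06-26 2019-09-17 Guangdong Xiaotiancai Technology Co., Ltd. Speech noise reduction method, apparatus and terminal device
CN112652315A (zh) * 2020-08-03 2021-04-13 李�昊 Deep-learning-based real-time synthesis system and method for automobile engine sound
CN113838453A (zh) * 2021-08-17 2021-12-24 Beijing Baidu Netcom Science Technology Co., Ltd. Speech processing method, apparatus, device and computer storage medium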
US11996084B2 (en) 2021-08-17 2024-05-28 Beijing Baidu Netcom Science Technology Co., Ltd. Speech synthesis method and apparatus, device and computer storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0193796A (ja) * 1987-10-06 1989-04-12 Nippon Hoso Kyokai <Nhk> Voice quality conversion method
JPH056198A (ja) * 1991-06-26 1993-01-14 Yamaha Corp Formant synthesis device
US5459813A (en) * 1991-03-27 1995-10-17 R.G.A. & Associates, Ltd Public address intelligibility system
US5649058A (en) * 1990-03-31 1997-07-15 Gold Star Co., Ltd. Speech synthesizing method achieved by the segmentation of the linear Formant transition region
US20040158470A1 (en) * 2003-01-30 2004-08-12 Yamaha Corporation Tone generator of wave table type with voice synthesis capability
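CN101067929A (zh) * 2007-06-05 2007-11-07 Nanjing University Method for extracting speech formant trajectories using formant enhancement
CN101359473A (zh) * 2007-07-30 2009-02-04 International Business Machines Corporation Method and apparatus for automatic voice conversion
CN106057192A (zh) * 2016-07-07 2016-10-26 TCL Corporation Real-time voice conversion method and apparatus
CN108682413A (zh) * 2018-04-24 2018-10-19 Shanghai Normal University Emotion counseling system based on voice conversion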

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
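Wang Kunchi et al.: "A fundamental frequency and formant extraction algorithm based on the speech spectrum", Information Technology (《信息技术》) *
Luo Lan'e et al.: "Application of acoustic parameters in the artistic singing voice", Shanxi Electronic Technology (《山西电子技术》) *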

Also Published As

Publication number Publication date
CN109671422B (zh) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190423
Assignee: Lingqi Internet of Things Technology (Hangzhou) Co.,Ltd.
Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY
Contract record no.: X2022330000931
Denomination of invention: A recording method for obtaining pure speech
Granted publication date: 20220617
License type: Common License
Record date: 20221229

Application publication date: 20190423
Assignee: Zhejiang Yu'an Information Technology Co.,Ltd.
Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY
Contract record no.: X2022330000897
Denomination of invention: A recording method for obtaining pure speech
Granted publication date: 20220617
License type: Common License
Record date: 20221228

Application publication date: 20190423
Assignee: Hangzhou Ruiboqifan Enterprise Management Co.,Ltd.
Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY
Contract record no.: X2022330000903
Denomination of invention: A recording method for obtaining pure speech
Granted publication date: 20220617
License type: Common License
Record date: 20221228

Application publication date: 20190423
Assignee: Hangzhou Anfeng Jiyue Cultural Creativity Co.,Ltd.
Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY
Contract record no.: X2022330000901
Denomination of invention: A recording method for obtaining pure speech
Granted publication date: 20220617
License type: Common License
Record date: 20221228

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190423
Assignee: Taizhou Linhai Xinxing Safety Technology Training Co.,Ltd.
Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY
Contract record no.: X2023980047386
Denomination of invention: A Recording Method for Obtaining Pure Speech
Granted publication date: 20220617
License type: Common License
Record date: 20231117