CN109671422B - A recording method for obtaining pure speech - Google Patents

A recording method for obtaining pure speech

Info

Publication number: CN109671422B
Authority: CN (China)
Prior art keywords: frame, frequency domain, coefficient, coefficients, user
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910017762.5A
Other versions: CN109671422A (zh)
Inventor: Lu Chenggang (陆成刚)
Current Assignee: Zhejiang University of Technology ZJUT (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Zhejiang University of Technology ZJUT
Priority date: 2019-01-09 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2019-01-09
Publication date: 2022-06-17

2019-01-09: Application filed by Zhejiang University of Technology ZJUT; priority to CN201910017762.5A
2019-04-23: Publication of CN109671422A
2022-06-17: Application granted; publication of CN109671422B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047: Architecture of speech synthesisers
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033: Voice editing, e.g. manipulating the voice of the synthesiser

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

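A recording method for obtaining pure speech, comprising the following steps: 1) the user inputs the text to be recorded into a text-to-speech (TTS) engine and selects a timbre parameter; 2) the synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients; 3) the spectral energy of the frame is calculated; 4) unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5); 5) some of the formant coefficients of the voiced frame's spectral energy are replaced with the formant coefficients of the user's own voiced spectrum, recorded by the user in advance, to obtain the modified spectral energy coefficients of the frame; 6) the spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients; 7) an inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore a time-domain speech signal. The invention achieves high-purity speech recording without relying on any microphone device.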

Description

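A recording method for obtaining pure speech
Technical Field
The invention belongs to the technical field of speech recording and relates to a recording method for obtaining pure speech.
Background
Ordinary recording inevitably adds background noise to the recorded sound. This noise may include thermal noise from the circuitry or acoustic noise from the environment; its strength simply varies with the quality of the recording equipment and the quietness of the recording environment. To obtain speech recordings of higher purity, one can build a dedicated recording studio and use a dedicated recording microphone. To reduce acoustic reflections inside the studio, all wall and furniture surfaces are finished with sound-absorbing materials, and the microphone is an expensive electronic device that suppresses circuit thermal noise well and has a wide, flat frequency response. Another common technique for obtaining clean recordings is a noise filter, implemented in software or hardware. This in turn follows two technical routes: (1) single-microphone processing, in which the captured sound is converted to the digital domain by A/D conversion and a blind noise-reduction technique estimates the noise spectrum so that the noise component can be filtered out; (2) an acoustic array of multiple microphones, in which the microphones serve as reference signals for one another, making it possible to determine the direction and strength of the sound source and to form an adaptive beam pointed at that source, which avoids picking up background noise. However, these techniques depend either on a special recording venue or recording equipment, or on special noise-reduction apparatus; to date there is no recording method that relies only on ordinary equipment, let alone one that requires no microphone at all.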
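Summary of the Invention
To overcome the shortcomings of existing ways of obtaining clean recordings, which depend on special recording venues, recording equipment or noise-reduction apparatus and are cumbersome to operate, the invention provides a recording method for obtaining pure speech that achieves high-purity speech recording without relying on any microphone device.
The technical solution adopted by the invention to solve this technical problem is:
A recording method for obtaining pure speech, comprising the following steps:
1) the user inputs the text to be recorded into a text-to-speech (TTS) engine and selects a timbre parameter;
2) the synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients;
3) the spectral energy of the frame is calculated, i.e. the sum of the squares of the frequency-domain coefficients;
4) unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5);
5) some of the formant coefficients (F3, F4, F5) of the voiced frame's spectral energy are replaced with the formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded by the user in advance, to obtain the modified spectral energy coefficients of the frame;
6) the spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients;
7) an inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore a time-domain speech signal.
Further, in step 1), the selected timbre parameter must be the timbre of speech synthesized from studio-grade clean samples.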
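Steps 2) to 4) can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the frame length, hop size, toy input signal and the relative energy threshold are all hypothetical values, and a naive DFT from the Python standard library stands in for the FFT routine a real system would use.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a sample list into frames (any trailing remainder is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame (step 2)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def frame_energy(coeffs):
    """Spectral energy: sum of squared coefficient magnitudes (step 3)."""
    return sum(abs(c) ** 2 for c in coeffs)


def is_voiced(energy, threshold):
    """Energy-threshold voicing decision (step 4); the threshold is an assumed tuning value."""
    return energy > threshold


# Toy input: a loud 200 Hz tone (stand-in for a voiced sound) followed by a
# much weaker, fast-alternating signal (stand-in for an unvoiced sound).
SAMPLE_RATE = 8000
loud = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(256)]
quiet = [0.01 * (-1) ** t for t in range(256)]
energies = [frame_energy(dft(f)) for f in frames(loud + quiet, 64, 64)]
flags = [is_voiced(e, threshold=0.1 * max(energies)) for e in energies]
print(flags)
```

The patent only states that the decision is based on the energy coefficients; a single fixed threshold is the simplest reading of that, and practical detectors often combine energy with other cues such as zero-crossing rate.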
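The invention takes a completely different approach to recording pure speech: the formant spectrum of voiced sounds is transplanted, the result is restored to the frequency domain with the Griffin-Lim algorithm, and an inverse Fourier transform generates the time-domain speech signal.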
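In symbols, the quantities involved can be written as below. The notation is an assumption introduced here for illustration (the patent gives no formulas): $x_m(n)$ is the $m$-th frame of $N$ samples, $X_m(k)$ its frequency-domain coefficients, $E_m$ its spectral energy, $A$ the (possibly modified) magnitude spectrum, and $i$ the iteration index of the standard Griffin-Lim phase-reconstruction loop.

```latex
\begin{align}
X_m(k) &= \sum_{n=0}^{N-1} x_m(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1 \\
E_m &= \sum_{k=0}^{N-1} \lvert X_m(k) \rvert^{2} \\
X^{(i+1)} &= \mathrm{STFT}\!\left( \mathrm{ISTFT}\!\left( A \odot e^{\,j \angle X^{(i)}} \right) \right) \\
\hat{x} &= \mathrm{ISTFT}\!\left( A \odot e^{\,j \angle X^{(I)}} \right)
\end{align}
```

Here $\odot$ is element-wise multiplication and $I$ is the number of iterations; the last line is the inverse transform of step 7) applied to the magnitude combined with the estimated phase.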
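The principle is as follows. The text to be recorded is sent to the TTS engine (the phoneme libraries of TTS engines are generally built from studio-grade clean samples), and a synthesized timbre is selected for the output (male, female or child voice). Once the audio output by the TTS engine is received, the audio signal is Fourier-transformed to produce frame-by-frame frequency-domain coefficients, and the spectral energy coefficients of each frame are calculated. Unvoiced/voiced detection is performed on each frame based on the energy coefficients. For voiced segments, some of the formant coefficients (F3, F4, F5) of the spectral energy are replaced with the corresponding formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded in advance. The modified spectral coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients, and an inverse Fourier transform then produces pure speech that carries the user's own timbre characteristics.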
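The replacement of the F3 to F5 region on a voiced frame might look like the sketch below. The patent does not say how the formant coefficients are located, so this example makes a simplifying assumption: the upper formants are taken to lie in a fixed frequency band (2000 to 5000 Hz here, a hypothetical range), and the magnitude bins in that band are copied from a pre-recorded user spectrum. The function names and the toy spectra are illustrative only.

```python
def band_bins(lo_hz, hi_hz, sample_rate, n_fft):
    """Indices of non-negative-frequency FFT bins whose centre lies in [lo_hz, hi_hz]."""
    bin_hz = sample_rate / n_fft
    return [k for k in range(n_fft // 2 + 1) if lo_hz <= k * bin_hz <= hi_hz]


def replace_band(tts_mag, user_mag, bins):
    """Return a copy of tts_mag with the selected bins taken from user_mag."""
    if len(tts_mag) != len(user_mag):
        raise ValueError("magnitude spectra must have the same length")
    out = list(tts_mag)
    for k in bins:
        out[k] = user_mag[k]
    return out


# Toy example: a 16-point FFT at 16 kHz gives 1 kHz bins (indices 0..8).
SAMPLE_RATE, N_FFT = 16000, 16
upper_formant_bins = band_bins(2000, 5000, SAMPLE_RATE, N_FFT)  # assumed F3-F5 range
tts_frame = [1.0] * (N_FFT // 2 + 1)        # magnitude spectrum of a voiced TTS frame
user_template = [9.0] * (N_FFT // 2 + 1)    # user's pre-recorded voiced magnitude spectrum
modified = replace_band(tts_frame, user_template, upper_formant_bins)
print(upper_formant_bins)
print(modified)
```

Leaving the lower bins untouched keeps F1 and F2, which mainly carry vowel identity, from the TTS output, while the higher formants, which are more speaker-dependent, come from the user.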
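The main benefit of the invention is that recording is achieved without relying on any microphone device, and the result is a high-purity speech recording.
Brief Description of the Drawings
FIG. 1 is a flowchart of a recording method for obtaining pure speech.
Detailed Description
The invention is further described below with reference to the drawings.
Referring to FIG. 1, a recording method for obtaining pure speech comprises the following steps:
1) the user inputs the text to be recorded into a text-to-speech (TTS) engine and selects a timbre parameter;
2) the synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients;
3) the spectral energy of the frame is calculated, i.e. the sum of the squares of the frequency-domain coefficients;
4) unvoiced/voiced detection is performed on the frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5);
5) some of the formant coefficients (F3, F4, F5) of the voiced frame's spectral energy are replaced with the formant coefficients (F3, F4, F5) of the user's own voiced spectrum, recorded by the user in advance, to obtain the modified spectral energy coefficients of the frame;
6) the spectral energy coefficients of the frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients;
7) an inverse Fourier transform is applied to the frequency-domain coefficients of the frame to restore a time-domain speech signal.
Further, in step 1), the selected timbre parameter must be the timbre of speech synthesized from studio-grade clean samples.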
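Steps 6) and 7) rely on Griffin-Lim phase reconstruction followed by an inverse transform. The sketch below shows the textbook form of that algorithm using only the Python standard library; it is not taken from the patent. The patent describes the processing per frame, whereas the usual formulation shown here iterates over a sequence of overlapping frames. The frame length, hop size, window, iteration count and random seed are all assumptions chosen to keep the example small.

```python
import cmath
import math
import random

N, HOP = 16, 4
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann


def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def stft(signal):
    """Windowed DFT of each overlapping frame."""
    return [dft([signal[s + n] * WINDOW[n] for n in range(N)])
            for s in range(0, len(signal) - N + 1, HOP)]


def istft(spec, length):
    """Least-squares overlap-add inverse of stft()."""
    out, norm = [0.0] * length, [0.0] * length
    for m, coeffs in enumerate(spec):
        frame = [c.real / N for c in dft(coeffs, sign=+1)]
        for n in range(N):
            out[m * HOP + n] += frame[n] * WINDOW[n]
            norm[m * HOP + n] += WINDOW[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitude, length, iterations=30, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = random.Random(seed)
    spec = [[a * cmath.exp(2j * math.pi * rng.random()) for a in row]
            for row in magnitude]  # start from random phase
    for _ in range(iterations):
        signal = istft(spec, length)
        # Keep the phase of the re-analysed signal, re-impose the target magnitude.
        spec = [[a * cmath.exp(1j * cmath.phase(c)) for a, c in zip(mag_row, row)]
                for mag_row, row in zip(magnitude, stft(signal))]
    return istft(spec, length)


def magnitude_error(signal, magnitude):
    """Sum of squared differences between |STFT(signal)| and the target magnitudes."""
    return sum((abs(c) - a) ** 2
               for row, mag_row in zip(stft(signal), magnitude)
               for c, a in zip(row, mag_row))


# Toy target: discard the phase of a short sine wave, then rebuild a waveform.
target = [math.sin(2 * math.pi * 3 * t / N) for t in range(64)]
magnitude = [[abs(c) for c in row] for row in stft(target)]
estimate = griffin_lim(magnitude, len(target))
print(f"magnitude error: {magnitude_error(estimate, magnitude):.4f}")
```

Because each iteration alternates between two projections, the magnitude error cannot increase from one iteration to the next, which is why running more iterations is safe even though convergence to the original waveform is not guaranteed.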

Claims (2)

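1. A recording method for obtaining pure speech, characterized in that the method comprises the following steps:
1) the user inputs the text to be recorded into a text-to-speech (TTS) engine and selects a timbre parameter;
2) the synthesized audio output by the TTS engine is received and Fourier-transformed frame by frame to generate frequency-domain coefficients;
3) the spectral energy of each frame is calculated, i.e. the sum of the squares of the frequency-domain coefficients;
4) unvoiced/voiced detection is performed on each frame based on the energy coefficients; if the frame is unvoiced, skip to step 6), otherwise proceed to step 5);
5) the formant coefficients F3, F4 and F5 of the voiced frame's spectral energy are replaced with the formant coefficients F3, F4 and F5 of the user's own voiced spectrum, recorded by the user in advance, to obtain the modified spectral energy coefficients of each frame;
6) the spectral energy coefficients of each frame are processed with the Griffin-Lim algorithm to generate frequency-domain coefficients;
7) an inverse Fourier transform is applied to the frequency-domain coefficients of each frame to restore a time-domain speech signal.
2. The recording method for obtaining pure speech according to claim 1, characterized in that, in step 1), the selected timbre parameter is the timbre of speech synthesized from studio-grade clean samples.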

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910017762.5A CN109671422B (zh) 2019-01-09 2019-01-09 A recording method for obtaining pure speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910017762.5A CN109671422B (zh) 2019-01-09 2019-01-09 A recording method for obtaining pure speech

Publications (2)

Publication Number Publication Date
CN109671422A (zh) 2019-04-23
CN109671422B (zh) 2022-06-17

Family

ID=66149428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910017762.5A Active CN109671422B (zh) 2019-01-09 2019-01-09 A recording method for obtaining pure speech

Country Status (1)

Country Link
CN (1) CN109671422B (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
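CN110246502A (zh) * 2019-06-26 2019-09-17 Guangdong Genius Technology Co., Ltd. (广东小天才科技有限公司) Voice noise reduction method, apparatus and terminal device
CN113838453B (zh) * 2021-08-17 2022-06-28 Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司) Voice processing method, apparatus, device and computer storage medium
CN113838452B (zh) 2021-08-17 2022-08-23 Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司) Speech synthesis method, apparatus, device and computer storage medium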

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0193796A (ja) * 1987-10-06 1989-04-12 Nippon Hoso Kyokai <Nhk> Voice quality conversion method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
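KR920008259B1 (ko) * 1990-03-31 1992-09-25 Goldstar Co., Ltd. (주식회사 금성사) Korean speech synthesis method based on segmentation of linear formant transition intervals
CA2056110C (en) * 1991-03-27 1997-02-04 Arnold I. Klayman Public address intelligibility system
JP2737459B2 (ja) * 1991-06-26 1998-04-08 Yamaha Corporation (ヤマハ株式会社) Formant synthesis device
US7424430B2 (en) * 2003-01-30 2008-09-09 Yamaha Corporation Tone generator of wave table type with voice synthesis capability
CN101067929B (zh) * 2007-06-05 2011-04-20 Nanjing University (南京大学) Method for extracting speech formant trajectories using formant enhancement
CN101359473A (zh) * 2007-07-30 2009-02-04 International Business Machines Corporation (国际商业机器公司) Method and apparatus for automatic voice conversion
CN106057192A (zh) * 2016-07-07 2016-10-26 TCL Corporation (Tcl集团股份有限公司) Real-time voice conversion method and apparatus
CN108682413B (zh) * 2018-04-24 2020-09-29 Shanghai Normal University (上海师范大学) Emotion counseling system based on voice conversion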

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
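A pitch and formant extraction algorithm based on the speech spectrum; Wang Kunchi et al.; Information Technology (《信息技术》); 20071025 (No. 10); full text *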

Also Published As

Publication number Publication date
CN109671422A (zh) 2019-04-23

Similar Documents

Publication Publication Date Title
Li et al. On the importance of power compression and phase estimation in monaural speech dereverberation
La Bouquin-Jeannes et al. Enhancement of speech degraded by coherent and incoherent noise using a cross-spectral estimator
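CN109671422B (zh) A recording method for obtaining pure speech
Naylor et al. Speech dereverberation
Marro et al. Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering
JP5127754B2 (ja) Signal processing device
JP5275612B2 (ja) Periodic signal processing method, periodic signal conversion method, periodic signal processing device, and periodic signal analysis method
CN102026080B (zh) Audio processing system and adaptive feedback cancellation method
US20130073284A1 (en) Speech Enhancement System
KR20130108391A (ko) Method, apparatus and machine-readable storage medium for decomposing a multichannel audio signal
WO2005050618A3 (en) Adaptive beamformer with robustness against uncorrelated noise
KR20080019222A (ko) Method for determining an estimate of a noise-reduced value for multi-sensory speech enhancement using a speech-state model, computer-readable medium, and method for identifying a clean speech value
JP5717097B2 (ja) Hidden Markov model learning device for speech synthesis and speech synthesis device
CN106653048B (zh) Single-channel sound separation method based on a human voice model
EP1913591B1 (en) Enhancement of speech intelligibility in a mobile communication device by controlling the operation of a vibrator in dependance of the background noise
JP5443547B2 (ja) Signal processing device
CN102118675A (zh) Hearing aid with adaptive feedback compensation device
CN110931034B (zh) Sound pickup and noise reduction method for earphones with a built-in talk and pickup microphone
JP2012208177A (ja) Bandwidth extension device and voice correction device
CN111968627B (zh) Bone-conducted speech enhancement method based on joint dictionary learning and sparse representation
Liu et al. Phase Spectrum Recovery for Enhancing Low-Quality Speech Captured by Laser Microphones
JP2011180234A (ja) Device for embedding information in acoustic signals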
Prasanna et al. Speech enhancement using source features and group delay analysis
Bae et al. On a new hybrid speech coder using variables LPF
US20130226568A1 (en) Audio signals by estimations and use of human voice attributes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190423

Assignee: Lingqi Internet of Things Technology (Hangzhou) Co.,Ltd.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2022330000931

Denomination of invention: A recording method for obtaining pure speech

Granted publication date: 20220617

License type: Common License

Record date: 20221229

Application publication date: 20190423

Assignee: Zhejiang Yu'an Information Technology Co.,Ltd.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2022330000897

Denomination of invention: A recording method for obtaining pure speech

Granted publication date: 20220617

License type: Common License

Record date: 20221228

Application publication date: 20190423

Assignee: Hangzhou Ruiboqifan Enterprise Management Co.,Ltd.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2022330000903

Denomination of invention: A recording method for obtaining pure speech

Granted publication date: 20220617

License type: Common License

Record date: 20221228

Application publication date: 20190423

Assignee: Hangzhou Anfeng Jiyue Cultural Creativity Co.,Ltd.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2022330000901

Denomination of invention: A recording method for obtaining pure speech

Granted publication date: 20220617

License type: Common License

Record date: 20221228

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190423

Assignee: Taizhou Linhai Xinxing Safety Technology Training Co.,Ltd.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2023980047386

Denomination of invention: A Recording Method for Obtaining Pure Speech

Granted publication date: 20220617

License type: Common License

Record date: 20231117
