CN112037759A - Anti-noise perception sensitivity curve establishing and voice synthesizing method - Google Patents

Anti-noise perception sensitivity curve establishing and voice synthesizing method

Info

Publication number
CN112037759A
Authority
CN
China
Prior art keywords
noise
critical
sensitivity curve
voice
perception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010686375.3A
Other languages
Chinese (zh)
Other versions
CN112037759B (en)
Inventor
杨玉红
冯佳倩
蔡林君
陈旭峰
刘青沐
郭佳昊
余洪江
涂卫平
艾浩军
王晓晨
高戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010686375.3A
Publication of CN112037759A
Application granted
Publication of CN112037759B
Active legal status
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/69 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals

Abstract

The invention provides a method for establishing an anti-noise perception sensitivity curve and synthesizing speech. Noise is divided by band-pass filtering into the critical bands perceived by the human ear, yielding a set of critical-band noises; for each critical-band noise, anti-noise speech sequences are recorded at different noise decibel levels; a perception threshold is determined from the SII objective test index, and a noise-decibel-level perception test is run on each critical band to obtain updated critical decibels; an anti-noise perception sensitivity curve is then generated from the updated critical decibels. Critical decibel values are read from the curve, anti-noise speech at the different critical decibel values is selected to train an anti-noise speech feature mapping model, and the mapped anti-noise speech features are used for speech synthesis. By exploiting human hearing characteristics in noisy environments, the method better matches the practical application scenarios of anti-noise voice conversion.

Description

Anti-noise perception sensitivity curve establishing and voice synthesizing method
Technical Field
The invention belongs to the technical field of acoustics, and particularly relates to an anti-noise perception sensitivity curve establishing and voice synthesizing method.
Background
An equal-loudness curve plots, for a typical listener, the sound pressure level at which pure tones of different frequencies are perceived as equally loud. In the family of equal-loudness curves measured by binaural audiometry, the lowest curve (dashed), i.e. the minimum audible field for pure tones, serves as the hearing threshold curve. Loudness is determined mainly by sound intensity: as intensity increases, the loudness level rises accordingly. However, loudness does not depend on intensity alone but also on frequency, and pure tones of different frequencies grow in loudness at different rates; low-frequency pure tones grow in loudness faster than mid-frequency pure tones.
By analogy with the equal-loudness curve, a speaker perceives ambient noise differently at different frequencies and different noise levels, and correspondingly different anti-noise vocalization patterns are triggered. Determining the speaker's discrimination-threshold curve for changes in the ambient-noise decibel level can guide the construction of an anti-noise vocalization model based on the Lombard effect, so that the corresponding anti-noise voice conversion is started at the right time and the converted anti-noise speech stays consistent with the various real noise scenes. The prior art, however, has focused on the acoustic features changed by the Lombard effect and on their importance for improving the intelligibility of anti-noise speech. Lacking guidance from anti-noise perception sensitivity, the converted anti-noise speech does not match the real scene, which in turn degrades the experience of downstream speech applications.
The invention provides a method for establishing an anti-noise perception sensitivity curve and synthesizing speech, in order to fully exploit human perceptual characteristics in different noise environments, study the anti-noise vocalization mechanism from the perspective of auditory perception, establish the speaker's perception sensitivity curve for environmental noise, and solve the current problem that anti-noise voice conversion is disconnected from real scenes because anti-noise vocalization lacks the guidance of an auditory perception model.
Disclosure of Invention
The invention provides a method for establishing an anti-noise perception sensitivity curve and synthesizing speech, aiming to solve the problem that existing anti-noise voice production lacks the guidance of an auditory perception model and to reduce detail differences across frequency.
The technical solution adopted by the invention is a method for establishing an anti-noise perception sensitivity curve, comprising the following steps:
step 1, dividing the noise by band-pass filtering according to the critical bands perceived by the human ear, to obtain a plurality of critical-band noises;
step 2, for each critical-band noise from step 1, recording corresponding anti-noise speech sequences at different noise decibel levels;
step 3, determining a perception threshold based on the SII objective test index, and performing a noise-decibel-level perception test on each critical band to obtain updated critical decibels;
step 4, generating the anti-noise perception sensitivity curve from the updated critical decibels obtained in step 3.
In step 1, white noise is used as the noise.
In step 1, Bark band or Mel band is used as the critical band of human ear perception.
Moreover, step 2 is implemented by first, for each critical-band noise obtained in step 1, acquiring data through an artificial head, adjusting each critical-band noise to the preset signal-to-noise ratios, and calibrating the decibel level; and then recording speech sequences at the different decibel levels for each critical-band noise.
Moreover, given a preset lower limit MIN, upper limit MAX, and step size d of the signal-to-noise-ratio range, recordings are made at signal-to-noise ratios of MIN, MIN + d, MIN + 2d, …, MAX, respectively, to obtain the corresponding speech sequences.
In step 3, the noise decibel level sensing test for each critical frequency band is realized by using the MUSHRA standard.
The invention also provides a speech synthesis method based on the anti-noise perception sensitivity curve, which comprises the following steps,
step 1, dividing the noise by band-pass filtering according to the critical bands perceived by the human ear, to obtain a plurality of critical-band noises;
step 2, for each critical-band noise from step 1, recording corresponding anti-noise speech sequences at different noise decibel levels;
step 3, determining a perception threshold based on the SII objective test index, and performing a noise-decibel-level perception test on each critical band to obtain updated critical decibels;
step 4, generating the anti-noise perception sensitivity curve from the updated critical decibels obtained in step 3;
step 5, obtaining critical decibel values from the anti-noise perception sensitivity curve of step 4, selecting anti-noise speech at the different critical decibel values, training an anti-noise speech feature mapping model, and performing speech synthesis with the mapped anti-noise speech features.
Furthermore, in step 5, the WORLD vocoder is used to extract acoustic features, including the fundamental frequency and the spectral envelope.
Moreover, in step 5, the anti-noise speech feature mapping model is obtained by training a Gaussian mixture model on the spectral envelope using the EM method.
Moreover, speech synthesis is performed by combining the fundamental frequency features with the spectral envelope conversion result obtained from the anti-noise speech feature mapping model.
The method of the invention exploits human hearing characteristics and the special vocalization mechanism in noisy environments to establish an anti-noise perception sensitivity curve and synthesize speech. It better matches the practical application scenarios of anti-noise voice conversion, offers high accuracy, and has broad application prospects; for example, practical applications such as speech separation and conference transcription require large anti-noise speech data sets.
Detailed Description
The present invention will be described in further detail with reference to examples, so that those of ordinary skill in the art may understand and practice it; it should be understood that the embodiments described here are illustrative and are not to be construed as limiting the invention.
The method provided by the invention can be implemented as a process using computer software and other hardware equipment; the process of the invention is explained in detail below.
Example one
This embodiment of the invention provides a speech synthesis method based on establishing an anti-noise perception sensitivity curve; the specific implementation steps are as follows:
step 1: dividing the noise according to the critical frequency band sensed by the human ear by using band-pass filtering to obtain a plurality of critical frequency band noises;
the noise used in the embodiment is white noise, a Bark band is used as a critical frequency band of human ear perception, and the white noise is divided according to the Bark band by using band-pass filtering.
Step 2: for each critical-band noise obtained in step 1, record corresponding anti-noise speech sequences at different noise decibel levels.
for step 2, this embodiment may be implemented by the following steps:
step 2.1: and (3) aiming at each Bark band noise in the step (1), acquiring data through a manual head, correspondingly adjusting each Bark band noise according to a preset signal-to-noise ratio, and calibrating the decibel level.
Considering that common scene noise is about 35 dB and the pain threshold of human hearing is 85 dB, the preset signal-to-noise-ratio range in this embodiment is 40-85 dB, i.e. MIN = 40, MAX = 85, with step size d = 5 dB. For each Bark-band noise, recordings are made at 40 dB, 45 dB, …, 80 dB, and 85 dB respectively to obtain the corresponding speech data; a sketch of the level grid and calibration follows.
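A minimal sketch of building the level grid and scaling a Bark-band noise to a target playback level, continuing from the band-splitting sketch above. It assumes the playback chain has been calibrated so that a digital RMS of ref_rms corresponds to ref_db dB SPL; those reference values are not given in the patent and are placeholders here:

import numpy as np

MIN_DB, MAX_DB, STEP_DB = 40, 85, 5
levels_db = list(range(MIN_DB, MAX_DB + 1, STEP_DB))   # 40, 45, ..., 85

def scale_to_level(noise, target_db, ref_rms=0.1, ref_db=70.0):
    """Scale a band-limited noise so its playback level is target_db dB SPL."""
    gain_db = target_db - ref_db
    target_rms = ref_rms * 10 ** (gain_db / 20)
    current_rms = np.sqrt(np.mean(noise ** 2))
    return noise * (target_rms / current_rms)

calibrated = {db: scale_to_level(bark_band_noises[0], db) for db in levels_db}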
The preferred recording materials and specific settings used in the examples are as follows:
The embodiment uses an artificial-head device for recording, such as a G.R.A.S. KEMAR 45BA with 1/2-inch low-noise ear simulators, including a highly realistic extended ear canal. To avoid extraneous noise such as wall reflections, the various environmental noises are played through earphones worn by the artificial head, and recording at the artificial head yields an accurate signal-to-noise ratio.
The signal-to-noise ratio is calculated as follows:

SNR = 10 \log_{10} \frac{p_s}{p_d}, \qquad p_s = \frac{1}{N}\sum_{n=1}^{N} s^2(n), \qquad p_d = \frac{1}{N}\sum_{n=1}^{N} d^2(n)

where s(n) is the speech signal, d(n) is the noise signal, p_s is the speech-signal power, p_d is the noise-signal power, n is the sample index, and N is the number of samples.
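A minimal sketch of this SNR definition; speech and noise are assumed to be time-aligned 1-D arrays of equal length:

import numpy as np

def snr_db(speech, noise):
    """Signal-to-noise ratio: 10*log10 of speech power over noise power, in dB."""
    p_s = np.mean(speech ** 2)
    p_d = np.mean(noise ** 2)
    return 10.0 * np.log10(p_s / p_d)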
Step 2.2: record speech sequences at the different decibel levels for each Bark-band noise.
In a specific implementation, each speaker wears earphones that play the noise calibrated in step 2.1, and each speaker's speech sequences are recorded for each Bark-band noise at the different decibel levels. The corresponding experiment of this embodiment was carried out in an anechoic chamber at Wuhan University, using a high-fidelity microphone to record speech data at the corresponding decibel levels.
Specifically, step 1 and step 2 may be performed in advance as input data.
Step 3: determine a perception threshold based on the Speech Intelligibility Index (SII) objective test index, then perform a noise-decibel-level perception test on each critical band using the MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) standard to obtain the updated critical decibel level.
in specific implementation, other objective test indexes can be adopted, for example; other criteria may also be used for testing, such as the clarity Index (AI)
For step 3, this embodiment may be implemented by the following steps:
step 3.1: the improvement is carried out based on a definition index SII, the SII depends on the audible proportion of a listener in the spectrum information, the step uses a definition formula of the SII, and the critical decibel is calculated under the condition of a determined SII score, and the definition formula of the SII is as follows:
Figure RE-GDA0002762118810000046
Figure RE-GDA0002762118810000041
wherein, SII score is 0-1, and 0.35 is taken for determining decibel threshold value in the embodiment; n isf20 for the total number of frequency bands; wfA human ear perception weight representing the frequency band f; l isfA variable element representing a speech level distortion; efAnd DfDecibels representing speech and interference noise, respectively;
Figure RE-GDA0002762118810000042
representing the audible threshold for that band.
By the formula, while the speech intelligibility is ensured, the noise signal-to-noise ratio (critical decibel) corresponding to the anti-noise speech is obtained, namely Ef-Df
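A minimal sketch of solving for the critical decibel of a single band under the definition above. It assumes L_f = 1, known values of W_f, the speech level E_f, and the hearing threshold X_f, and a simple sweep over the noise level; the full ANSI S3.5 SII procedure contains more terms than shown here, and the numeric values are illustrative only:

import numpy as np

def band_audibility(E_f, D_f, X_f):
    """Band audibility factor K_f, clipped to [0, 1]."""
    return float(np.clip((E_f - max(D_f, X_f) + 15.0) / 30.0, 0.0, 1.0))

def critical_decibel(E_f, X_f, W_f=1.0, L_f=1.0, sii_target=0.35):
    """Sweep the noise level D_f until the band's SII contribution falls to
    the target score; return the corresponding E_f - D_f (critical decibel)."""
    for D_f in np.arange(E_f - 30.0, E_f + 30.0, 0.1):
        if W_f * L_f * band_audibility(E_f, D_f, X_f) <= sii_target:
            return E_f - D_f
    return None

print(critical_decibel(E_f=65.0, X_f=20.0))   # illustrative values only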
Step 3.2: fine-tune the critical decibel value from step 3.1. A noise-decibel-level perception experiment is carried out on each Bark-band noise; the hearing perception test follows the MUSHRA standard, and the Word Error Rate (WER) is calculated. To make the recognized word sequence consistent with the reference sequence, some words must be substituted, deleted, or inserted; their total is divided by the number of words in the reference sequence and expressed as a percentage, giving the word error rate

WER = \frac{S + D + I}{N_w} \times 100\%

where S, D, and I are the numbers of substituted, deleted, and inserted words, and N_w is the number of words in the reference sequence.
The resulting error rate is used as a score, and statistical significance must be taken as the reference. First the mean score of each speech sequence is computed,

\mu_{jk} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{score}_{ijk}

where \mathrm{score}_{ijk} is the score given by the i-th listener to the k-th speech item at the j-th signal-to-noise-ratio level, and N is the total number of listeners in the subjective experiment. The confidence interval of each mean score is then computed:

[\mu_{jk} - \delta_{jk},\ \mu_{jk} + \delta_{jk}], \qquad \delta_{jk} = t_{0.05} \frac{S_{jk}}{\sqrt{N}}, \qquad S_{jk} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (\mathrm{score}_{ijk} - \mu_{jk})^2}

With a confidence level of 95%, non-overlapping boundary values are found by comparing the confidence intervals at different signal-to-noise ratios, and the critical decibel is updated accordingly.
Step 4: generate the anti-noise perception sensitivity curve from the test result of step 3 (the updated critical decibels obtained in step 3.2).
In the present embodiment Bark bands are used, so the sensitivity curve is plotted with the Bark band on the horizontal axis and the Bark-band noise decibel level on the vertical axis; in a specific implementation other frequency scales, such as Mel bands, may be used to generate a corresponding curve, as sketched below.
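A minimal plotting sketch; the band count follows n_f = 20 from step 3.1, and the critical-decibel values below are made up for illustration and are not from the patent:

import matplotlib.pyplot as plt

bark_bands = list(range(1, 21))
critical_db = [62, 60, 58, 57, 55, 54, 53, 52, 52, 51,
               51, 50, 50, 49, 49, 48, 48, 47, 47, 46]

plt.plot(bark_bands, critical_db, marker="o")
plt.xlabel("Bark band index")
plt.ylabel("Critical noise level (dB)")
plt.title("Anti-noise perception sensitivity curve")
plt.show()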
And 5: and 4, obtaining critical decibel values from the anti-noise perception sensitivity curve in the step 4, selecting anti-noise voices with different critical decibel values, training an anti-noise voice feature mapping model, and performing voice synthesis by using the mapped anti-noise voice features.
For step 5, this embodiment may be implemented by the following steps:
step 5.1: and selecting anti-noise voices with different critical decibel values and corresponding common voices in the anti-noise perception sensitivity curve, and extracting acoustic features such as fundamental frequency (f0) and spectral envelope (spec).
In this embodiment, the method of extracting acoustic features by using the WORLD vocoder includes:
f0=DIO(x,fs)
spec=CheapTrick(x,fs,f0)
where x is the input speech signal and fs is the sampling rate; DIO and CheapTrick are existing components of the WORLD vocoder and are not described in detail here.
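A minimal sketch of this extraction using the pyworld Python binding of the WORLD vocoder (the patent does not name a binding, and the file name below is hypothetical); pyworld expects a float64 mono signal:

import numpy as np
import pyworld as pw
import soundfile as sf

x, fs = sf.read("lombard_utterance.wav")        # hypothetical recording
x = np.ascontiguousarray(x, dtype=np.float64)

f0, t = pw.dio(x, fs)                  # coarse fundamental-frequency track
f0 = pw.stonemask(x, f0, t, fs)        # F0 refinement
spec = pw.cheaptrick(x, f0, t, fs)     # spectral envelope, one frame per row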
Step 5.2: train an anti-noise speech feature mapping model with the acoustic features extracted in step 5.1, and perform feature conversion with this mapping model.
The anti-noise speech feature mapping model used in this embodiment is a Gaussian Mixture Model (GMM); the Expectation-Maximization (EM) algorithm is used to train the GMM on the spec features of step 5.1, where the spec feature is 24-dimensional. The GMM itself is prior art and is not described in detail.
In this embodiment, the GMM is used as the feature mapping model, and neural network models such as CycleGAN and StarGAN may also be used.
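A minimal sketch of such a mapping model. It assumes the standard joint-GMM conditional-mean formulation (the patent only states that a GMM is trained with EM), that source and target spectral frames have already been time-aligned, and that the 24-dimensional spec feature is a reduced representation (e.g. mel-cepstral) of the CheapTrick envelope, with the reduction and expansion steps omitted:

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

D = 24  # spectral feature dimension stated in the embodiment

def train_mapping_gmm(src_feats, tgt_feats, n_components=8):
    """Fit a GMM with EM on concatenated [source, target] frames."""
    joint = np.hstack([src_feats, tgt_feats])             # shape (T, 2*D)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", max_iter=200)
    gmm.fit(joint)
    return gmm

def map_features(gmm, src_feats):
    """Convert source frames to target frames via the conditional mean E[y | x]."""
    mu_x, mu_y = gmm.means_[:, :D], gmm.means_[:, D:]
    sig_xx = gmm.covariances_[:, :D, :D]
    sig_yx = gmm.covariances_[:, D:, :D]
    log_w = np.log(gmm.weights_)
    K = len(log_w)
    out = np.zeros((len(src_feats), D))
    for i, x in enumerate(src_feats):
        # component responsibilities given only the source frame
        logp = np.array([log_w[k] +
                         multivariate_normal.logpdf(x, mu_x[k], sig_xx[k])
                         for k in range(K)])
        resp = np.exp(logp - logp.max())
        resp /= resp.sum()
        # mixture of per-component conditional means
        for k in range(K):
            cond = mu_y[k] + sig_yx[k] @ np.linalg.solve(sig_xx[k], x - mu_x[k])
            out[i] += resp[k] * cond
    return out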
Step 5.3: convert the spec feature into spec' with the mapping model of step 5.2, and combine it with the other features of step 5.1 for speech synthesis.
This step uses the WORLD vocoder for speech synthesis:
source=Platinum(x,f0,spec)
y=SynthesisByWORLD(source,spec')
where y is the synthesized speech; Platinum and SynthesisByWORLD are existing components of the WORLD vocoder and are not repeated in the present invention.
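A minimal synthesis sketch continuing the pyworld example above. It assumes pyworld's D4C routine stands in for the Platinum excitation analysis named in the patent, that spec_mapped is the converted envelope expanded back to CheapTrick resolution, and that the output file name is hypothetical:

ap = pw.d4c(x, f0, t, fs)                    # aperiodicity of the source signal
y = pw.synthesize(f0, spec_mapped, ap, fs)   # waveform from the mapped envelope
sf.write("converted_lombard.wav", y, fs)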
In this embodiment the WORLD vocoder is preferred for analyzing and synthesizing speech; alternatively, a STRAIGHT vocoder or the like can be used for analysis, and neural network models such as WaveNet and WaveGAN can be used for synthesis.
Example two
The second embodiment of the invention fully utilizes the auditory characteristic of people in a noise environment, provides an anti-noise perception sensitivity curve establishing method, and can provide key guidance for anti-noise voice conversion in practical application. In specific implementation, the steps 1 to 4 in the first embodiment are implemented.
In a specific implementation, a person skilled in the art can use computer software to run the method of this technical solution as an automated process, carrying out operations such as generating the anti-noise perception sensitivity curve and synthesizing speech. System devices that run the method, such as a computer-readable storage medium storing the corresponding computer program of this technical solution and computer equipment containing that program, also fall within the scope of protection of the present invention.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for establishing an anti-noise perception sensitivity curve, characterized by comprising the following steps:
step 1, dividing the noise by band-pass filtering according to the critical bands perceived by the human ear, to obtain a plurality of critical-band noises;
step 2, for each critical-band noise from step 1, recording corresponding anti-noise speech sequences at different noise decibel levels;
step 3, determining a perception threshold based on an objective test index, and performing a noise-decibel-level perception test on each critical band to obtain updated critical decibels;
step 4, generating the anti-noise perception sensitivity curve from the updated critical decibels obtained in step 3.
2. The anti-noise perceptual sensitivity curve creation method of claim 1, wherein: in step 1, the noise is white noise.
3. The anti-noise perceptual sensitivity curve creation method of claim 1, wherein: in step 1, Bark band or Mel band is used as the critical band of human ear perception.
4. The anti-noise perceptual sensitivity curve creation method of claim 1, wherein: step 2 is implemented by first, for each critical-band noise obtained in step 1, acquiring data through an artificial head, adjusting each critical-band noise to the preset signal-to-noise ratios, and calibrating the decibel level; and then recording speech sequences at the different decibel levels for each critical-band noise.
5. The anti-noise perceptual sensitivity curve creation method of claim 4, wherein: given a preset lower limit MIN, upper limit MAX, and step size d of the signal-to-noise-ratio range, recordings are made at signal-to-noise ratios of MIN, MIN + d, MIN + 2d, …, MAX, respectively, to obtain the corresponding speech sequences.
6. The anti-noise perceptual sensitivity curve creation method of claim 1, wherein: in step 3, a perception threshold value is determined based on the SII objective test index, and a noise decibel level perception test is carried out on each critical frequency band by adopting the MUSHRA standard.
7. A speech synthesis method based on establishing an anti-noise perception sensitivity curve, characterized by comprising the following steps:
step 1, dividing the noise by band-pass filtering according to the critical bands perceived by the human ear, to obtain a plurality of critical-band noises;
step 2, for each critical-band noise from step 1, recording corresponding anti-noise speech sequences at different noise decibel levels;
step 3, determining a perception threshold based on an objective test index, and performing a noise-decibel-level perception test on each critical band to obtain updated critical decibels;
step 4, generating the anti-noise perception sensitivity curve from the updated critical decibels obtained in step 3;
step 5, obtaining critical decibel values from the anti-noise perception sensitivity curve of step 4, selecting anti-noise speech at the different critical decibel values, training an anti-noise speech feature mapping model, and performing speech synthesis with the mapped anti-noise speech features.
8. The method of speech synthesis based on antinoise perceptual sensitivity curve creation as defined in claim 7, wherein: in step 5, a WORLD vocoder is used to extract acoustic features, including fundamental frequency and spectral envelope.
9. A speech synthesis method based on an anti-noise perceptual sensitivity curve according to claim 8, characterized in that: in step 5, the anti-noise speech feature mapping model is obtained by training a Gaussian mixture model on the spectral envelope using the EM (expectation-maximization) method.
10. A speech synthesis method based on an anti-noise perceptual sensitivity curve according to claim 9, characterized in that: speech synthesis is performed by combining the fundamental frequency features with the spectral envelope conversion result obtained from the anti-noise speech feature mapping model.
CN202010686375.3A 2020-07-16 2020-07-16 Anti-noise perception sensitivity curve establishment and voice synthesis method Active CN112037759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010686375.3A CN112037759B (en) 2020-07-16 2020-07-16 Anti-noise perception sensitivity curve establishment and voice synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010686375.3A CN112037759B (en) 2020-07-16 2020-07-16 Anti-noise perception sensitivity curve establishment and voice synthesis method

Publications (2)

Publication Number Publication Date
CN112037759A true CN112037759A (en) 2020-12-04
CN112037759B CN112037759B (en) 2022-08-30

Family

ID=73579514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010686375.3A Active CN112037759B (en) 2020-07-16 2020-07-16 Anti-noise perception sensitivity curve establishment and voice synthesis method

Country Status (1)

Country Link
CN (1) CN112037759B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450780A (en) * 2021-06-16 2021-09-28 武汉大学 Lombard effect classification method for auditory perception loudness space

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1460992A (en) * 2003-07-01 2003-12-10 北京阜国数字技术有限公司 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding
US20040024591A1 (en) * 2001-10-22 2004-02-05 Boillot Marc A. Method and apparatus for enhancing loudness of an audio signal
US20110178799A1 (en) * 2008-07-25 2011-07-21 The Board Of Trustees Of The University Of Illinois Methods and systems for identifying speech sounds using multi-dimensional analysis
CN103165136A (en) * 2011-12-15 2013-06-19 杜比实验室特许公司 Audio processing method and audio processing device
CN103390408A (en) * 2012-05-09 2013-11-13 奥迪康有限公司 Method and apparatus for processing audio signal
CN105869652A (en) * 2015-01-21 2016-08-17 北京大学深圳研究院 Psychological acoustic model calculation method and device
US20190156855A1 (en) * 2016-05-11 2019-05-23 Nuance Communications, Inc. Enhanced De-Esser For In-Car Communication Systems
CN110085245A (en) * 2019-04-09 2019-08-02 武汉大学 A kind of speech intelligibility Enhancement Method based on acoustic feature conversion

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040024591A1 (en) * 2001-10-22 2004-02-05 Boillot Marc A. Method and apparatus for enhancing loudness of an audio signal
CN1460992A (en) * 2003-07-01 2003-12-10 北京阜国数字技术有限公司 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding
US20110178799A1 (en) * 2008-07-25 2011-07-21 The Board Of Trustees Of The University Of Illinois Methods and systems for identifying speech sounds using multi-dimensional analysis
CN103165136A (en) * 2011-12-15 2013-06-19 杜比实验室特许公司 Audio processing method and audio processing device
CN103390408A (en) * 2012-05-09 2013-11-13 奥迪康有限公司 Method and apparatus for processing audio signal
CN105869652A (en) * 2015-01-21 2016-08-17 北京大学深圳研究院 Psychological acoustic model calculation method and device
US20190156855A1 (en) * 2016-05-11 2019-05-23 Nuance Communications, Inc. Enhanced De-Esser For In-Car Communication Systems
CN110085245A (en) * 2019-04-09 2019-08-02 武汉大学 A kind of speech intelligibility Enhancement Method based on acoustic feature conversion

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
G. LI et al.: "Normal-To-Lombard Speech Conversion by LSTM Network and BGMM for Intelligibility Enhancement of Telephone Speech", 2020 IEEE International Conference on Multimedia and Expo (ICME) *
S. SESHADRI et al.: "Vocal Effort Based Speaking Style Conversion Using Vocoder Features and Parallel Learning", IEEE Access *
TIAN Bin et al.: "A compensation method for noisy Lombard and Loud speech for speech recognition in strong-noise environments", Acta Acustica (Chinese edition) *
TIAN Bin et al.: "A compensation method for noisy Lombard and Loud speech for speech recognition in strong-noise environments", Acta Acustica *
CHEN Sheng et al.: "Research on a subspace speech enhancement algorithm based on the perceptual masking effect of the human ear", Electronic Quality *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450780A (en) * 2021-06-16 2021-09-28 武汉大学 Lombard effect classification method for auditory perception loudness space
CN113450780B (en) * 2021-06-16 2023-02-24 武汉大学 Lombard effect classification method for auditory perception loudness space

Also Published As

Publication number Publication date
CN112037759B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
US5737719A (en) Method and apparatus for enhancement of telephonic speech signals
US9943253B2 (en) System and method for improved audio perception
Humes et al. Application of the Articulation Index and the Speech Transmission Index to the recognition of speech by normal-hearing and hearing-impaired listeners
US8369549B2 (en) Hearing aid system adapted to selectively amplify audio signals
US8867764B1 (en) Calibrated hearing aid tuning appliance
CN109246515B (en) A kind of intelligent earphone and method promoting personalized sound quality function
CN107293286B (en) Voice sample collection method based on network dubbing game
US20140309549A1 (en) Methods for testing hearing
Boothroyd et al. The hearing aid input: A phonemic approach to assessing the spectral distribution of speech
Marzinzik Noise reduction schemes for digital hearing aids and their use for the hearing impaired
US6956955B1 (en) Speech-based auditory distance display
Kates et al. The hearing-aid audio quality index (HAAQI)
Monson et al. The maximum audible low-pass cutoff frequency for speech
CN112037759B (en) Anti-noise perception sensitivity curve establishment and voice synthesis method
WO2022240346A1 (en) Voice optimization in noisy environments
KR100888049B1 (en) A method for reinforcing speech using partial masking effect
DK2584795T3 (en) Method for determining a compression characteristic
Herzke et al. Effects of instantaneous multiband dynamic compression on speech intelligibility
CN113450780B (en) Lombard effect classification method for auditory perception loudness space
Salehi et al. Electroacoustic assessment of wireless remote microphone systems
CN114205724B (en) Hearing aid earphone debugging method, device and equipment
Bouserhal et al. On the potential for artificial bandwidth extension of bone and tissue conducted speech: A mutual information study
Patel et al. Frequency-based multi-band adaptive compression for hearing aid application
JP7404664B2 (en) Audio processing device and audio processing method
RU2589298C1 (en) Method of increasing legible and informative audio signals in the noise situation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant