WO2021114545A1 - Sound enhancement method and sound enhancement system - Google Patents

Sound enhancement method and sound enhancement system

Info

Publication number
WO2021114545A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
signal
eigenmode
sound enhancement
functions
Prior art date
Application number
PCT/CN2020/086485
Other languages
English (en)
French (fr)
Inventor
黄锷
叶家荣
Original Assignee
江苏爱谛科技研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏爱谛科技研究院有限公司
Priority to US16/764,057 (US11570553B2)
Publication of WO2021114545A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/35 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using translation techniques
    • H04R25/356 Amplitude, e.g. amplitude shift or compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/35 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using translation techniques
    • H04R25/353 Frequency, e.g. frequency shift or compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/50 Customised settings for obtaining desired overall acoustical characteristics
    • H04R25/505 Customised settings for obtaining desired overall acoustical characteristics using digital signal processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/40 Applications of speech amplifiers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42391 Systems providing special services or facilities to subscribers where the subscribers are hearing-impaired persons, e.g. telephone devices for the deaf
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00 Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43 Signal processing in hearing aids to enhance the speech intelligibility

Definitions

  • The invention relates to the field of sound enhancement, and in particular to a sound enhancement method and a sound enhancement system.
  • When the pressure wave associated with an acoustic signal propagates through the external auditory canal and strikes the tympanic membrane, the acoustic signal is perceived as sound.
  • The resulting vibration is amplified about 22 times by the ossicles (the malleus, incus, and stapes) and reaches the oval window at the base of the cochlea.
  • The vibration of the oval window membrane generates a pressure wave in the vestibule, which makes the soft basilar membrane vibrate and deform together with the organ of Corti and its hair cells, so that the hair cells touch the tectorial membrane and bend. More importantly, this wave in the basilar membrane reaches its maximum amplitude at the position corresponding to the characteristic frequency of the vibration.
  • The hair cells that bend at the wave crests trigger neurons to emit electrical impulses, which pass through the thalamocortical system to the primary auditory cortex (PAC), where they are processed to produce the sound that is heard.
  • These electrical signals reflect the frequency of the sound signal and can be measured non-invasively, as the auditory brainstem response (ABR), using functional magnetic resonance imaging (fMRI) and electroencephalogram (EEG) techniques.
  • A fault in any part of the hearing mechanism described above may cause hearing loss. If the external auditory canal is blocked, sound is prevented from reaching the cochlea and conductive hearing loss occurs. If there is a functional disorder in the inner ear, such as hair cell degeneration, nerve impulses cannot be generated and transmitted to the primary auditory cortex, and sensorineural hearing loss occurs. Hearing loss may also result from a combination of these causes.
  • The causes of hearing impairment include aging (presbycusis), over-exposure to noise (noise-induced hearing loss, NIHL), heredity (congenital hearing loss), ototoxic drugs, and so on. Whatever the cause, hearing aids are usually helpful, except in cases of central deafness.
  • The most common form of hearing loss is a loss of hearing sensitivity in certain frequency bands, which can be detected from the audiogram obtained in a hearing test.
  • Hearing sensitivity in the high-frequency band is easily lost.
  • In noise-induced hearing loss, sensitivity is lost in certain notched frequency bands.
  • The main symptom of the resulting hearing loss is difficulty in understanding speech, especially the voices of women and children, which have high fundamental frequencies, and difficulty in understanding speech against background noise. The latter is also called the cocktail party problem.
  • Even with hearing aids, the problem remains that the sound becomes very loud while its content still cannot be heard clearly. It follows from the above that a feasible way to remedy hearing damage is to amplify the sound in the specific weakened frequency bands.
  • Most hearing loss stems from a loss of sensitivity to the high-frequency sounds produced by consonants, which determine intelligibility and carry the meaning of speech.
  • Consonants are transient and cannot be captured by Fourier methods, so they are usually neglected in Fourier analysis.
  • Amplifying the higher frequencies in a Fourier analysis is equivalent to amplifying harmonics, which yields a louder fundamental, as in the well-known missing-fundamental phenomenon.
  • The consonants are thereby ignored.
  • Because the temporal fine structure (TFS) associated with consonants is insufficiently represented, clarity is reduced.
  • The Fourier method rests on assumptions of linearity and stationarity that do not match the characteristics of speech. We do not perceive sound on the basis of Fourier analysis, as the following technical reasons also show:
  • The cochlea is driven by fluid dynamics, so it cannot produce peaks at all the harmonic positions that would be needed to account for the perceived sound quality.
  • Harmonics arise artificially when nonlinear signals are analyzed by linear methods. They are therefore mathematical artifacts rather than physical phenomena.
  • The technical problem to be solved by the present invention is to provide a sound enhancement method and a sound enhancement system with selective amplification.
  • The sound can be selectively amplified: only the higher-frequency consonants are amplified, and the vowels are not, which effectively improves the clarity of the amplified sound.
  • HHT Hilbert-Huang Transform
  • EMD Empirical Mode Decomposition
  • IMF Intrinsic Mode Function
  • If $h_1$ satisfies the conditions of an IMF, then $h_1$ is the first IMF component obtained; otherwise, $h_1$ is taken as the original signal and steps (1)-(2) are repeated until the difference after the k-th iteration, $h_{1,k}(t)$, becomes an IMF, denoted $c_1(t) = h_{1,k}(t)$.
  • The termination criterion for the k-step iteration above is that the stopping measure lies within a set interval;
  • The decomposition as a whole terminates when the N-th order residual signal $r_N(t)$ is a monotonic function or is otherwise too simple for any further IMF to be extracted from it.
  • An IMF component must satisfy two conditions: (1) over the entire time range of the function, the number of local extrema and the number of zero crossings are equal or differ by at most one; (2) at any time, the mean of the envelope of the local maxima (upper envelope) and the envelope of the local minima (lower envelope) approaches zero.
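  • The first IMF condition can be checked numerically. The following is a minimal sketch in plain Python, not the procedure of the filing: the function names `count_extrema`, `count_zero_crossings`, and `satisfies_first_imf_condition` are hypothetical, the 1000 Hz demo sampling rate is an assumption, and condition (2) is not checked because it requires fitting the upper and lower envelopes.

```python
import math

def count_extrema(x):
    """Count strict interior local maxima and minima."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count

def count_zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def satisfies_first_imf_condition(x):
    """Condition (1): extrema and zero crossings differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

fs = 1000  # assumed sampling rate for the demo, in Hz
t = [i / fs for i in range(fs)]
tone = [math.sin(2 * math.pi * 5 * ti + 0.3) for ti in t]
# A fast ripple riding on the slow tone adds many extrema that are
# not matched by zero crossings, so the sum is not an IMF.
riding = [v + 0.5 * math.sin(2 * math.pi * 50 * ti) for v, ti in zip(tone, t)]

print(satisfies_first_imf_condition(tone))
print(satisfies_first_imf_condition(riding))
```

A single tone passes the check, while the tone with a riding ripple fails it; sifting exists to separate such riding waves into their own components.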
  • The signal is then subjected to the Hilbert transform: for a given signal $x(t)$, its Hilbert transform is denoted $H[x(t)]$.
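  • For reference, the textbook form of the Hilbert transform and of the instantaneous frequency derived from it is sketched below. These are the standard definitions; it is an assumption that the filing uses the same form. Here P.V. denotes the Cauchy principal value, $a(t)$ the instantaneous amplitude, and $\theta(t)$ the instantaneous phase.

```latex
H[x(t)] = \frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty} \frac{x(\tau)}{t-\tau}\,d\tau,
\qquad
z(t) = x(t) + i\,H[x(t)] = a(t)\,e^{i\theta(t)},
\qquad
\omega(t) = \frac{d\theta(t)}{dt}
```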
  • The HHT is designed to analyze nonlinear and non-stationary data. Speech signals have exactly these characteristics, so the HHT is well suited to processing them. The instantaneous frequency, however, cannot represent the "period" produced by modulation. The HHT has therefore been extended to the higher-dimensional holographic Hilbert spectrum, which also covers the modulation (or envelope) frequencies. In this new method, the frequencies of both the carrier and the envelope (the latter also known as pitch) are rigorously defined. Because the holographic spectrum representation is designed for nonlinear data, it is not affected by the mathematical artifacts of harmonics. Because it is also designed for non-stationary data, it can represent transient consonants with high fidelity. More importantly, it can reveal modulation or periodic patterns.
  • The present invention can save signal processing time and improve sound clarity.
  • One aspect of the present invention provides a sound enhancement method, which includes the following steps:
  • Step (2): the digital signal from step (1) is decomposed by a modal decomposition method to obtain multiple eigenmode functions (IMFs).
  • The eigenmode functions represent how the amplitude of the digital signal converted from the sound signal varies with time at different frequencies;
  • Step (3): the amplitudes of the eigenmode functions obtained in step (2) are selectively amplified;
  • The modal decomposition method is an empirical mode decomposition method, an ensemble empirical mode decomposition method, or an adaptive dyadic masking empirical mode decomposition method.
  • The frequency range to be amplified and the amplification factor are determined from the audiogram of the hearing-impaired patient.
  • The eigenmode functions in the consonant frequency range are selected for amplification.
  • The present invention also provides another sound enhancement method, which includes the following steps:
  • Step (2): the digital signal from step (1) is decomposed by an adaptive filter to obtain multiple quasi-eigenmode (IMF-like) functions.
  • The quasi-eigenmode functions represent how the amplitude of the digital signal converted from the sound signal varies with time at different frequencies.
  • Step (3): the amplitudes of the quasi-eigenmode functions obtained in step (2) are selectively amplified;
  • The adaptive filter is a mean (averaging) filter.
  • The frequency range to be amplified and the amplification factor are determined from the audiogram of the hearing-impaired patient.
  • The quasi-eigenmode functions in the consonant frequency range are selected for amplification.
  • The two sound enhancement methods provided by the present invention can be applied to hearing aids, telephones, and broadcast equipment for teleconferences.
  • The invention further provides a sound enhancement system, which includes a sound receiving module, a sound enhancement module, and a sound playback module, wherein:
  • The sound receiving module receives sound signals and converts them into digital signals;
  • The sound enhancement module processes the digital signal to obtain multiple eigenmode functions or quasi-eigenmode functions, selectively amplifies their amplitudes, integrates the selectively amplified functions into a reconstructed signal, and converts the reconstructed signal into an analog signal to obtain the enhanced sound signal;
  • The sound playback module plays the enhanced sound signal.
  • The sound enhancement module includes an adaptive filter bank, an amplification unit, and an integration unit, wherein:
  • The adaptive filter bank decomposes the digital signal into multiple eigenmode functions or quasi-eigenmode functions;
  • The amplification unit selectively amplifies the amplitudes of the eigenmode functions or quasi-eigenmode functions;
  • The integration unit integrates the amplified eigenmode functions or quasi-eigenmode functions to obtain the enhanced sound signal.
  • The sound enhancement module further includes a gain value adjustment unit, which obtains from the audiogram of the hearing-impaired patient the factor by which the sound signal amplitude must be amplified in each frequency range, or determines the amplification factor according to the frequency range in which the consonants lie; the amplification unit amplifies the amplitudes of the eigenmode functions or quasi-eigenmode functions according to the gain value adjustment unit.
  • The adaptive filter bank is either a modal decomposition filter bank or a mean filter bank.
  • The sound enhancement system is applied to hearing aids, telephones, and broadcast equipment for conference calls.
  • The invention overcomes a misconception in sound analysis by analyzing the sound signal in the time domain, based on the Hilbert-Huang transform.
  • With the sound enhancement method and sound enhancement system of the present invention, the sound can be selectively amplified: only the higher-frequency consonants are amplified, and the vowels are not, which effectively improves the clarity of the amplified sound.
  • Figure 1 is a flow chart of the path of a sound from generation, through enhancement, to playback in the present invention.
  • Figure 2 shows the waveforms and Fourier spectra of low A, middle A, and high A played on a piano.
  • Figure 3 is the Fourier spectrum of the low A played on a piano, where Figure 3a is the spectrum containing the fundamental wave (220 Hz) and Figure 3b is the spectrum without the fundamental wave.
  • Figure 4 is the wavelet spectrogram of the low A played on a piano, where Figure 4a is the spectrum containing the fundamental wave (220 Hz) and Figure 4b is the spectrum without the fundamental wave.
  • Figure 5 is the Hilbert time-frequency spectrum of the low A played on a piano, where Figure 5a is the spectrum containing the fundamental wave (220 Hz) and Figure 5b is the spectrum without the fundamental wave.
  • Figure 6 shows the holographic Hilbert spectrum of the low A played on a piano, with the fundamental wave (220 Hz).
  • Figure 7 shows the holographic Hilbert spectrum of the low A played on a piano, without the fundamental wave (220 Hz).
  • Figure 8 shows the marginal spectra of Figure 6 and Figure 7.
  • Figure 9 shows the sound data for the syllable "zi".
  • "z" is a consonant, followed by the vowel "i".
  • Figure 10 shows the IMF components of the sound data shown in Figure 9.
  • Figure 11 is the Fourier spectrogram of the speech "zi", with the sound signal superimposed.
  • Figure 12 is the Hilbert spectrum of the speech "zi", with the sound signal superimposed.
  • Figure 13 compares the reconstructed signals after the high-frequency part of the speech "zi" is amplified or attenuated.
  • Figure 14 shows the sound data for "hello", where "h" and "lo" are audible sounds.
  • Figure 15 shows the IMF components of the sound data given in Figure 14.
  • Figure 16 is the Hilbert spectrum of the speech "hello".
  • Figure 17 is the Fourier spectrum of the speech "hello".
  • Figure 18a compares the first IMF with the corresponding components produced by different filters.
  • Figure 18b is a detailed comparison of the differences in the main part of the signal.
  • Figure 19 is a block diagram of an application scenario for the adaptive speech enhancement algorithm, based on signal decomposition and selective amplification, in communication devices (such as telephones and conference-call systems).
  • In step 100, a sound signal emitted by a sound source is received. In step 110, the sound signal is digitized.
  • The sampling frequency can be chosen as needed. To reduce cost, it can be lowered to 6000 Hz to 10000 Hz. For accuracy, a higher sampling frequency of 22 kHz or 44 kHz can be used (22 kHz and 44 kHz are the sampling frequencies used by current mainstream capture cards). Because noise may be present in the sound, it needs to be removed; in step 120, it can be removed by EMD or a median filter.
  • The denoised signal is then processed by a modal decomposition method (step 130) or by a mean filter (step 140) to obtain the eigenmode function components or quasi-eigenmode function components of the sound signal.
  • The modal decomposition method refers to any modal decomposition method that can obtain the eigenmode function components in the present invention, such as Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), or Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD).
  • The eigenmode function components or quasi-eigenmode function components represent how the amplitude of the sound data varies over time at different frequency scales.
  • In step 150, the eigenmode function components or quasi-eigenmode function components are selectively amplified according to the hearing test results of the hearing-impaired patient.
  • In step 160, the selectively amplified eigenmode function components or quasi-eigenmode function components are integrated to obtain a reconstructed signal.
  • Optionally, a limiter processes the integrated reconstructed signal: if the amplification is too large, the signal may be clipped, making the reconstructed sound harsh, and adding a limiter improves the reproduction of the sound.
  • In step 170, the digital signal is converted into an analog signal (i.e., a sound signal), which is played back to the hearing-impaired patient through a speaker (step 180).
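  • The chain of steps 130 to 160 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the filing's implementation: `enhance` and `toy_decompose` are hypothetical names, and the toy decomposition (fluctuation about the mean plus the mean) merely stands in for EMD or a mean filter bank so that the example stays self-contained.

```python
def enhance(samples, decompose, gains, limiter=None):
    """Decompose, selectively amplify, integrate, and optionally limit."""
    components = decompose(samples)                      # step 130 or 140
    if len(components) != len(gains):
        raise ValueError("one gain per component is required")
    amplified = [[g * v for v in c]                      # step 150
                 for g, c in zip(gains, components)]
    y = [sum(column) for column in zip(*amplified)]      # step 160
    return limiter(y) if limiter is not None else y      # optional limiter

def toy_decompose(x):
    """Stand-in decomposition: fluctuation about the mean, plus the mean."""
    m = sum(x) / len(x)
    return [[v - m for v in x], [m] * len(x)]

samples = [1.0, 2.0, 3.0, 6.0]
unchanged = enhance(samples, toy_decompose, gains=[1.0, 1.0])
boosted = enhance(samples, toy_decompose, gains=[2.0, 1.0])
print(unchanged, boosted)
```

With unit gains the pipeline returns the input unchanged, which is the completeness property any decomposition used here should have; raising only the first gain boosts only the fluctuating part.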
  • The sound signal from the sound source is received (step 100) and digitized (step 110).
  • The incoming sound is digitized at a sampling frequency of 22 kHz.
  • The sampling frequency is chosen on the following grounds. In speech, vowels and voiced consonants are dominated by the vibration frequency of the vocal cords, which forms the fundamental wave, denoted $F_0$.
  • $F_0$ ranges from 80 Hz to 400 Hz, spanning a deep male voice to a child's voice.
  • Although speech can contain spectral information up to 10 kHz, the Fourier spectral information needed to distinguish different consonants and vowels lies largely below 3000 Hz to 5000 Hz. Because much of the Fourier spectrum is composed of harmonics, it may extend well above the frequencies of the actual sound signal.
  • A sampling frequency of 22 kHz is therefore sufficient.
  • To reduce cost, the sampling frequency can be lowered to 10000 Hz or even 6000 Hz.
  • A full 44 kHz sampling frequency can also be used.
  • Spike noise in the sound signal can be removed by EMD or a median filter (step 120).
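  • As an illustration of spike removal with a median filter (step 120), the sketch below applies a running median in plain Python. The function name `median_filter`, the window of 3 samples, and the shrinking window at the edges are assumptions made for the example; the filing does not specify these details.

```python
from statistics import median

def median_filter(x, window=3):
    """Running median with an odd window; edges use the samples available."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

signal = [0.0, 0.1, 0.2, 9.0, 0.4, 0.5, 0.6]  # 9.0 is an isolated spike
cleaned = median_filter(signal, window=3)
print(cleaned)
```

An isolated spike narrower than half the window is replaced by a neighbouring value, while slowly varying samples pass through almost unchanged.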
  • The signal is then decomposed into its IMF components, $x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t)$, where:
  • $x(t)$ is the initial sound signal,
  • $c_j(t)$ is the j-th Intrinsic Mode Function (IMF) component, and
  • $r_N(t)$ is the residual signal.
  • The first IMF component usually consists of three-point oscillations. Since EMD acts almost as a dyadic filter bank, in which the mean frequency halves from one component to the next, by the fifth IMF component the oscillations have an average wavelength of 48 points. At a sampling frequency of 22 kHz, the fifth IMF component therefore already corresponds to a frequency of about 450 Hz. Depending on the patient's condition, the decomposition should stop before this point. For example, for a signal digitized at 22 kHz, the average frequency of the first 5 components is:
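  • As a rough check of the dyadic reasoning above, the following sketch estimates the mean frequency of each component, assuming a 3-point first oscillation and exact halving of the frequency from one component to the next. The function name `imf_mean_frequency` is hypothetical, and the numbers are estimates under these assumptions, not values taken from the filing.

```python
def imf_mean_frequency(fs, j, points_first=3):
    """Approximate mean frequency (Hz) of the j-th IMF, with j starting at 1."""
    wavelength_in_samples = points_first * 2 ** (j - 1)
    return fs / wavelength_in_samples

fs = 22000  # sampling frequency quoted in the text, in Hz
table = [(j, round(imf_mean_frequency(fs, j))) for j in range(1, 6)]
for j, freq in table:
    print(f"IMF{j}: about {freq} Hz")
```

Under these assumptions the fifth component has a 48-sample wavelength and a mean frequency of about 458 Hz, consistent with the figure of roughly 450 Hz quoted above.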
  • In step 150, the high-frequency components are selectively amplified according to the patient's condition, and the signal is reconstructed as $y(t)$ (step 160): $y(t) = \sum_{j=1}^{N} a_j c_j(t) + r_N(t)$.
  • Here $r_N(t)$ represents the trend of the sound.
  • Its frequency is generally very low, cannot be perceived by the human ear, and can be ignored. The reconstructed signal $y(t)$ can therefore be expressed as $y(t) = \sum_{j=1}^{N} a_j c_j(t)$,
  • where $a_j$ is the amplification factor, which is determined from the patient's audiogram test data so as to suit different patients; its value can also be preset according to the frequency band of the consonants. Most of the amplification should be placed selectively on the high-frequency components, because those components are the ones that represent consonants and so increase the clarity of the sound. Since most hearing-impaired patients can still hear sounds up to 500 Hz, amplifying the first 4 components should be sufficient in practice.
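  • The selective reconstruction $y(t) = \sum_j a_j c_j(t)$ can be sketched as follows. The names `reconstruct` and `db_to_factor` are hypothetical, the toy components are made up for the example, and converting a gain in decibels to a linear factor is only one possible way to derive $a_j$; the filing does not state how audiogram values map to factors.

```python
def db_to_factor(gain_db):
    """Convert an amplitude gain in decibels to a linear factor."""
    return 10 ** (gain_db / 20)

def reconstruct(components, factors):
    """y[k] = sum_j a_j * c_j[k]; the residual trend is deliberately dropped."""
    if len(components) != len(factors):
        raise ValueError("one amplification factor per component is required")
    n = len(components[0])
    return [sum(a * c[k] for a, c in zip(factors, components)) for k in range(n)]

# Toy components, ordered from high frequency to low frequency.
components = [
    [1.0, -1.0, 1.0, -1.0],   # stands in for a consonant-band IMF
    [0.5, 0.5, -0.5, -0.5],
    [0.2, 0.2, 0.2, 0.2],     # stands in for a vowel-band IMF
]
factors = [4.0, 2.0, 1.0]      # boost only the high-frequency components
y = reconstruct(components, factors)
print(y)
```

Leaving the factor of the low-frequency component at 1 keeps the vowel part unchanged while the high-frequency part is boosted.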
  • The reconstructed signal $y(t)$ can be converted back into an analog signal, that is, a sound signal (step 170), and played back to the listener. A limiter may be needed here (step 161), because too large an amplification may cause signal clipping and make the reconstructed sound harsh. Adding a limiter improves the reproduction of the sound.
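  • One possible limiter for step 161 is sketched below. The filing does not say which kind of limiter is used; the tanh-based soft limiter, the name `soft_limit`, and the ceiling of 1.0 (full scale) are assumptions chosen for illustration.

```python
import math

def soft_limit(y, ceiling=1.0):
    """Smoothly compress samples so that |output| never exceeds the ceiling."""
    return [ceiling * math.tanh(v / ceiling) for v in y]

loud = [0.01, 0.5, 5.2, -4.8]   # the last two samples would clip at full scale
limited = soft_limit(loud, ceiling=1.0)
print(limited)
```

Small samples pass through almost unchanged, while large ones are bent smoothly toward the ceiling instead of being cut off abruptly, which avoids the harsh sound of hard clipping.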
  • Alternatively, the sampling frequency can be set to 44 kHz.
  • In that case the first IMF lies at about 15 kHz and may be ignored in order to suppress environmental noise. Regardless of the sampling rate, only 5 IMF components need to be amplified to reach 450 Hz.
  • Figure 2 shows the waveform data for low A, middle A, and high A from the piano, with the corresponding Fourier spectra. The waveforms on the left are visibly distorted sinusoids; this distortion produces the harmonics seen in the Fourier spectra on the right.
  • Figure 3a and Figure 4a show the Fourier spectrogram and the wavelet spectrogram including the fundamental wave. The fundamental wave can be removed by a notch filter, yet after its removal the filtered signal is still perceived as having the same fundamental tone.
  • Figure 3b and Figure 4b show the corresponding spectra lacking the fundamental wave. Compared with Figure 3a and Figure 4a, the fundamental wave is missing, yet when the two signals are converted back into sound they sound the same. These spectra thus exhibit the puzzling missing-fundamental phenomenon.
  • Figure 5a and Figure 5b show the Hilbert spectrum with and without the fundamental wave, respectively. After the fundamental wave is removed, as shown in Figure 5b, a weak fundamental wave remains in the Hilbert spectrum, but its weak energy density cannot explain why the listener still hears the tone. It has long been recognized that the perceived sound actually comes from the periodicity of the envelope. However, no tools have been available to determine the frequency content of the envelope rigorously and objectively, so the sound we perceive has so far been defined only by the subjective notion of "pitch".
  • Figure 6 and Figure 7 show the holographic spectra of the sound with and without the fundamental wave, respectively.
  • Figure 6 shows the holographic spectrum of the low-A sound with the fundamental wave. Across almost the entire FM frequency range there is enhanced AM energy density at about 220 Hz, and there is also strong FM energy density near 220 Hz.
  • Figure 7 shows the holographic spectrum of the low-A sound without the fundamental wave.
  • The complete four-dimensional, time-dependent holographic Hilbert spectrum is, however, too complicated and cumbersome.
  • A simplified, time-based Hilbert spectral analysis of the instantaneous frequency is sufficient.
  • The present invention therefore operates purely in the time domain.
  • Figure 9 shows the data for the sound "zi".
  • "z" is a consonant, followed by the vowel "i".
  • Chinese contains some of the highest-frequency unvoiced sounds (such as z, c, s and j, q, x), which pose a great challenge to the design of hearing aids.
  • "zi" is one example.
  • The data shown in Figure 9 is decomposed.
  • The result is shown in Figure 10, which gives the IMF components of the data in Figure 9.
  • The high-frequency components in the first four IMFs, especially IMF1 and IMF2, mainly represent the sound of "z".
  • The boxed area marks the time period covered by the data given in Figure 9.
  • Figure 11 is the Fourier spectrogram of the speech "zi" with the signal superimposed. During the first 0.15 s the sound is "z", whose frequency is very high, starting from around 8000 Hz and reaching almost 20000 Hz. The vowel part starts later and is full of harmonics: there are dense harmonics in the 2000 Hz range, and further regions of high energy density at 4000 Hz to 5000 Hz and at 8000 Hz to 10000 Hz. Given the shortcomings of Fourier analysis when applied to nonlinear and non-stationary data, these results are compared with the HHT-based Hilbert spectrum in Figure 12.
  • Figure 13 compares the reconstructed signals (step 160) after amplification or attenuation.
  • The amplified signals (H1z and H2z) correspond to different amplification factors applied to the high-frequency IMFs.
  • The different amplification factors show that the sound enhancement method of the present invention can be tailored individually and selectively to different patients. Compared with the original signal, the amplification selectively boosts only the consonant part and leaves the vowel part unchanged.
  • The attenuated signals (L1z and L2z) simulate hearing loss of varying degrees.
  • What is impaired is the consonant part rather than the vowel part.
  • Hearing aids with compensation mechanisms currently on the market can make the sound louder, but the sound lacks clarity. Importantly, selectively amplifying the harmonics in the range of 1000 Hz to 4000 Hz in effect amplifies the fundamental wave of the vowels without touching the consonant part. The final effect is equivalent to amplifying L1z or L2z: the sound becomes louder, but its clarity does not improve.
  • The principle of hearing aid design is "selective amplification" of sound.
  • Amplifying the Fourier components in the range of about 2000 Hz to 4000 Hz effectively amplifies harmonics, which, by the missing-fundamental phenomenon, is equivalent to amplifying the fundamental wave even where the fundamental wave itself is absent. These fundamental waves do not need to be amplified at all. Some consonants, moreover, have no harmonics and no tangible signal in the range of 2000 Hz to 4000 Hz.
  • The combined effect of the Fourier method is to amplify the already audible vowels, which is equivalent to amplifying the signal L1z or L2z in Figure 13. The patient gains no clarity, only loudness, which is the complaint of users of current hearing aids based on the Fourier principle.
  • The EMD method can be replaced by other, equivalent methods.
  • These equivalent methods include repeated application of a running mean or running median, a single bank of band-pass filters, any filter that can split the signal into high-frequency and low-frequency parts, high-pass filters with the various window sizes required by the input signal, or other time-domain filtering.
  • The steps are as follows. First, the data is decomposed by successive running means,
  • where $\langle x(t) \rangle_{n_j}$ denotes the mean filter with window size $n_j$ applied to $x(t)$, and $n_j$ must be an odd number;
  • $h_j(t)$ is the IMF-like component generated by the filter.
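  • A running-mean decomposition of this kind can be sketched as follows. The scheme shown, in which each IMF-like component is the difference between the current signal and its running mean and the smoothed signal is passed on to the next stage, is an assumption consistent with the description, not a formula quoted from the filing. The names `running_mean` and `mean_filter_bank` and the window sizes 3, 7, and 15 are hypothetical.

```python
import math

def running_mean(x, n):
    """Centered running mean with odd window n; edges use the samples available."""
    if n % 2 == 0:
        raise ValueError("window size must be odd")
    half = n // 2
    out = []
    for i in range(len(x)):
        segment = x[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out

def mean_filter_bank(x, windows):
    """Split x into IMF-like components h_j plus a smooth residual.

    Assumed scheme: h_j = s_(j-1) - mean_nj(s_(j-1)), with s_0 = x and
    s_j = mean_nj(s_(j-1)) passed on to the next stage.
    """
    components = []
    smooth = list(x)
    for n in windows:
        smoother = running_mean(smooth, n)
        components.append([a - b for a, b in zip(smooth, smoother)])
        smooth = smoother
    return components, smooth

x = [math.sin(0.3 * i) + 0.2 * math.sin(2.1 * i) for i in range(50)]
components, residual = mean_filter_bank(x, windows=[3, 7, 15])
rebuilt = [sum(vals) + r for vals, r in zip(zip(*components), residual)]
print(max(abs(a - b) for a, b in zip(x, rebuilt)))
```

Because the stages telescope, the components and the residual sum back to the original data up to rounding error, whatever window sizes are chosen.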
  • Repeated application of the rectangular (boxcar) filter changes its effective response function. For example, two repetitions give a triangular response, and more than four repetitions give an almost Gaussian response.
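  • The effect of repetition can be verified by convolving the filter kernel with itself. The sketch below uses a 3-point rectangular kernel as an assumed example; `convolve` is a hypothetical helper written in plain Python.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

box = [1 / 3, 1 / 3, 1 / 3]        # one pass of a 3-point mean filter
twice = convolve(box, box)          # two passes: triangular weights
four = convolve(twice, twice)       # four passes: bell-shaped weights
print([round(w * 9) for w in twice])
print([round(w * 81) for w in four])
```

Two passes give weights proportional to 1, 2, 3, 2, 1, and four passes give weights proportional to 1, 4, 10, 16, 19, 16, 10, 4, 1, which already resemble a sampled Gaussian.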
  • The key parameter of this filter is the window size. From the discussion of formula (2), the following conclusion can be drawn for a sampling frequency of 22 kHz:
  • The rectangular filter and EMD should have the following equivalence:
  • The values of $a_j$ can be the same as in formula (3) and are determined for each patient from the audiogram test results.
  • Figure 14 shows the digitized data in the "hello” language, where "h” and “lo” are audible sounds.
  • Figure 15 shows the data decomposed by EMD. The component with the highest energy is IMF3, and the two high-frequency IMFs are IMF1 and IMF2.
  • the Hilbert spectrum of "hello” is shown in Figure 16.
  • the energy density along the 200 Hz signal represents the vibration of the vocal cords, and the energy density between 400 Hz and 1000 Hz mainly represents the resonance of the vocal organs.
  • the high-frequency energy between 2000Hz and 3000Hz represents the reflection of the vocal tract.
  • the frequency depends on the height and weight of the speaker and varies from person to person.
  • The reflected signal in Figure 12 is much higher, at about 4,000 Hz, which indicates a speaker of smaller build.
  • These high-frequency components add to the timbre of the sound. Note that very little energy lies above 1,000 Hz.
  • Figure 17 is the Fourier spectrogram of the sound "hello". It contains harmonics across the whole frequency range. By the missing-fundamental phenomenon discussed above, amplifying harmonics is equivalent to amplifying the fundamental. In Fourier analysis, therefore, any attempt to amplify frequencies in this range simply reproduces the missing-fundamental effect: the sound becomes louder but no clearer.
  • Figure 18a compares the first IMF with the corresponding filtered component.
  • The filter used in this comparison is a mean filter. Overall, the two look similar. Figure 18b shows an enlarged view for a detailed comparison of the main part of the signal, in which the lack of dynamic range in the filter result is evident.
  • The filter method does not guarantee the IMF properties, because the instantaneous frequency and the resulting envelope differ from those of the EMD method.
  • The most critical drawback of the filter method is that the mean filter removes some of the harmonics that give the low-frequency components their sharp features. The filter method therefore leaks, but it is still complete: the IMF-like components it produces sum to recover the original data exactly.
  • The filter method offers an acceptable and cheaper substitute for the IMFs produced by EMD.
  • The filter method may still achieve the same effect of increasing clarity without increasing loudness, because the loss of clarity is due to under-representation of the temporal fine structure (TFS), that is, the consonants. This is the function realized in this embodiment.
  • The filter method and the EMD method look similar, but the filter method still loses some clarity and other qualitative detail.
  • The two embodiments above mainly describe the hearing aid method of the invention, whose main application is an assistive device for hearing-impaired people, that is, a hearing aid.
  • The adaptive signal-decomposition algorithm for speech enhancement can also be used in communication devices, such as telephones or broadcasting in teleconferences.
  • Telephone voice is a classic problem for hearing impaired patients. With the development of high-quality mobile phones, voice quality has been greatly improved. However, for hearing impaired patients, this is still a challenge. Sound enhancement, noise reduction and optimization are all necessary.
  • FIG. 19 is a block diagram of a speech enhancement system according to an embodiment of the present invention.
  • the voice enhancement system includes a sound receiving module 10, a sound enhancement module 20, and a sound playing module 30.
  • the sound receiving module 10 is used to receive a sound signal, and determine whether the received sound signal is an analog signal or a digital signal, and when the received sound signal is an analog signal, convert the analog signal into a digital signal.
  • the sound enhancement module 20 is used to selectively amplify the received digital sound signal.
  • The principles and detailed steps of the key parts of the sound enhancement module are the same as those given in the hearing aid embodiments.
  • The adaptive filter bank processes the sound signal to obtain a plurality of intrinsic mode function (IMF) components or IMF-like components.
  • The adaptive filter bank comprises a mode decomposition filter bank and a mean filter bank. The mode decomposition filter bank may use any method capable of yielding IMF components, such as empirical mode decomposition.
  • An adaptive filter bank, such as a mean filter bank, can also be used to obtain IMF-like components. After the digital sound signal passes through the adaptive filter bank, a plurality of IMF or IMF-like components is obtained; these components represent how the amplitude of the sound data varies over time at different frequency scales.
  • The values of the gain adjustment unit can be set from the measurement results of the hearing-impaired person, which determine the amplification factors needed for the signal amplitude in each frequency range; they can also be preset according to the frequency range of consonants.
  • The IMF or IMF-like components produced by the adaptive filter bank are selectively amplified, with different factors for different frequency ranges.
  • The selectively enhanced IMF or IMF-like components are integrated to obtain the enhanced sound signal.
  • the sound playing module 30 is used to play the enhanced sound, which converts the enhanced sound signal into an analog signal and plays it.

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurosurgery (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

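The present invention discloses a sound enhancement method and a sound enhancement system. The method comprises: acquiring a sound signal and converting it into a digital signal; decomposing the digital signal to obtain a plurality of intrinsic mode functions (IMFs) or a plurality of IMF-like functions; selectively amplifying the amplitudes of the IMFs or IMF-like functions so obtained; integrating the selectively amplified IMFs or IMF-like functions to obtain an integrated reconstructed signal; and converting the reconstructed signal into an analog signal. Based on the Hilbert-Huang transform, the invention selectively enhances sound, amplifying only the higher-frequency consonants rather than the vowels. The method effectively improves the clarity of the amplified sound and overcomes the problem that current sound enhancement methods increase loudness without increasing clarity.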

Description

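Sound enhancement method and sound enhancement system
Technical Field
The present invention relates to the field of sound enhancement, and in particular to a speech enhancement method and a speech enhancement system.
Background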
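1. How hearing is formed
An acoustic signal is perceived as sound when the pressure wave associated with it travels through the external auditory canal and strikes the eardrum. The vibration is amplified roughly 22 times by the ossicles (the malleus, incus and stapes) and reaches the oval window at the base of the cochlea. Vibration of the oval window membrane produces a pressure wave in the vestibule, which makes the soft basilar membrane vibrate and deform together with the organ of Corti and its hair cells, pressing the hair cells against the tectorial membrane, which bends them. More importantly, the wave in the basilar membrane has its maximum amplitude at the location that matches the characteristic frequency of the vibration. The hair cells bent at the wave crest trigger neurons to fire electrical impulses, which pass through the thalamocortical system to the primary auditory cortex (PAC), where they are processed to produce the sound that is heard. This electrical signal determines the frequency of the sound signal; it can be measured non-invasively as the auditory brainstem response (ABR) using functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). It is therefore clear that the key to sound perception lies in the motion of the organ of Corti and the associated hair cells.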
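2. Hearing impairment
A problem at any stage of the hearing mechanism described above can cause hearing loss. If the external auditory canal is blocked, sound cannot reach the cochlea, resulting in conductive hearing loss. If there is any functional disorder in the inner ear, such as degeneration of the hair cells, nerve impulses cannot be generated or transmitted to the primary auditory cortex, resulting in sensorineural hearing loss. Hearing loss can also result from a combination of these causes. The causes of hearing impairment include aging (presbycusis), overexposure to noise (noise-induced hearing loss, NIHL), heredity (congenital hearing loss), deafness caused by toxins in medication, and so on. Whatever the cause, hearing aids are usually helpful except in cases of central deafness.
Across all causes, the most common form of hearing loss is a loss of sensitivity in certain frequency bands, which can be detected from the audiogram obtained in a hearing test. In presbycusis, sensitivity in the high-frequency bands is readily lost. In noise-induced hearing loss, sensitivity is lost in certain notch bands. The main symptom of the resulting hearing loss is difficulty understanding speech, especially the voices of women and children, whose fundamentals are higher, and speech against background noise, the so-called cocktail party problem. Even with a hearing aid, the problem remains that sounds become loud but their content cannot be made out. It follows that a feasible remedy for hearing impairment is to amplify sound in the specific weakened bands, and this is precisely the amplification principle of the hearing aids now on the market. Such hearing aids do not work well, however: even with them, sound becomes loud but lacks clarity. Statistics show that, of the 25% of hearing-impaired people who are able to obtain hearing aids, 60% do not use them consistently even after being fitted.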
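3. Shortcomings of current hearing aid design
The root of the problem in current hearing aid design is that our understanding of sound perception is built on a misconception. Helmholtz famously stated that "all sounds, however complex, can be mathematically decomposed into sine waves", and ever since, sound has been represented by Fourier frequencies. Human auditory perception does not work this way, however, as shown by the missing-fundamental phenomenon and by the pitch produced by white noise modulated with a sine wave. The "missing fundamental" refers to the fact that the ear clearly perceives the fundamental pitch of a complex tone even though the Fourier spectrum of that tone contains no component at the perceived fundamental. To patch these shortcomings, the notion of "pitch" was introduced: sound perception is held to be based on periodicity and to depend on the envelope produced by modulation rather than on frequency. Because there has been no rigorous method for determining modulation patterns, "pitch" remains a subjective definition. Nevertheless, the audiograms in use today are still based on pure sinusoidal tones at Fourier frequencies; they are used to measure hearing loss, and hearing aids are fitted according to their results. Current Fourier-based hearing aids therefore amplify the weakened high-frequency bands, which consist mainly of harmonics and have almost nothing to do with the perceived sound. Because of the missing-fundamental phenomenon, amplifying harmonics is tantamount to amplifying the fundamental. Hearing-impaired people have no difficulty hearing the fundamental; the high-frequency sounds they cannot hear come mainly from unvoiced sounds and consonants. Amplifying harmonics amplifies the fundamental that they can already perceive and makes the sound louder, but the consonants remain inaudible, so clarity suffers. Clearly, the current hearing aid approach is flawed.
This confusion is deeply rooted in our misunderstanding of the theory of audible sound. Fourier analysis rests on the assumptions of linearity and stationarity, but sound is neither linear nor stationary. The parts of speech that Fourier analysis can handle are therefore some vowels and some voiced consonants, that is, sounds associated with vocal cord vibration. The vocal cords open more slowly than they close, producing nonlinear waves with asymmetric, distorted profiles. In Fourier analysis, whenever vocal cord sounds are involved, these sounds are rich in harmonics. Amplifying the harmonics is equivalent to amplifying the fundamentals, which are low in frequency and have no periodicity. These are not where the hearing problem lies.
In fact, hearing loss is caused by a loss of sensitivity to high-frequency sounds, most of which are produced by consonants; consonants define clarity and carry the meaning of speech. Consonants are transient and cannot be analyzed by Fourier methods, so they are usually ignored in Fourier analysis. As a result, amplifying the higher frequencies in Fourier analysis amounts to amplifying harmonics, which yields a louder fundamental, exactly as in the well-known missing-fundamental phenomenon, while the consonants are neglected. In practice, clarity is reduced because the temporal fine structure (TFS) associated with the consonants is under-represented.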
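Importantly, acoustic signals should not be analyzed with Fourier methods, which rest on linearity and stationarity assumptions that do not match the characteristics of speech. That we cannot perceive sound on the basis of Fourier analysis can also be argued on the following technical grounds:
(1) Fourier analysis is based on an integral transform, which requires a finite window and is limited by the uncertainty principle.
(2) Fourier spectra cannot detect modulation: they cannot account for periodicity, which is an important property for explaining how envelope sounds are perceived. Hence the puzzling missing-fundamental phenomenon and the perception of sine-modulated white noise.
(3) Fourier analysis cannot represent the "chi" sound made by percussion instruments, because that sound is non-stationary.
(4) The cochlea is driven by fluid dynamics, so it cannot produce wave crests at the locations of all the harmonics, as would be needed to account for timbre.
(5) Harmonics are produced artificially when a nonlinear signal is analyzed with a linear method; they are therefore mathematical artifacts rather than physical phenomena.
(6) The existence of surrogate data (Fourier spectra with arbitrary phase) makes the Fourier spectral representation non-unique.
Because of these limitations, speech analysis has historically focused on vowels. In reality, most of the meaning of speech is carried by consonants, whose frequencies mostly lie above the range occupied by most harmonics. When sound is enhanced with current speech analysis methods, usually only the harmonic part of the vowels is amplified and the consonants are not, so the sound is loud but unclear. The inability to represent these consonants correctly in Fourier analysis, and their neglect in the perception of pitch, is the fatal flaw in our theory of speech perception and in hearing aids and their principles.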
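Summary of the Invention
The technical problem to be solved by the present invention is to provide a speech enhancement method and a speech enhancement system with selective amplification. With the invention, sound can be amplified selectively, amplifying only the higher-frequency consonants and not the vowels, which effectively improves the clarity of the amplified sound.
The invention is based on the Hilbert-Huang transform (HHT), which is a time-domain analysis. In the HHT, the signal is first subjected to empirical mode decomposition (EMD), which yields a number of intrinsic mode function (IMF) components; the Hilbert transform is then applied to each IMF component to obtain the time-frequency properties of the signal. Here frequency is defined by differentiating the phase function, rather than by the integral transform used in Fourier analysis.
The steps of EMD are as follows:
(1) Find all local maxima of the signal x(t) and connect them with a cubic spline interpolation function to form the upper envelope; likewise, connect all local minima with a cubic spline interpolation function to form the lower envelope;
(2) Compute the mean of the upper and lower envelopes, denoted m_1, and the difference between the original signal and the envelope mean: x(t) - m_1 = h_1;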
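(3) If h_1 satisfies the IMF conditions, h_1 is the first IMF component; otherwise, treat h_1 as the original signal and repeat steps (1)-(2) until the difference after the k-th iteration, h_{1,k}(t), becomes an IMF, denoted c_1(t) = h_{1,k}(t). The stopping criterion for the k iterations is that
$$ SD = \frac{\sum_{t} \left| h_{1,k-1}(t) - h_{1,k}(t) \right|^2}{\sum_{t} h_{1,k-1}^2(t)} $$
lies within a preset range;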
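(4) Subtract c_1(t) from the original signal to obtain the first-order residual r_1(t): x(t) - c_1(t) = r_1(t);
(5) Treat the residual r_1(t) as the original signal and repeat steps (1)-(4):
$$ r_1(t) - c_2(t) = r_2(t), \quad \dots, \quad r_{N-1}(t) - c_N(t) = r_N(t) $$
The process stops when the N-th order residual r_N(t) is small enough that no further IMF can be extracted.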
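In summary, the original signal x(t) is decomposed as:
$$ x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t) $$
An IMF component must satisfy two conditions: (1) over the whole time range, the number of local extrema and the number of zero crossings must be equal or differ by at most one; (2) at any time, the mean of the envelope of the local maxima (upper envelope) and the envelope of the local minima (lower envelope) approaches zero.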
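As a rough illustration of steps (1) and (2) above, the following Python sketch performs a single sifting pass. It is not the patent's implementation: to stay within the standard library it joins the extrema with piecewise-linear envelopes instead of the cubic splines named in step (1), and the function names `local_extrema`, `envelope` and `sift_once` are hypothetical.

```python
import math

def local_extrema(x):
    """Return index lists of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.

    Assumption: linear interpolation stands in for the cubic spline of
    step (1); the envelope is held flat before the first and after the
    last knot.
    """
    env = [0.0] * len(x)
    first, last = knots[0], knots[-1]
    for i in range(len(x)):
        if i <= first:
            env[i] = x[first]
        elif i >= last:
            env[i] = x[last]
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            env[i] = (1 - w) * x[a] + w * x[b]
    return env

def sift_once(x):
    """One sifting pass: h = x - (upper envelope + lower envelope) / 2."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        raise ValueError("signal has no interior extrema to sift")
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    return [xi - (u + l) / 2 for xi, u, l in zip(x, upper, lower)]

# A sine riding on a constant offset: one pass removes the offset.
x = [math.sin(2 * math.pi * i / 8) + 2.0 for i in range(64)]
h = sift_once(x)
```

A full EMD would repeat `sift_once` until the stopping criterion of step (3) is met, then subtract the resulting IMF and continue on the residual as in steps (4) and (5).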
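Next, the Hilbert transform is applied to the signal. Given a signal x(t), its Hilbert transform H[x(t)] is defined as
$$ H[x(t)] = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau} \, d\tau $$
where P denotes the Cauchy principal value.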
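The idea that frequency is obtained by differentiating the phase can be sketched numerically. The Python example below builds a discrete analytic signal x + jH[x] and reads the instantaneous frequency from the phase increment between neighbouring samples. Computing the discrete Hilbert transform through a naive DFT is an assumption made here for brevity; the patent does not say how the transform is evaluated, and the names `analytic_signal` and `instantaneous_frequency` are hypothetical.

```python
import cmath
import math

def analytic_signal(x):
    """Return the analytic signal x + j*H[x], computed with a naive O(n^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    for m in range(n):
        if m == 0 or (n % 2 == 0 and m == n // 2):
            continue                 # keep DC and Nyquist bins unchanged
        if m < (n + 1) // 2:
            spectrum[m] *= 2         # double the positive frequencies
        else:
            spectrum[m] = 0          # discard the negative frequencies
    return [
        sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
        for k in range(n)
    ]

def instantaneous_frequency(x, fs):
    """Phase difference between successive samples, converted to Hz."""
    z = analytic_signal(x)
    return [
        cmath.phase(z[k + 1] * z[k].conjugate()) * fs / (2 * math.pi)
        for k in range(len(z) - 1)
    ]

# A pure 5 Hz cosine sampled at 64 Hz has a constant instantaneous frequency.
fs = 64
tone = [math.cos(2 * math.pi * 5 * k / fs) for k in range(fs)]
freqs = instantaneous_frequency(tone, fs)
```

For real speech the same computation would be applied to each IMF component separately, since instantaneous frequency is only meaningful for mono-component signals.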
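The HHT is designed for analyzing nonlinear and non-stationary data, and speech signals have exactly these properties, so the HHT is very well suited to processing them. Instantaneous frequency, however, cannot represent the "period" produced by a modulation pattern. The HHT has therefore been extended to the higher-dimensional Holo-Hilbert spectrum, which also covers modulation (envelope) frequencies. In this new approach, the frequencies of the carrier and of the envelope (also known as pitch) are rigorously defined. The Holo-Hilbert spectral representation is designed for nonlinear data and is not affected by the mathematical artifact of harmonics. It is also designed for non-stationary data and can represent transient consonants with high fidelity. Most importantly, it can reveal modulation or periodic patterns.
Drawing on this detailed knowledge of sound signal analysis, the present invention avoids operations in the frequency domain and works entirely in the time domain. The invention can therefore save signal processing time and improve the clarity of sound.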
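To achieve the above object, one aspect of the invention provides a sound enhancement method comprising the following steps:
(1) acquiring a sound signal and converting it into a digital signal;
(2) decomposing the digital signal of step (1) by a mode decomposition method to obtain a plurality of intrinsic mode functions (IMFs), which represent how the amplitude of the digitized sound signal varies over time at different frequencies;
(3) selectively amplifying the amplitudes of the IMFs obtained in step (2);
(4) integrating the selectively amplified IMFs to obtain an integrated reconstructed signal;
(5) converting the integrated reconstructed signal into an analog signal.
Optionally, the mode decomposition method comprises empirical mode decomposition, ensemble empirical mode decomposition, or conjugate adaptive dyadic masking empirical mode decomposition.
Optionally, when the amplitudes of the IMFs are amplified in step (3), the frequency bands to amplify and the amplification factors are determined from the audiogram of the hearing-impaired patient.
Optionally, when the amplitudes of the IMFs are amplified in step (3), the IMFs in the consonant frequency range are selected for amplification.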
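To reduce signal processing time and cost, the invention also provides another sound enhancement method, comprising the following steps:
(1) acquiring a sound signal and converting it into a digital signal;
(2) decomposing the digital signal of step (1) with an adaptive filter to obtain a plurality of IMF-like functions, which represent how the amplitude of the digitized sound signal varies over time at different frequencies;
(3) selectively amplifying the amplitudes of the IMF-like functions obtained in step (2);
(4) integrating the selectively amplified IMF-like functions to obtain an integrated reconstructed signal;
(5) converting the integrated reconstructed signal into an analog signal.
Optionally, the adaptive filter is a mean filter.
Optionally, when the amplitudes of the IMF-like functions are amplified in step (3), the frequency bands to amplify and the amplification factors are determined from the audiogram of the hearing-impaired patient.
Optionally, when the amplitudes of the IMF-like functions are amplified in step (3), the IMF-like functions in the consonant frequency range are selected for amplification.
Optionally, the two sound enhancement methods provided by the invention can be applied to hearing aids, telephones, and broadcasting equipment in teleconferencing.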
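Another aspect of the invention provides a sound enhancement system comprising a sound receiving module, a sound enhancement module and a sound playing module, wherein:
the sound receiving module receives a sound signal and converts it into a digital signal;
the sound enhancement module processes the digital signal to obtain a plurality of IMFs or IMF-like functions, selectively amplifies their amplitudes, integrates the selectively amplified IMFs or IMF-like functions to obtain an integrated reconstructed signal, and converts the reconstructed signal into an analog signal to obtain the enhanced sound signal;
the sound playing module plays the enhanced sound signal.
Optionally, the sound enhancement module comprises an adaptive filter bank, an amplification unit and an integration unit, wherein:
the adaptive filter bank decomposes the digital signal to obtain a plurality of IMFs or IMF-like functions of the digital signal;
the amplification unit selectively amplifies the amplitudes of the IMFs or IMF-like functions;
the integration unit integrates the enhanced IMFs or IMF-like functions to obtain the enhanced sound signal.
Optionally, the sound enhancement module further comprises a gain adjustment unit, which either obtains from the audiogram of the hearing-impaired patient the amplification factors needed for the signal amplitude in each frequency range, or determines the amplification factors from the frequency range in which consonants lie; the amplification unit amplifies the amplitudes of the IMFs or IMF-like functions according to the gain adjustment unit.
Optionally, the adaptive filter bank comprises either a mode decomposition filter bank or a mean filter bank.
Optionally, the sound enhancement system is applied to hearing aids, telephones, and broadcasting equipment in teleconferencing.
Sound has long been misunderstood: it has been assumed that every sound signal can be decomposed into sine waves, that is, that sound is represented by Fourier frequencies. The invention overcomes this misconception in sound analysis and, based on the Hilbert-Huang transform, analyzes the sound signal in the time domain. With the sound enhancement method and system of the invention, sound can be amplified selectively, amplifying only the higher-frequency consonants and not the vowels, which effectively improves the clarity of the amplified sound.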
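Brief Description of the Drawings
Figure 1 is a flowchart of the invention, from the generation of a sound to its playback after enhancement.
Figure 2 shows the waveforms and Fourier spectra of the low A, middle A and high A played on a piano.
Figure 3 shows the Fourier spectrum of the low A played on a piano: Figure 3a is the spectrum with the fundamental (220 Hz), and Figure 3b is the spectrum without it.
Figure 4 shows the wavelet spectrogram of the low A played on a piano: Figure 4a is the spectrum with the fundamental (220 Hz), and Figure 4b is the spectrum without it.
Figure 5 shows the Hilbert time-frequency spectrum of the low A played on a piano: Figure 5a is the spectrum with the fundamental (220 Hz), and Figure 5b is the spectrum without it.
Figure 6 shows the Holo-Hilbert spectrum of the low A played on a piano with the fundamental (220 Hz).
Figure 7 shows the Holo-Hilbert spectrum of the low A played on a piano without the fundamental (220 Hz).
Figure 8 shows the marginal spectra of Figures 6 and 7.
Figure 9 shows the sound data for "zi"; in Chinese, "z" is a consonant followed by the vowel "i".
Figure 10 shows the IMF components of the sound data in Figure 9.
Figure 11 is the Fourier spectrogram of the speech "zi" with the sound signal superimposed.
Figure 12 is the Hilbert spectrogram of the speech "zi" with the sound signal superimposed.
Figure 13 compares the reconstructed signals of the speech "zi" after the high-frequency part is amplified or attenuated.
Figure 14 shows the sound data for "hello", in which "h" and "lo" are the audible sounds.
Figure 15 shows the IMF components of the sound data in Figure 14.
Figure 16 is the Hilbert spectrogram of the speech "hello".
Figure 17 is the Fourier spectrogram of the speech "hello".
Figure 18a compares the first IMF with the components produced by different filters. Figure 18b is a detailed comparison of the differences in the main part of the signal.
Figure 19 is a block diagram of an application scenario for the adaptive speech enhancement algorithm, which is based on signal decomposition and selective amplification for communication devices (such as telephones and teleconferencing).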
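Detailed Description
The technical means adopted by the invention to achieve its intended purpose are further described below with reference to the drawings and preferred embodiments.
As shown in Figure 1, in a sound enhancement method disclosed in an embodiment of the invention, step 100 receives the sound signal emitted by a sound source. Step 110 then digitizes the sound signal. The sampling frequency can be chosen as needed: to reduce cost it can be lowered to between 6,000 Hz and 10,000 Hz, and for higher fidelity a higher sampling frequency of 22 kHz or 44 kHz can be used (22 kHz and 44 kHz being the sampling frequencies of current mainstream acquisition cards). Because the sound may contain noise that needs to be removed, step 120 removes it by EMD or a median filter. The denoised signal is then processed by a mode decomposition method (step 130) or a mean filter (step 140) to obtain the IMF components or IMF-like components of the sound signal. The mode decomposition method may be any mode decomposition method capable of yielding IMF components, such as empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), or conjugate adaptive dyadic masking empirical mode decomposition (CADM-EMD). Besides these EMD methods and the decomposition methods derived from them, an adaptive filter bank such as a mean filter bank can be used to obtain IMF-like components. The resulting IMF or IMF-like components represent how the amplitude of the sound data varies over time at different frequency scales. In step 150, the IMF or IMF-like components are selectively amplified according to the hearing test results of the hearing-impaired patient. In step 160, the selectively amplified components are integrated to obtain the integrated reconstructed signal. In step 161, a limiter may be applied to the reconstructed signal: if the amplification factors used for the reconstruction in step 160 are too large, the signal may clip and the reconstructed sound becomes harsh, so a limiter is added here to improve the fidelity of the sound. In step 170, the digital signal is converted into an analog signal (that is, a sound signal), which is played back to the hearing-impaired patient through a loudspeaker (step 180).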
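The spike removal of step 120 can be illustrated with a short running-median filter. This is only a sketch under assumptions that the text does not specify: a window of three samples, edge samples repeated as padding, and the hypothetical function name `median_filter`.

```python
import statistics

def median_filter(x, window=3):
    """Running median with an odd window; edges are padded by repeating end samples."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [statistics.median(padded[i:i + window]) for i in range(len(x))]

# A single-sample spike of height 5.0 is removed while neighbours barely change.
noisy = [0.0, 0.1, 5.0, 0.2, 0.1]
clean = median_filter(noisy)
```

A median is used rather than a mean because an isolated spike never becomes the middle value of its window, so it is discarded instead of being smeared over neighbouring samples.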
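To better illustrate the sound enhancement method, we first take the mode decomposition method as an example. The sound signal from the source is received (step 100) and digitized (step 110). To save time, the incoming sound is digitized at a sampling frequency of 22 kHz, chosen on the following grounds. In speech, vowels and voiced consonants are dominated by the vibration frequency of the vocal cords, which forms the fundamental, denoted F_0. F_0 ranges from 80 Hz to 400 Hz, covering everything from a deep male voice to a child's voice. Although speech can contain spectral information up to 10 kHz, the Fourier spectral information needed to distinguish consonants and vowels lies largely below 3,000 Hz to 5,000 Hz, because much of that spectrum consists of harmonics whose frequencies can be far higher than those of the actual sound signal. In the Hilbert spectral representation, which has no artificial harmonics, the instantaneous frequency of the sound signal rarely exceeds 1,000 Hz. A sampling frequency of 22 kHz is therefore sufficient. To reduce processing cost further, the sampling frequency can be lowered to 10,000 Hz or even 6,000 Hz; for higher fidelity, the full 44 kHz can of course be used.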
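Spike noise in these sound signals can be removed by EMD or a median filter (step 120). The sound signal is then decomposed by EMD (step 130) into its IMF components:
$$ x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t) $$
where x(t) is the original sound signal, c_j(t) are the intrinsic mode function (IMF) components, and r_N(t) is the residual. The IMF components are orthogonal and are ordered dynamically by time scale. The first IMF component usually consists of three-point oscillations. Because EMD behaves almost as a dyadic filter bank, by the fifth IMF component the oscillations have a mean wavelength of 48 points. At a sampling frequency of 22 kHz, the fifth IMF component therefore already corresponds to a frequency of about 450 Hz. Depending on the patient's condition, we should stop before this point. For example, for a signal digitized at 22 kHz, the mean frequencies of the first five components are: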
c_1(t): 3 points, ~7,000 Hz
c_2(t): 6 points, ~3,500 Hz
c_3(t): 12 points, ~1,800 Hz
c_4(t): 24 points, ~900 Hz
c_5(t): 48 points, ~450 Hz     (2)
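The values in list (2) follow from dividing the sampling frequency by the mean wavelength in samples, with the wavelength doubling from one component to the next. The short Python sketch below reproduces that arithmetic; the function name `imf_scale_table` is hypothetical, and the exact figure of 22,000 Hz is an assumption standing in for "22 kHz".

```python
def imf_scale_table(fs, count=5, first_points=3):
    """Return (index, mean wavelength in samples, mean frequency in Hz) per component.

    Assumes the near-dyadic behaviour described in the text: the first
    component spans about three samples and each later one doubles that.
    """
    rows = []
    points = first_points
    for j in range(1, count + 1):
        rows.append((j, points, fs / points))
        points *= 2
    return rows

rows = imf_scale_table(22000.0)
for j, points, freq in rows:
    print(f"c_{j}(t): {points} points, about {freq:.0f} Hz")
```

The computed frequencies (about 7,333, 3,667, 1,833, 917 and 458 Hz) match the rounded values in list (2).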
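Whatever the value of the fundamental frequency, we can selectively amplify the high-frequency components according to the patient's condition (step 150) and reconstruct the signal as y(t) (step 160):
$$ y(t) = \sum_{j=1}^{N} a_j c_j(t) + r_N(t) $$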
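Because r_N(t) represents the trend of the sound, which is generally of very low frequency and inaudible, it can be ignored, so the reconstructed signal y(t) can be written as:
$$ y(t) = \sum_{j=1}^{N} a_j c_j(t) $$
where a_j is the amplification factor, determined from the patient's audiogram test data so as to suit each patient; the factors can also be preset according to the frequency band in which consonants lie. Most of the amplification should be applied selectively to the high-frequency components, because these are the components that actually represent the consonants and add to clarity. Since most hearing-impaired patients can still hear sounds up to 500 Hz, amplifying the first four components should be sufficient for practical purposes. The reconstructed signal y(t) can be converted back into an analog signal, that is, a sound signal (step 170), and played back to the listener. Note that a limiter may be needed here (step 161), because excessive amplification may cause clipping and make the reconstructed sound harsh; adding a limiter improves the fidelity of the sound.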
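The weighted reconstruction and the limiter of step 161 can be sketched together in a few lines of Python. The limiter here is deliberately simplistic and is an assumption of this example: it scales the whole block down when its peak exceeds a chosen limit, rather than hard-clipping individual samples. The text does not specify the limiter design, and the name `reconstruct` is hypothetical.

```python
def reconstruct(components, gains, limit=1.0):
    """Return y(t) = sum_j a_j * c_j(t), scaled down if its peak exceeds `limit`."""
    if len(components) != len(gains):
        raise ValueError("one gain is required per component")
    n = len(components[0])
    y = [0.0] * n
    for c, a in zip(components, gains):
        for i in range(n):
            y[i] += a * c[i]
    peak = max(abs(v) for v in y)
    if peak > limit:
        scale = limit / peak          # uniform gain reduction avoids clipping
        y = [v * scale for v in y]
    return y

# Two toy components: the first (high-frequency) one is amplified four times.
c1 = [0.1, -0.2, 0.3]
c2 = [0.5, 0.5, -0.5]
boosted = reconstruct([c1, c2], [4.0, 1.0], limit=0.8)
unchanged = reconstruct([c1, c2], [1.0, 1.0], limit=1.0)
```

Uniform scaling keeps the waveform shape intact at the cost of lowering the overall level; a practical device would more likely use a time-varying limiter with attack and release times.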
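For higher fidelity, the sampling frequency can be set to 44 kHz. In that case the first IMF lies at about 15 kHz and may be ignored to suppress ambient noise. Whatever the sampling rate, we only need to amplify the first five IMF components to reach down to 450 Hz.
To illustrate the advantages of the sound analysis method of the invention, Figures 2 to 8 compare Fourier spectrograms, wavelet spectrograms and Hilbert time-frequency spectrograms. By comparing the spectra from the different methods and using the missing-fundamental example, we describe the hearing mechanism in detail, which helps to show the shortcomings of current harmonic-amplification methods.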
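We first take the low A played on a piano (a percussion instrument) as an example. Figure 2 shows the waveform data for the low A, middle A and high A on the piano together with the corresponding Fourier spectra. The waveform data on the left show distorted sinusoidal waveforms, and the distortion produces the harmonics seen in the Fourier spectra on the right. Figures 3a and 4a show the Fourier spectrogram and the wavelet spectrogram with the fundamental present. The fundamental can be removed with a notch filter, yet after its removal the filtered signal is still perceived at the fundamental pitch. Figures 3b and 4b show the Fourier and wavelet spectrograms with the fundamental missing. Although Figure 3b lacks the fundamental present in Figure 3a, the two signals sound the same when converted to sound; the same holds for Figures 4b and 4a. These spectra thus exhibit the puzzling missing-fundamental phenomenon. If we switch to the adaptive HHT analysis, Figures 5a and 5b show the Hilbert spectra with and without the fundamental. In Figure 5b a weak fundamental remains after removal, but such a weak energy density cannot explain why the listener hears the sound. It has long been recognized that the perceived sound actually comes from the periodicity of the envelope. Until now, however, there has been no tool for determining the frequency content of the envelope rigorously and objectively, and the sound we perceive has been defined only through the subjective notion of "pitch".
Recently, Huang et al. introduced Holo-Hilbert spectral analysis, or more precisely a complete set of tools for analyzing acoustic signals in relation to hearing. Figures 6 and 7 show the Holo-Hilbert spectra of the sound with and without the fundamental. Figure 6 is the spectrum of the low A with the fundamental: across almost the entire FM frequency range there is a strong amplitude-modulation (AM) energy density at about 220 Hz, and there is also a strong FM energy density near 220 Hz. Figure 7 is the spectrum of the low A without the fundamental: the strong AM energy density at about 220 Hz still covers almost the entire FM frequency range, but the strong FM energy density near 220 Hz is absent, which shows that the fundamental is missing from the filtered data. Figure 8 shows the marginal Holo-Hilbert spectra computed from Figures 6 and 7. In both cases the AM energy density dominates: the modulation frequency clearly dominates whether or not the fundamental is present, even when the FM projection shows that the filtered data contain no fundamental. The dominant FM or AM frequency is the sound that is perceived. We have thus demonstrated the advantages of the HHT for acoustic signal analysis and explained the missing-fundamental effect, whereby amplifying the harmonics amplifies the fundamental.
For speech analysis, however, the full four-dimensional, time-dependent Holo-Hilbert spectrum is too complex and unwieldy. For the problem addressed by the invention, the simplified, time-based instantaneous-frequency Hilbert spectral analysis is sufficient. The invention operates in the time domain only.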
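To explain in detail how the invention works in practice, we take the unvoiced Chinese sound "zi" as a further example. Figure 9 shows the data for the sound "zi"; in Chinese, "z" is a consonant followed by the vowel "i". Notably, Chinese contains some of the highest-frequency unvoiced sounds (such as z, c, s and j, q, x), which pose a considerable challenge for hearing aid design; "zi" in this example is one of them.
The data in Figure 9 are decomposed by EMD, and Figure 10 shows the resulting IMF components. In Figure 10, the high-frequency components among the first four IMFs, especially IMF1 and IMF2, mainly represent the sound "z"; the boxed region marks the time span covered by the data in Figure 9.
Figure 11 is the Fourier spectrogram of the speech "zi" with the signal superimposed. In the first 0.15 s the sound is "z", whose frequency is very high, starting near 8,000 Hz and reaching almost 20,000 Hz. The vowel begins later and is full of harmonics. There are dense harmonics in the range up to 2,000 Hz and further regions of high energy density at roughly 4,000 Hz to 5,000 Hz and 8,000 Hz to 10,000 Hz. Given all the shortcomings of Fourier analysis when applied to nonlinear and non-stationary data, Figure 12 gives the result of the HHT-based Hilbert spectral analysis for comparison.
In Figure 12, the high-frequency energy density of the "z" sound is retained, with frequencies up to 12,000 Hz, but the 8,000 Hz vowel harmonics are absent. The energy at 4,000 Hz is not a harmonic of any sound but a reflection of the sound in the vocal tract. With no harmonics in the high-frequency range, only the consonant remains there, which gives us a good opportunity to amplify the consonant without altering the vowel. This is the key technique of the invention. According to formula (3), we can amplify the first few IMFs without affecting the vowel (step 150); this is especially true of IMF1 and IMF2.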
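Figure 13 compares the reconstructed signals after amplification (step 160) or attenuation. The amplified signals (H1z and H2z) use different amplification factors for the high-frequency IMFs, showing that the speech enhancement method can provide selective amplification tailored to individual patients. Compared with the original signal, the amplification selectively boosts only the consonant and leaves the vowel unchanged.
The attenuated signals (L1z and L2z) simulate hearing loss of different degrees. In elderly patients it is the consonants, not the vowels, that are hard to hear. The hearing aids with self-compensation mechanisms currently on the market make the sound louder, but the sound lacks clarity. Importantly, if selective amplification is applied to the harmonics in the range 1,000 Hz to 4,000 Hz, the effect is to amplify the fundamental of the vowels without touching the consonants; the net result is equivalent to amplifying L1z or L2z, so the sound becomes much louder but no clearer. Finally, the reconstructed signal can be converted back into an analog signal, that is, a sound signal (step 170), and played back through the amplifier or loudspeaker of the hearing aid (step 180). For patients with congenital hearing loss, depending on the individual's condition, amplification may be even more important.
It should be pointed out that hearing aids are designed on the principle of "selective amplification" of sound. Amplifying the Fourier components in the range of roughly 2,000 Hz to 4,000 Hz effectively amplifies harmonics, which, even when the fundamental is absent, is equivalent to amplifying the fundamental. But the fundamental does not need amplification at all. Some consonants, moreover, have no harmonics and no tangible signal in the 2,000 Hz to 4,000 Hz range. The combined effect of the Fourier approach is to amplify vowels that can already be heard, which is equivalent to amplifying signal L1z or L2z in Figure 13. The patient gains no clarity, only loudness, which is exactly what users of current Fourier-based hearing aids complain about.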
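Embodiment 2
Furthermore, to save time, the EMD method can be replaced by other or equivalent methods. These include repeated application of a running mean, median methods, a separate bank of band-pass filters, any filter that splits the signal into high and low parts, and high-pass filters or other time-domain filters with whatever window sizes the input signal requires. The steps are as follows. First, the data are decomposed by successive running means:
$$ h_1(t) = x(t) - \langle x(t) \rangle_{n_1}, \qquad h_j(t) = \langle x(t) \rangle_{n_{j-1}} - \langle x(t) \rangle_{n_j}, \quad j = 2, \dots, N $$
where ⟨x(t)⟩_{n_j} denotes a mean filter with window size n_j, which must be odd, and h_j(t) is the IMF-like component produced by the filter. Repeated use of a rectangular filter actually changes its response function: two passes give a triangular response, and more than four passes give an almost Gaussian response. The key parameter of this filter is the window size. From the discussion of formula (2), at a sampling frequency of 22 kHz the rectangular filter and EMD should be related by the following equivalence:
[Equation image PCTCN2020086485-appb-000009: correspondence between the filter window sizes n_j and the EMD components of formula (2)]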
The drawback of filters is that none of them is as sharp as EMD, as discussed below. A filter can nevertheless serve as an inexpensive substitute for EMD.
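The remark that repeated use of a rectangular filter changes its response can be checked directly: applying a boxcar twice is the same as applying the convolution of the boxcar with itself, which is a triangle. The Python sketch below computes the effective kernel for a given number of passes; the names `convolve` and `repeated_boxcar` are hypothetical.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def repeated_boxcar(window, passes):
    """Effective kernel of a running mean of length `window` applied `passes` times."""
    box = [1.0 / window] * window
    kernel = box
    for _ in range(passes - 1):
        kernel = convolve(kernel, box)
    return kernel

triangle = repeated_boxcar(3, 2)   # weights 1/9, 2/9, 3/9, 2/9, 1/9
bell = repeated_boxcar(3, 5)       # already close to a Gaussian shape
```

Each extra pass widens and smooths the kernel, which is why a handful of passes already approximates a Gaussian response, in line with the central limit theorem.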
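Selective amplification is then carried out as in formula (3), giving the reconstructed signal y(t):
$$ y(t) = \sum_{j=1}^{N} a_j h_j(t) $$
where the values of a_j can be the same as in formula (3) and can be determined from the patient's audiogram test results.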
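A minimal Python sketch of the running-mean decomposition is given below. It assumes a cascade in which each stage smooths the output of the previous stage and keeps the difference as an IMF-like component; the text does not state whether the means are cascaded or applied directly to x(t), and the window sizes, the shrinking window at the edges and the names `running_mean` and `mean_filter_bank` are likewise assumptions of this example.

```python
def running_mean(x, window):
    """Centered running mean with an odd window; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = window // 2
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def mean_filter_bank(x, windows):
    """Split x into IMF-like components h_j plus a smooth residual."""
    components = []
    current = list(x)
    for w in windows:
        smooth = running_mean(current, w)
        components.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return components, current

signal = [0.0, 1.0, 4.0, 2.0, -1.0, 3.0, 5.0, 2.0, 0.5, -2.0]
parts, residual = mean_filter_bank(signal, [3, 5])
rebuilt = [sum(p[i] for p in parts) + residual[i] for i in range(len(signal))]
```

Because each component is a difference of successive smoothings, the sum telescopes, so the components plus the residual return the original samples. This is the completeness property that the description attributes to the filter method.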
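To show how the alternatives to EMD decomposition apply to speech enhancement, and how they compare with EMD, we take speech data for "hello" as an example; see Figures 14 to 18b. Figure 14 shows the digitized data for "hello", in which "h" and "lo" are the audible sounds. Figure 15 shows the EMD decomposition: the component with the most energy is IMF3, and the two high-frequency IMFs are IMF1 and IMF2. Figure 16 shows the Hilbert spectrum of "hello". The energy density along 200 Hz represents the vibration of the vocal cords, and the energy density between 400 Hz and 1,000 Hz mainly represents resonance of the vocal organs. The high-frequency energy between 2,000 Hz and 3,000 Hz represents reflections in the vocal tract; its frequency depends on the speaker's height and weight and varies from person to person. The reflected signal in Figure 12, for example, is much higher, at about 4,000 Hz, which indicates a speaker of smaller build. These high-frequency components add to the timbre of the sound. Note that very little energy lies above 1,000 Hz.
Figure 17 is the Fourier spectrogram of the sound "hello". It contains harmonics across the whole frequency range. By the missing-fundamental phenomenon discussed above, amplifying harmonics is equivalent to amplifying the fundamental. In Fourier analysis, therefore, any attempt to amplify frequencies in this range simply reproduces the missing-fundamental effect: the sound becomes louder but no clearer.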
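Figure 18a compares the first IMF with the corresponding filtered component; the filter used here is a mean filter. Overall, the two look similar. Figure 18b shows an enlarged view for a detailed comparison of the main part of the signal, in which the lack of dynamic range in the filter result is evident. The filter method does not guarantee the IMF properties, because the instantaneous frequency and the resulting envelope differ from those of EMD. The most critical drawback of the filter method is that the mean filter removes some of the harmonics that give the low-frequency components their sharp features. The filter method therefore leaks, but it is still complete: the IMF-like components it produces sum to recover the original data exactly. On this basis, the filter method offers an acceptable and cheaper substitute for the IMFs produced by EMD. It may still achieve the same effect of increasing clarity without increasing loudness, because the loss of clarity is due to under-representation of the temporal fine structure (TFS), that is, the consonants. This is the function realized in this embodiment. The filter method and the EMD method look similar, but the filter method still loses some clarity and other qualitative detail.
EMD is more time-consuming, even though its computational complexity is comparable to that of the Fourier transform. With the filter method we obtain high-frequency components comparable to those from EMD, but the sound may be less clear, because the mean filter spreads the filtered result over a wider span of time (Figures 18a and 18b show the comparison between EMD and the mean filter in detail). The final result is not as precise as with full EMD, but the filter method is simpler and cheaper to implement.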
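Embodiment 3
The two embodiments above mainly describe the hearing aid method of the invention, whose main application is an assistive device for hearing-impaired people, that is, a hearing aid. Besides hearing aids, the adaptive signal-decomposition algorithm for speech enhancement can also be used in communication devices, such as telephones or broadcasting in teleconferences.
Telephone speech is a classic problem for hearing-impaired patients. With the development of high-quality mobile phones, speech quality has improved greatly, but it remains a challenge for these patients. Sound enhancement, noise reduction and optimization are all necessary.
In teleconference broadcasting, the rapid attenuation of high-frequency components causes the sound reaching the audience to lose clarity. Selective amplification of the high frequencies will therefore improve sound quality.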
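Figure 19 shows the steps for applying the algorithm of the invention to telephones or to broadcasting in teleconferences; the key part is the speech enhancement module. Figure 19 is a block diagram of a speech enhancement system according to an embodiment of the invention. The system comprises a sound receiving module 10, a sound enhancement module 20 and a sound playing module 30. The sound receiving module 10 receives the sound signal and determines whether it is analog or digital; if it is analog, the module converts it into a digital signal. The sound enhancement module 20 selectively amplifies the received digital sound signal; the principles and detailed steps of its key parts are the same as those given in the hearing aid embodiments. After the sound enhancement module 20 receives the digital sound signal, its adaptive filter bank processes the signal to obtain a plurality of IMF components or IMF-like components. The adaptive filter bank comprises a mode decomposition filter bank and a mean filter bank. The mode decomposition filter bank may use any method capable of yielding IMF components, such as empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), or conjugate adaptive dyadic masking empirical mode decomposition (CADM-EMD). Besides these EMD methods and the decomposition methods derived from them, an adaptive filter bank such as a mean filter bank can be used to obtain IMF-like components. The IMF or IMF-like components obtained after the digital sound signal passes through the adaptive filter bank represent how the amplitude of the sound data varies over time at different frequency scales. The values of the gain adjustment unit can be set from the measurement results of the hearing-impaired person, which determine the amplification factors needed for the signal amplitude in each frequency range; they can also be preset according to the frequency band in which consonants lie. According to the gain adjustment unit, the IMF or IMF-like components produced by the adaptive filter bank are selectively amplified, with different factors for different frequency ranges. The selectively enhanced components are then integrated to obtain the enhanced sound signal. The sound playing module 30 converts the enhanced sound signal into an analog signal and plays it.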
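The structure of the sound enhancement module 20 (adaptive filter bank, gain adjustment unit, amplification unit, integration unit) can be outlined as a small Python class. This is only a structural sketch: the class name `SoundEnhancementModule`, its fields, and the toy two-band decomposition used in the example are all hypothetical, and any real decomposition (EMD or a mean filter bank) would be passed in as the `decompose` callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Signal = List[float]

@dataclass
class SoundEnhancementModule:
    decompose: Callable[[Signal], List[Signal]]  # adaptive filter bank
    gains: Sequence[float]                       # values from the gain adjustment unit

    def enhance(self, digital: Signal) -> Signal:
        components = self.decompose(digital)
        if len(components) != len(self.gains):
            raise ValueError("one gain is required per component")
        # Amplification unit: scale each component by its own factor.
        amplified = [[g * v for v in c] for g, c in zip(self.gains, components)]
        # Integration unit: sum the scaled components sample by sample.
        return [sum(samples) for samples in zip(*amplified)]

def toy_decompose(x: Signal) -> List[Signal]:
    """Toy stand-in for a filter bank: deviation from the mean, and the mean itself."""
    mean = sum(x) / len(x)
    return [[v - mean for v in x], [mean] * len(x)]

module = SoundEnhancementModule(decompose=toy_decompose, gains=[2.0, 1.0])
enhanced = module.enhance([1.0, 2.0, 3.0])
```

Keeping the decomposition as an injected callable mirrors the description, in which the same amplification and integration stages are used whether the components come from EMD or from the cheaper mean filter bank.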
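The above are merely preferred embodiments of the invention and do not limit it in any form. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that yield equivalent embodiments. Any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical substance of the invention, without departing from the content of its technical solution, remains within the scope of the technical solution of the invention.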

Claims (14)

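  1. A sound enhancement method, characterized by comprising the following steps:
    (1) acquiring a sound signal and converting it into a digital signal;
    (2) decomposing the digital signal of step (1) by a mode decomposition method to obtain a plurality of intrinsic mode functions (IMFs), which represent how the amplitude of the digitized sound signal varies over time at different frequencies;
    (3) selectively amplifying the amplitudes of the IMFs obtained in step (2);
    (4) integrating the selectively amplified IMFs to obtain an integrated reconstructed signal;
    (5) converting the integrated reconstructed signal into an analog signal.
  2. The sound enhancement method according to claim 1, characterized in that the mode decomposition method comprises empirical mode decomposition, ensemble empirical mode decomposition, or conjugate adaptive dyadic masking empirical mode decomposition.
  3. The sound enhancement method according to claim 1, characterized in that, when the amplitudes of the IMFs are amplified in step (3), the frequency bands to amplify and the amplification factors are determined from the audiogram of the hearing-impaired patient.
  4. The sound enhancement method according to claim 1, characterized in that, when the amplitudes of the IMFs are amplified in step (3), the IMFs in the consonant frequency range are selected for amplification.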
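  5. A sound enhancement method, characterized by comprising the following steps:
    (1) acquiring a sound signal and converting it into a digital signal;
    (2) decomposing the digital signal of step (1) with an adaptive filter to obtain a plurality of IMF-like functions, which represent how the amplitude of the digitized sound signal varies over time at different frequencies;
    (3) selectively amplifying the amplitudes of the IMF-like functions obtained in step (2);
    (4) integrating the selectively amplified IMF-like functions to obtain an integrated reconstructed signal;
    (5) converting the integrated reconstructed signal into an analog signal.
  6. The sound enhancement method according to claim 5, characterized in that the adaptive filter is a mean filter.
  7. The sound enhancement method according to claim 5, characterized in that, when the amplitudes of the IMF-like functions are amplified in step (3), the frequency bands to amplify and the amplification factors are determined from the audiogram of the hearing-impaired patient.
  8. The sound enhancement method according to claim 5, characterized in that, when the amplitudes of the IMF-like functions are amplified in step (3), the IMF-like functions in the consonant frequency range are selected for amplification.
  9. The sound enhancement method according to claim 1 or 5, characterized in that the method is applied to hearing aids, telephones, and broadcasting equipment in teleconferencing.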
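  10. A sound enhancement system, characterized in that it comprises a sound receiving module, a sound enhancement module and a sound playing module, wherein:
    the sound receiving module receives a sound signal and converts it into a digital signal;
    the sound enhancement module processes the digital signal to obtain a plurality of IMFs or IMF-like functions, selectively amplifies their amplitudes, integrates the selectively amplified IMFs or IMF-like functions to obtain an integrated reconstructed signal, and converts the reconstructed signal into an analog signal to obtain the enhanced sound signal;
    the sound playing module plays the enhanced sound signal.
  11. The sound enhancement system according to claim 10, characterized in that the sound enhancement module comprises an adaptive filter bank, an amplification unit and an integration unit, wherein:
    the adaptive filter bank decomposes the digital signal to obtain a plurality of IMFs or IMF-like functions of the digital signal;
    the amplification unit selectively amplifies the amplitudes of the IMFs or IMF-like functions;
    the integration unit integrates the enhanced IMFs or IMF-like functions to obtain the enhanced sound signal.
  12. The sound enhancement system according to claim 11, characterized in that the sound enhancement module further comprises a gain adjustment unit, which either obtains from the audiogram of the hearing-impaired patient the amplification factors needed for the signal amplitude in each frequency range, or determines the amplification factors from the frequency range in which consonants lie; the amplification unit amplifies the amplitudes of the IMFs or IMF-like functions according to the gain adjustment unit.
  13. The sound enhancement system according to claim 11, characterized in that the adaptive filter bank comprises either a mode decomposition filter bank or a mean filter bank.
  14. The sound enhancement system according to claim 10, characterized in that the system is applied to hearing aids, telephones, and broadcasting equipment in teleconferencing.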
PCT/CN2020/086485 2019-12-11 2020-04-23 Sound enhancement method and sound enhancement system WO2021114545A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/764,057 US11570553B2 (en) 2019-12-11 2020-04-23 Method and apparatus for sound enhancement

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911265653.1 2019-12-11
CN201911265653.1A CN111107478B (zh) 2019-12-11 2019-12-11 Sound enhancement method and sound enhancement system

Publications (1)

Publication Number Publication Date
WO2021114545A1 true WO2021114545A1 (zh) 2021-06-17

Family

ID=70421687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086485 WO2021114545A1 (zh) 2019-12-11 2020-04-23 Sound enhancement method and sound enhancement system

Country Status (3)

Country Link
US (1) US11570553B2 (zh)
CN (1) CN111107478B (zh)
WO (1) WO2021114545A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
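CN112233037B (zh) * 2020-10-23 2021-06-11 新相微电子(上海)有限公司 Image enhancement system and method based on image segmentation
CN112468947A (zh) * 2020-11-27 2021-03-09 江苏爱谛科技研究院有限公司 Mobile phone hearing aid system for real-time speech enhancement
CN113286243A (zh) * 2021-04-29 2021-08-20 佛山博智医疗科技有限公司 Error correction system and method for self-tested speech recognition
CN114550740B (zh) * 2022-04-26 2022-07-15 天津市北海通信技术有限公司 Speech intelligibility algorithm under noise, and train audio playback method and system using it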

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4410764A (en) * 1981-06-29 1983-10-18 Rockwell International Corporation Speech processor for processing analog signals
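CN103778920A (zh) * 2014-02-12 2014-05-07 北京工业大学 Method for fusing speech enhancement and frequency response compensation in digital hearing aids
CN105095559A (zh) * 2014-05-09 2015-11-25 中央大学 Method and system for performing Holo-Hilbert spectral analysis
CN107547983A (zh) * 2016-06-27 2018-01-05 奥迪康有限公司 Method and hearing device for improving the separability of target sounds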

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738734B1 (en) 1996-08-12 2004-05-18 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Empirical mode decomposition apparatus, method and article of manufacture for analyzing biological signals and performing curve fitting
US5983162A (en) 1996-08-12 1999-11-09 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Computer implemented empirical mode decomposition method, apparatus and article of manufacture
US6381559B1 (en) 1996-08-12 2002-04-30 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Empirical mode decomposition apparatus, method and article of manufacture for analyzing biological signals and performing curve fitting
US6311130B1 (en) 1996-08-12 2001-10-30 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Computer implemented empirical mode decomposition method, apparatus, and article of manufacture for two-dimensional signals
US6240192B1 (en) * 1997-04-16 2001-05-29 Dspfactory Ltd. Apparatus for and method of filtering in an digital hearing aid, including an application specific integrated circuit and a programmable digital signal processor
US6862558B2 (en) 2001-02-14 2005-03-01 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Empirical mode decomposition for analyzing acoustical signals
US6901353B1 (en) 2003-07-08 2005-05-31 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Computing Instantaneous Frequency by normalizing Hilbert Transform
US7941298B2 (en) 2006-09-07 2011-05-10 DynaDx Corporation Noise-assisted data analysis method, system and program product therefor
US9818416B1 (en) * 2011-04-19 2017-11-14 Deka Products Limited Partnership System and method for identifying and processing audio signals
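CN102222507B (zh) * 2011-06-07 2012-10-24 中国科学院声学研究所 Hearing loss compensation method and device suited to the Chinese language
JP2014122939A (ja) * 2012-12-20 2014-07-03 Sony Corp Audio processing device and method, and program
CN104244155A (zh) * 2013-06-07 2014-12-24 杨国屏 Method for processing sound segments, and hearing aid
CN104299620A (zh) * 2014-09-22 2015-01-21 河海大学 Speech enhancement method based on the EMD algorithm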
EP3011895B1 (en) * 2014-10-26 2021-08-11 Tata Consultancy Services Limited Determining cognitive load of a subject from electroencephalography (EEG) signals
US10758186B2 (en) * 2015-04-20 2020-09-01 Vita-Course Technologies Co., Ltd. Physiological sign information acquisition method and system
US20210110925A1 (en) 2017-05-22 2021-04-15 Adaptive, Intelligent and Dynamic Brian Corporation (AidBrain) Method, module and system for analysis of brain electrical activity
CN111447872A (zh) 2017-12-11 2020-07-24 艾德脑科技股份有限公司 Physiological signal analysis device and method
CN108682429A (zh) * 2018-05-29 2018-10-19 平安科技(深圳)有限公司 Speech enhancement method and apparatus, computer device, and storage medium
EP3666178A1 (en) * 2018-12-14 2020-06-17 Widex A/S Monitoring system comprising a master device in wireless communication with at least one slave device having a sensor
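CN109785854B (zh) * 2019-01-21 2021-07-13 福州大学 Speech enhancement method combining empirical mode decomposition and wavelet threshold denoising
CN110426569B (zh) * 2019-07-12 2021-09-21 国网上海市电力公司 Noise reduction method for transformer acoustic signals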

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4410764A (en) * 1981-06-29 1983-10-18 Rockwell International Corporation Speech processor for processing analog signals
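CN103778920A (zh) * 2014-02-12 2014-05-07 北京工业大学 Method for fusing speech enhancement and frequency response compensation in digital hearing aids
CN105095559A (zh) * 2014-05-09 2015-11-25 中央大学 Method and system for performing Holo-Hilbert spectral analysis
CN107547983A (zh) * 2016-06-27 2018-01-05 奥迪康有限公司 Method and hearing device for improving the separability of target sounds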

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU ZHUO-FU, LIAO ZHEN-PENG, SANG EN-FANG: "SPEECH ENHANCEMENT BASED ON HILBERT-HUANG TRANSFORM , EN-FANG SANG", PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, 18 August 2005 (2005-08-18) - 21 August 2005 (2005-08-21), pages 18 - 21, XP055822496 *

Also Published As

Publication number Publication date
CN111107478A (zh) 2020-05-05
US11570553B2 (en) 2023-01-31
US20210250704A1 (en) 2021-08-12
CN111107478B (zh) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2021114545A1 (zh) Sound enhancement method and sound enhancement system
Rosen et al. Listening to speech in a background of other talkers: Effects of talker number and noise vocoding
Jørgensen et al. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing
Stone et al. On the near non-existence of “pure” energetic masking release for speech
US7243060B2 (en) Single channel sound separation
Moore Basic auditory processes involved in the analysis of speech sounds
Chen et al. Predicting the intelligibility of vocoded speech
Steinmetzger et al. The role of periodicity in perceiving speech in quiet and in background noise
Stone et al. Benefit of high-rate envelope cues in vocoder processing: Effect of number of channels and spectral region
Chen et al. Predicting the intelligibility of vocoded and wideband Mandarin Chinese
Yoo et al. Speech signal modification to increase intelligibility in noisy environments
Gnansia et al. Effects of spectral smearing and temporal fine structure degradation on speech masking release
Régnier et al. A method to identify noise-robust perceptual features: Application for consonant/t
Stone et al. The near non-existence of “pure” energetic masking release for speech: Extension to spectro-temporal modulation and glimpsing
Steinmetzger et al. Predicting the effects of periodicity on the intelligibility of masked speech: An evaluation of different modelling approaches and their limitations
Payton et al. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data
Rader et al. Speech perception with combined electric-acoustic stimulation: a simulation and model comparison
Li et al. Perceptual time-frequency subtraction algorithm for noise reduction in hearing aids
Lehtonen et al. Audibility of aliasing distortion in sawtooth signals and its implications for oscillator algorithm design
Jørgensen et al. Effects of manipulating the signal-to-noise envelope power ratio on speech intelligibility
US20220068289A1 (en) Speech Processing Method and System in A Cochlear Implant
Bhattacharya et al. Combined spectral and temporal enhancement to improve cochlear-implant speech perception
Souza et al. Application of the envelope difference index to spectrally sparse speech
Lyzenga et al. A speech enhancement scheme incorporating spectral expansion evaluated with simulated loss of frequency selectivity
Monson et al. On the use of the TIMIT, QuickSIN, NU-6, and other widely used bandlimited speech materials for speech perception experiments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20899682

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20899682

Country of ref document: EP

Kind code of ref document: A1