WO2022160593A1 - 一种语音增强方法、装置、系统及计算机可读存储介质 - Google Patents

一种语音增强方法、装置、系统及计算机可读存储介质 Download PDF

Info

Publication number
WO2022160593A1
WO2022160593A1 PCT/CN2021/103635 CN2021103635W WO2022160593A1 WO 2022160593 A1 WO2022160593 A1 WO 2022160593A1 CN 2021103635 W CN2021103635 W CN 2021103635W WO 2022160593 A1 WO2022160593 A1 WO 2022160593A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain
signal
time
bone conduction
frequency
Prior art date
Application number
PCT/CN2021/103635
Other languages
English (en)
French (fr)
Inventor
陈国明
Original Assignee
歌尔股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔股份有限公司 filed Critical 歌尔股份有限公司
Priority to US18/263,357 priority Critical patent/US20240079021A1/en
Publication of WO2022160593A1 publication Critical patent/WO2022160593A1/zh

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0224Processing in the time domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/0308Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/41Detection or adaptation of hearing aid parameters or programs to listening situation, e.g. pub, forest
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43Signal processing in hearing aids to enhance the speech intelligibility
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/61Aspects relating to mechanical or electronic switches or control elements, e.g. functioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2460/00Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
    • H04R2460/13Hearing devices using bone conduction transducers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/60Mounting or interconnection of hearing aid parts, e.g. inside tips, housings or to ossicles
    • H04R25/604Mounting or interconnection of hearing aid parts, e.g. inside tips, housings or to ossicles of acoustic or vibrational transducers
    • H04R25/606Mounting or interconnection of hearing aid parts, e.g. inside tips, housings or to ossicles of acoustic or vibrational transducers acting directly on the eardrum, the ossicles or the skull, e.g. mastoid, tooth, maxillary or mandibular bone, or mechanically stimulating the cochlea, e.g. at the oval window

Definitions

  • the present application relates to the technical field of speech processing, and in particular, to a speech enhancement method, apparatus, system, and computer-readable storage medium.
  • Speech enhancement is an effective method to solve noise pollution, so it is widely used in digital mobile phones, Hands-free phone systems in automobiles, teleconferencing, reducing background interference for the hearing-impaired and other civil and military occasions.
  • the main goal of speech enhancement is to extract the pure speech signal from the noisy speech signal as much as possible at the receiving end, so as to reduce the auditory fatigue of the listeners and improve the intelligibility.
  • Air conduction is the familiar path in which sound waves travel via the auricle through the external auditory canal to the middle ear and then through the ossicular chain to the inner ear, and the speech spectrum components it carries are relatively rich. Due to the influence of environmental noise, the air-conducted speech signal is inevitably polluted by noise.
  • Bone conduction means that sound waves are transmitted to the inner ear through the vibration of the skull, jaw, etc. In bone conduction, sound waves can also be transmitted to the inner ear without passing through the outer and middle ears.
  • The bone voiceprint sensor can only pick up vibrations from objects in direct contact with it; in theory it cannot pick up speech transmitted through the air and is therefore not disturbed by environmental noise, which makes it well suited to speech transmission in noisy environments. However, limited by the manufacturing process, the bone voiceprint sensor can only collect and transmit relatively low-frequency speech components, which makes the speech sound dull and degrades the sound quality and user experience.
  • the purpose of the embodiments of the present application is to provide a speech enhancement method, device, system and computer-readable storage medium, which can make the output sound signal more pleasant during use, improve the sound quality of the sound, and improve the user experience.
  • To solve the above technical problem, an embodiment of the present application provides a speech enhancement method, including:
  • acquiring a time-domain microphone signal and a time-domain bone conduction signal at the current moment;
  • judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals; if so, performing noise removal processing on the time-domain microphone signal through a pre-established DNN noise removal model to obtain a noise-removed time-domain microphone signal, and performing noise removal processing in the frequency domain on the time-domain bone conduction signal to obtain a noise-removed time-domain bone conduction signal; if not, setting the output signal corresponding to the current moment to zero;
  • performing high-pass filtering on the noise-removed time-domain microphone signal to obtain a first output time-domain signal, and performing low-pass filtering on the noise-removed time-domain bone conduction signal to obtain a second output time-domain signal;
  • obtaining an output time-domain signal corresponding to the current moment according to the first output time-domain signal and the second output time-domain signal.
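For orientation, the per-frame control flow of the method can be sketched as follows; every callable passed in (is_speech, denoise_mic, denoise_bone, highpass, lowpass) is a hypothetical stand-in for a component described later, and the fusion weights are illustrative:

```python
import numpy as np

def enhance_frame(mic_frame, bone_frame, is_speech, denoise_mic, denoise_bone,
                  highpass, lowpass, k1=0.5, k2=0.5):
    """Per-frame control flow of the described method. All callables are
    hypothetical stand-ins for the components detailed further below."""
    if not is_speech(bone_frame):               # voice activation detection on the bone conduction signal
        return np.zeros_like(mic_frame)         # not speech: output for this moment is set to zero
    out1 = highpass(denoise_mic(mic_frame))     # DNN-denoised microphone signal, high-frequency part
    out2 = lowpass(denoise_bone(bone_frame))    # frequency-domain denoised bone conduction signal, low-frequency part
    return k1 * out1 + k2 * out2                # fused output time-domain signal for the current moment
```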
  • Optionally, the process of performing noise removal processing in the frequency domain on the time-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal is as follows:
  • converting the time-domain bone conduction signal into a frequency-domain bone conduction signal through a time-frequency transform;
  • performing noise removal processing in the frequency domain on the frequency-domain bone conduction signal to obtain a noise-removed frequency-domain bone conduction signal;
  • judging whether the bandwidth of the noise-removed frequency-domain bone conduction signal reaches a preset bandwidth; if so, directly performing an inverse time-frequency transform on the noise-removed frequency-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal; if not, using a pre-established DNN bandwidth expansion model to expand the bandwidth of the noise-removed frequency-domain bone conduction signal so that the expanded bandwidth reaches the preset bandwidth, and performing an inverse time-frequency transform on the expanded frequency-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal.
  • Optionally, the process of performing noise removal processing on the time-domain microphone signal through the pre-established DNN noise removal model to obtain the noise-removed time-domain microphone signal is as follows:
  • performing a time-frequency transform on the time-domain microphone signal to obtain a corresponding frequency-domain microphone signal;
  • extracting a first signal feature of the frequency-domain microphone signal, and processing the first signal feature with the pre-established DNN noise removal model to obtain first gains respectively corresponding to the first frequency points of the frequency-domain microphone signal;
  • calculating, for each first frequency point, the product of the corresponding spectral signal of the frequency-domain microphone signal and the corresponding first gain to obtain a noise-removed spectral signal for each first frequency point, and thereby a noise-removed frequency-domain microphone signal;
  • performing an inverse time-frequency transform on the noise-removed frequency-domain microphone signal to obtain the noise-removed time-domain microphone signal.
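A minimal sketch of this inference path, assuming a hypothetical `dnn_model` callable that maps the extracted feature vector to one gain per frequency bin (the publication does not fix the feature set, window, or network layout):

```python
import numpy as np

def denoise_microphone_frame(x_frame, dnn_model, window=None):
    """Sketch of the described flow: FFT -> features -> per-bin gains -> inverse FFT.

    `dnn_model` is a hypothetical callable returning one gain per frequency bin;
    the real feature extraction and network are those of the trained DNN model.
    """
    n = len(x_frame)
    win = window if window is not None else np.hanning(n)
    spec = np.fft.rfft(x_frame * win)            # time-frequency transform
    feats = np.log1p(np.abs(spec))               # stand-in for the "first signal feature"
    gains = np.clip(dnn_model(feats), 0.0, 1.0)  # first gain per frequency point
    denoised_spec = spec * gains                 # spectral signal times gain, bin by bin
    return np.fft.irfft(denoised_spec, n=n)      # inverse time-frequency transform
```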
  • Optionally, the process of judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals is as follows:
  • performing voice activation detection on the time-domain bone conduction signal to judge whether the time-domain bone conduction signal is a speech signal;
  • when the time-domain bone conduction signal is a speech signal, the time-domain microphone signal is a speech signal.
  • Optionally, the process of performing voice activation detection on the time-domain bone conduction signal and judging whether the time-domain bone conduction signal is a speech signal is as follows:
  • calculating the zero-crossing rate and the pitch period corresponding to the time-domain bone conduction signal;
  • performing a time-frequency transform on the time-domain bone conduction signal to obtain a frequency-domain bone conduction signal;
  • calculating the spectral energy and the spectral centroid corresponding to the frequency-domain bone conduction signal;
  • performing a fusion judgment on the zero-crossing rate, the pitch period, the spectral energy, and the spectral centroid to obtain a voice activation detection flag bit corresponding to the time-domain bone conduction signal;
  • judging whether the time-domain bone conduction signal is a speech signal according to the voice activation detection flag bit.
  • Optionally, the process of performing the fusion judgment on the zero-crossing rate, the pitch period, the spectral energy, and the spectral centroid to obtain the voice activation detection flag bit corresponding to the time-domain bone conduction signal is as follows:
  • judging whether the spectral energy is less than a first preset value; if so, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 0; if not, proceeding to the next judgment;
  • judging whether the zero-crossing rate is greater than a second preset value; if so, the flag bit is 0; if not, proceeding to the next judgment;
  • judging whether the pitch period is greater than a third preset value or less than a fourth preset value; if so, the flag bit is 0; otherwise, proceeding to the next judgment;
  • judging whether the spectral centroid is greater than a fifth preset value; if so, the flag bit is 0; otherwise, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 1;
  • accordingly, when the voice activation detection flag bit is 1, the time-domain bone conduction signal is a speech signal; when the flag bit is 0, the current time-domain bone conduction signal is a noise signal.
  • Optionally, the process of obtaining the output time-domain signal corresponding to the current moment according to the first output time-domain signal and the second output time-domain signal is as follows:
  • fusing the first output time-domain signal and the second output time-domain signal according to a first weight coefficient and a second weight coefficient to obtain a fused time-domain signal;
  • dynamically adjusting the fused time-domain signal so that the adjusted time-domain signal is within a preset range, and using the adjusted time-domain signal as the output time-domain signal corresponding to the current moment.
  • the embodiment of the present application also provides a voice enhancement device, including:
  • an acquisition module for acquiring the time-domain microphone signal and the time-domain bone conduction signal at the current moment
  • a judging module for judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals, and if so, triggering a noise reduction module; if not, triggering a zero-setting module;
  • the noise reduction module is configured to perform noise removal processing on the time-domain microphone signal through a pre-established DNN noise removal model to obtain a noise-removed time-domain microphone signal, and to perform noise removal processing in the frequency domain on the time-domain bone conduction signal to obtain a noise-removed time-domain bone conduction signal;
  • the zero-setting module is used to set the output signal corresponding to the current moment to zero;
  • a filtering module configured to perform high-pass filtering processing on the time-domain microphone signal after noise removal to obtain a first output time-domain signal, and perform low-pass filtering processing on the noise-eliminated time-domain bone conduction signal to obtain the second output time domain signal;
  • a fusion module configured to obtain an output time domain signal corresponding to the current moment according to the first output time domain signal and the second output time domain signal.
  • the embodiment of the present application also provides a speech enhancement system, including:
  • a memory for storing a computer program; and
  • a processor configured to implement the steps of the above-mentioned speech enhancement method when executing the computer program.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned speech enhancement method are implemented.
  • Embodiments of the present application provide a speech enhancement method, device, system, and computer-readable storage medium.
  • The method picks up a time-domain microphone signal and a time-domain bone conduction signal and then, by judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals, determines whether the user is speaking at the current moment. When they are speech signals, noise removal is further performed on the time-domain microphone signal through the pre-established DNN noise removal model, and noise removal in the frequency domain is performed on the time-domain bone conduction signal, so that the background noise is better removed.
  • The noise-removed time-domain microphone signal is then high-pass filtered to obtain a first output time-domain signal carrying the high-frequency part, and the noise-removed time-domain bone conduction signal is low-pass filtered to obtain a second output time-domain signal carrying the low-frequency part.
  • An output time-domain signal containing both the high-frequency part and the low-frequency part can then be obtained from the first output time-domain signal and the second output time-domain signal. The present application can thus better remove background noise, which helps improve the sound quality and the user experience.
  • FIG. 1 is a schematic diagram of the existing bone conduction principle
  • FIG. 2 is a schematic flowchart of a speech enhancement method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a speech enhancement apparatus provided by an embodiment of the present application.
  • Embodiments of the present application provide a speech enhancement method, device, system and computer-readable storage medium, which can make the output sound signal more pleasant, improve the sound quality of the sound, and improve the user experience during use.
  • FIG. 2 is a schematic flowchart of a speech enhancement method provided by an embodiment of the present application.
  • the method includes:
  • Specifically, in practical applications the time-domain microphone signal can be picked up by a microphone and the time-domain bone conduction signal can be collected by a bone voiceprint sensor, and the time-domain microphone signal and time-domain bone conduction signal acquired at each moment are processed with the speech enhancement method provided by the embodiments of the present application.
  • S120 Determine whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals, if so, go to S130; if not, go to S140;
  • It should be noted that, after the time-domain microphone signal and the time-domain bone conduction signal at the current moment are acquired, it can be judged whether they are speech signals. Since the time-domain bone conduction signal accurately reflects whether the user is currently speaking, judging whether the time-domain bone conduction signal is a speech signal further determines whether the time-domain microphone signal picked up by the microphone at the current moment is a speech signal.
  • That is, because the two signals are collected at the same moment, once the time-domain bone conduction signal at the current moment is determined to be a speech signal, the time-domain microphone signal at the current moment is also a speech signal;
  • and once the time-domain bone conduction signal at the current moment is determined to be a noise signal, the time-domain microphone signal at the current moment is also a noise signal.
  • S130: Perform noise removal processing on the time-domain microphone signal through a pre-established DNN noise removal model to obtain a noise-removed time-domain microphone signal, and perform noise removal processing in the frequency domain on the time-domain bone conduction signal to obtain a noise-removed time-domain bone conduction signal;
  • It should be noted that, in order to better remove noise, a DNN noise removal model may be established in advance in this embodiment and then used to perform noise removal processing on the time-domain microphone signal. The DNN noise removal model is established as follows:
  • A time-domain noise signal n' and a time-domain microphone speech signal s are actually recorded, and their mixed signal s_mix is computed. The time-domain noise signal n', the time-domain microphone speech signal s, and the mixed signal s_mix are each time-frequency transformed (e.g., by FFT) to obtain the frequency-domain signals N'(k), S(k), and S_mix(k), where k is the frequency-domain index. Feature extraction is then performed on S_mix(k) to compute the first feature parameter.
  • The time-domain microphone speech signal s and the mixed signal s_mix are divided in the frequency domain into a plurality of first sub-bands (e.g., 18 first sub-bands); the first sub-bands can be divided on a mel-frequency scale or a bark scale, and the specific scheme can be determined according to actual needs.
  • After the division, the speech signal energy and the mixed signal energy on each sub-band are computed, e.g., E_s(b) = Σ_{k∈b} |S(k)|² and E_mix(b) = Σ_{k∈b} |S_mix(k)|², where b is the sub-band index, b = 0, 1, ..., 18; the first sub-band gain g(b) is then computed from the ratio of the speech energy to the mixed energy on each sub-band (the exact expressions are given as images in the publication).
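A rough numerical sketch of the sub-band energies and training-target gains follows; the equal-width band edges and the square-root form of the gain are assumptions (the publication divides on a mel or bark scale and gives the exact gain expression as an image):

```python
import numpy as np

def subband_energies(spec, band_edges):
    """Sum |X(k)|^2 over the FFT bins of each sub-band (band_edges are bin indices)."""
    p = np.abs(spec) ** 2
    return np.array([p[lo:hi].sum() for lo, hi in zip(band_edges[:-1], band_edges[1:])])

def subband_gains(clean_spec, mix_spec, band_edges, eps=1e-12):
    """Per-sub-band training target: clean-speech to mixed-signal energy ratio.
    Taking the square root (energy ratio -> amplitude gain) is an assumption."""
    e_clean = subband_energies(clean_spec, band_edges)
    e_mix = subband_energies(mix_spec, band_edges)
    return np.sqrt(np.clip(e_clean / (e_mix + eps), 0.0, 1.0))

# Example: 18 equal-width bands over a 257-bin half spectrum (illustrative, not mel/bark).
band_edges = np.linspace(0, 257, 19).astype(int)
```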
  • Specifically, during training of the deep neural network (DNN) noise removal model, the first feature parameters computed from the real mixed signal are used as the input and the real first sub-band gains g computed above are used as the target output; the weight coefficients W, U and the biases in the deep neural network are trained and adjusted continuously so that the first gain g' output at each step keeps approaching the real first gain value g.
  • When the error between g' and g becomes smaller than the corresponding preset value, the network is successfully trained, and the final DNN noise removal model is obtained from the network parameters at that time.
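As an illustration of this training loop, a schematic PyTorch sketch follows; the feature dimension (42), layer sizes, optimizer, and stopping tolerance are all assumptions, since the publication only names the weights W, U and the biases:

```python
import torch
import torch.nn as nn

# Illustrative network: feature vector -> 18 sub-band gains in [0, 1].
model = nn.Sequential(
    nn.Linear(42, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 18), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train(loader, epochs=10, tol=1e-3):
    """`loader` yields (features, true_gains) pairs computed from the recorded data."""
    for _ in range(epochs):
        for feats, g_true in loader:
            g_pred = model(feats)            # g' output by the network
            loss = loss_fn(g_pred, g_true)   # drive g' toward the real gain g
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < tol:                # stop once the error is small enough
            break
```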
  • In addition, after the above judgment determines that the time-domain bone conduction signal is not a speech signal, the method may further include: updating the bone conduction noise signal power spectrum according to the time-domain bone conduction signal.
  • Specifically, the time-domain bone conduction signal is converted into a frequency-domain bone conduction signal by a time-frequency transform, and the bone conduction noise power spectrum can then be updated according to the relation P_n(k,t) = β·P_n(k,t−1) + (1−β)·|Y(k,t)|², where P_n(k,t) is the power of the noise signal received by the bone conduction sensor at time t, P_n(k,t−1) is the power of the noise signal received by the bone conduction sensor at time t−1, Y(k,t) is the k-th frequency-domain bone conduction signal at time t, k is the frequency-domain index, and β is an iteration factor that may be 0.9.
  • Of course, the specific value of β can be determined according to actual needs, which is not limited in this embodiment.
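The recursive update can be written directly; `p_n_prev` and `y_frame_spec` here stand for the previous noise power spectrum and the current frequency-domain bone conduction frame:

```python
import numpy as np

def update_noise_psd(p_n_prev, y_frame_spec, beta=0.9):
    """P_n(k,t) = beta * P_n(k,t-1) + (1 - beta) * |Y(k,t)|^2,
    applied only to frames judged to be noise (VAD flag = 0)."""
    return beta * p_n_prev + (1.0 - beta) * np.abs(y_frame_spec) ** 2
```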
  • Correspondingly, the above process of performing noise removal in the frequency domain on the time-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal may specifically be:
  • removing the noise from the frequency-domain bone conduction signal according to a calculation relation of the form Ŷ_t(k) = H_t(k)·Y_t(k) to obtain the noise-removed frequency-domain bone conduction signal, where Y_t(k) is the spectral signal at time t, Ŷ_t(k) is the spectral signal after noise removal, H_t(k) is the gain function, λ is an over-subtraction factor and a constant (for example, 0.9), and γ_t(k) is the posterior signal-to-noise ratio.
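The gain function H_t(k) itself appears only as an image in the publication; the sketch below therefore assumes a conventional over-subtraction gain built from the noise power spectrum and the posterior SNR, as an illustration rather than the patented formula:

```python
import numpy as np

def denoise_bone_conduction_spec(y_spec, noise_psd, over_sub=0.9, floor=0.05):
    """Frequency-domain noise removal sketch for the bone conduction signal.

    Assumed gain form (not taken from the publication's image):
        gamma_t(k) = |Y_t(k)|^2 / P_n(k,t)        # posterior SNR
        H_t(k)     = max(1 - lambda / gamma_t(k), floor)
    """
    gamma = np.abs(y_spec) ** 2 / np.maximum(noise_psd, 1e-12)
    gain = np.maximum(1.0 - over_sub / np.maximum(gamma, 1e-12), floor)
    return gain * y_spec
```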
  • S140: Set the output signal corresponding to the current moment to zero. Specifically, when the time-domain bone conduction signal at the current moment is determined to be a noise signal, the corresponding time-domain microphone signal is also a noise signal, so the output signal corresponding to the current moment can be directly set to zero.
  • S150 Perform high-pass filtering processing on the noise-eliminated time-domain microphone signal to obtain a first output time-domain signal, and perform low-pass filtering processing on the noise-eliminated time-domain bone conduction signal to obtain a second output time-domain signal;
  • It should be noted that the sound signal collected by the microphone is rich in high frequencies, while the sound signal collected by the bone conduction sensor is clear and complete in the low frequencies. Therefore, in this embodiment of the present application, the noise-removed time-domain microphone signal can be high-pass filtered to obtain the first output time-domain signal carrying the high-frequency part, and the noise-removed time-domain bone conduction signal can be low-pass filtered to obtain the second output time-domain signal carrying the low-frequency part.
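A possible split of the two denoised signals into high- and low-frequency parts, assuming a 16 kHz sampling rate and a 1 kHz crossover (the publication fixes neither the filter order nor the exact crossover beyond the roughly 1 kHz preset bandwidth):

```python
from scipy.signal import butter, sosfilt

FS = 16000           # assumed sampling rate
CROSSOVER_HZ = 1000  # assumed crossover, matching the ~1 kHz preset bandwidth

hp_sos = butter(4, CROSSOVER_HZ, btype="highpass", fs=FS, output="sos")
lp_sos = butter(4, CROSSOVER_HZ, btype="lowpass", fs=FS, output="sos")

def split_bands(mic_denoised, bone_denoised):
    out1 = sosfilt(hp_sos, mic_denoised)   # first output: high-frequency part of the microphone signal
    out2 = sosfilt(lp_sos, bone_denoised)  # second output: low-frequency part of the bone conduction signal
    return out1, out2
```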
  • S160 Obtain an output time domain signal corresponding to the current moment according to the first output time domain signal and the second output time domain signal.
  • Specifically, the present application can fuse the first output time-domain signal and the second output time-domain signal. A first weight coefficient k1 corresponding to the first output time-domain signal and a second weight coefficient k2 corresponding to the second output time-domain signal can be predetermined, and the fused time-domain signal is obtained as the weighted sum out = k1*out1 + k2*out2, where out1 is the first output time-domain signal and out2 is the second output time-domain signal.
  • In addition, to prevent the fused time-domain signal from overflowing, the fused time-domain signal can also be dynamically adjusted: excessively large signals are compressed and excessively small signals are appropriately amplified, and the adjusted time-domain signal is used as the output time-domain signal corresponding to the current moment.
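A minimal sketch of the weighted fusion and the dynamic adjustment; the equal weights and the tanh soft limiter are illustrative stand-ins for the predetermined coefficients and the compress/amplify adjustment described above:

```python
import numpy as np

def fuse_and_limit(out1, out2, k1=0.5, k2=0.5, limit=0.9):
    """out = k1*out1 + k2*out2, then a soft limiter as a stand-in for the
    'compress large / amplify small' dynamic adjustment described above."""
    out = k1 * out1 + k2 * out2
    return limit * np.tanh(out / limit)   # keeps the result inside (-limit, limit)
```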
  • Further, the process of performing noise removal in the frequency domain on the time-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal may specifically be: converting the time-domain bone conduction signal into a frequency-domain bone conduction signal by a time-frequency transform; performing noise removal in the frequency domain on the frequency-domain bone conduction signal to obtain a noise-removed frequency-domain bone conduction signal; and judging whether the bandwidth of the noise-removed frequency-domain bone conduction signal reaches a preset bandwidth. If it does, an inverse time-frequency transform is directly performed on the noise-removed frequency-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal; if it does not, a pre-established DNN bandwidth expansion model is used to expand the bandwidth of the noise-removed frequency-domain bone conduction signal so that the expanded bandwidth reaches the preset bandwidth, and an inverse time-frequency transform is performed on the expanded frequency-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal.
  • It should be noted that the preset bandwidth can be, for example, 1 kHz.
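One way to approximate the bandwidth check; the energy-threshold heuristic is an assumption, as the publication only states that the expansion runs when the denoised spectrum does not reach the preset bandwidth:

```python
import numpy as np

def needs_bandwidth_expansion(spec, fs=16000, preset_bw_hz=1000.0, rel_thresh=0.01):
    """Rough check of whether the denoised bone conduction spectrum already carries
    usable energy up to the preset bandwidth (thresholding heuristic is an assumption)."""
    freqs = np.fft.rfftfreq(2 * (len(spec) - 1), d=1.0 / fs)
    power = np.abs(spec) ** 2
    band = power[(freqs > 0.8 * preset_bw_hz) & (freqs <= preset_bw_hz)]
    if band.size == 0:
        return True
    return band.mean() < rel_thresh * power.max()  # True -> run the DNN bandwidth expansion
```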
  • The DNN bandwidth expansion model is established as follows:
  • The bone conduction noise signal n_g remaining after noise reduction and the bone conduction speech signal s_g are actually acquired, and their mixed signal s_g_mix is computed. The bone conduction noise signal n_g, the bone conduction speech signal s_g, and the bone conduction mixed signal s_g_mix are each time-frequency transformed (e.g., by FFT) to obtain the frequency-domain signals N_g(k), S_g(k), and S_g_mix(k); feature extraction is then performed on N_g(k), S_g(k), and S_g_mix(k) respectively to compute their second feature parameters.
  • Likewise, the bone conduction speech signal s_g and the mixed signal s_g_mix are divided in the frequency domain into a plurality of second sub-bands (e.g., 5 second sub-bands); the second sub-bands can be divided on a mel-frequency scale or a bark scale, determined according to actual needs. The bone conduction speech signal energy and the bone conduction mixed signal energy on each second sub-band are then computed, with b' denoting the second sub-band index, b' = 0, 1, ..., 5.
  • The second sub-band gain is then computed from these energies, where g(b') denotes the gain of the b'-th second sub-band.
  • Specifically, during training of the deep neural network (DNN) bandwidth expansion model, the real second feature parameters computed above are used as the input and the real second sub-band gains computed above are used as the target output; the weight coefficients W, U and the biases in the deep neural network are trained and adjusted continuously so that the second gain output at each step keeps approaching the real value.
  • When the error between the output second gain and the real value becomes smaller than the corresponding preset value, the network is successfully trained, and the final DNN bandwidth expansion model is obtained from the network parameters at that time.
  • Specifically, the process of using the pre-established DNN bandwidth expansion model to expand the bandwidth of the noise-removed frequency-domain bone conduction signal may include: performing feature extraction on the frequency-domain bone conduction signal to obtain a second signal feature; processing the second signal feature with the pre-established DNN bandwidth expansion model to obtain second gains respectively corresponding to the second frequency points of the frequency-domain bone conduction signal; and calculating, for each second frequency point, the product of the corresponding spectral signal and the corresponding second gain to obtain the expanded, noise-removed frequency-domain bone conduction signal.
  • Further, as described above, the noise-removed frequency-domain microphone signal is inverse time-frequency transformed to obtain the noise-removed time-domain microphone signal as part of the DNN noise removal process.
  • the process of judging whether the time-domain bone conduction signal is a speech signal in the above S120 may be specifically:
  • Voice activation detection is performed on the time-domain bone conduction signal to determine whether the time-domain bone conduction signal is a voice signal.
  • The above process of performing voice activation detection on the time-domain bone conduction signal and judging whether it is a speech signal may specifically be:
  • calculating the zero-crossing rate and the pitch period corresponding to the time-domain bone conduction signal;
  • performing a time-frequency transform on the time-domain bone conduction signal to obtain a frequency-domain bone conduction signal; specifically, a fast Fourier transform (FFT) can be used to process the time-domain bone conduction signal to obtain the frequency-domain bone conduction signal;
  • calculating the spectral energy and the spectral centroid corresponding to the frequency-domain bone conduction signal;
  • performing the fusion judgment on the zero-crossing rate, the pitch period, the spectral energy, and the spectral centroid to obtain the voice activation detection flag bit corresponding to the time-domain bone conduction signal;
  • judging whether the time-domain bone conduction signal is a speech signal according to the voice activation detection flag bit.
  • Specifically, the zero-crossing rate corresponding to the time-domain bone conduction signal is calculated according to a first calculation relation, which has the form:
  • Z_n = (1/2) · Σ_{m=m1+1}^{m2} |sgn[x(m)] − sgn[x(m−1)]|,
  • where Z_n is the zero-crossing count, x(m) is the time-domain signal corresponding to the time variable m, x(m−1) is the time-domain signal corresponding to the time variable m−1, x(n) is the time-domain signal corresponding to the time variable n, x(n−1) is the time-domain signal corresponding to the time variable n−1, n ≤ N, and N is the length of the current time-domain signal x(n); sgn[x(n)] = 1 when x(n) ≥ 0 and sgn[x(n)] = −1 when x(n) < 0;
  • ZCR = Z_n/(m2 − m1 + 1), where ZCR is the zero-crossing rate, m1 is the m1-th point in the current frame's time-domain signal sequence, and m2 is the m2-th point in the current frame's time-domain signal.
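A direct implementation of the zero-crossing computation above:

```python
import numpy as np

def zero_crossing_rate(x, m1=0, m2=None):
    """ZCR = Z_n / (m2 - m1 + 1), with Z_n = 0.5 * sum |sgn(x[m]) - sgn(x[m-1])|."""
    m2 = len(x) - 1 if m2 is None else m2
    seg = x[m1:m2 + 1]
    sgn = np.where(seg >= 0, 1, -1)
    z_n = 0.5 * np.abs(np.diff(sgn)).sum()
    return z_n / (m2 - m1 + 1)
```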
  • The pitch period corresponding to the time-domain bone conduction signal is calculated from the autocorrelation function R_m = Σ_n x(n)·x(n+m), where R_m is the autocorrelation function of the speech signal and x(n+m) is the time-domain signal corresponding to the time variable n+m; the pitch period is then Pitch = max{R_m}, where Pitch denotes the pitch period.
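A sketch of the pitch estimate; the publication writes Pitch = max{R_m}, read here as the maximizing lag over a plausible lag range (the lag bounds are assumptions):

```python
import numpy as np

def pitch_period(x, min_lag=20, max_lag=200):
    """Pitch from the autocorrelation R_m = sum_n x[n] * x[n+m]; returns the lag
    maximizing R_m over [min_lag, max_lag) (lag bounds are illustrative)."""
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(min_lag, max_lag)])
    return min_lag + int(np.argmax(r))
```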
  • For the spectral energy of a specified bandwidth, after the time-domain bone conduction signal is processed by FFT, the 8 kHz bandwidth is divided into 128 sub-bands and the energy of the lowest 24 sub-bands is taken, e.g., E_g = log(Σ_{j=1}^{24} |Y(j)|²), where E_g is the logarithmic energy of the lowest 24 sub-bands, j is the index within the lowest 24 sub-bands, Y(j) is the frequency-domain signal, and the lowest 24 sub-bands are the 24 sub-bands taken from the 128 sub-bands in order from low to high frequency.
  • The spectral centroid corresponding to the frequency-domain bone conduction signal is calculated as brightness = Σ_{k=1}^{U} f(k)·E(k) / Σ_{k=1}^{U} E(k), with E(k) = |Y(k)|², where brightness is the spectral centroid, f(k) is the frequency of the k-th frequency point, E(k) is the spectral energy of the k-th frequency point, and U is the number of frequency points.
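Sketches of the two spectral features; taking the lowest FFT bins is a simplification of the 128-sub-band division described above, and the sampling rate is assumed:

```python
import numpy as np

def low_band_log_energy(spec, n_low=24):
    """Log energy of the lowest bins, approximating the 'lowest 24 of 128 sub-bands'."""
    return np.log(np.sum(np.abs(spec[:n_low]) ** 2) + 1e-12)

def spectral_centroid(spec, fs=16000):
    """brightness = sum f(k) * E(k) / sum E(k), with E(k) = |Y(k)|^2."""
    freqs = np.fft.rfftfreq(2 * (len(spec) - 1), d=1.0 / fs)
    energy = np.abs(spec) ** 2
    return float(np.sum(freqs * energy) / (np.sum(energy) + 1e-12))
```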
  • Further, the above process of performing the fusion judgment on the zero-crossing rate, the pitch period, the spectral energy, and the spectral centroid to obtain the voice activation detection flag bit corresponding to the time-domain bone conduction signal may specifically be:
  • judging whether the spectral energy is less than a first preset value; if so, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 0; if not, proceeding to the next judgment;
  • judging whether the zero-crossing rate is greater than a second preset value; if so, the flag bit is 0; if not, proceeding to the next judgment;
  • judging whether the pitch period is greater than a third preset value or less than a fourth preset value; if so, the flag bit is 0; otherwise, proceeding to the next judgment;
  • judging whether the spectral centroid is greater than a fifth preset value; if so, the flag bit is 0; otherwise, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 1.
  • In practical applications, the first preset value can be -9, the second preset value can be 03.6, the third preset value can be 143, the fourth preset value can be 8, and the fifth preset value can be 3.
  • the specific value of each preset value may be determined according to the actual situation, which is not specially limited in this embodiment.
  • Correspondingly, the above process of judging whether the time-domain bone conduction signal is a speech signal according to the voice activation detection flag bit may specifically be:
  • when the voice activation detection flag bit is 1, the time-domain bone conduction signal is a speech signal;
  • when the voice activation detection flag bit is 0, the current time-domain bone conduction signal is a noise signal.
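The cascade of threshold tests can be written as below; the thresholds follow the example values in the text (with the second value, printed as "03.6", read as 3.6 here) and would be tuned in practice:

```python
def vad_flag(spectral_energy, zcr, pitch, centroid,
             e_min=-9.0, zcr_max=3.6, pitch_max=143, pitch_min=8, centroid_max=3.0):
    """Cascaded fusion judgment on the four features; returns the VAD flag bit."""
    if spectral_energy < e_min:
        return 0
    if zcr > zcr_max:
        return 0
    if pitch > pitch_max or pitch < pitch_min:
        return 0
    if centroid > centroid_max:
        return 0
    return 1   # 1 -> speech frame, 0 -> noise frame
```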
  • Further, the process in S130 of performing noise removal processing on the time-domain microphone signal and the time-domain bone conduction signal may specifically be:
  • performing noise removal processing on the time-domain microphone signal through the pre-established DNN noise removal model to obtain the noise-removed time-domain microphone signal;
  • performing noise removal processing in the frequency domain on the time-domain bone conduction signal to obtain the noise-removed time-domain bone conduction signal.
  • It can be seen that in this embodiment of the present application the time-domain microphone signal is picked up by the microphone and the time-domain bone conduction signal is collected by the bone voiceprint sensor; then, by judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals, it can be determined whether the user is speaking at the current moment.
  • When they are speech signals, noise removal is further performed on the time-domain microphone signal through the pre-established DNN noise removal model and noise removal in the frequency domain is performed on the time-domain bone conduction signal, so as to better remove the background noise.
  • The noise-removed time-domain microphone signal is then high-pass filtered to obtain the first output time-domain signal of the high-frequency part, and the noise-removed time-domain bone conduction signal is low-pass filtered to obtain the second output time-domain signal of the low-frequency part; according to the first output time-domain signal and the second output time-domain signal, an output time-domain signal containing both the high-frequency part and the low-frequency part can be obtained. The present application can better remove background noise, which helps improve the sound quality and the user experience.
  • an embodiment of the present application further provides a voice enhancement apparatus, for details, please refer to FIG. 3 .
  • the device includes:
  • an acquisition module 21 for acquiring the time-domain microphone signal and the time-domain bone conduction signal at the current moment
  • the judgment module 22 is used for judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals, if so, triggering the noise reduction module 23; if not, triggering the zero-setting module 24;
  • the noise reduction module 23 is used to perform noise removal processing on the time-domain microphone signal through the pre-established DNN noise removal model to obtain a time-domain microphone signal after noise removal, and is used for performing frequency-domain noise removal processing on the time-domain bone conduction signal. Obtain the time-domain bone conduction signal after noise removal;
  • the zero-setting module 24 is used to set the output signal corresponding to the current moment to zero;
  • the filtering module 25 is configured to perform high-pass filtering processing on the time-domain microphone signal after noise removal to obtain a first output time-domain signal, and perform low-pass filtering processing on the noise-eliminated time-domain bone conduction signal to obtain a second output time domain signal;
  • the fusion module 26 is configured to obtain an output time domain signal corresponding to the current moment according to the first output time domain signal and the second output time domain signal.
  • It should be noted that the speech enhancement apparatus provided in this embodiment of the present application has the same beneficial effects as the speech enhancement method provided in the above embodiments; for a detailed description of the speech enhancement method involved in this embodiment, reference may be made to the above embodiments, which will not be repeated here.
  • On the basis of the above embodiments, the embodiments of the present application further provide a speech enhancement system, which includes:
  • a memory for storing a computer program; and
  • a processor configured to implement the steps of the above speech enhancement method when executing the computer program.
  • The processor in this embodiment of the present application may be specifically configured to: receive the time-domain microphone signal and the time-domain bone conduction signal at the current moment, where the time-domain microphone signal is picked up by a microphone and the time-domain bone conduction signal is collected by a bone voiceprint sensor; judge whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals;
  • if so, perform noise removal processing on the time-domain microphone signal through the pre-established DNN noise removal model to obtain a noise-removed time-domain microphone signal, and perform noise removal processing in the frequency domain on the time-domain bone conduction signal to obtain a noise-removed time-domain bone conduction signal; if not, set the output signal corresponding to the current moment to zero;
  • perform high-pass filtering on the noise-removed time-domain microphone signal to obtain a first output time-domain signal, and perform low-pass filtering on the noise-removed time-domain bone conduction signal to obtain a second output time-domain signal;
  • and obtain the output time-domain signal corresponding to the current moment according to the first output time-domain signal and the second output time-domain signal.
  • the embodiments of the present application also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned speech enhancement method are implemented.
  • The computer-readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurosurgery (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A speech enhancement method, apparatus and system, and a computer-readable storage medium. The method comprises: acquiring a time-domain microphone signal and a time-domain bone conduction signal at the current moment (S110); judging whether the time-domain microphone signal and the time-domain bone conduction signal are speech signals (S120); if so, performing noise removal processing on the time-domain microphone signal through a pre-established DNN noise removal model, and performing noise removal processing in the frequency domain on the time-domain bone conduction signal (S130); if not, setting the output signal corresponding to the current moment to zero (S140); performing high-pass filtering on the noise-removed time-domain microphone signal to obtain a first output time-domain signal, and performing low-pass filtering on the noise-removed time-domain bone conduction signal to obtain a second output time-domain signal (S150); and obtaining an output time-domain signal corresponding to the current moment according to the first output time-domain signal and the second output time-domain signal (S160). The method can better remove background noise, which helps improve the sound quality and the user experience.

Description

一种语音增强方法、装置、系统及计算机可读存储介质
本申请要求于2021年01月28日提交中国专利局、申请号202110119855.6、申请名称为“一种语音增强方法、装置、系统及计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及语音处理技术领域,特别是涉及一种语音增强方法、装置、系统及计算机可读存储介质。
背景技术
语音增强是解决噪声污染的有效方法,因此被广泛的用于数字移动电话、汽车中Hands-free电话系统、远距离电话会议(teleconferencing)、为听力障碍者降低背景干扰等等民用和军用场合。语音增强的主要目标就是在接收端尽可能从带噪语音信号中提取纯净的语音信号,降低听众的听觉疲劳程度,提高可懂度。
在正常情况下,如图1所示声波可以通过两条路径传入内耳:空气传导和骨传导。空气传导是我们所熟知的声波经耳廓由外耳道传递到中耳,再经听骨链传到内耳,语音频谱成份比较丰富。由于环境噪声的影响,经过空气传导的语音信号不可避免受到噪声的污染。
骨传导是指声波通过颅骨、颌骨等的振动传到内耳,在骨传导中声波无需经过外耳和中耳也可以传递到内耳。骨声纹传感器只能采集与骨导麦克风直接接触并产生振动的信息,理论上不能采集通过空气传播的语音,不受环境噪声的干扰,非常适用于噪声环境下的语音传输。但由于工艺影响,骨声纹传感器只能采集并传送较低频率的语音信号,导致语音听起来比较沉闷,影响音质及用户体验。
鉴于此,如何提供一种解决上述技术问题的语音增强方法、装置、系统及计算机可读存储介质成为本领域技术人员需要解决的问题。
发明内容
本申请实施例的目的是提供一种语音增强方法、装置、系统及计算机可读存储介质,在使用过程中可以使输出的声音信号更加好听,提高声音的音质,提升用户体验。
为解决上述技术问题,本申请实施例提供了一种语音增强方法,包括:
获取当前时刻的时域麦克风信号和时域骨导信号;
判断所述时域麦克风信号和所述时域骨导信号是否为语音信号,若是,则通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;若否,则将与所述当前时刻对应的输出信号置为零;
对所述经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对所述经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;
依据所述第一输出时域信号和所述第二输出时域信号,得到与所述当前时刻对应的输出时域信号。
可选的,所述对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号的过程为:
将所述时域骨导信号通过时频转换,转换为频域骨导信号;
对所述频域骨导信号进行频域的噪声消除处理,得到经噪声消除后的频域骨导信号;
判断所述经噪声消除后的频域骨导信号的带宽是否达到预设带宽,若达到,则直接对所述经噪声消除后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号;若不满足,则采用预先建立的DNN带宽扩展模型对所述经噪声消除后的频域骨导信号进行带宽扩展,使扩展后的带宽达到所述预设带宽,并将所述扩展后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号。
可选的,所述通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理,得到消除噪声后的时域麦克风信号的过程为:
对所述时域麦克风信号进行时频变换,得到对应的频域麦克风信号;
提取所述频域麦克风信号的第一信号特征,并采用预先建立的DNN噪声消除模型对所述第一信号特性进行处理,得到与所述频域麦克风信号的各个第一频率点分别对应的第一增益;
计算所述频域麦克风信号中与每个所述第一频率点对应的频谱信号与对应的第一增益的乘积,得到与每个所述第一频率点各自对应的、消除噪声后的频谱信号,以得到消除噪声后的频域麦克风信号;
将所述消除噪声后的频域麦克风信号进行时频反变换,得到消除噪声后的时域麦克风信号。
可选的,所述判断所述时域麦克风信号和所述时域骨导信号是否为语音信号的过程为:
对所述时域骨导信号进行语音激活检测,以判断所述时域骨导信号是否为语音信号;
当所述时域骨导信号为语音信号时,所述时域麦克风信号为语音信号。
可选的,所述对所述时域骨导信号进行语音激活检测,判断所述时域骨导信号是否为语音信号的过程为:
计算所述时域骨导信号对应的过零率及基音周期;
对所述时域骨导信号进行时频变换,得到频域骨导信号;
计算所述频域骨导信号对应的频谱能量及谱质心;
对所述过零率、所述基音周期、所述频谱能量及所述谱质心进行融合判断,并得到与所述时域骨导信号对应的语音激活检测标记位;
依据所述语音激活检测标记位判断所述时域骨导信号是否为语音信号。
可选的,所述对所述过零率、所述基音周期、所述频谱能量及所述谱质心进行融合判断,并得到与所述时域骨导信号对应的语音激活检测标记位的过程为:
判断所述频谱能量是否小于第一预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0;若否,则进入下一步判断;
判断所述过零率是否大于第二预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0,若否,则进入下一步判断;
判断所述基音周期是否大于第三预设值或小于第四预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0;否则,进入下一步判断;
判断所述谱质心是否大于第五预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0;否则,则与所述时域骨导信号对应的语音激活检测标记位为1;
则,所述依据所述语音激活检测标记位判断所述时域骨导信号是否为语音信号的过程为:
当所述语音激活检测标记位为1时,所述时域骨导信号为语音信号;
当所述语音激活检测标记位为0时,所述当前的时域骨导信号为噪声信号。
可选的,所述依据所述第一输出时域信号和所述第二输出时域信号,得到与所述当前时刻对应的输出时域信号的过程为:
依据第一权重系数和第二权重系数对所述第一输出时域信号和所述第二输出时域信号进行融合,得到融合后的时域信号;
对融合后的时域信号进行动态调整,使调整后的时域信号在预设范围内,并将调整后的时域信号作为与所述当前时刻对应的输出时域信号。
本申请实施例还提供了一种语音增强装置,包括:
获取模块,用于获取当前时刻的时域麦克风信号和时域骨导信号;
判断模块,用于判断所述时域麦克风信号和所述时域骨导信号是否为语音信号,若是,则触发降噪模块;若否,则触发置零模块;
所述降噪模块,用于通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,用于对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;
所述置零模块,用于将与所述当前时刻对应的输出信号置为零;
滤波模块,用于对所述经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对所述经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;
融合模块,用于依据所述第一输出时域信号和所述第二输出时域信号,得到与所述当前时刻对应的输出时域信号。
本申请实施例还提供了一种语音增强系统,包括:
存储器,用于存储计算机程序;
处理器,用于执行所述计算机程序时实现如上述所述语音增强方法的步骤。
本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如上述所述语音增强方法的步骤。
本申请实施例提供了一种语音增强方法、装置、系统及计算机可读存储介质,该方法通过拾取时域麦克风信号和时域骨导信号,然后通过判断时域麦克风信号和时域骨导信号是否为语音信号,可以确定出当前时刻是否是用户在讲话,当是语音信号时进一步通过预先建立的DNN噪声消除模型对时域麦克风信号进行噪声消除处理,并对时域骨导信号进行频域的噪声消除处理,从而较好的消除背景噪声,再对经噪声消除后的时域麦克风信号进行高通滤波后得到高频部分的第一输出时域信号,对经噪声消除后的时域骨导信号进行低通滤波处理后,得到低频部分的第二输出时域信号,然后根据第一输出时域信号和第二输出时域信号即可得到既包含高频部分又包含低频部分的输出时域信号;本申请能够较好的消除背景噪声,有利于提高声音的音质,提升用户体验。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对现有技术和实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为现有的骨传导原理示意图;
图2为本申请实施例提供的一种语音增强方法的流程示意图;
图3为本申请实施例提供的一种语音增强装置的结构示意图。
具体实施方式
本申请实施例提供了一种语音增强方法、装置、系统及计算机可读存储介质,在使用过程中可以使输出的声音信号更加好听,提高声音的音质,提升用户体验。
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
请参照图2,图2为本申请实施例提供的一种语音增强方法的流程示意图。该方法包括:
S110:获取当前时刻的时域麦克风信号和时域骨导信号;
具体的,在实际应用中可以通过麦克风拾取时域麦克风信号,通过骨声纹传感器采集时域骨导信号,并对每一时刻所获取的时域麦克风信号和时域骨导信号均采用本申请实施例所提供的语音增强方法进行处理。
S120:判断时域麦克风信号和时域骨导信号是否为语音信号,若是,则进入S130;若否,则进入S140;
需要说明的是,在获取当前时刻的时域麦克风信号和时域骨导信号后,可以判断时域麦克风信号和时域骨导信号是不是语音信号,其中,由于时域骨导信号能够准确的反应当前是不是用户在说话,因此通过判断时域骨导信号是不是语音信号,能够进一步确定当前时刻麦克风拾取到的时域麦克风信号是否为语音信号,也即,当确定出当前时刻的时域骨导信号为语音信号后,由于时域麦克风信号和时域骨导信号是同一时刻采集的信号,因此当前时刻的时域麦克风信号也是语音信号,则说明当前时刻的时域麦克风信号也是语音信号,当确定出当前时刻的时域骨导信号为噪声信号后,则说明当前时刻的时域麦克风信号也是噪声信号。
S130:通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;
需要说明的是,本实施例中为了更好地消除噪声,可以预先建立DNN噪声消除模型,然后采用该DNN噪声消除模型对时域麦克风信号进行噪声消除处理,其中,DNN噪声消除模型 的建立过程为:
实际录取时域噪声信号n'和时域麦克风语音信号s,计算时域噪声信号n'和时域麦克风语音信号s的混合信号s_mix,将时域噪声信号n'、时域麦克风语音信号以及混合信号分别做时频变换(如FFT),得到的频域信号分别为N'(k),S(k)和S_mix(k),其中,k为频域序号。再对S_mix(k)进行特征提取,计算第一特征参数。
将时域麦克风语音信号s以及混合信号s_mix在频域上分别划分为多个第一子带(如18个第一子带),第一子带划分的方式可以采用mel频率的划分方式也可以采用bark子带的划分方式,具体采用哪种方式可以根据实际需要进行确定。
划分完成后，计算各个子带上的语音信号能量和混合信号能量，其中，语音信号能量根据 E_s(b)=∑_{k∈b}|S(k)|² 进行计算，混合信号能量根据 E_mix(b)=∑_{k∈b}|S_mix(k)|² 进行计算，其中，b表示子带序号，b=0,1,...,18；
然后计算第一子带增益，具体可以根据 g(b)=sqrt(E_s(b)/E_mix(b)) 进行计算，其中，g(b)表示第b个第一子带的增益。
具体的,深度神经网络DNN噪声消除模型的训练的过程中,将上述计算出的真实混合信号的第一特征参数为输入信号,将上述计算得到的真实的第一子带增益g作为输出信号,不断训练和调整深度神经网路中的权重系数W、U及偏置,使每次输出的第一增益g'不断接近真实的第一增益值g。当g'和g的误差小于对应的预设值后,网络训练成功,并依据此时的网络参数得到最终的DNN噪声消除模型。
另外,上述判断时域骨导信号是否为语音信号,并确定出时域骨导信号不是语言信号后,该方法还可以包括:
依据时域骨导信号对骨导噪声信号功率谱进行更新；具体的，将时域骨导信号通过时频转换，转换为频域骨导信号，然后可以根据计算关系式P_n(k,t)=β*P_n(k,t-1)+(1-β)*|Y(k,t)|²对骨导噪声信号功率谱进行更新，其中，P_n(k,t)表示t时刻骨导传感器接收到的噪声信号的功率，P_n(k,t-1)表示t-1时刻骨导传感器接收到的噪声信号的功率，Y(k,t)表示t时刻的第k个频域骨导信号，k表示频域序号，β表示迭代因子，β具体可以为0.9，当然，β的具体数值可以根据实际需要进行确定，本实施例不做特殊限定。
则相应的,上述对时域骨导信号进行频域的噪声消除处理,得到噪声消除后的时域骨导信号的过程,具体可以为:
依据计算关系式 Ŷ_t(k)=H_t(k)·Y_t(k)（其中增益函数可取 H_t(k)=1−λ/γ_t(k) 的形式）对频域骨导信号进行噪声消除，得到消除后的频域骨导信号，其中，Y_t(k)表示t时刻的频谱信号，Ŷ_t(k)表示经过噪声消除后的频谱信号，H_t(k)表示增益函数，λ表示过减因子，λ为常数（例如为0.9），γ_t(k)表示后验信噪比。
S140:将与当前时刻对应的输出信号置为零;
具体的,当确定出当前时刻的时域骨导信号为噪声信号后,对应的时域麦克风信号也是噪声信号,因此可以直接将与当前时刻对应的输出信号置为零。
S150:对经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;
需要说明的是,由于麦克风采集到的声音信号中高频比较丰富,骨导传感器采集到的声音信号中低频比较清晰完整,因此,本申请实施例可以对经噪声消除后的时域麦克风信号进行高通滤波处理,得到高频部分的第一输出时域信号,对经噪声消除后的时域骨导信号进行低通滤波处理,得到低频部分的第二输出时域信号。
S160:依据第一输出时域信号和第二输出时域信号,得到与当前时刻对应的输出时域信号。
具体的,本申请可以将第一输出时域信号和第二输出时域信号进行融合,具体可以预先确定与第一输出时域信号对应的第一权重系数k1,以及与第二输出时域信号对应的第二权重系数k2,然后通过各自的权重系数求和得到融合后的时域信号,具体可以通过out=k1*out1+k2*out2计算关系式得到融合后的时域信号out,其中,out1为第一输出时域信号,out2为第二输出时域信号。
另外,为了避免融合后的时域信号溢出,还可以对融合后的时域信号进行动态调整,将过大的信号进行压缩,将过小的信号适当放大,从而防止信号溢出,然后将调整后的时域信号作为与当前时刻对应的输出时域信号。
进一步的,对时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号的过程,具体可以为:
将时域骨导信号通过时频转换,转换为频域骨导信号;
对频域骨导信号进行频域的噪声消除处理,得到经噪声消除后的频域骨导信号;
判断经噪声消除后的频域骨导信号的带宽是否达到预设带宽,若达到,则直接对经噪声消除后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号;若不满足,则采用预先建立的DNN带宽扩展模型对经噪声消除后的频域骨导信号进行带宽扩展,使扩展后的带宽达到预设带宽,并将扩展后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号。
需要说明的是,上述在得到经噪声消除后的频域骨导信号后,还可以进一步判断噪声消除后的频域骨导信号的带宽是否达到预设带宽(预设带宽可以为1kHz),若达到,则直接对噪声消除后的频域骨导信号做时频反变换,得到噪声消除后的时域骨导信号;若不满足预设带宽,则可以采用预先建立的DNN带宽扩展模型对经噪声消除后的频域骨导信号进行带宽扩展,将其扩展后的带宽达到预设带宽即可,然后在将扩展后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号。
其中,DNN带宽扩展模型的建立过程为:
实际获取降噪后残留的骨导噪声信号n g和骨导语音信号s g,计算骨导噪声信号n g和骨导语音信号s g的混合信号s g_mix,将骨导噪声信号n g、骨导语音信号s g以及骨导混合信号s g_mix分别做时频变换(如FFT),得到频域信号N g(k),S g(k)和S g_mix(k),再对N g(k),S g(k)和S g_mix(k)分别进行特征提取,计算各自的第二特征参数。
同样将骨导语音信号s g以及混合信号s g_mix在频域上划分为多个第二子带(如5个第二子带),第二子带划分的方式可以采用mel频率的划分方式也可以采用bark子带的划分方式,具体采用哪种方式可以根据实际需要进行确定;计算各个第二子带上的骨导语音信号能量和骨导混合信号能量:
其中，骨导语音信号能量可以采用计算关系式 E_sg(b')=∑_{k∈b'}|S_g(k)|² 进行计算，骨导混合信号能量根据 E_sg_mix(b')=∑_{k∈b'}|S_g_mix(k)|² 进行计算，b'表示第二子带序号，b'=0,1,...,5；
然后计算第二子带增益，具体可以根据 g(b')=sqrt(E_sg(b')/E_sg_mix(b')) 进行计算，其中，g(b')表示第b'个第二子带的增益。
具体的,深度神经网络DNN带宽扩展模型的训练过程中,将上述计算得到的真实的第二特征参数作为输入信号,将上述计算得到的真实的第二子带增益g作为输出信号,不断训练和调整深度神经网路中的权重系数W、U偏置,使每次输出的第二增益不断接近真实值。当输出的第二增益和真实值的误差小于对应的预设值后,网络训练成功,并依据此时的网络 参数得到最终的DNN带宽扩展模型。
具体的,采用预先建立的DNN带宽扩展模型对经噪声消除后的频域骨导信号进行带宽扩展的过程,具体可以为:对频域骨导信号进行特征提取,得到第二信号特征;采用上述预先建立的DNN带宽扩展模型对第二信号特征进行处理,得到与频域骨导信号的各个第二频域点分别对应的第二增益;
计算频域骨导信号中与每个第二频率点各自对应的频谱信号与对应的第二增益的乘积,得到与每个第二频率点各自对应的、消除噪声后的频谱信号,以得到消除噪声后的频域骨导信号。进一步的,通过预先建立的DNN噪声消除模型对时域麦克风信号进行噪声消除处理,得到消除噪声后的时域麦克风信号的过程,具体可以为:
对时域麦克风信号进行时频变换,得到对应的频域麦克风信号;
提取频域麦克风信号的第一信号特征,并采用预先建立的DNN噪声消除模型对第一信号特性进行处理,得到与频域麦克风信号的各个第一频率点分别对应的第一增益;
计算频域麦克风信号中与每个第一频率点对应的频谱信号与对应的第一增益的乘积,得到与每个第一频率点各自对应的、消除噪声后的频谱信号,以得到消除噪声后的频域麦克风信号;
将消除噪声后的频域麦克风信号进行时频反变换,得到消除噪声后的时域麦克风信号。
进一步的,上述S120中判断时域骨导信号是否为语音信号的过程,具体可以为:
对时域骨导信号进行语音激活检测,以判断时域骨导信号是否为语音信号。
其中,上述对时域骨导信号进行语音激活检测,判断时域骨导信号是否为语音信号的过程,具体可以为:
计算时域骨导信号对应的过零率及基音周期;
对时域骨导信号进行时频变换,得到频域骨导信号;具体可以采用FFT快速傅里叶变换对时域骨导信号进行处理得到频域骨导信号;
计算频域骨导信号对应的频谱能量及谱质心;
对过零率、基音周期、频谱能量及谱质心进行融合判断,并得到与时域骨导信号对应的语音激活检测标记位;
依据语音激活检测标记位判断时域骨导信号是否为语音信号。
具体的,上述计算时域骨导信号对应的过零率的过程为:
根据第一计算关系式,计算时域骨导信号对应的过零率,其中第一计算关系式为:
Z_n = (1/2)·∑_{m=m1+1}^{m2} |sgn[x(m)] − sgn[x(m−1)]|
其中,Z n表示过零数,x(m)表示与时间变量m对应的时域信号,x(m-1)表示与时间变量m-1对应的时域信号,x(n)表示与时间变量n对应的时域信号,x(n-1)表示与时间变量n-1对应的时域信号;n≤N,N表示当前时域信号x(n)的长度;
sgn[x(n)] = 1，x(n) ≥ 0；sgn[x(n)] = −1，x(n) < 0；
ZCR=Z n/(m2-m1+1),其中,ZCR表示过零率,m1表示当前帧时域信号列中第m1个点,m2表示当前帧时域信号中第m2个点。
上述计算时域骨导信号对应的基音周期的过程为:
自相关函数为:
R_m = ∑_{n=0}^{N−1−m} x(n)·x(n+m)
其中,R m表示语音信号自相关函数,x(n+m)表示与时间变量n+m对应的时域信号;
基音周期为:Pitch=max{R m},其中,Pitch表示基音周期。
上述计算频域骨导信号对应的频谱能量的过程为:
具体的,对于指定带宽的频谱能量,如时域骨导信号经FFT快速傅里叶变换后,将8khz带宽分为128个子带,取低24子带能量:
E_g = log(∑_{j=1}^{24} |Y(j)|²)
其中,E g表示低24子带的对数能量,j表示低24子带序号,Y(j)表示频域信号,其中,低24子带指的是从128个子带中按照从低频到高频取24个子带。
上述计算频域骨导信号对应的谱质心的过程为:
brightness = ∑_{k=1}^{U} f(k)·E(k) / ∑_{k=1}^{U} E(k)
E(k)=|Y(k)|²，其中，brightness表示谱质心，f(k)表示第k个频率点的频率，E(k)表示第k个频率点的频谱能量，U表示频率点数。
更进一步的,上述对过零率、基音周期、频谱能量及谱质心进行融合判断,并得到与时域骨导信号对应的语音激活检测标记位的过程,具体可以为:
判断频谱能量是否小于第一预设值,若是,则与时域骨导信号对应的语音激活检测标记位为0;若否,则进入下一步判断;
判断过零率是否大于第二预设值,若是,则与时域骨导信号对应的语音激活检测标记位为0,若否,则进入下一步判断;
判断基音周期是否大于第三预设值或小于第四预设值,若是,则与时域骨导信号对应的语音激活检测标记位为0;否则,进入下一步判断;
判断谱质心是否大于第五预设值,若是,则与时域骨导信号对应的语音激活检测标记位为0;否则,则与时域骨导信号对应的语音激活检测标记位为1;
需要说明的是,在实际应用中第一预设值可以为-9,第二预设值可以为03.6,第三预设值可以为143,第四预设值可以为8,第五预设值可以为3,当然,每个预设值的具体数值可以根据实际情况进行确定,本实施例不做特殊限定。
则,相应的上述依据语音激活检测标记位判断时域骨导信号是否为语音信号的过程,具体可以为:
当语音激活检测标记位为1时,时域骨导信号为语音信号;
当语音激活检测标记位为0时,当前的时域骨导信号为噪声信号。
进一步的,上述S130中对时域麦克风信号以及时域骨导信号进行噪声消除处理的过程,具体可以为:
通过预先建立的DNN噪声消除模型,对时域麦克风信号进行噪声消除处理,得到消除噪声后的时域麦克风信号;
对时域骨导信号进行频域的噪声消除处理,得到噪声消除后的时域骨导信号。
可见,本申请实施例通过麦克风拾取时域麦克风信号,通过骨声纹传感器采集时域骨导信号,然后通过判断时域麦克风信号和时域骨导信号是否为语音信号,可以确定出当前时刻是否是用户在讲话,当是语音信号时进一步通过预先建立的DNN噪声消除模型对时域麦克风信号进行噪声消除处理,并对时域骨导信号进行频域的噪声消除处理,从而较好的消除背景噪声,再对经噪声消除后的时域麦克风信号进行高通滤波后得到高频部分的第一输出时域信号,对经噪声消除后的时域骨导信号进行低通滤波处理后,得到低频部分的第二输出时域信号,然后根据第一输出时域信号和第二输出时域信号即可得到既包含高频部分又包含低频部分的输出时域信号;本申请能够较好的消除背景噪声,有利于提高声音的音质,提升用户体验。
在上述实施例的基础上,本申请实施例还提供了一种语音增强装置,具体请参照图3。该装置包括:
获取模块21,用于获取当前时刻的时域麦克风信号和时域骨导信号;
判断模块22,用于判断时域麦克风信号和时域骨导信号是否为语音信号,若是,则触发降噪模块23;若否,则触发置零模块24;
降噪模块23,用于通过预先建立的DNN噪声消除模型对时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,用于对时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;
置零模块24,用于将与当前时刻对应的输出信号置为零;
滤波模块25,用于对经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;
融合模块26,用于依据第一输出时域信号和第二输出时域信号,得到与当前时刻对应的输出时域信号。
需要说明的是,本申请实施例中提供的语音增强装置具有与上述实施例中所提供的语音增强方法相同的有益效果,并且对于本实施例中所涉及到的语音增强方法的具体介绍请参照上述实施例,本申请在此不再赘述。
在上述实施例的基础上,本申请实施例还提供了一种语音增强系统,该系统包括:
存储器,用于存储计算机程序;
处理器,用于执行计算机程序时实现如上述语音增强方法的步骤。
需要说明的是,本申请实施例中的处理器具体可以用于实现接收当前时刻的时域麦克风信号和时域骨导信号,其中,时域麦克风信号为通过麦克风拾取的,时域骨导信号为通过骨声纹传感器采集的;判断时域麦克风信号和时域骨导信号是否为语音信号,若是,则通过预先建立的DNN噪声消除模型对时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,用于对时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;若否,则将与当前时刻对应的输出信号置为零;对经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;依据第一输出时域信号和第二输出时域信号,得到与当前时刻对应的输出时域信号。
在上述实施例的基础上,本申请实施例还提供了一种计算机可读存储介质,计算机可 读存储介质上存储有计算机程序,计算机程序被处理器执行时实现如上述语音增强方法的步骤。
该计算机可读存储介质可以包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
还需要说明的是,在本说明书中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的精神或范围的情况下,在其他实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (10)

  1. 一种语音增强方法,其特征在于,包括:
    获取当前时刻的时域麦克风信号和时域骨导信号;
    判断所述时域麦克风信号和所述时域骨导信号是否为语音信号,若是,则通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;若否,则将与所述当前时刻对应的输出信号置为零;
    对所述经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对所述经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;
    依据所述第一输出时域信号和所述第二输出时域信号,得到与所述当前时刻对应的输出时域信号。
  2. 根据权利要求1所述的语音增强方法,其特征在于,所述对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号的过程为:
    将所述时域骨导信号通过时频转换,转换为频域骨导信号;
    对所述频域骨导信号进行频域的噪声消除处理,得到经噪声消除后的频域骨导信号;
    判断所述经噪声消除后的频域骨导信号的带宽是否达到预设带宽,若达到,则直接对所述经噪声消除后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号;若不满足,则采用预先建立的DNN带宽扩展模型对所述经噪声消除后的频域骨导信号进行带宽扩展,使扩展后的带宽达到所述预设带宽,并将所述扩展后的频域骨导信号进行时频反变换,得到经噪声消除后的时域骨导信号。
  3. 根据权利要求1所述的语音增强方法,其特征在于,所述通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理,得到消除噪声后的时域麦克风信号的过程为:
    对所述时域麦克风信号进行时频变换,得到对应的频域麦克风信号;
    提取所述频域麦克风信号的第一信号特征,并采用预先建立的DNN噪声消除模型对所述第一信号特性进行处理,得到与所述频域麦克风信号的各个第一频率点分别对应的第一增益;
    计算所述频域麦克风信号中与每个所述第一频率点对应的频谱信号与对应的第一增益的乘积,得到与每个所述第一频率点各自对应的、消除噪声后的频谱信号,以得到消除噪声后的频域麦克风信号;
    将所述消除噪声后的频域麦克风信号进行时频反变换,得到消除噪声后的时域麦克风信号。
  4. 根据权利要求1所述的语音增强方法,其特征在于,所述判断所述时域麦克风信号和所述时域骨导信号是否为语音信号的过程为:
    对所述时域骨导信号进行语音激活检测,以判断所述时域骨导信号是否为语音信号;
    当所述时域骨导信号为语音信号时,所述时域麦克风信号为语音信号。
  5. 根据权利要求4所述的语音增强方法,其特征在于,所述对所述时域骨导信号进行语音激活检测,判断所述时域骨导信号是否为语音信号的过程为:
    计算所述时域骨导信号对应的过零率及基音周期;
    对所述时域骨导信号进行时频变换,得到频域骨导信号;
    计算所述频域骨导信号对应的频谱能量及谱质心;
    对所述过零率、所述基音周期、所述频谱能量及所述谱质心进行融合判断,并得到与所述时域骨导信号对应的语音激活检测标记位;
    依据所述语音激活检测标记位判断所述时域骨导信号是否为语音信号。
  6. 根据权利要求5所述的语音增强方法,其特征在于,所述对所述过零率、所述基音周期、所述频谱能量及所述谱质心进行融合判断,并得到与所述时域骨导信号对应的语音激活检测标记位的过程为:
    判断所述频谱能量是否小于第一预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0;若否,则进入下一步判断;
    判断所述过零率是否大于第二预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0,若否,则进入下一步判断;
    判断所述基音周期是否大于第三预设值或小于第四预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0;否则,进入下一步判断;
    判断所述谱质心是否大于第五预设值,若是,则与所述时域骨导信号对应的语音激活检测标记位为0;否则,则与所述时域骨导信号对应的语音激活检测标记位为1;
    则,所述依据所述语音激活检测标记位判断所述时域骨导信号是否为语音信号的过程为:
    当所述语音激活检测标记位为1时,所述时域骨导信号为语音信号;
    当所述语音激活检测标记位为0时,所述当前的时域骨导信号为噪声信号。
  7. 根据权利要求1所述的语音增强方法,其特征在于,所述依据所述第一输出时域信号和所述第二输出时域信号,得到与所述当前时刻对应的输出时域信号的过程为:
    依据第一权重系数和第二权重系数对所述第一输出时域信号和所述第二输出时域信号进行融合,得到融合后的时域信号;
    对融合后的时域信号进行动态调整,使调整后的时域信号在预设范围内,并将调整后的时域信号作为与所述当前时刻对应的输出时域信号。
  8. 一种语音增强装置,其特征在于,包括:
    获取模块,用于获取当前时刻的时域麦克风信号和时域骨导信号;
    判断模块,用于判断所述时域麦克风信号和所述时域骨导信号是否为语音信号,若是,则触发降噪模块;若否,则触发置零模块;
    所述降噪模块,用于通过预先建立的DNN噪声消除模型对所述时域麦克风信号进行噪声消除处理得到经噪声消除后的时域麦克风信号,并对所述时域骨导信号进行频域的噪声消除处理得到经噪声消除后的时域骨导信号;
    所述置零模块,用于将与所述当前时刻对应的输出信号置为零;
    滤波模块,用于对所述经噪声消除后的时域麦克风信号进行高通滤波处理,得到第一输出时域信号,对所述经噪声消除后的时域骨导信号进行低通滤波处理,得到第二输出时域信号;
    融合模块,用于依据所述第一输出时域信号和所述第二输出时域信号,得到与所述当前时刻对应的输出时域信号。
  9. 一种语音增强系统,其特征在于,包括:
    存储器,用于存储计算机程序;
    处理器,用于执行所述计算机程序时实现如权利要求1至7任一项所述语音增强方法的步骤。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至7任一项所述语音增强方法的步骤。
PCT/CN2021/103635 2021-01-28 2021-06-30 一种语音增强方法、装置、系统及计算机可读存储介质 WO2022160593A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/263,357 US20240079021A1 (en) 2021-01-28 2021-06-30 Voice enhancement method, apparatus and system, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110119855.6 2021-01-28
CN202110119855.6A CN112767963B (zh) 2021-01-28 2021-01-28 一种语音增强方法、装置、系统及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2022160593A1 true WO2022160593A1 (zh) 2022-08-04

Family

ID=75706467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103635 WO2022160593A1 (zh) 2021-01-28 2021-06-30 一种语音增强方法、装置、系统及计算机可读存储介质

Country Status (3)

Country Link
US (1) US20240079021A1 (zh)
CN (1) CN112767963B (zh)
WO (1) WO2022160593A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116904569A (zh) * 2023-09-13 2023-10-20 北京齐碳科技有限公司 信号处理方法、装置、电子设备、介质和产品

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767963B (zh) * 2021-01-28 2022-11-25 歌尔科技有限公司 一种语音增强方法、装置、系统及计算机可读存储介质
CN113727242B (zh) * 2021-08-30 2022-11-04 歌尔科技有限公司 一种在线拾音主电单元、方法及可穿戴设备
CN114038476A (zh) * 2021-11-29 2022-02-11 北京达佳互联信息技术有限公司 音频信号处理方法及装置
CN114822573A (zh) * 2022-04-28 2022-07-29 歌尔股份有限公司 语音增强方法、装置、耳机设备以及计算机可读存储介质
CN114582365B (zh) * 2022-05-05 2022-09-06 阿里巴巴(中国)有限公司 音频处理方法和装置、存储介质和电子设备
CN115662436B (zh) * 2022-11-14 2023-04-14 北京探境科技有限公司 音频处理方法、装置、存储介质及智能眼镜
CN115862656B (zh) * 2023-02-03 2023-06-02 中国科学院自动化研究所 一种骨传麦克风语音增强方法及装置、设备及存储介质
CN116030823B (zh) * 2023-03-30 2023-06-16 北京探境科技有限公司 一种语音信号处理方法、装置、计算机设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180040333A1 (en) * 2016-08-03 2018-02-08 Apple Inc. System and method for performing speech enhancement using a deep neural network-based signal
CN109767783A (zh) * 2019-02-15 2019-05-17 深圳市汇顶科技股份有限公司 语音增强方法、装置、设备及存储介质
CN110782912A (zh) * 2019-10-10 2020-02-11 安克创新科技股份有限公司 音源的控制方法以及扬声设备
CN110931031A (zh) * 2019-10-09 2020-03-27 大象声科(深圳)科技有限公司 一种融合骨振动传感器和麦克风信号的深度学习语音提取和降噪方法
CN111916101A (zh) * 2020-08-06 2020-11-10 大象声科(深圳)科技有限公司 一种融合骨振动传感器和双麦克风信号的深度学习降噪方法及系统
CN112017696A (zh) * 2020-09-10 2020-12-01 歌尔科技有限公司 耳机的语音活动检测方法、耳机及存储介质
CN112767963A (zh) * 2021-01-28 2021-05-07 歌尔科技有限公司 一种语音增强方法、装置、系统及计算机可读存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7447630B2 (en) * 2003-11-26 2008-11-04 Microsoft Corporation Method and apparatus for multi-sensory speech enhancement
CN107886967B (zh) * 2017-11-18 2018-11-13 中国人民解放军陆军工程大学 一种深度双向门递归神经网络的骨导语音增强方法
CN112017687B (zh) * 2020-09-11 2024-03-29 歌尔科技有限公司 一种骨传导设备的语音处理方法、装置及介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180040333A1 (en) * 2016-08-03 2018-02-08 Apple Inc. System and method for performing speech enhancement using a deep neural network-based signal
CN109767783A (zh) * 2019-02-15 2019-05-17 深圳市汇顶科技股份有限公司 语音增强方法、装置、设备及存储介质
CN110931031A (zh) * 2019-10-09 2020-03-27 大象声科(深圳)科技有限公司 一种融合骨振动传感器和麦克风信号的深度学习语音提取和降噪方法
CN110782912A (zh) * 2019-10-10 2020-02-11 安克创新科技股份有限公司 音源的控制方法以及扬声设备
CN111916101A (zh) * 2020-08-06 2020-11-10 大象声科(深圳)科技有限公司 一种融合骨振动传感器和双麦克风信号的深度学习降噪方法及系统
CN112017696A (zh) * 2020-09-10 2020-12-01 歌尔科技有限公司 耳机的语音活动检测方法、耳机及存储介质
CN112767963A (zh) * 2021-01-28 2021-05-07 歌尔科技有限公司 一种语音增强方法、装置、系统及计算机可读存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116904569A (zh) * 2023-09-13 2023-10-20 北京齐碳科技有限公司 信号处理方法、装置、电子设备、介质和产品
CN116904569B (zh) * 2023-09-13 2023-12-15 北京齐碳科技有限公司 信号处理方法、装置、电子设备、介质和产品

Also Published As

Publication number Publication date
CN112767963A (zh) 2021-05-07
CN112767963B (zh) 2022-11-25
US20240079021A1 (en) 2024-03-07

Similar Documents

Publication Publication Date Title
WO2022160593A1 (zh) 一种语音增强方法、装置、系统及计算机可读存储介质
AU771444B2 (en) Noise reduction apparatus and method
CN103871421B (zh) 一种基于子带噪声分析的自适应降噪方法与系统
WO2022052244A1 (zh) 耳机的语音活动检测方法、耳机及存储介质
US9064502B2 (en) Speech intelligibility predictor and applications thereof
US8842861B2 (en) Method of signal processing in a hearing aid system and a hearing aid system
US9532149B2 (en) Method of signal processing in a hearing aid system and a hearing aid system
WO2012142270A1 (en) Systems, methods, apparatus, and computer readable media for equalization
CN103238183A (zh) 噪音抑制装置
CN107680609A (zh) 一种基于噪声功率谱密度的双通道语音增强方法
CN103813251B (zh) 一种可调节去噪程度的助听器去噪装置和方法
CN110248300B (zh) 一种基于自主学习的啸叫抑制方法及扩声系统
WO2009123387A1 (en) Procedure for processing noisy speech signals, and apparatus and computer program therefor
WO2015085946A1 (zh) 语音信号处理方法、装置及服务器
WO2022256577A1 (en) A method of speech enhancement and a mobile computing device implementing the method
KR101715198B1 (ko) 가변 전력 예산을 이용한 음성 강화 방법
JP2007251354A (ja) マイクロホン、音声生成方法
Bhat et al. Smartphone based real-time super gaussian single microphone speech enhancement to improve intelligibility for hearing aid users using formant information
RU2589298C1 (ru) Способ повышения разборчивости и информативности звуковых сигналов в шумовой обстановке
Halawani et al. Speech enhancement techniques for hearing impaired people: Digital signal processing based approach
CN113593612B (zh) 语音信号处理方法、设备、介质及计算机程序产品
EP3837621B1 (en) Dual-microphone methods for reverberation mitigation
Shanmugapriya et al. A thorough investigation on speech enhancement techniques for hearing aids
CN113838471A (zh) 基于神经网络的降噪方法、系统、电子设备及存储介质
CN117912485A (zh) 语音频带扩展方法、降噪音频设备以及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922191

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18263357

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922191

Country of ref document: EP

Kind code of ref document: A1