WO2017075979A1 - Method and apparatus for processing a voice signal - Google Patents
Method and apparatus for processing a voice signal
- Publication number
- WO2017075979A1 (PCT/CN2016/083622)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/082—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/02—Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
Definitions
- the present invention relates to the field of terminal technologies, and in particular, to a method and an apparatus for processing a voice signal.
- Speech intelligibility refers to the percentage of the speech transmitted by a sound system that the listener understands. For example, if the sound system transmits 100 words and the user understands only 50 of them, the system's speech intelligibility is 50%. As portable mobile terminals grow progressively smaller, the maximum sound power a mobile terminal can output decreases accordingly, which in turn degrades speech intelligibility when the user communicates using the mobile terminal. Since speech intelligibility is an important indicator of mobile terminal performance, how a mobile terminal processes speech signals to improve intelligibility is key to its development.
- In the related art, an automatic gain control algorithm detects the broadcast signal to be played and amplifies the small-amplitude portions of it; the amplified broadcast signal is then converted into an electrical signal, and the electrical signal is transmitted to the speaker.
- Because the average fluctuation amplitude of a broadcast signal is much smaller than its peak amplitude, a speaker with a maximum rated output power of 1 watt, driven by a normal speech signal, typically reaches an average output of only about 10% of the rated power (that is, 0.1 W).
- If the amplitude of the electrical signal fed to the speaker is simply increased further, the large-amplitude portions of the broadcast signal overload the speaker, causing saturation distortion and reducing the intelligibility and clarity of the speech; if instead only the small-amplitude portions of the broadcast signal are amplified, the effective dynamic range of the broadcast signal is reduced, and speech intelligibility again does not improve significantly.
- an embodiment of the present invention provides a method and an apparatus for processing a voice signal.
- the technical solution is as follows:
- a method of processing a voice signal comprising:
- the adjusted speech signal is output.
- a processing apparatus for a voice signal comprising:
- at least one processor; and
- a memory wherein the memory stores program instructions that, when executed by the processor, configure the apparatus to perform the operations of:
- the adjusted speech signal is output.
- the frequency amplitude of the broadcast signal is automatically adjusted according to the frequency distribution of the noise signal and the broadcast signal, thereby significantly improving the speech intelligibility.
- FIG. 1 is a schematic diagram of an implementation environment involved in a method for processing a voice signal according to an embodiment of the present invention.
- FIG. 2 is a system architecture diagram of a method for processing a voice signal according to another embodiment of the present invention.
- FIG. 3 is a flowchart of a method for processing a voice signal according to another embodiment of the present invention.
- FIG. 4 is a flowchart of a method for processing a voice signal according to another embodiment of the present invention.
- FIG. 5 is a schematic diagram of a signal flow corresponding to a method for processing a voice signal according to another embodiment of the present invention.
- FIG. 6 is a flowchart of a method for processing a voice signal according to another embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of a device for processing a voice signal according to another embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a processing terminal of a voice signal according to another embodiment of the present invention.
- A voice instant messaging application is an application that can conduct VoIP calls or network audio conferences, and is widely installed on mobile terminals such as smartphones, tablets, notebook computers, and wearable electronic products. As the physical dimensions of these mobile terminals gradually shrink, the maximum sound power that the micro-speakers in the mobile terminal device can output also runs into a bottleneck.
- Existing electroacoustic amplification technology relies mainly on three components, a power amplifier, a speaker, and a sound chamber, to generate sound waves.
- Only when the physical size of the speaker and the sound chamber is on the order of the wavelength of the sound wave can the speaker of the mobile terminal device achieve electro-acoustic conversion at maximum efficiency.
- As portable mobile devices have become smaller, the size of a mobile terminal is now typically smaller than the wavelength of the sound waves it reproduces.
- For efficient electro-acoustic conversion at low frequencies, the size of the mobile terminal would need to be at least 1 meter; the miniaturization of the speaker therefore reduces the maximum sound power the mobile terminal can output.
- the currently used moving coil speakers need to reach a certain size and thickness to ensure that the diaphragm has sufficient space for movement.
- As the external dimensions of the mobile terminal decrease and the body becomes thinner, the overall acoustic design inside the mobile terminal is constrained by physical size, which limits the maximum sound power the mobile terminal can output.
- the voice instant messaging application installed in the mobile terminal generally runs on the operating system, and the volume control of the hardware can be implemented through an application program interface provided by the operating system.
- The current mainstream implementation is that the voice instant messaging application declares an audio configuration mode to the operating system, and the operating system sets up the relevant hardware. After configuration is completed, the application only needs to periodically write the data corresponding to the broadcast signal to the operating system's playback API, and read the recorded data from the operating system's recording API.
- The types of audio configuration modes supported by the operating system are limited. These limited audio configuration modes are implemented by the mobile terminal manufacturer in the hardware firmware, and the application's control over the hardware output volume is constrained accordingly.
- Hardware vendors usually perform low-level audio optimization only for normal usage scenarios. For usage scenarios in extreme environments (such as heavy ambient noise), mobile terminal manufacturers generally do not optimize; for example, they generally do not provide a dedicated software interface that can increase the hardware output volume.
- Among common mobile terminals, the output volume from largest to smallest is: laptop, tablet, smartphone (hands-free mode), wearable device, and the like.
- However, the environmental noise faced by these kinds of mobile terminals trends in the opposite direction: laptops are used relatively often indoors, where the noise they encounter is mainly low-decibel indoor noise; tablets and smartphones are used more frequently outdoors and in public places, where high-decibel noise dominates; wearable devices stay with the user the longest and are exposed to the most varied and most complex noise scenes. As the external dimensions of mobile terminals become smaller, the problem of environmental noise becomes more and more prominent, seriously affecting the user's experience when communicating with a mobile terminal.
- The embodiment of the present invention provides a method for improving the speech intelligibility of a mobile terminal by processing the voice signal, without changing the hardware of the mobile terminal. With the method provided by the embodiment of the present invention, the user of the mobile terminal can hear the voice content of the call peer even in a noisy scene.
- FIG. 1 is a schematic diagram of the implementation environment of the method for processing a voice signal according to an embodiment of the present invention.
- the implementation environment includes three acoustic bodies of a mobile terminal P, a user U, and a noise source N, and further includes a sound output and input device speaker S and a microphone M.
- the mobile terminal P can be a mobile phone, a tablet computer, a notebook computer, a wearable device, etc., in which one or more voice instant messaging applications (Apps) are installed, and based on these voice instant messaging applications, the user can communicate with other users anytime and anywhere.
- the speaker S and the microphone M can be built in the mobile terminal, or can be connected to the mobile terminal in the form of an external device such as an external audio, an external speaker, a Bluetooth speaker, or a Bluetooth headset.
- the microphone M can pick up the sound in the entire scene, including: the noise emitted by the noise source N, the voice emitted by the user U when speaking, and the sound broadcast by the speaker S.
- The mobile terminal receives the voice signal to be played that is sent by the opposite end (hereinafter referred to as the broadcast signal, for differentiation); after the broadcast signal is processed, the speaker converts it into sound waves.
- The sound wave emitted by the noise source N is also transmitted through the air to the user U and perceived by the user; it may interfere with the user's perception of the broadcast signal, reducing the speech intelligibility of the mobile terminal.
- the present invention will utilize the psychoacoustic masking effect to solve the interference problem of the noise signal to the broadcast signal.
- Since neither the broadcast signal nor the noise signal is a single-frequency signal, each occupies its own frequency band, and their energy is unevenly distributed across frequency points.
- The frequency point of lowest energy in the noise signal can therefore be found; denote it f_weak.
- The energy of the broadcast signal is concentrated near f_weak for playback, without exceeding the output power of the speaker; at the same time, the energy of the broadcast signal at frequencies far from f_weak is attenuated to avoid overloading the speaker. In this way, at frequency points near f_weak, the noise signal is masked by the broadcast signal, and the user perceives the content of the broadcast signal.
- At frequency points far from f_weak, the broadcast signal is still masked by the noise signal.
- the enhanced broadcast signal masks the noise signal at part of the frequency, so that the noise no longer forms an overall mask on the broadcast signal, and the user can hear the content of the broadcast signal.
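The f_weak-based masking strategy above can be sketched in a few lines. This is an illustrative approximation only: the function name, the fixed attenuation and boost gains (0.5 and 2.0), and the neighborhood width are assumptions, and the patent instead derives its frequency emphasis coefficients from a speech intelligibility function described later.

```python
import numpy as np

def emphasize_near_weak_bin(noise_psd, broadcast_spec, width=1):
    """Boost the broadcast spectrum near the noise spectrum's weakest
    bin (f_weak) and attenuate it elsewhere, renormalizing so total
    broadcast power is unchanged and the speaker is not driven harder
    overall. Gains and neighborhood width are illustrative assumptions."""
    f_weak = int(np.argmin(noise_psd))            # lowest-energy noise bin
    gain = np.full(len(noise_psd), 0.5)           # attenuate far from f_weak
    lo, hi = max(0, f_weak - width), min(len(gain), f_weak + width + 1)
    gain[lo:hi] = 2.0                             # emphasize near f_weak
    shaped = broadcast_spec * gain
    scale = np.sqrt(np.sum(np.abs(broadcast_spec) ** 2) /
                    np.sum(np.abs(shaped) ** 2))  # keep total power equal
    return shaped * scale, f_weak
```

Renormalizing to equal total power reflects the constraint that the reshaped signal must not exceed the speaker's output capability while still redistributing energy toward the noise gap.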
- the system architecture includes a user U, a speaker S, a microphone M, and various functional modules.
- The functional modules include a signal detection and classification module, a spectrum estimation module, a loop transfer function calculation module, a speech intelligibility estimation module, and the like.
- the spectrum estimation module may specifically include a voice activation detection module, a noise power spectrum module, and an echo power spectrum module.
- the microphone M is used to pick up the ambient sound.
- the ambient sound is referred to as a recording signal (denoted as x), and the recording signal x is sent to the signal detection and classification module.
- The signal detection and classification module detects and classifies the recording signal and outputs three types of signals: the voice signal produced when user U speaks (denoted the near-end signal v), the noise signal generated by the noise source N (denoted the noise signal n), and the signal of the sound played by speaker S that is re-recorded by microphone M (denoted the echo signal e).
- The spectrum estimation module is configured to calculate the power spectrum of the noise signal, the power spectrum of the echo signal, and the power feature value of the near-end signal; the power spectrum of the noise signal is denoted P_n, the power spectrum of the echo signal P_e, and the power feature value of the near-end signal VAD_v.
- The loop transfer function calculation module is configured to calculate the transfer function on the path "emphasis filter - speaker - sound field - microphone" according to the broadcast signal y and the recording signal x picked up by the microphone; this transfer function is denoted H_loop.
- The speech intelligibility estimation module is configured to determine the speech intelligibility (denoted SII) based on H_loop, VAD_v, P_n, and P_e; the speech intelligibility is also used to calculate the frequency emphasis coefficient of the emphasis filter W.
- The purpose of processing the broadcast signal and the recording signal is to maximize the SII at the position of the user U's ear, not at the position of the microphone M.
- the method provided by this embodiment employs an approximation process.
- the length of the propagation path of the sound between the speaker S and the ear of the user U is represented by h1
- the length of the propagation path of the sound between the noise source N and the user's ear is h2.
- the length of the propagation path of the sound between the noise source N and the microphone M is represented by h3
- the length of the propagation path of the sound between the mouth of the user U and the microphone M is represented by h4
- The length of the propagation path of the sound between the microphone M and the speaker S is represented by h5.
- the problem of calculating the maximum speech intelligibility of the location of the user U can be converted into the maximum speech intelligibility problem of calculating the position of the microphone M.
- FIG. 3 is a flow chart showing a method of processing a voice signal according to an embodiment of the present invention. Referring to FIG. 3, the method provided in this embodiment includes:
- Acquire a recording signal and a voice signal: for example, collect a recording signal from the near end and receive a voice signal (that is, the broadcast signal) sent by the opposite end.
- the recording signal includes at least a noise signal and an echo signal.
- The method provided by the embodiment of the invention automatically adjusts the frequency amplitudes of the broadcast signal according to the frequency distributions of the noise signal and the broadcast signal, without destroying the dynamic amplitude of the original broadcast signal and while ensuring that the speaker is not overloaded, thereby significantly improving speech intelligibility.
- the loop transfer function is calculated based on the recorded signal and the broadcast signal, including:
- the loop transfer function is calculated based on the frequency domain cross-correlation function between the recording signal and the broadcast signal and the frequency domain autocorrelation function of the broadcast signal.
- The power spectrum of the recorded signal is calculated using the formula P_x = X(n).^2, where X(n) is the vector obtained by Fourier transforming the recording signal acquired at the n-th time, and .^2 squares each vector element of X(n).
- the power spectrum of the noise signal is obtained by subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal.
- Before calculating the square of the spectral estimate of the echo signal to obtain the power spectrum of the echo signal, the method further includes determining that:
- the power characteristic value of the recording signal is greater than the first threshold
- the power value of the broadcast signal is greater than the second threshold
- the power characteristic value of the echo signal is greater than the third threshold
- Before subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal to obtain the power spectrum of the noise signal, the method further includes:
- the power spectrum of the echo signal is subtracted from the power spectrum of the recorded signal to obtain a power spectrum of the noise signal.
- the frequency emphasis coefficient is calculated based on the power spectrum of the echo signal and the power spectrum of the noise signal, including:
- the frequency emphasis coefficient is obtained according to the maximum value of the speech intelligibility function.
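As a rough illustration of what "the maximum value of the speech intelligibility function" optimizes over, the following is a simplified SII-style measure: per-band SNR is clipped, mapped to an audibility score, and weighted by band importance. This is a generic stand-in for intelligibility measures of this family, not the patent's actual function, and all parameter values are assumptions.

```python
import numpy as np

def simplified_intelligibility(p_speech, p_noise, band_importance):
    """Per-band SNR in dB, clipped to [-15, +15] dB, mapped linearly
    to an audibility score in [0, 1], then weighted by band
    importance. Structurally similar to SII-style measures, but NOT
    the patent's exact speech intelligibility function."""
    snr_db = 10.0 * np.log10(p_speech / p_noise)
    audibility = (np.clip(snr_db, -15.0, 15.0) + 15.0) / 30.0
    return float(np.sum(band_importance * audibility))
```

A frequency emphasis coefficient search would then evaluate such a measure for candidate reweightings of the broadcast power spectrum, subject to the speaker's output power constraint, and keep the coefficients that maximize it.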
- FIG. 4 is a flow chart showing a method of processing a voice signal according to another embodiment of the present invention. Referring to FIG. 4, the method provided in this embodiment includes:
- the mobile terminal collects the recording signal from the near end and receives the broadcast signal sent by the opposite end.
- the near end is the current environment of the mobile terminal, and the mobile terminal collects the recording signal from the near end, including but not limited to: turning on the microphone, collecting the sound signal in the current environment through the microphone, and The sound signal collected by the microphone is used as a recording signal, and the recording signal includes a noise signal, an echo signal, and a near-end signal.
- the recording signal can be represented by x
- the noise signal can be represented by n
- the echo signal can be represented by e
- the near-end signal can be represented by v.
- the peer end collects the voice signal of the peer user through the microphone, processes the collected voice signal, and sends it to the mobile terminal through the network.
- The instant messaging application on the mobile terminal receives the voice signal sent by the opposite end and uses it as the broadcast signal.
- the peer end may be other mobile terminals that communicate with the mobile terminal through a voice instant messaging application.
- the broadcast signal can be represented by y.
- The microphone on the mobile terminal side collects the recording signal every preset time period; the microphone at the opposite end likewise collects a sound signal every preset time period, and the collected broadcast signal is sent to the mobile terminal.
- the preset duration may be 10 ms (milliseconds), 20 ms, 50 ms, and the like.
- The recording signal collected by the mobile terminal from the near end and the broadcast signal sent by the opposite end are both time-domain signals. For subsequent calculation, the method provided in this embodiment applies a Fourier transform (or a similar method) to the collected recording signal and the received broadcast signal separately, converting the time-domain recording signal into a frequency-domain recording signal and the time-domain broadcast signal into a frequency-domain broadcast signal.
- The recording signal in frequency-domain form is a column vector whose length equals the number of Fourier transform points used, and is denoted X; the broadcast signal in frequency-domain form is likewise a column vector of the same length, denoted Y.
- the obtained recording signal in the frequency domain form and the broadcast signal in the frequency domain form have the same dimension.
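The conversion described above can be sketched as follows. The sampling rate, frame duration, window, and FFT length are assumptions chosen for illustration; the text only states that the vector length equals the number of Fourier transform points, and mentions frame periods of 10, 20, or 50 ms.

```python
import numpy as np

FS = 16000        # assumed sampling rate (Hz)
FRAME_MS = 20     # assumed frame duration; the text mentions 10/20/50 ms
N_FFT = 512       # assumed number of Fourier transform points

def frame_to_spectrum(frame):
    """Turn one time-domain frame into a frequency-domain column vector
    whose length equals the FFT size, so the recording vector X and the
    broadcast vector Y have the same dimension for per-bin operations."""
    assert len(frame) == FS * FRAME_MS // 1000
    windowed = frame * np.hanning(len(frame))  # taper to reduce leakage
    return np.fft.fft(windowed, n=N_FFT)       # zero-padded to N_FFT bins
```

Applying the same transform to both the recording frame and the broadcast frame yields vectors X(n) and Y(n) of identical dimension, as required by the later per-frequency calculations.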
- the mobile terminal calculates a loop transfer function according to the recording signal and the broadcast signal.
- the mobile terminal acquires a frequency domain cross-correlation function between the recording signal and the broadcast signal.
- the cross-correlation function is used to indicate the degree of correlation between the two signals.
- When the mobile terminal acquires the frequency domain cross-correlation function between the recording signal and the broadcast signal, the following formula <1> can be used: r_xy = E[X(n) * Y(n)'], where r_xy is the frequency domain cross-correlation function between the recording signal and the broadcast signal, E[.] is the expectation operator, X(n) is the Fourier transform vector of the recording signal acquired at the n-th time, and Y(n) is that of the broadcast signal.
- the mobile terminal acquires a frequency domain autocorrelation function of the broadcast signal.
- the autocorrelation function is used to indicate the degree of correlation between the signal and the delayed signal of the signal.
- When the mobile terminal acquires the frequency domain autocorrelation function of the broadcast signal, the following formula <2> can be used: R_yy = E[Y(n) * Y(n)'], where R_yy is the frequency domain autocorrelation function of the broadcast signal, the symbol * indicates the matrix product operation, the symbol ' indicates the conjugate transpose operation, and Y(n) is the vector obtained by Fourier transforming the broadcast signal acquired at the n-th time.
- Based on the frequency domain cross-correlation function between the recording signal and the broadcast signal acquired in step 4021 above and the frequency domain autocorrelation function of the broadcast signal obtained in step 4022, the mobile terminal may apply the following formula <3> to calculate the loop transfer function: H_loop = R_yy^{-1} * r_xy, where H_loop is the loop transfer function and {.}^{-1} represents the matrix inversion operation.
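The estimation in formulas <1> to <3> can be sketched per frequency bin, approximating the expectation E[.] by an average over frames and neglecting cross-bin terms so that the matrix inversion reduces to a per-bin division. Both simplifications are assumptions made for illustration; the patent states the formulas in matrix form.

```python
import numpy as np

def estimate_loop_transfer(x_frames, y_frames, eps=1e-12):
    """Per-bin estimate of the loop transfer function H_loop.
    With expectations approximated by frame averages and cross-bin
    terms neglected:
        r_xy(k)   = mean_n[ X_n(k) * conj(Y_n(k)) ]
        R_yy(k)   = mean_n[ |Y_n(k)|^2 ]
        H_loop(k) = r_xy(k) / R_yy(k)
    """
    r_xy = np.mean(x_frames * np.conj(y_frames), axis=0)
    r_yy = np.mean(np.abs(y_frames) ** 2, axis=0)
    return r_xy / (r_yy + eps)   # eps guards bins with no broadcast energy
```

When the recording really is the broadcast signal filtered by the loop (x = H_loop * y per bin), this estimator recovers H_loop exactly in the noise-free case.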
- the mobile terminal acquires a power spectrum of the recorded signal and a power spectrum of the broadcast signal.
- The mobile terminal can calculate the power spectrum of the recorded signal by applying the following formula <4>: P_x = X(n).^2, where X(n) is the vector obtained by Fourier transforming the recording signal acquired at the n-th time, and .^2 squares each vector element of X(n); that is, if X(n) = {a_1, a_2, a_3, ..., a_n}, then P_x = {a_1^2, a_2^2, a_3^2, ..., a_n^2}.
- The mobile terminal can calculate the power spectrum of the broadcast signal by applying the following formula <5>: P_y = Y(n).^2, where Y(n) is the vector obtained by Fourier transforming the broadcast signal collected at the n-th time, and .^2 squares each vector element of Y(n); that is, if Y(n) = {b_1, b_2, b_3, ..., b_n}, then P_y = {b_1^2, b_2^2, b_3^2, ..., b_n^2}.
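Formulas <4> and <5> reduce to an element-wise squared magnitude of the complex spectrum, which can be written directly:

```python
import numpy as np

def power_spectrum(spec):
    """Element-wise squared magnitude of a frequency-domain vector:
    the .^2 operation of formulas <4> and <5>, applied to complex
    FFT bins."""
    return np.abs(spec) ** 2
```

The same function computes P_x from X(n) and P_y from Y(n), since both vectors have the same dimension.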
- the mobile terminal calculates an estimated value of the echo signal according to the loop transfer function and the broadcast signal.
- the mobile terminal can calculate the estimated value of the echo signal by applying the following formula (6):
- E(n) is an estimated value of the echo signal.
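Formula (6), together with the squaring later described in formula (9), can be sketched as below. This assumes the diagonal per-bin form of H_loop, so applying it to Y(n) reduces to an element-wise product; the function name is hypothetical.

```python
import numpy as np

def estimate_echo_spectrum(h_loop, y_frame):
    """Formula (6): echo spectrum estimate E(n) from H_loop and Y(n).

    With a per-bin H_loop, the application is an element-wise product.
    Also returns |E(n)|^2, the echo power spectrum of formula (9).
    """
    e = np.asarray(h_loop) * np.asarray(y_frame)  # E(n)
    p_e = np.abs(e) ** 2                          # echo power spectrum
    return e, p_e
```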
- the mobile terminal acquires a power feature value of the recorded signal, a power feature value of the broadcast signal, and a power feature value of the echo signal.
- the power characteristic value of the recorded signal is a measure of the power spectrum of the recorded signal, and can be obtained by processing the power spectrum of the recorded signal.
- the power characteristic value of the recorded signal can be represented by VAD_x.
- the power characteristic value of the broadcast signal is a measure of the power spectrum of the broadcast signal, and can be obtained by processing the power spectrum of the broadcast signal.
- the power feature value of the broadcast signal can be represented by VAD_y.
- the power characteristic value of the echo signal is a measure of the power spectrum of the echo signal.
- the power characteristic value of the echo signal can be represented by VAD_e.
- the power spectrum of the echo signal can be calculated from the spectral estimate of the echo signal, and the power spectrum of the echo signal is then processed to obtain the power characteristic value of the echo signal.
- the power spectrum of the echo signal calculated here is an estimate of the true power spectrum; whether it is adopted as the power spectrum of the echo signal needs to be further determined in step 406 below.
- the mobile terminal determines whether the power feature value of the recorded signal is greater than the first threshold, whether the power feature value of the broadcast signal is greater than the second threshold, and whether the power feature value of the echo signal is greater than a third threshold. If yes, step 407 is performed.
- the present embodiment applies a signal detection and classification module and a voice activity detection mechanism: according to the power characteristic value of the recording signal, the power characteristic value of the echo signal, and the power characteristic value of the broadcast signal, it distinguishes in time between the near-end signal (with background noise superimposed) and non-near-end signals, so as to obtain the power spectrum of the noise signal.
- the mobile terminal needs to determine whether the power feature value of the recorded signal is greater than the first threshold, whether the power feature value of the broadcast signal is greater than the second threshold, and whether the power feature value of the echo signal is greater than the third threshold.
- the first threshold, the second threshold, and the third threshold are preset thresholds.
- the first threshold may be represented by Tx
- the second threshold may be represented by Ty
- the third threshold may be represented by Te.
- the recording signal collected by the microphone of the mobile terminal may not contain a near-end signal.
- the following formula (8) can be used for the determination:
- VAD_v VAD_x (8)
- if the condition in formula (8) is satisfied, the recording signal collected by the microphone contains the near-end signal and the user is talking at this time; otherwise, the user is not talking.
- in the judging process, if it is determined that the power feature value of the recording signal is greater than the first threshold, the power feature value of the broadcast signal is greater than the second threshold, and the power feature value of the echo signal is greater than the third threshold, the following step 407 is performed;
- if the power feature value of the recorded signal is greater than the first threshold and the power feature value of the broadcast signal is greater than the second threshold but the power feature value of the echo signal is less than or equal to the third threshold, or if the power feature value of the recorded signal is greater than the first threshold and the power feature value of the broadcast signal is less than or equal to the second threshold, the acquired recording signal and broadcast signal are ignored.
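The threshold decisions of steps 406-409 can be sketched as a small classifier. This is an illustrative reading of the text, not the patent's exact implementation; it also folds in the symmetric check of step 408 described below, and the function and label names are assumptions.

```python
def classify_frame(vad_x, vad_y, vad_e, tx, ty, te):
    """Decide how the current recording/broadcast frame pair is used.

    vad_x, vad_y, vad_e: power feature values of the recording, broadcast,
    and echo signals; tx, ty, te: the preset thresholds Tx, Ty, Te.
    """
    if vad_x > tx and vad_y > ty and vad_e > te:
        return "update_echo"   # step 407: take |E(n)|^2 as the echo power spectrum
    if vad_x < tx and vad_e < te:
        return "update_noise"  # step 409: take P_x - P_e as the noise power spectrum
    return "ignore"            # discard this frame pair
```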
- the mobile terminal calculates a square of a spectrum estimation value of the echo signal as a power spectrum of the echo signal.
- the mobile terminal obtains the square of the spectrum estimation value of the echo signal.
- in the specific calculation of the power spectrum of the echo signal, the following formula (9) can be applied:
- P_e is the power spectrum of the echo signal.
- the mobile terminal determines whether the power feature value of the recorded signal is less than the first threshold, and whether the power feature value of the echo signal is less than a third threshold. If yes, step 409 is performed.
- the mobile terminal further determines whether the power feature value of the recorded signal is less than the first threshold, and whether the power feature value of the echo signal is less than a third threshold to obtain a power spectrum of the noise signal.
- if so, step 409 is performed; if it is determined that the power feature value of the recorded signal is less than the first threshold but the power feature value of the echo signal is greater than or equal to the third threshold, the acquired recording signal and broadcast signal are ignored.
- the mobile terminal subtracts the power spectrum of the echo signal from the power spectrum of the recorded signal as a power spectrum of the noise signal.
- the power spectrum of the noise signal is obtained by subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal; for the specific implementation, see the following formula (10):
- P_n is the power spectrum of the noise signal.
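Formula (10) is a per-bin spectral subtraction. The sketch below adds one safeguard that is not in formula (10) itself: clamping to zero, since estimation error in P_e could otherwise yield negative power values. The function name is illustrative.

```python
import numpy as np

def noise_power_spectrum(p_x, p_e):
    """Formula (10): P_n = P_x - P_e per frequency bin.

    The clamp to zero is an added safeguard (an assumption, not part of
    the patent's formula) against a slightly over-estimated P_e.
    """
    return np.maximum(np.asarray(p_x) - np.asarray(p_e), 0.0)
```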
- the mobile terminal calculates a frequency emphasis coefficient according to a power spectrum of the echo signal and a power spectrum of the noise signal.
- the mobile terminal constructs a speech intelligibility function according to a power spectrum of the echo signal and a power spectrum of the noise signal.
- there are multiple standards for the speech intelligibility function (SII).
- in this embodiment, the standard in ANSI S3.5 [4] is used for the calculation.
- the speech intelligibility function can be represented as a function with the power spectrum of the echo signal and the power spectrum of the noise signal as independent variables. Therefore, after the mobile terminal has calculated the power spectrum of the echo signal and the power spectrum of the noise signal, the speech intelligibility function can be constructed.
- the constructed speech intelligibility function can be found in the following formula (11):
- i_max is the total number of frequency bands after splitting, i is any band index within i_max, SII is the speech intelligibility function, Pe_i is the power spectrum of the echo signal in the i-th band, Pn_i is the power spectrum of the noise signal in the i-th band, Pu_i is the power spectrum of the standard speech intensity in the i-th band, I_i is the sub-band weighting weight, and Pd_i is an intermediate variable, which can be expressed by the following formula (12):
- f_k represents the kth frequency point in the i-th frequency band
- C_k is an intermediate variable, which can be expressed by the following formula (13):
- Pe_k is the power spectrum of the echo signal at the kth frequency point
- Pn_k is the power spectrum of the noise signal at the kth frequency point
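The band structure of formula (11) can be illustrated with a heavily simplified score in the spirit of ANSI S3.5. This is NOT the patent's exact formulas (11)-(13): it uses the common simplification where each band contributes an audibility term derived from its signal-to-noise ratio, clipped to [0, 1], weighted by the band-importance weight I_i, and summed over bands. All names here are assumptions.

```python
import numpy as np

def sii_sketch(pe_bands, pn_bands, band_weights):
    """Simplified per-band intelligibility score (not the exact SII).

    pe_bands, pn_bands: echo (speech) and noise power per band;
    band_weights: band-importance weights I_i, assumed to sum to 1.
    """
    snr_db = 10.0 * np.log10(np.asarray(pe_bands) / np.asarray(pn_bands))
    # Band audibility: 0 at -15 dB SNR, 1 at +15 dB SNR, linear between
    audibility = np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(np.asarray(band_weights) * audibility))
```

Raising echo (speech) power in heavily weighted bands raises the score, which is the quantity the frequency emphasis coefficient is chosen to maximize.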
- the mobile terminal calculates a maximum value of the speech intelligibility function, thereby obtaining a frequency emphasis coefficient.
- the frequency emphasis coefficient is a coefficient of the weighting filter in the mobile terminal, and is used to adjust the per-frequency amplitude of the broadcast signal output by the mobile terminal. At different times, the frequency emphasis coefficients calculated by the mobile terminal are different.
- the speech intelligibility function takes the power spectrum of the echo signal and the power spectrum of the noise signal as its independent variables.
- the method provided in this embodiment performs an approximate calculation: the power spectrum of the noise signal at the nth time is set to be approximately equal to the power spectrum of the noise signal at time n-1, so that when calculating the frequency emphasis coefficient at the nth time,
- the mobile terminal can directly use the power spectrum of the noise signal calculated at time n-1.
- the mobile terminal converts the speech intelligibility function into a function of the power spectrum of the echo signal as an independent variable.
- the mobile terminal will also use the emphasis filter to process the broadcast signal before it is played through the speaker, so as to increase the amplitude of the broadcast signal at the specified frequency points and thereby increase the energy of the broadcast signal.
- the sound power played by the speaker has a maximum value.
- this is therefore an extremum problem under constraints, which can be expressed by the following formula (14):
- Pe_i is the power spectrum of the echo signal before enhancement at the i-th frequency point
- Pe'_i is the power spectrum of the echo signal after enhancement at the i-th frequency point
- the signal processed by the emphasis filter is an electrical signal, and this electrical signal must be converted by the speaker into an acoustic wave. Since the output frequency responses of the speakers of different models of mobile terminal differ, obtaining the output frequency response of every mobile terminal's speaker would require measuring each speaker separately and applying correction compensation at run time, which leads to a hardware fragmentation issue. To avoid this problem, the method provided by this embodiment avoids direct measurement of the speaker's frequency response as follows.
- the mobile terminal adjusts a frequency amplitude of the broadcast signal based on the frequency emphasis coefficient.
- based on the determined frequency emphasis coefficient, the mobile terminal dynamically tracks and adjusts the speech intelligibility function, so as to automatically adapt to changes in the power spectrum of the noise signal P_n and the power spectrum of the echo signal P_e.
- the mobile terminal outputs the adjusted broadcast signal.
- in order to improve the accuracy of the broadcast signal output at the current time, the mobile terminal combines the broadcast signals output during the period before the current time with the corresponding frequency emphasis coefficients, and determines the broadcast signal output at the current time according to the following formula (17):
- z(n) is the output broadcast signal
- w(k) is the corresponding value of the frequency emphasis coefficient calculated at the nth time in the time domain
- K_max is equal to the order of the weighting filter W
- y(n-k) is the value of the broadcast signal at time n-k.
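Formula (17) is a time-domain FIR filter: the output sample is a weighted sum of current and past broadcast samples, with the emphasis-coefficient taps w(k) as weights. A minimal sketch, assuming the sum runs over k = 0 .. K_max-1 and that samples before the signal start are zero; the function name is illustrative.

```python
def apply_emphasis_filter(w, y, n):
    """Formula (17): z(n) = sum_{k} w(k) * y(n-k).

    w: time-domain taps of the emphasis filter W (length K_max);
    y: broadcast signal samples; n: index of the current sample.
    """
    acc = 0.0
    for k in range(len(w)):
        if n - k >= 0:          # samples before the signal start are zero
            acc += w[k] * y[n - k]
    return acc
```

In a real-time implementation this per-sample loop would typically be replaced by a block convolution, but the arithmetic is the same.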
- the adjusted broadcast signal output by the mobile terminal in this step can mask the noise signal, so after listening to the adjusted broadcast signal the user can clearly hear its content.
- FIG. 5 is a diagram showing a signal flow corresponding to a method for processing a voice signal according to an embodiment of the present invention.
- after acquiring the recording signal X and the broadcast signal Y, the mobile terminal proceeds on the basis of the recording signal and the broadcast signal.
- according to the power feature value of the recorded signal, the power feature value of the broadcast signal, and the power feature value of the echo signal, and using the voice activity detection mechanism, the mobile terminal calculates the power spectrum of the echo signal and the power spectrum of the noise signal; it then obtains the frequency emphasis coefficient by calculating the maximum value of the speech intelligibility function, and finally, based on the frequency emphasis coefficient, adjusts the frequency amplitude of the broadcast signal using the emphasis filter and outputs the adjusted broadcast signal.
- FIG. 6 is a flowchart of a method for processing a voice signal according to another embodiment of the present invention.
- This method can be implemented by software.
- after the voice instant messaging application is started, the mobile terminal periodically acquires the recording signal x collected by the microphone at the near end and the broadcast signal y sent by the opposite end, calculates the power spectrum P_x of the recorded signal and the power spectrum P_y of the broadcast signal, and then calculates the loop transfer function H_loop based on formula (3). After determining the loop transfer function, the mobile terminal can calculate the estimated value E(n) of the echo signal according to formula (6). In addition, since the echo signal, the near-end speech signal, and the noise signal are picked up by the same microphone, they overlap in time.
- formula (10) calculates the noise power spectrum P_n. Then, based on the power spectrum of the echo signal and the power spectrum of the noise signal, the speech intelligibility function SII is constructed, and the spectral emphasis coefficient W can be obtained by calculating the maximum value of the speech intelligibility function SII. Finally, according to formula (17), the enhanced audio signal is output to the speaker, which converts it into sound for playback.
- the foregoing method may be implemented in a voice instant messaging application layer, or may be implemented at an operating system level, or may be implemented in firmware of a hardware chip.
- the voice data processing method provided by the embodiment of the present invention is applicable in all of these cases; the only difference is the level of the mobile terminal system at which the same processing method is implemented.
- the present invention has been described above by taking a mobile terminal as an example, and those skilled in the art can understand that the present invention can also be applied to other terminal devices, such as a desktop computer and the like.
- the above broadcast signal may be received from the opposite end, for example a voice signal received by the terminal device from another terminal device (i.e., the peer device) through a wired or wireless network; the above broadcast signal may also be a voice signal stored locally on the terminal device.
- the above description takes a voice instant messaging application as an example, and those skilled in the art can understand that it can be replaced with any other voice playing application.
- the above method can be used not only to improve speech intelligibility, but also to enhance audio signals with other content.
- the tone of the ringtone and the alarm clock can be automatically enhanced according to different environmental noises, so that the enhanced prompt sound can be heard more clearly by the user, so as to overcome the interference of environmental noise.
- the above method can be used not only in anti-noise scenarios but also in non-noise environments.
- for example, two people A and B make calls at the same time in close proximity, where A talks with a and B talks with b. Since the distance between A and B is very small, A's voice interferes with B's listening, and B's voice likewise interferes with A's listening.
- the method provided by the embodiments of the present invention can also be applied to this voice competition scenario.
- the mobile terminal on A's side will treat B's voice as the noise signal and a's voice as the signal to be enhanced.
- the mobile terminal on B's side will treat A's voice as the noise signal and b's voice as the signal to be enhanced.
- the method provided by the embodiment of the invention automatically adjusts the frequency amplitude of the broadcast signal according to the frequency distributions of the noise signal and the broadcast signal, on the premise of ensuring that the speaker is not overloaded and without destroying the dynamic amplitude of the original broadcast signal, thereby significantly improving voice intelligibility.
- an embodiment of the present invention provides a schematic structural diagram of a processing apparatus for a voice signal, where the apparatus includes:
- the collecting module 701 is configured to collect a recording signal from the near end, where the recording signal includes at least a noise signal and an echo signal;
- the receiving module 702 is configured to receive a broadcast signal sent by the opposite end;
- the first calculating module 703 is configured to calculate a loop transfer function according to the recording signal and the broadcast signal;
- a second calculation module 704 configured to calculate a power spectrum of the recorded signal
- a third calculating module 705, configured to calculate a power spectrum of the echo signal and a power spectrum of the noise signal according to the power spectrum, the broadcast signal, and the loop transfer function of the recorded signal;
- a fourth calculating module 706, configured to calculate a frequency emphasis coefficient according to a power spectrum of the echo signal and a power spectrum of the noise signal;
- the adjusting module 707 is configured to adjust a frequency amplitude of the broadcast signal based on the frequency emphasis coefficient
- the output module 708 is configured to output the adjusted broadcast signal.
- the first calculating module 703 is configured to calculate the frequency domain cross-correlation function between the recording signal and the broadcast signal; calculate the frequency domain autocorrelation function of the broadcast signal; and calculate the loop transfer function according to the frequency domain cross-correlation function between the recording signal and the broadcast signal and the frequency domain autocorrelation function of the broadcast signal.
- the second calculating module 704 is configured to calculate a power spectrum of the recorded signal by applying the following formula to the recorded signal:
- X(n) is the vector obtained by taking the Fourier transform of the recording signal acquired at the nth time, and .^2 squares each vector element in X(n).
- the third calculating module 705 is configured to calculate a spectrum estimation value of the echo signal according to the loop transfer function and the broadcast signal; calculate the square of the spectrum estimation value of the echo signal to obtain the power spectrum of the echo signal; and subtract the power spectrum of the echo signal from the power spectrum of the recorded signal to obtain the power spectrum of the noise signal.
- the apparatus further includes:
- a fifth calculating module configured to calculate a power feature value of the recorded signal, a power feature value of the broadcast signal, and a power feature value of the echo signal;
- a first determining module configured to determine whether a power feature value of the recorded signal is greater than a first threshold, whether a power feature value of the broadcast signal is greater than a second threshold, and whether a power feature value of the echo signal is greater than a third threshold;
- the third calculating module 705 is configured to calculate the square of the spectrum estimation value of the echo signal to obtain the power spectrum of the echo signal when the power feature value of the recording signal is greater than the first threshold, the power feature value of the broadcast signal is greater than the second threshold, and the power feature value of the echo signal is greater than the third threshold.
- the apparatus further includes:
- a second determining module configured to determine whether a power feature value of the recorded signal is less than a first threshold, and whether a power feature value of the echo signal is less than a third threshold;
- the third calculating module 705 is configured to: when the power feature value of the recorded signal is less than the first threshold and the power feature value of the echo signal is less than the third threshold, subtract the power spectrum of the echo signal from the power spectrum of the recorded signal to obtain the power spectrum of the noise signal.
- the fourth calculating module 706 is configured to construct a speech intelligibility function according to the power spectrum of the echo signal and the power spectrum of the noise signal, and, under the condition that the power spectrum of the echo signal remains unchanged, obtain the frequency emphasis coefficient according to the maximum value of the speech intelligibility function.
- the device provided by the embodiment of the present invention automatically adjusts the frequency amplitude of the broadcast signal according to the frequency distribution of the noise signal and the broadcast signal, while ensuring that the speaker is not overloaded and does not damage the dynamic amplitude of the original broadcast signal.
- FIG. 8 is a schematic structural diagram of a processing terminal of a voice signal according to an embodiment of the present invention.
- the terminal may be used to implement a method for processing a voice signal provided in the foregoing embodiment. Specifically:
- the terminal 800 may include an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 having one or more processing cores, a power supply 190, and the like. It will be understood by those skilled in the art that the terminal structure shown in FIG. 8 does not constitute a limitation on the terminal; the terminal may include more or fewer components than those illustrated, combine certain components, or use a different component arrangement. Among them:
- the RF circuit 110 can be used for receiving and transmitting signals during the sending and receiving of information or during a call; in particular, it receives the downlink information of the base station and hands it to one or more processors 180 for processing, and it sends uplink data to the base station.
- the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like.
- RF circuitry 110 can also communicate with the network and other devices via wireless communication.
- the wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
- the memory 120 can be used to store software programs and modules, and the processor 180 executes various functional applications and data processing by running software programs and modules stored in the memory 120.
- the memory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored according to The data created by the use of the terminal 800 (such as audio data, phone book, etc.) and the like.
- memory 120 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to memory 120.
- the input unit 130 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
- input unit 130 can include touch-sensitive surface 131 as well as other input devices 132.
- the touch-sensitive surface 131, also referred to as a touch display or touchpad, can collect touch operations by the user on or near it (such as operations performed by the user with a finger, a stylus, or any other suitable object or accessory on or near the touch-sensitive surface 131) and drive the corresponding connecting device according to a preset program.
- the touch-sensitive surface 131 can include two portions: a touch detection device and a touch controller.
- the touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 180; it can also receive commands from the processor 180 and execute them.
- the touch-sensitive surface 131 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
- the input unit 130 can also include other input devices 132.
- other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons and switch buttons), a trackball, a mouse, a joystick, and the like.
- Display unit 140 can be used to display information entered by the user or information provided to the user and various graphical user interfaces of terminal 800, which can be constructed from graphics, text, icons, video, and any combination thereof.
- the display unit 140 may include a display panel 141.
- the display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
- the touch-sensitive surface 131 may cover the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, the operation is transmitted to the processor 180 to determine the type of the touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 according to the type of the touch event.
- although the touch-sensitive surface 131 and the display panel 141 are implemented as two separate components to implement input and output functions, in some embodiments the touch-sensitive surface 131 can be integrated with the display panel 141 to implement the input and output functions.
- Terminal 800 can also include at least one type of sensor 150, such as a light sensor, motion sensor, and other sensors.
- the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 141 and/or the backlight when the terminal 800 moves to the ear.
- as one kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, can detect the magnitude and direction of gravity; it can be used for applications that recognize the attitude of the mobile phone (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration-recognition related functions (such as a pedometer or tapping). The terminal 800 can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described here again.
- the audio circuit 160, the speaker 161, and the microphone 162 can provide an audio interface between the user and the terminal 800.
- the audio circuit 160 can transmit the electrical signal converted from the received audio data to the speaker 161, which converts it into a sound signal for output; on the other hand, the microphone 162 converts the collected sound signal into an electrical signal, which is received by the audio circuit 160 and converted into audio data; after being processed by the audio data output processor 180, the audio data is transmitted, for example via the RF circuit 110, to another terminal, or output to the memory 120 for further processing.
- the audio circuit 160 may also include an earbud jack to provide communication of the peripheral earphones with the terminal 800.
- WiFi is a short-range wireless transmission technology
- the terminal 800 can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 170, which provides wireless broadband Internet access for users.
- although FIG. 8 shows the WiFi module 170, it can be understood that the module is not an essential part of the terminal 800 and may be omitted as needed without changing the essence of the invention.
- the processor 180 is the control center of the terminal 800, connecting various portions of the entire handset with various interfaces and lines, by running or executing software programs and/or modules stored in the memory 120, and recalling data stored in the memory 120, The various functions and processing data of the terminal 800 are performed to perform overall monitoring of the mobile phone.
- the processor 180 may include one or more processing cores; optionally, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like.
- the modem processor mainly handles wireless communication. It can be understood that the above modem processor may also not be integrated into the processor 180.
- the terminal 800 also includes a power source 190 (such as a battery) for powering various components.
- the power source can be logically coupled to the processor 180 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
- Power supply 190 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
- the terminal 800 may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
- the display unit of the terminal 800 is a touch screen display
- the terminal 800 further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors.
- the one or more programs include instructions for performing the following operations:
- the adjusted speech signal is output.
- the recorded signal is a sound signal collected using a microphone of the terminal device.
- the outputting of the adjusted speech signal comprises playing the adjusted speech signal through a speaker, where the speech signal is a broadcast signal received by the terminal device through a network, or stored locally, to be played through the speaker.
- the memory of the terminal also contains instructions for performing the following operations:
- Calculating the loop transfer function according to the recording signal and the speech signal includes:
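Per claim 4, the loop transfer function combines the frequency-domain cross-correlation function between the recording and broadcast signals with the frequency-domain autocorrelation function of the broadcast signal. A minimal Wiener-style sketch of such an estimate (the per-frame averaging and the regularization constant `eps` are assumptions, not details from the patent):

```python
import numpy as np

def loop_transfer_function(rec_frames, play_frames, eps=1e-12):
    """Estimate the loop transfer function H(k) per frequency bin as the
    cross-spectrum between recording and playback divided by the playback
    auto-spectrum, averaged over frames (a Wiener-style sketch)."""
    X = np.fft.rfft(rec_frames, axis=-1)    # recorded frames, one per row
    D = np.fft.rfft(play_frames, axis=-1)   # broadcast (playback) frames
    s_xd = np.mean(X * np.conj(D), axis=0)  # frequency-domain cross-correlation
    s_dd = np.mean(np.abs(D) ** 2, axis=0)  # frequency-domain autocorrelation
    return s_xd / (s_dd + eps)              # eps guards empty bins (assumption)
```

If the microphone picks up the playback scaled by a constant factor, this estimate recovers that factor in every bin.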
- the terminal's memory also contains instructions for performing the following operations:
- Calculating the power spectrum of the recording signal includes:
- X(n) is the vector obtained by Fourier transforming the recording signal acquired at the n-th time, and .^2 denotes squaring each vector element of X(n).
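The step just described (Fourier transform the captured frame, then square each element of X(n)) can be sketched as follows; the frame length and the real-input FFT are assumptions for illustration:

```python
import numpy as np

def frame_power_spectrum(frame):
    """Power spectrum of one recorded frame: X(n) is the Fourier transform
    of the frame, and squaring each element (the ".^2" above) gives the
    per-bin power."""
    X = np.fft.rfft(np.asarray(frame, dtype=float))
    return np.abs(X) ** 2
```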
- the memory of the terminal further includes instructions for performing the following operations:
- Calculating the power spectrum of the echo signal and the power spectrum of the noise signal according to the recording signal, the speech signal, and the loop transfer function includes:
- the power spectrum of the noise signal is obtained by subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal.
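That subtraction is a per-bin spectral subtraction; a sketch (the zero floor is an added assumption, to keep the estimate non-negative when the echo estimate overshoots):

```python
import numpy as np

def noise_power_spectrum(recording_ps, echo_ps):
    """Noise power spectrum = recording power spectrum minus echo power
    spectrum, floored at zero (the floor is an assumption, not from the
    patent; a raw subtraction can go negative)."""
    return np.maximum(np.asarray(recording_ps) - np.asarray(echo_ps), 0.0)
```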
- the memory of the terminal further includes instructions for performing the following operations:
- before squaring the spectral estimate of the echo signal to obtain the power spectrum of the echo signal, the operations further include determining that:
- the power characteristic value of the recording signal is greater than the first threshold
- the power characteristic value of the broadcast signal is greater than the second threshold
- the power characteristic value of the echo signal is greater than the third threshold
- the memory of the terminal also contains instructions for performing the following operations:
- the method further includes:
- the step of subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal to obtain a power spectrum of the noise signal is performed.
- the memory of the terminal further includes instructions for performing the following operations:
- Calculating the frequency emphasis coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal includes:
- the frequency emphasis coefficient is obtained according to the maximum value of the speech intelligibility function.
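The patent does not give the intelligibility function in closed form here. One common instantiation (an assumption, not the patent's definition) scores intelligibility as a sum of per-band log-SNRs; maximizing that while keeping the total echo (playback) power unchanged, per claim 8's constraint, becomes a classic water-filling allocation:

```python
import numpy as np

def emphasis_coefficients(echo_ps, noise_ps, iters=60):
    """Illustrative frequency-emphasis computation.  Assumption: the
    intelligibility function is sum(log(1 + P_k / N_k)).  Maximizing it
    with total playback power held fixed gives the water-filling rule
    P_k = max(mu - N_k, 0), with the water level mu found by bisection;
    the result is expressed as per-bin emphasis gains."""
    echo_ps = np.asarray(echo_ps, dtype=float)
    noise_ps = np.asarray(noise_ps, dtype=float)
    total = float(np.sum(echo_ps))           # power budget to preserve
    lo, hi = 0.0, float(np.max(noise_ps)) + total
    for _ in range(iters):                   # bisect on the water level mu
        mu = 0.5 * (lo + hi)
        if float(np.sum(np.maximum(mu - noise_ps, 0.0))) > total:
            hi = mu
        else:
            lo = mu
    alloc = np.maximum(0.5 * (lo + hi) - noise_ps, 0.0)
    return alloc / np.maximum(echo_ps, 1e-12)  # per-bin emphasis gains
```

Bins with less noise receive more of the fixed power budget, which is the qualitative behavior the description attributes to the emphasis coefficient.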
- the terminal provided by this embodiment of the invention automatically adjusts the frequency-bin amplitudes of the broadcast signal according to the frequency distributions of the noise signal and the broadcast signal, on the premise that the speaker is not overloaded and the dynamic amplitude of the original broadcast signal is not destroyed, thereby significantly improving speech intelligibility.
- the embodiment of the present invention further provides a computer readable storage medium, which may be a computer readable storage medium included in the memory in the above embodiment; or may exist separately and not assembled into the terminal.
- the computer readable storage medium stores one or more programs that are executed by one or more processors to perform a speech signal processing method, the method comprising:
- the adjusted speech signal is output.
- the first possible implementation is used as the basis.
- the recorded signal is a sound signal collected using a microphone of the terminal device.
- the outputting of the adjusted speech signal comprises playing the adjusted speech signal through a speaker, where the speech signal is a broadcast signal received by the terminal device through a network, or stored locally, to be played through the speaker.
- the memory of the terminal further includes instructions for performing the following operations:
- Calculating the loop transfer function according to the recording signal and the speech signal includes:
- the terminal's memory also contains instructions for performing the following operations:
- Calculating the power spectrum of the recording signal includes:
- X(n) is the vector obtained by Fourier transforming the recording signal acquired at the n-th time, and .^2 denotes squaring each vector element of X(n).
- the memory of the terminal further includes instructions for performing the following operations:
- Calculating the power spectrum of the echo signal and the power spectrum of the noise signal according to the recording signal, the speech signal, and the loop transfer function includes:
- the power spectrum of the noise signal is obtained by subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal.
- the memory of the terminal further includes instructions for performing the following operations:
- before squaring the spectral estimate of the echo signal to obtain the power spectrum of the echo signal, the operations further include determining that:
- the power characteristic value of the recording signal is greater than the first threshold
- the power characteristic value of the broadcast signal is greater than the second threshold
- the power characteristic value of the echo signal is greater than the third threshold
- the memory of the terminal further includes instructions for performing the following operations:
- the method further includes:
- the step of subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal to obtain a power spectrum of the noise signal is performed.
- the memory of the terminal further includes instructions for performing the following operations:
- Calculating the frequency emphasis coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal includes:
- the frequency emphasis coefficient is obtained according to the maximum value of the speech intelligibility function.
- the computer readable storage medium provided by this embodiment of the invention automatically adjusts the frequency-bin amplitudes of the broadcast signal according to the frequency distributions of the noise signal and the broadcast signal, on the premise that the speaker is not overloaded and the dynamic amplitude of the original broadcast signal is not destroyed, thereby significantly improving speech intelligibility.
- a graphic user interface is provided.
- the graphical user interface is used on a speech signal processing terminal, where the speech signal processing terminal includes a touch screen display, a memory, and one or more processors for executing one or more programs; the graphical user interface includes:
- the recording signal includes at least a noise signal and an echo signal
- the adjusted speech signal is output.
- the graphical user interface provided by this embodiment of the invention automatically adjusts the frequency-bin amplitudes of the broadcast signal according to the frequency distributions of the noise signal and the broadcast signal, while ensuring that the speaker is not overloaded and the dynamic amplitude of the original broadcast signal is not destroyed, thereby significantly improving speech intelligibility.
- the speech signal processing apparatus provided by the foregoing embodiments is illustrated using only the division into the functional modules described above; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the speech signal processing apparatus may be divided into different functional modules to complete all or part of the functions described above.
- the speech signal processing apparatus provided by the foregoing embodiments belongs to the same concept as the embodiments of the speech signal processing method; its specific implementation process is described in detail in the method embodiments and is not repeated here.
- a person skilled in the art will understand that all or part of the steps of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read-only memory, a magnetic disk, an optical disk, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Telephone Function (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
Claims (18)
- A speech signal processing method, comprising: acquiring a recording signal and a speech signal to be output, the recording signal comprising at least a noise signal and an echo signal; calculating a loop transfer function according to the recording signal and the speech signal; calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recording signal, the speech signal, and the loop transfer function; calculating a frequency emphasis coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal; adjusting frequency-bin amplitudes of the speech signal based on the frequency emphasis coefficient; and outputting the adjusted speech signal.
- The method according to claim 1, wherein the recording signal is a sound signal collected using a microphone of a terminal device.
- The method according to claim 1, wherein outputting the adjusted speech signal comprises playing the adjusted speech signal through a speaker of a terminal device, wherein the speech signal is a broadcast signal received by the terminal device through a network or stored locally, to be played through the speaker.
- The method according to claim 3, wherein calculating the loop transfer function according to the recording signal and the speech signal comprises: calculating a frequency-domain cross-correlation function between the recording signal and the broadcast signal; calculating a frequency-domain autocorrelation function of the broadcast signal; and calculating the loop transfer function according to the frequency-domain cross-correlation function between the recording signal and the broadcast signal and the frequency-domain autocorrelation function of the broadcast signal.
- The method according to claim 3, wherein calculating the power spectrum of the echo signal and the power spectrum of the noise signal according to the recording signal, the speech signal, and the loop transfer function comprises: calculating a power spectrum of the recording signal; calculating a spectral estimate of the echo signal according to the loop transfer function and the broadcast signal; squaring the spectral estimate of the echo signal to obtain the power spectrum of the echo signal; and subtracting the power spectrum of the echo signal from the power spectrum of the recording signal to obtain the power spectrum of the noise signal.
- The method according to claim 5, further comprising: calculating a power characteristic value of the recording signal, a power characteristic value of the broadcast signal, and a power characteristic value of the echo signal; and determining whether the power characteristic value of the recording signal is greater than a first threshold, whether the power characteristic value of the broadcast signal is greater than a second threshold, and whether the power characteristic value of the echo signal is greater than a third threshold, wherein squaring the spectral estimate of the echo signal to obtain the power spectrum of the echo signal comprises: when the power characteristic value of the recording signal is greater than the first threshold, the power characteristic value of the broadcast signal is greater than the second threshold, and the power characteristic value of the echo signal is greater than the third threshold, squaring the spectral estimate of the echo signal to obtain the power spectrum of the echo signal.
- The method according to claim 6, further comprising: determining whether the power characteristic value of the recording signal is less than the first threshold and whether the power characteristic value of the echo signal is less than the third threshold, wherein subtracting the power spectrum of the echo signal from the power spectrum of the recording signal to obtain the power spectrum of the noise signal comprises: when the power characteristic value of the recording signal is less than the first threshold and the power characteristic value of the echo signal is less than the third threshold, subtracting the power spectrum of the echo signal from the power spectrum of the recording signal to obtain the power spectrum of the noise signal.
- The method according to claim 3, wherein calculating the frequency emphasis coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal comprises: constructing a speech intelligibility function according to the power spectrum of the echo signal and the power spectrum of the noise signal; and obtaining the frequency emphasis coefficient according to a maximum of the speech intelligibility function under the condition that the power spectrum of the echo signal remains unchanged.
- The method according to claim 1, wherein the terminal device comprises an emphasis filter, a speaker, and a microphone, and the frequency emphasis coefficient represents the proportion of the speech signal that is picked up by the microphone after passing through the emphasis filter and the speaker.
- A speech signal processing apparatus, comprising: at least one processor; and a memory, wherein the memory stores program instructions that, when executed by the processor, configure the apparatus to: acquire a recording signal and a speech signal, the recording signal comprising at least a noise signal and an echo signal; calculate a loop transfer function according to the recording signal and the speech signal; calculate a power spectrum of the echo signal and a power spectrum of the noise signal according to the recording signal, the speech signal, and the loop transfer function; calculate a frequency emphasis coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal; adjust frequency-bin amplitudes of the speech signal based on the frequency emphasis coefficient; and output the adjusted speech signal.
- The apparatus according to claim 10, wherein the recording signal is a sound signal collected using a microphone of a terminal device.
- The apparatus according to claim 10, wherein outputting the adjusted speech signal comprises playing the adjusted speech signal through a speaker, wherein the speech signal is a broadcast signal received by a terminal device through a network or stored locally, to be played through the speaker.
- The apparatus according to claim 12, wherein calculating the loop transfer function according to the recording signal and the speech signal comprises: calculating a frequency-domain cross-correlation function between the recording signal and the broadcast signal; calculating a frequency-domain autocorrelation function of the broadcast signal; and calculating the loop transfer function according to the frequency-domain cross-correlation function between the recording signal and the broadcast signal and the frequency-domain autocorrelation function of the broadcast signal.
- The apparatus according to claim 12, wherein calculating the power spectrum of the echo signal and the power spectrum of the noise signal according to the recording signal, the speech signal, and the loop transfer function comprises: calculating a power spectrum of the recording signal; calculating a spectral estimate of the echo signal according to the loop transfer function and the broadcast signal; squaring the spectral estimate of the echo signal to obtain the power spectrum of the echo signal; and subtracting the power spectrum of the echo signal from the power spectrum of the recording signal to obtain the power spectrum of the noise signal.
- The apparatus according to claim 12, wherein the apparatus is further configured to: calculate a power characteristic value of the recording signal, a power characteristic value of the broadcast signal, and a power characteristic value of the echo signal; determine whether the power characteristic value of the recording signal is greater than a first threshold, whether the power characteristic value of the broadcast signal is greater than a second threshold, and whether the power characteristic value of the echo signal is greater than a third threshold; and when the power characteristic value of the recording signal is greater than the first threshold, the power characteristic value of the broadcast signal is greater than the second threshold, and the power characteristic value of the echo signal is greater than the third threshold, square the spectral estimate of the echo signal to obtain the power spectrum of the echo signal.
- The apparatus according to claim 12, wherein the apparatus is further configured to: determine whether the power characteristic value of the recording signal is less than the first threshold and whether the power characteristic value of the echo signal is less than the third threshold; and when the power characteristic value of the recording signal is less than the first threshold and the power characteristic value of the echo signal is less than the third threshold, subtract the power spectrum of the echo signal from the power spectrum of the recording signal to obtain the power spectrum of the noise signal.
- The apparatus according to claim 12, wherein calculating the frequency emphasis coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal comprises: constructing a speech intelligibility function according to the power spectrum of the echo signal and the power spectrum of the noise signal; and obtaining the frequency emphasis coefficient according to a maximum of the speech intelligibility function under the condition that the power spectrum of the echo signal remains unchanged.
- A computer readable storage medium storing program instructions that, when executed by a processor of a computing device, configure the device to perform the method according to any one of claims 1 to 9.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017553962A JP6505252B2 (ja) | 2015-11-04 | 2016-05-27 | 音声信号を処理するための方法及び装置 |
EP16861250.5A EP3373300B1 (en) | 2015-11-04 | 2016-05-27 | Method and apparatus for processing voice signal |
KR1020177029724A KR101981879B1 (ko) | 2015-11-04 | 2016-05-27 | 음성 신호를 처리하기 위한 방법 및 장치 |
US15/691,300 US10586551B2 (en) | 2015-11-04 | 2017-08-30 | Speech signal processing method and apparatus |
US16/774,854 US10924614B2 (en) | 2015-11-04 | 2020-01-28 | Speech signal processing method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510741057.1 | 2015-11-04 | ||
CN201510741057.1A CN105280195B (zh) | 2015-11-04 | 2015-11-04 | 语音信号的处理方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/691,300 Continuation-In-Part US10586551B2 (en) | 2015-11-04 | 2017-08-30 | Speech signal processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017075979A1 true WO2017075979A1 (zh) | 2017-05-11 |
Family
ID=55149085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/083622 WO2017075979A1 (zh) | 2015-11-04 | 2016-05-27 | 语音信号的处理方法及装置 |
Country Status (7)
Country | Link |
---|---|
US (2) | US10586551B2 (zh) |
EP (1) | EP3373300B1 (zh) |
JP (1) | JP6505252B2 (zh) |
KR (1) | KR101981879B1 (zh) |
CN (1) | CN105280195B (zh) |
MY (1) | MY179978A (zh) |
WO (1) | WO2017075979A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390947A (zh) * | 2018-04-23 | 2019-10-29 | 北京京东尚科信息技术有限公司 | 声源位置的确定方法、系统、设备和存储介质 |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105280195B (zh) | 2015-11-04 | 2018-12-28 | 腾讯科技(深圳)有限公司 | 语音信号的处理方法及装置 |
US20170330564A1 (en) * | 2016-05-13 | 2017-11-16 | Bose Corporation | Processing Simultaneous Speech from Distributed Microphones |
CN106506872B (zh) * | 2016-11-02 | 2019-05-24 | 腾讯科技(深圳)有限公司 | 通话状态检测方法及装置 |
WO2018054171A1 (zh) | 2016-09-22 | 2018-03-29 | 腾讯科技(深圳)有限公司 | 通话方法、装置、计算机存储介质及终端 |
CN108447472B (zh) * | 2017-02-16 | 2022-04-05 | 腾讯科技(深圳)有限公司 | 语音唤醒方法及装置 |
CN106878575B (zh) * | 2017-02-24 | 2019-11-05 | 成都喜元网络科技有限公司 | 残留回声的估计方法及装置 |
CN107833579B (zh) * | 2017-10-30 | 2021-06-11 | 广州酷狗计算机科技有限公司 | 噪声消除方法、装置及计算机可读存储介质 |
CN108200526B (zh) * | 2017-12-29 | 2020-09-22 | 广州励丰文化科技股份有限公司 | 一种基于可信度曲线的音响调试方法及装置 |
US11335357B2 (en) * | 2018-08-14 | 2022-05-17 | Bose Corporation | Playback enhancement in audio systems |
CN109727605B (zh) * | 2018-12-29 | 2020-06-12 | 苏州思必驰信息科技有限公司 | 处理声音信号的方法及系统 |
KR20210072384A (ko) | 2019-12-09 | 2021-06-17 | 삼성전자주식회사 | 전자 장치 및 이의 제어 방법 |
CN111048096B (zh) * | 2019-12-24 | 2022-07-26 | 大众问问(北京)信息科技有限公司 | 一种语音信号处理方法、装置及终端 |
CN111048118B (zh) * | 2019-12-24 | 2022-07-26 | 大众问问(北京)信息科技有限公司 | 一种语音信号处理方法、装置及终端 |
CN111128194A (zh) * | 2019-12-31 | 2020-05-08 | 云知声智能科技股份有限公司 | 一种提高在线语音识别效果的系统及方法 |
CN112203188B (zh) * | 2020-07-24 | 2021-10-01 | 北京工业大学 | 一种自动音量调节方法 |
KR102424795B1 (ko) * | 2020-08-25 | 2022-07-25 | 서울과학기술대학교 산학협력단 | 음성 구간 검출 방법 |
CN111986688B (zh) * | 2020-09-09 | 2024-07-23 | 北京小米松果电子有限公司 | 一种提高语音清晰度的方法、装置及介质 |
CN112259125B (zh) * | 2020-10-23 | 2023-06-16 | 江苏理工学院 | 基于噪声的舒适度评价方法、系统、设备及可存储介质 |
US11610598B2 (en) * | 2021-04-14 | 2023-03-21 | Harris Global Communications, Inc. | Voice enhancement in presence of noise |
CN112820311A (zh) * | 2021-04-16 | 2021-05-18 | 成都启英泰伦科技有限公司 | 一种基于空间预测的回声消除方法及装置 |
CN114822571A (zh) * | 2021-04-25 | 2022-07-29 | 美的集团(上海)有限公司 | 一种回声消除方法、装置、电子设备和存储介质 |
CN113178192B (zh) * | 2021-04-30 | 2024-05-24 | 平安科技(深圳)有限公司 | 语音识别模型的训练方法、装置、设备及存储介质 |
CN115665642B (zh) * | 2022-12-12 | 2023-03-17 | 杭州兆华电子股份有限公司 | 一种噪声消除方法及系统 |
DE202023103428U1 (de) | 2023-06-21 | 2023-06-28 | Richik Kashyap | Ein Sprachqualitätsschätzsystem für reale Signale basierend auf nicht negativer frequenzgewichteter Energie |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763858A (zh) * | 2009-10-19 | 2010-06-30 | 瑞声声学科技(深圳)有限公司 | 双麦克风信号处理方法 |
CN102893331A (zh) * | 2010-05-20 | 2013-01-23 | 高通股份有限公司 | 用于使用头戴式麦克风对来处理语音信号的方法、设备和计算机可读媒体 |
CN103606374A (zh) * | 2013-11-26 | 2014-02-26 | 国家电网公司 | 一种瘦终端的噪音消除和回声抑制方法及装置 |
CN104050971A (zh) * | 2013-03-15 | 2014-09-17 | 杜比实验室特许公司 | 声学回声减轻装置和方法、音频处理装置和语音通信终端 |
CN105280195A (zh) * | 2015-11-04 | 2016-01-27 | 腾讯科技(深圳)有限公司 | 语音信号的处理方法及装置 |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04100460A (ja) * | 1990-08-20 | 1992-04-02 | Nippon Telegr & Teleph Corp <Ntt> | 電話機の歪測定方法 |
JP3397269B2 (ja) * | 1994-10-26 | 2003-04-14 | 日本電信電話株式会社 | 多チャネル反響消去方法 |
IL115892A (en) * | 1994-11-10 | 1999-05-09 | British Telecomm | Interference detection system for telecommunications |
JP3420705B2 (ja) * | 1998-03-16 | 2003-06-30 | 日本電信電話株式会社 | エコー抑圧方法及び装置並びにエコー抑圧プログラムが記憶されたコンピュータに読取り可能な記憶媒体 |
EP0980064A1 (de) * | 1998-06-26 | 2000-02-16 | Ascom AG | Verfahren zur Durchführung einer maschinengestützten Beurteilung der Uebertragungsqualität von Audiosignalen |
KR100723283B1 (ko) * | 1999-06-24 | 2007-05-30 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | 음향 에코 및 잡음 제거 적응성 필터 |
WO2002013572A2 (en) * | 2000-08-07 | 2002-02-14 | Audia Technology, Inc. | Method and apparatus for filtering and compressing sound signals |
US7117145B1 (en) * | 2000-10-19 | 2006-10-03 | Lear Corporation | Adaptive filter for speech enhancement in a noisy environment |
US7171003B1 (en) * | 2000-10-19 | 2007-01-30 | Lear Corporation | Robust and reliable acoustic echo and noise cancellation system for cabin communication |
DE10157535B4 (de) * | 2000-12-13 | 2015-05-13 | Jörg Houpert | Verfahren und Vorrichtung zur Reduzierung zufälliger, kontinuierlicher, instationärer Störungen in Audiosignalen |
WO2003083828A1 (en) * | 2002-03-27 | 2003-10-09 | Aliphcom | Nicrophone and voice activity detection (vad) configurations for use with communication systems |
JP3864914B2 (ja) * | 2003-01-20 | 2007-01-10 | ソニー株式会社 | エコー抑圧装置 |
EP1591995B1 (en) * | 2004-04-29 | 2019-06-19 | Harman Becker Automotive Systems GmbH | Indoor communication system for a vehicular cabin |
US7454332B2 (en) * | 2004-06-15 | 2008-11-18 | Microsoft Corporation | Gain constrained noise suppression |
CN1321400C (zh) * | 2005-01-18 | 2007-06-13 | 中国电子科技集团公司第三十研究所 | 客观音质评价中基于噪声掩蔽门限算法的巴克谱失真测度方法 |
US8594320B2 (en) * | 2005-04-19 | 2013-11-26 | (Epfl) Ecole Polytechnique Federale De Lausanne | Hybrid echo and noise suppression method and device in a multi-channel audio signal |
CN101233561B (zh) * | 2005-08-02 | 2011-07-13 | 皇家飞利浦电子股份有限公司 | 通过根据背景噪声控制振动器的操作来增强移动通信设备中的语音可懂度 |
JP4671303B2 (ja) * | 2005-09-02 | 2011-04-13 | 国立大学法人北陸先端科学技術大学院大学 | マイクロホンアレイ用ポストフィルタ |
ATE492979T1 (de) * | 2005-09-20 | 2011-01-15 | Ericsson Telefon Ab L M | Verfahren zur messung der sprachverständlichkeit |
US8046218B2 (en) * | 2006-09-19 | 2011-10-25 | The Board Of Trustees Of The University Of Illinois | Speech and method for identifying perceptual features |
JP4509126B2 (ja) * | 2007-01-24 | 2010-07-21 | 沖電気工業株式会社 | エコーキャンセラ及びエコーキャンセル方法 |
US8954324B2 (en) * | 2007-09-28 | 2015-02-10 | Qualcomm Incorporated | Multiple microphone voice activity detector |
ATE521064T1 (de) * | 2007-10-08 | 2011-09-15 | Harman Becker Automotive Sys | Verstärkung und spektralformenanpassung bei der verarbeitung von audiosignalen |
DE602007007090D1 (de) * | 2007-10-11 | 2010-07-22 | Koninkl Kpn Nv | Verfahren und System zur Messung der Sprachverständlichkeit eines Tonübertragungssystems |
US8412525B2 (en) * | 2009-04-30 | 2013-04-02 | Microsoft Corporation | Noise robust speech classifier ensemble |
CN101582264A (zh) * | 2009-06-12 | 2009-11-18 | 瑞声声学科技(深圳)有限公司 | 语音增强的方法及语音增加的声音采集系统 |
GB2493327B (en) * | 2011-07-05 | 2018-06-06 | Skype | Processing audio signals |
DK2563045T3 (da) * | 2011-08-23 | 2014-10-27 | Oticon As | Fremgangsmåde og et binauralt lyttesystem for at maksimere en bedre øreeffekt |
CN102306496B (zh) * | 2011-09-05 | 2014-07-09 | 歌尔声学股份有限公司 | 一种多麦克风阵列噪声消除方法、装置及系统 |
CN102510418B (zh) * | 2011-10-28 | 2015-11-25 | 声科科技(南京)有限公司 | 噪声环境下的语音可懂度测量方法及装置 |
CN103578479B (zh) * | 2013-09-18 | 2016-05-25 | 中国人民解放军电子工程学院 | 基于听觉掩蔽效应的语音可懂度测量方法 |
US10262677B2 (en) * | 2015-09-02 | 2019-04-16 | The University Of Rochester | Systems and methods for removing reverberation from audio signals |
US10403299B2 (en) * | 2017-06-02 | 2019-09-03 | Apple Inc. | Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition |
US20180358032A1 (en) * | 2017-06-12 | 2018-12-13 | Ryo Tanaka | System for collecting and processing audio signals |
- 2015
- 2015-11-04 CN CN201510741057.1A patent/CN105280195B/zh active Active
- 2016
- 2016-05-27 EP EP16861250.5A patent/EP3373300B1/en active Active
- 2016-05-27 KR KR1020177029724A patent/KR101981879B1/ko active IP Right Grant
- 2016-05-27 MY MYPI2017703990A patent/MY179978A/en unknown
- 2016-05-27 WO PCT/CN2016/083622 patent/WO2017075979A1/zh active Application Filing
- 2016-05-27 JP JP2017553962A patent/JP6505252B2/ja active Active
- 2017
- 2017-08-30 US US15/691,300 patent/US10586551B2/en active Active
- 2020
- 2020-01-28 US US16/774,854 patent/US10924614B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763858A (zh) * | 2009-10-19 | 2010-06-30 | 瑞声声学科技(深圳)有限公司 | 双麦克风信号处理方法 |
CN102893331A (zh) * | 2010-05-20 | 2013-01-23 | 高通股份有限公司 | 用于使用头戴式麦克风对来处理语音信号的方法、设备和计算机可读媒体 |
CN104050971A (zh) * | 2013-03-15 | 2014-09-17 | 杜比实验室特许公司 | 声学回声减轻装置和方法、音频处理装置和语音通信终端 |
CN103606374A (zh) * | 2013-11-26 | 2014-02-26 | 国家电网公司 | 一种瘦终端的噪音消除和回声抑制方法及装置 |
CN105280195A (zh) * | 2015-11-04 | 2016-01-27 | 腾讯科技(深圳)有限公司 | 语音信号的处理方法及装置 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390947A (zh) * | 2018-04-23 | 2019-10-29 | 北京京东尚科信息技术有限公司 | 声源位置的确定方法、系统、设备和存储介质 |
CN110390947B (zh) * | 2018-04-23 | 2024-04-05 | 北京京东尚科信息技术有限公司 | 声源位置的确定方法、系统、设备和存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN105280195B (zh) | 2018-12-28 |
EP3373300A1 (en) | 2018-09-12 |
JP2018517167A (ja) | 2018-06-28 |
US20200168237A1 (en) | 2020-05-28 |
EP3373300A4 (en) | 2019-07-31 |
CN105280195A (zh) | 2016-01-27 |
US10586551B2 (en) | 2020-03-10 |
EP3373300B1 (en) | 2020-09-16 |
MY179978A (en) | 2020-11-19 |
US20170365270A1 (en) | 2017-12-21 |
KR20170129211A (ko) | 2017-11-24 |
US10924614B2 (en) | 2021-02-16 |
KR101981879B1 (ko) | 2019-05-23 |
JP6505252B2 (ja) | 2019-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017075979A1 (zh) | 语音信号的处理方法及装置 | |
US10609483B2 (en) | Method for sound effect compensation, non-transitory computer-readable storage medium, and terminal device | |
EP3547659B1 (en) | Method for processing audio signal and related products | |
JP5876154B2 (ja) | 雑音を制御するための電子デバイス | |
US20230008818A1 (en) | Sound masking method and apparatus, and terminal device | |
WO2017143805A1 (zh) | 回声消除方法、装置和计算机存储介质 | |
CN108540900B (zh) | 音量调节方法及相关产品 | |
US10687142B2 (en) | Method for input operation control and related products | |
US10878833B2 (en) | Speech processing method and terminal | |
JP2016541222A (ja) | フィードバック検出のためのシステムおよび方法 | |
US20140341386A1 (en) | Noise reduction | |
CN111083297A (zh) | 一种回声消除方法及电子设备 | |
CN110995909B (zh) | 一种声音补偿方法及装置 | |
CN111182118A (zh) | 一种音量调节方法及电子设备 | |
CN111541975B (zh) | 音频信号的调节方法及电子设备 | |
CN116994596A (zh) | 啸叫抑制方法、装置、存储介质及电子设备 | |
WO2023284406A1 (zh) | 一种通话方法及电子设备 | |
CN115691524A (zh) | 音频信号的处理方法、装置、设备及存储介质 | |
CN106210951A (zh) | 一种蓝牙耳机的适配方法、装置和终端 | |
WO2022254834A1 (ja) | 信号処理装置、信号処理方法およびプログラム | |
CN116246645A (zh) | 语音处理方法、装置、存储介质及电子设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16861250 Country of ref document: EP Kind code of ref document: A1 |
|
REEP | Request for entry into the european phase |
Ref document number: 2016861250 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2017553962 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20177029724 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |