CN113744752A - Voice processing method and device - Google Patents

Voice processing method and device

Info

Publication number
CN113744752A
Authority
CN
China
Prior art keywords: processed, audio signal, signal, estimation, audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111003630.0A
Other languages
Chinese (zh)
Inventor
聂玮奇
刘煜
刘博洋
季经伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Shengbijie Information Technology Co ltd
Original Assignee
Xi'an Shengbijie Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Shengbijie Information Technology Co ltd filed Critical Xi'an Shengbijie Information Technology Co ltd
Priority to CN202111003630.0A
Publication of CN113744752A


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present disclosure provides a voice processing method and apparatus, relating to the technical field of voice. The method includes: obtaining at least two audio signals to be processed, where the at least two audio signals to be processed include audio signals acquired by a microphone array; performing direction-of-arrival estimation on any two microphones in the microphone array; carrying out beamforming processing on the audio signals to be processed according to the direction-of-arrival estimation and a beamforming algorithm; carrying out noise suppression on the beamformed audio signals to be processed to obtain a target audio signal; and outputting the target audio signal. Audio pickup and enhancement are thus realized, and the accuracy of audio recognition is improved.

Description

Voice processing method and device
Technical Field
The present disclosure relates to the field of speech technologies, and in particular, to a speech processing method and apparatus.
Background
With the continuous development of artificial intelligence technology, traditional equipment in various fields is gradually being replaced by corresponding intelligent terminals. An intelligent terminal is a fully open platform with monitoring, sensing, communication, and intelligent-interaction functions; it carries an operating system, can install and uninstall various application software, and can continuously expand and upgrade its functions. In terms of intelligent interaction, many complex operations cannot be accomplished with the commonly used remote controls and touch screens alone; voice control is often the best approach, and the key to voice control is the acquisition and recognition of the voice signal.
In the related art, when a speech signal is acquired, the speech signal is usually filtered and output directly.
However, in the above-described technique, if the acquired speech signal contains speech from multiple directions, filtering alone leaves a large amount of noise in the final speech signal, which reduces the accuracy of speech recognition.
Disclosure of Invention
The embodiment of the disclosure provides a voice processing method and device, which can solve the problem that the accuracy of voice recognition is reduced in the prior art. The technical scheme is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a speech processing method, the method including:
acquiring at least two audio signals to be processed; the at least two audio signals to be processed comprise audio signals acquired by a microphone array;
estimating the direction of arrival of any two microphones in the microphone array;
carrying out beam forming processing on the audio signal to be processed according to the direction of arrival estimation and the beam forming algorithm;
carrying out noise suppression on the audio signal to be processed after the beam forming processing to obtain a target audio signal;
and outputting the target audio signal.
The embodiment of the disclosure provides a voice processing method, which performs direction-of-arrival estimation on any two microphones in a microphone array when a plurality of audio signals to be processed are acquired, performs beamforming processing on the audio signals to be processed according to the direction-of-arrival estimation and a beamforming algorithm, performs noise suppression on the audio signals to be processed after the beamforming processing, and finally outputs a target audio signal obtained after the noise reduction and suppression. Therefore, the method and the device have the advantages that the direction of arrival estimation is carried out on every two audio signals to be processed, and the noise suppression processing is carried out on the audio signals to be processed after the beam forming processing, so that the audio picking and enhancing functions are realized, and the accuracy of audio identification is improved.
In one embodiment, before the estimating the direction of arrival of any two microphones in the microphone array, the method further includes:
performing voice activity detection and noise estimation on each audio signal to be processed, and determining the existence probability of the audio signal according to the results of the voice activity detection and the noise estimation;
the estimating direction of arrival of any two microphones of the microphone array comprises:
and estimating the direction of arrival of any two microphones in the microphone array according to the existence probability of the audio signal.
In one embodiment, the estimating the direction of arrival of any two microphones of the microphone array according to the audio signal existence probability comprises:
and calculating the time delay estimation of any two microphones in the microphone array according to the existence probability of the audio signal, and calculating the relative angle between a target sound source and the microphone array according to the time delay estimation result.
In one embodiment, the performing voice activity detection and noise estimation on each of the audio signals to be processed includes:
determining whether there is a synchronous input signal;
when the synchronous input signal is determined, performing echo cancellation processing on each audio signal to be processed;
performing voice activity detection and noise estimation on each audio signal to be processed after echo cancellation processing;
and when the synchronous input signal is determined not to exist, carrying out voice activity detection and noise estimation on each audio signal to be processed.
In one embodiment, the obtaining at least two audio signals to be processed comprises:
acquiring at least two original audio signals; the original audio signal is a signal output by an audio input module;
and carrying out short-time Fourier transform on each original audio signal to obtain the audio signal to be processed.
In one embodiment, the performing echo cancellation processing on each of the audio signals to be processed includes:
according to the formula

y(t, m) = Σ_{l=0}^{L-1} h_l · s(t - l)

and the formula

ĥ(t+1, m) = ĥ(t, m) + μ · e(t, m) · s(k, m) / (s^T(k, m) · s(k, m))

performing echo cancellation processing on each audio signal to be processed;
wherein y(t, m) represents the synchronous input signal collected by the mth microphone at time t, s(t - l) represents the synchronous input signal at time t - l, h_l represents the channel between the synchronous input signal and each microphone, l is the summation index, L represents the channel length, and h(t, m) = [h_0 h_1 ... h_{L-1}] represents the channel between the synchronous input signal and the mth microphone at time t; ĥ(t+1, m) represents the channel estimate of the synchronous input signal acquired by the mth microphone at time t + 1, ĥ(t, m) represents the channel estimate of the synchronous input signal acquired by the mth microphone at time t, e(t, m) = x(t, m) - ŷ(t, m) represents the error signal, μ represents the smoothing factor, ŷ(t, m) = ĥ^T(t, m) · s(k, m) represents the echo estimate of the mth microphone at time t, x(t, m) represents the near-end signal of the mth microphone at time t, s(k, m) = [s(k, m) s(k-1, m) ... s(k-L+1, m)] represents the vector of synchronous input signals, and s^T(k, m) represents the transpose of s(k, m).
In one embodiment, the performing voice activity detection and noise estimation on each of the audio signals to be processed, and determining the existence probability of the audio signal according to the results of the voice activity detection and noise estimation includes:
according to the formula

Y(k, t) = β_s · Y(k, t-1) + (1 - β_s) · |X(k, t)| when speech is present, and Y(k, t) = β_n · Y(k, t-1) + (1 - β_n) · |X(k, t)| otherwise,

performing voice activity detection on each audio signal to be processed;
according to the formula

V(k, t) = α_s · V(k, t-1) + (1 - α_s) · |X(k, t)| when speech is present, and V(k, t) = α_n · V(k, t-1) + (1 - α_n) · |X(k, t)| otherwise,

performing noise estimation on each audio signal to be processed;
according to the formula

SNR(k, t) = Y(k, t) / V(k, t)

and the formula

P(k, t) = 1 when SNR(k, t) ≥ TH_SNR, and P(k, t) = 0 otherwise,

determining the audio signal presence probability;
wherein α_s represents the smoothing factor of the noise estimation when speech is present, α_n represents the smoothing factor of the noise estimation when speech is absent, V(k, t-1) represents the noise spectrum estimate of the kth frequency point at time t-1, V(k, t) represents the noise spectrum estimate of the kth frequency point at time t, and X(k, t) represents the short-time Fourier transform of the kth frequency point at time t; β_s represents the smoothing factor of the signal estimation when speech is present, β_n represents the smoothing factor of the signal estimation when speech is absent, Y(k, t-1) represents the signal spectrum estimate of the kth frequency point at time t-1, and Y(k, t) represents the signal spectrum estimate of the kth frequency point at time t; SNR(k, t) represents the estimated signal-to-noise ratio, P(k, t) represents the speech presence probability of the kth frequency point at time t, and TH_SNR represents the signal-to-noise ratio threshold.
In one embodiment, the calculating the time delay estimation of any two microphones in the microphone array according to the existence probability of the audio signal includes:
according to the formula

τ = argmax_m Ψ(m), with Ψ(m) = Σ_k φ(k) · X(k, 1) · X*(k, 2) · e^{j2πkm/K},

calculating the time delay estimation of any two microphones in the microphone array;
according to the formula

θ = arccos(c · τ / d)

calculating the relative angle between a target sound source and the microphone array;
where τ represents the estimated time delay between the two audio signals to be processed, Ψ(m) represents the generalized cross-correlation of the two audio signals to be processed, φ(k) = 1 / |E{X(k, 1) · X*(k, 2)}| represents the weight, E{X(k, 1) · X*(k, 2)} represents the expectation of the signal energy, θ represents the direction of arrival, c represents the speed of sound in air, and d represents the distance between the two microphones corresponding to the two audio signals to be processed.
In one embodiment, the beamforming the audio signal to be processed according to the direction of arrival estimation and beamforming algorithm includes:
according to the formula

h_BF = argmin_h h^T · R · h, subject to d^T(θ) · h = 1,

carrying out beam forming processing on the audio signal to be processed;
wherein R = E{X(t) · X^T(t)} and
d(θ) = [1, e^{-jωδcosθ/c}, ..., e^{-j(M-1)ωδcosθ/c}]^T.
By the Lagrange multiplier method, the following can be obtained:

h_BF = R^{-1} · d(θ) / (d^T(θ) · R^{-1} · d(θ));

h_BF^T represents the transpose of h_BF, "subject to" denotes the constraint that d^T(θ) · h_BF is equal to 1, d^T(θ) represents the transpose of d(θ), X(t) represents the short-time Fourier transform at time t, and X^T(t) represents the transpose of X(t).
In an embodiment, the performing noise suppression on the to-be-processed audio signal after the beamforming processing to obtain the target audio signal includes:
according to the formula

h_NR(k) = (Y(k, t) - V(k, t)) / Y(k, t)

and the formula S(k, t) = h_NR(k) · X(k, t), obtaining the target audio signal;
wherein S(k, t) represents the audio signal to be processed after noise reduction processing, h_NR(k) represents the noise reduction filter, and X(k, t) represents the short-time Fourier transformed audio signal to be processed.
According to a second aspect of the embodiments of the present disclosure, there is provided a speech processing apparatus, the apparatus including:
the acquisition module is used for acquiring at least two audio signals to be processed; the at least two audio signals to be processed comprise audio signals acquired by a microphone array;
the first processing module is used for estimating the direction of arrival of any two microphones in the microphone array;
the second processing module is used for carrying out beam forming processing on the audio signal to be processed according to the direction of arrival estimation and the beam forming algorithm;
the third processing module is used for carrying out noise suppression on the audio signal to be processed after the beam forming processing to obtain a target audio signal;
and the output module is used for outputting the target audio signal.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a speech processing method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of speech processing provided by an embodiment of the present disclosure;
FIG. 3a is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 3b is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 3c is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 3d is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
fig. 3e is a structural diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 3f is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 3g is a block diagram of a speech processing apparatus according to an embodiment of the disclosure;
FIG. 3h is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
fig. 3i is a structural diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 3j is a block diagram of a speech processing apparatus according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a speech processing device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
An embodiment of the present disclosure provides a speech processing method, as shown in fig. 1, the method includes the following steps:
step 101, at least two audio signals to be processed are obtained.
The audio signals to be processed are all signals output by the audio input module, and the at least two audio signals to be processed comprise audio signals obtained by the microphone array.
And 102, estimating the direction of arrival of any two microphones in the microphone array.
And 103, carrying out beam forming processing on the audio signal to be processed according to the direction of arrival estimation and the beam forming algorithm.
And step 104, performing noise suppression on the audio signal to be processed after the beam forming processing to obtain a target audio signal.
And 105, outputting the target audio signal.
The embodiment of the disclosure provides a voice processing method, which performs direction-of-arrival estimation on any two microphones in a microphone array when a plurality of audio signals to be processed are acquired, performs beamforming processing on the audio signals to be processed according to the direction-of-arrival estimation and a beamforming algorithm, performs noise suppression on the audio signals to be processed after the beamforming processing, and finally outputs a target audio signal obtained after the noise reduction and suppression. Therefore, the method and the device have the advantages that the direction of arrival estimation is carried out on every two audio signals to be processed, and the noise suppression processing is carried out on the audio signals to be processed after the beam forming processing, so that the audio picking and enhancing functions are realized, and the accuracy of audio identification is improved.
An embodiment of the present disclosure provides a speech processing method, as shown in fig. 2, the method includes the following steps:
step 201, at least two original audio signals are obtained.
The original audio signal is a signal output by an audio input module, and the original audio signal comprises an audio signal output by a microphone array and/or an audio signal output by an intelligent microphone.
Illustratively, a multi-channel original audio signal is obtained from an audio input module at a fixed period, and the source of the original audio signal may be a microphone array or other smart microphones.
It should be noted that the audio input module may include a sound collection module and at least one input channel, for example, the audio input module includes 16 input channels; the sound collection module may include analog-to-digital conversion devices, a microphone array, a smart microphone, and the like, for example, the sound collection module includes 8 analog microphone inputs and 2 analog-to-digital conversion devices; the input sources of the overall audio signal may include: microphone arrays, third party analog or digital audio streams, other smart microphones.
Step 202, performing short-time fourier transform on each original audio signal to obtain the audio signal to be processed.
Optionally, according to the formula

X(k, t, m) = Σ_{n=0}^{N-1} w(n) · x(n + t, m) · e^{-jω_k·n}

a short-time Fourier transform is carried out on each original audio signal to obtain the audio signal to be processed.
Wherein X(k, t, m) represents the short-time Fourier transform of the kth frequency point of the mth channel at time t, i.e. the audio signal to be processed, N represents the length of the time window, w(n) represents the value of the window function at the nth point, x(n + t, m) represents the original audio signal of the mth channel at time n + t, N is an integer greater than or equal to 1, ω_k = 2πk/K represents the angular frequency, K represents the length of the short-time Fourier transform, and e is the base of the natural exponential.
Illustratively, the acquired multi-channel original audio signal is converted from the time domain to the frequency domain by a short-time fourier transform.
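As a minimal illustration of the windowed transform in step 202 (a Python sketch, not part of the patent; `stft_frame` and its arguments are hypothetical names), one frame of the short-time Fourier transform can be written directly from the formula:

```python
import cmath

def stft_frame(x, t, w, K):
    """Windowed DFT of one frame, following
    X(k, t) = sum_n w(n) * x(n + t) * exp(-j * 2*pi*k*n / K)."""
    N = len(w)
    return [sum(w[n] * x[n + t] * cmath.exp(-2j * cmath.pi * k * n / K)
                for n in range(N))
            for k in range(K)]

# A constant signal concentrates all of its energy in frequency bin k = 0.
frame = stft_frame([1.0] * 8, 0, [1.0] * 8, 8)
```

With a rectangular window and a constant input, bin 0 equals the frame sum and every other bin vanishes, which is a quick sanity check of the transform.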
Step 203, determining whether there is a synchronous input signal.
For example, the synchronous input signal generally refers to a third-party analog or digital audio stream, and its main carrier is sound played in the current environment, for example sound played by a loudspeaker or a television. Detecting the synchronous input signal is a precondition for echo cancellation, so whether a synchronous input signal exists directly determines whether echo cancellation is performed. Specifically, detection of the synchronous input signal is usually done by energy detection: the signal energy of the synchronous input channel is calculated, and when that energy is greater than or equal to a set threshold, it is determined that a synchronous input signal exists and echo cancellation is required; when the energy is below the threshold, it is determined that no synchronous input signal exists and echo cancellation is not needed.
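The energy-based detection described here can be sketched as follows (an illustrative helper; the function name and threshold value are invented for the example):

```python
def has_sync_input(samples, threshold):
    """Return True when the mean energy of the synchronous input
    channel reaches the threshold, i.e. echo cancellation is needed."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy >= threshold

loud = has_sync_input([0.5, -0.5, 0.5, -0.5], 0.1)      # playback present
quiet = has_sync_input([0.001, -0.001, 0.0, 0.0], 0.1)  # effectively silent
```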
And 204, when the synchronous input signal is determined, performing echo cancellation processing on each audio signal to be processed.
The echo cancellation means that artificially played sound, i.e., a synchronization signal, is removed from the acquired audio signal to be processed, and other sound is retained to the maximum extent.
Optionally, the channel estimation is performed according to the normalized least mean square (NLMS) method, i.e. according to the formula

y(t, m) = Σ_{l=0}^{L-1} h_l · s(t - l)

and the formula

ĥ(t+1, m) = ĥ(t, m) + μ · e(t, m) · s(k, m) / (s^T(k, m) · s(k, m))

performing echo cancellation processing on each audio signal to be processed;
wherein y(t, m) represents the synchronous input signal collected by the mth microphone at time t, s(t - l) represents the synchronous input signal at time t - l, h_l represents the channel between the synchronous input signal and each microphone, l is the summation index, L represents the channel length, and h(t, m) = [h_0 h_1 ... h_{L-1}] represents the channel between the synchronous input signal and the mth microphone at time t; ĥ(t+1, m) represents the channel estimate of the synchronous input signal acquired by the mth microphone at time t + 1, ĥ(t, m) represents the channel estimate of the synchronous input signal acquired by the mth microphone at time t, e(t, m) = x(t, m) - ŷ(t, m) represents the error signal, μ represents the smoothing factor, ŷ(t, m) = ĥ^T(t, m) · s(k, m) represents the echo estimate of the mth microphone at time t, x(t, m) represents the near-end signal of the mth microphone at time t, s(k, m) = [s(k, m) s(k-1, m) ... s(k-L+1, m)] represents the vector of synchronous input signals, and s^T(k, m) represents the transpose of s(k, m).
It should be noted that echo cancellation can also be performed by other methods in the prior art, which is not limited by the present disclosure.
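The NLMS update described above can be sketched as a toy in Python. This is illustrative only, not the patent's implementation: the channel taps, test signals, and step size are invented for the example, and there is no near-end talker, so the residual should decay towards zero as the channel estimate converges.

```python
import math

def nlms_echo_cancel(far, near, L, mu=0.5, eps=1e-6):
    """NLMS channel estimation: h <- h + mu * e * s / (s^T s),
    where e = x - h^T s is the residual after echo removal."""
    h = [0.0] * L
    residual = []
    for t in range(len(near)):
        s = [far[t - l] if t - l >= 0 else 0.0 for l in range(L)]
        echo_hat = sum(h[l] * s[l] for l in range(L))  # estimated echo
        e = near[t] - echo_hat                          # echo-free residual
        norm = sum(v * v for v in s) + eps              # regularized s^T s
        for l in range(L):
            h[l] += mu * e * s[l] / norm
        residual.append(e)
    return h, residual

# Far-end playback through an invented 2-tap "room" channel [0.5, 0.3].
far = [math.sin(0.7 * n) + 0.5 * math.sin(1.9 * n) for n in range(500)]
near = [0.5 * far[n] + (0.3 * far[n - 1] if n >= 1 else 0.0)
        for n in range(500)]
h_est, residual = nlms_echo_cancel(far, near, L=2)
```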
Step 205, performing voice activity detection and noise estimation on each audio signal to be processed.
Specifically, in a real acoustic scene the speech signal is not always present; speech segments and noise segments alternate, and often most of the stream is noise, so voice activity detection is required. Voice activity detection is realized by detecting the energy or amplitude of the real-time audio stream and, on that basis, tracking the changes of speech and noise in the stream. To obtain a better noise reduction effect, noise estimation is also necessary: it tracks the change of the acoustic spectrum in real time by following features such as the signal-to-noise ratio and amplitude of the audio stream. The most typical approach is to estimate the signal-to-noise ratio of the audio in real time by tracking the spectra of the speech and the noise, and then update those spectra according to the estimated signal-to-noise ratio.
Optionally, when it is determined that there is a synchronous input signal, voice activity detection and noise estimation are performed on each audio signal to be processed after echo cancellation processing, so as to obtain an audio signal existence probability.
Optionally, when it is determined that no synchronous input signal exists, voice activity detection and noise estimation are directly performed on each audio signal to be processed, so as to obtain the existence probability of the audio signal.
Illustratively, according to the formula

V(k, t) = α_s · V(k, t-1) + (1 - α_s) · |X(k, t)| when speech is present, and V(k, t) = α_n · V(k, t-1) + (1 - α_n) · |X(k, t)| otherwise,

noise estimation is performed for each audio signal to be processed.
Wherein α_s represents the smoothing factor of the noise estimation when speech is present, α_n represents the smoothing factor of the noise estimation when speech is absent, V(k, t-1) represents the noise spectrum estimate of the kth frequency point at time t-1, V(k, t) represents the noise spectrum estimate of the kth frequency point at time t, and X(k, t) represents the short-time Fourier transform of the kth frequency point at time t.
According to the formula

Y(k, t) = β_s · Y(k, t-1) + (1 - β_s) · |X(k, t)| when speech is present, and Y(k, t) = β_n · Y(k, t-1) + (1 - β_n) · |X(k, t)| otherwise,

voice activity detection is performed for each audio signal to be processed.
Wherein β_s represents the smoothing factor of the signal estimation when speech is present, β_n represents the smoothing factor of the signal estimation when speech is absent, Y(k, t-1) represents the signal spectrum estimate of the kth frequency point at time t-1, and Y(k, t) represents the signal spectrum estimate of the kth frequency point at time t.

SNR(k, t) = Y(k, t) / V(k, t)

where SNR(k, t) represents the estimate of the signal-to-noise ratio.

P(k, t) = 1 when SNR(k, t) ≥ TH_SNR, and P(k, t) = 0 otherwise,

wherein P(k, t) represents the audio signal existence probability of the kth frequency point at time t, and TH_SNR represents the signal-to-noise ratio threshold.
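The recursive tracking and SNR-thresholded presence decision can be sketched as below. This simplifies the patent's scheme: a single smoothing factor per estimate instead of the speech-present/absent pair, and a hard 0/1 presence decision; all names and constants are illustrative.

```python
def presence_step(mag, Y, V, beta=0.7, alpha=0.98, th_snr=3.0):
    """One frame of step 205: fast signal smoothing, slow noise
    smoothing, and an SNR-thresholded presence decision."""
    Y = beta * Y + (1 - beta) * mag      # signal spectrum estimate
    snr = Y / V                          # per-bin SNR estimate
    p = 1 if snr >= th_snr else 0        # presence probability (hard decision)
    if p == 0:                           # update noise only when speech absent
        V = alpha * V + (1 - alpha) * mag
    return Y, V, p

Y, V = 1.0, 1.0
for _ in range(50):                      # noise-only frames, |X| = 1
    Y, V, p = presence_step(1.0, Y, V)
p_noise = p
for _ in range(20):                      # speech burst, |X| = 10
    Y, V, p = presence_step(10.0, Y, V)
p_speech = p
```

During the noise-only stretch the SNR stays near 1 and presence stays 0; once the burst arrives, the fast signal estimate jumps while the frozen noise estimate holds, so presence flips to 1.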
And step 206, estimating the direction of arrival of any two microphones in the microphone array according to the existence probability of the audio signal.
The direction of arrival is the relative angle between the target sound source and the microphone array, and the estimation of the direction of arrival is divided into two steps: and calculating the time delay estimation of any two microphones in the microphone array according to the existence probability of the audio signal, and then calculating the relative angle between the target sound source and the microphone array according to the time delay estimation result.
Illustratively, according to the formula

τ = argmax_m Ψ(m), with Ψ(m) = Σ_k φ(k) · X(k, 1) · X*(k, 2) · e^{j2πkm/K},

the time delay estimate of any two microphones in the microphone array is calculated.
According to the formula

θ = arccos(c · τ / d)

the relative angle of the target sound source to the microphone array is calculated.
Where τ represents the estimated time delay between the two audio signals to be processed, Ψ(m) represents the generalized cross-correlation of the two audio signals to be processed, φ(k) = 1 / |E{X(k, 1) · X*(k, 2)}| represents the weight, E{X(k, 1) · X*(k, 2)} represents the expectation of the signal energy, θ represents the direction of arrival, c represents the speed of sound in air, and d represents the distance between the two microphones corresponding to the two audio signals to be processed.
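A toy delay-then-angle computation is sketched below. For brevity it uses a plain time-domain cross-correlation peak instead of the weighted generalized cross-correlation, and the sample rate, microphone spacing, and test signals are invented for the example.

```python
import math

def estimate_delay(x1, x2, max_lag):
    """Cross-correlation peak as a stand-in for the generalized
    cross-correlation (the PHAT-style weighting is omitted)."""
    best_lag, best_val = 0, float("-inf")
    for m in range(-max_lag, max_lag + 1):
        val = sum(x1[n] * x2[n + m]
                  for n in range(max_lag, len(x1) - max_lag))
        if val > best_val:
            best_lag, best_val = m, val
    return best_lag

fs, c, d = 16000, 343.0, 0.1            # sample rate, speed of sound, mic spacing
sig = [math.sin(0.7 * n) + 0.5 * math.sin(1.9 * n) for n in range(200)]
x1 = sig
x2 = [0.0] * 3 + sig[:-3]               # mic 2 hears the source 3 samples later
lag = estimate_delay(x1, x2, max_lag=8)
tau = lag / fs                          # delay in seconds
theta = math.degrees(math.acos(c * tau / d))   # theta = arccos(c*tau/d)
```

With a 3-sample delay at 16 kHz and 10 cm spacing, c·τ/d is about 0.64, putting the source near 50° off the array axis.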
And step 207, performing beam forming processing on the audio signal to be processed according to the direction of arrival estimation and the beam forming algorithm.
Specifically, when the direction of arrival is determined, the spatial information of the signal can be utilized to the maximum extent by the beam forming algorithm, and noise and reverberation from directions other than the sound source direction can be eliminated. The beamforming is to perform phase compensation on each microphone in different frequency bands, so as to achieve the effects of enhancing a target signal and suppressing noise and interference. Specifically, spatial filters are respectively designed on different frequency bands, and each audio signal to be processed is spatially filtered.
For example, the beamforming coefficients may be designed according to a minimum-variance distortionless criterion: the overall output energy is minimized while the signal from the direction of arrival is kept unchanged. That is, according to the formula

h_BF = argmin_h h^T · R · h, subject to d^T(θ) · h = 1,

beamforming processing is carried out on the audio signal to be processed.
Wherein R = E{X(t) · X^T(t)} and
d(θ) = [1, e^{-jωδcosθ/c}, ..., e^{-j(M-1)ωδcosθ/c}]^T.
By the Lagrange multiplier method, the following can be obtained:

h_BF = R^{-1} · d(θ) / (d^T(θ) · R^{-1} · d(θ));

h_BF^T represents the transpose of h_BF, "subject to" denotes the constraint that d^T(θ) · h_BF is equal to 1, d^T(θ) represents the transpose of d(θ), X(t) represents the short-time Fourier transform at time t, and X^T(t) represents the transpose of X(t).
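The closed-form weights from the Lagrange multiplier solution can be checked numerically for a two-microphone case. The sketch below uses the conjugate transpose, the usual choice for complex steering vectors (the patent's notation writes a plain transpose), and the covariance matrix and phase are invented for the example.

```python
import cmath

def mvdr_weights(R, d):
    """h = R^{-1} d / (d^H R^{-1} d) for two microphones, with the
    2x2 complex matrix inverse written out explicitly."""
    (a, b), (c2, e) = R
    det = a * e - b * c2
    Rinv = [[e / det, -b / det], [-c2 / det, a / det]]
    Rd = [Rinv[0][0] * d[0] + Rinv[0][1] * d[1],
          Rinv[1][0] * d[0] + Rinv[1][1] * d[1]]
    denom = d[0].conjugate() * Rd[0] + d[1].conjugate() * Rd[1]
    return [Rd[0] / denom, Rd[1] / denom]

phi = 0.8                                # invented inter-mic phase
d = [1.0 + 0j, cmath.exp(-1j * phi)]     # steering vector d(theta)
R = [[1.0 + 0j, 0j], [0j, 2.0 + 0j]]     # toy diagonal covariance
h = mvdr_weights(R, d)
constraint = d[0].conjugate() * h[0] + d[1].conjugate() * h[1]
```

Whatever the covariance, the distortionless constraint forces the steered response to equal exactly 1, which the test below verifies.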
And step 208, performing noise suppression on the audio signal to be processed after the beam forming processing to obtain a target audio signal.
Specifically, noise suppression is necessary because noise is ubiquitous in real environments. Here, noise suppression is achieved by frequency-domain filtering, and the filter can be obtained by minimizing the difference between the clean signal and the estimated signal. Noise reduction is usually performed by spectral subtraction: for each frequency point, the ratio of the clean signal to the observed signal is computed from the energy of the current signal and the energy of the noise estimate, and that ratio is then used as the frequency-domain filter gain.
Illustratively, according to the formula

h_NR(k) = (|X(k,t)|² − V(k,t)²) / |X(k,t)|²

and the formula S(k,t) = h_NR(k) X(k,t), the target audio signal is obtained.

Wherein S(k,t) represents the audio signal to be processed after noise reduction processing, h_NR(k) denotes the noise reduction filter, V(k,t) denotes the noise spectrum estimation value of the k-th frequency point at time t, and X(k,t) denotes the short-time-Fourier-transformed audio signal to be processed.
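A spectral-subtraction gain of this kind can be sketched as follows. The power-subtraction form and the spectral floor are illustrative assumptions (the source's exact gain expression is not recoverable from the extraction), and the gain is applied to toy magnitudes only.

```python
import numpy as np

def spectral_subtraction_gain(x_mag, v_mag, floor=0.05):
    """Power spectral-subtraction gain per frequency bin:
    h_NR = (|X|^2 - V^2) / |X|^2, clamped to a small spectral floor."""
    gain = 1.0 - (v_mag ** 2) / np.maximum(x_mag ** 2, 1e-12)
    return np.maximum(gain, floor)  # floor avoids negative gains and musical noise

# Toy bins: observed magnitudes |X(k,t)| and noise-estimate magnitudes V(k,t)
x_mag = np.array([1.0, 0.5, 0.2])
v_mag = np.array([0.2, 0.2, 0.2])
h_nr = spectral_subtraction_gain(x_mag, v_mag)
s_out = h_nr * x_mag  # S(k,t) = h_NR(k) X(k,t), applied to magnitudes here
```

Bins dominated by noise (the third one) are pushed down to the floor rather than to a negative gain.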
Step 209, performing inverse short-time Fourier transform on each target audio signal and outputting the result.
For example, after the target audio signal is determined, it is converted from the frequency domain back to the time domain by the inverse short-time Fourier transform to obtain the finally output digital audio stream. The output digital audio stream may be output through an audio output module, which may be a headphone interface, a USB sound card, or another smart microphone.
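The frequency-to-time conversion can be illustrated with a minimal analysis/synthesis pair using weighted overlap-add. This is a generic STFT/ISTFT sketch, not the source's exact transform; window length and hop are illustrative.

```python
import numpy as np

def stft(x, win, hop):
    """Minimal STFT: windowed frames, real FFT per frame."""
    frames = [x[i:i + len(win)] * win for i in range(0, len(x) - len(win) + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(S, win, hop):
    """Inverse STFT by weighted overlap-add, normalized by the summed window energy."""
    n = (len(S) - 1) * hop + len(win)
    x = np.zeros(n)
    norm = np.zeros(n)
    for i, frame in enumerate(S):
        t = i * hop
        x[t:t + len(win)] += np.fft.irfft(frame, len(win)) * win
        norm[t:t + len(win)] += win ** 2
    return x / np.maximum(norm, 1e-12)

# Round trip on a toy signal: interior samples reconstruct (near-)exactly
win = np.hanning(256)
x = np.random.default_rng(2).standard_normal(4096)
y = istft(stft(x, win, hop=128), win, hop=128)
```

Only the first and last window of samples are affected by missing overlap; everything in between is recovered.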
The embodiment of the disclosure provides a voice processing method. When a plurality of audio signals to be processed are obtained, it is detected whether a synchronous input signal exists; when the synchronous input signal exists, echo cancellation processing is performed on each audio signal to be processed. Then, voice activity detection and noise estimation are performed on each audio signal to be processed after echo cancellation, yielding the audio signal existence probability. The direction-of-arrival estimation between every two audio signals to be processed is determined according to the audio signal existence probability, noise reduction processing is performed on each audio signal to be processed according to the direction-of-arrival estimation, and the noise-reduced target audio signal is finally output. Because the method detects synchronous input signals among the received audio signals to be processed, performs voice activity detection and noise estimation, estimates the direction of arrival for every two audio signals to be processed from the resulting audio signal existence probability, and performs noise reduction on all audio signals to be processed according to that estimate, it further reduces the various noises in the target audio signal, realizes audio pickup and enhancement, and improves the accuracy of audio recognition. In addition, the method can acquire and process the audio signals to be processed output by a plurality of intelligent microphones simultaneously, realizing joint processing of multiple intelligent microphones, so it can cope with complex scenes that are difficult to process and has strong adaptability. A microphone designed by this method has a comprehensive mid- and far-field voice enhancement effect, can be applied to all scenes with mid- and far-field voice enhancement requirements, and has very high universality.
Based on the speech processing method described in the above embodiments, the following is an embodiment of the apparatus of the present disclosure, which can be used to execute the embodiment of the method of the present disclosure.
The embodiment of the present disclosure provides a voice processing apparatus, as shown in fig. 3a, the voice processing apparatus 30 includes: an acquisition module 301, a first processing module 302, a second processing module 303, a third processing module 304 and an output module 305.
The acquiring module 301 is configured to acquire at least two audio signals to be processed; the at least two audio signals to be processed comprise audio signals acquired by a microphone array.
A first processing module 302, configured to perform direction-of-arrival estimation on any two microphones in the microphone array.
The second processing module 303 is configured to perform beamforming processing on the audio signal to be processed according to the direction of arrival estimation and a beamforming algorithm.
The third processing module 304 is configured to perform noise suppression on the audio signal to be processed after the beamforming processing, so as to obtain a target audio signal.
An output module 305, configured to output the target audio signal.
In one embodiment, as shown in fig. 3b, the apparatus further comprises a determination module 306, and the first processing module 302 comprises a first processing sub-module 3021.
The determining module 306 is configured to perform voice activity detection and noise estimation on each to-be-processed audio signal, and determine an existence probability of the audio signal according to results of the voice activity detection and the noise estimation.
The first processing sub-module 3021 is configured to perform direction-of-arrival estimation on any two microphones in the microphone array according to the audio signal existence probability.
In one embodiment, as shown in fig. 3c, the first processing submodule 3021 comprises a calculation unit 30211.
The calculating unit 30211 is configured to calculate time delay estimates of any two microphones in the microphone array according to the existence probability of the audio signal, and calculate a relative angle between a target sound source and the microphone array according to a result of the time delay estimates.
In one embodiment, as shown in FIG. 3d, the determination module 306 includes a first determination submodule 3061, a second processing submodule 3062, a third processing submodule 3063, and a fourth processing submodule 3064.
Therein, the first determining submodule 3061 is used for determining whether there is a synchronous input signal.
The second processing submodule 3062 is configured to perform echo cancellation processing on each to-be-processed audio signal when it is determined that the synchronization input signal exists.
The third processing submodule 3063 is configured to perform voice activity detection and noise estimation on each of the to-be-processed audio signals after the echo cancellation processing.
The fourth processing submodule 3064 is configured to, when it is determined that the synchronization input signal is not present, perform voice activity detection and noise estimation on each of the audio signals to be processed.
In one embodiment, as shown in fig. 3e, the obtaining module 301 includes an obtaining sub-module 3011 and a transform submodule 3012.
The obtaining sub-module 3011 is configured to obtain at least two original audio signals; the original audio signal is a signal output by the audio input module.
The transform submodule 3012 is configured to perform short-time fourier transform on each original audio signal to obtain the to-be-processed audio signal.
In one embodiment, as shown in FIG. 3f, the second processing submodule 3062 includes a processing unit 30621.
Wherein the processing unit 30621 is configured to perform echo cancellation processing on each audio signal to be processed according to the formula

y(t,m) = Σ_{l=0}^{L−1} h_l s(t−l)

and the formula

ĥ(t+1,m) = ĥ(t,m) + μ e(t,m) s(k,m) / (s^T(k,m) s(k,m)), with e(t,m) = x(t,m) − ŷ(t,m) and ŷ(t,m) = ĥ^T(t,m) s(k,m).

Wherein y(t,m) represents the synchronous input signal collected by the m-th microphone at time t, s(t−l) represents the synchronous input signal at time t−l, h_l denotes the l-th tap of the channel from the synchronous input signal to each microphone, l is the index in the summation, L denotes the channel length, and h(t,m) = [h_0 h_1 … h_{L−1}] represents the channel between the synchronous input signal and the m-th microphone at time t; ĥ(t+1,m) represents the channel estimate of the synchronous input signal acquired by the m-th microphone at time t+1, ĥ(t,m) represents the channel estimate at time t, e(t,m) represents the error signal, μ represents the smoothing factor, ŷ(t,m) denotes the echo estimate of the m-th microphone at time t, x(t,m) denotes the near-end signal of the m-th microphone at time t, s(k,m) = [s(k,m) s(k−1,m) … s(k−L+1,m)] represents the vector of synchronous input signals, and s^T(k,m) represents the transpose of s(k,m).
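The channel update described here has the shape of a normalized-LMS recursion; a time-domain sketch under illustrative parameters (filter length, step size, and the toy echo path are assumptions, not values from the source):

```python
import numpy as np

def nlms_echo_cancel(x, s, L=64, mu=0.5, eps=1e-8):
    """Normalized-LMS echo canceller sketch.

    x: near-end microphone signal (echo + local speech),
    s: far-end (synchronous input) signal.
    Returns the error signal e = x - estimated echo, and the channel estimate.
    """
    h = np.zeros(L)                                       # channel estimate h_hat
    e = np.zeros_like(x)
    for t in range(len(x)):
        s_vec = s[max(0, t - L + 1):t + 1][::-1]          # [s(t), s(t-1), ..., s(t-L+1)]
        s_vec = np.pad(s_vec, (0, L - len(s_vec)))        # zero-fill before signal start
        y_hat = h @ s_vec                                 # echo estimate
        e[t] = x[t] - y_hat                               # error signal
        h += mu * e[t] * s_vec / (s_vec @ s_vec + eps)    # normalized update
    return e, h

# Toy convergence check: near end contains only echo through a short channel
rng = np.random.default_rng(0)
s = rng.standard_normal(4000)
h_true = np.array([0.5, -0.3, 0.1])
x = np.convolve(s, h_true)[:len(s)]
e, h = nlms_echo_cancel(x, s, L=8)
```

With no near-end speech the residual error decays toward zero and the leading taps of the estimate approach the true channel.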
In one embodiment, as shown in FIG. 3g, the determination module 306 includes a detection sub-module 3065, a fifth processing sub-module 3066, and a second determination sub-module 3067.
Wherein the detection submodule 3065 is configured to perform voice activity detection on each audio signal to be processed according to the formula

Y(k,t) = β Y(k,t−1) + (1−β) |X(k,t)|, where β = β_s when speech is present and β = β_n otherwise.

The fifth processing submodule 3066 is configured to perform noise estimation on each audio signal to be processed according to the formula

V(k,t) = α V(k,t−1) + (1−α) |X(k,t)|, where α = α_s when speech is present and α = α_n otherwise.

The second determination submodule 3067 is configured to determine the audio signal existence probability according to the formula

SNR(k,t) = Y(k,t) / V(k,t)

and the formula

P(k,t) = 1 if SNR(k,t) > TH_SNR, and P(k,t) = 0 otherwise.

Wherein α_s represents the smoothing factor of the noise estimation when speech is present and α_n the smoothing factor of the noise estimation when speech is absent; V(k,t−1) represents the noise spectrum estimation value of the k-th frequency point at time t−1, V(k,t) the noise spectrum estimation value at time t, and X(k,t) the short-time Fourier transform of the k-th frequency point at time t; β_s represents the smoothing factor of the signal estimation when speech is present and β_n the smoothing factor of the signal estimation when speech is absent; Y(k,t−1) represents the signal spectrum estimation value of the k-th frequency point at time t−1 and Y(k,t) the signal spectrum estimation value at time t; SNR(k,t) represents the estimated signal-to-noise ratio, P(k,t) the speech existence probability of the k-th frequency point at time t, and TH_SNR the signal-to-noise-ratio threshold.
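One frame of this recursive signal/noise tracking can be sketched as below. The smoothing constants, the hard threshold decision, and the binary presence probability are illustrative assumptions; the source may use soft probabilities.

```python
import numpy as np

def update_estimates(X_mag, V_prev, Y_prev, alpha_s=0.98, alpha_n=0.9,
                     beta_s=0.7, beta_n=0.98, th_snr=2.0):
    """One frame of recursive noise (V) and signal (Y) spectrum tracking per bin."""
    snr = Y_prev / np.maximum(V_prev, 1e-12)
    speech = snr > th_snr                          # hard speech-presence decision
    alpha = np.where(speech, alpha_s, alpha_n)     # slow noise update while speech is present
    beta = np.where(speech, beta_s, beta_n)        # fast signal update while speech is present
    V = alpha * V_prev + (1 - alpha) * X_mag       # noise spectrum estimate V(k,t)
    Y = beta * Y_prev + (1 - beta) * X_mag         # signal spectrum estimate Y(k,t)
    P = speech.astype(float)                       # presence probability P(k,t) (binary here)
    return V, Y, P

# One bin with a strong current frame: classified as speech, noise barely updated
V, Y, P = update_estimates(np.array([1.0]), V_prev=np.array([0.1]), Y_prev=np.array([0.5]))
```

Because the bin's prior SNR (0.5 / 0.1 = 5) exceeds the threshold, the noise estimate moves only slightly toward the new magnitude while the signal estimate moves quickly.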
In one embodiment, as shown in fig. 3h, the calculation unit 30211 includes a first calculation subunit 302111 and a second calculation subunit 302112.
Wherein the first calculating subunit 302111 is configured to calculate the time delay estimate of any two microphones in the microphone array according to the formula

τ = argmax_m Ψ(m), where Ψ(m) = Σ_k φ(k) X(k,1) X*(k,2) e^{j2πkm/K}.

The second calculating subunit 302112 is configured to calculate the relative angle of the target sound source to the microphone array according to the formula

θ = arccos(c τ / d).

Wherein τ represents the estimate of the time delay between the two audio signals to be processed, Ψ(m) represents the generalized cross-correlation of the two audio signals to be processed, φ(k) = 1 / |E{X(k,1) X*(k,2)}| represents the weight value, E{X(k,1) X*(k,2)} represents the expectation of the signal energy, θ represents the direction of arrival, c represents the speed of sound in air, and d represents the distance between the two microphones corresponding to the two audio signals to be processed.
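The delay estimate via generalized cross-correlation with phase-transform (PHAT) style weighting, followed by the arccos angle mapping, can be sketched as follows; signal length, sampling rate, and the instantaneous (non-averaged) weighting are illustrative assumptions.

```python
import numpy as np

def gcc_phat_delay(x1, x2, fs):
    """Delay of x2 relative to x1 via generalized cross-correlation, PHAT weighting."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = np.conj(X1) * X2
    psi = np.fft.irfft(cross / np.maximum(np.abs(cross), 1e-12), n)  # whitened correlation
    max_shift = n // 2
    psi = np.concatenate((psi[-max_shift:], psi[:max_shift + 1]))    # center zero lag
    return (np.argmax(psi) - max_shift) / fs                         # tau = argmax of Psi

def doa_angle(tau, d, c=343.0):
    """theta = arccos(c * tau / d) for a microphone pair spaced d metres apart."""
    return np.arccos(np.clip(c * tau / d, -1.0, 1.0))

# Toy check: x2 is x1 delayed by 5 samples
fs = 16000
x1 = np.random.default_rng(1).standard_normal(2048)
x2 = np.concatenate((np.zeros(5), x1[:-5]))
tau = gcc_phat_delay(x1, x2, fs)
```

The PHAT whitening keeps only the phase of the cross spectrum, which sharpens the correlation peak in reverberant conditions.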
In one embodiment, as shown in fig. 3i, the second processing module 303 comprises a sixth processing submodule 3031.
Wherein the sixth processing submodule 3031 is configured to perform beam forming processing on the audio signal to be processed according to the formula

h_BF = argmin_h h^T R h, subject to h^T d(θ) = 1.

Wherein R = E{X(t) X^T(t)}, and d(θ) = [1, e^{−jωδcosθ/c}, …, e^{−j(M−1)ωδcosθ/c}]^T.

By the Lagrange multiplier method: h_BF = R^{−1} d(θ) / (d^T(θ) R^{−1} d(θ)).

h_BF^T represents the transposed matrix of h_BF, "subject to" denotes the constraint that h^T d(θ) is equal to 1, d^T(θ) denotes the transposed matrix of d(θ), X(t) denotes the short-time Fourier transform at time t, and X^T(t) denotes the transposed matrix of X(t).
In one embodiment, as shown in FIG. 3j, the third processing module 304 includes a seventh processing submodule 3041.
Wherein the seventh processing submodule 3041 is configured to obtain the target audio signal according to the formula

h_NR(k) = (|X(k,t)|² − V(k,t)²) / |X(k,t)|²

and the formula S(k,t) = h_NR(k) X(k,t).

Wherein S(k,t) represents the audio signal to be processed after noise reduction processing, h_NR(k) denotes the noise reduction filter, V(k,t) denotes the noise spectrum estimation value of the k-th frequency point at time t, and X(k,t) denotes the short-time-Fourier-transformed audio signal to be processed.
The embodiment of the present disclosure provides a voice processing apparatus. When a plurality of audio signals to be processed are acquired, the apparatus performs direction-of-arrival estimation on any two microphones in the microphone array, performs beamforming processing on the audio signals to be processed according to the direction-of-arrival estimation and a beamforming algorithm, performs noise suppression on the beamformed audio signals to be processed, and finally outputs the target audio signal obtained after noise suppression. Because the apparatus estimates the direction of arrival for every two audio signals to be processed and suppresses noise in the audio signals after beamforming, it realizes audio pickup and enhancement and improves the accuracy of audio recognition.
Referring to fig. 4, an embodiment of the present disclosure further provides a speech processing apparatus, where the speech processing apparatus includes a receiver 401, a transmitter 402, a memory 403, and a processor 404, where the transmitter 402 and the memory 403 are respectively connected to the processor 404, the memory 403 stores at least one computer instruction, and the processor 404 is configured to load and execute the at least one computer instruction to implement the speech processing method described in the embodiment corresponding to fig. 1.
Based on the voice processing method described in the embodiment corresponding to fig. 1, an embodiment of the present disclosure further provides a computer-readable storage medium, for example, the non-transitory computer-readable storage medium may be a Read Only Memory (ROM), a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The storage medium stores computer instructions for executing the voice processing method described in the embodiment corresponding to fig. 1, which is not described herein again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (11)

1. A method of speech processing, the method comprising:
acquiring at least two audio signals to be processed; the at least two audio signals to be processed comprise audio signals acquired by a microphone array;
estimating the direction of arrival of any two microphones in the microphone array;
carrying out beam forming processing on the audio signal to be processed according to the direction of arrival estimation and the beam forming algorithm;
carrying out noise suppression on the audio signal to be processed after the beam forming processing to obtain a target audio signal;
and outputting the target audio signal.
2. The method of claim 1, wherein prior to the estimating the direction of arrival for any two microphones of the array of microphones, further comprising:
performing voice activity detection and noise estimation on each audio signal to be processed, and determining the existence probability of the audio signal according to the results of the voice activity detection and the noise estimation;
the estimating direction of arrival of any two microphones of the microphone array comprises:
and estimating the direction of arrival of any two microphones in the microphone array according to the existence probability of the audio signal.
3. The method of claim 2, wherein the estimating direction of arrival for any two microphones in the microphone array according to the audio signal presence probability comprises:
calculating time delay estimation of any two microphones in the microphone array according to the existence probability of the audio signal;
and calculating the relative angle between the target sound source and the microphone array according to the time delay estimation result.
4. The method of claim 3, wherein the performing voice activity detection and noise estimation on each of the audio signals to be processed comprises:
determining whether there is a synchronous input signal;
when it is determined that the synchronous input signal exists, performing echo cancellation processing on each audio signal to be processed;
performing voice activity detection and noise estimation on each audio signal to be processed after echo cancellation processing;
and when the synchronous input signal is determined not to exist, carrying out voice activity detection and noise estimation on each audio signal to be processed.
5. The method of claim 1, wherein the obtaining at least two audio signals to be processed comprises:
acquiring at least two original audio signals; the original audio signal is a signal output by an audio input module;
and carrying out short-time Fourier transform on each original audio signal to obtain the audio signal to be processed.
6. The method of claim 4, wherein the performing echo cancellation processing on each of the audio signals to be processed comprises:
according to the formula
Figure FDA0003236475320000021
And
formula (II)
Figure FDA0003236475320000022
Performing echo cancellation processing on each audio signal to be processed;
wherein y (t, m) represents the m-thSynchronous input signal collected by microphone at t moment, s (t-l) represents synchronous input signal at t-l moment, hlIndicating the channel between the synchronous input signal to each microphone, L being an identifier in the accumulation operator, L indicating the length of time, h (t, m) [ h ]0h1...hL-1]Representing the channel between the synchronous input signal to the mth microphone at time t;
Figure FDA0003236475320000023
representing the channel estimate of the synchronous input signal acquired by the mth microphone at time t +1,
Figure FDA0003236475320000024
representing the channel estimate of the synchronous input signal acquired by the mth microphone at time t,
Figure FDA0003236475320000025
representing the error signal, mu the smoothing factor,
Figure FDA0003236475320000026
denotes the echo estimate of the mth microphone at time t, x (t, m) denotes the near-end signal of the mth microphone at time t, s (k, m) [ s (k, m) s (k-1, m) … s (k-L +1, m) ]]Representing a vector of synchronous input signals, sT(k, m) represents the transpose of s (k, m).
7. The method of claim 6, wherein performing voice activity detection and noise estimation on each of the audio signals to be processed, and determining the audio signal existence probability according to the results of the voice activity detection and noise estimation comprises:
according to the formula

Y(k,t) = β Y(k,t−1) + (1−β) |X(k,t)|, where β = β_s when speech is present and β = β_n otherwise,

performing voice activity detection on each audio signal to be processed;

according to the formula

V(k,t) = α V(k,t−1) + (1−α) |X(k,t)|, where α = α_s when speech is present and α = α_n otherwise,

performing noise estimation on each audio signal to be processed;

according to the formula

SNR(k,t) = Y(k,t) / V(k,t)

and the formula

P(k,t) = 1 if SNR(k,t) > TH_SNR, and P(k,t) = 0 otherwise,

determining the audio signal existence probability;

wherein α_s represents the smoothing factor of the noise estimation when speech is present and α_n the smoothing factor of the noise estimation when speech is absent; V(k,t−1) represents the noise spectrum estimation value of the k-th frequency point at time t−1, V(k,t) the noise spectrum estimation value at time t, and X(k,t) the short-time Fourier transform of the k-th frequency point at time t; β_s represents the smoothing factor of the signal estimation when speech is present and β_n the smoothing factor of the signal estimation when speech is absent; Y(k,t−1) represents the signal spectrum estimation value of the k-th frequency point at time t−1 and Y(k,t) the signal spectrum estimation value at time t; SNR(k,t) represents the estimated signal-to-noise ratio, P(k,t) the speech existence probability of the k-th frequency point at time t, and TH_SNR the signal-to-noise-ratio threshold.
8. The method of claim 7, wherein calculating an estimate of time delay for any two microphones in the array of microphones based on the probability of existence of the audio signal, and wherein calculating a relative angle of a target sound source to the array of microphones based on the result of the estimate of time delay comprises:
according to the formula

τ = argmax_m Ψ(m), where Ψ(m) = Σ_k φ(k) X(k,1) X*(k,2) e^{j2πkm/K},

calculating the time delay estimate of any two microphones in the microphone array;

according to the formula

θ = arccos(c τ / d),

calculating the relative angle between the target sound source and the microphone array;

wherein τ represents the estimate of the time delay between the two audio signals to be processed, Ψ(m) represents the generalized cross-correlation of the two audio signals to be processed, φ(k) = 1 / |E{X(k,1) X*(k,2)}| represents the weight value, E{X(k,1) X*(k,2)} denotes the expectation of the signal energy, θ denotes the direction of arrival, c denotes the speed of sound in air, and d denotes the distance between the two microphones corresponding to the two audio signals to be processed.
9. The method of claim 8, wherein the beamforming the audio signal to be processed according to the direction of arrival estimation and beamforming algorithm comprises:
according to the formula

h_BF = argmin_h h^T R h, subject to h^T d(θ) = 1,

performing beam forming processing on the audio signal to be processed;

wherein R = E{X(t) X^T(t)}, and d(θ) = [1, e^{−jωδcosθ/c}, …, e^{−j(M−1)ωδcosθ/c}]^T;

by the Lagrange multiplier method: h_BF = R^{−1} d(θ) / (d^T(θ) R^{−1} d(θ));

h_BF^T represents the transposed matrix of h_BF, "subject to" denotes the constraint that h^T d(θ) is equal to 1, d^T(θ) denotes the transposed matrix of d(θ), X(t) denotes the short-time Fourier transform at time t, and X^T(t) denotes the transposed matrix of X(t).
10. The method of claim 9, wherein the performing noise suppression on the beamformed audio signal to be processed to obtain a target audio signal comprises:
according to the formula

h_NR(k) = (|X(k,t)|² − V(k,t)²) / |X(k,t)|²

and the formula S(k,t) = h_NR(k) X(k,t), obtaining the target audio signal;

wherein S(k,t) represents the audio signal to be processed after noise reduction processing, h_NR(k) denotes the noise reduction filter, V(k,t) denotes the noise spectrum estimation value of the k-th frequency point at time t, and X(k,t) denotes the short-time-Fourier-transformed audio signal to be processed.
11. A speech processing apparatus, comprising:
the acquisition module is used for acquiring at least two audio signals to be processed; the at least two audio signals to be processed comprise audio signals acquired by a microphone array;
the first processing module is used for estimating the direction of arrival of any two microphones in the microphone array;
the second processing module is used for carrying out beam forming processing on the audio signal to be processed according to the direction of arrival estimation and the beam forming algorithm;
the third processing module is used for carrying out noise suppression on the audio signal to be processed after the beam forming processing to obtain a target audio signal;
and the output module is used for outputting the target audio signal.
CN202111003630.0A 2021-08-30 2021-08-30 Voice processing method and device Pending CN113744752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111003630.0A CN113744752A (en) 2021-08-30 2021-08-30 Voice processing method and device


Publications (1)

Publication Number Publication Date
CN113744752A true CN113744752A (en) 2021-12-03

Family

ID=78733797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111003630.0A Pending CN113744752A (en) 2021-08-30 2021-08-30 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN113744752A (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007147732A (en) * 2005-11-24 2007-06-14 Japan Advanced Institute Of Science & Technology Hokuriku Noise reduction system and noise reduction method
CN106251877A (en) * 2016-08-11 2016-12-21 珠海全志科技股份有限公司 Voice Sounnd source direction method of estimation and device
CN108831508A (en) * 2018-06-13 2018-11-16 百度在线网络技术(北京)有限公司 Voice activity detection method, device and equipment
CN108899044A (en) * 2018-07-27 2018-11-27 苏州思必驰信息科技有限公司 Audio signal processing method and device
CN108922553A (en) * 2018-07-19 2018-11-30 苏州思必驰信息科技有限公司 Wave arrival direction estimating method and system for sound-box device
CN110097891A (en) * 2019-04-22 2019-08-06 广州视源电子科技股份有限公司 Microphone signal processing method, device, equipment and storage medium
CN110164446A (en) * 2018-06-28 2019-08-23 腾讯科技(深圳)有限公司 Voice signal recognition methods and device, computer equipment and electronic equipment
CN110556103A (en) * 2018-05-31 2019-12-10 阿里巴巴集团控股有限公司 Audio signal processing method, apparatus, system, device and storage medium
CN111161751A (en) * 2019-12-25 2020-05-15 声耕智能科技(西安)研究院有限公司 Distributed microphone pickup system and method under complex scene
CN111624553A (en) * 2020-05-26 2020-09-04 锐迪科微电子科技(上海)有限公司 Sound source positioning method and system, electronic equipment and storage medium
CN111856402A (en) * 2020-07-23 2020-10-30 海尔优家智能科技(北京)有限公司 Signal processing method and device, storage medium, and electronic device
CN113270106A (en) * 2021-05-07 2021-08-17 深圳市友杰智新科技有限公司 Method, device and equipment for inhibiting wind noise of double microphones and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783441A (en) * 2022-05-14 2022-07-22 云知声智能科技股份有限公司 Voice recognition method, device, equipment and medium
CN115579016A (en) * 2022-12-07 2023-01-06 成都海普迪科技有限公司 Method and system for eliminating acoustic echo
CN115579016B (en) * 2022-12-07 2023-03-21 成都海普迪科技有限公司 Method and system for eliminating acoustic echo

Similar Documents

Publication Publication Date Title
US10123113B2 (en) Selective audio source enhancement
JP4815661B2 (en) Signal processing apparatus and signal processing method
CN106710601B (en) Noise-reduction and pickup processing method and device for voice signals and refrigerator
EP3542547B1 (en) Adaptive beamforming
CN107479030B (en) Frequency division and improved generalized cross-correlation based binaural time delay estimation method
KR101449433B1 (en) Noise cancelling method and apparatus from the sound signal through the microphone
KR101456866B1 (en) Method and apparatus for extracting the target sound signal from the mixed sound
US8462962B2 (en) Sound processor, sound processing method and recording medium storing sound processing program
CN109285557B (en) Directional pickup method and device and electronic equipment
KR20040044982A (en) Selective sound enhancement
CN106887239A (en) For the enhanced blind source separation algorithm of the mixture of height correlation
CN110610718B (en) Method and device for extracting expected sound source voice signal
JP2007523514A (en) Adaptive beamformer, sidelobe canceller, method, apparatus, and computer program
CN111863015B (en) Audio processing method, device, electronic equipment and readable storage medium
CN108109617A (en) A kind of remote pickup method
CN113744752A (en) Voice processing method and device
WO2007123047A1 (en) Adaptive array control device, method, and program, and its applied adaptive array processing device, method, and program
KR100917460B1 (en) Noise cancellation apparatus and method thereof
CN113903353A (en) Directional noise elimination method and device based on spatial discrimination detection
CN112802490B (en) Beam forming method and device based on microphone array
CN117169812A (en) Sound source positioning method based on deep learning and beam forming
CN116106826A (en) Sound source positioning method, related device and medium
CN116760442A (en) Beam forming method, device, electronic equipment and storage medium
CN113948101B (en) Noise suppression method and device based on space distinguishing detection
KR20090098552A (en) Apparatus and method for automatic gain control using phase information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20211203