WO2016074495A1 - Signal processing method and device - Google Patents

Signal processing method and device

Info

Publication number
WO2016074495A1
WO2016074495A1 (Application PCT/CN2015/084148, CN2015084148W)
Authority
WO
WIPO (PCT)
Prior art keywords
signal
frequency
beamforming
output
frequency domain
Prior art date
Application number
PCT/CN2015/084148
Other languages
English (en)
French (fr)
Inventor
韩娜
袁浩
黄冬梅
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Priority to US15/526,812 priority Critical patent/US10181330B2/en
Priority to EP15859302.0A priority patent/EP3220158A4/en
Publication of WO2016074495A1 publication Critical patent/WO2016074495A1/zh

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00: Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80: Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802: Systems for determining direction or deviation from predetermined direction
    • G01S3/805: Systems for determining direction or deviation from predetermined direction using adjustment of real or effective orientation of directivity characteristics of a transducer or transducer system to give a desired condition of signal derived from that transducer or transducer system, e.g. to give a maximum or minimum signal
    • G01S3/8055: Systems for determining direction or deviation from predetermined direction using adjustment of real or effective orientation of directivity characteristics of a transducer or transducer system to give a desired condition of signal derived from that transducer or transducer system, e.g. to give a maximum or minimum signal, adjusting orientation of a single directivity characteristic to produce maximum or minimum signal
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00: Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18: Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20: Position of source determined by a plurality of spaced direction-finders
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude

Definitions

  • the present invention relates to the field of signal processing, and in particular, to a signal processing method and apparatus.
  • DSBF: Delay and Sum Beamforming
  • Depending on the implementation, existing beamforming techniques can be divided into fixed beamforming (Delay and Sum Beamforming, DSBF) and adaptive beamforming. Flanagan proposed the DSBF method in 1985 as the simplest fixed beamforming method: it first applies time compensation to the speech signals received at the individual microphones of the array so that the channels are synchronized, and then averages the channel signals. In this case, once the signal deviates from the array's steering direction, the array exhibits different gains for signals of different frequencies, causing processing distortion of wideband signals.
  • Another type of beamforming technique that corresponds to fixed beamforming techniques is adaptive beamforming, the adaptive characteristic of which is that the filter coefficients vary with the statistical characteristics of the input signal.
  • the Generalized Sidelobe Canceller (GSC) proposed by Griffiths and Jim in 1982 is a general model of adaptive beamformers.
  • GSC: Generalized Sidelobe Canceller
  • in the GSC algorithm, however, the output of the blocking matrix (BM) often contains valid speech components, which damages the original speech in the filtering result.
  • the present invention provides a signal processing method and apparatus, whose main purpose is to solve the problem, existing in the prior art, of wideband-signal distortion in microphone-array-based speech enhancement.
  • the present invention provides a signal processing method, the method comprising:
  • a time domain sound signal outputted after beamforming in the direction is obtained.
  • acquiring, according to the preset multi-direction weight vectors and the frequency-domain audio signal corresponding to each channel's sound signal, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point includes:
  • acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction includes:
  • the beam energies of different frequency points in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the beam energies of different frequency points in the same direction are summed, and the direction with the largest beam energy is selected as the output direction, including:
  • the beam energies of all frequency points between the preset first frequency and the second frequency in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the multi-directional weight vector is obtained based on a delay accumulation beamforming algorithm, a linear constrained minimum variance beamforming algorithm, a generalized sidelobe cancellation beamforming algorithm, or a minimum variance distortionless response method MVDR.
  • the method further includes:
  • the audio signal of each frequency point output after beamforming in the output direction is multiplied by a gain, which is a value proportional to the frequency domain value.
  • the gain has a different proportional relationship with the frequency domain value in a range of different frequency domain values set in advance.
  • the present invention also provides an apparatus for signal processing, the apparatus comprising:
  • a short-time Fourier transform STFT unit configured to acquire at least two channel sound signals, and perform short-time Fourier transform STFT on each channel sound signal to obtain a frequency domain audio signal corresponding to each channel sound signal;
  • a first acquiring unit configured to acquire, according to a preset multi-direction weight vector and a frequency domain audio signal corresponding to each channel sound signal, a beamforming output signal of a corresponding beam group of the audio signal of each frequency point;
  • a second acquiring unit configured to acquire an output direction of the beam group according to beam energy of different frequency points in the same direction
  • An inverse transform unit is configured to acquire a time domain sound signal output after beamforming in the direction.
  • the first acquiring unit is configured to:
  • the second acquiring unit is configured to:
  • the beam energies of different frequency points in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the second acquiring unit is further configured to:
  • the beam energies of all frequency points between the preset first frequency and the second frequency in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the multi-directional weight vector is obtained based on a delay accumulation beamforming algorithm, a linear constrained minimum variance beamforming algorithm, a generalized sidelobe cancellation beamforming algorithm, or a minimum variance distortionless response method MVDR.
  • the apparatus further includes a gain unit configured to multiply an audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value proportional to the frequency-domain value.
  • the gain has a different proportional relationship with the frequency domain value in a range of different frequency domain values set in advance.
  • the invention acquires at least two channel sound signals, and performs a short-time Fourier transform (STFT) on each channel sound signal to acquire the frequency-domain audio signal corresponding to each channel's sound signal; acquires, according to the preset multi-direction weight vectors and the frequency-domain audio signal corresponding to each channel's sound signal, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point; acquires the output direction of the beam group according to the beam energies of different frequency points in the same direction; and acquires the time-domain sound signal output after beamforming in that direction.
  • the present invention adopts a frequency domain-based wideband beamforming algorithm to effectively improve the gain of the received speech, and adaptively selects the optimal beam, which avoids the need for a priori information such as the direction of arrival of the desired signal, reduces the complexity of the algorithm and widens its range of application.
  • the frequency domain beamforming algorithm used is advantageous for fine adjustment of the signal spectrum, and is convenient for fusion with other pre- and post-processing algorithms.
  • the invention is easy to implement, has small calculation amount, and is suitable for various embedded platforms.
  • FIG. 1 is a schematic flow chart of a first embodiment of a signal processing method according to the present invention.
  • FIG. 2 is a schematic diagram of a method of beamforming according to the present invention.
  • FIG. 3 is a schematic diagram of the refinement process of step 103;
  • FIG. 4 is a schematic diagram of an L-shaped three-dimensional space microphone array provided by the present invention.
  • FIG. 5 is a schematic flowchart diagram of a second embodiment of a method for signal processing according to the present invention.
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of a signal processing apparatus according to the present invention.
  • FIG. 7 is a schematic diagram of functional modules of a second embodiment of a signal processing apparatus according to the present invention.
  • the present invention provides a method of signal processing.
  • Embodiment 1:
  • FIG. 1 is a schematic flowchart of a first embodiment of a signal processing method according to the present invention.
  • the method of signal processing includes:
  • Step 101 Acquire at least two channel sound signals, and perform short-time Fourier transform STFT on each channel sound signal to acquire frequency domain audio signals corresponding to the sound signals of the respective channels;
  • STFT short-time Fourier transform
  • each microphone signal is framed in the same way for the short-time Fourier transform, with adjacent frames partially overlapping.
  • a 1/4 frame shift is used for framing in this embodiment, although other schemes such as a 1/2 frame shift can also be adopted.
  • the n-th microphone frame signal s_n(i) is multiplied by the window function w(i) (a Hamming window in this embodiment) to obtain the windowed frame signal x_n(i)
  • a short-time Fourier transform is performed on the windowed frame signal to obtain frame data in the frequency domain, namely:
  • Step 102 Acquire a beamforming output signal of a corresponding beam group of an audio signal of each frequency point according to a preset multi-direction weight vector and a frequency domain audio signal corresponding to each channel sound signal;
  • a beam group is designed, including M beams respectively pointing in M directions θ_1, θ_2, …, θ_M, and each beam uses all array elements in the microphone array for beamforming.
  • the main lobe of each adjacent beam intersects, and the main lobe of the beam group covers the required spatial extent, so that no matter which direction the sound source comes from, there is a beam pointing close to it.
  • the corresponding M beamformed frequency domain frame data is obtained.
  • the specific method is as follows: for a particular direction θ_m, the weight vectors of the M different directions are used to form weighted sums of the data received by each microphone of the array at the same frequency point f, yielding the weighted synthesis data Y_m(f) of the m-th beam at that frequency point.
  • W_{m,n}(f) is the weight applied to the data at frequency point f received by the n-th microphone in the m-th beam
  • * denotes conjugation and H denotes conjugate transpose
  • X and W_m are the vector representations of X_n(f) and W_{m,n}(f), respectively.
  • acquiring, according to the preset multi-direction weight vectors and the frequency-domain audio signal corresponding to each channel's sound signal, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point includes:
  • FIG. 2 is a schematic diagram of a method of beamforming according to the present invention.
  • a circular microphone array composed of eight directional microphones is shown.
  • the microphone closest to the desired signal direction and its two adjacent microphones are used to form a sub-array for beamforming; for example, for a beam whose desired signal lies in the 45-degree direction, the No. 2 microphone facing 45 degrees and its adjacent No. 1 and No. 3 microphones form the sub-array used for beamforming.
  • Step 103 Acquire an output direction of the beam group according to beam energy of different frequency points in the same direction.
  • acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction includes:
  • the beam energies of different frequency points in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • FIG. 3 is a schematic diagram of the refinement process of step 103.
  • the energy of each of the M sets of frequency-domain frame data is calculated using the weighted synthesis data Y_m(f) obtained in step 102, as follows:
  • f_s is the sampling rate
  • the beam with the largest energy value E_m is then selected as the final beamforming result
  • the beam energies of different frequency points in the same direction are summed, and the direction with the largest beam energy is selected as the output direction, including:
  • the beam energies of all frequency points between the preset first frequency and the second frequency in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the optimal output beam can be selected according to the energy sum of a subset of frequency points.
  • the specific implementation process is shown in Figure 3.
  • the energy sum of the frequency domain frame data corresponding to the M directions is calculated by using the weighted synthesized data Y m (f) obtained in step 102. Calculated as follows:
  • the multi-directional weight vector is obtained based on a delay accumulation beamforming algorithm, a linear constrained minimum variance beamforming algorithm, a generalized sidelobe cancellation beamforming algorithm or the Minimum Variance Distortionless Response (MVDR) method.
  • MVDR: Minimum Variance Distortionless Response
  • an MVDR beamforming filter is taken as an example for detailed description.
  • the MVDR method is to minimize the power of the output signal to obtain an estimate of the optimal beamformer weight vector.
  • the power spectral density of the output signal is:
  • ⁇ xx represents the power spectral density matrix of the array input signal.
  • d represents the attenuation and delay caused by signal propagation, as follows:
  • if a far-field model is used, the amplitude differences between the signals received by the array elements are negligible and the attenuation factors α_n are all set to 1; Ω is the angular frequency, and τ_n is the time difference between two elements in space:
  • Equation (4) is applicable to microphone arrays of any topology.
  • this beamformer is transformed into solving the optimization problem with constraints:
  • the MVDR filter can be obtained by using the power spectral density matrix of the noise:
  • ⁇ vv is the power spectral density matrix of the noise. If the matrix is a coherent matrix, a super-directional beamformer is obtained, which is the frequency domain weight vector used in step 102:
  • ⁇ vv is a noise coherence function matrix, where the pth and qth column elements are calculated by:
  • Step 104 Acquire a time domain sound signal outputted after beamforming in the direction.
  • the embodiment of the present invention acquires at least two channel sound signals, and performs a short-time Fourier transform (STFT) on each channel sound signal to acquire the frequency-domain audio signal corresponding to each channel's sound signal; acquires, according to the preset multi-direction weight vectors and the frequency-domain audio signal corresponding to each channel's sound signal, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point; acquires the output direction of the beam group according to the beam energies of different frequency points in the same direction; and acquires the time-domain sound signal output after beamforming in that direction. The invention adopts a frequency domain-based wideband beamforming algorithm to effectively improve the gain of the received speech, and adaptively selects the optimal beam, which avoids the need for a priori information such as the direction of arrival of the desired signal, reduces the complexity of the algorithm and widens its range of application.
  • the frequency domain beamforming algorithm used is advantageous for fine adjustment of the signal spectrum, and is convenient for fusion with other pre- and post-processing algorithms.
  • the invention is easy to implement, has small calculation amount, and is suitable for various embedded platforms.
  • Embodiment 2:
  • FIG. 5 is a schematic flowchart diagram of a second embodiment of a signal processing method according to the present invention.
  • on the basis of the first embodiment, step 105 is further included after step 103:
  • Step 105 Multiply an audio signal of each frequency point output after beamforming in the output direction by a gain, and the gain is a value proportional to a frequency domain value.
  • for a wideband beam, it is also necessary to consider the consistency of the beam across the frequency domain, in particular the fact that the width of the beam's main lobe differs between frequency points.
  • the main lobe of the wideband beam is wide at low frequencies and narrow at high frequencies; if the normalization constraint in equation (9) is also satisfied, i.e. the signal in the desired direction is kept undistorted, the high-frequency energy of the signal is attenuated considerably, causing signal distortion. Therefore, this embodiment applies a post-processing step after beamforming: as the frequency increases, the weight coefficients of the beam are multiplied by a gradually increasing weighting factor, as shown in equation (15), to compensate for the attenuation of the high-frequency portion, thereby boosting the high frequencies.
  • different enhancement or attenuation processes are performed for different frequency points, so that the subjective auditory experience is more comfortable.
  • at low frequencies, the main lobe of the beam is very wide and the low-frequency signal is essentially not attenuated, so there is no need to enhance it.
  • when the frequency is greater than a certain value, the signal begins to be attenuated, and the gain of the beam is amplified to different degrees as the frequency increases, as shown in equation (16).
  • the gain has a different proportional relationship with the frequency domain value in a range of different frequency domain values set in advance.
  • Step 104: perform the inverse STFT on the gain-adjusted audio signal of each frequency point output after beamforming in the output direction to obtain the time-domain sound signal.
  • the frequency domain based wideband beamforming algorithm effectively improves the gain of the received speech compared with the prior art.
  • the method of adaptively selecting the best beam avoids a priori information such as providing the desired signal arrival direction, reduces the complexity of the algorithm, and increases the application range of the algorithm.
  • the frequency domain beamforming algorithm used facilitates the fine adjustment of the signal spectrum and facilitates fusion with other pre- and post-processing algorithms.
  • the post-processing algorithm for adjusting the frequency gain is used to improve the sound quality degradation in wideband speech signal processing.
  • the invention is easy to implement, has a small amount of calculation, and is suitable for various embedded platforms.
  • the present invention provides an apparatus for signal processing.
  • Embodiment 1:
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of a signal processing apparatus according to the present invention.
  • the apparatus comprises:
  • the acquisition and time-frequency transform unit 601 is configured to acquire at least two channel sound signals, and perform short-time Fourier transform STFT on each channel sound signal to acquire frequency domain audio signals corresponding to the sound signals of the respective channels;
  • STFT short-time Fourier transform
  • each microphone signal is framed in the same way for the short-time Fourier transform, with adjacent frames partially overlapping.
  • a 1/4 frame shift is used for framing in this embodiment, although other schemes such as a 1/2 frame shift can also be adopted.
  • the n-th microphone frame signal s_n(i) is multiplied by the window function w(i) (a Hamming window in this embodiment) to obtain the windowed frame signal x_n(i)
  • a short-time Fourier transform is performed on the windowed frame signal to obtain frame data in the frequency domain, namely:
  • the first obtaining unit 602 is configured to acquire, according to the preset multi-direction weight vector and the frequency domain audio signal corresponding to each channel sound signal, a beamforming output signal of the beam group corresponding to the audio signal of each frequency point;
  • a beam group is designed, including M beams respectively pointing in M directions θ_1, θ_2, …, θ_M, and each beam uses all array elements in the microphone array for beamforming.
  • the main lobe of each adjacent beam intersects, and the main lobe of the beam group covers the required spatial extent, so that no matter which direction the sound source comes from, there is a beam pointing close to it.
  • the corresponding M beamformed frequency domain frame data is obtained.
  • the specific method is as follows: for a particular direction θ_m, the weight vectors of the M different directions are used to form weighted sums of the data received by each microphone of the array at the same frequency point f, yielding the weighted synthesis data Y_m(f) of the m-th beam at that frequency point.
  • W_{m,n}(f) is the weight applied to the data at frequency point f received by the n-th microphone in the m-th beam
  • * denotes conjugation and H denotes conjugate transpose
  • X and W_m are the vector representations of X_n(f) and W_{m,n}(f), respectively.
  • the first obtaining unit 602 is configured to:
  • FIG. 2 is a schematic diagram of a method of beamforming according to the present invention.
  • a circular microphone array composed of eight directional microphones is shown.
  • the microphone closest to the desired signal direction and its two adjacent microphones are used to form a sub-array for beamforming; for example, for a beam whose desired signal lies in the 45-degree direction, the No. 2 microphone facing 45 degrees and its adjacent No. 1 and No. 3 microphones form the sub-array used for beamforming.
  • the second obtaining unit 603 is configured to acquire an output direction of the beam group according to beam energy of different frequency points in the same direction;
  • the second obtaining unit 603 is configured to:
  • the beam energies of different frequency points in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the implementation process uses the weighted synthesis data Y_m(f) obtained by the first obtaining unit 602 to calculate the energy of each of the M sets of frequency-domain frame data, as follows:
  • f_s is the sampling rate
  • the beam with the largest energy value E_m is then selected as the final beamforming result
  • the second obtaining unit 603 is further configured to:
  • the beam energies of all frequency points between the preset first frequency and the second frequency in the same direction are summed, and the direction with the largest beam energy is selected as the output direction.
  • the optimal output beam can be selected according to the energy sum of a subset of frequency points.
  • the specific implementation process is shown in Figure 3.
  • the energy sum of the frequency domain frame data corresponding to the M directions is calculated by using the weighted synthesized data Y m (f) obtained by the first obtaining unit 602. Calculated as follows:
  • the multi-direction weight vector is obtained based on a delay accumulation beamforming algorithm, a linear constrained minimum variance beamforming algorithm, a generalized sidelobe cancellation beamforming algorithm, or a minimum variance distortionless response method MVDR.
  • an MVDR beamforming filter is taken as an example for detailed description.
  • the MVDR method is to minimize the power of the output signal to obtain an estimate of the optimal beamformer weight vector.
  • the power spectral density of the output signal is:
  • ⁇ xx represents the power spectral density matrix of the array input signal.
  • d represents the attenuation and delay caused by signal propagation, as follows:
  • if a far-field model is used, the amplitude differences between the signals received by the array elements are negligible and the attenuation factors α_n are all set to 1; Ω is the angular frequency, and τ_n is the time difference between two elements in space:
  • Equation (4) is applicable to microphone arrays of any topology.
  • this beamformer is transformed into solving the optimization problem with constraints:
  • the MVDR filter can be obtained by using the power spectral density matrix of the noise:
  • ⁇ vv is the power spectral density matrix of the noise. If the matrix is a coherent matrix, that is, a super-directional beamformer is obtained, that is, a frequency domain weight vector used in the first obtaining unit 602:
  • ⁇ vv is a noise coherence function matrix, where the pth and qth column elements are calculated by:
  • the inverse transform unit 604 is configured to acquire a time domain sound signal output after beamforming in the direction.
  • at least two channel sound signals are obtained, and a short-time Fourier transform (STFT) is performed on each channel sound signal to acquire the frequency-domain audio signal corresponding to each channel's sound signal; according to the preset multi-direction weight vectors and the frequency-domain audio signal corresponding to each sound signal, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point is acquired; the output direction of the beam group is acquired according to the beam energies of different frequency points in the same direction; and the time-domain sound signal output after beamforming in that direction is acquired. The invention adopts a frequency domain-based wideband beamforming algorithm to effectively improve the gain of the received speech, and adaptively selects the optimal beam, which avoids the need for a priori information such as the direction of arrival of the desired signal, reduces the complexity of the algorithm and widens its range of application.
  • the frequency domain beamforming algorithm used is advantageous for fine adjustment of the signal spectrum, and is convenient for fusion with other pre- and post-processing algorithms.
  • the invention is easy to implement, has small calculation amount, and is suitable for various embedded platforms.
  • Embodiment 2:
  • FIG. 7 is a schematic diagram of functional modules of a second embodiment of a signal processing apparatus according to the present invention.
  • the gain unit 605 is configured to multiply an audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value proportional to a frequency domain value.
  • for a wideband beam, it is also necessary to consider the consistency of the beam across the frequency domain, in particular the fact that the width of the beam's main lobe differs between frequency points.
  • the main lobe of the wideband beam is wide at low frequencies and narrow at high frequencies; if the normalization constraint in equation (9) is also satisfied, i.e. the signal in the desired direction is kept undistorted, the high-frequency energy of the signal is attenuated considerably, causing signal distortion. Therefore, this embodiment applies a post-processing step after beamforming: as the frequency increases, the weight coefficients of the beam are multiplied by a gradually increasing weighting factor, as shown in equation (15), to compensate for the attenuation of the high-frequency portion, thereby boosting the high frequencies.
  • different enhancement or attenuation processes are performed for different frequency points, so that the subjective auditory experience is more comfortable.
  • at low frequencies, the main lobe of the beam is very wide and the low-frequency signal is essentially not attenuated, so there is no need to enhance it.
  • when the frequency is greater than a certain value, the signal begins to be attenuated, and the gain of the beam is amplified to different degrees as the frequency increases, as shown in equation (16).
  • the gain has a different proportional relationship with the frequency domain value in a range of different frequency domain values set in advance.
  • the frequency domain based wideband beamforming algorithm effectively improves the gain of the received speech compared with the prior art.
  • the method of adaptively selecting the best beam avoids a priori information such as providing the desired signal arrival direction, reduces the complexity of the algorithm, and increases the application range of the algorithm.
  • the frequency domain beamforming algorithm used facilitates the fine adjustment of the signal spectrum and facilitates fusion with other pre- and post-processing algorithms.
  • the post-processing algorithm for adjusting the frequency gain is used to improve the sound quality degradation in wideband speech signal processing.
  • the invention is easy to implement, has a small amount of calculation, and is suitable for various embedded platforms.
  • the invention adopts a frequency domain-based wideband beamforming algorithm to effectively improve the gain of the received speech, and adaptively selects the optimal beam, which avoids the need for a priori information such as the expected direction of arrival of the signal, reduces the complexity of the algorithm and widens its range of application.
  • the frequency domain beamforming algorithm used is advantageous for fine adjustment of the signal spectrum, and is convenient for fusion with other pre- and post-processing algorithms. At the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

A signal processing method: acquiring at least two channels of sound signals and obtaining the frequency-domain audio signal corresponding to each channel's sound signal; acquiring the beamforming output signal of the beam group corresponding to the audio signal of each frequency point; acquiring the output direction of the beam group; and acquiring the time-domain sound signal output after beamforming in that direction. A signal processing device: by using a frequency-domain wideband beamforming algorithm, the gain of the received speech is effectively improved; by adaptively selecting the best beam, the need for a priori information such as the direction of arrival of the desired signal is avoided, the complexity of the algorithm is reduced and its range of application is widened; the frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms; and at the same time the solution is easy to implement, requires little computation, and is suitable for various embedded platforms.

Description

Signal processing method and device
Technical Field
The present invention relates to the field of signal processing, and in particular to a signal processing method and device.
Background
The most common approach in microphone-array-based speech enhancement is to exploit the beamforming property of the array. Depending on the implementation, existing beamforming techniques can be divided into fixed beamforming (Delay and Sum Beamforming, DSBF) and adaptive beamforming. Flanagan proposed the DSBF method in 1985 as the simplest fixed beamforming method. It first applies time compensation to the speech signals received at the individual microphones of the array so that the channels are synchronized, and then averages the channel signals. In this case, once the signal deviates from the array's steering direction, the array exhibits different gains for signals of different frequencies, causing processing distortion of wideband signals.
The other class of beamforming techniques, corresponding to fixed beamforming, is adaptive beamforming, whose adaptive character lies in the fact that the filter coefficients vary with the statistical properties of the input signal. The Generalized Sidelobe Canceller (GSC) proposed by Griffiths and Jim in 1982 is a general model of adaptive beamformers. In the GSC algorithm, however, the output of the blocking matrix (BM) often contains valid speech components, which damages the original speech in the filtering result.
Summary of the Invention
The present invention provides a signal processing method and device, whose main purpose is to solve the problem, existing in the prior art, of wideband-signal distortion in microphone-array-based speech enhancement.
To achieve the above object, the present invention provides a signal processing method, the method comprising:
acquiring at least two channels of sound signals, and performing a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal;
acquiring, according to preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point;
acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction;
acquiring the time-domain sound signal output after beamforming in that direction.
In an embodiment of the present invention, acquiring, according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point includes:
selecting, according to the preset multi-direction weight vectors, the frequency-domain audio signals corresponding to all or some of the channel sound signals, and acquiring the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
In an embodiment of the present invention, acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction includes:
summing the beam energies of different frequency points in the same direction, and selecting the direction with the largest beam energy as the output direction.
In an embodiment of the present invention, summing the beam energies of different frequency points in the same direction and selecting the direction with the largest beam energy as the output direction includes:
summing the beam energies of all frequency points between a preset first frequency and a preset second frequency in the same direction, and selecting the direction with the largest beam energy as the output direction.
In an embodiment of the present invention, the multi-direction weight vectors are obtained with a delay-and-sum beamforming algorithm, a linearly constrained minimum variance beamforming algorithm, a generalized sidelobe canceller beamforming algorithm, or the minimum variance distortionless response (MVDR) method.
In an embodiment of the present invention, after acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction, the method further includes:
multiplying the audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value positively proportional to the frequency-domain value.
In an embodiment of the present invention, the gain has a different proportional relationship with the frequency-domain value in each of the preset ranges of frequency-domain values.
In addition, to achieve the above object, the present invention also provides a signal processing device, the device comprising:
a short-time Fourier transform (STFT) unit configured to acquire at least two channels of sound signals and perform a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal;
a first acquiring unit configured to acquire, according to preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point;
a second acquiring unit configured to acquire the output direction of the beam group according to the beam energies of different frequency points in the same direction;
an inverse transform unit configured to acquire the time-domain sound signal output after beamforming in that direction.
In an embodiment of the present invention, the first acquiring unit is configured to:
select, according to the preset multi-direction weight vectors, the frequency-domain audio signals corresponding to all or some of the channel sound signals, and acquire the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
In an embodiment of the present invention, the second acquiring unit is configured to:
sum the beam energies of different frequency points in the same direction, and select the direction with the largest beam energy as the output direction.
In an embodiment of the present invention, the second acquiring unit is further configured to:
sum the beam energies of all frequency points between a preset first frequency and a preset second frequency in the same direction, and select the direction with the largest beam energy as the output direction.
In an embodiment of the present invention, the multi-direction weight vectors are obtained with a delay-and-sum beamforming algorithm, a linearly constrained minimum variance beamforming algorithm, a generalized sidelobe canceller beamforming algorithm, or the minimum variance distortionless response (MVDR) method.
In an embodiment of the present invention, the device further includes a gain unit configured to multiply the audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value positively proportional to the frequency-domain value.
In an embodiment of the present invention, the gain has a different proportional relationship with the frequency-domain value in each of the preset ranges of frequency-domain values.
The present invention acquires at least two channels of sound signals and performs a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal; acquires, according to preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point; acquires the output direction of the beam group according to the beam energies of different frequency points in the same direction; and acquires the time-domain sound signal output after beamforming in that direction. By using a frequency-domain wideband beamforming algorithm, the invention effectively improves the gain of the received speech, and by adaptively selecting the best beam it avoids the need for a priori information such as the direction of arrival of the desired signal, reducing the complexity of the algorithm and widening its range of application. The frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms; at the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the first embodiment of the signal processing method of the present invention;
FIG. 2 is a schematic diagram of the beamforming method of the present invention;
FIG. 3 is a schematic diagram of the refinement of step 103;
FIG. 4 is a schematic diagram of an L-shaped three-dimensional microphone array provided by the present invention;
FIG. 5 is a schematic flowchart of the second embodiment of the signal processing method of the present invention;
FIG. 6 is a schematic diagram of the functional modules of the first embodiment of the signal processing device of the present invention;
FIG. 7 is a schematic diagram of the functional modules of the second embodiment of the signal processing device of the present invention.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The present invention provides a signal processing method.
Embodiment 1:
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the first embodiment of the signal processing method of the present invention.
In the first embodiment, the signal processing method includes:
Step 101: acquire at least two channels of sound signals, and perform a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal.
Specifically, the sound signals of N microphones (N >= 2) are collected, and a short-time Fourier transform (STFT) is applied to the time-domain signal received by each microphone to obtain the data at each frequency point of that microphone's received signal.
Each microphone signal is framed in the same way for the short-time Fourier transform, with adjacent frames partially overlapping. Several overlap schemes are possible; this embodiment frames with a 1/4-frame shift, although other schemes such as a 1/2-frame shift may also be used. The n-th microphone frame signal s_n(i) is multiplied by the window function w(i) (a Hamming window in this embodiment) to obtain the windowed frame signal x_n(i); a short-time Fourier transform is then applied to the windowed frame signal to obtain the frame data in the frequency domain, namely:
X_n(f) = fft(x_n(i))             (1)
where i = 1, …, L, L is the length of the frame data, and f is the frequency point.
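To make step 101 concrete, the sketch below frames one channel with a 1/4-frame shift, applies a Hamming window and takes the FFT of every frame, as described above. It is a minimal NumPy sketch rather than the patent's implementation; the frame length L = 256, the 16 kHz example rate and the function names are illustrative assumptions.

```python
import numpy as np

def stft_frames(s, L=256):
    """Short-time Fourier transform of one microphone channel.

    s : 1-D array of time-domain samples for this channel.
    L : frame length; a hop of L/4 implements the 1/4-frame shift used in
        the embodiment described above.
    Returns an array of shape (num_frames, L) holding X_n(f) per frame.
    """
    hop = L // 4                                  # 1/4-frame shift
    w = np.hamming(L)                             # window function w(i)
    num_frames = 1 + (len(s) - L) // hop
    frames = np.empty((num_frames, L), dtype=complex)
    for j in range(num_frames):
        x = s[j * hop : j * hop + L] * w          # windowed frame x_n(i)
        frames[j] = np.fft.fft(x)                 # X_n(f) = fft(x_n(i))  (1)
    return frames

# Example: N = 2 channels of white noise, 1 second at 16 kHz.
fs = 16000
signals = np.random.randn(2, fs)
X = np.stack([stft_frames(ch) for ch in signals])  # shape (N, frames, L)
```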
Step 102: acquire, according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
Specifically, a beam group is designed containing M beams pointing in M directions θ_1, θ_2, …, θ_M, and each beam uses all elements of the microphone array for beamforming. The main lobes of adjacent beams intersect, and the main lobes of the beam group cover the required spatial range, so that whichever direction the sound source comes from, there is a beam pointing close to it.
From the M weight vectors for the different directions, the corresponding M sets of beamformed frequency-domain frame data are obtained. The specific method is as follows: for a particular direction θ_m, the weight vectors of the M different directions are used to form weighted sums of the data received by each microphone of the array at the same frequency point f, yielding the weighted synthesis data Y_m(f) of the m-th beam at that frequency point.
Y_m(f) = W_m^H X = Σ_{n=1}^{N} W*_{m,n}(f) X_n(f)             (2)
where W_{m,n}(f) is the weight applied to the data at frequency point f received by the n-th microphone in the m-th beam, m = 1, …, M, * denotes conjugation, H denotes conjugate transpose, and X and W_m are the vector forms of X_n(f) and W_{m,n}(f), respectively.
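The weighted synthesis of equation (2) amounts to applying the conjugated weights of each beam to the per-microphone spectra and summing over microphones. The sketch below is a minimal illustration of that sum; the random weight and spectrum arrays are placeholders, since in the method the weights come from one of the designs discussed later (e.g. MVDR).

```python
import numpy as np

def beamform_group(X, W):
    """Weighted synthesis Y_m(f) = W_m^H X(f) for a group of M beams.

    X : complex array, shape (N, L)     spectra of one frame, N microphones.
    W : complex array, shape (M, N, L)  weight W_{m,n}(f) per beam/mic/bin.
    Returns Y of shape (M, L): the beamformed spectrum of each beam.
    """
    # Conjugate the weights (the 'H' in W_m^H) and sum over the microphones.
    return np.einsum('mnf,nf->mf', np.conj(W), X)

# Placeholder example: N = 8 microphones, M = 8 beams, L = 256 bins.
N, M, L = 8, 8, 256
X = np.random.randn(N, L) + 1j * np.random.randn(N, L)
W = np.random.randn(M, N, L) + 1j * np.random.randn(M, N, L)
Y = beamform_group(X, W)        # shape (8, 256)
```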
In this embodiment of the invention, acquiring, according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point includes:
selecting, according to the preset multi-direction weight vectors, the frequency-domain audio signals corresponding to all or some of the channel sound signals, and acquiring the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
Specifically, because of the topology of the microphone array, beamforming with a sub-array of the microphone array can come very close to the effect of beamforming with all of the array elements, so the same performance can be obtained with less computation. Referring to FIG. 2, FIG. 2 is a schematic diagram of the beamforming method of the present invention, showing a circular microphone array composed of eight directional microphones. In this embodiment the microphone closest to the desired signal direction and its two neighbouring microphones form a sub-array for beamforming; for example, for a beam whose desired signal lies in the 45-degree direction, the No. 2 microphone facing 45 degrees and its adjacent No. 1 and No. 3 microphones form the sub-array used for beamforming.
A beam group is designed containing eight beams pointing in eight directions: 0, 45, 90, 135, 180, 225, 270 and 315 degrees. The main lobes of adjacent beams intersect, and the superimposed main lobes of all the beams cover the 360-degree range, so that whichever direction the sound source comes from, there is a beam pointing close to it.
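The sub-array rule described above (the microphone facing the beam direction plus its two neighbours) can be written as a simple index selection. The sketch below assumes microphones numbered 1 to 8 facing 0, 45, ..., 315 degrees, matching the example of FIG. 2; the function name and numbering convention are illustrative assumptions.

```python
def subarray_for_direction(beam_deg, num_mics=8):
    """Return 1-based microphone numbers forming the sub-array for a beam.

    Microphone k (k = 1..num_mics) is assumed to face (k-1)*360/num_mics
    degrees.  The microphone closest to the beam direction and its two
    neighbours on the circle are selected.
    """
    step = 360 / num_mics
    nearest = round(beam_deg / step) % num_mics       # 0-based index
    left = (nearest - 1) % num_mics
    right = (nearest + 1) % num_mics
    return sorted(m + 1 for m in (left, nearest, right))

print(subarray_for_direction(45))   # -> [1, 2, 3]: mics 1, 2, 3 for 45 degrees
```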
Step 103: acquire the output direction of the beam group according to the beam energies of different frequency points in the same direction.
In this embodiment of the invention, acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction includes:
summing the beam energies of different frequency points in the same direction, and selecting the direction with the largest beam energy as the output direction.
Specifically, FIG. 3 is a schematic diagram of the refinement of step 103. Using the weighted synthesis data Y_m(f) obtained in step 102, the energy of each of the M sets of frequency-domain frame data is computed as follows:
E_m = Σ_{f=0}^{f_s/2} |Y_m(f)|^2             (3)
where f_s is the sampling rate. The beam with the largest energy value E_m is then selected as the final beamforming result, so that the beam closest to the direction of the sound source is selected adaptively and the best sound quality is obtained.
In this embodiment of the invention, summing the beam energies of different frequency points in the same direction and selecting the direction with the largest beam energy as the output direction includes:
summing the beam energies of all frequency points between a preset first frequency and a preset second frequency in the same direction, and selecting the direction with the largest beam energy as the output direction.
Specifically, to save computation while keeping the selection accurate, the best output beam can be selected from the energy sum of a subset of frequency points. The specific implementation is shown in FIG. 3. Using the weighted synthesis data Y_m(f) obtained in step 102, the energy sum of the frequency-domain frame data corresponding to each of the M directions is computed as follows:
E_m = Σ_{f=f_1}^{f_2} |Y_m(f)|^2             (4)
where 0 < f_1 < f_2 < f_s/2; for example, when the FFT length L is 256, f_1 = f_s/8 and f_2 = f_s/2. What is computed here is the energy sum over frequency points f_1 to f_2. The beam with the largest energy value E is then selected as the final beamforming result. Distortion of low-frequency signals can be avoided in this way.
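The selection rule of equations (3) and (4) reduces to summing |Y_m(f)|^2 for each beam, optionally over a restricted band from f_1 to f_2, and taking the index of the maximum. A minimal sketch, assuming the band limits are given as precomputed bin indices:

```python
import numpy as np

def select_beam(Y, k1=None, k2=None):
    """Pick the output beam by energy, in the spirit of equations (3)/(4).

    Y      : complex array, shape (M, L), beamformed spectra of one frame.
    k1, k2 : optional bin indices delimiting the band used for the energy
             sum (e.g. k1 = L//8, k2 = L//2 for f1 = fs/8, f2 = fs/2).
             If omitted, all bins up to L//2 are used.
    Returns (best_index, energies).
    """
    L = Y.shape[1]
    k1 = 0 if k1 is None else k1
    k2 = L // 2 if k2 is None else k2
    energies = np.sum(np.abs(Y[:, k1:k2 + 1]) ** 2, axis=1)   # E_m per beam
    return int(np.argmax(energies)), energies

# With spectra shaped like the previous sketch (8 beams, 256 bins):
Y = np.random.randn(8, 256) + 1j * np.random.randn(8, 256)
best, E = select_beam(Y, k1=256 // 8, k2=256 // 2)
print("output direction index:", best)
```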
The multi-direction weight vectors are obtained with a delay-and-sum beamforming algorithm, a linearly constrained minimum variance beamforming algorithm, a generalized sidelobe canceller beamforming algorithm, or the minimum variance distortionless response (MVDR) method.
Specifically, this embodiment takes an MVDR beamforming filter as an example for detailed description.
The MVDR method minimizes the power of the output signal to obtain an estimate of the optimal beamformer weight vector. The power spectral density of the output signal is:
Φ_YY = W^H Φ_XX W             (5)
where Φ_XX denotes the power spectral density matrix of the array input signal.
During the optimization it must be guaranteed that the signal in the desired direction is not distorted, i.e.
W^H d = 1             (6)
where d denotes the attenuation and delay caused by signal propagation, as follows:
d = [α_1 e^{-jΩτ_1}, α_2 e^{-jΩτ_2}, …, α_N e^{-jΩτ_N}]^T             (7)
If a far-field model is used, the amplitude differences between the signals received by the array elements are negligible and the attenuation factors α_n are all set to 1; Ω is the angular frequency and τ_n is the time difference between two elements in space:
τ_n = (f_s / c) (l_{x,n} sinφ cosθ + l_{y,n} sinφ sinθ + l_{z,n} cosφ)             (8)
where f_s is the sampling rate of the signal, c is the speed of sound (340 m/s), l_{x,n} is the x-axis component of the spacing between the n-th element and the reference element, l_{y,n} is the y-axis component, l_{z,n} is the z-axis component, θ is the angle between the projection of the incident signal onto the xy-plane and the x-axis,
and φ is the angle between the incident signal and the z-axis. FIG. 4 is a schematic diagram of an L-shaped three-dimensional microphone array provided by the present invention. Equation (4) is applicable to microphone arrays of any topology.
This beamformer is then transformed into solving a constrained optimization problem:
min_W W^H Φ_XX W  subject to  W^H d = 1             (9)
Since only the best noise suppression is of interest, if the direction of the desired signal coincides exactly with the steering direction of the array, only the power spectral density matrix of the noise is needed, and the MVDR filter is obtained as:
W = Φ_vv^{-1} d / (d^H Φ_vv^{-1} d)             (10)
where Φ_vv is the power spectral density matrix of the noise. If this matrix is taken to be a coherence matrix, a superdirective beamformer is obtained, which is the frequency-domain weight vector used in step 102:
W = Γ_vv^{-1} d / (d^H Γ_vv^{-1} d)             (11)
Γ_vv is the noise coherence function matrix, in which the element in row p and column q is calculated by:
[Equation (12): element (p, q) of the noise coherence function matrix Γ_vv, expressed in terms of the element spacing l_pq; equation image not reproduced]
where l_pq is the spacing between array elements p and q.
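To make the weight design concrete, the sketch below builds the steering vector of equation (7) from far-field delays and computes superdirective weights in the spirit of equation (11). The diffuse-field sinc coherence used for Γ_vv, the diagonal loading added for numerical robustness, and the example array geometry are assumptions for illustration, not details taken from the patent; delays are kept in seconds with Ω in rad/s, which is equivalent to the sample-based form described above.

```python
import numpy as np

def steering_vector(positions, theta, phi, omega, c=340.0):
    """d = [exp(-j*omega*tau_1), ..., exp(-j*omega*tau_N)]^T (far field).

    positions : (N, 3) element coordinates in metres, relative to the reference.
    theta     : azimuth of the incident signal in the xy-plane, radians.
    phi       : angle between the incident signal and the z-axis, radians.
    omega     : angular frequency in rad/s.
    """
    u = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])                  # unit propagation direction
    tau = positions @ u / c                      # per-element delay in seconds
    return np.exp(-1j * omega * tau)

def superdirective_weights(positions, theta, phi, freq, c=340.0, load=1e-2):
    """W = Gamma^{-1} d / (d^H Gamma^{-1} d), in the spirit of equation (11)."""
    omega = 2 * np.pi * freq
    d = steering_vector(positions, theta, phi, omega, c)
    # Assumed diffuse-noise coherence: Gamma_pq = sinc(omega * l_pq / c).
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    gamma = np.sinc(omega * dist / (np.pi * c))  # np.sinc(x) = sin(pi x)/(pi x)
    gamma += load * np.eye(len(positions))       # diagonal loading (assumption)
    g_inv_d = np.linalg.solve(gamma, d)
    return g_inv_d / (d.conj() @ g_inv_d)

# Example: 8-element circular array of radius 3 cm, beam at 45 degrees, 1 kHz.
angles = np.deg2rad(np.arange(0, 360, 45))
pos = 0.03 * np.stack([np.cos(angles), np.sin(angles), np.zeros(8)], axis=1)
W = superdirective_weights(pos, np.deg2rad(45.0), np.pi / 2, 1000.0)
```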
Step 104: acquire the time-domain sound signal output after beamforming in that direction.
Specifically, an inverse short-time Fourier transform of the weighted synthesis frame data Y(f) over all frequency points f gives the weighted time-domain frame data y(i), i = 1, …, L. Windowing and overlap-add are then applied to the time-domain frame data to obtain the final time-domain data.
The result of the inverse short-time Fourier transform is windowed to obtain the intermediate result:
y′(i) = y(i)·w(i), 1 ≤ i ≤ L             (13)
Because a 1/4-frame shift is used, the data of four frames must be overlapped and added. The signals of frames j-3, j-2, j-1 and j obtained from the above equation are added to give the time-domain signal z_j(i) of the j-th frame (of length L/4):
z_j(i) = y′_{j-3}(i + 3·L/4) + y′_{j-2}(i + L/2) + y′_{j-1}(i + L/4) + y′_j(i), 1 ≤ i ≤ L/4             (14)
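A minimal sketch of step 104, matching equations (13) and (14): the selected beam's spectrum is inverse-transformed frame by frame, windowed again, and overlap-added with a 1/4-frame shift. The buffer handling and the random example input are illustrative assumptions.

```python
import numpy as np

def istft_overlap_add(Y_frames, L=256):
    """Reconstruct a time-domain signal from the selected beam's spectra.

    Y_frames : complex array, shape (num_frames, L), spectrum Y(f) per frame.
    Returns the overlap-added time-domain signal (equations (13)-(14)).
    """
    hop = L // 4                                 # 1/4-frame shift
    w = np.hamming(L)                            # synthesis window w(i)
    out = np.zeros(hop * (len(Y_frames) - 1) + L)
    for j, Yf in enumerate(Y_frames):
        y = np.real(np.fft.ifft(Yf))             # y(i): inverse transform
        out[j * hop : j * hop + L] += y * w      # y'(i) = y(i)*w(i), then add
    return out

# Example with random spectra for 10 frames:
Y_frames = np.random.randn(10, 256) + 1j * np.random.randn(10, 256)
z = istft_overlap_add(Y_frames)
```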
In this embodiment of the invention, at least two channels of sound signals are acquired, a short-time Fourier transform (STFT) is performed on each channel's sound signal, and the frequency-domain audio signal corresponding to each channel's sound signal is obtained; according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point is acquired; the output direction of the beam group is acquired according to the beam energies of different frequency points in the same direction; and the time-domain sound signal output after beamforming in that direction is acquired. By using a frequency-domain wideband beamforming algorithm, the invention effectively improves the gain of the received speech, and by adaptively selecting the best beam it avoids the need for a priori information such as the direction of arrival of the desired signal, reducing the complexity of the algorithm and widening its range of application. The frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms; at the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.
Embodiment 2:
Referring to FIG. 5, FIG. 5 is a schematic flowchart of the second embodiment of the signal processing method of the present invention.
On the basis of the first embodiment, step 105 is further included after step 103:
Step 105: multiply the audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value positively proportional to the frequency-domain value.
Specifically, for a wideband beam it is also necessary to consider the consistency of the beam across the frequency domain, in particular the fact that the width of the beam's main lobe differs between frequency points. The main lobe of a wideband beam is wide at low frequencies and narrow at high frequencies; if the normalization constraint of equation (9) is also satisfied, i.e. the signal in the desired direction is kept undistorted, the high-frequency energy of the signal is attenuated considerably, causing signal distortion. This embodiment therefore applies a post-processing step after beamforming: as the frequency increases, the beam's weight coefficients are multiplied by a gradually increasing weighting factor, as shown in equation (15), to compensate the attenuation of the high-frequency part and thereby boost the high frequencies.
Y(f) = Y(f) × (1 + f/f_s · β)             (15)
In this embodiment of the invention, different enhancement or attenuation is applied to different frequency points, making the subjective listening experience more comfortable. For example, at low frequencies the main lobe of the beam is very wide and the low-frequency signal is essentially not attenuated, so no enhancement is needed; once the frequency exceeds a certain value the signal begins to be attenuated, and the gain of the beam is amplified to different degrees as the frequency increases, as shown in equation (16).
[Equation (16): piecewise frequency-dependent gain; equation image not reproduced]
where f_1 = f_s/8, f_2 = f_s/4, and β_1, β_2 are different amplification factors; in this embodiment β_1 = 2.8 and β_2 = 2.
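The post-processing of equations (15) and (16) multiplies each frequency bin by a gain that grows with frequency above a threshold. The sketch below uses the parameter values quoted in this embodiment (f_1 = f_s/8, f_2 = f_s/4, β_1 = 2.8, β_2 = 2); since the exact piecewise form of equation (16) is not reproduced in the text, the band assignment here is an illustrative assumption.

```python
import numpy as np

def high_frequency_boost(Y, fs, beta1=2.8, beta2=2.0):
    """Apply a frequency-dependent gain to one beamformed frame.

    Y  : complex array of length L (bin k corresponds to f = k*fs/L).
    fs : sampling rate in Hz.
    Below f1 = fs/8 no boost is applied; between f1 and f2 = fs/4 the gain
    1 + f/fs*beta1 is used, and above f2 the gain 1 + f/fs*beta2 is used
    (cf. equation (15); the band split is an assumption, see above).
    """
    L = len(Y)
    f = np.arange(L) * fs / L
    gain = np.ones(L)
    band1 = (f >= fs / 8) & (f < fs / 4)
    band2 = f >= fs / 4
    gain[band1] = 1 + f[band1] / fs * beta1
    gain[band2] = 1 + f[band2] / fs * beta2
    return Y * gain

Y_boosted = high_frequency_boost(np.ones(256, dtype=complex), fs=16000)
```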
The gain has a different proportional relationship with the frequency-domain value in each of the preset ranges of frequency-domain values.
Step 104: perform the inverse STFT on the gain-adjusted audio signal of each frequency point output after beamforming in the output direction to obtain the time-domain sound signal.
Compared with the prior art, the method of the invention, with its frequency-domain wideband beamforming algorithm, effectively improves the gain of the received speech. Adaptively selecting the best beam avoids the need for a priori information such as the direction of arrival of the desired signal, reduces the complexity of the algorithm and widens its range of application. The frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms. The post-processing algorithm that adjusts the per-frequency gain improves the sound-quality degradation that occurs in wideband speech signal processing. At the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.
The present invention provides a signal processing device.
Embodiment 1:
Referring to FIG. 6, FIG. 6 is a schematic diagram of the functional modules of the first embodiment of the signal processing device of the present invention.
In the first embodiment, the device includes:
an acquisition and time-frequency transform unit 601 configured to acquire at least two channels of sound signals and perform a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal.
Specifically, the sound signals of N microphones (N >= 2) are collected, and a short-time Fourier transform (STFT) is applied to the time-domain signal received by each microphone to obtain the data at each frequency point of that microphone's received signal.
Each microphone signal is framed in the same way for the short-time Fourier transform, with adjacent frames partially overlapping. Several overlap schemes are possible; this embodiment frames with a 1/4-frame shift, although other schemes such as a 1/2-frame shift may also be used. The n-th microphone frame signal s_n(i) is multiplied by the window function w(i) (a Hamming window in this embodiment) to obtain the windowed frame signal x_n(i); a short-time Fourier transform is then applied to the windowed frame signal to obtain the frame data in the frequency domain, namely:
X_n(f) = fft(x_n(i))             (1)
where i = 1, …, L, L is the length of the frame data, and f is the frequency point.
a first acquiring unit 602 configured to acquire, according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
Specifically, a beam group is designed containing M beams pointing in M directions θ_1, θ_2, …, θ_M, and each beam uses all elements of the microphone array for beamforming. The main lobes of adjacent beams intersect, and the main lobes of the beam group cover the required spatial range, so that whichever direction the sound source comes from, there is a beam pointing close to it.
From the M weight vectors for the different directions, the corresponding M sets of beamformed frequency-domain frame data are obtained. The specific method is as follows: for a particular direction θ_m, the weight vectors of the M different directions are used to form weighted sums of the data received by each microphone of the array at the same frequency point f, yielding the weighted synthesis data Y_m(f) of the m-th beam at that frequency point.
Y_m(f) = W_m^H X = Σ_{n=1}^{N} W*_{m,n}(f) X_n(f)             (2)
where W_{m,n}(f) is the weight applied to the data at frequency point f received by the n-th microphone in the m-th beam, m = 1, …, M, * denotes conjugation, H denotes conjugate transpose, and X and W_m are the vector forms of X_n(f) and W_{m,n}(f), respectively.
In this embodiment of the invention, the first acquiring unit 602 is configured to:
select, according to the preset multi-direction weight vectors, the frequency-domain audio signals corresponding to all or some of the channel sound signals, and acquire the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
Specifically, because of the topology of the microphone array, beamforming with a sub-array of the microphone array can come very close to the effect of beamforming with all of the array elements, so the same performance can be obtained with less computation. Referring to FIG. 2, FIG. 2 is a schematic diagram of the beamforming method of the present invention, showing a circular microphone array composed of eight directional microphones. In this embodiment the microphone closest to the desired signal direction and its two neighbouring microphones form a sub-array for beamforming; for example, for a beam whose desired signal lies in the 45-degree direction, the No. 2 microphone facing 45 degrees and its adjacent No. 1 and No. 3 microphones form the sub-array used for beamforming.
A beam group is designed containing eight beams pointing in eight directions: 0, 45, 90, 135, 180, 225, 270 and 315 degrees. The main lobes of adjacent beams intersect, and the superimposed main lobes of all the beams cover the 360-degree range, so that whichever direction the sound source comes from, there is a beam pointing close to it.
a second acquiring unit 603 configured to acquire the output direction of the beam group according to the beam energies of different frequency points in the same direction.
In this embodiment of the invention, the second acquiring unit 603 is configured to:
sum the beam energies of different frequency points in the same direction, and select the direction with the largest beam energy as the output direction.
Specifically, the implementation is shown in FIG. 3: using the weighted synthesis data Y_m(f) obtained by the first acquiring unit 602, the energy of each of the M sets of frequency-domain frame data is computed as follows:
E_m = Σ_{f=0}^{f_s/2} |Y_m(f)|^2             (3)
where f_s is the sampling rate. The beam with the largest energy value E_m is then selected as the final beamforming result, so that the beam closest to the direction of the sound source is selected adaptively and the best sound quality is obtained.
In this embodiment of the invention, the second acquiring unit 603 is further configured to:
sum the beam energies of all frequency points between a preset first frequency and a preset second frequency in the same direction, and select the direction with the largest beam energy as the output direction.
Specifically, to save computation while keeping the selection accurate, the best output beam can be selected from the energy sum of a subset of frequency points. The specific implementation is shown in FIG. 3. Using the weighted synthesis data Y_m(f) obtained by the first acquiring unit 602, the energy sum of the frequency-domain frame data corresponding to each of the M directions is computed as follows:
E_m = Σ_{f=f_1}^{f_2} |Y_m(f)|^2             (4)
where 0 < f_1 < f_2 < f_s/2; for example, when the FFT length L is 256, f_1 = f_s/8 and f_2 = f_s/2. What is computed here is the energy sum over frequency points f_1 to f_2. The beam with the largest energy value E is then selected as the final beamforming result. Distortion of low-frequency signals can be avoided in this way.
The multi-direction weight vectors are obtained with a delay-and-sum beamforming algorithm, a linearly constrained minimum variance beamforming algorithm, a generalized sidelobe canceller beamforming algorithm, or the minimum variance distortionless response (MVDR) method.
Specifically, this embodiment takes an MVDR beamforming filter as an example for detailed description.
The MVDR method minimizes the power of the output signal to obtain an estimate of the optimal beamformer weight vector. The power spectral density of the output signal is:
Φ_YY = W^H Φ_XX W             (5)
where Φ_XX denotes the power spectral density matrix of the array input signal.
During the optimization it must be guaranteed that the signal in the desired direction is not distorted, i.e.
W^H d = 1             (6)
where d denotes the attenuation and delay caused by signal propagation, as follows:
d = [α_1 e^{-jΩτ_1}, α_2 e^{-jΩτ_2}, …, α_N e^{-jΩτ_N}]^T             (7)
If a far-field model is used, the amplitude differences between the signals received by the array elements are negligible and the attenuation factors α_n are all set to 1; Ω is the angular frequency and τ_n is the time difference between two elements in space:
τ_n = (f_s / c) (l_{x,n} sinφ cosθ + l_{y,n} sinφ sinθ + l_{z,n} cosφ)             (8)
where f_s is the sampling rate of the signal, c is the speed of sound (340 m/s), l_{x,n} is the x-axis component of the spacing between the n-th element and the reference element, l_{y,n} is the y-axis component, l_{z,n} is the z-axis component, θ is the angle between the projection of the incident signal onto the xy-plane and the x-axis,
and φ is the angle between the incident signal and the z-axis. FIG. 4 is a schematic diagram of an L-shaped three-dimensional microphone array provided by the present invention. Equation (4) is applicable to microphone arrays of any topology.
This beamformer is then transformed into solving a constrained optimization problem:
min_W W^H Φ_XX W  subject to  W^H d = 1             (9)
Since only the best noise suppression is of interest, if the direction of the desired signal coincides exactly with the steering direction of the array, only the power spectral density matrix of the noise is needed, and the MVDR filter is obtained as:
W = Φ_vv^{-1} d / (d^H Φ_vv^{-1} d)             (10)
where Φ_vv is the power spectral density matrix of the noise. If this matrix is taken to be a coherence matrix, a superdirective beamformer is obtained, which is the frequency-domain weight vector used in the first acquiring unit 602:
W = Γ_vv^{-1} d / (d^H Γ_vv^{-1} d)             (11)
Γ_vv is the noise coherence function matrix, in which the element in row p and column q is calculated by:
[Equation (12): element (p, q) of the noise coherence function matrix Γ_vv, expressed in terms of the element spacing l_pq; equation image not reproduced]
where l_pq is the spacing between array elements p and q.
an inverse transform unit 604 configured to acquire the time-domain sound signal output after beamforming in that direction.
Specifically, an inverse short-time Fourier transform of the weighted synthesis frame data Y(f) over all frequency points f gives the weighted time-domain frame data y(i), i = 1, …, L. Windowing and overlap-add are then applied to the time-domain frame data to obtain the final time-domain data.
The result of the inverse short-time Fourier transform is windowed to obtain the intermediate result:
y′(i) = y(i)·w(i), 1 ≤ i ≤ L             (13)
Because a 1/4-frame shift is used, the data of four frames must be overlapped and added. The signals of frames j-3, j-2, j-1 and j obtained from the above equation are added to give the time-domain signal z_j(i) of the j-th frame (of length L/4):
z_j(i) = y′_{j-3}(i + 3·L/4) + y′_{j-2}(i + L/2) + y′_{j-1}(i + L/4) + y′_j(i), 1 ≤ i ≤ L/4             (14)
In this embodiment of the invention, at least two channels of sound signals are acquired, a short-time Fourier transform (STFT) is performed on each channel's sound signal, and the frequency-domain audio signal corresponding to each channel's sound signal is obtained; according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point is acquired; the output direction of the beam group is acquired according to the beam energies of different frequency points in the same direction; and the time-domain sound signal output after beamforming in that direction is acquired. By using a frequency-domain wideband beamforming algorithm, the invention effectively improves the gain of the received speech, and by adaptively selecting the best beam it avoids the need for a priori information such as the direction of arrival of the desired signal, reducing the complexity of the algorithm and widening its range of application. The frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms; at the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.
Embodiment 2:
Referring to FIG. 7, FIG. 7 is a schematic diagram of the functional modules of the second embodiment of the signal processing device of the present invention.
On the basis of the first embodiment, the device further includes a gain unit 605.
The gain unit 605 is configured to multiply the audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value positively proportional to the frequency-domain value.
Specifically, for a wideband beam it is also necessary to consider the consistency of the beam across the frequency domain, in particular the fact that the width of the beam's main lobe differs between frequency points. The main lobe of a wideband beam is wide at low frequencies and narrow at high frequencies; if the normalization constraint of equation (9) is also satisfied, i.e. the signal in the desired direction is kept undistorted, the high-frequency energy of the signal is attenuated considerably, causing signal distortion. This embodiment therefore applies a post-processing step after beamforming: as the frequency increases, the beam's weight coefficients are multiplied by a gradually increasing weighting factor, as shown in equation (15), to compensate the attenuation of the high-frequency part and thereby boost the high frequencies.
Y(f) = Y(f) × (1 + f/f_s · β)             (15)
In this embodiment of the invention, different enhancement or attenuation is applied to different frequency points, making the subjective listening experience more comfortable. For example, at low frequencies the main lobe of the beam is very wide and the low-frequency signal is essentially not attenuated, so no enhancement is needed; once the frequency exceeds a certain value the signal begins to be attenuated, and the gain of the beam is amplified to different degrees as the frequency increases, as shown in equation (16).
[Equation (16): piecewise frequency-dependent gain; equation image not reproduced]
where f_1 = f_s/8, f_2 = f_s/4, and β_1, β_2 are different amplification factors; in this embodiment β_1 = 2.8 and β_2 = 2.
The gain has a different proportional relationship with the frequency-domain value in each of the preset ranges of frequency-domain values.
Compared with the prior art, the method of the invention, with its frequency-domain wideband beamforming algorithm, effectively improves the gain of the received speech. Adaptively selecting the best beam avoids the need for a priori information such as the direction of arrival of the desired signal, reduces the complexity of the algorithm and widens its range of application. The frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms. The post-processing algorithm that adjusts the per-frequency gain improves the sound-quality degradation that occurs in wideband speech signal processing. At the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.
The above are only preferred embodiments of the present invention and do not thereby limit the scope of its patent; any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Industrial Applicability
Based on the technical solution provided by the embodiments of the present invention, at least two channels of sound signals are acquired, a short-time Fourier transform (STFT) is performed on each channel's sound signal, and the frequency-domain audio signal corresponding to each channel's sound signal is obtained; according to preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point is acquired; the output direction of the beam group is acquired according to the beam energies of different frequency points in the same direction; and the time-domain sound signal output after beamforming in that direction is acquired. By using a frequency-domain wideband beamforming algorithm, the invention effectively improves the gain of the received speech, and by adaptively selecting the best beam it avoids the need for a priori information such as the direction of arrival of the desired signal, reducing the complexity of the algorithm and widening its range of application. The frequency-domain beamforming algorithm used facilitates fine adjustment of the signal spectrum and fusion with other pre- and post-processing algorithms; at the same time, the invention is easy to implement, requires little computation, and is suitable for various embedded platforms.

Claims (14)

  1. A signal processing method, the method comprising:
    acquiring at least two channels of sound signals, and performing a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal;
    acquiring, according to preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point;
    acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction;
    acquiring the time-domain sound signal output after beamforming in that direction.
  2. The method according to claim 1, wherein acquiring, according to the preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point comprises:
    selecting, according to the preset multi-direction weight vectors, the frequency-domain audio signals corresponding to all or some of the channel sound signals, and acquiring the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
  3. The method according to claim 1, wherein acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction comprises:
    summing the beam energies of different frequency points in the same direction, and selecting the direction with the largest beam energy as the output direction.
  4. The method according to claim 3, wherein summing the beam energies of different frequency points in the same direction and selecting the direction with the largest beam energy as the output direction comprises:
    summing the beam energies of all frequency points between a preset first frequency and a preset second frequency in the same direction, and selecting the direction with the largest beam energy as the output direction.
  5. The method according to claim 1, wherein the multi-direction weight vectors are obtained with a delay-and-sum beamforming algorithm, a linearly constrained minimum variance beamforming algorithm, a generalized sidelobe canceller beamforming algorithm, or the minimum variance distortionless response (MVDR) method.
  6. The method according to any one of claims 1 to 5, wherein after acquiring the output direction of the beam group according to the beam energies of different frequency points in the same direction, the method further comprises:
    multiplying the audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value positively proportional to the frequency-domain value.
  7. The method according to claim 6, wherein the gain has a different proportional relationship with the frequency-domain value in each of the preset ranges of frequency-domain values.
  8. A signal processing device, the device comprising:
    an acquisition and time-frequency transform unit configured to acquire at least two channels of sound signals and perform a short-time Fourier transform (STFT) on each channel's sound signal to obtain the frequency-domain audio signal corresponding to each channel's sound signal;
    a first acquiring unit configured to acquire, according to preset multi-direction weight vectors and the frequency-domain audio signals corresponding to the channel sound signals, the beamforming output signal of the beam group corresponding to the audio signal of each frequency point;
    a second acquiring unit configured to acquire the output direction of the beam group according to the beam energies of different frequency points in the same direction;
    an inverse transform unit configured to acquire the time-domain sound signal output after beamforming in that direction.
  9. The device according to claim 8, wherein the first acquiring unit is configured to:
    select, according to the preset multi-direction weight vectors, the frequency-domain audio signals corresponding to some of the channel sound signals, and acquire the beamforming output signal of the beam group corresponding to the audio signal of each frequency point.
  10. The device according to claim 8, wherein the second acquiring unit is configured to:
    sum the beam energies of different frequency points in the same direction, and select the direction with the largest beam energy as the output direction.
  11. The device according to claim 10, wherein the second acquiring unit is further configured to:
    sum the beam energies of all frequency points between a preset first frequency and a preset second frequency in the same direction, and select the direction with the largest beam energy as the output direction.
  12. The device according to claim 11, wherein the multi-direction weight vectors are obtained with a delay-and-sum beamforming algorithm, a linearly constrained minimum variance beamforming algorithm, a generalized sidelobe canceller beamforming algorithm, or the minimum variance distortionless response (MVDR) method.
  13. The device according to any one of claims 8 to 12, wherein the device further comprises a gain unit configured to multiply the audio signal of each frequency point output after beamforming in the output direction by a gain, the gain being a value positively proportional to the frequency-domain value.
  14. The device according to claim 13, wherein the gain has a different proportional relationship with the frequency-domain value in each of the preset ranges of frequency-domain values.
PCT/CN2015/084148 2014-11-14 2015-07-15 信号处理的方法及装置 WO2016074495A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/526,812 US10181330B2 (en) 2014-11-14 2015-07-15 Signal processing method and device
EP15859302.0A EP3220158A4 (en) 2014-11-14 2015-07-15 Signal processing method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410649621.2A CN105590631B (zh) 2014-11-14 2014-11-14 信号处理的方法及装置
CN201410649621.2 2014-11-14

Publications (1)

Publication Number Publication Date
WO2016074495A1 true WO2016074495A1 (zh) 2016-05-19

Family

ID=55930153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/084148 WO2016074495A1 (zh) 2014-11-14 2015-07-15 信号处理的方法及装置

Country Status (4)

Country Link
US (1) US10181330B2 (zh)
EP (1) EP3220158A4 (zh)
CN (1) CN105590631B (zh)
WO (1) WO2016074495A1 (zh)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9252908B1 (en) * 2012-04-12 2016-02-02 Tarana Wireless, Inc. Non-line of sight wireless communication system and method
CN106973353A (zh) * 2017-03-27 2017-07-21 广东顺德中山大学卡内基梅隆大学国际联合研究院 一种基于Volterra滤波器的麦克风阵列通道不匹配校准方法
CN108877828B (zh) * 2017-05-16 2020-12-08 福州瑞芯微电子股份有限公司 语音增强方法/系统、计算机可读存储介质及电子设备
CN107785029B (zh) * 2017-10-23 2021-01-29 科大讯飞股份有限公司 目标语音检测方法及装置
CN107910012B (zh) * 2017-11-14 2020-07-03 腾讯音乐娱乐科技(深圳)有限公司 音频数据处理方法、装置及系统
CN109599104B (zh) * 2018-11-20 2022-04-01 北京小米智能科技有限公司 多波束选取方法及装置
CN109599124B (zh) * 2018-11-23 2023-01-10 腾讯科技(深圳)有限公司 一种音频数据处理方法、装置及存储介质
CN111624554B (zh) * 2019-02-27 2023-05-02 北京京东尚科信息技术有限公司 声源定位方法和装置
CN111833901B (zh) * 2019-04-23 2024-04-05 北京京东尚科信息技术有限公司 音频处理方法、音频处理装置、系统及介质
CN110111805B (zh) * 2019-04-29 2021-10-29 北京声智科技有限公司 远场语音交互中的自动增益控制方法、装置及可读存储介质
CN110265038B (zh) * 2019-06-28 2021-10-22 联想(北京)有限公司 一种处理方法及电子设备
US11234073B1 (en) * 2019-07-05 2022-01-25 Facebook Technologies, Llc Selective active noise cancellation
CN110517703B (zh) 2019-08-15 2021-12-07 北京小米移动软件有限公司 一种声音采集方法、装置及介质
CN114586097A (zh) * 2019-11-05 2022-06-03 阿里巴巴集团控股有限公司 差分定向传感器系统
CN110600051B (zh) * 2019-11-12 2020-03-31 乐鑫信息科技(上海)股份有限公司 用于选择麦克风阵列的输出波束的方法
CN110956951A (zh) * 2019-12-23 2020-04-03 苏州思必驰信息科技有限公司 一种语音增强采集配件、方法、系统、设备及存储介质
CN112634934B (zh) * 2020-12-21 2024-06-25 北京声智科技有限公司 语音检测方法及装置
CN112634931B (zh) * 2020-12-22 2024-05-14 北京声智科技有限公司 语音增强方法及装置
CN113375063B (zh) * 2021-06-07 2022-06-28 国家石油天然气管网集团有限公司西气东输分公司 一种天然气管道泄漏智能监测方法及系统
CN113362846B (zh) * 2021-06-29 2022-09-20 辽宁工业大学 一种基于广义旁瓣相消结构的语音增强方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060215854A1 (en) * 2005-03-23 2006-09-28 Kaoru Suzuki Apparatus, method and program for processing acoustic signal, and recording medium in which acoustic signal, processing program is recorded
CN1866356A (zh) * 2005-08-15 2006-11-22 华为技术有限公司 一种宽带波束形成方法和装置
EP1992959A2 (en) * 2007-05-18 2008-11-19 Ono Sokki Co., Ltd. Sound source search method, sound source search device, and sound source search program storage medium
CN103308889A (zh) * 2013-05-13 2013-09-18 辽宁工业大学 复杂环境下被动声源二维doa估计方法
CN103513249A (zh) * 2012-06-20 2014-01-15 中国科学院声学研究所 一种宽带相干模基信号处理方法及系统
CN104103277A (zh) * 2013-04-15 2014-10-15 北京大学深圳研究生院 一种基于时频掩膜的单声学矢量传感器目标语音增强方法

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8098842B2 (en) 2007-03-29 2012-01-17 Microsoft Corp. Enhanced beamforming for arrays of directional microphones
JP5305743B2 (ja) * 2008-06-02 2013-10-02 株式会社東芝 音響処理装置及びその方法
WO2009151578A2 (en) * 2008-06-09 2009-12-17 The Board Of Trustees Of The University Of Illinois Method and apparatus for blind signal recovery in noisy, reverberant environments
US9159335B2 (en) * 2008-10-10 2015-10-13 Samsung Electronics Co., Ltd. Apparatus and method for noise estimation, and noise reduction apparatus employing the same
KR20110106715A (ko) * 2010-03-23 2011-09-29 삼성전자주식회사 후방 잡음 제거 장치 및 방법
JP2012150237A (ja) * 2011-01-18 2012-08-09 Sony Corp 音信号処理装置、および音信号処理方法、並びにプログラム
JP2012234150A (ja) * 2011-04-18 2012-11-29 Sony Corp 音信号処理装置、および音信号処理方法、並びにプログラム
CN102324237B (zh) * 2011-05-30 2013-01-02 深圳市华新微声学技术有限公司 麦克风阵列语音波束形成方法、语音信号处理装置及系统
US9173025B2 (en) * 2012-02-08 2015-10-27 Dolby Laboratories Licensing Corporation Combined suppression of noise, echo, and out-of-location signals
CN103856866B (zh) * 2012-12-04 2019-11-05 西北工业大学 低噪微分麦克风阵列
WO2014085978A1 (en) * 2012-12-04 2014-06-12 Northwestern Polytechnical University Low noise differential microphone arrays
US9432769B1 (en) * 2014-07-30 2016-08-30 Amazon Technologies, Inc. Method and system for beam selection in microphone array beamformers
US20170337932A1 (en) * 2016-05-19 2017-11-23 Apple Inc. Beam selection for noise suppression based on separation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060215854A1 (en) * 2005-03-23 2006-09-28 Kaoru Suzuki Apparatus, method and program for processing acoustic signal, and recording medium in which acoustic signal, processing program is recorded
CN1866356A (zh) * 2005-08-15 2006-11-22 华为技术有限公司 一种宽带波束形成方法和装置
EP1992959A2 (en) * 2007-05-18 2008-11-19 Ono Sokki Co., Ltd. Sound source search method, sound source search device, and sound source search program storage medium
CN103513249A (zh) * 2012-06-20 2014-01-15 中国科学院声学研究所 一种宽带相干模基信号处理方法及系统
CN104103277A (zh) * 2013-04-15 2014-10-15 北京大学深圳研究生院 一种基于时频掩膜的单声学矢量传感器目标语音增强方法
CN103308889A (zh) * 2013-05-13 2013-09-18 辽宁工业大学 复杂环境下被动声源二维doa估计方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3220158A4 *

Also Published As

Publication number Publication date
US10181330B2 (en) 2019-01-15
CN105590631A (zh) 2016-05-18
CN105590631B (zh) 2020-04-07
EP3220158A1 (en) 2017-09-20
EP3220158A4 (en) 2017-12-20
US20170337936A1 (en) 2017-11-23

Similar Documents

Publication Publication Date Title
WO2016074495A1 (zh) 信号处理的方法及装置
CN108172235B (zh) 基于维纳后置滤波的ls波束形成混响抑制方法
CN106782590B (zh) 基于混响环境下麦克风阵列波束形成方法
EP2647221B1 (en) Apparatus and method for spatially selective sound acquisition by acoustic triangulation
Habets et al. New insights into the MVDR beamformer in room acoustics
US8654990B2 (en) Multiple microphone based directional sound filter
CN110085248B (zh) 个人通信中降噪和回波消除时的噪声估计
JP6363213B2 (ja) いくつかの入力オーディオ信号の残響を除去するための信号処理の装置、方法、およびコンピュータプログラム
WO2015196729A1 (zh) 一种麦克风阵列语音增强方法及装置
US20130083943A1 (en) Processing Signals
Berkun et al. Combined beamformers for robust broadband regularized superdirective beamforming
JP2017503388A5 (zh)
EP3245795A2 (en) Reverberation suppression using multiple beamformers
CN112331226A (zh) 一种针对主动降噪系统的语音增强系统及方法
JP2010245984A (ja) マイクロホンアレイにおけるマイクロホンの感度を補正する装置、この装置を含んだマイクロホンアレイシステム、およびプログラム
Comminiello et al. A novel affine projection algorithm for superdirective microphone array beamforming
WO2018167921A1 (ja) 信号処理装置
Borisovich et al. Improvement of microphone array characteristics for speech capturing
Buerger et al. The spatial coherence of noise fields evoked by continuous source distributions
JP2010210728A (ja) 音響信号処理方法及び装置
Liu et al. Simulation of fixed microphone arrays for directional hearing aids
Zhao et al. Frequency-domain beamformers using conjugate gradient techniques for speech enhancement
Nguyen et al. A Study Of Dual Microphone Array For Speech Enhancement In Noisy Environment
Stern et al. " polyaural" array processing for automatic speech recognition in degraded environments.
Zou et al. Speech enhancement with an acoustic vector sensor: an effective adaptive beamforming and post-filtering approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15859302

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2015859302

Country of ref document: EP