WO2016119388A1 - Method and apparatus for constructing a focus covariance matrix based on a speech signal - Google Patents

Method and apparatus for constructing a focus covariance matrix based on a speech signal

Info

Publication number
WO2016119388A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
focus
covariance matrix
sampling frequency
covariance
Prior art date
Application number
PCT/CN2015/082571
Other languages
English (en)
French (fr)
Inventor
陈喆
殷福亮
张梦晗
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2016119388A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • the present invention relates to the field of voice signal processing technologies, and in particular, to a method and apparatus for constructing a focus covariance matrix based on a voice signal.
  • Compared with a single microphone, a microphone array can exploit not only the time-domain and frequency-domain information of a sound source but also its spatial information. It therefore offers strong interference resistance and flexible deployment, has clear advantages in solving problems such as sound source localization, speech enhancement and speech recognition, and has been widely used in audio/video conference systems, in-vehicle systems, hearing aids, human-computer interaction systems, robot systems, security surveillance, military reconnaissance and other fields.
  • In microphone-array speech processing, the number of sound sources often needs to be known, and calculating that number requires constructing a focus covariance matrix.
  • In the prior art, the incident angle of the sound source must first be predicted, the focus covariance matrix is constructed from the predicted incident angle, and the number of sound sources is then estimated.
  • If the predicted incident angle has a large error, the accuracy of the constructed focus covariance matrix is therefore low.
  • the embodiments of the present invention provide a method and a device for constructing a focus covariance matrix based on a voice signal, which are used to solve the defect that the accuracy of the focus covariance matrix obtained in the prior art is low.
  • a method for constructing a focus covariance matrix based on a speech signal, comprising: determining the sampling frequency points used when a microphone array collects the speech signal; for any one of the determined sampling frequency points, calculating a first covariance matrix and a focus transformation matrix of the speech signal collected at that sampling frequency point, together with the conjugate transposed matrix of the focus transformation matrix, and taking the product of the three as the focus covariance matrix of the speech signal collected at that sampling frequency point;
  • the calculated sum of the focus covariance matrices of the speech signals respectively collected at the individual sampling frequency points is used as the focus covariance matrix of the speech signal collected by the microphone array.
  • calculating the first covariance matrix comprises computing:
  • R(k) = (1/P) · Σ_{i=0}^{P−1} X_i(k) X_i(k)^H, k = 0, 1, …, N−1
  • where k represents the any one sampling frequency point;
  • P represents the number of frames in which the microphone array collects the speech signal;
  • X_i(k) represents the discrete Fourier transform (DFT) value of the microphone array at any one frame and the any one sampling frequency point, and X_i(k)^H represents its conjugate transposed matrix;
  • N represents the number of sampling frequency points included in any one frame; the number of sampling frequency points included in any two different frames is the same.
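The per-frequency-point covariance above can be written in a few lines; a minimal numpy sketch, assuming the DFT values are stored as a P × N × L complex array (the function name and array layout are illustrative, not from the patent):

```python
import numpy as np

def first_covariance(X, k):
    """First covariance matrix of the speech signal at sampling frequency point k.

    X: complex array of shape (P, N, L) holding the DFT values X_i(k) for
       P frames, N frequency points per frame and L array elements.
    Returns the L x L matrix R(k) = (1/P) * sum_i X_i(k) X_i(k)^H.
    """
    P, _, L = X.shape
    R = np.zeros((L, L), dtype=complex)
    for i in range(P):
        x = X[i, k, :].reshape(-1, 1)   # snapshot X_i(k), an L x 1 column
        R += x @ x.conj().T             # outer product X_i(k) X_i(k)^H
    return R / P
```

The result is Hermitian by construction, which the later eigendecomposition relies on.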
  • before calculating the focus transformation matrix, the method further includes: determining a focus frequency point from the sampling frequency points used when the microphone array collects the speech signal, and calculating a second covariance matrix of the speech signal collected by the microphone array at the focus frequency point.
  • calculating the focus transformation matrix specifically includes: performing eigenvalue decomposition on the first covariance matrix to obtain a first eigenvector matrix and taking its conjugate transpose, and performing eigenvalue decomposition on the second covariance matrix to obtain a second eigenvector matrix;
  • the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix is used as the focus transformation matrix.
  • calculating the second covariance matrix includes computing:
  • R(k_0) = (1/P) · Σ_{i=0}^{P−1} X_i(k_0) X_i(k_0)^H
  • where k_0 represents the focus frequency point;
  • P represents the number of frames in which the microphone array collects the speech signal;
  • X_i(k_0) represents the DFT value of the microphone array at any one frame and the focus frequency point;
  • X_i(k_0)^H represents the conjugate transposed matrix of X_i(k_0).
  • performing eigenvalue decomposition on the first covariance matrix includes decomposing it as:
  • R(k) = U(k) Λ U(k)^H, where U(k) is the first eigenvector matrix and Λ is the diagonal matrix of the eigenvalues of R(k) arranged in descending order;
  • performing eigenvalue decomposition on the second covariance matrix includes decomposing it as:
  • R(k_0) = U(k_0) Λ_0 U(k_0)^H, where U(k_0) is the second eigenvector matrix and Λ_0 is the diagonal matrix of the eigenvalues of R(k_0) arranged in descending order.
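The eigendecomposition-and-product construction above can be sketched as follows. The multiplication order T(k) = U(k_0) U(k)^H and the focused sum Σ_k T(k) R(k) T(k)^H follow the standard coherent signal-subspace (CSSM) convention; reading the claims' "product" this way is an assumption, as are the helper names:

```python
import numpy as np

def eig_desc(R):
    """Eigendecomposition of a Hermitian matrix, eigenvalues in descending order."""
    w, U = np.linalg.eigh(R)
    order = np.argsort(w)[::-1]
    return w[order], U[:, order]

def focused_covariance(R_bins, k0):
    """Sum of focused covariance matrices over all sampling frequency points.

    R_bins: list of L x L first covariance matrices R(k), one per frequency point.
    k0:     index of the focus frequency point.
    """
    _, U0 = eig_desc(R_bins[k0])          # second eigenvector matrix U(k0)
    R_foc = np.zeros_like(R_bins[0])
    for R in R_bins:
        _, U = eig_desc(R)                # first eigenvector matrix U(k)
        T = U0 @ U.conj().T               # focus transformation matrix (CSSM order, assumed)
        R_foc += T @ R @ T.conj().T       # focused covariance at this frequency point
    return R_foc
```

Because each T(k) is unitary, focusing preserves the eigenvalue spread of every R(k) while rotating its dominant subspace onto that of R(k_0), so the per-frequency matrices add coherently.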
  • the X_i(k) has the following form:
  • X_i(k) = [X_{i1}(k), X_{i2}(k), …, X_{iL}(k)]^T, i = 0, 1, 2, …, P−1
  • X_{i1}(k) represents the DFT value of the first array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{i2}(k) represents the DFT value of the second array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{iL}(k) represents the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point;
  • L is the number of array elements included in the microphone array.
  • an apparatus for constructing a focus covariance matrix based on a voice signal including:
  • a determining unit configured to determine a sampling frequency point used when the microphone array collects the voice signal
  • a first calculating unit, configured to: for any one of the determined sampling frequency points, calculate a first covariance matrix and a focus transformation matrix of the speech signal collected at that sampling frequency point, together with the conjugate transposed matrix of the focus transformation matrix, and take the product of the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix as the focus covariance matrix of the speech signal collected at that sampling frequency point;
  • the second calculating unit is configured to use the sum of the calculated focus covariance matrices of the voice signals respectively collected at the respective sampling frequency points as a focus covariance matrix of the voice signals collected by the microphone array.
  • when calculating the first covariance matrix, the first calculating unit specifically computes:
  • R(k) = (1/P) · Σ_{i=0}^{P−1} X_i(k) X_i(k)^H, k = 0, 1, …, N−1
  • where k represents the any one sampling frequency point;
  • P represents the number of frames in which the microphone array collects the speech signal;
  • X_i(k) represents the discrete Fourier transform (DFT) value of the microphone array at any one frame and the any one sampling frequency point;
  • N represents the number of sampling frequency points included in any one frame;
  • the number of sampling frequency points included in any two different frames is the same.
  • the determining unit is further configured to determine a focus frequency point from the sampling frequency points used when the microphone array collects the speech signal;
  • the first calculating unit is further configured to calculate a second covariance matrix of the voice signal collected by the microphone array at the focus frequency point;
  • when calculating the focus transformation matrix, the first calculating unit is specifically configured to: perform eigenvalue decomposition on the first covariance matrix to obtain a first eigenvector matrix and its conjugate transpose, and perform eigenvalue decomposition on the second covariance matrix to obtain a second eigenvector matrix;
  • the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix is used as the focus transformation matrix.
  • when calculating the second covariance matrix, the first calculating unit specifically computes:
  • R(k_0) = (1/P) · Σ_{i=0}^{P−1} X_i(k_0) X_i(k_0)^H
  • where k_0 represents the focus frequency point;
  • P represents the number of frames in which the microphone array collects the speech signal;
  • X_i(k_0) represents the DFT value of the microphone array at any one frame and the focus frequency point;
  • X_i(k_0)^H represents the conjugate transposed matrix of X_i(k_0).
  • when performing eigenvalue decomposition on the first covariance matrix, the first calculating unit specifically decomposes it as R(k) = U(k) Λ U(k)^H;
  • when performing eigenvalue decomposition on the second covariance matrix, the first calculating unit specifically decomposes it as R(k_0) = U(k_0) Λ_0 U(k_0)^H.
  • the X_i(k) has the following form:
  • X_i(k) = [X_{i1}(k), X_{i2}(k), …, X_{iL}(k)]^T, i = 0, 1, 2, …, P−1
  • X_{i1}(k) represents the DFT value of the first array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{i2}(k) represents the DFT value of the second array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{iL}(k) represents the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point;
  • L is the number of array elements included in the microphone array.
  • the main idea of the embodiments of the present invention is: determining the sampling frequency points used when a microphone array collects a speech signal; for any one of the determined sampling frequency points, calculating the first covariance matrix and the focus transformation matrix of the speech signal collected at that sampling frequency point, together with the conjugate transposed matrix of the focus transformation matrix, and taking the product of the three as the focus covariance matrix at that sampling frequency point; and taking the sum over all sampling frequency points as the focus covariance matrix of the collected speech signal. Because the incident angle of the sound source does not need to be predicted, the error introduced by angle prediction is avoided and the accuracy of the constructed focus covariance matrix is improved.
  • FIG. 1A is a flowchart of constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention;
  • FIG. 1B is a schematic diagram of frame shifting according to an embodiment of the present invention;
  • FIG. 1C is a schematic diagram comparing the performance of calculating the number of sound sources by the present method and by the CSM-GDE method according to an embodiment of the present invention;
  • FIG. 1D is another schematic diagram comparing the performance of calculating the number of sound sources by the present method and by the CSM-GDE method according to an embodiment of the present invention;
  • FIG. 3A is a schematic structural diagram of an apparatus for constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention;
  • FIG. 3B is another schematic structural diagram of an apparatus for constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention.
  • a process of constructing a focus covariance matrix based on a voice signal is as follows:
  • Step 100: Determine the sampling frequency points used when the microphone array collects a speech signal.
  • Step 110: For any one of the determined sampling frequency points, calculate a first covariance matrix and a focus transformation matrix of the speech signal collected at that sampling frequency point, together with the conjugate transposed matrix of the focus transformation matrix, and take the product of the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix as the focus covariance matrix of the speech signal collected at that sampling frequency point.
  • Step 120: Take the calculated sum of the focus covariance matrices of the speech signals respectively collected at the individual sampling frequency points as the focus covariance matrix of the speech signal collected by the microphone array.
  • In a specific implementation, before the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix of the speech signal collected at any sampling frequency point are calculated, the collected speech signal may first be pre-emphasized;
  • the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix are then calculated for the pre-emphasized speech signal;
  • the speech signal may be pre-emphasized by passing it through a first-order pre-emphasis filter.
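The pre-emphasis step can be sketched as a first-order high-pass filter. The published formula is an image in the source, so the coefficient 0.97 below is a common speech-processing default, not necessarily the patent's value:

```python
import numpy as np

def pre_emphasize(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a common choice for speech; the patent's exact
    coefficient is not reproduced here.
    """
    x = np.asarray(x, dtype=float)
    # First sample passes through unchanged; the rest subtract the scaled predecessor.
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))
```

The filter attenuates the low-frequency energy that dominates voiced speech, flattening the spectrum before framing and the DFT.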
  • X_{i1}(k) represents the DFT value of the first array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{i2}(k) represents the DFT value of the second array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{iL}(k) represents the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point;
  • L is the number of array elements included in the microphone array;
  • P represents the number of frames in which the microphone array collects the speech signal.
  • Further, before the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix of the speech signal collected at any sampling frequency point are calculated, the speech signal may be divided into frames;
  • the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix are then calculated for the framed speech signal;
  • when framing is performed, consecutive frames overlap, and the overlapping portion is called the frame shift; the frame shift is typically chosen to be half the frame length, and the framing overlap is shown in FIG. 1B.
  • After framing, the speech signal needs to be windowed;
  • the windowing may be performed by multiplying the framed speech signal by the Hamming window function w(n), as shown in Equation 3:
  • w(n) = 0.54 − 0.46 · cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
  • where N represents the number of sampling frequency points included in any one frame, and the number of sampling frequency points included in any two different frames is the same.
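The framing, half-frame shift, Hamming windowing and DFT steps above can be combined into one sketch (the frame length and helper name are illustrative):

```python
import numpy as np

def frames_to_dft(x, frame_len=128):
    """Split a 1-D signal into half-overlapping frames (frame shift = frame_len/2),
    apply the Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), and take the
    DFT of each frame, yielding a P x N matrix of DFT values X_i(k).
    """
    shift = frame_len // 2                     # frame shift = half the frame length
    n = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))  # Hamming window
    starts = range(0, len(x) - frame_len + 1, shift)
    frames = np.stack([x[s:s + frame_len] * w for s in starts])
    return np.fft.fft(frames, axis=1)
```

Each row of the result is one frame's spectrum; stacking these rows per array element produces the snapshot vectors X_i(k) used in the covariance computation.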
  • Among the signals collected by the microphone array, some may be speech from the target object while others are speech from non-target objects;
  • for example, before a presenter begins speaking, what the microphone array picks up is noise, i.e. speech emitted by non-target objects; once the presenter starts speaking, the signal collected by the microphone array is the speech emitted by the target object;
  • a focus covariance matrix constructed from the speech emitted by the target object has high accuracy. Therefore, in the embodiment of the present invention, after the speech signal collected by the microphone array is acquired and before the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix are calculated, the focus covariance matrix is constructed from the speech signal collected after the target object starts speaking.
  • the first covariance matrix is calculated as follows:
  • R(k) = (1/P) · Σ_{i=0}^{P−1} X_i(k) X_i(k)^H, k = 0, 1, …, N−1
  • where k represents any sampling frequency point, P represents the number of frames in which the microphone array collects the speech signal, X_i(k) represents the DFT (Discrete Fourier Transform) value of the microphone array at any one frame and any sampling frequency point, and N represents the number of sampling frequency points included in any one frame; the number of sampling frequency points included in any two different frames is the same.
  • the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix is used as a focus transformation matrix.
  • the second covariance matrix is calculated as follows:
  • R(k_0) = (1/P) · Σ_{i=0}^{P−1} X_i(k_0) X_i(k_0)^H
  • where k_0 represents the focus frequency point, P represents the number of frames in which the microphone array collects the speech signal, and X_i(k_0) represents the DFT value of the microphone array at any one frame and the focus frequency point.
  • the eigenvalue decomposition of the first covariance matrix may be performed as follows: R(k) = U(k) Λ U(k)^H, where U(k) is the first eigenvector matrix and Λ is the diagonal matrix of the eigenvalues of R(k) arranged in descending order;
  • the X_i(k) form is as shown in Formula 2.
  • After the focus covariance matrix of the speech signal collected by the microphone array is obtained, the number of sound sources may be calculated from it;
  • for example, the number of sound sources may be calculated from the obtained focus covariance matrix using the Gerschgorin disk criterion.
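A minimal sketch of source counting with the Gerschgorin disk estimator (GDE); this is the generic textbook formulation with an adjustment factor d, and the patent may use a different variant:

```python
import numpy as np

def gde_source_count(R, d=0.5):
    """Estimate the number of sources from an L x L (focused) covariance matrix
    via the Gerschgorin disk estimator.

    d in (0, 1) is the usual GDE adjustment factor (an assumed default here).
    """
    L = R.shape[0]
    R1 = R[:L - 1, :L - 1]                 # leading (L-1) x (L-1) principal submatrix
    r = R[:L - 1, -1]                      # last column without the corner element
    _, U1 = np.linalg.eigh(R1)             # unitary transform that diagonalizes R1
    radii = np.abs(U1.conj().T @ r)        # Gerschgorin radii of the transformed matrix
    radii = np.sort(radii)[::-1]
    thresh = d * radii.sum() / (L - 1)     # disks larger than this count as signal
    return int(np.sum(radii > thresh))
```

After the unitary transform, signal components keep large Gerschgorin radii while noise radii shrink toward zero, so counting the radii above the threshold estimates the source number.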
  • the room size is 10 m × 10 m × 3 m
  • the coordinates of the eight vertices are (0,0,0), (0,10,0), (0,10,2.5), (0,0, 2.5), (10,0,0), (10,10,0), (10,10,2.5) and (10,0,2.5).
  • a uniform linear array of 10 microphones is distributed between (2, 4, 1.3) and (2, 4.9, 1.3) points with an array element spacing of 0.1 m.
  • the array elements are isotropic omnidirectional microphones;
  • six speakers are located at (8,1,1.3), (8,2.6,1.3), (8,4.2,1.3), (8,5.8,1.3), (8,7.4,1.3) and (8,9,1.3), and the background noise is assumed to be Gaussian white noise.
  • the microphone array and speaker speech are processed using the Image (image-source) simulation model, and the speech signal is sampled at a sampling frequency of 8 kHz to obtain the microphone array received signal.
  • the speech signal of each speaker is long enough, and each experiment is repeated over 50 independent trials.
  • the comparison of detection probability is shown in FIG. 1C:
  • the CSM-GDE method has a detection probability of 0.9 when the signal-to-noise ratio is 0 dB, and a detection probability of 1 when the signal-to-noise ratio is 4 dB;
  • with the method provided by the embodiment of the present invention, the correct detection probability is greatly improved compared with CSM-GDE: it already reaches 0.9 when the signal-to-noise ratio is −3 dB, and reaches 1 at higher signal-to-noise ratios.
  • The comparison of detection probability versus the number of frames between the method of constructing the focus covariance matrix provided by the embodiment of the present invention and the existing CSM-GDE method is shown in FIG. 1D:
  • the CSM-GDE method has a detection probability of 0.9 when the number of frames is 40, and a detection probability of 1 when the number of frames is 65;
  • with the method provided by the embodiment of the present invention, the detection probability is greatly improved compared with CSM-GDE: it reaches 0.9 when the number of frames is 25, and reaches 1 when the number of frames is 50.
  • Table 1 compares, for different numbers of speakers, the performance of calculating the number of sound sources with the focus covariance matrix constructed by the method of the present invention and with the CSM-GDE method.
  • the actual number of speakers is 2
  • the signal-to-noise ratio is 10 dB
  • the subframe length is 128 points
  • the number of frames is 100.
  • When the actual number of speakers is small, the detection probability of both the method provided by the scheme of the present invention and the CSM-GDE method can reach 1; when the actual number of speakers is greater than 3, the detection probability decreases as the number of speakers increases, and the method of constructing the focus covariance matrix provided by the scheme of the present invention achieves a higher detection probability than CSM-GDE.
  • Calculating the number of sound sources from the obtained focus covariance matrix using the Gerschgorin disk criterion is a relatively common method in the technical field and is not described in detail herein.
  • To better explain the embodiments of the present invention, a specific application scenario is given below, further describing the process of constructing a focus covariance matrix based on a speech signal, as shown in FIG. 2:
  • Step 200: Determine that 100 sampling frequency points are used when the microphone array collects the speech signal: sampling frequency point 0, sampling frequency point 1, sampling frequency point 2, …, sampling frequency point 99;
  • Step 210: For sampling frequency point 0, calculate the first covariance matrix of the speech signal collected at sampling frequency point 0;
  • Step 220: Determine a focus frequency point among the 100 sampling frequency points;
  • Step 230 Calculate a second covariance matrix of the voice signal collected by the microphone array at the focus frequency point;
  • Step 240 Decompose the eigenvalues of the first covariance matrix to obtain a first eigenvector matrix, and perform conjugate transposition on the first eigenvector matrix to obtain a conjugate transposed matrix of the first eigenvector matrix;
  • Step 250: Perform eigenvalue decomposition on the second covariance matrix to obtain a second eigenvector matrix;
  • Step 260: Take the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix as the focus transformation matrix, and perform conjugate transposition on the focus transformation matrix to obtain the conjugate transposed matrix of the focus transformation matrix;
  • Step 270: Take the product of the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix as the focus covariance matrix of the speech signal collected at sampling frequency point 0;
  • Step 280: Calculate the focus covariance matrices of the other sampling frequency points following the method used for sampling frequency point 0, and take the sum of the focus covariance matrices of all sampling frequency points as the focus covariance matrix of the speech signal collected by the microphone array.
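The scenario above (Steps 200 through 280) can be exercised end-to-end on synthetic data. Everything below (array size, steering vector, noise level, focusing order) is illustrative; with a single simulated source, one eigenvalue of the focused covariance matrix should dominate the rest, which is what source-counting criteria exploit:

```python
import numpy as np

rng = np.random.default_rng(0)
L, P, N = 8, 100, 64                # array elements, frames, sampling frequency points

def steering(k):
    # hypothetical frequency-dependent steering vector, for illustration only
    return np.exp(-1j * 2 * np.pi * k / N * 0.3 * np.arange(L))

# Simulated DFT snapshots X_i(k) for one far-field source plus sensor noise.
X = np.zeros((P, N, L), dtype=complex)
for i in range(P):
    for k in range(N):
        s = rng.standard_normal() + 1j * rng.standard_normal()
        noise = 0.05 * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
        X[i, k] = steering(k) * s + noise

def eig_desc(R):
    vals, vecs = np.linalg.eigh(R)
    idx = np.argsort(vals)[::-1]
    return vals[idx], vecs[:, idx]

# Steps 210-280: per-point covariance, eigendecomposition, focusing, summation.
R_bins = [sum(np.outer(X[i, k], X[i, k].conj()) for i in range(P)) / P
          for k in range(N)]
k0 = N // 2                          # focus frequency point (arbitrary choice here)
_, U0 = eig_desc(R_bins[k0])
R_foc = np.zeros((L, L), dtype=complex)
for k in range(N):
    _, U = eig_desc(R_bins[k])
    T = U0 @ U.conj().T              # focus transformation matrix (CSSM order, assumed)
    R_foc += T @ R_bins[k] @ T.conj().T

w, _ = eig_desc(R_foc)
# With one source, the largest eigenvalue dominates the second largest.
print(w[0] / w[1])
```

The eigenvalue gap of R_foc is the quantity that the Gerschgorin-disk style criteria examine when counting sources.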
  • an embodiment of the present invention provides an apparatus for constructing a focus covariance matrix based on a speech signal, the apparatus comprising a determining unit 30, a first calculating unit 31, and a second calculating unit 32, wherein:
  • a determining unit 30 configured to determine a sampling frequency point used when the microphone array collects the voice signal
  • the first calculating unit 31 is configured to: for any one of the determined sampling frequency points, calculate a first covariance matrix and a focus transformation matrix of the speech signal collected at that sampling frequency point, together with the conjugate transposed matrix of the focus transformation matrix, and take the product of the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix as the focus covariance matrix of the speech signal collected at that sampling frequency point;
  • the second calculating unit 32 is configured to use the sum of the calculated focus covariance matrices of the voice signals respectively collected at the respective sampling frequency points as a focus covariance matrix of the voice signals collected by the microphone array.
  • when calculating the first covariance matrix, the first calculating unit 31 specifically computes:
  • R(k) = (1/P) · Σ_{i=0}^{P−1} X_i(k) X_i(k)^H, k = 0, 1, …, N−1
  • where k represents any sampling frequency point, P represents the number of frames in which the microphone array collects the speech signal, X_i(k) represents the discrete Fourier transform (DFT) value of the microphone array at any one frame and any sampling frequency point, X_i(k)^H represents the conjugate transposed matrix of X_i(k), and N represents the number of sampling frequency points included in any one frame; the number of sampling frequency points included in any two different frames is the same.
  • the determining unit 30 is further configured to: determine a focus frequency point of the sampling frequency point used when the microphone array collects the voice signal;
  • the first calculating unit 31 is further configured to calculate a second covariance matrix of the voice signal collected by the microphone array at the focus frequency point;
  • when calculating the focus transformation matrix, the first calculating unit 31 is specifically configured to:
  • the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix is used as a focus transformation matrix.
  • when calculating the second covariance matrix, the first calculating unit 31 specifically computes:
  • R(k_0) = (1/P) · Σ_{i=0}^{P−1} X_i(k_0) X_i(k_0)^H
  • where k_0 represents the focus frequency point, P represents the number of frames in which the microphone array collects the speech signal, X_i(k_0) represents the DFT value of the microphone array at any one frame and the focus frequency point, and X_i(k_0)^H represents the conjugate transposed matrix of X_i(k_0).
  • when performing eigenvalue decomposition on the second covariance matrix, the first calculating unit 31 specifically decomposes it as R(k_0) = U(k_0) Λ_0 U(k_0)^H;
  • X_{i1}(k) represents the DFT value of the first array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{i2}(k) represents the DFT value of the second array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{iL}(k) represents the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point;
  • L is the number of array elements included in the microphone array.
  • As shown in FIG. 3B, another structure of an apparatus for constructing a focus covariance matrix based on a speech signal according to an embodiment of the present invention includes at least one processor 301, a communication bus 302, a memory 303, and at least one communication interface 304.
  • the communication bus 302 is used to implement the connection and communication between the above components, and the communication interface 304 is used to connect and communicate with external devices.
  • the memory 303 is configured to store executable program code, and the processor 301 executes the program code to: determine the sampling frequency points used when the microphone array collects the speech signal; for any one of the determined sampling frequency points, calculate the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix of the speech signal collected at that sampling frequency point, and take their product as the focus covariance matrix at that sampling frequency point;
  • the calculated sum of the focus covariance matrices of the speech signals respectively collected at the individual sampling frequency points is used as the focus covariance matrix of the speech signal collected by the microphone array.
  • when calculating the first covariance matrix, the processor 301 specifically computes:
  • R(k) = (1/P) · Σ_{i=0}^{P−1} X_i(k) X_i(k)^H, k = 0, 1, …, N−1
  • where k represents any sampling frequency point, P represents the number of frames in which the microphone array collects the speech signal, X_i(k) represents the discrete Fourier transform (DFT) value of the microphone array at any one frame and any sampling frequency point, X_i(k)^H represents the conjugate transposed matrix of X_i(k), and N represents the number of sampling frequency points included in any one frame; the number of sampling frequency points included in any two different frames is the same.
  • before calculating the focus transformation matrix, the processor 301 further determines a focus frequency point and calculates a second covariance matrix of the speech signal collected by the microphone array at the focus frequency point;
  • calculating the focus transformation matrix specifically includes: performing eigenvalue decomposition on the first covariance matrix to obtain a first eigenvector matrix and its conjugate transpose, and performing eigenvalue decomposition on the second covariance matrix to obtain a second eigenvector matrix;
  • the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix is used as the focus transformation matrix.
  • when calculating the second covariance matrix, the processor 301 specifically computes:
  • R(k_0) = (1/P) · Σ_{i=0}^{P−1} X_i(k_0) X_i(k_0)^H
  • where k_0 represents the focus frequency point, P represents the number of frames in which the microphone array collects the speech signal, and X_i(k_0) represents the DFT value of the microphone array at any one frame and the focus frequency point.
  • the processor 301 performs eigenvalue decomposition on the first covariance matrix as R(k) = U(k) Λ U(k)^H;
  • the processor 301 performs eigenvalue decomposition on the second covariance matrix as R(k_0) = U(k_0) Λ_0 U(k_0)^H.
  • X_{i1}(k) represents the DFT value of the first array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{i2}(k) represents the DFT value of the second array element of the microphone array at the ith frame and the kth sampling frequency point;
  • X_{iL}(k) represents the DFT value of the Lth array element of the microphone array at the ith frame and the kth sampling frequency point;
  • L is the number of array elements included in the microphone array.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.


Abstract

A method and apparatus for constructing a focus covariance matrix based on a speech signal: determining the sampling frequency points used when a microphone array collects a speech signal (100); for any one of the determined sampling frequency points, calculating a first covariance matrix and a focus transformation matrix of the speech signal collected at that sampling frequency point, together with the conjugate transposed matrix of the focus transformation matrix, and taking the product of the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix as the focus covariance matrix of the speech signal collected at that sampling frequency point (110); and taking the sum of the focus covariance matrices calculated for the individual sampling frequency points as the focus covariance matrix of the speech signal (120). In this scheme, the incident angle of the sound source does not need to be predicted when constructing the focus covariance matrix; since predicting the incident angle introduces errors, the accuracy of the constructed focus covariance matrix is thereby improved.

Description

Method and apparatus for constructing a focus covariance matrix based on a speech signal

Technical Field

The present invention relates to the field of speech signal processing technologies, and in particular to a method and apparatus for constructing a focus covariance matrix based on a speech signal.

Background

Compared with a single microphone, a microphone array can exploit not only the time-domain and frequency-domain information of a sound source but also its spatial information. It therefore offers strong interference resistance and flexible deployment, has clear advantages in solving problems such as sound source localization, speech enhancement and speech recognition, and is now widely used in audio/video conference systems, in-vehicle systems, hearing aids, human-computer interaction systems, robot systems, security surveillance, military reconnaissance and other fields.

In microphone-array speech processing, the number of sound sources often needs to be known in order to achieve good processing performance; if the number of sound sources is unknown, or the assumed number is too large or too small, the accuracy of the processing results for the speech acquired by the microphone array degrades.

To improve the accuracy of the processing results for the speech acquired by the microphone array, methods for calculating the number of sound sources have been proposed; in this calculation a focus covariance matrix must be constructed. At present, however, constructing the focus covariance matrix requires first predicting the incident angle of the sound source, constructing the matrix from the predicted angle, and then estimating the number of sound sources; if the predicted incident angle has a large error, the accuracy of the constructed focus covariance matrix is low.

Summary of the Invention

Embodiments of the present invention provide a method and apparatus for constructing a focus covariance matrix based on a speech signal, to overcome the defect in the prior art that the constructed focus covariance matrix has low accuracy.
In a first aspect, a method for constructing a focus covariance matrix based on a speech signal is provided, including:

determining the sampling frequency points used when a microphone array collects a speech signal;

for any one of the determined sampling frequency points, calculating a first covariance matrix and a focus transformation matrix of the speech signal collected at the any one sampling frequency point, and a conjugate transposed matrix of the focus transformation matrix, and taking the product of the first covariance matrix, the focus transformation matrix and the conjugate transposed matrix of the focus transformation matrix as the focus covariance matrix of the speech signal collected at the any one sampling frequency point;

taking the sum of the calculated focus covariance matrices of the speech signals respectively collected at the individual sampling frequency points as the focus covariance matrix of the speech signal collected by the microphone array.

With reference to the first aspect, in a first possible implementation, calculating the first covariance matrix specifically includes calculating it as follows:

R(k) = (1/P) · Σ_{i=0}^{P−1} X_i(k) X_i(k)^H, k = 0, 1, …, N−1

where R(k) represents the first covariance matrix, k represents the any one sampling frequency point, P represents the number of frames in which the microphone array collects the speech signal, X_i(k) represents the discrete Fourier transform (DFT) value of the microphone array at any one frame and the any one sampling frequency point, X_i(k)^H represents the conjugate transposed matrix of X_i(k), and N represents the number of sampling frequency points included in any one frame; the number of sampling frequency points included in any two different frames is the same.
With reference to the first aspect and the first possible implementation of the first aspect, in a second possible implementation, before calculating the focus transformation matrix, the method further includes:

determining a focus frequency point from the sampling frequency points used when the microphone array collects the speech signal;

calculating a second covariance matrix of the speech signal collected by the microphone array at the focus frequency point;

and calculating the focus transformation matrix specifically includes:

performing eigenvalue decomposition on the first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transposition on the first eigenvector matrix to obtain a conjugate transposed matrix of the first eigenvector matrix;

performing eigenvalue decomposition on the second covariance matrix to obtain a second eigenvector matrix;

taking the product of the conjugate transposed matrix of the first eigenvector matrix and the second eigenvector matrix as the focus transformation matrix.
With reference to the second possible implementation of the first aspect, in a third possible implementation, calculating the second covariance matrix specifically includes calculating it as follows:

R(k_0) = (1/P) · Σ_{i=0}^{P−1} X_i(k_0) X_i(k_0)^H

where R(k_0) represents the second covariance matrix, k_0 represents the focus frequency point, P represents the number of frames in which the microphone array collects the speech signal, X_i(k_0) represents the DFT value of the microphone array at any one frame and the focus frequency point, and X_i(k_0)^H represents the conjugate transposed matrix of X_i(k_0).
结合第一方面的第二种或者第三种可能的实现方式,在第四种可能的实现方式中,对所述第一协方差矩阵分解特征值,具体包括:
采用如下方式对所述第一协方差矩阵分解特征值:
$$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$
其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述U(k)表示所述$\hat{R}(k)$的第一特征向量矩阵、所述Λ表示所述$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k)$表示所述U(k)的共轭转置矩阵。
结合第一方面的第二种至第四种可能的实现方式,在第五种可能的实现方式中,对所述第二协方差矩阵分解特征值,具体包括:
采用如下方式对所述第二协方差矩阵分解特征值:
$$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$
其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述U(k0)表示所述$\hat{R}(k_0)$的第二特征向量矩阵、所述Λ0表示所述$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k_0)$表示所述U(k0)的共轭转置矩阵。
结合第一方面的第一种至第五种可能的实现方式,在第六种可能的实现方式中,所述Xi(k)形式如下:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
其中:Xi1(k)表示所述麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示所述麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、XiL(k)表示所述麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、所述L为所述麦克风阵列包括的阵元的数量。
第二方面,提供一种基于语音信号构造聚焦协方差矩阵的装置,包括:
确定单元,用于确定麦克风阵列采集语音信号时采用的采样频点;
第一计算单元,用于针对确定出的采样频点中的任意一个采样频点,计算在所述任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及所述聚焦变换矩阵的共轭转置矩阵,并将所述第一协方差矩阵、所述聚焦变换矩阵、所述聚焦变换矩阵的共轭转置矩阵的乘积,作为在所述任意一采样频点采集到的语音信号的聚焦协方差矩阵;
第二计算单元,用于将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和,作为所述麦克风阵列采集到的语音信号的聚焦协方差矩阵。
结合第二方面,在第一种可能的实现方式中,所述第一计算单元在计算所述第一协方差矩阵时,具体为:
采用如下方式计算所述第一协方差矩阵:
$$\hat{R}(k)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k)X_i^{H}(k),\quad k=0,1,\cdots,N-1$$
其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述k表示所述任意一采样频点、所述P表示所述麦克风阵列采集所述语音信号的帧的数量、所述Xi(k)表示所述麦克风阵列在任意一帧及所述任意一采样频点时的离散傅里叶变换DFT值、所述$X_i^{H}(k)$表示所述Xi(k)的共轭转置矩阵、所述N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
结合第二方面,及第二方面的第一种可能的实现方式,在第二种可能的实现方式中,所述确定单元还用于,确定所述麦克风阵列采集语音信号时采用的采样频点的聚焦频点;
所述第一计算单元还用于,计算所述麦克风阵列在所述聚焦频点采集到的语音信号的第二协方差矩阵;
所述第一计算单元在计算所述聚焦变换矩阵时,具体为:
对所述第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对所述第一特征向量矩阵进行共轭转置,得到所述第一特征向量矩阵的共轭转置矩阵;
对所述第二协方差矩阵分解特征值,得到第二特征向量矩阵;
将所述第一特征向量矩阵的共轭转置矩阵、所述第二特征向量矩阵的乘积,作为所述聚焦变换矩阵。
结合第二方面的第二种可能的实现方式,在第三种可能的实现方式中,所述第一计算单元在计算所述第二协方差矩阵时,具体为:
采用如下方式计算所述第二协方差矩阵:
$$\hat{R}(k_0)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k_0)X_i^{H}(k_0)$$
其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述k0表示所述聚焦频点、所述P表示所述麦克风阵列采集所述语音信号的帧的数量、所述Xi(k0)表示所述麦克风阵列在任意一帧及所述聚焦频点时的DFT值、所述$X_i^{H}(k_0)$表示所述Xi(k0)的共轭转置矩阵。
结合第二方面的第二种或者第三种可能的实现方式,在第四种可能的实现方式中,所述第一计算单元在对所述第一协方差矩阵分解特征值时,具体为:
采用如下方式对所述第一协方差矩阵分解特征值:
$$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$
其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述U(k)表示所述$\hat{R}(k)$的第一特征向量矩阵、所述Λ表示所述$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k)$表示所述U(k)的共轭转置矩阵。
结合第二方面的第二种至第四种可能的实现方式,在第五种可能的实现方式中,所述第一计算单元在对所述第二协方差矩阵分解特征值时,具体为:
采用如下方式对所述第二协方差矩阵分解特征值:
$$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$
其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述U(k0)表示所述$\hat{R}(k_0)$的第二特征向量矩阵、所述Λ0表示所述$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k_0)$表示所述U(k0)的共轭转置矩阵。
结合第二方面的第一种至第五种可能的实现方式,在第六种可能的实现方式中,所述Xi(k)形式如下:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
其中:Xi1(k)表示所述麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示所述麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、XiL(k)表示所述麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、所述L为所述麦克风阵列包括的阵元的数量。
本发明实施例提供的基于语音信号构造聚焦协方差矩阵的主要思想为：确定麦克风阵列采集语音信号时采用的采样频点；针对确定出的采样频点中的任意一个采样频点，计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵，及聚焦变换矩阵的共轭转置矩阵，并将第一协方差矩阵、聚焦变换矩阵、聚焦变换矩阵的共轭转置矩阵的乘积，作为在任意一采样频点采集到的语音信号的聚焦协方差矩阵；将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和，作为语音信号的聚焦协方差矩阵。在该方案中，构造聚焦协方差矩阵时不需要预测声源的入射角度，从而避免了预测入射角度带来的误差，因此，本发明实施例提供的方案提高了构造的聚焦协方差矩阵的准确度。
附图说明
图1A为本发明实施例中基于语音信号构造聚焦协方差矩阵的流程图;
图1B为本发明实施例中帧移示意图;
图1C为本发明实施例提供的计算声源的数目与CSM-GDE计算声源的数目的一种对比示意图;
图1D为本发明实施例提供的计算声源的数目与CSM-GDE计算声源的数目的另一种对比示意图;
图2为本发明实施例中基于语音信号构造聚焦协方差矩阵的实施例;
图3A为本发明实施例中基于语音信号构造聚焦协方差矩阵的装置的结构示意图;
图3B为本发明实施例中基于语音信号构造聚焦协方差矩阵的装置的一种结构示意图。
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字母“/”,一般表示前后关联对象是一种“或”的关系。
下面结合说明书附图对本发明优选的实施方式进行详细说明,应当理解,此处所描述的优选实施例仅用于说明和解释本发明,并不用于限定本发明, 并且在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。
下面结合附图对本发明优选的实施方式进行详细说明。
参阅图1A所示,本发明实施例中,基于语音信号构造聚焦协方差矩阵的流程如下:
步骤100:确定麦克风阵列采集语音信号时采用的采样频点;
步骤110:针对确定出的采样频点中的任意一个采样频点,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵,并将第一协方差矩阵、聚焦变换矩阵、聚焦变换矩阵的共轭转置矩阵的乘积,作为在任意一采样频点采集到的语音信号的聚焦协方差矩阵;
步骤120:将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和,作为麦克风阵列采集到的语音信号的聚焦协方差矩阵。
本发明实施例中,为了提高构造出的聚焦协方差矩阵的准确度,在获取麦克风阵列在任意一采样频点采集到的语音信号之后,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵之前,还包括如下操作:
对采集到的语音信号进行预加重处理;
此时,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵,可选的,可以采用如下方式:
对在任意一个采样频点采集到的语音信号进行预加重处理;
计算经过预加重处理后的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵。
本发明实施例中,可选的,可以采用如下方式对语音信号进行预加重处理:
$$\tilde{x}(k)=x(k)-a\,x(k-1)$$   (公式一)
其中，$\tilde{x}(k)$为对在第k个采样频点采集到的语音信号进行预加重处理后的语音信号、x(k)为在第k个采样频点采集到的语音信号、x(k-1)为在第k-1个采样频点采集到的语音信号、N为采样频点的数量、a为预加重系数，可选的，取a=0.9375。
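公式一的预加重处理可用如下示意性代码实现（基于NumPy；系数默认值 a=0.9375 取自上文，序列首个采样点无前一采样，此处按惯例保留原值，属实现假设）：

```python
import numpy as np

def pre_emphasis(x, a=0.9375):
    """对时域采样序列做预加重: y(k) = x(k) - a*x(k-1)。"""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                    # k=0 时无前一采样，保留原值（假设）
    y[1:] = x[1:] - a * x[:-1]
    return y

signal = np.array([1.0, 1.0, 1.0, 1.0])
print(pre_emphasis(signal))        # 直流分量被削弱，高频分量相对提升
```

预加重等价于一个一阶高通滤波器，用于补偿语音信号高频端的衰减。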
其中，可选的，Xi(k)的形式如公式二所示：
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1   (公式二)
其中:Xi1(k)表示麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、……、XiL(k)表示麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、L为麦克风阵列包括的阵元的数量、P表示麦克风阵列采集语音信号的帧的数量。
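公式二中的 L 维向量 X_i(k)（各阵元在第 i 帧、第 k 个采样频点的 DFT 值）可按下述方式组织（示意性实现；帧矩阵 frames 的数据组织方式为假设）：

```python
import numpy as np

def array_dft_vector(frames, k):
    """frames: (L, N) 矩阵，每行是某一阵元在同一帧内的 N 点采样。
    返回 X_i(k) = [X_{i1}(k), ..., X_{iL}(k)]^T，即第 k 个频点的 L 维向量。"""
    spectra = np.fft.fft(frames, axis=1)   # 每个阵元各自做 N 点 DFT
    return spectra[:, k]

L, N = 4, 8
frames = np.random.randn(L, N)
X_k = array_dft_vector(frames, k=2)
print(X_k.shape)   # (4,)
```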
本发明实施例中,为了提高构造出的聚焦协方差矩阵的准确度,获取麦克风阵列在任意一采样频点采集到的语音信号之后,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵之前,还包括如下操作:
对采集到的语音信号进行分帧处理;
计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵时,可选的,可以采用如下方式:
对在任意一个采样频点采集到的语音信号进行分帧处理;
计算进行分帧处理后的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵。
本发明实施例中,在进行分帧处理时,采用交叠的方式进行分帧,即前后两帧产生交叠,交叠的部分称为帧移,可选的,选取帧移为帧长的一半,分帧交叠如图1B所示。
本发明实施例中,为了进一步提高构造出的聚焦协方差矩阵的准确度,在对接收的语音信号在进行分帧处理后,需要对进行分帧处理后的语音信号进行加窗处理。
对进行分帧处理后的语音信号进行加窗处理时可以采用如下方式:
将进行分帧处理后的语音信号与Hamming窗函数w(n)相乘。其中,可选的,Hamming窗函数w(n)如公式三所示:
$$w(n)=0.54-0.46\cos\!\left(\frac{2\pi n}{N-1}\right),\quad n=0,1,\cdots,N-1$$   (公式三)
其中，n为帧内采样序号，N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
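上述分帧（帧移取帧长一半，见图1B）与加 Hamming 窗的处理可示意如下（基于NumPy；帧长取值仅为举例）：

```python
import numpy as np

def frame_and_window(x, frame_len=8, hop=None):
    """按 50% 交叠分帧并加 Hamming 窗 w(n)=0.54-0.46*cos(2*pi*n/(N-1))。"""
    if hop is None:
        hop = frame_len // 2               # 帧移取帧长一半
    n = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * w
                       for i in range(num_frames)])
    return frames

x = np.arange(32, dtype=float)
frames = frame_and_window(x)
print(frames.shape)   # (7, 8)：帧长 8、帧移 4，共 7 帧
```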
在实际应用中,麦克风阵列采集到的语音信号可能有些信号是目标对象发出的语音信号,有些信号是非目标对象发出的语音信号,例如:在开会时,在主讲人讲话之前,有一些噪音,这些噪音是非目标对象发出的语音信号,而在主讲人开始讲话时,此时麦克风阵列采集到的语音信号就是目标对象发出的语音信号,而根据这些目标对象发出的语音信号构造出的聚焦协方差矩阵的准确度较高,因此,本发明实施例中,在获取麦克风阵列采集到的语音信号之后,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵之前,还包括如下操作:
计算在任意一个采样频点、在任意一帧采集到的语音信号的能量值;
确定对应的能量值达到预设能量门限值的语音信号所在的帧;
计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵时,可选的,可以采用如下方式:
计算在任意一个采样频点、及确定的帧采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵。
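按能量门限筛选帧的做法可示意如下（能量定义取帧内采样平方和，门限取值为假设）：

```python
import numpy as np

def select_active_frames(frames, energy_threshold):
    """frames: (P, N) 分帧后的信号；返回能量达到门限的帧的索引。"""
    energy = np.sum(frames ** 2, axis=1)       # 每帧能量: sum x(n)^2
    return np.where(energy >= energy_threshold)[0]

frames = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(select_active_frames(frames, 1.5))   # [1 2]
```

后续的协方差估计只对筛选出的帧进行，以排除非目标对象语音（如会前噪音）的影响。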
本发明实施例中,计算第一协方差矩阵的方式有多种,可选的,可以采用如下方式:
采用如下方式计算第一协方差矩阵:
$$\hat{R}(k)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k)X_i^{H}(k),\quad k=0,1,\cdots,N-1$$   (公式四)
其中，$\hat{R}(k)$表示第一协方差矩阵、k表示任意一采样频点、P表示麦克风阵列采集语音信号的帧的数量、Xi(k)表示麦克风阵列在任意一帧及任意一采样频点时的DFT（Discrete Fourier Transform，离散傅里叶变换）值、$X_i^{H}(k)$表示Xi(k)的共轭转置矩阵、N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
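公式四的协方差估计可示意如下（设 X_frames 的第 i 行为 X_i(k)，即 P 帧在同一频点 k 的 L 维阵列 DFT 向量；该数据组织方式为假设）：

```python
import numpy as np

def covariance_at_bin(X_frames):
    """R(k) = (1/P) * sum_i X_i(k) X_i(k)^H。
    X_frames: (P, L) 复数矩阵；返回 (L, L) Hermitian 矩阵。"""
    P = X_frames.shape[0]
    return X_frames.T @ X_frames.conj() / P

X_frames = np.array([[1.0 + 0j, 1j],
                     [1.0 + 0j, -1j]])
R = covariance_at_bin(X_frames)
print(R)   # 对角为各阵元平均功率，非对角为阵元间互相关
```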
本发明实施例中,在计算聚焦变换矩阵之前,还包括如下操作:
确定麦克风阵列采集语音信号时采用的采样频点的聚焦频点;
计算麦克风阵列在聚焦频点采集到的语音信号的第二协方差矩阵;
此时,在计算聚焦变换矩阵时,可选的,可以采用如下方式:
对第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对第一特征向量矩阵进行共轭转置,得到第一特征向量矩阵的共轭转置矩阵;
对第二协方差矩阵分解特征值,得到第二特征向量矩阵;
将第一特征向量矩阵的共轭转置矩阵、第二特征向量矩阵的乘积,作为聚焦变换矩阵。
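上述由两个特征向量矩阵相乘得到聚焦变换矩阵的步骤可示意如下。两矩阵相乘的先后顺序在文字表述中存在歧义，此处按酉聚焦的常见取法 T(k)=U(k0)·U(k)^H（属示意性假设）；该取法保证聚焦变换为酉矩阵，不改变协方差矩阵的特征值：

```python
import numpy as np

def eigvecs_desc(R):
    """Hermitian 矩阵特征分解，特征向量按特征值从大到小排列。"""
    _, U = np.linalg.eigh(R)      # eigh 按升序返回特征值
    return U[:, ::-1]

def focusing_transform(R_k, R_k0):
    """T(k) = U(k0) U(k)^H：由两个协方差矩阵的特征向量矩阵构造聚焦变换。"""
    return eigvecs_desc(R_k0) @ eigvecs_desc(R_k).conj().T

R_k = np.diag([2.0, 1.0]).astype(complex)
R_k0 = np.diag([3.0, 1.0]).astype(complex)
T = focusing_transform(R_k, R_k0)
Rf_k = T @ R_k @ T.conj().T       # 单个频点的聚焦协方差矩阵 T R T^H
print(np.allclose(T @ T.conj().T, np.eye(2)))   # True：T 为酉矩阵
```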
本发明实施例中,在计算第二协方差矩阵时,可选的,可以采用如下方式:
采用如下方式计算第二协方差矩阵:
$$\hat{R}(k_0)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k_0)X_i^{H}(k_0)$$   (公式五)
其中，$\hat{R}(k_0)$表示第二协方差矩阵、k0表示聚焦频点、P表示麦克风阵列采集语音信号的帧的数量、Xi(k0)表示麦克风阵列在任意一帧及聚焦频点时的DFT值、$X_i^{H}(k_0)$表示Xi(k0)的共轭转置矩阵。
本发明实施例中,对第一协方差矩阵分解特征值时,可选的,可以采用如下方式:
采用如下方式对第一协方差矩阵分解特征值:
$$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$   (公式六)
其中，$\hat{R}(k)$表示第一协方差矩阵、U(k)表示$\hat{R}(k)$的第一特征向量矩阵、Λ表示$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、$U^{H}(k)$表示U(k)的共轭转置矩阵。
本发明实施例中,对第二协方差矩阵分解特征值时,可选的,可以采用如下方式:
采用如下方式对第二协方差矩阵分解特征值:
$$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$   (公式七)
其中，$\hat{R}(k_0)$表示第二协方差矩阵、U(k0)表示$\hat{R}(k_0)$的第二特征向量矩阵、Λ0表示$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、$U^{H}(k_0)$表示U(k0)的共轭转置矩阵。
本发明实施例中,可选的,Xi(k)形式如公式二所示。本发明实施例中,在计算得到聚焦协方差矩阵后,可以根据得到的聚焦协方差矩阵计算声源数目,在根据得到的聚焦协方差矩阵计算声源数目时,可选的,可以采用如下方式:
采用盖尔圆准则根据得到的聚焦协方差矩阵计算声源数目。例如:在室内环境,房间大小为10m×10m×3m,八个顶点坐标分别为(0,0,0)、(0,10,0)、(0,10,2.5)、(0,0,2.5)、(10,0,0)、(10,10,0)、(10,10,2.5)和(10,0,2.5)。10个麦克风组成的均匀直线阵列分布在(2,4,1.3)和(2,4.9,1.3)两点间,阵元间距为0.1m,阵元为各向同性的全向性麦克风,6个说话人位置分别为(8,1,1.3)、(8,2.6,1.3)、(8,4.2,1.3)、(8,5.8,1.3)、(8,7.4,1.3)和(8,9,1.3),假设背景噪声为高斯白噪声。使用Image仿真模型对麦克风阵列和说话人话音进行处理,以8kHz采样频率对语音信号进行采样,获取麦克风阵列接收信号。折叠重采样的系数γ=0.8,迭代次数为20。说话人语音信号时长足够长,每次实验中取不同数据进行50次测试,检测概率如下所示:
$$\text{检测概率}=\frac{\text{正确检测出声源数目的次数}}{\text{总测试次数}}$$   (公式八)
如果实际说话人数目为2，任意一帧包括128个采样频点，帧数量为100，盖尔圆准则中的参数D(K)=0.7，信噪比从-5dB变化到5dB，步长为1dB时，采用本发明实施例构造的聚焦协方差矩阵计算声源数目的方法与现有的CSM（Coherent Signal Subspace Method，相干信号子空间方法）-GDE（Gerschgorin Disk Estimator，盖尔圆盘估计法）方法的检测概率随信噪比变化的对比如图1C所示。由图1C可以看出，CSM-GDE方法在信噪比为0dB时，检测概率可达到0.9，在信噪比为4dB时，检测概率可达到1。本发明提供的方案在信噪比小于0dB时，与CSM-GDE方法相比，正确检测概率有较大提升；在信噪比为-3dB时，检测概率即达到0.9，此后随信噪比升高很快达到1。
如果实际说话人数目为2，信噪比为10dB，任意一帧包括128个采样频点，帧数量从5变化到70，步长为5时，采用本发明实施例构造的聚焦协方差矩阵计算声源数目的方法与现有的CSM-GDE方法的检测概率随帧数量变化的对比如图1D所示。由图1D可知，CSM-GDE方法在帧数量为40时，检测概率可达到0.9，在帧数量为65时，检测概率可达到1。本发明方案在帧数量小于50时，与CSM-GDE方法相比，检测概率有较大提升；在帧数量为25时，检测概率达到0.9，在帧数量为50时，检测概率即可达到1。
表1给出了根据本发明方案提供的构造聚焦协方差矩阵计算声源数目的方法与CSM-GDE计算声源数目的方法在不同说话人数目情况下的性能比较。在该实验中，信噪比为10dB，任意一帧包括128个采样频点，帧数量为100。由表1可知，在实际说话人数目为2和3时，两种方法的检测概率都可达到1；当实际说话人数目大于3时，检测概率随说话人数目增加逐渐下降；在说话人数目相同的情况下，根据本发明方案提供的构造聚焦协方差矩阵计算声源数目的方法较CSM-GDE计算声源数目的方法具有更高的检测概率。
表1 检测概率随实际说话人数目的变化

实际说话人数目    2个    3个    4个    5个    6个
CSM-GDE           1      1      0.94   0.84   0.66
本发明方案        1      1      0.98   0.90   0.72
本发明实施例中,采用盖尔圆准则根据得到的聚焦协方差矩阵计算声源数目为本技术领域中比较常用的方式,在此不再进行详述。
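盖尔圆准则估计声源数目的一种常见实现思路可示意如下（此为文献中常见的盖尔圆盘估计形式的草图，并非本专利限定的具体算法；参数 D 对应上文的 D(K)。玩具算例中两路导向矢量正交，理想协方差的盘半径会发生简并，此处仅用于演示计算机制，不代表真实快拍数据上的检测性能）：

```python
import numpy as np

def gerschgorin_radii(R):
    """对 L×L 聚焦协方差矩阵做酉变换，返回按从大到小排序的盖尔圆半径。
    做法：对 R 的左上 (L-1)×(L-1) 子块特征分解得到 U1，
    以 blockdiag(U1, 1) 变换 R，取变换后末列各元素的模作为半径。"""
    L = R.shape[0]
    _, U1 = np.linalg.eigh(R[:L - 1, :L - 1])
    U1 = U1[:, ::-1]                          # 特征向量按特征值从大到小排列
    radii = np.abs(U1.conj().T @ R[:L - 1, L - 1])
    return np.sort(radii)[::-1]

def gde_source_count(R, D=0.7):
    """GDE(m) = r_m - D/(L-1) * sum(r_i)；取第一个使 GDE(m) < 0 的 m，
    声源数目估计为 m-1。D 对应文中参数 D(K)。"""
    r = gerschgorin_radii(R)
    L = R.shape[0]
    thresh = D / (L - 1) * np.sum(r)
    for m, rm in enumerate(r, start=1):
        if rm - thresh < 0:
            return m - 1
    return len(r)                             # 所有盘都偏大时给出上界（假设）

# 玩具算例：两路正交导向矢量构成的理想协方差（含少量噪声）
a1 = np.ones(4, dtype=complex)
a2 = np.array([1, -1, 1, -1], dtype=complex)
R = np.outer(a1, a1.conj()) + np.outer(a2, a2.conj()) + 0.01 * np.eye(4)
radii = gerschgorin_radii(R)
print(radii)
```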
为了更好地理解本发明实施例,以下给出具体应用场景,针对基于语音信号构造聚焦协方差矩阵的过程,做出进一步详细描述,如图2所示:
步骤200:确定麦克风阵列采集语音信号时采用的采样频点为100个:采样频点0、采样频点1、采样频点2、……、采样频点99;
步骤210：针对采样频点0，计算针对采样频点0的第一协方差矩阵；
步骤220:确定100个采样频点的聚焦频点;
步骤230:计算麦克风阵列在聚焦频点采集到的语音信号的第二协方差矩阵;
步骤240:对第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对第一特征向量矩阵进行共轭转置,得到第一特征向量矩阵的共轭转置矩阵;
步骤250:对第二协方差矩阵分解特征值,得到第二特征向量矩阵;
步骤260:将第一特征向量矩阵的共轭转置矩阵、第二特征向量矩阵的乘积,作为聚焦变换矩阵,并对聚焦变换矩阵进行共轭转置,得到聚焦变换矩阵的共轭转置矩阵;
步骤270:将第一协方差矩阵、聚焦变换矩阵、聚焦变换矩阵的共轭转置矩阵的乘积,作为在采样频点0采集到的语音信号的聚焦协方差矩阵;
步骤280:按照计算针对采样频点0的聚焦协方差矩阵的方式计算其他采样频点的聚焦协方差矩阵,并将针对每一个采样频点的聚焦协方差矩阵之和,作为麦克风阵列采集到的语音信号的聚焦协方差矩阵。
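图2的整体流程（步骤200–280）可串接为如下示意性代码（基于NumPy；数据组织方式、频点数与阵元数均为举例假设，聚焦变换的相乘顺序按酉聚焦惯例取 T(k)=U(k0)U(k)^H）：

```python
import numpy as np

def focused_covariance_sum(X, k0):
    """X: (P, L, N) 复数组，X[i, :, k] 为第 i 帧、频点 k 的阵列 DFT 向量 X_i(k)。
    对每个频点 k：R(k) = (1/P) Σ X_i(k) X_i(k)^H，T(k) = U(k0) U(k)^H，
    语音信号的聚焦协方差矩阵为 Σ_k T(k) R(k) T(k)^H。"""
    P, L, N = X.shape

    def cov(k):
        Xi = X[:, :, k]                       # (P, L)
        return Xi.T @ Xi.conj() / P

    def eigvecs_desc(R):
        _, U = np.linalg.eigh(R)
        return U[:, ::-1]                     # 特征值从大到小对应的特征向量

    U0 = eigvecs_desc(cov(k0))                # 聚焦频点的特征向量矩阵
    Rf = np.zeros((L, L), dtype=complex)
    for k in range(N):
        Rk = cov(k)
        T = U0 @ eigvecs_desc(Rk).conj().T    # 聚焦变换矩阵（顺序为假设）
        Rf += T @ Rk @ T.conj().T             # 累加各频点的聚焦协方差矩阵
    return Rf

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4, 8)) + 1j * rng.standard_normal((10, 4, 8))
Rf = focused_covariance_sum(X, k0=3)
print(Rf.shape)   # (4, 4)
```

由于 T(k) 为酉矩阵，聚焦变换不改变各频点协方差矩阵的迹与特征值，这正是将各频点信息对齐到聚焦频点子空间后再求和的依据。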
基于上述相应方法的技术方案，参阅图3A所示，本发明实施例提供一种基于语音信号构造聚焦协方差矩阵的装置，该装置包括确定单元30、第一计算单元31，及第二计算单元32，其中：
确定单元30,用于确定麦克风阵列采集语音信号时采用的采样频点;
第一计算单元31,用于针对确定出的采样频点中的任意一个采样频点,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵,并将第一协方差矩阵、聚焦变换矩阵、聚焦变换矩阵的共轭转置矩阵的乘积,作为在任意一采样频点采集到的语音信号的聚焦协方差矩阵;
第二计算单元32,用于将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和,作为麦克风阵列采集到的语音信号的聚焦协方差矩阵。
可选的,第一计算单元31在计算第一协方差矩阵时,具体为:
采用如下方式计算第一协方差矩阵:
$$\hat{R}(k)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k)X_i^{H}(k),\quad k=0,1,\cdots,N-1$$
其中，$\hat{R}(k)$表示第一协方差矩阵、k表示任意一采样频点、P表示麦克风阵列采集语音信号的帧的数量、Xi(k)表示麦克风阵列在任意一帧及任意一采样频点时的离散傅里叶变换DFT值、$X_i^{H}(k)$表示Xi(k)的共轭转置矩阵、N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
进一步的,确定单元30还用于,确定麦克风阵列采集语音信号时采用的采样频点的聚焦频点;
第一计算单元31还用于,计算麦克风阵列在聚焦频点采集到的语音信号的第二协方差矩阵;
第一计算单元31在计算聚焦变换矩阵时,具体为:
对第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对第一特征向量矩阵进行共轭转置,得到第一特征向量矩阵的共轭转置矩阵;
对第二协方差矩阵分解特征值,得到第二特征向量矩阵;
将第一特征向量矩阵的共轭转置矩阵、第二特征向量矩阵的乘积,作为聚焦变换矩阵。
可选的,第一计算单元31在计算第二协方差矩阵时,具体为:
采用如下方式计算第二协方差矩阵:
$$\hat{R}(k_0)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k_0)X_i^{H}(k_0)$$
其中，$\hat{R}(k_0)$表示第二协方差矩阵、k0表示聚焦频点、P表示麦克风阵列采集语音信号的帧的数量、Xi(k0)表示麦克风阵列在任意一帧及聚焦频点时的DFT值、$X_i^{H}(k_0)$表示Xi(k0)的共轭转置矩阵。
可选的,第一计算单元31在对第一协方差矩阵分解特征值时,具体为:
采用如下方式对第一协方差矩阵分解特征值:
$$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$
其中，$\hat{R}(k)$表示第一协方差矩阵、U(k)表示$\hat{R}(k)$的第一特征向量矩阵、Λ表示$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、$U^{H}(k)$表示U(k)的共轭转置矩阵。
可选的,第一计算单元31在对第二协方差矩阵分解特征值时,具体为:
采用如下方式对第二协方差矩阵分解特征值:
$$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$
其中，$\hat{R}(k_0)$表示第二协方差矩阵、U(k0)表示$\hat{R}(k_0)$的第二特征向量矩阵、Λ0表示$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、$U^{H}(k_0)$表示U(k0)的共轭转置矩阵。
可选的,Xi(k)形式如下:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
其中：Xi1(k)表示麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、……、XiL(k)表示麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、L为麦克风阵列包括的阵元的数量。
如图3B所示,为本发明实施例提供的基于语音信号构造聚焦协方差矩阵的装置的另一种结构示意图,包括至少一个处理器301,通信总线302,存储器303以及至少一个通信接口304。
其中,通信总线302用于实现上述组件之间的连接并通信,通信接口304用于与外部设备连接并通信。
其中,存储器303用于存储有可执行的程序代码,处理器301通过执行这些程序代码,以用于:
确定麦克风阵列采集语音信号时采用的采样频点;
针对确定出的采样频点中的任意一个采样频点,计算在任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及聚焦变换矩阵的共轭转置矩阵,并将第一协方差矩阵、聚焦变换矩阵、聚焦变换矩阵的共轭转置矩阵的乘积,作为在任意一采样频点采集到的语音信号的聚焦协方差矩阵;
将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和,作为麦克风阵列采集到的语音信号的聚焦协方差矩阵。
可选的,处理器301计算第一协方差矩阵时,具体为:
采用如下方式计算第一协方差矩阵:
$$\hat{R}(k)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k)X_i^{H}(k),\quad k=0,1,\cdots,N-1$$
其中，$\hat{R}(k)$表示第一协方差矩阵、k表示任意一采样频点、P表示麦克风阵列采集语音信号的帧的数量、Xi(k)表示麦克风阵列在任意一帧及任意一采样频点时的离散傅里叶变换DFT值、$X_i^{H}(k)$表示Xi(k)的共轭转置矩阵、N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
进一步的,处理器301计算聚焦变换矩阵之前,还包括:
确定麦克风阵列采集语音信号时采用的采样频点的聚焦频点;
计算麦克风阵列在聚焦频点采集到的语音信号的第二协方差矩阵;
计算聚焦变换矩阵,具体包括:
对第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对第一特征向量矩阵进行共轭转置,得到第一特征向量矩阵的共轭转置矩阵;
对第二协方差矩阵分解特征值,得到第二特征向量矩阵;
将第一特征向量矩阵的共轭转置矩阵、第二特征向量矩阵的乘积,作为聚焦变换矩阵。
可选的,处理器301计算第二协方差矩阵时,具体为:
采用如下方式计算第二协方差矩阵:
$$\hat{R}(k_0)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k_0)X_i^{H}(k_0)$$
其中，$\hat{R}(k_0)$表示第二协方差矩阵、k0表示聚焦频点、P表示麦克风阵列采集语音信号的帧的数量、Xi(k0)表示麦克风阵列在任意一帧及聚焦频点时的DFT值、$X_i^{H}(k_0)$表示Xi(k0)的共轭转置矩阵。
可选的,处理器301对第一协方差矩阵分解特征值时,具体为:
采用如下方式对第一协方差矩阵分解特征值:
$$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$
其中，$\hat{R}(k)$表示第一协方差矩阵、U(k)表示$\hat{R}(k)$的第一特征向量矩阵、Λ表示$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、$U^{H}(k)$表示U(k)的共轭转置矩阵。
可选的,处理器301对第二协方差矩阵分解特征值时,具体为:
采用如下方式对第二协方差矩阵分解特征值:
$$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$
其中，$\hat{R}(k_0)$表示第二协方差矩阵、U(k0)表示$\hat{R}(k_0)$的第二特征向量矩阵、Λ0表示$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、$U^{H}(k_0)$表示U(k0)的共轭转置矩阵。
本发明实施例中,可选的,Xi(k)形式如下:
Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
其中:Xi1(k)表示麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、……、XiL(k)表示麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、L为麦克风阵列包括的阵元的数量。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中的功能的步骤。
尽管已描述了本发明的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例作出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。
显然,本领域的技术人员可以对本发明实施例进行各种改动和变型而不脱离本发明实施例的精神和范围。这样,倘若本发明实施例的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。

Claims (14)

  1. 一种基于语音信号构造聚焦协方差矩阵的方法,其特征在于,包括:
    确定麦克风阵列采集语音信号时采用的采样频点;
    针对确定出的采样频点中的任意一个采样频点,计算在所述任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及所述聚焦变换矩阵的共轭转置矩阵,并将所述第一协方差矩阵、所述聚焦变换矩阵、所述聚焦变换矩阵的共轭转置矩阵的乘积,作为在所述任意一采样频点采集到的语音信号的聚焦协方差矩阵;
    将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和,作为所述麦克风阵列采集到的语音信号的聚焦协方差矩阵。
  2. 如权利要求1所述的方法,其特征在于,计算所述第一协方差矩阵,具体包括:
    采用如下方式计算所述第一协方差矩阵:
    $$\hat{R}(k)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k)X_i^{H}(k),\quad k=0,1,\cdots,N-1$$
    其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述k表示所述任意一采样频点、所述P表示所述麦克风阵列采集所述语音信号的帧的数量、所述Xi(k)表示所述麦克风阵列在任意一帧及所述任意一采样频点时的离散傅里叶变换DFT值、所述$X_i^{H}(k)$表示所述Xi(k)的共轭转置矩阵、所述N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
  3. 如权利要求1或2所述的方法,其特征在于,计算所述聚焦变换矩阵之前,还包括:
    确定所述麦克风阵列采集语音信号时采用的采样频点的聚焦频点;
    计算所述麦克风阵列在所述聚焦频点采集到的语音信号的第二协方差矩阵;
    计算所述聚焦变换矩阵,具体包括:
    对所述第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对所述第一特征向量矩阵进行共轭转置,得到所述第一特征向量矩阵的共轭转置矩阵;
    对所述第二协方差矩阵分解特征值,得到第二特征向量矩阵;
    将所述第一特征向量矩阵的共轭转置矩阵、所述第二特征向量矩阵的乘积,作为所述聚焦变换矩阵。
  4. 如权利要求3所述的方法,其特征在于,计算所述第二协方差矩阵,具体包括:
    采用如下方式计算所述第二协方差矩阵:
    $$\hat{R}(k_0)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k_0)X_i^{H}(k_0)$$
    其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述k0表示所述聚焦频点、所述P表示所述麦克风阵列采集所述语音信号的帧的数量、所述Xi(k0)表示所述麦克风阵列在任意一帧及所述聚焦频点时的DFT值、所述$X_i^{H}(k_0)$表示所述Xi(k0)的共轭转置矩阵。
  5. 如权利要求3或4所述的方法,其特征在于,对所述第一协方差矩阵分解特征值,具体包括:
    采用如下方式对所述第一协方差矩阵分解特征值:
    $$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$
    其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述U(k)表示所述$\hat{R}(k)$的第一特征向量矩阵、所述Λ表示所述$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k)$表示所述U(k)的共轭转置矩阵。
  6. 如权利要求3-5任一项所述的方法,其特征在于,对所述第二协方差矩阵分解特征值,具体包括:
    采用如下方式对所述第二协方差矩阵分解特征值:
    $$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$
    其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述U(k0)表示所述$\hat{R}(k_0)$的第二特征向量矩阵、所述Λ0表示所述$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k_0)$表示所述U(k0)的共轭转置矩阵。
  7. 如权利要求2-6任一项所述的方法,其特征在于,所述Xi(k)形式如下:
    Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
    其中:Xi1(k)表示所述麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示所述麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、XiL(k)表示所述麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、所述L为所述麦克风阵列包括的阵元的数量。
  8. 一种基于语音信号构造聚焦协方差矩阵的装置,其特征在于,包括:
    确定单元,用于确定麦克风阵列采集语音信号时采用的采样频点;
    第一计算单元,用于针对确定出的采样频点中的任意一个采样频点,计算在所述任意一个采样频点采集到的语音信号的第一协方差矩阵、聚焦变换矩阵,及所述聚焦变换矩阵的共轭转置矩阵,并将所述第一协方差矩阵、所述聚焦变换矩阵、所述聚焦变换矩阵的共轭转置矩阵的乘积,作为在所述任意一采样频点采集到的语音信号的聚焦协方差矩阵;
    第二计算单元,用于将计算得到的在各个采样频点分别采集得到的语音信号的聚焦协方差矩阵之和,作为所述麦克风阵列采集到的语音信号的聚焦协方差矩阵。
  9. 如权利要求8所述的装置,其特征在于,所述第一计算单元在计算所述第一协方差矩阵时,具体为:
    采用如下方式计算所述第一协方差矩阵:
    $$\hat{R}(k)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k)X_i^{H}(k),\quad k=0,1,\cdots,N-1$$
    其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述k表示所述任意一采样频点、所述P表示所述麦克风阵列采集所述语音信号的帧的数量、所述Xi(k)表示所述麦克风阵列在任意一帧及所述任意一采样频点时的离散傅里叶变换DFT值、所述$X_i^{H}(k)$表示所述Xi(k)的共轭转置矩阵、所述N表示任意一帧包括的采样频点的数量，任意两个不同帧所包括的采样频点的数量均相同。
  10. 如权利要求8或9所述的装置,其特征在于,所述确定单元还用于,确定所述麦克风阵列采集语音信号时采用的采样频点的聚焦频点;
    所述第一计算单元还用于,计算所述麦克风阵列在所述聚焦频点采集到的语音信号的第二协方差矩阵;
    所述第一计算单元在计算所述聚焦变换矩阵时,具体为:
    对所述第一协方差矩阵分解特征值,得到第一特征向量矩阵,并对所述第一特征向量矩阵进行共轭转置,得到所述第一特征向量矩阵的共轭转置矩阵;
    对所述第二协方差矩阵分解特征值,得到第二特征向量矩阵;
    将所述第一特征向量矩阵的共轭转置矩阵、所述第二特征向量矩阵的乘积,作为所述聚焦变换矩阵。
  11. 如权利要求10所述的装置,其特征在于,所述第一计算单元在计算所述第二协方差矩阵时,具体为:
    采用如下方式计算所述第二协方差矩阵:
    $$\hat{R}(k_0)=\frac{1}{P}\sum_{i=0}^{P-1}X_i(k_0)X_i^{H}(k_0)$$
    其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述k0表示所述聚焦频点、所述P表示所述麦克风阵列采集所述语音信号的帧的数量、所述Xi(k0)表示所述麦克风阵列在任意一帧及所述聚焦频点时的DFT值、所述$X_i^{H}(k_0)$表示所述Xi(k0)的共轭转置矩阵。
  12. 如权利要求10或11所述的装置,其特征在于,所述第一计算单元在对所述第一协方差矩阵分解特征值时,具体为:
    采用如下方式对所述第一协方差矩阵分解特征值:
    $$\hat{R}(k)=U(k)\Lambda U^{H}(k)$$
    其中，所述$\hat{R}(k)$表示所述第一协方差矩阵、所述U(k)表示所述$\hat{R}(k)$的第一特征向量矩阵、所述Λ表示所述$\hat{R}(k)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k)$表示所述U(k)的共轭转置矩阵。
  13. 如权利要求10-12任一项所述的装置,其特征在于,所述第一计算单元在对所述第二协方差矩阵分解特征值时,具体为:
    采用如下方式对所述第二协方差矩阵分解特征值:
    $$\hat{R}(k_0)=U(k_0)\Lambda_0 U^{H}(k_0)$$
    其中，所述$\hat{R}(k_0)$表示所述第二协方差矩阵、所述U(k0)表示所述$\hat{R}(k_0)$的第二特征向量矩阵、所述Λ0表示所述$\hat{R}(k_0)$的特征值按从大到小顺序排列所构成的对角矩阵、所述$U^{H}(k_0)$表示所述U(k0)的共轭转置矩阵。
  14. 如权利要求9-13任一项所述的装置,其特征在于,所述Xi(k)形式如下:
    Xi(k)=[Xi1(k),Xi2(k),......,XiL(k)]T,i=0,1,2,......,P-1
    其中:Xi1(k)表示所述麦克风阵列的第1个阵元在第i帧及第k个采样频点时的DFT值、Xi2(k)表示所述麦克风阵列的第2个阵元在第i帧及第k个采样频点时的DFT值、XiL(k)表示所述麦克风阵列的第L个阵元在第i帧及第k个采样频点时的DFT值、所述L为所述麦克风阵列包括的阵元的数量。
PCT/CN2015/082571 2015-01-30 2015-06-26 一种基于语音信号构造聚焦协方差矩阵的方法及装置 WO2016119388A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510052368.7A CN104599679A (zh) 2015-01-30 2015-01-30 一种基于语音信号构造聚焦协方差矩阵的方法及装置
CN201510052368.7 2015-01-30

Publications (1)

Publication Number Publication Date
WO2016119388A1 true WO2016119388A1 (zh) 2016-08-04

Family

ID=53125412

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/082571 WO2016119388A1 (zh) 2015-01-30 2015-06-26 一种基于语音信号构造聚焦协方差矩阵的方法及装置

Country Status (2)

Country Link
CN (1) CN104599679A (zh)
WO (1) WO2016119388A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110501727A (zh) * 2019-08-13 2019-11-26 中国航空工业集团公司西安飞行自动控制研究所 一种基于空频自适应滤波的卫星导航抗干扰方法
CN111696570A (zh) * 2020-08-17 2020-09-22 北京声智科技有限公司 语音信号处理方法、装置、设备及存储介质
CN113409804A (zh) * 2020-12-22 2021-09-17 声耕智能科技(西安)研究院有限公司 一种基于变张成广义子空间的多通道频域语音增强算法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104599679A (zh) * 2015-01-30 2015-05-06 华为技术有限公司 一种基于语音信号构造聚焦协方差矩阵的方法及装置
CN108538306B (zh) * 2017-12-29 2020-05-26 北京声智科技有限公司 提高语音设备doa估计的方法及装置
CN110992977B (zh) * 2019-12-03 2021-06-22 北京声智科技有限公司 一种目标声源的提取方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040220800A1 (en) * 2003-05-02 2004-11-04 Samsung Electronics Co., Ltd Microphone array method and system, and speech recognition method and system using the same
CN102568493A (zh) * 2012-02-24 2012-07-11 大连理工大学 一种基于最大矩阵对角率的欠定盲分离方法
CN102621527A (zh) * 2012-03-20 2012-08-01 哈尔滨工程大学 基于数据重构的宽带相干源的方位估计方法
CN102664666A (zh) * 2012-04-09 2012-09-12 电子科技大学 一种高效的宽带稳健自适应波束形成方法
CN104599679A (zh) * 2015-01-30 2015-05-06 华为技术有限公司 一种基于语音信号构造聚焦协方差矩阵的方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166120B (zh) * 2014-07-04 2017-07-11 哈尔滨工程大学 一种声矢量圆阵稳健宽带mvdr方位估计方法



Also Published As

Publication number Publication date
CN104599679A (zh) 2015-05-06

