CN102760435A - Frequency-domain blind deconvolution method for voice signal - Google Patents

Frequency-domain blind deconvolution method for voice signal

Info

Publication number
CN102760435A
CN102760435A (application numbers CN2012102278402A, CN201210227840A)
Authority
CN
China
Prior art keywords: omega, signal, domain, voice signal, sum
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN2012102278402A
Other languages
Chinese (zh)
Inventor
丁志中
黄玉雷
戴礼荣
陈小平
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN2012102278402A priority Critical patent/CN102760435A/en
Publication of CN102760435A publication Critical patent/CN102760435A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a frequency-domain blind deconvolution method for a voice signal, comprising the following steps: converting the time-domain convolutively mixed voice signal to the frequency domain and then performing blind separation. Exploiting the short-time stationarity of speech, the time-domain convolutive mixture is transformed via the windowed Fourier transform into a frequency-domain linear instantaneous mixture model; after pre-processing such as filtering and whitening in the frequency domain, segment-wise blind separation of the voice signal is achieved by approximate joint diagonalization of correlation matrices at different time delays; and after the ambiguities of blind separation are resolved, the separated segments are recombined in the time domain via the inverse Fourier transform. The disclosed method achieves good separation on 2×2 real-recorded mixed voice signals and can effectively improve the voice-recognition accuracy of a human-computer interaction system in an environment with interfering speech from other people.

Description

A frequency-domain blind deconvolution method for voice signals
Technical field
The invention belongs to the field of voice-signal extraction and recognition within multimedia information processing, and specifically relates to a frequency-domain blind deconvolution method for voice signals, applicable to improving the interactive recognition rate in human-machine interaction scenarios.
Background technology
Automatic speech recognition has developed for more than 60 years, and in quiet or noise-free environments its recognition rate exceeds 95%. In practical environments, however, and especially when two or more speakers talk simultaneously, the recognition rate drops sharply, which greatly limits the application of the technology in human-machine interaction (HMI). The human auditory system can extract the information it is interested in from a noisy environment, but a robot in an HMI environment hardly has this ability. Blind signal separation is a technique that estimates the original signals solely from the mixtures acquired by the receiving sensors, when both the original signals and the transmission channels are unknown.
Blind separation in the HMI environment belongs to the category of blind deconvolution. For convolutive mixtures, or mixed voice signals recorded in real environments, academia mainly uses two approaches: time-domain blind deconvolution and frequency-domain blind deconvolution. Time-domain blind deconvolution extends the scalar mixing matrix of the ICA framework for linear instantaneous mixtures to a filter mixing matrix for convolutive mixtures, with corresponding modifications to the objective function and the iterative algorithm. The basic idea of frequency-domain blind deconvolution is to transform the time-domain convolutive mixture into frequency-domain instantaneous mixtures via the short-time Fourier transform, and then separate the frequency-domain mixtures with the relatively mature blind separation algorithms for instantaneous mixtures: at each frequency bin an instantaneous blind separation algorithm is applied, the permutation and amplitude ambiguities of the output signals are resolved, and the separated time-domain signals are obtained through the inverse Fourier transform.
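The STFT/inverse-STFT framing that this frequency-domain approach rests on can be sketched as follows. This is an illustrative NumPy sketch, not code from the patent; the Hann window and the frame length and hop used here are assumed values (the embodiment below uses 16 ms frames with a 2 ms shift):

```python
import numpy as np

def stft(x, frame_len=256, hop=32):
    """Windowed short-time Fourier transform in the style of Eq. (2):
    one FFT per windowed frame starting at t_s = k*hop."""
    w = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([np.fft.fft(x[k*hop:k*hop+frame_len] * w)
                     for k in range(n_frames)])

def istft(X, hop=32):
    """Overlap-add inverse in the style of Eqs. (11)-(12): sum the
    inverse FFTs of all frames, then divide by the summed window W(t)."""
    n_frames, frame_len = X.shape
    w = np.hanning(frame_len)
    out = np.zeros((n_frames - 1) * hop + frame_len)
    W = np.zeros_like(out)                      # W(t) of Eq. (12)
    for k in range(n_frames):
        out[k*hop:k*hop+frame_len] += np.fft.ifft(X[k]).real
        W[k*hop:k*hop+frame_len] += w
    W[W == 0] = 1.0                             # guard the frame edges
    return out / W
```

Because each inverse FFT returns the windowed segment x(t)w(t − t_s), dividing the overlap-added sum by W(t) reconstructs x(t) wherever W(t) > 0; in the full method, per-bin demixing matrices would be applied to the spectra between the two transforms.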
The drawback of time-domain blind deconvolution is its excessive computational load: especially when the mixing filters are complex, solving for each filter order depends on solving for all the other orders. For example, in the diagonal-constant separation-matrix algorithm proposed by Chan, the algorithm separates the source mixtures quickly when the mixing-filter order is 5 or below, but when the order is 6 or above the separation speed drops markedly and the separation quality degrades. In the frequency-domain algorithm, by contrast, the separation at each frequency bin is independent, so the mixing-filter order affects the computational load far less than in time-domain algorithms.
Existing blind deconvolution methods are few, both at home and abroad, and they have the following shortcomings:
1) Most algorithms are derived under restrictive conditions; the separation quality is unsatisfactory, the cross-interference between separated signals is large, and robustness is low.
2) In real-environment human-machine interaction, the recognition accuracy is not high.
3) Existing algorithms search slowly and have poor real-time performance, so they cannot be applied well to real-time human-machine interaction scenarios.
Summary of the invention
In view of the above deficiencies of the prior art, the present invention discloses a frequency-domain blind deconvolution method for voice signals. The method performs blind separation by transforming the time-domain convolutive mixture to the frequency domain; the separation quality is good, and the method is applicable to the field of speech recognition.
The present invention solves the technical problem with the following technical scheme:
A frequency-domain blind deconvolution method for voice signals, characterized in that the time-domain convolutively mixed voice is transformed to the frequency domain for blind separation, comprising the following steps:
1) Adaptively divide the original audio file into frames; when the sampling frequency is 16 kHz, the frame length is 16 ms and the frame shift is 2 ms;
2) Apply the Fourier transform to the single-frame data to change the convolutive mixing model into a linear mixing model. The convolutive mixture model can be expressed as
x(t) = H ⊗ s(t)  (⊗ denotes convolution)    (1)
The short-time Fourier transform of the signal can be expressed as
X(ω, t_s) = ∑_t e^{-jωt} x(t) w(t - t_s)    (2)
where X(ω, t_s) is the short-time Fourier transform of x(t) and w(t) is a window function. Assuming the mixing system is time-invariant, Eq. (1) gives
X(ω, t_s) = H(ω) S(ω, t_s)    (3)
where H(ω) and S(ω, t_s) are the Fourier transforms of the mixing filter H(p) and the source signal s(t), respectively; H(ω) can be estimated independently at each frequency bin;
3) Whiten the input signal by eigenvalue decomposition. The covariance matrix of the mixed signal can be decomposed as
R_x(0) = (1/T) ∑_{t=0}^{T-1} x(t) x*(t) = Q Λ Q^{-1}    (4)
where Λ = diag(d_1, d_2, …, d_n) is a diagonal matrix whose elements are the eigenvalues of the covariance matrix R_x(0), Q is the matrix of the corresponding eigenvectors, Q^{-1} is the inverse of Q, and * denotes the conjugate transpose. The whitening matrix V can be expressed as
V = Λ^{-1/2} Q^{-1}    (5)
where Λ^{-1/2} is the square root of the inverse of the eigenvalue matrix Λ;
4) Jointly diagonalize the correlation matrices, i.e. find a rotation matrix U that minimizes
∑_{τ=1}^{r} ∑_{i≠j} |(U R_z(τ) U*)_{ij}|²    (6)
where R_z(τ) is defined as
R_z(τ) = (1/T) ∑_{t=0}^{T-1} z(t) z*(t + τ),  τ = 1, 2, …, r    (7)
The frequency-domain demixing matrix W is then
W = U V    (8)
5) Define the output signal spectra Y_1(ω) and Y_2(ω); the correlation coefficient of their amplitudes a_1(ω) and a_2(ω) is
r(a_1(ω), a_2(ω)) = cov(a_1(ω), a_2(ω)) / √(D(a_1(ω)) D(a_2(ω)))    (9)
where the covariance is the sample covariance over the analysis windows,
cov(a_1(ω), a_2(ω)) = (1/M) ∑_{m=1}^{M} (a_1(ω, m) - ā_1(ω)) (a_2(ω, m) - ā_2(ω))    (10)
and a_1(ω, m) denotes the amplitude of the frequency-ω component of the first signal in the m-th window;
6) Compute the parameters |r(a_1(ω_m), a_1(ω_{m+1}))|, |r(a_2(ω_m), a_2(ω_{m+1}))|, |r(a_1(ω_m), a_2(ω_{m+1}))| and |r(a_2(ω_m), a_1(ω_{m+1}))| to determine the reassembly (permutation) of the signals;
7) Compute the short-time inverse Fourier transform of Eq. (2):
x(t) = (1/2π) (1/W(t)) ∑_{t_s} ∑_ω e^{jω(t - t_s)} X(ω, t_s)    (11)
where
W(t) = ∑_{t_s} w(t - t_s)    (12)
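Steps 3) and 4) above can be illustrated with the following sketch. This is illustrative NumPy code, not the patent's implementation: for simplicity the rotation U is obtained from the eigenvectors of the summed, symmetrized delayed correlation matrices, which minimizes criterion (6) exactly when the R_z(τ) share eigenvectors (as in the noiseless model); a general Jacobi-style approximate joint diagonalizer would be used in practice.

```python
import numpy as np

def whiten(x):
    """Eigen-decomposition whitening, Eqs. (4)-(5): z = V x with
    V = Lambda^{-1/2} Q^{-1}, so that R_z(0) = I."""
    T = x.shape[1]
    Rx = (x @ x.T) / T                     # Eq. (4), real-valued case
    d, Q = np.linalg.eigh(Rx)
    V = np.diag(d ** -0.5) @ Q.T           # Eq. (5); Q orthogonal, Q^{-1} = Q^T
    return V @ x, V

def find_rotation(z, delays=(1, 2, 3)):
    """Step 4: a rotation U that (approximately) joint-diagonalizes the
    delayed correlations R_z(tau) of Eq. (7). The symmetrized R_z(tau)
    are summed and eigendecomposed; the eigenvector basis diagonalizes
    every term when the matrices share eigenvectors."""
    n, T = z.shape
    M = np.zeros((n, n))
    for tau in delays:
        R = (z[:, :-tau] @ z[:, tau:].T) / (T - tau)   # Eq. (7)
        M += R + R.T                                   # symmetrize
    _, U = np.linalg.eigh(M)
    return U.T                                         # rows form the rotation
```

Applying W = find_rotation(z) @ V to the mixture, as in Eq. (8), recovers the sources up to the usual permutation and scaling ambiguities, which steps 5) and 6) then resolve.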
According to the short-time stationarity of the voice signal, the present invention transforms the time-domain convolutive mixture into a frequency-domain linear instantaneous mixture model through the windowed Fourier transform. After pre-processing such as filtering and whitening in the frequency domain, segment-wise blind separation of the voice signal is achieved by approximate joint diagonalization of the correlation matrices at different time delays. After the ambiguities of blind separation are resolved, the separated segments are recombined in the time domain through the inverse Fourier transform.
The beneficial effects of the present invention are:
1) The invention achieves good separation of 2×2 real-recorded mixed voice signals and can effectively improve the voice-recognition accuracy of a human-machine interaction system in an environment with interfering speech from other people.
2) The invention performs blind separation by transforming the time-domain convolutive mixture to the frequency domain; the separation quality is good, and the method is applicable to the field of speech recognition.
Brief description of the drawings
Fig. 1 is the system flowchart of the method of the present invention.
Embodiment
With reference to Fig. 1, a frequency-domain blind deconvolution method for voice signals transforms the time-domain convolutively mixed voice to the frequency domain for blind separation, comprising the following steps:
1) Adaptively divide the original audio file into frames; when the sampling frequency is 16 kHz, the frame length is 16 ms and the frame shift is 2 ms;
2) Apply the Fourier transform to the single-frame data to change the convolutive mixing model into a linear mixing model. The convolutive mixture model can be expressed as
x(t) = H ⊗ s(t)  (⊗ denotes convolution)    (1)
The short-time Fourier transform of the signal can be expressed as
X(ω, t_s) = ∑_t e^{-jωt} x(t) w(t - t_s)    (2)
where X(ω, t_s) is the short-time Fourier transform of x(t) and w(t) is a window function. Assuming the mixing system is time-invariant, Eq. (1) gives
X(ω, t_s) = H(ω) S(ω, t_s)    (3)
where H(ω) and S(ω, t_s) are the Fourier transforms of the mixing filter H(p) and the source signal s(t), respectively; H(ω) can be estimated independently at each frequency bin;
3) Whiten the input signal by eigenvalue decomposition. The covariance matrix of the mixed signal can be decomposed as
R_x(0) = (1/T) ∑_{t=0}^{T-1} x(t) x*(t) = Q Λ Q^{-1}    (4)
where Λ = diag(d_1, d_2, …, d_n) is a diagonal matrix whose elements are the eigenvalues of the covariance matrix R_x(0), Q is the matrix of the corresponding eigenvectors, and Q^{-1} is the inverse of Q. The whitening matrix V can be expressed as
V = Λ^{-1/2} Q^{-1}    (5)
where Λ^{-1/2} is the square root of the inverse of the eigenvalue matrix Λ;
4) Jointly diagonalize the correlation matrices, i.e. find a rotation matrix U that minimizes
∑_{τ=1}^{r} ∑_{i≠j} |(U R_z(τ) U*)_{ij}|²    (6)
where R_z(τ) is defined as
R_z(τ) = (1/T) ∑_{t=0}^{T-1} z(t) z*(t + τ),  τ = 1, 2, …, r    (7)
The frequency-domain demixing matrix W is then
W = U V    (8)
5) Define the output signal spectra Y_1(ω) and Y_2(ω); the correlation coefficient of their amplitudes a_1(ω) and a_2(ω) is
r(a_1(ω), a_2(ω)) = cov(a_1(ω), a_2(ω)) / √(D(a_1(ω)) D(a_2(ω)))    (9)
where the covariance is the sample covariance over the analysis windows,
cov(a_1(ω), a_2(ω)) = (1/M) ∑_{m=1}^{M} (a_1(ω, m) - ā_1(ω)) (a_2(ω, m) - ā_2(ω))    (10)
and a_1(ω, m) denotes the amplitude of the frequency-ω component of the first signal in the m-th window;
6) Compute the parameters |r(a_1(ω_m), a_1(ω_{m+1}))|, |r(a_2(ω_m), a_2(ω_{m+1}))|, |r(a_1(ω_m), a_2(ω_{m+1}))| and |r(a_2(ω_m), a_1(ω_{m+1}))| to determine the reassembly (permutation) of the signals;
7) Compute the short-time inverse Fourier transform of Eq. (2):
x(t) = (1/2π) (1/W(t)) ∑_{t_s} ∑_ω e^{jω(t - t_s)} X(ω, t_s)    (11)
where
W(t) = ∑_{t_s} w(t - t_s)    (12)
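Steps 5) and 6) of the embodiment — resolving the per-bin permutation ambiguity by correlating amplitude envelopes of adjacent frequency bins — can be sketched as follows. This is an illustrative sketch, not the patent's code; the array layout, the greedy bin-by-bin sweep, and the envelope construction in the test are assumptions:

```python
import numpy as np

def amp_corr(a, b):
    """Correlation coefficient of two amplitude sequences over the
    analysis windows m, Eq. (9); Eq. (10) is the sample covariance."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return (a @ b) / denom if denom > 0 else 0.0

def align_permutations(A1, A2):
    """Greedy sweep over adjacent frequency bins: at bin m+1, keep or
    swap the two outputs depending on which pairing gives the larger
    |r(.)| scores of step 6. A1, A2 hold the amplitudes |Y_1|, |Y_2|
    with shape (n_bins, n_windows); returns a per-bin swap flag."""
    swapped = np.zeros(A1.shape[0], dtype=bool)
    for m in range(A1.shape[0] - 1):
        # envelopes of bin m after undoing any swap already detected
        e1, e2 = (A2[m], A1[m]) if swapped[m] else (A1[m], A2[m])
        keep = abs(amp_corr(e1, A1[m + 1])) + abs(amp_corr(e2, A2[m + 1]))
        swap = abs(amp_corr(e1, A2[m + 1])) + abs(amp_corr(e2, A1[m + 1]))
        swapped[m + 1] = swap > keep
    return swapped
```

In the full method, the resulting flags would be applied to the separated spectra Y_1, Y_2 before the inverse transform of step 7); the absolute values make the decision insensitive to the per-bin amplitude ambiguity.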
Table 1 below compares the performance of the present invention with two typical blind deconvolution methods.
Table 1 (provided as an image in the original publication)

Claims (1)

1. A frequency-domain blind deconvolution method for voice signals, characterized in that the time-domain convolutively mixed voice is transformed to the frequency domain for blind separation, comprising the following steps:
1) Adaptively divide the original audio file into frames; when the sampling frequency is 16 kHz, the frame length is 16 ms and the frame shift is 2 ms;
2) Apply the Fourier transform to the single-frame data to change the convolutive mixing model into a linear mixing model. The convolutive mixture model can be expressed as
x(t) = H ⊗ s(t)  (⊗ denotes convolution)    (1)
The short-time Fourier transform of the signal can be expressed as
X(ω, t_s) = ∑_t e^{-jωt} x(t) w(t - t_s)    (2)
where X(ω, t_s) is the short-time Fourier transform of x(t) and w(t) is a window function. Assuming the mixing system is time-invariant, Eq. (1) gives
X(ω, t_s) = H(ω) S(ω, t_s)    (3)
where H(ω) and S(ω, t_s) are the Fourier transforms of the mixing filter H(p) and the source signal s(t), respectively; H(ω) can be estimated independently at each frequency bin;
3) Whiten the input signal by eigenvalue decomposition. The covariance matrix of the mixed signal can be decomposed as
R_x(0) = (1/T) ∑_{t=0}^{T-1} x(t) x*(t) = Q Λ Q^{-1}    (4)
where Λ = diag(d_1, d_2, …, d_n) is a diagonal matrix whose elements are the eigenvalues of the covariance matrix R_x(0), Q is the matrix of the corresponding eigenvectors, and Q^{-1} is the inverse of Q. The whitening matrix V can be expressed as
V = Λ^{-1/2} Q^{-1}    (5)
where Λ^{-1/2} is the square root of the inverse of the eigenvalue matrix Λ;
4) Jointly diagonalize the correlation matrices, i.e. find a rotation matrix U that minimizes
∑_{τ=1}^{r} ∑_{i≠j} |(U R_z(τ) U*)_{ij}|²    (6)
where R_z(τ) is defined as
R_z(τ) = (1/T) ∑_{t=0}^{T-1} z(t) z*(t + τ),  τ = 1, 2, …, r    (7)
The frequency-domain demixing matrix W is then
W = U V    (8)
5) Define the output signal spectra Y_1(ω) and Y_2(ω); the correlation coefficient of their amplitudes a_1(ω) and a_2(ω) is
r(a_1(ω), a_2(ω)) = cov(a_1(ω), a_2(ω)) / √(D(a_1(ω)) D(a_2(ω)))    (9)
where the covariance is the sample covariance over the analysis windows,
cov(a_1(ω), a_2(ω)) = (1/M) ∑_{m=1}^{M} (a_1(ω, m) - ā_1(ω)) (a_2(ω, m) - ā_2(ω))    (10)
and a_1(ω, m) denotes the amplitude of the frequency-ω component of the first signal in the m-th window;
6) Compute the parameters |r(a_1(ω_m), a_1(ω_{m+1}))|, |r(a_2(ω_m), a_2(ω_{m+1}))|, |r(a_1(ω_m), a_2(ω_{m+1}))| and |r(a_2(ω_m), a_1(ω_{m+1}))| to determine the reassembly (permutation) of the signals;
7) Compute the short-time inverse Fourier transform of Eq. (2):
x(t) = (1/2π) (1/W(t)) ∑_{t_s} ∑_ω e^{jω(t - t_s)} X(ω, t_s)    (11)
where
W(t) = ∑_{t_s} w(t - t_s)    (12)
CN2012102278402A 2012-07-03 2012-07-03 Frequency-domain blind deconvolution method for voice signal Pending CN102760435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102278402A CN102760435A (en) 2012-07-03 2012-07-03 Frequency-domain blind deconvolution method for voice signal

Publications (1)

Publication Number Publication Date
CN102760435A (en) 2012-10-31

Family

ID=47054877



Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104934041A (en) * 2015-05-07 2015-09-23 西安电子科技大学 Convolutive blind signal separation method based on multi-target optimization joint block diagonalization
CN105324762A (en) * 2013-06-25 2016-02-10 歌拉利旺株式会社 Filter coefficient group computation device and filter coefficient group computation method
CN105825866A (en) * 2016-05-24 2016-08-03 天津大学 Real-time convolutive mixed blind signal separation adaptive step length method based on fuzzy system
CN106023984A (en) * 2016-04-28 2016-10-12 成都之达科技有限公司 Speech recognition method based on car networking
CN107563300A (en) * 2017-08-08 2018-01-09 浙江上风高科专风实业有限公司 Noise reduction preconditioning technique based on prewhitening method
CN110265060A (en) * 2019-06-04 2019-09-20 广东工业大学 A kind of speaker's number automatic testing method based on Density Clustering
CN116866116A (en) * 2023-07-13 2023-10-10 中国人民解放军战略支援部队航天工程大学 Time-delay mixed linear blind separation method



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121031