CN102760435A - Frequency-domain blind deconvolution method for voice signal - Google Patents
- Publication number: CN102760435A (application CN201210227840A)
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a frequency-domain blind deconvolution method for voice signals, comprising the following steps: transforming the time-domain convolutively mixed voice signal to the frequency domain and performing blind separation there; exploiting the short-time stationarity of speech to convert the time-domain convolutive mixture into a frequency-domain linear instantaneous mixing model via the windowed Fourier transform; after preprocessing in the frequency domain (filtering, whitening), achieving segmented blind separation of the voice signal by approximate joint diagonalization of correlation matrices at different time lags; and, after resolving the ambiguities of blind separation, recombining the separated segments in the time domain via the inverse Fourier transform. The disclosed method achieves good separation on 2×2 real-recorded mixed voice signals and can effectively improve the speech recognition accuracy of a human-computer interaction system in environments with interfering speakers.
Description
Technical field
The invention belongs to the field of voice signal extraction and recognition within multimedia information processing, and specifically relates to a frequency-domain blind deconvolution method for voice signals, applicable to improving the recognition rate in human-computer interaction scenarios.
Background technology
After more than 60 years of development, automatic speech recognition achieves recognition rates above 95% in quiet or noise-free environments. In real application environments, however, especially when two or more speakers talk simultaneously, the recognition rate drops sharply, which greatly limits the application of the technology in human-machine interaction (HMI). The human auditory system can pick out the information of interest in a noisy environment, but a robot operating in an HMI environment lacks this ability. Blind signal separation is a technique for estimating the original signals solely from the mixtures observed at the receiving sensors, when both the original signals and the transmission channel are unknown.
Blind separation in the HMI environment falls into the category of blind deconvolution. For convolutive mixtures, or voice mixtures recorded in real environments, academia has mainly pursued two deconvolution approaches: time-domain blind deconvolution and frequency-domain blind deconvolution. Time-domain blind deconvolution is mainly based on the ICA framework: the scalar mixing matrix of the linear instantaneous model is extended to a filter mixing matrix for the convolutive model, with corresponding modifications to the objective function and the iterative algorithm. The basic idea of frequency-domain blind deconvolution is to transform the time-domain convolutive mixture into frequency-domain instantaneous mixtures via the short-time Fourier transform, then separate the frequency-domain mixture with a mature instantaneous blind separation algorithm applied independently at each frequency bin; after resolving the order (permutation) and amplitude (scaling) ambiguities of the outputs, the separated time-domain signals are obtained via the inverse Fourier transform.
The drawback of time-domain blind deconvolution is its excessive computational load: when the mixing filters are complex, solving for each filter order depends on the solutions of all the other orders. For example, in the diagonal-constant separation matrix algorithm proposed by Chan, the algorithm separates the source mixtures quickly when the mixing filter has at most 5 taps, but once the filter order exceeds 6, the separation speed drops markedly and the separation quality deteriorates. In contrast, frequency-domain algorithms separate each frequency bin independently, so the mixing-filter order affects the computational load far less than in time-domain algorithms.
Existing blind deconvolution methods are few, both domestically and abroad, and suffer from the following deficiencies:
1) Most algorithms are derived under restrictive assumptions; their separation quality is unsatisfactory, cross-interference between the separated signals is large, and robustness is low.
2) In real-environment human-computer interaction, the recognition accuracy is low.
3) Existing algorithms search slowly and have poor real-time performance, so they cannot be applied well to real-time human-computer interaction scenarios.
Summary of the invention
To address the above shortcomings of the prior art, the present invention discloses a frequency-domain blind deconvolution method for voice signals. The method performs blind separation by transforming the time-domain convolutive mixture to the frequency domain; it achieves good separation quality and is applicable to the field of speech recognition.
The present invention solves the technical problem by adopting the following technical scheme:
A frequency-domain blind deconvolution method for voice signals, characterized in that the time-domain convolutively mixed voice is transformed to the frequency domain for blind separation, specifically comprising the following steps:
1) Adaptively divide the original audio file into frames; at a sampling frequency of 16 kHz, the frame length is taken as 16 ms and the frame shift as 2 ms;
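As a rough illustration of step 1, the framing can be sketched in Python (the 16 kHz sampling rate, 16 ms frame length and 2 ms frame shift come from the text; the function name and use of NumPy are illustrative assumptions):

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=16.0, shift_ms=2.0):
    """Split a 1-D signal into overlapping frames.

    With fs = 16 kHz, a 16 ms frame is 256 samples and a 2 ms
    frame shift is 32 samples, as in step 1.
    """
    frame_len = int(fs * frame_ms / 1000)   # 256 samples
    shift = int(fs * shift_ms / 1000)       # 32 samples
    n_frames = 1 + (len(x) - frame_len) // shift
    frames = np.stack([x[i * shift : i * shift + frame_len]
                       for i in range(n_frames)])
    return frames

x = np.random.randn(16000)                  # 1 s of audio at 16 kHz
frames = frame_signal(x)
print(frames.shape)                         # (493, 256)
```

With these settings each frame is 256 samples long and consecutive frames overlap by 224 samples.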
2) Apply the Fourier transform to the single-frame data, converting the convolutive mixing model into a linear mixing model. The convolutive mixing model can be expressed as

x(t) = Σ_p H(p) s(t − p)    (1)

The short-time Fourier transform of the signal can be expressed as

X(ω, t_s) = Σ_t w(t − t_s) x(t) e^{−jωt}    (2)

where X(ω, t_s) is the short-time Fourier transform of x(t) and w(t) is a window function. Assuming the mixing system is time-invariant, Eq. (1) gives

X(ω, t_s) ≈ H(ω) S(ω, t_s)

where H(ω) and S(ω, t_s) are the Fourier transforms of the mixing filter H(p) and the source signal s(t), respectively; H(ω) can be estimated separately at each frequency bin;
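The per-bin instantaneous model X(ω) ≈ H(ω)S(ω) can be checked numerically. The sketch below (illustrative, not from the patent) uses a circular convolution so that the identity holds exactly; this approximates the linear convolutive mixture when the frame is much longer than the mixing filter:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
s = rng.standard_normal((2, N))               # one frame of two sources
H = np.zeros((2, 2, N))
H[:, :, :8] = rng.standard_normal((2, 2, 8))  # 8-tap filters, zero-padded

# Circular convolutive mixture x_i = sum_j H_ij (*) s_j
x = np.zeros((2, N))
for i in range(2):
    for j in range(2):
        x[i] += np.real(np.fft.ifft(np.fft.fft(H[i, j]) * np.fft.fft(s[j])))

# In the frequency domain the mixture is instantaneous per bin:
# X(w) = H(w) S(w), with H(w) a 2x2 complex matrix at each bin w
S = np.fft.fft(s, axis=1)
X = np.fft.fft(x, axis=1)
Hw = np.fft.fft(H, axis=2)                    # shape (2, 2, N)
X_pred = np.einsum('ijw,jw->iw', Hw, S)
print(np.allclose(X, X_pred))                 # True
```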
3) Whiten the input signal using eigenvalue decomposition. The covariance matrix of the mixed signal can be decomposed as

R_x(0) = Q Λ Q^{−1}

where Λ = diag(d_1, d_2, …, d_n) is a diagonal matrix whose elements are the eigenvalues of the covariance matrix R_x(0), Q is the matrix of corresponding eigenvectors, and Q^{−1} is the inverse of Q. The whitening matrix V can be expressed as

V = Λ^{−1/2} Q^T;
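A minimal sketch of the eigenvalue-decomposition whitening of step 3 (the 2 × 2 mixing matrix and sample count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
src = rng.standard_normal((2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ src                                 # correlated 2-channel mixture

# R_x(0) = Q Lambda Q^{-1};  V = Lambda^{-1/2} Q^T
Rx = (x @ x.T) / x.shape[1]
d, Q = np.linalg.eigh(Rx)                   # eigenvalues d, eigenvectors Q
V = np.diag(d ** -0.5) @ Q.T                # whitening matrix
z = V @ x                                   # whitened signal

Rz = (z @ z.T) / z.shape[1]
print(np.allclose(Rz, np.eye(2)))           # True
```

Because V = Λ^{−1/2} Q^T, the whitened covariance V R_x(0) V^T is the identity by construction.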
4) Jointly diagonalize the correlation matrices, i.e. find a rotation matrix U that minimizes

Σ_τ off(U^T R_z(τ) U)

where off(·) denotes the sum of squared off-diagonal elements and R_z(τ) is defined as

R_z(τ) = E[ z(t_s) z(t_s + τ)^T ]

with z = V x the whitened signal. The frequency-domain unmixing matrix is W = U^T V;
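The joint-diagonalization criterion of step 4 can be illustrated for the 2 × 2 case. The sketch below estimates the rotation by a coarse grid search over the rotation angle; this is an illustrative stand-in, not the approximate joint diagonalization algorithm the patent presumably relies on (e.g. a Jacobi-type solver):

```python
import numpy as np

def off(M):
    """Sum of squared off-diagonal entries."""
    return np.sum(M ** 2) - np.sum(np.diag(M) ** 2)

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Two unit-variance sources with different spectra (one white, one low-pass),
# so their lagged correlation matrices R_s(tau) are diagonal but distinct.
rng = np.random.default_rng(2)
n = 20000
s = np.vstack([rng.standard_normal(n),
               np.convolve(rng.standard_normal(n),
                           np.ones(15) / np.sqrt(15), 'same')])
z = rot(0.7) @ s                            # rotated ("mixed") observation

lags = [1, 2, 3, 5, 8]
R = [0.5 * ((z[:, :-t] @ z[:, t:].T) + (z[:, t:] @ z[:, :-t].T)) / (n - t)
     for t in lags]                         # symmetrized R_z(tau)

# Find the rotation U minimizing sum_tau off(U^T R_z(tau) U)
thetas = np.linspace(0.0, np.pi, 3600)
cost = [sum(off(rot(t).T @ M @ rot(t)) for M in R) for t in thetas]
U = rot(thetas[int(np.argmin(cost))])

before = sum(off(M) for M in R)
after = sum(off(U.T @ M @ U) for M in R)
print(after < 0.01 * before)                # True: off-diagonals collapse
```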
5) Define the output signal spectra Y_1(ω) and Y_2(ω). The correlation coefficient of the amplitudes a_1(ω) and a_2(ω) is

r(a_1, a_2) = cov(a_1, a_2) / (σ_{a_1} σ_{a_2})

where the covariance is

cov(a_1, a_2) = E[a_1 a_2] − E[a_1] E[a_2]

and a_1(ω, m) denotes the amplitude of the component of the first signal at frequency ω in the m-th window;
6) Compute the parameters |r(a_1(ω_m), a_1(ω_{m+1}))|, |r(a_2(ω_m), a_2(ω_{m+1}))|, |r(a_1(ω_m), a_2(ω_{m+1}))| and |r(a_2(ω_m), a_1(ω_{m+1}))| for adjacent frequency bins, and use them to determine the recombination (permutation alignment) of the separated signals;
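Steps 5 and 6 resolve the per-bin permutation ambiguity by correlating amplitude envelopes across adjacent bins. A hypothetical sketch (the array layout and helper names are assumptions, not from the patent):

```python
import numpy as np

def corr(a, b):
    """Correlation coefficient r(a, b) = cov(a, b) / (sigma_a * sigma_b)."""
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / (np.sqrt(a @ a) * np.sqrt(b @ b) + 1e-12)

def align_permutations(A):
    """A[k, i, m]: amplitude of output i at frequency bin k in window m.

    Sweep adjacent bins; swap the two outputs at bin k+1 whenever the
    cross-pairing correlations of step 6 exceed the direct pairing.
    """
    A = A.copy()
    for k in range(A.shape[0] - 1):
        keep = abs(corr(A[k, 0], A[k + 1, 0])) + abs(corr(A[k, 1], A[k + 1, 1]))
        swap = abs(corr(A[k, 0], A[k + 1, 1])) + abs(corr(A[k, 1], A[k + 1, 0]))
        if swap > keep:
            A[k + 1] = A[k + 1, ::-1]       # undo the permutation at bin k+1
    return A

# Synthetic check: 50 bins share two source envelopes; bins 10-19 are permuted.
rng = np.random.default_rng(3)
env = rng.random((2, 200))
A = np.repeat(env[None], 50, axis=0) + 0.05 * rng.random((50, 2, 200))
A[10:20] = A[10:20, ::-1]
fixed = align_permutations(A)
print(all(corr(fixed[k, 0], env[0]) > 0.9 for k in range(50)))  # True
```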
7) Compute the short-time inverse Fourier transform of Eq. (2): transform each frame of the separated spectra back to the time domain by the inverse Fourier transform and recombine the frames to obtain the separated time-domain signals.
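Step 7 can be sketched as a windowed analysis / overlap-add synthesis pair. Dividing by the accumulated squared window makes the interior reconstruction exact; the window choice and normalization are illustrative assumptions, not specified by the patent:

```python
import numpy as np

frame_len, shift = 256, 32                  # 16 ms frames, 2 ms shift at 16 kHz
x = np.random.default_rng(4).standard_normal(16000)

win = np.hanning(frame_len)
starts = range(0, len(x) - frame_len + 1, shift)

# Analysis: windowed FFT per frame, as in Eq. (2)
spec = [np.fft.rfft(win * x[t:t + frame_len]) for t in starts]

# Synthesis: inverse FFT per frame, overlap-add, normalize by window power
y = np.zeros(len(x))
wsum = np.zeros(len(x))
for t, S in zip(starts, spec):
    y[t:t + frame_len] += win * np.fft.irfft(S, frame_len)
    wsum[t:t + frame_len] += win ** 2
ok = wsum > 1e-8
y[ok] /= wsum[ok]

# Interior samples are reconstructed exactly
print(np.allclose(x[frame_len:-frame_len], y[frame_len:-frame_len]))  # True
```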
According to the short-time stationarity of speech, the present invention transforms the time-domain convolutive mixture into a frequency-domain linear instantaneous mixing model via the windowed Fourier transform; after preprocessing in the frequency domain (filtering, whitening), it realizes segmented blind separation of the voice signal by approximate joint diagonalization of correlation matrices at different time lags; after resolving the ambiguities of blind separation, the separated segments are recombined in the time domain via the inverse Fourier transform.
The beneficial effects of the present invention are:
1) The present invention achieves good separation of 2 × 2 real-recorded mixed voice signals, and can effectively improve the speech recognition accuracy of a human-computer interaction system under interference from other speakers.
2) The present invention performs blind separation by transforming the time-domain convolutive mixture to the frequency domain; the separation quality is good, and the method is applicable to the field of speech recognition.
Description of drawings
Fig. 1 is the system flowchart of the method of the present invention.
Embodiment
With reference to Fig. 1, a frequency-domain blind deconvolution method for voice signals transforms the time-domain convolutively mixed voice to the frequency domain and performs blind separation, specifically comprising the following steps:
1) Adaptively divide the original audio file into frames; at a sampling frequency of 16 kHz, the frame length is taken as 16 ms and the frame shift as 2 ms;
2) Apply the Fourier transform to the single-frame data, converting the convolutive mixing model into a linear mixing model. The convolutive mixing model can be expressed as

x(t) = Σ_p H(p) s(t − p)    (1)

The short-time Fourier transform of the signal can be expressed as

X(ω, t_s) = Σ_t w(t − t_s) x(t) e^{−jωt}    (2)

where X(ω, t_s) is the short-time Fourier transform of x(t) and w(t) is a window function. Assuming the mixing system is time-invariant, Eq. (1) gives

X(ω, t_s) ≈ H(ω) S(ω, t_s)

where H(ω) and S(ω, t_s) are the Fourier transforms of the mixing filter H(p) and the source signal s(t), respectively; H(ω) can be estimated separately at each frequency bin;
3) Whiten the input signal using eigenvalue decomposition. The covariance matrix of the mixed signal can be decomposed as

R_x(0) = Q Λ Q^{−1}

where Λ = diag(d_1, d_2, …, d_n) is a diagonal matrix whose elements are the eigenvalues of the covariance matrix R_x(0), Q is the matrix of corresponding eigenvectors, and Q^{−1} is the inverse of Q. The whitening matrix V can be expressed as

V = Λ^{−1/2} Q^T

where Λ^{−1/2} is the square root of the inverse of the covariance eigenvalue matrix Λ;
4) Jointly diagonalize the correlation matrices, i.e. find a rotation matrix U that minimizes

Σ_τ off(U^T R_z(τ) U)

where off(·) denotes the sum of squared off-diagonal elements and R_z(τ) is defined as

R_z(τ) = E[ z(t_s) z(t_s + τ)^T ]

with z = V x the whitened signal. The frequency-domain unmixing matrix is W = U^T V;
5) Define the output signal spectra Y_1(ω) and Y_2(ω). The correlation coefficient of the amplitudes a_1(ω) and a_2(ω) is

r(a_1, a_2) = cov(a_1, a_2) / (σ_{a_1} σ_{a_2})

where the covariance is

cov(a_1, a_2) = E[a_1 a_2] − E[a_1] E[a_2]

and a_1(ω, m) denotes the amplitude of the component of the first signal at frequency ω in the m-th window;
6) Compute the parameters |r(a_1(ω_m), a_1(ω_{m+1}))|, |r(a_2(ω_m), a_2(ω_{m+1}))|, |r(a_1(ω_m), a_2(ω_{m+1}))| and |r(a_2(ω_m), a_1(ω_{m+1}))| for adjacent frequency bins, and use them to determine the recombination (permutation alignment) of the separated signals;
7) Compute the short-time inverse Fourier transform of Eq. (2): transform each frame of the separated spectra back to the time domain by the inverse Fourier transform and recombine the frames to obtain the separated time-domain signals.
Table 1 below compares the performance of the present invention with two typical blind deconvolution methods:
Table 1
Claims (1)
1. A frequency-domain blind deconvolution method for voice signals, characterized in that the time-domain convolutively mixed voice is transformed to the frequency domain for blind separation, specifically comprising the following steps:
1) Adaptively divide the original audio file into frames; at a sampling frequency of 16 kHz, the frame length is taken as 16 ms and the frame shift as 2 ms;
2) Apply the Fourier transform to the single-frame data, converting the convolutive mixing model into a linear mixing model. The convolutive mixing model can be expressed as

x(t) = Σ_p H(p) s(t − p)    (1)

The short-time Fourier transform of the signal can be expressed as

X(ω, t_s) = Σ_t w(t − t_s) x(t) e^{−jωt}    (2)

where X(ω, t_s) is the short-time Fourier transform of x(t) and w(t) is a window function. Assuming the mixing system is time-invariant, Eq. (1) gives

X(ω, t_s) ≈ H(ω) S(ω, t_s)

where H(ω) and S(ω, t_s) are the Fourier transforms of the mixing filter H(p) and the source signal s(t), respectively; H(ω) can be estimated separately at each frequency bin;
3) Whiten the input signal using eigenvalue decomposition. The covariance matrix of the mixed signal can be decomposed as

R_x(0) = Q Λ Q^{−1}

where Λ = diag(d_1, d_2, …, d_n) is a diagonal matrix whose elements are the eigenvalues of the covariance matrix R_x(0), Q is the matrix of corresponding eigenvectors, and Q^{−1} is the inverse of Q. The whitening matrix V can be expressed as

V = Λ^{−1/2} Q^T;
4) Jointly diagonalize the correlation matrices, i.e. find a rotation matrix U that minimizes

Σ_τ off(U^T R_z(τ) U)

where off(·) denotes the sum of squared off-diagonal elements and R_z(τ) is defined as

R_z(τ) = E[ z(t_s) z(t_s + τ)^T ]

with z = V x the whitened signal. The frequency-domain unmixing matrix is W = U^T V;
5) Define the output signal spectra Y_1(ω) and Y_2(ω). The correlation coefficient of the amplitudes a_1(ω) and a_2(ω) is

r(a_1, a_2) = cov(a_1, a_2) / (σ_{a_1} σ_{a_2})

where the covariance is

cov(a_1, a_2) = E[a_1 a_2] − E[a_1] E[a_2]

and a_1(ω, m) denotes the amplitude of the component of the first signal at frequency ω in the m-th window;
6) Compute the parameters |r(a_1(ω_m), a_1(ω_{m+1}))|, |r(a_2(ω_m), a_2(ω_{m+1}))|, |r(a_1(ω_m), a_2(ω_{m+1}))| and |r(a_2(ω_m), a_1(ω_{m+1}))| for adjacent frequency bins, and use them to determine the recombination (permutation alignment) of the separated signals;
7) Compute the short-time inverse Fourier transform of Eq. (2): transform each frame of the separated spectra back to the time domain by the inverse Fourier transform and recombine the frames to obtain the separated time-domain signals.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012102278402A CN102760435A (en) | 2012-07-03 | 2012-07-03 | Frequency-domain blind deconvolution method for voice signal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102760435A true CN102760435A (en) | 2012-10-31 |
Family
ID=47054877
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105324762A (en) * | 2013-06-25 | 2016-02-10 | 歌拉利旺株式会社 | Filter coefficient group computation device and filter coefficient group computation method |
CN105324762B (en) * | 2013-06-25 | 2017-11-28 | 歌拉利旺株式会社 | Filter coefficient group computing device and filter coefficient group's computational methods |
CN104934041A (en) * | 2015-05-07 | 2015-09-23 | 西安电子科技大学 | Convolutive blind signal separation method based on multi-target optimization joint block diagonalization |
CN104934041B (en) * | 2015-05-07 | 2018-07-03 | 西安电子科技大学 | Convolution Blind Signal Separation method based on multiple-objection optimization joint block-diagonalization |
CN106023984A (en) * | 2016-04-28 | 2016-10-12 | 成都之达科技有限公司 | Speech recognition method based on car networking |
CN105825866A (en) * | 2016-05-24 | 2016-08-03 | 天津大学 | Real-time convolutive mixed blind signal separation adaptive step length method based on fuzzy system |
CN107563300A (en) * | 2017-08-08 | 2018-01-09 | 浙江上风高科专风实业有限公司 | Noise reduction preconditioning technique based on prewhitening method |
CN110265060A (en) * | 2019-06-04 | 2019-09-20 | 广东工业大学 | A kind of speaker's number automatic testing method based on Density Clustering |
CN116866116A (en) * | 2023-07-13 | 2023-10-10 | 中国人民解放军战略支援部队航天工程大学 | Time-delay mixed linear blind separation method |
CN116866116B (en) * | 2023-07-13 | 2024-02-27 | 中国人民解放军战略支援部队航天工程大学 | Time-delay mixed linear blind separation method |
Legal Events
Date | Code | Title
---|---|---
| C06 / PB01 | Publication
| C10 / SE01 | Entry into substantive examination (entry into force of request for substantive examination)
| C02 / WD01 | Invention patent application deemed withdrawn after publication (Patent Law 2001)
Application publication date: 2012-10-31