CN111429936A - Voice signal separation method - Google Patents
Voice signal separation method
- Publication number
- CN111429936A
- Authority
- CN
- China
- Prior art keywords
- vector
- signal
- mixing matrix
- vectors
- observation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention provides a voice signal separation method. A linear instantaneous mixing model of the observation signals is first established, and an improved minimum $l_1$-norm algorithm is proposed to address the marked drop in separation accuracy that occurs as the number of source signals increases. The algorithm first preprocesses the observation signals and the mixing matrix, then finds the vectors closest to the observation signal according to vector length and direction, changes the form of the mixing matrix on that basis, estimates the source signals at a given time instant using the changed mixing matrix, and thereby estimates the source signals at all time instants. The method addresses the marked drop in separation accuracy as the number of source signals increases while separating the source signals effectively.
Description
Technical Field
The invention relates to a voice signal separation method under an underdetermined model, in particular to a voice signal separation method based on improved $l_1$-norm minimization, and belongs to the field of signal processing.
Background
In recent years, speech signal separation has become a research hotspot in signal processing, with many applications in teleconferencing, hearing aids and machine speech recognition. Because received sound is usually noisy, identifying the sound of interest and recovering it clearly in such an environment is an important problem, known as the blind source separation problem.
Blind source separation is generally classified, by the relative numbers of source signals and observation signals, into overdetermined, determined and underdetermined blind source separation. Of these, the underdetermined case best matches practical situations, is the most widely applicable, and is the most challenging. Underdetermined blind source separation refers to the case where the number of sensors or microphones is smaller than the number of source signals. In general, methods that solve the underdetermined problem also apply to the overdetermined and determined cases, so research on underdetermined blind source separation is necessary. The usual approach to underdetermined blind source separation is sparse component analysis, commonly called the "two-step" method: the first step estimates the mixing matrix from the observation signals, and the second step separates the source signals using the estimated mixing matrix. Current research shows that the separation accuracy of existing source signal separation algorithms generally drops markedly as the number of source signals increases.
Disclosure of Invention
In view of the above prior art, the technical problem to be solved by the present invention is to provide a voice signal separation method based on improved $l_1$-norm minimization that alleviates the marked drop in separation accuracy when the number of source signals increases.
In order to solve the above technical problem, the present invention provides a method for separating a voice signal, comprising the following steps:
step 1: establish a linear instantaneous mixing model of the observation signals, specifically:
$x(t) = A s(t)$
where $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ is the N-dimensional observation signal vector, $A = [a_1, a_2, \ldots, a_M]$ is the $N \times M$ mixing matrix, $s(t) = [s_1(t), s_2(t), \ldots, s_M(t)]^T$ is the M-dimensional source signal vector, $t$ is the time sample index, and $a_i$ denotes the ith column vector of the mixing matrix;
step 2: remove all zero column vectors from the observation signals, then map the observation signals symmetrically onto the upper half-plane;
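step 3: separate the source signals using improved $l_1$-norm minimization:
the $l_1$-norm minimization criterion is:
$\min_{s(t)} \sum_{i=1}^{M} |s_i(t)| \quad \text{subject to} \quad x(t) = A s(t)$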
the specific steps are as follows:
(3a) calculate the observation signal angle $\theta(t)$ at time $t$ and the direction angles $\alpha_i$ of the column vectors of the mixing matrix:
The calculation formulas are as follows:
$\theta(t) = \arctan(x_{t2} / x_{t1})$
$\alpha_i = \arctan(a_{i2} / a_{i1}), \quad i = 1, 2, \ldots, n$
where $x_{t1}$ and $x_{t2}$ denote the two observation signals at time $t$, and $a_{in}$ denotes the nth element of the ith column vector of the mixing matrix.
(3b) Calculate the mixing direction of any two column vectors of the mixing matrix using the law of sines and the law of cosines:
the specific process is as follows:
$\angle AOB = \angle AOx - \angle BOx$
$AB^2 = OA^2 + OB^2 - 2 \cdot OA \cdot OB \cos\angle AOB$
$OC^2 = OA^2 + AC^2 - 2 \cdot OA \cdot AC \cos\angle OAC$
$\angle COx = \angle AOx - \angle AOC$
where the vectors $OA$ and $OB$ are any two column vectors of the estimated mixing matrix, the angle $\angle AOx$ of vector $OA$ and the angle $\angle BOx$ of vector $OB$ are the directions of the corresponding column vectors of the mixing matrix, and the lengths of $OA$ and $OB$ are the lengths of those column vectors.
(3c) Calculate the angle $\Delta\theta$ between $\theta(t)$ and $\alpha_i$:
if $\Delta\theta = 0$, use:
$x(t) = a_i s_i(t)$
where $x(t)$ is the observation signal vector at time $t$, $a_i$ is the ith column vector of the mixing matrix, and $s_i(t)$ is the ith source signal estimated at time $t$.
If $\Delta\theta \neq 0$, the source signals are estimated with the inverse of the changed mixing matrix, $W_r = A_r^{-1}$, where $A_r = [a_c, a_d]$, and $a_c$ and $a_d$ are the two column vectors closest to the observation signal vector at time $t$.
(3d) Traverse all time instants to obtain the source signal estimate $s(t)$ at every time instant.
The invention has the following beneficial effects: the invention addresses the second step of the sparse component analysis method, in which the source signals are separated using a method based on improved $l_1$-norm minimization.
(1) The proposed source signal separation algorithm is applicable to two observation signals;
(2) as the number of source signals increases, the separation accuracy of the algorithm declines more gradually.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 shows the three original source signals;
FIG. 3 shows the two mixed observation signals;
FIG. 4 is a schematic diagram of the mixing of any two column vectors;
FIG. 5 shows the three separated source signals.
Detailed Description
The method first finds the vectors closest to the observation signal according to vector length and angle, then changes the form of the mixing matrix, estimates the source signals at a given time instant using the changed mixing matrix, and thereby estimates the source signals at all time instants.
The invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, the voice signal separation method based on improved $l_1$-norm minimization of the present invention comprises the following specific steps:
step 1: establish a linear instantaneous mixing model of the observation signals; FIG. 2 shows the three original source signals, and FIG. 3 shows the two mixed observation signals.
In step 1, the mathematical model established is a linear instantaneous mixing model. Speech signals are chosen as the source signals, the noise considered is additive noise, and the signal-to-noise ratio is 30 dB.
The linear instantaneous mixing model of the observation signals is expressed as follows:
$x(t) = A s(t)$
where $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ is the N-dimensional observation signal vector, $A = [a_1, a_2, \ldots, a_M]$ is the $N \times M$ mixing matrix, $s(t) = [s_1(t), s_2(t), \ldots, s_M(t)]^T$ is the M-dimensional source signal vector, $t$ is the time sample index, and $a_i$ denotes the ith column vector of the mixing matrix.
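The mixing model of step 1 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `mix_sources`, the nested-list data layout and the per-channel white Gaussian noise model are hypothetical choices, not taken from the patent, which only states that the noise is additive and that the signal-to-noise ratio is 30 dB.

```python
import math
import random


def mix_sources(A, S, snr_db=None, seed=0):
    """Return X = A * S for an N x M mixing matrix A and M x T sources S.

    A and S are nested lists. If snr_db is given, white Gaussian noise is
    added to each observation channel at that signal-to-noise ratio
    (an assumed noise model, used here only for illustration).
    """
    n_obs, n_src, n_t = len(A), len(S), len(S[0])
    # Linear instantaneous mixing: x_n(t) = sum_m A[n][m] * s_m(t)
    X = [[sum(A[n][m] * S[m][t] for m in range(n_src)) for t in range(n_t)]
         for n in range(n_obs)]
    if snr_db is None:
        return X
    rng = random.Random(seed)
    for row in X:
        power = sum(v * v for v in row) / n_t          # mean signal power
        sigma = math.sqrt(power / 10 ** (snr_db / 10))  # noise std for this SNR
        for t in range(n_t):
            row[t] += rng.gauss(0.0, sigma)
    return X


# Two observations of three sources (the underdetermined case N < M).
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
S = [[1.0, 0.0],
     [0.0, 2.0],
     [2.0, 2.0]]
X_clean = mix_sources(A, S)
X_noisy = mix_sources(A, S, snr_db=30)
```

With two rows in `A` and three rows in `S`, the example matches the two-observation, three-source setting shown in FIG. 2 and FIG. 3.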
Step 2: remove all zero column vectors from the observation signals, then map the observation signals symmetrically onto the upper half-plane;
in step 2, all-zero column vectors in the observation signals contribute nothing to separating the source signals, so they are removed. To simplify subsequent processing, the observation signals are mapped symmetrically onto the upper half-plane.
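A minimal Python sketch of this preprocessing is given below. The patent does not spell out the symmetry operation, so the sketch assumes a point reflection through the origin (a sample $x$ with a negative second component is replaced by $-x$), which keeps the sample on the same line through the origin. The function name `preprocess` and its return values are hypothetical.

```python
def preprocess(X):
    """Drop all-zero samples and fold the rest into the upper half-plane.

    X is a 2 x T nested list (two observation channels). Returns three
    lists: the kept samples as (x1, x2) tuples, the sign applied to each
    kept sample (+1.0 or -1.0), and the original time index of each one.
    Assumption: "symmetrical to the upper plane" is read as x -> -x.
    """
    columns, signs, kept = [], [], []
    for t, (x1, x2) in enumerate(zip(X[0], X[1])):
        if x1 == 0 and x2 == 0:
            continue  # all-zero column: carries no information about the sources
        flip = x2 < 0 or (x2 == 0 and x1 < 0)
        sign = -1.0 if flip else 1.0
        columns.append((sign * x1, sign * x2))
        signs.append(sign)
        kept.append(t)
    return columns, signs, kept


X = [[1.0, 0.0, -2.0, 3.0],
     [2.0, 0.0, -1.0, -3.0]]
columns, signs, kept = preprocess(X)
```

Recording the sign and the original index makes it possible to undo the reflection and to restore the removed time instants after separation.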
Step 3: separate the source signals using improved $l_1$-norm minimization.
To separate the source signals, the invention uses the $l_1$-norm minimization criterion.
The specific steps are as follows:
(3a) calculate the observation signal angle $\theta(t)$ at time $t$ and the direction angles $\alpha_i$ of the column vectors of the mixing matrix.
The source signals at each sampling instant can be separated from the observation signal $x(t)$ at that instant, so the source separation problem reduces to a separation problem at a single sampling instant. First, the observation signal direction $\theta(t)$ at time $t$ and the column vector directions $\alpha_i$ of the mixing matrix are calculated.
The calculation formulas are as follows:
$\theta(t) = \arctan(x_{t2} / x_{t1})$
$\alpha_i = \arctan(a_{i2} / a_{i1}), \quad i = 1, 2, \ldots, n$
where $x_{t1}$ and $x_{t2}$ denote the two observation signals at time $t$, and $a_{in}$ denotes the nth element of the ith column vector of the mixing matrix.
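As an illustration of step (3a), the two angle formulas can be sketched in Python as follows. This is a minimal sketch rather than the patent's implementation: the function names are hypothetical, and `math.atan2` folded into $[0, \pi)$ is used in place of a plain arctangent of the ratio so that a zero first component does not cause a division by zero.

```python
import math


def direction_angle(v1, v2):
    """Direction of the 2-D vector (v1, v2) in radians, folded into [0, pi).

    Vectors v and -v lie on the same line and get the same angle.
    """
    return math.atan2(v2, v1) % math.pi


def column_angles(A):
    """Direction angle alpha_i of every column of a 2 x M mixing matrix A."""
    return [direction_angle(A[0][i], A[1][i]) for i in range(len(A[0]))]


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
alphas = column_angles(A)            # directions of the three columns
theta = direction_angle(2.0, 2.0)    # direction of one observation sample
```

In this example `theta` coincides with the direction of the third column, which is the $\Delta\theta = 0$ situation treated in step (3c).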
(3b) Calculate the mixing direction of any two column vectors of the mixing matrix using the law of sines and the law of cosines. FIG. 4 is a schematic diagram of the mixing of any two column vectors.
Because both the length and the direction of the column vectors are taken into account when minimizing the sum of the moduli of the source signals, the law of sines and the law of cosines are used, given the known lengths and directions of the column vectors of the mixing matrix, to find the direction that results from mixing any two column vectors.
The specific process is as follows:
$\angle AOB = \angle AOx - \angle BOx$
$AB^2 = OA^2 + OB^2 - 2 \cdot OA \cdot OB \cos\angle AOB$
$OC^2 = OA^2 + AC^2 - 2 \cdot OA \cdot AC \cos\angle OAC$
$\angle COx = \angle AOx - \angle AOC$
where the vectors $OA$ and $OB$ are any two column vectors of the mixing matrix, the angle $\angle AOx$ of vector $OA$ and the angle $\angle BOx$ of vector $OB$ are the directions of the corresponding column vectors of the mixing matrix, and the lengths of $OA$ and $OB$ are the lengths of those column vectors.
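The geometry of step (3b) can be sketched in Python as below. The sketch rests on an assumed reading of FIG. 4 that the text does not state explicitly: $C$ is taken as the fourth vertex of the parallelogram spanned by $OA$ and $OB$, so that $OC = OA + OB$, $AC = OB$ and $\angle OAC = \pi - \angle AOB$. The function name `mixed_direction` is hypothetical.

```python
import math


def mixed_direction(a, b):
    """Direction and length of OC = OA + OB from lengths and angles only.

    a and b are 2-D column vectors given as (v1, v2) tuples. Returns
    (angle_COx, length_OC). Assumes a and b are not opposite vectors.
    """
    ang_a, ang_b = math.atan2(a[1], a[0]), math.atan2(b[1], b[0])
    if ang_a < ang_b:              # make OA the vector with the larger angle
        a, b, ang_a, ang_b = b, a, ang_b, ang_a
    oa, ob = math.hypot(a[0], a[1]), math.hypot(b[0], b[1])
    aob = ang_a - ang_b            # angle AOB = angle AOx - angle BOx
    oac = math.pi - aob            # interior angle of the parallelogram at A
    ac = ob                        # AC is parallel and equal to OB
    # Law of cosines in triangle OAC.
    oc = math.sqrt(oa * oa + ac * ac - 2.0 * oa * ac * math.cos(oac))
    if oc == 0.0:
        raise ValueError("opposite column vectors cancel; direction undefined")
    # Law of sines: sin(AOC) / AC = sin(OAC) / OC.
    aoc = math.asin(min(1.0, ac * math.sin(oac) / oc))
    if oa * oa + oc * oc < ac * ac:
        aoc = math.pi - aoc        # the angle at O is obtuse; asin gave its supplement
    return ang_a - aoc, oc         # angle COx = angle AOx - angle AOC


angle, length = mixed_direction((1.0, 2.0), (3.0, 1.0))
```

For the example vectors the result agrees with the direction and length of the plain vector sum $(4, 3)$, which is a convenient way to check the trigonometric route.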
(3c) Calculate the angle $\Delta\theta$ between $\theta(t)$ and $\alpha_i$:
if $\Delta\theta = 0$, the direction of the observation signal sample coincides with the direction of one column vector of the mixing matrix, and the source signal is obtained from the following formula:
$x(t) = a_i s_i(t)$
where $x(t)$ is the observation signal vector at time $t$, $a_i$ is the ith column vector of the mixing matrix, and $s_i(t)$ is the ith source signal estimated at time $t$.
If $\Delta\theta \neq 0$, the direction of the observation signal sample differs from the direction of every column vector of the mixing matrix. In this case, the mixing directions of any two column vectors obtained in (3b) are used to find the two column vectors $a_c$ and $a_d$ that minimize the sum of the moduli of the source signals. The source signals at that time instant are then obtained using $W_r = A_r^{-1}$, where $A_r = [a_c, a_d]$.
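The two cases of step (3c) can be illustrated with the Python sketch below. Two points are assumptions and not taken from the patent text: the pair $(a_c, a_d)$ is found here by an exhaustive search over all column pairs that directly minimizes $|s_c| + |s_d|$, which stands in for the patent's selection based on the mixing directions of (3b); and the estimate for the selected pair is taken to be $[s_c, s_d]^T = W_r x(t)$ with all other sources set to zero. The function name `separate_instant` and the tolerance `tol` are hypothetical.

```python
import math
from itertools import combinations


def separate_instant(x, A, tol=1e-9):
    """Estimate the M source values at one time instant from a 2-D sample x.

    A is a 2 x M mixing matrix (nested lists), x is an (x1, x2) tuple.
    """
    n_src = len(A[0])
    s = [0.0] * n_src
    theta = math.atan2(x[1], x[0]) % math.pi
    # Case delta_theta = 0: x lies along one column, so x = a_i * s_i.
    for i in range(n_src):
        a1, a2 = A[0][i], A[1][i]
        alpha = math.atan2(a2, a1) % math.pi
        diff = abs(theta - alpha)
        if min(diff, math.pi - diff) < tol:
            s[i] = (a1 * x[0] + a2 * x[1]) / (a1 * a1 + a2 * a2)
            return s
    # Case delta_theta != 0: pick the column pair with the smallest l1 cost.
    best = None
    for c, d in combinations(range(n_src), 2):
        det = A[0][c] * A[1][d] - A[0][d] * A[1][c]
        if abs(det) < tol:
            continue  # parallel columns: A_r = [a_c, a_d] is not invertible
        # [s_c, s_d]^T = A_r^{-1} x, written out for the 2 x 2 case.
        s_c = (A[1][d] * x[0] - A[0][d] * x[1]) / det
        s_d = (A[0][c] * x[1] - A[1][c] * x[0]) / det
        cost = abs(s_c) + abs(s_d)
        if best is None or cost < best[0]:
            best = (cost, c, d, s_c, s_d)
    if best is None:
        raise ValueError("no invertible pair of columns in the mixing matrix")
    _, c, d, s_c, s_d = best
    s[c], s[d] = s_c, s_d
    return s


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
aligned = separate_instant((2.0, 2.0), A)   # sample along the third column
general = separate_instant((3.0, 1.0), A)   # sample between columns 1 and 3
```

In the second call the pair formed by the first and third columns gives the smallest sum of moduli, and those are also the two columns whose directions enclose the sample.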
(3d) Traverse all time instants to obtain the source signal estimate $s(t)$ at every time instant. FIG. 5 shows the three separated source signals.
The voice signal separation method based on improved $l_1$-norm minimization of the present invention has the advantage that its separation accuracy declines only gradually as the number of source signals increases.
The voice signal separation method based on improved $l_1$-norm minimization of the present invention is applicable only to two observation signals.
In summary: the invention provides a method based on improved $l_1$-norm minimization. A linear instantaneous mixing model of the observation signals is first established, and an improved minimum $l_1$-norm algorithm is proposed to address the marked drop in separation accuracy as the number of source signals increases. The algorithm first preprocesses the observation signals and the mixing matrix, then finds the vectors closest to the observation signal according to vector length and direction, changes the form of the mixing matrix on that basis, estimates the source signals at a given time instant using the changed mixing matrix, and thereby estimates the source signals at all time instants. The method addresses the marked drop in separation accuracy as the number of source signals increases while separating the source signals effectively.
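The traversal of step (3d) can be sketched in Python as a loop over time instants that applies a single-instant estimator and undoes the preprocessing of step 2. The sketch is an assumption-laden illustration: the function name `separate_all` is hypothetical, the single-instant estimator is passed in as a callable `separate_one` rather than fixed, and the reflection is again assumed to be $x \to -x$, so the estimate is multiplied by the same sign to restore it.

```python
def separate_all(X, A, separate_one):
    """Estimate the M x T source matrix from 2 x T observations X.

    separate_one(x, A) must return the list of M source values for one
    upper-half-plane sample x = (x1, x2). All-zero samples are skipped
    and leave zeros in the output.
    """
    n_src, n_t = len(A[0]), len(X[0])
    S = [[0.0] * n_t for _ in range(n_src)]
    for t in range(n_t):
        x1, x2 = X[0][t], X[1][t]
        if x1 == 0 and x2 == 0:
            continue  # zero column removed in step 2
        flip = x2 < 0 or (x2 == 0 and x1 < 0)
        sign = -1.0 if flip else 1.0
        s_t = separate_one((sign * x1, sign * x2), A)
        for m in range(n_src):
            S[m][t] = sign * s_t[m]  # undo the reflection
    return S


# Toy check with an identity mixing matrix, where each source equals
# the corresponding observation channel.
A = [[1.0, 0.0],
     [0.0, 1.0]]
X = [[1.0, 0.0, -2.0],
     [2.0, 0.0, -3.0]]
S_hat = separate_all(X, A, lambda x, A: [x[0], x[1]])
```

Passing the estimator as an argument keeps the loop independent of how the column pair is chosen at each instant.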
It should be noted that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the scope of the present invention.
Claims (1)
1. A method for separating a speech signal, comprising the steps of:
step 1: establish a linear instantaneous mixing model of the observation signals, specifically:
$x(t) = A s(t)$
where $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ is the N-dimensional observation signal vector, $A = [a_1, a_2, \ldots, a_M]$ is the $N \times M$ mixing matrix, $s(t) = [s_1(t), s_2(t), \ldots, s_M(t)]^T$ is the M-dimensional source signal vector, $t$ is the time sample index, and $a_i$ denotes the ith column vector of the mixing matrix;
step 2: remove all zero column vectors from the observation signals, then map the observation signals symmetrically onto the upper half-plane;
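step 3: separate the source signals using improved $l_1$-norm minimization:
the $l_1$-norm minimization criterion is:
$\min_{s(t)} \sum_{i=1}^{M} |s_i(t)| \quad \text{subject to} \quad x(t) = A s(t)$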
the specific steps are as follows:
(3a) calculate the observation signal angle $\theta(t)$ at time $t$ and the direction angles $\alpha_i$ of the column vectors of the mixing matrix:
The calculation formulas are as follows:
$\theta(t) = \arctan(x_{t2} / x_{t1})$
$\alpha_i = \arctan(a_{i2} / a_{i1}), \quad i = 1, 2, \ldots, n$
where $x_{t1}$ and $x_{t2}$ denote the two observation signals at time $t$, and $a_{in}$ denotes the nth element of the ith column vector of the mixing matrix;
(3b) calculate the mixing direction of any two column vectors of the mixing matrix using the law of sines and the law of cosines:
the specific process is as follows:
$\angle AOB = \angle AOx - \angle BOx$
$AB^2 = OA^2 + OB^2 - 2 \cdot OA \cdot OB \cos\angle AOB$
$OC^2 = OA^2 + AC^2 - 2 \cdot OA \cdot AC \cos\angle OAC$
$\angle COx = \angle AOx - \angle AOC$
where the vectors $OA$ and $OB$ are any two column vectors of the estimated mixing matrix, the angle $\angle AOx$ of vector $OA$ and the angle $\angle BOx$ of vector $OB$ are the directions of the corresponding column vectors of the mixing matrix, and the lengths of $OA$ and $OB$ are the lengths of those column vectors;
(3c) calculate the angle $\Delta\theta$ between $\theta(t)$ and $\alpha_i$:
if $\Delta\theta = 0$, use:
$x(t) = a_i s_i(t)$
where $x(t)$ is the observation signal vector at time $t$, $a_i$ is the ith column vector of the mixing matrix, and $s_i(t)$ is the ith source signal estimated at time $t$;
if $\Delta\theta \neq 0$, the source signals are estimated with the inverse of the changed mixing matrix, $W_r = A_r^{-1}$, where $A_r = [a_c, a_d]$, and $a_c$ and $a_d$ are the two column vectors closest to the observation signal vector at time $t$;
(3d) traverse all time instants to obtain the source signal estimate $s(t)$ at every time instant.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010195601.8A CN111429936B (en) | 2020-03-19 | 2020-03-19 | Voice signal separation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010195601.8A CN111429936B (en) | 2020-03-19 | 2020-03-19 | Voice signal separation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111429936A true CN111429936A (en) | 2020-07-17 |
CN111429936B CN111429936B (en) | 2022-10-14 |
Family
ID=71553535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010195601.8A Active CN111429936B (en) | 2020-03-19 | 2020-03-19 | Voice signal separation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111429936B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112562706A (en) * | 2020-11-30 | 2021-03-26 | 哈尔滨工程大学 | Target voice extraction method based on time potential domain specific speaker information |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070202919A1 (en) * | 2003-04-22 | 2007-08-30 | Shu David B | Separating Mixed Signals In A Cellular Environment |
CN104637494A (en) * | 2015-02-02 | 2015-05-20 | 哈尔滨工程大学 | Double-microphone mobile equipment voice signal enhancing method based on blind source separation |
JP2015210512A (en) * | 2014-04-24 | 2015-11-24 | 晋哉 齋藤 | Method and device for separating blind signal |
CN105354594A (en) * | 2015-10-30 | 2016-02-24 | 哈尔滨工程大学 | Mixing matrix estimation method aiming at underdetermined blind source separation |
CN106448694A (en) * | 2016-09-08 | 2017-02-22 | 哈尔滨工程大学 | Time-frequency single source point extraction method in underdetermined blind source separation based on compound angle detection |
CN108009584A (en) * | 2017-12-01 | 2018-05-08 | 西安交通大学 | Underdetermined blind source separation method based on single source point detection |
Non-Patent Citations (4)
Title |
---|
W. WANG: ""Novel algorithm for underdetermined blind separation based on Sparse Component Analysis"", 《THE 2010 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION》 * |
X. GUO: ""A mixing matrix estimation algorithm for frequency hopping signals under the UBSS model"", 《2017 PIERS - FALL》 * |
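吴静: ""Research on Underdetermined Blind Source Separation Algorithms Based on Sparse Component Analysis"", 《China Master's Theses Full-text Database (Information Science and Technology)》 * |
张鑫: ""Research on Underdetermined Blind Source Separation Technology"", 《China Master's Theses Full-text Database (Information Science and Technology)》 * |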
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112562706A (en) * | 2020-11-30 | 2021-03-26 | 哈尔滨工程大学 | Target voice extraction method based on time potential domain specific speaker information |
CN112562706B (en) * | 2020-11-30 | 2023-05-05 | 哈尔滨工程大学 | Target voice extraction method based on time potential domain specific speaker information |
Also Published As
Publication number | Publication date |
---|---|
CN111429936B (en) | 2022-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10602267B2 (en) | Sound signal processing apparatus and method for enhancing a sound signal | |
CN108364659B (en) | Frequency domain convolution blind signal separation method based on multi-objective optimization | |
WO2018077109A1 (en) | Sound processing method and device | |
CN110010148B (en) | Low-complexity frequency domain blind separation method and system | |
JP4521549B2 (en) | A method for separating a plurality of sound sources in the vertical and horizontal directions, and a system therefor | |
CN103165137B (en) | Speech enhancement method of microphone array under non-stationary noise environment | |
CN102074236A (en) | Speaker clustering method for distributed microphone | |
CN107884751B (en) | Method for estimating number of information sources by using single-channel received signal | |
CN111429936B (en) | Voice signal separation method | |
WO2018133056A1 (en) | Method and apparatus for locating sound source | |
CN110931036A (en) | Microphone array beam forming method | |
CN109884591B (en) | Microphone array-based multi-rotor unmanned aerial vehicle acoustic signal enhancement method | |
CN110610718B (en) | Method and device for extracting expected sound source voice signal | |
CN110176250B (en) | Robust acoustic scene recognition method based on local learning | |
CN111025273B (en) | Distortion drag array line spectrum feature enhancement method and system | |
CN109583350A (en) | A kind of high-precision denoising method of local ultrasound array signal | |
JP3975153B2 (en) | Blind signal separation method and apparatus, blind signal separation program and recording medium recording the program | |
JP2010112995A (en) | Call voice processing device, call voice processing method and program | |
US20130148814A1 (en) | Audio acquisition systems and methods | |
CN110890099B (en) | Sound signal processing method, device and storage medium | |
CN111024208B (en) | Vertical array sound pressure gradient beam forming and signal detecting method | |
CN109001678B (en) | Thunder detection and positioning method based on three-dimensional microphone array | |
CN111060867A (en) | Directional microphone microarray direction of arrival estimation method | |
CN112257484B (en) | Multi-sound source direction finding method and system based on deep learning | |
CN108269581B (en) | Double-microphone time delay difference estimation method based on frequency domain coherent function |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||