CN111429936A - Voice signal separation method - Google Patents


Publication number
CN111429936A
Authority
CN
China
Prior art keywords
vector
signal
mixing matrix
vectors
observation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010195601.8A
Other languages
Chinese (zh)
Other versions
CN111429936B (en)
Inventor
李一兵
吴静
孙骞
吕威
田园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN202010195601.8A
Publication of CN111429936A
Application granted
Publication of CN111429936B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides a voice signal separation method. A linear instantaneous mixing model of the observation signals is first established, and an improved minimum l1-norm algorithm is proposed to address the marked drop in separation accuracy that occurs as the number of source signals increases. The algorithm first preprocesses the observation signals and the mixing matrix, then finds the vectors closest to the observation signal according to vector length and direction, changes the form of the mixing matrix accordingly, estimates the source signals at a given time instant using the modified mixing matrix, and finally estimates the source signals at all time instants. The proposed method mitigates the loss of separation accuracy as the number of source signals increases while still separating the source signals effectively.

Description

Voice signal separation method
Technical Field
The invention relates to a voice signal separation method under an underdetermined model, in particular to a voice signal separation method based on an improved minimum l1-norm, and belongs to the field of signal processing.
Background
In recent years, speech signal separation has become a research hotspot in the field of signal processing, with many applications in teleconferencing, hearing aids and machine speech recognition. Because received sound is usually noisy, identifying the sound of interest and recovering it clearly in such an environment is a significant problem, known as the blind source separation problem.
Blind source separation is generally classified, by the relative numbers of source signals and observation signals, into overdetermined, determined and underdetermined blind source separation. The underdetermined case, in which the number of sensors or microphones is smaller than the number of source signals, better matches practical conditions, is more widely applicable, and is more challenging. Methods that solve underdetermined blind source separation are in general also applicable to the overdetermined and determined cases, so research on underdetermined blind source separation is necessary. The usual approach to underdetermined blind source separation is sparse component analysis, commonly referred to as the "two-step" method: the first step estimates the mixing matrix from the observation signals, and the second step separates the source signals using the estimated mixing matrix. Current research shows that existing source signal separation algorithms generally suffer a marked drop in separation accuracy as the number of source signals increases.
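To illustrate why the underdetermined case needs an extra assumption such as sparsity, the following minimal sketch builds a hypothetical 2 × 3 mixing matrix (the values are illustrative only and do not come from the patent) and shows that two different source vectors can produce the same observation.

```python
import numpy as np

# Hypothetical 2x3 mixing matrix: N = 2 sensors, M = 3 sources (underdetermined).
A = np.array([[1.0, 0.5, -0.5],
              [0.2, 1.0,  1.0]])

# A wide matrix has a non-trivial null space; take one null vector from the SVD.
_, _, vt = np.linalg.svd(A)
null_vec = vt[-1]                      # satisfies A @ null_vec ~ 0

s_true = np.array([1.0, 0.0, 2.0])     # one possible source vector
s_other = s_true + 3.0 * null_vec      # a different source vector

x_true = A @ s_true
x_other = A @ s_other

# Both source vectors produce the same observation, so x alone cannot
# determine s without an additional assumption.
same_observation = np.allclose(x_true, x_other)
```

Because the observation does not pin down the sources uniquely, the second step of sparse component analysis has to pick one solution among many, which is the role of the l1-norm criterion used later in the document.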
Disclosure of Invention
In view of the above prior art, the technical problem to be solved by the present invention is to provide a speech signal separation method based on an improved minimum l1-norm that alleviates the marked drop in separation accuracy when the number of source signals increases.
In order to solve the above technical problem, the present invention provides a method for separating a voice signal, comprising the following steps:
Step 1: Establish a linear instantaneous mixing model of the observation signals:
$x(t) = A s(t) = \sum_{i=1}^{M} a_i s_i(t)$
where $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ is the N-dimensional observation signal vector, $A = [a_1, a_2, \ldots, a_M]$ is the N × M mixing matrix, $s(t) = [s_1(t), s_2(t), \ldots, s_M(t)]^T$ is the M-dimensional source signal vector, t is the time sample index, and $a_i$ denotes the i-th column vector of the mixing matrix;
Step 2: Remove all all-zero column vectors from the observation signals, then map the observation signals symmetrically into the upper half-plane;
Step 3: Separate the source signals using the improved minimum l1-norm method.
The minimum l1-norm criterion is:
$\min_{s(t)} \sum_{i=1}^{M} |s_i(t)| \quad \text{s.t.} \quad x(t) = A s(t)$
The specific steps are as follows:
(3a) Calculate the observation signal angle $\theta(t)$ at time t and the direction angle $\alpha_i$ of each column vector of the mixing matrix.
The calculation formulas are:
$\theta(t) = \arctan(x_{t2}/x_{t1})$
$\alpha_i = \arctan(a_{i2}/a_{i1}), \quad i = 1, 2, \ldots, n$
where $x_{t1}$ and $x_{t2}$ denote the two observation signals at time t, and $a_{in}$ denotes the n-th element of the i-th column vector of the mixing matrix.
(3b) Calculate the direction obtained by mixing any two column vectors of the mixing matrix, using the sine theorem and the cosine theorem.
The specific process is as follows:
$\angle AOB = \angle AOx - \angle BOx$
$AB^2 = OA^2 + OB^2 - 2 \cdot OA \cdot OB \cdot \cos\angle AOB$
Figure BDA0002417478130000024
Figure BDA0002417478130000025
$OC^2 = OA^2 + AC^2 - 2 \cdot OA \cdot AC \cdot \cos\angle OAC$
Figure BDA0002417478130000026
$\angle COx = \angle AOx - \angle AOC$
where the vectors OA and OB are any two column vectors of the estimated mixing matrix, the angles $\angle AOx$ and $\angle BOx$ are the directions of the corresponding column vectors of the mixing matrix, and the lengths of OA and OB are the lengths of those column vectors.
(3c) Calculate the angle difference Δθ between $\theta(t)$ and $\alpha_i$:
If Δθ = 0, use:
$x(t) = a_i s_i(t)$
where $x(t)$ is the observation signal vector at time t, $a_i$ is the i-th column vector of the mixing matrix, and $s_i(t)$ is the i-th source signal estimated at time t.
If Δθ ≠ 0, use:
$s_r(t) = W_r x(t), \qquad s_j(t) = 0 \ \text{for} \ j \neq c, d$
where $s_r(t) = [s_c(t), s_d(t)]^T$, $W_r = A_r^{-1}$, $A_r = [a_c, a_d]$, and $a_c$ and $a_d$ are the two column vectors closest to the observation signal vector at time t.
(3d) The method comprises the following steps Traversing all the time instants, obtaining the representation s (t) of the source signal at all the time instants.
The beneficial effects of the invention are as follows. The invention addresses the second step of the sparse component analysis method: the source signals are separated with a method based on an improved minimum l1-norm.
(1) The proposed source signal separation algorithm applies to two observation signals;
(2) As the number of source signals increases, the separation accuracy of the algorithm declines more gradually.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 shows the three initial source signals;
FIG. 3 shows the two mixed observation signals;
FIG. 4 is a diagram illustrating the mixing of any two column vectors;
FIG. 5 shows the three separated source signals.
Detailed Description
The method first finds the vectors closest to the observation signal according to vector length and angle, then changes the form of the mixing matrix, estimates the source signals at a given time instant using the modified mixing matrix, and finally estimates the source signals at all time instants.
The invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, the speech signal separation method based on an improved minimum l1-norm according to the invention comprises the following specific steps:
Step 1: Establish a linear instantaneous mixing model of the observation signals. FIG. 2 shows the three initial source signals, and FIG. 3 shows the two mixed observation signals.
In step 1, the mathematical model established is a linear instantaneous mixing model. Speech signals are chosen as the source signals, the noise considered is additive, and the signal-to-noise ratio is 30 dB.
The linear instantaneous mixing model of the observation signals is expressed as:
$x(t) = A s(t) = \sum_{i=1}^{M} a_i s_i(t)$
where $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ is the N-dimensional observation signal vector, $A = [a_1, a_2, \ldots, a_M]$ is the N × M mixing matrix, $s(t) = [s_1(t), s_2(t), \ldots, s_M(t)]^T$ is the M-dimensional source signal vector, t is the time sample index, and $a_i$ denotes the i-th column vector of the mixing matrix.
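As a rough illustration of step 1, the sketch below mixes three synthetic sources into two observations and adds noise at a 30 dB signal-to-noise ratio. The mixing matrix values, the sparse Laplacian stand-in sources (used here instead of real speech), and the choice of white Gaussian noise are all assumptions made for the example; the text only states that the noise is additive.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, T = 2, 3, 1000            # two observations, three sources, T samples

# Hypothetical mixing matrix with unit-norm columns a_1, a_2, a_3.
angles = np.deg2rad([20.0, 80.0, 140.0])
A = np.vstack([np.cos(angles), np.sin(angles)])      # shape (N, M)

# Stand-in sparse sources: Laplacian samples, mostly switched off.
s = rng.laplace(size=(M, T)) * (rng.random((M, T)) < 0.3)

# Linear instantaneous mixing: x(t) = A s(t) for every sample t.
x_clean = A @ s

# Additive white Gaussian noise (assumed noise type) scaled to 30 dB SNR.
snr_db = 30.0
signal_power = np.mean(x_clean ** 2)
noise_power = signal_power / 10 ** (snr_db / 10)
x = x_clean + rng.normal(scale=np.sqrt(noise_power), size=x_clean.shape)

# Empirical SNR of the generated mixture, for a sanity check.
measured_snr_db = 10 * np.log10(signal_power / np.mean((x - x_clean) ** 2))
```

Each column of `x` is one observation vector x(t); the later steps operate on these columns one at a time.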
Step 2: Remove all all-zero column vectors from the observation signals, then map the observation signals symmetrically into the upper half-plane.
In step 2, all-zero column vectors in the observation signals contribute nothing to separating the source signals, so they are removed. To simplify subsequent processing, the observation signals are then mapped symmetrically into the upper half-plane.
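Step 2 can be sketched as follows. The function name `preprocess` and its return values are hypothetical, and the mapping into the upper half-plane is interpreted here as a point reflection through the origin (multiplying a sample by -1 when its second component is negative), which is one common reading and an assumption on our part.

```python
import numpy as np

def preprocess(x, eps=1e-12):
    """Drop all-zero sample columns, then map samples into the upper half-plane.

    x has shape (2, T): one column per time sample.
    Returns the processed samples, the kept column indices, and the signs applied.
    """
    x = np.asarray(x, dtype=float)
    keep = np.flatnonzero(np.abs(x).sum(axis=0) > eps)   # columns with some energy
    xk = x[:, keep]
    # A point (x1, x2) with x2 < 0 is mapped to (-x1, -x2): the line through
    # the origin is unchanged, only the sign of the sample flips.
    signs = np.where(xk[1] < 0, -1.0, 1.0)
    return xk * signs, keep, signs

x_demo = np.array([[1.0, 0.0, -2.0],
                   [-1.0, 0.0, 3.0]])
x_up, kept, signs = preprocess(x_demo)
```

Keeping `kept` and `signs` makes it possible to undo the sign flips and restore the original sample positions after separation.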
Step 3: Separate the source signals using the improved minimum l1-norm method.
To separate the source signals, the invention uses the minimum l1-norm criterion:
$\min_{s(t)} \sum_{i=1}^{M} |s_i(t)| \quad \text{s.t.} \quad x(t) = A s(t)$
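To show what an l1-norm criterion favours, the following small sketch compares two source vectors that both reproduce the same observation: a sparse one and the minimum l2-norm one obtained from the pseudoinverse. The mixing matrix is a hypothetical example, and the comparison is a generic illustration rather than the improved method of the invention.

```python
import numpy as np

r = np.sqrt(0.5)
A = np.array([[1.0, 0.0, r],
              [0.0, 1.0, r]])          # hypothetical 2x3 mixing matrix

x = 2.0 * A[:, 2]                      # observation produced by source 3 alone

s_sparse = np.array([0.0, 0.0, 2.0])   # candidate that uses a single source
s_l2 = np.linalg.pinv(A) @ x           # minimum l2-norm candidate

for name, s in (("sparse", s_sparse), ("min-l2", s_l2)):
    assert np.allclose(A @ s, x)       # both candidates satisfy x = A s
    print(name, "l1 norm =", np.abs(s).sum())
```

The sparse candidate has the smaller sum of absolute values, so an l1 criterion selects it, whereas the l2 solution spreads energy over all three sources.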
The specific steps are as follows:
(3a) Calculate the observation signal angle $\theta(t)$ at time t and the direction angle $\alpha_i$ of each column vector of the mixing matrix.
The source signals at each sampling instant can be separated from the observation signal x(t) at that instant, so the source signal separation problem reduces to a separation problem at a single sampling instant. First, the observation signal direction $\theta(t)$ at time t and the column vector directions $\alpha_i$ of the mixing matrix are calculated.
The calculation formulas are:
$\theta(t) = \arctan(x_{t2}/x_{t1})$
$\alpha_i = \arctan(a_{i2}/a_{i1}), \quad i = 1, 2, \ldots, n$
where $x_{t1}$ and $x_{t2}$ denote the two observation signals at time t, and $a_{in}$ denotes the n-th element of the i-th column vector of the mixing matrix.
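The angle formulas of step (3a) can be sketched with NumPy as below. The function names are hypothetical, and `np.arctan2` is substituted for the plain arctangent of the ratio so that samples with a zero or negative first component remain well defined; this substitution is an assumption of the sketch, not something stated in the text.

```python
import numpy as np

def observation_angles(x):
    """Direction theta(t) of each observation sample; x has shape (2, T)."""
    # arctan2(x2, x1) equals arctan(x2 / x1) when x1 > 0 and stays defined
    # when x1 <= 0, which occurs for samples in the upper-left quadrant.
    return np.arctan2(x[1], x[0])

def column_angles(A):
    """Direction alpha_i of each column a_i of the 2 x M mixing matrix."""
    return np.arctan2(A[1], A[0])

A_demo = np.array([[1.0, 0.0, -1.0],
                   [1.0, 1.0,  1.0]])
alpha = column_angles(A_demo)                            # radians
theta = observation_angles(np.array([[2.0], [2.0]]))     # radians
```

For samples already mapped into the upper half-plane, both functions return angles between 0 and π.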
(3b) Calculate the direction obtained by mixing any two column vectors of the mixing matrix, using the sine theorem and the cosine theorem. FIG. 4 is a diagram illustrating the mixing of any two column vectors.
Because both the length and the direction of the column vectors are taken into account when minimizing the sum of the moduli of the source signals, the direction obtained by mixing any two column vectors must be computed from the known lengths and directions of the column vectors of the mixing matrix, using the sine theorem and the cosine theorem.
The specific process is as follows:
$\angle AOB = \angle AOx - \angle BOx$
$AB^2 = OA^2 + OB^2 - 2 \cdot OA \cdot OB \cdot \cos\angle AOB$
Figure BDA0002417478130000043
Figure BDA0002417478130000044
$OC^2 = OA^2 + AC^2 - 2 \cdot OA \cdot AC \cdot \cos\angle OAC$
Figure BDA0002417478130000045
$\angle COx = \angle AOx - \angle AOC$
where the vectors OA and OB are any two column vectors of the mixing matrix, the angles $\angle AOx$ and $\angle BOx$ are the directions of the corresponding column vectors of the mixing matrix, and the lengths of OA and OB are the lengths of those column vectors.
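One way to compute a mixed direction from lengths and angles alone is sketched below. It assumes that C is the fourth vertex of the parallelogram spanned by OA and OB (so OC = OA + OB, AC = OB and ∠OAC = π − ∠AOB); this geometric reading and the function name `mixed_direction` are assumptions. The sketch uses the cosine theorem twice rather than the sine theorem, because the arccosine covers the full range of triangle angles.

```python
import numpy as np

def mixed_direction(len_a, ang_a, len_b, ang_b):
    """Length and direction of OC = OA + OB from lengths and angles only.

    Angles are in radians and are assumed to differ by less than pi.
    """
    aob = ang_a - ang_b                              # angle AOB = AOx - BOx
    # Parallelogram assumption: AC is parallel to OB, so AC = OB and
    # angle OAC = pi - |angle AOB|.
    ac = len_b
    oac = np.pi - abs(aob)
    # Cosine theorem in triangle OAC gives the length of OC.
    oc = np.sqrt(len_a**2 + ac**2 - 2 * len_a * ac * np.cos(oac))
    # Cosine theorem again gives angle AOC (arccos covers 0..pi).
    cos_aoc = (len_a**2 + oc**2 - ac**2) / (2 * len_a * oc)
    aoc = np.arccos(np.clip(cos_aoc, -1.0, 1.0))
    # Rotate from OA towards OB: angle COx = AOx - AOC when AOx > BOx.
    cox = ang_a - np.sign(aob) * aoc
    return oc, cox

oc, cox = mixed_direction(1.0, np.deg2rad(135.0), 1.0, np.deg2rad(45.0))
```

For the two unit vectors at 135° and 45° in the usage line, the result should be a length of about 1.414 in the 90° direction, which matches adding the two vectors component by component.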
(3c) Calculate the angle difference Δθ between $\theta(t)$ and $\alpha_i$:
If Δθ = 0, the direction of the observation signal sample coincides with the direction of one column vector of the mixing matrix, and the source signal is obtained from:
$x(t) = a_i s_i(t)$
where $x(t)$ is the observation signal vector at time t, $a_i$ is the i-th column vector of the mixing matrix, and $s_i(t)$ is the i-th source signal estimated at time t.
If Δθ ≠ 0, the direction of the observation signal sample differs from the direction of every column vector of the mixing matrix. In this case, the mixed directions of any two column vectors obtained in (3b) are used to find the two column vectors $a_c$ and $a_d$ that minimize the sum of the moduli of the source signals. The source signals at that instant are then obtained from:
$s_r(t) = W_r x(t), \qquad s_j(t) = 0 \ \text{for} \ j \neq c, d$
where $s_r(t) = [s_c(t), s_d(t)]^T$, $W_r = A_r^{-1}$, $A_r = [a_c, a_d]$, and $a_c$ and $a_d$ are the two column vectors closest to x at time t.
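The two cases of step (3c) can be sketched for a single sample as follows. This is a simplified stand-in: it picks the two columns closest to the observation by angle only, whereas the text also takes vector length into account through the mixed directions of (3b). The function name `separate_instant` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def separate_instant(x, A, tol=1e-6):
    """Estimate s(t) for one 2-D observation x given a 2 x M mixing matrix A."""
    M = A.shape[1]
    s = np.zeros(M)
    if not np.any(x):
        return s                                    # silent sample
    theta = np.arctan2(x[1], x[0])
    alpha = np.arctan2(A[1], A[0])
    # Angular distance between lines (directions that differ by pi coincide).
    delta = np.abs((theta - alpha + np.pi / 2) % np.pi - np.pi / 2)
    order = np.argsort(delta)
    i = order[0]
    if delta[i] < tol:
        # Case delta = 0: x(t) = a_i s_i(t), so project x onto a_i.
        s[i] = (A[:, i] @ x) / (A[:, i] @ A[:, i])
        return s
    # Case delta != 0: take the two closest columns a_c, a_d and invert A_r.
    c, d = order[:2]
    A_r = A[:, [c, d]]
    s[[c, d]] = np.linalg.solve(A_r, x)             # same as W_r x, W_r = A_r^-1
    return s

angles = np.deg2rad([20.0, 80.0, 140.0])
A = np.vstack([np.cos(angles), np.sin(angles)])     # hypothetical mixing matrix
s_two = separate_instant(1.5 * A[:, 0] + 0.5 * A[:, 1], A)
s_one = separate_instant(-2.0 * A[:, 2], A)
```

Solving the 2 × 2 system with `np.linalg.solve` avoids forming the inverse explicitly while giving the same result as multiplying by $W_r$.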
(3d) Traverse all time instants to obtain the source signal estimates s(t) at every instant. FIG. 5 shows the three separated source signals.
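The traversal over all time instants can be sketched end to end as below. For two observations, a minimum l1-norm solution can always be found with at most two non-zero sources, so this sketch simply enumerates all column pairs at each instant; it is a brute-force baseline, not the improved selection rule of the invention. The mixing matrix, the synthetic sources with one active source per sample, and all function names are assumptions.

```python
import numpy as np
from itertools import combinations

def min_l1_instant(x, A):
    """Minimum-l1 solution of x = A s for a 2-D sample via pair enumeration."""
    best = np.zeros(A.shape[1])
    best_cost = np.inf
    for c, d in combinations(range(A.shape[1]), 2):
        A_r = A[:, [c, d]]
        if abs(np.linalg.det(A_r)) < 1e-12:
            continue                       # skip (near-)parallel column pairs
        s_r = np.linalg.solve(A_r, x)
        cost = np.abs(s_r).sum()
        if cost < best_cost:
            best_cost = cost
            best = np.zeros(A.shape[1])
            best[[c, d]] = s_r
    return best

def separate_all(x, A):
    """Apply the single-instant estimate to every column of x (shape 2 x T)."""
    return np.column_stack([min_l1_instant(x[:, t], A) for t in range(x.shape[1])])

rng = np.random.default_rng(1)
angles = np.deg2rad([20.0, 80.0, 140.0])
A = np.vstack([np.cos(angles), np.sin(angles)])      # hypothetical mixing matrix
T = 200
s = np.zeros((3, T))
active = rng.integers(0, 3, size=T)                  # one active source per sample
s[active, np.arange(T)] = rng.laplace(size=T)
x = A @ s                                            # noise-free mixture
s_hat = separate_all(x, A)
max_error = np.max(np.abs(s_hat - s))
```

In this idealized noise-free setting with a single active source per sample, the estimate should match the true sources up to rounding error; with overlapping sources or noise the recovery is only approximate.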
With the speech signal separation method based on an improved minimum l1-norm according to the invention, the separation accuracy declines only gradually as the number of source signals increases.
The speech signal separation method based on an improved minimum l1-norm according to the invention applies only to two observation signals.
In summary, the invention provides a speech signal separation method based on an improved minimum l1-norm. A linear instantaneous mixing model of the observation signals is first established, and an improved minimum l1-norm algorithm is proposed to address the marked drop in separation accuracy that occurs as the number of source signals increases. The algorithm first preprocesses the observation signals and the mixing matrix, then finds the vectors closest to the observation signal according to vector length and direction, changes the form of the mixing matrix accordingly, estimates the source signals at a given time instant using the modified mixing matrix, and finally estimates the source signals at all time instants. The proposed method mitigates the loss of separation accuracy as the number of source signals increases while still separating the source signals effectively.
It should be noted that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the scope of the present invention.

Claims (1)

1. A method for separating a speech signal, comprising the steps of:
step 1: establishing a linear instantaneous mixing model of the observation signals, specifically:
$x(t) = A s(t) = \sum_{i=1}^{M} a_i s_i(t)$
wherein $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ is the N-dimensional observation signal vector, $A = [a_1, a_2, \ldots, a_M]$ is the N × M mixing matrix, $s(t) = [s_1(t), s_2(t), \ldots, s_M(t)]^T$ is the M-dimensional source signal vector, t is the time sample index, and $a_i$ denotes the i-th column vector of the mixing matrix;
step 2: removing all all-zero column vectors from the observation signals, and then mapping the observation signals symmetrically into the upper half-plane;
step 3: separating the source signals using the improved minimum l1-norm method;
the minimum l1-norm criterion being:
$\min_{s(t)} \sum_{i=1}^{M} |s_i(t)| \quad \text{s.t.} \quad x(t) = A s(t)$
the specific steps being as follows:
(3a) calculating the observation signal angle $\theta(t)$ at time t and the direction angle $\alpha_i$ of each column vector of the mixing matrix;
the calculation formulas being:
$\theta(t) = \arctan(x_{t2}/x_{t1})$
$\alpha_i = \arctan(a_{i2}/a_{i1}), \quad i = 1, 2, \ldots, n$
wherein $x_{t1}$ and $x_{t2}$ denote the two observation signals at time t, and $a_{in}$ denotes the n-th element of the i-th column vector of the mixing matrix;
(3b) calculating the direction obtained by mixing any two column vectors of the mixing matrix, using the sine theorem and the cosine theorem;
the specific process being as follows:
$\angle AOB = \angle AOx - \angle BOx$
$AB^2 = OA^2 + OB^2 - 2 \cdot OA \cdot OB \cdot \cos\angle AOB$
Figure FDA0002417478120000015
Figure FDA0002417478120000016
$OC^2 = OA^2 + AC^2 - 2 \cdot OA \cdot AC \cdot \cos\angle OAC$
Figure FDA0002417478120000017
$\angle COx = \angle AOx - \angle AOC$
wherein the vectors OA and OB are any two column vectors of the estimated mixing matrix, the angles $\angle AOx$ and $\angle BOx$ are the directions of the corresponding column vectors of the mixing matrix, and the lengths of OA and OB are the lengths of those column vectors;
(3c) calculating the angle difference Δθ between $\theta(t)$ and $\alpha_i$:
if Δθ = 0, using:
$x(t) = a_i s_i(t)$
wherein $x(t)$ is the observation signal vector at time t, $a_i$ is the i-th column vector of the mixing matrix, and $s_i(t)$ is the i-th source signal estimated at time t;
if Δθ ≠ 0, using:
$s_r(t) = W_r x(t), \qquad s_j(t) = 0 \ \text{for} \ j \neq c, d$
wherein $s_r(t) = [s_c(t), s_d(t)]^T$, $W_r = A_r^{-1}$, $A_r = [a_c, a_d]$, and $a_c$ and $a_d$ are the two column vectors closest to the observation signal vector at time t;
(3d) the method comprises the following steps Traversing all the time instants, obtaining the representation s (t) of the source signal at all the time instants.
CN202010195601.8A 2020-03-19 2020-03-19 Voice signal separation method Active CN111429936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010195601.8A CN111429936B (en) 2020-03-19 2020-03-19 Voice signal separation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010195601.8A CN111429936B (en) 2020-03-19 2020-03-19 Voice signal separation method

Publications (2)

Publication Number Publication Date
CN111429936A (en) 2020-07-17
CN111429936B (en) 2022-10-14

Family

ID=71553535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010195601.8A Active CN111429936B (en) 2020-03-19 2020-03-19 Voice signal separation method

Country Status (1)

Country Link
CN (1) CN111429936B (en)

Also Published As

Publication number Publication date
CN111429936B (en) 2022-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant