CN110321810A - Single channel signal two-way separation method, device, storage medium and processor - Google Patents
- Publication number: CN110321810A
- Application number: CN201910515889.XA
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06F2218/08: Feature extraction
- G06F2218/12: Classification; Matching
- G10L21/0264: Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
- G10L21/0308: Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
- G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a single-channel signal dual-path (two-way) separation method, a device, a storage medium and a processor. The method comprises the steps of: establishing a multi-path neural network learning model comprising a target mapping path, a time-frequency masking path and a fully connected layer; separating the single-channel signal data on the target mapping path using the target mapping method and, in parallel, separating the same data on the time-frequency masking path using the time-frequency masking method; and merging the data output by the two paths through the fully connected layer, which arranges them to the specification of the target data and then outputs the estimated target signal data features. The invention combines the advantages of the time-frequency masking method and the target mapping method and compensates for their defects to a certain extent; even without taking the phase of the signal data into account, the model generalizes well.
Description
Technical field
The invention belongs to the research field of blind source separation (BSS), also known as blind signal separation, which was first applied mainly in signal processing. In particular, it relates to a single-channel signal dual-path separation method, device, storage medium and processor.
Background art
At present, the separation of signal data is mostly treated as a supervised learning problem and implemented with a deep learning network model. The general framework for deep-learning-based blind source separation is divided into two stages, "deep learning model training" and "single-channel data separation":
(1) Training stage: a deep learning model extracts features from the training data and learns the nonlinear relationship between the unseparated source signal data and the manually separated label signal data;
(2) Separation stage: the trained model is used to separate mixed signal data, and the separated signal data are finally reassembled into complete signal data.
The key to a deep learning method is the design of the computation target, which is reflected directly in the choice of cost function and strongly affects every aspect of the deep learning model's performance. For the single-channel signal data separation task, the mainstream computation targets are currently target mapping and time-frequency masking:
(1) Target mapping: the model directly learns the mapping between source data and label data during training, and outputs the estimated target data during testing and validation. This is the most direct and most widely used way of setting the computation target for a supervised learning problem. Its cost function is defined on the error between the model's estimate and the target, where $y_i$ is the desired target signal data, $x_i$ is the single-channel noisy mixed signal data, and $\hat{y}_i$ is the estimate of the target signal data produced by the deep learning model. When separating single-channel signal data, this method makes the deep learning model directly learn the mapping between the target signal data and the single-channel noisy signal data. Its main features are: 1. no prior knowledge is needed; 2. no complicated data processing or feature extraction is needed; 3. it has no defect in terms of physical theory.
However, single-channel noisy mixed signal data are random and unpredictable because of the noise, and their relationship to the target signal data is neither direct nor explicit. The main defects of this class of methods are: 1. the model is hard to estimate; 2. the model trains slowly; 3. the model generalizes poorly.
(2) Time-frequency masking: it is assumed that the target signal data and the single-channel noisy mixed signal data stand in a certain proportional relationship at each time-frequency point, i.e. a time-frequency mask. During training, after dedicated data processing and feature extraction, the model learns the time-frequency masking relationship between source data and label data; during testing and validation it outputs the estimated time-frequency masking ratio, from which the estimated target signal data are obtained.
When separating single-channel signal data, this method makes the deep learning model analyse the proportional relationship between the target signal data and the single-channel noisy signal data at each time-frequency point, and it performs well for speech signal data separation. Its main features are: 1. the model is easier to estimate; 2. the model trains faster; 3. the model generalizes better.
In real environments, however, the main defects of this class of methods are that the range of the target signal data is hard to predict, and that physical interference frequently arises because the phases of the target signal data and the noise signal data are not equal.
In view of the above problems, there is a need for a deep-learning-based single-channel signal dual-path separation method and device, and for a storage medium and processor that implement the method or apply the device.
Summary of the invention
The main object of the invention is to overcome the shortcomings and deficiencies of the prior art. Drawing on the idea of the multi-path neural network together with the time-frequency masking method and the target mapping method, it provides a deep-learning-based single-channel signal dual-path separation method and device that combine the advantages of the time-frequency masking method and the target mapping method, compensate for their defects to a certain extent, and offer fast convergence and accurate separation results.
Another object of the invention is to provide a storage medium on which a computer program is stored; when the program runs, it executes the single-channel signal dual-path separation method.
Another object of the invention is to provide a processor for running a program, wherein the program executes the single-channel signal dual-path separation method when it runs.
The object of the invention is achieved by the following technical solution: a single-channel signal dual-path separation method, comprising the steps of:
establishing a multi-path neural network learning model comprising a target mapping path, a time-frequency masking path and a fully connected layer; the target mapping path separates the single-channel signal data using the target mapping method while, in parallel, the time-frequency masking path separates the single-channel signal data using the time-frequency masking method; the data output by the two paths after separation are merged by the fully connected layer and arranged to the specification of the target data, and the estimated target signal data features are then output.
Preferably, when the target mapping path separates the single-channel signal data using the target mapping method, a mapping layer is connected after the target mapping deep learning model. The mapping layer uses a ReLU-family activation function to simulate the target mapping method and establishes the mapping between the signal data output by the target mapping deep learning model and the single-channel noisy mixed signal data, yielding the target signal data estimated by the target mapping path.
Preferably, when the time-frequency masking path separates the single-channel signal data using the time-frequency masking method, a masking layer is connected after the time-frequency masking deep learning model. The time-frequency masking deep learning model separates the single-channel noisy mixed signal data, and the masking layer uses a sigmoid activation function to simulate the time-frequency masking method, establishing the time-frequency masking ratio $\hat{m}_i$ between the signal data output by the time-frequency masking deep learning model and the single-channel noisy mixed signal data.
Further, to balance the weights of the two paths in the overall model and the difference in distribution of their output data, the estimate of the target signal data features is simulated in advance on the time-frequency masking path, i.e. the estimated mask is applied to the mixture as the element-wise product $\hat{m}_i \odot x_i$, where $x_i$ denotes the single-channel noisy mixed signal data; the result is then merged with the target mapping path in the fully connected layer.
In one preferred scheme, the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a convolutional neural network (CNN).
In another preferred scheme, the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a long short-term memory recurrent neural network (LSTM).
In another preferred scheme, the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a bidirectional long short-term memory recurrent neural network (BLSTM).
A single-channel signal dual-path separation device, comprising:
a multi-path neural network learning model module comprising a target mapping path, a time-frequency masking path and a fully connected layer, wherein:
the target mapping path is used to separate the single-channel signal data using the target mapping method;
the time-frequency masking path is used to separate the single-channel signal data using the time-frequency masking method;
the fully connected layer merging module is used to fuse the data output by the target mapping path and the time-frequency masking path after separation, arrange them to the specification of the target data, and then output the estimated target signal data features.
Preferably, the target mapping path comprises a target mapping deep learning model and a mapping layer. The mapping layer uses a ReLU-family activation function to simulate the target mapping method and establishes the mapping between the signal data output by the target mapping deep learning model and the single-channel noisy mixed signal data, yielding the target signal data estimated by the target mapping path.
Preferably, the time-frequency masking path comprises a time-frequency masking deep learning model and a masking layer. The masking layer uses a sigmoid activation function to simulate the time-frequency masking method and establishes the time-frequency masking ratio $\hat{m}_i$ between the signal data output by the time-frequency masking deep learning model and the single-channel noisy mixed signal data. The estimate of the target signal data features is simulated in advance on the time-frequency masking path, i.e. the estimated mask is applied to the mixture as the element-wise product $\hat{m}_i \odot x_i$, where $x_i$ denotes the single-channel noisy mixed signal data; the result is then merged with the target mapping path in the fully connected layer.
For single-channel signals, the invention performs separation with the target mapping method and the time-frequency masking method in parallel and then merges the separated outputs through a fully connected layer. Its overall convergence speed lies between those of the two methods: slower than the time-frequency masking method but faster than the target mapping method. It does not suffer from the theoretical defect of the time-frequency masking method, and its performance is better than that of either the time-frequency masking method or the target mapping method. The time-frequency masking path plays the role of an "accelerator", while the target mapping path plays the role of a "booster". Even without taking the phase of the signal data into account, the model generalizes well.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a schematic diagram of the merging process in the fully connected layer of the method of the invention.
Fig. 3 is a schematic diagram in which the fully connected operation of the dual-path merge is regarded as a half-connected operation.
Fig. 4 is a flow chart in which both deep learning models are implemented with a CNN in embodiment 1.
Fig. 5 is a flow chart in which both deep learning models are implemented with an LSTM in embodiment 1.
Fig. 6 is a flow chart in which both deep learning models are implemented with a BLSTM in embodiment 1.
Detailed description of the embodiments
The invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the invention are not limited thereto.
Embodiment 1
For single-channel signal data separation, the target mapping and time-frequency masking methods used in the prior art each have their own defects. For this reason, the invention draws on the idea of the multi-path neural network and proposes a single-channel signal dual-path separation method that combines the two methods.
The main characteristic of a multi-path neural network is that it does not consist of multiple models and needs no pre-training. Instead, several separate branches with different model structures and data processing logic are designed on top of a single model, a fully connected layer is usually added after the branches to merge them, and all branches of the whole model are finally trained together by the backpropagation (BP) algorithm. In short: multiple branches, joint training, and branch merging. The main idea is to pass single-modal or multi-modal data through several paths so as to expand the data dimensions and increase the granularity at which the model processes them, thereby improving the model's learning efficiency and performance. Whereas a traditional single-path neural network can only handle single-dimension, single-modal data, a multi-path neural network can both handle multi-modal data and expand the observation dimensions of the data, so it usually has better data processing capability and performance. Although the branches of a multi-path neural network appear independent, the merging step makes them actually influence one another when the whole model is trained by backpropagation, which produces a complementary effect. The single-channel signal dual-path separation method proposed by the invention lets the target mapping method and the time-frequency masking method complement each other and thus achieves better performance.
Referring to Fig. 1, the deep-learning-based single-channel signal dual-path separation method of the invention is divided into a training stage and a separation stage, each of which is described in detail below with reference to the drawings.
1. Training stage
In the training stage, the method is divided into three parts: the target mapping path, the time-frequency masking path, and the fully connected layer merge.
The target mapping path contains a target mapping deep learning model followed by a mapping layer. The target mapping deep learning model can take many forms, such as a DNN, CNN or RNN. The mapping layer uses a ReLU-family function (such as ReLU, Leaky ReLU, PReLU or ELU) to simulate the target mapping method and realizes the mapping $y_i = f(x_i)$ between the signal data $y_i$ produced by the target mapping deep learning model in the previous step and the single-channel noisy mixed signal data $x_i$, thereby obtaining the target signal data estimated by this path.
The time-frequency masking path contains a time-frequency masking deep learning model followed by a masking layer. The time-frequency masking deep learning model can likewise take many forms. Several time-frequency masking functions exist, such as the Wiener filter mask (WFM), the ideal binary mask (IBM) and the ideal ratio mask (IRM). In their formulas, $C$ is the number of mixed source signals and $|s_{i,ft}|$ is the energy value of the $i$-th source signal in the time-frequency unit $(f, t)$.
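The mask formulas themselves did not survive in this text. The sketch below uses definitions that are common in the source-separation literature and should be read as an assumption about what the patent intends: WFM as a ratio of squared magnitudes, IBM as 1 where the target source dominates, and IRM as a ratio of magnitudes. The function name `ideal_masks` and the flat list layout are hypothetical.

```python
def ideal_masks(source_mags, target=0):
    """Compute WFM, IBM and IRM for one source over a list of time-frequency units.

    source_mags[j][k] is the magnitude |s_j| of source j in time-frequency unit k.
    The three definitions are assumed, commonly used forms, not the patent's text.
    """
    n_units = len(source_mags[0])
    wfm, ibm, irm = [], [], []
    for k in range(n_units):
        mags = [src[k] for src in source_mags]
        s = mags[target]
        total = sum(mags)
        total_sq = sum(m * m for m in mags)
        # Wiener-filter-style mask: target power over total power.
        wfm.append(s * s / total_sq if total_sq > 0 else 0.0)
        # Ideal binary mask: 1 only where the target is strictly the largest source.
        others = [m for j, m in enumerate(mags) if j != target]
        ibm.append(1.0 if all(s > m for m in others) else 0.0)
        # Ideal ratio mask: target magnitude over summed magnitudes.
        irm.append(s / total if total > 0 else 0.0)
    return wfm, ibm, irm


# Two sources (C = 2) observed in three time-frequency units.
speech = [3.0, 1.0, 0.0]
noise = [1.0, 1.0, 2.0]
wfm, ibm, irm = ideal_masks([speech, noise], target=0)
print(ibm)  # [1.0, 0.0, 0.0]
print(irm)  # [0.75, 0.5, 0.0]
```

All three masks lie in [0, 1], which is the property the masking layer has to reproduce.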
Different time-frequency masking functions call for different cost functions, but these generally fall into two forms, in which $y_i$ is the desired target signal data, $m_i$ is the target time-frequency mask, $\hat{m}_i$ is the estimated time-frequency mask output by the deep learning model, and $M'$ is the mask reconstruction. The two forms differ in whether the error is computed against the time-frequency mask or against the signal data.
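The two cost-function forms are not reproduced in this text. One plausible reading, assuming squared-error losses and treating $M'(\cdot)$ as the operation that turns an estimated mask back into a signal estimate, is:

```latex
J_{\text{mask}} = \sum_i \left\lVert \hat{m}_i - m_i \right\rVert_2^2
\qquad \text{or} \qquad
J_{\text{signal}} = \sum_i \left\lVert M'(\hat{m}_i) - y_i \right\rVert_2^2
```

The first form compares masks with masks; the second reconstructs a signal from the estimated mask and compares it with the target signal data. The subscripts "mask" and "signal" are labels introduced here for illustration.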
The masking layer in the time-frequency masking path simulates the time-frequency masking method, so that the path indirectly learns the time-frequency masking relationship between the target signal data and the single-channel noisy mixed signal data.
Because a time-frequency mask requires the output to be limited to a certain range such as [0, 1], the masking layer can use the sigmoid activation function $\sigma(z) = 1 / (1 + e^{-z})$.
Unlike the time-frequency masking path, however, the target mapping path does not need to limit its output to a fixed range. If the two paths were merged directly, the distributions of their outputs would differ too much (the time-frequency masking path outputs values in [0, 1], whereas the target mapping path outputs values in [0, +∞)). From the model's point of view, this would unbalance the weights of the two paths (the weight of the time-frequency masking path too small and that of the target mapping path too large) and seriously affect the model's learning.
Therefore, to balance the weights of the two paths in the overall model and the difference in distribution of their output data, the estimate of the target signal data features must first be simulated on the time-frequency masking path, i.e. the estimated mask is applied to the mixture as the element-wise product $\hat{m}_i \odot x_i$; only then can the path be merged with the target mapping path.
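A minimal sketch of this rescaling step, assuming the mask is applied to the mixture features by element-wise multiplication. The function names and the toy values are hypothetical; `logits` stands for whatever the time-frequency masking deep learning model outputs before the masking layer.

```python
import math


def sigmoid(z):
    """Logistic function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def masking_path_output(logits, mixture):
    """Masking layer followed by mask application.

    The sigmoid turns the model's raw outputs into a mask in (0, 1);
    multiplying by the mixture features moves the result back onto the
    scale of the signal features, so it is comparable with the output of
    the target mapping path.
    """
    mask = [sigmoid(z) for z in logits]
    return [m * x for m, x in zip(mask, mixture)]


mixture = [4.0, 10.0, 0.5]   # e.g. magnitude-spectrum features of the noisy input
logits = [0.0, 2.0, -2.0]    # hypothetical pre-activation outputs of the masking model
estimate = masking_path_output(logits, mixture)
print(estimate[0])  # 2.0, because sigmoid(0) = 0.5
```

Each estimate is bounded by the corresponding mixture value rather than by 1, which is what brings the two paths onto a comparable range before the merge.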
The target mapping path and the time-frequency masking path each simulate the estimate of the target signal data with their own independent logic, so their outputs are generally not identical. Moreover, after the two paths are joined the amount of data is twice that of a single path, which does not match the specification of the target data. The data output by the two branches therefore have to be merged by a fully connected layer and arranged to the specification of the target data, after which the estimated target signal data features are output. Fig. 2 illustrates how the fully connected layer merges the two paths and arranges the data specification. The dual-path merge can be expressed abstractly in terms of the following quantities: $\hat{y}_i$ is the $i$-th feature of the estimated target signal data; $W_i$ and $W_i'$ are the weights of the fully connected operation for the target mapping path and the time-frequency masking path respectively, and $b$ is the bias of the fully connected operation; $X$ and $X'$ are the $n \times 1$ output data of the target mapping path and the time-frequency masking path respectively, and $[X; X']$ is the concatenation of the two outputs. In addition, $f$ is the activation function of the fully connected layer. It could be omitted in the fully connected layer of the dual-path merge, but, as with the mapping layer of the target mapping path, signal data usually use the short-time Fourier transform magnitude spectrum as the feature, so the fully connected layer of the dual-path merge can generally use a ReLU-family activation function.
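The merge formula is not reproduced in this text. One reading consistent with the definitions above is $\hat{y} = f(W [X; X'] + b)$, where $W$ has $n$ rows and $2n$ columns. The sketch below implements that reading with plain ReLU as $f$; the function names, the random weights and the toy inputs are assumptions for illustration.

```python
import random


def relu(values):
    """Plain ReLU, one member of the ReLU family mentioned in the text."""
    return [max(0.0, v) for v in values]


def merge_paths(x_map, x_mask, weights, bias):
    """Fully connected merge of the two path outputs.

    x_map and x_mask each hold n features; they are concatenated into 2n
    values and projected back to n features, i.e. the target data specification.
    weights is a list of n rows with 2n entries each; bias holds n entries.
    """
    concat = x_map + x_mask
    pre_activation = [
        sum(w * c for w, c in zip(row, concat)) + b
        for row, b in zip(weights, bias)
    ]
    return relu(pre_activation)


n = 3
rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * n)] for _ in range(n)]
bias = [0.0] * n
x_map = [1.0, 2.0, 3.0]    # hypothetical output of the target mapping path
x_mask = [0.9, 2.1, 2.8]   # hypothetical output of the time-frequency masking path
y_hat = merge_paths(x_map, x_mask, weights, bias)
print(len(y_hat))  # 3: the merge halves the 2n concatenated values back to n
```

In training, `weights` and `bias` would be learned jointly with both branches by backpropagation rather than drawn at random.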
Like the target mapping method, the dual-path separation method finally outputs the estimated target signal data features, so its cost function is likewise defined on the error between the estimated and the desired target signal data.
The method proposed by the invention is built on deep learning and is a scheme that takes the strengths of both the target mapping method and the time-frequency masking method. A hypothetical derivation can therefore be carried out from the perspective of deep learning to demonstrate that the method is scientifically sound and reasonable.
To simplify the description of the dual-path separation method, the fully connected operation of the dual-path merge can be regarded as a half-connected operation, as shown in Fig. 3, in which each output feature depends only on the corresponding outputs of the two paths. Because a ReLU-family activation function is used, the expression can be simplified further, and the bias $b$ can be omitted. The $i$-th feature $\hat{y}_i$ of the estimated target signal data can then be regarded as the sum of the products of the corresponding outputs of the target mapping path and the time-frequency masking path with their respective weights.
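The three formulas of this simplification are missing from this text. A reconstruction consistent with the surrounding definitions, offered as an assumption, writes $X_i$ and $X_i'$ for the $i$-th outputs of the target mapping path and the time-frequency masking path:

```latex
\begin{aligned}
\hat{y}_i &= f\left(W_i X_i + W_i' X_i' + b\right) && \text{(half-connected form)} \\
          &= W_i X_i + W_i' X_i' + b && \text{(ReLU-family } f \text{ on a non-negative argument)} \\
          &\approx W_i X_i + W_i' X_i' && \text{(bias omitted)}
\end{aligned}
```

The second step relies on ReLU acting as the identity for non-negative inputs, which holds here only if the weighted sum of the magnitude features is non-negative; that condition is assumed rather than stated in the source.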
From a theoretical analysis, the update of the branch weights during training follows a pattern. The time-frequency masking method trains faster than the target mapping method, so in the early stage of model training the output of the time-frequency masking path is closer to the target signal data $y$ than that of the target mapping path. Here $X_t$ and $X_{t-1}$ are the outputs of the target mapping path at the current and the previous training iteration, $X'_t$ and $X'_{t-1}$ are the outputs of the time-frequency masking path at the current and the previous training iteration, and $y$ is the target signal data.
However, the time-frequency masking method has a theoretical defect. If physical interference has too great an effect, then in the later stage of training the time-frequency masking path can no longer be trained effectively, because its data are confined to a fixed range. The target mapping path has no such theoretical defect and can continue to train, so its output becomes closer to $y$ than that of the time-frequency masking path.
From the point of view of the backpropagation mechanism of deep learning, the branch whose output is closer to $y$ should receive the larger weight. In the early stage of training, therefore, the weight of the time-frequency masking path is larger, and in the later stage the weight of the target mapping path is larger. If a training iteration $T$ divides model training into an early and a late stage, i.e. $0 < t < T$ is the early stage and $t > T$ is the late stage, the weights of the two paths are ordered accordingly in each stage.
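The inequalities for the two training stages are not reproduced in this text. The following is one way to write the relations described in words above; the use of a norm as the distance measure and the exact form are assumptions, and the original formulas may also involve the previous-iteration outputs $X_{t-1}$ and $X'_{t-1}$:

```latex
\begin{aligned}
\lVert X'_t - y \rVert &< \lVert X_t - y \rVert, & W_i' &> W_i, && 0 < t < T \\
\lVert X_t - y \rVert &< \lVert X'_t - y \rVert, & W_i &> W_i', && t > T
\end{aligned}
```

The first row is the early stage, where the time-frequency masking path is closer to $y$ and carries the larger weight; the second row is the late stage, where the roles are reversed.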
The theoretical analysis above shows that, in the early stage of model training, the time-frequency masking path trains faster and has a larger weight than the target mapping path, and the error of the estimated target signal data $\hat{y}$ output by the model keeps decreasing. In the later stage, the time-frequency masking path can no longer be trained effectively, but the target mapping path can still continue to train and has a larger weight than the time-frequency masking path, so the error of $\hat{y}$ still keeps decreasing. The overall convergence speed of the method lies between those of the time-frequency masking method and the target mapping method: slower than the former but faster than the latter. The method does not suffer from the theoretical defect of the time-frequency masking method, and its performance is better than that of either method alone. The time-frequency masking path plays the role of an "accelerator", while the target mapping path plays the role of a "booster".
Two, separation phase
After data extraction and data processing, the mixed signal data to be separated are input in parallel to the trained target mapping branch for target mapping separation and to the trained time-frequency masking branch for time-frequency masking separation. The two outputs are then merged by the fully connected layer to obtain the target signal data features, which can be used for the subsequent reconstruction of the target signal waveform.
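The separation flow described above can be sketched as a forward pass through two parallel branches followed by a fully connected merge. This is only a minimal illustration under stated assumptions: a single dense tanh layer stands in for each deep learning model (CNN, LSTM, or BLSTM in the embodiments), the masked estimate is assumed to be the element-wise product of the mask and the mixture features, the merge layer is assumed to be linear, and every name (`init_params`, `two_branch_forward`, the dictionary keys) is hypothetical rather than taken from the patent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_features, n_hidden, seed=0):
    """Random weights for the sketch; all names are hypothetical."""
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

    return {
        "map_hidden": dense(n_features, n_hidden),
        "map_layer": dense(n_hidden, n_features),
        "mask_hidden": dense(n_features, n_hidden),
        "mask_layer": dense(n_hidden, n_features),
        "fc": dense(2 * n_features, n_features),
    }

def two_branch_forward(x, params):
    """x: (batch, n_features) magnitude features of the noisy mixture."""
    # Target mapping branch: stand-in hidden model, then a ReLU mapping layer.
    w, b = params["map_hidden"]
    h_map = np.tanh(x @ w + b)
    w, b = params["map_layer"]
    y_map = relu(h_map @ w + b)

    # Masking branch: stand-in hidden model, then a sigmoid masking layer.
    w, b = params["mask_hidden"]
    h_mask = np.tanh(x @ w + b)
    w, b = params["mask_layer"]
    mask = sigmoid(h_mask @ w + b)
    y_mask = mask * x  # assumption: masked estimate = mask * mixture

    # Fully connected layer merges the two branch outputs (assumed linear).
    w, b = params["fc"]
    merged = np.concatenate([y_map, y_mask], axis=1)
    return merged @ w + b, y_map, y_mask
```

Multiplying the mask by the mixture puts the masking branch output on the same scale as the mapping branch output before the two are concatenated, which is one plausible reading of why the two branches can be merged by a single fully connected layer.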
In one embodiment, both the target mapping deep learning model and the time-frequency masking deep learning model are implemented with a convolutional neural network (CNN). The flow is shown in Figure 4. Because the CNN builds on the backpropagation (BP) algorithm and the neocognitron (Neocognitron) and uses a weight-sharing strategy, its complexity is substantially lower than that of other neural networks, its number of training parameters is greatly reduced, and its performance is good.
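The effect of weight sharing on the number of training parameters can be illustrated with simple arithmetic. The sketch below compares one fully connected layer with one convolutional layer; the layer sizes (a 5 x 129 input patch, sixteen 3 x 3 filters) are hypothetical values chosen only for the example and are not taken from the patent.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a 2-D convolution; the count does not depend
    on the input size because each filter is shared across all positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

# Hypothetical input: 5 context frames x 129 frequency bins.
n_in = 5 * 129
dense_count = dense_params(n_in, n_in)      # 645 * 645 + 645 = 416670
conv_count = conv2d_params(3, 3, 1, 16)     # 3 * 3 * 1 * 16 + 16 = 160
print(dense_count, conv_count)
```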
In another embodiment, both the target mapping deep learning model and the time-frequency masking deep learning model are implemented with a long short-term memory recurrent neural network (LSTM). Recurrent neural networks (RNNs) were proposed by Michael I. Jordan and Jeffrey Elman in 1986 and 1990, respectively. The method flow of this embodiment is shown in Figure 5. LSTM uses a "gate" structure to alleviate the long-sequence dependency problem of the RNN. It retains the RNN's operation of adding internal feedback connections inside the feedforward processing unit, so that the output state of a neuron at the current time step is fed back into that neuron at later time steps, which enables training of the neural network in the time domain.
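The gate structure and the internal feedback connection can be made concrete with a single LSTM time step. The sketch below follows the widely used textbook formulation (input, forget, and output gates plus a cell candidate); the patent does not state which LSTM variant it uses, so the gate ordering, the parameter names `W`, `U`, `b`, and the function name `lstm_step` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: (n_in,) input; h_prev, c_prev: (n_hidden,) previous states.
    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,).
    Assumed stacking order: input gate, forget gate, candidate, output gate.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # U @ h_prev is the feedback connection
    i = sigmoid(z[0:n])               # input gate
    f = sigmoid(z[n:2 * n])           # forget gate
    g = np.tanh(z[2 * n:3 * n])       # cell candidate
    o = sigmoid(z[3 * n:4 * n])       # output gate
    c_t = f * c_prev + i * g          # gated cell state carries long-term memory
    h_t = o * np.tanh(c_t)            # output state fed back at the next step
    return h_t, c_t
```

Processing a sequence amounts to calling `lstm_step` once per frame and passing the returned `h_t` and `c_t` into the next call.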
In another embodiment, both the target mapping deep learning model and the time-frequency masking deep learning model are implemented with a bidirectional long short-term memory recurrent neural network (BLSTM). The flow is shown in Figure 6. A BLSTM is a combination of two unidirectional LSTMs: one is associated with the historical input information and the other with the future input information, and the output data of the two unidirectional LSTMs are finally connected at the output layer.
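The bidirectional combination can be sketched as two recurrent passes over the same sequence, one in time order and one reversed, whose per-frame outputs are joined. To keep the example short, a simple tanh recurrent layer is used as a stand-in for each unidirectional LSTM, and joining is assumed to mean concatenation along the feature axis; the function names `run_rnn` and `bidirectional` are hypothetical.

```python
import numpy as np

def run_rnn(xs, W, U, b):
    """Simple tanh recurrent layer, a stand-in for a unidirectional LSTM.

    xs: (n_frames, n_in); returns hidden states of shape (n_frames, n_hidden).
    """
    h = np.zeros(U.shape[0])
    outputs = []
    for x_t in xs:
        h = np.tanh(W @ x_t + U @ h + b)
        outputs.append(h)
    return np.stack(outputs)

def bidirectional(xs, fwd, bwd):
    """fwd, bwd: (W, U, b) tuples for the forward and backward layers."""
    h_fwd = run_rnn(xs, *fwd)                # sees past and current frames
    h_bwd = run_rnn(xs[::-1], *bwd)[::-1]    # sees current and future frames
    # Assumption: the two outputs are concatenated per frame.
    return np.concatenate([h_fwd, h_bwd], axis=1)
```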
Using the two-way separation method proposed by the present invention, single-channel signal data two-way separation models based on CNN, LSTM, and BLSTM are implemented in a public-dataset environment and in a practical application environment. The effectiveness and practical application performance of the two-way separation method are verified and tested through comparative experiments against the target mapping method and the time-frequency masking method.
The two-way separation method of this embodiment uses the short-time Fourier transform (STFT) for feature extraction. A Hamming window with a window length of 256 sampling points and a window step of 128 sampling points is applied to single-channel signal data with a sampling rate of 16 kHz to perform time-frequency decomposition and obtain the STFT coefficients. Taking the modulus of the STFT coefficients (|STFT|) gives the STFT magnitude spectrum (STFT-magnitude). The Hamming window, in its standard form for a window of length N, is:
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
Furthermore, considering the continuity of the signal data, the features of the current frame are input to the model together with those of the two preceding frames and the two following frames.
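The feature extraction described above can be sketched with NumPy as follows. The window length (256), window step (128), and two frames of context on each side come from the text; the symmetric Hamming definition, the use of `numpy.fft.rfft`, the edge padding at the start and end of the utterance, and the function names `stft_magnitude` and `add_context` are assumptions made for this illustration.

```python
import numpy as np

def stft_magnitude(signal, win_len=256, hop=128):
    """Return the STFT magnitude spectrum, shape (n_frames, win_len // 2 + 1)."""
    n = np.arange(win_len)
    # Standard symmetric Hamming window (same values as np.hamming(win_len)).
    window = 0.54 - 0.46 * np.cos(2 * np.pi * n / (win_len - 1))
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + win_len] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def add_context(features, left=2, right=2):
    """Stack each frame with its neighbours: 2 before and 2 after by default.

    Edge frames are padded by repetition, which is an assumption.
    """
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    n_frames = features.shape[0]
    return np.concatenate(
        [padded[k:k + n_frames] for k in range(left + right + 1)], axis=1
    )

# One second of a 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
magnitude = stft_magnitude(np.sin(2 * np.pi * 1000 * t))
model_input = add_context(magnitude)
print(magnitude.shape, model_input.shape)
```

With these settings each frame has 129 frequency bins spaced 62.5 Hz apart, and the model input for each frame has 5 x 129 = 645 values.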
In this experiment, single-channel signal data separation models for the target mapping method, the time-frequency masking (IRM) method, and the two-way method are each implemented based on CNN, LSTM, and BLSTM, and are compared experimentally. Table 1 gives the basic information of each model in the comparative experiment.
Table 1 Basic information of the comparative experiment models
The above 9 models are trained on the TIMIT corpus and the NOISEX-92 noise set. Table 2 gives the number of training iterations completed when the "early stopping" strategy is triggered during training. It can be seen that the models of the single-channel signal data two-way separation method (LSTM-TB, BLSTM-TB, CNN-TB) converge relatively quickly, with a convergence speed between those of the target mapping method and the time-frequency masking method.
Table 2 Number of training iterations completed at "early stopping" for each model
Model | Training iterations |
---|---|
LSTM-IRM | 14 |
LSTM-MP | 159 |
LSTM-TB | 44 |
BLSTM-IRM | 15 |
BLSTM-MP | 129 |
BLSTM-TB | 117 |
CNN-IRM | 13 |
CNN-MP | 110 |
CNN-TB | 68 |
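The iteration counts in Table 2 depend on when the "early stopping" strategy ends training. The source does not state the stopping criterion, so the following is only a generic sketch of a patience-based rule: training stops once the validation loss has failed to improve for a fixed number of consecutive iterations. The function name, the `patience` and `min_delta` parameters, and the use of a precomputed list of losses in place of real training are all assumptions.

```python
def iterations_until_early_stop(val_losses, patience=3, min_delta=0.0):
    """Return the number of training iterations completed before stopping.

    val_losses: validation loss after each iteration (a stand-in for a
    real training loop that would compute these one at a time).
    """
    best = float("inf")
    bad_iterations = 0
    completed = 0
    for loss in val_losses:
        completed += 1
        if loss < best - min_delta:
            best = loss              # improvement: remember it, reset counter
            bad_iterations = 0
        else:
            bad_iterations += 1      # no improvement this iteration
            if bad_iterations >= patience:
                break
    return completed

# Loss stops improving after the third iteration, so training ends at the sixth.
stopped_at = iterations_until_early_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5])
print(stopped_at)
```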
Table 3 compares the performance of the LSTM-based models of each method under matched noise signal data. The table gives the SDR, SAR, and SNR of the estimated target signal data separated by LSTM-IRM, LSTM-MP, and LSTM-TB at mixing signal-to-noise ratios of -6 dB, -4 dB, -2 dB, 0 dB, 2 dB, 4 dB, and 6 dB. It can be seen that the SDR of the estimated target signal data separated by LSTM-TB is on average 1.4000 and 1.1250 higher than that of LSTM-IRM and LSTM-MP, respectively; the SNR is on average 3.4883 and 0.4333 higher than that of LSTM-IRM and LSTM-MP, respectively; and the SAR is on average 1.3133 higher than that of LSTM-MP.
Table 3 Performance comparison of the LSTM-based models of each method under matched noise data
Table 4 compares the performance of the BLSTM-based models of each method under matched noise signal data. The table gives the SDR, SAR, and SNR of the estimated target signal data separated by BLSTM-IRM, BLSTM-MP, and BLSTM-TB at mixing signal-to-noise ratios of -6 dB, -4 dB, -2 dB, 0 dB, 2 dB, 4 dB, and 6 dB. It can be seen that the SDR of the estimated target signal data separated by BLSTM-TB is on average 1.2617 and 1.2567 higher than that of BLSTM-IRM and BLSTM-MP, respectively; the SNR is on average 2.9633 and 0.5883 higher, respectively; and the SAR is on average 0.1050 and 1.4233 higher, respectively.
Table 4 Performance comparison of the BLSTM-based models of each method under matched noise data
Table 5 compares the performance of the CNN-based models of each method under matched noise signal data. The table gives the SDR, SAR, and SNR of the estimated target signal data separated by CNN-IRM, CNN-MP, and CNN-TB at mixing signal-to-noise ratios of -6 dB, -4 dB, -2 dB, 0 dB, 2 dB, 4 dB, and 6 dB. It can be seen that the SDR of the estimated target signal data separated by CNN-TB is on average 1.0517 and 1.6117 higher than that of CNN-IRM and CNN-MP, respectively; the SNR is on average 2.8000 and 0.8850 higher than that of CNN-IRM and CNN-MP, respectively; and the SAR is on average 1.7767 higher than that of CNN-MP.
Table 5 Performance comparison of the CNN-based models of each method under matched noise data
Table 6 compares the performance of the LSTM-based models of each method under unmatched noise signal data. The table gives the SDR, SAR, and SNR of the estimated target signal data separated by LSTM-IRM, LSTM-MP, and LSTM-TB at mixing signal-to-noise ratios of -6 dB, -4 dB, -2 dB, 0 dB, 2 dB, 4 dB, and 6 dB. It can be seen that the SDR of the estimated target signal data separated by LSTM-TB is on average 0.6917 and 1.1183 higher than that of LSTM-IRM and LSTM-MP, respectively; the SNR is on average 1.9933 and 0.5167 higher than that of LSTM-IRM and LSTM-MP, respectively; and the SAR is on average 1.2717 higher than that of LSTM-MP.
Table 6 Performance comparison of the LSTM-based models of each method under unmatched noise data
Table 7 compares the performance of the BLSTM-based models of each method under unmatched noise signal data. The table gives the SDR, SAR, and SNR of the estimated target signal data separated by BLSTM-IRM, BLSTM-MP, and BLSTM-TB at mixing signal-to-noise ratios of -6 dB, -4 dB, -2 dB, 0 dB, 2 dB, 4 dB, and 6 dB. It can be seen that the SDR of the estimated target signal data separated by BLSTM-TB is on average 0.9017 and 0.7733 higher than that of BLSTM-IRM and BLSTM-MP, respectively; the SNR is on average 1.5400 and 0.0933 higher, respectively; and the SAR is on average 0.4100 and 1.0050 higher, respectively.
Table 7 Performance comparison of the BLSTM-based models of each method under unmatched noise data
Table 8 compares the performance of the CNN-based models of each method under unmatched noise signal data. The table gives the SDR, SAR, and SNR of the estimated target signal data separated by CNN-IRM, CNN-MP, and CNN-TB at mixing signal-to-noise ratios of -6 dB, -4 dB, -2 dB, 0 dB, 2 dB, 4 dB, and 6 dB. It can be seen that the SDR of the estimated target signal data separated by CNN-TB is on average 0.6667 and 0.7300 higher than that of CNN-IRM and CNN-MP, respectively; the SNR is on average 1.0633 higher than that of CNN-IRM; and the SAR is on average 0.3467 and 1.1367 higher than that of CNN-IRM and CNN-MP, respectively.
Table 8 Performance comparison of the CNN-based models of each method under unmatched noise data
From the above comparison of experimental results, it can be seen that, in the public-dataset environment, the two-way separation method implemented with LSTM, BLSTM, or CNN performs better than the target mapping method and the time-frequency masking method under both matched and unmatched noise signal data.
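The mixing signal-to-noise ratios used in these experiments and the SNR figure reported for the separated output can be illustrated as follows. This is a sketch under assumptions: the patent does not describe how its mixtures were generated or how SNR was computed, so the noise-scaling rule and the energy-ratio definition below are the conventional ones, and the function names `mix_at_snr` and `snr_db` are hypothetical. SDR and SAR are usually obtained from a BSS Eval style decomposition of the error, which is not sketched here.

```python
import numpy as np

def mix_at_snr(clean, noise, target_snr_db):
    """Scale the noise so that clean + noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (target_snr_db / 10)))
    return clean + scale * noise

def snr_db(reference, estimate):
    """SNR of an estimate relative to the reference signal, in dB."""
    error = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

rng = np.random.default_rng(0)
clean = rng.normal(size=16000)   # placeholder for a target signal
noise = rng.normal(size=16000)   # placeholder for an interference signal
for level in (-6, -4, -2, 0, 2, 4, 6):
    mixture = mix_at_snr(clean, noise, level)
    print(level, round(float(snr_db(clean, mixture)), 3))
```

Treating the unprocessed mixture as the "estimate" recovers the mixing SNR, which gives a baseline against which the SNR of a separated output can be compared.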
Embodiment 2
This embodiment provides a single-channel signal two-way separation device corresponding to the separation method described in Embodiment 1. The device comprises:
A multi-branch neural network learning model module, which includes a target mapping branch, a time-frequency masking branch, and a fully connected layer, wherein:
The target mapping branch is used to separate the single-channel signal data using the target mapping method;
The time-frequency masking branch is used to separate the single-channel signal data using the time-frequency masking method;
The fully connected layer merging module is used to fuse the data output by the target mapping branch and the time-frequency masking branch after separation, arrange them into the format of the target data, and then output the estimated target signal data features.
The target mapping branch includes a target mapping deep learning model and a mapping layer placed after the target mapping deep learning model. The mapping layer uses a ReLU-family function to simulate the target mapping method, realizing the mapping relation y_i = f(x_i) between the signal y_i obtained from the preceding target mapping deep learning model and the single-channel noisy mixed signal data x_i, thereby obtaining the target signal data estimated by this branch.
The time-frequency masking branch includes a time-frequency masking deep learning model and a masking layer placed after the time-frequency masking deep learning model. The masking layer uses the sigmoid activation function to simulate the time-frequency masking method, realizing the time-frequency masking ratio between the signal data m_i obtained from the preceding time-frequency masking deep learning model and the single-channel noisy mixed signal data x_i. To balance the weights of the two branches in the overall model and the distribution difference of their output data, the time-frequency masking branch must first simulate the output of an estimate of the target signal data features before it can finally be merged with the target mapping branch.
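Because a sigmoid output lies between 0 and 1, it is a natural fit for a ratio mask, and applying the mask to the mixture yields an estimate on the same scale as the output of the target mapping branch. The sketch below shows one common definition of the ideal ratio mask (IRM) as a training target and the element-wise application of a mask to mixture magnitudes. The square-root form of the IRM, the small constant `eps`, and the function names are assumptions; the patent's own masking formula is not reproduced in this text.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """A common IRM definition: sqrt(S^2 / (S^2 + N^2)), values in [0, 1]."""
    return np.sqrt(clean_mag ** 2 / (clean_mag ** 2 + noise_mag ** 2 + eps))

def apply_mask(mask, mixture_mag):
    """Element-wise masking of the mixture magnitude spectrum (assumed form)."""
    return mask * mixture_mag

# Toy magnitudes for two time-frequency bins.
clean_mag = np.array([3.0, 0.0])
noise_mag = np.array([4.0, 1.0])
mask = ideal_ratio_mask(clean_mag, noise_mag)
estimate = apply_mask(mask, np.array([5.0, 1.0]))
print(mask, estimate)
```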
The fully connected layer merging module merges the data output by the two branches through the fully connected layer, arranges them into the format of the target data, and then outputs the estimated target signal data features.
In this embodiment, the target mapping deep learning model in the target mapping branch and the time-frequency masking deep learning model in the time-frequency masking branch may use the convolutional neural network (CNN), the long short-term memory recurrent neural network (LSTM), or the bidirectional long short-term memory recurrent neural network (BLSTM) implementations described in Embodiment 1.
Embodiment 3
This embodiment provides a storage medium on which a computer program is stored; when the program runs, it executes the single-channel signal two-way separation method described in Embodiment 1.
Embodiment 4
This embodiment provides a processor for running a program, wherein the program, when run, executes the single-channel signal two-way separation method described in Embodiment 1.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by the above embodiments. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (10)
1. A single-channel signal two-way separation method, characterized by comprising the steps of:
establishing a multi-branch neural network learning model that includes a target mapping branch, a time-frequency masking branch, and a fully connected layer, wherein the target mapping branch separates the single-channel signal data using a target mapping method and, in parallel, the time-frequency masking branch separates the single-channel signal data using a time-frequency masking method; the data output by the target mapping branch and the time-frequency masking branch after separation are merged by the fully connected layer and arranged into the format of the target data, and the estimated target signal data features are then output.
2. The single-channel signal two-way separation method according to claim 1, characterized in that, when the target mapping branch separates the single-channel signal data using the target mapping method, a mapping layer is connected after the target mapping deep learning model; the mapping layer uses a ReLU-family activation function to simulate the target mapping method and establishes the mapping relation between the signal output by the target mapping deep learning model and the single-channel noisy mixed signal data, thereby obtaining the target signal data estimated by the target mapping branch.
3. The single-channel signal two-way separation method according to claim 1, characterized in that, when the time-frequency masking branch separates the single-channel signal data using the time-frequency masking method, a masking layer is connected after the time-frequency masking deep learning model; the time-frequency masking deep learning model separates the single-channel noisy mixed signal data x_i; the masking layer uses the sigmoid activation function to simulate the time-frequency masking method and establishes the time-frequency masking ratio between the signal data output by the time-frequency masking deep learning model and the single-channel noisy mixed signal data x_i; the time-frequency masking branch first simulates the output of an estimate of the target signal data features and is then merged with the target mapping branch at the fully connected layer.
4. The single-channel signal two-way separation method according to claim 1, characterized in that the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a convolutional neural network (CNN).
5. The single-channel signal two-way separation method according to claim 1, characterized in that the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a long short-term memory recurrent neural network (LSTM).
6. The single-channel signal two-way separation method according to claim 1, characterized in that the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a bidirectional long short-term memory recurrent neural network (BLSTM).
7. A single-channel signal two-way separation device, characterized by comprising:
a multi-branch neural network learning model module, which includes a target mapping branch, a time-frequency masking branch, and a fully connected layer, wherein:
the target mapping branch is used to separate the single-channel signal data using a target mapping method;
the time-frequency masking branch is used to separate the single-channel signal data using a time-frequency masking method;
the fully connected layer merging module is used to fuse the data output by the target mapping branch and the time-frequency masking branch after separation, arrange them into the format of the target data, and then output the estimated target signal data features.
8. The single-channel signal two-way separation device according to claim 7, characterized in that the target mapping branch includes a target mapping deep learning model and a mapping layer; the mapping layer uses a ReLU-family activation function to simulate the target mapping method and establishes the mapping relation between the signal data output by the target mapping deep learning model and the single-channel noisy mixed signal data, thereby obtaining the target signal data estimated by the target mapping branch;
the time-frequency masking branch includes a time-frequency masking deep learning model and a masking layer; the masking layer uses the sigmoid activation function to simulate the time-frequency masking method and establishes the time-frequency masking ratio between the signal data output by the time-frequency masking deep learning model and the single-channel noisy mixed signal data x_i; the time-frequency masking branch first simulates the output of an estimate of the target signal data features and is then merged with the target mapping branch at the fully connected layer.
9. A storage medium on which a computer program is stored, characterized in that, when the program runs, it executes the single-channel signal two-way separation method according to any one of claims 1-6.
10. A processor, characterized in that the processor is used to run a program, wherein the program, when run, executes the single-channel signal two-way separation method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910515889.XA CN110321810A (en) | 2019-06-14 | 2019-06-14 | Single channel signal two-way separation method, device, storage medium and processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110321810A true CN110321810A (en) | 2019-10-11 |
Family
ID=68119589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910515889.XA Pending CN110321810A (en) | 2019-06-14 | 2019-06-14 | Single channel signal two-way separation method, device, storage medium and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321810A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20100130328A (en) * | 2009-06-03 | 2010-12-13 | 충북대학교 산학협력단 | Method to combine casa and soft mask for single-channel speech separation |
CN106933649A (en) * | 2016-12-21 | 2017-07-07 | 华南师范大学 | Virtual machine load predicting method and system based on rolling average and neutral net |
CN109841226A (en) * | 2018-08-31 | 2019-06-04 | 大象声科(深圳)科技有限公司 | A kind of single channel real-time noise-reducing method based on convolution recurrent neural network |
Non-Patent Citations (1)
Title |
---|
张晖: ""基于深度学习的语音分离研究"", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126199A (en) * | 2019-12-11 | 2020-05-08 | 复旦大学 | Signal feature extraction and data mining method based on echo measurement data |
CN113053400A (en) * | 2019-12-27 | 2021-06-29 | 武汉Tcl集团工业研究院有限公司 | Training method of audio signal noise reduction model, audio signal noise reduction method and device |
CN113053400B (en) * | 2019-12-27 | 2024-06-07 | 武汉Tcl集团工业研究院有限公司 | Training method of audio signal noise reduction model, audio signal noise reduction method and equipment |
CN111583954A (en) * | 2020-05-12 | 2020-08-25 | 中国人民解放军国防科技大学 | Speaker independent single-channel voice separation method |
CN112289338A (en) * | 2020-10-15 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Signal processing method and device, computer device and readable storage medium |
CN112289338B (en) * | 2020-10-15 | 2024-03-12 | 腾讯科技(深圳)有限公司 | Signal processing method and device, computer equipment and readable storage medium |
CN112259118A (en) * | 2020-10-19 | 2021-01-22 | 成都明杰科技有限公司 | Single track human voice and background music separation method |
CN114500189A (en) * | 2022-01-24 | 2022-05-13 | 华南理工大学 | Direct pre-equalization method, system, device and medium for visible light communication |
CN114464206A (en) * | 2022-04-11 | 2022-05-10 | 中国人民解放军空军预警学院 | Single-channel blind source separation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321810A (en) | Single channel signal two-way separation method, device, storage medium and processor | |
CN109993280B (en) | Underwater sound source positioning method based on deep learning | |
CN110728360B (en) | Micro-energy device energy identification method based on BP neural network | |
CN107169527B (en) | Medical image classification method based on collaborative deep learning | |
CN107194404B (en) | Underwater target feature extraction method based on convolutional neural network | |
CN107703486B (en) | Sound source positioning method based on convolutional neural network CNN | |
CN109620152B (en) | MutifacolLoss-densenert-based electrocardiosignal classification method | |
CN111709315A (en) | Underwater acoustic target radiation noise identification method based on field adaptation | |
CN110459225B (en) | Speaker recognition system based on CNN fusion characteristics | |
CN105488466B (en) | A kind of deep-neural-network and Acoustic Object vocal print feature extracting method | |
CN106952649A (en) | Method for distinguishing speek person based on convolutional neural networks and spectrogram | |
CN108231086A (en) | A kind of deep learning voice enhancer and method based on FPGA | |
CN106529428A (en) | Underwater target recognition method based on deep learning | |
CN111723701A (en) | Underwater target identification method | |
CN110120926A (en) | Modulation mode of communication signal recognition methods based on evolution BP neural network | |
CN104463194A (en) | Driver-vehicle classification method and device | |
CN109033632A (en) | A kind of trend forecasting method based on depth quantum nerve network | |
CN113158964A (en) | Sleep staging method based on residual learning and multi-granularity feature fusion | |
CN109344751B (en) | Reconstruction method of noise signal in vehicle | |
Li et al. | Automatic modulation classification based on bispectrum and CNN | |
CN110096976A (en) | Human behavior micro-Doppler classification method based on sparse migration network | |
CN108805206A (en) | Improved L SSVM establishing method for analog circuit fault classification | |
CN113109782B (en) | Classification method directly applied to radar radiation source amplitude sequence | |
CN112862084B (en) | Traffic flow prediction method based on deep migration fusion learning | |
Gang et al. | Time series prediction using wavelet process neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191011 |