CN110321810A - Single channel signal two-way separation method, device, storage medium and processor - Google Patents


Info

Publication number
CN110321810A
Authority
CN
China
Prior art keywords
time
path
single channel
target
signal data
Prior art date
Legal status
Pending
Application number
CN201910515889.XA
Other languages
Chinese (zh)
Inventor
聂瑞华
高卓君
梁志浩
Current Assignee
South China Normal University
Original Assignee
South China Normal University
Priority date
Filing date
Publication date
Application filed by South China Normal University
Priority to CN201910515889.XA
Publication of CN110321810A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G06F2218/12 Classification; Matching
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L21/0272 Voice signal separating
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention discloses a single-channel signal two-way separation method, device, storage medium and processor. The method comprises the steps of: establishing a multi-path neural network learning model that includes a target mapping path, a time-frequency masking path and a fully connected layer, where the target mapping path separates single-channel signal data using the target mapping method and, in parallel, the time-frequency masking path separates the single-channel signal data using the time-frequency masking method; and merging the data output by the two paths through the fully connected layer, arranging them into the format of the target data and outputting the estimated target signal data features. The invention combines the respective advantages of the time-frequency masking method and the target mapping method and compensates for their defects to a certain extent. Likewise without taking the phase of the signal data into account, the model generalizes well.

Description

Single channel signal two-way separation method, device, storage medium and processor
Technical field
The invention belongs to the research field of blind source separation (Blind Source Separation, BSS), also known as blind signal separation, which was originally applied mainly in signal processing. In particular, it relates to a single-channel signal two-way separation method, device, storage medium and processor.
Background art
At present, most approaches treat the separation of signal data as a supervised learning problem and implement it with deep learning network models. The general deep-learning-based blind source separation framework consists of two stages, "deep learning model training" and "single-channel data separation":
(1) Training stage: a deep learning model extracts features from the training data and learns the nonlinear relationship between the unseparated source signal data and the manually separated label signal data;
(2) Separation stage: the trained model is used to separate mixed signal data, and the separated signal data are finally reassembled into complete signal data.
The key to using deep learning methods is the design of the computational target, which is directly reflected in the choice of cost function and strongly affects every aspect of the deep learning model's performance. For single-channel signal data separation, the mainstream computational targets are currently target mapping and time-frequency masking:
(1) Target mapping: the model directly learns the mapping from source data to label data during training and outputs the estimated target data during testing and validation. This is the most direct and most widely used way of setting the computational target for supervised learning problems. The cost function is set as:
where y_i is the desired target signal data, x_i is the noisy single-channel mixed signal data, and ŷ_i is the estimate of the target signal data produced by the deep learning model. In single-channel signal separation, this method makes the deep learning model directly learn the mapping between the target signal data and the noisy single-channel signal data. Its main features are: 1. no prior knowledge is required; 2. no complex data processing or feature extraction is required; 3. it has no defect in physical theory.
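The cost-function formula referenced above is not reproduced in this text. A common choice consistent with the definitions of y_i, x_i and ŷ_i, assumed here rather than quoted from the patent, is the mean squared error over N training samples, writing the network as a hypothetical function g_θ:

```latex
J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\lVert y_i - \hat{y}_i \right\rVert_2^2,
\qquad \hat{y}_i = g_\theta(x_i)
```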
However, noisy single-channel mixed signal data contain random, unpredictable noise, so their relationship to the target signal data is indirect and ill-defined. The main defects of this kind of method are: 1. the model is hard to estimate; 2. model training is slow; 3. the model generalizes poorly.
(2) Time-frequency masking: this method assumes that, in each time-frequency unit, there is a certain proportional relationship between the target signal data and the noisy single-channel mixed signal data, namely the time-frequency mask. During training, specialized data processing and feature extraction are used to learn the time-frequency masking relationship between the source data and the label data; during testing and validation, the model outputs the estimated time-frequency mask, from which the estimated target signal data are obtained.
In single-channel signal separation, this method makes the deep learning model analyze the proportional relationship between the target signal data and the noisy single-channel signal data in each time-frequency unit, and it performs well for speech signal separation. Its main features are: 1. the model is relatively easy to estimate; 2. model training is relatively fast; 3. the model generalizes relatively well.
However, in real environments the main defects of this kind of method are that the range of the target signal data is hard to predict, and physical interference often occurs because the phases of the target signal data and the noise signal data are not equal.
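A small numerical illustration of the phase problem, assuming the magnitude-domain ratio mask |s| / (|s| + |n|) (a common definition, not necessarily the one the patent uses). When source and noise are in phase, masking the mixture magnitude recovers the source magnitude exactly; when they are 90 degrees apart, |s + n| is smaller than |s| + |n| and the masked estimate falls short. The function names are hypothetical.

```python
import cmath
import math


def irm(source_mag: float, noise_mag: float) -> float:
    """Magnitude-domain ideal ratio mask for one time-frequency unit (assumed form)."""
    return source_mag / (source_mag + noise_mag)


def masked_estimate(source: complex, noise: complex) -> float:
    """Apply the mask to the magnitude of the complex mixture |s + n|."""
    mask = irm(abs(source), abs(noise))
    return mask * abs(source + noise)


s = 1.0 + 0.0j
n_in_phase = 1.0 + 0.0j                      # same phase as the source
n_quadrature = cmath.rect(1.0, math.pi / 2)  # 90 degrees out of phase

print(masked_estimate(s, n_in_phase))    # recovers |s| = 1.0
print(masked_estimate(s, n_quadrature))  # falls short of |s| (about 0.707)
```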
Therefore, to address the above problems, it is necessary to provide a deep-learning-based single-channel signal two-way separation method and device, together with a storage medium and a processor that implement the method or apply the device.
Summary of the invention
The main object of the present invention is to overcome the shortcomings and deficiencies of the prior art by drawing on the idea of multi-path neural networks, the time-frequency masking method and the target mapping method to provide a deep-learning-based single-channel signal two-way separation method and device. It combines the respective advantages of the time-frequency masking method and the target mapping method, compensates for their defects to a certain extent, and offers fast convergence and accurate separation results.
Another object of the present invention is to provide a storage medium storing a computer program which, when run, executes the single-channel signal two-way separation method.
A further object of the present invention is to provide a processor for running a program, wherein the program, when run, executes the single-channel signal two-way separation method.
The object of the present invention is achieved by the following technical solution: a single-channel signal two-way separation method, comprising the steps of:
establishing a multi-path neural network learning model that includes a target mapping path, a time-frequency masking path and a fully connected layer. The target mapping path separates the single-channel signal data using the target mapping method while, in parallel, the time-frequency masking path separates the same data using the time-frequency masking method. The outputs of the two paths are merged by the fully connected layer and arranged into the format of the target data, which then outputs the estimated target signal data features.
Preferably, when the target mapping path separates the single-channel signal data using the target mapping method, a mapping layer is connected after the target mapping deep learning model. The mapping layer uses a ReLU-family activation function to simulate the target mapping method, establishing the mapping between the signal data output by the target mapping deep learning model and the noisy single-channel mixed signal data, and thereby obtaining the target signal data estimated by the target mapping path.
Preferably, when the time-frequency masking path separates the single-channel signal data using the time-frequency masking method, a masking layer is connected after the time-frequency masking deep learning model. The time-frequency masking deep learning model separates the noisy single-channel mixed signal data, and the masking layer uses a sigmoid activation function to simulate the time-frequency masking method, establishing the estimated time-frequency mask m̂_i between the signal data output by the time-frequency masking deep learning model and the noisy single-channel mixed signal data.
Further, to balance the weights of the two paths in the overall model and the difference between their output distributions, the time-frequency masking path simulates the output estimate of the target signal data features in advance, i.e., it computes m̂_i ⊙ x_i (⊙ denoting element-wise multiplication), where x_i denotes the noisy single-channel mixed signal data; the result is then merged with the target mapping path in the fully connected layer.
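A minimal sketch of the forward pass described above, in plain Python on one feature vector: a ReLU mapping layer on the target mapping path, a sigmoid masking layer whose mask is multiplied by the mixture on the time-frequency masking path, concatenation of the two outputs, and a fully connected 2n → n merge with ReLU. The stand-in models, the function name `two_way_forward` and the weight layout are hypothetical; in the method itself the two paths are DNN/CNN/LSTM/BLSTM networks trained jointly by backpropagation.

```python
import math
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def two_way_forward(
    x: Vector,
    mapping_model: Callable[[Vector], Vector],
    masking_model: Callable[[Vector], Vector],
    fc_weights: List[Vector],
    fc_bias: Vector,
) -> Vector:
    """Forward pass of the two-path model for one feature vector x (n values)."""
    # Target mapping path: model output passed through a ReLU mapping layer.
    mapped = relu(mapping_model(x))
    # Time-frequency masking path: sigmoid masking layer, then mask * mixture.
    mask = sigmoid(masking_model(x))
    masked = [m * xi for m, xi in zip(mask, x)]
    # Concatenate the two n-value outputs into 2n values ...
    merged = mapped + masked
    # ... and let the fully connected layer (2n -> n, ReLU) produce the estimate.
    pre = [sum(w * z for w, z in zip(row, merged)) + b for row, b in zip(fc_weights, fc_bias)]
    return relu(pre)


# Toy usage: identity "mapping model", all-zero "masking model" (mask = 0.5),
# and a merge layer that averages the two paths feature by feature.
x = [2.0, 4.0]
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
y_hat = two_way_forward(x, lambda v: v, lambda v: [0.0] * len(v), weights, [0.0, 0.0])
print(y_hat)  # [1.5, 3.0]
```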
As a preferred method, the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a convolutional neural network (CNN).
As a preferred method, the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a long short-term memory recurrent neural network (LSTM).
As a preferred method, the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a bidirectional long short-term memory recurrent neural network (BLSTM).
A single-channel signal two-way separation device, comprising:
a multi-path neural network learning model module, which includes a target mapping path, a time-frequency masking path and a fully connected layer, wherein:
the target mapping path is used to separate the single-channel signal data using the target mapping method;
the time-frequency masking path is used to separate the single-channel signal data using the time-frequency masking method;
the fully connected layer merging module is used to merge the data output by the target mapping path and the time-frequency masking path after separation, arrange them into the format of the target data, and then output the estimated target signal data features.
Preferably, the target mapping path includes a target mapping deep learning model and a mapping layer. The mapping layer uses a ReLU-family activation function to simulate the target mapping method, establishing the mapping between the signal data output by the target mapping deep learning model and the noisy single-channel mixed signal data, and thereby obtaining the target signal data estimated by the target mapping path.
Preferably, the time-frequency masking path includes a time-frequency masking deep learning model and a masking layer. The masking layer uses a sigmoid activation function to simulate the time-frequency masking method, establishing the estimated time-frequency mask m̂_i between the signal data output by the time-frequency masking deep learning model and the noisy single-channel mixed signal data. The time-frequency masking path simulates the output estimate of the target signal data features in advance, i.e., it computes m̂_i ⊙ x_i, where x_i denotes the noisy single-channel mixed signal data, and the result is then merged with the target mapping path in the fully connected layer.
For single-channel signals, the present invention separates the signal in parallel using the target mapping method and the time-frequency masking method and then merges the separated outputs through the fully connected layer. Its overall convergence speed lies between those of the two methods: slower than the time-frequency masking method but faster than the target mapping method. It does not suffer from the theoretical defect of the time-frequency masking method, and it performs better than both the time-frequency masking method and the target mapping method. The time-frequency masking path acts as an "accelerator", while the target mapping path acts as a "booster". Likewise without taking the phase of the signal data into account, the model generalizes well.
Description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Fig. 2 is a schematic diagram of the merging process in the fully connected layer of the method of the present invention.
Fig. 3 is a schematic diagram showing the fully connected operation of the two-path merge regarded as a semi-connected operation.
Fig. 4 is a flowchart of Embodiment 1 with both deep learning models implemented as CNNs.
Fig. 5 is a flowchart of Embodiment 1 with both deep learning models implemented as LSTMs.
Fig. 6 is a flowchart of Embodiment 1 with both deep learning models implemented as BLSTMs.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment 1
For single-channel signal data separation, the target mapping and time-frequency masking approaches used in the prior art each have their own defects. To address this, the present invention draws on the idea of multi-path neural networks and proposes a single-channel signal two-way separation method that combines the two methods.
The main characteristic of a multi-path neural network is that it involves neither multiple models nor pre-training. Instead, multiple independent branches with different model structures and data processing methods are designed within a single model, usually followed by a fully connected layer that merges them, and all branches of the whole model are trained jointly as one unit with the backpropagation (BP) algorithm. Its defining features are multiple branches, joint training and branch merging. The main idea is to process unimodal or multimodal data through multiple paths, expanding the data dimensions and refining the model's processing granularity, and ultimately improving the model's learning efficiency and performance. Whereas a traditional single-path neural network can only process single-dimensional, unimodal data, a multi-path neural network can both process multimodal data and expand the observed dimensions of the data, and usually has better data-processing capability and performance. Although the branches of a multi-path neural network appear independent, the merging step causes them to influence one another during joint backpropagation training, so they can complement each other. The single-channel signal two-way separation method proposed by the present invention lets the target mapping method and the time-frequency masking method complement each other's strengths to achieve better performance.
Referring to Fig. 1, the deep-learning-based single-channel signal two-way separation method of the present invention can be divided into a training stage and a separation stage, each of which is described in detail below with reference to the drawings.
I. Training stage
In the training stage, the method consists of three parts: the target mapping path, the time-frequency masking path and the fully connected layer merge.
The target mapping path includes a target mapping deep learning model and a mapping layer placed after it. The target mapping deep learning model can take many forms, such as DNN, CNN or RNN. The mapping layer uses a ReLU-family function (such as ReLU, Leaky ReLU, PReLU or ELU) to simulate the target mapping method, realizing the mapping y_i = f(x_i) between the signal data y_i obtained from the preceding target mapping deep learning model and the noisy single-channel mixed signal data x_i, and thereby obtaining the target signal data estimated by this path.
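The ReLU-family functions named above, written out for a scalar input. The default slope 0.01 and α = 1.0 are common choices assumed here; the patent does not specify them.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def leaky_relu(z: float, slope: float = 0.01) -> float:
    # Small fixed slope for negative inputs (0.01 is a common default).
    return z if z > 0 else slope * z


def prelu(z: float, a: float) -> float:
    # Same shape as Leaky ReLU, but the slope `a` is learned during training.
    return z if z > 0 else a * z


def elu(z: float, alpha: float = 1.0) -> float:
    # Smooth exponential saturation towards -alpha for negative inputs.
    return z if z > 0 else alpha * (math.exp(z) - 1.0)
```

Only plain ReLU is strictly nonnegative and unbounded above, which matches the [0, +∞) output range the text later attributes to the target mapping path; the other variants let small negative values through.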
The time-frequency masking path includes a time-frequency masking deep learning model and a masking layer placed after it. The time-frequency masking deep learning model can likewise take many forms. Several time-frequency masking functions exist, such as the Wiener filter mask (Wiener Filter Mask, WFM), the ideal binary mask (Ideal Binary Mask, IBM) and the ideal ratio mask (Ideal Ratio Mask, IRM), defined as follows:
where C is the number of mixed source signals and |s_{i,ft}| is the energy of the i-th source signal in time-frequency unit (f, t).
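The defining formulas for WFM, IBM and IRM are not reproduced in this text. The sketch below uses common textbook forms in terms of C and |s_{i,ft}|, assumed rather than quoted from the patent: IBM selects the dominant source, IRM is |s_i| / Σ_c |s_c|, and WFM is |s_i|² / Σ_c |s_c|². Definitions vary in the literature (IRM, for example, is often written as a square root of a power ratio). Function names are hypothetical.

```python
from typing import List


def ideal_binary_mask(mags: List[float], i: int) -> float:
    """1.0 if source i dominates this time-frequency unit, else 0.0."""
    return 1.0 if all(mags[i] > m for j, m in enumerate(mags) if j != i) else 0.0


def ideal_ratio_mask(mags: List[float], i: int) -> float:
    """|s_i| / sum_c |s_c| over the C sources in this unit."""
    return mags[i] / sum(mags)


def wiener_filter_mask(mags: List[float], i: int) -> float:
    """|s_i|^2 / sum_c |s_c|^2 over the C sources in this unit."""
    return mags[i] ** 2 / sum(m ** 2 for m in mags)


# One time-frequency unit with C = 2 sources (hypothetical magnitudes).
unit = [3.0, 1.0]
print(ideal_binary_mask(unit, 0), ideal_ratio_mask(unit, 0), wiener_filter_mask(unit, 0))  # 1.0 0.75 0.9
```

The ratio and Wiener masks sum to 1 over the C sources and so always lie in [0, 1], which is why a bounded activation suits the masking layer.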
The cost function is set differently for different time-frequency masking functions, but it can generally be summarized as:
where y_i is the desired target signal data, m_i is the target time-frequency mask, m̂_i is the estimated time-frequency mask output by the deep learning model, and M′ is the mask reconstruction. The two formulas differ in whether the estimation error is computed on the time-frequency mask or on the signal data.
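The two summarized cost functions are not reproduced in this text. A sketch of the two usual variants, assuming mean squared error and taking the mask reconstruction M′ to be element-wise multiplication of the estimated mask with the mixture magnitude (both assumptions). All values are made up.

```python
from typing import List


def mse(a: List[float], b: List[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def mask_approximation_loss(m_true: List[float], m_est: List[float]) -> float:
    # Error measured directly on the time-frequency mask.
    return mse(m_true, m_est)


def signal_approximation_loss(y_true: List[float], m_est: List[float], x_mix: List[float]) -> float:
    # Error measured on the signal after reconstruction, here M'(m, x) = m * x (assumed).
    y_est = [m * x for m, x in zip(m_est, x_mix)]
    return mse(y_true, y_est)


x_mix = [2.0, 4.0]    # mixture magnitudes (hypothetical)
y_true = [1.0, 3.0]   # target magnitudes
m_true = [0.5, 0.75]  # ratio mask y / x
m_est = [0.5, 0.5]    # model estimate, wrong in the louder unit
print(mask_approximation_loss(m_true, m_est))           # 0.03125
print(signal_approximation_loss(y_true, m_est, x_mix))  # 0.5
```

The same mask error is weighted by the mixture magnitude in the signal-domain loss, so loud time-frequency units dominate it.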
The masking layer in the time-frequency masking path simulates the time-frequency masking method, so that the path indirectly learns the time-frequency masking relationship between the target signal data and the noisy single-channel mixed signal data.
Because a time-frequency mask must be restricted to a certain range, such as [0, 1], the masking layer can use the sigmoid activation function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Unlike the time-frequency masking path, however, the target mapping path does not need to restrict its output to a certain range. If the time-frequency masking path and the target mapping path were merged directly, the distributions of the data output by the two paths would differ too much (the output range of the time-frequency masking path is [0, 1], while that of the target mapping path is [0, +∞)). From the model's perspective, this would unbalance the weights of the two paths (the weight of the time-frequency masking path too small and that of the target mapping path too large), seriously impairing the model's learning.
Therefore, to balance the weights of the two paths in the overall model and the difference between their output distributions, the time-frequency masking path must first simulate the output estimate of the target signal data features, i.e., compute m̂_i ⊙ x_i from the estimated mask m̂_i and the noisy mixture x_i, before it can finally be merged with the target mapping path.
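A toy illustration of the scale mismatch, with made-up numbers: the sigmoid mask lives in (0, 1) while a ReLU mapping output can be any nonnegative magnitude, so multiplying the mask by the mixture magnitude puts the masking path back on the same scale as the mapping path before the merge. All values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


mixture = [0.2, 35.0, 120.0]      # hypothetical STFT magnitudes x_i
mapping_out = [0.1, 30.0, 95.0]   # hypothetical ReLU output of the mapping path
logits = [-1.0, 2.0, 1.5]         # hypothetical pre-activations of the masking layer

mask = [sigmoid(z) for z in logits]              # confined to (0, 1)
masked = [m * x for m, x in zip(mask, mixture)]  # m_hat * x, back on the magnitude scale

print(f"mask range:    {min(mask):.2f} .. {max(mask):.2f}")
print(f"mapping range: {min(mapping_out):.2f} .. {max(mapping_out):.2f}")
print(f"masked range:  {min(masked):.2f} .. {max(masked):.2f}")
```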
Since the target mapping path and the time-frequency masking path each simulate the output estimate of the target signal data with their own independent logic, their outputs are generally not identical; moreover, the amount of data after merging the two paths is twice that of a single path and does not match the format of the target data. The data output by the two branches therefore need to be merged by the fully connected layer and arranged into the format of the target data, which then outputs the estimated target signal data features. Fig. 2 illustrates the process of merging the two paths through the fully connected layer and arranging the data format. The two-path merging process is expressed abstractly by the following formula:
$$\hat{y}_i = f\left(W_i X + W'_i X' + b\right) = f\left(\begin{bmatrix} W_i & W'_i \end{bmatrix} \begin{bmatrix} X \\ X' \end{bmatrix} + b\right)$$
where ŷ_i is the i-th feature of the estimated target signal data; W_i and W′_i are the weights of the fully connected operation for the target mapping path and the time-frequency masking path respectively, and b is the bias of the fully connected operation; X and X′ are the n × 1 outputs of the target mapping path and the time-frequency masking path respectively, and [X; X′] is the concatenation of the two outputs. In addition, f is the activation function of the fully connected layer. No particular function is prescribed for the fully connected layer of the two-path merge, but, as with the mapping layer of the target mapping path, signal data usually use the short-time Fourier transform magnitude spectrum as features, so a ReLU-family activation function can generally be used there.
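A sketch of the merge for one output feature, showing that weighting the two n × 1 outputs separately (W_i X + W′_i X′ + b) is the same as applying a single weight row of length 2n to their concatenation. The function names and numbers are hypothetical; ReLU stands in for the activation f, as the text suggests.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def merge_two_paths(w: List[float], w_prime: List[float], x: List[float], x_prime: List[float], b: float) -> float:
    """Pre-activation of output feature i: W_i X + W'_i X' + b."""
    return dot(w, x) + dot(w_prime, x_prime) + b


def merge_concatenated(row: List[float], x: List[float], x_prime: List[float], b: float) -> float:
    """The same quantity as one length-2n weight row applied to [X; X']."""
    return dot(row, x + x_prime) + b


w, w_prime = [0.2, 0.1], [0.3, 0.4]   # hypothetical weights for feature i
x, x_prime = [1.0, 2.0], [3.0, 4.0]   # n = 2 outputs of each path
a = merge_two_paths(w, w_prime, x, x_prime, 0.5)
c = merge_concatenated(w + w_prime, x, x_prime, 0.5)
print(a, c, max(0.0, a))  # both approximately 3.4; ReLU as the activation f
```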
As with the target mapping method, the final output of the two-way separation method is the estimated target signal data features, and the cost function is set as:
The method proposed by the present invention is built on deep learning and combines the strengths of the target mapping method and the time-frequency masking method, so its soundness and rationality can be argued from a deep learning perspective by hypothesis and deduction.
To simplify the description of the two-way separation method, the fully connected operation of the two-path merge can be regarded as a semi-connected operation, as shown in Fig. 3, and expressed abstractly by the following formula:
Since a ReLU-family activation function is used, this can be further simplified to:
Omitting the bias b gives:
Thus the i-th feature ŷ_i of the estimated target signal data can be regarded as the sum of the corresponding outputs of the target mapping path and the time-frequency masking path, each multiplied by its own weight.
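A sketch of why the semi-connected view simplifies the merge: a full 2n → n ReLU layer whose row i is nonzero only at the positions of x_i and x′_i computes exactly w_i x_i + w′_i x′_i, and when inputs and weights are nonnegative (magnitude features and, as assumed here, nonnegative weights) the ReLU acts as the identity. Names and values are illustrative, not from the patent.

```python
from typing import List


def semi_connected(w: List[float], w_prime: List[float], x: List[float], x_prime: List[float]) -> List[float]:
    """y_hat_i = w_i * x_i + w'_i * x'_i (bias omitted, ReLU dropped)."""
    return [a * p + c * q for a, p, c, q in zip(w, x, w_prime, x_prime)]


def full_layer_diagonal(w: List[float], w_prime: List[float], x: List[float], x_prime: List[float]) -> List[float]:
    """A full 2n -> n ReLU layer whose row i is zero except at positions i and n + i."""
    n = len(x)
    z = x + x_prime
    out = []
    for i in range(n):
        row = [0.0] * (2 * n)
        row[i], row[n + i] = w[i], w_prime[i]
        out.append(max(0.0, sum(r * v for r, v in zip(row, z))))
    return out


w, w_prime = [0.25, 0.75], [0.75, 0.25]  # nonnegative weights (assumed)
x, x_prime = [2.0, 8.0], [4.0, 4.0]      # nonnegative magnitude outputs
print(semi_connected(w, w_prime, x, x_prime))       # [3.5, 7.0]
print(full_layer_diagonal(w, w_prime, x, x_prime))  # [3.5, 7.0]
```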
From a theoretical standpoint, the branch weights are updated in a regular pattern during training. The time-frequency masking method trains faster than the target mapping method, so in the early stage of training its output is closer to the target signal data y than that of the target mapping path, as follows:
where X_t and X_{t−1} are the outputs of the target mapping path at the current and previous training iterations, X′_t and X′_{t−1} are the outputs of the time-frequency masking path at the current and previous training iterations, and y is the target signal data.
However, the time-frequency masking method has a theoretical defect: if physical interference is too strong, the time-frequency masking path, whose output is restricted to a certain range, cannot be trained effectively in the later stage of training. The target mapping path has no such theoretical defect and can continue training, so its output becomes closer to y than that of the time-frequency masking path, as follows:
Under the backpropagation mechanism of deep learning, the branch whose output is closer to y should receive a larger weight. The weight of the time-frequency masking path is therefore larger in the early stage of training, and the weight of the target mapping path is larger in the later stage. If training iteration T divides the early and late stages of training, i.e., 0 < t < T is the early stage and t > T the later stage, this can be expressed as:
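The formula referenced here is not reproduced in this text. One way to write the stated conclusion about the semi-connected merge weights w_i (target mapping path) and w′_i (time-frequency masking path) at iteration t, as a paraphrase rather than the patent's own notation, is:

```latex
\begin{cases}
w'_{i,t} > w_{i,t}, & 0 < t < T \quad \text{(early training: masking path dominates)} \\
w_{i,t} > w'_{i,t}, & t > T \quad \text{(late training: mapping path dominates)}
\end{cases}
```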
The above theoretical analysis shows that, in the early stage of model training, the time-frequency masking path trains faster and carries a larger weight than the target mapping path, and the error of the estimated target signal data ŷ output by the model decreases steadily. In the later stage, the time-frequency masking path can no longer be trained effectively, but the target mapping path can continue training and carries a larger weight than the time-frequency masking path, so the error of ŷ keeps decreasing. The overall convergence speed of the method lies between those of the two methods: slower than the time-frequency masking method but faster than the target mapping method. It does not suffer from the theoretical defect of the time-frequency masking method, and it performs better than both the time-frequency masking method and the target mapping method. The time-frequency masking path acts as an "accelerator", while the target mapping path acts as a "booster".
II. Separation stage
After data extraction and processing, the mixed signal data to be separated are fed both to the trained target mapping path for target mapping separation and to the trained time-frequency masking path for time-frequency masking separation. After merging in the fully connected layer, the target signal data features are obtained, which can then be used for subsequent waveform reconstruction of the target signal data.
In one embodiment, both the target mapping deep learning model and the time-frequency masking deep learning model are implemented with a convolutional neural network (CNN). The process is shown in Fig. 4. Because CNNs build on the backpropagation (BP) algorithm and the neocognitron and use a weight-sharing strategy, they are far less complex than other neural networks, need far fewer training parameters, and perform well.
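A back-of-the-envelope sketch of why weight sharing cuts the parameter count. The input size of 5 frames × 129 bins and the 3 × 3 kernel with 16 output channels are hypothetical choices for illustration, not the patent's architecture.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def conv2d_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """2-D convolution: each output channel shares one k_h x k_w x c_in kernel and one bias."""
    return k_h * k_w * c_in * c_out + c_out


frames, bins = 5, 129   # assumed input: 5 context frames x 129 STFT bins
n = frames * bins       # 645 input values
print(dense_params(n, n))          # 416670
print(conv2d_params(3, 3, 1, 16))  # 160, independent of the 5 x 129 input size
```

The convolution's parameter count depends only on kernel size and channel counts, not on how many time-frequency positions the kernel slides over.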
In another embodiment, both the target mapping deep learning model and the time-frequency masking deep learning model are implemented with a long short-term memory recurrent neural network (LSTM). Recurrent neural networks (RNN) were proposed by Michael I. Jordan in 1986 and Jeffrey Elman in 1990, respectively. The process of this embodiment is shown in Fig. 5. LSTM uses a "gate" structure to alleviate the long-range dependency problem of RNNs. An RNN keeps the feedforward connections and adds internal feedback connections within its processing units, so that the output state of an RNN neuron at the current time step is fed back into the same neuron at later time steps, enabling neural network training over the time domain.
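A minimal scalar sketch of the standard LSTM cell (input, forget and output gates plus a candidate cell value), to make the "gate" structure concrete. It is the textbook formulation rather than a detail taken from the patent, and the parameter layout is hypothetical.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, Tuple[float, float, float]]  # gate -> (input weight, recurrent weight, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Params) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (h_t, c_t)."""

    def pre(gate: str) -> float:
        w, u, b = p[gate]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))    # how much new content to write
    f = sigmoid(pre("forget"))   # how much old memory to keep
    o = sigmoid(pre("output"))   # how much memory to expose
    g = math.tanh(pre("cell"))   # candidate content
    c = f * c_prev + i * g       # memory cell update
    h = o * math.tanh(c)         # output, fed back at the next time step
    return h, c


zeros: Params = {gate: (0.0, 0.0, 0.0) for gate in ("input", "forget", "output", "cell")}
h, c = lstm_step(1.0, 0.0, 1.0, zeros)
print(h, c)  # c = 0.5: with all-zero parameters every gate is 0.5, so half the memory is kept
```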
In another embodiment, both models are implemented with a bidirectional long short-term memory (BLSTM) recurrent neural network; the flow is shown in Figure 6. A BLSTM combines two unidirectional LSTMs: one is associated with past (history) inputs and the other with future inputs. The outputs of the two unidirectional LSTMs are concatenated at the output layer.
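The bidirectional idea can be sketched independently of the cell type:

1. Run one recurrence forward over the sequence.
2. Run another recurrence over the reversed sequence and re-align its outputs.
3. Concatenate the two outputs at each time step.

Here a generic scalar step function stands in for a full LSTM cell. `run_direction` and `bidirectional` are hypothetical names.

```python
from typing import Callable, List, Sequence

Step = Callable[[float, float], float]  # (x_t, h_{t-1}) -> h_t

def run_direction(step: Step, xs: Sequence[float]) -> List[float]:
    h, out = 0.0, []
    for x in xs:
        h = step(x, h)
        out.append(h)
    return out

def bidirectional(fwd: Step, bwd: Step, xs: Sequence[float]) -> List[List[float]]:
    forward = run_direction(fwd, xs)                     # h_t summarizes x_1..x_t (history)
    backward = run_direction(bwd, list(xs)[::-1])[::-1]  # h_t summarizes x_t..x_T (future)
    # Concatenate both directions at every time step, as the BLSTM output layer does.
    return [[f, b] for f, b in zip(forward, backward)]
```

Suppose both directions use a running-sum step. Then each output pair holds the sum of the inputs up to t and the sum from t onward. For example, `bidirectional(lambda x, h: x + h, lambda x, h: x + h, [1.0, 2.0, 3.0])` gives `[[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]`.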
Using the dual-branch separation method proposed by the invention, single-channel signal dual-branch separation models based on CNN, LSTM and BLSTM were built in both a public-dataset environment and a real application environment. Comparative experiments against the target mapping method and the time-frequency masking method were used to verify the effectiveness of the dual-branch method and to test its practical performance.
In this embodiment, the dual-branch separation method extracts features with the short-time Fourier transform (STFT). Single-channel signal data sampled at 16 kHz are decomposed in time and frequency using a Hamming window. The window length is 256 samples and the hop (window shift) is 128 samples. This yields the STFT coefficients, and taking their modulus (|STFT|) gives the STFT magnitude spectrum. The Hamming window is:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1,$$

where N = 256 is the window length.
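This is a dependency-free sketch of the feature extraction step:

1. Frame the signal with a 256-sample Hamming window and a 128-sample hop.
2. Take a DFT of each frame.
3. Keep the magnitudes of the 129 non-negative frequency bins.

The sketch rests on these assumptions:

- It uses the symmetric window w(n) = 0.54 - 0.46 cos(2πn/(N - 1)).
- The direct O(N^2) DFT is only for readability.
- Frames that do not fit entirely inside the signal are dropped.

`stft_magnitude` is a hypothetical name.

```python
import cmath
import math

def hamming(n):
    # Symmetric Hamming window of length n.
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def stft_magnitude(x, n_fft=256, hop=128):
    """Return |STFT| as a list of frames, each with n_fft // 2 + 1 magnitudes."""
    win = hamming(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + t] * win[t] for t in range(n_fft)]
        # Direct DFT over the non-negative frequency bins (use an FFT in practice).
        spec = [sum(seg[t] * cmath.exp(-2j * math.pi * k * t / n_fft) for t in range(n_fft))
                for k in range(n_fft // 2 + 1)]
        frames.append([abs(c) for c in spec])
    return frames
```

At 16 kHz, a 256-sample window spans 16 ms and the 128-sample hop is 8 ms. Each frame gives 129 magnitudes, with bin k centred at k x 62.5 Hz.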
In addition, to account for the temporal continuity of the signal, the features of the current frame are fed into the model together with those of the two preceding frames and the two following frames.
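Stacking each frame with its two neighbours on either side could look like the following. The patent does not state how the first and last two frames are handled; this sketch repeats the edge frame. `stack_context` is a hypothetical helper.

```python
def stack_context(frames, left=2, right=2):
    """Concatenate each frame with `left` preceding and `right` following frames.

    Positions outside the sequence are clamped to the first/last frame (assumed edge handling).
    """
    n = len(frames)
    out = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            idx = min(max(k, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        out.append(window)
    return out
```

With 129-bin frames, each stacked input vector holds 5 x 129 = 645 values.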
This experiment builds single-channel signal separation models on CNN, LSTM and BLSTM for each of three methods and compares them:

- the target mapping method (suffix -MP);
- the time-frequency masking method with the ideal ratio mask (suffix -IRM);
- the dual-branch method (suffix -TB).

Table 1 lists the basic information of each model in the comparative experiments.
Table 1 Basic information of the models in the comparative experiments
The nine models above were trained on the TIMIT corpus and the NOISEX-92 noise set. Table 2 gives the number of training iterations completed when the early-stopping strategy was triggered. The dual-branch models (LSTM-TB, BLSTM-TB, CNN-TB) converge relatively quickly: their iteration counts lie between those of the target mapping models and the time-frequency masking models.
Table 2 Number of training iterations completed when each model stopped early

Network   IRM   MP    TB
LSTM      14    159   44
BLSTM     15    129   117
CNN       13    110   68
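The patent does not state the early-stopping criterion behind Table 2. A common patience-based rule, assumed here, stops once the validation loss has failed to improve for a set number of consecutive epochs. The `patience` and `min_delta` values and the function name `early_stopping_epochs` are hypothetical.

```python
def early_stopping_epochs(val_losses, patience=5, min_delta=0.0):
    """Return how many epochs run before patience-based early stopping triggers.

    val_losses: validation loss after each epoch (precomputed here for illustration).
    Training stops once `patience` consecutive epochs fail to improve on the best
    loss so far by more than `min_delta`.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs were run
```

Under such a rule, a branch whose loss plateaus early (as the text describes for time-frequency masking) stops after few iterations. A model that keeps improving slowly runs much longer.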
Table 3 compares the LSTM-based models on matched-noise signal data. It gives the SDR, SAR and SNR of the target signals estimated by LSTM-IRM, LSTM-MP and LSTM-TB at mixture signal-to-noise ratios of -6, -4, -2, 0, 2, 4 and 6 dB. Compared with LSTM-IRM and LSTM-MP respectively, the target signals estimated by LSTM-TB have:

- an SDR that is on average 1.4000 dB and 1.1250 dB higher;
- an SNR that is on average 3.4883 dB and 0.4333 dB higher.

Their SAR is on average 1.3133 dB higher than that of LSTM-MP.
Table 3 Performance comparison of the LSTM-based models on matched-noise data
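The patent does not give the procedure for creating the -6 to 6 dB mixtures. Presumably a noise recording is scaled against a clean utterance. This is a minimal sketch under that assumption, using mean power over the overlapping samples; `power` and `mix_at_snr` are hypothetical names.

```python
import math

def power(x):
    # Mean power of a signal.
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add it.

    Returns (mixture, scaled_noise), truncated to the shorter of the two inputs.
    """
    n = min(len(speech), len(noise))
    s, v = speech[:n], noise[:n]
    scale = math.sqrt(power(s) / (power(v) * 10 ** (snr_db / 10)))
    scaled = [scale * b for b in v]
    return [a + b for a, b in zip(s, scaled)], scaled
```

A mixture at -6 dB, the hardest condition in the tables, has noise power about four times that of the speech.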
Table 4 compares the BLSTM-based models on matched-noise signal data. It gives the SDR, SAR and SNR of the target signals estimated by BLSTM-IRM, BLSTM-MP and BLSTM-TB at mixture signal-to-noise ratios of -6, -4, -2, 0, 2, 4 and 6 dB. Compared with BLSTM-IRM and BLSTM-MP respectively, the target signals estimated by BLSTM-TB have:

- an SDR that is on average 1.2617 dB and 1.2567 dB higher;
- an SNR that is on average 2.9633 dB and 0.5883 dB higher;
- a SAR that is on average 0.1050 dB and 1.4233 dB higher.
Table 4 Performance comparison of the BLSTM-based models on matched-noise data
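For reference, the SNR of an estimate can be computed directly from the clean reference. The "on average higher" figures are presumably averages of per-condition differences over the seven mixture SNRs.

SDR and SAR additionally require the BSS Eval decomposition of the error (available, for example, as `mir_eval.separation.bss_eval_sources`). The sketch below covers only the plain output SNR and the averaging. `snr_db` and `mean_gain` are hypothetical names.

```python
import math

def snr_db(reference, estimate):
    """Output SNR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal / error)

def mean_gain(scores_a, scores_b):
    """Average difference between two methods' scores over the same test conditions."""
    return sum(a - b for a, b in zip(scores_a, scores_b)) / len(scores_a)
```

For example, `mean_gain(tb_sdr, irm_sdr)`, taken over the seven SNR conditions, would give the kind of average SDR improvement reported in the text. Here `tb_sdr` and `irm_sdr` are hypothetical per-condition score lists.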
Table 5 compares the CNN-based models on matched-noise signal data. It gives the SDR, SAR and SNR of the target signals estimated by CNN-IRM, CNN-MP and CNN-TB at mixture signal-to-noise ratios of -6, -4, -2, 0, 2, 4 and 6 dB. Compared with CNN-IRM and CNN-MP respectively, the target signals estimated by CNN-TB have:

- an SDR that is on average 1.0517 dB and 1.6117 dB higher;
- an SNR that is on average 2.8000 dB and 0.8850 dB higher.

Their SAR is on average 1.7767 dB higher than that of CNN-MP.
Table 5 Performance comparison of the CNN-based models on matched-noise data
Table 6 compares the LSTM-based models on mismatched-noise signal data. It gives the SDR, SAR and SNR of the target signals estimated by LSTM-IRM, LSTM-MP and LSTM-TB at mixture signal-to-noise ratios of -6, -4, -2, 0, 2, 4 and 6 dB. Compared with LSTM-IRM and LSTM-MP respectively, the target signals estimated by LSTM-TB have:

- an SDR that is on average 0.6917 dB and 1.1183 dB higher;
- an SNR that is on average 1.9933 dB and 0.5167 dB higher.

Their SAR is on average 1.2717 dB higher than that of LSTM-MP.
Table 6 Performance comparison of the LSTM-based models on mismatched-noise data
Table 7 compares the BLSTM-based models on mismatched-noise signal data. It gives the SDR, SAR and SNR of the target signals estimated by BLSTM-IRM, BLSTM-MP and BLSTM-TB at mixture signal-to-noise ratios of -6, -4, -2, 0, 2, 4 and 6 dB. Compared with BLSTM-IRM and BLSTM-MP respectively, the target signals estimated by BLSTM-TB have:

- an SDR that is on average 0.9017 dB and 0.7733 dB higher;
- an SNR that is on average 1.5400 dB and 0.0933 dB higher;
- a SAR that is on average 0.4100 dB and 1.0050 dB higher.
Table 7 Performance comparison of the BLSTM-based models on mismatched-noise data
Table 8 compares the CNN-based models on mismatched-noise signal data. It gives the SDR, SAR and SNR of the target signals estimated by CNN-IRM, CNN-MP and CNN-TB at mixture signal-to-noise ratios of -6, -4, -2, 0, 2, 4 and 6 dB. Compared with CNN-IRM and CNN-MP respectively, the target signals estimated by CNN-TB have:

- an SDR that is on average 0.6667 dB and 0.7300 dB higher;
- a SAR that is on average 0.3467 dB and 1.1367 dB higher.

Their SNR is on average 1.0633 dB higher than that of CNN-IRM.
Table 8 Performance comparison of the CNN-based models on mismatched-noise data
These experimental results show that, on the public dataset, the dual-branch method implemented with LSTM, BLSTM or CNN outperforms both the target mapping method and the time-frequency masking method. This holds for both matched and mismatched noise signal data.
Embodiment 2
This embodiment provides a single-channel signal dual-branch separation device corresponding to the separation method of Embodiment 1. The device comprises a multi-branch neural network learning model module, which includes a target mapping branch, a time-frequency masking branch and a fully connected layer, wherein:
Target mapping branch: separates the single-channel signal data using the target mapping method;
Time-frequency masking branch: separates the single-channel signal data using the time-frequency masking method;
Fully connected fusion module: fuses the data output by the target mapping branch and the time-frequency masking branch after separation, arranges them into the format of the target data, and outputs the estimated target signal features.
The target mapping branch comprises a target mapping deep learning model followed by a mapping layer. The mapping layer uses a ReLU-family function to emulate the target mapping method. It realizes the mapping relation $y_i = f(x_i)$ between the signal $y_i$ produced by the preceding target mapping deep learning model and the single-channel noisy mixture signal data $x_i$. This yields the target signal data estimated by this branch.
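This is a minimal sketch of the mapping layer. It assumes the layer is fully connected and followed by plain ReLU, one member of the ReLU family the patent names. ReLU keeps the estimated magnitudes non-negative, as a magnitude spectrum must be. `mapping_layer` and its arguments are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def mapping_layer(hidden, weights, bias):
    """Fully connected layer + ReLU: one non-negative estimated magnitude per output bin.

    hidden:  features from the target mapping deep learning model
    weights: one row per output frequency bin
    bias:    one value per output frequency bin
    """
    return [relu(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(weights, bias)]
```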
The time-frequency masking branch comprises a time-frequency masking deep learning model followed by a masking layer. The masking layer uses a sigmoid activation function to emulate the time-frequency masking method. It realizes the time-frequency masking ratio between the signal data $m_i$ produced by the preceding time-frequency masking deep learning model and the single-channel noisy mixture signal data $x_i$.
The two branches must be balanced in two respects: their weights within the overall model and the distributions of their outputs. For this reason, the time-frequency masking branch must first emulate the output of an estimated target signal feature by applying the mask to the mixture, $\hat{y}_i = m_i \odot x_i$. Only then can it be fused with the target mapping branch.
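This is a minimal sketch of the masking branch output. It assumes the mask is applied by element-wise multiplication with the mixture magnitude, and the sigmoid keeps each mask value in (0, 1).

The ideal ratio mask shown as a training target uses the common definition sqrt(S^2 / (S^2 + N^2)). The patent names the IRM but does not give its formula here. `ideal_ratio_mask` and `masking_branch` are hypothetical names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ideal_ratio_mask(speech_mag, noise_mag):
    # Common IRM target per time-frequency bin: sqrt(S^2 / (S^2 + N^2)); 0 where both are 0.
    return [math.sqrt(s * s / (s * s + n * n)) if (s or n) else 0.0
            for s, n in zip(speech_mag, noise_mag)]

def masking_branch(logits, mixture_mag):
    mask = [sigmoid(z) for z in logits]                # masking layer: values in (0, 1)
    return [m * x for m, x in zip(mask, mixture_mag)]  # y_hat = m ⊙ x (estimated target)
```

Multiplying by the mixture turns the bounded mask into a magnitude estimate on the same scale as the mapping branch's output. This is the distribution-matching step the text describes before fusion.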
The fully connected fusion module fuses the outputs of the two branches through a fully connected layer, arranges the result into the format of the target data, and outputs the estimated target signal features.
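The fusion step can be sketched as concatenating the two branch estimates and passing them through one fully connected layer. With the hand-set averaging weights used in the test, the layer reduces to a fixed 50/50 blend. In training, the weights would instead be learned, which is how the relative contribution of the two branches can shift over training as described in the convergence analysis. Names and shapes are illustrative assumptions.

```python
def fuse(mapped, masked, weights, bias):
    """Concatenate both branch outputs (length 2F) and apply a fully connected layer (F outputs)."""
    joint = list(mapped) + list(masked)
    return [sum(w * v for w, v in zip(row, joint)) + b
            for row, b in zip(weights, bias)]
```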
In this embodiment, the target mapping deep learning model in the target mapping branch and the time-frequency masking deep learning model in the time-frequency masking branch may be implemented with any of the networks described in Embodiment 1: the convolutional neural network (CNN), the long short-term memory recurrent neural network (LSTM), or the bidirectional long short-term memory recurrent neural network (BLSTM).
Embodiment 3
This embodiment provides a storage medium storing a computer program which, when run, executes the single-channel signal dual-branch separation method described in Embodiment 1.
Embodiment 4
This embodiment provides a processor configured to run a program, wherein the program, when run, executes the single-channel signal dual-branch separation method described in Embodiment 1.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. A single-channel signal dual-branch separation method, characterized by comprising the steps of:
establishing a multi-branch neural network learning model comprising a target mapping branch, a time-frequency masking branch and a fully connected layer, wherein the target mapping branch separates the single-channel signal data using a target mapping method and, in parallel, the time-frequency masking branch separates the single-channel signal data using a time-frequency masking method; the data output by the target mapping branch and the time-frequency masking branch after separation are fused by the fully connected layer and arranged into the format of the target data, and the estimated target signal features are output.
2. The single-channel signal dual-branch separation method according to claim 1, characterized in that, when the target mapping branch separates the single-channel signal data using the target mapping method, a mapping layer is connected after the target mapping deep learning model; the mapping layer uses a ReLU-family activation function to emulate the target mapping method, establishing the mapping relation between the signal output by the target mapping deep learning model and the single-channel noisy mixture signal data, thereby obtaining the target signal data estimated by the target mapping branch.
3. The single-channel signal dual-branch separation method according to claim 1, characterized in that, when the time-frequency masking branch separates the single-channel signal data using the time-frequency masking method, a masking layer is connected after the time-frequency masking deep learning model; the time-frequency masking deep learning model separates the single-channel noisy mixture signal data, and the masking layer uses a sigmoid activation function to emulate the time-frequency masking method, establishing the time-frequency masking ratio between the signal data output by the time-frequency masking deep learning model and the single-channel noisy mixture signal data; the time-frequency masking branch first emulates the output of an estimated target signal feature by applying the masking ratio to $x_i$, where $x_i$ denotes the single-channel noisy mixture signal data, and is then fused with the target mapping branch through the fully connected layer.
4. The single-channel signal dual-branch separation method according to claim 1, characterized in that the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a convolutional neural network (CNN).
5. The single-channel signal dual-branch separation method according to claim 1, characterized in that the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a long short-term memory recurrent neural network (LSTM).
6. The single-channel signal dual-branch separation method according to claim 1, characterized in that the target mapping deep learning model and the time-frequency masking deep learning model are both implemented with a bidirectional long short-term memory recurrent neural network (BLSTM).
7. A single-channel signal dual-branch separation device, characterized by comprising:
a multi-branch neural network learning model module, the module comprising a target mapping branch, a time-frequency masking branch and a fully connected layer, wherein:
the target mapping branch is configured to separate the single-channel signal data using a target mapping method;
the time-frequency masking branch is configured to separate the single-channel signal data using a time-frequency masking method;
the fully connected fusion module is configured to fuse the data output by the target mapping branch and the time-frequency masking branch after separation, arrange them into the format of the target data, and output the estimated target signal features.
8. The single-channel signal dual-branch separation device according to claim 7, characterized in that the target mapping branch comprises a target mapping deep learning model and a mapping layer; the mapping layer uses a ReLU-family activation function to emulate the target mapping method, establishing the mapping relation between the signal data output by the target mapping deep learning model and the single-channel noisy mixture signal data, thereby obtaining the target signal data estimated by the target mapping branch;
the time-frequency masking branch comprises a time-frequency masking deep learning model and a masking layer; the masking layer uses a sigmoid activation function to emulate the time-frequency masking method, establishing the time-frequency masking ratio between the signal data output by the time-frequency masking deep learning model and the single-channel noisy mixture signal data; the time-frequency masking branch first emulates the output of an estimated target signal feature by applying the masking ratio to $x_i$, where $x_i$ denotes the single-channel noisy mixture signal data, and is then fused with the target mapping branch through the fully connected layer.
9. A storage medium storing a computer program, characterized in that, when the program is run, the single-channel signal dual-branch separation method according to any one of claims 1 to 6 is executed.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the single-channel signal dual-branch separation method according to any one of claims 1 to 6.
CN201910515889.XA 2019-06-14 2019-06-14 Single channel signal two-way separation method, device, storage medium and processor Pending CN110321810A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910515889.XA CN110321810A (en) 2019-06-14 2019-06-14 Single channel signal two-way separation method, device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910515889.XA CN110321810A (en) 2019-06-14 2019-06-14 Single channel signal two-way separation method, device, storage medium and processor

Publications (1)

Publication Number Publication Date
CN110321810A (en) 2019-10-11

Family

ID=68119589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910515889.XA Pending CN110321810A (en) 2019-06-14 2019-06-14 Single channel signal two-way separation method, device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN110321810A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100130328A (en) * 2009-06-03 2010-12-13 충북대학교 산학협력단 Method to combine casa and soft mask for single-channel speech separation
CN106933649A (en) * 2016-12-21 2017-07-07 华南师范大学 Virtual machine load predicting method and system based on rolling average and neutral net
CN109841226A (en) * 2018-08-31 2019-06-04 大象声科(深圳)科技有限公司 A kind of single channel real-time noise-reducing method based on convolution recurrent neural network

Non-Patent Citations (1)

Title
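Zhang Hui (张晖): "Research on Speech Separation Based on Deep Learning" (基于深度学习的语音分离研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series *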

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN111126199A (en) * 2019-12-11 2020-05-08 复旦大学 Signal feature extraction and data mining method based on echo measurement data
CN113053400A (en) * 2019-12-27 2021-06-29 武汉Tcl集团工业研究院有限公司 Training method of audio signal noise reduction model, audio signal noise reduction method and device
CN113053400B (en) * 2019-12-27 2024-06-07 武汉Tcl集团工业研究院有限公司 Training method of audio signal noise reduction model, audio signal noise reduction method and equipment
CN111583954A (en) * 2020-05-12 2020-08-25 中国人民解放军国防科技大学 Speaker independent single-channel voice separation method
CN112289338A (en) * 2020-10-15 2021-01-29 腾讯科技(深圳)有限公司 Signal processing method and device, computer device and readable storage medium
CN112289338B (en) * 2020-10-15 2024-03-12 腾讯科技(深圳)有限公司 Signal processing method and device, computer equipment and readable storage medium
CN112259118A (en) * 2020-10-19 2021-01-22 成都明杰科技有限公司 Single track human voice and background music separation method
CN114500189A (en) * 2022-01-24 2022-05-13 华南理工大学 Direct pre-equalization method, system, device and medium for visible light communication
CN114464206A (en) * 2022-04-11 2022-05-10 中国人民解放军空军预警学院 Single-channel blind source separation method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191011