CN107154258A - Voiceprint recognition method based on negatively correlated incremental learning - Google Patents

Voiceprint recognition method based on negatively correlated incremental learning

Info

Publication number
CN107154258A
Authority
CN
China
Prior art keywords
network
training
node
trained
incremental learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710229138.2A
Other languages
Chinese (zh)
Inventor
王念滨
何鸣
宋奎勇
周连科
王红滨
孙文
王瑛琦
尹新亮
顾镇北
曾庆宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201710229138.2A priority Critical patent/CN107154258A/en
Publication of CN107154258A publication Critical patent/CN107154258A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a voiceprint recognition method based on negatively correlated incremental learning. First, the input speech signal is pre-processed and features are extracted. Second, the network ensemble is initialized; if an ensemble already exists, all current networks are duplicated. Third, the ensemble is trained. Fourth, the structure of each network in the ensemble is adjusted. Fifth, the current networks are screened and the best subset is selected. Sixth, the resulting networks are applied; when new data arrive, the procedure is repeated from the first step. The present invention studies voiceprint recognition with an incremental-learning approach and improves its efficiency and recognition accuracy in scenarios where data arrive incrementally; an incremental learning algorithm based on negative correlation learning can effectively solve the incremental problem. The present invention improves on both model training and model selection, proposes a new algorithm to solve the above problems, and applies it to incremental learning.

Description

Voiceprint recognition method based on negatively correlated incremental learning
Technical field
The present invention relates to a voiceprint recognition method.
Background technology
A voiceprint is the acoustic spectrum information, carried in the sound wave, that reflects the characteristics of a person's speech. Voiceprint recognition is the technology of automatically identifying and judging a speaker's identity from the speech parameters, reflected in the speech waveform, that contain speaker characteristics. Because every person's vocal organs differ, the sound properties and timbre of the speech they produce also differ; therefore, using the voiceprint as an individual feature to identify and verify different people is fast, stable and highly accurate, and voiceprint recognition technology has been widely applied in many fields of information technology and networking.
However, in a great many application scenarios the training data cannot be obtained all at once, and voiceprint recognition has a well-known weakness: although an individual's speech signal is relatively stable, it is not absolutely invariant. It is variable and easily affected by health, emotional state and recording equipment. The pronunciation of the same person at different times, in different physiological states, or recorded with different microphones and channel parameters may show changed acoustic features, which affects recognition performance; the traditional batch-style machine learning approach is then hard pressed to adapt.
Negative correlation learning (Negative Correlation Learning, NCL) is an artificial neural network ensemble learning algorithm that can solve the incremental problem. It is an ensemble learning algorithm based on artificial neural networks that uses the BP algorithm to train each individual neural network of the ensemble in parallel. For a training sample set {(x1,t1),...,(xn,tn)} the output of the network ensemble is:
F(n) = (1/M) Σ_{i=1..M} F_i(n)
where M is the ensemble size, i.e. the number of individual neural networks in the ensemble, and F_i(n) is the output of the i-th neural network for the n-th sample.
The most distinctive characteristic of negative correlation learning is the design of its error function, which consists of a mean-squared-error term and a penalty term. The error function of the i-th network for the n-th sample is:
E_i(n) = (1/2)(F_i(n) - t_n)^2 + λ p_i(n)
where t_n is the target output of the n-th sample, p_i(n) is the penalty term, and λ is a control parameter that adjusts the strength of the penalty; its value lies in [0,1] and it is called the penalty coefficient. When λ is 0 there is no interaction between the networks. With this error function a balance can be reached between the mean squared error, which affects individual accuracy, and the penalty term, which affects the diversity among individuals. The penalty term is defined as:
p_i(n) = (F_i(n) - F(n)) Σ_{j≠i} (F_j(n) - F(n))
where F_i(n) is the output of network i and F(n) is the output of the ensemble.
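For readers who want to see the error function in executable form, the NumPy sketch below evaluates the ensemble output, the per-network NCL error and the gradient commonly used for the weight update; the array shapes, the example values and the treatment of the other networks' outputs as constants in the gradient are assumptions of this illustration, not text of the patent.

```python
import numpy as np

def ncl_error_and_grad(outputs, target, lam=0.5):
    """Negative correlation learning terms for one sample.

    outputs : shape (M,), the outputs F_i(n) of the M networks
    target  : scalar target t_n
    lam     : penalty coefficient lambda in [0, 1]
    Returns per-network errors E_i(n) and the gradients dE_i/dF_i.
    """
    F = outputs.mean()                       # ensemble output F(n)
    dev = outputs - F
    # p_i(n) = (F_i(n) - F(n)) * sum_{j != i} (F_j(n) - F(n))
    penalty = dev * (dev.sum() - dev)
    errors = 0.5 * (outputs - target) ** 2 + lam * penalty
    # usual NCL simplification: the other networks' outputs are held fixed,
    # so dp_i/dF_i = sum_{j != i}(F_j - F) = -(F_i - F)
    grads = (outputs - target) - lam * dev
    return errors, grads

if __name__ == "__main__":
    F_i = np.array([0.2, 0.6, 0.9])          # outputs of three networks
    E, g = ncl_error_and_grad(F_i, target=1.0, lam=0.5)
    print(E)
    print(g)
```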
NCL has shown good performance in incremental learning. When NCL was first applied directly to incremental learning, researchers designed two algorithms: one keeps the ensemble size fixed (Fixed Size NCL, FSNCL), and the other grows the ensemble (Growing NCL, GNCL). Each has its own advantages and drawbacks; addressing them, Minlong Lin et al. proposed the selective negative correlation learning (Selective Negative Correlation Learning, SNCL) algorithm.
The basic idea of the algorithm is that, each time new data arrive, every neural network in the existing ensemble is duplicated. Assuming the ensemble size is N, the duplicated networks are trained on the new data set; combining all the networks gives 2N networks, from which a selection step keeps N. The resulting ensemble contains both networks trained on the new data and networks trained earlier, which keeps the ensemble size constant while allowing the retained networks to adapt to the new data and to preserve old knowledge. SNCL has good generalization performance, but its time complexity is far higher than that of other algorithms; in addition, the number of hidden-layer nodes of the individual networks is hard to determine, which easily leads to over-fitting.
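A minimal sketch of this duplicate/train/select cycle is given below. It uses scikit-learn MLP classifiers purely as stand-in networks, so the negative-correlation penalty itself is not applied, and the selection rule shown (keep the networks scoring best on the new batch) is only one possible choice; the duplication, the incremental training of the copies and the reduction from 2N back to N networks are the points being illustrated.

```python
import copy
import numpy as np
from sklearn.neural_network import MLPClassifier

def sncl_style_update(ensemble, X_new, y_new, classes, keep_n):
    """Duplicate every network, adapt the copies to the new batch, then
    keep `keep_n` of the resulting 2N networks (here: the most accurate
    on the new batch; illustrative only, the NCL penalty is omitted)."""
    copies = [copy.deepcopy(net) for net in ensemble]
    for net in copies:
        net.partial_fit(X_new, y_new, classes=classes)   # train copies on new data
    pool = ensemble + copies                             # 2N candidates
    scores = [net.score(X_new, y_new) for net in pool]
    best = np.argsort(scores)[::-1][:keep_n]
    return [pool[i] for i in best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    classes = np.array([0, 1])
    X0, y0 = rng.normal(size=(200, 13)), rng.integers(0, 2, 200)
    X1, y1 = rng.normal(size=(200, 13)), rng.integers(0, 2, 200)
    ensemble = []
    for seed in range(5):                                # initial ensemble, N = 5
        net = MLPClassifier(hidden_layer_sizes=(8,), random_state=seed)
        net.partial_fit(X0, y0, classes=classes)
        ensemble.append(net)
    ensemble = sncl_style_update(ensemble, X1, y1, classes, keep_n=5)
    print(len(ensemble), "networks retained")
```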
Summary of the invention
It is an object of the present invention to provide a voiceprint recognition method based on negatively correlated incremental learning that improves efficiency and recognition accuracy while reducing time complexity.
The object of the present invention is achieved as follows:
Step 1: pre-process the input speech signal and extract features;
Step 2: initialize the network ensemble; if an ensemble already exists, duplicate all current networks;
Step 3: train the network ensemble;
Step 4: adjust the structure of each network in the ensemble;
Step 5: screen the current networks and select the best subset of them;
Step 6: apply the currently available networks; when new data arrive, repeat the procedure from step 1.
The present invention can also include:
1. The pre-processing and feature extraction of the input speech signal specifically include: first the original speech signal is sampled and quantized, converting the analogue signal, sampled at a given frequency, into discrete data; pre-emphasis is then applied, followed by framing and windowing, so that the signal within each short period is taken as one frame and analysed frame by frame; silent segments are then removed with the double-threshold method; finally, Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC) and first-order difference MFCC are extracted from the processed speech signal as the feature parameters for model training and recognition.
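As an illustration of such a front end (not the patent's own implementation), the sketch below uses NumPy and librosa; the 25 ms frame, 10 ms hop, the 0.97 pre-emphasis coefficient, the Hamming window and the crude energy-based silence filter are assumed defaults that merely stand in for the double-threshold method described in the embodiment.

```python
import numpy as np
import librosa

def extract_mfcc_features(wav_path, n_mfcc=13, frame_s=0.025,
                          hop_s=0.010, preemph=0.97):
    """Return a (frames, 2*n_mfcc) matrix of MFCC + delta-MFCC features."""
    y, sr = librosa.load(wav_path, sr=None)              # sampled, quantized signal
    y = np.append(y[0], y[1:] - preemph * y[:-1])        # pre-emphasis
    n_fft = int(frame_s * sr)
    hop = int(hop_s * sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop,
                                window="hamming")        # framing + windowing + MFCC
    delta = librosa.feature.delta(mfcc, order=1)         # first-order difference MFCC
    feats = np.vstack([mfcc, delta]).T
    # crude energy-based removal of silent frames (stand-in for the
    # double-threshold method described in the embodiment)
    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)[0]
    n = min(len(feats), len(rms))
    return feats[:n][rms[:n] > 0.5 * rms[:n].mean()]
```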
2. Step 2 specifically includes: if no network training has been carried out before, i.e. the currently available data set is the first batch, M neural networks are initialized for the network ensemble, where the number m of hidden nodes of each neural network is determined by the following formula:
m = sqrt(n + l) + α
Otherwise the currently trained networks are duplicated. In the formula, n is the number of input nodes and l the number of output nodes; the numbers of input and output nodes are determined by the features of the input samples and the problem to be solved, and α is a random number on the interval [1,10].
3. Training the network ensemble specifically includes: resampling the training data to obtain D different data sets, where the number of data sets equals the number of networks, i.e. D = M; then training in parallel with the negative correlation learning method, introducing a steepness factor during training to speed up network training. If the networks have been duplicated, i.e. the current number of networks is 2M, training is carried out on the duplicated networks; otherwise the original networks are trained.
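The sketch below illustrates this combination on tiny single-hidden-layer regression networks written directly in NumPy: a bootstrap data set is drawn for each network, every network is updated with the NCL gradient, and the sigmoid activation carries a steepness factor. The network size, learning rate, steepness value and number of epochs are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x, k=2.0):
    """Logistic activation with steepness factor k > 1 to sharpen the slope."""
    return 1.0 / (1.0 + np.exp(-k * x))

class TinyNet:
    """One-hidden-layer regression network, just enough to show NCL training."""
    def __init__(self, n_in, n_hidden=6):
        self.W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.W2 = rng.normal(scale=0.5, size=n_hidden)
    def forward(self, x):
        self.h = sigmoid(x @ self.W1)
        return float(self.h @ self.W2)
    def backward(self, x, grad_out, lr=0.05, k=2.0):
        # grad_out is dE/dF_i supplied by the negative-correlation rule
        dh = self.W2 * grad_out * k * self.h * (1.0 - self.h)
        self.W2 -= lr * self.h * grad_out
        self.W1 -= lr * np.outer(x, dh)

def bagging_ncl_train(X, y, M=5, lam=0.5, epochs=20):
    nets = [TinyNet(X.shape[1]) for _ in range(M)]
    # D = M bootstrap data sets, one per network (the Bagging part)
    boot = [set(rng.integers(0, len(X), len(X)).tolist()) for _ in range(M)]
    for _ in range(epochs):
        for n in range(len(X)):
            outs = np.array([net.forward(X[n]) for net in nets])
            F = outs.mean()                               # ensemble output
            for i, net in enumerate(nets):
                if n not in boot[i]:
                    continue                              # each net sees its own bootstrap set
                grad = (outs[i] - y[n]) - lam * (outs[i] - F)   # NCL gradient
                net.backward(X[n], grad)
    return nets

if __name__ == "__main__":
    X, y = rng.normal(size=(100, 13)), rng.normal(size=100)
    print("trained", len(bagging_ncl_train(X, y)), "networks")
```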
4. Adjusting the structure of each network in the network ensemble specifically includes: after training ends, it is judged whether the preset target error is met; if so, the next step is carried out. Otherwise, for each network trained in step 3, the importance of each of its hidden nodes is computed; for each unimportant node, the important node with which it is most strongly correlated is found, and the two most-correlated nodes are merged by merging their corresponding weights. The importance η_i is computed from σ_i, the standard deviation of hidden node i, obtained from the outputs of that hidden node over all sample data, and μ_i, the number of times the node has been trained so far.
The formula for computing the correlation of two nodes is:
C_ij = Σ_p (h_i(p) - h̄_i)(h_j(p) - h̄_j) / sqrt( Σ_p (h_i(p) - h̄_i)² · Σ_p (h_j(p) - h̄_j)² )
where h_i(p) and h_j(p) are the outputs of the i-th and j-th hidden nodes, respectively, on sample p in the training set, and h̄_i and h̄_j are the averages of h_i and h_j.
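The following sketch shows how such a merge could be carried out on the hidden-to-output weights. Because the exact importance formula is not reproduced here, the sketch uses the standard deviation σ_i alone to rank the nodes (the μ_i factor is omitted), and the 20% quantile used to separate unimportant from important nodes, as well as merging by summing outgoing weights, are illustrative assumptions.

```python
import numpy as np

def merge_least_important_node(H, W_out, quantile=0.2):
    """H     : (samples, hidden) matrix of hidden-node outputs h_i(p)
       W_out : (hidden, outputs) hidden-to-output weights
       Merges the least important node into the important node with which
       it is most strongly correlated, then removes it.
       Simplifications: importance is ranked by sigma_i only (mu_i omitted),
       and 'merging the weights' is realized by summing the outgoing weights."""
    sigma = H.std(axis=0)                            # sigma_i of each hidden node
    threshold = np.quantile(sigma, quantile)
    unimportant = np.where(sigma <= threshold)[0]
    important = np.where(sigma > threshold)[0]
    if unimportant.size == 0 or important.size == 0:
        return H, W_out
    corr = np.corrcoef(H, rowvar=False)              # Pearson correlation C_ij
    i = int(unimportant[np.argmin(sigma[unimportant])])
    j = int(important[np.argmax(np.abs(corr[i, important]))])
    W_out[j] += W_out[i]                             # merge corresponding weights
    keep = [k for k in range(H.shape[1]) if k != i]
    return H[:, keep], W_out[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H = rng.normal(size=(50, 6))
    H[:, 5] = 0.5 + 0.01 * rng.normal(size=50)       # one nearly constant node
    W = rng.normal(size=(6, 2))
    H2, W2 = merge_least_important_node(H, W)
    print(H2.shape, W2.shape)                        # (50, 5) (5, 2)
```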
The present invention studies voiceprint recognition with an incremental-learning approach and improves its efficiency and recognition accuracy in scenarios where data arrive incrementally. An incremental learning algorithm based on negative correlation learning can effectively solve the incremental problem, but the existing selective negative correlation learning method (SNCL), which has the highest classification accuracy, suffers from excessive time complexity, and the number of hidden-layer nodes of the retained individual networks is hard to determine, which easily causes over-fitting. The present invention improves on both model training and model selection, proposes a new algorithm to solve the above problems, and then applies it to incremental learning.
The present invention uses an incremental learning algorithm based on negative correlation learning (Negative Correlation Learning, NCL) as the voiceprint recognition algorithm. Among existing related algorithms, the fixed-size FSNCL (Fixed Size NCL) easily forgets previously learned knowledge, the growing GNCL (Growing NCL) has poor overall generalization, and the selective-ensemble-based SNCL (Selective NCL) is more accurate but slow to train. The present invention improves SNCL in two respects, model training and model selection. First, the NCL algorithm modifies BP networks so that the errors of the individual networks change in negatively correlated directions during training, which increases the diversity between networks; but it neglects the networks themselves, which during training easily suffer from a hard-to-determine number of hidden-layer nodes, long training time and over-fitting. The present invention improves the NCL algorithm with respect to these problems so that the network structure adapts automatically, and combines it with the Bagging algorithm; the result is called the ANCLBag algorithm. The algorithm avoids setting the number of hidden nodes purely by hand: it adaptively adds or deletes hidden-layer nodes according to the current state of training, reducing human error. During adjustment it exploits the network's own information-processing capacity and keeps the number of nodes as small as possible, which on the one hand reduces the number of training iterations and on the other hand lowers the risk of over-fitting caused by unnecessary hidden nodes. Moreover, after combining with Bagging, the two methods influence the construction of the whole ensemble from the sides of training-set generation and network training respectively, further increasing the differences between individual networks and guaranteeing overall generalization. Then, on the basis of this algorithm, the present invention studies incremental learning. In this study the present invention borrows the framework of SNCL and modifies its model-selection method, adopting a selective ensemble method based on clustering and ranking that considers both the accuracy and the diversity of the models; the result is a new algorithm called SANCLBag. Experiments show that the generalization performance of this algorithm is slightly better than that of SNCL, and that it is clearly superior in time complexity.
Finally, the model is applied to voiceprint recognition, realizing a voiceprint recognition model that can pre-process speech signals, extract features, train the model incrementally and perform pattern recognition. Parameter-selection and incremental-verification experiments carried out with the model show that it achieves high recognition accuracy and can effectively solve the incremental-learning problem.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is a comparison of the proposed algorithm and the algorithm before improvement in terms of classification accuracy;
Fig. 3 is a comparison of the proposed algorithm and the algorithm before improvement in terms of running time;
Fig. 4 is a comparison of the classification accuracy of the proposed algorithm and other traditional algorithms.
Embodiment
The present invention is described in more detail below.
With reference to Fig. 1, the method of the invention is realized by the following steps:
Step 1: pre-process the speech signal and extract features. First, the original speech signal is sampled and quantized: the analogue signal is sampled at a given frequency, realizing analogue-to-digital conversion and yielding discrete data. Pre-emphasis is then applied so that the high-frequency part is emphasized and interference from the lower frequencies is filtered out. The result of the previous step is then framed and windowed: a speech signal has short-time validity, i.e. it is relatively stationary within 10 ms to 30 ms, so the signal within such a period is taken as one frame and the analysis is carried out frame by frame. Silent segments are then removed with the double-threshold method. Finally, MFCC and first-order difference MFCC are extracted from the processed speech signal as the feature parameters for model training and recognition;
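The double-threshold method is usually realized with a short-time energy threshold combined with a zero-crossing-rate threshold. The sketch below is one such realization; the frame sizes, the threshold ratios and the toy test signal are assumptions of the illustration, not parameters given by the patent.

```python
import numpy as np

def double_threshold_vad(y, sr, frame_ms=25, hop_ms=10,
                         energy_ratio=0.1, zcr_ratio=1.5):
    """Return a boolean mask over frames: True = speech, False = silence.
    Thresholds are set relative to the maximum energy / mean zero-crossing
    rate of the utterance; the ratios are illustrative defaults."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(y) - frame) // hop)
    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for t in range(n_frames):
        seg = y[t * hop: t * hop + frame]
        energy[t] = np.sum(seg ** 2)                          # short-time energy
        zcr[t] = np.mean(np.abs(np.diff(np.sign(seg)))) / 2.0 # zero-crossing rate
    high = energy > energy_ratio * energy.max()               # clearly voiced frames
    low = (energy > 0.1 * energy_ratio * energy.max()) | (zcr > zcr_ratio * zcr.mean())
    # extend sure-speech regions while the lower threshold is still exceeded
    speech = high.copy()
    for t in range(1, n_frames):
        speech[t] |= speech[t - 1] and low[t]
    for t in range(n_frames - 2, -1, -1):
        speech[t] |= speech[t + 1] and low[t]
    return speech

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    y = np.concatenate([np.zeros(sr // 2),                    # leading silence
                        0.5 * np.sin(2 * np.pi * 220 * t),    # "speech"
                        np.zeros(sr // 2)])                   # trailing silence
    mask = double_threshold_vad(y, sr)
    print(mask.sum(), "of", mask.size, "frames kept")
```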
Step 2: if no network training has been carried out before, i.e. the currently available data set is the first batch, initialize M neural networks for the ensemble, where the number m of hidden nodes of each neural network is determined by the following empirical formula:
m = sqrt(n + l) + α
Otherwise duplicate the currently trained networks. In the formula, n is the number of input nodes and l the number of output nodes; the numbers of input and output nodes are determined by the features of the input samples and the problem to be solved, and α is a random number on the interval [1,10];
Step 3: resample the training data to obtain D different data sets, where the number of data sets equals the number of networks, i.e. D = M. Then train in parallel with the negative correlation learning method, introducing a steepness factor during training to speed up network training. If the networks were duplicated in the previous step, i.e. the current number of networks is 2M, training is carried out on the duplicated networks; otherwise the original networks are trained;
Step 4: after training ends, judge whether the preset target error is met; if so, carry out the next step. Otherwise, for each network trained in step 3, compute the importance of each of its hidden nodes. For each unimportant node, find the important node with which it is most strongly correlated (apart from the unimportant nodes, all remaining nodes count as important), and merge the two most-correlated nodes by merging their corresponding weights. The importance η_i is computed from the following two quantities.
Here σ_i is the standard deviation of hidden node i, computed from the outputs of that hidden node over all sample data; this value reflects well how the node's output changes across different samples. When the standard deviation is small, the hidden node outputs nearly the same value for all samples; in other words, the node discriminates poorly between different samples. In a three-layer feed-forward network the hidden layer is directly connected to the output layer, so the variation of a hidden node's response to different data directly affects the network output; a node whose output hardly varies therefore contributes little and can be regarded as unimportant to the network. The other parameter, μ_i, is the number of times the node has been trained so far; the training count is used because the adjustment operation creates new nodes, and a new node's training count differs from that of earlier nodes. The formula for computing the correlation of two nodes is as follows:
C_ij = Σ_p (h_i(p) - h̄_i)(h_j(p) - h̄_j) / sqrt( Σ_p (h_i(p) - h̄_i)² · Σ_p (h_j(p) - h̄_j)² )
where h_i(p) and h_j(p) are the outputs of the i-th and j-th hidden nodes, respectively, on sample p in the training set, and h̄_i and h̄_j are the averages of h_i and h_j, computed after all samples have been trained. If, after training, the reduction of the overall error is not obvious, a hidden node is added; the method is to randomly select a node and split it;
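The patent does not spell out how the selected node is split, so the sketch below uses one common choice as an assumption: the chosen node is duplicated, its incoming weights are slightly perturbed, and its outgoing weights are shared between the two copies so that the network output is approximately preserved.

```python
import numpy as np

def split_random_hidden_node(W_in, W_out, rng=None, noise=0.05):
    """W_in  : (inputs, hidden) input-to-hidden weights
       W_out : (hidden, outputs) hidden-to-output weights
       Returns enlarged (W_in, W_out) with one randomly chosen hidden node
       split into two. The perturbation size is an illustrative assumption."""
    rng = rng or np.random.default_rng()
    i = rng.integers(W_in.shape[1])                     # randomly selected node
    w_in_new = W_in[:, i] * (1 + noise * rng.standard_normal(W_in.shape[0]))
    W_in = np.column_stack([W_in, w_in_new])            # add the new node
    w_out_half = W_out[i] / 2.0                         # share the outgoing weight
    W_out[i] = w_out_half
    W_out = np.vstack([W_out, w_out_half[None, :]])
    return W_in, W_out

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    W1, W2 = rng.normal(size=(13, 6)), rng.normal(size=(6, 2))
    W1, W2 = split_random_hidden_node(W1, W2, rng)
    print(W1.shape, W2.shape)                           # (13, 7) (7, 2)
```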
Step 5: if the current number of networks is 2M, cluster all the networks, putting the most correlated networks into the same cluster, and then select from each cluster the network with the highest classification accuracy; if the current number of networks is M, go directly to the next step;
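One way to realize this selection step is to cluster the candidate networks by the correlation of their predictions on a validation set and keep the most accurate member of each cluster. The sketch below does this with scikit-learn's agglomerative clustering on a precomputed correlation distance (in older scikit-learn versions the metric argument is named affinity); the validation set, the toy candidates represented only by their prediction vectors, and the use of exactly M clusters are assumptions of the illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def select_by_clustering(pred_matrix, y_val, n_keep):
    """pred_matrix : (networks, samples) predicted labels of each candidate network
       y_val       : validation labels
       n_keep      : M, the number of networks to retain
       Groups the most correlated networks together and keeps the most
       accurate network from each cluster."""
    corr = np.corrcoef(pred_matrix)                      # network-to-network correlation
    dist = 1.0 - np.abs(corr)                            # highly correlated => close
    labels = AgglomerativeClustering(
        n_clusters=n_keep, metric="precomputed", linkage="average"
    ).fit_predict(dist)
    acc = (pred_matrix == y_val).mean(axis=1)            # per-network accuracy
    keep = [int(np.where(labels == c)[0][np.argmax(acc[labels == c])])
            for c in range(n_keep)]
    return sorted(keep)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    y_val = rng.integers(0, 2, 100)
    preds = np.array([np.where(rng.random(100) < 0.8, y_val, 1 - y_val)
                      for _ in range(10)])               # 2M = 10 candidate networks
    print("kept networks:", select_by_clustering(preds, y_val, n_keep=5))
```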
Step 6: use the networks obtained at this point in the application; if new data arrive later, return to step 1.

Claims (5)

1. A voiceprint recognition method based on negatively correlated incremental learning, characterized by:
Step 1: pre-processing the input speech signal and extracting features;
Step 2: initializing the network ensemble; if an ensemble already exists, duplicating all current networks;
Step 3: training the network ensemble;
Step 4: adjusting the structure of each network in the ensemble;
Step 5: screening the current networks and selecting the best subset of them;
Step 6: applying the currently available networks; when new data arrive, repeating the procedure from step 1.
2. The voiceprint recognition method based on negatively correlated incremental learning according to claim 1, characterized in that the pre-processing and feature extraction of the input speech signal specifically include: first sampling and quantizing the original speech signal and converting the analogue signal, sampled at a given frequency, into discrete data; then applying pre-emphasis, followed by framing and windowing, so that the signal within each short period is taken as one frame and analysed frame by frame; then removing silent segments with the double-threshold method; and finally extracting Mel-frequency cepstral coefficients, i.e. MFCC, and first-order difference MFCC from the processed speech signal as the feature parameters for model training and recognition.
3. The voiceprint recognition method based on negatively correlated incremental learning according to claim 2, characterized in that step 2 specifically includes: if no network training has been carried out before, i.e. the currently available data set is the first batch, initializing M neural networks for the network ensemble, where the number m of hidden nodes of each neural network is determined by the following formula:
m = sqrt(n + l) + α
otherwise duplicating the currently trained networks; in the formula, n is the number of input nodes and l the number of output nodes, the numbers of input and output nodes being determined by the features of the input samples and the problem to be solved, and α being a random number on the interval [1,10].
4. The voiceprint recognition method based on negatively correlated incremental learning according to claim 3, characterized in that training the network ensemble specifically includes: resampling the training data to obtain D different data sets, the number of data sets being equal to the number of networks, i.e. D = M; then training in parallel with the negative correlation learning method, a steepness factor being introduced during training to speed up network training; if the networks have been duplicated, i.e. the current number of networks is 2M, carrying out training on the duplicated networks, otherwise training the original networks.
5. The voiceprint recognition method based on negatively correlated incremental learning according to claim 4, characterized in that adjusting the structure of each network in the ensemble specifically includes: after training ends, judging whether the preset target error is met; if so, carrying out the next step; otherwise, for each network trained in step 3, computing the importance of each of its hidden nodes, finding for each unimportant node the important node with which it is most strongly correlated, and merging the two most-correlated nodes by merging their corresponding weights, the importance η_i being computed from σ_i, the standard deviation of hidden node i obtained from the outputs of that hidden node over all sample data, and μ_i, the number of times the node has been trained so far;
the formula for computing the correlation of two nodes being:
C_ij = Σ_p (h_i(p) - h̄_i)(h_j(p) - h̄_j) / sqrt( Σ_p (h_i(p) - h̄_i)² · Σ_p (h_j(p) - h̄_j)² )
where h_i(p) and h_j(p) are respectively the outputs of the i-th and j-th hidden nodes on sample p in the training set, and h̄_i and h̄_j are respectively the averages of h_i and h_j.
CN201710229138.2A 2017-04-10 2017-04-10 Voiceprint recognition method based on negatively correlated incremental learning Pending CN107154258A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710229138.2A CN107154258A (en) Voiceprint recognition method based on negatively correlated incremental learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710229138.2A CN107154258A (en) Voiceprint recognition method based on negatively correlated incremental learning

Publications (1)

Publication Number Publication Date
CN107154258A true CN107154258A (en) 2017-09-12

Family

ID=59792651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710229138.2A Pending CN107154258A (en) Voiceprint recognition method based on negatively correlated incremental learning

Country Status (1)

Country Link
CN (1) CN107154258A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097183A (en) * 2018-01-29 2019-08-06 松下电器(美国)知识产权公司 Information processing method and information processing system
CN110660399A (en) * 2019-11-11 2020-01-07 广州国音智能科技有限公司 Training method and device for voiceprint recognition, terminal and computer storage medium
CN111310836A (en) * 2020-02-20 2020-06-19 浙江工业大学 Method and device for defending voiceprint recognition integrated model based on spectrogram

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102735442A (en) * 2012-07-17 2012-10-17 华东理工大学 Method for online monitoring and fault diagnosis of rotor
CN103149878A (en) * 2011-12-06 2013-06-12 中国科学院沈阳计算技术研究所有限公司 Self-adaptive learning system of numerical control machine fault diagnosis system in multi-agent structure
CN105021511A (en) * 2015-07-28 2015-11-04 昆明理工大学 Perforated tipping paper air permeability detection method applied to tobacco industry
CN105160248A (en) * 2015-07-02 2015-12-16 哈尔滨工程大学 Correlation pruning neural network based identification system and method for malicious process of Xen virtual machine
CN105373830A (en) * 2015-12-11 2016-03-02 中国科学院上海高等研究院 Prediction method and system for error back propagation neural network and server
CN105572492A (en) * 2015-10-22 2016-05-11 北京建筑大学 City rail train auxiliary inverter fault diagnosis device
CN106446942A (en) * 2016-09-18 2017-02-22 兰州交通大学 Crop disease identification method based on incremental learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103149878A (en) * 2011-12-06 2013-06-12 中国科学院沈阳计算技术研究所有限公司 Self-adaptive learning system of numerical control machine fault diagnosis system in multi-agent structure
CN102735442A (en) * 2012-07-17 2012-10-17 华东理工大学 Method for online monitoring and fault diagnosis of rotor
CN105160248A (en) * 2015-07-02 2015-12-16 哈尔滨工程大学 Correlation pruning neural network based identification system and method for malicious process of Xen virtual machine
CN105021511A (en) * 2015-07-28 2015-11-04 昆明理工大学 Perforated tipping paper air permeability detection method applied to tobacco industry
CN105572492A (en) * 2015-10-22 2016-05-11 北京建筑大学 City rail train auxiliary inverter fault diagnosis device
CN105373830A (en) * 2015-12-11 2016-03-02 中国科学院上海高等研究院 Prediction method and system for error back propagation neural network and server
CN106446942A (en) * 2016-09-18 2017-02-22 兰州交通大学 Crop disease identification method based on incremental learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Haibo He et al.: "Incremental Learning from Stream Data", IEEE Transactions on Neural Networks *
Md. Monirul Islam, Md. Abdus Sattar, Md. Faijul Amin et al.: "A New Adaptive Merging and Growing Algorithm for Designing Artificial Neural Networks", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics *
Lin Minlong (林民龙): doctoral dissertation, University of Science and Technology of China, 30 January 2013 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097183A (en) * 2018-01-29 2019-08-06 松下电器(美国)知识产权公司 Information processing method and information processing system
CN110097183B (en) * 2018-01-29 2024-03-01 松下电器(美国)知识产权公司 Information processing method and information processing system
CN110660399A (en) * 2019-11-11 2020-01-07 广州国音智能科技有限公司 Training method and device for voiceprint recognition, terminal and computer storage medium
CN111310836A (en) * 2020-02-20 2020-06-19 浙江工业大学 Method and device for defending voiceprint recognition integrated model based on spectrogram
CN111310836B (en) * 2020-02-20 2023-08-18 浙江工业大学 Voiceprint recognition integrated model defending method and defending device based on spectrogram

Similar Documents

Publication Publication Date Title
Cai et al. Sensor network for the monitoring of ecosystem: Bird species recognition
CN108172218B (en) Voice modeling method and device
CN102982809B (en) Conversion method for sound of speaker
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization based
CN112006697B (en) Voice signal-based gradient lifting decision tree depression degree recognition system
CN110164476A (en) A kind of speech-emotion recognition method of the BLSTM based on multi output Fusion Features
CN108766419A (en) A kind of abnormal speech detection method based on deep learning
CN112581979B (en) Speech emotion recognition method based on spectrogram
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN108806667A (en) The method for synchronously recognizing of voice and mood based on neural network
CN110299142B (en) Voiceprint recognition method and device based on network convergence
CN108281146A (en) A kind of phrase sound method for distinguishing speek person and device
CN112331216A (en) Speaker recognition system and method based on composite acoustic features and low-rank decomposition TDNN
CN110827857B (en) Speech emotion recognition method based on spectral features and ELM
AU2020102038A4 (en) A speaker identification method based on deep learning
CN110379441B (en) Voice service method and system based on countermeasure type artificial intelligence network
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN107154258A (en) Method for recognizing sound-groove based on negatively correlated incremental learning
CN108877812B (en) Voiceprint recognition method and device and storage medium
CN109272986A (en) A kind of dog sound sensibility classification method based on artificial neural network
Silva et al. Evolving spiking neural networks for recognition of aged voices
Cao et al. Speaker-independent speech emotion recognition based on random forest feature selection algorithm
CN108492821B (en) Method for weakening influence of speaker in voice recognition
Ye et al. Attention bidirectional LSTM networks based mime speech recognition using sEMG data
CN114299925A (en) Method and system for obtaining importance measurement index of dysphagia symptom of Parkinson disease patient based on voice

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170912

RJ01 Rejection of invention patent application after publication