CN112542161A - BP neural network voice recognition method based on double-layer PID optimization - Google Patents


Info

Publication number
CN112542161A
CN112542161A (application CN202011455918.7A; granted publication CN112542161B)
Authority
CN
China
Prior art keywords
layer
neural network
output
error
pid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011455918.7A
Other languages
Chinese (zh)
Other versions
CN112542161B (en)
Inventor
和思铭
李伟觐
曾文钰
范晨奥
刘世新
汪雨琦
吴英然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Institute of Applied Chemistry of CAS
Original Assignee
Changchun Institute of Applied Chemistry of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun Institute of Applied Chemistry of CAS filed Critical Changchun Institute of Applied Chemistry of CAS
Priority to CN202011455918.7A priority Critical patent/CN112542161B/en
Publication of CN112542161A publication Critical patent/CN112542161A/en
Application granted granted Critical
Publication of CN112542161B publication Critical patent/CN112542161B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
        • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00 - Speech recognition
                    • G10L 15/08 - Speech classification or search
                        • G10L 15/16 - Speech classification or search using artificial neural networks
                    • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
                        • G10L 15/063 - Training
        • G06 - COMPUTING; CALCULATING OR COUNTING
            • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 - Computing arrangements based on biological models
                    • G06N 3/02 - Neural networks
                        • G06N 3/04 - Architecture, e.g. interconnection topology
                            • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
                            • G06N 3/045 - Combinations of networks
                            • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
                        • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T 10/00 - Road transport of goods or passengers
                    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
                        • Y02T 10/40 - Engine management systems

Abstract

The invention relates to a BP neural network voice recognition method based on double-layer PID optimization, which takes an FPGA as the platform for voice-signal input and adjusts the weight thresholds and the learning rate with a PID algorithm. The three parameters K_P, K_I and K_D of the double-layer PID algorithm are adjusted automatically according to the system and the training-result error E_g(k), so that the weight-threshold convergence of the hidden layer and the output layer is more stable and the data fluctuation of the system is reduced. The outer-layer PID algorithm synchronizes the updating of the learning rate with the training process of the neural network: it provides a larger updating intensity in the early stage of training so that the network iterates rapidly, and reduces the updating intensity in the later stage to prevent the data from deviating from the correct value, so that the accuracy of voice recognition is higher.

Description

BP neural network voice recognition method based on double-layer PID optimization
The technical field is as follows:
The invention relates to an artificial-intelligence algorithm, in particular to a BP neural network speech recognition method optimized by a double-layer PID.
Background art:
As an important application direction in the field of artificial intelligence, speech recognition has been studied by many researchers, for example speech recognition with deep learning and speech recognition with support vector machines. Among learning algorithms, the BP algorithm has long been known for its strong nonlinear mapping capability and simple structure and has been widely used in speech recognition, but it also has defects: the weight thresholds and the learning rate cannot be determined at network initialization, convergence fluctuates strongly if they are set too large, and convergence is slow if they are set too small. Existing weight-threshold updating formulas rely almost entirely on the negative-gradient algorithm; although the negative gradient accelerates convergence of the weight thresholds to some degree, the numerical fluctuation it causes is too large and interferes with normal convergence. Meanwhile, the learning rate is basically tuned manually through repeated experiments; existing variable-learning-rate algorithms merely decrease the learning rate linearly, which reduces its influence in the later stage of the algorithm and avoids the output deviating from the correct result through fluctuation caused by an overly strong update, but contributes nothing to the recognition accuracy itself, and a general BP neural network structure can show low accuracy when recognizing music data.
CN103639211A discloses a roll-gap control method and system with BP neural network and PID parameter optimization, providing a method that optimizes the PID structure with a neural network so that the PID parameters, and therefore the algorithm, are more stable.
CN110488600A discloses an LQR-optimized neural network PID controller for brushless DC motor speed regulation, providing an algorithm that uses LQR to optimize the neural network's adjustment of the PID so that DC-motor control is more stable.
CN104834215A discloses a BP neural network PID control algorithm optimized by mutation particle swarm; the method makes the PID algorithm's output more stable.
The invention content is as follows:
the invention aims to provide a BP neural network speech recognition method based on double-layer PID optimization aiming at the defects of the prior art.
The invention idea is as follows: and the FPGA with the voice recognition function is used as a platform of a voice signal input method. The double-layer PID is divided into an inner layer and an outer layer, the inner layer PID algorithm reduces data fluctuation caused by a negative gradient algorithm, so that the weight threshold updating process is converged smoothly, and the oscillation generated during convergence is reduced; meanwhile, the outer-layer PID algorithm synchronizes the updating of the learning rate with the training process of the neural network, provides a larger updating intensity in the early stage of the neural network training to enable the neural network to iterate rapidly, reduces the updating intensity in the later stage of the neural network training to prevent data from deviating from a correct value, and has higher voice recognition accuracy.
The purpose of the invention is realized by the following technical scheme:
Firstly, a three-layer BP neural network is built, consisting of an input layer, a hidden layer and an output layer; the weighting coefficients W_ij(k) (between the input layer and the hidden layer) and W_jg(k) (between the hidden layer and the output layer) and the activation-function parameters a_j(k) and b_g(k) between the layers are generated randomly, a learning rate η(k) is selected, and k is set to 1;
Secondly, the FPGA platform extracts voice data from the voice to be recognized; the BP neural network analyzes the extracted data X_i(k) and calculates the output O_g(k) of its output layer, computes the error E_g(k) against the desired output Y_g(k), and uses the error E_g(k) with the proportional parameter K_P, integral parameter K_I and differential parameter K_D of the proportional-integral-differential (PID) algorithm to adjust η(k);
Then the error E_g(k) and the adjusted η(k) are used to correct the weighting coefficients W_ij(k) and W_jg(k) and the activation-function parameters a_j(k) and b_g(k) of the BP neural network until the output error of the output layer meets the requirement; finally it is judged whether the input voice is the set voice signal;
the method comprises the following steps:
A. carrying out training initialization preparation, initializing a neural network structure, and acquiring a sample set for training;
B. carrying out feature extraction on the sample set to obtain a feature set;
C. training the neural network with the feature set as the training set; during training the desired output Y_g(k) and the actual output O_g(k) give the error E_g(k); the error E_g(k) is used as a parameter of the outer-layer PID algorithm to adjust the learning rate η(k), and then the error E_g(k) and the adjusted learning rate η(k+1) are used as inner-layer PID algorithm parameters to adjust the weights W_ij(k) and thresholds a_j(k) of the input-to-hidden layer and the weights W_jg(k) and thresholds b_g(k) of the output layer;
D. testing the adjusted neural network: features of the test sample are extracted as in step B and the test sample is recognized by the neural network as in step C to obtain the error E_g(k); when the error E_g(k) is smaller than a set threshold, the neural-network training is finished and it is judged whether the input voice signal is the set voice signal.
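Steps A–D can be sketched as a minimal end-to-end training loop. This is an illustration, not the patented implementation: sigmoid activations, random toy feature vectors standing in for the MFCC features, and the PID gain values are all assumptions; only the outer-layer learning-rate adaptation is shown, with a plain negative-gradient inner update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step A: 3 input, 6 hidden, 3 output neurons (shape from the description)
N1, N2, N3 = 3, 6, 3
Wij = rng.normal(0.0, 0.5, (N1, N2))   # input->hidden weights W_ij
aj = np.zeros(N2)                      # hidden-layer thresholds a_j
Wjg = rng.normal(0.0, 0.5, (N2, N3))   # hidden->output weights W_jg
bg = np.zeros(N3)                      # output-layer thresholds b_g
eta = 0.2                              # initial learning rate eta(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step B: toy feature vectors standing in for the MFCC features (made up)
X = rng.uniform(0.0, 1.0, (8, N1))
Y = rng.uniform(0.2, 0.8, (8, N3))

# Step C: train; the outer PID adapts eta from the averaged output error
KP, KI, KD = 0.02, 0.0005, 0.01        # illustrative PID gains
err_sum = err_prev = 0.0
errors = []
for k in range(200):
    H = sigmoid(X @ Wij - aj)          # H_j = f(net_j - a_j)
    O = sigmoid(H @ Wjg - bg)          # O_g = G(net_g - b_g)
    E = Y - O                          # E_g(k) = Y_g(k) - O_g(k)
    e_bar = E.mean()                   # error averaged over the outputs
    err_sum += e_bar
    eta = float(np.clip(eta + KP * e_bar + KI * err_sum
                        + KD * (e_bar - err_prev), 1e-3, 1.0))
    err_prev = e_bar
    # inner update: here a plain negative-gradient step scaled by eta
    dO = E * O * (1 - O)
    dH = (dO @ Wjg.T) * H * (1 - H)
    Wjg += eta * H.T @ dO; bg -= eta * dO.sum(0)
    Wij += eta * X.T @ dH; aj -= eta * dH.sum(0)
    errors.append(float(np.abs(E).mean()))

# Step D: training stops when the error falls below a set threshold
```

The mean absolute error in `errors` decreases over the iterations, which is the stopping quantity step D compares against its threshold.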
In step C, the error E_g(k) is used as an outer-layer PID algorithm parameter to adjust the learning rate; the outer-layer PID adjustment formula is:
η(k+1) = η(k) + K_P · Ē(k) + K_I · Σ_{s=1…k} Ē(s) + K_D · ĒD(k)
where s = 1, 2, …, k, k is the current iteration number, N_3 is the number of output-layer neurons, N_3 = 3, and where
Ē(k) = (1/N_3) · Σ_{g=1…N_3} E_g(k),  ĒD(k) = (1/N_3) · Σ_{g=1…N_3} E_gD(k),
E_gD(k) = E_g(k) − E_g(k−1).
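In code, the outer-layer adjustment can be read as follows. This is a sketch reconstructed from the textual definitions (an averaged proportional term, an accumulated integral term, and the averaged difference term E_gD(k)); the function name, the zero-initialized previous error at k = 1, and the gain values are illustrative.

```python
def outer_pid_eta(eta, E_hist, KP, KI, KD, N3=3):
    """Outer-layer PID adjustment of the learning rate eta(k).

    E_hist holds the per-iteration error vectors [E_1(s), ..., E_N3(s)]
    for s = 1..k. The functional form is a reconstruction from the text:
    averaged proportional term, accumulated integral term, and the
    averaged difference term E_gD(k) = E_g(k) - E_g(k-1)."""
    Ek = E_hist[-1]
    P = sum(Ek) / N3                                 # averaged current error
    I = sum(sum(Es) / N3 for Es in E_hist)           # accumulated over s = 1..k
    Ekm1 = E_hist[-2] if len(E_hist) > 1 else [0.0] * N3
    D = sum(e - ep for e, ep in zip(Ek, Ekm1)) / N3  # averaged E_gD(k)
    return eta + KP * P + KI * I + KD * D

# one step with illustrative numbers: eta(2) from eta(1) = 0.5
eta2 = outer_pid_eta(0.5, [[0.1, 0.2, 0.3]], KP=0.1, KI=0.01, KD=0.05)
```

At k = 1 the proportional, integral and difference terms all equal the averaged current error 0.2, giving eta2 = 0.5 + 0.1·0.2 + 0.01·0.2 + 0.05·0.2 = 0.532.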
In step C, the error E_g(k) and the adjusted learning rate η(k+1) are used as inner-layer PID algorithm parameters to adjust the weights W_ij(k) between the input layer and the hidden layer, the hidden-layer thresholds a_j(k), the weights W_jg(k) between the hidden layer and the output layer, and the output-layer thresholds b_g(k); the inner-layer PID adjustment formulas are as follows,
the updating formula of the weights between the input layer and the hidden layer is:
[equation image in the original]
the updating formula of the hidden-layer thresholds is:
[equation image in the original]
where N_3 is the number of output-layer neurons, N_3 = 3,
the updating formula of the weights between the hidden layer and the output layer is:
[equation image in the original]
the updating formula of the output-layer thresholds is:
[equation image in the original]
where s = 1, 2, …, k, k is the current iteration number, N_3 is the number of output-layer neurons, N_3 = 3, and where
Ē(k) = (1/N_3) · Σ_{g=1…N_3} E_g(k),  ĒD(k) = (1/N_3) · Σ_{g=1…N_3} E_gD(k),
E_gD(k) = E_g(k) − E_g(k−1).
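The inner-layer update formulas themselves survive only as equation images in the original, but the P/I/D structure they describe (a proportional term, an accumulated term and a difference term applied to an error-driven correction) can be sketched as follows. All names, gains and the choice of correction term are illustrative, not the patented formulas.

```python
import numpy as np

class InnerPID:
    """PID-filtered parameter update (sketch of the P/I/D structure only;
    the patented update formulas are not reproduced here)."""

    def __init__(self, shape, KP, KI, KD):
        self.KP, self.KI, self.KD = KP, KI, KD
        self.acc = np.zeros(shape)    # integral accumulator (sum over s = 1..k)
        self.prev = np.zeros(shape)   # previous correction, for the D term

    def step(self, W, corr, eta):
        """Apply one PID-filtered update to parameter array W.

        corr is an error-driven correction term (e.g. the negative gradient);
        eta is the learning rate eta(k+1) adapted by the outer-layer PID."""
        self.acc += corr
        delta = (self.KP * corr
                 + self.KI * self.acc
                 + self.KD * (corr - self.prev))
        self.prev = corr.copy()
        return W + eta * delta

# With K_I = K_D = 0 this reduces to a plain proportional step:
pid = InnerPID((2,), KP=1.0, KI=0.0, KD=0.0)
W = pid.step(np.zeros(2), np.array([1.0, 2.0]), eta=0.1)
```

Compared with a raw negative-gradient step, the integral term carries a memory of past corrections and the difference term reacts to their change, which is the smoothing effect the description attributes to the inner-layer PID.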
Advantageous effects: the PID algorithm adjusts the weight thresholds and the learning rate; the three parameters K_P, K_I and K_D of the double-layer PID algorithm are adjusted automatically according to the system and the training-result error E_g(k), so the weight-threshold convergence of the hidden layer and the output layer is more stable and the data fluctuation of the system is reduced; the learning rate provides a larger updating intensity in the early stage to help the system update quickly and reduces the updating intensity in the later stage to prevent the data from deviating from the correct value, so the speech recognition accuracy is higher.
Description of the drawings:
FIG. 1 is a running diagram of a speech recognition method based on an FPGA platform
FIG. 2 is a flowchart of the double-layer PID algorithm updating the weight thresholds and the learning rate
FIG. 3 is a diagram of a BP neural network structure
FIG. 4 is a flow chart of neural network training
FIG. 5 is a simulation display diagram of four music characteristic signals
FIG. 6 is a graph of convergence weight threshold for training of general neural network structure
FIG. 7 is a graph of convergence of training weight threshold of neural network structure after double-layer PID optimization
FIG. 8 is a graph of the convergence of learning rate in training of general neural network structure
FIG. 9 is a neural network structure training learning rate convergence diagram after double-layer PID optimization
FIG. 10 is a graph of the recognition accuracy of the general neural network structure on the four music characteristic signals
FIG. 11 is a graph of the recognition accuracy of the neural network structure optimized by the double-layer PID on the four music characteristic signals.
The specific implementation mode is as follows:
the invention is described in further detail below with reference to the following figures and examples:
a BP neural network voice recognition method with double-layer PID optimization,
Firstly, a three-layer BP neural network is built, consisting of an input layer, a hidden layer and an output layer. The weighting coefficients W_ij(k) (between the input layer and the hidden layer) and W_jg(k) (between the hidden layer and the output layer) and the activation-function parameters a_j(k) and b_g(k) between the layers are generated randomly, a learning rate η(k) is selected, and k is set to 1;
Secondly, the FPGA platform extracts voice data from the voice to be recognized; the BP neural network analyzes the extracted data X_i(k) and calculates the output O_g(k) of its output layer, computes the error E_g(k) against the desired output Y_g(k), and uses the error E_g(k) with the proportional parameter K_P, integral parameter K_I and differential parameter K_D of the proportional-integral-differential (PID) algorithm to adjust η(k);
Then the error E_g(k) and the adjusted η(k) are used to correct the weighting coefficients W_ij(k) and W_jg(k) and the activation-function parameters a_j(k) and b_g(k) of the BP neural network until the output error of the output layer meets the requirement; finally it is judged whether the input voice is the set voice signal;
the method comprises the following steps:
a: initializing an FPGA platform based on a double-layer PID optimization BP neural network, and taking three input layer neurons, six hidden layer neurons and three output layer neurons as shown in FIG. 3. Denote the ith input layer neuron input data as Xi(k) (ii) a Recording the weight value between the ith input layer neuron and the jth hidden layer neuron as Wij(k) (ii) a Let the hidden layer threshold for the jth hidden layer neuron be aj(k) (ii) a Recording the weight value between the jth hidden layer neuron and the g output layer neuron as Wjg(k) (ii) a Let the g-th output layer neuron threshold be bg(k) (ii) a Let the g output layer neuron output value be Og(k) The expected output value and the output value error value are recorded as Yg(k) And Eg(k) (ii) a The learning rate is noted as η (k). Wherein i is the number of neurons in the input layer, j is the number of neurons in the hidden layer, g is the number of neurons in the output layer, and k is the current iteration number.Generating proportional parameter K in inner PID and outer PID algorithm simultaneouslyPIntegral parameter KIDifferential parameter KD
B: The FPGA platform extracts voice data from the voice signal by A/D conversion and then performs feature extraction on the extracted voice data with the MFCC method to obtain a feature-vector set. The input-layer vector is denoted Z(k) = (X_1(k), X_2(k), X_3(k)), where X_i (i = 1, 2, 3) is the input of the i-th of the three input-layer neurons; in the first iteration the input layer takes Z(1) = (X_1(1), X_2(1), X_3(1)) as input, the first input of the first neuron being X_1(1). After Z(k) is input, the hidden layer first performs weighted summation of the data, with result:
net_j = Σ_{i=1…N_1} W_ij(k) · X_i(k)
where N_1 is the number of input-layer neurons, N_1 = 3;
The weighted sum net_j is then processed with the hidden-layer threshold, recorded as H_j = f(net_j − a_j), where f is the activation function, a_j is the hidden-layer threshold, and H_j is the input value to the output layer.
The output layer then performs weighted summation of its input values H_j (j = 1, 2, …, 6), with result:
net_g = Σ_{j=1…N_2} W_jg(k) · H_j
where N_2 is the number of hidden-layer neurons, N_2 = 6;
The output-layer weighted sum is then processed with the threshold, recorded as O_g = G(net_g − b_g), where G is the activation function, b_g is the output-layer threshold, and O_g is the output value of the output layer, i.e. the final output of the neural network. The error is recorded as E_g(k) = Y_g(k) − O_g(k); the first output neuron's error is E_1(1) = Y_1(1) − O_1(1), the second E_2(1) = Y_2(1) − O_2(1), and the third E_3(1) = Y_3(1) − O_3(1).
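The forward pass just described (3-6-3 network, net_j = Σ W_ij·X_i, H_j = f(net_j − a_j), net_g = Σ W_jg·H_j, O_g = G(net_g − b_g), E_g = Y_g − O_g) can be written out directly. Sigmoid activations and the concrete input/target values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, N3 = 3, 6, 3                      # layer sizes from step A

Wij = rng.normal(size=(N1, N2))           # W_ij, input -> hidden
aj = rng.normal(size=N2)                  # hidden thresholds a_j
Wjg = rng.normal(size=(N2, N3))           # W_jg, hidden -> output
bg = rng.normal(size=N3)                  # output thresholds b_g

def f(x):                                 # activation (sigmoid assumed)
    return 1.0 / (1.0 + np.exp(-x))
G = f                                     # same activation on the output layer

Z = np.array([0.2, 0.5, 0.9])             # Z(k) = (X_1(k), X_2(k), X_3(k))
net_j = Z @ Wij                           # net_j = sum_i W_ij * X_i
H = f(net_j - aj)                         # H_j = f(net_j - a_j)
net_g = H @ Wjg                           # net_g = sum_j W_jg * H_j
O = G(net_g - bg)                         # O_g = G(net_g - b_g)

Y = np.array([1.0, 0.0, 0.0])             # desired output (illustrative)
E = Y - O                                 # E_g(k) = Y_g(k) - O_g(k)
```

With sigmoid activations every O_g lies strictly between 0 and 1, so each error component E_g stays bounded.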
C: The BP neural network performs backward error propagation and updates the learning rate η(k), the weights W_ij(k) and hidden-layer thresholds a_j(k), and the weights W_jg(k) and output-layer thresholds b_g(k). The outer-layer PID updating formula of the learning rate η(k) is:
η(k+1) = η(k) + K_P · Ē(k) + K_I · Σ_{s=1…k} Ē(s) + K_D · ĒD(k)
where s = 1, 2, …, k, k is the current iteration number, and N_3 is the number of output-layer neurons, N_3 = 3, with
Ē(k) = (1/N_3) · Σ_{g=1…N_3} E_g(k),  ĒD(k) = (1/N_3) · Σ_{g=1…N_3} E_gD(k).
The summation and averaging in the formula convert the several per-neuron adjustment values into one, so that η responds to the change of the whole neural network; this averaged error serves as the proportional adjustment parameter and lets η adjust rapidly. Taking the second iteration as an example, the proportional term is (1/3) · (E_1(2) + E_2(2) + E_3(2)).
the parameter is used as an integral adjustment parameter, error accumulation is carried out on time, a stable adjustment is maintained, inertia during parameter updating is reduced, and system oscillation is reduced. Take the second iteration of the first output neuron as an example: e1(2)+E1(1) (ii) a Wherein EgD(k)=Eg(k)-Eg(k-1) as a differential regulation parameter, the parameter can reflect the output error change of the neural network in advance, and an effective parameter is introduced into the systemThe signal is corrected early, so that the action speed of the system is accelerated, and the adjusting time is reduced. Take the second iteration of the first output neuron as an example: e1D(2)=E1(2)-E1(1)。
Taking the second iteration of the outer-layer PID update formula as an example:
η(3) = η(2) + K_P · Ē(2) + K_I · (Ē(1) + Ē(2)) + K_D · ĒD(2).
the learning rate updated by using the inner PID updating algorithm can provide a larger updating intensity at the early stage of the operation of the neural network, so that the system can quickly iterate and correct parameters, the updating intensity is reduced at the later stage of the operation of the system, and the result is prevented from deviating from the correct value due to data oscillation;
after the learning rate is updated, the updated learning rate eta (k) and the output error E are usedg(k) As a parameter in an inner PID updating formula, then updating a weight between an input layer and a hidden layer, a hidden layer threshold, a weight between the hidden layer and an output layer threshold;
the updating formula of the weight between the input layer and the hidden layer is as follows:
Figure BDA0002828377040000066
the formula for updating the weight between the input layer and the hidden layer is exemplified by the second iteration:
Figure BDA0002828377040000067
the updating formula of the hidden layer threshold value is as follows:
Figure BDA0002828377040000071
N3taking N as the number of output layers3=3。
The hidden layer threshold formula is exemplified by the second iteration:
Figure BDA0002828377040000072
the updating formula of the weight between the hidden layer and the output layer is as follows:
Figure BDA0002828377040000073
the weight formula between the hidden layer and the output layer is exemplified by the second iteration:
Figure BDA0002828377040000074
the updating formula of the output layer threshold value is as follows:
Figure BDA0002828377040000075
the update formula of the output layer threshold takes the second iteration as an example:
Figure BDA0002828377040000076
the negative gradient algorithm only considers the current state of the neural network, but does not consider the past state and the future state, and the weight threshold value adjustment data in the iteration process has large fluctuation and is easy to deviate from correct output. And updating the weight threshold value by using an inner-layer PID algorithm, and then updating the weight threshold value according to the error return value, so that the weight threshold value can be stably iterated, and the purposes of stronger system stability and higher output result accuracy are achieved.
D: The adjusted neural-network algorithm is tested: features of the test sample are extracted as in step B, the sample is then recognized as in steps B and C, and if the error is below a set threshold, the training is finished and the result is output.
According to the simulation-experiment results, the same data were processed with a common neural network and with the neural network optimized by the double-layer PID structure. The four types of characteristic signals are shown in FIG. 5; the second and third types of music characteristic signal are very similar, so the final identification accuracy for the second or third type may not be ideal.
The convergence of the hidden-layer and output-layer weight thresholds of the common neural network is shown in FIG. 6, and that of the neural network with the double-layer PID structure in FIG. 7; the weight-threshold convergence under double-layer PID optimization is visibly more stable.
The learning rate after training with the common neural network structure is shown in FIG. 8, and the learning rate of the network trained under double-layer PID optimization in FIG. 9; clear convergence can be seen, and continuously converging the learning rate according to the feedback result is clearly sounder than a fixed learning rate.
The results of the common-structure neural network are shown in FIG. 10 and those of the network optimized by the double-layer PID structure in FIG. 11. The accuracy of the optimized network on the third type of music is far higher than that of the common structure, while recognition of the other three types is almost unaffected, so the overall accuracy is greatly improved.

Claims (3)

1. A BP neural network speech recognition method based on double-layer PID optimization, characterized in that:
firstly, a three-layer BP neural network is built, consisting of an input layer, a hidden layer and an output layer; the weighting coefficients W_ij(k) (between the input layer and the hidden layer) and W_jg(k) (between the hidden layer and the output layer) and the activation-function parameters a_j(k) and b_g(k) between the layers are generated randomly, a learning rate η(k) is selected, and k is set to 1;
secondly, the FPGA platform extracts voice data from the voice to be recognized; the BP neural network analyzes the extracted data X_i(k) and calculates the output O_g(k) of its output layer, computes the error E_g(k) against the desired output Y_g(k), and uses the error E_g(k) with the proportional parameter K_P, integral parameter K_I and differential parameter K_D of the proportional-integral-differential (PID) algorithm to adjust η(k);
then the error E_g(k) and the adjusted η(k) are used to correct the weighting coefficients W_ij(k) and W_jg(k) and the activation-function parameters a_j(k) and b_g(k) of the BP neural network until the output error of the output layer meets the requirement; finally it is judged whether the input voice is the set voice signal;
the method comprises the following steps:
A. carrying out training initialization preparation, initializing a neural network structure, and acquiring a sample set for training;
B. carrying out feature extraction on the sample set to obtain a feature set;
C. training the neural network with the feature set as the training set; during training the desired output Y_g(k) and the actual output O_g(k) give the error E_g(k); the error E_g(k) is used as a parameter of the outer-layer PID algorithm to adjust the learning rate η(k), and then the error E_g(k) and the adjusted learning rate η(k+1) are used as inner-layer PID algorithm parameters to adjust the weights W_ij(k) and thresholds a_j(k) of the input-to-hidden layer and the weights W_jg(k) and thresholds b_g(k) of the output layer;
D. testing the adjusted neural network: features of the test sample are extracted as in step B and the test sample is recognized by the neural network as in step C to obtain the error E_g(k); when the error E_g(k) is smaller than a set threshold, the neural-network training is finished and it is judged whether the input voice signal is the set voice signal.
2. The method according to claim 1, wherein in step C the error E_g(k) is used as an outer-layer PID algorithm parameter to adjust the learning rate, the outer-layer PID adjustment formula being:
η(k+1) = η(k) + K_P · Ē(k) + K_I · Σ_{s=1…k} Ē(s) + K_D · ĒD(k)
where s = 1, 2, …, k, k is the current iteration number, N_3 is the number of output-layer neurons, N_3 = 3, and where
Ē(k) = (1/N_3) · Σ_{g=1…N_3} E_g(k),  ĒD(k) = (1/N_3) · Σ_{g=1…N_3} E_gD(k),
E_gD(k) = E_g(k) − E_g(k−1).
3. The method according to claim 1, wherein in step C the error E_g(k) and the adjusted learning rate η(k+1) are used as inner-layer PID algorithm parameters to adjust the weights W_ij(k) between the input layer and the hidden layer, the hidden-layer thresholds a_j(k), the weights W_jg(k) between the hidden layer and the output layer, and the output-layer thresholds b_g(k); the inner-layer PID adjustment formulas are as follows,
the updating formula of the weights between the input layer and the hidden layer is:
[equation image in the original]
the updating formula of the hidden-layer thresholds is:
[equation image in the original]
where N_3 is the number of output-layer neurons, N_3 = 3,
the updating formula of the weights between the hidden layer and the output layer is:
[equation image in the original]
the updating formula of the output-layer thresholds is:
[equation image in the original]
where s = 1, 2, …, k, k is the current iteration number, N_3 is the number of output-layer neurons, N_3 = 3, and where
Ē(k) = (1/N_3) · Σ_{g=1…N_3} E_g(k),  ĒD(k) = (1/N_3) · Σ_{g=1…N_3} E_gD(k),
E_gD(k) = E_g(k) − E_g(k−1).
CN202011455918.7A 2020-12-10 2020-12-10 BP neural network voice recognition method based on double-layer PID optimization Active CN112542161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011455918.7A CN112542161B (en) 2020-12-10 2020-12-10 BP neural network voice recognition method based on double-layer PID optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011455918.7A CN112542161B (en) 2020-12-10 2020-12-10 BP neural network voice recognition method based on double-layer PID optimization

Publications (2)

Publication Number Publication Date
CN112542161A true CN112542161A (en) 2021-03-23
CN112542161B CN112542161B (en) 2022-08-12

Family

ID=75018429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011455918.7A Active CN112542161B (en) 2020-12-10 2020-12-10 BP neural network voice recognition method based on double-layer PID optimization

Country Status (1)

Country Link
CN (1) CN112542161B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113267993A (en) * 2021-04-22 2021-08-17 上海大学 Network training method and device based on collaborative learning
CN113411456A (en) * 2021-06-29 2021-09-17 中国人民解放军63892部队 Voice quality assessment method and device based on speech recognition

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101667012A (en) * 2008-09-03 2010-03-10 Changchun Institute of Technology Reinforcement-learning adaptive PID control method for a distribution static synchronous compensator
US9053431B1 (en) * 2010-10-26 2015-06-09 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
CN108073985A (en) * 2016-11-14 2018-05-25 Zhang Sujing Artificial-intelligence voice recognition method incorporating ultra-deep learning
CN108445742A (en) * 2018-02-07 2018-08-24 Guangdong University of Technology Intelligent PID control method for a gas suspension platform
CN109034390A (en) * 2018-08-07 2018-12-18 Hebei University of Technology Phase-angle and amplitude PID adaptive method for three-dimensional magnetic feature measurement based on BP neural network
CN109492883A (en) * 2018-10-18 2019-03-19 Shandong Vocational College of Industry Energy-efficiency artificial intelligence online analysis system and method
CN109991842A (en) * 2019-03-14 2019-07-09 Hefei University of Technology Neural-network-based piano tuning method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAYU LIU, et al.: "Research on the Application of PID Control with Neural Network and Parameter Adjustment Method of PID Controller", Association for Computing Machinery *
CONG Shuang, et al.: "Multivariable adaptive PID-type neural network controller and its design method", Information and Control *

Also Published As

Publication number Publication date
CN112542161B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN112542161B (en) BP neural network voice recognition method based on double-layer PID optimization
US20180046915A1 (en) Compression of deep neural networks with proper use of mask
CN107729999A Deep neural network compression method considering matrix correlation
CN109671423B (en) Non-parallel text-to-speech conversion method under limited training data
CN111477247B GAN-based speech adversarial example generation method
CN107689224A Deep neural network compression method with proper use of mask
CN111045326A (en) Tobacco shred drying process moisture prediction control method and system based on recurrent neural network
CN112766399B (en) Self-adaptive neural network training method for image recognition
CN112330487A (en) Photovoltaic power generation short-term power prediction method
CN114403486A Intelligent control method for an airflow cut-tobacco drier based on a local-peak-encoding recurrent network
CN107273971B (en) Feed-forward neural network structure self-organization method based on neuron significance
CN108446506B (en) Uncertain system modeling method based on interval feedback neural network
CN108734116B Face recognition method based on a variable-speed-learning deep autoencoder network
CN114596567A (en) Handwritten digit recognition method based on dynamic feedforward neural network structure and growth rate function
CN111522240B (en) MTMLP-ARX model of four-rotor aircraft, identification method, system and storage medium
CN114881134A Federated domain adaptation method for heterogeneous data
CN112069876A (en) Handwriting recognition method based on adaptive differential gradient optimization
CN111444787A (en) Fully intelligent facial expression recognition method and system with gender constraint
Seman et al. The optimization of artificial neural networks connection weights using genetic algorithms for isolated spoken Malay parliamentary speeches
JP3039408B2 (en) Sound classification method
Gouvea et al. Diversity-based model reference for genetic algorithms in dynamic environment
Sevinov et al. Algorithms for Synthesis of Adaptive Control Systems Based on the Neural Network Approach
Ghule Implementation of Optimal Hidden Neurons using a Fuzzy Controller
Nishida et al. Automatic speech recognition based on adaptation and clustering using temporal-difference learning.
CN116360254A (en) Automatic setting method for fractional order PID controller parameters based on MPSO-BP algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant