CN106919710A

CN106919710A - Dialect classification method based on convolutional neural networks

Info

Publication number: CN106919710A
Application number: CN201710144714.3A
Authority: CN
Inventors: 伍家松; 魏黎明; 邱诗洁; 杨淳沨; 孔佑勇; 朱小贝; 舒华忠
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2017-03-13
Filing date: 2017-03-13
Publication date: 2017-07-04

Abstract

The invention discloses a dialect classification method based on convolutional neural networks, comprising the following steps: (1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them; (2) scaling all pictures in the training set and test set to color images of a predefined size and assigning label information to each picture, the label information indicating the county-level city to which the picture belongs; (3) building a convolutional neural network whose levels are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and training said network using gradient descent and the back-propagation algorithm; (4) after training is complete, obtaining a plot of the declining error rate during training. The beneficial effect of the invention is that classifying two-dimensional images with a convolutional neural network achieves good classification results and substantially improves the accuracy of dialect classification.

Description

Dialect classification method based on convolutional neural networks

Technical field

The present invention relates to the field of convolutional neural network applications, and in particular to a dialect classification method based on convolutional neural networks.

Background art

Convolutional neural networks are a type of artificial neural network and have become a research focus in speech analysis and image recognition. Their weight-sharing network structure makes them more similar to biological neural networks, reduces the complexity of the network model, and reduces the number of weights. This advantage is more pronounced when the network input is a multidimensional image: the image can be used directly as the network input, which avoids the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multilayer perceptron specially designed to recognize two-dimensional shapes; this network structure is highly invariant to translation, scaling, tilting, and other forms of deformation.

A convolutional neural network is a multilayer neural network in which each layer consists of multiple two-dimensional planes and each plane consists of multiple independent neurons. The input image is convolved with three trainable filters, with a bias added, producing three feature maps at layer C1. Each group of four pixels in a feature map is then summed, weighted, and biased, and passed through a sigmoid function to obtain the three feature maps of layer S2. These maps are filtered again to obtain layer C3. This hierarchy then produces S4 in the same way as S2. Finally, the pixel values are rasterized and concatenated into a vector that is fed to a conventional neural network to produce the output.
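
The C1 and S2 stages described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 6 × 6 input, the 3 × 3 filter values, and the names `convolve_valid` and `subsample_2x2` are hypothetical and do not come from the invention.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def convolve_valid(image, kernel, bias):
    """C layer: slide the kernel over the image (no padding) and add a bias.

    Implemented as cross-correlation, which is the usual convention in CNNs.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[bias + sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def subsample_2x2(feature_map, weight, bias):
    """S layer: sum each non-overlapping 2x2 block, weight it, add a bias,
    and pass the result through a sigmoid."""
    out = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            block_sum = (feature_map[r][c] + feature_map[r][c + 1]
                         + feature_map[r + 1][c] + feature_map[r + 1][c + 1])
            row.append(sigmoid(weight * block_sum + bias))
        out.append(row)
    return out

# Hypothetical 6x6 single-channel input and one 3x3 filter.
image = [[float((r + c) % 3) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0]] * 3
c1 = convolve_valid(image, kernel, bias=0.0)    # 4x4 feature map
s2 = subsample_2x2(c1, weight=0.25, bias=0.0)   # 2x2 feature map
```

A 6 × 6 input with a 3 × 3 filter gives a 4 × 4 map at C1, and the 2 × 2 subsampling halves each side again at S2.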

In general, a C layer is a feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer and extracts the local feature; once that local feature is extracted, its positional relationship to other features is also determined. An S layer is a feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on a plane have equal weights. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, which gives the feature maps shift invariance.

In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced, which lowers the complexity of network parameter selection. Each feature extraction layer (C layer) in a convolutional neural network is followed by a computational layer (S layer) that performs local averaging and a second extraction; this distinctive two-stage feature extraction structure gives the network a high tolerance to distortion of input samples during recognition.
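
To make the effect of weight sharing concrete, the short count below compares a convolutional layer with a fully connected layer producing the same number of outputs. The layer sizes (a 32 × 32 single-channel input, six 5 × 5 filters) are illustrative assumptions, not figures from the invention.

```python
def shared_weight_params(kernel_h, kernel_w, in_channels, out_maps):
    # One kernel plus one bias per output feature map, reused at every position.
    return out_maps * (kernel_h * kernel_w * in_channels + 1)

def fully_connected_params(in_h, in_w, in_channels, out_h, out_w, out_maps):
    # Every output neuron has its own weight for every input value, plus a bias.
    n_in = in_h * in_w * in_channels
    n_out = out_h * out_w * out_maps
    return n_out * (n_in + 1)

# Assumed sizes: 32x32x1 input, six 5x5 filters, 28x28 output maps.
conv_params = shared_weight_params(5, 5, 1, 6)
dense_params = fully_connected_params(32, 32, 1, 28, 28, 6)
print(conv_params, dense_params)  # 156 4821600
```

With these assumed sizes, sharing weights reduces the parameter count by more than four orders of magnitude.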

Summary of the invention

The technical problem to be solved by the invention is to provide a dialect classification method based on convolutional neural networks that can classify and recognize spectrogram images of dialect audio.

To solve the above technical problem, the invention provides a dialect classification method based on convolutional neural networks, comprising the following steps:

(1) build a sample set containing dialects from multiple places, preprocess the samples, and label them;

(2) scale all pictures in the training set and test set to color images of a predefined size, and assign label information to each picture, the label information indicating the county-level city to which the picture belongs;

(3) build a convolutional neural network whose levels are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and train said network using gradient descent and the back-propagation algorithm;

(4) after training is complete, obtain a plot of the declining error rate during training.

Preferably, in step (1), the sample set is preprocessed by converting the audio files into spectrograms and removing the margins from the spectrograms.
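
The invention does not specify how the spectrogram is computed. One common approach, shown here purely as an assumed sketch, is a short-time Fourier transform: the signal is cut into overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform forms one column of the picture. The frame length, hop size, and the name `spectrogram` are hypothetical choices.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT; a real pipeline would use an FFT library instead.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Synthetic test signal: a 100 Hz tone sampled at 640 Hz for one second.
rate = 640
tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With a 64-sample frame at 640 Hz each bin spans 10 Hz, so the 100 Hz tone peaks in bin 10. Mapping the magnitudes to colors then yields a picture such as the ones the method classifies.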

Preferably, in step (1), the sample set includes dialect samples from multiple places.

Preferably, in step (2), all pictures are uniformly scaled to 227 × 227 color pictures.

Preferably, in step (3), the convolutional neural network is the classic AlexNet network structure, in which the first layer is the input layer, receiving a 227 × 227 color image as input, and the last layer is the output layer with N nodes in total, N being the total number of classes in the dialect dataset to be classified.

Preferably, in step (3), the gradient descent algorithm proceeds as follows: starting from an arbitrary point, move a certain distance in the direction opposite to the gradient at that point, then move a certain distance in the direction opposite to the gradient at the new position, and iterate in this way. The solution thus always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, that is, the point at which the error value is smallest.

Preferably, in step (3), the back-propagation algorithm proceeds as follows: when gradient descent is used to find the minimum of the error, the weights are updated successively from the last layer of the network toward the first; the weights are updated by the back-propagation method, that is, by the chain rule of differentiation, which is as follows:

\frac{d z}{d x} = \frac{d z}{d y} \cdot \frac{d y}{d x} .

Preferably, in step (4), the training samples and test samples, that is, all samples, are trained in batches and the weights are continually updated until the value of the objective function converges to a value within a stable region, that is, until the error rate converges to a stable value.

The beneficial effect of the invention is that classifying two-dimensional images with a convolutional neural network achieves good classification results and substantially improves the accuracy of dialect classification.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the method of the invention.

Fig. 2 shows the trends of the objective function and the error rate when the convolutional neural network of the invention is used for dialect classification.

Detailed description

As shown in Fig. 1, a dialect classification method based on convolutional neural networks comprises the following steps:

(1) build a sample set containing dialects from multiple places, preprocess the samples, and label them; the sample set is preprocessed by converting the audio files into spectrograms and removing the margins from the spectrograms; the sample set includes dialect samples from multiple places;

(2) scale all pictures in the training set and test set to color images of a predefined size, and assign label information to each picture, the label information indicating the county-level city to which the picture belongs; all pictures are uniformly scaled to 227 × 227 color pictures;

(3) build a convolutional neural network whose levels are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and train said network using gradient descent and the back-propagation algorithm;

(4) train on all samples in batches, continually updating the weights, until the value of the objective function converges to a value within a stable region, that is, until the error rate converges to a stable value; after training is complete, obtain a plot of the declining error rate during training.
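
The batch training and stopping condition of step (4) can be illustrated with a much smaller model than the network of the invention. The sketch below assumes a one-dimensional logistic classifier, a hypothetical tolerance `tol` for deciding that the objective has stabilized, and toy data; it records the objective and error rate per epoch, which is the kind of history a trend plot is drawn from.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_in_batches(samples, batch_size=4, learning_rate=0.5,
                     max_epochs=200, tol=1e-4, seed=0):
    """Mini-batch gradient descent on a 1-D logistic classifier.

    Returns the weights and a per-epoch history of (objective, error_rate).
    """
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    history = []
    for _ in range(max_epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # p - t is the gradient of the cross-entropy with respect to the logit.
            residuals = [(sigmoid(w * x + b) - t, x) for x, t in batch]
            w -= learning_rate * sum(r * x for r, x in residuals) / len(batch)
            b -= learning_rate * sum(r for r, _ in residuals) / len(batch)
        # Objective: mean cross-entropy, written as log(1 + exp(-margin)).
        objective = sum(math.log1p(math.exp(-(w * x + b) * (1 if t else -1)))
                        for x, t in data) / len(data)
        error_rate = sum(((w * x + b) >= 0) != bool(t) for x, t in data) / len(data)
        history.append((objective, error_rate))
        if len(history) > 1 and abs(history[-2][0] - objective) < tol:
            break  # the objective has settled into a stable region
    return w, b, history

# Toy, linearly separable data: (feature, label).
samples = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
           (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
w, b, history = train_in_batches(samples)
```

Plotting the two columns of `history` against the epoch index gives curves analogous to the objective and error-rate panels described for Fig. 2.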

The convolutional neural network is the classic AlexNet network structure, in which the first layer is the input layer, receiving a 227 × 227 color image as input, and the last layer is the output layer with N nodes in total, N being the total number of classes in the dialect dataset to be classified.
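
The invention does not list the layer parameters of the network. Assuming the kernel sizes, strides, and paddings of the originally published AlexNet (the variant shipped with a given toolbox may differ), the following sketch traces how a 227 × 227 input shrinks through the convolution and pooling layers and how the output layer is sized to N classes.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, padding, output channels); assumed AlexNet values.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, 256),
]

def trace(input_size=227):
    size, shapes = input_size, []
    for name, kernel, stride, pad, channels in LAYERS:
        size = out_size(size, kernel, stride, pad)
        shapes.append((name, size, channels))
    return shapes

shapes = trace()
flat = shapes[-1][1] ** 2 * shapes[-1][2]   # inputs to the first fully connected layer
num_classes = 70                            # N; 70 classes in the experiment below
fc_sizes = [flat, 4096, 4096, num_classes]
```

Under these assumptions the side length goes 227 → 55 → 27 → 27 → 13 → 13 → 13 → 13 → 6, so the fully connected part maps 9216 values to N outputs. Adapting the network to a new dataset mainly means changing the size of that last layer.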

The gradient descent algorithm proceeds as follows: starting from an arbitrary point, move a certain distance in the direction opposite to the gradient at that point, then move a certain distance in the direction opposite to the gradient at the new position, and iterate in this way. The solution thus always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, that is, the point at which the error value is smallest.
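
A minimal sketch of this iteration on a simple two-variable function is given below. The function, the fixed step size `learning_rate`, and the step count are assumptions chosen for illustration. On a convex function such as this one the iteration reaches the global minimum; on the non-convex error surface of a neural network it is only guaranteed to approach a local minimum or stationary point.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from `start`."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point

# Example function: f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimized at (3, -1).
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]

minimum = gradient_descent(grad_f, start=[0.0, 0.0])
```

After 200 steps `minimum` is numerically indistinguishable from (3, -1).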

The back-propagation algorithm proceeds as follows: when gradient descent is used to find the minimum of the error, the weights are updated successively from the last layer of the network toward the first; the weights are updated by the back-propagation method, that is, by the chain rule of differentiation, which is as follows:

\frac{d z}{d x} = \frac{d z}{d y} \cdot \frac{d y}{d x} .
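
The way the chain rule carries the error gradient from the last layer back to earlier weights can be shown on a tiny two-weight network. The sketch below is an assumed toy model (scalar input, one sigmoid unit, squared-error loss), not the network of the invention; it also checks the analytic gradients against finite differences.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(w1, w2, x, target):
    y = w1 * x                       # first layer
    h = sigmoid(y)                   # activation
    z = w2 * h                       # last layer
    loss = 0.5 * (z - target) ** 2   # squared error
    return y, h, z, loss

def backward(w1, w2, x, target):
    """Apply the chain rule from the loss back to each weight."""
    y, h, z, _ = forward(w1, w2, x, target)
    dloss_dz = z - target
    dloss_dw2 = dloss_dz * h             # dz/dw2 = h
    dloss_dh = dloss_dz * w2             # dz/dh = w2
    dloss_dy = dloss_dh * h * (1 - h)    # dh/dy = sigmoid'(y)
    dloss_dw1 = dloss_dy * x             # dy/dw1 = x
    return dloss_dw1, dloss_dw2

def numeric_gradients(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    def loss(a, b):
        return forward(a, b, x, target)[3]
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2

analytic = backward(0.8, -1.2, 0.5, 0.3)
numeric = numeric_gradients(0.8, -1.2, 0.5, 0.3)
```

The gradient for the last-layer weight is available first, and each earlier gradient reuses the factors already computed, which is why the update order runs from the last layer toward the first.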

Experimental conditions: a computer was selected to perform the dialect classification. It was configured with an Intel(R) processor (3.30 GHz), 32 GB of random access memory (RAM), a GTX 970 GPU, and a 64-bit operating system; the programming language was Matlab (version R2015a).

Experimental subjects: the dialect database comprises a grayscale image database and a color image database; the images of the color image database are used for the experiments in the invention. The dialects of 70 places in Jiangsu are classified, giving 70 classes. Each class contains 200 images, each of size 227 × 227. From each class, 160 images are randomly selected as training images, and the remaining 40 are used as test images.
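
The per-class random split described above can be sketched as follows. The file names and the function name `split_per_class` are hypothetical; only the counts (70 classes, 200 images each, 160 for training and 40 for testing) come from the text.

```python
import random

def split_per_class(files_by_class, n_train=160, seed=0):
    """Randomly pick n_train files per class for training; the rest are test."""
    rng = random.Random(seed)
    train, test = [], []
    for label, files in sorted(files_by_class.items()):
        shuffled = list(files)
        rng.shuffle(shuffled)
        train += [(name, label) for name in shuffled[:n_train]]
        test += [(name, label) for name in shuffled[n_train:]]
    return train, test

# Hypothetical file names: 70 classes with 200 spectrogram pictures each.
files_by_class = {label: [f"class{label:02d}_{i:03d}.png" for i in range(200)]
                  for label in range(70)}
train, test = split_per_class(files_by_class)
print(len(train), len(test))  # 11200 2800
```

Splitting within each class, rather than over the pooled data, keeps every class represented by exactly 160 training and 40 test images.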

Experimental procedure:

Step 1: convert the dialect audio files into spectrograms, remove the margins of the spectrograms, and resize the pictures to 227 × 227 color pictures.

Step 2: label all training and test pictures.

Step 3: modify some parameters of the AlexNet network structure in MatConvNet so that the network structure matches the dialect database.

Step 4: once everything is ready, feed the labeled pictures into the network and start the program.

Step 5: the program outputs the recognition error rate for each picture; when the program finishes, it outputs a plot of the training and test error rates over the whole run.
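
The margin removal and resizing in Step 1 can be sketched as below. For brevity the picture is a single-channel grid of pixel values, whereas the invention uses color pictures; the white background value, the nearest-neighbor interpolation, and the function names are all assumptions, since the text does not say how these operations are performed.

```python
def trim_margins(pixels, background=255):
    """Drop outer rows and columns that contain only the background value."""
    rows = [r for r, row in enumerate(pixels)
            if any(v != background for v in row)]
    cols = [c for c in range(len(pixels[0]))
            if any(row[c] != background for row in pixels)]
    if not rows or not cols:
        return []
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]

def resize_nearest(pixels, out_h=227, out_w=227):
    """Nearest-neighbor resize to out_h x out_w."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [[pixels[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical 6x8 white canvas with a 2x4 plot area inside it.
canvas = [[255] * 8 for _ in range(6)]
for r in range(2, 4):
    for c in range(3, 7):
        canvas[r][c] = 10 * r + c
cropped = trim_margins(canvas)
resized = resize_nearest(cropped)
```

For a color picture the same cropping indices and index mapping would be applied to each of the three channels.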

Fig. 2 shows the trends of the objective function and the error rate when the convolutional neural network is used for dialect classification in the invention. The abscissa (epoch) represents the training batch. The left panel, objective, shows the trend of the objective function, with the ordinate giving its value. The middle panel, top1err, shows the trend of the error rate for assigning a sample exactly to its own class, with the ordinate giving the error rate. The right panel, top5err, shows the trend of the error rate for assigning a sample to the 5 classes closest to its own class, with the ordinate giving the error rate. In the experiment, the training and test processes were distinguished by curve color, but owing to restrictions the curves in Fig. 2 are all black. Because top1err is the key criterion for judging the accuracy of the experiment, only the top1err index is considered here. In the top1err panel, the upper curve is the test process and the lower curve is the training process. The top1err test value can stabilize at 90%.
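
For reference, the two error measures can be computed as follows, assuming the usual definition in which a sample counts as a top-k error when its true class is not among the k highest-scoring classes. The score rows and labels are made-up values for illustration.

```python
def topk_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k best scores."""
    wrong = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            wrong += 1
    return wrong / len(labels)

# Hypothetical scores for 4 samples over 7 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.05, 0.05, 0.05],  # true class 1: best score
    [0.30, 0.10, 0.25, 0.15, 0.10, 0.05, 0.05],  # true class 2: second best
    [0.30, 0.20, 0.20, 0.10, 0.10, 0.06, 0.04],  # true class 6: lowest score
    [0.05, 0.05, 0.10, 0.60, 0.10, 0.05, 0.05],  # true class 3: best score
]
labels = [1, 2, 6, 3]
top1err = topk_error(scores, labels, 1)
top5err = topk_error(scores, labels, 5)
print(top1err, top5err)  # 0.5 0.25
```

By construction top5err can never exceed top1err, which is why the top-5 curve always lies at or below the top-1 curve.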

Although the invention has been illustrated and described with respect to preferred embodiments, those skilled in the art will understand that various changes and modifications may be made to the invention without departing from the scope defined by its claims.

Claims

1. A dialect classification method based on convolutional neural networks, characterized by comprising the following steps:

(1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them;

(2) scaling all pictures in the training set and test set to color images of a predefined size, and assigning label information to each picture, the label information indicating the county-level city to which the picture belongs;

(3) building a convolutional neural network whose levels are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and training said network using gradient descent and the back-propagation algorithm;

(4) after training is complete, obtaining a plot of the declining error rate during training.

2. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (1), the sample set is preprocessed by converting the audio files into spectrograms and removing the margins from the spectrograms.

3. The dialect classification method based on convolutional neural networks as claimed in claim 2, characterized in that, in step (1), the sample set includes dialect samples from multiple places.

4. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (2), all pictures are uniformly scaled to 227 × 227 color pictures.

5. The dialect classification method based on convolutional neural networks as claimed in claim 4, characterized in that, in step (3), the convolutional neural network is the classic AlexNet network structure, in which the first layer is the input layer, receiving a 227 × 227 color image as input, and the last layer is the output layer with N nodes in total, N being the total number of classes in the dialect dataset to be classified.

6. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (3), the gradient descent algorithm proceeds as follows: starting from an arbitrary point, move a certain distance in the direction opposite to the gradient at that point, then move a certain distance in the direction opposite to the gradient at the new position, and iterate in this way; the solution thus always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, that is, the point at which the error value is smallest.

7. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (3), the back-propagation algorithm proceeds as follows: when gradient descent is used to find the minimum of the error, the weights are updated successively from the last layer of the network toward the first; the weights are updated by the back-propagation method, that is, by the chain rule of differentiation, which is as follows:

\frac{d z}{d x} = \frac{d z}{d y} \cdot \frac{d y}{d x} .

8. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (4), the training samples and test samples, that is, all samples, are trained in batches and the weights are continually updated until the value of the objective function converges to a value within a stable region, that is, until the error rate converges to a stable value.