CN106919710A - Dialect classification method based on convolutional neural networks - Google Patents
- Publication number
- CN106919710A (application CN201710144714.3A)
- Authority
- CN
- China
- Prior art keywords
- neural networks
- convolutional neural
- dialect
- classification method
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/686—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a dialect classification method based on convolutional neural networks, comprising the following steps: (1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them; (2) scaling all pictures in the training set and the test set to color images of a predefined size and assigning label information to each picture, the label indicating the county-level city to which the picture belongs; (3) building a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, fully connected layers, and an output layer, and training said network using gradient descent and the back-propagation algorithm; (4) after training is complete, obtaining a plot of the declining error rate during training. The beneficial effect of the invention is that classifying two-dimensional images with a convolutional neural network achieves good classification performance and substantially improves the classification accuracy for dialects.
Description
Technical field
The present invention relates to the field of convolutional neural network applications, and in particular to a dialect classification method based on convolutional neural networks.
Background art
Convolutional neural networks are a type of artificial neural network and have become a research focus in speech analysis and image recognition. Their weight-sharing structure makes them more similar to biological neural networks, reducing both the complexity of the network model and the number of weights. This advantage is most apparent when the network input is a multidimensional image: the image can be fed to the network directly, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multilayer perceptron specially designed to recognize two-dimensional shapes, and this structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
A convolutional neural network is a multilayer neural network. Each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons. The input image is convolved with three trainable filters and additive biases, producing three feature maps at layer C1. In each feature map, every group of four pixels is then summed, multiplied by a weight, offset by a bias, and passed through a sigmoid function, yielding three feature maps at layer S2. These maps are filtered again to give layer C3, and the hierarchy then produces S4 in the same way as S2. Finally, the pixel values are rasterized and concatenated into a single vector, which is fed into a conventional neural network to produce the output.
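The C1 and S2 stages described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the 12 × 12 input, the 5 × 5 filter size, the random filter values, and the helper names `conv2d_valid` and `subsample` are hypothetical and do not come from the patent. As is usual in CNN practice, the "convolution" is computed as a sliding-window correlation without flipping the kernel.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(image, kernel, bias):
    """Slide one filter over the image (no padding) and add a bias."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

def subsample(feature_map, weight, bias):
    """Sum each non-overlapping 2x2 block, scale, add a bias, apply sigmoid."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    pooled = blocks.sum(axis=(1, 3))
    return sigmoid(weight * pooled + bias)

rng = np.random.default_rng(0)
image = rng.random((12, 12))                  # hypothetical single-channel input
filters = rng.standard_normal((3, 5, 5))      # three trainable filters (random here)
c1 = [conv2d_valid(image, f, 0.1) for f in filters]   # three C1 feature maps
s2 = [subsample(m, 0.25, 0.0) for m in c1]            # three S2 feature maps
print(c1[0].shape, s2[0].shape)
```

Each S2 map has half the height and width of its C1 map, which is the size reduction the text attributes to the S layers.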
In general, a C layer is a feature extraction layer: the input of each neuron is connected to a local receptive field in the previous layer and extracts the local feature there. Once a local feature has been extracted, its positional relationship to the other features is determined as well. An S layer is a feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on a plane have equal weights. The feature mapping structure uses the sigmoid function, which has a small influence-function kernel, as the activation function of the convolutional network, so that the feature maps are shift invariant.
Furthermore, because the neurons on one mapping plane share weights, the number of free parameters in the network is reduced, which lowers the complexity of parameter selection. Each feature extraction layer (C layer) in a convolutional neural network is followed by a computational layer (S layer) that performs local averaging and a second extraction. This distinctive two-stage feature extraction structure gives the network a high tolerance for distortion of the input samples during recognition.
Summary of the invention
The technical problem to be solved by the invention is to provide a dialect classification method based on convolutional neural networks that can classify and recognize spectrogram images of dialect audio.
To solve the above technical problem, the invention provides a dialect classification method based on convolutional neural networks, comprising the following steps:
(1) build a sample set containing dialects from multiple places, preprocess the samples, and label them;
(2) scale all pictures in the training set and the test set to color images of a predefined size, and assign label information to each picture, the label indicating the county-level city to which the picture belongs;
(3) build a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, fully connected layers, and an output layer, and train said network using gradient descent and the back-propagation algorithm;
(4) after training is complete, obtain a plot of the declining error rate during training.
Preferably, in step (1), preprocessing the sample set means converting the audio files into spectrograms and removing the margins from the spectrograms.
Preferably, in step (1), the sample set includes dialect samples from multiple places.
Preferably, in step (2), all pictures are scaled to 227 × 227 color pictures.
Preferably, in step (3), the convolutional neural network is the classical AlexNet network structure. In this network, the first layer is the input layer, which receives a 227 × 227 color image as input, and the last layer is the output layer, which has N nodes in total, where N is the total number of classes in the dialect data set to be classified.
Preferably, in step (3), the gradient descent algorithm proceeds as follows: starting from any point, move some distance in the direction opposite to the gradient, then move some distance opposite to the gradient at the new position, and iterate in this way. The solution always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, that is, the point at which the error value is smallest.
Preferably, in step (3), the back-propagation algorithm proceeds as follows: after the minimum of the error has been found using gradient descent, the weights are updated layer by layer from the last layer of the network toward the front; the weights are updated by back-propagation, that is, by the chain rule of differentiation. The chain rule is as follows:
Preferably, in step (4), the training samples and test samples, that is, all samples, are trained in batches and the weights are updated continuously until the value of the objective function converges to a value within a stable region, that is, until the error rate converges to a stable value.
The beneficial effect of the invention is that classifying two-dimensional images with a convolutional neural network achieves good classification performance and substantially improves the classification accuracy for dialects.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 shows the trends of the objective function and the error rate when the convolutional neural network of the invention is used for dialect classification.
Detailed description of the embodiments
As shown in Fig. 1, a dialect classification method based on convolutional neural networks comprises the following steps:
(1) build a sample set containing dialects from multiple places, preprocess the samples, and label them; preprocessing the sample set means converting the audio files into spectrograms and removing the margins from the spectrograms; the sample set includes dialect samples from multiple places;
(2) scale all pictures in the training set and the test set to color images of a predefined size, and assign label information to each picture, the label indicating the county-level city to which the picture belongs; all pictures are scaled to 227 × 227 color pictures;
(3) build a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, fully connected layers, and an output layer, and train said network using gradient descent and the back-propagation algorithm;
(4) train all samples in batches and update the weights continuously until the value of the objective function converges to a value within a stable region, that is, until the error rate converges to a stable value; after training is complete, obtain a plot of the declining error rate during training.
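The preprocessing in steps (1) and (2) can be illustrated with a short NumPy sketch. It is not the patent's implementation: the frame length, hop size, colour mapping, nearest-neighbour resize, and the function names `spectrogram` and `to_color_image` are all assumptions chosen for illustration, and a synthetic 440 Hz tone stands in for a dialect recording. Because the spectrogram is built directly as an array, there are no plot margins to remove; a spectrogram exported from a plotting tool would need cropping first.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # shape: (frames, frequency bins)
    return np.log1p(magnitude).T                      # shape: (frequency bins, frames)

def to_color_image(spec, size=227):
    """Normalise to [0, 1], resize by nearest neighbour, map to three channels."""
    span = spec.max() - spec.min()
    norm = (spec - spec.min()) / (span if span > 0 else 1.0)
    rows = np.arange(size) * spec.shape[0] // size
    cols = np.arange(size) * spec.shape[1] // size
    resized = norm[rows][:, cols]
    # Hypothetical colour map: red follows energy, blue its complement, green peaks mid-range.
    rgb = np.stack([resized, 1.0 - np.abs(2.0 * resized - 1.0), 1.0 - resized], axis=-1)
    return (rgb * 255).astype(np.uint8)

sample_rate = 16000                                   # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)                 # stand-in for a dialect recording
spec = spectrogram(audio)
image = to_color_image(spec)
print(spec.shape, image.shape, image.dtype)
```

The resulting array has the 227 × 227 × 3 shape that the network's input layer expects.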
The convolutional neural network is the classical AlexNet network structure. In this network, the first layer is the input layer, which receives a 227 × 227 color image as input, and the last layer is the output layer, which has N nodes in total, where N is the total number of classes in the dialect data set to be classified.
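To make the layer sizes concrete, the sketch below computes the output shape of each layer for a 227 × 227 input. The kernel sizes, strides, padding, and channel counts are those of the AlexNet architecture as commonly published, not values stated in the patent, and `alexnet_shapes` is a hypothetical helper. Only the last layer depends on N.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, output channels or None for pooling, kernel, stride, padding)
FEATURE_LAYERS = [
    ("conv1", 96, 11, 4, 0),
    ("pool1", None, 3, 2, 0),
    ("conv2", 256, 5, 1, 2),
    ("pool2", None, 3, 2, 0),
    ("conv3", 384, 3, 1, 1),
    ("conv4", 384, 3, 1, 1),
    ("conv5", 256, 3, 1, 1),
    ("pool5", None, 3, 2, 0),
]

def alexnet_shapes(num_classes, input_size=227):
    """Map each layer name to its output shape for a 3-channel square input."""
    shapes = {"input": (3, input_size, input_size)}
    channels, size = 3, input_size
    for name, out_channels, kernel, stride, pad in FEATURE_LAYERS:
        size = out_size(size, kernel, stride, pad)
        channels = out_channels if out_channels is not None else channels
        shapes[name] = (channels, size, size)
    shapes["fc6"] = (4096,)
    shapes["fc7"] = (4096,)
    shapes["fc8"] = (num_classes,)    # output layer: one node per dialect class
    return shapes

shapes = alexnet_shapes(num_classes=70)
for name, shape in shapes.items():
    print(name, shape)
```

With N = 70, as in the experiment described later, adapting the network to the dialect data mainly means resizing the final layer from its original class count to 70 outputs.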
The gradient descent algorithm proceeds as follows: starting from any point, move some distance in the direction opposite to the gradient, then move some distance opposite to the gradient at the new position, and iterate in this way. The solution always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, that is, the point at which the error value is smallest.
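The iteration just described can be sketched on a simple convex function. The quadratic objective, the step size, and the iteration count are assumptions made for illustration. On a convex function like this one the iterates do reach the global minimum; for a neural network's error surface that outcome is hoped for rather than guaranteed.

```python
import numpy as np

def gradient_descent(grad, start, step=0.1, iterations=200):
    """Repeatedly move a fixed fraction of the gradient in the opposite direction."""
    x = np.asarray(start, dtype=float)
    for _ in range(iterations):
        x = x - step * grad(x)
    return x

# Illustrative objective f(x) = ||x - target||^2 with gradient 2 * (x - target).
target = np.array([3.0, -1.0])

def grad(x):
    return 2.0 * (x - target)

minimum = gradient_descent(grad, start=[10.0, 10.0])
print(minimum)
```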
The back-propagation algorithm proceeds as follows: after the minimum of the error has been found using gradient descent, the weights are updated layer by layer from the last layer of the network toward the front; the weights are updated by back-propagation, that is, by the chain rule of differentiation. The chain rule is as follows:
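The formula this paragraph refers to is not present in the text. As a reminder only, and not as the patent's own equation, the chain rule is commonly written as follows for a weight $w^{(l)}_{ij}$ feeding neuron $j$ of layer $l$. The symbols $E$ (error), $z$ (weighted input), $a$ (activation), and $\eta$ (learning rate) are assumptions of this sketch:

```latex
\frac{\partial E}{\partial w^{(l)}_{ij}}
  = \frac{\partial E}{\partial a^{(l)}_{j}}
    \cdot \frac{\partial a^{(l)}_{j}}{\partial z^{(l)}_{j}}
    \cdot \frac{\partial z^{(l)}_{j}}{\partial w^{(l)}_{ij}},
\qquad
w^{(l)}_{ij} \leftarrow w^{(l)}_{ij} - \eta \, \frac{\partial E}{\partial w^{(l)}_{ij}}
```

Working from the output layer toward the input lets each layer reuse the derivative already computed for the layer after it, which is why the weights are updated from the last layer forward.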
Experimental conditions: A computer was selected to perform the dialect classification. It was configured with an Intel(R) processor (3.30 GHz), 32 GB of random access memory (RAM), a GTX 970 GPU, and a 64-bit operating system; the programming language was Matlab (version R2015a).
Experimental subject: The dialect database includes a grayscale image database and a color image database. The experiments in the invention use images from the color image database. The dialects of 70 places in Jiangsu are classified, so there are 70 classes. Each class contains 200 images, and each image is 227 × 227 in size. In each class, 160 images are randomly selected as training images and the remaining 40 are used as test images.
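The per-class random split can be sketched as follows. The (label, index) representation of an image, the fixed seed, and the function name `split_per_class` are assumptions made for illustration; only the counts come from the text.

```python
import random

def split_per_class(num_classes=70, images_per_class=200, train_per_class=160, seed=0):
    """Randomly pick train_per_class images in every class; the rest become test images."""
    rng = random.Random(seed)
    train, test = [], []
    for label in range(num_classes):
        indices = list(range(images_per_class))
        rng.shuffle(indices)
        train += [(label, i) for i in indices[:train_per_class]]
        test += [(label, i) for i in indices[train_per_class:]]
    return train, test

train_set, test_set = split_per_class()
print(len(train_set), len(test_set))
```

With the figures given above, this yields 70 × 160 = 11,200 training images and 70 × 40 = 2,800 test images.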
Experimental steps:
Step 1: Convert the dialect audio files into spectrograms, remove the margins of the spectrograms, and resize the pictures to 227 × 227 color pictures.
Step 2: Label all training and test pictures.
Step 3: Modify some parameters of the AlexNet network structure in MatConvNet so that the network structure matches the dialect database.
Step 4: Once everything is ready, feed the labeled pictures into the network and start the program.
Step 5: The program outputs the recognition error rate for each picture; when it finishes, it outputs a plot of the training and test error rates over the whole run.
Fig. 2 shows the trends of the objective function and the error rate when the convolutional neural network is used for dialect classification in the invention. The abscissa (epoch) is the training round. The left panel, objective, shows the trend of the objective function, with the ordinate giving its value. The middle panel, top1err, shows the trend of the error rate for assigning a sample exactly to its own class, with the ordinate giving the error rate. The right panel, top5err, shows the trend of the error rate for assigning a sample to one of the 5 classes closest to its own class, with the ordinate again giving the error rate. In the experiment, the training and test processes were distinguished by curve color, but the curves in Fig. 2 are restricted to black. Because top1err is the key criterion for judging experimental accuracy, only the top1err index is considered here. In the top1err panel, the upper curve is the test process and the lower curve is the training process. The top1err test value stabilizes at 90%.
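The two error measures can be illustrated with a small sketch that uses the usual definition of top-k error: a sample counts as an error when its true class is not among the k highest-scoring classes. The score matrix, the labels, and the function name `top_k_error` are made up for illustration, and k = 2 stands in for k = 5 because the toy example has only three classes.

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Rows are samples, columns are class scores (illustrative values).
scores = np.array([[0.1, 0.6, 0.3],
                   [0.5, 0.2, 0.3],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 2])
top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

Here the second sample is misclassified at top-1 but recovered at top-2, so the top-1 error is one third and the top-2 error is zero.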
Although the invention has been illustrated and described with respect to preferred embodiments, those skilled in the art will understand that various changes and modifications may be made to the invention without departing from the scope defined by its claims.
Claims (8)
1. A dialect classification method based on convolutional neural networks, characterized by comprising the following steps:
(1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them;
(2) scaling all pictures in the training set and the test set to color images of a predefined size, and assigning label information to each picture, the label indicating the county-level city to which the picture belongs;
(3) building a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, fully connected layers, and an output layer, and training said network using gradient descent and the back-propagation algorithm;
(4) after training is complete, obtaining a plot of the declining error rate during training.
2. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (1), preprocessing the sample set means converting the audio files into spectrograms and removing the margins from the spectrograms.
3. The dialect classification method based on convolutional neural networks as claimed in claim 2, characterized in that, in step (1), the sample set includes dialect samples from multiple places.
4. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (2), all pictures are scaled to 227 × 227 color pictures.
5. The dialect classification method based on convolutional neural networks as claimed in claim 4, characterized in that, in step (3), the convolutional neural network is the classical AlexNet network structure; in this network, the first layer is the input layer, which receives a 227 × 227 color image as input, and the last layer is the output layer, which has N nodes in total, where N is the total number of classes in the dialect data set to be classified.
6. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (3), the gradient descent algorithm proceeds as follows: starting from any point, move some distance in the direction opposite to the gradient, then move some distance opposite to the gradient at the new position, and iterate in this way; the solution always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, that is, the point at which the error value is smallest.
7. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (3), the back-propagation algorithm proceeds as follows: after the minimum of the error has been found using gradient descent, the weights are updated layer by layer from the last layer of the network toward the front; the weights are updated by back-propagation, that is, by the chain rule of differentiation; the chain rule is as follows:
8. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (4), the training samples and test samples, that is, all samples, are trained in batches and the weights are updated continuously until the value of the objective function converges to a value within a stable region, that is, until the error rate converges to a stable value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710144714.3A CN106919710A (en) | 2017-03-13 | 2017-03-13 | Dialect classification method based on convolutional neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710144714.3A CN106919710A (en) | 2017-03-13 | 2017-03-13 | Dialect classification method based on convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106919710A (en) | 2017-07-04 |
Family
ID=59461330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710144714.3A Pending CN106919710A (en) | Dialect classification method based on convolutional neural networks | 2017-03-13 | 2017-03-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106919710A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170735A (en) * | 2017-12-15 | 2018-06-15 | 东南大学 | A kind of dialect databases method for building up suitable for convolutional neural networks |
CN109887497A (en) * | 2019-04-12 | 2019-06-14 | 北京百度网讯科技有限公司 | Modeling method, device and the equipment of speech recognition |
CN110033760A (en) * | 2019-04-15 | 2019-07-19 | 北京百度网讯科技有限公司 | Modeling method, device and the equipment of speech recognition |
CN110148400A (en) * | 2018-07-18 | 2019-08-20 | 腾讯科技(深圳)有限公司 | The pronunciation recognition methods of type, the training method of model, device and equipment |
WO2019232849A1 (en) * | 2018-06-04 | 2019-12-12 | 平安科技(深圳)有限公司 | Chinese character model training method, handwritten character recognition method, apparatuses, device and medium |
CN111488486A (en) * | 2020-04-20 | 2020-08-04 | 武汉大学 | Electronic music classification method and system based on multi-sound-source separation |
CN111881797A (en) * | 2020-07-20 | 2020-11-03 | 北京理工大学 | Method, device, equipment and storage medium for finely classifying vegetation on coastal wetland |
CN115472147A (en) * | 2022-09-15 | 2022-12-13 | 北京大学深圳医院 | Language identification method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
WO2015180368A1 (en) * | 2014-05-27 | 2015-12-03 | 江苏大学 | Variable factor decomposition method for semi-supervised speech features |
CN105895110A (en) * | 2016-06-30 | 2016-08-24 | 北京奇艺世纪科技有限公司 | Method and device for classifying audio files |
CN106485251A (en) * | 2016-10-08 | 2017-03-08 | 天津工业大学 | Egg embryo classification based on deep learning |
-
2017
- 2017-03-13 CN CN201710144714.3A patent/CN106919710A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015180368A1 (en) * | 2014-05-27 | 2015-12-03 | 江苏大学 | Variable factor decomposition method for semi-supervised speech features |
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
CN105895110A (en) * | 2016-06-30 | 2016-08-24 | 北京奇艺世纪科技有限公司 | Method and device for classifying audio files |
CN106485251A (en) * | 2016-10-08 | 2017-03-08 | 天津工业大学 | Egg embryo classification based on deep learning |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170735A (en) * | 2017-12-15 | 2018-06-15 | 东南大学 | A kind of dialect databases method for building up suitable for convolutional neural networks |
WO2019232849A1 (en) * | 2018-06-04 | 2019-12-12 | 平安科技(深圳)有限公司 | Chinese character model training method, handwritten character recognition method, apparatuses, device and medium |
CN110148400A (en) * | 2018-07-18 | 2019-08-20 | 腾讯科技(深圳)有限公司 | The pronunciation recognition methods of type, the training method of model, device and equipment |
CN110148400B (en) * | 2018-07-18 | 2023-03-17 | 腾讯科技(深圳)有限公司 | Pronunciation type recognition method, model training method, device and equipment |
CN109887497A (en) * | 2019-04-12 | 2019-06-14 | 北京百度网讯科技有限公司 | Modeling method, device and the equipment of speech recognition |
CN109887497B (en) * | 2019-04-12 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, device and equipment for speech recognition |
CN110033760A (en) * | 2019-04-15 | 2019-07-19 | 北京百度网讯科技有限公司 | Modeling method, device and the equipment of speech recognition |
US11688391B2 (en) | 2019-04-15 | 2023-06-27 | Beijing Baidu Netcom Science And Technology Co. | Mandarin and dialect mixed modeling and speech recognition |
CN110033760B (en) * | 2019-04-15 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, device and equipment for speech recognition |
CN111488486B (en) * | 2020-04-20 | 2021-08-17 | 武汉大学 | Electronic music classification method and system based on multi-sound-source separation |
CN111488486A (en) * | 2020-04-20 | 2020-08-04 | 武汉大学 | Electronic music classification method and system based on multi-sound-source separation |
CN111881797A (en) * | 2020-07-20 | 2020-11-03 | 北京理工大学 | Method, device, equipment and storage medium for finely classifying vegetation on coastal wetland |
CN115472147A (en) * | 2022-09-15 | 2022-12-13 | 北京大学深圳医院 | Language identification method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106919710A (en) | Dialect classification method based on convolutional neural networks | |
Zhao et al. | A visual long-short-term memory based integrated CNN model for fabric defect image classification | |
Poma et al. | Dense extreme inception network: Towards a robust cnn model for edge detection | |
Hertel et al. | Deep convolutional neural networks as generic feature extractors | |
CN109558942B (en) | Neural network migration method based on shallow learning | |
CN110309856A (en) | Image classification method, the training method of neural network and device | |
Colak et al. | Automated McIntosh-based classification of sunspot groups using MDI images | |
CN107423756A (en) | Nuclear magnetic resonance image sequence sorting technique based on depth convolutional neural networks combination shot and long term memory models | |
CN107358169A (en) | A kind of facial expression recognizing method and expression recognition device | |
CN104517122A (en) | Image target recognition method based on optimized convolution architecture | |
CN107408209A (en) | Without the classification of the automatic defect of sampling and feature selecting | |
CN105718952A (en) | Method for focus classification of sectional medical images by employing deep learning network | |
CN110083700A (en) | A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks | |
CN110070107A (en) | Object identification method and device | |
AU2020100052A4 (en) | Unattended video classifying system based on transfer learning | |
CN110457982A (en) | A kind of crop disease image-recognizing method based on feature transfer learning | |
CN107203606A (en) | Text detection and recognition methods under natural scene based on convolutional neural networks | |
Pathar et al. | Human emotion recognition using convolutional neural network in real time | |
CN112614119A (en) | Medical image region-of-interest visualization method, device, storage medium and equipment | |
CN109815945A (en) | A kind of respiratory tract inspection result interpreting system and method based on image recognition | |
CN108960260A (en) | A kind of method of generating classification model, medical image image classification method and device | |
CN109508640A (en) | A kind of crowd's sentiment analysis method, apparatus and storage medium | |
Paul et al. | A modern approach for sign language interpretation using convolutional neural network | |
CN112668486A (en) | Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network | |
Diouf et al. | Convolutional neural network and decision support in medical imaging: case study of the recognition of blood cell subtypes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170704 |