CN106919710A - A dialect classification method based on convolutional neural networks - Google Patents

A dialect classification method based on convolutional neural networks

Info

Publication number
CN106919710A
CN106919710A CN201710144714.3A CN201710144714A CN106919710A CN 106919710 A CN106919710 A CN 106919710A CN 201710144714 A CN201710144714 A CN 201710144714A CN 106919710 A CN106919710 A CN 106919710A
Authority
CN
China
Prior art keywords
neural networks
convolutional neural
dialect
sorting technique
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710144714.3A
Other languages
Chinese (zh)
Inventor
伍家松
魏黎明
邱诗洁
杨淳沨
孔佑勇
朱小贝
舒华忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201710144714.3A
Publication of CN106919710A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dialect classification method based on convolutional neural networks, comprising the following steps: (1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them; (2) scaling all pictures in the training set and test set to color images of a predefined size and assigning label information to each picture, the label information indicating the county-level city to which the picture belongs; (3) building a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and training the network using gradient descent and the back-propagation algorithm; (4) after training is complete, obtaining a plot of the error-rate decline during training. The beneficial effects of the invention are: classifying two-dimensional images with a convolutional neural network achieves good classification results and substantially improves the accuracy of dialect classification.

Description

A dialect classification method based on convolutional neural networks
Technical field
The present invention relates to the field of convolutional neural network applications, and in particular to a dialect classification method based on convolutional neural networks.
Background
Convolutional neural networks are a type of artificial neural network and have become a research focus in speech analysis and image recognition. Their weight-sharing network structure makes them more similar to biological neural networks, reducing both the complexity of the network model and the number of weights. This advantage is most apparent when the network input is a multidimensional image: the image can be used directly as the network input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multilayer perceptron specially designed to recognize two-dimensional shapes; this network structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
A convolutional neural network is a multilayer neural network. Each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons. The input image is convolved with three trainable filters and additive biases, producing three feature maps at layer C1. Each group of four pixels in the feature maps is then summed, weighted, and biased, and the result is passed through a sigmoid function to obtain the three feature maps of layer S2. These maps are filtered again to obtain layer C3. This hierarchy then produces S4 in the same way as S2. Finally, these pixel values are rasterized and concatenated into a vector that is fed into a conventional neural network to produce the output.
Generally, the C layers are feature extraction layers: the input of each neuron is connected to a local receptive field of the previous layer and extracts that local feature; once the local feature is extracted, its positional relationship to the other features is also determined. The S layers are feature mapping layers: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on a plane have equal weights. The feature mapping structure uses the sigmoid function, which has a small influence-function kernel, as the activation function of the convolutional network, so that the feature maps are shift invariant.
In addition, because the neurons on one mapping plane share weights, the number of free parameters of the network is reduced, which lowers the complexity of network parameter selection. Each feature extraction layer (C layer) in a convolutional neural network is followed by a computational layer (S layer) that performs local averaging and a second extraction; this distinctive two-stage feature extraction structure gives the network a high tolerance to distortion of the input samples during recognition.
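The C-layer and S-layer operations described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the 12 × 12 input, the 5 × 5 filter size, the random filter values, and the function names `conv2d_valid` and `subsample2x2` are assumptions made for the example and do not come from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(image, kernel, bias=0.0):
    """C layer: slide one shared kernel over the image (valid cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Every output neuron reuses the same kernel weights (weight sharing).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

def subsample2x2(fmap, weight=0.25, bias=0.0):
    """S layer: sum each 2x2 block, scale, add a bias, then apply a sigmoid."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return sigmoid(weight * blocks.sum(axis=(1, 3)) + bias)

rng = np.random.default_rng(0)
image = rng.random((12, 12))              # hypothetical single-channel input
kernels = rng.standard_normal((3, 5, 5))  # three trainable filters (random here)

c1 = [conv2d_valid(image, k) for k in kernels]  # three C1 feature maps, 8x8 each
s2 = [subsample2x2(f) for f in c1]              # three S2 feature maps, 4x4 each
```

Because each feature map reuses one 5 × 5 kernel, the C1 layer in this sketch has only 3 × (25 + 1) free parameters, regardless of the image size.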
Summary of the invention
The technical problem to be solved by the invention is to provide a dialect classification method based on convolutional neural networks that can classify and recognize spectrogram pictures of dialect audio.
To solve the above technical problem, the present invention provides a dialect classification method based on convolutional neural networks, comprising the following steps:
(1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them;
(2) scaling all pictures in the training set and test set to color images of a predefined size and assigning label information to each picture, the label information indicating the county-level city to which the picture belongs;
(3) building a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and training the convolutional neural network using gradient descent and the back-propagation algorithm;
(4) after training is complete, obtaining a plot of the error-rate decline during training.
Preferably, in step (1), the sample set is preprocessed: each audio file is converted into a spectrogram, and the blank margins of the spectrogram are removed.
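A minimal sketch of this preprocessing step, assuming a short-time Fourier transform with a Hann window for the spectrogram and a uniform white border for the margins. The frame length, hop size, synthetic test tone, and the helper names `spectrogram_db` and `crop_margins` are illustrative assumptions; the patent does not specify how the spectrogram is computed or how the margins are detected.

```python
import numpy as np

def spectrogram_db(signal, frame_len=256, hop=128):
    """Log-power spectrogram with shape (frequency bins, time frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-10).T

def crop_margins(img, background=255):
    """Drop border rows and columns that contain only the background value.

    Assumes a 2-D picture with at least one non-background pixel.
    """
    mask = img != background
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Synthetic stand-in for a dialect recording: 1 s of a 1 kHz tone at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram_db(np.sin(2 * np.pi * 1000 * t))

# A rendered picture with a white border, as a plotting tool might save it.
picture = np.full((10, 12), 255, dtype=np.uint8)
picture[2:7, 3:9] = 0
cropped = crop_margins(picture)
```

Computing the spectrogram as an array, as here, produces no margins at all; the cropping helper only matters when the spectrogram has been saved as a plotted picture with a border.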
Preferably, in step (1), the sample set includes dialect samples from multiple places.
Preferably, in step (2), the pictures are uniformly scaled to 227 × 227 color pictures.
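The scaling step can be illustrated with nearest-neighbor resampling, which needs only NumPy indexing. The interpolation method is an assumption (the patent does not name one), and `resize_nearest` and the 300 × 400 random input picture are hypothetical.

```python
import numpy as np

def resize_nearest(img, size=227):
    """Resize an H x W or H x W x 3 picture to size x size by nearest neighbor."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]

rng = np.random.default_rng(0)
color_picture = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
scaled = resize_nearest(color_picture)
```

An image library with bilinear or bicubic interpolation would usually give smoother results; the point here is only that every picture ends up with the same 227 × 227 × 3 shape expected by the network input layer.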
Preferably, in step (3), the convolutional neural network is the classic AlexNet network structure. In this network, the first layer is the input layer, which receives a 227 × 227 color image as input; the last layer is the output layer, with N nodes in total, where N is the total number of classes in the dialect data set to be classified.
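To see how a 227 × 227 input flows through such a network, the sketch below traces feature-map sizes using the kernel sizes, strides, and paddings commonly cited for AlexNet. Those hyperparameters and the two 4096-unit fully connected layers are assumptions taken from the usual description of AlexNet, not from the patent; N = 70 is the class count used in the experiment section of this document.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad, output channels; None means pooling keeps channels)
ALEXNET_FEATURE_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

def trace_shapes(size=227, channels=3):
    shapes = [("input", size, channels)]
    for name, kernel, stride, pad, out_ch in ALEXNET_FEATURE_LAYERS:
        size = out_size(size, kernel, stride, pad)
        channels = out_ch if out_ch is not None else channels
        shapes.append((name, size, channels))
    return shapes

shapes = trace_shapes()
_, final_size, final_channels = shapes[-1]
flattened = final_size * final_size * final_channels

n_classes = 70  # N: number of dialect classes in the experiment section
fc_sizes = [flattened, 4096, 4096, n_classes]

for name, size, channels in shapes:
    print(f"{name}: {size} x {size} x {channels}")
```

Adapting the network to a new data set then amounts to replacing the last entry of `fc_sizes`, that is, giving the output layer N nodes.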
Preferably, in step (3), the specific steps of the gradient descent algorithm are: starting from any point, move a certain distance in the direction opposite to the gradient at that point, then move a certain distance in the direction opposite to the gradient at the new position, and iterate in this way. The solution always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, i.e., the point at which the error is smallest.
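The iteration can be sketched on a toy error function. The quadratic `error`, the starting point, the step size `lr`, and the step count are all hypothetical choices for illustration; for a real network the error surface is non-convex, so the minimum reached is not guaranteed to be global.

```python
import numpy as np

def error(w):
    """Toy error surface with its minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def gradient(w):
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

def gradient_descent(w0, lr=0.1, steps=200):
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        # Move a short distance against the gradient at the current position.
        w = w - lr * gradient(w)
    return w

w_min = gradient_descent([10.0, 10.0])
```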
Preferably, in step (3), the specific steps of the back-propagation algorithm are: when gradient descent is used to search for the minimum of the error, the weights are updated layer by layer, starting from the last layer of the network and moving toward the first. The weights are updated by back-propagation, i.e., by the chain rule of differentiation, which is as follows:
$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$
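As a worked illustration of the chain rule in back-propagation, the sketch below uses a hypothetical two-weight network, y = w1·x, h = sigmoid(y), z = ½(w2·h − t)², and computes the gradients from the last layer backward as products of local derivatives. The network, the numeric values, and the finite-difference check are assumptions for the example, not part of the patent.

```python
import math

def forward(w1, w2, x, t):
    y = w1 * x                          # first layer
    h = 1.0 / (1.0 + math.exp(-y))      # sigmoid activation
    z = 0.5 * (w2 * h - t) ** 2         # squared error of the output w2 * h
    return y, h, z

def backward(w1, w2, x, t):
    """Gradients of z, computed from the last layer back to the first."""
    _, h, _ = forward(w1, w2, x, t)
    dz_do = w2 * h - t                  # derivative w.r.t. the output o = w2 * h
    dz_dw2 = dz_do * h                  # last-layer weight
    dz_dh = dz_do * w2                  # propagate the error back through w2
    dh_dy = h * (1.0 - h)               # derivative of the sigmoid
    dz_dw1 = dz_dh * dh_dy * x          # chain rule: dz/dh * dh/dy * dy/dw1
    return dz_dw1, dz_dw2

w1, w2, x, t = 0.5, -1.2, 2.0, 1.0
g1, g2 = backward(w1, w2, x, t)

# Central finite differences as an independent check of the analytic gradients.
eps = 1e-6
num_g1 = (forward(w1 + eps, w2, x, t)[2] - forward(w1 - eps, w2, x, t)[2]) / (2 * eps)
num_g2 = (forward(w1, w2 + eps, x, t)[2] - forward(w1, w2 - eps, x, t)[2]) / (2 * eps)

# One gradient descent update using the back-propagated gradients.
lr = 0.1
w1_new, w2_new = w1 - lr * g1, w2 - lr * g2
```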
Preferably, in step (4), the training samples and test samples, i.e., all samples, are trained in batches, and the weights are updated continually until the value of the objective function converges to a value within a stable region, i.e., until the error rate converges to a stable value.
The beneficial effects of the invention are: classifying two-dimensional images with a convolutional neural network achieves good classification results and substantially improves the accuracy of dialect classification.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 shows the trends of the objective function and the error rate when the convolutional neural network of the present invention is used for dialect classification.
Detailed description
As shown in Fig. 1, a dialect classification method based on convolutional neural networks comprises the following steps:
(1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them; to preprocess the sample set, each audio file is converted into a spectrogram and the blank margins of the spectrogram are removed; the sample set includes dialect samples from multiple places;
(2) scaling all pictures in the training set and test set to color images of a predefined size and assigning label information to each picture, the label information indicating the county-level city to which the picture belongs; the pictures are uniformly scaled to 227 × 227 color pictures;
(3) building a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and training the convolutional neural network using gradient descent and the back-propagation algorithm;
(4) training all samples in batches and updating the weights continually until the value of the objective function converges to a value within a stable region, i.e., until the error rate converges to a stable value; after training is complete, obtaining a plot of the error-rate decline during training.
The convolutional neural network is the classic AlexNet network structure. In this network, the first layer is the input layer, which receives a 227 × 227 color image as input; the last layer is the output layer, with N nodes in total, where N is the total number of classes in the dialect data set to be classified.
The specific steps of the gradient descent algorithm are: starting from any point, move a certain distance in the direction opposite to the gradient at that point, then move a certain distance in the direction opposite to the gradient at the new position, and iterate in this way. The solution always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, i.e., the point at which the error is smallest.
The specific steps of the back-propagation algorithm are: when gradient descent is used to search for the minimum of the error, the weights are updated layer by layer, starting from the last layer of the network and moving toward the first. The weights are updated by back-propagation, i.e., by the chain rule of differentiation, which is as follows:
$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$
Experimental conditions: a computer was chosen to perform the dialect classification. It was configured with an Intel(R) processor (3.30 GHz), 32 GB of random access memory (RAM), a GTX 970 GPU, and a 64-bit operating system; the programming language was Matlab (version R2015a).
Experimental subjects: the dialect database includes a grayscale image database and a color image database. The images of the color image database are used for the experiments in the present invention. The dialects of 70 places in Jiangsu are classified, so there are 70 classes. Each class contains 200 images, and each image is 227 × 227. In each class, 160 images are randomly selected as training images, and the remaining 40 are used as test images.
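The per-class random split described above (70 classes, 200 images each, 160 for training and 40 for testing) could be reproduced along the following lines. The function name `split_per_class`, the fixed random seed, and the use of integer labels 0 to 69 are assumptions for the sketch; the experiment itself was run in Matlab.

```python
import numpy as np

def split_per_class(labels, n_train=160, seed=0):
    """Randomly pick n_train indices per class for training; the rest are test."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

# 70 dialect classes with 200 spectrogram pictures each.
labels = np.repeat(np.arange(70), 200)
train_idx, test_idx = split_per_class(labels)
```

This yields 70 × 160 = 11,200 training pictures and 70 × 40 = 2,800 test pictures, with every class equally represented in both sets.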
Experimental procedure:
Step 1: convert the dialect audio files into spectrograms, remove the blank margins of the spectrograms, and then resize the pictures to 227 × 227 color pictures.
Step 2: label all training and test pictures.
Step 3: in the AlexNet network structure in MatConvNet, modify some of the parameters so that the network structure matches the dialect database.
Step 4: once everything is ready, feed the labeled pictures into the network and start the program.
Step 5: the program outputs the recognition error rate for each picture; when the program finishes, it outputs a plot of the trends of the training and test error rates over the whole run.
Fig. 2 shows the trends of the objective function and the error rate when the convolutional neural network is used for dialect classification in the present invention. The abscissa (epoch) represents the training epoch. The left panel, objective, shows the trend of the objective function, with the ordinate giving its value. The middle panel, top1err, shows the trend of the error rate for assigning a sample exactly to its own class, with the ordinate giving the error rate. The right panel, top5err, shows the trend of the error rate when a sample counts as correctly classified if its own class is among the 5 closest classes, with the ordinate giving the error rate. In the experiment, the training and test processes are distinguished by curve color, but the curves in Fig. 2 are restricted to black. Because top1err is the key criterion for judging experimental accuracy, only the top1err metric is considered here. In the top1err panel, the upper curve is the test process and the lower curve is the training process. The top1err test value can stabilize at 90%.
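The two error rates plotted in Fig. 2 can be computed from a matrix of class scores as sketched below, assuming the usual definition in which a sample counts as correct for top-k if its true class is among the k highest-scoring classes. The function name `topk_error` and the small 3-class score matrix are hypothetical; with only 3 classes the example uses k = 2 in place of k = 5.

```python
import numpy as np

def topk_error(scores, labels, k=1):
    """Fraction of samples whose true class is not among the k highest scores."""
    topk = np.argsort(scores, axis=1)[:, -k:]
    hit = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Hypothetical network outputs for 3 samples and 3 classes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 2])

top1 = topk_error(scores, labels, k=1)
top2 = topk_error(scores, labels, k=2)
```

By construction the top-k error can only stay the same or decrease as k grows, which is why a top5err curve lies at or below the corresponding top1err curve.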
Although the present invention has been illustrated and described with reference to preferred embodiments, those skilled in the art will understand that various changes and modifications may be made to the invention without departing from the scope defined by its claims.

Claims (8)

1. A dialect classification method based on convolutional neural networks, characterized by comprising the following steps:
(1) building a sample set containing dialects from multiple places, preprocessing the samples, and labeling them;
(2) scaling all pictures in the training set and test set to color images of a predefined size and assigning label information to each picture, the label information indicating the county-level city to which the picture belongs;
(3) building a convolutional neural network whose layers are, in order, an input layer, multiple convolutional layers, a fully connected layer, and an output layer, and training the convolutional neural network using gradient descent and the back-propagation algorithm;
(4) after training is complete, obtaining a plot of the error-rate decline during training.
2. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (1), the sample set is preprocessed: each audio file is converted into a spectrogram, and the blank margins of the spectrogram are removed.
3. The dialect classification method based on convolutional neural networks as claimed in claim 2, characterized in that, in step (1), the sample set includes dialect samples from multiple places.
4. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (2), the pictures are uniformly scaled to 227 × 227 color pictures.
5. The dialect classification method based on convolutional neural networks as claimed in claim 4, characterized in that, in step (3), the convolutional neural network is the classic AlexNet network structure; in this network, the first layer is the input layer, which receives a 227 × 227 color image as input, and the last layer is the output layer, with N nodes in total, where N is the total number of classes in the dialect data set to be classified.
6. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (3), the specific steps of the gradient descent algorithm are: starting from any point, move a certain distance in the direction opposite to the gradient at that point, then move a certain distance in the direction opposite to the gradient at the new position, and iterate in this way; the solution always moves in the direction of steepest descent, in the hope of reaching the global minimum of the function, i.e., the point at which the error is smallest.
7. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (3), the specific steps of the back-propagation algorithm are: when gradient descent is used to search for the minimum of the error, the weights are updated layer by layer, starting from the last layer of the network and moving toward the first; the weights are updated by back-propagation, i.e., by the chain rule of differentiation, which is as follows:
$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.$$
8. The dialect classification method based on convolutional neural networks as claimed in claim 1, characterized in that, in step (4), the training samples and test samples, i.e., all samples, are trained in batches, and the weights are updated continually until the value of the objective function converges to a value within a stable region, i.e., until the error rate converges to a stable value.
CN201710144714.3A 2017-03-13 2017-03-13 A kind of dialect sorting technique based on convolutional neural networks Pending CN106919710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710144714.3A CN106919710A (en) 2017-03-13 2017-03-13 A kind of dialect sorting technique based on convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710144714.3A CN106919710A (en) 2017-03-13 2017-03-13 A kind of dialect sorting technique based on convolutional neural networks

Publications (1)

Publication Number Publication Date
CN106919710A true CN106919710A (en) 2017-07-04

Family

ID=59461330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710144714.3A Pending CN106919710A (en) 2017-03-13 2017-03-13 A kind of dialect sorting technique based on convolutional neural networks

Country Status (1)

Country Link
CN (1) CN106919710A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170735A (en) * 2017-12-15 2018-06-15 东南大学 A kind of dialect databases method for building up suitable for convolutional neural networks
CN109887497A (en) * 2019-04-12 2019-06-14 北京百度网讯科技有限公司 Modeling method, device and the equipment of speech recognition
CN110033760A (en) * 2019-04-15 2019-07-19 北京百度网讯科技有限公司 Modeling method, device and the equipment of speech recognition
CN110148400A (en) * 2018-07-18 2019-08-20 腾讯科技(深圳)有限公司 The pronunciation recognition methods of type, the training method of model, device and equipment
WO2019232849A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Chinese character model training method, handwritten character recognition method, apparatuses, device and medium
CN111488486A (en) * 2020-04-20 2020-08-04 武汉大学 Electronic music classification method and system based on multi-sound-source separation
CN111881797A (en) * 2020-07-20 2020-11-03 北京理工大学 Method, device, equipment and storage medium for finely classifying vegetation on coastal wetland
CN115472147A (en) * 2022-09-15 2022-12-13 北京大学深圳医院 Language identification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test
WO2015180368A1 (en) * 2014-05-27 2015-12-03 江苏大学 Variable factor decomposition method for semi-supervised speech features
CN105895110A (en) * 2016-06-30 2016-08-24 北京奇艺世纪科技有限公司 Method and device for classifying audio files
CN106485251A (en) * 2016-10-08 2017-03-08 天津工业大学 Egg embryo classification based on deep learning

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170735A (en) * 2017-12-15 2018-06-15 东南大学 A kind of dialect databases method for building up suitable for convolutional neural networks
WO2019232849A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Chinese character model training method, handwritten character recognition method, apparatuses, device and medium
CN110148400A (en) * 2018-07-18 2019-08-20 腾讯科技(深圳)有限公司 The pronunciation recognition methods of type, the training method of model, device and equipment
CN110148400B (en) * 2018-07-18 2023-03-17 腾讯科技(深圳)有限公司 Pronunciation type recognition method, model training method, device and equipment
CN109887497A (en) * 2019-04-12 2019-06-14 北京百度网讯科技有限公司 Modeling method, device and the equipment of speech recognition
CN109887497B (en) * 2019-04-12 2021-01-29 北京百度网讯科技有限公司 Modeling method, device and equipment for speech recognition
CN110033760A (en) * 2019-04-15 2019-07-19 北京百度网讯科技有限公司 Modeling method, device and the equipment of speech recognition
US11688391B2 (en) 2019-04-15 2023-06-27 Beijing Baidu Netcom Science And Technology Co. Mandarin and dialect mixed modeling and speech recognition
CN110033760B (en) * 2019-04-15 2021-01-29 北京百度网讯科技有限公司 Modeling method, device and equipment for speech recognition
CN111488486B (en) * 2020-04-20 2021-08-17 武汉大学 Electronic music classification method and system based on multi-sound-source separation
CN111488486A (en) * 2020-04-20 2020-08-04 武汉大学 Electronic music classification method and system based on multi-sound-source separation
CN111881797A (en) * 2020-07-20 2020-11-03 北京理工大学 Method, device, equipment and storage medium for finely classifying vegetation on coastal wetland
CN115472147A (en) * 2022-09-15 2022-12-13 北京大学深圳医院 Language identification method and device

Similar Documents

Publication Publication Date Title
CN106919710A (en) A kind of dialect sorting technique based on convolutional neural networks
Zhao et al. A visual long-short-term memory based integrated CNN model for fabric defect image classification
Poma et al. Dense extreme inception network: Towards a robust cnn model for edge detection
Hertel et al. Deep convolutional neural networks as generic feature extractors
CN109558942B (en) Neural network migration method based on shallow learning
CN110309856A (en) Image classification method, the training method of neural network and device
Colak et al. Automated McIntosh-based classification of sunspot groups using MDI images
CN107423756A (en) Nuclear magnetic resonance image sequence sorting technique based on depth convolutional neural networks combination shot and long term memory models
CN107358169A (en) A kind of facial expression recognizing method and expression recognition device
CN104517122A (en) Image target recognition method based on optimized convolution architecture
CN107408209A (en) Without the classification of the automatic defect of sampling and feature selecting
CN105718952A (en) Method for focus classification of sectional medical images by employing deep learning network
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN110070107A (en) Object identification method and device
AU2020100052A4 (en) Unattended video classifying system based on transfer learning
CN110457982A (en) A kind of crop disease image-recognizing method based on feature transfer learning
CN107203606A (en) Text detection and recognition methods under natural scene based on convolutional neural networks
Pathar et al. Human emotion recognition using convolutional neural network in real time
CN112614119A (en) Medical image region-of-interest visualization method, device, storage medium and equipment
CN109815945A (en) A kind of respiratory tract inspection result interpreting system and method based on image recognition
CN108960260A (en) A kind of method of generating classification model, medical image image classification method and device
CN109508640A (en) A kind of crowd's sentiment analysis method, apparatus and storage medium
Paul et al. A modern approach for sign language interpretation using convolutional neural network
CN112668486A (en) Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network
Diouf et al. Convolutional neural network and decision support in medical imaging: case study of the recognition of blood cell subtypes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170704