CN108010514A - Speech classification method based on a deep neural network - Google Patents
Speech classification method based on a deep neural network
- Publication number
- CN108010514A CN108010514A CN201711155884.8A CN201711155884A CN108010514A CN 108010514 A CN108010514 A CN 108010514A CN 201711155884 A CN201711155884 A CN 201711155884A CN 108010514 A CN108010514 A CN 108010514A
- Authority
- CN
- China
- Prior art keywords
- local
- classification
- global
- frequency domain
- spectrogram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 18
- 230000007246 mechanism Effects 0.000 claims abstract description 23
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 21
- 239000000284 extract Substances 0.000 claims abstract description 8
- 238000012549 training Methods 0.000 claims abstract description 7
- 238000005267 amalgamation Methods 0.000 claims abstract description 5
- 230000007423 decrease Effects 0.000 claims abstract description 4
- 238000000605 extraction Methods 0.000 claims description 9
- 239000000203 mixture Substances 0.000 claims description 7
- 238000009826 distribution Methods 0.000 claims description 5
- 230000001537 neural effect Effects 0.000 claims description 4
- 230000008569 process Effects 0.000 claims description 4
- 230000004913 activation Effects 0.000 claims description 3
- 238000001228 spectrum Methods 0.000 claims description 3
- 238000012546 transfer Methods 0.000 claims description 2
- 230000000875 corresponding effect Effects 0.000 description 6
- 230000006870 function Effects 0.000 description 6
- 230000008859 change Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000008909 emotion recognition Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000036651 mood Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a speech classification method based on a deep neural network, aiming to solve different speech classification problems with a single unified model. The method comprises the following steps. S1: convert the speech into the corresponding spectrogram, and partition the complete spectrogram along the frequency axis to obtain a set of local frequency-domain segments. S2: feed the complete and local frequency-domain information into the model as separate inputs; from these inputs, convolutional neural networks extract local and global features. S3: fuse the global and local feature representations with an attention mechanism to form the final feature representation. S4: train the network on labeled data with gradient descent and backpropagation. S5: for unlabeled speech, use the trained parameters and take the class with the highest output probability as the prediction. The invention provides a unified model for different speech classification problems and improves accuracy on multiple speech classification tasks.
Description
Technical field
The invention relates to a speech classification method based on a deep neural network for handling classification tasks over different kinds of speech, and belongs to the technical fields of speech signal processing and artificial intelligence.
Background technology
With the rapid development of computer technology, people rely on and demand ever more from computers, and how to interact with them more naturally has become a research hotspot. Speech is the most common and natural means of communication in daily life, and it carries a great deal of information, such as the speaker's accent and affective state. A computer's ability to classify speech is an important component of speech processing and a key precondition for a natural human-computer interface, so it has great research and application value. Speech classification is an important research direction that plays a role in speech recognition, spoken content detection, and related areas. It is also the foundation for further audio processing: for a given segment of audio, classification can first determine the acoustic environment, the speaker's gender, accent, emotion, and so on, providing a basis for adapting the speech model. Speech classification methods are therefore of vital importance.
Speech classification covers a variety of tasks, for example speech emotion recognition, accent recognition, speaker identification, and acoustic environment classification. The main challenge of these tasks is the high dimensionality of speech. Traditional speech classification methods usually extract specific hand-crafted audio features for a single problem or database in order to reduce the input dimension of the classification network. However, feature extraction requires substantial speech-processing expertise, and because it acts as an information filter, it can discard useful information. Moreover, traditional classification algorithms, such as support vector machines, are often poorly suited to multi-class tasks. These are the difficulties this work addresses.
Deep neural networks are currently among the most important tools for processing big data, especially high-dimensional data. By stacking layers of nonlinear mapping functions and training the connection weights, a deep network learns features of the audio data and uses them for classification. Because it supports feedback and learning, the network can adjust its internal parameters according to the output. Deep neural networks have by now spread across many disciplines and have been applied successfully to machine translation, speech recognition, object recognition, and other fields.
Summary of the invention
In view of the above shortcomings, the present invention provides a speech classification method based on a deep neural network, solving the problems in the prior art that feature extraction methods target only a single specific task or dataset and that high-dimensional data are hard to handle.
To achieve this goal, the technical solution adopted by the present invention is as follows:
A speech classification method based on a deep neural network, characterized by comprising the following steps:
S1: apply the short-time Fourier transform to the speech data to obtain the corresponding spectrogram; partition the complete spectrogram along the frequency axis to obtain a set of local frequency-domain segments;
S2: build a model based on convolutional neural networks and an attention mechanism; feed the complete spectrogram and the local frequency-domain segments into the model as separate inputs for feature learning, using convolutional neural networks to extract local and global features from the local and complete spectrogram information;
S3: fuse the global and local feature representations with the attention mechanism to form the final feature representation, and feed it to a softmax classifier to predict the class of the speech;
S4: train the network on labeled speech data with gradient descent and backpropagation, and save the network parameters;
S5: for unlabeled speech, predict with the trained model and take the class with the highest output probability as the final result.
Further, the distributed spectrogram conversion in S1 specifically comprises the following steps:

Apply the short-time Fourier transform to the original audio: divide the given audio into M short segments; for each segment, compute its short-time spectrum and take the modulus, finally obtaining the complete spectrogram S:

$$S = (s_{ij})_{M \times N} \qquad (1)$$

where N is the number of frequency bins of each short segment. Formula (1) shows that the spectrogram is a two-dimensional matrix: one dimension represents the order of time in the speech and the other represents frequency bands from low to high, and the value at each point represents the magnitude of the amplitude.

Partitioning the complete spectrogram along the frequency axis yields a set of local and global spectral information, i.e., a group of input data based on different frequency-band distributions: $\{x_1, x_2, \dots, x_n, S\}$.
Further, the convolutional feature extraction in S2 specifically comprises the following steps:

For each of the local inputs, use a convolutional neural network to extract its features, obtaining a group of local representations:

$$h_i = f(W_i * x_i + b_i), \quad i = 1, \dots, n \qquad (2)$$

where each local input $x_i$ has its own convolution parameters $W_i$ and $b_i$, and $f$ is the activation function; the resulting group of local features is $\{h_1, h_2, \dots, h_n\}$.

For the current complete global frequency-domain information, use a convolutional neural network to extract the global feature:

$$a = f(W * S + b) \qquad (3)$$

where $a$ is the global feature extracted by the convolutional network.
Formulas (2) and (3) mainly involve the convolution and pooling operations of convolutional neural networks. The convolution operation is:

$$z_{ij} = f\Big(\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} w_{mn}\, x_{(i+m)(j+n)} + b\Big) \qquad (4)$$

where M and N define the size of the convolution kernel, m and n index its rows and columns, $f$ is the activation function, $z_{ij}$ is the feature value at row i and column j of the current layer, $x_{(i+m)(j+n)}$ is the corresponding input, $w$ holds the kernel parameters, and $b$ is the corresponding bias.

The convolution operation of formula (4) plays an important role in convolutional networks. Because the weights are shared, the extracted features are approximately invariant: a small change in the input produces only a small change in the extracted features.

The pooling operation is:

$$p = \mathrm{pool}(a) \qquad (5)$$

where pool denotes the pooling function; the three most common choices take the maximum, minimum, or average value within the receptive field (the spatial extent of the kernel), $a$ is the input to the pooling layer, and $p$ is the output of the pooling operation.

The pooling in formula (5) greatly reduces the number of weights in the network and helps prevent overfitting.
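Formulas (4) and (5) can be illustrated with a naive NumPy implementation. A real system would use an optimized deep learning framework; the ReLU activation and the 2x2 max pooling below are assumptions, chosen only to make the sketch concrete.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d(x, w, b):
    """Valid 2-D convolution of formula (4): z_ij = f(sum_mn w_mn x_(i+m)(j+n) + b)."""
    M, N = w.shape
    H, W = x.shape
    out = np.empty((H - M + 1, W - N + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * x[i:i + M, j:j + N]) + b
    return relu(out)

def max_pool(a, size=2):
    """Formula (5) with the max variant: take the maximum in each receptive field."""
    H, W = a.shape
    H, W = H - H % size, W - W % size                 # drop ragged edges
    a = a[:H, :W].reshape(H // size, size, W // size, size)
    return a.max(axis=(1, 3))

x = np.random.randn(8, 8)                      # a patch of the spectrogram
z = conv2d(x, w=np.random.randn(3, 3), b=0.1)  # (6, 6) feature map
p = max_pool(z)                                # (3, 3) pooled map
```

Note how pooling reduces a (6, 6) map to (3, 3): each output weight downstream now covers four input positions, which is the weight reduction the text refers to.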
Further, the attention-based fusion of global and local feature representations in S3 specifically comprises the following steps:

Based on the different local features, the attention mechanism derives a new global feature expression. First, each part of the global information is assigned a coefficient:

$$\alpha_j^{(i)} = \frac{\exp\big(\mathrm{score}(h_i, a_j)\big)}{\sum_{k=1}^{m} \exp\big(\mathrm{score}(h_i, a_k)\big)} \qquad (6)$$

where $a_j$ is one part of the global feature $a$ (m parts in total) and $\alpha_j^{(i)}$ is the coefficient of that part given the current local feature $h_i$, representing its degree of importance.

Formula (6) is the essential operation of the attention mechanism: guided by a local feature, each part of the global feature $a$ receives a different weight $\alpha_j^{(i)}$ representing the importance of that part. The aim is that, through network training, the most representative parts are found.

Each computed importance coefficient is then multiplied by the corresponding part to form a new piece of global information:

$$g_i = \sum_{j=1}^{m} \alpha_j^{(i)}\, a_j \qquad (7)$$

Applying the attention mechanism in this way yields n new global representations, which are added elementwise to the initial global feature $a$ to obtain the final feature representation:

$$A = a + \sum_{i=1}^{n} g_i \qquad (8)$$

The final feature representation A is fed to a softmax classifier, and the class with the largest probability is the predicted class of the speech data.
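The fusion described by formulas (6)-(8) can be sketched in NumPy. The text does not fix the score function or how the aggregated global feature relates to its parts, so the dot-product score and the mean aggregation below are assumptions.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attention_fuse(a_parts, h_locals):
    """a_parts: (m, d) array, the m parts a_j of the global feature.
    h_locals: (n, d) array, the n local features h_i.
    Implements formulas (6)-(8) with a dot-product score (an assumption)."""
    a = a_parts.mean(axis=0)            # aggregated global feature (assumed)
    A = a.copy()
    for h in h_locals:
        alpha = softmax(a_parts @ h)    # (6): one coefficient per part a_j
        g = alpha @ a_parts             # (7): weighted sum of the parts
        A += g                          # (8): elementwise addition to a
    return A

a_parts = np.random.randn(5, 8)         # m = 5 parts of the global feature
h_locals = np.random.randn(3, 8)        # n = 3 local features
A = attention_fuse(a_parts, h_locals)   # final feature representation A
```

Because the softmax in (6) normalizes the coefficients to sum to one, each $g_i$ is a convex combination of the global parts, so the fusion re-weights rather than rescales the global information.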
The beneficial effects of the above scheme are:
(1) Traditional speech classification methods use a different feature extraction algorithm for each individual problem. The present invention applies a deep neural network directly to the speech spectrogram for feature learning, so it can autonomously learn different audio features depending on the task.
(2) Training deep neural networks generally requires large amounts of data, but publicly available speech datasets are relatively small. Building on conventional deep neural networks, the present invention proposes a model that fuses convolutional neural networks with an attention mechanism, further improving recognition accuracy on multiple tasks.
Taking the two speech classification tasks of accent recognition and speaker identification as examples:

Table 1 compares the model of the present invention with other methods on accent recognition, where i-Vector is a classical feature extraction algorithm and VGG and ResNets are representative convolutional neural network models.

Table 2 compares the model of the present invention with other methods on speaker identification, where MFCC is a classical feature extraction algorithm and VGG and ResNets are representative convolutional neural network models.
The above results show that:
1) On multiple speech classification problems, the features learned by the proposed model achieve better recognition results than traditional feature extraction algorithms.
2) Compared with other neural network methods, applying the attention mechanism within convolutional neural networks increases the model's robustness and generalization ability, improving classification accuracy on all the problems tested.
Brief description of the drawings
Fig. 1 is an overview of the algorithm model of the present invention;
Fig. 2 shows the frequency-domain distributed spectrogram;
Fig. 3 is the basic structure of the convolution block employing the attention mechanism;
Fig. 4 is the overall flow chart of the present invention.
Specific embodiment
The technical solution of this embodiment is described in detail below with reference to the accompanying drawings. The embodiments described here are only some of the embodiments of the present invention, not all of them.
Referring to Fig. 1, the core of the speech classification model based on a deep neural network is a deep network composed of multiple convolution blocks that employ an attention mechanism. One component is the convolutional neural network, which uses stacked nonlinear functions to learn the mapping between the input data and the features; the deep learning algorithm can automatically learn the features relevant to the target. The other is the attention mechanism, which assigns different weights to the local information, so that each piece of local information contributes to the representation in a different proportion. By fusing the deep learning algorithm with the attention mechanism, the present invention effectively improves the accuracy of speech classification.
The speech classification method based on a deep neural network comprises the following steps:
Step S1: apply the short-time Fourier transform to the original audio: divide the given audio into M short segments; for each segment, compute its short-time spectrum and take the modulus, finally obtaining the complete spectrogram S:

$$S = (s_{ij})_{M \times N} \qquad (1)$$

where N is the number of frequency bins of each short segment.

Partitioning the complete spectrogram along the frequency axis yields a set of local and global spectral information, i.e., a group of input data based on different frequency-band distributions: $\{x_1, x_2, \dots, x_n, S\}$.
Fig. 2 shows the complete spectrogram and the frequency-domain distributed spectrogram. The distributed spectrogram is obtained by partitioning along frequency bands, yielding the distribution information of different frequency ranges.
Step S2: build the model based on convolutional neural networks and the attention mechanism, and feed the complete spectrogram and the local frequency-domain segments into the model as separate inputs for feature learning. For each of the local inputs, use a convolutional neural network to extract its features, obtaining a group of local representations:

$$h_i = f(W_i * x_i + b_i), \quad i = 1, \dots, n \qquad (2)$$

where each local input $x_i$ has its own convolution parameters $W_i$ and $b_i$, and $f$ is the activation function; the resulting group of local features is $\{h_1, h_2, \dots, h_n\}$.

For the current complete global frequency-domain information, use a convolutional neural network to extract the global feature:

$$a = f(W * S + b) \qquad (3)$$

where $a$ is the global feature extracted by the convolutional network.
Step S3: on the basis of the local and global features produced in step S2, apply the attention mechanism to derive a new global feature expression. First, each part of the global information is assigned a coefficient:

$$\alpha_j^{(i)} = \frac{\exp\big(\mathrm{score}(h_i, a_j)\big)}{\sum_{k=1}^{m} \exp\big(\mathrm{score}(h_i, a_k)\big)} \qquad (6)$$

where $a_j$ is one part of the global feature $a$ (m parts in total) and $\alpha_j^{(i)}$ is the coefficient of that part given the current local feature $h_i$, representing its degree of importance.

Each computed importance coefficient is then multiplied by the corresponding part to form a new piece of global information:

$$g_i = \sum_{j=1}^{m} \alpha_j^{(i)}\, a_j \qquad (7)$$

Applying the attention mechanism in this way yields n new global representations, which are added elementwise to the initial global feature $a$ to obtain the final feature representation:

$$A = a + \sum_{i=1}^{n} g_i \qquad (8)$$

The final feature representation A is fed to a softmax classifier, and the class with the largest probability is the predicted class of the speech data.
Fig. 3 shows the basic structure of the attention-based convolution block, including the feature extraction from local and global information and the final attention-based fusion that yields the final feature representation A.
Step S4: train the network on labeled speech data with gradient descent and backpropagation, and save the network parameters. The parameters of the initial model are randomly initialized; the labeled speech data produce an error signal that is used to update the parameters until the network becomes stable, and the best parameters are retained.
Step S5: for unlabeled speech, predict with the trained model and parameters; the class with the highest output probability is the final prediction result.
Fig. 4 illustrates the complete process of the invention from step S1 to step S5. If there is more audio to be recognized, steps S1 to S5 are repeated, and the class with the highest classifier output probability is the prediction result.
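At inference time (step S5), the final classification reduces to a softmax over class scores followed by an argmax. A minimal sketch with random stand-in weights and hypothetical class labels (the labels and the linear output layer are assumptions, not the patent's trained parameters):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def predict(final_feature, W, b, classes):
    """Step S5: map the final feature representation A to class scores,
    apply softmax, and return the class with the highest probability."""
    probs = softmax(W @ final_feature + b)
    return classes[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
A = rng.standard_normal(8)                         # final feature from step S3
W, b = rng.standard_normal((4, 8)), rng.standard_normal(4)
classes = ["accent_a", "accent_b", "accent_c", "accent_d"]  # hypothetical labels
label, probs = predict(A, W, b, classes)
```

In a trained system, W and b would be the saved output-layer parameters from step S4, and the loop over steps S1 to S5 would simply call this function once per new audio segment.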
Claims (4)
1. A speech classification method based on a deep neural network, characterized by combining a distributed spectrogram with convolutional neural networks and an attention mechanism, comprising the following steps:
S1: apply the short-time Fourier transform to the speech data to obtain the corresponding spectrogram; partition the complete spectrogram along the frequency axis to obtain a set of local frequency-domain segments;
S2: build a model based on convolutional neural networks and an attention mechanism; feed the complete spectrogram and the local frequency-domain segments into the model as separate inputs for feature learning, using convolutional neural networks to extract local and global features from the local and complete spectrogram information;
S3: fuse the global and local feature representations with the attention mechanism to form the final feature representation, and feed it to a softmax classifier to predict the class of the speech;
S4: train the network on labeled speech data with gradient descent and backpropagation, and save the network parameters;
S5: for unlabeled speech, predict with the trained model and take the class with the highest output probability as the final result.
2. The speech classification method based on a deep neural network according to claim 1, characterized in that the distributed spectrogram conversion in S1 specifically comprises the following steps:
S11: apply the short-time Fourier transform to the original audio: divide the given audio into M short segments; for each segment, compute its short-time spectrum and take the modulus, finally obtaining the complete spectrogram S:

$$S = (s_{ij})_{M \times N} \qquad (1)$$

where N is the number of frequency bins of each short segment;
S12: partition the complete spectrogram along the frequency axis, where each local frequency-domain segment is expressed as:

$$x_i = S_{[\,:\,,\,F_i]}, \quad i = 1, \dots, n \qquad (2)$$

where $F_i$ denotes the i-th frequency band, finally obtaining a set of local and global spectral information, i.e., a group of input data based on different frequency-band distributions: $\{x_1, x_2, \dots, x_n, S\}$.
3. The speech classification method based on a deep neural network according to claim 1, characterized in that the convolutional feature extraction in S2 specifically comprises the following steps:
S21: for each of the local inputs, use a convolutional neural network to extract its features, obtaining a group of local representations:

$$h_i = f(W_i * x_i + b_i), \quad i = 1, \dots, n \qquad (3)$$

where each local input $x_i$ has its own convolution parameters $W_i$ and $b_i$, and $f$ is the activation function; the resulting group of local features is $\{h_1, h_2, \dots, h_n\}$;
S22: for the current complete global frequency-domain information, use a convolutional neural network to extract the global feature:

$$a = f(W * S + b) \qquad (4)$$

where $a$ is the global feature extracted by the convolutional network.
4. The speech classification method based on a deep neural network according to claim 1, characterized in that the attention-based fusion of global and local feature representations in step S3 specifically comprises the following steps:
based on the different local features, apply the attention mechanism to derive a new global feature expression; first, each part of the global information is assigned a coefficient:

$$\alpha_j^{(i)} = \frac{\exp\big(\mathrm{score}(h_i, a_j)\big)}{\sum_{k=1}^{m} \exp\big(\mathrm{score}(h_i, a_k)\big)} \qquad (6)$$

where $a_j$ is one part of the global feature $a$ (m parts in total) and $\alpha_j^{(i)}$ is the coefficient of that part given the current local feature $h_i$, representing its degree of importance;
each computed importance coefficient is then multiplied by the corresponding part to form a new piece of global information:

$$g_i = \sum_{j=1}^{m} \alpha_j^{(i)}\, a_j \qquad (7)$$

applying the attention mechanism in this way yields n new global representations, which are added elementwise to the initial global feature $a$ to obtain the final feature representation:

$$A = a + \sum_{i=1}^{n} g_i \qquad (8)$$

the final feature representation A is fed to a softmax classifier, and the class with the largest probability is the predicted class of the speech data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711155884.8A CN108010514B (en) | 2017-11-20 | 2017-11-20 | Voice classification method based on deep neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711155884.8A CN108010514B (en) | 2017-11-20 | 2017-11-20 | Voice classification method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108010514A true CN108010514A (en) | 2018-05-08 |
CN108010514B CN108010514B (en) | 2021-09-10 |
Family
ID=62052777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711155884.8A Expired - Fee Related CN108010514B (en) | 2017-11-20 | 2017-11-20 | Voice classification method based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108010514B (en) |
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846048A (en) * | 2018-05-30 | 2018-11-20 | 大连理工大学 | Musical genre classification method based on Recognition with Recurrent Neural Network and attention mechanism |
CN108877783A (en) * | 2018-07-05 | 2018-11-23 | 腾讯音乐娱乐科技(深圳)有限公司 | The method and apparatus for determining the audio types of audio data |
CN109243466A (en) * | 2018-11-12 | 2019-01-18 | 成都傅立叶电子科技有限公司 | A kind of vocal print authentication training method and system |
CN109256135A (en) * | 2018-08-28 | 2019-01-22 | 桂林电子科技大学 | A kind of end-to-end method for identifying speaker, device and storage medium |
CN109285539A (en) * | 2018-11-28 | 2019-01-29 | 中国电子科技集团公司第四十七研究所 | A kind of sound identification method neural network based |
CN109410914A (en) * | 2018-08-28 | 2019-03-01 | 江西师范大学 | A kind of Jiangxi dialect phonetic and dialect point recognition methods |
CN109509475A (en) * | 2018-12-28 | 2019-03-22 | 出门问问信息科技有限公司 | Method, apparatus, electronic equipment and the computer readable storage medium of speech recognition |
CN109523994A (en) * | 2018-11-13 | 2019-03-26 | 四川大学 | A kind of multitask method of speech classification based on capsule neural network |
CN109599129A (en) * | 2018-11-13 | 2019-04-09 | 杭州电子科技大学 | Voice depression recognition methods based on attention mechanism and convolutional neural networks |
CN109767790A (en) * | 2019-02-28 | 2019-05-17 | 中国传媒大学 | A kind of speech-emotion recognition method and system |
CN109817233A (en) * | 2019-01-25 | 2019-05-28 | 清华大学 | Voice flow steganalysis method and system based on level attention network model |
CN110047516A (en) * | 2019-03-12 | 2019-07-23 | 天津大学 | A kind of speech-emotion recognition method based on gender perception |
CN110197206A (en) * | 2019-05-10 | 2019-09-03 | 杭州深睿博联科技有限公司 | The method and device of image procossing |
CN110223714A (en) * | 2019-06-03 | 2019-09-10 | 杭州哲信信息技术有限公司 | A kind of voice-based Emotion identification method |
CN110459225A (en) * | 2019-08-14 | 2019-11-15 | 南京邮电大学 | A kind of speaker identification system based on CNN fusion feature |
CN110503128A (en) * | 2018-05-18 | 2019-11-26 | 百度(美国)有限责任公司 | The spectrogram that confrontation network carries out Waveform composition is generated using convolution |
CN110534133A (en) * | 2019-08-28 | 2019-12-03 | 珠海亿智电子科技有限公司 | A kind of speech emotion recognition system and speech-emotion recognition method |
CN110648669A (en) * | 2019-09-30 | 2020-01-03 | 上海依图信息技术有限公司 | Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium |
CN110782878A (en) * | 2019-10-10 | 2020-02-11 | 天津大学 | Attention mechanism-based multi-scale audio scene recognition method |
CN110992988A (en) * | 2019-12-24 | 2020-04-10 | 东南大学 | Speech emotion recognition method and device based on domain adversarial learning |
CN111009262A (en) * | 2019-12-24 | 2020-04-14 | 携程计算机技术(上海)有限公司 | Voice gender identification method and system |
CN111223488A (en) * | 2019-12-30 | 2020-06-02 | Oppo广东移动通信有限公司 | Voice wake-up method, device, equipment and storage medium |
CN111259189A (en) * | 2018-11-30 | 2020-06-09 | 马上消费金融股份有限公司 | Music classification method and device |
CN111340187A (en) * | 2020-02-18 | 2020-06-26 | 河北工业大学 | Network characterization method based on an adversarial attention mechanism |
CN111666996A (en) * | 2020-05-29 | 2020-09-15 | 湖北工业大学 | High-precision equipment source identification method based on attention mechanism |
CN112466298A (en) * | 2020-11-24 | 2021-03-09 | 网易(杭州)网络有限公司 | Voice detection method and device, electronic equipment and storage medium |
CN112489689A (en) * | 2020-11-30 | 2021-03-12 | 东南大学 | Cross-database speech emotion recognition method and device based on multi-scale difference adversarial learning |
CN112489687A (en) * | 2020-10-28 | 2021-03-12 | 深兰人工智能芯片研究院(江苏)有限公司 | Speech emotion recognition method and device based on sequence convolution |
CN112885372A (en) * | 2021-01-15 | 2021-06-01 | 国网山东省电力公司威海供电公司 | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound |
CN112967730A (en) * | 2021-01-29 | 2021-06-15 | 北京达佳互联信息技术有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
CN112992119A (en) * | 2021-01-14 | 2021-06-18 | 安徽大学 | Deep neural network-based accent classification method and model thereof |
CN113035227A (en) * | 2021-03-12 | 2021-06-25 | 山东大学 | Multi-modal voice separation method and system |
CN113049084A (en) * | 2021-03-16 | 2021-06-29 | 电子科技大学 | Attention mechanism-based Resnet distributed optical fiber sensing signal identification method |
CN113409827A (en) * | 2021-06-17 | 2021-09-17 | 山东省计算中心(国家超级计算济南中心) | Voice endpoint detection method and system based on local convolution block attention network |
CN113571063A (en) * | 2021-02-02 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Voice signal recognition method and device, electronic equipment and storage medium |
CN114141244A (en) * | 2020-09-04 | 2022-03-04 | 四川大学 | Voice recognition technology based on audio media analysis |
CN116504259A (en) * | 2023-06-30 | 2023-07-28 | 中汇丰(北京)科技有限公司 | Semantic recognition method based on natural language processing |
CN116778951A (en) * | 2023-05-25 | 2023-09-19 | 上海蜜度信息技术有限公司 | Audio classification method, device, equipment and medium based on graph enhancement |
CN116825092A (en) * | 2023-08-28 | 2023-09-29 | 珠海亿智电子科技有限公司 | Speech recognition method, training method and device of speech recognition model |
CN117275491A (en) * | 2023-11-17 | 2023-12-22 | 青岛科技大学 | Sound classification method based on audio conversion and temporal graph neural network |
CN112967730B (en) * | 2021-01-29 | 2024-07-02 | 北京达佳互联信息技术有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706780A (en) * | 2009-09-03 | 2010-05-12 | 北京交通大学 | Image semantic retrieval method based on a visual attention model |
CN102044254A (en) * | 2009-10-10 | 2011-05-04 | 北京理工大学 | Speech spectrum color enhancement method for speech visualization |
US20160283841A1 (en) * | 2015-03-27 | 2016-09-29 | Google Inc. | Convolutional neural networks |
US20170099200A1 (en) * | 2015-10-06 | 2017-04-06 | Evolv Technologies, Inc. | Platform for Gathering Real-Time Analysis |
CN106652999A (en) * | 2015-10-29 | 2017-05-10 | 三星Sds株式会社 | System and method for voice recognition |
CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | Speech emotion recognition method |
CN106952649A (en) * | 2017-05-14 | 2017-07-14 | 北京工业大学 | Speaker recognition method based on convolutional neural network and spectrogram |
CN107145518A (en) * | 2017-04-10 | 2017-09-08 | 同济大学 | Personalized recommendation system for social networks based on deep learning |
CN107203999A (en) * | 2017-04-28 | 2017-09-26 | 北京航空航天大学 | Automatic dermoscopy image segmentation method based on fully convolutional neural networks |
CN107221326A (en) * | 2017-05-16 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | Artificial-intelligence-based voice wake-up method, device and computer equipment |
CN107239446A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | Intelligent relation extraction method based on neural network and attention mechanism |
CN107256228A (en) * | 2017-05-02 | 2017-10-17 | 清华大学 | Answer selection system and method based on a structured attention mechanism |
CN107316066A (en) * | 2017-07-28 | 2017-11-03 | 北京工商大学 | Image classification method and system based on multi-path convolutional neural networks |
- 2017-11-20: Application CN201711155884.8A filed (China); granted as CN108010514B, status not active (Expired - Fee Related)
Non-Patent Citations (4)
Title |
---|
CHE-WEI HUANG: "Deep Convolutional Recurrent Neural Network with Attention Mechanism for Robust Speech Emotion Recognition", 《ICME 2017》 *
FAN HU: "Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery", 《MDPI》 *
KYUNGHYUN CHO: "Describing Multimedia Content Using Attention-based Encoder-Decoder Networks", 《IEEE TRANSACTIONS ON MULTIMEDIA》 *
L. HE, M. LECH: "Time-frequency feature extraction from spectrograms and wavelet packets with application to automatic stress and emotion classification in speech", 《PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING》 *
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110503128A (en) * | 2018-05-18 | 2019-11-26 | 百度(美国)有限责任公司 | Spectrogram-based waveform synthesis using convolutional generative adversarial networks |
CN108846048A (en) * | 2018-05-30 | 2018-11-20 | 大连理工大学 | Music genre classification method based on recurrent neural network and attention mechanism |
CN108877783A (en) * | 2018-07-05 | 2018-11-23 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and apparatus for determining the audio type of audio data |
CN109410914B (en) * | 2018-08-28 | 2022-02-22 | 江西师范大学 | Method for identifying Jiangxi dialect speech and dialect point |
CN109256135A (en) * | 2018-08-28 | 2019-01-22 | 桂林电子科技大学 | End-to-end speaker verification method, device and storage medium |
CN109256135B (en) * | 2018-08-28 | 2021-05-18 | 桂林电子科技大学 | End-to-end speaker confirmation method, device and storage medium |
CN109410914A (en) * | 2018-08-28 | 2019-03-01 | 江西师范大学 | Jiangxi dialect speech and dialect point recognition method |
CN109243466A (en) * | 2018-11-12 | 2019-01-18 | 成都傅立叶电子科技有限公司 | Voiceprint authentication training method and system |
CN109599129A (en) * | 2018-11-13 | 2019-04-09 | 杭州电子科技大学 | Speech depression recognition method based on attention mechanism and convolutional neural network |
CN109599129B (en) * | 2018-11-13 | 2021-09-14 | 杭州电子科技大学 | Voice depression recognition system based on attention mechanism and convolutional neural network |
CN109523994A (en) * | 2018-11-13 | 2019-03-26 | 四川大学 | Multi-task speech classification method based on a capsule neural network |
CN109285539B (en) * | 2018-11-28 | 2022-07-05 | 中国电子科技集团公司第四十七研究所 | Sound recognition method based on neural network |
CN109285539A (en) * | 2018-11-28 | 2019-01-29 | 中国电子科技集团公司第四十七研究所 | Neural-network-based sound recognition method |
CN111259189A (en) * | 2018-11-30 | 2020-06-09 | 马上消费金融股份有限公司 | Music classification method and device |
CN109509475B (en) * | 2018-12-28 | 2021-11-23 | 出门问问信息科技有限公司 | Voice recognition method and device, electronic equipment and computer readable storage medium |
CN109509475A (en) * | 2018-12-28 | 2019-03-22 | 出门问问信息科技有限公司 | Speech recognition method, apparatus, electronic device and computer-readable storage medium |
CN109817233A (en) * | 2019-01-25 | 2019-05-28 | 清华大学 | Voice-stream steganalysis method and system based on a hierarchical attention network model |
CN109767790A (en) * | 2019-02-28 | 2019-05-17 | 中国传媒大学 | Speech emotion recognition method and system |
CN110047516A (en) * | 2019-03-12 | 2019-07-23 | 天津大学 | Speech emotion recognition method based on gender awareness |
CN110197206A (en) * | 2019-05-10 | 2019-09-03 | 杭州深睿博联科技有限公司 | Image processing method and device |
CN110223714B (en) * | 2019-06-03 | 2021-08-03 | 杭州哲信信息技术有限公司 | Emotion recognition method based on voice |
CN110223714A (en) * | 2019-06-03 | 2019-09-10 | 杭州哲信信息技术有限公司 | Voice-based emotion recognition method |
CN110459225A (en) * | 2019-08-14 | 2019-11-15 | 南京邮电大学 | Speaker recognition system based on CNN fused features |
CN110534133A (en) * | 2019-08-28 | 2019-12-03 | 珠海亿智电子科技有限公司 | Speech emotion recognition system and method |
CN110534133B (en) * | 2019-08-28 | 2022-03-25 | 珠海亿智电子科技有限公司 | Voice emotion recognition system and voice emotion recognition method |
CN110648669A (en) * | 2019-09-30 | 2020-01-03 | 上海依图信息技术有限公司 | Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium |
CN110782878A (en) * | 2019-10-10 | 2020-02-11 | 天津大学 | Attention mechanism-based multi-scale audio scene recognition method |
CN110782878B (en) * | 2019-10-10 | 2022-04-05 | 天津大学 | Attention mechanism-based multi-scale audio scene recognition method |
CN110992988B (en) * | 2019-12-24 | 2022-03-08 | 东南大学 | Speech emotion recognition method and device based on domain adversarial learning |
CN110992988A (en) * | 2019-12-24 | 2020-04-10 | 东南大学 | Speech emotion recognition method and device based on domain adversarial learning |
CN111009262A (en) * | 2019-12-24 | 2020-04-14 | 携程计算机技术(上海)有限公司 | Voice gender identification method and system |
CN111223488B (en) * | 2019-12-30 | 2023-01-17 | Oppo广东移动通信有限公司 | Voice wake-up method, device, equipment and storage medium |
CN111223488A (en) * | 2019-12-30 | 2020-06-02 | Oppo广东移动通信有限公司 | Voice wake-up method, device, equipment and storage medium |
CN111340187A (en) * | 2020-02-18 | 2020-06-26 | 河北工业大学 | Network characterization method based on an adversarial attention mechanism |
CN111340187B (en) * | 2020-02-18 | 2024-02-02 | 河北工业大学 | Network characterization method based on an adversarial attention mechanism |
CN111666996A (en) * | 2020-05-29 | 2020-09-15 | 湖北工业大学 | High-precision equipment source identification method based on attention mechanism |
CN111666996B (en) * | 2020-05-29 | 2023-09-19 | 湖北工业大学 | High-precision equipment source identification method based on attention mechanism |
CN114141244A (en) * | 2020-09-04 | 2022-03-04 | 四川大学 | Voice recognition technology based on audio media analysis |
CN112489687A (en) * | 2020-10-28 | 2021-03-12 | 深兰人工智能芯片研究院(江苏)有限公司 | Speech emotion recognition method and device based on sequence convolution |
CN112489687B (en) * | 2020-10-28 | 2024-04-26 | 深兰人工智能芯片研究院(江苏)有限公司 | Voice emotion recognition method and device based on sequence convolution |
CN112466298A (en) * | 2020-11-24 | 2021-03-09 | 网易(杭州)网络有限公司 | Voice detection method and device, electronic equipment and storage medium |
CN112466298B (en) * | 2020-11-24 | 2023-08-11 | 杭州网易智企科技有限公司 | Voice detection method, device, electronic equipment and storage medium |
CN112489689B (en) * | 2020-11-30 | 2024-04-30 | 东南大学 | Cross-database speech emotion recognition method and device based on multi-scale difference adversarial learning |
CN112489689A (en) * | 2020-11-30 | 2021-03-12 | 东南大学 | Cross-database speech emotion recognition method and device based on multi-scale difference adversarial learning |
CN112992119A (en) * | 2021-01-14 | 2021-06-18 | 安徽大学 | Deep neural network-based accent classification method and model thereof |
CN112992119B (en) * | 2021-01-14 | 2024-05-03 | 安徽大学 | Accent classification method based on deep neural network and model thereof |
CN112885372A (en) * | 2021-01-15 | 2021-06-01 | 国网山东省电力公司威海供电公司 | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound |
CN112885372B (en) * | 2021-01-15 | 2022-08-09 | 国网山东省电力公司威海供电公司 | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound |
CN112967730B (en) * | 2021-01-29 | 2024-07-02 | 北京达佳互联信息技术有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
CN112967730A (en) * | 2021-01-29 | 2021-06-15 | 北京达佳互联信息技术有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
CN113571063B (en) * | 2021-02-02 | 2024-06-04 | 腾讯科技(深圳)有限公司 | Speech signal recognition method and device, electronic equipment and storage medium |
CN113571063A (en) * | 2021-02-02 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Voice signal recognition method and device, electronic equipment and storage medium |
CN113035227A (en) * | 2021-03-12 | 2021-06-25 | 山东大学 | Multi-modal voice separation method and system |
CN113049084A (en) * | 2021-03-16 | 2021-06-29 | 电子科技大学 | Attention mechanism-based Resnet distributed optical fiber sensing signal identification method |
CN113409827A (en) * | 2021-06-17 | 2021-09-17 | 山东省计算中心(国家超级计算济南中心) | Voice endpoint detection method and system based on local convolution block attention network |
CN113409827B (en) * | 2021-06-17 | 2022-06-17 | 山东省计算中心(国家超级计算济南中心) | Voice endpoint detection method and system based on local convolution block attention network |
CN116778951A (en) * | 2023-05-25 | 2023-09-19 | 上海蜜度信息技术有限公司 | Audio classification method, device, equipment and medium based on graph enhancement |
CN116504259B (en) * | 2023-06-30 | 2023-08-29 | 中汇丰(北京)科技有限公司 | Semantic recognition method based on natural language processing |
CN116504259A (en) * | 2023-06-30 | 2023-07-28 | 中汇丰(北京)科技有限公司 | Semantic recognition method based on natural language processing |
CN116825092B (en) * | 2023-08-28 | 2023-12-01 | 珠海亿智电子科技有限公司 | Speech recognition method, training method and device of speech recognition model |
CN116825092A (en) * | 2023-08-28 | 2023-09-29 | 珠海亿智电子科技有限公司 | Speech recognition method, training method and device of speech recognition model |
CN117275491B (en) * | 2023-11-17 | 2024-01-30 | 青岛科技大学 | Sound classification method based on audio conversion and temporal graph attention neural network |
CN117275491A (en) * | 2023-11-17 | 2023-12-22 | 青岛科技大学 | Sound classification method based on audio conversion and temporal graph neural network |
Also Published As
Publication number | Publication date |
---|---|
CN108010514B (en) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108010514A (en) | Speech classification method based on a deep neural network | |
Sun et al. | Speech emotion recognition based on DNN-decision tree SVM model | |
Lee et al. | Multi-level and multi-scale feature aggregation using pretrained convolutional neural networks for music auto-tagging | |
Chang et al. | Learning representations of emotional speech with deep convolutional generative adversarial networks | |
Daneshfar et al. | Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm | |
Yue et al. | The classification of underwater acoustic targets based on deep learning methods | |
KR102154676B1 (en) | Method for training top-down selective attention in artificial neural networks | |
CN109887484A (en) | Speech recognition and speech synthesis method and device based on dual learning | |
CN108229659A (en) | Piano single-key sound recognition method based on deep learning | |
Gupta et al. | A stacked technique for gender recognition through voice | |
WO2021127982A1 (en) | Speech emotion recognition method, smart device, and computer-readable storage medium | |
CN111597333B (en) | Event and event element extraction method and device for block chain field | |
Lee et al. | Deep representation learning for affective speech signal analysis and processing: Preventing unwanted signal disparities | |
Lorena et al. | Automatic microstructural classification with convolutional neural network | |
Guo et al. | Transformer-based spiking neural networks for multimodal audio-visual classification | |
Wani et al. | Deepfakes audio detection leveraging audio spectrogram and convolutional neural networks | |
Qais et al. | Deepfake audio detection with neural networks using audio features | |
Roy et al. | Speech emotion recognition using deep learning | |
Yue et al. | Equilibrium optimizer for emotion classification from english speech signals | |
Song et al. | Transfer learning for music genre classification | |
Li et al. | An improved method of speech recognition based on probabilistic neural network ensembles | |
Al-Thahab | Speech recognition based radon-discrete cosine transforms by Delta Neural Network learning rule | |
MANNEM et al. | Deep Learning Methodology for Recognition of Emotions using Acoustic features. | |
Mohanty et al. | Improvement of speech emotion recognition by deep convolutional neural network and speech features | |
Sunny et al. | Development of a speech recognition system for speaker independent isolated Malayalam words |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20210910 |