CN108010514A - Speech classification method based on a deep neural network - Google Patents

Speech classification method based on a deep neural network

Info

Publication number
CN108010514A
CN108010514A
Authority
CN
China
Prior art keywords
local
classification
global
frequency domain
spectrogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711155884.8A
Other languages
Chinese (zh)
Other versions
CN108010514B (en)
Inventor
毛华
章毅
吴雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201711155884.8A
Publication of CN108010514A
Application granted
Publication of CN108010514B
Legal status: Expired - Fee Related


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a speech classification method based on a deep neural network, intended to solve different speech classification problems with a single unified algorithm model. The method comprises the following steps. S1: convert the speech signal into its spectrogram; partition the complete spectrogram into blocks along the frequency axis to obtain a set of local frequency-domain segments. S2: feed the complete spectrogram and the local segments into the model as separate inputs; from these inputs, convolutional neural networks extract local and global features. S3: fuse the global and local feature representations with an attention mechanism to form the final feature representation. S4: train the network on labeled data by gradient descent and back-propagation. S5: for unlabeled speech, use the trained parameters and take the class with the highest output probability as the prediction. The invention realizes a unified algorithm model for different speech classification problems and improves accuracy on several of them.

Description

Speech classification method based on a deep neural network
Technical field
The invention is a speech classification method based on a deep neural network, used to handle classification tasks over different kinds of speech, and relates to technical fields such as speech signal processing and artificial intelligence.
Background technology
With the rapid development of computer technology, people rely on and demand more from computers, and better human-computer interaction has become a research hotspot. Speech is the most common and natural means of communication in daily life, and it carries a great deal of information, such as the speaker's accent and emotional state. A computer's ability to classify speech is an important component of machine speech processing and a key precondition for a natural human-computer interface, so it has great research and application value. Speech classification is a highly important research direction that plays a role in speech recognition, spoken content detection, and related tasks. It is also the basis and premise of advanced audio processing: for a given segment of audio, classification can determine in advance the acoustic environment and the speaker's gender, accent, mood, and so on, which provides a basis for adapting the speech model. Speech classification methods are therefore vital.
Speech classification covers a variety of tasks, such as speech emotion recognition, accent recognition, speaker identification, and acoustic environment classification. Its main challenge is the high dimensionality of speech. Traditional speech classification methods usually extract hand-crafted audio features for a single problem or database so as to reduce the dimensionality of the input to the classification network. However, feature extraction requires considerable expertise in speech signal processing, and because it filters the signal it can discard information. Moreover, traditional classification algorithms such as support vector machines are often ill-suited to multi-class tasks. These are the difficulties our work must address.
Deep neural networks are currently one of the most important tools for processing big data, especially high-dimensional data. By constructing a multi-layer nonlinear mapping function and training its connection weights, a deep neural network learns features of the audio data and uses them for classification. Because it supports feedback and learning, the network can adjust its internal parameters according to its outputs. The deep neural network wave has gradually spread across many disciplines and has been applied successfully in multiple fields, including machine translation, speech recognition, and object recognition.
Summary of the invention
To address the above shortcomings, the present invention provides a speech classification method based on a deep neural network, solving the prior-art problems that feature extraction methods are cumbersome and that classification of high-dimensional data targets only a single specific task or dataset.
To achieve these goals, the technical solution adopted by the present invention is as follows:
A speech classification method based on a deep neural network, characterized by comprising the following steps:
S1: apply the short-time Fourier transform to the speech data to obtain its spectrogram; partition the complete spectrogram into blocks along the frequency axis to obtain a set of local frequency-domain segments;
S2: build a model based on convolutional neural networks and an attention mechanism; feed the complete spectrogram and the local frequency-domain segments into the model as separate inputs for feature learning; from the local and complete spectrogram inputs, extract local and global features with convolutional neural networks;
S3: fuse the global and local feature representations with the attention mechanism to form the final feature representation, and feed it to a softmax classifier to predict the class of the speech;
S4: train the network on labeled speech data by gradient descent and back-propagation, and save the network parameters;
S5: for unlabeled speech, predict with the trained model; the class with the highest output probability is the final prediction.
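As a sketch, the five steps can be strung together in NumPy. The CNN feature extractor is replaced here by a fixed random projection of simple band statistics, and the attention scoring is assumed to be element-wise, since neither is specified at this level of the patent; the code shows only the data flow S1 to S5, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((8, 4))          # stand-in for a trained CNN

def spectrogram(x, seg_len):
    """S1: split the signal into M segments and take the magnitude of each
    segment's DFT, giving an M x seg_len spectrogram."""
    M = len(x) // seg_len
    frames = x[:M * seg_len].reshape(M, seg_len)
    return np.abs(np.fft.fft(frames, axis=1))

def frequency_blocks(S, n_blocks):
    """S1: partition the spectrogram into bands along the frequency axis."""
    return np.array_split(S, n_blocks, axis=1)

def extract_feature(block):
    """S2 stand-in: project simple band statistics instead of running a CNN."""
    stats = np.array([block.mean(), block.std(), block.max(), block.min()])
    return PROJ @ stats

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_fuse(a, H):
    """S3: per local feature, weight the global feature's components by
    softmax coefficients and add the result back element-wise."""
    A = a.copy()
    for h in H:
        alpha = softmax(a * h)              # assumed element-wise scoring
        A += alpha * a
    return A

audio = rng.standard_normal(8000)           # 1 s of fake speech at 8 kHz
S = spectrogram(audio, seg_len=200)         # 40 frames x 200 frequency bins
blocks = frequency_blocks(S, n_blocks=4)
a = extract_feature(S)                      # global feature
H = [extract_feature(b) for b in blocks]    # local features
A = attention_fuse(a, H)                    # S3: final representation
W = rng.standard_normal((3, A.size))        # S4 would train this layer
probs = softmax(W @ A)
pred = int(np.argmax(probs))                # S5: most probable class
print(S.shape, len(blocks), pred)
```

All sizes here (segment length, number of bands, feature dimension, three classes) are arbitrary example choices.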
Further, the distributed-spectrogram conversion in step S1 specifically comprises the following steps:
Apply the short-time Fourier transform to the original audio: divide the given audio into M short segments; for each segment, compute its short-time Fourier transform and take the modulus, finally obtaining a complete spectrogram S, expressed as follows:
S = (s_{ij}) ∈ R^{M×N},  s_{ij} = |X_i(j)|    (1)

where X_i(j) is the j-th frequency bin of the discrete Fourier transform of the i-th segment and N is the length of each short segment. Formula (1) shows that the spectrogram is a two-dimensional matrix: one dimension represents the order of time in the speech and the other the frequency axis running from low to high; the value at each point represents the magnitude of the amplitude.
Partition the complete spectrogram along the direction of frequency change to obtain a set of local and global frequency-domain inputs, i.e., a group of input data based on different frequency-band distributions: {S; S_1, S_2, …, S_n}.
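The conversion and frequency-axis blocking of step S1 can be illustrated with NumPy. The sampling rate, tone frequency, segment length, and number of bands are arbitrary choices for the example; a pure 500 Hz tone is used so the spectrogram's frequency axis is easy to verify:

```python
import numpy as np

# A 500 Hz tone sampled at 8 kHz; with 80-sample segments the DFT bin
# spacing is 8000 / 80 = 100 Hz, so the tone falls in bin 5.
fs, f0, seg_len = 8000, 500, 80
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)

M = len(x) // seg_len                       # number of short segments (rows)
frames = x[:M * seg_len].reshape(M, seg_len)
S = np.abs(np.fft.fft(frames, axis=1))      # formula (1): s_ij = |X_i(j)|

# Partition along the frequency axis into local bands, keeping the complete
# spectrogram as the global input, as in step S1.
bands = np.array_split(S, 4, axis=1)
inputs = [S] + bands

peak_bin = int(np.argmax(S[0]))             # strongest frequency of frame 0
print(S.shape, len(bands), peak_bin)
```

The peak of each row lands in bin 5, confirming that rows index time and columns index frequency from low to high.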
Further, the feature extraction with convolutional neural networks in step S2 specifically comprises the following steps:
For the multiple local inputs, extract the features of the different segments with convolutional neural networks, obtaining a group of local representations:

h_k = f(W_k * S_k + b_k),  k = 1, …, n    (2)

where each local input S_k has its own convolution parameters W_k and b_k, and f is the activation function; the resulting group of local features is H = {h_1, h_2, …, h_n}.
For the current complete global frequency-domain input, extract the global feature with a convolutional neural network, computed as follows:

a = f(W * S + b)    (3)

where a is the global feature extracted by the convolutional neural network.
Formulas (2) and (3) mainly involve the convolution and pooling operations of convolutional neural networks. The convolution operation is as follows:

x^l_{ij} = f( Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} w_{mn} · x^{l-1}_{(i+m)(j+n)} + b )    (4)

where M and N define the size of the convolution kernel, m and n index its rows and columns to locate a pixel, f is the activation applied after the convolution kernel, x^l_{ij} is the feature at row i, column j of the current layer, x^{l-1} is the layer's input, w holds the parameters of the convolution kernel, and b is the corresponding bias.

The convolution of formula (4) plays an important role in convolutional networks. Through the design of shared weights, the features the network extracts are shift-invariant: when the input changes slightly, the extracted features change little.
The pooling operation is as follows:

p = pool(a)    (5)

where pool(·) denotes the pooling function; the three most common variants take the maximum, minimum, or average over the receptive field (the spatial extent of the convolution kernel). Here a denotes the input to the pooling layer and p the output of the pooling operation.

The pooling of formula (5) greatly reduces the number of weights in the network and prevents the network from overfitting.
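Formulas (4) and (5) can be checked with a direct NumPy implementation. ReLU activation, 'valid' borders, and non-overlapping max pooling are assumptions, since the patent leaves those choices open:

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Formula (4) sketch: out_ij = f(sum_m sum_n w_mn * x_(i+m)(j+n) + b),
    with f = ReLU and 'valid' borders (output shrinks by kernel size - 1)."""
    M, N = w.shape
    H = x.shape[0] - M + 1
    W_ = x.shape[1] - N + 1
    out = np.empty((H, W_))
    for i in range(H):
        for j in range(W_):
            out[i, j] = np.sum(w * x[i:i + M, j:j + N]) + b
    return np.maximum(out, 0.0)             # ReLU activation

def max_pool(a, size=2):
    """Formula (5) sketch, max variant: take the maximum over each
    non-overlapping size x size receptive field."""
    H, W_ = a.shape[0] // size, a.shape[1] // size
    a = a[:H * size, :W_ * size]
    return a.reshape(H, size, W_, size).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)
w = np.array([[0.0, 1.0], [1.0, 0.0]])      # a simple 2x2 example kernel
feat = conv2d_valid(x, w, b=0.0)            # feat[i,j] = x[i,j+1] + x[i+1,j]
pooled = max_pool(feat, size=2)
print(feat.shape, pooled.shape)
```

Because the same kernel w is slid across the whole input, the extracted pattern is detected regardless of its position, which is the weight-sharing invariance described above.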
Further, the fusion of global and local feature representations with the attention mechanism in step S3 specifically comprises the following steps:
Based on the different local features, use the attention mechanism to derive new global feature representations. First, assign a "coefficient" to each part of the global information:

α^k_i = exp(g(h_k, a_i)) / Σ_{j=1}^{m} exp(g(h_k, a_j))    (6)

where a_i is the i-th of the m parts composing the global feature a, g is a learned scoring function, and α^k_i, computed from the current local feature h_k, is the coefficient of that part, representing its degree of importance.

Formula (6) is the essential operation of the attention mechanism: guided by a local feature, it assigns a different weight α^k_i to each part of the global feature a, representing the significance of that part. The intent is that, through network training, the most representative parts are found.
Then multiply the computed importance coefficients with the corresponding parts to form a new global representation:

a'_k = Σ_{i=1}^{m} α^k_i · a_i    (7)

Applying the attention mechanism in this way yields n new global representations, which are added element-wise to the initial global feature a to obtain the final feature representation:

A = a + Σ_{k=1}^{n} a'_k    (8)

The final feature representation A is fed to a softmax classifier, and the class with the largest probability is the predicted class of the speech data.
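A minimal NumPy sketch of formulas (6) to (8). The scoring function g is not specified in the patent, so a dot product with softmax normalization is assumed, and the global feature a is taken to be the sum of its m parts:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_fuse(a_parts, local_feats):
    """Formulas (6)-(8): for each local feature h_k, compute coefficients
    alpha over the m parts of the global feature (6), form a new global
    representation as the weighted sum of the parts (7), then add all n new
    representations element-wise to the initial global feature (8)."""
    a = sum(a_parts)                          # assumed: a is the sum of parts
    A = a.copy()
    for h in local_feats:
        scores = np.array([h @ a_i for a_i in a_parts])  # assumed g = dot
        alpha = softmax(scores)               # (6): importance coefficients
        A = A + sum(w * p for w, p in zip(alpha, a_parts))  # (7), then (8)
    return A

rng = np.random.default_rng(1)
a_parts = [rng.standard_normal(4) for _ in range(3)]      # m = 3 parts of a
local_feats = [rng.standard_normal(4) for _ in range(2)]  # n = 2 local h_k
A = attention_fuse(a_parts, local_feats)
print(A.shape)
```

Since each coefficient vector sums to one, every new global representation is a convex combination of the parts, so the fusion re-weights rather than rescales the global information.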
The above scheme yields the following beneficial effects:
(1) Traditional speech classification methods use a different feature extraction algorithm for each individual problem. The present invention performs feature learning directly on the speech spectrogram with a deep neural network and can therefore learn different audio features autonomously according to the task.
(2) Training deep neural networks generally requires large amounts of data, but publicly available speech datasets are small. Building on conventional deep neural network research, the present invention proposes a model that fuses convolutional neural networks with an attention mechanism, further improving the recognition rate on multiple tasks.
Take two speech classification tasks, accent recognition and speaker identification, as examples:
Table 1 compares the model of the present invention with other methods on the accent recognition problem, where i-Vector is a classical feature extraction algorithm and VGG and ResNets are representative convolutional neural network models.
Table 2 compares the model of the present invention with other methods on the speaker identification problem, where MFCC is a classical feature extraction algorithm and VGG and ResNets are representative convolutional neural network models.
The above experimental results show:
1) On multiple speech classification problems, the features learned by the proposed model yield better recognition results than traditional feature extraction algorithms.
2) Compared with other neural network methods, the present invention further applies the attention mechanism within convolutional neural networks, increasing the model's robustness and generalization ability and improving classification accuracy on all of the problems tested.
Brief description of the drawings
Fig. 1 is an overview of the algorithm model of the present invention;
Fig. 2 shows the frequency-domain distributed spectrogram;
Fig. 3 shows the basic structure of a convolution block employing the attention mechanism;
Fig. 4 is the overall flow chart of the present invention.
Specific embodiment
The technical solution of this embodiment is described in detail below with reference to the accompanying drawings; however, the embodiments described here are only some, not all, of the embodiments of the present invention.
Referring to Fig. 1, the core of the speech classification model based on a deep neural network is a deep neural network composed of multiple convolution blocks that employ an attention mechanism. One component is the convolutional neural network, which mainly uses multi-layer nonlinear functions to learn the mapping between input data and features; the deep learning algorithm can automatically learn task-relevant features. The other is the attention mechanism, which assigns different weights to local information so as to obtain representations in which the parts of the information carry different proportions. By fusing the deep learning algorithm with the attention mechanism, the present invention effectively improves the accuracy of speech classification.
The speech classification method based on a deep neural network includes the following steps:
Step S1: apply the short-time Fourier transform to the original audio: divide the given audio into M short segments; for each segment, compute its short-time Fourier transform and take the modulus, finally obtaining a complete spectrogram S, expressed as follows:

S = (s_{ij}) ∈ R^{M×N},  s_{ij} = |X_i(j)|    (1)

where X_i(j) is the j-th frequency bin of the discrete Fourier transform of the i-th segment and N is the length of each short segment.
Partition the complete spectrogram along the direction of frequency change to obtain a set of local and global frequency-domain inputs, i.e., a group of input data based on different frequency-band distributions: {S; S_1, S_2, …, S_n}.
Referring to Fig. 2, the complete spectrogram and the frequency-domain distributed spectrogram are shown. The distributed spectrogram is obtained by partitioning along intervals of the frequency axis, yielding the distributed information in the different frequency bands.
Step S2: build the model based on convolutional neural networks and the attention mechanism, and feed the complete spectrogram and the local frequency-domain segments into the model as separate inputs for feature learning. For the multiple local inputs, extract the features of the different segments with convolutional neural networks, obtaining a group of local representations:

h_k = f(W_k * S_k + b_k),  k = 1, …, n    (2)

where each local input S_k has its own convolution parameters W_k and b_k, and f is the activation function; the resulting group of local features is H = {h_1, h_2, …, h_n}.
For the current complete global frequency-domain input, extract the global feature with a convolutional neural network, computed as follows:

a = f(W * S + b)    (3)

where a is the global feature extracted by the convolutional neural network.
Step S3: on the basis of the local and global features produced by step S2, use the attention mechanism to derive new global feature representations. First, assign a "coefficient" to each part of the global information:

α^k_i = exp(g(h_k, a_i)) / Σ_{j=1}^{m} exp(g(h_k, a_j))    (6)

where a_i is the i-th of the m parts composing the global feature a, g is a learned scoring function, and α^k_i, computed from the current local feature h_k, is the coefficient of that part, representing its degree of importance.

Then multiply the computed importance coefficients with the corresponding parts to form a new global representation:

a'_k = Σ_{i=1}^{m} α^k_i · a_i    (7)

Applying the attention mechanism in this way yields n new global representations, which are added element-wise to the initial global feature a to obtain the final feature representation:

A = a + Σ_{k=1}^{n} a'_k    (8)

The final feature representation A is fed to a softmax classifier, and the class with the largest probability is the predicted class of the speech data.
Referring to Fig. 3, the basic structure of the attention-based convolution block is shown: it contains the feature extraction processes for the local and global information, followed by the attention-based fusion of that information, which finally yields the final feature representation A.
Step S4: train the network on the labeled speech data by gradient descent and back-propagation, and save the network parameters. In the initial model the network parameters are randomly initialized; the labeled speech data produce an error that is used to update the network parameters until the network becomes stable, and the optimal parameters are retained.
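Step S4 can be illustrated with a toy NumPy training loop: a softmax classifier on two-dimensional stand-in features, trained by gradient descent on the back-propagated cross-entropy error. The data, learning rate, and step count are arbitrary example choices; the real model trains the full convolutional network the same way:

```python
import numpy as np

# Toy labeled data standing in for the final features A: two Gaussian
# clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)),    # class 0 cluster
               rng.normal(+2, 0.5, (50, 2))])   # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

W = rng.standard_normal((2, 2)) * 0.01          # random initialization
b = np.zeros(2)
lr, losses = 0.1, []
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
    losses.append(-np.mean(np.log(p[np.arange(100), y])))
    grad = p.copy()
    grad[np.arange(100), y] -= 1.0              # dL/dlogits (back-propagated)
    grad /= 100
    W -= lr * (X.T @ grad)                      # gradient-descent update
    b -= lr * grad.sum(axis=0)

pred = np.argmax(X @ W + b, axis=1)
acc = float((pred == y).mean())
print(round(losses[0], 3), round(losses[-1], 3), acc)
```

The loss decreases monotonically on this separable toy data, which is the "until the network becomes stable" criterion in miniature.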
Step S5: for unlabeled speech, predict with the trained model and parameters; the class with the highest output probability is the final prediction.
Referring to Fig. 4, the complete procedure of the invention from step S1 to step S5 is shown. If audio remains to be identified, steps S1 to S5 are executed again; the class with the highest classifier output probability is the prediction result.

Claims (4)

1. A speech classification method based on a deep neural network, characterized by combining a distributed spectrogram with convolutional neural networks and an attention mechanism, and comprising the following steps:
S1: apply the short-time Fourier transform to the speech data to obtain its spectrogram; partition the complete spectrogram into blocks along the frequency axis to obtain a set of local frequency-domain segments;
S2: build a model based on convolutional neural networks and an attention mechanism; feed the complete spectrogram and the local frequency-domain segments into the model as separate inputs for feature learning; from the local and complete spectrogram inputs, extract local and global features with convolutional neural networks;
S3: fuse the global and local feature representations with the attention mechanism to form the final feature representation, and feed it to a softmax classifier to predict the class of the speech;
S4: train the network on labeled speech data by gradient descent and back-propagation, and save the network parameters;
S5: for unlabeled speech, predict with the trained model; the class with the highest output probability is the final prediction.
2. The speech classification method based on a deep neural network according to claim 1, characterized in that the distributed-spectrogram conversion in step S1 specifically comprises the following steps:
S11: apply the short-time Fourier transform to the original audio: divide the given audio into M short segments; for each segment, compute its short-time Fourier transform and take the modulus, finally obtaining a complete spectrogram S, expressed as follows:

S = (s_{ij}) ∈ R^{M×N},  s_{ij} = |X_i(j)|    (1)

where X_i(j) is the j-th frequency bin of the discrete Fourier transform of the i-th segment and N is the length of each short segment;
S12: partition the complete spectrogram along the direction of frequency change, where the k-th local frequency-domain block is expressed as:

S_k = S[:, (k-1)d+1 : kd]    (2)

i.e., the k-th band of d consecutive frequency columns of S; a set of local and global frequency-domain inputs is finally obtained, i.e., a group of input data based on different frequency-band distributions: {S; S_1, S_2, …, S_n}.
3. The speech classification method based on a deep neural network according to claim 1, characterized in that the feature extraction with convolutional neural networks in step S2 specifically comprises the following steps:
S21: for the multiple local inputs, extract the features of the different segments with convolutional neural networks, obtaining a group of local representations:

h_k = f(W_k * S_k + b_k),  k = 1, …, n    (3)

where each local input S_k has its own convolution parameters W_k and b_k, and f is the activation function; the resulting group of local features is H = {h_1, h_2, …, h_n};
S22: for the current complete global frequency-domain input, extract the global feature with a convolutional neural network, computed as follows:

a = f(W * S + b)    (4)

where a is the global feature extracted by the convolutional neural network.
4. The speech classification method based on a deep neural network according to claim 1, characterized in that the fusion of global and local feature representations with the attention mechanism in step S3 specifically comprises the following steps:
based on the different local features, use the attention mechanism to derive new global feature representations; first assign a "coefficient" to each part of the global information:

α^k_i = exp(g(h_k, a_i)) / Σ_{j=1}^{m} exp(g(h_k, a_j))    (6)

where a_i is the i-th of the m parts composing the global feature a, g is a learned scoring function, and α^k_i, computed from the current local feature h_k, is the coefficient of that part, representing its degree of importance;
then multiply the computed importance coefficients with the corresponding parts to form a new global representation:

a'_k = Σ_{i=1}^{m} α^k_i · a_i    (7)

applying the attention mechanism in this way yields n new global representations, which are added element-wise to the initial global feature a to obtain the final feature representation:

A = a + Σ_{k=1}^{n} a'_k    (8)

the final feature representation A is fed to a softmax classifier, and the class with the largest probability is the predicted class of the speech data.
CN201711155884.8A 2017-11-20 2017-11-20 Voice classification method based on deep neural network Expired - Fee Related CN108010514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711155884.8A CN108010514B (en) 2017-11-20 2017-11-20 Voice classification method based on deep neural network


Publications (2)

Publication Number Publication Date
CN108010514A true CN108010514A (en) 2018-05-08
CN108010514B CN108010514B (en) 2021-09-10

Family

ID=62052777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711155884.8A Expired - Fee Related CN108010514B (en) 2017-11-20 2017-11-20 Voice classification method based on deep neural network

Country Status (1)

Country Link
CN (1) CN108010514B (en)

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846048A (en) * 2018-05-30 2018-11-20 大连理工大学 Musical genre classification method based on Recognition with Recurrent Neural Network and attention mechanism
CN108877783A (en) * 2018-07-05 2018-11-23 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the audio types of audio data
CN109243466A (en) * 2018-11-12 2019-01-18 成都傅立叶电子科技有限公司 A kind of vocal print authentication training method and system
CN109256135A (en) * 2018-08-28 2019-01-22 桂林电子科技大学 A kind of end-to-end method for identifying speaker, device and storage medium
CN109285539A (en) * 2018-11-28 2019-01-29 中国电子科技集团公司第四十七研究所 A kind of sound identification method neural network based
CN109410914A (en) * 2018-08-28 2019-03-01 江西师范大学 A kind of Jiangxi dialect phonetic and dialect point recognition methods
CN109509475A (en) * 2018-12-28 2019-03-22 出门问问信息科技有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of speech recognition
CN109523994A (en) * 2018-11-13 2019-03-26 四川大学 A kind of multitask method of speech classification based on capsule neural network
CN109599129A (en) * 2018-11-13 2019-04-09 杭州电子科技大学 Voice depression recognition methods based on attention mechanism and convolutional neural networks
CN109767790A (en) * 2019-02-28 2019-05-17 中国传媒大学 A kind of speech-emotion recognition method and system
CN109817233A (en) * 2019-01-25 2019-05-28 清华大学 Voice flow steganalysis method and system based on level attention network model
CN110047516A (en) * 2019-03-12 2019-07-23 天津大学 A kind of speech-emotion recognition method based on gender perception
CN110197206A (en) * 2019-05-10 2019-09-03 杭州深睿博联科技有限公司 The method and device of image procossing
CN110223714A (en) * 2019-06-03 2019-09-10 杭州哲信信息技术有限公司 A kind of voice-based Emotion identification method
CN110459225A (en) * 2019-08-14 2019-11-15 南京邮电大学 A kind of speaker identification system based on CNN fusion feature
CN110503128A (en) * 2018-05-18 2019-11-26 百度(美国)有限责任公司 The spectrogram that confrontation network carries out Waveform composition is generated using convolution
CN110534133A (en) * 2019-08-28 2019-12-03 珠海亿智电子科技有限公司 A kind of speech emotion recognition system and speech-emotion recognition method

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706780A (en) * 2009-09-03 2010-05-12 北京交通大学 Image semantic retrieval method based on visual attention model
CN102044254A (en) * 2009-10-10 2011-05-04 北京理工大学 Speech spectrum color enhancement method for speech visualization
US20160283841A1 (en) * 2015-03-27 2016-09-29 Google Inc. Convolutional neural networks
US20170099200A1 (en) * 2015-10-06 2017-04-06 Evolv Technologies, Inc. Platform for Gathering Real-Time Analysis
CN106652999A (en) * 2015-10-29 2017-05-10 三星Sds株式会社 System and method for voice recognition
CN106847309A (en) * 2017-01-09 2017-06-13 华南理工大学 Speech emotion recognition method
CN106952649A (en) * 2017-05-14 2017-07-14 北京工业大学 Speaker recognition method based on convolutional neural networks and spectrogram
CN107145518A (en) * 2017-04-10 2017-09-08 同济大学 Personalized recommendation system for social networks based on deep learning
CN107203999A (en) * 2017-04-28 2017-09-26 北京航空航天大学 Automatic dermoscopy image segmentation method based on fully convolutional neural networks
CN107221326A (en) * 2017-05-16 2017-09-29 百度在线网络技术(北京)有限公司 Voice wake-up method, device and computer equipment based on artificial intelligence
CN107239446A (en) * 2017-05-27 2017-10-10 中国矿业大学 Intelligent relation extraction method based on neural network and attention mechanism
CN107256228A (en) * 2017-05-02 2017-10-17 清华大学 Answer selection system and method based on structured attention mechanism
CN107316066A (en) * 2017-07-28 2017-11-03 北京工商大学 Image classification method and system based on multi-path convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHE-WEI HUANG: "Deep Convolutional Recurrent Neural Network with Attention Mechanism for Robust Speech Emotion Recognition", ICME 2017 *
FAN HU: "Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery", MDPI *
KYUNGHYUN CHO: "Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks", IEEE Transactions on Multimedia *
L. HE, M. LECH: "Time-Frequency Feature Extraction from Spectrograms and Wavelet Packets with Application to Automatic Stress and Emotion Classification in Speech", Proceedings of the International Conference on Information, Communications and Signal Processing *

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503128A (en) * 2018-05-18 2019-11-26 百度(美国)有限责任公司 Waveform synthesis from spectrograms using convolutional generative adversarial networks
CN108846048A (en) * 2018-05-30 2018-11-20 大连理工大学 Music genre classification method based on recurrent neural network and attention mechanism
CN108877783A (en) * 2018-07-05 2018-11-23 腾讯音乐娱乐科技(深圳)有限公司 Method and apparatus for determining the audio type of audio data
CN109410914B (en) * 2018-08-28 2022-02-22 江西师范大学 Method for identifying Jiangxi dialect speech and dialect region
CN109256135A (en) * 2018-08-28 2019-01-22 桂林电子科技大学 End-to-end speaker verification method, device and storage medium
CN109256135B (en) * 2018-08-28 2021-05-18 桂林电子科技大学 End-to-end speaker verification method, device and storage medium
CN109410914A (en) * 2018-08-28 2019-03-01 江西师范大学 Method for identifying Jiangxi dialect speech and dialect region
CN109243466A (en) * 2018-11-12 2019-01-18 成都傅立叶电子科技有限公司 Voiceprint authentication training method and system
CN109599129A (en) * 2018-11-13 2019-04-09 杭州电子科技大学 Voice depression recognition method based on attention mechanism and convolutional neural network
CN109599129B (en) * 2018-11-13 2021-09-14 杭州电子科技大学 Voice depression recognition system based on attention mechanism and convolutional neural network
CN109523994A (en) * 2018-11-13 2019-03-26 四川大学 Multi-task speech classification method based on capsule neural networks
CN109285539B (en) * 2018-11-28 2022-07-05 中国电子科技集团公司第四十七研究所 Sound recognition method based on neural network
CN109285539A (en) * 2018-11-28 2019-01-29 中国电子科技集团公司第四十七研究所 Sound recognition method based on neural network
CN111259189A (en) * 2018-11-30 2020-06-09 马上消费金融股份有限公司 Music classification method and device
CN109509475B (en) * 2018-12-28 2021-11-23 出门问问信息科技有限公司 Voice recognition method and device, electronic equipment and computer readable storage medium
CN109509475A (en) * 2018-12-28 2019-03-22 出门问问信息科技有限公司 Speech recognition method and device, electronic equipment and computer-readable storage medium
CN109817233A (en) * 2019-01-25 2019-05-28 清华大学 Voice stream steganalysis method and system based on hierarchical attention network model
CN109767790A (en) * 2019-02-28 2019-05-17 中国传媒大学 Speech emotion recognition method and system
CN110047516A (en) * 2019-03-12 2019-07-23 天津大学 Speech emotion recognition method based on gender perception
CN110197206A (en) * 2019-05-10 2019-09-03 杭州深睿博联科技有限公司 Image processing method and device
CN110223714B (en) * 2019-06-03 2021-08-03 杭州哲信信息技术有限公司 Emotion recognition method based on voice
CN110223714A (en) * 2019-06-03 2019-09-10 杭州哲信信息技术有限公司 Voice-based emotion recognition method
CN110459225A (en) * 2019-08-14 2019-11-15 南京邮电大学 Speaker recognition system based on CNN fusion features
CN110534133A (en) * 2019-08-28 2019-12-03 珠海亿智电子科技有限公司 Speech emotion recognition system and speech emotion recognition method
CN110534133B (en) * 2019-08-28 2022-03-25 珠海亿智电子科技有限公司 Voice emotion recognition system and voice emotion recognition method
CN110648669A (en) * 2019-09-30 2020-01-03 上海依图信息技术有限公司 Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium
CN110782878A (en) * 2019-10-10 2020-02-11 天津大学 Attention mechanism-based multi-scale audio scene recognition method
CN110782878B (en) * 2019-10-10 2022-04-05 天津大学 Attention mechanism-based multi-scale audio scene recognition method
CN110992988B (en) * 2019-12-24 2022-03-08 东南大学 Speech emotion recognition method and device based on domain confrontation
CN110992988A (en) * 2019-12-24 2020-04-10 东南大学 Speech emotion recognition method and device based on domain confrontation
CN111009262A (en) * 2019-12-24 2020-04-14 携程计算机技术(上海)有限公司 Voice gender identification method and system
CN111223488B (en) * 2019-12-30 2023-01-17 Oppo广东移动通信有限公司 Voice wake-up method, device, equipment and storage medium
CN111223488A (en) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Voice wake-up method, device, equipment and storage medium
CN111340187A (en) * 2020-02-18 2020-06-26 河北工业大学 Network characterization method based on adversarial attention mechanism
CN111340187B (en) * 2020-02-18 2024-02-02 河北工业大学 Network characterization method based on adversarial attention mechanism
CN111666996A (en) * 2020-05-29 2020-09-15 湖北工业大学 High-precision equipment source identification method based on attention mechanism
CN111666996B (en) * 2020-05-29 2023-09-19 湖北工业大学 High-precision equipment source identification method based on attention mechanism
CN114141244A (en) * 2020-09-04 2022-03-04 四川大学 Voice recognition technology based on audio media analysis
CN112489687A (en) * 2020-10-28 2021-03-12 深兰人工智能芯片研究院(江苏)有限公司 Speech emotion recognition method and device based on sequence convolution
CN112489687B (en) * 2020-10-28 2024-04-26 深兰人工智能芯片研究院(江苏)有限公司 Voice emotion recognition method and device based on sequence convolution
CN112466298A (en) * 2020-11-24 2021-03-09 网易(杭州)网络有限公司 Voice detection method and device, electronic equipment and storage medium
CN112466298B (en) * 2020-11-24 2023-08-11 杭州网易智企科技有限公司 Voice detection method, device, electronic equipment and storage medium
CN112489689B (en) * 2020-11-30 2024-04-30 东南大学 Cross-database speech emotion recognition method and device based on multi-scale difference adversarial learning
CN112489689A (en) * 2020-11-30 2021-03-12 东南大学 Cross-database speech emotion recognition method and device based on multi-scale difference adversarial learning
CN112992119A (en) * 2021-01-14 2021-06-18 安徽大学 Deep neural network-based accent classification method and model thereof
CN112992119B (en) * 2021-01-14 2024-05-03 安徽大学 Accent classification method based on deep neural network and model thereof
CN112885372A (en) * 2021-01-15 2021-06-01 国网山东省电力公司威海供电公司 Intelligent diagnosis method, system, terminal and medium for power equipment fault sound
CN112885372B (en) * 2021-01-15 2022-08-09 国网山东省电力公司威海供电公司 Intelligent diagnosis method, system, terminal and medium for power equipment fault sound
CN112967730B (en) * 2021-01-29 2024-07-02 北京达佳互联信息技术有限公司 Voice signal processing method and device, electronic equipment and storage medium
CN112967730A (en) * 2021-01-29 2021-06-15 北京达佳互联信息技术有限公司 Voice signal processing method and device, electronic equipment and storage medium
CN113571063B (en) * 2021-02-02 2024-06-04 腾讯科技(深圳)有限公司 Speech signal recognition method and device, electronic equipment and storage medium
CN113571063A (en) * 2021-02-02 2021-10-29 腾讯科技(深圳)有限公司 Voice signal recognition method and device, electronic equipment and storage medium
CN113035227A (en) * 2021-03-12 2021-06-25 山东大学 Multi-modal voice separation method and system
CN113049084A (en) * 2021-03-16 2021-06-29 电子科技大学 Attention mechanism-based Resnet distributed optical fiber sensing signal identification method
CN113409827A (en) * 2021-06-17 2021-09-17 山东省计算中心(国家超级计算济南中心) Voice endpoint detection method and system based on local convolution block attention network
CN113409827B (en) * 2021-06-17 2022-06-17 山东省计算中心(国家超级计算济南中心) Voice endpoint detection method and system based on local convolution block attention network
CN116778951A (en) * 2023-05-25 2023-09-19 上海蜜度信息技术有限公司 Audio classification method, device, equipment and medium based on graph enhancement
CN116504259B (en) * 2023-06-30 2023-08-29 中汇丰(北京)科技有限公司 Semantic recognition method based on natural language processing
CN116504259A (en) * 2023-06-30 2023-07-28 中汇丰(北京)科技有限公司 Semantic recognition method based on natural language processing
CN116825092B (en) * 2023-08-28 2023-12-01 珠海亿智电子科技有限公司 Speech recognition method, training method and device of speech recognition model
CN116825092A (en) * 2023-08-28 2023-09-29 珠海亿智电子科技有限公司 Speech recognition method, training method and device of speech recognition model
CN117275491B (en) * 2023-11-17 2024-01-30 青岛科技大学 Sound classification method based on audio conversion and temporal graph neural network
CN117275491A (en) * 2023-11-17 2023-12-22 青岛科技大学 Sound classification method based on audio conversion and temporal graph neural network

Also Published As

Publication number Publication date
CN108010514B (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN108010514A (en) A kind of method of speech classification based on deep neural network
Sun et al. Speech emotion recognition based on DNN-decision tree SVM model
Lee et al. Multi-level and multi-scale feature aggregation using pretrained convolutional neural networks for music auto-tagging
Chang et al. Learning representations of emotional speech with deep convolutional generative adversarial networks
Daneshfar et al. Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm
Yue et al. The classification of underwater acoustic targets based on deep learning methods
KR102154676B1 (en) Method for training top-down selective attention in artificial neural networks
CN109887484A (en) Speech recognition and speech synthesis method and device based on paired-associate learning
CN108229659A (en) Piano single-key sound recognition method based on deep learning
Gupta et al. A stacked technique for gender recognition through voice
WO2021127982A1 (en) Speech emotion recognition method, smart device, and computer-readable storage medium
CN111597333B (en) Event and event element extraction method and device for the blockchain field
Lee et al. Deep representation learning for affective speech signal analysis and processing: Preventing unwanted signal disparities
Lorena et al. Automatic microstructural classification with convolutional neural network
Guo et al. Transformer-based spiking neural networks for multimodal audio-visual classification
Wani et al. Deepfakes audio detection leveraging audio spectrogram and convolutional neural networks
Qais et al. Deepfake audio detection with neural networks using audio features
Roy et al. Speech emotion recognition using deep learning
Yue et al. Equilibrium optimizer for emotion classification from english speech signals
Song et al. Transfer learning for music genre classification
Li et al. An improved method of speech recognition based on probabilistic neural network ensembles
Al-Thahab Speech recognition based radon-discrete cosine transforms by Delta Neural Network learning rule
MANNEM et al. Deep Learning Methodology for Recognition of Emotions using Acoustic features.
Mohanty et al. Improvement of speech emotion recognition by deep convolutional neural network and speech features
Sunny et al. Development of a speech recognition system for speaker independent isolated Malayalam words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210910