CN108010514B - Voice classification method based on deep neural network - Google Patents
Voice classification method based on deep neural network
- Publication number
- CN108010514B CN201711155884.8A
- Authority
- CN
- China
- Prior art keywords
- local
- global
- neural network
- information
- spectrogram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Abstract
The invention discloses a speech classification method based on a deep neural network, which aims to solve different speech classification problems with a unified algorithm model. The method comprises the following steps. S1: convert the speech into a corresponding spectrogram, and partition the complete spectrogram along the frequency domain to obtain a set of local frequency-domain information. S2: feed the complete and local frequency-domain information to the model, where a convolutional neural network extracts global and local features from the different inputs. S3: fuse the global and local feature expressions with an attention mechanism to form the final feature expression. S4: train the network on labelled data with gradient descent and back-propagation. S5: for unlabelled speech, use the trained parameters and let the model output the class with the highest probability as the prediction result. The invention realizes a unified algorithm model for different speech classification problems and improves accuracy on several of them.
Description
Technical Field
The invention relates to a speech classification method based on a deep neural network for handling different speech classification tasks, and belongs to the technical fields of speech signal processing, artificial intelligence and related areas.
Background
With the rapid development of computer technology, people rely on and demand more from computers, and how to interact with them more naturally has become a research hotspot. Speech, the most common and natural way of communicating in daily life, carries a large amount of information, such as the speaker's accent and emotional state. A computer's ability to classify and recognize speech is an important component of computer speech processing, a key prerequisite for a natural human-computer interface, and of great research and application value. Speech classification is an important research direction and plays an important role in speech recognition, speech content detection and related areas. It is also the basis and premise for deeper audio processing: for a given segment of audio, classification can determine in advance the acoustic environment and the speaker's gender, accent, emotion and so on, providing a basis for adapting the speech model and its algorithms. A good speech classification method is therefore crucial.
Speech classification covers a number of different tasks, such as speech emotion recognition, accent recognition, speaker recognition and acoustic environment discrimination. The main challenge of these tasks is the high dimensionality of speech. Conventional speech classification methods typically extract hand-crafted audio features for a single problem or database in order to reduce the dimensionality of the data fed to the classifier. However, feature extraction requires substantial speech-signal-processing expertise, and because it filters the signal it can discard useful information. In addition, conventional classifiers such as support vector machines are often ill-suited to multi-class tasks. These are the difficulties the present work needs to overcome.
Deep neural networks are currently one of the most important tools for processing big data, and high-dimensional data in particular. By constructing multi-layer non-linear mapping functions and training the connection weights, a deep neural network can both learn features of the audio data and perform classification. Because it can feed errors back and adjust its parameters according to the output, the deep neural network has spread into many disciplines and has been applied successfully in fields including machine translation, speech recognition and object recognition.
Disclosure of Invention
To address these shortcomings, the invention provides a speech classification method based on a deep neural network, solving the problems in the prior art that existing methods target only a single classification task or a specific feature extraction scheme, and that high-dimensional data are difficult to process.
To achieve this purpose, the invention adopts the following technical scheme:
A speech classification method based on a deep neural network, characterized by comprising the following steps:
S1: perform a short-time Fourier transform on the speech data and convert it into a corresponding spectrogram; partition the complete spectrogram along the frequency domain to obtain a set of local frequency-domain information;
S2: build an algorithm model based on a convolutional neural network and an attention mechanism, and feed the complete spectrogram and the local frequency-domain information to the model for feature learning; the convolutional neural network extracts local and global features from the local and complete spectrogram information respectively;
S3: fuse the global and local feature expressions with the attention mechanism to form the final feature expression, and feed it into a softmax classifier to obtain a prediction of the class the speech belongs to;
S4: using labelled speech data, train the network with gradient descent and back-propagation and store the network parameters;
S5: predict unlabelled speech with the trained model, which outputs the class with the highest probability as the final prediction result.
Further, the conversion to the distributed spectrogram in S1 specifically comprises the following steps:
Perform a short-time Fourier transform on the original audio, dividing the given original audio into M short segments; compute the short-time spectrum of each segment and take its modulus, finally obtaining the complete spectrogram expression S:
S(m, k) = |Σ_{t=0}^{N-1} x_m(t) e^{-j2πkt/N}|,  m = 1, ..., M   (1)
where x_m is the m-th short segment and N is the length of each short audio segment. Formula (1) shows that the spectrogram is a two-dimensional matrix: its two dimensions are the time order of the speech and the frequency axis from low to high, and the value at each point is the amplitude.
Partitioning the complete spectrogram along the direction of frequency-domain change yields a set of local and global spectral information, i.e. a set of input data combinations based on different frequency-domain distributions: {s_1, s_2, ..., s_n, S}.
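As an illustration of this step, a minimal NumPy sketch is given below (frame length, hop size and the number of frequency blocks are assumptions, not values fixed by the invention); it computes the magnitude spectrogram S of formula (1) and partitions it along the frequency axis into the local set {s_1, ..., s_n}.

```python
import numpy as np

def spectrogram(audio, frame_len=512, hop=256):
    # Split the waveform into M short frames, window them, apply an N-point FFT
    # per frame and keep the magnitude (the modulus of formula (1)).
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len + 1, hop)]
    window = np.hanning(frame_len)
    S = np.abs(np.fft.rfft(np.stack(frames) * window, axis=1))
    return S.T                                   # shape: (frequency bins, time frames)

def partition_frequency(S, n_blocks=4):
    # Partition the complete spectrogram along the frequency dimension,
    # giving the local frequency-domain set {s_1, ..., s_n}.
    return np.array_split(S, n_blocks, axis=0)

audio = np.random.randn(16000)                   # one second of toy audio at 16 kHz
S = spectrogram(audio)
local_blocks = partition_frequency(S)
print(S.shape, [s.shape for s in local_blocks])
```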
Further, the feature extraction by the convolutional neural network in S2 specifically comprises the following steps:
For the several local inputs, the convolutional neural network extracts features of the different information, giving a set of local expressions:
a_n = f(w_n s_n + b_n)   (2)
In the above formula, each local input s_n has its own convolution parameters w_n and b_n, and f is the activation function; the resulting set of local features is {a_1, a_2, ..., a_n}.
For the current complete global frequency-domain information, the convolutional neural network extracts the global feature according to:
a = g(wS + b)   (3)
Each global input S has a corresponding convolution weight w and bias parameter b, g is the activation function it uses, and a is the global feature extracted by the convolutional neural network.
Formulas (2) and (3) mainly involve the convolution and pooling operations of the convolutional neural network. The convolution operation is:
a_{ij} = f( Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} w_{mn} s_{i+m, j+n} + b )   (4)
where M and N define the size of the convolution kernel, i and j index the row and column of the current position, f is the activation function of the convolution layer, a_{ij} is the feature value at row i, column j of the current layer, s_{i+m, j+n} is the corresponding input of the current layer, w are the parameters of the convolution kernel and b is the corresponding bias.
The convolution operation in formula (4) plays an important role in the convolutional network. Because the weights are shared, the extracted features are largely invariant to small deformations: if the input changes slightly, the features produced by the network change little.
The pooling operation is:
p = σ(a)   (5)
where σ is the pooling function; the three most common choices take the maximum, minimum or average value within the receptive field (the region covered by the kernel). a is the input of the pooling layer and p the output of the pooling operation.
The pooling in formula (5) greatly reduces the number of weights in the network and helps prevent overfitting.
Further, the fusion of the attention mechanism with the local feature expressions in S3 specifically comprises the following steps:
Based on the different local features, the attention mechanism is applied to obtain a new global feature expression; first, every component of the global information is given a "coefficient":
α_i^n = g( W_2 f( W_1 [a_n, p_i] + b_1 ) + b_2 )   (6)
In the above formula, p_i is one of the m components of the global feature a, and α_i^n is the coefficient of component p_i given the current local feature a_n, representing its importance. The attention mechanism learns this through a two-layer mapping: the first layer uses weight W_1, bias parameter b_1 and activation function f to learn the mapping, and the second layer uses weight W_2, bias parameter b_2 and activation function g on the result of the first layer.
Formula (6) is the essential operation of the attention mechanism: guided by the local feature a_n, each component of the global feature a is assigned a different weight α_i^n representing its importance, and through training the network is expected to find the most representative components.
The computed importance coefficients are then multiplied with the corresponding components to form a new piece of global information:
ã_n = (α_1^n p_1, α_2^n p_2, ..., α_m^n p_m)   (7)
In this way, applying the attention mechanism yields n new pieces of global information, which are added element-wise to the initial global feature a to give the final feature expression:
A = a + ã_1 + ã_2 + ... + ã_n   (8)
The final feature expression A is fed into a softmax classifier, and the class with the highest probability is the predicted class of the speech data.
With this scheme, the invention has the following beneficial effects:
(1) Conventional speech classification methods use different feature extraction algorithms for each individual problem. The invention performs feature learning directly on the spectrogram with a deep neural network and can therefore learn different audio features autonomously for different tasks.
(2) Training a deep neural network usually requires large amounts of data, yet the publicly available speech datasets are small. Building on prior research on deep neural networks, the invention further proposes an algorithm model that fuses a convolutional neural network with an attention mechanism, further improving the recognition rate on multiple tasks.
Taking two speech classification tasks, accent recognition and speaker recognition, as examples:
model (model) | Accuracy (%) |
i-Vector | 74.50 |
Convolutional network and attention model | 79.32 |
VGG-11 | 54.40 |
ResNet-18 | 61.66 |
ResNet-34 | 58.47 |
Table 1 compares the model of the invention with other methods on accent recognition, where i-Vector is a classical feature extraction algorithm and VGG and ResNet are representative convolutional neural network models.
Model | Accuracy (%) |
---|---|
MFCC | 91.00 |
Convolutional network and attention model | 98.04 |
VGG-11 | 75.21 |
ResNet-18 | 75.04 |
ResNet-34 | 66.05 |
Table 2 compares the model of the invention with other methods on speaker recognition, where MFCC is a classical feature extraction algorithm and VGG and ResNet are representative convolutional neural network models.
The above experimental results show that:
1) On multiple speech classification problems, the features learned by the proposed model achieve better recognition results than conventional feature extraction algorithms.
2) Compared with other neural network methods, the proposed use of an attention mechanism inside the convolutional neural network increases the robustness and generalization ability of the model and improves classification accuracy on multiple problems.
Drawings
FIG. 1 is a schematic diagram of an algorithm model according to the present invention;
FIG. 2 is a frequency domain based distributed spectrogram;
FIG. 3 is a basic block diagram of a convolution block employing an attention mechanism;
fig. 4 is an overall process diagram of the present invention.
Detailed description of the preferred embodiments
The technical solution in the embodiments of the invention is described in detail below with reference to the drawings; the embodiments described here are only some of the embodiments of the invention, not all of them.
Referring to fig. 1, the core model of the deep-neural-network-based speech classification is a deep neural network model composed of several convolution blocks that use an attention mechanism. One part is the convolutional neural network, which uses multi-layer non-linear functions to learn the mapping between input data and features; this deep learning algorithm can learn the relevant features automatically for a given target. The other part is the attention mechanism, which assigns different weights to the pieces of local information to obtain an expression in which they contribute with different importance. By combining the deep learning algorithm with the attention mechanism, the invention effectively improves the accuracy of speech classification.
The speech classification method based on the deep neural network comprises the following steps:
step S1: carrying out short-time Fourier transform on the original audio, and dividing the given original audio into M sections of short audio; calculating the short-time energy of each short audio segment and performing modulus extraction to finally obtain a complete spectrogram expression S, wherein the expression S of the spectrogram is as follows:
where N is expressed as the short audio length per segment size.
The complete spectrogram information is partitioned along the direction of frequency domain change, so that the method can obtainA set of local and global spectral information is collected, that is, a set of input data combinations based on different frequency domain distributions are obtained: { s1,s2,…,sn,S}。
The complete spectrogram and the frequency-domain distributed spectrogram are shown in fig. 2. The distributed spectrogram is obtained by partitioning along intervals of the frequency axis, giving the distribution information within different frequency intervals.
Step S2: build an algorithm model based on a convolutional neural network and an attention mechanism, and feed the complete spectrogram and the local frequency-domain information to the model for feature learning. For the several local inputs, the convolutional neural network extracts features of the different information, giving a set of local expressions:
a_n = f(w_n s_n + b_n)   (2)
In the above formula, each local input s_n has its own convolution parameters w_n and b_n, and f is the activation function; the resulting set of local features is {a_1, a_2, ..., a_n}.
For the current complete global frequency-domain information, the convolutional neural network extracts the global feature according to:
a = g(wS + b)   (3)
Each global input S has a corresponding convolution weight w and bias parameter b, g is the activation function it uses, and a is the global feature extracted by the convolutional neural network.
Step S3: on the basis of the local and global features obtained in step S2 and based on the different local features, the attention mechanism is applied to obtain a new global feature expression; first, every component of the global information is given a "coefficient":
α_i^n = g( W_2 f( W_1 [a_n, p_i] + b_1 ) + b_2 )   (6)
In the above formula, p_i is one of the m components of the global feature a, and α_i^n is the coefficient of component p_i given the current local feature a_n, representing its importance. The attention mechanism learns this through a two-layer mapping: the first layer uses weight W_1, bias parameter b_1 and activation function f, and the second layer uses weight W_2, bias parameter b_2 and activation function g on the result of the first layer.
The computed importance coefficients are then multiplied with the corresponding components to form a new piece of global information:
ã_n = (α_1^n p_1, α_2^n p_2, ..., α_m^n p_m)   (7)
In this way, applying the attention mechanism yields n new pieces of global information, which are added element-wise to the initial global feature a to give the final feature expression:
A = a + ã_1 + ã_2 + ... + ã_n   (8)
The final feature expression A is fed into a softmax classifier, and the class with the highest probability is the predicted class of the speech data.
Referring to fig. 3, the basic structure of an attention-based convolution block is shown: features are extracted from the local and global information, the attention idea is then used to re-fuse the information, and the final feature expression A is obtained.
Step S4: using labelled speech data, train the network with gradient descent and back-propagation and store the network parameters. When the model is first built, the parameters of the network are initialized randomly; the network parameters are then updated from the errors produced on the labelled speech data until the network becomes stable, and the optimal parameters are retained.
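A minimal training-loop sketch for step S4 is given below, reusing the ConvBlock and AttentionFusion modules sketched above; the batch of labelled data, the optimizer settings and the file name are assumptions for illustration only.

```python
import torch
import torch.nn as nn

model = nn.ModuleDict({
    "global": ConvBlock(),
    "locals": nn.ModuleList([ConvBlock() for _ in range(4)]),
    "fusion": AttentionFusion(),
})
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # gradient descent
criterion = nn.CrossEntropyLoss()

spectrograms = torch.randn(8, 1, 128, 60)    # labelled speech batch (toy data)
labels = torch.randint(0, 10, (8,))          # class labels of the batch

for epoch in range(5):
    optimizer.zero_grad()
    a_global = model["global"](spectrograms)
    a_locals = [net(s) for net, s in
                zip(model["locals"], spectrograms.chunk(4, dim=2))]
    loss = criterion(model["fusion"](a_global, a_locals), labels)
    loss.backward()                          # back-propagation of the error
    optimizer.step()                         # update the network parameters

torch.save(model.state_dict(), "speech_classifier.pt")   # store the trained parameters
```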
Step S5: predict unlabelled speech with the trained model and parameters; the model outputs the class with the highest probability as the final prediction result.
Referring to fig. 4, the complete flow of the invention from step S1 to step S5 is shown. If there is still audio to be identified, the process continues with steps S1 to S5, and finally the classifier outputs the class with the highest probability value as the prediction result.
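A minimal prediction sketch for step S5 follows, again reusing the modules and the saved parameter file from the sketches above (both are assumptions): the trained parameters are loaded and the class with the highest probability is output.

```python
import torch
import torch.nn as nn

model = nn.ModuleDict({
    "global": ConvBlock(),
    "locals": nn.ModuleList([ConvBlock() for _ in range(4)]),
    "fusion": AttentionFusion(),
})
model.load_state_dict(torch.load("speech_classifier.pt"))  # trained parameters
model.eval()

with torch.no_grad():
    S_new = torch.randn(1, 1, 128, 60)        # spectrogram of an unlabelled utterance
    a_global = model["global"](S_new)
    a_locals = [net(s) for net, s in zip(model["locals"], S_new.chunk(4, dim=2))]
    probs = model["fusion"](a_global, a_locals).softmax(dim=1)
    print("predicted class:", probs.argmax(dim=1).item())  # highest-probability class
```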
Claims (4)
1. A speech classification method based on a deep neural network, characterized in that a distributed spectrogram is combined with a convolutional neural network and an attention mechanism, the method comprising the following steps:
S1: perform a short-time Fourier transform on the speech data and convert it into a corresponding spectrogram; partition the complete spectrogram along the frequency domain to obtain a set of local frequency-domain information;
S2: build an algorithm model based on a convolutional neural network and an attention mechanism, and feed the complete spectrogram and the local frequency-domain information to the model for feature learning; the convolutional neural network extracts local and global features from the local and complete spectrogram information respectively;
S3: fuse the global and local feature expressions with the attention mechanism to form the final feature expression, and feed it into a softmax classifier to obtain a prediction of the class the speech belongs to;
S4: using labelled speech data, train the network with gradient descent and back-propagation and store the network parameters;
S5: predict unlabelled speech with the trained model, which outputs the class with the highest probability as the final prediction result.
2. The method of claim 1, characterized in that the conversion to the distributed spectrogram in S1 specifically comprises the following steps:
S11: perform a short-time Fourier transform on the original audio, dividing the given original audio into M short segments; compute the short-time spectrum of each segment and take its modulus, finally obtaining the complete spectrogram expression S:
S(m, k) = |Σ_{t=0}^{N-1} x_m(t) e^{-j2πkt/N}|,  m = 1, ..., M   (1)
where x_m is the m-th short segment and N is the length of each short audio segment;
S12: partition the complete spectrogram information along the direction of frequency-domain variation, a local frequency-domain information s_n being the block of the spectrogram belonging to the n-th frequency interval:
s_n = { S(m, k) : K_{n-1} < k ≤ K_n, m = 1, ..., M }   (2)
where (K_{n-1}, K_n] is the range of frequency bins of the n-th interval; this finally gives a set of local and global spectral information, i.e. a set of input data combinations based on different frequency-domain distributions: {s_1, s_2, ..., s_n, S}.
3. The method of claim 1, characterized in that the feature extraction by the convolutional neural network in step S2 specifically comprises the following steps:
S21: for the several local inputs, the convolutional neural network extracts features of the different information, giving a set of local expressions:
a_n = f(w_n s_n + b_n)   (3)
In the above formula, each local input s_n has its own convolution parameters w_n and b_n, and f is the activation function; the resulting set of local features is {a_1, a_2, ..., a_n};
S22: for the current complete global frequency domain information, extracting global features by using a convolutional neural network, wherein a specific calculation formula is as follows:
a=g(wS+b) (4)
each global input S has a convolution parameter weight w and a bias parameter b corresponding to the global input S, g represents an activation function adopted by the global input S, and finally a represents a global feature extracted by the convolutional neural network.
4. The method of claim 1, characterized in that the fusion of the attention mechanism with the local feature expressions in step S3 specifically comprises the following steps:
based on the different local features, the attention mechanism is applied to obtain a new global feature expression; first, every component of the global information is given a "coefficient":
α_i^n = g( W_2 f( W_1 [a_n, p_i] + b_1 ) + b_2 )   (5)
In the above formula, p_i is one of the m components of the global feature a, and α_i^n is the coefficient of component p_i given the current local feature a_n, representing its importance; the attention mechanism learns this through a two-layer mapping, the first layer using weight W_1, bias parameter b_1 and activation function f to learn the mapping, and the second layer using weight W_2, bias parameter b_2 and activation function g on the result of the first layer;
the computed importance coefficients are then multiplied with the corresponding components to form a new piece of global information:
ã_n = (α_1^n p_1, α_2^n p_2, ..., α_m^n p_m)   (6)
in this way, applying the attention mechanism yields n new pieces of global information, which are added element-wise to the initial global feature a to give the final feature expression:
A = a + ã_1 + ã_2 + ... + ã_n   (7)
the final feature expression A is fed into a softmax classifier, and the class with the highest probability value is the predicted class of the speech data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711155884.8A CN108010514B (en) | 2017-11-20 | 2017-11-20 | Voice classification method based on deep neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711155884.8A CN108010514B (en) | 2017-11-20 | 2017-11-20 | Voice classification method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108010514A CN108010514A (en) | 2018-05-08 |
CN108010514B true CN108010514B (en) | 2021-09-10 |
Family
ID=62052777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711155884.8A Expired - Fee Related CN108010514B (en) | 2017-11-20 | 2017-11-20 | Voice classification method based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108010514B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11462209B2 (en) * | 2018-05-18 | 2022-10-04 | Baidu Usa Llc | Spectrogram to waveform synthesis using convolutional networks |
CN108846048A (en) * | 2018-05-30 | 2018-11-20 | 大连理工大学 | Musical genre classification method based on Recognition with Recurrent Neural Network and attention mechanism |
CN108877783B (en) * | 2018-07-05 | 2021-08-31 | 腾讯音乐娱乐科技(深圳)有限公司 | Method and apparatus for determining audio type of audio data |
CN109256135B (en) * | 2018-08-28 | 2021-05-18 | 桂林电子科技大学 | End-to-end speaker confirmation method, device and storage medium |
CN109410914B (en) * | 2018-08-28 | 2022-02-22 | 江西师范大学 | Method for identifying Jiangxi dialect speech and dialect point |
CN109243466A (en) * | 2018-11-12 | 2019-01-18 | 成都傅立叶电子科技有限公司 | A kind of vocal print authentication training method and system |
CN109523994A (en) * | 2018-11-13 | 2019-03-26 | 四川大学 | A kind of multitask method of speech classification based on capsule neural network |
CN109599129B (en) * | 2018-11-13 | 2021-09-14 | 杭州电子科技大学 | Voice depression recognition system based on attention mechanism and convolutional neural network |
CN109285539B (en) * | 2018-11-28 | 2022-07-05 | 中国电子科技集团公司第四十七研究所 | Sound recognition method based on neural network |
CN111259189B (en) * | 2018-11-30 | 2023-04-18 | 马上消费金融股份有限公司 | Music classification method and device |
CN109509475B (en) * | 2018-12-28 | 2021-11-23 | 出门问问信息科技有限公司 | Voice recognition method and device, electronic equipment and computer readable storage medium |
CN109817233B (en) * | 2019-01-25 | 2020-12-01 | 清华大学 | Voice stream steganalysis method and system based on hierarchical attention network model |
CN109767790A (en) * | 2019-02-28 | 2019-05-17 | 中国传媒大学 | A kind of speech-emotion recognition method and system |
CN110047516A (en) * | 2019-03-12 | 2019-07-23 | 天津大学 | A kind of speech-emotion recognition method based on gender perception |
CN110197206B (en) * | 2019-05-10 | 2021-07-13 | 杭州深睿博联科技有限公司 | Image processing method and device |
CN110223714B (en) * | 2019-06-03 | 2021-08-03 | 杭州哲信信息技术有限公司 | Emotion recognition method based on voice |
CN110459225B (en) * | 2019-08-14 | 2022-03-22 | 南京邮电大学 | Speaker recognition system based on CNN fusion characteristics |
CN110534133B (en) * | 2019-08-28 | 2022-03-25 | 珠海亿智电子科技有限公司 | Voice emotion recognition system and voice emotion recognition method |
CN110648669B (en) * | 2019-09-30 | 2022-06-07 | 上海依图信息技术有限公司 | Multi-frequency shunt voiceprint recognition method, device and system and computer readable storage medium |
CN110782878B (en) * | 2019-10-10 | 2022-04-05 | 天津大学 | Attention mechanism-based multi-scale audio scene recognition method |
CN111009262A (en) * | 2019-12-24 | 2020-04-14 | 携程计算机技术(上海)有限公司 | Voice gender identification method and system |
CN110992988B (en) * | 2019-12-24 | 2022-03-08 | 东南大学 | Speech emotion recognition method and device based on domain confrontation |
CN111223488B (en) * | 2019-12-30 | 2023-01-17 | Oppo广东移动通信有限公司 | Voice wake-up method, device, equipment and storage medium |
CN111340187B (en) * | 2020-02-18 | 2024-02-02 | 河北工业大学 | Network characterization method based on attention countermeasure mechanism |
CN111666996B (en) * | 2020-05-29 | 2023-09-19 | 湖北工业大学 | High-precision equipment source identification method based on attention mechanism |
CN114141244A (en) * | 2020-09-04 | 2022-03-04 | 四川大学 | Voice recognition technology based on audio media analysis |
CN112489687B (en) * | 2020-10-28 | 2024-04-26 | 深兰人工智能芯片研究院(江苏)有限公司 | Voice emotion recognition method and device based on sequence convolution |
CN112466298B (en) * | 2020-11-24 | 2023-08-11 | 杭州网易智企科技有限公司 | Voice detection method, device, electronic equipment and storage medium |
CN112489689B (en) * | 2020-11-30 | 2024-04-30 | 东南大学 | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure |
CN112992119B (en) * | 2021-01-14 | 2024-05-03 | 安徽大学 | Accent classification method based on deep neural network and model thereof |
CN112885372B (en) * | 2021-01-15 | 2022-08-09 | 国网山东省电力公司威海供电公司 | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound |
CN113593525B (en) * | 2021-01-26 | 2024-08-06 | 腾讯科技(深圳)有限公司 | Accent classification model training and accent classification method, apparatus and storage medium |
CN112967730B (en) * | 2021-01-29 | 2024-07-02 | 北京达佳互联信息技术有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
CN113571063B (en) * | 2021-02-02 | 2024-06-04 | 腾讯科技(深圳)有限公司 | Speech signal recognition method and device, electronic equipment and storage medium |
CN113035227B (en) * | 2021-03-12 | 2022-02-11 | 山东大学 | Multi-modal voice separation method and system |
CN113049084B (en) * | 2021-03-16 | 2022-05-06 | 电子科技大学 | Attention mechanism-based Resnet distributed optical fiber sensing signal identification method |
CN113409827B (en) * | 2021-06-17 | 2022-06-17 | 山东省计算中心(国家超级计算济南中心) | Voice endpoint detection method and system based on local convolution block attention network |
CN116778951B (en) * | 2023-05-25 | 2024-08-09 | 上海蜜度科技股份有限公司 | Audio classification method, device, equipment and medium based on graph enhancement |
CN116504259B (en) * | 2023-06-30 | 2023-08-29 | 中汇丰(北京)科技有限公司 | Semantic recognition method based on natural language processing |
CN116825092B (en) * | 2023-08-28 | 2023-12-01 | 珠海亿智电子科技有限公司 | Speech recognition method, training method and device of speech recognition model |
CN117275491B (en) * | 2023-11-17 | 2024-01-30 | 青岛科技大学 | Sound classification method based on audio conversion and time attention seeking neural network |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706780A (en) * | 2009-09-03 | 2010-05-12 | 北京交通大学 | Image semantic retrieving method based on visual attention model |
CN102044254A (en) * | 2009-10-10 | 2011-05-04 | 北京理工大学 | Speech spectrum color enhancement method for speech visualization |
CN106652999A (en) * | 2015-10-29 | 2017-05-10 | 三星Sds株式会社 | System and method for voice recognition |
CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | A kind of speech-emotion recognition method |
CN106952649A (en) * | 2017-05-14 | 2017-07-14 | 北京工业大学 | Method for distinguishing speek person based on convolutional neural networks and spectrogram |
CN107145518A (en) * | 2017-04-10 | 2017-09-08 | 同济大学 | Personalized recommendation system based on deep learning under a kind of social networks |
CN107203999A (en) * | 2017-04-28 | 2017-09-26 | 北京航空航天大学 | A kind of skin lens image automatic division method based on full convolutional neural networks |
CN107221326A (en) * | 2017-05-16 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | Voice awakening method, device and computer equipment based on artificial intelligence |
CN107239446A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | A kind of intelligence relationship extracting method based on neutral net Yu notice mechanism |
CN107256228A (en) * | 2017-05-02 | 2017-10-17 | 清华大学 | Answer selection system and method based on structuring notice mechanism |
CN107316066A (en) * | 2017-07-28 | 2017-11-03 | 北京工商大学 | Image classification method and system based on multi-path convolutional neural networks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10762894B2 (en) * | 2015-03-27 | 2020-09-01 | Google Llc | Convolutional neural networks |
US10382300B2 (en) * | 2015-10-06 | 2019-08-13 | Evolv Technologies, Inc. | Platform for gathering real-time analysis |
-
2017
- 2017-11-20 CN CN201711155884.8A patent/CN108010514B/en not_active Expired - Fee Related
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706780A (en) * | 2009-09-03 | 2010-05-12 | 北京交通大学 | Image semantic retrieving method based on visual attention model |
CN102044254A (en) * | 2009-10-10 | 2011-05-04 | 北京理工大学 | Speech spectrum color enhancement method for speech visualization |
CN106652999A (en) * | 2015-10-29 | 2017-05-10 | 三星Sds株式会社 | System and method for voice recognition |
CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | A kind of speech-emotion recognition method |
CN107145518A (en) * | 2017-04-10 | 2017-09-08 | 同济大学 | Personalized recommendation system based on deep learning under a kind of social networks |
CN107203999A (en) * | 2017-04-28 | 2017-09-26 | 北京航空航天大学 | A kind of skin lens image automatic division method based on full convolutional neural networks |
CN107256228A (en) * | 2017-05-02 | 2017-10-17 | 清华大学 | Answer selection system and method based on structuring notice mechanism |
CN106952649A (en) * | 2017-05-14 | 2017-07-14 | 北京工业大学 | Method for distinguishing speek person based on convolutional neural networks and spectrogram |
CN107221326A (en) * | 2017-05-16 | 2017-09-29 | 百度在线网络技术(北京)有限公司 | Voice awakening method, device and computer equipment based on artificial intelligence |
CN107239446A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | A kind of intelligence relationship extracting method based on neutral net Yu notice mechanism |
CN107316066A (en) * | 2017-07-28 | 2017-11-03 | 北京工商大学 | Image classification method and system based on multi-path convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
"Deep Convolutional Recurrent neural network with attention mechanism robust speech emotion recognition";Che-Wei Huang;《ICME 2017》;20170714;全文 * |
"Describing Multimedia Content Using Attention-based Encoder-Decoder Networks";Kyunghyun Cho;《IEEE transaction on Multimedia》;20151231;第17卷(第11期);全文 * |
"Time-frequency feature extraction from spectrograms and wavelet packets with application to automatic stress and emotion classification in speech";L. He, M. Lech;《Proceedings of the International Conference on Information, Communications and Signal Processing》;20091231;全文 * |
"Transferring Deep Convolutional Neural networks for the sceneclassification of high-resolution remote sensing imagery";Fan Hu;《MDPI》;20151205;第7卷(第11期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN108010514A (en) | 2018-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108010514B (en) | Voice classification method based on deep neural network | |
Wang et al. | Research on Web text classification algorithm based on improved CNN and SVM | |
CN107944559B (en) | Method and system for automatically identifying entity relationship | |
Du et al. | Shape recognition based on neural networks trained by differential evolution algorithm | |
Thakur et al. | Deep metric learning for bioacoustic classification: Overcoming training data scarcity using dynamic triplet loss | |
CN109271522A (en) | Comment sensibility classification method and system based on depth mixed model transfer learning | |
CN106897254B (en) | Network representation learning method | |
WO2021127982A1 (en) | Speech emotion recognition method, smart device, and computer-readable storage medium | |
CN112818861A (en) | Emotion classification method and system based on multi-mode context semantic features | |
Chang et al. | Automatic channel pruning via clustering and swarm intelligence optimization for CNN | |
CN111414461A (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN111898703B (en) | Multi-label video classification method, model training method, device and medium | |
WO2020151310A1 (en) | Text generation method and device, computer apparatus, and medium | |
EP4198807A1 (en) | Audio processing method and device | |
WO2021042857A1 (en) | Processing method and processing apparatus for image segmentation model | |
CN111353534B (en) | Graph data category prediction method based on adaptive fractional order gradient | |
Lian et al. | Unsupervised representation learning with future observation prediction for speech emotion recognition | |
CN109522432B (en) | Image retrieval method integrating adaptive similarity and Bayes framework | |
Hsu et al. | Unsupervised convolutional neural networks for large-scale image clustering | |
Verkholyak et al. | Modeling short-term and long-term dependencies of the speech signal for paralinguistic emotion classification | |
CN112418059A (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN116386899A (en) | Graph learning-based medicine disease association relation prediction method and related equipment | |
CN107807919A (en) | A kind of method for carrying out microblog emotional classification prediction using random walk network is circulated | |
CN109033413B (en) | Neural network-based demand document and service document matching method | |
WO2021059527A1 (en) | Learning device, learning method, and recording medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | |
Granted publication date: 20210910 |