CN113361592A - Acoustic event identification method based on public subspace representation learning - Google Patents
- Publication number
- CN113361592A (application CN202110620415.9A)
- Authority
- CN
- China
- Prior art keywords
- segment
- matrix
- acoustic event
- level
- subspace
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Complex Calculations (AREA)
Abstract
An acoustic event identification method based on common subspace representation learning, relating to acoustic event identification. The method addresses the low accuracy of acoustic event recognition caused by inconsistency among the subspaces underlying different semantic features. First, each original acoustic event signal is sampled, quantized, and subjected to frame-level feature extraction, segment-level feature extraction, and expansion; semantic feature representations in a common subspace are then obtained by learning; finally, a kernel matrix of the training set is computed and a classifier is trained to obtain a classification model. At test time, each original acoustic event signal undergoes the same sampling, quantization, frame-level feature extraction, segment-level feature extraction, and expansion; semantic feature representations are obtained under the guidance of the learned common subspace; finally, a kernel matrix of the test set is computed and model matching is performed under the guidance of the classification model to obtain the prediction results. The method is mainly used for the identification of acoustic events.
Description
Technical Field
The invention relates to an acoustic event identification method.
Background
Acoustic events are sound signals with well-defined semantics and an important medium through which humans perceive their surroundings. With the development of information technology, machines can acquire the ability to recognize and understand acoustic events by simulating the human auditory mechanism. Acoustic event recognition technology is widely applied in practical fields such as environmental monitoring and smart homes, and is attracting the attention of a growing number of researchers.
Among the many acoustic event identification techniques, semantic feature extraction based on subspace learning can effectively improve recognition performance because it accounts for both the content and the temporal structure of acoustic events. It fully describes the semantic features of a single event by independently learning a separate subspace for each acoustic event. However, because it ignores the consistency of the subspaces used to characterize the semantic features of different acoustic events, the differences between these features arise not only from the semantics implied by the events but also from the inconsistency of the subspaces, which degrades the recognition accuracy of acoustic events.
Disclosure of Invention
The method aims to solve the problem of low accuracy of an acoustic event recognition task caused by inconsistency of subspaces among different semantic features.
An acoustic event recognition method based on common subspace representation learning, comprising the following steps:
extracting logarithmic Mel spectrum characteristics of the audio frame corresponding to the acoustic event sample to be identified to obtain a frame level characteristic matrix; further abstracting the frame-level features into segment-level features by using a convolutional neural network;
using the common subspace basis matrix U* to solve for the optimal semantic feature matrix of each acoustic event sample;
Then, calculating a kernel matrix of the acoustic event sample to be identified:
where V_i^tr denotes the optimal semantic feature matrix obtained with the common subspace basis matrix U* for the i-th sample of the training set, the training set being the set of acoustic event samples used for training; the kernel matrix is K_te ∈ R^(N_tr × N_te), with entries [K_te]_{ij} = k(V_i^tr, V_j^te), where N_te is the number of acoustic event samples to be identified, N_tr is the total number of training samples, k(·,·) is a Grassmann kernel function, and R denotes the real number space;
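The source names k(·,·) only as "a Grassmann kernel function" without fixing a particular one. As an illustration, the sketch below assumes the projection kernel, one common Grassmann kernel, applied to orthonormal basis matrices; the choice of kernel is an assumption here, not taken from the source.

```python
import numpy as np

def projection_kernel(V1, V2):
    """Projection Grassmann kernel between subspaces spanned by the
    orthonormal basis matrices V1 and V2: k(V1, V2) = ||V1^T V2||_F^2."""
    return np.linalg.norm(V1.T @ V2, 'fro') ** 2

def kernel_matrix(rows, cols):
    """Kernel matrix with entries k(rows[i], cols[j])."""
    return np.array([[projection_kernel(Vi, Vj) for Vj in cols] for Vi in rows])
```

For an orthonormal d × p basis V, k(V, V) = p, so the kernel is maximal when two subspaces coincide.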
finally, under the guidance of the classification parameters α*, model matching is performed according to the following formula:
P = (α*)^T K_te
where P is the prediction result; each column of P contains the probability scores of the acoustic event sample to be identified on each category, and the identification result of that sample is determined by taking the maximum of its column;
the classification parameters α* are the classification parameters of a support vector machine classifier obtained by training on the training set.
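The model-matching step P = (α*)^T K_te followed by a column-wise maximum can be sketched as follows; the shapes (N_tr = 6, N_te = 4, c = 3) and random values are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tr, n_te, c = 6, 4, 3
alpha = rng.standard_normal((n_tr, c))    # classification parameters, one column per class
K_te = rng.random((n_tr, n_te))           # kernel between training and test samples

P = alpha.T @ K_te                        # c x N_te score matrix
predictions = P.argmax(axis=0)            # per column, the class index with the maximum score
```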
Further, the classification parameters α* are obtained by a process comprising the following steps:
calculating a kernel matrix according to the optimal semantic feature matrix of the training set:
where K_tr ∈ R^(N_tr × N_tr) is the kernel matrix of the training set, with entries [K_tr]_{ij} = k(V_i, V_j) computed from the optimal semantic feature matrices V_i, V_j of the training set;
training a support vector machine classifier with K_tr to obtain a classification model, whose classification parameters are expressed as α* ∈ R^(N_tr × c), where c is the total number of classes.
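Training on a precomputed Gram matrix can be sketched with scikit-learn's precomputed-kernel support vector classifier. The toy Gram matrix (projection Grassmann kernel over random orthonormal bases) and the cyclic labels are placeholders; the actual kernel and training data are those described above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_tr, d, p = 20, 8, 2
# Toy Gram matrix built with the projection Grassmann kernel over random orthonormal bases.
bases = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(n_tr)]
K_tr = np.array([[np.linalg.norm(Vi.T @ Vj, 'fro') ** 2 for Vj in bases] for Vi in bases])
y = np.arange(n_tr) % 3                   # placeholder labels for c = 3 classes

clf = SVC(kernel='precomputed')           # the classifier consumes the Gram matrix directly
clf.fit(K_tr, y)
train_pred = clf.predict(K_tr)            # prediction takes kernel rows against the training set
```

At test time, `predict` is called on the N_te × N_tr kernel block between test and training samples.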
Further, the common subspace basis matrix U* is determined by a process comprising the following steps:
step 5.1: randomly initializing a base matrix of the public subspace;
step 5.2: extracting the frame-level features corresponding to each training sample in the training set and obtaining segment-level features from the frame-level feature matrix; the segment-level features of the i-th sample in the segment-level feature set are recorded as X_i = [x_1, ..., x_{N_i}] ∈ R^(d × N_i), where N_i is the number of segments composing the sample, x_n is the n-th segment-level feature, and d is the feature dimension;
randomly selecting some acoustic event samples from the training set of segment-level features and solving their semantic feature matrices with the common subspace basis matrix fixed;
step 5.3: updating the basis matrix of the common subspace using the semantic feature matrices obtained in step 5.2;
step 5.4: repeating steps 5.2 to 5.3 until convergence to obtain the optimal representation of the basis matrix, namely the common subspace basis matrix U*.
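The alternating structure of steps 5.1 to 5.4 (fix U, solve the per-sample representations; refit U; repeat) can be sketched as below. This is a drastically simplified stand-in: it drops the orthogonality constraint on V_i and the temporal-order term of the patent's objective, keeping only the shared-reconstruction part, with the U update solved in closed form by orthogonal Procrustes rather than Riemannian gradient descent. All dimensions and data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n_samples, n_seg = 16, 3, 8, 12
X = [rng.standard_normal((d, n_seg)) for _ in range(n_samples)]   # placeholder segment features

U, _ = np.linalg.qr(rng.standard_normal((d, p)))   # step 5.1: random orthonormal basis
for _ in range(30):                                 # steps 5.2-5.4: alternate until convergence
    # step 5.2 (simplified): with U fixed, least-squares coordinates of each sample
    V = [U.T @ Xi for Xi in X]
    # step 5.3 (simplified): refit U over all samples by orthogonal Procrustes,
    # i.e. minimize sum_i ||X_i - U V_i||_F^2 over orthonormal U
    S = sum(Xi @ Vi.T for Xi, Vi in zip(X, V))
    A, _, Bt = np.linalg.svd(S, full_matrices=False)
    U = A @ Bt
```

Each half-step decreases the shared reconstruction error, so the alternation converges; the columns of U stay orthonormal throughout.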
Further, the segment-level features X_i of step 5.2 require smoothing and length normalization.
Further, the process of randomly selecting some acoustic event samples and solving their semantic feature matrices respectively under the condition of fixing the common subspace basis matrix as described in step 5.2 includes the following steps:
in the training set of segment-level features, randomly selecting l acoustic event samples, fixing U, and solving for each a semantic feature matrix V_i by minimizing an objective f(V_i) that balances reconstruction in the common subspace against characterization of the temporal order between segments,
where U is the randomly initialized orthonormal basis matrix of the common subspace; V_i is the semantic feature matrix corresponding to the i-th sample X_i among the l training samples, whose columns are constrained to be mutually orthogonal; q_i is the total number of ordered pairs among the N_i segments; λ is a hyperparameter reflecting the influence of characterizing the temporal relation on semantic feature quality, and η is a hyperparameter reflecting how markedly the temporal relation is characterized; 1_p = [1] ∈ R^(p×1) is the p-dimensional all-ones vector and 1_{p×q_i} is the p × q_i all-ones matrix; x_n, x_m denote the n-th and m-th segment-level features of X_i;
Further, the total number of ordered pairs among the N_i segments is q_i = 0.5 × N_i × (N_i − 1).
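The pair count q_i = 0.5 × N_i × (N_i − 1) is simply the number of ways to pick two segments that stand in a temporal order relation, which can be checked directly:

```python
from itertools import combinations

Ni = 6                                     # number of segments in a sample (illustrative)
pairs = list(combinations(range(Ni), 2))   # all pairs (n, m) with segment n before segment m
qi = len(pairs)                            # equals Ni * (Ni - 1) // 2
```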
Further, in the process of updating the basis matrix of the common subspace using the plurality of semantic feature matrices obtained in step 5.2, the basis matrix is updated by minimizing an objective defined in terms of the Frobenius norm, where ‖·‖_F denotes the Frobenius norm.
Furthermore, the acoustic event sample to be identified is obtained after sampling and quantizing the acoustic event signal to be identified.
Further, in abstracting the frame-level features into segment-level features with the convolutional neural network, a plurality of adjacent frame-level features are abstracted into one segment-level feature.
Further, after the frame-level features are further abstracted into segment-level features by using the convolutional neural network, the segment-level features are subjected to smoothing processing and length normalization processing.
Advantageous effects:
the method can well solve the influence of the inconsistency of the subspace among different semantic features on the accuracy rate of the acoustic event recognition task. Meanwhile, the invention can describe the semantic features of the acoustic event from multiple angles by learning the multidimensional public subspace, thereby improving the recognition performance together, and the accuracy of the acoustic event recognition task is as high as 84.1%.
Drawings
Fig. 1 is a schematic diagram of an acoustic event recognition method based on common subspace representation learning.
FIG. 2 is a convolutional neural network architecture diagram for extracting segment-level features.
Fig. 3 is a histogram of the accuracy of acoustic event recognition methods and correlation methods on ESC-50 data sets based on common subspace representation learning.
Detailed Description
The first embodiment is as follows: the present embodiment is described with reference to fig. 1, and fig. 1 is a schematic diagram of an acoustic event recognition method based on common subspace representation learning. In the training stage, firstly, sampling, quantizing, frame level feature extraction, segment level feature extraction and expansion are respectively carried out on original signals from a training set; then, obtaining semantic feature representation of the public subspace by learning the public subspace; and finally, calculating a kernel matrix of the training set, and training a classifier to obtain a classification model. In the testing stage, firstly, sampling, quantizing, frame level feature extracting, segment level feature extracting and expanding are carried out on each original acoustic event signal in a testing set; then, obtaining semantic feature representation under the guidance of the learned public subspace; and finally, calculating a kernel matrix of the test set, and performing model matching under the guidance of the classification model to obtain a prediction result.
The acoustic event identification method based on common subspace representation learning in the embodiment comprises the following steps:
step 1: and respectively sampling and quantizing the original acoustic event signals in the training set and the testing set to obtain processed acoustic event samples. In this embodiment, the sampling rate may be 44100 Hz, and the number of quantization bits may be 16.
Step 2: Divide each acoustic event sample obtained in step 1 into a plurality of audio frames according to a pre-specified frame length and inter-frame overlap ratio, and extract from the audio frames the logarithmic Mel spectral features classically used in acoustic event recognition, with a pre-specified number of Mel bands, to obtain a frame-level feature matrix. In the present embodiment, the frame length, inter-frame overlap, and number of Mel bands may be set to 23 ms, 50%, and 128, respectively.
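The framing arithmetic of step 2 (23 ms frames, 50% overlap at 44100 Hz) can be sketched with plain NumPy. This sketch stops at the per-frame log power spectrum; the 128-band Mel filterbank would be applied before the logarithm, typically via an audio library. The signal here is a random placeholder.

```python
import numpy as np

sr = 44100                                # sampling rate from step 1
frame_len = int(0.023 * sr)               # 23 ms frame length -> 1014 samples
hop = frame_len // 2                      # 50% inter-frame overlap

rng = np.random.default_rng(0)
signal = rng.standard_normal(sr)          # 1-second placeholder signal
n_frames = 1 + (len(signal) - frame_len) // hop
frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
window = np.hanning(frame_len)
power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2   # per-frame power spectrum
log_spec = np.log(power + 1e-10)          # a 128-band Mel filterbank would precede the log
```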
Step 3: Considering that an audio frame is often too short in duration to contain sufficient semantic information, for each frame-level feature matrix obtained in step 2, a plurality of adjacent frame-level features are input into a pre-trained convolutional neural network according to a pre-specified segment length and inter-segment overlap ratio; with the network parameters fixed, segment-level features are obtained without supervision, yielding a segment-level feature matrix. FIG. 2 is the architecture diagram of the convolutional neural network used for extracting segment-level features. The network consists of 13 convolutional layers, 6 max-pooling layers, and one average-pooling layer; it is pre-trained with supervision, using the logarithmic Mel spectrum features of samples from the AudioSet dataset as input, to obtain the optimal network parameters.
To fully exploit the capability of the convolutional neural network while keeping the total number of segments per sample under control, in this embodiment each segment-level feature may be set as the further abstraction of 84 adjacent audio frames, the inter-segment overlap may be 20%, and the output of the penultimate layer of the above network (the last convolutional layer), with 1024 nodes, may be taken as the segment-level feature.
Step 4: To enhance the temporal correlation among segment-level features, each segment-level feature matrix obtained in step 3 is smoothed with a classical temporal averaging method, and the smoothed segment-level feature matrix is then length-normalized, yielding the segment-level feature matrix expanded by these operations. For convenience, the expanded segment-level features of the i-th sample are recorded as X_i = [x_1, ..., x_{N_i}] ∈ R^(d × N_i), where N_i is the number of segments composing the sample and d is the feature dimension. Since the expansion operations do not change the segment-level feature dimension obtained after step 3, d can also be set to 1024 in this step.
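One plausible reading of the expansion in step 4 is a moving average along the segment (time) axis followed by unit-L2 normalization of each segment feature; the exact smoothing operator and normalization are assumptions here, as the source only names them generically.

```python
import numpy as np

def smooth_and_normalize(X, win=3):
    """Moving-average smoothing along the segment (time) axis of a d x N feature
    matrix, followed by unit-L2 length normalization of each segment feature."""
    kernel = np.ones(win) / win
    Xs = np.vstack([np.convolve(row, kernel, mode='same') for row in X])
    norms = np.linalg.norm(Xs, axis=0, keepdims=True)
    return Xs / np.maximum(norms, 1e-12)
```

As stated in the text, the output keeps the input shape, so d = 1024 is preserved.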
Step 5: To characterize the overall semantic features of the acoustic event samples, a common subspace is constructed using a strategy based on common subspace learning, comprising the following steps:
step 5.1: defining a subspace with a dimension p, randomly initializing the orthonormal basis matrix of the subspace, and recording asWherein the content of the first and second substances,refers to the Stiefel flow pattern, which isA set consisting of orthonormal matrices of size d × p. Wherein p can range from [1,2,3,4,5 ]]Selecting.
Step 5.2: In the training set processed in step 4, randomly select l acoustic event samples and fix U (steps 5.1 to 5.3 are the three main steps of an iterative algorithm that alternately updates U and the V_i; here U is fixed while the V_i are updated), then solve for each selected sample a semantic feature matrix V_i by minimizing an objective f(V_i).
Here f(V_i) is a function of V_i, and V_i is the semantic feature matrix corresponding to the i-th sample X_i among the l training samples; to avoid redundancy in the semantic features, the invention constrains the different columns of V_i to be mutually orthogonal. q_i is the total number of ordered pairs among the N_i segments, computable as q_i = 0.5 × N_i × (N_i − 1); λ is a hyperparameter of the invention reflecting the influence of characterizing the temporal relation on semantic feature quality, and η is a hyperparameter reflecting how markedly the temporal relation is characterized; 1_p = [1] ∈ R^(p×1) is the p-dimensional all-ones vector and 1_{p×q_i} is the p × q_i all-ones matrix. The objective involves the differences x_n − x_m between any two segments of sample X_i that stand in a temporal order relation, where x_n, x_m denote the n-th and m-th segment-level features of X_i.
To solve the minimization of f(V_i) effectively, the invention uses the classical Riemannian gradient descent algorithm, an iterative solution strategy, and obtains the semantic feature representation V_i after a pre-specified maximum number of iterations. By learning a multidimensional common subspace, the invention can describe the semantic features of acoustic events from multiple angles, jointly improving recognition performance.
where l may be set to 64, λ may be set to 0.1, η may be set to 0.001, and the maximum number of iterations may be set to 20.
Step 5.3: Using the plurality of semantic feature matrices obtained in step 5.2, update the basis matrix of the common subspace by minimizing an objective g(U).
Here g(U) is a function of U. To solve this minimization effectively, the classical Riemannian gradient descent algorithm is adopted, performing one update to obtain the updated U. Specifically, the update relies on the gradient of g(U) with respect to U.
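A single Riemannian gradient-descent step on the Stiefel manifold, as used for both the V_i and the U updates, can be sketched as follows: project the Euclidean gradient onto the tangent space at the current point, take a step, and retract back onto the manifold via QR. The step size and the gradient are placeholders; the actual gradient is that of the objective above.

```python
import numpy as np

def riemannian_gd_step(U, grad, lr=0.1):
    """One Riemannian gradient-descent step on the Stiefel manifold:
    project the Euclidean gradient onto the tangent space at U,
    take a step, then retract back onto the manifold via QR."""
    sym = (U.T @ grad + grad.T @ U) / 2
    rgrad = grad - U @ sym                 # tangent-space projection of the gradient
    Q, _ = np.linalg.qr(U - lr * rgrad)    # QR retraction keeps the columns orthonormal
    return Q
```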
step 5.4: repeating steps 5.2 to 5.3 until convergence to obtain an optimal representation U of the common subspace base matrix*。
Step 6: Under the guidance of U*, solve the optimal semantic feature matrix of each acoustic event sample in the training set and the test set processed by step 4; the set of optimal semantic feature matrices of the training set is recorded as {V_i^tr | i = 1, ..., N_tr}, where N_tr is the total number of training samples, and that of the test set as {V_j^te | j = 1, ..., N_te}, where N_te is the total number of test samples.
Step 7: Using the optimal semantic feature matrices of the training set, calculate the kernel matrix of the training set as K_tr ∈ R^(N_tr × N_tr), with entries [K_tr]_{ij} = k(V_i^tr, V_j^tr), where k(·,·) is a pre-specified Grassmann kernel function. Further, a support vector machine classifier is trained with K_tr to obtain a classification model, whose classification parameters can be expressed as α* ∈ R^(N_tr × c), where c is the total number of classes.
Step 8: Using the optimal semantic feature matrices of the training set and the test set, calculate the kernel matrix of the test set as K_te ∈ R^(N_tr × N_te), with entries [K_te]_{ij} = k(V_i^tr, V_j^te). Further, under the guidance of α*, model matching is performed according to the following formula:
P = (α*)^T K_te
where P is the prediction result; each column can be regarded as the probability scores of the corresponding test sample on each category, and the recognition result of the test sample can be determined by taking the maximum of each column.
The invention has been tested on internationally published datasets, and the results show that the technique effectively improves recognition performance.
Fig. 3 is a histogram of the accuracy on the ESC-50 dataset of the acoustic event identification method based on common subspace representation learning and of related methods. Comparing the accuracy of the proposed method with that of the acoustic event identification method based on per-event subspace representation learning verifies the necessity of introducing a common subspace; comparing it with the accuracy achieved by the pre-trained convolutional neural network and by human listeners verifies the effectiveness of the proposed method.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.
Claims (10)
1. An acoustic event recognition method based on common subspace representation learning, characterized by comprising the following steps:
extracting logarithmic Mel spectrum characteristics of the audio frame corresponding to the acoustic event sample to be identified to obtain a frame level characteristic matrix; further abstracting the frame-level features into segment-level features by using a convolutional neural network;
using the common subspace basis matrix U* to solve for the optimal semantic feature matrix of each acoustic event sample;
Then, calculating a kernel matrix of the acoustic event sample to be identified:
where V_i^tr denotes the optimal semantic feature matrix obtained with the common subspace basis matrix U* for the i-th sample of the training set, the training set being the set of acoustic event samples used for training; the kernel matrix is K_te ∈ R^(N_tr × N_te), with entries [K_te]_{ij} = k(V_i^tr, V_j^te), where N_te is the number of acoustic event samples to be identified, N_tr is the total number of training samples, k(·,·) is a Grassmann kernel function, and R denotes the real number space;
finally, under the guidance of the classification parameters α*, model matching is performed according to the following formula:
P = (α*)^T K_te
where P is the prediction result; each column of P contains the probability scores of the corresponding acoustic event sample to be recognized on each category, and the recognition result of that sample is determined by taking the maximum of its column;
the classification parameters α* are the classification parameters of a support vector machine classifier obtained by training on the training set.
2. The acoustic event recognition method based on common subspace representation learning according to claim 1, wherein the classification parameters α* are obtained by a process comprising the following steps:
calculating a kernel matrix according to the optimal semantic feature matrix of the training set:
where K_tr is the kernel matrix of the training set, with entries computed by the Grassmann kernel from the optimal semantic feature matrices of the training set; and training a support vector machine classifier with K_tr to obtain the classification parameters α*.
3. The acoustic event recognition method based on common subspace representation learning according to claim 1 or 2, wherein the common subspace basis matrix U* is determined by a process comprising the following steps:
step 5.1: randomly initializing a base matrix of the public subspace;
step 5.2: extracting the frame-level features corresponding to each training sample in the training set and obtaining segment-level features from the frame-level feature matrix; the segment-level features of the i-th sample in the segment-level feature set are recorded as X_i = [x_1, ..., x_{N_i}] ∈ R^(d × N_i), where N_i is the number of segments composing the sample, x_n is the n-th segment-level feature, and d is the feature dimension;
randomly selecting some acoustic event samples from the training set of segment-level features and solving their semantic feature matrices with the common subspace basis matrix fixed;
step 5.3: updating the base matrix of the public subspace by using the plurality of semantic feature matrixes obtained in the step 5.2;
step 5.4: repeating steps 5.2 to 5.3 until convergence to obtain the optimal representation of the basis matrix, namely the common subspace basis matrix U*.
4. The acoustic event recognition method based on common subspace representation learning according to claim 3, wherein the segment-level features X_i of step 5.2 require smoothing and length normalization.
5. The method for recognizing acoustic events based on common subspace representation learning according to claim 4, wherein the process of randomly selecting partial acoustic event samples and respectively solving the semantic feature matrices thereof under the condition of fixing the common subspace base matrix in step 5.2 comprises the following steps:
in the training set of segment-level features, randomly selecting l acoustic event samples, fixing U, and solving for each a semantic feature matrix V_i by minimizing an objective f(V_i) that balances reconstruction in the common subspace against characterization of the temporal order between segments,
where U is the randomly initialized orthonormal basis matrix of the common subspace; V_i is the semantic feature matrix corresponding to the i-th sample X_i among the l training samples, whose columns are constrained to be mutually orthogonal; q_i is the total number of ordered pairs among the N_i segments; λ is a hyperparameter reflecting the influence of characterizing the temporal relation on semantic feature quality, and η is a hyperparameter reflecting how markedly the temporal relation is characterized; 1_p = [1] ∈ R^(p×1) is the p-dimensional all-ones vector and 1_{p×q_i} is the p × q_i all-ones matrix; x_n, x_m denote the n-th and m-th segment-level features of X_i;
6. The acoustic event recognition method based on common subspace representation learning according to claim 5, wherein the total number of ordered pairs among the N_i segments is q_i = 0.5 × N_i × (N_i − 1).
7. The acoustic event recognition method based on common subspace representation learning according to claim 6, wherein, in updating the basis matrix of the common subspace using the semantic feature matrices obtained in step 5.2, the basis matrix is updated by minimizing an objective defined in terms of the Frobenius norm, where ‖·‖_F denotes the Frobenius norm.
8. The method as claimed in claim 7, wherein the acoustic event samples to be identified are obtained by sampling and quantizing the acoustic event signals to be identified.
9. The method of claim 8, wherein the further abstraction of the frame-level features into the segment-level features by the convolutional neural network is to further abstract a plurality of adjacent frame-level features into the segment-level features.
10. The method of claim 9, wherein after the frame-level features are further abstracted into segment-level features by using a convolutional neural network, the segment-level features are further subjected to smoothing and length normalization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110620415.9A CN113361592B (en) | 2021-06-03 | 2021-06-03 | Acoustic event identification method based on public subspace representation learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110620415.9A CN113361592B (en) | 2021-06-03 | 2021-06-03 | Acoustic event identification method based on public subspace representation learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113361592A true CN113361592A (en) | 2021-09-07 |
CN113361592B CN113361592B (en) | 2022-11-08 |
Family
ID=77531792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110620415.9A Active CN113361592B (en) | 2021-06-03 | 2021-06-03 | Acoustic event identification method based on public subspace representation learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113361592B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016042359A (en) * | 2014-08-18 | 2016-03-31 | 株式会社デンソーアイティーラボラトリ | Recognition apparatus, real number matrix decomposition method, and recognition method |
CN106250855A (en) * | 2016-08-02 | 2016-12-21 | 南京邮电大学 | A kind of multi-modal emotion identification method based on Multiple Kernel Learning |
CN110148428A (en) * | 2019-05-27 | 2019-08-20 | 哈尔滨工业大学 | A kind of acoustic events recognition methods indicating study based on subspace |
US20200075040A1 (en) * | 2018-08-31 | 2020-03-05 | The Regents Of The University Of Michigan | Automatic speech-based longitudinal emotion and mood recognition for mental health treatment |
CN112241605A (en) * | 2019-07-17 | 2021-01-19 | 华北电力大学(保定) | Method for identifying state of circuit breaker energy storage process by constructing CNN characteristic matrix through acoustic vibration signals |
CN112820071A (en) * | 2021-02-25 | 2021-05-18 | 泰康保险集团股份有限公司 | Behavior identification method and device |
- 2021-06-03: application CN202110620415.9A granted as patent CN113361592B (status: active)
Non-Patent Citations (3)
Title |
---|
LIWEN ZHANG et al.: "Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification", IEEE/ACM Transactions on Audio, Speech, and Language Processing * |
SHI Qiuying et al.: "Complex audio scene recognition based on DNN and multi-modal information fusion", Proceedings of the 14th National Conference on Man-Machine Speech Communication (NCMMSC'2017) * |
CHENG Shilei: "Research on feature extraction and recognition methods for human behavior in video sequences", China Doctoral Dissertations Full-text Database (Information Science and Technology) * |
Also Published As
Publication number | Publication date |
---|---|
CN113361592B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xie et al. | Utterance-level aggregation for speaker recognition in the wild | |
JP6235938B2 (en) | Acoustic event identification model learning device, acoustic event detection device, acoustic event identification model learning method, acoustic event detection method, and program | |
Shi et al. | Few-shot acoustic event detection via meta learning | |
CN110349597B (en) | Voice detection method and device | |
CN112885372B (en) | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound | |
CN111899757B (en) | Single-channel voice separation method and system for target speaker extraction | |
CN111986699A (en) | Sound event detection method based on full convolution network | |
CN112216287A (en) | Environmental sound identification method based on ensemble learning and convolution neural network | |
Naranjo-Alcazar et al. | On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification | |
Mustika et al. | Comparison of keras optimizers for earthquake signal classification based on deep neural networks | |
CN111243621A (en) | Construction method of GRU-SVM deep learning model for synthetic speech detection | |
KR102241364B1 (en) | Apparatus and method for determining user stress using speech signal | |
CN113361592B (en) | Acoustic event identification method based on public subspace representation learning | |
Mahanta et al. | The Brogrammers DiCOVA 2021 challenge system report | |
Neili et al. | Gammatonegram based pulmonary pathologies classification using convolutional neural networks | |
CN115083433A (en) | DNN-based text irrelevant representation tone clustering method | |
CN115267672A (en) | Method for detecting and positioning sound source | |
CN114898773A (en) | Synthetic speech detection method based on deep self-attention neural network classifier | |
CN112712096A (en) | Audio scene classification method and system based on deep recursive non-negative matrix decomposition | |
Estrebou et al. | Voice recognition based on probabilistic SOM | |
Alex et al. | Performance analysis of SOFM based reduced complexity feature extraction methods with back propagation neural network for multilingual digit recognition | |
Thakur et al. | Conv-codes: audio hashing for bird species classification | |
Long et al. | Offline to online speaker adaptation for real-time deep neural network based LVCSR systems | |
Ashurov et al. | Classification of Environmental Sounds Through Spectrogram-Like Images Using Dilation-Based CNN | |
Nagajyothi et al. | Voice Recognition Based on Vector Quantization Using LBG |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||