CN110176250B - Robust acoustic scene recognition method based on local learning - Google Patents
- Publication number: CN110176250B (application number CN201910464699.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- sample
- acoustic scene
- samples
- scene recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Abstract
The invention provides a robust acoustic scene recognition method based on local learning, belonging to the technical field of sound signal processing. First, sound signals of different acoustic scenes are collected and frequency-domain features are extracted; the extracted feature data are then preprocessed. Next, mean shift is applied to the normalized data, and the data are expanded with the mixup method. A convolutional neural network model is then established according to the local-learning idea, and the expanded training sample set is input into the model for training to obtain a trained model. Finally, a sample to be recognized undergoes frequency-domain feature extraction and data preprocessing in turn and is input into the trained model for recognition, yielding the acoustic scene recognition result. The method solves the problem of low acoustic scene recognition accuracy under audio channel mismatch and imbalanced numbers of samples per channel, and can be applied to acoustic scene recognition with multiple channels and imbalanced numbers of samples per channel.
Description
Technical Field
The invention relates to an acoustic scene recognition method, and belongs to the technical field of sound signal processing.
Background
Acoustic scene recognition can be widely applied in fields such as robotics and driverless vehicles, which need to perceive the surrounding sound environment effectively. In the real world, however, there is often more than one sound acquisition device, and because different devices have different channel characteristics, the acquired signals are usually not identical. How to automatically and accurately classify the scenes of sounds input through different channels, and thus achieve robust acoustic scene recognition, has become an urgent and challenging research topic.
To achieve robust acoustic scene recognition, prior knowledge of the data must be fully exploited. At present, most methods perform acoustic scene recognition on clean audio or within a single channel, for example recognition based on convolutional neural networks, hidden Markov models, or recurrent neural networks. These techniques neither compensate for the channel mismatch of the audio data nor adjust for imbalanced amounts of data per device type, so when applied in a real environment with multiple channels and imbalanced numbers of samples per channel, their acoustic scene recognition accuracy is low and cannot meet the requirements of practical tasks.
Disclosure of Invention
The invention provides a robust acoustic scene recognition method based on local learning, which aims to solve the problem of low acoustic scene recognition accuracy under audio channel mismatch and imbalanced numbers of samples per channel.
The invention relates to a robust acoustic scene recognition method based on local learning, which is realized by the following technical scheme:
Step one, collecting sound signals of different acoustic scenes and extracting frequency-domain features, namely 40-dimensional FBank features of the sound signals, to establish a training sample set;
step two, preprocessing the characteristic data extracted in the step one:
calculating the mean value and the standard deviation of the features extracted in the step one on each dimension, and normalizing all the features by using the obtained mean value and standard deviation;
step three, channel adaptation and data expansion:
carrying out mean shift on the normalized data; then, performing data expansion by using a mixup method;
Step four, establishing a convolutional neural network model according to the local learning idea, and constructing a loss function so that the nearest distance from any sample point to a homogeneous sample point is smaller than the nearest distance from that same sample point to a heterogeneous sample point; inputting the training sample set after data expansion into the convolutional neural network model for training to obtain a trained model; the homogeneous sample points are sample points belonging to the same audio scene as the given sample point; the heterogeneous sample points are sample points belonging to a different audio scene from the given sample point;
Step five, performing frequency-domain feature extraction and data preprocessing on the sample to be recognized in turn, then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
The most prominent characteristics and remarkable beneficial effects of the invention are as follows:
the invention relates to a robust acoustic scene recognition method based on local learning, which is characterized by collecting sound signals of different acoustic scenes, extracting FBank characteristics of the sound signals to establish a training sample set, then carrying out mean shift on the training sample set to increase the robustness of a system, and generating a new sample by using a mixup method to carry out data expansion so as to solve the problem of unbalanced equipment category number. The method has the characteristics of easy realization and good reliability, and can effectively identify the acoustic scene under the conditions of audio channel mismatching and unbalanced number of samples of different channels, thereby being suitable for popularization and use; compared with the traditional deep learning method, the method has better identification effect and faster calculation speed, and in a simulation experiment, the method obtains an average accuracy rate of 55% on a small amount of equipment, and the accuracy rate is 9.4% higher than that of a general deep learning method by 45.6%.
Drawings
FIG. 1 is a schematic diagram of the calculation of the mean and standard deviation in step two of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network model established based on the idea of local learning in the present invention; in fig. 2, a filled circle represents an anchor point, an open circle represents a sample point that is the same kind as the anchor point, and an open triangle represents a sample point that is different kind from the anchor point.
Detailed Description
The first embodiment is as follows: the robust acoustic scene recognition method based on local learning provided by the embodiment specifically comprises the following steps:
Step one, collecting sound signals of different acoustic scenes at a sampling frequency of 44.1 kHz and extracting frequency-domain features: the collected audio is segmented into a sequence of frames with a frame length of 40 ms, and 40-dimensional FBank (filter bank) features are extracted from each frame of data to establish a training sample set;
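The framing and FBank extraction of step one can be sketched in numpy as follows. The 44.1 kHz sampling rate, 40 ms frame length, and 40 mel bands come from the description; the hop size, window function, and FFT length are illustrative assumptions, since the patent does not specify them.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank matrix of shape (n_mels, n_fft//2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):           # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def fbank_features(signal, sr=44100, frame_ms=40, n_mels=40, n_fft=2048):
    """Segment a signal into 40 ms frames and extract 40-dim log-mel (FBank) features."""
    frame_len = int(sr * frame_ms / 1000)       # 40 ms -> 1764 samples at 44.1 kHz
    hop = frame_len // 2                        # 50% overlap (assumed)
    fb = mel_filterbank(n_mels, n_fft, sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hamming(frame_len)
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        feats.append(np.log(fb @ power + 1e-10))  # log filterbank energies
    return np.array(feats)                      # shape: (n_frames, n_mels)
```

One second of audio then yields a matrix with one 40-dimensional FBank vector per frame.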
step two, preprocessing the characteristic data extracted in the step one:
calculating the mean value and the standard deviation of the features extracted in the step one in each dimension, as shown in fig. 1, calculating the mean value mu of all samples along the time axis direction, and calculating the standard deviation sigma by the same method; normalizing all the characteristics by using the obtained mean value and standard deviation;
step three, channel adaptation and data expansion:
After the processing of step two, the feature data of all the different channels have been normalized with the same mean and standard deviation, which reduces the differences between channels. Mean shift is therefore performed on the normalized data (the difference between the mean of the sample data acquired by the main device and the mean of the sample data acquired by the other devices is computed, and this difference is added to the training sample data with a certain probability; this is called mean shift) to increase the robustness of the system. The mixup method (a data augmentation method) is then used to generate new samples for data expansion, alleviating the imbalance in the number of samples per device type.
Step four, establishing a convolutional neural network model according to the local learning idea, and constructing a loss function so that the nearest distance from any sample point to a homogeneous sample point is smaller than the nearest distance from that same sample point to a heterogeneous sample point; inputting the training sample set after data expansion into the convolutional neural network model for training to obtain a trained model; the homogeneous sample points here are sample points belonging to the same audio scene as the given sample point (anchor point); the heterogeneous sample points are sample points belonging to a different audio scene from the given sample point (anchor point);
Step five, performing frequency-domain feature extraction and data preprocessing on the sample to be recognized in turn, then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
The second embodiment is as follows: the difference between this embodiment and the first embodiment is that the normalization of all the features by using the obtained mean and standard deviation in the second step specifically includes:
The obtained mean and standard deviation are used to normalize the feature data according to the following formula:

x_norm = (x - μ) / σ  (1)

where x_norm represents the normalized data, μ is the mean, σ is the standard deviation, and x represents the feature data.
Other steps and parameters are the same as those in the first embodiment.
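The per-dimension normalization of embodiment two can be sketched as follows; the function names are illustrative, not from the patent.

```python
import numpy as np

def fit_normalizer(features):
    """Compute the per-dimension mean and standard deviation over all frames.

    `features` has shape (n_frames, n_dims); statistics are taken along
    the time axis, one value per feature dimension.
    """
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8   # small floor guards against zero variance
    return mu, sigma

def normalize(features, mu, sigma):
    """Apply formula (1): x_norm = (x - mu) / sigma, dimension-wise."""
    return (features - mu) / sigma
```

The same (mu, sigma) fitted on the training set would also be applied to samples at recognition time, which is what makes the channel differences visible to the mean-shift step that follows.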
The third concrete implementation mode: the difference between this embodiment and the second embodiment is that the mean shift in step three is specifically:
A difference ε is added to the normalized data with probability p:

ε = μ_most - (1/N) · Σ_{i=1}^{N} μ_i  (2)

where μ_most denotes the data mean vector of the device that collected the largest number of samples; N denotes the number of devices other than that device; and μ_i denotes the data mean vector of the i-th of those other devices, i = 1, …, N. To increase the robustness of the system, the difference is not added to all the data but only with probability p, p ∈ [0, 1].
Other steps and parameters are the same as those in the second embodiment.
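The mean shift of embodiment three can be sketched as follows, assuming ε is the difference between the main device's mean vector and the average of the other devices' mean vectors, as the description states; the function name and numpy framing are illustrative.

```python
import numpy as np

def mean_shift(x, mu_most, other_mus, p=0.5, rng=None):
    """Add the device-mean difference epsilon to a sample with probability p.

    epsilon = mu_most - (1/N) * sum_i mu_i, where mu_most is the mean vector
    of the device with the most samples and mu_i are the mean vectors of the
    N other devices.
    """
    rng = rng or np.random.default_rng()
    eps = mu_most - np.mean(other_mus, axis=0)
    if rng.random() < p:          # shift only with probability p, not always
        return x + eps
    return x
```

Applied during training, this randomly pushes main-device samples toward the statistics of the minority devices, which is the source of the robustness claim.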
The fourth concrete implementation mode: the difference between this embodiment and the first, second or third embodiment is that the data expansion using the mixup method in step three is specifically:
The mixup method generates a new sample by combining two known samples. One sample (x_j, y_j) is randomly selected from the data collected by the device that collected the largest number of samples, and another sample (x_i, y_i) is randomly selected from the data collected by the other devices; the two samples are combined to generate a new sample (x̃, ỹ), whose feature data x̃ and corresponding label ỹ are computed as follows:

x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j  (3)

where λ denotes the mixing coefficient, λ ∈ [0, 1]; x_i and y_i denote the feature data and label of sample (x_i, y_i), and x_j and y_j denote the feature data and label of sample (x_j, y_j).
Other steps and parameters are the same as those in the first, second or third embodiment.
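The mixup combination of embodiment four can be sketched as follows, assuming one-hot label vectors so that labels can be mixed linearly:

```python
import numpy as np

def mixup(xi, yi, xj, yj, lam=0.1):
    """Generate a new sample from two known samples.

    Implements formula (3): x_new = lam*xi + (1-lam)*xj, and the same
    convex combination for the (one-hot) labels.
    """
    x_new = lam * xi + (1 - lam) * xj
    y_new = lam * yi + (1 - lam) * yj
    return x_new, y_new
```

Here xi would come from one of the minority devices and xj from the device with the most samples, so the synthetic samples enlarge the under-represented side of the data.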
The fifth concrete implementation mode: the present embodiment is described with reference to fig. 2, and is different from the fourth embodiment in that the loss function in step four is specifically:
L=max(0,dap-dan+α) (4)
As shown in fig. 2, for any anchor point (filled circle) the condition d_ap + α < d_an needs to be satisfied, which gives the loss function L above;

where the anchor point is a given sample point; d_ap denotes the nearest Euclidean distance from the sample point (anchor point) to a homogeneous sample point, d_an denotes the nearest Euclidean distance from the sample point (anchor point) to a heterogeneous sample point, and α denotes the minimum margin between d_ap and d_an.
Other steps and parameters are the same as those in the fourth embodiment.
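The local-learning loss of embodiment five can be sketched for a single anchor within a batch of embeddings. The batch-based formulation and function name are assumptions for illustration; the patent itself defines only the per-anchor loss L = max(0, d_ap − d_an + α).

```python
import numpy as np

def local_loss(embeddings, labels, anchor_idx, alpha=1.5):
    """L = max(0, d_ap - d_an + alpha) for one anchor point.

    d_ap: nearest Euclidean distance to a homogeneous (same-scene) point,
    d_an: nearest Euclidean distance to a heterogeneous point.
    Assumes the batch contains at least one other same-class point and
    at least one different-class point.
    """
    anchor = embeddings[anchor_idx]
    dists = np.linalg.norm(embeddings - anchor, axis=1)
    same = labels == labels[anchor_idx]
    diff = ~same
    same[anchor_idx] = False           # the anchor is not its own neighbour
    d_ap = dists[same].min()
    d_an = dists[diff].min()
    return max(0.0, d_ap - d_an + alpha)
```

When the nearest homogeneous point is already more than α closer than the nearest heterogeneous point, the loss is zero, matching the geometry of fig. 2.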
Examples
The following examples were used to demonstrate the beneficial effects of the present invention:
The method is compared with a general deep learning method on the acoustic scene recognition data set of the international public DCASE2018 Task1-Subtask B; the procedure is as follows:
step one, segmenting audio in an international public data set DCASE2018 Task1-Subtask B acoustic scene recognition data set into a frame sequence, wherein the frame length is 40ms, extracting 40-dimensional FBank features from each frame of data, and establishing a training sample set by using the extracted FBank features;
Step two, calculating the mean μ and the standard deviation σ on each dimension of the features extracted in step one, and normalizing all the features with the obtained mean and standard deviation; the normalization expression is:

x_norm = (x - μ) / σ  (1)
thirdly, mean shift is carried out on the normalized data:
The difference ε is added to the normalized data with probability p = 0.5:

ε = μ_most - (1/N) · Σ_{i=1}^{N} μ_i  (2)
New samples (x̃, ỹ) are then generated with the mixup method to perform data expansion; the feature data x̃ and corresponding label ỹ of a new sample are computed as follows:

x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j  (3)

where the mixing coefficient λ is set to 0.1;
step four, establishing a convolutional neural network model according to the local learning idea, and constructing a loss function:
L=max(0,dap-dan+α) (4)
α is set to 1.5;
inputting the training sample set subjected to data expansion into the convolutional neural network model for training to obtain a trained model;
Step five, performing frequency-domain feature extraction and data preprocessing on the sample to be recognized in turn, then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
Compared with the recognition results obtained by a general deep learning method, the method of the invention achieves an average accuracy of 55% on the verification set of the minority devices (all devices except the one that collected the largest number of samples), 9.4 percentage points higher than the 45.6% of a general deep learning method. The method can therefore effectively recognize acoustic scenes under audio channel mismatch and imbalanced numbers of samples per channel.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.
Claims (5)
1. A robust acoustic scene recognition method based on local learning is characterized by specifically comprising the following steps:
Step one, collecting sound signals of different acoustic scenes and extracting frequency-domain features, namely 40-dimensional FBank features of the sound signals, to establish a training sample set;
step two, preprocessing the characteristic data extracted in the step one:
calculating the mean value and the standard deviation of the features extracted in the step one on each dimension, and normalizing all the features by using the obtained mean value and standard deviation;
step three, channel adaptation and data expansion:
carrying out mean shift on the normalized data; then, performing data expansion by using a mixup method;
Step four, establishing a convolutional neural network model according to the local learning idea, and constructing a loss function so that the nearest distance from any sample point to a homogeneous sample point is smaller than the nearest distance from that same sample point to a heterogeneous sample point; inputting the training sample set after data expansion into the convolutional neural network model for training to obtain a trained model; the homogeneous sample points are sample points belonging to the same audio scene as the given sample point; the heterogeneous sample points are sample points belonging to a different audio scene from the given sample point;
Step five, performing frequency-domain feature extraction and data preprocessing on the sample to be recognized in turn, then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
2. The robust acoustic scene recognition method based on local learning according to claim 1, wherein the normalization of all features by using the obtained mean and standard deviation in the second step is specifically:
the feature data is normalized as follows:

x_norm = (x - μ) / σ  (1)

where x_norm represents the normalized data, μ is the mean, σ is the standard deviation, and x represents the feature data.
3. The robust acoustic scene recognition method based on local learning according to claim 2, wherein the mean shift in step three specifically includes:
a difference ε is added to the normalized data with probability p:

ε = μ_most - (1/N) · Σ_{i=1}^{N} μ_i  (2)

where μ_most denotes the data mean vector of the device that collected the largest number of samples; N denotes the number of devices other than that device; and μ_i denotes the data mean vector of the i-th of those other devices, i = 1, …, N.
4. The robust acoustic scene recognition method based on local learning according to claim 1, 2 or 3, wherein the data expansion using the mixup method in step three specifically comprises:
one sample (x_j, y_j) is randomly selected from the data collected by the device that collected the largest number of samples, and another sample (x_i, y_i) is randomly selected from the data collected by the other devices; the two samples are combined to generate a new sample (x̃, ỹ), whose feature data x̃ and corresponding label ỹ are computed as follows:

x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j  (3)

where λ denotes the mixing coefficient, λ ∈ [0, 1]; x_i and y_i denote the feature data and label of sample (x_i, y_i), and x_j and y_j denote the feature data and label of sample (x_j, y_j).
5. The robust acoustic scene recognition method based on local learning according to claim 4, wherein the loss function in step four is specifically:
L=max(0,dap-dan+α) (4)
where d_ap denotes the nearest Euclidean distance from a sample point to a homogeneous sample point, d_an denotes the nearest Euclidean distance from the sample point to a heterogeneous sample point, and α denotes the minimum margin between d_ap and d_an.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910464699.XA CN110176250B (en) | 2019-05-30 | 2019-05-30 | Robust acoustic scene recognition method based on local learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110176250A CN110176250A (en) | 2019-08-27 |
CN110176250B true CN110176250B (en) | 2021-05-07 |
Family
ID=67696792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910464699.XA Active CN110176250B (en) | 2019-05-30 | 2019-05-30 | Robust acoustic scene recognition method based on local learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110176250B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110751183A (en) * | 2019-09-24 | 2020-02-04 | 东软集团股份有限公司 | Image data classification model generation method, image data classification method and device |
CN110852200B (en) * | 2019-10-28 | 2023-05-12 | 华中科技大学 | Non-contact human body action detection method |
CN112489678B (en) * | 2020-11-13 | 2023-12-05 | 深圳市云网万店科技有限公司 | Scene recognition method and device based on channel characteristics |
CN112990443B (en) * | 2021-05-06 | 2021-08-27 | 北京芯盾时代科技有限公司 | Neural network evaluation method and device, electronic device, and storage medium |
CN113793624B (en) * | 2021-06-11 | 2023-11-17 | 上海师范大学 | Acoustic scene classification method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106952644A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of complex audio segmentation clustering method based on bottleneck characteristic |
CN107203777A (en) * | 2017-04-19 | 2017-09-26 | 北京协同创新研究院 | audio scene classification method and device |
CN108615532A (en) * | 2018-05-03 | 2018-10-02 | 张晓雷 | A kind of sorting technique and device applied to sound field scape |
US20180336889A1 (en) * | 2017-05-19 | 2018-11-22 | Baidu Online Network Technology (Beijing) Co., Ltd . | Method and Apparatus of Building Acoustic Feature Extracting Model, and Acoustic Feature Extracting Method and Apparatus |
CN109002529A (en) * | 2018-07-17 | 2018-12-14 | 厦门美图之家科技有限公司 | Audio search method and device |
CN109061558A (en) * | 2018-06-21 | 2018-12-21 | 桂林电子科技大学 | A kind of sound collision detection and sound localization method based on deep learning |
CN109448703A (en) * | 2018-11-14 | 2019-03-08 | 山东师范大学 | In conjunction with the audio scene recognition method and system of deep neural network and topic model |
CN109558512A (en) * | 2019-01-24 | 2019-04-02 | 广州荔支网络技术有限公司 | A kind of personalized recommendation method based on audio, device and mobile terminal |
Non-Patent Citations (7)
Title |
---|
Acoustic scene classification: an overview of DCASE 2017 challenge entries;Mesaros A 等;《2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC)》;20181105;全文 * |
Facenet: A unified embedding for face;Schroff F 等;《Proceedings of the IEEE conference on computer》;20151015;全文 * |
Feature enhancement for robust acoustic scene classification with device mismatch;Song H 等;《Tech. Rep., DCASE2019 Challenge》;20191026;全文 * |
mixup: Beyond empirical risk;Zhang H 等;《arxiv.org/abs/1710.09412》;20180427;全文 * |
Semi-supervised triplet loss based learning of ambient audio embeddings;Turpault N 等;《ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》;20190417;全文 * |
基于数据扩充和三元组损失的不匹配声学场景的鲁棒识别方法;杨皓;《中国优秀硕士学位论文全文数据库 基础科学辑》;20200215;全文 * |
复杂场景下的音频自动标注方法;张立赛;《中国优秀硕士学位论文全文数据库 信息科技辑》;20190115;全文 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110176250B (en) | Robust acoustic scene recognition method based on local learning | |
Chen et al. | Deep attractor network for single-microphone speaker separation | |
Wang et al. | Deep extractor network for target speaker recovery from single channel speech mixtures | |
CN110600018B (en) | Voice recognition method and device and neural network training method and device | |
CN107393526B (en) | Voice silence detection method, device, computer equipment and storage medium | |
CN106297776B (en) | A kind of voice keyword retrieval method based on audio template | |
CN101980336B (en) | Hidden Markov model-based vehicle sound identification method | |
KR100745976B1 (en) | Method and apparatus for classifying voice and non-voice using sound model | |
CN108922513A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN107393527A (en) | The determination methods of speaker's number | |
CN109346084A (en) | Method for distinguishing speek person based on depth storehouse autoencoder network | |
CN113111786B (en) | Underwater target identification method based on small sample training diagram convolutional network | |
CN113763965A (en) | Speaker identification method with multiple attention characteristics fused | |
CN110544482A (en) | single-channel voice separation system | |
CN114234061B (en) | Intelligent discrimination method for water leakage sound of pressurized operation water supply pipeline based on neural network | |
CN108573711A (en) | A kind of single microphone speech separating method based on NMF algorithms | |
CN117789699B (en) | Speech recognition method, device, electronic equipment and computer readable storage medium | |
CN113516987B (en) | Speaker recognition method, speaker recognition device, storage medium and equipment | |
US11776532B2 (en) | Audio processing apparatus and method for audio scene classification | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
CN107564546A (en) | A kind of sound end detecting method based on positional information | |
CN110060699A (en) | A kind of single channel speech separating method based on the sparse expansion of depth | |
CN116383719A (en) | MGF radio frequency fingerprint identification method for LFM radar | |
CN110807370A (en) | Multimode-based conference speaker identity noninductive confirmation method | |
Zhang et al. | End-to-end overlapped speech detection and speaker counting with raw waveform |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |