CN110176250B - Robust acoustic scene recognition method based on local learning - Google Patents

Robust acoustic scene recognition method based on local learning

Info

Publication number
CN110176250B
Authority
CN
China
Prior art keywords
data
sample
acoustic scene
samples
scene recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910464699.XA
Other languages
Chinese (zh)
Other versions
CN110176250A (en)
Inventor
韩纪庆
杨皓
郑贵滨
郑铁然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201910464699.XA priority Critical patent/CN110176250B/en
Publication of CN110176250A publication Critical patent/CN110176250A/en
Application granted
Publication of CN110176250B publication Critical patent/CN110176250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: characterised by the type of extracted parameters
    • G10L25/18: the extracted parameters being spectral information of each sub-band
    • G10L25/27: characterised by the analysis technique
    • G10L25/30: using neural networks
    • G10L25/48: specially adapted for particular use
    • G10L25/51: for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a robust acoustic scene recognition method based on local learning, belonging to the technical field of sound signal processing. First, sound signals of different acoustic scenes are collected and frequency-domain features are extracted; the extracted feature data are then preprocessed. Next, a mean shift is applied to the normalized data, and the data are expanded with the mixup method. A convolutional neural network model is then built according to the local-learning idea, and the expanded training sample set is input into the model for training to obtain a trained model. Finally, frequency-domain feature extraction and data preprocessing are applied in turn to a sample to be recognized, which is then input into the trained model for recognition, yielding the acoustic scene recognition result. The invention addresses the low accuracy of acoustic scene recognition under audio channel mismatch and unbalanced numbers of samples across channels, and can be applied to acoustic scene recognition with multiple channels and unbalanced channel sample counts.

Description

Robust acoustic scene recognition method based on local learning
Technical Field
The invention relates to an acoustic scene recognition method, and belongs to the technical field of sound signal processing.
Background
Acoustic scene recognition can be widely applied in fields such as robots and driverless vehicles, which need to sense the surrounding sound environment effectively. In the real world, however, there is usually more than one sound acquisition device, and because different devices have different channel characteristics, the signals they acquire are generally not identical. How to automatically and accurately classify the scene of sound input from different channels, and thereby achieve robust acoustic scene recognition, has become an urgent and challenging research topic.
To achieve robust acoustic scene recognition, prior knowledge about the data must be fully exploited. At present, most methods address acoustic scene recognition on clean speech or on a single channel, such as acoustic scene recognition based on convolutional neural networks, on hidden Markov models, or on recurrent neural networks. These techniques neither compensate for channel mismatch in the audio data nor adjust for imbalance in the amount of data per device type. If applied in a real environment with multiple channels and unbalanced numbers of samples per channel, their acoustic scene recognition accuracy is therefore low and cannot meet the requirements of practical tasks.
Disclosure of Invention
The invention provides a robust acoustic scene recognition method based on local learning, which aims to solve the problem of low acoustic scene recognition accuracy under audio channel mismatch and unbalanced numbers of samples across channels.
The invention relates to a robust acoustic scene recognition method based on local learning, which is realized by the following technical scheme:
step one, collecting sound signals of different acoustic scenes and extracting frequency-domain features: extracting 40-dimensional FBank features of the sound signals and establishing a training sample set;
step two, preprocessing the feature data extracted in step one:
calculating the mean and the standard deviation of the features extracted in step one in each dimension, and normalizing all the features with the obtained mean and standard deviation;
step three, channel adaptation and data expansion:
carrying out mean shift on the normalized data; then, performing data expansion by using a mixup method;
step four, establishing a convolutional neural network model according to the local-learning idea, and constructing a loss function so that the distance from any sample point to its nearest homogeneous sample point is smaller than the distance from the same sample point to its nearest heterogeneous sample point; inputting the expanded training sample set into the convolutional neural network model for training to obtain a trained model; homogeneous sample points are sample points belonging to the same acoustic scene as the given sample point, and heterogeneous sample points are sample points belonging to a different acoustic scene from the given sample point;
step five, sequentially performing frequency-domain feature extraction and data preprocessing on the sample to be recognized, and then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
The most prominent characteristics and remarkable beneficial effects of the invention are as follows:
the invention relates to a robust acoustic scene recognition method based on local learning, which is characterized by collecting sound signals of different acoustic scenes, extracting FBank characteristics of the sound signals to establish a training sample set, then carrying out mean shift on the training sample set to increase the robustness of a system, and generating a new sample by using a mixup method to carry out data expansion so as to solve the problem of unbalanced equipment category number. The method has the characteristics of easy realization and good reliability, and can effectively identify the acoustic scene under the conditions of audio channel mismatching and unbalanced number of samples of different channels, thereby being suitable for popularization and use; compared with the traditional deep learning method, the method has better identification effect and faster calculation speed, and in a simulation experiment, the method obtains an average accuracy rate of 55% on a small amount of equipment, and the accuracy rate is 9.4% higher than that of a general deep learning method by 45.6%.
Drawings
FIG. 1 is a schematic diagram of the calculation of the mean and standard deviation in step two of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network model established based on the idea of local learning in the present invention; in fig. 2, a filled circle represents an anchor point, an open circle represents a sample point that is the same kind as the anchor point, and an open triangle represents a sample point that is different kind from the anchor point.
Detailed Description
The first embodiment is as follows: the robust acoustic scene recognition method based on local learning provided by the embodiment specifically comprises the following steps:
step one, collecting sound signals of different acoustic scenes at a sampling frequency of 44.1 kHz and extracting frequency-domain features: the collected audio is segmented into a sequence of frames with a frame length of 40 ms, and 40-dimensional FBank (filter bank) features are extracted from each frame of data to establish a training sample set;
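As an illustration of step one, the following is a minimal sketch of the 40-dimensional FBank extraction, assuming the librosa library; the 20 ms frame shift and the log of the mel energies are assumptions, since the embodiment specifies only the 44.1 kHz sampling rate, the 40 ms frame length, and the 40 feature dimensions.

```python
import numpy as np
import librosa

def extract_fbank(wav_path, sr=44100, n_mels=40, frame_ms=40, hop_ms=20):
    """Return a (frames, 40) matrix of log mel filter-bank (FBank) features."""
    y, _ = librosa.load(wav_path, sr=sr)          # load and resample to 44.1 kHz
    n_fft = int(sr * frame_ms / 1000)             # 40 ms analysis window (1764 samples)
    hop = int(sr * hop_ms / 1000)                 # assumed 20 ms frame shift
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels)
    return np.log(mel + 1e-8).T                   # log energies, one row per frame
```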
step two, preprocessing the characteristic data extracted in the step one:
calculating the mean and the standard deviation of the features extracted in step one in each dimension, as shown in fig. 1: the mean μ of all samples is calculated along the time-axis direction, and the standard deviation σ is calculated in the same way; all features are then normalized with the obtained mean and standard deviation;
step three, channel adaptation and data expansion:
after the processing of step two, the feature data of all the different channels have been normalized with the same mean and standard deviation, which reduces the differences between channels. A mean shift is therefore applied to the normalized data to increase the robustness of the system: the difference between the mean of the sample data acquired by the main device and the mean of the sample data acquired by the other devices is computed, and this difference is added to the training sample data with a certain probability to form the processed training data; this operation is called the mean shift. A mixup method (a data augmentation method) is then used to generate new samples for data expansion, addressing the imbalance in the number of samples per device type.
Step four, establishing a convolutional neural network model according to the local-learning idea, and constructing a loss function so that the distance from any sample point to its nearest homogeneous sample point is smaller than the distance from the same sample point to its nearest heterogeneous sample point; inputting the expanded training sample set into the convolutional neural network model for training to obtain a trained model; the homogeneous sample points here are sample points belonging to the same acoustic scene as the given sample point (the anchor), and the heterogeneous sample points are sample points belonging to a different acoustic scene from the given sample point (the anchor);
step five, sequentially performing frequency-domain feature extraction and data preprocessing on the sample to be recognized, and then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
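The embodiment does not spell out the decision rule of step five beyond inputting the sample into the trained model. Since the model is trained with a distance-based loss, one plausible rule, given here only as a minimal sketch under that assumption, is nearest scene centroid in the embedding space; recognize and centroids are hypothetical names, with each centroid precomputed as the mean embedding of one scene's training samples.

```python
import torch

def recognize(model, x, centroids):
    """x: preprocessed features of one sample, shaped for the model; centroids: (C, D)."""
    model.eval()
    with torch.no_grad():
        emb = model(x)                      # (1, D) embedding of the sample
        d = torch.cdist(emb, centroids)     # distance to each scene centroid (assumed rule)
    return int(d.argmin(dim=1).item())      # index of the predicted acoustic scene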
The second embodiment is as follows: the difference between this embodiment and the first embodiment is that the normalization of all the features by using the obtained mean and standard deviation in the second step specifically includes:
the resulting mean and standard deviation were used to normalize the feature data according to the following formula:
x_norm = (x - μ) / σ    (1)
wherein x_norm represents the normalized data, μ is the mean, and σ is the standard deviation; x represents the feature data.
Other steps and parameters are the same as those in the first embodiment.
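A small sketch of the normalization of formula (1) above, with μ and σ computed per feature dimension along the time axis over all training frames as described in step two; the small constant added to σ is an assumed guard against division by zero.

```python
import numpy as np

def normalize(features, mu=None, sigma=None):
    """features: (frames, dims) FBank data; fit mu/sigma on training data, reuse at test time."""
    if mu is None:
        mu = features.mean(axis=0)               # per-dimension mean along the time axis
        sigma = features.std(axis=0) + 1e-8      # per-dimension std (assumed guard against /0)
    return (features - mu) / sigma, mu, sigma    # formula (1), plus the reusable statistics
```

At recognition time, the same training-set μ and σ would be passed back in, so that data from every channel is normalized with identical statistics.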
The third concrete implementation mode: the difference between this embodiment and the second embodiment is that the mean shift in step three is specifically:
adding a difference value epsilon to the normalized data by a probability p:
ε = μ_most - (1/N) Σ_{i=1}^{N} μ_i    (2)
wherein μ_most represents the data mean vector of the device that collected the largest number of samples; N represents the number of devices other than that device, and μ_i represents the data mean vector of the i-th of these other devices, i = 1, …, N. To increase the robustness of the system, the difference is not added to all the data; instead it is added with probability p, p ∈ [0,1].
Other steps and parameters are the same as those in the second embodiment.
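A hedged sketch of the mean shift of formula (2): the offset ε between the main device's mean vector and the average of the other devices' mean vectors is added to a training sample with probability p. The function name and the per-sample application are assumptions about details the embodiment leaves open.

```python
import numpy as np

def mean_shift(x, mu_most, other_mus, p=0.5, rng=None):
    """x: (frames, dims) features of one sample; other_mus: (N, dims) mean vectors."""
    if rng is None:
        rng = np.random.default_rng()
    # formula (2): main-device mean minus the average of the other devices' means
    epsilon = mu_most - np.mean(other_mus, axis=0)
    return x + epsilon if rng.random() < p else x   # shift only with probability p
```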
The fourth concrete implementation mode: the difference between this embodiment and the first, second or third embodiment is that the data expansion using the mixup method in step three is specifically:
the mixup method generates a new sample by combining two known samples: one sample (x_j, y_j) is randomly selected from the data collected by the device that collected the largest number of samples, another sample (x_i, y_i) is randomly selected from the data collected by the other devices, and the two samples are combined to generate a new sample
(x̃, ỹ); the feature data x̃ and the corresponding label ỹ of the new sample are calculated as follows:
x̃ = λx_i + (1 - λ)x_j,  ỹ = λy_i + (1 - λ)y_j    (3)
wherein λ represents the mixing coefficient, λ ∈ [0,1]; x_i and y_i represent the feature data and label of the sample (x_i, y_i), and x_j and y_j represent the feature data and label of the sample (x_j, y_j).
Other steps and parameters are the same as those in the first, second or third embodiment.
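A minimal sketch of the mixup expansion of formula (3), assuming the labels y_i and y_j are one-hot vectors so that the mixed label is a soft label; x_j comes from the majority device and x_i from a minority device, as described above.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam=0.1):
    """Blend two (features, one-hot label) pairs into one new sample, formula (3)."""
    x_new = lam * x_i + (1.0 - lam) * x_j    # mixed feature data
    y_new = lam * y_i + (1.0 - lam) * y_j    # mixed (soft) label
    return x_new, y_new
```

With λ = 0.1, as in the example below, the new sample lies close to the majority-device sample x_j while carrying a small contribution from the minority-device sample x_i.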
The fifth concrete implementation mode: the present embodiment is described with reference to fig. 2, and is different from the fourth embodiment in that the loss function in step four is specifically:
L = max(0, d_ap - d_an + α)    (4)
as shown in fig. 2, any anchor point (filled circle) must satisfy d_ap + α < d_an, which yields the loss function L above;
wherein the anchor point is a given sample point; d_ap represents the nearest Euclidean distance from the sample point (anchor) to a homogeneous sample point, d_an represents the nearest Euclidean distance from the sample point (anchor) to a heterogeneous sample point, and α represents the minimum margin between d_ap and d_an.
Other steps and parameters are the same as those in the fourth embodiment.
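A PyTorch sketch of the loss of formula (4), under the assumption that d_ap and d_an are computed batch-wise as the nearest same-scene and nearest different-scene Euclidean distances for each anchor; it also assumes hard integer scene labels (how the soft mixup labels of step three interact with this distance-based loss is not specified) and that every batch contains at least two samples per scene and more than one scene.

```python
import torch

def local_triplet_loss(emb, labels, alpha=1.5):
    """emb: (B, D) network embeddings; labels: (B,) integer scene ids; alpha: margin."""
    dist = torch.cdist(emb, emb)                        # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # (B, B) same-scene mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    inf = float("inf")
    d_ap = dist.masked_fill(~same | eye, inf).min(dim=1).values  # nearest homogeneous point
    d_an = dist.masked_fill(same, inf).min(dim=1).values         # nearest heterogeneous point
    return torch.relu(d_ap - d_an + alpha).mean()       # formula (4), averaged over anchors
```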
Examples
The following examples were used to demonstrate the beneficial effects of the present invention:
The method is compared with a general deep learning method on the international public DCASE2018 Task1-Subtask B acoustic scene recognition data set, as follows:
step one, segmenting the audio of the international public DCASE2018 Task1-Subtask B acoustic scene recognition data set into a sequence of frames with a frame length of 40 ms, extracting 40-dimensional FBank features from each frame of data, and establishing a training sample set from the extracted FBank features;
step two, calculating the mean μ and the standard deviation σ in each dimension of the features extracted in step one, and normalizing all the features with the obtained mean and standard deviation; the normalization expression is:
x_norm = (x - μ) / σ    (1)
thirdly, mean shift is carried out on the normalized data:
the normalized data is added with the difference epsilon with the probability p being 0.5:
ε = μ_most - (1/N) Σ_{i=1}^{N} μ_i    (2)
generating new sample by using mixup method
(x̃, ỹ) to perform data expansion; the feature data x̃ and the corresponding label ỹ of a new sample are calculated as follows:
x̃ = λx_i + (1 - λ)x_j,  ỹ = λy_i + (1 - λ)y_j    (3)
here, the mixing coefficient λ is 0.1;
step four, establishing a convolutional neural network model according to the local learning idea, and constructing a loss function:
L = max(0, d_ap - d_an + α)    (4)
α is set to 1.5;
inputting the training sample set subjected to data expansion into the convolutional neural network model for training to obtain a trained model;
and fifthly, sequentially carrying out frequency domain feature extraction and data preprocessing on the sample to be recognized, and then inputting the sample to be recognized into the trained model for recognition to obtain an acoustic scene recognition result.
Compared with the recognition results of a general deep learning method, the method of the invention achieves an average accuracy of 55% on the validation set of the devices with few samples (i.e., all devices except the one that collected the largest number of samples), 9.4 percentage points higher than the 45.6% accuracy of the general deep learning method. The method can therefore effectively recognize acoustic scenes under audio channel mismatch and unbalanced numbers of samples across channels.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (5)

1. A robust acoustic scene recognition method based on local learning is characterized by specifically comprising the following steps:
step one, collecting sound signals of different acoustic scenes and extracting frequency-domain features: extracting 40-dimensional FBank features of the sound signals and establishing a training sample set;
step two, preprocessing the feature data extracted in step one:
calculating the mean and the standard deviation of the features extracted in step one in each dimension, and normalizing all the features with the obtained mean and standard deviation;
step three, channel adaptation and data expansion:
carrying out mean shift on the normalized data; then, performing data expansion by using a mixup method;
step four, establishing a convolutional neural network model according to the local-learning idea, and constructing a loss function so that the distance from any sample point to its nearest homogeneous sample point is smaller than the distance from the same sample point to its nearest heterogeneous sample point; inputting the expanded training sample set into the convolutional neural network model for training to obtain a trained model; homogeneous sample points are sample points belonging to the same acoustic scene as the given sample point, and heterogeneous sample points are sample points belonging to a different acoustic scene from the given sample point;
step five, sequentially performing frequency-domain feature extraction and data preprocessing on the sample to be recognized, and then inputting it into the trained model for recognition to obtain the acoustic scene recognition result.
2. The robust acoustic scene recognition method based on local learning according to claim 1, wherein the normalization of all features by using the obtained mean and standard deviation in the second step is specifically:
the feature data is normalized as follows:
x_norm = (x - μ) / σ    (1)
wherein x_norm represents the normalized data, μ is the mean, and σ is the standard deviation; x represents the feature data.
3. The robust acoustic scene recognition method based on local learning according to claim 2, wherein the mean shift in step three specifically includes:
adding a difference value ε to the normalized data with probability p:
ε = μ_most - (1/N) Σ_{i=1}^{N} μ_i    (2)
wherein μ_most represents the data mean vector of the device that collected the largest number of samples; N represents the number of devices other than that device, and μ_i represents the data mean vector of the i-th of these other devices, i = 1, …, N.
4. The robust acoustic scene recognition method based on local learning according to claim 1, 2 or 3, wherein the data expansion using the mixup method in step three specifically comprises:
randomly selecting a sample (x_j, y_j) from the data collected by the device that collected the largest number of samples, randomly selecting another sample (x_i, y_i) from the data collected by the other devices, and combining the two samples to generate a new sample (x̃, ỹ), whose feature data x̃ and corresponding label ỹ are calculated as follows:
x̃ = λx_i + (1 - λ)x_j,  ỹ = λy_i + (1 - λ)y_j    (3)
wherein λ represents the mixing coefficient, λ ∈ [0,1]; x_i and y_i represent the feature data and label of the sample (x_i, y_i), and x_j and y_j represent the feature data and label of the sample (x_j, y_j).
5. The robust acoustic scene recognition method based on local learning according to claim 4, wherein the loss function in step four is specifically:
L = max(0, d_ap - d_an + α)    (4)
wherein d_ap represents the nearest Euclidean distance from a sample point to a homogeneous sample point, d_an represents the nearest Euclidean distance from a sample point to a heterogeneous sample point, and α represents the minimum margin between d_ap and d_an.
CN201910464699.XA 2019-05-30 2019-05-30 Robust acoustic scene recognition method based on local learning Active CN110176250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910464699.XA CN110176250B (en) 2019-05-30 2019-05-30 Robust acoustic scene recognition method based on local learning


Publications (2)

Publication Number Publication Date
CN110176250A (en) 2019-08-27
CN110176250B (en) 2021-05-07

Family

ID=67696792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910464699.XA Active CN110176250B (en) 2019-05-30 2019-05-30 Robust acoustic scene recognition method based on local learning

Country Status (1)

Country Link
CN (1) CN110176250B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751183A (en) * 2019-09-24 2020-02-04 东软集团股份有限公司 Image data classification model generation method, image data classification method and device
CN110852200B (en) * 2019-10-28 2023-05-12 华中科技大学 Non-contact human body action detection method
CN112489678B (en) * 2020-11-13 2023-12-05 深圳市云网万店科技有限公司 Scene recognition method and device based on channel characteristics
CN112990443B (en) * 2021-05-06 2021-08-27 北京芯盾时代科技有限公司 Neural network evaluation method and device, electronic device, and storage medium
CN113793624B (en) * 2021-06-11 2023-11-17 上海师范大学 Acoustic scene classification method


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952644A * 2017-02-24 2017-07-14 华南理工大学 Complex audio segmentation and clustering method based on bottleneck features
CN107203777A * 2017-04-19 2017-09-26 北京协同创新研究院 Audio scene classification method and device
US20180336889A1 * 2017-05-19 2018-11-22 Baidu Online Network Technology (Beijing) Co., Ltd. Method and Apparatus of Building Acoustic Feature Extracting Model, and Acoustic Feature Extracting Method and Apparatus
CN108615532A * 2018-05-03 2018-10-02 张晓雷 Classification method and device applied to acoustic scenes
CN109061558A * 2018-06-21 2018-12-21 桂林电子科技大学 Sound collision detection and sound source localization method based on deep learning
CN109002529A * 2018-07-17 2018-12-14 厦门美图之家科技有限公司 Audio search method and device
CN109448703A * 2018-11-14 2019-03-08 山东师范大学 Audio scene recognition method and system combining deep neural network and topic model
CN109558512A * 2019-01-24 2019-04-02 广州荔支网络技术有限公司 Audio-based personalized recommendation method and device, and mobile terminal

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Acoustic scene classification: an overview of DCASE 2017 challenge entries; Mesaros A et al.; 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC); 2018-11-05 *
FaceNet: a unified embedding for face recognition and clustering; Schroff F et al.; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015-10-15 *
Feature enhancement for robust acoustic scene classification with device mismatch; Song H et al.; Tech. Rep., DCASE2019 Challenge; 2019-10-26 *
mixup: beyond empirical risk minimization; Zhang H et al.; arxiv.org/abs/1710.09412; 2018-04-27 *
Semi-supervised triplet loss based learning of ambient audio embeddings; Turpault N et al.; ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2019-04-17 *
Robust recognition method for mismatched acoustic scenes based on data augmentation and triplet loss; 杨皓; China Master's Theses Full-text Database, Basic Sciences; 2020-02-15 *
Automatic audio tagging method for complex scenes; 张立赛; China Master's Theses Full-text Database, Information Science and Technology; 2019-01-15 *


Similar Documents

Publication Publication Date Title
CN110176250B (en) Robust acoustic scene recognition method based on local learning
Chen et al. Deep attractor network for single-microphone speaker separation
Wang et al. Deep extractor network for target speaker recovery from single channel speech mixtures
CN110600018B (en) Voice recognition method and device and neural network training method and device
CN107393526B (en) Voice silence detection method, device, computer equipment and storage medium
CN106297776B (en) A kind of voice keyword retrieval method based on audio template
CN101980336B (en) Hidden Markov model-based vehicle sound identification method
KR100745976B1 (en) Method and apparatus for classifying voice and non-voice using sound model
CN108922513A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN107393527A Method for determining the number of speakers
CN109346084A (en) Method for distinguishing speek person based on depth storehouse autoencoder network
CN113111786B (en) Underwater target identification method based on small sample training diagram convolutional network
CN113763965A (en) Speaker identification method with multiple attention characteristics fused
CN110544482A (en) single-channel voice separation system
CN114234061B (en) Intelligent discrimination method for water leakage sound of pressurized operation water supply pipeline based on neural network
CN108573711A (en) A kind of single microphone speech separating method based on NMF algorithms
CN117789699B (en) Speech recognition method, device, electronic equipment and computer readable storage medium
CN113516987B (en) Speaker recognition method, speaker recognition device, storage medium and equipment
US11776532B2 (en) Audio processing apparatus and method for audio scene classification
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
CN107564546A (en) A kind of sound end detecting method based on positional information
CN110060699A (en) A kind of single channel speech separating method based on the sparse expansion of depth
CN116383719A (en) MGF radio frequency fingerprint identification method for LFM radar
CN110807370A (en) Multimode-based conference speaker identity noninductive confirmation method
Zhang et al. End-to-end overlapped speech detection and speaker counting with raw waveform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant