CN110070856A - Audio scene recognition method based on harmonic-percussive source separation data augmentation - Google Patents

Audio scene recognition method based on harmonic-percussive source separation data augmentation

Info

Publication number
CN110070856A
CN110070856A (application CN201910233185.3A)
Authority
CN
China
Prior art keywords
audio
harmonic
training set
augmentation
percussive source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910233185.3A
Other languages
Chinese (zh)
Inventor
张涛
刘赣俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201910233185.3A
Publication of CN110070856A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/02: Speech recognition; feature extraction for speech recognition; selection of recognition unit
    • G10L15/04: Speech recognition; segmentation; word boundary detection
    • G10L15/063: Speech recognition; training, e.g. creation of reference templates or adaptation to the characteristics of the speaker's voice
    • G10L15/083: Speech recognition; speech classification or search; recognition networks
    • G10L15/26: Speech recognition; speech-to-text systems
    • G10L19/02: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders, using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the analysis technique
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

An audio scene recognition method based on harmonic-percussive source separation data augmentation, comprising: separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that two clips are generated from one clip and data augmentation is realized; feeding the two clips obtained by the harmonic-percussive separation into the scene recognition system as the training set and extracting audio features; and training the classifier network with the training-set audio features as its input, then recognizing audio scenes on the test set from the classifier network's output. With an identical classifier model, the present invention substantially improves classification accuracy in audio scene recognition. Through harmonic-percussive source separation data augmentation, the system obtains a larger and more diverse training set; training the classifier network on this set markedly improves its learning and generalization abilities.

Description

Audio scene recognition method based on harmonic-percussive source separation data augmentation
Technical field
The present invention relates to audio scene recognition and classification methods, and more particularly to an audio scene recognition method based on harmonic-percussive source separation data augmentation, combining data augmentation of the audio with pattern recognition.
Background technique
At present, the following approaches are generally taken to scene recognition.
1. Audio scene recognition
The data for audio scene recognition are acquired directly in real environments, so overlapping sounds are inevitably present. Humans live in complex audio environments yet can follow a specific sound source well while ignoring, or only loosely registering, the other sources; for example, we can hold a conversation against a busy background of other talkers or music. The performance of automatic audio scene classification is severely limited in this task: a sound mixture contains multiple simultaneous sound events, and machine hearing systems are still far from human level at recognizing them. Individual sound events can be used to describe an audio scene symbolically: a scene on a busy street can be represented by passing cars, car horns, and the hurried steps of pedestrians.
The purpose of audio scene recognition and classification is to process the audio signal and translate it into symbolic descriptions of the sound events occurring in the scene, for applications such as automatic tagging, automatic sound analysis, and audio segmentation. Earlier research on audio scene recognition considered scenes with explicitly annotated overlapping events, but the detection results were presented as a sequence under the assumption that only the most salient event is present at any time. In that setting a system can only find one scene at a time, and the evaluation counts the output as correct if the detected scene appears in the annotation. In multi-source environments the performance of such systems is very limited.
2. Data augmentation methods
As research on audio scene recognition expands and a large number of neural-network-based methods are applied to the problem, the demand for data grows accordingly. Compared with image classification, however, the datasets currently available for developing audio scene recognition systems remain far more limited in size, diversity, and number of event instances, although recent contributions such as the AudioSet and DCASE datasets have significantly narrowed this gap.
A good solution to this problem is data augmentation: applying one or more deformations to a set of annotated training samples to generate new, additional training data. A key concept of data augmentation is that the deformation applied to labeled data must not change the semantics of the label. Taking computer vision as an example, a rotated, translated, mirrored, or scaled image of a car is still a coherent image of a car, so these deformations can be used to generate additional training data while preserving the semantic validity of the label. By training the network on the additionally deformed data, one hopes that the network becomes invariant to these deformations and generalizes better to unseen data. In doing so, the model is exposed to a larger and more diverse training sample and can therefore describe the decision boundaries between classes better. Semantics-preserving deformations have also been proposed in the audio domain and have been shown to improve model accuracy on music classification tasks.
3. The HPSS method
Harmonic/percussive source separation (HPSS) was proposed and applied in the field of music separation. In general, a music signal exhibits two kinds of structure in the spectrogram: one distributed smoothly and continuously along the time axis, the other distributed smoothly and continuously along the frequency axis; these two kinds of sources are usually called harmonic sources and percussive sources. HPSS is a simple method for analyzing the harmonic and percussive sources of an audio signal, and as a preprocessing method it has effectively promoted research in audio signal analysis fields such as multi-pitch analysis and automatic music transcription. Harmonic and percussive sources in an audio signal have anisotropic characteristics in the spectrum. Based on this characteristic, a harmonic source generally contains sustained tones and forms a series of smooth horizontal envelopes in the spectrogram; it is therefore smooth and continuous along the time axis and intermittent along the frequency axis. Conversely, a percussive source is generally concentrated in a short period and forms a series of vertical broadband spectral envelopes; it is therefore intermittent along the time axis and smooth and continuous along the frequency axis.
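As context, a minimal sketch of such a separation using the librosa library is shown below; note that librosa implements a median-filtering variant of HPSS rather than the iterative cost-function scheme described in the Summary, and the file names are placeholders. Both output clips keep the scene label of the source clip, which is what makes this deformation semantics-preserving in the sense of the previous section.

```python
import librosa
import soundfile as sf

y, sr = librosa.load("scene.wav", sr=44100)   # load and resample one clip
y_h, y_p = librosa.effects.hpss(y)            # harmonic / percussive split
sf.write("scene_harmonic.wav", y_h, sr)       # one labeled clip becomes two;
sf.write("scene_percussive.wav", y_p, sr)     # both keep the original scene label
```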
Summary of the invention
The technical problem to be solved by the invention is to provide an audio scene recognition method, based on harmonic-percussive source separation data augmentation, that can substantially improve classification accuracy.
The technical scheme adopted by the invention is as follows: an audio scene recognition method based on harmonic-percussive source separation data augmentation comprises the following steps:
1) separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that two clips are generated from one clip and data augmentation is realized;
2) feeding the two clips obtained by the harmonic-percussive separation into the scene recognition system as the training set and extracting audio features;
3) training the classifier network with the training-set audio features as its input, and recognizing audio scenes on the test set from the classifier network's output.
The separation of harmonic and percussive sources for the training-set audio described in step 1) comprises:
(1) Design the audio-signal model $J(H,P)$, which penalizes the gradient of the harmonic component along the time axis and of the percussive component along the frequency axis:

$$J(H,P)=\frac{1}{2\sigma_H^2}\sum_i\left(H_{i-1}-H_i\right)^2+\frac{1}{2\sigma_P^2}\sum_i\left(P_{i-1}-P_i\right)^2 \tag{1}$$

$$H_i+P_i=W_i,\qquad H_i>0,\ P_i>0 \tag{2}$$

where $H_i$ is the $i$-th harmonic component of the audio signal, $P_i$ is the $i$-th percussive component, $F_i$ is the $i$-th Short-Time Fourier Transform coefficient of the input signal $f(t)$, $W_i=|F_i|^2$ is the energy spectrum, and $\sigma_H$, $\sigma_P$ are weight smoothing factors.
(2) Compute the minimum of the model for each audio signal by iteration, obtaining the total harmonic component H and the total percussive component P of each audio signal.
(2.1) Since the spectral gradients $H_{i-1}-H_i$ and $P_{i-1}-P_i$ are assumed to obey independent Gaussian distributions, we have

$$\left(H_{i-1}-H_i\right)^2\le 2\left(H_{i-1}-U_i\right)^2+2\left(H_i-U_i\right)^2 \tag{3}$$

$$\left(P_{i-1}-P_i\right)^2\le 2\left(P_{i-1}-V_i\right)^2+2\left(P_i-V_i\right)^2 \tag{4}$$

where the intermediate variables $U_i$ and $V_i$ are respectively the midpoints

$$U_i=\frac{H_{i-1}+H_i}{2},\qquad V_i=\frac{P_{i-1}+P_i}{2}$$

(2.2) Set the auxiliary function

$$Q(H,P,U,V)=\frac{1}{\sigma_H^2}\sum_i\left[\left(H_{i-1}-U_i\right)^2+\left(H_i-U_i\right)^2\right]+\frac{1}{\sigma_P^2}\sum_i\left[\left(P_{i-1}-V_i\right)^2+\left(P_i-V_i\right)^2\right] \tag{5}$$

Substituting formulas (3) and (4) into formula (5) yields formulas (6) and (7):

$$J(H,P)\le Q(H,P,U,V) \tag{6}$$

$$J(H,P)=Q(H,P,U,V)\ \text{ when }\ U_i=\frac{H_{i-1}+H_i}{2},\ V_i=\frac{P_{i-1}+P_i}{2} \tag{7}$$

(2.3) Each iteration alternately resets $U_i$, $V_i$ to these midpoints and minimizes $Q$ over $H_i$, $P_i$ subject to constraint (2). After k iterations, with k no greater than 200, the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data. A code sketch of this iteration follows.
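The following NumPy sketch shows one plausible reading of this iteration. The constrained per-bin update is our reconstruction from equations (1)-(7) (smooth H along the time axis, smooth P along the frequency axis, then project back onto constraint (2)), not code taken from the patent; the default parameter values are those given in the embodiment below.

```python
import numpy as np

def hpss_iterative(W, sigma_h=0.7, sigma_p=1.05, k=20):
    """Split an energy spectrogram W (freq x time) into harmonic H and
    percussive P by the auxiliary-function iteration sketched above."""
    H, P = 0.5 * W, 0.5 * W                      # initialization used in the embodiment
    a = sigma_h**2 / (sigma_h**2 + sigma_p**2)   # weight for the constraint projection
    for _ in range(k):
        # Unconstrained minimizers of Q: a [1, 2, 1]/4 smoothing of H
        # along time (axis 1) and of P along frequency (axis 0).
        Hp = np.pad(H, ((0, 0), (1, 1)), mode="edge")
        Pp = np.pad(P, ((1, 1), (0, 0)), mode="edge")
        H_star = (Hp[:, :-2] + 2 * Hp[:, 1:-1] + Hp[:, 2:]) / 4
        P_star = (Pp[:-2, :] + 2 * Pp[1:-1, :] + Pp[2:, :]) / 4
        # Project back onto H + P = W and keep both components non-negative.
        delta = W - H_star - P_star
        H = np.clip(H_star + a * delta, 0.0, W)
        P = W - H
    return H, P
```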
Step 2) comprises:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to N ms and the hop size to N/2 ms, then applying M mel filters to obtain the mel spectrogram, which is the extracted audio feature. A code sketch of this step follows.
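A short sketch of this feature extraction is given below, assuming the concrete values used in the embodiment (46 ms frames, 23 ms hop, 128 mel filters) and rounding the frame length to whole samples; the final conversion to decibels is common practice rather than something the method text specifies.

```python
import librosa

def extract_mel(y, sr=44100, frame_ms=46, n_mels=128):
    """Mel-spectrogram features for step 2: frame length N ms, hop N/2 ms,
    M mel filters (N = 46 and M = 128 in the embodiment)."""
    n_fft = int(sr * frame_ms / 1000)   # 46 ms at 44.1 kHz -> 2028 samples
    hop_length = n_fft // 2             # 23 ms hop (50% overlap)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(S)       # log-mel; the dB step is our choice
```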
Step 3) comprises:
(1) forming the extracted training-set audio features and the corresponding class labels into input-output pairs;
(2) using the classifier network to learn the mapping between the audio features and the corresponding class labels;
(3) in the test phase, recognizing the test-set audio with the classifier network obtained by training. Two cases are expected for the network's estimated output $\hat{y}$: when the estimated output contains no correspondence between the audio features and a class label, $\hat{y}$ is close to 0; when such a correspondence exists, $\hat{y}$ is close to 1. A sketch of one possible classifier network follows.
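A sketch of such a classifier network is shown below, using the two 3 x 3 convolutional layers that the embodiment specifies; the channel widths, pooling, and linear classification head are our assumptions, since the patent fixes only the kernel size and depth.

```python
import torch
import torch.nn as nn

class SceneClassifier(nn.Module):
    """Two convolutional layers with 3x3 kernels, per the embodiment;
    everything else here is an illustrative assumption."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # one value per channel
        )
        self.head = nn.Linear(64, n_classes)   # per-scene scores

    def forward(self, x):                      # x: (batch, 1, n_mels, frames)
        return self.head(self.features(x).flatten(1))
```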
With the audio scene recognition method based on harmonic-percussive source separation data augmentation of the present invention, classification accuracy in audio scene recognition is substantially improved while using an identical classifier model. Through data augmentation based on harmonic-percussive source separation, the system obtains a larger and more diverse training set; training the classifier network on this set markedly improves its learning and generalization abilities.
Brief description of the drawings
Fig. 1 is the mel spectrogram of the total harmonic component after separation;
Fig. 2 is the mel spectrogram of the total percussive component after separation.
Specific embodiment
The audio scene recognition method based on harmonic-percussive source separation data augmentation of the present invention is described in detail below with reference to embodiments and the accompanying drawings.
The audio scene recognition method based on harmonic-percussive source separation data augmentation of the present invention comprises the following steps:
1) Separate each audio clip in the training set into a total harmonic component H and a total percussive component P, so that two clips are generated from one clip and data augmentation is realized. The separation of harmonic and percussive sources for the training-set audio comprises:
(1) Design the audio-signal model $J(H,P)$:

$$J(H,P)=\frac{1}{2\sigma_H^2}\sum_i\left(H_{i-1}-H_i\right)^2+\frac{1}{2\sigma_P^2}\sum_i\left(P_{i-1}-P_i\right)^2 \tag{1}$$

$$H_i+P_i=W_i,\qquad H_i>0,\ P_i>0 \tag{2}$$

where $H_i$ is the $i$-th harmonic component of the audio signal, $P_i$ is the $i$-th percussive component, $F_i$ is the $i$-th Short-Time Fourier Transform coefficient of the input signal $f(t)$, $W_i=|F_i|^2$ is the energy spectrum, and $\sigma_H$, $\sigma_P$ are weight smoothing factors.
(2) Compute the minimum of the model for each audio signal by iteration, obtaining the total harmonic component H and the total percussive component P of each audio signal.
(2.1) Since the spectral gradients $H_{i-1}-H_i$ and $P_{i-1}-P_i$ are assumed to obey independent Gaussian distributions, we have

$$\left(H_{i-1}-H_i\right)^2\le 2\left(H_{i-1}-U_i\right)^2+2\left(H_i-U_i\right)^2 \tag{3}$$

$$\left(P_{i-1}-P_i\right)^2\le 2\left(P_{i-1}-V_i\right)^2+2\left(P_i-V_i\right)^2 \tag{4}$$

where the intermediate variables $U_i$ and $V_i$ are respectively the midpoints

$$U_i=\frac{H_{i-1}+H_i}{2},\qquad V_i=\frac{P_{i-1}+P_i}{2}$$

(2.2) Set the auxiliary function

$$Q(H,P,U,V)=\frac{1}{\sigma_H^2}\sum_i\left[\left(H_{i-1}-U_i\right)^2+\left(H_i-U_i\right)^2\right]+\frac{1}{\sigma_P^2}\sum_i\left[\left(P_{i-1}-V_i\right)^2+\left(P_i-V_i\right)^2\right] \tag{5}$$

Substituting formulas (3) and (4) into formula (5) yields formulas (6) and (7):

$$J(H,P)\le Q(H,P,U,V) \tag{6}$$

$$J(H,P)=Q(H,P,U,V)\ \text{ when }\ U_i=\frac{H_{i-1}+H_i}{2},\ V_i=\frac{P_{i-1}+P_i}{2} \tag{7}$$

(2.3) Each iteration alternately resets $U_i$, $V_i$ to these midpoints and minimizes $Q$ over $H_i$, $P_i$ subject to constraint (2). After k iterations, with k no greater than 200, the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data.
The mel spectrograms of the separated total harmonic component H and total percussive component P are shown in Fig. 1 and Fig. 2.
2) Feed the two clips obtained by harmonic/percussive source separation (HPSS) into the scene recognition system as the training set and extract audio features. This comprises:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to 46 ms and the hop size to 23 ms, then applying M mel filters to obtain the mel spectrogram, which is the extracted audio feature.
3) Train the classifier network with the training-set audio features as its input, and recognize audio scenes on the test set from the classifier network's output. This comprises:
(1) forming the extracted training-set audio features and the corresponding class labels into input-output pairs;
(2) using the classifier network to learn the mapping between the audio features and the corresponding class labels;
(3) in the test phase, recognizing the test-set audio with the classifier network obtained by training. Two cases are expected for the network's estimated output $\hat{y}$: when the estimated output contains no correspondence between the audio features and a class label, $\hat{y}$ is close to 0; when such a correspondence exists, $\hat{y}$ is close to 1.
A concrete embodiment of the present invention:
The audio scene recognition method based on harmonic-percussive source separation data augmentation of the present invention comprises the following steps:
1) Separate each audio clip in the training set into a total harmonic component H and a total percussive component P, so that two clips are generated from one clip and data augmentation is realized. The separation of harmonic and percussive sources for the training-set audio comprises:
(1) Design the audio-signal model $J(H,P)$:

$$J(H,P)=\frac{1}{2\sigma_H^2}\sum_i\left(H_{i-1}-H_i\right)^2+\frac{1}{2\sigma_P^2}\sum_i\left(P_{i-1}-P_i\right)^2 \tag{1}$$

$$H_i+P_i=W_i,\qquad H_i>0,\ P_i>0 \tag{2}$$

where $H_i$ is the $i$-th harmonic component of the audio signal, $P_i$ is the $i$-th percussive component, $F_i$ is the $i$-th Short-Time Fourier Transform coefficient of the input signal $f(t)$, and $W_i=|F_i|^2$ is the energy spectrum; the components are initialized as $H_i=0.5W_i$ and $P_i=0.5W_i$, and the weight smoothing factors are assigned $\sigma_H=0.7$ and $\sigma_P=1.05$.
(2) Compute the minimum of the model for each audio signal by iteration, obtaining the total harmonic component H and the total percussive component P of each audio signal.
(2.1) Since the spectral gradients $H_{i-1}-H_i$ and $P_{i-1}-P_i$ are assumed to obey independent Gaussian distributions, we have

$$\left(H_{i-1}-H_i\right)^2\le 2\left(H_{i-1}-U_i\right)^2+2\left(H_i-U_i\right)^2 \tag{3}$$

$$\left(P_{i-1}-P_i\right)^2\le 2\left(P_{i-1}-V_i\right)^2+2\left(P_i-V_i\right)^2 \tag{4}$$

where the intermediate variables $U_i$ and $V_i$ are respectively the midpoints

$$U_i=\frac{H_{i-1}+H_i}{2},\qquad V_i=\frac{P_{i-1}+P_i}{2}$$

(2.2) Set the auxiliary function

$$Q(H,P,U,V)=\frac{1}{\sigma_H^2}\sum_i\left[\left(H_{i-1}-U_i\right)^2+\left(H_i-U_i\right)^2\right]+\frac{1}{\sigma_P^2}\sum_i\left[\left(P_{i-1}-V_i\right)^2+\left(P_i-V_i\right)^2\right] \tag{5}$$

Substituting formulas (3) and (4) into formula (5) yields formulas (6) and (7):

$$J(H,P)\le Q(H,P,U,V) \tag{6}$$

$$J(H,P)=Q(H,P,U,V)\ \text{ when }\ U_i=\frac{H_{i-1}+H_i}{2},\ V_i=\frac{P_{i-1}+P_i}{2} \tag{7}$$

(2.3) Each iteration alternately resets $U_i$, $V_i$ to these midpoints and minimizes $Q$ over $H_i$, $P_i$ subject to constraint (2). After k iterations, where k = 20, the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data.
The mel spectrograms of the separated total harmonic component H and total percussive component P are shown in Fig. 1 and Fig. 2. A worked example with the embodiment's parameters follows.
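Under the assumptions of the hpss_iterative sketch given earlier, the embodiment's parameter values would be applied roughly as follows; resynthesizing the two clips by soft-masking the original complex STFT is one common choice rather than something the patent prescribes, and the file name is a placeholder.

```python
import librosa
import numpy as np

y, _ = librosa.load("scene.wav", sr=44100)    # placeholder input clip
F = librosa.stft(y)                           # STFT coefficients F_i of f(t)
W = np.abs(F) ** 2                            # energy spectrum W_i = |F_i|^2
H, P = hpss_iterative(W, sigma_h=0.7, sigma_p=1.05, k=20)
# Recover two audio clips with the original phase via soft masks.
eps = 1e-10
y_h = librosa.istft(F * np.sqrt(H / np.maximum(W, eps)))
y_p = librosa.istft(F * np.sqrt(P / np.maximum(W, eps)))
```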
2) Feed the two clips obtained by harmonic/percussive source separation (HPSS) into the scene recognition system as the training set and extract audio features. This comprises:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to 46 ms and the hop size to 23 ms, then applying 128 mel filters to obtain the mel spectrogram, which is the extracted audio feature.
3) Train the classifier network with the training-set audio features as its input, and recognize audio scenes on the test set from the classifier network's output. This comprises:
(1) forming the extracted training-set audio features and the corresponding class labels into input-output pairs;
(2) using the classifier network to learn the mapping between the audio features and the corresponding class labels, where the classifier network consists of two convolutional layers with 3 x 3 kernels;
(3) in the test phase, recognizing the test-set audio with the classifier network obtained by training. Two cases are expected for the network's estimated output $\hat{y}$: when the estimated output contains no correspondence between the audio features and a class label, $\hat{y}$ is close to 0; when such a correspondence exists, $\hat{y}$ is close to 1. A training and test sketch follows.
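A self-contained training and test sketch for the classifier above follows; the optimizer, loss, batch shape, and number of scene classes are our assumptions, with random tensors standing in for real log-mel batches.

```python
import torch

model = SceneClassifier(n_classes=10)            # assumed number of scene classes
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()            # pushes the true-class output toward 1

feats = torch.randn(8, 1, 128, 431)              # dummy batch: 128 mel bins x 431 frames
labels = torch.randint(0, 10, (8,))              # dummy scene labels
loss = loss_fn(model(feats), labels)             # one training step on one batch
loss.backward()
opt.step()

with torch.no_grad():                            # test phase: pick the top-scoring scene
    pred = model(feats).argmax(dim=1)
```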

Claims (4)

1. An audio scene recognition method based on harmonic-percussive source separation data augmentation, characterized by comprising the following steps:
1) separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that two clips are generated from one clip and data augmentation is realized;
2) feeding the two clips obtained by the harmonic-percussive separation into the scene recognition system as the training set and extracting audio features;
3) training the classifier network with the training-set audio features as its input, and recognizing audio scenes on the test set from the classifier network's output.
2. The audio scene recognition method based on harmonic-percussive source separation data augmentation according to claim 1, characterized in that the separation of harmonic and percussive sources for the training-set audio described in step 1) comprises:
(1) designing the audio-signal model $J(H,P)$:

$$J(H,P)=\frac{1}{2\sigma_H^2}\sum_i\left(H_{i-1}-H_i\right)^2+\frac{1}{2\sigma_P^2}\sum_i\left(P_{i-1}-P_i\right)^2 \tag{1}$$

$$H_i+P_i=W_i,\qquad H_i>0,\ P_i>0 \tag{2}$$

wherein $H_i$ is the $i$-th harmonic component of the audio signal, $P_i$ is the $i$-th percussive component, $F_i$ is the $i$-th Short-Time Fourier Transform coefficient of the input signal $f(t)$, $W_i=|F_i|^2$ is the energy spectrum, and $\sigma_H$, $\sigma_P$ are weight smoothing factors;
(2) computing the minimum of the model for each audio signal by iteration, obtaining the total harmonic component H and the total percussive component P of each audio signal:
(2.1) since the spectral gradients $H_{i-1}-H_i$ and $P_{i-1}-P_i$ obey independent Gaussian distributions,

$$\left(H_{i-1}-H_i\right)^2\le 2\left(H_{i-1}-U_i\right)^2+2\left(H_i-U_i\right)^2 \tag{3}$$

$$\left(P_{i-1}-P_i\right)^2\le 2\left(P_{i-1}-V_i\right)^2+2\left(P_i-V_i\right)^2 \tag{4}$$

wherein the intermediate variables $U_i$ and $V_i$ are respectively

$$U_i=\frac{H_{i-1}+H_i}{2},\qquad V_i=\frac{P_{i-1}+P_i}{2}$$

(2.2) setting the auxiliary function

$$Q(H,P,U,V)=\frac{1}{\sigma_H^2}\sum_i\left[\left(H_{i-1}-U_i\right)^2+\left(H_i-U_i\right)^2\right]+\frac{1}{\sigma_P^2}\sum_i\left[\left(P_{i-1}-V_i\right)^2+\left(P_i-V_i\right)^2\right] \tag{5}$$

and substituting formulas (3) and (4) into formula (5) to obtain formulas (6) and (7):

$$J(H,P)\le Q(H,P,U,V) \tag{6}$$

$$J(H,P)=Q(H,P,U,V)\ \text{ when }\ U_i=\frac{H_{i-1}+H_i}{2},\ V_i=\frac{P_{i-1}+P_i}{2} \tag{7}$$

(2.3) after k iterations, with k no greater than 200, realizing the separation into the total harmonic component H and the total percussive component P, thereby augmenting the audio data.
3. The audio scene recognition method based on harmonic-percussive source separation data augmentation according to claim 1, characterized in that step 2) comprises:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to N ms and the hop size to N/2 ms, then applying M mel filters to obtain the mel spectrogram, which is the extracted audio feature.
4. The audio scene recognition method based on harmonic-percussive source separation data augmentation according to claim 1, characterized in that step 3) comprises:
(1) forming the extracted training-set audio features and the corresponding class labels into input-output pairs;
(2) using the classifier network to learn the mapping between the audio features and the corresponding class labels;
(3) in the test phase, recognizing the test-set audio with the classifier network obtained by training, wherein two cases are expected for the network's estimated output $\hat{y}$: when the estimated output contains no correspondence between the audio features and a class label, $\hat{y}$ is close to 0; when such a correspondence exists, $\hat{y}$ is close to 1.
CN201910233185.3A 2019-03-26 2019-03-26 Audio scene recognition method based on harmonic-percussive source separation data augmentation Pending CN110070856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910233185.3A 2019-03-26 2019-03-26 Audio scene recognition method based on harmonic-percussive source separation data augmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910233185.3A 2019-03-26 2019-03-26 Audio scene recognition method based on harmonic-percussive source separation data augmentation

Publications (1)

Publication Number Publication Date
CN110070856A true CN110070856A (en) 2019-07-30

Family

ID=67366745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910233185.3A Audio scene recognition method based on harmonic-percussive source separation data augmentation 2019-03-26 2019-03-26

Country Status (1)

Country Link
CN (1) CN110070856A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807901A (en) * 2019-11-08 2020-02-18 西安联丰迅声信息科技有限责任公司 Non-contact industrial abnormal sound detection method
CN111505650A (en) * 2020-04-28 2020-08-07 西北工业大学 HPSS-based underwater target passive detection method
CN113241091A (en) * 2021-05-28 2021-08-10 思必驰科技股份有限公司 Sound separation enhancement method and system
CN113497953A (en) * 2020-04-07 2021-10-12 北京达佳互联信息技术有限公司 Music scene recognition method, device, server and storage medium
CN116186524A (en) * 2023-05-04 2023-05-30 天津大学 Self-supervision machine abnormal sound detection method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477798A (en) * 2009-02-17 2009-07-08 北京邮电大学 Method for analyzing and extracting audio data of set scene
CN104616663A (en) * 2014-11-25 2015-05-13 重庆邮电大学 Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)
CN108335703A (en) * 2018-03-28 2018-07-27 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the stress position of audio data
CN109256146A (en) * 2018-10-30 2019-01-22 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency detection, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477798A (en) * 2009-02-17 2009-07-08 北京邮电大学 Method for analyzing and extracting audio data of set scene
CN104616663A (en) * 2014-11-25 2015-05-13 重庆邮电大学 Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)
CN108335703A (en) * 2018-03-28 2018-07-27 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the stress position of audio data
CN109256146A (en) * 2018-10-30 2019-01-22 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency detection, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YOONCHANG HAN ET AL: "Convolutional Neural Networks With Binaural Representations And Background Subtraction For Acoustic Scene Classification", 《DETECTION AND CLASSIFICATION OF ACOUSTIC SCENES AND EVENTS 2017》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807901A (en) * 2019-11-08 2020-02-18 西安联丰迅声信息科技有限责任公司 Non-contact industrial abnormal sound detection method
CN110807901B (en) * 2019-11-08 2021-08-03 西安联丰迅声信息科技有限责任公司 Non-contact industrial abnormal sound detection method
CN113497953A (en) * 2020-04-07 2021-10-12 北京达佳互联信息技术有限公司 Music scene recognition method, device, server and storage medium
CN111505650A (en) * 2020-04-28 2020-08-07 西北工业大学 HPSS-based underwater target passive detection method
CN111505650B (en) * 2020-04-28 2022-11-01 西北工业大学 HPSS-based underwater target passive detection method
CN113241091A (en) * 2021-05-28 2021-08-10 思必驰科技股份有限公司 Sound separation enhancement method and system
CN116186524A (en) * 2023-05-04 2023-05-30 天津大学 Self-supervision machine abnormal sound detection method
CN116186524B (en) * 2023-05-04 2023-07-18 天津大学 Self-supervision machine abnormal sound detection method

Similar Documents

Publication Publication Date Title
CN110070856A Audio scene recognition method based on harmonic-percussive source separation data augmentation
Sailor et al. Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification.
Basu et al. A review on emotion recognition using speech
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN109036382B (en) Audio feature extraction method based on KL divergence
CN107393554B (en) Feature extraction method for fusion inter-class standard deviation in sound scene classification
CN112257521B (en) CNN underwater acoustic signal target identification method based on data enhancement and time-frequency separation
Naranjo-Alcazar et al. Acoustic scene classification with squeeze-excitation residual networks
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN103810994B (en) Speech emotional inference method based on emotion context and system
Muckenhirn et al. Understanding and Visualizing Raw Waveform-Based CNNs.
CN110120230B (en) Acoustic event detection method and device
CN103456312B (en) A kind of single-channel voice blind separating method based on Computational auditory scene analysis
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
Huang et al. Intelligent feature extraction and classification of anuran vocalizations
CN109767756A (en) A kind of speech feature extraction algorithm based on dynamic partition inverse discrete cosine transform cepstrum coefficient
CN113327586B (en) Voice recognition method, device, electronic equipment and storage medium
Anjana et al. Language identification from speech features using SVM and LDA
Renjith et al. Speech based emotion recognition in Tamil and Telugu using LPCC and hurst parameters—A comparitive study using KNN and ANN classifiers
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
CN107369451B (en) Bird voice recognition method for assisting phenological study of bird breeding period
Xie et al. Application of image processing techniques for frog call classification
WO2023279691A1 (en) Speech classification method and apparatus, model training method and apparatus, device, medium, and program
Yamamoto et al. Deformable cnn and imbalance-aware feature learning for singing technique classification
Shu et al. Time-frequency performance study on urban sound classification with convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190730)