CN110070856A - Acoustic scene recognition method based on harmonic-percussive source separation data augmentation - Google Patents
Acoustic scene recognition method based on harmonic-percussive source separation data augmentation
- Publication number
- CN110070856A (application CN201910233185.3A)
- Authority
- CN
- China
- Prior art keywords
- audio
- harmonic
- training set
- augmentation
- percussive source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G — PHYSICS
- G10 — MUSICAL INSTRUMENTS; ACOUSTICS
- G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/02 — Feature extraction for speech recognition; Selection of recognition unit
- G10L15/04 — Segmentation; Word boundary detection
- G10L15/063 — Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/083 — Recognition networks
- G10L15/26 — Speech-to-text systems
- G10L19/02 — Speech or audio signal analysis-synthesis for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L25/27 — Speech or voice analysis techniques characterised by the analysis technique
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
Abstract
An acoustic scene recognition method based on harmonic-percussive source separation data augmentation, comprising: separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that each clip yields two clips and the data are thereby augmented; feeding the two clips obtained by harmonic-percussive separation into the scene recognition system as the training set and extracting audio features; using the training-set audio features as input to a classifier network, training the classifier network, and recognizing acoustic scenes on the test set from the output of the classifier network. With the same classifier model, the invention substantially improves classification accuracy in acoustic scene recognition. Through data augmentation based on harmonic-percussive source separation, the system obtains a larger and more diverse training set; training the classifier network on this training set markedly improves its learning and generalization ability.
Description
Technical field
The present invention relates to acoustic scene recognition and classification methods, and more particularly to an acoustic scene recognition method based on harmonic-percussive source separation data augmentation, involving data augmentation for audio processing and pattern recognition.
Background technique
At present, the following approaches are commonly used for scene recognition.
1. Overview of acoustic scene recognition
Data for acoustic scene recognition are collected directly in real environments, so overlapping sounds necessarily occur. Humans live in complex acoustic environments yet can follow a specific sound source well while ignoring, or only casually registering, other sources; for example, we can hold a conversation against a busy background of other talkers or music. Automatic acoustic scene classification is severely limited in this task: the sound mixture contains multiple simultaneous sound events, and machine listening systems are still far from human-level performance at recognizing them. Individual sound events can be used to describe an acoustic scene: in a symbolic way they can represent, for instance, a scene on a busy street, with cars passing by, car horns, and the hurried steps of pedestrians.
The purpose of acoustic scene recognition and classification is to process the sound signal and translate it into symbolic descriptions of the sound events occurring in the scene, for applications such as automatic tagging, automatic sound analysis, and audio segmentation. Earlier research related to acoustic scene recognition considered scenes with explicitly annotated overlapping events, but detection results were presented as a sequence under the assumption that only the most prominent event is present at any time. In this setting the system can find only one event at a time, and the evaluation counts the output as correct if the detected event appears in the annotation. In multi-source environments, the performance of such systems is very limited.
2. Data augmentation methods
As research on acoustic scene recognition expands and more and more neural-network-based methods are applied to the problem, the demand for data also grows. Compared with image classification, however, the datasets currently available for developing acoustic scene recognition systems are much more limited in size, diversity, and number of event instances, although recent contributions such as AudioSet and the DCASE datasets have narrowed this gap considerably.
A good solution to this problem is data augmentation: applying one or more deformations to a set of annotated training samples to produce new, additional training data. A key concept of data augmentation is that the deformations applied to the labeled data must not change the semantics of the labels. Taking computer vision as an example, a rotated, translated, mirrored, or scaled image of a car is still a coherent image of a car, so these deformations can be used to generate additional training data while keeping the labels semantically valid. By training the network on the additional deformed data, one hopes that the network becomes invariant to these deformations and generalizes better to unseen data. In doing so, the model is exposed to a larger and more diverse set of training samples and can therefore describe the decision boundaries between classes better. Label-preserving deformations have also been proposed in the audio domain and have been shown to improve model accuracy on music classification tasks.
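As a minimal illustration of label-preserving augmentation, the sketch below expands a dataset by applying deformations while reusing the original labels. The `augment` helper and the toy data are hypothetical examples, not part of the patent:

```python
def augment(dataset, transforms):
    """Expand a list of (sample, label) pairs with label-preserving transforms.

    Each transform maps a sample to a deformed sample; because the
    deformation preserves the label semantics, the original label is
    reused for every deformed copy.
    """
    augmented = list(dataset)
    for sample, label in dataset:
        for transform in transforms:
            augmented.append((transform(sample), label))
    return augmented

# Toy usage: two deformations triple the training set.
data = [([1.0, 2.0], "street"), ([0.5, 0.5], "park")]
bigger = augment(data, [lambda x: [v * 2 for v in x],
                        lambda x: list(reversed(x))])
```

With harmonic-percussive separation as the deformation, each clip would instead yield two new clips (H and P) carrying the original scene label.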
3. The HPSS method
Harmonic/percussive source separation (HPSS) was proposed and applied in the field of music source separation. In a spectrogram, a music signal generally exhibits two kinds of structure: one distributed smoothly and continuously along the time axis, the other smoothly and continuously along the frequency axis; these two kinds of source are commonly called harmonic sources and percussive sources. HPSS is a simple method for analyzing the harmonic and percussive sources of an audio signal; as a preprocessing step it has effectively advanced research in audio signal analysis such as multi-pitch analysis and automatic music transcription. Harmonic and percussive sources in an audio signal have anisotropic spectral characteristics. Based on this characteristic, a harmonic source generally contains sustained pitches and forms a series of smooth temporal envelopes in the spectrogram, so it is smooth and continuous along the time axis and intermittent along the frequency axis; conversely, a percussive source is generally concentrated in a short time interval and forms a series of vertical broadband spectral envelopes, so it is intermittent along the time axis and smooth and continuous along the frequency axis.
Summary of the invention
The technical problem to be solved by the invention is to provide an acoustic scene recognition method based on harmonic-percussive source separation data augmentation that can substantially improve classification accuracy.
The technical scheme adopted by the invention is an acoustic scene recognition method based on harmonic-percussive source separation data augmentation, comprising the following steps:
1) separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that each clip yields two clips and the data are thereby augmented;
2) feeding the two audio clips obtained by harmonic-percussive separation into the scene recognition system as the training set and extracting audio features;
3) using the training-set audio features as input to a classifier network, training the classifier network, and recognizing acoustic scenes on the test set from the output of the classifier network.
The separation of harmonic and percussive sources from the training-set audio described in step 1) comprises:
(1) Design the audio signal model J(H, P):
J(H, P) = Σ_i [ (H_{i-1} − H_i)² / (2σ_H²) + (P_{i-1} − P_i)² / (2σ_P²) ] (1)
H_i + P_i = W_i, H_i > 0, P_i > 0 (2)
where H_i is the i-th harmonic component of the audio signal, P_i is the i-th percussive component of the audio signal, F_i is the i-th Short-Time Fourier Transform coefficient of the input signal f(t), W_i = |F_i|² is the energy spectrum, and σ_H, σ_P are weight smoothing factors;
(2) Compute the minimum of each audio signal model by iteration to obtain the total harmonic component H and the total percussive component P of each audio signal;
(2.1) Assuming the spectral gradients H_{i-1} − H_i and P_{i-1} − P_i obey independent Gaussian distributions, we have:
(H_{i-1} − H_i)² ≤ 2(H_{i-1} − U_i)² + 2(H_i − U_i)² (3)
(P_{i-1} − P_i)² ≤ 2(P_{i-1} − V_i)² + 2(P_i − V_i)² (4)
where the intermediate variables U_i and V_i, at which (3) and (4) hold with equality, are:
U_i = (H_{i-1} + H_i) / 2, V_i = (P_{i-1} + P_i) / 2
(2.2) Set the auxiliary function:
Q(H, P, U, V) = Σ_i [ (2(H_{i-1} − U_i)² + 2(H_i − U_i)²) / (2σ_H²) + (2(P_{i-1} − V_i)² + 2(P_i − V_i)²) / (2σ_P²) ] (5)
Substituting (3) and (4) into (5) gives (6) and (7):
J(H, P) ≤ Q(H, P, U, V) (6)
J(H, P) = min over U, V of Q(H, P, U, V) (7)
(2.3) After k iterations of alternately minimizing Q under constraint (2), with k ≤ 200, the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data.
Step 2) comprises:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to N ms and the hop size to N/2 ms, and applying M mel filters to obtain the mel spectrogram, which constitutes the extracted audio features.
Step 3) comprises:
(1) forming input-output pairs from the extracted training-set audio features and their corresponding class labels;
(2) using the classifier network to learn the mapping between audio features and the corresponding class labels;
(3) in the test phase, recognizing the test-set audio with the trained classifier network. The estimated output of the classifier network, denoted ŷ, is expected to fall into two cases: when the class label corresponding to the audio features is absent from the estimated output, ŷ is close to 0; when it is present, ŷ is close to 1.
With the acoustic scene recognition method based on harmonic-percussive source separation data augmentation of the invention, classification accuracy in acoustic scene recognition is substantially improved using the same classifier model. Through data augmentation based on harmonic-percussive source separation, the system obtains a larger and more diverse training set; training the classifier network on this training set markedly improves its learning and generalization ability.
Description of the drawings
Fig. 1 is the mel spectrogram of the total harmonic component after separation;
Fig. 2 is the mel spectrogram of the total percussive component after separation.
Specific embodiments
An acoustic scene recognition method based on harmonic-percussive source separation data augmentation according to the invention is described in detail below with reference to embodiments and the accompanying drawings.
The acoustic scene recognition method based on harmonic-percussive source separation data augmentation of the invention comprises the following steps:
1) separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that each clip yields two clips and the data are thereby augmented. The separation of harmonic and percussive sources from the training-set audio comprises:
(1) Design the audio signal model J(H, P):
J(H, P) = Σ_i [ (H_{i-1} − H_i)² / (2σ_H²) + (P_{i-1} − P_i)² / (2σ_P²) ] (1)
H_i + P_i = W_i, H_i > 0, P_i > 0 (2)
where H_i is the i-th harmonic component of the audio signal, P_i is the i-th percussive component of the audio signal, F_i is the i-th Short-Time Fourier Transform coefficient of the input signal f(t), W_i = |F_i|² is the energy spectrum, and σ_H, σ_P are weight smoothing factors;
(2) Compute the minimum of each audio signal model by iteration to obtain the total harmonic component H and the total percussive component P of each audio signal;
(2.1) Assuming the spectral gradients H_{i-1} − H_i and P_{i-1} − P_i obey independent Gaussian distributions, we have:
(H_{i-1} − H_i)² ≤ 2(H_{i-1} − U_i)² + 2(H_i − U_i)² (3)
(P_{i-1} − P_i)² ≤ 2(P_{i-1} − V_i)² + 2(P_i − V_i)² (4)
where the intermediate variables U_i and V_i, at which (3) and (4) hold with equality, are:
U_i = (H_{i-1} + H_i) / 2, V_i = (P_{i-1} + P_i) / 2
(2.2) Set the auxiliary function:
Q(H, P, U, V) = Σ_i [ (2(H_{i-1} − U_i)² + 2(H_i − U_i)²) / (2σ_H²) + (2(P_{i-1} − V_i)² + 2(P_i − V_i)²) / (2σ_P²) ] (5)
Substituting (3) and (4) into (5) gives (6) and (7):
J(H, P) ≤ Q(H, P, U, V) (6)
J(H, P) = min over U, V of Q(H, P, U, V) (7)
(2.3) After k iterations of alternately minimizing Q under constraint (2), with k ≤ 200, the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data.
The mel spectrograms of the separated total harmonic component H and total percussive component P are shown in Fig. 1 and Fig. 2.
2) feeding the two audio clips obtained by harmonic-percussive source separation (HPSS) into the scene recognition system as the training set and performing audio feature extraction, comprising:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to 46 ms and the hop size to 23 ms, and applying M mel filters to obtain the mel spectrogram, which constitutes the extracted audio features.
3) using the training-set audio features as input to the classifier network, training the classifier network, and recognizing acoustic scenes on the test set from the output of the classifier network, comprising:
(1) forming input-output pairs from the extracted training-set audio features and their corresponding class labels;
(2) using the classifier network to learn the mapping between audio features and the corresponding class labels;
(3) in the test phase, recognizing the test-set audio with the trained classifier network. The estimated output of the classifier network, denoted ŷ, is expected to fall into two cases: when the class label corresponding to the audio features is absent from the estimated output, ŷ is close to 0; when it is present, ŷ is close to 1.
An embodiment of the invention:
The acoustic scene recognition method based on harmonic-percussive source separation data augmentation of the invention comprises the following steps:
1) separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that each clip yields two clips and the data are thereby augmented. The separation of harmonic and percussive sources from the training-set audio comprises:
(1) Design the audio signal model J(H, P):
J(H, P) = Σ_i [ (H_{i-1} − H_i)² / (2σ_H²) + (P_{i-1} − P_i)² / (2σ_P²) ] (1)
H_i + P_i = W_i, H_i > 0, P_i > 0 (2)
where H_i is the i-th harmonic component of the audio signal, P_i is the i-th percussive component of the audio signal, F_i is the i-th Short-Time Fourier Transform coefficient of the input signal f(t), and W_i = |F_i|² is the energy spectrum; H_i and P_i are initialized as H_i = 0.5 W_i and P_i = 0.5 W_i, and the weight smoothing factors are assigned σ_H = 0.7 and σ_P = 1.05;
(2) Compute the minimum of each audio signal model by iteration to obtain the total harmonic component H and the total percussive component P of each audio signal;
(2.1) Assuming the spectral gradients H_{i-1} − H_i and P_{i-1} − P_i obey independent Gaussian distributions, we have:
(H_{i-1} − H_i)² ≤ 2(H_{i-1} − U_i)² + 2(H_i − U_i)² (3)
(P_{i-1} − P_i)² ≤ 2(P_{i-1} − V_i)² + 2(P_i − V_i)² (4)
where the intermediate variables U_i and V_i, at which (3) and (4) hold with equality, are:
U_i = (H_{i-1} + H_i) / 2, V_i = (P_{i-1} + P_i) / 2
(2.2) Set the auxiliary function:
Q(H, P, U, V) = Σ_i [ (2(H_{i-1} − U_i)² + 2(H_i − U_i)²) / (2σ_H²) + (2(P_{i-1} − V_i)² + 2(P_i − V_i)²) / (2σ_P²) ] (5)
Substituting (3) and (4) into (5) gives (6) and (7):
J(H, P) ≤ Q(H, P, U, V) (6)
J(H, P) = min over U, V of Q(H, P, U, V) (7)
(2.3) After k = 20 iterations of alternately minimizing Q under constraint (2), the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data.
The mel spectrograms of the separated total harmonic component H and total percussive component P are shown in Fig. 1 and Fig. 2.
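The separation step can be sketched in NumPy. The sketch below uses a complementary-diffusion style of update commonly paired with this auxiliary-function formulation; the patent's exact iteration is not fully reproduced in the text, so the `hpss` function here is an assumption, while the initialization H_i = P_i = 0.5 W_i, the weights σ_H = 0.7 and σ_P = 1.05, and k = 20 follow the embodiment:

```python
import numpy as np

def hpss(W, sigma_h=0.7, sigma_p=1.05, k=20):
    """Split a power spectrogram W (freq x time) into harmonic H and percussive P.

    H is pushed toward smoothness along the time axis, P along the
    frequency axis, while H + P = W is maintained at every step.
    """
    H = 0.5 * W
    P = 0.5 * W
    # balance between the two smoothness terms of the objective
    alpha = sigma_p ** 2 / (sigma_h ** 2 + sigma_p ** 2)
    for _ in range(k):
        # discrete second differences: along time for H, along frequency for P
        dH = alpha * (H[:, :-2] - 2 * H[:, 1:-1] + H[:, 2:]) / 4
        dP = (1 - alpha) * (P[:-2, :] - 2 * P[1:-1, :] + P[2:, :]) / 4
        delta = np.zeros_like(W)
        delta[:, 1:-1] += dH
        delta[1:-1, :] -= dP
        H = np.clip(H + delta, 0.0, W)   # keep 0 <= H <= W
        P = W - H                        # enforce the constraint H + P = W
    return H, P

# Toy spectrogram: a horizontal (harmonic) line and a vertical (percussive) line.
W = np.zeros((16, 16))
W[5, :] = 1.0   # sustained tone: smooth along the time axis
W[:, 7] = 1.0   # click: smooth along the frequency axis
H, P = hpss(W)
```

On this toy input the sustained tone ends up mostly in H and the click mostly in P, matching the anisotropy described in the background section.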
2) feeding the two audio clips obtained by harmonic-percussive source separation (HPSS) into the scene recognition system as the training set and performing audio feature extraction, comprising:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to 46 ms and the hop size to 23 ms, and applying 128 mel filters to obtain the mel spectrogram, which constitutes the extracted audio features.
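The feature-extraction step can be sketched with the embodiment's parameters (44.1 kHz, 46 ms frames, 23 ms hop, 128 mel filters). The Hann window, the triangular mel filterbank construction, and the log compression are common choices assumed here, not specified in the patent:

```python
import numpy as np

SR = 44100                  # target sampling rate from the embodiment
FRAME = int(0.046 * SR)     # 46 ms frame length
HOP = int(0.023 * SR)       # 23 ms hop size
N_MELS = 128

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr=SR, frame=FRAME, hop=HOP, n_mels=N_MELS):
    """Frame the signal, window it, and map FFT power to log-mel features."""
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hanning(frame)
    fb = mel_filterbank(sr, frame, n_mels)
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(fb @ power.T + 1e-10)   # shape: (n_mels, n_frames)

# One second of a 440 Hz tone as a toy input.
t = np.arange(SR) / SR
mel = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```

The resulting matrix (128 mel bands by time frames) is the feature representation fed to the classifier network.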
3) using the training-set audio features as input to the classifier network, training the classifier network, and recognizing acoustic scenes on the test set from the output of the classifier network, comprising:
(1) forming input-output pairs from the extracted training-set audio features and their corresponding class labels;
(2) using the classifier network to learn the mapping between audio features and the corresponding class labels, where the classifier network is a two-layer convolutional neural network with 3 × 3 convolution kernels;
(3) in the test phase, recognizing the test-set audio with the trained classifier network. The estimated output of the classifier network, denoted ŷ, is expected to fall into two cases: when the class label corresponding to the audio features is absent from the estimated output, ŷ is close to 0; when it is present, ŷ is close to 1.
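The embodiment specifies only a two-layer CNN with 3 × 3 kernels, so the following NumPy forward pass is a plausible sketch rather than the patent's architecture; the ReLU activations, global average pooling, fully connected layer, softmax, and the 10 scene classes are all assumptions:

```python
import numpy as np

def conv3x3(x, w):
    """Valid 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out = w.shape[0]
    h, wd = x.shape[1] - 2, x.shape[2] - 2
    out = np.zeros((c_out, h, wd))
    for co in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[co, i, j] = np.sum(x[:, i:i + 3, j:j + 3] * w[co])
    return out

def forward(features, w1, w2, w_fc):
    """Two 3x3 conv layers with ReLU, global average pooling, softmax."""
    h1 = np.maximum(conv3x3(features, w1), 0.0)
    h2 = np.maximum(conv3x3(h1, w2), 0.0)
    pooled = h2.mean(axis=(1, 2))            # global average pooling
    logits = w_fc @ pooled
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16, 16))         # one-channel mel-spectrogram patch
w1 = rng.standard_normal((4, 1, 3, 3)) * 0.1
w2 = rng.standard_normal((8, 4, 3, 3)) * 0.1
w_fc = rng.standard_normal((10, 8)) * 0.1    # 10 hypothetical scene classes
y_hat = forward(x, w1, w2, w_fc)
```

The output ŷ is a probability vector over scene classes; the entry for the correct class is pushed toward 1 during training and the others toward 0, matching the two cases described in step (3).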
Claims (4)
1. An acoustic scene recognition method based on harmonic-percussive source separation data augmentation, characterized by comprising the following steps:
1) separating each audio clip in the training set into a total harmonic component H and a total percussive component P, so that each clip yields two clips and the data are thereby augmented;
2) feeding the two audio clips obtained by harmonic-percussive separation into the scene recognition system as the training set and extracting audio features;
3) using the training-set audio features as input to a classifier network, training the classifier network, and recognizing acoustic scenes on the test set from the output of the classifier network.
2. The acoustic scene recognition method based on harmonic-percussive source separation data augmentation according to claim 1, characterized in that the separation of harmonic and percussive sources from the training-set audio described in step 1) comprises:
(1) designing the audio signal model J(H, P):
J(H, P) = Σ_i [ (H_{i-1} − H_i)² / (2σ_H²) + (P_{i-1} − P_i)² / (2σ_P²) ] (1)
H_i + P_i = W_i, H_i > 0, P_i > 0 (2)
where H_i is the i-th harmonic component of the audio signal, P_i is the i-th percussive component of the audio signal, F_i is the i-th Short-Time Fourier Transform coefficient of the input signal f(t), W_i = |F_i|² is the energy spectrum, and σ_H, σ_P are weight smoothing factors;
(2) computing the minimum of each audio signal model by iteration to obtain the total harmonic component H and the total percussive component P of each audio signal;
(2.1) assuming the spectral gradients H_{i-1} − H_i and P_{i-1} − P_i obey independent Gaussian distributions, we have:
(H_{i-1} − H_i)² ≤ 2(H_{i-1} − U_i)² + 2(H_i − U_i)² (3)
(P_{i-1} − P_i)² ≤ 2(P_{i-1} − V_i)² + 2(P_i − V_i)² (4)
where the intermediate variables U_i and V_i, at which (3) and (4) hold with equality, are:
U_i = (H_{i-1} + H_i) / 2, V_i = (P_{i-1} + P_i) / 2
(2.2) setting the auxiliary function:
Q(H, P, U, V) = Σ_i [ (2(H_{i-1} − U_i)² + 2(H_i − U_i)²) / (2σ_H²) + (2(P_{i-1} − V_i)² + 2(P_i − V_i)²) / (2σ_P²) ] (5)
Substituting (3) and (4) into (5) gives (6) and (7):
J(H, P) ≤ Q(H, P, U, V) (6)
J(H, P) = min over U, V of Q(H, P, U, V) (7)
(2.3) after k iterations of alternately minimizing Q under constraint (2), with k ≤ 200, the separation into the total harmonic component H and the total percussive component P is realized, thereby augmenting the audio data.
3. The acoustic scene recognition method based on harmonic-percussive source separation data augmentation according to claim 1, characterized in that step 2) comprises:
(1) down-sampling the audio so that audio data of different sampling frequencies are uniformly converted to 44.1 kHz;
(2) framing and windowing: setting the Short-Time Fourier Transform frame length to N ms and the hop size to N/2 ms, and applying M mel filters to obtain the mel spectrogram, which constitutes the extracted audio features.
4. The acoustic scene recognition method based on harmonic-percussive source separation data augmentation according to claim 1, characterized in that step 3) comprises:
(1) forming input-output pairs from the extracted training-set audio features and their corresponding class labels;
(2) using the classifier network to learn the mapping between audio features and the corresponding class labels;
(3) in the test phase, recognizing the test-set audio with the trained classifier network, the estimated output of the classifier network, denoted ŷ, being expected to fall into two cases: when the class label corresponding to the audio features is absent from the estimated output, ŷ is close to 0; when it is present, ŷ is close to 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233185.3A CN110070856A (en) | 2019-03-26 | 2019-03-26 | Acoustic scene recognition method based on harmonic-percussive source separation data augmentation
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910233185.3A CN110070856A (en) | 2019-03-26 | 2019-03-26 | Acoustic scene recognition method based on harmonic-percussive source separation data augmentation
Publications (1)
Publication Number | Publication Date |
---|---|
CN110070856A true CN110070856A (en) | 2019-07-30 |
Family
ID=67366745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910233185.3A Pending CN110070856A (en) | 2019-03-26 | 2019-03-26 | A kind of audio scene recognition method based on the enhancing of harmonic wave impulse source mask data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070856A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477798A (en) * | 2009-02-17 | 2009-07-08 | 北京邮电大学 | Method for analyzing and extracting audio data of set scene |
CN104616663A (en) * | 2014-11-25 | 2015-05-13 | 重庆邮电大学 | Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation) |
CN108335703A (en) * | 2018-03-28 | 2018-07-27 | 腾讯音乐娱乐科技(深圳)有限公司 | The method and apparatus for determining the stress position of audio data |
CN109256146A (en) * | 2018-10-30 | 2019-01-22 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio-frequency detection, device and storage medium |
Non-Patent Citations (1)
Title |
---|
Yoonchang Han et al., "Convolutional Neural Networks with Binaural Representations and Background Subtraction for Acoustic Scene Classification", Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807901A (en) * | 2019-11-08 | 2020-02-18 | 西安联丰迅声信息科技有限责任公司 | Non-contact industrial abnormal sound detection method |
CN110807901B (en) * | 2019-11-08 | 2021-08-03 | 西安联丰迅声信息科技有限责任公司 | Non-contact industrial abnormal sound detection method |
CN113497953A (en) * | 2020-04-07 | 2021-10-12 | 北京达佳互联信息技术有限公司 | Music scene recognition method, device, server and storage medium |
CN111505650A (en) * | 2020-04-28 | 2020-08-07 | 西北工业大学 | HPSS-based underwater target passive detection method |
CN111505650B (en) * | 2020-04-28 | 2022-11-01 | 西北工业大学 | HPSS-based underwater target passive detection method |
CN113241091A (en) * | 2021-05-28 | 2021-08-10 | 思必驰科技股份有限公司 | Sound separation enhancement method and system |
CN116186524A (en) * | 2023-05-04 | 2023-05-30 | 天津大学 | Self-supervision machine abnormal sound detection method |
CN116186524B (en) * | 2023-05-04 | 2023-07-18 | 天津大学 | Self-supervision machine abnormal sound detection method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190730 |