CN109584888A - Whistle recognition method based on machine learning - Google Patents

Whistle recognition method based on machine learning

Info

Publication number
CN109584888A
Authority
CN
China
Prior art keywords
whistle
data
classifier
training
method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910038606.7A
Other languages
Chinese (zh)
Inventor
乔天昊
徐树公
张舜卿
曹姗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201910038606.7A
Publication of CN109584888A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 Training, enrolment or model building
    • G10L 17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 characterised by the type of analysis window

Abstract

A whistle recognition method based on machine learning: sample data is generated by mixing a public dataset with whistle data; a classifier is trained on the MFCC features extracted from the samples; and in the online stage the trained classifier is used to classify test data, thereby realizing whistle recognition, wherein: the classifier is implemented with the open-source lightweight gradient boosting framework; the sample data is obtained by mixing the non-whistle data in the public ESC-50 dataset with whistle data. Compared with the prior art, the present invention takes less time and recognizes more accurately.

Description

Whistle recognition method based on machine learning
Technical field
The present invention relates to a technique in the field of ambient sound recognition, and specifically to a whistle recognition method based on machine learning.
Background technique
As a component of ambient sound, vehicle horn (whistle) sound is naturally one subject of ambient sound research. In the illegal-honking capture systems that have emerged in the network era, whistle sounds must be recognized quickly and accurately so that cameras can capture the offending vehicle in time, while interference from other environmental sounds must be excluded to prevent misjudgment. The requirements on whistle recognition are therefore very high.
An existing whistle recognition method under complex noise first acquires an original training sample library with microphones and selects a training sample set, then trains an HMM model to obtain a model library, and finally classifies test samples with the model to obtain the final recognition result. This approach obtains a high-quality training dataset with little manual annotation, which mitigates the difficulty of selecting training samples caused by the complexity of vehicle sounds and thereby improves recognition accuracy. However, such techniques generally use samples from non-public datasets, their algorithms are complex, and the required time is long.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention proposes a whistle recognition method based on machine learning. Using a machine learning algorithm (lightGBM), it recognizes whistles more accurately than the prior art and resolves the cases in which existing methods misjudge non-whistle data.
The present invention is achieved by the following technical solutions:
The present invention generates sample data by mixing a public dataset with whistle data, trains a classifier on the MFCC features extracted from the samples, and in the online stage uses the trained classifier to classify test data, thereby realizing whistle recognition.
The classifier is implemented with the open-source lightweight gradient boosting framework (LightGBM).
The sample data is obtained by mixing the non-whistle data in the public ESC-50 dataset with whistle data, and comprises 11,636 whistle samples and 6,359 non-whistle samples.
The extraction refers to: a Fourier transform is applied to the sample data, followed by a logarithm operation and an inverse Fourier transform, yielding the mel cepstrum, i.e. the envelope of the sound spectrum. This envelope contains the formant information of the sound and also corresponds to the low-frequency part of the signal, which is key information for distinguishing sounds; cepstral analysis is therefore highly significant.
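Stated as a formula for reference, this is the standard (real) cepstrum construction applied to each windowed frame $x[n]$:

$$c[n] = \mathcal{F}^{-1}\bigl\{\, \log \lvert \mathcal{F}\{x[n]\} \rvert \,\bigr\}$$

where $\mathcal{F}$ denotes the Fourier transform; taking the magnitude spectrum, its logarithm, and transforming back isolates the slowly varying spectral envelope.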
The extraction specifically includes: framing, windowing, discrete Fourier transform, mel-frequency conversion, log nonlinear transform, and discrete cosine transform, where the sample rate is 22050 Hz and the number of MFCCs returned is 20.
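As a minimal sketch (not the patent's own code), this pipeline can be reproduced with librosa, whose MFCC implementation chains exactly these steps; the file name and the frame-averaging pooling are assumptions:

    import librosa
    import numpy as np

    # Load and resample to the 22050 Hz rate used in the embodiment.
    y, sr = librosa.load("whistle_sample.wav", sr=22050)

    # Framing, windowing, DFT, mel conversion, log and DCT happen inside;
    # 20 coefficients are returned per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

    # One fixed-length vector per clip, e.g. by averaging over frames
    # (the patent does not state the pooling step).
    feature = np.mean(mfcc, axis=1)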
The training refers to: based on a decision-tree algorithm, recognition is first performed with regression trees; trees are then added iteratively so that each new tree focuses on the misclassifications of the previous ensemble of trees; the predictions of the multiple trees are combined to optimize the objective function, and the parameters of the added trees are adjusted by gradient descent.
The objective function is logloss (logarithmic loss).
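For a binary label $y_i \in \{0, 1\}$ and predicted whistle probability $p_i$ over $N$ samples, logloss takes the standard form:

$$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr]$$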
Classifying the test data refers to: the MFCC features of the test data are input into the trained classifier to obtain a prediction; when the prediction is greater than 0.5 the sample is judged to be a whistle, otherwise non-whistle.
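A sketch of this online-stage decision rule, assuming the scikit-learn style interface of the trained classifier clf and a 20-dimensional MFCC feature vector:

    import numpy as np

    def classify(clf, mfcc_feature: np.ndarray) -> str:
        # Probability of the positive (whistle) class for a single sample.
        prob = clf.predict_proba(mfcc_feature.reshape(1, -1))[0, 1]
        return "whistle" if prob > 0.5 else "non-whistle"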
Technical effect
Compared with the prior art, the present invention uses the ambient sounds in a public dataset as non-whistle data to supplement the ambient-sound dataset recorded in real street environments, so that the whistle and non-whistle data are balanced in distribution; this in turn improves the recognition rate on non-whistle data and achieves high performance. Compared with the prior art, the lightGBM-based method of the present invention obtains better recognition results with shorter training time.
Detailed description of the invention
Fig. 1 is the flow chart of the whistle recognition of the present invention based on machine learning;
Fig. 2 is a schematic diagram of the relation between mel frequency and hertz.
Specific embodiment
The present embodiment specifically includes the following steps:
Step 1. Road-noise-like data is selected from ESC-50 and added to the experimental dataset, forming a whistle dataset with a relatively balanced distribution. The dataset is then divided into five folds for five-fold cross-validation, as sketched below.
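For reference, a minimal sketch of this split, assuming scikit-learn's StratifiedKFold and placeholder arrays sized to the 11,636 whistle and 6,359 non-whistle samples described above (the real features come from step 2):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.random.rand(11636 + 6359, 20)                  # placeholder MFCC features
    y = np.concatenate([np.ones(11636), np.zeros(6359)])  # 1 = whistle, 0 = non-whistle

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    folds = list(skf.split(X, y))  # five (train_idx, test_idx) pairs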
Step 2. The dataset is processed and MFCC features are extracted:
Step 2.1) Framing: to facilitate analysis, each segment of sound is divided into small fragments, each of which is a frame. Each frame covers a very short duration and can be regarded as stationary, which aids the subsequent analysis. In addition, to reduce the variation between consecutive frames, an overlap region is usually set between adjacent frames; its size depends on the circumstances.
Step 2.2) Windowing: each frame is multiplied by a window function; to eliminate signal discontinuities at the two ends of each frame, the values outside the window are set to 0.
Common window functions include the rectangular window, the Hamming window and the Hanning window. The present embodiment uses the Hamming window, because its weighting coefficients give smaller sidelobes; at the same time, the Hamming window has a smoothing effect, mitigating the sidelobe magnitude and spectral leakage after the Fourier transform.
Step 2.3) Fast Fourier transform (FFT): the time-domain representation of a signal is limited, containing only the amplitude information of the sound, while most of a signal's characteristics are hidden in its frequency-domain representation; the speech signal therefore needs to be transformed to the frequency domain for subsequent analysis.
A discrete Fourier transform is applied to each framed and windowed frame to obtain the amplitude distribution of that frame over frequency: the higher the energy, the more salient and the more important that region of the spectrogram.
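A minimal numpy sketch of steps 2.1-2.3 follows; the frame length and hop size are assumptions, since the embodiment leaves them open:

    import numpy as np

    def frame_spectra(y, frame_len=2048, hop=512):
        # Step 2.1: split the signal into overlapping frames.
        n = 1 + (len(y) - frame_len) // hop
        frames = np.stack([y[i * hop: i * hop + frame_len] for i in range(n)])
        # Step 2.2: apply a Hamming window (values outside the window are 0).
        frames = frames * np.hamming(frame_len)
        # Step 2.3: magnitude spectrum of each frame via the FFT.
        return np.abs(np.fft.rfft(frames, axis=1))

    spectra = frame_spectra(np.random.randn(22050))  # one second at 22050 Hz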
Step 2.4) Mel-frequency conversion:
The mel scale is a nonlinear frequency scale based on human auditory perception: equal steps on it are perceived as equal changes in pitch. Mel frequency and hertz are nonlinearly related: as frequency increases, mel frequency grows ever more slowly, so a mel interval of fixed length corresponds to a wider hertz range at high frequencies. Mel filters therefore need wider bandwidths in the high-frequency region.
As shown in Fig. 2, the compression is stronger in the high-frequency region. This not only matches the perceptual analysis of the mel scale above; the filter bank also compresses the frequency-domain amplitudes, so that each frequency band can be represented by a single mel-frequency value, yielding characteristic information of lower complexity and simplifying the features.
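The embodiment does not fix the exact mel-hertz formula; a commonly used mapping is

$$m = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)$$

under which 1000 Hz maps to roughly 1000 mel while 8000 Hz maps to only about 2840 mel, so equal mel intervals cover ever wider hertz ranges at high frequency.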
Step 2.5) Logarithmic nonlinear transform:
The low-frequency part of a sound often hides more information, and the human ear is also relatively sensitive to low frequencies; the main effect of the log transform is to enhance the low-frequency representation of the sound, strengthening the information hidden in the low-frequency part.
Step 2.6) Discrete cosine transform (DCT): a real-valued variant of the Fourier transform. Besides its general orthogonality, the basis vectors of the DCT matrix reflect characteristics of human perception of speech and images. The DCT has a strong energy-compaction property, so most of the energy of a sound or image concentrates in the low-frequency part after the transform; it therefore effectively performs a dimensionality reduction on each frame of sound data. By the definition of the cepstrum, an inverse Fourier transform followed by a low-pass filter would be needed to obtain the low-frequency information of the sound; the DCT yields the low-frequency information of the spectrum directly and thus replaces the inverse Fourier transform. Through the DCT we obtain a series of cepstral vectors describing the speech signal, and each vector is the MFCC feature vector of one frame.
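A sketch of this step, assuming scipy's DCT-II as used in typical MFCC implementations, with a placeholder 40-band log-mel output standing in for the result of steps 2.4-2.5:

    import numpy as np
    from scipy.fftpack import dct

    log_mel = np.log(np.random.rand(40) + 1e-10)        # placeholder log filterbank energies
    mfcc_vec = dct(log_mel, type=2, norm="ortho")[:20]  # keep the 20 low-order coefficients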
Step 3. Model training is performed with the extracted features and the model is saved; an SVM model is used for comparison in the present embodiment. The specific steps are: first standardize the data, then set the parameter configuration, where the parameters are: nthread=4; n_estimators=10000; learning_rate=0.02; num_leaves=32; colsample_bytree=0.9497036; subsample=0.8715623; max_depth=8; reg_alpha=0.04; reg_lambda=0.073; min_split_gain=0.0222415; min_child_weight=40; silent=True; verbose=-1. Training of the lightGBM model can then begin.
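A sketch of this configuration, assuming the scikit-learn style LGBMClassifier interface (the parameter names, e.g. silent, follow the older lightGBM versions in which they were accepted); X_train and y_train stand in for the standardized MFCC features and labels:

    import numpy as np
    from lightgbm import LGBMClassifier

    X_train = np.random.rand(1000, 20)       # placeholder standardized features
    y_train = np.random.randint(0, 2, 1000)  # placeholder labels

    clf = LGBMClassifier(
        nthread=4, n_estimators=10000, learning_rate=0.02, num_leaves=32,
        colsample_bytree=0.9497036, subsample=0.8715623, max_depth=8,
        reg_alpha=0.04, reg_lambda=0.073, min_split_gain=0.0222415,
        min_child_weight=40, silent=True, verbose=-1)
    clf.fit(X_train, y_train)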
The SVM model is trained as follows: first, grid search is used to find the optimal SVM parameters gamma and C; the data is then standardized, after which training can begin. The parameters used in training are gamma=0.001 and C=1000.
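A sketch of this baseline, assuming scikit-learn; the grid values are illustrative, while gamma=0.001 and C=1000 are the optima reported above:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X = StandardScaler().fit_transform(np.random.rand(1000, 20))  # placeholder
    y = np.random.randint(0, 2, 1000)

    grid = GridSearchCV(SVC(), {"gamma": [1e-4, 1e-3, 1e-2],
                                "C": [10, 100, 1000]}, cv=5)
    grid.fit(X, y)                            # grid search for gamma and C
    svm = SVC(gamma=0.001, C=1000).fit(X, y)  # train with the reported optimum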
Step 4. Feature extraction is applied to the test dataset; this operation must be identical to the one applied to the training dataset. The test-set features are input into the trained model for testing, with the following results:
The five-fold cross-validation results of the SVM model are: 0.946, 0.947, 0.966, 0.949, 0.961.
The five-fold cross-validation results of the lightGBM model are: 0.968, 0.979, 0.979, 0.979, 0.978.
In summary, lightGBM outperforms SVM (its recognition accuracy is higher). Moreover, the training time of each fold of the SVM model (five folds in total) is about 1 hour 35 minutes, while training each fold of the lightGBM model takes only 25 minutes.
The specific implementation above may be locally adjusted in different ways by those skilled in the art without departing from the principle and purpose of the present invention. The protection scope of the present invention is defined by the claims and is not limited by the specific implementation above; each implementation within that scope is bound by the present invention.

Claims (7)

1. A whistle recognition method based on machine learning, characterized in that sample data is generated by mixing a public dataset with whistle data, a classifier is trained by extracting MFCC features from the samples, and in the online stage the trained classifier is used to classify test data, thereby realizing whistle recognition, wherein: the classifier is implemented with an open-source lightweight gradient boosting framework; the sample data is obtained by mixing the non-whistle data in the public ESC-50 dataset with whistle data.
2. The method according to claim 1, characterized in that the extraction refers to: a Fourier transform is applied to the sample data, followed by a logarithm operation and an inverse Fourier transform, yielding the mel cepstrum, i.e. the envelope information of the sound spectrum.
3. The method according to claim 1 or 2, characterized in that the extraction specifically includes: framing, windowing, discrete Fourier transform, mel-frequency conversion, log nonlinear transform and discrete cosine transform.
4. The method according to claim 1, characterized in that the training refers to: based on a decision-tree algorithm, recognition is first performed with regression trees; trees are then added iteratively so that each new tree focuses on the misclassifications of the previous ensemble of trees; the predictions of the multiple trees are combined to optimize the objective function, and the parameters of the added trees are adjusted by gradient descent.
5. The method according to claim 1, characterized in that classifying the test data refers to: the MFCC features of the test data are input into the trained classifier to obtain a prediction; when the prediction is greater than 0.5 the sample is judged to be a whistle, otherwise non-whistle.
6. The method according to claim 1, characterized in that the non-whistle data is the road-noise-like data selected from ESC-50.
7. The method according to claim 3, characterized in that the windowing is realized with a Hamming window, which provides smoothing and mitigates the sidelobe magnitude and spectral leakage after the Fourier transform; the values outside the window are set to 0.
CN201910038606.7A 2019-01-16 2019-01-16 Whistle recognition method based on machine learning Pending CN109584888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910038606.7A CN109584888A (en) 2019-01-16 2019-01-16 Whistle recognition method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910038606.7A CN109584888A (en) 2019-01-16 2019-01-16 Whistle recognition method based on machine learning

Publications (1)

Publication Number Publication Date
CN109584888A true CN109584888A (en) 2019-04-05

Family

ID=65915034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910038606.7A Pending CN109584888A (en) 2019-01-16 2019-01-16 Whistle recognition method based on machine learning

Country Status (1)

Country Link
CN (1) CN109584888A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050027514A1 (en) * 2003-07-28 2005-02-03 Jian Zhang Method and apparatus for automatically recognizing audio data
CN101388688A (en) * 2008-11-05 2009-03-18 北京理工大学 Frequency scanning interference suspending method for direct sequence spread spectrum communication system
CN104916289A (en) * 2015-06-12 2015-09-16 哈尔滨工业大学 Quick acoustic event detection method under vehicle-driving noise environment
CN105447490A (en) * 2015-11-19 2016-03-30 浙江宇视科技有限公司 Vehicle key point detection method based on gradient regression tree and apparatus thereof
CN105845126A (en) * 2016-05-23 2016-08-10 渤海大学 Method for automatic English subtitle filling of English audio image data
CN106203368A (en) * 2016-07-18 2016-12-07 江苏科技大学 A kind of traffic video frequency vehicle recognition methods based on SRC and SVM assembled classifier
CN106373559A (en) * 2016-09-08 2017-02-01 河海大学 Robustness feature extraction method based on logarithmic spectrum noise-to-signal weighting
CN108182949A (en) * 2017-12-11 2018-06-19 华南理工大学 A kind of highway anomalous audio event category method based on depth conversion feature
CN108597505A (en) * 2018-04-20 2018-09-28 北京元心科技有限公司 Audio recognition method, device and terminal device
CN108614155A (en) * 2018-05-31 2018-10-02 许继集团有限公司 A kind of synchronous phasor measuring method and system that Hamming window is added
CN108898227A (en) * 2018-06-15 2018-11-27 成都四方伟业软件股份有限公司 Learning rate calculation method and device, disaggregated model calculation method and device
CN109065030A (en) * 2018-08-01 2018-12-21 上海大学 Ambient sound recognition methods and system based on convolutional neural networks
CN108922514A (en) * 2018-09-19 2018-11-30 河海大学 A kind of robust features extracting method based on low frequency logarithmic spectrum

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENG Chending; LI Shaobo; YAO Yong; YANG Jing: "Environmental sound recognition algorithm based on improved convolutional neural network and dynamically decaying learning rate", Science Technology and Engineering *
LIU Haotian; JIANG Haiyan; SHU Xin; XU Yan; WU Yanlian; GUO Xiaoqing: "Multi-species bird sound recognition method based on feature transfer", Journal of Data Acquisition and Processing *
ZHANG Xiaoxia et al.: "Birdsong recognition in complex environments based on energy detection", Journal of Computer Applications *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110047512A (en) * 2019-04-25 2019-07-23 广东工业大学 A kind of ambient sound classification method, system and relevant apparatus
US11551116B2 (en) * 2020-01-29 2023-01-10 Rohde & Schwarz Gmbh & Co. Kg Signal analysis method and signal analysis module
CN112906795A (en) * 2021-02-23 2021-06-04 江苏聆世科技有限公司 Whistle vehicle judgment method based on convolutional neural network
CN113205830A (en) * 2021-05-08 2021-08-03 南京师范大学 Automobile whistle recognition method based on subband spectral entropy method and PSO-GA-SVM

Similar Documents

Publication Publication Date Title
CN109584888A (en) Whistle recognition method based on machine learning
CN105023573B (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
US20210193149A1 (en) Method, apparatus and device for voiceprint recognition, and medium
WO2001016937A9 (en) System and method for classification of sound sources
EP1569200A1 (en) Identification of the presence of speech in digital audio data
CN109256138A (en) Auth method, terminal device and computer readable storage medium
Huang et al. Intelligent feature extraction and classification of anuran vocalizations
CN106997765B (en) Quantitative characterization method for human voice timbre
KR101888058B1 (en) The method and apparatus for identifying speaker based on spoken word
WO2017045429A1 (en) Audio data detection method and system and storage medium
CN110807585A (en) Student classroom learning state online evaluation method and system
CN113327626A (en) Voice noise reduction method, device, equipment and storage medium
Ramashini et al. Robust cepstral feature for bird sound classification
CN111402922B (en) Audio signal classification method, device, equipment and storage medium based on small samples
CN109997186B (en) Apparatus and method for classifying acoustic environments
Memon et al. Using information theoretic vector quantization for inverted MFCC based speaker verification
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
EP4177885A1 (en) Quantifying signal purity by means of machine learning
CN114302301B (en) Frequency response correction method and related product
Moinuddin et al. Speaker Identification based on GFCC using GMM
KR20190135916A (en) Apparatus and method for determining user stress using speech signal
CN112735477B (en) Voice emotion analysis method and device
Islam et al. Neural-Response-Based Text-Dependent speaker identification under noisy conditions
Hashemi et al. Persian music source separation in audio-visual data using deep learning
Silveira et al. Convolutive ICA-based forensic speaker identification using mel frequency cepstral coefficients and gaussian mixture models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190405