CN1924850A - Audio fast search method - Google Patents

Info

Publication number
CN1924850A
Authority
CN
China
Prior art keywords
audio
histogram
frequency
subband
target audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200510086315
Other languages
Chinese (zh)
Other versions
CN100424692C (en)
Inventor
梁伟
张树武
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CNB2005100863153A
Publication of CN1924850A
Application granted
Publication of CN100424692C
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a fast audio search method based on time-frequency-domain features. Its characteristics are: the subband energy ratio of the audio signal is used as the basic feature and a histogram as the modeling method to detect where the target audio appears; suitable subbands are selected so that the signal in those bands is, in a statistical sense, the most robust to noise and distortion; the VQ quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio; a widely used histogram matching algorithm is applied; and performance evaluation criteria for audio search algorithms are proposed, together with objective evaluation parameters.

Description

Audio fast search method
Technical field
The present invention relates to the technical field of multimedia audio search systems, and more specifically to a fast audio search method.
Background technology
At present, the information industry is developing at an unprecedented pace. Information media such as television, radio, the Internet and wireless communications have also grown rapidly, and every day they carry a huge volume of information. How to manage and monitor this information effectively, so as to guarantee national information security, is gradually gaining national attention. A sensitive-audio monitoring system based on time-frequency-domain audio processing is intended to meet the requirements for monitoring sensitive audio in the field of information security.
Summary of the invention
The present invention proposes a robust fast audio search method that is highly robust to distortions such as noise. Its most basic feature is time-frequency-domain processing of the spectrum. Normalizing the spectrum gives the feature vectors strong robustness and discriminative power. From the processed spectrum, subband energy ratio histograms are built, and histogram-overlap matching is used to quickly estimate the candidate positions of the target audio.
The fast audio search method is based on a time-frequency-domain description of the spectrum. Its essential idea is to use the subband energy ratio of the audio signal as the basic feature and a histogram as the modeling method, and to detect the positions where the target audio appears by skipping through the stream, which gives a very high search speed. The method has four characteristics. First, suitable subbands are selected so that the signal in those bands is, in a statistical sense, the most robust to noise and distortion. Second, the VQ quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio. Third, the method borrows the histogram matching algorithm widely used in image recognition; normalizing the subband energy signal avoids the false detections and misses that distortions such as background noise cause in conventional methods, and the computational cost is very small. Fourth, performance evaluation criteria for audio search algorithms are proposed, and objective evaluation parameters for the retrieval results are designed and analyzed. Experiments show that the proposed algorithm not only achieves good retrieval accuracy and search speed under stationary background noise, but is also robust to non-stationary noise.
The fast audio search method can quickly locate target audio segments of interest in massive unknown audio streams. The flow chart is shown in Fig. 1, and the steps are:
1) First, features are extracted from the target audio segment and from the audio stream. The audio is filtered with band-pass filters, and the subband energy is computed from the signal in each passband; each frame contains 256 samples, with a frame shift of 128 samples. The frequency subbands are uniformly distributed on a logarithmic frequency axis;
2) From the subband energies computed in 1), the subband energy ratios of the target audio segment and of the audio stream are computed and used as the basic feature vectors;
3) To improve the robustness of the features to noise, the feature vectors computed in 2) are quantized. The quantization boundaries of each dimension are chosen so that the target audio's features in that dimension fall into the bins in equal numbers. A histogram model is built from the quantized feature vectors, and the quantization boundaries of every dimension are recorded. The feature vectors of the test audio stream are quantized with the quantization boundaries of the target audio;
4) The histogram of the target audio slides along the features of the audio stream. A histogram is built at the current position of the stream and matched against the target audio histogram to obtain a similarity. If the similarity exceeds a given threshold, the target audio is considered found at that position; otherwise the search jumps to the next possible position, estimated from the current similarity, for the next match.
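The framing and subband layout of step 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `frame_starts` and `log_band_edges` and the band limits of 100 Hz and 3200 Hz are assumptions chosen for the example. Only the 256-sample frame, the 128-sample frame shift and the uniform spacing on a logarithmic frequency axis come from the text.

```python
def frame_starts(num_samples, frame_len=256, hop=128):
    """Start indices of all complete frames (256-sample frames, 128-sample shift)."""
    if num_samples < frame_len:
        return []
    return list(range(0, num_samples - frame_len + 1, hop))


def log_band_edges(f_low, f_high, num_bands):
    """Band edges spaced uniformly on a logarithmic frequency axis.

    f_low and f_high are hypothetical limits; the text does not state them.
    """
    ratio = (f_high / f_low) ** (1.0 / num_bands)
    return [f_low * ratio ** k for k in range(num_bands + 1)]


# Example: a 1024-sample signal and 5 bands between 100 Hz and 3200 Hz.
starts = frame_starts(1024)
edges = log_band_edges(100.0, 3200.0, 5)
print(starts)
print([round(e, 1) for e in edges])
```

With log spacing, every band spans the same frequency ratio, so low frequencies get narrower bands than high frequencies.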
The invention mainly comprises three modules: (1) feature extraction, (2) histogram construction and (3) similarity measurement. Each is described in detail below.
Feature extraction. The method uses the subband energy ratio as the basic feature. The subband energy ratio describes, at each instant, the distribution trend of the energies of the subbands. To improve the robustness of the feature, the subband energy ratio is vector quantized; the quantization boundaries are chosen so that each dimension of the target audio's features has an equal number of features in each bin. The quantization boundaries and the quantized feature vectors are stored in a file.
The feature can be expressed as:
$$\mathrm{Feature}(n) = (f(n), g(n)) \tag{5}$$
$$f(n) = (f_1(n), f_2(n), f_3(n), \ldots, f_M(n)) \tag{6}$$
$$g(n) = (g_1(n), g_2(n), g_3(n), \ldots, g_M(n)) \tag{7}$$
where $n$ denotes time and $M$ denotes the number of frequency bands of the feature vector.
$$f_i(n) = \alpha(n) \times E_i(n) \tag{8}$$
$$g_i(n) = \beta(n) \times \mathrm{ECR}_i(n) \tag{9}$$
$$\mathrm{ECR}_i(n) = \frac{E_i(n) - E_i(n-1)}{E_i(n-1)} \tag{10}$$
where $E_i(n)$ denotes the frame energy of the output of the $i$-th band-pass filter for frame $n$. Because short-time energy is rather sensitive to high signal levels, the short-time average magnitude is used to measure the amplitude variation of the audio signal. It is defined as:
$$E_i(n) = \sum_{t=nN}^{(n+1)N} |g_i(t)| \tag{11}$$
$\alpha(n)$ normalizes each feature vector so as to eliminate the influence of volume, and is defined as:
$$\alpha(n) = \frac{1}{\max_i(E_i(n))} \tag{12}$$
$$\beta(n) = \frac{1}{\max_i(\mathrm{ECR}_i(n))} \tag{13}$$
where $\max$ denotes taking the maximum.
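Equations (8) to (10), (12) and (13) can be illustrated with a short sketch. It assumes the subband magnitudes $E_i(n)$ have already been computed and are passed in as a list of frames; the function name `subband_features` and the `eps` guard against division by zero are assumptions for the example and are not part of the formulas. The text does not say how a zero maximum should be handled, so the sketch simply sets the scale factor to 0 in that case.

```python
def subband_features(E, eps=1e-12):
    """Compute (f(n), g(n)) of Eqs. (8)-(10), (12), (13) for frames n >= 1.

    E[n][i] is the short-time average magnitude of band i in frame n.
    eps only guards against division by zero (an assumption of this sketch).
    """
    f_all, g_all = [], []
    for n in range(1, len(E)):
        cur, prev = E[n], E[n - 1]
        # Eq. (10): relative change of each band's energy between frames.
        ecr = [(c - p) / max(p, eps) for c, p in zip(cur, prev)]
        # Eqs. (12), (13): scale factors taken from the per-frame maxima.
        max_e, max_ecr = max(cur), max(ecr)
        alpha = 1.0 / max_e if abs(max_e) > eps else 0.0
        beta = 1.0 / max_ecr if abs(max_ecr) > eps else 0.0
        f_all.append([alpha * c for c in cur])   # Eq. (8)
        g_all.append([beta * r for r in ecr])    # Eq. (9)
    return f_all, g_all


# Example with 2 bands and 3 frames of made-up magnitudes.
E = [[1.0, 2.0], [2.0, 2.0], [4.0, 1.0]]
f, g = subband_features(E)
print(f)
print(g)
```

Because $f(n)$ is divided by the largest band magnitude of the frame, scaling the whole signal by a constant gain leaves $f(n)$ unchanged, which is the volume invariance the text refers to.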
To improve the robustness of the feature, the subband energy ratio is vector quantized. The quantization boundaries are determined from the distribution of the subband energy ratios of the target audio: for each dimension, the boundaries are chosen so that the target audio's features fall into the bins in equal numbers.
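The equal-count criterion can be sketched for a single feature dimension as follows. The helper names `equal_count_boundaries` and `quantize` are hypothetical, and the choice of sorted-sample quantiles as boundaries is one plausible reading of the criterion rather than the patent's stated procedure. In a full system the same steps would be repeated for every dimension, and the boundaries learned from the target audio would be reused for the test stream.

```python
import bisect


def equal_count_boundaries(values, num_bins):
    """Interior bin boundaries chosen so each bin receives about the
    same number of the target audio's feature values."""
    s = sorted(values)
    return [s[(k * len(s)) // num_bins] for k in range(1, num_bins)]


def quantize(value, boundaries):
    """Map a feature value to its bin index using fixed boundaries."""
    return bisect.bisect_right(boundaries, value)


# Example: 8 target feature values quantized into 4 bins.
target_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
bounds = equal_count_boundaries(target_values, 4)
codes = [quantize(v, bounds) for v in target_values]
counts = [codes.count(b) for b in range(4)]
print(bounds, codes, counts)
```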
Histogram construction and similarity measurement. After feature extraction, a model must be built for each audio segment. There are many modeling methods; histogram matching is adopted because it is computationally cheap and fairly robust to noise.
In addition, to increase the temporal discriminability of the template, the target audio of duration t is divided evenly into n subwindows, and a histogram, denoted $h_i^R$, is built for each subwindow.
The distance measure is histogram overlap. For example, the distance between the target audio histogram and the histogram at time $n$ in the test audio stream can be expressed as:
$$S(h^R, h^T(n)) = \frac{1}{L} \sum_{i=1}^{L} \min(h_i^R, h_i^T(n)) \tag{1}$$
where $h^R$ is the histogram of the reference audio, $h^T(n)$ is the histogram of the test audio at time $n$, and $L$ is the number of bins in the histogram.
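A minimal sketch of the histogram overlap of Eq. (1) is shown below. The function names and the example codes are assumptions made for illustration; the division by the number of bins $L$ follows the formula as printed.

```python
def build_histogram(codes, num_bins):
    """Count how many quantized feature codes fall into each bin."""
    hist = [0] * num_bins
    for c in codes:
        hist[c] += 1
    return hist


def histogram_overlap(h_ref, h_test):
    """Eq. (1): sum of bin-wise minima, divided by the number of bins L."""
    if len(h_ref) != len(h_test):
        raise ValueError("histograms must have the same number of bins")
    L = len(h_ref)
    return sum(min(a, b) for a, b in zip(h_ref, h_test)) / L


# Example: quantized codes of a reference window and a test window.
h_ref = build_histogram([0, 0, 2, 3], 4)
h_test = build_histogram([0, 1, 2, 3], 4)
similarity = histogram_overlap(h_ref, h_test)
print(h_ref, h_test, similarity)
```

The measure only needs one comparison per bin, which is why the text describes histogram matching as computationally cheap.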
Because the similarity between histograms is correlated with the sliding position of the histogram, the similarity at time $n_1$ can be used to estimate an upper bound on the similarity at time $n_2$. If the estimate is below the specified threshold, the matching computation at that point can be skipped, which reduces the amount of computation. The prediction formula is:
$$S_{up}(h_i^R, h_i^T(n_2)) = S(h_i^R, h_i^T(n_1)) + \frac{n_2 - n_1}{P_i} \tag{2}$$
The skip width of each subwindow can therefore be expressed as:
$$w_i = \begin{cases} \mathrm{floor}(P_i(\theta - S_i)) + 1 & \text{if } S_i < \theta, \\ 1 & \text{otherwise,} \end{cases} \tag{3}$$
where $w_i$ denotes the skip width and $\mathrm{floor}(x)$ denotes the largest integer not exceeding $x$. The final skip width $w$ can be expressed by the following formula:
$$w = \max_i (w_i) \tag{4}$$
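The skip rule of Eqs. (3) and (4) and the resulting search loop can be sketched as follows. Several points are assumptions, since the text does not define them: $P_i$ is read as the number of frames in subwindow $i$, $\theta$ as the detection threshold, and a position is counted as a hit only when every subwindow similarity exceeds $\theta$. The names `skip_width`, `search` and `similarity_at` are hypothetical.

```python
import math


def skip_width(similarities, frames_per_window, theta):
    """Eqs. (3)-(4): per-subwindow skip w_i, then w = max_i(w_i)."""
    widths = []
    for s_i, p_i in zip(similarities, frames_per_window):
        if s_i < theta:
            widths.append(math.floor(p_i * (theta - s_i)) + 1)
        else:
            widths.append(1)
    return max(widths)


def search(num_positions, similarity_at, frames_per_window, theta):
    """Slide over the stream, skipping positions that cannot reach theta.

    similarity_at(pos) returns the list of subwindow similarities at pos.
    """
    hits, pos = [], 0
    while pos < num_positions:
        sims = similarity_at(pos)
        if min(sims) > theta:  # assumption: all subwindows must match
            hits.append(pos)
        pos += skip_width(sims, frames_per_window, theta)
    return hits


# Example: two subwindows of 100 frames each, threshold 0.75.
w = skip_width([0.5, 1.0], [100, 100], 0.75)
print(w)
```

Under this reading of $P_i$, the bound in Eq. (2) reflects that shifting a window by one frame replaces at most one frame in it, so the similarity can rise by at most $1/P_i$ per step; a window far below the threshold therefore cannot match within the next few positions.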
Algorithm performance evaluation. The performance of the algorithm is verified by counting the occurrences of advertisements in television programmes. If the detected position of a target advertisement differs from its actual broadcast position by no more than 1 second, the advertisement is considered correctly detected. Search performance is described by three indicators: precision ξ, recall δ and overall accuracy τ. They are expressed as follows:
Figure A20051008631500086
$$\tau = \frac{2 \times \xi \times \delta}{\xi + \delta}$$
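The evaluation can be sketched as follows. The definitions of ξ and δ survive only as an image in this text, so the sketch assumes the conventional ones: ξ is the fraction of detections that are correct, and δ is the fraction of actual occurrences that are detected. The 1-second tolerance and the formula for τ come from the text; the function name `evaluate` and the example positions are made up.

```python
def evaluate(detected, actual, tolerance=1.0):
    """Return (xi, delta, tau) for detected and actual positions in seconds.

    A detection is correct if it lies within `tolerance` seconds of an
    actual occurrence that has not already been matched.
    """
    matched = set()
    correct = 0
    for d in detected:
        for k, a in enumerate(actual):
            if k not in matched and abs(d - a) <= tolerance:
                matched.add(k)
                correct += 1
                break
    xi = correct / len(detected) if detected else 0.0     # assumed: precision
    delta = correct / len(actual) if actual else 0.0      # assumed: recall
    tau = 2 * xi * delta / (xi + delta) if xi + delta > 0 else 0.0
    return xi, delta, tau


# Example: 3 detections against 4 actual broadcasts.
xi, delta, tau = evaluate([10.2, 50.0, 99.0], [10.0, 50.5, 70.0, 120.0])
print(round(xi, 3), round(delta, 3), round(tau, 3))
```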
Description of drawings
Fig. 1 is the flow chart of the fast audio retrieval of the present invention.
Fig. 2 shows the short-time energy waveforms of an audio segment after comb filtering.
Fig. 3 shows the energy waveforms of each frequency band after low-pass filtering.
Fig. 4 shows the energy waveforms of each frequency band after normalization.
Embodiment
Fig. 1 shows the fast audio retrieval flow. The test audio and the reference audio are first comb filtered with a comb filter bank and processed to obtain feature vectors; a histogram is then built for the reference audio; finally, the test audio is searched using the reference audio histogram. Each jump is closely related to the current matching similarity of the search window.
Fig. 2 shows the subband short-time energy waveforms obtained after the audio segment passes through the comb filter bank. Different colours represent the energy waveforms of different frequency bands.
Fig. 3 shows the short-time energy curves obtained after the subband short-time energy waveforms pass through a low-pass smoothing filter.
Fig. 4 shows the normalized short-time energy curves finally obtained by normalizing, along the frequency axis, the short-time energy curves output by the low-pass smoothing filter.
Table 1: Comparison of experimental retrieval results

Search method                     Advertisement duration  Precision  Recall  Correctness  Search time
Correlation coefficient matching  <= 5 s                  64.8%      78.2%   71.5%        21 h 56 min
                                  6-10 s                  91.1%      88.5%   89.8%
                                  11-20 s                 97.1%      85.8%   91.4%
                                  > 20 s                  100%       89.7%   94.8%
Histogram method                  <= 5 s                  93.0%      94.2%   93.6%        30 min 14 s
                                  6-10 s                  95.3%      96.0%   95.7%
                                  11-20 s                 99.2%      97.5%   98.4%
                                  > 20 s                  100%       98.2%   99.1%
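The two search times reported in Table 1 can be compared with a few lines of arithmetic. The variable names are chosen for the example; the durations are the ones listed in the table.

```python
# Search times from Table 1, converted to seconds.
correlation_s = 21 * 3600 + 56 * 60   # 21 hours 56 minutes
histogram_s = 30 * 60 + 14            # 30 minutes 14 seconds

speedup = correlation_s / histogram_s
print(correlation_s, histogram_s, round(speedup, 1))
```

On these figures the histogram method is roughly 43 to 44 times faster than correlation coefficient matching.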

Claims (3)

1. A fast audio search method, which uses the subband energy ratio of the audio signal as the basic feature and a histogram as the modeling method, and detects the positions where the target audio appears by skipping, characterized in that: first, suitable subbands are selected so that the signal in those bands is, in a statistical sense, the most robust to noise and distortion; second, the VQ quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio; third, the histogram matching algorithm widely used in image recognition is borrowed, and normalizing the subband energy signal avoids the false detections and misses caused in conventional methods by distortion from background noise interference, with a very small computational cost; fourth, performance evaluation criteria for audio search algorithms are proposed, and objective evaluation parameters for the retrieval results are designed and analyzed.
2. The fast audio search method according to claim 1, characterized in that the method can quickly locate target audio segments of interest in massive unknown audio streams, the steps being:
1) First, features are extracted from the target audio segment and from the audio stream. The audio is filtered with band-pass filters, and the subband energy is computed from the signal in each passband; each frame contains 256 samples, with a frame shift of 128 samples. The frequency subbands are uniformly distributed on a logarithmic frequency axis;
2) From the subband energies computed in 1), the subband energy ratios of the target audio segment and of the audio stream are computed and used as the basic feature vectors;
3) To improve the robustness of the features to noise, the feature vectors computed in 2) are quantized. The quantization boundaries of each dimension are chosen so that the target audio's features in that dimension fall into the bins in equal numbers. A histogram model is built from the quantized feature vectors, and the quantization boundaries of every dimension are recorded. The feature vectors of the test audio stream are quantized with the quantization boundaries of the target audio;
4) The histogram of the target audio slides along the features of the audio stream. A histogram is built at the current position of the stream and matched against the target audio histogram to obtain a similarity. If the similarity exceeds a given threshold, the target audio is considered found at that position; otherwise the search jumps to the next possible position, estimated from the current similarity, for the next match.
3. The fast audio search method according to claim 1 or 2, characterized by feature extraction, histogram construction and similarity calculation:
1) Feature extraction
The method uses the subband energy ratio as the basic feature. The subband energy ratio describes, at each instant, the distribution trend of the energies of the subbands. To improve the robustness of the feature, the subband energy ratio is vector quantized; the quantization boundaries are chosen so that each dimension of the target audio's features has an equal number of features in each bin. The quantization boundaries and the quantized feature vectors are stored in a file.
2) Histogram construction and similarity measurement
After feature extraction, a model must be built for each audio segment. There are many modeling methods; histogram matching is adopted because it is computationally cheap and fairly robust to noise.
In addition, to increase the temporal discriminability of the template, the target audio of duration t is divided evenly into 4 subwindows, and a histogram, denoted $h_i^R$, is built for each subwindow.
The distance measure is histogram overlap. For example, the distance between the target audio histogram and the histogram at time $n$ in the test audio stream can be expressed as:
$$S(h^R, h^T(n)) = \frac{1}{L} \sum_{i=1}^{L} \min(h_i^R, h_i^T(n)) \tag{1}$$
where $h^R$ is the histogram of the reference audio, $h^T(n)$ is the histogram of the test audio at time $n$, and $L$ is the number of bins in the histogram.
Because the similarity between histograms is correlated with the sliding position of the histogram, the similarity at time $n_1$ can be used to estimate an upper bound on the similarity at time $n_2$. If the estimate is below the specified threshold, the matching computation at that point can be skipped, which reduces the amount of computation. The prediction formula is:
$$S_{up}(h_i^R, h_i^T(n_2)) = S(h_i^R, h_i^T(n_1)) + \frac{n_2 - n_1}{P_i} \tag{2}$$
The skip width of each subwindow can therefore be expressed as:
$$w_i = \begin{cases} \mathrm{floor}(P_i(\theta - S_i)) + 1 & \text{if } S_i < \theta, \\ 1 & \text{otherwise,} \end{cases} \tag{3}$$
where $w_i$ denotes the skip width and $\mathrm{floor}(x)$ denotes the largest integer not exceeding $x$. The final skip width $w$ can be expressed by the following formula:
$$w = \max_i (w_i). \tag{4}$$
CNB2005100863153A 2005-08-31 2005-08-31 Audio fast search method Expired - Fee Related CN100424692C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100863153A CN100424692C (en) 2005-08-31 2005-08-31 Audio fast search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005100863153A CN100424692C (en) 2005-08-31 2005-08-31 Audio fast search method

Publications (2)

Publication Number Publication Date
CN1924850A true CN1924850A (en) 2007-03-07
CN100424692C CN100424692C (en) 2008-10-08

Family

ID=37817492

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100863153A Expired - Fee Related CN100424692C (en) 2005-08-31 2005-08-31 Audio fast search method

Country Status (1)

Country Link
CN (1) CN100424692C (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103123787A (en) * 2011-11-21 2013-05-29 金峰 Method for synchronizing and exchanging mobile terminal with media
CN104505101A (en) * 2014-12-24 2015-04-08 北京巴越赤石科技有限公司 Real-time audio comparison method
CN110299134A (en) * 2019-07-01 2019-10-01 中科软科技股份有限公司 A kind of audio-frequency processing method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100524065B1 (en) * 2002-12-23 2005-10-26 삼성전자주식회사 Advanced method for encoding and/or decoding digital audio using time-frequency correlation and apparatus thereof
GB2403636A (en) * 2003-07-02 2005-01-05 Sony Uk Ltd Information retrieval using an array of nodes
WO2005010865A2 (en) * 2003-07-31 2005-02-03 The Registrar, Indian Institute Of Science Method of music information retrieval and classification using continuity information

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103123787A (en) * 2011-11-21 2013-05-29 金峰 Method for synchronizing and exchanging mobile terminal with media
CN103123787B (en) * 2011-11-21 2015-11-18 金峰 A kind of mobile terminal and media sync and mutual method
CN104505101A (en) * 2014-12-24 2015-04-08 北京巴越赤石科技有限公司 Real-time audio comparison method
CN104505101B (en) * 2014-12-24 2017-11-03 北京巴越赤石科技有限公司 A kind of real-time audio comparison method
CN110299134A (en) * 2019-07-01 2019-10-01 中科软科技股份有限公司 A kind of audio-frequency processing method and system

Also Published As

Publication number Publication date
CN100424692C (en) 2008-10-08

Similar Documents

Publication Publication Date Title
US7333864B1 (en) System and method for automatic segmentation and identification of repeating objects from an audio stream
CN101477801B (en) Method for detecting and eliminating pulse noise in digital audio signal
CN102097095A (en) Speech endpoint detecting method and device
CN1655229A (en) Apparatus, method, and medium for detecting and discriminating impact sound
CN1957396A (en) Device and method for analyzing an information signal
CN109145727A (en) A kind of bearing fault characteristics extracting method based on VMD parameter optimization
US20160322064A1 (en) Method and apparatus for signal extraction of audio signal
CN111126819B (en) Intelligent analysis method for urban driving condition
CN1384960A (en) Method and means for robust feature extraction for speech recognition
CN1773605A (en) Sound end detecting method for sound identifying system
CN110146922B (en) Single-double seismometer interference identification method for high-speed railway earthquake early warning system
CN110767248B (en) Anti-modulation interference audio fingerprint extraction method
CN102144258A (en) Method and apparatus to facilitate determining signal bounding frequencies
CN1924850A (en) Audio fast search method
CN102759572B (en) A kind of quality determining method of product and pick-up unit
CN106504760A (en) Broadband background noise and speech Separation detecting system and method
CN109102818B (en) Denoising audio sampling algorithm based on signal frequency probability density function distribution
Malik et al. Automatic threshold optimization in nonlinear energy operator based spike detection
CN1870136A (en) Variation Bayesian voice strengthening method based on voice generating model
Blommer et al. Sound quality metric development for wind buffeting and gusting noise
CN117172601A (en) Non-invasive load monitoring method based on residual total convolution neural network
CN110287853B (en) Transient signal denoising method based on wavelet decomposition
WO2021088176A1 (en) Binary multi-band power distribution-based low signal-to-noise ratio sound event detection method
CN101858939B (en) Method and device for detecting harmonic signal
CN102759571B (en) Product quality test process and test device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20081008

Termination date: 20180831