CN1924850A - Audio fast search method - Google Patents
- Publication number
- CN1924850A
- Authority
- CN
- China
- Prior art keywords
- audio
- histogram
- frequency
- subband
- target audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This invention provides a fast audio search method based on time-frequency-domain processing. Its characteristics are as follows: it uses the subband energy ratio of the audio signal as the basic feature and a histogram as the modeling method, and it detects the position where the target audio appears by jumping through the stream; it selects suitable subbands so that the signal in those bands is, in a statistical sense, most robust to noise and distortion; it adjusts the VQ quantization boundaries according to the spectral distribution of the target audio; it uses a widely adopted histogram matching algorithm; and it proposes performance evaluation criteria for audio search algorithms and designs objective evaluation parameters.
Description
Technical field
The present invention relates to the technical field of multimedia audio search systems. More precisely, it relates to a fast audio search method.
Background art
The information industry is currently developing at an unprecedented pace. Information media such as television, radio broadcasting, the Internet, and wireless communication have also developed rapidly, and these media carry a large amount of information every day. How to manage and monitor this information effectively in order to safeguard national information security is receiving growing attention at the national level. A sensitive-audio monitoring system based on time-frequency-domain audio processing is intended to meet the need for monitoring sensitive audio in the information security field.
Summary of the invention
The present invention proposes a robust fast audio search method that is highly robust to distortions such as noise. Its most basic feature is time-frequency-domain processing of the spectrum. Normalizing the spectrum makes the feature vectors both robust and discriminative. From the processed spectrum, subband energy ratio histograms are built, and a histogram-overlap matching method quickly estimates the candidate positions of the target audio.
The fast audio search method is based on a time-frequency-domain description of the spectrum. It uses the subband energy ratio of the audio signal as the basic feature and a histogram as the modeling method, and it detects the position where the target audio appears by jumping through the stream, which gives a very high search speed. The method has four essential characteristics. First, suitable subbands are selected so that the signal in those bands is, in a statistical sense, most robust to noise and distortion. Second, the VQ quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio. Third, the method borrows the histogram matching algorithm widely used in image recognition; normalizing the subband energy signal avoids the false detections and misses that distortions such as background noise cause in conventional methods, and the computational cost is very small. Fourth, it proposes performance evaluation criteria for audio search algorithms and designs and analyzes objective evaluation parameters for the retrieval results. Experiments show that the proposed algorithm achieves good retrieval precision and search speed under stationary background noise and is also robust to non-stationary noise.
The method can quickly locate a target audio segment of interest in a massive unknown audio stream. The flow chart is shown in Fig. 1, and the steps are:
1) First, features are extracted from the target audio segment and from the audio stream. Feature extraction first filters the audio with band-pass filters and then computes the subband energy from the signal in each passband. Subband energy is computed on frames of 256 samples with a frame shift of 128 samples. The frequency subbands are uniformly distributed on a logarithmic frequency axis;
2) From the subband energies computed in 1), the subband energy ratios of the target audio segment and of the audio stream are computed, and the subband energy ratio is used as the basic feature vector;
3) To improve the robustness of the features to noise, the feature vectors computed in 2) are quantized. The quantization boundaries of each dimension are chosen so that the target audio's features in that dimension fall into each bin in equal numbers. A histogram model is built from the quantized feature vectors, and the quantization boundaries of each dimension are recorded. The feature vectors of the test audio stream are quantized with the target audio's quantization boundaries;
4) The target audio histogram slides along the feature sequence of the audio stream, and a histogram of the audio stream is built at the current position. The target audio histogram is matched against the test audio stream histogram to obtain a similarity. If the similarity exceeds a given threshold, the target audio is considered found at that position; otherwise the search jumps to the next possible position, estimated from the current similarity, for the next match.
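The framing and subband energy computation of step 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: a naive DFT stands in for the band-pass filter bank, and the function names, the sampling rate, the number of bands, and the frequency range are hypothetical choices for illustration. Only the 256-sample frame, the 128-sample shift, and the log-uniform band spacing come from the text.

```python
import cmath
import math

FRAME_LEN = 256    # samples per frame (from the text)
FRAME_SHIFT = 128  # frame shift in samples (from the text)

def log_band_edges(f_lo, f_hi, num_bands):
    """Band edges spaced uniformly on a log-frequency axis (num_bands + 1 values)."""
    ratio = (f_hi / f_lo) ** (1.0 / num_bands)
    return [f_lo * ratio ** k for k in range(num_bands + 1)]

def frames(signal):
    """Yield overlapping frames of FRAME_LEN samples, shifted by FRAME_SHIFT."""
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        yield signal[start:start + FRAME_LEN]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (assumed stand-in for a filter bank)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def subband_energies(signal, sample_rate, edges):
    """Return one list of band energies per frame."""
    bin_hz = sample_rate / FRAME_LEN
    result = []
    for frame in frames(signal):
        spec = power_spectrum(frame)
        energies = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            # Sum the power of every DFT bin whose center frequency lies in [lo, hi).
            energies.append(sum(p for k, p in enumerate(spec) if lo <= k * bin_hz < hi))
        result.append(energies)
    return result
```

A real system would likely use an FFT or time-domain filters for speed; the naive DFT is used here only to keep the sketch free of external dependencies.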
The present invention mainly comprises three modules: (1) feature extraction, (2) histogram construction, and (3) similarity measurement. They are described in detail below.
Feature extraction. The method uses the subband energy ratio as the basic feature. The subband energy ratio describes the distribution trend of the subband energies at each time instant. To improve the robustness of the features, the subband energy ratio is vector-quantized. The quantization boundaries are chosen so that the target audio's features in each dimension fall into each bin in equal numbers. The quantization boundaries and the quantized feature vectors are stored in a file.
The feature can be expressed as:
$$\mathrm{Feature}(n) = (f(n),\ g(n)) \tag{5}$$
$$f(n) = (f_1(n), f_2(n), f_3(n), \ldots, f_M(n)) \tag{6}$$
$$g(n) = (g_1(n), g_2(n), g_3(n), \ldots, g_M(n)) \tag{7}$$
where $n$ denotes time and $M$ is the number of frequency bands in the feature vector.
$$f_i(n) = \alpha(n) \times E_i(n) \tag{8}$$
$$g_i(n) = \beta(n) \times \mathrm{ECR}_i(n) \tag{9}$$
$$\mathrm{ECR}_i(n) = \frac{E_i(n) - E_i(n-1)}{E_i(n-1)} \tag{10}$$
where $E_i(n)$ denotes the output frame energy of the $i$-th band-pass filter for frame $n$. Because short-time energy is rather sensitive to high signal levels, the short-time average magnitude is used to measure changes in the amplitude of the audio signal, defined as:
$\alpha(n)$ normalizes each feature vector in order to eliminate the influence of volume, and is defined as:
where max denotes taking the maximum value.
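Equation (10) and the volume normalization can be illustrated with a short sketch. The energy change ratio follows equation (10) directly. The definition of α(n) is not reproduced in this text, so scaling each frame by its largest band energy is only an assumption suggested by the reference to max; the function names and the zero-energy guard are likewise hypothetical.

```python
def energy_change_ratio(prev_energies, cur_energies):
    """Eq. (10): relative change of each band energy between consecutive frames.

    Returning 0.0 when the previous energy is zero is an added guard,
    not part of the formula.
    """
    return [
        (cur - prev) / prev if prev > 0 else 0.0
        for prev, cur in zip(prev_energies, cur_energies)
    ]

def normalize_by_max(energies):
    """Assumed form of alpha(n): scale so the largest band energy in the frame is 1."""
    peak = max(energies)
    return [e / peak for e in energies] if peak > 0 else list(energies)
```

With this assumed normalization, doubling the volume of a frame leaves the normalized vector unchanged, which is the property the text attributes to α(n).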
To improve the robustness of the features, the subband energy ratio is vector-quantized. The vector quantization boundaries are determined from the distribution of the target audio's subband energy ratios: in each dimension, the boundaries are chosen so that the target audio's features fall into each bin in equal numbers.
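The equal-count criterion amounts to placing the boundaries at quantiles of the target audio's feature values. A minimal sketch for one feature dimension follows; the function names are hypothetical, and each dimension would be handled independently in the same way.

```python
from bisect import bisect_right

def equal_count_boundaries(values, num_bins):
    """Interior boundaries so each bin holds about the same number of target values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // num_bins] for k in range(1, num_bins)]

def quantize(value, boundaries):
    """Map a value to its bin index in 0..len(boundaries)."""
    return bisect_right(boundaries, value)
```

The boundaries are computed once from the target audio and then reused unchanged to quantize the test audio stream, as the text describes.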
Histogram construction and similarity measurement. After feature extraction, a model must be built for each audio segment. Many modeling methods exist; histogram matching is adopted because its computational cost is small and it is fairly robust to noise.
Meanwhile, to make the template more discriminative with respect to temporal order, the target audio of duration t is divided evenly into n subwindows, and a histogram, denoted $h_i^R$, is built for each subwindow.
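Building one histogram per subwindow can be sketched as below. This assumes each frame has already been reduced to a hashable quantized code (for example a tuple of per-dimension bin indices); the function name is hypothetical, and dropping leftover frames when the length is not a multiple of the subwindow count is a simplification made here.

```python
from collections import Counter

def subwindow_histograms(codes, num_subwindows):
    """Split a sequence of quantized codes into equal parts and count codes in each."""
    size = len(codes) // num_subwindows
    return [
        Counter(codes[i * size:(i + 1) * size])
        for i in range(num_subwindows)
    ]
```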
The distance metric is the histogram overlap. For example, the distance between the target audio histogram and the histogram of the test audio stream at time n can be expressed as:
where $h^R$ is the histogram of the reference audio, $h_j^T(n)$ is the histogram of the test audio at time $n$, and $L$ is the number of bins in the histogram.
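The overlap formula itself is not reproduced in this text. A common form of histogram overlap is the normalized histogram intersection, which sums the bin-wise minimum of the two histograms and divides by the total count. The sketch below uses that form as an assumption; the function name is hypothetical.

```python
from collections import Counter

def histogram_overlap(h_ref, h_test):
    """Normalized histogram intersection in [0, 1].

    Assumes both histograms were built from the same number of frames.
    """
    total = sum(h_ref.values())
    shared = sum(min(count, h_test.get(code, 0)) for code, count in h_ref.items())
    return shared / total
```

Identical histograms give 1.0 and histograms with no bins in common give 0.0, so the value can be compared directly with a detection threshold.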
Because the similarity between histograms is correlated with the sliding position of the histogram, the similarity at time $n_1$ can be used to estimate an upper bound on the similarity at time $n_2$. If the estimated value is below the specified threshold, the matching computation at that point can be skipped, which reduces the computational cost. The prediction formula is as follows:
The skip step of each subwindow can therefore be expressed as:
where $w_i$ denotes the jump step and floor(x) denotes the largest integer not exceeding x. The final jump step w can be expressed by the following formula:
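The prediction and step formulas are not reproduced in this text. The sketch below shows one bound of this kind under the assumption that the similarity is a normalized histogram intersection over a window of D frames: shifting the window by one frame replaces a single feature, so the similarity can rise by at most 1/D per frame, and a position with similarity S below the threshold θ allows a jump of floor(D(θ − S)) + 1 frames. It handles a single window only; how the per-subwindow steps are combined into the final step is not covered. All names are hypothetical.

```python
import math
from collections import Counter

def overlap_count(h_ref, h_test):
    """Unnormalized histogram intersection (number of shared features)."""
    return sum(min(c, h_test.get(code, 0)) for code, c in h_ref.items())

def skip_width(overlap, window_len, threshold):
    """Largest jump for which no skipped position can exceed the threshold."""
    if overlap / window_len > threshold:
        return 1
    # max(1, ...) guards against a zero step caused by floating-point rounding.
    return max(1, math.floor(window_len * threshold - overlap) + 1)

def search(stream_codes, target_codes, threshold):
    """Return (positions whose similarity exceeds threshold, comparisons made)."""
    d = len(target_codes)
    h_ref = Counter(target_codes)
    hits, comparisons, pos = [], 0, 0
    while pos + d <= len(stream_codes):
        h_test = Counter(stream_codes[pos:pos + d])
        overlap = overlap_count(h_ref, h_test)
        comparisons += 1
        if overlap / d > threshold:
            hits.append(pos)
        pos += skip_width(overlap, d, threshold)
    return hits, comparisons
```

Rebuilding the window histogram at every visited position keeps the sketch short; an implementation concerned with speed would more likely update the histogram incrementally.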
Algorithm performance evaluation. The performance of the algorithm is verified by detecting occurrences of advertisements in television programs. If the detected position of a target advertisement differs from its actual broadcast position by no more than 1 second, the advertisement is considered correctly detected. Search performance is measured by three indices: accuracy ξ, recall rate δ, and overall accuracy τ. They are expressed as follows:
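The formulas for the indices are not reproduced in this text. The sketch below applies the 1-second tolerance from the paragraph above and assumes the usual definitions: accuracy as correct detections divided by all detections, and recall rate as correct detections divided by actual occurrences. The function name and the greedy matching of detections to occurrences are hypothetical choices.

```python
def evaluate(detected, actual, tolerance=1.0):
    """Match detections to true occurrences within `tolerance` seconds.

    Returns (accuracy, recall) under the assumed standard definitions.
    Each true occurrence can be matched by at most one detection.
    """
    unmatched = sorted(actual)
    correct = 0
    for t in sorted(detected):
        match = next((a for a in unmatched if abs(a - t) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            correct += 1
    accuracy = correct / len(detected) if detected else 0.0
    recall = correct / len(actual) if actual else 0.0
    return accuracy, recall
```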
Description of the drawings
Fig. 1 is the flow chart of the fast audio retrieval of the present invention.
Fig. 2 shows the short-time energy waveforms of an audio segment after comb filtering.
Fig. 3 shows the energy waveform of each frequency band after low-pass filtering.
Fig. 4 shows the energy waveform of each frequency band after normalization.
Embodiment
Fig. 1 shows the fast audio retrieval flow. The flow first applies a comb filter bank to the test audio and the reference audio and obtains feature vectors after processing; it then builds histograms for the reference audio; finally it searches the test audio using the reference audio histograms. Each jump is closely related to the matching similarity of the current search window.
Fig. 2 shows the subband short-time energy waveforms obtained after an audio segment passes through the comb filter bank. Different colors represent the energy waveforms of different frequency bands.
Fig. 3 shows the short-time energy curves obtained after the subband short-time energy waveforms pass through a low-pass smoothing filter.
Fig. 4 shows the normalized short-time energy curves finally obtained by normalizing, along the frequency axis, the short-time energy curves produced by the low-pass smoothing filter.
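The low-pass smoothing step between Fig. 2 and Fig. 3 can be illustrated with a simple filter. The text does not specify the filter type, so the centered moving average below is an assumed stand-in, and the function name and window width are hypothetical.

```python
def moving_average(track, width):
    """Smooth one band's short-time energy track with a centered moving average.

    Near the ends the window is truncated to the samples that exist.
    """
    half = width // 2
    smoothed = []
    for i in range(len(track)):
        window = track[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Applied to each band's energy track in turn, this damps frame-to-frame fluctuations before the normalization along the frequency axis.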
Table 1: Comparison of experimental results
Search method | Advertisement duration | Accuracy | Recall rate | Overall accuracy | Search time |
---|---|---|---|---|---|
Correlation coefficient matching | <= 5 s | 64.8% | 78.2% | 71.5% | 21 h 56 min |
Correlation coefficient matching | 6-10 s | 91.1% | 88.5% | 89.8% | |
Correlation coefficient matching | 11-20 s | 97.1% | 85.8% | 91.4% | |
Correlation coefficient matching | > 20 s | 100% | 89.7% | 94.8% | |
Histogram method | <= 5 s | 93.0% | 94.2% | 93.6% | 30 min 14 s |
Histogram method | 6-10 s | 95.3% | 96.0% | 95.7% | |
Histogram method | 11-20 s | 99.2% | 97.5% | 98.4% | |
Histogram method | > 20 s | 100% | 98.2% | 99.1% | |
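The figures in Table 1 can be checked with a little arithmetic. The search times correspond to a speed-up of roughly 43.5 times for the histogram method. The overall accuracy values also appear consistent, within rounding, with the mean of accuracy and recall rate; the text does not reproduce the formula for τ, so that reading is an observation about the table, not a stated definition.

```python
# Search times from Table 1, converted to seconds.
correlation_seconds = 21 * 3600 + 56 * 60   # 21 h 56 min
histogram_seconds = 30 * 60 + 14            # 30 min 14 s
speedup = correlation_seconds / histogram_seconds

# (accuracy %, recall rate %, overall accuracy %) copied from Table 1.
rows = [
    (64.8, 78.2, 71.5), (91.1, 88.5, 89.8), (97.1, 85.8, 91.4), (100.0, 89.7, 94.8),
    (93.0, 94.2, 93.6), (95.3, 96.0, 95.7), (99.2, 97.5, 98.4), (100.0, 98.2, 99.1),
]
# Largest difference between the tabulated value and the mean of the other two.
max_gap = max(abs((a + r) / 2 - t) for a, r, t in rows)

print(f"speed-up: {speedup:.1f}x, largest gap to the mean: {max_gap:.2f} points")
```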
Claims (3)
1. A fast audio search method that uses the subband energy ratio of the audio signal as the basic feature and a histogram as the modeling method, and detects the position where the target audio appears by jumping through the stream, characterized in that: first, suitable subbands are selected so that the signal in those bands is, in a statistical sense, most robust to noise and distortion; second, the VQ quantization boundaries are adjusted adaptively according to the spectral distribution of the target audio; third, the method borrows the histogram matching algorithm widely used in image recognition, and normalizing the subband energy signal avoids the false detections and misses that distortion due to background noise causes in conventional methods, while the computational cost is very small; fourth, performance evaluation criteria for audio search algorithms are proposed, and objective evaluation parameters for the retrieval results are designed and analyzed.
2. The fast audio search method according to claim 1, characterized in that the method can quickly locate a target audio segment of interest in a massive unknown audio stream, the steps being:
1) First, features are extracted from the target audio segment and from the audio stream. Feature extraction first filters the audio with band-pass filters and then computes the subband energy from the signal in each passband. Subband energy is computed on frames of 256 samples with a frame shift of 128 samples. The frequency subbands are uniformly distributed on a logarithmic frequency axis;
2) From the subband energies computed in 1), the subband energy ratios of the target audio segment and of the audio stream are computed, and the subband energy ratio is used as the basic feature vector;
3) To improve the robustness of the features to noise, the feature vectors computed in 2) are quantized. The quantization boundaries of each dimension are chosen so that the target audio's features in that dimension fall into each bin in equal numbers. A histogram model is built from the quantized feature vectors, and the quantization boundaries of each dimension are recorded. The feature vectors of the test audio stream are quantized with the target audio's quantization boundaries;
4) The target audio histogram slides along the feature sequence of the audio stream, and a histogram of the audio stream is built at the current position. The target audio histogram is matched against the test audio stream histogram to obtain a similarity. If the similarity exceeds a given threshold, the target audio is considered found at that position; otherwise the search jumps to the next possible position, estimated from the current similarity, for the next match.
3. The fast audio search method according to claim 1 or 2, characterized by feature extraction, histogram construction, and similarity calculation, as follows:
1) Feature extraction
The method uses the subband energy ratio as the basic feature. The subband energy ratio describes the distribution trend of the subband energies at each time instant. To improve the robustness of the features, the subband energy ratio is vector-quantized. The quantization boundaries are chosen so that the target audio's features in each dimension fall into each bin in equal numbers. The quantization boundaries and the quantized feature vectors are stored in a file.
2) Histogram construction and similarity measurement
After feature extraction, a model must be built for each audio segment. Many modeling methods exist; histogram matching is adopted because its computational cost is small and it is fairly robust to noise.
Meanwhile, to make the template more discriminative with respect to temporal order, the target audio of duration t is divided evenly into 4 subwindows, and a histogram, denoted $h_i^R$, is built for each subwindow.
The distance metric is the histogram overlap. For example, the distance between the target audio histogram and the histogram of the test audio stream at time n can be expressed as:
where $h^R$ is the histogram of the reference audio, $h_j^T(n)$ is the histogram of the test audio at time $n$, and $L$ is the number of bins in the histogram.
Because the similarity between histograms is correlated with the sliding position of the histogram, the similarity at time $n_1$ can be used to estimate an upper bound on the similarity at time $n_2$. If the estimated value is below the specified threshold, the matching computation at that point can be skipped, which reduces the computational cost. The prediction formula is as follows:
The skip step of each subwindow can therefore be expressed as:
where $w_i$ denotes the jump step and floor(x) denotes the largest integer not exceeding x. The final jump step w can be expressed by the following formula:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100863153A CN100424692C (en) | 2005-08-31 | 2005-08-31 | Audio fast search method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100863153A CN100424692C (en) | 2005-08-31 | 2005-08-31 | Audio fast search method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1924850A true CN1924850A (en) | 2007-03-07 |
CN100424692C CN100424692C (en) | 2008-10-08 |
Family
ID=37817492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100863153A Expired - Fee Related CN100424692C (en) | 2005-08-31 | 2005-08-31 | Audio fast search method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100424692C (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100524065B1 (en) * | 2002-12-23 | 2005-10-26 | 삼성전자주식회사 | Advanced method for encoding and/or decoding digital audio using time-frequency correlation and apparatus thereof |
GB2403636A (en) * | 2003-07-02 | 2005-01-05 | Sony Uk Ltd | Information retrieval using an array of nodes |
WO2005010865A2 (en) * | 2003-07-31 | 2005-02-03 | The Registrar, Indian Institute Of Science | Method of music information retrieval and classification using continuity information |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103123787A (en) * | 2011-11-21 | 2013-05-29 | 金峰 | Method for synchronizing and exchanging mobile terminal with media |
CN103123787B (en) * | 2011-11-21 | 2015-11-18 | 金峰 | A kind of mobile terminal and media sync and mutual method |
CN104505101A (en) * | 2014-12-24 | 2015-04-08 | 北京巴越赤石科技有限公司 | Real-time audio comparison method |
CN104505101B (en) * | 2014-12-24 | 2017-11-03 | 北京巴越赤石科技有限公司 | A kind of real-time audio comparison method |
CN110299134A (en) * | 2019-07-01 | 2019-10-01 | 中科软科技股份有限公司 | A kind of audio-frequency processing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN100424692C (en) | 2008-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7333864B1 (en) | System and method for automatic segmentation and identification of repeating objects from an audio stream | |
CN101477801B (en) | Method for detecting and eliminating pulse noise in digital audio signal | |
CN102097095A (en) | Speech endpoint detecting method and device | |
CN1655229A (en) | Apparatus, method, and medium for detecting and discriminating impact sound | |
CN1957396A (en) | Device and method for analyzing an information signal | |
CN109145727A (en) | A kind of bearing fault characteristics extracting method based on VMD parameter optimization | |
US20160322064A1 (en) | Method and apparatus for signal extraction of audio signal | |
CN111126819B (en) | Intelligent analysis method for urban driving condition | |
CN1384960A (en) | Method and means for robust feature extraction for speech recognition | |
CN1773605A (en) | Sound end detecting method for sound identifying system | |
CN110146922B (en) | Single-double seismometer interference identification method for high-speed railway earthquake early warning system | |
CN110767248B (en) | Anti-modulation interference audio fingerprint extraction method | |
CN102144258A (en) | Method and apparatus to facilitate determining signal bounding frequencies | |
CN1924850A (en) | Audio fast search method | |
CN102759572B (en) | A kind of quality determining method of product and pick-up unit | |
CN106504760A (en) | Broadband background noise and speech Separation detecting system and method | |
CN109102818B (en) | Denoising audio sampling algorithm based on signal frequency probability density function distribution | |
Malik et al. | Automatic threshold optimization in nonlinear energy operator based spike detection | |
CN1870136A (en) | Variation Bayesian voice strengthening method based on voice generating model | |
Blommer et al. | Sound quality metric development for wind buffeting and gusting noise | |
CN117172601A (en) | Non-invasive load monitoring method based on residual total convolution neural network | |
CN110287853B (en) | Transient signal denoising method based on wavelet decomposition | |
WO2021088176A1 (en) | Binary multi-band power distribution-based low signal-to-noise ratio sound event detection method | |
CN101858939B (en) | Method and device for detecting harmonic signal | |
CN102759571B (en) | Product quality test process and test device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20081008; termination date: 20180831 |