Method for measuring similarity between audio fragments
Technical field
The invention belongs to the technical field of audio retrieval, and specifically relates to a method for measuring the similarity between audio fragments.
Background technology
With the continuous growth of multimedia documents and applications, audio analysis and retrieval techniques are becoming increasingly important. Audio fragment retrieval is an important form of such technology: given a query audio fragment from the user, the task is to automatically retrieve similar audio fragments from an audio repository and rank them from high to low by similarity. Existing audio retrieval techniques generally extract audio features from the fragments, use those features to measure similarity, and then retrieve according to the measured results. Because such methods do not consider differences in the specific content within an audio fragment, and instead use audio features to represent the whole fragment, they cannot effectively measure the similarity of audio content.
The paper "Dominant Feature Vectors Based Audio Similarity Measure", published at the Pacific-Rim Conference on Multimedia in 2004 (authors J. Gu, L. Lu, R. Cai, H.J. Zhang and J. Yang, pages 890-897), proposed an audio feature based on the eigenvectors and eigenvalues of an audio feature matrix: dominant feature vectors. That work arranges the frame features extracted from an audio fragment into a feature frame matrix, computes the autocorrelation matrix of this matrix, and finally uses the eigenvectors and eigenvalues of the autocorrelation matrix as the fragment feature. Because this method is based on statistics over the whole audio fragment, it cannot describe how the content changes within the fragment, which limits the accuracy of audio retrieval.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a method for measuring the similarity between different audio fragments.
To achieve the above purpose, the technical solution adopted by the present invention is a method for measuring similarity between audio fragments, comprising the following steps:
(1) dividing each of the audio fragments to be measured into a plurality of audio units with similar acoustic quality;
(2) calculating the similarity between every pair of audio units taken from the two audio fragments;
(3) measuring the similarity between the two audio fragments according to the results of step (2).
Further, the Bayesian Information Criterion (BIC) is used to divide the audio fragments to be measured into a plurality of audio units with similar acoustic quality.
Further, the following formula is used to calculate the similarity of two audio units:

Sim(s_i, s_j) = exp(-Distance(s_i, s_j) / 2)

where s_i and s_j denote two audio units, and Distance(s_i, s_j) denotes the Euclidean distance between the audio feature vectors of s_i and s_j.
Further, the feature vector of an audio unit is the mean of the audio feature vectors of all frames in that unit.
Further, the feature vector of an audio frame is a 13-dimensional vector composed of the logarithmic energy and the Mel-frequency cepstral coefficients.
Further, the specific steps for measuring the similarity between the two audio fragments are:
a: modeling the similarity measurement of the two audio fragments as a weighted bipartite graph;
b: measuring the similarity between the two audio fragments by optimal matching;
c: calculating the similarity between the two audio fragments with the following formula:

Sim(X, Y) = Σω_ij / max(p, q)

where Σω_ij denotes the maximum total similarity obtained by the optimal matching of the two audio fragments, and p and q denote the number of audio units in the two audio fragments X and Y, respectively.
In addition, the present invention proposes a method of audio fragment retrieval. This method can retrieve audio fragments similar to the query fragment more effectively, and rank them from high to low by similarity, so that audio retrieval technology can play its full role in information retrieval.
To achieve the above purpose, the technical solution adopted is a method of audio fragment retrieval for retrieving, from an audio repository, the audio fragments similar to a query audio fragment, comprising the following steps:
(1) dividing the query audio fragment and the audio fragments in the repository into a plurality of audio units with similar acoustic quality;
(2) calculating the similarity between the audio units of the query fragment and the audio units of each fragment in the repository;
(3) measuring the similarity between the query fragment and each fragment in the repository;
(4) retrieving the audio fragments similar to the query fragment, ranked by similarity from high to low.
Further, the Bayesian Information Criterion (BIC) is used to divide the query audio fragment and the audio fragments in the repository into a plurality of audio units with similar acoustic quality.
Further, the following formula is used to calculate the similarity of two audio units:

Sim(s_i, s_j) = exp(-Distance(s_i, s_j) / 2)

where s_i and s_j denote two audio units, and Distance(s_i, s_j) denotes the Euclidean distance between the audio feature vectors of s_i and s_j. The feature vector of an audio unit is the mean of the audio feature vectors of all frames in that unit, and the feature vector of an audio frame is a 13-dimensional vector composed of the logarithmic energy and the Mel-frequency cepstral coefficients.
Further, the specific steps for measuring the similarity between the query fragment and a fragment in the repository are:
a: modeling the similarity measurement of the two audio fragments as a weighted bipartite graph;
b: measuring the similarity between the two audio fragments by optimal matching;
c: calculating the similarity between the two audio fragments with the following formula:

Sim(X, Y) = Σω_ij / max(p, q)

where Σω_ij denotes the maximum total similarity obtained by the optimal matching of the two audio fragments, and p and q denote the number of audio units in the two audio fragments X and Y, respectively.
The effect of the present invention is that, compared with existing methods, it achieves higher retrieval accuracy, allowing audio retrieval technology to play its full role in information retrieval.
The reason the present invention achieves the above effect is as follows. Addressing the problems of the prior art, the present invention divides audio fragment retrieval into two levels: the audio unit level and the audio fragment level. At the audio unit level, the present invention defines an audio unit as a series of audio frames with similar acoustic quality; each audio fragment is first divided into audio units, and the similarity between the audio units of the two fragments is then measured. At the audio fragment level, based on the unit-level results, the similarity measurement of two fragments is modeled as a weighted bipartite graph, and the similarity of the two fragments is finally measured by optimal matching.
Description of drawings
Fig. 1 is a schematic flowchart of the present invention;
Fig. 2 is a comparison of the recall of the present invention and 3 existing methods;
Fig. 3 is a comparison of the precision of the present invention and 3 existing methods.
Embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the method of the present invention comprises the following steps:
(1) dividing the query audio fragment and the audio fragments in the repository into audio units with similar acoustic quality;
First, the Bayesian Information Criterion (BIC) is used to divide each audio fragment into audio units with similar acoustic quality. For a detailed description of the Bayesian Information Criterion, see "Efficient Audio Segmentation Algorithms based on the BIC" [M. Cettolo and M. Vescovi, IEEE International Conference on Acoustics, Speech and Signal Processing, 2003].
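As a minimal illustration of BIC-based change detection, the sketch below tests each candidate boundary of a 1-dimensional frame-feature sequence: a positive delta-BIC means two Gaussians (one per side) fit better than a single Gaussian, even after the model-complexity penalty. This is a simplification for illustration only; the full method in the cited Cettolo and Vescovi paper uses multivariate features, full covariances, and sliding windows, and the penalty weight `lam` is a tunable assumption.

```python
import math

def delta_bic(x, t, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence x at index t.
    Positive values favour a change point at t."""
    n = len(x)
    def var(seg):
        m = sum(seg) / len(seg)
        return max(sum((v - m) ** 2 for v in seg) / len(seg), 1e-12)
    r = (n * math.log(var(x))
         - t * math.log(var(x[:t]))
         - (n - t) * math.log(var(x[t:])))
    penalty = lam * 0.5 * (1 + 1) * math.log(n)  # 0.5*(d + d(d+1)/2), d = 1
    return 0.5 * r - penalty

def best_boundary(x, min_len=5):
    """Return the index with the highest positive delta-BIC, or None."""
    cands = [(delta_bic(x, t), t) for t in range(min_len, len(x) - min_len)]
    score, t = max(cands)
    return t if score > 0 else None

# two clearly different pseudo-stationary regions
frames = [0.0, 0.1, -0.1, 0.05, 0.0, -0.05, 0.1, 0.0,
          5.0, 5.1, 4.9, 5.05, 5.0, 4.95, 5.1, 5.0]
print(best_boundary(frames))  # -> 8
```

Applied recursively to the segments it produces, this kind of test divides a fragment into the acoustically homogeneous audio units the method requires.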
(2) calculating the similarity between the audio units of the query fragment and the audio units of each fragment in the repository;
The feature vector of an audio frame is a 13-dimensional vector composed of the logarithmic energy and the Mel-frequency cepstral coefficients, and the feature vector of an audio unit is the mean of the audio feature vectors of all frames in that unit. The following formula is then used to calculate the similarity of two audio units:

Sim(s_i, s_j) = exp(-Distance(s_i, s_j) / 2)

where s_i and s_j denote two audio units, and Distance(s_i, s_j) denotes the Euclidean distance between the audio feature vectors of s_i and s_j.
(3) measuring the similarity between the query fragment and each fragment in the repository;
a: the similarity measurement of the two audio fragments is modeled as a weighted bipartite graph, in which the vertices are the audio units of the two fragments and the edge weights are the unit similarities;
b: the similarity between the two audio fragments is measured by optimal matching on this graph;
c: the similarity between the two audio fragments is calculated with the following formula:

Sim(X, Y) = Σω_ij / max(p, q)

where Σω_ij denotes the maximum total similarity obtained by the optimal matching, and p and q denote the number of audio units in the two audio fragments X and Y, respectively.
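Steps a to c can be sketched with a brute-force search over one-to-one assignments, which is feasible for the handful of units a fragment typically contains; a practical system would use the Kuhn-Munkres (Hungarian) algorithm instead. Note two assumptions in this sketch: the normalization by max(p, q), which is one plausible reading since the text names only Σω_ij, p and q, and the toy similarity function used in the demo:

```python
import itertools

def fragment_similarity(units_x, units_y, unit_sim):
    """Weighted-bipartite-graph similarity of two fragments.

    Edge weight w_ij = unit_sim(x_i, y_j); optimal matching picks the
    one-to-one assignment maximising the total weight (brute force here).
    """
    p, q = len(units_x), len(units_y)
    if p > q:  # match the smaller side into the larger one
        units_x, units_y, p, q = units_y, units_x, q, p
    best = max(
        sum(unit_sim(units_x[i], units_y[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(q), p)
    )
    return best / max(p, q)  # assumed normalisation by unit counts

# toy one-number "units" with similarity 1 when equal, 0.5 otherwise
sim = lambda a, b: 1.0 if a == b else 0.5
print(round(fragment_similarity([1, 2], [2, 1, 3], sim), 3))  # (1+1)/3 -> 0.667
```

The one-to-one constraint of optimal matching is what prevents one highly similar unit pair from dominating the fragment-level score.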
(4) retrieving the audio fragments similar to the query fragment, ranked by similarity from high to low.
The following experimental results show that, compared with existing methods, the present invention achieves higher retrieval accuracy, allowing audio retrieval technology to play its full role in information retrieval.
In this embodiment, a database of 1000 audio fragments was built, containing many types of sound, for example animal sounds, speech, vehicle sounds, machine sounds, music, gunshots, and so on. Among these 1000 audio fragments, 500 have one or more similar fragments, while the other 500 occur only once. The 500 audio fragments that have one or more similar fragments are therefore used as the query fragments, in order to verify the correctness of similar-fragment retrieval.
To demonstrate the validity of the present invention, the following 4 methods were tested and compared in the experiments:
1. the present invention;
2. existing method 1: the paper "Dominant Feature Vectors Based Audio Similarity Measure" published at the Pacific-Rim Conference on Multimedia in 2004 (authors J. Gu, L. Lu, R. Cai, H.J. Zhang and J. Yang, pages 890-897);
3. existing method 2: the L2 distance;
4. existing method 3: the paper "Content-based Indexing and Retrieval-by-Example in Audio" published at the IEEE International Conference on Multimedia and Expo in 2000 (authors Z. Liu and Q. Huang).
In all 4 of the above methods, the audio frame feature is the 13-dimensional vector composed of the logarithmic energy and the Mel-frequency cepstral coefficients, so the experimental results below can demonstrate the superiority of the present invention on an equal footing. The key differences between the 4 methods are shown in Table 1:
Table 1: Key differences between the present invention and the existing methods

|                         | The present invention               | Existing method 1        | Existing method 2    | Existing method 3    |
| ----------------------- | ----------------------------------- | ------------------------ | -------------------- | -------------------- |
| Fragment representation | Audio unit features                 | Dominant features        | Audio frame features | Audio frame features |
| Similarity measurement  | Audio unit level and fragment level | Fragment level           | Fragment level       | Fragment level       |
| Measure                 | Optimal matching                    | Dominant feature vectors | K-L distance         | L2 distance          |
The experiments adopted two evaluation indexes from the MPEG-7 standardization activity: the Average Normalized Modified Retrieval Rank (ANMRR) and the Average Recall (AR). AR is similar to the traditional recall measure, while ANMRR, compared with traditional precision, reflects not only the proportion of correct retrieval results but also their rank positions. The smaller the ANMRR value, the higher the correct fragments are ranked; the larger the AR value, the larger the proportion of similar fragments among the top K retrieval results (K being the cutoff of the result list). Therefore, a larger AR indicates better recall of fragment retrieval, and a smaller ANMRR indicates higher retrieval accuracy. Table 2 compares the AR and ANMRR of the above 4 methods over the 500 query fragments.
Table 2: Experimental comparison of the present invention and the existing methods

|       | The present invention | Existing method 1 | Existing method 2 | Existing method 3 |
| ----- | --------------------- | ----------------- | ----------------- | ----------------- |
| AR    | 0.72                  | 0.66              | 0.67              | 0.66              |
| ANMRR | 0.26                  | 0.33              | 0.32              | 0.33              |
As can be seen from Table 2, the present invention achieves better results than the existing methods on both AR and ANMRR. This is mainly because: (1) the present invention builds the similarity of audio fragments on the similarity of audio units, and an audio unit is a series of audio frames with similar acoustic quality, which guarantees the validity of the fragment similarity measurement; (2) the present invention measures the similarity of audio fragments by optimal matching, whose one-to-one matching mechanism guarantees the validity of the fragment-level measure.
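For reference, the NMRR and recall-at-K computations behind these indexes can be sketched as follows. The constants (K = min(4·NG, 2·GTM), penalty rank 1.25·K) follow one common statement of the MPEG-7 definition and should be checked against the standard before reuse:

```python
def nmrr(ranks_found, ng, gtm):
    """Normalised Modified Retrieval Rank for one query (MPEG-7 style).

    ranks_found: 1-based ranks of the ground-truth items that appeared
    in the result list (missing items are penalised); ng: ground-truth
    set size; gtm: max ground-truth size over all queries.
    0.0 = perfect retrieval, 1.0 = nothing relevant found in time.
    """
    k = min(4 * ng, 2 * gtm)
    penalty = 1.25 * k
    ranks = [r if r <= k else penalty for r in ranks_found]
    ranks += [penalty] * (ng - len(ranks_found))  # missed items
    avr = sum(ranks) / ng
    return (avr - 0.5 * (1 + ng)) / (penalty - 0.5 * (1 + ng))

def average_recall(ranks_found, ng, k):
    """Fraction of the ng similar fragments found in the top-k results."""
    return sum(1 for r in ranks_found if r <= k) / ng

print(nmrr([1, 2], ng=2, gtm=2))  # perfect retrieval -> 0.0
print(nmrr([], ng=2, gtm=2))      # nothing found -> 1.0
```

ANMRR and AR are then the means of nmrr and average_recall over all 500 queries.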
To further confirm the validity of the present invention, besides AR and ANMRR, another pair of evaluation indexes was adopted: recall and precision, defined as follows:
recall = number of relevant fragments retrieved / number of all relevant fragments
precision = number of relevant fragments retrieved / number of all fragments retrieved
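These two definitions translate directly into code; the sketch below operates on sets of fragment identifiers (the identifiers shown are made up for illustration):

```python
def recall_precision(retrieved, relevant):
    """recall = |retrieved & relevant| / |relevant|
    precision = |retrieved & relevant| / |retrieved|"""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(relevant), hits / len(retrieved)

r, p = recall_precision(retrieved=["a", "b", "c", "x"],
                        relevant=["a", "b", "c", "d"])
print(r, p)  # 0.75 0.75
```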
As shown in Figs. 2 and 3, the present invention achieves better results than the existing methods on both recall and precision. Therefore, both groups of evaluation indexes, AR and ANMRR as well as recall and precision, fully prove the outstanding effect of the present invention in audio fragment retrieval.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to encompass them as well.
Note: this work was supported by a grant of the National Natural Science Foundation of China (project no. 60503062).