CN102214218A

CN102214218A - System and method for retrieving contents of audio/video

Info

Publication number: CN102214218A
Application number: CN 201110151426
Authority: CN
Inventors: 张峰; 黄伟
Original assignee: Shengle Information Technology Shanghai Co Ltd
Current assignee: SHANGHAI GEAK ELECTRONICS Co.,Ltd.
Priority date: 2011-06-07
Filing date: 2011-06-07
Publication date: 2011-10-12
Anticipated expiration: 2031-06-07
Also published as: CN102214218B

Abstract

The invention discloses a system and a method for retrieving contents of an audio/video. The method at least comprises the following steps of: receiving an audio/video segment; extracting the fingerprint characteristic of each frame of the audio/video segment; computing the anti-interference degree of a fingerprint of each frame by using an anti-interference degree computation model; sorting the frames according to the anti-interference degree of the fingerprint of each frame; and retrieving the fingerprints in a fingerprint database according to a frame sorting result. The fingerprint retrieval speed of the audio/video is greatly improved by computing the anti-interference degree of the fingerprint characteristic of each frame, sorting according to the anti-interference degree of the fingerprint characteristic of each frame and retrieving.

Description

Audio/video content retrieval system and method

Technical field

The present invention relates to an audio/video content retrieval system and method, and in particular to an audio/video content retrieval system and method based on audio/video fingerprints.

Background technology

With the rapid development of network and multimedia technology, the volume of audio and video media has grown explosively, and managing and accessing digital audio/video content accurately and efficiently has become very difficult. In recent years, content-based audio/video retrieval has opened up many new research directions, and audio/video fingerprinting technology has emerged in response.

Audio fingerprinting was proposed long ago. For example, Jaap Haitsma and Ton Kalker presented "A Highly Robust Audio Fingerprinting System" at the 2002 International Conference on Music Information Retrieval. Using signal processing, that system converts the audio signal at regular intervals (for example, every 11.6 ms) in an audio file into a fingerprint 32 bits in size, so that an audio file can be converted into a fingerprint file. After the system indexes all audio fingerprint files, audio fingerprints can be retrieved quickly.
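As a rough worked example of how compact such a fingerprint file is, the arithmetic below assumes exactly one 32-bit fingerprint every 11.6 ms and no file-format overhead; the 180 s track length is an illustrative assumption, not a figure from the cited system.

```latex
\frac{1}{11.6\ \text{ms}} \approx 86.2\ \text{fingerprints/s}, \qquad
86.2 \times 32\ \text{bit} \approx 2759\ \text{bit/s} \approx 345\ \text{B/s}, \qquad
180\ \text{s} \times 345\ \text{B/s} \approx 62\ \text{kB}
```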

In principle similar to audio fingerprinting, a video fingerprinting system converts each frame, or every few frames, into a very small fingerprint (for example, 32 bits) and then performs retrieval. For example, international patent application WO2007/127590A2, "Method and system for fingerprinting digital video object based on multiresolution, multirate and temporal signatures", discloses a video fingerprinting method that converts each frame of a video signal into 84 bits or 132 bits, so that a video file can be converted into a very small fingerprint file.

As can be seen, existing audio/video fingerprinting techniques usually first extract fingerprint features from the input audio/video content and then look them up in an inverted index over the fingerprint database, in frame order. This approach has a problem, however: the bit rate, format, and noise of the audio/video can distort the fingerprint features and degrade retrieval.

In summary, prior-art audio/video retrieval suffers from fingerprint feature distortion caused by bit rate, format, and noise, which degrades retrieval; an improved technical means is therefore needed to solve this problem.

Summary of the invention

To overcome the above problem of the prior art, in which bit rate, format, and noise distort the fingerprint features and thereby degrade retrieval, the main object of the present invention is to provide an audio/video content retrieval system and method that compute the anti-interference degree of each frame's fingerprint feature in advance and perform retrieval after sorting by that anti-interference degree, which can greatly improve retrieval speed.

To achieve the above and other objects, an audio/video content retrieval system of the present invention comprises at least:

a receiving module for receiving an audio/video segment;

a fingerprint feature extraction module for extracting the fingerprint feature of each frame of the audio/video segment;

an anti-interference degree calculation module that builds an anti-interference degree computation model and computes the anti-interference degree of each frame's fingerprint according to that model;

a sorting module that sorts the frames according to the anti-interference degree of each frame's fingerprint; and

a retrieval module that performs fingerprint retrieval in a fingerprint database according to the frame sorting result.

Further, the fingerprint feature extraction module obtains the fingerprint feature of each frame by computing the energy difference between adjacent sub-bands within the same frame and the energy difference of the same sub-band between adjacent frames.

Further, the anti-interference degree computation model is:

Robust (n) = Σ_{m = 1}^{m = 32} ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)));

where Robust (n) denotes the anti-interference degree of the fingerprint of the n-th frame and E (n, m) denotes the energy of the m-th sub-band of the n-th frame.

Further, the anti-interference degree of each frame's fingerprint may be computed from the several dimensions whose raw energy-difference values have the highest absolute values.

Further, the anti-interference degree computation model is:

Robust (n) = Σ_{k = 1}^{k = 32} E_frame_sort (n, k);

where Robust (n) denotes the anti-interference degree of the fingerprint of the n-th frame, E_frame_sort (n, k) = sort_{m = 1}^{m = 32} (ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)))), E (n, m) denotes the energy of the m-th sub-band of the n-th frame, and sort () denotes sorting.

Further, the anti-interference degree calculation module may also compute the anti-interference degree of each frame's fingerprint from spectral values or color values.

To achieve the above and other objects, the present invention also provides an audio/video content retrieval method comprising at least the following steps:

receiving an audio/video segment;

extracting the fingerprint feature of each frame of the audio/video segment;

computing the anti-interference degree of each frame's fingerprint using an anti-interference degree computation model;

sorting the frames according to the anti-interference degree of each frame's fingerprint; and

performing fingerprint retrieval in a fingerprint database according to the frame sorting result.

Further, the fingerprint feature of each frame is obtained by computing the energy difference between adjacent sub-bands within the same frame and the energy difference of the same sub-band between adjacent frames.

Further, the anti-interference degree computation model is:

Robust (n) = Σ_{m = 1}^{m = 32} ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)));

Further, the anti-interference degree computation model may alternatively be:

Robust (n) = Σ_{k = 1}^{k = 32} E_frame_sort (n, k);

Further, the anti-interference degree of each frame's fingerprint may also be computed from spectral values or color values.

Compared with the prior art, the audio/video content retrieval system and method of the present invention compute in advance the anti-interference degree of the fingerprint of each frame of the audio/video to be retrieved and perform retrieval after sorting the fingerprints by anti-interference degree, retrieving frames with a high anti-interference degree first rather than retrieving frame by frame in order, which can greatly improve retrieval speed.

Description of drawings

Fig. 1 is a system architecture diagram of the audio/video content retrieval system of the present invention;

Fig. 2 is a flow chart of the steps of the audio/video content retrieval method of the present invention.

Embodiment

Embodiments of the present invention are described below by way of specific examples with reference to the accompanying drawings; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other, different examples, and the details in this specification may be modified and changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention.

Fig. 1 is an architecture diagram of the audio/video content retrieval system of the present invention; the operation of the system is first described with reference to Fig. 1. As shown in Fig. 1, the system comprises at least a receiving module 101, a fingerprint feature extraction module 102, an anti-interference degree calculation module 103, a sorting module 104, and a retrieval module 105.

The receiving module 101 receives a streaming media file that contains at least one audio/video segment; the segment may be an audio file or a video file.

The fingerprint feature extraction module 102 is connected to the receiving module 101. After the receiving module 101 receives the audio/video segment, the fingerprint feature extraction module 102 extracts the fingerprint feature of each audio frame or each video frame of the segment. The way module 102 extracts fingerprint features is described in detail below.

Taking audio fingerprint extraction as an example: first, for mono audio at a fixed sampling rate, the fingerprint feature extraction module 102 divides the audio into audio frames several milliseconds long using a fixed frame shift and applies a Hamming window (Hanning Window) to each; second, it applies a Fourier transform to each audio frame, extracts the power spectrum, divides a given frequency band (for example, 300 Hz to 4000 Hz) evenly on a logarithmic scale into 33 mutually disjoint sub-bands, and computes the sub-band energies of each frame; finally, it computes the energy difference between adjacent sub-bands within the same frame and the energy difference of the same sub-band between adjacent frames to obtain the fingerprint feature values, according to the following formula:

F (n, m) = \begin{cases} 1, & E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)) > 0 \\ 0, & E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)) \leq 0 \end{cases}

where E (n, m) denotes the energy of the m-th sub-band of the n-th frame and F (n, m) denotes the m-th dimension of the fingerprint feature of the n-th frame.
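The quantization in the formula above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the geometric spacing used for "evenly divided on a logarithmic scale", and the choice to pack dimension m = 1 into the most significant bit are all assumptions.

```python
def log_band_edges(f_lo=300.0, f_hi=4000.0, bands=33):
    """Return bands + 1 edge frequencies, evenly spaced on a log scale.

    Assumption: "evenly on a logarithmic scale" is read as a constant
    frequency ratio between consecutive band edges.
    """
    ratio = (f_hi / f_lo) ** (1.0 / bands)
    return [f_lo * ratio ** i for i in range(bands + 1)]


def fingerprint_bits(energy_prev, energy_cur):
    """Quantize F(n, m) for one frame from two rows of sub-band energies.

    energy_prev holds E(n-1, m) and energy_cur holds E(n, m); 33 energies
    per row give 32 bits. Packing m = 1 into the most significant bit is
    an assumption; the text does not fix a bit order.
    """
    if len(energy_prev) != len(energy_cur):
        raise ValueError("both frames need the same number of sub-bands")
    word = 0
    for m in range(len(energy_cur) - 1):
        d = (energy_cur[m] - energy_cur[m + 1]) - (energy_prev[m] - energy_prev[m + 1])
        word = (word << 1) | (1 if d > 0 else 0)  # bit is 1 only when d > 0
    return word
```

With 33 sub-band energies per frame, `fingerprint_bits` yields one 32-bit word per frame, matching the 32-dimensional feature described in the text.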

The feature extracted with the above formula is 32 bits in total, exactly the length of one long integer. Each dimension of each frame has thus been quantized from the raw energy-difference value to 0 or 1. The larger the absolute value of the raw energy-difference value, the less susceptible that dimension of that frame is to noise; the smaller the absolute value, the more susceptible it is. Although the preferred embodiment obtains each frame's fingerprint feature only from the raw energy-difference values, the invention is not limited to this; the fingerprint feature of each frame may of course be obtained by other methods, such as from spectral values or color values, which are not described further here.

Video fingerprint feature extraction is similar to audio fingerprint feature extraction: it may use the raw energy-difference method or other methods such as spectral values or color values. Video fingerprint feature extraction is known prior art and is not described further here.
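The link between the absolute value of an energy difference and its sensitivity to noise can be illustrated with a small sketch. The helper names and the additive perturbation model are assumptions made for illustration; indices are 0-based.

```python
def quantize(diff):
    """Quantize one raw energy-difference value to a fingerprint bit."""
    return 1 if diff > 0 else 0


def flipped_dimensions(diffs, noise):
    """Return the 0-based dimensions whose bit changes once noise is added.

    diffs are raw energy-difference values; noise is a hypothetical
    additive perturbation of each value (an illustrative noise model).
    """
    return [
        m
        for m, (d, e) in enumerate(zip(diffs, noise))
        if quantize(d) != quantize(d + e)
    ]
```

For example, `flipped_dimensions([5.0, 0.1, -4.0, -0.2], [-0.3, -0.3, 0.3, 0.3])` returns `[1, 3]`: only the two dimensions with small absolute values change their bit, while the dimensions at 5.0 and -4.0 keep theirs.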

After the fingerprint feature extraction module 102 has extracted the fingerprint feature of each audio frame and each video frame, the anti-interference degree calculation module 103 builds an anti-interference degree computation model and computes the anti-interference degree of each frame's fingerprint. This anti-interference degree may be computed from the raw energy-difference values or from spectral values or color values; the invention is not limited in this respect. The preferred embodiment again takes computation from the raw energy-difference values as its example, with the following anti-interference degree computation model:

Robust (n) = Σ_{m = 1}^{m = 32} ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)));

Here Robust (n) denotes the anti-interference degree of the fingerprint of the n-th frame, and E (n, m) denotes the energy of the m-th sub-band of the n-th frame.
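A minimal sketch of this model, assuming each frame is given as a list of 33 sub-band energies; the function names are illustrative and not taken from the patent.

```python
def energy_differences(energy_prev, energy_cur):
    """Raw energy-difference values for one frame.

    Element m is E(n, m) - E(n, m+1) - (E(n-1, m) - E(n-1, m+1)), where
    energy_prev holds frame n-1 and energy_cur holds frame n.
    """
    return [
        (energy_cur[m] - energy_cur[m + 1]) - (energy_prev[m] - energy_prev[m + 1])
        for m in range(len(energy_cur) - 1)
    ]


def robust(energy_prev, energy_cur):
    """Robust(n): sum of absolute raw energy-difference values over all dimensions."""
    return sum(abs(d) for d in energy_differences(energy_prev, energy_cur))
```

Because the absolute values are already available when the fingerprint bits are quantized, this score can be computed in the same pass as fingerprint extraction.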

Preferably, the anti-interference degree of each frame's fingerprint may also be computed from the several dimensions whose raw energy-difference values have the highest absolute values:

E_frame_sort (n, k) = sort_{m = 1}^{m = 32} (ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1))))

where sort () denotes sorting. The anti-interference degree computation model may then be:

Robust (n) = Σ_{k = 1}^{k = 32} E_frame_sort (n, k);

That is, the sum runs over the dimensions of each frame ranked 1st through 32nd by absolute value, from the highest down.
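The sorted variant can be sketched as below. The parameter `k` is an assumption added for illustration: the formula as printed sums ranks 1 through 32, which for a 32-dimensional fingerprint gives the same value as the plain sum, whereas a smaller `k` keeps only the highest-ranked dimensions.

```python
def robust_top_k(energy_prev, energy_cur, k=None):
    """Sum the k largest absolute raw energy-difference values of one frame.

    energy_prev holds E(n-1, m) and energy_cur holds E(n, m). k is a
    hypothetical parameter; k=None sums every dimension.
    """
    magnitudes = [
        abs((energy_cur[m] - energy_cur[m + 1]) - (energy_prev[m] - energy_prev[m + 1]))
        for m in range(len(energy_cur) - 1)
    ]
    ranked = sorted(magnitudes, reverse=True)  # E_frame_sort(n, k), largest first
    if k is None:
        k = len(ranked)
    return sum(ranked[:k])
```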

The sorting module 104 sorts the frames according to the anti-interference degree Robust (n) of each frame's fingerprint, and the retrieval module 105 performs fingerprint retrieval in the fingerprint database according to that sorting result. Specifically, the retrieval module 105 does not retrieve in the usual frame order but in the order given by sorting Robust (n), which represents the anti-interference degree of each frame's fingerprint: frames with higher Robust (n) are retrieved first, and frames with lower Robust (n) are retrieved later.
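The ordering step can be sketched with a toy inverted index. Everything beyond the ordering itself is an assumption made for illustration: the index layout, exact-match lookup, stopping at the first hit, and the names `build_index` and `retrieve` are not specified by the text.

```python
from collections import defaultdict


def build_index(tracks):
    """Build an inverted index: fingerprint word -> list of (track_id, frame position)."""
    index = defaultdict(list)
    for track_id, prints in tracks.items():
        for pos, fp in enumerate(prints):
            index[fp].append((track_id, pos))
    return index


def retrieve(index, query_prints, query_robust):
    """Probe the index with query frames in order of decreasing Robust(n).

    Returns (track_id, frame offset of the query within the track, probes used),
    or (None, None, probes used) when no frame matches exactly.
    """
    order = sorted(range(len(query_prints)), key=lambda n: query_robust[n], reverse=True)
    for probes, n in enumerate(order, start=1):
        hits = index.get(query_prints[n])
        if hits:
            track_id, pos = hits[0]  # simplification: take the first candidate
            return track_id, pos - n, probes
    return None, None, len(order)
```

In this toy setting, a query whose first frame is distorted but whose second frame has the highest Robust (n) is matched on the first probe, whereas probing in frame order would spend the first probe on the distorted frame.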

Fig. 2 is a flow chart of the steps of the audio/video content retrieval method of the present invention. As shown in Fig. 2, the method comprises the following steps: receiving an audio/video segment (step 201); extracting the fingerprint feature of each frame of the segment (step 202); computing the anti-interference degree of each frame's fingerprint using an anti-interference degree computation model (step 203); sorting the frames according to the anti-interference degree of each frame's fingerprint (step 204); and performing fingerprint retrieval in the fingerprint database according to the frame sorting result (step 205).

In step 203, the anti-interference degree of each frame's fingerprint may be computed from the raw energy-difference values or from spectral values or color values. The preferred embodiment takes computation from the raw energy-difference values as its example, so the anti-interference degree computation model is:

Robust (n) = Σ_{m = 1}^{m = 32} ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)));

Preferably, the anti-interference degree of each frame's fingerprint may also be computed from the several dimensions whose raw energy-difference values have the highest absolute values:

E_frame_sort (n, k) = sort_{m = 1}^{m = 32} (ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1))))

where sort () denotes sorting. The anti-interference degree computation model may then be:

Robust (n) = Σ_{k = 1}^{k = 32} E_frame_sort (n, k);

In summary, the audio/video content retrieval system and method of the present invention compute in advance the anti-interference degree of the fingerprint of each frame of the audio/video to be retrieved and perform retrieval after sorting the fingerprints by anti-interference degree, retrieving frames with a high anti-interference degree first rather than retrieving frame by frame in order, which greatly improves the speed of audio/video fingerprint retrieval.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore as set out in the claims.

Claims

1. An audio/video content retrieval system, comprising at least:

a receiving module for receiving an audio/video segment;

2. The audio/video content retrieval system as claimed in claim 1, characterized in that the fingerprint feature extraction module obtains the fingerprint feature of each frame by computing the energy difference between adjacent sub-bands within the same frame and the energy difference of the same sub-band between adjacent frames.

3. The audio/video content retrieval system as claimed in claim 2, characterized in that the anti-interference degree computation model is:

Robust (n) = Σ_{m = 1}^{m = 32} ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)));

4. The audio/video content retrieval system as claimed in claim 2, characterized in that the anti-interference degree of each frame's fingerprint may be computed from the several dimensions whose raw energy-difference values have the highest absolute values.

5. The audio/video content retrieval system as claimed in claim 4, characterized in that the anti-interference degree computation model is:

Robust (n) = Σ_{k = 1}^{k = 32} E_frame_sort (n, k);

where Robust (n) denotes the anti-interference degree of the fingerprint of the n-th frame, E_frame_sort (n, k) = sort_{m = 1}^{m = 32} (ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)))), E (n, m) denotes the energy of the m-th sub-band of the n-th frame, and sort () denotes sorting.

6. The audio/video content retrieval system as claimed in claim 1, characterized in that the anti-interference degree calculation module computes the anti-interference degree of each frame's fingerprint from spectral values or color values.

7. An audio/video content retrieval method, comprising at least the following steps:

receiving an audio/video segment;

8. The audio/video content retrieval method as claimed in claim 7, characterized in that the fingerprint feature of each frame is obtained by computing the energy difference between adjacent sub-bands within the same frame and the energy difference of the same sub-band between adjacent frames.

9. The audio/video content retrieval method as claimed in claim 8, characterized in that the anti-interference degree computation model is:

Robust (n) = Σ_{m = 1}^{m = 32} ABS (E (n, m) - E (n, m + 1) - (E (n - 1, m) - E (n - 1, m + 1)));

10. The audio/video content retrieval method as claimed in claim 8, characterized in that the anti-interference degree computation model is:

Robust (n) = Σ_{k = 1}^{k = 32} E_frame_sort (n, k);

11. The audio/video content retrieval method as claimed in claim 7, characterized in that the anti-interference degree of each frame's fingerprint may be computed from spectral values or color values.