CN103324663A

CN103324663A - Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation

Info

Publication number: CN103324663A
Application number: CN 201310142650
Authority: CN
Inventors: 吴黎明; 邓耀华; 王桂棠; 韩威; 高世平; 陈智翔; 李垚飞
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2013-04-23
Filing date: 2013-04-23
Publication date: 2013-09-25

Abstract

The invention relates to a method for extracting an audio fingerprint directly from a compressed-domain audio file (MP3 format). The MP3 file is decoded to obtain its MDCT (Modified Discrete Cosine Transform) spectrum; the MDCT spectrum is divided into overlapping blocks; each block is partitioned into critical bands, and the feature vector of each critical band is calculated; the feature vectors of all critical bands in the same block are normalized into class probabilities; the expectation of each MDCT spectrum block is then calculated; and the audio fingerprint is obtained from the expected values of adjacent blocks. The method is well suited to current networks, where compressed-format audio is the dominant form of storage and transmission, and can be used for copyright management of audio files and for retrieving audio file information.

Description

Compressed domain audio fingerprint extraction method based on the expectation of MDCT frequency spectrum

Technical field

The present invention relates to the field of content-based compressed-domain audio indexing. The described method can be used to rapidly extract compressed-domain audio fingerprints, which can in turn be used for copyright management of audio files and for looking up basic information about audio files.

Background technology

Given the massive number of audio files on the Internet, quickly finding the file one needs in an audio information library has become a challenge. Moreover, most audio files on the Internet are currently stored and transmitted in MP3 format, which raises the further question of how to index MP3 audio directly.

With the rise of automatic speech recognition technology, audio indexing based on audio fingerprints has become a focus of research. An audio fingerprint is a compact, content-based digital signature that represents the important acoustic features of a piece of music; its main purpose is to provide an effective mechanism for comparing the perceived acoustic quality of two pieces of audio data. An audio fingerprint has three principal properties: accuracy, which covers the correct recognition rate, the miss rate (false negatives), and the false alarm rate (false positives); robustness, meaning that an unknown audio clip can still be identified after fairly severe audio signal processing; and fingerprint size, which should be as small as possible for each audio file so that searches are fast.

Research on audio fingerprint theory and its applications began at home and abroad at the end of the 20th century. The Philips algorithm is the classic of the field, and the mainstream audio fingerprint algorithms at present are improvements on it. However, these algorithms all operate on the WAV audio format (PCM encoding), so an MP3 audio file must first undergo format conversion. They also divide every frame into multiple subbands even though adjacent frames overlap heavily, so the time complexity is high and the fingerprint is relatively large. Since most audio files are currently stored and transmitted in compressed formats such as MP3, a compressed-domain audio fingerprint indexing scheme has greater practical value.

Summary of the invention

The invention provides a compressed-domain audio fingerprint extraction method based on the expectation of the MDCT spectrum. It extracts audio fingerprints directly from MP3 audio files, and can reduce both the time complexity of the fingerprint extraction algorithm and the fingerprint size.

The concrete steps of the compressed-domain audio fingerprint extraction method based on the expectation of the MDCT spectrum are as follows:

(1) Decode the MP3 audio file directly, frame by frame, to obtain a continuous MDCT spectrum;

(2) Divide the above MDCT spectrum into blocks: the MDCT spectrum decoded from every 5 frames of MP3 data forms one block, and adjacent blocks overlap by 95%;

(3) Divide each MDCT spectrum block into critical bands, and calculate the feature vectors of all critical bands in each block;

The critical-band feature vector of each MDCT spectrum block is calculated as follows:

1. For each block of the MDCT spectrum, determine several critical-band division points \hat{f}_{i} in the range f = 0 to f_s/2 (f_s is the sampling rate of the audio file). The points are found by substituting i = 1, 2, 3, ... into formula (1), which gives the corresponding \hat{f}_{i} (in Hz).

\hat{f}_{i} = \frac{1960 i + 1038.8}{26.28 - i} \qquad (1)

2. \hat{f}_{i} and \hat{f}_{i+1} bound the i-th critical band (i = 1, 2, 3, ...). Summing |C_{MDCT}|^{2} over the MDCT coefficients in each critical band gives the feature value of that band. For two-channel audio, the sum of (|C_{MDCT1}|^{2} + |C_{MDCT2}|^{2})/2 is used as the feature value of the band instead. Writing the critical-band feature vector as SEN = [sen_1, sen_2, ..., sen_l, ..., sen_n], it can be calculated with formula (2).

sen_{l} = \sum_{\hat{f}_{l} < \hat{f}_{k} \leq \hat{f}_{l+1}} |C_{MDCT}|^{2} \qquad (2)

3. Calculate the feature value SEN(i, j) of the j-th critical band of the i-th MDCT spectrum block. Here s(m, n) denotes the n-th MDCT coefficient in the m-th section, and MDCT_{i} and MDCT_{j} denote, respectively, the lower and upper bounds of the MDCT coefficient index for the critical band, so the MDCT spectral feature value of the j-th critical band of the i-th block can be calculated with formula (3).

SEN(i, j) = \sum_{m = 0.5(i+1)}^{10 + 0.5(i-1)} \sum_{n = MDCT_{i}}^{MDCT_{j}} |s(m, n)|^{2} \qquad (3)
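Sub-steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`band_edges_hz`, `band_energy`, `block_features`) and the parameters `block_len` and `hop` are hypothetical, the input is a single channel, and each MDCT coefficient is assigned an assumed centre frequency of (k + 0.5) · f_s / (2N) for a section of N coefficients. The summation limits in formula (3) advance in half-section steps; the sketch does not reproduce that and takes the block length and hop as whole numbers of sections.

```python
def band_edges_hz(sample_rate_hz):
    """Division points from formula (1), keeping those that do not exceed f_s / 2."""
    edges = []
    i = 1
    while i < 26.28:  # the denominator 26.28 - i must stay positive
        f_i = (1960 * i + 1038.8) / (26.28 - i)
        if f_i > sample_rate_hz / 2:
            break
        edges.append(f_i)
        i += 1
    return edges


def band_energy(section, edges, sample_rate_hz):
    """Formula (2): sum |C|^2 over the coefficients that fall inside each band."""
    n_coeffs = len(section)
    sen = [0.0] * (len(edges) - 1)
    for k, c in enumerate(section):
        # Assumption: coefficient k sits at the centre of its frequency bin.
        f_k = (k + 0.5) * sample_rate_hz / (2 * n_coeffs)
        for l in range(len(edges) - 1):
            if edges[l] < f_k <= edges[l + 1]:
                sen[l] += c * c
                break
    return sen


def block_features(spectrum, edges, sample_rate_hz, block_len, hop):
    """Formula (3): band energies summed over the sections of each overlapping block."""
    per_section = [band_energy(s, edges, sample_rate_hz) for s in spectrum]
    blocks = []
    for start in range(0, len(per_section) - block_len + 1, hop):
        window = per_section[start:start + block_len]
        blocks.append([sum(column) for column in zip(*window)])
    return blocks


# Synthetic input (not decoded audio): 12 sections of 576 unit coefficients.
edges = band_edges_hz(44100)
spectrum = [[1.0] * 576 for _ in range(12)]
blocks = block_features(spectrum, edges, 44100, block_len=10, hop=1)
print(len(edges), len(blocks), len(blocks[0]))
```

Computing the per-section band energies once and then summing them per block avoids repeating the band assignment for sections shared by overlapping blocks.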

(4) Normalize the critical-band feature vector of each block into class probabilities: each vector value is divided by the sum of the vector values in the same block to obtain its class probability;

To obtain the class-probability feature function P(), whose basic property is that all of its elements sum to 1, formula (4) divides the MDCT spectral feature value of each critical band by the total feature value of all selected critical bands; the result is denoted P(i, j).

P(i, j) = \frac{SEN(i, j)}{\sum_{j=1}^{n} SEN(i, j)} \qquad (4)

(5) Calculate the mathematical expectation E(i) of each MDCT spectrum block from the above vector values and their class probabilities;

E(i) = \sum_{j=1}^{n} P(i, j) \cdot SEN(i, j) \qquad (5)
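Formulas (4) and (5) can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of an all-zero block (a uniform distribution) is an assumption added only to avoid division by zero; the text does not address that case.

```python
def class_probabilities(sen):
    """Formula (4): divide each band value by the block total so the result sums to 1."""
    total = sum(sen)
    if total == 0:
        # Assumption: a silent block is given a uniform distribution.
        return [1.0 / len(sen)] * len(sen)
    return [value / total for value in sen]


def block_expectation(sen):
    """Formula (5): E = sum over bands of P_j * SEN_j."""
    probabilities = class_probabilities(sen)
    return sum(p * value for p, value in zip(probabilities, sen))


# Toy block with two bands: P = [0.25, 0.75], E = 0.25 * 1 + 0.75 * 3 = 2.5
print(class_probabilities([1.0, 3.0]), block_expectation([1.0, 3.0]))
```

Substituting formula (4) into formula (5) gives E(i) = \sum_{j} SEN(i, j)^{2} / \sum_{j} SEN(i, j), so each block's expectation is an energy-weighted average of its own band energies; in the toy example, (1 + 9) / 4 = 2.5.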

(6) Take the difference between the mathematical expectations of adjacent blocks, and derive a binary "0, 1" fingerprint sequence from the difference by a decision rule.

To make fingerprint matching more convenient and to simplify fingerprint storage, the expectations of adjacent blocks are compared by magnitude to form the final fingerprint sequence, a string of 0/1 bits, as shown in formula (6).

S(i) = \begin{cases} 0, & E(i) < E(i+1) \\ 1, & E(i) \geq E(i+1) \end{cases}, \quad i = 1, 2, \ldots, n-1 \qquad (6)
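The decision rule of formula (6) amounts to one comparison per pair of adjacent blocks. A minimal Python sketch, with a hypothetical function name and made-up expectation values:

```python
def fingerprint_bits(expectations):
    """Formula (6): bit i is 0 if E(i) < E(i+1), otherwise 1."""
    return [0 if current < following else 1
            for current, following in zip(expectations, expectations[1:])]


# Four block expectations yield three fingerprint bits.
bits = fingerprint_bits([2.5, 3.0, 3.0, 1.0])
print("".join(str(b) for b in bits))  # prints "011"
```

Because each bit depends only on which of two neighbouring expectations is larger, scaling every expectation by the same positive factor (for example, a uniform change in volume) leaves the bits unchanged.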

Compared with the prior art, the present invention has the following advantages and beneficial effects:

(1) The MDCT spectrum is obtained directly from the MP3 audio decoding process, so no prior audio format conversion is needed;

(2) Because the MDCT spectrum can also be computed directly from uncompressed audio, fingerprints of uncompressed audio files can be matched against fingerprints of compressed audio files;

(3) Each MDCT spectrum block produces only one binary bit of the fingerprint, so the fingerprint is small and the fingerprint matching time can be reduced;

(4) The audio fingerprint depends mainly on the frequency content of the audio, so it can resist most kinds of noise other than frequency interference.

Description of drawings

Fig. 1 is a block diagram of the audio indexing principle based on audio fingerprints according to the present invention;

Fig. 2 is a block diagram of how the continuous MDCT spectrum is obtained by decoding an MP3 audio file according to the present invention;

Fig. 3 is a schematic diagram of the algorithm that extracts the fingerprint from the MDCT spectrum according to the present invention.

Embodiment

Below in conjunction with accompanying drawing, the present invention is described in further detail, but embodiments of the present invention are not limited to this.

Referring to Fig. 1: first, fingerprints are extracted from a large collection of audio files using the audio fingerprint algorithm, and the fingerprint of each audio file is associated with its metadata (including author, date, content, keywords, etc.) to build an audio fingerprint library. To look up the metadata of an unknown audio file, its fingerprint is extracted with the same algorithm and compared with the fingerprints in the library; if a match exists in the library, the metadata of that audio file is returned to the person searching.
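The library lookup of Fig. 1 might be sketched as below. The text does not say how two fingerprints are compared, so the bit error rate (the fraction of differing bits) and the acceptance threshold of 0.25 are assumptions chosen for illustration, as are the function names, the in-memory list used as the library, and the sample metadata.

```python
def bit_error_rate(a, b):
    """Fraction of positions at which two equal-length bit lists differ."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def lookup(query, library, threshold=0.25):
    """Return the metadata of the closest stored fingerprint, or None if none is close enough."""
    best_meta, best_ber = None, threshold
    for stored_bits, meta in library:
        ber = bit_error_rate(query, stored_bits)
        if ber <= best_ber:
            best_meta, best_ber = meta, ber
    return best_meta


# Hypothetical library: (fingerprint bits, metadata) pairs.
library = [
    ([0, 1, 1, 0, 1, 0, 0, 1], {"author": "example author A", "keywords": ["demo"]}),
    ([1, 1, 1, 1, 0, 0, 0, 0], {"author": "example author B", "keywords": ["demo"]}),
]
print(lookup([0, 1, 1, 0, 1, 0, 1, 1], library))  # one bit away from the first entry
```

A tolerance-based comparison of this kind lets a fingerprint taken from a degraded copy still match its library entry, at the cost of a linear scan; a large library would need an index rather than the loop shown here.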

Referring to Fig. 3: the MP3 audio file is decoded frame by frame to obtain the MDCT spectrum, which is divided into blocks of a given length (with a fixed overlap between adjacent blocks). Each block of the MDCT spectrum is divided into critical bands and the feature vector of each critical band is calculated. The feature vectors of all critical bands in the same block are normalized into class probabilities, and the expectation of each MDCT spectrum block is calculated. Finally, the difference between the expectations of adjacent blocks is quantized to "0, 1" to obtain the audio fingerprint.

Claims

1. A compressed-domain audio fingerprint extraction method based on the expectation of the MDCT spectrum, characterized in that:

(1) the MP3 audio file is decoded directly to obtain the MDCT spectrum;

(2) the above MDCT spectrum is divided into blocks: the MDCT spectrum decoded from m frames of MP3 data forms one block, and the overlap between adjacent blocks is n;

(3) each of the above MDCT spectrum blocks is divided into critical bands, and the feature vector of each critical band is calculated;

(5) the expectation of each MDCT spectrum block is calculated from the above feature vector values and their class probabilities;

(6) the difference between the expectations of adjacent blocks is taken, and a binary "0, 1" fingerprint sequence is obtained from the difference by a decision rule.

2. The method for calculating the critical-band feature vectors of the MDCT spectrum blocks as described in claim 1.