CN103324663A - Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation - Google Patents

Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation

Info

Publication number
CN103324663A
Authority
CN
China
Prior art keywords
mdct
audio
frequency spectrum
fingerprint
audio file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201310142650
Other languages
Chinese (zh)
Inventor
吴黎明
邓耀华
王桂棠
韩威
高世平
陈智翔
李垚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN 201310142650
Publication of CN103324663A
Legal status: Pending

Abstract

The invention relates to a method for extracting an audio fingerprint directly from a compressed-domain audio file (MP3 format). The MP3 audio file is decoded to obtain the MDCT (modified discrete cosine transform) spectrum; the MDCT spectrum is divided into overlapping blocks; each block is partitioned into critical bands, and the feature vector of each critical band is calculated; the feature vectors of all critical bands in the same block are converted to probability-like ("class probability") values; the expectation of each MDCT spectrum block is then calculated; and the audio fingerprint is obtained from the expected values of adjacent blocks. The method is well suited to current networks, where compressed-format audio is the dominant form of storage and transmission, and it can be used for copyright management of audio files and for searching audio file information.

Description

Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation
Technical field
The present invention relates to the field of content-based compressed-domain audio indexing. The method can be used to extract compressed-domain audio fingerprints quickly, and in turn for copyright management of audio files and for looking up basic audio file information.
Background
Faced with the massive number of audio files on the Internet, quickly finding the file one needs in an audio database has become a challenge. Moreover, most audio files on the Internet today are stored and transmitted in MP3 format, which raises the question of how to index MP3 audio directly.
With the rise of automatic speech recognition technology, audio indexing based on audio fingerprints has become a focus of research. An audio fingerprint is a compact, content-based digital signature that represents the important acoustic features of a piece of music; its main purpose is to provide an effective mechanism for comparing the perceptual acoustic quality of two pieces of audio data. An audio fingerprint has three principal properties: accuracy, which covers the correct recognition rate, the miss rate (false negatives), and the false detection rate (false positives); robustness, meaning that unknown audio can still be identified after fairly severe audio signal processing; and fingerprint size, which should be as small as possible for each audio file so that searches are fast.
Research on audio fingerprint theory and its applications began in China and abroad at the end of the 20th century. The Philips algorithm is the classic of the field, and today's mainstream audio fingerprint algorithms are improvements on it. However, these algorithms all operate on the WAV (PCM-encoded) audio format, so an MP3 audio file must first undergo format conversion. They also have to divide every frame into multiple subbands even though adjacent frames overlap heavily, so the time complexity is high and the fingerprint is relatively large. Since most audio files today are stored and transmitted in a compressed format (such as MP3), a compressed-domain audio fingerprint indexing scheme is of greater practical value.
Summary of the invention
The invention provides a compressed-domain audio fingerprint extraction method based on MDCT spectrum expectation. It extracts the audio fingerprint directly from an MP3 audio file, reduces the time complexity of the extraction algorithm, and reduces the fingerprint size.
The specific steps of the compressed-domain audio fingerprint extraction method based on MDCT spectrum expectation are:
(1) Decode the MP3 audio file directly, frame by frame, to obtain a continuous MDCT spectrum;
(2) Divide the MDCT spectrum into blocks: the MDCT spectrum decoded from every 5 frames of MP3 data forms one block, and adjacent blocks overlap by 95%;
(3) Divide each MDCT spectrum block into critical bands and calculate the feature vectors of all critical bands in each block;
The critical-band feature vector of each MDCT spectrum block is calculated as follows:
1. For each block of the MDCT spectrum, determine the critical-band boundary frequencies within the range f = 0 to f_s/2, where f_s is the sampling rate of the audio file. Substituting i = 1, 2, 3, ... into formula (1) gives the corresponding boundary frequency $\hat{f}_i$ (in Hz).
$$\hat{f}_i = \frac{1960\,i + 1038.8}{26.28 - i} \tag{1}$$
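As a minimal sketch of how the boundary frequencies of formula (1) might be tabulated, the following evaluates the formula for i = 1, 2, 3, ... and stops at the Nyquist frequency f_s/2. The function name `critical_band_edges` is hypothetical and not part of the patent.

```python
def critical_band_edges(sample_rate_hz):
    """Return the boundary frequencies (Hz) of formula (1) that lie at or below f_s / 2.

    Hypothetical helper: the name and the stopping rule are illustrative assumptions.
    """
    nyquist = sample_rate_hz / 2.0
    edges = []
    i = 1
    # The denominator 26.28 - i must stay positive, so i never exceeds 26.
    while i < 26.28:
        f_i = (1960.0 * i + 1038.8) / (26.28 - i)
        if f_i > nyquist:
            break
        edges.append(f_i)
        i += 1
    return edges


if __name__ == "__main__":
    for index, freq in enumerate(critical_band_edges(44100), start=1):
        print(index, round(freq, 1))
```

For a 44.1 kHz file this yields boundaries that start near 119 Hz and widen toward high frequencies, which is the expected behaviour of a critical-band scale.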
2. The interval between $\hat{f}_i$ and $\hat{f}_{i+1}$ forms the i-th critical band (i = 1, 2, 3, ...). Summing the squared MDCT coefficients $|C_{MDCT}|^2$ within each critical band gives the feature value of that band. For two-channel audio, the sum of $(|C_{MDCT1}|^2 + |C_{MDCT2}|^2)/2$ is used as the feature value of the critical band. Let SEN = [sen_1, sen_2, ..., sen_l, ..., sen_n] denote the critical-band feature vector; its elements are calculated with formula (2).
$$sen_l = \sum_{\hat{f}_l < \hat{f}_k \le \hat{f}_{l+1}} |C_{MDCT}|^2 \tag{2}$$
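The band summation of formula (2) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `band_energies` is hypothetical, and the mapping from coefficient index k to a frequency, (k + 0.5) * f_s / (2N), is an assumed convention for the centre of the k-th MDCT bin that the patent does not spell out.

```python
def band_energies(coeffs, sample_rate_hz, edges, coeffs_right=None):
    """Sum squared MDCT coefficients per critical band, as in formula (2).

    coeffs       -- MDCT coefficients of one channel (list of floats)
    edges        -- ascending boundary frequencies in Hz; band l is (edges[l], edges[l+1]]
    coeffs_right -- optional second channel; the two powers are then averaged
    """
    n = len(coeffs)
    sen = [0.0] * (len(edges) - 1)
    for k in range(n):
        # Assumed centre frequency of the k-th MDCT bin.
        f_k = (k + 0.5) * sample_rate_hz / (2.0 * n)
        power = coeffs[k] ** 2
        if coeffs_right is not None:
            power = (power + coeffs_right[k] ** 2) / 2.0
        for l in range(len(edges) - 1):
            if edges[l] < f_k <= edges[l + 1]:
                sen[l] += power
                break
    return sen


if __name__ == "__main__":
    # Toy input: 4 coefficients, bins centred at 0.5, 1.5, 2.5 and 3.5 Hz.
    print(band_energies([1.0, 2.0, 3.0, 4.0], 8, [0.0, 2.0, 4.0]))
```

Coefficients whose frequency falls outside every band are simply ignored in this sketch.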
3. Calculate the feature value SEN(i, j) of the j-th critical band of the i-th block of the MDCT spectrum. Let s(m, n) denote the n-th MDCT coefficient in the m-th section, and let MDCT_i and MDCT_j denote the lower and upper bounds of the MDCT coefficient indices that belong to the critical band. The MDCT spectral feature value of the j-th critical band of the i-th block is then given by formula (3).
$$SEN(i,j) = \sum_{m=0.5(i+1)}^{10+0.5(i-1)} \; \sum_{n=MDCT_i}^{MDCT_j} |s(m,n)|^2 \tag{3}$$
(4) Apply probability-like normalization to the critical-band feature values within each block: each value is divided by the sum of all values in the same block, which gives its probability-like ("class probability") value;
To obtain the probability-like feature function P(·), whose defining property is that all of its elements sum to 1, formula (4) divides the MDCT spectral feature value of each critical band by the total over all selected critical bands. The result is denoted P(i, j).
$$P(i,j) = \frac{SEN(i,j)}{\sum_{j=1}^{n} SEN(i,j)} \tag{4}$$
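The normalization of formula (4) amounts to dividing each band value by the block total. A minimal sketch follows; the function name `class_probabilities` is hypothetical, and returning all zeros for a silent block is an assumption, since the patent does not say how a zero total should be handled.

```python
def class_probabilities(sen):
    """Normalize the band values of one block so that they sum to 1 (formula (4))."""
    total = sum(sen)
    if total == 0:
        # Assumption: a silent block has no meaningful distribution.
        return [0.0] * len(sen)
    return [value / total for value in sen]


if __name__ == "__main__":
    print(class_probabilities([1.0, 3.0]))
```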
(5) Calculate the mathematical expectation E(i) of each MDCT spectrum block from the feature values and their probability-like values;
$$E(i) = \sum_{j=1}^{n} P(i,j) \cdot SEN(i,j) \tag{5}$$
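Substituting formula (4) into formula (5) shows that the expectation is the ratio of the sum of squared band values to their plain sum, so it is dominated by the strongest critical bands in the block. The two-band numbers in the comment are purely illustrative and are not taken from the patent.

```latex
E(i) = \sum_{j=1}^{n} \frac{SEN(i,j)}{\sum_{k=1}^{n} SEN(i,k)} \, SEN(i,j)
     = \frac{\sum_{j=1}^{n} SEN(i,j)^{2}}{\sum_{j=1}^{n} SEN(i,j)}
% Illustrative check: SEN = (1, 3) gives P = (0.25, 0.75),
% so E = 0.25 \cdot 1 + 0.75 \cdot 3 = 2.5 = (1 + 9) / (1 + 3).
```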
(6) Take the difference between the mathematical expectations of adjacent blocks and apply a decision rule to the difference to obtain a binary "0/1" fingerprint sequence.
To make fingerprint matching more convenient and to simplify fingerprint storage, the expectation values of adjacent blocks are compared by magnitude to form the final fingerprint sequence, a string of 0/1 bits, as shown in formula (6).
$$S(i) = \begin{cases} 0, & E(i) < E(i+1) \\ 1, & E(i) \ge E(i+1) \end{cases} \qquad i = 1, 2, \ldots, n-1 \tag{6}$$
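The decision rule of formula (6) reduces to one comparison per pair of adjacent blocks. A minimal sketch, with the hypothetical function name `fingerprint_bits`:

```python
def fingerprint_bits(expectations):
    """Compare adjacent block expectations as in formula (6).

    Emits 0 when E(i) < E(i+1) and 1 otherwise, so n blocks give n - 1 bits.
    """
    return [
        0 if expectations[i] < expectations[i + 1] else 1
        for i in range(len(expectations) - 1)
    ]


if __name__ == "__main__":
    print(fingerprint_bits([1.0, 2.0, 2.0, 0.5]))
```

Because only the ordering of adjacent expectations matters, a uniform gain change applied to the whole file leaves the bits unchanged.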
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The MDCT spectrum is obtained directly from the MP3 decoding process, so no prior audio format conversion is needed;
(2) Because the MDCT spectrum of uncompressed audio can also be calculated directly, fingerprints of uncompressed audio files can be matched against fingerprints of compressed audio files;
(3) Each MDCT spectrum block produces only one binary bit of fingerprint, so the fingerprint is small and fingerprint matching time is reduced;
(4) The audio fingerprint depends mainly on the frequency content of the audio, so it can resist most kinds of noise other than frequency-domain interference.
Description of drawings
Fig. 1 is a block diagram of audio indexing based on audio fingerprints according to the present invention;
Fig. 2 is a block diagram of obtaining the continuous MDCT spectrum by decoding an MP3 audio file according to the present invention;
Fig. 3 is a schematic diagram of the algorithm that extracts the fingerprint from the MDCT spectrum according to the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings, but embodiments of the invention are not limited to this description.
Referring to Fig. 1: first, the audio fingerprint algorithm is used to extract fingerprints from a large collection of audio files, and the fingerprint of each audio file is associated with its metadata (author, date, content, keywords, and so on) to build an audio fingerprint library. To look up the metadata of an unknown audio file, its fingerprint is extracted with the same algorithm and compared with the fingerprints in the library; if a match exists, the metadata of that audio file is returned to the person making the query.
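The patent does not specify how two fingerprints are compared. As a sketch under that caveat, the lookup below assumes equal-length, already aligned bit sequences and uses the fraction of differing bits as the distance. The names `bit_error_rate` and `lookup` and the threshold `max_ber=0.35` are hypothetical choices for illustration only.

```python
def bit_error_rate(a, b):
    """Fraction of positions at which two equal-length bit sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def lookup(query_bits, library, max_ber=0.35):
    """Return the metadata of the closest library entry, or None if nothing is close.

    library -- list of (fingerprint_bits, metadata) pairs
    max_ber -- assumed acceptance threshold, not taken from the patent
    """
    best_meta = None
    best_ber = max_ber
    for bits, meta in library:
        ber = bit_error_rate(query_bits, bits)
        if ber <= best_ber:
            best_meta, best_ber = meta, ber
    return best_meta


if __name__ == "__main__":
    library = [([0, 1, 1, 0], {"title": "A"}), ([1, 1, 1, 1], {"title": "B"})]
    print(lookup([0, 1, 1, 0], library))
```

A real library would need an index rather than a linear scan, and a way to align a query excerpt against a longer stored fingerprint; both are outside this sketch.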
Referring to Fig. 3: the MP3 audio file is decoded frame by frame to obtain the MDCT spectrum, and the MDCT spectrum is divided into blocks of a given length, with a fixed overlap between adjacent blocks. Each MDCT spectrum block is divided into critical bands, and the feature vector of each critical band is calculated. All critical-band feature vectors in the same block are converted to probability-like ("class probability") values, and the expectation of each MDCT spectrum block is then calculated. Finally, the difference between the expectation values of adjacent blocks is quantized to "0/1" to obtain the audio fingerprint.
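The whole chain described above can be sketched end to end. This is an illustration under several assumptions, not the patent's implementation: the input `granules` is taken to be already decoded MDCT data (one list of mono coefficients per row), the bin-to-frequency mapping (k + 0.5) * f_s / (2N) is an assumed convention, and the block hop is a whole number of rows, which only approximates the 95% overlap stated in the method. The name `extract_fingerprint` and its parameters are hypothetical.

```python
def extract_fingerprint(granules, sample_rate_hz, block_len=10, hop=1):
    """Sketch: MDCT rows -> critical bands -> block expectation -> 0/1 bits."""
    n = len(granules[0])
    nyquist = sample_rate_hz / 2.0

    # Critical-band boundaries from formula (1), limited to f_s / 2.
    edges = []
    i = 1
    while i < 26.28:
        f_i = (1960.0 * i + 1038.8) / (26.28 - i)
        if f_i > nyquist:
            break
        edges.append(f_i)
        i += 1

    # Precompute which band each coefficient index falls into (None = no band).
    band_of = []
    for k in range(n):
        f_k = (k + 0.5) * sample_rate_hz / (2.0 * n)  # assumed bin frequency
        band = None
        for l in range(len(edges) - 1):
            if edges[l] < f_k <= edges[l + 1]:
                band = l
                break
        band_of.append(band)

    # One expectation per overlapping block.
    expectations = []
    for start in range(0, len(granules) - block_len + 1, hop):
        sen = [0.0] * (len(edges) - 1)
        for m in range(start, start + block_len):
            for k, c in enumerate(granules[m]):
                if band_of[k] is not None:
                    sen[band_of[k]] += c * c
        total = sum(sen)
        # E = sum(P * SEN) with P = SEN / total; a silent block gives 0 (assumption).
        expectations.append(sum(v * v for v in sen) / total if total > 0 else 0.0)

    # Compare adjacent blocks: 0 if the expectation rises, 1 otherwise.
    return [
        0 if expectations[b] < expectations[b + 1] else 1
        for b in range(len(expectations) - 1)
    ]


if __name__ == "__main__":
    rising = [[float(t + 1)] * 576 for t in range(6)]
    print(extract_fingerprint(rising, 44100, block_len=2, hop=1))
```

Obtaining the MDCT rows from an actual MP3 stream requires a decoder that exposes its intermediate spectrum, which this sketch deliberately leaves out.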

Claims (2)

1. A compressed-domain audio fingerprint extraction method based on MDCT spectrum expectation, characterized in that:
(1) the MP3 audio file is decoded directly to obtain the MDCT spectrum;
(2) the MDCT spectrum is divided into blocks: the MDCT spectrum decoded from m frames of MP3 data forms one block, and the overlap between adjacent blocks is n;
(3) each MDCT spectrum block is divided into critical bands, and the feature vector of each critical band is calculated;
(4) probability-like normalization is applied to the critical-band feature values within each block: each value is divided by the sum of all values in the same block, which gives its probability-like ("class probability") value;
(5) the expectation of each MDCT spectrum block is calculated from the feature values and their probability-like values;
(6) the difference between the expectation values of adjacent blocks is taken, and a decision rule is applied to the difference to obtain a binary "0/1" fingerprint sequence.
2. The method for calculating the critical-band feature vectors of the MDCT spectrum blocks as described in claim 1.
CN 201310142650 2013-04-23 2013-04-23 Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation Pending CN103324663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201310142650 CN103324663A (en) 2013-04-23 2013-04-23 Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201310142650 CN103324663A (en) 2013-04-23 2013-04-23 Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation

Publications (1)

Publication Number Publication Date
CN103324663A true CN103324663A (en) 2013-09-25

Family

ID=49193406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201310142650 Pending CN103324663A (en) 2013-04-23 2013-04-23 Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation

Country Status (1)

Country Link
CN (1) CN103324663A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106663102A (en) * 2014-04-04 2017-05-10 Teletrax有限公司 Method and device for generating fingerprints of information signals

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106663102A (en) * 2014-04-04 2017-05-10 Teletrax有限公司 Method and device for generating fingerprints of information signals
CN106663102B (en) * 2014-04-04 2021-05-07 Teletrax有限公司 Method and apparatus for generating a fingerprint of an information signal

Similar Documents

Publication Publication Date Title
CN101297356B (en) Audio compression
EP2791935B1 (en) Low complexity repetition detection in media data
US10261965B2 (en) Audio generation method, server, and storage medium
CN102314875B (en) Audio file identification method and device
CN104252862B (en) The method and apparatus for handling audio signal
EP2659481B1 (en) Scene change detection around a set of seed points in media data
CN1866355B (en) Audio coding apparatus and method, and audio decoding apparatus and method
CN104050259A (en) Audio fingerprint extracting method based on SOM (Self Organized Mapping) algorithm
US9997166B2 (en) Method, terminal, system for audio encoding/decoding/codec
WO2016119604A1 (en) Voice information search method and apparatus, and server
CN103559232A (en) Music humming searching method conducting matching based on binary approach dynamic time warping
CN101594527B (en) Two-stage method for detecting templates in audio and video streams with high accuracy
Dimoulas et al. Investigation of wavelet approaches for joint temporal, spectral and cepstral features in audio semantics
CN105608105A (en) Context listening based music recommendation method
CN102214219B (en) Audio/video content retrieval system and method
CN101266795B (en) An implementation method and device for grid vector quantification coding
CN101763848B (en) Synchronization method for audio content identification
CN103324663A (en) Compressed domain audio fingerprint extraction method based on MDCT spectrum expectation
WO2009088257A2 (en) Method and apparatus for identifying frame type
JP5384952B2 (en) Feature amount extraction apparatus, feature amount extraction method, and program
CN103294696A (en) Audio and video content retrieval method and system
CN103247316B (en) The method and system of index building in a kind of audio retrieval
CN102903365A (en) Method for refining parameter of narrow band vocoder on decoding end
Wang et al. Robust audio fingerprint extraction algorithm based on 2-D chroma
WO2012163013A1 (en) Music query method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130925