CN103730128A - Audio clip authentication method based on frequency spectrum SIFT feature descriptor - Google Patents

Audio clip authentication method based on frequency spectrum SIFT feature descriptor

Info

Publication number
CN103730128A
CN103730128A (application CN201210389030.7A)
Authority
CN
China
Prior art keywords
audio
frequency
delta
sift
suspicious
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210389030.7A
Other languages
Chinese (zh)
Inventor
李伟
殷玥
董旭炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201210389030.7A priority Critical patent/CN103730128A/en
Publication of CN103730128A publication Critical patent/CN103730128A/en
Pending legal-status Critical Current

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention belongs to the technical field of information security and protection, and relates to an audio clip authentication method based on spectral SIFT feature descriptors, in particular a content authentication method that builds on computer vision techniques and takes audio clips as the detection objects. SIFT feature matching is used to extract feature descriptors and to align the suspicious audio clip under examination within a reference audio file; the suspicious clip is divided into several blocks using the time-stretching factors derived from matched SIFT key points; these time-domain blocks can be used directly to recognize malicious cropping, malicious insertion and similar operations; a pitch-shifting factor is also estimated and used to describe the corresponding time-frequency units, which makes robust hash computation straightforward. Through matching and hash detection, not only can the integrity and authenticity of the suspicious audio clip be authenticated accurately, but the positions of malicious tampering operations can also be precisely located and classified by type.

Description

An audio clip authentication method based on spectral SIFT feature descriptors
Technical field
The invention belongs to the field of information security and protection, specifically audio authentication; it relates to an audio clip authentication method based on spectral SIFT feature descriptors, and in particular to a content authentication method that builds on computer vision techniques and takes audio clips as the detection objects.
Background art
Audio content authentication is an effective technique for verifying and protecting the integrity and authenticity of audio data such as music and speech. Its main purpose is to guarantee that the data obtained by the receiver of an audio transmission has not been maliciously edited or tampered with by a third party during transport, i.e. that, from the point of view of the human perceptual system, it is identical to the original audio. Unlike traditional signature-based authentication, multimedia authentication for audio and similar media protects the content of the file rather than merely its bit stream. Audio authentication currently has important applications in many fields, such as national security, trade secrets, news recording, music recording and distribution, copyright protection, and military communications.
To date, only a small number of audio authentication methods have been published; they are summarized as follows.
Document [1] proposed a semi-fragile speech digital watermark for content integrity checking, namely an exponential-scale parity modulation technique. The method embeds the watermark in the DFT domain and needs no auxiliary data for its integrity check, and it can distinguish malicious tampering from content-preserving operations; however, it was tested against only a few admissible operations such as resampling, white-noise contamination and speech coding.
Document [2] proposes a feature-based authentication method built on the principle that two audio signals of similar acoustic quality also have highly similar masking curves. A hash value of the audio masking curve is first computed and then embedded into the audio signal as a watermark using a known data-hiding method. At detection time, the extracted watermark is compared with the hash value computed from the received signal, and their correlation coefficient is calculated. Because this coefficient degrades moderately as the acoustic quality degrades, a decision threshold can be set according to the acceptable quality standard. The method can distinguish audio signal processing such as MP3 compression from malicious tampering.
Document [3] has been introduced two kinds of methods for audio content authentication.The first has been discussed possible audio frequency characteristics, to allow several follow-up signals to process; The second, for obtaining highest security, detects the change of each bit, and carrys out reconstruct original audio by introducing the concept of reversible water mark; The method and then again in conjunction with digital signature and digital watermarking, and use key to produce the method that can openly verify and can rebuild original audio.
The method proposed in document [4] is based on audio fingerprints and verifies the integrity of an audio file by combining a robust hash function with robust watermarking. The experiments mainly targeted MP3 compression distortion; at higher bit rates, such as 128 kbps and above, the bit error rate stays below 7%, but at low bit rates such as 32 kbps the bit error rate is around 40%.
Documents [5] and [6] apply distributed source coding to preserving audio/video quality and detecting malicious attacks. Document [5] uses a reference audio and achieves robust audio authentication through Slepian-Wolf encoding and decoding, but it assumes in advance that the audio to be verified is already aligned with the original reference audio. Document [6] uses a compact hash signature, reducing the storage space of the reference database by 20%-70%.
All of the above audio authentication methods operate on audio to be verified whose length equals that of the reference audio, whereas in practical applications the audio to be verified is often only a fragment. The present invention therefore provides a new, previously unstudied technique: audio fragment content authentication. Building on conventional audio authentication, audio fragment authentication matches and aligns a short audio clip against a reference audio containing the original material, and then obtains the authentication result using hashing or watermarking algorithms.
References related to the present invention:
[1] C. P. Wu and C. C. Kuo, "Fragile speech watermarking based on exponential scale quantization for tamper detection," ICASSP 2002, pp. 3305-3308.
[2] R. Radhakrishnan and N. Memon, "Audio content authentication based on psycho-acoustic model," SPIE Security and Watermarking of Multimedia Contents, 4675:110-117, 2002.
[3] M. Steinebach and J. Dittmann, "Watermarking-based digital audio data authentication," EURASIP Journal on Applied Signal Processing, 10:1001-1015, 2003.
[4] S. Zmudzinski and M. Steinebach, "Perception-based audio authentication watermarking in the time-frequency domain," IH 2009, pp. 146-160.
[5] D. Varodayan, Y. C. Lin and B. Girod, "Audio authentication based on distributed source coding," ICASSP 2008, pp. 225-228.
[6] G. Valenzise, G. Prandi, M. Tagliasacchi and A. Sarti, "Identification of sparse audio tampering using distributed source coding and compressive sensing techniques," EURASIP Journal on Image and Video Processing, 2009:1-12.
Summary of the invention
The object of the invention is to propose a new audio content authentication method for the field of information protection and authentication, specifically an audio clip authentication method based on spectral SIFT feature descriptors, and in particular a content authentication method that builds on computer vision techniques and takes audio clips as the detection objects.
The audio content authentication method proposed by the present invention is based on computer vision techniques. The problem the invention solves is the detection of the integrity and authenticity of audio fragments in audio authentication. The invention uses SIFT (Scale Invariant Feature Transform) feature matching to extract feature descriptors and to align the suspicious audio fragment under examination within a reference audio file; the time-stretching factors extracted from the matched SIFT key points are then used to divide the suspicious fragment into several blocks. The time-domain blocks can be used directly to recognize malicious cropping, insertion and similar operations; in addition, an estimated pitch-shifting factor describes the corresponding time-frequency units, which makes robust hash computation straightforward. Through matching and hash detection, not only can the integrity and authenticity of the suspicious audio fragment be identified precisely, but the positions of malicious tampering can also be located accurately and classified by type.
Compared with traditional authentication methods that require the complete audio under test, the present invention needs only an audio fragment to judge its authenticity and integrity, which better fits practical applications. The method is divided into three parts: fragment alignment based on spectrogram SIFT local descriptors (steps 1-4), robust hash value computation (steps 5-6) and authentication decision (step 7). The concrete steps are as follows:
Step 1: Use the Short-Time Fourier Transform (STFT) to convert the one-dimensional audio signal into the corresponding two-dimensional time-frequency representation, and keep the low-to-mid frequency band of 100-3000 Hz so as to cover the frequency range of speech and most musical instruments; the STFT time window length is 4096 samples.
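The patent gives no reference code; the following Python sketch, using SciPy, illustrates how Step 1 could be realized under the stated parameters. The function name `spectrogram_band` and the hop size are illustrative assumptions.

```python
# Minimal sketch of Step 1 (illustrative, not from the patent): convert a mono
# signal to a 2-D time-frequency representation with a 4096-sample STFT window
# and keep only the 100-3000 Hz band.
import numpy as np
from scipy.signal import stft

def spectrogram_band(signal, sample_rate, f_lo=100.0, f_hi=3000.0, win_len=4096):
    # Magnitude STFT; the hop size (win_len // 2, SciPy's default overlap) is an
    # assumption, since the patent only fixes the window length.
    freqs, times, Z = stft(signal, fs=sample_rate, nperseg=win_len)
    mag = np.abs(Z)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band], times, mag[band, :]
```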
Step 2: Compute the feature descriptors
Compute the 128-dimensional SIFT feature descriptors of the suspicious audio signal and of the reference audio signal, and obtain the matched key points (matched SIFT key points) by comparing the two sets of descriptors. Let the number of matched pairs be N, denoted $\{(P_i^D, P_i^R)\}_{i=1}^{N}$. The leftmost and rightmost match points of the suspicious audio and of the reference audio are then

$$T_0^D = \min\{P_1^D.x, \ldots, P_N^D.x\},\qquad T_1^D = \max\{P_1^D.x, \ldots, P_N^D.x\},$$
$$T_0^R = \min\{P_1^R.x, \ldots, P_N^R.x\},\qquad T_1^R = \max\{P_1^R.x, \ldots, P_N^R.x\}.$$
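As an illustration of Step 2, the sketch below matches SIFT key points between the two spectrograms; the patent does not prescribe a particular SIFT implementation, so the OpenCV calls, the log scaling of the spectrogram image and the Lowe ratio test (0.75) are all assumptions.

```python
# Illustrative sketch of Step 2: SIFT key-point matching on the two spectrograms
# rendered as 8-bit grayscale images.
import cv2
import numpy as np

def matched_keypoints(spec_d, spec_r):
    def to_img(spec):
        # Assumed log scaling before quantizing to 8 bits.
        s = np.log1p(spec)
        return np.uint8(255 * (s - s.min()) / (s.max() - s.min() + 1e-12))
    img_d, img_r = to_img(spec_d), to_img(spec_r)

    sift = cv2.SIFT_create()
    kp_d, des_d = sift.detectAndCompute(img_d, None)
    kp_r, des_r = sift.detectAndCompute(img_r, None)

    # Brute-force matching with Lowe's ratio test (ratio 0.75 is an assumption).
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des_d, des_r, k=2)
            if m.distance < 0.75 * n.distance]

    # x-coordinates (time axis) of the N matched pairs, i.e. P_i^D.x and P_i^R.x;
    # T0/T1 are then simply their min and max.
    xd = np.array([kp_d[m.queryIdx].pt[0] for m in good])
    xr = np.array([kp_r[m.trainIdx].pt[0] for m in good])
    return xd, xr
```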
Step 3: Let the length of the suspicious audio fragment be $L_D$ and the length of the reference audio be $L_R$. The distance between the left boundary of the fragment and the leftmost SIFT feature point is $T_0^D$, and the distance between the right boundary and the rightmost SIFT feature point is $L_D - T_1^D$; the corresponding mapped distances $\Delta_0^R$ and $\Delta_1^R$ in the reference audio are computed with formula (1):

$$\Delta_0^R = T_0^D\,\frac{\overline{\Delta^R}}{\overline{\Delta^D}},\qquad \Delta_1^R = (L_D - T_1^D)\,\frac{\overline{\Delta^R}}{\overline{\Delta^D}} \tag{1}$$

where $\overline{\Delta^D} = \frac{1}{N-1}\sum_{i=1}^{N-1}(P_{i+1}^D.x - P_i^D.x)$ and $\overline{\Delta^R} = \frac{1}{N-1}\sum_{i=1}^{N-1}(P_{i+1}^R.x - P_i^R.x)$.
Step 4: Sort the SIFT key points in ascending chronological order; the position of the suspicious audio fragment within the reference audio can then be located with formula (2):

$$T_{\mathrm{start}}^R = (T_0^R - \Delta_0^R)\times\frac{P_R \times SR_R}{L_R},\qquad T_{\mathrm{end}}^R = (T_1^R + \Delta_1^R)\times\frac{P_R \times SR_R}{L_R} \tag{2}$$

where $L_R$, $SR_R$ and $P_R$ are respectively the number of frames, the sampling rate and the duration (in seconds) of the reference audio.
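A minimal sketch of Steps 3-4 under the formulas above; the function `locate_fragment` and its argument names are illustrative.

```python
# Illustrative implementation of formulas (1) and (2).
import numpy as np

def locate_fragment(xd, xr, len_d, ref_frames, ref_sr, ref_seconds):
    # xd, xr: time-axis coordinates of matched key points in the suspicious and
    # reference spectrograms; len_d: fragment length L_D in frames.
    xd, xr = np.sort(xd), np.sort(xr)
    t0_d, t1_d, t0_r, t1_r = xd[0], xd[-1], xr[0], xr[-1]

    # Mean spacing between consecutive matched key points (formula (1) ratio).
    mean_d, mean_r = np.mean(np.diff(xd)), np.mean(np.diff(xr))

    # Formula (1): map the left/right margins of the fragment into the reference.
    delta0_r = t0_d * mean_r / mean_d
    delta1_r = (len_d - t1_d) * mean_r / mean_d

    # Formula (2): convert reference frame positions to time positions.
    scale = (ref_seconds * ref_sr) / ref_frames
    return (t0_r - delta0_r) * scale, (t1_r + delta1_r) * scale
```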
Step 5: Preprocessing against audio attacks (time scaling, pitch shifting, time-domain cropping and insertion, etc.)
(1) Time-domain blocking: divide the suspicious audio and the reference audio respectively into several small blocks so that the time-stretching factor within each block is approximately constant; this makes it easy to find maliciously cropped or inserted fragments and simplifies the robust hash computation that follows.
Concrete processing is as follows:
For the matched SIFT key-point pairs $\{(P_i^D, P_i^R)\}_{i=1}^{N}$, the distance $L_i^{\{D,R\}}$ between two consecutive points in the time domain and the corresponding time-stretching factor $R_i$ are defined as follows:

$$L_i^{\{D,R\}} = P_{i+1}^{\{D,R\}}.x - P_i^{\{D,R\}}.x,\qquad R_i = \frac{L_i^D}{L_i^R},\qquad i = 1, 2, \ldots, N-1 \tag{3}$$

The set $\{R_i\}$ is then divided into several subsets $\{C_i\}$ such that any two consecutive subsets, e.g. $C_i = \{R_j, \ldots, R_k\}$ and $C_{i+1} = \{R_{k+1}, \ldots, R_l\}$, have a significant gap between their mean time-stretching factors.
According to the SIFT key points assigned to each subset, and using the fragment matching method of step 2, the suspicious audio fragment and the reference audio fragment are divided into corresponding block pairs $\{B_1, B_2, \ldots, B_M\}$, as shown in Fig. 2.
Fig. 2 illustrates the details of the time-domain blocking. If the suspicious audio has only undergone timing-irrelevant signal processing such as lossy compression, all the obtained time-stretching factors $R_i$ are approximately equal ($\approx 1$), and the suspicious audio and the corresponding reference audio each form a single time block. If the suspicious audio has undergone time scaling, all the $R_i$ are still approximately equal (all $> 1$ or all $< 1$), and each side again forms a single block. If the middle third of the suspicious audio has been cut out, the $R_i$ values on either side of the cut become smaller and form a concave point; in this case both audio signals are divided into three blocks, and the left and right sections, unaffected by the cut, still align with each other.
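The sketch below illustrates formula (3) and one possible blocking rule for Step 5(1); the grouping threshold `rel_gap` is an assumption, since the patent only requires a significant gap between the mean factors of consecutive subsets.

```python
# Illustrative time-stretching factors and blocking for Step 5(1).
import numpy as np

def time_stretch_blocks(xd, xr, rel_gap=0.2):
    order = np.argsort(xd)            # keep matched pairs ordered in time
    xd, xr = xd[order], xr[order]
    l_d, l_r = np.diff(xd), np.diff(xr)
    ratios = l_d / l_r                # R_i of formula (3)

    blocks, start = [], 0
    for i in range(1, len(ratios)):
        # Start a new block when R_i jumps away from the running block mean
        # by more than the assumed relative gap.
        block_mean = np.mean(ratios[start:i])
        if abs(ratios[i] - block_mean) > rel_gap * block_mean:
            blocks.append((start, i))
            start = i
    blocks.append((start, len(ratios)))
    # A dip (concave point) in `ratios` signals a cut, a bump an insertion.
    return ratios, blocks
```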
(2) Frequency alignment: estimate the pitch-shifting factor from the matched SIFT key points, and from it compute the correspondence between the frequencies of the suspicious audio spectrum and those of the reference audio spectrum.
Concrete processing is as follows:
Under a pitch-shifting attack, the frequency values of the suspicious audio change proportionally with respect to the original values of the reference audio; therefore, before the frequency content can be described, the pitch-shifting factor (Pitch-shifting Factor) must first be estimated.
For a matched pair of SIFT key points $(P_i^D, P_i^R)$, the frequency contents can be expressed as $F_i^D$ and $F_i^R$, and the pitch-shifting factor is obtained from

$$\hat{R} = \operatorname{median}(\{\hat{R}_i \mid i = 1, 2, \ldots, N\}) \tag{4}$$

where $\hat{R}_i = F_i^R / F_i^D$ is the ratio of the fundamental frequencies of the reference audio and the suspicious audio for the $i$-th matched pair. The concrete correspondence is shown in Fig. 3.
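A minimal sketch of formula (4); `fd` and `fr` stand for the frequency values $F_i^D$ and $F_i^R$ of the matched pairs.

```python
# Illustrative pitch-shifting factor of formula (4): the median of the
# per-pair frequency ratios (reference over suspicious).
import numpy as np

def pitch_shift_factor(fd, fr):
    ratios = np.asarray(fr, dtype=float) / np.asarray(fd, dtype=float)
    return np.median(ratios)
```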
Step 6: Robust hash computation
The Philips algorithm is adopted to compute the hash codes. For each pair of corresponding blocks $(B_i^D, B_i^R)$, the fragment lengths $W_i^{\{D,R\}}$ and the frequency ranges $[F_{\mathrm{start}}^{\{D,R\}}, F_{\mathrm{end}}^{\{D,R\}}]$ are first adjusted with formula (5):

$$W_i^R = W_i^D\,\overline{R}_i,\quad i = 1, 2, \ldots, M;\qquad F_{\mathrm{start}}^R = F_{\mathrm{start}}^D\,\hat{R},\qquad F_{\mathrm{end}}^R = F_{\mathrm{end}}^D\,\hat{R} \tag{5}$$

where $\overline{R}_i$ and $\hat{R}$ denote respectively the average time-stretching factor of the corresponding block and the average pitch-shifting factor.

Let $E^{\{D,R\}}(k, n)$ denote the energy of the $k$-th frequency sub-band and the $n$-th time frame of the spectrum. The frequency band $[F_{\mathrm{start}}^{\{D,R\}}, F_{\mathrm{end}}^{\{D,R\}}]$ is divided into 33 non-overlapping sub-bands, and the 32-bit hash code of each region is computed with formula (6):

$$H^{\{D,R\}}(k, n) = \begin{cases} 1, & \text{if } E^{\{D,R\}}(k, n) > E^{\{D,R\}}(k+1, n) \\ 0, & \text{if } E^{\{D,R\}}(k, n) \le E^{\{D,R\}}(k+1, n) \end{cases} \tag{6}$$

where $k = 1, 2, \ldots, 32$ and $n = 1, 2, \ldots, N_f$; the frame boundaries of $n$ are determined by the block lengths $W_i^{\{D,R\}}$.
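The sketch below illustrates the sub-band energy comparison of formula (6), in the spirit of the Philips scheme the patent adopts; the linear placement of the 33 sub-band edges is an assumption.

```python
# Illustrative 32-bit hash per frame from 33 sub-band energies (formula (6)).
import numpy as np

def robust_hash(spec, n_subbands=33):
    # spec: magnitude spectrogram restricted to [F_start, F_end], shape (freq, frames).
    n_freq, n_frames = spec.shape
    edges = np.linspace(0, n_freq, n_subbands + 1).astype(int)
    energy = np.array([np.sum(spec[edges[k]:edges[k + 1], :] ** 2, axis=0)
                       for k in range(n_subbands)])        # shape (33, n_frames)

    codes = np.zeros(n_frames, dtype=np.uint32)
    for k in range(32):                                     # k = 1..32 in the patent
        bit = (energy[k, :] > energy[k + 1, :]).astype(np.uint32)
        codes |= bit << k
    return codes
```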
Step 7: Modification type detection
(1) Malicious cropping/insertion of fragments: by checking whether the time-stretching factor curve contains concave or convex points, judge whether the audio file has been maliciously cropped or had material inserted. If the detected time-stretching factor is approximately 1 or a fixed constant, the audio signal has undergone only timing-irrelevant signal processing or uniform time scaling of the whole signal; if malicious cropping/insertion has occurred, a concave/convex point appears at the corresponding position on the time-stretching factor curve, as shown in Fig. 4 and Fig. 5.
(2) Malicious frequency modification: construct histograms of the suspicious signal and the reference signal from the SIFT key points, and judge by comparing the histograms whether the suspicious file has undergone malicious frequency modification. For example, bandwidth truncation is a typical malicious frequency modification: at the spectral positions corresponding to the modified region, the number of SIFT key points matched with the reference audio drops markedly. Accordingly, the invention builds a histogram of the matched SIFT key points over frequency and compares the two histograms to decide whether such tampering has occurred. As shown in Fig. 6, the horizontal axis divides the 100-3000 Hz range into 30 bands and the vertical axis is the number of matched SIFT key points per band; comparing the left and right plots, the right plot has almost no match points in the 800-900 Hz band, so this band has very likely been subjected to frequency tampering.
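As an illustration of this histogram comparison, the following sketch bins the matched key-point frequencies into 30 bands (cf. Fig. 6) and flags bands where the match count collapses; the `min_ref` support threshold is an assumption, not a value from the patent.

```python
# Illustrative histogram check for Step 7(2).
import numpy as np

def keypoint_histogram(match_freqs, f_lo=100.0, f_hi=3000.0, n_bins=30):
    hist, _ = np.histogram(match_freqs, bins=n_bins, range=(f_lo, f_hi))
    return hist

def suspicious_bands(hist_before, hist_after, min_ref=3):
    # Bands that had matches before the operation but (almost) none afterwards
    # are candidates for frequency tampering such as bandwidth truncation.
    return [k for k, (b, a) in enumerate(zip(hist_before, hist_after))
            if b >= min_ref and a == 0]
```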
(3) Content modification: use the bit error rate (BER) to judge whether the file has been subjected to malicious content modification; the threshold T and the decision rule are defined with formula (7),
$$\mathrm{BER} = \frac{1}{N_f}\sum_{n=1}^{N_f} H_D(n) \oplus H_R(n) \tag{7}$$
If BER ≤ T, authentication passes, indicating that the examined audio has not been maliciously tampered with and the content integrity of the file is intact; if BER > T, authentication fails, indicating that the audio has been maliciously tampered with. In one embodiment of the invention, a given suspicious audio fragment $x_D(\cdot)$ is compared with a longer reference audio $x_R(\cdot)$ to detect whether $x_D(\cdot)$ has been attacked or tampered with: time-domain cropping/insertion and frequency-domain bandwidth truncation are judged respectively from the time-stretching factor and from the histogram of matched SIFT key points over frequency; if neither is detected, the robust hash values of $x_D(\cdot)$ and $x_R(\cdot)$ are compared. If the BER (bit error rate) is below the threshold, authentication passes, meaning the suspicious audio has not been subjected to semantic tampering and may only have undergone content-preserving operations such as lossy compression, TSM and pitch shifting; if the BER is above the threshold, authentication fails.
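A sketch of the BER decision of formula (7); the bit error rate here is normalized over all 32 bits of every frame, and the threshold value is an assumed placeholder, since the patent leaves T as a design parameter.

```python
# Illustrative BER computation and threshold decision for Step 7(3).
import numpy as np

def authenticate(h_d, h_r, threshold=0.25):
    # h_d, h_r: aligned 32-bit hash streams of equal length.
    diff = np.bitwise_xor(h_d.astype(np.uint32), h_r.astype(np.uint32))
    ber = np.unpackbits(diff.view(np.uint8)).sum() / (32 * len(h_d))
    return ber, ber <= threshold   # True: authentication passes
```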
Brief description of the drawings
Fig. 1: SIFT key-point matching between the spectrum of the suspicious audio fragment and the spectrum of the reference audio file.
Fig. 2: Illustration of time-domain blocking detection based on the time-stretching factor when the suspicious audio has been cropped.
Fig. 3: Frequency alignment between the block spectra of the suspicious audio and the reference audio.
Fig. 4: Change of the time-stretching factor curve caused by cropping.
Fig. 5: Change of the time-stretching factor curve caused by insertion.
Fig. 6: Histograms of the number of matched SIFT key points per frequency band before and after bandwidth truncation; the left plot corresponds to before truncation, the right plot to after truncation.
Embodiment
To verify the validity of the above method, the following experiments were carried out.
Embodiment 1
The test database comprised 1030 audio files covering a variety of speech and music signals. Each file is 2 minutes long, in WAVE format, with a 44.1 kHz sampling rate, mono. The suspicious audio fragments to be authenticated are 10 s long and were cut at random from the original audio files above. In this embodiment, the audio content authentication system built according to the method described above was first tested for its authentication pass rate on content-preserving operations (True Positive Rate, TPR) and its authentication failure rate on malicious tampering operations (True Negative Rate, TNR); both measure the accuracy of the authentication system, and the larger the two values, the higher the system's accuracy.
Table 1 lists the modification types and their corresponding TPR/TNR.
The results show that, for the content-preserving operations MP3 compression (32 kbps), TSM within ±10%, pitch shifting within ±20%, and low-pass filtering at 4 kHz and 8 kHz, the authentication pass rate of the system remains at an accuracy level of at least 81%; when TSM is increased to ±20%, the accuracy drops somewhat but still stays around 80%. It should be pointed out that even without any attack, the authentication system built on SIFT feature matching cannot guarantee 100% authentication accuracy: of the 1030 unmodified audio fragments tested, only about 968 pass authentication. The reason is that matching cannot always locate the fragment to be authenticated at its exact position in the reference audio signal, so authentication can fail even in the absence of any tampering.
For the detection of the three kinds of time-domain tampering (replacement, cropping and insertion), the authentication failure rates are 99.4%, 99.6% and 100% respectively, demonstrating the effectiveness of the time-stretching factor for identifying time-domain tampering. For detecting malicious bandwidth truncation in the frequency domain (blocked bands of 800-900 Hz and 1500-1600 Hz respectively), the accuracies of 82.9% and 76.7% are relatively lower than in the time domain.
Two error statistics important for a practical authentication system were also measured: one is the FPR (False Positive Rate), the proportion of tampered audio mistakenly judged as content-preserving, i.e. type I error; the other is the FNR (False Negative Rate), the proportion of content-preserving audio mistakenly judged as tampered, i.e. type II error. The confusion matrix of Table 2 illustrates these system metrics.
Table 2: Confusion matrix of the authentication error rates.
Table 2 shows that, out of 14420 content-preserving operations in total, 2340 were mistakenly judged as tampering, so the FNR is 0.1623. The reason is that content-preserving operations can still change the time-frequency representation of the fragment and thus affect the accuracy of fragment alignment; in addition, to meet the requirements of authentication, the threshold is set so that borderline content-preserving cases tend to be judged as tampering. Out of 3090 tampering operations in total, 10 were mistakenly judged as content-preserving, so the FPR is 0.00324. The matrix of Table 2 shows that an audio authentication system using this method can effectively discriminate tampering from content-preserving operations.
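The quoted rates follow directly from the counts above, as this small check shows:

```python
# Reproducing the error rates from the counts quoted in the text.
fnr = 2340 / 14420   # content-preserving trials judged as tampering
fpr = 10 / 3090      # tampering trials judged as content-preserving
print(round(fnr, 4), round(fpr, 5))   # -> 0.1623 0.00324
```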

Claims (2)

1. An audio fragment authentication method based on spectral SIFT feature descriptors, characterized in that it comprises: fragment alignment steps (1-4) based on spectrogram SIFT local descriptors, robust hash value computation steps (5-6) and an authentication decision step (7):
Step 1: Use the Short-Time Fourier Transform (STFT) to convert the one-dimensional audio signal into the corresponding two-dimensional time-frequency representation, and keep the low-to-mid frequency band of 100-3000 Hz;
Step 2: Compute the feature descriptors
Compute the 128-dimensional SIFT feature descriptors of the suspicious audio signal and of the reference audio signal, and obtain the matched key points by comparing the two sets of descriptors; let the number of matched pairs be N, denoted $\{(P_i^D, P_i^R)\}_{i=1}^{N}$; the leftmost and rightmost match points of the suspicious audio and of the reference audio are expressed as

$$T_0^D = \min\{P_1^D.x, \ldots, P_N^D.x\},\qquad T_1^D = \max\{P_1^D.x, \ldots, P_N^D.x\},$$
$$T_0^R = \min\{P_1^R.x, \ldots, P_N^R.x\},\qquad T_1^R = \max\{P_1^R.x, \ldots, P_N^R.x\};$$
Step 3: Let the length of the suspicious audio fragment be $L_D$ and the length of the reference audio be $L_R$; the distance between the left boundary of the fragment and the leftmost SIFT feature point is $T_0^D$, and the distance between the right boundary and the rightmost SIFT feature point is $L_D - T_1^D$; the corresponding mapped distances $\Delta_0^R$ and $\Delta_1^R$ in the reference audio are obtained with formula (1):

$$\Delta_0^R = T_0^D\,\frac{\overline{\Delta^R}}{\overline{\Delta^D}},\qquad \Delta_1^R = (L_D - T_1^D)\,\frac{\overline{\Delta^R}}{\overline{\Delta^D}} \tag{1}$$

where $\overline{\Delta^D} = \frac{1}{N-1}\sum_{i=1}^{N-1}(P_{i+1}^D.x - P_i^D.x)$ and $\overline{\Delta^R} = \frac{1}{N-1}\sum_{i=1}^{N-1}(P_{i+1}^R.x - P_i^R.x)$;
Step 4: Sort the SIFT key points in ascending chronological order and locate the position of the suspicious audio fragment within the reference audio with formula (2):

$$T_{\mathrm{start}}^R = (T_0^R - \Delta_0^R)\times\frac{P_R \times SR_R}{L_R},\qquad T_{\mathrm{end}}^R = (T_1^R + \Delta_1^R)\times\frac{P_R \times SR_R}{L_R} \tag{2}$$

where $L_R$, $SR_R$ and $P_R$ are respectively the number of frames, the sampling rate and the duration (in seconds) of the reference audio;
Step 5: Preprocessing against audio attacks:
Time-domain blocking: divide the suspicious audio and the reference audio respectively into several small blocks so that the time-stretching factor within each block is approximately constant, which makes it easy to find maliciously cropped or inserted fragments and simplifies the subsequent robust hash computation;
Frequency alignment: estimate the pitch-shifting factor from the matched SIFT key points, and from it compute the correspondence between the frequencies of the suspicious audio spectrum and those of the reference audio spectrum;
Step 6: Robust hash computation
The Philips method is adopted to compute the hash codes; for each pair of corresponding blocks $(B_i^D, B_i^R)$, the fragment lengths $W_i^{\{D,R\}}$ and the frequency ranges $[F_{\mathrm{start}}^{\{D,R\}}, F_{\mathrm{end}}^{\{D,R\}}]$ are first adjusted with formula (3):

$$W_i^R = W_i^D\,\overline{R}_i,\quad i = 1, 2, \ldots, M;\qquad F_{\mathrm{start}}^R = F_{\mathrm{start}}^D\,\hat{R},\qquad F_{\mathrm{end}}^R = F_{\mathrm{end}}^D\,\hat{R} \tag{3}$$

where $\overline{R}_i$ and $\hat{R}$ denote respectively the average time-stretching factor of the corresponding block $(B_i^D, B_i^R)$ and the average pitch-shifting factor;
let $E^{\{D,R\}}(k, n)$ denote the energy of the $k$-th frequency sub-band and the $n$-th time frame of the spectrum; the frequency band $[F_{\mathrm{start}}^{\{D,R\}}, F_{\mathrm{end}}^{\{D,R\}}]$ is divided into 33 non-overlapping sub-bands, and the 32-bit hash code of each region is computed with formula (4):

$$H^{\{D,R\}}(k, n) = \begin{cases} 1, & \text{if } E^{\{D,R\}}(k, n) > E^{\{D,R\}}(k+1, n) \\ 0, & \text{if } E^{\{D,R\}}(k, n) \le E^{\{D,R\}}(k+1, n) \end{cases} \tag{4}$$

where $k = 1, 2, \ldots, 32$ and $n = 1, 2, \ldots, N_f$, and the frame boundaries of $n$ are determined by the block lengths $W_i^{\{D,R\}}$;
Step 7: Modification type detection
Malicious cropping/insertion of fragments: by checking whether the time-stretching factor curve contains concave or convex points, judge whether the audio file has been maliciously cropped or had material inserted;
Malicious frequency modification: construct histograms of the suspicious signal and the reference signal from the SIFT key points, and judge by comparing the histograms whether the suspicious file has undergone malicious frequency modification;
Content modification: use the bit error rate to judge whether the file has been subjected to malicious content modification; the threshold T and the decision rule are defined with formula (5),
$$\mathrm{BER} = \frac{1}{N_f}\sum_{n=1}^{N_f} H_D(n) \oplus H_R(n) \tag{5}$$
if BER ≤ T, authentication passes, indicating that the examined audio has not been maliciously tampered with and the content integrity of the file is intact; if BER > T, authentication fails, indicating that the audio has been maliciously tampered with.
2. The method according to claim 1, characterized in that the attacks of said step 5 comprise: time scaling, pitch shifting, and time-domain cropping or insertion.
CN201210389030.7A 2012-10-13 2012-10-13 Audio clip authentication method based on frequency spectrum SIFT feature descriptor Pending CN103730128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210389030.7A CN103730128A (en) 2012-10-13 2012-10-13 Audio clip authentication method based on frequency spectrum SIFT feature descriptor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210389030.7A CN103730128A (en) 2012-10-13 2012-10-13 Audio clip authentication method based on frequency spectrum SIFT feature descriptor

Publications (1)

Publication Number Publication Date
CN103730128A true CN103730128A (en) 2014-04-16

Family

ID=50454174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210389030.7A Pending CN103730128A (en) 2012-10-13 2012-10-13 Audio clip authentication method based on frequency spectrum SIFT feature descriptor

Country Status (1)

Country Link
CN (1) CN103730128A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1169199A (en) * 1995-01-26 1997-12-31 苹果电脑公司 System and method for generating and using context dependent subsyllable models to recognize a tonal language
US20120132056A1 (en) * 2010-11-29 2012-05-31 Wang Wen-Nan Method and apparatus for melody recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANGYANG XUE ET AL.: "Towards content-based audio fragment authentication", MM '11: Proceedings of the 19th ACM International Conference on Multimedia, 31 December 2011 (2011-12-31), pages 1249-1252 *
TANG CHAOWEI ET AL.: "An improved SIFT descriptor and its performance analysis", Geomatics and Information Science of Wuhan University, vol. 37, no. 1, 31 January 2012 (2012-01-31), pages 11-16 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227311A (en) * 2014-07-01 2016-01-06 腾讯科技(深圳)有限公司 Verification method and system
CN104134443A (en) * 2014-08-14 2014-11-05 兰州理工大学 Symmetrical ternary string represented voice perception Hash sequence constructing and authenticating method
CN104134443B (en) * 2014-08-14 2017-02-08 兰州理工大学 Symmetrical ternary string represented voice perception Hash sequence constructing and authenticating method
CN104361889A (en) * 2014-10-28 2015-02-18 百度在线网络技术(北京)有限公司 Audio file processing method and device
CN104361889B (en) * 2014-10-28 2018-03-16 北京音之邦文化科技有限公司 Method and device for processing audio file
CN107785023A (en) * 2016-08-25 2018-03-09 财团法人资讯工业策进会 Voiceprint identification device and voiceprint identification method thereof
CN108665905A (en) * 2018-05-18 2018-10-16 宁波大学 A kind of digital speech re-sampling detection method based on band bandwidth inconsistency
CN108766464A (en) * 2018-06-06 2018-11-06 华中师范大学 Digital audio based on mains frequency fluctuation super vector distorts automatic testing method
CN108766464B (en) * 2018-06-06 2021-01-26 华中师范大学 Digital audio tampering automatic detection method based on power grid frequency fluctuation super vector
CN109284717A (en) * 2018-09-25 2019-01-29 华中师范大学 Detection method and system for copy-paste tampering operations on digital audio
CN115798490A (en) * 2023-02-07 2023-03-14 西华大学 Audio watermark implantation method and device based on SIFT
CN115798490B (en) * 2023-02-07 2023-04-21 西华大学 Audio watermark implantation method and device based on SIFT transformation

Similar Documents

Publication Publication Date Title
CN103730128A (en) Audio clip authentication method based on frequency spectrum SIFT feature descriptor
Renza et al. Authenticity verification of audio signals based on fragile watermarking for audio forensics
US6674861B1 (en) Digital audio watermarking using content-adaptive, multiple echo hopping
US6738744B2 (en) Watermark detection via cardinality-scaled correlation
KR100492743B1 (en) Method for inserting and detecting watermark by a quantization of a characteristic value of a signal
Özer et al. Perceptual audio hashing functions
Dhar et al. Audio watermarking in transform domain based on singular value decomposition and Cartesian-polar transformation
Chen et al. Perceptual audio hashing algorithm based on Zernike moment and maximum-likelihood watermark detection
Dhar A blind audio watermarking method based on lifting wavelet transform and QR decomposition
Li et al. Audio-lossless robust watermarking against desynchronization attacks
Huang et al. A reversible acoustic steganography for integrity verification
Huang et al. A new approach of reversible acoustic steganography for tampering detection
CN104091104B (en) Multi-format audio perceives the characteristics extraction of Hash certification and authentication method
Wang et al. Tampering Detection Scheme for Speech Signals using Formant Enhancement based Watermarking.
CN101609675B (en) Fragile audio frequency watermark method based on mass center
Su et al. Window switching strategy based semi-fragile watermarking for MP3 tamper detection
Li et al. Music content authentication based on beat segmentation and fuzzy classification
Zmudzinski et al. Perception-based audio authentication watermarking in the time-frequency domain
Li et al. Content based JPEG fragmentation point detection
Zhang et al. An encrypted speech authentication method based on uniform subband spectrumvariance and perceptual hashing
Karnjana et al. Audio Watermarking Scheme Based on Singular Spectrum Analysis and Psychoacoustic Model with Self‐Synchronization
Masmoudi et al. MP3 Audio watermarking using calibrated side information features for tamper detection and localization
Xue et al. Towards content-based audio fragment authentication
Zmudzinski et al. Digital audio authentication by robust feature embedding
Jiao et al. Key-dependent compressed domain audio hashing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140416