CN101350198B - Method for compressing watermark using voice based on bone conduction - Google Patents

Method for compressing watermark using voice based on bone conduction

Info

Publication number
CN101350198B
Authority
CN
China
Prior art keywords
frame
watermark
group
organizes
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101507573A
Other languages
Chinese (zh)
Other versions
CN101350198A (en)
Inventor
同鸣
姬红兵
陈巍
闫涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN2008101507573A
Publication of CN101350198A
Application granted
Publication of CN101350198B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention discloses a speech compression watermarking method based on bone conduction, which mainly solves the problem that existing methods of this kind cannot detect the position and type of an attack. During watermark embedding, the watermark embedded in the current frame is generated from the line spectral frequency (LSF) coefficients of the current frame, the pitch period extracted from the next frame, and the watermark of the previous frame; different watermark generation rules are selected according to the position of the frame, and the watermark of the current frame is embedded in the position indices of the multi-pulse excitation of the speech coder. During watermark verification, the watermark is extracted from the position indices of the selected excitation pulses and compared with a regenerated verification watermark, and the position and type of an attack are detected from the different error patterns that different attack positions and attack types produce. The invention exploits the good noise suppression and low-frequency pickup of bone conduction devices, can be used in voice recording equipment, and helps guarantee the integrity and authenticity of digital speech.

Description

Method for compressing watermark using voice based on bone conduction
Technical field
The invention belongs to the field of information processing and specifically relates to a compression-domain watermarking method that can be used in recording equipment to guarantee the integrity and authenticity of digital speech.
Background art
Digital watermarking has been a research focus of multimedia information processing in recent years, and digital speech watermarking, as an important branch of the field, has characteristics of its own. Speech signals have a narrow bandwidth, are easier to transmit than image and video signals, and take many forms, such as telephone speech, audio broadcasting and video soundtracks, all of which are common in daily life, so their coverage is very wide. With the development of modern signal processing, people can easily tamper with digital speech signals as they wish. To prove the authenticity of the content of digital speech signals, researchers therefore proposed the concept of the fragile watermark. In complete contrast to a robust watermark, a fragile watermark is highly sensitive to attacks: any small change to the watermarked speech signal makes the watermark unrecoverable or only partly recoverable, which reveals whether the speech signal has been tampered with.
The fragile speech watermarking methods proposed so far mainly include:
1. Chung-Ping Wu, C.-C. J. Kuo. Speech content integrity verification integrated with ITU G.723.1 speech coding. Proc. IEEE Int. Conf. Inf. Technol.: Coding Comput., 2001, pp. 680-684. This method extracts some features of the speech, processes them and embeds them as the watermark, and judges the integrity of the speech by comparing the extracted watermark with the selected features. Because the method relies on a threshold to decide whether the speech has been modified, it is prone to misjudgment.
2. Chen Ning, Zhu Jie. An Efficient Approach to Integrate Watermarking with Speech Coding Algorithm. Communications and Networking in China, CHINACOM '07, 2007: 355-359. This method embeds the watermark in the second-stage vector quantization parameters of the speech linear prediction coefficients. Its drawback is that it can only judge whether the host carrier has been attacked and cannot determine the attack type.
3. Chung-Ping Wu, C.-C. J. Kuo. Fragile speech watermarking based on exponential scale quantization for tamper detection. Acoustics, Speech, and Signal Processing, 2002. Proceedings (ICASSP '02), 2002(4): 13-17. Although this method can distinguish attacks such as resampling, added Gaussian noise, G.711 speech compression and G.721 speech compression coding, it cannot distinguish insertion, replacement and cropping attacks.
4. Chia-Hsiung Liu, O. T.-C. Chen. A content-based fragile watermark scheme for speech waveform authentication. Circuits and Systems, 2005, 2005(1): 432-435. Although this method can detect the position and type of insertion, replacement and deletion attacks, it is a time-domain watermark and is not suitable for speech in compressed form.
In summary, existing fragile speech watermarks have the following deficiencies: 1) the embedded watermark is fixed and is therefore easy to delete or forge; 2) most methods work in the time domain or a transform domain, cannot meet real-time requirements, and the embedded watermark is easily lost after compression. As speech compression technology develops, more and more speech signals exist in compressed form, so embedding a watermark in the time domain or a transform domain requires a complicated decode and re-encode process; 3) most fragile watermarking methods can only judge whether the host carrier has been tampered with and cannot detect the attack position or the attack type.
Summary of the invention
The object of the invention is to address the above deficiencies by proposing a bone-conduction-based speech compression watermarking method that can detect both the position and the type of an attack, so as to guarantee the integrity and authenticity of digital speech.
The technical key to achieving this object is as follows. During watermark embedding, the speech signal is first divided into groups, each containing several frames. The watermark embedded in the current frame is generated from the line spectral frequency (LSF) coefficients of the current frame, the pitch period extracted from the next frame, and the watermark of the previous frame, and a different watermark generation rule is chosen according to the position of the frame. During watermark verification, the extracted watermark is compared with a regenerated verification watermark, and the position and attack type of the attacked frames are detected from the different results that different attack positions and attack types produce. The concrete scheme is as follows:
One. Watermark embedding process
(1) Use the bone conduction device signal to preprocess the speech signal, removing noise and other interference, and extract the bone conduction device number $ID_D$ and the speech signal number $ID_S$;
(2) Divide the preprocessed speech signal into fixed-length frames of 30 ms each and gather every $n$ frames into one group; the last group is the exception, its frame count being the remainder of the total frame count divided by $n$;
(3) Using the G.723.1 speech compression coding standard, extract the LSF coefficients $L_{g,i}$ and the pitch period $P_{g,i}$ of every frame for later watermark generation, where $g$ is the index of the group containing the frame and $i$ is the position of the frame within the group;
(4) Use a hash function to combine the extracted bone conduction device number $ID_D$, the speech signal number $ID_S$, the LSF coefficients $L_{g,i}$ of the current frame, the pitch period $P_{g,i+1}$ extracted from the next frame and the watermark $W_{g,i-1}$ of the previous frame into the watermark $W_{g,i}$ to be embedded in the current frame;
(5) Embed the watermark $W_{g,i}$ of the current frame in the position indices of the multi-pulse excitation of the speech coder, that is, replace the least significant bit of each pulse position index with a watermark bit, finally obtaining the watermarked compressed speech stream.
Two. Watermark extraction and verification process
1) Decode the compressed coded stream containing the watermark;
2) Extract the watermark $W'_{g,i}$ from the position indices of the selected excitation pulses;
3) Use the hash function to combine the bone conduction device number $ID_D$, the speech signal number $ID_S$, the LSF coefficients $L_{g,i}$ of the current frame, the pitch period $P_{g,i+1}$ extracted from the next frame and the watermark $W'_{g,i-1}$ extracted from the previous frame into the verification watermark $W''_{g,i}$;
4) Compare the watermark $W'_{g,i}$ extracted from frame $i$ of group $g$ with the generated verification watermark $W''_{g,i}$. If $W'_{g,i} \neq W''_{g,i}$, frame $i$ of group $g$ is judged to be in error, and the position and type of the attack are determined from the positions and distribution of the errors.
The method has the following advantages:
1) Because the watermark embedded in the current frame is generated from the LSF coefficients of the current frame, the pitch period extracted from the next frame and the watermark of the previous frame, with a generation rule chosen according to the position of the frame, and because verification compares the extracted watermark with the verification watermark, the different results produced by different attack positions and attack types allow both the attack position and the attack type to be detected. The integrity of digital multimedia products such as digital recordings can therefore be guaranteed more effectively.
2) Because the invention exploits the good noise suppression and low-frequency pickup of bone conduction devices to remove noise and other interference from the speech signal during preprocessing, and uses the bone conduction device number in generating the watermark, the security of the watermark is increased.
Description of drawings
Fig. 1 is a block diagram of the watermark embedding process of the invention;
Fig. 2 is a block diagram of the watermark extraction process of the invention;
Fig. 3 shows the simulation result of the invention under an insertion attack;
Fig. 4 shows the simulation result of the invention under a replacement attack;
Fig. 5 shows the simulation result of the invention under a deletion attack;
Fig. 6 shows the simulation result of the invention under a combined replacement and insertion attack.
Embodiment
One. Basic theory
1. Bone conduction device
In recent years, bone conduction devices have drawn increasing attention in the speech signal processing field because of their distinctive speech processing characteristics. Such a device is a highly sensitive solid-state audio sensor with good noise suppression and low-frequency pickup. Its principle is that when a person speaks, the bones of the head vibrate, and a bone conduction sensor pressed against the bone captures this vibration as the basis for speech detection. As a solid vibration sensor, the device is insensitive to airborne vibration from the environment and therefore has an inherent noise suppression advantage. The device can only pick up the components of the speech signal below roughly 3.5 to 4 kHz, but most of the energy of speech is concentrated in this band.
2. G.723.1 speech compression coding standard
G.723.1 is a low-bit-rate coding algorithm released by ITU-T in 1996. It is mainly used to compress speech and other multimedia audio signals, for example in videophone systems, digital transmission systems and high-quality voice compression systems. The coder is a dual-rate speech coder with rates of 6.3 kbps and 5.3 kbps. The 6.3 kbps high rate uses the multi-pulse maximum likelihood quantization (MP-MLQ) algorithm, and the 5.3 kbps low rate uses the algebraic code excited linear prediction (ACELP) algorithm. The two algorithms share the same theoretical basis: both are based on linear predictive coding (LPC) and use an excitation source for the aperiodic component.
G.723.1 is based on the code excited linear prediction (CELP) coding model. At the encoder, the analog input signal first passes through voice-band filtering, and the filtered output is sampled at 8 kHz and converted to a 16-bit linear PCM speech signal. Each frame is 30 ms long, which corresponds to 240 samples. The CELP parameters are then extracted by analyzing the delayed speech, and these parameters are encoded and transmitted. At the decoder, the parameters are used to construct the excitation signal and the synthesis filter, and the excitation signal is passed through the synthesis filter to obtain the reconstructed speech signal.
3. Hash function
A hash function is a cryptographic algorithm used in information security that maps information of varying length to a code of fixed length, the hash value. The hash value is usually much shorter than the input. A secure hash function should satisfy at least the following conditions: 1) the input length is arbitrary; 2) the output length is fixed; 3) for any given input, the output (the hash value) is easy to compute; 4) given the description of the hash function and a randomly chosen message, it is computationally infeasible to find a different message that hashes to the same value.
Hash functions are mainly used for integrity checking and for improving the effectiveness of digital signatures. Many schemes exist, such as MD4, MD5 and SHA1. Because the MD5 hash algorithm has a "digital fingerprint" property and had become one of the most widely used file integrity check algorithms, the invention uses the MD5 hash algorithm to generate the watermark.
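The fixed-output-length property listed above can be checked directly. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_bits` is an illustrative assumption and does not appear in the patent.

```python
import hashlib

def md5_bits(message: bytes) -> str:
    """Return the MD5 digest of `message` as a string of 128 '0'/'1' characters."""
    digest = hashlib.md5(message).digest()  # always 16 bytes
    return "".join(f"{byte:08b}" for byte in digest)

short = md5_bits(b"a")
long_ = md5_bits(b"a" * 10_000)
print(len(short), len(long_))  # 128 128
```

Inputs of 1 byte and 10,000 bytes both yield a 128-bit value, which is what lets one hash output serve as a per-frame watermark of fixed size.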
Two. Notation
$ID_D$: the number of the bone conduction device
$ID_S$: the number of the speech signal
$L_{g,i}$: the LSF coefficients of frame $i$ of group $g$
$P_{g,i}$: the pitch period of frame $i$ of group $g$
$W_{g,i}$: the watermark information embedded in frame $i$ of group $g$
$LSB(x)$: the least significant bit of $x$
$Exc(i)$: the position index of the $i$-th selected pulse
$W(i)$: the $i$-th bit of the watermark information
$W'_{g,i}$: the watermark extracted from frame $i$ of group $g$ of the synthesized speech
$W''_{g,i}$: the verification watermark of frame $i$ of group $g$, generated from the synthesized speech
Three. Compression-domain speech watermarking method based on bone conduction
With reference to Fig. 1, the watermark embedding process of the invention is as follows:
Step 1: preprocess the speech signal.
The bone conduction device has good noise suppression, and the speech signal and the bone conduction signal are fully synchronous. The short-time energy distribution of the bone conduction signal is therefore computed and a threshold $e$ is set. If the bone conduction signal energy in a time segment is greater than $e$, the speech signal in that segment is left unchanged; otherwise the speech in that segment is regarded as noise or interference and is set to zero, which removes the noise from the speech signal. The bone conduction device number $ID_D$ and the speech number $ID_S$ are also extracted for watermark generation. The device number $ID_D$ is tied to the digital recording system and guarantees that identical speech segments recorded by different devices produce different watermarks. The speech signal number $ID_S$ is tied to the time and count of the recording and guarantees that identical speech data recorded by the same device at different times produce different watermarks.
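The energy gating in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not give the window length or the threshold value, so `win` and `threshold` are hypothetical parameters, and both signals are assumed to be equally long sample sequences.

```python
from typing import List, Sequence

def gate_by_bone_energy(speech: Sequence[float], bone: Sequence[float],
                        win: int, threshold: float) -> List[float]:
    """Zero every window of `speech` whose bone-conduction energy is <= threshold."""
    if len(speech) != len(bone):
        raise ValueError("speech and bone signals must be synchronous and equally long")
    out = list(speech)
    for start in range(0, len(bone), win):
        stop = min(start + win, len(bone))
        # Short-time energy of the bone-conduction signal in this window.
        energy = sum(x * x for x in bone[start:stop])
        if energy <= threshold:
            out[start:stop] = [0.0] * (stop - start)  # treated as noise
    return out

# First window has no bone-conduction energy, so its speech samples are zeroed.
print(gate_by_bone_energy([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0], win=2, threshold=0.5))
```

Windows are non-overlapping here for simplicity; a real preprocessor might use overlapping windows or smoothing, which the text does not specify.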
Step 2: framing and grouping.
Divide the preprocessed speech signal into fixed-length frames of 30 ms each and gather every $n$ frames into one group; the last group is the exception, its frame count being the remainder of the total frame count divided by $n$.
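The grouping rule of Step 2 can be illustrated with a short sketch. The function names are assumptions introduced here; the convention that a total divisible by $n$ yields only full groups is inferred from the simulation in this document (850 frames forming 17 groups of 50).

```python
from typing import List, Tuple

def group_sizes(total_frames: int, n: int) -> List[int]:
    """Frame count of each group: full groups of n, then the remainder if any."""
    full, rem = divmod(total_frames, n)
    return [n] * full + ([rem] if rem else [])

def frame_position(k: int, n: int) -> Tuple[int, int]:
    """Map a 1-based global frame index k to (group g, position i), both 1-based."""
    g, i = divmod(k - 1, n)
    return g + 1, i + 1

total = 25_500 // 30              # 25.5 s of speech in 30 ms frames
sizes = group_sizes(total, 50)
print(total, len(sizes), frame_position(325, 50))  # 850 17 (7, 25)
```

Frame 325, used as the attack position in the simulations, is therefore frame 25 of group 7.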
Step 3: extract the parameters $L_{g,i}$ and $P_{g,i}$.
Using the G.723.1 speech compression coding standard, extract the LSF coefficients $L_{g,i}$ and the pitch period $P_{g,i}$ of every frame for later watermark generation, where $g$ is the index of the group containing the frame and $i$ is the position of the frame within the group.
Step 4: generate the watermark.
Use a hash function to combine the extracted bone conduction device number $ID_D$, the speech signal number $ID_S$, the LSF coefficients $L_{g,i}$ of the current frame, the pitch period $P_{g,i+1}$ extracted from the next frame and the watermark $W_{g,i-1}$ of the previous frame into the watermark $W_{g,i}$ embedded in the current frame, according to the following three cases:
(1) If the current frame is the first frame of a group, generate the watermark $W_{g,1}$ as
$W_{g,1} = H_x(ID_D, ID_S, g, W_{g-1,n}, L_{g,1}, P_{g,2})$
where $H_x(\cdot)$ denotes the hash function,
$W_{g,1}$ is the watermark generated for the first frame of group $g$; when $g = 1$, $W_{0,n}$ is set to the private key Key,
$W_{g-1,n}$ is the watermark of frame $n$ of group $g-1$, that is, the watermark of the previous frame,
$L_{g,1}$ is the LSF coefficients of frame 1 of group $g$,
$P_{g,2}$ is the pitch period of frame 2 of group $g$.
(2) If the current frame is the last frame of the last group, and the last group has $m$ frames, generate the watermark $W_{T,m}$ as
$W_{T,m} = H_x(ID_D, ID_S, W_{T,m-1}, L_{T,m}, (T-1) \times n + m)$
where $W_{T,m}$ is the watermark generated for frame $m$ of group $T$,
$W_{T,m-1}$ is the watermark of frame $m-1$ of group $T$,
$L_{T,m}$ is the LSF coefficients of frame $m$ of group $T$,
$(T-1) \times n + m$ is the total number of speech frames.
(3) In all other cases, generate the watermark $W_{g,i}$ as
$W_{g,i} = H_x(ID_D, ID_S, W_{g,i-1}, L_{g,i}, P_{g,i+1})$
where $W_{g,i}$ is the watermark generated for frame $i$ of group $g$,
$W_{g,i-1}$ is the watermark of frame $i-1$ of group $g$,
$L_{g,i}$ is the LSF coefficients of frame $i$ of group $g$,
$P_{g,i+1}$ is the pitch period of frame $i+1$ of group $g$.
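The three generation rules form a hash chain over the frames. The sketch below illustrates that chain with MD5 from Python's `hashlib`. Several details are assumptions rather than part of the patent: the byte serialization in `_h`, the representation of LSF coefficients as integer tuples and pitch periods as integers, and the choice to apply rule (2) when the last frame is also the first frame of its group.

```python
import hashlib
from typing import List, Sequence

def _h(*fields) -> bytes:
    """MD5 over a length-prefixed serialization of the fields (assumed encoding)."""
    md5 = hashlib.md5()
    for field in fields:
        data = field if isinstance(field, bytes) else repr(field).encode()
        md5.update(len(data).to_bytes(4, "big") + data)
    return md5.digest()

def generate_watermarks(id_d: bytes, id_s: bytes, key: bytes,
                        lsf: Sequence[Sequence[int]], pitch: Sequence[int],
                        n: int) -> List[bytes]:
    """Return one watermark per frame, chained through the previous watermark."""
    total = len(lsf)
    marks: List[bytes] = []
    prev = key                              # W_{0,n} is the private key
    for k in range(total):                  # k: 0-based global frame index
        g, i = divmod(k, n)                 # 0-based group and position
        if k == total - 1:                  # rule (2): last frame of last group
            w = _h(id_d, id_s, prev, tuple(lsf[k]), total)
        elif i == 0:                        # rule (1): first frame of a group
            w = _h(id_d, id_s, g + 1, prev, tuple(lsf[k]), pitch[k + 1])
        else:                               # rule (3): every other frame
            w = _h(id_d, id_s, prev, tuple(lsf[k]), pitch[k + 1])
        marks.append(w)
        prev = w
    return marks

lsf = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
pitch = [40, 41, 42, 43, 44]
marks = generate_watermarks(b"device-1", b"rec-1", b"key", lsf, pitch, n=2)
print(len(marks), len(marks[0]))  # 5 16
```

Because each watermark depends on the next frame's pitch and the previous watermark, changing one frame alters the generated watermarks of its neighbour and of every later frame in this sketch, which is what makes the positions of mismatches informative during verification.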
Step 5: embed the watermark.
Because G.723.1 is based on the code excited linear prediction (CELP) coding model, in which the excitation signal has the least influence on speech quality, the watermark $W_{g,i}$ of the current frame is embedded in the position indices of the multi-pulse excitation of the speech coder, that is, the least significant bit of each pulse position index is replaced with a watermark bit, finally yielding the watermarked compressed speech stream.
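The least-significant-bit replacement of Step 5 can be sketched on a plain list of integers. The list `positions` is a hypothetical stand-in for the pulse position indices of one frame; how those indices are actually packed in a G.723.1 bitstream is not described here and is not modelled.

```python
from typing import List, Sequence

def embed_bits(positions: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Replace the LSB of each pulse position index with one watermark bit."""
    if len(bits) > len(positions):
        raise ValueError("not enough pulse positions to carry the watermark bits")
    out = list(positions)
    for j, bit in enumerate(bits):
        out[j] = (out[j] & ~1) | (bit & 1)  # clear the LSB, then set it to the bit
    return out

print(embed_bits([12, 7, 30, 45], [1, 0, 0, 1]))  # [13, 6, 30, 45]
```

Each index changes by at most 1, which is the sense in which the embedding only lightly perturbs the excitation.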
With reference to Fig. 2, the watermark extraction process of the invention is as follows:
Step A: decoding.
Following the G.723.1 speech compression coding standard, decode the watermarked coded speech stream and extract the LSF coefficients $L_{g,i}$ and the pitch period $P_{g,i}$ of every frame.
Step B: watermark extraction.
Because the watermark is embedded in the least significant bits of the position indices of the selected excitation pulses, the watermark is extracted as
$W'(i) = LSB(Exc(i))$
where $W'(i)$ denotes the $i$-th bit of the extracted watermark information.
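The extraction formula $W'(i) = LSB(Exc(i))$ amounts to reading the lowest bit of each selected pulse position index. A minimal sketch, again treating the indices as a hypothetical list of integers:

```python
from typing import List, Sequence

def extract_bits(positions: Sequence[int]) -> List[int]:
    """Read W'(i) = LSB(Exc(i)) from each selected pulse position index."""
    return [p & 1 for p in positions]

print(extract_bits([13, 6, 30, 45]))  # [1, 0, 0, 1]
```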
Step C: generate the verification watermark.
Use the hash function to combine the bone conduction device number $ID_D$, the speech signal number $ID_S$, the LSF coefficients $L_{g,i}$ of the current frame, the pitch period $P_{g,i+1}$ extracted from the next frame and the watermark $W'_{g,i-1}$ extracted from the previous frame into the verification watermark $W''_{g,i}$, according to the following three cases:
(1) If the current frame is the first frame of a group, generate the verification watermark $W''_{g,1}$ as
$W''_{g,1} = H_x(ID_D, ID_S, g, W'_{g-1,n}, L_{g,1}, P_{g,2})$
where $H_x(\cdot)$ denotes the hash function,
$W''_{g,1}$ is the verification watermark of frame 1 of group $g$; when $g = 1$, $W'_{0,n}$ is the private key Key,
$W'_{g-1,n}$ is the watermark extracted from frame $n$ of group $g-1$, that is, from the previous frame,
$L_{g,1}$ is the LSF coefficients of frame 1 of group $g$,
$P_{g,2}$ is the pitch period of frame 2 of group $g$;
(2) If the current frame is the last frame of the last group, and the last group has $m$ frames, generate the verification watermark $W''_{T,m}$ as
$W''_{T,m} = H_x(ID_D, ID_S, W'_{T,m-1}, L_{T,m}, (T-1) \times n + m)$
where $W''_{T,m}$ is the verification watermark generated for frame $m$ of group $T$,
$W'_{T,m-1}$ is the watermark extracted from frame $m-1$ of group $T$,
$L_{T,m}$ is the LSF coefficients of frame $m$ of group $T$,
$(T-1) \times n + m$ is the total number of speech frames.
(3) In all other cases, generate the verification watermark $W''_{g,i}$ as
$W''_{g,i} = H_x(ID_D, ID_S, W'_{g,i-1}, L_{g,i}, P_{g,i+1})$
where $W''_{g,i}$ is the verification watermark generated for frame $i$ of group $g$,
$W'_{g,i-1}$ is the watermark extracted from frame $i-1$ of group $g$,
$L_{g,i}$ is the LSF coefficients of frame $i$ of group $g$,
$P_{g,i+1}$ is the pitch period of frame $i+1$ of group $g$.
Step D: watermark verification.
Compare the watermark $W'_{g,i}$ extracted from frame $i$ of group $g$ with the generated verification watermark $W''_{g,i}$. If $W'_{g,i} \neq W''_{g,i}$, frame $i$ of group $g$ is judged to be in error, and the position and type of the attack are determined from the positions and distribution of the errors by the following procedure:
(4a) Detect the attack position
Starting from the first frame of the first group, compare the extracted watermark $W'_{g,i}$ with the generated verification watermark $W''_{g,i}$. If they agree, continue with the next frame; if $W'_{g,i} \neq W''_{g,i}$, frame $i$ of group $g$ is judged to be in error, and the attacked position is frame $i$ of group $g$;
(4b) Define the error classes
Class 1 error: the watermark extracted from the first frame of a group differs from its verification watermark, while the extracted watermarks of the preceding and following frames agree with their verification watermarks, that is, $W'_{g,1} \neq W''_{g,1}$, $W'_{g-1,n} = W''_{g-1,n}$, $W'_{g,2} = W''_{g,2}$;
Class 2 error: the watermark extracted from the last frame of the last group differs from its verification watermark, while the extracted watermark of the preceding frame agrees with its verification watermark;
Class 3 error: any other error;
(4c) Judge the attack type from the distribution of the error classes
If the attack is an insertion attack, a continuous run of class 3 errors appears at the insertion position; after this run, class 1 and class 3 errors appear at equal intervals, the interval being the number of frames per group; and a class 2 error appears at the last frame;
If the attack is a replacement attack, a continuous run of class 3 errors appears at the replaced position and the other frames are error-free;
If the attack is a deletion attack, class 1 and class 3 errors appear at equal intervals after the deletion position, the interval being the number of frames per group, and a class 2 error appears at the last frame.
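The error classes in (4b) and the decision rules in (4c) can be sketched as follows. This is a simplified heuristic under stated assumptions, not the patent's exact procedure: `match` is a hypothetical per-frame list that is True where $W'_{g,i} = W''_{g,i}$, a single attack is assumed, the first frame of the stream is treated as having a consistent predecessor, and the interval between periodic errors is not checked.

```python
from typing import List

def classify_errors(match: List[bool], n: int) -> List[int]:
    """Label each frame 0 (no error) or 1, 2, 3 (error class) from match flags."""
    total = len(match)
    labels = [0] * total
    for k, ok in enumerate(match):           # k: 0-based global frame index
        if ok:
            continue
        prev_ok = k == 0 or match[k - 1]
        next_ok = k + 1 < total and match[k + 1]
        if k == total - 1 and prev_ok:
            labels[k] = 2                    # last frame, previous frame consistent
        elif k % n == 0 and prev_ok and next_ok:
            labels[k] = 1                    # first frame of a group, neighbours consistent
        else:
            labels[k] = 3                    # every other error
    return labels

def judge_attack(labels: List[int]) -> str:
    """Guess the attack type from the distribution of error classes."""
    errors = [k for k, label in enumerate(labels) if label]
    if not errors:
        return "none"
    first = errors[0]
    run = 0                                  # length of the leading run of class 3 errors
    while first + run < len(labels) and labels[first + run] == 3:
        run += 1
    later = any(labels[first + max(run, 1):])
    if not later:
        return "replacement" if run >= 1 else "unclassified"
    return "insertion" if run > 1 else "deletion"

# Replacement of frames 325..540 (1-based) in an 850-frame signal, 50 frames per group.
match = [not (324 <= k <= 539) for k in range(850)]
print(judge_attack(classify_errors(match, 50)))  # replacement
```

The first error index gives the attack position of (4a). Distinguishing a one-frame insertion from a deletion would need more than this heuristic, since both start with a single class 3 error.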
The effect of the invention is further illustrated by the following simulation experiments:
1. Simulation conditions
A mono WAV speech signal with a sampling frequency of 8 kHz and 16-bit quantization is used as the watermark carrier. The speech is 25.5 s long and is divided into 850 frames of 30 ms each, which form 17 groups of 50 frames. The software environment is Matlab 7.0. The following attack tests were designed:
(1) Insertion attack: a 216-frame signal segment containing no watermark is inserted at the position of frame 325 of the synthesized speech.
(2) Replacement attack: a 216-frame signal segment containing no watermark replaces frames 325 to 540 of the synthesized speech.
(3) Deletion attack: frames 325 to 500 of the synthesized speech are deleted.
(4) Combined attack: a 116-frame signal segment containing no watermark replaces frames 175 to 290 of the synthesized speech, and another 116-frame signal segment containing no watermark is inserted at the position of frame 475.
2. Simulation results and analysis
The experimental results are shown in Fig. 3, Fig. 4, Fig. 5 and Fig. 6, which give the speech waveform and the watermark detection result after the insertion, replacement, deletion and combined attacks respectively. In these figures:
Fig. 3a is the waveform of the speech signal after the insertion attack. It shows that a 216-frame signal segment containing no watermark has been inserted at frames 325 to 540.
Fig. 3b is the watermark detection result. A continuous run of class 3 errors, 216 frames long, appears at frames 325 to 540. After this run, class 1 and class 3 errors appear at equal intervals of 50 frames, which is the number of frames per group, and a class 2 error appears at the last frame of the speech. The attacked position is therefore frame 325. Because a replacement attack produces no further errors after the continuous class 3 errors, and a deletion attack produces no continuous class 3 errors, the attack is judged to be an insertion attack with an insertion length of 216 frames.
Fig. 4a is the waveform of the speech signal after the replacement attack. It shows that frames 325 to 540 have been replaced by an unwatermarked signal segment of equal length.
Fig. 4b is the watermark detection result. Continuous class 3 errors appear at frames 325 to 540, and the frames after them are error-free. The attacked position is therefore frame 325. Because an insertion attack produces further errors after the continuous class 3 errors, and a deletion attack produces no continuous class 3 errors, the attack is judged to be a replacement attack covering frames 325 to 540.
Fig. 5a is the waveform of the speech signal after the deletion attack. It shows that frames 325 to 500 have been deleted.
Fig. 5 b is a watermarking detecting results, uniformly-spaced occurs Error type I and the 3rd class mistake after the 325th frame of this figure, its at interval length be i.e. 50 frames of every group of frame number that comprises, error type II then appears at the last frame of voice.Can judge that thus position under fire is the 325th frame, owing to insert to attack and after the 3rd continuous class mistake, other mistakes can occur, other mistakes can not appear after the 3rd continuous class mistake and replace to attack, therefore can judge that the suffered attack of voice is the deletion attack, because the position of the frame that the 3rd class mistake that uniformly-spaced occurs occurs is 325,375,425..., its remainder divided by every group frame number 50 all is 25, therefore the length L of deletion is: L=50 * N+25+1, N=0,1 ... recomputate the checking watermark of last frame
Figure G2008101507573D00091
:
W T , m ′ ′ = H x ( D , S , W T , m - 1 , L T , m , ( T - 1 ) × n + m + 50 × N + 25 + 1 )
Search for N=3, make W T , m ′ ′ = W T , m ′ , Thereby calculating deletion length is 176 frames.
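The search over $N$ can be sketched as a loop that recomputes the last-frame hash for each candidate length $L = n \times N + r + 1$ until it matches the extracted value. The serialization in `last_frame_mark`, the function names and the `max_n` bound are assumptions; the sketch also compares whole MD5 digests, whereas an implementation would compare whatever bits were actually embedded.

```python
import hashlib
from typing import Optional, Sequence

def last_frame_mark(id_d: bytes, id_s: bytes, prev_mark: bytes,
                    lsf_last: Sequence[int], total_frames: int) -> bytes:
    """Hash for the last frame; the total frame count is one of the inputs."""
    payload = b"|".join([id_d, id_s, prev_mark.hex().encode(),
                         repr(tuple(lsf_last)).encode(),
                         str(total_frames).encode()])
    return hashlib.md5(payload).digest()

def search_deleted_length(extracted_last: bytes, id_d: bytes, id_s: bytes,
                          prev_mark: bytes, lsf_last: Sequence[int],
                          observed_total: int, n: int, remainder: int,
                          max_n: int = 100) -> Optional[int]:
    """Try L = n*N + remainder + 1 for N = 0, 1, ... until the hash matches."""
    for big_n in range(max_n + 1):
        length = n * big_n + remainder + 1
        candidate = last_frame_mark(id_d, id_s, prev_mark, lsf_last,
                                    observed_total + length)
        if candidate == extracted_last:
            return length
    return None

# The mark was made for 850 frames; 674 frames remain after the deletion.
prev = b"\x01" * 16
embedded = last_frame_mark(b"device-1", b"rec-1", prev, [3, 1, 4], 850)
print(search_deleted_length(embedded, b"device-1", b"rec-1", prev, [3, 1, 4],
                            observed_total=674, n=50, remainder=25))  # 176
```

With the simulation's numbers the candidates are 26, 76, 126 and 176 frames, and only 176 restores the original total of 850, matching the deletion of frames 325 to 500.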
Fig. 6 a is the oscillogram of voice signal after ganging up against, and this figure demonstration is replaced by one section isometric signal that does not contain watermark from the 175th frame~290 frames, and it is the signal that does not contain watermark of one section of 116 frame that the position of the 475th frame has been inserted into length.
Fig. 6 b is a watermarking detecting results, as can be seen from the figure: the 3rd continuous class mistake all occurs at the 175th~290 frame and the 475th frame~590 frames, because the frame of replacing after attacking does not have mistake, insert then equally spaced Error type I and the 3rd class mistake of occurring of frame after attacking, therefore can judge that voice are subjected to replacing attack at the 175th~290 frame, the 475th~590 frame is subjected to inserting and attacks.
Above simulation result shows the present invention can judge not only whether host's carrier is attacked, and can accurately detect the position and the attack type of attack.

Claims (5)

1. A speech compression watermark embedding method based on bone conduction, comprising the following steps:
(1) using the bone conduction device to pre-process the speech signal, including noise removal, and extracting the serial number $ID_D$ of the bone conduction device and the serial number $ID_S$ of the speech signal;
(2) dividing the pre-processed speech signal into a number of fixed-length frames according to the length of the speech, each frame being 30 ms long, and grouping every n frames into one group, except for the last group, whose number of frames is the remainder of the total number of frames divided by n;
(3) using the G.723.1 speech compression coding standard to extract the line spectral frequency coefficients $L_{g,i}$ and the pitch period $P_{g,i}$ of each frame, where $L_{g,i}$ and $P_{g,i}$ denote the line spectral frequency coefficients and the pitch period of the i-th frame of group g, respectively;
(4) using a hash function to generate the watermark $W_{g,i}$ to be embedded in the current frame from the extracted bone conduction device number $ID_D$, the speech signal number $ID_S$, the line spectral frequency coefficients $L_{g,i}$ of the current frame, the pitch period $P_{g,i+1}$ extracted from the next frame, and the watermark $W_{g,i-1}$ of the previous frame;
(5) embedding the watermark $W_{g,i}$ of the current frame in the position index of the multi-pulse excitation of the speech coder, i.e. replacing the least significant bit of the pulse excitation position index with the watermark, to finally obtain a watermarked compressed speech bitstream.
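Steps (2) and (5) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, frames are represented by their 1-based numbers, and one watermark bit per pulse position index is an assumption, since the claim does not state how a multi-bit watermark is spread over the pulse positions of a G.723.1 frame.

```python
def group_frames(total_frames, n):
    """Return groups of 1-based frame numbers: n frames per group, with the
    last group holding the frames that are left over."""
    frames = list(range(1, total_frames + 1))
    return [frames[i:i + n] for i in range(0, total_frames, n)]

def embed_bit(position_index, bit):
    """Replace the least significant bit of a pulse position index with one watermark bit."""
    return (position_index & ~1) | (bit & 1)

def extract_bit(position_index):
    """Read the watermark bit back from the least significant bit."""
    return position_index & 1

groups = group_frames(120, 50)
print([len(g) for g in groups])      # [50, 50, 20]
marked = embed_bit(0b1011, 0)
print(marked, extract_bit(marked))   # 10 0
```

Note that when the total number of frames is an exact multiple of n, this sketch leaves a full last group of n frames rather than an empty one.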
2. The watermark embedding method according to claim 1, wherein step (4) is carried out according to the following three cases:
(4a) if the current frame is the first frame of a group, the watermark $W_{g,1}$ is generated by the following formula:
$$W_{g,1} = H_x(ID_D, ID_S, g, W_{g-1,n}, L_{g,1}, P_{g,2})$$
where $H_x(\cdot)$ denotes the hash function,
$W_{g,1}$ is the watermark generated for the first frame of group g; when g = 1, $W_{0,n}$ is set to the private key Key,
$W_{g-1,n}$ is the watermark of the n-th frame of group g-1, i.e. the watermark of the previous frame,
$L_{g,1}$ is the line spectral frequency coefficients of the first frame of group g,
$P_{g,2}$ is the pitch period of the second frame of group g;
(4b) if the current frame is the last frame of the last group, and the last group contains m frames, the watermark $W_{T,m}$ is generated by the following formula:
$$W_{T,m} = H_x(ID_D, ID_S, W_{T,m-1}, L_{T,m}, (T-1) \times n + m)$$
where $W_{T,m}$ is the watermark generated for the m-th frame of group T,
$W_{T,m-1}$ is the watermark of the (m-1)-th frame of group T,
$L_{T,m}$ is the line spectral frequency coefficients of the m-th frame of group T,
$(T-1) \times n + m$ is the total number of frames of the speech;
(4c) in all other cases, the watermark $W_{g,i}$ is generated by the following formula:
$$W_{g,i} = H_x(ID_D, ID_S, W_{g,i-1}, L_{g,i}, P_{g,i+1})$$
where $W_{g,i}$ is the watermark generated for the i-th frame of group g,
$W_{g,i-1}$ is the watermark of the (i-1)-th frame of group g,
$L_{g,i}$ is the line spectral frequency coefficients of the i-th frame of group g,
$P_{g,i+1}$ is the pitch period of the (i+1)-th frame of group g.
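The three generation rules can be sketched in Python as a single chained loop. The sketch rests on several assumptions that are not in the claim: SHA-256 truncated to 32 bits stands in for $H_x$, the features of each frame are toy integers rather than real G.723.1 parameters, and when a frame is both the first frame of a group and the last frame of the speech, rule (4b) is given priority. All names are hypothetical.

```python
import hashlib

def h_x(*fields):
    """Stand-in for H_x (assumption): SHA-256 over the joined fields, truncated to 32 bits."""
    data = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def generate_watermarks(id_d, id_s, key, lsf, pitch, n):
    """Chain the watermarks over all frames; lsf[k] and pitch[k] belong to 0-based frame k."""
    total = len(lsf)
    marks = []
    prev = key                        # W_{0,n} = Key seeds the chain
    for k in range(total):
        g, i = k // n + 1, k % n + 1  # 1-based group number and in-group frame index
        if k == total - 1:            # case (4b): last frame of the last group
            w = h_x(id_d, id_s, prev, lsf[k], total)
        elif i == 1:                  # case (4a): first frame of a group
            w = h_x(id_d, id_s, g, prev, lsf[k], pitch[k + 1])
        else:                         # case (4c): every other frame
            w = h_x(id_d, id_s, prev, lsf[k], pitch[k + 1])
        marks.append(w)
        prev = w
    return marks

lsf = [(k * 7) % 97 for k in range(12)]         # toy per-frame LSF features
pitch = [40 + (k * 3) % 11 for k in range(12)]  # toy per-frame pitch periods
marks = generate_watermarks("dev-01", "speech-01", "Key", lsf, pitch, n=5)
print(len(marks))  # 12
```

Because each watermark depends on the previous watermark and on the next frame's pitch period, a change to one frame alters the watermarks from the preceding frame onward, while earlier frames are unaffected.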
3. A speech compression watermark extraction and verification method based on bone conduction, comprising the following steps:
1) decoding the watermarked compressed bitstream;
2) extracting the watermark $W'_{g,i}$ from the position indices of the selected pulse excitation;
3) using the hash function to generate the verification watermark $W''_{g,i}$ from the bone conduction device number $ID_D$, the speech signal number $ID_S$, the line spectral frequency coefficients $L_{g,i}$ of the current frame, the pitch period $P_{g,i+1}$ extracted from the next frame, and the watermark $W'_{g,i-1}$ extracted from the previous frame;
4) comparing the watermark $W'_{g,i}$ extracted from the i-th frame of group g with the generated verification watermark $W''_{g,i}$; if $W'_{g,i}$ is not equal to $W''_{g,i}$, the i-th frame of group g is judged to be in error, and the position and type of the attack are determined from the positions and distribution of the errors.
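The comparison in steps 3) and 4) can be sketched in Python as follows. The point the sketch tries to show is that each verification watermark is recomputed from the watermark extracted from the previous frame, so a local change produces local errors instead of invalidating every later frame. The function `verify`, the callback `expected_fn`, and the small arithmetic rule used in place of the hash are all hypothetical stand-ins.

```python
def verify(extracted, key, expected_fn):
    """Return the 1-based numbers of frames whose extracted watermark differs
    from the verification watermark recomputed from the previous extracted one."""
    errors = []
    for k, w in enumerate(extracted):
        prev = key if k == 0 else extracted[k - 1]
        if w != expected_fn(k, prev):
            errors.append(k + 1)
    return errors

# Toy stand-in for hashing the frame features (assumption, not H_x itself).
features = [3, 1, 4, 1, 5, 9, 2, 6]

def toy_rule(k, prev):
    return (prev * 31 + features[k]) % 256

# Build a consistent watermark chain, then tamper with the watermark of frame 4.
key, chain = 7, []
for k in range(len(features)):
    chain.append(toy_rule(k, key if k == 0 else chain[-1]))
tampered = list(chain)
tampered[3] = (tampered[3] + 1) % 256

print(verify(chain, key, toy_rule))     # []
print(verify(tampered, key, toy_rule))  # [4, 5]
```

In the tampered case, frame 4 fails because its extracted watermark changed, and frame 5 fails because its verification watermark is computed from that changed value; frame 6 onward verifies again.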
4. The watermark extraction and verification method according to claim 3, wherein step 3) is carried out according to the following three cases:
(3a) if the current frame is the first frame of a group, the verification watermark $W''_{g,1}$ is generated by the following formula:
$$W''_{g,1} = H_x(ID_D, ID_S, g, W'_{g-1,n}, L_{g,1}, P_{g,2})$$
where $H_x(\cdot)$ denotes the hash function,
$W''_{g,1}$ is the verification watermark of the first frame of group g; when g = 1, $W_{0,n}$ is the private key Key,
$W'_{g-1,n}$ is the watermark of the n-th frame of group g-1, i.e. the watermark extracted from the previous frame,
$L_{g,1}$ is the line spectral frequency coefficients of the first frame of group g,
$P_{g,2}$ is the pitch period of the second frame of group g;
(3b) if the current frame is the last frame of the last group, and the last group contains m frames, the verification watermark $W''_{T,m}$ is generated by the following formula:
$$W''_{T,m} = H_x(ID_D, ID_S, W'_{T,m-1}, L_{T,m}, (T-1) \times n + m)$$
where $W''_{T,m}$ is the verification watermark generated for the m-th frame of group T,
$W'_{T,m-1}$ is the watermark extracted from the (m-1)-th frame of group T,
$L_{T,m}$ is the line spectral frequency coefficients of the m-th frame of group T,
$(T-1) \times n + m$ is the total number of frames of the speech;
(3c) in all other cases, the verification watermark $W''_{g,i}$ is generated by the following formula:
$$W''_{g,i} = H_x(ID_D, ID_S, W'_{g,i-1}, L_{g,i}, P_{g,i+1})$$
where $W''_{g,i}$ is the verification watermark generated for the i-th frame of group g,
$W'_{g,i-1}$ is the watermark extracted from the (i-1)-th frame of group g,
$L_{g,i}$ is the line spectral frequency coefficients of the i-th frame of group g,
$P_{g,i+1}$ is the pitch period of the (i+1)-th frame of group g.
5. The watermark extraction and verification method according to claim 3, wherein the determination of the position and type of the attack in step 4) is carried out as follows:
(4a) detecting the attack position:
starting from the first frame of the first group, the extracted watermark $W'_{g,i}$ is compared with the generated verification watermark $W''_{g,i}$; if $W'_{g,i}$ and $W''_{g,i}$ are consistent, the comparison continues with the next frame; if $W'_{g,i} \neq W''_{g,i}$, the i-th frame of group g is judged to be in error, and the attacked position is the i-th frame of group g;
(4b) defining the error types:
Type I error: the watermark extracted from the first frame of a group is inconsistent with the verification watermark, but the watermarks extracted from the preceding frame and the following frame are consistent with their verification watermarks, that is:
$$W'_{g,1} \neq W''_{g,1}$$
$$W'_{g-1,n} = W''_{g-1,n}, \quad W'_{g,2} = W''_{g,2}$$
Type II error: the watermark extracted from the last frame of the last group is inconsistent with the verification watermark, but the watermark extracted from the preceding frame is consistent with its verification watermark;
Type III error: every other error is a Type III error;
(4c) determining the attack type from the distribution of the different error types:
if the attack is an insertion attack, consecutive Type III errors appear at the insertion position, Type I and Type III errors appear at equal intervals after the consecutive Type III errors, and a Type II error appears at the last frame, the interval being the number of frames in each group;
if the attack is a replacement attack, consecutive Type III errors appear at the replacement position and the other frames contain no errors;
if the attack is a deletion attack, Type I and Type III errors appear at equal intervals after the deletion position, the interval being the number of frames in each group, and a Type II error appears at the last frame.
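The error classification of (4b) and the decision rules of (4c) can be sketched in Python as follows. This is a simplified heuristic under stated assumptions, not the patented procedure: it takes the list of erroneous frame numbers as given, assumes a single attack (so it would not separate the combined attack of Fig. 6), and does not check that the scattered errors are exactly one group length apart. The function names and the sample error positions are hypothetical.

```python
def classify_errors(error_frames, total, n):
    """Map each erroneous 1-based frame number to error type 1, 2 or 3."""
    errs = set(error_frames)
    types = {}
    for f in sorted(errs):
        if f == total and (f - 1) not in errs:
            types[f] = 2   # last frame wrong, previous frame consistent
        elif (f - 1) % n == 0 and (f - 1) not in errs and (f + 1) not in errs:
            types[f] = 1   # isolated error on the first frame of a group
        else:
            types[f] = 3   # every other error
    return types

def attack_type(types):
    """Heuristic decision following (4c); assumes at most one attack."""
    if not types:
        return "none"
    frames = sorted(types)
    run_end = frames[0]
    while run_end + 1 in types:          # extend the run that starts at the attack position
        run_end += 1
    has_run = run_end > frames[0]
    only_run = frames[-1] == run_end     # no errors after the initial run
    if only_run and set(types.values()) == {3}:
        return "replacement"
    return "insertion" if has_run else "deletion"

# Deletion-like pattern: isolated errors after frame 325 and an error on the last frame.
deletion = classify_errors([325, 351, 375, 401, 425, 451, 824], total=824, n=50)
print(attack_type(deletion))                                             # deletion
# Replacement-like pattern: one run of consecutive errors and nothing else.
print(attack_type(classify_errors(range(175, 291), total=1000, n=50)))   # replacement
```

The first detected error frame, `min(types)`, corresponds to the attack position of (4a).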
CN2008101507573A 2008-08-29 2008-08-29 Method for compressing watermark using voice based on bone conduction Expired - Fee Related CN101350198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101507573A CN101350198B (en) 2008-08-29 2008-08-29 Method for compressing watermark using voice based on bone conduction

Publications (2)

Publication Number Publication Date
CN101350198A CN101350198A (en) 2009-01-21
CN101350198B true CN101350198B (en) 2011-09-21

Family

ID=40268955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101507573A Expired - Fee Related CN101350198B (en) 2008-08-29 2008-08-29 Method for compressing watermark using voice based on bone conduction

Country Status (1)

Country Link
CN (1) CN101350198B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011155323A (en) * 2010-01-25 2011-08-11 Sony Corp Digital watermark generating apparatus, electronic-watermark verifying apparatus, method of generating digital watermark, and method of verifying digital watermark
CN107545899B (en) * 2017-09-06 2021-02-19 武汉大学 AMR steganography method based on unvoiced fundamental tone delay jitter characteristic
CN114333859A (en) * 2020-09-30 2022-04-12 华为技术有限公司 Audio watermark adding and analyzing method, equipment and medium
US20220303642A1 (en) * 2021-03-19 2022-09-22 Product Development Associates, Inc. Securing video distribution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
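Li Jing. Speech reconstruction technology based on bone-conducted signals. Master's thesis, Northwestern Polytechnical University. 2004, *
Chen Mingqi, Niu Xinxin, Yang Yixian. Attack methods for digital watermarking. Journal of Electronics & Information Technology. 2001, 705-711. *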

Also Published As

Publication number Publication date
CN101350198A (en) 2009-01-21

Similar Documents

Publication Publication Date Title
US7546173B2 (en) Apparatus and method for audio content analysis, marking and summing
CN101345054B (en) Digital watermark production and recognition method used for audio document
CN101271690B (en) Audio spread-spectrum watermark processing method for protecting audio data
Nematollahi et al. Digital watermarking
Li et al. Detection of quantization index modulation steganography in G. 723.1 bit stream based on quantization index sequence analysis
US7140043B2 (en) Watermark embedding and detecting method by quantization of a characteristic value of a signal
CN107993669B (en) Voice content authentication and tampering recovery method based on modification of least significant digit weight
CN101350198B (en) Method for compressing watermark using voice based on bone conduction
CN103456308B (en) A kind of recoverable ciphertext domain voice content authentication method
Chen et al. Content-dependent watermarking scheme in compressed speech with identifying manner and location of attacks
CN105304091B (en) A kind of voice tamper recovery method based on DCT
CN102222504A (en) Digital audio multilayer watermark implanting and extracting method
Park et al. Speech authentication system using digital watermarking and pattern recovery
CN105895109B (en) A kind of digital speech evidence obtaining and tamper recovery method based on DWT and DCT
Wang et al. Tampering Detection Scheme for Speech Signals using Formant Enhancement based Watermarking.
Wu et al. Fragile speech watermarking for content integrity verification
CN114999502B (en) Adaptive word framing based voice content watermark generation and embedding method and voice content integrity authentication and tampering positioning method
CN100353444C (en) Digital audio-frequency anti-distorting method
Wu et al. Speech Content Authentication Integrated With Celp Speech Coders.
CN104091104A (en) Feature extraction and authentication method for multi-format audio perceptual Hashing authentication
Yargıçoğlu et al. Hidden data transmission in mixed excitation linear prediction coded speech using quantisation index modulation
CN108877819B (en) Voice content evidence obtaining method based on coefficient autocorrelation
Wu et al. Comparison of two speech content authentication approaches
Liu et al. Fragile speech watermarking scheme with recovering speech contents
Jiao et al. Compressed domain perceptual hashing for MELP coded speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110921

Termination date: 20140829

EXPY Termination of patent right or utility model