CN102568469B - G.729A compressed pronunciation flow information hiding detection device and detection method - Google Patents


Info

Publication number
CN102568469B
Authority
CN
China
Prior art keywords
phoneme
classifier
vector
sequence
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2011104351639A
Other languages
Chinese (zh)
Other versions
CN102568469A (en)
Inventor
李松斌
黄永峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN2011104351639A priority Critical patent/CN102568469B/en
Publication of CN102568469A publication Critical patent/CN102568469A/en
Application granted granted Critical
Publication of CN102568469B publication Critical patent/CN102568469B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a G.729A compressed voice stream information hiding detection device, which comprises at least a compressed-voice-stream-to-phoneme-sequence mapping module, a phoneme sequence feature extraction module group, a classifier device and a result integration module. The mapping module receives a compressed voice stream delivered from outside, maps it into a phoneme sequence and outputs that sequence. The phoneme sequence feature extraction module group extracts and outputs the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector of the phoneme sequence. The classifier device trains a separate classifier for each feature vector on a training set, then uses the trained classifiers to classify samples of unknown class and outputs the classification results. The result integration module integrates the outputs of the several classifiers and outputs the integrated result as the final steganography detection result. The device is applicable to detecting quantization index modulation (QIM) information hiding performed, during G.729A speech encoding, with a grouped vector codebook partitioned by the CNV (complementary neighbor vertex) optimization algorithm.

Description

G.729A compressed voice stream information hiding detection device and detection method
Technical field
The present invention relates to the field of information hiding detection, and in particular to a G.729A compressed voice stream information hiding detection device and detection method.
Background technology
In recent years, with the sustained growth of bandwidth and the strengthening trend of network convergence, VoIP has gradually become a popular streaming media communication service on the Internet and is widely used worldwide. It has thoroughly changed the structure of the voice communication market, and the network traffic it produces keeps growing, which makes VoIP very suitable for covert communication over IP networks. The G.729 standard is a VoIP speech coding standard defined by the ITU, and its simplified version G.729A is widely used in VoIP. This makes the G.729A compressed voice stream a potentially threatening cover for information hiding: using it for covert communication would pose a grave danger to national communication supervision, so studying information hiding detection methods for this carrier is necessary. Information hiding detection (also called steganalysis) is the task of judging whether hidden information exists in observed carrier data.
Current methods of information hiding in voice can be roughly divided into the following classes. The first is least significant bit (LSB) replacement or matching on pulse code modulation (PCM) speech data. The second is transform-domain methods, which first transform the carrier data into a transform domain and then embed the secret information by modifying certain transform-domain parameters; commonly used transforms include the cepstrum transform, the discrete cosine transform and the discrete wavelet transform. The third is methods based on quantization index modulation (Quantization Index Modulation, QIM), which are applicable to digital audio, image and video coding that uses vector quantization. Among these three classes, the QIM-based information hiding methods are computationally simple and fast and can perform hiding during the compression encoding process itself, which makes them particularly suitable for hiding information in G.729A voice streams; their threat to national communication security is also the greatest.
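The QIM mechanism referred to above can be illustrated with a minimal sketch. This is not the patent's G.729A implementation: the codebook, its 0/1 partition and all values below are invented for the example. The idea is only that the codebook is split into two groups, the encoder quantizes using only the group whose label equals the secret bit, and the decoder reads the bit back from the group of the received index.

```python
# Illustrative QIM sketch (toy scalar codebook, hypothetical partition).
# Embedding restricts quantization to the codebook half labeled with the
# secret bit; extraction is just the partition label of the chosen index.

def nearest_index(value, codebook, allowed):
    """Index in `allowed` whose codebook entry is closest to `value`."""
    return min(allowed, key=lambda i: abs(codebook[i] - value))

def qim_embed(value, bit, codebook, partition):
    """Quantize `value` using only the codebook entries labeled `bit`."""
    allowed = [i for i in range(len(codebook)) if partition[i] == bit]
    return nearest_index(value, codebook, allowed)

def qim_extract(index, partition):
    """The hidden bit is the partition label of the received index."""
    return partition[index]

# Toy 8-entry codebook with a fixed alternating partition.
codebook = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
partition = [0, 1, 0, 1, 0, 1, 0, 1]

samples = [2.2, 4.9, 0.3]
bits = [1, 0, 1]
indices = [qim_embed(v, b, codebook, partition) for v, b in zip(samples, bits)]
recovered = [qim_extract(i, partition) for i in indices]
print(indices, recovered)  # [3, 4, 1] [1, 0, 1]
```

Note how embedding slightly shifts the chosen indices away from the unconstrained nearest neighbor; it is exactly this disturbance of the index sequence that the detector described below exploits.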
Summary of the invention
In view of the above problems, the object of the present invention is to provide a G.729A compressed voice stream information hiding detection device and detection method, applied to detecting QIM information hiding performed in G.729A speech encoding with a grouped vector codebook partitioned by the CNV (complementary neighbor vertex) optimization algorithm.
To achieve the above object, a G.729A compressed voice stream information hiding detection device of the present invention comprises at least a compressed-voice-stream-to-phoneme-sequence mapping module, a phoneme sequence feature extraction module group, a classifier device and a result integration module, wherein:
the mapping module receives the compressed voice stream delivered from outside, maps it into a phoneme sequence and outputs that sequence;
the phoneme sequence feature extraction module group extracts and outputs the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector of the phoneme sequence;
the classifier device trains a separate classifier for each feature vector on a training set, then uses the trained classifiers to classify samples of unknown class and outputs the classification results;
the result integration module integrates the outputs of the several classifiers and outputs the integrated result as the final steganography detection result.
Preferably, the phoneme sequence feature extraction module group comprises a PVSF feature extraction module and an FOMF feature extraction module, wherein:
the PVSF feature extraction module extracts and outputs the phoneme vector space feature vector of the phoneme sequence;
the FOMF feature extraction module extracts and outputs the phoneme state transition first-order Markov feature vector.
Preferably, the classifier device comprises a first classifier, a second classifier and a third classifier, wherein:
the first classifier is trained on the phoneme vector space feature vector; the trained classifier then predicts the class of unknown samples and outputs the result to the integration module;
the second classifier is trained on the fused feature vector of the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector; the trained classifier then predicts the class of unknown samples and outputs the result to the integration module;
the third classifier is trained on the phoneme state transition first-order Markov feature vector; the trained classifier then predicts the class of unknown samples and outputs the result to the integration module.
To achieve the above object, a G.729A compressed voice stream information hiding detection method of the present invention comprises the following steps:
mapping the compressed voice stream into a phoneme sequence;
extracting the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector of the phoneme sequence;
training a separate classifier for each feature vector, and integrating the classification results of the several classifiers by majority voting to obtain the final classification result.
Preferably, the method of mapping the compressed voice stream into a phoneme sequence is: assuming that the phonemes contained in speech are finite in number, the speech to be mapped is divided into small speech segments, each corresponding to one phoneme, with the duration of a segment taken to be the G.729A frame length.
Preferably, the phoneme sequence feature extraction method is:
the vocal tract parameters during phoneme pronunciation are used as the quantitative description of the phoneme; the LPC filter in G.729A characterizes the vocal tract parameters and is determined by its quantization indices, so each phoneme is made to correspond to the first field of the LPC filter quantization index, and the statistical features of the sequence formed by this field are used as the statistical features of the phoneme sequence;
the phoneme vector space feature vector is used to quantify the distribution imbalance of the phonemes contained in the G.729A speech;
a phoneme state transition first-order Markov chain is used to model the phoneme sequence, the state transition matrix is computed to measure the correlation of the phoneme distribution, and a dimension selection method is adopted to reduce the dimensionality of the state transition matrix: the main diagonal elements of the matrix are taken as the vector characterizing the phoneme distribution correlation.
Preferably, an ensemble classification method is adopted: the phoneme vector space feature vectors and the dimension-reduced first-order Markov feature vectors of the G.729A compressed voice streams in the training set are extracted, and classifiers are trained separately with the phoneme vector space feature vector, the first-order Markov feature vector, and the fusion of the two as features.
The beneficial effects of the present invention are:
the present invention is applied to detecting QIM information hiding performed in G.729A speech encoding with a grouped vector codebook partitioned by the CNV optimization algorithm. Tests on a large amount of data show that when the number of frames in the G.729A frame sequence exceeds 640 (i.e., the voice stream is longer than 0.64 second), the system achieves a detection accuracy above 93%.
Description of drawings
Fig. 1 is a structural schematic diagram of the device described in the embodiment of the invention;
Fig. 2 is a schematic diagram of the phoneme-based voice composition model of the present invention;
Fig. 3 is an example of the disturbance of the G.729A quantization index sequence caused by using the CNV algorithm to perform optimized codebook partitioning and performing QIM embedding based on the optimally partitioned codebook;
Fig. 4 is a schematic diagram of the detection method described in the embodiment of the invention;
Fig. 5 is the detection flowchart of the present invention.
Embodiment
The present invention will be further described below in conjunction with the accompanying drawings.
As shown in Figure 1, the G.729A compressed voice stream information hiding detection device described in the embodiment of the invention comprises at least a compressed-voice-stream-to-phoneme-sequence mapping module, a phoneme sequence feature extraction module group, a classifier device and a result integration module, wherein:
the mapping module receives the compressed voice stream delivered from outside, maps it into a phoneme sequence and outputs that sequence;
the phoneme sequence feature extraction module group extracts and outputs the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector of the phoneme sequence;
the classifier device trains a separate classifier for each feature vector on a training set, then uses the trained classifiers to classify samples of unknown class and outputs the classification results;
the result integration module integrates the outputs of the several classifiers and outputs the integrated result as the final steganography detection result.
The phoneme sequence feature extraction module group comprises a PVSF feature extraction module and an FOMF feature extraction module, wherein:
the PVSF feature extraction module extracts and outputs the phoneme vector space feature vector of the phoneme sequence;
the FOMF feature extraction module extracts and outputs the phoneme state transition first-order Markov feature vector.
The classifier device comprises a first classifier, a second classifier and a third classifier, wherein:
the first classifier is trained on the phoneme vector space feature vector; the trained classifier then predicts the class of unknown samples and outputs the result to the integration module;
the second classifier is trained on the fused feature vector of the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector; the trained classifier then predicts the class of unknown samples and outputs the result to the integration module;
the third classifier is trained on the phoneme state transition first-order Markov feature vector; the trained classifier then predicts the class of unknown samples and outputs the result to the integration module.
The basic unit of human pronunciation is the phoneme. Phonemes can be divided into the large classes of vowels and consonants, and each class can be further divided into several subclasses. Different phonemes generally correspond to different vocal tract shapes. A phoneme, which may also be written as a phonetic symbol, is the elementary unit that constitutes language: these discrete elementary units cluster into words according to phonemic and grammatical rules, and words form a complete language system according to syntax. A language system exhibits statistical regularities. For example, statistics show that the most frequently used letter in English is "e", so one may infer that, mapped onto speech, the phoneme corresponding to "e" also occurs most often. Furthermore, the combinations of letters in English follow certain rules, such as "q" being mostly followed by "u", so one may infer that, mapped onto speech, the combinations of phonemes also follow certain rules. In other words, the occurrences of the individual phonemes in a section of speech are unbalanced, and the occurrences of different phonemes are correlated. Information hiding based on the QIM mechanism changes these distribution characteristics, so the phoneme distribution characteristics of a G.729A voice stream sample can be used to determine whether hidden information exists, i.e., to perform steganalysis. This steganalysis method is set forth below. For example, the pronunciation of the English word "shop" consists of the phoneme "sh" produced by a noise source, the phoneme "o" produced by a periodic sound source, and the phoneme "p" produced by an impact sound source, as shown in Fig. 2. In the ideal case a section of speech can be cut into multiple small fragments, each corresponding to a phoneme; in other words, a complete section of speech can be regarded as a sequence of phonemes. The present invention refers to this as the phoneme-based voice composition model.
As shown in Figure 4, a G.729A compressed voice stream steganalysis method comprises the following steps:
mapping the compressed voice stream into a phoneme sequence;
extracting the phoneme vector space feature vector and the phoneme state transition first-order Markov feature vector of the phoneme sequence;
training a separate classifier for each feature vector, and integrating the classification results of the several classifiers by majority voting to obtain the final classification result.
The method of mapping the compressed voice stream into a phoneme sequence is: assuming that the phonemes contained in speech are finite in number, the speech to be mapped is divided into small speech segments, each corresponding to one phoneme, with the duration of a segment taken to be the G.729A frame length.
The specific implementation principle is:
Definition 1. A phoneme ρ_i is a triple (p_i, s_i, t_i), where p_i is the phonetic symbol, s_i is the pronunciation of p_i, i.e. a small speech fragment of a certain duration, and t_i is the duration of that fragment. ρ_i is the basic constituent unit of speech, and the phoneme set P = {ρ_1, ρ_2, ..., ρ_{M-1}, ρ_M} of a language contains finitely many phonemes. A section of speech S of duration T can be cut into a chronologically ordered set of N speech segments S = {f_1, f_2, ..., f_{N-1}, f_N}; when a segment f_k = s_l (k ∈ [1, N], l ∈ [1, M]), we say f_k can be mapped to phoneme ρ_l, written f_k → ρ_l, and the set of all such mapping relations is F. The phoneme-based voice composition model is then described by the triple (P, S, F).
Based on the above model, a section of speech can be cut into a sequence of speech segments f_1, f_2, ..., f_{N-1}, f_N, and this segment sequence can be mapped to a phoneme sequence ρ_1, ρ_2, ..., ρ_{N-1}, ρ_N. The durations of the phonemes in speech are not equal: for example, the voiced sound "o" may last more than 50 milliseconds, while the voiced plosive "b" may last only 10 milliseconds, and the durations vary greatly with speaker and speaking rate. The duration t_i of a phoneme ρ_i is therefore very hard to determine in advance, which makes phoneme-based cutting of a section of speech very difficult. Since the purpose of the model established by the present invention is to analyze whether QIM steganography exists in a G.729A coded frame sequence, and G.729A divides speech into frames of 10 milliseconds and computes one set of LPC coefficients per frame (i.e., estimates one set of vocal tract parameters), G.729A in effect assumes that the vocal tract shape is stable within a 10-millisecond short interval. Assuming that different vocal tract shapes correspond to different phoneme pronunciations, each G.729A frame can be considered to correspond to one phoneme or part of one phoneme. According to statistics on actual speech, the average phoneme duration in English is much larger than 10 milliseconds, which confirms the correctness of this conclusion. Taking 10 milliseconds as the boundary, the present invention calls a phoneme whose t_i does not exceed 10 milliseconds a class-A phoneme, and otherwise a class-B phoneme. For a class-A phoneme, its duration is set to the G.729A frame length L. For a class-B phoneme, let its duration be t_i = nL (n ≥ 1); such a phoneme spans several G.729A frames, though exactly how many is hard to determine. The present invention observes that the signal waveform during class-B phoneme pronunciation is generally periodic; for example, the phoneme "o" in Fig. 2 contains four obvious periods. The signal of one period already reflects the vocal tract characteristics, so for a class-B phoneme G.729A can be considered to have estimated its vocal tract parameters repeatedly. In view of this, the present invention considers that a class-B phoneme with t_i = nL (n ≥ 1) can be divided into n frames on which LPC parameter estimation is performed separately. From the above analysis, each G.729A frame can be made to correspond to one phoneme (for a class-B phoneme, several consecutive frames may correspond to the same phoneme); from this angle each frame can be mapped to a phoneme, so a section of G.729A compressed speech can be regarded as a phoneme sequence, and the features of the phoneme sequence can be represented by the features of the speech frame sequence.
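The frame-to-phoneme mapping just described can be sketched as follows, under the assumption that a parsed G.729A frame exposes its first-stage LPC quantization index (called `l0` here; the field name and the dictionary-based frame representation are hypothetical — real bitstream parsing is codec-specific and is not shown). Each 10 ms frame then contributes one phoneme symbol in [0, 127].

```python
# Sketch: map a sequence of parsed G.729A frames to a phoneme sequence.
# Assumption: each frame is a dict carrying its first-stage LPC codebook
# index "l0" (0..127); this index is the phoneme symbol used by the detector.

def frames_to_phoneme_sequence(frames):
    """One phoneme symbol (first-stage quantization index) per frame."""
    return [frame["l0"] for frame in frames]

# Hypothetical parsed frames; only the first-stage index matters here.
frames = [{"l0": 17}, {"l0": 17}, {"l0": 93}, {"l0": 4}]
print(frames_to_phoneme_sequence(frames))  # [17, 17, 93, 4]
```

Repeated symbols (here 17, 17) correspond to the class-B case above, where several consecutive frames map to the same phoneme.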
The phoneme sequence feature extraction method is:
the vocal tract parameters during phoneme pronunciation are used as the quantitative description of the phoneme; the LPC filter in G.729A characterizes the vocal tract parameters and is determined by its quantization indices, so each phoneme is made to correspond to the first field of the LPC filter quantization index, and the statistical features of the sequence formed by this field are used as the statistical features of the phoneme sequence;
the phoneme vector space feature vector is used to quantify the distribution imbalance of the phonemes contained in the G.729A speech;
a phoneme state transition first-order Markov chain is used to model the phoneme sequence, the state transition matrix is computed to measure the correlation of the phoneme distribution, and a dimension selection method is adopted to reduce the dimensionality of the state transition matrix: the main diagonal elements of the matrix are taken as the vector characterizing the phoneme distribution correlation.
The specific implementation principle is:
Since G.729A computes one group of LPC coefficients for each frame, and the voice composition model above makes each G.729A frame correspond to one phoneme, each group of LPC coefficients is assumed to correspond to one phoneme ρ (under ideal conditions the same phoneme always yields the same LPC coefficients, and different LPC coefficients are regarded as different phonemes). Let the result of vector-quantizing the LPC coefficients of a frame be C = (c_{1,i}, c_{2,j}, c_{3,k}), where c_{1,i} ∈ L1, c_{2,j} ∈ L2, c_{3,k} ∈ L3; then the phoneme ρ and the index C form a one-to-one correspondence, which the present invention writes as
ρ ↔ C = (c_{1,i}, c_{2,j}, c_{3,k})   (1)
If a section of speech contains N frames, G.729A encoding yields the quantization index sequence C_1 C_2 C_3 ... C_{N-1} C_N, and the corresponding phoneme sequence is ρ_1 ρ_2 ρ_3 ... ρ_{N-1} ρ_N. Because the quantization result C = (c_{1,i}, c_{2,j}, c_{3,k}) has 128 × 32 × 32 = 131072 possible values, a small amount of data can hardly reflect its statistical properties; effective statistical features could be obtained only when the G.729A frame sequence is very long, which is obviously unfavorable for steganalysis. Since G.729A adopts split vector quantization, the index sequence C_1 C_2 C_3 ... C_{N-1} C_N actually consists of three subsequences, namely c_{1,1} c_{1,2} c_{1,3} ... c_{1,N-1} c_{1,N}, c_{2,1} c_{2,2} c_{2,3} ... c_{2,N-1} c_{2,N} and c_{3,1} c_{3,2} c_{3,3} ... c_{3,N-1} c_{3,N}. From correspondence (1) it follows that the phoneme sequence also corresponds one-to-one to each of the three sub-index sequences, so QIM steganography will disturb these subsequences (meaning the phoneme sequence is disturbed and its statistical properties on some dimensions change). Of the three fields contained in the quantization result C, the importance of c_{1,i} exceeds that of c_{2,j} and c_{3,k}: c_{1,i} is the first-stage vector needed to compute all 10 LPC coefficients, while c_{2,j} and c_{3,k} are second-stage vectors used only to compute the first five and the last five LPC coefficients respectively. Therefore, to achieve dimensionality reduction, the present invention takes c_{1,i} as an approximate feature characterizing the phoneme ρ_i, i.e.
ρ_i ↔ c_{1,i}
After this reduction ρ_i has only 128 possible values. Accordingly, the statistical features of the quantization subsequence c_{1,1} c_{1,2} c_{1,3} ... c_{1,N-1} c_{1,N} can be adopted as the statistical features of the phoneme sequence ρ_1 ρ_2 ρ_3 ... ρ_{N-1} ρ_N.
Performing codebook partitioning with the CNV algorithm and then performing QIM embedding causes a large disturbance of the quantization index sequence; Fig. 3 gives an example of such disturbance. It shows, for four different categories of speakers, the change of the index sequence c_{1,1} c_{1,2} c_{1,3} ... c_{1,99} c_{1,100} of a section of speech (1 second long, comprising 100 G.729A speech frames) after secret information is embedded. In the four subgraphs of this figure the abscissa is the time-ordered frame number and the ordinate is the quantization index c_{1,i} (1 ≤ i ≤ 100); for an English male, an English female, a Chinese male and a Chinese female speaker, the change of the index sequence before and after embedding is evidently significant. According to the above analysis the phoneme sequence possesses certain statistical properties, and from the correspondence ρ_i ↔ c_{1,i} the quantization index sequence possesses these statistical properties as well. The steganographic write operation disturbs the quantization index sequence and changes some of its statistical features. Obviously, if this kind of change can be effectively analyzed quantitatively, steganalysis can be performed accordingly. To analyze this change quantitatively, a feature statistical model of the phoneme sequence must be established.
As indicated above, the distribution of the phonemes in a phoneme sequence exhibits imbalance and correlation. To quantitatively analyze the imbalance of the phoneme distribution, based on the phoneme-based voice composition model proposed by the present invention and with reference to the modeling method of the document vector space model, a phoneme vector space model (Phoneme Vector Space Model, PVSM) is established as the statistical model of the G.729A phoneme sequence. The model is described as follows:
Definition 2. A phoneme ρ_i is the basic constituent unit of speech; the phoneme set of a language L is P = {ρ_1, ρ_2, ..., ρ_{M-1}, ρ_M}. ρ_i is called a phoneme word (Phoneme Word), P is the phoneme dictionary, and the M-dimensional vector space corresponding to P is called the phoneme vector space.
Definition 3. A section of speech S of language L with duration T can be cut into N chronologically ordered speech segments, each segment f_i corresponding to a word ρ_i in the phoneme dictionary P; the segmented speech sequence is S* = f_1, f_2, ..., f_{N-1}, f_N. The above process is called phoneme-based speech segmentation.
Definition 4. Each speech segment sequence S* = f_1, f_2, ..., f_{N-1}, f_N can be represented by a point (ρ_1, w_1, ρ_2, w_2, ..., ρ_M, w_M) in the phoneme vector space, where ρ_i (ρ_i ∈ P) is the phoneme word corresponding to segment f_i and w_i is the weight of ρ_i. The speech fragment quantization representation formed by the above definitions is the phoneme vector space model (PVSM), and (ρ_1, w_1, ρ_2, w_2, ..., ρ_M, w_M) is the phoneme vector space feature (Phoneme Vector Space Feature, PVSF) vector of S.
For a G.729A compressed voice stream, according to the above analysis no segmentation is needed (each G.729A frame directly corresponds to one phoneme), and from the correspondence ρ_i ↔ c_{1,i} its phoneme dictionary is P = {c_{1,0}, c_{1,1}, ..., c_{1,126}, c_{1,127}}. Under these assumptions, a section of G.729A speech can be represented by the phoneme vector space feature vector (c_{1,0}, w_{1,0}, c_{1,1}, w_{1,1}, ..., c_{1,127}, w_{1,127}), whose dimension is 128. The weight w_{1,i} of c_{1,i} is taken as its normalized frequency of occurrence in the quantization sequence, i.e. w_{1,i} = d_{1,i} / N, where N is the number of G.729A frames contained in this section of speech and d_{1,i} is the number of times c_{1,i} occurs in the quantization sequence.
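The PVSF computation just defined reduces to counting: a minimal sketch, assuming the phoneme sequence is already a list of first-stage indices in [0, 127], computes the 128-dimensional vector of normalized frequencies w_{1,i} = d_{1,i} / N.

```python
# Sketch of the PVSF vector: normalized occurrence frequency of each of the
# 128 possible phoneme symbols (first-stage quantization indices).

from collections import Counter

def pvsf_vector(phoneme_sequence, dictionary_size=128):
    """Return the 128-dim phoneme-vector-space feature w_i = d_i / N."""
    n = len(phoneme_sequence)
    counts = Counter(phoneme_sequence)
    return [counts.get(i, 0) / n for i in range(dictionary_size)]

seq = [17, 17, 93, 4]
v = pvsf_vector(seq)
print(v[17], v[93], len(v))  # 0.5 0.25 128
```

Since the weights are frequencies, the vector sums to 1; a stego stream shifts mass between components of this vector, which is the imbalance signal the first classifier is trained on.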
However, the phoneme vector space feature (PVSF) only reflects the imbalance of the distribution of different phonemes in speech and cannot reflect the correlation of the phoneme distribution. To quantitatively analyze the correlation, the present invention models the phoneme sequence with a Markov chain, regarding each phoneme as a state on the chain (called a phoneme state) and using the state transition probabilities to quantitatively analyze the combination dependence between phonemes. For the phoneme sequence ρ_1, ρ_2, ..., ρ_{N-1}, ρ_N corresponding to a section of speech, if the occurrence of each phoneme is assumed to depend only on the previous phoneme, the sequence can be regarded as a phoneme state transition first-order Markov process; by analogy it could also be regarded as a second- or higher-order Markov process. According to linguistic statistical regularities, the occurrence of a letter is in general strongly related only to its preceding letter, so by analogy the occurrence of a phoneme is considered to be strongly related only to its preceding phoneme; in view of this the present invention adopts a phoneme state transition first-order Markov process to model the phoneme sequence. Let ρ_i be the value of the phoneme random variable at time i and ρ_{i+1} its value at time i+1; then, by the correspondence ρ_i ↔ c_{1,i}, the transition probability between any two states of the first-order Markov chain can be expressed by the conditional probability in formula (2):
Pr_{α,β} = Pr(ρ_{i+1} = α | ρ_i = β),  α, β ∈ [0, 127]   (2)
Computing the conditional probability directly is rather difficult in practice; it is generally converted into joint probabilities, i.e. by the conditional probability formula, formula (2) is converted into formula (3) for computing the correlation between phonemes:
Pr_{α,β} = Pr(ρ_{i+1} = α, ρ_i = β) / Pr(ρ_i = β),  α, β ∈ [0, 127]   (3)
For any phoneme sequence, a 128 × 128 state transition matrix can thus be obtained; this matrix is called the phoneme state transition first-order Markov feature of the phoneme sequence. Obviously this feature quantifies the correlation of adjacent phoneme occurrences, but its dimension is too high for practical application, so dimensionality reduction is required. Common dimension reduction methods include singular value decomposition, principal component analysis and dimension selection. The present invention adopts dimension selection: only the elements on the main diagonal of the transition probability matrix are selected as feature quantities, which reduces the phoneme state transition first-order Markov feature to a 128-dimensional vector. This vector is called the first-order Markov feature (First Order Markov Feature, FOMF) vector of the G.729A frame sequence and is used to quantify the correlative character of the phoneme distribution.
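The FOMF extraction above can be sketched as follows: estimate the transition probabilities of formula (3) by counting transition pairs over predecessor counts, then keep only the main diagonal Pr(ρ_{i+1} = α | ρ_i = α). A tiny toy sequence stands in for a real 128-symbol phoneme sequence.

```python
# Sketch of the FOMF vector: diagonal of the estimated 128x128 first-order
# transition probability matrix of the phoneme sequence (formula (3):
# transition counts divided by predecessor-state counts).

def fomf_vector(phoneme_sequence, dictionary_size=128):
    trans = [[0] * dictionary_size for _ in range(dictionary_size)]
    prev_count = [0] * dictionary_size
    for prev, curr in zip(phoneme_sequence, phoneme_sequence[1:]):
        trans[prev][curr] += 1
        prev_count[prev] += 1
    # Keep only the main diagonal as the 128-dim FOMF vector.
    return [trans[a][a] / prev_count[a] if prev_count[a] else 0.0
            for a in range(dictionary_size)]

seq = [5, 5, 5, 9, 5, 5]
f = fomf_vector(seq)
print(f[5])  # 3 of the 4 departures from state 5 return to 5 -> 0.75
```

The diagonal is a natural dimension selection here: consecutive frames often share a phoneme (the class-B case), so self-transition probabilities carry much of the correlation that QIM embedding disturbs.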
The classifier training method is: extract the phoneme vector space feature vectors and the dimension-reduced first-order Markov feature vectors of the G.729A compressed voice streams in the training set, and train classifiers separately using the phoneme vector space feature vector, the first-order Markov feature vector, and the fusion feature vector of the two as features.
The specific implementation principle is:
Suppose an unknown G.729A compressed voice frame sequence S is given. The goal of the steganalysis of the present invention is to judge whether QIM steganography exists in S; the result has only two classes, "yes" (called the stego class herein) and "no" (called the cover class herein). The steganalysis process is therefore essentially a classification process: the sample S of unknown class is assigned to either the cover class or the stego class. For classification problems, machine-learning-based methods are the current mainstream, and the present invention adopts this approach to judge the class of unknown samples. The class judgment process for an unknown sample can be divided into two steps: first, the features of the G.729A frame sequence of unknown class are extracted to obtain its vectorized feature representation; then a semantic classifier designed for the obtained feature vectors realizes the mapping from the low-level features of the frame sequence to its high-level semantic class. The semantic classifier is generally obtained by supervised learning, i.e., by training with samples whose classes have been labeled. The training and prediction steps of the classifier are as follows:
Step 1: obtain as many G.729A frame sequences of the cover class as possible, apply the QIM embedding method (with the grouping codebook partitioned optimally by the CNV algorithm) to each cover sample to obtain the corresponding stego sample, and label both;
Step 2: extract the PVSF/FOMF features of the two classes of samples obtained in the previous step to form feature vectors, and label the class of each vector;
Step 3: train the classifier: use the set of class-labeled feature vectors obtained in the previous step to train the classifier, yielding the steganography detection classifier;
Step 4: use the classifier to judge whether an unknown sample is steganographic: for a G.729A frame sequence of unknown class, first extract its PVSF/FOMF features to form a feature vector, then feed this vector to the classifier trained in the previous step; the classifier outputs the classification result.
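Steps 3 and 4 above can be sketched schematically as follows. A simple nearest-centroid rule stands in for the support vector machine of the embodiment purely for illustration; the function names are assumptions, and the feature extractors are assumed to have already produced fixed-length vectors (e.g., the 128-dim PVSF or FOMF vectors).

```python
def train_centroids(samples, labels):
    """Step 3 (schematic): 'train' by averaging the feature vectors of each
    class ('cover' / 'stego'), producing one centroid per class."""
    sums, counts = {}, {}
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def classify(vec, centroids):
    """Step 4 (schematic): assign the unknown sample's feature vector to the
    class whose centroid is nearest in squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: dist(vec, centroids[lab]))
```

In the actual embodiment the decision boundary is learned by an SVM rather than by centroid averaging; the pipeline shape (labeled vectors in, class label out) is the same.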
According to the described steganalysis method, the flow of the QIM steganalysis system for G.729A compressed voice streams of the present invention is as shown in Figure 5. The input of the system is a G.729A voice stream (i.e., G.729A frame sequence) fragment of class to be detected, and the output is whether information hiding exists in this stream. For the input G.729A frame sequence, the PVSF feature extractor is used to obtain a 128-dimensional PVSF vector, and the FOMF feature extractor is used to obtain a 128-dimensional FOMF vector. Classifier 1 is obtained by training with the PVSF vector, classifier 3 by training with the FOMF vector, and classifier 2 by training with the joint PVSF-FOMF vector (256 dimensions). A majority voting mechanism over the classification results of the three classifiers determines the final classification result. In this embodiment the classifiers are support vector machines, but embodiments using other classifiers also fall within the protection scope of this patent. Tests on a large amount of data show that when the number of frames of a G.729A frame sequence exceeds 640 (i.e., the voice stream is longer than 0.64 seconds), the system achieves a detection accuracy above 93%.
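The majority-voting step over the three classifiers can be expressed in a few lines. A minimal sketch: each classifier votes "cover" or "stego", and the label receiving the most votes is the final steganalysis result (with three voters a strict majority always exists).

```python
from collections import Counter

def majority_vote(votes):
    """Return the label that received the most votes, e.g. the outputs of
    classifiers 1-3 on the same unknown G.729A frame sequence."""
    return Counter(votes).most_common(1)[0][0]
```

For example, if classifier 1 outputs "cover" while classifiers 2 and 3 output "stego", the final result is "stego".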
The above are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall be encompassed within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.

Claims (7)

1. A G.729A compressed voice stream information hiding detection device, characterized in that it comprises at least a compressed voice stream to phoneme sequence mapping module, a phoneme sequence feature extraction module group, a classifier device, and a result integration module, wherein:
the compressed voice stream to phoneme sequence mapping module receives an externally delivered compressed voice stream, maps it to a phoneme sequence, and outputs the phoneme sequence;
the phoneme sequence feature extraction module group respectively extracts and outputs the phoneme vector space feature vector and the phoneme state-transition first-order Markov feature vector of the phoneme sequence;
the classifier device trains classifiers for the different feature vectors based on a training set, then uses the trained classifiers to classify samples of unknown class and outputs the classification results;
the result integration module integrates the output results of the multiple classifiers and outputs the integrated result as the final steganalysis result.
2. The G.729A compressed voice stream information hiding detection device according to claim 1, characterized in that the phoneme sequence feature extraction module group comprises a PVSF feature extraction module and a FOMF feature extraction module, wherein:
the PVSF feature extraction module extracts and outputs the phoneme vector space feature vector of the phoneme sequence;
the FOMF feature extraction module extracts and outputs the phoneme state-transition first-order Markov feature vector.
3. The G.729A compressed voice stream information hiding detection device according to claim 1, characterized in that the classifier device comprises a first classifier, a second classifier, and a third classifier, wherein:
the first classifier is trained based on the phoneme vector space feature vector; after training, this classifier predicts samples of unknown class and outputs the result to the integration module;
the second classifier is trained based on the fusion feature vector of the phoneme vector space feature vector and the phoneme state-transition first-order Markov feature vector; after training, this classifier predicts samples of unknown class and outputs the result to the integration module;
the third classifier is trained based on the phoneme state-transition first-order Markov feature vector; after training, this classifier predicts samples of unknown class and outputs the result to the integration module.
4. A G.729A compressed voice stream information hiding detection method, characterized in that it comprises the following steps:
mapping the compressed voice stream to a phoneme sequence;
respectively extracting the phoneme vector space feature vector and the phoneme state-transition first-order Markov feature vector of the phoneme sequence;
training classifiers separately for the various feature vectors, and integrating the classification results of the multiple classifiers based on a majority voting mechanism as the final classification result.
5. The G.729A compressed voice stream information hiding detection method according to claim 4, characterized in that the method of mapping the compressed voice stream to a phoneme sequence is: assuming that the phonemes contained in speech are finite, the speech to be mapped is divided into speech segments corresponding to each phoneme, and the duration of each segment is taken as the G.729A frame length.
6. The G.729A compressed voice stream information hiding detection method according to claim 5, characterized in that the phoneme sequence feature extraction method is:
using the vocal tract parameters during phoneme pronunciation as the quantitative description of the phoneme, characterizing the vocal tract parameters with the LPC filter used in G.729A, where the LPC filter is determined by quantization indices; mapping each phoneme to the first field of the LPC filter quantization indices, and using the statistical features of the sequence formed by this field as the statistical features of the phoneme sequence;
using the phoneme vector space feature vector to quantify the distribution imbalance of the phonemes contained in the G.729A speech;
modeling the phoneme sequence with a phoneme state-transition first-order Markov chain, computing the state-transition matrix to measure the correlation of the phoneme distribution, and reducing the dimensionality of the state-transition matrix by the dimension selection method, i.e., taking the diagonal elements of the matrix as the vector characterizing the phoneme distribution correlation.
7. The G.729A compressed voice stream information hiding detection method according to claim 5, characterized in that an ensemble classification method is adopted: extracting the phoneme vector space feature vectors and the dimension-reduced first-order Markov feature vectors of the G.729A compressed voice streams in the training set, and training classifiers separately using the phoneme vector space feature vector, the first-order Markov feature vector, and the fusion feature vector of the two as features.
CN2011104351639A 2011-12-22 2011-12-22 G.729A compressed pronunciation flow information hiding detection device and detection method Active CN102568469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104351639A CN102568469B (en) 2011-12-22 2011-12-22 G.729A compressed pronunciation flow information hiding detection device and detection method

Publications (2)

Publication Number Publication Date
CN102568469A CN102568469A (en) 2012-07-11
CN102568469B true CN102568469B (en) 2013-10-16

Family

ID=46413726

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant