CN110390941A

CN110390941A - MP3 audio steganalysis method and device based on a coefficient correlation model

Info

Publication number: CN110390941A
Application number: CN201910586062.8A
Authority: CN
Inventors: 黄永峰; 杨浩; 鲍永健; 杨忠良; 杨震
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-07-01
Filing date: 2019-07-01
Publication date: 2019-10-29

Abstract

The invention discloses an MP3 audio steganalysis method and device based on a coefficient correlation model. The method comprises the following steps: obtaining the QMDCT coefficient matrix of the MP3 audio to be analyzed and using the QMDCT coefficients as the model input; extracting multiple QMDCT correlation pattern coefficient vectors from the MP3 audio QMDCT coefficient matrix with a recurrent neural network; and classifying the correlation pattern coefficient vectors of the audio to be analyzed with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography. The method can achieve a high steganography detection rate against multiple steganography algorithms that operate on the MP3 quantization and encoding process, so it has a degree of generality. The model does not depend on handcrafted features, and because the designed model is relatively simple, detection time is low.

Description

MP3 audio steganalysis method and device based on a coefficient correlation model

Technical field

The present invention relates to the technical field of digital audio steganography detection, and in particular to an MP3 audio steganalysis method and device based on a coefficient correlation model.

Background art

Existing steganography detection methods for the MP3 compressed domain fall into two broad classes: conventional methods and deep-learning methods. Conventional methods mainly extract features by hand and then classify them with a classifier.

Deep-learning-based detection methods mostly use neural networks for both feature extraction and classification. The input to MP3 compressed-domain steganography detection methods is mainly the QMDCT (Quantized Modified Discrete Cosine Transform) coefficient matrix. For an MP3 audio clip with a given number of frames, each frame yields n*576 QMDCT coefficients, where n depends on the number of channels and the number of granules. Representative conventional steganalysis algorithms based on QMDCT coefficients are mainly ADOTP and MDI2. ADOTP first filters the QMDCT coefficients with a first-order difference, then models the coefficients with a Markov matrix to extract features, which are fed into a classifier for training. MDI2 computes the differences along rows and along columns separately, then computes Markov one-step transition probabilities, and feeds the resulting features into a classifier. Deep-learning-based algorithms, such as those based on multi-scale correlation, also essentially construct features by first applying manually designed high-pass filters and then using a neural network for further feature extraction.
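The handcrafted pipeline described above (difference filtering followed by Markov transition statistics) can be illustrated with a minimal sketch. This is not the ADOTP or MDI2 implementation; the function names, the clipping range, and the toy coefficients are assumptions chosen only to show the general idea.

```python
from collections import Counter


def first_order_difference(row):
    """Difference between neighbouring coefficients in one frame."""
    return [b - a for a, b in zip(row, row[1:])]


def markov_transition(values, clip=2):
    """Estimate one-step transition probabilities P(next | current).

    Values are clipped to [-clip, clip] so the state space stays small
    (the clipping range is an illustrative assumption).
    """
    clipped = [max(-clip, min(clip, v)) for v in values]
    pair_counts = Counter(zip(clipped, clipped[1:]))
    state_counts = Counter(clipped[:-1])
    return {
        (cur, nxt): count / state_counts[cur]
        for (cur, nxt), count in pair_counts.items()
    }


# Toy QMDCT coefficients for a single frame (hypothetical values).
frame = [3, 1, 1, 0, 0, 0, -1, 0]
diff = first_order_difference(frame)
probs = markov_transition(diff)
```

The resulting dictionary of transition probabilities is the kind of fixed-length statistic that a conventional classifier would consume; the filter and the state space are both chosen by hand, which is the dependence the invention aims to remove.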

Overall, prior methods, whether conventional or neural-network based, rely mainly on manually extracted features (including filter design) and are therefore somewhat subjective. Conventional methods ignore high-order transition characteristics because extracting them is too complex. Deep-learning methods are mostly built on convolutional neural networks and emphasize hierarchical features, and the deep networks they construct generally have too many layers, which makes detection too slow. In addition, prior detection algorithms generally show limited generality and low detection rates under complex conditions.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, one object of the present invention is to provide an MP3 audio steganalysis method based on a coefficient correlation model. The method can achieve a high steganalysis detection rate and still performs well at low embedding rates, and the coefficient correlation model has relatively low complexity, which supports good real-time detection performance.

Another object of the present invention is to provide an MP3 audio steganalysis device based on a coefficient correlation model.

To achieve the above objects, an embodiment of one aspect of the present invention provides an MP3 audio steganalysis method based on a coefficient correlation model, comprising the following steps: obtaining the QMDCT coefficient matrix of the MP3 audio to be analyzed and using the QMDCT coefficients as the model input; extracting multiple QMDCT correlation pattern coefficient vectors from the MP3 audio QMDCT coefficient matrix with a recurrent neural network; and classifying the correlation pattern coefficient vectors of the audio to be analyzed with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography.

The MP3 audio steganalysis method based on a coefficient correlation model of the embodiment of the present invention remodels the MP3 audio sequence from a sequence viewpoint. It builds the coefficient correlation model with a recurrent neural network to capture correlation features between frames and between the coefficients within a frame, and it uses the QMDCT coefficients directly, which eliminates the manual filtering step. It can also model relatively long-range coefficient dependencies and the directional information of the coefficients. As a result, it can achieve a high steganalysis detection rate and still performs well at low embedding rates, and the relatively low complexity of the coefficient correlation model supports good real-time detection performance.

In addition, the MP3 audio steganalysis method based on a coefficient correlation model according to the above embodiment of the present invention may have the following additional technical features:

Further, in one embodiment of the present invention, the QMDCT coefficient sequence obtained from each MP3 audio is expressed as:

X=[x₁,x₂,…,x_K],

where each x_k (k = 1, …, K) is the coefficient vector of one frame after quantization.

Further, in one embodiment of the present invention, the method further includes: performing correlation analysis on the QMDCT coefficient sequence with a coefficient correlation extraction model to obtain a coefficient correlation feature vector.

Further, in one embodiment of the present invention, the feature classification network is:

s_t=tanh(W_t·y_t+b_t),

Z_k=V_k*s_k+b_k,

where s is the intermediate-layer output of the feature classification network and Z_k is the output value for the corresponding preset class.

Further, in one embodiment of the present invention, steganography detection is decided by a detection formula in which z is the output probability value, threshold is the preset decision threshold, and T is the class label of the model.

To achieve the above objects, an embodiment of another aspect of the present invention provides an MP3 audio steganalysis device based on a coefficient correlation model, comprising: an acquisition module for obtaining the QMDCT coefficient matrix of the MP3 audio to be analyzed and using the QMDCT coefficients as the model input; an extraction module for extracting multiple QMDCT correlation pattern coefficient vectors from the MP3 audio QMDCT coefficient matrix with a recurrent neural network; and a classification module for classifying the correlation pattern coefficient vectors of the audio to be analyzed with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography.

The MP3 audio steganalysis device based on a coefficient correlation model of the embodiment of the present invention remodels the MP3 audio sequence from a sequence viewpoint. It builds the coefficient correlation model with a recurrent neural network to capture correlation features between frames and between the coefficients within a frame, and it uses the QMDCT coefficients directly, which eliminates the manual filtering step. It can also model relatively long-range coefficient dependencies and the directional information of the coefficients. As a result, it can achieve a high steganalysis detection rate and still performs well at low embedding rates, and the relatively low complexity of the coefficient correlation model supports good real-time detection performance.

In addition, the MP3 audio steganalysis device based on a coefficient correlation model according to the above embodiment of the present invention may have the following additional technical features:

X=[x₁,x₂,…,x_K],

where each x_k (k = 1, …, K) is the coefficient vector of one frame after quantization.

Further, in one embodiment of the present invention, the device further includes: an analysis module for performing correlation analysis on the QMDCT coefficient sequence with a coefficient correlation extraction model to obtain a coefficient correlation feature vector.

s_t=tanh (W_t·y_t+b_t),

Z_k=V_k*s_k+b_k,

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken together with the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the MP3 audio encoding process according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the MP3 audio decoding process according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the classification of the main compressed-domain audio steganography algorithms according to an embodiment of the present invention;

Fig. 4 is a flowchart of the MP3 audio steganalysis method based on a coefficient correlation model according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the main implementation process of the algorithm according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the coefficient correlation model according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the basic framework of the recurrent neural network according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the bidirectional recurrent neural network according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of the long short-term memory network according to an embodiment of the present invention;

Fig. 10 is a flowchart of the supervised-learning training framework according to an embodiment of the present invention;

Fig. 11 is a schematic structural diagram of the MP3 audio steganalysis device based on a coefficient correlation model according to an embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

Before the MP3 audio steganalysis method and device based on a coefficient correlation model are introduced, steganography and MP3 steganography algorithms are first briefly described.

Steganography is a key technique in information hiding systems. Its main goal is to embed as much information as possible in publicly available media used as carriers, while keeping the statistical characteristics of the carrier as unchanged as possible before and after embedding. The goal of steganalysis is to detect embedded information in suspicious carriers as reliably as possible. In general, a multimedia file that carries hidden data is perceptually indistinguishable from the original, so the secret information it carries is hard to detect during transmission, and only the recipients designated by the steganographer can extract the secret information from it. Because of this property, steganography is easily exploited by criminals to pass information and may pose a serious threat to national security. Steganalysis has therefore received increasing attention.

Digital media used for steganography generally include images, audio, and text. Because audio files have spread widely over networks in recent years, they have become a focus for both steganographers and steganalysis researchers. In particular, MP3 is currently one of the most popular audio file formats; compared with the WAV format, MP3 offers a very high compression ratio while maintaining very good sound quality.

The encoding principle of MP3 is shown in Fig. 1 and the decoding principle in Fig. 2. As shown in Fig. 3, MP3 audio steganography algorithms can be divided by embedding location into the following main classes: (1) information embedding in the time-frequency domain signal; (2) information embedding in the quantization and encoding process; (3) information embedding in the bitstream domain.

In particular, the method of the embodiment of the present invention performs steganalysis of embedding in the quantization and encoding process. Embedding of this kind combines the embedding procedure with the quantization and compression steps of compression coding, and can achieve high concealment.

Typical compressed-domain MP3 steganography algorithms mainly include MP3Stego, the steganography algorithm based on Huffman code-word mapping (HCM), and the adaptive MP3 steganography algorithm based on equal-length entropy code-word substitution (EECS). MP3Stego completes embedding in the inner loop of MP3 encoding and embeds information according to the parity of part_2_3_length. HCM modifies the Huffman code words in the MP3 without destroying the MP3 bitstream structure, mainly by classifying the code words and then embedding according to code-word type. EECS embeds the secret message mainly through equal-length entropy code-word substitution. Its adaptivity lies mainly in the design of the cost function used during embedding; syndrome-trellis coding together with the cost function construction greatly enhances the security of the steganography.
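As a toy illustration of a parity-based rule such as the one attributed to MP3Stego above, the sketch below reads one bit from the parity of each part_2_3_length value. The function names and the sample values are hypothetical, and the sketch only shows the reading side; how an encoder steers the value to the desired parity is not modelled here.

```python
def parity_bit(part_2_3_length: int) -> int:
    """Return the least significant bit (parity) of the field value."""
    return part_2_3_length & 1


def extract_bits(lengths):
    """Read one bit per granule from a list of part_2_3_length values."""
    return [parity_bit(value) for value in lengths]


# Hypothetical part_2_3_length values for four granules.
lengths = [734, 811, 640, 655]
bits = extract_bits(lengths)
```

Because such a rule only nudges an encoder parameter, its effect on the audio is indirect, which is part of why detection has to rely on statistics of the resulting QMDCT coefficients.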

The MP3 audio steganalysis method and device based on a coefficient correlation model proposed according to embodiments of the present invention are described below with reference to the accompanying drawings. The method is described first.

Fig. 4 is a flowchart of the MP3 audio steganalysis method based on a coefficient correlation model according to one embodiment of the present invention.

As shown in Fig. 4, the MP3 audio steganalysis method based on a coefficient correlation model includes the following steps:

In step S401, the QMDCT coefficient matrix of the MP3 audio to be analyzed is obtained, and the QMDCT coefficients are used as the model input.

Specifically, as shown in Fig. 5, 2N audio files of different styles, for example WAV files, are selected, where N > 10000. N of these audio files are compressed with an MP3 encoder to obtain MP3 audio files. For the remaining N audio files, an MP3 audio steganography tool based on quantization or encoding is used to embed secret information of different content and different lengths, which yields stego MP3 audio files. The stego audio and the non-stego audio together form the steganalysis sample library.
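The split of the 2N source files into a cover half and a stego half can be sketched as follows. This is only a bookkeeping sketch under stated assumptions: the function name, the label convention (0 for non-stego, 1 for stego), the random shuffle, and the file names are all hypothetical, and the actual MP3 encoding and embedding steps are not shown.

```python
import random


def build_sample_library(wav_files, seed=0):
    """Split 2N files into N cover and N stego candidates with labels.

    Returns a list of (path, label) pairs, where label 0 marks a file to be
    plainly MP3-encoded and label 1 a file to be encoded with embedding.
    """
    if len(wav_files) % 2 != 0:
        raise ValueError("expected an even number (2N) of files")
    files = list(wav_files)
    random.Random(seed).shuffle(files)  # avoid ordering bias between halves
    half = len(files) // 2
    cover = [(path, 0) for path in files[:half]]
    stego = [(path, 1) for path in files[half:]]
    return cover + stego


# Eight placeholder file names stand in for the 2N source clips.
library = build_sample_library([f"clip_{i:03d}.wav" for i in range(8)])
```

Keeping the two halves disjoint, as the text describes, means no source clip appears in both classes.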

The MP3 audio is decoded and decompressed with an MP3 audio decoder, and the decompressed coefficients can be obtained during this process. For a given sample, decompressing each frame of the audio usually yields n*576 QMDCT coefficients, where n depends on the number of channels and the number of granules of the audio. If a single audio clip is divided into K frames, the QMDCT coefficient sequence obtained for each audio can be expressed as:

X=[x₁,x₂,…,x_K].
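The layout of X can be made concrete with a short sketch that groups a flat list of decoded coefficients into K frame vectors of n*576 values each. Taking n as channels times granules is an assumption made for this example (the text only says n depends on both), and the function name and default values are hypothetical.

```python
COEFFS_PER_GRANULE = 576  # coefficients per channel and granule (from the text)


def to_frame_sequence(flat_coeffs, channels=2, granules=2):
    """Group a flat list of QMDCT coefficients into X = [x_1, ..., x_K].

    Assumes n = channels * granules, so each frame vector holds n * 576 values.
    """
    n = channels * granules
    frame_len = n * COEFFS_PER_GRANULE
    if len(flat_coeffs) % frame_len != 0:
        raise ValueError("coefficient count is not a whole number of frames")
    return [
        flat_coeffs[start:start + frame_len]
        for start in range(0, len(flat_coeffs), frame_len)
    ]


# Three frames of placeholder coefficients (all zeros) for a stereo clip.
flat = [0] * (3 * 4 * 576)
X = to_frame_sequence(flat)
```

With these defaults each frame vector has 2304 entries, and the list X is what a sequence model would consume one frame at a time.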

In step S402, multiple QMDCT correlation pattern coefficient vectors are extracted from the MP3 audio QMDCT coefficient matrix with a recurrent neural network.

Further, in one embodiment of the present invention, the method of the embodiment further includes: performing correlation analysis on the QMDCT coefficient sequence with a coefficient correlation extraction model to obtain a coefficient correlation feature vector.

Specifically, as shown in Fig. 5, correlation analysis is performed on the QMDCT coefficient sequence H with the coefficient correlation extraction model shown in Fig. 6, which yields the coefficient correlation feature vector W=[g₁; g₂; …; g_n]. The coefficient correlation model is implemented on the basis of a recurrent neural network, whose basic framework is shown in Fig. 7. It is expressed by a formula in which s_t denotes the state of the recurrent network node at time t, x is the input vector at time t, b is the linear bias, W is the input weight, and V is the output weight.
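For orientation, one common form of a simple recurrent network that is consistent with the symbols defined above is given below. It is an assumption, not the formula of the embodiment: the recurrent weight U and the output o_t are introduced here for illustration, and the choice of tanh as the activation is likewise assumed.

```latex
\begin{aligned}
s_t &= \tanh\left(W x_t + U s_{t-1} + b\right),\\
o_t &= V s_t .
\end{aligned}
```

In this form the state s_t summarizes all frames seen up to time t, which is what lets the model capture dependencies between coefficients in different frames.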

To capture the directional relationships between coefficients, the embodiment of the present invention uses a bidirectional recurrent neural network to extract directional information. Its basic structure is shown in Fig. 8 and is expressed by a formula in which h is the node state vector, W is the weight vector, b is the bias vector, and y is the output vector.

In that formula, H is a long short-term memory network (LSTM), whose structure is shown in Fig. 9. The LSTM is expressed by a formula in which W is the weight vector, i is the input gate vector, f is the forget gate vector, o is the output gate vector, x is the input vector, b is the linear bias, h is the node state, σ is usually the sigmoid function, and ⊙ denotes the element-wise operation.
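The widely used textbook form of a bidirectional LSTM, written with the symbols listed above, is sketched below as an assumption; the embodiment's own formulas may differ in detail. The cell state c_t, the candidate state, the concatenation [h_{t-1}, x_t], and the direction-specific output weights are introduced here for illustration.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), &
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right),\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), &
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t),\\
\overrightarrow{h}_t &= \mathcal{H}\left(x_t, \overrightarrow{h}_{t-1}\right), &
\overleftarrow{h}_t &= \mathcal{H}\left(x_t, \overleftarrow{h}_{t+1}\right),\\
y_t &= W_{\rightarrow}\,\overrightarrow{h}_t + W_{\leftarrow}\,\overleftarrow{h}_t + b_y .
\end{aligned}
```

Here the function H stands for one LSTM step, so the forward pass reads the frames in order and the backward pass reads them in reverse; combining both states in y_t is what gives the model access to directional information.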

In step S403, the correlation pattern coefficient vectors of the audio to be analyzed are classified with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography.

Specifically, as shown in Fig. 5, the coefficient correlation feature vector obtained in step S402 is classified with the feature classification model, where the feature classification network can be described as:

s_t=tanh (W_t·y_t+b_t),

Z_k=V_k*s_k+b_k,

where s is the intermediate-layer output of the feature classification network, Z_k is the output value for a given class, and Z is the final normalized probability of each class. For steganography detection in particular, the class label is obtained by a detection formula in which z is the probability value output above, threshold is the given decision threshold, and T is the class label of the model.
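The classification stage described above can be sketched in a few lines. Several details are assumptions made for this example and are not stated in the text: softmax is used as the normalization, class index 1 is taken to mean "stego", the comparison is z >= threshold, and all weights and inputs are arbitrary placeholder numbers.

```python
import math


def classify(y, W, b, V, c, threshold=0.5):
    """tanh hidden layer -> linear class scores -> softmax -> threshold.

    y: feature vector; W, b: hidden-layer weights and biases;
    V, c: output-layer weights and biases. Returns (z, T), where z is the
    assumed stego probability and T the resulting class label.
    """
    # s = tanh(W . y + b): intermediate-layer output
    s = [
        math.tanh(sum(w * v for w, v in zip(row, y)) + bias)
        for row, bias in zip(W, b)
    ]
    # Z_k = V_k . s + c_k: one score per class
    Z = [sum(w * v for w, v in zip(row, s)) + bias for row, bias in zip(V, c)]
    # Softmax normalization (assumed), shifted by the max for stability
    top = max(Z)
    exps = [math.exp(v - top) for v in Z]
    total = sum(exps)
    probs = [e / total for e in exps]
    z = probs[1]  # assumed: index 1 is the stego class
    T = 1 if z >= threshold else 0
    return z, T


# Placeholder parameters, chosen only so the example runs.
y = [0.5, -1.0, 2.0]
W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]
b = [0.0, 0.1]
V = [[1.0, -1.0], [-1.0, 1.0]]
c = [0.0, 0.0]
z, T = classify(y, W, b, V, c)
```

Raising the threshold trades missed detections for fewer false alarms, so in practice it would be tuned on held-out data rather than fixed at 0.5.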

Further, the two models above, namely the coefficient correlation extraction model and the feature classification model, are trained in the embodiment of the present invention within a supervised-learning framework. The algorithm flow is shown in Fig. 10, and training yields the optimal parameters of both models. For any MP3 audio, the QMDCT feature vector of the audio is obtained in the same way by following the steps above, and detection is then performed with the two trained models to determine whether the MP3 audio has undergone steganography.
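The text does not name the training objective. As an illustration only, the sketch below computes binary cross-entropy, a loss commonly used for two-class supervised training; treating it as the objective here is an assumption, as are the function name and the sample values.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and labels.

    probs: predicted stego probabilities z; labels: ground truth T (0 or 1).
    Probabilities are clamped to [eps, 1 - eps] to keep the logarithm finite.
    """
    total = 0.0
    for z, t in zip(probs, labels):
        z = min(max(z, eps), 1.0 - eps)
        total += -(t * math.log(z) + (1 - t) * math.log(1.0 - z))
    return total / len(probs)


# Confident correct predictions should score lower than confident wrong ones.
good = binary_cross_entropy([0.9, 0.1], [1, 0])
bad = binary_cross_entropy([0.1, 0.9], [1, 0])
```

A training loop would minimize this quantity over the labelled sample library by adjusting the parameters of both models jointly.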

In summary, a coefficient correlation model is proposed for steganalysis of the quantized modified discrete cosine transform (QMDCT) coefficient matrix obtained from the MP3 audio compression process. The model takes the QMDCT coefficients as input, uses a recurrent neural network to extract the proposed coefficient correlation features, and uses a feature classification network as the classification model to perform steganography classification. The parameters of the models involved are obtained through a supervised learning framework. The trained coefficient correlation model, combined with the feature classification network, can then detect MP3 audio steganography. The method of the embodiment of the present invention can be applied to multiple MP3 audio steganography algorithms based on quantization and encoding, such as MP3Stego and EECS, so it has a degree of generality and can reach high steganography detection accuracy. In other words, the method can achieve a high detection rate against multiple steganography algorithms that operate on the MP3 quantization and encoding process, the model does not depend on handcrafted features, and because the designed model is relatively simple, detection time is low.

The MP3 audio steganalysis method based on a coefficient correlation model proposed according to the embodiment of the present invention remodels the MP3 audio sequence from a sequence viewpoint. It builds the coefficient correlation model with a recurrent neural network to capture correlation features between frames and between the coefficients within a frame, and it uses the QMDCT coefficients directly, which eliminates the manual filtering step. It can also model relatively long-range coefficient dependencies and the directional information of the coefficients. As a result, it can achieve a high steganalysis detection rate and still performs well at low embedding rates, and the relatively low complexity of the coefficient correlation model supports good real-time detection performance.

The MP3 audio steganalysis device based on a coefficient correlation model proposed according to embodiments of the present invention is described next with reference to the accompanying drawings.

Fig. 11 is a schematic structural diagram of the MP3 audio steganalysis device based on a coefficient correlation model according to one embodiment of the present invention.

As shown in Fig. 11, the MP3 audio steganalysis device 10 based on a coefficient correlation model includes: an acquisition module 100, an extraction module 200, and a classification module 300.

The acquisition module 100 is used to obtain the QMDCT coefficient matrix of the MP3 audio to be analyzed and to use the QMDCT coefficients as the model input. The extraction module 200 is used to extract multiple QMDCT correlation pattern coefficient vectors from the MP3 audio QMDCT coefficient matrix with a recurrent neural network. The classification module 300 is used to classify the correlation pattern coefficient vectors of the audio to be analyzed with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography. The device 10 of the embodiment of the present invention can achieve a high steganography detection rate against multiple steganography algorithms that operate on the MP3 quantization and encoding process, so it has a degree of generality. The model does not depend on handcrafted features, and because the designed model is relatively simple, detection time is low.

X=[x₁,x₂,…,x_K],

where each x_k (k = 1, …, K) is the coefficient vector of one frame after quantization.

Further, in one embodiment of the present invention, the device 10 of the embodiment further includes an analysis module. The analysis module is used to perform correlation analysis on the QMDCT coefficient sequence with a coefficient correlation extraction model to obtain a coefficient correlation feature vector.

Further, in one embodiment of the present invention, the feature classification network is:

s_t=tanh (W_t·y_t+b_t),

Z_k=V_k*s_k+b_k,

It should be noted that the foregoing explanation of the embodiments of the MP3 audio steganalysis method based on a coefficient correlation model also applies to the MP3 audio steganalysis device based on a coefficient correlation model of this embodiment, and is not repeated here.

The MP3 audio steganalysis device based on a coefficient correlation model proposed according to the embodiment of the present invention remodels the MP3 audio sequence from a sequence viewpoint. It builds the coefficient correlation model with a recurrent neural network to capture correlation features between frames and between the coefficients within a frame, and it uses the QMDCT coefficients directly, which eliminates the manual filtering step. It can also model relatively long-range coefficient dependencies and the directional information of the coefficients. As a result, it can achieve a high steganalysis detection rate and still performs well at low embedding rates, and the relatively low complexity of the coefficient correlation model supports good real-time detection performance.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature defined with "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.

In the description of this specification, a reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims

1. An MP3 audio steganalysis method based on a coefficient correlation model, characterized by comprising the following steps:

obtaining the QMDCT coefficient matrix of the MP3 audio to be analyzed, and using the QMDCT coefficients as the model input;

extracting multiple QMDCT correlation pattern coefficient vectors from the MP3 audio QMDCT coefficient matrix with a recurrent neural network;

classifying the correlation pattern coefficient vectors of the audio to be analyzed with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography.

2. The method according to claim 1, characterized in that the QMDCT coefficient sequence obtained from each MP3 audio is expressed as:

X=[x₁,x₂,…,x_K],

where each x_k (k = 1, …, K) is the coefficient vector of one frame after quantization.

3. The method according to claim 2, characterized by further comprising:

performing correlation analysis on the QMDCT coefficient sequence with a coefficient correlation extraction model to obtain a coefficient correlation feature vector.

4. The method according to claim 1, characterized in that the feature classification network is:

s_t=tanh (W_t·y_t+b_t),

Z_k=V_k*s_k+b_k,

5. The method according to claim 4, characterized in that the steganography detection formula is:

6. An MP3 audio steganalysis device based on a coefficient correlation model, characterized by comprising:

an acquisition module for obtaining the QMDCT coefficient matrix of the MP3 audio to be analyzed and using the QMDCT coefficients as the model input;

an extraction module for extracting multiple QMDCT correlation pattern coefficient vectors from the MP3 audio QMDCT coefficient matrix with a recurrent neural network;

a classification module for classifying the correlation pattern coefficient vectors of the audio to be analyzed with a feature classification network to obtain the probability that the MP3 audio to be analyzed contains steganography.

7. The device according to claim 6, characterized in that the QMDCT coefficient sequence obtained from each MP3 audio is expressed as:

X=[x₁,x₂,…,x_K],

where each x_k (k = 1, …, K) is the coefficient vector of one frame after quantization.

8. The device according to claim 7, characterized by further comprising:

an analysis module for performing correlation analysis on the QMDCT coefficient sequence with a coefficient correlation extraction model to obtain a coefficient correlation feature vector.

9. The device according to claim 6, characterized in that the feature classification network is:

s_t=tanh (W_t·y_t+b_t),

Z_k=V_k*s_k+b_k,

10. The device according to claim 9, characterized in that the steganography detection formula is: