CN110428846A

CN110428846A - Network voice stream steganalysis method and device based on a bidirectional recurrent neural network

Info

Publication number: CN110428846A
Application number: CN201910609734.2A
Authority: CN
Inventors: 黄永峰; 杨浩; 杨忠良; 鲍永健; 杨震
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-07-08
Filing date: 2019-07-08
Publication date: 2019-11-08

Abstract

The invention discloses a network voice stream steganalysis method and device based on a bidirectional recurrent neural network. The method comprises: S1, obtaining network voice stream training set samples and network voice stream test set samples; S2, extracting the network voice payload from the training set samples by a sliding window method and processing it to obtain a quantization index vector; S3, extracting the codeword correlation vector from the quantization index vector using a codeword correlation model; S4, classifying the codeword correlation vector with a feature classification model; S5, iterating steps S2-S4 by a supervised method to generate a steganography detection model; S6, inputting the test set samples into the steganography detection model to obtain a steganography probability, and judging from that probability whether the test set samples contain steganography. The method can quickly and efficiently judge whether an original voice stream has undergone information steganography.

Description

Network voice stream steganalysis method and device based on a bidirectional recurrent neural network

Technical field

The present invention relates to the technical field of digital audio steganography detection, and in particular to a network voice stream steganalysis method and device based on a bidirectional recurrent neural network.

Background technique

Cryptography and information hiding are two branches of the information security field. Cryptography is based mainly on information theory and cryptographic theory, and scrambles the original information in some way to obtain ciphertext. Information steganography instead embeds the secret information into an original carrier in some way and sends the carrier containing the information to the receiver without being noticed by a third party. Compared with cryptography, information hiding is less likely to attract a third party's attention, which can significantly reduce the probability of being attacked by a third party and greatly improves security. For this reason, information hiding has attracted considerable attention.

Steganography and steganalysis are two important aspects of information hiding. Steganography aims to ensure the information security of the two communicating parties. Steganalysis judges whether secret information is present in a carrier. Because steganography is highly concealed, it is easily exploited by criminals to endanger public security. Effective steganalysis techniques can therefore quickly and effectively detect secret information contained in a carrier, so that the communication can be blocked in time.

Common steganographic carriers mainly include images, text, and voice. Because image carriers have a large hiding capacity, they have received more attention, and image steganography and steganalysis techniques are relatively mature. Voice and text have a relatively small hiding capacity, so steganography and steganalysis on them are relatively difficult. In particular, because voice signals are highly random, steganography and steganalysis on them are more difficult still.

Compared with static carriers, dynamic carriers add a time dimension and are now a widely used kind of steganographic carrier. Among them, streaming media is the most representative. With the continuous development of networks, streaming-media-based applications are increasingly common, a typical example being Voice over IP (VoIP).

Steganography based on streaming media is mainly divided into protocol-based steganography and payload-based steganography. Protocol-based information hiding mainly exploits unused or optional fields in protocol headers. Because the protocols are public, this approach is relatively easy to detect. Payload-based information hiding achieves covert communication mainly by modifying the redundancy in the streaming media payload.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, one object of the present invention is to propose a network voice stream steganalysis method based on a bidirectional recurrent neural network, which can quickly and efficiently judge whether an original voice stream has undergone information steganography.

Another object of the present invention is to propose a network voice stream steganalysis device based on a bidirectional recurrent neural network.

To achieve the above objects, an embodiment of one aspect of the present invention proposes a network voice stream steganalysis method based on a bidirectional recurrent neural network, comprising:

The network voice stream steganalysis method based on a bidirectional recurrent neural network according to the embodiment of the present invention obtains a quantization index vector from the compressed network voice stream by a sliding window method, takes the quantization index vector as input, builds a codeword correlation network from a bidirectional recurrent neural network to extract the correlation feature vector of the codewords in the quantization index vector, and classifies the resulting correlation feature vector with a feature classification network, so that it can quickly and efficiently judge whether the original voice stream has undergone steganography.

In addition, the network voice stream steganalysis method based on a bidirectional recurrent neural network according to the above embodiment of the present invention may also have the following additional technical features:

Further, in one embodiment of the present invention, judging from the steganography probability whether the network voice stream test set samples contain steganography comprises:

If the steganography probability is greater than or equal to a preset decision threshold, steganographic information is present in the network voice stream test set samples;

If the steganography probability is less than the preset decision threshold, no steganographic information is present in the network voice stream test set samples.

Further, in one embodiment of the present invention, before step S2 the method further comprises:

Selecting half of the network voice stream training set samples for information embedding, and embedding the information to be hidden into the selected half of the training set samples using a preset modulation scheme.

Further, in one embodiment of the present invention, the network voice stream training set samples and the network voice stream test set samples are collected from an Internet database or from a pre-built network voice stream sample library.

Further, in one embodiment of the present invention, a network voice stream in the network voice stream training set samples has K frames, and the quantization index vector is expressed as:

X=[x₁,x₂,…,x_K].

Further, in one embodiment of the present invention, the codeword correlation model is generated by training a bidirectional recurrent neural network, and the codeword correlation model is:

Where h is the node state vector, W is the weight vector, b is the bias vector, y is the output vector, H is a long short-term memory network, and x is the quantization index vector.

Further, in one embodiment of the present invention, the feature classification model is:

s_t=tanh(W_t·y_t+b_t)

Z_k=V_k*s_k+b_k

Where s_t is the intermediate-layer output of the feature classification model, Z_k is the output value for a given class, Z is the final normalized probability of each class, and h is the node state vector.

To achieve the above objects, an embodiment of another aspect of the present invention proposes a network voice stream steganalysis device based on a bidirectional recurrent neural network, comprising:

An acquisition module, configured to obtain network voice stream training set samples and network voice stream test set samples;

A generation module, configured to extract the network voice payload from the network voice stream training set samples by a sliding window method and process it to obtain a quantization index vector;

An extraction module, configured to extract the codeword correlation vector from the quantization index vector using a codeword correlation model;

A classification module, configured to classify the codeword correlation vector with a feature classification model;

A modeling module, configured to iterate by a supervised method to generate a steganography detection model;

A detection module, configured to input the network voice stream test set samples into the steganography detection model to obtain a steganography probability, and to judge from the steganography probability whether the network voice stream test set samples contain steganography.

The network voice stream steganalysis device based on a bidirectional recurrent neural network according to the embodiment of the present invention obtains a quantization index vector from the compressed network voice stream by a sliding window method, takes the quantization index vector as input, builds a codeword correlation network from a bidirectional recurrent neural network to extract the correlation feature vector of the codewords in the quantization index vector, and classifies the resulting correlation feature vector with a feature classification network, so that it can quickly and efficiently judge whether the original voice stream has undergone steganography.

In addition, the network voice stream steganalysis device based on a bidirectional recurrent neural network according to the above embodiment of the present invention may also have the following additional technical features:

If the steganography probability is greater than or equal to a preset decision threshold, steganographic information is present in the network voice stream test set samples;

If the steganography probability is less than the preset decision threshold, no steganographic information is present in the network voice stream test set samples.

Further, in one embodiment of the present invention, the device further comprises an embedding module;

The embedding module is configured to select half of the network voice stream training set samples for information embedding, and to embed the information to be hidden into the selected half of the training set samples using a preset modulation scheme.

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of the network voice stream steganalysis method based on a bidirectional recurrent neural network according to one embodiment of the present invention;

Fig. 2 is a schematic diagram of the overall framework of the codeword correlation model according to one embodiment of the present invention;

Fig. 3 is a schematic structural diagram of the gated recurrent unit according to one embodiment of the present invention;

Fig. 4 is an application scenario diagram of the network voice stream steganalysis method based on a bidirectional recurrent neural network according to one embodiment of the present invention;

Fig. 5 is a schematic diagram of the steganalysis model based on a bidirectional recurrent neural network according to one embodiment of the present invention;

Fig. 6 is a flowchart of voice coding according to one embodiment of the present invention;

Fig. 7 is a flowchart of quantization index modulation according to one embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the network voice stream steganalysis device based on a bidirectional recurrent neural network according to one embodiment of the present invention.

Specific embodiment

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention; they should not be construed as limiting it.

The network voice stream steganalysis method and device based on a bidirectional recurrent neural network proposed according to embodiments of the present invention are described below with reference to the accompanying drawings.

The network voice stream steganalysis method based on a bidirectional recurrent neural network proposed according to embodiments of the present invention is described first with reference to the accompanying drawings.

Fig. 1 is a flowchart of the network voice stream steganalysis method based on a bidirectional recurrent neural network according to one embodiment of the present invention.

As shown in Fig. 1, the network voice stream steganalysis method based on a bidirectional recurrent neural network comprises the following steps:

Step S1: obtain network voice stream training set samples and network voice stream test set samples.

Further, in one embodiment of the present invention, the network voice stream training set samples and the network voice stream test set samples are collected from an Internet database or from a pre-built network voice stream sample library.

Specifically, a network voice stream sample library is built by collecting from the network or by other means. A portion of the network voice streams is randomly selected from it as the training sample set, and the remaining network voice streams serve as the test sample set.
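The random split described above can be sketched as follows. This is a minimal illustration only: the function name `split_sample_library`, the 80/20 ratio, and the fixed seed are assumptions, since the text does not specify how large the training portion is.

```python
import random

def split_sample_library(streams, train_fraction=0.8, seed=0):
    """Randomly split a sample library into a training set and a test set.

    train_fraction and seed are illustrative assumptions, not values from the text.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(streams)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]  # (training set, remaining test set)

library = [f"stream_{i}" for i in range(10)]  # placeholder stream identifiers
train, test = split_sample_library(library)
```

The remaining streams automatically form the test set, so no stream appears in both sets.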

Step S2: extract the network voice payload from the network voice stream training set samples by a sliding window method and process it to obtain a quantization index vector.

Further, in one embodiment of the present invention, before step S2 the method further comprises:

Selecting half of the network voice stream training set samples for information embedding, and embedding the information to be hidden into the selected half of the training set samples using a preset modulation scheme.

Specifically, in the training sample set, half of the network voice streams are selected for information embedding: the information to be hidden is embedded into the voice stream by quantization index modulation or pitch modulation, and the sample label is set to steganographic voice. For the other half of the network voice streams, no information is embedded and the sample label is set to non-steganographic voice. The test sample set is processed in the same way.
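The labeling procedure above might look like the following sketch. The label values, the function names, and the `toy_embed` placeholder (which simply flips the lowest bit of each index) are hypothetical; a real implementation would call a quantization index modulation or pitch modulation embedder instead.

```python
import random

STEGO, COVER = 1, 0  # hypothetical label values for steganographic / non-steganographic voice

def build_labeled_set(streams, embed, seed=0):
    """Embed hidden data in a random half of the streams and label every stream."""
    order = list(range(len(streams)))
    random.Random(seed).shuffle(order)
    stego_ids = set(order[: len(streams) // 2])  # the half chosen for embedding
    labeled = []
    for i, stream in enumerate(streams):
        if i in stego_ids:
            labeled.append((embed(stream), STEGO))
        else:
            labeled.append((stream, COVER))      # left untouched
    return labeled

def toy_embed(stream):
    """Placeholder embedder (assumption): flips the lowest bit of each index."""
    return [index ^ 1 for index in stream]

streams = [[4, 9, 2], [7, 7, 1], [3, 0, 5], [8, 6, 6]]
samples = build_labeled_set(streams, toy_embed)
```

Balancing the two classes this way keeps the supervised training from being biased toward either label.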

Further, assuming a segment of network voice stream has K frames, the quantization index vector can be expressed as:

X=[x₁,x₂,…,x_K]

In particular, for the G.729 or G.723 speech compression codecs, x_j is a 3×1 column vector, j ∈ [1, K].
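As an illustration of the sliding window over the K frames, the sketch below yields consecutive windows of frames, each frame holding three quantization indices. The window length and stride are assumptions, as the text does not give them, and the index values are made up.

```python
def sliding_windows(frames, window, stride):
    """Yield consecutive windows of `window` frames, advancing `stride` frames each time."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    for start in range(0, len(frames) - window + 1, stride):
        yield frames[start:start + window]

# K = 5 frames; each x_j carries 3 quantization indices (illustrative values).
X = [[12, 40, 7], [3, 88, 21], [12, 41, 7], [60, 2, 15], [9, 9, 30]]
windows = list(sliding_windows(X, window=3, stride=1))
```

Each window is itself a short quantization index sequence that can be fed to the codeword correlation model.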

Step S3: extract the codeword correlation vector from the quantization index vector using the codeword correlation model.

Further, the codeword correlation model is generated by training a bidirectional recurrent neural network; Fig. 2 shows its overall framework. It can be expressed by the formula:

Where h is the node state vector, W is the weight vector, b is the bias vector, and y is the output vector.
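For reference, a bidirectional recurrent network is commonly written in the form below. This is the standard textbook formulation, assumed here to match the symbols h, W, b, y, H, and x used in this description; it is not reproduced from the original figure, and the subscript naming of the weights is an assumption.

```latex
\begin{aligned}
\overrightarrow{h}_t &= H\!\left(W_{x\overrightarrow{h}}\,x_t + W_{\overrightarrow{h}\overrightarrow{h}}\,\overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\right) \\
\overleftarrow{h}_t &= H\!\left(W_{x\overleftarrow{h}}\,x_t + W_{\overleftarrow{h}\overleftarrow{h}}\,\overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}\right) \\
y_t &= W_{\overrightarrow{h}y}\,\overrightarrow{h}_t + W_{\overleftarrow{h}y}\,\overleftarrow{h}_t + b_y
\end{aligned}
```

The forward state depends on the previous frame and the backward state on the next frame, so the output y_t at each frame reflects codeword context from both directions.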

Where H is usually a long short-term memory network (LSTM); a similar unit such as the gated recurrent unit (GRU) may also be chosen. Its structure is shown in Fig. 3 and can be expressed by the formulas:

i_t=σ(W_i·[h_{t-1},x_t]+b_i)

f_t=σ(W_f·[h_{t-1},x_t]+b_f)

q_t=tanh(W_q·[h_{t-1},x_t]+b_q)

o_t=σ(W_o·[h_{t-1},x_t]+b_o)

c_t=f_t⊙c_{t-1}+i_t⊙q_t

h_t=o_t⊙tanh(c_t)

Where W is the weight vector, i is the input gate vector, f is the forget gate vector, o is the output gate vector, x is the input vector, b is the linear bias, h is the node state, σ is usually the sigmoid function, and ⊙ denotes element-wise multiplication.
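The six gate equations can be traced with a small pure-Python step function. This is a sketch for illustration, not the patented implementation: the `params` dictionary layout and the function names are assumptions, and q and c are read as the candidate state and cell state in the usual LSTM sense.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, v, b):
    """Return W·v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of the gate equations; params maps 'i','f','q','o' to (W, b)."""
    hx = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(v) for v in affine(W, hx, b)]

    i_t = gate("i", sigmoid)    # input gate
    f_t = gate("f", sigmoid)    # forget gate
    q_t = gate("q", math.tanh)  # candidate state
    o_t = gate("o", sigmoid)    # output gate
    c_t = [f * c + i * q for f, c, i, q in zip(f_t, c_prev, i_t, q_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hidden size 2, input size 3, all-zero weights: every sigmoid gate equals 0.5.
zeros_W = [[0.0] * 5 for _ in range(2)]
params = {name: (zeros_W, [0.0, 0.0]) for name in "ifqo"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights the candidate q_t is zero and the forget gate is 0.5, so the cell state is simply halved, which makes the update rule easy to check by hand.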

Step S4: classify the codeword correlation vector with the feature classification model.

Further, the feature classification model classifies the correlation feature vector extracted by the codeword correlation model. Assuming the extracted correlation feature vector is y, the feature classification network can be described as:

s_t=tanh (W_t·y_t+b_t)

Z_k=V_k*s_k+b_k

Where s is the intermediate-layer output of the feature classification network, Z_k is the output value for a given class, and Z is the final normalized probability of each class.
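A small sketch of such a classifier follows: a tanh intermediate layer, one linear output per class, and a normalization step. The text does not name the normalization function, so softmax is an assumption here, as are the function name and the use of `c` for the output-layer bias (the text reuses b).

```python
import math

def classify_features(y, W, b, V, c):
    """s = tanh(W·y + b); z_k = V_k·s + c_k; return normalized class probabilities."""
    s = [math.tanh(sum(w * v for w, v in zip(row, y)) + bias)
         for row, bias in zip(W, b)]
    z = [sum(w * v for w, v in zip(row, s)) + bias
         for row, bias in zip(V, c)]
    m = max(z)                              # subtract the max for numerical stability
    exp_z = [math.exp(v - m) for v in z]
    total = sum(exp_z)
    return [e / total for e in exp_z]       # softmax (assumed normalization)

# Illustrative weights: identity intermediate layer, two opposing output classes.
probs = classify_features(
    y=[0.2, -0.4],
    W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0],
    V=[[1.0, -1.0], [-1.0, 1.0]], c=[0.0, 0.0],
)
```

The returned list has one probability per class and sums to one; in the two-class case one entry can be read as the steganography probability.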

Further, the feature extraction network and the feature classification network are both trained within a supervised learning framework.

Step S5: iterate steps S2-S4 by a supervised method to generate the steganography detection model.

Further, iterating with a supervised method yields a trained steganography detection model.
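To make the idea of supervised iteration concrete, the sketch below repeatedly adjusts parameters from labeled samples. It is a deliberately simplified stand-in: a one-layer logistic model replaces the bidirectional recurrent network, and gradient descent on cross-entropy is an assumed training rule that the text does not specify. The feature values, epoch count, and learning rate are made up.

```python
import math

def predict(w, b, x):
    """Probability of the stego label under a logistic model (stand-in for the full network)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def train(samples, epochs=200, lr=0.5):
    """Iterate over labeled samples, updating parameters by gradient descent."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, label in samples:
            err = predict(w, b, x) - label   # cross-entropy gradient w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Toy feature vectors with labels: 1 = steganographic, 0 = non-steganographic.
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
w, b = train(data)
```

In the actual method the same loop structure would update the weights of the codeword correlation network and the feature classification network together.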

Step S6: input the network voice stream test set samples into the steganography detection model to obtain a steganography probability, and judge from the steganography probability whether the test set samples contain steganography.

The network voice stream test set samples are input into the trained steganography detection model, which outputs the probability that each sample contains steganography; whether steganographic content is present in the network voice stream test set samples is judged from this probability.

Further, in one embodiment of the present invention, judging from the steganography probability whether the network voice stream test set samples contain steganography comprises:

If the steganography probability is greater than or equal to a preset decision threshold, steganographic information is present in the network voice stream test set samples;

If the steganography probability is less than the preset decision threshold, no steganographic information is present in the network voice stream test set samples.

Specifically, for steganography detection:

label = stego speech, if z ≥ threshold

label = cover speech, if z < threshold

Where z is the probability value output above, threshold is the given decision threshold, stego speech is the steganography label, and cover speech is the non-steganography label.
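The decision rule is a single comparison and can be sketched as below. The function name and the default threshold of 0.5 are assumptions; the text only says the threshold is preset.

```python
def judge(z, threshold=0.5):
    """Map a steganography probability z to a label; the 0.5 default is an assumption."""
    return "stego speech" if z >= threshold else "cover speech"

labels = [judge(z) for z in (0.93, 0.50, 0.12)]
```

Raising the threshold trades missed detections for fewer false alarms, so in practice it would be tuned on held-out data.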

As shown in Fig. 4 and Fig. 5, the model designed in the embodiments of the present invention mainly targets payload-based information hiding; the detected carrier is the voice payload of streaming media, mainly VoIP voice streams.

In general, to transmit efficiently over the network, common network voice standards compress the voice before transmission. Taking G.723 as an example, the voice coding process is shown in Fig. 6.

The voice sequence is compressed into a series of speech coefficients by linear predictive coding (LPC); these coefficients are then quantized into a quantization parameter bit stream better suited to transmission and sent over the network. In voice steganography, popular steganographic methods usually write the information during compression and parameter quantization, for example quantization index modulation and pitch period steganography. Quantization index modulation steganography, in particular, is shown in Fig. 7.

Quantization index modulation steganography partitions the vector quantization codebook and hides information in the codebook category. The receiver determines which sub-codebook a codeword belongs to and thereby recovers the transmitted information stream. Pitch period modulation works in a similar way. The method of the present invention can detect network voice stream steganography including quantization index modulation and pitch period modulation, and therefore has a degree of generality.
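The codebook-partition idea can be illustrated with a toy example. The sketch splits a four-entry codebook into two sub-codebooks by index parity, which is a simplifying assumption chosen for brevity; practical schemes choose the partition to minimize distortion. The function names and codebook values are hypothetical.

```python
def nearest(vector, codebook, allowed):
    """Index within `allowed` of the codeword closest to `vector` (squared Euclidean distance)."""
    return min(allowed,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])))

def qim_embed(vector, codebook, bit):
    """Quantize `vector` using only the sub-codebook that encodes `bit`."""
    allowed = [i for i in range(len(codebook)) if i % 2 == bit]  # parity partition (assumption)
    return nearest(vector, codebook, allowed)

def qim_extract(index):
    """The receiver recovers the bit from the sub-codebook the index belongs to."""
    return index % 2

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bits = [1, 0, 1]
vectors = [[0.1, 0.1], [0.9, 0.2], [0.2, 0.8]]
indices = [qim_embed(v, codebook, b) for v, b in zip(vectors, bits)]
recovered = [qim_extract(i) for i in indices]
```

Because embedding forces some vectors onto a codeword that is not their overall nearest neighbor, the statistics and correlations of the transmitted indices shift, which is the kind of trace a codeword correlation model can learn to detect.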

Because voice has very particular properties, common audio steganalysis methods are not well suited to network voice streams. For example, common approaches that detect directly in the time domain or in a transform domain are unsuitable, because speech compression itself offers very good imperceptibility and network voice streams require real-time processing.

In particular, existing steganalysis for network voice streams relies mainly on statistical methods and deep learning methods. Statistical methods mainly extract low-order codeword features by hand and then classify them with a support vector machine; this approach is generally fast to compute but has lower accuracy and cannot capture higher-order information. Most deep-learning-based network voice stream algorithms have poor real-time performance, and most do not consider the correlation between codewords in the voice sequence, so detection accuracy under complex conditions is not high.

The network voice stream steganalysis method based on a bidirectional recurrent neural network proposed according to embodiments of the present invention obtains a quantization index vector from the compressed network voice stream by a sliding window method, takes the quantization index vector as input, builds a codeword correlation network from a bidirectional recurrent neural network to extract the correlation feature vector of the codewords in the quantization index vector, and classifies the resulting correlation feature vector with a feature classification network, so that it can quickly and efficiently judge whether the original voice stream has undergone steganography.

Next, the network voice stream steganalysis device based on a bidirectional recurrent neural network proposed according to embodiments of the present invention is described with reference to the accompanying drawings.

As shown in Fig. 8, the network voice stream steganalysis device based on a bidirectional recurrent neural network comprises: an acquisition module 100, a generation module 200, an extraction module 300, a classification module 400, a modeling module 500, and a detection module 600.

The acquisition module 100 is configured to obtain network voice stream training set samples and network voice stream test set samples.

The generation module 200 is configured to extract the network voice payload from the network voice stream training set samples by a sliding window method and process it to obtain a quantization index vector.

The extraction module 300 is configured to extract the codeword correlation vector from the quantization index vector using a codeword correlation model.

The classification module 400 is configured to classify the codeword correlation vector with a feature classification model.

The modeling module 500 is configured to iterate by a supervised method to generate a steganography detection model.

The detection module 600 is configured to input the network voice stream test set samples into the steganography detection model to obtain a steganography probability, and to judge from the steganography probability whether the network voice stream test set samples contain steganography.

If the steganography probability is greater than or equal to a preset decision threshold, steganographic information is present in the network voice stream test set samples;

An embedding module, configured to select half of the network voice stream training set samples for information embedding, and to embed the information to be hidden into the selected half of the training set samples using a preset modulation scheme.

It should be noted that the foregoing explanation of the embodiment of the network voice stream steganalysis method based on a bidirectional recurrent neural network also applies to the device of this embodiment, and details are not repeated here.

The network voice stream steganalysis device based on a bidirectional recurrent neural network proposed according to embodiments of the present invention obtains a quantization index vector from the compressed network voice stream by a sliding window method, takes the quantization index vector as input, builds a codeword correlation network from a bidirectional recurrent neural network to extract the correlation feature vector of the codewords in the quantization index vector, and classifies the resulting correlation feature vector with a feature classification network, so that it can quickly and efficiently judge whether the original voice stream has undergone steganography.

In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims

1. A network voice stream steganalysis method based on a bidirectional recurrent neural network, characterized by comprising the following steps:

S1, obtaining network voice stream training set samples and network voice stream test set samples;

S2, extracting the network voice payload from the network voice stream training set samples by a sliding window method and processing it to obtain a quantization index vector;

S3, extracting the codeword correlation vector from the quantization index vector using a codeword correlation model;

S4, classifying the codeword correlation vector with a feature classification model;

S5, iterating steps S2-S4 by a supervised method to generate a steganography detection model;

S6, inputting the network voice stream test set samples into the steganography detection model to obtain a steganography probability, and judging from the steganography probability whether the network voice stream test set samples contain steganography.

2. The method according to claim 1, characterized in that judging from the steganography probability whether the network voice stream test set samples contain steganography comprises:

If the steganography probability is greater than or equal to a preset decision threshold, steganographic information is present in the network voice stream test set samples;

If the steganography probability is less than the preset decision threshold, no steganographic information is present in the network voice stream test set samples.

3. The method according to claim 1, characterized in that before step S2 the method further comprises:

Selecting half of the network voice stream training set samples for information embedding, and embedding the information to be hidden into the selected half of the training set samples using a preset modulation scheme.

4. The method according to claim 1, characterized in that the network voice stream training set samples and the network voice stream test set samples are collected from an Internet database or from a pre-built network voice stream sample library.

5. The method according to claim 1, characterized in that a network voice stream in the network voice stream training set samples has K frames, and the quantization index vector is expressed as:

X=[x₁,x₂,…,x_K].

6. The method according to claim 1, characterized in that the codeword correlation model is generated by training a bidirectional recurrent neural network, and the codeword correlation model is:

Where h is the node state vector, W is the weight vector, b is the bias vector, y is the output vector, H is a long short-term memory network, and x is the quantization index vector.

7. The method according to claim 1, characterized in that the feature classification model is:

s_t=tanh(W_t·y_t+b_t)

Z_k=V_k*s_k+b_k

Where s_t is the intermediate-layer output of the feature classification model, Z_k is the output value for a given class, Z is the final normalized probability of each class, and h is the node state vector.

8. A network voice stream steganalysis device based on a bidirectional recurrent neural network, characterized by comprising:

A generation module, configured to extract the network voice payload from the network voice stream training set samples by a sliding window method and process it to obtain a quantization index vector;

A detection module, configured to input the network voice stream test set samples into the steganography detection model to obtain a steganography probability, and to judge from the steganography probability whether the network voice stream test set samples contain steganography.

9. The device according to claim 6, characterized in that judging from the steganography probability whether the network voice stream test set samples contain steganography comprises:

If the steganography probability is greater than or equal to a preset decision threshold, steganographic information is present in the network voice stream test set samples;

10. The device according to claim 6, characterized by further comprising an embedding module;

The embedding module is configured to select half of the network voice stream training set samples for information embedding, and to embed the information to be hidden into the selected half of the training set samples using a preset modulation scheme.