CN114630006B

CN114630006B - Secret information extraction method based on consistent most advantageous test

Info

Publication number: CN114630006B
Application number: CN202210055235.5A
Authority: CN
Inventors: Liu Jiufen (刘九芬); Du Hansong (杜寒松); Zhang Yi (张祎); Luo Xiangyang (罗向阳)
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2022-01-18
Filing date: 2022-01-18
Publication date: 2023-05-26
Anticipated expiration: 2042-01-18
Also published as: CN114630006A

Abstract

The invention provides a secret information extraction method based on the consistent most advantageous test (i.e., the uniformly most powerful test). First, the probability distribution of the different bits of the sequence extracted by the true steganographic key is studied. Then the relation between the probability distribution of the sequence extracted by a pseudo steganographic key and that of the stego (secret-carrying) sequence is established, and by studying the probability distribution of spatial-domain and JPEG-domain stego sequences it is proved that the subsequences extracted by the true and pseudo steganographic keys have different probability distributions. Finally, based on this difference, the correct steganographic key is screened out with the consistent most advantageous test. Given the probabilities of false rejection and false acceptance, the threshold and sample size required for the hypothesis test are derived. Experimental results show that the method can recover the steganographic key of stego images produced by common mainstream steganographic algorithms, thereby extracting the secret information.

Description

Secret information extraction method based on consistent most advantageous test

Technical Field

The invention relates to the technical field of digital steganography, in particular to a secret information extraction method based on the consistent most advantageous test.

Background

Digital steganography communicates over open channels by embedding secret information into multimedia files such as digital images, audio and video. Because it conceals the very existence of the secret communication, it is highly deceptive; in recent years it has become a main means of covert communication and a research hot spot in information security. Digital steganalysis studies how to detect and extract secret information hidden in a digital carrier, and its ultimate aim is to extract the hidden information and thereby verify the correctness of steganography detection. On the one hand, because the communicating parties mostly combine hiding with encryption, an attacker who wants to obtain the covert content or gather evidence of covert communication must first extract the hidden ciphertext and only then consider decryption, so secret information extraction is an unavoidable problem in both tasks. On the other hand, most current detection methods decide whether secret information is hidden by analyzing whether the carrier has been modified, but a modified carrier does not necessarily contain sensitive information, so the validity of such decisions is open to question; this also makes research on secret information extraction necessary.

Currently, adaptive steganography with images as the carrier, i.e., image adaptive steganography, has become the dominant research direction in steganography. Its embedding process typically consists of two parts, a distortion function and steganographic coding, which may be expressed as "adaptive steganography = distortion function + steganographic coding". The distortion function computes the distortion caused by modifying the cover image at different positions. Different steganographic algorithms define the distortion function from different angles, but the general principle is that its values are small in complex-texture and edge regions and large in smooth regions. Steganographic coding selects the modification positions according to the computed distortion so as to embed the secret information at minimal distortion cost. Adaptive steganography uses steganographic codes such as matrix coding and wet paper coding.

Document 1 "T. Pevný, T. Filler, P. Bas. “Using High-Dimensional Image Models to Perform Highly Undetectable Steganography,” In: Proceedings of the 12th International Workshop on Information Hiding (IH), Calgary, Canada, 2010, pp. 161-177" applies STC (Syndrome-Trellis Codes) to image adaptive steganography for the first time. Since then, because its performance in minimizing embedding distortion approaches the theoretical optimum, STC has become the coding of choice for adaptive steganography, and STC-based adaptive steganography has become both the focus of improvements in steganography and the main difficulty for steganalysis.

Most secret information extraction methods work only under specific conditions, and extraction methods for the condition in which only the stego image is available (the stego-only condition) urgently need to be researched. Document 2 "X. Luo, X. Song, X. Li, et al., “Steganalysis of HUGO steganography based on parameter recognition of Syndrome-Trellis-Codes,” Multimedia Tools and Applications, 2016, vol. 75, no. 21, pp. 13557-13583" proposes a secret information extraction method for spatial-domain steganography with plaintext embedding under the stego-only condition. The LSBs (least significant bits) of spatial-domain image pixels are random noise, so 0 and 1 occur with the same frequency. That method assumes that the sequence extracted by a pseudo steganographic key has the same probability distribution as the pixel LSBs of the spatial cover image, i.e., that 0 and 1 occur with nearly the same frequency. However, most images transmitted on the Internet are in JPEG format, and 0 and 1 do not occur with the same frequency in the LSBs of the embeddable DCT (Discrete Cosine Transform) coefficients of a JPEG image.

Disclosure of Invention

To provide a secret information extraction method suitable for JPEG images under the stego-only condition, the invention provides a secret information extraction method based on the consistent most advantageous test.

The invention provides a secret information extraction method based on the consistent most advantageous test, which comprises the following steps:

step 1: estimate the length m of the secret information from the pixels or the embeddable DCT coefficients of the stego image;

step 2: given a first-class (Type I) error rate α and a second-class (Type II) error rate β, calculate the sample size N and the threshold T, where the first-class error rate α is the probability of judging the true steganographic key to be a pseudo steganographic key, and the second-class error rate β is the probability of judging a pseudo steganographic key to be the true steganographic key;

step 3: construct the consistent most advantageous test statistic.

Traverse the check matrix space and, for each check matrix in turn, perform the following statistic calculation: extract a sequence from the stego image with the currently enumerated check matrix; starting from the j-th bit of the first byte of the sequence, sample at intervals of 7 bits (i.e., take the j-th bit of every byte) to obtain a subsequence of length N; and calculate the value of the test statistic for this subsequence;

step 4: judge whether N > m holds; if so, go to step 7; otherwise, go to step 5;

step 5: for each check matrix, judge whether its statistic reaches the threshold T. If so, store the check matrix in the candidate key set B; otherwise, discard it;

step 6: after all check matrices have been judged: if |B| = 1, the check matrix in B is the true steganographic key and the extraction succeeds; if |B| = 0, the extraction fails; if |B| > 1, take B as the new check matrix space and go to step 7;

step 7: store the check matrix for which the statistic reaches its maximum value in the candidate key set D. If |D| = 1, the check matrix in D is the true steganographic key and the extraction succeeds; if |D| > 1, the extraction fails.
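The control flow of steps 4 to 7 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the statistic of every candidate check matrix has already been computed in step 3, reads "reaches the threshold" as a value ≥ T, and uses the hypothetical function name `select_key`.

```python
from typing import Dict, Hashable, Optional, Tuple


def select_key(stats: Dict[Hashable, float], N: int, m: int,
               T: float) -> Tuple[str, Optional[Hashable]]:
    """Steps 4-7: choose the true steganographic key from per-candidate statistics.

    stats maps each candidate check matrix (any hashable id) to its statistic.
    Returns ("success", key) or ("failure", None).
    """
    candidates = stats
    if N <= m:
        # Steps 5-6: keep candidates whose statistic reaches T (the set B).
        B = {k: v for k, v in stats.items() if v >= T}
        if len(B) == 1:
            return "success", next(iter(B))
        if not B:
            return "failure", None
        candidates = B  # |B| > 1: B becomes the new check matrix space
    # Step 7: the candidates attaining the maximum statistic form the set D.
    best = max(candidates.values())
    D = [k for k, v in candidates.items() if v == best]
    return ("success", D[0]) if len(D) == 1 else ("failure", None)
```

For example, `select_key({"H1": 5.2, "H2": -3.1, "H3": 0.4}, N=300, m=8000, T=1.0)` keeps only `"H1"` in B and returns `("success", "H1")`.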

Further, step 2 specifically includes:

step 2.1: calculate the critical values a and b from the given first-class error rate α and second-class error rate β:

step 2.2: calculate the sample size N and the threshold T according to formula (28):

where Φ is the standard normal distribution function and a, b satisfy Φ(a) = α, Φ(b) = 1 − β; μ₀ and σ₀ denote the expectation and variance of R_i when H₀ holds, and μ₁ and σ₁ denote the expectation and variance of R_i when H₁ holds, i = 1, 2, …, n₀.

H₀ and H₁ are the hypotheses of the test H₀: D = D₀, H₁: D = D₁, where D is the overall distribution function of the sample, D₀ is the distribution function under the true steganographic key, and D₁ is the distribution function under a pseudo steganographic key. Let l be the sequence extracted by the true steganographic key; starting from bit j (j = 1, 2, …, 8) of the first byte, sample at intervals of 7 bits, n₀ bits in total, to obtain the j-th subsequence. Under the true steganographic key, the i-th bit (i = 1, 2, …, n₀) of this subsequence follows a two-point distribution that takes the value 0 with probability χ₀ and the value 1 with probability χ₁, with χ₀, χ₁ ≠ 0.5. Under a pseudo steganographic key, the i-th bit of the corresponding subsequence takes the value 0 with probability γ₀ and the value 1 with probability γ₁, with γ₀, γ₁ ≈ 0.5; i = 1, 2, …, n₀.

Further, step 3 includes: constructing the consistent most advantageous test statistic according to formula (29), where the two counts appearing in it are the number of occurrences of 0 and the number of occurrences of 1, respectively, in the subsequence.

The invention has the beneficial effects that:

While protecting data privacy, image adaptive steganography is also one of the tools that hostile forces and terrorists use to plan and coordinate criminal activities endangering China's political security and social stability. The secret information extraction method based on the consistent most advantageous test is applicable to secret information extraction in both the spatial domain and the JPEG (Joint Photographic Experts Group) domain and, under plaintext embedding, can recover the steganographic key of stego images produced by common mainstream steganographic algorithms, thereby extracting the secret information; this is of great significance for safeguarding China's national security.

Drawings

Fig. 1 is a flow chart of a secret information extraction method based on consistent most advantageous test according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the probability distribution of each bit of plaintext according to an embodiment of the present invention;

FIG. 3 is the 0/1 frequency distribution of the LSBs of cover images according to an embodiment of the present invention;

FIG. 4 is the 0/1 frequency distribution of the LSBs of stego images according to an embodiment of the present invention: (a) at an embedding rate of 0.5 bpp or bpnzac; (b) at 0.4 bpp or bpnzac; (c) at 0.3 bpp or bpnzac; (d) at 0.2 bpp or bpnzac; (e) at 0.1 bpp or bpnzac;

FIG. 5 is a schematic diagram of a spatial-domain cover image according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a JPEG-domain cover image according to an embodiment of the present invention;

FIG. 7 is the 0/1 frequency distribution in subsequences extracted by pseudo steganographic keys according to an embodiment of the present invention;

FIG. 8 shows the values of the statistic for sequences extracted with various steganographic keys according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The core of the invention is as follows: based on the difference between the probability distribution of the subsequence extracted from the stego image with the true steganographic key and that of the subsequence extracted with a pseudo steganographic key, the correct steganographic key is screened out using the consistent most advantageous test.

The technical solution of the invention rests on the premise that the sequences extracted by true and pseudo steganographic keys have different probability distributions. Since no prior work has studied or established this fact, the premise is proved before the technical solution is introduced.

(1) Message distribution characteristics for steganographic key extraction

When the embedded message is plaintext, the sequence extracted by the steganographic key is the plaintext sequence. The probabilities of 0 and 1 in each bit of Chinese and English plaintext bytes are studied next.

In a computer system, English characters are stored as single bytes, and the highest bit of each such byte is 0; Chinese characters are stored as two bytes, and the highest bit of each byte is 1 to avoid confusion with English characters. For the lower seven bits of Chinese and English bytes, the probabilities of 0 and 1 are calculated from a large number of natural-language statistics; the results are shown in Table 1 (bit 8 denotes the lowest bit, and the other bits are numbered 7, 6, 5, 4, 3, 2, 1 in order toward the highest bit).

Table 1 probabilities of respective bits of plaintext byte being 0 and 1

For plaintext embedding, the information extracted by the correct steganographic key is a plaintext sequence. Starting from the i-th bit of the first byte of the plaintext sequence, sampling at intervals of 7 bits (i.e., taking the i-th bit of every byte) gives the subsequence L_i, i = 1, 2, …, 8. Table 2 lists the probability of 0 in each subsequence L_i, i = 1, 2, …, 8.

Table 2 Probability of 0 in each subsequence

Clearly, the probability of 0 in L_i, i = 1, 2, …, 8, is not 0.5; that is, 0 and 1 are unbalanced in the different bits of the sequence extracted by the true steganographic key.
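To make the sampling rule concrete, here is a short Python sketch that builds the subsequences L_1, …, L_8 from an ASCII plaintext and counts the frequency of 0 in each. The MSB-first bit numbering follows the convention of Table 1 (bit 8 is the lowest bit); the sample sentence and the helper names are illustrative only, and a text this short gives noisy estimates rather than the values of Table 2.

```python
def bits_of(data: bytes) -> list[int]:
    """Flatten bytes into bits, most significant bit first (bit 1 = MSB, bit 8 = LSB)."""
    return [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]


def subsequence(bits: list[int], j: int) -> list[int]:
    """L_j: start at bit j (1..8) of the first byte, then skip 7 bits each time,
    which is the same as taking bit j of every byte."""
    return bits[j - 1::8]


def zero_frequency(seq: list[int]) -> float:
    """Relative frequency of 0 in seq."""
    return seq.count(0) / len(seq)


text = "I have a dream that one day this nation will rise up".encode("ascii")
bits = bits_of(text)
for j in range(1, 9):
    print(f"L_{j}: frequency of 0 = {zero_frequency(subsequence(bits, j)):.3f}")
```

For ASCII English text, L_1 consists only of 0s, which matches the statement that the highest bit of an English byte is 0.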

(2) Message distribution characteristics for pseudo steganographic key extraction

This section studies the probability distribution of 0 and 1 in the sequence extracted by a pseudo steganographic key. First, the relation between the probabilities of 0 and 1 in this sequence and those in the carrier sequence is proved: the difference between the probabilities of 0 and 1 in the sequence extracted by a pseudo steganographic key equals the r-th power of the difference between the probabilities of 0 and 1 in the carrier sequence. Then the probability distributions of 0 and 1 in the stego sequences of spatial-domain and JPEG-domain adaptive steganography are studied respectively, showing that 0 and 1 are nearly balanced in the sequence extracted by a pseudo steganographic key.

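(2.1) The following lemma is given first.

Lemma 1: Let F₂ be the binary Galois field and let a = (a₁, a₂, …, a_q) ∈ F₂^q have independent components. If Pr(a_i = 0) = p₀ and Pr(a_i = 1) = p₁, i = 1, 2, …, q, then

$\Pr(a_1 + a_2 + \cdots + a_q = 0) - \Pr(a_1 + a_2 + \cdots + a_q = 1) = (p_0 - p_1)^q$

Proof: Over the binary Galois field, a₁ + a₂ + ⋯ + a_q = 1 if and only if (a₁, a₂, …, a_q) contains an odd number of 1s, and a₁ + a₂ + ⋯ + a_q = 0 if and only if it contains an even number of 1s. Then:

$\Pr\Big(\sum_{i=1}^{q} a_i = 0\Big) - \Pr\Big(\sum_{i=1}^{q} a_i = 1\Big) = \sum_{k=0}^{q} (-1)^k \binom{q}{k} p_1^{k} p_0^{q-k}$

thus

$\Pr\Big(\sum_{i=1}^{q} a_i = 0\Big) - \Pr\Big(\sum_{i=1}^{q} a_i = 1\Big) = (p_0 - p_1)^q$

The proof ends.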
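Lemma 1 can be checked numerically by brute force. The Python sketch below enumerates every vector in F₂^q, adds the probabilities of even-weight vectors and subtracts those of odd-weight vectors, and compares the result with (p₀ − p₁)^q; like the proof, it assumes independent, identically distributed components. It is a sanity check only, not part of the method, and `parity_gap` is a hypothetical helper name.

```python
from itertools import product


def parity_gap(p0: float, q: int) -> float:
    """Exact Pr(a_1+...+a_q = 0) - Pr(a_1+...+a_q = 1) over F_2 for independent
    bits with Pr(a_i = 0) = p0, computed by enumerating all 2^q vectors."""
    p1 = 1.0 - p0
    gap = 0.0
    for a in product((0, 1), repeat=q):
        ones = sum(a)
        prob = p1 ** ones * p0 ** (q - ones)
        # An even number of 1s gives sum 0; an odd number gives sum 1.
        gap += prob if ones % 2 == 0 else -prob
    return gap


for q in (1, 2, 4, 7):
    print(q, parity_gap(0.3, q), (0.3 - 0.7) ** q)
```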

Based on Lemma 1, the following theorem describes the relation between the probability distribution of the sequence extracted by a pseudo steganographic key and that of the stego sequence.

Theorem 1: Let the stego (payload) sequence be y = (y₁, y₂, …, y_n), where Pr(y_i = 0) = p₀, Pr(y_i = 1) = p₁, i = 1, 2, …, n. Let s = (s₁, s₂, …, s_t) be the sequence extracted by a pseudo steganographic key. When the pseudo steganographic key consists of one sub-check matrix, the probability of 0 in the sequence s is:

$\Pr(s_j = 0) = \frac{1 + (p_0 - p_1)^{n_1}}{2}$

When the pseudo steganographic key consists of two different sub-check matrices, the probability of 0 in the sequence s is:

where n_i (i = 1, 2) denotes the number of 1s in the i-th sub-check matrix.

Proof: When the secret information is long enough, the distribution of the first h − 1 bits of the sequence does not affect the overall distribution, where h is the height of the sub-check matrix; for ease of discussion, the first h − 1 bits are therefore not considered. For any j ≥ h, s_j equals the product, over F₂, of the j-th row vector (the basis row vector) of the check matrix and the corresponding part of the stego sequence. Let v = (v₁, v₂, …, v_k) be the non-zero part of the j-th row of the steganographic key, i.e., the segment of that row that contains its 1s, and let (y′₁, y′₂, …, y′_k) be the corresponding part of the carrier sequence. Then

$s_j = v_1 y'_1 + v_2 y'_2 + \cdots + v_k y'_k$ (over F₂),

so s_j is the sum of the r bits y′_i for which v_i = 1. When Pr(y_i = 0) = p₀ and Pr(y_i = 1) = p₁, Lemma 1 gives:

$\Pr(s_j = 0) - \Pr(s_j = 1) = (p_0 - p_1)^r$

where r is the number of 1s in the basis row vector; that is, the difference between the probabilities of 0 and 1 in the sequence extracted by a pseudo steganographic key equals the r-th power of the difference between the probabilities of 0 and 1 in the carrier sequence.

When the check matrix consists of one sub-check matrix containing n₁ ones, each row j ≥ h of the check matrix passes through h consecutive copies of the sub-check matrix and meets each of its h rows exactly once, so r = n₁; since Pr(s_j = 0) + Pr(s_j = 1) = 1,

$\Pr(s_j = 0) = \frac{1 + (p_0 - p_1)^{n_1}}{2}$

When the check matrix consists of two different sub-check matrices containing n₁ and n₂ ones respectively, the second formula of Theorem 1 follows in the same way.

The proof ends.
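The step r = n₁ relies on the banded structure of the STC parity-check matrix. The Python sketch below builds that matrix from a sub-check matrix, assuming the usual STC layout (each copy of the h × w sub-check matrix shifted down one row and right w columns, with the bottom rows truncated) and assuming that bit k of each column integer, least significant bit first, is row k + 1; the row weight does not depend on that bit-order choice. It uses the sub-check matrix [109, 71] listed in the experiments; the function names are hypothetical.

```python
def stc_matrix(columns: list[int], h: int, n_blocks: int) -> list[list[int]]:
    """STC parity-check matrix built from an h x w sub-check matrix given by column ints."""
    w = len(columns)
    H = [[0] * (n_blocks * w) for _ in range(n_blocks)]
    for k in range(n_blocks):              # copy k starts at row k, column k * w
        for c, col in enumerate(columns):
            for r in range(h):
                if (col >> r) & 1 and k + r < n_blocks:
                    H[k + r][k * w + c] = 1
    return H


def pr_zero_single(p0: float, n1: int) -> float:
    """Theorem 1, one sub-check matrix: Pr(s_j = 0) = (1 + (p0 - p1)^n1) / 2."""
    return (1 + (p0 - (1 - p0)) ** n1) / 2


sub, h = [109, 71], 7
n1 = sum(bin(c).count("1") for c in sub)   # number of 1s in the sub-check matrix
H = stc_matrix(sub, h, n_blocks=20)
# Every row j >= h (1-indexed) contains each sub-matrix row once, so its weight is n1.
print(n1, {sum(row) for row in H[h - 1:]})
print(pr_zero_single(0.3, n1))
```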

(2.2) The frequency distribution of 0 in the sequence extracted by a pseudo steganographic key is examined below. Clearly, the frequency of 0 in the sequence is f₀ = t₀/t, where t₀ is the number of 0s in s.

By Bernoulli's law of large numbers, for any ε > 0:

$\lim_{t \to \infty} \Pr\big(|f_0 - \Pr(s_j = 0)| < \varepsilon\big) = 1$

that is, when the sequence is long enough, the frequency of 0 in the sequence stabilizes at the probability calculated in Theorem 1.

A good check matrix must have all 1s in its first and last rows, so n₁, n₂ ≥ 4. Since 0 < |p₀ − p₁| < 1, if |p₀ − p₁| is small enough, (p₀ − p₁)^r with r ≥ 4 is close to 0, which ensures that the frequency of 0 in the sequence extracted by a pseudo steganographic key is close to 0.5. The size of the probability difference p₀ − p₁ between 0 and 1 in the stego sequence is therefore studied below for spatial-domain and JPEG-domain adaptive steganography respectively.
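A quick Monte Carlo run illustrates this conclusion. The Python sketch below draws an i.i.d. carrier sequence with Pr(y_i = 1) = 0.7 (roughly the value reported for the JPEG-domain groups in Fig. 3), computes s = Hy over F₂ with a candidate sub-check matrix acting as a pseudo key, drops the first h − 1 bits, and compares the observed frequency of 0 with (1 + (p₀ − p₁)^r)/2. The i.i.d. carrier model, the seed, the column bit order and the helper name `stc_syndrome` are assumptions made for illustration; real stego sequences are not exactly i.i.d.

```python
import random


def stc_syndrome(y: list[int], columns: list[int], h: int) -> list[int]:
    """s = H y over F_2 for the STC built from `columns` (bit k of a column int = row k+1)."""
    w = len(columns)
    n_blocks = len(y) // w
    s = [0] * n_blocks
    for k in range(n_blocks):
        for c, col in enumerate(columns):
            if y[k * w + c]:               # only 1-bits of y flip syndrome bits
                for r in range(h):
                    if (col >> r) & 1 and k + r < n_blocks:
                        s[k + r] ^= 1
    return s


rng = random.Random(2022)
p1 = 0.7
y = [1 if rng.random() < p1 else 0 for _ in range(200_000)]
pseudo_key, h = [95, 75, 121, 71, 109], 7
s = stc_syndrome(y, pseudo_key, h)[h - 1:]   # ignore the first h - 1 bits
r = sum(bin(c).count("1") for c in pseudo_key)
freq0 = s.count(0) / len(s)
predicted = (1 + ((1 - p1) - p1) ** r) / 2
print(f"r = {r}, observed = {freq0:.4f}, predicted = {predicted:.4f}")
```

With r = 24 the predicted value is 0.5 to many decimal places, so the observed frequency should differ from 0.5 only by sampling noise.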

(2.3) For spatial-domain adaptive steganography, the cover (stego) sequence consists of the LSBs of the pixels of the cover (stego) image. Since the LSBs of spatial image pixels are random noise, 0 and 1 are nearly equiprobable in both the cover sequence and the stego sequence:

$\Pr(y_i = 0) = \Pr(y_i = 1) = 1/2$  (12)

Applying Theorem 1 gives:

$\Pr(s_i = 0) - \Pr(s_i = 1) = 0$  (13)

that is, 0 and 1 occur with nearly equal frequency in the sequence extracted by a pseudo steganographic key.

For JPEG-domain adaptive steganography, the cover (stego) sequence consists of the LSBs of the non-zero alternating current (AC) coefficients among the quantized DCT coefficients (hereinafter, DCT coefficients) of the cover (stego) image. The DCT coefficients of a JPEG image follow a Laplacian distribution, so the probabilities of 0 and 1 in the LSBs of the non-zero AC coefficients of the cover image are estimated from the distribution of the cover image DCT coefficients.

Since the cover image DCT coefficients approximately obey a Laplacian distribution with location parameter 0, their probability density function can be expressed as:

$f(x) = \frac{\lambda}{2} e^{-\lambda |x|}, \quad \lambda > 0$

Let h(0) denote the frequency with which DCT coefficients equal to 0 appear in the cover image. Thus:

DCT coefficients of the cover image with large absolute values occur rarely. For ease of calculation, the range of the cover image DCT coefficients is therefore extended to (−∞, +∞) in the following calculation.

Let α₀ and α₁ denote the probabilities of 0 and 1 in the LSBs of the cover image DCT coefficients, i.e., α₀ is the sum of the probabilities that a DCT coefficient of the cover image is even and α₁ is the sum of the probabilities that it is odd. Then:

Let β₀ and β₁ denote the probabilities of 0 and 1 in the LSBs of the non-zero DCT coefficients of the cover image. Since the value 0 is even and occurs with frequency h(0), excluding it gives:

$\beta_0 = \frac{\alpha_0 - h(0)}{1 - h(0)}, \quad \beta_1 = \frac{\alpha_1}{1 - h(0)}$

Let β′₀ and β′₁ denote the probabilities of 0 and 1 in the LSBs of the non-zero AC coefficients of the cover image. Then:

β′₀ ≈ β₀, β′₁ ≈ β₁  (20)

If the vector (β′₀, β′₁) is used to estimate the probabilities of 0 and 1 in the carrier sequence, the difference between the probabilities of 0 and 1 in the sequence extracted by a wrong check matrix is

$(\beta'_0 - \beta'_1)^r$

where r is the number of 1s in the basis row vector. A good check matrix must have all 1s in its first and last rows, so r ≥ 4. Hence the difference between the probabilities of 0 and 1 in the sequence extracted by a pseudo steganographic key is small, and the two probabilities are close to equal. Note that Theorem 1 holds for any j ≥ h, so the above derivation also holds for the subsequences extracted by a pseudo steganographic key.
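The estimate of β₀ and β₁ can be sketched numerically. The Python code below assumes (this is not stated in the text) that the quantized coefficient value k has probability equal to the integral of the Laplacian density (λ/2)e^{−λ|x|} over [k − 1/2, k + 1/2], so that h(0) = 1 − e^{−λ/2} determines λ; it then sums the probabilities of even and odd values over a truncated range and conditions on the coefficient being non-zero. The function name and the truncation limit K are illustrative.

```python
import math


def laplace_lsb_probs(h0: float, K: int = 200) -> tuple[float, float]:
    """Return (beta0, beta1): Pr(LSB = 0) and Pr(LSB = 1) among non-zero coefficients.

    Assumed model: Pr(coefficient = k) is the integral of (lam/2) exp(-lam|x|)
    over [k - 1/2, k + 1/2], hence h0 = Pr(coefficient = 0) = 1 - exp(-lam/2).
    """
    lam = -2.0 * math.log(1.0 - h0)
    alpha0, alpha1 = h0, 0.0               # mass of even / odd values; 0 is even
    for k in range(1, K + 1):
        # Pr(+k) + Pr(-k), using the symmetry of the Laplacian
        pk = math.exp(-lam * (k - 0.5)) - math.exp(-lam * (k + 0.5))
        if k % 2 == 0:
            alpha0 += pk
        else:
            alpha1 += pk
    beta0 = (alpha0 - h0) / (1.0 - h0)     # drop the zero coefficients
    beta1 = alpha1 / (1.0 - h0)
    return beta0, beta1


beta0, beta1 = laplace_lsb_probs(0.5)
for r in (4, 9, 24):
    print(r, (beta0 - beta1) ** r)
```

With h(0) = 0.5 this model gives β₀ = 0.2 and β₁ = 0.8, so (β₀ − β₁)^r is 0.1296 for r = 4 and falls quickly as r grows.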

Section (1) concludes that 0 and 1 are unbalanced in the different bits of the subsequence extracted by the true steganographic key, and section (2) concludes that 0 and 1 occur with nearly equal frequency in the subsequence extracted by a pseudo steganographic key. This proves that the sequences extracted by true and pseudo steganographic keys have different probability distributions.

Example 1

On the basis of this premise, as shown in Fig. 1, an embodiment of the invention provides a secret information extraction method based on the consistent most advantageous test, which comprises the following steps:

S101: estimate the length m of the secret information from the pixels or the embeddable DCT coefficients of the stego image;

Specifically, the secret information length m may be estimated with the method of the document "J. Kodovský, J. Fridrich, “Quantitative Steganalysis Using Rich Models,” In: Proceedings of SPIE, Electronic Imaging, Media Watermarking, Security, and Forensics XV, San Francisco, CA, 2013, vol. 8665, pp. 866500", which is not described in detail here.

S102: given a first-class (Type I) error rate α and a second-class (Type II) error rate β, calculate the sample size N and the threshold T, where the first-class error rate α is the probability of judging the true steganographic key to be a pseudo steganographic key, and the second-class error rate β is the probability of judging a pseudo steganographic key to be the true steganographic key;

Specifically, let l be the sequence extracted by the true steganographic key, and obtain from it a subsequence of length n₀ by starting at bit j (j = 1, 2, …, 8) of the first byte and sampling at intervals of 7 bits. The i-th bit of this subsequence obeys a two-point distribution: it takes the value 0 with probability χ₀ and the value 1 with probability χ₁ (χ₀ + χ₁ = 1),

where χ₀, χ₁ ≠ 0.5.

The sequence extracted by a pseudo steganographic key is approximately balanced in 0 and 1. Correspondingly, starting from bit j (j = 1, 2, …, 8) of the first byte of the sequence extracted by a pseudo steganographic key, n₀ bits in total are sampled at intervals of 7 bits to obtain a subsequence of length n₀, whose i-th bit takes the value 0 with probability γ₀ and the value 1 with probability γ₁ (γ₀ + γ₁ = 1),

where γ₀, γ₁ ≈ 0.5.

Based on this statistical difference, the problem of distinguishing the true steganographic key from pseudo steganographic keys can be converted into a hypothesis testing problem about the sequence distribution:

H₀: D = D₀, H₁: D = D₁  (23)

where D is the overall distribution function of the sample, D₀ is the distribution function under the true steganographic key, and D₁ is the distribution function under a pseudo steganographic key;

For the quantity R_i (i = 1, 2, …, n₀), suppose that its expectation and variance are μ₀ and σ₀ when H₀ holds, and μ₁ and σ₁ when H₁ holds. From formulas (24) to (27), respectively:

On this basis, in one implementation, step S102 mainly comprises the following sub-steps:

S1021: calculate the critical values a and b from the given first-class error rate α and second-class error rate β;

Specifically, as mentioned above, the first-class error rate α is the probability of judging the true steganographic key to be a pseudo steganographic key, and the second-class error rate β is the probability of judging a pseudo steganographic key to be the true steganographic key. Making α small enough ensures that the probability of missing the true steganographic key is sufficiently small. In addition, the expected number of accepted pseudo steganographic keys should not exceed 1, i.e., β|K| ≤ 1, or equivalently β ≤ 1/|K|; β = 1/|K| is generally taken, where |K| is the number of elements in the steganographic key space.

Thus, taking β = 1/|K| and choosing a suitable value of α, the critical values a and b satisfying Φ(a) = α and Φ(b) = 1 − β are calculated, where Φ is the standard normal distribution function.

S1022: the sample size N and the threshold T are calculated according to equation (28):

Specifically, the expectations and variances μ₀, σ₀, μ₁ and σ₁ have already been calculated in the above process; substituting them and the critical values a and b into formula (28) gives the sample size N and the threshold T.
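The following is a minimal sketch of sub-steps S1021 and S1022 in Python. Because the definition of R_i and formulas (24) to (28) are not reproduced in this text, it assumes (hypothetically) that R_i is the per-bit log-likelihood ratio ln(f₀/f₁) between the true-key and pseudo-key two-point distributions, that the sum of N values of R_i is approximately normal, and that a key is rejected when that sum falls below T. Under these assumptions, requiring Pr(sum < T | H₀) = α and Pr(sum ≥ T | H₁) = β gives T = Nμ₀ + a√N s₀ = Nμ₁ + b√N s₁, hence √N = (b s₁ − a s₀)/(μ₀ − μ₁), where s₀ and s₁ are standard deviations (square roots of the variances). The names `llr_moments` and `sample_size_and_threshold` are illustrative.

```python
import math
from statistics import NormalDist


def llr_moments(p0: float, chi0: float, gamma0: float) -> tuple[float, float]:
    """Mean and variance of R = ln(f0(X) / f1(X)) when Pr(X = 0) = p0.

    f0: true-key two-point law (Pr(0) = chi0); f1: pseudo-key law (Pr(0) = gamma0).
    """
    c0 = math.log(chi0 / gamma0)                 # value of R when X = 0
    c1 = math.log((1 - chi0) / (1 - gamma0))     # value of R when X = 1
    mean = p0 * c0 + (1 - p0) * c1
    var = p0 * (1 - p0) * (c0 - c1) ** 2
    return mean, var


def sample_size_and_threshold(alpha: float, beta: float, chi0: float,
                              gamma0: float = 0.5) -> tuple[float, float, int, float]:
    a = NormalDist().inv_cdf(alpha)              # Phi(a) = alpha
    b = NormalDist().inv_cdf(1 - beta)           # Phi(b) = 1 - beta
    mu0, var0 = llr_moments(chi0, chi0, gamma0)    # R_i under H0 (true key)
    mu1, var1 = llr_moments(gamma0, chi0, gamma0)  # R_i under H1 (pseudo key)
    s0, s1 = math.sqrt(var0), math.sqrt(var1)
    # T = N*mu0 + a*sqrt(N)*s0 = N*mu1 + b*sqrt(N)*s1, solved for sqrt(N)
    root_n = (b * s1 - a * s0) / (mu0 - mu1)
    N = math.ceil(root_n ** 2)
    T = N * mu0 + a * math.sqrt(N) * s0          # keeps Pr(sum < T | H0) = alpha
    return a, b, N, T


a, b, N, T = sample_size_and_threshold(alpha=0.01, beta=1e-3, chi0=0.2)
print(f"a = {a:.2f}, b = {b:.2f}, N = {N}, T = {T:.2f}")
```

With α = 0.01, β = 10⁻³ and an illustrative χ₀ = 0.2, this prints a = -2.33, b = 3.09 and N = 69. Rounding N up while anchoring T to the H₀ side keeps the first-class error at α and makes the second-class error slightly smaller than β.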

S103: traverse the check matrix space and, for each check matrix in turn, perform the following statistic calculation: extract a sequence from the stego image with the currently enumerated check matrix; starting from the j-th bit of the first byte of the sequence, sample at intervals of 7 bits to obtain a subsequence of length N; and calculate the value of the test statistic for this subsequence. Generally, j = 2;

Specifically, the consistent most advantageous test statistic is constructed according to formula (29), where the two counts appearing in it are the number of occurrences of 0 and the number of occurrences of 1, respectively, in the N-bit subsequence.
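A minimal Python sketch of the statistic follows. Since formula (29) is not reproduced here, it assumes (hypothetically) that the statistic is the log-likelihood ratio of the two two-point distributions, n₀ ln(χ₀/γ₀) + n₁ ln(χ₁/γ₁), where n₀ and n₁ are the counts of 0 and 1 in the first N bits of the j-th subsequence and bits within a byte are numbered MSB first. Under this assumption, larger values favour the true steganographic key; `ump_statistic` is an illustrative name.

```python
import math


def ump_statistic(extracted_bits: list[int], j: int, N: int,
                  chi0: float, gamma0: float = 0.5) -> float:
    """Log-likelihood-ratio statistic on the first N bits of subsequence j.

    extracted_bits: sequence extracted with one candidate check matrix.
    chi0 / gamma0: Pr(bit = 0) under the true-key / pseudo-key hypotheses.
    """
    sub = extracted_bits[j - 1::8][:N]       # bit j of each byte, at most N samples
    n0 = sub.count(0)
    n1 = len(sub) - n0
    return n0 * math.log(chi0 / gamma0) + n1 * math.log((1 - chi0) / (1 - gamma0))


print(ump_statistic([0, 1] * 64, j=2, N=10, chi0=0.2))
```

For a fixed N and χ₀ < γ₀ this statistic increases with n₁, so thresholding it is equivalent to thresholding the count of 1s, which is the monotone structure the Neyman-Pearson argument relies on.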

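S104: judge whether N > m holds; if so, go to step S107; otherwise, go to step S105;

S105: for each check matrix, judge whether its statistic reaches the threshold T; if so, store the check matrix in the candidate key set B; otherwise, discard it;

S106: after all check matrices have been judged: if |B| = 1, the check matrix in B is the true steganographic key and the extraction succeeds; if |B| = 0, the extraction fails; if |B| > 1, take B as the new check matrix space and go to step S107;

S107: store the check matrix for which the statistic reaches its maximum value in the candidate key set D. If |D| = 1, the check matrix in D is the true steganographic key and the extraction succeeds; if |D| > 1, the extraction fails.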

The secret information extraction method provided by the invention does not depend on the specific distortion function used by STC-based adaptive steganography and is applicable to any adaptive steganographic algorithm that uses STC. The spatial-domain steganographic algorithm HUGO "T. Pevný, T. Filler, P. Bas. “Using High-Dimensional Image Models to Perform Highly Undetectable Steganography,” In: Proceedings of the 12th International Workshop on Information Hiding (IH), Calgary, Canada, 2010, pp. 161-177" applied STC to adaptive steganography, which drew researchers' attention to STC and led to many improvements. The JPEG-domain steganographic algorithm J-UNIWARD "V. Holub, J. Fridrich. “Digital image steganography using universal distortion,” In: Proceedings of the 1st ACM Information Hiding and Multimedia Security Workshop (IH&MMSec), Montpellier, France, 2013, pp. 59-68" defines its distortion as the sum of the relative changes of the coefficients in a directional filter bank decomposition of the cover image. This directionality concentrates the embedding changes in regions that are difficult to model in multiple directions, giving strong resistance to detection.

In order to verify the effectiveness of the secret information extraction method provided by the invention, the invention also provides the following experimental data.

(I) Experimental objects and experimental environment

The HUGO and J-UNIWARD steganographic algorithms are selected as the experimental objects. The experimental environment is: Microsoft Windows 10 operating system, Intel i5 CPU, 8 GB of memory, and MATLAB as the programming language.

(II) Experimental setup

In the experiments, 80 spatial-domain cover images were randomly selected from the BOSSbase 1.01 database and converted into JPEG-domain cover images with Photoshop at a quality factor of 90. The resulting 160 cover images were divided into 8 groups of 20, denoted G₁, G₂, …, G₈, where G₁, G₂, G₃, G₄ are spatial-domain cover images and G₅, G₆, G₇, G₈ are JPEG-domain cover images. A total of 800 stego images were generated at embedding rates of 0.5, 0.4, 0.3, 0.2 and 0.1 bpp (bpnzac for JPEG images). The height of the sub-check matrix was 7 in all experiments.

The sub-check matrices used in the experiments (each column written as a decimal integer) are: [109,71], [109,79,83], [89,127,99,69], [95,75,121,71,109], [95,107,109,79,117,67,121,123,103,81].

(III) Experimental results

The experiments are organized as follows. Section (1) studies the probability distribution of each bit of the plaintext "I Have a Dream", and then the 0/1 probability distribution of the LSBs of the pixels of the selected spatial-domain cover images and the 0/1 frequency distribution of the LSBs of the embeddable DCT coefficients of the JPEG-domain cover images. Section (2) studies the size of the steganographic key space and the numerical characteristics of the statistic. Section (3) first verifies that 0 and 1 occur with approximately equal frequency in the sequence extracted by a pseudo steganographic key and then, based on section (2), computes the hypothesis test statistic to obtain the correct steganographic key.

(1) Probability distribution of plaintext bits and of pixel (DCT coefficient) LSBs

First, the probability distribution of each bit of the plaintext message "I Have a Dream" is verified. The result is shown in Fig. 2: for each subsequence, the right bar indicates the frequency of 1 and the left bar the frequency of 0. The frequency errors of bit 2 and bit 6 are the largest but remain within 0.1, while those of the other bits are within 0.06. The larger errors of bits 2 and 6 are due to the small sample size.

Second, the frequency distribution of the LSBs of the pixels of the selected spatial-domain cover images and that of the LSBs of the embeddable DCT coefficients of the JPEG-domain cover images are studied. The results are shown in Fig. 3, where the upper right part represents the JPEG image experiment sets and the lower left part the spatial-domain image experiment sets. As can be seen from Fig. 3, the frequency of 1 in the pixel LSBs of spatial-domain groups G₁, G₂, G₃, G₄ is stable at about 0.5, whereas the frequency of 1 in the LSBs of the embeddable DCT coefficients of JPEG-domain groups G₅, G₆, G₇, G₈ is stable at about 0.7.

(2) Steganographic key space size and numerical characteristics of the statistic

The size of the steganographic key space is calculated as follows. For a sub-check matrix of height h and width w, the number of all possible sub-check matrices is $2^{hw}$. A good sub-check matrix must have all ones in its first and last rows, and any two of its columns must be different. When the embedding rate α is the reciprocal of an integer, the check matrix is formed from a single sub-check matrix, and the size of the steganographic key space is

When the embedding rate α is not the reciprocal of an integer, the check matrix is formed jointly by two sub-check matrices, and the size of the steganographic key space is

Table 3 lists the size of the steganographic key space at different embedding rates; the height of the sub-check matrix in this experiment is 7. The steganographic key space is smallest, about 10³, at an embedding rate of 0.05 bpp, and largest, about 2.5×10¹⁰, at an embedding rate of 0.3 bpp. To recover the steganographic key in a reasonable time, the cardinality of the steganographic key space is taken as |k| = 10³, the type I error rate as α = 0.01 and the type II error rate as β = 1/|k| = 1/10³. In this case a = -2.33 and b = 3.90.

Table 3 Steganographic key space size at different embedding rates
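The counting rule above can be sketched in Python under one reading of the constraints: each column is an h-bit integer whose first and last bits are 1, all columns are distinct, and column order matters. This is an assumption, since the key-space formulas are not reproduced here. Under it, a single sub-check matrix of width w admits $2^{h-2}(2^{h-2}-1)\cdots(2^{h-2}-w+1)$ keys. With h = 7 and an assumed width of 2 at 0.5 bpp (bpnzac), this gives the 992 keys reported in section (3), and listing the keys lexicographically reproduces the abscissa numbering used there (1 → [65,67], 686 → [109,71], 992 → [127,125]). The helper names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import permutations


def candidate_columns(h: int) -> list[int]:
    """h-bit column values whose top and bottom bits are both 1, in increasing order."""
    if h < 2:
        raise ValueError("height must be at least 2")
    top = 1 << (h - 1)
    return [v for v in range(top, 1 << h) if v & 1]


def key_space_size(h: int, w: int) -> int:
    """Number of ordered choices of w distinct admissible columns."""
    return math.perm(2 ** (h - 2), w)


def candidate_keys(h: int, w: int) -> list[tuple[int, ...]]:
    """All keys in lexicographic order; only practical for small key spaces."""
    return list(permutations(candidate_columns(h), w))


if __name__ == "__main__":
    keys = candidate_keys(7, 2)
    print(len(keys), key_space_size(7, 2))   # 992 992
    print(keys[0], keys[685], keys[-1])      # (65, 67) (109, 71) (127, 125)
```

`itertools.permutations` yields tuples in lexicographic order when its input is sorted, which is why the list index (plus one) matches the abscissa numbering.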

Using formulas (24-27), the expectation and variance of R_i (i = 1, 2), the sample size and the threshold can be calculated. Table 4 lists the numerical characteristics of the statistic for each selected bit. For the most significant bit and the third bit, one of the two bit values occurs with probability 0, so formulas (24-27) are meaningless and these bits are not considered. As Table 4 shows, selecting the second bit requires a small sample size, because the probability distributions of the subsequences extracted with the true and the pseudo steganographic keys differ strongly for that bit. In general, the larger the difference between these two distributions, the smaller the required sample size.

Table 4 Numerical characteristics of the statistic at different bits
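The role of formulas (24-27) can be illustrated with the standard normal-approximation design of a likelihood-ratio test between two two-point distributions. The Python sketch below is not the patent's formulas, which are not reproduced here. It assumes that the statistic is the log-likelihood ratio summed over N bits, with a 0 bit contributing ln(χ₀/γ₀) and a 1 bit contributing ln(χ₁/γ₁), that H₀ is accepted when the statistic is at least T, and that a and b are given constants (defaulting to the a = -2.33 and b = 3.90 quoted above). The names `design_test` and `UmpDesign` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class UmpDesign:
    mu0: float        # mean of one per-bit term under H0 (true key)
    var0: float       # variance of one per-bit term under H0
    mu1: float        # mean of one per-bit term under H1 (pseudo key)
    var1: float       # variance of one per-bit term under H1
    n: int            # sample size N
    threshold: float  # threshold T


def _two_point_moments(p_zero: float, v_zero: float, v_one: float) -> tuple[float, float]:
    """Mean and variance of a variable equal to v_zero w.p. p_zero, else v_one."""
    mean = p_zero * v_zero + (1 - p_zero) * v_one
    var = p_zero * v_zero ** 2 + (1 - p_zero) * v_one ** 2 - mean ** 2
    return mean, var


def design_test(chi0: float, gamma0: float = 0.5,
                a: float = -2.33, b: float = 3.90) -> UmpDesign:
    """Sample size N and threshold T for R = sum of per-bit log-likelihood ratios.

    Under the normal approximation, T makes P(R < T | H0) about Phi(a) and
    P(R >= T | H1) at most about 1 - Phi(b), where a < 0 < b.
    """
    if not (0 < chi0 < 1 and 0 < gamma0 < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    if math.isclose(chi0, gamma0):
        raise ValueError("H0 and H1 coincide; no finite sample size exists")
    v_zero = math.log(chi0 / gamma0)              # contribution of a 0 bit
    v_one = math.log((1 - chi0) / (1 - gamma0))   # contribution of a 1 bit
    mu0, var0 = _two_point_moments(chi0, v_zero, v_one)
    mu1, var1 = _two_point_moments(gamma0, v_zero, v_one)
    # Solve N*mu0 + a*sqrt(N*var0) = N*mu1 + b*sqrt(N*var1) for sqrt(N).
    root_n = (b * math.sqrt(var1) - a * math.sqrt(var0)) / (mu0 - mu1)
    n = math.ceil(root_n ** 2)
    threshold = n * mu0 + a * math.sqrt(n * var0)
    return UmpDesign(mu0, var0, mu1, var1, n, threshold)
```

Because mu0 - mu1 is the sum of the two Kullback-Leibler divergences between the distributions, the sketch shows the same qualitative behaviour as Table 4: the further χ₀ is from γ₀, the smaller N. Any specific N or T it produces is illustrative only and is not taken from Table 4.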

From Table 1, the probability p that 0 occurs in the plaintext message is

Table 5 shows the numerical characteristics of the statistic when the plaintext is selected. The required sample size is then large, because the two probability distributions to be distinguished differ only slightly.

Table 5 Numerical characteristics of the statistic for the plaintext

(3) Steganographic key recovery results and method comparison

Six of the 80 cover images used in the experiment are selected at random to present the experimental results; these cover images are shown in Figs. 4 and 5. First, the frequency distribution of the subsequences extracted with pseudo steganographic keys is examined. Taking the second bit as an example, all pseudo steganographic keys are exhausted, and the sequence extracted with each pseudo steganographic key is sampled at intervals of 7 bits, starting from the second bit of the first byte, to obtain a subsequence. At an embedding rate of 0.5 bpp (bpnzac), the height of the sub-check matrix is 7, the steganographic key space size is 992 and the pseudo steganographic key space size is 991. Fig. 6 shows the frequency of 1 in the subsequences extracted with the pseudo steganographic keys: the abscissa indexes the pseudo steganographic keys and the ordinate gives the frequency of 1 in the corresponding subsequence. Abscissa values 1, 2, 3, …, 991 represent the steganographic keys [65,67], [65,69], [65,71], …, [127,125], respectively. As Fig. 6 shows, at an embedding rate of 0.5 bpp (bpnzac), the frequency of 1 in the subsequences extracted with pseudo steganographic keys stays at approximately 0.5; the results for other embedding rates and other stego images are similar. This confirms the conclusion above: the frequency distributions of the subsequences extracted with the true and the pseudo steganographic keys are different.
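The sampling step can be sketched in Python as follows. It assumes that "at intervals of 7 bits" means 7 bits are skipped between consecutive samples (a stride of 8), so the subsequence for bit j collects bit j of every extracted byte, consistent with the per-bit analysis of the plaintext. The uniformly random bits in the demo only stand in for a sequence extracted with a pseudo steganographic key; the actual extraction with a candidate check matrix is not shown, and the function names are hypothetical.

```python
from __future__ import annotations

import random


def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]


def extract_subsequence(bits: list[int], j: int, n0: int | None = None) -> list[int]:
    """Bit j (1..8) of every byte: start at bit j of the first byte, skip 7 bits each time."""
    if not 1 <= j <= 8:
        raise ValueError("j must be between 1 and 8")
    sub = bits[j - 1::8]
    return sub if n0 is None else sub[:n0]


if __name__ == "__main__":
    # Stand-in for a pseudo-key extraction: uniformly random bits.
    rng = random.Random(1)
    fake = [rng.randint(0, 1) for _ in range(8000)]
    l2 = extract_subsequence(fake, 2)
    print(len(l2), sum(l2) / len(l2))  # frequency of 1, expected to be near 0.5
```

The optional `n0` argument truncates the subsequence to the n₀ bits used by the test.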

Based on the experimental result of Fig. 6, the steganographic key is recovered with the secret information extraction method provided by the invention. Denote the i-th of the n₀ bits of the sequence l₂ as

When the steganographic key is the true key, this bit obeys a two-point distribution with the following probability mass function:

From the above experiments, the probability mass function of the bits of the subsequence extracted with a pseudo steganographic key is:

where γ₀, γ₁ ≈ 0.5.

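The hypothesis testing problem is $H_0: D = D_0, \quad H_1: D = D_1$, where D₀ and D₁ denote the bit distributions under the true and a pseudo steganographic key, respectively.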

The test statistic is constructed as in equation (29), where the count it uses is the number of occurrences of 0 in the N-bit sequence. This statistic is the uniformly most powerful test statistic, i.e., among tests with the same type I error rate it has the smallest type II error rate.

When the statistic reaches the threshold T, hypothesis H₀ is accepted and the candidate check matrix H_i is stored in the candidate key set as a possible true steganographic key; otherwise, hypothesis H₁ is accepted and H_i is discarded.

Fig. 7 shows, for spatial-domain and JPEG-domain stego images, the values of the statistic for the sequences extracted with each steganographic key: the ordinate is the value of the statistic and the abscissa is the steganographic key. At an embedding rate of 0.5 bpp (bpnzac), the steganographic key space size is 992, and abscissa values 1, 2, 3, …, 992 represent the steganographic keys [65,67], [65,69], [65,71], …, [127,125], respectively. From Table 4, the threshold is then T = 8.17 and the sample size is N = 44. All possible steganographic keys are exhausted and the statistic is calculated for each of them. The results in Fig. 8 show that only at abscissa 686, whose steganographic key is [109,71], does the statistic reach the threshold, so the null hypothesis is accepted and [109,71] can be taken as the correct steganographic key.

At embedding rates of 0.5 bpp (bpnzac), 0.4 bpp (bpnzac), 0.3 bpp (bpnzac), 0.2 bpp (bpnzac) and 0.1 bpp (bpnzac), for both spatial-domain and JPEG-domain stego images, either only the statistic corresponding to the correct steganographic key reaches the threshold, or the statistic corresponding to the correct steganographic key is the maximum. At most one steganographic key is therefore identified, and the accuracy reaches 100%.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A secret information extraction method based on a uniformly most powerful test, characterized by comprising the following steps:

Step 2: given a type I error rate α and a type II error rate β, calculate the sample size N and the threshold T, where the type I error rate α is the probability of judging the true steganographic key to be a pseudo steganographic key, and the type II error rate β is the probability of judging a pseudo steganographic key to be the true steganographic key; step 2 specifically comprises:

H₀ and H₁ denote the hypothesis testing problem $H_0: F = F_0, \quad H_1: F = F_1$, where F is the population distribution function of the sample, F₀ is the distribution function for the true steganographic key and F₁ is the distribution function for a pseudo steganographic key. Let l be the sequence extracted with the steganographic key; starting from the j-th bit (j = 1, 2, …, 8) of the first byte, one bit is sampled every 7 bits, n₀ bits in total, giving a subsequence. The i-th bit of this subsequence has a two-point probability mass function with parameters χ₀, χ₁ ≠ 0.5, and the i-th bit of the corresponding subsequence of the sequence extracted with a pseudo steganographic key has a two-point probability mass function with parameters γ₀, γ₁ ≈ 0.5; i = 1, 2, …, n₀;

Step 3: construct the uniformly most powerful test statistic according to equation (29) and calculate its value, where the counts involved are the number of occurrences of 0 and the number of occurrences of 1 in the subsequence;

step 5: judging the corresponding check matrix of each

step 6: when judging that all check matrix corresponds to

Step 7: store the check matrix that maximizes the statistic in the candidate key set D; if |D| = 1, the check matrix in D is the true steganographic key and the extraction succeeds; if |D| > 1, the extraction fails.