CN114051082A

CN114051082A - Steganography detection feature selection method and device based on distortion degree and information gain ratio

Info

Publication number: CN114051082A
Application number: CN202111213537.2A
Authority: CN
Inventors: 马媛媛; 王艺皓; 许力戈; 靳瑞霞; 李淳
Original assignee: Henan Normal University
Current assignee: Henan Normal University
Priority date: 2021-10-19
Filing date: 2021-10-19
Publication date: 2022-02-15
Anticipated expiration: 2041-10-19
Also published as: CN114051082B

Abstract

The invention belongs to the technical field of steganography detection feature selection, and particularly relates to a steganography detection feature selection method and device based on distortion degree and information gain ratio. The method comprises: first, measuring the difference of each steganography detection feature component between the carrier image and the secret-carrying image by using the distortion degree and the information gain ratio; then, arranging the feature components in descending order of distortion value and, separately, in descending order of information gain ratio; next, deleting the feature components whose positions in the two rankings differ greatly; and finally, using the retained feature components as the finally selected steganography detection feature for training and detection. The method can effectively reduce the DCTR feature dimension while maintaining or even improving the detection accuracy for secret-carrying images, thereby reducing the space-time complexity of detecting secret-carrying images.

Description

Steganography detection feature selection method and device based on distortion degree and information gain ratio

Technical Field

The invention belongs to the technical field of selection of steganography detection features, and particularly relates to a method and a device for selecting steganography detection features based on distortion degree and information gain ratio.

Background

Steganography, a form of covert communication, hides a message in an object that does not arouse suspicion and then sends it to the intended recipient; it has recently received widespread attention in the field of information security. However, digital media steganography may be used by illegal organizations and the like to conceal their communications and engage in activities that jeopardize national security. The corresponding attack technology, steganalysis, aims to extract the hidden message in order to protect national security against steganography.

With the rapid development of digital media, improving the speed and accuracy of steganography detection has become an urgent problem. Steganography detection for adaptive steganography in digital images is therefore a direction that currently attracts much attention from researchers. Such detection algorithms mainly extract steganography detection features and use an ensemble classifier for training and detection, which yields good detection results. Researchers have developed a series of high-dimensional steganography detection algorithms. Although high-dimensional steganography detection features achieve higher detection accuracy against adaptive image steganography, the dimension of the extracted features is high, which brings higher space-time complexity to the detection of secret-carrying images and hinders the development of fast steganography detection. Therefore, how to select the features that contribute most to detection, so as to reduce the dimension of the steganography detection features and thereby the space-time complexity of detecting secret-carrying images, is a central research question in the current steganography detection field.

Researchers have carried out a series of studies on the selection and dimension reduction of steganography detection features. According to the objects to which they apply, these methods can be classified into general and specific feature selection methods. A general steganography detection feature selection method is suitable for various detection features: it measures the contribution of each feature component to the detection of secret-carrying images and selects the components that contribute most as the feature vector for training and testing. A specific steganography detection feature selection method targets one particular steganography detection feature. Although a specific method is computationally simpler than a general one, its range of application is narrower.

To date, several studies have achieved feature selection for different steganalysis features, such as the CC-PEV, GFR, CC-JRM, SRM, and J+SRM features. However, existing methods perform unsatisfactorily when selecting DCTR features: the selected features remain too high-dimensional, or the detection accuracy drops too much.

Disclosure of Invention

In order to greatly reduce the dimension of the DCTR (discrete cosine transform residual) feature without affecting the detection accuracy, the invention provides a steganography detection feature selection method (abbreviated as the S-FUND method) and device based on distortion degree and information gain ratio. The method reduces the feature dimension while improving the detection accuracy for secret-carrying images, thereby reducing the space complexity of detecting secret-carrying images; it also avoids dependence on classification results, thereby reducing the time complexity of detecting secret-carrying images.

In order to achieve the purpose, the invention adopts the following technical scheme:

The invention provides a steganography detection feature selection method based on distortion degree and information gain ratio, which comprises the following steps:

measuring the difference of each steganography detection feature component between the carrier image and the secret-carrying image by using the distortion degree and the information gain ratio;

arranging the steganography detection feature components in descending order of distortion value and, separately, in descending order of information gain ratio;

deleting the steganography detection feature components whose positions in the two rankings differ greatly;

and using the retained steganography detection feature components as the finally selected steganography detection feature for training and detection.

Further, the difference of each steganography detection feature component between the carrier image and the secret-carrying image is measured by using the distortion degree. Embedding information distorts the carrier image to varying degrees, and the distortion degree K_i of the carrier image before and after steganography is measured by formula (1). In formula (1), f_i^C and f_i^S respectively denote the values of the i-th steganography detection feature component in the carrier image and the secret-carrying image, and the two per-image symbols respectively denote the value of the i-th steganography detection feature component in the j-th carrier image and in the j-th secret-carrying image. The larger the K_i value, the greater the distortion produced after information is embedded in the carrier image, which indicates that the feature component differs more between the carrier image and the secret-carrying image; such a feature component is more helpful for detecting secret-carrying images and should be retained.

Further, the information gain ratio is defined as the ratio of the information gain value of the feature component between the carrier image and the secret-carrying image to the partial entropy of the feature component in the carrier image with respect to the feature component in the secret-carrying image. The information gain ratio g_R(f_i^S, f_i^C) is measured by formula (2), in which g(f_i^S, f_i^C) denotes the information gain value of the feature component between the carrier image and the secret-carrying image, and the denominator is the partial entropy of the value of the feature component in the carrier image with respect to its value in the secret-carrying image. The larger the g_R(f_i^S, f_i^C) value, the greater the difference of the steganography detection feature component between the carrier image and the secret-carrying image; such a feature component is more helpful for detecting secret-carrying images and should be retained.

Further, the information gain value of the feature component between the carrier image and the secret-carrying image is computed as follows:

g(f_i^S, f_i^C) = H(f_i^S) - H(f_i^S | f_i^C)

H(f_i^S | f_i^C) = H(f_i^S, f_i^C) - H(f_i^C)

where H(f_i^C) and H(f_i^S) respectively denote the information entropy of the feature component in the carrier image and in the secret-carrying image, H(f_i^S | f_i^C) denotes the conditional entropy of the value of the feature component in the secret-carrying image given its value in the carrier image, and H(f_i^S, f_i^C) denotes the joint entropy of the feature component between the carrier image and the secret-carrying image.
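As an illustration of these quantities, the following Python sketch estimates the information gain and an information gain ratio for one feature component from paired samples. It is a minimal sketch under stated assumptions, not the patented implementation: the feature values are assumed to be already quantized into discrete bins, the conditional entropy is computed directly from its definition, and the denominator of the ratio is assumed to be the entropy of the carrier-side values (a split-information style term), because the exact partial-entropy expression of formula (2) is not reproduced in this text. All function names are hypothetical.

```python
import numpy as np


def entropy(values):
    """Shannon entropy in bits of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def conditional_entropy(s, c):
    """H(S | C) = sum over values v of C of P(C = v) * H(S | C = v)."""
    s, c = np.asarray(s), np.asarray(c)
    return float(sum((c == v).mean() * entropy(s[c == v]) for v in np.unique(c)))


def information_gain(s, c):
    """g(S, C) = H(S) - H(S | C)."""
    return entropy(s) - conditional_entropy(s, c)


def gain_ratio(s, c):
    """Information gain divided by an ASSUMED partial entropy, here H(C)."""
    split = entropy(c)
    return information_gain(s, c) / split if split > 0 else 0.0


# Toy data only: quantized values of one feature component in four carrier
# images (c) and in the corresponding secret-carrying images (s).
c = np.array([0, 0, 1, 1])
s = np.array([0, 0, 1, 1])
print(information_gain(s, c), gain_ratio(s, c))
```

Real DCTR feature values are continuous, so a binning step (for example with `np.digitize`) would be needed before calling these functions; the choice of bins is left open here.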

Further, arranging the distortion value and the information gain ratio of each steganography detection feature component in descending order comprises:

arranging the steganography detection feature components in descending order of K_i value;

then arranging the steganography detection feature components in descending order of g_R(f_i^S, f_i^C) value.

Further, deleting the steganography detection feature components whose positions in the two rankings differ greatly comprises:

calculating, from the two ranking results, the difference between the positions of the i-th feature component under the two criteria;

deleting the feature components for which the absolute value of this position difference is larger than the threshold T.
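The ranking and deletion steps can be sketched in Python as follows. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, the scores are toy values, and the threshold is expressed directly as a rank-position difference.

```python
import numpy as np


def descending_ranks(scores):
    """Position of each component when sorted by score, 0 = largest score."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks


def select_by_rank_agreement(distortion, gain_ratio, threshold):
    """Indices of components whose two rank positions differ by at most threshold."""
    diff = np.abs(descending_ranks(distortion) - descending_ranks(gain_ratio))
    return np.flatnonzero(diff <= threshold)


# Toy scores for four feature components.
distortion = [0.9, 0.1, 0.5, 0.7]   # ranks: 0, 3, 2, 1
gain_ratio = [0.8, 0.6, 0.4, 0.2]   # ranks: 0, 1, 2, 3
kept = select_by_rank_agreement(distortion, gain_ratio, threshold=1)
print(kept.tolist())  # components 1 and 3 differ by 2 positions and are deleted
```

Components are kept when the absolute difference is at most the threshold, which mirrors the rule of deleting those whose difference is larger than T.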

The invention also provides a steganography detection feature selection device based on distortion degree and information gain ratio, which comprises:

a measurement module, used for measuring the difference of each steganography detection feature component between the carrier image and the secret-carrying image by using the distortion degree and the information gain ratio;

a descending-order arrangement module, used for arranging the steganography detection feature components in descending order of distortion value and, separately, in descending order of information gain ratio;

a deletion module, used for deleting the steganography detection feature components whose positions in the two rankings differ greatly;

and a training module, used for training and detection with the retained steganography detection feature components as the finally selected steganography detection feature.

Compared with the prior art, the invention has the following advantages:

First, the difference of each steganography detection feature component between the carrier image and the secret-carrying image is measured by using the distortion degree and the information gain ratio; then, the feature components are arranged in descending order of distortion value and, separately, in descending order of information gain ratio; next, the feature components whose positions in the two rankings differ greatly are deleted; finally, the retained feature components are used as the finally selected steganography detection feature for training and detection. The method can effectively reduce the DCTR feature dimension while maintaining or even improving the detection accuracy for secret-carrying images, thereby reducing the space complexity of detecting secret-carrying images. It also greatly improves operating efficiency, thereby reducing the time complexity of the classifier in detecting secret-carrying images and lowering the detection cost.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a steganography detection feature selection method based on distortion factor and information gain ratio according to an embodiment of the present invention;

FIG. 2 is a process diagram of selecting steganographic detection features by the steganographic detection feature selection method based on distortion and information gain ratio according to the embodiment of the present invention;

FIG. 3 is a comparison of the DCTR feature before and after selection by the S-FUND method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The greater the difference of a steganography detection feature component between the carrier image and the secret-carrying image, the more it helps to distinguish the two, and the more it helps to detect secret-carrying images. Conversely, a feature component that differs little or not at all between the carrier image and the secret-carrying image contributes too little to detection and is regarded as a useless feature. Such features increase the feature dimension, bring unnecessary space-time overhead to detection, and hinder the application and development of steganography detection. Therefore, the feature components that contribute most to distinguishing carrier images from secret-carrying images should be selected as far as possible. To measure the difference of a steganography detection feature component between the carrier image and the secret-carrying image, the distortion degree and the information gain ratio are introduced.

As shown in fig. 1 and fig. 2, the present embodiment provides a steganography detection feature selection method based on distortion and information gain ratio, including the following steps:

Step S11: the difference of each steganography detection feature component between the carrier image and the secret-carrying image is measured by using the distortion degree and the information gain ratio. The two criteria are treated as equally important, which increases the reliability of the selected features.

Distortion-based metric: after secret information is embedded by a steganographic algorithm, some feature components of the carrier image change, so the carrier image and the secret-carrying image differ. Because the feature components do not all change to the same degree, the components that differ more between the carrier image and the secret-carrying image are more helpful for distinguishing the two. Embedding information distorts the carrier image to varying degrees, and the distortion degree K_i of the carrier image before and after steganography is measured by formula (1). In formula (1), f_i^C and f_i^S respectively denote the values of the i-th steganography detection feature component in the carrier image and the secret-carrying image, and the two per-image symbols respectively denote the value of the i-th steganography detection feature component in the j-th carrier image and in the j-th secret-carrying image. The larger the K_i value, the greater the distortion produced after information is embedded in the carrier image, which indicates that the feature component differs more between the carrier image and the secret-carrying image; such a feature component is more helpful for detecting secret-carrying images and should be retained.

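Metric based on information gain ratio:

Currently, information gain is used to measure the difference between a carrier image and a secret-carrying image, and the formula is as follows:

g(f_i^S, f_i^C) = H(f_i^S) - H(f_i^S | f_i^C)

H(f_i^S | f_i^C) = H(f_i^S, f_i^C) - H(f_i^C)

where g(f_i^S, f_i^C) denotes the information gain value of the feature component between the carrier image and the secret-carrying image, H(f_i^C) and H(f_i^S) respectively denote the information entropy of the feature component in the carrier image and in the secret-carrying image, H(f_i^S | f_i^C) denotes the conditional entropy of the value of the feature component in the secret-carrying image given its value in the carrier image, and H(f_i^S, f_i^C) denotes their joint entropy. The larger the g(f_i^S, f_i^C) value, the greater the information gain of the steganography detection feature component between the secret-carrying image and the carrier image, which indicates a greater difference between the two and is therefore more helpful for detecting secret-carrying images.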

However, when a feature takes many distinct values, partitioning by that feature more easily yields purer subsets, that is, a lower H(f_i^S | f_i^C) value; since H(f_i^S) is fixed, the information gain is larger. Using information gain as the basis for feature selection therefore preferentially selects features that take many values. To solve this problem, the information gain ratio is used to measure the difference of the steganography detection feature component between the carrier image and the secret-carrying image. It is defined as the ratio of the information gain value of the feature component between the carrier image and the secret-carrying image to the partial entropy of the feature component in the carrier image with respect to the feature component in the secret-carrying image, and the information gain ratio g_R(f_i^S, f_i^C) is measured by formula (2).

Step S12: the steganography detection feature components are arranged in descending order of distortion value and, separately, in descending order of information gain ratio.

Specifically, the steganography detection feature components are first arranged in descending order of K_i value, and then in descending order of g_R(f_i^S, f_i^C) value.

Step S13: the steganography detection feature components whose positions in the two rankings differ greatly are deleted.

Specifically, the difference between the positions of the i-th feature component under the two criteria is calculated from the two ranking results, and the feature components for which the absolute value of this position difference is larger than the threshold T are deleted.

Step S14: the retained steganography detection feature components are used as the finally selected steganography detection feature for training and detection.

Thus, the distortion degree and the information gain ratio are used herein to measure the difference of each steganography detection feature component between the carrier image and the secret-carrying image. The two criteria are treated as equally important for measuring this difference, and the feature components whose rankings by information gain ratio and by distortion degree differ little are selected. The method greatly reduces the feature dimension, thereby reducing the space-time complexity of detecting secret-carrying images.

To better understand the performance of the S-FUND method proposed herein, the time complexity of its main steps is analyzed one by one and compared with that of selection methods that depend on the classification result of the Fisher linear discriminant (FLD) ensemble classifier.

The S-FUND method mainly comprises the following steps: calculating the distortion value and the information gain ratio, sorting the feature components in descending order of distortion value and of information gain ratio, and deleting the feature components whose ranking positions differ greatly. The time complexity of each step is analyzed in Table 1.

TABLE 1 time complexity analysis of the major steps

The steps in Table 1 are not nested, so the time complexity of the S-FUND method proposed herein equals the maximum time complexity over all steps. When O(N log₂ N) ≤ O(nN), i.e. log₂ N ≤ n, the time complexity of the S-FUND method is O(nN); when log₂ N > n, it is O(N log₂ N). However, most existing feature selection methods rely on the classification result of the Fisher linear discriminant (FLD) ensemble classifier, whose time complexity O(FLD) is determined by L, the number of individual learners, N^trn, the number of training samples of each class, and d_sub, the subspace dimension.

The time complexity O(FLD_depend) of this type of selection method is necessarily greater than or equal to O(FLD). Therefore, the time complexity of selection methods that depend on the FLD ensemble classifier result is much greater than O(nN) or O(N log₂ N). Since the feature dimension of DCTR is 8000, n < N and log₂ N < N. The time complexity of the S-FUND method is lower than that of the PCA-D, Steganalysis-α, Fisher-G, and SRGS methods, and similar to that of the CGSM method. Therefore, the S-FUND method greatly reduces the running time complexity and improves the efficiency of detecting secret-carrying images.

To test the performance of the S-FUND method presented herein, a series of selection and comparison experiments were performed using the 8,000-dimensional DCTR steganography detection feature. All experiments use grayscale images from the BOSSbase1.01 image library in JPEG format and were run in MATLAB R2018a on a computer with an Intel i7-8550U CPU and 8 GB of RAM, so that different methods can be compared fairly. The graphs of the experimental results were produced in OriginPro 8.5.

1. Experimental setup

Computer software, hardware, an image library and steganography detection characteristics used in all experiments are the same, so that different methods can be compared fairly, and the experiments are more reliable.

Break Our Steganographic System (BOSS) is the first image library to bring image steganography and steganalysis from theory to practical application. In preparation for the following experiments, a series of operations were performed on the BOSSbase1.01 image library (obtained from the website http://dde.binghamton.edu/download/), as follows:

(1) 10,000 images in PGM format in the BOSSbase1.01 image library were converted to JPEG images with compression quality factor of 95.

(2) The SI-UNIWARD steganography algorithm [6] is used to generate 10,000 × 5 = 50,000 secret-carrying images from the 10,000 JPEG carrier images, with payloads (embedding rates) of 0.1, 0.2, 0.3, 0.4, and 0.5 bpAC, respectively.

(3) The DCTR extraction algorithm is used to extract 8,000-dimensional steganography detection features from the carrier images and the secret-carrying images, giving a steganography detection feature set for 10,000 × (1 + 5) = 60,000 images. The specific experimental settings are shown in Table 2.

Table 2 Experimental setup

A Fisher linear discriminant (FLD) ensemble classifier, which is widely used for training and detection in steganalysis feature selection, is adopted to train and test on the carrier image features and the selected steganography detection features. First, one half of the carrier image features and the corresponding secret-carrying image features at each embedding rate are randomly selected from each feature set as the training set; then, the remaining carrier image features and the corresponding secret-carrying image features are used as the test set. The detection error rate of the ensemble classifier is computed from the following quantities: P_FA denotes the false alarm rate, P_MD denotes the missed detection rate, and N_TS denotes the number of test sets; because the test set contains a carrier image set and a secret-carrying image set, N_TS = 2. The error rate is the proportion of classification errors among all tested features, and the lower the detection error rate, the better the selected features detect secret-carrying images.

To present the results of the comparison experiments more intuitively, the detection error rate obtained by the classifier is converted into an average detection accuracy. The larger the average detection accuracy, the better the selected features detect secret-carrying images.
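Since the error-rate and accuracy formulas are not reproduced in this text, the following Python sketch shows one plausible reading of the conversion. It assumes the error rate is the minimum of (P_FA + P_MD) / N_TS over the classifier's operating points, which is the convention commonly used with FLD ensemble classifiers, and that the accuracy is one minus that error rate. Both assumptions, the function names, and the numbers are illustrative only.

```python
import numpy as np


def min_average_error(p_fa, p_md, n_ts=2):
    """Smallest (P_FA + P_MD) / N_TS over the given operating points (assumed form)."""
    p_fa = np.asarray(p_fa, dtype=float)
    p_md = np.asarray(p_md, dtype=float)
    return float(((p_fa + p_md) / n_ts).min())


def detection_accuracy(error_rate):
    """Assumed conversion from detection error rate to detection accuracy."""
    return 1.0 - error_rate


# Hypothetical false-alarm / missed-detection pairs at three decision thresholds.
p_e = min_average_error([0.1, 0.2, 0.4], [0.5, 0.3, 0.2])
print(p_e, detection_accuracy(p_e))
```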

2. Selection experiment

The 8,000-dimensional DCTR steganography detection feature is used herein. This feature consists of first-order statistics of the quantized noise residuals obtained from the decompressed JPEG image using 64 discrete cosine transform kernels. Compared with other rich models, the DCTR feature has lower dimensionality, lower computational complexity, and higher detection performance.

In the S-FUND method, the steganography detection feature components whose ranking difference under the two criteria is larger than the threshold T are deleted; the threshold T is therefore analyzed so that the S-FUND method achieves a better selection effect. First, to reduce the feature dimension effectively, the starting point is a ranking difference larger than 15% of the original feature dimension: for the 8,000-dimensional DCTR feature, the feature components whose ranking difference is larger than 8,000 × 15% = 1,200 are deleted. Then the value of the threshold T is gradually reduced, with T ∈ {0.15, 0.14, …, 0.02, 0.01}. Experiments are carried out for each value, the selected feature dimension and the detection accuracy of each group are compared, and the experimental results are shown in Table 3:

table 3 comparative experiment results before and after DCTR characteristic selection based on S-FUND method

In Table 3, Dim denotes the feature dimension and is reported alongside the detection accuracy. As can be seen from Table 3, the S-FUND method can greatly reduce the feature dimension while maintaining or even improving the detection accuracy at different embedding rates. For example, when Payload = 0.1, the detection accuracy for secret-carrying images of the features selected by the S-FUND method reaches 0.5270, which is 0.31% higher than the original detection accuracy, and the feature dimension after selection is 3462% lower than the original feature dimension; when T = 0.04, the features selected by the S-FUND method have only 30.44% of the original dimension while the detection accuracy for secret-carrying images is maintained. When Payload = 0.2 and 0.3, the features selected by the S-FUND method reduce the DCTR feature dimension to different degrees, and the detection accuracy is improved by 0.49% and 0.16%, respectively, over the original detection accuracy; moreover, the selected features account for only 29.79% and 51.79% of the original feature dimension while the detection accuracy for secret-carrying images is maintained, which reduces the space-time overhead of classifier training.
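The threshold schedule used in these experiments can be made concrete with a short Python sketch. It assumes, as the 8,000 × 15% = 1,200 example suggests, that each T is a fraction of the original feature dimension and that the resulting product is the rank-difference cutoff; the function name is hypothetical.

```python
def rank_thresholds(dim=8000, max_percent=15):
    """(T, rank-difference cutoff) pairs for T = 0.15, 0.14, ..., 0.01."""
    return [(pct / 100, dim * pct // 100) for pct in range(max_percent, 0, -1)]


schedule = rank_thresholds()
print(schedule[0])   # first, loosest threshold
print(schedule[-1])  # last, strictest threshold
```

Under this reading, T = 0.04 corresponds to deleting the components whose two rank positions differ by more than 320.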

In order to compare the selection of the DCTR steganography detection features by the S-FUND method more intuitively, the feature dimensions and the detection precision before and after the selection are shown in FIG. 3.

In FIG. 3, the horizontal axis represents the threshold, the vertical axis represents the corresponding feature dimension and detection accuracy, and the five lines from top to bottom represent the DCTR features selected at the five embedding rates; the best-performing point at each embedding rate is marked and labeled with its value. The S-FUND method provided by the invention maintains or even improves the detection accuracy of DCTR features for secret-carrying images while greatly reducing the feature dimension, which demonstrates the effectiveness of the S-FUND method.

Extensive experiments show that the DCTR features selected by the S-FUND method have a greatly reduced feature dimension while the detection accuracy for secret-carrying images is maintained or even improved. Compared with the Random-D, CGSM, and PCA-D methods, the features selected by the S-FUND method achieve higher detection accuracy for secret-carrying images. For example, in the comparison with the Random-D method, when Payload = 0.5, the detection accuracy of the features selected by the S-FUND method is up to 1.81% higher; in the comparison with the CGSM method, it is 2.25% higher; and in the comparison with the PCA-D method, it is 4.25% higher.

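The embodiment further provides a steganography detection feature selection device based on distortion degree and information gain ratio, which includes a measurement module, a descending-order arrangement module, a deletion module, and a training module, as set out in the disclosure above.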

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A steganography detection feature selection method based on distortion degree and information gain ratio is characterized by comprising the following steps:

2. The method of claim 1, wherein the difference of each steganography detection feature component between the carrier image and the secret-carrying image is measured by using the distortion degree; the carrier image is distorted to varying degrees by the embedding of information, and the distortion degree K_i of the carrier image before and after steganography is measured by formula (1),

where f_i^C and f_i^S respectively denote the values of the i-th steganography detection feature component in the carrier image and the secret-carrying image,

and

3. The steganography detection feature selection method based on distortion degree and information gain ratio as claimed in claim 2, wherein the information gain ratio is defined as the ratio of the information gain value of the feature component between the carrier image and the secret-carrying image to the partial entropy of the feature component in the carrier image with respect to the feature component in the secret-carrying image, and the information gain ratio g_R(f_i^S, f_i^C) is measured by formula (2),

where g(f_i^S, f_i^C) denotes the information gain value of the feature component between the carrier image and the secret-carrying image, and the denominator of formula (2) is the partial entropy of the value of the feature component in the carrier image with respect to its value in the secret-carrying image; the larger the g_R(f_i^S, f_i^C) value, the greater the difference of the steganography detection feature component between the carrier image and the secret-carrying image, the more helpful the feature component is for detecting secret-carrying images, and the more it should be retained.

4. The steganography detection feature selection method based on distortion degree and information gain ratio as claimed in claim 3, wherein the formula of the information gain value of the feature component between the carrier image and the secret-carrying image is as follows:

g(f_i^S, f_i^C) = H(f_i^S) - H(f_i^S | f_i^C) (4)

H(f_i^S | f_i^C) = H(f_i^S, f_i^C) - H(f_i^C) (5)

where H(f_i^C) and H(f_i^S) respectively denote the information entropy of the feature component in the carrier image and in the secret-carrying image,

H(f_i^S | f_i^C) denotes the conditional entropy of the value of the feature component in the secret-carrying image given its value in the carrier image, and H(f_i^S, f_i^C) denotes the joint entropy of the feature component between the carrier image and the secret-carrying image.

5. The method for selecting steganography detection features based on distortion degree and information gain ratio as claimed in claim 4, wherein the step of arranging the distortion degree value and the information gain ratio value of each steganography detection feature component in descending order comprises the steps of:

then arranging the steganography detection feature components in descending order of g_R(f_i^S, f_i^C) value.

6. The method for selecting steganography detection features based on distortion degree and information gain ratio as claimed in claim 5, wherein deleting the steganography detection feature components whose positions in the two rankings differ greatly comprises:

7. A steganography detection feature selection device based on distortion factor and information gain ratio is characterized by comprising: