CN108985391A

CN108985391A - Behavior-based steganographer detection method

Info

Publication number: CN108985391A
Application number: CN201810996553.5A
Authority: CN
Inventors: 张卫明; 俞能海; 李莉; 姚远志
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2018-12-11

Abstract

The invention discloses a behavior-based steganographer detection method, comprising: selecting a certain number of users from a social platform and crawling N consecutive images from each user, using the images of one part of the users as training data and the rest as test data; randomly selecting images from the training data to simulate steganographer behavior and generate steganographer data; extracting behavioral features from the training data and from the steganographer data, and training a binary classifier on the extracted features; testing the binary classifier on the test data and using the tested classifier to examine newly input images, thereby determining whether the user who sent them is a normal user or a steganographer. With this method, steganographers can be detected accurately.

Description

Behavior-based steganographer detection method

Technical field

The present invention relates to the fields of social network security and steganalysis, and in particular to a behavior-based steganographer detection method.

Background art

The purpose of steganalysis is to detect whether an image has been modified by steganography. Steganalysis of a single image is usually treated as a binary classification problem that distinguishes cover images from stego images, and one of its key issues is designing effective features that reflect how message embedding affects the statistical properties of the cover. The rich model steganalysis features and the selection-channel attack model proposed by Fridrich et al. greatly improved the performance of single-image steganalysis. In recent years, with the development of deep learning, CNN, RNN, ResNet and GAN have also been increasingly applied to steganalysis.

Although steganalysis keeps advancing, current research is all based on laboratory conditions: the images are generally natural images, and the embedding rate and embedding algorithm are required to match when the classifier is trained. In practice this requirement usually cannot be met. First, the embedding rate and embedding algorithm of an image are unknown. In addition, on a real social platform the content and the noise sources of the images users send are diverse, so this kind of supervised learning faces various mismatch problems. Even steganalysis features with tens of thousands of dimensions are hard to use effectively in real scenarios. To address this situation, Ker proposed the concept of steganographer detection, in which detection is performed per user who sends images rather than per single image. Steganographer detection generally uses unsupervised learning. Ker first proposed detecting steganographers by clustering, and in 2014 applied the local outlier factor (LOF) from anomaly detection to steganographer detection. In 2016, Li et al. proposed methods based on hierarchical clustering and clustering ensembles. Zheng et al. attempted to extract steganalysis features with deep neural networks for steganographer detection. Although these methods avoid the mismatch problem of supervised learning, the features they use are all low-dimensional traditional steganalysis features, and in essence they still decide by whether a steganographic modification has been made, so their performance varies with the data.

Fig. 1 shows the experimental results of the LOF method proposed by Ker on BossBase data and on Twitter data. The abscissa is the embedding rate, and the ordinate is the average rank of the steganographer's LOF value among 100 steganographers; the nearer the rank is to the top, the better the detection. As the figure shows, the results on BossBase and on Twitter differ widely, so the method is strongly affected by the image source. When the embedding rate is low, the average rank reaches 50, which means the steganographer is essentially undetectable.

A complete communication process based on steganographic images should include selecting the cover images, allocating the embedding rate, selecting the embedding algorithm, and finally embedding the message in the images and sending them. In a social setting, behavioral information of many dimensions is involved, such as the frequency of communication, the communication partners, and the content correlation of the images sent. Current steganography, however, is concerned with security in a single dimension only, namely making cover and stego images as indistinguishable as possible. We surveyed hundreds of steganography programs, and almost all of them focus only on improving the steganographic algorithm without considering the information leaked by the user's other behaviors during the whole communication process, for example the correlation of the images sent. Existing steganography software does not help the user choose covers; the more user-friendly programs either select a cover at random for the user or let the user use an image captured in real time. A user without professional knowledge is therefore likely to pick images at random as covers in order to save time and effort. In this situation, using the user's behavioral information to detect steganographers would fundamentally change steganographer detection.

Summary of the invention

The object of the present invention is to provide a behavior-based steganographer detection method that can accurately detect steganographers.

The object of the present invention is achieved through the following technical solution:

A behavior-based steganographer detection method, comprising:

selecting a certain number of users from a social platform, crawling N consecutive images from each user, and using the images of one part of the users as training data and the rest as test data;

selecting the images of some users as a steganographer database, and randomly drawing a certain number of images from it to simulate steganographer behavior and generate steganographer data;

extracting behavioral features from the training data and from the steganographer data, and training a binary classifier on the extracted features;

testing the binary classifier on the test data, and using the tested binary classifier to examine newly input images, thereby determining whether the user who sent the newly input images is a normal user or a steganographer.

As can be seen from the technical solution provided by the invention, features that reflect the correlation between images are used as behavioral features and, combined with a binary classifier, allow steganographers to be detected accurately. In addition, the diversity of behavioral information gives steganographer detection several perspectives to work from. On the one hand, steganographers can be detected more reliably; on the other hand, developers of steganography software are prompted to take behavioral information into account and to design software that is more user-friendly and more secure.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 shows the experimental results, given in the background section, of the LOF method proposed by Ker on BossBase data and on Twitter data;

Fig. 2 is a flow chart of a behavior-based steganographer detection method provided by an embodiment of the present invention;

Fig. 3 is a flow chart of behavioral feature extraction provided by an embodiment of the present invention;

Fig. 4 shows experimental results of the scheme of the present invention provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a behavior-based steganographer detection method, which mainly comprises the following steps:

1. Select a certain number of users from a social platform, crawl N consecutive images from each user, and use the images of one part of the users as training data and the rest as test data.

As an example, the social platform may be Twitter and the crawling tool may be tweepy. In practice, tweepy can be used to crawl 2000 Twitter users; users with fewer than 100 images are filtered out, 700 users are retained, and 100 consecutive images are kept for each user.

In this embodiment, after the N consecutive images of each user have been crawled, each image is cropped to a specified size m × n using the resize function of MATLAB. As an example, the size may be set to 512 × 512.

The ratio between training data and test data may be set according to the actual situation.
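Step 1 can be illustrated with a minimal Python sketch. It assumes the crawled data is already available as a dictionary that maps a user identifier to that user's images in posting order; the function name `select_and_split`, the `train_fraction` parameter and the fixed seed are hypothetical and do not come from the embodiment, which leaves the split ratio open.

```python
import random


def select_and_split(user_images, n_images=100, train_fraction=0.5, seed=0):
    """Keep users with at least n_images images and split them by user.

    user_images: dict mapping a user id to that user's images in posting order.
    train_fraction is an assumed parameter; the embodiment does not fix a ratio.
    """
    rng = random.Random(seed)
    # Drop users with too few images and keep N consecutive images for the rest.
    kept = {u: imgs[:n_images] for u, imgs in user_images.items()
            if len(imgs) >= n_images}
    users = sorted(kept)
    rng.shuffle(users)
    cut = int(len(users) * train_fraction)
    # The split is made per user, so no user appears in both sets.
    train = {u: kept[u] for u in users[:cut]}
    test = {u: kept[u] for u in users[cut:]}
    return train, test
```

Splitting by user rather than by image keeps every image sequence intact, which matters because the behavioral features are computed over a whole sequence.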

2. From the crawled images, select the images of some users as a steganographer database, and randomly draw a certain number of images from it to simulate steganographer behavior and generate steganographer data.

Because data from real steganographers cannot be obtained in practice, in this embodiment a portion of the crawled images is randomly selected to simulate steganographers, so that the validity of the method can be verified experimentally.

Likewise, the number of images randomly drawn from the steganographer database may also be set to 100.
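The simulation in step 2 might look like the following sketch. It assumes that the database is the pooled image list of the selected users and that images are drawn without replacement; neither detail is stated in the embodiment, and the name `simulate_steganographer` is hypothetical.

```python
import random


def simulate_steganographer(database_users, n_images=100, seed=0):
    """Draw n_images at random from the pooled images of the selected users.

    Sampling without replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    pool = [img for imgs in database_users.values() for img in imgs]
    # The random draw ignores each user's posting order, which is exactly
    # the behavior the detector is meant to pick up.
    return rng.sample(pool, n_images)
```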

3. Extract behavioral features from the training data and from the steganographer data, and train a binary classifier on the extracted features.

In this embodiment, the training data consists of the images of one part of the users, the steganographer data likewise consists of the images of the steganographers, and features are extracted from both in the same way. For each user or steganographer, features that reflect the correlation between images are extracted from the corresponding image sequence as behavioral features. The extraction is shown in Fig. 3, and the main process is as follows:

1) For the image sequence of each user or steganographer, compute the difference between the gray-level histograms of each pair of adjacent images to form a difference matrix:

$d^{i,i-1} = \left| h^{i} - h^{i-1} \right|$

where $h^{i}$ and $h^{i-1}$ denote the gray-level histograms of the i-th and (i-1)-th images, respectively.

2) Quantize the difference matrix: first apply logarithmic quantization, then truncate to the interval [0, T], expressed as:

$d' = \mathrm{trunc}_T\left(\mathrm{round}\left(\log d^{i,i-1}\right)\right)$

3) Using first-order and second-order statistics over all difference matrices, compute the frequency feature P and the co-occurrence matrix C of d':

$P = [p_1, \ldots, p_{T+1}]$

where $d'_k$ and $d'_{k+1}$ denote the k-th and (k+1)-th elements of d', and m and n are the length and width of the image. The entries $c_{l,j}$ and $c_{j,l}$ of the co-occurrence matrix C represent similar correlations between pixels, so they are merged, giving the merged co-occurrence matrix C'.

The frequency feature P and the merged co-occurrence matrix C' are concatenated to obtain the final feature, that is, the behavioral feature:

$F = [P \; C']$

The feature dimension is:

$|F| = (T+1) + \frac{(T+1)(T+2)}{2}$

For example, with T = 12 the behavioral feature has dimension |F| = 13 + 91 = 104.
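The extraction steps 1) to 3) can be sketched in plain Python as follows. Several details are assumptions of this sketch, not statements of the embodiment: the logarithm is taken as the natural logarithm, a zero difference is mapped to level 0, P and C are normalized by the number of elements and of adjacent pairs respectively, and merging $c_{l,j}$ with $c_{j,l}$ is done by summation. The function names are hypothetical.

```python
import math


def quantize(d, T):
    """Log-quantize one absolute histogram difference and truncate to [0, T]."""
    if d <= 0:
        return 0  # assumption: log(0) is undefined, so map zero to level 0
    return min(T, max(0, round(math.log(d))))  # assumption: natural logarithm


def behavior_feature(histograms, T=12):
    """Build F = [P C'] from one user's sequence of gray-level histograms."""
    levels = T + 1
    counts = [0] * levels
    cooc = [[0] * levels for _ in range(levels)]
    n_single = n_pairs = 0
    for prev, cur in zip(histograms, histograms[1:]):
        # Steps 1) and 2): absolute difference of adjacent histograms, quantized.
        q = [quantize(abs(a - b), T) for a, b in zip(cur, prev)]
        for v in q:
            counts[v] += 1          # first-order statistic
        n_single += len(q)
        for a, b in zip(q, q[1:]):
            cooc[a][b] += 1         # second-order statistic on adjacent elements
        n_pairs += len(q) - 1
    P = [c / n_single for c in counts]
    # Merge c[l][j] with c[j][l] (assumed: by summation), keep the upper triangle.
    C_merged = []
    for l in range(levels):
        for j in range(l, levels):
            c = cooc[l][j] if l == j else cooc[l][j] + cooc[j][l]
            C_merged.append(c / n_pairs)
    return P + C_merged
```

With T = 12 the returned list has 13 + 91 = 104 entries, which matches the feature dimension given in the text.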

4. Test the binary classifier on the test data, and use the tested binary classifier to examine newly input images, thereby determining whether the user who sent the newly input images is a normal user or a steganographer.

A trained binary classifier is obtained as described above. It is then tested on the test data, and once it passes the test it can be used to classify and detect steganographers.

A person skilled in the art will understand that, both in the test phase and when newly input images are examined, behavioral features must also be extracted first. The extracted behavioral features are then fed to the binary classifier, which outputs whether they correspond to a normal user or to a steganographer.
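The embodiment does not name a particular binary classifier. Purely as a stand-in to show the train-then-predict flow on behavioral feature vectors, the following sketch uses a nearest-centroid rule; this choice, the function names and the label strings are assumptions of the sketch and are not the classifier of the invention.

```python
def train_centroids(features, labels):
    """Compute one mean feature vector per class label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, f):
    """Return the label whose centroid is nearest to f (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda y: dist(centroids[y]))
```

Any standard binary classifier could take the place of these two functions, as long as the same feature extraction is applied during training, testing and detection.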

In addition, for steganographers who have some awareness of behavioral security, the mismatch between training data and test data is reduced by changing the composition of the training data. Steganographers are divided into several behavioral security levels according to the proportion of random images among the images they send. P% denotes a steganographer with no awareness of behavioral security, whose images are all random. (P-Q)% denotes a steganographer with some awareness of behavioral security: of the images sent, Q% follow the order of a normal user and (P-Q)% are randomly selected. Continuing in this way yields multiple behavioral security levels. As an example, 100% denotes a steganographer with no awareness of behavioral security, whose images are all random; 90% denotes a steganographer with some awareness of behavioral security, of whose images 10% follow the order of a normal user and 90% are randomly selected. Continuing in this way gives steganographers of 10 levels.

When the classifier is trained, steganographers of the different security levels are mixed in equal proportions to form the training set, so that steganographers of unknown behavioral security level can be detected accurately. For example, the 1000 steganographers in the training set comprise 100 "10%" steganographers, 100 "20%" steganographers, ..., and 100 "100%" steganographers.
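The mixed training set can be sketched as follows. The sketch assumes that a steganographer of level r% is produced by taking a normal user's ordered sequence and replacing r% of its positions, chosen at random, with images drawn from a pool; how the random and ordered images are interleaved is not specified in the embodiment, and the function names are hypothetical.

```python
import random


def make_steganographer(normal_sequence, pool, random_percent, rng):
    """Replace random_percent % of a normal sequence with random pool images."""
    n = len(normal_sequence)
    n_random = round(n * random_percent / 100)
    # Assumption: the positions to randomize are chosen uniformly at random.
    positions = set(rng.sample(range(n), n_random))
    return [rng.choice(pool) if i in positions else img
            for i, img in enumerate(normal_sequence)]


def mixed_training_set(normal_sequences, pool, per_level=100,
                       levels=range(10, 101, 10), seed=0):
    """Build per_level simulated steganographers for every security level."""
    rng = random.Random(seed)
    out = []
    for level in levels:
        for _ in range(per_level):
            base = rng.choice(normal_sequences)
            out.append((level, make_steganographer(base, pool, level, rng)))
    return out
```

With the default arguments this yields 10 levels of 100 steganographers each, that is, the 1000 steganographers of the example above.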

Experiments were also carried out to verify the detection performance of the above scheme. The results are shown in Table 1 and Fig. 4.

Table 1: Results of mixed training and testing

The experiment of Table 1 targets steganographers who have some awareness of behavioral security: a mixed classifier is trained by changing the composition of the training set. The mixed classifier is then tested separately on steganographers of the different security levels (10%, 20%, ..., 100%), giving the missed detection probability for each level.

Fig. 4 illustrates that the method is robust to the number of images. Experiments were run with different numbers of images, while ensuring that the number of images selected per user is the same for the test set and the training set and for normal users and steganographers. Using 10 images per user, 20 images per user, ..., 100 images per user in turn gives the average error probability, that is, the mean of the false alarm rate and the missed detection rate.
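The average error probability used in Fig. 4 can be computed as in the short sketch below. The representation of classifier decisions as booleans and the function name are assumptions made for illustration.

```python
def average_error(normal_flags, steganographer_flags):
    """Mean of the false alarm rate and the missed detection rate.

    Each flag is True when the classifier labels that actor a steganographer.
    """
    false_alarm = sum(normal_flags) / len(normal_flags)
    missed = (sum(not f for f in steganographer_flags)
              / len(steganographer_flags))
    return (false_alarm + missed) / 2
```

For instance, one false alarm among four normal users and one miss among four steganographers give an average error probability of 0.25.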

From the above description of the embodiments, a person skilled in the art can clearly understand that the above embodiments can be implemented in software, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the above embodiments can be embodied as a software product, which can be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and which includes instructions that cause a computer device (such as a personal computer, a server or a network device) to execute the methods described in the embodiments of the present invention.

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims

1. A behavior-based steganographer detection method, characterized by comprising:

selecting a certain number of users from a social platform, crawling N consecutive images from each user, and using the images of one part of the users as training data and the rest as test data;

selecting the images of some users as a steganographer database, and randomly drawing a certain number of images from it to simulate steganographer behavior and generate steganographer data;

extracting behavioral features from the training data and from the steganographer data, and training a binary classifier on the extracted features;

testing the binary classifier on the test data, and using the tested binary classifier to examine newly input images, thereby determining whether the user who sent the newly input images is a normal user or a steganographer.

2. The behavior-based steganographer detection method according to claim 1, characterized in that, after the N consecutive images of each user have been crawled, each image is cropped to a specified size using the resize function of MATLAB.

3. The behavior-based steganographer detection method according to claim 1, characterized in that extracting the behavioral features comprises:

the training data consists of the images of one part of the users, the steganographer data likewise consists of the images of the steganographers, and features are extracted from both in the same way; for each user or steganographer, features that reflect the correlation between images are extracted from the corresponding image sequence as behavioral features; the extraction is as follows:

for the image sequence of each user or steganographer, computing the difference between the gray-level histograms of each pair of adjacent images to form a difference matrix:

$d^{i,i-1} = \left| h^{i} - h^{i-1} \right|$

quantizing the difference matrix: first applying logarithmic quantization, then truncating to the interval [0, T], expressed as:

$d' = \mathrm{trunc}_T\left(\mathrm{round}\left(\log d^{i,i-1}\right)\right)$

using first-order and second-order statistics over all difference matrices to compute the frequency feature P and the co-occurrence matrix C of d':

$P = [p_1, \ldots, p_{T+1}]$

where $d'_k$ and $d'_{k+1}$ denote the k-th and (k+1)-th elements of d', and m and n are the length and width of the image; the entries $c_{l,j}$ and $c_{j,l}$ of the co-occurrence matrix C represent similar correlations between pixels and are merged, giving the merged co-occurrence matrix C':

$F = [P \; C']$

the feature dimension being:

$|F| = (T+1) + \frac{(T+1)(T+2)}{2}$

4. The behavior-based steganographer detection method according to claim 1, characterized in that the method further comprises: dividing steganographers into several behavioral security levels according to the proportion of random images among the images they send, wherein P% denotes a steganographer with no awareness of behavioral security, whose images are all random; (P-Q)% denotes a steganographer with some awareness of behavioral security, of whose images Q% follow the order of a normal user and (P-Q)% are randomly selected; and so on, giving multiple behavioral security levels; and, when the classifier is trained, mixing steganographers of the different security levels in equal proportions to form the training set, so that steganographers of unknown behavioral security level can be detected accurately.