CN108509775A - Malicious PNG image recognition method based on machine learning - Google Patents
- Publication number
- CN108509775A (application CN201810128524.7A)
- Authority
- CN
- China
- Prior art keywords
- png
- image
- steganography
- picture
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/0021—Image watermarking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration by the use of local operators
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/031—Protect user input by software means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2201/00—General purpose image data processing
- G06T2201/005—Image watermarking
- G06T2201/0065—Extraction of an embedded watermark; Reliable detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Image Analysis (AREA)
Abstract
The present invention proposes a machine-learning-based method for identifying malicious PNG images, belonging to the field of cyberspace security technology. A PNG image feature library and a digital steganography recognition model are first established. On the server side, every image upload request is inspected: the PNG image is matched against the PNG image feature library to make a preliminary determination of whether it is legitimate; if it is, the digital steganography recognition model is invoked to determine whether the image contains hidden information; if the image is illegitimate or contains hidden information, the upload is refused. On the client side, PNG image data transferred while web pages load is monitored: the image is matched against the PNG image feature library; if it is legitimate, the digital steganography recognition model is invoked to determine whether it contains hidden information; if the image is illegitimate or contains hidden information, access to the image resource is blocked. The invention thus blocks the upload of illegitimate images on the server side and access to illegitimate images on the client side, strengthening network security.
Description
Technical field
The invention belongs to the field of cyberspace security technology, and in particular relates to a machine-learning-based method for identifying malicious PNG images.
Background
With the rapid spread of network applications and the fast development of digital technology, cyberspace security problems have gradually come into public view and are receiving attention from more and more people.
On the one hand, the browser is the main medium through which people obtain information from the Internet, and its security problems cannot be ignored. In recent years, for reasons such as lax JavaScript review, more and more web pages have been injected with all kinds of web advertisements. At the mild end they induce users to click through to malicious links; at the severe end they attach malware or malicious dynamic link library (DLL) files to web page images, bypass computer and network defense systems, and directly cause harm such as virus infection and information leakage on users' personal computers and mobile devices.
On the other hand, incidents of websites being illegally taken over and of large-scale data leaks emerge one after another. One attack technique frequently used in them is to upload malicious code, such as a "one-sentence Trojan" (a one-line web shell), through a file upload function and then take control of the server; the harm should not be underestimated. Detecting uploaded malicious code and evading that detection is a never-ending contest between defenders and attackers. In recent years, attackers have begun uploading seemingly "legitimate" PNG images to evade intrusion detection systems: the malicious code is hidden in a forged "legitimate" PNG image by steganographic techniques such as encoding and LSB steganography. Once the upload succeeds, the attacker can access the image so that the carefully constructed attack payload hidden in it is parsed, remotely control the web server, and then carry out more damaging attempts and operations, such as stealing the private data of website users or using the web server as a zombie machine to launch denial-of-service (DoS) attacks against other servers.
In the final analysis, whether on a client such as a browser or on the server side where the web server is deployed, one urgent problem is to audit the images in web pages to prevent hidden malicious behavior. PNG images are widely used in web pages because of features such as small size, lossless compression, and display optimized for network transmission. PNG images are also good carriers for information hiding and should therefore be a key object of study.
If the server side, when handling a user's image upload request, can efficiently and accurately identify legitimate image upload requests and analyze whether steganography has been used in the image to carry a malicious attack payload, and if the client, when accessing web resources, can filter the image resources in a web page and block the automatic download of image resources suspected of containing malicious program files, then such malicious behavior can be contained at the source.
To this end, we introduce machine learning and digital steganography techniques to solve this problem.
Machine learning is applied in every field of artificial intelligence and is one of its core technologies. With its capacity for autonomous, efficient, and accurate learning, machine learning has also begun to play a major role in cyberspace security.
The realization of machine learning depends on three inseparable parts: the environment, the learning part, and the execution part. The environment provides information to the learning part of the system; the learning part uses this information to modify the knowledge base, so as to improve the efficiency with which the execution part completes its task; the execution part completes the task according to the knowledge base and feeds the information it obtains back to the learning part. Taking PNG image recognition as an example, the three factors that influence the design of a machine learning system are described below:
Information provided by the environment to the system: the knowledge base stores the rules that guide the actions of the execution part, but the information that the environment provides to the system varies widely. If the quality of the information is high and it differs little from the rules, the learning part can handle it easily. If the system is instead given disorganized, specific information about how to perform specific actions, it must, after obtaining enough data, delete unnecessary details, generalize, and form rules that guide actions before putting them into the knowledge base. The task of the learning part is then heavier and its design more difficult.
Knowledge base: knowledge can be represented in many forms, such as the header signature of a PNG image, the storage layout of a PNG image, and the end marker of a PNG image. Each representation has its own characteristics, and the representation chosen should satisfy the following four criteria:
(1) strong expressive power;
(2) ease of reasoning;
(3) ease of modifying the knowledge base;
(4) ease of extending the knowledge representation.
Execution part: this is the core of the whole system, because the actions of the execution part are exactly what the learning part strives to improve. During PNG image recognition, the content of the learning part is continually adjusted according to the recognition results in order to improve accuracy at execution time.
Digital steganography is a security technique that embeds secret information in digital media without degrading the quality of the carrier. For secret information processed by steganography, a third party is aware of neither its existence nor its content. Steganographic carriers include images, audio, and video. In recent years, digital steganography has become a focus of information security because of its variability and strong concealment. Because every website depends on multimedia such as audio, video, and image resources, an attacker can use digital steganography to hide malware or a malicious attack payload in multimedia and easily bypass detection by anti-malware tools, creating an even greater potential threat.
Taking images as an example of multimedia resources, classical digital image steganography falls into two categories: spatial-domain steganography and transform-domain steganography. Spatial-domain steganography mainly comprises least significant bit (LSB) steganography. Transform-domain steganography mainly operates on the discrete cosine transform (DCT) coefficients of the image and includes JSteg, F5, OutGuess, and model-based (MB) steganography.
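As an illustration of the spatial-domain case, the following minimal Python sketch shows how LSB steganography can overwrite the lowest bit of each pixel with one payload bit, so that each gray value changes by at most 1. The function names `embed_lsb` and `extract_lsb` and the sample values are assumptions made for illustration; they are not taken from the invention.

```python
def embed_lsb(pixels, payload_bits):
    """Return a copy of `pixels` whose lowest bits carry `payload_bits`."""
    if len(payload_bits) > len(pixels):
        raise ValueError("payload is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(payload_bits):
        # Clear the least significant bit, then write the payload bit into it.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first `n_bits` least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical 8-pixel grayscale cover and an 8-bit payload.
cover = [120, 121, 130, 133, 140, 141, 150, 151]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(cover, bits)
```

Because only the lowest bit changes, the stego image is visually indistinguishable from the cover, which is why statistical detectors are needed.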
Summary of the invention
To solve the problems of the prior art, the present invention provides a machine-learning-based method for identifying malicious PNG images. The method performs feature matching against a PNG image feature library and uses a digital steganography recognition model to judge whether a PNG image contains hidden information, so that uploads of illegitimate images are blocked on the server side and access to illegitimate images is blocked on the client side, strengthening network security.
The present invention is realized by the following technical solution: a machine-learning-based method for identifying malicious PNG images, comprising the following steps:
Step 1: Establish a PNG image feature library and a digital steganography recognition model by machine learning;
Step 2: On the server side, inspect every image upload request and match the PNG image against the PNG image feature library established in step 1; if a non-conforming PNG format is found, refuse the upload request; otherwise, the PNG image passes preliminary identification and proceeds to step 3;
Step 3: For a PNG file that has passed preliminary identification, invoke the digital steganography recognition model established in step 1 to determine whether the PNG image contains hidden information; if it does, refuse the upload request; if it does not, allow the upload request;
Step 4: On the client side, monitor PNG image data transferred while web pages load and match the PNG image against the PNG image feature library established in step 1; if a non-conforming PNG format is found, block access to the image resource; otherwise, proceed to step 5;
Step 5: Invoke the digital steganography recognition model established in step 1 to determine whether the PNG image contains hidden information; an image that contains hidden information is considered likely to hide malicious information, and access to the image resource is blocked.
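The two-stage decision shared by the server-side steps (2 and 3) and the client-side steps (4 and 5) can be sketched as follows. This is a minimal illustration, not the invention's implementation: `screen_png` and its two callable parameters are hypothetical names standing in for the feature-library match and the steganography recognition model, and the default threshold of 0.8 is the value given later in the description of the model output.

```python
from typing import Callable


def screen_png(data: bytes,
               format_is_valid: Callable[[bytes], bool],
               stego_probability: Callable[[bytes], float],
               threshold: float = 0.8) -> str:
    """Apply the format check first, then the steganography check."""
    # Stage 1: feature matching against the PNG image feature library.
    if not format_is_valid(data):
        return "reject: non-conforming PNG format"
    # Stage 2: steganography recognition model (returns a probability).
    if stego_probability(data) > threshold:
        return "reject: hidden information suspected"
    return "allow"
```

On the server side "reject" corresponds to refusing the upload request; on the client side it corresponds to blocking access to the image resource.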
Preferably, the PNG image feature library of step 1 is established as follows. First, a batch of PNG images is imported into the machine learning system as training data. Next, a PNG image feature recognition library is established, containing the following features: (1) the PNG header signature; (2) the PNG end marker, the IEND chunk; (3) the IHDR chunk, which records the image information; (4) the IDAT chunks, which store the actual image data; (5) chunks that store redundant image information. Finally, a support vector machine model is selected to learn features from this library, completing the recognition and classification of the target.
Preferably, the digital steganography recognition model of step 1 is established by combining shallow learning and deep learning. On the one hand, a feature library is built from the steganographic features of classical steganographic algorithms and used for feature learning. On the other hand, because image quality inevitably changes slightly after steganography, PNG images with and without steganographic information are each preprocessed with a high-pass filter to enhance the image features; the resulting residual images are used as the training set, a convolutional neural network model is selected for transfer learning, and the final output is the probability that the image contains steganography.
Preferably, when the feature library is built from the steganographic features of classical steganographic algorithms and feature learning is performed, the RS analysis algorithm is selected to perform supervised learning on PNG images:
First, each input image used to train the model is divided into multiple image blocks of equal size; each image block is scanned and arranged into a pixel vector $G = \{x_1, x_2, \ldots, x_n\}$, and the spatial correlation of each image block is calculated using the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|$$
where $x_i$ is the gray value of each pixel. The smaller the value of $f$, the smaller the gray-value variation between neighboring pixels and the stronger the spatial correlation of the image block;
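Then non-negative flipping is applied to randomly selected pixels of each image block, where the flipping functions are defined as follows:
$F_1$ denotes the mutual exchange of pixel values $2i$ and $2i+1$, i.e., $F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$;
$F_{-1}$ denotes the mutual exchange of pixel values $2i-1$ and $2i$, i.e., $F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$;
$F_0$ denotes leaving the pixel value unchanged;
The proportion $R_M$ of image blocks whose spatial-correlation value $f$ increases and the proportion $S_M$ of image blocks whose value of $f$ decreases are calculated, where $F_M(G)$ denotes block $G$ after non-negative flipping:
$$R_M = \frac{\#\{G : f(F_M(G)) > f(G)\}}{\#\{G\}}, \qquad S_M = \frac{\#\{G : f(F_M(G)) < f(G)\}}{\#\{G\}}$$
Likewise, non-positive flipping is applied to randomly selected pixels of each image block, and the proportion $R_{-M}$ of image blocks whose value of $f$ increases and the proportion $S_{-M}$ of image blocks whose value of $f$ decreases are calculated:
$$R_{-M} = \frac{\#\{G : f(F_{-M}(G)) > f(G)\}}{\#\{G\}}, \qquad S_{-M} = \frac{\#\{G : f(F_{-M}(G)) < f(G)\}}{\#\{G\}}$$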
If non-positive flipping increases the degree of disorder more than non-negative flipping does, the PNG image is labeled as having LSB steganography features; otherwise, it is labeled as having no LSB steganography features, and the label is output;
With PNG images as the input objects and the presence or absence of LSB steganography features as the expected output, the input objects and expected outputs form the training data from which a learning model is established; this learning model is then used to infer whether a new PNG image contains LSB steganography.
Compared with the prior art, the present invention has the following advantages. It introduces machine learning and digital steganography techniques, establishes a PNG image feature library for feature matching to make a preliminary judgment of whether a PNG image hides malicious information, and further judges with a digital steganography recognition model whether the PNG image contains hidden information, thereby blocking the upload of illegitimate images on the server side and access to illegitimate images on the client side and strengthening network security. In the digital steganography recognition model, the RS analysis algorithm is selected to perform supervised learning on PNG images, and whether an image has LSB steganography features is judged by whether positive and negative flipping affect the image's degree of disorder to a comparable extent; a convolutional neural network then performs deep learning to judge the probability that the image contains steganography. The accuracy is high, and the overall model design is relatively simple and easy to implement.
Description of the drawings
Fig. 1 is a flowchart of a machine-learning-based method for identifying malicious PNG images according to an embodiment of the present invention;
Fig. 2 is a framework diagram of the digital steganography recognition model in a machine-learning-based method for identifying malicious PNG images according to an embodiment of the present invention.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
The invention is implemented in two parts, the server side and the client. When the technical solution is applied on the server side, if every image upload request is recorded and fed in turn, as test data, to the PNG feature recognition library and the digital steganography recognition model for matching, attackers can be effectively prevented from taking control of the server by uploading an attack payload. When the technical solution is applied on the client, if every web resource containing an image is recorded and fed in turn, as test data, to the PNG feature recognition library and the digital steganography recognition model for matching, malicious control of user equipment can be effectively contained at the source.
The invention first establishes the PNG image feature recognition library by training on a large number of PNG images. It then establishes the digital steganography recognition model by applying a combination of shallow learning and deep learning to PNG images in which information has been hidden with a variety of steganographic techniques. In the server-side environment, the PNG image feature recognition library is used to identify whether a file uploaded by a client is a PNG image. If it is confirmed to be a PNG image, it is preliminarily deemed legitimate and passed to the next detection step; if it does not meet the requirements of the PNG format, the uploaded file is deemed illegitimate and the upload request is refused. After a file is preliminarily deemed legitimate, the digital steganography recognition model is used to detect whether the PNG image contains hidden information. If it does, the file is deemed a suspected malicious file and the client's upload request is refused; if it does not, the request is deemed harmless and the upload is allowed. In the client environment, the web page images browsed by the user (here specifically PNG images) are monitored in real time by a browser monitoring plug-in or another real-time monitoring tool, and image features are identified with the PNG image feature recognition library. If an abnormal image is found (that is, machine recognition concludes that the PNG image does not conform to the specification), the user is blocked from accessing the image resource. If no abnormality is found, the digital steganography recognition model is used to detect whether the image contains hidden information; if it does, the user is blocked from accessing the image resource; if it does not, the user can access the image resource normally. As shown in Fig. 1, the method comprises the following steps:
Step 1: Establish the PNG image feature library and the digital steganography recognition model by machine learning.
For the PNG image feature library, the PNG format is uniform, so shallow learning is sufficient. First, a batch of PNG images is imported into the machine learning system as training data. Next, the PNG image feature recognition library is established, containing the following features: (1) the PNG header signature; (2) the PNG end marker, the IEND chunk; (3) the IHDR chunk, which records the image information; (4) the IDAT chunks, which store the actual image data; (5) chunks that store redundant image information (such as the tEXt chunk), and so on. Finally, features are learned from this hand-designed recognition library. Because the goal of learning is to recognize and classify the target, a support vector machine (SVM) model is selected for supervised learning.
For the digital steganography recognition model, the features of steganographic algorithms that are variants of classical algorithms, or that are independently designed, are hard to detect, unlike those of the classical algorithms themselves. The invention therefore combines shallow learning and deep learning:
On the one hand, a feature library is built from the steganographic features of classical steganographic algorithms and used for feature learning. Classical steganographic algorithms here means spatial-domain algorithms such as least significant bit (LSB) steganography. The RS (Regular and Singular groups) analysis algorithm detects secret information from the change in image smoothness before and after steganography and is highly robust against random LSB steganography (in which the secret message is embedded in the least significant bits of pixels chosen in random order). The RS analysis algorithm is therefore selected to perform supervised learning on PNG images, as follows:
First, each input image used to train the model is divided into multiple image blocks of equal size; each image block is scanned in zigzag order and arranged into a pixel vector $G = \{x_1, x_2, \ldots, x_n\}$, and the spatial correlation of each image block is calculated using the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|$$
where $x_i$ is the gray value of each pixel. The smaller the value of $f$, the smaller the gray-value variation between neighboring pixels and the stronger the spatial correlation of the image block.
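The scan-and-measure step can be sketched as follows. The zigzag order used here is the JPEG-style anti-diagonal traversal, which is an assumption because the text does not specify the exact order, and the spatial-correlation measure is the sum of absolute differences between consecutive pixels used in standard RS analysis. The function names are illustrative.

```python
def zigzag_scan(block):
    """Flatten a 2-D block along its anti-diagonals, alternating direction."""
    h, w = len(block), len(block[0])
    out = []
    for s in range(h + w - 1):
        coords = [(y, s - y) for y in range(h) if 0 <= s - y < w]
        if s % 2 == 0:
            coords.reverse()  # even diagonals run from bottom-left to top-right
        out.extend(block[y][x] for y, x in coords)
    return out


def spatial_correlation(vector):
    """f(x1..xn): sum of |x[i+1] - x[i]|; smaller means smoother."""
    return sum(abs(b - a) for a, b in zip(vector, vector[1:]))


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
vector = zigzag_scan(block)
```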
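Then non-negative flipping ($F_1$ and $F_0$) is applied to randomly selected pixels of each image block, where the flipping functions are defined as follows:
$F_1$ denotes the mutual exchange of pixel values $2i$ and $2i+1$, i.e., $F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$;
$F_{-1}$ denotes the mutual exchange of pixel values $2i-1$ and $2i$, i.e., $F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$;
$F_0$ denotes leaving the pixel value unchanged.
The proportion of image blocks whose spatial-correlation value $f$ increases (denoted $R_M$) and the proportion of image blocks whose value of $f$ decreases (denoted $S_M$) are calculated, where $F_M(G)$ denotes block $G$ after non-negative flipping:
$$R_M = \frac{\#\{G : f(F_M(G)) > f(G)\}}{\#\{G\}}, \qquad S_M = \frac{\#\{G : f(F_M(G)) < f(G)\}}{\#\{G\}} \qquad (R_M + S_M \le 1)$$
Likewise, non-positive flipping ($F_{-1}$ and $F_0$) is applied to randomly selected pixels of each image block, and the proportion of image blocks whose value of $f$ increases (denoted $R_{-M}$) and the proportion whose value of $f$ decreases (denoted $S_{-M}$) are calculated:
$$R_{-M} = \frac{\#\{G : f(F_{-M}(G)) > f(G)\}}{\#\{G\}}, \qquad S_{-M} = \frac{\#\{G : f(F_{-M}(G)) < f(G)\}}{\#\{G\}} \qquad (R_{-M} + S_{-M} \le 1)$$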
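The three flipping functions and their application to randomly selected pixels can be sketched in a few lines. The names `flip`, `random_mask`, and `apply_mask` are illustrative, and the selection probability of 0.5 is an assumption, since the text only says that part of the pixels is chosen at random.

```python
import random


def flip(value, kind):
    """kind 1 -> F1 (2i <-> 2i+1), kind -1 -> F-1 (2i-1 <-> 2i), kind 0 -> F0."""
    if kind == 1:
        return value ^ 1              # toggle the least significant bit
    if kind == -1:
        return ((value + 1) ^ 1) - 1  # F-1(x) = F1(x + 1) - 1
    return value


def random_mask(length, sign, rng):
    """Select pixels at random: `sign` (1 or -1) where flipped, 0 elsewhere."""
    return [sign if rng.random() < 0.5 else 0 for _ in range(length)]


def apply_mask(block, mask):
    """Flip each pixel of `block` according to the matching mask entry."""
    return [flip(v, k) for v, k in zip(block, mask)]


non_negative = apply_mask([10, 11, 12, 13], [1, 1, 0, 0])
non_positive = apply_mask([10, 11, 12, 13], [-1, -1, 0, 0])
```

Note that $F_{-1}$ can map 0 to -1 and 255 to 256; the sketch keeps these out-of-range values rather than clipping them, which matches the definition given above.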
Statistically, if the image has not undergone LSB steganography, non-negative and non-positive flipping destroy the spatial correlation of the image blocks to the same extent, i.e., they increase the disorder of the image blocks equally, so that $R_M \approx R_{-M}$, $S_M \approx S_{-M}$, and $R_M > S_M$, $R_{-M} > S_{-M}$.
Therefore, if the increase in disorder caused by non-positive flipping is greater than that caused by non-negative flipping, the PNG image is considered very likely to contain LSB steganography and is labeled as having LSB steganography features; otherwise, it is labeled as having no LSB steganography features, and the label is output. Finally, the input objects (PNG images) and the expected outputs (whether LSB steganography features are present) form the training data from which a learning model is established, and this learning model is used to infer whether a new PNG image contains LSB steganography.
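The labeling rule can be illustrated end to end with the sketch below. It makes several simplifying assumptions that are not in the text: pixels are already arranged as a one-dimensional sequence, a fixed mask `(0, 1, 1, 0)` replaces random pixel selection so that the result is reproducible, the "increase in disorder" is measured as R minus S, and a tolerance of 0.05 decides when the non-positive increase counts as greater. The cover and stego sequences are synthetic.

```python
import random


def smoothness(group):
    """f(G): sum of absolute differences between neighbouring pixels."""
    return sum(abs(b - a) for a, b in zip(group, group[1:]))


def flip(value, kind):
    """kind 1 -> F1, kind -1 -> F-1, kind 0 -> F0."""
    if kind == 1:
        return value ^ 1
    if kind == -1:
        return ((value + 1) ^ 1) - 1
    return value


def rs_ratios(pixels, mask):
    """Return (R, S): share of groups whose f rises / falls after flipping."""
    n = len(mask)
    groups = [pixels[i:i + n] for i in range(0, len(pixels) - n + 1, n)]
    regular = singular = 0
    for group in groups:
        flipped = [flip(v, k) for v, k in zip(group, mask)]
        before, after = smoothness(group), smoothness(flipped)
        if after > before:
            regular += 1
        elif after < before:
            singular += 1
    return regular / len(groups), singular / len(groups)


def has_lsb_feature(pixels, mask=(0, 1, 1, 0), tolerance=0.05):
    """Label True when non-positive flipping adds clearly more disorder."""
    r_pos, s_pos = rs_ratios(pixels, mask)
    r_neg, s_neg = rs_ratios(pixels, tuple(-k for k in mask))
    return (r_neg - s_neg) - (r_pos - s_pos) > tolerance


# Synthetic smooth cover, and the same cover with every LSB randomised.
cover = [100 + 2 * (i // 16) for i in range(1024)]
rng = random.Random(0)
stego = [(p & ~1) | rng.getrandbits(1) for p in cover]
```

On the smooth cover both kinds of flipping make every group rougher, so the two measures coincide; once the least significant bits are randomised, non-negative flipping no longer has a systematic effect while non-positive flipping still does. Labels produced this way would serve as the expected outputs of the supervised training data.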
On the other hand, because image quality inevitably changes slightly after steganography, PNG images with and without steganographic information are first each preprocessed with a high-pass filter to enhance the image features, and the resulting residual images are used as the training set. Convolutional neural network models are strong at spatial mapping and well suited to image processing, and transfer learning helps reduce the amount of data needed to build a neural network when data is insufficient. A convolutional neural network (CNN) model improved by Lionel Pibre et al. is therefore selected for transfer learning. The main idea is as follows:
The convolutional neural network model pre-trained by Lionel Pibre et al. is used as the feature extraction operator; the last layer of the network is replaced with a custom classifier; the weights of the other layers are then fixed and the network as a whole is trained.
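The high-pass pre-processing step can be sketched as follows. The text does not say which filter is used; the 5x5 kernel below is one commonly used in CNN-based steganalysis work and is an assumption here, as are the name `highpass_residual` and the choice to keep only positions where the kernel fits entirely inside the image.

```python
# Assumed 5x5 high-pass kernel (scaled by 1/12); its entries sum to zero,
# so flat image regions produce a zero residual.
KV_KERNEL = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]


def highpass_residual(image):
    """Filter a 2-D list of gray values and return the residual image.

    Only positions where the kernel fits entirely inside the image are
    computed, so the output is 4 rows and 4 columns smaller than the input.
    """
    k = len(KV_KERNEL)
    height, width = len(image), len(image[0])
    residual = []
    for r in range(height - k + 1):
        row = []
        for c in range(width - k + 1):
            acc = sum(
                KV_KERNEL[i][j] * image[r + i][c + j]
                for i in range(k)
                for j in range(k)
            )
            row.append(acc / 12.0)
        residual.append(row)
    return residual
```

Because the kernel suppresses smooth image content, what remains is dominated by high-frequency detail, which is where small embedding changes are expected to show up.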
Referring to Fig. 2, the structure of the convolutional neural network model is as follows:
Input: the values of all pixels of the processed residual image;
Feature construction layer: the pre-trained model, used as a feature extractor;
Classifier: a fully connected layer (Fully Connected Layer) followed by a classification function (softmax);
Output: the probability that the image contains steganography; when the output probability is greater than 0.8, the image is considered to contain steganography.
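The classifier stage and the 0.8 decision threshold can be sketched as below. The sketch assumes the frozen feature extractor has already produced a feature vector, and that the fully connected layer has two outputs ordered as (no steganography, steganography); the names `classify`, `weights` and `biases` are hypothetical.

```python
import math

STEGO_THRESHOLD = 0.8  # threshold stated in the text


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Fully connected layer followed by softmax.

    features: output of the (frozen) pre-trained feature extractor.
    weights:  two rows of weights, assumed order (cover, stego).
    Returns (probability of steganography, decision).
    """
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    p_stego = softmax(logits)[1]
    return p_stego, p_stego > STEGO_THRESHOLD
```

In a transfer-learning setup of this kind, only `weights` and `biases` would be updated during training, while the feature extractor that produces `features` stays fixed.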
The classifier is built using the blind detection method based on image quality metrics (Image Quality Metrics, IQM) proposed by Avcibas, as follows:
1. Feature vectors are selected by defining a variety of image quality metrics; analysis of variance (Analysis of Variance, ANOVA) is used to pick out the more discriminative features. Taking the Minkowsky feature as an example, the norm of the dissimilarity between two images can be obtained by taking the Minkowsky average of the pixel differences spatially and then chromatically (i.e., over all bands):
where $M_\gamma$ is the mean absolute error when $\gamma = 1$ and the mean square error when $\gamma = 2$; $C_k(i, j)$ denotes the multispectral component of the normal image at pixel position $(i, j)$ in band $k$; the corresponding quantity for the stego image (written here as $\hat{C}_k(i, j)$) denotes the multispectral component of the stego image at pixel position $(i, j)$ in band $k$; and $N$ denotes the total number of image pixels;
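As an illustration of the Minkowsky measure, the sketch below assumes the form $M_\gamma = \frac{1}{K}\sum_{k=1}^{K}\left[\frac{1}{N}\sum_{i,j}\left|C_k(i,j)-\hat{C}_k(i,j)\right|^{\gamma}\right]^{1/\gamma}$, where $K$ (the number of bands) is a symbol introduced here for the example. This form follows the common definition of Minkowski-average quality metrics and is an assumption; the function name and the data layout (one flat list of pixel values per band) are also hypothetical.

```python
def minkowski_metric(cover, stego, gamma):
    """Assumed Minkowsky dissimilarity between a normal and a stego image.

    cover, stego: lists of bands; each band is a flat list of N pixel values.
    gamma = 1 gives the mean absolute error per band; gamma = 2 gives the
    root of the mean squared error per band under this assumed form.
    """
    total = 0.0
    for band_c, band_s in zip(cover, stego):
        n = len(band_c)
        mean_power = sum(abs(c - s) ** gamma for c, s in zip(band_c, band_s)) / n
        total += mean_power ** (1.0 / gamma)  # spatial Minkowsky average
    return total / len(cover)                 # average over all bands
```

For a single band with pixel values `[10, 20, 30, 40]` and a stego version `[11, 20, 29, 40]`, the measure with `gamma=1` is 0.5.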
2. The selected IQMs (Image Quality Metrics) span a multidimensional feature space in which normal images and stego images are more easily distinguished;
3. After a suitable feature set has been chosen, a multiple linear regression model is fitted to a large amount of experimental data, and a classifier that distinguishes normal images from stego images is built on the regression model.
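A regression-based classifier of the kind outlined in items 1 to 3 might look like the following sketch. It fits an ordinary least-squares model by solving the normal equations and thresholds the predicted score at zero. The label encoding (-1 for normal, +1 for stego), the zero threshold, and all function names are assumptions made for the example; the feature values in the usage note are made up.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(features, labels):
    """Least-squares fit of label ~ b0 + b1*q1 + ... on IQM feature vectors."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(n)]
    return solve(xtx, xty)


def is_stego(beta, feature):
    """Classify one image: positive regression score means stego."""
    score = beta[0] + sum(b * x for b, x in zip(beta[1:], feature))
    return score > 0
```

For example, `beta = fit([(0, 0), (1, 0), (0, 2), (1, 2)], [-1, -1, 1, 1])` followed by `is_stego(beta, (0.5, 1.8))` classifies a new feature vector with the fitted model.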
Step 2: All requests to upload picture files are reviewed on the server side. The data are first pre-processed by decoding; the PNG picture is then matched against the PNG image feature library established in Step 1. If an illegal PNG picture format is found, the upload request is rejected; otherwise, the PNG picture passes preliminary identification and proceeds to Step 3.
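A rule-based stand-in for the feature matching in this step is sketched below. It checks the structural PNG features named in this document (signature, IHDR, IDAT, IEND) directly instead of using a trained model, so it is an illustrative simplification and not the described feature library. The function names are hypothetical, and treating any bytes after IEND as illegal is an assumption.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_chunks(data):
    """Split PNG bytes into (type, body) chunks, verifying each CRC."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    chunks = []
    while pos + 12 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated chunk")
        body = data[pos + 8:end]
        (crc,) = struct.unpack(">I", data[end:end + 4])
        if zlib.crc32(ctype + body) != crc:
            raise ValueError("bad CRC")
        chunks.append((ctype, body))
        pos = end + 4
        if ctype == b"IEND":
            break
    return chunks, data[pos:]  # second item: bytes found after IEND


def looks_like_valid_png(data):
    """True if IHDR comes first, IDAT is present, IEND is last, no trailer."""
    try:
        chunks, trailing = parse_chunks(data)
    except ValueError:
        return False
    types = [t for t, _ in chunks]
    return (
        len(types) >= 3
        and types[0] == b"IHDR"
        and b"IDAT" in types
        and types[-1] == b"IEND"
        and not trailing
    )
```

A file that fails this check would be rejected at this stage; a file that passes would still go on to the steganography identification model.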
In this step, the review of a picture upload request covers the following information: (1) the file suffix; (2) the content type (Content-Type) declared in the HTTP message header; (3) whether the transferred content is encoded; (4) whether the transferred content is legal.
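The four review items can be sketched as a single function. This is a minimal illustration under stated assumptions: the function name, its parameters, and the reason strings are hypothetical; only Base64 is handled as a transfer encoding; and "legal content" is reduced here to checking for the PNG signature, which is far weaker than the feature matching described in this step.

```python
import base64
import binascii
import os

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def review_upload(filename, content_type, payload, transfer_encoding=None):
    """Return a list of reasons to reject the upload (empty list = accept)."""
    reasons = []
    # (1) file suffix
    if os.path.splitext(filename)[1].lower() != ".png":
        reasons.append("suffix")
    # (2) Content-Type declared in the HTTP header
    if content_type.split(";")[0].strip().lower() != "image/png":
        reasons.append("content-type")
    # (3) decode the body if it was transferred in encoded form
    if transfer_encoding == "base64":
        try:
            payload = base64.b64decode(payload, validate=True)
        except (binascii.Error, ValueError):
            reasons.append("encoding")
            payload = b""
    # (4) simplified legality check: body must start with the PNG signature
    if not payload.startswith(PNG_SIGNATURE):
        reasons.append("content")
    return reasons
```

Collecting every failed check, rather than stopping at the first one, makes it easier to log why a request was refused.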
Step 3: For PNG picture files that have passed preliminary identification, the steganography identification model established in Step 1 is invoked to determine whether the PNG picture contains hidden information. If it does, the upload request is rejected; if it does not, the upload request is allowed.
Step 4: On the client side, PNG picture file data transferred while web pages load are monitored in real time, for example by a browser monitoring plug-in. The data are pre-processed (e.g., decoded), and the PNG picture is then matched against the PNG image feature library established in Step 1. If an illegal PNG picture format is found, access to the picture resource is forbidden; otherwise, the process proceeds to Step 5.
Client-side monitoring of web-page PNG image data refers specifically to checking whether the PNG image data themselves contain hidden information; pictures that merely imply or lure the user toward a malicious link are outside the scope of consideration.
Step 5: As in Step 3, the steganography identification model established in Step 1 is invoked to determine whether the PNG picture contains hidden information. A picture that contains hidden information is considered likely to conceal malicious content, and access to the picture resource is forbidden.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (7)
1. A malicious PNG image identification method based on machine learning, characterized by comprising the following steps:
Step 1: establishing a PNG image feature library and a steganography identification model by machine learning;
Step 2: reviewing, on the server side, all requests to upload picture files, and matching the PNG picture against the PNG image feature library established in Step 1; if an illegal PNG picture format is found, rejecting the upload request; otherwise, the PNG picture passes preliminary identification and proceeds to Step 3;
Step 3: for PNG picture files that have passed preliminary identification, invoking the steganography identification model established in Step 1 to determine whether the PNG picture contains hidden information; if it does, rejecting the upload request; if it does not, allowing the upload request;
Step 4: monitoring, on the client side, PNG picture file data transferred while web pages load, and matching the PNG picture against the PNG image feature library established in Step 1; if an illegal PNG picture format is found, forbidding access to the picture resource; otherwise, proceeding to Step 5;
Step 5: invoking the steganography identification model established in Step 1 to determine whether the PNG picture contains hidden information; a picture that contains hidden information is considered likely to conceal malicious content, and access to the picture resource is forbidden.
2. The malicious PNG image identification method based on machine learning according to claim 1, characterized in that the PNG image feature library of Step 1 is established as follows: first, a batch of PNG images is provided and imported into the machine learning system as training set data; next, a PNG image feature identification library is established, comprising the following feature information: (1) the PNG header signature; (2) the IEND chunk marking the end of the PNG; (3) the IHDR chunk recording the PNG image information; (4) the IDAT chunks storing the actual image data; (5) the chunks storing redundant image information; finally, a support vector machine model is selected to perform feature learning on the above library and complete the identification and classification of the target.
3. The malicious PNG image identification method based on machine learning according to claim 1, characterized in that the steganography identification model of Step 1 is established by combining shallow learning and deep learning: on the one hand, a feature library is established from the steganographic features of classical steganography algorithms and feature learning is performed on it; on the other hand, because steganography inevitably causes slight changes in image quality, PNG images containing steganographic information and PNG images without it are each pre-processed with a high-pass filter to enhance the image features, the resulting residual images are used as the training set, a convolutional neural network model is then selected for transfer learning, and the probability that the image contains steganography is finally output.
4. The malicious PNG image identification method based on machine learning according to claim 3, characterized in that establishing a feature library from the steganographic features of classical steganography algorithms and performing feature learning consists of applying the RS analysis algorithm to PNG images for supervised learning:
First, each input image used to train the model is divided into a number of image blocks of equal size; each image block is scanned into a pixel vector $G = \{x_1, x_2, \ldots, x_n\}$, and the spatial correlation of each image block is calculated with the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|$$
where $x_i$ denotes the gray value of each pixel; the smaller the value of $f$, the smaller the variation in gray value between neighboring pixels and the stronger the spatial correlation of the image block;
Next, a subset of the pixels in each image block is selected at random and a non-negative flipping operation is applied to it, the flipping functions being defined as follows:
$F_1$ denotes the mutual mapping between the pixel values $2i$ and $2i+1$, i.e., $0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$;
$F_{-1}$ denotes the mutual mapping between the pixel values $2i-1$ and $2i$, i.e., $-1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$;
$F_0$ denotes the identity mapping, which leaves the pixel value unchanged;
The proportion $R_M$ of image blocks whose spatial correlation increases and the proportion $S_M$ of image blocks whose spatial correlation decreases are calculated;
Likewise, a subset of the pixels in each image block is selected at random and a non-positive flipping operation is applied to it, and the proportion $R_{-M}$ of image blocks whose spatial correlation increases and the proportion $S_{-M}$ of image blocks whose spatial correlation decreases are calculated;
If the increase in disorder caused by applying the non-positive flipping operation to the image is greater than that caused by applying the non-negative flipping operation, the PNG image is labeled as having the LSB steganography feature; otherwise, it is labeled as not having the LSB steganography feature, and the label is output;
With the PNG images as input objects and the presence or absence of the LSB steganography feature as the expected output, the input objects and expected outputs form the training data, from which a learning model is established; this learning model is then used to infer whether a new PNG image contains LSB steganography.
5. The malicious PNG image identification method based on machine learning according to claim 3, characterized in that the structure of the convolutional neural network model comprises:
Input: the values of all pixels of the processed residual image;
Feature construction layer: the pre-trained model, used as a feature extractor;
Classifier: a fully connected layer followed by a classification function;
Output: the probability that the image contains steganography; when the output probability is greater than 0.8, the image is considered to contain steganography.
6. The malicious PNG image identification method based on machine learning according to claim 5, characterized in that the classifier is built using a blind detection method based on image quality metrics:
Using analysis of variance, feature vectors are selected by defining a variety of image quality metrics; the norm of the dissimilarity between two images is obtained by taking the Minkowsky average of the pixel differences spatially and then chromatically:
where $M_\gamma$ is the mean absolute error when $\gamma = 1$ and the mean square error when $\gamma = 2$; $C_k(i, j)$ denotes the multispectral component of the normal image at pixel position $(i, j)$ in band $k$; the corresponding quantity for the stego image (written here as $\hat{C}_k(i, j)$) denotes the multispectral component of the stego image at pixel position $(i, j)$ in band $k$; and $N$ denotes the total number of image pixels;
The selected image quality metrics span a multidimensional feature space;
After a suitable feature set has been chosen, a multiple linear regression model is fitted to a large amount of experimental data, and a classifier that distinguishes normal images from stego images is built on the regression model.
7. The malicious PNG image identification method based on machine learning according to claim 1, characterized in that, in Step 2, the review of a picture upload request covers the following information: (1) the file suffix; (2) the content type (Content-Type) declared in the HTTP message header; (3) whether the transferred content is encoded; (4) whether the transferred content is legal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810128524.7A CN108509775B (en) | 2018-02-08 | 2018-02-08 | Malicious PNG image identification method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108509775A (en) | 2018-09-07 |
CN108509775B (en) | 2020-11-13 |
Family
ID=63375310
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810128524.7A Active CN108509775B (en) | 2018-02-08 | 2018-02-08 | Malicious PNG image identification method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108509775B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080013817A1 (en) * | 2006-07-11 | 2008-01-17 | Fujitsu Limited | Code image processing method and code image processing apparatus |
CN106874936A (en) * | 2017-01-17 | 2017-06-20 | 腾讯科技(上海)有限公司 | Image propagates monitoring method and device |
CN107292315A (en) * | 2016-04-11 | 2017-10-24 | 北京大学 | Steganalysis method and hidden information analysis device based on multiple dimensioned LTP features |
Non-Patent Citations (2)
Title |
---|
CHIEW KANG LENG et al.: "JPEG Image Steganalysis Improvement Via Image-to-Image Variation Minimization", 2008 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING * |
LI Yu et al.: "Research on Image Steganography Detection Technology Based on Sparse Coding", Communications Technology (通信技术) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020140422A1 (en) * | 2019-01-02 | 2020-07-09 | Boe Technology Group Co., Ltd. | Neural network for automatically tagging input image, computer-implemented method for automatically tagging input image, apparatus for automatically tagging input image, and computer-program product |
WO2020151173A1 (en) * | 2019-01-25 | 2020-07-30 | 深信服科技股份有限公司 | Webpage tampering detection method and related apparatus |
CN109992967A (en) * | 2019-03-12 | 2019-07-09 | 福建拓尔通软件有限公司 | A kind of method and system for realizing automatic detection file security when file uploads |
CN110309654A (en) * | 2019-06-28 | 2019-10-08 | 四川长虹电器股份有限公司 | The safety detection method and device that picture uploads |
CN110995954B (en) * | 2019-10-11 | 2022-10-04 | 中国平安财产保险股份有限公司 | Method and device for detecting picture steganography, computer equipment and storage medium |
CN110995954A (en) * | 2019-10-11 | 2020-04-10 | 中国平安财产保险股份有限公司 | Method and device for detecting picture steganography, computer equipment and storage medium |
CN110942034A (en) * | 2019-11-28 | 2020-03-31 | 中国科学院自动化研究所 | Method, system and device for detecting multi-type depth network generated image |
GB2590916A (en) * | 2020-01-05 | 2021-07-14 | British Telecomm | Steganographic malware detection |
GB2590917A (en) * | 2020-01-05 | 2021-07-14 | British Telecomm | Steganographic malware identification |
CN112632475A (en) * | 2020-12-30 | 2021-04-09 | 郑州轻工业大学 | Picture copyright protection system and method based on state password and picture steganography |
CN112632475B (en) * | 2020-12-30 | 2024-03-29 | 郑州轻工业大学 | Picture copyright protection system and method based on national password and picture steganography |
CN113112472A (en) * | 2021-04-09 | 2021-07-13 | 百度在线网络技术(北京)有限公司 | Image processing method and device |
CN113112472B (en) * | 2021-04-09 | 2023-08-29 | 百度在线网络技术(北京)有限公司 | Image processing method and device |
CN113111200A (en) * | 2021-04-09 | 2021-07-13 | 百度在线网络技术(北京)有限公司 | Method and device for auditing picture file, electronic equipment and storage medium |
CN113806747B (en) * | 2021-11-18 | 2022-02-25 | 浙江鹏信信息科技股份有限公司 | Trojan horse picture detection method and system and computer readable storage medium |
CN113806747A (en) * | 2021-11-18 | 2021-12-17 | 浙江鹏信信息科技股份有限公司 | Trojan horse picture detection method and system and computer readable storage medium |
WO2023136775A3 (en) * | 2021-12-17 | 2023-09-07 | Grabtaxi Holdings Pte. Ltd. | Method for filtering images and image hosting server |
CN115296823A (en) * | 2022-09-29 | 2022-11-04 | 佛山蚕成科技有限公司 | Credible digital badge security authentication method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108509775B (en) | 2020-11-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||