CN108509775A - A malicious PNG image recognition method based on machine learning - Google Patents

A malicious PNG image recognition method based on machine learning

Info

Publication number
CN108509775A
Authority
CN
China
Prior art keywords
png
image
steganography
picture
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810128524.7A
Other languages
Chinese (zh)
Other versions
CN108509775B (en
Inventor
杨悉瑜
翁健
魏林锋
杨悉琪
潘冰
张悦
李明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
University of Jinan
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201810128524.7A
Publication of CN108509775A
Application granted
Publication of CN108509775B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration by the use of local operators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/031 Protect user input by software means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0065 Extraction of an embedded watermark; Reliable detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a malicious PNG image recognition method based on machine learning, belonging to the field of cyberspace security technology. A PNG image feature library and a digital steganography identification model are first established. On the server side, requests to upload picture files are examined: feature matching against the PNG image feature library gives a preliminary judgment of whether a PNG picture is legal; if it is legal, the digital steganography identification model is called to determine whether information is hidden in the picture; if the picture is illegal or contains hidden information, the upload is refused. On the client side, PNG file data transmitted with web pages is monitored and matched against the PNG image feature library; if the picture is legal, the digital steganography identification model is called to determine whether information is hidden in it; if the picture is illegal or contains hidden information, access to the picture resource is forbidden. The invention can forbid the upload of illegal pictures on the server side and forbid access to illegal pictures on the client side, which strengthens network security.

Description

A malicious PNG image recognition method based on machine learning
Technical Field
The invention belongs to the field of cyberspace security technology, and in particular relates to a malicious PNG image recognition method based on machine learning.
Background
With the rapid spread of network applications and the fast development of digital technology, cyberspace security problems have come into public view and are receiving more and more attention.
On the one hand, the browser is the main medium through which people obtain information from the Internet, and its security cannot be neglected. In recent years, partly because JavaScript is not strictly reviewed, more and more web pages have been implanted with all kinds of web advertisements. At best these induce the user to click through to a malicious link; at worst they attach malware or malicious dynamic link library (DLL) files to web page pictures, bypass computer and network defense systems, and directly harm users' personal computers and mobile devices through virus infection, information leakage and the like.
On the other hand, incidents in which websites are illegally controlled and large amounts of data are leaked keep emerging. One attack technique frequently used in them is to upload malicious code, such as a one-sentence Trojan, through a file upload function and then take control of the server; the harm should not be underestimated. Detecting uploaded malicious code and bypassing that detection is a contest between defenders and attackers that never stops. In recent years attackers have begun to upload seemingly "legitimate" PNG pictures to evade intrusion detection systems: the malicious code is hidden in a forged "legitimate" PNG picture by means of encoding, LSB steganography and similar techniques. Once the upload succeeds, the attacker can access the picture so that the carefully constructed attack payload hidden in it is parsed, remotely control the website server, and then carry out more damaging attempts and operations, such as stealing the private data of website users or using the website server as a zombie machine to launch denial-of-service (DoS) attacks against other servers.
Ultimately, whether on a client such as a browser or on the server side where a website is deployed, an urgent problem is to audit the pictures in web pages so as to prevent hidden malicious behavior. PNG pictures are widely used in web pages because of their small size, lossless compression and suitability for network transmission and display; they are also good covers for information hiding and therefore deserve focused study.
If the server, when handling a user's request to upload a picture file, can efficiently and accurately identify legitimate picture upload requests and analyze whether steganography has been used in the picture to carry a malicious attack payload, and if the client, when accessing web resources, can filter the picture resources in a web page and refuse to download pictures suspected of containing malicious program files, then such malicious behavior can be contained at its source.
To this end, we introduce machine learning and digital steganography techniques to solve this problem.
Machine learning is applied in every field of artificial intelligence and is one of its core technologies. With its capacity for autonomous, efficient and accurate learning, machine learning has also begun to play a major role in cyberspace security.
A machine learning system involves three closely related parts: the environment, the learning part and the execution part. The environment provides information to the learning part; the learning part uses this information to modify the knowledge base so that the execution part completes its task more efficiently; the execution part completes the task according to the knowledge base and feeds the information it obtains back to the learning part.
Taking PNG image recognition as an example, the three factors that influence the design of a machine learning system are described below:
Information provided by the environment to the system: the knowledge base stores the rules that guide the actions of the execution part, but the information the environment provides to the system varies widely. If the information is of high quality and differs little from the rules, the learning part can handle it easily. If the system is given disorganized, specific information about how to perform specific actions, it must first collect enough data, discard unnecessary details, generalize, and form rules that guide actions before putting them into the knowledge base; the task of the learning part is then heavier and its design more difficult.
Knowledge base: knowledge can be expressed in many forms, such as the header signature of a PNG image, the storage layout of a PNG image, and the end marker of a PNG image. Each representation has its own characteristics, and the choice of representation should satisfy the following four requirements:
(1) strong expressive power;
(2) easy reasoning;
(3) a knowledge base that is easy to modify;
(4) a knowledge representation that is easy to extend.
Execution part: this is the core of the whole system, because the actions of the execution part are exactly what the learning part strives to improve. During PNG image recognition, the content of the learning part is continuously adjusted according to the recognition results so as to improve accuracy at execution time.
Digital steganography is a security technique that embeds secret information in digital media without damaging the quality of the carrier. For secret information processed in this way, a third party is neither aware that the information exists nor knows its content. Steganographic carriers include images, audio and video. In recent years, steganography has become a focus of information security because it is highly variable and well concealed. Because every web site depends on multimedia resources such as audio, video and images, an attacker can use digital steganography to hide malware or a malicious attack payload in multimedia and easily bypass anti-malware detection, which creates a greater potential threat.
Taking images as an example, classical digital image steganography falls into two categories: spatial-domain steganography and transform-domain steganography. Spatial-domain steganography is mainly least significant bit (LSB) steganography. Transform-domain steganography mainly operates on the discrete cosine transform (DCT) coefficients of the image and includes JSteg, F5, OutGuess and model-based (MB) steganography.
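To make the spatial-domain case concrete, the following is a minimal sketch of LSB embedding and extraction on a flat list of 8-bit gray values. The function names and the sample data are illustrative assumptions, not part of the patent; the point is only that each carrier value changes by at most 1, which is why the change is invisible to the eye.

```python
def embed_lsb(pixels, bits):
    """Return a copy of `pixels` with `bits` written into the lowest bit of
    the first len(bits) values (hypothetical helper, sequential embedding)."""
    if len(bits) > len(pixels):
        raise ValueError("payload longer than carrier")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return out


def extract_lsb(pixels, n_bits):
    """Read back the lowest bit of the first n_bits values."""
    return [p & 1 for p in pixels[:n_bits]]


carrier = [120, 121, 200, 37, 64, 255, 0, 18]   # toy gray values
payload = [1, 0, 1, 1, 0, 0, 1, 0]              # toy secret bits
stego = embed_lsb(carrier, payload)
print(stego)
print(extract_lsb(stego, len(payload)))
```

Random LSB steganography, which the RS analysis discussed later targets, differs only in that the positions are chosen in a pseudo-random order instead of sequentially.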
Summary of the Invention
To solve the problems of the prior art, the present invention provides a malicious PNG image recognition method based on machine learning. It performs feature matching against a PNG image feature library and uses a digital steganography identification model to judge whether a PNG picture contains hidden information, so that the upload of illegal pictures is forbidden on the server side and access to illegal pictures is forbidden on the client side, which strengthens network security.
The present invention is realized by the following technical solution: a malicious PNG image recognition method based on machine learning, comprising the following steps:
Step 1: establish a PNG image feature library and a digital steganography identification model by machine learning;
Step 2: on the server side, examine every request to upload a picture file and match the PNG picture against the PNG image feature library established in Step 1; if an illegal PNG picture format is found, refuse the upload request; otherwise the PNG picture passes preliminary identification and proceeds to Step 3;
Step 3: for a PNG file that has passed preliminary identification, call the digital steganography identification model established in Step 1 to determine whether information is hidden in the PNG picture; if so, refuse the upload request; if not, allow the upload request;
Step 4: on the client side, monitor the PNG file data transmitted with a web page and match the PNG picture against the PNG image feature library established in Step 1; if an illegal PNG picture format is found, forbid access to the picture resource; otherwise proceed to Step 5;
Step 5: call the digital steganography identification model established in Step 1 to determine whether information is hidden in the PNG picture; a picture with hidden information is considered likely to conceal malicious content, and access to the picture resource is forbidden.
Preferably, the PNG image feature library of Step 1 is established as follows. First, a batch of PNG images is provided as training data and imported into the machine learning system. Next, a PNG image feature identification library is established, containing the following feature information: (1) the PNG header signature; (2) the PNG end marker, the IEND chunk; (3) the IHDR chunk, which records the image information; (4) the IDAT chunks, which store the actual image data; (5) chunks that store redundant image information. Finally, a support vector machine model is selected to perform feature learning on this library and complete the identification and classification of the target.
Preferably, the digital steganography identification model of Step 1 is established by combining shallow learning and deep learning. On the one hand, a feature library is built from the steganographic features of classical steganographic algorithms and feature learning is performed on it. On the other hand, because image quality inevitably changes slightly after steganography, PNG images with and without steganographic information are each preprocessed with a high-pass filter to enhance the image features; the resulting residual images are used as the training set, a convolutional neural network model is then selected for transfer learning, and the final output is the probability that the image contains steganography.
Preferably, building the feature library from the steganographic features of classical steganographic algorithms and performing feature learning means selecting the RS analysis algorithm to perform supervised learning on PNG images:
First, the input image used to train the model is divided into image blocks of equal size. The pixels of each block are scanned and arranged into a pixel vector $G = \{x_1, x_2, \ldots, x_n\}$, and the spatial correlation of each block is calculated with the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|$$
where $x_i$ is the gray value of each pixel; the smaller the value of $f$, the smaller the change in gray value between adjacent pixels and the stronger the spatial correlation of the block;
Then a random subset of the pixels of each block is selected and a non-negative flipping operation is applied, where the flipping functions are defined as follows:
$F_1$ denotes the mutual mapping between pixel values $2i$ and $2i+1$, i.e. $F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$;
$F_{-1}$ denotes the mutual mapping between pixel values $2i-1$ and $2i$, i.e. $F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$;
$F_0$ denotes the identity mapping, which leaves the pixel value unchanged;
Calculate the proportion $R_M$ of image blocks whose spatial correlation increases and the proportion $S_M$ of image blocks whose spatial correlation decreases;
Likewise, a random subset of the pixels of each block is selected and a non-positive flipping operation is applied, and the proportion $R_{-M}$ of image blocks whose spatial correlation increases and the proportion $S_{-M}$ of image blocks whose spatial correlation decreases are calculated;
If applying non-positive flipping increases the degree of disorder more than applying non-negative flipping does, the PNG image is labeled as having the LSB steganography feature; otherwise it is labeled as not having the LSB steganography feature, and the label is output;
The PNG image is taken as the input object and the presence or absence of the LSB steganography feature as the expected output; the input objects and expected outputs form the training data from which a learning model is established, and this learning model is then used to infer whether a new PNG image contains LSB steganography.
Compared with the prior art, the present invention has the following advantages. The invention introduces machine learning and digital steganography techniques, establishes a PNG image feature library and performs feature matching to make a preliminary judgment of whether a PNG image hides malicious information, and then uses a digital steganography identification model to further judge whether the PNG picture contains hidden information, so that the upload of illegal pictures is forbidden on the server side and access to illegal pictures is forbidden on the client side, which strengthens network security. In the steganography identification model, the RS analysis algorithm is selected to perform supervised learning on PNG images: whether the positive and negative flipping operations increase the disorder of the image to a comparable degree is used to judge whether the image has the LSB steganography feature. A convolutional neural network is then used to learn and judge the probability that the image contains steganography. The accuracy is high, and the whole model is relatively simple in design and easy to implement.
Brief Description of the Drawings
Fig. 1 is a flow chart of a malicious PNG image recognition method based on machine learning provided by an embodiment of the present invention;
Fig. 2 is a framework diagram of the digital steganography identification model in a malicious PNG image recognition method based on machine learning provided by an embodiment of the present invention.
Detailed Description
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
The invention is implemented in two parts, a server side and a client side. When the technical solution is applied on the server side, if every request to upload a picture file is recorded and passed, as test data, first through the PNG feature identification library and then through the digital steganography identification model, hackers can be effectively prevented from taking control of the server by uploading an attack payload. When the technical solution is applied on the client side, if every web resource containing a picture is recorded and passed, as test data, first through the PNG feature identification library and then through the digital steganography identification model, malicious control of the user's device can be contained at its source.
The invention first establishes a PNG image feature identification library by training on a large number of PNG images, and then establishes a digital steganography identification model by applying a combination of shallow learning and deep learning to PNG images in which information has been hidden with several steganographic techniques. In the server environment, the PNG image feature identification library is used to determine whether a file being uploaded by a client is a PNG image. If it is confirmed to be a PNG image, it is provisionally considered legal and passed to the next check; if it does not meet the requirements of the PNG format, the uploaded file is considered illegal and the upload request is refused. After a file is provisionally considered legal, the digital steganography identification model is used to detect whether information is hidden in the PNG image. If so, the file is considered a suspected malicious file and the client's upload request is refused; if not, the request is considered harmless and the upload is allowed. In the client environment, a real-time browser monitoring plug-in or another real-time monitoring tool monitors the web page picture data (here specifically PNG images) that the user browses, and the PNG image feature identification library is used for image feature identification. If an abnormal image is found (that is, the recognition result is a PNG image that does not conform to the specification), the user is forbidden to access the image resource. If no abnormality is found, the digital steganography identification model is used to detect whether information is hidden in the image; if so, the user is forbidden to access the picture resource; if not, the user can access it normally. As shown in Fig. 1, the method specifically comprises the following steps:
Step 1: establish a PNG image feature library and a digital steganography identification model by machine learning.
For the PNG image feature library, shallow learning is sufficient because the PNG picture format is uniform. First, a batch of PNG images is provided as training data and imported into the machine learning system. Next, a PNG image feature identification library is established, containing the following feature information: (1) the PNG header signature; (2) the PNG end marker, the IEND chunk; (3) the IHDR chunk, which records the image information; (4) the IDAT chunks, which store the actual image data; (5) chunks that store redundant image information (such as tEXt chunks). Finally, feature learning is performed on this hand-designed identification library; because the goal of learning is to identify and classify the target, a support vector machine (SVM) model is selected for supervised learning.
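The five kinds of feature listed above can be read directly from the byte layout of a PNG file (an 8-byte signature followed by chunks of length, type, data and CRC-32). The sketch below extracts them with the standard library only. The feature names, the `is_well_formed` rule and the helper `make_chunk` are illustrative assumptions; in the described method such features would feed an SVM rather than a fixed rule.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype, data):
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def png_features(blob):
    """Collect structural features of a byte string (hypothetical feature names)."""
    feats = {"has_signature": blob[:8] == PNG_SIGNATURE,
             "chunks": [], "crc_ok": True, "trailing_bytes": 0}
    if not feats["has_signature"]:
        return feats
    pos = 8
    while pos + 12 <= len(blob):
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        ctype = blob[pos + 4:pos + 8]
        data = blob[pos + 8:pos + 8 + length]
        crc_bytes = blob[pos + 8 + length:pos + 12 + length]
        if len(data) < length or len(crc_bytes) < 4:
            feats["crc_ok"] = False          # truncated chunk
            break
        if struct.unpack(">I", crc_bytes)[0] != zlib.crc32(ctype + data) & 0xFFFFFFFF:
            feats["crc_ok"] = False
        feats["chunks"].append(ctype.decode("latin-1"))
        pos += 12 + length
        if ctype == b"IEND":
            break
    feats["trailing_bytes"] = len(blob) - pos  # data appended after IEND
    return feats


def is_well_formed(feats):
    """Fixed rule standing in for the learned classifier (assumption)."""
    chunks = feats["chunks"]
    return (feats["has_signature"] and feats["crc_ok"] and bool(chunks)
            and chunks[0] == "IHDR" and "IDAT" in chunks
            and chunks[-1] == "IEND" and feats["trailing_bytes"] == 0)


# A minimal 1x1 grayscale PNG built in memory for the demonstration.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
good = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
        + make_chunk(b"IEND", b""))
print(png_features(good))
print(is_well_formed(png_features(good + b"<?php ?>")))
```

Bytes appended after the IEND chunk and unexpected ancillary chunks are two simple places where a payload can ride along in an otherwise valid file, which is why they are worth recording as features.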
For the digital steganography identification model, apart from a few classical steganographic algorithms, the features of algorithms that are modified from the classical ones or independently designed are difficult to detect, so the invention combines shallow learning and deep learning:
On the one hand, a feature library is built from the steganographic features of classical steganographic algorithms and feature learning is performed on it. Classical steganographic algorithms here means spatial-domain algorithms such as least significant bit (LSB) steganography. The RS (Regular and Singular groups) analysis algorithm detects secret information from the change in image smoothness before and after steganography, and it is very robust against random LSB steganography (in which the secret message is embedded in least significant bits selected in a random order). The RS analysis algorithm is therefore selected to perform supervised learning on PNG images, as follows:
First, the input image used to train the model is divided into image blocks of equal size. The pixels of each block are scanned in zigzag order and arranged into a pixel vector $G = \{x_1, x_2, \ldots, x_n\}$, and the spatial correlation of each block is calculated with the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} |x_{i+1} - x_i|$$
where $x_i$ is the gray value of each pixel; the smaller the value of $f$, the smaller the change in gray value between adjacent pixels and the stronger the spatial correlation of the block.
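As a small worked example of this step, the sketch below scans a square block in zigzag order and evaluates $f$ on the resulting vector. It assumes square blocks given as nested lists and a JPEG-style zigzag traversal; the text does not fix either detail, so both are assumptions, as are the function names.

```python
def zigzag(block):
    """Flatten a square block along its anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):                      # s = row + column
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()                         # even diagonals run bottom-left to top-right
        out.extend(block[i][j] for i, j in cells)
    return out


def smoothness(vector):
    """Discrimination function f: sum of absolute differences of neighbours."""
    return sum(abs(b - a) for a, b in zip(vector, vector[1:]))


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
g = zigzag(block)
print(g)              # [1, 2, 4, 7, 5, 3, 6, 8, 9]
print(smoothness(g))  # 16
```

A perfectly flat block gives $f = 0$, and any noise added to the block can only keep $f$ the same or change it, which is the property the flipping test below exploits.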
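Then a random subset of the pixels of each block is selected and a non-negative flipping operation ($F_1$ and $F_0$) is applied, where the flipping functions are defined as follows:
$F_1$ denotes the mutual mapping between pixel values $2i$ and $2i+1$, i.e. $F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$;
$F_{-1}$ denotes the mutual mapping between pixel values $2i-1$ and $2i$, i.e. $F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$;
$F_0$ denotes the identity mapping, which leaves the pixel value unchanged.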
Calculate the proportion of image blocks whose spatial correlation increases (denoted $R_M$) and the proportion of image blocks whose spatial correlation decreases (denoted $S_M$):
$$R_M + S_M \le 1$$
Likewise, a random subset of the pixels of each block is selected and a non-positive flipping operation ($F_{-1}$ and $F_0$) is applied, and the proportion of image blocks whose spatial correlation increases (denoted $R_{-M}$) and the proportion of image blocks whose spatial correlation decreases (denoted $S_{-M}$) are calculated:
$$R_{-M} + S_{-M} \le 1$$
Statistically, if the image has not undergone LSB steganography, applying non-negative or non-positive flipping destroys the spatial correlation of the image blocks to the same extent, that is, increases their degree of disorder equally. In that case $R_M \approx R_{-M}$, $S_M \approx S_{-M}$, and $R_M > S_M$, $R_{-M} > S_{-M}$.
Therefore, if the increase in disorder caused by non-positive flipping is greater than the increase caused by non-negative flipping, the PNG image is considered very likely to contain LSB steganography and is labeled as having the LSB steganography feature; otherwise it is labeled as not having the LSB steganography feature, and the label is output. Finally, the input objects (PNG images) and the expected outputs (whether the LSB steganography feature is present) form the training data from which a learning model is established, and this learning model is then used to infer whether a new PNG image contains LSB steganography.
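The labeling rule above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: pixels are taken as a flat list, groups are consecutive runs of four values, a fixed mask `(0, 1, 1, 0)` replaces the random pixel selection, and a group counts toward $R$ when $f$ grows after flipping and toward $S$ when $f$ shrinks (the usual RS convention). The `margin` threshold and all function names are hypothetical.

```python
import random


def smoothness(group):
    """Discrimination function f: sum of absolute differences of neighbours."""
    return sum(abs(b - a) for a, b in zip(group, group[1:]))


def flip(x, m):
    """Apply F1 (m = 1), F-1 (m = -1) or F0 (m = 0) to one gray value."""
    if m == 1:
        return x ^ 1                 # 2i <-> 2i+1
    if m == -1:
        return ((x + 1) ^ 1) - 1     # 2i-1 <-> 2i
    return x


def rs_ratios(pixels, mask=(0, 1, 1, 0)):
    """Return (R_M, S_M, R_-M, S_-M) over consecutive groups of len(mask) pixels."""
    n = len(mask)
    groups = [pixels[i:i + n] for i in range(0, len(pixels) - n + 1, n)]
    if not groups:
        raise ValueError("not enough pixels for one group")
    r_pos = s_pos = r_neg = s_neg = 0
    for g in groups:
        base = smoothness(g)
        pos = smoothness([flip(x, m) for x, m in zip(g, mask)])    # non-negative flipping
        neg = smoothness([flip(x, -m) for x, m in zip(g, mask)])   # non-positive flipping
        r_pos += pos > base
        s_pos += pos < base
        r_neg += neg > base
        s_neg += neg < base
    total = len(groups)
    return r_pos / total, s_pos / total, r_neg / total, s_neg / total


def has_lsb_feature(pixels, margin=0.3):
    """Label: does non-positive flipping add clearly more disorder than non-negative?"""
    r_pos, s_pos, r_neg, s_neg = rs_ratios(pixels)
    return (r_neg - s_neg) - (r_pos - s_pos) > margin


rng = random.Random(0)
cover = [100] * 4000                                     # toy flat cover image
stego = [(p & ~1) | rng.getrandbits(1) for p in cover]   # every LSB randomized
print(rs_ratios(cover), has_lsb_feature(cover))
print(rs_ratios(stego), has_lsb_feature(stego))
```

On the flat toy cover both kinds of flipping disturb every group equally, so the label is negative. Once the least significant bits are randomized, $F_1$ only shuffles values that are already random while $F_{-1}$ pushes them further apart, so the gap between the two opens up and the label turns positive.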
On the other hand, because image quality inevitably changes slightly after steganography, PNG images with and without steganographic information are first each preprocessed with a high-pass filter to enhance the image features, and the resulting residual images are used as the training set. Convolutional neural networks are well suited to image processing because of their strength in spatial mapping, and transfer learning reduces the amount of data needed to build a neural network when data is insufficient. The improved convolutional neural network (CNN) model of Lionel Pibre et al. is therefore selected for transfer learning. The main idea is as follows:
The convolutional neural network pre-trained by Lionel Pibre et al. is used as the feature extractor; the last layer of the network is replaced with our own classifier, the weights of the other layers are then fixed, and the whole network is trained.
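The high-pass preprocessing mentioned above can be illustrated with a small pure-Python filter. The text does not specify the kernel, so the 3x3 Laplacian used here is an assumption chosen for brevity, and `high_pass_residual` is a hypothetical name; the sketch only shows why the residual suppresses image content and keeps small perturbations.

```python
# Assumed kernel: a 3x3 Laplacian. It sums to zero, so flat regions map to 0.
LAPLACIAN = [[0, -1, 0],
             [-1, 4, -1],
             [0, -1, 0]]


def high_pass_residual(img, kernel=LAPLACIAN):
    """Filter a 2-D list of gray values with `kernel` (valid region only)."""
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    out = []
    for y in range(k, h - k):
        row = []
        for x in range(k, w - k):
            acc = 0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    acc += kernel[dy + k][dx + k] * img[y + dy][x + dx]
            row.append(acc)
        out.append(row)
    return out


flat = [[50] * 5 for _ in range(5)]
touched = [row[:] for row in flat]
touched[2][2] = 51                      # a single +1 change, as LSB embedding would make
print(high_pass_residual(flat))         # all zeros
print(high_pass_residual(touched))      # the change stands out in the residual
```

A one-level change that is invisible in the image becomes the only non-zero structure in the residual, which is the kind of input the convolutional network is then trained on.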
Referring to Fig. 2, the structure of the convolutional neural network model is as follows:
Input: the values of all pixels of the preprocessed residual image;
Feature construction layers: the pre-trained model is used as the feature extractor;
Classifier: a fully connected layer followed by a classification function (softmax);
Output: the probability that the image contains steganography; when the output probability is greater than 0.8, the image is considered to contain steganography.
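The output stage described above amounts to a softmax over the classifier scores followed by the 0.8 threshold. A minimal sketch follows; the two-class score ordering `[clean, stego]` and the function names are assumptions made for illustration.

```python
import math

STEGO_THRESHOLD = 0.8   # decision threshold stated in the text


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_stego(scores):
    """`scores` = [clean_score, stego_score] from the fully connected layer (assumed order)."""
    p = softmax(scores)[1]
    return p > STEGO_THRESHOLD, p


print(is_stego([0.0, 2.0]))   # probability about 0.88, above the threshold
print(is_stego([0.0, 1.0]))   # probability about 0.73, below the threshold
```

A threshold above 0.5 trades some missed detections for fewer false alarms, which matters when a positive result blocks an upload or a page resource.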
The classifier is built using the blind detection method based on image quality metrics (Image Quality Metrics, IQM) proposed by Avcibas, as follows:
1. Feature vectors are selected by defining a variety of image quality measures; to extract more distinctive features, analysis of variance (Analysis of Variance, ANOVA) is used. Taking the Minkowsky feature as an example, the dissimilarity norm of two images can be expressed by taking the Minkowsky average of the pixel differences spatially and then averaging over the chroma components (that is, over all bands):
$$M_\gamma = \frac{1}{K}\sum_{k=1}^{K}\left\{\frac{1}{N}\sum_{i,j}\left|C_k(i,j)-\hat{C}_k(i,j)\right|^{\gamma}\right\}^{1/\gamma}$$
where M_γ is the absolute average error when γ = 1 and the mean square error when γ = 2, C_k(i, j) is the multispectral component of the normal image at pixel position (i, j) in band k, \hat{C}_k(i, j) is the corresponding multispectral component of the stego image, K is the number of bands, and N is the total number of image pixels;
2. The selected IQMs (Image Quality Metrics) form a multidimensional feature space in which normal images and stego images are more easily distinguished;
3. After a suitable feature set has been chosen, a multiple linear regression model is built on a large amount of experimental data, and a classifier that distinguishes normal images from stego images is built on the basis of the regression model.
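As an illustration of item 1, the Minkowsky measure can be computed as in the sketch below. It assumes the usual form of the metric: per band, the mean of the absolute pixel differences raised to the power γ, then the 1/γ root, then the average over bands. With that form, γ = 2 yields the square root of the per-band mean square error. The function name and the nested-list image layout are hypothetical.

```python
def minkowsky_metric(normal, stego, gamma):
    """Minkowsky dissimilarity between two images of identical shape.

    normal, stego: lists of bands; each band is a list of rows of pixel values.
    gamma:         positive exponent (1 gives the mean absolute difference).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(normal) != len(stego) or not normal:
        raise ValueError("images must have the same, non-zero number of bands")
    per_band = []
    for band_a, band_b in zip(normal, stego):
        total = 0.0
        count = 0
        for row_a, row_b in zip(band_a, band_b):
            for a, b in zip(row_a, row_b):
                total += abs(a - b) ** gamma
                count += 1
        per_band.append((total / count) ** (1.0 / gamma))
    return sum(per_band) / len(per_band)
```

In a full IQM detector this value would be one coordinate of the feature vector that feeds the regression-based classifier described in item 3.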
Step 2: On the server side, every request to upload a picture file is examined. The data are first decoded and preprocessed, and the PNG picture is then matched against the PNG image feature library established in step 1. If an illegal PNG picture format is found, the upload request is rejected; otherwise, the PNG picture passes preliminary identification and proceeds to step 3.
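A purely structural version of this format check can be sketched as below. The method itself learns from the feature library with a support vector machine; this sketch only walks the PNG chunk layout (signature, IHDR, IDAT, IEND, and any data appended after IEND), and the function names and the acceptance rule are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def inspect_png_structure(data):
    """Walk the chunks of a PNG byte string and report structural features."""
    report = {
        "signature_ok": data.startswith(PNG_SIGNATURE),
        "chunks": [],
        "crc_ok": True,
        "trailing_bytes": 0,
    }
    if not report["signature_ok"]:
        return report
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if end > len(data):
            report["crc_ok"] = False  # truncated chunk
            break
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(chunk_type + body) & 0xFFFFFFFF != stored_crc:
            report["crc_ok"] = False
        report["chunks"].append(chunk_type.decode("latin-1"))
        pos = end
        if chunk_type == b"IEND":
            break
    # Bytes after IEND are a common place to append a hidden payload.
    report["trailing_bytes"] = len(data) - pos
    return report


def looks_like_legal_png(data):
    """Hypothetical acceptance rule built on the structural report."""
    report = inspect_png_structure(data)
    chunks = report["chunks"]
    return (
        report["signature_ok"]
        and report["crc_ok"]
        and len(chunks) >= 3
        and chunks[0] == "IHDR"
        and chunks[-1] == "IEND"
        and "IDAT" in chunks
        and report["trailing_bytes"] == 0
    )
```

A file that fails this rule would be rejected at the preliminary stage in this sketch; a file that passes would still go on to the steganography model in step 3.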
In this step, the request to upload a picture file is examined against the following information: (1) the file extension; (2) the content type (Content-Type) declared in the HTTP message header; (3) whether the transmitted content is encoded; (4) whether the transmitted content is legal.
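The four checks listed above might be combined as in the following sketch. The source does not specify how each check is performed, so everything here is an assumption made for illustration: the function name `review_upload`, the decision to support only base64 as a transfer encoding, and the use of the PNG signature as the content legality test.

```python
import base64
import os

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def review_upload(filename, content_type, body, transfer_encoding=None):
    """Return the list of failed checks; an empty list means the request passes."""
    problems = []
    # (1) File extension.
    if os.path.splitext(filename)[1].lower() != ".png":
        problems.append("extension")
    # (2) Content-Type declared in the HTTP header (parameters are ignored).
    if content_type.split(";")[0].strip().lower() != "image/png":
        problems.append("content-type")
    # (3) Undo the transfer encoding before looking at the content.
    if transfer_encoding == "base64":
        try:
            body = base64.b64decode(body, validate=True)
        except ValueError:
            problems.append("encoding")
            return problems
    elif transfer_encoding is not None:
        problems.append("encoding")
        return problems
    # (4) Content legality, reduced here to a signature test.
    if not body.startswith(PNG_SIGNATURE):
        problems.append("content")
    return problems
```

Returning the full list of failed checks rather than a single boolean makes it easier to log why an upload was refused.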
Step 3: For a PNG picture file that has passed preliminary identification, the steganography identification model established in step 1 is called to determine whether the PNG picture contains hidden information. If it does, the upload request is rejected; if it does not, the upload request is allowed.
Step 4: On the client side, the PNG picture file data transmitted while web pages load are monitored in real time, for example by a browser monitoring plug-in. The data are decoded and otherwise preprocessed, and the PNG picture is then matched against the PNG image feature library established in step 1. If an illegal PNG picture format is found, access to the picture resource is forbidden; otherwise, the method proceeds to step 5.
Client-side monitoring of web page PNG image data refers specifically to monitoring whether the PNG image data themselves contain hidden information; a picture that visibly carries an enticing malicious link is outside the scope of consideration.
Step 5: As in step 3, the steganography identification model established in step 1 is called to determine whether the PNG picture contains hidden information. A picture that contains hidden information is considered likely to conceal malicious information, and access to the picture resource is forbidden.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A malicious PNG image identification method based on machine learning, characterized by comprising the following steps:
Step 1: establishing a PNG image feature library and a steganography identification model by machine learning;
Step 2: examining, on the server side, every request to upload a picture file, and matching the PNG picture against the PNG image feature library established in step 1; if an illegal PNG picture format is found, rejecting the upload request; otherwise, the PNG picture passes preliminary identification and proceeds to step 3;
Step 3: for a PNG picture file that has passed preliminary identification, calling the steganography identification model established in step 1 to determine whether the PNG picture contains hidden information; if it does, rejecting the upload request; if it does not, allowing the upload request;
Step 4: monitoring, on the client side, the PNG picture file data transmitted while web pages load, and matching the PNG picture against the PNG image feature library established in step 1; if an illegal PNG picture format is found, forbidding access to the picture resource; otherwise, proceeding to step 5;
Step 5: calling the steganography identification model established in step 1 to determine whether the PNG picture contains hidden information; a picture that contains hidden information is considered likely to conceal malicious information, and access to the picture resource is forbidden.
2. The malicious PNG image identification method based on machine learning according to claim 1, characterized in that the PNG image feature library of step 1 is established as follows: first, a batch of PNG images is provided and imported into the machine learning system as training set data; next, a PNG image feature identification library is established that includes the following feature information: (1) the PNG header features; (2) the IEND chunk, which marks the end of the PNG; (3) the IHDR chunk, which records the PNG image information; (4) the IDAT chunks, which store the actual image data; (5) chunks that store redundant image information; finally, a support vector machine model is selected to perform feature learning on this identification library and to complete identification and classification of the target.
3. The malicious PNG image identification method based on machine learning according to claim 1, characterized in that the steganography identification model of step 1 is established by combining shallow learning and deep learning: on the one hand, a feature library is established from the steganographic features of classical steganography algorithms and feature learning is performed; on the other hand, because image quality inevitably changes slightly after steganographic embedding, the PNG images that contain steganographic information and the PNG images that do not are each preprocessed with a high-pass filter to enhance the distinguishing features of the image, the resulting residual images are used as the training set, a convolutional neural network model is then selected for transfer learning, and the probability that an image contains steganography is finally output.
4. The malicious PNG image identification method based on machine learning according to claim 3, characterized in that establishing the feature library from the steganographic features of classical steganography algorithms and performing feature learning means selecting the RS analysis algorithm to perform supervised learning on PNG images:
The image used to train the model is first divided into multiple image blocks of identical size, each image block is scanned into a pixel vector G = {x_1, x_2, ..., x_n}, and the spatial correlation of each image block is calculated using the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|$$
where x_i is the gray value of each pixel; the smaller the value of f, the smaller the change in gray value between neighboring pixels and the stronger the spatial correlation of the image block;
A non-negative flipping operation is then applied to randomly selected pixels of each image block, where the flipping functions are defined as follows:
F_1 denotes the mutual exchange of the pixel values 2i and 2i+1, that is, $F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$;
F_{-1} denotes the mutual exchange of the pixel values 2i-1 and 2i, that is, $F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$;
F_0 denotes leaving the pixel value unchanged;
The ratio R_M of image blocks whose spatial correlation increases and the ratio S_M of image blocks whose spatial correlation decreases are calculated;
Likewise, a non-positive flipping operation is applied to randomly selected pixels of each image block, and the ratio R_{-M} of image blocks whose spatial correlation increases and the ratio S_{-M} of image blocks whose spatial correlation decreases are calculated;
If the increase in disorder that the non-positive flipping operation causes in the image exceeds the increase caused by the non-negative flipping operation, the PNG image is labeled as having LSB steganography features; otherwise, it is labeled as having no LSB steganography features; the label is then output;
With the PNG image as the input object and the presence or absence of LSB steganography features as the expected output, the input objects and expected outputs together form the training data from which a learned model is built; this model then infers whether a new PNG image contains LSB steganography.
5. The malicious PNG image identification method based on machine learning according to claim 3, characterized in that the structure of the convolutional neural network model includes:
Input: the pixel values of all points of the processed residual image;
Feature construction layer: the pre-trained model, used as the feature extractor;
Classifier: a fully connected layer followed by a classification function;
Output: the probability that the image contains steganography; when the output probability is greater than 0.8, the image is considered to contain steganography.
6. The malicious PNG image identification method based on machine learning according to claim 5, characterized in that the classifier is built using a blind detection method based on image quality metrics:
Using analysis of variance, feature vectors are selected by defining a variety of image quality measures; the dissimilarity norm of two images is expressed by taking the Minkowsky average of the pixel differences spatially and then averaging over the chroma components:
$$M_\gamma = \frac{1}{K}\sum_{k=1}^{K}\left\{\frac{1}{N}\sum_{i,j}\left|C_k(i,j)-\hat{C}_k(i,j)\right|^{\gamma}\right\}^{1/\gamma}$$
where M_γ is the absolute average error when γ = 1 and the mean square error when γ = 2, C_k(i, j) is the multispectral component of the normal image at pixel position (i, j) in band k, \hat{C}_k(i, j) is the corresponding multispectral component of the stego image, K is the number of bands, and N is the total number of image pixels;
The selected image quality metrics form a multidimensional feature space;
After a suitable feature set has been chosen, a multiple linear regression model is built on a large amount of experimental data, and a classifier that distinguishes normal images from stego images is built on the basis of the regression model.
7. The malicious PNG image identification method based on machine learning according to claim 1, characterized in that, in step 2, the request to upload a picture file is examined against the following information: (1) the file extension; (2) the content type (Content-Type) declared in the HTTP message header; (3) whether the transmitted content is encoded; (4) whether the transmitted content is legal.
CN201810128524.7A 2018-02-08 2018-02-08 Malicious PNG image identification method based on machine learning Active CN108509775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810128524.7A CN108509775B (en) 2018-02-08 2018-02-08 Malicious PNG image identification method based on machine learning

Publications (2)

Publication Number Publication Date
CN108509775A true CN108509775A (en) 2018-09-07
CN108509775B CN108509775B (en) 2020-11-13

Family

ID=63375310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810128524.7A Active CN108509775B (en) 2018-02-08 2018-02-08 Malicious PNG image identification method based on machine learning

Country Status (1)

Country Link
CN (1) CN108509775B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992967A (en) * 2019-03-12 2019-07-09 福建拓尔通软件有限公司 A kind of method and system for realizing automatic detection file security when file uploads
CN110309654A (en) * 2019-06-28 2019-10-08 四川长虹电器股份有限公司 The safety detection method and device that picture uploads
CN110942034A (en) * 2019-11-28 2020-03-31 中国科学院自动化研究所 Method, system and device for detecting multi-type depth network generated image
CN110995954A (en) * 2019-10-11 2020-04-10 中国平安财产保险股份有限公司 Method and device for detecting picture steganography, computer equipment and storage medium
WO2020140422A1 (en) * 2019-01-02 2020-07-09 Boe Technology Group Co., Ltd. Neural network for automatically tagging input image, computer-implemented method for automatically tagging input image, apparatus for automatically tagging input image, and computer-program product
WO2020151173A1 (en) * 2019-01-25 2020-07-30 深信服科技股份有限公司 Webpage tampering detection method and related apparatus
CN112632475A (en) * 2020-12-30 2021-04-09 郑州轻工业大学 Picture copyright protection system and method based on state password and picture steganography
CN113111200A (en) * 2021-04-09 2021-07-13 百度在线网络技术(北京)有限公司 Method and device for auditing picture file, electronic equipment and storage medium
CN113112472A (en) * 2021-04-09 2021-07-13 百度在线网络技术(北京)有限公司 Image processing method and device
GB2590916A (en) * 2020-01-05 2021-07-14 British Telecomm Steganographic malware detection
GB2590917A (en) * 2020-01-05 2021-07-14 British Telecomm Steganographic malware identification
CN113806747A (en) * 2021-11-18 2021-12-17 浙江鹏信信息科技股份有限公司 Trojan horse picture detection method and system and computer readable storage medium
CN115296823A (en) * 2022-09-29 2022-11-04 佛山蚕成科技有限公司 Credible digital badge security authentication method and system
WO2023136775A3 (en) * 2021-12-17 2023-09-07 Grabtaxi Holdings Pte. Ltd. Method for filtering images and image hosting server

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080013817A1 (en) * 2006-07-11 2008-01-17 Fujitsu Limited Code image processing method and code image processing apparatus
CN106874936A (en) * 2017-01-17 2017-06-20 腾讯科技(上海)有限公司 Image propagates monitoring method and device
CN107292315A (en) * 2016-04-11 2017-10-24 北京大学 Steganalysis method and hidden information analysis device based on multiple dimensioned LTP features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHIEW KANG LENG 等: "JPEG Image Steganalysis Improvement Via Image-to-Image Variation Minimization", 《2008 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING》 *
李雨 等: "基于稀疏编码的图像隐写检测技术研究", 《通信技术》 *

Also Published As

Publication number Publication date
CN108509775B (en) 2020-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant