GB2590916A - Steganographic malware detection - Google Patents

Steganographic malware detection

Info

Publication number
GB2590916A
Authority
GB
United Kingdom
Prior art keywords
image
malware
data
classifier
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2000083.2A
Other versions
GB202000083D0 (en)
Inventor
Kallos George
El-Moussa Fadi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC filed Critical British Telecommunications PLC
Priority to GB2000083.2A priority Critical patent/GB2590916A/en
Publication of GB202000083D0 publication Critical patent/GB202000083D0/en
Publication of GB2590916A publication Critical patent/GB2590916A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • G06T1/0028Adaptive watermarking, e.g. Human Visual System [HVS]-based watermarking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32352Controlling detectability or arrangements to facilitate detection or retrieval of the embedded information, e.g. using markers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00General purpose image data processing
    • G06T2201/005Image watermarking
    • G06T2201/0201Image watermarking whereby only tamper or origin are detected and no embedding takes place

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

A computer implemented method of detecting data stored within an image for or by malware, the method comprising: for each of a plurality of input images, storing a portion of malware data within the input image by a steganographic process to create a respective second image; training a classifier to classify each second image to its respective input image; receiving a third image; and detecting malware data within the third image by executing the classifier based on the third image.

Description

Steganographic Malware Detection

The present invention relates to the detection of malicious software concealed using steganography.
Steganography is the concealment of information within other information, such as concealing a file, message, image, or video within another file, message, image, or video. Steganographic techniques are increasingly employed for embedding data within digital images where the data is hidden in the content of the image. Malicious software (malware) uses steganography to store executable code, command and control instructions and/or parameters, and/or stolen information within the content of images such as images that are communicated via websites. The use of such techniques for the malicious communication of information introduces additional challenges for the detection, mitigation and remediation of malware in computer systems and networks.
Accordingly, it is beneficial to provide improvements in the detection of malware.
According to a first aspect of the present invention, there is provided a computer implemented method of detecting data stored within an image for or by malware, the method comprising: for each of a plurality of input images, storing a portion of malware data within the input image by a steganographic process to create a respective second image; training a classifier to classify each second image to its respective input image; receiving a third image; and detecting malware data within the third image by executing the classifier based on the third image.
Preferably, the malware is detected within the third image based on a degree of confidence of classification of the third image by the classifier.
Preferably, the malware data includes one or more of: executable malware code; and malware command and/or control instructions.
Preferably, the steganographic process is one of: an image domain steganographic process in which the portion of malware data is stored in the input image by adjusting an intensity of pixels in the input image; and a transform domain steganographic process in which the portion of malware data is stored in the input image by transforming the input image and then storing the malware data in the input image.
Preferably, the classifier is trained using backpropagation.
According to a second aspect of the present invention, there is provided a computer system including a processor and memory storing computer program code for performing the steps of the method set out above.
According to a third aspect of the present invention, there is provided a computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of the method set out above.
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a computer system suitable for the operation of embodiments of the present invention;
Figure 2 is a component diagram of an arrangement for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention;
Figure 3 is a flowchart of a method for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention;
Figure 4 is a component diagram of an arrangement for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention; and
Figure 5 is a flowchart of a method for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention.
Figure 1 is a block diagram of a computer system suitable for the operation of embodiments of the present invention. A central processor unit (CPU) 102 is communicatively connected to a storage 104 and an input/output (I/O) interface 106 via a data bus 108. The storage 104 can be any read/write storage device such as a random-access memory (RAM) or a non-volatile storage device. An example of a non-volatile storage device includes a disk or tape storage device. The I/O interface 106 is an interface to devices for the input or output of data, or for both input and output of data. Examples of I/O devices connectable to I/O interface 106 include a keyboard, a mouse, a display (such as a monitor) and a network connection.
Figure 2 is a component diagram of an arrangement for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention.
A classifier 200 is provided as a machine learning component suitable for generating a classification as an output based on an input set of parameters. For example, the classifier 200 can be implemented as a neural network, autoencoder or support vector machine, though other suitable classifiers exist and may become available. The classifier is configured to accept, as an input data set, a data structure corresponding to image data, such as a vector or matrix representation of an image including, for example, image pixel data encoded using a plurality of colour values. Such data is preferably normalised; for example, a colour value occurring in the range 0 to 255 can be mapped to a normalised range of 0 to 1 using well-known techniques for numeric normalisation. The classifier 200 is trained to generate an output classification indicative of a correspondence between an input data set and an output data set of the classifier; for example, an output of the classifier can be a vector or matrix representation. In accordance with embodiments of the present invention, the classifier 200 is arranged to take image data as an input and classify such image data to other image data as an output. In particular, the classifier can be provided as a feedforward neural network trained using a supervised back-propagation algorithm. Such training can be provided by way of a trainer (not illustrated), such as a hardware, software, firmware or combination component arranged to provide classifier training functionality based on training data provided as a plurality of training examples.
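The normalisation described above can be sketched as follows. This is an illustrative fragment only, not part of the claimed method; the function name is chosen for this example.

```python
def normalise_pixels(pixels, max_value=255):
    """Scale integer colour values in 0..max_value into the range [0, 1]
    so they are suitable as classifier input features."""
    return [p / max_value for p in pixels]

# A row of 8-bit grey values becomes a normalised feature vector.
print(normalise_pixels([0, 51, 255]))
```

The same scaling applies per colour channel when pixels carry a plurality of colour values.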
In accordance with embodiments of the present invention, the classifier 200 is trained based on a plurality of input images 202 such as bitmapped, raster or other suitable images.
Each input image used for training the classifier is subjected to a steganographic process 206 by which at least a portion of malware data 204 is stored in the input image to create a second image. Any suitable steganographic process can be employed such as any of the steganographic processes described in "An Overview of Image Steganography" (Morkel, T., Eloff, J.H. and Olivier, M.S. (2005), Information Security South Africa, Johannesburg, 29 June-1 July 2005, 1-11). Most preferably, the steganographic process 206 employed for generating the second image for training the classifier 200 is a steganographic process known to be utilised by a malware for communicating malware data within images.
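For illustration, one widely known image-domain technique of the kind surveyed by Morkel et al. is least-significant-bit (LSB) embedding, in which payload bits replace the low-order bit of successive pixel values. The sketch below is a minimal example of such a process under that assumption, not the specific process of any particular malware; all names are chosen for this example.

```python
def embed_lsb(pixels, payload):
    """Store the payload in the least significant bit of successive
    pixel intensity values, most significant bit of each byte first."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # each pixel changes by at most 1
    return stego

def extract_lsb(pixels, n_bytes):
    """Recover n_bytes of payload from the pixel least significant bits."""
    bits = [p & 1 for p in pixels[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[j : j + 8]))
        for j in range(0, len(bits), 8)
    )
```

Because each pixel intensity changes by at most one, the modification is imperceptible to a viewer, which is what makes such images useful training examples for the classifier 200.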
The malware data 204 stored in the input image to generate the second image can be any malware data including: executable malware code; malware script; malware command instructions; and malware control instructions.
The classifier 200 is trained using a combination of an input image (unamended by steganography) and the corresponding second image (the input image amended by the steganographic process 206), so as to train the classifier 200 to classify the second image to the input image. Thus, training the classifier 200 with multiple (and potentially many) such input-image and second-image pairs results in the classifier 200 being adapted to classify any image containing data stored using the steganographic process 206 to an original of such image with a greater degree of confidence than an image devoid of steganographically stored content. That is to say that a new image, such as third image 208, containing data stored therein using the steganographic process 206 will be classified by the classifier 200 to an output vector with a high degree of confidence, the output vector corresponding to an original version of such third image 208 prior to the application of the steganographic process 206. Furthermore, should the third image 208 not contain data stored therein using the steganographic process 206, then the classifier 200 will classify the third image 208 with a lower degree of confidence (or not at all), indicative of an absence of data stored therein using the steganographic process 206. Accordingly, the trained classifier 200 can be considered to encode the effect of the steganographic process 206 on an input image by confidently classifying such an image, and thus constitutes an effective measure for detecting the application of the steganographic process 206 in any image.
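The confidence-based detection decision can be sketched as a simple threshold on the classifier's most confident output. The softmax conversion and the 0.9 threshold below are illustrative assumptions, not values prescribed by the embodiment.

```python
import math

def softmax(scores):
    """Convert raw classifier scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def detect_stego(scores, threshold=0.9):
    """Flag an image as steganographically modified when the classifier
    maps it to some original image with confidence above the threshold."""
    return max(softmax(scores)) >= threshold
```

A confidently classified image (one score dominating) is flagged; an image the classifier cannot map to any original with confidence is not.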
Responsive to the classifier 200 classifying a new image 208 as including data stored therein using the steganographic process 206, a responder component 210, as a hardware, software, firmware or combination component, can be configured to provide a responsive action to such classification. For example, the responder component 210 can implement, trigger or provide responsive action(s) such as: isolating, quarantining or deleting the image 208; triggering further scanning of the image 208; alerting a user to the existence of the image 208; dispatching, sending or otherwise communicating the image 208 to a malware reporting, scanning or protection component; utilising the image 208 as input to train a further, additional or downstream malware detection component; adding the image 208 to a register of detected malware; and other responsive measures as will be apparent to those skilled in the art.
Figure 3 is a flowchart of a method for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention. Initially, at step 302, the method loops through each of a plurality of input images 202. At step 304, for a current input image 202, the method stores a portion of malware data 204 in the image using a steganographic process 206 to create a second image. The method loops through all input images at step 306. At step 307 the method trains a classifier 200 to classify each second image to each respective input image. At step 308 a third image 208 is received and the classifier is executed at step 310 to determine if the third image 208 can be confidently classified to indicate that the third image 208 includes data stored therein using the steganographic process 206. Where such storage of data is detected at step 312, the method triggers responsive actions at step 314.
Figure 4 is a component diagram of an arrangement for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention. Many of the elements of Figure 4 are identical to those described above with respect to Figure 2 and these will not be repeated here. The arrangement of Figure 4 reflects the potential for particular steganographic processes to indicate a particular type of malware such that a particular malware may consistently utilise one or more particular steganographic techniques to store malware data within images. Figure 4 is arranged to provide the malware detection of Figure 2 with additional malware identification based on a plurality of steganographic processes 406a and 406b. Thus, multiple classifiers are provided 400a, 400b each being trained based on a plurality of input images 202 in which malware data 204 is stored using different steganographic processes 406a, 406b. Thus, classifier 400a is trained based on input images 202 in which malware data 204 is stored using a first steganographic process 406a, so as to generate first images as training data for first classifier 400a. Similarly, classifier 400b is trained based on input images 202 in which malware data 204 is stored using a second steganographic process 406b, so as to generate second images as training data for second classifier 400b.
Subsequently, in use, a third image such as image 408 is classified by each of the first and second classifiers 400a, 400b so as to detect the presence of malware data stored in the third image 408 and, additionally, to determine which of the steganographic processes 406a, 406b was used in storing such malware data in the third image 408. The determination of the steganographic process used can be made based on a confidence of classification by each of the first and second classifiers 400a, 400b, such that a more confident classification by a classifier indicates the steganographic process used to train that classifier. Malware being stored in images using a particular steganographic process is indicative of the type of malware, and the determination of the steganographic process used based on the first and second classifiers 400a, 400b thus serves to identify the malware, type of malware or category of malware used in the third image 408.
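The selection among classifiers described above amounts to taking the most confident classifier, subject to a minimum confidence. The process names, the confidence mapping and the threshold in this sketch are assumptions for illustration only.

```python
def identify_process(confidence_by_process, threshold=0.9):
    """Given each classifier's confidence keyed by the steganographic
    process it was trained on, return the most likely process, or None
    when no classifier is confident enough (no embedding detected)."""
    name, conf = max(confidence_by_process.items(), key=lambda kv: kv[1])
    return name if conf >= threshold else None
```

For example, a high confidence from the classifier trained on an LSB-style process, alongside a low confidence from a transform-domain classifier, identifies the former and hence the associated malware type.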
While a common set of input images 202 and malware data 204 is depicted in Figure 4, it will be apparent to those skilled in the art that a separate set, or intersecting sets, of input images and malware data may be used for training each of the first and second classifiers 400a, 400b. Notably, while two classifiers are depicted in Figure 4 and described herein, any number of classifiers may be used, each corresponding to any number of steganographic processes, so as to distinguish potentially multiple malwares or malware types. In one embodiment, particular malware or malware types are not known when training the classifiers and, in such embodiments, the application of multiple classifiers serves to categorise malware stored in images 408 into groups of malware that can be considered similar or related, even where the particular malware is unknown (such as during a zero-day attack). Furthermore, the output of multiple classifiers for multiple images 408 can be clustered using, for example, k-means clustering techniques, to group unknown or partially known malwares for similar handling, such as for similar responsive actions by the responder 410.
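The clustering of classifier outputs mentioned above can be sketched with a plain k-means implementation over per-image confidence vectors. This standalone version (fixed iteration count, seeded random initial centroids) is illustrative only and stands in for any suitable clustering library.

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Cluster classifier confidence vectors so that images carrying
    similar steganographic signatures are grouped for common handling."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # k distinct starting centroids
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda c: math.dist(v, centroids[c]))
            groups[i].append(v)
        # Move each centroid to the mean of its assigned vectors.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups
```

Each vector here might hold one confidence per classifier, so images embedded by the same (possibly unknown) process fall into the same group and receive the same responsive action.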
Figure 5 is a flowchart of a method for detecting data stored within an image for or by malware in accordance with an embodiment of the present invention. Initially, at step 502, the method loops through each of a plurality of input images 202 (noting that, in some embodiments, different input images may be used for different classifiers). At step 504, for a current input image 202, the method stores a portion of malware data 204 in the image using a first steganographic process 406a to create a first image. At step 506, for the current input image 202, the method stores a portion of malware data 204 in the image using a second steganographic process 406b to create a second image. The method loops through all input images at step 508. At step 510 the method trains a first classifier 400a to classify each first image to each respective input image. At step 512 the method trains a second classifier 400b to classify each second image to each respective input image. At step 514 a third image 408 is received and the first and second classifiers 400a, 400b are executed at step 516 to determine if the third image 408 can be confidently classified to indicate that the third image 408 includes data stored therein using one of the steganographic processes 406a, 406b. The malware or a type of the malware is identified or categorised based on a degree of confidence of classification by each of the classifiers 400a, 400b. Where such storage of data is detected at step 518, the method triggers responsive actions at step 520 where the responsive actions are based on the identified malware type.
Insofar as embodiments of the invention described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilises the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present invention.
It will be understood by those skilled in the art that, although the present invention has been described in relation to the above described example embodiments, the invention is not limited thereto and that there are many possible variations and modifications which fall within the scope of the invention.
The scope of the present invention includes any novel features or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combination of features during prosecution of this application or of any such further applications derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.

Claims (7)

  1. A computer implemented method of detecting data stored within an image for or by malware, the method comprising: for each of a plurality of input images, storing a portion of malware data within the input image by a steganographic process to create a respective second image; training a classifier to classify each second image to its respective input image; receiving a third image; and detecting malware data within the third image by executing the classifier based on the third image.
  2. The method of claim 1 wherein the malware is detected within the third image based on a degree of confidence of classification of the third image by the classifier.
  3. The method of any preceding claim wherein the malware data includes one or more of: executable malware code; and malware command and/or control instructions.
  4. The method of any preceding claim wherein the steganographic process is one of: an image domain steganographic process in which the portion of malware data is stored in the input image by adjusting an intensity of pixels in the input image; and a transform domain steganographic process in which the portion of malware data is stored in the input image by transforming the input image and then storing the malware data in the input image.
  5. The method of any preceding claim wherein the classifier is trained using backpropagation.
  6. A computer system including a processor and memory storing computer program code for performing the steps of the method of any preceding claim.
  7. A computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of a method as claimed in any of claims 1 to 5.
GB2000083.2A 2020-01-05 2020-01-05 Steganographic malware detection Pending GB2590916A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2000083.2A GB2590916A (en) 2020-01-05 2020-01-05 Steganographic malware detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2000083.2A GB2590916A (en) 2020-01-05 2020-01-05 Steganographic malware detection

Publications (2)

Publication Number Publication Date
GB202000083D0 (en) 2020-02-19
GB2590916A true GB2590916A (en) 2021-07-14

Family

ID=69527812

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2000083.2A Pending GB2590916A (en) 2020-01-05 2020-01-05 Steganographic malware detection

Country Status (1)

Country Link
GB (1) GB2590916A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487428B (en) * 2020-11-26 2022-03-11 南方电网数字电网研究院有限公司 Dormant combined computer virus discovery method based on block chain

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509775A (en) * 2018-02-08 2018-09-07 暨南大学 A kind of malice PNG image-recognizing methods based on machine learning
KR20190057726A (en) * 2017-11-20 2019-05-29 경일대학교산학협력단 Apparatus for detecting and extracting image having hidden data using artificial neural network, method thereof and computer recordable medium storing program to perform the method
US20190182268A1 (en) * 2017-12-07 2019-06-13 Mcafee, Llc Methods, systems and apparatus to mitigate steganography-based malware attacks


Also Published As

Publication number Publication date
GB202000083D0 (en) 2020-02-19
