US12505211B2 - Method for detection of malware - Google Patents
Info
- Publication number
- US12505211B2 (application US18/031,640)
- Authority
- US
- United States
- Prior art keywords
- image
- cfut
- data
- images
- different
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the invention relates to a method for detection of malware, such as a defective code or a computer virus.
- malware is typically contained in computer files, and these computer files may have constant or varying sizes.
- the invention further concerns a computer system, a computer program and a computer-readable storage medium, such as a physical data carrier or a network resource, which are all related to the above method.
- Malware is generally considered as a type of software or code that possesses malicious characteristics to cause damage to the user, a computer, or a network, for example. Malware can be differentiated into known malware (i.e. previously classified), potentially unwanted applications and unknown binaries.
- malware detection approaches can be divided into static and dynamic analysis. In static analysis, observable artifacts of software objects are analyzed, whereas in dynamic analysis, information is generated dynamically by executing the software objects under scrutiny.
- in static analysis, malware is thus detected without executing an application or monitoring the run-time behavior of a particular code.
- a classic static analysis technique is signature matching, in which a match is to be found with malicious signatures (which constitute a kind of fingerprint of the malware).
- signature matching must keep up with the current development of malware signatures to be effective. Around 360,000 new malware samples hit the global computer networks every single day, around 70% of this malware exists only once, and around 80% of it disappears after one hour. Keeping up with this extremely fast development of signatures is therefore becoming an increasingly unrealizable task.
- a central object of the invention is to achieve a high classification performance of static malware analysis and to outperform classic static analysis approaches. It is a further object of the invention to develop a malware detection approach that is highly adaptive to computer files of varying types and sizes and is also highly difficult to circumvent by malicious code architecture, in particular by obfuscation schemes.
- these objects are achieved by a method having one or more of the features disclosed herein, which solves the afore-mentioned problem.
- the invention proposes a method as introduced at the beginning, which, in addition, is characterized in that a computer file under test, referred to in the following as CFUT, is processed by the following steps: the CFUT is converted into image data that comprise a multitude of different images; the image data are classified using a method of artificial intelligence, in particular a method of machine learning, in particular deep learning, to detect the malware.
- the CFUT may be safely isolated in case malware is detected, to provide protection from the malware, for example by isolating affected attachments of an email that were detected as containing malware using the method.
- Machine learning methods are based on algorithms whose performance improves as they are exposed to more and more data over time. Standard methods of machine learning employ, for example, neural networks, decision trees or other classification methods. Deep learning is a subset of machine learning in which multi-layered neural networks (NN) learn from vast amounts of data.
- the CFUT may be stored temporarily in a memory.
- the CFUT may also be extracted from data traffic, for example traffic from a network connection. Such data traffic can also result from remote connections, in particular internet connections.
- the term computer file may be understood here in a broad sense as comprising any sort of binary data that is summarized into a data block.
- the computer file under test (CFUT) may be of any file format. For example, it may be a standard PDF document or an executable file.
- the images may be different in their visual appearance, i.e. in the underlying image data.
- two color images of the same type and identical image size S may differ in the distribution of a particular color; or two black-and-white images may differ in the distribution of black pixels.
- classification may be understood here in that classification modules, for example in the form of a multi-layer neural network, may classify a particular image as belonging to a certain class of images corresponding to malware. Such classification can be based on training performed with training data in the form of image files derived from known malware.
- each of the multitude of different images may be classified independently from the other images.
- the method may thus evaluate the malignity of the CFUT, in particular with a digital result (YES/NO), based on an overall classification result of the multitude of different images.
- the result of this evaluation may be used to trigger further actions within a computer system possibly affected by the malware.
- the detected malware can be an intentionally designed defective code or simply an artefact affecting the normal operation or use of the CFUT.
- Each of the different image files may thus be processed by a separate image classification module that is based on machine learning from training data.
- a classifier module may detect the malware by processing outputs of the image classification modules. Such outputs may be produced in the form of a scalar, e.g. a pre-classification score between 0 and 100, or as a multidimensional output vector.
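The combination of per-image outputs into an overall verdict can be sketched as follows. This is a minimal illustration only: the names `detect_malware`, `modules`, `weights` and `threshold` are hypothetical, and the weighted average stands in for whatever trained classifier module is actually used. The only detail taken from the text is that each image classification module may emit a pre-classification score between 0 and 100.

```python
from typing import Callable, Dict, List

Image = List[List[int]]                      # hypothetical 2d image: rows of pixel values
ImageClassifier = Callable[[Image], float]   # returns a pre-classification score in [0, 100]


def detect_malware(
    images: Dict[str, Image],
    modules: Dict[str, ImageClassifier],
    weights: Dict[str, float],
    threshold: float = 50.0,
) -> bool:
    """Combine per-image scores into a YES/NO verdict (illustrative rule only)."""
    # Each image type is scored independently by its own module.
    scores = {name: modules[name](image) for name, image in images.items()}
    # Assumption: a weighted average replaces the trained classifier module.
    total_weight = sum(weights[name] for name in scores)
    combined = sum(weights[name] * score for name, score in scores.items()) / total_weight
    return combined >= threshold
```

In practice the final stage would more likely be a trained model that takes the scores (or multidimensional output vectors) as input features; the fixed threshold here only makes the data flow visible.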
- the above method allows robust detection of malware by relying on an image classification, which considers relevant semantic characteristics, in particular of inner data components, of the CFUT.
- the detection of the malware may be based on static analysis of the CFUT only.
- image may be understood here, in particular, as a multidimensional (e.g. 1.5d, 2d, 3d, 4d, etc.) representation of digital binary data. Rasterization may be used to convert such image data into a grid of pixels that can be used for visualization purposes (allowing humans to view such images), although the method may be implemented in computer software without relying on any such visualization (visualization may be useful, however, for human-based optimization of the classification, for example by fine-tuning the classification).
- image may be understood as describing any sort of data representing a multidimensional vector space. In the simplest form, an image may be represented by an array of continuous data, wherein each data point represents a pixel of the image, i.e. a certain point within the vector space. Each such pixel may thus be represented by a number of bits defining the brightness and/or color and/or spatial position of the pixel.
- image data or images may thus be characterized in that these image data contain neighborhood information describing the spatial relation between different data points (pixels) within the image.
- the images used for classifying the malware may be characterized as data that allow calculation of spatial distances between data points (i.e. pixels) constituting a single image. Such distances may be understood as a measure of remoteness between those data points in the vector space represented by the image data.
- the image may have any spatial support. However, calculations may be more efficient when using a spatial support that has rectangular or even square faces.
- the pixels of an image may have any shape. However, preferably all pixels are of equal shape.
- an image is represented by a pixel matrix having $N_1 \times N_2 \times N_3 \times \dots \times N_d$ entries, with $d$ the dimension of the matrix.
- each $N_i$ may be any natural number, typically a multiple of 2.
- a 2d-image might be represented by $256 \times 256$ pixels. Clearly, any image needs to be supported by associated image data, typically stored in a convenient image file format.
- the image classification modules and/or the classifier module can be trained using training images, which can be done prior to the classification of the image data.
- the CFUT may be converted such that for at least one, preferably for all, of the different images, a respective uniform image size $S_i$ is maintained independent of the size X of the CFUT.
- a conversion that maintains a particular image size $S_i$ independent of the CFUT to be converted can be achieved by using a loss-less conversion, in which all data of the CFUT are considered, or by converting only part of the data contained in the CFUT and neglecting the remainder of the CFUT, as will be explained in greater detail below.
- a uniform image size $S_i$ of a particular type of image may be maintained without cropping a particular image to its desired uniform image size $S_i$.
- Image cropping has the fundamental disadvantage that no integral representation of the CFUT is generated; for example, a cropped image may only represent parts of the body or the footer of a CFUT. This poses the risk that a cropped image may be classified as resulting from a benign CFUT although the CFUT contains malware, which may be nested exactly in those parts of the image that have been cropped away. In addition, images that may be classified individually as benign may be classified together as malign. Image cropping thus poses fundamental risks, if not combined with other sophisticated methods.
- the multitude of different images may comprise several different image types.
- Each image type may be characterized by a particular image size $S_i$. Therefore, the respective image sizes $S_1$, $S_2$, $S_3$, etc. of some of the different images may be different, while some of the images, in particular when being of the same image type, may have the same image size.
- various CFUT of varying sizes $X_i$ may be converted into a uniform image size $S_i$ with respect to a particular image type used among the different images, for example for a particular type of gray-scale image.
- an image classification module used to classify one of the different image types may be trained using training computer files of different sizes $X_i$. In this case, the training computer files may be converted into training images of the uniform image size $S_i$ of the image type classified by the module and used as inputs to that image classification module for training.
- the conversion of the CFUT may be such that a respective uniform image size $S_i$ is maintained for each one of the different image types, independent of the (current) size X of the CFUT.
- the method may include specifying a uniform image size S as a parameter that is independent of the size of the CFUT.
- the CFUT may thus be converted such that one or several of the different images have the specified uniform image size S.
- An image classification module used to classify such images of size S may be trained using training computer files of different sizes; however, the training files may be converted into training images of the uniform image size S and used as inputs to the image classification module.
- there may also be different uniform image sizes $S_i$, each used for a respective one of the different images, in particular for a respective image type.
- the conversion(s) of the CFUT may be such that a respective uniform image size $S_i$ is maintained for each one of the different images, independent of the size X of the CFUT currently investigated.
- Using different image types among the multitude of different images may have the effect that the different images differ in their respective image size $S_i$ and/or in the respective underlying conversion algorithm used for calculating the particular image type and/or in the respective pixel type used for the particular image type.
- Such different pixel types may be one-dimensional gray-scale pixels or multidimensional color pixels, for example.
- each of the different image types employed can provide a different semantic representation of the CFUT.
- the semantic representation provided by different image types can differ in terms of information on a structure and/or on a composition and/or on a spatial distribution and/or on a density and/or on statistics of inner data components of the CFUT, respectively.
- Such inner data components may be data defined by the user/editor of the CFUT (which may be referred to as the “bacon”) or data representing structural information (which may be referred to as the “skeleton”).
- latent space representations may be understood here as providing information about the CFUT that is not plainly observable from the raw data of the CFUT. Rather, latent space representations must typically be derived from the raw data of the CFUT, for example by processing an image obtained from the CFUT with a neural network. As latent space representations may represent important (image) features of the original image, they can contain all relevant information needed to classify the original image and thereby the CFUT.
- the success rate of malware detection can be greatly improved.
- even if the file size X of the CFUT varies greatly, e.g. due to a varying size of user-specific information (bacon) contained in the CFUT, the approach of using multiple different image representations of the CFUT (or parts of the CFUT) will safeguard that relevant information on the structure or randomness or entropy of the data contained in the CFUT is still fed into the classification with sufficient detail. This is because individual ones of the different images can be tailored to represent different semantic qualities/aspects/characteristics of the CFUT.
- the method can be based purely on static analysis.
- the image data may be classified purely by static analysis, in particular without dynamic interaction with the CFUT and/or without activating the malware.
- This approach has the benefit that no execution of the CFUT is required, which would be potentially harmful, even in a sand-box environment.
- Another important suggestion for improving the robustness of the detection and also for increasing the difficulty in bypassing such detection by the malware itself is to calculate images from randomly selected data subsets of the CFUT.
- at least one of the different images may be an image calculated from a subset of data of the CFUT.
- the image size $A_i$ of that particular image can be maintained independent of a size X of the CFUT, in particular by neglecting some of the data of the CFUT.
- the subset is selected randomly from the CFUT and/or independently of a size X of the CFUT.
- the subset may be selected by employing a random number generator. In this case, it will be impossible to predict which data points of the CFUT will be considered for the calculation of the image and what the image will look like.
- the image is a characteristic representation of the CFUT.
- malicious CFUT will produce certain characteristics in the image, even when the subset of data is randomly chosen, and those characteristics can be detected using available methods of machine learning.
- the random selection of the data may follow a predetermined probability distribution such as an even distribution or a distribution centered at different locations within the CFUT or within the image space defined by the particular image.
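The random selection of a data subset can be sketched as below. This is a minimal illustration assuming an even (uniform) distribution, one byte per gray-scale pixel and a row-by-row pixel layout; the function name `random_subset_image` and its parameters are hypothetical.

```python
import random
from typing import List


def random_subset_image(data: bytes, side: int, rng: random.Random) -> List[List[int]]:
    """Build a side x side gray-scale image from randomly chosen bytes of `data`."""
    if not data:
        raise ValueError("empty file")
    n_pixels = side * side
    # Draw positions uniformly, with replacement, so that files smaller than
    # the image can still fill it; sorting keeps the original file order.
    positions = sorted(rng.randrange(len(data)) for _ in range(n_pixels))
    pixels = [data[p] for p in positions]
    # Row-by-row layout (assumption); a space filling curve could be used instead.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

The image size depends only on `side`, never on `len(data)`, which is the property the text calls a uniform image size. A distribution centered at particular file locations could be substituted for `rng.randrange`.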
- the subset of data randomly selected from the CFUT consists of subsubsets of data.
- These subsubsets may each represent a particular local portion of the CFUT, in particular within a particular data segment of the CFUT.
- a subsubset may consist of three bytes which follow consecutively on each other in the CFUT.
- the data contained in one of these subsubsets may be locally near each other within the CFUT, in particular such that these data form a string of data (e.g. a string of three bytes).
- the CFUT may be divided into data clusters.
- the subset may be selected by dividing the CFUT into data clusters, preferably of uniform cluster size, and by randomly selecting data from within each of the clusters to form the subset.
- it is preferable when multiple data points from within a particular one of said clusters are used to calculate a particular, preferably multi-dimensional, pixel of said image of uniform image size $S_i$.
- the cluster size may be chosen randomly and/or variable cluster sizes may be used, to further enhance the randomness of the selection of the subset of data used for the calculation of the image.
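The cluster-based selection can be sketched as follows. This is one possible reading, under stated assumptions: one cluster per pixel, near-uniform cluster sizes obtained by integer arithmetic, and a string of three consecutive bytes drawn at a random offset inside each cluster and interpreted as an RGB pixel. The name `cluster_pixels` is hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, ...]  # here always three components (R, G, B)


def cluster_pixels(data: bytes, n_pixels: int, rng: random.Random) -> List[Pixel]:
    """One RGB pixel per cluster: three consecutive bytes at a random offset."""
    if len(data) < 3:
        raise ValueError("need at least three bytes")
    pixels: List[Pixel] = []
    for i in range(n_pixels):
        # Cluster i covers data[start:end]; the sizes differ by at most one byte.
        start = i * len(data) // n_pixels
        end = max(start + 1, (i + 1) * len(data) // n_pixels)
        offset = rng.randrange(start, end)
        # Keep the three-byte string inside the file at the very end.
        offset = min(offset, len(data) - 3)
        pixels.append(tuple(data[offset:offset + 3]))
    return pixels
```

Because every cluster contributes exactly one pixel, the number of pixels is fixed by `n_pixels` while each pixel still reflects a local portion of the file.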
- the meaningfulness of the image can be increased as it represents more information than a simple gray image.
- an image space defined by said image is filled up with said pixels by arranging the pixels in the image space using a space filling curve, preferably using a Hilbert curve.
- a major advantage of using a space filling curve is that the locality of the information contained in the CFUT can be better preserved in the image. This means that pixels which are close to each other in the image are computed from data which are close to each other within the CFUT.
- Such an approach of maintaining locality is specifically adapted to convolutional neural networks, since these networks typically generalize image data by using 2D-rasterization (e.g. by generalizing an $N \times N$ pixel area into a single Max-/Min-/average value).
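The locality-preserving arrangement can be illustrated with the standard iterative index-to-coordinate conversion for a 2d Hilbert curve. The sketch assumes a square image whose side is a power of two and exactly one pixel value per cell; the names `hilbert_d2xy` and `hilbert_image` are hypothetical.

```python
from typing import List, Sequence, Tuple


def hilbert_d2xy(side: int, d: int) -> Tuple[int, int]:
    """Map position d along the Hilbert curve to (x, y) in a side x side grid."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant.
                x, y = s - 1 - x, s - 1 - y
            # Rotate the quadrant by swapping the axes.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(pixels: Sequence[int], side: int) -> List[List[int]]:
    """Place pixels, given in file order, along a Hilbert curve."""
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a power of two")
    if len(pixels) != side * side:
        raise ValueError("need exactly side * side pixels")
    image = [[0] * side for _ in range(side)]
    for d, value in enumerate(pixels):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = value
    return image
```

Consecutive positions along the curve always land on adjacent cells, so bytes that are neighbors in the file remain neighbors in the image, unlike a row-by-row layout where the end of one row jumps to the start of the next.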
- the process of randomly selecting data from the CFUT for calculation of a particular image may be repeated. That is, the same CFUT may be repeatedly converted into an image of uniform image size $S_i$, each time calculating a respective image from a randomly selected subset of data of the CFUT.
- a (first) series of images all calculated from varying data subsets of the CFUT can be obtained.
- the malware may then be detected by classifying, either sequentially or in parallel, said series of images.
- Lossless mapping of data may be used for generating images from the CFUT.
- the CFUT may be converted into one of the different images by mapping, in particular reordering, preferably all, data contained in the CFUT to that particular image. This means that all pixels of at least one of the different images may be calculated from data of the CFUT, in particular without relying on zero padding.
- Another approach for generating a particular image type is to calculate image pixels from a sequence of data blocks, in particular from byte sequences. If the resulting image size exceeds a desired uniform image size $S_i$, some of said pixels may be deleted until the number of pixels is adequate for filling up the image size $S_i$. For example, all pixels of at least one of the different images may be calculated from a respective sequence of, preferably neighboring, data blocks (in particular bytes) of the CFUT. In this case, a desired uniform image size $S_i$ can be maintained by deleting pixels iteratively and thus using only a subset of the calculated pixels in the final image.
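The iterative deletion of pixels can be sketched as below. The text does not state which pixels are deleted, so the rule used here (dropping evenly spaced pixels in each pass) is an assumption, and the name `thin_to_size` is hypothetical.

```python
from typing import List, Sequence


def thin_to_size(pixels: Sequence[int], target: int) -> List[int]:
    """Delete evenly spaced pixels, pass by pass, until at most `target` remain."""
    if target <= 0:
        raise ValueError("target must be positive")
    result = list(pixels)
    while len(result) > target:
        excess = len(result) - target
        # Spacing between deleted pixels; at least 2 so neighbors survive each pass.
        step = max(2, len(result) // excess)
        # Never delete more than `excess` pixels in one pass.
        drop = set(range(step - 1, len(result), step)[:excess])
        result = [p for i, p in enumerate(result) if i not in drop]
    return result
```

The surviving pixels keep their original order, so the thinned sequence can be passed to any layout step afterwards. Inputs that are already short enough are returned unchanged.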
- an image space defined by at least one of the images may be filled up completely with pixels calculated from data of the CFUT using a space filling curve, most preferably a Hilbert curve.
- each pixel of that particular image can represent data contained in the CFUT.
- the CFUT can be converted into one of the different images by mapping, in particular reordering, a sequence of data blocks of the CFUT using a space filling curve, preferably using a Hilbert curve.
- One major advantage of using a space filling curve for converting the CFUT into image data is that—different from reshaping and resizing of images obtained by a simple direct transfer of data from the CFUT—the image data can be obtained largely or completely free of distortions, such that characteristics of the CFUT, which are relevant for the image classification, can be preserved.
- Entropy images: another suitable type of image that may be used for enhancing the detection performance is the entropy image.
- Entropy may be understood here (as used in classical information theory, e.g. Shannon entropy) as a measure of average information content, with a small entropy indicating a high order and hence low information content.
- pixel values of the resulting image may be given by ratios of the calculated entropies divided by the total entropy of the CFUT, i.e. as relative local entropies; for example, the pixel values may be normalized to an interval ranging from 0 to 1, to indicate the local relative entropy of a particular data block within the CFUT.
- the resulting entropy images can thus visualize local entropy distributions within the CFUT. This is particularly helpful for detecting obfuscated code in the CFUT.
- Code obfuscation mechanisms are frequently used by developers of malware to bypass malware detection technologies, thus making it hard for antivirus (AV) software to detect such malware.
- the use of a large variety of code obfuscation mechanisms, for example packers or cryptors, typically produces high entropy.
- the use of entropy images therefore greatly improves the detection of malware, as such obfuscation can be detected as local hotspots of the entropy.
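The relative local entropies can be sketched as follows, using byte-level Shannon entropy. The block size of 256 bytes and the final rescaling by the largest ratio are assumptions: a block's entropy can exceed that of the whole file, so some rescaling is needed to reach the interval from 0 to 1. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def relative_entropy_pixels(data: bytes, block_size: int = 256) -> List[float]:
    """One pixel per block: local entropy relative to the total, scaled to [0, 1]."""
    total = shannon_entropy(data)
    starts = range(0, len(data), block_size)
    if total == 0.0:
        # A file with a single distinct byte value carries no entropy anywhere.
        return [0.0 for _ in starts]
    ratios = [shannon_entropy(data[i:i + block_size]) / total for i in starts]
    peak = max(ratios)
    # Assumption: rescale by the largest ratio so that values stay within [0, 1].
    return [r / peak for r in ratios] if peak > 0 else ratios
```

Packed or encrypted regions approach 8 bits per byte and therefore show up as the brightest pixels once these values are arranged into an image.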
- any property of the CFUT that can be extracted as a numerical value may be used to convert the CFUT into an image to be classified.
- an appropriate metric may be defined and applied to the CFUT to extract such values and compute an image from these values.
- latent space representations in particular described as a vector in a high-dimensional vector space, may be extracted from images computed from the CFUT, and such data may be evaluated to detect malware.
- the CFUT may be represented as a color image by grouping/transforming successive bits or bytes into pixels and by giving the pixels a certain order.
- statistical properties or other data features may be extracted from the color image, and these data may be analyzed in a vector space as a latent space representation of the image.
- convolutions may be used to generate more and more generalized representations of an original image obtained from the CFUT, until a latent space representation of the original image is achieved, which can then be used as input for a classifier module, such as a neural network or a boosted tree.
- overlapping or same portions of the CFUT may be converted into different images, in particular into images of different image type.
- At least one of the different images of size A may be obtained by segmenting an image space of that particular image into a multitude of image segments, by segmenting the CFUT into data segments, in particular of successive data blocks, and by mapping different ones of the data segments onto different ones of the image segments.
- the image segments have rectangular, preferably square, faces. This also facilitates the use of space filling curves when filling up or defining the image segments.
- the pixels of that image are calculated from data blocks of the CFUT and that the pixels are re-allocated in blocks filling respective image segments of the image.
- the explained re-allocation of data can be tailored in particular to the needs of convolutions employed in the classification of the images.
- a convolutional neural network (CNN)
- a convolutional XGBoost (ConvXGB)
- at least one of the different images may be classified by an image classification module, in particular by a convolutional classifier, such as a convolutional neural network, that employs a convolution window of size C for processing said image and said image may therefore be divided into image segments which match the size C of the convolution window.
- the classifier can derive a meaningful latent space representation of the CFUT from a convolution of the image (e.g. by using an appropriate Kernel or filter).
- the latent space representation derived from a particular image segment will be correlated to the respective data segment of the CFUT.
- the size C of the convolution window may be different among different ones of the image classification modules, depending on the image format to be processed.
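The mapping of data segments onto image segments that match a convolution window can be sketched as follows. The sketch assumes a square gray-scale image, square image segments of edge length `tile` (standing in for the window size C), and an evenly spaced sample of each data segment to fill its image segment. The name `segment_image` and the sampling rule are illustrative assumptions.

```python
from typing import List


def segment_image(data: bytes, side: int, tile: int) -> List[List[int]]:
    """Fill each tile x tile image segment from one contiguous data segment."""
    if not data:
        raise ValueError("empty file")
    if tile <= 0 or side % tile:
        raise ValueError("side must be a positive multiple of tile")
    tiles_per_row = side // tile
    n_tiles = tiles_per_row * tiles_per_row
    cells = tile * tile
    image = [[0] * side for _ in range(side)]
    for t in range(n_tiles):
        # Data segment t: a contiguous run of successive bytes.
        start = t * len(data) // n_tiles
        end = max(start + 1, (t + 1) * len(data) // n_tiles)
        segment = data[start:end]
        top = (t // tiles_per_row) * tile
        left = (t % tiles_per_row) * tile
        for k in range(cells):
            # Evenly spaced sample of the segment (assumption), row by row in the tile.
            value = segment[k * len(segment) // cells]
            image[top + k // tile][left + k % tile] = value
    return image
```

With this layout, one application of a `tile`-sized convolution window (at a stride equal to `tile`) sees data from exactly one data segment, which is what ties the derived features back to a specific region of the file.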
- large computer files may actually consist of several sub-files.
- the original CFUT may be classified as being malign.
- the CFUT may be first split up into sub-files, and each of the sub-files may be converted into a different image.
- Another approach that may be used additionally or alternatively is to segment at least one of the different images into a multitude of sub-images. In this case, data contained in one of the sub-files of the CFUT may be mapped to a respective one of the sub-images.
- steps can also be iteratively applied to further sub-sub-files contained in the sub-files of the CFUT, in particular to generate further sub-sub-images, as part of the sub-images.
- the CFUT may be split up into its skeleton and bacon (cf. above) prior to image conversion. Thereby, important structural information can be preserved, in particular when processing large CFUT.
- the CFUT may be first divided into a skeleton-file containing structural information, in particular about the type of CFUT, and a bacon-file containing user-specific information, characterizing the particular CFUT currently under investigation.
- the skeleton-file and/or the bacon-file may be each converted into one of the different images used for the detection of malware.
- This may be done, preferably, by applying a byte-conversion (which will be detailed below) to the skeleton-file and/or the bacon-file, respectively. Afterwards, the corresponding image files obtained from the skeleton-file and the bacon-file may be separately classified.
- a further concept for maintaining uniform image sizes of the different types of images used in the method is downsizing, either of portions of the CFUT and/or of sequences of data strings within the CFUT. It is therefore suggested that prior to or during the conversion of the CFUT into one of the different images, at least a portion of the CFUT (which may in particular be the complete CFUT) is downsized to a reduced set of data. In particular, and different from (e.g. randomly) selecting only a subset of data from within a portion of the CFUT, all data contained in the portion of the CFUT may contribute to the reduced set of data. Hence, the reduced set of data may represent or may be derived from all of the data contained in said portion.
- the portion of the CFUT may be further segmented into a sequence of data strings and each data string may be downsized to a smaller data block. This may be done, in particular, such that all data of a particular data string contribute to the corresponding smaller data block.
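Downsizing in which every byte contributes can be sketched as follows. The reduction used here (integer mean of each data string) is an assumption chosen for simplicity; any reduction over the whole string would serve the same purpose. The name `downsize` is hypothetical.

```python
from typing import List


def downsize(data: bytes, n_blocks: int) -> List[int]:
    """Reduce `data` to n_blocks values, each the integer mean of one data string."""
    if not data or n_blocks <= 0:
        raise ValueError("need non-empty data and a positive block count")
    reduced = []
    for i in range(n_blocks):
        # Data string i; for len(data) >= n_blocks the strings partition the file,
        # so every byte contributes to exactly one reduced value.
        start = i * len(data) // n_blocks
        end = max(start + 1, (i + 1) * len(data) // n_blocks)
        chunk = data[start:end]
        reduced.append(sum(chunk) // len(chunk))
    return reduced
```

Unlike random subset selection, the result is deterministic and no part of the file is ignored, at the cost of smoothing out fine detail within each data string.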
- the image may be a gray-scale image.
- the image may be a colored image such as a RGB-image.
- pixels of at least one of the different images may have values that result from values of several, preferably successive, bits or bytes of the CFUT, in particular such that the pixels are given by n-dimensional vectors. In such cases, it is preferable if the respective n components of said vectors result from n neighboring data blocks, in particular n neighboring bytes, of the CFUT.
- compound images composed from images already calculated from the CFUT, may be used to further enhance the effectiveness of the malware detection.
- at least two of the different images may be combined to a compound image.
- the image data obtained from the CFUT may comprise such a compound image.
- a specific compound image may be designed that reveals the anatomy of the CFUT.
- at least one of the different images may be a compound image resulting from combining an image of a first image type calculated from a randomly chosen subset of data of the CFUT, and another image of a second image type visualizing local entropies of the CFUT.
- the compound image may further comprise a skeleton-image and a bacon-image, preferably each of a third or fourth image type, calculated from a skeleton-file and a bacon-file derived from the CFUT, respectively, as was explained previously.
- the efficiency of malware detection can be further improved by evaluating characteristic data features of the CFUT, such as statistical data, in particular prior to the image classification. Therefore, a number of characteristic data features may be extracted from the CFUT directly, in particular without converting the CFUT into image data. These data features may then be fed into a separate data classification module, whose output is considered in the detection of the malware.
- a classifier module may consider the output of the data classification module and results of image classification modules, which are used for classifying the different images, in order to detect the malware.
- the data classification module may be based on a method of machine learning, similar to the image classification modules.
- a somewhat different approach which may be used alternatively or additionally, is to generate matrix images from different data features, such as statistical data and metadata, of the CFUT.
- a number N of such data features may be extracted as N numerical values from the CFUT, and converted into a matrix image.
- the matrix image may be any kind of image, preferably a 2d or 3d image.
- the image data which are classified using a method of machine learning, comprise the matrix image.
- the malware may then be detected based at least in part on a classification result of said matrix image.
- the N values may each be normalized prior to the conversion. This way, each of the normalized values can be represented as a pixel in the matrix image.
- the matrix image may be completed using zero padding, i.e. by filling up the remainder of the image with pixels having the value zero.
- the classification of the image data can be done using state-of-the-art methods of artificial intelligence such as machine learning.
- neural networks are particularly suited for the method presented herein, since they are effective in classifying multidimensional data.
- the image data may be classified using at least one neural network (NN), preferably a convolutional neural network (CNN), and/or at least one decision tree (DT).
- the particular method of artificial intelligence used for classifying a particular image type which may be implemented by one of said image classification modules, can be designed for operating on arrays of continuous data and/or on data arranged in multidimensional image file formats, such as RGB color images.
- the image classification modules may be implemented as neural networks and/or the classifier module mentioned before may be implemented as a decision tree.
- the method may be developed further in that at least two different latent space representations, in particular obtained from pre-processing at least two of the different images, may be classified to produce a pre-classification result that is evaluated to detect the malware.
- pre-processing may comprise convolutions of the image and/or extraction of features from the image.
- a separate latent space classification module may be used to classify at least two different latent space representations of at least one of the different images to detect the malware.
- Such representations can describe characteristics of an image relevant for malware detection and may be obtained, for example, from various hidden layers of a neural network (NN) that processes one of the images obtained from the CFUT, and in particular also from several hidden layers of different NN.
- the invention further suggests a computer system, a computer program and a computer-readable storage medium:
- the computer system can comprise means for carrying out the single steps of a method as described before or as defined by one of the claims directed towards a method.
- said computer program may comprise instructions which, when executed by a computer, in particular by the computer system just explained, cause the computer/the system to carry out a method as described before or as defined by one of the claims directed towards a method.
- the computer-readable storage medium may comprise such instructions for the same purpose.
- FIG. 1 is a schematic flow diagram illustrating single steps of a method for detection of malware according to the invention
- FIG. 2 is a schematic of another possible implementation of a classification engine implementing a method according to the invention.
- FIG. 3 illustrates details of the file-to-image-conversion approach used
- FIG. 4 illustrates a neural network, which may be used as a classifier
- FIG. 5 illustrates examples of gray-scale images as they are obtained by direct bit-by-bit encoding of computer files of varying sizes
- FIG. 6 illustrates the use of a space filling curve for completely filling up a pre-defined image space with a given number of pixels calculated from a computer file
- FIG. 7 shows images of uniform image size obtained by using the space filling curve approach illustrated in FIG. 6 .
- FIG. 8 shows two examples of images of a first image type A obtained from benign (left) and malicious (right) PDF documents
- FIG. 9 shows two examples of images of a second image type B that visualize local entropies, as obtained from benign (left) and malign (right) PDF documents
- FIG. 10 illustrates a possible re-allocation of pixels in an image calculated from a computer file under test
- FIG. 11 shows two examples of images of a third image type C, namely two images calculated from skeletons of a benign (left) and malign (right) computer file under test,
- FIG. 12 shows two examples of images of the same image type C but of different image size than the images of FIG. 11 , the two images being calculated from bacons of a benign (left) and malign (right) computer file under test,
- FIG. 13 illustrates two compound images obtained by combining several other images derived from a benign (top) or malign (bottom) computer file
- FIG. 14 illustrates two matrix images obtained from conversion of data features contained in a benign (top) and malign (bottom) computer file under test.
- FIG. 1 presents a possible design of a classification engine 32 , implemented by software on a standard computer system, that executes a method for malware detection according to the invention.
- a computer file under test (CFUT) 1 for example a standard PDF-document temporarily stored in a memory of the computer system, is first converted into image data 3 using a conversion engine 20 .
- the image data 3 comprise a multitude 5 of different images 4 , namely a first series 9 a of different images 4 of a first image type A, a second series 9 b of different images 4 of a second image type B, further images 4 of a third image type C, several compound images 11 of a fourth image type D, which are each composed of one image 4 of type A, one image 4 of type B and two images 4 of type C (cf. FIG. 13 ), and finally yet another image 4 of a fifth image type E in the form of a matrix image 16 . All of these different images 4 have been calculated from data contained in the CFUT 1 .
- the engine 32 of FIG. 1 offers a separate image classification module 6 ( 6 a . . . 6 e ) in the form of a respective neural network 24 that has been trained with training images up-front to identify malign content in the respective images 4 .
- the classification modules 6 classify the image data 3 using a method of machine learning, respectively.
- each of the image classification modules 6 is designed for a particular image size S i that is maintained in the conversion of the CFUT 1 into a particular image type A/B/C1/C2/D/E.
- the computer system executing the method may take further steps to safely isolate the CFUT, for example if the score 23 expresses a probability of the presence of malware 2 in the CFUT 1 above a certain threshold.
- each of the neural networks 24 employed delivers a pre-classification score 42 in the form of a numerical score value between 0 and 100.
- classification modules can also be used which deliver multidimensional output vectors representing a more complex classification result.
- a first conversion engine 20 converts the CFUT 1 into images 4 of the three different image types A, B, C. This conversion is done in such a way that no matter what the file size of the CFUT 1 , a uniform image size S i is maintained for each of the images 4 of a particular image type, for example an image size S 1 is maintained for the images 4 of type A of the first series 9 a . Therefore, none of these images 4 needs to be cropped, such that all pixels contained in the particular image 4 are considered by the respective classification module 6 .
- sub-types of images 4 for example types C1 and C2, which are skeleton images 12 and bacon images 13 of image type C, respectively, but which differ in their respective uniform image sizes S 3 and S 4 .
- image classification modules 6 c 1 and 6 c 2 which are specifically designed for an input of image size S 3 or S 4 , respectively.
- Illustrated in FIG. 1 is also that a number of characteristic data features 14 are extracted from the raw CFUT, parallel to the image conversion performed by the conversion engine 20 for image types A, B and C. These features are fed directly into a data classification module 15 in the form of a decision tree 26 that delivers another pre-classification score 42 g.
- the data features 14 are converted by a second image conversion engine 20 into a matrix image 16 of image type E, that is classified by a respective image classification module 6 e .
- the module 6 e also delivers a pre-classification score 42 e.
- the engine 32 of FIG. 1 comprises yet another neural network 24 that implements a latent space classification module 25 .
- This module 25 evaluates information processed by the image classification modules 6 a , 6 b , 6 c 1 , 6 c 2 , 6 d and 6 e ; in particular, the module 25 evaluates latent space representations 44 (cf. FIG. 1 ) of the images 4 obtained in hidden layers 30 of the neural networks 24 used as image classification modules 6 .
- the latent space representations 44 are obtained from the last hidden layer 30 before the output layer 31 of the respective neural network 24 , which delivers a minimum multi-dimensional representation of the respective image 4 that is not accessible from the CFUT 1 directly.
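Reading a latent space representation from the last hidden layer can be sketched with a very small network. The fully connected layers, the ReLU activation, the layer sizes and the random weights below are all illustrative assumptions; the text only states that the representation is taken from the last hidden layer before the output layer.

```python
import numpy as np


def forward_with_latent(x, weights, biases):
    """Run a small fully connected network.

    Returns (output, latent), where `latent` holds the activations of the
    last hidden layer, i.e. the layer directly before the output layer.
    """
    activation = np.asarray(x, dtype=np.float64)
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = np.maximum(0.0, activation @ w + b)  # ReLU (assumption)
    latent = activation
    output = latent @ weights[-1] + biases[-1]
    return output, latent


# Hypothetical, untrained weights: 64 inputs -> 16 -> 8 hidden -> 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 16)), rng.normal(size=(16, 8)), rng.normal(size=(8, 1))]
biases = [np.zeros(16), np.zeros(8), np.zeros(1)]
score, latent = forward_with_latent(rng.random(64), weights, biases)
```

Latent vectors obtained this way from several networks could then be concatenated and passed to a separate classifier.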
- the generated compound images 11 , which are composed of one image 4 of type A, one image 4 of type B, and one skeleton-image 12 and one bacon-image 13 , each of type C, are classified by a separate image classification module 6 d that delivers a pre-classification score 42 d.
- all pre-classification scores 42 are fed into the classifier module 7 , which determines the final classification score 23 on the basis of the eight individual pre-classification scores 42 a - 42 g , which are obtained by processing images 4 calculated from the original CFUT 1 or by processing further data derived from the CFUT 1 (as in the case of modules 15 and 25 , which process data features 14 and latent space representations 44 processed by the modules 6 a - 6 e , respectively).
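The data flow from pre-classification scores to a final score can be sketched as follows. The plain average used here is only a stand-in for the classifier module, which the text describes as possibly being a decision tree; the function names and the threshold of 50 are hypothetical.

```python
def final_score(pre_scores: dict) -> float:
    """Combine pre-classification scores (each 0..100) into one score.

    Stand-in aggregation: a plain average, NOT the classifier module
    described in the text.
    """
    if not pre_scores:
        raise ValueError("at least one pre-classification score is required")
    for name, value in pre_scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"score {name!r} is outside 0..100")
    return sum(pre_scores.values()) / len(pre_scores)


def should_isolate(score: float, threshold: float = 50.0) -> bool:
    """Decide whether the file should be isolated (threshold is an assumption)."""
    return score > threshold


score = final_score({"type A": 90, "type B": 70, "data features": 80})
isolate = should_isolate(score)
```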
- the method for detection of malware may thus be implemented by a classification engine 32 comprising
- FIG. 2 shows another possible implementation of a method according to the invention.
- the engine 32 of FIG. 2 executing the method features an artificial intelligence stack 22 that combines the functionality delivered by the modules 6 , 7 , 15 , and 25 of FIG. 1 , and delivers the final classification score 23 .
- the engine 32 illustrated in FIG. 2 extracts a certain number of statistical data and metadata features 14 from the raw CFUT 1 (e.g. a total of 418 different features in case of a PDF-file, but a different number in case of other file formats) to gather quantitative characteristics that are specific for the particular CFUT 1 , for example the presence and amount of already known suspicious strings of characters, obfuscated, or encrypted data.
- statistical and metadata features are designed to not only quantify the internal composition of the CFUT 1 but also to extract indicators of malicious behavior and capabilities that are historically explored but normally only detected in dynamic analysis and interaction.
- each feature value extracted from the CFUT is first normalized, for example into a range of integers from 0 to 255, such that the normalized value can be represented by a pixel of the image 16 .
- the vector of length 418 is reshaped into a 2d-array of dimension 21×21.
- the resulting image 16 can provide a visual distinction between benign and malign CFUT 1 .
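The conversion of a feature vector into a matrix image can be sketched as below: each value is scaled to an integer from 0 to 255, the vector is zero-padded to the next square size and then reshaped. For 418 features this yields a 21×21 image with 23 padded pixels. The per-feature bounds `lower` and `upper` and the function name are assumptions; the text does not say how the normalization range is chosen.

```python
import math

import numpy as np


def features_to_matrix_image(values, lower, upper) -> np.ndarray:
    """Normalize N feature values to 0..255 and reshape them into a square image."""
    values = np.asarray(values, dtype=np.float64)
    lower = np.asarray(lower, dtype=np.float64)
    upper = np.asarray(upper, dtype=np.float64)
    span = np.where(upper > lower, upper - lower, 1.0)  # avoid division by zero
    scaled = np.clip((values - lower) / span, 0.0, 1.0)
    pixels = np.round(scaled * 255).astype(np.uint8)

    side = math.isqrt(len(pixels) - 1) + 1   # smallest side with side*side >= N
    image = np.zeros(side * side, dtype=np.uint8)
    image[: len(pixels)] = pixels            # the remainder stays zero (padding)
    return image.reshape(side, side)


features = np.linspace(0.0, 1.0, 418)        # hypothetical feature values
image = features_to_matrix_image(features, np.zeros(418), np.ones(418))
```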
- a classification engine 32 executing the method may comprise a conversion engine 20 for converting the CFUT 1 into image data 3 comprising a multitude of different images 4 , for example the stack 9 a of three different images of type A, and at least one image classification module 6 for classifying the image data 3 based on a method of machine learning, which delivers a classification result indicative of the presence of malware 2 in the CFUT 1 . All other features of the method/said engine 32 may be considered optional.
- any binary code present in a computer file can be plotted, for example as a grey-scale image 4 , 17 .
- Textural and structural similarities among malware from a particular malware family can then be identified in such an image 4 using image analysis techniques, in particular based on machine learning.
- the data-to-image-conversion used in the method can thus rely on so-called pixel conversion.
- Pixel conversion can be understood here as reading a byte of the CFUT and transferring it into a value between 0 and 255, which directly corresponds to a pixel intensity (in grey value).
- pixel conversion can convert a binary consisting of a string of bytes into a 1-dimensional stream of 1-dimensional pixels of varying intensity/grey-scale, for example by using the binary bit-by-bit encoding matrix 27 shown in FIG. 3 .
- the resulting stream of pixels can then be reshaped into a multi-dimensional object, in the simplest case, into a 2d-grey-scale image with M×N pixels, which can be stored as an image file.
- Such an image 4 can thus represent a maximum of M×N bytes.
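A minimal sketch of this direct pixel conversion is given below: every byte becomes one grey value and the stream is filled line by line into an image of fixed width. The fixed width and the zero-filling of the unused tail of the last line are assumptions made for the example.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int) -> np.ndarray:
    """Map each byte to one grey value (0..255) and fill the image line by line."""
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)        # ceiling division
    image = np.zeros(height * width, dtype=np.uint8)
    image[: len(pixels)] = pixels            # unused tail stays 0 (assumption)
    return image.reshape(height, width)


small = bytes_to_grayscale(bytes(range(10)), width=4)   # 3 lines
large = bytes_to_grayscale(bytes(100), width=4)         # 25 lines
```

The two results differ in height, which illustrates why files of different size lead to images of different size under this simple scheme.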
- byte-strings of the CFUT 1 may also be converted into multidimensional pixels 36 , for example into RGB-pixels; this type of pixel conversion is used for the generation of the images 4 of type A (cf. FIG. 8 ).
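Converting a byte-string into multidimensional pixels can be sketched by grouping three neighboring bytes into one RGB pixel. Dropping an incomplete trailing group is an assumption; the text does not say how a remainder is handled.

```python
import numpy as np


def bytes_to_rgb_pixels(data: bytes) -> np.ndarray:
    """Group every 3 consecutive bytes into one (R, G, B) pixel."""
    usable = len(data) - len(data) % 3       # drop an incomplete last group (assumption)
    return np.frombuffer(data[:usable], dtype=np.uint8).reshape(-1, 3)


pixels = bytes_to_rgb_pixels(b"%PDF-1.7\n")   # 9 bytes give 3 RGB pixels
```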
- the structure 28 of the original computer file illustrated on the right can be recognized in the image 4 shown on the left; in particular skeleton portions 40 and bacon portions 41 can be recognized in the image 4 , which correspond to the skeleton 33 and bacon 34 of the computer file illustrated on the right.
- a convenient and efficient way of classification is to use neural networks (NN) 24 , such as the one illustrated in FIG. 4 , that can be employed as a classifier.
- latent space representations 44 can be generated by convolutions of an image 4 and then fed as input to a NN 24 acting as a classifier.
- networks 24 may have an input layer 29 , one or several hidden layers 30 , and an output layer 31 , which can deliver a multidimensional or 1-dimensional output vector.
- Such a vector can represent a certain classification result of an image 4 obtained from the original CFUT 1 .
- such a network 24 requires an input vector of defined size, that does not change.
- FIG. 5 shows the result of a grey-scale image conversion of two separate computer files that differ in file size. As is visible, the resulting image size (e.g. in terms of columns × lines or number of pixels) will be different. Such images 4 of varying size cannot be efficiently classified by a single classification module/a single neural network, or only with severe drawbacks.
- FIG. 6 presents one solution to this problem, namely the use of so-called space-filling curves 35 .
- Maintaining a uniform image size S can be achieved by filling a pre-defined image space 8 with relevant pixels (preferably without using zero padding), calculated from data of the CFUT 1 and thus representing important aspects of the CFUT 1 in the image 4 .
- the image space 8 defined by a particular type of image for example type A or B or C, is divided into different image segments 10 , for example into quadrants as illustrated on the top left of FIG. 6 .
- the segments 10 are connected to each other by a space filling curve 35 such as a Hilbert curve of 1 st order, as shown in the top right in FIG. 6 .
- This process can be iteratively repeated, in particular such that one, several or all of the segments 10 are further divided in analogous manner into smaller sub-segments 39 , as illustrated in the bottom left of FIG. 6 .
- These sub-segments 39 can then be connected using a space filling curve 35 such as a Hilbert curve of 2 nd order, as shown in the bottom right of FIG. 6 .
- This 2 nd order curve can be obtained by connecting the sub-segments 39 within a segment 10 by a Hilbert curve of order 1 , respectively, and then by rotating these 1 st order curves and connecting them.
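The mapping of a one-dimensional pixel stream onto a Hilbert curve can be sketched with the widely used iterative index-to-coordinate conversion. This is a standard formulation of the curve, not necessarily the construction used by the authors; the function names and the choice of starting corner are assumptions.

```python
import numpy as np


def hilbert_d2xy(order: int, d: int) -> tuple:
    """Convert position d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def fill_along_hilbert(pixels, order: int) -> np.ndarray:
    """Place a 1D stream of grey values along the curve; surplus pixels are ignored."""
    side = 1 << order
    image = np.zeros((side, side), dtype=np.uint8)
    for d, value in enumerate(pixels[: side * side]):
        x, y = hilbert_d2xy(order, d)
        image[y, x] = value
    return image


image = fill_along_hilbert(list(range(64)), order=3)   # fills an 8x8 image
```

Consecutive positions along the curve always land on neighboring pixels, so data that are close in the stream stay close in the image.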
- the last data cluster or some of the last data clusters, from which the data are randomly extracted can be chosen larger than the other data clusters of the CFUT 1 , from which the subset of data is extracted. That is, by varying the size of the data clusters from which the data are extracted, a desired number of 64 pixels can be generated from the CFUT 1 , just enough to fill up the 8×8 pixel image.
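The cluster-based random selection can be sketched as follows: the file is split into as many consecutive clusters as pixels are needed, one byte is drawn at random from each cluster, and the last cluster absorbs the remainder and is therefore larger. Drawing exactly one byte per cluster, the seeded generator and the function name are assumptions for the example.

```python
import random


def sample_one_byte_per_cluster(data: bytes, n_pixels: int, seed: int = 0) -> list:
    """Split `data` into n_pixels consecutive clusters and draw one random byte from each."""
    if n_pixels < 1 or len(data) < n_pixels:
        raise ValueError("need at least one byte per cluster")
    base = len(data) // n_pixels
    rng = random.Random(seed)            # seeded so the sampling is repeatable
    samples = []
    for i in range(n_pixels):
        start = i * base
        # The last cluster takes all remaining bytes and may be larger.
        end = len(data) if i == n_pixels - 1 else start + base
        samples.append(data[rng.randrange(start, end)])
    return samples


data = bytes(i % 256 for i in range(1000))
pixels = sample_one_byte_per_cluster(data, n_pixels=64)   # 64 values for an 8x8 image
```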
- a maximum order of 8 can be used for the Hilbert curve.
- a space filling curve 35 of lower order may be used, that is, the image space 8 will be divided more coarsely into image segments 10 .
- the approach of using data clusters and randomly selecting subsets of data from which to calculate an image 4 can also be applied to entropy images 4 of type B or to images of type C.
- maintaining a uniform image size S i can be achieved by a loss-less byte conversion that considers all data contained in the CFUT 1 .
- a large number of pixels are first calculated considering all data contained in the CFUT 1 ; afterwards, some of the computed pixels may be discarded, i.e. only a subset of the calculated pixels may be present in the final image.
- the CFUT 1 can also be downsized prior to image conversion, and this downsizing can likewise consider all data contained in the CFUT 1 .
- If the file to be converted, which may be a portion of the CFUT 1 for example, is larger than 49 kB, it may first be transformed into a continuous byte-string, and then every 2nd byte of the string may be deleted, thereby creating a reduced set of data. If this reduction is not sufficient, the reduction of the string may be repeated, e.g. by deleting every 3rd byte of the already reduced string; in this case, data strings of three consecutive bytes are downsized to a data block of two bytes. This way, a reproducible sampling of the CFUT 1 is performed, and the bytes remaining after downsizing are finally used for defining the RGB pixels.
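The repeated deletion of every k-th byte can be sketched as below. The text names only the first two steps (every 2nd byte, then every 3rd); continuing with every 4th, 5th, ... byte until the limit is met, and reading 49 kB as 49 × 1024 bytes, are assumptions.

```python
def drop_every_kth(data: bytes, k: int) -> bytes:
    """Delete every k-th byte, i.e. the bytes at positions k, 2k, ... counted from 1."""
    reduced = bytearray(data)
    del reduced[k - 1 :: k]
    return bytes(reduced)


def downsize(data: bytes, limit: int = 49 * 1024) -> bytes:
    """Shrink `data` step by step until it is no longer than `limit` bytes."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    k = 2
    while len(data) > limit:
        data = drop_every_kth(data, k)
        k += 1                           # assumed schedule: 2, 3, 4, ...
    return data


reduced = downsize(bytes(100), limit=40)  # 100 -> 50 -> 34 bytes
```

Because the deleted positions depend only on the length of the string, the same file always yields the same reduced byte-string.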
- all pixels of an image 4 may thus be calculated from a respective sequence of neighboring data blocks 38 of the CFUT 1 (in particular from bytes in close vicinity to each other within the CFUT 1 ), which is the case in the images 4 of type C shown in FIGS. 11 and 12 .
- the reduced byte string used for calculating these images 4 is not visualized directly (e.g. line-by-line), however. Rather, the pixels 36 calculated from the reduced byte string are re-allocated.
- FIG. 10 shows a simplified example of a possible re-allocation of pixels calculated from a one-dimensional string of bytes obtained from a CFUT 1 .
- the 1D-stream of pixels 36 is not simply filled line-by-line into the image space 8 . Rather, pixels 36 are re-allocated in blocks filling respective image segments 10 of the image 4 .
- Such a re-allocation is characterized in that neighboring pixels (e.g. the pixels #10-18) within a particular image segment 10 of the image 4 are calculated from data blocks 38 of the CFUT 1 , which are contained within a specific local data segment 43 (cf. FIG. 3 ) of the CFUT 1 .
- the locality of information is preserved, i.e. pixels 36 in close vicinity to each other represent binary data that are spatially close to each other within the CFUT 1 .
- the original CFUT 1 has been divided into different data segments 43 , which have then been individually mapped onto different image segments 10 of the final image 4 .
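The block-wise re-allocation can be sketched with a reshape and an axis swap: consecutive runs of pixels fill one square image segment each, instead of being written line by line across the whole image. Square segments, their row-by-row ordering and the function name are assumptions for the example.

```python
import numpy as np


def reallocate_in_blocks(pixels, side: int, block: int) -> np.ndarray:
    """Fill a side x side image segment by segment, each segment being block x block."""
    if side % block:
        raise ValueError("side must be a multiple of block")
    stream = np.asarray(pixels)
    if stream.size != side * side:
        raise ValueError("expected exactly side * side pixels")
    per_row = side // block
    # Axes: (segment row, segment column, y inside segment, x inside segment).
    grid = stream.reshape(per_row, per_row, block, block)
    return grid.transpose(0, 2, 1, 3).reshape(side, side)


image = reallocate_in_blocks(np.arange(36), side=6, block=3)
```

With this layout the first nine pixels of the stream occupy the top-left 3×3 segment and the next nine occupy the segment to its right, so pixels within a segment stem from one local stretch of the stream.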
- the skeleton images 12 shown in FIG. 11 have been calculated by first extracting a skeleton content of the CFUT 1 as a skeleton-file and then converting the skeleton-file into an image 4 of type C1 (cf. FIG. 1 ). This is done by first transforming the skeleton-file into a byte string, reducing the byte string as explained above, with 3 consecutive bytes of the reduced string defining one RGB-pixel, and then re-allocating the pixels as explained before.
- Images of type B are images 4 which visualize Shannon-entropies of local data blocks contained in the CFUT 1 . This is done by calculating a ratio of a local Shannon-entropy of a data block and a total Shannon-entropy of the CFUT 1 as a merit number, and by plotting these local relative entropies after normalization.
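The merit number can be written as $r_k = H(b_k) / H(F)$ with the Shannon entropy $H(X) = -\sum_v p_v \log_2 p_v$ taken over byte values, where $b_k$ is the k-th data block and $F$ the whole file. A minimal sketch follows; the fixed block size and the function names are assumptions, and the final normalization to pixel values is left out.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0..8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def relative_local_entropies(data: bytes, block_size: int) -> list:
    """Ratio of each block's entropy to the entropy of the whole file."""
    total = shannon_entropy(data)
    starts = range(0, len(data), block_size)
    if total == 0.0:
        return [0.0 for _ in starts]
    return [shannon_entropy(data[i:i + block_size]) / total for i in starts]


ratios = relative_local_entropies(b"\x00" * 16 + bytes(range(16)), block_size=16)
```

A ratio can exceed 1 when a block is more varied than the file as a whole, which is one reason a normalization step is needed before plotting.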
- the resulting images 4 can be color images 18 of one dimensional pixels 37 , as illustrated in FIG. 9 .
- a particularly efficient solution is the use of compound images 11 as shown in FIG. 13 .
- Such images may be obtained by combining type A, B and C images 4 .
- some image cropping may be tolerable, in particular when using sub-images of different image sizes.
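Combining several images into one compound image can be sketched by tiling them. The 2×2 layout, the requirement that all four sub-images share one shape, and the function name are assumptions; the text allows sub-images of different sizes and does not fix an arrangement.

```python
import numpy as np


def compose_compound_image(type_a, type_b, skeleton, bacon) -> np.ndarray:
    """Tile four equally shaped sub-images into one 2x2 compound image."""
    tiles = [np.asarray(t) for t in (type_a, type_b, skeleton, bacon)]
    if len({t.shape for t in tiles}) != 1:
        raise ValueError("this sketch expects sub-images of identical shape")
    top = np.concatenate(tiles[:2], axis=1)      # type A | type B
    bottom = np.concatenate(tiles[2:], axis=1)   # skeleton | bacon
    return np.concatenate([top, bottom], axis=0)


# Hypothetical 4x4 RGB sub-images filled with constant values.
subs = [np.full((4, 4, 3), v, dtype=np.uint8) for v in (1, 2, 3, 4)]
compound = compose_compound_image(*subs)
```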
- Type A images 4 can be images of a uniform image size, obtained by mapping a random selection of data contained in the CFUT 1 to an image space 8 of the image 4 .
- Type B images 4 are entropy images 4 , which may be obtained by calculating entropies of data-blocks of the CFUT 1 and visualizing the entropies as ratios of the total entropy of the CFUT 1 .
- Type C images 4 may be obtained by applying a byte-conversion to the CFUT 1 , as explained above.
- a method is proposed that can handle computer files 1 of varying types and sizes and at the same time maintain a high detection performance by classifying a number of different types (A, B, C, D, E) of images 4 , each calculated or derived from a particular computer file under test 1 (CFUT), using artificial intelligence methods such as machine learning, in particular deep learning, for example as provided by neural networks 24 or supervised deep learning algorithms.
- the different image types are generated using different image conversion techniques and a number of approaches are presented for computing images 4 of uniform size Si that contain relevant information for classifying the CFUT 1 .
r=X i /S i varies largely.
- a first layer of classification supported by tree-based and/or supervised machine learning algorithms that are fed with data features, in particular statistical data and/or metadata, extracted from the CFUT 1;
- a second layer of classification based on detection of specific image features, in particular supported by supervised deep learning algorithms; and
- a third layer of classification that is fed with classification results from the first and second layer.
- 1 computer file under test (CFUT)
- 2 malware
- 3 image data
- 4 image
- 5 multitude (of 4)
- 6 image classification module
- 7 classifier module
- 8 image space (defined by 4)
- 9 series of images
- 10 image segment (of 4)
- 11 compound image
- 12 skeleton-image
- 13 bacon-image
- 14 data features (of 1)
- 15 data classification module
- 16 matrix image
- 17 grey-scale image (1D-pixels have values from 0 . . . 255)
- 18 color image (pixels may have 1-3 dimensions: e.g. R or R/G/B)
- 19 latent space concatenation and classification
- 20 conversion engine (calculates 4 from 1)
- 21 extraction of data features
- 22 artificial intelligence stack
- 23 final classification score
- 24 neural network
- 25 latent space classification module
- 26 decision tree (DT)
- 27 encoding matrix
- 28 structure (of 1)
- 29 input layer
- 30 hidden layer
- 31 output layer
- 32 classification engine
- 33 skeleton
- 34 bacon
- 35 space filling curve
- 36 multidimensional pixel
- 37 one-dimensional pixel
- 38 data block
- 39 sub-segment
- 40 skeleton portion (of 4)
- 41 bacon portion (of 4)
- 42 pre-classification score
- 43 data segment
- 44 latent space representation (of 4)
- 45 convolution window
Claims (27)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2020/080599 WO2022089763A1 (en) | 2020-10-30 | 2020-10-30 | Method for detection of malware |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230394144A1 (en) | 2023-12-07 |
| US12505211B2 (en) | 2025-12-23 |
Family
ID=73059889
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/031,640 Active 2041-03-23 US12505211B2 (en) | 2020-10-30 | 2020-10-30 | Method for detection of malware |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12505211B2 (en) |
| EP (1) | EP4237977B1 (en) |
| CN (1) | CN116368487B (en) |
| WO (1) | WO2022089763A1 (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12541591B2 (en) | 2022-04-25 | 2026-02-03 | Palo Alto Networks, Inc. | Malware detection for documents using knowledge distillation assisted learning |
| US12348560B2 (en) * | 2022-04-25 | 2025-07-01 | Palo Alto Networks, Inc. | Detecting phishing PDFs with an image-based deep learning approach |
| CN116996278B (en) * | 2023-07-21 | 2024-01-19 | 广东技术师范大学 | Webpage detection method and device based on mining behavior of WASM module |
| US20250209165A1 (en) * | 2023-12-22 | 2025-06-26 | Emergent Security, LLC | Data Tampering Defense System |
| CN117892301B (en) * | 2024-01-15 | 2024-06-28 | 湖北大学 | Few-sample malware classification method, device, equipment and medium |
| US20250274479A1 (en) * | 2024-02-22 | 2025-08-28 | Vodafone Group Services Limited | Fingerprinting network sessions for discovery of cyber threats |
| CN118585996B (en) * | 2024-08-07 | 2024-10-18 | 浙江大学 | Malicious mining software detection method based on large language model |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060282886A1 (en) * | 2005-06-09 | 2006-12-14 | Lockheed Martin Corporation | Service oriented security device management network |
| US20130117850A1 (en) * | 2011-11-09 | 2013-05-09 | Douglas Britton | System and Method for Bidirectional Trust Between Downloaded Applications and Mobile Devices Including a Secure Charger and Malware Scanner |
| US10437999B1 (en) * | 2016-08-31 | 2019-10-08 | Symantec Corporation | Runtime malware detection |
| US20190311120A1 (en) * | 2018-04-10 | 2019-10-10 | Raytheon Company | Device behavior anomaly detection |
| CN110532771A (en) | 2018-05-23 | 2019-12-03 | 深信服科技股份有限公司 | Malicious file detection method, device, equipment and computer readable storage medium |
| US20200175164A1 (en) * | 2018-12-03 | 2020-06-04 | Mayachitra, Inc. | Malware classification and detection using audio descriptors |
| US20200317458A1 (en) * | 2019-04-04 | 2020-10-08 | Seiko Epson Corporation | Image processing apparatus, machine learning device, and image processing method |
| US10977367B1 (en) * | 2018-02-06 | 2021-04-13 | Facebook, Inc. | Detecting malicious firmware modification |
| CN108280348B (en) | 2018-01-09 | 2021-06-22 | 上海大学 | Android malicious software identification method based on RGB image mapping |
| US20210397877A1 (en) * | 2020-06-23 | 2021-12-23 | IronNet Cybersecurity, Inc. | Systems and methods of detecting anomalous websites |
| US11556644B1 (en) * | 2018-12-24 | 2023-01-17 | Cloudflare, Inc. | Machine learning-based malicious attachment detector |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9111094B2 (en) * | 2011-01-21 | 2015-08-18 | F-Secure Corporation | Malware detection |
2020
- 2020-10-30 CN CN202080106696.XA patent/CN116368487B/en active Active
- 2020-10-30 WO PCT/EP2020/080599 patent/WO2022089763A1/en not_active Ceased
- 2020-10-30 EP EP20800863.1A patent/EP4237977B1/en active Active
- 2020-10-30 US US18/031,640 patent/US12505211B2/en active Active
Non-Patent Citations (4)
| Title |
|---|
| Kumar Ajit et al: "Machine learning based malware classification for Android applications using multimodal image representations", 2016 10th International Conference on Intelligent Systems and Control (ISCO), IEEE, pp. 1-6, Jan. 7, 2016 (Jan. 7, 2016). |
| Li Chen: "Deep Transfer Learning for Static Malware Classification", arxiv.org, Cornell University Library, 201 Olin Library Cornell University Ithaca, NY 14853,Abstract, Sections I-IV, Figure 1, 9, 10 Tables I, II, III, IV, 9 pages, Dec. 18, 2018 (Dec. 18, 2018). |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4237977A1 (en) | 2023-09-06 |
| EP4237977C0 (en) | 2024-01-17 |
| CN116368487A (en) | 2023-06-30 |
| WO2022089763A1 (en) | 2022-05-05 |
| US20230394144A1 (en) | 2023-12-07 |
| CN116368487B (en) | 2025-09-16 |
| EP4237977B1 (en) | 2024-01-17 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US12505211B2 (en) | Method for detection of malware | |
| Vu et al. | HIT4Mal: Hybrid image transformation for malware classification | |
| Rezende et al. | Malicious software classification using transfer learning of resnet-50 deep neural network | |
| KR101863615B1 (en) | Apparatus for detecting variants of a malicious code based on neural network learning, method thereof and computer recordable medium storing program to perform the method | |
| Vu et al. | A convolutional transformation network for malware classification | |
| Chen | Deep transfer learning for static malware classification | |
| Waghela et al. | Robust image classification: Defensive strategies against FGSM and PGD adversarial attacks | |
| CN108062478B (en) | A malicious code classification method combining global feature visualization and local features | |
| US12183056B2 (en) | Adversarially robust visual fingerprinting and image provenance models | |
| CN113139618B (en) | Robustness-enhanced classification method and device based on integrated defense | |
| Kornish et al. | Malware classification using deep convolutional neural networks | |
| CN118051908A (en) | Malicious code homology detection method, device, equipment and storage medium | |
| CN116975864A (en) | Malicious code detection method, device, electronic equipment and storage medium | |
| CN116522341A (en) | Malicious software countermeasure sample generation method based on pixel attention mechanism | |
| CN119026127B (en) | Malicious code detection method, system and equipment based on multi-level feature fusion | |
| CN115292702A (en) | Malicious code family identification method, device, equipment and storage medium | |
| CN111797397A (en) | Malicious code visualization and variant detection method, device and storage medium | |
| CN114861178B (en) | Malicious code detection engine design method based on improved B2M algorithm | |
| CN113761912B (en) | A method and device for interpreting the determination of malware belonging to an attacking organization | |
| CN116383818A (en) | Malicious code family detection method and device | |
| Liaqat et al. | Deep Learning-based Malware Detection Using Independent Stream Analysis of RGB and Grayscale Images | |
| Xu et al. | Band selection for hyperspectral images based on particle swarm optimization and differential evolution algorithms with hybrid encoding | |
| Chavda | Image spam detection | |
| Alam et al. | Mining android bytecodes through the eyes of gabor filters for detecting malware. | |
| Wang et al. | RL4Mal: Representation learning-based malware classification under long-tailed distribution |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INLYSE GMBH, GERMANY; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BOLL, CHRISTIAN; ZIEGLER, JULIAN; REEL/FRAME: 063311/0179; Effective date: 20230330 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |