CN114760131B - Feature extraction method, device and equipment for return-oriented programming traffic

Info

Publication number
CN114760131B
CN114760131B CN202210394785.XA CN202210394785A CN114760131B CN 114760131 B CN114760131 B CN 114760131B CN 202210394785 A CN202210394785 A CN 202210394785A CN 114760131 B CN114760131 B CN 114760131B
Authority
CN
China
Prior art keywords
data
flow
group
detected
return
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210394785.XA
Other languages
Chinese (zh)
Other versions
CN114760131A (en)
Inventor
王剑
张梦杰
杨刚
刘星彤
黄恺杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202210394785.XA
Publication of CN114760131A
Application granted
Publication of CN114760131B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577: Assessing vulnerabilities and evaluating computer system security
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a feature extraction method, device and equipment for return-oriented programming (ROP) traffic. The method comprises: acquiring the traffic to be detected and, starting from the first byte, cyclically labeling its bytes with sequence numbers according to a preset number of dimensions; dividing bytes with the same sequence number into the same group to obtain a first data group containing a plurality of groups of traffic data; applying a sliding window of a preset window size to each group of traffic data in the first data group to obtain a second data group; and taking the variance of the data at the same window sliding position across the groups of the second data group, thereby converting it into a one-dimensional data group, and determining the features of the return-oriented programming traffic in the traffic to be detected according to the abnormal-variance regions recorded in the one-dimensional data group. This technical scheme identifies attack data effectively, improves detection efficiency, and reduces the high false alarm rate that traditional static detection incurs by focusing on important bytes.

Description

Feature extraction method, device and equipment for return-oriented programming traffic
Technical Field
The present invention relates to the field of network traffic detection, and in particular to a feature extraction method, device and equipment for return-oriented programming traffic.
Background
Modern operating systems have relatively complete vulnerability mitigation mechanisms. The memory access permissions of a process can be set at memory-page granularity as readable (R), writable (W) and executable (X), and once the CPU (central processing unit) executes code in memory that lacks the executable permission, the operating system immediately terminates the program. Under this mitigation rule a program generally has no memory that is both writable and executable, so an attacker cannot directly modify the code segment or data segment of the program by overwriting memory data in order to direct the program to execute arbitrary code. In response to this mitigation mechanism, a technique has emerged that controls the program's execution flow by returning to specific instruction sequences already present in the program, namely return-oriented programming (ROP). It achieves arbitrary code execution by chaining a number of gadget fragments that end with a ret instruction in the libc code library into a return-oriented programming chain (ROP chain), where the libc code library refers to a dynamic library or system static library involved in running the code. The ROP chain is therefore the distinguishing mark of return-oriented programming traffic, and detecting the ROP chain amounts to detecting return-oriented programming traffic.
Conventional techniques have difficulty detecting ROP traffic because it contains no injected binary code. A key feature of ROP code is its reliance on gadgets in the code segment. Current ROP traffic detection usually first initializes the target process and identifies the memory address range of its code segment, then scans the input data and checks whether any of it falls within the gadget space of a protected application; this approach has a low detection rate and a high runtime overhead. To improve detection efficiency, researchers have tried static detection methods, but in high-volume network traffic environments no suitable feature extraction method exists and the false alarm rate is high.
In summary, how to effectively extract the features of return-oriented programming traffic and thereby improve detection efficiency is a problem that remains to be solved.
Disclosure of Invention
In view of the above, the present invention aims to provide a feature extraction method, device and equipment for return-oriented programming traffic that can effectively identify return-oriented programming traffic data, improve detection efficiency, and extract suitable features from such data. The specific scheme is as follows:
In a first aspect, the present application discloses a feature extraction method for return-oriented programming traffic, including:
acquiring the traffic to be detected and, starting from the first byte, cyclically labeling the bytes in the traffic to be detected with sequence numbers according to a preset number of dimensions;
dividing the bytes with the same sequence number in the traffic to be detected into the same group to obtain a first data group containing a plurality of groups of traffic data, the number of groups of traffic data in the first data group being equal to the preset number of dimensions;
applying a sliding window of a preset window size to each group of traffic data in the first data group to obtain a second data group with the corresponding number of groups of data;
and taking the variance of the data at the same window sliding position across the groups of the second data group, thereby converting the second data group into a one-dimensional data group, and determining the features of the return-oriented programming traffic in the traffic to be detected according to the abnormal-variance regions recorded in the one-dimensional data group.
Optionally, before acquiring the traffic to be detected, the method further includes:
acquiring traffic data to be processed;
and removing the protocol header information from the traffic data to be processed and retaining its payload to obtain the traffic to be detected.
Optionally, cyclically labeling the bytes in the traffic to be detected with sequence numbers from the first byte according to the preset number of dimensions includes:
determining the type of the computer currently in use;
if the computer is a 32-bit computer, cyclically labeling the bytes in the traffic to be detected with 4 sequence numbers, starting from the first byte;
if the computer is a 64-bit computer, cyclically labeling the bytes in the traffic to be detected with 8 sequence numbers, starting from the first byte;
correspondingly, dividing the bytes with the same sequence number in the traffic to be detected into the same group to obtain a first data group containing a plurality of groups of traffic data includes:
if the computer is a 32-bit computer, dividing the bytes with the same sequence number in the traffic to be detected into the same group to obtain a first data group containing 4 groups of traffic data;
if the computer is a 64-bit computer, dividing the bytes with the same sequence number in the traffic to be detected into the same group to obtain a first data group containing 8 groups of traffic data.
Optionally, before taking the variance of the data at the same window sliding position across the groups of the second data group, the method further includes:
converting each group of traffic data in the first data group into a corresponding traffic signal diagram, and determining whether exactly one of the traffic signal diagrams has stable data values in a target region;
if exactly one traffic signal diagram has stable data values in the target region, return-oriented programming traffic data is present in the traffic to be detected, and the step of taking the variance of the data at the same window sliding position across the groups of the second data group is then performed.
Optionally, determining the features of the return-oriented programming traffic in the traffic to be detected according to the abnormal-variance regions recorded in the one-dimensional data group includes:
if the variance corresponding to a first region in the one-dimensional data group is larger than the variance corresponding to a second region, and the difference between the two variances is larger than a preset threshold, determining that the first region is the region containing return-oriented programming traffic data, and extracting the features of the return-oriented programming traffic in the first region.
Optionally, applying the sliding window of the preset window size to the first data group includes:
applying the sliding window of the preset window size to the first data group based on a preset value rule, the preset value rule being that, at each sliding position, the number of distinct byte values within the sliding window is taken as the resulting value.
Optionally, applying the sliding window to the first data group includes:
applying the sliding window to the first data group with a preset sliding step.
In a second aspect, the present application discloses a feature extraction device for return-oriented programming traffic, the device comprising:
a sequence number labeling module, configured to acquire the traffic to be detected and, starting from the first byte, cyclically label the bytes in the traffic to be detected with sequence numbers according to a preset number of dimensions;
a first data group determining module, configured to divide the bytes with the same sequence number in the traffic to be detected into the same group to obtain a first data group containing a plurality of groups of traffic data, the number of groups of traffic data in the first data group being equal to the preset number of dimensions;
a second data group determining module, configured to apply a sliding window of a preset window size to each group of traffic data in the first data group to obtain a second data group with the corresponding number of groups of traffic data;
a feature extraction module, configured to take the variance of the data at the same window sliding position across the groups of the second data group, thereby converting the second data group into a one-dimensional data group, and to determine the features of the return-oriented programming traffic in the traffic to be detected according to the abnormal-variance regions recorded in the one-dimensional data group.
In a third aspect, the present application discloses an electronic device comprising a processor and a memory; wherein the memory is configured to store a computer program that is loaded and executed by the processor to implement the feature extraction method for return-oriented programming traffic as described above.
In the method, the traffic to be detected is first acquired, and its bytes are cyclically labeled with sequence numbers from the first byte according to a preset number of dimensions; the bytes with the same sequence number are divided into the same group to obtain a first data group containing a plurality of groups of traffic data, the number of groups being equal to the preset number of dimensions; a sliding window of a preset window size is applied to each group of traffic data in the first data group to obtain a second data group with the corresponding number of groups of data; and the variance of the data at the same window sliding position is taken across the groups of the second data group, thereby converting the second data group into a one-dimensional data group, and the features of the return-oriented programming traffic in the traffic to be detected are determined according to the abnormal-variance regions recorded in the one-dimensional data group.
Because the bytes are labeled cyclically from the first byte of the traffic to be detected and bytes with the same sequence number are placed in the same group, the first byte of each gadget address in the traffic is separated from the bytes that follow it. The traffic to be detected is thus decomposed into a plurality of groups of traffic data, the first data group, which highlights the features of return-oriented programming traffic. The first data group is then converted numerically by sliding-window evaluation into a second data group with the corresponding number of groups; this extraction is simple and does not depend on the target environment. Finally, the variance of the data at the same window sliding position is taken across the groups of the second data group, converting it into a one-dimensional data group, which shortens the data and compresses its value range, and the features of the return-oriented programming traffic in the traffic to be detected are extracted from the abnormal-variance regions recorded in the one-dimensional data group. The whole process requires no prior knowledge of the memory address range of the host; attack data is identified from the byte distribution characteristics of return-oriented programming traffic, which improves detection efficiency and also reduces the high false alarm rate that traditional static detection incurs by focusing on individual bytes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a feature extraction method for return-oriented programming traffic disclosed in the present application;
FIG. 2 is an explanatory diagram of variance feature extraction for return-oriented programming traffic disclosed in the present application;
FIG. 3a is a schematic diagram of variance feature extraction for return-oriented programming traffic disclosed in the present application;
FIG. 3b is a schematic diagram of variance feature extraction for non-return-oriented programming traffic disclosed in the present application;
FIG. 4 is a flowchart of a specific feature extraction method for return-oriented programming traffic disclosed in the present application;
FIG. 5 is an original traffic signal diagram of return-oriented programming traffic disclosed in the present application;
FIG. 6 is a traffic signal diagram obtained by converting the original traffic signal diagram in a specific embodiment disclosed in the present application;
FIG. 7 is a schematic structural diagram of a feature extraction device for return-oriented programming traffic disclosed in the present application;
FIG. 8 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Currently, return-oriented programming traffic data is identified and detected by checking whether the input contains data that falls within the gadget space of a protected application; this technique has a low detection rate and a high runtime overhead. To improve detection efficiency, static detection methods have been tried, but in high-volume network traffic environments no suitable feature extraction method exists and the false alarm rate is high.
To this end, the present application provides a feature extraction scheme for return-oriented programming traffic that can effectively identify return-oriented programming traffic data, improve detection efficiency, and extract suitable features from such data.
An embodiment of the invention discloses a feature extraction method for return-oriented programming traffic which, as shown in FIG. 1, includes the following steps:
Step S11: acquire the traffic to be detected and, starting from the first byte, cyclically label the bytes in the traffic to be detected with sequence numbers according to a preset number of dimensions.
In this embodiment of the present application, before the traffic to be detected is acquired, the method further includes: acquiring traffic data to be processed; and removing the protocol header information from the traffic data to be processed and retaining its payload to obtain the traffic to be detected. The standard length of the traffic to be detected is set to 1460 bytes, the maximum payload length of a single data packet transmitted over the network, and a payload shorter than this is padded at the end with 0xFF bytes.
In this embodiment it is assumed that return-oriented programming traffic data is present in the traffic data to be processed. After the traffic data to be processed is acquired, its protocol header information is removed, and the bytes of the resulting traffic to be detected are cyclically labeled with sequence numbers from the first byte according to the preset number of dimensions.
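The length normalization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `normalize_payload` is hypothetical, the payload is assumed to have already been separated from its protocol headers, and truncating payloads longer than 1460 bytes is an assumption that the description does not state.

```python
STANDARD_LENGTH = 1460  # standard length of the traffic to be detected, per the description
PAD_BYTE = 0xFF         # padding value, per the description


def normalize_payload(payload: bytes, length: int = STANDARD_LENGTH) -> bytes:
    """Pad a header-stripped payload with 0xFF bytes up to `length`.

    Truncating longer input is an assumption made for this sketch.
    """
    if len(payload) >= length:
        return payload[:length]
    return payload + bytes([PAD_BYTE]) * (length - len(payload))


sample = normalize_payload(b"GET / HTTP/1.1\r\n\r\n")
```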
In one embodiment, if the computer is a 32-bit computer, i.e. the method runs in a 32-bit operating system environment, each unit of return-oriented programming traffic data is four bytes long. If an address byte pointing to a gadget-like instruction sequence is found at offset n of the data, then the data at every subsequent 4-byte interval, i.e. at offsets n+4, n+8, n+12, …, must likewise contain similar address information pointing to an instruction sequence. The bytes in the traffic to be detected are therefore cyclically labeled with 4 sequence numbers from the first byte, i.e. labeled in the repeating order 0, 1, 2, 3 starting from the first byte of the traffic to be detected, which allows the traffic to be detected to be extracted in order.
In another embodiment, if the computer is a 64-bit computer, i.e. the method runs in a 64-bit operating system environment, each unit of return-oriented programming traffic data is eight bytes long. The bytes in the traffic to be detected are therefore cyclically labeled with 8 sequence numbers from the first byte, i.e. labeled in the repeating order 0, 1, 2, 3, 4, 5, 6, 7 starting from the first byte of the traffic to be detected, which allows the traffic to be detected to be extracted in order.
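To see why address bytes recur at a fixed stride, the following sketch packs a few gadget addresses into a 32-bit chain. The addresses are invented for illustration, and little-endian packing (as on x86) is an assumption; the description does not name an architecture.

```python
import struct

# Hypothetical gadget addresses, all inside one library mapping (illustrative values only).
GADGETS = [0xF7E12A3C, 0xF7E4B190, 0xF7E9077E, 0xF7E2C5D1]


def build_chain(addresses):
    """Concatenate 32-bit little-endian addresses, as they would appear in a payload."""
    return b"".join(struct.pack("<I", a) for a in addresses)


chain = build_chain(GADGETS)
high_bytes = chain[3::4]  # offsets 3, 7, 11, ...: most significant byte of each address
low_bytes = chain[0::4]   # offsets 0, 4, 8, ...: least significant byte of each address
```

In this sketch `high_bytes` is the same value repeated, while `low_bytes` varies from address to address, which is the byte-distribution contrast the grouping step is meant to expose.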
Step S12: divide the bytes with the same sequence number in the traffic to be detected into the same group to obtain a first data group containing a plurality of groups of traffic data, the number of groups of traffic data in the first data group being equal to the preset number of dimensions.
In this embodiment of the present application, the traffic to be detected is converted, based on the preset number of dimensions, into a first data group containing a plurality of groups of traffic data. Specifically, the bytes with the same sequence number in the traffic to be detected are placed in the same group, so that the number of groups in the first data group equals the preset number of dimensions.
In one embodiment, if the computer is a 32-bit computer, the bytes with the same sequence number in the traffic to be detected are grouped by 4-dimensional ordered extraction, i.e. the bytes with sequence number 0 form one group, the bytes with sequence number 1 form another group, and so on, giving a first data group containing 4 groups of traffic data.
In another embodiment, if the computer is a 64-bit computer, the bytes with the same sequence number in the traffic to be detected are grouped by 8-dimensional ordered extraction, giving a first data group containing 8 groups of traffic data.
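Cyclic labeling followed by grouping is equivalent to taking every `dims`-th byte starting at each offset. A minimal sketch, in which the function name `split_into_groups` is hypothetical:

```python
def split_into_groups(data: bytes, dims: int) -> list[bytes]:
    """Group bytes by their cyclic sequence number (0 .. dims-1).

    Group k holds the bytes at offsets k, k + dims, k + 2*dims, ...
    Use dims=4 for the 32-bit case and dims=8 for the 64-bit case.
    """
    if dims <= 0:
        raise ValueError("dims must be positive")
    return [data[k::dims] for k in range(dims)]


groups_32 = split_into_groups(bytes(range(8)), 4)
groups_64 = split_into_groups(bytes(range(16)), 8)
```

When the data length is not a multiple of `dims`, the later groups end up one byte shorter than the earlier ones; how the original method treats that case is not stated in the description.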
Step S13: apply a sliding window of a preset window size to each group of traffic data in the first data group to obtain a second data group with the corresponding number of groups of traffic data.
In this embodiment, after the traffic to be detected has been decomposed into a first data group containing a plurality of groups of traffic data, a sliding window is applied to the first data group to obtain a second data group with the corresponding number of groups of data.
Specifically, a sliding window of a preset window size is moved over the first data group with a preset sliding step and evaluated according to a preset value rule: at each sliding position, the number of distinct byte values within the window is taken as the resulting value. It should be noted that the preset window size determines the range over which the values can vary; its specific value may be set according to the actual situation and is not limited here.
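The value rule above (number of distinct byte values per window) can be sketched as follows. The function name `distinct_counts` is hypothetical; the defaults of window size 8 and step 1 are taken from the worked example in the description and are not mandated values.

```python
def distinct_counts(group: bytes, window: int = 8, step: int = 1) -> list[int]:
    """Slide a window over one group and count the distinct byte values at each position."""
    return [
        len(set(group[i:i + window]))
        for i in range(0, len(group) - window + 1, step)
    ]


# A group that cycles through the byte values 61, 10, ff (hex).
counts = distinct_counts(bytes.fromhex("6110ff6110ff6110ff"))
```

A group shorter than the window yields an empty list in this sketch.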
Step S14: take the variance of the data at the same window sliding position across the groups of the second data group, thereby converting the second data group into a one-dimensional data group, and determine the features of the return-oriented programming traffic in the traffic to be detected according to the abnormal-variance regions recorded in the one-dimensional data group.
In this embodiment of the present application, after the first data group has been converted numerically into the second data group by sliding-window evaluation, the variance of the data at the same window sliding position is taken across the groups of the second data group, which converts the second data group into a one-dimensional data group; the features of the return-oriented programming traffic in the traffic to be detected are then determined according to the abnormal-variance regions recorded in the one-dimensional data group.
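Written out, the per-position variance can be expressed as below. The notation is introduced here for illustration only: d is the preset number of dimensions, w_{k,j} is the sliding-window value of group k at window position j, and v_j is the j-th entry of the one-dimensional data group. The description does not say whether the population or the sample variance is meant; the population form is assumed.

```latex
\mu_j = \frac{1}{d}\sum_{k=1}^{d} w_{k,j},
\qquad
v_j = \frac{1}{d}\sum_{k=1}^{d}\left(w_{k,j}-\mu_j\right)^{2}
```

For example, with d = 4 and window values (3, 3, 3, 3) the variance is 0, whereas with values (8, 8, 8, 1) the mean is 6.25 and the variance is 9.1875.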
An embodiment of the present application is described in detail below with reference to FIG. 2. Assume that the method runs in a 32-bit operating system. The bytes of the traffic to be detected are cyclically labeled with the sequence numbers 0, 1, 2 and 3, and the bytes with the same sequence number are then placed in the same group: the bytes labeled 0 form group T1, the bytes labeled 1 form group T2, the bytes labeled 2 form group T3, and the bytes labeled 3 form group T4, giving a first data group containing 4 groups of traffic data. A sliding window is then applied to each of the 4 groups, for example a window of length 8 with a step of 1, where the value rule is that each value is the number of distinct byte values within the window. In group T1, for instance, when sliding starts the byte values in the window take three distinct values, 61, 10 and ff, so the sliding-window value is 3; after sliding forward by one step the byte values in the window take the three distinct values 10, ff and 61, so the sliding-window value is again 3. Proceeding in the same way gives the 4 corresponding groups of values T12, T22, T32 and T42. In T12, a1 is 3, a2 is 3, and so on. The variance is then taken over the data at the same window sliding position in T12, T22, T32 and T42, i.e. over a1, b1, c1, d1, then over a2, b2, c2, d2, and so on, which converts the second data group into a one-dimensional data group.
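The 32-bit walk-through can be reproduced end to end with a short sketch. Everything below is illustrative: `variance_profile`, `filler` and `fake_chain` are hypothetical names, the payload is synthetic, the gadget addresses are invented and packed little-endian, and the population variance is an assumption.

```python
import struct
from statistics import pvariance


def variance_profile(data: bytes, dims: int = 4, window: int = 8, step: int = 1) -> list:
    """Group bytes, count distinct values per window, then take the variance across groups."""
    groups = [data[k::dims] for k in range(dims)]  # first data group
    counts = [
        [len(set(g[i:i + window])) for i in range(0, len(g) - window + 1, step)]
        for g in groups
    ]  # second data group
    # zip() stops at the shortest group, so unequal group lengths are tolerated.
    return [pvariance(column) for column in zip(*counts)]  # one-dimensional data group


def filler(n: int) -> bytes:
    """Synthetic non-ROP bytes: every group sees 8 distinct values in each window."""
    return bytes((7 * i) % 256 for i in range(n))


def fake_chain(count: int) -> bytes:
    """Invented gadget addresses that share the high byte 0xF7."""
    out = b""
    for i in range(count):
        b0 = (17 * i + 5) % 256
        b1 = (29 * i + 11) % 256
        b2 = (43 * i + 3) % 256
        out += struct.pack("<I", 0xF7000000 | (b2 << 16) | (b1 << 8) | b0)
    return out


payload = filler(64) + fake_chain(16) + filler(64)
profile = variance_profile(payload)
```

In this synthetic payload the window values inside the chain are (8, 8, 8, 1), so the profile rises from 0 at the start to 9.1875 at positions 16 to 24, the windows that lie entirely within the chain.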
It can be appreciated that for non-return-oriented programming traffic the bytes of the four groups are either stable at the same time or disordered at the same time, so the four sliding-window values are all large or all small together, and the variance is relatively stable and small. For return-oriented programming traffic data, one of the values becomes small while the other three remain relatively large, so the variance in that region is large. The variance feature therefore captures the four-dimensional value characteristic of return-oriented programming traffic and distinguishes it from non-return-oriented programming traffic, so the features of the return-oriented programming traffic in the traffic to be detected can be determined according to the abnormal-variance regions recorded in the one-dimensional data group.
FIG. 3a is a schematic diagram obtained by extracting variance features from traffic data that contains return-oriented programming traffic, and FIG. 3b is a schematic diagram obtained by extracting variance features from traffic data that does not. In non-return-oriented programming traffic the variance is generally small and relatively stable, whereas in return-oriented programming traffic the variance over a region that persists for some length is far larger than in non-return-oriented programming traffic. That region is judged to be the region containing the ROP attack chain traffic data, and the traffic to be detected is accordingly judged to be return-oriented programming traffic.
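One way to locate such a persistent high-variance region is sketched below. The description only requires that the variance of one region exceed that of another by more than a preset threshold; using the median of the profile as the reference level and requiring a minimum run length (`min_run`) are assumptions made for this illustration, and the function name `abnormal_regions` is hypothetical.

```python
from statistics import median


def abnormal_regions(profile, threshold, min_run=4):
    """Return (start, end) index pairs where the variance exceeds the baseline by more than `threshold`.

    `end` is exclusive. The median baseline and `min_run` are assumptions of this sketch.
    """
    if not profile:
        return []
    baseline = median(profile)
    regions, start = [], None
    for i, v in enumerate(profile):
        if v - baseline > threshold:
            if start is None:
                start = i  # a candidate region begins here
        else:
            if start is not None and i - start >= min_run:
                regions.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_run:
        regions.append((start, len(profile)))
    return regions


example = [0.2] * 10 + [9.0] * 6 + [0.3] * 10
found = abnormal_regions(example, threshold=2.0)
```

The minimum run length keeps an isolated spike from being reported as a region.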
In the method, firstly, the flow to be detected is obtained, and corresponding serial numbers of bytes in the flow to be detected are marked in a circulating mode sequentially from the first byte according to the number of preset dimensions; dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing a plurality of groups of flow data; the number of the groups of the flow data in the first data group is consistent with the number of the preset dimensions; respectively carrying out sliding window value on each group of flow data in the first data group by utilizing a sliding window with a preset window size so as to obtain a second data group with data of corresponding group number; and taking variance from the data of the same window sliding position in each group of data of the second data group, so as to convert the second data group into a one-dimensional data group, and determining the characteristic facing the return programming flow in the flow to be detected according to the abnormal variance area recorded in the one-dimensional data group. 
In this way, the first byte of each gadget address in the flow to be detected is separated from the bytes that follow it, so that the flow to be detected is decomposed into multiple groups of flow data (the first data set) and the characteristics of return-oriented programming flow are highlighted. The first data set is then converted numerically by sliding-window value-taking to obtain the second data set with the same number of groups; this extraction is simple and does not depend on the target environment. Finally, taking the variance over the values at the same window sliding position across the groups converts the second data set into a one-dimensional data set, which shortens the data and compresses its value range, and the features of the return-oriented programming flow are effectively extracted from the abnormal variance area recorded in it. The whole process does not need to predict the memory address range of the target host: attack data can be identified from the byte distribution characteristics of return-oriented programming flow alone, which improves detection efficiency. It also effectively reduces the high false alarm rate that traditional static detection incurs by focusing on individual bytes.
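The four steps summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the window size of 16, the step of 1, and the synthetic payload are all assumptions. The value-taking rule (number of distinct byte values per window) follows claim 4, and `dims` would be 4 for a 32-bit target or 8 for a 64-bit target as stated in claim 1.

```python
from statistics import pvariance


def split_into_groups(payload: bytes, dims: int = 4) -> list[bytes]:
    """Steps 1-2: byte i gets sequence number i % dims; equal numbers form a group."""
    return [payload[k::dims] for k in range(dims)]


def window_values(group: bytes, window: int = 16, step: int = 1) -> list[int]:
    """Step 3: number of distinct byte values inside each sliding window."""
    return [len(set(group[i:i + window]))
            for i in range(0, len(group) - window + 1, step)]


def variance_series(payload: bytes, dims: int = 4,
                    window: int = 16, step: int = 1) -> list[float]:
    """Step 4: variance across the groups at each window sliding position."""
    second = [window_values(g, window, step)
              for g in split_into_groups(payload, dims)]
    positions = min(len(values) for values in second)
    return [pvariance([values[i] for values in second])
            for i in range(positions)]


# Synthetic demo data (hypothetical, not taken from the patent).
# Benign part: every byte group keeps changing.
benign = bytes((i * 37 + 11) % 256 for i in range(256))
# ROP-like part: 64 four-byte words whose last byte is fixed at 0x08,
# mimicking gadget addresses that share one constant byte.
rop_chain = b"".join(
    bytes([(j * 7 + 3) % 256, (j * 13 + 5) % 256, (j * 29 + 1) % 256, 0x08])
    for j in range(64)
)
demo_payload = benign + rop_chain + benign
series = variance_series(demo_payload)

print("variance in benign region:", max(series[:49]))
print("variance in ROP-like region:", series[80])
```

On this synthetic payload the variance stays at zero while all four groups vary together and rises sharply once the fourth group collapses to a single value.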
The embodiment of the application discloses a specific feature extraction method for return-oriented programming flow, which is shown in fig. 4 and comprises the following steps:
Step S21: obtain the flow to be detected and, starting from the first byte, cyclically mark the bytes in the flow to be detected with sequence numbers in order, according to the preset number of dimensions.
Step S22: divide bytes with the same sequence number in the flow to be detected into the same group to obtain a first data set containing multiple groups of flow data; the number of groups of flow data in the first data set equals the preset number of dimensions.
Step S23: apply a sliding window of a preset window size to each group of flow data in the first data set to obtain a second data set containing the same number of groups.
For the more specific processing procedure of the above steps S21, S22 and S23, reference may be made to the corresponding content disclosed in the foregoing embodiment, and no detailed description is given here.
Step S24: convert each group of flow data in the first data set into a corresponding flow signal diagram, and judge whether exactly one of the flow signal diagrams has stable data values in a target area.
In this embodiment of the present application, each group of flow data in the first data set corresponds to one flow signal diagram, and it is judged whether exactly one of these flow signal diagrams has stable data values in the target area. It can be understood that, given the characteristics of return-oriented programming flow, if return-oriented programming flow data exists in the flow to be detected, then exactly one group of flow data must have stable data values in the target area, taking only a fixed set of values, while the data values of the other groups are unstable. This greatly highlights the characteristics of return-oriented programming flow.
Step S25: if exactly one flow signal diagram has stable data values in the target area, return-oriented programming flow data exists in the flow to be detected.
In this embodiment of the present application, if exactly one flow signal diagram has stable data values in the target area, return-oriented programming attack data exists in the acquired flow data, and feature extraction is then performed on the flow to be detected. It will be appreciated that if no flow signal diagram has a region with stable data values, the acquired flow data contains no return-oriented programming flow data. If return-oriented programming flow data does exist in the flow to be detected, the variance is taken over the values at the same window sliding position across the groups of the second data set, in preparation for extracting the features of the return-oriented programming flow.
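The pre-check in steps S24 and S25 can be approximated in code without drawing any diagram. The sketch below is an assumption-laden illustration: the patent does not say how the target area is located or how "stable" is quantified, so the scan over all positions, the region length of 16, the `max_distinct` limit of 4, and all function names are hypothetical.

```python
def stable_groups(groups, start, length, max_distinct=4):
    """Indices of groups whose bytes in [start, start + length) take at
    most max_distinct different values (a stand-in for 'stable')."""
    return [k for k, g in enumerate(groups)
            if len(set(g[start:start + length])) <= max_distinct]


def has_single_stable_group(payload, dims=4, length=16, max_distinct=4):
    """True if some region exists where exactly one byte group is stable."""
    groups = [payload[k::dims] for k in range(dims)]
    shortest = min(len(g) for g in groups)
    for start in range(shortest - length + 1):
        if len(stable_groups(groups, start, length, max_distinct)) == 1:
            return True
    return False


# Synthetic inputs (hypothetical, for illustration only).
rop_like = b"".join(
    bytes([(j * 7 + 3) % 256, (j * 13 + 5) % 256, (j * 29 + 1) % 256, 0x08])
    for j in range(32)
)
benign = bytes((i * 37 + 11) % 256 for i in range(128))
all_zero = bytes(128)  # every group is stable, so no single group stands out

print(has_single_stable_group(rop_like))
print(has_single_stable_group(benign))
print(has_single_stable_group(all_zero))
```

Requiring exactly one stable group, rather than at least one, mirrors the text: padding or constant data makes every group stable at once and should not be treated as a candidate.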
Step S26: take the variance over the values at the same window sliding position across the groups of the second data set, so as to convert the second data set into a one-dimensional data set, and determine the features of the return-oriented programming flow in the flow to be detected according to the abnormal variance area recorded in the one-dimensional data set.
Specifically, if the variance corresponding to a first area in the one-dimensional data set is greater than the variance corresponding to a second area, the first area is determined to be the area containing return-oriented programming flow data, and the features of the return-oriented programming flow are extracted from the first area.
It can be appreciated that the variance feature effectively captures the four-dimensional numerical characteristic of a return-oriented programming attack chain and distinguishes it from non-return-oriented programming flow. Whether the first area contains return-oriented programming flow data can be determined by comparing the difference between the variance of the area where the variance is unstable and the variance of the area where it is stable against a preset threshold. The preset threshold may be adjusted according to the actual situation and is not limited here.
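The threshold comparison can be sketched as a scan over the one-dimensional variance series. This is only one possible reading: using the median of the series as the stable-area baseline, requiring a minimum run length, and the name `find_abnormal_regions` are assumptions, and the median baseline would not work if the abnormal area covered more than half of the series.

```python
from statistics import median


def find_abnormal_regions(series, threshold, min_len=3):
    """Return (start, end) index pairs, end exclusive, of runs whose variance
    exceeds the baseline by more than `threshold`.

    Assumption: the stable-area variance is estimated by the series median.
    """
    if not series:
        return []
    baseline = median(series)
    regions, start = [], None
    for i, value in enumerate(series):
        if value - baseline > threshold:
            if start is None:
                start = i          # a candidate region begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(series) - start >= min_len:
        regions.append((start, len(series)))
    return regions


# Hypothetical variance series with one sustained abnormal area.
demo_series = [0.2, 0.1, 0.3, 40.0, 42.0, 41.0, 43.0, 0.2, 0.1]
print(find_abnormal_regions(demo_series, threshold=10.0))
```

The minimum run length reflects the observation that the abnormal variance persists over an area rather than appearing as an isolated spike.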
As an example, assume a 32-bit operating system environment. Fig. 5 is the flow signal diagram of a flow to be detected that contains return-oriented programming flow, and Fig. 6 shows the flow signal diagrams obtained by converting each group of flow data in the first data set. This example shows more clearly that, in the region where the return-oriented programming flow is located, the byte values of one of the four groups of data (T4) take only a few values, whereas the byte values of the other three groups (T1, T2, T3) span a larger range than the gadget first-byte values.
In this embodiment, the flow to be detected is first obtained, and, starting from the first byte, the bytes in the flow to be detected are cyclically marked with sequence numbers in order, according to the preset number of dimensions. Bytes with the same sequence number are then divided into the same group to obtain a first data set containing multiple groups of flow data, where the number of groups in the first data set equals the preset number of dimensions. A sliding window of a preset window size is applied to each group of flow data in the first data set to obtain a second data set containing the same number of groups. Each group of flow data in the first data set is converted into a corresponding flow signal diagram, and it is judged whether exactly one of the flow signal diagrams has stable data values in a target area; if so, return-oriented programming flow data exists in the flow to be detected. The variance is then taken over the values at the same window sliding position across the groups of the second data set, which converts the second data set into a one-dimensional data set, and the features of the return-oriented programming flow in the flow to be detected are determined from the abnormal variance area recorded in the one-dimensional data set.
Correspondingly, an embodiment of the application discloses a feature extraction device for return-oriented programming flow. Referring to Fig. 7, the device comprises:
a sequence number marking module 11, configured to obtain the flow to be detected and, starting from the first byte, cyclically mark the bytes in the flow to be detected with sequence numbers in order, according to the preset number of dimensions;
a first data set determining module 12, configured to divide bytes with the same sequence number in the flow to be detected into the same group, so as to obtain a first data set containing multiple groups of flow data; the number of groups of flow data in the first data set equals the preset number of dimensions;
a second data set determining module 13, configured to apply a sliding window of a preset window size to each group of flow data in the first data set, so as to obtain a second data set containing the same number of groups;
a feature extraction module 14, configured to take the variance over the values at the same window sliding position across the groups of the second data set, so as to convert the second data set into a one-dimensional data set, and to determine the features of the return-oriented programming flow in the flow to be detected according to the abnormal variance area recorded in the one-dimensional data set.
The more specific working process of each module may refer to the corresponding content disclosed in the foregoing embodiment, and will not be described herein.
Thus, with the solution of this embodiment, the flow to be detected is first obtained, and, starting from the first byte, the bytes in the flow to be detected are cyclically marked with sequence numbers in order, according to the preset number of dimensions. Bytes with the same sequence number are then divided into the same group to obtain a first data set containing multiple groups of flow data, where the number of groups in the first data set equals the preset number of dimensions. A sliding window of a preset window size is applied to each group of flow data in the first data set to obtain a second data set containing the same number of groups. Finally, the variance is taken over the values at the same window sliding position across the groups of the second data set, which converts the second data set into a one-dimensional data set, and the features of the return-oriented programming flow in the flow to be detected are determined from the abnormal variance area recorded in the one-dimensional data set.
Further, the embodiment of the present application discloses an electronic device, and fig. 8 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 8 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement relevant steps in the feature extraction method for return-oriented programming traffic disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be a computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, data 223, and the like, and the data 223 may include various data. The storage means may be a temporary storage or a permanent storage.
The operating system 221 manages and controls the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program that carries out the feature extraction method for return-oriented programming flow executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Further, embodiments of the present application disclose a computer readable storage medium, which may be random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a magnetic disk, an optical disk, or any other form of storage medium known in the art. When the computer program stored on it is executed by a processor, the foregoing feature extraction method for return-oriented programming flow is implemented. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The feature extraction method, device and equipment for return-oriented programming flow provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the examples is only intended to help in understanding the method and core ideas of the present invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (7)

1. A feature extraction method for return-oriented programming flow, characterized by comprising the following steps:
acquiring flow to be detected, and sequentially and circularly marking corresponding serial numbers of bytes in the flow to be detected from the first byte according to the number of preset dimensions;
dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing a plurality of groups of flow data; the number of the groups of the flow data in the first data group is consistent with the number of the preset dimensions;
respectively carrying out sliding window value on each group of flow data in the first data group by utilizing a sliding window with a preset window size so as to obtain a second data group with data of corresponding group number;
taking variance from the data of the same window sliding position in each group of data of the second data group, so as to convert the second data group into a one-dimensional data group, and determining the features of the return-oriented programming flow in the flow to be detected according to the abnormal variance area recorded in the one-dimensional data group;
the step of sequentially and circularly marking the bytes in the flow to be detected with corresponding serial numbers from the first byte according to the preset dimension number comprises the following steps:
determining the type of the currently used computer; if the computer is a 32-bit computer, sequentially and circularly marking 4 serial numbers for bytes in the flow to be detected from the first byte; if the computer is a 64-bit computer, sequentially and circularly marking 8 serial numbers for bytes in the flow to be detected from the first byte;
correspondingly, dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing a plurality of groups of flow data, wherein the method comprises the following steps:
if the computer is a 32-bit computer, dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing 4 groups of flow data; if the computer is a 64-bit computer, dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing 8 groups of flow data;
before the variance is taken from the data of the same window sliding position in each group of data of the second data group, the method further comprises:
converting each group of flow data in the first data group into a corresponding flow signal diagram, and judging whether exactly one of the flow signal diagrams has stable data values in a target area;
if exactly one flow signal diagram has stable data values in the target area, return-oriented programming flow data exists in the flow to be detected, and the variance is then taken from the data of the same window sliding position in each group of data of the second data group.
2. The feature extraction method for return-oriented programming flow according to claim 1, further comprising, before the obtaining the flow to be detected:
acquiring flow data to be processed;
and removing protocol header information of the flow data to be processed, and reserving a payload in the flow data to be processed to obtain the flow to be detected.
3. The method for extracting features of return-oriented programming flow according to claim 1, wherein determining features of return-oriented programming flow in the flow to be detected according to the abnormal variance region recorded in the one-dimensional data set comprises:
if the variance corresponding to the first area in the one-dimensional data set is larger than the variance corresponding to the second area, and the difference between the variance corresponding to the first area and the variance corresponding to the second area is larger than a preset threshold, determining that the first area is the area containing the return-oriented programming flow data, and extracting the features of the return-oriented programming flow in the first area.
4. The feature extraction method for return-oriented programming flow according to any one of claims 1 to 3, wherein performing sliding window value-taking on the first data set by using a sliding window of a preset window size comprises:
utilizing a sliding window with a preset window size and based on a preset sliding value rule, performing sliding window value on the first data set; the preset sliding value rule is that the number of different byte values in the sliding window is used as a corresponding value result when sliding each time.
5. The method for feature extraction for return-oriented programming traffic of claim 4, wherein said sliding window value of said first data set comprises:
and carrying out sliding window value selection on the first data set according to a preset sliding step length.
6. A return-oriented programming flow feature extraction apparatus, comprising:
the serial number marking module is used for obtaining the flow to be detected and sequentially and circularly marking corresponding serial numbers of bytes in the flow to be detected from the first byte according to the number of preset dimensions;
the first data set determining module is used for dividing bytes with the same serial number in the flow to be detected into the same group so as to obtain a first data set containing a plurality of groups of flow data; the number of the groups of the flow data in the first data group is consistent with the number of the preset dimensions;
the second data set determining module is used for respectively carrying out sliding window value on each group of flow data in the first data set by utilizing a sliding window with a preset window size so as to obtain a second data set with data of corresponding group number;
the feature extraction module is used for taking variance from the data of the same window sliding position in each group of data of the second data group so as to convert the second data group into a one-dimensional data group, and determining the features of the return-oriented programming flow in the flow to be detected according to the abnormal variance area recorded in the one-dimensional data group;
the serial number marking module is specifically configured to:
determining the type of the currently used computer; if the computer is a 32-bit computer, sequentially and circularly marking 4 serial numbers for bytes in the flow to be detected from the first byte; if the computer is a 64-bit computer, sequentially and circularly marking 8 serial numbers for bytes in the flow to be detected from the first byte;
correspondingly, the first data set determining module is specifically configured to:
if the computer is a 32-bit computer, dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing 4 groups of flow data; if the computer is a 64-bit computer, dividing bytes with the same serial number in the flow to be detected into the same group to obtain a first data group containing 8 groups of flow data;
the feature extraction device for return-oriented programming flow is specifically used for: before taking variance from the data of the same window sliding position in each group of data of the second data group, converting each group of flow data in the first data group into a corresponding flow signal diagram, and judging whether exactly one of the flow signal diagrams has stable data values in a target area;
if exactly one flow signal diagram has stable data values in the target area, return-oriented programming flow data exists in the flow to be detected, and the variance is then taken from the data of the same window sliding position in each group of data of the second data group.
7. An electronic device comprising a processor and a memory; wherein the memory is for storing a computer program that is loaded and executed by the processor to implement the return-oriented programming traffic feature extraction method of any one of claims 1 to 5.
CN202210394785.XA 2022-04-15 2022-04-15 Feature extraction method, device and equipment for return type programming flow Active CN114760131B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210394785.XA CN114760131B (en) 2022-04-15 2022-04-15 Feature extraction method, device and equipment for return type programming flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210394785.XA CN114760131B (en) 2022-04-15 2022-04-15 Feature extraction method, device and equipment for return type programming flow

Publications (2)

Publication Number Publication Date
CN114760131A CN114760131A (en) 2022-07-15
CN114760131B true CN114760131B (en) 2024-03-01

Family

ID=82331151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210394785.XA Active CN114760131B (en) 2022-04-15 2022-04-15 Feature extraction method, device and equipment for return type programming flow

Country Status (1)

Country Link
CN (1) CN114760131B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115473826B (en) * 2022-11-03 2023-01-20 中国人民解放军国防科技大学 ROP flow detection method, device, equipment and computer readable storage medium
CN117648232B (en) * 2023-12-11 2024-05-24 武汉天宝莱信息技术有限公司 Application program data monitoring method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102111312A (en) * 2011-03-28 2011-06-29 钱叶魁 Multi-scale principle component analysis-based network abnormity detection method
CN110149343A (en) * 2019-05-31 2019-08-20 国家计算机网络与信息安全管理中心 A kind of abnormal communications and liaison behavioral value method and system based on stream
CN111147396A (en) * 2019-12-26 2020-05-12 哈尔滨工程大学 Encrypted flow classification method based on sequence characteristics
CN111262849A (en) * 2020-01-13 2020-06-09 东南大学 Method for identifying and blocking network abnormal flow behaviors based on flow table information
CN113037748A (en) * 2021-03-08 2021-06-25 中国科学院信息工程研究所 C and C channel hybrid detection method and system

Also Published As

Publication number Publication date
CN114760131A (en) 2022-07-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant