KR101802443B1

KR101802443B1 - Computer-executable intrusion detection method, system and computer-readable storage medium storing the same

Info

Publication number: KR101802443B1
Application number: KR1020150156373A
Authority: KR
Inventors: 윤명근; 신선호; 김현봉
Original assignee: 국민대학교산학협력단
Priority date: 2015-11-09
Filing date: 2015-11-09
Publication date: 2017-11-28
Also published as: KR20170053895A

Abstract

A computer-executable intrusion detection method and system includes: receiving a packet; dynamically determining a candidate pattern set from among m q-grams derived from a predefined regular expression, based on a predefined constraint; determining whether a string contained in the packet includes at least one of the candidate pattern set; and, if so, determining the packet to be a malicious code candidate and determining whether the packet matches the regular expression. Regular expression matching in an intrusion detection system can thereby be accelerated.

Description

The present invention relates to a computer-executable intrusion detection method, a system, and a computer-readable recording medium, and more particularly to a computer-executable intrusion detection method and system that can accelerate regular expression matching in an intrusion detection system by using q-grams.

Regular expressions are gaining popularity in big data, network traffic, and web content analysis, among other areas. With their special and wildcard characters, regular expressions are more flexible and expressive than conventional exact string matching, which can only find simple strings over a predefined alphabet. Many network and security applications let users define their own regular expression rules.

Security applications, including anti-virus scanners, intrusion detection/prevention systems, and firewalls, increasingly rely on regular expression matching. Because modern cyber attacks are complex and sophisticated, not all attack patterns can be described clearly by simple strings over a predefined alphabet. Regular expressions are therefore more widely used than ever. For example, Snort, one of the most popular intrusion detection systems, contained about 1800 regular expression rules as of 2015. However, regular expression matching remains a performance bottleneck, especially when run on a general-purpose computer.

Internet use has steadily increased since the advent of the Internet. Today, there are a variety of services for surfing the Web, downloading videos, sharing files with p2p, and exchanging emails. As a result, network traffic has exploded. And the chances of an attacker hiding himself and doing malicious acts have also increased. Malicious activities include spreading malicious programs through a network, sending spam mails, attacking a specific server, and causing damage or stealing important information. However, malicious traffic is extremely difficult to detect because of the huge amount of network traffic. This requires rapid and accurate detection without compromising the network.

Korean Patent No. 10-1322037

One embodiment of the present invention is to provide a computer-executable intrusion detection method and system that can increase regular expression matching speed through a new regular expression filter based on q-gram.

One embodiment of the present invention is to provide a computer-executable intrusion detection method and system that can improve throughput of regular expression matching through a new regular expression filter.

One embodiment of the present invention is to provide a computer-executable intrusion detection method and system that can reduce the memory space required by a new regular expression filter.

Among the embodiments, a computer-executable intrusion detection method comprises the steps of: (a) receiving a packet; (b) dynamically determining a candidate pattern set from among N q-grams derived from a predefined regular expression, based on a predefined constraint, and determining whether a string contained in the packet includes at least one of the candidate pattern set; and (c) if so, determining the packet to be a malicious code candidate and determining whether the packet matches the regular expression.

Step (c) may include not determining whether the packet matches the regular expression if the string contained in the packet does not include any of the candidate pattern set.

The step (b) may include determining the q-gram according to the size of the packet.

The step (b) may include determining a constraint condition that minimizes the candidate pattern set according to a predefined performance optimization condition.

The step (b) may include dynamically determining the candidate pattern set according to the occurrence frequency of the q-gram.

Among the embodiments, a computer-executable intrusion detection system includes: a filtering module that receives a packet, dynamically determines a candidate pattern set from among N q-grams derived from a predefined regular expression based on a predefined constraint, and determines whether a string contained in the packet includes one of the candidate pattern set; and a matching engine that, if so, determines the packet to be a malicious code candidate and determines whether the packet matches the regular expression.

The matching engine may not determine whether the packet matches the regular expression if the string contained in the packet does not include any of the candidate pattern set.

The filtering module may determine a constraint condition that minimizes the candidate pattern set according to a predefined performance optimization condition.

The filtering module may dynamically determine the candidate pattern set according to the occurrence frequency of the q-gram.

Among the embodiments, a computer-readable recording medium may record a program for the computer-executable intrusion detection method.

The disclosed technique may have the following effects. It is to be understood, however, that the scope of the disclosed technology is not to be construed as limited thereby, as it is not meant to imply that a particular embodiment should include all of the following effects or only the following effects.

A computer-executable intrusion detection method and system according to an embodiment of the present invention can increase regular expression matching speed through a new regular expression filter based on q-grams.

A computer-executable intrusion detection method and system according to an embodiment of the present invention can improve the throughput of regular expression matching through a new regular expression filter.

A computer-executable intrusion detection method and system according to an embodiment of the present invention can reduce the memory space required by a new regular expression filter.

FIG. 1 is a block diagram illustrating a computer-executable intrusion detection system.
FIG. 2 is a diagram illustrating how the filtering module in FIG. 1 determines a basic q-gram candidate pattern set derived from a regular expression.
FIG. 3 is a diagram illustrating how the filtering module in FIG. 1 determines whether a string included in a packet includes a candidate pattern set.
FIG. 4 is a diagram illustrating how the filtering module in FIG. 1 minimizes the candidate pattern set according to predefined performance optimization conditions.
FIG. 5 is a diagram illustrating a process in which the filtering module 100 filters a packet using the candidate pattern set minimized according to the method of FIG. 4.
FIG. 6 is a flowchart illustrating a computer-executable intrusion detection method according to an embodiment of the present invention.

The description of the present invention is merely an example for structural or functional explanation, and the scope of the present invention should not be construed as being limited by the embodiments described in the text. That is, the embodiments are to be construed as being variously embodied and having various forms, so that the scope of the present invention should be understood to include equivalents capable of realizing technical ideas. Also, the purpose or effect of the present invention should not be construed as limiting the scope of the present invention, since it does not mean that a specific embodiment should include all or only such effect.

Meanwhile, the meaning of the terms described in the present application should be understood as follows.

The terms "first "," second ", and the like are intended to distinguish one element from another, and the scope of the right should not be limited by these terms. For example, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

It is to be understood that when an element is referred to as being "connected" to another element, it may be directly connected to the other element, but other elements may exist in between. On the other hand, when an element is referred to as being "directly connected" to another element, it should be understood that there are no other elements in between. Other expressions that describe the relationship between components, such as "between" and "immediately between" or "neighboring" and "directly adjacent to", should be interpreted in the same way.

It is to be understood that singular expressions include plural expressions unless the context clearly indicates otherwise, and that terms such as "include" or "have" specify the presence of the stated feature, number, step, operation, element, component, or combination thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.

In each step, the identification code (e.g., a, b, c) is used for convenience of explanation and does not describe the order of the steps. Unless the context clearly states a specific order, the steps may occur in an order different from the one stated. That is, the steps may occur in the order described, may be performed substantially concurrently, or may be performed in reverse order.

The present invention can be embodied as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all kinds of recording devices for storing data that can be read by a computer system . Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage, and the like. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that computer readable codes can be stored and executed in a distributed manner.

All terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. Commonly used predefined terms should be interpreted to be consistent with the meanings in the context of the related art and can not be interpreted as having ideal or overly formal meaning unless explicitly defined in the present application.

FIG. 1 is a block diagram illustrating a computer-executable intrusion detection system 10.

Referring to FIG. 1, a computer-enabled intrusion detection system 10 includes a filtering module 100 and a matching engine 200. Here, the filtering module 100 may be a q-gram based regular expression filtering module.

The filtering module 100 receives a packet, dynamically determines a candidate pattern set from among N q-grams derived from a predefined regular expression based on a predefined constraint, and determines whether a string contained in the packet includes one of the candidate pattern set. In one embodiment, the filtering module 100 may determine the q-gram according to the size of the packet. In addition, the filtering module 100 may determine a constraint condition that minimizes the candidate pattern set according to a predefined performance optimization condition. In another embodiment, the filtering module 100 may dynamically determine the candidate pattern set according to the frequency of occurrence of the q-grams.

When the string contained in the packet includes one of the candidate pattern set, the matching engine 200 may determine the packet to be a malicious code candidate and determine whether the packet matches the regular expression. When the string contained in the packet does not include any of the candidate pattern set, the matching engine 200 may not determine whether the packet matches the regular expression.

Here, the predefined regular expression may refer to the regular expression rule set in the following description, and an individual regular expression may refer to a regular expression rule or a candidate rule group. The candidate pattern set may refer to a q-gram set and may serve as a unique identifier of a regular expression rule. The constraint condition may refer to θ, and the predefined performance optimization condition is that the throughput of the intrusion detection system is maximized and the memory space is minimized. The constraint may be expressed in terms of the number c and the degree d of the connection lines connecting rule nodes and q-gram nodes. The constraint θ is described in detail below.

A computer-executable intrusion detection method and system consistent with an embodiment of the present invention can be applied to any computing application where regular expression matching delays process time and requires a large amount of system resources.

Most regular expression applications place the filtering module 100 in front of the matching engine 200 to improve processing speed. The filtering module 100 may include a regular expression filter. The filtering module 100 is typically implemented with an exact multi-pattern string matching method such as Aho-Corasick. A representative simple string is defined for each regular expression rule, and the set of representative strings is supplied to the filtering module 100. When a packet arrives, the filtering module 100 first checks whether the packet contains any simple string belonging to the set. If a match is found, the packet is forwarded to the matching engine 200 for further evaluation. The filter can pass the indices of the specific rules to the engine so that the engine does not have to check all the rules. In one embodiment, adding an Aho-Corasick filter raises the matching throughput from 2.3 Mbps to 2.7 Mbps.

For a given set of regular expression rules, the engine compiles them into multiple automata groups. When there are a large number of rules, putting them into a single automaton is impractical because it requires too many states and a large memory space. Actual systems divide the regular expression rules into multiple groups, and an automaton is generated for each group.

Some applications even create one automaton per rule. For example, Snort has about 1800 regular expression rules, and each rule is represented by its own automaton. This architecture can make efficient use of modern multicore and many-core systems by dynamically allocating each core to a target regular expression rule. Under this architecture, it is assumed that regular expression rules are invoked one by one, each from a separate automaton.

Regular expression matching consists of two main steps, so the total throughput and memory space requirements are affected by both the filter and the engine. The filtering process is fast, but it must be performed on every packet, byte by byte. The engine, on the other hand, is slow but operates only on the packets passed on by the filter. Overall regular expression matching performance therefore improves only when the filter and the engine are each improved.

The conventional filtering module 100 is based on an exact multi-pattern string matching method and is not designed for optimal regular expression filtering. Since the conventional filtering module 100 is implemented as a state machine, each state transition requires tracking the current state and performing several pointer operations in the automaton. For approximate filtering, however, the previous state may not need to be remembered. Thus, a new approximate simple string matching filter based on stateless q-grams can be designed.

A new filtering method for multi-pattern approximate string matching, called REF (Regular Expression Filter) and performed by the filtering module 100, selects positive packets so as to minimize activation of the matching engine 200. The method consists of two steps: rule graph generation and regular expression filtering. For a given set of regular expression rules, a rule graph is generated in the first step. Once the rule graph is complete, REF can be executed in real time and can identify the packets that require further evaluation by the matching engine 200. In addition, filtering methods using a θ-bounded REF (simply θ-REF) and dynamic-REF are also possible.

FIG. 2 is a diagram illustrating how the filtering module 100 in FIG. 1 determines a basic q-gram candidate pattern set derived from a regular expression.

In FIG. 2, the filtering module 100 may generate a rule graph. In the rule graph generation step, a set of n regular expression rules R = {r_1, r_2, ..., r_n} is first assumed to be given. While the rule graph is generated, all possible q-grams are extracted from each rule and its representative simple strings. Then a bipartite graph is constructed from R and the q-grams, with rule nodes on the left and q-gram nodes on the right. Finally, REF transforms the rule graph into an efficient data structure for fast regular expression filtering (e.g., a hash table).

For a given R, the filtering module 100 may first generate a q-gram set for each rule r_i, denoted x_q(r_i). Some regular expression applications require the user to provide a representative simple string for each regular expression rule. For example, Snort requires users to provide content fields, which are plain simple strings, for regular expression rules. In addition to the simple strings provided by the user, the filtering module 100 may extract additional q-grams by removing special characters from regular expression rules. For example, from a regular expression rule containing "^", "SSH-", "[12]", and "\d+", the filtering module 100 may extract "SSH-" by removing the special characters "^", "[12]", and "\d+". The set of all q-grams is S = x_q(r_1) ∪ x_q(r_2) ∪ ... ∪ x_q(r_n) = {s_1, s_2, ..., s_m}. Table 1 summarizes frequently used notations.

Table 1. Frequently used notations

Notation    Explanation
q           Gram size: the number of characters per gram
r_i         The i-th regular expression rule
R           The set of regular expression rules, R = {r_1, r_2, ..., r_n}
x_q(r_i)    The set of q-grams extracted from r_i and its representative strings (e.g., content fields from Snort)
S           The set of all possible q-grams from r_i (1 ≤ i ≤ n), S = x_q(r_1) ∪ ... ∪ x_q(r_n) = {s_1, s_2, ..., s_m}
e_ij        The connection line between r_i and s_j (s_j ∈ x_q(r_i))
r_i.d       The degree of r_i: the number of connection lines connected to r_i
r_i.c       The counter of r_i: the number of times it is selected during rule graph construction
s_j.d       The degree of s_j: the number of connection lines connected to s_j
s_j.c       The counter of s_j: the number of times it is selected during rule graph construction
s_j.f       The frequency of occurrence of s_j (dynamic-REF only)

In general, the filtering module 100 can extract a sufficient number of q-grams from regular expression rules and representative simple strings. For example, the filtering module 100 may extract at least one q-gram from every regular expression rule when q ≤ 4. Most rules yield more than one q-gram. If no q-gram can be extracted from a regular expression rule, a separate process may be required to evaluate packets against the rules for which q-gram generation failed.
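The q-gram extraction step can be sketched as follows. This is a minimal illustration, assuming the literal fragments of a rule have already been separated from its special characters; the function names `qgrams` and `rule_qgrams` are hypothetical and do not come from the patent.

```python
def qgrams(text: str, q: int) -> list[str]:
    """Return every contiguous q-character substring of text, in order."""
    # A string of x characters yields x - q + 1 grams (none if x < q).
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def rule_qgrams(literals: list[str], q: int) -> set[str]:
    """Collect x_q(r_i) from the literal fragments of one rule.

    `literals` is assumed to already hold the simple strings that remain
    after special characters are removed, plus any representative strings.
    """
    grams: set[str] = set()
    for fragment in literals:
        grams.update(qgrams(fragment, q))
    return grams


# Literal fragments taken from the examples in the text.
print(rule_qgrams(["SSH-"], 4))   # {'SSH-'}
print(qgrams("/ff.php", 4))       # ['/ff.', 'ff.p', 'f.ph', '.php']
```

A fragment shorter than q produces no grams, which corresponds to the case where a rule needs separate handling.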

Once S is obtained from R, the bipartite graph G = (V, E) is defined as follows. V = S ∪ R, where the nodes r_i are on the left side of G and the nodes s_j are on the right side (1 ≤ i ≤ n and 1 ≤ j ≤ m). A connection line e_ij ∈ E is drawn between r_i and s_j if s_j is derived from r_i, i.e., s_j ∈ x_q(r_i). FIG. 2 shows an example of a bipartite graph generated from R = {r_1, r_2, r_3}. Representative simple strings are not included in this example for simplicity. Here x_q(r_1) = {SSH-}, x_q(r_2) = {/ff., ff.p, f.ph, .php}, and x_q(r_3) = {/q.p, q.ph, .php}.
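The graph definition above can be illustrated with a short sketch. It assumes the q-gram sets of the FIG. 2 example, with the first 4-gram of r_3 read as "/q.p"; the function name `build_rule_graph` and the two-dictionary representation are illustrative choices, not the patent's data structure.

```python
from collections import defaultdict


def build_rule_graph(x_q: dict[str, set[str]]):
    """Build the bipartite rule graph G = (V, E) from the sets x_q(r_i).

    Returns (rule_to_grams, gram_to_rules). Every pair (r_i, s_j) found in
    the maps corresponds to one connection line e_ij.
    """
    rule_to_grams = {rule: set(grams) for rule, grams in x_q.items()}
    gram_to_rules: dict[str, set[str]] = defaultdict(set)
    for rule, grams in x_q.items():
        for gram in grams:
            gram_to_rules[gram].add(rule)
    return rule_to_grams, dict(gram_to_rules)


# q-gram sets assumed from the example (q = 4).
x_q = {
    "r1": {"SSH-"},
    "r2": {"/ff.", "ff.p", "f.ph", ".php"},
    "r3": {"/q.p", "q.ph", ".php"},
}
rule_to_grams, gram_to_rules = build_rule_graph(x_q)
edge_count = sum(len(grams) for grams in rule_to_grams.values())
```

The right-hand map doubles as the q-gram hash table: looking up a q-gram returns the rule nodes it is connected to.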

FIG. 3 is a diagram illustrating how the filtering module 100 in FIG. 1 determines whether the string contained in the packet includes a candidate pattern set.

In FIG. 3, the filtering module 100 may use the basic rule graph, which has not yet been optimized, to filter positive packets. A regular expression rule is activated only when all of its q-grams are found in the packet. When a new packet arrives, the filtering module 100 may first generate all possible q-grams. If the packet is x bytes long, (x - q + 1) q-grams are generated. For each q-gram in the packet, the filtering module 100 may look up the nodes of S, which are implemented as a q-gram hash table. If the q-gram matches any s_j, the filtering module 100 may remove all connection lines associated with s_j. If any r_i is separated from the graph, the matching engine 200 may be activated to evaluate the packet against r_i.

Once the packet arrives, the filtering module 100 can extract all possible q-grams and look up any s_j nodes in the rule graph that match them. When an s_j matches a q-gram in the packet, the filtering module 100 can remove the connection lines connected to that s_j. In the illustrated example, r_1 and r_2 are separated from the graph when the packet analysis finishes. The matching engine 200 is then activated for these two rules against the packet content.
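The basic filtering pass can be sketched as below. This is a simplified illustration under stated assumptions: the rule graph is copied per packet, the q-gram sets are those of the FIG. 2 example (with "/q.p" assumed for r_3), and the name `basic_ref_filter` is hypothetical.

```python
def basic_ref_filter(packet: str, x_q: dict[str, set[str]], q: int) -> set[str]:
    """Return the rules whose q-grams were all found in the packet."""
    # q-gram hash table: q-gram -> rules connected to it.
    gram_to_rules: dict[str, set[str]] = {}
    for rule, grams in x_q.items():
        for gram in grams:
            gram_to_rules.setdefault(gram, set()).add(rule)

    # Per-packet copy of the connection lines still attached to each rule.
    remaining = {rule: set(grams) for rule, grams in x_q.items()}
    activated: set[str] = set()

    # An x-byte packet causes x - q + 1 hash table lookups.
    for i in range(len(packet) - q + 1):
        gram = packet[i:i + q]
        for rule in gram_to_rules.get(gram, ()):
            remaining[rule].discard(gram)      # remove connection line e_ij
            if not remaining[rule]:            # rule separated from the graph
                activated.add(rule)
    return activated


x_q = {
    "r1": {"SSH-"},
    "r2": {"/ff.", "ff.p", "f.ph", ".php"},
    "r3": {"/q.p", "q.ph", ".php"},
}
print(basic_ref_filter("GET /ff.php", x_q, 4))  # {'r2'}
```

Only the activated rules would be handed to the matching engine; a packet that separates no rule never reaches it.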

The REF included in the filtering module 100 identifies positive packets approximately. It does not belong to the family of exact multi-pattern string matching methods such as Aho-Corasick. For example, if the packet contains "q.php", Aho-Corasick does not activate the matching engine 200. REF, however, may unnecessarily send some packets to the matching engine 200. This is called a false positive. Nevertheless, REF not only uses less memory than existing multi-pattern string matching methods but can also operate faster.

The filtering module 100 may implement a q-gram hash table derived from S and insert the m q-grams into the table. Since the hash table can be queried in O(1) time, the q-grams in a packet are looked up quickly. For a packet of x bytes, the hash table is queried only (x - q + 1) times.

Existing exact multi-pattern string matching methods process each byte in a packet, as REF does, but they may operate more slowly or require more memory space to handle complex state transitions in the automaton. In addition, even when an existing exact multi-pattern string matching method identifies a suspicious string, at least one regular expression rule must still be executed. This is because these methods, like REF, only serve as the filtering module 100 placed before the matching engine 200.

FIG. 4 is a diagram for explaining that the filtering module 100 in FIG. 1 allows a set of candidate patterns to be minimized according to predefined performance optimization conditions.

In FIG. 4, a few properly selected q-grams of a regular expression rule may serve as a unique identifier for that rule. In REF, each packet generates about as many q-grams as its size in bytes, so the hash table of S must be queried many times. When the memory size is limited, a large set S causes more hash collisions, which degrade REF performance. Some r_i nodes may be connected to an unnecessarily large number of q-grams when the rule graph is generated. However, only a few connection lines may be sufficient to identify most r_i nodes clearly. This variant is called θ-bounded REF, or θ-REF for short. The number of connection lines from each r_i must not be greater than the threshold value θ. Therefore, a method of selecting at most θ connection lines for each node r_i (1 ≤ i ≤ n) is needed.

The connection line selection method directly affects the performance of θ-REF. A different selection yields a different set S and therefore a different hash table. Running REF in the filtering module 100 involves two major tasks that affect performance: hash table lookup and activation of the matching engine 200. Each hash table lookup is fast, but the operation is repeated about as many times as there are bytes in the packet. This cost grows with the number of connection lines and the size of S. On the other hand, the matching engine 200 is activated more frequently as the number of connection lines becomes smaller. The selection method must therefore be carefully designed to reduce the overall cost of hash table lookups and regular expression activations. Each cost, however, typically depends on the runtime environment.

If the filtering module 100 can connect as many rules on the left side of the graph as possible to distinct q-grams on the right, it can find a small candidate rule group for each packet. REF activates a rule as soon as the rule is separated from the rule graph. If two rules share the same q-grams and those q-grams are present in the packet, both rules are separated from the graph at the same time. It is undesirable for the matching engine 200 to be activated unnecessarily to evaluate rules with duplicate q-grams. Thus, another design principle for the selection method is that nodes with a smaller degree should be given higher priority. This is because nodes with a larger degree generally have more freedom to select connection lines that are not shared with other nodes.

For θ-REF, two new attributes, the counter and the degree, denoted simply c and d, are defined for each node in the rule graph. The degree d is the number of connection lines physically attached to the node. The counter c is the number of connection lines of the node that have been selected.

The filtering module 100 may first select the rule node with the minimum c in the rule graph. If several rule nodes have the same minimum c, the filtering module 100 may select the one with the minimum d. If several rule nodes have the same minimum c and d, the filtering module 100 may select one of them at random. The filtering module 100 may then select one connection line attached to the selected rule node. Here again, the filtering module 100 may first select the q-gram node with the minimum c. To break ties, the filtering module 100 may compare the d values and finally select the most appropriate q-gram, which determines the connection line. Once a connection line is selected, the filtering module 100 may increment the c value of both end nodes by one. If c equals d or θ at the rule node, the filtering module 100 may remove the rule node from the graph. This is because the rule node has been selected θ times or has no remaining connection lines to select. The filtering module 100 may repeat the connection line selection process until all rule nodes are removed from the graph.

The filtering module 100 may move each selected connection line to E'. After the selection process is complete, E' is used to generate the final rule graph for θ-REF: all connection lines except those in E' are removed from the basic rule graph.

In FIG. 4A, three regular expression rules are given and seven 4-grams are extracted. In FIG. 4B, r_1 is selected first, because r_1 has the minimum c and d values. Then s_1 is selected, so e_11 is selected and moved to E'. The c values of r_1 and s_1 are each incremented by one. r_1 is removed because its c value equals its d value. In the next turn, r_3 is selected and s_6 is selected. In FIG. 4C, the connection line e_36 is added to E'. This process is repeated until all rules in R are removed, as shown in FIG. 4F.
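The selection procedure can be sketched as follows. This is an illustrative reading of the steps above, not the patent's implementation: θ = 2 is an assumed value, the degree d is kept static, ties are broken by list order rather than at random so the result is reproducible, the q-grams of each rule are assumed unique, and the name `select_connections` is hypothetical.

```python
def select_connections(x_q: dict[str, list[str]], theta: int) -> list[tuple[str, str]]:
    """Select at most theta connection lines per rule; returns E' in order."""
    if theta < 1:
        raise ValueError("theta must be at least 1")

    rule_c = {rule: 0 for rule in x_q}                       # r_i.c
    rule_d = {rule: len(grams) for rule, grams in x_q.items()}  # r_i.d
    gram_c: dict[str, int] = {}                              # s_j.c
    gram_d: dict[str, int] = {}                              # s_j.d
    for grams in x_q.values():
        for gram in grams:
            gram_c[gram] = 0
            gram_d[gram] = gram_d.get(gram, 0) + 1

    unselected = {rule: list(grams) for rule, grams in x_q.items()}
    active = [rule for rule in x_q if rule_d[rule] > 0]
    selected: list[tuple[str, str]] = []                     # E'

    while active:
        # Rule node with minimum c, then minimum d (first one wins a tie).
        rule = min(active, key=lambda r: (rule_c[r], rule_d[r]))
        # q-gram node with minimum c, then minimum d.
        gram = min(unselected[rule], key=lambda g: (gram_c[g], gram_d[g]))
        unselected[rule].remove(gram)
        selected.append((rule, gram))
        rule_c[rule] += 1
        gram_c[gram] += 1
        # Remove the rule once it was selected theta times or has no lines left.
        if rule_c[rule] in (rule_d[rule], theta):
            active.remove(rule)
    return selected


x_q = {
    "r1": ["SSH-"],
    "r2": ["/ff.", "ff.p", "f.ph", ".php"],
    "r3": ["/q.p", "q.ph", ".php"],
}
e_prime = select_connections(x_q, theta=2)
```

With these assumptions the first two selections are (r1, "SSH-") and (r3, "/q.p"), and the shared q-gram ".php" is never chosen because its degree is higher than that of the alternatives.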

FIG. 5 is a diagram illustrating a process in which the filtering module 100 filters a packet using the candidate pattern set minimized according to the method of FIG. 4.

FIG. 5A shows the final candidate pattern set determined by the filtering module 100 via θ-REF. The filtering module 100 may determine whether the string contained in the packet includes one of the candidate pattern set. When a packet arrives, the filtering module 100 can extract all possible q-grams and query the hash table of S for the s_j corresponding to each q-gram. If a matching node is found, the filtering module 100 may remove all connection lines connected to it and decrease the c value of each connected node by one. If any r_i is separated from the graph, i.e., its c value becomes zero, the matching engine 200 may be activated to evaluate the entire packet content against rule r_i. FIGS. 5B and 5C show that a packet "/ff.php" arrives and the 4-grams "/ff." and "ff.p" are matched. Since r_2 is thereby separated from the graph, the matching engine 200 is activated with r_2 and the entire packet content is evaluated. However, the packet does not contain '/', so it does not match r_2.
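The counter-based filtering step can be sketched as follows. The connection lines in `E_PRIME` are an assumed outcome of the selection for the running example with θ = 2, and the name `theta_ref_filter` is hypothetical; the sketch resets the counters for every packet.

```python
def theta_ref_filter(packet: str, e_prime: list[tuple[str, str]], q: int) -> list[str]:
    """Return the rules whose selected q-grams all occur in the packet."""
    gram_to_rules: dict[str, list[str]] = {}   # q-gram hash table over E'
    remaining: dict[str, int] = {}             # per-packet c value of each rule
    for rule, gram in e_prime:
        gram_to_rules.setdefault(gram, []).append(rule)
        remaining[rule] = remaining.get(rule, 0) + 1

    seen: set[str] = set()
    activated: list[str] = []
    for i in range(len(packet) - q + 1):
        gram = packet[i:i + q]
        if gram in seen or gram not in gram_to_rules:
            continue
        seen.add(gram)                         # a connection line is removed once
        for rule in gram_to_rules[gram]:
            remaining[rule] -= 1               # decrease c of the connected rule
            if remaining[rule] == 0:           # rule separated from the graph
                activated.append(rule)
    return activated


# Assumed final connection lines for the example rules (theta = 2, q = 4).
E_PRIME = [
    ("r1", "SSH-"),
    ("r2", "/ff."), ("r2", "ff.p"),
    ("r3", "/q.p"), ("r3", "q.ph"),
]
print(theta_ref_filter("/ff.php", E_PRIME, 4))  # ['r2']
```

Compared with the basic graph, the table now holds five q-grams instead of seven, and r_2 is activated after two matches rather than four.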

Some runtime environments have static data and traffic patterns. For example, if an intrusion detection system runs for many years in front of a web server with static content, the system will see periodic or continuously repeated traffic patterns. This property may be used to generate a more nearly optimal rule graph that minimizes the number of matching engine 200 activations. This version of REF is referred to as dynamic-REF because it uses statistical information about previous traffic to improve performance.

Suppose that the frequency of occurrence of each q-gram in S is known. When the filtering module 100 selects s_j while generating the rule graph, it should first consider the q-grams that have rarely occurred. This is because the connection lines connected to such an s_j are rarely removed, so the corresponding rules are evaluated less frequently by the matching engine 200.

Dynamic-REF implementations require information on q-gram frequency in previous network traffic. A new attribute, the frequency f, which indicates the number of times the q-gram has occurred, is added to each node s_j. The filtering module 100 may collect this information during a learning phase of a given period and then generate a dynamic-REF rule graph. The rule graph construction method is basically the same as that of θ-REF. The only difference is that the c, d, and f values of each node are considered for dynamic-REF, whereas only c and d are considered for θ-REF. The c values are compared first, and then a node with a small f value is given high priority in the selection.
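The frequency-aware choice of a q-gram node can be sketched as below. The ordering (c first, then f, then d) is one reading of the description, and the training packets, the function names `learn_frequencies` and `pick_qgram`, and the dictionary layout are assumptions made for illustration.

```python
def learn_frequencies(packets: list[str], grams: set[str], q: int) -> dict[str, int]:
    """Count how often each q-gram of S occurred in previously seen packets."""
    freq = {gram: 0 for gram in grams}
    for packet in packets:
        for i in range(len(packet) - q + 1):
            gram = packet[i:i + q]
            if gram in freq:
                freq[gram] += 1
    return freq


def pick_qgram(candidates: list[str], c: dict[str, int],
               f: dict[str, int], d: dict[str, int]) -> str:
    """Pick the q-gram with minimum c, then minimum f, then minimum d."""
    return min(candidates, key=lambda g: (c[g], f[g], d[g]))


# Hypothetical learning-phase traffic.
training = ["GET /index.php", "POST /login.php", "GET /ff.html"]
candidates = ["/ff.", "ff.p", ".php"]
f = learn_frequencies(training, set(candidates), 4)
c = {gram: 0 for gram in candidates}
d = {"/ff.": 1, "ff.p": 1, ".php": 2}
choice = pick_qgram(candidates, c, f, d)
```

In this toy traffic ".php" is frequent and "ff.p" never appears, so "ff.p" is preferred: a rule tied to a rare q-gram is separated from the graph less often.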

FIG. 6 is a flowchart illustrating a computer-executable intrusion detection method according to an embodiment of the present invention.

In FIG. 6, the filtering module 100 may receive the packet (S210) and, upon receiving the packet, determine the q-gram from the regular expressions included in the regular expression set (S220). The filtering module 100 may dynamically determine a set of candidate patterns among the determined q-grams (S230). The filtering module 100 may extract q-grams of the packet and may determine whether the q-gram or string contained in the packet includes at least one of the set of candidate patterns. The filtering module 100 may determine the packet as a malicious code candidate and activate the matching engine 200 if the q-gram or string of the packet matches the q-gram of the candidate pattern set. If the q-gram or string of the packet does not include the candidate pattern set, the filtering module 100 may not activate the matching engine 200 (S240). If the matching engine 200 is activated, the matching engine 200 may evaluate the packet content and determine whether it matches the regular expression (S250).
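The overall flow of FIG. 6 can be sketched with Python's standard `re` module standing in for the matching engine. The two rules, their regular expressions, and their selected q-grams in `RULES` are hypothetical examples, and the filter is written as a simple per-rule subset test rather than the hash-table lookup described earlier.

```python
import re

# Hypothetical rule set: name -> (compiled regular expression, selected q-grams).
RULES = {
    "r1": (re.compile(r"^SSH-[12]\.\d+"), {"SSH-"}),
    "r2": (re.compile(r"/ff\.php\?id=\d+"), {"/ff.", "ff.p"}),
}


def detect(packet: str, q: int = 4) -> list[str]:
    """Return the names of the rules that the packet actually matches."""
    # Extract the q-grams of the received packet (S210 to S230).
    packet_grams = {packet[i:i + q] for i in range(len(packet) - q + 1)}
    matched = []
    for name, (pattern, grams) in RULES.items():
        if not grams <= packet_grams:
            continue                    # S240: matching engine not activated
        if pattern.search(packet):      # S250: full regular expression match
            matched.append(name)
    return matched


print(detect("SSH-2.0-OpenSSH"))   # ['r1']
print(detect("GET /ff.php"))       # [] (candidate for r2, but no full match)
```

The second call shows a false positive of the filter: the packet is forwarded to the engine for r2, and the engine then rejects it.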

10: Computer-executable intrusion detection system
100: Filtering module
200: Matching engine

Claims

(a) receiving a packet;
(b) determining a constraint condition, which indicates the number and the degree of connection lines connecting a rule node and a q-gram node, such that a candidate pattern set is minimized according to a predefined performance optimization condition, dynamically determining the candidate pattern set according to the frequency of occurrence of the q-grams from among m q-grams derived from a predefined regular expression, and determining whether a string included in the packet includes at least one of the candidate pattern set; and
(c) if so, determining the packet to be a malicious code candidate and determining whether the packet matches the regular expression.

2. The method of claim 1, wherein step (c)
comprises not determining whether the packet matches the regular expression if the character string included in the packet does not include the candidate pattern set.

2. The method of claim 1, wherein step (b)
comprises determining the q-gram according to the size of the packet.

delete

A filtering module that determines a constraint condition, which indicates the number and the degree of connection lines connecting a rule node and a q-gram node, such that a candidate pattern set is minimized according to a predefined performance optimization condition, dynamically determines the candidate pattern set according to the frequency of occurrence of the q-grams from among m q-grams derived from a predefined regular expression, and determines whether a string included in a packet includes at least one of the candidate pattern set; and
A matching engine that determines the packet to be a malicious code candidate and determines whether the packet matches the regular expression.

7. The system of claim 6, wherein the matching engine
does not determine whether the packet matches the regular expression if the string contained in the packet does not include the candidate pattern set.

7. The system of claim 6, wherein the filtering module
determines the q-gram according to the size of the packet.

delete

A function of receiving a packet;
A function of determining a constraint condition, which indicates the number and the degree of connection lines connecting a rule node and a q-gram node, such that a candidate pattern set is minimized according to a predefined performance optimization condition, dynamically determining the candidate pattern set according to the frequency of occurrence of the q-grams from among m q-grams derived from a predefined regular expression, and determining whether a character string included in the packet includes at least one of the candidate pattern set; and
A function of determining the packet to be a malicious code candidate and determining whether the packet matches the regular expression.