CN107659555B

CN107659555B - Network attack detection method and device, terminal equipment and computer storage medium

Info

Publication number: CN107659555B
Application number: CN201710656777.7A
Authority: CN
Inventors: 刘超; 朱文雷; 李昌志; 吴雷
Original assignee: Beijing Changting Future Technology Co ltd
Current assignee: Beijing Pulsar Technology Co Ltd
Priority date: 2016-08-30
Filing date: 2017-08-03
Publication date: 2020-08-11
Anticipated expiration: 2037-08-03
Also published as: CN107659555A; WO2018041114A1

Abstract

The embodiments of the invention provide a network attack detection method and device, terminal equipment and a computer storage medium, and relate to the technical field of network security. The network attack detection method comprises the following steps: determining a target language from the request data according to the type of the target language; performing lexical analysis, syntactic analysis and semantic analysis on the target language; and determining the risk level of the request data according to the results of the lexical analysis, the syntactic analysis and the semantic analysis. Because the target language is determined from the request data according to its type, the extraction operation can accommodate different types of target language and thus serve different detection purposes, which improves the compatibility of network attack detection.

Description

Network attack detection method and device, terminal equipment and computer storage medium

Technical Field

The present invention relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting a network attack, a terminal device, and a computer storage medium.

Background

In recent years, vulnerability attack techniques targeting Web applications have developed rapidly and become increasingly diverse, so building an effective defense against the different kinds of vulnerability attack is a serious challenge.

Common vulnerability attacks are, for example: SQL (Structured Query Language) injection attacks, XSS (Cross Site Scripting) attacks, and the like.

An SQL injection attack inserts a malicious SQL command into a Web form or a page request so as to trick the server into executing that command.

An XSS attack means that an attacker inserts malicious script code into a Web page; when a user browses the page, the script code embedded in it is executed, thereby attacking the user.

However, attack defense methods in the prior art usually target only one type of vulnerability attack, and therefore often suffer from low compatibility.

Disclosure of Invention

The embodiment of the invention provides a network attack detection method and device, terminal equipment and a computer storage medium, which are used for solving the technical problems in the prior art.

In a first aspect, an embodiment of the present invention provides a method for detecting a network attack.

Specifically, the method comprises the following steps:

determining a target language from the request data according to the type of the target language;

performing lexical analysis, syntactic analysis and semantic analysis on the target language;

and determining the risk level of the request data according to the results of the lexical analysis, the syntactic analysis and the semantic analysis.

As can be seen from the foregoing, different vulnerability attacks tend to depend on different types of computer language. In the invention, the target language is determined from the request data according to the type of the target language, so the extraction operation can accommodate different types of target language and thus serve different detection purposes, which improves the compatibility of network attack detection.

With reference to the first aspect, in some embodiments of the invention, determining the target language from the request data according to the type of the target language comprises:

if the type of the target language is a script language, determining the target language from the payload data of the request data through a first automaton, wherein the first automaton is constructed by using the following model: a model built for a wrapper layer outside the target language, the model comprising: the position of occurrence, the form of occurrence and the form of encoding of the target language.

To more effectively determine the target language in the request data, with reference to the first aspect, in some embodiments of the present invention, determining the target language from the request data according to the type of the target language further includes:

identifying whether the target language is encoded;

and if the target language is coded, decoding the target language.

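With reference to the first aspect, in some embodiments of the invention, determining the target language from the request data according to the type of the target language comprises:

if the type of the target language is a structured query language, determining the payload data of the request data as the target language.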

With reference to the first aspect, in some embodiments of the invention, the method further comprises:

extracting the payload data from the request data.

With reference to the first aspect, in some embodiments of the invention, extracting the payload data from the request data comprises:

parsing a specified header parameter or the request body from the request data;

decoding the header parameter or the request body to obtain the payload data.

With reference to the first aspect, in some embodiments of the invention, the header parameters include a combination of one or more of:

a Request URL parameter (the requested network address; URL stands for Uniform Resource Locator), a Referer parameter, a Cookie parameter, and a User-Agent parameter.
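
The extraction step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_payloads`, the `source` labels, and the assumption that the request body is form-encoded text are all hypothetical choices made for the example.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

# Header fields named in the text as possible carriers of attack data.
INSPECTED_HEADERS = ("referer", "cookie", "user-agent")


def extract_payloads(request_url, headers, body=""):
    """Return (source, decoded_value) pairs taken from one request.

    Hypothetical helper: percent-decoding is the only decoding applied here.
    """
    payloads = []
    # Query parameters of the Request URL; parse_qsl percent-decodes the values.
    query = urlsplit(request_url).query
    for name, value in parse_qsl(query, keep_blank_values=True):
        payloads.append((f"url:{name}", value))
    # Selected header fields, matched case-insensitively.
    for name, value in headers.items():
        if name.lower() in INSPECTED_HEADERS:
            payloads.append((f"header:{name.lower()}", unquote(value)))
    # Request body, assumed here to be form-encoded text.
    for name, value in parse_qsl(body, keep_blank_values=True):
        payloads.append((f"body:{name}", value))
    return payloads
```

Each returned value is a candidate payload that the later steps would inspect; headers that are not in the inspected list are simply skipped in this sketch.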

With reference to the first aspect, in some embodiments of the invention, the lexical analysis of the target language includes:

determining lexical elements in the target language;

and analyzing the lexical elements through a finite state automaton to obtain token (marker) sequences of clauses in the target language.

In order to ensure the accuracy of the lexical analysis result, with reference to the first aspect, in some embodiments of the present invention, the performing lexical analysis on the target language further includes:

performing a disambiguation operation on the target language according to a context of the target language.

With reference to the first aspect, in some embodiments of the invention, performing syntactic analysis on the target language comprises:

inputting the token sequence into a second automaton to obtain a syntactic analysis result of the clauses in the target language, wherein the second automaton is generated according to the grammar standard of the target language.

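With reference to the first aspect, in some embodiments of the invention, the method further comprises:

generating a BNF (Backus-Naur Form) file according to the grammar standard of the target language;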

and generating the second automaton according to the BNF file.

Since the second automaton for performing the syntax analysis is generated from the BNF file in the present invention, it is possible to make the result of the syntax analysis more accurate and to improve the execution speed of the syntax analysis.
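
To illustrate how a grammar written in BNF style can drive the syntactic check, the sketch below interprets a small grammar table directly. It is a simplification: it uses a backtracking recognizer rather than an automaton generated from a BNF file, and the toy `GRAMMAR` (a fragment of a SELECT statement over token type names) is a hypothetical example, not the grammar standard of any real language. The result is 1 if the token sequence forms a valid clause and 0 otherwise.

```python
# Hypothetical BNF-like grammar: nonterminal -> list of productions.
# Symbols that are not keys of the table are terminals (token type names).
GRAMMAR = {
    "query": [["SELECT", "columns", "FROM", "IDENT", "where"]],
    "columns": [["IDENT", ",", "columns"], ["IDENT"], ["*"]],
    "where": [["WHERE", "cond"], []],
    "cond": [["operand", "=", "operand", "OR", "cond"],
             ["operand", "=", "operand"]],
    "operand": [["IDENT"], ["NUMBER"], ["STRING"]],
}


def ends(symbol, tokens, pos):
    """Yield every position at which `symbol` can end when matched from `pos`."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:
        positions = {pos}
        for part in production:
            positions = {e for p in positions for e in ends(part, tokens, p)}
            if not positions:                      # this production failed
                break
        yield from positions


def parses(tokens, start="query"):
    """Return 1 if the whole token sequence derives from `start`, else 0."""
    return int(len(tokens) in set(ends(start, tokens, 0)))
```

A generated table-driven parser would avoid the backtracking used here, which is where the speed benefit mentioned above would come from; the grammar table plays the role of the BNF file in both cases.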

With reference to the first aspect, in some embodiments of the invention, the semantic analyzing the target language comprises:

and identifying a key function call body and a key feature substructure from the target language.

With reference to the first aspect, in some embodiments of the invention, identifying a key function call body and a key feature substructure from the target language comprises:

and identifying a key function calling body and a key feature substructure from the target language by adopting a bottom-up reduction mode.

Because the semantic analysis is carried out in a bottom-up reduction mode, the automatic layer-by-layer analysis of the semantics can be realized, and the semantic structure of the language can be accurately identified.
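
A bottom-up reduction can be sketched with a small shift-reduce loop. The example below is an assumption-laden illustration: tokens are plain strings, the only reduction rule is `name ( args )` to a call node, and the `KEY_FUNCTIONS` set is a hypothetical list of sensitive function names. Because a reduction fires on each closing parenthesis, inner calls are recognized before the calls that enclose them.

```python
# Hypothetical set of function names treated as key function calls.
KEY_FUNCTIONS = {"sleep", "benchmark", "load_file", "eval", "alert"}


def find_key_calls(tokens):
    """Return key function names in the order their calls are reduced."""
    stack, found = [], []
    for tok in tokens:
        if tok != ")":
            stack.append(tok)                      # shift
            continue
        # Locate the nearest unmatched "(" on the stack.
        opens = [i for i, item in enumerate(stack) if item == "("]
        if not opens:
            continue                               # stray ")": ignore it
        start = opens[-1]
        args = tuple(stack[start + 1:])
        del stack[start:]                          # drop "(" and the arguments
        name = None
        if stack and isinstance(stack[-1], str) and stack[-1].isidentifier():
            name = stack.pop()
        stack.append(("CALL", name, args))         # reduce to a single node
        if name is not None and name.lower() in KEY_FUNCTIONS:
            found.append(name.lower())
    return found
```

The nested tuples left on the stack form a simple tree, so a fuller version could also inspect argument sub-structures rather than only the function names.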

With reference to the first aspect, in some embodiments of the invention, determining the risk level of the requested data from the results of the lexical analysis, the syntactic analysis, and the semantic analysis includes:

calculating a comprehensive score of the results of the lexical analysis, the syntactic analysis and the semantic analysis;

comparing the comprehensive score with a set threshold range;

determining a risk level of the requested data according to a result of the comparison.

According to the method and the device, the comprehensive results of lexical analysis, syntactic analysis and semantic analysis of the target language are quantized to generate the corresponding comprehensive score, and the comprehensive score is compared with the set threshold range, so that the judgment process of the risk level is more convenient.
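
The comparison against threshold ranges can be sketched as a lookup over ordered bands. The band boundaries and level names in `RISK_BANDS` are hypothetical values chosen for illustration; the text does not specify them.

```python
# Hypothetical threshold ranges: (lower bound, risk level), in ascending order.
RISK_BANDS = [(0.0, "none"), (2.0, "low"), (5.0, "medium"), (8.0, "high")]


def risk_level(score):
    """Return the level of the highest band whose lower bound the score reaches."""
    level = RISK_BANDS[0][1]
    for lower_bound, name in RISK_BANDS:
        if score >= lower_bound:
            level = name
    return level
```

For instance, with these assumed bands a composite score of 6.5 falls in the range starting at 5.0 and is reported as "medium".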

With reference to the first aspect, in some embodiments of the invention, calculating a composite score of the results of the lexical analysis, the syntactic analysis, and the semantic analysis includes:

calculating a first sub-score, a second sub-score and a third sub-score of the target language according to the results of the lexical analysis, the syntactic analysis and the semantic analysis respectively;

respectively weighting the first sub-score, the second sub-score and the third sub-score;

calculating the composite score based on the weighted first sub-score, second sub-score, and third sub-score.

In the invention, the comprehensive score of the target language is calculated according to the weighted lexical analysis result, the syntactic analysis result and the semantic analysis result, so that the importance degrees of different analysis results can be reflected by different weight parameters, and the accuracy of the comprehensive score is improved.

With reference to the first aspect, in some embodiments of the invention, calculating the first sub-score of the target language from the result of the lexical analysis comprises:

and calculating the first sub-score according to the occurrence times and the weight parameters of the token sequence.

With reference to the first aspect, in some embodiments of the invention, calculating the second sub-score of the target language according to the result of the syntactic analysis comprises:

calculating the second sub-score according to the syntactic analysis result and the weight parameter of the syntactic analysis result.

With reference to the first aspect, in some embodiments of the invention, calculating the third sub-score of the target language according to the result of the semantic analysis includes:

and calculating the third sub-score according to the occurrence frequency and the weight parameter of the key function calling body or the key feature substructure.
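
The three sub-scores and their weighting can be sketched as below. The data layout is an assumption made for the example: occurrence counts are dictionaries, the per-clause syntactic results are a list of 0/1 values, `w` holds the per-item weight parameters, and `c` holds the three top-level weights. Items without a configured weight contribute nothing.

```python
def composite_score(token_counts, clause_results, feature_counts, w, c):
    """Combine weighted lexical, syntactic and semantic sub-scores.

    Hypothetical signature: the dictionary keys used here are illustrative.
    """
    # First sub-score: occurrences of each token sequence times its weight.
    first = sum(n * w["token"].get(seq, 0.0) for seq, n in token_counts.items())
    # Second sub-score: 0/1 syntactic result of each clause times its weight.
    second = sum(ok * w["clause"].get(j, 0.0)
                 for j, ok in enumerate(clause_results))
    # Third sub-score: occurrences of each key call or feature times its weight.
    third = sum(n * w["feature"].get(f, 0.0) for f, n in feature_counts.items())
    return c["lexical"] * first + c["syntactic"] * second + c["semantic"] * third
```

Keeping the per-item weights separate from the three top-level weights lets the relative importance of the analyses be tuned without touching the individual item weights.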

With reference to the first aspect, in some embodiments of the invention, the combined score of the results of the lexical analysis, the syntactic analysis and the semantic analysis is calculated according to the following formula:

in the above formula:

score(payload) is the composite score;

t_i is the number of occurrences of the i-th token sequence obtained by the lexical analysis;

w_ti is the weight parameter of the i-th token sequence;

s_j is the syntactic analysis result of the j-th clause obtained by the syntactic analysis, and is 0 or 1;

w_sj is the weight parameter of the j-th clause;

m_k is the number of occurrences of the k-th key function call body or key feature substructure obtained by the semantic analysis;

w_mk is the weight parameter of the k-th key function call body or key feature substructure;

C_t, C_s and C_m are the weight parameters of the lexical analysis, the syntactic analysis and the semantic analysis in the composite score, respectively.
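
The formula referred to above does not appear in this text. Read together, the symbol definitions suggest a weighted sum of three weighted sub-sums. The following is a reconstruction under that assumption and should not be taken as the published formula:

```latex
\mathrm{score}(\mathrm{payload})
  = C_t \sum_{i} t_i \, w_{ti}
  + C_s \sum_{j} s_j \, w_{sj}
  + C_m \sum_{k} m_k \, w_{mk}
```

Under this reading, the three sums correspond to the first, second and third sub-scores, and C_t, C_s and C_m weight them in the composite score.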

To further improve the accuracy of the composite score, in some embodiments of the invention in combination with the first aspect, the method further comprises:

the weight parameters are optimized by machine learning.
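
The text does not say which learning method is used. As one possible illustration, the sketch below fits the three top-level weights with logistic regression trained by stochastic gradient descent on labelled sub-score triples. The function names, the learning rate, the epoch count and the training data shape are all assumptions.

```python
import math


def predict(weights, bias, sub_scores):
    """Probability that a request is malicious, given its three sub-scores."""
    z = bias + sum(w * v for w, v in zip(weights, sub_scores))
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(samples, labels, epochs=500, lr=0.1):
    """Fit one weight per sub-score with plain stochastic gradient descent.

    samples: list of (first, second, third) sub-score tuples.
    labels:  1 for known attacks, 0 for known benign requests.
    """
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y    # gradient of the log-loss
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias
```

The learned weights could then stand in for manually chosen values of C_t, C_s and C_m; the per-item weights could be learned in the same way with a wider feature vector.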

In a second aspect, the embodiment of the present invention provides a device for detecting a network attack.

Specifically, the apparatus comprises:

the target language determining module is used for determining the target language from the request data according to the type of the target language;

an analysis module comprising: the lexical analysis unit is used for carrying out lexical analysis on the target language, the syntactic analysis unit is used for carrying out syntactic analysis on the target language, and the semantic analysis unit is used for carrying out semantic analysis on the target language;

and the risk level determining module is used for determining the risk level of the request data according to the results of the lexical analysis, the syntactic analysis and the semantic analysis.

According to the invention, the target language is determined from the request data according to the type of the target language, so the extraction operation can accommodate different types of target language and thus serve different detection purposes, which improves the compatibility of network attack detection.

With reference to the second aspect, in some embodiments of the invention, the target language determination module comprises:

a determining unit, configured to determine a target language from payload data of the request data through a first automaton in a case where a type of the target language is a script language, wherein the first automaton is constructed using the following model: a model built for a wrapper layer outside the target language, the model comprising: the position of occurrence, the form of occurrence and the form of encoding of the target language.

To determine the target language in the request data more effectively, in some embodiments of the present invention, with reference to the second aspect, the target language determination module further includes:

an identifying unit for identifying whether the target language is encoded;

and the target language decoding unit is used for decoding the target language under the condition that the target language is coded.

With reference to the second aspect, in some embodiments of the invention, the target language determination module is configured to determine the target language from the request data according to the type of the target language by: in a case where the type of the target language is a structured query language, determining payload data of the request data as the target language.

With reference to the second aspect, in some embodiments of the invention, the apparatus further comprises:

and the extraction module is used for extracting the payload data from the request data.

With reference to the second aspect, in some embodiments of the invention, the extraction module comprises:

the analysis unit is used for parsing the specified header parameters or the request body from the request data;

a header parameter or request body decoding unit, configured to decode the header parameters or the request body to obtain the payload data.

With reference to the second aspect, in some embodiments of the invention, the header parameters include a combination of one or more of: a Request URL parameter, a Referer parameter, a Cookie parameter, and a User-Agent parameter.

With reference to the second aspect, in some embodiments of the invention, the lexical analysis unit includes:

a determining component for determining lexical elements in the target language;

and the analysis component is used for analyzing the lexical elements through a finite state automaton to obtain a token sequence of the clauses in the target language.

In order to ensure the accuracy of the lexical analysis result, in combination with the second aspect, in some embodiments of the present invention, the lexical analysis unit further includes:

a disambiguation component to perform a disambiguation operation on the target language according to a context of the target language.

With reference to the second aspect, in some embodiments of the invention, the syntactic analysis unit is configured to perform syntactic analysis on the target language by: inputting the token sequence into a second automaton to obtain a syntactic analysis result of the clauses in the target language, wherein the second automaton is generated according to the grammar standard of the target language.

the file generation module is used for generating a BNF file according to the grammar standard of the target language;

and the automaton generating module is used for generating the second automaton according to the BNF file.

With reference to the second aspect, in some embodiments of the present invention, the semantic analysis unit is configured to perform semantic analysis on the target language by: and identifying a key function call body and a key feature substructure from the target language.

With reference to the second aspect, in some embodiments of the present invention, the semantic analysis unit is configured to identify a key function call body and a key feature substructure from the target language by: and identifying a key function calling body and a key feature substructure from the target language by adopting a bottom-up reduction mode.

With reference to the second aspect, in some embodiments of the invention, the risk level determination module comprises:

the calculation submodule is used for calculating the comprehensive score of the results of the lexical analysis, the syntactic analysis and the semantic analysis;

the comparison submodule is used for comparing the comprehensive score with a set threshold range;

and the determining submodule is used for determining the risk level of the request data according to the comparison result.

With reference to the second aspect, in some embodiments of the invention, the computation submodule includes:

a sub-score calculating unit comprising: a first calculation component for calculating a first sub-score of the target language based on results of the lexical analysis, a second calculation component for calculating a second sub-score of the target language based on results of the syntactic analysis, and a third calculation component for calculating a third sub-score of the target language based on results of the semantic analysis;

the weighting unit is used for respectively weighting the first sub-score, the second sub-score and the third sub-score;

and the comprehensive score calculating unit is used for calculating the comprehensive score according to the weighted first sub-score, the weighted second sub-score and the weighted third sub-score.

With reference to the second aspect, in some embodiments of the invention, the first calculation component is configured to calculate the first sub-score of the target language from the result of the lexical analysis by: and calculating the first sub-score according to the occurrence times and the weight parameters of the token sequence.

With reference to the second aspect, in some embodiments of the invention, the second calculation component is configured to calculate the second sub-score of the target language from the result of the syntactic analysis by: calculating the second sub-score according to the syntactic analysis result and the weight parameter of the syntactic analysis result.

With reference to the second aspect, in some embodiments of the invention, the third computing component is configured to implement the computing of the third sub-score of the target language from the result of the semantic analysis by: and calculating the third sub-score according to the occurrence frequency and the weight parameter of the key function calling body or the key feature substructure.

With reference to the second aspect, in some embodiments of the invention, the calculation sub-module is configured to calculate a combined score of the results of the lexical analysis, the syntactic analysis and the semantic analysis according to the following formula:

in the above formula:

score(payload) is the composite score;

w_ti is the weight parameter of the i-th token sequence;

w_sj is the weight parameter of the j-th clause;

To further improve the accuracy of the composite score, in combination with the second aspect, in some embodiments of the invention, the apparatus further comprises:

an optimization module to optimize the weight parameters through machine learning.

In a third aspect, the embodiment of the invention provides a terminal device.

The terminal equipment comprises a memory and a processor; wherein,

the memory is used for storing one or more computer instructions, wherein the one or more computer instructions can realize the detection method of any one of the above network attacks when being executed by the processor.

In a fourth aspect, embodiments of the present invention provide a computer storage medium.

The computer storage medium is used for storing one or more computer instructions, wherein the one or more computer instructions can realize the detection method of any network attack when being executed.

These and other aspects of the invention will be more readily apparent from the following description of the embodiments.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a network attack detection method according to method embodiment 1 of the present invention;

FIG. 2 illustrates one embodiment of the process S2 shown in FIG. 1;

FIG. 3 illustrates another embodiment of the process S2 shown in FIG. 1;

FIG. 4 illustrates one embodiment of the process S5 shown in FIG. 1;

FIG. 5 illustrates one embodiment of the process S51 shown in FIG. 4;

FIG. 6 is a schematic structural diagram of a network attack detection apparatus according to embodiment 1 of the present invention;

FIG. 7 is a schematic structural diagram of a network attack detection apparatus according to embodiment 5 of the present invention;

FIG. 8 illustrates one embodiment of the extraction module 4' shown in FIG. 7;

FIG. 9 illustrates one embodiment of the lexical analysis unit 21 shown in FIG. 6;

FIG. 10 illustrates one embodiment of the risk level determination module 3 shown in FIG. 6;

FIG. 11 illustrates one embodiment of the calculation submodule 31 illustrated in FIG. 10.

Detailed Description

Various aspects of the invention are described in detail below with reference to the figures and the detailed description. Well-known processes, program modules, elements and their interconnections, links, communications or operations, among others, are not shown or described in detail herein in various embodiments of the invention.

Furthermore, the described features, architectures, or functions can be combined in any manner in one or more implementations.

Furthermore, it should be understood by those skilled in the art that the following embodiments are illustrative only and are not intended to limit the scope of the present invention. Those of skill would further appreciate that the program modules, elements, or steps of the various embodiments described herein and illustrated in the figures may be combined and designed in a wide variety of different configurations.

Technical terms not specifically described in the present specification should be construed in the broadest sense in the art unless otherwise specifically indicated.

Some of the flows described in the present specification and claims and in the above figures include a number of operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 merely distinguish the operations from one another and do not by themselves represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" in this document are used to distinguish different messages, devices, modules, etc.; they do not represent a sequential order, nor do they require the "first" and "second" items to be of different types.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

[ METHOD EMBODIMENT 1]

Fig. 1 is a flowchart of a network attack detection method according to embodiment 1 of the method of the present invention. Referring to fig. 1, in the present embodiment, the method includes:

S1: determining the target language from the request data according to the type of the target language.

S2: performing lexical analysis on the target language.

S3: performing syntactic analysis on the target language.

S4: performing semantic analysis on the target language.

S5: determining the risk level of the request data according to the results of the lexical analysis, the syntactic analysis and the semantic analysis.

To address the problems in the prior art, the invention provides a detection method suitable for different types of network attack (such as SQL injection attacks, XSS attacks, PHP (Hypertext Preprocessor) code injection attacks and the like).

Among them, an SQL injection attack is carried out using the structured query language (SQL commands), while XSS attacks and PHP code injection attacks are carried out using scripting languages (e.g., JavaScript (an interpreted scripting language), VBScript (a lightweight interpreted language for the Microsoft environment), PHP, etc.).

It follows that different types of target language need to be analyzed for different types of network attack. In the invention, the target language is determined from the request data according to the type of the target language, so the extraction operation can accommodate different types of target language and thus serve different detection purposes, which improves the compatibility of network attack detection.

Specifically, in the present invention, if the type of the target language is a structured query language, it is determined that payload data of the request data is the target language; if the type of the target language is a script language, the target language embedded in the payload data needs to be determined. Determining the target language from the payload data may be achieved, for example, by:

determining the target language from payload data of the request data by a first automaton, wherein the first automaton is constructed using the following model: a model built for a wrapper layer outside the target language, the model comprising: the position of occurrence, the form of occurrence and the form of encoding of the target language.

Wherein the first automaton is, for example, a push-down automaton or a finite state automaton.

In order to more effectively determine the target language in the request data, in the present invention, the implementation process of determining the target language from the payload data may further include the following processing:

identifying whether the target language is encoded (e.g., JavaScript embedded in HTML (HyperText Markup Language) may be HTML-entity encoded); and if the target language is encoded, decoding the target language. That is, if an encoded form is identified in the target language determined from the payload data, the corresponding decoding operation is performed, and the list of target-language fragments is obtained after decoding.
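
The determination of the target language by type can be sketched as follows. This is a simplified illustration: regular expressions stand in for the first automaton, and the three `WRAPPERS` entries (position of occurrence, pattern describing the form of occurrence, and whether HTML-entity decoding applies) are hypothetical examples of a wrapper-layer model, not an exhaustive one.

```python
import html
import re

# Hypothetical wrapper model: (position, form of occurrence, entity-encoded?).
WRAPPERS = [
    ("script element",
     re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.I | re.S), False),
    ("event-handler attribute",
     re.compile(r"\bon\w+\s*=\s*(['\"])(.*?)\1", re.I | re.S), True),
    ("javascript: URL",
     re.compile(r"\b(?:href|src)\s*=\s*(['\"])\s*javascript:(.*?)\1",
                re.I | re.S), True),
]


def extract_scripts(payload):
    """Return the script fragments found inside an HTML-like payload."""
    scripts = []
    for _position, pattern, entity_encoded in WRAPPERS:
        for match in pattern.finditer(payload):
            code = match.group(match.lastindex)    # last group holds the code
            if entity_encoded:
                code = html.unescape(code)         # decode HTML entities
            scripts.append(code.strip())
    return scripts


def determine_target_language(payload, language_type):
    """SQL: the payload itself is the target; script: extract embedded code."""
    if language_type == "sql":
        return [payload]
    return extract_scripts(payload)
```

The result is a list because one payload may embed several script fragments, each of which would then go through lexical, syntactic and semantic analysis.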

In addition, the network attack detection method provided by the invention abandons the traditional rule-based mode of detection. It defines a model of the target language for each kind of vulnerability, so no rules need to be maintained manually, and the target language can be extracted from the request data automatically. Lexical analysis, syntactic analysis and semantic analysis are then performed on the target language, and their results are combined to identify the target language in depth and to judge the risk level of the request data, which improves the accuracy of network attack detection.

Meanwhile, because the method does not rely on rules for detection, it does not suffer from the slow execution caused by accumulating rules.

In specific implementation, the network attack detection method provided by the invention can be applied to the firewalls of various servers, to attack-prevention monitoring software, to log auditing, or to physical equipment that combines software and hardware to provide protection, such as various network devices.

[ METHOD EMBODIMENT 2 ]

The method provided by this embodiment includes all the contents of method embodiment 1, and is not described herein again. In this embodiment, the method further comprises: extracting the payload data (payload) from the request data. Specifically, a specified header parameter or a request body is analyzed from the request data; decoding the header parameters or a request body to obtain the payload data.

The header parameters include one or a combination of the following: a Request URL parameter, a Referer parameter, a Cookie parameter, and a User-Agent parameter.

Taking the HTTP (HyperText Transfer Protocol) protocol as an example, HTTP headers generally include general headers, request headers, response headers, and entity headers. Each header consists of three parts: a field name, a colon (:), and a field value. Some of these header parameters may contain malicious attack information entered by the user, such as:

a Request URL parameter in the general header, which represents the requested URL address;

a Referer header field, a Cookie header field and a User-Agent header field in the request header; wherein,

the Referer header field indicates the source of the current request, i.e. the browser tells the Web server which Web page/URL the address in the current request was obtained or clicked from. For example, Referer: http://www.ABC.com.

The Cookie header field is sent by the client and is included in the header of the HTTP request. An example is as follows: Cookie: userId=C5bYpXrimdmsiQmsBPnE1Vn8ZQmdWSm3WRlEB3vRwTnRtW.

The User-Agent header field is used to indicate the identity of the client (which browser, editor, or other User tool is).

Of course, the present invention is not limited to the above header parameters, and any header parameters that may contain malicious data may be used.

In Web traffic data, payload data (also referred to as a minimum unencoded request unit) is the portion of the data that carries the actual information. In general, to make transmission more reliable, original data is transmitted in batches, and auxiliary information such as the data size and check bits is added at the head and tail of each batch. The original data is the payload data. For example, a user enters the name and quantity of a commodity to be purchased in an e-commerce web page. When the purchase request is sent to the server, the name and quantity are not transmitted over the network as plain values; auxiliary information is added and the result is encoded before transmission. After receiving the purchase request, the server decodes it to extract the payload data, and the name and quantity of the commodity are either the payload data itself or contained in it.

In general, data to be transmitted is encoded and then transmitted through a network according to the requirements of a transmission protocol, and therefore, in the above processing, header parameters or a request body need to be decoded so as to extract payload data contained therein.

Specifically, when the request data includes encoding information, the corresponding decoding operation can be performed directly using that encoding information. However, if no encoding information exists in the request data, it is necessary to intelligently guess a plurality of possible encoding methods and try the corresponding decoding operations until decoding succeeds. Only if the header parameters or the request body are successfully decoded can a valid attack code be extracted.
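The decoding strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoding logic of the invention: only two encodings (percent-encoding and Base64) are tried, the names `try_url`, `try_base64`, `DECODERS` and `decode_payload` are hypothetical, and the test for a "successful" Base64 decode (printable UTF-8 text) is a simple heuristic chosen for this example.

```python
import base64
from urllib.parse import unquote


def try_url(data):
    """Percent-decode; return None if nothing changed."""
    decoded = unquote(data)
    return decoded if decoded != data else None


def try_base64(data):
    """Base64-decode to printable UTF-8 text; return None on failure."""
    try:
        text = base64.b64decode(data, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    return text if text and text.isprintable() else None


# Hypothetical registry of supported encodings (an assumption of this sketch).
DECODERS = {"url": try_url, "base64": try_base64}


def decode_payload(data, declared=None, max_rounds=5):
    """Decode with the declared encoding if given; otherwise guess repeatedly."""
    if declared is not None:
        result = DECODERS[declared](data)
        return data if result is None else result
    for _ in range(max_rounds):
        for decoder in DECODERS.values():
            result = decoder(data)
            if result is not None:
                data = result  # one layer removed; try again on the result
                break
        else:
            break  # no decoder applied: treat the data as fully decoded
    return data
```

Repeating the guess in rounds lets the sketch peel off nested encodings, for example a percent-encoded string that was subsequently Base64-encoded.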

[ METHOD EMBODIMENT 3 ]

The method provided by this embodiment includes all the contents of method embodiment 1 or method embodiment 2, and is not described herein again. As shown in fig. 2, in the present embodiment, the process S2 is implemented by:

S21: determining lexical elements in the target language.

S22: analyzing the lexical elements through a finite state automaton to obtain a token sequence of the clauses in the target language.

[ METHOD EMBODIMENT 4 ]

The method provided by this embodiment includes all the contents of method embodiment 1 or method embodiment 2, and is not described herein again. As shown in fig. 3, in the present embodiment, the process S2 is implemented by:

S21': performing a disambiguation operation on the target language according to a context of the target language.

S22': determining lexical elements in the target language.

S23': analyzing the lexical elements through a finite state automaton to obtain a token sequence of the clauses in the target language.

In the process of determining the lexical elements, the type of each lexical element is deduced from its first non-blank character, and the subsequent characters are processed one by one until a character that does not belong to that type appears, thereby determining the boundary and type of the lexical element. Correctly determining the boundaries of lexical elements is critical to lexical analysis; if they are not handled correctly, the correctness of the entire lexical analysis result may be affected.

In order to ensure the accuracy of the lexical analysis result, the following processing is added to the method provided by this embodiment: a disambiguation operation is performed on the target language according to the context of the target language. For example, possible concatenation quotation marks are determined in advance. In the SQL injection example select 1 from users where password = 'admin', no prefix quotation mark exists, and lexical analysis can be performed directly to obtain the correct result.

However, in the SQL injection example xxxx' or '1'='1, a prefix quotation mark exists, and a single quotation mark needs to be added before xxxx' for correct lexical analysis; otherwise, xxxx is recognized as a bareword.

After disambiguation, the correct lexical analysis result for xxxx' or '1'='1 is:

<type string> <type keyword> <type string> <type operator> <type string>

If the single quotation mark is not supplemented, the input is erroneously analyzed as:

<type bareword> <type string> <type number> <type string> <type number>

As can be seen from the above examples, for a target language that may be ambiguous, the result obtained with the disambiguation operation differs greatly from the result obtained without it; performing the disambiguation operation on the target language therefore improves the accuracy of the lexical analysis result.

In a specific implementation, multiple situations that may generate ambiguity (hereinafter referred to as ambiguity-generating situations) may be predefined. During lexical analysis, it is identified whether any of the predefined ambiguity-generating situations occurs in the target language; if so, the ambiguity is resolved first, and then the boundary of each lexical element in the target language is determined step by step in the manner described above.

Situations where ambiguity may arise include, but are not limited to, the following: in SQL injection, the syntax of multiple database languages may be ambiguous; and in the lexical analysis of XSS, a forward slash may be analyzed either as a division sign or as the start of a regular expression. Such ambiguities need to be resolved according to the context to finally obtain the correct lexical analysis result. Of course, the ambiguity situations listed above are merely exemplary, and the present invention is not limited thereto.
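The effect of the quotation-mark disambiguation on the xxxx' or '1'='1 example can be reproduced with a small Python sketch. The tokenizer below is a deliberately simplified stand-in for the finite state automaton of this embodiment; the keyword set, the function names `tokenize` and `disambiguate`, and the rule used to detect a missing prefix quotation mark (a leading non-keyword word immediately followed by a single quotation mark) are all assumptions made for illustration.

```python
import re

# Toy keyword list, assumed for this example only.
KEYWORDS = {"select", "from", "where", "or", "and", "union"}


def tokenize(text):
    """Return the list of token types found in text."""
    types = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "'":
            # String literal: runs to the next single quote or to the end.
            end = text.find("'", i + 1)
            i = n if end == -1 else end + 1
            types.append("string")
        elif ch.isdigit():
            while i < n and text[i].isdigit():
                i += 1
            types.append("number")
        elif ch.isalpha() or ch == "_":
            start = i
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            word = text[start:i].lower()
            types.append("keyword" if word in KEYWORDS else "bareword")
        else:
            i += 1
            types.append("operator")
    return types


def disambiguate(payload):
    """Prepend a single quote when the payload appears to start inside a string."""
    m = re.match(r"\s*(\w+)'", payload)
    if m and m.group(1).lower() not in KEYWORDS:
        return "'" + payload
    return payload


payload = "xxxx' or '1'='1"
without_fix = tokenize(payload)
with_fix = tokenize(disambiguate(payload))
```

Here `without_fix` is the erroneous sequence (bareword, string, number, string, number) and `with_fix` is the corrected one (string, keyword, string, operator, string), matching the two results given in the example above.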

[ METHOD EMBODIMENT 5 ]

The method provided by this embodiment includes all of the contents of any one of method embodiment 1 to method embodiment 4, and is not described herein again. In the present embodiment, the process S3 is realized by:

inputting the token sequence into a second automaton to obtain a syntax analysis result of the clause in the target language, wherein the second automaton is generated according to the syntax standard of the target language, that is, the second automaton is adapted to the syntax of the target language.

In the present embodiment, for example, the grammar standard of the target language can be defined in the language standard of the language.

Furthermore, the second automaton is, for example, a push-down automaton or a finite state automaton.

The target language usually comprises one or more clauses (separated, for example, by a semicolon ";"). The token sequence of each clause in the target language is obtained through lexical analysis; the second automaton is then executed with the token sequence of each clause as its input, yielding the syntax analysis result of that clause.

The syntax analysis result indicates whether a clause meets the syntax standard; for example, "0" or "1" may be used to identify whether a clause meets the syntax standard (in the examples below, 1 denotes conformance). Thus, the syntax analysis results of the different clauses in the target language may constitute an array of 0s and 1s.

For example, when the sentence select the best components from the class as the term representation (which contains only one clause) is parsed, the clause does not conform to the corresponding syntax rule, so the second automaton outputs the following result: 0.

However, for ' or '1'='1'; select password from users where 1=1;, both clauses conform to the corresponding grammar rules, so the second automaton outputs the result [1,1].
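How a per-clause array of 0s and 1s may be produced can be sketched with a hand-written finite state automaton over token symbols. The transition table below is a toy grammar invented for this example (it accepts only a very small select ... from ... where ... form); it is not generated from any language standard or BNF file, and the names `TRANSITIONS`, `ACCEPTING`, `conforms` and `parse_clauses` are hypothetical. The clauses are given as already-lexed symbol sequences.

```python
# (state, token symbol) -> next state; a toy grammar assumed for illustration.
TRANSITIONS = {
    ("start", "select"): "items",
    ("items", "name"): "item",
    ("items", "number"): "item",
    ("item", "from"): "tables",
    ("tables", "name"): "table",
    ("table", "where"): "cond",
    ("cond", "name"): "lhs",
    ("cond", "number"): "lhs",
    ("cond", "string"): "lhs",
    ("lhs", "op"): "cmp",
    ("cmp", "name"): "done",
    ("cmp", "number"): "done",
    ("cmp", "string"): "done",
    ("done", "or"): "cond",
}
ACCEPTING = {"table", "done"}


def conforms(tokens):
    """Return 1 if the automaton accepts the token sequence, else 0."""
    state = "start"
    for tok in tokens:
        state = TRANSITIONS.get((state, tok))
        if state is None:
            return 0  # no transition: the clause violates the toy grammar
    return 1 if state in ACCEPTING else 0


def parse_clauses(clauses):
    """Run the automaton on each clause and collect the 0/1 results."""
    return [conforms(clause) for clause in clauses]


clauses = [
    # select password from users where 1 = 1
    ["select", "name", "from", "name", "where", "number", "op", "number"],
    # select the best components from the class ...
    ["select", "name", "name", "name", "from", "name", "name"],
]
results = parse_clauses(clauses)
```

The first clause is accepted and the second is rejected at its third symbol, so `results` is [1, 0]. A push-down automaton would be needed for nested constructs such as parenthesized sub-expressions, which this sketch does not handle.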

[ METHOD EMBODIMENT 6 ]

The method provided by this embodiment includes the entire contents of method embodiment 5, which are not described herein again. In this embodiment, the method further includes the following processes:

(1) Generating a BNF file according to the grammar standard of the target language. That is, for the grammar defined by the standard of the target language, a BNF file corresponding to that grammar is generated.

(2) Generating the second automaton according to the BNF file.

BNF (Backus-Naur Form) is a way of describing the syntax of a given language using formal notation, and a BNF file is a kind of syntax description definition file. In particular, the generation of a corresponding automaton from a BNF file can be achieved with the OpenFst tool (a library for constructing, combining, optimizing, and searching weighted finite-state transducers).

[ METHOD EMBODIMENT 7 ]

The method provided by this embodiment includes all of the contents of any one of method embodiment 1 to method embodiment 6, and is not described herein again. In the present embodiment, the process S4 is realized by identifying a key function call body and a key feature substructure from the target language. For example, the key function call body and the key feature substructure may be identified from the target language in a bottom-up reduction manner.

By adopting a bottom-up reduction mode, semantic structures such as expressions, key sentences, bracket matching relations, function calling relations and the like in the target language can be identified, so that key function calling bodies and key feature substructures contained in the target language can be identified.

For example, an example of SQL injection is as follows:

union select substr(version(),1,1) from users

During analysis by reduction, when version() is analyzed, it is recognized as a function call. Reducing further upward on this basis, an expression can be obtained locally: E1 (E1 = version, (, )). Expression E1, together with the 1 that follows it (the second 1 from left to right in the above statement), constitutes the parameters of the substr function. On this basis, the expression E2 (E2 = substr, (, E1, 1, 1, )) can be reduced further upward, and the select clause can then be reduced to: select E2 from users. It can be seen that reduction is a process of identifying substructures and gradually obtaining higher-level structures.

Because the semantic analysis is carried out by bottom-up reduction, the semantics can be analyzed automatically layer by layer, so that the semantic structure of the language is accurately identified and the accuracy of the semantic analysis is improved. This lays a foundation for the subsequent comprehensive judgment of network attacks based on the combined results of the lexical analysis, syntactic analysis and semantic analysis.
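The reduction of substr(version(),1,1) described above can be illustrated with a small shift-reduce sketch in Python. It is a simplified stand-in for the semantic analysis of this embodiment: only parenthesized calls are reduced, and the names `tokenize`, `reduce_calls` and the keyword set are assumptions of this example.

```python
import re

# Words that are never treated as function names in this toy example.
KEYWORDS = {"select", "from", "where", "union"}
IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(expr):
    """Split into identifiers, integers, parentheses and commas."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[(),]", expr)


def reduce_calls(tokens):
    """Shift tokens onto a stack; on ')' reduce back to the matching '('."""
    stack = []
    calls = []  # function names in the order they are reduced (innermost first)
    for tok in tokens:
        if tok != ")":
            stack.append(tok)  # shift
            continue
        args = []
        while stack and stack[-1] != "(":
            item = stack.pop()
            if item != ",":
                args.append(item)
        if not stack:
            raise ValueError("unbalanced parentheses")
        stack.pop()  # drop the "("
        args.reverse()
        top = stack[-1] if stack else None
        if isinstance(top, str) and IDENT.fullmatch(top) and top.lower() not in KEYWORDS:
            name = stack.pop()
            node = ("call", name, tuple(args))  # e.g. E1 = version()
            calls.append(name)
        else:
            node = ("group", tuple(args))
        stack.append(node)
    return stack, calls


stack, calls = reduce_calls(tokenize("select substr(version(),1,1) from users"))
```

After the run, `calls` lists version before substr, reflecting that the inner call (E1) is reduced first and then becomes an argument of the outer call (E2); the stack holds select, E2, from and users, corresponding to select E2 from users.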

[ METHOD EMBODIMENT 8 ]

The method provided by this embodiment includes all of the contents of any one of method embodiment 1 to method embodiment 7, and is not described herein again. As shown in fig. 4, in the present embodiment, the process S5 is implemented by:

S51: calculating a comprehensive score of the results of the lexical analysis, the syntactic analysis and the semantic analysis.

S52: comparing the comprehensive score with a set threshold range.

S53: determining a risk level of the requested data according to a result of the comparison.

For convenience of judgment, in the present invention, one or more risk levels and a threshold range corresponding to each level may be preset, as shown in table 1:

Risk level	Threshold range
No risk (normal)	n < 10
Risk level 1	10 ≤ n < 20
Risk level 2	20 ≤ n < 50
Risk level 3	n ≥ 50

TABLE 1

In table 1, n is a composite score, and the risk levels of risk level 1, risk level 2, and risk level 3 are sequentially higher. In particular implementations, risk level 1 may be a lower risk, risk level 2 may be a medium risk, risk level 3 may be a higher risk, and so on.

The threshold ranges and risk levels in Table 1 are exemplary only; a person skilled in the art may also use other divisions, for example distinguishing only between no risk and at risk.

After the risk level is analyzed, network attacks can be selectively filtered, intercepted or prompted according to various preset strategies.
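The comparison of steps S52 and S53 against the exemplary thresholds of Table 1 can be sketched as follows. The function name `risk_level` and the returned labels are chosen for this illustration; the threshold values are the exemplary ones from Table 1 and could be configured differently.

```python
# (exclusive upper bound, label) pairs taken from the exemplary Table 1.
THRESHOLDS = [
    (10, "no risk"),
    (20, "risk level 1"),
    (50, "risk level 2"),
]


def risk_level(n):
    """Map a composite score n to the risk level of Table 1."""
    for upper, label in THRESHOLDS:
        if n < upper:
            return label
    return "risk level 3"  # n >= 50
```

A preset policy could then decide, per returned level, whether the request is passed, flagged or intercepted.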

[ METHOD EMBODIMENT 9 ]

The method provided by this embodiment includes all the contents of method embodiment 8, and is not described herein again. As shown in fig. 5, in the present embodiment, the process S51 is implemented by:

S511: calculating a first sub-score of the target language according to the result of the lexical analysis.

For example, the first sub-score is calculated according to the number of occurrences of the token sequence and a weight parameter.

S512: calculating a second sub-score of the target language according to the result of the syntactic analysis.

For example, the second sub-score is calculated based on the parsing result and a weight parameter of the parsing result.

S513: calculating a third sub-score of the target language according to the result of the semantic analysis.

For example, the third sub-score is calculated according to the number of occurrences of the key function call body or the key feature substructure and a weight parameter.

S514: weighting the first sub-score, the second sub-score and the third sub-score respectively.

S515: calculating the composite score based on the weighted first sub-score, second sub-score, and third sub-score.

[ METHOD EMBODIMENT 10 ]

The method provided by this embodiment includes all the contents of method embodiment 8 or method embodiment 9, and is not described herein again. In the present embodiment, the process S51 is implemented according to the following formula:

in the above formula:

Score(payload) is the composite score;

w_ti is the weight parameter of the i-th token sequence;

w_sj is the weight parameter of the j-th clause;

Generally, a weight parameter (i.e. a weight coefficient or weight value) is positively correlated with importance: the larger the weight parameter, the more important the corresponding data is in the calculation of the composite score. For example, setting C_t to 0.5 and C_s to 1.0 indicates that, in the above calculation formula, the syntactic analysis result is more important than the lexical analysis result.

In order to improve the accuracy of the composite score, the weight parameters can also be optimized through machine learning. For example, for weight parameters such as C_t, C_s and C_m and w_ti, w_sj and w_mk, a model can be built from initial values of the parameters (for example, prior weight values obtained from experience) and trained continuously on large volumes of data, finally yielding optimized values of C_t, C_s, C_m, w_ti, w_sj and w_mk. Moreover, as network attack detection data accumulates, the optimization can be carried out dynamically and continuously to keep adjusting these weight parameters, so that the final judgment result becomes more and more accurate.

The judgment method of the comprehensive score can also eliminate noise caused by some request data which does not conform to the grammar rule but has no aggressivity, so that the accuracy of network attack judgment is improved, and the misjudgment rate is reduced to a certain extent.
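For illustration, steps S511 to S515 can be sketched in Python under the assumption that each sub-score is a weighted sum and that the composite score is the sum of the three sub-scores scaled by C_t, C_s and C_m. This form is only one plausible reading consistent with the symbols defined above; the exact formula of the embodiment may differ. The function names, the input values and the default value of C_m are hypothetical, while C_t = 0.5 and C_s = 1.0 follow the example given above.

```python
def weighted_sum(values, weights):
    """Sum of value * weight over paired entries."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


def composite_score(token_counts, w_t, clause_results, w_s,
                    feature_counts, w_m, C_t=0.5, C_s=1.0, C_m=1.0):
    """Assumed form: C_t*sum(w_ti*n_ti) + C_s*sum(w_sj*r_sj) + C_m*sum(w_mk*n_mk)."""
    first = weighted_sum(token_counts, w_t)      # S511: lexical sub-score
    second = weighted_sum(clause_results, w_s)   # S512: syntactic sub-score
    third = weighted_sum(feature_counts, w_m)    # S513: semantic sub-score
    # S514 and S515: weight the sub-scores and combine them.
    return C_t * first + C_s * second + C_m * third


# Made-up inputs: two token sequences, two clauses, two key features.
score = composite_score(
    token_counts=[2, 1], w_t=[1.0, 3.0],
    clause_results=[1, 1], w_s=[4.0, 4.0],
    feature_counts=[1, 1], w_m=[5.0, 5.0],
)
```

With these made-up inputs the sub-scores are 5.0, 8.0 and 10.0, giving a composite score of 20.5, which would fall into risk level 2 under the exemplary thresholds of Table 1.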

Based on the same inventive concept, the embodiment of the invention also provides a network attack detection device, and as the principle adopted by the device is similar to the network attack detection method, the implementation of the device can refer to the network attack detection method, and repeated details are not repeated.

[ DEVICE EMBODIMENT 1]

Fig. 6 is a schematic structural diagram of a network attack detection apparatus according to embodiment 1 of the present invention. Referring to fig. 6, in the present embodiment, a network attack detection apparatus 10 includes: the target language determining module 1, the analyzing module 2 and the risk level determining module 3 specifically:

the target language determining module 1 is used for determining the target language from the request data according to the type of the target language.

The analysis module 2 includes: lexical analysis section 21, syntax analysis section 22, and semantic analysis section 23, specifically:

the lexical analysis unit 21 is configured to perform lexical analysis on the target language determined by the target language determination module 1.

The parsing unit 22 is used for parsing the target language determined by the target language determination module 1.

The semantic analysis unit 23 is used for performing semantic analysis on the target language determined by the target language determination module 1.

The risk level determination module 3 is configured to determine a risk level of the requested data according to results of the lexical analysis unit 21, the syntactic analysis unit 22, and the semantic analysis unit 23.

[ DEVICE EMBODIMENT 2 ]

The apparatus provided in this embodiment includes all the contents of apparatus embodiment 1, and is not described herein again. In the present embodiment, the target language determination module 1 comprises a determination unit, specifically:

the determination unit is used for determining the target language from the payload data of the request data through a first automaton under the condition that the type of the target language is a script language, wherein the first automaton is constructed by using the following model: a model built for a wrapper layer outside the target language, the model comprising: the position of occurrence, the form of occurrence and the form of encoding of the target language.

[ DEVICE EMBODIMENT 3 ]

The apparatus provided in this embodiment includes all the contents of apparatus embodiment 2, and is not described herein again. In this embodiment, the target language determination module 1 further includes: a recognition unit and a target language decoding unit, specifically:

the identification unit is used for identifying whether the target language is coded.

The target language decoding unit is used for decoding the target language under the condition that the identification unit identifies that the target language is coded.

[ DEVICE EMBODIMENT 4 ]

The apparatus provided in this embodiment includes all the contents of apparatus embodiment 1, and is not described herein again. In this embodiment, the target language determining module 1 specifically determines the target language from the request data according to the type of the target language by: in a case where the type of the target language is a structured query language, determining payload data of the request data as the target language.

[ DEVICE EMBODIMENT 5 ]

Fig. 7 is a schematic structural diagram of a network attack detection apparatus according to embodiment 5 of the present invention. Referring to fig. 7, in the present embodiment, a network attack detection apparatus 10' includes: a target language determination module 1', an analysis module 2', a risk level determination module 3' and an extraction module 4', specifically:

The target language determination module 1', the analysis module 2', and the risk level determination module 3' are the same as the target language determination module 1, the analysis module 2, and the risk level determination module 3 in device embodiment 1, and are not described herein again.

The extraction module 4' is configured to extract the payload data from the request data.

[ DEVICE EMBODIMENT 6 ]

The apparatus provided in this embodiment includes all the contents of apparatus embodiment 5, and is not described herein again. As shown in fig. 8, in the present embodiment, the extraction module 4' includes: a parsing unit 41' and a header parameter or request body decoding unit 42', specifically:

The parsing unit 41' is used for parsing out the specified header parameter or request body from the request data.

The header parameter or request body decoding unit 42' is configured to decode the header parameter or request body parsed by the parsing unit 41' to obtain the payload data.

[ DEVICE EMBODIMENT 7 ]

The apparatus provided in this embodiment includes all of the apparatus embodiments 1 to 6, and is not described herein again. As shown in fig. 9, in the present embodiment, the lexical analysis unit 21 includes: the determination component 211 and the analysis component 212, in particular:

the determination component 211 is configured to determine lexical elements in the target language.

The analysis component 212 is configured to analyze the lexical elements determined by the determination component 211 through finite state automata to obtain a token sequence of clauses in the target language.

[ DEVICE EMBODIMENT 8 ]

The apparatus provided in this embodiment includes all of apparatus embodiment 7, and is not described herein again. In this embodiment, the lexical analysis unit 21 further includes a disambiguation component, specifically:

the disambiguation component is to perform a disambiguation operation on the target language according to a context of the target language.

[ DEVICE EMBODIMENT 9 ]

The apparatus provided in this embodiment includes all of the contents of any one of apparatus embodiments 1 to 8, and is not described herein again. In the present embodiment, the parsing unit 22 specifically implements parsing of the target language by inputting the token sequence into a second automaton to obtain a syntax analysis result of the clause in the target language, wherein the second automaton is generated according to the syntax standard of the target language.

[ DEVICE EMBODIMENT 10 ]

The apparatus provided in this embodiment includes all the contents of apparatus embodiment 9, and is not described herein again. The network attack detection device provided by this embodiment further includes: a file generation module and an automaton generation module, specifically:

The file generation module is used for generating a BNF file according to the grammar standard of the target language.

The automaton generating module is used for generating the second automaton according to the BNF file generated by the file generating module.

[ DEVICE EMBODIMENT 11 ]

The apparatus provided in this embodiment includes all of the apparatus embodiments 1 to 10, and is not described herein again. In this embodiment, the semantic analysis unit 23 specifically implements semantic analysis on the target language by identifying a key function call body and a key feature substructure from the target language, for example, in a bottom-up reduction manner.

[ DEVICE EMBODIMENT 12 ]

The apparatus provided in this embodiment includes all of the apparatus embodiments 1 to 11, and is not described herein again. As shown in fig. 10, in the present embodiment, the risk level determination module 3 includes: a calculation submodule 31, a comparison submodule 32 and a determination submodule 33, in particular:

the computation submodule 31 is configured to compute a composite score of the results of the lexical analysis, the syntactic analysis and the semantic analysis.

The comparison submodule 32 is used for comparing the comprehensive score calculated by the calculation submodule 31 with a set threshold value range.

The determination submodule 33 is used for determining the risk level of the requested data according to the result of the comparison by the comparison submodule 32.

[ DEVICE EMBODIMENT 13 ]

The apparatus provided in this embodiment includes all of apparatus embodiment 10, and is not described herein again. As shown in fig. 11, the calculation submodule 31 includes: the sub-score calculating unit 311, the weighting unit 312, and the comprehensive score calculating unit 313 specifically:

the sub-score calculating unit 311 includes: the first computing component 3111, the second computing component 3112 and the third computing component 3113, in particular:

the first computing component 3111 is configured to compute a first sub-score for the target language based on a result of the lexical analysis.

The second computing component 3112 is configured for computing a second sub-score for the target language based on the result of the parsing.

The third computing component 3113 is configured for computing a third sub-score for the target language based on the result of the semantic analysis.

The weighting unit 312 is configured to weight the first sub-score calculated by the first calculating component 3111, the second sub-score calculated by the second calculating component 3112, and the third sub-score calculated by the third calculating component 3113.

The integrated score calculating unit 313 is configured to calculate the integrated score according to the first sub-score, the second sub-score, and the third sub-score weighted by the weighting unit 312.

[ DEVICE EMBODIMENT 14 ]

The apparatus provided in this embodiment includes all of apparatus embodiment 13, and is not described herein again. In this embodiment, the first calculating component 3111 specifically calculates the first sub-score of the target language according to the result of the lexical analysis by calculating the first sub-score according to the number of occurrences of the token sequence and a weight parameter.

[ DEVICE EMBODIMENT 15 ]

The apparatus provided in this embodiment includes all of the apparatus embodiment 13 or the apparatus embodiment 14, and is not described herein again. In this embodiment, the second calculating component 3112 specifically calculates the second sub-score of the target language according to the result of the parsing by calculating the second sub-score according to the syntax analysis result and a weight parameter of the syntax analysis result.

[ DEVICE EMBODIMENT 16 ]

The apparatus provided in this embodiment includes all of apparatus embodiment 13 to apparatus embodiment 15, and is not described herein again. In this embodiment, the third calculating component 3113 specifically calculates the third sub-score of the target language according to the result of the semantic analysis by calculating the third sub-score according to the number of occurrences of the key function call body or the key feature substructure and a weight parameter.

[ DEVICE EMBODIMENT 17 ]

The apparatus provided in this embodiment includes all of the apparatus embodiments 12 to 16, and is not described herein again. In the present embodiment, the calculation submodule 31 calculates the composite score of the results of the lexical analysis, the syntactic analysis, and the semantic analysis according to the following formula:

in the above formula:

Score(payload) is the composite score;

w_ti is the weight parameter of the i-th token sequence;

w_sj is the weight parameter of the j-th clause;

[ DEVICE EMBODIMENT 18 ]

The apparatus provided in this embodiment includes all of apparatus embodiment 17, and is not described herein again. The network attack detection device provided by this embodiment further includes: an optimization module, specifically:

the optimization module is used for optimizing the weight parameters through machine learning.

The embodiment of the invention also provides terminal equipment, which comprises a memory and a processor; wherein,

the memory is configured to store one or more computer instructions that, when executed by the processor, are capable of performing the method of any one of method embodiments 1-10.

Furthermore, embodiments of the present invention also provide a computer storage medium for storing one or more computer instructions, wherein the one or more computer instructions, when executed, enable implementation of the method according to any one of method embodiment 1 to method embodiment 10.

Generally, the detection scheme of the network attack provided by the embodiment of the invention abandons the traditional detection mode using rules, can define a model of a target language aiming at various network vulnerabilities (SQL injection attack, XSS cross-site scripting attack, PHP injection attack, and the like), does not need manual maintenance rules, can intelligently extract the target language from request data, performs lexical analysis, syntactic analysis and semantic analysis aiming at the target language, and integrates the results of the lexical analysis, the syntactic analysis and the semantic analysis, so as to accurately and deeply identify whether the target language is the network attack, and further judge whether the request data has risks, and has high accuracy and few misjudgments. And because a rule detection mode is not adopted, the problem that the speed is slower when more rules are overlapped does not exist, and the running speed is higher.

Those skilled in the art will clearly understand that the present invention may be implemented entirely in software, or by a combination of software and a hardware platform. Based on such understanding, all or part of the technical solution of the present invention that contributes over the background art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and which includes instructions for causing a computer device (which may be a personal computer, a server, a smart phone, a network device, etc.) to execute the method according to each embodiment, or some parts of the embodiments, of the present invention.

As used herein, the term "software" or the like refers to any type of computer code or set of computer-executable instructions in a general sense that is executed to program a computer or other processor to perform various aspects of the present inventive concepts as discussed above. Furthermore, it should be noted that according to one aspect of the embodiment, one or more computer programs implementing the method of the present invention when executed do not need to be on one computer or processor, but may be distributed in modules in multiple computers or processors to execute various aspects of the present invention.

Computer-executable instructions may take many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. In particular, the operations performed by the program modules may be combined or separated as desired in various embodiments.

Also, technical solutions of the present invention may be embodied as a method, and at least one example of the method has been provided. The actions may be performed in any suitable order and may be presented as part of the method. Thus, embodiments may be configured such that acts may be performed in an order different than illustrated, which may include performing some acts simultaneously (although in the illustrated embodiments, the acts are sequential).

The definitions given and used herein should be understood with reference to dictionaries, definitions in documents incorporated by reference, and/or their ordinary meanings.

In the claims, as well as in the specification above, all transitional phrases such as "comprising," "having," "containing," "carrying," "involving," "consisting essentially of …," and the like are to be understood to be open-ended, i.e., to mean including but not limited to.

The terms and expressions used in the specification of the present invention have been set forth for illustrative purposes only and are not meant to be limiting. It will be appreciated by those skilled in the art that changes could be made to the details of the above-described embodiments without departing from the underlying principles thereof. The scope of the invention is, therefore, indicated by the appended claims, in which all terms are intended to be interpreted in their broadest reasonable sense unless otherwise indicated.

While various embodiments of the present invention have been described above with particularity, various aspects or features of the teachings of embodiments of the present invention are described below in another form and are not limited to the following series of paragraphs, some or all of which may be assigned alphanumeric characters for the sake of clarity. Each of these paragraphs may be combined with the contents of one or more other paragraphs in any suitable manner. Without limiting examples of some of the suitable combinations, some paragraphs hereinafter make specific reference to and further define other paragraphs.

A1, a method for detecting network attacks, comprising:

decoding request data in Web traffic, and extracting the payload data (payload) from the request data;

determining a target language in the payload;

and judging whether the request data has risks or not according to the results of the lexical analysis, the syntactic analysis and the semantic analysis.

A2, the method as in A1, wherein decoding the request data in the Web traffic and extracting the payload therein comprises:

parsing at least one preset header parameter or the HTTP request body (HTTP Body) in the request data according to the Hypertext Transfer Protocol (HTTP);

decoding the at least one header parameter or the content of the HTTP request body to extract the payload contained therein;

wherein the at least one header parameter comprises one or any combination of the following parameters:

a Request URL parameter, a Referer parameter, a Cookie parameter, and a User-Agent parameter.
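The extraction step of A2 can be sketched as follows. This is a minimal illustration, assuming percent-encoding is the only encoding in play and that the request body is form-encoded; the function name `extract_payloads`, the `INSPECTED_HEADERS` tuple and the `(source, value)` result layout are hypothetical and do not come from the specification.

```python
from urllib.parse import urlsplit, parse_qsl, unquote

# Header names follow paragraph A2; everything else here is an illustrative assumption.
INSPECTED_HEADERS = ("Referer", "Cookie", "User-Agent")

def extract_payloads(url, headers, body=""):
    """Return (source, decoded_value) candidates found in one HTTP request."""
    payloads = []
    # Query-string parameters of the request URL (parse_qsl percent-decodes them).
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        payloads.append((f"url:{name}", value))
    # Selected header values, percent-decoded.
    for header in INSPECTED_HEADERS:
        if header in headers:
            payloads.append((f"header:{header}", unquote(headers[header])))
    # Form-encoded request body, if any.
    for name, value in parse_qsl(body, keep_blank_values=True):
        payloads.append((f"body:{name}", value))
    return payloads
```

Each returned value is a candidate payload that the later steps would inspect for an embedded target language.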

A3, the method as in A1, wherein the determining the target language in the payload includes:

determining that the payload is itself the target language; or

extracting the target language embedded in the payload from the payload.

A4, the method as in A3, wherein extracting the target language embedded in the payload from the payload comprises:

modeling the wrapper layer outside the target language, and enumerating all possible positions, appearance forms and encoding forms of the target language;

and constructing a push-down automaton or a finite state automaton from the model of the outer wrapper layer to process the outer language, and marking and extracting the target language.
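As a rough illustration of A4, the sketch below uses a three-state finite automaton to pull script text out of an HTML wrapper. It assumes a single appearance form (the body of a `<script>` element); a real wrapper-layer model would also cover event-handler attributes, `javascript:` URLs and encoded variants. The function name and state names are hypothetical.

```python
def extract_script(payload):
    """Finite-state scan: collect text found between <script ...> and </script>."""
    OUTSIDE, IN_OPEN_TAG, IN_SCRIPT = range(3)
    state, i, start, found = OUTSIDE, 0, 0, []
    while i < len(payload):
        if state == OUTSIDE:
            # Case-insensitive match of the opening tag name.
            if payload[i:i + 7].lower() == "<script":
                state, i = IN_OPEN_TAG, i + 7
                continue
        elif state == IN_OPEN_TAG:
            # Skip attributes until the tag closes.
            if payload[i] == ">":
                state, start = IN_SCRIPT, i + 1
        elif state == IN_SCRIPT:
            if payload[i:i + 8].lower() == "</script":
                found.append(payload[start:i])
                state = OUTSIDE
        i += 1
    if state == IN_SCRIPT:
        # An unterminated script block is still reported.
        found.append(payload[start:])
    return found
```

Reporting unterminated blocks is a deliberate choice in this sketch, since attack payloads are often truncated fragments rather than well-formed documents.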

A5, the method of A1, wherein the lexical analysis of the target language comprises:

sequentially determining the boundary of each lexical element in the target language;

and constructing a finite state automaton, and analyzing the determined lexical elements to form a corresponding token sequence.
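A hand-written finite-state scanner in the spirit of A5 might look like the sketch below. The token categories, the `KEYWORDS` subset and the function name `tokenize` are assumptions made for illustration; the specification does not list them.

```python
KEYWORDS = {"select", "union", "from", "where", "or", "and"}  # illustrative subset

def tokenize(text):
    """Split SQL-like text into (kind, lexeme) tokens by scanning character by character."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier or keyword: consume letters, digits and underscores.
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            tokens.append(("KEYWORD" if word.lower() in KEYWORDS else "IDENT", word))
            i = j
        elif ch.isdigit():
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif ch == "'":
            # String literal: consume up to the closing quote (or end of input).
            j = i + 1
            while j < n and text[j] != "'":
                j += 1
            tokens.append(("STRING", text[i:j + 1]))
            i = j + 1
        else:
            tokens.append(("PUNCT", ch))
            i += 1
    return tokens
```

Each branch corresponds to one state of the automaton, and the boundary of a lexical element is the point where the inner loop stops.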

A6, the method as in A5, wherein determining the boundary of each lexical element in the target language further comprises:

A7, the method as in A5, wherein performing syntactic analysis on the target language comprises:

generating a corresponding push-down automaton or finite state automaton according to the grammar standard of the target language;

and taking the token sequence of the target language obtained by lexical analysis as the input of the push-down automaton or the finite state automaton, and outputting a syntax analysis result whether each clause of the target language meets the syntax standard.
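The idea of feeding a token sequence to a push-down automaton and emitting one result per clause can be sketched as below. This is a deliberately tiny stand-in: the automaton only checks bracket nesting, whereas the automaton described in A7 would be derived from the full grammar of the target language. The names `clause_is_well_nested` and `parse_clauses`, and the use of `;` as clause separator, are assumptions.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}

def clause_is_well_nested(tokens):
    """Push-down check: every closing bracket must match the most recent opener."""
    stack = []
    for tok in tokens:
        if tok in PAIRS.values():
            stack.append(tok)                    # push opener
        elif tok in PAIRS:
            if not stack or stack.pop() != PAIRS[tok]:
                return False                     # mismatched or unexpected closer
    return not stack                             # leftover openers mean failure

def parse_clauses(tokens, separator=";"):
    """Split the token sequence into clauses and return 1 (accepted) or 0 per clause."""
    results, clause = [], []
    for tok in tokens + [separator]:
        if tok == separator:
            if clause:
                results.append(1 if clause_is_well_nested(clause) else 0)
            clause = []
        else:
            clause.append(tok)
    return results
```

The 0/1 output per clause matches the form of the syntax analysis result that the scoring step consumes.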

A8, the method as in a7, wherein generating a corresponding push-down automaton or finite state automaton according to the language standard of the target language comprises:

generating a Backus-Naur Form (BNF) language description file for the language standard of the target language;

and generating a corresponding push-down automaton or finite state automaton according to the BNF file.

A9, the method of any one of A1 to A8, wherein semantically analyzing the target language comprises:

and analyzing the semantic structure of the target language in a bottom-up reduction mode, and identifying a key function calling body and a key feature substructure contained in the semantic structure.
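Bottom-up reduction as described in A9 can be illustrated with a small shift-reduce loop that collapses each parenthesized group and, when the group is preceded by an identifier, treats it as a function call. The watch list `KEY_FUNCTIONS`, the placeholder nodes and the function name are hypothetical; the specification does not enumerate the key function call bodies.

```python
KEY_FUNCTIONS = {"alert", "eval", "sleep", "load_file"}  # hypothetical watch list

def find_key_calls(tokens):
    """Shift tokens onto a stack; on ')' reduce back to '(' and record key calls."""
    stack, hits = [], {}
    for tok in tokens:
        if tok != ")":
            stack.append(tok)                    # shift
            continue
        # Reduce: discard the arguments back to the matching "(".
        while stack and stack[-1] != "(":
            stack.pop()
        if not stack:
            continue                             # unmatched ")": nothing to reduce
        stack.pop()                              # discard "("
        if stack and stack[-1].isidentifier():
            name = stack.pop().lower()
            if name in KEY_FUNCTIONS:
                hits[name] = hits.get(name, 0) + 1
            stack.append("<call>")               # reduced function-call node
        else:
            stack.append("<group>")              # plain parenthesized expression
    return hits
```

Because inner groups are reduced first, nested calls such as `eval(atob(...))` are recognized from the inside out.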

A10, the method of A9, wherein the determining whether the requested data is at risk according to the results of the lexical analysis, the syntactic analysis, and the semantic analysis comprises:

calculating the comprehensive scores of the lexical analysis result, the syntactic analysis result and the semantic analysis result of the target language according to a preset algorithm;

comparing the composite score with a preset threshold range of at least one risk level;

and determining whether the requested data has risks and the risk level to which the requested data belongs when the requested data has risks according to the comparison result.
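The threshold comparison of A10 amounts to a range lookup. The sketch below assumes three half-open score ranges with made-up boundaries and labels; the actual preset ranges are not given in the text.

```python
# (lower bound inclusive, upper bound exclusive, label); all values are hypothetical.
RISK_LEVELS = [
    (0.0, 3.0, "none"),
    (3.0, 7.0, "low"),
    (7.0, float("inf"), "high"),
]

def risk_level(score):
    """Return the label of the threshold range that contains the composite score."""
    for low, high, label in RISK_LEVELS:
        if low <= score < high:
            return label
    return "none"  # scores below every range are treated as no risk
```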

A11, the method of A10, wherein the comprehensive score of the lexical analysis result, the syntactic analysis result and the semantic analysis result of the target language is calculated according to the following formulas:

in the above formula:

score(payload) is the composite score for the target language;

t_i is the number of occurrences of the i-th token sequence of the target language obtained by lexical analysis;

w_ti is the weight value corresponding to the i-th token sequence;

s_j is the syntax analysis result indicating whether the j-th clause of the target language meets the grammar standard, and takes the value 0 or 1;

w_sj is the weight value corresponding to the j-th clause;

m_k is the number of occurrences of the k-th key function call body or key feature substructure of the target language obtained by semantic analysis;

w_mk is the weight value corresponding to the k-th key function call body or key feature substructure;

C_t, C_s and C_m are the weight coefficients corresponding to lexical analysis, syntactic analysis and semantic analysis, respectively, in the composite score.
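The formula itself is not reproduced in this text. One form consistent with the symbol definitions is a weighted sum, score(payload) = C_t·Σ w_ti·t_i + C_s·Σ w_sj·s_j + C_m·Σ w_mk·m_k; this is an assumption, and the original may differ (for example by normalization). Under that assumption the computation can be sketched as:

```python
def composite_score(t, w_t, s, w_s, m, w_m, c_t=1.0, c_s=1.0, c_m=1.0):
    """Weighted sum of lexical, syntactic and semantic sub-scores (assumed form)."""
    lexical = sum(w * x for w, x in zip(w_t, t))      # sum of w_ti * t_i
    syntactic = sum(w * x for w, x in zip(w_s, s))    # sum of w_sj * s_j, s_j in {0, 1}
    semantic = sum(w * x for w, x in zip(w_m, m))     # sum of w_mk * m_k
    return c_t * lexical + c_s * syntactic + c_m * semantic
```

The default coefficients of 1.0 are placeholders; in the described scheme all weights would come from training.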

A12, the method as described in A11, wherein C_t, C_s and C_m, and w_ti, w_sj and w_mk, are obtained by continuously training and optimizing a parameter model established from preset initial values of these parameters.

B13, an apparatus for detecting a network attack, comprising:

the extraction module is used for decoding the request data in Web traffic and extracting the payload data (payload) from the request data;

the target language determining module is used for determining a target language in the payload;

the lexical analysis module is used for carrying out lexical analysis on the target language;

the grammar analysis module is used for carrying out grammar analysis on the target language;

the semantic analysis module is used for performing semantic analysis on the target language;

and the risk judgment module is used for judging whether the request data has risks or not according to the results of the lexical analysis, the syntactic analysis and the semantic analysis.

B14, the device as described in B13, the extraction module comprising:

the analysis submodule is used for parsing at least one preset header parameter or the HTTP request body (HTTP Body) in the request data according to the HTTP protocol;

a decoding sub-module for decoding the at least one header parameter or the content of the HTTP request body;

an extraction submodule, configured to extract payload included in the content decoded by the decoding submodule;

B15, the apparatus of B13, wherein the target language determining module comprises a determining submodule or an extraction submodule; wherein:

the determining submodule is used for determining that the payload is the target language;

and the extraction submodule is used for extracting the target language embedded in the payload from the payload.

B16, the device as described in B15, wherein the extraction submodule is specifically configured to model the wrapper layer outside the target language and enumerate all possible positions, appearance forms and encoding forms of the target language; and to construct a push-down automaton or a finite state automaton from the model of the outer wrapper layer to process the outer language, and to mark and extract the target language.

B17, in the apparatus according to B13, the lexical analysis module is specifically configured to sequentially determine boundaries of each lexical element in the target language; and constructing a finite state automaton, and analyzing the determined lexical elements to form a corresponding token sequence.

B18, the apparatus as in B17, wherein the lexical analysis module is further configured to perform a disambiguation operation on the target language according to a context of the target language when determining boundaries of each lexical element in the target language.

B19, the apparatus as described in B17, the syntax analysis module comprising:

the automaton generation submodule is used for generating a corresponding push-down automaton or finite state automaton according to the language standard of the target language;

and the grammar analysis submodule is used for taking the token sequence of the target language obtained by lexical analysis as the input of the push-down automaton or the finite state automaton and outputting a grammar analysis result whether each clause of the target language meets a grammar standard.

B20, in the apparatus according to B19, the automaton generation submodule is specifically configured to generate a Backus-Naur Form (BNF) language description file for the language standard of the target language; and to generate a corresponding push-down automaton or finite state automaton according to the BNF file.

B21, the apparatus according to any one of B13 to B20, wherein the semantic analysis module is specifically configured to analyze the semantic structure of the target language in a bottom-up reduction manner, and identify the key function call entity and the key feature substructure contained therein.

B22, the apparatus of B21, wherein the risk assessment module comprises:

the comprehensive score calculation submodule is used for calculating the comprehensive scores of the lexical analysis result, the syntactic analysis result and the semantic analysis result of the target language according to a preset algorithm;

a comparison submodule for comparing the composite score with a preset threshold range of at least one risk level;

and the judgment submodule is used for determining whether the requested data has risks and the risk level to which the requested data belongs when the requested data has risks according to the comparison result.

B23, in the apparatus according to B22, the comprehensive score calculating sub-module is specifically configured to calculate the comprehensive scores of the lexical analysis result, the syntactic analysis result, and the semantic analysis result of the target language according to the following formulas:

in the above formula:

score(payload) is the composite score for the target language;

w_ti is the weight value corresponding to the i-th token sequence;

w_sj is the weight value corresponding to the j-th clause;

B24, the device as described in B23, wherein C_t, C_s and C_m, and w_ti, w_sj and w_mk, are obtained by continuously training and optimizing a parameter model established from preset initial values of these parameters.

C25, an apparatus for detecting cyber attacks, comprising:

a processor;

a memory for storing processor executable commands;

wherein the processor is configured to:

determining a target language in the payload;

D26, a non-transitory computer readable storage medium having instructions that, when executed by a processor of a network device such as a server, enable the network device to perform the detection of the network attack, comprising:

determining a target language in the payload;

In summary, the technical solutions provided by the embodiments of the present invention at least have the following beneficial effects:

1) In the method and the related apparatus for detecting a network attack provided by the embodiments of the present invention, the traditional rule-based detection approach is abandoned. A model of the target language can be defined for each kind of network vulnerability (SQL injection attacks, XSS cross-site scripting attacks, PHP injection attacks and the like), so rules need not be maintained manually. The target language is extracted from the request data; lexical analysis, syntactic analysis and semantic analysis are performed on it; and the results of the three analyses are combined to identify whether the target language constitutes a network attack, and hence whether the request data is at risk, with higher accuracy and fewer misjudgments. Because rule-based detection is not used, the slowdown that comes from stacking more and more rules does not arise, and the running speed is higher.

2) In the method and the related apparatus for detecting a network attack provided by the embodiments of the present invention, a BNF file (lexical or syntactic description definition file) can be converted into a corresponding push-down automaton or finite state automaton to perform lexical analysis or syntactic analysis, so that the result is accurate and the execution speed is fast.

3) In the method for detecting network attacks and the related device provided by the embodiment of the invention, the lexical analysis, the syntactic analysis and the semantic analysis are quantized into the comprehensive score, and in the calculation process of the comprehensive score, various weight parameters of the results of the lexical analysis, the syntactic analysis and the semantic analysis are utilized, and various parameters are automatically learned through a machine learning model, so that the parameters are adjusted and optimized, and the calculation result is more accurate.

4) In the method and the related device for detecting network attacks provided by the embodiments of the present invention, a bottom-up reduction manner is adopted, so that semantic structures such as expressions, key statements, parenthesis matching relationships, function call relationships, and the like in a target language can be identified, and thus a key function call body and a key feature substructure contained therein are identified. The method can realize automatic layer-by-layer analysis of semantics, can accurately identify the semantic structure of the language, has higher accuracy of semantic analysis, and lays a foundation for accurately carrying out comprehensive judgment of network attacks on the results of subsequent comprehensive lexical analysis, syntactic analysis and semantic analysis.

Claims

1. A method for detecting a network attack, the method comprising:

determining the risk level of the request data according to the results of the lexical analysis, the syntactic analysis and the semantic analysis;

wherein determining the target language from the request data according to the type of the target language comprises:

if the type of the target language is a script language, determining the target language embedded in the target language from the payload data of the request data through a first automaton, wherein the first automaton is constructed by using the following model: a model built for a wrapper layer outside the target language, the model comprising: the appearance position, the appearance form and the coding form of the target language;

if the type of the target language is a structured query language, determining the payload data of the request data as the target language;

wherein the lexical analysis of the target language comprises:

determining lexical elements in the target language;

and analyzing the lexical elements through a finite state automaton to obtain a token sequence of the clauses in the target language.

2. The method of claim 1, wherein determining the target language from the request data based on the type of target language further comprises:

identifying whether the target language is encoded;

and if the target language is coded, decoding the target language.
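The identify-then-decode step of claim 2 can be sketched for one encoding. The example assumes percent-encoding only and repeats the decoding a bounded number of times to handle nested encoding; the function name, the regular expression and the `max_rounds` limit are illustrative assumptions.

```python
import re
from urllib.parse import unquote

_PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def decode_if_encoded(text, max_rounds=5):
    """Percent-decode repeatedly while the text still contains %XX escapes."""
    for _ in range(max_rounds):
        if not _PERCENT_ESCAPE.search(text):
            break                      # not (or no longer) encoded
        text = unquote(text)
    return text
```

The bound on the number of rounds keeps the loop from running indefinitely on adversarial input.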

3. The method of claim 1 or 2, wherein the method further comprises:

extracting the payload data from the request data.

4. The method of claim 3, wherein extracting the payload data from the request data comprises:

parsing a specified header parameter or a request body from the request data;

decoding the header parameter or the request body to obtain the payload data.

5. The method of claim 4,

the header parameters include a combination of one or more of:

a request network address (Request URL) parameter, a Referer parameter, a Cookie parameter, and a User-Agent parameter.

6. The method of claim 1, wherein lexical analyzing the target language further comprises:

7. The method of claim 1, wherein parsing the target language comprises:

8. The method of claim 7, wherein the method further comprises:

generating a Backus-Naur Form (BNF) file according to the grammatical standard of the target language;

and generating the second automaton according to the BNF file.

9. The method of claim 7, wherein semantically analyzing the target language comprises:

10. The method of claim 9, wherein identifying key function call bodies and key feature substructures from the target language comprises:

11. The method of claim 10, wherein determining a risk level for the requested data based on results of the lexical analysis, the syntactic analysis, and the semantic analysis comprises:

comparing the comprehensive score with a set threshold range;

12. The method of claim 11, wherein calculating a composite score for the results of the lexical analysis, the syntactic analysis, and the semantic analysis comprises:

13. The method of claim 12, wherein calculating a first sub-score for the target language based on the results of the lexical analysis comprises:

14. The method of claim 13, wherein calculating a second sub-score for the target language based on the results of the parsing comprises:

15. The method of claim 14, wherein calculating a third sub-score for the target language based on the results of the semantic analysis comprises:

16. The method of claim 15, wherein the integrated score of the results of the lexical analysis, the syntactic analysis, and the semantic analysis is calculated according to the following formula:

in the above formula:

score(payload) is the composite score;

n_t is the number of token sequences obtained by the lexical analysis;

w_ti is the weight parameter of the i-th token sequence;

n_s is the number of clauses obtained by the syntactic analysis;

w_sj is the weight parameter of the j-th clause;

n_k is the number of key function call bodies or key feature substructures obtained by the semantic analysis;

17. The method of claim 16, wherein the method further comprises:

the weight parameters are optimized by machine learning.

18. An apparatus for detecting a cyber attack, the apparatus comprising:

the risk level determining module is used for determining the risk level of the request data according to the results of the lexical analysis, the syntactic analysis and the semantic analysis;

wherein the target language determination module comprises:

a determination unit configured to determine, in a case where a type of a target language is a script language, the target language embedded therein from payload data of the request data by a first automaton, wherein the first automaton is constructed using the following model: a model built for a wrapper layer outside the target language, the model comprising: the appearance position, the appearance form and the coding form of the target language;

the target language determination module is further configured to determine the target language from the request data according to the type of the target language by: determining payload data of the request data as the target language in case that the type of the target language is a structured query language;

wherein the lexical analysis unit includes:

19. The apparatus of claim 18, wherein the target language determination module further comprises:

an identifying unit for identifying whether the target language is encoded;

20. The apparatus of claim 18 or 19, wherein the apparatus further comprises:

21. The apparatus of claim 20, wherein the extraction module comprises:

22. The apparatus of claim 21,

the header parameters include a combination of one or more of:

23. The apparatus of claim 18, wherein the lexical analysis unit further comprises:

24. The apparatus of claim 18,

the syntax analysis unit is used for realizing syntax analysis of the target language by the following modes: and inputting the token sequence into a second automaton to obtain a syntax analysis result of the clause in the target language, wherein the second automaton is generated according to the syntax standard of the target language.

25. The apparatus of claim 24, wherein the apparatus further comprises:

26. The apparatus of claim 24,

the semantic analysis unit is used for performing semantic analysis on the target language by the following modes: and identifying a key function call body and a key feature substructure from the target language.

27. The apparatus of claim 26,

the semantic analysis unit is used for identifying a key function call body and a key feature substructure from the target language by the following modes: and identifying a key function calling body and a key feature substructure from the target language by adopting a bottom-up reduction mode.

28. The apparatus of claim 27, wherein the risk level determination module comprises:

29. The apparatus of claim 28, wherein the computation submodule comprises:

30. The apparatus of claim 29,

the first calculation component is used for calculating a first sub-score of the target language according to the result of the lexical analysis by the following means: and calculating the first sub-score according to the occurrence times and the weight parameters of the token sequence.

31. The apparatus of claim 30,

the second calculation component is used for calculating a second sub-score of the target language according to the result of the syntactic analysis by the following method: and calculating the second sub-score according to the grammar analysis result and the weight parameter of the grammar analysis result.

32. The apparatus of claim 31,

the third computing component is used for computing a third sub-score of the target language according to the result of the semantic analysis by the following means: and calculating the third sub-score according to the occurrence frequency and the weight parameter of the key function calling body or the key feature substructure.

33. The apparatus of claim 32,

the calculation submodule is used for calculating the comprehensive score of the results of the lexical analysis, the syntactic analysis and the semantic analysis according to the following formula:

in the above formula:

score(payload) is the composite score;

w_ti is the weight parameter of the i-th token sequence;

w_sj is the weight parameter of the j-th clause;

34. The apparatus of claim 33, wherein the apparatus further comprises:

35. A terminal device comprising a memory and a processor; wherein,

the memory is to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, are capable of implementing the method of any one of claims 1 to 17.

36. A computer storage medium storing one or more computer instructions which, when executed, are capable of implementing the method of any one of claims 1 to 17.