CN117792720A - Network attack identification method, device and equipment - Google Patents

Publication number
CN117792720A
Authority
CN
China
Prior art keywords
data
api request
api
attack
digital sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311767799.2A
Other languages
Chinese (zh)
Inventor
周涛
常力元
马尚荣
崔乾
方文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Safety Technology Co Ltd
Original Assignee
Tianyi Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Safety Technology Co Ltd
Priority to CN202311767799.2A
Publication of CN117792720A
Current legal status: Pending

Abstract

The application discloses a method, a device and equipment for identifying network attacks. The method comprises: decrypting received API request data to obtain original data, and removing redundant data irrelevant to attack detection from the original data to obtain original feature data relevant to attack detection in the API request; replacing the symbols in the original feature data with corresponding identification text, based on a preset correspondence between symbols and identification text, to obtain text data corresponding to the API request; converting the text data into a digital sequence, based on a preset correspondence between text data and digital sequences, to obtain the digital sequence corresponding to the API request; and inputting the digital sequence into an attack recognition model and determining whether the API request is legal based on the recognition result output by the model. By distinguishing legal API requests from illegal API requests on the basis of semantically processed API requests, the application improves the accuracy of identifying illegal API requests.

Description

Network attack identification method, device and equipment
Technical Field
The application belongs to the technical field of network security, and particularly relates to a method, a device and equipment for identifying network attacks.
Background
In the field of network security, there are various types of application programming interface (API) injection attacks, including Structured Query Language (SQL) injection, cross-site scripting (XSS), operating system (OS) command injection, and the like. These attacks share a basic feature: the attacker embeds malicious code in an illegal API request and disguises it as a legal API request, deceiving the application into performing improper operations and thereby bypassing normal security mechanisms. A successful attack can have catastrophic consequences, such as database theft, leakage of confidential data, or application crashes.
Existing API injection attack detection solutions usually detect and defend against a single attack mode and cannot effectively detect and defend against all types of API injection attacks, particularly unknown attack types. In addition, existing detection methods have low accuracy and high false-positive and false-negative rates when distinguishing legal API requests from illegal ones, and perform poorly when handling special symbols in API requests: an attacker can encode special symbols to manipulate request parameters and bypass detection, so that potentially illegal API requests are not effectively identified.
Disclosure of Invention
In view of the above problems, the application provides a method, a device and equipment for identifying network attacks, which distinguish legal API requests from illegal API requests based on semantically processed API requests, thereby improving the accuracy of detecting different types of illegal API requests.
In a first aspect, the present application provides a method for identifying a network attack, the method comprising:
decrypting received application program interface API request data to obtain original data, and removing redundant data irrelevant to attack detection in the original data to obtain original characteristic data relevant to attack detection in the API request;
based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data with the corresponding identification text to obtain text data corresponding to the API request;
converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a digital sequence corresponding to the API request;
and inputting the digital sequence into an attack recognition model, and determining whether the API request is legal or not based on a recognition result output by the attack recognition model.
In one or more embodiments, the removing redundant data in the original data that is not related to attack detection includes:
Determining parameters irrelevant to attack detection based on a set data screening rule, and removing parameters irrelevant to attack detection and parameter values corresponding to the parameters in the original data; and
and removing the digital parameter values of the parameters related to attack detection in the original data.
In one or more embodiments, after obtaining the original feature data related to attack detection in the API request, before converting the symbol of the original feature data into the text expression form, the method further includes:
if the API request belongs to the file request type, determining the file type suffix and the file size of the API request, and adding the determined file type suffix and the determined file size into the original characteristic data.
In one or more embodiments, the number sequences include a first number sequence corresponding to a word and a second number sequence corresponding to a letter;
the converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain the digital sequence corresponding to the API request comprises the following steps:
determining a first digital sequence corresponding to the words in the text data based on the corresponding relation between the words in the text data and the first digital sequence, and replacing the words in the text data with the corresponding first digital sequence to obtain a word coding sequence corresponding to an API request;
Determining a second digital sequence corresponding to the letters in the text data based on the corresponding relation between the letters in the text data and the second digital sequence, and converting the letters in the text data into the corresponding second digital sequence to obtain a letter coding sequence corresponding to an API request;
and obtaining a number sequence corresponding to the API request based on the word coding sequence and the letter coding sequence.
In one or more embodiments, the attack recognition model is derived as follows:
converting acquired API request samples, which comprise legal API requests and illegal API requests, into digital sequences corresponding to the API request samples, and using the digital sequences as a training sample set;
constructing a basic attack recognition model according to the received model construction instruction, wherein a loss function of the basic attack recognition model is a weighted loss function set based on the relative proportion of legal API requests and illegal API requests;
and carrying out iterative training on the constructed basic attack recognition model according to the training sample set until the model accuracy reaches a set value to obtain the attack recognition model.
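The source states only that the loss is weighted according to the relative proportion of legal and illegal requests; it does not give a formula. The following is a minimal sketch of one common choice, assuming a binary cross-entropy loss with inverse-frequency class weights, label 1 for legal requests and label 0 for illegal requests. The function names `class_weights` and `weighted_bce` are hypothetical.

```python
import math


def class_weights(labels):
    """Return (w_illegal, w_legal) as inverse-frequency weights.

    Assumption: label 1 = legal request, label 0 = illegal request.
    The rarer class receives the larger weight.
    """
    n = len(labels)
    n_legal = sum(labels)
    n_illegal = n - n_legal
    if n_legal == 0 or n_illegal == 0:
        raise ValueError("both classes must be present in the sample set")
    return n / (2 * n_illegal), n / (2 * n_legal)


def weighted_bce(labels, probs, w_illegal, w_legal, eps=1e-12):
    """Mean weighted binary cross-entropy over predicted legal probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= w_legal * y * math.log(p) + w_illegal * (1 - y) * math.log(1 - p)
    return total / len(labels)
```

With three legal samples and one illegal sample, the illegal class is weighted three times as heavily as the legal class, so misclassifying the rare class costs more during training.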
In one or more embodiments, the converting the acquired API request sample into a number sequence corresponding to the API request sample includes:
Acquiring an API request sample, and removing redundant data irrelevant to attack detection in API request sample data to obtain sample original characteristic data relevant to attack detection in the API request sample;
based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data of the sample with the corresponding identification text to obtain sample text data corresponding to the API request sample;
and converting the sample text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a sample digital sequence corresponding to the API request.
In one or more embodiments, the determining whether the API request is legal based on the recognition result output by the attack recognition model includes:
inputting the digital sequence into an attack recognition model, and outputting legal probability of an API request;
taking the API request with the legal probability exceeding a first preset value as a first legal API request;
sorting the API requests whose legal probability lies between the first preset value and a second preset value in descending order of legal probability, and selecting a set proportion of the top-ranked API requests as second legal API requests, wherein the first preset value is higher than the second preset value;
Identifying the first legitimate API request and the second legitimate API request as legitimate API requests, and identifying the remaining API requests as illegitimate API requests.
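The decision rule above can be sketched as follows. This is an illustration only: the threshold values 0.9 and 0.6, the proportion 0.5, rounding the selected count down, and the function name `classify_requests` are all assumptions that do not come from the source.

```python
def classify_requests(probs, first=0.9, second=0.6, top_ratio=0.5):
    """Return one boolean per request: True = legal, False = illegal.

    probs     -- legal probabilities output by the attack recognition model
    first     -- first preset value (hypothetical default)
    second    -- second preset value, lower than `first` (hypothetical default)
    top_ratio -- set proportion of the fault-tolerant band to accept
    """
    # First legal requests: probability exceeds the first preset value.
    legal = {i for i, p in enumerate(probs) if p > first}

    # Fault-tolerant band: probability between the two preset values,
    # sorted in descending order of legal probability.
    band = sorted(
        (i for i, p in enumerate(probs) if second <= p <= first),
        key=lambda i: probs[i],
        reverse=True,
    )

    # Second legal requests: the top-ranked share of the band.
    keep = int(len(band) * top_ratio)
    legal.update(band[:keep])

    return [i in legal for i in range(len(probs))]
```

For example, `classify_requests([0.95, 0.85, 0.7, 0.65, 0.62, 0.3])` accepts the first request outright, accepts the two highest-ranked of the four requests in the band, and marks the rest as illegal.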
In a second aspect, the present application provides an apparatus for network attack identification, the apparatus comprising:
the data screening module is used for decrypting the received API request data to obtain original data, and removing redundant data irrelevant to attack detection in the original data to obtain original characteristic data relevant to the attack detection in the API request;
the first conversion module is used for replacing the symbols in the original characteristic data with the corresponding identification texts based on the corresponding relation between the preset symbols and the identification texts to obtain text data corresponding to the API request;
the second conversion module is used for converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a digital sequence corresponding to the API request;
and the attack recognition module is used for inputting the digital sequence into an attack recognition model and determining whether the API request is legal or not based on a recognition result output by the attack recognition model.
In one or more embodiments, the data filtering module removes redundant data that is irrelevant to attack detection from the original data, and specifically includes:
Determining parameters irrelevant to attack detection based on a set data screening rule, and removing parameters irrelevant to attack detection and parameter values corresponding to the parameters in the original data; and
and removing the digital parameter values of the parameters related to attack detection in the original data.
In one or more embodiments, the apparatus further includes a suffix extraction module 305, where after obtaining the original feature data related to attack detection in the API request, before converting the symbol of the original feature data into the text expression form, the suffix extraction module is specifically configured to:
if the API request belongs to the file request type, determining the file type suffix and the file size of the API request, and adding the determined file type suffix and the determined file size into the original characteristic data.
In one or more embodiments, the number sequences include a first number sequence corresponding to a word and a second number sequence corresponding to a letter;
the second conversion module converts the text data into a digital sequence based on a corresponding relation between the preset text data and the digital sequence to obtain a digital sequence corresponding to the API request, and specifically comprises the following steps:
Determining a first digital sequence corresponding to the words in the text data based on the corresponding relation between the words in the text data and the first digital sequence, and replacing the words in the text data with the corresponding first digital sequence to obtain a word coding sequence corresponding to an API request;
determining a second digital sequence corresponding to the letters in the text data based on the corresponding relation between the letters in the text data and the second digital sequence, and converting the letters in the text data into the corresponding second digital sequence to obtain a letter coding sequence corresponding to an API request;
and obtaining a number sequence corresponding to the API request based on the word coding sequence and the letter coding sequence.
In one or more embodiments, the apparatus further comprises a model training module, in particular for:
converting acquired API request samples, which comprise legal API requests and illegal API requests, into digital sequences corresponding to the API request samples, and using the digital sequences as a training sample set;
constructing a basic attack recognition model according to the received model construction instruction, wherein a loss function of the basic attack recognition model is a weighted loss function set based on the relative proportion of legal API requests and illegal API requests;
And carrying out iterative training on the constructed basic attack recognition model according to the training sample set until the model accuracy reaches a set value to obtain the attack recognition model.
In one or more embodiments, the model training module converts the acquired API request sample into a number sequence corresponding to the API request sample, and specifically includes:
acquiring an API request sample, and removing redundant data irrelevant to attack detection in API request sample data to obtain sample original characteristic data relevant to attack detection in the API request sample;
based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data of the sample with the corresponding identification text to obtain sample text data corresponding to the API request sample;
and converting the sample text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a sample digital sequence corresponding to the API request.
In one or more embodiments, the attack recognition module determines whether the API request is legal based on the recognition result output by the attack recognition model, and specifically includes:
inputting the digital sequence into an attack recognition model, and outputting legal probability of an API request;
Taking the API request with the legal probability exceeding a first preset value as a first legal API request;
sorting the API requests whose legal probability lies between the first preset value and a second preset value in descending order of legal probability, and selecting a set proportion of the top-ranked API requests as second legal API requests, wherein the first preset value is higher than the second preset value;
identifying the first legitimate API request and the second legitimate API request as legitimate API requests, and identifying the remaining API requests as illegitimate API requests.
In a third aspect, embodiments of the present application provide an apparatus comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of network attack identification as provided in any of the first aspects of the present application.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium storing instructions which, when executed by a processor of a terminal device, enable the terminal device to perform the method of network attack identification according to any of the first aspects of the present application.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:
the application provides a network attack identification method, device and equipment. By semantically processing an API request and converting it into text data in a natural-language-like format, the application prevents an attacker from bypassing detection by manipulating special symbols in the request, and captures attack semantics more accurately. Word-level and letter-level digital conversion are performed separately on the text data corresponding to the API request, and the resulting digital sequence is input into the attack recognition model, so that the model captures the features of the text at a finer granularity. In addition, a fault-tolerant interval is set that allows the model to accept legal requests within a certain error range, which avoids misclassifying valid requests as illegal malicious requests.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, and it is obvious that the drawings that are described below are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for identifying a network attack according to an embodiment of the present application;
fig. 2 is a schematic diagram of an attack recognition model architecture according to an embodiment of the present application;
fig. 3 is a schematic diagram of a device for identifying a network attack according to an embodiment of the present application;
fig. 4 is a schematic diagram of a device for identifying a network attack according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Wherein the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In view of the above problems, the application provides a method, a device and equipment for identifying network attacks. By semantically processing an API request and converting it into text data in a natural-language-like format, the application prevents an attacker from bypassing detection by manipulating special symbols in the request, and captures attack semantics more accurately. Word-level and letter-level digital conversion are performed separately on the text data corresponding to the API request, and the resulting digital sequence is input into the attack recognition model, so that the model captures the features of the text at a finer granularity. In addition, a fault-tolerant interval is set that allows the model to accept legal requests within a certain error range, which avoids misclassifying valid requests as illegal malicious requests.
Embodiments of the present application are described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a method for identifying a network attack according to an embodiment of the present application is provided, where the method includes the following steps S101 to S104:
step S101, decrypting received application program interface API request data to obtain original data, and removing redundant data irrelevant to attack detection in the original data to obtain original characteristic data relevant to attack detection in the API request;
as a possible implementation manner, the decrypting the received API request data to obtain the original data includes:
based on the set format unification rule, unifying the format of the received API request data;
and decoding the API request data after the unified format at least once until the original data of the API request is obtained.
The received API request data may be legal API requests generated by the normal operation of an application and captured by an application monitor, typically requests sent to a target API by a user or by other applications or services, or illegal API requests containing malicious code that an attacker generates using malicious scripts. Specifically, the set format unification rule requires that all characters of each API request be converted to lower case.
API requests are typically passed through uniform resource locator (URL) parameters or encoded sequences. Some characters have special roles in API request data; for example, "&" separates query parameters. URL data can contain only characters from the ASCII set, so characters outside this set are encoded using URL encoding.
By default, the browser and the web server decode the API request data only once, but an attacker may use double encoding to bypass detection. The present application therefore decodes the API request data using a decoding technique that includes at least one decoding operation, until the original data of the API request is obtained. For example, with double decoding, the first decoding removes the outer layer of encoding and leaves the data in its first-layer encoded form, and the second decoding restores the original data of the API request. This prevents an attacker from deceiving the back-end processing logic with multiple layers of encoding.
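The format unification and repeated decoding described above can be sketched as follows, assuming the encoding in question is URL percent-encoding. The function name `normalize_request`, the cap of 10 rounds and the choice to lower-case again after each decoding round are assumptions for illustration and are not specified in the source.

```python
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 10  # hypothetical safety cap, not from the source


def normalize_request(raw, max_rounds=MAX_DECODE_ROUNDS):
    """Lower-case the request, then URL-decode it until it stops changing."""
    current = raw.lower()  # format unification: lower case only
    for _ in range(max_rounds):
        # Lower-casing again is an assumption: decoding may reveal
        # upper-case characters such as "%41" -> "A".
        decoded = unquote(current).lower()
        if decoded == current:
            break  # no encoding layer left
        current = decoded
    return current


# A doubly encoded injection attempt is restored in two decoding rounds.
print(normalize_request("/Users?category=navigation%2527%2Bor%2B1%253D1"))
```

Decoding until a fixed point is reached covers single, double and deeper encoding with the same loop, at the cost of one extra decoding pass to confirm that nothing changes.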
As a possible implementation manner, the removing redundant data which is irrelevant to attack detection in the original data includes:
determining parameters irrelevant to attack detection based on a set data screening rule, and removing parameters irrelevant to attack detection and parameter values corresponding to the parameters in the original data; and removing digital parameter values of parameters related to attack detection in the original data.
It is contemplated that many parameters are included in one API request, but not all are relevant to attack detection. If all parameters are retained without removing redundant data, the post analysis may become complex and error-prone, reducing the efficiency and accuracy of the analysis.
Therefore, after the original data of the API request is obtained in the present application, redundant data irrelevant to attack detection in the original data needs to be removed, so as to obtain the original feature data relevant to attack detection in the API request.
The redundant data not related to attack detection includes parameters and parameter values that are not related to attack detection, and the digital parameter values of parameters that are related to attack detection. The parameters to be removed are determined based on the set data screening rule. For some general URL parameters, such as pagenumber and pagesize, the digital parameter values are removed and the parameter names pagenumber and pagesize are retained, because subsequent attack recognition focuses on the presence of these parameters rather than on their specific values.
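A minimal sketch of this screening step is shown below. The source does not list which parameters the screening rule treats as irrelevant, so the set `IRRELEVANT_PARAMS` and the function name `strip_redundant` are hypothetical; the sketch also assumes the request has already been decoded and uses "&" to separate query parameters.

```python
import re

# Hypothetical screening rule: parameters assumed irrelevant to attack detection.
IRRELEVANT_PARAMS = frozenset({"sessionid", "utm_source"})


def strip_redundant(request, irrelevant=IRRELEVANT_PARAMS):
    """Drop irrelevant parameters and remove digits from the remaining values."""
    path, sep, query = request.partition("?")
    kept = []
    for pair in query.split("&"):
        if not pair:
            continue
        name, eq, value = pair.partition("=")
        if name in irrelevant:
            continue  # remove the parameter together with its value
        # Keep the parameter name, remove digital content from its value.
        kept.append(name + eq + re.sub(r"\d+", "", value))
    return path + sep + "&".join(kept) if kept else path


print(strip_redundant("/items?pagenumber=3&pagesize=20&sessionid=abc123"))
# prints /items?pagenumber=&pagesize=
```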
Step S102, replacing the symbol in the original characteristic data with a corresponding identification text based on the corresponding relation between the preset symbol and the identification text to obtain text data corresponding to an API request;
As described in the foregoing embodiment, after removing redundant data that is not related to attack detection in the original data corresponding to each API request, the original feature data related to attack detection in each API request is obtained, where the original feature data includes words and symbols related to attack detection and no numbers.
In some embodiments, after obtaining the original feature data related to attack detection in the API request, before converting the sign of the original feature data into a text expression form, the method further includes:
if the API request belongs to the file request type, determining the file type suffix and the file size of the API request, and adding the determined file type suffix and the determined file size into the original characteristic data.
For an API request that belongs to the file request type (for example, a request for image, audio or video resource data), the application extracts the file suffix and the requested resource size. For example, if the requested file is a PNG file, so that the file suffix is png, and the requested resource size is 1024 KB, then "png+size:1024" is added to the original feature data. Here, the digital parameter value "1024" is relevant to the analysis of attack behavior and is therefore retained.
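The file-request handling can be illustrated with the short sketch below. The source does not say how a file request is recognised or where the size comes from, so this sketch assumes that a request is a file request when its path has a suffix, that the caller supplies the size in KB, and that the feature is appended after a space. The function name `add_file_features` is hypothetical.

```python
import posixpath


def add_file_features(features, path, size_kb):
    """Append "<suffix>+size:<size>" to the feature data for file requests."""
    suffix = posixpath.splitext(path)[1].lstrip(".").lower()
    if not suffix:
        return features  # not treated as a file request in this sketch
    return f"{features} {suffix}+size:{size_kb}"


print(add_file_features("/images/logo", "/images/logo.PNG", 1024))
# prints /images/logo png+size:1024
```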
In the embodiment of the application, after the original feature data related to attack detection in each API request is obtained, on the basis of the corresponding relation between the preset symbol and the identification text, the symbol in the original feature data corresponding to each API request is subjected to semantic processing, the symbol of the original feature data is converted into a text expression form, the text data corresponding to each API request is obtained, and the text data only comprises letters (or letters and numbers are included in the text data when the API request belongs to a file request type) and does not comprise symbols.
Specifically, the identification text may be a custom word or letter string, and the API request may be converted into text data similar to a natural language format based on a preset correspondence between symbols and the identification text.
Illustratively, the timestamp in the original feature data corresponding to each API request is converted into an identification text "timestamp", all click identifiers are converted into an identification text "clicktag", and the resource id is converted into an identification text "resource".
In this embodiment of the present application, the preset correspondence between symbols and identification text may be a data table stored as a unified dictionary mapping from symbols to identification text. Through this table, the symbols in the original feature data corresponding to each API request are semantically processed and converted into text form. In addition to detecting common injection attacks, the application can therefore also record correspondences between symbols specific to other attacks and their identification text, identify attack behavior other than common injection attacks, and provide comprehensive security detection for API requests.
The above semanticalization process is described below with reference to specific examples:
For example, an SQL injection attack modifies a query that is to be executed by the database engine in order to retrieve the requested information. Suppose user information can be viewed through the API request "url:/users?". If the application has no mechanism to prevent this type of attack, an attacker can modify the API request to obtain more information, for example by changing it into the illegal API request "/users?category=navigation'+or+1=1". After the redundant data is removed, the original feature data is "/users?category=navigation'+or+=".
The special symbols "/", "?", "=", "'" and "+", which would be eliminated in conventional natural language processing applications, are retained in the attack detection scenario because they aid attack detection. The special symbols are therefore converted into specific identification text that represents their deeper semantics, so that the illegal API request above becomes:
“slash users question category equality navigation tick plus OR plus equality”。
As another example, an XSS injection attack refers to an attacker manipulating a vulnerable website so that it returns malicious script content to ordinary users, in order to steal their personal information or perform other malicious operations. Each html tag in the malicious script returned in an XSS injection attack is replaced with identification text; for example, <p> becomes "chevrons p chevrons".
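The symbol replacement in the two examples above can be sketched with a dictionary mapping. The table below contains only the symbols and identification text that appear in those examples; a real table would be larger, and the function name `symbols_to_text` is hypothetical. Symbols that are not in the table are left unchanged in this sketch.

```python
# Symbol-to-identification-text table, limited to the symbols used
# in the SQL injection and XSS examples.
SYMBOL_TO_TEXT = {
    "/": "slash",
    "?": "question",
    "=": "equality",
    "'": "tick",
    "+": "plus",
    "<": "chevrons",
    ">": "chevrons",
}


def symbols_to_text(features, table=SYMBOL_TO_TEXT):
    """Replace each known symbol with its identification text."""
    parts = []
    for ch in features:
        # Pad with spaces so the identification text becomes its own word.
        parts.append(f" {table[ch]} " if ch in table else ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(parts).split())


print(symbols_to_text("/users?category=navigation'+or+="))
# prints slash users question category equality navigation tick plus or plus equality
print(symbols_to_text("<p>"))
# prints chevrons p chevrons
```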
Step S103, converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a digital sequence corresponding to the API request;
In some embodiments, before the text data is converted into a digital sequence, the method further comprises:
when it is determined that duplicate text data exists among the text data corresponding to the API requests, performing a deduplication operation, and converting the deduplicated text data into digital form.
It should be noted that, in the present application, the deduplication operation is performed after the text data corresponding to the API requests is obtained, because some API requests may have different original data and yet yield identical text data after the processing in step S101 and step S102.
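A minimal sketch of the deduplication step follows. It assumes the text data of a batch of API requests is held in a list of strings and that the first occurrence of each text should be kept in its original order; neither detail is specified in the description, and the function name `deduplicate` is hypothetical.

```python
def deduplicate(text_data):
    """Drop repeated text data while keeping the first occurrence of each item."""
    # dict keys are unique and preserve insertion order, so this keeps
    # the original ordering of the first occurrences.
    return list(dict.fromkeys(text_data))


if __name__ == "__main__":
    batch = [
        "slash users question id equality",  # e.g. from "/users?id=1"
        "slash users question id equality",  # e.g. from "/users?id=2"
        "slash orders question id equality",
    ]
    print(deduplicate(batch))
```

The two "/users" requests differ only in a numeric parameter value, which is removed before semantic processing, so they collapse to the same text.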
As a possible implementation, the number sequence includes a first number sequence corresponding to a word and a second number sequence corresponding to a letter;
the converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain the digital sequence corresponding to the API request comprises the following steps:
determining a first digital sequence corresponding to the words in the text data based on the corresponding relation between the words in the text data and the first digital sequence, and replacing the words in the text data with the corresponding first digital sequence to obtain a word coding sequence corresponding to an API request;
Determining a second digital sequence corresponding to the letters in the text data based on the corresponding relation between the letters in the text data and the second digital sequence, and converting the letters in the text data into the corresponding second digital sequence to obtain a letter coding sequence corresponding to an API request;
and obtaining a number sequence corresponding to the API request based on the word coding sequence and the letter coding sequence.
In the embodiment of the application, in order to convert text data corresponding to an API request into a digital form capable of being input into an attack recognition model, two kinds of numerical conversion are performed on the text data corresponding to the API request.
Firstly, performing word-level numerical conversion on text data, determining a first number sequence corresponding to each word in a text based on the corresponding relation between the word and the first number sequence, and replacing the word in the text data with the corresponding first number sequence to obtain a word coding sequence corresponding to an API request.
And secondly, carrying out letter-level numerical conversion on the text data, determining a second digital sequence corresponding to each letter in the text based on the corresponding relation between the letters and the second digital sequence, and converting the letters in the text data into the corresponding second digital sequence to obtain a letter coding sequence corresponding to the API request.
Optionally, the correspondence between the words and the first number sequence and the correspondence between the letters and the second number sequence may be a data table stored in a dictionary mapping form, and the data table may be a total data table including the correspondence between all the words and the first number sequence and the correspondence between the letters and the second number sequence, or may be a common data table including the correspondence between the common (occurrence frequency is higher than a set value) words and the first number sequence and the correspondence between the letters and the second number sequence.
Specifically, in the process of converting the text data into the digital sequence based on the corresponding relation between the preset text data and the digital sequence, the common data table with smaller data quantity can be checked first, and when the common data table cannot check the corresponding relation, the total data table with larger data quantity is checked again, so that the searching efficiency is improved.
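The two-level encoding and the two-table lookup described above might look as follows. All table contents, the reserved code `UNKNOWN = 0`, and the function names are assumptions made for illustration; the description does not fix concrete code values, nor does it say how unknown words or the spaces between words are treated.

```python
# Hypothetical "common" table (frequent words) and "total" table (all words).
COMMON_WORDS = {"slash": 1, "question": 2, "equality": 3, "plus": 4}
ALL_WORDS = {**COMMON_WORDS, "users": 5, "category": 6, "navigation": 7,
             "tick": 8, "or": 9}

# Hypothetical letter table: a -> 1, b -> 2, ..., z -> 26.
LETTERS = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}

UNKNOWN = 0  # assumed code for anything missing from the tables


def lookup_word(word):
    """Check the small common table first, then fall back to the total table."""
    code = COMMON_WORDS.get(word)
    if code is None:
        code = ALL_WORDS.get(word, UNKNOWN)
    return code


def encode(text):
    """Return (word coding sequence, letter coding sequence) for the text data."""
    words = text.lower().split()
    word_codes = [lookup_word(w) for w in words]
    # Letter-level encoding; spaces between words are dropped in this sketch.
    letter_codes = [LETTERS.get(ch, UNKNOWN) for w in words for ch in w]
    return word_codes, letter_codes


if __name__ == "__main__":
    print(encode("slash users question"))
```

Whether the common table actually speeds up lookups depends on how the tables are stored; with in-memory dictionaries the benefit is small, whereas it can matter when the total table lives in slower storage.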
Through the two encodings, the API request is converted into a digital form that can be input into the model. Specifically, the parameters input to the attack recognition model for each API request comprise a number sequence made up of the word coding sequence and the letter coding sequence.
Optionally, the number sequence corresponding to the API request is truncated so that it is converted into a number sequence of suitable length before being input into the attack recognition model, which facilitates recognition by the model.
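A small sketch of the length adjustment is given below. The description mentions only truncation to a suitable length; padding shorter sequences with a filler value, the filler value 0, and the name `fit_length` are assumptions added here so that every sequence ends up with the same length.

```python
def fit_length(sequence, max_len, pad_value=0):
    """Truncate the sequence to max_len and right-pad shorter ones (assumed)."""
    clipped = list(sequence[:max_len])
    return clipped + [pad_value] * (max_len - len(clipped))


if __name__ == "__main__":
    print(fit_length([1, 5, 2, 6, 3, 7], 4))
    print(fit_length([1, 5], 4))
```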
As a possible implementation manner, the attack recognition model is obtained by adopting the following steps S201 to S203:
Step S201, converting the acquired API request samples into digital sequences corresponding to the API request samples to serve as a training sample set, wherein the API request samples comprise legal API requests and illegal API requests;
Specifically, the legal API requests may be API requests generated during normal operation of an application program and captured from sources such as history logs or application program monitors; these are typically requests sent to a target API by users, other application programs, or services.
The illegal API requests may be API requests containing malicious code, generated with security testing tools, vulnerability scanning tools, or scripts that simulate the malicious inputs an attacker might use.
Optionally, the converting the obtained API request sample into a number sequence corresponding to the API request sample includes:
acquiring an API request sample, and removing redundant data irrelevant to attack detection in API request sample data to obtain sample original characteristic data relevant to attack detection in the API request sample;
based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data of the sample with the corresponding identification text to obtain sample text data corresponding to the API request sample;
And converting the sample text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a sample digital sequence corresponding to the API request.
It should be noted that, for the specific implementation process of converting the obtained API request sample into the number sequence corresponding to the API request sample, reference may be made to the foregoing embodiment, which is not repeated herein.
Step S202, constructing a basic attack recognition model according to a received model construction instruction, wherein a loss function of the basic attack recognition model is a weighted loss function set based on the relative proportion of legal API requests and illegal API requests;
Specifically, referring to fig. 2, which is a schematic diagram of the attack recognition model architecture provided in an embodiment of the present application, the attack recognition model constructed in the embodiment of the present application comprises, in sequence, an embedding layer 10, a spatial loss layer 20, a Long Short-Term Memory (LSTM) layer 30, and a fully connected layer 40.
Embedding layer 10: converts the number sequence made up of the word coding sequence and the letter coding sequence into embedding vectors. In the embedding matrix formed by the embedding vectors, each first or second digital sequence in the number sequence corresponds to a specific embedding vector. By capturing the semantic similarity between words or characters, the layer makes the representations (i.e., the embedding vectors) of semantically similar words or characters (i.e., first or second digital sequences with similar meanings) as close as possible in the embedding space, so that the API request is better understood. Thus, given a first or second digital sequence, its semantic similarity to other first or second digital sequences can be determined by looking up its corresponding embedding vector.
Spatial loss layer 20: the layer following the embedding layer 10. It regularizes the model and reduces the dependency between elements of the embedding vectors, thereby improving the generalization performance of the model.
LSTM layer 30: receives the output of the spatial loss layer 20, performs the computation needed to learn features from the number sequence corresponding to the API request, and passes the learned feature vector to the next layer.
Fully connected layer 40: produces the recognition result. It takes the feature vector output by the LSTM layer 30 and applies a Sigmoid activation function to return the legal and illegal probabilities of the API request.
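To make the role of the second layer more concrete, the sketch below assumes that the "spatial loss layer" corresponds to what deep-learning libraries usually call spatial dropout: during training, whole embedding channels are zeroed for every position of the sequence, rather than individual elements. This reading, the rescaling of kept channels by 1/(1 - rate), and the function name are assumptions, not statements from the description.

```python
import random


def spatial_dropout(embedded, rate, rng):
    """Zero entire embedding channels across all sequence positions.

    embedded: list of positions, each a list of channel values.
    rate:     probability of dropping a channel, must be below 1.
    rng:      a random.Random instance, passed in for reproducibility.
    """
    if not embedded:
        return []
    n_channels = len(embedded[0])
    keep_scale = 1.0 / (1.0 - rate)
    # One keep/drop decision per channel, shared by every sequence position.
    mask = [0.0 if rng.random() < rate else keep_scale for _ in range(n_channels)]
    return [[value * m for value, m in zip(step, mask)] for step in embedded]


if __name__ == "__main__":
    embedded = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(spatial_dropout(embedded, 0.5, random.Random(0)))
```

Because a channel is dropped for the whole sequence, later layers cannot rely on any single embedding dimension, which is one way to reduce the dependency between elements of the embedding vectors.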
In addition, in practical applications the number of legal API request samples is usually greater than the number of illegal API request samples. With such class imbalance, an ordinary cross-entropy loss may bias the model excessively toward the class with more samples. To address the data imbalance, the embodiment of the present application uses a weighted loss function based on the relative proportion of legal and illegal API request samples, in which the two classes of samples are assigned different weights according to that proportion. The sample set obtained in step S201 is then input into the basic attack recognition model, which is trained iteratively to obtain the trained attack recognition model.
Step S203, performing iterative training on the constructed basic attack recognition model according to the training sample set until the model accuracy reaches a set value, to obtain the attack recognition model.
The goal of the attack recognition model in the embodiments of the present application is to predict the legal probability of an API request. By constructing a neural network comprising the embedding layer 10, the spatial loss layer 20, the LSTM layer 30 and the fully connected layer 40, and by using a weighted loss function to handle the class imbalance problem, the trained attack recognition model can effectively detect whether an API request is malicious.
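One common way to realize such a weighted loss is a binary cross-entropy whose per-class weights are inversely proportional to the class frequencies. The sketch below follows that approach; the exact weighting formula is not given in the description, so the inverse-frequency rule, the clipping constant `eps`, and the function names are assumptions.

```python
import math


def class_weights(n_legal, n_illegal):
    """Inverse-frequency weights: the rarer class receives the larger weight."""
    total = n_legal + n_illegal
    return {
        "legal": total / (2 * n_legal),
        "illegal": total / (2 * n_illegal),
    }


def weighted_bce(p_legal, is_legal, weights, eps=1e-7):
    """Weighted binary cross-entropy for one sample.

    p_legal:  predicted probability that the request is legal.
    is_legal: ground-truth label of the sample.
    """
    p = min(max(p_legal, eps), 1.0 - eps)  # avoid log(0)
    if is_legal:
        return -weights["legal"] * math.log(p)
    return -weights["illegal"] * math.log(1.0 - p)


if __name__ == "__main__":
    w = class_weights(n_legal=900, n_illegal=100)
    print(w)
    print(weighted_bce(0.5, True, w), weighted_bce(0.5, False, w))
```

With 900 legal and 100 illegal samples, an equally uncertain prediction costs nine times more on an illegal sample than on a legal one, which counteracts the tendency to favor the majority class.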
Step S104, inputting the digital sequence into an attack recognition model, and determining whether the API request is legal or not based on a recognition result output by the attack recognition model.
As a possible implementation manner, the determining whether the API request is legal based on the recognition result output by the attack recognition model includes:
inputting the digital sequence into an attack recognition model, and outputting legal probability of an API request;
taking the API request with the legal probability exceeding a first preset value as a first legal API request;
the API requests with legal probability between a first preset value and a second preset value are arranged according to the legal probability from big to small, and the API requests with the top ranking of the set proportion are selected to be used as the second legal API requests, wherein the first preset value is higher than the second preset value;
Identifying the first legitimate API request and the second legitimate API request as legitimate API requests, and identifying the remaining API requests as illegitimate API requests.
In this embodiment of the present application, if the legal probability of an API request output by the attack recognition model is 40%, the probability that the request is an illegal, malicious request is 60%; if 40% is smaller than the first preset value, the API request would be recognized as an illegal API request. In practice, however, it may be a valid request that the attack recognition model has misclassified. To reduce the false alarm probability and widen the fault-tolerance interval of the attack recognition model, a set proportion p is introduced, and the set proportion of the API requests that would otherwise be treated as illegal is classified as legal API requests.
For example, with the set proportion p = 10%, suppose the user sends 20 API requests to be identified, of which 10 have legal probabilities exceeding the first preset value and are classified as legal API requests. Among the other 10 API requests, those whose legal probabilities lie between the first preset value and the second preset value are sorted by legal probability in descending order, the top 10% of them are classified as legal API requests, and the remaining API requests are identified as illegal API requests.
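The decision rule with the fault-tolerance interval can be sketched as follows. The handling of probabilities exactly equal to a preset value and the rounding of the accepted count (rounded down here) are not specified in the description and are assumptions, as are the function and parameter names.

```python
import math


def classify(legal_probs, first_preset, second_preset, proportion):
    """Return one boolean per request: True means legal, False means illegal."""
    # Requests above the first preset value are legal outright.
    verdicts = [p > first_preset for p in legal_probs]

    # Requests inside the tolerance band, sorted by legal probability, descending.
    band = [i for i, p in enumerate(legal_probs)
            if second_preset <= p <= first_preset]
    band.sort(key=lambda i: legal_probs[i], reverse=True)

    # Accept the top-ranked share of the band (rounded down, an assumption).
    n_accept = math.floor(len(band) * proportion)
    for i in band[:n_accept]:
        verdicts[i] = True
    return verdicts


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.55, 0.45, 0.42, 0.1]
    print(classify(probs, first_preset=0.7, second_preset=0.4, proportion=0.5))
```

A larger proportion lowers the false alarm rate at the cost of letting more borderline requests through, so it would normally be tuned against labeled traffic.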
In some embodiments, the network attack identification method provided by the embodiments of the present application may be applied to network attack identification in various scenarios, such as Web application security and cloud service security. With the method provided by the present application, potential API injection attacks can be detected in the API requests received by a Web application or in data from cloud API requests. The API request data is processed through steps such as semantic processing and digital sequence conversion, and is then evaluated by an attack recognition model trained with a weighted loss function, together with a set fault-tolerance interval. Potential attacks are thereby detected while the false alarm probability is reduced, effectively protecting the Web application or cloud service from malicious attacks.
In some embodiments, the present application may also be applied to log analysis tools, which use the method provided herein to analyze log files generated by applications in order to detect API injection attacks. The log data is processed through steps such as semantic processing and digital sequence conversion, and the attack recognition model trained with the weighted loss function helps identify potential attacks, so that the log analysis tool can better discover and report potential security threats.
According to the network attack identification method provided by the embodiment of the application, the API request is semantically processed and converted into text data in a natural language format, which prevents an attacker from manipulating special symbols in the request to bypass detection and captures the attack semantics more accurately. The text data corresponding to the API request then undergoes word-level and letter-level numerical conversion, and the resulting number sequence is input into the attack recognition model, so that the recognition result output by the model reflects the features of the text at a finer granularity. In addition, a fault-tolerance interval is set so that the model accepts legal, valid requests within a certain error range, which addresses the problem of valid requests being misclassified as illegal, malicious requests. The accuracy of identifying the legality of different types of API requests is thereby effectively improved, and the false alarm probability is reduced.
Based on the same inventive concept, the embodiment of the present application further provides a device for identifying network attack, as shown in fig. 3, where the device includes:
the data filtering module 301 is configured to decrypt received API request data to obtain original data, and remove redundant data irrelevant to attack detection in the original data to obtain original feature data relevant to attack detection in the API request;
The first conversion module 302 is configured to replace a symbol in the original feature data with a corresponding identification text based on a corresponding relationship between a preset symbol and the identification text, so as to obtain text data corresponding to an API request;
the second conversion module 303 is configured to convert the text data into a digital sequence based on a corresponding relationship between preset text data and the digital sequence, so as to obtain a digital sequence corresponding to the API request;
the attack recognition module 304 is configured to input the number sequence into an attack recognition model, and determine whether the API request is legal based on a recognition result output by the attack recognition model.
In one or more embodiments, the data filtering module 301 removes redundant data that is not related to attack detection from the original data, and specifically includes:
determining parameters irrelevant to attack detection based on a set data screening rule, and removing parameters irrelevant to attack detection and parameter values corresponding to the parameters in the original data; and
and removing the digital parameter values of the parameters related to attack detection in the original data.
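The two removal rules can be illustrated with a short sketch. The set `IRRELEVANT_PARAMS` stands in for the "set data screening rule" and its contents are hypothetical; the manual splitting on "?", "&" and "=" is a simplification chosen so that symbols such as "+" and "'" remain untouched for the later semantic processing.

```python
import re

# Hypothetical screening rule: parameter names that never matter for detection.
IRRELEVANT_PARAMS = {"timestamp", "session_id"}


def filter_request(raw):
    """Drop irrelevant parameters and strip digits from the remaining values."""
    path, sep, query = raw.partition("?")
    kept = []
    for pair in (query.split("&") if query else []):
        name, eq, value = pair.partition("=")
        if name in IRRELEVANT_PARAMS:
            continue  # remove the parameter together with its value
        # Remove numeric content from the value of a relevant parameter.
        kept.append(name + eq + re.sub(r"\d+", "", value))
    return path + sep + "&".join(kept)


if __name__ == "__main__":
    print(filter_request("/users?category=navigation'+or+1=1&timestamp=1700000000"))
```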
In one or more embodiments, the apparatus further includes a suffix extraction module 305. After the original feature data related to attack detection in the API request is obtained and before the symbols in the original feature data are converted into a text representation, the suffix extraction module 305 is specifically configured to:
If the API request belongs to the file request type, determining the file type suffix and the file size of the API request, and adding the determined file type suffix and the determined file size into the original characteristic data.
In one or more embodiments, the number sequences include a first number sequence corresponding to a word and a second number sequence corresponding to a letter;
the second conversion module 303 converts the text data into a digital sequence based on a corresponding relationship between the preset text data and the digital sequence, so as to obtain a digital sequence corresponding to the API request, which specifically includes:
determining a first digital sequence corresponding to the words in the text data based on the corresponding relation between the words in the text data and the first digital sequence, and replacing the words in the text data with the corresponding first digital sequence to obtain a word coding sequence corresponding to an API request;
determining a second digital sequence corresponding to the letters in the text data based on the corresponding relation between the letters in the text data and the second digital sequence, and converting the letters in the text data into the corresponding second digital sequence to obtain a letter coding sequence corresponding to an API request;
And obtaining a number sequence corresponding to the API request based on the word coding sequence and the letter coding sequence.
In one or more embodiments, the apparatus further includes a model training module 306, specifically for:
converting the acquired API request samples into digital sequences corresponding to the API request samples to serve as a training sample set, wherein the API request samples comprise legal API requests and illegal API requests;
constructing a basic attack recognition model according to the received model construction instruction, wherein a loss function of the basic attack recognition model is a weighted loss function set based on the relative proportion of legal API requests and illegal API requests;
and carrying out iterative training on the constructed basic attack recognition model according to the training sample set until the model accuracy reaches a set value to obtain the attack recognition model.
In one or more embodiments, the model training module 306 converts the acquired API request sample into a number sequence corresponding to the API request sample, and specifically includes:
acquiring an API request sample, and removing redundant data irrelevant to attack detection in API request sample data to obtain sample original characteristic data relevant to attack detection in the API request sample;
Based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data of the sample with the corresponding identification text to obtain sample text data corresponding to the API request sample;
and converting the sample text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a sample digital sequence corresponding to the API request.
In one or more embodiments, the attack recognition module 304 determines whether the API request is legal based on the recognition result output by the attack recognition model, which specifically includes:
inputting the digital sequence into an attack recognition model, and outputting legal probability of an API request;
taking the API request with the legal probability exceeding a first preset value as a first legal API request;
the API requests with legal probability between a first preset value and a second preset value are arranged according to the legal probability from big to small, and the API requests with the top ranking of the set proportion are selected to be used as the second legal API requests, wherein the first preset value is higher than the second preset value;
identifying the first legitimate API request and the second legitimate API request as legitimate API requests, and identifying the remaining API requests as illegitimate API requests.
Based on the same inventive concept, the present application also provides a network attack recognition device 400, as shown in fig. 4, comprising at least one processor 402; and a memory 401 communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of network attack identification described above.
The memory 401 is used for storing programs. In particular, a program may include program code, and the program code includes computer operating instructions. The memory 401 may be a volatile memory, such as a random-access memory (RAM); it may also be a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or it may be any one of, or any combination of, the above volatile and non-volatile memories.
The processor 402 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. It may also be a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The embodiment of the present application also provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the network attack identification method provided by the above embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is merely a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be electrical, mechanical, or of another form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), or the like.
The technical solutions provided herein have been described in detail above. Specific examples are used to illustrate the principles and embodiments of the present application, and the description of the above examples is intended only to aid understanding of the methods and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the contents of this description should not be construed as limiting the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method for network attack identification, comprising:
decrypting received application program interface (API) request data to obtain original data, and removing redundant data irrelevant to attack detection in the original data to obtain original characteristic data relevant to attack detection in the API request;
based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data with the corresponding identification text to obtain text data corresponding to the API request;
converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a digital sequence corresponding to the API request;
and inputting the digital sequence into an attack recognition model, and determining whether the API request is legal or not based on a recognition result output by the attack recognition model.
2. The method according to claim 1, wherein decrypting the received API request data to obtain the original data comprises:
based on the set format unification rule, unifying the format of the received API request data;
and decoding the API request data after the unified format at least once until the original data of the API request is obtained.
3. The method of claim 1, wherein the removing redundant data in the original data that is not relevant to attack detection comprises:
determining parameters irrelevant to attack detection based on a set data screening rule, and removing parameters irrelevant to attack detection and parameter values corresponding to the parameters in the original data; and
and removing the digital parameter values of the parameters related to attack detection in the original data.
4. The method of claim 1, wherein after obtaining the original characteristic data related to attack detection in the API request and before replacing the symbols in the original characteristic data with the corresponding identification text, the method further comprises:
if the API request belongs to the file request type, determining the file type suffix and the file size of the API request, and adding the determined file type suffix and the determined file size into the original characteristic data.
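The step in claim 4 might look like the sketch below. It assumes that a request counts as a file request when its URL path ends in a file suffix and that the file size is available to the caller (for example from a content length header); the marker words `filetype` and `filesize` are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def add_file_features(features: str, url: str, file_size: int) -> str:
    """Append the file type suffix and file size for file-type requests.

    A request is treated as a file request when its URL path has a suffix
    (an assumption; the claims do not define the test).
    """
    path = urlsplit(url).path
    suffix = posixpath.splitext(path)[1].lstrip(".").lower()
    if not suffix:
        return features  # not a file request: characteristic data is unchanged
    return f"{features} filetype {suffix} filesize {file_size}"


print(add_file_features("upload", "https://example.com/files/shell.PHP?x=1", 2048))
```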
5. The method of claim 1, wherein the digital sequence comprises a first digital sequence corresponding to words and a second digital sequence corresponding to letters;
the converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain the digital sequence corresponding to the API request comprises the following steps:
determining a first digital sequence corresponding to the words in the text data based on the corresponding relation between the words in the text data and the first digital sequence, and replacing the words in the text data with the corresponding first digital sequence to obtain a word coding sequence corresponding to the API request;
determining a second digital sequence corresponding to the letters in the text data based on the corresponding relation between the letters in the text data and the second digital sequence, and converting the letters in the text data into the corresponding second digital sequence to obtain a letter coding sequence corresponding to the API request;
and obtaining the digital sequence corresponding to the API request based on the word coding sequence and the letter coding sequence.
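The word-level and letter-level encoding of claim 5 can be sketched as follows. The vocabularies `WORD_VOCAB` and `LETTER_VOCAB`, the use of 0 for unknown tokens, and the concatenation of the two coding sequences are all assumptions; the claims state only that the final digital sequence is obtained "based on" both sequences.

```python
# Hypothetical vocabularies; index 0 is reserved for unknown tokens (an assumption).
WORD_VOCAB = {"select": 1, "from": 2, "lt": 3, "gt": 4, "script": 5}
LETTER_VOCAB = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}


def encode_words(text: str) -> list[int]:
    """Word coding sequence: one integer per whitespace-separated word."""
    return [WORD_VOCAB.get(word, 0) for word in text.lower().split()]


def encode_letters(text: str) -> list[int]:
    """Letter coding sequence: one integer per non-whitespace character."""
    return [LETTER_VOCAB.get(ch, 0) for ch in text.lower() if not ch.isspace()]


def encode_request(text: str) -> list[int]:
    """Combine both sequences; plain concatenation is assumed here."""
    return encode_words(text) + encode_letters(text)


print(encode_words("lt script gt"))
print(encode_letters("ab z"))
```

Keeping a letter-level view alongside the word-level view lets a model see tokens that are missing from the word vocabulary, such as obfuscated keywords.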
6. The method of claim 1, wherein the attack recognition model is derived by:
converting acquired API request samples into digital sequences corresponding to the API request samples to serve as a training sample set, wherein the API request samples comprise legal API requests and illegal API requests;
constructing a basic attack recognition model according to the received model construction instruction, wherein a loss function of the basic attack recognition model is a weighted loss function set based on the relative proportion of legal API requests and illegal API requests;
and iteratively training the constructed basic attack recognition model on the training sample set until the model accuracy reaches a set value, to obtain the attack recognition model.
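One common way to realise a loss weighted by class proportion, as in claim 6, is a weighted binary cross-entropy with inverse-frequency weights. The sketch below uses that technique as an assumption; the claims do not state the loss formula or the weighting scheme, and the function names are hypothetical.

```python
import math


def class_weights(n_legal: int, n_illegal: int) -> tuple[float, float]:
    """Inverse-frequency weights (assumed scheme): the rarer class weighs more."""
    total = n_legal + n_illegal
    return total / (2 * n_legal), total / (2 * n_illegal)


def weighted_bce(p_legal: list[float], labels: list[int],
                 w_legal: float, w_illegal: float) -> float:
    """Weighted binary cross-entropy; label 1 means legal, 0 means illegal."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    loss = 0.0
    for p, y in zip(p_legal, labels):
        p = min(max(p, eps), 1.0 - eps)
        loss -= w_legal * y * math.log(p) + w_illegal * (1 - y) * math.log(1.0 - p)
    return loss / len(labels)


w_legal, w_illegal = class_weights(n_legal=90, n_illegal=10)
print(w_legal, w_illegal)
print(weighted_bce([0.9, 0.2], [1, 0], w_legal, w_illegal))
```

With 90 legal and 10 illegal samples, an error on an illegal sample is penalised about nine times as heavily as an error on a legal one, which counteracts the class imbalance during training.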
7. The method according to claim 6, wherein converting the acquired API request samples into digital sequences corresponding to the API request samples comprises:
acquiring an API request sample, and removing redundant data irrelevant to attack detection in API request sample data to obtain sample original characteristic data relevant to attack detection in the API request sample;
based on the corresponding relation between the preset symbol and the identification text, replacing the symbol in the original characteristic data of the sample with the corresponding identification text to obtain sample text data corresponding to the API request sample;
and converting the sample text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a sample digital sequence corresponding to the API request sample.
8. The method according to any one of claims 1 to 7, wherein determining whether the API request is legal based on the recognition result output by the attack recognition model includes:
inputting the digital sequence into the attack recognition model to output a legal probability of the API request;
taking each API request whose legal probability exceeds a first preset value as a first legal API request;
sorting the API requests whose legal probability lies between the first preset value and a second preset value in descending order of legal probability, and selecting a set proportion of the top-ranked API requests as second legal API requests, wherein the first preset value is higher than the second preset value;
and identifying the first legal API requests and the second legal API requests as legal API requests, and identifying the remaining API requests as illegal API requests.
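The two-threshold decision of claim 8 can be sketched as below. The default values of `first`, `second` and `top_ratio` are hypothetical, and the treatment of probabilities exactly equal to a threshold is an assumption, since the claims do not specify boundary handling.

```python
def classify_batch(probs: list[float], first: float = 0.9,
                   second: float = 0.5, top_ratio: float = 0.5) -> list[bool]:
    """Return True for each request judged legal, False otherwise."""
    if first <= second:
        raise ValueError("the first preset value must be higher than the second")

    # First legal API requests: legal probability above the first preset value.
    legal = {i for i, p in enumerate(probs) if p > first}

    # Requests between the two preset values, sorted by probability, high to low.
    middle = sorted(
        (i for i, p in enumerate(probs) if second <= p <= first),
        key=lambda i: probs[i],
        reverse=True,
    )

    # Second legal API requests: the set proportion of top-ranked middle requests.
    keep = int(len(middle) * top_ratio)
    legal.update(middle[:keep])

    return [i in legal for i in range(len(probs))]


print(classify_batch([0.95, 0.8, 0.7, 0.6, 0.55, 0.2]))
```

In this example the request at 0.95 is accepted outright, the two highest of the four middle-band requests are accepted, and everything else is treated as illegal.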
9. An apparatus for network attack identification, comprising:
the data screening module is used for decrypting the received API request data to obtain original data, and removing redundant data irrelevant to attack detection in the original data to obtain original characteristic data relevant to the attack detection in the API request;
the first conversion module is used for replacing the symbols in the original characteristic data with the corresponding identification texts based on the corresponding relation between the preset symbols and the identification texts to obtain text data corresponding to the API request;
the second conversion module is used for converting the text data into a digital sequence based on the corresponding relation between the preset text data and the digital sequence to obtain a digital sequence corresponding to the API request;
and the attack recognition module is used for inputting the digital sequence into an attack recognition model and determining whether the API request is legal or not based on a recognition result output by the attack recognition model.
10. A device for network attack identification, comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311767799.2A CN117792720A (en) 2023-12-20 2023-12-20 Network attack identification method, device and equipment

Publications (1)

Publication Number Publication Date
CN117792720A true CN117792720A (en) 2024-03-29

Family

ID=90401161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311767799.2A Pending CN117792720A (en) 2023-12-20 2023-12-20 Network attack identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN117792720A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination