CN118101352A - Abnormality detection rule generation method and device - Google Patents

Info

Publication number
CN118101352A
Authority
CN
China
Prior art keywords
detection rule
normal
data packet
regression model
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410522726.5A
Other languages
Chinese (zh)
Inventor
常宇歌
刘聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhian Technology Co ltd
Original Assignee
Beijing Zhian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhian Technology Co ltd
Priority to CN202410522726.5A
Publication of CN118101352A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the application provide a method and a device for generating an abnormality detection rule. The method includes: when a status code is a normal code, marking the data packet carrying a historical data access request as a normal data packet; extracting features of the normal data packet to obtain a plurality of feature vectors, classifying similar feature vectors into a vector group, and configuring a corresponding normal label for the vector group; training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels; inputting a test packet of the protected system into the trained logistic regression model to obtain the probability that the model predicts the test packet to be a normal data packet; when the probability is not greater than a preset value, marking the classification label of the test packet as an abnormal label; and acquiring the difference part between the test packet marked as the abnormal label and the normal data packet, and generating an abnormality detection rule of the protected system based on the difference part. The embodiments of the application help improve the efficiency of generating abnormality detection rules.

Description

Abnormality detection rule generation method and device
Technical Field
The embodiments of the application belong to the technical field of network security, and particularly relate to a method and a device for generating an anomaly detection rule.
Background
Anomaly detection rules are widely used in enterprises, where they help improve security and cope with various potential security threats. With anomaly detection rules, an enterprise can build a safe and reliable information system environment, which provides a strong guarantee for the stable development of the enterprise.
However, generating an anomaly detection rule is complex: various items of configuration information must be filled in manually, and the manual filling takes a long time, consuming a large amount of human resources and time. In addition, every enterprise's situation is different, so the corresponding anomaly detection rules must be customized to the actual situation, which makes rule generation even more complex. The existing process of generating anomaly detection rules is therefore cumbersome and inefficient.
Disclosure of Invention
The embodiments of the application provide an abnormality detection rule generation method and device, which are used to solve the technical problem that the existing process of generating abnormality detection rules is cumbersome and inefficient.
In a first aspect, an embodiment of the present application provides an anomaly detection rule generating method, applied to an electronic device, where the anomaly detection rule generating method includes:
acquiring a historical data access request of a protected system and a status code of the historical data access request, wherein the status code is used for representing the processing state of the historical data access request;
when the status code is the normal code, marking the data packet carrying the historical data access request as a normal data packet;
extracting features of the normal data packet to obtain a plurality of feature vectors, classifying similar feature vectors into a vector group, and configuring a corresponding normal label for the vector group;
training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, wherein the target feature vectors are the feature vectors in the vector group;
inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet;
when the probability is not greater than a preset value, marking the classification label of the test packet as an abnormal label, wherein the normal label and the abnormal label are different labels;
acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an anomaly detection rule of the protected system based on the difference part.
The embodiment of the application has the advantages that on one hand, the difference part between the test packet marked as the abnormal label and the normal data packet is obtained, and the abnormal detection rule of the protected system is generated based on the difference part, and various configuration information is not required to be filled manually, so that the generation process of the abnormal detection rule is simplified, and the generation efficiency of the abnormal detection rule is improved; on the other hand, the abnormality detection rule of the protected system has higher pertinence to the protected system, so that potential problems or risks of the protected system can be effectively identified, thereby being beneficial to improving the safety and reliability of the protected system.
In a possible implementation manner of the first aspect, the acquiring the historical data access request of the protected system and the status code of the historical data access request includes:
Acquiring an access log of the protected system, and acquiring the historical data access request and response data of the historical data access request in the access log;
And acquiring a status code type corresponding to the historical data access request, and acquiring the status code corresponding to the status code type in the response data.
In the embodiment of the application, through the status code, a developer can determine the access condition of the protected system, discover potential problems during access in time, and take corresponding measures for repairing, thereby improving the availability and stability of the protected system.
In a possible implementation manner of the first aspect, the extracting features of the normal data packet to obtain a plurality of feature vectors, classifying similar feature vectors into a vector group, and configuring a corresponding normal label for the vector group includes:
extracting the characteristics of the normal data packet to obtain a plurality of characteristic vectors;
Based on a preset clustering algorithm, classifying similar feature vectors in a plurality of feature vectors into a vector group, and configuring corresponding normal labels for the vector group.
In the embodiment of the application, based on a preset clustering algorithm, similar feature vectors in a plurality of feature vectors are classified into a vector group, so that classification of the plurality of feature vectors can be realized, and key information or targets in the plurality of feature vectors are extracted.
In a possible implementation manner of the first aspect, the inputting the test packet of the protected system into the trained logistic regression model, and obtaining the probability that the trained logistic regression model predicts that the test packet is the normal data packet includes:
testing the trained logistic regression model based on a second number of the target feature vectors and the corresponding normal labels to obtain an evaluation score of the trained logistic regression model; judging whether the evaluation score is greater than a preset score; and when the evaluation score is greater than the preset score, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet;
or, testing the trained logistic regression model based on the second number of the target feature vectors and the corresponding normal labels to obtain a current accuracy of the trained logistic regression model; judging whether the current accuracy is greater than a preset accuracy; and when the current accuracy is greater than the preset accuracy, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet;
or, testing the trained logistic regression model based on the second number of the target feature vectors and the corresponding normal labels to obtain a current recall rate of the trained logistic regression model; judging whether the current recall rate is greater than a preset recall rate; and when the current recall rate is greater than the preset recall rate, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet.
In the embodiment of the application, an evaluation score greater than the preset score, a current accuracy greater than the preset accuracy, or a current recall rate greater than the preset recall rate indicates that the logistic regression model performs better than the preset expectation on the classification task, which helps ensure the stability and reliability of the logistic regression model.
In a possible implementation manner of the first aspect, the acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an anomaly detection rule of the protected system based on the difference part includes:
acquiring the accumulated number of abnormal labels, judging whether the accumulated number is greater than a preset number, and when the accumulated number is greater than the preset number, acquiring the difference part between the test packet marked as the abnormal label and the normal data packet, and generating the anomaly detection rule of the protected system based on the difference part.
In the embodiment of the application, the difference part between the test packet marked as the abnormal label and the normal data packet is acquired only when the accumulated number is greater than the preset number, which helps ensure the validity of the difference part. Generating the anomaly detection rule of the protected system from such a difference part helps reduce false detections by the rule and improves its stability and reliability.
In a possible implementation manner of the first aspect, the acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an anomaly detection rule of the protected system based on the difference part includes:
acquiring a user name length of the protected system, acquiring first content and second content corresponding to the user name length, and generating a first abnormality detection rule of the protected system based on the difference part between the first content and the second content, wherein the first abnormality detection rule is the abnormality detection rule corresponding to the user name length, the first content is the content of the user name length in the test packet marked as the abnormal label, and the second content is the content of the user name length in the normal data packet; or,
acquiring a user name format of the protected system, acquiring third content and fourth content corresponding to the user name format, and generating a second abnormality detection rule of the protected system based on the difference part between the third content and the fourth content, wherein the second abnormality detection rule is the abnormality detection rule corresponding to the user name format, the third content is the content of the user name format in the test packet marked as the abnormal label, and the fourth content is the content of the user name format in the normal data packet.
In the embodiment of the application, the first abnormality detection rule and the second abnormality detection rule do not need to be filled with various configuration information manually, so that the generation process of the first abnormality detection rule and the second abnormality detection rule is simplified, and the generation efficiency of the first abnormality detection rule and the second abnormality detection rule is improved.
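As an illustration of a user name format rule, the following minimal sketch derives a rule from the symbols that appear in abnormal user names but never in normal ones. The source does not say how the format difference is computed, so the function name `generate_format_rule`, the choice to compare only non-alphanumeric characters, and the sample user names are all assumptions made for this example.

```python
import re

def generate_format_rule(normal_usernames, abnormal_usernames):
    """Build a regex matching symbols seen only in abnormal user names.

    Assumption: the "difference part" for the user name format is the set of
    non-alphanumeric characters present in abnormal packets but absent from
    every normal packet. Returns None when no such difference exists.
    """
    def symbols(names):
        return {ch for name in names for ch in name if not ch.isalnum()}

    unexpected = symbols(abnormal_usernames) - symbols(normal_usernames)
    if not unexpected:
        return None
    # Escape each character so it is treated literally inside the class.
    char_class = "".join(re.escape(ch) for ch in sorted(unexpected))
    return re.compile("[" + char_class + "]")

# Hypothetical sample data.
normal = ["alice01", "bob_2024"]
abnormal = ["admin' OR '1'='1"]
format_rule = generate_format_rule(normal, abnormal)

violates_normal = format_rule.search("bob_2024") is not None
violates_abnormal = format_rule.search("x' OR 1=1") is not None
```

With this sample data the underscore is treated as part of the normal format, while the quote, space, and equals sign form the difference part.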
In a possible implementation manner of the first aspect, when the status code is the normal code, the marking the data packet carrying the historical data access request as a normal data packet includes:
When the status code is the normal code, acquiring response time of the data packet carrying the historical data access request;
Judging whether the response time is smaller than a preset time or not;
and marking the data packet as the normal data packet when the response time is smaller than the preset time.
In the embodiment of the application, data packets whose response time exceeds the preset time are often associated with malicious behavior. Marking a data packet as a normal data packet only when its response time is smaller than the preset time therefore helps ensure the validity and accuracy of the normal data packets.
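The two conditions above can be sketched as a single predicate. The normal code `200`, the preset time of 500 ms, and the packet field names are assumptions chosen for illustration; the source leaves them unspecified.

```python
NORMAL_CODE = 200        # assumed normal code
PRESET_TIME_MS = 500     # assumed preset time, in milliseconds

def is_normal_packet(status_code, response_time_ms,
                     normal_code=NORMAL_CODE, preset_time_ms=PRESET_TIME_MS):
    """Return True only if the status code is the normal code and the
    response time is smaller than the preset time."""
    return status_code == normal_code and response_time_ms < preset_time_ms

# Hypothetical packets carrying historical data access requests.
packets = [
    {"id": 1, "status": 200, "response_ms": 35},
    {"id": 2, "status": 200, "response_ms": 910},   # normal code, but too slow
    {"id": 3, "status": 500, "response_ms": 20},    # fast, but not a normal code
]
normal_ids = [p["id"] for p in packets
              if is_normal_packet(p["status"], p["response_ms"])]
```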
In a possible implementation manner of the first aspect, after the acquiring a difference portion between the test packet marked as the anomaly label and the normal data packet, and generating an anomaly detection rule of the protected system based on the difference portion, the anomaly detection rule generating method further includes:
And acquiring a current data access request of the protected system, and inputting the current data access request into the abnormality detection rule to obtain a detection result output by the abnormality detection rule based on the current data access request.
In the embodiment of the application, the detection result is output based on the abnormality detection rule, which avoids errors and bias that human factors may introduce and helps ensure the objectivity and accuracy of the detection result.
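A minimal sketch of applying generated rules to a current data access request follows. The rule representation (a dictionary with `name`, `field`, `check`, and a limit or character list) and the function `detect` are hypothetical; the source does not define how a rule is stored or evaluated.

```python
def detect(request, rules):
    """Return a detection result listing the rules the request violates."""
    hits = []
    for rule in rules:
        value = request.get(rule["field"], "")
        if rule["check"] == "length_exceeds" and len(value) > rule["limit"]:
            hits.append(rule["name"])
        elif rule["check"] == "contains_any" and any(ch in value for ch in rule["chars"]):
            hits.append(rule["name"])
    return {"abnormal": bool(hits), "matched_rules": hits}

# Hypothetical rules of the two kinds described in the text.
rules = [
    {"name": "username_length", "field": "username",
     "check": "length_exceeds", "limit": 12},
    {"name": "username_format", "field": "username",
     "check": "contains_any", "chars": ["'", "=", " "]},
]

ok_result = detect({"username": "alice_wang01"}, rules)
bad_result = detect({"username": "x' OR '1'='1 padded"}, rules)
```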
In a possible implementation manner of the first aspect, after the inputting the test packet of the protected system into the trained logistic regression model, the method for generating the anomaly detection rule further includes:
And when the probability is larger than the preset value, marking the classification label of the test packet as the normal label.
In the embodiment of the application, the classification label of the test packet is marked as the normal label, so that test packets marked as the abnormal label can be clearly distinguished from test packets marked as the normal label.
In a second aspect, an embodiment of the present application provides an anomaly detection rule generating apparatus, including:
The first acquisition module is used for acquiring a historical data access request of the protected system and a status code of the historical data access request, wherein the status code is used for representing the processing state of the historical data access request;
The first marking module is used for marking the data packet carrying the historical data access request as a normal data packet when the status code is the normal code;
The configuration module is used for extracting features of the normal data packet to obtain a plurality of feature vectors, classifying similar feature vectors into a vector group, and configuring a corresponding normal label for the vector group;
The training module is used for training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, wherein the target feature vectors are the feature vectors in the vector group;
The second acquisition module is used for inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet;
The second marking module is used for marking the classification label of the test packet as an abnormal label when the probability is not greater than a preset value, wherein the normal label and the abnormal label are different labels;
The generation module is used for acquiring the difference part between the test packet marked as the abnormal label and the normal data packet and generating an anomaly detection rule of the protected system based on the difference part.
The embodiment of the application has the advantages that on one hand, the difference part between the test packet marked as the abnormal label and the normal data packet is obtained, and the abnormal detection rule of the protected system is generated based on the difference part, and various configuration information is not required to be filled manually, so that the generation process of the abnormal detection rule is simplified, and the generation efficiency of the abnormal detection rule is improved; on the other hand, the abnormality detection rule of the protected system has higher pertinence to the protected system, so that potential problems or risks of the protected system can be effectively identified, thereby being beneficial to improving the safety and reliability of the protected system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. Some specific embodiments of the application will be described in detail hereinafter by way of example and not by way of limitation with reference to the accompanying drawings. The same reference numbers in the drawings denote the same or similar parts or portions, and it will be understood by those skilled in the art that the drawings are not necessarily drawn to scale, in which:
FIG. 1 is a flowchart of an anomaly detection rule generation method provided by an embodiment of the present application;
FIG. 2 is a flowchart of acquiring a normal data packet according to an embodiment of the present application;
FIG. 3 is an application diagram of the abnormality detection rule generation method provided by the embodiment of the present application;
FIG. 4 is a schematic block diagram of an abnormality detection rule generation apparatus provided by an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present application with reference to the accompanying drawings. It will be apparent that the described embodiments are merely some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
The abnormality detection rule generation method provided by the embodiments of the application can be applied to electronic devices such as a server, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, and a personal digital assistant (PDA). The embodiments of the application do not limit the specific type of the electronic device.
Fig. 1 is a flowchart of an anomaly detection rule generation method provided by an embodiment of the present application, and as shown in fig. 1, the anomaly detection rule generation method provided by the embodiment of the present application includes the following steps, which are described in detail as follows:
S101, acquiring a historical data access request of a protected system and a status code of the historical data access request, wherein the status code is used for representing a processing state of the historical data access request;
wherein the protected system is a system which needs protection.
The protected system comprises one or a combination of an office system, a data management system and a business application system.
For ease of illustration, examples are as follows:
Office system: for example, a workflow management system, a document management system, an administrative management system, a personnel management system, or a customer management system. Such systems let enterprises process office business and business information efficiently and make efficient use of information resources, thereby improving productivity.
Network security system: for example, a firewall system or an intrusion detection system, which protects the enterprise network from attacks.
Data management system: for example, a database system or a data warehouse system, which protects the security and integrity of enterprise data.
Business application system: for example, an enterprise resource planning system or a customer relationship management system, which ensures the continuity and stability of business operations.
Wherein the obtaining the historical data access request of the protected system and the status code of the historical data access request comprises:
Acquiring an access log of the protected system, and acquiring the historical data access request and response data of the historical data access request in the access log;
And acquiring a status code type corresponding to the historical data access request, and acquiring the status code corresponding to the status code type in the response data.
In the embodiment of the application, through the status code, a developer can determine the access condition of the protected system, discover potential problems during access in time, and take corresponding measures for repairing, thereby improving the availability and stability of the protected system.
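The log-reading step can be sketched as follows. The access-log format (one JSON object per line), the field names `request`, `response`, and `http_status`, and the function `read_requests_with_status` are assumptions for illustration only; the source does not describe the log layout or how a status code type maps to a field.

```python
import json

# Hypothetical access log: one JSON object per line, holding a historical
# data access request together with its response data.
SAMPLE_LOG = """\
{"request": {"method": "POST", "path": "/login", "body": {"username": "alice_wang01"}}, "response": {"http_status": 200, "elapsed_ms": 35}}
{"request": {"method": "POST", "path": "/login", "body": {"username": "bob"}}, "response": {"http_status": 500, "elapsed_ms": 910}}
"""

def read_requests_with_status(log_text, status_code_type="http_status"):
    """Return (request, status_code) pairs extracted from the log text.

    status_code_type names the response field that holds the status code
    (an assumed way of modelling the "status code type").
    """
    pairs = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        status_code = entry["response"].get(status_code_type)
        pairs.append((entry["request"], status_code))
    return pairs

pairs = read_requests_with_status(SAMPLE_LOG)
```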
S102, when the status code is the normal code, marking the data packet carrying the historical data access request as a normal data packet;
When the status code is the normal code, the data packet carrying the historical data access request is marked as a normal data packet, so that the data packet which does not meet the condition can be effectively filtered, and the more valuable and reliable normal data packet is reserved.
S103, extracting features of the normal data packet to obtain a plurality of feature vectors, classifying the similar feature vectors into a vector group, and configuring corresponding normal labels for the vector group;
Extracting features of the normal data packet to obtain a plurality of feature vectors, classifying similar feature vectors into a vector group, and configuring a corresponding normal label for the vector group includes:
extracting the characteristics of the normal data packet to obtain a plurality of characteristic vectors;
Based on a preset clustering algorithm, classifying similar feature vectors in a plurality of feature vectors into a vector group, and configuring corresponding normal labels for the vector group.
In the embodiment of the application, based on a preset clustering algorithm, similar feature vectors in a plurality of feature vectors are classified into a vector group, so that classification of the plurality of feature vectors can be realized, and key information or targets in the plurality of feature vectors are extracted.
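The source does not name the features or the clustering algorithm. The sketch below therefore assumes three simple user name features and uses a greedy distance-threshold grouping as a stand-in for the "preset clustering algorithm"; the function names and the `max_distance` value are hypothetical.

```python
import math

def extract_features(username):
    """Map a user name to a numeric feature vector (assumed features:
    length, digit count, count of non-alphanumeric characters)."""
    return (
        float(len(username)),
        float(sum(ch.isdigit() for ch in username)),
        float(sum(not ch.isalnum() for ch in username)),
    )

def group_similar(vectors, max_distance=3.0):
    """Greedy grouping: a vector joins the first group whose first member
    lies within max_distance (Euclidean); otherwise it starts a new group."""
    groups = []
    for vec in vectors:
        for group in groups:
            if math.dist(vec, group[0]) <= max_distance:
                group.append(vec)
                break
        else:
            groups.append([vec])
    return groups

# Hypothetical user names taken from normal data packets.
usernames = ["alice01", "bob_2024", "carol99", "a_very_long_service_account_name"]
vectors = [extract_features(name) for name in usernames]
groups = group_similar(vectors)

# Every vector group receives a normal label, since all inputs came from
# normal data packets.
labelled_groups = [{"label": "normal", "vectors": group} for group in groups]
```

A production system would more likely use an established algorithm such as k-means or DBSCAN; the greedy version is used here only to keep the example dependency-free.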
S104, training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, wherein the target feature vectors are the feature vectors in the vector group;
Illustratively, training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, including:
Training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels to obtain the current cycle number;
judging whether the current cycle number is larger than a preset cycle number or not;
and stopping training when the current cycle number is greater than the preset cycle number, and storing the logistic regression model after training is completed.
For ease of illustration, examples are as follows:
for example, the preset cycle number is set to be 50, after the training process runs for 50 cycles, training is stopped, and the logistic regression model after training is completed is saved.
For example, the preset cycle number is set to be 30, after the training process is operated for 30 cycles, training is stopped, and the logistic regression model after training is completed is saved.
The effect is that the training process is simple and easy to manage, and a relatively stable logistic regression model can generally be obtained.
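The cycle-count stopping condition can be sketched with a small logistic regression trained by batch gradient descent. One assumption is significant: a logistic regression needs examples of both classes to learn a boundary, while the source describes only normal labels, so the sketch adds hypothetical contrasting samples labelled 0. The single feature (a scaled user name length), the learning rate, and the function names are also assumptions.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(samples, labels, preset_cycles=50, learning_rate=0.1):
    """Batch gradient descent that stops once the preset cycle number is reached."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    current_cycle = 0
    while current_cycle < preset_cycles:
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = prediction - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
        current_cycle += 1
    return weights, bias, current_cycle

# Label 1 = normal; label 0 = hypothetical contrasting samples (assumption).
samples = [[0.7], [0.8], [0.9], [1.0], [2.5], [3.0], [3.2]]
labels = [1, 1, 1, 1, 0, 0, 0]
weights, bias, cycles = train_logistic_regression(samples, labels, preset_cycles=50)

p_normal = sigmoid(weights[0] * 0.8 + bias)     # probability for a short name
p_abnormal = sigmoid(weights[0] * 3.0 + bias)   # probability for a long name
```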
S105, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet;
The inputting the test packet of the protected system into the trained logistic regression model, obtaining the probability that the trained logistic regression model predicts that the test packet is the normal data packet, comprising:
testing the trained logistic regression model based on a second number of the target feature vectors and the corresponding normal labels to obtain an evaluation score of the trained logistic regression model; judging whether the evaluation score is greater than a preset score; and when the evaluation score is greater than the preset score, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet;
or, testing the trained logistic regression model based on the second number of the target feature vectors and the corresponding normal labels to obtain a current accuracy of the trained logistic regression model; judging whether the current accuracy is greater than a preset accuracy; and when the current accuracy is greater than the preset accuracy, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet;
or, testing the trained logistic regression model based on the second number of the target feature vectors and the corresponding normal labels to obtain a current recall rate of the trained logistic regression model; judging whether the current recall rate is greater than a preset recall rate; and when the current recall rate is greater than the preset recall rate, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts the test packet to be the normal data packet.
In the embodiment of the application, an evaluation score greater than the preset score, a current accuracy greater than the preset accuracy, or a current recall rate greater than the preset recall rate indicates that the logistic regression model performs better than the preset expectation on the classification task, which helps ensure the stability and reliability of the logistic regression model.
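The accuracy and recall checks can be sketched as below. The preset thresholds of 0.9, the held-out labels, and the function names are assumptions; the "evaluation score" is left out because the source does not define it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0

def model_passes(y_true, y_pred, preset_accuracy=0.9, preset_recall=0.9):
    """Report each check separately; the text treats them as alternatives,
    so the caller decides which one gates the prediction step."""
    return {
        "accuracy_ok": accuracy(y_true, y_pred) > preset_accuracy,
        "recall_ok": recall(y_true, y_pred) > preset_recall,
    }

# Hypothetical held-out labels (1 = normal) and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
result = model_passes(y_true, y_pred)
```

In this example the accuracy equals the preset accuracy exactly, so the strict "greater than" comparison fails, while the recall check passes.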
S106, when the probability is not greater than a preset value, marking the classification label of the test packet as an abnormal label, wherein the normal label and the abnormal label are different labels;
The preset value may be set by the user or may be a default value, and is not limited herein.
For ease of illustration, examples are as follows:
for example, the preset value is 0.5, and when the probability is not greater than 0.5, the classification label of the test packet is marked as an abnormal label.
For example, the preset value is 0.6, and when the probability is not greater than 0.6, the classification label of the test packet is marked as an abnormal label.
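The two examples above amount to a single comparison, sketched here. The function name `classify` and the label strings are illustrative choices.

```python
def classify(probability_normal, preset_value=0.5):
    """Return 'abnormal' when the probability is not greater than the
    preset value, otherwise 'normal'."""
    return "normal" if probability_normal > preset_value else "abnormal"

# Preset value 0.5: a probability equal to the preset value is abnormal.
labels_05 = [classify(p) for p in (0.92, 0.5, 0.31)]

# Preset value 0.6: 0.55 is now abnormal as well.
labels_06 = [classify(p, preset_value=0.6) for p in (0.92, 0.55)]
```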
S107, acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an anomaly detection rule of the protected system based on the difference part.
Acquiring the difference part between the test packet marked as the abnormal label and the normal data packet, and generating the anomaly detection rule of the protected system based on the difference part, includes:
Acquiring the accumulated number of abnormal labels, judging whether the accumulated number is greater than a preset number, and when the accumulated number is greater than the preset number, acquiring the difference part between the test packet marked as the abnormal label and the normal data packet, and generating the anomaly detection rule of the protected system based on the difference part.
The difference part between the test packet marked as the abnormal label and the normal data packet is acquired only when the accumulated number is greater than the preset number, which helps ensure the validity of the difference part. Generating the anomaly detection rule of the protected system from such a difference part helps reduce false detections by the rule and improves its stability and reliability.
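The accumulated-number check can be sketched as a simple gate in front of rule generation. The preset number of 3 and the function name are assumptions.

```python
def ready_to_generate(labels, preset_number=3):
    """Return True once the accumulated number of abnormal labels is
    greater than the preset number."""
    accumulated = sum(label == "abnormal" for label in labels)
    return accumulated > preset_number

few = ["abnormal", "normal", "abnormal", "abnormal"]    # 3 abnormal labels
enough = ["abnormal"] * 4 + ["normal"]                  # 4 abnormal labels
```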
Obtaining the difference part between the test packet marked as the abnormal label and the normal data packet, and generating the abnormality detection rule of the protected system based on the difference part, comprises:
Acquiring a user name length of the protected system, acquiring first content and second content corresponding to the user name length, and generating a first abnormality detection rule of the protected system based on the difference part between the first content and the second content, wherein the first abnormality detection rule is the abnormality detection rule corresponding to the user name length, the first content is the content of the user name length in the test packet marked as the abnormality label, and the second content is the content of the user name length in the normal data packet;
For ease of illustration, examples are as follows:
For example, the user name length of the protected system A is a user name length A1. The content of the user name length A1 in the test packet marked as the abnormal label is 13 characters, that is, the first content is 13 characters; the content of the user name length A1 in the normal data packet is 12 characters, that is, the second content is 12 characters. The difference part between the first content and the second content is therefore the number of characters of the user name, so the first abnormality detection rule generated for the protected system A is that the number of characters of the user name exceeds 12.
For example, the user name length of the protected system B is a user name length B1. The content of the user name length B1 in the test packet marked as the abnormal label is 20 characters, that is, the first content is 20 characters; the content of the user name length B1 in the normal data packet is 19 characters, that is, the second content is 19 characters. The difference part between the first content and the second content is therefore the number of characters of the user name, so the first abnormality detection rule generated for the protected system B is that the number of characters of the user name exceeds 19.
In this way, the protected system A and the protected system B can set different first abnormality detection rules for the user name length according to the personalized requirements of the systems, which helps to prevent unauthorized access and ensure safe and stable operation of the protected system A and the protected system B.
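One possible way to derive a user name length rule from the difference part is sketched below. The class `LengthRule`, the function `derive_length_rule`, and the choice of the longest normal user name as the bound are assumptions for illustration; the application does not prescribe this exact procedure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LengthRule:
    """Hypothetical first abnormality detection rule: user name too long."""
    max_normal_length: int

    def is_abnormal(self, user_name: str) -> bool:
        return len(user_name) > self.max_normal_length


def derive_length_rule(
    abnormal_names: Sequence[str], normal_names: Sequence[str]
) -> Optional[LengthRule]:
    """Derive a length rule when abnormal user names exceed the normal ones."""
    if not abnormal_names or not normal_names:
        return None
    # Second content: the longest user name seen in normal data packets.
    max_normal = max(len(name) for name in normal_names)
    # A rule is emitted only if some abnormal sample really differs in length.
    if any(len(name) > max_normal for name in abnormal_names):
        return LengthRule(max_normal_length=max_normal)
    return None
```

Applied to the protected system A example, a 13-character abnormal user name and a 12-character normal user name give a rule that flags user names longer than 12 characters.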
Or, acquiring a user name format of the protected system, acquiring third content and fourth content corresponding to the user name format, and generating a second abnormality detection rule of the protected system based on the difference part between the third content and the fourth content, wherein the second abnormality detection rule is the abnormality detection rule corresponding to the user name format, the third content is the content of the user name format in the test packet marked as the abnormal label, and the fourth content is the content of the user name format in the normal data packet.
For ease of illustration, examples are as follows:
For example, the user name format of the protected system G is a user name format G1. The content of the user name format G1 in the test packet marked as the abnormal label is a combination of a special symbol and a number, that is, the third content is a combination of a special symbol and a number; the content of the user name format G1 in the normal data packet is a combination of a department code and a serial number, that is, the fourth content is a combination of a department code and a serial number. The difference part between the third content and the fourth content therefore lies in the characters of the user name format, so the second abnormality detection rule generated for the protected system G is that the user name format contains a special symbol.
For example, a combination of special symbols and numbers:
#001, #023, #0222, #02211.
The special symbol comprises one or a combination of a hash sign (#), a currency symbol, an arrow symbol, an emoji, and a punctuation mark.
For example, a combination of department codes plus serial numbers:
IT001: representing the first employee of the IT department.
HR023: representing the 23rd employee of the human resources department.
For example, the user name format of the protected system F is a user name format F1. The content of the user name format F1 in the test packet marked as the abnormal label is a combination of an employee name abbreviation and a mathematical symbol, that is, the third content is a combination of an employee name abbreviation and a mathematical symbol; the content of the user name format F1 in the normal data packet is a combination of an employee name abbreviation and a job entry year, that is, the fourth content is a combination of an employee name abbreviation and a job entry year. The difference part between the third content and the fourth content is therefore the mathematical symbol, so the second abnormality detection rule generated for the protected system F is that the user name format contains a mathematical symbol.
For example, a combination of employee name abbreviations and mathematical symbols:
PDD+: PDD is an abbreviation for employee name, + is a mathematical symbol.
ZSS-: ZSS is an abbreviation for employee name, - is a mathematical symbol.
For example, a combination of employee name abbreviations and job entry years:
PDD2001: PDD is an abbreviation for employee name, 2001 is the year of job entry.
ZSS2023: ZSS is an abbreviation for employee name, 2023 is the year of job entry.
In this way, the protected system G and the protected system F can set different second anomaly detection rules for the user name format according to the personalized requirements of the systems, which helps to prevent unauthorized access and ensure safe and stable operation of the protected system G and the protected system F.
The first abnormality detection rule and the second abnormality detection rule do not need to be filled with various configuration information manually, so that the generation process of the first abnormality detection rule and the second abnormality detection rule is simplified, and the generation efficiency of the first abnormality detection rule and the second abnormality detection rule is improved.
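The user name format examples for the protected systems G and F can be sketched as a character-set difference. The function names and the restriction to non-alphanumeric characters are assumptions made for this illustration, not details stated in the application.

```python
from typing import FrozenSet, Iterable


def derive_format_rule(
    abnormal_names: Iterable[str], normal_names: Iterable[str]
) -> FrozenSet[str]:
    """Return symbols seen only in abnormal user names (the difference part)."""
    normal_chars = set("".join(normal_names))
    abnormal_chars = set("".join(abnormal_names))
    # Assumption: only non-alphanumeric characters are treated as
    # special or mathematical symbols worth turning into a rule.
    return frozenset(c for c in abnormal_chars - normal_chars if not c.isalnum())


def violates_format_rule(user_name: str, forbidden: FrozenSet[str]) -> bool:
    """Return True when the user name contains any forbidden symbol."""
    return any(c in forbidden for c in user_name)
```

For the protected system G, the abnormal names `#001` and `#023` compared with the normal names `IT001` and `HR023` leave only `#` as the difference part; for the protected system F, `PDD+` and `ZSS-` compared with `PDD2001` and `ZSS2023` leave `+` and `-`.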
After obtaining the difference part between the test packet marked as the abnormal label and the normal data packet and generating the abnormality detection rule of the protected system based on the difference part, the abnormality detection rule generation method further comprises:
Acquiring a current data access request of the protected system, and inputting the current data access request into the abnormality detection rule to obtain a detection result output by the abnormality detection rule based on the current data access request.
For ease of illustration, examples are as follows:
For example, when the abnormality detection rule is that the number of characters of the user name exceeds 19, if the user name of the current data access request exceeds 19 characters, the detection result output by the abnormality detection rule indicates that the current data access request is abnormal.
For example, when the abnormality detection rule is that the user name format contains a special symbol, if the user name of the current data access request is a combination of a special symbol and a number, the detection result output by the abnormality detection rule indicates that the current data access request is abnormal.
In the embodiment of the application, the detection result is output based on the abnormal detection rule, so that errors and prejudices possibly caused by human factors are avoided, and the objectivity and the accuracy of the detection result are ensured.
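Applying generated rules to a current data access request can be sketched as follows. The request is modeled as a dictionary with a hypothetical `user_name` field, and the helper names `length_rule`, `symbol_rule`, and `detect` are assumptions for illustration.

```python
from typing import Callable, Dict, List

# A rule takes a request and returns True when the request is abnormal.
Rule = Callable[[Dict[str, str]], bool]


def length_rule(max_length: int) -> Rule:
    """Rule: the user name has more than `max_length` characters."""
    return lambda request: len(request.get("user_name", "")) > max_length


def symbol_rule(forbidden: str) -> Rule:
    """Rule: the user name contains any character from `forbidden`."""
    return lambda request: any(c in forbidden for c in request.get("user_name", ""))


def detect(request: Dict[str, str], rules: List[Rule]) -> str:
    """Return the detection result for the current data access request."""
    return "abnormal" if any(rule(request) for rule in rules) else "normal"
```

A request is reported as abnormal as soon as any single rule matches, which mirrors the two examples above where each rule is evaluated on its own.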
After the test packet of the protected system is input into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet, the abnormality detection rule generation method further comprises:
When the probability is greater than the preset value, marking the classification label of the test packet as the normal label.
Marking the classification label of the test packet as the normal label allows test packets marked as the abnormal label to be clearly distinguished from test packets marked as the normal label.
The embodiment of the application has the advantages that on one hand, the difference part between the test packet marked as the abnormal label and the normal data packet is obtained, and the abnormal detection rule of the protected system is generated based on the difference part, and various configuration information is not required to be filled manually, so that the generation process of the abnormal detection rule is simplified, and the generation efficiency of the abnormal detection rule is improved; on the other hand, the abnormality detection rule of the protected system has higher pertinence to the protected system, so that potential problems or risks of the protected system can be effectively identified, thereby being beneficial to improving the safety and reliability of the protected system.
Fig. 2 is a flowchart for acquiring a normal data packet according to an embodiment of the present application, which is described in detail below:
S201, when the status code is the normal code, acquiring response time of the data packet carrying the historical data access request;
S202, judging whether the response time is smaller than a preset time;
S203, marking the data packet as the normal data packet when the response time is smaller than the preset time.
In the embodiment of the application, the data packet with the response time being longer than the preset time is often related to malicious behaviors, and when the response time is shorter than the preset time, the data packet is marked as a normal data packet, so that the validity and the accuracy of the normal data packet can be ensured.
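Steps S201 to S203 can be sketched as below. The `Packet` class, the use of HTTP 200 as the normal code, and the 500 ms preset time are illustrative assumptions; the application leaves the concrete values open.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical view of a data packet carrying a historical request."""
    status_code: int
    response_time_ms: float


def is_normal_packet(
    packet: Packet,
    normal_code: int = 200,          # assumed normal code
    preset_time_ms: float = 500.0,   # assumed preset time
) -> bool:
    """Mark a packet as normal only if both checks pass."""
    # S201: the response time is considered only for the normal code.
    if packet.status_code != normal_code:
        return False
    # S202/S203: the response time must be strictly smaller than the preset time.
    return packet.response_time_ms < preset_time_ms
```

A packet with the normal code but a response time equal to or above the preset time is not marked as normal under this sketch.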
Fig. 3 is an application diagram of the anomaly detection rule generation method provided in the embodiment of the present application, and is described in detail below:
S301, collecting response data;
S302, acquiring a normal data packet according to a status code of the response data;
S303, splitting the feature vectors of the normal data packet into a training set and a test set;
S304, training a logistic regression model based on a clustering algorithm and the training set;
S305, judging whether the score of the logistic regression model reaches the standard; if yes, executing S306; otherwise, executing S304;
S306, deploying a logistic regression model;
The logistic regression model can be deployed in a cloud server, a local server, or a computer.
For ease of illustration, examples are as follows:
For example, the logistic regression model is deployed on a cloud server, so that the high-performance operation and stable service of the model can be ensured by utilizing the characteristics of elasticity and high availability of cloud service.
For example, the logistic regression model is deployed in a local server or a computer, so that the safety and privacy of data can be ensured, the data transmission delay is reduced, and the real-time response capability is improved.
S307, judging whether a classified boundary is detected; if yes, executing S309; otherwise, executing S308;
Wherein the shape of the boundary comprises one or a combination of a straight line, a curved line, or a hyperplane in a high-dimensional space.
S308, generating a test request, and sending a test packet to the logistic regression model through the test request;
S309, acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an abnormality detection rule based on the difference part.
When the probability is not greater than the preset value, the classification label of the test packet is marked as the abnormal label, the difference part between the test packet marked as the abnormal label and the normal data packet is acquired, and the abnormality detection rule is generated based on the difference part.
In the embodiment of the application, the abnormality detection rule of the protected system has higher pertinence to the protected system, so that potential problems or risks of the protected system can be effectively identified, thereby being beneficial to improving the safety and reliability of the protected system.
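The control flow of steps S303 to S305 can be sketched as follows. Training and scoring are passed in as callables because the application does not fix them here; the function names, the 0.2 test ratio, and the `max_rounds` safeguard are assumptions for illustration.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def split_train_test(
    vectors: Sequence[Any], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Any], List[Any]]:
    """S303: shuffle the feature vectors and split them into two sets."""
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def train_until_standard(
    train_fn: Callable[[List[Any]], Any],
    score_fn: Callable[[Any, List[Any]], float],
    train_set: List[Any],
    test_set: List[Any],
    preset_score: float,
    max_rounds: int = 10,  # assumed cap so the loop always terminates
) -> Any:
    """S304/S305: retrain until the score exceeds the preset score."""
    for _ in range(max_rounds):
        model = train_fn(train_set)
        if score_fn(model, test_set) > preset_score:
            return model  # ready for deployment in S306
    raise RuntimeError("model did not reach the preset score")
```

The score function could compute an evaluation score, an accuracy, or a recall rate on the test set, whichever criterion is chosen for the model.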
Referring to fig. 4, fig. 4 is a schematic block diagram of an abnormality detection rule generating device according to an embodiment of the present application. The abnormality detection rule generating device 400 shown in fig. 4 may be applied to the above-mentioned electronic device and is described in detail below taking the electronic device as an example. The abnormality detection rule generating device 400 may include a first obtaining module 401, a first marking module 402, a configuration module 403, a training module 404, a second obtaining module 405, a second marking module 406, and a generating module 407.
A first obtaining module 401, configured to obtain a historical data access request of a protected system and a status code of the historical data access request, where the status code is used to represent a processing status of the historical data access request;
A first marking module 402, configured to mark a data packet carrying the historical data access request as a normal data packet when the status code is the normal code;
A configuration module 403, configured to perform feature extraction on the normal data packet to obtain a plurality of feature vectors, classify similar feature vectors into a vector group, and configure the vector group with corresponding normal labels;
A training module 404, configured to train a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, where the target feature vectors are the feature vectors in the vector group;
A second obtaining module 405, configured to input a test packet of the protected system into the trained logistic regression model, so as to obtain a probability that the trained logistic regression model predicts that the test packet is the normal data packet;
A second marking module 406, configured to mark the classification label of the test packet as an abnormal label when the probability is not greater than a preset value, where the normal label and the abnormal label are different labels;
A generating module 407, configured to obtain a difference portion between the test packet and the normal data packet marked as the abnormal label, and generate an abnormality detection rule of the protected system based on the difference portion.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The embodiment of the application has the advantages that on one hand, the difference part between the test packet marked as the abnormal label and the normal data packet is obtained, and the abnormal detection rule of the protected system is generated based on the difference part, and various configuration information is not required to be filled manually, so that the generation process of the abnormal detection rule is simplified, and the generation efficiency of the abnormal detection rule is improved; on the other hand, the abnormality detection rule of the protected system has higher pertinence to the protected system, so that potential problems or risks of the protected system can be effectively identified, thereby being beneficial to improving the safety and reliability of the protected system.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. An abnormality detection rule generation method, characterized in that the abnormality detection rule generation method includes:
Acquiring a historical data access request of a protected system and a state code of the historical data access request, wherein the state code is used for representing the processing state of the historical data access request;
when the status code is a normal code, marking a data packet carrying the historical data access request as a normal data packet;
Extracting features of the normal data packet to obtain a plurality of feature vectors, classifying the similar feature vectors into a vector group, and configuring corresponding normal labels for the vector group;
Training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, wherein the target feature vectors are the feature vectors in the vector group;
inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet;
When the probability is not greater than a preset value, marking the classification label of the test packet as an abnormal label, wherein the normal label and the abnormal label are different labels;
And acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an abnormal detection rule of the protected system based on the difference part.
2. The abnormality detection rule generation method according to claim 1, characterized in that the acquiring the history data access request of the protected system and the status code of the history data access request includes:
Acquiring an access log of the protected system, and acquiring the historical data access request and response data of the historical data access request in the access log;
And acquiring a status code type corresponding to the historical data access request, and acquiring the status code corresponding to the status code type in the response data.
3. The method of generating an anomaly detection rule according to claim 1, wherein the feature extracting the normal data packet to obtain a plurality of feature vectors, classifying the similar feature vectors into a vector group, and configuring a normal tag corresponding to the vector group includes:
extracting the characteristics of the normal data packet to obtain a plurality of characteristic vectors;
Based on a preset clustering algorithm, classifying similar feature vectors in a plurality of feature vectors into a vector group, and configuring corresponding normal labels for the vector group.
4. The method of claim 1, wherein inputting the test packet of the protected system into the trained logistic regression model, obtaining the probability that the trained logistic regression model predicts that the test packet is the normal data packet, comprises:
Testing the trained logistic regression model based on a second number of the target feature vectors and the normal labels to acquire an evaluation score of the trained logistic regression model, judging whether the evaluation score is greater than a preset score, and, when the evaluation score is greater than the preset score, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet;
Or, testing the trained logistic regression model based on the second number of the target feature vectors and the normal labels to acquire a current accuracy of the trained logistic regression model, judging whether the current accuracy is greater than a preset accuracy, and, when the current accuracy is greater than the preset accuracy, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet;
Or, testing the trained logistic regression model based on the second number of the target feature vectors and the normal labels to acquire a current recall rate of the trained logistic regression model, judging whether the current recall rate is greater than a preset recall rate, and, when the current recall rate is greater than the preset recall rate, inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet.
5. The abnormality detection rule generation method according to claim 1, wherein the acquiring a difference portion between the test packet and the normal data packet marked as the abnormality label, based on which an abnormality detection rule of the protected system is constructed, includes:
And acquiring the accumulated number of the abnormal labels, judging whether the accumulated number is larger than a preset number, and when the accumulated number is larger than the preset number, acquiring a difference part between the test packet marked as the abnormal label and the normal data packet, and generating an abnormal detection rule of the protected system based on the difference part.
6. The abnormality detection rule generation method according to claim 5, wherein the acquiring a difference portion between the test packet and the normal data packet marked as the abnormality label, based on which an abnormality detection rule of the protected system is generated, includes:
Acquiring a user name length of the protected system, acquiring first content and second content corresponding to the user name length, and generating a first abnormality detection rule of the protected system based on the difference part between the first content and the second content, wherein the first abnormality detection rule is the abnormality detection rule corresponding to the user name length, the first content is the content of the user name length in the test packet marked as the abnormality label, and the second content is the content of the user name length in the normal data packet; or,
Acquiring a user name format of the protected system, acquiring third content and fourth content corresponding to the user name format, and generating a second abnormality detection rule of the protected system based on the difference part between the third content and the fourth content, wherein the second abnormality detection rule is the abnormality detection rule corresponding to the user name format, the third content is the content of the user name format in the test packet marked as the abnormality label, and the fourth content is the content of the user name format in the normal data packet.
7. The anomaly detection rule generation method of claim 1, wherein when the status code is the normal code, marking the data packet carrying the historical data access request as a normal data packet comprises:
When the status code is the normal code, acquiring response time of the data packet carrying the historical data access request;
Judging whether the response time is smaller than a preset time or not;
and marking the data packet as the normal data packet when the response time is smaller than the preset time.
8. The abnormality detection rule generation method according to claim 1, characterized in that after the abnormality detection rule of the protected system is generated based on a difference portion between the test packet and the normal data packet, which are marked as the abnormality label, the abnormality detection rule generation method further comprises:
And acquiring a current data access request of the protected system, and inputting the current data access request into the abnormality detection rule to obtain a detection result output by the abnormality detection rule based on the current data access request.
9. The anomaly detection rule generation method according to claim 1, after the test packet of the protected system is input into the trained logistic regression model to obtain a probability that the trained logistic regression model predicts that the test packet is the normal data packet, the anomaly detection rule generation method further comprises:
And when the probability is larger than the preset value, marking the classification label of the test packet as the normal label.
10. An abnormality detection rule generation device, comprising:
The first acquisition module is used for acquiring a historical data access request of the protected system and a state code of the historical data access request, wherein the state code is used for representing the processing state of the historical data access request;
the first marking module is used for marking the data packet carrying the historical data access request as a normal data packet when the state code is a normal code;
the configuration module is used for extracting the characteristics of the normal data packet to obtain a plurality of characteristic vectors, classifying the similar characteristic vectors into a vector group, and configuring the corresponding normal labels for the vector group;
the training module is used for training a preset logistic regression model based on a first number of target feature vectors and the corresponding normal labels, wherein the target feature vectors are the feature vectors in the vector group;
The second acquisition module is used for inputting the test packet of the protected system into the trained logistic regression model to obtain the probability that the trained logistic regression model predicts that the test packet is the normal data packet;
The second marking module is used for marking the classification label of the test packet as an abnormal label when the probability is not greater than a preset value, wherein the normal label and the abnormal label are different labels;
And the generation module is used for acquiring the difference part between the test packet marked as the abnormal label and the normal data packet and generating an abnormal detection rule of the protected system based on the difference part.
CN202410522726.5A 2024-04-28 2024-04-28 Abnormality detection rule generation method and device Pending CN118101352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410522726.5A CN118101352A (en) 2024-04-28 2024-04-28 Abnormality detection rule generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410522726.5A CN118101352A (en) 2024-04-28 2024-04-28 Abnormality detection rule generation method and device

Publications (1)

Publication Number Publication Date
CN118101352A true CN118101352A (en) 2024-05-28

Family

ID=91159819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410522726.5A Pending CN118101352A (en) 2024-04-28 2024-04-28 Abnormality detection rule generation method and device

Country Status (1)

Country Link
CN (1) CN118101352A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050160340A1 (en) * 2004-01-02 2005-07-21 Naoki Abe Resource-light method and apparatus for outlier detection
CN115913950A (en) * 2022-10-21 2023-04-04 浪潮通信信息系统有限公司 Home-wide-oriented batch user early warning analysis method and system
CN117171570A (en) * 2023-09-05 2023-12-05 中电云计算技术有限公司 Method for automatically collecting and treating sample set and generating model on line to detect abnormal command line behaviors in real time
CN117729032A (en) * 2023-12-22 2024-03-19 江苏云天网络安全技术有限公司 Night safety protection method for office network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination