CN113743512A

CN113743512A - Autonomous learning judgment method and system for safety alarm event

Info

Publication number: CN113743512A
Application number: CN202111047170.1A
Authority: CN
Inventors: 孙宇; 胡绍勇
Original assignee: Information and Data Security Solutions Co Ltd
Current assignee: Information and Data Security Solutions Co Ltd
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2021-12-03

Abstract

A method and system for autonomous learning and judgment of security alarm events, belonging to the technical field of data processing, which solve the problem of how to judge, through autonomous learning, whether a given piece of alarm data represents a security vulnerability when the volume of alarm data is massive. The security event judgment model is trained, the errors of the calculation results are corrected, and the current alarm data is input for judgment. The method learns the historical alarm data, masters its characteristics and judges new data automatically, so that nobody has to face the massive raw data manually; human error is reduced and efficiency is greatly improved.

Description

Autonomous learning judgment method and system for safety alarm event

Technical Field

The invention belongs to the technical field of data processing, and relates to a safety alarm event autonomous learning judgment method and system.

Background

As shown in FIG. 4, the existing process of converting alarm data into security events is handled entirely by hand: security monitoring personnel must review the data manually to determine which records indicate potential safety hazards or vulnerabilities in the system. Alarm data is characterized by large volume, many dimensions and strong real-time requirements; a typical alarm record contains 10 to 20 raw attributes, such as category, level, description, IP, protocol and port. A human auditor has to review these attributes and make a decision according to the specified rules. Assuming that an auditor takes 1 minute to judge one piece of alarm data, one auditor can review only 8 × 60 = 480 alarms in 8 hours of working time. Processing 48,000 pieces of data a day would require 100 auditors, and processing 4.8 million pieces would require 10,000 auditors, which is obviously impractical. Manual review cannot keep up with the growing data volume and the increasingly strict real-time requirements of enterprises, and manual judgment is also prone to errors, omissions and low efficiency.
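The staffing estimate above is plain arithmetic and can be checked with a short sketch. The function names are hypothetical; the constants only restate the assumptions given in the text (1 minute per alarm, an 8-hour working day).

```python
import math

MINUTES_PER_ALARM = 1   # from the text: one alarm is judged per minute
HOURS_PER_SHIFT = 8     # from the text: 8 hours of working time


def alarms_per_auditor(minutes_per_alarm=MINUTES_PER_ALARM, hours=HOURS_PER_SHIFT):
    """Number of alarms a single auditor can review in one shift."""
    return hours * 60 // minutes_per_alarm


def auditors_needed(daily_alarms):
    """Smallest number of auditors able to review daily_alarms in one shift."""
    return math.ceil(daily_alarms / alarms_per_auditor())


print(alarms_per_auditor())        # 480
print(auditors_needed(48_000))     # 100
print(auditors_needed(4_800_000))  # 10000
```

The head count grows linearly with the alarm volume, which is why the text calls manual review impractical at this scale.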

To overcome the difficulty of manual review, automatic machine processing needs to be introduced. The judgment rules are configured in a system; after alarm data is received, it is matched against the rule base, and if a rule matches, the corresponding processing is performed. Conventional rule processing has limitations, however: a rule is usually not changed once it has been set, so if the rule contains an error, the error keeps accumulating. In addition, data that is not covered by any rule is missed.

In the prior art, the Chinese patent application "an intelligent alarm method for network security incident", publication number CN110457906A, published on November 15, 2019, discloses the following steps. Hyper-parameter optimization: based on historical network security data, hyper-parameter optimization is performed on the model parameter theta of a long short-term memory network model that follows quantile regression, to obtain the optimal model parameters of that model. Training and solidifying: based on the optimized model parameters, the long short-term memory network model that follows quantile regression is trained offline and solidified. Intelligent alarm interval calculation: based on online network security data, an intelligent alarm interval for network security is calculated through the long short-term memory network model that follows quantile regression. Interval comparison: the online network security data is compared with the intelligent alarm interval, and an alarm is given if the data exceeds the interval. However, that document does not solve the problem that a machine-learning-based intelligent judging algorithm misses data that is not covered by the rules.

Disclosure of Invention

The invention aims to design a safety alarm event autonomous learning judgment method and system, so as to solve the problem of how to judge, through autonomous learning, whether a given piece of alarm data is a security vulnerability when the volume of alarm data is massive.

The invention solves the technical problems through the following technical scheme:

A safety alarm event autonomous learning judgment method comprises the following steps:

S1, constructing a safety event judgment model, wherein the safety event judgment model comprises: a score function and a judgment function;

S2, training the safety event judgment model: initializing each weight value of the score function, reading a historical data sample set, converting the historical data sample set into a matrix form, inputting each alarm data in the matrix into the score function to obtain a corresponding score value, and substituting the score value of each alarm data into the judgment function to obtain a calculation result;

S3, error correction of the calculation result: subtracting the corresponding calculation result from the real result of the alarm data to obtain a result error value, judging the result error value, adjusting each weight value of the score function according to the judgment result and finishing the training of the safety event judgment model;

S4, inputting current alarm data for judgment: substituting the current alarm data into the trained score function to obtain the score value of the current alarm data, substituting the score value into the judgment function to obtain a calculation result, and judging whether the current alarm data is a security vulnerability according to the calculation result.

According to this technical scheme, a safety event judgment model is constructed, the characteristics of the historical alarm data are learned, the errors of the calculation results are corrected, and the current alarm data is input for judgment. New data is judged automatically, nobody has to face the massive raw data manually, human error is reduced, and efficiency is greatly improved.

As a further improvement of the technical solution of the present invention, the score function described in step S1 is $y = w_0 + w_1 x_1 + \dots + w_n x_n$, and the judgment function is $h(y) = \mathrm{sigmoid}(y)$; wherein $x_1 \dots x_n$ respectively represent the 1st … nth dependent variables corresponding to the alarm data, $w_1 \dots w_n$ are respectively the weight values of $x_1 \dots x_n$, and $w_0$ is a fixed value used to adjust the output value.
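The two functions named in this paragraph can be written down directly. The following is a minimal sketch, assuming the alarm attributes have already been converted to numbers; the names `score` and `judge` and the sample weights are hypothetical and do not come from the patent.

```python
import math


def score(weights, x):
    """Score function y = w0 + w1*x1 + ... + wn*xn, where weights[0] is w0."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def judge(y):
    """Judgment function h(y) = sigmoid(y) = 1 / (1 + e^(-y)).

    The two branches are algebraically identical; splitting on the sign of y
    only avoids overflow in math.exp for very large negative scores.
    """
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


# Hypothetical weights and one hypothetical, already numeric alarm record.
weights = [0.0, 0.5, -0.25]
alarm = [2.0, 4.0]

y = score(weights, alarm)   # 0.0 + 0.5*2.0 - 0.25*4.0 = 0.0
print(y, judge(y))          # 0.0 0.5
```

A score of exactly 0 maps to 0.5, the midpoint of the sigmoid output range.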

As a further improvement of the technical solution of the present invention, the historical data sample set in step S2 includes: attribute values and judgment results, wherein the attribute values comprise alarm types, alarm levels, asset numbers, application layer protocols and alarm ports.

As a further improvement of the technical solution of the present invention, in step S3, the result error value is judged by using a square loss function or a logarithmic loss function.

As a further improvement of the technical solution of the present invention, the step S3 of adjusting each weight value of the score function according to the evaluation result and completing the training of the security event judgment model includes:

step S31, when the result error value is positive, the weight values of the score function are adjusted down, and when the result error value is negative, the weight values of the score function are adjusted up;

step S32, obtaining new score values of each alarm data according to the adjusted score function;

and repeating the steps S31 and S32, and finishing the training of the safety event judgment model when the new score value is optimal.
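The repetition of steps S31 and S32 can be sketched as a training loop. The sketch below does not reproduce the patent's adjustment rule literally: it swaps in the standard logistic regression update `w_i += alpha * error * x_i`, in which the direction of each adjustment depends on the sign of both the error and the attribute value. The step size `alpha`, the fixed number of passes and the toy data are assumptions, and all weights start at 1 as in Example One.

```python
import math


def sigmoid(y):
    # Two algebraically identical forms, chosen to avoid overflow in exp().
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def predict(weights, x):
    """Calculation result h(y) for one alarm record x."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def train(samples, labels, alpha=0.1, epochs=1000):
    """Repeat the adjust (S31) and rescore (S32) steps for a fixed number of passes."""
    weights = [1.0] * (len(samples[0]) + 1)      # initialise every weight to 1
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, x)  # real result minus calculation result
            weights[0] += alpha * error            # w0 behaves like a weight on a constant 1
            for i, xi in enumerate(x, start=1):
                weights[i] += alpha * error * xi   # adjust up or down by sign of error * xi
    return weights


# Hypothetical toy data: one numeric attribute, result 1 when it is large.
samples = [[0.0], [1.0], [3.0], [4.0]]
labels = [0, 0, 1, 1]

weights = train(samples, labels)
print([round(predict(weights, x)) for x in samples])
```

With a negative attribute value and a positive error, this update lowers the weight, which is the direction step S31 states. The stopping rule here is a fixed number of passes; the text instead stops when the score is optimal, which would need a validation criterion not specified in the source.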

A security alarm event autonomous learning decision system, comprising: the system comprises a model building module, a model training module, an error correction module and a judgment module;

the model building module is used for constructing a safety event judgment model, and the safety event judgment model comprises: a scoring function and a judging function;

the model training module is used for training the safety event judgment model: initializing each weight value of a score function, reading a historical data sample set, converting the historical data sample set into a matrix form, inputting each alarm data in the matrix into the score function to obtain a corresponding score value, and substituting the score value of each alarm data into a judgment function to obtain a calculation result;

the error correction module is used for correcting the error of the calculation result: subtracting the corresponding calculation result from the real result of the alarm data to obtain a result error value, judging the result error value, adjusting each weight value of the score function according to the judgment result and finishing the training of the safety event judgment model;

the judging module is used for inputting current alarm data for judgment: and substituting the current alarm data into the trained score function to obtain a score value of the current alarm data, and substituting the score value into a judgment function to obtain a calculation result so as to judge whether the current alarm data is a security vulnerability.

As a further improvement of the technical solution of the present invention, the score function in the model building module is $y = w_0 + w_1 x_1 + \dots + w_n x_n$, and the judgment function is $h(y) = \mathrm{sigmoid}(y)$; wherein $x_1 \dots x_n$ respectively represent the 1st … nth dependent variables corresponding to the alarm data, $w_1 \dots w_n$ are respectively the weight values of $x_1 \dots x_n$, and $w_0$ is a fixed value used to adjust the output value.

As a further improvement of the technical scheme of the invention, the historical data sample set in the model training module comprises: attribute values and judgment results, wherein the attribute values comprise alarm types, alarm levels, asset numbers, application layer protocols and alarm ports.

As a further improvement of the technical scheme of the invention, the error correction module adopts a square loss function or a logarithmic loss function to judge the result error value.

As a further improvement of the technical solution of the present invention, the error correction module includes:

the weight value adjusting submodule is used for reducing each weight value of the score function when the result error value is positive and increasing each weight value of the score function when the result error value is negative;

the calculating submodule is used for obtaining a new score value of each alarm data according to the adjusted score function;

and the determining submodule is used for finishing the training of the safety event judgment model when the new score value is optimal.

The invention has the advantages that:

Drawings

FIG. 1 is a flow chart of a method for autonomous learning and determining a security alarm event according to an embodiment of the present invention;

FIG. 2 is a diagram of mapping alarm data to points on a plane according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of various types of dots divided by a straight line in accordance with an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a conventional manual determination of a security alarm event;

FIG. 5 is a schematic diagram of the principle of autonomous learning judgment of a security alarm event according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As described in the background art, the problem of missed judgment exists when an alarm is processed currently, but the problem in this respect can be effectively solved by the intelligent studying and judging algorithm based on machine learning, which is specifically shown in fig. 5: in the application, a plurality of 'rules' (actually parameters in an algorithm) are summarized from a large amount of historical data to initialize the system. When the system runs, the algorithm is further optimized through the continuously accumulated data, the robustness is increased, and the missing judgment is reduced.

The technical scheme of the invention is further described by combining the drawings and the specific embodiments in the specification:

Example One

As shown in fig. 1, an autonomous learning and determining method for a security alarm event includes the following steps:

1. Constructing a security event judgment model, wherein the security event judgment model comprises a score function and a judgment function; the score function is $y = w_0 + w_1 x_1 + \dots + w_n x_n$ and the judgment function is $h(y) = \mathrm{sigmoid}(y)$; wherein $x_1 \dots x_n$ respectively represent the 1st … nth dependent variables corresponding to the alarm data, $w_1 \dots w_n$ are respectively the weight values of $x_1 \dots x_n$, and $w_0$ is a fixed value used to adjust the output value.

2. Training a safety event judgment model: initializing each weight value of a score function to 1, reading a historical data sample set, converting the historical data sample set into a matrix form, inputting each alarm data in the matrix into the score function to obtain a corresponding score value, and substituting the score value of each alarm data into a judgment function to obtain a calculation result; the historical data sample set comprises: attribute values and a judgment result, wherein the attribute values are respectively as follows: alarm type, alarm level, number of assets, application layer protocol, alarm port.

3. Error correction of the calculation results: subtracting the calculation result from the real data result to obtain a result error value, judging the error value, and correspondingly reducing or increasing each weight value of the score function according to the judgment result; the error value is judged by adopting a square loss function.

4. Inputting current alarm data for judgment: the current alarm data is substituted into the trained score function to obtain the score value of the current alarm data, and the score value is substituted into the judgment function to obtain a calculation result, from which it is judged whether the current alarm data is a security vulnerability. The judgment is made as follows: when the calculation result is between 0 and 0.5, the current alarm data is judged not to be a security vulnerability; when the calculation result is between 0.5 and 1, it is judged to be a security vulnerability.
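Steps 2 and 4 above, converting records into matrix form and thresholding the calculation result, can be sketched as follows. The field names, the weights and the two records are hypothetical. The treatment of a result of exactly 0.5 as a vulnerability is also an assumption, since the text does not say which side the boundary belongs to.

```python
import math

# Column order follows the attributes listed in Example One; all values are
# assumed to have been mapped to numbers beforehand.
FEATURES = ["alarm_type", "alarm_level", "asset_count", "protocol", "port"]


def to_matrix(records):
    """Convert a list of alarm dicts into a matrix with one row per alarm."""
    return [[float(record[name]) for name in FEATURES] for record in records]


def sigmoid(y):
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def is_vulnerability(weights, row, threshold=0.5):
    """Return True when the calculation result falls in the 0.5 to 1 range."""
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return sigmoid(y) >= threshold   # assumption: exactly 0.5 counts as a vulnerability


# Hypothetical trained weights (w0 first) and two hypothetical alarm records.
weights = [-4.0, 1.0, 1.0, 0.0, 0.0, 0.0]
records = [
    {"alarm_type": 1, "alarm_level": 1, "asset_count": 2, "protocol": 1, "port": 1},
    {"alarm_type": 3, "alarm_level": 3, "asset_count": 2, "protocol": 1, "port": 1},
]

results = [is_vulnerability(weights, row) for row in to_matrix(records)]
print(results)   # [False, True]
```

The first record scores -2 and the second scores 2, so they land on opposite sides of the 0.5 boundary.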

As shown in FIG. 2, when determining whether a piece of alarm data is a security event, the result can take only the values 0 and 1, which respectively indicate that the alarm data is not, or is, a security event. The input variables (the variables $x_1 \dots x_n$ that this document calls dependent variables) are many, for example: alarm type, alarm level, risk value, number of associated assets, associated units, etc. A security event judgment function therefore needs to be designed that accepts arbitrary values of these variables as input and outputs a result of 0 or 1.

All data are mapped to points on a plane, and different types of points are drawn with different shapes (for example, the squares and triangles in the figure represent two different types of points). The points are distributed over different regions, and a boundary, which is a curve, can be drawn through the space between those regions. Although the curve can separate the regions perfectly, its mathematical expression is relatively complicated and is not suitable for engineering application. The service data of this embodiment is special in that every result is either "0" or "1", so a simpler boundary can be used.

As shown in FIG. 3, the different types of points can be separated by a straight line. The equation of this line can be expressed as $y = w_0 + w_1 x_1 + \dots + w_n x_n$, wherein $x_1 \dots x_n$ respectively represent the 1st … nth dependent variables corresponding to the alarm data, $w_1 \dots w_n$ are respectively the weight values of $x_1 \dots x_n$, and $w_0$ is a fixed value used to adjust the output value.

The respective weight values then need to be calibrated by the real alarm data. The specific process is as follows:

A boundary value is set, the alarm data is input into the security event judgment function for calculation, and the result is compared with the boundary value: if the result is greater than the boundary value, the data is judged to belong to class 1, and if it is less than the boundary value, the data is judged to belong to class 2. This realizes the intelligent classification function. For example, given a sample set, each sample has values in five dimensions (alarm type, alarm level, number of assets, application layer protocol and alarm port) and one result value. If the value of a dimension is non-numeric, it is mapped to a number through a mapping rule so that mathematical calculation is convenient. The mapping rules differ from dimension to dimension, as shown in the following table:

Dimension	True value	Mapped value
Application layer protocol	http	1
Application layer protocol	tcp	2
Application layer protocol	udp	3
Alarm port	80	1
Alarm port	22	2
Alarm port	3306	3
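The mapping table above can be represented as one lookup table per dimension. The sketch below is only illustrative: the dictionary layout and the function name `map_value` are hypothetical, and assigning the next unused integer to a previously unseen value is an assumption about how the table might be extended, not something the source specifies.

```python
# One lookup table per dimension, holding exactly the rows of the table above.
MAPPING_RULES = {
    "application_layer_protocol": {"http": 1, "tcp": 2, "udp": 3},
    "alarm_port": {"80": 1, "22": 2, "3306": 3},
}


def map_value(dimension, true_value):
    """Map a raw attribute value to its number, extending the table if needed."""
    table = MAPPING_RULES[dimension]
    key = str(true_value).lower()
    if key not in table:
        # Assumption: a new value receives the next unused integer.
        table[key] = len(table) + 1
    return table[key]


print(map_value("application_layer_protocol", "tcp"))  # 2
print(map_value("alarm_port", 3306))                   # 3
print(map_value("alarm_port", 443))                    # 4 (newly added entry)
```

Keeping a separate table per dimension matches the statement that the mapping rules differ between dimensions.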

The mapping table is extended continuously as the service requires. The sample result takes one of two values, represented by 0 and 1; the data is shown below:

The task of machine learning is to find a function that predicts the probability of a 1 result given the values of two dimensions of a datum. The model for this function is as follows: $h(y) = \mathrm{sigmoid}(y)$, where $y = w_0 + w_1 x_1 + \dots + w_n x_n$.

The sigmoid function is an S-shaped curve, also called the logistic function. Whatever argument is passed in, it returns a result between 0 and 1, which makes it particularly suitable for yes-or-no decisions such as this one: for example, a function value between 0 and 0.5 is treated as "no" and a value between 0.5 and 1 as "yes". What is passed in here is the "score" of each alarm. The function y describes the score of each piece of alarm data: x denotes an individual attribute of the alarm data and w denotes the weight, or coefficient, of that attribute. Each attribute of the alarm is multiplied by its coefficient, the products are added together with the fixed value $w_0$, and the resulting value is the score of the alarm.
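For reference, the standard definition of the logistic function and the properties relied on in this paragraph can be written out as follows. This is the usual textbook form; the source itself only names the function.

```latex
\[
\operatorname{sigmoid}(y) = \frac{1}{1 + e^{-y}}, \qquad
\lim_{y \to -\infty} \operatorname{sigmoid}(y) = 0, \quad
\operatorname{sigmoid}(0) = \tfrac{1}{2}, \quad
\lim_{y \to +\infty} \operatorname{sigmoid}(y) = 1 .
\]
```

Because the function is strictly increasing, a result above 0.5 is equivalent to a score $y > 0$, so the 0.5 boundary on the output corresponds to the line $w_0 + w_1 x_1 + \dots + w_n x_n = 0$ in the attribute space.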

The problem now becomes one of finding the optimal values of the parameters $w$ ($w_0, w_1, \dots, w_n$) from the existing sample data. Suppose $w$ is given some initial values and the data of sample 1 and sample 2 are substituted in to see how well the function predicts; assume that the predicted value of sample 1 is $p_1 = 0.8$ and the predicted value of sample 2 is $p_2 = 0.4$.

The error of the function on sample 1 is $E_1 = 1 - 0.8 = 0.2$, the error on sample 2 is $E_2 = 0 - 0.4 = -0.4$, and the total error is $E = E_1 + E_2 = -0.2$. As shown in the following table:

Knowing the error of the algorithm, we need to improve the algorithm so as to minimize the error. There are many ways to measure the error value, such as a square loss function or a logarithmic loss function. The square loss function corresponds to the least squares method, whose rationale rests on the central limit theorem: the difference between the predicted value and the actual value of each test datum is squared, and the squares are then accumulated.
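The two loss functions named here can be evaluated on the two samples from the worked example (true results 1 and 0, predicted values 0.8 and 0.4). This is a minimal sketch; the logarithmic loss is written in its usual binary cross-entropy form, which the source names but does not spell out.

```python
import math

targets = [1, 0]           # true results of sample 1 and sample 2
predictions = [0.8, 0.4]   # predicted values p1 and p2 from the text

# Signed errors, as in the worked example: real result minus calculation result.
errors = [t - p for t, p in zip(targets, predictions)]

# Square loss: square each difference, then accumulate.
squared_loss = sum(e * e for e in errors)

# Logarithmic loss (binary cross-entropy), summed over the samples.
log_loss = -sum(
    t * math.log(p) + (1 - t) * math.log(1 - p)
    for t, p in zip(targets, predictions)
)

print(round(sum(errors), 2))   # -0.2
print(round(squared_loss, 4))  # 0.2
print(round(log_loss, 4))      # 0.734
```

The signed errors partly cancel (0.2 and -0.4 sum to -0.2), whereas both loss functions accumulate a non-negative contribution from every sample, which is why they are the quantities that are minimized.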

For sample 1, the predicted value is smaller than the true value, so we want to increase the function output, i.e. increase the value of $w_1 x_1$. Since $x_1$ is negative, the value of $w_1$ must be reduced to achieve this. For sample 2, the predicted value is larger than the true value, so we want to reduce the function output, i.e. decrease the value of $w_1 x_1$. Since $x_1$ is negative, the value of $w_1$ must be increased to achieve this. Under the same algorithm, therefore, the two samples call for opposite adjustments of the same coefficient, and a trade-off has to be made. For example, if an adjustment greatly reduces the error of sample 1 while only slightly increasing the error of sample 2, the adjustment can be made. The size of each adjustment can be expressed by a variable alpha, and the adjustment is tried repeatedly in very small steps. The attempts end when the final accuracy is highest.
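The trial-and-error adjustment described in this paragraph can be sketched as a greedy search on a single coefficient: nudge it by a small step, keep the change only if the total error over all samples falls, and stop when neither direction helps. The step size `alpha`, the use of total square loss as the accuracy measure, the function names and the two samples (both with a negative $x_1$, as in the text) are assumptions made for illustration.

```python
import math


def sigmoid(y):
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def total_squared_error(weights, samples, targets):
    """Square loss accumulated over all samples for the given weights."""
    total = 0.0
    for x, t in zip(samples, targets):
        y = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
        total += (t - sigmoid(y)) ** 2
    return total


def tune_weight(weights, index, samples, targets, alpha=0.01, max_steps=10_000):
    """Nudge weights[index] by +/- alpha while the total error keeps falling."""
    weights = list(weights)
    best = total_squared_error(weights, samples, targets)
    for _ in range(max_steps):
        improved = False
        for step in (alpha, -alpha):
            trial = list(weights)
            trial[index] += step
            err = total_squared_error(trial, samples, targets)
            if err < best:                 # accept only if the overall error drops
                weights, best, improved = trial, err, True
                break
        if not improved:                   # neither direction helps: stop trying
            break
    return weights, best


# Hypothetical samples: sample 1 is under-predicted, sample 2 is over-predicted.
samples = [[-2.0], [-0.5]]
targets = [1, 0]
start = [0.0, 1.0]

tuned, best = tune_weight(start, 1, samples, targets)
print(total_squared_error(start, samples, targets), best, tuned[1])
```

In this toy case lowering $w_1$ helps sample 1 more than it hurts sample 2, so the search moves $w_1$ down until the combined error stops improving. Such a search stops at the first local minimum it reaches, which is a limitation of this sketch rather than a statement about the patented method.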

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for self-learning and judging of safety alarm events, characterized by comprising the following steps:

S1. Construct a security event judgment model, the security event judgment model including: a score function and a judgment function;

S2. Train the security event judgment model: initialize each weight value of the score function, read the historical data sample set, convert the historical data sample set into a matrix form, input each alarm data in the matrix into the score function to obtain the corresponding score value, and substitute the score value of each alarm data into the judgment function to obtain the calculation result;

S3. Error correction of the calculation result: subtract the corresponding calculation result from the real result of the alarm data to obtain the result error value, judge the result error value, adjust each weight value of the score function according to the judgment result, and complete the training of the security event judgment model;

S4. Input the current alarm data for judgment: substitute the current alarm data into the trained score function to obtain the score value of the current alarm data, substitute the score value into the judgment function to obtain the calculation result, and judge whether the current alarm data is a security vulnerability according to the calculation result.

2. The method for self-learning and judging of security alarm events according to claim 1, wherein the score function described in step S1 is $y = w_0 + w_1 x_1 + \dots + w_n x_n$, and the judgment function is $h(y) = \mathrm{sigmoid}(y)$; wherein $x_1 \dots x_n$ respectively represent the 1st … nth dependent variables corresponding to the alarm data, $w_1 \dots w_n$ are respectively the weight values of $x_1 \dots x_n$, and $w_0$ is a fixed value used to adjust the output value.

3. The method for self-learning and judging of security alarm events according to claim 1, wherein the historical data sample set described in step S2 includes: attribute values and judgment results, and the attribute values include alarm type, alarm level, asset quantity, application layer protocol and alarm port.

4. The method for self-learning and judging of safety alarm events according to claim 1, wherein in step S3, a square loss function or a logarithmic loss function is used to judge the result error value.

5. The method for self-learning and judging of a security alarm event according to claim 1, wherein in step S3, adjusting each weight value of the score function according to the judgment result and completing the training of the security event judgment model comprises:

Step S31, when the result error value is positive, lower each weight value of the score function, and when the result error value is negative, increase each weight value of the score function;

Step S32, obtaining a new score value of each alarm data according to the adjusted score function;

Steps S31 and S32 are repeated, and when the new score value is optimal, the training of the security event judgment model is completed.

6. An autonomous learning and judging system for security alarm events, comprising: a model building module, a model training module, an error correction module, and a judgment module;

The model building module is used to construct a security event judgment model, and the security event judgment model includes: a score function and a judgment function;

The model training module is used to train the security event judgment model: initialize each weight value of the score function, read the historical data sample set, convert the historical data sample set into a matrix form, input each alarm data in the matrix into the score function to obtain the corresponding score value, and substitute the score value of each alarm data into the judgment function to obtain the calculation result;

The error correction module is used for the error correction of the calculation result: subtracting the corresponding calculation result from the real result of the alarm data to obtain the result error value, judging the result error value, adjusting each weight value of the score function according to the judgment result, and completing the training of the security event judgment model;

The judging module is used for inputting the current alarm data for judgment: substituting the current alarm data into the trained score function to obtain the score value of the current alarm data, and then substituting the score value into the judgment function to obtain the calculation result, thereby judging whether the current alarm data is a security vulnerability.

7. The self-learning and judging system for security alarm events according to claim 6, wherein the score function described in the model building module is $y = w_0 + w_1 x_1 + \dots + w_n x_n$, and the judgment function is $h(y) = \mathrm{sigmoid}(y)$; wherein $x_1 \dots x_n$ respectively represent the 1st … nth dependent variables corresponding to the alarm data, $w_1 \dots w_n$ are respectively the weight values of $x_1 \dots x_n$, and $w_0$ is a fixed value used to adjust the output value.

8. The self-learning and judging system for security alarm events according to claim 6, wherein the historical data sample set in the model training module includes: attribute values and judgment results, and the attribute values include alarm type, alarm level, asset quantity, application layer protocol and alarm port.

9. The system according to claim 6, wherein the error correction module adopts a square loss function or a logarithmic loss function to judge the result error value.

10. The self-learning and judging system for security alarm events according to claim 6, wherein the error correction module comprises:

The weight value adjustment sub-module is used to lower each weight value of the score function when the result error value is positive, and increase each weight value of the score function when the result error value is negative;

a calculation sub-module for obtaining a new score value of each alarm data according to the adjusted score function;

The determination sub-module is used to complete the training of the security event judgment model when the new score value is optimal.