CN112463740A

CN112463740A - Method and system for automatic log security audit

Info

Publication number: CN112463740A
Application number: CN202011295513.1A
Authority: CN
Inventors: 徐潇
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2021-03-09

Abstract

The invention discloses a method and a system for automated log security audit. The method comprises: obtaining redundant log information, converting it into numerical log data through conversion rules, and training a mean shift model; obtaining the log information to be audited, converting it through the same rules, feeding it into the trained mean shift model, and comparing the resulting distance with a preset radius. If the distance exceeds the preset radius, the log information to be audited is classified as 'suspicious log information'; if it does not exceed the preset radius, the log information to be audited is classified as 'normal log information'. In this way the invention makes use of redundant logs: it establishes the data conversion rules of the logs from the redundant logs, establishes a mean shift model through a neural network, trains the model with the redundant logs, and uses the trained model to screen suspicious log information during log audit.

Description

Method and system for automatic log security audit

Technical Field

The invention relates to the technical field of information security, in particular to a method and a system for automatic log security audit.

Background

At present, every network platform produces various types of logs, and these logs are generated continuously every day. The maintenance personnel of each platform must screen the massive volume of logs for information that may contain suspicious behavior and then track and handle it. Because the volume of system log data is huge, the longer a system runs, the more redundant log information accumulates. Screening that relies entirely on manual effort will inevitably miss some entries.

Current system log auditing methods analyze only the current log through an analysis module, and the results are relatively inaccurate. Although the policies differ between platforms, old log data is deleted periodically, so a large amount of log information is never fully utilized, and no useful connection is established between the logs of different time periods.

Disclosure of Invention

The technical problem mainly solved by the invention is to provide a method and a system for automated log security audit that make use of redundant logs: the data conversion rules of the logs are established from the redundant logs, a mean shift model is established through a neural network and trained with the redundant logs, and the trained model is used to screen suspicious log information during log audit.

In order to solve the technical problems, the invention adopts a technical scheme that: a method for automated log security audit is provided, comprising: collecting log information in a server;

establishing a conversion rule, and converting data in the log information into digital data; the conversion rule is that different operation behaviors are replaced by different numerical values;

converting the data of the log information in the server into digital data for a mean shift model through the conversion rule, training the mean shift model with the digital data, and finding the centroid of the log information data;

setting the length of the judgment radius; when the log information is audited, calculating the distance between the log information data to be audited and the centroid of the log information data by using the mean shift model, and comparing the distance with the length of the judgment radius;

if the distance exceeds the length of the judgment radius, the log information needing auditing belongs to suspicious log information;

if the distance does not exceed the length of the judgment radius, the log information needing auditing belongs to normal log information.

Further, the conversion rule includes an IP digital conversion rule, a request type conversion rule, and a return code conversion rule.

Further, the request type conversion rule is to use different variable values instead of different requests.

Further, the return code conversion rule is that a plurality of return codes are multiplied by a uniform coefficient, and a value multiplied by the uniform coefficient is used as a variable value of the return code.

Further, the IP digital conversion rule is that the daily, weekly and monthly frequencies with which each of a plurality of IPs appears are recorded by the background, and the security level coefficients of the plurality of IPs are calculated.

Further, the step of calculating the security level coefficients of the plurality of IPs comprises the following steps:

taking the data in an IP address as a four-dimensional vector, and adding three matching variables, namely the frequency with which the IP appears each day, each week and each month, so that the information forms a seven-dimension vector;

setting a safe network IP or network segment and setting a custom coefficient C;

finding, in the seven-dimension vector, the frequency of the IP in the current week and in the current month, calculating from them the average daily frequency M and the average weekly frequency N of the IP, and comparing M and N with the frequency A of the IP today and the frequency B of the IP in the current week.

Further, the comparison comprises:

taking the reference value as 1; when A < M:

if B < N, the security level coefficient is 1 × C;

if B > N, the security level coefficient is [1 + (B - N)/N] × C;

when A > M, the security level coefficient is [1 + (A - M)/M] × C.

A system for automated log security audit, comprising: the system comprises a log acquisition module, a rule conversion module, a model training module and a model application module;

the log acquisition module acquires log information; the log information comprises redundant log information and log information needing auditing;

the rule conversion module converts data in the log information into digital data of a mean shift model through conversion rules;

the model training module determines a clustering radius, acquires redundant log information acquired by the log acquisition module, and trains a mean shift model through a conversion rule of the rule conversion module;

the model application module obtains the log information to be audited collected by the log acquisition module, converts it through the conversion rules of the rule conversion module, inputs the converted data into the mean shift model trained by the model training module, and identifies suspicious log information through the mean shift model and the clustering radius.

Further, the rule conversion module comprises an IP digital conversion rule module, a request type conversion rule module and a return code conversion rule module; the IP digital conversion rule module records the daily occurrence frequency, the weekly occurrence frequency and the monthly occurrence frequency of the IP through a background and calculates the security level coefficient of the IP; the request type conversion rule module expresses different requests by using different variables; and the return code conversion rule module multiplies the return code by a uniform coefficient to obtain a variable of the return code.

The invention has the following beneficial effects: by training to extract feature centers, the invention also makes use of the information hidden in the logs and extracts the residual value of redundant logs; the automated log audit method helps maintenance personnel screen massive logs, which saves a great deal of labor and time and improves log audit efficiency; and by setting the security level, it can be ensured that log information of suspicious behavior is not missed, which improves the effectiveness of log audit.

Drawings

FIG. 1 is a flow diagram of a method of automated log security audit in accordance with the present invention;

FIG. 2 is a flow chart of a method step by step for automated log security audit in accordance with the present invention;

FIG. 3 is a schematic diagram of a system architecture for automated log security audit according to the present invention.

Detailed Description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.

The embodiment of the invention comprises the following steps:

referring to fig. 1, a method for automated log security audit includes: recording all normal log data in the server for training a mean shift model;

analyzing sample characteristics needing to be classified, wherein the classification comprises two types, one type is safe operation behavior, and the other type is non-safe operation behavior;

the safe operation behavior comprises the following characteristics: fixed IP or white list IP access (i.e., IP segments that occur frequently), operations to view system information, operations to customize maintenance backups, and so forth.

Non-secure operation behavior includes the following features: access from rarely seen IPs, multiple failed login attempts, repeated login and logout within a fixed time, frequent data access within a fixed time, multiple data requests that cause the system to respond with 401/403/404 or 500-series codes, and crafted injection behavior.

Most logs produced in daily system operation and maintenance belong to safe operation behavior. The log samples from daily system maintenance therefore have uniform features, and the features of the operation behavior are distinct and stable, which is why the mean shift model is selected.

And abstracting the log information, establishing a set of conversion rules, and converting the data in the log information into digital data of a mean shift model.

The conversion rule uses values of different magnitudes as variable values of different operation behaviors, and comprises an IP digital conversion rule, a request type conversion rule and a return code conversion rule;

The IP digital conversion rule is as follows. The destination IP and the source IP in the log information each consist of four numbers, so an IP can be treated directly as one four-dimensional vector. Three matching variables are added: the frequency with which the IP appears each day, each week and each month. Together this information is used to calculate one security level coefficient for the IP. For example, for the IP 100.25.3.44, the background records show that it appeared 100 times today, 160 times this week and 5000 times this month.

These data form one seven-dimension vector for this IP: [100, 25, 3, 44, 100, 160, 5000].
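As an illustration only, the construction of the seven-dimension vector can be sketched in Python as follows. The function name ip_to_vector and its parameter names are hypothetical and do not appear in the original text; the sketch also assumes a dotted-quad IPv4 address.

```python
from typing import List


def ip_to_vector(ip: str, daily: int, weekly: int, monthly: int) -> List[int]:
    """Build the seven-dimension vector: four IP numbers plus three frequencies."""
    octets = [int(part) for part in ip.split(".")]
    # Assumption: only dotted-quad IPv4 addresses are handled in this sketch.
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return octets + [daily, weekly, monthly]


vector = ip_to_vector("100.25.3.44", daily=100, weekly=160, monthly=5000)
print(vector)  # [100, 25, 3, 44, 100, 160, 5000]
```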

One security level coefficient is then calculated for the IP; in practice the coefficient can be customized. The higher the security level coefficient, the greater the potential threat; the lower the security level coefficient, the safer the IP.

First, a network IP or network segment that the system considers safe can be set in advance. For example, 100.25.3.1 is set as a trusted network segment and assigned a small custom coefficient, for example 0.1.

If an IP outside this segment appears, e.g., 120.4.22.43, it is assigned a larger custom coefficient, for example 0.6.

From the seven-dimension vector, take the frequency of the IP in the current week and in the current month, and calculate from them the average daily frequency and the average weekly frequency of the IP. Compare these averages with today's frequency and this week's frequency, taking the reference value as 1:

1. Today's frequency < average daily frequency:

if this week's frequency is less than the average weekly frequency, the security level coefficient is considered low, and

security level coefficient = reference value × custom coefficient;

if this week's frequency is greater than the average weekly frequency, the security level coefficient is considered high, and

security level coefficient = [1 + (this week's frequency - average weekly frequency) / average weekly frequency] × custom coefficient.

2. Today's frequency > average daily frequency:

security level coefficient = [1 + (today's frequency - average daily frequency) / average daily frequency] × custom coefficient.

This allows the security level coefficient of the IP to be calculated.
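The two-case comparison above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the definitive calculation: the original text does not say how the averages are derived, so the sketch assumes average daily frequency = this week's frequency / 7 and average weekly frequency = this month's frequency / 4, and it treats equal values (which the text does not cover) like the "less than" case. The function and parameter names are hypothetical.

```python
def security_level_coefficient(today, this_week, this_month, custom_coefficient,
                               days_per_week=7.0, weeks_per_month=4.0):
    """Security level coefficient of one IP, following the two cases in the text."""
    reference = 1.0
    avg_daily = this_week / days_per_week       # assumed derivation of the daily average
    avg_weekly = this_month / weeks_per_month   # assumed derivation of the weekly average

    # Case 2: today's frequency exceeds the average daily frequency.
    if avg_daily > 0 and today > avg_daily:
        return (reference + (today - avg_daily) / avg_daily) * custom_coefficient
    # Case 1, second branch: this week's frequency exceeds the weekly average.
    if avg_weekly > 0 and this_week > avg_weekly:
        return (reference + (this_week - avg_weekly) / avg_weekly) * custom_coefficient
    # Case 1, first branch (and, by assumption, ties or an IP with no history).
    return reference * custom_coefficient


# The example IP from the text: 100 times today, 160 this week, 5000 this month.
coefficient = security_level_coefficient(100, 160, 5000, custom_coefficient=0.1)
print(round(coefficient, 4))
```

Under these assumptions the example IP falls into case 2, because 100 exceeds the assumed daily average of 160 / 7, and the result is approximately 0.4375 for a custom coefficient of 0.1.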

The request type conversion rule assigns the variable values 1, 10, 100, 1000, etc. to the different request types GET/POST/PUT/DELETE, etc., respectively.

The return code conversion rule multiplies the value of the return code itself (200/404/500, etc.) by a uniform coefficient and uses the product as the value of the variable.
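These two rules can be sketched in Python as follows. The mapping for GET/POST/PUT/DELETE follows the values given above; the value 0.01 for the uniform coefficient is an assumption chosen only for illustration, since the original text does not state it, and the function names are hypothetical.

```python
REQUEST_TYPE_VALUES = {"GET": 1, "POST": 10, "PUT": 100, "DELETE": 1000}
RETURN_CODE_COEFFICIENT = 0.01  # hypothetical uniform coefficient


def convert_request_type(method: str) -> int:
    """Replace a request type with its variable value."""
    key = method.upper()
    if key not in REQUEST_TYPE_VALUES:
        # Other request types would need their own values; none are given in the text.
        raise ValueError(f"no variable value defined for request type {method!r}")
    return REQUEST_TYPE_VALUES[key]


def convert_return_code(code: int, coefficient: float = RETURN_CODE_COEFFICIENT) -> float:
    """Multiply the return code by the uniform coefficient."""
    return code * coefficient


print(convert_request_type("POST"))  # 10
print(convert_return_code(500))      # 5.0
```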

After the data conversion rules of the log are established, a program can automatically convert each piece of log information into one input data item for the mean shift model; as the amount of training data increases, multiple centroids (which may be understood as multiple feature centers) of the log information data are gradually found.

Finally, the length R of the judgment radius can be customized according to the required security level: the higher the security requirement, the shorter the radius. When log information is audited, the mean shift algorithm is used to calculate the distance between the log data and the centroids.

Each natural-language log record is converted, in a customized way, into one piece of multidimensional data.

For example, a record can be expressed as:

[security level coefficient, request type, return code, size of the returned body, number of response headers, numerical representations of other information, …]

[0.2, 10, 3.6, 65536, 8, (numerical representations of other information), …]
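One possible way to hold such a record is sketched below in Python. The class name LogFeatures and its field names are hypothetical; only the five named fields of the example are modeled, and the "other information" entries are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogFeatures:
    """Numerical form of one log record (first five fields of the example)."""
    security_level_coefficient: float
    request_type: float
    return_code: float
    body_size: float
    header_count: float

    def to_vector(self) -> List[float]:
        # Keep the field order of the example so every record has the same layout.
        return [self.security_level_coefficient, self.request_type,
                self.return_code, self.body_size, self.header_count]


record = LogFeatures(0.2, 10, 3.6, 65536, 8)
print(record.to_vector())  # [0.2, 10, 3.6, 65536, 8]
```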

The "mean shift model" can compute multiple centroids in model space for a large number of data classes.

For example, the security level coefficients are 0.2, 0.3, 0.5, 0.1, 0.3, and the like, and the center of mass of the security level coefficient data is found through a mean shift model;

when new data arrives, the distance of each data from the centroids can be directly calculated by the coordinates of each point in the model space.

The radius sets how far new data may lie from the centroids. If the new data is farther away than the radius, it is judged to be "suspicious or non-secure log information";

when the distance from a certain log information to each feature center in the model exceeds the judgment radius R, the system judges that the log information belongs to suspicious log information, marks the suspicious log information and prompts maintenance personnel to check and confirm the suspicious log information.
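The judgment described above can be sketched in Python as follows. The sketch assumes Euclidean distance, which the original text does not specify, and the function name audit_record is hypothetical. A record is marked suspicious only when its distance to every centroid exceeds R, which is the same as its distance to the nearest centroid exceeding R.

```python
import math


def audit_record(point, centroids, radius):
    """Return 'suspicious' if the point is farther than radius from every centroid."""
    # Distance to the nearest centroid (Euclidean distance is an assumption).
    nearest = min(math.dist(point, centroid) for centroid in centroids)
    return "suspicious" if nearest > radius else "normal"


centroids = [(0.0, 0.0), (10.0, 10.0)]
print(audit_record((1.0, 1.0), centroids, radius=2.0))  # normal
print(audit_record((5.0, 5.0), centroids, radius=2.0))  # suspicious
```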

Referring to FIG. 2, the method for automated log security audit is described in two parts: the first is mean shift model training, and the second is security level evaluation using the mean shift model.

Mean shift model training

First, redundant log information is obtained and converted through the rules into numerical log data a. The data a is fed into the mean shift model to be trained, the model gradually finds a plurality of centroids of the data a, and the training of the mean shift model is finally completed.

Mean shift model security level evaluation

For security level evaluation, the current latest log information is first obtained and converted through the rules into numerical log data b; the data b is fed into the trained mean shift model, and the distances between the data b and the plurality of centroids of the data a are calculated;

the security level is then judged by determining the length of the clustering radius R;

finally, the distances between the data b and the plurality of centroids of the data a are compared with R: if the distances exceed the radius R, the log information belongs to 'suspicious log information'; if the radius R is not exceeded, the log information belongs to 'normal log information'.

Mean shift (Mean-shift)

The mean shift algorithm is a hill-climbing algorithm based on kernel density estimation and can be used for clustering, image segmentation, tracking and the like. It is centroid-based: its goal is to locate the centroid of each cluster/class. It first calculates the shifted mean of the current point, moves the point to that shifted mean, and then continues moving from this new starting point until the termination condition is met (the densest region is found).
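The iteration described above can be sketched in Python as follows. This is a minimal sketch of the classic mean shift procedure with a flat (uniform) kernel, not the neural-network-based model mentioned elsewhere in this document; the parameter names bandwidth, max_iter and tol, and the rule for merging nearby centroids, are assumptions made for illustration.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: return the list of centroids (density modes) found."""
    modes = []
    for start in points:
        current = tuple(start)
        for _ in range(max_iter):
            # Points inside the window centred on the current position.
            window = [p for p in points if math.dist(p, current) <= bandwidth]
            # Shifted mean: the per-dimension average of the points in the window.
            shifted = tuple(sum(axis) / len(window) for axis in zip(*window))
            moved = math.dist(shifted, current)
            current = shifted
            if moved < tol:  # termination condition: the point no longer moves
                break
        # Merge results that converged to (almost) the same place.
        if not any(math.dist(current, mode) < bandwidth / 2 for mode in modes):
            modes.append(current)
    return modes


samples = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(mean_shift(samples, bandwidth=2.0))  # [(0.5, 0.5), (10.5, 10.5)]
```

Each starting point climbs toward the densest nearby region, so the two groups of sample points yield two centroids without the number of clusters being given in advance.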

Applicable scenarios are as follows:

the method is applicable to the condition that the number of categories is unknown and the number of samples is less than 10K;

at present, the method has wide application in clustering, image smoothing, segmentation, tracking and other aspects.

Referring to fig. 3, based on the same inventive concept as the method for automatic log security audit in the foregoing embodiment, an embodiment of this specification further provides a system for automatic log security audit, including: the system comprises a log acquisition module, a rule conversion module, a model training module and a model application module;

the rule conversion module converts data in the log information into digital data of a mean shift model through conversion rules; the rule conversion module comprises an IP digital conversion rule module, a request type conversion rule module and a return code conversion rule module; the IP digital conversion rule module records the daily occurrence frequency, the weekly occurrence frequency and the monthly occurrence frequency of the IP through a background and calculates the security level coefficient of the IP; the request type conversion rule module expresses different requests by using different variables; the return code conversion rule module multiplies the return code by a uniform coefficient to obtain a variable of the return code;

The model training module acquires redundant log information acquired by the log acquisition module, trains a mean shift model through the conversion rule of the rule conversion module, and sets a clustering radius R;

the model application module obtains the log information to be audited collected by the log acquisition module, converts it through the conversion rules of the rule conversion module, inputs the converted data into the mean shift model trained by the model training module, and identifies suspicious log information through the mean shift model and the clustering radius R.

The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, or applied directly or indirectly to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for automated log security audit, comprising:

collecting log information in a server;

2. The method of claim 1, wherein the method comprises: the conversion rules include an IP number conversion rule, a request type conversion rule and a return code conversion rule.

3. The method of claim 2, wherein the method comprises: the request type conversion rule is to use different variable values instead of different requests.

4. The method of claim 2, wherein the method comprises: the return code conversion rule is that a plurality of return codes are multiplied by a uniform coefficient, and the value multiplied by the uniform coefficient is used as the variable value of the return codes.

5. The method of claim 2, wherein the method comprises: the IP digital conversion rule is that the frequency of the IP appearing every day, the frequency of the IP appearing every week and the frequency of the IP appearing every month are recorded through a background, and the security level coefficients of the IP are calculated.

6. The method of claim 5, wherein the method comprises:

the method for calculating the security level coefficients of the plurality of IPs comprises the following steps:

7. The method of claim 6, wherein the method comprises:

the comparison comprises:

taking the reference value as 1; when A < M:

if B < N, the security level coefficient is 1 × C;

if B > N, the security level coefficient is [1 + (B - N)/N] × C;

when A > M, the security level coefficient is [1 + (A - M)/M] × C.

8. A system for automated log security audit, comprising: the system comprises a log acquisition module, a rule conversion module, a model training module and a model application module;

9. The system for automated log security audit according to claim 8 wherein:

the rule conversion module comprises an IP digital conversion rule module, a request type conversion rule module and a return code conversion rule module; the IP digital conversion rule module records the daily occurrence frequency, the weekly occurrence frequency and the monthly occurrence frequency of the IP through a background and calculates the security level coefficient of the IP; the request type conversion rule module expresses different requests by using different variables; and the return code conversion rule module multiplies the return code by a uniform coefficient to obtain a variable of the return code.