CN112463740A - Method and system for automatic log security audit - Google Patents

Info

Publication number
CN112463740A
Authority
CN
China
Prior art keywords
log information
frequency
module
log
conversion rule
Legal status
Pending
Application number
CN202011295513.1A
Other languages
Chinese (zh)
Inventor
徐潇 (Xu Xiao)
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
2020-11-18
Filing date
2020-11-18
Publication date
2021-03-09
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011295513.1A
Publication of CN112463740A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a system for automated log security audit. Redundant log information is obtained, converted into numerical data by conversion rules, and used to train a mean shift model. Log information to be audited is then obtained, converted by the same rules, and fed into the trained mean shift model, and its distance to the learned centroids is compared with a preset radius. If the distance exceeds the preset radius, the log information to be audited is classified as 'suspicious log information'; otherwise it is classified as 'normal log information'. In this way the method makes use of redundant logs: it derives data conversion rules for the logs from them, establishes a mean shift model through a neural network, trains the model with the redundant logs, and uses the trained model to screen suspicious log information during log audits.

Description

Method and system for automatic log security audit
Technical Field
The invention relates to the technical field of information security, in particular to a method and a system for automatic log security audit.
Background
At present, every network platform produces various types of logs continuously every day, and the maintenance personnel of each platform must screen out information that may indicate suspicious behavior from these massive logs and then track and handle it. System logs are huge, and the longer a system runs, the more redundant log information accumulates, so screening them completely by hand inevitably leads to omissions.
In current system log auditing methods, the current log is analyzed by an analysis module and the results are relatively inaccurate. Although strategies differ between platforms, old log data are deleted periodically, so much of the log information is never fully used, and no useful connection is established between logs from different time periods.
Disclosure of Invention
The main technical problem solved by the invention is to provide an automated log security audit method and system that make use of redundant logs: data conversion rules for the logs are established from the redundant logs, a mean shift model is established through a neural network and trained with the redundant logs, and the trained model screens suspicious log information during log audits.
In order to solve the technical problems, the invention adopts a technical scheme that: a method for automated log security audit is provided, comprising: collecting log information in a server;
establishing a conversion rule, and converting data in the log information into digital data; the conversion rule is that different operation behaviors are replaced by different numerical values;
converting the data of the log information in the server into digital data for a mean shift model through the conversion rule, training the mean shift model with these digital data, and finding the centroid of the log information data;
setting the length of the judgment radius; when log information is audited, using the mean shift model to calculate the distance between the log information data to be audited and the centroid of the log information data, and comparing the distance with the length of the judgment radius;
if the distance exceeds the length of the judgment radius, the log information needing auditing belongs to suspicious log information;
if the distance does not exceed the length of the judgment radius, the log information needing auditing belongs to normal log information.
Further, the conversion rule includes an IP digital conversion rule, a request type conversion rule, and a return code conversion rule.
Further, the request type conversion rule is to use different variable values instead of different requests.
Further, the return code conversion rule is that a plurality of return codes are multiplied by a uniform coefficient, and a value multiplied by the uniform coefficient is used as a variable value of the return code.
Further, the IP digital conversion rule is that the frequency of the occurrence of a plurality of IPs every day, the frequency of the occurrence of each week and the frequency of the occurrence of each month are recorded through a background, and the security level coefficients of the plurality of IPs are calculated.
Further, the step of calculating the security level coefficients of the plurality of IPs comprises the following steps:
taking data in an IP address as a four-dimensional vector, and adding three matched variables, namely the frequency of daily occurrence, the frequency of weekly occurrence and the frequency of monthly occurrence of the IP; putting the information into a vector with seven dimensions;
setting a safe network IP or network segment and setting a self-defined coefficient C;
finding, in the seven-dimensional vector, the frequency of the IP in the current week and in the current month; calculating from them the average daily frequency M and the average weekly frequency N of the IP; and comparing M and N with the frequency A of the IP today and the frequency B of the IP in the current week, respectively.
Further, the comparison is performed as follows, with a reference value of 1:
when A < M and B < N, the security level coefficient is 1 × C;
when A < M and B > N, the security level coefficient is [1 + (B - N)/N] × C;
when A > M, the security level coefficient is [1 + (A - M)/M] × C.
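The same comparison can be written as one piecewise formula. The symbol S for the security level coefficient is introduced here only for notation; A, M, B, N and C are as defined above, and the text does not state which case applies when A = M or B = N.

```latex
S =
\begin{cases}
1 \times C, & A < M,\; B < N \\
\left(1 + \dfrac{B - N}{N}\right) \times C, & A < M,\; B > N \\
\left(1 + \dfrac{A - M}{M}\right) \times C, & A > M
\end{cases}
```

Because 1 + (A - M)/M = A/M, the last case scales C by the ratio of today's frequency to the average daily frequency, and the middle case likewise scales C by B/N.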
A system for automated log security audit is also provided, comprising a log acquisition module, a rule conversion module, a model training module and a model application module;
the log acquisition module acquires log information; the log information comprises redundant log information and log information needing auditing;
the rule conversion module converts data in the log information into digital data of a mean shift model through conversion rules;
the model training module determines a clustering radius, acquires redundant log information acquired by the log acquisition module, and trains a mean shift model through a conversion rule of the rule conversion module;
the model application module obtains the log information to be audited that is acquired by the log acquisition module, converts it through the rule conversion module, inputs it into the mean shift model trained by the model training module, and identifies suspicious log information using the mean shift model and the clustering radius.
Further, the rule conversion module comprises an IP digital conversion rule module, a request type conversion rule module and a return code conversion rule module; the IP digital conversion rule module records the daily occurrence frequency, the weekly occurrence frequency and the monthly occurrence frequency of the IP through a background and calculates the security level coefficient of the IP; the request type conversion rule module expresses different requests by using different variables; and the return code conversion rule module multiplies the return code by a uniform coefficient to obtain a variable of the return code.
The beneficial effects of the invention are as follows: hidden information in the logs is exploited by training to extract feature centers, so that the residual value of redundant logs is recovered; the automated log audit method helps maintenance personnel screen massive logs, saving considerable labor and time and improving audit efficiency; and by setting security levels, log information indicating suspicious behavior is not overlooked, which improves the effectiveness of log audits.
Drawings
FIG. 1 is a flow diagram of a method of automated log security audit in accordance with the present invention;
FIG. 2 is a step-by-step flow chart of the method for automated log security audit according to the present invention;
FIG. 3 is a schematic diagram of a system architecture for automated log security audit according to the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is defined more clearly.
The embodiment of the invention comprises the following steps:
referring to fig. 1, a method for automated log security audit includes: recording all normal log data in the server for training a mean shift model;
analyzing the characteristics of the samples to be classified; there are two classes, safe operation behavior and non-safe operation behavior;
safe operation behavior has the following characteristics: access from fixed IPs or whitelisted IPs (i.e., frequently occurring IP segments), operations that view system information, custom maintenance and backup operations, and so forth.
Non-safe operation behavior has the following characteristics: access from occasionally seen IPs, multiple failed login attempts, repeated logins and logouts within a fixed time, frequent data access within a fixed time, multiple data requests that cause the system to respond with 401/403/404 or 500-series codes, and crafted injection attempts.
Most logs from daily system operation and maintenance reflect safe operation behavior. The log samples from daily maintenance therefore have uniform characteristics, and the features of the operation behavior are distinct and stable, which is why the mean shift model is selected.
And abstracting the log information, establishing a set of conversion rules, and converting the data in the log information into digital data of a mean shift model.
The conversion rule uses values of different magnitudes as variable values of different operation behaviors, and comprises an IP digital conversion rule, a request type conversion rule and a return code conversion rule;
the IP digital conversion rule: the destination IP and the source IP in the log information each consist of four numbers, so each can be treated directly as a 4-dimensional vector, to which 3 matching variables are appended: the daily, weekly and monthly frequency of occurrence of the IP. This information is combined to calculate a security level coefficient for the IP. For example, for the IP 100.25.3.44, suppose the background records that it appeared 100 times today, 160 times this week and 5000 times this month.
These data form the 7-dimensional vector [100, 25, 3, 44, 100, 160, 5000] for this IP.
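A minimal sketch of how such a vector could be assembled, assuming the daily, weekly and monthly counts are already available from the background records; the function name `ip_feature_vector` is hypothetical.

```python
from typing import List


def ip_feature_vector(ip: str, daily: int, weekly: int, monthly: int) -> List[int]:
    """Build the 7-dim vector [o1, o2, o3, o4, daily, weekly, monthly] for an IPv4 address."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    # the four octets form the 4-dim part; the three frequencies are the matching variables
    return octets + [daily, weekly, monthly]


print(ip_feature_vector("100.25.3.44", 100, 160, 5000))  # [100, 25, 3, 44, 100, 160, 5000]
```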
A security level coefficient is then calculated for the IP; its calculation can be customized in practice. The higher the security level coefficient, the more likely a threat exists; the lower the coefficient, the safer the access.
First, a network IP or network segment that the system considers safe can be set in advance; for example, the segment of 100.25.3.1 is set as a trusted network segment and assigned a custom coefficient of, for example, 0.1.
An IP outside this segment, e.g. 120.4.22.43, is assigned a larger custom coefficient, e.g. 0.6.
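The trusted-segment lookup can be sketched with Python's standard `ipaddress` module. The text names only the address 100.25.3.1, so treating the trusted segment as `100.25.3.0/24` is an assumption, as are the names `TRUSTED_SEGMENTS`, `DEFAULT_COEFFICIENT` and `custom_coefficient`; the coefficients 0.1 and 0.6 are the example values above.

```python
import ipaddress

# Assumed configuration: the trusted segment around 100.25.3.1 is taken to be a /24.
TRUSTED_SEGMENTS = {ipaddress.ip_network("100.25.3.0/24"): 0.1}
DEFAULT_COEFFICIENT = 0.6  # custom coefficient for IPs outside every trusted segment


def custom_coefficient(ip: str) -> float:
    """Return the custom coefficient C for an IP based on its network segment."""
    addr = ipaddress.ip_address(ip)
    for network, coefficient in TRUSTED_SEGMENTS.items():
        if addr in network:
            return coefficient
    return DEFAULT_COEFFICIENT


print(custom_coefficient("100.25.3.44"))  # 0.1
print(custom_coefficient("120.4.22.43"))  # 0.6
```

Keeping the segments in a dictionary lets several trusted segments carry different coefficients without changing the lookup.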
Next, take the IP's frequency of occurrence this week and this month from the seven-dimensional vector, calculate its average daily frequency and average weekly frequency, compare them with today's frequency and this week's frequency respectively, and take the reference value as 1:
1. Today's frequency < average daily frequency:
if this week's frequency is less than the average weekly frequency, the security level coefficient is considered low, and
security level coefficient = reference value × custom coefficient;
if this week's frequency is greater than the average weekly frequency, the security level coefficient is considered high, and
security level coefficient = [1 + (this week's frequency - average weekly frequency) / average weekly frequency] × custom coefficient;
2. Today's frequency > average daily frequency:
security level coefficient = [1 + (today's frequency - average daily frequency) / average daily frequency] × custom coefficient.
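A minimal sketch of this calculation, assuming the average daily frequency is this week's count divided by 7 and the average weekly frequency is this month's count divided by 4; the text does not fix either divisor, so `DAYS_PER_WEEK`, `WEEKS_PER_MONTH` and the function name are assumptions. Ties (today's frequency equal to the average) are assumed to take the "less than" branch.

```python
from typing import Sequence

DAYS_PER_WEEK = 7
WEEKS_PER_MONTH = 4  # assumption: the text does not say how the monthly count is averaged per week


def security_level_coefficient(vector: Sequence[float], custom_coefficient: float) -> float:
    """Security level coefficient from a 7-dim IP vector [o1, o2, o3, o4, today, week, month]."""
    today, this_week, this_month = vector[4], vector[5], vector[6]
    if not 0 <= today <= this_week <= this_month:
        # with consistent counts, the divisions below never divide by zero
        raise ValueError("expected 0 <= today <= this week <= this month")
    avg_daily = this_week / DAYS_PER_WEEK      # M
    avg_weekly = this_month / WEEKS_PER_MONTH  # N
    reference = 1.0                            # reference value from the text
    if today > avg_daily:                      # case 2: A > M
        return (reference + (today - avg_daily) / avg_daily) * custom_coefficient
    if this_week > avg_weekly:                 # case 1: A <= M and B > N (A == M assumed here)
        return (reference + (this_week - avg_weekly) / avg_weekly) * custom_coefficient
    return reference * custom_coefficient      # case 1: A <= M and B <= N


# 100.25.3.44 in the trusted segment (C = 0.1): today 100 > M = 160 / 7, so the result is (100 / M) * 0.1
print(round(security_level_coefficient([100, 25, 3, 44, 100, 160, 5000], 0.1), 4))  # 0.4375
```

Under these assumptions, the example IP from the text scores well above its custom coefficient because its count today far exceeds its average daily count.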
This allows the security level coefficient of the IP to be calculated.
The request type conversion rule labels different request types with different variable values, for example GET, POST, PUT and DELETE as 1, 10, 100 and 1000, respectively.
The return code conversion rule includes multiplying the value of the return code itself (200/404/500, etc.) by a uniform coefficient as the value of the variable.
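Both rules can be sketched as small lookup and scaling functions. The request values follow the example above; the uniform coefficient 0.01 is an assumed value, since the text does not give one, and the function names are hypothetical.

```python
# Hypothetical encoding tables: request values follow the example in the text,
# the uniform coefficient 0.01 is an assumption.
REQUEST_TYPE_VALUES = {"GET": 1, "POST": 10, "PUT": 100, "DELETE": 1000}
RETURN_CODE_COEFFICIENT = 0.01


def encode_request_type(method: str) -> int:
    """Request type conversion rule: map each request type to its own magnitude."""
    try:
        return REQUEST_TYPE_VALUES[method.upper()]
    except KeyError:
        raise ValueError(f"no conversion value defined for request type {method!r}") from None


def encode_return_code(code: int, coefficient: float = RETURN_CODE_COEFFICIENT) -> float:
    """Return code conversion rule: multiply the return code by a uniform coefficient."""
    return code * coefficient


print(encode_request_type("post"), encode_return_code(200), encode_return_code(404))
```

Using powers of ten keeps the request types far apart in the feature space, which matches the text's idea of using values of different magnitudes for different operation behaviors.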
Once the data conversion rules are established, a program can automatically convert each piece of log information into an input vector for the mean shift model; as the amount of training data increases, multiple centroids (which can be understood as multiple feature centers) of the log information data are gradually found.
Finally, the length R of the judgment radius can be customized according to the required security level: the higher the security requirement, the shorter R. When log information is audited, the mean shift algorithm is used to calculate the distance between the log data and each centroid.
Each natural-language log record is converted into one piece of multidimensional data in a customizable way.
For example, a record can be expressed as:
[ Security level coefficient, request type, Return code, Return value body size, number of response headers, digital representation of a plurality of other information ]
[ 0.2,10,3.6,65536,8, (numerical representation of multiple messages), … ]
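A minimal sketch of converting one raw log line into such a vector. The text does not specify a log format, so a Common Log Format-style access-log line is assumed here; the regular expression, the uniform coefficient 0.01, the mapping of unknown request types to 0, and all function and parameter names are assumptions. The security level coefficient and the number of response headers are passed in because a line in this format does not contain them.

```python
import re
from typing import Callable, List

# Assumed log format (Common Log Format style); the patent does not fix one.
LOG_PATTERN = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) \S+ [^"]+" (?P<status>\d{3}) (?P<size>\d+|-)'
)
REQUEST_TYPE_VALUES = {"GET": 1, "POST": 10, "PUT": 100, "DELETE": 1000}  # example values
RETURN_CODE_COEFFICIENT = 0.01  # assumed uniform coefficient


def log_line_to_vector(line: str, coefficient_for_ip: Callable[[str], float],
                       header_count: int) -> List[float]:
    """Return [security level coefficient, request type, return code, body size, header count]."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError("unrecognised log line")
    size = 0 if m["size"] == "-" else int(m["size"])
    return [
        coefficient_for_ip(m["ip"]),              # security level coefficient of the source IP
        REQUEST_TYPE_VALUES.get(m["method"], 0),  # 0 for unlabelled request types (assumption)
        int(m["status"]) * RETURN_CODE_COEFFICIENT,
        size,
        header_count,
    ]


line = '100.25.3.44 - - [18/Nov/2020:10:00:00 +0800] "POST /login HTTP/1.1" 404 65536'
print(log_line_to_vector(line, lambda ip: 0.2, header_count=8))
```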
The "mean shift model" can compute multiple centroids in model space for a large number of data classes.
For example, the security level coefficients are 0.2, 0.3, 0.5, 0.1, 0.3, and the like, and the center of mass of the security level coefficient data is found through a mean shift model;
when new data arrive, the distance from each data point to the centroids can be calculated directly from its coordinates in the model space.
The judgment radius R expresses that new data must not lie too far from the centroids; data lying too far away are judged to be 'suspicious or non-secure log information';
when the distance from a certain log information to each feature center in the model exceeds the judgment radius R, the system judges that the log information belongs to suspicious log information, marks the suspicious log information and prompts maintenance personnel to check and confirm the suspicious log information.
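The decision rule can be sketched as a nearest-centroid distance check: a vector is suspicious only when it lies farther than R from every feature center. Euclidean distance, the function name `classify` and the centroid values are assumptions for illustration.

```python
import math
from typing import Sequence


def classify(point: Sequence[float], centroids: Sequence[Sequence[float]], radius: float) -> str:
    """Flag a log vector whose distance to every feature center exceeds the judgment radius R."""
    nearest = min(math.dist(point, c) for c in centroids)  # Euclidean distance to the closest centroid
    return "suspicious log information" if nearest > radius else "normal log information"


centroids = [[0.1, 1.0], [0.6, 10.0]]  # hypothetical feature centers
print(classify([0.12, 1.5], centroids, radius=1.0))  # normal log information
print(classify([0.6, 40.0], centroids, radius=1.0))  # suspicious log information
```

Checking only the nearest centroid is equivalent to checking that the distance to each feature center exceeds R. Because the features differ greatly in magnitude (for example body size versus security level coefficient), a practical implementation would probably normalize them before computing distances; the text does not address this.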
Referring to FIG. 2, the method for automated log security audit comprises two parts: the first is mean shift model training, and the second is security level evaluation with the mean shift model.
Mean shift model training
First, redundant log information is obtained and converted by the rules into numerical log data a; log data a is fed into the mean shift model to be trained, the mean shift model gradually finds multiple centroids of log data a, and the training of the mean shift model is completed.
Mean shift model security level evaluation
For security level evaluation, the latest log information is first acquired and converted by the rules into numerical log data b; log data b is fed into the trained mean shift model, and the distances between log data b and the centroids of log data a are calculated;
the security level is then determined, which fixes the clustering radius R;
finally, comparing the distances between the log information data b and a plurality of centroids of the log information a with R, and if the distances exceed the radius R, the log information belongs to 'suspicious log information'; if the radius R is not exceeded, the log information belongs to 'normal log information'.
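The two phases can be sketched end to end with a flat-kernel mean shift (each point repeatedly moves to the mean of all training points within a bandwidth). The flat kernel, the bandwidth, the mode-merging rule and the sample data are assumptions, since the text names neither a kernel nor a bandwidth; `log_a` stands for the converted redundant logs (log data a).

```python
import math
from typing import List, Sequence

Vector = List[float]


def _mean(points: Sequence[Sequence[float]]) -> Vector:
    return [sum(coords) / len(points) for coords in zip(*points)]


def train_mean_shift(data: Sequence[Sequence[float]], bandwidth: float,
                     tol: float = 1e-6, max_iter: int = 300) -> List[Vector]:
    """Phase 1: flat-kernel mean shift on the converted redundant logs (log data a)."""
    centroids: List[Vector] = []
    for start in data:
        x = list(start)
        for _ in range(max_iter):
            # shifted mean: average of all training points within `bandwidth` of x
            neighbours = [p for p in data if math.dist(p, x) <= bandwidth]
            new_x = _mean(neighbours)
            moved = math.dist(new_x, x)
            x = new_x
            if moved < tol:
                break
        # keep one centroid per mode: skip modes within `bandwidth` of one already found
        if all(math.dist(x, c) > bandwidth for c in centroids):
            centroids.append(x)
    return centroids


def evaluate(log_b: Sequence[float], centroids: Sequence[Vector], radius: float) -> str:
    """Phase 2: compare the distance from log data b to the nearest centroid with R."""
    nearest = min(math.dist(log_b, c) for c in centroids)
    return "suspicious log information" if nearest > radius else "normal log information"


log_a = [[0.1, 1.0], [0.2, 1.0], [0.15, 1.1], [0.5, 10.0], [0.6, 10.0], [0.55, 10.1]]
centroids = train_mean_shift(log_a, bandwidth=1.0)
print(len(centroids))                                 # 2
print(evaluate([0.15, 1.2], centroids, radius=0.5))   # normal log information
print(evaluate([0.9, 30.0], centroids, radius=0.5))   # suspicious log information
```

This direct version costs O(n²) distance computations per iteration, which is in line with the text's note that mean shift suits sample counts below roughly 10K.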
Mean shift (Mean-shift)
The mean shift algorithm is a hill-climbing algorithm based on kernel density estimation and can be used for clustering, image segmentation, tracking and similar tasks. It is centroid-based: its goal is to locate the centroid of each cluster/class. It first calculates the shifted mean of the current point, moves the point to that mean, and then continues from the new position until a stopping condition is met (the densest region is found).
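The hill-climbing step can be illustrated in one dimension on the example security level coefficients 0.2, 0.3, 0.5, 0.1, 0.3 given earlier. A Gaussian kernel with bandwidth 0.1 is assumed here (the text names neither), and `shift_to_mode` is a hypothetical name; the function records the trajectory so each shift is visible.

```python
import math
from typing import List, Sequence


def shift_to_mode(x: float, data: Sequence[float], bandwidth: float,
                  tol: float = 1e-9, max_iter: int = 500) -> List[float]:
    """Move x uphill on a Gaussian kernel density estimate and return its trajectory."""
    path = [x]
    for _ in range(max_iter):
        weights = [math.exp(-((x - d) / bandwidth) ** 2 / 2) for d in data]
        new_x = sum(w * d for w, d in zip(weights, data)) / sum(weights)  # the shifted mean
        path.append(new_x)
        if abs(new_x - x) < tol:  # stopping condition: the point no longer moves
            break
        x = new_x
    return path


coefficients = [0.2, 0.3, 0.5, 0.1, 0.3]  # example security level coefficients from the text
trajectory = shift_to_mode(0.5, coefficients, bandwidth=0.1)
print(round(trajectory[-1], 3))
```

Each shifted mean is a kernel-weighted average of the data, so the point always stays within the data range, and with a Gaussian kernel the estimated density never decreases along the trajectory.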
Applicable scenarios are as follows:
the method is applicable to the condition that the number of categories is unknown and the number of samples is less than 10K;
at present, the method has wide application in clustering, image smoothing, segmentation, tracking and other aspects.
Referring to FIG. 3, based on the same inventive concept as the method for automated log security audit in the foregoing embodiment, an embodiment of this specification further provides a system for automated log security audit, comprising a log acquisition module, a rule conversion module, a model training module and a model application module;
the log acquisition module acquires log information; the log information comprises redundant log information and log information needing auditing;
the rule conversion module converts data in the log information into digital data of a mean shift model through conversion rules; the rule conversion module comprises an IP digital conversion rule module, a request type conversion rule module and a return code conversion rule module; the IP digital conversion rule module records the daily occurrence frequency, the weekly occurrence frequency and the monthly occurrence frequency of the IP through a background and calculates the security level coefficient of the IP; the request type conversion rule module expresses different requests by using different variables; the return code conversion rule module multiplies the return code by a uniform coefficient to obtain a variable of the return code;
The model training module acquires redundant log information acquired by the log acquisition module, trains a mean shift model through the conversion rule of the rule conversion module, and sets a clustering radius R;
the model application module obtains the log information to be audited that is acquired by the log acquisition module, converts it through the rule conversion module, inputs it into the mean shift model trained by the model training module, and identifies suspicious log information using the mean shift model and the clustering radius R.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, or applied directly or indirectly to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A method for automated log security audit, comprising:
collecting log information in a server;
establishing a conversion rule, and converting data in the log information into digital data; the conversion rule is that different operation behaviors are replaced by different numerical values;
converting the data of the log information in the server into digital data for a mean shift model through the conversion rule, training the mean shift model with these digital data, and finding the centroid of the log information data;
setting the length of the judgment radius; when log information is audited, using the mean shift model to calculate the distance between the log information data to be audited and the centroid of the log information data, and comparing the distance with the length of the judgment radius;
if the distance exceeds the length of the judgment radius, the log information needing auditing belongs to suspicious log information;
if the distance does not exceed the length of the judgment radius, the log information needing auditing belongs to normal log information.
2. The method of claim 1, wherein the conversion rules include an IP digital conversion rule, a request type conversion rule and a return code conversion rule.
3. The method of claim 2, wherein the request type conversion rule uses different variable values to represent different requests.
4. The method of claim 2, wherein the return code conversion rule multiplies each return code by a uniform coefficient and uses the product as the variable value of that return code.
5. The method of claim 2, wherein the IP digital conversion rule records, through a background, the daily, weekly and monthly frequency of occurrence of each IP and calculates the security level coefficient of each IP.
6. The method of claim 5, wherein
calculating the security level coefficients of the plurality of IPs comprises the following steps:
taking data in an IP address as a four-dimensional vector, and adding three matched variables, namely the frequency of daily occurrence, the frequency of weekly occurrence and the frequency of monthly occurrence of the IP; putting the information into a vector with seven dimensions;
setting a safe network IP or network segment and setting a self-defined coefficient C;
finding, in the seven-dimensional vector, the frequency of the IP in the current week and in the current month; calculating from them the average daily frequency M and the average weekly frequency N of the IP; and comparing M and N with the frequency A of the IP today and the frequency B of the IP in the current week, respectively.
7. The method of claim 6, wherein
the comparison is performed as follows, with a reference value of 1:
when A < M and B < N, the security level coefficient is 1 × C;
when A < M and B > N, the security level coefficient is [1 + (B - N)/N] × C;
when A > M, the security level coefficient is [1 + (A - M)/M] × C.
8. A system for automated log security audit, comprising a log acquisition module, a rule conversion module, a model training module and a model application module;
the log acquisition module acquires log information; the log information comprises redundant log information and log information needing auditing;
the rule conversion module converts data in the log information into digital data of a mean shift model through conversion rules;
the model training module determines a clustering radius, acquires redundant log information acquired by the log acquisition module, and trains a mean shift model through a conversion rule of the rule conversion module;
the model application module obtains the log information to be audited that is acquired by the log acquisition module, converts it through the rule conversion module, inputs it into the mean shift model trained by the model training module, and identifies suspicious log information using the mean shift model and the clustering radius.
9. The system for automated log security audit according to claim 8 wherein:
the rule conversion module comprises an IP digital conversion rule module, a request type conversion rule module and a return code conversion rule module; the IP digital conversion rule module records the daily occurrence frequency, the weekly occurrence frequency and the monthly occurrence frequency of the IP through a background and calculates the security level coefficient of the IP; the request type conversion rule module expresses different requests by using different variables; and the return code conversion rule module multiplies the return code by a uniform coefficient to obtain a variable of the return code.
CN202011295513.1A 2020-11-18 2020-11-18 Method and system for automatic log security audit Pending CN112463740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011295513.1A CN112463740A (en) 2020-11-18 2020-11-18 Method and system for automatic log security audit

Publications (1)

Publication Number Publication Date
CN112463740A (en) 2021-03-09

Family

ID=74836934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011295513.1A Pending CN112463740A (en) 2020-11-18 2020-11-18 Method and system for automatic log security audit

Country Status (1)

Country Link
CN (1) CN112463740A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815125A (en) * 2015-12-02 2017-06-09 Alibaba Group Holding Ltd (阿里巴巴集团控股有限公司) Log audit method and platform
CN111160401A (en) * 2019-12-09 2020-05-15 Electric Power Research Institute of State Grid Liaoning Electric Power Co., Ltd. (国网辽宁省电力有限公司电力科学研究院) Abnormal electricity usage judging method based on mean shift and XGBoost

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
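Mao Lingyue (毛玲玥): "Anomalous data mining algorithms for audit information systems and their application" (审计信息系统的异常数据挖掘算法和应用), National Circulation Economy (《全国流通经济》) *
Deng Xuran et al. (邓旭冉等): "A survey of methods for selecting initial cluster centers" (聚类中心初始值选择方法综述), Journal of China Academy of Electronics and Information Technology (《中国电子科学研究院学报》) *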

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210309