CN112765172A - Log auditing method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN112765172A (application number CN202110056186.2A)
- Authority
- CN
- China
- Prior art keywords
- label
- tree
- weight
- log
- target feature
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The log auditing method comprises constructing a tree-shaped label system from historical log data, determining the contribution rate of a target feature to each label in the tree-shaped label system, and updating the weight of the target feature according to the contribution rate. Over the update iterations, the weights of well-matched features gradually increase and those of poorly matched features decrease, which improves the matching degree between labels and features. The label of the log data to be audited is then determined based on the tree-shaped label system with updated weights, and the log data is audited based on that label. In this way, the label of the log data to be audited can be determined comprehensively and accurately, and log auditing efficiency is improved.
Description
Technical Field
The disclosure belongs to the technical field of information security, and particularly relates to a log auditing method, device, equipment and readable storage medium.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
A log is a detailed record of a business operation or customer service process, and rapid analysis of logs plays a very important role in raising business service levels, improving users' perception of service, and even promoting business development. A data label reflects an attribute of the data and is assigned by people based on experience; labeling makes the relationships between data items explicit, which makes it easier to mine the value of big data from a business perspective.
The inventor found that in the log auditing process, labels are usually assigned to logs manually, so the resulting labels often lack comprehensiveness. Moreover, because the features of a log are determined first and labels are then set according to those features, the features and labels may be mismatched, making the determined labels inaccurate.
Disclosure of Invention
To solve these problems, the disclosure provides a log auditing method, device, equipment and readable storage medium. A tree-shaped label system is constructed from historical log data, the contribution rate of a target feature to each label in the system is determined, and the weight of the target feature is updated according to the contribution rate to improve the matching degree between labels and features. The label of the log data to be audited is then determined based on the tree-shaped label system with updated weights, so that it can be determined comprehensively and accurately.
The application mainly comprises the following aspects:
In a first aspect, an embodiment of the present application provides a log auditing method, which includes:
constructing a tree-shaped label system based on a plurality of labels of the acquired historical log data and the characteristics corresponding to the labels;
for each label in the tree-shaped label system, determining a target feature corresponding to the label, determining the contribution rate of the target feature to the label based on the target feature and the preset weight of the target feature, and updating the weight of the target feature based on the contribution rate; the contribution rate is used for representing the influence degree of the target characteristics on the label;
determining a label of the log data to be audited based on the tree-shaped label system after the weight is updated;
and auditing the log data to be audited based on the label.
In a possible implementation, before the tree-shaped label system is constructed based on the plurality of labels of the acquired historical log data and the features corresponding to the labels, the log auditing method further includes:
acquiring historical log data, and performing data cleaning and structuring processing on the historical log data;
and determining a plurality of labels of the historical log data and the corresponding characteristics of the labels.
In a possible implementation manner, after the determining a plurality of labels of the historical log data and features corresponding to the labels, the log auditing method further includes:
acquiring the occurrence frequency of each label within a preset time range, and determining the label with the frequency lower than a preset frequency threshold value as a target label;
filtering out a target label from the plurality of labels.
In a possible implementation manner, the tree-shaped label system includes at least one label tree, the label tree is constructed by using a label as a leaf node, the label tree further includes a middle node, and the middle node of the label tree is determined based on a feature corresponding to each label.
In a possible implementation, the updating the weight of the target feature based on the contribution rate includes:
determining the difference between the weight of the target feature and the contribution rate;
if the difference is greater than or equal to a preset difference threshold, setting the weight of the target feature to the contribution rate, and re-determining the contribution rate of the target feature to the label based on the updated weight;
and if the difference is less than the difference threshold, taking the current weight as the final weight.
In one possible embodiment, the contribution rate of the target feature to the tag is determined according to the following:
where $C_i$ is the contribution rate of the target feature to the label, $N_d$ is the depth of the label tree in the tree-shaped label system, $w_{ij}$ is the weight of the target feature, $N_t$ is the total number of target features, $n$ is the number of possible values of the target feature, and $p_j$ is the target feature.
In a possible implementation manner, after the auditing the log data to be audited based on the label, the log auditing method further includes:
and detecting abnormal operation behavior based on the audit result of the log data to be audited, and generating corresponding security early-warning information.
In a second aspect, an embodiment of the present application provides a log auditing apparatus, where the log auditing apparatus includes:
the tree-shaped label system building module is used for building a tree-shaped label system based on a plurality of labels of the acquired historical log data and the characteristics corresponding to the labels;
the weight updating module is used for determining a target feature corresponding to each label in the tree-shaped label system, determining the contribution rate of the target feature to the label based on the target feature and the preset weight of the target feature, and updating the weight of the target feature based on the contribution rate; the contribution rate is used for representing the influence degree of the target characteristics on the label;
the label determining module is used for determining a label of the log data to be audited based on the tree-shaped label system after the weight is updated;
and the auditing module is used for auditing the log data to be audited based on the label.
In a possible implementation manner, the log auditing apparatus further includes:
the acquisition module is used for acquiring historical log data and carrying out data cleaning and structuring on the historical log data;
and the determining module is used for determining a plurality of labels of the historical log data and the characteristics corresponding to the labels.
In a possible implementation manner, the log auditing apparatus further includes:
the target tag obtaining module is used for obtaining the occurrence frequency of each tag within a preset time range and determining the tag with the frequency lower than a preset frequency threshold value as a target tag;
and the filtering module is used for filtering a target label in the plurality of labels.
In a possible implementation manner, the tree-shaped label system includes at least one label tree, the label tree is constructed by using a label as a leaf node, the label tree further includes a middle node, and the middle node of the label tree is determined based on a feature corresponding to each label.
In a possible implementation manner, when the weight updating module is configured to update the weight of the target feature based on the contribution rate, the weight updating module is specifically configured to:
determining the difference between the weight of the target feature and the contribution rate;
if the difference is greater than or equal to a preset difference threshold, setting the weight of the target feature to the contribution rate, and re-determining the contribution rate of the target feature to the label based on the updated weight;
and if the difference is less than the difference threshold, taking the current weight as the final weight.
In a possible implementation, the weight update module is configured to determine the contribution rate of the target feature to the tag according to the following:
where $C_i$ is the contribution rate of the target feature to the label, $N_d$ is the depth of the label tree in the tree-shaped label system, $w_{ij}$ is the weight of the target feature, $N_t$ is the total number of target features, $n$ is the number of possible values of the target feature, and $p_j$ is the target feature.
In a possible implementation manner, the log auditing apparatus further includes:
and the early-warning information generation module is used for detecting abnormal operation behavior based on the audit result of the log data to be audited and generating corresponding security early-warning information.
In a third aspect, an embodiment of the present application provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when a computer device is running, the machine-readable instructions when executed by the processor performing the steps of the log auditing method as described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present embodiment provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the log auditing method according to the first aspect or any one of the possible implementation manners of the first aspect.
Compared with the prior art, the beneficial effect of this disclosure is:
(1) The labels and features in the tree-shaped label system provided by the present disclosure, and the correspondence between them, are derived from historical log data. Compared with the prior-art approach of determining labels empirically, the disclosed method is more objective and determines labels and features more comprehensively.
(2) For each label in the tree-shaped label system, the disclosure determines the contribution rate of the target feature to the label and updates the weight of the target feature according to that rate. Over the update iterations, the weights of well-matched features gradually increase and those of poorly matched features decrease, which ensures a high matching degree between the finally determined labels and features and improves the accuracy of the determined labels.
(3) Since the same feature can exhibit different characteristics at different positions, the disclosure takes into account the distribution of the feature at its position when determining the feature's contribution rate to a label, namely by adding the information entropy $H(\theta)$ of each feature to the contribution-rate formula. The features describing a label can thus be determined adaptively according to differences in the log data, the influence of subjective factors on label determination is avoided, and the finally constructed label system fits the actual application more closely.
(4) The disclosure determines the label of the log data to be audited based on the tree-shaped label system with updated weights and audits the log data based on that label. This greatly reduces human effort and improves audit efficiency; moreover, the audit result is less affected by subjective human factors and is therefore more accurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flow chart of a method for auditing logs provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for auditing logs according to another embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a log auditing apparatus provided by an embodiment of the present disclosure;
FIG. 4 is a second schematic structural diagram of a log auditing apparatus provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
The present disclosure is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
In the present disclosure, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings. They are relational terms chosen only for convenience in describing the structural relationships of the parts or elements of the present disclosure, do not specifically refer to any particular part or element of the present disclosure, and are not to be construed as limiting the present disclosure.
In the present disclosure, terms such as "fixedly connected", "connected", and the like are to be understood in a broad sense, and mean either a fixed connection or an integrally connected or detachable connection; may be directly connected or indirectly connected through an intermediate. For persons skilled in the art, the specific meanings of the above terms in the present disclosure can be determined according to specific situations, and are not to be construed as limitations of the present disclosure.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
A log is information recorded by a computer system, device, software, etc. under certain circumstances. Its specific content depends on the source of the log; for example, a Web server typically records an entry whenever someone accesses a Web page and requests a resource (an image, a file, etc.). If the page accessed by the user requires authentication, the log message contains the username. Through logs, IT administrators can learn the running status, security status, and even the business operation status of the system. Auditing locates security problems in the system by tracking what clients access and how they access it, so that corresponding security measures can be taken to remedy existing vulnerabilities.
In the log auditing process, labels are usually assigned to logs manually, so the resulting labels often lack comprehensiveness. Moreover, because the features of a log are determined first and labels are then set according to those features, the features and labels may be mismatched, making the determined labels inaccurate.
Based on the above, the disclosure provides a log auditing method, device, equipment and readable storage medium. A tree-shaped label system is constructed from historical log data, the contribution rate of a target feature to each label in the system is determined, and the weight of the target feature is updated according to the contribution rate to improve the matching degree between labels and features. The label of the log data to be audited is then determined based on the tree-shaped label system with updated weights, so that it can be determined comprehensively and accurately.
To facilitate understanding of this embodiment, the log auditing method disclosed in the embodiment of the present invention is first described in detail. The execution subject of the log auditing method may be a cloud platform or a server that interacts with the user side. The following description takes a server as the execution subject.
Example one
Referring to FIG. 1, FIG. 1 is a flowchart of a log auditing method provided by an embodiment of the present disclosure. As shown in FIG. 1, the log auditing method includes steps S101 to S104, where:
S101: Construct a tree-shaped label system based on a plurality of labels of the acquired historical log data and the features corresponding to the labels.
In a specific implementation, historical log data within a preset time range is first obtained; for example, log data from the last year, quarter, or month may be obtained, or a specific time range may be set and the log data within that range obtained. Part or all of the obtained historical log data is then manually labeled, and a tree-shaped label system is constructed from the labels and their corresponding features.
Historical log data is usually acquired through various channels such as websites, database imports, and third-party devices. Because the sources are broad, the data structures are diverse, and some of the data suffers from bias and inconsistency, the acquired historical log data needs to be preprocessed before the tree-shaped label system is constructed. In this embodiment of the present disclosure, as an optional implementation, before S101, that is, before the tree-shaped label system is constructed based on the plurality of labels of the acquired historical log data and the features corresponding to the labels, the log auditing method further includes:
Step A11: Acquire historical log data, and perform data cleaning and structuring on it.
In a specific implementation, data cleaning and structuring of the historical log data may include the following aspects:
(1) Cleaning: detect and eliminate erroneous and/or inconsistent data in the historical log data to improve data quality, specifically including denoising, deduplication, and missing-data handling (interpolation, reconstruction, discarding, etc.).
(2) Integration: mainly for the case of multiple data sources; it handles data selection, conflicts, and inconsistencies so that the integrated data set is simplified.
(3) Conversion: non-normalized data, and data that the data mining model cannot process, are handled through operations such as smoothing, filtering, clustering, normalization, and transformation.
(4) Reduction, i.e., dimensionality reduction: remove attributes that cannot describe the key features of the system and attributes unrelated to the mining subject, and sample the data.
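The cleaning, conversion, and reduction steps can be illustrated with a minimal Python sketch. It assumes each log record has already been parsed into a dictionary; the field names (`timestamp`, `user`, `action`, `bytes`) and the helper names are hypothetical. Only discarding of incomplete records, deduplication, min-max normalization, and attribute reduction are shown; multi-source integration is not.

```python
def clean_logs(records, required=("timestamp", "user", "action")):
    """Cleaning: discard records with missing key fields, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(field) in (None, "") for field in required):
            continue  # missing-data handling by discarding
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # deduplication
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def min_max_normalize(values):
    """Conversion: rescale numeric values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def reduce_attributes(records, keep):
    """Reduction: keep only the attributes relevant to the mining subject."""
    return [{field: rec[field] for field in keep if field in rec} for rec in records]


records = [
    {"timestamp": "t1", "user": "alice", "action": "login", "bytes": 10},
    {"timestamp": "t1", "user": "alice", "action": "login", "bytes": 10},  # duplicate
    {"timestamp": "t2", "user": "", "action": "login", "bytes": 5},        # missing user
    {"timestamp": "t3", "user": "bob", "action": "read", "bytes": 30},
]
cleaned = clean_logs(records)
print(len(cleaned))  # 2
print(min_max_normalize([r["bytes"] for r in cleaned]))  # [0.0, 1.0]
```

Discarding is only one of the missing-data options listed above; interpolation or reconstruction would replace the `continue` branch.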
Step A12: Determine a plurality of labels of the historical log data and the features corresponding to the labels.
In a specific implementation, a plurality of labels and the features corresponding to each label are extracted from the cleaned and structured historical log data.
This provides sufficient base data for constructing the tree-shaped label system and for log auditing, and improves the accuracy of the audit results.
In the embodiment of the present disclosure, as an optional implementation, after step a12, that is, after determining a plurality of labels of the historical log data and corresponding features of the labels, the log auditing method further includes:
Step A13: Acquire the occurrence frequency of each label within a preset time range, and determine labels whose frequency is below a preset frequency threshold as target labels;
Step A14: Filter out the target labels from the plurality of labels.
In a specific implementation, the labels of the historical log data can be counted automatically over a sliding window of preset length, and target labels whose occurrence frequency within the current window is below a set threshold are identified and removed.
By counting how often labels occur within a given period, outdated or no-longer-used labels can be screened out and removed, which keeps the labels valid, reduces redundant data, and improves log auditing efficiency.
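Steps A13 and A14 might be sketched as follows, assuming labels arrive as `(timestamp, label)` pairs. The function name, the 30-day window, and the threshold of 2 are illustrative assumptions, not values from the disclosure.

```python
from collections import Counter
from datetime import datetime, timedelta


def filter_rare_tags(tagged_logs, now, window, min_count):
    """Split tags into kept/removed by their frequency in the window [now - window, now].

    tagged_logs must be a list (it is iterated twice) of (timestamp, tag) pairs.
    """
    start = now - window
    counts = Counter(tag for ts, tag in tagged_logs if start <= ts <= now)
    all_tags = {tag for _, tag in tagged_logs}
    removed = {tag for tag in all_tags if counts[tag] < min_count}  # target tags (step A13)
    return all_tags - removed, removed  # step A14: filter the target tags out


now = datetime(2021, 1, 31)
logs = [
    (datetime(2021, 1, 5), "login"),
    (datetime(2021, 1, 20), "login"),
    (datetime(2021, 1, 30), "login"),
    (datetime(2021, 1, 25), "export"),
    (datetime(2020, 12, 1), "legacy_ftp"),  # outside the 30-day window
]
kept, removed = filter_rare_tags(logs, now, timedelta(days=30), min_count=2)
print(sorted(kept), sorted(removed))  # ['login'] ['export', 'legacy_ftp']
```

Here `export` is removed even though it appears inside the window, because it falls below the illustrative threshold.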
In the embodiment of the present disclosure, as an optional implementation manner, the tree-shaped label system includes at least one label tree, the label tree is constructed by using a label as a leaf node, the label tree further includes a middle node, and the middle node of the label tree is determined based on a feature corresponding to each label.
In specific implementation, the steps of constructing the tree label system are as follows:
Let the number of layers of the original label tree be $C$, its degree $D$, its number of nodes $N$, and its nodes $\{j_1, j_2, j_3, \ldots, j_N\}$.
Generating an initial tree label system consisting of Num trees:
1) Randomly generate a natural number $K$, where $K \le N$;
2) Randomly generate $K$ non-repeating integers $\{j_{I_1}, j_{I_2}, j_{I_3}, \ldots, j_{I_K}\}$, where $0 \le j_{I_i} \le N$, $i \in \{1, \ldots, K\}$;
3) Extract a node set $G$ from the original label tree:
$G = \{t_i \mid i \in \{j_{I_1}, j_{I_2}, j_{I_3}, \ldots, j_{I_K}\}\}$;
4) Generate a label tree from the node set $G$ as follows:
a. Take any node from $G$ as the root node;
b. Select another node as a child of the root node, forming an initial tree with only 2 nodes;
c. Take another node from $G$, compute its similarity to each node already in the tree (the similarity is calculated from the features of the selected node and of the tree node, for example as the Euclidean distance between their features), and select the tree node with the smallest distance;
d. Check the current degree of the selected tree node. If it is not greater than $D-1$, make the selected node a child of that tree node; otherwise, select the tree node with the second-smallest distance and check its current degree in the same way. When the node degrees in the tree are all not greater than $D-1$, construction of the tree is finished and the final label tree is formed.
5) Repeat steps 1) to 4) Num times to form the tree-shaped label system.
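Steps 1) to 5) can be sketched in Python under several stated assumptions: each node of the original label tree carries a numeric feature vector, similarity is the Euclidean distance mentioned in step c, a node's "degree" is taken to mean its number of children, every remaining node of $G$ is inserted in turn, and $K$ is drawn from $[2, N]$ so that steps a and b always have two nodes. The function names are hypothetical.

```python
import math
import random


def build_label_tree(features, degree, rng):
    """Grow one label tree from a random node subset G (steps 1 to 4)."""
    node_ids = sorted(features)
    k = rng.randint(2, len(node_ids))  # step 1: K <= N (at least 2 here)
    group = rng.sample(node_ids, k)    # steps 2-3: K distinct nodes form G
    root, first = group[0], group[1]   # steps a-b: root plus one child
    children = {root: [first], first: []}
    for node in group[2:]:             # step c: take the remaining nodes of G
        # Rank existing tree nodes by Euclidean distance between feature vectors.
        ranked = sorted(children, key=lambda t: math.dist(features[node], features[t]))
        for candidate in ranked:       # step d: nearest node whose degree still allows a child
            if len(children[candidate]) <= degree - 1:
                children[candidate].append(node)
                children[node] = []
                break
    return root, children


def build_label_system(features, degree, num_trees, seed=None):
    """Step 5: repeat steps 1 to 4 num_trees times."""
    rng = random.Random(seed)
    return [build_label_tree(features, degree, rng) for _ in range(num_trees)]


features = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0), 4: (5.0, 5.0), 5: (5.0, 6.0)}
system = build_label_system(features, degree=2, num_trees=3, seed=42)
for root, children in system:
    print(root, children)
```

Because every newly attached node starts with no children, a node with spare capacity always exists when `degree >= 1`, so in this sketch every node of $G$ ends up in the tree.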
In this way, the labels and features included in the tree-shaped label system, and the correspondence between them, are obtained from the historical log data. Compared with the prior-art approach of determining labels based on experience, this is more objective and closer to the actual application scenario, and labels and features can be determined relatively comprehensively.
S102: for each label in the tree-shaped label system, determining a target feature corresponding to the label, determining the contribution rate of the target feature to the label based on the target feature and the preset weight of the target feature, and updating the weight of the target feature based on the contribution rate; the contribution rate is used for characterizing the influence degree of the target characteristics on the label.
In this embodiment of the present disclosure, as an optional implementation, updating the weight of the target feature based on the contribution rate includes:
determining the difference between the weight of the target feature and the contribution rate;
if the difference is greater than or equal to a preset difference threshold, setting the weight of the target feature to the contribution rate, and re-determining the contribution rate of the target feature to the label based on the updated weight;
and if the difference is less than the difference threshold, taking the current weight as the final weight.
In a specific implementation, after the contribution rate $C_i$ of each feature to the label is obtained: if $|C_i - w_{ij}| \ge \mathrm{threshold}$, let $w_{ij} = C_i$ and re-determine the contribution rate of the target feature to the label based on the updated weight; if $|C_i - w_{ij}| < \mathrm{threshold}$, the current weight is taken as the final weight.
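The threshold rule above amounts to a fixed-point iteration on the weight. The sketch below assumes the contribution-rate formula is supplied as a function `contribution_fn(w)`, since the formula itself is not reproduced in this text; `update_weight` is a hypothetical name, and `max_iter` is an added safeguard that the disclosure does not mention.

```python
def update_weight(initial_weight, contribution_fn, threshold, max_iter=100):
    """Iterate w <- C(w) until |C(w) - w| < threshold."""
    w = initial_weight
    for _ in range(max_iter):   # safeguard, not part of the disclosure
        c = contribution_fn(w)  # contribution rate C_i under the current weight w_ij
        if abs(c - w) < threshold:
            return w            # difference below threshold: current weight is final
        w = c                   # difference >= threshold: set w_ij = C_i and recompute
    return w


# Hypothetical contribution function whose fixed point is w = 0.6.
final = update_weight(0.1, lambda w: 0.5 * w + 0.3, threshold=1e-6)
print(round(final, 4))  # 0.6
```

Whether the iteration converges depends on the real contribution formula; the linear function here is chosen only because it is guaranteed to converge.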
In the embodiment of the present disclosure, as an optional implementation manner, the contribution rate of the target feature to the tag is determined according to the following manner:
where $C_i$ is the contribution rate of the target feature to the label, $N_d$ is the depth of the label tree in the tree-shaped label system, $w_{ij}$ is the weight of the target feature, $N_t$ is the total number of target features, $n$ is the number of possible values of the target feature, and $p_j$ is the target feature.
The tree-shaped label system includes at least one label tree; the depth of each label tree is $N_d$, $d = 1, \ldots, N_L$, where $N_L$ is the number of label trees.
Since the same feature can exhibit different characteristics at different positions, the disclosure takes into account the distribution of the feature at its position when determining the feature's contribution rate to a label, namely by adding the information entropy $H(\theta)$ of each feature to the contribution-rate formula. The features describing a label can thus be determined adaptively according to differences in the log data, the influence of subjective factors on label determination is avoided, and the finally constructed label system fits the actual application more closely.
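This text does not spell out how $H(\theta)$ is computed. A common choice, assumed here for illustration, is the Shannon entropy of a feature's observed value distribution, $H = \sum_{v} q_v \log_2 (1/q_v)$, where $q_v$ (a symbol introduced only for this sketch) is the relative frequency of value $v$ among the feature's possible values:

```python
import math
from collections import Counter


def feature_entropy(values):
    """Shannon entropy (in bits) of the observed value distribution of one feature."""
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    # q * log2(1/q) summed over the observed values, where q = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(feature_entropy(["GET", "POST", "GET", "POST"]))  # 1.0
print(feature_entropy(["GET", "GET", "GET"]))  # 0.0
```

A feature that always takes the same value has zero entropy and so carries little information for distinguishing labels, while an evenly spread feature has maximal entropy for its number of values.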
S103: Determine the label of the log data to be audited based on the tree-shaped label system with the updated weights.
In a specific implementation, the position in the tree-shaped label system that corresponds to the log data to be audited is determined, and the label at that position is taken as the label of the log data to be audited.
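One way to locate "the position corresponding to the log data" is to walk a label tree from its root, choosing at each intermediate node the branch that matches the record's value for that node's feature, until a leaf label is reached. This is a hypothetical sketch: the `Node`/`Leaf` classes, the feature names and the labels are illustrative, and the updated feature weights are not used in the traversal because the source does not say how they enter this step.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    label: str  # leaf nodes carry labels


@dataclass
class Node:
    feature: str  # feature tested at this intermediate node
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def find_label(tree: Union[Node, Leaf], record: Dict[str, str]) -> Optional[str]:
    """Follow the record's feature values from the root to a leaf label."""
    node = tree
    while isinstance(node, Node):
        value = record.get(node.feature)
        if value not in node.children:
            return None  # no matching branch: no label is assigned
        node = node.children[value]
    return node.label


# Hypothetical label tree for login and deletion events.
tree = Node("action", {
    "login": Node("result", {"fail": Leaf("failed_login"), "ok": Leaf("normal_login")}),
    "delete": Leaf("data_deletion"),
})
```

For example, `find_label(tree, {"action": "login", "result": "fail"})` follows `action` then `result` and returns `"failed_login"`.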
S104: Audit the log data to be audited based on the label.
In the log auditing method provided by this embodiment of the present disclosure, a tree-shaped label system is constructed from historical log data, the contribution rate of the target feature to each label in the system is determined, and the weight of the target feature is updated according to that contribution rate. Over the update iterations, weights with a high degree of matching gradually increase and weights with a low degree of matching decrease, which improves the match between labels and features. The label of the log data to be audited is then determined from the tree-shaped label system with the updated weights, and the log data is audited based on that label. In this way the label of the log data to be audited can be determined comprehensively and accurately, and log auditing efficiency is improved.
Example Two
Referring to Fig. 2, which is a flowchart of a log auditing method according to another embodiment of the present disclosure, the log auditing method includes the following steps:
S201: Construct a tree-shaped label system based on a plurality of labels of the acquired historical log data and the features corresponding to the labels;
S202: For each label in the tree-shaped label system, determine the target feature corresponding to the label, determine the contribution rate of the target feature to the label based on the target feature and its preset weight, and update the weight of the target feature based on the contribution rate, where the contribution rate represents the degree of influence of the target feature on the label;
S203: Determine the label of the log data to be audited based on the tree-shaped label system with the updated weights;
S204: Audit the log data to be audited based on the label.
For S201 to S204, refer to the descriptions of S101 to S104; the same technical effects are achieved and are not repeated here.
S205: Detect abnormal operation behaviors based on the audit result of the log data to be audited, and generate corresponding security warning information.
In a specific implementation, after the label of the log data to be audited is determined, abnormal operation behaviors in the log data are identified from the audit result, corresponding security warning information is generated, and the user is alerted to the abnormal operations so that appropriate security measures can be taken and existing vulnerabilities can be remedied.
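The source does not specify how abnormal behaviors are recognized from the audit result. As a minimal sketch, assume that certain labels are designated as abnormal and that one warning is produced per audited record carrying such a label. The `AuditResult` type, the `abnormal_labels` set and the message format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class AuditResult:
    record_id: str  # hypothetical identifier of an audited log record
    label: str      # label assigned to the record during auditing


def generate_warnings(results: Iterable[AuditResult], abnormal_labels: Set[str]) -> List[str]:
    """Emit one security warning per audited record whose label is flagged as abnormal."""
    return [
        f"[SECURITY WARNING] record {r.record_id}: abnormal operation '{r.label}'"
        for r in results
        if r.label in abnormal_labels
    ]


results = [AuditResult("1", "normal_login"), AuditResult("2", "failed_login")]
warnings = generate_warnings(results, {"failed_login"})
```

In practice the set of abnormal labels, or a richer rule such as repeated failures within a window, would come from the deployment's security policy.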
In the log auditing method provided by this embodiment of the present disclosure, a tree-shaped label system is constructed from historical log data, the contribution rate of the target feature to each label in the system is determined, and the weight of the target feature is updated according to that contribution rate; over the update iterations, weights with a high degree of matching gradually increase and weights with a low degree of matching decrease, which improves the match between labels and features. The label of the log data to be audited is then determined comprehensively and accurately from the tree-shaped label system with the updated weights. Finally, abnormal operation behaviors are detected based on the audit result and corresponding security warning information is generated, so that abnormal operations are discovered in time, appropriate security measures can be taken, and existing vulnerabilities can be remedied.
Example Three
Referring to Fig. 3, which is a schematic structural diagram of a log auditing apparatus according to an embodiment of the present disclosure, the log auditing apparatus 300 includes:
a tree-shaped label system building module 310, configured to build a tree-shaped label system based on a plurality of labels of the acquired historical log data and the features corresponding to the labels;
a weight updating module 320, configured to determine, for each label in the tree-shaped label system, the target feature corresponding to the label, determine the contribution rate of the target feature to the label based on the target feature and its preset weight, and update the weight of the target feature based on the contribution rate, where the contribution rate represents the degree of influence of the target feature on the label;
a label determining module 330, configured to determine the label of the log data to be audited based on the tree-shaped label system with the updated weights; and
an auditing module 340, configured to audit the log data to be audited based on the label.
In a possible implementation, the log auditing apparatus 300 further includes:
an obtaining module (not shown in the figure), configured to obtain historical log data and perform data cleaning and structuring on it; and
a determining module, configured to determine a plurality of labels of the historical log data and the features corresponding to the labels.
In a possible implementation, the log auditing apparatus 300 further includes:
a target label obtaining module, configured to obtain the occurrence frequency of each label within a preset time range and determine each label whose frequency is below a preset frequency threshold as a target label; and
a filtering module, configured to filter out the target labels from the plurality of labels.
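The two modules above can be sketched as follows, assuming that "occurrence frequency" means the number of occurrences of a label inside a closed time window and that each occurrence is available as a `(label, timestamp)` pair. The function name and its parameters are hypothetical. Labels that never occur in the window count as zero, so they are also treated as target labels.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Set, Tuple


def filter_rare_labels(
    labels: List[str],
    events: Iterable[Tuple[str, datetime]],
    start: datetime,
    end: datetime,
    min_count: int,
) -> Tuple[List[str], Set[str]]:
    """Return (kept labels, target labels) for the window [start, end]."""
    # Count how often each label occurs inside the preset time range.
    counts = Counter(label for label, ts in events if start <= ts <= end)
    # Target labels: count below the preset threshold (labels never seen count as 0).
    targets = {label for label in labels if counts[label] < min_count}
    # Filter the target labels out, preserving the original label order.
    kept = [label for label in labels if label not in targets]
    return kept, targets


inside, outside = datetime(2021, 1, 5), datetime(2020, 6, 1)
events = [("login", inside)] * 3 + [("delete", inside)] + [("export", outside)] * 5
kept, targets = filter_rare_labels(
    ["login", "delete", "export"], events, datetime(2021, 1, 1), datetime(2021, 1, 31), 2
)
```

Here `export` is frequent overall but falls outside the window, so it is filtered together with the rarely seen `delete`.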
In a possible implementation, the tree-shaped label system includes at least one label tree. Each label tree is constructed with labels as leaf nodes and also includes intermediate nodes, which are determined based on the features corresponding to the labels.
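A label tree of this shape can be sketched as a nested dictionary. Here it is assumed that each label comes with an ordered list of `(feature, value)` pairs, running from general to specific, that become its intermediate nodes. The source does not state how the intermediate nodes are ordered or derived, so the input format and the `"__label__"` leaf key are hypothetical.

```python
from typing import Dict, List, Tuple


def build_label_tree(paths: Dict[str, List[Tuple[str, str]]]) -> dict:
    """Nested dict: intermediate keys are "feature=value", the "__label__" key holds the leaf label."""
    root: dict = {}
    for label, features in paths.items():
        node = root
        for feature, value in features:
            # Reuse an existing intermediate node or create a new one.
            node = node.setdefault(f"{feature}={value}", {})
        node["__label__"] = label  # the label is attached at the end of its feature path
    return root


tree = build_label_tree({
    "failed_login": [("action", "login"), ("result", "fail")],
    "normal_login": [("action", "login"), ("result", "ok")],
    "data_deletion": [("action", "delete")],
})
```

Labels that share a feature prefix, such as the two login labels, share the intermediate node `"action=login"`.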
In a possible implementation, when the weight updating module 320 is configured to update the weight of the target feature based on the contribution rate, the weight updating module is specifically configured to:
determining a difference between the weight and the contribution rate of the target feature;
if the difference is larger than or equal to a preset difference threshold, updating the weight of the target feature into the contribution rate, and determining the contribution rate of the target feature to the label based on the updated weight;
and if the difference is smaller than the difference threshold, determining the current weight as the final weight.
In a possible implementation, the weight updating module 320 is configured to determine the contribution rate of the target feature to the label according to the following formula:
where $C_i$ is the contribution rate of the target feature to the label, $N_d$ is the depth of the label tree in the tree-shaped label system, $w_{ij}$ is the weight of the target feature, $N_t$ is the total number of target features, $n$ is the number of possible values of the target feature, and $p_j$ is the target feature.
In a possible implementation, as shown in Fig. 4, the log auditing apparatus 300 further includes:
an early warning information generating module 350, configured to detect abnormal operation behaviors based on the audit result of the log data to be audited and generate corresponding security warning information.
In the log auditing apparatus provided by this embodiment of the present disclosure, the tree-shaped label system building module builds a tree-shaped label system from historical log data. The weight updating module determines the contribution rate of the target feature to each label in the system and updates the weight of the target feature according to that contribution rate, so that over the update iterations weights with a high degree of matching gradually increase and weights with a low degree of matching decrease, improving the match between labels and features. The label determining module then determines the label of the log data to be audited from the tree-shaped label system with the updated weights, and the auditing module audits the log data based on that label. In this way the label of the log data to be audited can be determined comprehensively and accurately.
In addition, the early warning information generating module detects abnormal operation behaviors based on the audit result of the log data to be audited and generates corresponding security warning information, so that abnormal operations are discovered in time, appropriate security measures can be taken, and existing vulnerabilities can be remedied.
Example Four
Referring to Fig. 5, which is a schematic diagram of a computer device according to an embodiment of the present invention, the computer device 500 includes a processor 510, a memory 520, and a bus 530.
The memory 520 stores machine-readable instructions executable by the processor 510. When the computer device 500 runs, the processor 510 communicates with the memory 520 through the bus 530, and when the machine-readable instructions are executed by the processor 510, the steps of the log auditing method in the method embodiments shown in Fig. 1 and Fig. 2 can be performed.
Example Five
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the log auditing method in the above method embodiments.
The computer program product of the log auditing method provided in the embodiments of the present invention includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the steps of the log auditing method in the above method embodiments; for details, refer to the above method embodiments, which are not repeated here.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A log auditing method, characterized by comprising the following steps:
constructing a tree-shaped label system based on a plurality of labels of acquired historical log data and the features corresponding to the labels;
for each label in the tree-shaped label system, determining a target feature corresponding to the label, determining the contribution rate of the target feature to the label based on the target feature and a preset weight of the target feature, and updating the weight of the target feature based on the contribution rate, wherein the contribution rate represents the degree of influence of the target feature on the label;
determining a label of log data to be audited based on the tree-shaped label system with the updated weights; and
auditing the log data to be audited based on the label.
2. The log auditing method according to claim 1, wherein before constructing the tree-shaped label system based on the plurality of labels of the acquired historical log data and the features corresponding to the labels, the log auditing method further comprises:
acquiring historical log data, and performing data cleaning and structuring on the historical log data; and
determining a plurality of labels of the historical log data and the features corresponding to the labels.
3. The log auditing method according to claim 2, wherein after determining the plurality of labels of the historical log data and the features corresponding to the labels, the log auditing method further comprises:
acquiring the occurrence frequency of each label within a preset time range, and determining each label whose frequency is below a preset frequency threshold as a target label; and
filtering out the target labels from the plurality of labels.
4. The log auditing method according to claim 1, wherein the tree-shaped label system comprises at least one label tree, the label tree is constructed with labels as leaf nodes and further comprises intermediate nodes, and the intermediate nodes of the label tree are determined based on the features corresponding to the labels.
5. The log auditing method according to claim 1, wherein updating the weight of the target feature based on the contribution rate comprises:
determining a difference between the weight and the contribution rate of the target feature;
if the difference is larger than or equal to a preset difference threshold, updating the weight of the target feature into the contribution rate, and determining the contribution rate of the target feature to the label based on the updated weight;
and if the difference is smaller than the difference threshold, determining the current weight as the final weight.
6. The log auditing method according to claim 1, wherein the contribution rate of the target feature to the label is determined according to the following formula:
where $C_i$ is the contribution rate of the target feature to the label, $N_d$ is the depth of the label tree in the tree-shaped label system, $w_{ij}$ is the weight of the target feature, $N_t$ is the total number of target features, $n$ is the number of possible values of the target feature, and $p_j$ is the target feature.
7. The log auditing method according to claim 1, wherein after auditing the log data to be audited based on the label, the log auditing method further comprises:
detecting abnormal operation behaviors based on the audit result of the log data to be audited, and generating corresponding security warning information.
8. A log auditing apparatus, comprising:
a tree-shaped label system building module, configured to build a tree-shaped label system based on a plurality of labels of acquired historical log data and the features corresponding to the labels;
a weight updating module, configured to determine, for each label in the tree-shaped label system, a target feature corresponding to the label, determine the contribution rate of the target feature to the label based on the target feature and a preset weight of the target feature, and update the weight of the target feature based on the contribution rate, wherein the contribution rate represents the degree of influence of the target feature on the label;
a label determining module, configured to determine a label of log data to be audited based on the tree-shaped label system with the updated weights; and
an auditing module, configured to audit the log data to be audited based on the label.
9. A computer device, comprising a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the computer device runs, and the machine-readable instructions, when executed by the processor, performing the steps of the log auditing method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the log auditing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110056186.2A CN112765172A (en) | 2021-01-15 | 2021-01-15 | Log auditing method, device, equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112765172A | 2021-05-07 |
Family ID: 75702027
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112765172A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180336487A1 (en) * | 2017-05-17 | 2018-11-22 | Microsoft Technology Licensing, Llc | Tree ensemble explainability system |
CN108090216A (en) * | 2017-12-29 | 2018-05-29 | MIGU Culture Technology Co., Ltd. | Label prediction method, device and storage medium |
US20190310635A1 (en) * | 2018-04-09 | 2019-10-10 | Diveplane Corporation | Computer Based Reasoning and Artificial Intelligence Systems |
US20200193353A1 (en) * | 2018-12-13 | 2020-06-18 | Nice Ltd. | System and method for performing agent behavioral analytics |
US20200302318A1 (en) * | 2019-03-20 | 2020-09-24 | Oracle International Corporation | Method for generating rulesets using tree-based models for black-box machine learning explainability |
US20200372400A1 (en) * | 2019-05-22 | 2020-11-26 | The Regents Of The University Of California | Tree alternating optimization for learning classification trees |
CN112395261A (en) * | 2019-08-16 | 2021-02-23 | China Mobile Group Zhejiang Co., Ltd. | Service recommendation method and device, computing equipment and computer storage medium |
CN110751188A (en) * | 2019-09-26 | 2020-02-04 | South China Normal University | User label prediction method, system and storage medium based on multi-label learning |
CN111382352A (en) * | 2020-03-02 | 2020-07-07 | Tencent Technology (Shenzhen) Co., Ltd. | Data recommendation method and device, computer equipment and storage medium |
CN111695037A (en) * | 2020-06-11 | 2020-09-22 | Tencent Technology (Beijing) Co., Ltd. | Information recommendation method and device based on artificial intelligence and electronic equipment |
Non-Patent Citations (3)
Title |
---|
MINGYU LI: "A Two-Tier Service Filtering Model for Web Service QoS Prediction", IEEE * |
P DE BOVES HARRINGTON: "Support vector machine classification trees based on fuzzy entropy of classification", Elsevier * |
JI Xiangwen: "Application of label tree templates in key information extraction and topic recognition of web pages", Wanfang Data * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210507 |