CN112131199A - Log processing method, device, equipment and medium - Google Patents



Publication number
CN112131199A
Authority
CN
China
Prior art keywords
log
classified
logs
vector
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011023270.6A
Other languages
Chinese (zh)
Inventor
张欢
范渊
刘博�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dbappsecurity Technology Co Ltd
Priority to CN202011023270.6A
Publication of CN112131199A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The application discloses a log processing method, a device, equipment and a medium, wherein the method comprises the following steps: acquiring a log to be classified; extracting characteristic items of each log in the logs to be classified to obtain a log characteristic item set corresponding to each log in the logs to be classified; determining log vectors corresponding to all logs in the logs to be classified based on the log feature item sets corresponding to all logs in the logs to be classified; and classifying log vectors corresponding to all logs in the logs to be classified by utilizing an ant colony clustering algorithm so as to classify the logs to be classified. Therefore, the logs can be classified, the accuracy and consistency of classification results are improved, and the applicability is strong.

Description

Log processing method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for processing a log.
Background
Clustering analysis is an important branch in the field of data mining, and is to group data objects into multiple classes or clusters, wherein the objects in the same cluster have high similarity, and the objects in different clusters have large differences. The existing clustering algorithm is mainly divided into four categories, namely a dividing method, a hierarchical method, a density-based method and a grid-based method.
The inventors have found at least two problems with the above prior art. First, the user must supply certain prior clustering information, so the clustering result is very sensitive to the input parameters, which greatly reduces the adaptability of the classification method. Second, the prior art relies on heuristic algorithms, which solve efficiently but easily fall into local optima, so the accuracy and consistency of the clustering result are difficult to guarantee.
Disclosure of Invention
In view of this, an object of the present application is to provide a log processing method, apparatus, device, and medium, which can classify logs, improve accuracy and consistency of classification results, and have strong applicability. The specific scheme is as follows:
in a first aspect, the present application discloses a log processing method, including:
acquiring a log to be classified;
extracting characteristic items of each log in the logs to be classified to obtain a log characteristic item set corresponding to each log in the logs to be classified;
determining log vectors corresponding to all logs in the logs to be classified based on the log feature item sets corresponding to all logs in the logs to be classified;
and classifying log vectors corresponding to all logs in the logs to be classified by utilizing an ant colony clustering algorithm so as to classify the logs to be classified.
Optionally, after the obtaining the log to be classified, the method further includes:
and acquiring log classification parameters corresponding to the logs to be classified.
Optionally, the classifying the log vectors corresponding to the logs to be classified by using the ant colony clustering algorithm includes:
A01: determining initial cluster centers from the log vectors, and determining the pheromone from each log vector to be classified, namely each log vector other than the initial cluster centers, to each initial cluster center;
A02: dividing each log vector to be classified into the class corresponding to one of the initial cluster centers based on the pheromone;
A03: determining, for each log vector, the sum of the distances from that log vector to the other log vectors;
A04: updating the initial cluster centers based on the distance sums, and updating the pheromone;
and executing step A02 again until the updated cluster centers are the same as the cluster centers before updating, or the current iteration count equals the preset maximum iteration count, whereupon the classification of the log vectors is finished.
Optionally, the determining pheromones from the log vectors to be classified, other than the initial cluster centers, to the initial cluster centers includes:
determining the pheromone from each log vector to be classified, namely each log vector other than the initial cluster centers, to each initial cluster center based on a first operation formula, wherein the first operation formula is as follows:
Figure BDA0002701353380000021
wherein τ_ij represents the pheromone from the i-th log vector to be classified to the j-th initial cluster center, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th initial cluster center, and r represents the preset cluster center radius.
Optionally, the dividing, based on the pheromone, each log vector to be classified into a class corresponding to the initial clustering center includes:
determining the probability of dividing each log vector to be classified into the class corresponding to each initial clustering center based on the pheromone and a second operation formula;
and dividing the log vectors to be classified into classes corresponding to the initial clustering centers according to the probabilities, wherein the second operation formula is as follows:
Figure BDA0002701353380000031
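wherein P_ij represents the probability of dividing the i-th log vector to be classified into the class corresponding to the j-th initial cluster center, α and β are both preset adjustment factors, and S represents the set of log vectors to be classified whose Euclidean distance to the j-th initial cluster center is less than or equal to the preset cluster center radius.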
Optionally, the determining a sum of distances from each log vector to other log vectors except for the log vector comprises:
determining the distance sum of each log vector to other log vectors except the log vector based on a third operation formula, wherein the third operation formula is as follows:
Figure BDA0002701353380000032
Figure BDA0002701353380000033
wherein L_m represents the sum of the distances from the m-th log vector to the other log vectors, x_m represents the m-th log vector, x_mp is the p-th value of the m-th log vector, N represents the total number of log vectors, c_mp is the p-th value of the transition vector c_m, and ||x_m - c_m||^2 represents the squared norm of the difference between the log vector x_m and the transition vector c_m.
Optionally, the updating the initial clustering center based on the distance sum and the updating the pheromone includes:
determining the log vector corresponding to the minimum distance sum as a new cluster center, and updating the initial cluster center with the new cluster center;
updating the pheromone by using a fourth operation formula, wherein the fourth operation formula is as follows:
Figure BDA0002701353380000034
wherein τ_ij' represents the updated pheromone from the i-th log vector to be classified to the j-th cluster center, τ_ij represents the pheromone from the i-th log vector to be classified to the j-th cluster center before updating, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th cluster center, ρ represents the pheromone volatility, and Q represents the preset total amount of pheromone.
In a second aspect, the present application discloses a log processing apparatus, including:
the data acquisition module is used for acquiring the logs to be classified;
the characteristic item extraction module is used for extracting characteristic items of all the logs to be classified to obtain a log characteristic item set corresponding to all the logs to be classified;
the log vector determining module is used for determining log vectors corresponding to all logs in the logs to be classified based on the log feature item sets corresponding to all logs in the logs to be classified;
and the log classification module is used for classifying the log vectors corresponding to the logs to be classified by utilizing an ant colony clustering algorithm so as to classify the logs to be classified.
In a third aspect, the present application discloses an electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the log processing method disclosed in the foregoing.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the log processing method disclosed above.
In summary, the present application first obtains the logs to be classified and extracts feature items from each of them to obtain a log feature item set for each log. It then determines a log vector for each log based on its log feature item set, and classifies the log vectors with an ant colony clustering algorithm, thereby classifying the logs. Because the logs are converted into log vectors through feature extraction before the ant colony clustering algorithm is applied, the logs can be classified, the accuracy and consistency of the classification results are improved, and the applicability is strong. In addition, the ant colony clustering algorithm is simple in structure and operation and easy to implement.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a log processing method disclosed in the present application;
FIG. 2 is a partial flow diagram of a particular log processing method disclosed herein;
FIG. 3 is a flowchart of a specific log processing method disclosed in the present application;
FIG. 4 is a schematic diagram of a log processing apparatus according to the disclosure;
FIG. 5 is a schematic structural diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an embodiment of the present application discloses a log processing method, including:
step S11: and acquiring a log to be classified.
In a specific implementation, the logs to be classified, which comprise a plurality of logs, are obtained first. For example, they may be obtained from the access log library of a website; the specific way of obtaining them is not limited here.
After the log to be classified is obtained, correspondingly, a log classification parameter corresponding to the log to be classified is also obtained, wherein the log classification parameter includes, but is not limited to, a preset cluster center radius and the like.
Step S12: and extracting characteristic items of each log in the logs to be classified to obtain a log characteristic item set corresponding to each log in the logs to be classified.
After the log to be classified is obtained, the log to be classified needs to be correspondingly processed, so that log vectors corresponding to all logs in the log to be classified can be obtained, and corresponding equipment can conveniently perform classification processing.
Specifically, feature item extraction needs to be performed on each log in the logs to be classified to obtain a log feature item set corresponding to each log in the logs to be classified, wherein the feature items needing to be extracted from each log include, but are not limited to, excessive outbound traffic, excessive inbound traffic, off-hours VPN login, firewall acceptance, firewall rejection, login from outside an internal network, multiple continuous failed logins, at least one successful login, multiple target IPs probed from a single source, multiple target IPs and ports probed from a single source.
Step S13: and determining log vectors corresponding to the logs in the logs to be classified based on the log feature item sets corresponding to the logs in the logs to be classified.
After the log feature item sets corresponding to the logs in the logs to be classified are obtained, the log vectors corresponding to the logs in the logs to be classified can be determined based on the log feature item sets corresponding to the logs in the logs to be classified.
Specifically, for a current feature item corresponding to any log, if the feature item is extracted, a value corresponding to the feature item is represented as 1, and if the feature item is not extracted, a value corresponding to the feature item is represented as 0, so that a log vector corresponding to the log is obtained.
For example, suppose that excessive outbound traffic, excessive inbound traffic, off-hours VPN login, firewall acceptance, and firewall rejection are extracted from log A, while login from outside the internal network, multiple consecutive failed logins, at least one successful login, multiple target IPs probed from a single source, and multiple target IPs and ports probed from a single source are not. The log vector corresponding to log A is then (1, 1, 1, 1, 1, 0, 0, 0, 0, 0).
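The encoding rule above can be sketched in a few lines of Python. This is a minimal illustration only: the identifier names in `FEATURE_ITEMS` and the function `to_log_vector` are hypothetical labels chosen for this sketch, and the ordering simply follows the order in which the feature items are listed in step S12.

```python
# Feature items in the order listed in step S12. The identifier names are
# hypothetical labels for this sketch, not names used by the application.
FEATURE_ITEMS = [
    "excessive_outbound_traffic",
    "excessive_inbound_traffic",
    "off_hours_vpn_login",
    "firewall_accept",
    "firewall_reject",
    "login_from_outside_internal_network",
    "multiple_consecutive_failed_logins",
    "at_least_one_successful_login",
    "single_source_probing_multiple_target_ips",
    "single_source_probing_multiple_target_ips_and_ports",
]


def to_log_vector(extracted_items):
    """Return a 0/1 tuple: 1 where the feature item was extracted, else 0."""
    extracted = set(extracted_items)
    return tuple(1 if item in extracted else 0 for item in FEATURE_ITEMS)


# Log A: the first five feature items were extracted, the rest were not.
log_a = to_log_vector(FEATURE_ITEMS[:5])
print(log_a)  # (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
```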
Step S14: and classifying log vectors corresponding to all logs in the logs to be classified by utilizing an ant colony clustering algorithm so as to classify the logs to be classified.
After the log vectors corresponding to the logs in the logs to be classified are obtained, the ant colony clustering algorithm can be used for classifying the log vectors corresponding to the logs in the logs to be classified so as to classify the logs to be classified.
The foraging process of ants can be divided into two stages: searching for food and carrying food. While moving, each ant releases pheromone on the path it travels and can sense pheromone and its intensity. The more ants pass along a path, the stronger its pheromone becomes, while the pheromone itself evaporates over time. Ants tend to move toward higher pheromone intensity, so the more ants travel a path, the more likely later ants are to choose it; the colony as a whole thus exhibits positive feedback. The basic idea of the ant colony clustering algorithm is to regard the data as ants with different attributes and the cluster centers as the "food sources" the ants seek, so that data clustering can be viewed as a process of ants searching for food sources.
In summary, the present application first obtains the logs to be classified and extracts feature items from each of them to obtain a log feature item set for each log. It then determines a log vector for each log based on its log feature item set, and classifies the log vectors with an ant colony clustering algorithm, thereby classifying the logs. Because the logs are converted into log vectors through feature extraction before the ant colony clustering algorithm is applied, the logs can be classified, the accuracy and consistency of the classification results are improved, and the applicability is strong. In addition, the ant colony clustering algorithm is simple in structure and operation and easy to implement.
Referring to fig. 2, classifying the log vectors corresponding to each log in the logs to be classified by using an ant colony clustering algorithm may specifically include:
A01: determining initial cluster centers from the log vectors, and determining the pheromone from each log vector to be classified, namely each log vector other than the initial cluster centers, to each initial cluster center;
A02: dividing each log vector to be classified into the class corresponding to one of the initial cluster centers based on the pheromone;
A03: determining, for each log vector, the sum of the distances from that log vector to the other log vectors;
A04: updating the initial cluster centers based on the distance sums, and updating the pheromone;
and executing step A02 again until the updated cluster centers are the same as the cluster centers before updating, or the current iteration count equals the preset maximum iteration count, whereupon the classification of the log vectors is finished.
Specifically, classifying the log vectors corresponding to the logs to be classified with the ant colony clustering algorithm starts with initialization: a certain number of initial cluster centers are selected at random from the log vectors, and the pheromone from each log vector to be classified, namely each log vector other than the initial cluster centers, to each initial cluster center is then determined based on a first operation formula, wherein the first operation formula is as follows:
Figure BDA0002701353380000071
wherein τ_ij represents the pheromone from the i-th log vector to be classified to the j-th initial cluster center, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th initial cluster center, and r represents the preset cluster center radius.
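The first operation formula survives here only as an image reference, so its exact form cannot be read from the text. The sketch below assumes a threshold rule that is common in ant colony clustering, namely τ_ij = 1 when d_ij ≤ r and 0 otherwise; this rule, and the function names `euclidean` and `init_pheromone`, are assumptions made for illustration and may differ from the formula in the figure.

```python
import math


def euclidean(a, b):
    """Euclidean distance d_ij between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_pheromone(vectors, centers, r):
    """Assumed rule: tau[i][j] is 1.0 if d_ij <= r, otherwise 0.0."""
    return [
        [1.0 if euclidean(v, c) <= r else 0.0 for c in centers]
        for v in vectors
    ]


# Two log vectors to be classified and two initial cluster centers.
vectors = [(1, 1, 0, 0), (0, 0, 1, 1)]
centers = [(1, 0, 0, 0), (0, 0, 0, 1)]
tau = init_pheromone(vectors, centers, r=1.0)
print(tau)  # [[1.0, 0.0], [0.0, 1.0]]
```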
Then, based on the pheromone, dividing each log vector to be classified into a class corresponding to the initial clustering center, specifically, determining a probability of dividing each log vector to be classified into a class corresponding to each initial clustering center based on the pheromone and a second operation formula, and then dividing each log vector to be classified into a class corresponding to the initial clustering center according to the probability, wherein the second operation formula is as follows:
Figure BDA0002701353380000081
wherein P_ij represents the probability of dividing the i-th log vector to be classified into the class corresponding to the j-th initial cluster center, α and β are both preset adjustment factors, and S represents the set of log vectors to be classified whose Euclidean distance to the j-th initial cluster center is less than or equal to the preset cluster center radius. That is, the probability of dividing each log vector to be classified into the class corresponding to each initial cluster center is determined based on the pheromone and the second operation formula, and the current log vector to be classified is then divided into the class of the initial cluster center with the maximum probability.
In practice, α and β may be set to 0.9 and 0.01, respectively. These factors help prevent search stagnation, in which all ants follow the same path to the same result and the procedure merely reproduces the classical greedy algorithm.
For example, the initial clustering center includes a log vector a and a log vector B, the log vector to be classified includes a log vector C, the probability of dividing the log vector C into the class corresponding to the log vector a is 0.7, and the probability of dividing the log vector C into the class corresponding to the log vector B is 0.3, then the log vector C is divided into the class corresponding to the log vector a.
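The assignment rule in this example amounts to picking the cluster center with the largest probability. A minimal sketch follows, taking the probabilities as given rather than computing them from the second operation formula; the function name `assign_to_clusters` and the dictionary layout are hypothetical.

```python
def assign_to_clusters(probabilities):
    """Map each log vector to the cluster center with the highest probability.

    `probabilities` maps a log vector name to a dict of
    {cluster center name: probability}.
    """
    return {name: max(p, key=p.get) for name, p in probabilities.items()}


# Log vector C: probability 0.7 for center A and 0.3 for center B.
assignment = assign_to_clusters({"C": {"A": 0.7, "B": 0.3}})
print(assignment)  # {'C': 'A'}
```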
And when the log vectors to be classified are all divided into classes corresponding to the corresponding initial clustering centers, completing the first round of clustering, and determining the distance sum of each log vector to other log vectors except the log vector.
Specifically, the sum of the distances from each log vector to other log vectors except the log vector is determined based on a third operation formula, where the third operation formula is:
Figure BDA0002701353380000082
Figure BDA0002701353380000091
wherein L_m represents the sum of the distances from the m-th log vector to the other log vectors, x_m represents the m-th log vector, x_mp is the p-th value of the m-th log vector, N represents the total number of log vectors, c_mp is the p-th value of the transition vector c_m, and ||x_m - c_m||^2 represents the squared norm of the difference between the log vector x_m and the transition vector c_m. In the third operation formula, c_m is an intermediate transition vector.
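Since the third operation formula is available only as image references, the following is a sketch under stated assumptions: L_m is taken to be the sum of squared Euclidean distances from x_m to every other vector in the list passed in, with each squared distance computed component by component as the sum over p of (x_mp - c_mp)^2. Whether the original sum runs over all N log vectors or only over one class is not recoverable from the text, and the function names are hypothetical.

```python
def squared_distance(x, c):
    """||x - c||^2 computed component by component."""
    return sum((xp - cp) ** 2 for xp, cp in zip(x, c))


def distance_sums(vectors):
    """L_m for every m: sum of squared distances to all other vectors."""
    return [
        sum(squared_distance(x, c) for k, c in enumerate(vectors) if k != m)
        for m, x in enumerate(vectors)
    ]


vectors = [(1, 1, 0), (1, 0, 0), (0, 0, 0)]
sums = distance_sums(vectors)
print(sums)  # [3, 2, 3]

# Index of the log vector with the smallest distance sum.
best = min(range(len(sums)), key=sums.__getitem__)
print(best)  # 1
```

Passing in only the members of one class would give a per-class candidate center, which is one plausible reading of the center update step.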
The initial cluster centers are then updated based on the distance sums, and the pheromone is updated. Specifically, the log vector corresponding to the minimum distance sum is determined as a new cluster center, and the initial cluster center is updated with it; the pheromone is then updated using a fourth operation formula, wherein the fourth operation formula is as follows:
Figure BDA0002701353380000092
wherein τ_ij' represents the updated pheromone from the i-th log vector to be classified to the j-th cluster center, τ_ij represents the pheromone from the i-th log vector to be classified to the j-th cluster center before updating, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th cluster center, ρ represents the pheromone volatility, and Q represents the preset total amount of pheromone.
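The fourth operation formula is likewise present only as an image reference. The sketch below assumes the evaporation-plus-deposit form that is common in ant colony methods, τ_ij' = (1 - ρ)·τ_ij + Q / d_ij, which uses exactly the symbols defined above but is not confirmed by the text. The guard `eps` for a zero distance and the function name `update_pheromone` are additional assumptions of this sketch.

```python
def update_pheromone(tau, d, rho, q, eps=1e-9):
    """Assumed update: evaporate the old pheromone, then deposit Q / d.

    tau: pheromone before the update (tau_ij)
    d:   Euclidean distance to the cluster center (d_ij)
    rho: pheromone volatility
    q:   preset total amount of pheromone (Q)
    eps: hypothetical guard against division by zero when d == 0
    """
    return (1.0 - rho) * tau + q / max(d, eps)


new_tau = update_pheromone(tau=1.0, d=2.0, rho=0.5, q=1.0)
print(new_tau)  # 1.0
```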
After the new cluster centers are determined, it is judged whether they are the same as the cluster centers before the update. If so, the cluster centers have stabilized and the classification ends. If not, it is judged whether the current iteration count has reached the preset maximum iteration count; if it has, the classification ends, and otherwise step A02 is executed again. That is, step A02 is repeated until the updated cluster centers are the same as the cluster centers before updating, or the current iteration count equals the preset maximum iteration count, whereupon the classification of the log vectors is complete.
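The two termination conditions can be expressed as a small control loop. This is a sketch of the control flow only: `assign` stands in for step A02 and `update_centers` for steps A03 and A04, both passed in as hypothetical callables so that the loop does not depend on the exact operation formulas.

```python
def run_clustering(centers, assign, update_centers, max_iterations):
    """Repeat assign/update until the centers stop changing or the cap is hit.

    Returns (centers, clusters, number of iterations performed).
    """
    clusters = None
    for iteration in range(1, max_iterations + 1):
        clusters = assign(centers)              # step A02
        new_centers = update_centers(clusters)  # steps A03 and A04
        if new_centers == centers:              # centers are stable: finish
            return centers, clusters, iteration
        centers = new_centers
    # Iteration cap reached before the centers stabilized.
    return centers, clusters, max_iterations
```

Keeping the iteration cap alongside the stability check guarantees that the loop terminates even if the cluster centers keep changing.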
Referring to FIG. 3, a specific log processing method is shown. First, the logs to be classified are input, the related classification parameters are initialized, and the preset maximum iteration number is decremented by 1. Each log to be classified other than the cluster centers is then treated as an ant. The state transition probability of ant i, that is, the probability of dividing the ant into the class corresponding to each cluster center, is calculated, and the ant is divided into the class (cluster) of the corresponding cluster center according to that probability. It is then judged whether every ant has been divided into a cluster. If so, the cluster centers are recalculated, the pheromone from each ant to the cluster centers is updated, and the termination condition is checked. If the termination condition is met, the final solution is output; if not, the steps of calculating the state transition probability of ant i and dividing the ant into the class of the corresponding cluster center are repeated.
Referring to fig. 4, an embodiment of the present application discloses a log processing apparatus, including:
the data acquisition module 11 is used for acquiring logs to be classified;
a feature item extraction module 12, configured to perform feature item extraction on each log in the logs to be classified to obtain a log feature item set corresponding to each log in the logs to be classified;
a log vector determining module 13, configured to determine, based on the log feature item set corresponding to each log in the logs to be classified, a log vector corresponding to each log in the logs to be classified;
the log classifying module 14 is configured to classify the log vectors corresponding to the logs to be classified by using an ant colony clustering algorithm, so as to classify the logs to be classified.
In summary, the present application first obtains the logs to be classified and extracts feature items from each of them to obtain a log feature item set for each log. It then determines a log vector for each log based on its log feature item set, and classifies the log vectors with an ant colony clustering algorithm, thereby classifying the logs. Because the logs are converted into log vectors through feature extraction before the ant colony clustering algorithm is applied, the logs can be classified, the accuracy and consistency of the classification results are improved, and the applicability is strong. In addition, the ant colony clustering algorithm is simple in structure and operation and easy to implement.
Specifically, the data obtaining module 11 is further configured to:
and acquiring log classification parameters corresponding to the logs to be classified.
Further, the log classification module 14 is configured to:
A01: determining initial cluster centers from the log vectors, and determining the pheromone from each log vector to be classified, namely each log vector other than the initial cluster centers, to each initial cluster center;
A02: dividing each log vector to be classified into the class corresponding to one of the initial cluster centers based on the pheromone;
A03: determining, for each log vector, the sum of the distances from that log vector to the other log vectors;
A04: updating the initial cluster centers based on the distance sums, and updating the pheromone;
and executing step A02 again until the updated cluster centers are the same as the cluster centers before updating, or the current iteration count equals the preset maximum iteration count, whereupon the classification of the log vectors is finished.
Further, the log classification module 14 is configured to:
determining the pheromone from each log vector to be classified, namely each log vector other than the initial cluster centers, to each initial cluster center based on a first operation formula, wherein the first operation formula is as follows:
Figure BDA0002701353380000111
wherein τ_ij represents the pheromone from the i-th log vector to be classified to the j-th initial cluster center, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th initial cluster center, and r represents the preset cluster center radius.
Further, the log classification module 14 is configured to:
determining the probability of dividing each log vector to be classified into the class corresponding to each initial clustering center based on the pheromone and a second operation formula;
and dividing the log vectors to be classified into classes corresponding to the initial clustering centers according to the probabilities, wherein the second operation formula is as follows:
Figure BDA0002701353380000112
wherein P_ij represents the probability of dividing the i-th log vector to be classified into the class corresponding to the j-th initial cluster center, α and β are both preset adjustment factors, and S represents the set of log vectors to be classified whose Euclidean distance to the j-th initial cluster center is less than or equal to the preset cluster center radius.
Further, the log classification module 14 is configured to:
determining the distance sum of each log vector to other log vectors except the log vector based on a third operation formula, wherein the third operation formula is as follows:
Figure BDA0002701353380000121
Figure BDA0002701353380000122
wherein L_m represents the sum of the distances from the m-th log vector to the other log vectors, x_m represents the m-th log vector, x_mp is the p-th value of the m-th log vector, N represents the total number of log vectors, c_mp is the p-th value of the transition vector c_m, and ||x_m - c_m||^2 represents the squared norm of the difference between the log vector x_m and the transition vector c_m.
Further, the log classification module 14 is configured to:
determining the log vector corresponding to the minimum distance sum as a new clustering center, and updating the initial clustering center by using the new clustering center;
updating the pheromone by using a fourth operation formula, wherein the fourth operation formula is as follows:
Figure BDA0002701353380000123
wherein τ_ij' represents the updated pheromone from the i-th log vector to be classified to the j-th clustering center, τ_ij represents the pheromone from the i-th log vector to be classified to the j-th clustering center before the update, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th clustering center, ρ represents the volatility of the pheromone, and Q represents the preset total quantity of pheromone.
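As an illustrative, non-limiting sketch, the center update and the pheromone update may be written as follows. The fourth operation formula is available here only as a figure, so the sketch assumes the evaporation-plus-deposit rule commonly used in ant colony methods, τ_ij' = (1 - ρ) · τ_ij + Q / d_ij. That rule, the `eps` guard, the default parameter values, and the function names are assumptions for illustration.

```python
def new_center(members, sums):
    """Pick the member whose distance sum is smallest as the new clustering center."""
    best = min(range(len(members)), key=lambda m: sums[m])
    return members[best]

def update_pheromone(tau_ij, d_ij, rho=0.1, q=1.0, eps=1e-12):
    """ASSUMED rule: evaporate a fraction rho, then deposit Q / d_ij."""
    return (1.0 - rho) * tau_ij + q / max(d_ij, eps)

# Hypothetical class members with precomputed distance sums.
members = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
sums = [17.0, 10.0, 25.0]
center = new_center(members, sums)
tau_new = update_pheromone(tau_ij=1.0, d_ij=2.0, rho=0.1, q=1.0)
```

Under the assumed rule, a larger ρ makes earlier assignments fade faster, while a larger Q strengthens the pull of nearby centers.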
Referring to FIG. 5, a schematic structural diagram of an electronic device 20 provided in an embodiment of the present application is shown. The electronic device 20 may specifically be, but is not limited to, a notebook computer, a desktop computer, a server, or the like.
In general, the electronic device 20 in the present embodiment includes: a processor 21 and a memory 22.
The processor 21 may include one or more processing cores, for example a four-core processor or an eight-core processor. The processor 21 may be implemented in at least one hardware form among a DSP (digital signal processing), an FPGA (field-programmable gate array), and a PLA (programmable logic array). The processor 21 may also include a main processor and a coprocessor, where the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may be integrated with a GPU (graphics processing unit), which is responsible for rendering and drawing the images to be displayed on the display screen. In some embodiments, the processor 21 may include an AI (artificial intelligence) processor for handling computing operations related to machine learning.
Memory 22 may include one or more computer-readable storage media, which may be non-transitory. Memory 22 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 22 is used at least for storing the computer program 221, which, after being loaded and executed by the processor 21, implements the steps of the log processing method disclosed in any one of the foregoing embodiments.
In some embodiments, the electronic device 20 may further include a display 23, an input/output interface 24, a communication interface 25, a sensor 26, a power supply 27, and a communication bus 28.
Those skilled in the art will appreciate that the configuration shown in FIG. 5 does not limit the electronic device 20, which may include more or fewer components than those shown.
Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the log processing method disclosed in any of the foregoing embodiments.
For the specific process of the log processing method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant details can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The log processing method, device, equipment and medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A log processing method, comprising:
acquiring a log to be classified;
extracting feature items of each log in the logs to be classified to obtain a log feature item set corresponding to each log in the logs to be classified;
determining log vectors corresponding to all logs in the logs to be classified based on the log feature item sets corresponding to all logs in the logs to be classified;
and classifying log vectors corresponding to all logs in the logs to be classified by utilizing an ant colony clustering algorithm so as to classify the logs to be classified.
2. The log processing method according to claim 1, wherein after the obtaining of the log to be classified, the method further comprises:
and acquiring log classification parameters corresponding to the logs to be classified.
3. The log processing method according to claim 1 or 2, wherein the classifying the log vectors corresponding to the logs to be classified by using an ant colony clustering algorithm comprises:
A01: determining initial clustering centers from the log vectors, and determining the pheromone from each log vector to be classified, other than the initial clustering centers, to each initial clustering center;
A02: dividing each log vector to be classified into the class corresponding to one of the initial clustering centers based on the pheromone;
A03: determining the sum of the distances from each log vector to all the other log vectors;
A04: updating the initial clustering centers based on the distance sums, and updating the pheromone;
and returning to step A02 until the updated clustering centers are the same as the clustering centers before the update, or the current number of iterations equals the preset maximum number of iterations, whereupon the classification of the log vectors ends.
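As an illustrative, non-limiting sketch, the iteration over steps A01 to A04 with its two stopping conditions may be written as follows. The operation formulas are available here only as figures, so every formula in the sketch is an assumption: the first k vectors seed the centers, the initial pheromone is a threshold on the radius r, the assignment weight is τ^α · (1/d)^β normalised over the vectors within the radius, each vector goes to the center with the highest probability (nearest center on ties), distance sums use squared distances within a class, and the pheromone update is (1 - ρ) · τ + Q / d. The function name `ant_colony_cluster` and all parameters are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ant_colony_cluster(vectors, k, r, alpha=1.0, beta=1.0, rho=0.1, q=1.0, max_iter=50):
    n, eps = len(vectors), 1e-12
    center_idx = list(range(k))  # ASSUMPTION: the first k vectors seed the centers
    # A01 (assumed threshold rule): pheromone 1 inside radius r, else 0.
    tau = [[1.0 if dist(vectors[i], vectors[c]) <= r else 0.0 for c in center_idx]
           for i in range(n)]
    labels = [0] * n
    for _ in range(max_iter):
        d = [[dist(vectors[i], vectors[c]) for c in center_idx] for i in range(n)]
        free = [i for i in range(n) if i not in center_idx]  # vectors to classify
        # A02 (assumed form): weight tau^alpha * (1/d)^beta, normalised over the
        # set S of vectors to classify that lie within radius r of center j.
        w = [[(tau[i][j] ** alpha) * (1.0 / max(d[i][j], eps)) ** beta
              for j in range(k)] for i in range(n)]
        denom = [sum(w[s][j] for s in free if d[s][j] <= r) for j in range(k)]
        for j, c in enumerate(center_idx):
            labels[c] = j  # each center belongs to its own class
        for i in free:
            p = [w[i][j] / denom[j] if denom[j] > 0 else 0.0 for j in range(k)]
            # Highest probability wins; ties (e.g. all zero) go to the nearest center.
            labels[i] = max(range(k), key=lambda j: (p[j], -d[i][j]))
        # A03 + A04: the member with the smallest sum of squared distances to
        # the rest of its class becomes that class's new center.
        new_idx = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            new_idx.append(min(members, key=lambda m: sum(
                dist(vectors[m], vectors[o]) ** 2 for o in members)))
        # A04 (assumed rule): evaporate, then deposit pheromone toward new centers.
        for i in range(n):
            for j, c in enumerate(new_idx):
                if i == c:
                    continue  # a center keeps no pheromone trail to itself
                tau[i][j] = (1.0 - rho) * tau[i][j] + q / max(dist(vectors[i], vectors[c]), eps)
        if new_idx == center_idx:  # centers unchanged: classification finished
            break
        center_idx = new_idx
    return labels, [vectors[c] for c in center_idx]

# Hypothetical log vectors forming two well-separated groups.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = ant_colony_cluster(data, k=2, r=3.0)
```

The loop ends either when the centers stop changing or when `max_iter` is reached, mirroring the two termination conditions stated above.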
4. The log processing method according to claim 3, wherein the determining of the pheromone from each log vector to be classified, other than the initial clustering centers, to each initial clustering center comprises:
determining, based on a first operation formula, the pheromone from each log vector to be classified, other than the initial clustering centers, to each initial clustering center, wherein the first operation formula is as follows:
Figure FDA0002701353370000011
wherein τ_ij represents the pheromone from the i-th log vector to be classified to the j-th initial clustering center, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th initial clustering center, and r represents the preset clustering center radius.
5. The log processing method according to claim 4, wherein the classifying the log vectors to be classified into the classes corresponding to the initial clustering centers based on the pheromone comprises:
determining the probability of dividing each log vector to be classified into the class corresponding to each initial clustering center based on the pheromone and a second operation formula;
and dividing the log vectors to be classified into classes corresponding to the initial clustering centers according to the probabilities, wherein the second operation formula is as follows:
Figure FDA0002701353370000021
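wherein P_ij represents the probability of dividing the i-th log vector to be classified into the class corresponding to the j-th initial clustering center, α and β are both preset adjusting factors, and S represents the set of log vectors to be classified whose Euclidean distance to the j-th initial clustering center is less than or equal to the preset clustering center radius.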
6. The log processing method according to claim 5, wherein the determining of the sum of the distances from each log vector to all the other log vectors comprises:
determining, based on a third operation formula, the sum of the distances from each log vector to all the other log vectors, wherein the third operation formula is as follows:
Figure FDA0002701353370000022
c_m = (c_m1, c_m2, c_m3, ..., c_mp),
Figure FDA0002701353370000023
wherein L_m represents the sum of the distances from the m-th log vector to all the other log vectors, x_m represents the m-th log vector, x_mp is the p-th value of the m-th log vector, N represents the total number of log vectors, c_mp is the p-th value of the transition vector c_m, and ||x_m - c_m||^2 represents the squared norm of the difference between the log vector x_m and the transition vector c_m.
7. The log processing method of claim 6, wherein the updating the initial cluster center and the updating the pheromone based on the distance sum comprises:
determining the log vector corresponding to the minimum distance sum as a new clustering center, and updating the initial clustering center by using the new clustering center;
updating the pheromone by using a fourth operation formula, wherein the fourth operation formula is as follows:
Figure FDA0002701353370000031
wherein τ_ij' represents the updated pheromone from the i-th log vector to be classified to the j-th clustering center, τ_ij represents the pheromone from the i-th log vector to be classified to the j-th clustering center before the update, d_ij represents the Euclidean distance from the i-th log vector to be classified to the j-th clustering center, ρ represents the volatility of the pheromone, and Q represents the preset total quantity of pheromone.
8. A log processing apparatus, comprising:
the data acquisition module is used for acquiring the logs to be classified;
the feature item extraction module is used for extracting feature items of each log in the logs to be classified to obtain a log feature item set corresponding to each log in the logs to be classified;
the log vector determining module is used for determining log vectors corresponding to all logs in the logs to be classified based on the log feature item sets corresponding to all logs in the logs to be classified;
and the log classification module is used for classifying the log vectors corresponding to the logs to be classified by utilizing an ant colony clustering algorithm so as to classify the logs to be classified.
9. An electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the log processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the log processing method according to any one of claims 1 to 7.
CN202011023270.6A 2020-09-25 2020-09-25 Log processing method, device, equipment and medium Pending CN112131199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011023270.6A CN112131199A (en) 2020-09-25 2020-09-25 Log processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011023270.6A CN112131199A (en) 2020-09-25 2020-09-25 Log processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN112131199A true CN112131199A (en) 2020-12-25

Family

ID=73840288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011023270.6A Pending CN112131199A (en) 2020-09-25 2020-09-25 Log processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112131199A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222098A (en) * 2011-06-20 2011-10-19 北京邮电大学 Method and system for pre-fetching webpage
CN102254004A (en) * 2011-07-14 2011-11-23 北京邮电大学 Method and system for modeling Web in weblog excavation
CN109543739A (en) * 2018-11-15 2019-03-29 杭州安恒信息技术股份有限公司 Log classification method, device, equipment and readable storage medium
CN110633371A (en) * 2019-09-23 2019-12-31 北京安信天行科技有限公司 Log classification method and system
CN111159413A (en) * 2019-12-31 2020-05-15 深信服科技股份有限公司 Log clustering method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bai Lei: "Research on the Improvement and Application of the Ant Colony Algorithm", China Excellent Master's Theses Database *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632000A (en) * 2020-12-30 2021-04-09 北京天融信网络安全技术有限公司 Log file clustering method and device, electronic equipment and readable storage medium
CN112632000B (en) * 2020-12-30 2023-11-10 北京天融信网络安全技术有限公司 Log file clustering method, device, electronic equipment and readable storage medium
CN113553499A (en) * 2021-06-22 2021-10-26 杭州摸象大数据科技有限公司 Cheating detection method and system based on marketing fission and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination