CN109992476B - Log analysis method, server and storage medium

Info

Publication number: CN109992476B
Application number: CN201910213011.0A
Authority: CN
Inventors: 陈涛
Original assignee: Wangsu Science and Technology Co Ltd
Current assignee: Wangsu Science and Technology Co Ltd
Priority date: 2019-03-20
Filing date: 2019-03-20
Publication date: 2023-08-18
Anticipated expiration: 2039-03-20
Also published as: CN109992476A

Abstract

The embodiment of the application relates to the field of data processing, and discloses a log analysis method, a server and a storage medium. In some embodiments of the present application, a method for analyzing a log includes: acquiring a first log to be processed; processing the first log to obtain a word bag of the first log; determining similarity of a word bag of the first log and a word bag of a reference log in a mapping file, wherein the mapping file comprises the word bag of the reference log and a fault class of the reference log and/or a fault level of the reference log; and determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bags of the first log and the word bags of the reference log. In the implementation, the server can analyze the first log by using the mapping file, determine the fault class of the first log and/or the fault level of the first log, so that the intelligence of the server is improved, and the pressure of maintenance personnel from analyzing the log is reduced.

Description

Log analysis method, server and storage medium

Technical Field

The embodiment of the application relates to the field of data processing, in particular to a log analysis method, a server and a storage medium.

Background

The kernel log is the main means of recording the performance of the server itself and of its running processes, modules and the like during operation. However, some kernel messages cannot be recorded in the kernel log. For example, when a system goes down (panic), part of the information is displayed directly on the screen and, because of the crash, cannot be written to the kernel log. This information disappears after the system is restarted. Currently, some transport tools, such as netconsole, solve the problem that this portion of the kernel log cannot be collected. Such a tool sends this part of the kernel log over the network to another server for storage, so that the kernel log retained for the system has as few omissions as possible.

However, the inventors found that there are at least the following problems in the prior art: the daily log volume is huge, especially with an enterprise-scale number of servers, and manually processing the logs of each server wastes a lot of time and effort.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

An object of an embodiment of the present invention is to provide a method for analyzing logs, a server, and a storage medium, so that the number of logs recorded is reduced, and the time and effort spent on manually processing the logs are reduced.

In order to solve the technical problem, the embodiment of the invention also provides a log analysis method, which comprises the following steps: acquiring a first log to be processed; processing the first log to obtain a word bag of the first log; determining similarity of a word bag of the first log and a word bag of a reference log in a mapping file, wherein the mapping file comprises the word bag of the reference log and a fault class of the reference log and/or a fault level of the reference log; and determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bags of the first log and the word bags of the reference log.

The embodiment of the invention also provides a server, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of analyzing logs as mentioned in the above embodiments.

The embodiment of the invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the method for analyzing logs mentioned in the above embodiment.

Compared with the prior art, the embodiments of the invention compare the word bag of the log to be processed with the word bag of the history log and can thereby determine the relationship between the two word bags, which reflects the relationship between the log to be processed and the history log. Because the server can determine the relationship between the log to be processed and the history log, the recorded logs can be selectively retained according to that relationship, so that the number of recorded logs is reduced and the burden of manually processing the logs is reduced.

In addition, determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log specifically includes: the fault class of the reference log with the highest similarity with the word bag of the first log is used as the fault class of the first log, and/or the fault level of the reference log with the highest similarity with the word bag of the first log is used as the fault level of the first log. In the implementation, the fault type of the log and/or the fault level can be automatically determined, so that the intelligence is improved, and maintenance personnel can conveniently know the fault type and the fault level of the server.

In addition, determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log specifically includes: judging whether the mapping file contains a word bag of a reference log whose similarity to the word bag of the first log is larger than a second preset value; if so, taking the fault class of the reference log with the highest similarity to the word bag of the first log as the fault class of the first log, and/or taking the fault level of that reference log as the fault level of the first log; otherwise, determining the fault class of the first log as an unknown class, and/or determining the fault level of the first log as an unknown level. In this implementation, new faults can be identified automatically.
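The threshold rule above can be sketched as follows. This is only an illustrative sketch: the in-memory layout of the mapping file (a dictionary keyed by bag of words), the integer fault levels, and the names `MappingFile`, `similarity` and `classify` are assumptions rather than details given in the text.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical mapping file: reference bag of words -> (fault class, fault level).
MappingFile = Dict[FrozenSet[str], Tuple[str, int]]


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared words divided by the number of distinct words in both bags."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def classify(bag: FrozenSet[str], mapping: MappingFile,
             threshold: float) -> Tuple[str, Optional[int]]:
    """Return the class and level of the most similar reference log.

    If no reference bag is more similar than `threshold` (the "second preset
    value"), the log is reported as ("unknown", None).
    """
    best = max(mapping, key=lambda ref: similarity(bag, ref), default=None)
    if best is None or similarity(bag, best) <= threshold:
        return ("unknown", None)
    return mapping[best]
```

A log classified as unknown would then be reported so that a user can assign its fault class and fault level, as described in the next paragraph.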

In addition, the mapping file comprises a word bag of the reference log, a fault class of the reference log and a fault level of the reference log; after determining that the fault class of the first log is an unknown class and that the fault level of the first log is an unknown level, the log analysis method further comprises: reporting the first log; determining the fault class of the first log and the fault level of the first log according to the fault class and the fault level designated by the user; and updating the mapping file according to the word bag of the first log, the fault class of the first log and the fault level of the first log. In this implementation, the mapping file can be updated according to identified logs of new fault classes and is continuously expanded, so that subsequent logs can be analyzed more accurately.

In addition, the mapping file comprises a word bag of the reference log, a fault class of the reference log and a fault level of the reference log; after determining the fault class of the first log and the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log, the log analysis method further comprises the following steps: judging whether a second log exists in the recorded logs, wherein the second log is a log belonging to the same fault class as the first log; if it is determined that the second log exists, comparing the fault level of the first log with the fault level of the second log, and updating the recorded logs according to the comparison result; if it is determined that the second log does not exist, recording the first log.

In addition, updating the recorded logs according to the comparison result specifically includes: if the comparison result indicates that the fault level of the first log is higher than that of the second log, overwriting the second log with the first log; and if the comparison result indicates that the fault level of the first log is not higher than that of the second log, not overwriting the second log with the first log. In this implementation, within one fault class the log with the higher fault level is recorded, which ensures that the importance of the reference log keeps increasing, so that the alarm is continuously escalated.
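A minimal sketch of this update rule is given below. It assumes that fault levels are integers where a larger value means a more severe fault, and that recorded logs are held in a dictionary keyed by fault class; the names `LogRecord` and `update_recorded` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LogRecord:
    text: str
    fault_class: str
    fault_level: int  # assumption: a larger number means a more severe fault


def update_recorded(recorded: Dict[str, LogRecord], first: LogRecord) -> None:
    """Keep at most one recorded log per fault class, preferring higher levels."""
    second = recorded.get(first.fault_class)
    if second is None:
        # No log of this fault class has been recorded yet: record the first log.
        recorded[first.fault_class] = first
    elif first.fault_level > second.fault_level:
        # The first log is more severe: overwrite the second log.
        recorded[first.fault_class] = first
    # Otherwise the second log is kept unchanged.
```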

In addition, determining the similarity between the bag of words of the first log and the bag of words of the reference log in the mapping file specifically includes: calculating the similarity according to a constraint relation among the bag of words of the first log, the bag of words of the reference log and the similarity; wherein the constraint relation is: similarity = (number of words present in both the bag of words of the first log and the bag of words of the reference log) / (number of words in the bag of words of the first log + number of words in the bag of words of the reference log - number of words present in both bags of words).

In addition, before calculating the similarity according to the constraint relation of the word bag of the first log, the word bag of the reference log and the similarity, the analysis method of the log further comprises: removing invalid words in the word bags of the first log and the word bags of the reference log; wherein the invalid word is a pre-specified word. In this implementation, the influence of the invalid word on the similarity of the bag of words of the first log and the bag of words of the reference log in the mapping file can be avoided.

In addition, the processing of the first log to obtain the bag of words of the first log specifically includes: deleting a variable in the first log, wherein the variable is a preset parameter; splitting the first log after deleting the variables into N words, generating a word bag of the log to be processed, wherein N is a positive integer.

In addition, the preset parameters at least comprise any one of position information of a bad track, number information of the bad track, position information of a bad block and number information of the bad block.

In addition, deleting the variable in the first log specifically includes: identifying the digits in the body portion of the first log; and deleting the digits in the body portion of the first log.

Drawings

One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.

FIG. 1 is a flowchart of a method of processing a log according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a method of processing a log according to a second embodiment of the present invention;

FIG. 3 is a flowchart of a method of analyzing a log according to a third embodiment of the present invention;

FIG. 4 is a flowchart of a method of analyzing a log according to a fourth embodiment of the present invention;

FIG. 5 is a schematic diagram of the structure of a server according to a fifth embodiment of the present invention;

FIG. 6 is a schematic diagram of the structure of a server according to a sixth embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that in various embodiments of the present application, numerous technical details have been set forth in order to provide a better understanding of the present application. However, the claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments.

The first embodiment of the application relates to a log processing method which is applied to a server. As shown in fig. 1, the log processing method includes:

Step 101: obtain a log to be processed.

Specifically, the log to be processed may be a log generated by the server itself, or may be a log of another server stored on the server. The other servers can transmit their own logs to the server through netconsole, or the logs of the other servers can be copied to the server in other ways.

It should be noted that, as will be understood by those skilled in the art, the method for processing the log may be applied to a process of processing a plurality of recorded logs by a server, or may be applied to a process of determining whether to record the log after the log is generated by the server.

For clarity of explanation, in this embodiment, it is assumed that, after receiving the first log file, the server processes each log sequentially in order from old to new using the log processing method mentioned in this embodiment. Those skilled in the art can understand that, in practical application, the process of processing the log generated by the server may refer to the related content of this embodiment, which is not described herein.

Step 102: process the log to be processed to obtain the word bag of the log to be processed.

Specifically, a log consists mainly of words. One log is converted into a word bag consisting of a plurality of words, the word bag contains no repeated words, and the relationship between logs can be determined from the relationship between their word bags.

In one example, the server first deletes the variables in the log to be processed, and then splits the log after deleting the variables into N words to generate the word bag of the log to be processed, where N is a positive integer. In other words, the server compresses the log to be processed before converting it into a word bag. The log to be processed may contain information that is too detailed, or that is unimportant to maintenance personnel, and that is of little use for analyzing the running state of the server; maintenance personnel can set such information as variables, so that the server deletes it and thereby compresses the log when processing it.

The following illustrates a process in which the server deletes a variable in a log to be processed.

In one example, the variable is a preset parameter. The preset parameters comprise at least any one of position information of a bad track, number information of the bad track, position information of a bad block and number information of the bad block. In this case, the server may delete the variable as follows: identify the digits in the body part of the log to be processed, and delete those digits.

Each log can be split into two parts, namely a timestamp part and a body part. In the body part, the preset variables are removed and the constants are retained. The constants are the important information in the log. Which information counts as a variable and which as a constant may be set empirically and according to requirements.

In one example, the variables may include, but are not limited to, the following information:

1. Information that is too detailed, such as the position information (e.g., address) of a bad track, the position information of a bad block, the number information of a bad track and the number information of a bad block, which consists of a series of digits or of digits plus English letters.

2. Less important information. For example, the 1 in sda1 denotes the first partition of the disk named sda; here sda is important information and 1 is unimportant information.

Since this information contains digits, the server can determine the variables in the log by identifying digits. Of course, although the present embodiment determines the variables in the log by identifying digits, this does not mean that every number is unimportant information. For example, in a log with the content Nov 26 00:24:04 CPU27:Package power limit notification (total events= 173318), the 27 in CPU27 is as important as the a in sda and should not be deleted. Here, Nov 26 00:24:04 CPU27:Package power limit notification (total events= 173318) represents a package power limit notification (total events = 173318) from central processing unit (CPU) No. 27 at 00:24:04 on November 26. In this case, mistaken deletion of important information can be avoided by modifying the deletion rule. For example, before deleting a number in the body part of the log to be processed, the server judges whether the word preceding the number is CPU, and deletes the number only if it is not.

It should be noted that, as those skilled in the art can understand, when the preset variable changes, the method for identifying the variable by the server may also change, and in practical application, the method for identifying the variable by the server may be set according to the needs.

The following describes a procedure of compressing the log to be processed in combination with the actual situation.

For example, the log to be processed is: Nov 20 00:01:02 I/O error on device sdc1, logical block 1057. The body part of the log is: I/O error on device sdc1, logical block 1057. Here, the variables are the 1 in sdc1, which refers to the first partition of the sdc disk, and 1057, which refers to logical block 1057; the constants are "I/O error on device" and "logical block". Thus, the log tells us that an error occurred in a read or write operation on logical block 1057 of the first partition of the sdc disk. Compressing the log means modifying the information in the current log or discarding less important information. For the example log, the information 1057 and the "first partition" of sdc can be discarded. Thus, the example log is compressed to I/O error on device sdc logical block. This information indicates that the disk named sdc has a logical block read/write error.
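The compression and word-bag steps described above can be sketched as follows. The sketch assumes that every log starts with a syslog-style "Mon DD HH:MM:SS" timestamp and that words are runs of letters, digits, "/" and "_"; neither assumption, nor the function names `split_log`, `delete_variables` and `bag_of_words`, comes from the text. The CPU exception follows the modified deletion rule mentioned earlier.

```python
import re
from typing import FrozenSet, Tuple

# Assumption: each log begins with a syslog-style "Mon DD HH:MM:SS" timestamp.
TIMESTAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(.*)$")


def split_log(line: str) -> Tuple[str, str]:
    """Split a log line into its timestamp part and its body part."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return match.group(1), match.group(2)


def delete_variables(body: str) -> str:
    """Delete digit runs from the body, keeping digits that directly follow 'CPU'."""
    def keep_cpu(match: "re.Match[str]") -> str:
        token = match.group(0)
        return token if token.startswith("CPU") else ""
    # "CPU\d+" is tried first, so CPU numbers are matched whole and preserved.
    return re.sub(r"CPU\d+|\d+", keep_cpu, body)


def bag_of_words(body: str) -> FrozenSet[str]:
    """Compress the body and return its words as a set without duplicates."""
    return frozenset(re.findall(r"[A-Za-z0-9/_]+", delete_variables(body)))
```

For the example log, `split_log` separates "Nov 20 00:01:02" from the body, and `bag_of_words` applied to the body yields the words I/O, error, on, device, sdc, logical and block.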

According to the method, important information in the log to be processed is reserved through compression processing of the log to be processed, less important information is discarded, and the storage space occupied by the log is reduced.

It should be noted that, in practical applications, the server may perform other processing on the log to be processed, and the compression processing is taken as an example in this embodiment, but the compression processing is not an essential step in the processing of the log to be processed, and the portion of the content may be selectively executed.

Step 103: and comparing the word bags of the logs to be processed with the word bags of the history logs, and determining the relationship between the word bags of the logs to be processed and the word bags of the history logs.

Specifically, the relationship between the bag of words of the log to be processed (hereinafter referred to as bag of words 1) and the bag of words of the history log (hereinafter referred to as bag of words 2) includes, but is not limited to: the first relationship, the second relationship, the third relationship, the fourth relationship, and the fifth relationship. The first relationship is that the word bag of the history log contains the word bag of the log to be processed; the second relationship is that the word bag of the history log is equal to the word bag of the log to be processed; the third relationship is that the word bag of the log to be processed contains the word bag of the history log; the fourth relationship is that the word bag of the log to be processed intersects the word bag of the history log; and the fifth relationship is that the word bag of the log to be processed is independent of the word bag of the history log.

Various relationships between the word bag of the log to be processed (hereinafter, referred to as word bag 1) and the word bag of the history log (hereinafter, referred to as word bag 2) are explained below.

First, the first relationship and the third relationship are explained, that is, the inclusion relationship in which bag of words 1 contains bag of words 2 or bag of words 2 contains bag of words 1. Bag of words 1 containing bag of words 2 means that every word in bag of words 2 is present in bag of words 1, and bag of words 1 has some words that are not in bag of words 2. Bag of words 2 containing bag of words 1 means that every word in bag of words 1 is present in bag of words 2, and bag of words 2 has some words that are not in bag of words 1. When a problem occurs during the transmission of logs, the word bags of two identical logs may be in the inclusion relationship. For example, for two identical logs, netconsole loses some elements while transmitting one of them, so that the server receives one complete log and one incomplete log. In this case, the word bag of the complete log contains the word bag of the incomplete log, and the two word bags are in the inclusion relationship.

The second relationship, namely the equal relationship characterized by the bag of words 1 and the bag of words 2, is then explained. Bag of words 1 is equal to bag of words 2, meaning that the words in bag of words 1 are identical to the words in bag of words 2. For example, the word bags generated by two identical logs are equal, or the word bags of the logs with the same important information are equal.

Next, the fourth relationship, that is, the intersecting relationship in which bag of words 1 intersects bag of words 2, is explained. Bag of words 1 intersecting bag of words 2 means that some words appear in both bags, while each bag also has words that are not present in the other. When some important information in the log to be processed is the same as in the history log and some is different, bag of words 1 and bag of words 2 are in an intersecting relationship. For example, the log to be processed is "Nov 25 18:09:11 Kernel panic - not syncing: Fatal hardware error!" and the history log is "Nov 20 00:01:02 I/O error on device sdc1, logical block 1057". Both word bags contain the word "error", but the two logs are completely different. Here, the first log means that at 18:09:11 on November 25 the system suffered a kernel panic (not syncing) caused by a fatal hardware error, and the second log indicates that at 00:01:02 on November 20 a read/write error occurred on logical block 1057 of the first partition of the disk named sdc.

Finally, the fifth relationship, that is, the independent relationship in which bag of words 1 is independent of bag of words 2, is explained. Bag of words 1 being independent of bag of words 2 means that the two bags have no words in common, and the corresponding logs are not related at all.
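If each bag of words is held as a set, the five relationships map directly onto set comparisons. The following is a minimal sketch under that assumption; the enum `Relation` and the function `relation` are illustrative names only.

```python
from enum import Enum
from typing import FrozenSet


class Relation(Enum):
    HISTORY_CONTAINS_PENDING = 1  # first relationship
    EQUAL = 2                     # second relationship
    PENDING_CONTAINS_HISTORY = 3  # third relationship
    INTERSECT = 4                 # fourth relationship
    INDEPENDENT = 5               # fifth relationship


def relation(pending: FrozenSet[str], history: FrozenSet[str]) -> Relation:
    """Classify bag of words 1 (pending) against bag of words 2 (history)."""
    if pending == history:
        return Relation.EQUAL
    if pending < history:   # proper subset: history has extra words
        return Relation.HISTORY_CONTAINS_PENDING
    if pending > history:   # proper superset: pending has extra words
        return Relation.PENDING_CONTAINS_HISTORY
    if pending & history:   # some shared words, but neither contains the other
        return Relation.INTERSECT
    return Relation.INDEPENDENT
```

With the two example logs above, the bags share only the word "error", so `relation` returns `Relation.INTERSECT`.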

In one example, a mask bag of words is provided in the server. Before performing step 103, the server determines that the bag of words of the log to be processed does not include all of the words in the mask bag of words. The mask bag of words contains mask words; when the bag of words of the log to be processed includes all of the words in the mask bag of words, the server deletes the log.

In one example, 16 mask word bags are provided in the server. The mask words in the 1st mask word bag are audio (audit), the mask words in the 2nd mask word bag are inodes (metadata nodes), the mask words in the 3rd mask word bag are hooks, the mask words in the 4th mask word bag are hooks, tasks, time out and seconds, the mask words in the 5th mask word bag are CAP (authorities), NET (networks) and ADMIN (administrators), the mask words in the 6th mask word bag are filter systems, the mask words in the 7th mask word bag are IPVS (IP Virtual Server), the mask words in the 8th mask word bag are the words, keys, cross (information), the mask words in the 9th mask word bag are USB (Universal Serial Bus), the mask words in the 10th mask word bag are map (authorities), the mask words in the 11th mask word bag are exam words, the end words in the end of the window (14), the mask words in the end of the window) and the end of the filter words in the 14th mask word bag are certificates (end of the filter words), the filter words in the filter bags and the end of the filter cards (end of the filter cards) are the end of the filter cards (filter cards), the mask words in the 15th mask word bag are bus and error, and the mask words in the 16th mask word bag are error and device. The log to be processed is deleted when it includes all of the words in any one of the mask word bags.

It should be noted that, as will be understood by those skilled in the art, in practical application, the number of shielding word bags may be set according to needs, and the number of shielding word bags is not limited in this embodiment.

It should be noted that, as will be understood by those skilled in the art, the mask words in each mask bag may be set as required, which is not listed here.

It is worth mentioning that the server directly removes a part of logs according to the set shielding word bags, so that the processing pressure of the server is reduced, and the number of logs is further reduced.
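The mask check can be sketched as a subset test, assuming bags are held as sets. The function name `is_masked` and the two mask bags in the usage note are illustrative; only the "all words of any one mask bag" rule comes from the text.

```python
from typing import FrozenSet, Iterable


def is_masked(bag: FrozenSet[str], mask_bags: Iterable[FrozenSet[str]]) -> bool:
    """Return True if the bag includes every word of at least one mask bag."""
    # An empty mask bag is ignored; otherwise it would match every log.
    return any(mask and mask <= bag for mask in mask_bags)
```

For instance, with mask bags `{"USB"}` and `{"bus", "error"}`, a log whose bag contains both "bus" and "error" is dropped before step 103, while a log containing only "bus" is kept.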

Step 104: determine whether to retain the log to be processed according to the relationship between the word bag of the log to be processed and the word bag of the history log.

Specifically, in the process of processing the first log file, each log in the first log file that has been processed and determined to be retained is saved in the second log file. The history log refers to a log in the second log file.

It should be noted that, as will be understood by those skilled in the art, if the server processes the log immediately after the log is generated, and records the log after determining to keep the log, the history log refers to the recorded log, and the meaning of the history log is not limited in this embodiment.

In one example, if the server determines that the relationship between the word bag of the log to be processed and the word bag of the history log is the first relationship or the second relationship, the log to be processed is deleted; if the relationship is determined to be the third relationship, the timestamp part of the history log and the word bag of the log to be processed are retained; if the relationship is determined to be the fourth relationship or the fifth relationship, the timestamp part and the word bag of the log to be processed are retained. When the word bag of the log to be processed is in the first relationship with the word bag of the history log, the log to be processed may be damaged, or the information recorded by the history log is more detailed than that recorded by the log to be processed; when the two are in the third relationship, the history log may be damaged, or the information recorded by the log to be processed is more detailed than that recorded by the history log. In both cases, the log with the larger word bag is retained, and the earlier of the two timestamps is used as its timestamp. When the word bag of the log to be processed is in the second relationship with the word bag of the history log, the log to be processed is probably identical to the history log, so the log to be processed can be deleted. When the relationship is the fourth relationship, the history log and the log to be processed have some parameters in common and some that differ; the two logs may be logs of different fault types on the same disk, logs of the same fault type on different disks, or logs that share only some descriptive words but are substantially different. Therefore, both the log to be processed and the history log need to be kept. When the word bag of the log to be processed and the word bag of the history log are in the fifth relationship, the two logs are completely unrelated, so both also need to be kept.
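The retention rules can be sketched as follows for a history that holds several logs. How the text intends a pending log to be compared against multiple history logs is not spelled out, so the order of checks here is an assumption, as are the `Entry` type and the function name `retain`; when the pending bag contains several history bags, this sketch replaces only the first.

```python
from typing import FrozenSet, List, Tuple

Entry = Tuple[str, FrozenSet[str]]  # (timestamp part, bag of words)


def retain(history: List[Entry], pending: Entry) -> None:
    """Update `history` in place with a pending log, processed from old to new."""
    _, bag = pending
    # First or second relationship with any history log: delete the pending log.
    if any(bag <= old_bag for _, old_bag in history):
        return
    # Third relationship: keep the history timestamp and the larger pending bag.
    for i, (old_ts, old_bag) in enumerate(history):
        if old_bag < bag:
            history[i] = (old_ts, bag)
            return
    # Fourth or fifth relationship with every history log: keep the pending log.
    history.append(pending)
```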

As is clear from the above, the log processing method of the present embodiment focuses on discovering the internal relationships between logs, thereby ensuring that the key logs can be found as accurately as possible. After the first log file is processed by the log processing method provided by this embodiment, a second log file is obtained. Compared with the first log file, the second log file contains far fewer repeated and damaged logs, and some excessively detailed information in the logs has been removed; each log is a unique record whose timestamp is the time of its first occurrence. As a result, the storage space occupied by the logs is reduced, irrelevant or erroneous logs are removed, and repeated logs are merged, which speeds up analysis. Verification shows that processing a log file with the log processing method of this embodiment can reduce its storage space by 90% and multiply the analysis efficiency.

The foregoing is merely illustrative, and is not intended to limit the technical aspects of the present invention.

Compared with the prior art, the method for processing the log provided by the embodiment compares the word bags of the log to be processed with the word bags of the history log, and can determine the relationship between the log to be processed and the word bags of the history log, wherein the relationship represents the relationship between the log to be processed and the history log. The server can determine the relation between the logs to be processed and the history logs, so that the recorded logs can be selectively reserved according to the relation, the number of the recorded logs is reduced, and the burden of manually processing the logs is reduced.

A second embodiment of the present invention relates to a log processing method, and the present embodiment is a further improvement of the first embodiment, specifically the improvement is that: after all logs to be processed are processed, a mapping file is generated according to the reference log so as to analyze the subsequently received logs.

Specifically, as shown in fig. 2, the present embodiment includes steps 201 to 208, wherein steps 201 to 204 are substantially the same as steps 101 to 104 in the first embodiment, and are not repeated here. The differences between the second embodiment and the first embodiment will be mainly described below:

Steps 201 to 204 are performed.

After all logs to be processed are processed, the following steps are performed:

Step 205: Acquire the retained logs, take the retained logs as reference logs, and determine the similarity between the reference logs.

Specifically, in the process of determining the similarity of any two reference logs, the server performs the following operations: determining the similarity between word bags of two reference logs; and taking the similarity between the word bags of the two reference logs as the similarity between the two reference logs. For example, the reference logs include log 1 and log 2, the word bag of log 1 is word bag 3, the word bag of log 2 is word bag 4, and the similarity between log 1 and log 2=the similarity between word bag 3 and word bag 4.

In the first example, the similarity between word bag 3 and word bag 4 = (the number of words that appear in both word bag 3 and word bag 4) / (the number of words in word bag 3 + the number of words in word bag 4 - the number of words that appear in both word bag 3 and word bag 4) × 100%.

In the second example, the server removes prepositions, conjunctions, and the like from word bag 3 to obtain word bag 5, and removes prepositions, conjunctions, and the like from word bag 4 to obtain word bag 6. The similarity between word bag 3 and word bag 4 = (the number of words that appear in both word bag 5 and word bag 6) / (the number of words in word bag 5 + the number of words in word bag 6 - the number of words that appear in both word bag 5 and word bag 6) × 100%.
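
The two examples above amount to the Jaccard index of the two word bags, computed either directly or after filtering out function words. A minimal Python sketch follows, assuming each word bag is a set of distinct words. The function name `bag_similarity`, the `INVALID_WORDS` list and the sample bags are illustrative assumptions and are not taken from the patent.

```python
# Illustrative invalid-word list (an assumption; the text only says
# "prepositions, conjunctions, and the like").
INVALID_WORDS = frozenset({"of", "in", "on", "at", "to", "and", "or", "but"})


def bag_similarity(bag_a, bag_b, invalid_words=frozenset()):
    """Return shared / (len(a) + len(b) - shared) as a fraction in [0, 1]."""
    a = set(bag_a) - invalid_words
    b = set(bag_b) - invalid_words
    shared = len(a & b)
    denominator = len(a) + len(b) - shared
    # Two empty bags have nothing to compare; treating them as dissimilar
    # is a choice made for this sketch.
    return shared / denominator if denominator else 0.0


# Hypothetical bags, not the contents of Table 1.
bag_3 = {"machine", "check", "processor", "context", "corrupt"}
bag_4 = {"machine", "check", "exception", "bank"}
print(f"{bag_similarity(bag_3, bag_4):.0%}")  # 2 / (5 + 4 - 2) -> 29%
```

Calling `bag_similarity(bag_3, bag_4)` corresponds to the first example, and passing `INVALID_WORDS` as the third argument corresponds to the second example.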

Step 206: Classify the reference logs according to the similarity between the reference logs.

Specifically, the similarity between logs of the same class is greater than a first preset value. The first preset value may be any value greater than 0 and less than 1 (that is, between 0% and 100%), for example, a value from 30% to 60%, such as 40%.

For example, the first preset value is 40% and the reference logs comprise 5 logs numbered 1 to 5. The information before processing, the information after processing and the size of the word bag of each log are shown in Table 1.

Table 1

The log messages in Table 1 have the following meanings. "mce [Hardware Error]: Machine check: Processor context corrupt" indicates a machine check exception reporting a hardware error in which the processor context is corrupted. "Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler" indicates a kernel crash caused by a timeout in which not all CPUs entered the broadcast exception handler. "Lost 47 memory errors" indicates that 47 memory errors were lost. "sbridge HANDLING MCE MEMORY ERROR" indicates that an MCE memory error is being handled. "mce [Hardware Error]: CPU 17: Machine Check Exception: 5 Bank 12: be00003f001000c3" indicates that an exception was detected on CPU 17, with the exception located at 5 Bank 12: be00003f001000c3. The similarity between each pair of logs was calculated using the method of the second example and is shown in Table 2.

Table 2

From the above table, logs 1 and 5 reflect the same fault category, logs 3 and 4 reflect the same fault category, and log 2 is an independent log. By continuously learning from existing logs, the fault categories of the reference logs can be continuously enriched.
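
Step 206 can be sketched as grouping the reference logs by pairwise similarity. The Python sketch below links two logs whenever their similarity exceeds the first preset value and merges linked logs transitively with a small union-find structure. The transitive merge, the names `jaccard` and `group_reference_logs`, and the sample bags are assumptions for illustration. The patent states only that logs of the same class have a similarity above the threshold, and the sample bags are not the contents of Table 1.

```python
def jaccard(a, b):
    """Similarity of two word bags given as sets."""
    shared = len(a & b)
    total = len(a) + len(b) - shared
    return shared / total if total else 0.0


def group_reference_logs(bags, first_preset=0.4):
    """Group log numbers whose bags are linked by similarity > first_preset.

    `bags` maps a log number to its word bag (a set of words).
    """
    parent = {n: n for n in bags}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    numbers = sorted(bags)
    for i, m in enumerate(numbers):
        for n in numbers[i + 1:]:
            if jaccard(bags[m], bags[n]) > first_preset:
                parent[find(n)] = find(m)  # merge the two groups

    groups = {}
    for n in numbers:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical word bags keyed by log number.
bags = {
    1: {"mce", "hardware", "error", "machine", "check"},
    2: {"kernel", "panic", "timeout"},
    3: {"mce", "hardware", "error", "cpu", "bank"},
}
print(group_reference_logs(bags))  # [[1, 3], [2]]
```

Logs 1 and 3 share 3 of 7 distinct words (about 43%), which exceeds the 40% threshold, so they form one class while log 2 stays independent.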

Step 207: Determine the fault class of each class of logs and the fault level of each reference log.

Specifically, the server displays each class of logs to a maintainer, the maintainer determines and inputs the fault class of that class of logs, and the server determines the fault class of each class of logs according to the fault class input by the maintainer. The server may likewise display each reference log to the maintainer, the maintainer determines and inputs the fault level of each reference log, and the server determines the fault level of each reference log according to the fault level input by the maintainer.

It should be noted that, as those skilled in the art will understand, in practical applications the server may also automatically recognize the words in the word bag of each log and determine the fault class and the fault level of the log accordingly. The present embodiment does not limit the manner of determining the fault class of each class of logs or the fault level of each reference log.

In one example, word bags of the same fault class are divided into five fault levels, A, B, C, D and E, in descending order of importance.

Step 208: Generate a mapping file according to the reference logs, the fault classes of the reference logs and the fault levels of the reference logs.

Specifically, the mapping file maps each reference log to its fault class and to its fault level, and is used to analyze subsequently received logs and determine their fault classes and fault levels.
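
The patent does not specify how the mapping file is stored. As one possible sketch, the Python code below writes each reference log's word bag, fault class and fault level to a JSON file and reads it back. The JSON layout, the field names and the function names `build_mapping`, `save_mapping` and `load_mapping` are all assumptions made for illustration.

```python
import json


def build_mapping(reference_logs):
    """Turn (bag, fault_class, fault_level) triples into mapping entries."""
    return [
        {"bag": sorted(bag), "fault_class": fault_class, "fault_level": level}
        for bag, fault_class, level in reference_logs
    ]


def save_mapping(entries, path):
    """Write the mapping entries to a JSON file (hypothetical format)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2)


def load_mapping(path):
    """Read the mapping file and restore each bag to a set for comparison."""
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    for entry in entries:
        entry["bag"] = set(entry["bag"])
    return entries
```

Bags are stored as sorted lists because JSON has no set type; sorting also keeps the file stable between runs.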

In one example, after generating the mapping file, the server uses the mapping file to analyze the subsequently received log. The process of the server analyzing the log is as follows: the server acquires a log to be analyzed; processing the log to be analyzed to obtain a word bag of the log to be analyzed; determining the similarity between the word bags of the logs to be analyzed and the word bags of the reference logs in the mapping file; and determining the fault category of the log to be analyzed and the fault level of the log to be analyzed according to the similarity between the word bags of the log to be analyzed and the word bags of the reference log.

In one example, the method for determining the fault class of the log to be analyzed and the fault level of the log to be analyzed by the server according to the similarity between the word bags of the log to be analyzed and the word bags of the reference log includes, but is not limited to, the following two methods:

Method 1: The server takes the fault class of the reference log whose word bag has the highest similarity to the word bag of the log to be analyzed as the fault class of the log to be analyzed, and takes the fault level of that reference log as the fault level of the log to be analyzed.

Method 2: The server judges whether the mapping file contains a word bag of a reference log whose similarity to the word bag of the log to be analyzed is greater than a second preset value. If so, the server takes the fault class of the reference log with the highest similarity to the word bag of the log to be analyzed as the fault class of the log to be analyzed, and takes the fault level of that reference log as the fault level of the log to be analyzed. Otherwise, the server determines the fault class of the log to be analyzed as an unknown class and the fault level of the log to be analyzed as an unknown level.

In one example, the server determines the similarity between the word bag of the log to be analyzed and the word bag of a reference log in the mapping file as follows: the similarity is calculated according to a constraint relation among the word bag of the log to be analyzed, the word bag of the reference log and the similarity, where the constraint relation is: similarity = (the number of words that appear in both the word bag of the log to be analyzed and the word bag of the reference log) / (the number of words in the word bag of the log to be analyzed + the number of words in the word bag of the reference log - the number of words that appear in both word bags).

It should be noted that, the process of analyzing the log to be analyzed by the server may refer to the process of analyzing the first log by the server in the third embodiment and the fourth embodiment, which are not described in detail herein, and those skilled in the art may refer to the content of the third embodiment and the fourth embodiment to analyze the log to be analyzed.

Compared with the prior art, the log processing method provided in the embodiment can selectively reserve the recorded logs according to the relation between the logs to be processed and the history logs, so that the number of the recorded logs is reduced, and the burden of manually processing the logs is reduced. The server generates a mapping file according to the processed log so that the server automatically analyzes the log received subsequently, the intelligence of the server is improved, the workload of maintenance personnel is reduced, and the pressure of manually analyzing the log is reduced.

A third embodiment of the present invention relates to a log analysis method, which is applied to a server. As shown in fig. 3, the method comprises the following steps:

Step 301: Acquire a first log to be processed.

Step 302: Process the first log to obtain a word bag of the first log.

In one example, the server deletes the variables in the first log, where a variable is a preset parameter, and splits the first log with the variables deleted into N words to generate the word bag of the first log, where N is a positive integer. The preset parameters include at least one of: position information of a bad track, number information of a bad track, position information of a bad block, and number information of a bad block.

In one example, the server deletes the variables in the first log by identifying the digits in the body portion of the first log and deleting those digits.
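
As a rough illustration of step 302, the Python sketch below deletes the digits from a log body and splits the remainder into a set of words. The function name `make_bag`, the choice to split on non-letter characters, the lower-casing and the sample log lines are assumptions of this sketch. The patent does not specify the tokenization rule, and the sketch assumes the caller has already separated the body from any header such as a timestamp.

```python
import re


def make_bag(body):
    """Delete digits from the log body, then split it into a set of words."""
    without_digits = re.sub(r"\d+", "", body)
    # Splitting on non-letters and lower-casing are assumptions of this sketch.
    words = re.findall(r"[A-Za-z]+", without_digits)
    return {word.lower() for word in words}


# Hypothetical log bodies that differ only in their numeric parameters.
first = make_bag("disk sda: 47 bad blocks at sector 1024")
second = make_bag("disk sda: 3 bad blocks at sector 77")
print(sorted(first))    # ['at', 'bad', 'blocks', 'disk', 'sda', 'sector']
print(first == second)  # True
```

Because the block counts and sector positions are removed before splitting, the two logs yield the same word bag even though their numeric parameters differ.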

It should be noted that, the process of processing the first log by the server to obtain the bag of words of the first log is substantially the same as the process of processing the log to be processed in the first embodiment to obtain the bag of words of the log to be processed, and a person skilled in the art may refer to the related content of the first embodiment to perform this step.

Step 303: Determine the similarity between the word bag of the first log and the word bag of a reference log in the mapping file.

Specifically, the mapping file includes a bag of words of the reference log, and a fault class of the reference log, and/or a fault level of the reference log. The method for creating the mapping file may refer to the relevant content of the log processing method mentioned in the second embodiment, and will not be described herein.

The methods by which the server determines the similarity between the word bag of the first log and the word bag of the reference log in the mapping file include, but are not limited to, the following two:

Method 1: The server calculates the similarity according to a constraint relation among the word bag of the first log, the word bag of the reference log and the similarity, where the constraint relation is: similarity = (the number of words that appear in both the word bag of the first log and the word bag of the reference log) / (the number of words in the word bag of the first log + the number of words in the word bag of the reference log - the number of words that appear in both word bags).

Method 2: The server removes invalid words from the word bag of the first log and the word bag of the reference log, where invalid words are words specified in advance, for example words that carry no meaning, such as prepositions and conjunctions. After the invalid words are removed from both word bags, the server calculates the similarity according to the constraint relation among the word bag of the first log, the word bag of the reference log and the similarity.

It should be noted that two logs sharing the same invalid words does not indicate that the two logs have the same fault class and/or fault level. Removing the invalid words from the word bag of the first log and the word bag of the reference log therefore prevents the invalid words from affecting the similarity between the two word bags.

In one example, a shielding word bag is set in the server. Before determining the similarity between the word bag of the first log and the word bag of the reference log, the server judges whether the word bag of the first log contains all the words in the shielding word bag. If so, the server ignores the first log; otherwise, it performs the subsequent steps.
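
The shielding check reduces to a subset test. A minimal Python sketch follows, assuming word bags are sets; the function name `is_shielded` and the sample words are hypothetical, and treating an empty shielding bag as "shield nothing" is a choice made here rather than something the patent states.

```python
def is_shielded(first_bag, shield_bag):
    """True when every word of the shielding bag occurs in the first log's bag."""
    # An empty shielding bag would be a subset of every bag, so it is
    # treated as disabled instead of shielding all logs.
    return bool(shield_bag) and set(shield_bag) <= set(first_bag)


# Hypothetical shielding bag for routine messages that need no analysis.
shield = {"usb", "connected"}
print(is_shielded({"usb", "device", "connected"}, shield))  # True
print(is_shielded({"usb", "device", "error"}, shield))      # False
```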

Step 304: Determine the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log.

Specifically, because the mapping file includes the word bag of the reference log together with the fault class of the reference log and/or the fault level of the reference log, the server can analyze the first log using the mapping file.

The method in which the server analyzes the first log using the mapping file is exemplified as follows.

Method a: The server takes the fault class of the reference log in the mapping file whose word bag has the highest similarity to the word bag of the first log as the fault class of the first log, and/or takes the fault level of that reference log as the fault level of the first log.

Specifically, if the mapping file includes the word bags of the reference logs and the fault classes of the reference logs, the server takes the fault class of the reference log in the mapping file with the highest similarity to the word bag of the first log as the fault class of the first log. If the mapping file includes the word bags of the reference logs and the fault levels of the reference logs, the server takes the fault level of the reference log with the highest similarity to the word bag of the first log as the fault level of the first log. If the mapping file includes the word bags, the fault classes and the fault levels of the reference logs, the server takes both the fault class and the fault level of the reference log with the highest similarity to the word bag of the first log as the fault class and the fault level of the first log.

Method b: The server judges whether the mapping file contains a word bag of a reference log whose similarity to the word bag of the first log is greater than a second preset value. If so, the server takes the fault class of the reference log with the highest similarity to the word bag of the first log as the fault class of the first log, and/or takes the fault level of that reference log as the fault level of the first log. Otherwise, the server determines the fault class of the first log as an unknown class, and/or determines the fault level of the first log as an unknown level. The second preset value may be set as needed to a value greater than 0 and less than 1, for example a value from 30% to 60%, such as 40%.
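
Methods a and b differ only in whether a threshold is applied to the best match. The Python sketch below covers both, assuming the mapping is a list of dictionaries with `bag`, `fault_class` and `fault_level` keys. That layout, the names `classify` and `UNKNOWN`, and the sample entries are hypothetical and chosen for illustration.

```python
UNKNOWN = "unknown"


def jaccard(a, b):
    """Similarity of two word bags given as sets."""
    shared = len(a & b)
    total = len(a) + len(b) - shared
    return shared / total if total else 0.0


def classify(first_bag, mapping, second_preset=None):
    """Return (fault_class, fault_level) for the first log.

    With second_preset=None this follows method a; otherwise method b.
    """
    if not mapping:
        # No reference logs at all: returning unknown is an assumption.
        return UNKNOWN, UNKNOWN
    best = max(mapping, key=lambda entry: jaccard(first_bag, entry["bag"]))
    best_score = jaccard(first_bag, best["bag"])
    if second_preset is not None and best_score <= second_preset:
        return UNKNOWN, UNKNOWN
    return best["fault_class"], best["fault_level"]


# Hypothetical mapping entries.
mapping = [
    {"bag": {"mce", "hardware", "error"}, "fault_class": "cpu", "fault_level": "A"},
    {"bag": {"memory", "errors", "lost"}, "fault_class": "memory", "fault_level": "B"},
]
print(classify({"mce", "hardware", "error", "bank"}, mapping, 0.4))  # ('cpu', 'A')
print(classify({"disk", "bad", "error"}, mapping, 0.4))  # ('unknown', 'unknown')
print(classify({"disk", "bad", "error"}, mapping))       # ('cpu', 'A')
```

The last two calls show the practical difference: a log that shares a single word with its nearest reference (similarity 20%) is forced into the "cpu" class under method a, but is flagged as unknown under method b.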

In one example, logs of the same fault class may be divided into five fault levels, A, B, C, D and E, where E is an unknown level. Logs of the same fault class and the same fault level may still differ somewhat in importance. In that case, M sub-levels may be derived under each fault level; for example, sub-levels A1, A2, A3, A4, A5, A6, A7, A8, A9 and A10 may be derived under fault level A, so that the word bags of logs at the same fault level can still be distinguished.

It is worth mentioning that when the mapping file contains no word bag of a reference log whose similarity to the word bag of the first log is greater than the second preset value, the first log does not belong to the same fault class as any reference log in the mapping file. The server marks the fault class of the first log as an unknown class, which helps maintenance personnel discover previously undetected fault classes in time.

In one example, the mapping file includes the word bag of the reference log, the fault class of the reference log and the fault level of the reference log. After determining that the fault class of the first log is an unknown class and that the fault level of the first log is an unknown level, the server reports the first log, determines the fault class and the fault level of the first log according to the fault class and the fault level designated by the user, and updates the mapping file according to the word bag, the fault class and the fault level of the first log.

It is worth mentioning that the server reports logs of unknown categories and unknown levels in time, and updates the mapping file according to the fault categories and fault levels assessed by the user, so that the mapping file can be continuously expanded and perfected, and the accuracy of the server in analyzing the logs is improved.
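
The report-and-update loop for unknown logs can be sketched as follows in Python. The callbacks `report` and `ask_user` stand in for whatever alerting and input mechanism a deployment provides; they, the function name `handle_unknown` and the in-memory list used as the mapping are assumptions of this sketch.

```python
def handle_unknown(first_bag, mapping, report, ask_user):
    """Report an unclassified log, then store the user's labels in the mapping."""
    report(first_bag)
    fault_class, fault_level = ask_user(first_bag)
    # Adding the labelled bag lets later logs of this kind be recognized.
    mapping.append(
        {"bag": set(first_bag), "fault_class": fault_class, "fault_level": fault_level}
    )
    return fault_class, fault_level


# Usage with stand-in callbacks: collect reports in a list and let the
# "user" always answer with a fixed, hypothetical label.
reported = []
mapping = []
result = handle_unknown(
    {"raid", "degraded"}, mapping, reported.append, lambda bag: ("disk", "B")
)
print(result)        # ('disk', 'B')
print(len(mapping))  # 1
```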

It should be noted that, in an extreme case, there may be several reference logs with the highest similarity, that is, the word bags of several reference logs have the same similarity to the word bag of the first log and that similarity is the highest value. In this case, the server may set the fault class of the first log to an unknown class and the fault level of the first log to an unknown level.
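
One way to detect such a tie is to compute every similarity exactly and count how many reference logs reach the maximum. The Python sketch below uses `fractions.Fraction` so that equal ratios compare equal without floating-point concerns. The mapping layout and the names `jaccard`, `classify_unless_tied` and `UNKNOWN` are hypothetical.

```python
from fractions import Fraction

UNKNOWN = "unknown"


def jaccard(a, b):
    """Exact similarity of two word bags given as sets."""
    shared = len(a & b)
    total = len(a) + len(b) - shared
    # Fraction keeps the ratio exact, so ties are detected reliably.
    return Fraction(shared, total) if total else Fraction(0)


def classify_unless_tied(first_bag, mapping):
    """Return (class, level) of the single best match, or unknown on a tie."""
    scores = [jaccard(first_bag, entry["bag"]) for entry in mapping]
    if not scores:
        return UNKNOWN, UNKNOWN
    top = max(scores)
    if scores.count(top) > 1:
        return UNKNOWN, UNKNOWN
    best = mapping[scores.index(top)]
    return best["fault_class"], best["fault_level"]


# Hypothetical entries that are equally similar to the incoming bag.
mapping = [
    {"bag": {"cpu", "error"}, "fault_class": "cpu", "fault_level": "A"},
    {"bag": {"disk", "error"}, "fault_class": "disk", "fault_level": "B"},
]
print(classify_unless_tied({"error"}, mapping))         # ('unknown', 'unknown')
print(classify_unless_tied({"cpu", "error"}, mapping))  # ('cpu', 'A')
```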

Compared with the prior art, in the log analysis method provided in this embodiment the server can analyze the first log using the mapping file and determine the fault class and/or the fault level of the first log, which improves the intelligence of the server and relieves maintainers of the pressure of analyzing logs. In addition, during log analysis, logs of unknown classes can be fed back in time and the mapping file updated in time according to the fault classes and fault levels designated for those logs, so the mapping file is continuously improved. The more complete the mapping file, the more accurate the conclusions obtained by analyzing logs with it.

A fourth embodiment of the present invention relates to a method for analyzing logs, wherein the present embodiment is a further improvement of the third embodiment, and the specific improvement is that: after step 304, other relevant steps are added.

Specifically, as shown in fig. 4, the present embodiment includes steps 401 to 408, wherein steps 401 to 403 are substantially the same as steps 301 to 303 in the third embodiment, and are not described again here. The differences between the fourth embodiment and the third embodiment are mainly described below:

Steps 401 to 403 are performed.

Step 404: Determine the fault class of the first log and the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log.

Specifically, the mapping file includes the word bag of the reference log, the fault class of the reference log and the fault level of the reference log. According to the similarity between the word bag of the first log and the word bags of the reference logs, the server determines the reference log whose word bag has the highest similarity to that of the first log, takes the fault class of that reference log as the fault class of the first log, and takes the fault level of that reference log as the fault level of the first log.

Step 405: Judge whether a second log exists in the recorded logs.

Specifically, the second log is a log belonging to the same fault class as the first log. If the server determines that the second log exists in the recorded logs, step 406 is executed, otherwise step 407 is executed.

Step 406: Compare the fault level of the first log with the fault level of the second log, and update the recorded logs according to the comparison result.

Specifically, if the server determines that the comparison result indicates that the fault level of the first log is higher than that of the second log, the server overwrites the second log with the first log. If the comparison result indicates that the fault level of the first log is not higher than that of the second log, the server does not overwrite the second log with the first log. In this way, a log with a high fault level overwrites a log with a low fault level.

It is worth mentioning that overwriting logs of a low fault level with logs of a high fault level reduces the number of recorded logs and the time and effort maintenance personnel spend analyzing logs. Maintainers can see at a glance the key log with the highest fault level in each fault class, so that they can repair the more serious faults in time.

It should be noted that, as those skilled in the art will understand, in practical applications the recorded logs may also be updated in other manners. For example, the first log and the second log are stored in the server in table form: if the fault level of the first log is higher than that of the second log, the first log is recorded before the second log, and if the fault level of the first log is lower than that of the second log, the first log is recorded after the second log. The present embodiment does not limit the method for updating the recorded logs.

Step 407: the first log is recorded.

Specifically, since no log of this fault class has been recorded, the server may record the first log in the log file so that maintenance personnel learn of the log.
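
Steps 405 to 407 can be sketched in Python as follows, assuming the recorded logs are held in a dictionary that keeps one log per fault class and that fault levels are the letters A to E with A the highest. The dictionary layout, the names `LEVEL_ORDER`, `is_higher` and `update_recorded`, and the sample logs are assumptions for illustration.

```python
# Fault levels from most to least important, as in the A-E example.
LEVEL_ORDER = "ABCDE"


def is_higher(level_a, level_b):
    """True when level_a is more severe than level_b (A is the highest)."""
    return LEVEL_ORDER.index(level_a) < LEVEL_ORDER.index(level_b)


def update_recorded(recorded, first_log, fault_class, fault_level):
    """Keep one log per fault class: the one with the highest fault level.

    `recorded` maps fault_class -> (log_text, fault_level).
    """
    second = recorded.get(fault_class)  # step 405: look for a second log
    if second is None:
        recorded[fault_class] = (first_log, fault_level)  # step 407: record
    elif is_higher(fault_level, second[1]):  # step 406: compare levels
        recorded[fault_class] = (first_log, fault_level)  # overwrite
    return recorded


recorded = {}
update_recorded(recorded, "log x", "cpu", "C")
update_recorded(recorded, "log y", "cpu", "A")  # higher level: overwrites log x
update_recorded(recorded, "log z", "cpu", "B")  # not higher: log y is kept
print(recorded)  # {'cpu': ('log y', 'A')}
```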

Compared with the prior art, the log analysis method provided in this embodiment replaces the recorded log with the first log when the fault level of the first log is higher than the fault level of the recorded log of the same fault class, which ensures that the importance of the recorded logs keeps increasing and thereby achieves the effect of continuously escalating alarms.

The steps of the above methods are divided only for clarity of description. In implementation, they may be combined into one step, or a step may be split into several steps; as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.

A fifth embodiment of the present invention relates to a server, as shown in fig. 5, including: at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501, so that the at least one processor 501 can execute the log processing method according to the above embodiment.

A sixth embodiment of the present invention relates to a server, as shown in fig. 6, including: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601, so that the at least one processor 601 can perform the log analysis method according to the above embodiment.

In the fifth and sixth embodiments, the server includes one or more processors and a memory; one processor is taken as an example in fig. 5 and fig. 6. The processor and the memory may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 5 and fig. 6. The memory, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules. The processor executes the various functional applications and data processing of the device by running the non-volatile software programs, instructions and modules stored in the memory.

The memory may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store a list of options and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory remotely located from the processor, and the remote memory may be connected to the external device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more modules are stored in the memory that, when executed by one or more processors, perform the method of processing logs or the method of analyzing logs in any of the method embodiments described above.

The above product can perform the method provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided in the embodiments of the present application.

A seventh embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the above embodiments of the log processing method.

An eighth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the above embodiments of the log analysis method.

That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or part of the steps of the methods in the embodiments of the application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention and that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims

1. A method of analyzing a log, comprising:

acquiring a first log to be processed;

processing the first log to obtain a word bag of the first log;

determining similarity of a word bag of the first log and a word bag of a reference log in a mapping file, wherein the mapping file comprises the word bag of the reference log and a fault class of the reference log and/or a fault level of the reference log;

determining a fault class of the first log and/or a fault level of the first log according to the similarity of the word bags of the first log and the reference log;

after determining the fault class of the first log and the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log, the method for analyzing the log further comprises:

judging whether a second log exists in the recorded logs, wherein the second log is a log belonging to the same fault type as the first log;

if the second log exists, comparing the fault level of the first log with the fault level of the second log, and updating the recorded log according to a comparison result;

and if the second log does not exist, recording the first log.

2. The method for analyzing the log according to claim 1, wherein the determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log specifically comprises:

and taking the fault class of the reference log with the highest similarity with the word bag of the first log as the fault class of the first log, and/or taking the fault level of the reference log with the highest similarity with the word bag of the first log as the fault level of the first log.

3. The method for analyzing the log according to claim 1, wherein the determining the fault class of the first log and/or the fault level of the first log according to the similarity between the word bag of the first log and the word bag of the reference log specifically comprises:

judging whether a word bag of a reference log with similarity to the word bag of the first log larger than a second preset value exists in the mapping file or not;

If so, taking the fault class of the reference log with the highest similarity with the word bag of the first log as the fault class of the first log, and/or taking the fault level of the reference log with the highest similarity with the word bag of the first log as the fault level of the first log;

otherwise, determining the fault class of the bag of words of the first log as an unknown class, and/or determining the fault level of the first log as an unknown level.

4. The method for analyzing the log according to claim 3, wherein the mapping file comprises the word bag of the reference log, the fault class of the reference log and the fault level of the reference log;

after determining that the fault class of the bag of words of the first log is an unknown class and determining that the fault level of the first log is an unknown level, the log analysis method further comprises:

reporting the first log;

determining the fault class of the first log and the fault level of the first log according to the fault class and the fault level designated by the user;

and updating the mapping file according to the word bag of the first log, the fault class of the first log and the fault level of the first log.

5. The method for analyzing a log according to claim 1, wherein updating the recorded log according to the comparison result specifically comprises:

if the comparison result is determined to indicate that the fault level of the first log is higher than that of the second log, overwriting the second log with the first log;

and if the comparison result is determined to indicate that the fault level of the first log is not higher than the fault level of the second log, not overwriting the second log with the first log.

6. A method of analyzing logs according to any of claims 1 to 3, wherein determining the similarity of the bag of words of the first log to the bag of words of the reference log in the mapping file comprises:

calculating the similarity according to a constraint relation among the word bag of the first log, the word bag of the reference log and the similarity; wherein the constraint relation is: similarity = (the number of words that occur in both the word bag of the first log and the word bag of the reference log) / (the number of words in the word bag of the first log + the number of words in the word bag of the reference log - the number of words that occur in both the word bag of the first log and the word bag of the reference log).
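The constraint relation in claim 6 has the same form as the Jaccard index of two sets. A minimal sketch follows, assuming each word bag is treated as a set of distinct words; how repeated words are counted is not stated in the claim, so that is an assumption, as is the value returned for two empty bags.

```python
def bag_similarity(first_bag, reference_bag):
    """Shared words divided by (words in first + words in reference - shared words)."""
    a, b = set(first_bag), set(reference_bag)
    common = len(a & b)
    denominator = len(a) + len(b) - common
    # Two empty bags have nothing to compare; 0.0 is an assumed convention.
    return common / denominator if denominator else 0.0


# 2 shared words, 3 + 4 - 2 = 5 distinct words overall.
similarity = bag_similarity(
    {"disk", "error", "sector"},
    {"disk", "error", "timeout", "io"},
)
```

Here `similarity` is 2 / 5, that is, 0.4.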

7. The method according to claim 6, wherein before the calculation of the similarity according to the constraint relation of the bag of words of the first log, the bag of words of the reference log, and the similarity, the method further comprises:

removing invalid words from the word bag of the first log and from the word bag of the reference log, wherein the invalid words are pre-specified words.
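The filtering step of claim 7 might look like the following sketch. The contents of `INVALID_WORDS` are hypothetical examples; the claim only says that the invalid words are specified in advance.

```python
# Hypothetical pre-specified invalid words; the real list is deployment-specific.
INVALID_WORDS = frozenset({"the", "on", "of", "a"})


def remove_invalid_words(bag, invalid_words=INVALID_WORDS):
    """Drop pre-specified words before the similarity is calculated."""
    return set(bag) - invalid_words


filtered = remove_invalid_words({"error", "on", "device"})
```

Applying the same filter to both word bags keeps common filler words from raising the similarity of otherwise unrelated logs.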

8. The method for analyzing a log according to claim 1, wherein the processing the first log to obtain a bag of words of the first log specifically includes:

deleting a variable in the first log, wherein the variable is a preset parameter;

and splitting the first log, after the variable is deleted, into N words to generate the word bag of the first log, wherein N is a positive integer.

9. The method according to claim 8, wherein the preset parameter includes at least one of: position information of a bad track, number information of a bad track, position information of a bad block, and number information of a bad block.

10. The method for analyzing a log according to claim 9, wherein deleting the variable in the first log specifically includes:

identifying a number in a body portion of the first log;

and deleting the number from the body portion of the first log.
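Claims 8 to 10 together describe turning a log into a word bag by deleting numbers and splitting the remainder into words. The sketch below is one possible reading, assuming the caller passes only the body portion of the log (any timestamp or header already stripped), that words are runs of letters or underscores, and that case is ignored; these tokenization choices are assumptions and are not stated in the claims.

```python
import re


def make_word_bag(body):
    """Delete numbers from the log body, then split what is left into words."""
    # Replace every run of digits with a space so block or sector numbers vanish.
    without_numbers = re.sub(r"\d+", " ", body)
    # Treat runs of letters or underscores as words; lowercase for matching.
    words = re.findall(r"[A-Za-z_]+", without_numbers.lower())
    return set(words)


bag = make_word_bag("Buffer I/O error on device sda1, logical block 123456")
```

Because the block number is removed, two logs that differ only in which block failed produce the same word bag.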

11. A server, comprising: at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for analyzing a log according to any one of claims 1 to 10.

12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for analyzing a log according to any one of claims 1 to 10.