CN115470034A - Log analysis method, device and storage medium - Google Patents

Info

Publication number
CN115470034A
Authority
CN
China
Prior art keywords
log
error
analyzed
information
determining
Prior art date
Legal status
Pending
Application number
CN202211274833.8A
Other languages
Chinese (zh)
Inventor
张雅婷
Current Assignee
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202211274833.8A
Publication of CN115470034A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a log analysis method, a log analysis device, and a storage medium. The method comprises: acquiring a log to be analyzed; processing the log to be analyzed to determine information vectors to be analyzed; for each information vector to be analyzed, determining corresponding alternative error categories according to a log error reporting information feature library; and determining a classification result according to the alternative error categories. Because the log to be analyzed is converted into information vectors and the classification result is determined from the alternative error categories found in the log error reporting information feature library, error logs can be found more efficiently.

Description

Log analysis method, device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a log analysis method, a log analysis device, and a storage medium.
Background
When large numbers of automated tests are run, the number of test cases is huge, so even with a low failure rate the absolute number of failed cases often reaches dozens or hundreds. In batch transaction testing, a project typically has dozens to hundreds of batch scripts, and interpreting their execution results depends on judgment and analysis by developers, which makes it difficult for testers to determine the specific cause of a failed test.
At present, errors in programs and scripts are located mainly by analyzing logs manually. Log files are large and complex, and testers cannot automatically capture script execution errors, which hampers defect reporting and tracking and puts heavy pressure on developers to locate errors during large-scale testing. Moreover, the useful information in a log is buried in huge files and cannot be classified, visualized, or displayed as entries, which makes defects harder to locate and track.
Disclosure of Invention
The invention provides a log analysis method, a log analysis device, and a storage medium, which solve the difficulty of locating defects during testing and analyze error reports automatically and accurately.
According to an aspect of the present invention, there is provided a log analysis method, including:
acquiring a log to be analyzed;
processing the log to be analyzed, and determining an information vector to be analyzed;
for each information vector to be analyzed, determining a corresponding alternative error category according to the log error reporting information feature library;
and determining a classification result according to each alternative error category.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the log analysis method of any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the log analysis method according to any one of the embodiments of the present invention when the computer instructions are executed.
The technical solution of the embodiments of the present invention comprises: acquiring a log to be analyzed; processing the log to be analyzed to determine information vectors to be analyzed; for each information vector to be analyzed, determining corresponding alternative error categories according to the log error reporting information feature library; and determining a classification result according to the alternative error categories. This solves the prior-art problem that developers cannot locate test errors quickly and accurately: the log to be analyzed is converted into information vectors, each vector is analyzed against the log error reporting information feature library to determine its alternative error categories, and the classification result for the errors represented by the vectors is determined from those categories. Because the log is analyzed automatically, testers no longer need to check and analyze logs by hand; determining the classification result greatly reduces their manual error-checking workload, speeds up log file analysis and the discovery of error logs, and thereby improves overall testing efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a log analysis method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a log analysis method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a log analysis method based on a k-nearest neighbor algorithm according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device implementing a log analysis method according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that the terms "target" and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a log analysis method according to an embodiment of the present invention. This embodiment is applicable to analyzing log files produced in automated testing. As shown in Fig. 1, the method includes:
S110, acquiring a log to be analyzed.
The log to be analyzed may be a log file that requires error analysis. A log file is generated by logging functions written into a program and records how the program executed; it is the main tool for checking and locating problems in programs and scripts. Log files are generally written to be simple and clear, easy to inspect, consistent with certain rules, and easy to search.
Specifically, during project development the project must be tested to ensure that it is fit for use. Testers test the developed program, and a log file is generated from the test results and stored accordingly. The log file may be obtained as the log to be analyzed from the storage space of the local server or another server, or through a third-party data platform. This embodiment does not limit the encoding format of the log to be analyzed; common encoding formats include ASCII, ANSI, GBK, GB2312, UTF-8, GB18030, and Unicode.
S120, processing the log to be analyzed, and determining the information vector to be analyzed.
The information vector to be analyzed may be the vectorized text information of the log to be analyzed. For example, the text information in the log may be represented as vectors, with each piece of error-reporting text mapped to a vector $(T_1, w_1; T_2, w_2; \ldots; T_n, w_n)$ in the feature space, where $T_i$ denotes a feature entry and $w_i$ denotes the weight of that entry.
Specifically, the classification dimensions included in the information vector to be analyzed, that is, the types and the number of the feature entries included in the information vector to be analyzed, are predetermined. The text content can be extracted according to the coding format of the log to be analyzed, the extracted text content is analyzed and processed, and the text information is converted into vectors according to the classification dimensions, so that at least one information vector to be analyzed is obtained. When the vector conversion is performed, each piece of information in the text information can be converted into an information vector to be analyzed, or only each piece of information with errors in the text information can be converted into an information vector to be analyzed.
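A minimal Python sketch of the $(T_i, w_i)$ form, assuming the classification dimensions are the four feature entries named later in this description (function name, error code, error information, stack trace information). The names `FEATURE_ENTRIES` and `to_information_vector` are hypothetical; giving missing entries a weight of 0 follows the rule the description states for S205.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed classification dimensions (feature entries T_1..T_n), fixed in advance.
FEATURE_ENTRIES: Tuple[str, ...] = (
    "function name",
    "error code",
    "error information",
    "stack trace information",
)


def to_information_vector(
    weights: Dict[str, float], entries: Sequence[str] = FEATURE_ENTRIES
) -> List[Tuple[str, float]]:
    """Return the (T_i, w_i) pairs of one information vector, in a fixed order.

    Entries missing from `weights` get weight 0.0, so every vector has the
    same dimensions and vectors can be compared component by component.
    """
    unknown = set(weights) - set(entries)
    if unknown:
        raise ValueError(f"not a predefined feature entry: {sorted(unknown)}")
    return [(term, float(weights.get(term, 0.0))) for term in entries]


if __name__ == "__main__":
    print(to_information_vector({"function name": 0.5, "error information": 0.3}))
```

Fixing the dimensions up front is what makes later similarity calculations meaningful: two vectors can only be compared if position i means the same feature entry in both.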
S130, determining a corresponding alternative error category according to the log error reporting information feature library for each information vector to be analyzed.
The log error reporting information feature library may be a database storing representative error information feature vectors. A feature vector is the vectorized representation of a log's text information and may take the same form as an information vector to be analyzed. Each error information feature vector is a representative sample whose error category is clearly known, so these vectors serve as the reference samples against which information vectors to be analyzed are compared and classified.
An alternative error category may be an error category stored in the log error reporting information feature library that matches the information vector to be analyzed. Error categories may include database errors, program exceptions, communication problems, environmental problems, and the like.
Specifically, features that may indicate log errors can be extracted from the text information of various logs and used as the dimensional components of error information feature vectors; the text information of each log is then represented as a feature vector and stored in a database, yielding the log error reporting information feature library.
In this embodiment, after at least one information vector to be analyzed is obtained, the log error reporting information feature library is queried to determine the error information feature vectors corresponding to each information vector to be analyzed, and the error categories of those feature vectors are taken as the alternative error categories of that information vector. There may be one or more alternative error categories, and their number may be preset. Different information vectors to be analyzed may have the same or different alternative error categories, and the multiple alternative error categories of a single information vector may likewise be the same or different.
For example, features common to the text information of various types of logs may include the function name, error code, error information, and stack trace information. Suppose an information vector to be analyzed has the function name progress download, error code 00258, error information "read-write timeout", and stack trace information 000, and the log error reporting information feature library contains two error information feature vectors: one with function name progress Downloader, error code 00258, error information "read-write timeout", and stack trace information 000, and another with function name progress Downloader, error code 00258, error information "timeout", and stack trace information 000. The information vector to be analyzed then has same or similar samples in the feature library, and the error categories of those same or similar error information feature vectors can be used as its candidate error categories.
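The matching in this example can be sketched as follows. This is an illustration under assumptions, not the patented procedure: the library keeps the raw field values behind each feature vector, a field counts as similar when the values are equal or one contains the other (ignoring case), and the two error categories attached to the library entries are hypothetical, because the description does not name them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LibraryEntry:
    """A representative sample in a hypothetical log error reporting information feature library."""
    fields: Dict[str, str]  # feature entry -> raw value
    category: str           # known error category of this sample


def fields_similar(a: str, b: str) -> bool:
    """Assumed rule: non-empty values are similar if equal or one contains the other (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return False
    return a == b or a in b or b in a


def candidate_categories(
    query: Dict[str, str], library: List[LibraryEntry], min_matches: int
) -> List[str]:
    """Return the categories of library entries with at least `min_matches` similar fields."""
    categories = []
    for entry in library:
        matches = sum(
            1
            for key, value in query.items()
            if key in entry.fields and fields_similar(value, entry.fields[key])
        )
        if matches >= min_matches:
            categories.append(entry.category)
    return categories


query = {"function name": "progress download", "error code": "00258",
         "error information": "read-write timeout", "stack trace information": "000"}
library = [
    LibraryEntry({"function name": "progress Downloader", "error code": "00258",
                  "error information": "read-write timeout",
                  "stack trace information": "000"}, "communication problem"),
    LibraryEntry({"function name": "progress Downloader", "error code": "00258",
                  "error information": "timeout",
                  "stack trace information": "000"}, "environmental problem"),
]
print(candidate_categories(query, library, min_matches=4))
```

Both library entries match on all four fields under the containment rule, so both categories become candidates, mirroring the "same or similar samples" case in the text.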
S140, determining a classification result according to each alternative error category.
The classification result may be the determined final error category, and the classification result may be a database error report, a program exception, a communication problem, an environmental problem, or the like.
Specifically, after the candidate error categories of each information vector to be analyzed are determined, the occurrences of the candidate error categories across all information vectors to be analyzed are counted, and the category that occurs most often is taken as the classification result. If several candidate error categories tie for the most occurrences, one of them may be selected at random, the classification result may be determined by priority, or all of the tied categories may be taken as the classification result; this embodiment does not limit the choice.
The embodiment of the present invention provides a log analysis method comprising: acquiring a log to be analyzed; processing the log to be analyzed to determine information vectors to be analyzed; for each information vector to be analyzed, determining corresponding alternative error categories according to the log error reporting information feature library; and determining a classification result according to the alternative error categories. This solves the prior-art problem that developers cannot locate test errors quickly and accurately: the log to be analyzed is converted into information vectors, each vector is analyzed against the log error reporting information feature library to determine its alternative error categories, and the classification result for the errors represented by the vectors is determined from those categories. Because the log is analyzed automatically, testers no longer need to check and analyze logs by hand; determining the classification result greatly reduces their manual error-checking workload, speeds up log file analysis and the discovery of error logs, and thereby improves overall testing efficiency.
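The voting in S140, including the three tie-breaking options, might look like this in Python. The function name `classify`, the `tie_break` values, and the priority list are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def classify(
    candidates: Sequence[str],
    tie_break: str = "all",
    priority: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Majority vote over candidate error categories.

    tie_break: "all" keeps every tied category, "priority" keeps the tied
    category ranked highest in `priority`, "random" picks one tied category.
    """
    if not candidates:
        return []
    counts = Counter(candidates)
    top = max(counts.values())
    tied = [cat for cat, n in counts.items() if n == top]  # first-seen order
    if len(tied) == 1 or tie_break == "all":
        return tied
    if tie_break == "priority":
        if priority is None:
            raise ValueError("tie_break='priority' needs a priority list")
        rank = {cat: i for i, cat in enumerate(priority)}
        # Categories absent from the priority list rank last.
        return [min(tied, key=lambda cat: rank.get(cat, len(rank)))]
    if tie_break == "random":
        return [(rng or random.Random()).choice(tied)]
    raise ValueError(f"unknown tie_break: {tie_break!r}")


print(classify(["database error", "program exception", "database error"]))
```

Returning a list in every mode keeps the "take all tied categories" option representable without a separate return type.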
Example two
Fig. 2 is a flowchart of a log analysis method according to a second embodiment of the present invention; this embodiment further refines the foregoing embodiment. As shown in Fig. 2, the method includes:
S201, obtaining a pre-configured server address and log storage path.
The server address is a physical address of the server, the log storage path is a storage path of the log file, and the server address and the log storage path can be configured in advance according to actual requirements.
Specifically, before the log file is formed, a log storage path corresponding to the log file and a stored server address are defined in advance, and after the log file is generated, the log file is directly stored according to the server address and the log storage path. Before analyzing the log file, firstly, a server address and a log storage path corresponding to the log file are acquired.
S202, when the log analysis condition is met, obtaining the log to be analyzed according to the server address and the log storage path.
The log analysis condition is used to decide whether a log file needs to be analyzed: when a log file meets the condition, it must be analyzed. The condition can be set manually according to actual needs. For example, analysis may be triggered automatically when a log reports an error, upon receipt of an analysis instruction, or automatically at a set time interval; this embodiment does not limit the condition.
Specifically, a log analysis condition is preset, and whether it is met is judged from the information associated with it (for example, time information or instruction information). Once the condition is met, the log to be analyzed is obtained according to the server address and the log storage path. The log to be analyzed may be all of the log files under the log storage path or only some of them: for example, if the log analysis condition is associated with particular logs, the log files associated with the condition are analyzed; alternatively, only log files that reported errors are analyzed, and log files without errors are skipped.
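A sketch of S201 and S202 under stated assumptions: the configuration is a small dataclass, `location()` renders it in scp-style `host:path` notation (a hypothetical choice, since the description does not say how logs are transferred), and the three example triggers are combined with a logical OR. The address and path values are made up, and fetching the files themselves is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class LogSource:
    """Pre-configured server address and log storage path (S201); values are hypothetical."""
    server_address: str
    log_path: str

    def location(self) -> str:
        # scp-style "host:path" notation, an assumption for illustration only.
        return f"{self.server_address}:{self.log_path}"


def analysis_due(
    now: datetime,
    *,
    error_reported: bool = False,
    instruction_received: bool = False,
    last_run: Optional[datetime] = None,
    interval: Optional[timedelta] = None,
) -> bool:
    """Log analysis condition (S202): any one of the three example triggers suffices."""
    if error_reported or instruction_received:
        return True
    if interval is not None:
        return last_run is None or now - last_run >= interval
    return False


def logs_to_analyze(file_names: Iterable[str], errored: Set[str]) -> List[str]:
    """Keep only log files that reported errors; the others need no analysis."""
    return [name for name in file_names if name in errored]


src = LogSource("10.0.0.8", "/var/log/batch")
print(src.location())
print(analysis_due(datetime(2022, 10, 18, 12, 0),
                   last_run=datetime(2022, 10, 18, 11, 0),
                   interval=timedelta(hours=1)))
```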
S203, determining log information strings according to the log to be analyzed.
A log information string may be a piece of text content in the log.
Specifically, according to the encoding format of the log text, the information in the log file to be analyzed can be filtered to remove useless information such as garbled characters, the useful text content can be extracted, and that text content can be split into multiple log information strings.
Optionally, determining log information strings according to the log to be analyzed includes: extracting the log to be analyzed to determine its text content; and parsing the text content into log information strings.
The text content may be the valid content of the log to be analyzed.
Specifically, the log to be analyzed is usually large and complex, and useful information can only be obtained by parsing it. The useful information and the redundant information in the log can therefore be distinguished according to the encoding format of the log file; the redundant information is filtered out, the useful text content is determined, and the text content is parsed into log information strings, for example by treating each line of the text content as one log information string. Decomposing the log into log information strings that are convenient to analyze facilitates error analysis of the log.
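One way to implement S203 is sketched below, assuming UTF-8 is tried first and GB18030 (a superset of GBK and GB2312) second, and treating a line as garbled when it contains the Unicode replacement character or non-printable control characters. These heuristics are assumptions; the description only says that garbled and redundant content is filtered out and that each line may become one log information string.

```python
from typing import List, Sequence

# Tried in order; GB18030 is a superset of GBK and GB2312 (the order is an assumption).
DEFAULT_ENCODINGS = ("utf-8", "gb18030")


def decode_log(raw: bytes, encodings: Sequence[str] = DEFAULT_ENCODINGS) -> str:
    """Decode raw log bytes with the first encoding that succeeds."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Fallback: keep what is readable; undecodable bytes become U+FFFD.
    return raw.decode(encodings[0], errors="replace")


def is_garbled(line: str) -> bool:
    """Heuristic: replacement characters or non-printable control characters (tabs allowed)."""
    return any(ch == "\ufffd" or (not ch.isprintable() and ch != "\t") for ch in line)


def to_log_information_strings(raw: bytes) -> List[str]:
    """One log information string per non-empty, non-garbled line."""
    text = decode_log(raw)
    return [line.strip() for line in text.splitlines()
            if line.strip() and not is_garbled(line)]


raw = "2022-10-18 ERROR 超时\n\n\x01\x02junk\nINFO ok\n".encode("utf-8")
print(to_log_information_strings(raw))
```

GBK-encoded Chinese text usually fails UTF-8 decoding immediately, which is why trying UTF-8 first and falling back to GB18030 is a reasonable default for mixed log sources.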
S204, matching each log information string with the feature value labels in a predetermined feature value label set, and determining the successfully matched log information strings as information strings to be analyzed.
The feature value label set is a preset set of feature value labels. A feature value label marks a characteristic value: it is used to screen out the log information strings containing that value, and it usually marks that a code execution result recorded in the log file is an error.
Specifically, feature value labels are determined in advance according to project requirements or practical testing experience, so that each label marks one type of error, and the labels together form the feature value label set. Each log information string is matched against the labels in the set, and the strings that successfully match a label are determined as information strings to be analyzed. For example, if the feature value labels include "timeout" and a log information string also contains "timeout", that string is determined as an information string to be analyzed. Setting feature value labels thus allows the relevant log information strings to be screened out automatically and quickly.
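When the feature value labels are plain keywords, as in the "timeout" example, S204 reduces to substring matching. The sketch below assumes case-insensitive matching; the labels other than "timeout" and the sample log lines are hypothetical.

```python
from typing import Iterable, List


def select_strings_to_analyze(lines: Iterable[str], labels: Iterable[str]) -> List[str]:
    """Keep the log information strings that contain at least one feature value label."""
    lowered = [label.lower() for label in labels if label]
    return [line for line in lines
            if any(label in line.lower() for label in lowered)]


# "timeout" comes from the description; the other labels are hypothetical.
LABELS = ["timeout", "exception", "ERROR"]
lines = [
    "INFO batch job started",
    "WARN read-write Timeout after 30s",
    "ERROR code 00258 in progress download",
]
print(select_strings_to_analyze(lines, LABELS))
```

Keyword labels are cheap to maintain; if labels ever need structure (for example an error-code pattern), regular expressions could replace the substring test.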
S205, determining a feature entry weight value for each preset feature entry according to the corresponding information in the information string to be analyzed, and forming the information vector to be analyzed from the feature entry weight values.
The feature entries may be the classification dimensions of the features; for example, the feature entries may include function name, error code, error information, and stack trace information. A feature entry weight value is the weight corresponding to a feature entry.
Specifically, once the information string to be analyzed is determined, the information it contains can be compared with the preset feature entries. For example, the semantics of each piece of information in the string are first analyzed and mapped to the feature entries; if some semantics correspond to a feature entry, the feature entry weight value is determined from the corresponding information, either by using that information directly as the weight value or by processing it into one. Alternatively, if the information string to be analyzed is stored according to a known rule, its information can be mapped directly to the feature entries according to that rule and then used as, or processed into, the feature entry weight values. The information may be processed with the TF-IDF (term frequency-inverse document frequency) statistical method to obtain the feature entry weight values.
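For the TF-IDF option, a compact Python version is shown below. It uses one common variant, tf = term count / document length and idf = log(N / df); the description does not specify which variant is used, and treating each tokenized information string as a "document" is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term of each tokenized document by tf * idf.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), N = number of documents, df = documents containing the term
    """
    n_docs = len(documents)
    df: Counter = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({term: (c / total) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return weights


docs = [["read", "timeout", "timeout"], ["read", "error"]]
print(tf_idf(docs))
```

A term that appears in every document ("read" here) gets weight 0, which is the intended effect: it does not help distinguish one error from another.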
After the weight value corresponding to each feature entry is determined, the information vector to be analyzed is formed from the feature entry weight values of each information string to be analyzed; when a feature entry has no corresponding weight value, its weight value is set to 0. If the feature entry weight values do not sum to 1, they are normalized to obtain the information vector to be analyzed. In this way, each information string to be analyzed is represented in vector form by the weight values of its feature entries.
For example, the information vector to be analyzed may be [function name: 0.5; error code: 0; error information: 0.3; stack trace information: 0.2].
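The zero-filling and normalization rules of S205 can be checked against the example vector above with a short sketch; the entry order and the function name `normalized_vector` are assumptions.

```python
from typing import Dict, List, Sequence

# Assumed order of the feature entries (classification dimensions).
FEATURE_ENTRIES = ("function name", "error code", "error information", "stack trace information")


def normalized_vector(weights: Dict[str, float],
                      entries: Sequence[str] = FEATURE_ENTRIES) -> List[float]:
    """Set missing entries to 0, then rescale so the weights sum to 1."""
    raw = [float(weights.get(term, 0.0)) for term in entries]
    total = sum(raw)
    if total == 0:
        return raw  # all-zero vector: nothing to normalize
    return [w / total for w in raw]


print(normalized_vector({"function name": 5, "error information": 3,
                         "stack trace information": 2}))
```

Raw weights 5, 3, and 2 with no error code produce the example vector [0.5, 0, 0.3, 0.2]; a vector that already sums to 1 passes through unchanged.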
S206, calculating the similarity between the information vector to be analyzed and the error reporting information vector in the log error reporting information feature library.
The similarity measures how close the information vector to be analyzed is to an error reporting information vector in the log error reporting information feature library: the higher the similarity, the more likely it is that the information vector to be analyzed belongs to the error category of that error reporting information vector. The error reporting information vectors are representative vectors that may be stored in the log error reporting information feature library in advance.
Specifically, after the information vector to be analyzed is obtained, its similarity to each error reporting information vector in the log error reporting information feature library can be calculated. The similarity may be calculated using Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, or the like; for the distance measures, a smaller distance corresponds to a higher similarity.
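The measures listed in S206 are sketched below in plain Python. Distances measure dissimilarity, so the helper `distance_to_similarity` maps a distance d to 1 / (1 + d), a score where larger means more similar; that mapping is an assumption, not something the description prescribes.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p >= 1 (p=1: Manhattan, p=2: Euclidean)."""
    _check(a, b)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return minkowski(a, b, 1)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return minkowski(a, b, 2)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance_to_similarity(d: float) -> float:
    """Assumed mapping: distance 0 -> similarity 1, larger distance -> smaller similarity."""
    return 1.0 / (1.0 + d)


print(euclidean([0, 0], [3, 4]), manhattan([0, 0], [3, 4]))
```

Cosine similarity ignores vector length, which suits normalized weight vectors; Minkowski distance with a tunable p covers both the Manhattan and Euclidean cases.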
S207, screening out a preset number of target error reporting information vectors according to the similarity.
The target error reporting information vectors are the error reporting information vectors finally determined to be associated with the information vector to be analyzed. The preset number may be a fixed value set in advance, or it may be determined dynamically from the actual screening results.
Specifically, after the similarity between the information vector to be analyzed and the error reporting information vector in the log error reporting information feature library is obtained, a preset number of error reporting information vectors can be screened out according to each similarity to serve as the target error reporting information vector. When the preset number is a fixed value, the error reporting information vectors corresponding to the maximum similarity are selected according to the sequence of the similarity, and the error reporting information vectors of the preset number are selected as the target error reporting information vectors. When a fixed preset number is set, the size of the preset number needs to be not larger than the number of error reporting information vectors in the log error reporting information feature library. If the preset number is dynamically determined, a threshold may be set, only the error-reporting information vectors corresponding to the similarities greater than the threshold are used as the target error-reporting information vectors, and the number of the target error-reporting information vectors obtained through screening at this time is determined as the preset number, which is not limited in this embodiment.
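Both screening modes of S207 fit in one small function; the name `select_targets` and its parameters are hypothetical. Indices refer to positions of error reporting information vectors in the feature library, and equal similarities keep library order because Python's sort is stable.

```python
from typing import List, Optional, Sequence


def select_targets(similarities: Sequence[float], k: Optional[int] = None,
                   threshold: Optional[float] = None) -> List[int]:
    """Indices of target error reporting information vectors, most similar first.

    Fixed mode: pass k (1 <= k <= library size).
    Dynamic mode: pass threshold; every vector whose similarity exceeds it is kept.
    """
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    if k is not None:
        if not 1 <= k <= len(similarities):
            raise ValueError("k must be between 1 and the number of library vectors")
        return order[:k]
    if threshold is not None:
        return [i for i in order if similarities[i] > threshold]
    raise ValueError("pass either k or threshold")


sims = [0.2, 0.9, 0.5, 0.9]
print(select_targets(sims, k=2))            # fixed preset number
print(select_targets(sims, threshold=0.4))  # dynamic preset number
```

The range check on k enforces the rule that a fixed preset number may not exceed the size of the feature library.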
For example, the preset number of target error reporting information vectors may be screened out with the k-nearest neighbor (KNN) algorithm, which selects the k error reporting information vectors closest to the information vector to be analyzed. KNN is a machine learning algorithm: given a training data set and a new input instance, it finds the k instances in the training data set nearest to the input and assigns the input to the class to which the majority of those k instances belong. Its input is the feature vector of an instance and its output is the class of that instance. Here, for example, when the information vector to be analyzed is input, its candidate error category is output.
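A minimal sketch of this use of KNN, assuming dense vectors and ranking neighbours by cosine similarity (one of the measures listed for S206; the rule in step B below uses Euclidean distance instead). The library contents and category names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# (error reporting information vector, error category) pairs from the library
Sample = Tuple[Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    return dot / norms if norms else 0.0


def knn_classify(query: Sequence[float], library: List[Sample], k: int) -> str:
    if not 1 <= k <= len(library):
        raise ValueError("k must be between 1 and the library size")
    # "Nearest" here means "highest cosine similarity".
    neighbours = sorted(library, key=lambda s: cosine(query, s[0]), reverse=True)[:k]
    votes = Counter(category for _, category in neighbours)
    # most_common() lists tied categories in first-seen order, so the
    # category of the more similar neighbour wins a tie.
    return votes.most_common(1)[0][0]
```

With `k=1` a query close to `[0, 1]` is labelled by its single nearest neighbour; raising `k` to 3 lets the majority of the three neighbours decide, which can change the result.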
Further, when the preset number is a fixed value set in advance, the present application provides a method for determining it, comprising the following steps:
A. Obtain an initial number, a training sample set and a test sample set, where the training sample set comprises error reporting information vectors and their error categories, and the test sample set comprises test information vectors and their standard error categories.
The initial number is a preset starting value; its choice may affect the final classification result. The training sample set is a previously collected set of analyzed error reporting information that serves as the reference during training; it may be the same as, or different from, the log error reporting information feature library. The test sample set is a sample set whose classification results are known in advance: each test information vector is an error reporting information vector to be tested, and its standard error category is the error category it actually belongs to.
Specifically, to obtain a more accurate k-nearest neighbor classification model, a training sample set is created and an initial number k is set; k is then corrected using the test sample set until the accuracy of the model reaches a preset accuracy, which can be set freely as required.
Illustratively, suppose the training sample set contains N samples in total. Each training sample is mapped to a vector in the n-dimensional feature space $\mathbb{R}^n$, $T_i = (T_{i1}, w_{i1}; T_{i2}, w_{i2}; \ldots; T_{im}, w_{im}; \ldots; T_{in}, w_{in})$, where $T_{im}$ denotes a feature entry, covering dimensions such as function name, error code, error information and stack trace information, and $w_{im}$ denotes the weight of that feature entry. The weights of the feature entries are calculated with the classic term frequency-inverse document frequency (TF-IDF) statistical method. The information in the log files is processed into feature vectors of this form, which are then divided into error reporting information vectors and test information vectors to obtain the training sample set and the test sample set.
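The weights $w_{im}$ could be computed roughly as follows. The text does not say which TF-IDF variant is used, so this sketch assumes the plain form tf = count / entries-in-sample and idf = ln(N / document frequency); the "dimension:value" entry format is also a hypothetical choice. Each result is a sparse dict mapping a feature entry $T_{im}$ to its weight $w_{im}$.

```python
import math
from collections import Counter
from typing import Dict, List

# One sample = the feature entries extracted from one piece of error
# reporting information (hypothetical "dimension:value" format).
Entries = List[str]


def tfidf_weights(samples: List[Entries]) -> List[Dict[str, float]]:
    """Map each sample to {feature entry T_im: weight w_im}."""
    n = len(samples)
    # Document frequency: in how many samples each entry appears.
    df = Counter()
    for entries in samples:
        df.update(set(entries))
    vectors = []
    for entries in samples:
        tf = Counter(entries)
        total = len(entries)
        vectors.append(
            {term: (count / total) * math.log(n / df[term]) for term, count in tf.items()}
        )
    return vectors
```

Entries that occur in every sample get weight zero, so rarer entries such as a specific error code dominate later distance calculations.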
B. Determine the prediction error category of each test information vector according to the error reporting information vectors in the training sample set and the initial number.
The prediction error category is the error category predicted for a test information vector.
Specifically, after the initial number k is set, k is optimized by determining the prediction error category of each test information vector according to the error reporting information vectors in the training sample set and the initial number. For example, start with k = 1, fix a classification rule, and then optimize k by repeatedly testing against the test sample set in the following steps. The classification rule may be:
for i = 1 to N do
    calculate the Euclidean distance (the distance between two points in a high-dimensional space) between the error reporting information vector $T_i = (T_{i1}, w_{i1}; T_{i2}, w_{i2}; \ldots; T_{im}, w_{im}; \ldots; T_{in}, w_{in})$ in the training sample set and the test information vector to be classified, $T_x = (T_{x1}, w_{x1}; T_{x2}, w_{x2}; \ldots; T_{xm}, w_{xm}; \ldots; T_{xn}, w_{xn})$, which represents their similarity:
    $$d(T_i, T_x) = \sqrt{\sum_{m=1}^{n} \left(w_{im} - w_{xm}\right)^2}$$
end for
For each test information vector, the training vectors are sorted by the resulting Euclidean distances, the k error reporting information vectors with the smallest distances are taken, their error categories are determined, and the number of occurrences of each error category is counted; the most frequent category is the prediction error category of the test information vector to be classified. In this way the prediction error category of every test information vector is determined.
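The classification rule above (loop over the N training vectors, Euclidean distance, majority vote among the k nearest) might be sketched in Python as follows. It assumes each vector is stored sparsely as a dict from feature entry to weight, with absent entries treated as weight 0; the names and category labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Sparse vector: {feature entry: weight}; absent entries have weight 0.
Vector = Dict[str, float]


def euclidean(a: Vector, b: Vector) -> float:
    terms = a.keys() | b.keys()
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def predict_categories(
    training: List[Tuple[Vector, str]], tests: List[Vector], k: int
) -> List[str]:
    predictions = []
    for x in tests:
        # for i = 1 to N: distance between T_i and the test vector T_x
        scored = [(euclidean(t_i, x), category) for t_i, category in training]
        scored.sort(key=lambda pair: pair[0])  # ascending distance
        # Count categories among the k nearest; the most frequent one wins.
        votes = Counter(category for _, category in scored[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```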
C. Determine the prediction accuracy according to each prediction error category and the corresponding standard error category.
The prediction accuracy measures how well the prediction error categories agree with the standard error categories; the closer the agreement, the higher the prediction accuracy.
Specifically, after the prediction error category of each test information vector is determined, it is compared with the corresponding standard error category, and the prediction is considered correct when the two agree. The number of correct predictions is counted, and its proportion of the total number of test information vectors is taken as the prediction accuracy.
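The accuracy computation in step C is a simple ratio; a sketch with a hypothetical function name:

```python
from typing import Sequence


def prediction_accuracy(predicted: Sequence[str], standard: Sequence[str]) -> float:
    """Share of test vectors whose predicted category equals the standard one."""
    if not standard or len(predicted) != len(standard):
        raise ValueError("need one prediction per test vector")
    correct = sum(p == s for p, s in zip(predicted, standard))
    return correct / len(standard)
```

For example, three correct predictions out of four test information vectors give an accuracy of 3 / 4 = 0.75.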
D. Judge whether the prediction accuracy is greater than an accuracy threshold. If so, take the initial number as the preset number; otherwise, update the initial number and return to the step of determining the prediction error category of each test information vector according to the error reporting information vectors in the training sample set and the initial number.
The accuracy threshold may be a preset value, and may be freely set according to actual conditions.
Specifically, if the prediction accuracy is greater than the accuracy threshold, the current initial number k meets the requirement and is taken as the preset number. Otherwise k is not yet optimal and must continue to be corrected against the test sample set: the initial number is updated, for example by adding 1 or 2, and the process returns to the step of determining the prediction error category of each test information vector according to the error reporting information vectors in the training sample set and the initial number.
In this method, the error logs are analyzed with the k-nearest neighbor algorithm: samples in the test sample set are classified using a training sample set whose error categories are known. This uses the relationships between samples directly and reduces classification errors caused by poorly chosen feature values. As an instance-based method of error classification, it needs no time-consuming training phase and can achieve good classification results provided there are enough training samples.
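Steps A to D can be combined into a loop. This is a sketch under stated assumptions: dense vectors with Euclidean distance, a hypothetical `tune_k` name, and an upper bound on k (the training set size) added here so the loop ends if the threshold is never reached; the text itself does not specify a stopping bound.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (vector, category)


def knn_predict(x: Sequence[float], training: List[Sample], k: int) -> str:
    nearest = sorted(training, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(c for _, c in nearest).most_common(1)[0][0]


def tune_k(
    training: List[Sample],
    tests: List[Sample],
    threshold: float,
    initial_k: int = 1,
    step: int = 1,
) -> Optional[int]:
    """Raise k until the accuracy on the test set exceeds the threshold."""
    k = initial_k
    # Assumed stopping rule: k cannot usefully exceed the training set size.
    while k <= len(training):
        # Step B: predict a category for every test vector.
        predicted = [knn_predict(x, training, k) for x, _ in tests]
        # Step C: share of predictions matching the standard categories.
        accuracy = sum(p == c for p, (_, c) in zip(predicted, tests)) / len(tests)
        # Step D: accept k only if the accuracy is strictly greater.
        if accuracy > threshold:
            return k
        k += step
    return None  # threshold not reachable with this data
```

Stepping by 2 from an odd start keeps k odd, which avoids tied votes between two categories; this is a common practice, not something the method requires.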
S208, determining the error categories corresponding to the target error reporting information vectors as candidate error categories.
Specifically, because the log error reporting information feature library stores the error reporting information vectors together with their error categories, once the target error reporting information vectors are determined, the error category of each can be read directly from the library and used as a candidate error category of the information vector to be analyzed.
S209, counting the number of occurrences of each candidate error category.
Specifically, after all the candidate error categories are obtained, the number of occurrences of each candidate error category is counted.
S210, determining the candidate error category with the largest count as the classification result.
Specifically, after the count of each candidate error category is obtained, the candidate error category with the largest count is determined as the classification result. If more than one candidate error category shares the largest count, the classification result may be determined according to a preset rule. For example, each candidate error category may be assigned a priority and the one with the highest priority taken as the classification result; alternatively, all of the tied candidate error categories may be taken as the classification result, or one of them may be chosen at random.
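S209 and S210, together with the priority-based tie rule, might be sketched as below. The priority table and category names are hypothetical, and treating unlisted categories as priority 0 is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Sequence


def most_frequent(candidates: Sequence[str]) -> List[str]:
    """All candidate error categories sharing the highest count."""
    if not candidates:
        raise ValueError("no candidate error categories")
    counts = Counter(candidates)
    top = max(counts.values())
    return [category for category, n in counts.items() if n == top]


def resolve_by_priority(tied: Sequence[str], priority: Dict[str, int]) -> str:
    """Preset tie rule: the category with the highest priority wins.

    Categories missing from the priority table are treated as priority 0.
    """
    return max(tied, key=lambda category: priority.get(category, 0))
```

Returning the whole tied list from `most_frequent` keeps the other two options open: the caller can report every tied category, or pick one with `random.choice`.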
S211, determining a log identifier, an error reporting position and error details corresponding to the information vector to be analyzed.
The log identifier uniquely identifies the log file, for example its file name or number. The error reporting position is the location of the information vector to be analyzed within the log file, for example line 103. The error details are the specific information describing the error.
Specifically, after the classification result of the information vector to be analyzed is determined, the following are also determined: the log identifier of the log file containing it, its error reporting position in that log file, and the error details describing it. Either all of the information in the corresponding information string to be analyzed, or only part of it, may be used as the error details; in the latter case a preset rule can be used to filter the information in that string.
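One way to gather the log identifier, error reporting position and filtered error details is sketched below. The regular-expression matching rule, the "drop" filter and the `ErrorRecord` type are all hypothetical choices for illustration; the text leaves the filtering rule open.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorRecord:
    log_id: str    # log identifier, e.g. the log file name
    position: int  # error reporting position: 1-based line number
    details: str   # error details (possibly filtered)


def extract_records(
    log_id: str,
    lines: List[str],
    error_pattern: str,
    drop_pattern: Optional[str] = None,
) -> List[ErrorRecord]:
    records = []
    for line_no, line in enumerate(lines, start=1):
        if re.search(error_pattern, line):
            details = line.strip()
            # Optional preset filtering rule: remove matching text from the details.
            if drop_pattern is not None:
                details = re.sub(drop_pattern, "", details).strip()
            records.append(ErrorRecord(log_id, line_no, details))
    return records
```

Here the drop pattern `^\S+ \S+ ` would strip a leading date and time so that only the error text remains in the details.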
S212, forming an analysis result document according to the log identifier, error reporting position, error details and classification result corresponding to each information vector to be analyzed.
The analysis result document records the analysis result of the log to be analyzed; from it, the error categories and related information of that log can be determined.
Specifically, the log identifier, error reporting position, error details, classification result and other information corresponding to each information vector to be analyzed can be written into a document to form the analysis result document, so that staff can see the error information of the log to be analyzed more intuitively and conveniently.
The log analysis method provided by this embodiment of the invention addresses the problem in the prior art that developers cannot locate test errors quickly and accurately. The log to be analyzed is processed into information vectors to be analyzed; each vector is analyzed against the log error reporting information feature library to determine its candidate error categories, from which the classification result of the error it represents is determined. Because the log is analyzed automatically, determining the classification result greatly reduces the manual error-checking workload of testers: log files no longer have to be inspected and analyzed by hand, error logs are found faster, and testing becomes more efficient. The method can replace the manual processing of large and complicated log files and outputs an analysis result document that presents the error reporting information item by item, clearly classified and in detail. This helps testers take the initiative in understanding test results, reduces the cost of communication between development and testing, and helps developers fix defects faster.
As an optional implementation of this embodiment, FIG. 3 is a schematic flowchart of a log analysis method based on the k-nearest neighbor algorithm according to an embodiment of the present invention.
As shown in FIG. 3, the method first obtains the server address and log storage path, obtains the log to be analyzed from them, and parses it into itemized log information strings. Keywords are then captured from the itemized log information strings according to the set characteristic value labels to obtain the information strings to be analyzed, which are converted into information vectors to be analyzed. Next, the error reporting information is classified with the k-nearest neighbor algorithm against the log error reporting information feature library to determine the error category of each information vector to be analyzed, and finally an analysis result document containing the log identifier, error reporting position, classification result and error details is output.
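The parsing and keyword-capture stages of this flow might be sketched as follows. The convention that every new log entry starts with a date, and the tag list, are hypothetical; real log formats and characteristic value labels will differ.

```python
import re
from typing import List, Sequence

# Hypothetical convention: each new log entry starts with a date.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def itemize(log_text: str) -> List[str]:
    """Split raw log text into itemized log information strings."""
    entries: List[str] = []
    for line in log_text.splitlines():
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation line (e.g. a stack-trace frame) joins the previous entry.
            entries[-1] += "\n" + line
    return entries


def match_tags(entries: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Keep the entries containing any characteristic value label."""
    return [e for e in entries if any(tag in e for tag in tags)]
```

Keeping stack-trace frames attached to their entry means the stack trace dimension stays available when the matched string is later turned into feature entries.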
EXAMPLE III
FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as the log analysis method.
In some embodiments, the log analysis method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the log analysis method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the log analysis method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the difficult management and weak service scalability of traditional physical hosts and VPS (virtual private server) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A log analysis method, comprising:
acquiring a log to be analyzed;
processing the log to be analyzed, and determining an information vector to be analyzed;
for each information vector to be analyzed, determining a corresponding alternative error category according to the log error reporting information feature library;
and determining a classification result according to each alternative error category.
2. The method of claim 1, wherein the obtaining a log to be analyzed comprises:
acquiring a server address and a log storage path which are configured in advance;
and when the log analysis condition is met, acquiring the log to be analyzed according to the server address and the log storage path.
3. The method of claim 1, wherein the processing the log to be analyzed and determining an information vector to be analyzed comprises:
determining a log information string according to the log to be analyzed;
matching each log information string with a characteristic value label in a predetermined characteristic value label set, and determining the successfully matched log information string as an information string to be analyzed;
and determining a weight value of the feature entry according to the information corresponding to the information string to be analyzed and a preset feature entry, and forming an information vector to be analyzed according to the weight value of each feature entry.
4. The method of claim 3, wherein determining a log information string from the log to be analyzed comprises:
extracting the log to be analyzed and determining text content;
and analyzing the text content into a log information string.
5. The method of claim 1, wherein determining the corresponding alternative error category according to the log error reporting information feature library comprises:
calculating the similarity between the information vector to be analyzed and the error reporting information vector in the log error reporting information feature library;
screening out a preset number of target error reporting information vectors according to each similarity;
and respectively determining the error categories corresponding to the target error reporting information vectors as alternative error categories.
6. The method of claim 5, wherein the step of determining the preset number comprises:
acquiring an initial quantity, a training sample set and a test sample set, wherein the training sample set comprises error reporting information vectors and corresponding error categories, and the test sample set comprises test information vectors and corresponding standard error categories;
determining a prediction error category corresponding to each test information vector according to the error reporting information vectors in the training sample set and the initial quantity;
determining a prediction accuracy rate according to each prediction error category and the corresponding standard error category;
judging whether the prediction accuracy is greater than an accuracy threshold, and if so, determining the initial quantity as a preset quantity;
otherwise, updating the initial quantity, determining a new initial quantity, and returning to the step of determining the prediction error category corresponding to each test information vector according to the error reporting information vectors in the training sample set and the initial quantity.
7. The method of claim 1, wherein determining a classification result according to each of the candidate error categories comprises:
counting the times corresponding to the alternative error categories;
and determining the candidate error category with the highest frequency as a classification result.
8. The method of any one of claims 1-7, further comprising:
determining a log identifier, an error reporting position and error details corresponding to the information vector to be analyzed;
and forming an analysis result document according to the log identification, the error reporting position, the error details and the classification result corresponding to each information vector to be analyzed.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the log analysis method of any one of claims 1-8.
10. A computer-readable storage medium having stored thereon computer instructions for causing a processor, when executed, to implement the log analysis method of any one of claims 1-8.
CN202211274833.8A 2022-10-18 2022-10-18 Log analysis method, device and storage medium Pending CN115470034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211274833.8A CN115470034A (en) 2022-10-18 2022-10-18 Log analysis method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211274833.8A CN115470034A (en) 2022-10-18 2022-10-18 Log analysis method, device and storage medium

Publications (1)

Publication Number Publication Date
CN115470034A true CN115470034A (en) 2022-12-13

Family

ID=84337687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211274833.8A Pending CN115470034A (en) 2022-10-18 2022-10-18 Log analysis method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115470034A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117234776A (en) * 2023-09-18 2023-12-15 厦门国际银行股份有限公司 Intelligent judging method, device and equipment for batch processing error reporting operation


Similar Documents

Publication Publication Date Title
US10095780B2 (en) Automatically mining patterns for rule based data standardization systems
CN111343161B (en) Abnormal information processing node analysis method, abnormal information processing node analysis device, abnormal information processing node analysis medium and electronic equipment
CN108027814B (en) Stop word recognition method and device
CN113657088A (en) Interface document analysis method and device, electronic equipment and storage medium
CN111133396B (en) Production facility monitoring device, production facility monitoring method, and recording medium
US11010393B2 (en) Library search apparatus, library search system, and library search method
CN115470034A (en) Log analysis method, device and storage medium
CN112148841B (en) Object classification and classification model construction method and device
CN116340831B (en) Information classification method and device, electronic equipment and storage medium
CN115618264A (en) Method, apparatus, device and medium for topic classification of data assets
CN115545481A (en) Risk level determination method and device, electronic equipment and storage medium
CN115600592A (en) Method, device, equipment and medium for extracting key information of text content
CN114943219A (en) Method, device and equipment for generating bill of material test data and storage medium
CN115169490A (en) Log classification method, device and equipment and computer readable storage medium
CN114020904A (en) Test question file screening method, model training method, device, equipment and medium
CN111522750B (en) Method and system for processing function test problem
CN114154480A (en) Information extraction method, device, equipment and storage medium
CN109918293B (en) System test method and device, electronic equipment and computer readable storage medium
CN107844478B (en) Patent file processing method and device
CN114492409B (en) Method and device for evaluating file content, electronic equipment and program product
CN112181490B (en) Method, device, equipment and medium for identifying function category in function point evaluation method
CN117290758A (en) Classification and classification method, device, equipment and medium for unstructured document
CN117493785A (en) Data processing method and device and electronic equipment
CN114970689A (en) Method and device for detecting sample and electronic equipment
CN114116688A (en) Data processing and data quality inspection method, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination