CN113282559A - Computer log classification method and device, storage medium and electronic device - Google Patents

Info

Publication number
CN113282559A
Authority
CN
China
Prior art keywords
log information
information
log
keyword
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110626955.8A
Other languages
Chinese (zh)
Inventor
杨猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd and Haier Smart Home Co Ltd
Priority to CN202110626955.8A
Publication of CN113282559A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/35 Clustering; Classification
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Abstract

The invention discloses a method and device for classifying computer logs, a storage medium, and an electronic device. The method comprises: reading log information of a target task and storing attribute information of the log information in a database; filtering the attribute information of the log information by a preset algorithm and obtaining first log information from the filtered result; and storing the first log information in the database and receiving a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information. In other words, the attribute information of the log information is filtered to obtain the first log information, and the type of the first log information is then labeled through the labeling operation of the target object. This technical solution solves the problems in the related art that part of the log information is missed because too few keywords are configured, and that the obtained log information serves only as a warning and cannot guide developers in optimizing tasks.

Description

Computer log classification method and device, storage medium and electronic device
Technical Field
The invention relates to the field of communications, and in particular to a method and device for classifying computer logs, a storage medium, and an electronic device.
Background
With the rapid growth in the number of internet users, a monolithic internet service architecture can no longer meet demand, and distributed systems have emerged. As a result, the number of logs has increased rapidly, and obtaining the log information of a target task has become an urgent need for developers.
In the prior art, Azkaban task run logs can only be inspected by clicking through each page, and only the lists of running, succeeded, and failed tasks can be queried; the log information and its attributes cannot be queried. At present, log information is therefore screened out by configuring keywords, but because few keywords are configured, part of the log information is missed. Moreover, the errors in the retrieved log information are not classified, so the log information only serves as a warning and cannot guide developers in optimizing tasks.
The related art therefore suffers from the problems that part of the log information is missed because too few keywords are configured, and that the obtained log information only serves as a warning and cannot guide developers in optimizing tasks.
Disclosure of Invention
Embodiments of the invention provide a method and device for classifying computer logs, a storage medium, and an electronic device, so as to at least solve the problems in the related art that part of the log information is missed because too few keywords are configured, and that the obtained log information only serves as an alarm and cannot guide developers in optimizing tasks.
According to an embodiment of the present invention, a method for classifying computer logs is provided, including: reading log information of a target task and storing attribute information of the log information in a database, where the attribute information includes a running code corresponding to the log information; filtering the attribute information of the log information by a preset algorithm and obtaining first log information from the filtered result; and storing the first log information in the database and receiving a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information.
In an exemplary embodiment, after the labeling operation is received, the method further includes: associating the attribute information with the first log information through the running code to obtain first data; and constructing a chart from the first data, where the chart indicates basic information of the first log information, the basic information including at least one of: the type of the first log information, the date of the first log information, and the developer corresponding to the first log information.
In an exemplary embodiment, reading the log information of the target task includes: constructing a connection pool for the target task and connecting to a configuration library through the connection pool, where the configuration library stores logs in a first preset format; screening the logs in a preset manner to obtain second log information, which consists of logs in the first preset format; and converting the second log information into log information in a second preset format.
In an exemplary embodiment, before the attribute information of the log information is filtered by the preset algorithm and the first log information is obtained from the filtered result, the method includes: filtering common information out of the attribute information of the log information to obtain third log information, where the common information includes at least one of: the service startup parameter of the target task, the date-added parameter of the target task, and the name parameter of the target task.
In an exemplary embodiment, after the attribute information of the log information is filtered by the preset algorithm and the first log information is obtained from the filtered result, the method includes: setting a first keyword, which denotes a word naming a referenced package in the attribute information of the log information; filtering the first log information according to the first keyword; and determining fourth log information from the filtered first log information, where the fourth log information denotes logs that do not contain the first keyword.
In an exemplary embodiment, before the first log information is stored in the database, the method includes: obtaining, by means of a second keyword, the line information containing the second keyword in the log information; calling a translation interface to translate the obtained line information; and combining the running code, the second keyword, and the translated line information into second data, which is stored in the database.
According to another embodiment of the present invention, a device for classifying computer logs is also provided, including: a reading module, configured to read log information of a target task and store attribute information of the log information in a database, where the attribute information includes a running code corresponding to the log information; a filtering module, configured to filter the attribute information of the log information by a preset algorithm and obtain first log information from the filtered result; and a receiving module, configured to store the first log information in the database and receive a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information.
In an exemplary embodiment, the device further includes a building module, configured to associate the attribute information with the first log information through the running code to obtain first data, and to construct a chart from the first data, where the chart indicates basic information of the first log information, the basic information including at least one of: the type of the first log information, the date of the first log information, and the developer corresponding to the first log information.
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, storing a computer program that is configured to perform the above method for classifying computer logs when run.
According to another aspect of the embodiments of the present invention, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor performs the above method for classifying computer logs by means of the computer program.
In the embodiments of the present invention, log information of a target task is read and attribute information of the log information, including the running code corresponding to the log information, is stored in a database; the attribute information is filtered by a preset algorithm and first log information is obtained from the filtered result; and the first log information is stored in the database, after which a labeling operation is received in which a target object labels the type of the first log information according to the running environment of the log information. This technical solution solves the problems in the related art that part of the log information is missed because too few keywords are configured, and that the obtained log information only serves as a warning and cannot guide developers in optimizing tasks. Filtering the attribute information with a preset algorithm to obtain the first log information increases the amount of log information captured, and labeling the type of the first log information through the target object's labeling operation provides a basis for decisions on task analysis and optimization.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a computer terminal for a method for classifying computer logs according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method of classifying computer logs according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method of classifying computer logs according to an embodiment of the present invention;
FIG. 4 is a block diagram of a device for classifying computer logs according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method provided by the embodiments of this application may be executed on a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a block diagram of the hardware structure of a computer terminal for the method for classifying computer logs according to an embodiment of the present invention. As shown in FIG. 1, the computer terminal may include one or more processors 102 (only one is shown in FIG. 1; the processors 102 may include, but are not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data; in an exemplary embodiment, it may also include a transmission device 106 for communication and an input/output device 108. Those skilled in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the computer terminal. For example, the computer terminal may include more or fewer components than shown in FIG. 1, or have a configuration that is functionally equivalent to, or more capable than, the one shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the classification method of the computer log in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the above-mentioned method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. A specific example of the network may include a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module used to communicate with the internet wirelessly.
This embodiment provides a method for classifying computer logs that runs on the above computer terminal. FIG. 2 is a flowchart of the method for classifying computer logs according to an embodiment of the present invention; the flow includes the following steps:
step S202, reading log information of a target task, and storing attribute information of the log information to a database, wherein the attribute information comprises: running codes corresponding to the log information;
step S204, filtering the attribute information of the log information through a preset algorithm, and acquiring first log information from a filtered result;
step S206, saving the first log information to the database, and receiving a labeling operation of a target object for performing type labeling on the first log information according to the running environment of the log information.
Through the above steps, log information of the target task is read and its attribute information, including the corresponding running code, is stored in a database; the attribute information is filtered by a preset algorithm and first log information is obtained from the filtered result; and the first log information is stored in the database, after which a labeling operation is received in which a target object labels the type of the first log information according to the running environment of the log information. This solves the problems in the related art that part of the log information is missed because too few keywords are configured, and that the obtained log information only serves as a warning and cannot guide developers in optimizing tasks: filtering the attribute information with a preset algorithm increases the amount of log information captured, and labeling the type of the first log information through the target object's labeling operation provides a basis for decisions on task analysis and optimization.
In step S206, a model may further be trained on the labeling operations in which the target object labels the type of the first log information according to the running environment of the log information, so that the computer terminal can itself label the type of the first log information according to the running environment by using the trained model. The preset algorithm is the TF-IDF algorithm. The TF part computes the term frequency, that is, the number of times a word appears in a log; the IDF part computes the inverse document frequency, IDF = log(total number of log documents / (number of logs containing the word + 1)); and the TF-IDF score is the product TF × IDF. This filters out words that appear frequently but carry no meaning and selects as keywords the words that occur in relatively few documents of the log corpus, which achieves a first round of keyword capture. Part-of-speech information is also taken into account by increasing the weight of certain parts of speech, so that keywords are captured more accurately.
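The TF-IDF scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: tokenization into lowercase alphabetic words and the helper name tfidf_keywords are hypothetical, and the part-of-speech weighting is omitted because the document does not specify a tagger or the weights. TF is the raw count of a word in one log and IDF is log(N / (df + 1)), as in the text.

```python
import math
import re
from collections import Counter
from typing import List


def tfidf_keywords(logs: List[str], top_k: int = 3) -> List[List[str]]:
    """Return up to top_k positively scored TF-IDF keywords for each log (hypothetical helper)."""
    # Assumed tokenization: lowercase alphabetic words only.
    docs = [re.findall(r"[a-z]+", log.lower()) for log in logs]
    n_docs = len(docs)
    # Document frequency: number of logs in which each word appears at least once.
    df = Counter(word for doc in docs for word in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)  # TF: raw count of each word in this log
        # IDF = log(N / (df + 1)); words present in (almost) every log score <= 0.
        scores = {w: count * math.log(n_docs / (df[w] + 1)) for w, count in tf.items()}
        # Keep only positive scores, highest first, ties broken alphabetically.
        ranked = sorted((w for w in scores if scores[w] > 0), key=lambda w: (-scores[w], w))
        keywords.append(ranked[:top_k])
    return keywords


logs = [
    "job started ok",
    "job started ok",
    "job started disk full error",
    "job started ok",
]
print(tfidf_keywords(logs))  # [[], [], ['disk', 'error', 'full'], []]
```

With the "+1" in the IDF denominator, a word that appears in every log gets log(N / (N + 1)) < 0, so boilerplate words such as "job" and "started" drop out, while rare words like "disk" survive as candidate keywords.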
In an exemplary embodiment, after the labeling operation in which the target object labels the type of the first log information according to the running environment of the log information is received, the attribute information is associated with the first log information through the running code to obtain first data, and a chart is constructed from the first data, where the chart indicates basic information of the first log information, the basic information including at least one of: the type of the first log information, the date of the first log information, and the developer corresponding to the first log information.
That is, after the first log information has been type-labeled through the target object's labeling operation, the attribute information of the target task and the first log information are associated through the running code and combined into first data, from which a chart is constructed, for example a pie chart and a list. The chart supports screening statistics by date, developer, error type, and similar fields, so that a single task or a class of tasks can be analyzed; an alarm is sent when the count for an error type exceeds a preset threshold, which gives developers guidance for optimization; and clicking a jump link in the chart opens the corresponding first log information, making the log information easier to handle.
In an exemplary embodiment, a connection pool is constructed for the target task and a configuration library is connected through the connection pool, where the configuration library stores logs in a first preset format; the logs are screened in a preset manner to obtain second log information, which consists of logs in the first preset format; and the second log information is converted into log information in a second preset format.
Specifically, the logs of the target task are stored in the configuration library in the first preset format. A connection pool is established for the target task in Python, the configuration library of the target task is connected through the connection pool, and the logs of the target task are screened in a preset manner, namely by selecting records whose status equals '70', to obtain second log information; the logs in the first preset format are then converted into log information in the second preset format so that they can be read. Further, the attribute information of the target task for the log information in the second preset format, such as the project name, flow name, job name, running code, running machine, project description, task start time, and task end time, is stored in the MySQL database.
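The association through the running code and the threshold alarm described above can be sketched as a join followed by a count. This is a minimal sketch assuming in-memory dictionaries; the field names exec_id, job_name, developer and error_type, the helper names, and the sample rows are all hypothetical, and a real deployment would read these rows from MySQL instead.

```python
from collections import Counter
from typing import Dict, List


def associate(attributes: List[dict], labeled_logs: List[dict]) -> List[dict]:
    """Join task attribute rows with labeled log rows on the running code (exec_id)."""
    by_code = {row["exec_id"]: row for row in attributes}
    # Merge each labeled row with its task attributes; skip rows with no matching task.
    return [{**by_code[log["exec_id"]], **log} for log in labeled_logs if log["exec_id"] in by_code]


def error_type_alarms(rows: List[dict], threshold: int) -> Dict[str, int]:
    """Return the error types whose occurrence count is greater than the threshold."""
    counts = Counter(row["error_type"] for row in rows)
    return {etype: n for etype, n in counts.items() if n > threshold}


attributes = [
    {"exec_id": 1, "job_name": "load_orders", "developer": "dev_a"},
    {"exec_id": 2, "job_name": "load_users", "developer": "dev_b"},
    {"exec_id": 3, "job_name": "sync_stock", "developer": "dev_a"},
]
labeled = [
    {"exec_id": 1, "error_type": "external data source"},
    {"exec_id": 2, "error_type": "external data source"},
    {"exec_id": 3, "error_type": "platform resource"},
]
first_data = associate(attributes, labeled)
print(error_type_alarms(first_data, threshold=1))  # {'external data source': 2}
```

Keying the join on the running code keeps each labeled error tied to exactly one task run, which is what lets the chart filter by developer or date later.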
For example, the logs of the target task are saved in the configuration library in gzip-compressed form, and the second log information is decompressed with gzip.
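The screening and decompression steps above can be sketched as follows. This is a sketch under assumptions, not the author's code: an in-memory sqlite3 database stands in for the MySQL configuration library, a single connection replaces the connection pool, and the table and column names (execution_flows, execution_logs, exec_id, status, log) are modeled loosely on Azkaban's schema and should be treated as assumptions. Status 70 is the value the text uses to select error logs.

```python
import gzip
import sqlite3
from typing import List, Tuple


def fetch_error_logs(conn: sqlite3.Connection, status: int = 70) -> List[Tuple[int, str]]:
    """Select logs of executions with the given status and decode them to UTF-8 text."""
    rows = conn.execute(
        "SELECT l.exec_id, l.log FROM execution_logs AS l "
        "JOIN execution_flows AS f ON f.exec_id = l.exec_id "
        "WHERE f.status = ? ORDER BY l.exec_id",
        (status,),
    ).fetchall()
    # Stored form: gzip-compressed bytes (first preset format).
    # Returned form: decoded UTF-8 text (second preset format).
    return [(exec_id, gzip.decompress(blob).decode("utf-8")) for exec_id, blob in rows]


# Demo against an in-memory stand-in for the configuration library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE execution_flows (exec_id INTEGER, status INTEGER)")
conn.execute("CREATE TABLE execution_logs (exec_id INTEGER, log BLOB)")
conn.executemany("INSERT INTO execution_flows VALUES (?, ?)", [(1, 50), (2, 70)])
conn.executemany(
    "INSERT INTO execution_logs VALUES (?, ?)",
    [
        (1, gzip.compress(b"job finished")),
        (2, gzip.compress("Caused by: java.io.IOException: disk full".encode("utf-8"))),
    ],
)
print(fetch_error_logs(conn))  # [(2, 'Caused by: java.io.IOException: disk full')]
```

Passing the status as a query parameter rather than formatting it into the SQL string keeps the screening condition safe to change.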
In an exemplary embodiment, before the attribute information of the log information is filtered by the preset algorithm and the first log information is obtained from the filtered result, common information in the attribute information of the log information is filtered out to obtain third log information, where the common information includes at least one of: the service startup parameter of the target task, the date-added parameter of the target task, and the name parameter of the target task.
In other words, the log information contains printed common information such as the service startup parameter, the date-added parameter, and the name parameter of the target task. This common information is filtered out with a regular expression such as [re.sub(r'\d+-\d+-\d+ \d+:\d+:\d+ CST {} INFO - '.format(job_name), '', x) for x in logs], which performs a preliminary pre-filtering of the log information. This expression is only an example and does not limit the embodiments of the present invention.
In an exemplary embodiment, after the attribute information of the log information is filtered by the preset algorithm and the first log information is obtained from the filtered result, the method includes: setting a first keyword, which denotes a word naming a referenced package in the attribute information of the log information; filtering the first log information according to the first keyword; and determining fourth log information from the filtered first log information, where the fourth log information denotes logs that do not contain the first keyword.
That is, words naming referenced packages in the attribute information of the log information are filtered out by setting first keywords; for example, words such as java, python, hadoop, apache, and hdfs are filtered out.
In an exemplary embodiment, before the first log information is stored in the database, the line information containing a second keyword is obtained from the log information by means of the second keyword; a translation interface is called to translate the obtained line information; and the running code, the second keyword, and the translated line information are combined into second data, which is stored in the database.
Specifically, the first log information is split on line breaks and scanned with the second keyword to obtain the lines containing the second keyword. Because the log information is mostly in English, a translation interface is called to translate the obtained lines into Chinese, forming a separate column. After this processing, multiple columns of information are formed, including the running code, the second keyword, and the translated line information; these columns are combined through the running code to obtain second data, which is stored in the database.
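The regular-expression pre-filter described above can be wrapped in a small function. This is a sketch assuming the Azkaban-style line prefix "date time CST job_name INFO - "; the function name strip_common_prefix and the sample lines are hypothetical. It adds re.escape around the job name, which the inline expression in the text does not have, so that job names containing regex metacharacters are matched literally.

```python
import re
from typing import List


def strip_common_prefix(logs: List[str], job_name: str) -> List[str]:
    """Remove the 'date time CST <job> INFO - ' prefix from the INFO lines of one job."""
    # re.escape guards against regex metacharacters in the job name.
    pattern = re.compile(r"\d+-\d+-\d+ \d+:\d+:\d+ CST {} INFO - ".format(re.escape(job_name)))
    return [pattern.sub("", line) for line in logs]


logs = [
    "06-06-2021 10:00:01 CST load_orders INFO - Starting job load_orders",
    "06-06-2021 10:00:05 CST load_orders ERROR - Caused by: java.io.IOException",
]
print(strip_common_prefix(logs, "load_orders"))
```

Only INFO lines lose their prefix; the ERROR line is left untouched, so the error content reaches the later TF-IDF and keyword steps intact.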
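Filtering out first keywords that name referenced packages can be sketched as token-level removal. This is a minimal sketch, assuming a log line is split into alphabetic tokens so that dotted class paths like org.apache.hadoop break into separate words; the word list reuses the examples from the text, while PACKAGE_WORDS and drop_package_words are hypothetical names.

```python
import re
from typing import Iterable, List

# Hypothetical first-keyword list, taken from the examples in the text.
PACKAGE_WORDS = {"java", "python", "hadoop", "apache", "hdfs"}


def drop_package_words(line: str, package_words: Iterable[str] = PACKAGE_WORDS) -> List[str]:
    """Tokenize a log line and drop tokens that name referenced packages."""
    blocked = {w.lower() for w in package_words}
    tokens = re.findall(r"[A-Za-z]+", line)
    # Case-insensitive comparison, so "Hadoop" and "hadoop" are both removed.
    return [t for t in tokens if t.lower() not in blocked]


print(drop_package_words("at org.apache.hadoop.hdfs.DFSClient.create"))
# ['at', 'org', 'DFSClient', 'create']
```

Removing these words before keyword scoring keeps package names, which appear in almost every stack trace, from crowding out the words that actually describe the error.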
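The line splitting, second-keyword scan and translation step above can be sketched as follows. This is an illustrative sketch only: the document does not name its translation interface, so translate is an injected callable and fake_translate is a stand-in, and the second keyword "Caused by" is a hypothetical example. Matching follows the text's re.match(r'keyword:.+', x) form, so only lines that start with "<keyword>:" are kept.

```python
import re
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str, str, str]  # (running code, keyword, original line, translated line)


def extract_keyword_rows(
    exec_id: int,
    log_text: str,
    keywords: Sequence[str],
    translate: Callable[[str], str],
) -> List[Row]:
    """Split a log on line breaks and keep lines that start with '<keyword>:'."""
    rows = []
    for line in log_text.split("\n"):
        for kw in keywords:
            if re.match(r"{}:.+".format(re.escape(kw)), line):
                rows.append((exec_id, kw, line, translate(line)))
    return rows


def fake_translate(text: str) -> str:
    # Stand-in translator; a real system would call a translation service here.
    return "[zh] " + text


log_text = "Starting job\nCaused by: java.io.IOException: disk full\nJob failed"
print(extract_keyword_rows(2, log_text, ["Caused by"], fake_translate))
```

Each row already carries the running code, so the rows can be written to the database and later joined with the task attributes on that column.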
To better explain the method for classifying computer logs, its implementation flow is described below with reference to an optional embodiment; this flow does not limit the technical solutions of the embodiments of the present invention.
This embodiment provides a method for classifying computer logs. FIG. 3 is a schematic diagram of the method for classifying computer logs according to an embodiment of the present invention; as shown in FIG. 3, the following steps are performed:
step S301: starting an Azkaban task (corresponding to the target task in the above-described embodiment);
step S302: sending the Azkaban log (corresponding to the log in the above embodiment) to the configuration library and saving it there in gzip-compressed form (corresponding to the first preset format in the above embodiment);
step S303: building a connection pool in Python, connecting to the Azkaban configuration library through the connection pool, and screening the Azkaban logs by status = '70' to obtain a first error log (corresponding to the second log information in the above embodiment);
step S304: decompressing the gzip-format first error log with gzip.decompress(logs).decode('utf-8') and converting it into an error log in UTF-8 format (corresponding to the second preset format in the above embodiment), which corresponds to the log information in the above embodiment;
step S305: reading the error log and pushing the attribute information of the Azkaban task for the error log, such as the project name, flow name, job name, running code, running machine, project description, task start time, and task end time, into the MySQL database;
step S306: analyzing the error log to obtain first data;
specifically, the method comprises the following steps:
step S3061: filtering public information such as service starting, added date, task name parameters and the like in an error log by using a regular expression [ re (r '\ d + - \ d + - \ d +: d + CST { } INFO-'. format (job _ name), ", x) for x in logs ];
step S3062: the method is realized by Python based on TF-IDF algorithm optimization, the TF part calculates word frequency (the frequency of a certain word appearing in a log), the IDF part calculates inverse document frequency (log (total number of log documents/(number of logs containing the word, such as +1))), and TF-IDF (word frequency (TF)) is Inverse Document Frequency (IDF), so that words which appear frequently but have no meaning are filtered, keywords with lower word frequency are selected in a document library, and the first-step grabbing is realized. Combining parts of speech, increasing part of speech weight so as to more accurately capture the keywords;
step S3063: setting a first keyword to filter the first keyword in the error log, for example, filtering words such as java, python, hadoop, apache, hdfs, and the like;
step S3064: separating the text log according to a line break, and scanning [ x for x in logs if.match (r 'keyword:. +', x) ] in the error log by using a second keyword to acquire the line information of the second keyword;
Step S3065: calling a translation API to translate the acquired lines into Chinese, stored as a separate column. After this processing, multiple columns are formed, including the running code, the key error information and the Chinese version of the key error information; these columns are combined through the running code to obtain the first data.
Step S307: writing the first data into the MySQL database;
Step S308: labeling the error log by type according to the actual running environment, for example: external data source problems, data task dependency problems, data development problems, platform resource problems, service link problems, and the like;
Step S309: associating the task attribute information with the key information of the error log through the running code to form the complete data;
Step S310: sending the complete data to a charting tool;
Step S311: the charting tool reads the complete data, builds charts (a pie chart and a list) from it, and supports filtered statistics by date, developer, error type and the like, so that a particular task or class of tasks can be analyzed.
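Step S309 amounts to a join of two record sets on the running code. The sketch below shows that join in SQL, assuming Python's built-in sqlite3 with an in-memory database as a stand-in for the MySQL database; the table and column names (`task_attributes`, `error_info`, and the `developer` column used for per-developer statistics) are hypothetical, not taken from the text.

```python
import sqlite3

# Hypothetical schema: task attributes (step S305) and labeled error lines (steps S306-S308).
SCHEMA = """
CREATE TABLE task_attributes (
    running_code INTEGER PRIMARY KEY,
    project_name TEXT,
    flow_name    TEXT,
    job_name     TEXT,
    developer    TEXT
);
CREATE TABLE error_info (
    running_code INTEGER,
    error_line   TEXT,
    error_type   TEXT
);
"""

def open_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def complete_data(conn: sqlite3.Connection) -> list[tuple]:
    """Join task attributes with error information on the running code."""
    return conn.execute(
        """
        SELECT a.running_code, a.project_name, a.job_name, a.developer,
               e.error_type, e.error_line
        FROM task_attributes AS a
        JOIN error_info AS e ON e.running_code = a.running_code
        ORDER BY a.running_code
        """
    ).fetchall()
```

An inner join drops executions for which no error lines were extracted; a LEFT JOIN would keep them with NULL error columns if the charts should also show clean runs.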
Through the above steps, the error log of an Azkaban task is read and its attribute information, including the running code corresponding to the error log, is stored in the database; the attribute information of the error log is filtered through a preset algorithm and the first error log is obtained from the filtered result; the first error log is stored in the database, and a labeling operation in which a target object labels the type of the first error log according to the running environment of the error log is received. In other words, the first error log is obtained by filtering the attribute information of the error log, and its type is then labeled through the labeling operation of the target object.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
FIG. 4 is a structural block diagram of a computer log classification apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:
the reading module 42 is configured to read log information of a target task, and store attribute information of the log information in a database, where the attribute information includes: running codes corresponding to the log information;
the filtering module 44 is configured to filter the attribute information of the log information through a preset algorithm, and obtain first log information from a filtered result;
a receiving module 46, configured to store the first log information in the database, and receive a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information.
With the above modules, the log information of the target task is read and its attribute information, including the running code corresponding to the log information, is stored in the database; the attribute information of the log information is filtered through a preset algorithm and the first log information is obtained from the filtered result; the first log information is saved to the database, and a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information is received. This solves two problems in the related art: because the configured keywords are incomplete, some log information is missed, and the log information that is collected can only serve as a warning rather than guide developers in optimizing tasks. Filtering the attribute information of the log information through the preset algorithm to obtain the first log information increases the amount of log information captured, and labeling the type of the first log information through the labeling operation of the target object provides a basis for decisions in task analysis and optimization.
It should be noted that a model may further be trained on the labeling operations that the target object performs on the first log information according to the running environment of the log information, so that the computer terminal can itself label the type of the first log information through the trained model. The preset algorithm is the TF-IDF algorithm: the TF part computes the term frequency (the number of times a word appears in a log), the IDF part computes the inverse document frequency, log(total number of log documents / (number of logs containing the word + 1)), and TF-IDF = TF × IDF. Words that appear frequently but carry no meaning are thereby filtered out, and keywords with a lower frequency across the document library are selected, completing the first-pass extraction. Part of speech is also taken into account, with part-of-speech weights increased, so that keywords are captured more accurately.
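The TF-IDF scoring described above can be sketched in plain Python as follows. This is a minimal illustration, not the optimized implementation referred to in the text: TF is taken as the raw count of a word in one log, IDF as log(N / (df + 1)), the regex tokenizer is an assumption, and the part-of-speech weighting is omitted.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lower-cased word tokens; a deliberately simple stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())

def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Score each term of each log document as TF * IDF.

    TF  = number of times the term appears in the document
    IDF = log(N / (number of documents containing the term + 1))
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: n * math.log(n_docs / (doc_freq[term] + 1))
         for term, n in Counter(toks).items()}
        for toks in tokenized
    ]

def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring terms of each document, highest first."""
    return [sorted(s, key=s.get, reverse=True)[:k] for s in tf_idf(docs)]
```

Because of the +1 in the denominator, a word that occurs in every log (such as "error" in a corpus of error logs) receives a negative IDF and therefore sinks to the bottom of the ranking, which is the "frequent but meaningless" filtering effect described above.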
In an exemplary embodiment, the apparatus further includes a building module, configured to associate the attribute information with the first log information through the running code to obtain first data, and to construct a chart from the first data, where the chart indicates basic information of the first log information, and the basic information includes at least one of the following: the type of the first log information, the date of the first log information, and the developer corresponding to the first log information.
That is, after the first log information has been type-labeled through the labeling operation of the target object, the attribute information of the target task and the first log information are associated through the running code and combined into the first data, from which a chart (e.g. a pie chart and a list) is constructed. The chart supports filtered statistics by date, developer, error type and the like, for analyzing a particular task or class of tasks; an alarm signal is sent when the number of error reports of a given type exceeds a preset threshold, providing optimization guidance for developers; and a jump link in the chart can be clicked to view the first log information, so that the log information can be handled conveniently.
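The filtered statistics and the threshold alarm can be sketched over the complete data represented as plain dictionaries. The field names (`date`, `developer`, `error_type`) and the strict "greater than" comparison follow the text; the row layout and the function names are assumptions made for illustration.

```python
from collections import Counter

def count_by(rows: list[dict], field: str, **filters) -> Counter:
    """Count rows per value of `field`, keeping only rows that match every filter."""
    kept = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    return Counter(r[field] for r in kept)

def error_type_alarms(rows: list[dict], threshold: int) -> list[str]:
    """Error types reported more than `threshold` times, i.e. candidates for an alarm."""
    counts = count_by(rows, "error_type")
    return sorted(t for t, n in counts.items() if n > threshold)
```

For example, `count_by(rows, "error_type", developer="alice")` gives the per-type breakdown behind a pie chart for one developer, and `count_by(rows, "developer", date="2021-06-01")` the per-developer counts for one day.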
In an exemplary embodiment, the building module is further configured to: construct a connection pool for the target task and link to a configuration library through the connection pool, where the configuration library stores logs in a first preset format; screen the logs in a preset manner to obtain second log information, which consists of logs in the first preset format; and convert the second log information into log information in a second preset format.
Specifically, the logs of the target task are stored in the configuration library in the first preset format. A connection pool is established for the target task in Python and used to link to the configuration library of the target task, and the logs of the target task are screened in the preset manner to obtain the second log information, the preset manner being to select the logs whose status equals '70'; the selected logs are then converted from the first preset format into log information in the second preset format so that they can be read. Further, the attribute information of the target task, such as the project name, flow name, job name, running code, running machine, project description, task start time and task end time of the log information in the second preset format, is stored in the MySQL database.
For example, the logs of the target task are stored in the configuration library in gzip-compressed form, and the second log information is decompressed with gzip.
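A minimal sketch of this reading step, assuming a single sqlite3 in-memory connection as a stand-in for the pooled MySQL connection to the configuration library. The table `task_logs`, its columns and both function names are hypothetical; the status value 70 and the gzip-to-UTF-8 conversion come from the text (70 is, to our understanding, the FAILED status in Azkaban's status codes).

```python
import gzip
import sqlite3

STATUS_TO_READ = 70  # status value the text screens on

def make_demo_db(rows):
    """In-memory stand-in for the configuration library (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_logs (running_code INTEGER, job_name TEXT, status INTEGER, log BLOB)"
    )
    conn.executemany("INSERT INTO task_logs VALUES (?, ?, ?, ?)", rows)
    return conn

def read_second_log_information(conn):
    """Select logs whose status is 70 and convert them from gzip bytes to UTF-8 text."""
    rows = conn.execute(
        "SELECT running_code, job_name, log FROM task_logs "
        "WHERE status = ? ORDER BY running_code",
        (STATUS_TO_READ,),
    ).fetchall()
    # First preset format (gzip bytes) -> second preset format (UTF-8 text).
    return [
        (code, job, gzip.decompress(blob).decode("utf-8"))
        for code, job, blob in rows
    ]
```

Passing the status as a bound parameter rather than formatting it into the SQL string keeps the query safe if the screening value is later made configurable.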
In an exemplary embodiment, before the attribute information of the log information is filtered through the preset algorithm and the first log information is obtained from the filtered result, the filtering module is further configured to filter common information out of the attribute information of the log information to obtain third log information, where the common information includes at least one of the following: the service start parameter of the target task, the add-date parameter of the target task, and the name parameter of the target task.
In other words, the log information contains printed common information such as the service start parameter, the add-date parameter and the name parameter of the target task, and a preliminary pre-filtering pass removes it with a regular expression such as [re.sub(r'\d+-\d+-\d+ \d+:\d+:\d+ CST {} INFO - '.format(job_name), '', x) for x in logs]. This expression is only an example and does not limit the embodiments of the present invention.
In an exemplary embodiment, after the attribute information of the log information is filtered through the preset algorithm and the first log information is obtained from the filtered result, the filtering module is further configured to: set a first keyword, where the first keyword indicates a word of a referenced package in the attribute information of the log information; filter the first log information according to the first keyword; and determine fourth log information from the filtered first log information, where the fourth log information is a log that does not contain the first keyword.
That is, by setting first keywords, words naming referenced packages in the attribute information of the log information, such as java, python, hadoop, apache and hdfs, are filtered out.
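The pre-filtering pass can be sketched as below. The sketch assumes Azkaban job-log lines shaped like `04-06-2021 10:23:45 CST load_orders INFO - Starting job load_orders`; that layout and the helper name `strip_common_prefix` are assumptions. Only the INFO prefix for the given job is removed, so ERROR lines pass through unchanged.

```python
import re

def strip_common_prefix(lines: list[str], job_name: str) -> list[str]:
    """Remove the assumed '<date> <time> CST <job> INFO - ' prefix from each log line."""
    # The job name is escaped so characters such as '.' in it are matched literally.
    prefix = re.compile(
        r"\d+-\d+-\d+ \d+:\d+:\d+ CST {} INFO - ".format(re.escape(job_name))
    )
    return [prefix.sub("", line) for line in lines]
```

Compiling the pattern once per job, rather than calling `re.sub` with a string pattern inside the list comprehension, avoids rebuilding it for every line (Python's `re` cache softens this, but the compiled form also makes the intent explicit).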
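One possible reading of this filtering step, sketched under the assumption that a log line mentioning any first keyword (typically a stack-trace frame such as `at org.apache.hadoop...`) is dropped, so that what remains is the fourth log information containing none of the first keywords. The keyword tuple repeats the examples from the text; the function name is hypothetical.

```python
import re

# Example first keywords from the text: names of referenced packages.
FIRST_KEYWORDS = ("java", "python", "hadoop", "apache", "hdfs")

def drop_first_keyword_lines(lines: list[str], keywords=FIRST_KEYWORDS) -> list[str]:
    """Keep only the lines that contain none of the first keywords as whole words."""
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(map(re.escape, keywords)), re.IGNORECASE
    )
    return [line for line in lines if not pattern.search(line)]
```

The word boundaries `\b` let `org.apache.hadoop` match on the dotted segments while leaving words such as `javascript` untouched.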
In an exemplary embodiment, the apparatus further includes a translation module which, before the first log information is saved to the database, is configured to: acquire, through a second keyword, the lines of the log information that contain the second keyword; call a translation interface to translate the acquired lines; and form second data from the running code, the second keyword and the translated lines, the second data being stored in the database.
Specifically, the first log information is split on line breaks and scanned with the second keyword to obtain the lines containing the second keyword. Because the log information is essentially in English, a translation interface is called to translate these lines into Chinese, stored as a separate column. This yields multiple columns, including the running code, the second keyword and the translated lines, which are combined through the running code into the second data and stored in the database.
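The scan-and-translate step can be sketched as follows. The second keywords are not listed in the text, so `FAILED` and `Caused by` are hypothetical examples, and the translation interface is likewise unnamed: it is passed in as a plain callable `translate` so that any service can be plugged in (the test uses a fake). The match mirrors `re.match(r'keyword:.+', x)`, i.e. the keyword at the start of a line followed by a colon and text.

```python
import re
from typing import Callable

# Hypothetical second keywords; the text does not name the actual ones.
SECOND_KEYWORDS = ("FAILED", "Caused by")

def scan_second_keywords(log_text: str, keywords=SECOND_KEYWORDS) -> list[tuple[str, str]]:
    """Split the log on line breaks and return (keyword, line) for lines starting 'keyword:'."""
    hits = []
    for raw in log_text.splitlines():
        line = raw.strip()
        for kw in keywords:
            if re.match(re.escape(kw) + r":.+", line):
                hits.append((kw, line))
                break  # record each line once, under the first keyword it matches
    return hits

def build_second_data(running_code: int, log_text: str,
                      translate: Callable[[str], str]) -> list[dict]:
    """Rows of running code, second keyword, original line and translated line."""
    return [
        {"running_code": running_code, "keyword": kw,
         "line": line, "line_translated": translate(line)}
        for kw, line in scan_second_keywords(log_text)
    ]
```

Keeping the running code on every row is what later allows these rows to be joined back to the task attributes.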
An embodiment of the present invention further provides a storage medium including a stored program, wherein the program, when run, performs any one of the methods described above.
Optionally, in the present embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: reading the log information of the target task, and storing the attribute information of the log information in a database, wherein the attribute information includes the running code corresponding to the log information;
S2: filtering the attribute information of the log information through a preset algorithm, and acquiring first log information from the filtered result;
S3: saving the first log information to the database, and receiving a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1: reading the log information of the target task, and storing the attribute information of the log information in a database, wherein the attribute information includes the running code corresponding to the log information;
S2: filtering the attribute information of the log information through a preset algorithm, and acquiring first log information from the filtered result;
S3: saving the first log information to the database, and receiving a labeling operation in which a target object labels the type of the first log information according to the running environment of the log information.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; they are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for classifying computer logs, comprising:
reading log information of a target task, and storing attribute information of the log information to a database, wherein the attribute information comprises: running codes corresponding to the log information;
filtering the attribute information of the log information through a preset algorithm, and acquiring first log information from a filtered result;
and storing the first log information into the database, and receiving a labeling operation of performing type labeling on the first log information by a target object according to the running environment of the log information.
2. The method of classifying a computer log according to claim 1, further comprising:
associating the attribute information with the first log information through the running code to obtain first data;
constructing a chart according to the first data, wherein the chart is used for indicating basic information of the first log information, and the basic information at least comprises one of the following information: the type of the first log information, the date of the first log information, and the developer corresponding to the first log information.
3. The method of claim 1, wherein reading log information of a target task comprises:
constructing a connection pool for the target task, and linking a configuration library through the connection pool, wherein the configuration library is used for storing logs in a first preset format;
screening logs in a preset mode to obtain second log information, wherein the second log information is logs in a first preset format;
and converting the second log information into log information in a second preset format to obtain the log information in the second preset format.
4. The method for classifying computer logs according to claim 1, wherein before the attribute information of the log information is filtered through the preset algorithm and the first log information is obtained from the filtered result, the method comprises:
filtering common information out of the attribute information of the log information to obtain third log information, wherein the common information comprises at least one of the following: the service start parameter of the target task, the add-date parameter of the target task and the name parameter of the target task.
5. The method for classifying computer logs according to claim 1, wherein after the attribute information of the log information is filtered through the preset algorithm and the first log information is obtained from the filtered result, the method comprises:
setting a first keyword, wherein the first keyword is used for indicating a word of a reference package in the attribute information of the log information;
filtering the first log information according to the first keyword;
and determining fourth log information through the filtered first log information, wherein the fourth log information is used for indicating a log which does not contain the first keyword.
6. The method for classifying computer logs according to claim 1, wherein before the first log information is saved to the database, the method comprises:
acquiring line information of a second keyword in the log information through the second keyword;
calling a translation interface to translate the acquired line information;
and the running code, the second keyword and the translated line information form second data, wherein the second data is used for being stored in the database.
7. An apparatus for classifying computer logs, comprising:
the reading module is used for reading log information of a target task and storing attribute information of the log information to a database, wherein the attribute information comprises: running codes corresponding to the log information;
the filtering module is used for filtering the attribute information of the log information through a preset algorithm and acquiring first log information from a filtered result;
and the receiving module is used for storing the first log information to the database and receiving a labeling operation in which the target object labels the type of the first log information according to the running environment of the log information.
8. The apparatus for classifying computer logs according to claim 7, further comprising:
the building module is used for associating the attribute information with the first log information through the running code to obtain first data; constructing a chart according to the first data, wherein the chart is used for indicating basic information of the first log information, and the basic information at least comprises one of the following information: the type of the first log information, the date of the first log information, and the developer corresponding to the first log information.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
CN202110626955.8A 2021-06-04 2021-06-04 Computer log classification method and device, storage medium and electronic device Pending CN113282559A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626955.8A CN113282559A (en) 2021-06-04 2021-06-04 Computer log classification method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110626955.8A CN113282559A (en) 2021-06-04 2021-06-04 Computer log classification method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113282559A true CN113282559A (en) 2021-08-20

Family

ID=77283502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110626955.8A Pending CN113282559A (en) 2021-06-04 2021-06-04 Computer log classification method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113282559A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943668A (en) * 2017-12-15 2018-04-20 江苏神威云数据科技有限公司 Computer server cluster daily record monitoring method and monitor supervision platform
CN109213736A (en) * 2017-06-29 2019-01-15 阿里巴巴集团控股有限公司 The compression method and device of log
CN111198850A (en) * 2019-12-14 2020-05-26 深圳猛犸电动科技有限公司 Log message processing method and device and Internet of things platform
CN111813751A (en) * 2020-06-29 2020-10-23 平安科技(深圳)有限公司 Application system log data processing method, application system, device and medium
CN112422344A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Log abnormity warning method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820