CN116821067A - Method, device, equipment and medium for classifying application logs

Info

Publication number: CN116821067A
Application number: CN202310810269.5A
Authority: CN
Inventors: 刘为
Original assignee: Kangjian Information Technology Shenzhen Co Ltd
Current assignee: Kangjian Information Technology Shenzhen Co Ltd
Priority date: 2023-07-03
Filing date: 2023-07-03
Publication date: 2023-09-29

Abstract

The application provides a method for classifying application logs, applied to the field of digital healthcare, comprising: acquiring an application log to be classified from a preset data source; analyzing the format of the application log, determining the key values to be extracted from the application log, and extracting the key values by using a preset key value extraction technology; calculating the Jaccard similarity coefficient value between every two key values; and classifying the application log according to a preset classification rule and the Jaccard similarity coefficient value between every two key values. The application also provides a device for classifying application logs, an electronic device and a storage medium. In addition, the application relates to blockchain technology, and the application log is stored in a blockchain.

Description

Method, device, equipment and medium for classifying application logs

Technical Field

The application relates to the field of artificial intelligence and digital medical treatment, in particular to a classification method, a device, equipment and a medium of application logs.

Background

With the rapid development of the Internet, the quality of technology services keeps improving and user privacy is becoming increasingly important; to protect it, personal sensitive information in application logs must be desensitized. To make problems easier to locate, applications output logs containing business information (which may include user information) at critical operations. The points at which these critical operations output logs (i.e., log output points) are limited in number, but the application outputs a log every time a critical operation is performed. As programs run at large scale and high concurrency, logs are generated continuously and the log volume becomes very large (tens of millions of logs per day). To complete log desensitization efficiently, the log output points must be found within this large volume of logs. Classifying application logs into groups of the same type reveals the log output points, which helps developers locate them precisely and apply desensitization.

In the prior art, for application logs output by a server, the log parameters generated by the same log output point are basically consistent, while the specific data corresponding to those parameters differ; the log parameters generated by different log output points generally differ. To classify logs with different parameters, the parameters of each log usually have to be extracted first, and a parsing rule then has to be written for each kind of log before classification. The classification steps are therefore cumbersome and inefficient, and logs cannot be classified quickly. For example, in the medical field, a doctor needs to record log information such as the treatment plan of each patient. The log parameters output by the same output point are the same and may include user ID, medical history, symptoms and diagnosis; the parameter values are associated with the specific patient and typically differ. The log parameters output when prescribing may include user ID, department, diagnosis, medicine and dosage. The log parameters output at these two output points are different.

To solve the technical problems of cumbersome steps and low efficiency when classifying logs with different parameters in the prior art, a method for rapidly classifying application logs with different parameters is needed.

Disclosure of Invention

The application provides a classification method, device, equipment and medium of application logs, and mainly aims to solve the technical problems of complicated steps and low efficiency of classifying logs with different parameters in the prior art.

In order to achieve the above object, the present application provides a method for classifying application logs, the method comprising:

acquiring an application log to be classified from a preset data source;

analyzing the format of the application log, determining key values to be extracted from the application log, and extracting the key values of the application log by using a preset key value extraction technology;

calculating the Jaccard similarity coefficient value between every two key values;

and classifying the application log according to a preset classification rule and the Jaccard similarity coefficient value between every two key values.

Optionally, the acquiring the application log to be categorized from the preset data source includes:

determining index information of the application log to be classified in the preset data source;

and constructing a query statement according to the preset data source and the index information, and querying the index information by using the query statement to obtain the application log to be classified.

Optionally, after the application log to be categorized is obtained from the preset data source, the method further includes:

removing Chinese characters, numerals, symbols and character strings longer than a preset length from the application log;

identifying and filtering noise data of the application log.

Optionally, the formula for calculating the Jaccard similarity coefficient value between every two key values is:

J(A, B) = |A ∩ B| / |A ∪ B|

wherein A represents the key value of one application log among all application logs, B represents the key value of another application log among all application logs, and J(A, B) represents the Jaccard similarity coefficient value between the key values of the two logs.

Optionally, after the application logs are classified, when there is a newly added application log, the method includes:

extracting a key value of the newly added application log;

calculating the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any one application log among the application logs output by each class of output point;

if the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any application log in a class of application logs is greater than a preset threshold, classifying the two application logs corresponding to that Jaccard similarity coefficient value as the same class of application log.

Optionally, after the application logs are classified, when there is a newly added application log, the method further includes:

if the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any application log in every class of application logs is smaller than the preset threshold, taking the newly added application log as a new class of application log.

Optionally, the format of the application log comprises a text file, a JSON format file and an XML format file;

the preset key value extraction technology comprises a regular expression, a JSON analysis technology and an XML analysis technology.

In order to solve the above problems, the present application provides a classification device for application logs, the device comprising:

the acquisition module is used for acquiring application logs to be classified from a preset data source;

the extraction module is used for analyzing the format of the application log, determining key values to be extracted from the application log, and extracting the key values of the application log by using a preset key value extraction technology;

the calculation module is used for calculating the Jaccard similarity coefficient value between every two key values;

and the classifying module is used for classifying the application log according to a preset classification rule and the Jaccard similarity coefficient value between every two key values.

In order to solve the above-mentioned problems, the present application also provides an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of categorizing application logs according to any of claims 1 to 7.

In order to solve the above-mentioned problems, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for classifying application logs.

In the embodiment of the application, an application log to be classified is obtained from a preset data source, the format of the application log is analyzed, the key values to be extracted are determined, and the key values are extracted by using a preset key value extraction technology. Application logs with different parameters can thus be classified, the extraction step becomes more flexible, and the steps of extracting the parameters and parsing according to them are omitted. Calculating the Jaccard similarity coefficient value between every two key values enables quick classification of the application logs and improves classification efficiency. Classifying the application logs according to a preset classification rule and the Jaccard similarity coefficient value between every two key values also makes it more convenient to classify newly added application logs.

Drawings

FIG. 1 is a flowchart of a method for classifying application logs according to the present application;

FIG. 2 is a schematic diagram of a module provided by the present application;

FIG. 3 is a schematic diagram of an electronic device provided by the application;

The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The embodiment of the application can acquire and process related data based on artificial intelligence technology in the digital healthcare field. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological retrieval technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

The embodiment of the application provides a method for classifying application logs. The execution subject of the method includes, but is not limited to, at least one of a client, a mobile terminal and other devices that can be configured to execute the method provided by the embodiment of the application. The application (APP) may be a blockchain platform. Where the execution subject is a server, it includes, but is not limited to: a single server, a server cluster, a cloud server, or a cloud server cluster, etc.

In the embodiment of the application, the method for classifying application logs obtains the application logs from a preset data source; extracts the key values of the application logs by using a preset key value extraction technology; calculates the Jaccard similarity coefficient value between every two key values; and classifies application logs whose Jaccard similarity coefficient value is greater than a preset threshold as application logs output by the same output point.

The implementation principle of the present application is further described below with reference to fig. 1.

The method for classifying application logs provided by the application is described with reference to a flow chart shown in fig. 1, and comprises the following steps:

S1, acquiring an application log to be classified from a preset data source.

The preset data source may include ES (Elasticsearch), files, etc. In this embodiment, the preset data source is ES. ES is an open-source distributed search and analysis engine based on Lucene that can be used to rapidly store, search and analyze large amounts of unstructured data. Storing the application logs in ES enables fast search, easy management and in-depth analysis. An application log is a file containing a series of records generated while an application program is running.

In this embodiment, an application log to be categorized generated during the system operation is searched and obtained from a preset data source.

Specifically, the application log to be classified is obtained from the preset data source by a preset obtaining method. The application log may come from an application program, an operating system, network equipment and the like, and the preset obtaining method may be any one of an API (Application Programming Interface), a file system and a network protocol. A preprocessing operation is then performed on the application log to be classified to remove irrelevant information and filter noise.

In one embodiment, obtaining the application log to be classified from the preset data source includes:

determining index information of the application log to be classified in the preset data source;

and constructing a query statement according to the preset data source and the index information, and querying the index information by using the query statement to obtain the application log to be classified.

Specifically, in this embodiment, the preset data source is ES (Elasticsearch), the index information includes the name of the ES index in which the log is located, and the query statement is a statement that can be used to query ES. The index name of the application log to be classified is determined in ES, a query statement conforming to the ES query syntax is constructed according to ES and actual requirements, the index is queried with the query statement, and the application log to be classified is obtained from ES.

For example, in the medical field, ES is used to store and query various types of data, such as application logs generated while a patient is diagnosed in hospital and application logs generated while a patient is treated. To obtain the application logs related to a patient's treatment information from ES, first determine the name of the ES index in which the application logs to be queried are located (for example, logs generated during treatment may be stored in an ES index named hospital_logs), then design a query statement conforming to the ES query syntax, send the query request to ES, and obtain the application logs from the query result.
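The query construction described above might look like the following Python sketch. It only builds the request path and JSON body for the Elasticsearch `_search` endpoint and does not send anything. The function name `build_search_request`, the `message` field and the default page size are assumptions made for illustration; the source names only the hospital_logs index.

```python
import json


def build_search_request(index_name, keyword=None, size=1000):
    """Return (path, body) for an Elasticsearch _search request."""
    if keyword is None:
        # No filter: fetch every document in the index.
        query = {"match_all": {}}
    else:
        # "message" is a hypothetical field holding the raw log line.
        query = {"match": {"message": keyword}}
    body = {"query": query, "size": size}
    return f"/{index_name}/_search", json.dumps(body)


path, body = build_search_request("hospital_logs", keyword="treatment", size=500)
```

The returned path and body could then be sent with any HTTP client or passed to an Elasticsearch client library.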

In another embodiment, after the acquiring the application log from the preset data source, the method further includes:

filtering the application log through a regular expression to remove Chinese characters, numbers, symbols and character strings longer than a preset length from the application log;

identifying and filtering noise data of the application log.

Specifically, the application log contains much irrelevant information, such as Chinese characters, numbers, symbols, and character strings whose length exceeds a preset length (set to 3 in this embodiment); this information is removed first. Useless or erroneous information in the application log, such as null data and abnormal data, is then identified and filtered out.

In this embodiment, removing Chinese characters, numbers, symbols and character strings longer than the preset length from the application log reduces data noise, reduces the workload of the subsequent noise filtering, improves the efficiency of subsequent log analysis and lowers the processing cost.
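One possible reading of this filtering step is sketched below in Python. The function name `strip_noise`, the `max_len` parameter and the exact character classes are assumptions; the source does not give the regular expressions it uses. The embodiment sets the preset length to 3, while the example call uses a larger hypothetical value so that some tokens survive.

```python
import re

CHINESE = re.compile(r"[\u4e00-\u9fff]+")   # CJK Unified Ideographs
DIGITS = re.compile(r"\d+")
SYMBOLS = re.compile(r"[^A-Za-z\s]+")       # anything that is not an ASCII letter or whitespace


def strip_noise(line, max_len):
    """Drop Chinese characters, digits, symbols and over-long tokens from one log line."""
    text = CHINESE.sub(" ", line)
    text = DIGITS.sub(" ", text)
    text = SYMBOLS.sub(" ", text)
    # Keep only whitespace-separated tokens no longer than max_len (assumed interpretation).
    tokens = [tok for tok in text.split() if len(tok) <= max_len]
    return " ".join(tokens)


cleaned = strip_noise("用户登录 userId=12345 token=abcdefghijklmnopqrstuvwxyz", max_len=10)
```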

S2, analyzing the format of the application log, determining key values to be extracted from the application log, and extracting the key values of the application log by using a preset key value extraction technology.

Logs from the same log output point may differ in length, and the field output order of an application log is not fixed, but the key values of the application logs generated by the same log output point are basically the same.

A key value (key-value) is a data storage form consisting of a key and a value. In programming, key values are typically used to represent attributes or configuration information, or the state or data of an object, and a group of them can be regarded as a collection. For example, in a dictionary, each word can be regarded as a key and its definition as the corresponding value; in a configuration file, each attribute can be regarded as a key and its setting as the corresponding value. In log processing, key values may likewise represent key attributes of the log information. For example, a diagnosis record of a department may use "diagnosis record" as one key and "patient information" as another key, with the corresponding values indicating which patient was diagnosed, at what time, and in what way. By extracting and merging the key values of logs, operations such as automatic log desensitization, log merging and performance optimization can be performed.

That is, in the present embodiment, the key value of the application log can be understood as one set.

In this embodiment, the format and the storage location of the application log are determined, and a key value is extracted from the application log by a preset key value extraction technology, where the preset key value extraction technology includes a regular expression, character string matching, XML parsing technology and JSON parsing technology.

In one embodiment, the extracting the key value of the application log by using a preset key value extraction technology includes:

analyzing the format of the application log, and confirming the data type of the key value to be extracted;

and extracting the key value of the application log by adopting a key value extraction technology corresponding to the format of the application log according to the format of the application log and the data type of the key value.

Specifically, in this embodiment, the format of the application log and the data type of the key value to be extracted are determined first. The format of the application log includes a text file, a JSON format file, an XML format file and the like; the data type of the key value refers to the data types of the specified key and value, for example a character string or an integer. The application log is analyzed to determine the text that matches the key value, such as a timestamp. Based on that text and the data type, the key value extraction technology corresponding to the format of the application log is invoked: for example, when the application log is a JSON format file, its key values are extracted with a JSON parsing technology, and when the application log is an XML format file, its key values are extracted with an XML parsing technology. Once the extraction technology is determined, the key values are extracted from the storage location of the application log.

For example, in the medical field, the application logs generated while a patient is diagnosed are obtained and preprocessed; the format of each application log is analyzed and the data type of the key value to be extracted is determined; information about the patient's diagnosis, such as the diagnosis mode and mobile phone number, is retained; and the key value extraction technology corresponding to the format of the application log is invoked to extract the key values.
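The format-dependent extraction described above can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the patented implementation: the function name `extract_keys` is hypothetical, only top-level JSON keys and direct XML child tags are collected, and plain-text logs are assumed to use `key=value` pairs.

```python
import json
import re
import xml.etree.ElementTree as ET

# Assumed plain-text layout: key=value pairs separated by whitespace.
KEY_IN_TEXT = re.compile(r"(\w+)\s*=")


def extract_keys(raw):
    """Return the set of keys of one log record, choosing a parser by format."""
    raw = raw.strip()
    if raw.startswith("{"):
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return frozenset(parsed)          # top-level JSON keys
        except ValueError:
            pass                                  # not valid JSON, fall through
    if raw.startswith("<"):
        try:
            root = ET.fromstring(raw)
            return frozenset(child.tag for child in root)  # direct child tags
        except ET.ParseError:
            pass                                  # not valid XML, fall through
    return frozenset(KEY_IN_TEXT.findall(raw))    # plain-text fallback


json_keys = extract_keys('{"userId": 1, "diagnosis": "flu"}')
xml_keys = extract_keys("<log><userId>1</userId><drug>a</drug></log>")
text_keys = extract_keys("userId=1 dept=ent")
```

Returning a set of keys matches the description's view of the key value as a collection, which is what the later similarity step consumes.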

In another embodiment, after the extracting the key value of the application log by using a preset key value extracting technology, the method further includes:

performing data cleaning on the key value of the application log, wherein the data cleaning includes removing null values and repeated values from the application log.
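A minimal sketch of such a cleaning pass is shown below. The function name `clean_pairs` and the representation of a log as a list of (key, value) tuples are assumptions; values are assumed to be hashable, and "null" is taken to mean `None` or an empty string.

```python
def clean_pairs(pairs):
    """Drop pairs with null values and exact duplicate pairs, keeping the original order."""
    seen = set()
    cleaned = []
    for key, value in pairs:
        if value is None or value == "":
            continue                    # null value: skip
        if (key, value) in seen:
            continue                    # repeated value: skip
        seen.add((key, value))
        cleaned.append((key, value))
    return cleaned


result = clean_pairs([("userId", "42"), ("dept", None), ("userId", "42"), ("drug", "")])
```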

In the embodiment, by extracting the key value of the application log, the classification of the application log with different formats can be realized because the key value is not affected by parameters and the like, and the corresponding extraction technology is adopted according to the format of the application log, so that the flexibility of the extraction step is improved, and the step of unifying the formats is omitted; the extracted key values are cleaned, so that the data processing cost can be reduced, and the quality and reliability of the data can be improved.

S3, calculating the Jaccard similarity coefficient value between every two key values.

The Jaccard similarity coefficient can be used to compare the similarities and differences between finite sample sets; the greater the Jaccard similarity coefficient value, the higher the similarity of the samples.

In this embodiment, key values of all application logs are traversed, and the jaccard similarity coefficient between the key values of each two application logs is calculated respectively, so as to obtain a similarity coefficient calculation result between the key values of each two application logs.

The similarity coefficient between the key values of every two application logs is calculated by the following formula to obtain a similarity coefficient calculation result:

J(A, B) = |A ∩ B| / |A ∪ B|

It will be appreciated that the Jaccard similarity coefficient is the ratio of the size of the intersection of two sets (e.g., set A and set B) to the size of the union of the two sets. For example:

set A = {1, 2, 3, 4}, set B = {3, 4, 5, 6}; in this case J(A, B) = |{3, 4}| / |{1, 2, 3, 4, 5, 6}| = 2/6 = 1/3.

In one embodiment, the calculating the value of the jaccard similarity coefficient between the key values of any two application logs, to obtain a similarity coefficient calculation result, includes:

obtaining the key values of any two different application logs, and applying the Jaccard similarity coefficient formula to the key values of the two application logs to obtain a calculation result.

Specifically, the size of the intersection and the size of the union of the key values of the two application logs are calculated, and the size of the intersection is then divided by the size of the union to obtain the Jaccard similarity coefficient of the key values of the two application logs.
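The calculation just described is small enough to sketch directly in Python. The function name `jaccard` is hypothetical, and returning 1.0 when both sets are empty is an assumption, since the source does not define that case.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty key sets are treated as identical
    return len(a & b) / len(union)


score = jaccard({1, 2, 3, 4}, {3, 4, 5, 6})  # the example sets from the description
```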

In this embodiment, by comparing the jaccard similarity coefficient values of the key values between any two logs, the application logs can be quickly classified, the classifying efficiency is improved, and the method is not affected by the format of the application logs.
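Traversing all key values and scoring every pair could be sketched as below. The names `pairwise_jaccard` and `key_sets` are hypothetical, and the sample key sets are invented for illustration. The pairwise loop grows quadratically with the number of logs, so in practice identical key sets would probably be deduplicated first; that optimization is an assumption and is not stated in the source.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two key sets (1.0 for two empty sets, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(key_sets):
    """Map each index pair (i, j) with i < j to the similarity of key_sets[i] and key_sets[j]."""
    return {
        (i, j): jaccard(key_sets[i], key_sets[j])
        for i, j in combinations(range(len(key_sets)), 2)
    }


scores = pairwise_jaccard([
    {"userId", "diagnosis"},
    {"userId", "diagnosis"},
    {"userId", "drug", "dose"},
])
```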

S4, classifying the application log according to a preset classification rule and the Jaccard similarity coefficient value between every two key values.

In this embodiment, the preset classification rule is that two application logs whose Jaccard similarity coefficient value is greater than a preset threshold are classified into the same class. After the Jaccard similarity coefficient value between every two key values is obtained, the two application logs corresponding to a value greater than the preset threshold are classified as application logs output by the same output point. In this embodiment, the preset threshold is set to 0.9 (the specific value is determined by the actual situation).
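As a worked illustration of how strict a 0.9 threshold is, consider two hypothetical pairs of key sets (the sizes below are invented for the example and do not come from the source):

```latex
% Case 1: A has 10 keys; B has the same 10 keys plus 1 extra key.
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{10}{11} \approx 0.909 > 0.9

% Case 2: A and B each have 10 keys and share 9 of them.
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{9}{10 + 10 - 9} = \frac{9}{11} \approx 0.818 < 0.9
```

Under these assumptions, one extra key on an otherwise identical 10-key log still falls in the same class, while replacing one key with a different one does not.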

For example, if the Jaccard similarity coefficient value between the key value of application log 1 and the key value of application log 2 is greater than the preset threshold (0.9), application log 1 and application log 2 are considered to be application logs output by the same class of output point;

if the Jaccard similarity coefficient value between the key value of application log 1 and the key value of application log 3 is also greater than the preset threshold (0.9), application log 1, application log 2 and application log 3 are considered to be application logs output by the same class of output point;

if the Jaccard similarity coefficient value between the key value of application log 1 and the key value of application log 4 is smaller than the preset threshold (0.9), application log 1 and application log 4 are considered to be application logs output by different classes of output points.
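The grouping in this example can be sketched with a union-find structure, so that logs 1, 2 and 3 end up in one class because each is linked through log 1. Treating the relation as transitive in this way is an assumption drawn from the example; the function name `group_logs` and the sample key sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two key sets (1.0 for two empty sets, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_logs(key_sets, threshold=0.9):
    """Group log indices whose key sets are linked by a similarity above the threshold."""
    parent = list(range(len(key_sets)))

    def find(i):
        # Follow parent links to the class root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(key_sets)), 2):
        if jaccard(key_sets[i], key_sets[j]) > threshold:
            parent[find(j)] = find(i)   # merge the two classes

    groups = {}
    for i in range(len(key_sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


diagnosis = {"userId", "history", "symptom", "diagnosis"}
prescription = {"userId", "dept", "diagnosis", "drug", "dose"}
classes = group_logs([diagnosis, set(diagnosis), set(diagnosis), prescription])
```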

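In one embodiment, after the application logs are classified, when there is a newly added application log, the method includes:

extracting a key value of the newly added application log;

calculating the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any one application log among the application logs output by each class of output point;

if the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any application log in a class of application logs is greater than a preset threshold, classifying the two application logs corresponding to that Jaccard similarity coefficient value as the same class of application log.

In another embodiment, after the application logs are classified, when there is a newly added application log, the method further includes:

if the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any application log in every class of application logs is smaller than the preset threshold, taking the newly added application log as a new class of application log.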

For example, suppose application log 1 and application log 2 are application logs output by one class of output point, and application log 3 and application log 4 are application logs output by another class of output point. When application log 5 is newly added, its key value is extracted. The Jaccard similarity coefficient value between the key value of application log 5 and the key value of application log 1 or application log 2 is calculated to obtain calculation result 1, and the Jaccard similarity coefficient value between the key value of application log 5 and the key value of application log 3 or application log 4 is calculated to obtain calculation result 2. If calculation result 1 is greater than the preset threshold, application log 5 belongs to the same class as application log 1 and application log 2; if calculation result 2 is greater than the preset threshold, application log 5 belongs to the same class as application log 3 and application log 4; if both calculation result 1 and calculation result 2 are smaller than the preset threshold, application log 5 is an application log output by a new output point.
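The handling of a newly added log can be sketched as follows. The function name `assign` and the choice of the first member of each class as the log to compare against are assumptions; the source only says that any one log of the class is used. A similarity exactly equal to the threshold is not covered by the source and is treated here as "not the same class".

```python
def jaccard(a, b):
    """Jaccard similarity of two key sets (1.0 for two empty sets, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign(new_keys, classes, threshold=0.9):
    """Add new_keys to the first matching class, or open a new class; return the class index.

    classes is a list of non-empty lists of key sets and is modified in place.
    """
    for idx, members in enumerate(classes):
        representative = members[0]       # any one log of the class
        if jaccard(new_keys, representative) > threshold:
            members.append(new_keys)
            return idx
    classes.append([new_keys])            # no class matched: new output point
    return len(classes) - 1


diagnosis = frozenset({"userId", "history", "symptom", "diagnosis"})
prescription = frozenset({"userId", "dept", "diagnosis", "drug", "dose"})
existing = [[diagnosis, diagnosis], [prescription, prescription]]

same_class = assign(frozenset(diagnosis), existing)
new_class = assign(frozenset({"orderId", "amount"}), existing)
```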

In this embodiment, an application log to be classified is obtained from a preset data source, the format of the application log is analyzed, the key values to be extracted are determined, and the key values are extracted by using a preset key value extraction technology. Application logs with different parameters can thus be classified, the extraction step becomes more flexible, and the steps of extracting the parameters of the application logs and parsing according to them are omitted. Calculating the Jaccard similarity coefficient value between every two key values enables quick classification of the application logs and improves classification efficiency. Classifying the application logs according to a preset classification rule and the Jaccard similarity coefficient value between every two key values also makes it more convenient to classify newly added application logs.

Referring to fig. 2, a functional block diagram of a classification device 100 for application logs according to the present application is shown.

An obtaining module 110, configured to obtain an application log to be categorized from a preset data source;

In one embodiment, after the obtaining the application log to be categorized from the preset data source, the method further includes:

noise data of the application log is identified and filtered.

The extracting module 120 is configured to analyze a format of the application log, determine a key value to be extracted in the application log, and extract the key value of the application log by using a preset key value extracting technology;

in one embodiment, the format of the application log includes a text file, a JSON format file, an XML format file;

A calculation module 130, configured to calculate the Jaccard similarity coefficient value between every two key values;

in one embodiment, the formula for calculating the Jaccard similarity coefficient value between every two key values is:

J(A, B) = |A ∩ B| / |A ∪ B|

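The classifying module 140 is configured to classify application logs whose Jaccard similarity coefficient value is greater than the preset threshold as application logs output by the same output point.

In one embodiment, after the application logs are classified, when there is a newly added application log, the classifying module 140 is further configured for:

extracting a key value of the newly added application log;

calculating the Jaccard similarity coefficient value between the key value of the newly added application log and the key value of any one application log in each class of application logs, and classifying the newly added application log into a class when that value is greater than the preset threshold, or taking it as a new class of application log when the value is smaller than the preset threshold for every class.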
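How the four modules might fit together is sketched below as a single Python class. This is only an illustration: the class name `LogClassifier` and its fields are hypothetical, the modules are passed in as plain callables, and the classification is done in one greedy pass against one representative per class rather than over all pairs.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence


@dataclass
class LogClassifier:
    acquire: Callable[[], Sequence[str]]                            # acquisition module
    extract: Callable[[str], FrozenSet[str]]                        # extraction module
    similarity: Callable[[FrozenSet[str], FrozenSet[str]], float]   # calculation module
    threshold: float = 0.9

    def run(self) -> List[List[str]]:
        """Classifying module: group raw log lines by key-set similarity."""
        classes: List[List[str]] = []
        representatives: List[FrozenSet[str]] = []
        for line in self.acquire():
            keys = self.extract(line)
            for idx, rep in enumerate(representatives):
                if self.similarity(keys, rep) > self.threshold:
                    classes[idx].append(line)
                    break
            else:
                # No existing class matched: this line starts a new class.
                representatives.append(keys)
                classes.append([line])
        return classes


def demo_similarity(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


classifier = LogClassifier(
    acquire=lambda: ["userId=1 diagnosis=flu", "userId=2 diagnosis=cold", "userId=3 drug=x dose=1"],
    extract=lambda line: frozenset(part.split("=")[0] for part in line.split()),
    similarity=demo_similarity,
)
grouped = classifier.run()
```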

Referring to fig. 3, a schematic diagram of a preferred embodiment of an electronic device 1 according to the present application is shown.

The electronic device 1 includes, but is not limited to: memory 11, processor 12, display 13, and network interface 14. The electronic device 1 is connected to a network through the network interface 14 to obtain the original data. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a call network.

The memory 11 includes at least one type of readable medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or an internal memory of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card or the like, with which the electronic device 1 is equipped. Of course, the memory 11 may also comprise both an internal storage unit of the electronic device 1 and an external storage device. In this embodiment, the memory 11 is generally used for storing an operating system and various application software installed in the electronic device 1, such as the program code of the log categorizing program 10. Further, the memory 11 may be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is typically used for controlling the overall operation of the electronic device 1, e.g. performing control and processing related to data interaction or communication. In this embodiment, the processor 12 is configured to execute the program code stored in the memory 11 or to process data, for example to execute the program code of the log categorizing program 10.

The display 13 may also be referred to as a display screen or a display unit. In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display 13 is used for displaying information processed in the electronic device 1 and for displaying a visual work interface, for example displaying the results of data statistics.

The network interface 14 may optionally comprise a standard wired interface and a wireless interface (such as a Wi-Fi interface). The network interface 14 is typically used for establishing a communication connection between the electronic device 1 and other electronic devices.

Fig. 3 shows only the electronic device 1 with components 11-14 and the log categorizing program 10, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.

Optionally, the electronic device 1 may further comprise a user interface, which may comprise a display, an input unit such as a keyboard, a standard wired interface, and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.

The electronic device 1 may further comprise radio frequency (RF) circuits, sensors, audio circuits, and the like, which are not described in detail here.

In the above embodiment, the processor 12 may implement the following steps when executing the log categorizing program 10 stored in the memory 11:

acquiring an application log to be classified from a preset data source;

For a detailed description of the above steps, please refer to the flowchart in fig. 1 of an embodiment of the method for classifying application logs.
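As an illustration only, the classification steps summarised in the abstract (extracting key values, calculating the Jaccard similarity coefficient between every two key values, and classifying by a preset rule) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the log categorizing program 10 itself: the function names, the digit-masking tokeniser used as the "key value extraction technology", and the 0.8 threshold used as the "preset classification rule" are all hypothetical.

```python
import re
from itertools import combinations
from typing import Dict, List, Set


def extract_key_value(log_line: str) -> Set[str]:
    """Hypothetical key value extraction: mask digit runs, then split into a token set."""
    masked = re.sub(r"\d+", "<NUM>", log_line)
    return set(re.findall(r"[A-Za-z_<>]+", masked))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two token sets."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty key values count as identical
    return len(a & b) / len(union)


def classify(logs: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Group logs whose key values are pairwise linked by similarity >= threshold.

    The threshold stands in for the preset classification rule (an assumption).
    """
    keys = [extract_key_value(line) for line in logs]
    parent = list(range(len(logs)))  # union-find forest over log indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every two key values, as the method describes.
    for i, j in combinations(range(len(logs)), 2):
        if jaccard(keys[i], keys[j]) >= threshold:
            parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for i, line in enumerate(logs):
        groups.setdefault(find(i), []).append(line)
    return list(groups.values())
```

Under these assumptions, two logs emitted by the same log output point differ only in masked values such as user identifiers, so their key values coincide and they fall into the same class. The pairwise comparison is quadratic in the number of logs, so a practical system would probably deduplicate key values before comparing them.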

Furthermore, an embodiment of the application also provides a computer readable medium, which may be nonvolatile or volatile. The computer readable medium may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer readable medium includes a storage data area and a storage program area. The storage data area stores data created according to the use of the blockchain node, and the storage program area stores the log categorizing program 10, which, when executed by a processor, implements the following operations:

acquiring an application log to be classified from a preset data source;

The embodiment of the computer readable medium of the present application is substantially the same as the embodiment of the method for classifying application logs, and is not described again here.

In another embodiment, to further ensure the privacy and security of all the data involved in the method for classifying application logs, all the data may be stored in a node of a blockchain. For example, the feature dimensions and feature embeddings may all be stored in the blockchain node.

It should be noted that the blockchain referred to in the present application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralised database: a chain of data blocks linked to one another by cryptographic means, in which each data block contains a batch of network transaction information that is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
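The idea of data blocks linked by cryptographic means can be illustrated with a short Python sketch. It covers only the hash linking between blocks; the consensus mechanism, point-to-point transmission, and platform layers mentioned above are omitted. The block fields (`index`, `prev_hash`, `records`) and the function names are hypothetical and are not taken from the application.

```python
import hashlib
import json
from typing import Any, Dict, List


def block_hash(block: Dict[str, Any]) -> str:
    """SHA-256 digest of the block's canonical (key-sorted) JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], records: List[str]) -> None:
    """Append a block that stores the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # all-zero hash for the first block
    chain.append({"index": len(chain), "prev_hash": prev, "records": records})


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Check that every block still matches the hash recorded by its successor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

Because each block records the hash of the previous one, changing a stored log record in an earlier block makes `is_valid` return `False`, which is the anti-counterfeiting property described above.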

It should be noted that the foregoing serial numbers of the embodiments of the present application are merely for description and do not represent the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, apparatus, article, or method that comprises the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a medium as described above (e.g. ROM/RAM, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, an electronic device, a network device, etc.) to perform the method according to the embodiments of the present application.

The foregoing description of the preferred embodiments of the present application should not be taken as limiting the scope of the application, but rather should be understood to cover all modifications, equivalents, and alternatives falling within the scope of the application as defined by the appended claims.

Claims

1. A method for categorizing an application log, the method comprising:

acquiring an application log to be classified from a preset data source;

2. The method for classifying an application log according to claim 1, wherein acquiring the application log to be classified from the preset data source comprises:

3. The method for classifying an application log according to claim 1, further comprising, after acquiring the application log to be classified from the preset data source:

identifying and filtering noise data of the application log.

4. The method of claim 1, wherein the formula for calculating the value of the Jaccard similarity coefficient between every two key values comprises:

J(A, B) = |A ∩ B| / |A ∪ B|

wherein A represents the key value of one application log among all the application logs to be classified, B represents the key value of another application log among all the application logs to be classified, and J(A, B) represents the Jaccard similarity coefficient value between the key values of the two logs.

5. The method for classifying an application log according to claim 1, further comprising, after the application log is classified and when there is a newly added application log:

extracting a key value of the newly added application log;

6. The method for classifying an application log according to claim 5, wherein, after the application log is classified and when there is a newly added application log, the method further comprises:

7. The method for classifying an application log according to claim 1, wherein the format of the application log comprises a text file, a JSON format file, and an XML format file;

8. An apparatus for categorizing application logs, the apparatus comprising:

9. An electronic device, the electronic device comprising:

at least one processor; and,

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the categorization method of application logs according to any one of claims 1 to 7.