CN114296894A

CN114296894A - Call chain intelligent analysis method and device based on log data and electronic equipment

Info

Publication number: CN114296894A
Application number: CN202111602162.9A
Authority: CN
Inventors: 董明; 张海森; 钱燕
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2022-04-08

Abstract

The application provides a call chain intelligent analysis method and device based on log data and electronic equipment. In this embodiment, the intelligent analysis of the call chain based on the log data is realized by analyzing the log feature vector corresponding to the log data, and the call chain ID does not need to be pre-embedded. Further, in this embodiment, when the call chain intelligent analysis based on the log data is implemented, the call chain intelligent analysis is performed based on the log feature vector corresponding to the log data, and there are no mandatory requirements and restrictions on the log format.

Description

Call chain intelligent analysis method and device based on log data and electronic equipment

Technical Field

The present application relates to data processing technologies, and in particular, to a call chain intelligent analysis method and apparatus based on log data, and an electronic device.

Background

A call chain is a series of processes organized as a chain. Each call chain has a unique call chain identification (ID). When log data generated by running a method in a call chain is analyzed, the call chain is analyzed based on the call chain ID carried in the log data.

Call chain analysis therefore depends strongly on the call chain ID, which is usually pre-embedded during application development so that the call chain can be analyzed later. If a call chain ID is not pre-embedded, the corresponding call chain cannot be analyzed afterwards.

Disclosure of Invention

The application provides a call chain intelligent analysis method and device based on log data and electronic equipment, so that call chain intelligent analysis based on the log data is realized without pre-embedding a call chain ID.

The embodiment of the application provides a call chain intelligent analysis method based on log data, which is applied to electronic equipment and comprises the following steps:

obtaining original log data generated when a call chain is operated, and denoising the obtained original log data to obtain an effective log data set meeting data requirements;

determining a corresponding log feature vector for each valid log data in the valid log data set;

classifying the log feature vectors corresponding to the effective log data to obtain at least one type of log feature vectors;

and dividing the log feature vectors in each type of log feature vectors so as to divide the log feature data belonging to the same call chain into the same call chain.

The embodiment of the application provides a call chain intelligent analysis device based on log data, and the device is applied to electronic equipment and comprises:

the denoising unit is used for obtaining original log data generated when the call chain is operated and denoising the obtained original log data to obtain an effective log data set meeting the data requirement;

a determining unit, configured to determine a corresponding log feature vector for each valid log data in the valid log data set;

the classification unit is used for classifying the log feature vectors corresponding to the effective log data to obtain at least one type of log feature vectors;

and the dividing unit is used for dividing the log feature vectors in each type of log feature vectors so as to divide the log feature data belonging to the same call chain into the same call chain.

The embodiment of the application also provides the electronic equipment. The electronic device includes: a processor and a machine-readable storage medium;

the machine-readable storage medium stores machine-executable instructions executable by the processor;

the processor is configured to execute machine-executable instructions to implement the steps of the above-disclosed method.

According to the above technical solution, in this embodiment, the intelligent analysis of the call chain based on the log data is realized by analyzing the log feature vector corresponding to the log data, and the call chain ID does not need to be pre-embedded.

Further, in this embodiment, when the call chain intelligent analysis based on the log data is implemented, the call chain intelligent analysis is performed based on the log feature vector corresponding to the log data, and there are no mandatory requirements and restrictions on the log format.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow chart of a method provided by an embodiment of the present application;

FIG. 2 is a flowchart of an implementation of step 102 provided by an embodiment of the present application;

FIG. 3 is a flowchart of an implementation of step 103 provided by an embodiment of the present application;

FIG. 4 is a flowchart of an implementation of step 104 provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of an application of a method provided by an embodiment of the present application;

FIG. 6 is a block diagram of an apparatus according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

Referring to fig. 1, fig. 1 is a flowchart of a method provided in an embodiment of the present application. The flow is applied to the electronic equipment. Optionally, in this embodiment, the electronic device may be a front-end device such as a terminal, and may also be a background server, and the embodiment is not particularly limited.

As shown in fig. 1, the process may include the following steps:

Step 101, obtaining original log data generated when a call chain is operated, and denoising the obtained original log data to obtain an effective log data set meeting data requirements.

As described above, when the call chain is run, corresponding log data (which may be referred to as original log data) is generated, and this is the original log data obtained in step 101. Optionally, in this embodiment, in step 101, the original log data generated when the call chain is run within a preset time window may be obtained.

After the original log data is obtained, as described in step 101, the original log data is denoised to obtain an effective log data set meeting the data requirement.

Optionally, in this embodiment, denoising the obtained raw log data may include: for each piece of original log data, relevant word data related to a log template is identified from the original log data based on a trained log template model, and the identified relevant word data is deleted from the original log data.

In one example, the log template model is generated by extracting word vectors from historical log data according to special characters and training on the common words of the log template. The log template model therefore contains a set of common words (also called associated words) related to the log template. On this basis, identifying the associated word data related to the log template from the original log data based on the trained log template model may include: identifying, from the original log data, the data that exists in this set (that is, the associated word data related to the log template).

By denoising the obtained original log data, the problem of high similarity caused by the fact that different original log data carry the same log template associated words can be solved, and the accuracy of the follow-up call chain intelligent analysis based on the log data is improved.
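The denoising idea can be sketched as follows. This is a minimal illustration, not the trained log template model itself: the vocabulary `TEMPLATE_WORDS`, the separator characters, and the function name `denoise` are all hypothetical stand-ins for whatever the model learns from historical log data.

```python
import re

# Hypothetical template vocabulary. In the described method this set would
# come from a log template model trained on historical log data.
TEMPLATE_WORDS = {"user", "login", "from", "ip"}


def denoise(raw_line, template_words=TEMPLATE_WORDS):
    """Drop tokens that belong to the log template and keep the rest."""
    # Split on whitespace and a few common separator characters (assumed).
    tokens = [t for t in re.split(r"[\s=:,\[\]]+", raw_line) if t]
    return [t for t in tokens if t.lower() not in template_words]


print(denoise("user=alice login from ip=10.0.0.7"))  # ['alice', '10.0.0.7']
print(denoise("user=bob login from ip=10.0.0.9"))    # ['bob', '10.0.0.9']
```

Before denoising, the two lines share most of their tokens and look nearly identical. After denoising, only the variable parts remain, which is what the later feature extraction works on.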

Step 102, determining a corresponding log feature vector for each valid log data in the valid log data set.

Optionally, in this embodiment, valid log data refers to data that meets specific log data requirements, which can be set according to the actual situation. Data that does not meet these requirements, such as a single letter, a punctuation mark, or a number, is not considered in the embodiments of the present application.

There are many ways to determine the corresponding log feature vector for each valid log data. Fig. 2 illustrates one implementation manner, which is described later.

Step 103, classifying the log feature vectors corresponding to the effective log data to obtain at least one type of log feature vectors.

Optionally, in this embodiment, the classification of the log feature vectors corresponding to each valid log data may be implemented by an improved version of the conventional mean shift clustering algorithm. Fig. 3 illustrates one implementation manner, which is described later.

Step 104, dividing the log feature vectors in each type of log feature vectors so as to divide the log feature data belonging to the same call chain into the same call chain.

Optionally, in this embodiment, the data in the same call chain may be divided into the same call chain by comparing feature values in each dimension in the log feature vector. Fig. 4 below illustrates one implementation manner, and details are not described here.

Thus, the flow shown in fig. 1 is completed.

As can be seen from the flow shown in fig. 1, in this embodiment, the intelligent analysis of the call chain based on the log data is implemented by analyzing the log feature vector corresponding to the log data, and the call chain ID does not need to be pre-embedded.

The flow shown in fig. 2 is described below:

Referring to fig. 2, fig. 2 is a flowchart of an implementation of step 102 provided in an embodiment of the present application. As shown in fig. 2, the process may include the following steps:

Step 201, for each valid log data, perform the following steps 202 to 204.

Step 202, determining the word feature vector corresponding to the valid word in the valid log data, and determining the word category to which the valid word belongs based on the word feature vector corresponding to the valid word.

Optionally, in this embodiment, determining the word feature vector corresponding to the valid word in the valid log data may include: for each valid log data, converting the character string of each valid word in the valid log data into a byte array using string-to-byte-array conversion, determining the feature value in at least one dimension corresponding to the byte array, and combining the determined feature values into the word feature vector corresponding to the word.

Optionally, in this embodiment, at least one dimension corresponding to the byte array may include: array length (length), mean (mean), variance (var), first order autocorrelation (acf1), number of intersections of curve and mean line (cross), strength of trend term (trand), linearity of trend term (linearity), and curvature of trend term (curve).

Correspondingly, in this embodiment, the feature value in at least one dimension corresponding to the byte array may include: the length of the byte array, and/or the mean and/or the variance and/or the first-order autocorrelation of all values in the byte array, and/or the number of intersections of the byte array curve with its mean line, and/or the strength and/or the linearity and/or the curvature of the trend term of the byte array.
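As an illustration, several of these feature values can be computed from a word's bytes as sketched below. The sketch assumes UTF-8 encoding, population variance, and the usual sample definition of lag-1 autocorrelation, none of which the text specifies. The three trend-term features are left out because they depend on a trend decomposition that is not described here. The function name `word_features` is hypothetical.

```python
def word_features(word):
    """Return (length, mean, var, acf1, cross) for the UTF-8 bytes of a word."""
    values = list(word.encode("utf-8"))
    n = len(values)
    if n == 0:
        raise ValueError("word must not be empty")
    mean = sum(values) / n
    # Population variance of the byte values.
    var = sum((v - mean) ** 2 for v in values) / n
    # Lag-1 autocorrelation; defined as 0.0 here when the variance is zero.
    if var == 0:
        acf1 = 0.0
    else:
        lagged = sum(
            (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
        )
        acf1 = lagged / (n * var)
    # Count how often consecutive values switch sides of the mean line.
    above = [v > mean for v in values]
    cross = sum(1 for a, b in zip(above, above[1:]) if a != b)
    return (n, mean, var, acf1, cross)


print(word_features("abab"))  # (4, 97.5, 0.25, -0.75, 3)
print(word_features("aaa"))   # (3, 97.0, 0.0, 0.0, 0)
```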

Optionally, in this embodiment, the determining, in this step 202, the word category to which the valid word belongs based on the word feature vector corresponding to the valid word may include:

inputting the word feature vector corresponding to each valid word in the valid log data into the trained K-Means model to obtain the word category to which the valid word belongs.

In this embodiment, the K-Means model is obtained by training based on the word feature vectors corresponding to the valid words in the history log data.
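The training and lookup described here can be sketched with a small K-Means routine. This is a minimal pure-Python version for illustration only: it assumes a deterministic initialization from the first `k` points and a fixed number of iterations, and the names `train_kmeans` and `nearest` are hypothetical. A production system would more likely use an existing library implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point (the word category)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def train_kmeans(points, k, iterations=20):
    """Fit k centroids with Lloyd's algorithm on historical feature vectors."""
    # Assumed initialization: the first k points serve as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


# Toy "historical" word feature vectors forming two obvious groups.
history = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
model = train_kmeans(history, k=2)
print(nearest((1.1, 1.0), model), nearest((8.1, 8.0), model))
```

The category of a new word is simply the index of the nearest centroid, so two words with similar byte-level statistics end up in the same category.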

Step 203, the effective words in the effective log data are subjected to specified standardization processing to obtain standard word data.

It should be noted that, in this embodiment, step 202 and step 203 do not have a fixed time sequence.

Optionally, in this embodiment, the specified standardization processing performed on the valid words in the valid log data may be, for example, converting the character string of each valid word into a value of the Double type; this embodiment is not particularly limited in this respect.

Step 204, generating a log feature vector corresponding to the effective log data according to the standard word data under each word category.

Optionally, in this embodiment, if the K-Means model supports N feature dimensions, each feature dimension corresponds to one word category. On this basis, in step 204, for each word category, the feature dimension corresponding to the word category is found among the N feature dimensions, and the standard word data under the word category is used, in order, as the feature value in that feature dimension. After this is done for every word category, the feature values in the N feature dimensions are combined according to the order of the N feature dimensions to form the log feature vector. That is, in this embodiment, the log feature vector includes the standard word data under each word category, the feature dimensions of the log feature vector are the word categories, and the feature value in each feature dimension is the standard word data under the word category corresponding to that feature dimension.
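The assembly of a log feature vector can be sketched as below. The text does not say how a non-numeric word is converted to a Double, so `to_double` is a hypothetical stand-in that parses numbers directly and otherwise falls back to a CRC32 checksum of the word's bytes. The `categorize` argument stands in for the trained K-Means lookup, and the toy rule used in the usage example is likewise an assumption.

```python
import zlib


def to_double(word):
    """Hypothetical standardization: parse numbers, otherwise hash the bytes."""
    try:
        return float(word)
    except ValueError:
        return float(zlib.crc32(word.encode("utf-8")))


def build_log_vector(words, categorize, n_categories):
    """Place each word's standard value into the dimension of its category."""
    dims = [[] for _ in range(n_categories)]
    for w in words:
        # Words are appended in order, so each dimension keeps log order.
        dims[categorize(w)].append(to_double(w))
    return tuple(tuple(d) for d in dims)


def toy_categorize(word):
    """Assumed stand-in for the K-Means model: 0 for numbers, 1 otherwise."""
    return 0 if word.replace(".", "", 1).isdigit() else 1


vector = build_log_vector(["42", "alice", "7.5"], toy_categorize, n_categories=2)
print(vector[0])  # (42.0, 7.5)
```

Each dimension holds the standard word data of one word category, so logs with differently ordered fields still produce comparable vectors.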

Thus, the flow shown in fig. 2 is completed.

The flow shown in fig. 2 thus implements determining the corresponding log feature vector for each valid log data in the valid log data set. It should be noted that fig. 2 is only one specific embodiment, and is not intended to be limiting.

The flow shown in fig. 3 is described below:

Referring to fig. 3, fig. 3 is a flowchart of an implementation of step 103 provided in an embodiment of the present application. As shown in fig. 3, the process may include the following steps:

Step 301, one of the log feature vectors is selected as a central feature vector.

Optionally, in this embodiment, one log feature vector may be randomly selected from the to-be-processed log feature vectors as the central feature vector. Initially, the to-be-processed log feature vectors may be all the log feature vectors obtained by the flow shown in fig. 2. In later passes, the to-be-processed log feature vectors are determined as described in step 304 below.

Step 302, selecting candidate log feature vectors meeting the conditions from the log feature vectors corresponding to the effective log data, and determining the selected candidate log feature vectors meeting the conditions as a candidate log feature vector set; the condition means that the distance between each candidate log feature vector in the candidate log feature vector set and the center feature vector is smaller than or equal to a preset distance.

In this embodiment, the preset distance may be set according to actual requirements, and the embodiment of the present application is not particularly limited.

Step 303, controlling the central feature vector to move a target value along the direction of a target offset vector, where the target offset vector is the sum of offset vectors of each candidate log feature vector in the candidate log feature vector set relative to the central feature vector, and the target value is a modulus of the target offset vector.

Step 304, checking whether the target value is greater than or equal to a set offset threshold. If not, return to step 303. If so, record the position of the central feature vector, classify the central feature vector and the other log feature vectors whose distance from the central feature vector is less than or equal to the preset distance into the same class of log feature vectors, determine the log feature vectors that have not yet been classified as the to-be-processed log feature vectors, and return to step 301 for the to-be-processed log feature vectors, until all log feature vectors are classified.

In this embodiment, the offset threshold may be set according to actual requirements, and the embodiment of the present application is not particularly limited.
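A mean-shift-style classification loosely following steps 301 to 304 can be sketched as below. The sketch makes assumptions that depart from the literal wording: the center moves by the mean of the offset vectors (as in conventional mean shift) rather than by their raw sum, iteration stops once the shift magnitude falls below the threshold, candidates are re-selected on every pass, and the first pending vector is used as the initial center instead of a random one so the result is reproducible. The names `classify`, `radius`, and `threshold` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(vectors, radius, threshold, max_iter=100):
    """Group vectors into classes around iteratively shifted centers."""
    pending = list(vectors)
    classes = []
    while pending:
        center = pending[0]  # assumed deterministic choice of the initial center
        for _ in range(max_iter):
            candidates = [v for v in pending if dist(v, center) <= radius]
            if not candidates:
                break
            # Mean offset of the candidates relative to the current center.
            shift = tuple(
                sum(v[i] - center[i] for v in candidates) / len(candidates)
                for i in range(len(center))
            )
            center = tuple(c + s for c, s in zip(center, shift))
            # Assumed stopping rule: the center has effectively stopped moving.
            if math.sqrt(sum(s * s for s in shift)) < threshold:
                break
        members = [v for v in pending if dist(v, center) <= radius]
        if not members:
            members = [pending[0]]  # guard so every pass classifies something
        classes.append(members)
        pending = [v for v in pending if v not in members]
    return classes


data = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (10.5, 10.0)]
for group in classify(data, radius=2.0, threshold=1e-3):
    print(group)
```

Unlike K-Means, this procedure does not need the number of classes in advance; the preset distance and the offset threshold control how many classes emerge.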

The flow shown in fig. 3 is completed.

The flow shown in fig. 3 thus implements classifying the log feature vectors corresponding to the valid log data. It should be noted that fig. 3 is only one specific embodiment, and is not intended to be limiting.

The flow shown in fig. 4 is described below:

Referring to fig. 4, fig. 4 is a flowchart of an implementation of step 104 provided in an embodiment of the present application. As shown in fig. 4, the process may include the following steps:

Step 401, for each log feature vector in each class of log feature vectors, the following step 402 is executed.

Step 402, for each feature dimension in the log feature vector, classifying feature values meeting similar conditions in the feature dimension into the same call chain.

Optionally, in this embodiment, the feature values that are the same in each feature dimension may be grouped into the same call chain.

Through the flow shown in fig. 4, the attributed call chain can be finally found for all feature values in all feature dimensions.
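One way to read step 402, with equality as the similarity condition, is sketched below: within one class, logs that carry the same feature value in the same feature dimension are grouped together. The sketch assumes each log feature vector is a flat tuple with one value per dimension, and it keeps only values shared by at least two logs; both choices, along with the name `group_by_shared_values`, are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_shared_values(class_vectors):
    """Map (dimension, value) to the indices of the logs carrying that value."""
    groups = defaultdict(list)
    for idx, vector in enumerate(class_vectors):
        for dim, value in enumerate(vector):
            groups[(dim, value)].append(idx)
    # Assumed filter: a value links logs only if at least two logs share it.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


one_class = [(1.0, 7.0), (1.0, 8.0), (2.0, 7.0)]
print(group_by_shared_values(one_class))
# {(0, 1.0): [0, 1], (1, 7.0): [0, 2]}
```

In this toy class, logs 0 and 1 share a value in dimension 0 and logs 0 and 2 share a value in dimension 1, so each shared value marks a candidate call chain.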

The flow shown in fig. 4 is completed.

It should be noted that, in this embodiment, all the models involved may be unsupervised learning models, which require no manual labeling, thereby saving manual workload.

The method provided by the embodiment of the application is described above. In order to make the above method easier to understand from a global perspective, fig. 5 illustrates a schematic method provided by an embodiment of the present application.

The following describes the apparatus provided in the embodiments of the present application:

Referring to fig. 6, fig. 6 is a structural diagram of an apparatus according to an embodiment of the present disclosure. The device is applied to the electronic equipment, and comprises:

Optionally, the denoising unit denoises the obtained raw log data, including:

for each piece of original log data, relevant word data related to a log template is identified from the original log data based on a trained log template model, and the identified relevant word data is deleted from the original log data.

Optionally, the determining, by the determining unit, a corresponding log feature vector for each valid log data in the valid log data set includes:

for each valid log data, performing the steps of:

determining a word feature vector corresponding to an effective word in the effective log data, and determining a word category to which the effective word belongs based on the word feature vector corresponding to the effective word;

carrying out specified standardization processing on the effective words in the effective log data to obtain standard word data; generating a log feature vector corresponding to the effective log data according to the standard word data under each word category, wherein the log feature vector comprises the standard word data under each word category, the feature dimension of the log feature vector is each word category, and the feature value on the feature dimension in the log feature vector is the standard word data under the word category corresponding to the feature dimension.

Optionally, determining the word feature vector corresponding to the valid word in the valid log data includes:

for each valid log data, converting the character string of each valid word in the valid log data into a byte array using string-to-byte-array conversion, determining the feature value in at least one dimension corresponding to the byte array, and combining the determined feature values into the word feature vector corresponding to the word;

wherein the characteristic values in the at least one dimension at least comprise: the length of the byte array, and/or the mean and/or the variance var and/or the first-order autocorrelation acf1 of all the values in the byte array, and/or the number of intersections of the curve with the mean line, and/or the intensity trand and/or the linearity and/or the curvature of the trend term.

Optionally, the determining, based on the word feature vector corresponding to the valid word, a word class to which the valid word belongs includes:

inputting the word feature vector corresponding to each effective word in the effective log data into the trained K-Means model to obtain the word category of the effective word;

the K-Means model is obtained by training based on word feature vectors corresponding to effective words in historical log data.

Optionally, the classifying the log feature vectors corresponding to the valid log data by the classifying unit includes:

selecting one log feature vector as a central feature vector; selecting candidate log characteristic vectors meeting the conditions from the log characteristic vectors corresponding to the effective log data, and determining the selected candidate log characteristic vectors meeting the conditions as a candidate log characteristic vector set; the condition is that the distance between each candidate log feature vector in the candidate log feature vector set and the center feature vector is smaller than or equal to a preset distance;

controlling the central feature vector to move a target value along the direction of a target offset vector, wherein the target offset vector is the sum of the offset vectors of each candidate log feature vector in the candidate log feature vector set relative to the central feature vector, and the target value is the modulus of the target offset vector;

checking whether the target value is greater than or equal to a set offset threshold; if not, returning to the step of selecting candidate log feature vectors meeting the condition from the log feature vectors corresponding to the valid log data; if so, recording the position of the central feature vector, classifying the central feature vector and the other log feature vectors whose distance from the central feature vector is less than or equal to the preset distance into the same class of log feature vectors, and returning, for the unclassified log feature vectors, to the step of selecting one log feature vector as the central feature vector, until all log feature vectors are classified.

Optionally, the dividing, by the dividing unit, the dividing the log feature vector in each type of log feature vector includes:

for each log feature vector in each log feature vector type, performing the following steps:

and for each feature dimension in the log feature vector, classifying feature values meeting similar conditions on the feature dimension into the same call chain.

Thus, the description of the structure of the device shown in fig. 6 is completed.

The embodiment of the application also provides a hardware structure of the device shown in fig. 6. Referring to fig. 7, fig. 7 is a structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 7, the hardware structure may include: a processor and a machine-readable storage medium having stored thereon machine-executable instructions executable by the processor; the processor is configured to execute machine-executable instructions to implement the methods disclosed in the above examples of the present application.

Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored, and when the computer instructions are executed by a processor, the method disclosed in the above example of the present application can be implemented.

The machine-readable storage medium may be, for example, any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, when implementing the present application, the functionality of the units may be implemented in one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A call chain intelligent analysis method based on log data, wherein the method is applied to an electronic device and comprises the following steps:

2. The method of claim 1, wherein denoising the obtained raw log data comprises:

3. The method of claim 1, wherein determining a corresponding log feature vector for each valid log data in the valid set of log data comprises:

for each valid log data, performing the steps of:

4. The method of claim 3, wherein determining the word feature vector corresponding to the valid word in the valid log data comprises:

5. The method of claim 3, wherein determining the word class to which the valid word belongs based on the word feature vector corresponding to the valid word comprises:

6. The method of claim 1, wherein the classifying the log feature vector corresponding to each valid log data comprises:

7. The method of claim 1, wherein the dividing log feature vectors of each class of log feature vectors comprises:

8. A call chain intelligent analysis device based on log data, wherein the device is applied to an electronic device and comprises:

9. The apparatus of claim 8, wherein the determining unit determining a corresponding log feature vector for each valid log data in the valid log data set comprises:

for each valid log data, performing the steps of:

10. An electronic device, comprising: a processor and a machine-readable storage medium;

the processor is configured to execute machine executable instructions to implement the method steps of any of claims 1-7.