CN118277141A - Abnormality identification network generation method, abnormality identification device and electronic equipment - Google Patents

Abnormality identification network generation method, abnormality identification device and electronic equipment

Info

Publication number
CN118277141A
Authority
CN
China
Prior art keywords
log data
target
service
sample log
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410236617.7A
Other languages
Chinese (zh)
Inventor
刘成穆
初永光
牛白兰
余欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianlian Hangzhou Information Technology Co ltd
Original Assignee
Lianlian Hangzhou Information Technology Co ltd
Filing date
Publication date
Application filed by Lianlian Hangzhou Information Technology Co ltd

Abstract

The invention discloses an anomaly identification network generation method, an anomaly identification apparatus, and an electronic device. The method comprises: performing feature extraction on each sample log data in the sample log data sets of a plurality of target services to obtain initial feature vectors, wherein each target service corresponds to one sample log data set and each sample log data set corresponds to one piece of call chain identification information; generating a target feature vector based on the initial feature vector and the time information of each sample log data, and then clustering the sample log data in each sample log data set to obtain a clustering result, wherein the clustering result indicates whether each sample log data is normal sample data or abnormal sample data; and training a preset anomaly identification network based on each sample log data and the corresponding clustering result to obtain a target anomaly identification network. With the embodiments of the invention, abnormal services can be identified automatically from service log data, and identification accuracy and efficiency are improved.

Description

Abnormality identification network generation method, abnormality identification device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for generating an anomaly identification network, and an electronic device.
Background
Log data is an important basis for recording user behavior, troubleshooting faults, monitoring, and capturing error information while a service system is running. A service system produces massive amounts of log data, and with the rapid development of Internet technology, service systems widely use distributed clusters, so identifying problematic services within massive data remains a major challenge.
In the prior art, abnormal logs are usually identified manually: different monitoring points must be designed by hand according to the call paths of different services, or other auxiliary information is used to judge whether a service is abnormal. However, this approach makes abnormal service identification highly subjective, which reduces the accuracy and reliability of abnormal service determination, increases labor cost, and lowers the efficiency of abnormal service identification.
Disclosure of Invention
In view of the problems in the prior art, the invention discloses an anomaly identification network generation method, an anomaly identification apparatus, and an electronic device, which can automatically identify abnormal services from the log data of the services and improve the accuracy and efficiency of abnormal service identification. The technical scheme disclosed by the invention is as follows:
According to an aspect of the disclosed embodiments of the present invention, there is provided an anomaly identification network generation method, including:
Acquiring a sample log data set of at least one target service based on call chain identification information, wherein each target service corresponds to one sample log data set, each sample log data set corresponds to one piece of call chain identification information, and each piece of call chain identification information identifies the call link of the corresponding target service; each sample log data set includes at least one sample log data, and each sample log data includes time information;
Performing feature extraction on each sample log data to obtain an initial feature vector corresponding to each sample log data;
Generating a target feature vector corresponding to each sample log data based on the initial feature vector and the time information;
Clustering the sample log data in each sample log data set based on the target feature vector to obtain a clustering result corresponding to each sample log data, wherein the clustering result indicates whether each sample log data is normal sample data or abnormal sample data;
Training a preset anomaly identification network based on each sample log data and the corresponding clustering result to obtain a target anomaly identification network.
Optionally, each target service includes at least one sub-service, the time information includes first time information and second time information, and generating, based on the initial feature vector and the time information in each sample log data, a target feature vector corresponding to each sample log data includes:
determining duration information of the target service corresponding to each sample log data based on the first time information;
determining time interval information between sub-services corresponding to each sample log data based on the second time information;
generating the target feature vector based on the initial feature vector, the duration information and the time interval information.
Optionally, the acquiring the sample log data set of the at least one target service based on the call chain identification information includes:
Acquiring sample log data corresponding to the at least one target service, wherein each sample log data comprises call chain identification information;
Grouping the sample log data corresponding to the at least one target service based on each piece of call chain identification information to obtain a sample log data set of each target service.
Optionally, performing feature extraction on each sample log data to obtain the initial feature vector corresponding to each sample log data includes:
Acquiring service type information and call interface information corresponding to each target service;
Generating target keyword information of each target service based on the service type information, the call interface information and preset keyword information;
Vectorizing each sample log data corresponding to each target service based on the target keyword information to obtain the initial feature vector corresponding to each sample log data.
Optionally, training the preset anomaly identification network based on each sample log data and the corresponding clustering result to obtain the target anomaly identification network includes:
Inputting each sample log data into the preset anomaly identification network to obtain a prediction identification result of each sample log data;
Determining loss information based on the predictive recognition result and the clustering result;
Training the preset anomaly identification network based on the loss information to obtain the target anomaly identification network.
According to another aspect of the disclosed embodiments of the present invention, there is provided an anomaly identification method including:
acquiring log data of a service to be identified;
Inputting the log data of the service to be identified into a target anomaly identification network for anomaly identification to obtain a target identification result of the service to be identified, wherein the target identification result indicates whether the service to be identified is a normal service or an abnormal service, and the target anomaly identification network is generated by the anomaly identification network generation method according to any one of the above.
According to another aspect of the disclosed embodiments of the present invention, there is provided an anomaly identification network generation apparatus including:
the first acquisition module is used for acquiring a sample log data set of at least one target service based on the calling chain identification information; each target service corresponds to one sample log data set, each sample log data set corresponds to one calling chain identification information, and each calling chain identification information is used for identifying a calling link of each target service; each sample log data set includes at least one sample log data, each sample log data including time information;
The feature extraction module is used for carrying out feature extraction on each sample log data to obtain an initial feature vector corresponding to each sample log data;
The target feature vector generation module is used for generating a target feature vector corresponding to each sample log data based on the initial feature vector and the time information;
the clustering module is used for carrying out clustering processing on the sample log data in each sample log data set based on the target feature vector to obtain a clustering result corresponding to each sample log data, wherein the clustering result is used for indicating that each sample log data is normal sample data or abnormal sample data;
The training module is used for training the preset anomaly identification network based on each sample log data and the corresponding clustering result to obtain a target anomaly identification network.
According to another aspect of the disclosed embodiments of the present invention, there is provided an abnormality recognition apparatus including:
the second acquisition module is used for acquiring log data of the service to be identified;
The anomaly identification module is used for inputting the log data of the service to be identified into a target anomaly identification network for anomaly identification, so as to obtain a target identification result of the service to be identified, wherein the target identification result indicates whether the service to be identified is a normal service or an abnormal service, and the target anomaly identification network is generated by the anomaly identification network generation method described in any one of the above.
According to another aspect of the disclosed embodiments of the present invention, there is provided an electronic device for anomaly identification network generation or anomaly identification, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement any one of the anomaly identification network generation methods or the anomaly identification method described above.
According to another aspect of the disclosed embodiments of the present invention, there is provided a computer-readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the anomaly identification network generation method or the anomaly identification method of any one of the above.
According to another aspect of the disclosed embodiments of the invention, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the anomaly identification network generation method or the anomaly identification method of any one of the above-described embodiments of the present disclosure.
The anomaly identification network generation method provided by the invention has the following technical effects:
A sample log data set of at least one target service is acquired based on call chain identification information, where each target service corresponds to one sample log data set, each sample log data set corresponds to one piece of call chain identification information, each piece of call chain identification information identifies the call link of the corresponding target service, each sample log data set includes at least one sample log data, and each sample log data includes time information. Feature extraction is then performed on each sample log data to obtain a corresponding initial feature vector, and a target feature vector is generated for each sample log data based on the initial feature vector and the time information, so that the feature vector contains multi-dimensional features of the sample log data. The sample log data in each sample log data set are then clustered to obtain a clustering result for each sample log data, the clustering result indicating whether that sample log data is normal sample data or abnormal sample data. Finally, a preset anomaly identification network is trained based on each sample log data and the corresponding clustering result to obtain a target anomaly identification network. In this way, abnormal services can be identified automatically from the log data of the services, service anomalies arising from various causes can be identified, and the accuracy, reliability and efficiency of abnormal service identification are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram illustrating an application environment of an anomaly identification network generation method, according to an example embodiment;
FIG. 2 is a flow diagram illustrating a method of anomaly identification network generation, according to an example embodiment;
FIG. 3 is a flow diagram illustrating a feature extraction process according to an example embodiment;
FIG. 4 is a flow chart illustrating a method of determining a target feature vector according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating one generation of an anomaly identification network in accordance with an exemplary embodiment;
FIG. 6 is a flow chart illustrating an anomaly identification method in accordance with an exemplary embodiment;
FIG. 7 is a block diagram of an anomaly identification network generation apparatus, according to an example embodiment;
FIG. 8 is a block diagram illustrating an anomaly identification device, according to an example embodiment;
FIG. 9 is a block diagram of a terminal electronic device for anomaly identification network generation or anomaly identification, according to an example embodiment;
FIG. 10 is a block diagram of a server electronic device for anomaly identification network generation or anomaly identification, according to an example embodiment.
Detailed Description
In order that those skilled in the art will better understand the disclosed embodiments of the present invention, a detailed description of the disclosed embodiments of the present invention will be provided with reference to the accompanying drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the disclosed embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment of an anomaly identification network generation method according to an exemplary embodiment, where the application environment may include at least a server 100 and a terminal 200.
In an alternative embodiment, the server 100 may be used to perform the anomaly identification network generation process, where the server 100 may be a separate physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides a cloud computing service.
In an alternative embodiment, the terminal 200 may be used to provide services such as anomaly identification based on an anomaly identification network. Specifically, the terminal 200 may include, but is not limited to, smart phones, desktop computers, tablet computers, notebook computers, smart speakers, digital assistants, augmented reality (AR)/virtual reality (VR) devices, smart wearable devices, vehicle terminals, smart televisions, and other types of electronic devices, or software running on such an electronic device, such as an application or applet. Operating systems running on the electronic device in embodiments of the present application may include, but are not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that, the application environment shown in fig. 1 is only an application environment of the anomaly identification network generation method, and the embodiment of the present disclosure is not limited to the above.
In the embodiment of the present disclosure, the server 100 and the terminal 200 may be directly or indirectly connected through a wired or wireless communication method, which is not limited herein.
Log data is an important basis for recording user behavior, troubleshooting faults, monitoring, and capturing error information while a service system is running, and with the rapid development of Internet technology, service systems widely use distributed clusters. Technologies such as Zipkin and SkyWalking can connect the upstream and downstream links of a service request, and adding the call chain ID to the logs further enriches the collected data, but identifying problematic transactions in the massive data remains a huge challenge. In general, errors at a call chain interface are easy to identify, but for internal logic errors or anomalies, whether a transaction is abnormal has to be judged by manually inspecting logs or other auxiliary information. This is prone to mistakes and omissions and is highly subjective, which reduces the accuracy and reliability of abnormal service determination, increases labor cost, and lowers the efficiency of abnormal service identification. The application therefore provides an anomaly identification network generation method that can automatically identify abnormal services from the log data of the services and improve the accuracy, reliability and efficiency of abnormal service identification.
An anomaly identification network generation method according to the present application is described below. Fig. 2 is a schematic flow chart of an anomaly identification network generation method according to an exemplary embodiment. The present specification provides the method operation steps as described in the embodiments or the flow chart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only execution order. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel-processor or multithreaded environment). Specifically, as shown in fig. 2, the method may include:
S201: acquiring a sample log data set of at least one target service based on the call chain identification information.
In a specific embodiment, each target service may be any executable service provided on the service providing platform; specifically, the target service may include a login service, a query service, a payment service, and the like. Each target service may include at least one sub-service. Taking as an example a payment service in which a user pays a consumption amount to a merchant, the sub-services of the target service may include pre-deduction from the user's local account, final deduction from the user's online banking account (an external service), and the like.
In a specific embodiment, each target service may correspond to one sample log data set, each sample log data set may correspond to one piece of call chain identification information, and each piece of call chain identification information may be used to identify the call link of the corresponding target service; the call chain identification information may be ID information identifying a service call link. Specifically, each target service corresponds to a complete calling procedure, which may include multiple sub-calling procedures, and each sub-service corresponds to one or more sub-calling procedures. For example, if the complete calling procedure of a certain service is implemented by service A calling service B and service B then calling service C (i.e., the call link is "A→B→C"), then service A calling service B and service B calling service C are two sub-calling procedures, which may correspond to different sub-services.
In a specific embodiment, each sample log data set may include at least one sample log data, and each sample log data may include time information. In particular, the time information may include first time information and second time information, where the first time information may be used to indicate the time consumed by the corresponding target service in one execution, and the second time information may be used to indicate the execution time intervals between sub-services of the corresponding target service in one execution. Specifically, the first time information may include the occurrence time and the completion time of the target service, and the second time information may include the occurrence time and the completion time of each sub-service of the target service.
Specifically, each sample log data may further include data such as the service type of the target service and of its sub-services, the call interface of the target service, and the execution result. The service type of the target service may include a login service, a query service, a payment service, and the like; where the target service is a payment service, the sub-service types may include a pre-deduction sub-service type, a final deduction sub-service type, and the like. The execution result may be used to indicate whether the corresponding service executed successfully or failed. In each sample log data, call interface data is typically recorded in a preset form so that the information can be identified. Depending on the specific service type, call interface and execution result, the data may correspond to a number of different text contents, namely different service type information, call interface information and execution result information.
In an optional embodiment, the acquiring the sample log data set of the at least one target service based on the call chain identification information may include:
Acquiring sample log data corresponding to at least one target service;
Grouping the sample log data corresponding to the at least one target service based on each piece of call chain identification information to obtain a sample log data set of each target service.
In a particular embodiment, each sample log data may also include a piece of call chain identification information. At the beginning of a service, a globally unique link identifier (i.e., a call chain ID) may be generated and recorded unaltered in the logs of that service execution, so that each log includes the call chain ID of the corresponding service link.
In practical applications, executing a service requires calling services of multiple applications on multiple different application servers, and the service calls are made in sequence, thereby generating a service call link. Each service execution generates a log, and each service may correspond to a plurality of logs (a log set) generated by multiple executions. These logs correspond to the same service call link, that is, they have the same call chain ID, so logs with the same call chain ID can be picked out of a large number of logs by call chain ID to form a log set.
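The grouping step described above can be sketched as follows. This is a minimal illustration only: the record layout and the field names `trace_id` and `message` are assumptions made for the example and are not specified in this description.

```python
from collections import defaultdict

# Hypothetical log records; the field names "trace_id" and "message"
# are illustrative assumptions, not part of the described method.
sample_logs = [
    {"trace_id": "chain-pay", "message": "pre-deduction ok"},
    {"trace_id": "chain-login", "message": "login ok"},
    {"trace_id": "chain-pay", "message": "final deduction ok"},
]


def group_by_call_chain(logs):
    """Group log records into one sample log data set per call chain ID."""
    datasets = defaultdict(list)
    for record in logs:
        datasets[record["trace_id"]].append(record)
    return dict(datasets)


log_datasets = group_by_call_chain(sample_logs)
```

Each key of `log_datasets` then plays the role of one piece of call chain identification information, and each value is the corresponding sample log data set.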
S203: performing feature extraction on each sample log data to obtain an initial feature vector corresponding to each sample log data.
In one embodiment, the log data of a service typically records the corresponding content in a preset format. The text content in the sample log data can first be segmented into words based on preset keyword information, and the segmented text content can then be expressed as a vector based on, for example, whether each word appears in a sentence or how many times it appears. Specifically, the feature extraction may be performed with a preset vectorization algorithm, which may include the word-to-vector (word2vec) algorithm, the vector space model (Vector Space Model), and the like.
Specifically, before word segmentation, text content in the sample log data may be preprocessed, and irrelevant data, such as user data, payment amount data in payment service, and the like, may be removed.
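As a rough sketch of this preprocessing, irrelevant fields could be removed with regular expressions before word segmentation. The `user_id=` and `amount=` log formats below are hypothetical; the description does not state how such data appears in the logs.

```python
import re

# Hypothetical field formats; real logs may record these values differently.
USER_ID = re.compile(r"user_id=\w+")
AMOUNT = re.compile(r"amount=\d+(?:\.\d+)?")


def strip_irrelevant(log_text):
    """Remove user and payment-amount fields, then collapse whitespace."""
    text = USER_ID.sub("", log_text)
    text = AMOUNT.sub("", text)
    return " ".join(text.split())


cleaned = strip_irrelevant("payment user_id=u123 amount=99.50 success")
```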
In an alternative embodiment, fig. 3 is a schematic flow chart of a feature extraction process according to an exemplary embodiment, and as shown in fig. 3, performing feature extraction on each sample log data to obtain an initial feature vector corresponding to each sample log data may include:
S301: acquiring service type information and call interface information corresponding to each target service.
In a specific embodiment, the service type information and the call interface information can be obtained from the application source code through a preset code analysis tool, and the information can also be manually maintained.
S303: generating target keyword information of each target service based on the service type information, the call interface information and the preset keyword information.
In a specific embodiment, a plurality of text contents corresponding to the service type information and the call interface information can be used as keywords to form a target keyword together with a preset keyword. The preset keyword information can be set according to actual application requirements.
S305: vectorizing each sample log data corresponding to each target service based on the target keyword information to obtain an initial feature vector corresponding to each sample log data.
In a specific embodiment, the text content in the sample log data may be segmented into words based on the target keywords, and the segmented text content may then be expressed as a vector based on, for example, whether each word appears in a sentence or how many times it appears.
In the above embodiment, the text contents corresponding to the service type information and the call interface information of the target service are added as keywords before the text content in the sample log data is segmented. This improves the segmentation accuracy and hence the reliability of anomaly identification, and avoids the situation where a segmentation error leads to normal and abnormal sample data being classified inaccurately and, in turn, to inaccurate anomaly identification.
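Steps S301 to S305 can be illustrated with a simple keyword-count vectorization, a basic vector space model. The keyword strings below are hypothetical, and matching longer keywords first is a design choice of this sketch rather than something the description prescribes.

```python
import re

# Hypothetical keyword sets; none of these strings come from the description.
preset_keywords = ["success", "failure", "timeout"]
service_type_keywords = ["payment", "pre-deduction", "final-deduction"]
interface_keywords = ["AccountService.deduct", "BankGateway.pay"]

# Target keywords = preset keywords plus service-type and interface keywords.
target_keywords = preset_keywords + service_type_keywords + interface_keywords


def vectorize(log_text, keywords):
    """Segment the text by keyword and return one occurrence count per keyword."""
    # Longer keywords are tried first so that a multi-part keyword such as
    # "AccountService.deduct" is kept whole instead of being split.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in ordered)
    tokens = re.findall(pattern, log_text)
    return [tokens.count(k) for k in keywords]


initial_vector = vectorize(
    "payment pre-deduction AccountService.deduct success; "
    "payment BankGateway.pay timeout",
    target_keywords,
)
```

Replacing `tokens.count(k)` with `int(k in tokens)` would give the presence/absence variant mentioned in the text.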
S205: generating a target feature vector corresponding to each sample log data based on the initial feature vector and the time information.
In a specific embodiment, the target feature vector is generated jointly from the features directly extracted from the sample log data and the features corresponding to the time information in the sample log data, for use in subsequent processing.
In an alternative embodiment, fig. 4 is a schematic flow chart of determining a target feature vector according to an exemplary embodiment, and as shown in fig. 4, generating, based on the initial feature vector and time information in each sample log data, the target feature vector corresponding to each sample log data includes:
S401: determining duration information of the target service corresponding to each sample log data based on the first time information.
In a specific embodiment, the first time information may be used to indicate the time consumed by the corresponding target service in one execution. Specifically, the first time information may include the occurrence time and the completion time of the target service, and the time consumed in executing the service, i.e., the duration information, is calculated from the occurrence time and the completion time of the service.
S403: determining time interval information between sub-services corresponding to each sample log data based on the second time information.
In a specific embodiment, the second time information may be used to indicate the execution time intervals between sub-services of the corresponding target service in one execution. Specifically, the second time information may include the occurrence time and the completion time of each sub-service of the target service, and the execution time interval between adjacent sub-services is calculated from the occurrence time of the later sub-service and the completion time of the earlier sub-service.
S405: a target feature vector is generated based on the initial feature vector, the duration information, and the time interval information.
In a specific embodiment, the target feature vector is generated based on the features directly extracted from the sample log data, the features corresponding to the time consuming execution of the service, and the features corresponding to the adjacent sub-service execution time intervals.
In the above embodiment, based on the features directly extracted from the sample log data, the features corresponding to the time consumption of service execution, and the features corresponding to the time intervals of adjacent sub-service execution, the feature vectors corresponding to the sample log data are generated together, so that the features of the sample log data can be extracted from different dimensions, and the feature vectors contain more effective features of the sample log data, so as to promote the subsequent anomaly identification processing effect.
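Steps S401 to S405 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the timestamp format, the padding of the interval features to a fixed length (max_gaps), and all function and variable names are hypothetical and are not specified in this disclosure.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed log timestamp format

def parse(ts):
    return datetime.strptime(ts, TIME_FORMAT)

def build_target_vector(initial_vector, service_start, service_end,
                        sub_services, max_gaps=3):
    """Append duration and sub-service interval features to the initial vector.

    sub_services is a list of (occurrence_time, completion_time) pairs.
    """
    # S401: duration = completion time - occurrence time of the target service.
    duration = (parse(service_end) - parse(service_start)).total_seconds()

    # S403: interval = occurrence time of the later sub-service
    #       minus completion time of the preceding sub-service.
    ordered = sorted(sub_services, key=lambda s: parse(s[0]))
    gaps = [(parse(later[0]) - parse(earlier[1])).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]

    # Assumption: pad or truncate so that every vector has the same length.
    gaps = (gaps + [0.0] * max_gaps)[:max_gaps]

    # S405: concatenate the three groups of features.
    return list(initial_vector) + [duration] + gaps

target_vector = build_target_vector(
    initial_vector=[1.0, 0.0, 2.0],
    service_start="2024-03-01 10:00:00.000",
    service_end="2024-03-01 10:00:01.500",
    sub_services=[
        ("2024-03-01 10:00:00.000", "2024-03-01 10:00:00.400"),
        ("2024-03-01 10:00:00.500", "2024-03-01 10:00:01.000"),
        ("2024-03-01 10:00:01.200", "2024-03-01 10:00:01.500"),
    ],
)
print(target_vector)
```

In this example the service lasts 1.5 seconds and the two gaps between adjacent sub-services are 0.1 and 0.2 seconds; the last interval slot is padding.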
S207: clustering the sample log data in each sample log data set based on the target feature vector to obtain a clustering result corresponding to each sample log data.
In a specific embodiment, the clustering result may be used to indicate whether each sample log data is normal sample data or abnormal sample data. Specifically, sample log data that deviates from the cluster group may be determined as abnormal sample data, and the remaining sample log data may be determined as normal sample data. For example, the distance between each sample log data and the cluster center (such as the Wasserstein distance (WD), the Kullback-Leibler (KL) divergence, or the Euclidean distance) may be calculated to determine the degree to which each sample log data deviates from the cluster center; when the distance is greater than a preset distance threshold (that is, the degree of deviation is large), the corresponding sample log data may be determined as sample data deviating from the cluster group, that is, abnormal sample data. The preset distance threshold can be set according to actual application requirements.
In a specific embodiment, each sample log data set may consist of a plurality of sample log data obtained by executing one service multiple times; that is, each sample log data set corresponds to the same service. By clustering a certain number of sample log data corresponding to the same service, abnormal sample log data can be determined quickly and with high accuracy.
In practical application, when the clustering result of the sample log data is determined, the judgment can be based on one or more data features of the sample log data, such as the service execution duration feature, the time interval feature between sub-services, the feature corresponding to the service interface, and the feature corresponding to the execution result information in the sample log data. This enriches the reference features for clustering and allows various abnormal conditions to be identified, such as a service execution time that is too short or too long, a time interval between sub-services that is too short or too long, a service interface call error, an abnormal service execution result, null values in the log data, and data in the log data that does not conform to a preset format, thereby improving the accuracy and comprehensiveness of the normal or abnormal judgment of the sample log data. The clustering result can be further checked to ensure reliability.
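The distance-threshold labelling described for step S207 can be sketched in Python as follows. This is only an illustrative sketch: it assumes a single cluster per sample log data set, uses the coordinate-wise median as the cluster center and the Euclidean distance as the distance measure, and the threshold value and sample vectors are made up for the example. The disclosure does not fix the clustering algorithm, so these choices are assumptions.

```python
import math
import statistics

def label_by_distance(vectors, distance_threshold):
    """Return (labels, distances); label 1 = abnormal sample, 0 = normal sample."""
    # Assumed cluster center: coordinate-wise median of all vectors in the set.
    center = [statistics.median(column) for column in zip(*vectors)]
    distances = [math.dist(vector, center) for vector in vectors]
    labels = [1 if d > distance_threshold else 0 for d in distances]
    return labels, distances

# Hypothetical target feature vectors of one sample log data set
# (for brevity: [duration, interval between two sub-services]).
vectors = [
    [1.0, 0.1],
    [1.1, 0.1],
    [0.9, 0.2],
    [1.0, 0.1],
    [9.0, 3.0],  # execution took far longer than usual
]
labels, distances = label_by_distance(vectors, distance_threshold=1.0)
print(labels)
```

The resulting labels play the role of the clustering result that is later used as the supervision signal in step S209.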
S209: training a preset anomaly identification network based on each sample log data and a corresponding clustering result to obtain a target anomaly identification network.
In a specific embodiment, the preset anomaly identification network may be an anomaly identification network to be trained, and in particular, the preset anomaly identification network may be any classification network, which is not limited herein. The target anomaly identification network may be an anomaly identification network after training, and the target anomaly identification network may be used for performing anomaly identification on log data of a service to be identified to obtain an identification result of the service to be identified, so as to determine whether the service to be identified is anomalous. Specifically, training can be performed on the anomaly identification network to be trained based on the sample log data and the corresponding clustering result, and the target anomaly identification network is obtained through supervised training.
In an optional embodiment, training the preset anomaly identification network based on each sample log data and the corresponding clustering result to obtain the target anomaly identification network may include:
inputting each sample log data into a preset anomaly identification network to obtain a prediction identification result of each sample log data;
Determining loss information based on the predictive recognition result and the clustering result;
Training a preset anomaly identification network based on the loss information to obtain a target anomaly identification network.
In a specific embodiment, the above prediction recognition result may be used to indicate that the sample log data recognized based on the preset anomaly recognition network is normal sample data or anomaly sample data. The loss information can be calculated by combining a preset loss function; alternatively, the preset loss function may be set in connection with the actual application requirement, such as an exponential loss function, a cross entropy loss function, etc. The loss information may be used to characterize the accuracy of anomaly identification of the current preset anomaly identification network.
In a specific embodiment, training the preset anomaly identification network based on the loss information to obtain the target anomaly identification network may include: updating network parameters of the preset anomaly identification network based on the loss information; and, using the updated preset anomaly identification network, repeating the training iteration (inputting each sample log data into the preset anomaly identification network to obtain a prediction recognition result of each sample log data, determining the loss information, and updating the network parameters based on the loss information) until a preset training convergence condition is met.
Specifically, the preset training convergence condition may be that the loss information is less than or equal to a preset loss threshold, or the number of training iteration steps reaches a preset number of times, and specifically, the preset loss threshold and the preset number of times may be set in combination with network accuracy and training speed requirements in practical application.
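The training iteration and the two convergence conditions can be sketched in Python as follows. Since the preset anomaly identification network may be any classification network, a single-layer logistic classifier trained by gradient descent with a cross entropy loss is used here purely as a stand-in; the learning rate, thresholds, feature values, and function names are assumptions made for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, row):
    """Predicted probability that a sample is abnormal."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

def train(features, labels, loss_threshold=0.05, max_steps=5000, lr=0.5):
    weights, bias = [0.0] * len(features[0]), 0.0
    loss, step, n, eps = float("inf"), 0, len(labels), 1e-12
    for step in range(1, max_steps + 1):
        # Forward pass: prediction recognition result for each sample.
        preds = [predict(weights, bias, row) for row in features]
        # Loss information: cross entropy against the clustering result.
        loss = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                    for p, y in zip(preds, labels)) / n
        # Convergence condition 1: loss at or below the preset loss threshold.
        if loss <= loss_threshold:
            break
        # Update the parameters from the loss gradient.
        errors = [p - y for p, y in zip(preds, labels)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * row[j] for e, row in zip(errors, features)) / n
        bias -= lr * sum(errors) / n
    # Convergence condition 2: the loop ends after max_steps iterations.
    return weights, bias, loss, step

features = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2], [0.9, 0.8], [1.0, 0.9]]
labels = [0, 0, 0, 1, 1]  # clustering results: 1 = abnormal, 0 = normal
weights, bias, loss, steps = train(features, labels)
predicted = [1 if predict(weights, bias, row) > 0.5 else 0 for row in features]
print(steps, round(loss, 4), predicted)
```

Lowering the loss threshold or raising the maximum number of steps trades training time for accuracy, which matches the statement that both values are set according to accuracy and training speed requirements.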
In a specific implementation, fig. 5 is a schematic diagram illustrating one way of generating an anomaly identification network according to an exemplary embodiment. As shown in fig. 5, sample log data corresponding to a plurality of services is first obtained and grouped based on the call chain ID of each sample log data, so that one sample log data set is obtained for each target service, each sample log data set corresponding to one call chain ID. Feature extraction is then performed on each sample log data to obtain an initial feature vector. Next, duration information of the target service corresponding to each sample log data is determined based on the first time information recorded in the sample log data; the first time information may include an occurrence time and a completion time of the target service, from which the time consumed in executing the service, i.e., the duration information, is calculated. Time interval information between the sub-services corresponding to each sample log data is determined based on the second time information recorded in the sample log data; the second time information may include an occurrence time and a completion time of each sub-service of the target service, and the execution time interval between adjacent sub-services is calculated from the occurrence time of the later sub-service and the completion time of the preceding sub-service. The target feature vector is then generated jointly from the initial feature vector, the duration information, and the time interval information.
Then, based on the target feature vector, the sample log data in each sample log data set is clustered to obtain a clustering result corresponding to each sample log data, where the clustering result may be used to indicate whether each sample log data is normal sample data or abnormal sample data. Each sample log data is then input into the preset anomaly identification network to obtain a prediction recognition result of each sample log data, loss information is determined based on the prediction recognition result and the clustering result, and the preset anomaly identification network is trained based on the loss information to obtain the target anomaly identification network.
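The grouping step at the start of this flow can be sketched in Python as follows. The record layout and the field name trace_id are hypothetical; the disclosure only states that each sample log data carries call chain identification information.

```python
from collections import defaultdict

def group_by_call_chain(records, id_field="trace_id"):
    """Group sample log data into one data set per call chain ID."""
    datasets = defaultdict(list)
    for record in records:
        datasets[record[id_field]].append(record)
    return dict(datasets)

# Hypothetical sample log data from two services, each executed more than once.
records = [
    {"trace_id": "login-chain", "message": "login ok", "cost_ms": 120},
    {"trace_id": "pay-chain", "message": "pay ok", "cost_ms": 340},
    {"trace_id": "login-chain", "message": "login ok", "cost_ms": 135},
    {"trace_id": "pay-chain", "message": "pay failed", "cost_ms": 9000},
]
datasets = group_by_call_chain(records)
print({chain_id: len(logs) for chain_id, logs in datasets.items()})
```

Each resulting list corresponds to one sample log data set and is clustered separately, so that samples are only compared with other executions of the same service.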
As can be seen from the technical solutions provided in the embodiments of the present disclosure, a sample log data set of at least one target service is obtained based on call chain identification information, where each target service corresponds to one sample log data set, each sample log data set corresponds to one call chain identification information, each call chain identification information is used to identify a call link of each target service, each sample log data set includes at least one sample log data, and each sample log data includes time information. Features are then extracted from each sample log data to obtain an initial feature vector corresponding to each sample log data, and a target feature vector corresponding to each sample log data is generated based on the initial feature vector and the time information, so that the feature vector contains multi-dimensional features of the sample log data. The sample log data in each sample log data set is then clustered to obtain a clustering result corresponding to each sample log data, where the clustering result is used to indicate whether each sample log data is normal sample data or abnormal sample data. Finally, a preset anomaly identification network is trained based on each sample log data and the corresponding clustering result to obtain a target anomaly identification network. In this way, abnormal services can be identified automatically from the log data of the services, service anomalies arising from various causes can be identified, and the accuracy, reliability and efficiency of abnormal service identification are improved.
In addition, text content corresponding to the service type information and the call interface information of the target service is added as new keywords before the text content in the sample log data is segmented. This improves segmentation accuracy and the reliability of subsequent anomaly identification, because it avoids segmentation errors that would otherwise cause normal sample data and abnormal sample data to be classified inaccurately.
An anomaly identification method that uses the target anomaly identification network generated by the above anomaly identification network generation method of the present application is described below. Fig. 6 is a flowchart illustrating an anomaly identification method according to an exemplary embodiment. As shown in fig. 6, the method may include:
S601: acquiring log data of the service to be identified.
In a specific embodiment, the service to be identified may be any service that requires anomaly identification, such as a login service, a query service, or a payment service. The log data of the service to be identified may be log data generated when the service to be identified is executed.
S603: inputting the log data of the service to be identified into the target anomaly identification network for anomaly identification to obtain a target recognition result of the service to be identified.
In a specific embodiment, the target recognition result may be used to indicate whether the service to be identified is a normal service or an abnormal service. Specifically, the target recognition result may indicate a probability that the service to be identified is normal and/or a probability that the service to be identified is abnormal. When the probability that the service is normal is greater than a first preset threshold, the service to be identified may be determined to be a normal service; when the probability that the service is abnormal is greater than a second preset threshold, the service to be identified may be determined to be an abnormal service. The first preset threshold and the second preset threshold can be set according to actual application requirements.
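The threshold decision in step S603 can be sketched in Python as follows. The threshold values, the order in which the two conditions are checked, and the "undetermined" outcome for the case where neither threshold is exceeded are assumptions made for this example; the disclosure does not specify them.

```python
def decide(p_normal, p_abnormal, first_threshold=0.6, second_threshold=0.6):
    """Map the target recognition result to a service status."""
    # Assumption: the abnormal check takes precedence if both thresholds are exceeded.
    if p_abnormal > second_threshold:
        return "abnormal"
    if p_normal > first_threshold:
        return "normal"
    # Assumption: neither threshold exceeded, leave the service for manual review.
    return "undetermined"

print(decide(p_normal=0.92, p_abnormal=0.08))
print(decide(p_normal=0.15, p_abnormal=0.85))
print(decide(p_normal=0.50, p_abnormal=0.50))
```

Using two separate thresholds allows the strictness of the normal and abnormal decisions to be tuned independently, for example by lowering the second threshold for a payment service where missed anomalies are costly.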
In a specific embodiment, the target anomaly identification network may be obtained by training a preset anomaly identification network based on sample log data and a corresponding clustering result, where the preset anomaly identification network is any classification network to be trained. The clustering result corresponding to the sample log data may be obtained by clustering the sample log data in a sample log data set based on the feature vectors of the sample log data, where the feature vector of each sample log data is generated jointly from the features directly extracted from the sample log data, the features corresponding to the time consumed in executing the service, and the features corresponding to the execution time intervals between adjacent sub-services. The sample log data set may be obtained by grouping a plurality of sample log data based on call chain identification information, and each sample log data set corresponds to one call chain identification information.
The target anomaly identification network obtained based on the mode can automatically identify the anomaly service based on the log data of the service, and the accuracy, reliability and efficiency of the anomaly service identification are improved.
The embodiment of the invention also provides an anomaly identification network generation device, as shown in fig. 7, which may include:
A first obtaining module 710, configured to obtain a sample log dataset of at least one target service based on the call chain identification information; each target service corresponds to one sample log data set, each sample log data set corresponds to one calling chain identification information, and each calling chain identification information is used for identifying a calling link of each target service; each sample log data set includes at least one sample log data, each sample log data including time information;
the feature extraction module 720 is configured to perform feature extraction on each sample log data, so as to obtain an initial feature vector corresponding to each sample log data;
A target feature vector generating module 730, configured to generate a target feature vector corresponding to each sample log data based on the initial feature vector and the time information;
The clustering module 740 is configured to perform clustering processing on the sample log data in the sample log data set based on the target feature vector, to obtain a clustering result corresponding to each sample log data, where the clustering result is used to indicate that each sample log data is normal sample data or abnormal sample data;
the training module 750 is configured to train the preset anomaly identification network based on the log data of each sample and the corresponding clustering result, so as to obtain a target anomaly identification network.
Optionally, each target service includes at least one sub-service, the time information includes first time information and second time information, and the target feature vector generating module 730 may include:
the time length information determining unit is used for determining time length information of the target service corresponding to each sample log data based on the first time information;
a time interval information determining unit, configured to determine time interval information between sub-services corresponding to each sample log data based on the second time information;
and the target feature vector generation unit is used for generating the target feature vector based on the initial feature vector, the duration information and the time interval information.
Optionally, the first obtaining module 710 may include:
The first acquisition unit is used for acquiring sample log data corresponding to the at least one target service, and each sample log data comprises call chain identification information;
And the grouping unit is used for grouping the sample log data corresponding to the at least one target service based on the identification information of each calling chain to obtain a sample log data set of each target service.
Optionally, the feature extraction module 720 may include:
The second acquisition unit is used for acquiring the service type information and the calling interface information corresponding to each target service;
The target keyword information generating unit is used for generating target keyword information of each target service based on the service type information, the calling interface information and preset keyword information;
and the vectorization processing unit is used for vectorizing each sample log data corresponding to each target service based on the target keyword information to obtain an initial feature vector corresponding to each sample log data.
Optionally, the training module 750 may include:
the prediction recognition result determining unit is used for inputting each sample log data into the preset anomaly recognition network to obtain a prediction recognition result of each sample log data;
a loss information determining unit configured to determine loss information based on the predictive recognition result and the clustering result;
And the training unit is used for training the preset anomaly identification network based on the loss information to obtain the target anomaly identification network.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
The embodiment of the invention also provides an abnormality identification device, as shown in fig. 8, which may include:
A second obtaining module 810, configured to obtain log data of a service to be identified;
The anomaly identification module 820 is configured to input the log data of the service to be identified into a target anomaly identification network for anomaly identification, so as to obtain a target identification result of the service to be identified, where the target identification result is used to indicate that the service to be identified is a normal service or an anomaly service, and the target anomaly identification network is generated by using the anomaly identification network generation method provided by the embodiment of the application.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 9 is a block diagram of an electronic device for anomaly identification network generation or anomaly identification according to an exemplary embodiment. The electronic device may be a terminal, and its internal structure may be as shown in fig. 9. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an anomaly identification network generation method or an anomaly identification method. The display screen of the electronic device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the electronic device, or an external keyboard, touch pad or mouse.
Fig. 10 is a block diagram of an electronic device for anomaly identification network generation or anomaly identification according to an exemplary embodiment. The electronic device may be a server, and its internal structure may be as shown in fig. 10. The electronic device includes a processor, a memory, and a network interface connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an anomaly identification network generation method or an anomaly identification method.
It will be appreciated by those skilled in the art that the structures shown in fig. 9 or 10 are merely block diagrams of portions of structures related to the present disclosure and do not constitute limitations of the electronic devices to which the present disclosure is applied, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have different arrangements of components.
In an exemplary embodiment, there is also provided an anomaly identification network generation or anomaly identification electronic device including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the anomaly identification network generation method or the anomaly identification method in the disclosed embodiments of the present invention.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the anomaly identification network generation method or the anomaly identification method in the disclosed embodiments of the present invention.
In an exemplary embodiment, a computer program product containing instructions that, when run on a computer, cause the computer to perform the anomaly identification network generation method or the anomaly identification method in the disclosed embodiments of the present invention is also provided.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Other embodiments of the disclosed invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed invention. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An anomaly identification network generation method, comprising:
Acquiring a sample log data set of at least one target service based on the call chain identification information; each target service corresponds to one sample log data set, each sample log data set corresponds to one calling chain identification information, and each calling chain identification information is used for identifying a calling link of each target service; each sample log data set includes at least one sample log data, each sample log data including time information;
extracting the characteristics of each sample log data to obtain an initial characteristic vector corresponding to each sample log data;
Generating a target feature vector corresponding to each sample log data based on the initial feature vector and the time information;
Based on the target feature vector, carrying out clustering processing on the sample log data in each sample log data set to obtain a clustering result corresponding to each sample log data, wherein the clustering result is used for indicating that each sample log data is normal sample data or abnormal sample data;
training a preset anomaly identification network based on the log data of each sample and the corresponding clustering result to obtain a target anomaly identification network.
2. The method of claim 1, wherein each of the target services comprises at least one sub-service, the time information comprises first time information and second time information, and wherein generating the target feature vector for each sample log data based on the initial feature vector and the time information in the each sample log data comprises:
determining duration information of the target service corresponding to each sample log data based on the first time information;
determining time interval information between sub-services corresponding to each sample log data based on the second time information;
And generating the target feature vector based on the initial feature vector, the duration information and the time interval information.
3. The method of claim 1, wherein the obtaining a sample log dataset for at least one target service based on call chain identification information comprises:
Acquiring sample log data corresponding to the at least one target service, wherein each sample log data comprises call chain identification information;
And grouping the sample log data corresponding to the at least one target service based on the identification information of each call chain to obtain a sample log data set of each target service.
4. The method of claim 1, wherein the performing feature extraction on each sample log data to obtain an initial feature vector corresponding to each sample log data comprises:
acquiring service type information and call interface information corresponding to each target service;
Generating target keyword information of each target service based on the service type information, the calling interface information and preset keyword information;
And carrying out vectorization processing on each sample log data corresponding to each target service based on the target keyword information to obtain an initial feature vector corresponding to each sample log data.
5. The method of claim 1, wherein training the preset anomaly identification network based on the each sample log data and the corresponding clustering result to obtain the target anomaly identification network comprises:
Inputting each sample log data into the preset anomaly identification network to obtain a prediction identification result of each sample log data;
Determining loss information based on the predictive recognition result and the clustering result;
Training the preset anomaly identification network based on the loss information to obtain the target anomaly identification network.
6. An anomaly identification method, the method comprising:
acquiring log data of a service to be identified;
inputting the log data of the service to be identified into a target abnormal identification network for abnormal identification to obtain a target identification result of the service to be identified, wherein the target identification result is used for indicating that the service to be identified is a normal service or an abnormal service, and the target abnormal identification network is generated by adopting the abnormal identification network generation method according to any one of claims 1 to 5.
7. An anomaly identification network generation apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a sample log data set of at least one target service based on calling chain identification information, wherein each target service corresponds to one sample log data set, each sample log data set corresponds to one piece of calling chain identification information, each piece of calling chain identification information identifies a calling link of the corresponding target service, each sample log data set includes at least one sample log data, and each sample log data includes time information;
a feature extraction module, configured to perform feature extraction on each sample log data to obtain an initial feature vector corresponding to each sample log data;
a target feature vector generation module, configured to generate a target feature vector corresponding to each sample log data based on the initial feature vector and the time information;
a clustering module, configured to cluster the sample log data in each sample log data set based on the target feature vector to obtain a clustering result corresponding to each sample log data, wherein the clustering result indicates whether each sample log data is normal sample data or abnormal sample data; and
a training module, configured to train a preset anomaly identification network based on each sample log data and the corresponding clustering result to obtain a target anomaly identification network.
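The first four modules of claim 7 can be sketched as a small data-preparation pipeline. Everything concrete here is an assumption made for illustration: the record layout (`chain_id`, `time`, `message`), the toy initial features (token count and an error flag), the use of elapsed seconds as the time component, plain 2-means as the clustering algorithm, and the rule that the smaller cluster is the abnormal one.

```python
from collections import defaultdict
from datetime import datetime

def group_by_chain(records):
    """Acquisition: one sample log data set per calling chain identifier."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["chain_id"]].append(rec)
    return dict(groups)

def initial_vector(message):
    """Feature extraction (toy features: token count and error flag)."""
    return [float(len(message.split())), float("error" in message.lower())]

def target_vectors(dataset):
    """Target feature vector: initial vector plus seconds since the set's first record."""
    times = [datetime.strptime(r["time"], "%Y-%m-%d %H:%M:%S") for r in dataset]
    start = min(times)
    return [initial_vector(r["message"]) + [(t - start).total_seconds()]
            for r, t in zip(dataset, times)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(vectors, iterations=10):
    """Clustering: plain 2-means; the smaller cluster is assumed to be abnormal."""
    centers = [vectors[0], max(vectors, key=lambda v: sq_dist(v, vectors[0]))]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        assign = [0 if sq_dist(v, centers[0]) <= sq_dist(v, centers[1]) else 1
                  for v in vectors]
        for k in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    minority = 0 if assign.count(0) < assign.count(1) else 1
    return ["abnormal" if a == minority else "normal" for a in assign]

records = [
    {"chain_id": "chain-a", "time": "2024-03-01 10:00:00", "message": "order received"},
    {"chain_id": "chain-a", "time": "2024-03-01 10:00:01", "message": "inventory checked"},
    {"chain_id": "chain-b", "time": "2024-03-01 11:00:00", "message": "refund received"},
    {"chain_id": "chain-a", "time": "2024-03-01 10:00:02", "message": "payment requested"},
    {"chain_id": "chain-b", "time": "2024-03-01 11:00:01", "message": "refund approved"},
    {"chain_id": "chain-a", "time": "2024-03-01 10:00:32", "message": "ERROR payment gateway timed out"},
    {"chain_id": "chain-b", "time": "2024-03-01 11:00:45", "message": "ERROR ledger write failed after retry"},
]

labels = {chain: two_means(target_vectors(dataset))
          for chain, dataset in group_by_chain(records).items()}
print(labels["chain-a"])  # ['normal', 'normal', 'normal', 'abnormal']
print(labels["chain-b"])  # ['normal', 'normal', 'abnormal']
```

Note that 2-means splits any set containing at least two distinct vectors into two clusters, so a practical system would need an additional criterion for sample log data sets that contain no abnormal data. The resulting labels would then serve as the training targets for the training module.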
8. An anomaly identification apparatus, the apparatus comprising:
a second acquisition module, configured to acquire log data of a service to be identified; and
an anomaly identification module, configured to input the log data of the service to be identified into a target anomaly identification network for anomaly identification to obtain a target identification result of the service to be identified, wherein the target identification result indicates whether the service to be identified is a normal service or an abnormal service, and the target anomaly identification network is generated by the anomaly identification network generation method according to any one of claims 1 to 5.
9. An electronic device for anomaly identification network generation or anomaly identification, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the anomaly identification network generation method of any one of claims 1 to 5 or the anomaly identification method of claim 6.
10. A computer storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the anomaly identification network generation method of any one of claims 1 to 5 or the anomaly identification method of claim 6.
CN202410236617.7A 2024-03-01 Abnormality identification network generation method, abnormality identification device and electronic equipment Pending CN118277141A (en)

Publications (1)

Publication Number Publication Date
CN118277141A (en) 2024-07-02

Similar Documents

Publication Publication Date Title
CN108876133B (en) Risk assessment processing method, device, server and medium based on business information
US11640349B2 (en) Real time application error identification and mitigation
CN110674131A (en) Financial statement data processing method and device, computer equipment and storage medium
CN105205144A (en) Method and system used for data diagnosis and optimization
CN112861662B (en) Target object behavior prediction method based on face and interactive text and related equipment
CN115174231B (en) Network fraud analysis method and server based on AI Knowledge Base
CN111241161A (en) Invoice information mining method and device, computer equipment and storage medium
CN114493255A (en) Enterprise abnormity monitoring method based on knowledge graph and related equipment thereof
CN114004700A (en) Service data processing method and device, electronic equipment and storage medium
CN111475494A (en) Mass data processing method, system, terminal and storage medium
CN115936895A (en) Risk assessment method, device and equipment based on artificial intelligence and storage medium
CN116340172A (en) Data collection method and device based on test scene and test case detection method
CN118277141A (en) Abnormality identification network generation method, abnormality identification device and electronic equipment
CN114565470A (en) Financial product recommendation method based on artificial intelligence and related equipment thereof
CN114722025A (en) Data prediction method, device and equipment based on prediction model and storage medium
CN114637672A (en) Automatic data testing method and device, computer equipment and storage medium
CN112966988A (en) XGboost model-based data evaluation method, device, equipment and storage medium
CN113689020A (en) Service information prediction method, device, computer equipment and storage medium
CN113781237B (en) Product purchase order consumption method based on distributed artificial intelligence system
CN112286724B (en) Data recovery processing method based on block chain and cloud computing center
EP4123479A2 (en) Method and apparatus for denoising click data, electronic device and storage medium
CN115344624A (en) Bank business data anomaly detection method, device, equipment and storage medium
CN117453536A (en) System abnormality analysis method, system abnormality analysis device, computer device and storage medium
CN118041977A (en) Method and device for processing micro-service component, computer equipment and storage medium
CN113326196A (en) Method and device for testing case

Legal Events

Date Code Title Description
PB01 Publication