CN115617953A - Intelligent diagnosis method and system for network service link fault - Google Patents


Info

Publication number
CN115617953A
Authority
CN
China
Prior art keywords
log
data
sample data
tid
network service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211420860.1A
Other languages
Chinese (zh)
Inventor
邹昆
李丽娟
霍曦
段军
原小卫
杨海琴
张驰
郭春江
李亮
李晨华洋
汪俊贵
古训
刘越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jiuzhou Electronic Technology Co Ltd
Original Assignee
Chengdu Jiuzhou Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jiuzhou Electronic Technology Co Ltd
Priority to CN202211420860.1A
Publication of CN115617953A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a system for intelligently diagnosing network service link faults, belonging to the technical field of computers. The method comprises the following steps: sampling and extracting the logs of the system service software; extracting features from the sampled data and marking them as sample data; training a TID model on the sample data using a PCA algorithm; collecting system logs and analyzing them with the TID model; and outputting a fault diagnosis result. The method and system comprehensively analyze and process the logs of the system service software and can accurately diagnose abnormal conditions of the system; because the logs are processed with NLP, the approach applies to different programming languages and different systems; and because system logs can be diagnosed online, the running state of the system is fed back in near real time, greatly reducing operation and maintenance cost.

Description

Intelligent diagnosis method and system for network service link fault
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a method and a system for intelligently diagnosing network service link faults.
Background
As network service systems are built out, more and more supporting software is deployed across the whole system, such as data governance, data subscription, data downloading and data cataloging. This software covers everything from the acquisition, processing and storage of front-end data sources to final display, and the stages connected in series form the network service link.
In previous system maintenance work, operation and maintenance personnel traced fault sources step by step through the relevant software logs, guided by the error prompts the software raised during inspection; if they did not solve a problem in time, the whole system could not be used normally. Moreover, because the service link involves so much software, troubleshooting became harder and slower, leaving business personnel unable to use the software for long periods and disrupting their normal work. The traditional approach of manually locating the fault source through logs suffers from low locating accuracy, high time consumption and low working efficiency; such an operation and maintenance mode directly affects the whole service link and system, and greatly increases operation and maintenance cost.
In order to guarantee the normal work of business personnel and the effective operation of the network service system, higher requirements are placed on system monitoring and fault diagnosis. The invention therefore studies intelligent diagnosis technology for the system software link, so as to diagnose link faults of the system software quickly and accurately and to provide a reliable guarantee for the normal operation of the whole system.
The existing troubleshooting process therefore has the following problems: (1) the service link involves too many software stages, and the failure of any one of them can make the whole link unusable; (2) data loss or errors require troubleshooting the full link; (3) error information is incomplete and fault location is inaccurate; (4) troubleshooting takes too long; (5) business personnel need to work in real time on the data stored and displayed in the service link.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides the intelligent diagnosis method and the intelligent diagnosis system for the network service link fault, which aim to solve the existing problems.
In order to achieve the above purpose, the invention adopts the technical scheme that:
The scheme provides an intelligent diagnosis method for network service link faults, comprising the following steps:
S1, sampling and extracting system service software logs;
S2, performing feature extraction on the sampled data and marking the result as sample data;
S3, training the TID model with a PCA algorithm on the sample data;
S4, collecting system service software logs, analyzing the different types of system service software logs with the TID model, obtaining a fault diagnosis result from the analysis result, and completing the fault diagnosis of the network service link.
The invention has the beneficial effects that: the invention takes a network service system as background, provides a TID model on the basis of analyzing a traditional fault diagnosis model and a common diagnosis method aiming at the characteristics of the network service system and the requirement of system fault diagnosis, provides a learning mechanism based on a PCA algorithm for improving the fault diagnosis efficiency, and can quickly and accurately diagnose and position the system fault; according to the invention, the system service software log is comprehensively analyzed and processed, so that the abnormal condition of the system can be accurately diagnosed; the invention can be applied to different programming languages and different systems by extracting NLP characteristics of the log; the invention can diagnose the system log on line, feed back the system running state in near real time and greatly reduce the operation and maintenance cost.
Further, the step S1 specifically includes: and sampling and extracting the total data of the INFO level log, the DEBUG level log, the WARN level log and the ERROR level log of the system service software in a certain time period.
The beneficial effects of the above further scheme are: according to the invention, the total data acquisition of the logs can be formulated according to services to realize the acquisition of the logs, the pipeline flow for collecting the logs is arranged through the configuration files, the invasiveness of source codes is reduced, and the information of log data, log files and the like with different sources and formats is acquired, aggregated and sampled.
Still further, the step S2 specifically includes: treating each log as a piece of text using natural language processing (NLP), and extracting the key features of the log and marking them as sample data; or
using a rule-based log feature extraction method, extracting the key features of the different fields of the logs through keyword extraction or specified filtering rules, and marking them as sample data.
The beneficial effects of the further scheme are as follows: the key feature extraction is based on word frequency statistics, different position weights are given to words at different positions by using a paragraph labeling technology, word similarity calculation is carried out on words with the same word property and higher word frequency in a word segmentation result, the words with higher similarity are combined, and the key words are obtained by sorting according to weights through word inverse frequency. Compared with the traditional Chinese keyword extraction method, the method has the advantages that the problem of low keyword extraction precision caused by the fact that the words with high similarity are not emphasized is solved, the improved algorithm result is better improved on the basis of accuracy and recall ratio than the original basis, and the extracted keyword set can better reflect text content.
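The word-frequency scoring described above can be illustrated with a minimal sketch, assuming whitespace-like tokenization of English log text. The function names, the `head_weight` and `head_len` parameters, and the smoothed inverse-frequency formula are illustrative assumptions rather than values taken from the patent, and the merging of similar words is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def keyword_scores(logs, target_index, head_weight=2.0, head_len=3):
    """Rank the words of one log by position-weighted frequency times inverse frequency.

    Words among the first `head_len` tokens count `head_weight` times
    (hypothetical position weights); all other words count once.
    """
    docs = [tokenize(line) for line in logs]
    # Document frequency: the number of logs in which each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter()
    for pos, word in enumerate(docs[target_index]):
        tf[word] += head_weight if pos < head_len else 1.0
    n_docs = len(docs)
    scores = {
        word: count * math.log((1 + n_docs) / (1 + df[word]))
        for word, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Words that occur in every log receive a score of zero under this smoothing, so they fall to the end of the ranking.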
Still further, treating each log as a piece of text using natural language processing (NLP), and extracting the key features of the log and marking them as sample data, specifically includes:
treating each log as a text using natural language processing (NLP);
acquiring the word vector of each input word in each text;
inputting the word vectors into an encoder to obtain an information matrix C;
inputting the information matrix C into a decoder, and extracting the key features of the log with the decoder;
and marking the key features of the log as sample data.
The beneficial effects of the further scheme are as follows: the method realizes key feature extraction based on the Transformer, and the Transformer breaks through the limitation that the RNN model can not be calculated in parallel; compared to CNN, the number of operations required to calculate the association between two locations does not increase with distance; self-attention may produce a more interpretable model from which attention distributions may be examined and the individual heads (attention heads) may learn to perform different tasks.
Still further, the step S3 includes the steps of:
S301, standardizing the sample data;
S302, calculating a covariance matrix from the standardized sample data;
S303, calculating the eigenvectors and eigenvalues of the covariance matrix using singular value decomposition;
S304, calculating the variance contribution rate from the eigenvalues;
S305, judging whether the variance contribution rate is greater than 95%; if so, obtaining the number of principal components and proceeding to step S306, otherwise returning to step S301;
S306, obtaining a result matrix from the number of principal components and the eigenvectors, determining the m vectors of the corresponding principal components from the result matrix, and outputting the TID model;
S307, simulating data that has not yet appeared by splitting the available data into two parts, used respectively as a training set and a test set;
S308, training the TID model with the training set, predicting the test set with the trained TID model, and optimizing the TID model parameters according to the prediction result to complete the training of the TID model.
The beneficial effects of the further scheme are as follows: according to the method, the model generation learns the distribution characteristics of all data through sample data, the learning convergence speed is higher, and when the sample capacity is increased, the learned model can be converged to a real model more quickly; different algorithms can be chosen in a configured form as a reference for different traffic scenarios to derive the TID model.
Still further, the expression of the covariance matrix is as follows:
[Formula image not reproduced in the source text.]
wherein the left-hand symbol of the formula (image not reproduced) represents the variance matrix, n represents the number of variables, i represents an index, X_i represents the i-th variable, and the remaining symbol (image not reproduced) represents the mean value of the variables.
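For orientation only, the textbook sample covariance matrix that matches the quantities named above (n, i, X_i and the mean) can be written as follows. The symbol $S$, the mean symbol $\bar{X}$ and the $1/(n-1)$ normalisation are assumptions; the patent's own formula is an image that is not available in this text and may differ.

```latex
S = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right) \left( X_i - \bar{X} \right)^{\mathsf{T}},
\qquad
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
```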
Still further, the expression of the variance contribution ratio is as follows:
$$\text{variance contribution rate of the } k\text{-th principal component} = \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_n}$$
wherein the left-hand side represents the variance contribution rate of the principal component, $\lambda_k$ represents the variance of the k-th principal component, and $\lambda_n$ represents the n-th eigenvalue.
Still further, the step S4 includes the steps of:
S401, labeling the collected system service software logs of different types;
S402, cleaning the labeled log data, and dividing the cleaned, labeled text data;
S403, replacing the words in the TID model and in the division result with word numbers, forming respectively a word index sequence corresponding to the TID model and a word index sequence corresponding to the verification set;
S404, mapping all word index sequences from step S403 onto the dimensions of the log key features through vector calculation, and calculating the vector distance between the feature vector and the normal sub-feature space;
S405, judging whether the vector distance exceeds a preset threshold; if so, the network service link is abnormal, otherwise the network service link is not abnormal and the process ends.
The beneficial effects of the further scheme are as follows: the label processing of the invention can be used as a label through the service attribute value without processing and conversion; the data can be simply analyzed and derived according to the rule of common behaviors in the service; the data marking can also be carried out by establishing a recognition rule and carrying out component analysis on the common data. According to the label accuracy and the coverage degree, the sample data is extracted to verify and judge whether the label design is reasonable or not; directly calculating the reliability degree through indexes such as recall ratio, precision ratio and the like in the model sample; and verifying the accuracy of the label through subsequent result data observation and statistics.
The invention provides a fault diagnosis system of a network service link, which comprises:
the sampling module is used for sampling and extracting the system service software log;
the characteristic extraction module is used for extracting and marking the characteristics of the sampled data as sample data;
the model training module is used for training the TID model by utilizing a PCA algorithm according to the sample data;
and the fault diagnosis module is used for acquiring system service software logs, analyzing the system service software logs of different types by using the TID model, obtaining a fault diagnosis result according to the analysis result and completing fault diagnosis of the network service link.
The invention has the beneficial effects that: the invention takes a network service system as background, provides a TID model on the basis of analyzing a traditional fault diagnosis model and a common diagnosis method aiming at the characteristics of the network service system and the requirement of system fault diagnosis, provides a learning mechanism based on a PCA algorithm for improving the fault diagnosis efficiency, and can quickly and accurately diagnose and position the system fault; according to the invention, the system service software log is comprehensively analyzed and processed, so that the abnormal condition of the system can be accurately diagnosed; the invention can be applied to different programming languages and different systems by extracting NLP characteristics of the log; the invention can diagnose the system log on line, feed back the system running state in near real time and greatly reduce the operation and maintenance cost.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of the PCA algorithm in this embodiment.
Fig. 3 is a schematic diagram of the system structure of the present invention.
Detailed Description
The following description of specific embodiments is provided to help those skilled in the art understand the invention. The invention is not limited to the scope of these embodiments: to those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined in the appended claims, and everything created using the inventive concept is protected.
Example 1
As shown in fig. 1, the present invention provides an intelligent diagnosis method for a network service link failure, which is implemented as follows:
s1, sampling and extracting a system service software log, which specifically comprises the following steps: sampling and extracting full data of an INFO level log, a DEBUG level log, a WARN level log and an ERROR level log of system service software in a certain time period;
In this embodiment, the system service software logs comprise INFO, DEBUG, WARN and ERROR logs, and ETL sampling extraction is used to extract the full data of a certain time period. Data are acquired through an ODS area. Data quality analysis is performed on the data in the ODS area to verify their correctness, integrity, consistency, completeness, validity, timeliness and accessibility; the checked data are then converted appropriately, and data in the source database that are ambiguous, duplicated, incomplete, or in violation of business or logic rules are handled in a unified manner. Finally, the converted data are loaded in their original order.
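As a minimal sketch of the extraction in step S1, the following keeps every INFO, DEBUG, WARN and ERROR line that falls inside a time window. The line layout `YYYY-MM-DD HH:MM:SS LEVEL message` and the function name `sample_logs` are assumptions for illustration; the patent does not specify a log format.

```python
from datetime import datetime

LEVELS = {"INFO", "DEBUG", "WARN", "ERROR"}


def sample_logs(lines, start, end):
    """Return (timestamp, level, message) for every valid line in [start, end)."""
    selected = []
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 4:
            continue  # Incomplete record: skip it.
        date_part, time_part, level, message = parts
        if level not in LEVELS:
            continue  # Unknown log level: skip it.
        try:
            ts = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # Timestamp does not match the assumed layout.
        if start <= ts < end:
            selected.append((ts, level, message))
    return selected
```

Malformed lines are dropped silently here; a production pipeline would more likely count or quarantine them as part of the data quality analysis.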
S2, performing feature extraction and marking on the sampled data as sample data, wherein the sample data specifically comprises the following steps:
each log is treated as a piece of text using natural language processing (NLP), and the key features of the log are extracted and marked as sample data; or
using a rule-based log feature extraction method, the key features of the different fields of the logs are extracted through keyword extraction or specified filtering rules and marked as sample data.
Treating each log as a piece of text using natural language processing (NLP), and extracting the key features of the log and marking them as sample data, specifically comprises the following steps:
treating each log as a text using natural language processing (NLP);
acquiring the word vector of each input word in each text;
inputting the word vectors into an encoder to obtain an information matrix C;
inputting the information matrix C into a decoder, and extracting the key features of the log with the decoder;
and marking the key features of the log as sample data.
In this embodiment, feature extraction means extracting the key features of the logs. The extraction methods fall into two classes: NLP-based (natural language processing) and rule-based. Log files are processed by means such as topic extraction, type classification, structural analysis and semantic representation. For a single log text, keywords are extracted or filtering rules are specified using NLP techniques such as synonym replacement, semantic normalization, omission and error correction, automatic word segmentation, part-of-speech analysis, syntactic analysis and semantic analysis, and the different fields of the log are extracted.
In this embodiment, log features are extracted by a feature extractor composed of two major parts, an encoder (Encoder) and a decoder (Decoder), each containing 6 blocks. All encoder blocks are structurally identical but do not share parameters; they map the natural language sequence into hidden layers that contain a representation of that sequence.
The first step: acquire the word vector X of each input word, where X is the sum of the word embedding and the position embedding, and the word embedding is obtained by pre-training with Word2Vec or a Transformer algorithm. Position information is attached to each word so that the order of words in the language can be identified. The position information PE of the Transformer model is expressed with sine and cosine functions:
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right)$$
wherein pos represents the position of a word in the sentence; for example, in a sentence of 10 words pos takes any value in [0, 9], and in general it ranges over [0, maximum sequence length); d represents the dimension of PE, which equals the dimension of the word vector (256 in the example below); i indexes the pairs of dimensions, so that 2i denotes an even dimension (sin) and 2i+1 the adjacent odd dimension (cos). For a 256-dimensional word vector, i ranges over [0, 127] and the dimension indices 2i and 2i+1 together cover [0, 255]. Each value of i therefore corresponds to one pair of adjacent even and odd dimensions. Processing the two with the sin function and the cos function respectively produces different periodic variations, from which the dependency between positions and the sequential character of natural language are obtained.
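The sinusoidal position information can be computed directly from the two formulas. The sketch below uses only the standard library; the function name `positional_encoding` and the parameter names `max_len` and `d_model` are illustrative choices, not names from the patent.

```python
import math


def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # `even` plays the role of 2i in the formulas, so the exponent is 2i/d.
        for even in range(0, d_model, 2):
            angle = pos / (10000 ** (even / d_model))
            table[pos][even] = math.sin(angle)          # even dimension: sin
            if even + 1 < d_model:
                table[pos][even + 1] = math.cos(angle)  # odd dimension: cos
    return table
```

The word vector X of the first step would then be the element-wise sum of each word embedding and the row of this table at the word's position.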
The second step is that: and transmitting the vector matrix obtained in the first step into an encoder, wherein the encoder comprises 6 blocks, outputting an encoded information matrix C, and the dimension of the block output by each encoder is completely consistent with that of the input.
The third step: and transmitting the coding information matrix C output by the coder to a decoder, and sequentially translating the next word i +1 by the decoder according to the currently translated words 1-i to extract the characteristics.
In this embodiment, rule-based extraction of structured log information assumes a known log structure and extracts the different fields of the log through keyword extraction or pre-defined filtering rules. The primary concern is the relational log (ratified log), which has a specific format (e.g., POSIX format) and logical structure, so information is easily extracted with regular expressions. Web application logs also have relatively consistent formats, and the variable information in them can be extracted through simple regular-expression matching.
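A minimal sketch of the rule-based approach follows, assuming a hypothetical line layout of timestamp, level, bracketed service name and free-text message. The pattern, the field names and the `<*>` placeholder are illustrative assumptions; a real deployment would define one rule per log format.

```python
import re

# Hypothetical layout: "2022-11-14 10:00:01 ERROR [service-name] message text"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>INFO|DEBUG|WARN|ERROR) "
    r"\[(?P<service>[^\]]+)\] "
    r"(?P<message>.*)$"
)
NUMBER = re.compile(r"\b\d+\b")


def extract_fields(line):
    """Split one log line into fields, or return None if it does not match the rule."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric tokens are treated as the variable part of the message.
    fields["variables"] = NUMBER.findall(fields["message"])
    # Replacing them with a placeholder leaves the constant log template.
    fields["template"] = NUMBER.sub("<*>", fields["message"])
    return fields
```

Counting how often each template occurs in a time window is one way to obtain the log template frequency vectors used later in anomaly detection.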
S3, training the TID model by using a PCA algorithm according to the sample data, wherein the implementation method comprises the following steps:
S301, standardizing the sample data;
S302, calculating the covariance matrix from the standardized sample data:
[Formula image not reproduced in the source text.]
wherein the left-hand symbol of the formula (image not reproduced) represents the variance matrix, n represents the number of variables, i represents an index, X_i represents the i-th variable, and the remaining symbol (image not reproduced) represents the mean value of the variables;
S303, calculating the eigenvectors and eigenvalues of the covariance matrix using singular value decomposition;
S304, calculating the variance contribution rate from the eigenvalues;
S305, judging whether the variance contribution rate is greater than 95%; if so, obtaining the number of principal components and proceeding to step S306, otherwise returning to step S301;
S306, obtaining a result matrix from the number of principal components and the eigenvectors, determining the m vectors of the corresponding principal components from the result matrix, and outputting the TID model;
S307, simulating data that has not yet appeared by splitting the available data into two parts, used respectively as a training set and a test set;
S308, training the TID model with the training set, predicting the test set with the trained TID model, and optimizing the parameters of the TID model according to the prediction result to complete the training of the TID model.
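The data splitting of step S307 can be sketched as a seeded shuffle followed by a cut. The 80/20 ratio, the fixed seed and the function name `split_samples` are assumptions for illustration; the patent does not state a split ratio.

```python
import random


def split_samples(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (training set, test set)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must lie strictly between 0 and 1")
    shuffled = list(samples)
    # A private, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_ratio)))
    return shuffled[n_test:], shuffled[:n_test]
```

Holding the test part out of training is what lets it stand in for data that has not yet appeared.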
In this embodiment, the variance contribution rate is obtained by linearly combining the eigenvector of the covariance matrix and the original variable:
[Formula image not reproduced in the source text.]
wherein Y represents the covariance contribution rate, P represents the covariance matrix, P_n represents the n-th value of the covariance matrix, X represents the original variable vector, e_nn represents the unit eigenvector, and X_n represents the weight information of the n-th original variable vector.
The variance of a principal component measures how much of the variance of the data set it can explain. The variances of the principal components are the eigenvalues λ of the covariance matrix of X, so the variance of the k-th principal component is λ_k. An index is therefore defined, called the variance contribution rate of the principal component Y_k, which is the ratio of the variance of the k-th principal component to the total variance:
$$\text{variance contribution rate of } Y_k = \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_n}$$
wherein the left-hand side represents the variance contribution rate of the principal component Y_k, $\lambda_k$ represents the variance of the k-th principal component, and $\lambda_n$ represents the n-th eigenvalue.
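A short worked sketch of the ratio follows. It additionally assumes that the 95% test of step S305 is applied to the cumulative contribution rate of the largest eigenvalues, which is the usual reading in PCA but is not spelled out in the text; the function names are illustrative.

```python
def contribution_rates(eigenvalues):
    """Variance contribution rate of each principal component: lambda_k / sum(lambda)."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def components_needed(eigenvalues, target=0.95):
    """Smallest m whose cumulative contribution rate is greater than `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for m, rate in enumerate(contribution_rates(ordered), start=1):
        cumulative += rate
        if cumulative > target:
            return m
    return len(ordered)
```

For eigenvalues 6, 3, 0.75 and 0.25 the individual rates are 60%, 30%, 7.5% and 2.5%, so the first two components explain about 90% and the third is needed to pass 95%.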
In this embodiment, the PCA algorithm is a statistical method that recombines the original variables into a new set of mutually independent composite variables, from which a few composite variables can be selected, according to actual needs, to reflect as much of the information in the original variables as possible; mathematically, it is a dimension reduction method.
In this embodiment, the key features of normal system operation are learned from the log feature vectors, and the outliers found by comparing against these key features, or by unsupervised clustering of the log set or the set of log feature vectors, are the anomalies. The method generally uses the feature vector of a single log or of a log set to construct the TID model; TID model training and tuning are carried out on a Spark distributed platform; the parameters of the TID model are adjusted according to evaluation indexes to reach the best offline effect; the effectiveness of improvements to the TID model is verified by comparing experimental results; finally, the value ranges of most log template frequency vectors on the key feature dimensions are calculated, for example the range covering 95% of the data on each key feature dimension, and recorded to generate the normal sub-feature space.
In this embodiment, the PCA algorithm flow is shown in fig. 2:
1. ETL sampling extraction is used to extract the full data of a certain time period, and a sample data set A is obtained after processing;
2. the key features of the samples are extracted by the feature extractor, namely the Transformer;
3. the sample feature data are standardized to eliminate the influence of differing dimensions (units), and the covariance matrix of the standardized variables is calculated;
4. the eigenvectors and eigenvalues of the covariance matrix are computed by singular value decomposition;
5. the variance contribution rate is calculated from the eigenvalues;
6. whether the variance contribution rate is greater than 95% is judged: if not, the variance contribution rate is recalculated; if so, the number m of principal components is obtained;
7. the result matrix T = AU is obtained from the number of principal components and the eigenvectors;
8. the m vectors of the corresponding principal components are determined;
9. the TID model is output.
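Items 3 to 8 of this flow can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the patent's implementation: NumPy is assumed to be available, the data are assumed to have at least two non-constant feature columns, the 95% test is applied to the cumulative contribution rate, and the function name `fit_pca` and the returned dictionary keys are hypothetical.

```python
import numpy as np


def fit_pca(samples, target=0.95):
    """Standardize, build the covariance matrix, decompose it by SVD, keep m components."""
    a = np.asarray(samples, dtype=float)
    mean = a.mean(axis=0)
    std = a.std(axis=0, ddof=1)
    std[std == 0.0] = 1.0            # Constant columns: avoid division by zero.
    z = (a - mean) / std             # Item 3: standardization removes unit effects.
    cov = np.cov(z, rowvar=False)    # Item 3: covariance of the standardized variables.
    # Item 4: for a symmetric positive semi-definite matrix the SVD yields the
    # eigenvectors (columns of u) and the eigenvalues (s, in descending order).
    u, s, _ = np.linalg.svd(cov)
    rates = s / s.sum()              # Item 5: variance contribution rates.
    # Item 6: smallest m whose cumulative contribution rate reaches the target.
    m = min(int(np.searchsorted(np.cumsum(rates), target)) + 1, len(s))
    components = u[:, :m]            # Item 8: the m principal directions.
    return {
        "mean": mean,
        "std": std,
        "components": components,
        "rates": rates[:m],
        "scores": z @ components,    # Item 7: result matrix, in the spirit of T = AU.
    }
```

New feature vectors would be standardized with the stored `mean` and `std` and multiplied by `components` before being compared with the normal sub-feature space.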
S4, collecting system service software logs, analyzing the system service software logs of different types by using a TID model, obtaining a fault diagnosis result according to the analysis result, and completing fault diagnosis of the network service link, wherein the implementation method comprises the following steps:
S401, labeling the collected system service software logs of different types;
S402, cleaning the labeled log data, and dividing the cleaned, labeled text data;
S403, replacing the words in the TID model and in the division result with word numbers, forming respectively a word index sequence corresponding to the TID model and a word index sequence corresponding to the verification set;
S404, mapping all word index sequences from step S403 onto the dimensions of the log key features through vector calculation, and calculating the vector distance between the feature vector and the normal sub-feature space;
S405, judging whether the vector distance exceeds a preset threshold; if so, the network service link is abnormal, otherwise the network service link is not abnormal and the process ends.
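The replacement of words by word numbers in step S403 can be sketched as a vocabulary lookup. Reserving 0 for words not seen when the vocabulary was built, and the function names themselves, are illustrative assumptions.

```python
def build_vocabulary(token_lists):
    """Assign word numbers in order of first appearance; 0 is reserved for unknown words."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def to_index_sequence(tokens, vocab, unknown=0):
    """Replace each word by its word number, using `unknown` for unseen words."""
    return [vocab.get(token, unknown) for token in tokens]
```

Building the vocabulary from the training logs only, and applying it unchanged to the verification set, keeps the two word index sequences comparable.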
In this embodiment, in the anomaly detection stage, the log template frequency vector of an online log is mapped to the key feature dimensions through vector calculation, and the distance between this vector and the normal feature subspace is calculated; if the distance exceeds a certain threshold, the system is judged to be abnormal. The anomaly detection stage thus checks whether the online log contains fault-related log information and judges whether the online log is abnormal. Key features are extracted from the logs and reduced in dimension by the PCA method, all logs are then clustered by the K-means algorithm, and the outliers found are taken as the detected anomalies.
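The distance test described above can be sketched in a few lines. The sketch assumes that the normal feature subspace is spanned by orthonormal principal axes and that the distance is the Euclidean norm of the part of the centered vector lying outside that subspace; the mean, the axis, the threshold and the two example windows are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def residual_distance(
    vector: Sequence[float],
    mean: Sequence[float],
    components: Sequence[Sequence[float]],
) -> float:
    """Distance from a vector to the subspace spanned by orthonormal components."""
    centered = [v - m for v, m in zip(vector, mean)]
    projection: List[float] = [0.0] * len(centered)
    for axis in components:
        weight = dot(centered, axis)  # coordinate along this principal axis
        projection = [p + weight * a for p, a in zip(projection, axis)]
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(centered, projection)))


def is_abnormal(
    vector: Sequence[float],
    mean: Sequence[float],
    components: Sequence[Sequence[float]],
    threshold: float,
) -> bool:
    """Flag the vector when its distance to the normal subspace exceeds the threshold."""
    return residual_distance(vector, mean, components) > threshold


# Hypothetical model: three log templates, normal behaviour along one axis.
mean = [2.0, 2.0, 0.0]
components = [[1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]]
threshold = 1.0

normal_window = [3.0, 3.0, 0.0]  # varies only along the normal axis
faulty_window = [2.0, 2.0, 4.0]  # a rarely seen template suddenly appears
```

Here `normal_window` lies inside the normal subspace, so its distance is essentially zero, while `faulty_window` is 4.0 away and is flagged. The K-means clustering mentioned in the paragraph is not covered by this sketch.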
With this design, the logs of the system service software are comprehensively analyzed and processed, so that abnormal conditions of the system can be accurately diagnosed; the logs of the service software are processed with NLP, so that the method is applicable to different programming languages and different systems; and system logs can be diagnosed online, so that the running state of the system is fed back in near real time and the operation and maintenance cost is greatly reduced.
Example 2
As shown in fig. 3, the present invention provides an intelligent diagnosis system for network service link failure, including:
the sampling module is used for sampling and extracting the system service software logs;
the characteristic extraction module is used for extracting and marking the characteristics of the sampled data as sample data;
the model training module is used for training the TID model by utilizing a PCA algorithm according to the sample data;
and the fault diagnosis module is used for acquiring system service software logs, analyzing different types of system service software logs by using the TID model, obtaining a fault diagnosis result according to the analysis result and completing fault diagnosis of the network service link.
The fault diagnosis system for a network service link provided in the embodiment shown in fig. 3 may execute the technical solution shown in the intelligent fault diagnosis method for a network service link in the above embodiment, and the implementation principle and the beneficial effect are similar, which are not described herein again.
In the embodiment of the invention, the functional units can be divided according to the intelligent network service link fault diagnosis method; for example, each function can be assigned to its own functional unit, or two or more functions can be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of the units in the present invention is schematic and is only a logical division; other division manners are possible in actual implementation.
In the embodiment of the invention, in order to realize the principle and the beneficial effect of the intelligent network service link fault diagnosis method, the fault diagnosis system of the network service link comprises a hardware structure and/or a software module corresponding to each function. Those of ordinary skill in the art will readily appreciate that the exemplary units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or in computer software depends on the particular application and the design constraints of the technical solution; different approaches may be used to implement the described functions for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The invention carries out comprehensive analysis processing on the log of the system service software, and can accurately diagnose the abnormal condition of the system; the log of the service software is subjected to NLP processing and can be suitable for different programming languages and different systems; the system log can be diagnosed on line, the running state of the system can be fed back in a near real-time manner, and the operation and maintenance cost is greatly reduced.

Claims (9)

1. An intelligent diagnosis method for network service link faults is characterized by comprising the following steps:
S1, sampling and extracting system service software logs;
S2, extracting characteristics of the sampled data and marking the sampled data as sample data;
S3, training the TID model by utilizing a PCA algorithm according to the sample data;
and S4, collecting system service software logs, analyzing the system service software logs of different types by using the TID model, obtaining a fault diagnosis result according to the analysis result, and completing fault diagnosis of the network service link.
2. The method according to claim 1, wherein the step S1 specifically comprises: within a certain time period, performing full-data sampling extraction of the INFO-level logs, DEBUG-level logs, WARN-level logs and ERROR-level logs of the system service software.
3. The method according to claim 2, wherein the step S2 specifically comprises: using natural language processing (NLP) to treat each log as a segment of text, extracting and marking key features of the log, and taking the key features as sample data; or
using a regular log feature extraction method, extracting and marking the key features of logs in different domains as sample data through keyword extraction or specified filtering rules.
4. The method according to claim 3, wherein using natural language processing (NLP) to treat each log as a segment of text, extracting and marking key features of the log, and taking the key features as sample data specifically comprises:
treating each log as a text by using Natural Language Processing (NLP);
acquiring a word vector of an input word in each text;
inputting the word vector into an encoder to obtain an information matrix C;
inputting the information matrix C into a decoder, and extracting the key features of the log by using the decoder;
and marking key features of the log as sample data.
5. The intelligent network service link fault diagnosis method according to claim 4, wherein the step S3 comprises the following steps:
S301, standardizing the sample data;
S302, calculating a covariance matrix from the standardized sample data;
S303, calculating the eigenvectors and eigenvalues of the covariance matrix by using singular value decomposition;
S304, calculating the variance contribution rate from the eigenvalues;
S305, judging whether the variance contribution rate is greater than 95%; if so, obtaining the number of principal components and proceeding to step S306; otherwise, returning to step S301;
S306, obtaining a result matrix according to the number of principal components and the eigenvectors, determining the m vectors of the corresponding principal components according to the result matrix, and outputting a TID model;
S307, in order to simulate data that have not yet appeared, splitting the available data into two parts, which are used as a training set and a test set, respectively;
S308, training the TID model with the training set, predicting the test set with the trained TID model, and optimizing the parameters of the TID model according to the prediction result, thereby completing the TID model training.
6. The method of claim 5, wherein the covariance matrix is expressed as follows:
[Formula image not reproduced: expression for the covariance matrix]
wherein the symbols shown as images in the source represent, respectively, the covariance matrix and the mean value of the variables; n represents the number of variables; i represents an index; and X_i represents the i-th variable.
7. The method of claim 6, wherein the variance contribution rate is expressed as follows:
[Formula image not reproduced: expression for the variance contribution rate]
wherein the symbols shown as images in the source represent, respectively, the variance contribution rate of the principal component, the variance of the k-th principal component, and the n-th eigenvalue.
8. The method according to claim 7, wherein the step S4 comprises the steps of:
S401, marking the collected system service software logs of different types;
S402, cleaning the marked log data, and dividing the cleaned, marked text data;
S403, replacing the words in the TID model and in the division result with word numbers, so as to form, respectively, a word index sequence corresponding to the TID model and a word index sequence corresponding to the verification set;
S404, mapping all word index sequences of step S403 to the dimensions of the log key features through vector calculation, and calculating the vector distance between the feature vector and the normal feature subspace;
S405, judging whether the vector distance exceeds a preset threshold; if so, determining that the network service link is abnormal; otherwise, determining that the network service link is not abnormal, and ending the process.
9. An intelligent diagnostic system for network service link failure, comprising:
the sampling module is used for sampling and extracting the system service software log;
the characteristic extraction module is used for extracting and marking the characteristics of the sampled data as sample data;
the model training module is used for training the TID model by utilizing a PCA algorithm according to the sample data;
and the fault diagnosis module is used for acquiring system service software logs, analyzing the system service software logs of different types by using the TID model, obtaining a fault diagnosis result according to the analysis result and completing fault diagnosis of the network service link.
CN202211420860.1A 2022-11-15 2022-11-15 Intelligent diagnosis method and system for network service link fault Pending CN115617953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211420860.1A CN115617953A (en) 2022-11-15 2022-11-15 Intelligent diagnosis method and system for network service link fault

Publications (1)

Publication Number Publication Date
CN115617953A true CN115617953A (en) 2023-01-17

Family

ID=84878591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211420860.1A Pending CN115617953A (en) 2022-11-15 2022-11-15 Intelligent diagnosis method and system for network service link fault

Country Status (1)

Country Link
CN (1) CN115617953A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118093325A (en) * 2024-04-28 2024-05-28 中国民航大学 Log template acquisition method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230117