CN116915582A

CN116915582A - Diagnosis and analysis method and device for fault root cause of communication terminal

Info

Publication number: CN116915582A
Application number: CN202311110498.2A
Authority: CN
Inventors: 许辰人; 徐昊天; 马翔天
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2023-08-30
Filing date: 2023-08-30
Publication date: 2023-10-20

Abstract

The invention relates to a method and a device for diagnosing and analyzing the root cause of a communication terminal fault, wherein the method at least comprises the following steps: converting the received radio frequency data of the terminal equipment into frequency domain feature data, and analyzing the frequency domain feature data based on a fault detection algorithm to judge whether the terminal equipment has faults or not; receiving monitoring index data of the terminal equipment in a normal state, and calculating nonlinear causal relation among indexes in a reinforcement learning mode; and judging the root cause index of the fault based on the nonlinear causal relationship under the condition that the fault occurs. Aiming at the defect that the root cause of the fault is difficult to find from observed business anomalies in the prior art, the invention assists operation and maintenance personnel in the industrial field to carry out the work of locating the root cause of the fault by finding the causal relation of industrial monitoring indexes.

Description

Diagnosis and analysis method and device for fault root cause of communication terminal

Technical Field

The invention relates to the technical field of intelligent operation and maintenance, in particular to a method and a device for diagnosing and analyzing the root cause of a communication terminal fault.

Background

Faults in industrial settings often arise from various causes, such as shielding by randomly moving metal objects, bursty interference in unlicensed bands, loose hardware antennas, radio frequency channel failures, and equipment configuration errors. Different faults occur at different levels and can affect equipment state indexes, communication physical parameters, network protocol indexes and the like. These indexes can be collected and observed while the system is running and used to monitor the occurrence of faults. However, a fault often affects more than a single index: through complex inter-equipment relations and the adaptation and propagation of multiple protocols, it causes further indexes to become abnormal, so that the root cause of the fault is difficult to find from the observed business anomalies alone.

For equipment fault operation and maintenance, the industrial field usually triggers fault alarms by setting thresholds. A business expert typically selects a suitable threshold according to the characteristics of key indexes and the corresponding application scenario, combined with historical data and experience. This approach demands a high level of domain expertise, requires a deep understanding of the physical meaning behind the indexes and the corresponding working principles, and relies too heavily on past experience. Simple threshold setting struggles to meet the requirements of efficient operation and maintenance when the environment changes highly dynamically or when similar scenarios call for different thresholds, and it also consumes considerable manpower. At present, the problem of diagnosing and analyzing the root cause of faults in industrial 5G-U communication equipment has not been well solved.

Patent application publication No. CN116032726A discloses a fault root cause positioning model training method, device, equipment and readable storage medium. The method comprises the following steps: acquiring historical fault data and historical KPI data of a plurality of sample network elements in a communication network; according to the historical fault data and the historical KPI data of each sample network element, a sample data set is obtained, and each group of sample data in the sample data set comprises an alarm feature vector generated based on the fault data and the historical KPI data of each group of sample network elements and a fault root cause corresponding to the alarm of the group of sample network elements; training a classification model based on an attention mechanism by using the sample data set to obtain a fault root cause positioning model; the fault root cause positioning model is used for predicting the fault root cause corresponding to the alarm of at least one target network element based on the alarm feature vector of the at least one target network element. The drawbacks of this prior art are: the invention directly locates the root cause of the fault, lacks causal relation analysis between data, and therefore lacks the interpretability of the root cause of the fault.

The patent application with publication number of CN111538316A discloses a performance-based fault diagnosis method and system for an actuator of a closed-loop control system. The method comprises the following steps: establishing a linear nominal model of the control system under the condition of no fault, and determining an identification model of the actual control system according to input data and output data of the actual control system; determining a first time domain performance residual according to the difference between the output of the linear nominal model under the closed-loop feedback and the output of the identification model under the closed-loop feedback; determining a first frequency domain performance residual error and a first stable domain performance residual error by adopting a gap measurement method according to a linear nominal model and an identification model; and performing fault detection according to the time domain performance residual error, the frequency domain performance residual error and the stable domain performance residual error between the linear nominal model and the identification model. Since this patent lacks analysis of nonlinear causal relationships, it is likely that the resulting fault is not the root cause of the fault.

Based on the above drawbacks, the present invention is expected to improve the efficiency of fault detection in a conventional industrial environment by a fault detection scheme for spectrogram learning and a root cause localization scheme for nonlinear causal reinforcement learning.

Furthermore, on the one hand, those skilled in the art may differ in their understanding; on the other hand, the applicant studied a large number of documents and patents when making the present invention, but space limitations prevent all details and contents from being listed. This by no means implies that the present invention lacks these prior art features; on the contrary, the present invention has all the features of the prior art, and the applicant reserves the right to add related prior art to the background section.

Disclosure of Invention

In the prior art, since the monitoring index data has the characteristics of multiple dimensions and multiple layers, such as the power intensity, the duty ratio and the bandwidth resource allocation proportion of an unauthorized channel are different, the scheduling response time delay, the throughput, the packet loss proportion, the memory utilization rate and the like of each layer of a network stack are different, so that the interaction relationship among the index data is complex, the fault propagation influence range is wide, and the interaction relationship is difficult to be described by establishing a unified model, so that the influence effect of faults is difficult to be analyzed.

The industrial field often uses judgment threshold values to detect faults, and has the problems that proper threshold values are difficult to select, data drift exists in practical deployment to cause different optimal threshold values in different factory environments, and maintenance personnel are difficult to analyze index association due to high complexity of the system, so that a more efficient fault detection and root cause analysis scheme is needed.

Aiming at the defects of the prior art, the invention provides a diagnostic analysis method for the fault root cause of a communication terminal, which at least comprises the following steps: converting the received radio frequency data of the terminal equipment into frequency domain feature data, and analyzing the frequency domain feature data based on a fault detection algorithm to judge whether the terminal equipment has faults or not; receiving monitoring index data of the terminal equipment in a normal state, and calculating nonlinear causal relation among indexes in a reinforcement learning mode; and judging the root cause index of the fault based on the nonlinear causal relationship under the condition that the fault occurs.

According to the fault detection scheme based on spectrogram learning and the root cause positioning scheme based on nonlinear causal reinforcement learning, the efficiency of fault detection in the traditional industrial environment is improved in an artificial intelligence mode, and the causal relation of industrial monitoring indexes is found, so that operation and maintenance personnel in the industrial field are assisted to perform fault root cause positioning, and the purposes of intelligent industrial operation and labor cost reduction are achieved. The invention has the capability of solving the causal relationship of the data, can provide a clear and interpretable analysis process in the analysis of the root cause of the fault, and has more positive significance for solving the fault.

Preferably, the method for analyzing the frequency domain feature data based on the fault detection algorithm to determine whether the terminal device has a fault at least includes: learning spectrogram features of the frequency domain feature data based on a convolutional neural network, and distinguishing a normal state and a fault state based on the spectrogram features; and judging whether the terminal equipment fails or not based on the waveform and the peak-valley characteristics of the spectrogram. According to the invention, the spectrum diagram features are learned through the convolutional neural network, and the 5G-U equipment in a normal working state can be distinguished from the 5G-U equipment in a fault state, so that whether the 5G-U equipment fails or not can be accurately detected.

Preferably, the method for calculating the nonlinear causal relationship between indexes by means of reinforcement learning at least comprises the following steps: generating an adjacency matrix according to the probability that a causal relationship exists between indexes; and evaluating the validity of edges in the index variable causal relationship graph, calculating a corresponding score from the sum of squares of the residuals, and using a matrix exponential to constrain the acyclicity of the causal relationship graph.

According to the method, the nonlinear causal relationship among the indexes is calculated, the effectiveness of the edges in the index variable causal relationship graph is evaluated to eliminate the invalid nonlinear causal relationship, and the accuracy of the nonlinear causal relationship is improved.

Preferably, the method for calculating the probability of causal relation between indexes at least comprises the following steps: and ordering the monitoring index data in a normal state according to time to form time sequence index data, encoding the time sequence index data, and decoding the time sequence index data based on a single-layer neural network to obtain the probability of causal relation between indexes.

The probability of the causal relation is obtained based on the time sequence index data, and the method has the following advantages: and finding objective causal relation rules existing between the observed time sequence index data. Compared with the mode of directly positioning the root cause through the neural network, the method can explain the propagation process and basis when the fault occurs to operation and maintenance personnel. Meanwhile, as the direct causal relation of the index data is generally stable and unchanged, the probability of the obtained causal relation is more universal, and the method is suitable for the condition of large data change range and cannot be randomly influenced by neural network parameter training.

Preferably, the method for calculating the nonlinear causal relationship between indexes by means of reinforcement learning further comprises: calculating a return value, and optimizing the return function by stochastic gradient descent during training, with the learning rate adjusted by exponential decay. The advantage of such training is that, because the original problem cannot be solved in polynomial time, stochastic gradient descent can quickly find an approximate solution, which makes the method practical. The invention can flexibly adjust the return function according to the application scenario and can therefore be adapted quickly to different application scenarios.

Preferably, the method for calculating the nonlinear causal relationship between indexes by means of reinforcement learning further comprises: randomly sampling, by the Monte Carlo method, a plurality of time series of interval T from the input monitoring index data; and, during training, recording the adjacency matrix corresponding to the maximum return value and outputting it as the result after training is completed, thereby completing the search process. Outputting the adjacency matrix corresponding to the maximum return value avoids the problem of local optima caused by randomness. Through probability sampling, the invention can explore a larger space of adjacency matrix values, so that a better solution can be found during the search, and by recording the optimal adjacency matrix, the graph that best matches the causal relationships of the index data can be obtained.
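The sampling and bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the windows are taken to be contiguous, and the function names `sample_windows` and `track_best`, as well as the toy data, are hypothetical and do not appear in the original text.

```python
import random


def sample_windows(series, window_len, count, rng):
    """Draw `count` random contiguous windows of length `window_len`."""
    last_start = len(series) - window_len
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [series[s:s + window_len] for s in starts]


def track_best(candidates):
    """Return the (adjacency matrix, return value) pair with the largest return."""
    best_adj, best_return = None, float("-inf")
    for adj, ret in candidates:
        if ret > best_return:
            best_adj, best_return = adj, ret
    return best_adj, best_return


# Hypothetical monitoring series and hypothetical (graph, return) pairs.
rng = random.Random(0)
windows = sample_windows(list(range(100)), window_len=10, count=5, rng=rng)
graph_a = [[0, 1], [0, 0]]
graph_b = [[0, 0], [1, 0]]
best_adj, best_return = track_best([(graph_a, -3.2), (graph_b, -1.5)])
```

Keeping the best graph seen so far, rather than the last one sampled, is what makes the output insensitive to an unlucky final sample.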

Preferably, when a fault occurs, the method for judging the root cause index of the fault based on the nonlinear causal relationship at least comprises: calculating the degree of abnormality of each index variable from the anomaly score of a variational autoencoder, and calculating the root cause score of each index variable node by weighting the anomaly score according to the distance between the index variable node and the root node.

According to the invention, root cause scores of each index variable node are calculated, and root cause indexes of faults are judged by accumulating the root cause scores.
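One possible reading of this scoring step is sketched below. The source does not give the weighting formula, so everything specific here is an assumption: each candidate node is treated as the root, the anomaly scores of the nodes reachable from it are weighted by `decay ** distance`, and the weighted scores are accumulated. The function name `root_cause_scores`, the `decay` parameter, and the anomaly values are hypothetical.

```python
from collections import deque


def root_cause_scores(adj, anomaly, decay=0.5):
    """Accumulate distance-weighted anomaly scores for every candidate root.

    adj[i][j] == 1 means index i is a cause of index j.
    The decay ** distance weighting is an assumption, not the source's formula.
    """
    n = len(adj)
    scores = []
    for root in range(n):
        dist = {root: 0}
        queue = deque([root])
        while queue:  # breadth-first search gives shortest distances
            node = queue.popleft()
            for child in range(n):
                if adj[node][child] and child not in dist:
                    dist[child] = dist[node] + 1
                    queue.append(child)
        scores.append(sum(anomaly[v] * decay ** d for v, d in dist.items()))
    return scores


# Hypothetical chain: SNR -> bit error rate -> packet loss -> picture quality.
names = ["snr", "bit_error_rate", "packet_loss", "picture_quality"]
adj = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
anomaly = [0.9, 0.8, 0.7, 0.6]  # made-up anomaly scores
scores = root_cause_scores(adj, anomaly)
suspect = names[max(range(len(names)), key=lambda k: scores[k])]
```

With these made-up numbers the upstream index accumulates the largest score, which matches the intuition that a root cause explains the anomalies of everything downstream of it.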

The invention also provides a device for diagnosing and analyzing the root cause of a communication terminal fault, which at least comprises: a fault detection module, configured to convert the received radio frequency data of the terminal equipment into frequency domain feature data and to analyze the frequency domain feature data based on a fault detection algorithm to judge whether the terminal equipment has a fault; a causal relationship construction module, configured to receive monitoring index data of the terminal equipment in a normal state and to calculate the nonlinear causal relationship among indexes by means of reinforcement learning; and a fault root cause positioning module, configured to judge the root cause index of the fault based on the nonlinear causal relationship when a fault occurs.

The device can improve the efficiency of fault detection in the traditional industrial environment in an artificial intelligence mode according to the radio frequency data and the monitoring index data of the terminal equipment, find the causal relationship of the industrial monitoring index, assist operation and maintenance personnel in the industrial field to perform the work of fault root cause positioning, and achieve the purposes of intelligent industrial operation and labor cost reduction.

Preferably, the fault detection module is configured to: learning spectrogram features of the frequency domain feature data based on a convolutional neural network, and distinguishing a normal state and a fault state based on the spectrogram features; and judging whether the terminal equipment fails or not based on the waveform and the peak-valley characteristics of the spectrogram.

The fault detection module can accurately distinguish the normal working state and the fault state based on the waveform and the peak-valley characteristic of the spectrogram, and the fault judgment accuracy is high.

Preferably, the causal relationship construction module is configured to: sort the monitoring index data in a normal state by time to form time series index data, encode the time series index data, and decode the time series index data based on a single-layer neural network to obtain the probability that a causal relationship exists between indexes; generate an adjacency matrix according to the probability that a causal relationship exists between indexes; and evaluate the validity of edges in the index variable causal relationship graph, calculate a corresponding score from the sum of squares of the residuals, and use a matrix exponential to constrain the acyclicity of the causal relationship graph.

The fault root cause positioning module provided by the invention can accurately extract and position the fault root cause based on the probability of causal relation among indexes, reduces the working difficulty of operation and maintenance personnel in analyzing the fault root cause, and reduces the labor cost.

Drawings

FIG. 1 is a schematic illustration of a communication connection for fault diagnosis of a preferred embodiment provided by the present invention;

FIG. 2 is a schematic flow chart of the method for diagnosing and analyzing the root cause of the communication terminal fault;

FIG. 3 is a schematic diagram of module connection of the communication terminal fault root cause diagnosis and analysis device provided by the invention;

FIG. 4 is a spectrum diagram of the normal state of the communication terminal fault root cause diagnosis and analysis device provided by the invention;

FIG. 5 is a spectrum diagram of a fault condition of a communication terminal fault root cause diagnosis and analysis apparatus provided by the present invention.

List of reference numerals

20: a base station; 30: a terminal device; 40: a fault root cause diagnosis and analysis device; 310: a data analysis module; 320: a fault detection module; 330: a causal relationship construction module; 340: a fault root cause positioning module.

Detailed Description

The following detailed description refers to the accompanying drawings.

For example, when other devices set their working frequency band incorrectly, or when interference sources are present, a co-channel interference fault can occur and interfere with an industrial 5G-U terminal. The signal-to-noise ratio of the received signal (the root cause index variable) drops abnormally, the bit error rate (an affected index variable) rises, the packet loss rate (an affected index variable) also rises, and the application data transmission speed falls, so that the quality of the monitoring picture (an affected index variable) is ultimately degraded.

Based on the defect, the invention hopefully provides the diagnosis and analysis method and the diagnosis and analysis device for the fault root cause of the communication terminal, the efficiency of fault detection in the traditional industrial environment can be improved in an artificial intelligence mode, and the causal relationship of industrial monitoring indexes is found, so that operation and maintenance personnel in the industrial field are assisted to perform the work of fault root cause positioning, and the effects of intelligent industrial operation and labor cost reduction are achieved.

The invention is described below with respect to some terms.

Radio frequency data: refers to wireless signal data transmitted and received by the 5G-U device through an antenna.

Spectrogram: a plot of the radio frequency data presented in the frequency domain after a fast Fourier transform.

Monitoring index data: the method at least comprises the relevant time sequence data of network communication such as signal-to-noise ratio, reference signal receiving power, modulation coding strategy, transmitting power allowance, round trip delay, packet loss rate and the like.

Root cause: the fundamental cause of a fault.

Root cause score: the degree of abnormality of the time series index data is an accumulated score obtained by a causal relation graph, and indicates the possibility that the index is the cause of a fault.

Nonlinear causal relationship: a causal relationship between two nonlinearly related variables (i.e., the two items being compared), in which, with all other variables held constant, a change in one variable does not produce a proportional change in the other.

Encoding: a process of converting information from one form or format to another.

Decoding: a process of restoring the digital code to what it represents or converting an electric pulse signal, an optical signal, a radio wave, or the like to information, data, or the like that it represents. Decoding is the process by which the recipient restores the received symbol or code to information, corresponding to the encoding process.

Example 1

FIG. 1 is a schematic view of one fault diagnosis scenario implemented by the present invention. The scenario includes at least the data analysis module 310, the base station 20, the terminal device 30, and the fault root cause diagnosis and analysis apparatus 40. The data analysis module 310 is communicatively connected to the terminal device 30 via the base station 20. The terminal device 30 is communicatively connected to the fault root cause diagnosis and analysis apparatus 40 in a wired and/or wireless manner.

Preferably, the terminal device 30 transmits wireless data packets in the unlicensed frequency band of 2.4GHz or 5.8GHz through the 5G-U wireless protocol, and then traffic for industrial applications is sent to the base station 20 through the industrial 5G-U terminal device and the service is completed.

Preferably, the fault root cause diagnosis and analysis apparatus 40 is configured to execute the fault root cause diagnosis and analysis method of the present invention and is capable of running the corresponding operating program. The fault root cause diagnosis and analysis apparatus 40 preferably executes the method using hardware such as an application-specific integrated circuit, a processor, a server, a micro server, or a cloud server.

Preferably, as shown in FIG. 3, the fault root cause diagnosis and analysis apparatus 40 includes at least: the fault detection module 320, the causal relationship construction module 330, and the fault root cause positioning module 340. The fault detection module 320, the causal relationship construction module 330 and the fault root cause positioning module 340 can be integrated in the same hardware and operate together, or they can belong to different pieces of hardware that are connected to operate together.

The fault detection module 320, the causal relationship construction module 330, and the fault root cause positioning module 340 are connected in sequence through a plurality of communication ports or communication lines and establish an information transmission relationship. The fault detection module 320 is communicatively connected to the terminal device 30 via at least one communication port in a wired and/or wireless manner to receive data from the terminal device 30.

Preferably, the fault root cause diagnosis and analysis apparatus 40 is communicatively connected to the data analysis module 310 through the base station 20 to perform steps S210 to S250, as shown in fig. 2.

The invention collects data in normal state and data in fault experiment. Specifically, the fault experiment includes three types, the first type is to use an interference source to perform co-channel interference (a second fault A2 in fig. 1), so that wireless data of a terminal is difficult to normally transmit; the second type is to adopt radio frequency channel faults, and the data transmission efficiency is reduced by applying attenuation; the third type is an antenna loose hardware failure (first failure A1 in fig. 1), which affects the wireless link by loosening the antenna.

In the experimental scenario, the base station 20 and the terminal device 30 are connected through an air interface, and transmit wireless data by adopting a 5G-U protocol. The base station 20 is connected to a first switch and then to a server of the same network segment. The terminal device 30 is connected to the second switch and then to the user application under the same network segment. The antenna spacing between the base station 20 and the terminal equipment 30 is 1 meter. The 4 antennas on the same terminal device 30 are spaced 2 cm apart.

S210: the data analysis module 310 is configured to obtain radio frequency data and monitoring index data from the terminal device 30 and the base station 20.

In the data collection process, the terminal device 30 collects and saves the I-channel and Q-channel data of the wireless communication process. At the same time, the base station 20 gathers a large amount of monitoring index data indicating the current operating state of the system. All monitoring index data is collected and stored in a database. Preferably, the database is provided in the base station 20. The fault root cause diagnosis and analysis apparatus 40 obtains the monitoring index data of the communication system in a normal working state and the index data of the communication system in a fault state by reading the database.

S220: the data analysis module 310 converts the radio frequency data of all symbols and antennas into corresponding frequency domain characteristic data through preprocessing. Preferably, the data analysis module 310 also extracts a spectrogram based on the frequency domain feature data.

The data analysis module 310 converts the radio frequency data into frequency domain feature data according to a fast fourier transform. The frequency domain characteristic data comprises a plurality of frequency spectrograms corresponding to the antennas and the symbols. The data analysis module 310 sends the frequency domain signature data to the fault detection module 320.
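The conversion from I/Q samples to a magnitude spectrum can be sketched as follows. This is a minimal pure-Python radix-2 FFT for illustration only; the function names, the 16-point window and the test tone are assumptions, and a production system would normally rely on an optimized FFT library.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(i_samples, q_samples):
    """Combine I and Q into complex samples and return per-bin amplitudes."""
    iq = [complex(i, q) for i, q in zip(i_samples, q_samples)]
    return [abs(x) for x in fft(iq)]


# Hypothetical input: a complex tone located at bin 3 of a 16-point window.
N, TONE_BIN = 16, 3
i_data = [math.cos(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
q_data = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
spectrum = magnitude_spectrum(i_data, q_data)
peak_bin = max(range(N), key=lambda k: spectrum[k])
```

Repeating this per antenna and per symbol would yield the set of spectra that the text refers to as frequency domain feature data.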

S230: the fault detection module 320 analyzes the frequency domain characteristic data based on a fault detection algorithm to determine whether the terminal device has a fault.

Specifically, the fault detection module 320 learns spectrogram features corresponding to the frequency domain feature data based on the convolutional neural network. The fault detection module 320 determines whether the communication system corresponding to the spectrogram belongs to a normal state or a fault state based on the spectrogram feature.

Preferably, the fault detection module 320 determines whether the terminal device 30 is faulty based on the waveform and the features of the peaks and valleys of the spectrogram. The fault detection module can accurately distinguish the normal working state and the fault state based on the waveform and the peak-valley characteristic of the spectrogram, and the fault judgment accuracy is high.

Specifically, the fault detection module 320 first divides the frequency domain feature data from the data analysis module 310 evenly by frequency band, using half of the data for training and the other half for testing. This ensures that the training data and the test data do not overlap in frequency band, the purpose being to demonstrate that the algorithm model generalizes well.
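A band-disjoint split of this kind might look like the sketch below. The source does not say which bands go to which half, so assigning the lower half of the sorted band identifiers to training is an assumption, as are the function name `split_by_band` and the placeholder data.

```python
def split_by_band(spectra_by_band):
    """Split {band_id: [spectrogram, ...]} so train and test share no band."""
    bands = sorted(spectra_by_band)
    half = len(bands) // 2
    train_bands, test_bands = bands[:half], bands[half:]
    train = [s for b in train_bands for s in spectra_by_band[b]]
    test = [s for b in test_bands for s in spectra_by_band[b]]
    return train_bands, test_bands, train, test


# Hypothetical data: four bands with three spectrogram placeholders each.
data = {band: [f"spec_{band}_{k}" for k in range(3)] for band in range(4)}
train_bands, test_bands, train, test = split_by_band(data)
```

Splitting by band rather than by sample prevents spectra from the same band from leaking between the two sets.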

Then, the fault detection module 320 combines the frequency domain feature data with the device antenna according to the 5G protocol symbol to form an image with a size of 10x1024, and inputs the image, and performs classification training according to the fault label by using the convolutional neural network, so as to obtain a trained fault detection model. The spectrogram obtained according to the mode at any moment can be input into the fault detection model to judge whether faults occur.

FIG. 4 shows a spectrogram in the normal state, and FIG. 5 shows a spectrogram in a fault state. In FIGS. 4 and 5, the horizontal axis represents frequency and the vertical axis represents amplitude. In FIG. 4, the frequency range 550-2000 is blank and contains no spectral features, whereas the range below 550 contains spectral features whose amplitude varies regularly.

In comparison with fig. 4, fig. 5 has several spectrum signals in the frequency domain of 550-2000. The amplitude of the radio frequency signal varies in an irregular state, and the amplitude of the variation becomes significantly smaller. Obviously, the spectrogram in the normal state and the spectrogram in the fault state are obviously different in the shape of the frequency spectrum. The fault detection model formed based on the convolutional neural network can learn based on the difference characteristics of the spectrogram in the normal state and the spectrogram in the fault state on the spectrum characteristics, and the effect of distinguishing the two states is achieved.

Preferably, the learning model within the fault detection module 320 is not limited to a convolutional neural network model, but may be a deep reinforcement learning model or the like.

Because the fault detection module 320 directly learns the spectral features of the spectrogram, and the device spectrogram in the fault state and the spectrogram in the normal state have clear visible shape differences, the convolutional neural network can quickly learn and distinguish the two states, so as to judge whether the fault occurs.

Therefore, the fault detection module 320 judges faults with high accuracy, and the fault detection model trains quickly, has low computing power requirements, and is applicable to a wide range of scenarios.

S240: the causal relationship construction module 330 receives the monitoring index data of the terminal device 30 in the normal state, and calculates the nonlinear causal relationship between the indexes by means of reinforcement learning.

Specifically, the causal relationship construction module 330 sorts the monitoring index data in the normal state by time to form time series index data. The causal relationship construction module 330 encodes the time series index data. For example, for the time series index data RSRP, the sampled values within the time interval T are taken as input and encoded by a gated recurrent unit (GRU) to obtain the value $h_i$.

The causal relationship construction module 330 decodes the time series index data based on a single-layer neural network to obtain the probability that a causal relationship exists between the indexes. Decoding yields the function $p(e_{ij} \in E)$, which is calculated as described below.

The causal relationship construction module 330 generates an adjacency matrix according to the probability that causal relationships exist between the indicators. The function of the adjacency matrix obtained by decoding is:

$$p(e_{ij} \in E) = \sigma\left(u^{T} \tanh\left(W_1 h_i + W_2 h_j\right)\right)$$

wherein $e_{ij}$ indicates that the i-th variable is a cause of the j-th variable; $W_1$, $W_2$ and $u$ represent the neural network parameters; the outer $\sigma$ represents the sigmoid activation function, which maps its argument to the probability of an edge; E represents the set of edges in the causal relationship graph; T represents the time interval (as a superscript on $u$, it denotes the transpose); $h_i$ represents the encoded value vector of the i-th index in the time-series index data; and $h_j$ represents the encoded value vector of the j-th index in the time-series index data. Finally, the value of each element of the adjacency matrix is generated by sampling from a Bernoulli distribution with this probability. The adjacency matrix indicates which edges are present in the causal graph. Each adjacency matrix corresponds to one determined causal graph.
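The decoding and sampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names `edge_probability` and `sample_adjacency`, the toy dimensions, and the exclusion of self-loops are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_probability(h_i, h_j, W1, W2, u):
    """p(e_ij in E) = sigmoid(u^T tanh(W1 h_i + W2 h_j)) for plain Python lists."""
    hidden = []
    for row1, row2 in zip(W1, W2):
        z = sum(a * b for a, b in zip(row1, h_i))
        z += sum(a * b for a, b in zip(row2, h_j))
        hidden.append(math.tanh(z))
    return sigmoid(sum(a * b for a, b in zip(u, hidden)))


def sample_adjacency(encodings, W1, W2, u, rng):
    """Sample a 0/1 adjacency matrix, one Bernoulli draw per ordered pair (i, j)."""
    n = len(encodings)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: an index is never its own cause
            p = edge_probability(encodings[i], encodings[j], W1, W2, u)
            adj[i][j] = 1 if rng.random() < p else 0
    return adj


# Toy usage: three encoded indexes of dimension 2, hidden size 2 (made-up values).
rng = random.Random(0)
encodings = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]
W1 = [[0.1, 0.2], [0.3, -0.1]]
W2 = [[-0.2, 0.4], [0.1, 0.1]]
u = [0.5, -0.5]
adjacency = sample_adjacency(encodings, W1, W2, u, rng)
```

Because every entry is drawn independently, the sampled graph may contain cycles, which is why a separate acyclicity constraint is needed during the search.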

After generating the adjacency matrix, the causal relationship construction module 330 evaluates the validity of the edges in the index variable causal relationship graph. Preferably, the causal relationship construction module 330 evaluates the validity of the edges based on a calculated score. The smaller Score(G) is, the better: a smaller score indicates that the edges in the causal graph reflect the real causal relationships more closely.

The causal relationship construction module 330 calculates a corresponding score from the sum of squares of the residuals.

Specifically, the calculation function for calculating the corresponding score from the sum of squares of the residuals includes:

wherein Score(G) represents the score, which is computed from the sum of squares of the residuals; |E| represents the number of edges in the causal graph; and m represents the number of data samples.
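The score formula itself is not reproduced in this text. Purely as an illustration of how a score can combine the three quantities named above, the sketch below uses a BIC-style form, m·log(RSS/m) + |E|·log(m). That form, the function name `bic_style_score`, and the parameter names are assumptions and should not be read as the formula of the invention.

```python
import math


def bic_style_score(rss, num_edges, m):
    """Assumed BIC-style score: fit term plus a penalty per edge.

    rss        sum of squares of the residuals (must be > 0)
    num_edges  |E|, number of edges in the causal graph
    m          number of data samples
    """
    return m * math.log(rss / m) + num_edges * math.log(m)


# A graph that fits equally well with fewer edges gets the smaller (better) score.
sparse = bic_style_score(rss=10.0, num_edges=2, m=100)
dense = bic_style_score(rss=10.0, num_edges=3, m=100)
```

Under this assumed form, adding an edge is only worthwhile if it reduces the residual sum of squares by enough to offset the log(m) penalty.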

The causal relationship construction module 330 employs matrix indexes to constrain the acyclic nature of the causal relationship graph.

Specifically, the causal relationship construction module 330 calculates the return value as a function of:

wherein r represents the return value; $\lambda_1$ represents the weight that adjusts for whether the graph is a directed acyclic graph (if so, the weight is $\lambda_1$; otherwise, the weight is 0); $\lambda_2$ represents the parameter of the matrix index constraint; G represents the searched directed graph; DAGs represents the set of all directed acyclic graphs; and h(W) represents the matrix index.

The acyclicity of the searched directed graph can be measured by the matrix index. The more cycles the directed graph contains, the larger the calculated matrix index and the lower the return value. The causal relationship construction module 330 ranks the return values and searches in the direction of increasing return value, which makes directed acyclic graphs easier to find.
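As a hedged illustration of how such an acyclicity measure behaves, the sketch below assumes that the "matrix index" corresponds to the widely used matrix-exponential measure h(A) = tr(e^A) − d for a d-node 0/1 adjacency matrix A. That reading, the truncated power series, and the function names are assumptions; the text does not give the formula.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(adj, terms=20):
    """Approximate h(A) = trace(exp(A)) - d with a truncated power series.

    trace(A^k) counts closed walks of length k, so h is 0 for a directed
    acyclic graph and grows as the graph contains more cycles.
    """
    d = len(adj)
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # A^0 / 0!
    h = 0.0
    for k in range(1, terms + 1):
        power = matmul(power, adj)
        power = [[x / k for x in row] for row in power]  # now A^k / k!
        h += sum(power[i][i] for i in range(d))
    return h


chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]      # 0 -> 1 -> 2, no cycle
one_cycle = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # 0 <-> 1
dense = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]      # every pair forms a cycle
h_chain, h_one, h_dense = acyclicity(chain), acyclicity(one_cycle), acyclicity(dense)
```

A measure of this kind is graded rather than binary, so a search guided by it can tell a nearly acyclic graph from a heavily cyclic one.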

Based on the probabilities of causal relationships between the indexes, the causal relationship construction module 330 enables the fault root cause to be extracted and located accurately, which reduces the difficulty for operation and maintenance personnel of analyzing the fault root cause and lowers labor costs.

Preferably, during training, the causal relationship construction module 330 optimizes the return function by stochastic gradient descent. Specifically, stochastic gradient descent is performed with an adaptive moment estimation (Adam) optimizer, and the learning rate is adjusted by exponential decay. With i denoting the i-th round:

$$lr_i = lr_0 \times \gamma^{i}$$

where $lr_i$ denotes the learning rate in round i, $lr_0$ denotes the initial learning rate, and $\gamma$ denotes the decay factor.
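The exponential decay schedule can be written out directly. The sketch below is illustrative only; the function name `decayed_learning_rate` and the numeric values are assumptions chosen for the example.

```python
def decayed_learning_rate(initial_lr, gamma, round_index):
    """Learning rate in a given round: the initial rate times gamma ** round_index."""
    return initial_lr * gamma ** round_index


# Example schedule for the first five rounds with made-up values.
schedule = [decayed_learning_rate(0.1, 0.5, i) for i in range(5)]
```

With a decay factor between 0 and 1, the rate starts at the initial value and shrinks monotonically, which matches the behaviour described for the later stage of training.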

The causal relationship construction module 330 adjusts the learning rate in an exponentially decaying manner. The advantage of such training is that a larger learning rate at the start quickly yields a reasonably good solution, and as the iterations continue the learning rate is gradually reduced, so that the model is more stable in the later stage of training.

Preferably, the causal relationship construction module 330 uses Monte Carlo random sampling to draw a plurality of time sequences of interval T from the input monitoring index data.
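One way to picture this sampling is to draw random start positions and cut a window of length T from every index at the same position. The sketch below assumes equally long, time-aligned series; the function name `sample_windows` and the index name `SINR` are hypothetical (only RSRP is named in the description).

```python
import random


def sample_windows(series, T, num_windows, rng):
    """Draw num_windows random, time-aligned windows of length T.

    series maps an index name to its time-ordered list of samples; all
    lists are assumed to have the same length.
    """
    length = len(next(iter(series.values())))
    windows = []
    for _ in range(num_windows):
        start = rng.randrange(length - T + 1)
        windows.append({name: values[start:start + T]
                        for name, values in series.items()})
    return windows


series = {"RSRP": list(range(10)), "SINR": list(range(10, 20))}
windows = sample_windows(series, T=4, num_windows=3, rng=random.Random(0))
```

Using the same start position for every index keeps the samples of different indexes aligned in time, which the pairwise encoding and decoding steps rely on.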

During training, the causal relationship construction module 330 records the adjacency matrix corresponding to the maximum return value. After training is finished, this adjacency matrix is output as the result, which completes the search process. The advantage of recording the adjacency matrix corresponding to the maximum return value is that its causal relationship graph is the one closest to the real causal relationships among those searched, so the best solution found is retained.
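The bookkeeping described above amounts to keeping the best candidate seen so far. In the minimal sketch below, `sample_graph` and `reward` are hypothetical callables standing in for the graph generator and the return function; they are not defined by the description.

```python
def search_best_graph(sample_graph, reward, rounds):
    """Keep the adjacency matrix with the largest return value seen so far."""
    best_adj, best_return = None, float("-inf")
    for _ in range(rounds):
        adj = sample_graph()
        r = reward(adj)
        if r > best_return:
            # Copy the matrix so later changes to adj cannot alter the record.
            best_adj, best_return = [row[:] for row in adj], r
    return best_adj, best_return


# Toy usage: three fixed candidates and a toy return that prefers exactly one edge.
candidates = iter([[[0, 1], [1, 0]], [[0, 1], [0, 0]], [[0, 0], [0, 0]]])
best_adj, best_return = search_best_graph(
    sample_graph=lambda: next(candidates),
    reward=lambda adj: -abs(sum(map(sum, adj)) - 1),
    rounds=3,
)
```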

The causal relationship construction module 330 outputs an adjacency matrix corresponding to the maximum return value to the fault root location module 340.

S250: in the event of a fault, the fault root location module 340 determines a root indicator of the fault based on nonlinear causal relationships.

Specifically, the degree of abnormality of each index variable is calculated as an anomaly score based on a variational autoencoder, and the root cause score of each index variable node is calculated by weighting the anomaly scores according to the distance between the index variable node and the root cause node.

Preferably, the variational autoencoder algorithm calculates the degree of abnormality of each individual time-series index variable from its degree of deviation from the normal state. Specifically, following the method of "Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications", the reconstruction probability of the data of a single time-series index variable at the fault moment is used as the anomaly score $SA_k$.

When a fault occurs, a single index variable acts as the single cause of the fault, and its abnormal change affects all of its outcome variables and the variables further downstream, so the abnormality propagates through the causal graph. The anomaly scores are weighted by the distance between the index variable node and the root cause node in order to account for the attenuation of this influence during propagation, so that the root cause index variable most likely to correspond to the fault can be calculated.

The method for calculating the root cause score of the index variable node comprises the following steps:

wherein $\lambda^{d}$ represents a discount factor, whose purpose is to make the anomaly scores of index variables closer to the root cause contribute more to the final root cause score, and d represents the distance between the index variable node $X_k$ and the source node $X_i$ in the causal relationship graph. The anomaly score $SA_k$ is calculated by the variational autoencoder. Finally, the index variable node with the highest root cause score is selected as the located fault root cause.

The index variable with the highest root cause score represents the root cause of the fault. The operation and maintenance personnel can maintain the communication system according to the root cause of the fault.
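The root cause score formula is not reproduced in this text. The sketch below shows one plausible reading, stated as an assumption: the score of a candidate source node is the sum, over every node reachable from it in the causal graph, of that node's anomaly score discounted by the discount factor raised to the graph distance. The function names and toy values are hypothetical.

```python
from collections import deque


def distances_from(adj, source):
    """Shortest hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt, has_edge in enumerate(adj[node]):
            if has_edge and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def root_cause_scores(adj, anomaly, lam):
    """Assumed form: score_i = sum over reachable k of lam ** d(i, k) * anomaly[k]."""
    scores = []
    for i in range(len(adj)):
        dist = distances_from(adj, i)
        scores.append(sum(lam ** d * anomaly[k] for k, d in dist.items()))
    return scores


# Toy causal chain 0 -> 1 -> 2 in which all three indexes look equally abnormal.
adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
scores = root_cause_scores(adj, anomaly=[1.0, 1.0, 1.0], lam=0.5)
root = max(range(len(scores)), key=scores.__getitem__)
```

In this toy chain the upstream node collects discounted evidence from everything it influences, so it receives the highest score even though all three anomaly scores are equal.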

On the basis of determining the root cause of the fault, the invention also provides a communication early warning system or a communication safety system that issues early warnings according to the root cause of the fault, so the invention can be used for automatic early warning of communication system faults. On the same basis, the invention also provides a scheduling system capable of selecting a dispatch scheme for operation and maintenance personnel according to the fault root cause, so as to improve the efficiency of dispatching operation and maintenance personnel.

In summary, the fault detection scheme based on spectrogram learning and the root cause locating scheme based on nonlinear causal reinforcement learning use artificial intelligence to improve the efficiency of fault detection in traditional industrial environments. By finding the causal relationships between industrial monitoring indexes, they assist operation and maintenance personnel in the industrial field in locating fault root causes, improve their work efficiency, and reduce the difficulty of operation and maintenance. The early warning system or the scheduling system further helps to improve the operating efficiency and the personnel scheduling efficiency of the operation and maintenance system, and reduces or even avoids situations in which operation and maintenance personnel have to be replaced because the appropriate personnel could not be determined in advance.

It should be noted that the above-described embodiments are exemplary, and that a person skilled in the art, in light of the present disclosure, may devise various solutions that fall within the scope of the present disclosure and within the scope of protection of the invention. It should be understood by those skilled in the art that the present description and drawings are illustrative and do not limit the claims. The scope of the invention is defined by the claims and their equivalents. The description of the invention includes a plurality of inventive concepts; expressions such as "preferably", "according to a preferred embodiment" or "optionally" each mean that the corresponding paragraph discloses a separate concept, and the applicant reserves the right to file a divisional application according to each inventive concept.

Claims

1. A method for diagnosing and analyzing the root cause of a communication terminal fault, the method at least comprising: the received radio frequency data of the terminal device (30) is converted into frequency domain characteristic data,

analyzing the frequency domain feature data based on a fault detection algorithm to determine whether the terminal device (30) is faulty;

receiving monitoring index data of the terminal equipment (30) in a normal state, and calculating nonlinear causal relation among indexes in a reinforcement learning mode;

and judging the root cause index of the fault based on the nonlinear causal relationship under the condition that the fault occurs.

2. The method for diagnosing and analyzing the root cause of a communication terminal fault according to claim 1, wherein the method for analyzing the frequency domain characteristic data based on a fault detection algorithm to determine whether the terminal device (30) is faulty comprises at least:

learning spectrogram features of the frequency domain feature data based on a convolutional neural network,

distinguishing a normal state from a fault state based on the spectrogram features;

and judging whether the terminal equipment (30) fails or not based on the waveform and the peak-valley characteristics of the spectrogram.

3. The method for diagnosing and analyzing the root cause of a fault of a communication terminal according to claim 1 or 2, wherein the method for calculating the nonlinear causal relationship between the indexes by means of reinforcement learning at least comprises:

generating an adjacency matrix according to the probability of causal relation between indexes;

evaluating the validity of edges in the index variable causal relationship graph, and calculating a corresponding score according to the sum of squares of the residual errors,

the acyclic nature of the causal graph is constrained using matrix indexes.

4. A communication terminal fault root cause diagnosis and analysis method according to any one of claims 1 to 3, wherein the calculation method of the probability of causal relation between indexes comprises at least:

the monitoring index data in the normal state are ordered according to time to form time sequence index data,

encoding the time sequence index data,

and decoding the time sequence index data based on a single-layer neural network to obtain the probability of causal relation between indexes.

5. The method for diagnosing and analyzing a root cause of a fault in a communication terminal according to claim 4, wherein the method for calculating a nonlinear causal relationship between indexes by means of reinforcement learning further comprises:

calculating a return value,

in the training process, optimizing a return function based on a stochastic gradient descent method;

the learning rate is adjusted in an exponentially decaying manner.

6. The method for diagnosing and analyzing a root cause of a fault in a communication terminal according to claim 4, wherein the method for calculating a nonlinear causal relationship between indexes by means of reinforcement learning further comprises:

randomly sampling a plurality of T interval time sequences for the input monitoring index data through Monte Carlo;

in the training process, the adjacency matrix corresponding to the maximum return value is recorded, and is output as the result after the training is completed, so that the searching process is completed.

7. The method according to any one of claims 1 to 6, wherein, in the case of occurrence of a fault, the method for determining a root indicator of the fault based on the nonlinear causal relationship comprises at least:

calculating the degree of abnormality of the index variable based on the anomaly score of the variational autoencoder,

the weight of the anomaly score is given based on the distance between the index variable node and the root cause node,

and calculating the root cause score of the index variable node.

8. A communication terminal fault root cause diagnostic and analytical apparatus, said apparatus comprising at least:

fault detection module (320): analyzing the frequency domain characteristic data based on a fault detection algorithm to determine whether the terminal device (30) has a fault;

a causality construction module (330): the method comprises the steps of receiving monitoring index data of the terminal equipment (30) in a normal state, and calculating nonlinear causal relation among indexes in a reinforcement learning mode;

fault root cause location module (340): and judging the root cause index of the fault based on the nonlinear causal relationship under the condition that the fault occurs.

9. The communication terminal fault root cause diagnostic analysis device of claim 8, wherein the fault detection module (320) is configured to:

10. The communication terminal fault root cause diagnostic analysis device of claim 8 or 9, wherein the causal relationship construction module (330) is configured to:

encoding the time sequence index data,

decoding the time sequence index data based on a single-layer neural network to obtain the probability of causal relation between indexes;

the acyclic nature of the causal graph is constrained using matrix indexes.