CN116915582A - Diagnosis and analysis method and device for fault root cause of communication terminal - Google Patents

Publication number
CN116915582A
Authority
CN
China
Prior art keywords
fault
root cause
data
indexes
causal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311110498.2A
Other languages
Chinese (zh)
Inventor
许辰人
徐昊天
马翔天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN202311110498.2A
Publication of CN116915582A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Monitoring And Testing Of Transmission In General (AREA)

Abstract

The invention relates to a method and a device for diagnosing and analyzing the root cause of a communication terminal fault. The method comprises at least the following steps: converting received radio frequency data of the terminal equipment into frequency domain feature data, and analyzing the frequency domain feature data with a fault detection algorithm to determine whether the terminal equipment has a fault; receiving monitoring index data of the terminal equipment in a normal state, and calculating the nonlinear causal relationships among the indexes by reinforcement learning; and, when a fault occurs, determining the root cause index of the fault based on the nonlinear causal relationships. To address the difficulty in the prior art of finding the root cause of a fault from observed service anomalies, the invention discovers the causal relationships among industrial monitoring indexes and thereby assists operation and maintenance personnel in the industrial field in locating the root cause of a fault.

Description

Diagnosis and analysis method and device for fault root cause of communication terminal
Technical Field
The invention relates to the technical field of intelligent operation and maintenance, in particular to a method and a device for diagnosing and analyzing the root cause of a communication terminal fault.
Background
Faults in industrial settings arise from many causes, such as shielding by randomly moving metal objects, bursty interference in unlicensed bands, loose antenna hardware, radio frequency channel failures, and equipment configuration errors. Different faults occur at different layers and can affect equipment state indexes, communication physical parameters, network protocol indexes, and so on. These indexes can be collected and observed while the system is running and used to monitor for faults. However, a fault often affects more than a single index: it spreads through complex inter-equipment relationships and through the adaptation of multiple protocols, so that many indexes become abnormal. As a result, the root cause of a fault is difficult to find from the observed service anomalies alone.
In the industrial field, equipment operation and maintenance usually triggers fault alarms by setting thresholds. A domain expert typically selects a suitable threshold from the characteristics of the key indexes and the application scenario, combined with historical data and experience. This approach demands considerable domain expertise, including a deep understanding of the physical meaning of each index and the underlying working principles, and it relies heavily on past experience. Simple threshold setting struggles to meet the need for efficient operation and maintenance when the environment changes rapidly or when similar scenarios call for different thresholds, and it consumes substantial manpower. At present, the problem of diagnosing and analyzing the root cause of faults in industrial 5G-U communication equipment has not been well solved.
Patent application publication No. CN116032726A discloses a method, device, and equipment for training a fault root cause positioning model, and a readable storage medium. The method comprises: acquiring historical fault data and historical KPI data of a plurality of sample network elements in a communication network; obtaining a sample data set from the historical fault data and historical KPI data of each sample network element, where each group of sample data in the set comprises an alarm feature vector generated from the fault data and historical KPI data of a group of sample network elements together with the fault root cause corresponding to the alarm of that group; and training an attention-based classification model on the sample data set to obtain a fault root cause positioning model. The fault root cause positioning model predicts the fault root cause corresponding to the alarm of at least one target network element from the alarm feature vector of that network element. The drawback of this prior art is that it locates the root cause of the fault directly, without analyzing the causal relationships among the data, and therefore the located root cause lacks interpretability.
Patent application publication No. CN111538316A discloses a performance-based fault diagnosis method and system for the actuator of a closed-loop control system. The method comprises: establishing a linear nominal model of the control system in the fault-free condition, and determining an identification model of the actual control system from its input and output data; determining a first time domain performance residual from the difference between the output of the linear nominal model under closed-loop feedback and the output of the identification model under closed-loop feedback; determining a first frequency domain performance residual and a first stable domain performance residual from the linear nominal model and the identification model using a gap metric method; and performing fault detection based on the time domain, frequency domain, and stable domain performance residuals between the linear nominal model and the identification model. Because this patent does not analyze nonlinear causal relationships, the fault it detects may well not be the root cause of the fault.
In view of the above drawbacks, the present invention aims to improve the efficiency of fault detection in conventional industrial environments through a fault detection scheme based on spectrogram learning and a root cause localization scheme based on nonlinear causal reinforcement learning.
Furthermore, on the one hand, those skilled in the art may differ in their understanding; on the other hand, the applicant studied a large number of documents and patents in making the present invention, and space does not permit listing all of their details and contents. This by no means implies that the present invention lacks these prior art features; on the contrary, the present invention has all the features of the prior art, and the applicant reserves the right to add related prior art to the background section.
Disclosure of Invention
In the prior art, monitoring index data is multi-dimensional and multi-layered: for example, the power intensity, duty cycle, and bandwidth resource allocation ratio of unlicensed channels differ, as do the scheduling response delay, throughput, packet loss ratio, memory utilization, and so on at each layer of the network stack. Consequently, the interactions among the index data are complex and a fault propagates over a wide range. These interactions are difficult to describe with a unified model, so the effects of a fault are difficult to analyze.
The industrial field often detects faults by comparing indexes against thresholds. This approach has several problems: suitable thresholds are difficult to select; data drift in actual deployments means the optimal threshold differs between factory environments; and the high complexity of the system makes it difficult for maintenance personnel to analyze how the indexes are related. A more efficient scheme for fault detection and root cause analysis is therefore needed.
To address the defects of the prior art, the invention provides a method for diagnosing and analyzing the root cause of a communication terminal fault, comprising at least the following steps: converting received radio frequency data of the terminal equipment into frequency domain feature data, and analyzing the frequency domain feature data with a fault detection algorithm to determine whether the terminal equipment has a fault; receiving monitoring index data of the terminal equipment in a normal state, and calculating the nonlinear causal relationships among the indexes by reinforcement learning; and, when a fault occurs, determining the root cause index of the fault based on the nonlinear causal relationships.
With a fault detection scheme based on spectrogram learning and a root cause localization scheme based on nonlinear causal reinforcement learning, the invention uses artificial intelligence to improve the efficiency of fault detection in conventional industrial environments and discovers the causal relationships among industrial monitoring indexes. It thereby assists operation and maintenance personnel in the industrial field in locating the root cause of a fault, making industrial operation more intelligent and reducing labor cost. Because the invention can solve for the causal relationships in the data, it provides a clear and interpretable analysis process for root cause analysis, which helps in resolving the fault.
Preferably, analyzing the frequency domain feature data with the fault detection algorithm to determine whether the terminal equipment has a fault comprises at least: learning spectrogram features of the frequency domain feature data with a convolutional neural network, and distinguishing the normal state from the fault state based on the spectrogram features; and determining whether the terminal equipment has failed based on the waveform and the peak and valley features of the spectrogram. By learning the spectrogram features with a convolutional neural network, the invention can distinguish 5G-U equipment in a normal working state from 5G-U equipment in a fault state, and can thus accurately detect whether the 5G-U equipment has failed.
Preferably, calculating the nonlinear causal relationships among the indexes by reinforcement learning comprises at least: generating an adjacency matrix from the probabilities of causal relationships between indexes; and evaluating the validity of the edges in the causal graph of the index variables, calculating a corresponding score from the residual sum of squares, and using a matrix exponential to constrain the causal graph to be acyclic.
In this way, the nonlinear causal relationships among the indexes are calculated, and evaluating the validity of the edges in the causal graph of the index variables eliminates invalid nonlinear causal relationships, which improves the accuracy of the relationships that remain.
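One common way to express an acyclicity constraint through a matrix exponential is h(A) = tr(exp(A * A)) - d, where A * A is the elementwise square of the d x d adjacency matrix; h(A) is zero exactly when the graph has no directed cycle. The patent text does not state the exact form it uses, so the formula, the function names, and the truncated Taylor series below are assumptions made for illustration only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(adj, terms=20):
    """Assumed form h(A) = trace(exp(A*A)) - d, via a truncated Taylor series.

    Returns 0.0 for an acyclic graph and a positive value when a cycle exists.
    """
    d = len(adj)
    squared = [[adj[i][j] ** 2 for j in range(d)] for i in range(d)]
    # term holds (A*A)^k / k!, starting from the identity matrix (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, squared)]
        trace += sum(term[i][i] for i in range(d))
    return trace - d


# Hypothetical graphs: a chain 0 -> 1 -> 2 (acyclic) and a 2-cycle 0 <-> 1.
dag = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
cyclic = [[0, 1], [1, 0]]
print(acyclicity_penalty(dag), acyclicity_penalty(cyclic))
```

A penalty of this kind can be subtracted from the score of a candidate graph so that cyclic candidates are discouraged during the search.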
Preferably, calculating the probabilities of causal relationships between indexes comprises at least: ordering the monitoring index data in the normal state by time to form time series index data, encoding the time series index data, and decoding the encoded data with a single-layer neural network to obtain the probabilities of causal relationships between indexes.
Obtaining the causal relationship probabilities from time series index data has the following advantages. It discovers the objective causal regularities that exist among the observed time series index data. Compared with locating the root cause directly through a neural network, it can explain to operation and maintenance personnel how a fault propagates and on what basis the conclusion rests. Moreover, because the direct causal relationships among index data are generally stable, the resulting probabilities are more general: they remain applicable when the data varies over a wide range and are not subject to the randomness of neural network parameter training.
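As a minimal sketch of how an adjacency matrix might be drawn from pairwise causal probabilities, the example below assumes the decoder outputs one logit per ordered index pair, turns each logit into a probability with a sigmoid, and samples each edge as an independent Bernoulli variable. The logits, the function names, and the independent-sampling rule are hypothetical; the patent text only states that the probabilities come from a single-layer neural network decoder.

```python
import math
import random


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_adjacency(logits, rng):
    """Sample a binary adjacency matrix; entry [i][j] = 1 means i -> j.

    Each off-diagonal edge is drawn independently with probability
    sigmoid(logits[i][j]); self-loops are never sampled.
    """
    d = len(logits)
    adj = [[0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j and rng.random() < sigmoid(logits[i][j]):
                adj[i][j] = 1
    return adj


# Hypothetical logits for three indexes.
rng = random.Random(0)
example_logits = [[0.0, 2.0, -1.0], [-3.0, 0.0, 1.5], [-2.0, -2.0, 0.0]]
print(sample_adjacency(example_logits, rng))
```

Sampling rather than thresholding lets repeated draws explore different candidate graphs from the same probabilities.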
Preferably, calculating the nonlinear causal relationships among the indexes by reinforcement learning further comprises: calculating a return value, and optimizing the return function by stochastic gradient descent during training, with the learning rate adjusted by exponential decay. The advantage of training in this way is that, because searching for the best causal graph is NP-hard and no polynomial-time exact solution is known, stochastic gradient descent quickly yields an approximate solution and is therefore practical. The return function can also be adjusted flexibly for the application scenario, so the invention can be adapted quickly to different scenarios.
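An exponentially decaying learning rate can be sketched as follows. The initial rate, decay rate, and decay interval are hypothetical values chosen for illustration; the patent text does not give the actual hyperparameters.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates: initial_lr * decay_rate^(step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


# Hypothetical schedule: start at 0.1 and multiply by 0.96 every 100 steps.
schedule = [exponential_decay(0.1, 0.96, 100, s) for s in range(0, 500, 100)]
print(schedule)
```

A larger rate early on speeds up the search, while the decayed rate later keeps the updates from oscillating around a good solution.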
Preferably, calculating the nonlinear causal relationships among the indexes by reinforcement learning further comprises: randomly drawing, by Monte Carlo sampling, a plurality of time series segments of length T from the input monitoring index data; and, during training, recording the adjacency matrix that corresponds to the maximum return value and outputting it as the result when training is complete, which ends the search. Outputting the adjacency matrix with the maximum return value avoids ending in a local optimum caused by randomness. Probabilistic sampling lets the search cover a larger part of the space of adjacency matrices, so a better solution can be found, and recording the best adjacency matrix yields the graph that best matches the causal relationships in the index data.
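The two ideas in this step, drawing random length-T segments and keeping the best-scoring adjacency matrix, can be sketched as below. The helper names, the toy series, and the toy return function are hypothetical; in the method described, the return value would come from the score and the acyclicity constraint rather than from the stand-in used here.

```python
import random


def sample_windows(series, window_len, num_windows, rng):
    """Draw contiguous segments of length window_len at random start positions."""
    max_start = len(series) - window_len
    starts = [rng.randint(0, max_start) for _ in range(num_windows)]
    return [series[s:s + window_len] for s in starts]


def keep_best(candidates, return_fn):
    """Return the candidate adjacency matrix with the largest return value."""
    best_adj, best_return = None, float("-inf")
    for adj in candidates:
        value = return_fn(adj)
        if value > best_return:
            best_adj, best_return = adj, value
    return best_adj, best_return


rng = random.Random(1)
series = list(range(100))              # stand-in for one monitoring index
windows = sample_windows(series, 10, 4, rng)

# Toy return function: prefer graphs with fewer edges (illustration only).
candidates = [[[0, 1], [1, 0]], [[0, 1], [0, 0]], [[0, 0], [0, 0]]]
best_adj, best_return = keep_best(candidates, lambda a: -sum(map(sum, a)))
print(len(windows), best_adj, best_return)
```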
Preferably, when a fault occurs, determining the root cause index of the fault based on the nonlinear causal relationships comprises at least: calculating the degree of abnormality of each index variable from the anomaly score of a variational autoencoder, and calculating the root cause score of each index variable node by weighting the anomaly score according to the distance between that node and the root node.
The invention calculates a root cause score for each index variable node and determines the root cause index of the fault by accumulating the root cause scores.
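A minimal sketch of distance-weighted accumulation over a causal graph is given below. It assumes that the root cause score of a node is the sum of the anomaly scores of the node and of every index reachable from it, each multiplied by decay^distance. The decay factor, the anomaly values, the index names, and this exact weighting rule are assumptions for illustration; the patent text does not specify the formula.

```python
from collections import deque


def root_cause_scores(edges, anomaly, decay=0.5):
    """Accumulate distance-weighted anomaly scores over the causal graph.

    edges maps each node to the nodes it directly affects; anomaly maps
    each node to its anomaly score. Assumed rule:
    score(c) = sum over v reachable from c of anomaly[v] * decay**dist(c, v).
    """
    scores = {}
    for start in anomaly:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search for shortest distances
            node = queue.popleft()
            for child in edges.get(node, []):
                if child not in dist:
                    dist[child] = dist[node] + 1
                    queue.append(child)
        scores[start] = sum(anomaly[v] * decay ** d for v, d in dist.items())
    return scores


# Hypothetical causal chain and anomaly scores.
edges = {"snr": ["ber"], "ber": ["loss"], "loss": ["quality"]}
anomaly = {"snr": 0.9, "ber": 0.8, "loss": 0.7, "quality": 0.6}
scores = root_cause_scores(edges, anomaly)
print(max(scores, key=scores.get))
```

Under this rule an abnormal index that sits upstream of many other abnormal indexes accumulates the highest score, which matches the intuition that the root cause explains the downstream anomalies.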
The invention also provides a device for diagnosing and analyzing the root cause of a communication terminal fault, comprising at least: a fault detection module, configured to convert received radio frequency data of the terminal equipment into frequency domain feature data and to analyze the frequency domain feature data with a fault detection algorithm to determine whether the terminal equipment has a fault; a causal relationship construction module, configured to receive monitoring index data of the terminal equipment in a normal state and to calculate the nonlinear causal relationships among the indexes by reinforcement learning; and a fault root cause positioning module, configured to determine, when a fault occurs, the root cause index of the fault based on the nonlinear causal relationships.
Using the radio frequency data and the monitoring index data of the terminal equipment, the device applies artificial intelligence to improve the efficiency of fault detection in conventional industrial environments and discovers the causal relationships among industrial monitoring indexes. It thereby assists operation and maintenance personnel in the industrial field in locating the root cause of a fault, making industrial operation more intelligent and reducing labor cost.
Preferably, the fault detection module is configured to: learn spectrogram features of the frequency domain feature data with a convolutional neural network, and distinguish the normal state from the fault state based on the spectrogram features; and determine whether the terminal equipment has failed based on the waveform and the peak and valley features of the spectrogram.
Based on the waveform and the peak and valley features of the spectrogram, the fault detection module can accurately distinguish the normal working state from the fault state.
Preferably, the causal relationship construction module is configured to: order the monitoring index data in the normal state by time to form time series index data, encode the time series index data, and decode the encoded data with a single-layer neural network to obtain the probabilities of causal relationships between indexes; generate an adjacency matrix from those probabilities; and evaluate the validity of the edges in the causal graph of the index variables, calculate a corresponding score from the residual sum of squares, and use a matrix exponential to constrain the causal graph to be acyclic.
Based on the probabilities of causal relationships among the indexes, the fault root cause positioning module provided by the invention can accurately extract and locate the root cause of a fault, which makes root cause analysis easier for operation and maintenance personnel and reduces labor cost.
Drawings
FIG. 1 is a schematic illustration of a communication connection for fault diagnosis of a preferred embodiment provided by the present invention;
FIG. 2 is a schematic flow chart of the method for diagnosing and analyzing the root cause of the communication terminal fault;
FIG. 3 is a schematic diagram of module connection of the communication terminal fault root cause diagnosis and analysis device provided by the invention;
FIG. 4 is a spectrogram of the normal state obtained by the communication terminal fault root cause diagnosis and analysis device provided by the invention;
FIG. 5 is a spectrogram of the fault state obtained by the communication terminal fault root cause diagnosis and analysis device provided by the invention.
List of reference numerals
20: a base station; 30: a terminal device; 40: a fault root cause diagnosis and analysis device; 310: a data analysis module; 320: a fault detection module; 330: a causal relationship construction module; 340: a fault root cause positioning module.
Detailed Description
The following detailed description refers to the accompanying drawings.
In the prior art, monitoring index data is multi-dimensional and multi-layered: for example, the power intensity, duty cycle, and bandwidth resource allocation ratio of unlicensed channels differ, as do the scheduling response delay, throughput, packet loss ratio, memory utilization, and so on at each layer of the network stack. Consequently, the interactions among the index data are complex and a fault propagates over a wide range. These interactions are difficult to describe with a unified model, so the effects of a fault are difficult to analyze.
For example, when another device is set to the wrong working frequency band, or an interference source is present, a co-channel interference fault can occur and interfere with an industrial 5G-U terminal. The signal-to-noise ratio of the received signal (the root cause index variable) drops abnormally, the bit error rate (an affected index variable) rises, the packet loss rate (an affected index variable) also rises, and the application data transmission speed falls, which ultimately degrades the quality of the monitoring picture (an affected index variable).
The industrial field often detects faults by comparing indexes against thresholds. This approach has several problems: suitable thresholds are difficult to select; data drift in actual deployments means the optimal threshold differs between factory environments; and the high complexity of the system makes it difficult for maintenance personnel to analyze how the indexes are related. A more efficient scheme for fault detection and root cause analysis is therefore needed.
To address these defects, the invention provides a method and a device for diagnosing and analyzing the root cause of a communication terminal fault. They use artificial intelligence to improve the efficiency of fault detection in conventional industrial environments and discover the causal relationships among industrial monitoring indexes, thereby assisting operation and maintenance personnel in the industrial field in locating the root cause of a fault, making industrial operation more intelligent, and reducing labor cost.
Some terms used in the invention are explained below.
Radio frequency data: the wireless signal data transmitted and received by the 5G-U device through an antenna.
Spectrogram (frequency spectrum diagram): a plot of the radio frequency data in the frequency domain after a fast Fourier transform.
Monitoring index data: time series data related to network communication, including at least signal-to-noise ratio, reference signal received power, modulation and coding scheme, transmit power headroom, round-trip delay, and packet loss rate.
Root cause: the fundamental cause.
Root cause score: a score obtained by accumulating the degree of abnormality of the time series index data over the causal relationship graph; it indicates how likely an index is to be the cause of a fault.
Nonlinear causal relationship: a causal relationship between two variables (the two items being compared) in which, with all other variables held constant, a change in one variable does not produce a proportional change in the other over its range.
Encoding: a process of converting information from one form or format to another.
Decoding: a process of restoring the digital code to what it represents or converting an electric pulse signal, an optical signal, a radio wave, or the like to information, data, or the like that it represents. Decoding is the process by which the recipient restores the received symbol or code to information, corresponding to the encoding process.
Example 1
FIG. 1 shows a schematic view of a fault diagnosis scenario in which the present invention is implemented. The scenario includes at least the data analysis module 310, the base station 20, the terminal device 30, and the fault root cause diagnosis and analysis apparatus 40. The data analysis module 310 is communicatively connected to the terminal device 30 via the base station 20. The terminal device 30 is communicatively connected to the fault root cause diagnosis and analysis apparatus 40 in a wired and/or wireless manner.
Preferably, the terminal device 30 transmits wireless data packets in the 2.4 GHz or 5.8 GHz unlicensed frequency band using the 5G-U wireless protocol, and industrial application traffic is sent through the industrial 5G-U terminal device to the base station 20 to complete the service.
Preferably, the fault root cause diagnosis and analysis apparatus 40 is configured to execute the fault root cause diagnosis and analysis method of the present invention and can run the corresponding program. The apparatus 40 is preferably implemented on hardware such as an application-specific integrated circuit, a processor, a server, a micro server, or a cloud server.
Preferably, as shown in FIG. 3, the fault root cause diagnosis and analysis apparatus 40 includes at least the fault detection module 320, the causal relationship construction module 330, and the fault root cause positioning module 340. These three modules may be integrated in the same hardware, or they may belong to different pieces of hardware that are connected to operate together.
The fault detection module 320, the causal relationship construction module 330, and the fault root cause positioning module 340 are connected in sequence through communication ports or communication lines and exchange information over these connections. The fault detection module 320 is communicatively connected to the terminal device 30 through at least one communication port, in a wired and/or wireless manner, to receive data from the terminal device 30.
Preferably, the fault root cause diagnosis and analysis apparatus 40 is communicatively connected to the data analysis module 310 through the base station 20 to perform steps S210 to S250, as shown in fig. 2.
The invention collects data both in the normal state and in fault experiments. The fault experiments are of three types. The first uses an interference source to produce co-channel interference (the second fault A2 in FIG. 1), which makes it difficult for the terminal to transmit wireless data normally. The second is a radio frequency channel fault, in which applied attenuation reduces the data transmission efficiency. The third is a loose antenna hardware fault (the first fault A1 in FIG. 1), in which loosening the antenna degrades the wireless link.
In the experimental scenario, the base station 20 and the terminal device 30 are connected through an air interface, and transmit wireless data by adopting a 5G-U protocol. The base station 20 is connected to a first switch and then to a server of the same network segment. The terminal device 30 is connected to the second switch and then to the user application under the same network segment. The antenna spacing between the base station 20 and the terminal equipment 30 is 1 meter. The 4 antennas on the same terminal device 30 are spaced 2 cm apart.
S210: the data analysis module 310 is configured to obtain radio frequency data and monitoring index data from the terminal device 30 and the base station 20.
During data collection, the terminal device 30 collects and saves the in-phase (I) and quadrature (Q) data of the wireless communication. At the same time, the base station 20 gathers a large amount of monitoring index data that indicates the current operating state of the system. All monitoring index data is collected and stored in a database. Preferably, the database is provided in the base station 20. By reading the database, the fault root cause diagnosis and analysis apparatus 40 obtains the monitoring index data of the communication system in the normal working state and the index data of the communication system in the fault state.
S220: the data analysis module 310 converts the radio frequency data of all symbols and antennas into corresponding frequency domain characteristic data through preprocessing. Preferably, the data analysis module 310 also extracts a spectrogram based on the frequency domain feature data.
The data analysis module 310 converts the radio frequency data into frequency domain feature data by a fast Fourier transform. The frequency domain feature data comprises a plurality of spectrograms corresponding to the antennas and the symbols. The data analysis module 310 sends the frequency domain feature data to the fault detection module 320.
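The conversion from I/Q samples to a magnitude spectrum can be sketched with a small radix-2 fast Fourier transform. This is an illustrative stand-in, not the implementation used by the data analysis module; the function names and the synthetic test tone are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(i_samples, q_samples):
    """Combine I and Q samples into complex values and return |FFT| per bin."""
    iq = [complex(i, q) for i, q in zip(i_samples, q_samples)]
    return [abs(c) for c in fft(iq)]


# Synthetic complex tone that falls exactly on bin 5 of a 64-point transform.
n = 64
i_samples = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
q_samples = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
spectrum = magnitude_spectrum(i_samples, q_samples)
print(spectrum.index(max(spectrum)))
```

One such spectrum per antenna and symbol gives the rows from which a spectrogram image can be assembled.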
S230: the fault detection module 320 analyzes the frequency domain characteristic data based on a fault detection algorithm to determine whether the terminal device has a fault.
Specifically, the fault detection module 320 learns spectrogram features corresponding to the frequency domain feature data based on the convolutional neural network. The fault detection module 320 determines whether the communication system corresponding to the spectrogram belongs to a normal state or a fault state based on the spectrogram feature.
Preferably, the fault detection module 320 determines whether the terminal device 30 is faulty based on the waveform and the features of the peaks and valleys of the spectrogram. The fault detection module can accurately distinguish the normal working state and the fault state based on the waveform and the peak-valley characteristic of the spectrogram, and the fault judgment accuracy is high.
Specifically, the fault detection module 320 first divides the frequency domain feature data from the data analysis module 310 evenly by frequency band, using half of the data for training and the other half for testing. This ensures that the training data and the test data do not overlap in frequency band, which demonstrates the broad applicability of the algorithm model.
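A split in which training and test data never share a frequency band can be sketched as follows. The sample layout (band identifier, spectrum, label) and the choice of assigning the lower half of the bands to training are assumptions for illustration; the patent text states only that the data is divided evenly by frequency band.

```python
def split_by_band(samples):
    """Split samples so that no frequency band appears in both subsets.

    Each sample is a (band_id, spectrum, label) tuple. The first half of
    the sorted band identifiers goes to training and the rest to testing.
    """
    bands = sorted({band for band, _, _ in samples})
    train_bands = set(bands[:len(bands) // 2])
    train = [s for s in samples if s[0] in train_bands]
    test = [s for s in samples if s[0] not in train_bands]
    return train, test


# Hypothetical data: four bands, two samples per band, label 1 means fault.
samples = [(band, [0.0] * 4, label) for band in range(4) for label in (0, 1)]
train, test = split_by_band(samples)
print(len(train), len(test))
```

Because the model is evaluated on bands it never saw during training, good test accuracy suggests it has learned fault-related spectral shapes rather than band-specific details.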
Then, the fault detection module 320 combines the frequency domain feature data with the device antenna according to the 5G protocol symbol to form an image with a size of 10x1024, and inputs the image, and performs classification training according to the fault label by using the convolutional neural network, so as to obtain a trained fault detection model. The spectrogram obtained according to the mode at any moment can be input into the fault detection model to judge whether faults occur.
Fig. 4 shows a spectrum diagram of a normal state, and fig. 5 shows a spectrum diagram of a fault state. The horizontal axis of fig. 4 and 5 represents the frequency domain, and the vertical axis represents the amplitude. Fig. 4 is blank in the frequency domain of 550-2000, and there is no spectral feature. There are spectral features of the amplitude law in a particular frequency domain that is less than 550.
In comparison with fig. 4, fig. 5 has several spectrum signals in the frequency domain of 550-2000. The amplitude of the radio frequency signal varies in an irregular state, and the amplitude of the variation becomes significantly smaller. Obviously, the spectrogram in the normal state and the spectrogram in the fault state are obviously different in the shape of the frequency spectrum. The fault detection model formed based on the convolutional neural network can learn based on the difference characteristics of the spectrogram in the normal state and the spectrogram in the fault state on the spectrum characteristics, and the effect of distinguishing the two states is achieved.
Preferably, the learning model within the fault detection module 320 is not limited to a convolutional neural network model, but may be a deep reinforcement learning model or the like.
Because the fault detection module 320 directly learns the spectral features of the spectrogram, and the spectrogram of the device in the fault state differs in clearly visible shape from the spectrogram in the normal state, the convolutional neural network can quickly learn to distinguish the two states and thus judge whether a fault has occurred.
Therefore, the fault detection module 320 judges faults with high accuracy, and the fault detection model trains quickly and requires little computing power, so the method is general and applicable to a wide range of scenarios.
S240: the causal relationship construction module 330 receives the monitoring indicator data of the terminal device 30 in the normal state, and calculates the nonlinear causal relationship between the indicators by means of reinforcement learning.
Specifically, the causal relationship construction module 330 sorts the monitoring indicator data in the normal state by time to form time series indicator data. The causal relationship construction module 330 encodes the time series indicator data. For example, for the time series indicator RSRP, the sampled values within the time interval T are input and encoded by a gated recurrent unit (GRU) to obtain the encoded value $h_i$.
The causal relationship construction module 330 decodes the encoded time series indicator data based on a single-layer neural network to obtain the probability $p(e_{ij} \in E)$ that a causal relationship exists between the indicators. The decoding result is calculated by the function described below.
The causal relationship construction module 330 generates an adjacency matrix according to the probability that causal relationships exist between the indicators. The decoding function used to obtain the adjacency matrix is:
$$p(e_{ij} \in E) = \sigma\left(u^{\top} \tanh\left(W_1 h_i + W_2 h_j\right)\right)$$
wherein $e_{ij}$ indicates that the i-th variable is a cause of the j-th variable; $W_1$, $W_2$ and $u$ represent the neural network parameters, with the superscript $\top$ denoting transposition; the outer $\sigma$ represents the sigmoid activation function, which maps the result to the probability of an edge; $E$ represents the set of edges in the causal relationship graph; $T$ represents the time interval; $h_i$ represents the encoded value vector of the i-th indicator in the time series indicator data; and $h_j$ represents the encoded value vector of the j-th indicator in the time series indicator data. Finally, the value of each element in the adjacency matrix is generated from a Bernoulli distribution with this probability. The adjacency matrix indicates which edges are present in the causal relationship graph. Each adjacency matrix corresponds to one determined causal relationship graph.
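The decoding step and the Bernoulli sampling of the adjacency matrix can be sketched as follows. This is a minimal sketch on plain Python lists: the encodings, the parameter values and the function names are hypothetical, and skipping the diagonal (no self-loops) is an assumption not stated in the text.

```python
import math
import random

def edge_probability(h_i, h_j, W1, W2, u):
    """p(e_ij in E) = sigmoid(u^T tanh(W1 h_i + W2 h_j))."""
    hidden = []
    for row1, row2 in zip(W1, W2):
        z = sum(a * b for a, b in zip(row1, h_i)) + sum(a * b for a, b in zip(row2, h_j))
        hidden.append(math.tanh(z))
    logit = sum(ui * hi for ui, hi in zip(u, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

def sample_adjacency(encodings, W1, W2, u, rng):
    """Sample each off-diagonal entry from Bernoulli(p(e_ij in E))."""
    n = len(encodings)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # assumed: an indicator is never its own cause
                p = edge_probability(encodings[i], encodings[j], W1, W2, u)
                adj[i][j] = 1 if rng.random() < p else 0
    return adj

# Hypothetical encodings h_0..h_2 and parameters, for illustration only.
rng = random.Random(0)
encodings = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.5, 0.5], [-0.5, 0.5]]
u = [1.0, -1.0]
adjacency = sample_adjacency(encodings, W1, W2, u, rng)
```

An entry `adjacency[i][j] == 1` means the sampled graph contains the edge from indicator i to indicator j.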
After generating the adjacency matrix, the causal relationship construction module 330 evaluates the validity of the edges in the index variable causal relationship graph. Preferably, the causal relationship construction module 330 evaluates the validity of the edges based on a calculated score. A smaller Score(G) is better, indicating that the edges in the causal relationship graph more faithfully reflect the real causal relationships.
The causal relationship construction module 330 calculates a corresponding score from the sum of squares of the residuals.
Specifically, the calculation function for calculating the corresponding score from the sum of squares of the residuals includes:
wherein Score(G) represents the score, which is calculated from the sum of squares of the residuals; |E| represents the number of edges in the causal relationship graph; and m represents the number of data samples.
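One form that a score built from these three quantities could take is a BIC-style criterion, sketched below. The expression used here, `m * log(RSS_i / m)` summed over the variables plus `|E| * log(m)`, is an assumption chosen because it uses exactly the residual sum of squares, the edge count and the sample count; the function name and the per-variable residual input are hypothetical.

```python
import math

def bic_style_score(rss_per_variable, num_edges, m):
    """Assumed BIC-style score: lower is better.

    rss_per_variable: residual sum of squares of each variable when it is
        regressed on its parents in the candidate graph.
    num_edges: |E|, the number of edges in the candidate graph.
    m: number of data samples.
    """
    fit = sum(m * math.log(rss / m) for rss in rss_per_variable)  # goodness of fit
    penalty = num_edges * math.log(m)                             # complexity penalty
    return fit + penalty

# Same fit quality, different edge counts: the sparser graph scores lower.
sparse = bic_style_score([10.0, 20.0], num_edges=1, m=100)
dense = bic_style_score([10.0, 20.0], num_edges=3, m=100)
```

Under this assumed form, an extra edge is only worthwhile if it reduces the residuals by more than the penalty it adds.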
The causal relationship construction module 330 uses a matrix exponential to constrain the acyclicity of the causal relationship graph.
Specifically, the causal relationship construction module 330 calculates the return value as a function of:
wherein r represents the return value; $\lambda_1$ represents the weight applied according to whether the searched graph is a directed acyclic graph: if so, the weight is $\lambda_1$; otherwise the weight is 0. $\lambda_2$ represents the parameter of the matrix exponential constraint, G represents the searched directed graph, DAGs represents the set of all directed acyclic graphs, and h(W) represents the matrix exponential term.
The acyclicity of the searched directed graph can be characterized by the matrix exponential. The more cycles the directed graph contains, the larger the computed matrix exponential term, and the lower the resulting return value. The causal relationship construction module 330 ranks the return values and searches in the direction of increasing return value, which makes directed acyclic graphs easier to find.
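The behaviour described here, a term that is zero for an acyclic graph and grows as cycles are added, can be illustrated with the widely used trace-of-matrix-exponential measure. The specific form h(W) = tr(exp(W ∘ W)) − d is an assumption, since the text does not spell out h(W); the function names are hypothetical, and the exponential is approximated by a truncated Taylor series to avoid third-party libraries.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def acyclicity(adj, terms=20):
    """Assumed measure h(W) = trace(exp(W ∘ W)) - d.

    The matrix exponential is approximated by the first `terms` terms of its
    Taylor series; only the trace of each term is accumulated.
    """
    d = len(adj)
    sq = [[x * x for x in row] for row in adj]  # element-wise square W ∘ W
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)                            # trace of the identity (k = 0) term
    factorial = 1.0
    for k in range(1, terms):
        power = mat_mul(power, sq)              # (W ∘ W)^k
        factorial *= k
        trace += sum(power[i][i] for i in range(d)) / factorial
    return trace - d

dag = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2 and 0 -> 2, no cycle
two_cycle = [[0, 1], [1, 0]]             # 0 <-> 1
h_dag = acyclicity(dag)
h_cycle = acyclicity(two_cycle)
```

Because the k-th power of the matrix counts closed walks of length k, `h_dag` is zero while `h_cycle` is strictly positive, so subtracting a multiple of this term from the return value steers the search away from cyclic graphs.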
Based on the probability of causal relationships among the indicators, the causal relationship construction module 330 makes it possible to accurately extract and locate the fault root cause, which reduces the difficulty for operation and maintenance personnel of analyzing the fault root cause and lowers labor cost.
Preferably, during training, the causal relationship construction module 330 optimizes the return function based on stochastic gradient descent. Specifically, stochastic gradient descent is performed using an adaptive moment estimation (Adam) optimizer, and the learning rate is adjusted by exponential decay, where i denotes the i-th round:
$$\mathrm{lr} = \mathrm{lr}_0 \times \gamma^{i}$$
wherein $\mathrm{lr}$ denotes the learning rate, $\mathrm{lr}_0$ denotes the initial learning rate, and $\gamma$ denotes the decay factor.
The causal relationship construction module 330 adjusts the learning rate in an exponentially decaying manner. The advantage of such training is that a larger learning rate quickly yields a reasonably good solution at the start, and as the iterations continue the learning rate gradually decreases, so that the model is more stable in the later stages of training.
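The exponential decay schedule can be written in a few lines. The initial learning rate and decay factor below are illustrative values, not values from the text, and the function name is hypothetical.

```python
def decayed_learning_rate(lr0, gamma, i):
    """Learning rate in round i: the initial rate lr0 times gamma to the power i."""
    return lr0 * gamma ** i

# Illustrative values: initial rate 0.1, decay factor 0.5.
schedule = [decayed_learning_rate(0.1, 0.5, i) for i in range(4)]
```

With a decay factor between 0 and 1, every round multiplies the rate by the same factor, so early rounds take large steps and later rounds take progressively smaller ones.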
Preferably, the causal relationship construction module 330 uses Monte Carlo random sampling to draw a plurality of time series of interval T from the input monitoring indicator data.
During training, the causal relationship construction module 330 records the adjacency matrix corresponding to the maximum return value. After training is finished, this adjacency matrix is output as the result, completing the search process. The advantage of recording and outputting the adjacency matrix corresponding to the maximum return value is that its causal relationship graph is the closest to the real causal relationships among the graphs searched, so the best solution found is retained.
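Drawing random windows of length T can be sketched as below. This is a hedged illustration: the uniform choice of start index, the time alignment across indicators, and the indicator name `SINR` are assumptions made for the example (RSRP is the indicator named in the text).

```python
import random

def sample_windows(series, T, num_windows, rng):
    """Draw num_windows random contiguous windows of length T.

    series: {indicator name: time-ordered samples}, all of equal length.
    Every indicator is cut at the same start index so that the windows
    stay aligned in time (an assumed requirement).
    """
    length = len(next(iter(series.values())))
    starts = [rng.randrange(0, length - T + 1) for _ in range(num_windows)]
    return [{name: values[s:s + T] for name, values in series.items()} for s in starts]

# Hypothetical data: two indicators with ten samples each.
series = {"RSRP": list(range(10)), "SINR": list(range(100, 110))}
windows = sample_windows(series, T=4, num_windows=3, rng=random.Random(1))
```

Each element of `windows` is one sampled training example containing T consecutive values of every indicator.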
The causal relationship construction module 330 outputs an adjacency matrix corresponding to the maximum return value to the fault root location module 340.
S250: in the event of a fault, the fault root cause location module 340 determines the root cause indicator of the fault based on the nonlinear causal relationships.
Specifically, the degree of abnormality of each index variable is calculated based on the anomaly score of a variational autoencoder, and the root cause score of an index variable node is calculated by weighting the anomaly scores according to the distance between each index variable node and the root cause node.
Preferably, the variational autoencoder algorithm calculates the degree of abnormality of a single time series index variable based on its degree of deviation from normal. Specifically, following the method of "Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications", the reconstruction probability of the data of a single time series index variable at the fault moment is used as the anomaly score $SA_k$.
When a fault occurs, a single index variable acts as the sole cause of the fault, and its abnormal change affects all of its result variables and further downstream variables, thereby propagating through the causal relationship graph. The anomaly scores are weighted according to the distance between the index variable node and the root cause node so that the attenuation of the influence along the propagation path is taken into account, allowing the root cause index variable most likely to correspond to the fault to be calculated.
The method for calculating the root cause score of the index variable node comprises the following steps:
wherein $\lambda$ in the term $\lambda^{d}$ represents a discount factor, whose purpose is to make the anomaly scores of index variables closer to the root cause contribute more to the final root cause score, and d represents the distance between the index variable node $X_k$ and the source node $X_i$ in the causal relationship graph. The anomaly score $SA_k$ is calculated by the variational autoencoder. Finally, the index variable node with the highest root cause score is selected as the located fault root cause.
The index variable with the highest root cause score represents the root cause of the fault. Operation and maintenance personnel can then maintain the communication system according to the root cause of the fault.
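A distance-discounted root cause score consistent with this description can be sketched as follows. The exact aggregation is an assumption: here the score of a candidate node $X_i$ is taken as the sum, over every node $X_k$ reachable from it (including itself at distance 0), of $\lambda^{d}$ times $SA_k$, with d the shortest-path distance in the causal relationship graph. The function name, the discount value and the anomaly scores are hypothetical.

```python
from collections import deque

def root_cause_scores(adj, anomaly, discount=0.5):
    """Assumed score: RS_i = sum over reachable k of discount**d(i, k) * SA_k."""
    n = len(adj)
    scores = []
    for i in range(n):
        dist = {i: 0}                     # shortest distance from candidate i
        queue = deque([i])
        while queue:                      # breadth-first search along causal edges
            node = queue.popleft()
            for nxt in range(n):
                if adj[node][nxt] and nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        scores.append(sum(discount ** d * anomaly[k] for k, d in dist.items()))
    return scores

# Hypothetical causal chain 0 -> 1 -> 2 with illustrative anomaly scores SA_k.
adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
anomaly = [0.9, 0.8, 0.7]
scores = root_cause_scores(adj, anomaly)
root = max(range(len(scores)), key=scores.__getitem__)
```

In this example the upstream node 0 collects discounted anomaly from both downstream nodes and receives the highest score, so it is selected as the root cause.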
On the basis of determining the root cause of the fault, the invention also provides a communication early warning system or communication safety system that issues early warnings according to the root cause of the fault. The invention can thus be used to automatically warn of faults in the communication system. On the same basis, the invention also provides a scheduling system capable of selecting a dispatch plan for operation and maintenance personnel based on the fault root cause, so as to improve the efficiency of scheduling operation and maintenance personnel.
In summary, the fault detection scheme based on spectrogram learning and the root cause location scheme based on nonlinear causal reinforcement learning use artificial intelligence to improve the efficiency of fault detection in the traditional industrial environment, discover the causal relationships among industrial monitoring indicators, and thereby assist operation and maintenance personnel in the industrial field in locating fault root causes, improving their work efficiency and reducing the difficulty of operation and maintenance. The early warning system or the scheduling system further helps to improve the operating efficiency and the personnel scheduling efficiency of the operation and maintenance system, and reduces or even avoids the situation in which operation and maintenance personnel have to be replaced because the appropriate personnel could not be determined in advance.
It should be noted that the above-described embodiments are exemplary, and that a person skilled in the art, in light of the present disclosure, may devise various solutions that fall within the scope of the present disclosure. It should be understood by those skilled in the art that the present description and drawings are illustrative and do not limit the claims. The scope of the invention is defined by the claims and their equivalents. The description of the invention includes a plurality of inventive concepts; expressions such as "preferably", "according to a preferred embodiment" or "optionally" each indicate that the corresponding paragraph discloses a separate concept, and the applicant reserves the right to file a divisional application according to each inventive concept.

Claims (10)

1. A method for diagnosing and analyzing the root cause of a communication terminal fault, the method at least comprising: converting received radio frequency data of the terminal device (30) into frequency domain characteristic data,
analyzing the frequency domain feature data based on a fault detection algorithm to determine whether the terminal device (30) is faulty;
receiving monitoring index data of the terminal equipment (30) in a normal state, and calculating nonlinear causal relation among indexes in a reinforcement learning mode;
and determining the root cause index of the fault based on the nonlinear causal relationship in the case that a fault occurs.
2. The method for diagnosing and analyzing the root cause of a communication terminal fault according to claim 1, wherein the method for analyzing the frequency domain characteristic data based on a fault detection algorithm to determine whether the terminal device (30) is faulty comprises at least:
learning spectrogram features of the frequency domain feature data based on a convolutional neural network,
distinguishing a normal state from a fault state based on the spectrogram features;
and judging whether the terminal equipment (30) fails or not based on the waveform and the peak-valley characteristics of the spectrogram.
3. The method for diagnosing and analyzing the root cause of a fault of a communication terminal according to claim 1 or 2, wherein the method for calculating the nonlinear causal relationship between the indexes by means of reinforcement learning at least comprises:
generating an adjacency matrix according to the probability of causal relation between indexes;
evaluating the validity of edges in the index variable causal relationship graph, and calculating a corresponding score according to the sum of squares of the residuals,
and constraining the acyclicity of the causal relationship graph using a matrix exponential.
4. A communication terminal fault root cause diagnosis and analysis method according to any one of claims 1 to 3, wherein the calculation method of the probability of causal relation between indexes comprises at least:
sorting the monitoring index data in the normal state by time to form time series index data,
encoding the time series index data,
and decoding the encoded time series index data based on a single-layer neural network to obtain the probability of causal relation between indexes.
5. The method for diagnosing and analyzing a root cause of a fault in a communication terminal according to claim 4, wherein the method for calculating a nonlinear causal relationship between indexes by means of reinforcement learning further comprises:
calculating a return value,
in the training process, optimizing the return function based on a stochastic gradient descent method;
and adjusting the learning rate in an exponentially decaying manner.
6. The method for diagnosing and analyzing a root cause of a fault in a communication terminal according to claim 4, wherein the method for calculating a nonlinear causal relationship between indexes by means of reinforcement learning further comprises:
randomly sampling, by Monte Carlo, a plurality of time series of interval T from the input monitoring index data;
in the training process, recording the adjacency matrix corresponding to the maximum return value, and outputting it as the result after the training is completed, thereby completing the search process.
7. The method according to any one of claims 1 to 6, wherein, in the case of occurrence of a fault, the method for determining the root cause index of the fault based on the nonlinear causal relationship comprises at least:
calculating the degree of abnormality of the index variable based on the anomaly score of a variational autoencoder,
assigning a weight to the anomaly score based on the distance between the index variable node and the root cause node,
and calculating the root cause score of the index variable node.
8. A communication terminal fault root cause diagnostic and analytical apparatus, said apparatus comprising at least:
a fault detection module (320), configured to analyze the frequency domain characteristic data based on a fault detection algorithm to determine whether the terminal device (30) has a fault;
a causal relationship construction module (330), configured to receive monitoring index data of the terminal equipment (30) in a normal state and to calculate nonlinear causal relation among indexes in a reinforcement learning mode;
a fault root cause location module (340), configured to determine the root cause index of the fault based on the nonlinear causal relationship in the case that a fault occurs.
9. The communication terminal fault root cause diagnostic analysis device of claim 8, wherein the fault detection module (320) is configured to:
learning spectrogram features of the frequency domain feature data based on a convolutional neural network,
distinguishing a normal state from a fault state based on the spectrogram features;
and judging whether the terminal equipment (30) fails or not based on the waveform and the peak-valley characteristics of the spectrogram.
10. The communication terminal fault root cause diagnostic analysis device of claim 8 or 9, wherein the causal relationship construction module (330) is configured to:
sorting the monitoring index data in the normal state by time to form time series index data,
encoding the time series index data,
decoding the encoded time series index data based on a single-layer neural network to obtain the probability of causal relation between indexes;
generating an adjacency matrix according to the probability of causal relation between indexes;
evaluating the validity of edges in the index variable causal relationship graph, and calculating a corresponding score according to the sum of squares of the residuals,
and constraining the acyclicity of the causal relationship graph using a matrix exponential.
CN202311110498.2A 2023-08-30 2023-08-30 Diagnosis and analysis method and device for fault root cause of communication terminal Pending CN116915582A (en)

Publications (1)

Publication Number Publication Date
CN116915582A true CN116915582A (en) 2023-10-20

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117354859A (en) * 2023-12-05 2024-01-05 深圳百沃彰世科技有限公司 Mobile terminal communication quality detection and early warning system based on Internet of things
CN117354859B (en) * 2023-12-05 2024-04-19 深圳百沃彰世科技有限公司 Mobile terminal communication quality detection and early warning system based on Internet of things
CN117493980A (en) * 2023-12-29 2024-02-02 合肥工业大学 Bearing fault diagnosis method integrating feature extraction and sequencing causal discovery
CN117493980B (en) * 2023-12-29 2024-03-19 合肥工业大学 Bearing fault diagnosis method integrating feature extraction and sequencing causal discovery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination