CN117242455A - Neural network model fault monitoring method and device in automatic driving system

Neural network model fault monitoring method and device in automatic driving system

Info

Publication number
CN117242455A
Authority
CN
China
Prior art keywords
neural network
output data
relative entropy
network model
monitored
Prior art date
Legal status
Pending
Application number
CN202280006258.5A
Other languages
Chinese (zh)
Inventor
王矿磊
陈艺帆
陈德久
苏鹏
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN117242455A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

A neural network model fault monitoring method and device in an automatic driving system. The method comprises the following steps: acquiring a target output data set (301) of a neural network model to be monitored in the automatic driving system, wherein the target output data set comprises output data sets corresponding to each of m neural network layers, the neural network model to be monitored comprises M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M; extracting a characteristic value set (302) corresponding to each neural network layer from the target output data set; calculating relative entropy values between each characteristic value set and a first element set conforming to a preset probability distribution, to obtain relative entropy value sets (303) corresponding to the m neural network layers; and judging, according to the relative entropy value sets, whether the neural network model to be monitored has an operation fault (304). By this method, the real-time performance and accuracy of fault monitoring of the neural network model to be monitored are improved, and the safety of the automatic driving vehicle is ensured.

Description

Neural network model fault monitoring method and device in automatic driving system Technical Field
The application relates to the technical field of automatic driving, in particular to a neural network model fault monitoring method and device in an automatic driving system.
Background
Because computing platforms, artificial intelligence (Artificial Intelligence, AI) accelerators and other devices in an automatic driving system are highly complex, the neural network models deployed on these devices are susceptible to hardware failures and other factors when performing inference operations. Therefore, monitoring in a timely and accurate manner whether a neural network model has an operation fault is of great significance for guaranteeing the safety of the automatic driving vehicle.
Disclosure of Invention
In view of the above, a method, an apparatus, a storage medium and a computer program product for monitoring a neural network model fault in an automatic driving system are provided.
In a first aspect, an embodiment of the present application provides a neural network model fault monitoring method in an automatic driving system, the method including: acquiring a target output data set of a neural network model to be monitored in the automatic driving system, wherein the target output data set comprises output data sets corresponding to each of m neural network layers, the neural network model to be monitored comprises M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M; extracting a characteristic value set corresponding to each neural network layer from the target output data set; calculating relative entropy values between each characteristic value set and a first element set conforming to a preset probability distribution, to obtain relative entropy value sets corresponding to the m neural network layers; and judging, according to the relative entropy value sets, whether the neural network model to be monitored has an operation fault.
Based on this technical solution, following the idea of the Monte Carlo method, the output data of each neural network layer are selectively sampled and part of the output data in each output data set is extracted as characteristic values, so that the distribution of the output data of each neural network layer is reflected by as few characteristic values as possible; this simplifies the calculation, saves calculation overhead and improves calculation efficiency. Meanwhile, the relative entropy value set is obtained by calculating the relative entropy value between the characteristic value set corresponding to each neural network layer and the first element set conforming to the preset probability distribution, which realizes data dimension reduction and further improves operation efficiency. As a result, the real-time performance of fault monitoring is improved, and real-time monitoring of operation faults of the neural network model in the automatic driving system is realized. In addition, the relative entropy values describe the difference in distribution between the normal output data and the abnormal output data of each neural network layer and thereby distinguish the two, so that whether the neural network model to be monitored has an operation fault can be judged more accurately through the relative entropy value sets corresponding to the m neural network layers, improving the accuracy of fault monitoring. Furthermore, various operation faults of a neural network model, or operation faults of various neural network models, can be effectively monitored, so the method has a wide application range.
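For intuition only, the following Python sketch outlines the flow summarized above; it is not part of the claimed solution. The function names (monitor, classify), the use of scipy.stats.entropy, the softmax normalization of the sampled feature values, and the construction of the reference element set are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes the KL divergence D(P || Q)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def monitor(layer_outputs, n_samples, classify, seed=0):
    """layer_outputs: one 1-D array of output data per monitored layer (m layers)."""
    rng = np.random.default_rng(seed)
    # First element set: n_samples elements drawn from a preset (here Gaussian) distribution.
    reference = softmax(rng.normal(size=n_samples))
    relative_entropies = []
    for outputs in layer_outputs:
        # Monte-Carlo-style sampling: a few output values stand in for the whole layer.
        features = rng.choice(outputs, size=n_samples, replace=False)
        relative_entropies.append(entropy(softmax(features), reference))
    # A preset classifier maps the relative-entropy set to "normal" / "operation fault".
    return classify(relative_entropies)
```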
According to the first aspect, in a first possible implementation manner of the first aspect, the extracting, from the target output data set, the characteristic value set corresponding to each neural network layer includes: determining a first output data set with the smallest quantity of output data in the target output data set; and extracting, according to the quantity of output data in the first output data set, the characteristic value set corresponding to each neural network layer from the output data set corresponding to that neural network layer; wherein the quantity of characteristic values in the characteristic value set extracted for each neural network layer is smaller than or equal to the quantity of output data in the first output data set.
Based on this technical solution, considering that a neural network model in an automatic driving system is generally complex and the quantity of output data in the target output data set is therefore large, the characteristic value set corresponding to each neural network layer is adaptively extracted from the output data set corresponding to that layer, and the quantity of characteristic values extracted for each neural network layer is not larger than the quantity of output data in any of the m neural network layers; this reduces operation overhead, improves subsequent processing efficiency, and meets the real-time requirement of fault monitoring.
According to the first aspect, in a second possible implementation manner of the first aspect, the extracting, from the target output data set, the characteristic value set corresponding to each neural network layer includes: extracting the characteristic value set corresponding to each neural network layer by using the quantity of output data in the output data set corresponding to that neural network layer as a weight.
Based on this technical solution, considering that the quantity of output data may differ between neural network layers, the influence of different layers on the working state of the neural network model also differs. In this manner, the quantity of output data extracted from each neural network layer is allocated according to the weight given by the quantity of output data corresponding to that layer, so that the characteristic value set corresponding to each neural network layer is adaptively extracted and the extracted characteristic value sets reflect the distribution of the output data of each neural network layer more accurately; meanwhile, through characteristic value extraction, operation overhead is reduced, subsequent processing efficiency is improved, and the real-time requirement of fault monitoring is met.
According to the first aspect or the foregoing possible implementation manners of the first aspect, in a third possible implementation manner of the first aspect, the judging, according to the relative entropy value sets, whether the neural network model to be monitored has an operation fault includes: inputting the relative entropy value sets into a preset classification model, and judging whether the neural network model to be monitored has an operation fault.
In some examples, the set of relative entropy values is input into a preset classification model, and the preset classification model classifies the set of relative entropy values based on the relative entropy values between the set of feature values extracted from the known normal output data and the set of elements conforming to the preset probability distribution and the relative entropy values between the set of feature values extracted from the abnormal output data and the set of elements conforming to the preset probability distribution, so as to accurately judge whether the neural network model to be monitored has an operation fault.
In a fourth possible implementation manner of the first aspect according to the third possible implementation manner of the first aspect, the preset classification model includes a first classifier based on machine learning; the step of inputting the set of relative entropy values into a preset classification model, and judging whether the neural network model to be monitored has an operation fault or not comprises the following steps: inputting the relative entropy value set into the first classifier, and calculating the distance between the relative entropy value set and a plurality of relative entropy value sample sets; the plurality of relative entropy sample sets comprise relative entropy sample sets corresponding to the m neural network layers when the neural network model to be monitored fails, and relative entropy sample sets corresponding to the m neural network layers when the neural network model to be monitored works normally; and judging whether the neural network model to be monitored has operation faults or not according to the distances between the relative entropy value set and the plurality of relative entropy value sample sets.
Based on the technical scheme, the first classifier based on machine learning is utilized, pre-training is not needed, and the relative entropy value set can be automatically classified more conveniently and rapidly according to the distance between the relative entropy value set and the plurality of relative entropy value sample sets, so that whether the neural network model to be monitored has operation faults or not is judged in real time.
In a fifth possible implementation form of the first aspect according to the third possible implementation form of the first aspect, the classification model comprises a second classifier based on deep learning; the step of inputting the set of relative entropy values into a preset classification model, and judging whether the neural network model to be monitored has an operation fault or not comprises the following steps: inputting the relative entropy value set into the second classifier, and judging whether the neural network model to be monitored has an operation fault or not; the second classifier is trained by a plurality of relative entropy value sample sets.
Based on this technical solution, by adopting the second classifier based on deep learning, the accuracy of classifying the relative entropy value set is effectively improved while the classification is still performed in real time, so that whether the neural network model to be monitored has an operation fault is judged more accurately.
In a sixth possible implementation manner of the first aspect according to the fourth or fifth possible implementation manner of the first aspect, the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored fails includes: the relative entropy value between a first characteristic value sample set corresponding to each neural network layer in the m neural network layers and a second element set conforming to the preset probability distribution; the first characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored fails; the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored works normally comprises: the relative entropy value between the second characteristic value sample set corresponding to each neural network layer in the m neural network layers and the second element set conforming to the preset probability distribution; and the second characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored works normally.
In a second aspect, an embodiment of the present application provides a neural network model fault monitoring device in an automatic driving system, the device including: a transmission module, configured to acquire a target output data set of a neural network model to be monitored in the automatic driving system, wherein the target output data set comprises output data sets corresponding to each of m neural network layers, the neural network model to be monitored comprises M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M; and a processing module, configured to extract a characteristic value set corresponding to each neural network layer from the target output data set, calculate relative entropy values between each characteristic value set and a first element set conforming to a preset probability distribution to obtain relative entropy value sets corresponding to the m neural network layers, and judge, according to the relative entropy value sets, whether the neural network model to be monitored has an operation fault.
Based on this technical solution, following the idea of the Monte Carlo method, the output data of each neural network layer are selectively sampled and part of the output data in each output data set is extracted as characteristic values, so that the distribution of the output data of each neural network layer is reflected by as few characteristic values as possible; this simplifies the calculation, saves calculation overhead and improves calculation efficiency. Meanwhile, the relative entropy value set is obtained by calculating the relative entropy value between the characteristic value set corresponding to each neural network layer and the first element set conforming to the preset probability distribution, which realizes data dimension reduction and further improves operation efficiency. As a result, the real-time performance of fault monitoring is improved, and real-time monitoring of neural network model faults in the automatic driving system is realized. In addition, the relative entropy values describe the difference in distribution between the normal output data and the abnormal output data of each neural network layer and thereby distinguish the two, so that whether the neural network model to be monitored has an operation fault can be judged more accurately through the relative entropy value sets corresponding to the m neural network layers, improving the accuracy of fault monitoring. Furthermore, various operation faults of a neural network model, or operation faults of various neural network models, can be effectively monitored, so the device has a wide application range.
According to the second aspect, in a first possible implementation manner of the second aspect, the processing module is further configured to: determine a first output data set with the smallest quantity of output data in the target output data set; and extract, according to the quantity of output data in the first output data set, the characteristic value set corresponding to each neural network layer from the output data set corresponding to that neural network layer; wherein the quantity of characteristic values in the characteristic value set extracted for each neural network layer is smaller than or equal to the quantity of output data in the first output data set.
Based on this technical solution, considering that a neural network model in an automatic driving system is generally complex and the quantity of output data in the target output data set is therefore large, the characteristic value set corresponding to each neural network layer is adaptively extracted from the output data set corresponding to that layer, and the quantity of characteristic values extracted for each neural network layer is not larger than the quantity of output data in any of the m neural network layers; this reduces operation overhead, improves subsequent processing efficiency, and meets the real-time requirement of fault monitoring.
According to the second aspect, in a second possible implementation manner of the second aspect, the processing module is further configured to: extract the characteristic value set corresponding to each neural network layer by using the quantity of output data in the output data set corresponding to that neural network layer as a weight.
Based on this technical solution, considering that the quantity of output data may differ between neural network layers, the influence of different layers on the working state of the neural network model also differs. In this manner, the quantity of output data extracted from each neural network layer is allocated according to the weight given by the quantity of output data corresponding to that layer, so that the characteristic value set corresponding to each neural network layer is adaptively extracted and the extracted characteristic value sets reflect the distribution of the output data of each neural network layer more accurately; meanwhile, through characteristic value extraction, operation overhead is reduced, subsequent processing efficiency is improved, and the real-time requirement of fault monitoring is met.
According to the second aspect or the foregoing possible implementation manners of the second aspect, in a third possible implementation manner of the second aspect, the processing module is further configured to: input the relative entropy value sets into a preset classification model, and judge whether the neural network model to be monitored has an operation fault.
In some examples, the set of relative entropy values is input into a preset classification model, and the preset classification model classifies the set of relative entropy values based on the relative entropy values between the set of feature values extracted from the known normal output data and the set of elements conforming to the preset probability distribution and the relative entropy values between the set of feature values extracted from the abnormal output data and the set of elements conforming to the preset probability distribution, so as to accurately judge whether the neural network model to be monitored has an operation fault.
In a fourth possible implementation manner of the second aspect according to the third possible implementation manner of the second aspect, the preset classification model includes a first classifier based on machine learning; the processing module is further configured to: inputting the relative entropy value set into the first classifier, and calculating the distance between the relative entropy value set and a plurality of relative entropy value sample sets; the plurality of relative entropy sample sets comprise relative entropy sample sets corresponding to the m neural network layers when the neural network model to be monitored fails, and relative entropy sample sets corresponding to the m neural network layers when the neural network model to be monitored works normally; and judging whether the neural network model to be monitored has operation faults or not according to the distances between the relative entropy value set and the plurality of relative entropy value sample sets.
Based on the technical scheme, the first classifier based on machine learning is utilized, pre-training is not needed, and the relative entropy value set can be automatically classified more conveniently and rapidly according to the distance between the relative entropy value set and the plurality of relative entropy value sample sets, so that whether the neural network model to be monitored has operation faults or not is judged in real time.
In a fifth possible implementation manner of the second aspect according to the third possible implementation manner of the second aspect, the classification model includes a second classifier based on deep learning; the processing module is further configured to: inputting the relative entropy value set into the second classifier, and judging whether the neural network model to be monitored has operation faults or not; the second classifier is trained by a plurality of relative entropy value sample sets.
Based on this technical solution, by adopting the second classifier based on deep learning, the accuracy of classifying the relative entropy value set is effectively improved while the classification is still performed in real time, so that whether the neural network model to be monitored has an operation fault is judged more accurately.
In a sixth possible implementation manner of the second aspect according to the fourth or fifth possible implementation manner of the second aspect, the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored fails includes: the relative entropy value between a first characteristic value sample set corresponding to each neural network layer in the m neural network layers and a second element set conforming to the preset probability distribution; the first characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored fails; the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored works normally comprises: the relative entropy value between the second characteristic value sample set corresponding to each neural network layer in the m neural network layers and the second element set conforming to the preset probability distribution; and the second characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored works normally.
In a third aspect, an embodiment of the present application provides a neural network model fault monitoring device in an automatic driving system, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement, when executing the instructions, the neural network model fault monitoring method in an automatic driving system according to the first aspect or one or more of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the neural network model fault monitoring method in an automatic driving system according to the first aspect or one or more of the possible implementation manners of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to perform the neural network model fault monitoring method in an automatic driving system according to the first aspect or one or more of the possible implementation manners of the first aspect.
For the technical effects of the third to fifth aspects described above, refer to the first aspect or the second aspect described above.
Drawings
FIG. 1 illustrates a schematic architecture of an autopilot system in accordance with one embodiment of the present application;
FIG. 2 illustrates a schematic diagram of fault monitoring of a neural network model, according to an embodiment of the present application;
FIG. 3 illustrates a flow chart of a neural network model fault monitoring method in an autopilot system in accordance with one embodiment of the present application;
FIG. 4 illustrates a flow chart of a method of obtaining a set of relative entropy samples according to an embodiment of the application;
FIG. 5 is a schematic diagram of a neural network model fault monitoring method in an autopilot system according to one embodiment of the present application;
FIG. 6 is a schematic diagram showing a structure of a neural network model fault monitoring device in an automatic driving system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a neural network model fault monitoring device in an automatic driving system according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features and aspects of the application will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
For a better understanding of aspects of embodiments of the present application, related terms and concepts that may be related to embodiments of the present application are described below.
1. Probability distribution
A probability distribution expresses the law governing the values taken by a random variable. If the result of a random test is represented by the value of a random variable, the probability distribution of the random test is the probability distribution of that random variable, that is, the values the random variable can take and the probability of taking each value. Depending on the type of random variable, probability distributions take different forms, such as the Gaussian distribution (also known as the normal distribution), the binomial distribution, the Poisson distribution, the uniform distribution, the Bernoulli distribution, the Laplace distribution, the exponential distribution, the gamma distribution, the beta distribution, the multinomial distribution, and so on.
2. Relative entropy
Relative entropy, also called KL divergence (Kullback-Leibler divergence, KLD), is an asymmetric measure of the difference between two probability distributions P and Q. Relative entropy can measure the distance between two probability distributions: when the two probability distributions are identical, their relative entropy is zero, and as the difference between the two probability distributions increases, their relative entropy increases accordingly.
Typically, P represents the true distribution of the data, and Q represents the theoretical distribution of the data, an estimated model distribution, or an approximation of P. The relative entropy of P and Q is shown in the following formula (1):

D(P||Q) = Σ_i P(i) · ln( P(i) / Q(i) ) ..................................(1)

where P(i) represents the i-th element in P, Q(i) represents the i-th element in Q, and ln(·) denotes the natural logarithm.
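As a purely illustrative check of formula (1) (the distributions below are made up, not taken from the patent), the following Python snippet shows that identical distributions give a relative entropy of zero and that the value grows as Q departs from P:

```python
# Numeric illustration of formula (1).
import numpy as np

def relative_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
print(relative_entropy(p, p))                # 0.0: identical distributions
print(relative_entropy(p, [0.4, 0.4, 0.2]))  # about 0.025: small difference
print(relative_entropy(p, [0.1, 0.1, 0.8]))  # about 0.86: larger difference
```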
3. Monte Carlo process
The Monte Carlo method, also called the statistical simulation method or statistical test method, is a numerical simulation method that takes probabilistic phenomena as its research object. In general, unknown characteristics are estimated from statistical values obtained through sample surveys; in computational simulation, a probability model resembling the behaviour of the system is constructed and random tests are performed on it, so that the random characteristics of the system can be simulated.
4. Classifier
Many neural network models ultimately have a classifier for classifying the input data. The classifier is generally composed of a fully connected layer and a softmax function (also referred to as a normalized exponential function), and outputs different classes, or the probabilities of different classes, depending on the input data.
5. Multi-layer perceptron (MLP)
An MLP is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors. An MLP can be seen as a directed graph; the basic structure of a multi-layer perceptron consists of multiple node layers: an input layer, one or more intermediate hidden layers, and an output layer, with each node layer fully connected to the next. Except for the input nodes, each node is a neuron with a nonlinear activation function. The MLP imitates the principle of the human nervous system to learn and make predictions on data, and its main advantage is the ability to solve complex problems quickly.
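The following is a minimal, hypothetical numpy sketch of an MLP forward pass, included only to illustrate the layer structure described above; the layer sizes and random weights are arbitrary and not part of the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, w1, b1, w2, b2):
    hidden = relu(x @ w1 + b1)   # fully connected hidden layer with nonlinear activation
    return hidden @ w2 + b2      # fully connected output layer

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                    # one input vector with 4 features
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # input layer -> hidden layer
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)  # hidden layer -> output layer (e.g., 3 classes)
print(mlp_forward(x, w1, b1, w2, b2).shape)    # (1, 3)
```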
6. k nearest neighbor algorithm (k-nearest neighbor, KNN)
The basic logic of the KNN algorithm is as follows: classification is performed by measuring the distances between different feature values, and in the classification decision the algorithm determines the category of the sample to be classified only according to the categories of the nearest one or several samples. The basic idea is: if the majority of the K most similar samples (i.e., the nearest neighbors) of a sample in the feature space belong to a certain class, then the sample also belongs to this class, where K is typically an integer no greater than 20. In the KNN algorithm, the selected neighbors are all samples that have already been correctly classified.
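The snippet below is a small, self-contained illustration of the KNN idea; the sample data and labels are invented, and this is only the kind of distance-based decision rule that the machine-learning-based first classifier described in this application may rely on, not a prescribed implementation.

```python
import numpy as np

def knn_classify(query, samples, labels, k=3):
    """Assign the majority label of the k nearest samples (Euclidean distance)."""
    distances = np.linalg.norm(samples - query, axis=1)
    nearest = np.argsort(distances)[:k]
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]

# e.g. classify a relative-entropy vector against labelled "normal"/"faulty" sample sets
samples = np.array([[0.1, 0.2], [0.15, 0.25], [0.9, 1.1], [1.0, 1.2]])
labels = np.array(["normal", "normal", "faulty", "faulty"])
print(knn_classify(np.array([0.95, 1.15]), samples, labels, k=3))  # -> "faulty"
```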
7. Neural network model
A neural network model is a computational model formed by a large number of interconnected nodes (or neurons). Each node represents a specific output function, called the activation (excitation) function. Each connection between two nodes carries a weighting for the signal passing through it, called a weight, which corresponds to the memory of an artificial neural network. The output of a neural network model differs according to its connection pattern, its weight values, and its activation functions. The neural network model itself is usually an approximation of some algorithm or function in nature, and may also be an expression of a logic strategy. A neural network model typically includes a plurality of neural network layers, where each neural network layer may include one or more nodes. Neural network models may be classified into deep neural networks (Deep Neural Network, DNN), convolutional neural networks (Convolutional Neural Network, CNN), recurrent neural networks (Recurrent Neural Network, RNN), and so on. A deep neural network, also called a multi-layer neural network, can be understood as a neural network model with multiple hidden layers; its internal neural network layers can be divided into three types: input layer, hidden layers, and output layer. In general, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers; the layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. A convolutional neural network is a neural network model with a convolutional structure; it comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. A convolutional layer is a neuron layer that performs convolution processing on input data in the convolutional neural network; in a convolutional layer, a neuron may be connected to only a portion of the neurons in adjacent layers. A convolutional layer typically contains a number of feature planes, each of which may consist of a number of neurons arranged in a rectangular pattern; neurons of the same feature plane share weights, i.e., share a convolution kernel.
8. Neural network model fault monitoring
Neural network model fault monitoring means monitoring the operation faults that may occur during the inference operation of a neural network model. An operation fault may include a fault caused by a hardware failure in the device on which the neural network model is deployed, an erroneous inference result obtained by the neural network model because of abnormal input, and the like. Faults caused by hardware failure are generally referred to as soft errors; common soft errors can be classified into transient errors and permanent errors. A transient error is related to hardware failure caused by sudden changes in the external environment such as radiation or temperature, or by mutual interference within the hardware itself; it is characterized in that the error disappears after persisting for a certain period of time, and a common transient error is a bit flip. Common permanent errors are stuck-at-0 and stuck-at-1, which are associated with hardware failures caused by open circuits and short circuits, respectively, and are characterized in that the error remains in place for a long time.
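For intuition about how a single soft error can distort a layer's output data, the following sketch (not part of the patent; the chosen value and bit position are arbitrary) flips one bit in a float32 value:

```python
import numpy as np

def flip_bit(value, bit):
    """Return `value` with one bit of its float32 representation flipped."""
    raw = np.array(value, dtype=np.float32).view(np.uint32)
    flipped = raw ^ np.uint32(1 << bit)
    return flipped.view(np.float32).item()

# Flipping a high exponent bit of 1.5 yields nan here: a single transient bit flip
# can turn an ordinary activation into an abnormal output value.
print(flip_bit(1.5, 30))
```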
In the related art, neural network model fault monitoring is performed by means of a redundant design, such as a triple modular redundancy (TMR) design, in which a plurality of modules with the same structure are connected in parallel to execute the same function. In this method, a pre-prepared lookup table is used to collect, as far as possible, all neuron weight values obtained when the neural network model is error free; during inference, if a certain weight value is not in the lookup table, that weight value is considered abnormal, that is, the neural network model has an operation fault; a weight-switching state is then entered, and the weight value of the erroneous neuron is assigned to other neurons, so that the other neurons take over the role of the erroneous neuron. Alternatively, a symptom-based error detector (Symptom-based Error Detectors, SED) is used for neural network model fault monitoring, judging whether the neural network model has an operation fault according to the output values corresponding to each neural network layer in the neural network model. In this method, a plurality of output values of each neural network layer are collected in advance when the neural network model is error free, and a reasonable value range of the output values corresponding to each neural network layer is determined from the collected output values; during inference, if an output value of a certain neural network layer exceeds 1.1 times the corresponding reasonable value range, the output value is considered erroneous, and it is judged that the neural network model has an operation fault.
Both of the above approaches to neural network model fault monitoring have their own limitations. The redundancy design is only suitable for multi-layer perceptron networks, that is, networks whose input data are not dimension-reduced by convolution, pooling, or the like; when monitoring possible faults of a neural network model with convolutional layers and pooling layers, the weight-value lookup-table design cannot effectively monitor the convolutional layers or pooling layers, and this approach can only monitor some stuck-at-1 and bit flip errors. In addition, for a more complex neural network model, the operation overhead of this approach is high and the real-time performance of fault monitoring cannot be guaranteed; for example, the input layer of the neural network model AlexNet alone has more than 3000 weights, so collecting the weight values of all neurons in AlexNet in advance is difficult and costly, and when the lookup table is used for fault monitoring of AlexNet, the large number of weights and weight values in the table makes queries slow, so the approach is not suitable for scenarios such as an automatic driving system that place high requirements on the real-time performance of fault monitoring. In the SED approach, the maximum and minimum output values of each neural network layer are extracted by a simple enumeration algorithm to obtain the reasonable value range of the output values of each layer; for a relatively complex neural network model, the operation overhead of this approach is also very large. For example, in AlexNet, considering the output values of every neuron, pooling layer and fully connected layer, the output values of single neurons alone number more than one hundred thousand, and a convolutional layer alone yields more than 15000 output values, so collecting the output values of each neural network layer brings huge operation overhead; when fault monitoring is performed on AlexNet, the resulting delay makes the approach unsuitable for scenarios such as an automatic driving system with high real-time requirements on fault monitoring. In addition, the SED approach can only monitor transient errors; for stuck-at-0 and stuck-at-1, the maximum output value of a hidden layer does not change significantly in either case, so permanent errors cannot be monitored.
In view of the limitations of the above two neural network model fault monitoring methods, the neural network model fault monitoring method provided by the embodiments of the present application (described in detail below) can be applied to scenarios in which a neural network model is deployed, for example: automatic driving vehicles, vehicle-mounted devices, or vehicle-mounted systems (such as an automated driving system (Automated Driving System, ADS) or an advanced driver assistance system (Advanced Driver Assistant Systems, ADAS)); deep learning training servers deployed at large scale; scenarios in which a neural network model is used for object recognition, semantic recognition and the like in Internet of Things (Internet of Things, IoT) devices; and scenarios in which a neural network model is used for vehicle detection, object detection and the like in security devices.
For convenience of description, taking fault monitoring of a neural network model in an automatic driving system as an example, a neural network model fault monitoring method provided by an embodiment of the present application is described in an exemplary manner.
FIG. 1 illustrates a schematic architecture of an automatic driving system in accordance with one embodiment of the present application; as shown in fig. 1, the automatic driving system may include: a perception module (perception layer), a planning and decision module (planning & decision), and a transmission control module (motion controller).
The perception module is used for perceiving the environment around the vehicle or inside the vehicle: it can synthesize the data around the vehicle or in the vehicle cabin collected by vehicle-mounted sensors such as cameras, laser radars, millimeter-wave radars, ultrasonic radars and light sensors, perceive the environment around or inside the vehicle, and transmit the perception result to the planning and decision module. For example, the data collected by the vehicle-mounted sensors around the vehicle or in the vehicle cabin may include video streams, radar point cloud data, or analyzed and structured information or data on people, vehicles, positions, speeds, steering angles, sizes, etc. The perception module can process the data around the vehicle or in the vehicle cabin acquired by the vehicle-mounted sensors through a neural network model so as to realize environment perception; the neural network model can be deployed in a processing device such as a vehicle-mounted computing platform or an AI accelerator. As one example, the perception module may acquire an image of the surrounding environment of the vehicle captured by the on-board camera, and process the image using a deep neural network model for image recognition, so that objects such as pedestrians, lane lines, vehicles, obstacles, traffic lights, etc. in the image may be identified.
The planning and decision-making module is used for making an analysis decision based on the perception result generated by the perception module, and planning to generate a control set meeting specific constraint conditions (such as dynamics constraint of the vehicle, collision avoidance, passenger comfort and the like); and may transmit the control set to the transmission control module. As one example, the planning and decision module may process the perceived result and constraints to generate a control set using a neural network model for generating the trajectory; the neural network model may be deployed in a processing device such as an on-board computing platform or an AI accelerator, for example.
The transmission control module is used for controlling the running of the vehicle according to the control set generated by the planning and decision module; for example, control signals such as steering angle, speed, acceleration and the like of a steering wheel can be generated based on a control set and combined with dynamics information of a vehicle, and the vehicle-mounted steering system or an engine and the like can be controlled to execute the control signals, so that vehicle running can be controlled.
Illustratively, the autopilot system may also include other functional modules; such as a positioning module, an interaction module, a communication module, etc. (not shown), which are not limiting. The positioning module can be used for providing position information of the vehicle and also providing attitude information of the vehicle. Illustratively, the positioning module may include a satellite navigation system (Global Navigation Satellite System, GNSS), an inertial navigation system (Inertial Navigation System, INS), etc., which may be used to determine the location information of the vehicle. The interaction module may be used to send information to the driver and receive instructions from the driver. The communication module may be used for the vehicle to communicate with other devices, where the other devices may include mobile terminals, cloud devices, other vehicles, roadside devices, etc., may be implemented through wireless communication connections such as 2G/3G/4G/5G, bluetooth, frequency modulation (frequency modulation, FM), wireless local area network (wireless local area networks, WLAN), long term evolution (long time evolution, LTE), vehicle-to-anything (vehicle to everything, V2X), vehicle-to-vehicle communication (Vehicle to Vehicle, V2V), long term evolution-vehicle (long time evolution vehicle, LTE-V), etc.
The neural network model fault monitoring method in an automatic driving system provided by the embodiments of the present application can be executed by a neural network model fault monitoring device. Taking fault monitoring of the deep neural network model for image recognition in the perception module in fig. 1 as an example, fig. 2 shows a schematic diagram of fault monitoring of a neural network model according to an embodiment of the present application. As shown in fig. 2, the neural network model fault monitoring device may acquire intermediate data generated while the deep neural network model for image recognition in the perception module of the automatic driving system identifies a frame of image, execute the neural network model fault monitoring method of the embodiments of the present application (see below for details) to perform real-time and accurate fault monitoring on the deep neural network model, and feed the fault monitoring result back to the perception module in real time, so that the perception module determines whether to transmit the current recognition result to the planning and decision module. For example, the perception module may be informed that the neural network model works normally, so that it transmits the recognition result of the frame of image to the planning and decision module; alternatively, the perception module may be informed that the neural network model has failed, so that it discards the recognition result of the frame of image.
The embodiment of the application is not limited to the type of the neural network model fault monitoring device.
The neural network model fault monitoring device may be provided independently, may be integrated in other devices, or may be implemented by software or a combination of software and hardware.
The neural network model fault monitoring device may be an autonomous vehicle, or other components in an autonomous vehicle, for example. Wherein the neural network model fault monitoring device includes, but is not limited to: vehicle-mounted terminals, vehicle-mounted controllers, vehicle-mounted modules, vehicle-mounted components, vehicle-mounted chips, vehicle-mounted units, vehicle-mounted radars or vehicle-mounted cameras and the like. As one example, the neural network model fault monitoring device may be integrated in a processing device such as an on-board computing platform or AI accelerator of an autonomous vehicle.
The neural network model fault monitoring device may also be, for example, a smart terminal having data processing capabilities other than an autonomous vehicle, or a component or chip provided in the smart terminal.
The neural network model fault monitoring device may be a general-purpose device or a dedicated device, for example. For example, the device may also be a desktop computer, a portable computer, a network server, a palmtop computer (personal digital assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device or another device with data processing capabilities, or a component or chip within such a device.
The neural network model fault monitoring device may also be a chip or processor with processing functionality, for example, and may include multiple processors. The processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
It should be noted that, the above application scenario described in the embodiment of the present application is for more clearly describing the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided in the embodiment of the present application, and those skilled in the art can know that, for other similar or new scenarios, the technical solution provided in the embodiment of the present application is applicable to similar technical problems.
The following describes a neural network model fault monitoring method in an automatic driving system in detail.
Fig. 3 is a flowchart illustrating a neural network model fault monitoring method in an automatic driving system according to an embodiment of the present application, which may be performed by the neural network model fault monitoring device in fig. 2; as shown in fig. 3, the method may include the following steps:
step 301, acquiring a target output data set of a neural network model to be monitored in an automatic driving system.
The neural network model to be monitored may be any neural network model in an automatic driving system, for example, a deep neural network model for image recognition or a neural network model for voice recognition configured in a perception module, or a neural network model for generating a control set configured in a planning and decision module, or the like.
It should be noted that, in the embodiment of the present application, the type of the neural network model is not limited, and may be, for example, a deep neural network, a convolutional neural network, a recurrent neural network, or the like.
The target output data set may include output data sets corresponding to each of m neural network layers; the neural network model to be monitored includes M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M. For any neural network layer, the output data set corresponding to that layer includes the data output by all nodes in that layer during the inference process of the neural network model to be monitored. The specific value of m can be preset according to the scale of the neural network model to be monitored, the actual computing resources, and the like. For example, the value of m may be set close to M, that is, output data sets corresponding to as many neural network layers as possible are obtained, so as to improve monitoring accuracy; when m is equal to M, the neural network model fault monitoring device obtains the output data sets corresponding to all the neural network layers in the neural network model to be monitored. Alternatively, the value of m may be set to a smaller value, that is, output data sets corresponding to only a small number of neural network layers are acquired, so as to save operation resources, improve processing efficiency and better meet real-time requirements.
As an example, the neural network model to be monitored may be a convolutional neural network used for image recognition in an autopilot system sensing module, where the convolutional neural network may include a plurality of convolutional layers, a pooling layer, a full-connection layer, and the like, and the image collected by the sensing module is input into the convolutional neural network, and after being processed by the convolutional layer, the pooling layer, and the full-connection layer, an image recognition result is output; wherein, each convolution layer can comprise one or more convolution kernels, each convolution kernel can extract a corresponding feature map, and then the target output data set of the convolution neural network can comprise the feature maps extracted by all convolution kernels in each convolution layer.
And 302, extracting a characteristic value set corresponding to each neural network layer from the target output data set.
For any neural network layer, the feature value set corresponding to that layer may include one or more feature values corresponding to that layer. For any one of the m neural network layers, output data may be extracted from the output data set corresponding to that layer as feature values, so as to obtain the feature value set corresponding to that layer. The quantity of output data to be extracted may be preset as required; for example, the quantity extracted for different neural network layers may be the same or different, which is not limited. This step can be understood as feature extraction in the sense of feature engineering: the distribution of the output data of each neural network layer is reflected as comprehensively as possible while extracting as few output data as possible as feature values.
For any one of the m neural network layers, output data may be extracted from the output data set corresponding to that layer as feature values according to a preset probability distribution, so as to obtain the feature value set corresponding to that layer; for example, a part of the output data may be extracted from the output data set corresponding to that layer in a Gaussian distribution manner as feature values, so as to obtain the feature value set corresponding to that layer.
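One possible, purely illustrative reading of extracting feature values "in a Gaussian distribution manner" is to sample positions in a layer's output set with probabilities given by a discretized Gaussian; the patent does not prescribe this exact scheme, and the function name and parameters below are assumptions.

```python
import numpy as np

def gaussian_sample(outputs, n_features, sigma_ratio=0.25, seed=0):
    """Sample n_features values from `outputs`, favouring positions near the middle."""
    rng = np.random.default_rng(seed)
    idx = np.arange(outputs.size)
    center, sigma = outputs.size / 2.0, sigma_ratio * outputs.size
    weights = np.exp(-0.5 * ((idx - center) / sigma) ** 2)   # discretized Gaussian
    weights /= weights.sum()
    chosen = rng.choice(idx, size=n_features, replace=False, p=weights)
    return outputs[chosen]
```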
Possible implementations of extracting the feature value sets corresponding to the neural network layers are illustrated below.
In a first manner, a first output data set with the smallest quantity of output data in the target output data set is determined; the feature value set corresponding to each neural network layer is then extracted from the output data set corresponding to that layer according to the quantity of output data in the first output data set; the quantity of feature values in the feature value set extracted for each neural network layer is smaller than or equal to the quantity of output data in the first output data set.
For example, the number of output data to be extracted by each neural network layer may be determined according to the number of output data in the first output data set, and then the number of output data is extracted in each neural network layer as a feature value, so as to obtain a feature value set corresponding to each neural network layer.
In this manner, the feature value set corresponding to each neural network layer is adaptively extracted from the output data set corresponding to that layer, and the quantity of feature values extracted for each neural network layer is not greater than the quantity of output data in any of the m neural network layers, thereby reducing operation overhead, improving subsequent processing efficiency, and meeting the real-time requirement of fault monitoring.
As an example, a sampling coefficient may be preset, and the amount of output data to be extracted by each neural network layer is determined according to the sampling coefficient and the amount of output data in the first output data set; for example, the number n of output data to be extracted for each neural network layer may be determined by the following formula (2):
n = α × n_tmp ......(2)

In formula (2), n_tmp represents the quantity of output data in the first output data set, α represents the sampling coefficient, and the value range of α is [0, 1].
The sampling coefficient alpha is used for balancing the complexity and accuracy of fault monitoring of the neural network model to be monitored, and a specific numerical value of the sampling coefficient can be set according to actual requirements; for example, α may be set to a higher value under the condition of higher requirement for monitoring accuracy, that is, for each neural network layer, extracting a larger amount of output data in the corresponding output data set, as a corresponding feature value of the neural network layer; under the condition that the monitoring accuracy requirement is not too high, the alpha value is set to be a smaller value, namely, for each neural network layer, a smaller amount of output data is extracted from the corresponding output data set and used as the corresponding characteristic value of the neural network layer, so that operation resources are saved, the processing efficiency is improved, and the real-time requirement is better met.
Illustratively, α may be 10%. Illustratively, when the value of α × n_tmp is a non-integer, α × n_tmp is rounded down to give n.

Wherein n_tmp can be determined by the following formula (3):

n_tmp = min_(i∈m) φ(i) ......(3)
in the formula (3), Φ (i) represents the number of output data in the output data set corresponding to the i-th neural network layer among the m neural network layers.
In this way, the number of output data to be extracted by each neural network layer, that is, the number of eigenvalues in the eigenvalue set, can be determined according to the above formula (2) and formula (3). As an example, 10% of the total number of output data included in the first output data set with the smallest number of output data may be used as the number of output data to be extracted by each neural network layer, thereby simplifying the operation overhead and improving the subsequent processing efficiency.
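The following is a minimal Python sketch of this first mode, in which the layer output sizes, the sampling coefficient value, and the helper function names are illustrative assumptions rather than part of the embodiment:

```python
import numpy as np

def sample_count(layer_outputs, alpha=0.1):
    # Formula (3): n_tmp is the smallest quantity of output data over the m layers.
    n_tmp = min(out.size for out in layer_outputs)
    # Formula (2): n = alpha * n_tmp, rounded down when the product is not an integer.
    return int(np.floor(alpha * n_tmp))

def extract_feature_sets(layer_outputs, alpha=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = sample_count(layer_outputs, alpha)
    # Extract the same number n of output data from every layer as its feature value set.
    return [rng.choice(out.ravel(), size=n, replace=False) for out in layer_outputs]

# Illustrative output data sets of three neural network layers (e.g. flattened feature maps).
layers = [np.random.randn(64, 56, 56), np.random.randn(128, 28, 28), np.random.randn(4096)]
features = extract_feature_sets(layers, alpha=0.1)
print([f.size for f in features])  # every layer contributes the same number of feature values
```

Because n is bounded by the smallest layer, every feature value set has the same size, which keeps the later relative entropy computation uniform across the m neural network layers.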
In a second mode, the feature value set corresponding to each neural network layer is extracted by taking the quantity of output data in the output data set corresponding to each neural network layer as a weight.
Considering that the quantity of output data differs between neural network layers, the influence of each layer on the working state of the neural network model also differs; therefore, the number of feature values extracted from each neural network layer can be varied appropriately. In this mode, the number of output data extracted from each neural network layer is allocated according to the weight given by the quantity of output data corresponding to that layer: the more output data a neural network layer has, the more output data are extracted from it as feature values; correspondingly, the less output data a neural network layer has, the fewer output data are extracted as feature values. In this way, adaptive extraction of the feature value sets corresponding to the neural network layers is realized, the extracted feature value sets more accurately reflect the distribution of the output data of the neural network layers, and at the same time the feature value extraction reduces operation overhead, improves subsequent processing efficiency and meets the real-time requirement on fault monitoring.
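This second mode can be sketched as follows; the total sampling budget and the minimum of one feature value per layer are assumptions made for this illustration:

```python
import numpy as np

def weighted_feature_sets(layer_outputs, total_budget=300, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    sizes = np.array([out.size for out in layer_outputs], dtype=float)
    weights = sizes / sizes.sum()                    # weight = each layer's share of the output data
    counts = np.maximum(1, (weights * total_budget).astype(int))  # larger layers yield more feature values
    return [rng.choice(out.ravel(), size=int(min(c, out.size)), replace=False)
            for out, c in zip(layer_outputs, counts)]

layers = [np.random.randn(64, 56, 56), np.random.randn(128, 28, 28), np.random.randn(4096)]
feature_sets = weighted_feature_sets(layers)
print([f.size for f in feature_sets])   # the counts follow the per-layer weights
```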
Step 303, calculating the relative entropy values between the feature value set corresponding to each neural network layer and the first element set conforming to the preset probability distribution, and obtaining the relative entropy value sets corresponding to m neural network layers.
The first element set can comprise a plurality of elements conforming to a preset probability distribution, and can be generated in real time or pre-stored; for example, a predetermined number of random numbers, which are subject to a predetermined probability distribution, may be generated in real time, the predetermined number of random numbers constituting the first element set; the preset probability distribution may be, for example, a gaussian distribution.
For any one of the m neural network layers, a relative entropy value between the set of feature values corresponding to the neural network layer and the first element set conforming to the preset probability distribution can be obtained, where the relative entropy value is a real number, and the magnitude of the relative entropy value indicates the difference between the distribution formed by the feature values in the set of feature values corresponding to the neural network layer and the preset probability distribution. In this way, traversing all the neural network layers in the m neural network layers, and calculating to obtain the relative entropy values between each neural network layer and the first element set, so as to obtain a plurality of real numbers, thereby obtaining a relative entropy value set; the relative entropy values in the relative entropy value sets can represent the difference between the distribution formed by the characteristic values in the characteristic value sets corresponding to the neural network layers in the m neural network layers and the preset probability distribution. Meanwhile, a set of relative entropy values is obtained by utilizing the corresponding characteristic value sets of the neural network layers, so that the data dimension reduction is realized, and the operation efficiency is further improved.
And 304, judging whether the neural network model to be monitored has operation faults or not according to the set of the relative entropy values corresponding to the m neural network layers.
During inference when the neural network model operates normally, the relative entropy value between the feature value set extracted from the normal output data of each neural network layer and the first element set conforming to the preset probability distribution can represent the difference between the normal output data of each neural network layer and the first element set; when the neural network model fails, the relative entropy value between the feature value set extracted from the abnormal output data of each neural network layer and the first element set conforming to the preset probability distribution can represent the difference between the abnormal output data of each neural network layer and the first element set. Because the normal output data of each neural network layer differs from the abnormal output data, the corresponding relative entropy values also differ; therefore, the relative entropy values can be used to distinguish the normal output data of each neural network layer during inference under normal operation from the abnormal output data of each neural network layer during inference when the neural network model fails.

In addition, the amount of data in the output data sets (for example, normal output data or abnormal output data) of each neural network layer is generally large, that is, the output data sets are widely distributed in the data space; distinguishing different output data sets by different relative entropy values, that is, establishing a correspondence between the relative entropy values and the widely distributed output data sets in the data space, widens the difference between different output data sets in the data space and reduces the coupling degree between them.

In this step, unlike directly judging whether the neural network model to be monitored has an operation fault from the output data of each neural network layer during inference, the normal output data and the abnormal output data of each neural network layer are distinguished through the relative entropy value sets corresponding to the m neural network layers, so that whether the neural network model to be monitored has an operation fault is judged more accurately. For example, if the difference between the normal output data and the abnormal output data of each neural network layer is small, the two are not easy to distinguish directly; however, the relative entropy value between the feature value set extracted from the normal output data and the first element set conforming to the preset probability distribution differs from the relative entropy value between the feature value set extracted from the abnormal output data and that element set, so distinguishing the normal output data from the abnormal output data through the relative entropy values allows an accurate judgment of whether the neural network model to be monitored has an operation fault.
In one possible implementation, this step may include: and inputting the relative entropy value sets corresponding to the m neural network layers into a preset classification model, and judging whether the neural network model to be monitored has an operation fault or not.
The preset classification model can automatically classify the relative entropy value set according to the magnitude of each relative entropy value in the set and accurately determine the category to which the set belongs, where the categories may include that the neural network model to be monitored works normally and that the neural network model to be monitored fails. That is, the relative entropy value set is input into the preset classification model, and the preset classification model classifies it based on the relative entropy values between feature value sets extracted from known normal output data and an element set conforming to the preset probability distribution, and the relative entropy values between feature value sets extracted from known abnormal output data and that element set, thereby accurately judging whether the neural network model to be monitored has an operation fault.
Illustratively, the preset classification model may include a first classifier based on machine learning or a second classifier based on deep learning, etc.; for example, the first classifier may be KNN, the second classifier may be MLP, and so on.
The neural network model fault monitoring method in the automatic driving system provided by the embodiment of the application has the characteristics of low operation cost, high real-time performance, high accuracy, wide application range and the like.
In the embodiment of the application, considering the complexity of the neural network model in the automatic driving system, the number of the included neural network layers is usually more, and the corresponding output data is larger, so that the output data of each neural network layer is selectively sampled based on the thought of the Monte Carlo method, partial output data in an output data set is extracted as a characteristic value, the extracted characteristic value distribution can be used as the estimation of the distribution of the output data of each neural network layer in a target output data set, namely, the distribution of the output data of each neural network layer is reflected by the least characteristic value, thereby simplifying calculation, saving operation cost and improving operation efficiency; meanwhile, a relative entropy value set is obtained by calculating the relative entropy value between the characteristic value set corresponding to each neural network layer and the first element set conforming to the preset probability distribution, so that the data dimension reduction is realized, and the operation efficiency is further improved; therefore, the real-time performance of fault monitoring is improved, and the real-time monitoring of the neural network model faults in the automatic driving system is realized.
In the embodiment of the application, the distribution difference characteristics of the normal output data and the abnormal output data of each neural network layer are described by adopting the relative entropy values, and the normal output data and the abnormal output data of each neural network layer are distinguished by the relative entropy value sets corresponding to m neural network layers, so that whether the neural network model to be monitored has operation faults or not is judged more accurately according to the relative entropy value sets, and the accuracy of fault monitoring is improved. For example, for the fault monitoring mode for Alexnet, compared with the SED fault monitoring mode, when 500 errors occur in Alexnet as well, the accuracy of Alexnet fault monitoring is improved greatly in the embodiment of the present application.
In the embodiment of the application, various operation faults of the neural network model or the operation faults of various neural network models can be effectively monitored, and the application range is wide; for example, the operation faults of various neural network models such as a deep neural network model, a convolutional neural network model and the like can be monitored; for another example, the operation faults of the neural network model to be monitored, including transient faults, permanent faults and the like, caused by hardware failure in equipment deploying the neural network model to be monitored, such as a vehicle-mounted computing platform or an AI accelerator, in the automatic driving system can be monitored in real time; the operation fault of the neural network model to be monitored caused by abnormal input in the automatic driving system can be monitored in real time, so that the safety of a vehicle-mounted computing platform or an AI accelerator and the like is improved. In addition, the range of the neural network layers which are likely to fail can be determined, namely that one or more of the m neural network layers cause the operation failure of the neural network model to be monitored.
In the above step 304, a possible implementation manner of determining whether the neural network model to be monitored has an operation fault according to the set of relative entropy values is illustrated.
Taking a preset classification model as a first classifier based on machine learning as an example, a relative entropy value set can be input into the first classifier, and distances between the relative entropy value set and a plurality of relative entropy value sample sets are calculated; and judging whether the neural network model to be monitored has operation faults or not according to the distances between the relative entropy value set and the plurality of relative entropy value sample sets.
The plurality of relative entropy sample sets may include relative entropy sample sets corresponding to m neural network layers when the neural network model to be monitored fails, and relative entropy sample sets corresponding to m neural network layers when the neural network model to be monitored works normally.
For example, a plurality of relative entropy sample sets may be obtained by sampling in advance, that is, each relative entropy sample set belongs to a class known, wherein the class may be divided into that the neural network model to be monitored works normally and that the neural network model to be monitored malfunctions. The magnitude of the distance between the set of relative entropy values and the plurality of sets of relative entropy samples may represent a degree of difference between the set of relative entropy values and each of the plurality of sets of relative entropy samples; for example, if the distance between a set of relative entropy values and a certain set of relative entropy samples is larger, the larger the difference between the set of relative entropy values and the set of relative entropy samples is, the lower the likelihood that the set of relative entropy values and the set of relative entropy samples belong to the same class is correspondingly. If the distance between the set of relative entropy values and a certain set of relative entropy samples is smaller, the difference between the set of relative entropy values and the set of relative entropy samples is smaller, and accordingly, the set of relative entropy values and the set of relative entropy samples are more likely to belong to the same category.
For example, the set of relative entropy values may be input into a first classifier, where the first classifier calculates distances between the set of relative entropy values and a plurality of sets of relative entropy values, so that sets of relative entropy values of different classes may be divided in a feature space, and then the set of relative entropy values and one or more sets of relative entropy values that are closest to the divided set of relative entropy values may be considered to belong to one class more likely, and further, whether the neural network model to be monitored has an operation fault is determined according to the class to which most of the sets of relative entropy values in the one or more sets of relative entropy values belong.
Taking a first classifier as a KNN classifier as an example, inputting a relative entropy value set into the KNN classifier, wherein the KNN classifier can automatically calculate the distance between the relative entropy value set and each relative threshold value sample set in a plurality of relative entropy value sample sets, select K relative entropy value sample sets closest to the relative entropy value set, and take the category of most of the K relative entropy value sample sets as the category of the relative entropy value set in a majority voting mode; if the type of the relative entropy value set is that the neural network model to be monitored has faults, judging that the neural network model to be monitored has operation faults; if the relative entropy value set category is that the neural network model to be monitored works normally, the neural network model to be monitored can be judged to have no operation fault. Therefore, the first classifier based on machine learning is utilized, pre-training is not needed, and the relative entropy value set can be automatically classified more conveniently and rapidly according to the distance between the relative entropy value set and the plurality of relative entropy value sample sets, so that whether the neural network model to be monitored has operation faults or not is judged in real time.
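A minimal sketch of such a machine-learning-based first classifier is given below, assuming scikit-learn's KNeighborsClassifier as the KNN implementation; the synthetic relative entropy sample sets, the number of monitored layers m, and the value K=5 are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
m = 8   # number of monitored neural network layers, i.e. length of one relative entropy value set

# Illustrative relative entropy sample sets: label 1 = model works normally, label 0 = model fails.
normal_samples = rng.normal(loc=0.5, scale=0.1, size=(100, m))
faulty_samples = rng.normal(loc=1.5, scale=0.3, size=(100, m))
X = np.vstack([normal_samples, faulty_samples])
y = np.array([1] * 100 + [0] * 100)

# KNN needs no pre-training beyond storing the sample sets; the class is decided by majority vote
# among the K relative entropy sample sets closest to the input relative entropy value set.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

kl_set = rng.normal(loc=1.4, scale=0.3, size=(1, m))   # relative entropy value set to be classified
print("operation fault" if knn.predict(kl_set)[0] == 0 else "working normally")
```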
Taking a preset classification model as a second classifier based on deep learning as an example, inputting a relative entropy value set into the second classifier, and judging whether the neural network model to be monitored has an operation fault or not; the second classifier is trained by a plurality of relative entropy value sample sets.
For example, the second classifier may be trained in advance according to a plurality of relative entropy sample sets and known categories to which each relative entropy sample set belongs, and after training, the second classifier may accurately distinguish between relative entropy sets of different categories. When fault monitoring is carried out, the relative entropy value set can be input into a trained second classifier, and the second classifier can automatically judge the class to which the relative entropy value set belongs, so that whether the neural network model to be monitored has operation faults or not can be accurately judged; therefore, by adopting the second classifier based on deep learning, the accuracy of classifying the relative entropy value set is effectively improved while judging the class of the relative entropy value set in real time, so that whether the neural network model to be monitored has operation faults or not is judged more accurately.
Taking a second classifier as an example, taking the MLP as an example, wherein the topology structure of the MLP can be set according to the number of relative entropy values in the relative entropy value set and the classification class; for example, the topology of the MLP may be (n-20-2), where n represents the number of relative entropy values in the set of relative entropy values input to the MLP input layer; 20 represents the number of MLP hidden layers, and 2 represents two categories output by the MLP output layer, namely that the neural network model to be monitored breaks down and the neural network model to be monitored works normally. In the training stage, training the MLP by using a plurality of relative entropy sample sets as training samples, wherein the relative entropy sample sets corresponding to m neural network layers can be used as negative samples when the neural network model to be monitored breaks down, and the relative entropy sample sets corresponding to m neural network layers can be used as positive samples when the neural network model to be monitored works normally; inputting training samples and corresponding class labels into the MLP, training weight parameters in the MLP, for example, one training sample can be input into the MLP, the MLP outputs the class of the training sample, determines loss function values according to the class and the class labels of the training sample, performs back propagation according to the loss function values, and adjusts the weight parameters in the MLP; and repeating the training process by using a plurality of training samples until convergence is achieved, and fixing the weight parameters in the MLP when convergence is achieved, so as to obtain the trained MLP. In the fault monitoring stage, the relative entropy value set is input into the trained MLP, and the MLP can automatically output the category to which the relative entropy value set belongs, so that whether the operation fault occurs in the current neural network model to be monitored is accurately judged in real time. As one example, for Alexnet for image recognition, when the MLP after training is used to determine whether there is an operation failure in Alexnet, the determination accuracy is improved by about 15% compared to the manner of using SED.
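A sketch of such a deep-learning-based second classifier is given below, assuming scikit-learn's MLPClassifier and interpreting the (n-20-2) topology as one hidden layer of 20 units; the synthetic training samples and hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 8   # number of relative entropy values fed to the MLP input layer

# Positive samples: relative entropy sample sets when the model works normally (label 1);
# negative samples: relative entropy sample sets collected under fault injection (label 0).
X_train = np.vstack([rng.normal(0.5, 0.1, size=(200, n)), rng.normal(1.5, 0.3, size=(200, n))])
y_train = np.array([1] * 200 + [0] * 200)

# Topology (n-20-2): n inputs, one hidden layer of 20 units, two output categories.
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)   # back-propagation adjusts the weight parameters until convergence

kl_set = rng.normal(1.4, 0.3, size=(1, n))   # relative entropy value set from the monitoring stage
print("operation fault" if mlp.predict(kl_set)[0] == 0 else "working normally")
```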
Note that the KNN and the MLP are merely examples, and other classifiers may be used as classification models as needed, which is not limited thereto.
For example, the set of relative entropy value samples corresponding to m neural network layers when the neural network model to be monitored fails may include: the relative entropy value between a first characteristic value sample set corresponding to each neural network layer in the m neural network layers and a second element set conforming to preset probability distribution; the first characteristic value sample set is extracted from output data sample sets corresponding to all the neural network layers when the neural network model to be monitored fails; the set of relative entropy value samples corresponding to m neural network layers when the neural network model to be monitored works normally can comprise: the relative entropy value between the second characteristic value sample set corresponding to each neural network layer in the m neural network layers and the second element set conforming to the preset probability distribution; and the second characteristic value sample set is obtained by extracting output data sample sets corresponding to the neural network layers when the neural network model to be monitored works normally.
Illustratively, the second element set may be the same as the first element set described above; it will be appreciated that the set of elements that fit the preset probability distribution, i.e. the second set of elements, may be predetermined and employed as the first set of elements during the fault monitoring phase.
It can be appreciated that the corresponding set of relative entropy samples can be pre-generated for different neural network models to be monitored according to different scenarios.
FIG. 4 is a flowchart of a method for obtaining a set of samples of relative entropy values according to an embodiment of the application, as shown in FIG. 4, may include the steps of:
step 401, respectively obtaining output data sample sets corresponding to at least one neural network layer in the neural network model to be monitored when the neural network model to be monitored fails and works normally.
As an example, when the neural network model to be monitored works normally, an output data sample set corresponding to each of m neural network layers of the neural network model to be monitored may be obtained.
Taking a neural network model as a depth neural network model for image recognition in a perception module as an example, pre-labeling an object in an original image acquired by a vehicle-mounted camera as a pedestrian in the original image, inputting the original image into the neural network model to be monitored, judging that the object contained in the original image is the pedestrian by reasoning through the neural network model to be monitored, and collecting output data of each neural network layer in the reasoning process as an output data sample set corresponding to each neural network layer in m neural network layers when the neural network model to be monitored works normally. Similarly, different original images can be sequentially adopted, and output data of each neural network layer in each reasoning process can be correspondingly collected, so that a plurality of output data sample sets corresponding to each neural network layer in m neural network layers are obtained when the neural network model to be monitored works normally.
As another example, by means of fault injection, a fault occurring in the reasoning process of the neural network model to be monitored can be simulated, so that when the neural network model to be monitored fails, an output data sample set corresponding to each of m neural network layers in the neural network model to be monitored is obtained.
Taking a neural network model as a deep neural network model for image recognition in a perception module as an example, pre-labeling an object in an original image as a pedestrian in an original image acquired by a vehicle-mounted camera, inputting the original image into the neural network model to be monitored, injecting a fault, judging that the object contained in the original image is not the pedestrian by reasoning through the neural network model to be monitored, and collecting output data of each neural network layer in the reasoning process, thereby being taken as an output data sample set corresponding to each neural network layer in m neural network layers when the neural network model to be monitored breaks down. Similarly, different faults can be sequentially injected or different original images are adopted, the neural network model to be monitored performs multiple reasoning calculation, and output data of each neural network layer in each reasoning process are correspondingly collected, so that a plurality of output data sample sets corresponding to each neural network layer in m neural network layers when the neural network model to be monitored breaks down are obtained.
As another example, by generating adversarial samples, output data sample sets corresponding to each of the m neural network layers when the neural network model to be monitored fails may be obtained. An adversarial sample represents input data on which the neural network model to be monitored cannot reason normally.
Taking the neural network model to be monitored as a deep neural network model for image recognition in the perception module as an example, an object in a frame of original image acquired by the vehicle-mounted camera is pre-labeled as a pedestrian, and a very small amount of carefully constructed noise is added to the original image to obtain an adversarial image; the human eye generally cannot distinguish the adversarial image from the original image, but the neural network model to be monitored may misclassify the object in the adversarial image, for example judging that the object contained in the adversarial image is not a pedestrian, so that an error occurs. The output data of each neural network layer in this reasoning process are collected and taken as the output data sample sets corresponding to each of the m neural network layers when the neural network model to be monitored fails. Similarly, different adversarial images can be generated and the output data of each neural network layer in each reasoning process collected correspondingly, so as to obtain a plurality of output data sample sets corresponding to each of the m neural network layers when the neural network model to be monitored fails.
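A sketch of how the per-layer output data sample sets might be collected during inference is given below, assuming a PyTorch model (torchvision's AlexNet is used only as a stand-in for the deep neural network model to be monitored); the layer selection and the simple noise perturbation standing in for a fault-injected or adversarial run are illustrative assumptions:

```python
import torch
import torchvision.models as models

model = models.alexnet(weights=None).eval()   # stand-in for the deep neural network model to be monitored
captured = {}                                 # output data collected per monitored layer during one inference

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().flatten()   # record this layer's output data
    return hook

# Register hooks on the layers chosen as the m monitored neural network layers.
handles = [module.register_forward_hook(make_hook(name))
           for name, module in model.named_modules()
           if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear))]

image = torch.randn(1, 3, 224, 224)           # placeholder for a camera frame
with torch.no_grad():
    model(image)                              # normal inference run
    normal_outputs = dict(captured)           # output data sample sets when the model works normally
    model(image + 0.05 * torch.randn_like(image))  # crude stand-in for an adversarial / fault-injected run
    abnormal_outputs = dict(captured)         # output data sample sets for the abnormal case

for h in handles:
    h.remove()
print({name: int(v.numel()) for name, v in normal_outputs.items()})
```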
Step 402, extracting a set of characteristic value samples corresponding to at least one neural network layer from the set of output data samples corresponding to at least one neural network layer.
In this step, the manner of extracting the feature value sample sets may refer to the related description in step 302, which is not repeated here. For example, the number of feature value samples in a feature value sample set can be determined by the above formulas (2) and (3). The value of the sampling coefficient can be set according to the requirement; for example, a smaller sampling coefficient can be set to reduce the number of feature value samples in the feature value sample set, so that the training efficiency of the second classifier is effectively improved, training of the second classifier on a small amount of data is realized, and operation resources are effectively saved; alternatively, the efficiency with which the first classifier automatically classifies the relative entropy value set can be effectively improved, better meeting the real-time requirement of fault monitoring.
For example, when the obtained neural network model to be monitored fails, a first characteristic value sample set corresponding to each neural network layer may be extracted from output data sample sets corresponding to each neural network layer in the m neural network layers; and extracting a second characteristic value sample set corresponding to each neural network layer from the output data sample set corresponding to each neural network layer in the m neural network layers when the obtained neural network model to be monitored works normally.
For any neural network layer of the m neural network layers, the output data samples can be extracted from the output data sample set corresponding to the neural network layer as the characteristic value samples according to a preset probability distribution mode, so that the characteristic value sample set corresponding to the neural network layer is obtained, and the robustness of the classification model is improved.
Step 403, calculating a relative entropy value between a characteristic value sample set corresponding to at least one neural network layer and a second element set conforming to a preset probability distribution, and obtaining a relative entropy value sample set.
The method comprises the steps that the relative entropy value between a first characteristic value sample set and a second element set corresponding to each neural network layer can be calculated, and the relative entropy value sample set corresponding to m neural network layers when a neural network model to be monitored breaks down is obtained; the relative entropy value between the second characteristic value sample set and the second element set corresponding to each neural network layer can be calculated, and the relative entropy value sample set corresponding to m neural network layers when the neural network model to be monitored works normally is obtained.
For example, a class to which the set of relative entropy samples belongs may be further labeled, where the class to which the set of relative entropy samples corresponding to m neural network layers when the neural network model to be monitored fails may be labeled as the class to which the neural network model to be monitored fails, and the class to which the set of relative entropy samples corresponding to m neural network layers when the neural network model to be monitored normally works may be labeled as the class to which the neural network model to be monitored normally works.
As an example, the obtained relative entropy value sample sets may be used together with the machine-learning-based first classifier to judge whether the neural network model to be monitored has an operation fault; as another example, the obtained relative entropy value sample sets may be used to train the deep-learning-based second classifier, so that the second classifier is trained with a small amount of relative entropy value samples, thereby effectively saving operation resources.
In addition, the method provided by the embodiment of the application has strong extensibility: based on this embodiment and in combination with the prior art, the internal results of the neural network model can be analyzed, model-agnostic analysis can be performed, and so on; alternatively, with the support of more relative entropy value sample sets, finer-grained classification of the operation faults can be realized.
The method for monitoring the neural network model fault shown in fig. 3 is exemplarily described below by taking the neural network model to be monitored as a deep neural network model for image recognition in the automatic driving system perception module.
Fig. 5 is a schematic diagram of a neural network model fault monitoring method in an automatic driving system according to an embodiment of the present application, as shown in fig. 5, a deep neural network model for image recognition in a perception module may be deployed in a vehicle-mounted computing platform or an AI accelerator, and in the working process of the automatic driving system, the perception module may acquire each frame of image acquired by a vehicle-mounted camera, and then use the deep neural network model for image recognition to perform reasoning, and output a recognition result. For any frame of image, the neural network model fault monitoring device may execute step 301, so as to obtain an output data set corresponding to each of m neural network layers in the neural network model in a processing procedure of the deep neural network model for image recognition on the frame of image.
Further, the neural network model fault monitoring device may execute the step 302, where the feature value set corresponding to each neural network layer is extracted from the output data set corresponding to each neural network layer in the m neural network layers.
For example, for any neural network layer, its corresponding feature value set may be represented in the form of an eigenvalue vector; as an example, extracting n feature values from the mth neural network layer may obtain an eigenvalue vector Am:

Am = [a_m1  a_m2  …  a_mn] ......(4)

In formula (4), a_m1, a_m2, …, a_mn respectively represent the extracted feature values, n represents the number of feature values, and m represents the number of neural network layers.
For each neural network layer, the same amount of output data may be extracted as the feature value set corresponding to that layer; the feature value sets corresponding to the neural network layers thus obtained are shown in the following formula (5):

A = [A1; A2; …; Am] ......(5)

In formula (5), A1, A2, …, Am represent the eigenvalue vectors corresponding to the m neural network layers; A is an eigenvalue matrix of m rows and n columns, which comprises the feature value sets corresponding to the neural network layers.
The eigenvalue matrix is constructed based on the Monte Carlo idea to reflect the running state of the deep neural network model for image recognition: the model generates a large amount of intermediate calculation data, namely the output data corresponding to each neural network layer, during inference, and an estimate of the output data corresponding to each neural network layer is established by sampling experiments on that output data, namely by generating the eigenvalue matrix.
Further, the neural network model fault monitoring device may execute the step 303, calculate the relative entropy value between the feature value set corresponding to each neural network layer and the first element set conforming to the gaussian distribution, to obtain the relative entropy value sets corresponding to the m neural network layers.
The first set of elements may be represented in the form of a reference matrix, for example; the set of relative entropy values may be represented in the form of a matrix of relative entropy values.
As an example, the reference matrix G may be as shown in the following formula (6):
G = [g1  g2  g3  …  gn] ......(6)

In the above formula (6), g1, g2, …, gn each represent a random number following the standard normal distribution N(0, 1); that is, the reference matrix G comprises a first element set that follows a Gaussian distribution.
As one example, a relative entropy matrix may be determined from the eigenvalue matrix and the reference matrix; illustratively, the relative entropy value KLm between the eigenvalue vector Am and the reference matrix G can be obtained by combining formula (4) and formula (6), as described in the following formula (7):

KLm = Σ_n( Am(i) × ln( Am(i) / G(i) ) ) ......(7)

In formula (7), Am(i) represents the i-th element of the eigenvalue vector Am, and G(i) represents the i-th element of the reference matrix; ln(·) represents taking the natural logarithm; Σ_n(·) represents summing over the n data.
Referring to formula (7), for any eigenvalue vector in formula (5), calculating the relative entropy value of the reference matrix described in formula (6), and obtaining a relative entropy value matrix KL:
KL = [KL1  KL2  …  KLm]^T ......(8)
wherein each element in the matrix of relative entropy values KL represents a relative entropy value. That is, the relative entropy matrix KL includes the relative entropy values of the feature value set and the first element set corresponding to each neural network layer.
The relative entropy matrix KL shown in the formula (8) is a matrix of 1×m, so that the eigenvalue matrix A of m×n shown in the formula (5) is reduced in dimension to a matrix of 1×m, the data dimension reduction is realized, and the operation efficiency is further improved.
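A numerical sketch tying formulas (4) to (8) together is given below; the layer and element counts are illustrative, and the absolute values plus the small epsilon that keep the logarithm in formula (7) defined are assumptions of this illustration rather than part of the embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 128            # m monitored layers, n feature values extracted per layer (formulas (2)/(3))

# Formulas (4)/(5): eigenvalue matrix A, one row (eigenvalue vector Am) per neural network layer.
A = np.abs(rng.normal(size=(m, n))) + 1e-6    # kept positive so the logarithm in (7) stays defined

# Formula (6): reference matrix G of n random numbers following the standard normal distribution.
G = np.abs(rng.standard_normal(n)) + 1e-6     # same positivity assumption for the reference

# Formula (7): KL_m = sum_i A_m(i) * ln(A_m(i) / G(i)), computed for every layer at once.
KL = np.sum(A * np.log(A / G), axis=1)

# Formula (8): the relative entropy matrix, one relative entropy value per monitored layer.
print(KL.shape, KL)                           # a vector of m relative entropy values
```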
In addition, the relative entropy matrix KL describes the distribution difference between the feature quantities extracted from the m layers of the neural network and the reference matrix G. In the embodiment of the application, the inference data in the neural network are not classified directly; instead, the eigenvalue matrix A is projected against the Gaussian-distributed reference matrix G, each vector in formula (8) represents a feature point in the projection space, and the categories corresponding to the feature points are the two categories of the neural network model to be monitored failing or working normally. In this way, the difference between the normal output data of each neural network layer during inference under normal operation and the abnormal output data of each neural network layer during inference under failure is widened, and the coupling degree between the normal output data and the abnormal output data is reduced.
Further, the neural network model fault monitoring device may execute the step 304, and quickly classify the relative entropy matrix KL by using the classification model, so as to accurately determine whether the neural network model to be monitored has an operation fault in real time. The neural network model fault monitoring device can also feed the monitoring result back to the sensing module, the sensing fusion module, the system health management module and the like for early warning and reporting; for example, when the classification model determines that the class corresponding to the relative entropy matrix KL is that the neural network to be monitored works normally, the result can be fed back to the sensing module, and the sensing module transmits the current sensing result to the planning and decision module after receiving the feedback; when the classification model judges that the class corresponding to the relative entropy matrix KL is the neural network to be monitored fails, the result can be fed back to the sensing module, and the sensing module discards the current sensing result after receiving the feedback.
Based on the same inventive concept of the above method embodiment, the embodiment of the present application further provides a neural network model fault monitoring device in an automatic driving system, where the neural network model fault monitoring device in the automatic driving system may be used to execute the technical solution described in the above method embodiment. For example, the steps of the neural network model fault monitoring method in the automatic driving system shown in fig. 3, 4, or 5 described above may be performed.
Fig. 6 is a schematic structural view of a neural network model fault monitoring device in an automatic driving system according to an embodiment of the present application, and as shown in fig. 6, the device may include: the transmission module 601 is configured to obtain a target output data set of a neural network model to be monitored in an autopilot system, where the target output data set includes output data sets corresponding to each of M neural network layers, the neural network model to be monitored includes M neural network layers, M is an integer greater than 1, and M is an integer greater than 1 and not greater than M; the processing module 602 is configured to extract a set of feature values corresponding to the neural network layers from the target output data set; calculating the relative entropy values between the characteristic value set and a first element set conforming to preset probability distribution to obtain relative entropy value sets corresponding to the m neural network layers; and judging whether the neural network model to be monitored has operation faults or not according to the relative entropy value set.
In the embodiment of the application, based on the thought of the Monte Carlo method, the output data of each neural network layer is selectively sampled, part of the output data in the output data set is extracted as the characteristic value, and the distribution of the output data of each neural network layer is reflected by the least characteristic value, so that the calculation is simplified, the calculation cost is saved, and the calculation efficiency is improved; meanwhile, a relative entropy value set is obtained by calculating the relative entropy value between the characteristic value set corresponding to each neural network layer and the first element set conforming to the preset probability distribution, so that the data dimension reduction is realized, and the operation efficiency is further improved; therefore, the real-time performance of fault monitoring is improved, and the real-time monitoring of the neural network model faults in the automatic driving system is realized. Meanwhile, the distribution difference characteristics of the normal output data and the abnormal output data of each neural network layer are described by adopting the relative entropy values, and the normal output data and the abnormal output data of each neural network layer are distinguished, so that whether the neural network model to be monitored has operation faults or not is judged more accurately through the relative entropy value sets corresponding to the m neural network layers, and the accuracy of fault monitoring is improved. In addition, various operation faults of the neural network model or the operation faults of various neural network models can be effectively monitored, and the application range is wide.
In one possible implementation, the processing module 602 is further configured to: determining a first output data set with the minimum quantity of output data in the target output data set; extracting a characteristic value set corresponding to each neural network layer from the output data sets corresponding to each neural network layer according to the quantity of output data in the first output data set; the number of the eigenvalues in the eigenvalue set corresponding to each extracted neural network layer is smaller than or equal to the number of the output data in the first output data set.
In one possible implementation, the processing module 602 is further configured to: and extracting the characteristic value set corresponding to each neural network layer by taking the quantity of output data in the output data set corresponding to each neural network layer as weight.
In one possible implementation, the processing module 602 is further configured to: and inputting the relative entropy value set into a preset classification model, and judging whether the neural network model to be monitored has an operation fault or not.
In one possible implementation, the preset classification model includes a first classifier based on machine learning; the processing module 602 is further configured to: inputting the relative entropy value set into the first classifier, and calculating the distance between the relative entropy value set and a plurality of relative entropy value sample sets; the plurality of relative entropy sample sets comprise relative entropy sample sets corresponding to the m neural network layers when the neural network model to be monitored fails, and relative entropy sample sets corresponding to the m neural network layers when the neural network model to be monitored works normally; and judging whether the neural network model to be monitored has operation faults or not according to the distances between the relative entropy value set and the plurality of relative entropy value sample sets.
In one possible implementation, the classification model includes a second classifier based on deep learning; the processing module 602 is further configured to: inputting the relative entropy value set into the second classifier, and judging whether the neural network model to be monitored has an operation fault or not; the second classifier is trained by a plurality of relative entropy value sample sets.
In one possible implementation manner, the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored fails includes: the relative entropy value between a first characteristic value sample set corresponding to each neural network layer in the m neural network layers and a second element set conforming to the preset probability distribution; the first characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored fails; the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored works normally comprises: the relative entropy value between the second characteristic value sample set corresponding to each neural network layer in the m neural network layers and the second element set conforming to the preset probability distribution; and the second characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored works normally.
The technical effects and specific descriptions of the neural network model fault monitoring device in the automatic driving system shown in fig. 6 and various possible implementation manners thereof can be found in the neural network model fault monitoring method in the automatic driving system, and are not repeated here.
It should be understood that the division of the modules in the above apparatus is only a division of a logic function, and may be fully or partially integrated into one physical entity or may be physically separated when actually implemented. Furthermore, modules in the apparatus may be implemented in the form of processor-invoked software; the device comprises, for example, a processor, which is connected to a memory, in which instructions are stored, the processor calling the instructions stored in the memory to implement any of the above methods or to implement the functions of the modules of the device, wherein the processor is, for example, a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or microprocessor, and the memory is internal or external to the device. Alternatively, the modules in the apparatus may be implemented in the form of hardware circuitry, some or all of which may be implemented by the design of hardware circuitry, which may be understood as one or more processors; for example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC), and the functions of some or all of the above modules are implemented by the design of the logic relationships of elements within the circuit; for another example, in another implementation, the hardware circuit may be implemented by a programmable logic device (programmable logic device, PLD), for example, a field programmable gate array (Field Programmable Gate Array, FPGA), which may include a large number of logic gates, and the connection relationship between the logic gates is configured by a configuration file, so as to implement the functions of some or all of the above modules. All modules of the above device may be realized in the form of processor calling software, or in the form of hardware circuits, or in part in the form of processor calling software, and in the rest in the form of hardware circuits.
In an embodiment of the present application, the processor is a circuit with signal processing capability, and in one implementation, the processor may be a circuit with instruction reading and running capability, such as a CPU, a microprocessor, a graphics processor (graphics processing unit, GPU) (which may be understood as a microprocessor), or a digital signal processor (digital signal processor, DSP), etc.; in another implementation, the processor may perform a function through a logical relationship of hardware circuitry that is fixed or reconfigurable, e.g., a hardware circuit implemented by the processor as an ASIC or PLD, such as an FPGA. In the reconfigurable hardware circuit, the processor loads the configuration document, and the process of implementing the configuration of the hardware circuit can be understood as a process of loading instructions by the processor to implement the functions of some or all of the above modules.
It will be seen that each module in the above apparatus may be one or more processors (or processing circuits) configured to implement the methods of the above embodiments, for example: a CPU, GPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
Furthermore, the modules in the above apparatus may be all or part integrated together or may be implemented independently. In one implementation, these modules are integrated together and implemented in the form of an SOC. The SOC may include at least one processor for implementing any of the methods or implementing the functions of the modules of the apparatus, where the at least one processor may be of different types, including, for example, a CPU and an FPGA, a CPU and an artificial intelligence processor, a CPU and a GPU, and the like.
The embodiment of the application also provides a neural network model fault monitoring device in the automatic driving system, which comprises the following steps: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the method of the above embodiments when executing the instructions. For example, the steps of the neural network model fault monitoring method in the automated driving system shown in fig. 3, 4, or 5 described above may be performed.
Fig. 7 is a schematic structural diagram of a neural network model fault monitoring device in an automatic driving system according to an embodiment of the present application, and as shown in fig. 7, the neural network model fault monitoring device in an automatic driving system may include: at least one processor 701, communication lines 702, memory 703, and at least one communication interface 704.
The processor 701 may be a general purpose central processing unit, microprocessor, application specific integrated circuit, or one or more integrated circuits for controlling the execution of the program of the present application; the processor 701 may also include a heterogeneous computing architecture of a plurality of general purpose processors, for example, a combination of at least two of a CPU, GPU, microprocessor, DSP, ASIC, FPGA; as one example, the processor 701 may be a cpu+gpu or cpu+asic or cpu+fpga.
Communication line 702 may include a pathway to transfer information between the aforementioned components.
Communication interface 704 uses any transceiver-like device for communicating with other devices or communication networks, such as ethernet, RAN, wireless local area network (wireless local area networks, WLAN), etc.
The memory 703 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may exist independently and be coupled to the processor via the communication line 702. The memory may also be integrated with the processor. The memory provided by embodiments of the present application may generally be non-volatile. The memory 703 is used for storing computer-executable instructions for executing the aspects of the present application, and execution is controlled by the processor 701. The processor 701 is configured to execute the computer-executable instructions stored in the memory 703, thereby implementing the method provided in the above-described embodiments of the present application; illustratively, the steps of the neural network model fault monitoring method in the automatic driving system shown in fig. 3, 4, or 5 described above may be implemented.
Alternatively, the computer-executable instructions in the embodiments of the present application may be referred to as application program codes, which are not particularly limited in the embodiments of the present application.
Illustratively, the processor 701 may include one or more CPUs, e.g., CPU0 in fig. 7; the processor 701 may also include any one of a CPU, and GPU, ASIC, FPGA, for example, CPU0+gpu0 or CPU 0+asic0 or CPU0+fpga0 in fig. 7.
For example, the neural network model fault monitoring device in an autopilot system may include a plurality of processors, such as processor 701 and processor 707 in fig. 7. Each of these processors may be a single-core (single-CPU) processor, a multi-core (multi-CPU) processor, or a heterogeneous computing architecture including a plurality of general-purpose processors. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, as an embodiment, the neural network model fault monitoring apparatus in the autopilot system may further include an output device 705 and an input device 706. The output device 705 communicates with the processor 701 and may display information in a variety of ways. For example, the output device 705 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, and may be, for example, a vehicle-mounted HUD, AR-HUD, or display. The input device 706 communicates with the processor 701 and may receive input from a user in a variety of ways. For example, the input device 706 may be a mouse, a keyboard, a touch screen device, or a sensing device.
As an example, in connection with the neural network model fault monitoring device in the autopilot system shown in fig. 7, the transmission module 601 in fig. 6 described above may be implemented by the communication interface 704 in fig. 7; the processing module 602 in fig. 6 described above may be implemented by the processor 701 in fig. 7.
An embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of the above-described embodiment. Illustratively, the steps of the neural network model fault monitoring method in the automated driving system shown in fig. 3, 4, or 5 described above may be implemented.
Embodiments of the present application provide a computer program product, which may, for example, include computer-readable code, or a non-volatile computer-readable storage medium bearing computer-readable code; when the computer program product runs on a computer, it causes the computer to perform the method in the above-described embodiments. For example, the steps of the neural network model fault monitoring method in the automatic driving system shown in fig. 3, 4, or 5 described above may be performed.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. The network interface card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Although the invention is described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The foregoing is merely illustrative of the present invention and is not intended to limit it; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

  1. A method for monitoring a neural network model fault in an automatic driving system, the method comprising:
    acquiring a target output data set of a neural network model to be monitored in an automatic driving system, wherein the target output data set comprises output data sets corresponding to each neural network layer in m neural network layers, the neural network model to be monitored comprises M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M;
    extracting a characteristic value set corresponding to each neural network layer from the target output data set;
    calculating the relative entropy values between the characteristic value set and a first element set conforming to a preset probability distribution, to obtain relative entropy value sets corresponding to the m neural network layers;
    and judging whether the neural network model to be monitored has operation faults or not according to the relative entropy value set.
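Purely for illustration, a minimal Python sketch of the flow recited in claim 1 might look as follows; the histogram-based feature extraction, the standard-normal reference set, the function names, and the simple threshold rule are assumptions added here, not the claimed method (the claims instead feed the relative entropy value set into a classifier, see claims 4-6).

```python
# Illustrative sketch only: per-layer relative entropy monitoring of a neural network.
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

def layer_relative_entropy(layer_output: np.ndarray, reference: np.ndarray, bins: int = 32) -> float:
    """Histogram the layer's output values and the reference element set on a
    common grid, then return the relative entropy between the two distributions."""
    lo = min(layer_output.min(), reference.min())
    hi = max(layer_output.max(), reference.max())
    p, _ = np.histogram(layer_output, bins=bins, range=(lo, hi))
    q, _ = np.histogram(reference, bins=bins, range=(lo, hi))
    eps = 1e-12  # avoid log(0) and division by zero in empty bins
    return float(entropy(p + eps, q + eps))

def monitor_once(layer_outputs: list, reference: np.ndarray, threshold: float) -> bool:
    """layer_outputs: one flattened output array per monitored layer (m layers).
    reference: a first element set drawn from the preset probability distribution,
    e.g. np.random.default_rng(0).normal(size=4096) for a standard normal.
    Returns True when an operation fault is suspected."""
    rel_entropy_set = [layer_relative_entropy(out.ravel(), reference) for out in layer_outputs]
    # Simplest possible decision rule: flag a fault when any layer drifts too far.
    return max(rel_entropy_set) > threshold
```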
  2. The method according to claim 1, wherein extracting the characteristic value set corresponding to each neural network layer from the target output data set comprises:
    determining a first output data set with the minimum quantity of output data in the target output data set;
    extracting a characteristic value set corresponding to each neural network layer from the output data sets corresponding to each neural network layer according to the quantity of output data in the first output data set; wherein the number of characteristic values in the characteristic value set extracted for each neural network layer is smaller than or equal to the number of output data in the first output data set.
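As an illustrative reading of claim 2 (the uniform random sub-sampling below is an assumption; the claim only bounds the number of extracted characteristic values):

```python
import numpy as np

def extract_characteristic_value_sets(target_output_sets, rng=None):
    """Cap every layer's characteristic value set at n values, where n is the
    size of the smallest output data set, so the sets stay comparable in size."""
    if rng is None:
        rng = np.random.default_rng()
    n = min(out.size for out in target_output_sets)  # size of the first output data set
    characteristic_sets = []
    for out in target_output_sets:
        flat = out.ravel()
        idx = rng.choice(flat.size, size=n, replace=False)
        characteristic_sets.append(flat[idx])
    return characteristic_sets
```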
  3. The method according to claim 1, wherein extracting the characteristic value set corresponding to each neural network layer from the target output data set comprises:
    extracting the characteristic value set corresponding to each neural network layer, using the quantity of output data in the output data set corresponding to each neural network layer as a weight.
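One possible reading of claim 3, sketched with an assumed proportional sampling budget (the claim does not specify how the quantity-as-weight rule is applied):

```python
import numpy as np

def extract_weighted_characteristic_sets(target_output_sets, total_budget: int = 1024, rng=None):
    """Sample characteristic values from each layer in proportion to how many
    output values that layer produced, so larger layers contribute more values."""
    if rng is None:
        rng = np.random.default_rng()
    sizes = np.array([out.size for out in target_output_sets], dtype=float)
    weights = sizes / sizes.sum()  # quantity of output data used as the weight
    characteristic_sets = []
    for out, w in zip(target_output_sets, weights):
        k = min(out.size, max(1, int(round(w * total_budget))))
        flat = out.ravel()
        idx = rng.choice(flat.size, size=k, replace=False)
        characteristic_sets.append(flat[idx])
    return characteristic_sets
```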
  4. The method according to any one of claims 1-3, wherein judging whether the neural network model to be monitored has an operation fault according to the relative entropy value set comprises:
    inputting the relative entropy value set into a preset classification model, and judging whether the neural network model to be monitored has an operation fault or not.
  5. The method of claim 4, wherein the preset classification model comprises a first classifier based on machine learning;
    the step of inputting the relative entropy value set into the preset classification model and judging whether the neural network model to be monitored has an operation fault or not comprises the following steps:
    inputting the relative entropy value set into the first classifier, and calculating the distances between the relative entropy value set and a plurality of relative entropy value sample sets; the plurality of relative entropy value sample sets comprise relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored fails, and relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored works normally;
    and judging whether the neural network model to be monitored has an operation fault or not according to the distances between the relative entropy value set and the plurality of relative entropy value sample sets.
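A hedged sketch of the distance rule in claim 5, assuming a mean Euclidean distance comparison against the faulty and normal sample sets (the claim fixes neither the distance metric nor the decision rule):

```python
import numpy as np

def classify_by_distance(rel_entropy_set: np.ndarray,
                         faulty_samples: np.ndarray,
                         normal_samples: np.ndarray) -> bool:
    """rel_entropy_set: length-m vector of per-layer relative entropy values.
    faulty_samples / normal_samples: arrays of shape (N, m) collected while the
    monitored model was known to be faulty / working normally.
    Returns True when the observation lies closer to the faulty samples."""
    d_fault = np.linalg.norm(faulty_samples - rel_entropy_set, axis=1).mean()
    d_normal = np.linalg.norm(normal_samples - rel_entropy_set, axis=1).mean()
    return d_fault < d_normal
```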
  6. The method of claim 4, wherein the preset classification model comprises a second classifier based on deep learning;
    the step of inputting the relative entropy value set into the preset classification model and judging whether the neural network model to be monitored has an operation fault or not comprises the following steps:
    inputting the relative entropy value set into the second classifier, and judging whether the neural network model to be monitored has an operation fault or not; the second classifier is trained on a plurality of relative entropy value sample sets.
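For the deep-learning second classifier of claim 6, a small PyTorch sketch under assumed layer sizes and an assumed binary fault label (1 = fault, 0 = normal); the actual architecture and training procedure are not specified by the claim:

```python
import torch
import torch.nn as nn

class RelativeEntropyClassifier(nn.Module):
    """Small MLP over the length-m vector of per-layer relative entropy values;
    outputs a fault probability. Hidden sizes are arbitrary placeholders."""
    def __init__(self, m: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(m, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train(classifier: RelativeEntropyClassifier,
          rel_entropy_samples: torch.Tensor,  # shape (N, m)
          labels: torch.Tensor,               # shape (N, 1), 1 = fault, 0 = normal
          epochs: int = 100) -> RelativeEntropyClassifier:
    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(classifier(rel_entropy_samples), labels)
        loss.backward()
        optimizer.step()
    return classifier
```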
  7. The method of claim 5, wherein the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored fails, comprises: the relative entropy value between a first characteristic value sample set corresponding to each neural network layer in the m neural network layers and a second element set conforming to the preset probability distribution; the first characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored fails; the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored works normally comprises: the relative entropy value between the second characteristic value sample set corresponding to each neural network layer in the m neural network layers and the second element set conforming to the preset probability distribution; and the second characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored works normally.
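A sketch of how the relative entropy value sample sets described in claim 7 could be assembled offline; the recording format and the per-layer helper (for example the layer_relative_entropy function sketched after claim 1) are assumptions passed in by the caller:

```python
import numpy as np
from typing import Callable, List

def build_relative_entropy_samples(recordings: List[List[np.ndarray]],
                                   reference: np.ndarray,
                                   rel_entropy_fn: Callable[[np.ndarray, np.ndarray], float]) -> np.ndarray:
    """recordings: one entry per run; each entry holds one flattened output array
    per monitored layer, captured while the monitored model was known to be
    faulty (first characteristic value sample sets) or, in a second pass,
    working normally (second characteristic value sample sets).
    reference: a second element set drawn from the preset probability distribution.
    Returns an (N, m) array of relative entropy value samples."""
    return np.asarray([[rel_entropy_fn(out.ravel(), reference) for out in recording]
                       for recording in recordings])
```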
  8. A neural network model fault monitoring device in an automatic driving system, the device comprising:
    a transmission module, configured to acquire a target output data set of a neural network model to be monitored in an automatic driving system, wherein the target output data set comprises output data sets corresponding to each neural network layer in m neural network layers, the neural network model to be monitored comprises M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M;
    a processing module, configured to extract a characteristic value set corresponding to each neural network layer from the target output data set; calculate the relative entropy values between the characteristic value set and a first element set conforming to a preset probability distribution, to obtain relative entropy value sets corresponding to the m neural network layers; and judge whether the neural network model to be monitored has an operation fault or not according to the relative entropy value set.
  9. The apparatus of claim 8, wherein the processing module is further configured to: determine a first output data set with the minimum quantity of output data in the target output data set; extract a characteristic value set corresponding to each neural network layer from the output data sets corresponding to each neural network layer according to the quantity of output data in the first output data set; wherein the number of characteristic values in the characteristic value set extracted for each neural network layer is smaller than or equal to the number of output data in the first output data set.
  10. The apparatus of claim 8, wherein the processing module is further configured to: extract the characteristic value set corresponding to each neural network layer, using the quantity of output data in the output data set corresponding to each neural network layer as a weight.
  11. The apparatus of any one of claims 8-10, wherein the processing module is further configured to: input the relative entropy value set into a preset classification model, and judge whether the neural network model to be monitored has an operation fault or not.
  12. The apparatus of claim 11, wherein the preset classification model comprises a first classifier based on machine learning;
    the processing module is further configured to: input the relative entropy value set into the first classifier, and calculate the distances between the relative entropy value set and a plurality of relative entropy value sample sets; the plurality of relative entropy value sample sets comprise relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored fails, and relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored works normally; and judge whether the neural network model to be monitored has an operation fault or not according to the distances between the relative entropy value set and the plurality of relative entropy value sample sets.
  13. The apparatus of claim 11, wherein the preset classification model comprises a second classifier based on deep learning;
    the processing module is further configured to: input the relative entropy value set into the second classifier, and judge whether the neural network model to be monitored has an operation fault or not; the second classifier is trained on a plurality of relative entropy value sample sets.
  14. The apparatus of claim 12, wherein the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored fails, comprises: the relative entropy value between a first characteristic value sample set corresponding to each neural network layer in the m neural network layers and a second element set conforming to the preset probability distribution; the first characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored fails; the set of relative entropy value samples corresponding to the m neural network layers when the neural network model to be monitored works normally comprises: the relative entropy value between the second characteristic value sample set corresponding to each neural network layer in the m neural network layers and the second element set conforming to the preset probability distribution; and the second characteristic value sample set is extracted from output data sample sets corresponding to the neural network layers when the neural network model to be monitored works normally.
  15. A neural network model fault monitoring device in an automatic driving system, comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to implement the method of any of claims 1-7 when executing the instructions.
  16. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1-7.
  17. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the method of any of claims 1-7.
CN202280006258.5A 2022-03-29 2022-03-29 Neural network model fault monitoring method and device in automatic driving system Pending CN117242455A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/083858 WO2023184188A1 (en) 2022-03-29 2022-03-29 Method and apparatus for fault monitoring of neural network model in autonomous driving system

Publications (1)

Publication Number Publication Date
CN117242455A true CN117242455A (en) 2023-12-15

Family

ID=88198458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280006258.5A Pending CN117242455A (en) 2022-03-29 2022-03-29 Neural network model fault monitoring method and device in automatic driving system

Country Status (2)

Country Link
CN (1) CN117242455A (en)
WO (1) WO2023184188A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111007719B (en) * 2019-11-12 2022-08-05 杭州电子科技大学 Automatic driving steering angle prediction method based on domain adaptive neural network
CN111563578B (en) * 2020-04-28 2022-09-23 河海大学常州校区 Convolutional neural network fault injection system based on TensorFlow
EP3929824A3 (en) * 2020-06-24 2022-01-26 INTEL Corporation Robust multimodal sensor fusion for autonomous driving vehicles
CN111882031A (en) * 2020-06-30 2020-11-03 华为技术有限公司 Neural network distillation method and device
US20220067531A1 (en) * 2020-08-26 2022-03-03 Nvidia Corporation Efficient identification of critical faults in neuromorphic hardware of a neural network

Also Published As

Publication number Publication date
WO2023184188A1 (en) 2023-10-05

Similar Documents

Publication Publication Date Title
US11928866B2 (en) Neural networks for object detection and characterization
US11475770B2 (en) Electronic device, warning message providing method therefor, and non-transitory computer-readable recording medium
US10867210B2 (en) Neural networks for coarse- and fine-object classifications
US11783568B2 (en) Object classification using extra-regional context
KR20210074193A (en) Systems and methods for trajectory prediction
CN113366495A (en) Searching autonomous vehicle sensor data repository
EP3940665A1 (en) Detection method for traffic anomaly event, apparatus, program and medium
US20200117958A1 (en) Recursive multi-fidelity behavior prediction
WO2022133090A1 (en) Adaptive generation and assessment of autonomous vehicle critical scenarios
Ahmad et al. Intelligent framework for automated failure prediction, detection, and classification of mission critical autonomous flights
US20230019834A1 (en) Group of neural networks ensuring integrity
Kim et al. Adaptive surveillance algorithms based on the situation analysis
WO2023187117A1 (en) Simulation-based testing for robotic systems
CN117242455A (en) Neural network model fault monitoring method and device in automatic driving system
Wang et al. Prediction-based reachability analysis for collision risk assessment on highways
US20210101614A1 (en) Spatio-temporal pose/object database
CN115019278B (en) Lane line fitting method and device, electronic equipment and medium
US11609344B2 (en) Systems and methods for utilizing a machine learning model to determine a determined location of a vehicle based on a combination of a geographical location and a visual positioning system location
US20230024799A1 (en) Method, system and computer program product for the automated locating of a vehicle
Deng et al. Unmanned Aerial Vehicles anomaly detection model based on sensor information fusion and hybrid multimodal neural network
CN117496513A (en) Image anomaly detection method, device and storage medium in semantic segmentation
Danylova et al. AUTOMATED NAVIGATION FOR UNMANNED GROUND VEHICLES IN LOGISTICS.
Turchenko et al. An Aircraft Identification System Using Convolution Neural Networks
CN117034732A (en) Automatic driving model training method based on true and simulated countermeasure learning
CN115366920A (en) Decision method and apparatus, device and medium for autonomous driving of a vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination