WO2023184188A1 - Method and device for fault monitoring of a neural network model in an automatic driving system - Google Patents

Method and device for fault monitoring of a neural network model in an automatic driving system

Info

Publication number: WO2023184188A1
Authority: WO (WIPO, PCT)
Prior art keywords: neural network, output data, relative entropy, network model, monitored
Application number: PCT/CN2022/083858
Other languages: English (en), French (fr)
Inventors: 王矿磊, 陈艺帆, 陈德久, 苏鹏
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202280006258.5A (CN117242455A)
Priority to PCT/CN2022/083858 (WO2023184188A1)
Publication of WO2023184188A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • The present application relates to the field of automatic driving technology, and in particular to a method and device for fault monitoring of neural network models in an automatic driving system.
  • Embodiments of the present application provide a method for fault monitoring of a neural network model in an automatic driving system. The method includes: obtaining a target output data set of a neural network model to be monitored in the automatic driving system, where the target output data set includes an output data set corresponding to each of m neural network layers, the neural network model to be monitored includes M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M; extracting, from the target output data set, the feature value set corresponding to each neural network layer; calculating the relative entropy value between each feature value set and a first element set that conforms to a preset probability distribution, to obtain a set of relative entropy values corresponding to the m neural network layers; and determining, based on the set of relative entropy values, whether there is an operating fault in the neural network model to be monitored.
  • In this way, the output data of each neural network layer is selectively sampled: part of the output data in each output data set is extracted as feature values, so that the distribution of each layer's output data is reflected by as few feature values as possible. This simplifies calculation, saves computational overhead, and improves computational efficiency. At the same time, by calculating the relative entropy value between the feature value set corresponding to each neural network layer and the first element set that conforms to the preset probability distribution, a set of relative entropy values is obtained, which realizes data dimensionality reduction and further improves computing efficiency, thereby improving the real-time performance of fault monitoring and realizing real-time monitoring of operating faults of the neural network model in the automatic driving system.
  • In addition, the relative entropy value describes the distribution difference between the normal output data and the abnormal output data of each neural network layer, and thus distinguishes normal output data from abnormal output data. Through the set of relative entropy values corresponding to the m neural network layers, whether there is an operating fault in the neural network model to be monitored can be determined more accurately, improving the accuracy of fault monitoring. Moreover, various operating faults of a neural network model, or the operating faults of various neural network models, can be monitored effectively, so the scope of application is wide.
  • Extracting the feature value set corresponding to each neural network layer from the target output data set may include: determining, among the output data sets in the target output data set, the first output data set with the smallest number of output data; and extracting, according to the number of output data in the first output data set, the feature value set corresponding to each neural network layer from the output data set corresponding to that layer, where the number of feature values in each extracted feature value set is less than or equal to the number of output data in the first output data set.
  • Since the neural network model in an automatic driving system is usually complex and the amount of output data in the target output data set is large, the feature value set corresponding to each neural network layer is extracted adaptively from that layer's output data set, and the number of feature values extracted for each layer is not greater than the number of output data in any one of the m neural network layers. This simplifies the computational overhead, improves the efficiency of subsequent processing, and meets the real-time requirements of fault monitoring.
  • Alternatively, extracting the feature value set corresponding to each neural network layer from the target output data set may include: extracting the feature value set for each neural network layer using the number of output data in that layer's output data set as a weight.
  • In this way, the amount of output data extracted from each layer is allocated according to the weight given by the number of output data corresponding to that layer, so that the feature value set corresponding to each neural network layer is extracted adaptively and the extracted feature value set more accurately reflects the distribution of the output data of each neural network layer.
  • Determining, based on the set of relative entropy values, whether there is an operating fault in the neural network model to be monitored may include: inputting the set of relative entropy values into a preset classification model, which determines whether there is an operating fault in the neural network model to be monitored.
  • In this way, the set of relative entropy values is input into a preset classification model; the preset classification model classifies the set of relative entropy values based on the relative entropy values between feature value sets extracted from known normal output data and an element set that conforms to the preset probability distribution, and the relative entropy values between feature value sets extracted from known abnormal output data and that element set, so as to accurately determine whether there is an operating fault in the neural network model to be monitored.
  • In a possible implementation, the preset classification model includes a first classifier based on machine learning. Inputting the set of relative entropy values into the preset classification model and determining whether there is an operating fault in the neural network model to be monitored includes: inputting the set of relative entropy values into the first classifier; calculating the distances between the set of relative entropy values and multiple relative entropy value sample sets; and judging, according to those distances, whether there is an operating fault in the neural network model to be monitored.
  • In this way, the set of relative entropy values can be classified automatically, conveniently, and quickly based on its distances from the multiple relative entropy value sample sets, so that whether there is an operating fault in the neural network model to be monitored can be judged in real time.
  • In another possible implementation, the classification model includes a second classifier based on deep learning. Inputting the set of relative entropy values into the preset classification model and judging whether the neural network model to be monitored has an operating fault includes: inputting the set of relative entropy values into the second classifier, which judges whether there is an operating fault; the second classifier is trained with multiple relative entropy value sample sets.
  • When the neural network model to be monitored fails, the relative entropy value sample set corresponding to the m neural network layers includes the relative entropy values between the first feature value sample set corresponding to each of the m neural network layers and a second element set that conforms to the preset probability distribution, where the first feature value sample set is extracted from the output data sample set corresponding to each layer when the neural network model to be monitored fails. When the neural network model to be monitored works normally, the relative entropy value sample set corresponding to the m neural network layers includes the relative entropy values between the second feature value sample set corresponding to each of the m neural network layers and the second element set, where the second feature value sample set is extracted from the output data sample set corresponding to each layer when the neural network model to be monitored works normally.
  • Embodiments of the present application further provide a fault monitoring device for a neural network model in an automatic driving system. The device includes: a transmission module, configured to obtain a target output data set of a neural network model to be monitored in the automatic driving system, where the target output data set includes an output data set corresponding to each of m neural network layers, the neural network model to be monitored includes M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M; and a processing module, configured to extract, from the target output data set, the feature value set corresponding to each neural network layer, calculate the relative entropy values between the feature value sets and a first element set that conforms to a preset probability distribution to obtain the set of relative entropy values corresponding to the m neural network layers, and judge, based on the set of relative entropy values, whether there is an operating fault in the neural network model to be monitored.
  • In a possible implementation, the processing module is further configured to: determine, among the output data sets in the target output data set, the first output data set with the smallest number of output data; and extract, according to the number of output data in the first output data set, the feature value set corresponding to each neural network layer from the output data set corresponding to that layer, where the number of feature values in each extracted feature value set is less than or equal to the number of output data in the first output data set.
  • In another possible implementation, the processing module is further configured to: use the number of output data in the output data set corresponding to each neural network layer as a weight and extract the feature value set corresponding to each neural network layer accordingly.
  • In a possible implementation, the processing module is further configured to: input the set of relative entropy values into a preset classification model to determine whether there is an operating fault in the neural network model to be monitored.
  • In a possible implementation, the preset classification model includes a first classifier based on machine learning. The processing module is further configured to: input the set of relative entropy values into the first classifier; calculate the distances between the set of relative entropy values and multiple relative entropy value sample sets, where the multiple relative entropy value sample sets include the relative entropy value sample set corresponding to the m neural network layers when the neural network model to be monitored fails and the relative entropy value sample set corresponding to the m neural network layers when the neural network model to be monitored works normally; and judge, based on those distances, whether there is an operating fault in the neural network model to be monitored.
  • In another possible implementation, the classification model includes a second classifier based on deep learning. The processing module is further configured to: input the set of relative entropy values into the second classifier to determine whether there is an operating fault in the neural network model to be monitored, where the second classifier is trained with multiple relative entropy value sample sets.
  • Embodiments of the present application further provide a neural network model fault monitoring device in an automatic driving system, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement one or more of the neural network model fault monitoring methods in the automatic driving system of the first aspect.
  • Embodiments of the present application further provide a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, one or more of the neural network model fault monitoring methods in the automatic driving system of the first aspect are implemented.
  • Embodiments of the present application further provide a computer program product that, when run on a computer, causes the computer to execute one or more of the neural network model fault monitoring methods in the automatic driving system of the first aspect.
  • Figure 1 shows a schematic architectural diagram of an autonomous driving system according to an embodiment of the present application
  • Figure 2 shows a schematic diagram of fault monitoring of a neural network model according to an embodiment of the present application
  • Figure 3 shows a flow chart of a neural network model fault monitoring method in an autonomous driving system according to an embodiment of the present application
  • Figure 4 shows a flow chart of a method for obtaining a relative entropy value sample set according to an embodiment of the present application
  • Figure 5 shows a schematic diagram of a neural network model fault monitoring method in an autonomous driving system according to an embodiment of the present application
  • Figure 6 shows a schematic structural diagram of a neural network model fault monitoring device in an autonomous driving system according to an embodiment of the present application
  • Figure 7 shows a schematic structural diagram of a neural network model fault monitoring device in an autonomous driving system according to an embodiment of the present application.
  • The word "exemplary" as used herein means "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or preferable to other embodiments.
  • A probability distribution expresses the law governing the probabilities of the values of a random variable. If the results of an experiment are expressed by the values of a random variable, the probability distribution of the random experiment is the probability distribution of that random variable, that is, the possible values of the random variable and the probabilities with which those values are taken. Depending on the type of random variable, probability distributions take different forms, such as the Gaussian distribution (also known as the normal distribution), binomial distribution, Poisson distribution, uniform distribution, Bernoulli distribution, Laplace distribution, exponential distribution, gamma distribution, beta distribution, multinomial distribution, and so on.
  • Relative entropy, also known as KL divergence (Kullback-Leibler divergence, KLD), is an asymmetric measure of the difference between two probability distributions P and Q. Relative entropy can measure the distance between two probability distributions: when the two probability distributions are the same, their relative entropy is zero, and as the difference between the two probability distributions increases, their relative entropy increases accordingly.
  • For discrete distributions, the relative entropy can be calculated by the following formula (1):
  • D_KL(P||Q) = Σ_i P(i)·ln(P(i)/Q(i)) ............ (1)
  • where P represents the true distribution of the data, Q represents the theoretical distribution of the data, the estimated model distribution, or an approximation of P, P(i) and Q(i) represent the i-th element in P and Q respectively, and ln(·) represents the natural logarithm.
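  • For illustration only (not part of the patent text), a minimal sketch of formula (1) for discrete distributions is shown below; the small eps term is an implementation detail assumed here to avoid division by zero:

```python
import numpy as np

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy (KL divergence) D_KL(P || Q) between two
    discrete distributions given as arrays of probabilities, per formula (1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()          # normalize to valid probability distributions
    q = q / q.sum()
    # eps avoids log(0) and division by zero for empty bins
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Identical distributions -> relative entropy is 0; it grows as they diverge.
print(relative_entropy([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]))  # ~0.0
print(relative_entropy([0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]))      # > 0
```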
  • The Monte Carlo method, also known as the statistical simulation method or the statistical experiment method, is a numerical simulation method that takes probabilistic phenomena as its object of study. It usually infers unknown characteristic quantities from statistics obtained by random sampling. In computational simulation, by constructing a probability model that approximates the performance of a system and performing random experiments on it, the random characteristics of the system can be simulated.
  • A classifier generally consists of a fully connected layer and a softmax function (also called a normalized exponential function), and can output different categories, or the probabilities of different categories, based on the input data.
  • A multi-layer perceptron (MLP) is a feed-forward artificial neural network that maps a set of input vectors to a set of output vectors, and can be regarded as a directed graph. The basic structure of a multi-layer perceptron consists of multiple node layers: an input layer, one or more intermediate hidden layers, and an output layer; each node layer is fully connected to the next node layer. Except for the input nodes, each node is a neuron with a nonlinear activation function. An MLP learns and makes predictions in a manner loosely modeled on the human nervous system; its main advantage is its ability to solve complex problems quickly.
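  • As a non-authoritative sketch, a second classifier of the kind described above could be an MLP that maps the m relative entropy values to two categories (normal / fault); PyTorch, the hidden layer size, and the two-class output head are assumptions of this sketch rather than details given in the text:

```python
import torch
import torch.nn as nn

M_LAYERS = 8  # assumed number m of monitored layers (one relative entropy value per layer)

# Minimal MLP: input layer -> hidden layer -> output layer, fully connected,
# with a nonlinear activation between layers, as described above.
mlp_classifier = nn.Sequential(
    nn.Linear(M_LAYERS, 32),
    nn.ReLU(),
    nn.Linear(32, 2),   # two categories: model working normally / model faulty
)

relative_entropies = torch.randn(1, M_LAYERS).abs()   # placeholder input
logits = mlp_classifier(relative_entropies)
probs = torch.softmax(logits, dim=-1)                 # softmax gives class probabilities
print(probs)
```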
  • The basic logic of the k-nearest-neighbor (KNN) algorithm is as follows: classification is performed by measuring the distance between feature values. In the classification decision, the algorithm determines the category of the sample to be classified only from the categories of the nearest one or several samples. Its basic idea is: if most of the k most similar samples (that is, the nearest samples in the feature space) belong to a certain category, then the sample to be classified also belongs to that category, where k is usually an integer not greater than 20. In the KNN algorithm, the selected neighbors are all samples that have already been correctly classified.
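  • A minimal sketch of the KNN decision rule described above, assuming Euclidean distance and toy two-dimensional samples:

```python
import numpy as np
from collections import Counter

def knn_predict(query, samples, labels, k=3):
    """Classify `query` by the majority label of its k nearest labeled samples."""
    dists = np.linalg.norm(samples - query, axis=1)   # distance to every labeled sample
    nearest = np.argsort(dists)[:k]                   # indices of the k closest samples
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

samples = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9]])
labels = ["normal", "normal", "fault", "fault"]
print(knn_predict(np.array([0.15, 0.15]), samples, labels, k=3))  # -> "normal"
```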
  • the neural network model is a computing model that consists of a large number of nodes (or neurons) connected to each other. Each node represents a specific output function, called an activation function. Each connection between two nodes represents a weighted value for the signal passing through the connection, called a weight, which is equivalent to the memory of an artificial neural network.
  • the output of the neural network model varies depending on the connection method, weight value and activation function of the neural network model.
  • the neural network model itself is usually an approximation of a certain algorithm or function in nature, or it may be an expression of a logical strategy.
  • Neural network models usually include multiple neural network layers, where each neural network layer may include one or more nodes.
  • Neural network models can be divided into Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), etc.
  • In a deep neural network, the internal neural network layers can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers; the layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • A convolutional neural network is a neural network model with a convolutional structure. It contains a feature extractor composed of convolution layers and subsampling layers, which can be regarded as a filter. A convolution layer is a layer of neurons in a convolutional neural network that convolves the input data; in a convolution layer, a neuron is connected only to some of its neighboring neurons. A convolution layer usually contains several feature planes, and each feature plane can be composed of neurons arranged in a rectangle; neurons in the same feature plane share weights, that is, they share a convolution kernel.
  • Neural network model fault monitoring refers to monitoring possible operating faults of a neural network model during its inference and operation. The operating faults may include faults caused by hardware failures in the device on which the neural network model is deployed, or erroneous inference results obtained by the neural network model due to abnormal inputs, and so on.
  • Faults caused by hardware failures are generally referred to as soft errors; common soft errors can be divided into transient errors and permanent errors. Transient errors are related to hardware failures caused by sudden changes in the external environment, such as radiation and temperature, as well as mutual interference within the hardware itself; their characteristic is that the error disappears after a certain period of time. A common transient error is a bit flip. Common permanent errors are stuck-at-0 and stuck-at-1 errors, which are respectively related to hardware failures caused by open circuits and short circuits; their characteristic is that the error persists for a long time.
  • In one existing approach, redundant design is used to monitor neural network model faults, such as a triple modular redundancy (TMR) design, in which multiple modules of the same structure are used in parallel to perform the same function.
  • In another existing approach, a pre-prepared lookup table is used to collect, as far as possible, all neuron weight values of the neural network model under error-free conditions. During the inference operation of the neural network model, if a weight value is not found in the lookup table, the weight value is considered abnormal, that is, the neural network model has an operating fault; a weight-switching state is then started, and the weight value of the erroneous neuron is allocated to other neurons, so that other neurons take over the role of the erroneous neuron.
  • A further existing approach uses symptom-based error detectors (SED), which monitor the output values of the neural network layers. Taking Alexnet as an example, considering the output values of each neuron in the pooling layers and fully connected layers, there are more than 100,000 single-neuron output values in total, and a single convolution layer, for example, has more than 15,000 output values; collecting the output values of every neural network layer therefore brings huge computational overhead. When fault monitoring is performed on Alexnet in this way, the huge number of output values delays the fault monitoring, so this approach cannot be applied to scenarios, such as automatic driving systems, that require highly real-time fault monitoring. In addition, this method can only monitor transient errors; for stuck-at-0 and stuck-at-1 errors, the maximum output value of the hidden layers does not change significantly, so permanent errors cannot be monitored.
  • To this end, the embodiments of the present application provide a neural network model fault monitoring method (described in detail below), which can be applied to scenarios in which neural network models are deployed, such as autonomous vehicles, vehicle-mounted equipment or vehicle-mounted systems (for example, an automated driving system (ADS) or an advanced driver assistance system (ADAS)), large-scale deployments of deep learning training servers, scenarios in which a neural network model in an Internet of Things (IoT) device performs object recognition, semantic recognition, and so on, and scenarios in which a neural network model in security equipment performs vehicle detection, object detection, and so on.
  • The neural network model fault monitoring method provided by the embodiments of this application can accurately monitor the various operating faults that occur in the neural network models deployed in the above scenarios; in particular, for scenarios such as automatic driving systems that require highly real-time fault monitoring, real-time fault monitoring can be achieved to meet the real-time requirements of scenarios such as automatic driving.
  • The neural network model fault monitoring method provided by the embodiments of the present application is described below by way of example.
  • Figure 1 shows a schematic architectural diagram of an automatic driving system according to an embodiment of the present application; as shown in Figure 1, the automatic driving system may include: a perception module (perception layer), a planning and decision module (planning & decision), and a transmission control module (motion controller).
  • The perception module is used to perceive the environment around the vehicle or inside the vehicle. It can fuse the data collected around the vehicle or in the cabin by on-board sensors, such as cameras, lidar, millimeter-wave radar, ultrasonic radar, and light sensors, to perceive the environment around the vehicle or inside the vehicle, and can transmit the perception results to the planning and decision module.
  • The data collected by the vehicle-mounted sensors around the vehicle or in the cabin may include video streams, radar point cloud data, or analyzed structured information such as the positions, speeds, steering angles, and sizes of people, vehicles, and objects.
  • the perception module can process the data collected by the vehicle sensors around the vehicle or in the cabin through a neural network model to achieve environmental perception.
  • the neural network model can be deployed in a vehicle computing platform or an AI accelerator and other processing equipment.
  • For example, the perception module can obtain an image of the vehicle's surroundings collected by the on-board camera and process the image using a deep neural network model for image recognition, thereby identifying pedestrians, lane lines, vehicles, obstacles, traffic lights, and other objects in the image.
  • The planning and decision module is used to analyze and make decisions based on the perception results generated by the perception module, and plans and generates a control set that satisfies specific constraints (such as the dynamic constraints of the vehicle itself, collision avoidance, and passenger comfort); the control set can then be transferred to the transmission control module.
  • For example, the planning and decision module can use a neural network model for trajectory generation to process the perception results and the constraints and generate the control set; the neural network model can be deployed on processing equipment such as a vehicle-mounted computing platform or an AI accelerator.
  • The transmission control module is used to control the driving of the vehicle according to the control set generated by the planning and decision module; for example, it can generate control signals such as steering wheel angle, speed, and acceleration based on the control set combined with the vehicle's dynamics information, and the vehicle's steering system, engine, and so on execute the control signals to control the driving of the vehicle.
  • the autonomous driving system may also include other functional modules; for example, a positioning module, an interaction module, a communication module, etc. (not shown in the figure), which are not limited.
  • the positioning module can be used to provide location information of the vehicle and also provide attitude information of the vehicle.
  • the positioning module may include a satellite navigation system (Global Navigation Satellite System, GNSS), an inertial navigation system (Inertial Navigation System, INS), etc., which may be used to determine the location information of the vehicle.
  • the interactive module can be used to send information to the driver and receive instructions from the driver.
  • The communication module can be used for the vehicle to communicate with other devices, where the other devices may include mobile terminals, cloud devices, other vehicles, roadside devices, and so on; the communication can be implemented over wireless connections such as 2G/3G/4G/5G, Bluetooth, frequency modulation (FM), wireless local area network (WLAN), long term evolution (LTE), vehicle to everything (V2X), vehicle to vehicle (V2V), and long term evolution-vehicle (LTE-V).
  • The neural network model fault monitoring method in the automatic driving system provided by the embodiments of the present application can be executed by a neural network model fault monitoring device; as an example, fault monitoring is performed on the deep neural network model used for image recognition in the perception module of Figure 1.
  • Figure 2 shows a schematic diagram of fault monitoring of a neural network model according to an embodiment of the present application. As shown in Figure 2, the neural network model fault monitoring device can obtain the intermediate data generated by the deep neural network model used for image recognition in the perception module of the automatic driving system while it identifies a frame of image, execute the neural network model fault monitoring method of the embodiments of the present application (described in detail below), perform real-time and accurate fault monitoring on the deep neural network model, and feed the fault monitoring results back to the perception module in real time, so that the perception module can judge whether to pass the current recognition results to the planning and decision module. For example, it can be fed back to the perception module that the neural network model is working normally, so that the perception module passes the recognition result of the frame to the planning and decision module; or it can be fed back that the neural network model is faulty, so that the perception module discards the recognition result of the frame.
  • the embodiments of the present application do not limit the type of the neural network model fault monitoring device.
  • the neural network model fault monitoring device can be set up independently, or can be integrated in other devices, or can be implemented through software or a combination of software and hardware.
  • the neural network model fault monitoring device may be an autonomous vehicle, or other components in an autonomous vehicle.
  • For example, the neural network model fault monitoring device includes, but is not limited to: a vehicle-mounted terminal, vehicle-mounted controller, vehicle-mounted module, vehicle-mounted component, vehicle-mounted chip, vehicle-mounted unit, vehicle-mounted radar, or vehicle-mounted camera.
  • the neural network model fault monitoring device can be integrated in an on-board computing platform or an AI accelerator and other processing equipment of an autonomous vehicle.
  • the neural network model fault monitoring device may also be an intelligent terminal with data processing capabilities other than an autonomous vehicle, or a component or chip provided in an intelligent terminal.
  • the neural network model fault monitoring device may be a general-purpose device or a special-purpose device.
  • The device can also be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or another device with data processing functions, or a component or chip within such a device.
  • the neural network model fault monitoring device may also be a chip or processor with processing functions, and the fault monitoring device may include multiple processors.
  • the processor can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • Figure 3 shows a flow chart of a neural network model fault monitoring method in an automatic driving system according to an embodiment of the present application. The method can be executed by the neural network model fault monitoring device in Figure 2 and, as shown in Figure 3, can include the following steps:
  • Step 301: Obtain the target output data set of the neural network model to be monitored in the automatic driving system.
  • the neural network model to be monitored can be any neural network model in the autonomous driving system.
  • it can be a deep neural network model configured in the perception module for image recognition or a neural network model for speech recognition, etc. It can also be the neural network model configured in the planning and decision-making module for generating control sets, and so on.
  • the type of neural network model is not limited in the embodiments of this application.
  • it can be a deep neural network, a convolutional neural network, a recurrent neural network, etc.
  • the target output data set may include an output data set corresponding to each of the m neural network layers.
  • The neural network model to be monitored includes M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M.
  • the output data set corresponding to the neural network layer includes data output by all nodes in the neural network layer during the inference process of the neural network model to be monitored.
  • The specific value of m can be preset according to the scale of the neural network model to be monitored and/or the amount of available computing resources. For example, the value of m can be set close to M, that is, the output data sets corresponding to as many neural network layers as possible are obtained, thereby improving monitoring accuracy; if m and M have the same value, the neural network model fault monitoring device obtains the output data sets corresponding to all neural network layers in the neural network model to be monitored. Alternatively, the value of m can be set to a smaller value, that is, output data sets are obtained for only a small number of neural network layers, thereby saving computing resources, improving processing efficiency, and better meeting real-time requirements.
  • For example, the neural network model to be monitored can be a convolutional neural network used for image recognition in the perception module of the automatic driving system. The convolutional neural network can include several convolution layers, pooling layers, fully connected layers, and other neural network layers. The image collected by the perception module is input into the convolutional neural network and, after being processed by the convolution layers, pooling layers, and fully connected layers, the image recognition result is output. Each convolution layer can include one or more convolution kernels, and each convolution kernel extracts a corresponding feature map; the target output data set of the convolutional neural network can then include the feature maps extracted by all convolution kernels in each convolution layer.
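  • As a hedged illustration of how the target output data set of step 301 might be collected, the sketch below assumes the monitored model is a PyTorch module and uses forward hooks to record the output data of m chosen layers during one inference pass; the layer choice and hook mechanism are assumptions of this sketch, not requirements of the method:

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # stand-in for the model to be monitored
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

monitored_layers = [model[0], model[2], model[6]]    # the m layers chosen for monitoring
target_output_set = {}                               # layer name -> flattened output data set

def make_hook(name):
    def hook(module, inputs, output):
        # store all values output by this layer during the current inference pass
        target_output_set[name] = output.detach().flatten()
    return hook

for i, layer in enumerate(monitored_layers):
    layer.register_forward_hook(make_hook(f"layer_{i}"))

_ = model(torch.randn(1, 3, 32, 32))                 # one inference pass (e.g. one camera frame)
print({name: values.numel() for name, values in target_output_set.items()})
```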
  • Step 302: Extract the feature value set corresponding to each neural network layer from the target output data set.
  • the feature value set corresponding to the neural network layer may include one or more feature values corresponding to the neural network layer.
  • For example, part of the output data can be extracted as feature values from the output data set corresponding to each neural network layer, thereby obtaining the feature value set corresponding to that layer. The number of output data to extract can be preset according to requirements; the number extracted can be the same or different for different neural network layers, and this is not limited. This step can be understood as feature engineering: by extracting as little output data as possible as feature values, the distribution of the output data of each neural network layer is reflected as comprehensively as possible.
  • Alternatively, part of the output data can be extracted as feature values from the output data set corresponding to each neural network layer according to a preset probability distribution, thereby obtaining the feature value set corresponding to that layer; for example, part of the output data can be sampled from the output data set in a Gaussian manner to obtain the feature value set corresponding to the neural network layer.
  • The following are examples of possible implementations of extracting the feature value set corresponding to each neural network layer.
  • Method 1: Determine the first output data set with the smallest number of output data in the target output data set; according to the number of output data in the first output data set, extract from the output data set corresponding to each neural network layer the feature value set corresponding to that layer, where the number of feature values in each extracted feature value set is less than or equal to the number of output data in the first output data set. Specifically, the number of output data to be extracted in each neural network layer can be determined based on the number of output data in the first output data set, and then that number of output data is extracted from each layer as feature values to obtain the feature value set corresponding to each neural network layer.
  • Since the neural network model in an automatic driving system is usually complex and the amount of output data in the target output data set is large, the feature value set corresponding to each neural network layer is extracted adaptively in this way, and the number of feature values extracted for each layer is not greater than the number of output data in any one of the m neural network layers, which simplifies the computational overhead, improves the efficiency of subsequent processing, and meets the real-time requirements of fault monitoring.
  • In a possible implementation, a sampling coefficient can be preset, and the number of output data to be extracted by each neural network layer can be determined based on the sampling coefficient and the number of output data in the first output data set; for example, the number n of output data to be extracted by each neural network layer can be determined by the following formula (2):
  • n = ⌊α·n_tmp⌋ ............ (2)
  • where n_tmp represents the number of output data in the first output data set, α represents the sampling coefficient, and the value range of α is [0, 1]; that is, α·n_tmp is rounded down to obtain n.
  • The sampling coefficient α is used to balance the complexity and accuracy of fault monitoring of the neural network model to be monitored, and its specific value can be set according to actual needs. For example, when higher monitoring accuracy is required, α can be set to a larger value, so that for each neural network layer a larger amount of output data is extracted from the corresponding output data set as that layer's feature values; when the monitoring accuracy requirement is not as high, α can be set to a smaller value, so that for each neural network layer a smaller amount of output data is extracted as feature values, thereby saving computing resources and improving processing efficiency to better meet real-time requirements. For example, α can be 10%.
  • n_tmp can be determined by the following formula (3):
  • n_tmp = min_{i≤m} φ(i) ............ (3)
  • where φ(i) represents the number of output data in the output data set corresponding to the i-th neural network layer among the m neural network layers.
  • In this way, the number of output data to be extracted by each neural network layer, that is, the number of feature values in each feature value set, can be determined; for example, 10% of the number of output data contained in the first output data set (the set with the smallest amount of output data) can be used as the number of output data to be extracted by each neural network layer, thereby simplifying the computational overhead and improving the efficiency of subsequent processing.
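  • A minimal sketch of formulas (2) and (3), assuming the per-layer output counts are already known:

```python
import math

def num_features_to_extract(layer_output_counts, alpha=0.10):
    """Formula (3): n_tmp is the smallest per-layer output count;
    formula (2): n = floor(alpha * n_tmp)."""
    n_tmp = min(layer_output_counts)     # phi(i) for each of the m monitored layers
    return math.floor(alpha * n_tmp)

# e.g. three monitored layers with 15000, 64000 and 4096 output values
print(num_features_to_extract([15000, 64000, 4096], alpha=0.10))  # -> 409
```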
  • Method 2: Use the number of output data in the output data set corresponding to each neural network layer as a weight to extract the feature value set corresponding to each neural network layer. In this case, the number of feature values extracted for each neural network layer can vary from layer to layer: the amount of output data extracted from each layer is allocated according to the weight given by the number of output data corresponding to that layer, that is, the more output data a neural network layer has, the more output data are extracted from it as feature values, and the less output data a layer has, the fewer output data are extracted as feature values. The feature value set corresponding to each neural network layer is thereby extracted adaptively and more accurately reflects the distribution of the output data of each layer. As with Method 1, this feature value extraction simplifies the computational overhead, improves the efficiency of subsequent processing, and meets the real-time requirements of fault monitoring.
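  • A minimal sketch of the weighted allocation in Method 2; the total sampling budget is an assumption introduced here for illustration:

```python
def weighted_feature_counts(layer_output_counts, total_budget):
    """Allocate the number of feature values per layer in proportion to
    that layer's share of the total output data (Method 2)."""
    total = sum(layer_output_counts)
    return [max(1, round(total_budget * count / total)) for count in layer_output_counts]

# Layers with more output data contribute more feature values.
print(weighted_feature_counts([15000, 64000, 4096], total_budget=1000))
```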
  • Step 303: Calculate the relative entropy value between the feature value set corresponding to each neural network layer and the first element set that conforms to the preset probability distribution, and obtain the set of relative entropy values corresponding to the m neural network layers.
  • The first element set may include multiple elements that conform to the preset probability distribution, and the first element set may be generated in real time or pre-stored; for example, a preset number of random numbers that conform to the preset probability distribution may be generated in real time, and this preset number of random numbers constitutes the first element set. The preset probability distribution can be, for example, a Gaussian distribution.
  • In this way, for each neural network layer, the relative entropy value between the feature value set corresponding to that layer and the first element set that conforms to the preset probability distribution is obtained. The relative entropy value is a real number whose magnitude indicates the difference between the distribution formed by the feature values in the feature value set corresponding to the neural network layer and the preset probability distribution.
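  • One possible (assumed) way to realize step 303 is sketched below: the first element set is drawn from a standard Gaussian, both sets are turned into discrete distributions over a shared histogram binning, and formula (1) is applied. The histogram-based estimate, the bin count, and the element-set size are implementation assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_relative_entropy(feature_values, n_elements=1024, n_bins=32, eps=1e-12):
    """Relative entropy between a layer's feature value set and a first
    element set drawn from the preset (here: standard Gaussian) distribution."""
    first_element_set = rng.standard_normal(n_elements)
    lo = min(feature_values.min(), first_element_set.min())
    hi = max(feature_values.max(), first_element_set.max())
    bins = np.linspace(lo, hi, n_bins + 1)              # shared binning for both sets
    p, _ = np.histogram(feature_values, bins=bins)
    q, _ = np.histogram(first_element_set, bins=bins)
    p = p / p.sum()                                      # empirical distribution of features
    q = q / q.sum()                                      # empirical distribution of elements
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# One relative entropy value per monitored layer forms the relative entropy value set.
feature_sets = {f"layer_{i}": rng.standard_normal(400) * (i + 1) for i in range(3)}
entropy_set = [layer_relative_entropy(values) for values in feature_sets.values()]
print(entropy_set)
```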
  • Step 304: Based on the set of relative entropy values corresponding to the m neural network layers, determine whether there is an operating fault in the neural network model to be monitored.
  • The relative entropy value between the feature value set extracted from the normal output data of each neural network layer during inference and the first element set that conforms to the preset probability distribution can represent the difference between the normal output data of that layer and the first element set; the relative entropy value between the feature value set extracted from the abnormal output data of each neural network layer during inference when the neural network model fails and the first element set can represent the difference between the abnormal output data of that layer and the first element set. Since the normal output data of each neural network layer differs from the abnormal output data, correspondingly, the relative entropy value between the normal output data of a neural network layer and the first element set differs from the relative entropy value between the abnormal output data of that layer and the first element set.
  • Therefore, the relative entropy value can be used to distinguish the normal output data of the neural network model from its abnormal output data. The amount of data in the output data set of each neural network layer (whether normal output data or abnormal output data) is usually large, that is, the output data sets are widely distributed in the data space; different relative entropy values are used to distinguish different output data sets, that is, there is a correspondence between the relative entropy values and the output data sets that are widely distributed in the data space, so that the differences between different output data sets in the data space are spread out through different relative entropy values and the coupling between different output data sets is reduced.
  • In this way, by distinguishing the normal output data and abnormal output data of each neural network layer through the set of relative entropy values corresponding to the m neural network layers, whether there is an operating fault in the neural network model to be monitored can be judged more accurately. For example, if the difference between the normal output data and the abnormal output data of a neural network layer is small, the two are not easy to distinguish directly; however, the relative entropy value between the feature value set extracted from the normal output data and the first element set that conforms to the preset probability distribution differs from the relative entropy value between the feature value set extracted from the abnormal output data and that first element set, so the relative entropy value can be used to distinguish normal output data from abnormal output data and thereby accurately determine whether there is an operating fault in the neural network model to be monitored.
  • In a possible implementation, this step may include: inputting the set of relative entropy values corresponding to the m neural network layers into a preset classification model, which determines whether there is an operating fault in the neural network model to be monitored. The preset classification model can automatically classify the set of relative entropy values according to the magnitude of each relative entropy value in the set and accurately determine the category to which the set belongs, where the categories can include the neural network model to be monitored working normally and the neural network model to be monitored having failed. For example, the set of relative entropy values is input into the preset classification model, and the classification model classifies the set based on the relative entropy values between feature value sets extracted from known normal output data and the element set that conforms to the preset probability distribution, and the relative entropy values between feature value sets extracted from known abnormal output data and that element set, so as to accurately judge whether there is an operating fault in the neural network model to be monitored. The preset classification model may include a first classifier based on machine learning or a second classifier based on deep learning; for example, the first classifier may be a KNN classifier and the second classifier may be an MLP.
  • The neural network model fault monitoring method in the automatic driving system provided by the embodiments of this application has the characteristics of low computational overhead, high real-time performance, high accuracy, and a wide range of application.
  • Since the number of neural network layers in the neural network model to be monitored is usually large and the corresponding output data is large, based on the idea of the Monte Carlo method, the output data of each neural network layer is selectively sampled and part of the output data in each output data set is extracted as feature values; the distribution of the extracted feature values can be used as an estimate of the distribution of the output data of each neural network layer in the target output data set. In this way, the distribution of each layer's output data is reflected by as few feature values as possible, thereby simplifying calculation, saving computational overhead, and improving computational efficiency. At the same time, by calculating the relative entropy value between the feature value set corresponding to each neural network layer and the first element set that conforms to the preset probability distribution, a set of relative entropy values is obtained, which realizes data dimensionality reduction and further improves computing efficiency, thereby improving the real-time performance of fault monitoring and realizing real-time monitoring of neural network model faults in the automatic driving system.
  • In addition, the relative entropy values describe the distribution difference between the normal output data and the abnormal output data of each neural network layer, and the set of relative entropy values corresponding to the m neural network layers distinguishes the normal output data from the abnormal output data of each layer; whether there is an operating fault in the neural network model to be monitored can therefore be judged more accurately based on the set of relative entropy values, improving the accuracy of fault monitoring.
  • For example, compared with the SED fault monitoring method, when the same 500 errors occur in Alexnet, the embodiment of the present application greatly improves the accuracy of fault monitoring for Alexnet.
  • Various operating faults of a neural network model, and operating faults of various neural network models, can be effectively monitored, so the application scope is wide. For example, operating faults of various types of models such as deep neural network models and convolutional neural network models can be monitored. As another example, operating faults of the neural network model to be monitored that are caused by hardware failures in the equipment on which it is deployed, such as the on-board computing platform or AI accelerator, can be monitored in real time, including transient faults and permanent faults; operating faults caused by abnormal inputs in the autonomous driving system can also be monitored in real time, thereby improving the safety of on-board computing platforms or AI accelerators.
  • In addition, the range of neural network layers that may have failed can be determined, that is, it can be determined which one or more of the m neural network layers caused the operating fault of the neural network model to be monitored.
  • The following gives examples of possible implementations for determining, based on the set of relative entropy values in step 304 above, whether the neural network model to be monitored has an operating fault.
  • Method 1: Taking the preset classification model being the first classifier based on machine learning as an example, the relative entropy value set can be input into the first classifier, the distances between the relative entropy value set and multiple relative entropy value sample sets can be calculated, and whether the neural network model to be monitored has an operating fault is determined based on these distances.
  • The multiple relative entropy value sample sets may include the relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored fails and the relative entropy value sample sets corresponding to the m neural network layers when it works normally. These sample sets can be obtained by sampling in advance, so the category to which each sample set belongs is known; the categories are "the neural network model to be monitored works normally" and "the neural network model to be monitored has failed".
  • The distance between the relative entropy value set and a relative entropy value sample set represents the degree of difference between them: the larger the distance, the larger the difference and the less likely the two belong to the same category; the smaller the distance, the smaller the difference and the more likely they belong to the same category.
  • Therefore, the relative entropy value set can be input into the first classifier, which calculates the distances between the relative entropy value set and the multiple relative entropy value sample sets, so that sample sets of different categories are separated in the feature space. The relative entropy value set is then considered most likely to belong to the same category as the one or more sample sets closest to it, and the category to which the majority of those nearest sample sets belong is used to determine whether the neural network model to be monitored has an operating fault.
  • For example, the relative entropy value set is input into a KNN classifier, which automatically calculates the distance between the relative entropy value set and each of the multiple relative entropy value sample sets, selects the K sample sets closest to it, and, by majority vote, assigns the relative entropy value set the category to which most of those K sample sets belong. If that category is "the neural network model to be monitored has failed", it is judged that the model has an operating fault; if the category is "the neural network model to be monitored works normally", it is judged that the model has no operating fault.
  • In this way, the relative entropy value set can be classified automatically, conveniently, and quickly based on its distances to the multiple relative entropy value sample sets, so that whether the neural network model to be monitored has an operating fault is determined in real time.
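  • As a rough illustration of Method 1, the following Python sketch shows the distance computation and majority vote described above, assuming the relative entropy values are held in NumPy arrays; the shapes, the label convention (1 = collected while the model was faulty, 0 = normal), and the function name are illustrative assumptions rather than details taken from the document.

```python
import numpy as np

def knn_is_faulty(kl_set, kl_sample_sets, labels, k=5):
    """Distance-plus-majority-vote decision over labeled relative entropy sample sets.

    kl_set         : (m,)   relative entropy values for the current inference.
    kl_sample_sets : (N, m) pre-collected sample sets (faulty and normal runs).
    labels         : (N,)   1 = collected while the model was faulty, 0 = normal.
    """
    # Distance between the current set and every labeled sample set.
    dists = np.linalg.norm(kl_sample_sets - kl_set, axis=1)
    # Indices of the K nearest sample sets.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the K nearest neighbours decides the category.
    return labels[nearest].sum() > k / 2

# Illustrative usage with made-up shapes (m = 8 monitored layers, N = 200 samples).
rng = np.random.default_rng(0)
kl_sample_sets = rng.random((200, 8))
labels = rng.integers(0, 2, 200)
print(knn_is_faulty(rng.random(8), kl_sample_sets, labels, k=5))
```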
  • Method 2: Taking the preset classification model being the second classifier based on deep learning as an example, the relative entropy value set can be input into the second classifier to determine whether the neural network model to be monitored has an operating fault; the second classifier is trained on multiple relative entropy value sample sets.
  • The second classifier can be trained in advance on multiple relative entropy value sample sets together with the known category of each sample set; after training, it can accurately distinguish relative entropy value sets of different categories. During fault monitoring, the relative entropy value set is input into the trained second classifier, which automatically determines its category, thereby accurately determining whether the neural network model to be monitored has an operating fault.
  • Taking MLP as an example, the topology of the MLP can be set according to the number of relative entropy values in the relative entropy value set and the number of classification categories. For example, the topology can be (n-20-2), where n is the number of relative entropy values fed to the MLP input layer, 20 is the number of nodes in the hidden layer, and 2 is the number of categories output by the output layer, namely "the neural network model to be monitored has failed" and "the neural network model to be monitored works normally". In the training phase, the multiple relative entropy value sample sets are used as training samples: the relative entropy value sample sets corresponding to the m neural network layers when the model fails can be used as negative samples, and those corresponding to the m neural network layers when the model works normally can be used as positive samples. The training samples and their category labels are input into the MLP to train its weight parameters: a training sample is fed to the MLP, the MLP outputs a category, a loss function value is computed from that output and the sample's category label, backpropagation is performed based on the loss value, and the weight parameters are adjusted. This process is repeated with multiple training samples until convergence, at which point the weight parameters are fixed and the trained MLP is obtained.
  • In the fault monitoring stage, the relative entropy value set is input into the trained MLP described above, and the MLP automatically outputs the category of the relative entropy value set, so that whether the neural network model currently being monitored has an operating fault is determined accurately and in real time.
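  • The (n-20-2) topology and training loop described above could look roughly like the following PyTorch sketch; the value of n, the optimizer, the epoch count, and the placeholder data are illustrative assumptions rather than values specified in the document.

```python
import torch
from torch import nn

n = 8  # number of relative entropy values per set (one per monitored layer)

# (n-20-2) topology: n inputs, a 20-node hidden layer, two output categories
# (0 = the model works normally, 1 = the model has failed).
mlp = nn.Sequential(nn.Linear(n, 20), nn.ReLU(), nn.Linear(20, 2))
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder training data: relative entropy sample sets and category labels.
kl_samples = torch.rand(200, n)
labels = torch.randint(0, 2, (200,))

for epoch in range(50):                 # in practice, repeat until convergence
    optimizer.zero_grad()
    logits = mlp(kl_samples)            # forward pass: predicted category scores
    loss = loss_fn(logits, labels)      # compare with the category labels
    loss.backward()                     # backpropagation
    optimizer.step()                    # adjust the weight parameters

# Monitoring time: classify a new relative entropy set in a single pass.
with torch.no_grad():
    category = mlp(torch.rand(1, n)).argmax(dim=1)   # 0 = normal, 1 = faulty
```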
  • For example, for an Alexnet model used for image recognition, when the trained MLP is used to determine whether Alexnet has an operating fault, the judgment accuracy is improved by about 15% compared with using SED.
  • It should be noted that KNN and MLP are only examples; other classifiers can be used as the classification model as needed, which is not limited here.
  • In a possible implementation, the relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored fails may include the relative entropy values between the first feature value sample set corresponding to each of the m neural network layers and a second element set that conforms to the preset probability distribution, where the first feature value sample sets are extracted from the output data sample sets of each neural network layer when the model fails. The relative entropy value sample sets corresponding to the m neural network layers when the model works normally may include the relative entropy values between the second feature value sample set corresponding to each of the m neural network layers and the second element set, where the second feature value sample sets are extracted from the output data sample sets of each neural network layer when the model works normally. The second element set may be the same as the first element set described above; that is, an element set that conforms to the preset probability distribution can be determined in advance, and in the fault monitoring stage that element set is used as the first element set.
  • Figure 4 shows a flow chart of a method for obtaining a relative entropy value sample set according to an embodiment of the present application. As shown in Figure 4, it may include the following steps:
  • Step 401 Obtain a set of output data samples corresponding to at least one neural network layer in the neural network model to be monitored when the neural network model to be monitored fails and when it is working normally.
  • For example, when the neural network model to be monitored works normally, the output data sample sets corresponding to each of the m neural network layers can be obtained as follows. Taking the neural network model to be monitored being the deep neural network model used for image recognition in the perception module as an example, an original image whose object is pre-labeled as a pedestrian is input into the model; the model determines through inference that the object contained in the image is a pedestrian, and the output data of each neural network layer during this inference is collected as the output data sample sets corresponding to each of the m neural network layers when the model works normally.
  • In a possible implementation, fault injection can be used to simulate a fault during the inference process of the neural network model to be monitored, thereby obtaining the output data sample sets corresponding to each of the m neural network layers when the model fails.
  • Taking the neural network model to be monitored being the deep neural network model used for image recognition in the perception module as an example, an original image whose object is pre-labeled as a pedestrian is input into the model, and a fault is injected into the model; if the model then determines through inference that the object contained in the image is not a pedestrian, the output data of each neural network layer during this inference is collected as the output data sample sets corresponding to each of the m neural network layers when the model fails. Different faults can be injected in sequence, or different original images can be used, so that the model performs multiple inference computations and the output data of each layer in each inference is collected, thereby obtaining multiple output data sample sets corresponding to each of the m neural network layers when the model fails.
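  • As an illustration of transient-fault injection, the NumPy sketch below flips one random bit of one random float32 weight; this is only one simple fault model (the document does not prescribe a particular injection mechanism), the function name and shapes are illustrative, and real campaigns often rely on dedicated fault-injection tools.

```python
import numpy as np

def inject_bitflip(weights, seed=None):
    """Flip one random bit of one random float32 weight (a simple transient
    bit-flip fault model)."""
    rng = np.random.default_rng(seed)
    corrupted = np.asarray(weights, dtype=np.float32).copy()
    flat = corrupted.reshape(-1)
    idx = rng.integers(flat.size)                 # which weight to corrupt
    bit = np.uint32(1 << int(rng.integers(32)))   # which bit to flip
    view = flat[idx:idx + 1].view(np.uint32)      # reinterpret the float's bits
    view ^= bit                                   # flip the chosen bit in place
    return corrupted

# Example: corrupt one layer's weights, then run inference and collect each
# layer's outputs as "faulty" output data samples.
layer_weights = np.random.randn(64, 128).astype(np.float32)
faulty_weights = inject_bitflip(layer_weights, seed=0)
```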
  • In another possible implementation, the output data sample sets corresponding to each of the m neural network layers when the model fails can be obtained by generating adversarial samples, where an adversarial sample is input data on which the neural network model to be monitored cannot perform normal inference. Taking the deep neural network model used for image recognition in the perception module as an example, a very small amount of carefully constructed noise is added to an original image whose object is pre-labeled as a pedestrian to obtain an adversarial image; the human eye usually cannot distinguish the adversarial image from the original image, but the model may misclassify the object in it, for example by determining that it is not a pedestrian, so an error occurs. The output data of each neural network layer during this inference is then collected as the output data sample sets corresponding to each of the m neural network layers when the model fails.
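  • One common way to construct such an adversarial image is a small gradient-sign (FGSM-style) perturbation, sketched below in PyTorch; the document does not name a specific adversarial-sample method, and the placeholder model, image size, label encoding, and epsilon here are assumptions for illustration only.

```python
import torch
from torch import nn

def fgsm_adversarial_image(model, image, label, epsilon=0.01):
    """Build an adversarial image by adding a small, carefully constructed
    perturbation (FGSM-style) that the human eye can barely perceive."""
    image = image.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, clipped to a valid range.
    adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0)
    return adversarial.detach()

# Example with a placeholder classifier; a real run would use the monitored
# image-recognition model and a pre-labeled "pedestrian" image.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
image = torch.rand(1, 3, 32, 32)
label = torch.tensor([1])            # 1 = pedestrian (illustrative encoding)
adv = fgsm_adversarial_image(model, image, label)
```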
  • Step 402 From the output data sample set corresponding to at least one neural network layer, extract a feature value sample set corresponding to at least one neural network layer.
  • the method of extracting the feature value sample set may refer to the relevant expressions in step 303 above, and will not be described again here.
  • the number of eigenvalue samples in the eigenvalue sample set can be determined through the above formulas (1) and (2).
  • The value of the sampling coefficient can be set as required. For example, a smaller sampling coefficient can be set to reduce the number of feature value samples in the feature value sample set, which effectively improves the training efficiency of the second classifier and allows it to be trained with a small amount of data, saving computing resources; alternatively, it effectively improves the efficiency with which the first classifier automatically classifies the relative entropy value set, better meeting the real-time requirements of fault monitoring.
  • When the neural network model to be monitored fails, the first feature value sample set corresponding to each neural network layer can be extracted from the output data sample set of that layer among the m neural network layers; when the model works normally, the second feature value sample set corresponding to each layer can be extracted from the output data sample set of that layer. For example, for any neural network layer, output data samples can be extracted from its output data sample set as feature value samples according to the preset probability distribution, thereby obtaining the feature value sample set corresponding to that layer and improving the robustness of the classification model.
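  • For the sampling budget itself, one simple scheme consistent with the weighted extraction described elsewhere in this document (weights proportional to how much output data each layer produces) is sketched below; the function name and the fixed total budget are illustrative assumptions, and the sampling-coefficient formulas (1)/(2) referenced above are not reproduced here.

```python
import numpy as np

def allocate_sample_counts(layer_output_counts, total_budget):
    """Split a total sampling budget across the m monitored layers, weighted by
    how many output values each layer produces, so larger layers contribute
    more feature value samples."""
    counts = np.asarray(layer_output_counts, dtype=np.float64)
    weights = counts / counts.sum()
    per_layer = np.maximum(1, np.round(weights * total_budget)).astype(int)
    return per_layer

# Example: three layers of very different sizes and a budget of 512 samples.
print(allocate_sample_counts([64 * 55 * 55, 4096, 1000], 512))
```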
  • Step 403 Calculate the relative entropy value between the feature value sample set corresponding to at least one neural network layer and the second element set that conforms to the preset probability distribution, to obtain a relative entropy value sample set.
  • Specifically, the relative entropy value between the first feature value sample set of each neural network layer and the second element set can be calculated to obtain the relative entropy value sample set corresponding to the m neural network layers when the neural network model to be monitored fails; likewise, the relative entropy value between the second feature value sample set of each layer and the second element set can be calculated to obtain the relative entropy value sample set corresponding to the m neural network layers when the model works normally.
  • The category to which each relative entropy value sample set belongs can also be labeled: the sample sets corresponding to the m neural network layers when the model fails are labeled "the neural network model to be monitored has failed", and those corresponding to the m neural network layers when the model works normally are labeled "the neural network model to be monitored works normally".
  • In addition, the methods provided by the embodiments of the present application are highly scalable: for example, the internal results of the neural network model can be analyzed in a model-agnostic way, and operating faults can be classified at more levels of granularity.
  • the neural network model to be monitored as a deep neural network model used for image recognition in the perception module of the autonomous driving system as an example, the fault monitoring method of the neural network model shown in Figure 3 above is exemplarily explained below.
  • Figure 5 shows a schematic diagram of a neural network model fault monitoring method in an autonomous driving system according to an embodiment of the present application.
  • The deep neural network model used for image recognition in the perception module can be deployed on the on-board computing platform or in the AI accelerator. During operation of the autonomous driving system, the perception module obtains each frame of image collected by the on-board camera, performs inference with the deep neural network model, and outputs the recognition result. For each frame, the neural network model fault monitoring device can perform step 301 above to obtain the output data set corresponding to each of the m neural network layers generated while the deep neural network model processes that frame.
  • the neural network model fault monitoring device can perform the above step 302 to extract the feature value set corresponding to each neural network layer from the output data set corresponding to each neural network layer among the m neural network layers.
  • The feature value set corresponding to each neural network layer can be expressed in the form of a feature value vector. As an example, extracting n feature values from the m-th neural network layer yields the feature value vector Am, as shown in formula (4):

    Am = [a_m(1), a_m(2), …, a_m(n)]    (4)

  where n is the number of extracted feature values and the subscript m indexes the neural network layer. If the same number of output data is extracted from every layer as its feature value set, the feature value sets corresponding to the m neural network layers can be written as shown in formula (5):

    A = [A1; A2; …; Am]    (5)

  where A1, A2, …, Am are the feature value vectors corresponding to the m neural network layers, each forming one row, so A is a feature value matrix with m rows and n columns that contains the feature value set corresponding to each neural network layer.
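  • A minimal sketch of building the m × n feature value matrix A of formulas (4)/(5) by randomly sampling n output values from each monitored layer might look as follows; uniform random sampling is used here for simplicity, whereas the document also allows sampling according to a preset probability distribution such as a Gaussian, and the function name and shapes are assumptions.

```python
import numpy as np

def build_feature_matrix(layer_outputs, n, seed=None):
    """Sample n output values from each monitored layer and stack them into
    the m x n feature value matrix A of formulas (4)/(5)."""
    rng = np.random.default_rng(seed)
    rows = []
    for out in layer_outputs:                      # one entry per monitored layer
        flat = np.asarray(out, dtype=np.float32).ravel()
        # Sample with replacement only if the layer has fewer than n outputs.
        idx = rng.choice(flat.size, size=n, replace=flat.size < n)
        rows.append(flat[idx])                     # A_m = [a_m(1), ..., a_m(n)]
    return np.stack(rows)                          # shape (m, n)

# Illustrative layer outputs of different sizes (feature maps, FC activations).
layer_outputs = [np.random.randn(64, 55, 55),
                 np.random.randn(4096),
                 np.random.randn(1000)]
A = build_feature_matrix(layer_outputs, n=256)     # feature value matrix, m = 3
```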
  • This feature value matrix is constructed based on the idea of the Monte Carlo method to reflect the operating state of the deep neural network model used for image recognition: during inference, a large amount of intermediate calculation data, namely the output data of each neural network layer, is generated; by sampling this data, the feature value matrix is generated, establishing an estimate of the distribution of the output data corresponding to each neural network layer.
  • Next, the neural network model fault monitoring device can perform step 303 above: calculate the relative entropy value between the feature value set corresponding to each neural network layer and a first element set that conforms to a Gaussian distribution, obtaining the relative entropy value set corresponding to the m neural network layers. The first element set can be expressed in the form of a reference matrix, and the relative entropy value set in the form of a relative entropy value matrix.
  • The reference matrix G can be expressed as formula (6):

    G = [g1, g2, …, gn]    (6)

  where g1, g2, …, gn are random numbers each obeying the standard normal distribution N(0, 1); that is, the reference matrix G contains the first element set that conforms to the Gaussian distribution.
  • The relative entropy value matrix can then be determined from the feature value matrix and the reference matrix. Illustratively, combining formula (4) and formula (6), the relative entropy value KLm between the feature value vector Am and the reference matrix G is given by formula (7):

    KLm = Σ_{i=1}^{n} a_m(i) · ln( a_m(i) / G(i) )    (7)

  where a_m(i) is the i-th element of the feature value vector Am, G(i) is the i-th element of the reference matrix, ln(·) denotes the natural logarithm, and Σ denotes summation over the n elements. Stacking the relative entropy values of the m layers gives the relative entropy value matrix of formula (8):

    KL = [KL1, KL2, …, KLm]^T    (8)

  where each element of KL is a relative entropy value; that is, KL contains the relative entropy value between the feature value set of each neural network layer and the first element set.
  • The relative entropy value matrix KL shown in formula (8) contains only m values, so the m × n feature value matrix A of formula (5) is reduced to a vector of m relative entropy values, realizing data dimensionality reduction and further improving computational efficiency.
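  • Formulas (7) and (8) can be prototyped as in the sketch below; note that the sketch softmax-normalizes both the feature vector and the Gaussian reference so that the ratio and logarithm in formula (7) are always well defined for arbitrary activation values, a normalization step the text leaves implicit and which is therefore an assumption of this sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def relative_entropy_vector(A, G):
    """Formulas (7)/(8): KL_m between each feature vector A_m and the Gaussian
    reference G, stacked into KL = [KL_1, ..., KL_m]^T.

    Both vectors are softmax-normalized so the ratio/logarithm is always
    defined; the document leaves this normalization implicit.
    """
    q = softmax(np.asarray(G, dtype=np.float64))
    kl = []
    for row in np.asarray(A, dtype=np.float64):    # one feature vector per layer
        p = softmax(row)
        kl.append(np.sum(p * np.log(p / q)))       # KL_m = sum_i p(i) ln(p(i)/q(i))
    return np.asarray(kl)                          # shape (m,), classifier input

n = 256
G = np.random.standard_normal(n)                   # reference: g_i ~ N(0, 1)
A = np.random.standard_normal((8, n))              # feature value matrix, m = 8
KL = relative_entropy_vector(A, G)
```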
  • The relative entropy value matrix KL describes the difference between the distribution of the feature values extracted from the m monitored layers of the neural network and the reference matrix G. The inference data inside the neural network is not classified directly; instead, the feature value matrix A is projected against the Gaussian-distributed reference matrix G, and each relative entropy value in formula (8) represents one feature point in this projection space. In this way, the difference between the normal output data of each neural network layer and the abnormal output data produced during inference when the neural network model fails is widened, and the coupling between normal and abnormal output data is reduced.
  • Finally, the neural network model fault monitoring device can perform step 304 above, using the classification model to quickly classify the relative entropy value matrix KL, so as to determine accurately and in real time whether the neural network model to be monitored has an operating fault.
  • In addition, the neural network model fault monitoring device can feed the monitoring result back to the perception module, the perception fusion module, or the system health management module for early-warning reporting. For example, if the classification model determines that the category corresponding to the relative entropy value matrix KL is "the neural network model to be monitored works normally", the result is fed back to the perception module, which then passes the current perception result to the planning and decision-making module; if the classification model determines that the category is "the neural network model to be monitored has failed", the result is fed back to the perception module, which discards the current perception result.
  • embodiments of the present application also provide a neural network model fault monitoring device in the automatic driving system.
  • the neural network model fault monitoring device in the automatic driving system can be used to execute the above method embodiments.
  • Figure 6 shows a schematic structural diagram of a neural network model fault monitoring device in an autonomous driving system according to an embodiment of the present application.
  • The device may include: a transmission module 601, configured to obtain a target output data set of the neural network model to be monitored in the autonomous driving system, where the target output data set includes the output data set corresponding to each of m neural network layers, the neural network model to be monitored includes M neural network layers, M is an integer greater than 1, and m is an integer greater than 1 and not greater than M; and a processing module 602, configured to extract, from the target output data set, the feature value set corresponding to each neural network layer, calculate the relative entropy value between the feature value set and a first element set that conforms to a preset probability distribution to obtain the relative entropy value set corresponding to the m neural network layers, and determine, according to the relative entropy value set, whether the neural network model to be monitored has an operating fault.
  • Based on the above technical solution and the idea of the Monte Carlo method, the output data of each neural network layer is selectively sampled: part of the output data in each output data set is extracted as feature values, and the distribution of each layer's output data is reflected with as few feature values as possible, which simplifies the calculation, saves computational overhead, and improves computational efficiency. At the same time, calculating the relative entropy between each layer's feature value set and the first element set that conforms to the preset probability distribution yields a single set of relative entropy values, which reduces the data dimensionality and further improves efficiency, thereby improving the real-time performance of fault monitoring and enabling real-time monitoring of neural network model faults in the autonomous driving system. The relative entropy values describe how the distributions of each layer's normal and abnormal output data differ, so the set of relative entropy values corresponding to the m neural network layers distinguishes normal output data from abnormal output data, allows a more accurate judgment of whether the neural network model to be monitored has an operating fault, and improves the accuracy of fault monitoring. In addition, various operating faults of neural network models, and operating faults of various neural network models, can be effectively monitored, so the application scope is wide.
  • In a possible implementation, the processing module 602 is further configured to: determine, among the output data sets in the target output data set, the first output data set with the smallest number of output data; and extract, according to the number of output data in the first output data set, the feature value set corresponding to each neural network layer from the output data set corresponding to that layer, where the number of feature values in each extracted feature value set is less than or equal to the number of output data in the first output data set.
  • In a possible implementation, the processing module 602 is further configured to extract the feature value set corresponding to each neural network layer using the number of output data in each layer's output data set as a weight.
  • the processing module 602 is further configured to input the set of relative entropy values into a preset classification model and determine whether there is a running failure in the neural network model to be monitored.
  • In a possible implementation, the preset classification model includes a first classifier based on machine learning; the processing module 602 is further configured to: input the relative entropy value set into the first classifier and calculate the distances between the relative entropy value set and multiple relative entropy value sample sets, where the multiple relative entropy value sample sets include the relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored fails and those corresponding to the m neural network layers when it works normally; and determine, according to these distances, whether the neural network model to be monitored has an operating fault.
  • In a possible implementation, the classification model includes a second classifier based on deep learning; the processing module 602 is further configured to input the set of relative entropy values into the second classifier to determine whether the neural network model to be monitored has an operating fault, where the second classifier is trained on multiple relative entropy value sample sets.
  • In a possible implementation, the relative entropy value sample sets corresponding to the m neural network layers when the neural network model to be monitored fails include the relative entropy values between the first feature value sample set of each of the m neural network layers and a second element set that conforms to the preset probability distribution, the first feature value sample sets being extracted from each layer's output data sample sets when the model fails; and the relative entropy value sample sets corresponding to the m neural network layers when the model works normally include the relative entropy values between the second feature value sample set of each layer and the second element set, the second feature value sample sets being extracted from each layer's output data sample sets when the model works normally.
  • It should be noted that the division of the modules in the above device is only a division of logical functions; in actual implementation, the modules can be fully or partially integrated into one physical entity, or they can be physically separated.
  • the modules in the device can be implemented in the form of the processor calling software; for example, the device includes a processor, the processor is connected to a memory, instructions are stored in the memory, and the processor calls the instructions stored in the memory to implement any of the above methods.
  • the processor is, for example, a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or a microprocessor
  • the memory is a memory within the device or a memory outside the device.
  • Alternatively, the modules in the device can be implemented in the form of hardware circuits, and some or all of the module functions can be realized through the design of the hardware circuits, which can be understood as one or more processors. For example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC); in another implementation, the hardware circuit can be realized by a programmable logic device (PLD); taking a field programmable gate array (FPGA) as an example, it can include a large number of logic gate circuits whose connection relationships are configured through a configuration file, so as to realize the functions of some or all of the above modules. All modules of the above device may be implemented entirely by the processor calling software, entirely by hardware circuits, or partly by the processor calling software with the remainder implemented by hardware circuits.
  • the processor is a circuit with signal processing capabilities.
  • In one implementation, the processor may be a circuit with instruction reading and execution capabilities, such as a CPU, a microprocessor, a graphics processing unit (GPU, which can be understood as a kind of microprocessor), or a digital signal processor (DSP); in another implementation, the processor can realize a certain function through the logical relationships of a hardware circuit, where the logical relationships are fixed or reconfigurable, for example a hardware circuit implemented as an ASIC or a PLD such as an FPGA.
  • the process of the processor loading the configuration file and realizing the hardware circuit configuration can be understood as the process of the processor loading instructions to realize the functions of some or all of the above modules.
  • It can be seen that each module in the above device can be one or more processors (or processing circuits) configured to implement the methods of the above embodiments, such as a CPU, GPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
  • When implemented as a system-on-chip (SoC), the SoC may include at least one processor for implementing any of the above methods or the functions of the modules of the device; the at least one processor may be of different types, for example a CPU and an FPGA, a CPU and an artificial intelligence processor, a CPU and a GPU, and so on.
  • Embodiments of the present application also provide a neural network model fault monitoring device in an autonomous driving system, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions When implementing the method of the above embodiment. For example, each step of the neural network model fault monitoring method in the automatic driving system shown in FIG. 3, FIG. 4 or FIG. 5 may be executed.
  • Figure 7 shows a schematic structural diagram of a neural network model fault monitoring device in an autonomous driving system according to an embodiment of the present application.
  • As shown in Figure 7, the neural network model fault monitoring device in an autonomous driving system may include: at least one processor 701, a communication line 702, a memory 703, and at least one communication interface 704.
  • The processor 701 can be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit, or one or more integrated circuits used to control the execution of the programs of the present application; the processor 701 can also be a heterogeneous computing architecture including multiple general-purpose processors, for example a combination of at least two of a CPU, GPU, microprocessor, DSP, ASIC, and FPGA; as an example, the processor 701 can be CPU+GPU, CPU+ASIC, or CPU+FPGA.
  • Communication line 702 may include a path that carries information between the above-mentioned components.
  • the communication interface 704 uses any device such as a transceiver to communicate with other devices or communication networks, such as Ethernet, RAN, wireless local area networks (WLAN), etc.
  • The memory 703 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • The memory may exist independently and be connected to the processor through the communication line 702, or it may be integrated with the processor.
  • the memory provided by the embodiment of the present application may generally be non-volatile.
  • the memory 703 is used to store computer execution instructions for executing the solution of the present application, and is controlled by the processor 701 for execution.
  • The processor 701 is configured to execute the computer-executable instructions stored in the memory 703 to implement the methods provided in the above embodiments of the present application; for example, each step of the neural network model fault monitoring method in the autonomous driving system shown in Figure 3, Figure 4, or Figure 5 can be implemented.
  • the computer-executed instructions in the embodiments of the present application may also be called application codes, which are not specifically limited in the embodiments of the present application.
  • In specific implementations, the processor 701 may include one or more CPUs, for example CPU0 in Figure 7; the processor 701 may also include one CPU and any one of a GPU, ASIC, and FPGA, for example CPU0+GPU0, CPU0+ASIC0, or CPU0+FPGA0 in Figure 7.
  • the neural network model fault monitoring device in the autonomous driving system may include multiple processors, such as processor 701 and processor 707 in Figure 7 .
  • processors can be a single-CPU processor, a multi-CPU processor, or a heterogeneous computing architecture including multiple general-purpose processors.
  • a processor here may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the neural network model fault monitoring device in the autonomous driving system may also include an output device 705 and an input device 706.
  • Output device 705 communicates with processor 701 and can display information in a variety of ways.
  • Illustratively, the output device 705 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, for example a display device such as a vehicle HUD, AR-HUD, or monitor.
  • Input device 706 communicates with processor 701 and can receive user input in a variety of ways.
  • the input device 706 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.
  • the transmission module 601 in Figure 6 can be implemented by the communication interface 704 in Figure 7; the processing module 602 in Figure 6 can Implemented by processor 701 in Figure 7.
  • Embodiments of the present application provide a computer-readable storage medium on which computer program instructions are stored.
  • When the computer program instructions are executed by a processor, the methods in the above embodiments are implemented; for example, each step of the neural network model fault monitoring method in the autonomous driving system shown in Figure 3, Figure 4, or Figure 5 can be implemented.
  • Embodiments of the present application provide a computer program product, which may, for example, include computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer program product runs on a computer, the computer is caused to execute the methods in the above embodiments, for example each step of the neural network model fault monitoring method in the autonomous driving system shown in Figure 3, Figure 4, or Figure 5.
  • Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
  • The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, and a mechanical encoding device, such as a punch card or an in-groove raised structure on which instructions are stored.
  • Computer-readable storage media, as used herein, are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through wires.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device .
  • Computer program instructions for performing the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or instructions in one or more programming languages.
  • The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • In a scenario involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can execute computer-readable program instructions to implement various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.

Abstract

一种自动驾驶系统中神经网络模型故障监测方法及装置,其中,该方法包括:获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合(301),目标输出数据集合包括m个神经网络层中各神经网络层对应的输出数据集合,其中,待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数;在目标输出数据集合中,提取各神经网络层对应的特征值集合(302);计算特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到m个神经网络层对应的相对熵值集合(303);根据相对熵值集合,判断待监测神经网络模型是否存在运行故障(304)。通过该方法,提高了对待监测神经网络模型故障监测的实时性及准确性,保障了自动驾驶车辆安全。

Description

一种自动驾驶系统中神经网络模型故障监测方法及装置 技术领域
本申请涉及自动驾驶技术领域,尤其涉及一种自动驾驶系统中神经网络模型故障监测方法及装置。
背景技术
由于自动驾驶系统中计算平台、人工智能(Artificial Intelligence,AI)加速器等设备的高度复杂性,部署在这些设备上的神经网络模型在进行推理运算时,更容易受到硬件失效等因素的影响;因此,及时准确地监测神经网络模型是否出现运行故障,对于保证自动驾驶车辆安全性具有重要意义。
发明内容
有鉴于此,提出了一种自动驾驶系统中神经网络模型故障监测方法、装置、存储介质及计算机程序产品。
第一方面,本申请的实施例提供了一种自动驾驶系统中神经网络模型故障监测方法,所述方法包括:获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合,所述目标输出数据集合包括m个神经网络层中各神经网络层对应的输出数据集合,其中,所述待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数;在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合;计算所述特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到所述m个神经网络层对应的相对熵值集合;根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障。
基于上述技术方案,基于蒙特卡洛方法的思想,对各神经网络层的输出数据进行选择性采样,抽取输出数据集合中的部分输出数据作为特征值,通过尽量少的特征值反映各神经网络层输出数据的分布,从而简化计算,节约了运算开销,提高运算效率;同时,通过计算各神经网络层对应的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到一个相对熵值集合,实现了数据降维,进一步提高了运算效率;从而提高了故障监测的实时性,实现了对自动驾驶系统中神经网络模型运行故障的实时监测。同时,采用相对熵值对各神经网络层的正常输出数据和异常输出数据的分布差异特性进行描述,区分各神经网络层的正常输出数据与异常输出数据,从而通过m个神经网络层对应的相对熵值集合更加准确地判断待监测神经网络模型是否存在运行故障,提高了故障监测的准确性;此外,可以有效监测神经网络模型的各类运行故障或各类神经网络模型的运行故障,适用范围广。
根据第一方面,在所述第一方面的第一种可能的实现方式中,所述在所述目标输 出数据集合中,提取所述各神经网络层对应的特征值集合,包括:确定所述目标输出数据集合中,输出数据的数量最小的第一输出数据集合;根据所述第一输出数据集合中输出数据的数量,在所述各神经网络层对应的输出数据集合中,提取所述各神经网络层对应的特征值集合;其中,所提取的各神经网络层对应的特征值集合中特征值的数量均小于或等于所述第一输出数据集合中输出数据的数量。
基于上述技术方案,考虑到自动驾驶系统中神经网络模型通常较复杂,目标输出数据集合中输出数据的数量较大,在各神经网络层对应的输出数据集合中,自适应提取各神经网络层对应的特征值集合,各神经网络层中提取的特征值的数据均不大于m个神经网络层中任一神经网络层中输出数据的数量,从而简化运算开销,提高后续处理效率,满足对故障监测的实时性要求。
根据第一方面,在所述第一方面的第二种可能的实现方式中,所述在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合,包括:以所述各神经网络层对应的输出数据集合中输出数据的数量为权重,提取所述各神经网络层对应的特征值集合。
基于上述技术方案,考虑到不同神经网络层中输出数据的数量可能不同,对神经网络模型的工作状态影响也亦不同;根据各个神经网络层对应的输出数据的数量的权重以分配每一神经网络层抽取输出数据的多少,从而实现自适应提取各神经网络对应的特征值集合,所提取的特征值集合能够更加准确地反映各神经网络层的输出数据的分布,同时,通过特征值提取,简化运算开销,提高后续处理效率,满足对故障监测的实时性要求。
根据第一方面或第一方面上述各种可能的实现方式,在所述第一方面的第三种可能的实现方式中,所述根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障,包括:将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障。
在一些示例中,将相对熵值集合输入到预设分类模型中,预设分类模型基于已知的正常输出数据提取出的特征值集合与符合预设概率分布的元素集合之间的相对熵值,以及异常输出数据提取出的特征值集合与符合预设概率分布的元素集合之间的相对熵值,对相对熵值集合进行分类,从而准确判断待监测神经网络模型是否存在运行故障。
根据第一方面的第三种可能的实现方式,在所述第一方面的第四种可能的实现方式中,所述预设分类模型包括基于机器学习的第一分类器;所述将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障,包括:将所述相对熵值集合输入到所述第一分类器中,计算所述相对熵值集合与多个相对熵值样本集合之间的距离;其中,所述多个相对熵值样本集合包括所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合及所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合;根据所述相对熵值集合与多个相对熵值样本集合之间的距离,判断所述待监测神经网络模型是否存在运行故障。
基于上述技术方案,利用基于机器学习的第一分类器,无需预先训练,可以根据相对熵值集合与多个相对熵值样本集合之间的距离,更加方便快捷地对相对熵值集合 进行自动分类,从而实时判断待监测神经网络模型是否存在运行故障。
根据第一方面的第三种可能的实现方式,在所述第一方面的第五种可能的实现方式中,所述分类模型包括基于深度学习的第二分类器;所述将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障,包括:将所述相对熵值集合输入到所述第二分类器中,判断所述待监测神经网络模型是否存在运行故障;其中,所述第二分类器由多个相对熵值样本集合训练得到。
基于上述技术方案,通过采用基于深度学习的第二分类器,在实时判别相对熵值集合所属类别的同时,有效提高了相对熵值集合分类的准确性,从而更加准确地判断待监测神经网络模型是否存在运行故障。
根据第一方面的第四种或第五种可能的实现方式,在所述第一方面的第六种可能的实现方式中,所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第一特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第一特征值样本集合由所述待监测神经网络模型发生故障时,所述各神经网络层对应的输出数据样本集合提取得到;所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第二特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第二特征值样本集合由所述待监测神经网络模型正常工作时,所述各神经网络层对应的输出数据样本集合提取得到。
第二方面,本申请的实施例提供了一种自动驾驶系统中神经网络模型故障监测装置,所述装置包括:传输模块,用于获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合,所述目标输出数据集合包括m个神经网络层中各神经网络层对应的输出数据集合,其中,所述待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数;处理模块,用于在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合;计算所述特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到所述m个神经网络层对应的相对熵值集合;根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障。
基于上述技术方案,基于蒙特卡洛方法的思想,对各神经网络层的输出数据进行选择性采样,抽取输出数据集合中的部分输出数据作为特征值,通过尽量少的特征值反映各神经网络层输出数据的分布,从而简化计算,节约了运算开销,提高运算效率;同时,通过计算各神经网络层对应的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到一个相对熵值集合,实现了数据降维,进一步提高了运算效率;从而提高了故障监测的实时性,实现了自动驾驶系统中神经网络模型故障的实时监测。同时,采用相对熵值对各神经网络层的正常输出数据和异常输出数据的分布差异特性进行描述,区分各神经网络层的正常输出数据与异常输出数据,从而通过m个神经网络层对应的相对熵值集合更加准确地判断待监测神经网络模型是否存在运行故障,提高了故障监测的准确性。此外,可以有效监测神经网络模型的各类运行故障或各类神经网络模型的运行故障,适用范围广。
根据第二方面,在所述第二方面的第一种可能的实现方式中,所述处理模块,还 用于:确定所述目标输出数据集合中,输出数据的数量最小的第一输出数据集合;根据所述第一输出数据集合中输出数据的数量,在所述各神经网络层对应的输出数据集合中,提取所述各神经网络层对应的特征值集合;其中,所提取的各神经网络层对应的特征值集合中特征值的数量均小于或等于所述第一输出数据集合中输出数据的数量。
基于上述技术方案,考虑到自动驾驶系统中神经网络模型通常较复杂,目标输出数据集合中输出数据的数量较大,在各神经网络层对应的输出数据集合中,自适应提取各神经网络层对应的特征值集合,各神经网络层中提取的特征值的数据均不大于m个神经网络层中任一神经网络层中输出数据的数量,从而简化运算开销,提高后续处理效率,满足对故障监测的实时性要求。
根据第二方面,在所述第二方面的第二种可能的实现方式中,所述处理模块,还用于:以所述各神经网络层对应的输出数据集合中输出数据的数量为权重,提取所述各神经网络层对应的特征值集合。
基于上述技术方案,考虑到不同神经网络层中输出数据的数量也不同,对神经网络模型的工作状态影响也亦不同;根据各个神经网络层对应的输出数据的数量的权重以分配每一神经网络层抽取输出数据的多少,从而实现自适应提取各神经网络对应的特征值集合,所提取的特征值集合能够更加准确地反映各神经网络层的输出数据的分布,同时,通过特征值提取,简化运算开销,提高后续处理效率,满足对故障监测的实时性要求。
根据第二方面或第二方面上述各种可能的实现方式,在所述第二方面的第三种可能的实现方式中,所述处理模块,还用于:将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障。
在一些示例中,将相对熵值集合输入到预设分类模型中,预设分类模型基于已知的正常输出数据提取出的特征值集合与符合预设概率分布的元素集合之间的相对熵值,以及异常输出数据提取出的特征值集合与符合预设概率分布的元素集合之间的相对熵值,对相对熵值集合进行分类,从而准确判断待监测神经网络模型是否存在运行故障。
根据第二方面的第三种可能的实现方式,在所述第二方面的第四种可能的实现方式中,所述预设分类模型包括基于机器学习的第一分类器;所述处理模块,还用于:将所述相对熵值集合输入到所述第一分类器中,计算所述相对熵值集合与多个相对熵值样本集合之间的距离;其中,所述多个相对熵值样本集合包括所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合及所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合;根据所述相对熵值集合与多个相对熵值样本集合之间的距离,判断所述待监测神经网络模型是否存在运行故障。
基于上述技术方案,利用基于机器学习的第一分类器,无需预先训练,可以根据相对熵值集合与多个相对熵值样本集合之间的距离,更加方便快捷地对相对熵值集合进行自动分类,从而实时判断待监测神经网络模型是否存在运行故障。
根据第二方面的第三种可能的实现方式,在所述第二方面的第五种可能的实现方式中,所述分类模型包括基于深度学习的第二分类器;所述处理模块,还用于:将所述相对熵值集合输入到所述第二分类器中,判断所述待监测神经网络模型是否存在运 行故障;其中,所述第二分类器由多个相对熵值样本集合训练得到。
基于上述技术方案,通过采用基于深度学习的第二分类器,在实时判别相对熵值集合所属类别的同时,有效提高了相对熵值集合分类的准确性,从而更加准确地判断待监测神经网络模型是否存在运行故障。
根据第二方面的第四种或第五种可能的实现方式,在所述第二方面的第六种可能的实现方式中,所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第一特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第一特征值样本集合由所述待监测神经网络模型发生故障时,所述各神经网络层对应的输出数据样本集合提取得到;所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第二特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第二特征值样本集合由所述待监测神经网络模型正常工作时,所述各神经网络层对应的输出数据样本集合提取得到。
第三方面,本申请的实施例提供了一种自动驾驶系统中神经网络模型故障监测装置,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述第一方面或者第一方面的一种或几种的自动驾驶系统中神经网络故障监测方法。
第四方面,本申请的实施例提供了一种计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现第一方面或者第一方面的一种或几种的自动驾驶系统中神经网络故障监测方法。
第五方面,本申请的实施例提供了一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行上述第一方面或者第一方面的一种或几种的自动驾驶系统中神经网络故障监测方法。
上述第三方面至第五方面的技术效果,参见上述第一方面或第二方面。
附图说明
图1示出根据本申请一实施例的一种自动驾驶系统的架构示意图;
图2示出根据本申请一实施例的一种对神经网络模型进行故障监测的示意图;
图3示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测方法的流程图;
图4示出了根据本申请一实施例的一种获取相对熵值样本集合的方法流程图;
图5示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测方法的示意图;
图6示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测装置的结构示意图;
图7示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测装置的结构示意图。
具体实施方式
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的相关术语和概念进行介绍。
1、概率分布
概率分布,是指用于表述随机变量取值的概率规律。如果试验结果用随机变量的取值来表示,则随机试验的概率分布就是随机变量的概率分布,即随机变量的可能取值及取得对应值的概率。根据随机变量所属类型的不同,概率分布可以分为不同的表现形式,例如,高斯分布(又称正态分布(normal distribution))、二项分布、泊松分布、均匀分布、伯努利分布、拉普拉斯分布、指数分布、伽马分布、贝塔分布、多项式分布等等。
2、相对熵
相对熵,又称KL散度(Kullback-Leibler divergence,KLD),是两个概率分布P和Q差别的非对称性的度量。相对熵可以衡量两个概率分布之间的距离,当两个概率分布相同时,它们的相对熵为零,当两个概率分布的差别增大时,它们的相对熵也相应增大。
典型情况下,P表示数据的真实分布,Q表示数据的理论分布、估计的模型分布、或P的近似分布。则P与Q的相对熵如下述公式(1)所示,
Figure PCTCN2022083858-appb-000001
其中,P(i)表示P中第i个元素,Q(i)表示Q中第i个元素;ln(·)表示计算自然对数。
3、蒙特卡洛法
蒙特卡洛法也称统计模拟法或统计试验法,是把概率现象作为研究对象的数值模拟方法;通常按抽样调查法求取统计值来推定未知特性量,在计算仿真中,通过构造一个和系统性能相近似的概率模型,并进行随机试验,可以模拟系统的随机特性。
4、分类器
很多神经网络模型最后都有一个分类器,用于对输入数据进行分类。分类器一般由全连接层(fully connected layer)和softmax函数(可以称为归一化指数函数)组成,能够根据所输入的数据而输出不同的类别或不同类别的概率。
5、多层感知机(multi-layer perceptron,MLP)
MLP是一种前向结构的人工神经网络,映射一组输入向量到一组输出向量。MLP可以被看作是一个有向图,多层感知机的基本结构由多个节点层组成:输入层、中间隐藏层和输出层,每一节点层都全连接到下一节点层。除了输入节点,每个节点都是一个带有非线性激活函数的神经元;MLP遵循人类神经系统原理,学习并进行数据预 测,其主要优势在于具备快速解决复杂问题的能力。
6、k近邻算法(k-nearest neighbor,KNN)
KNN算法基本逻辑如下:通过测量不同特征值之间的距离进行分类,该算法在定类决策上只依据最邻近的一个或者几个样本的类别来决定待分样本所属的类别。它的基本思路是:如果一个样本在特征空间中的k个最相似(即特征空间中最邻近)的样本中的大多数属于某一个类别,则该样本也属于这个类别,其中K通常是不大于20的整数。KNN算法中,所选择的邻居都是已经正确分类的样本。
7、神经网络模型
神经网络模型是一种运算模型,由大量的节点(或称神经元)之间相互联接构成。每个节点代表一种特定的输出函数,称为激励函数(activation function)。每两个节点间的连接都代表一个对于通过该连接信号的加权值,称之为权重,这相当于人工神经网络的记忆。神经网络模型的输出则依神经网络模型的连接方式,权重值和激励函数的不同而不同。而神经网络模型自身通常都是对自然界某种算法或者函数的逼近,也可能是对一种逻辑策略的表达。神经网络模型通常包括多个神经网络层,其中,每一神经网络层可以包括一个或多个节点。神经网络模型可以分为深度神经网络(Deep Neural Network,DNN)、卷积神经网络(Convolutional Neuron Network,CNN)、循环神经网络(Recurrent Neural Network,RNN)等等。其中,深度神经网络,也称多层神经网络,可以理解为具有很多个隐含层的神经网络模型,其内部的神经网络层可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层;层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。卷积神经网络是一种带有卷积结构的神经网络模型;卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器;卷积层是指卷积神经网络中对输入数据进行卷积处理的神经元层;在卷积层中,一个神经元可以只与部分邻层神经元连接;一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经元组成;同一特征平面的神经元共享权重,即共享卷积核。
8、神经网络模型故障监测
神经网络模型故障监测是指在神经网络模型推理运算过程中,对神经网络模型可能出现的运行故障进行监测。其中,运行故障可以包括由部署神经网络模型的设备中硬件失效导致的故障,或者由异常输入导致的神经网络模型得到错误的推理结果等等。其中,硬件失效导致的故障一般被统称为软错误(soft errors);常见的软错误可以分为瞬态错误和永久性错误。其中,瞬态错误与辐射、温度等外界环境的突变以及硬件本身相互的干扰等因素造成的硬件失效有关,瞬态错误的特点是该错误会在某个时段出现后消失,常见的瞬态错误是比特位翻转(bitflip);常见的永久性错误是置零(stuck-at-0)和置一(stuck-at-one)两种情况,分别与硬件开路和短路造成的硬件失效有关,永久性错误的特征是该错误将会长时间保留在发生位置。
相关技术中,采用冗余设计的方式进行神经网络模型故障监测,如采用三重冗余(triple modular redundancy,TMR)设计,TMR设计中利用多个相同构造的模块并联以执行同样的功能;该方式采用一个预先准备好的查询表以尽可能的收集神经网络 模型无故障(error free)情况下所有神经元权重值,在神经网络模型推理运算过程中,若某一权重值不在查询表内,可视为该权重值出现异常,即神经网络模型存在运行故障,进而启动权重值切换状态,将发生错误的神经元中的权重值分配到其他神经元上,起到使用其他神经元取代该错误神经元的作用。或者,采用基于症状的监测器(Symptom-based Error Detectors,SED)进行神经网络模型故障监测,该方式根据神经网络模型中每一神经网络层对应的输出值大小来判别神经网络模型是否存在运行故障;该方式预先在神经网络模型无故障(error free)情况下收集每一神经网络层的多个输出值,根据收集到的多个输出值确定每一神经网络层对应输出值的合理取值范围,在神经网络模型推理运算过程中,若某一神经网络层的输出值超出其对应的合理取值范围的1.1倍时,则可认为该输出值出现错误,从而判定神经网络模型存在运行故障。
上述两种对神经网络模型故障监测的方式均存在自身的局限性。其中,采用冗余设计的方式只适用于多层感知网络,即输入数据未通过卷积、池化等方式降维;当对具有卷积层和池化层的神经网络模型可能出现的故障进行监测时,该方式由于采用权重值查找表设计,无法对卷积层或池化层进行有效监测;且该方式只能监测部分stuck-at-one和bitflip的错误;此外,针对较复杂的神经网络模型,该方式运算成本较大,无法保证故障监测的实时性,例如,神经网络模型Alexnet中输入层就有3000个以上的权重,预先收集Alexnet所有神经元的权重值,收集查询表难度及运算开销较大,在利用查询表对Alexnet进行故障监测时,由于权重数量及查询表中权重值数量庞大,查询速率较慢,无法适用于自动驾驶系统等对故障监测实时性要求高的场景。采用SED的方式,通过单纯的枚举式算法提取每一神经网络层的最大输出值及最小输出值,以得到每一神经网络层对应输出值的合理取值范围,针对较复杂的神经网络模型,该方式运算开销非常巨大,例如,在Alexnet中,考虑每个神经元、池化层以及全连接层的输出值,总计会有超过十万个以上的单个神经元的输出值,以卷积层为例,就有超过15000个输出值,因此,收集每一神经网络层的输出值会带来庞大的运算开销,在对Alexnet进行故障监测时,由于输出值数量巨大,造成故障监测的延迟,无法适用于自动驾驶系统等对故障监测实时性要求高的场景;此外,该方式只能监测瞬态错误的发生情况,而对于置零和置一,由于隐藏层在这两种情况下,最大输出值不会发生明显变化,因而无法监测永久性错误。
由于上述两种对神经网络模型故障监测的方式所存在的局限性。本申请实施例提供了一种神经网络模型故障监测方法(详细描述参见下文),可以应用于配置有神经网络模型的场景,例如,自动驾驶车辆、车载设备或车载系统(如自动驾驶系统(Automated Driving System,ADS)或高级驾驶辅助系统(Advanced Driver Assistant Systems,ADAS)等部署有神经网络模型的场景,大规模部署的深度学习训练服务器,物联网(Internet of Things,IoT)设备中采用神经网络模型进行物体识别、语义识别等场景,安防设备中采用神经网络模型进行车辆检测、物体检测等场景。本申请实施例提供的神经网络模型故障监测方法,可以准确监测上述场景中所配置的各类神经网络模型出现的多种运行故障;尤其针对自动驾驶系统等对故障监测实时性要求高的场景,可以实现实时故障监测,满足自动驾驶等场景对实时性的要求。
为了便于描述,以对自动驾驶系统中神经网络模型进行故障监测为例,对本申请实施例提供的神经网络模型故障监测方法进行示例性地说明。
图1示出根据本申请一实施例的一种自动驾驶系统的架构示意图;如图1所示,自动驾驶系统可以包括:感知模块(perception layer)、规划与决策模块(planning&decision)、传动控制模块(motion controller)。
其中,感知模块用于感知车辆周围环境或车内环境,可以综合车载传感器,例如摄像头、激光雷达、毫米波雷达、超声波雷达、光线传感器等所采集的车辆周围或车舱内的数据,感知车辆周围环境或车内环境,并可以将感知结果传输到规划与决策模块。示例性地,车载传感器所采集的车辆周围或车舱内的数据可以包括视频流、雷达的点云数据或者是经过分析的结构化的人、车、物的位置、速度、转向角度、尺寸大小等信息或数据。感知模块可以通过神经网络模型,对车载传感器所采集的车辆周围或车舱内的数据进行处理,实现环境感知,示例性地,该神经网络模型可以部署在车载计算平台或AI加速器等处理设备中。作为一个示例,感知模块可以获取车载摄像头所采集的车辆周围环境的图像,利用用于图像识别的深度神经网络模型对该图像进行处理,从而可以识别图像中行人、车道线、车辆、障碍物、交通指示灯等等对象。
规划与决策模块用于基于感知模块所生成的感知结果进行分析决策,规划生成满足特定约束条件(例如车辆本身的动力学约束、避免碰撞、乘客舒适性等)的控制集合;并可以将该控制集合传输到传动控制模块。作为一个示例,规划与决策模块可以利用用于生成轨迹的神经网络模型,对感知结果及约束条件进行处理,生成控制集合;示例性地,该神经网络模型可以部署在车载计算平台或AI加速器等处理设备中。
传动控制模块用于按照规划与决策模块所生成的控制集合,控制车辆行驶;例如,可以基于控制集合,结合车辆的动力学信息,生成方向盘转角、速度、加速度等控制信号,并控制车载转向系统或发动机等执行该控制信号,从而实现控制车辆行驶。
示例性地,自动驾驶系统还可以包括其他功能模块;例如,定位模块、交互模块、通信模块等等(图中未示出),对此不作限定。其中,定位模块可以用于提供车辆的位置信息,还可以提供车辆的姿态信息。示例性地,定位模块可以包括卫星导航系统(Global Navigation Satellite System,GNSS)、惯性导航系统(Inertial Navigation System,INS)等等,可以用于确定车辆的位置信息。交互模块可以用于向驾驶员发出信息及接收驾驶员的指令。通信模块可以用于车辆与其他设备通信,其中,其他设备可以包括移动终端、云端设备、其他车辆、路侧设备等等,可以通过2G/3G/4G/5G、蓝牙、调频(frequency modulation,FM)、无线局域网(wireless local area networks,WLAN)、长期演进(long time evolution,LTE)、车与任何事物相通信(vehicle to everything,V2X)、车与车通信(Vehicle to Vehicle,V2V)、长期演进-车辆(long time evolution vehicle,LTE-V)等无线通信连接来实现。
本申请实施例提供的自动驾驶系统中神经网络模型故障监测方法可以由神经网络模型故障监测装置执行,作为一个示例,以对图1中感知模块中用于图像识别的深度神经网络模型进行故障监测为例,图2示出根据本申请一实施例的一种对神经网络模型进行故障监测的示意图;如图2所示,神经网络模型故障监测装置可以获取自动驾驶系统感知模块中用于图像识别的深度神经网络模型对一帧图像进行识别的过程中生 成中间数据,执行本申请实施例中神经网络模型故障监测方法(详细描述参见下文),对该深度神经网络模型进行实时准确的故障监测,并将故障监测结果实时反馈到感知模块,以便感知模块判断是否将当前的识别结果传递到规划与决策模块。例如,可以反馈感知模块,神经网络模型工作正常,以使感知模块可以将对该帧图像的识别结果传递给规划与决策模块;或者,可以反馈感知模块,神经网络故障,以使感知模块丢弃该帧图像的识别结果。
本申请实施例不限定该神经网络模型故障监测装置的类型。
示例性地,该神经网络模型故障监测装置可以是独立设置,也可以集成在其他装置中,还可以是通过软件或者软件与硬件结合实现。
示例性地,该神经网络模型故障监测装置可以为自动驾驶车辆,或者自动驾驶车辆中的其他部件。其中,该神经网络模型故障监测装置包括但不限于:车载终端、车载控制器、车载模块、车载模组、车载部件、车载芯片、车载单元、车载雷达或车载摄像头等等。作为一个示例,该神经网络模型故障监测装置可以集成在自动驾驶车辆的车载计算平台或AI加速器等处理设备中。
示例性地,该神经网络模型故障监测装置还可以为除了自动驾驶车辆之外的其他具有数据处理能力的智能终端,或设置在智能终端中的部件或者芯片。
示例性地,该神经网络模型故障监测装置可以是一个通用设备或者是一个专用设备。例如,该装置还可以台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备或其他具有数据处理功能的设备,或者为这些设备内的部件或者芯片。
示例性地,该神经网络模型故障监测装置还可以是具有处理功能的芯片或处理器,该故障监测装置可以包括多个处理器。处理器可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。
需要说明的是,本申请实施例描述的上述应用场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,针对其他相似的或新的场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
下面对本申请实施例提供的自动驾驶系统中神经网络模型故障监测方法进行详细说明。
图3示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测方法的流程图，该方法可以由上述图2中神经网络模型故障监测装置执行，如图3所示，可以包括以下步骤：
步骤301、获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合。
其中,待监测神经网络模型可以为自动驾驶系统中任一神经网络模型,例如,可以为感知模块中所配置的用于图像识别的深度神经网络模型或用于语音识别的神经网络模型等等,还可以为规划与决策模块中所配置的用于生成控制集合的神经网络模型,等等。
需要说明的是,本申请实施例中不限制神经网络模型的类型,例如,可以为深度神经网络、卷积神经网络、循环神经网络等等。
其中,目标输出数据集合可以包括m个神经网络层中各神经网络层对应的输出数据集合,待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数。示例性地,对于任一神经网络层,该神经网络层对应的输出数据集合中包括待监测神经网络模型推理过程中,该神经网络层中所有节点所输出的数据。其中,m的具体数值可以根据待监测神经网络模型的规模和/或实际运算资源的多少等进行预先设定;示例性地,可以将m的数值设置为接近M,即获取尽可能多的神经网络层对应的输出数据集合,从而提高监测精确度,例如,当m与M取值相同时,则表示神经网络模型故障监测装置获取待监测神经网络模型中所有神经网络层对应的输出数据集合;还可以将m的数值设置为较小值,即获取少量的神经网络层对应的输出数据集合,从而节约运算资源,提高处理效率,更好地满足实时性要求。
作为一个示例,待监测神经网络模型可以为自动驾驶系统感知模块中用于图像识别的卷积神经网络,该卷积神经网络可以包括若干卷积层、池化层、全连接层等等神经网络层,感知模块采集的图像输入到该卷积神经网络中,经过卷积层、池化层、全连接层处理后,输出图像识别结果;其中,每一卷积层可以包括一个或多个卷积核,每一卷积核均可以提取对应的特征图,则该卷积神经网络的目标输出数据集合可以包括各卷积层中所有卷积核所提取的特征图。
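作为步骤301的一种示意性实现思路，下面给出一段基于PyTorch前向钩子(forward hook)收集各神经网络层输出数据集合的Python草稿（假设环境中可使用torch及torchvision，以AlexNet为例仅作说明，所选层、变量名等均为示例性假设，并非对本申请方案的限定）：
    import torch
    import torchvision

    model = torchvision.models.alexnet(weights=None).eval()   # 以AlexNet为例(weights=None为较新版本torchvision的用法)
    layer_outputs = {}                                         # 各神经网络层对应的输出数据集合

    def make_hook(name):
        def hook(module, inputs, output):
            # 缓存该层本次推理的输出(展平后), 作为该层对应的输出数据集合
            layer_outputs[name] = output.detach().flatten()
        return hook

    # 仅对卷积层和全连接层注册钩子, 即选取m个神经网络层
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            module.register_forward_hook(make_hook(name))

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))                     # 模拟一帧输入图像

    print({k: v.numel() for k, v in layer_outputs.items()})    # 各层输出数据的数量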
步骤302、在目标输出数据集合中,提取各神经网络层对应的特征值集合。
其中,对于任一神经网络层,该神经网络层对应的特征值集合中可以包括一个或多个该神经网络层对应的特征值。示例性地,针对m个神经网络层中任一神经网络层,可以在该神经网络层对应的输出数据集合中提取输出数据作为特征值,从而得到该神经网络层对应的特征值集合。其中,所提取的输出数据的数量,可以根据需求预先设定,示例性地,不同神经网络层所提取出的输出数据的数量可以相同,也可以不同,对此不作限定。该步骤可以理解为特征工程的提取,通过提取尽量少输出数据作为特征值以尽可能全面地反映各神经网络层的输出数据的分布。
示例性地,针对m个神经网络层中任一神经网络层,可以按照预设概率分布的方式在该神经网络层对应的输出数据集合中提取输出数据作为特征值,从而得到该神经网络层对应的特征值集合;例如,可以按照高斯分布的方式在该神经网络层对应的输出数据集合中提取部分输出数据作为特征值,从而得到该神经网络层对应的特征值集合。
下面对提取各神经网络层对应的特征值集合的可能实现方式进行举例说明。
方式一、确定目标输出数据集合中,输出数据的数量最小的第一输出数据集合;根据第一输出数据集合中输出数据的数量,在各神经网络层对应的输出数据集合中,提取各神经网络层对应的特征值集合;其中,所提取的各神经网络层对应的特征值集合中特征值的数量均小于或等于第一输出数据集合中输出数据的数量。
示例性地,可以根据第一输出数据集合中输出数据的数量,确定各神经网络层待提取的输出数据的数量,进而在各神经网络层中提取该数量的输出数据作为特征值,得到各神经网络层对应的特征值集合。
考虑到自动驾驶系统中神经网络模型通常较复杂，目标输出数据集合中输出数据的数量较大，该方式中，在各神经网络层对应的输出数据集合中，自适应提取各神经网络层对应的特征值集合，各神经网络层中提取的特征值的数量均不大于m个神经网络层中任一神经网络层中输出数据的数量，从而简化运算开销，提高后续处理效率，满足对故障监测的实时性要求。
作为一个示例,可以预设采样系数,根据采样系数及第一输出数据集合中输出数据的数量,确定各神经网络层待提取的输出数据的数量;例如,可以通过下述公式(2)确定各神经网络层待提取的输出数据的数量n:
n=α*n_tmp    (2)
在公式(2)中，n_tmp表示第一输出数据集合中输出数据的数量，α表示采样系数，α的取值范围为[0,1]。
其中,采样系数α用来平衡对待监测神经网络模型进行故障监测的复杂度和精确度,可以根据实际需求设置采样系数的具体数值;例如,可以对监测精确度要求较高的情况下,将α设置为较高数值,即针对每一神经网络层,在对应的输出数据集合中提取较多数量的输出数据,作为该神经网络层对应的特征值;可以在监测精确度要求不太高的情况下,将α值设置为较小值,即针对每一神经网络层,在对应的输出数据集合中提取较少数量的输出数据,作为该神经网络层对应的特征值,从而节约运算资源,提高处理效率,更好地满足实时性要求。
示例性地，α可以取10%。示例性地，当α*n_tmp的值为非整数时，则可以将α*n_tmp向下取整，从而得到n。
其中，n_tmp可以通过下述公式(3)确定：
n_tmp=min_{i=1,…,m}φ(i)    (3)
在公式(3)中,φ(i)表示m个神经网络层中第i个神经网络层对应的输出数据集合中输出数据的数量。
这样,根据上述公式(2)及公式(3)可以确定各神经网络层待提取的输出数据的数量,即特征值集合中特征值的数量。作为一个示例,可以将输出数据的数量最小的第一输出数据集合中所包含的输出数据总数的10%作为各神经网络层待提取的输出数据的数量,从而简化运算开销,提高后续处理效率。
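下面用一段示意性的Python代码说明公式(2)、公式(3)的计算以及按所得数量n抽取特征值的过程（其中layer_outputs假设为“层名→该层输出数据的numpy数组”形式的字典，函数名与随机抽样方式均为示例性假设）：
    import math
    import numpy as np

    def num_features_to_extract(layer_outputs, alpha=0.10):
        """公式(2)(3): n = floor(alpha * min_i φ(i)), 即按输出数据数量最小的层确定n。"""
        n_tmp = min(v.size for v in layer_outputs.values())    # 公式(3)
        return max(1, math.floor(alpha * n_tmp))                # 公式(2), 非整数时向下取整

    def extract_features(layer_outputs, alpha=0.10, seed=0):
        """在各层输出数据集合中随机抽取n个输出数据, 作为该层对应的特征值集合。"""
        rng = np.random.default_rng(seed)
        n = num_features_to_extract(layer_outputs, alpha)
        return {name: rng.choice(v.ravel(), size=n, replace=False)
                for name, v in layer_outputs.items()}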
方式二、以各神经网络层对应的输出数据集合中输出数据的数量为权重,提取各神经网络层对应的特征值集合。
考虑到不同神经网络层中输出数据的数量也不同，对神经网络模型的工作状态的影响亦不同；因此，每一神经网络层所提取的特征值的数量可以做适当的变化，该方式中，可以根据各个神经网络层对应的输出数据的数量的权重以分配每一神经网络层抽取输出数据的多少，即神经网络层对应的输出数据的数量越多，则提取越多数量的输出数据作为特征值；相应的，神经网络层对应的输出数据的数量越少，则提取越少数量的输出数据作为特征值；从而实现自适应提取各神经网络层对应的特征值集合，所提取的特征值集合能够更加准确地反映各神经网络层的输出数据的分布，同时，通过特征值提取，简化运算开销，提高后续处理效率，满足对故障监测的实时性要求。
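下面给出方式二的一个示意性Python草稿（其中total表示预先设定的特征值总数预算，为示例性假设参数），按各层输出数据数量的占比为权重分配各层抽取的特征值数量：
    import numpy as np

    def extract_features_weighted(layer_outputs, total=1024, seed=0):
        """以各层输出数据数量为权重, 分配每一层抽取的特征值数量(方式二的示意)。"""
        rng = np.random.default_rng(seed)
        sizes = {name: v.size for name, v in layer_outputs.items()}
        total_size = sum(sizes.values())
        features = {}
        for name, v in layer_outputs.items():
            # 输出数据越多的层, 分配到的抽样数量越多(至少抽取1个, 且不超过该层输出数据总数)
            n = min(max(1, int(total * sizes[name] / total_size)), sizes[name])
            features[name] = rng.choice(v.ravel(), size=n, replace=False)
        return features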
步骤303、计算各神经网络层对应的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到m个神经网络层对应的相对熵值集合。
其中,第一元素集合可以包括符合预设概率分布的多个元素,第一元素集合可以 为实时生成的也可以为预存的;示例性地,可以实时生成预设数量的服从预设概率分布的随机数,该预设数量的随机数即组成第一元素集合;示例性地,预设概率分布可以为高斯分布。
示例性地,针对m个神经网络层中任一神经网络层,可以得到该神经网络层对应的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,该相对熵值为一实数,其数值大小表示该神经网络层对应的特征值集合中各特征值所组成的分布与预设概率分布的差异性。这样,遍历m个神经网络层中所有神经网络层,计算得到各神经网络层与第一元素集合之间的相对熵值,即得到多个实数,从而得到相对熵值集合;其中,相对熵值集合中各相对熵值可以表示m个神经网络层中各神经网络层对应的特征值集合中各特征值所组成的分布与预设概率分布的差异性。同时,利用各神经网络层对应的特征值集合,得到一个相对熵值集合,实现了数据降维,进一步提高了运算效率。
步骤304、根据m个神经网络层对应的相对熵值集合,判断待监测神经网络模型是否存在运行故障。
神经网络模型正常工作时推理过程中各神经网络层的正常输出数据提取出的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,可以表示各神经网络层的正常输出数据与第一元素集合之间的差异;神经网络模型发生故障时推理过程中各神经网络层的异常输出数据提取出的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,可以表示各神经网络层的异常输出数据与第一元素集合之间的差异;由于各神经网络层的正常输出数据与各神经网络层的异常输出数据存在不同,相应的,各神经网络层的正常输出数据与第一元素集合之间的相对熵值,不同于该神经网络层的异常输出数据与第一元素集合之间的相对熵值,因此,利用相对熵值,可以区分神经网络模型正常工作时推理过程中各神经网络层的正常输出数据,与神经网络模型发生故障时推理过程中各神经网络层的异常输出数据。此外,各神经网络层的输出数据集合(例如,正常输出数据或异常输出数据)中数据量通常较大,即输出数据集合在数据空间中分布较广,利用不同的相对熵值区分不同的输出数据集合,即相对熵值与数据空间中分布较广的输出数据集合存在对应关系,从而通过不同相对熵值将数据空间中不同输出数据集合的差异性拉开,并降低不同输出数据集合的耦合程度。该步骤中,不同于直接根据神经网络模型推理过程中的各神经网络层的输出数据判断待监测神经网络模型是否存在运行故障,通过m个神经网络层对应的相对熵值集合,区分各神经网络层的正常输出数据与异常输出数据,从而更加准确地判断待监测神经网络模型是否存在运行故障。例如,若各神经网络层的正常输出数据与各神经网络层的异常输出数据的差异较小,两者的差异不易直接区分;而正常输出数据提取出的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,和异常输出数据提取出的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值不同,通过相对熵值区分正常输出数据与异常输出数据,从而准确判断待监测神经网络模型是否存在运行故障。
在一种可能的实现方式中,该步骤可以包括:将m个神经网络层对应的相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障。
示例性地,预设分类模型可以根据相对熵值集合中各相对熵值的大小,自动对相对熵值集合进行分类,准确确定相对熵值集合所属类别;其中,所属类别可以包括待监测神经网络模型正常工作及待监测神经网络模型发生故障;示例性地,将相对熵值集合输入到预设分类模型中,预设分类模型基于已知的正常输出数据提取出的特征值集合与符合预设概率分布的元素集合之间的相对熵值,以及异常输出数据提取出的特征值集合与符合预设概率分布的元素集合之间的相对熵值,对相对熵值集合进行分类,从而准确判断待监测神经网络模型是否存在运行故障。
示例性地,预设分类模型可以包括基于机器学习的第一分类器或基于深度学习的第二分类器等等;例如,第一分类器可以为KNN,第二分类器可以为MLP等等。
本申请实施例所提供的自动驾驶系统中神经网络模型故障监测方法,具有运算开销小、实时性高、准确性高、适用范围广等特点。
本申请实施例中,考虑到自动驾驶系统中神经网络模型的复杂性,所包含的神经网络层的数量通常较多,对应的输出数据的数量较大,因此,基于蒙特卡洛方法的思想,对各神经网络层的输出数据进行选择性采样,抽取输出数据集合中的部分输出数据作为特征值,所抽取的特征值分布可以作为目标输出数据集合中各神经网络层输出数据的分布的估计,即通过尽量少的特征值反映各神经网络层输出数据的分布,从而简化计算,节约了运算开销,提高运算效率;同时,通过计算各神经网络层对应的特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到一个相对熵值集合,实现了数据降维,进一步提高了运算效率;从而提高了故障监测的实时性,实现了自动驾驶系统中神经网络模型故障的实时监测。
本申请实施例中，采用相对熵值对各神经网络层的正常输出数据和异常输出数据的分布差异特性进行描述，通过m个神经网络层对应的相对熵值集合，区分各神经网络层的正常输出数据与异常输出数据，从而根据相对熵值集合，更加准确地判断待监测神经网络模型是否存在运行故障，提高了故障监测的准确性。例如，针对Alexnet，相较于SED的故障监测方式，在同样500个错误出现在Alexnet时，本申请实施例对Alexnet故障监测的准确度有较大的提升。
本申请实施例中,可以有效监测神经网络模型的各类运行故障或各类神经网络模型的运行故障,适用范围广;例如,可以监测深度神经网络模型、卷积神经网络模型等各类神经网络模型的运行故障;再例如,可以实时监测自动驾驶系统中由于车载计算平台或AI加速器等部署待监测神经网络模型的设备中硬件失效所导致的待监测神经网络模型运行故障,包括瞬时故障、永久性故障等;还可以实时监测自动驾驶系统中异常输入导致的待监测神经网络模型运行故障,从而提升车载计算平台或AI加速器等的安全性。此外,还可以确定可能发生故障的神经网络层的范围,即可以确定该m个神经网络层中一个或多个神经网络层导致待监测神经网络模型出现运行故障。
下面对上述步骤304中,根据相对熵值集合,判断待监测神经网络模型是否存在运行故障的可能实现方式进行举例说明。
方式一、以预设分类模型为基于机器学习的第一分类器为例，可以将相对熵值集合输入到第一分类器中，计算相对熵值集合与多个相对熵值样本集合之间的距离；根据相对熵值集合与多个相对熵值样本集合之间的距离，判断待监测神经网络模型是否存在运行故障。
其中,多个相对熵值样本集合可以包括待监测神经网络模型发生故障时m个神经网络层对应的相对熵值样本集合及待监测神经网络模型正常工作时m个神经网络层对应的相对熵值样本集合。
示例性地,多个相对熵值样本集合可以在预先采样得到,即每一相对熵值样本集合所属类别是已知的,其中,类别可以分为待监测神经网络模型正常工作和待监测神经网络模型发生故障。相对熵值集合与多个相对熵值样本集合之间的距离的大小可以表示相对熵值集合与多个相对熵值样本集合中各相对熵值样本集合的差异程度;例如,若相对熵值集合与某一相对熵值样本集合之间的距离越大,则表示相对熵值集合与该相对熵值样本集合的差异越大,相应的,相对熵值集合与该相对熵值样本集合属于同一类别的可能性越低。若相对熵值集合与某一相对熵值样本集合之间的距离越小,则表示相对熵值集合与该相对熵值样本集合的差异越小,相应的,相对熵值集合与该相对熵值样本集合越有可能属于同一类别。
示例性地,可以将相对熵值集合输入到第一分类器中,第一分类器计算相对熵值集合与多个相对熵值样本集合之间的距离,从而可以将不同类别的相对熵值样本集合在特征空间进行划分,则可认为相对熵值集合与所划分出的与相对熵值集合距离最近的一个或多个相对熵值样本集合更有可能同属一个类别,进而根据该一个或多个相对熵值样本集合中多数相对熵值样本集合所属类别,判断待监测神经网络模型是否存在运行故障。
作为一个示例，以第一分类器为KNN分类器为例，将相对熵值集合输入到KNN分类器中，KNN分类器可以自动计算相对熵值集合与多个相对熵值样本集合中各相对熵值样本集合的距离，并选取与相对熵值集合距离最近的K个相对熵值样本集合，按照多数投票的方式将K个相对熵值样本集合中多数相对熵值样本集合所属类别作为该相对熵值集合的类别；若该相对熵值集合的类别为待监测神经网络模型发生故障，则可判断待监测神经网络模型存在运行故障；若该相对熵值集合的类别为待监测神经网络模型正常工作，则可判断待监测神经网络模型不存在运行故障。这样，利用基于机器学习的第一分类器，无需预先训练，可以根据相对熵值集合与多个相对熵值样本集合之间的距离，更加方便快捷地对相对熵值集合进行自动分类，从而实时判断待监测神经网络模型是否存在运行故障。
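下面给出按方式一利用KNN思想对相对熵值集合进行分类的一个示意性Python片段（其中以欧氏距离作为距离度量、以0/1分别表示正常与故障两类样本，均为示例性约定，并非对本申请方案的限定）：
    import numpy as np

    def knn_fault_decision(kl_vector, sample_kl_vectors, sample_labels, k=5):
        """kl_vector: 当前得到的相对熵值集合; sample_kl_vectors: 多个相对熵值样本集合;
        sample_labels: 0表示正常工作时采集的样本, 1表示发生故障时采集的样本。"""
        d = np.linalg.norm(sample_kl_vectors - kl_vector, axis=1)   # 与各相对熵值样本集合的距离
        nearest = np.argsort(d)[:k]                                 # 距离最近的K个样本集合
        faulty_votes = int(np.sum(sample_labels[nearest]))
        return faulty_votes > k // 2                                # True表示判定存在运行故障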
方式二、以预设分类模型为基于深度学习的第二分类器为例,可以将相对熵值集合输入到第二分类器中,判断待监测神经网络模型是否存在运行故障;其中,第二分类器由多个相对熵值样本集合训练得到。
示例性地,可以根据多个相对熵值样本集合及已知的各相对熵值样本集合所属类别预先对第二分类器进行训练,经过训练,第二分类器可以准确区分不同类别的相对熵值集合。进而在进行故障监测时,可以将相对熵值集合输入到训练后的第二分类器中,第二分类器可以自动判别相对熵值集合所属类别,从而准确判断待监测神经网络模型是否存在运行故障;这样,通过采用基于深度学习的第二分类器,在实时判别相对熵值集合所属类别的同时,有效提高了相对熵值集合分类的准确性,从而更加准确地判断待监测神经网络模型是否存在运行故障。
作为一个示例,以第二分类器为MLP为例,其中,MLP的拓扑结构可以根据相对熵值集合中相对熵值的数量及分类类别多少进行设置;例如,MLP的拓扑结构可以为(n-20-2),其中,n表示输入到MLP输入层的相对熵值集合中相对熵值的数量;20表示MLP隐藏层的数量,2表示MLP输出层所输出的两个类别,即待监测神经网络模型出现故障及待监测神经网络模型正常工作。在训练阶段,利用多个相对熵值样本集合作为训练样本对MLP进行训练,其中,待监测神经网络模型发生故障时m个神经网络层对应的相对熵值样本集合可以作为负样本,待监测神经网络模型正常工作时m个神经网络层对应的相对熵值样本集合可以作为正样本;将训练样本及对应的类别标签输入到MLP中,训练MLP中的权重参数,例如,可以将一个训练样本输入到MLP中,MLP输出该训练样本的类别,根据该类别与该训练样本的类别标签,确定损失函数值,进行根据损失函数值进行反向传播,调整MLP中的权重参数;利用多个训练样本,重复上述训练过程,直至达到收敛,固定收敛时MLP中的权重参数,得到经过训练的MLP。在故障监测阶段,将相对熵值集合输入到上述经过训练的MLP中,MLP可以自动输出相对熵值集合所属类别,从而实时准确地判断当前待监测神经网络模型是否出现运行故障。作为一个示例,针对用于图像识别的Alexnet,在采用训练后的MLP判断Alexnet是否存在运行故障时,相比于采用SED的方式,判断准确率提高约15%。
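下面给出第二分类器训练及故障监测阶段的一个PyTorch示意草稿（此处按常见记法将(n-20-2)理解为“输入n维、单隐藏层20个节点、输出2类”的结构，训练样本用随机数占位，实际应替换为预先采集并标注的相对熵值样本集合；各超参数均为示例性假设）：
    import torch
    from torch import nn

    n = 8                                    # 相对熵值集合中相对熵值的数量(此处取8仅作示例)
    mlp = nn.Sequential(nn.Linear(n, 20), nn.ReLU(), nn.Linear(20, 2))   # 拓扑结构(n-20-2)
    optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    kl_samples = torch.randn(256, n)         # 多个相对熵值样本集合(此处用随机数占位)
    labels = torch.randint(0, 2, (256,))     # 类别标签: 0=正常工作, 1=发生故障(示例性约定)

    for epoch in range(100):                 # 重复训练直至收敛(此处固定轮数仅作示例)
        optimizer.zero_grad()
        loss = loss_fn(mlp(kl_samples), labels)
        loss.backward()                      # 根据损失函数值进行反向传播
        optimizer.step()                     # 调整MLP中的权重参数

    # 故障监测阶段: 输入一个相对熵值集合, 输出所属类别
    with torch.no_grad():
        pred = mlp(torch.randn(1, n)).argmax(dim=1)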
需要说明的是,上述KNN及MLP仅为示例,可以根据需要采用其他分类器作为分类模型,对此不作限定。
示例性地,待监测神经网络模型发生故障时m个神经网络层对应的相对熵值样本集合,可以包括:m个神经网络层中各神经网络层对应的第一特征值样本集合与符合预设概率分布的第二元素集合之间的相对熵值;其中,第一特征值样本集合由待监测神经网络模型发生故障时,各神经网络层对应的输出数据样本集合提取得到;待监测神经网络模型正常工作时m个神经网络层对应的相对熵值样本集合,可以包括:m个神经网络层中各神经网络层对应的第二特征值样本集合与符合预设概率分布的第二元素集合之间的相对熵值;其中,第二特征值样本集合由待监测神经网络模型正常工作时,各神经网络层对应的输出数据样本集合提取得到。
示例性地,第二元素集合可以与上述第一元素集合相同;可以理解的是,可以预先确定符合预设概率分布的元素集合,即第二元素集合,并在故障监测阶段,采用该符合预设概率分布的元素集合作为第一元素集合。
可以理解的是,可以根据不同的场景,针对不同的待监测神经网络模型,预先生成相应的相对熵值样本集合。
图4示出了根据本申请一实施例的一种获取相对熵值样本集合的方法流程图,如图4所示,可以包括以下步骤:
步骤401、分别获取待监测神经网络模型发生故障时及正常工作时,待监测神经网络模型中至少一个神经网络层对应的输出数据样本集合。
作为一个示例，可以获取待监测神经网络模型正常工作时，待监测神经网络模型中m个神经网络层中各神经网络层对应的输出数据样本集合。
以神经网络模型为感知模块中用于图像识别的深度神经网络模型为例,针对车载摄像头采集的一张原始图像中,预先标注该原始图像中对象为行人,将该原始图像输 入到待监测神经网络模型中,待监测神经网络模型通过推理,判别该原始图像中所包含的对象为行人,则收集该推理过程中各神经网络层的输出数据,作为待监测神经网络模型正常工作时,m个神经网络层中各神经网络层对应的一个输出数据样本集合。相似的,可以依次采用不同的原始图像,并相应的收集每次推理过程中各神经网络层的输出数据,从而得到待监测神经网络模型正常工作时,m个神经网络层中各神经网络层对应的多个输出数据样本集合。
作为另一个示例,可以通过故障注入的方式,模拟待监测神经网络模型推理过程中发生故障,从而获取待监测神经网络模型发生故障时,待监测神经网络模型中m个神经网络层中各神经网络层对应的输出数据样本集合。
以神经网络模型为感知模块中用于图像识别的深度神经网络模型为例,针对车载摄像头采集的一张原始图像中,预先标注原始图像中对象为行人,将该原始图像输入到待监测神经网络模型中,可以注入一个故障,待监测神经网络模型通过推理,判别该原始图像中所包含对象并非行人,则收集该推理过程中各神经网络层的输出数据,从而作为待监测神经网络模型发生故障时,m个神经网络层中各神经网络层对应的一个输出数据样本集合。相似的,可以依次注入不同的故障或者采用不同的原始图像,待监测神经网络模型进行多次推理计算,并相应的收集每次推理过程中各神经网络层的输出数据,从而得到待监测神经网络模型发生故障时,m个神经网络层中各神经网络层对应的多个输出数据样本集合。
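下面给出通过单比特翻转(bitflip)向待监测神经网络模型的权重中注入故障的一个示意性Python草稿（基于PyTorch，函数名、层名与比特位置均为示例性假设）；注入故障后再执行推理并收集各神经网络层的输出数据，即可得到发生故障时对应的输出数据样本集合：
    import struct
    import torch

    def inject_bitflip(model, layer_name, index, bit=30):
        """对指定层权重张量中第index个元素做单比特翻转, 模拟瞬态硬件错误(bitflip)。"""
        with torch.no_grad():
            weight = dict(model.named_parameters())[layer_name]
            value = float(weight.view(-1)[index])
            packed = struct.unpack('<I', struct.pack('<f', value))[0]   # 按float32取出比特表示
            flipped = struct.unpack('<f', struct.pack('<I', packed ^ (1 << bit)))[0]
            weight.view(-1)[index] = flipped                            # 写回翻转后的权重值

    # 使用示意: inject_bitflip(model, 'features.0.weight', index=0), 随后执行一次推理并收集各层输出数据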
作为另一个示例,可以通过生成对抗样本的方式,获取待监测神经网络模型发生故障时,待监测神经网络模型中m个神经网络层中各神经网络层对应的输出数据样本集合。其中,对抗样本表示待监测神经网络模型无法对其进行正常推理的输入数据。
以神经网络模型为感知模块中用于图像识别的深度神经网络模型为例,针对车载摄像头采集的一帧原始图像中,预先标注原始图像中对象为行人,通过在该原始图像中添加非常少量的精心构造的噪声,从而得到对抗图像,人眼通常无法区分该对抗图像与原始图像,待监测神经网络模型可能会对该对抗图像中对象进行错误分类,例如,可能会判定该对抗图像中包含的对象非行人,从而发生错误;则收集该推理过程中各神经网络层的输出数据,作为待监测神经网络模型发生故障时,m个神经网络层中各神经网络层对应的一个输出数据样本集合。相似的,可以生成不同的对抗图像,并相应的收集每次推理过程中各神经网络层的输出数据,从而得到待监测神经网络模型发生故障时,m个神经网络层中各神经网络层对应的多个输出数据样本集合。
步骤402、在至少一个神经网络层对应的输出数据样本集合中，提取至少一个神经网络层对应的特征值样本集合。
该步骤中，提取特征值样本集合的方式可参照上述步骤302中相关表述，在此不再赘述。例如，可以通过上述公式(2)和公式(3)确定特征值样本集合中特征值样本的数量。示例性地，可以根据需求设置采样系数的取值，例如，可以设置较小的采样系数，降低特征值样本集合中特征值样本的数量，从而有效提升第二分类器的训练效率，实现在少量数据下，训练得到第二分类器，有效节约运算资源；或者，可以有效提升第一分类器对相对熵值集合进行自动分类的效率，更好地满足故障监测实时性要求。
示例性地，可以在上述所获取的待监测神经网络模型发生故障时，m个神经网络层中各神经网络层对应的输出数据样本集合中，提取各神经网络层对应的第一特征值样本集合；可以在上述所获取的待监测神经网络模型正常工作时，m个神经网络层中各神经网络层对应的输出数据样本集合中，提取各神经网络层对应的第二特征值样本集合。
示例性地,针对m个神经网络层中任一神经网络层,可以按照预设概率分布的方式在该神经网络层对应的输出数据样本集合中提取输出数据样本作为特征值样本,从而得到该神经网络层对应的特征值样本集合,提高了分类模型的鲁棒性。
步骤403、计算至少一个神经网络层对应的特征值样本集合与符合预设概率分布的第二元素集合之间的相对熵值,得到相对熵值样本集合。
示例性地,可以计算各神经网络层对应的第一特征值样本集合与第二元素集合之间的相对熵值,得到待监测神经网络模型发生故障时m个神经网络层对应的相对熵值样本集合;可以计算各神经网络层对应的第二特征值样本集合与第二元素集合之间的相对熵值,得到待监测神经网络模型正常工作时m个神经网络层对应的相对熵值样本集合。
示例性地,还可以标注相对熵值样本集合所属类别,其中,待监测神经网络模型发生故障时m个神经网络层对应的相对熵值样本集合所属类别可以标注为待监测神经网络模型发生故障,待监测神经网络模型正常工作时m个神经网络层对应的相对熵值样本集合所属类别可以标注为待监测神经网络模型正常工作。
作为一个示例，可以利用得到的相对熵值样本集合，采用基于机器学习的第一分类器，判断待监测神经网络模型是否存在运行故障；作为另一个示例，可以利用得到的相对熵值样本集合，对基于深度学习的第二分类器进行训练，从而利用少量相对熵值样本训练得到第二分类器，有效节约运算资源。
此外,本申请实施例所提供的方法具有较强的拓展性,还可以在上述实施例的基础上,结合现有技术,分析神经网络模型内部结果,做结构无视化分析(model-agnostic analysis)等;或者,可以在更多相对熵值样本集合的支持下,实现对运行故障进行更多层次的分类。
下面以待监测神经网络模型为自动驾驶系统感知模块中用于图像识别的深度神经网络模型为例,对上述图3所示的神经网络模型故障监测方法进行示例性地说明。
图5示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测方法的示意图,如图5所示,感知模块中用于图像识别的深度神经网络模型可以部署在车载计算平台或AI加速器中,在自动驾驶系统工作过程中,感知模块可以获取车载摄像头采集的每帧图像后,利用用于图像识别的深度神经网络模型进行推理,输出识别结果。针对任一帧图像,神经网络模型故障监测装置可以执行上述步骤301,从而获取用于图像识别的深度神经网络模型对该帧图像的处理过程中,该神经网络模型中m个神经网络层中各神经网络层对应的输出数据集合。
进而,神经网络模型故障监测装置可以执行上述步骤302,在m个神经网络层中各神经网络层对应的输出数据集合中,提取各神经网络层对应的特征值集合。
示例性地，针对任一神经网络层，其对应的特征值集合可以以特征值向量的形式表示；作为一个示例，在第m个神经网络层中抽取n个特征值，可以得到特征值向量A_m，如下述公式(4)所示：
A_m=[A_m(1) A_m(2) … A_m(n)]    (4)
在公式(4)中，A_m(1)、A_m(2)、…、A_m(n)分别表示提取的特征值，n表示特征值的数量，m表示神经网络层的数量。
示例性地,针对各神经网络层,可以提取相同数量的输出数据作为各神经网络层对应的特征值集合;则所得到的各神经网络层对应的特征值集合如下述公式(5)所示:
A=[A_1; A_2; …; A_m]    (5)
在公式(5)中，A_1、A_2、…、A_m分别表示m个神经网络层对应的特征值向量，依次作为矩阵A的各行；A为m行n列的特征值矩阵，该特征值矩阵中包括各神经网络层对应的特征值集合。
该特征值矩阵基于蒙特卡洛思想构建，以反映用于图像识别的深度神经网络模型的运行状态，其中，用于图像识别的深度神经网络模型在进行推理运算过程中，会产生大量中间计算数据，即各神经网络层对应的输出数据，通过对输出数据进行采样试验，即生成特征值矩阵，从而建立对各神经网络层对应的输出数据的估计量。
进一步地,神经网络模型故障监测装置可以执行上述步骤303,计算各神经网络层对应的特征值集合与符合高斯分布的第一元素集合之间的相对熵值,得到m个神经网络层对应的相对熵值集合。
示例性地,第一元素集合可以以参考矩阵的形式表示;相对熵值集合可以以相对熵值矩阵的形式表示。
作为一个示例,参考矩阵G可以如下述公式(6)所示:
G=[g_1 g_2 g_3 … g_n]    (6)
在上述公式(6)中，g_1,g_2,…,g_n分别表示一个服从标准正态分布N(0,1)的随机数；即参考矩阵G包括符合高斯分布的第一元素集合。
作为一个示例,可以根据特征值矩阵及参考矩阵,确定相对熵矩阵;示例性地,可以结合公式(4)及公式(6),得到特征值向量Am与参考矩阵G的相对熵值KLm,如下述公式(7)所述:
KL_m=∑_{i=1}^{n}A_m(i)·ln(A_m(i)/G(i))    (7)
在公式(7)中，A_m(i)表示特征值向量A_m的第i个元素，G(i)表示参考矩阵中第i个元素；ln(·)表示计算自然对数；∑_{i=1}^{n}(·)表示对n个数据求和。
参照公式(7),针对公式(5)中任一特征值向量,求取与公式(6)所述的参考矩阵的相对熵值,可得相对熵值矩阵KL:
KL=[KL_1 KL_2 … KL_m]^T    (8)
其中,相对熵值矩阵KL中每一元素均表示一个相对熵值。即相对熵值矩阵KL包括各神经网络层对应的特征值集合与第一元素集合的相对熵值。
公式(8)所示的相对熵值矩阵KL为一个1×m的矩阵,从而将公式(5)所示的m×n的特征值矩阵A降维成一个1×m的矩阵,实现了数据降维,进一步提高了运算效率。
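结合公式(4)至公式(8)，下面给出由m×n的特征值矩阵A与参考矩阵G计算相对熵值矩阵KL的一个示意性Python实现草稿（为保证对任意取值的特征值可计算对数，此处对取对数前的数值做了取绝对值并加极小量eps的处理，属于实现层面的示例性假设）：
    import numpy as np

    def kl_matrix(feature_matrix, reference=None, seed=0, eps=1e-12):
        """由m×n特征值矩阵A与参考矩阵G计算1×m的相对熵值矩阵KL(公式(4)~(8)的示意实现)。"""
        A = np.abs(np.asarray(feature_matrix, dtype=np.float64)) + eps
        n = A.shape[1]
        if reference is None:
            reference = np.random.default_rng(seed).standard_normal(n)   # 公式(6): g_1..g_n服从N(0,1)
        G = np.abs(reference) + eps
        # 公式(7): KL_m = sum_i A_m(i)*ln(A_m(i)/G(i)); 公式(8): 按层排列得到1×m的相对熵值矩阵
        return np.sum(A * np.log(A / G), axis=1)

    # 使用示例: m=5层、每层抽取n=16个特征值
    A = np.random.default_rng(1).normal(size=(5, 16))
    print(kl_matrix(A))    # 得到5个相对熵值, 即相对熵值矩阵KL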
此外,相对熵值矩阵KL描述了神经网络内m层抽取特征量与参考矩阵G的分布差异情况。本申请实施例中,不直接对神经网络内部的推理数据进行分类,而是将特征值矩阵A与高斯分布的参考矩阵G进行数据投影,公式(8)中的每一个特征向量都表示投影空间中的一个特征点,这些特征点对应的类别为待监测神经网络模型发生故障或者待监测神经网络模型正常工作两类,从而可以将神经网络模型正常工作时推理过程中的各神经网络层的正常输出数据,与神经网络模型发生故障时推理过程中的各神经网络层的异常输出数据的差异性拉开,并降低正常输出数据和异常输出数据的耦合程度。
进一步地,神经网络模型故障监测装置可以执行上述步骤304,利用分类模型,快速对相对熵值矩阵KL进行分类,从而实时且准确地判断待监测神经网络模型是否存在运行故障。神经网络模型故障监测装置还可以将监测结果反馈到感知模块、或者感知融合模块、或者系统健康管理模块等进行预警上报;例如,在分类模型判定相对熵值矩阵KL对应的类别为待监测神经网络正常工作时,可以向感知模块反馈该结果,感知模块在收到该反馈后,将当前感知结果传输到规划与决策模块;在分类模型判定相对熵值矩阵KL对应的类别为待监测神经网络出现故障时,可以向感知模块反馈该结果,感知模块在收到该反馈后,丢弃当前感知结果。
基于上述方法实施例的同一发明构思,本申请的实施例还提供了一种自动驾驶系统中神经网络模型故障监测装置,该自动驾驶系统中神经网络模型故障监测装置可以用于执行上述方法实施例所描述的技术方案。例如,可以执行上述图3、图4或图5中所示自动驾驶系统中神经网络模型故障监测方法的各步骤。
图6示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测装置的结构示意图,如图6所示,该装置可以包括:传输模块601,用于获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合,所述目标输出数据集合包括m个神经网络层中各神经网络层对应的输出数据集合,其中,所述待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数;处理模块602,用于在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合;计算所述特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到所述m个神经网络层对应的相对熵值集合;根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障。
本申请实施例中,基于蒙特卡洛方法的思想,对各神经网络层的输出数据进行选择性采样,抽取输出数据集合中的部分输出数据作为特征值,通过尽量少的特征值反映各神经网络层输出数据的分布,从而简化计算,节约了运算开销,提高运算效率;同时,通过计算各神经网络层对应的特征值集合与符合预设概率分布的第一元素集合之间的 相对熵值,得到一个相对熵值集合,实现了数据降维,进一步提高了运算效率;从而提高了故障监测的实时性,实现了自动驾驶系统中神经网络模型故障的实时监测。同时,采用相对熵值对各神经网络层的正常输出数据和异常输出数据的分布差异特性进行描述,区分各神经网络层的正常输出数据与异常输出数据,从而通过m个神经网络层对应的相对熵值集合更加准确地判断待监测神经网络模型是否存在运行故障,提高了故障监测的准确性。此外,可以有效监测神经网络模型的各类运行故障或各类神经网络模型的运行故障,适用范围广。
在一种可能的实现方式中,所述处理模块602,还用于:确定所述目标输出数据集合中,输出数据的数量最小的第一输出数据集合;根据所述第一输出数据集合中输出数据的数量,在所述各神经网络层对应的输出数据集合中,提取所述各神经网络层对应的特征值集合;其中,所提取的各神经网络层对应的特征值集合中特征值的数量均小于或等于所述第一输出数据集合中输出数据的数量。
在一种可能的实现方式中,所述处理模块602,还用于:以所述各神经网络层对应的输出数据集合中输出数据的数量为权重,提取所述各神经网络层对应的特征值集合。
在一种可能的实现方式中,所述处理模块602,还用于:将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障。
在一种可能的实现方式中,所述预设分类模型包括基于机器学习的第一分类器;所述处理模块602,还用于:将所述相对熵值集合输入到所述第一分类器中,计算所述相对熵值集合与多个相对熵值样本集合之间的距离;其中,所述多个相对熵值样本集合包括所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合及所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合;根据所述相对熵值集合与多个相对熵值样本集合之间的距离,判断所述待监测神经网络模型是否存在运行故障。
在一种可能的实现方式中,所述分类模型包括基于深度学习的第二分类器;所述处理模块602,还用于:将所述相对熵值集合输入到所述第二分类器中,判断所述待监测神经网络模型是否存在运行故障;其中,所述第二分类器由多个相对熵值样本集合训练得到。
在一种可能的实现方式中,所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第一特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第一特征值样本集合由所述待监测神经网络模型发生故障时,所述各神经网络层对应的输出数据样本集合提取得到;所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第二特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第二特征值样本集合由所述待监测神经网络模型正常工作时,所述各神经网络层对应的输出数据样本集合提取得到。
上述图6所示的自动驾驶系统中神经网络模型故障监测装置及其各种可能的实现方式的技术效果及具体描述可参见上述自动驾驶系统中神经网络模型故障监测方法,此处不再赘述。
应理解以上装置中各模块的划分仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。此外,装置中的模块可以以处理器调用软件的形式实现;例如装置包括处理器,处理器与存储器连接,存储器中存储有指令,处理器调用存储器中存储的指令,以实现以上任一种方法或实现该装置各模块的功能,其中处理器例如为通用处理器,例如中央处理单元(Central Processing Unit,CPU)或微处理器,存储器为装置内的存储器或装置外的存储器。或者,装置中的模块可以以硬件电路的形式实现,可以通过对硬件电路的设计实现部分或全部模块的功能,该硬件电路可以理解为一个或多个处理器;例如,在一种实现中,该硬件电路为专用集成电路(application-specific integrated circuit,ASIC),通过对电路内元件逻辑关系的设计,实现以上部分或全部模块的功能;再如,在另一种实现中,该硬件电路为可以通过可编程逻辑器件(programmable logic device,PLD)实现,以现场可编程门阵列(Field Programmable Gate Array,FPGA)为例,其可以包括大量逻辑门电路,通过配置文件来配置逻辑门电路之间的连接关系,从而实现以上部分或全部模块的功能。以上装置的所有模块可以全部通过处理器调用软件的形式实现,或全部通过硬件电路的形式实现,或部分通过处理器调用软件的形式实现,剩余部分通过硬件电路的形式实现。
在本申请实施例中,处理器是一种具有信号的处理能力的电路,在一种实现中,处理器可以是具有指令读取与运行能力的电路,例如CPU、微处理器、图形处理器(graphics processing unit,GPU)(可以理解为一种微处理器)、或数字信号处理器(digital signal processor,DSP)等;在另一种实现中,处理器可以通过硬件电路的逻辑关系实现一定功能,该硬件电路的逻辑关系是固定的或可以重构的,例如处理器为ASIC或PLD实现的硬件电路,例如FPGA。在可重构的硬件电路中,处理器加载配置文档,实现硬件电路配置的过程,可以理解为处理器加载指令,以实现以上部分或全部模块的功能的过程。
可见,以上装置中的各模块可以是被配置成实施以上实施例方法的一个或多个处理器(或处理电路),例如:CPU、GPU、微处理器、DSP、ASIC、FPGA,或这些处理器形式中至少两种的组合。
此外,以上装置中的各模块可以全部或部分可以集成在一起,或者可以独立实现。在一种实现中,这些模块集成在一起,以SOC的形式实现。该SOC中可以包括至少一个处理器,用于实现以上任一种方法或实现该装置各模块的功能,该至少一个处理器的种类可以不同,例如包括CPU和FPGA,CPU和人工智能处理器,CPU和GPU等。
本申请的实施例还提供了一种自动驾驶系统中神经网络模型故障监测装置,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述实施例的方法。示例性地,可以执行上述图3、图4或图5中所示自动驾驶系统中神经网络模型故障监测方法的各步骤。
图7示出根据本申请一实施例的一种自动驾驶系统中神经网络模型故障监测装置的结构示意图,如图7所示,该自动驾驶系统中神经网络模型故障监测装置可以包括:至少一个处理器701,通信线路702,存储器703以及至少一个通信接口704。
处理器701可以是一个通用中央处理器,微处理器,特定应用集成电路,或一个或多个用于控制本申请方案程序执行的集成电路;处理器701也可以包括多个通用处理器的异构运算架构,例如,可以是CPU、GPU、微处理器、DSP、ASIC、FPGA中至少两种的组合;作为一个示例,处理器701可以是CPU+GPU或者CPU+ASIC或者CPU+FPGA。
通信线路702可包括一通路,在上述组件之间传送信息。
通信接口704,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如以太网,RAN,无线局域网(wireless local area networks,WLAN)等。
存储器703可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信线路702与处理器相连接。存储器也可以和处理器集成在一起。本申请实施例提供的存储器通常可以具有非易失性。其中,存储器703用于存储执行本申请方案的计算机执行指令,并由处理器701来控制执行。处理器701用于执行存储器703中存储的计算机执行指令,从而实现本申请上述实施例中提供的方法;示例性地,可以实现上述图3、图4或图5中所示自动驾驶系统中神经网络模型故障监测方法的各步骤。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
示例性地，处理器701可以包括一个或多个CPU，例如，图7中的CPU0；处理器701也可以包括一个CPU，及GPU、ASIC、FPGA中任一个，例如，图7中的CPU0+GPU0或者CPU0+ASIC0或者CPU0+FPGA0。
示例性地,自动驾驶系统中神经网络模型故障监测装置可以包括多个处理器,例如图7中的处理器701和处理器707。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器,或者是包括多个通用处理器的异构运算架构。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,自动驾驶系统中神经网络模型故障监测装置还可以包括输出设备705和输入设备706。输出设备705和处理器701通信,可以以多种方式来显示信息。例如,输出设备705可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等,例如,可以为车载HUD、AR-HUD、显示器等显示设备。输入设备706和处理器701通信,可以以多种方式接收用户的输入。例如,输入设备706可以是鼠标、键盘、触摸屏设备或传感设备等。
作为一个示例，结合图7所示的自动驾驶系统中神经网络模型故障监测装置，上述图6中的传输模块601可以由图7中的通信接口704来实现；上述图6中的处理模块602可以由图7中的处理器701来实现。
本申请的实施例提供了一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述实施例中的方法。示例性地,可以实现上述图3、图4或图5中所示自动驾驶系统中神经网络模型故障监测方法的各步骤。
本申请的实施例提供了一种计算机程序产品,例如,可以包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质;当所述计算机程序产品在计算机上运行时,使得所述计算机执行上述实施例中的方法。示例性地,可以执行上述图3、图4或图5中所示自动驾驶系统中神经网络模型故障监测方法的各步骤。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、静态随机存取存储器(SRAM)、便携式压缩盘只读存储器(CD-ROM)、数字多功能盘(DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身,诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如,通过光纤电缆的光脉冲)、或者通过电线传输的电信号。
这里所描述的计算机可读程序指令可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发该计算机可读程序指令,以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本申请操作的计算机程序指令可以是汇编指令、指令集架构(ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络—包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、现场可编程门阵列(FPGA)或可编程逻辑阵列(PLA),该电子电路可以执行计算机可读程序指令,从而实现本申请的各个方面。
这里参照根据本申请实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本申请的各个方面。应当理解,流程图和/或框图的每个方框以及流程图 和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本申请的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。
也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
尽管在此结合各实施例对本发明进行了描述,然而,在实施所要求保护的本发明过程中,本领域技术人员通过查看所述附图、公开内容、以及所附权利要求书,可理解并实现所述公开实施例的其它变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一”或“一个”不排除多个的情况。单个处理器或其它单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。

Claims (17)

  1. 一种自动驾驶系统中神经网络模型故障监测方法,其特征在于,所述方法包括:
    获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合,所述目标输出数据集合包括m个神经网络层中各神经网络层对应的输出数据集合,其中,所述待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数;
    在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合;
    计算所述特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到所述m个神经网络层对应的相对熵值集合;
    根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障。
  2. 根据权利要求1所述的方法,其特征在于,所述在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合,包括:
    确定所述目标输出数据集合中,输出数据的数量最小的第一输出数据集合;
    根据所述第一输出数据集合中输出数据的数量,在所述各神经网络层对应的输出数据集合中,提取所述各神经网络层对应的特征值集合;其中,所提取的各神经网络层对应的特征值集合中特征值的数量均小于或等于所述第一输出数据集合中输出数据的数量。
  3. 根据权利要求1所述的方法,其特征在于,所述在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合,包括:
    以所述各神经网络层对应的输出数据集合中输出数据的数量为权重,提取所述各神经网络层对应的特征值集合。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障,包括:
    将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障。
  5. 根据权利要求4所述的方法,其特征在于,所述预设分类模型包括基于机器学习的第一分类器;
    所述将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障,包括:
    将所述相对熵值集合输入到所述第一分类器中,计算所述相对熵值集合与多个相对熵值样本集合之间的距离;其中,所述多个相对熵值样本集合包括所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合及所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合;
    根据所述相对熵值集合与多个相对熵值样本集合之间的距离,判断所述待监测神经网络模型是否存在运行故障。
  6. 根据权利要求4所述的方法,其特征在于,所述分类模型包括基于深度学习的第二分类器;
    所述将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障,包括:
    将所述相对熵值集合输入到所述第二分类器中,判断所述待监测神经网络模型是否存在运行故障;其中,所述第二分类器由多个相对熵值样本集合训练得到。
  7. 根据权利要求5所述的方法,其特征在于,所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第一特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第一特征值样本集合由所述待监测神经网络模型发生故障时,所述各神经网络层对应的输出数据样本集合提取得到;所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第二特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第二特征值样本集合由所述待监测神经网络模型正常工作时,所述各神经网络层对应的输出数据样本集合提取得到。
  8. 一种自动驾驶系统中神经网络模型故障监测装置,其特征在于,所述装置包括:
    传输模块,用于获取自动驾驶系统中的待监测神经网络模型的目标输出数据集合,所述目标输出数据集合包括m个神经网络层中各神经网络层对应的输出数据集合,其中,所述待监测神经网络模型包括M个神经网络层,M为大于1的整数,m为大于1且不大于M的整数;
    处理模块,用于在所述目标输出数据集合中,提取所述各神经网络层对应的特征值集合;计算所述特征值集合与符合预设概率分布的第一元素集合之间的相对熵值,得到所述m个神经网络层对应的相对熵值集合;根据所述相对熵值集合,判断所述待监测神经网络模型是否存在运行故障。
  9. 根据权利要求8所述的装置,其特征在于,所述处理模块,还用于:确定所述目标输出数据集合中,输出数据的数量最小的第一输出数据集合;根据所述第一输出数据集合中输出数据的数量,在所述各神经网络层对应的输出数据集合中,提取所述各神经网络层对应的特征值集合;其中,所提取的各神经网络层对应的特征值集合中特征值的数量均小于或等于所述第一输出数据集合中输出数据的数量。
  10. 根据权利要求8所述的装置,其特征在于,所述处理模块,还用于:以所述各神经网络层对应的输出数据集合中输出数据的数量为权重,提取所述各神经网络层对应的特征值集合。
  11. 根据权利要求8-10中任一项所述的装置,其特征在于,所述处理模块,还用于:将所述相对熵值集合输入到预设分类模型中,判断所述待监测神经网络模型是否存在运行故障。
  12. 根据权利要求11所述的装置,其特征在于,所述预设分类模型包括基于机器学习的第一分类器;
    所述处理模块,还用于:将所述相对熵值集合输入到所述第一分类器中,计算所述相对熵值集合与多个相对熵值样本集合之间的距离;其中,所述多个相对熵值样本集合包括所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合及所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合;根据所述相对熵值集合与多个相对熵值样本集合之间的距离,判断所述待监测神经网络模型是否存在运行故障。
  13. 根据权利要求11所述的装置,其特征在于,所述分类模型包括基于深度学习的第二分类器;
    所述处理模块,还用于:将所述相对熵值集合输入到所述第二分类器中,判断所述待监测神经网络模型是否存在运行故障;其中,所述第二分类器由多个相对熵值样本集合训练得到。
  14. 根据权利要求12所述的装置,其特征在于,所述待监测神经网络模型发生故障时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第一特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第一特征值样本集合由所述待监测神经网络模型发生故障时,所述各神经网络层对应的输出数据样本集合提取得到;所述待监测神经网络模型正常工作时所述m个神经网络层对应的相对熵值样本集合,包括:所述m个神经网络层中各神经网络层对应的第二特征值样本集合与符合所述预设概率分布的第二元素集合之间的相对熵值;其中,所述第二特征值样本集合由所述待监测神经网络模型正常工作时,所述各神经网络层对应的输出数据样本集合提取得到。
  15. 一种自动驾驶系统中神经网络模型故障监测装置,其特征在于,包括:
    处理器;
    用于存储处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令时实现权利要求1-7中任意一项所述的方法。
  16. 一种计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现权利要求1-7中任意一项所述的方法。
  17. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行权利要求1-7中任意一项所述的方法。
PCT/CN2022/083858 2022-03-29 2022-03-29 一种自动驾驶系统中神经网络模型故障监测方法及装置 WO2023184188A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280006258.5A CN117242455A (zh) 2022-03-29 2022-03-29 一种自动驾驶系统中神经网络模型故障监测方法及装置
PCT/CN2022/083858 WO2023184188A1 (zh) 2022-03-29 2022-03-29 一种自动驾驶系统中神经网络模型故障监测方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/083858 WO2023184188A1 (zh) 2022-03-29 2022-03-29 一种自动驾驶系统中神经网络模型故障监测方法及装置

Publications (1)

Publication Number Publication Date
WO2023184188A1 true WO2023184188A1 (zh) 2023-10-05

Family

ID=88198458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083858 WO2023184188A1 (zh) 2022-03-29 2022-03-29 一种自动驾驶系统中神经网络模型故障监测方法及装置

Country Status (2)

Country Link
CN (1) CN117242455A (zh)
WO (1) WO2023184188A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111007719A (zh) * 2019-11-12 2020-04-14 杭州电子科技大学 基于领域自适应神经网络的自动驾驶转向角度预测方法
CN111563578A (zh) * 2020-04-28 2020-08-21 河海大学常州校区 基于TensorFlow的卷积神经网络故障注入系统
CN113835408A (zh) * 2020-06-24 2021-12-24 英特尔公司 用于自主驾驶交通工具的稳健多模态传感器融合
WO2022001805A1 (zh) * 2020-06-30 2022-01-06 华为技术有限公司 一种神经网络蒸馏方法及装置
US20220067531A1 (en) * 2020-08-26 2022-03-03 Nvidia Corporation Efficient identification of critical faults in neuromorphic hardware of a neural network

Also Published As

Publication number Publication date
CN117242455A (zh) 2023-12-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934043

Country of ref document: EP

Kind code of ref document: A1