WO2023015919A1 - AI computing verification method and apparatus - Google Patents


Info

Publication number
WO2023015919A1
WO2023015919A1 · PCT/CN2022/084753 · CN2022084753W
Authority
WO
WIPO (PCT)
Prior art keywords
calculation
processing
verification
computing unit
check
Prior art date
Application number
PCT/CN2022/084753
Other languages
French (fr)
Chinese (zh)
Inventor
王矿磊
陈艺帆
陈清龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2023015919A1 publication Critical patent/WO2023015919A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/329Power saving characterised by the action undertaken by task scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • The present application relates to the field of artificial intelligence, and more specifically, to a method and apparatus for verifying AI calculations.
  • Data processing in transportation, medical care, security and other fields can be completed through AI neural networks.
  • High-level intelligent driving vehicles are often equipped with multiple cameras, lidars, ultrasonic radars and other sensors to achieve comprehensive perception of the surrounding environment, which produces a large amount of information to be processed.
  • the real-time requirement of neural network inference calculation is very high.
  • Generally, an artificial intelligence (AI) chip is used as a hardware acceleration unit to perform the inference calculation of the neural network; an AI chip performs neural network inference faster and with lower energy consumption than traditional chips.
  • AI chips include graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
  • Intelligent driving vehicles run in the external environment and may encounter various problems such as severe weather and electromagnetic interference, which requires the computing platform of the intelligent driving system to have extremely high reliability.
  • Traditional devices such as CPUs and memory have error detection and fault tolerance mechanisms, such as program flow monitoring, data flow monitoring, memory error checking and correction (ECC), and parity checking, to ensure that data in the CPU and memory are not affected by soft faults.
  • The computing platform needs to meet automotive safety integrity level (ASIL) requirements, so an error detection method is needed that can detect AI chip faults in real time and ensure that the application of AI chips meets the demands of the application scenario.
  • This application provides a verification method and device for AI calculation.
  • The verification method can be executed by a computing unit other than the one processing the AI calculation, so it does not affect the AI calculation itself.
  • The verification method of the embodiments of the present application involves a very small amount of calculation and places low requirements on the performance of the computing unit used for verification, which reduces hardware cost accordingly and helps guarantee the reliability of the AI chip.
  • In a first aspect, a verification method for AI calculation is provided. The method is executed by a first computing unit and includes: obtaining the parameters of the AI model whose AI calculation is processed by a second computing unit, the AI model including one or more first processing layers; performing the following verification process for each of the one or more first processing layers to obtain the check mark bit of each first processing layer: obtaining the input data of the first processing layer from the second computing unit; performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check mark bit of the first processing layer, where the calculation amount of the verification processing for the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer; and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct, the verification result including the check mark bit of each of the one or more first processing layers.
  • The verification method of the embodiments of the present application is performed by a computing unit other than the one that executes the AI calculation.
  • It therefore does not interfere with the normal progress of the AI model's inference calculation, does not affect the acceleration performance of the AI computing unit, and avoids having the same AI computing unit perform verification while performing AI calculation, ensuring the correctness of the AI model's output while guaranteeing the real-time performance of its inference. Moreover, since the calculation amount of the verification processing is less than that of the AI calculation, the performance requirements of the computing unit used for verification need not be higher than those of the computing unit used for AI calculation.
  • the heterogeneous verification method provided in the embodiment of the present application can save power consumption and reduce costs.
  • the AI model further includes one or more second processing layers
  • In a possible implementation, the method further includes: performing a redundancy check on each of the one or more second processing layers to obtain the check mark bit of each of the one or more second processing layers; the verification result further includes the check mark bit of each of the one or more second processing layers.
  • The second processing layers include pooling layers, activation layers, and the like. Since pooling and activation calculations occupy only a small part of the resources, even a redundancy check will not consume excessive resources.
  • the parameters of the AI model include a weight matrix
  • the input data of the first processing layer includes a feature map matrix
  • In a possible implementation, performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check mark bit of the first processing layer includes: obtaining a first check mark bit by performing a first check calculation on the weight matrix; obtaining a second check mark bit by performing a second check calculation on the feature map matrix; obtaining a pre-calculation check mark bit from the first check mark bit and the second check mark bit; obtaining the output matrix from the second computing unit, where the output matrix is calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; performing a third check calculation on the output matrix to obtain a post-calculation check mark bit; and obtaining the check mark bit from the pre-calculation check mark bit and the post-calculation check mark bit.
  • The check calculation on the above weight matrix can be performed offline and the result stored in memory. Moreover, the matrix calculations used to obtain the different mark bits are different.
  • the verification method of AI calculation in the embodiment of the present application designs different verification methods for different processing layers in the AI model, which saves computing resources to the greatest extent, so that the verification can be performed in computing units with low computing power, reducing the cost of verification.
  • In a possible implementation, the check mark bit indicates whether the pre-calculation check mark bit is consistent with the post-calculation check mark bit, and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct includes: if at least one check mark bit in the verification result indicates that the pre-calculation check mark bit and the post-calculation check mark bit are inconsistent, the output result is incorrect.
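  • The checksum procedure above can be sketched in a few lines. This is an illustrative ABFT-style reading of the scheme, not the patent's own code: the "first check calculation" is taken as column sums of the weight matrix, the "second" as row sums of the feature map, and the "third" as the total sum of the output; all function names are ours.

```python
def matmul(a, b):
    """Plain matrix product C = A @ B: the work done by the verified AI computing unit."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def check_before(weight, fmap):
    # First check calculation: column sums of the weight matrix, O(m*k).
    w_sum = [sum(col) for col in zip(*weight)]
    # Second check calculation: row sums of the feature map matrix, O(k*n).
    x_sum = [sum(row) for row in fmap]
    # Pre-calculation check mark bit: their inner product, O(k).
    return sum(w * x for w, x in zip(w_sum, x_sum))

def check_after(output):
    # Third check calculation: sum of every entry of the output matrix, O(m*n).
    return sum(sum(row) for row in output)

def verify_layer(weight, fmap, output):
    """Check mark bit: True iff the pre- and post-calculation checks agree."""
    return check_before(weight, fmap) == check_after(output)
```

The identity behind it is 1ᵀ(WX)1 = (1ᵀW)(X1), so any single corrupted output entry changes the post-calculation check but not the pre-calculation one; with floating-point arithmetic the equality test would need a tolerance.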
  • the first processing layer is a convolutional layer or a fully connected layer.
  • the plurality of first processing layers may include a plurality of convolutional layers, or, a plurality of fully connected layers, or, one or more convolutional layers and one or more fully connected layers.
  • the state of the second computing unit includes transient failure and permanent failure.
  • In a possible implementation, when the output result is judged to be incorrect, the method further includes: determining whether the state of the second computing unit is a transient failure or a permanent failure by running a self-test library.
  • the method further includes: when the status of the second computing unit is permanent failure, reporting the failure status of the second computing unit.
  • After determining that the hardware has failed, the verification method of the embodiments of the present application can further judge the specific failure state. If the computing unit has failed only transiently, it can continue to be used, which avoids wasting resources and improves the availability of the AI chip.
  • In a second aspect, a verification method for AI calculation is provided. The method is executed by the first computing unit and includes: obtaining a verification result for the output result of the AI model whose AI calculation is processed by the second computing unit, the verification result being that the output result is judged incorrect; and running a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
  • In a possible implementation, running the self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure includes: when running the self-test library finds no fault, the state of the second computing unit is a transient failure; when running the self-test library finds a fault, the state of the second computing unit is a permanent failure.
  • In a possible implementation, the method further includes: when the state of the second computing unit is a transient failure, discarding the output result; when the state of the second computing unit is a permanent failure, reporting the failure state of the second computing unit.
  • The verification method of the embodiments of this application uses the CPU for system scheduling and calls the self-test library to check the AI core with the hardware fault, judging whether the AI core has failed permanently or transiently. If the self-test finds no fault, the AI core has failed only transiently, which does not affect subsequent calculation, and the AI core can continue to participate in system operation. If the self-test finds a fault, the AI core has failed permanently, cannot continue to participate in calculation, and a fault report is required. This avoids directly deactivating a failed AI core, reduces the waste of resources, and improves the availability of AI chips.
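  • The decision flow above reduces to a short sketch; `run_stl` here is a placeholder for the vendor's self-test library, not a real API:

```python
def classify_failure(run_stl):
    """After a wrong output is detected, classify the AI core's failure state."""
    if run_stl():            # self-test finds a fault -> the hardware itself is broken
        return "permanent"
    return "transient"       # no fault found -> e.g. a particle strike corrupted one run

def handle_wrong_output(run_stl, report_fault, discard_output):
    """Transient: drop the bad result and keep using the core; permanent: report it."""
    state = classify_failure(run_stl)
    if state == "permanent":
        report_fault()
    else:
        discard_output()
    return state
```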
  • A verification apparatus for AI calculation is provided, including: a transceiver unit configured to obtain the parameters of the AI model whose AI calculation is processed by a second computing unit, the AI model including one or more first processing layers, where the following verification processing is performed for each of the one or more first processing layers to obtain the check mark bit of each first processing layer: the transceiver unit is further configured to obtain the input data of the first processing layer from the second computing unit; a processing unit configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check mark bit of the first processing layer, where the calculation amount of the verification processing on the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer; the processing unit is further configured to determine, based on the verification result, whether the output result of the AI calculation of the second computing unit is correct, the verification result including the check mark bit of each of the one or more first processing layers.
  • the AI model further includes one or more second processing layers
  • the processing unit is further configured to: perform a redundancy check on each of the one or more second processing layers to obtain the check mark bit of each of the one or more second processing layers; the verification result further includes the check mark bit of each of the one or more second processing layers.
  • the parameters of the AI model include a weight matrix
  • the input data of the first processing layer includes a feature map matrix
  • the processing unit is specifically configured to: obtain a first check mark bit by performing a first check calculation on the weight matrix; obtain a second check mark bit by performing a second check calculation on the feature map matrix; obtain a pre-calculation check mark bit from the first check mark bit and the second check mark bit; obtain the output matrix from the second computing unit, where the output matrix is calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; perform a third check calculation on the output matrix to obtain a post-calculation check mark bit; and obtain the check mark bit from the pre-calculation check mark bit and the post-calculation check mark bit.
  • the check mark bit indicates whether the pre-calculation check mark bit is consistent with the post-calculation check mark bit, and the processing unit is specifically configured to: determine that the output result is incorrect if at least one check mark bit in the verification result indicates that the pre-calculation check mark bit and the post-calculation check mark bit are inconsistent.
  • the first processing layer is a convolutional layer or a fully connected layer.
  • the state of the second computing unit includes transient failure and permanent failure.
  • the processing unit is further configured to: determine whether the state of the second computing unit is a transient failure or a permanent failure by running a self-test library.
  • the transceiver unit is further configured to: report the failure status of the second computing unit when the status of the second computing unit is permanent failure.
  • A verification apparatus for AI calculation is provided, including: a transceiver unit configured to obtain a verification result for the output result of the AI model whose AI calculation is processed by the second computing unit, the verification result being that the output result is judged incorrect; and a processing unit configured to run a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
  • In a possible implementation, when the result of running the self-test library is no fault, the state of the second computing unit is a transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is a permanent failure.
  • the apparatus is further configured to: when the state of the second computing unit is a transient failure, discard the output result; when the state of the second computing unit is a permanent failure, report the failure state of the second computing unit.
  • A chip is provided, including a first computing unit, and the first computing unit is configured to execute the method in any one of the possible implementations of the first aspect and the second aspect above.
  • the chip further includes a second calculation unit, and the second calculation unit is configured to perform AI calculation.
  • A computer-readable medium is provided, which stores program code; when the program code runs on a computer, the computer executes the method in any one of the possible implementations of the first aspect and the second aspect above.
  • A computing device is provided, including a first computing unit and a second computing unit; the second computing unit is used to process AI calculation based on an AI model, and the first computing unit performs the method in any one of the possible implementations of the first aspect and the second aspect above.
  • the processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit.
  • the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
  • Fig. 1 is a schematic diagram of a verification method that periodically runs a self-test library according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a system architecture of a possible application of the verification method of AI calculation according to the embodiment of the present application;
  • Fig. 3 is a schematic diagram of hardware units that may be involved when the verification method of AI calculation according to the embodiment of the present application is applied to the intelligent driving computing platform;
  • FIG. 4 is a system architecture diagram of the application of the verification method of AI calculation in the embodiment of the present application.
  • FIG. 5 is a schematic flow chart of a verification method for AI calculation according to an embodiment of the present application.
  • Fig. 6 is a schematic diagram of calculating the off-line check mark bit of the embodiment of the present application.
  • FIG. 7 is a schematic diagram of a check mark bit of a calculation feature map according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a check mark before calculation in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of processing data by the AI model of the embodiment of the present application.
  • Fig. 10 is a schematic diagram of the calculated check mark bit in the embodiment of the present application.
  • FIG. 11 is a schematic diagram of calculating a checkmark bit in an embodiment of the present application.
  • Fig. 12 is a schematic diagram of further detecting the specific failure status of the second computing unit after detecting the failure of the second computing unit according to the embodiment of the present application;
  • Fig. 13 is a schematic diagram of judging the specific failure state of the second computing unit according to the embodiment of the present application.
  • FIG. 14 is a schematic flowchart of another AI calculation verification method according to the embodiment of the present application.
  • Fig. 15 is a schematic block diagram of the verification device for AI calculation provided by the embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an AI calculation verification device according to an embodiment of the present application.
  • Neural network inference belongs to AI calculation, and it is necessary to ensure the correctness of its results. For example, in an intelligent driving scenario, a wrong decision made on the basis of a wrong calculation result output by the neural network would bring great danger to driving. Therefore, to ensure the correctness of neural network inference results, several verification methods have been proposed in the industry, including redundancy checking and periodic self-test library (STL) checking.
  • Redundancy checks include dual modular redundancy (DMR), triple modular redundancy (TMR), and the like, which use two or three identical computing chips or computing units to perform the same calculation at the same time and then compare the results. If the results are consistent, the calculation is considered correct; if they are inconsistent, it is wrong. Redundancy checking can in theory verify the output of the neural network, but it doubles or triples the cost.
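  • DMR and TMR as described above reduce to a comparison and a majority vote. The sketch below is illustrative (names are ours) and assumes the compared results are hashable values:

```python
from collections import Counter

def dmr(result_a, result_b):
    """Dual modular redundancy: two identical units, same input.
    A mismatch detects an error but cannot say which unit failed."""
    return result_a if result_a == result_b else None  # None -> mismatch detected

def tmr(results):
    """Triple modular redundancy: three identical units; a majority vote
    masks a single faulty unit."""
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= 2 else None               # None -> no majority
```

Either way the verified computation is executed two or three times in full, which is exactly the doubled or tripled cost noted above.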
  • Figure 1(a) is a schematic diagram of an AI core in an AI chip performing four AI calculations sequentially in chronological order. Each AI calculation runs a neural network inference and obtains an output result. To ensure the accuracy of the output results, the calculation results need to be checked.
  • Figure 1(b) shows the STL detection method applied to the four AI calculations, where the STL detection is periodic: an STL test is performed after each AI calculation, and the next AI calculation proceeds only when the STL test result is normal.
  • Figure 1(c) shows the situation where STL detection finds a hardware failure: in the third detection cycle, when STL detection finds that the AI chip has failed permanently, the AI chip no longer performs AI calculations and a fault is reported.
  • STL detection is performed periodically, and the detection cycle is fixed, but the AI calculation time is uncertain.
  • For example, if the preset detection cycle is 10 milliseconds but a given AI calculation takes more than 10 milliseconds, the AI calculation may be interrupted to run the STL test, delaying decision-making; since AI calculation must stop while the STL test runs, real-time decision-making cannot be guaranteed. Conversely, if the AI calculation takes less than 10 milliseconds, it completes within one detection cycle.
  • Moreover, the AI chip failure was detected only in the third detection cycle; it may have occurred during the first AI calculation of that cycle, but STL detection did not catch it in time, so erroneous calculation results and the wrong decisions based on them may threaten the safety of the equipment.
  • STL detection mainly identifies permanent failures and is weak at detecting transient failures: if a transient failure occurs, STL detection will also judge it to be a permanent failure and report a fault.
  • AI chips contain a large number of multiply-accumulate units, so the chip area is relatively large and the probability of being struck by high-energy particles is, in theory, also high. The probability of a transient failure of an AI chip is therefore higher than that of a permanent failure; when STL detection judges a transient failure to be a permanent failure and reports a fault, the system disables the reported AI chip, wasting hardware resources and reducing system availability.
  • In view of this, the embodiments of the present application provide a verification method for AI calculation that can be executed by a computing unit other than the one processing the AI calculation, without affecting that processing; the verification and the neural network's inference calculation proceed in parallel, ensuring that the calculation results are both correct and timely; and the performance requirements on the computing unit used for verification are low, which reduces hardware cost accordingly and helps guarantee the reliability of the AI chip.
  • The verification method of AI calculation in the embodiments of the present application can be applied to any neural network inference scenario, including scenarios with high security requirements. For example, in 5G smart industry, massive industrial data is analyzed, processed and optimized through AI neural networks, which requires more secure and reliable neural network inference; the verification method of the embodiments of this application can therefore be applied to 5G smart industry, for example in control equipment and servers.
  • Another scenario is smart devices equipped with low-cost graphics processing units. Such graphics processing units have low process requirements, so their failure rate is high, and they usually have no error detection capability; at the same time, the devices that carry them must process large amounts of image data, and the processing is mainly based on matrix calculations. As a low-cost verification method, the verification method of the embodiments of this application can be applied to smart devices equipped with such graphics processing units, reducing the impact of hardware failures on data processing and improving user experience.
  • FIG. 2 shows hardware units that may be involved in the application of the verification method for AI calculation according to the embodiment of the present application, including AI chips, CPU chips, memory chips, buses, and GPUs.
  • Neural network inference is performed by the AI chip, which can include one or more AI cores; AI calculation can be performed by each AI core in the AI chip. The AI core is the smallest computing unit, so it can also be called an AI computing unit. The on-chip storage unit is used to cache the parameters, intermediate results and inference results of the AI model. Each AI core is connected to the on-chip storage unit through a bus.
  • the CPU and the GPU may also respectively include one or more computing units.
  • The verification method of the present application verifies each calculation of the neural network; each verification can be performed by an AI core of the AI chip, or by a computing unit in the CPU chip or the GPU. The verification result is stored in the memory chip; finally, the CPU chip reads the verification result from the memory chip and makes a logical judgment to determine whether the calculation result of the neural network is correct.
  • the AI chip, CPU chip, memory chip, and GPU are connected through a bus.
  • Fig. 3 shows the system architecture of an application of the verification method of AI calculation according to the embodiment of the present application.
  • the data acquired by sensors such as ultrasonic radar is calculated by the intelligent driving computing platform to generate a series of execution instructions and send them to specific actuators for execution.
  • For example, the brake actuator controls vehicle deceleration. If the calculation result of the intelligent driving computing platform is wrong and a wrong execution command is generated, executing that command will cause driving danger. It is therefore particularly necessary to verify the neural network inference in the intelligent driving computing platform.
• The AI calculation verification method can be applied to the intelligent driving computing platform in FIG. 3 to verify the inference calculation of the neural network therein, ensuring the correctness of the calculation results and the safety of intelligent driving.
  • the intelligent driving computing platform in FIG. 3 includes the hardware units shown in FIG. 2 .
• FIG. 4 shows a system architecture diagram of an application of the AI calculation verification method in the embodiment of the present application. The AI calculation verification method in the embodiment of the present application is a heterogeneous parallel verification method, in contrast to traditional redundancy check calculation.
• Redundant verification performs the same calculation as the verified process and compares the calculation results to determine whether the verified process is correct. For example, in addition to the AI computing unit being verified, another AI computing unit used for verification performs the same convolution, pooling, activation, and fully connected calculations on the input data to output a calculation result. By comparing the calculation results of the two AI computing units, it is determined whether the processing of the verified AI computing unit is correct.
• In contrast, the AI calculation verification method in the embodiment of the present application is a heterogeneous parallel verification method. The heterogeneous method uses calculation processing different from the verified calculation for verification: different verification methods are designed for different processing such as convolution, full connection, pooling, and normalization, so the calculation is heterogeneous. Since the calculation is heterogeneous, the computing unit used for verification can also differ from the AI computing unit to be verified; the two can be hardware with different structures and capabilities.
  • the heterogeneous parallel verification may be a calculation process different from the AI calculation to be verified.
  • the computing unit used for verification may be the same as the AI computing unit to be verified.
• Alternatively, heterogeneous parallel verification may both adopt calculation processing different from that of the AI calculation to be verified and use a verification computing unit whose structure or processing capability differs from that of the AI computing unit to be verified.
  • the heterogeneous parallel verification of AI calculations can be performed by AI computing units, or by other computing units.
• For ease of description, the computing units that perform verification of AI inference calculations are all denoted as heterogeneous computing units. The computing power of a heterogeneous computing unit can be the same as, or lower than, that of the computing unit that performs the AI inference calculation. Calculation verification consumes far fewer computing resources than the AI calculation itself, so it can be executed on ordinary computing units with low computing power, such as GPU computing units or CPUs with small computing power, thereby lowering the threshold of verification and saving cost.
• Parallelism means that each calculation in the neural network is verified during the AI calculation process to generate a check flag bit. Each calculation of the neural network here refers to the calculation performed by each processing layer of the neural network when processing data. When the neural network calculation is completed and the calculation result is output, whether the calculation result is correct can be judged according to all the check flag bits, without stopping the AI calculation. The check is therefore real-time and meets the real-time requirements for neural network inference calculation in application scenarios.
  • the method for verifying the AI calculation in the embodiment of the present application will be specifically introduced below with reference to FIG. 4 .
  • the AI computing unit normally executes the inference calculation of the neural network.
• The data to be processed is input into the AI computing unit; after preprocessing, the data is input into the AI core in the form of tensors, and the neural network model is also loaded into the AI core.
  • Neural network models have different structures, and the data to be processed is processed by multiple processing layers in the neural network model to obtain inference results and output.
• The processing layers in a neural network model can be divided by structure into convolutional layers, pooling layers, activation layers, and fully connected layers.
• A neural network model can contain multiple processing layers, for example, one or more convolutional layers, one or more pooling layers, one or more activation layers, and one or more fully connected layers. Among them, the calculations of the convolutional layer and the fully connected layer reduce to matrix multiplication. When the matrix dimension is high, the amount of calculation introduced by the matrix multiplication operation is large and consumes more computing resources.
  • Figure 4 illustrates a neural network model with a typical structure as an example.
• The neural network model includes a convolutional layer, a pooling layer, an activation layer, and a fully connected layer. It should be understood that the AI calculation verification method in the embodiment of the present application is not limited to the neural network model structure in FIG. 4; it is also applicable to neural network models of other structures, and each structure may also have multiple identical processing layers.
• The preprocessed data is calculated by the convolutional layer, pooling layer, activation layer, and fully connected layer to obtain the output calculation result; the AI core then passes the output calculation result to the CPU, and the CPU executes the corresponding decision based on the calculation result.
  • the AI calculation verification method of the embodiment of the present application can verify the inference calculation of the neural network to ensure the correctness of the output calculation results and avoid serious consequences caused by the CPU executing wrong decisions based on wrong calculation results.
• The heterogeneous computing unit synchronously verifies one or more calculations of the neural network in the AI core according to the AI calculation verification method of the embodiment of the present application. As shown in FIG. 4, the first convolution calculation of the convolutional layer is checked to obtain check flag bit 1; since the convolutional layer may perform multiple convolution calculations, the second convolution calculation of the convolutional layer is checked to obtain check flag bit 2, the third convolution calculation of the convolutional layer is checked to obtain check flag bit 3, and so on.
• When the neural network completes its calculation, the synchronous verification of the one or more calculations in the neural network is also completed, and one or more check flag bits are generated and stored in the memory.
• The output result is verified through the one or more check flag bits; according to them, it can be determined whether the calculation result output by the neural network is correct.
• The AI calculation verification method in the embodiment of the present application designs different verification methods for different structures inside the neural network. As shown in FIG. 4, for processing layers such as the convolutional layer and the fully connected layer, which consume more computing resources due to matrix multiplication operations, a matrix-calculation verification method is used, such as dimension-reduced matrix calculations for verification, including multiplication operations between vectors and matrices, or between vectors. The amount of calculation of such dimension-reduced matrix calculations is low compared with the matrix multiplication operations in redundancy checking, where redundancy checking refers to performing the same calculation as the corresponding processing layer. Since more than 99% of the calculation in a neural network comes from matrix multiplication operations in structures such as convolutional layers and fully connected layers, and the remaining less than 1% comes from structures such as pooling layers and activation layers, checking the matrix multiplication operations of the convolutional and fully connected layers, which consume most of the computing resources, in a low-computation way can greatly reduce the amount of calculation required to verify the neural network inference calculation.
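To make the resource argument concrete, the sketch below compares rough multiply-accumulate (MAC) counts for a redundant re-execution of a matrix multiply against a checksum-style (dimension-reduced) check. The dimensions and the exact checksum cost model are made-up example values, not figures from the present application.

```python
# Redundant check of C = A @ B (A: m x k, B: k x n) repeats the full matrix
# multiply: roughly m*k*n multiply-accumulate operations.
# A checksum-style check only needs vector-matrix products:
#   ones(1,m) @ A   -> about m*k MACs
#   (ones @ A) @ B  -> about k*n MACs
#   ones(1,m) @ C   -> about m*n MACs (plus a final scalar comparison)
m, k, n = 512, 512, 512

redundant_macs = m * k * n
checksum_macs = m * k + k * n + m * n

print(f"redundant check : {redundant_macs:,} MACs")
print(f"checksum check  : {checksum_macs:,} MACs")
print(f"ratio           : {redundant_macs / checksum_macs:.0f}x")
```

Even for these modest dimensions, the checksum check is two orders of magnitude cheaper, which is why it can run on a low-power computing unit.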
• For structures with a small amount of calculation, such as the pooling layer and the activation layer, redundancy checks can still be used to simplify the check processing, or checks with a small amount of calculation can be used to further reduce the calculation amount of verifying the neural network inference calculation.
• After the neural network outputs the calculation result, the CPU reads the one or more check flag bits from the memory and judges whether the calculation result output by the neural network is correct according to them. If the calculation result is determined to be correct, the corresponding decision is executed; if the calculation result is determined to be incorrect, a fault is reported.
  • FIG. 5 shows a schematic flow chart of the verification method for AI calculation according to the embodiment of the present application. As shown in FIG. 5 , it includes steps 501 to 504, which will be introduced respectively below.
  • the second calculation unit is an AI core that executes neural network inference calculations.
  • the verification method of AI calculation shown in FIG. 5 is executed by the first calculation unit.
  • the first calculation unit may not be the same calculation unit as the second calculation unit.
• The first computing unit can be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip; the other computing unit may or may not be designed for AI acceleration.
  • the first computing unit may also be a computing unit on another AI chip.
  • the first computing unit may also be a CPU core in a CPU chip.
  • the CPU core is the smallest computing unit in a CPU chip.
  • the first calculation unit and the second calculation unit may also be the same calculation unit, that is, the calculation unit not only performs AI calculation, but also executes the verification method of AI calculation in the embodiment of the present application at the same time.
• The required AI model is loaded into the second computing unit. The AI model may include one or more processing layers for processing the data input to the AI model, among which the first processing layer consumes relatively large computing resources. The AI model may include one or more first processing layers.
  • the AI model may include a convolutional layer and a fully connected layer, and the AI model may also include multiple convolutional layers and multiple fully connected layers.
  • the first processing layer may also be a pooling layer or an activation layer.
  • the first calculation unit may acquire parameters of the AI model from the second calculation unit, and the parameters of the AI model include weights, biases, etc., for example, may be a weight matrix.
  • the parameters of the AI model can be saved in the system memory after offline training.
  • the first calculation unit may also acquire parameters of the AI model from the system memory.
  • Steps S502 to S503 are performed on each first processing layer in the one or more first processing layers:
• The first computing unit obtains the input data of the first processing layer from the second computing unit. Still taking the convolutional neural network model with a typical structure in S501 processing image data as an example: when the first processing layer is a convolutional layer, the input data of the first processing layer is the preprocessed image data; when the first processing layer is a fully connected layer, the input data of the first processing layer is the output data of the activation layer.
• In the first processing layer, the data processing is converted into matrix calculation, so the parameters of the AI model obtained above include the weight matrix of the AI model, and the obtained input data of the first processing layer includes a feature map matrix.
• The weight matrix includes multiple row vectors or multiple column vectors, the feature map matrix includes multiple row vectors or multiple column vectors, and each vector includes multiple elements.
  • the dimension of the matrix can be the number of rows or columns of the matrix. The higher the dimension, the higher the complexity of matrix calculation and the higher the consumption of computing resources.
• The AI calculation verification method in the embodiment of the present application uses dimension-reduced matrix calculation to verify the calculation of the first processing layer and obtain the check flag bit of the first processing layer; that is, the verification uses one or more matrices with fewer rows or columns than the weight matrix or the feature map matrix. The specific process is as follows:
• The first computing unit performs a first check calculation on the weight matrix to obtain a first check flag bit. Since the parameters of the AI model are stored in the system memory after offline training, the check calculation on the AI model parameters that participate in data processing can be performed offline; that is, the first check flag bit can be an offline check flag bit.
• The first check calculation may use a matrix that is dimension-reduced relative to the feature map matrix, that is, a matrix whose number of rows or columns is smaller than that of the feature map matrix, and perform matrix multiplication with the weight matrix to obtain the first check flag bit. As shown in FIG. 6, a matrix multiplication operation is performed on an all-1 row vector and the weight matrix of the AI model, so as to obtain the offline check flag bit (offline checkbit, OC) for the weight matrix.
• The calculated offline check flag bit OC can be stored in the memory and read for use in the subsequent check process.
• It should be noted that the matrix multiplication operation in the embodiment of the present application should satisfy that the number of columns of the left matrix equals the number of rows of the right matrix; that is, the number of columns of the all-1 vector in FIG. 6 should equal the number of rows of the weight matrix, and the other matrix multiplication operations in the embodiments of the present application should also satisfy this rule.
• The all-1 row vector has only one row, which is smaller than the number of rows of the feature map matrix, so the matrix operation of the all-1 row vector with the weight matrix is a dimension-reduced matrix calculation compared with the matrix calculation of the weight matrix and the feature map matrix. It should be noted that this is only an example and is not limited thereto.
• Still taking the convolutional neural network model with a typical structure in S501 processing image data as an example, as shown in FIG. 7, after each frame of image is input into the second computing unit, it is converted into a feature map matrix. The first computing unit performs a second check calculation on the feature map matrix to obtain a second check flag bit, which may also be called the feature map check flag bit (checkbit feature map, CF).
• The second check calculation may use a matrix that is dimension-reduced relative to the weight matrix, that is, a matrix whose number of rows or columns is smaller than that of the weight matrix, and perform matrix calculation with the feature map matrix to obtain the second check flag bit.
• A matrix multiplication operation is performed on the feature map matrix and an all-1 column vector to obtain the feature map check flag bit.
• The all-1 column vector has only one column, which is smaller than the number of columns of the weight matrix, so the matrix operation of the all-1 column vector with the feature map matrix is a dimension-reduced matrix calculation relative to the matrix calculation of the weight matrix and the feature map matrix. It should be noted that this is only an example and is not limited thereto.
• The first computing unit obtains the pre-calculation check flag bit according to the first check flag bit (offline check flag bit OC) and the second check flag bit (feature map check flag bit CF). As shown in FIG. 8, a vector multiplication operation is performed on the offline check flag bit OC and the feature map check flag bit CF to obtain the pre-calculation check flag bit (check bit in, CB_in).
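As a concrete illustration of the checksum steps described above, the sketch below computes OC, CF, and CB_in for a toy layer, assuming the layer's processing is expressed as the matrix product Output = W × F (im2col-style). The matrices, dimensions, and the `matmul` helper are invented for the example; they are not the exact implementation of the present application.

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W = [[1, 2, 3],
     [4, 5, 6]]                 # weight matrix (2 x 3), assumed from training
F = [[1, 0],
     [2, 1],
     [0, 3]]                    # feature map matrix (3 x 2), assumed input

ones_row = [[1] * len(W)]       # all-1 row vector (1 x 2)
ones_col = [[1] for _ in F[0]]  # all-1 column vector (2 x 1)

OC = matmul(ones_row, W)        # offline check flag bit: column sums of W
CF = matmul(F, ones_col)        # feature map check flag bit: row sums of F
CB_in = matmul(OC, CF)[0][0]    # pre-calculation check flag bit (a scalar)

print("OC    =", OC)
print("CF    =", CF)
print("CB_in =", CB_in)
```

Note that CB_in equals the sum of all entries of W × F by associativity, which is what makes the later comparison with the post-calculation check flag bit meaningful.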
  • FIG. 9 shows that the second calculation unit performs convolution operation on the weight matrix and the feature map matrix to obtain the output matrix.
  • This calculation process is the normal data processing process of the first processing layer in the AI model, not the verification process.
• The convolution operation is performed by a convolution operator, and a convolutional layer can include many convolution operators. The convolution operator is also called a convolution kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
• The convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix usually slides over the input image one pixel at a time (or two pixels at a time, depending on the value of the stride) along the horizontal direction, so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image.
• The depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends across the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases, instead of a single weight matrix, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The output of each weight matrix is stacked to form the depth dimension of the convolution image, where the dimension can be understood as determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features of the image.
• For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image.
• The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps they extract also have the same size; the extracted feature maps of the same size are then combined to form the output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
• The first computing unit obtains the output matrix from the second computing unit, and then performs a third check calculation (also a matrix multiplication calculation) on the output matrix to obtain the post-calculation check flag bit.
• A matrix multiplication operation is performed using the all-1 matrix and the output matrix, where the all-1 matrix should satisfy that its number of columns equals the number of rows of the output matrix before the matrix multiplication can be performed, thereby obtaining the post-calculation check flag bit (check bit out, CB_out).
• The first computing unit obtains the check flag bit (check bit, CB) according to the pre-calculation check flag bit CB_in and the post-calculation check flag bit CB_out. The check flag bit can be obtained by subtracting CB_out from CB_in, or by dividing CB_in by CB_out; the purpose is to compare the difference between CB_in and CB_out.
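The end-to-end check can be sketched as follows, again assuming the layer's processing is the matrix product Output = W × F and using subtraction to form CB. The matrices, the fault-injection step, and the helper function are invented for illustration; the sketch shows only that CB is zero for a correct computation and non-zero when the output is corrupted.

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W = [[1, 2], [3, 4]]                  # weight matrix (illustrative)
F = [[5, 6], [7, 8]]                  # feature map matrix (illustrative)
ones_row = [[1, 1]]
ones_col = [[1], [1]]

# Pre-calculation check flag bit from the checksums of W and F.
CB_in = matmul(matmul(ones_row, W), matmul(F, ones_col))[0][0]

# Normal processing by the verified computing unit.
output = matmul(W, F)

# Post-calculation check flag bit from the actual output matrix.
CB_out = matmul(matmul(ones_row, output), ones_col)[0][0]
print("fault-free CB =", CB_in - CB_out)      # zero: calculation correct

output[0][1] += 1                             # inject a bit-flip-style fault
CB_out_bad = matmul(matmul(ones_row, output), ones_col)[0][0]
print("faulty CB     =", CB_in - CB_out_bad)  # non-zero: fault detected
```

With floating-point arithmetic a tolerance comparison (|CB| below a small threshold) would be used instead of an exact zero test.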
• Corresponding to FIG. 4, the matrix calculation verification of FIG. 6 to FIG. 11 is adopted for the convolutional layer and the fully connected layer, and the obtained check flag bit CB corresponds to check flag bit 1 and check flag bit 4c in FIG. 4.
• The check flag bit indicates whether the pre-calculation check flag bit is consistent with the post-calculation check flag bit. If at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, the output calculation result is incorrect.
• Taking the check flag bit CB obtained by subtracting CB_out from CB_in as an example: since CB_in is obtained from the offline check flag bit OC and the feature map check flag bit CF, it represents the theoretical matrix-calculation check result of the output matrix, while CB_out is the actual matrix-calculation check result of the output matrix. If CB is 0, then CB_in and CB_out are the same, indicating that the theoretical check result of the output matrix matches the actual check result.
• Through the above process, one or more check flag bits of the first processing layer can be obtained, denoted as the check result. If at least one check flag bit in the check result indicates that the pre-calculation check flag bit is inconsistent with the post-calculation check flag bit, for example, at least one of the check flag bits of the first processing layer is non-zero, it means that the first processing layer corresponding to that check flag bit made an error in the calculation process based on the weight matrix and the feature map matrix, and the calculation result output by the AI model is incorrect. If all check flag bits in the check result indicate that the pre-calculation check flag bit is consistent with the post-calculation check flag bit, for example, the check flag bits of the one or more first processing layers are all 0, it means that the one or more first processing layers made no error in the calculation process based on the weight matrix and the feature map matrix, and the final calculation result output by the AI model is correct.
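The judgment rule above can be sketched in a few lines, assuming the check flag bits are collected as numbers where zero means "pre- and post-calculation checksums consistent" (the function name and list layout are invented for illustration):

```python
def output_is_correct(check_flag_bits):
    """Accept the AI model's output only if every check flag bit is zero."""
    # Any non-zero flag means a pre-/post-calculation mismatch in some layer.
    return all(cb == 0 for cb in check_flag_bits)

print(output_is_correct([0, 0, 0, 0]))   # all layers checked out
print(output_is_correct([0, 0, -3, 0]))  # one layer mismatched -> fault
```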
• The check flag bits of the above one or more first processing layers are stored in the memory. After the AI model outputs the calculation result, the CPU reads the check flag bits of the one or more first processing layers from the memory, and then judges whether the calculation result output by the AI model is correct according to those check flag bits. If the calculation result output by the AI model is judged to be correct, the corresponding decision is executed; for example, the intelligent driving computing platform in FIG. 3 generates a series of execution instructions according to the calculation result, and the actuators control the steering, acceleration, deceleration, etc. of the vehicle according to the execution instructions. If the calculation result output by the AI model is judged to be incorrect, a failure of the second computing unit is reported.
• The dimension-reduced matrix calculation verification shown in FIG. 6 to FIG. 11 (hereinafter referred to as matrix calculation verification) can be used to verify the calculations of the convolutional layer and the fully connected layer, because convolution calculations and fully connected calculations are converted into matrix calculations, for which the dimension-reduced matrix calculation method is applicable for verification.
• For calculations that are not converted into matrix calculations, such as those of the pooling layer and the activation layer (denoted as second processing layers), the AI calculation verification method in the embodiment of the present application further performs a redundancy check on each of the one or more second processing layers to obtain the check flag bit of that second processing layer, so that the above verification result also includes the check flag bits of the one or more second processing layers. When judging the calculation result output by the AI model, the check flag bits of the one or more first processing layers and of the one or more second processing layers must be considered together for a comprehensive judgment.
• The redundancy check performs the same calculations of the one or more second processing layers on another chip or another computing unit using the same input data and the same AI model, and then compares whether the original calculation results of the one or more second processing layers are consistent with the redundant calculation results, so as to obtain the check flag bits of the one or more second processing layers. The check flag bit of a second processing layer can be the difference or quotient between its original calculation result and its redundant calculation result.
• If the original calculation results of the one or more second processing layers are consistent with the redundant calculation results, the second processing layers made no error when processing the data. If they are inconsistent, one or more second processing layers made an error when processing the data, which leads to an error in the final calculation result output by the AI model. Since pooling calculations and activation calculations occupy only a very small portion of resources, even using redundancy checks does not consume excessive resources. When the original calculation result is inconsistent with the redundant calculation result, it is determined that the error occurred when the second processing layer processed the data, rather than in the redundant calculation.
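A minimal sketch of such a redundancy check, using a toy one-dimensional max-pooling step as the second processing layer; the pooling function, window size, and data are invented for illustration, and the difference form of the check flag bit is used:

```python
def max_pool_1d(xs, window=2):
    """Toy pooling layer: max over non-overlapping windows."""
    return [max(xs[i:i + window]) for i in range(0, len(xs), window)]

feature = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]

original = max_pool_1d(feature)    # computed by the verified computing unit
redundant = max_pool_1d(feature)   # recomputed by the checking unit

# Check flag bits: element-wise difference (a quotient would also work).
flags = [o - r for o, r in zip(original, redundant)]
print("flags =", flags)            # all zero -> no error in this layer
```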
• If the original calculation is performed by an ordinary chip or ordinary computing unit and the redundant calculation by a high-reliability chip or high-reliability computing unit, then when the calculation results are inconsistent, the error is considered to have occurred in the original calculation: the chip or computing unit performing the redundant calculation is more reliable and less likely to err, while the one performing the original calculation is less reliable and more likely to err.
• If the output result of the AI model is incorrect, the AI chip where the second computing unit is located has failed; the failure may be transient or permanent. A transient failure affects only the current calculation, not subsequent calculations, while a permanent failure affects all calculations after it occurs. If transient and permanent failures are not distinguished, then after fault reporting, according to the system's safety mechanism, the second computing unit detected as failed is directly disabled; even a second computing unit with only a transient failure, which could continue to be used, is disabled, resulting in wasted resources and reduced system availability.
• Therefore, the AI calculation verification method in the embodiment of the present application further includes: when the output result of the AI model is judged to be incorrect according to the above method, running the self-test library to determine whether the second computing unit has a transient failure or a permanent failure.
• While the second computing unit performs AI calculation, the first computing unit simultaneously performs heterogeneous parallel verification. The self-test library is run to judge whether the second computing unit has a transient failure or a permanent failure: if the second computing unit has a transient failure, the output result of the AI model is discarded and the second computing unit continues to perform AI calculations; if the second computing unit has a permanent failure, the failure of the second computing unit is reported.
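The failure-state decision just described can be sketched as follows; `run_self_test` is a hypothetical stand-in for invoking the real self-test library, and the returned strings are illustrative labels, not the patent's interfaces:

```python
def classify_failure(verification_ok, run_self_test):
    """Decide how to handle the unit after the parallel verification result."""
    if verification_ok:
        return "no_failure"               # keep the output, keep computing
    if run_self_test():                   # self-test finds a hardware fault
        return "permanent: report failure, disable the unit"
    return "transient: discard this output, keep using the unit"

print(classify_failure(True, lambda: False))
print(classify_failure(False, lambda: False))  # transient failure path
print(classify_failure(False, lambda: True))   # permanent failure path
```

The design point is that the (relatively expensive) self-test runs only after verification has already flagged a wrong output, so it does not burden the normal inference path.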
• FIG. 13 shows a schematic diagram of judging the specific failure state of the second computing unit by the AI calculation verification method according to the embodiment of the present application.
• The CPU judges whether the output result of the AI model is wrong according to the multiple check flag bits in the memory; for the specific judgment method, refer to the above description of FIG. 5, which is not repeated here. If the output result of the AI model is judged to be wrong, the hardware has failed, and the self-test library is run to further judge whether the hardware has a permanent failure.
• The self-test library (STL) is mainly used to identify permanent failures; therefore, the STL is called to perform a self-test on the failed hardware.
• If a fault is found in the self-test, the second computing unit has a permanent failure; it cannot continue to participate in calculation, and the fault must be reported. If no fault is found in the self-test, the second computing unit has a transient failure, which does not affect subsequent calculations; it is only necessary to discard this erroneous output result, and the second computing unit can continue to perform subsequent calculations.
• In summary, the AI calculation verification method of the embodiment of the present application uses heterogeneous parallel verification to verify the calculation of the AI model. The verification is performed by computing units other than those performing the AI calculation, so it does not interfere with the normal progress of the AI model's inference calculation and does not affect the acceleration performance of the AI core; it also avoids performing verification on the same AI core that is executing the inference. This ensures the correctness of the AI model's output results while maintaining the real-time performance of the AI model's inference calculation.
• In addition, the AI calculation verification method in the embodiment of the present application designs different verification methods for different processing layers in the AI model, which saves computing resources to the greatest extent, so that verification can be performed on computing units with low computing power, reducing the cost of verification.
  • The AI-calculation verification method of the embodiments of the present application can further judge the specific failure state of the hardware after a hardware failure is determined. If the computing unit has only failed transiently, it can continue to be used, thereby avoiding a waste of resources and improving the availability of the AI chip.
  • FIG. 14 shows a schematic flowchart of another AI-calculation verification method according to an embodiment of the present application; as shown in FIG. 14, the method includes step 1401 and step 1402.
  • The method in FIG. 14 is executed by the first computing unit, which may be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip.
  • the other computing unit may or may not be designed for AI acceleration
  • the first computing unit may also be a computing unit on another AI chip
  • The first computing unit may also be a CPU core in a CPU chip, where the CPU core is the smallest computing unit in the CPU chip.
  • The first computing unit may also be the same computing unit as the second computing unit; that is, the computing unit both performs the AI calculation and executes the AI-calculation verification method of the embodiments of the present application.
  • The verification result of the output result of the AI model is obtained from the second computing unit. The verification result may be obtained by the method shown in FIG. 5 or by any existing verification method.
  • When the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
  • When the state of the second computing unit is transient failure, the output result is discarded; when the state of the second computing unit is permanent failure, the failure state of the second computing unit is reported.
  • The AI-calculation verification method of the embodiments of the present application uses the CPU to perform system scheduling and calls the STL to perform a self-test on an AI core with a hardware failure, judging whether the AI core has failed permanently or transiently. If no fault is found by the self-test, the AI core has only failed transiently, which does not affect subsequent calculation, and the AI core can continue to participate in system operation. If a fault is found by the self-test, the AI core has failed permanently, cannot continue to participate in calculation, and a fault report is required. This avoids directly deactivating a failed AI core, reduces the waste of resources, and improves the availability of the AI chip.
  • The AI-calculation verification apparatus may be the above-mentioned first computing unit, which is used to perform the verification of the AI calculation. It should be understood that the descriptions of the apparatus embodiments correspond to the descriptions of the method embodiments; therefore, for details not described in full, reference may be made to the method embodiments above. For brevity, details are not repeated here.
  • FIG. 15 is a schematic block diagram of an AI-calculation verification apparatus provided by an embodiment of the present application; the apparatus 1500 may specifically be a chip, an intelligent driving hardware platform, or the like.
  • the device 1500 includes a transceiver module 1510 and a processing module 1520 .
  • the transceiver module 1510 can implement a corresponding communication function, and the processing module 1520 is used for data processing.
  • the transceiver module 1510 may also be called a communication interface or a communication unit.
  • The device 1500 may further include a storage module, which may be used to store instructions and/or data; the processing module 1520 may read the instructions and/or data in the storage module, so that the device implements the aforementioned method embodiments.
  • The device 1500 can be used to execute the actions in the above method embodiments; specifically, the transceiver module 1510 is used to execute the sending- and receiving-related operations in the above method embodiments, and the processing module 1520 is used to execute the processing-related operations in the above method embodiments.
  • The apparatus 1500 may implement the steps or processes corresponding to the method embodiments in the embodiments of the present application, and the apparatus 1500 may include modules for executing the methods in FIG. 5 and FIG. 14. Moreover, the modules in the apparatus 1500 and the other operations and/or functions described above are respectively intended to realize the corresponding flows of the method embodiments on the second-node side in FIG. 5 and FIG. 14.
  • the transceiver module 1510 can be used to execute steps 501 and 502 in the method 500 ; the processing module 1520 can be used to execute the processing steps 503 and 504 in the method 500 .
  • The transceiver module 1510 is configured to obtain the parameters of the AI model with which the second computing unit processes the AI calculation, the AI model including one or more first processing layers. For each of the one or more first processing layers, the following verification processing is performed to obtain a check flag bit of that first processing layer: the transceiver module 1510 is further used to obtain the input data of the first processing layer from the second computing unit; the processing module 1520 is used to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, so as to obtain the check flag bit of the first processing layer, where the calculation amount of the verification processing of the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer. The processing module 1520 is further used to determine, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct, the verification result including the check flag bit of each of the one or more first processing layers.
  • The AI model may further include one or more second processing layers; in this case, the processing module 1520 is further configured to perform a redundancy check on each of the one or more second processing layers to obtain a check flag bit of each second processing layer, and the verification result further includes the check flag bit of each of the one or more second processing layers.
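As a rough illustration of such a redundancy check, the checking unit can simply recompute a cheap second processing layer and compare with the output produced by the AI core. The function names and the choice of ReLU as the example activation layer are assumptions for illustration, not taken from the source.

```python
import numpy as np

def relu(x):
    # Example activation layer: activation and pooling layers are cheap
    # enough that full recomputation on the checking unit is affordable.
    return np.maximum(x, 0.0)

def redundancy_check(layer_fn, layer_input, layer_output):
    """Recompute the layer and compare with the AI core's output.

    Returns True (check flag bit: consistent) when the two agree.
    """
    return bool(np.allclose(layer_fn(layer_input), layer_output))
```

For example, `redundancy_check(relu, x, relu(x))` yields a consistent flag, while a corrupted output yields an inconsistent one.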
  • The parameters of the AI model include a weight matrix, and the input data of the first processing layer includes a feature map matrix. The processing module 1520 is specifically configured to: acquire a first check flag bit, obtained by performing a first check calculation on the weight matrix; acquire a second check flag bit, obtained by performing a second check calculation on the feature map matrix; obtain a pre-calculation check flag bit from the first check flag bit and the second check flag bit; obtain an output matrix from the second computing unit, the output matrix being calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; perform a third check calculation on the output matrix to obtain a post-calculation check flag bit; and obtain the check flag bit of the first processing layer from the pre-calculation check flag bit and the post-calculation check flag bit.
  • The check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent. The processing module 1520 is specifically configured to: if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, determine that the output result is incorrect.
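For a fully connected layer computing Y = W·X, one low-cost way to realize such pre- and post-calculation flag bits is a checksum scheme. The source does not specify the exact first, second, and third check calculations, so the sums below (and the combination of the first and second flags via a dot product) are an illustrative assumption:

```python
import numpy as np

def precalc_flag(W, X):
    # First check calculation: column sums of the m x k weight matrix.
    w_flag = W.sum(axis=0)            # shape (k,)
    # Second check calculation: row sums of the k x n feature-map matrix.
    x_flag = X.sum(axis=1)            # shape (k,)
    # Pre-calculation check flag: scalar 1^T (W X) 1, without forming W X.
    return float(w_flag @ x_flag)

def postcalc_flag(Y):
    # Third check calculation: total sum of the output matrix Y = W X.
    return float(Y.sum())

def layer_flag_consistent(W, X, Y, rtol=1e-6, atol=1e-6):
    """Check flag bit: True when pre- and post-calculation flags agree."""
    return bool(np.isclose(precalc_flag(W, X), postcalc_flag(Y),
                           rtol=rtol, atol=atol))
```

The check costs only sum reductions and one k-length dot product, far below the O(m·k·n) cost of the multiplication itself, which is why it can run on a computing unit with low computing power; a mismatch on any layer marks the output result incorrect.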
  • the first processing layer is a convolutional layer or a fully connected layer.
  • the state of the second computing unit includes transient failure and permanent failure.
  • the processing module 1520 is further configured to: determine that the state of the second computing unit is a transient failure or a permanent failure by running a self-check library.
  • the transceiver module 1510 is further configured to: report the failure status of the second computing unit when the status of the second computing unit is permanent failure.
  • the transceiving module 1510 can be used to execute step 1401 in the method 1400 ; the processing module 1520 can be used to execute the processing step 1402 in the method 1400 .
  • The transceiver module 1510 is used to obtain the verification result of the output result of the AI model with which the second computing unit processes the AI calculation, the verification result indicating that the output result is incorrect; the processing module 1520 is used to run the self-test library to determine that the state of the second computing unit is transient failure or permanent failure.
  • When the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
  • The device 1500 is also used to: when the state of the second computing unit is transient failure, discard the output result; when the state of the second computing unit is permanent failure, report the failure state of the second computing unit.
  • the embodiment of the present application also provides a verification device 1600 for AI calculation.
  • The apparatus 1600 shown in FIG. 16 may include: a memory 1610, a processor 1620, and a communication interface 1630.
  • The memory 1610, the processor 1620, and the communication interface 1630 are connected through an internal connection path; the memory 1610 is used to store instructions, and the processor 1620 is used to execute the instructions stored in the memory 1610 to control the communication interface 1630 to receive input samples or send a prediction result.
  • the memory 1610 may be coupled to the processor 1620 through an interface, or may be integrated with the processor 1620 .
  • The above-mentioned communication interface 1630 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1600 and other devices or communication networks.
  • The above-mentioned communication interface 1630 may also include an input/output interface.
  • Each step of the above method may be implemented by an integrated logic circuit of hardware in the processor 1620 or by instructions in the form of software.
  • the methods disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1610; the processor 1620 reads the information in the memory 1610 and completes the steps of the above method in combination with its hardware. To avoid repetition, no detailed description is given here.
  • The processor may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory may include a read-only memory and a random access memory, and provide instructions and data to the processor.
  • a portion of the processor may also include non-volatile random access memory.
  • the processor may also store device type information.
  • An embodiment of the present application further provides a chip, which is characterized in that the chip includes a first computing unit, and the first computing unit is configured to execute the above-mentioned method in FIG. 5 or FIG. 14 .
  • the chip further includes a second calculation unit, and the second calculation unit is used to perform AI calculation.
  • The embodiment also provides a computer-readable medium. The computer-readable medium stores program code, and when the program code runs on a computer, the computer executes the method in FIG. 5 or FIG. 14.
  • the embodiment also provides a computing device, including a first computing unit and a second computing unit, the second computing unit is used to process AI computing based on the AI model, and the first computing unit executes the method in FIG. 5 or FIG. 14 .
  • the processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit.
  • the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device can be components.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • A component may communicate by way of local and/or remote processes, such as based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, interacting with other systems by way of the signal).
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

The present application provides an AI computing verification method and apparatus. The method is executed by a first computing unit, and comprises: acquiring parameters of an AI model of a second computing unit processing AI computing, the AI model comprising one or more first processing layers; for each of the one or more first processing layers, acquiring input data of the first processing layer from the second computing unit; performing verification processing on the first processing layer on the basis of the parameters of the AI model and the input data of the first processing layer to obtain a verification mark bit of the first processing layer, wherein the amount of computation of the verification processing on the first processing layer is less than the amount of computation of the second computing unit processing the input data by means of the first processing layer; and determining, on the basis of the verification result, whether the output result of the second computing unit processing the AI computing is correct, the verification result comprising verification mark bits of the one or more first processing layers. The method can guarantee the correctness and real-time performance of inferential computing.

Description

AI Calculation Verification Method and Apparatus
This application claims priority to Chinese Patent Application No. 202110924997.X, filed with the China Patent Office on August 12, 2021 and entitled "AI Calculation Verification Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of artificial intelligence and, more specifically, to an AI calculation verification method and apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. Artificial intelligence has a wide range of applications: data processing in transportation, medical care, security, and other fields can be completed by AI neural networks, and the more data needs to be analyzed and processed, the greater the computational load on the neural network. Taking intelligent driving as an example, higher-level intelligent driving vehicles are often equipped with multiple cameras, lidar, ultrasonic radar, and other sensors in order to achieve comprehensive perception of the surrounding environment, which produces a large amount of information to be processed. Intelligent driving vehicles also place very high real-time requirements on neural network inference calculation: if the inference calculation lags, environmental information cannot be provided in time for subsequent planning and control decisions, reducing the safety of intelligent driving. A traditional central processing unit (CPU) cannot bear the inference calculation of such a large neural network; therefore, artificial intelligence chips are used as hardware acceleration units dedicated to neural network inference calculation. AI chips are faster and more energy-efficient than traditional chips when executing neural network inference calculation. Commonly used AI chips include graphics processing units (GPUs), field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
Intelligent driving vehicles operate in the external environment and may encounter severe weather, electromagnetic interference, and other problems, which requires the computing platform of the intelligent driving system to be extremely reliable. Traditional devices such as CPUs and memory have error detection and fault tolerance mechanisms, such as program flow monitoring, data flow monitoring, memory error checking and correction (ECC), and parity checking, to ensure that data in the CPU and memory is not affected by soft failures. To achieve high-speed computation, AI chips generally have no effective internal error detection mechanism, and because the computing architecture of AI chips differs from that of traditional chips such as CPUs, the error detection mechanisms of traditional chips cannot be directly applied to AI chips.
To ensure the safety of intelligent driving, the computing platform needs to meet the requirements of the automotive safety integrity level (ASIL), and an error detection method is needed to check AI chips in real time, so as to ensure that the application of AI chips meets the demands of the application scenario.
Summary
The present application provides an AI calculation verification method and apparatus. The verification method can be executed by a computing unit other than the one processing the AI calculation and does not affect the processing of the AI calculation. Compared with redundancy checking, the AI-calculation verification method of the embodiments of the present application has a very small amount of calculation and places low performance requirements on the computing unit used for verification, which correspondingly reduces hardware cost and guarantees the reliability of the AI chip.
In a first aspect, an AI calculation verification method is provided. The method is executed by a first computing unit and includes: obtaining parameters of an AI model with which a second computing unit processes AI calculation, the AI model including one or more first processing layers; performing the following verification processing on each of the one or more first processing layers to obtain a check flag bit of each first processing layer: obtaining input data of the first processing layer from the second computing unit; performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer, where the calculation amount of the verification processing of the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer; and determining, based on a verification result, whether an output result of the AI calculation processed by the second computing unit is correct, the verification result including the check flag bit of each of the one or more first processing layers.
The AI-calculation verification method of the embodiments of the present application is executed by a computing unit other than the one executing the AI calculation. Compared with the verification method of periodically running a self-test library, the method does not interfere with the normal progress of the inference calculation of the AI model, so it does not affect the acceleration performance of the AI computing unit, and it also avoids having the same AI computing unit perform verification while executing AI calculation, thereby ensuring the correctness of the output result of the AI model while guaranteeing the real-time performance of the inference calculation. Moreover, since the calculation amount of the verification processing is less than that of the AI calculation, the performance requirements on the computing unit used for verification need not be higher than those on the computing unit used for the AI calculation. Compared with verification by a redundant computing unit, the heterogeneous verification manner provided by the embodiments of the present application can save power consumption and reduce cost.
In some possible implementations, the AI model further includes one or more second processing layers, and the method further includes: performing a redundancy check on each of the one or more second processing layers to obtain a check flag bit of each second processing layer; the verification result further includes the check flag bit of each of the one or more second processing layers.
The second processing layers include pooling layers, activation layers, and the like. Since pooling and activation calculations occupy only a very small portion of resources, even a redundancy check does not consume excessive resources.
In some possible implementations, the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer includes:
obtaining a first check flag bit, which is obtained by performing a first check calculation on the weight matrix; obtaining a second check flag bit, which is obtained by performing a second check calculation on the feature map matrix; obtaining a pre-calculation check flag bit from the first check flag bit and the second check flag bit; obtaining an output matrix from the second computing unit, the output matrix being calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; performing a third check calculation on the output matrix to obtain a post-calculation check flag bit; and obtaining the check flag bit from the pre-calculation check flag bit and the post-calculation check flag bit.
The above weight matrix and feature map matrix can be calculated offline and stored in memory, and the matrix calculations used to obtain the different flag bits differ. The AI-calculation verification method of the embodiments of the present application designs different verification manners for different processing layers in the AI model, saving computing resources to the greatest extent, so that verification can be performed on a computing unit with low computing power, reducing the cost of verification.
In some possible implementations, the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct includes: if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, the output result is incorrect.
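The aggregation rule stated above (the output result is incorrect as soon as any layer's check flag bit indicates inconsistency) can be sketched as follows; the helper name is hypothetical, not taken from the source.

```python
def output_result_correct(check_flag_bits):
    """check_flag_bits: iterable of booleans, one per checked layer,
    True meaning the pre- and post-calculation flags were consistent.
    The output result is correct only if every layer's flag is consistent."""
    return all(check_flag_bits)
```

A single inconsistent layer therefore suffices to flag the whole inference result as erroneous.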
In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer. The plurality of first processing layers may include a plurality of convolutional layers, or a plurality of fully connected layers, or one or more convolutional layers and one or more fully connected layers.
在某些可能的实现方式中,当判断输出结果不正确时,第二计算单元的状态包括瞬态失效和永久性失效。In some possible implementation manners, when the judgment output result is incorrect, the state of the second computing unit includes transient failure and permanent failure.
在某些可能的实现方式中,当判断输出结果不正确时,方法还包括:通过运行自检库确定第二计算单元的状态为瞬态失效或永久性失效。In some possible implementation manners, when it is judged that the output result is incorrect, the method further includes: determining that the state of the second computing unit is a transient failure or a permanent failure by running a self-check library.
在某些可能的实现方式中,方法还包括:当第二计算单元的状态为永久性失效时,上报第二计算单元的失效状态。In some possible implementation manners, the method further includes: when the status of the second computing unit is permanent failure, reporting the failure status of the second computing unit.
本申请实施例的AI计算的校验方法在确定了硬件失效后还可以进一步判断硬件的具体失效状态,若计算单元只是发生瞬态失效,则可以继续使用该计算单元,从而避免资源的浪费,提升了AI芯片的可用性。The verification method of AI calculation in the embodiment of the present application can further judge the specific failure state of the hardware after the hardware failure is determined. If the calculation unit only fails transiently, the calculation unit can continue to be used, thereby avoiding the waste of resources. Improved the availability of AI chips.
According to a second aspect, a verification method for AI calculation is provided. The method is executed by a first computing unit and includes: obtaining a verification result for the output result of an AI model with which a second computing unit processes AI calculation, where the verification result is a determination that the output result is incorrect; and running a self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
In some possible implementations, running the self-test library to determine whether the state of the second computing unit is transient failure or permanent failure includes: when the result of running the self-test library reports no fault, the state of the second computing unit is transient failure; when the result of running the self-test library reports a fault, the state of the second computing unit is permanent failure.
In some possible implementations, the method further includes: when the state of the second computing unit is transient failure, discarding the output result; when the state of the second computing unit is permanent failure, reporting the failure state of the second computing unit.
In the verification method for AI calculation of the embodiments of this application, the CPU performs system scheduling and calls the self-test library to run a self-test on the AI core that has suffered a hardware failure, determining whether the AI core has suffered a permanent failure or a transient failure. If the self-test finds no fault, the AI core has suffered a transient failure that does not affect subsequent calculations, and the AI core can continue to participate in system operations. If the self-test finds a fault, the AI core has suffered a permanent failure, cannot continue to participate in operations, and the fault must be reported. This avoids directly deactivating an AI core that has merely failed transiently, reduces the waste of resources, and improves the availability of the AI chip.
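The dispatch logic described above can be sketched as follows; `run_stl_selftest` and `report_failure` are hypothetical callbacks standing in for the self-test library and the fault-reporting path, not an actual API from the patent.

```python
def handle_failed_check(run_stl_selftest, report_failure):
    """Dispatch after a check flag mismatch has flagged the output
    of the second computing unit as incorrect."""
    if run_stl_selftest():
        # Self-test passes (no fault found): the failure was transient.
        # The erroneous output is discarded; the core may keep computing.
        return "transient"
    # Self-test finds a fault: the failure is permanent, so the core
    # stops participating in computation and the fault is reported.
    report_failure()
    return "permanent"
```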
According to a third aspect, a verification apparatus for AI calculation is provided. The apparatus includes: a transceiver unit, configured to obtain parameters of an AI model with which a second computing unit processes AI calculation, where the AI model includes one or more first processing layers, and the following verification processing is performed on each of the one or more first processing layers to obtain a check flag bit for each of the one or more first processing layers: the transceiver unit is further configured to obtain input data of the first processing layer from the second computing unit; and a processing unit, configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, to obtain the check flag bit of the first processing layer, where the computation amount of the verification processing for the first processing layer is less than the computation amount with which the second computing unit processes the input data through the first processing layer. The processing unit is further configured to determine, based on a verification result, whether the output result of the AI calculation processed by the second computing unit is correct, where the verification result includes the check flag bit of each of the one or more first processing layers.
In some possible implementations, the AI model further includes one or more second processing layers, and the processing unit is further configured to: perform a redundancy check on each of the one or more second processing layers to obtain a check flag bit for each of the one or more second processing layers, where the verification result further includes the check flag bit of each of the one or more second processing layers.
In some possible implementations, the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and the processing unit is specifically configured to: obtain a first check flag bit, where the first check flag bit is obtained by performing a first check calculation on the weight matrix; obtain a second check flag bit, where the second check flag bit is obtained by performing a second check calculation on the feature map matrix; obtain a pre-calculation check flag bit based on the first check flag bit and the second check flag bit; obtain an output matrix from the second computing unit, where the output matrix is obtained by the second computing unit performing calculation on the weight matrix and the feature map matrix at the first processing layer; perform a third check calculation on the output matrix to obtain a post-calculation check flag bit; and obtain the check flag bit based on the pre-calculation check flag bit and the post-calculation check flag bit.
In some possible implementations, the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and the processing unit is specifically configured to: if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, determine that the output result is incorrect.
In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer.
In some possible implementations, when the output result is determined to be incorrect, the state of the second computing unit includes transient failure and permanent failure.
In some possible implementations, when the output result is determined to be incorrect, the processing unit is further configured to: determine, by running a self-test library, whether the state of the second computing unit is transient failure or permanent failure.
In some possible implementations, the transceiver unit is further configured to: when the state of the second computing unit is permanent failure, report the failure state of the second computing unit.
According to a fourth aspect, a verification apparatus for AI calculation is provided. The apparatus includes: a transceiver unit, configured to obtain a verification result for the output result of an AI model with which a second computing unit processes AI calculation, where the verification result is a determination that the output result is incorrect; and a processing unit, configured to run a self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
In some possible implementations, when the result of running the self-test library reports no fault, the state of the second computing unit is transient failure; when the result of running the self-test library reports a fault, the state of the second computing unit is permanent failure.
In some possible implementations, the apparatus is further configured to: when the state of the second computing unit is transient failure, discard the output result; when the state of the second computing unit is permanent failure, report the failure state of the second computing unit.
According to a fifth aspect, a chip is provided, including a first computing unit, where the first computing unit is configured to execute the method in any one of the possible implementations of the first aspect and the second aspect above.
In some possible implementations, the chip further includes a second computing unit, and the second computing unit is configured to perform AI calculation.
According to a sixth aspect, a computer-readable medium is provided. The computer-readable medium stores program code, and when the computer program code is run on a computer, the computer is caused to execute the method in any one of the possible implementations of the first aspect and the second aspect above.
According to a seventh aspect, a computing device is provided, including a first computing unit and a second computing unit, where the second computing unit is configured to process AI calculation based on an AI model, and the first computing unit executes the method in any one of the possible implementations of the first aspect and the second aspect above.
In some possible implementations, the processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit.
In some possible implementations, the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
Description of drawings
FIG. 1 is a schematic diagram of a verification method that periodically runs a self-test library according to an embodiment of this application;
FIG. 2 is a schematic diagram of a system architecture of a possible application of the verification method for AI calculation according to an embodiment of this application;
FIG. 3 is a schematic diagram of hardware units that may be involved when the verification method for AI calculation according to an embodiment of this application is applied to an intelligent driving computing platform;
FIG. 4 is a system architecture diagram of an application of the verification method for AI calculation according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a verification method for AI calculation according to an embodiment of this application;
FIG. 6 is a schematic diagram of calculating an offline check flag bit according to an embodiment of this application;
FIG. 7 is a schematic diagram of calculating a feature map check flag bit according to an embodiment of this application;
FIG. 8 is a schematic diagram of calculating a pre-calculation check flag bit according to an embodiment of this application;
FIG. 9 is a schematic diagram of data processing by an AI model according to an embodiment of this application;
FIG. 10 is a schematic diagram of calculating a post-calculation check flag bit according to an embodiment of this application;
FIG. 11 is a schematic diagram of calculating a check flag bit according to an embodiment of this application;
FIG. 12 is a schematic diagram of further detecting the specific failure state of the second computing unit after a failure of the second computing unit is detected according to an embodiment of this application;
FIG. 13 is a schematic diagram of determining the specific failure state of the second computing unit according to an embodiment of this application;
FIG. 14 is a schematic flowchart of another verification method for AI calculation according to an embodiment of this application;
FIG. 15 is a schematic block diagram of a verification apparatus for AI calculation provided by an embodiment of this application;
FIG. 16 is a schematic structural diagram of a verification device for AI calculation according to an embodiment of this application.
Detailed description of embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
At present, AI chips have no dedicated verification mechanism for checking the results of neural network inference calculations, where neural network inference is a form of AI calculation; yet guaranteeing the correctness of those results is essential. For example, in an intelligent driving scenario, a wrong decision made on the basis of an erroneous result output by a neural network poses a serious driving hazard. To guarantee the correctness of neural network inference results, the industry has therefore proposed several verification methods, including redundancy checking and periodically running a self-test library (STL). Redundancy checking includes dual modular redundancy (DMR) and triple modular redundancy (TMR), among others, and refers to using two or three identical computing chips or computing units to execute the same calculation simultaneously and then comparing the results: if the results agree, the calculation is deemed correct; if they disagree, it is deemed erroneous. Redundancy checking can in theory verify the output of a neural network, but it doubles or triples the hardware cost.
The STL verification method is introduced below with reference to FIG. 1. Part (a) of FIG. 1 shows one AI core in an AI chip executing four AI calculations in chronological order; each AI calculation runs the neural network once to perform inference and produce an output result. To guarantee the accuracy of the output, the calculation results must be checked. Part (b) of FIG. 1 shows the four AI calculations being checked with the STL method, where the STL check is periodic; ideally, the STL check period equals the time of one AI calculation, that is, an STL check is performed after every AI calculation, and the next AI calculation proceeds only if the STL check result is normal. Part (c) of FIG. 1 shows the case where the STL check detects a hardware failure: as shown in part (c), in the third check period the STL check finds that the AI chip has suffered a permanent failure, so the AI chip performs no further AI calculations and the fault is reported.
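For contrast with the parallel verification scheme introduced later, the prior-art periodic STL flow of FIG. 1 can be sketched roughly as follows; the callbacks are illustrative placeholders, not a real STL API.

```python
def run_with_periodic_stl(ai_jobs, stl_check, report_failure):
    # Prior-art flow: run one STL pass after each AI calculation; on a
    # detected (permanent) failure, stop this core and report the fault.
    results = []
    for job in ai_jobs:
        results.append(job())   # one neural network inference
        if not stl_check():     # periodic self-test after the job
            report_failure()
            break               # core performs no further AI calculations
    return results
```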
As the above description shows, STL checking has several shortcomings. First, STL checks run periodically with a fixed period, but the duration of an AI calculation is uncertain. For example, with a preset check period of 10 ms, a given AI calculation may take longer than 10 ms, so running the STL check may interrupt the AI calculation and delay decision-making; moreover, AI calculation must stop while the STL check runs, so real-time decision-making cannot be guaranteed. Conversely, with a preset check period of 10 ms, an AI calculation may also take less than 10 ms: in the third check period of part (c) of FIG. 1, two AI calculations occur within one check period, and the AI chip failure detected in that period may have occurred during the first of the two AI calculations, but the STL check did not detect it in time, so a wrong decision produced by the erroneous result may threaten the safe use of the device. Furthermore, STL checking mainly identifies permanent failures and is weak at detecting transient failures; if a transient failure occurs, the STL check will also judge it to be a permanent failure and report a fault. However, an AI chip contains a large number of multiply-accumulate computing units, so it occupies a relatively large die area and in theory has a higher probability of being struck by high-energy particles; consequently, a transient failure of an AI chip is more likely than a permanent one. When the STL check judges a transient failure to be a permanent failure and reports a fault, the system deactivates the reported AI chip, wasting hardware resources and reducing system availability.
Therefore, the embodiments of this application provide a verification method for AI calculation. The verification can be executed by computing units other than the one processing the AI calculation, so it does not affect the processing of the AI calculation; the verification proceeds in step with the neural network inference, so the results of the neural network are both correct and timely; and compared with redundancy checking, the verification method of the embodiments of this application requires very little computation, places low performance demands on the computing unit used for verification, correspondingly reduces hardware cost, and safeguards the reliability of the AI chip.
The verification method for AI calculation of the embodiments of this application can be applied to any neural network inference scenario, including the following. Scenarios with high safety requirements for neural network inference: for example, in 5G smart industry, massive industrial data is analyzed, processed, and optimized through AI neural networks, which requires safer and more reliable inference, so the verification method of the embodiments of this application can be applied in 5G smart industry to devices deployed with AI chips, such as control equipment and servers, to improve the reliability of neural network inference. Scenarios requiring large-scale deployment of AI chips: for example, cloud computing servers, smart security cameras, intelligent driving vehicles, and terminal devices, which are deployed at large scale and run for long periods, are prone to hardware failures, and hardware failures harm both the accuracy of AI chip results and system availability and degrade the user experience; the verification method of the embodiments of this application can therefore be applied in cloud computing servers, smart security cameras, and intelligent driving vehicles to verify the correctness of AI calculation results in real time. Scenarios with a large volume of matrix operations: smart devices such as smartphones and smart TVs are usually equipped with graphics processing units (GPUs) that, for cost reasons, are built with less demanding processes, have a high failure rate, and usually have no error detection capability, yet these devices must process large amounts of image data dominated by matrix operations, and a failure may show up as garbled images on the smartphone or smart TV and degrade the user experience; the verification method of the embodiments of this application, as a low-cost verification method, can be applied to smart devices equipped with such graphics processing units to reduce the impact of hardware failures on data processing and improve the user experience.
FIG. 2 shows the hardware units that may be involved when the verification method for AI calculation of the embodiments of this application is applied, including an AI chip, a CPU chip, a memory chip, a bus, and a GPU. As shown in FIG. 2, neural network inference is executed by the AI chip, which may include one or more AI cores, and the AI calculation can be executed by each AI core in the AI chip. An AI core is the smallest computing unit for inference in the AI chip, so an AI core may be called an AI computing unit. An on-chip storage unit caches the parameters of the AI model, the intermediate results, and the inference results of the AI calculation, and the AI cores are connected to the on-chip storage unit through a bus. The CPU and the GPU may each also include one or more computing units.
The neural network verification method of this application verifies every calculation of the neural network, where each verification can be executed by an AI core of the AI chip or by a computing unit in the CPU chip or the GPU. The verification results are stored in the memory chip; finally, the CPU chip reads the verification results from the memory chip and makes a logical judgment to determine whether the calculation results of the neural network are correct. The AI chip, CPU chip, memory chip, and GPU are connected through a bus.
Taking an intelligent driving scenario as an example, FIG. 3 shows the system architecture of one application of the verification method for AI calculation of the embodiments of this application. As shown in FIG. 3, for an intelligent driving vehicle, the data acquired by sensors such as cameras, lidar, and ultrasonic radar is processed by the intelligent driving computing platform to generate a series of execution instructions, which are sent to specific actuators for execution; for example, the steering actuator controls vehicle steering according to the execution instructions, the throttle actuator controls vehicle acceleration, and the brake actuator controls vehicle deceleration. If the intelligent driving computing platform produces an erroneous calculation result and hence an erroneous execution instruction, execution of that instruction by an actuator would create a driving hazard, so verifying the neural network inference in the intelligent driving computing platform is particularly necessary. The verification method for AI calculation of this application can be applied to the intelligent driving computing platform in FIG. 3 to verify its neural network inference, thereby guaranteeing the correctness of the calculation results and the safety of intelligent driving. The intelligent driving computing platform in FIG. 3 includes the hardware units shown in FIG. 2.
FIG. 4 shows the system architecture to which the verification method for AI calculation of the embodiments of this application is applied. As shown in FIG. 4, the verification method in the embodiments of this application is a heterogeneous parallel verification method, in contrast to traditional redundancy checking. A redundant check executes the same computation as the computation being verified and compares the results to determine whether the verified computation is correct. For the verified AI computing unit in FIG. 4, a redundant scheme would use another AI computing unit to perform the same convolution, pooling, activation, and fully connected computations on the input data and output its own result; comparing the results of the two AI computing units determines whether the verified AI computing unit processed correctly. That approach requires the verifying AI computing unit to have processing capability equal to or higher than that of the verified AI computing unit. The verification method in the embodiments of this application is instead heterogeneous: it uses computation processing different from the computation being verified, for example, designing different verification schemes for the different internal structures of the neural network, such as convolution, fully connected, pooling, and normalization layers, so the computation is heterogeneous. Because the computation is heterogeneous, the computing unit used for verification may also differ from the verified AI computing unit; the two may be hardware of different structures and different capabilities. In the embodiments of this application, heterogeneous parallel verification may mean computation processing that differs from the verified AI calculation, in which case the computing unit used for verification may be the same as the verified AI computing unit; it may also mean that, in addition to using different computation processing, the computing unit used for verification differs from the verified AI computing unit in structure or processing capability. For example, the heterogeneous parallel verification of AI calculation may be executed by an AI computing unit or by other computing units; here, any computing unit that executes the verification of AI inference is denoted a heterogeneous computing unit. The computing power of the heterogeneous computing unit may equal that of the computing unit executing the AI inference, or it may be lower, because the verification method of the embodiments of this application consumes far fewer resources for verifying AI calculation than the AI calculation itself consumes; the verification can therefore be executed on an ordinary computing unit with lower computing power, such as a GPU computing unit or a CPU with less computing power, thereby lowering the barrier to verification and saving cost. Parallel means that every calculation of the neural network is verified during the AI calculation to generate check flag bits, where every calculation of the neural network refers to the computation performed by each data-processing layer of the neural network when it processes the data; once the neural network finishes and outputs its calculation result, the result can be judged against all the check flag bits to determine whether it is correct, without stopping the AI calculation, and the verification is real-time, satisfying the real-time requirements of neural network inference in application scenarios. The verification method for AI calculation of the embodiments of this application is described in detail below with reference to FIG. 4.
First, the AI computing unit performs the inference computation of the neural network as usual. After the data to be processed is input into the AI computing unit and preprocessed, the data is fed into the AI core in the form of tensors, and the neural network model is also loaded into the AI core. Neural network models have different structures; the data to be processed is processed by multiple processing layers of the neural network model to obtain and output an inference result. Classified by structure, the processing layers of a neural network model may include convolutional layers, pooling layers, activation layers, and fully-connected layers. A neural network model usually contains multiple processing layers, for example one or more convolutional layers, one or more pooling layers, one or more activation layers, and one or more fully-connected layers. The computation of the convolutional layers and the fully-connected layers takes the form of matrix multiplication; when the matrix dimension is high, the matrix multiplication operations introduce a large amount of computation and consume substantial computing resources.
FIG. 4 takes a neural network model with a typical structure as an example; this model includes a convolutional layer, a pooling layer, an activation layer, and a fully-connected layer. It should be understood that the AI computation verification method of the embodiments of the present application is not limited to the structure of the neural network model in FIG. 4 and is also applicable to neural network models of other structures, and each structure may include multiple identical processing layers. The preprocessed data passes through the convolutional, pooling, activation, and fully-connected layers to produce an output computation result, which the AI core passes to the CPU, and the CPU executes a corresponding decision based on that result. As can be seen from the above process, traditional neural network inference does not include a verification mechanism, so the correctness of the computation result delivered to the CPU cannot be guaranteed. The AI computation verification method of the embodiments of the present application can verify the inference computation of the neural network to ensure the correctness of the output result and prevent the serious consequences of the CPU executing a wrong decision based on a wrong result.
Unlike redundant verification, which repeats the inference computation of the neural network one or more times, and unlike verification methods that run a self-test library only after the inference computation has finished, the heterogeneous computing unit synchronously verifies one or more computations of the neural network in the AI core according to the AI computation verification method of the embodiments of the present application. As shown in FIG. 4, verifying the first convolution computation of the convolutional layer yields check flag bit 1; since the convolutional layer may perform multiple convolution computations, verifying its second convolution computation yields check flag bit 2, verifying its third yields check flag bit 3, and so on. Verifying the first computation of the pooling layer yields check flag bit 2a, where a is a positive integer, and verifying its second computation yields check flag bit 2a+1; verifying the first computation of the activation layer yields check flag bit 3b, where b is a positive integer, and verifying its second computation yields check flag bit 3b+1; verifying the first computation of the fully-connected layer yields check flag bit 3c, where c is a positive integer, and verifying its second computation yields check flag bit 3c+1; and so forth.
The above verification is real-time and does not interfere with the normal inference computation of the neural network. When the inference computation finishes and outputs a result, the synchronous verification of the one or more computations in the neural network is also complete, and the one or more generated check flag bits are stored in memory; the CPU verifies the output result against the one or more check flag bits in memory and can thereby determine whether the computation result output by the neural network is correct.
Meanwhile, the AI computation verification method of the embodiments of the present application designs different verification schemes for the different structures inside the neural network. As shown in FIG. 4, processing layers such as the convolutional and fully-connected layers, which consume substantial computing resources because of matrix multiplication, are verified with a matrix-computation verification scheme, for example reduced-dimension matrix computation, including vector-matrix multiplication or vector-vector multiplication; the amount of computation of this scheme is low compared with the matrix multiplication of the redundant verification scheme, where redundant verification means performing the same computation as the corresponding processing layer. Since more than 99% of the computation in a neural network comes from the matrix multiplications in structures such as convolutional and fully-connected layers, and the remaining less than 1% comes from structures such as pooling and activation layers, verifying the matrix multiplications that consume most of the computing resources with a relatively low-cost scheme greatly reduces the computation needed to verify the neural network inference. For the operations in structures such as pooling and activation layers, which consume a very small share of computing resources, redundant verification can still be used as a simplified verification approach; of course, a verification scheme with a small amount of computation may also be used to further reduce the cost of verifying the inference computation of the network.
According to the AI computation verification method of the embodiments of the present application, after the neural network completes the inference computation and outputs the result, the synchronous verification of the one or more computations in the neural network is also complete and one or more check flag bits have been generated. The CPU reads the one or more check flag bits from memory and determines, according to them, whether the computation result output by the neural network is correct; if the result is determined to be correct, the corresponding decision is executed, and if the result is determined to be incorrect, a fault is reported.
FIG. 5 shows a schematic flowchart of the AI computation verification method of the embodiments of the present application. As shown in FIG. 5, the method includes steps S501 to S504, which are introduced below.
S501: Acquire parameters of an AI model with which a second computing unit processes the AI computation, the AI model including one or more first processing layers.
The second computing unit is an AI core that performs the neural network inference computation, and the AI computation verification method shown in FIG. 5 is performed by a first computing unit. The first computing unit and the second computing unit may be different computing units: the first computing unit may be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip, which may or may not be designed for AI acceleration; the first computing unit may also be a computing unit on another AI chip, or a CPU core in a CPU chip, where a CPU core is the smallest computing unit in a CPU chip. In a possible implementation, the first computing unit and the second computing unit may also be the same computing unit, that is, the computing unit both performs the AI computation and executes the AI computation verification method of the embodiments of the present application.
Because the second computing unit performs the inference computation of the neural network, the required AI model is loaded into the second computing unit. The AI model may include one or more processing layers for processing the data input into it, among which a first processing layer is a layer that consumes relatively large computing resources; for a convolutional neural network model with a typical structure, it may be a convolutional layer or a fully-connected layer. The AI model may include one or more first processing layers; for example, it may include a convolutional layer and a fully-connected layer, or multiple convolutional layers and multiple fully-connected layers. The above are examples and are not limiting. A first processing layer may of course also be a pooling layer or an activation layer.
The first computing unit may acquire the parameters of the AI model from the second computing unit; the parameters of the AI model include weights, biases, and the like, and may, for example, be a weight matrix. The parameters of the AI model may be trained offline and then stored in the system memory, so the first computing unit may also acquire the parameters of the AI model from the system memory.
Steps S502 to S503 are performed for each first processing layer of the one or more first processing layers:
S502: Acquire input data of the first processing layer from the second computing unit.
The first computing unit acquires the input data of the first processing layer from the second computing unit. Still taking the convolutional neural network model with a typical structure in S501 processing image data as an example: when the first processing layer is a convolutional layer, the input data of the first processing layer is the preprocessed image data; when the first processing layer is a fully-connected layer, the input data of the first processing layer is the output data of the activation layer.
S503: Perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain a check flag bit of the first processing layer, where the amount of computation of the verification processing of the first processing layer is smaller than the amount of computation with which the second computing unit processes the input data through the first processing layer.
Specifically, taking the convolutional layer and the fully-connected layer as examples of the first processing layer of the AI model, their processing of data is converted into matrix computation. The parameters of the AI model acquired above therefore include a weight matrix of the AI model, and the acquired input data of the first processing layer includes a feature map matrix. The weight matrix includes multiple row vectors or multiple column vectors, the feature map matrix includes multiple row vectors or multiple column vectors, and each vector includes multiple elements. The dimension of a matrix may be its number of rows or columns; the higher the dimension, the higher the complexity of the matrix computation and the more computing resources it consumes. The AI computation verification method of the embodiments of the present application verifies the computation of the first processing layer using reduced-dimension matrix computation to obtain the check flag bit of the first processing layer, for example by computing with one or more matrices that have fewer rows or columns than the weight matrix or the feature map matrix. One possible implementation is as follows.
First, the first computing unit performs a first verification computation on the weight matrix to obtain a first check flag bit. Because the parameters of the AI model are trained offline and then stored in the system memory, the verification computation on the parameters of the AI model that participate in data processing can be performed offline; that is, the first check flag bit may be an offline check flag bit. The first verification computation may matrix-multiply the weight matrix by a matrix of reduced dimension relative to the feature map matrix, i.e. a matrix with fewer rows or columns than the feature map matrix, to obtain the first check flag bit. As shown in FIG. 6, an all-ones row vector is matrix-multiplied with the weight matrix of the AI model, yielding an offline checkpoint (OC) for the weight matrix. The computed offline check flag bit OC can be stored in memory and read in the subsequent verification process. It should be understood that the matrix multiplications in the embodiments of the present application must satisfy the rule that the number of columns of the left matrix equals the number of rows of the right matrix; that is, the number of columns of the all-ones vector in FIG. 6 must equal the number of rows of the weight matrix, and the other matrix multiplications in the embodiments of the present application must also satisfy this rule. The all-ones row vector here has only one row, which, in terms of the row dimension, is fewer than the number of rows of the feature map matrix; the matrix operation of the all-ones row vector with the weight matrix is therefore a reduced-dimension matrix computation relative to the multiplication of the weight matrix with the feature map matrix. It should be noted that this is only an example and is not limiting.
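The offline-checkpoint step above can be sketched in a few lines of NumPy (the 4×3 weight matrix and its values are hypothetical, introduced only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))      # hypothetical weight matrix, 4 rows x 3 columns

# Offline checkpoint OC: an all-ones ROW vector times the weight matrix.
# The ones vector has as many columns as W has rows (1x4 @ 4x3 -> 1x3),
# so the product is well-formed, as the rule above requires.
ones_row = np.ones((1, W.shape[0]))
OC = ones_row @ W                    # equals the column sums of W

assert OC.shape == (1, 3)
assert np.allclose(OC, W.sum(axis=0))
```

Since W is fixed after offline training, OC can be computed once and cached in memory, exactly as the text describes.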
Taking the convolutional neural network model with a typical structure in S501 processing image data as an example, as shown in FIG. 7, after each frame of image is input into the second computing unit it is converted into a feature map matrix (feature matrix). The first computing unit performs a second verification computation on the feature map matrix to obtain a second check flag bit, which may also be called a checkpoint feature map matrix (CF). The second verification computation may matrix-multiply the feature map matrix by a matrix of reduced dimension relative to the weight matrix, i.e. a matrix with fewer rows or columns than the weight matrix, to obtain the second check flag bit. As shown in FIG. 7, the feature map matrix is matrix-multiplied with an all-ones column vector, yielding the feature map check flag bit. The all-ones column vector here has only one column, which, in terms of the column dimension, is fewer than the number of columns of the weight matrix; the matrix operation of the all-ones column vector with the feature map matrix is therefore a reduced-dimension matrix computation relative to the multiplication of the weight matrix with the feature map matrix. It should be noted that this is only an example and is not limiting.
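The feature-map checkpoint can be sketched the same way (the 3×5 feature map matrix is a made-up example, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 5))      # hypothetical feature map matrix, 3 rows x 5 columns

# Checkpoint feature map CF: the feature map matrix times an all-ones
# COLUMN vector (3x5 @ 5x1 -> 3x1), i.e. the row sums of F.
ones_col = np.ones((F.shape[1], 1))
CF = F @ ones_col

assert CF.shape == (3, 1)
assert np.allclose(CF.ravel(), F.sum(axis=1))
```

Unlike OC, this checkpoint must be recomputed per input frame, since F changes with every frame.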
The first computing unit obtains a pre-computation check flag bit from the first check flag bit (the offline check flag bit OC) and the second check flag bit (the feature map check flag bit CF). As shown in FIG. 8, a vector multiplication of the offline check flag bit OC and the feature map check flag bit CF yields the pre-computation check flag bit (check bit in, CB_in).
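One reading of the vector multiplication in FIG. 8 is an inner product that collapses OC and CF into a single scalar (the checkpoint values below are made up for illustration):

```python
import numpy as np

OC = np.array([[2.0, -1.0, 3.0]])     # hypothetical offline checkpoint, 1x3
CF = np.array([[1.0], [4.0], [0.5]])  # hypothetical feature map checkpoint, 3x1

# Pre-computation check bit CB_in: vector product of OC and CF
# (1x3 @ 3x1 -> a single scalar).
CB_in = float(OC @ CF)

assert CB_in == 2.0 * 1.0 + (-1.0) * 4.0 + 3.0 * 0.5
```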
FIG. 9 shows the second computing unit performing a convolution operation on the weight matrix and the feature map matrix to obtain an output matrix; this computation is the normal processing of data by the first processing layer of the AI model, not the verification process.
The convolution operation in a convolutional layer is briefly introduced below. The convolution operation is performed by convolution operators; a convolutional layer may include many convolution operators, also called convolution kernels. In image processing, a convolution kernel acts as a filter that extracts specific information from the input image matrix. The convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is typically applied across the input image along the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends through the entire depth of the input image. Convolving with a single weight matrix therefore produces a convolved output of a single depth dimension, but in most cases multiple weight matrices of the same size (rows × columns), i.e. multiple matrices of the same shape, are applied instead of a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix extracts image edge information, another extracts a specific color of the image, and yet another blurs unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolved feature maps extracted by them also have the same size, and the extracted feature maps of the same size are then combined to form the output of the convolution operation. In practical applications, the weight values in these weight matrices must be obtained through extensive training; the weight matrices formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network to make correct predictions.
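The stride-based sliding described above can be illustrated with a minimal single-channel example (the function name, image, and filter values are illustrative assumptions, not from the patent):

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Naive 2-D sliding-window convolution of one channel with one kernel
    (illustration only; real AI cores lower this to matrix multiplication)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # kernel applied per window
    return out

image = np.arange(16.0).reshape(4, 4)            # toy 4x4 single-channel image
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])            # crude horizontal-edge filter

feature_map = conv2d_single(image, edge_kernel)
assert feature_map.shape == (3, 3)               # (4 - 2) / 1 + 1 = 3 per axis
```

With multiple kernels, the resulting feature maps would be stacked along the depth dimension, as the text describes.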
The first computing unit acquires the output matrix from the second computing unit and then performs a third verification computation (also a matrix multiplication) on the output matrix to obtain a post-computation check flag bit. Specifically, as shown in FIG. 10, an all-ones matrix is multiplied with the output matrix, where the number of columns of the all-ones matrix must equal the number of rows of the output matrix for the matrix multiplication to be performed, yielding the post-computation check flag bit (check bit out, CB_out).
The first computing unit obtains the check flag bit (check bit, CB) from the pre-computation check flag bit CB_in and the post-computation check flag bit CB_out. The check flag bit may be obtained by subtracting CB_out from CB_in, or by dividing one by the other; the purpose is to compare the difference between CB_in and CB_out.
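Putting FIG. 6 to FIG. 10 together, one consistent reading (the patent leaves the exact shape of the all-ones matrix in FIG. 10 open; here both check bits are reduced to a single scalar, and all shapes are hypothetical) is the identity 1ᵀ(WF)1 = (1ᵀW)(F1) — the checksum of the actual output must equal the checksum predicted from the inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 4, 3, 5                        # hypothetical dimensions
W = rng.standard_normal((m, k))          # weight matrix
F = rng.standard_normal((k, n))          # feature map matrix

# Pre-computation check bit: (1^T W)(F 1) -- the expected sum of all
# entries of the output, obtained WITHOUT forming the full m x n product.
OC = np.ones((1, m)) @ W                 # offline checkpoint, 1 x k
CF = F @ np.ones((n, 1))                 # feature map checkpoint, k x 1
CB_in = float(OC @ CF)

# Normal processing by the second computing unit (FIG. 9).
O = W @ F                                # output matrix, m x n

# Post-computation check bit: checksum of the actual output.
CB_out = float(np.ones((1, m)) @ O @ np.ones((n, 1)))

CB = CB_in - CB_out                      # check flag bit: ~0 when fault-free
assert abs(CB) < 1e-9

# A single corrupted output element is caught by the same scalar check:
O_bad = O.copy()
O_bad[1, 2] += 7.0                       # injected fault
CB_out_bad = float(np.ones((1, m)) @ O_bad @ np.ones((n, 1)))
assert abs(CB_in - CB_out_bad) > 1e-6
```

The checkpoints cost O(mk + kn) multiply-adds instead of the O(mkn) of the layer itself, which is the "reduced-dimension" saving claimed above. In floating point the comparison needs a small tolerance rather than exact equality.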
In FIG. 4, the matrix-computation verification scheme of FIG. 6 to FIG. 11 above is applied to the convolutional layer and the fully-connected layer, and the resulting check flag bit CB corresponds to check flag bit 1 and check flag bit 4c in FIG. 4.
S504: Determine, based on a verification result, whether the output result of the AI model is correct, the verification result including the check flag bits of the one or more first processing layers obtained in steps S502 to S503.
A check flag bit indicates whether the pre-computation check flag bit and the post-computation check flag bit are consistent; if at least one check flag bit in the verification result indicates that they are inconsistent, the output result is incorrect. FIG. 11 takes as an example the check flag bit CB obtained by subtracting CB_out from CB_in. Because CB_in is obtained from the offline check flag bit OC and the feature map check flag bit CF, it represents the theoretical matrix-computation check result of the output matrix, while CB_out is the actual matrix-computation check result of the output matrix. If CB is 0, CB_in and CB_out are equal, meaning the theoretical check result of the output matrix equals the actual check result, and no error occurred while the first processing layer computed with the weight matrix and the feature map matrix. Conversely, if CB is not 0, CB_in and CB_out differ, meaning the theoretical check result of the output matrix does not equal the actual check result, and an error occurred while the first processing layer computed with the weight matrix and the feature map matrix.
Through the processes of FIG. 6 to FIG. 11, the check flag bits of the one or more first processing layers can be obtained and are recorded as the verification result. It can be seen that if at least one check flag bit in the verification result indicates that the pre-computation and post-computation check flag bits are inconsistent, for example if at least one of the check flag bits of the one or more first processing layers has a non-zero value, then an error occurred while the first processing layer corresponding to that check flag bit computed with the weight matrix and the feature map matrix, and the final computation result output by the AI model is incorrect. If all check flag bits in the verification result indicate that the pre-computation and post-computation check flag bits are consistent, for example if the check flag bits of the one or more first processing layers are all 0, then no error occurred while the one or more first processing layers computed with the weight matrix and the feature map matrix, and the final computation result output by the AI model is correct.
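The accept/reject decision of S504 can be sketched as a scan over the stored per-layer flag bits (the helper name and the tolerance are illustrative assumptions; the patent only requires that any inconsistent flag bit invalidate the output):

```python
def output_is_correct(check_flag_bits, tol=1e-9):
    """Accept the model output only if every per-layer check flag bit
    (here CB = CB_in - CB_out) is numerically zero."""
    return all(abs(cb) <= tol for cb in check_flag_bits)

# All layers consistent -> the output result is accepted.
assert output_is_correct([0.0, 0.0, 0.0])
# One inconsistent layer is enough to reject the result and report a fault.
assert not output_is_correct([0.0, 4.2e-3, 0.0])
```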
Optionally, the check flag bits of the one or more first processing layers are stored in memory. After the AI model outputs the computation result, the CPU reads the check flag bits of the one or more first processing layers from memory and then judges, according to them, whether the computation result output by the AI model is correct. If the computation result is judged correct, the corresponding decision is executed; for example, in FIG. 2 the intelligent driving computing platform generates a series of execution instructions according to the computation, and the actuators control the steering, acceleration, deceleration, and so on of the vehicle according to the execution instructions. If the computation result is judged incorrect, a fault of the second computing unit is reported.
The reduced-dimension matrix-computation verification shown in FIG. 6 to FIG. 11 (hereinafter, matrix-computation verification) can be used to verify the computation of the convolutional layer and the fully-connected layer, because both convolution computation and fully-connected computation are converted into matrix computation, to which the reduced-dimension matrix-computation verification scheme applies.
For the one or more second processing layers in the AI model to which the dimension-reduced matrix computation verification method does not apply, the AI computation verification method of this embodiment of the application further includes performing a redundancy check on each of the one or more second processing layers to obtain a check flag bit of that second processing layer. The above check result therefore further includes the check flag bits of the one or more second processing layers, and determining whether the computation result output by the AI model is correct requires jointly considering the check flag bits of the one or more first processing layers and the check flag bits of the one or more second processing layers.

Specifically, the redundancy check is, as described above, executing the same computation of the one or more second processing layers on another chip or another computing unit with the same input data and the same AI model, and then comparing whether the original computation result and the redundant computation result of each second processing layer are consistent, thereby obtaining the check flag bits of the one or more second processing layers. A check flag bit of a second processing layer may be the difference or the quotient between the original computation result and the redundant computation result of that layer. If the original computation result of a second processing layer is consistent with its redundant computation result, no error occurred while that second processing layer was processing data; if they are inconsistent, an error occurred while the second processing layer was processing data, which causes the final computation result output by the AI model to be incorrect. Since pooling computation and activation computation occupy only a very small share of resources, even a redundancy check does not consume excessive resources.

When the original computation result and the redundant computation result are inconsistent, the basis for deciding that the error occurred in the second processing layer's original computation, rather than in the redundant computation, may be as follows. In the redundancy check, it may simply be stipulated that whenever the original computation result and the redundant computation result are inconsistent, the original computation is deemed to be in error. Alternatively, the original computation is performed by an ordinary chip or ordinary computing unit while the redundant computation is performed by a high-reliability chip or high-reliability computing unit; when the two results are inconsistent, the chip or computing unit performing the redundant computation is more reliable and thus less likely to err, whereas the chip or computing unit performing the original computation is less reliable and thus more likely to err, so the inconsistency is attributed to an error in the original computation.
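A redundancy check of this kind can be sketched as below, using a ReLU activation as the cheap second-processing-layer computation and attributing any mismatch to the original computation. The function names and the max-absolute-difference flag are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def redundancy_check(layer_fn, inputs, primary_out):
    """Re-run a cheap layer (activation or pooling) as the redundant
    computation and derive a check flag from the two results.

    Returns 0.0 when the original and redundant results agree and a
    non-zero value otherwise. Per the scheme above, a mismatch is
    attributed to the original computation (the redundant copy is
    assumed to run on the more reliable unit).
    """
    redundant_out = layer_fn(inputs)  # redundant computation of the layer
    return float(np.max(np.abs(primary_out - redundant_out)))

x = np.array([-1.0, 2.0, -3.0, 4.0])
good = relu(x)                        # fault-free original result
assert redundancy_check(relu, x, good) == 0.0
bad = good.copy()
bad[1] = -5.0                         # corrupted original result
assert redundancy_check(relu, x, bad) != 0.0
```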
When the output result of the AI model is determined to be incorrect, the AI chip on which the second computing unit is located has failed. Failures include transient failures and permanent failures: a transient failure affects only the one computation during which it occurs and has no effect on subsequent computations, whereas a permanent failure affects every computation after it occurs. If transient failures and permanent failures are not distinguished, then after a fault is reported, the system's safety mechanism directly disables any second computing unit detected as failed; a second computing unit that suffered only a transient failure and could still be used is disabled as well, wasting resources and reducing the availability of the system.
Therefore, the AI computation verification method of this embodiment of the application further includes: when the output result of the AI model is determined to be incorrect according to the method in FIG. 5, running a self-test library to determine whether the state of the failed second computing unit is a transient failure or a permanent failure. As shown in FIG. 12, while the second computing unit performs AI computation, the first computing unit performs heterogeneous parallel verification at the same time. When the output result of the AI model is determined to be incorrect, that is, after a failure of the second computing unit is detected, the self-test library is run to determine whether the second computing unit has suffered a transient failure or a permanent failure. If a transient failure is determined, the output result of this run of the AI model is discarded and the second computing unit continues to be used for AI computation; if a permanent failure is determined, the fault of the second computing unit is reported.
FIG. 13 is a schematic diagram of how the AI computation verification method of this embodiment of the application determines the specific failure state of the second computing unit. As shown in FIG. 13, the CPU determines whether the output result of the AI model is incorrect according to the multiple check flag bits in memory; for the specific determination method, refer to the description of FIG. 5 above, which is not repeated here. If the output result of the AI model is determined to be incorrect, the hardware has failed, and the self-test library is run to further determine whether the hardware has suffered a permanent failure. As noted above, the STL is mainly used to identify permanent failures, so the STL is invoked to self-test the failed hardware. If the self-test finds a fault, the second computing unit has suffered a permanent failure, can no longer participate in computation, and the fault must be reported. If the self-test finds no fault, the second computing unit has suffered a transient failure that will not affect subsequent computations; it suffices to discard this incorrect output result, and the second computing unit can continue with subsequent computations.
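The decision flow of FIG. 12 and FIG. 13 — running the self-test library only after the parallel check flags an incorrect output, then branching on its result — might be sketched as follows, with a hypothetical callable standing in for the STL interface:

```python
def handle_failed_check(stl_finds_fault):
    """Decide the failure state of the computing unit after the
    heterogeneous parallel check has flagged an incorrect output.

    stl_finds_fault: callable returning True when the self-test
    library (STL) detects a hardware fault (illustrative interface).
    Returns the action to take, following the flow of FIG. 13.
    """
    if stl_finds_fault():
        # Fault found: permanent failure; the unit must stop computing.
        return "report_fault_and_disable_unit"
    # No fault found: transient failure; only this one result is bad.
    return "discard_output_and_keep_unit"

assert handle_failed_check(lambda: True) == "report_fault_and_disable_unit"
assert handle_failed_check(lambda: False) == "discard_output_and_keep_unit"
```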
The AI computation verification method of this embodiment of the application verifies the computation of the AI model in a heterogeneous, parallel manner. First, the verification is performed by a computing unit other than the one executing the AI computation. Compared with a verification method that periodically runs a self-test library, the verification method of this embodiment does not interfere with the normal progress of the AI model's inference computation, so it does not affect the acceleration performance of the AI core, and it avoids having the same AI core perform verification while executing AI computation, guaranteeing the real-time performance of the AI model's inference computation while guaranteeing the correctness of its output result. Second, the verification method of this embodiment designs different verification schemes for the different processing layers in the AI model, saving computing resources to the greatest extent, so that verification can be carried out on computing units with lower computing power, reducing the cost of verification. After a hardware failure is determined, the verification method of this embodiment can further determine the specific failure state of the hardware; if the computing unit has suffered only a transient failure, it can continue to be used, avoiding a waste of resources and improving the availability of the AI chip.
FIG. 14 is a schematic flowchart of another AI computation verification method according to an embodiment of the application. As shown in FIG. 14, the method includes step 1401 and step 1402.
S1401: Obtain, from the second computing unit, a check result for the output result of the AI model, where the check result is a determination that the output result is incorrect.
S1402: Run a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
The method in FIG. 14 is executed by the first computing unit. The first computing unit may be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip; this other computing unit may or may not be designed for AI acceleration. The first computing unit may also be a computing unit on another AI chip, or a CPU core in a CPU chip, where a CPU core is the smallest computing unit in a CPU chip. In one possible implementation, the first computing unit may be the same computing unit as the second computing unit; that is, the computing unit both executes AI computation and executes the AI computation verification method of this embodiment of the application. The check result obtained from the second computing unit for the output result of the AI model may be a check result obtained by the method shown in FIG. 5, or a check result obtained by any existing verification method.
When running the self-test library finds no fault, the state of the second computing unit is a transient failure; when running the self-test library finds a fault, the state of the second computing unit is a permanent failure. When the state of the second computing unit is a transient failure, the output result is discarded; when the state of the second computing unit is a permanent failure, the failure state of the second computing unit is reported.
In the AI computation verification method of this embodiment of the application, the CPU performs system scheduling and invokes the STL to self-test the AI core on which the hardware failure occurred, determining whether the AI core has suffered a permanent failure or a transient failure. If the self-test finds no fault, the AI core has suffered a transient failure that does not affect subsequent computations, and the AI core can continue to participate in system operations. If the self-test finds a fault, the AI core has suffered a permanent failure, cannot continue to participate in computation, and the fault must be reported. This avoids directly disabling a failed AI core, reduces the waste of resources, and improves the availability of the AI chip.
The AI computation verification method of the embodiments of the application has been described in detail above with reference to FIG. 1 to FIG. 14. Below, the AI computation verification apparatus provided by the embodiments of the application is described in detail with reference to FIG. 15 and FIG. 16. The AI computation verification apparatus here may be the above first computing unit, configured to perform the verification of AI computation. It should be understood that the descriptions of the apparatus embodiments correspond to the descriptions of the method embodiments; therefore, for details not described here, refer to the method embodiments above, which are not repeated for brevity.
FIG. 15 is a schematic block diagram of an AI computation verification apparatus provided by an embodiment of the application. The apparatus 1500 may specifically be a chip, an intelligent driving hardware platform, or the like. The apparatus 1500 includes a transceiver module 1510 and a processing module 1520. The transceiver module 1510 can implement corresponding communication functions, and the processing module 1520 is configured for data processing. The transceiver module 1510 may also be referred to as a communication interface or a communication unit.
Optionally, the apparatus 1500 may further include a storage module, which may be configured to store instructions and/or data; the processing module 1520 may read the instructions and/or data in the storage module, so that the apparatus implements the foregoing method embodiments.
The apparatus 1500 may be configured to execute the actions in the foregoing method embodiments. Specifically, the transceiver module 1510 is configured to execute the sending- and receiving-related operations in the foregoing method embodiments, and the processing module 1520 is configured to execute the processing-related operations in the foregoing method embodiments.
The apparatus 1500 may implement the steps or flows corresponding to the method embodiments of the embodiments of the application, and may include modules for executing the methods in FIG. 5 and FIG. 14. Moreover, the modules in the apparatus 1500 and the other operations and/or functions above are respectively intended to implement the corresponding flows of the method embodiments in FIG. 5 and FIG. 14.
When the apparatus 1500 is configured to execute the method 500 in FIG. 5, the transceiver module 1510 may be configured to execute step 501 and step 502 in the method 500, and the processing module 1520 may be configured to execute processing steps 503 and 504 in the method 500.
Specifically, the transceiver module 1510 is configured to obtain the parameters of the AI model with which the second computing unit processes the AI computation, where the AI model includes one or more first processing layers. For each of the one or more first processing layers, the following check processing is performed to obtain a check flag bit of that first processing layer: the transceiver module 1510 is further configured to obtain the input data of the first processing layer from the second computing unit; and the processing module 1520 is configured to perform check processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer, where the computation amount of the check processing of the first processing layer is less than the computation amount of the second computing unit processing the input data through the first processing layer. The processing module 1520 is further configured to determine, based on the check result, whether the output result of the AI computation processed by the second computing unit is correct, where the check result includes the check flag bit of each of the one or more first processing layers.
In some possible implementations, the AI model further includes one or more second processing layers, and the processing module 1520 is further configured to perform a redundancy check on each of the one or more second processing layers to obtain a check flag bit of each second processing layer; the check result further includes the check flag bit of each of the one or more second processing layers.
In some possible implementations, the parameters of the AI model include a weight matrix, and the input data of the first processing layer includes a feature map matrix. The processing module 1520 is specifically configured to: obtain a first check flag bit, where the first check flag bit is obtained by performing a first check computation on the weight matrix; obtain a second check flag bit, where the second check flag bit is obtained by performing a second check computation on the feature map matrix; obtain a pre-computation check flag bit according to the first check flag bit and the second check flag bit; obtain an output matrix from the second computing unit, where the output matrix is obtained by the second computing unit computing on the weight matrix and the feature map matrix in the first processing layer; perform a third check computation on the output matrix to obtain a post-computation check flag bit; and obtain the check flag bit according to the pre-computation check flag bit and the post-computation check flag bit.
In some possible implementations, the check flag bit indicates whether the pre-computation check flag bit and the post-computation check flag bit are consistent, and the processing module 1520 is specifically configured to determine that the output result is incorrect if at least one check flag bit in the check result indicates that the pre-computation check flag bit and the post-computation check flag bit are inconsistent.
In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer.
In some possible implementations, when the output result is determined to be incorrect, the possible states of the second computing unit include a transient failure and a permanent failure.
In some possible implementations, when the output result is determined to be incorrect, the processing module 1520 is further configured to determine, by running a self-test library, whether the state of the second computing unit is a transient failure or a permanent failure.
In some possible implementations, the transceiver module 1510 is further configured to report the failure state of the second computing unit when the state of the second computing unit is a permanent failure.
When the apparatus 1500 is configured to execute the method 1400 in FIG. 14, the transceiver module 1510 may be configured to execute step 1401 in the method 1400, and the processing module 1520 may be configured to execute processing step 1402 in the method 1400.
Specifically, the transceiver module 1510 is configured to obtain a check result for the output result of the AI model with which the second computing unit processes the AI computation, where the check result is a determination that the output result is incorrect; and the processing module 1520 is configured to run a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
In some possible implementations, when running the self-test library finds no fault, the state of the second computing unit is a transient failure; when running the self-test library finds a fault, the state of the second computing unit is a permanent failure.
In some possible implementations, the apparatus 1500 is further configured to discard the output result when the state of the second computing unit is a transient failure, and to report the failure state of the second computing unit when the state of the second computing unit is a permanent failure.
It should be understood that the specific processes by which the modules execute the corresponding steps above have been described in detail in the foregoing method embodiments and are not repeated here for brevity.
As shown in FIG. 16, an embodiment of the application further provides an AI computation verification device 1600. The AI computation verification device 1600 shown in FIG. 16 may include a memory 1610, a processor 1620, and a communication interface 1630. The memory 1610, the processor 1620, and the communication interface 1630 are connected through an internal connection path. The memory 1610 is configured to store instructions, and the processor 1620 is configured to execute the instructions stored in the memory 1610, so as to control the communication interface 1630 to receive input samples or send prediction results. Optionally, the memory 1610 may be coupled to the processor 1620 through an interface, or may be integrated with the processor 1620.
It should be noted that the communication interface 1630 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the device 1600 and other devices or communication networks. The communication interface 1630 may further include an input/output interface.
In an implementation process, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor 1620 or by instructions in the form of software. The methods disclosed in the embodiments of the application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1610; the processor 1620 reads the information in the memory 1610 and completes the steps of the above methods in combination with its hardware. To avoid repetition, detailed descriptions are omitted here.
It should be understood that in the embodiments of the application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be understood that in the embodiments of the application, the memory may include a read-only memory and a random access memory and provide instructions and data to the processor. A portion of the processor may also include a non-volatile random access memory. For example, the processor may also store device type information.
An embodiment of the application further provides a chip, where the chip includes a first computing unit, and the first computing unit is configured to execute the method in FIG. 5 or FIG. 14 above.
Optionally, the chip further includes a second computing unit, and the second computing unit is configured to execute AI computation.
An embodiment of the application further provides a computer-readable medium, where the computer-readable medium stores program code, and when the program code runs on a computer, the computer is caused to execute the method in FIG. 5 or FIG. 14.
An embodiment of the application further provides a computing device, including a first computing unit and a second computing unit, where the second computing unit is configured to process AI computation based on an AI model, and the first computing unit executes the method in FIG. 5 or FIG. 14. The processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit. The first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
The terms "component", "module", "system", and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate through local and/or remote processes, for example, according to signals having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet interacting with other systems through signals).
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or May be integrated into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disc and other media that can store program codes. .
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above is only a specific implementation of the application, but the scope of protection of the application is not limited thereto. Anyone familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the application. Should be covered within the protection scope of this application. Therefore, the protection scope of the present application should be determined by the protection scope of the claims.

Claims (28)

  1. A verification method for artificial intelligence (AI) computing, wherein the method is performed by a first computing unit, and the method comprises:
    obtaining parameters of an AI model used by a second computing unit to process the AI computation, the AI model comprising one or more first processing layers;
    performing the following verification processing on each first processing layer of the one or more first processing layers to obtain a check flag bit of each first processing layer of the one or more first processing layers:
    obtaining input data of the first processing layer from the second computing unit;
    performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, to obtain the check flag bit of the first processing layer, wherein a computation amount of the verification processing on the first processing layer is less than a computation amount of the second computing unit processing the input data through the first processing layer; and
    determining, based on a verification result, whether an output result of the AI computation processed by the second computing unit is correct, the verification result comprising the check flag bit of each first processing layer of the one or more first processing layers.
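Outside the claim language itself, the per-layer flow of claim 1 can be sketched in code. This is an illustrative, hypothetical reduction (the function names, the column-sum check, and the tolerance are assumptions, not taken from the patent): a low-cost check is run on each monitored layer, and the AI output is judged correct only when every layer's flag passes.

```python
import numpy as np

def column_sum_check(weights, layer_input, layer_output, tol=1e-6):
    """Cheap consistency check for layer_output == weights @ layer_input.

    Comparing column sums costs O(m*n + n*k) instead of the O(m*n*k) full
    matrix product, matching the claim's requirement that verification
    cost less than the layer computation itself."""
    pre = (np.ones(weights.shape[0]) @ weights) @ layer_input   # expected column sums
    post = np.ones(layer_output.shape[0]) @ layer_output        # column sums of reported output
    return bool(np.allclose(pre, post, atol=tol))               # the layer's check flag bit

def verify_ai_output(layer_records):
    """layer_records: (weights, input, accelerator_output) per monitored layer.

    Returns the per-layer flag bits and the overall verdict: the output is
    judged correct only when no flag reports a mismatch."""
    flags = [column_sum_check(w, x, y) for w, x, y in layer_records]
    return flags, all(flags)
```

In this sketch the verifier never recomputes a full layer; it only forms and compares checksums, which is what makes monitoring by a weaker first computing unit plausible.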
  2. The method according to claim 1, wherein the AI model further comprises one or more second processing layers, and the method further comprises:
    performing a redundancy check on each second processing layer of the one or more second processing layers to obtain a check flag bit of each second processing layer of the one or more second processing layers;
    wherein the verification result further comprises the check flag bit of each second processing layer of the one or more second processing layers.
  3. The method according to claim 1 or 2, wherein the parameters of the AI model comprise a weight matrix, the input data of the first processing layer comprises a feature map matrix, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer comprises:
    obtaining a first check flag bit, the first check flag bit being obtained by performing a first check calculation on the weight matrix;
    obtaining a second check flag bit, the second check flag bit being obtained by performing a second check calculation on the feature map matrix;
    obtaining a pre-calculation check flag bit based on the first check flag bit and the second check flag bit;
    obtaining an output matrix from the second computing unit, the output matrix being obtained by the second computing unit performing calculation on the weight matrix and the feature map matrix at the first processing layer;
    performing a third check calculation on the output matrix to obtain a post-calculation check flag bit; and
    obtaining the check flag bit based on the pre-calculation check flag bit and the post-calculation check flag bit.
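One concrete (hypothetical) instance of the first, second, and third check calculations in claim 3 is a total-sum checksum over the matrix product, in the spirit of algorithm-based fault tolerance. The identity sum(W @ X) = (1ᵀW)·(X·1) lets the pre-calculation value be built from checks on W and X independently; the specific checksum and all names below are assumptions, not the patent's stated scheme.

```python
import numpy as np

def layer_check_flag(W, X, Y, tol=1e-6):
    """W: weight matrix (m x n), X: feature-map matrix (n x k),
    Y: output matrix reported by the accelerated unit (should equal W @ X)."""
    first = np.ones(W.shape[0]) @ W     # first check calculation, on W alone
    second = X @ np.ones(X.shape[1])    # second check calculation, on X alone
    pre = first @ second                # pre-calculation check value: equals sum(W @ X)
    post = Y.sum()                      # third check calculation, on the output matrix
    return bool(np.isclose(pre, post, atol=tol))  # flag: do pre and post agree?
```

A total-sum checksum detects any single corrupted element of Y but can miss errors that happen to cancel; finer-grained row/column checksums trade more verification cost for better coverage.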
  4. The method according to claim 3, wherein the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and determining, based on the verification result, whether the output result of the AI computation processed by the second computing unit is correct comprises:
    determining that the output result is incorrect if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent.
  5. The method according to any one of claims 1 to 4, wherein the first processing layer is a convolutional layer or a fully connected layer.
  6. The method according to any one of claims 1 to 5, wherein, when the output result is determined to be incorrect, the state of the second computing unit includes a transient failure and a permanent failure.
  7. The method according to claim 6, wherein, when the output result is determined to be incorrect, the method further comprises:
    determining, by running a self-test library, that the state of the second computing unit is a transient failure or a permanent failure.
  8. The method according to claim 7, wherein the method further comprises:
    when the state of the second computing unit is a permanent failure, reporting the failure state of the second computing unit.
  9. A verification method for AI computing, wherein the method is performed by a first computing unit, and the method comprises:
    obtaining a verification result of an output result of an AI model used by a second computing unit to process the AI computation, the verification result being a determination that the output result is incorrect; and
    running a self-test library to determine that the state of the second computing unit is a transient failure or a permanent failure.
  10. The method according to claim 9, wherein running the self-test library to determine that the state of the second computing unit is a transient failure or a permanent failure comprises:
    when a running result of the self-test library indicates no fault, the state of the second computing unit is a transient failure; and
    when the running result of the self-test library indicates a fault, the state of the second computing unit is a permanent failure.
  11. The method according to claim 9 or 10, wherein the method further comprises:
    when the state of the second computing unit is a transient failure, discarding the output result; and
    when the state of the second computing unit is a permanent failure, reporting the failure state of the second computing unit.
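The failure classification in claims 9 to 11 can be sketched as a small decision routine. This is an illustrative sketch (the function name and return values are hypothetical): after a verification failure, a clean self-test run implies a one-off transient upset, while a failing self-test implies a permanent hardware fault.

```python
def classify_failure(run_self_test):
    """run_self_test: callable returning True when the self-test library
    finds no hardware fault on the monitored computing unit."""
    if run_self_test():
        # Hardware is healthy, so the bad output was a one-off upset:
        # transient failure; the output result is discarded.
        return "transient"
    # The self-test itself fails: permanent failure; the failure state
    # of the computing unit is reported upward.
    return "permanent"
```

Separating the two cases matters operationally: a transient failure only costs one discarded inference, while a permanent failure takes the unit out of service.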
  12. A verification apparatus for AI computing, wherein the apparatus comprises:
    a transceiver unit, configured to obtain parameters of an AI model used by a second computing unit to process the AI computation, the AI model comprising one or more first processing layers;
    wherein the following verification processing is performed on each first processing layer of the one or more first processing layers to obtain a check flag bit of each first processing layer of the one or more first processing layers:
    the transceiver unit is further configured to obtain input data of the first processing layer from the second computing unit;
    a processing unit, configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, to obtain the check flag bit of the first processing layer, wherein a computation amount of the verification processing on the first processing layer is less than a computation amount of the second computing unit processing the input data through the first processing layer; and
    the processing unit is further configured to determine, based on a verification result, whether an output result of the AI computation processed by the second computing unit is correct, the verification result comprising the check flag bit of each first processing layer of the one or more first processing layers.
  13. The apparatus according to claim 11, wherein the AI model further comprises one or more second processing layers, and the processing unit is further configured to:
    perform a redundancy check on each second processing layer of the one or more second processing layers to obtain a check flag bit of each second processing layer of the one or more second processing layers;
    wherein the verification result further comprises the check flag bit of each second processing layer of the one or more second processing layers.
  14. The apparatus according to claim 12 or 13, wherein the parameters of the AI model comprise a weight matrix, the input data of the first processing layer comprises a feature map matrix, and the processing unit is specifically configured to:
    obtain a first check flag bit, the first check flag bit being obtained by performing a first check calculation on the weight matrix;
    obtain a second check flag bit, the second check flag bit being obtained by performing a second check calculation on the feature map matrix;
    obtain a pre-calculation check flag bit based on the first check flag bit and the second check flag bit;
    obtain an output matrix from the second computing unit, the output matrix being obtained by the second computing unit performing calculation on the weight matrix and the feature map matrix at the first processing layer;
    perform a third check calculation on the output matrix to obtain a post-calculation check flag bit; and
    obtain the check flag bit based on the pre-calculation check flag bit and the post-calculation check flag bit.
  15. The apparatus according to claim 14, wherein the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and the processing unit is specifically configured to:
    determine that the output result is incorrect if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent.
  16. The apparatus according to any one of claims 12 to 15, wherein the first processing layer is a convolutional layer or a fully connected layer.
  17. The apparatus according to any one of claims 12 to 16, wherein, when the output result is determined to be incorrect, the state of the second computing unit includes a transient failure and a permanent failure.
  18. The apparatus according to claim 17, wherein, when the output result is determined to be incorrect, the processing unit is further configured to:
    determine, by running a self-test library, that the state of the second computing unit is a transient failure or a permanent failure.
  19. The apparatus according to claim 18, wherein the transceiver unit is further configured to:
    when the state of the second computing unit is a permanent failure, report the failure state of the second computing unit.
  20. A verification apparatus for AI computing, wherein the apparatus comprises:
    a transceiver unit, configured to obtain a verification result of an output result of an AI model used by a second computing unit to process the AI computation, the verification result being a determination that the output result is incorrect; and
    a processing unit, configured to run a self-test library to determine that the state of the second computing unit is a transient failure or a permanent failure.
  21. The apparatus according to claim 20, wherein:
    when a running result of the self-test library indicates no fault, the state of the second computing unit is a transient failure; and
    when the running result of the self-test library indicates a fault, the state of the second computing unit is a permanent failure.
  22. The apparatus according to claim 20 or 21, wherein the apparatus is further configured to:
    discard the output result when the state of the second computing unit is a transient failure; and
    report the failure state of the second computing unit when the state of the second computing unit is a permanent failure.
  23. A chip, comprising a first computing unit, wherein the first computing unit is configured to perform the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 11.
  24. The chip according to claim 23, wherein the chip further comprises a second computing unit, and the second computing unit is configured to perform AI computation.
  25. A computer-readable medium, wherein the computer-readable medium stores program code, and when the program code is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 11.
  26. A computing device, wherein the computing device comprises a first computing unit and a second computing unit, the second computing unit is configured to process AI computation based on an AI model, and the first computing unit performs the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 11 to verify the second computing unit.
  27. The computing device according to claim 26, wherein a processing capability of the first computing unit is less than or equal to a processing capability of the second computing unit.
  28. The computing device according to claim 26 or 27, wherein the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a central processing unit (CPU) chip, or a computing unit in a graphics processing unit (GPU) chip, and the second computing unit is a computing unit in an AI chip.
PCT/CN2022/084753 2021-08-12 2022-04-01 Ai computing verification method and apparatus WO2023015919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110924997.XA CN115705487A (en) 2021-08-12 2021-08-12 AI calculation verification method and device
CN202110924997.X 2021-08-12

Publications (1)

Publication Number Publication Date
WO2023015919A1 true WO2023015919A1 (en) 2023-02-16

Family

ID=85180872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084753 WO2023015919A1 (en) 2021-08-12 2022-04-01 Ai computing verification method and apparatus

Country Status (2)

Country Link
CN (1) CN115705487A (en)
WO (1) WO2023015919A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117333A (en) * 2018-09-29 2019-01-01 深圳比特微电子科技有限公司 Computing chip and its operating method
CN109902836A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 The failure tolerant method and System on Chip/SoC of artificial intelligence module
CN113032195A (en) * 2021-03-24 2021-06-25 上海西井信息科技有限公司 Chip simulation verification method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN115705487A (en) 2023-02-17

Similar Documents

Publication Publication Date Title
JP7430735B2 (en) Safety monitor for image misclassification
CN111510158B (en) Fault-tolerant error-correcting decoding method, device and chip of quantum circuit
US11144027B2 (en) Functional safety controls based on soft error information
CN111976623B (en) Chassis domain controller for intelligent automobile, control method of vehicle and vehicle
US20210146939A1 (en) Device and method for controlling a vehicle module
CN112506690A (en) Method and device for controlling processor
US11971803B2 (en) Safety monitor for invalid image transform
US20210070321A1 (en) Abnormality diagnosis system and abnormality diagnosis method
CN112231134B (en) Fault processing method and device for neural network processor, equipment and storage medium
US20200074287A1 (en) Fault detectable and tolerant neural network
CN105550067B (en) A kind of airborne computer binary channels system of selection
WO2023015919A1 (en) Ai computing verification method and apparatus
US20240103964A1 (en) Method and system for fault-tolerant data communication
EP3115900A1 (en) A computer system and a method for executing safety-critical applications using voting
GB2605467A (en) Verifying processing logic of a graphics processing unit
Kaprocki et al. Multiunit automotive perception framework: Synergy between AI and deterministic processing
JP7449193B2 (en) Computing device and vehicle control device
KR20240013877A (en) Error-Proof Inference Computation for Neural Networks
US20230401140A1 (en) Method for carrying out data processing
EP4134875A1 (en) Method and apparatus for diagnostic coverage for ai hardware accelerators
Matsumoto et al. Fault tolerance in small world cellular neural networks for image processing
Koeda et al. Fault-Tolerant Ensemble CNNs Increasing Diversity Based on Knowledge Distillation
CN117999544A (en) Fault tolerant system with minimal hardware
KR20230012139A (en) Method and system for detecting abnormality of processor with vehicle
Ben Abdallah et al. Fault-Tolerant Neuromorphic System Design

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22854924

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE