WO2023015919A1 - AI computing verification method and apparatus - Google Patents


Info

Publication number
WO2023015919A1
WO2023015919A1 · PCT/CN2022/084753 · CN2022084753W
Authority
WO
WIPO (PCT)
Prior art keywords
calculation
processing
verification
computing unit
check
Prior art date
Application number
PCT/CN2022/084753
Other languages
French (fr)
Chinese (zh)
Inventor
王矿磊
陈艺帆
陈清龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2023015919A1 publication Critical patent/WO2023015919A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/329Power saving characterised by the action undertaken by task scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • The present application relates to the field of artificial intelligence, and more specifically, to a method and apparatus for verifying AI calculations.
  • Data processing in transportation, medical care, security and other fields can be completed through AI neural networks.
  • High-level intelligent driving vehicles are often equipped with multiple cameras, lidars, ultrasonic radars and other sensors to achieve comprehensive perception of the surrounding environment, which produces a large amount of information to be processed.
  • the real-time requirement of neural network inference calculation is very high.
  • Generally, an artificial intelligence (AI) chip is used as a hardware acceleration unit to perform the inference calculation of the neural network; an AI chip performs neural network inference faster and with lower energy consumption than traditional chips.
  • AI chips include graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
  • Intelligent driving vehicles run in the external environment and may encounter various problems such as severe weather and electromagnetic interference, which requires the computing platform of the intelligent driving system to have extremely high reliability.
  • Traditional devices such as CPUs and memory have error detection and fault tolerance mechanisms, such as program flow monitoring, data flow monitoring, memory error checking and correction (ECC), and parity checking, to ensure that data in the CPU and memory are not affected by soft faults.
  • The computing platform needs to meet automotive safety integrity level (ASIL) requirements, so an error detection method is needed that can detect AI chip faults in real time and ensure that the application of AI chips meets the demands of the application scenario.
  • This application provides a verification method and device for AI calculation.
  • The verification method can be executed by a computing unit other than the one processing the AI calculation, so it does not affect the AI calculation itself.
  • The verification method of the embodiments of the present application involves a very small amount of calculation and places low requirements on the performance of the computing unit used for verification, which reduces hardware cost accordingly and helps guarantee the reliability of the AI chip.
  • In a first aspect, a verification method for AI calculation is provided. The method is executed by a first computing unit and includes: obtaining the parameters of the AI model whose AI calculation is processed by a second computing unit, the AI model including one or more first processing layers; performing the following verification process for each of the one or more first processing layers to obtain the check mark bit of each first processing layer: obtaining the input data of the first processing layer from the second computing unit; performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check mark bit of the first processing layer, where the calculation amount of the verification processing for the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer; and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct, the verification result including the check mark bit of each of the one or more first processing layers.
  • The verification method of the embodiments of the present application is performed by a computing unit other than the one that executes the AI calculation.
  • It therefore does not interfere with the normal progress of the AI model's inference calculation, does not affect the acceleration performance of the AI computing unit, and avoids having the same AI computing unit perform verification while performing AI calculation, ensuring the correctness of the AI model's output while guaranteeing the real-time performance of its inference. Moreover, since the calculation amount of the verification processing is less than that of the AI calculation, the performance requirements of the computing unit used for verification need not be higher than those of the computing unit used for AI calculation.
  • the heterogeneous verification method provided in the embodiment of the present application can save power consumption and reduce costs.
  • the AI model further includes one or more second processing layers
  • In a possible implementation, the method further includes: performing a redundancy check on each of the one or more second processing layers to obtain the check mark bit of each of the one or more second processing layers; the verification result further includes the check mark bit of each of the one or more second processing layers.
  • The second processing layers include pooling layers, activation layers, and the like. Since pooling and activation calculations occupy only a small part of the resources, even a redundancy check will not consume excessive resources.
  • the parameters of the AI model include a weight matrix
  • the input data of the first processing layer includes a feature map matrix
  • In a possible implementation, performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check mark bit of the first processing layer includes: obtaining a first check mark bit by performing a first check calculation on the weight matrix; obtaining a second check mark bit by performing a second check calculation on the feature map matrix; obtaining a pre-calculation check mark bit from the first check mark bit and the second check mark bit; obtaining the output matrix from the second computing unit, where the output matrix is calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; performing a third check calculation on the output matrix to obtain a post-calculation check mark bit; and obtaining the check mark bit from the pre-calculation check mark bit and the post-calculation check mark bit.
  • The check calculation on the above weight matrix can be performed offline and the result stored in memory. Moreover, the matrix calculations used to obtain the different mark bits are different.
  • the verification method of AI calculation in the embodiment of the present application designs different verification methods for different processing layers in the AI model, which saves computing resources to the greatest extent, so that the verification can be performed in computing units with low computing power, reducing the cost of verification.
  • In a possible implementation, the check mark bit indicates whether the pre-calculation check mark bit is consistent with the post-calculation check mark bit, and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct includes: if at least one check mark bit in the verification result indicates that the pre-calculation check mark bit and the post-calculation check mark bit are inconsistent, the output result is incorrect.
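  • The checksum procedure above can be sketched in a few lines. This is an illustrative ABFT-style reading of the scheme, not the patent's own code: the "first check calculation" is taken as column sums of the weight matrix, the "second" as row sums of the feature map, and the "third" as the total sum of the output; all function names are ours.

```python
def matmul(a, b):
    """Plain matrix product C = A @ B: the work done by the verified AI computing unit."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def check_before(weight, fmap):
    # First check calculation: column sums of the weight matrix, O(m*k).
    w_sum = [sum(col) for col in zip(*weight)]
    # Second check calculation: row sums of the feature map matrix, O(k*n).
    x_sum = [sum(row) for row in fmap]
    # Pre-calculation check mark bit: their inner product, O(k).
    return sum(w * x for w, x in zip(w_sum, x_sum))

def check_after(output):
    # Third check calculation: sum of every entry of the output matrix, O(m*n).
    return sum(sum(row) for row in output)

def verify_layer(weight, fmap, output):
    """Check mark bit: True iff the pre- and post-calculation checks agree."""
    return check_before(weight, fmap) == check_after(output)
```

The identity behind it is 1ᵀ(WX)1 = (1ᵀW)(X1), so any single corrupted output entry changes the post-calculation check but not the pre-calculation one; with floating-point arithmetic the equality test would need a tolerance.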
  • the first processing layer is a convolutional layer or a fully connected layer.
  • the plurality of first processing layers may include a plurality of convolutional layers, or, a plurality of fully connected layers, or, one or more convolutional layers and one or more fully connected layers.
  • the state of the second computing unit includes transient failure and permanent failure.
  • In a possible implementation, when the output result is judged to be incorrect, the method further includes: determining whether the state of the second computing unit is a transient failure or a permanent failure by running a self-test library.
  • the method further includes: when the status of the second computing unit is permanent failure, reporting the failure status of the second computing unit.
  • After determining that the hardware has failed, the verification method of the embodiments of the present application can further judge the specific failure state. If the computing unit has failed only transiently, it can continue to be used, which avoids wasting resources and improves the availability of the AI chip.
  • In a second aspect, a verification method for AI calculation is provided. The method is executed by the first computing unit and includes: obtaining a verification result for the output result of the AI model whose AI calculation is processed by the second computing unit, the verification result being that the output result is judged incorrect; and running a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
  • In a possible implementation, running the self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure includes: when running the self-test library finds no fault, the state of the second computing unit is a transient failure; when running the self-test library finds a fault, the state of the second computing unit is a permanent failure.
  • In a possible implementation, the method further includes: when the state of the second computing unit is a transient failure, discarding the output result; when the state of the second computing unit is a permanent failure, reporting the failure state of the second computing unit.
  • The verification method of the embodiments of this application uses the CPU for system scheduling and calls the self-test library to check the AI core with the hardware fault, judging whether the AI core has failed permanently or transiently. If the self-test finds no fault, the AI core has failed only transiently, which does not affect subsequent calculation, and the AI core can continue to participate in system operation. If the self-test finds a fault, the AI core has failed permanently, cannot continue to participate in calculation, and a fault report is required. This avoids directly deactivating a failed AI core, reduces the waste of resources, and improves the availability of AI chips.
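  • The decision flow above reduces to a short sketch; `run_stl` here is a placeholder for the vendor's self-test library, not a real API:

```python
def classify_failure(run_stl):
    """After a wrong output is detected, classify the AI core's failure state."""
    if run_stl():            # self-test finds a fault -> the hardware itself is broken
        return "permanent"
    return "transient"       # no fault found -> e.g. a particle strike corrupted one run

def handle_wrong_output(run_stl, report_fault, discard_output):
    """Transient: drop the bad result and keep using the core; permanent: report it."""
    state = classify_failure(run_stl)
    if state == "permanent":
        report_fault()
    else:
        discard_output()
    return state
```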
  • A verification apparatus for AI calculation is provided, including: a transceiver unit configured to obtain the parameters of the AI model whose AI calculation is processed by a second computing unit, the AI model including one or more first processing layers, where the following verification processing is performed for each of the one or more first processing layers to obtain the check mark bit of each first processing layer: the transceiver unit is further configured to obtain the input data of the first processing layer from the second computing unit; a processing unit configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check mark bit of the first processing layer, where the calculation amount of the verification processing on the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer; the processing unit is further configured to determine, based on the verification result, whether the output result of the AI calculation of the second computing unit is correct, the verification result including the check mark bit of each of the one or more first processing layers.
  • the AI model further includes one or more second processing layers
  • the processing unit is further configured to: perform a redundancy check on each of the one or more second processing layers to obtain the check mark bit of each of the one or more second processing layers; the verification result further includes the check mark bit of each of the one or more second processing layers.
  • the parameters of the AI model include a weight matrix
  • the input data of the first processing layer includes a feature map matrix
  • the processing unit is specifically configured to: obtain a first check mark bit by performing a first check calculation on the weight matrix; obtain a second check mark bit by performing a second check calculation on the feature map matrix; obtain a pre-calculation check mark bit from the first check mark bit and the second check mark bit; obtain the output matrix from the second computing unit, where the output matrix is calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; perform a third check calculation on the output matrix to obtain a post-calculation check mark bit; and obtain the check mark bit from the pre-calculation check mark bit and the post-calculation check mark bit.
  • the check mark bit indicates whether the pre-calculation check mark bit is consistent with the post-calculation check mark bit, and the processing unit is specifically configured to: determine that the output result is incorrect if at least one check mark bit in the verification result indicates that the pre-calculation check mark bit and the post-calculation check mark bit are inconsistent.
  • the first processing layer is a convolutional layer or a fully connected layer.
  • the state of the second computing unit includes transient failure and permanent failure.
  • the processing unit is further configured to: determine whether the state of the second computing unit is a transient failure or a permanent failure by running a self-test library.
  • the transceiver unit is further configured to: report the failure status of the second computing unit when the status of the second computing unit is permanent failure.
  • A verification apparatus for AI calculation is provided, including: a transceiver unit configured to obtain a verification result for the output result of the AI model whose AI calculation is processed by the second computing unit, the verification result being that the output result is judged incorrect; and a processing unit configured to run a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
  • In a possible implementation, when the result of running the self-test library is no fault, the state of the second computing unit is a transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is a permanent failure.
  • the apparatus is further configured to: when the state of the second computing unit is a transient failure, discard the output result; when the state of the second computing unit is a permanent failure, report the failure state of the second computing unit.
  • A chip is provided, including a first computing unit, and the first computing unit is configured to execute the method in any one of the possible implementations of the first aspect and the second aspect above.
  • the chip further includes a second calculation unit, and the second calculation unit is configured to perform AI calculation.
  • A computer-readable medium is provided, which stores program code; when the program code runs on a computer, the computer executes the method in any one of the possible implementations of the first aspect and the second aspect above.
  • A computing device is provided, including a first computing unit and a second computing unit; the second computing unit is used to process AI calculation based on an AI model, and the first computing unit performs the method in any one of the possible implementations of the first aspect and the second aspect above.
  • the processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit.
  • the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
  • Fig. 1 is a schematic diagram of a verification method that periodically runs a self-test library according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a system architecture of a possible application of the verification method of AI calculation according to the embodiment of the present application;
  • Fig. 3 is a schematic diagram of hardware units that may be involved when the verification method of AI calculation according to the embodiment of the present application is applied to the intelligent driving computing platform;
  • FIG. 4 is a system architecture diagram of the application of the verification method of AI calculation in the embodiment of the present application.
  • FIG. 5 is a schematic flow chart of a verification method for AI calculation according to an embodiment of the present application.
  • Fig. 6 is a schematic diagram of calculating the off-line check mark bit of the embodiment of the present application.
  • FIG. 7 is a schematic diagram of a check mark bit of a calculation feature map according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a check mark before calculation in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of processing data by the AI model of the embodiment of the present application.
  • Fig. 10 is a schematic diagram of the calculated check mark bit in the embodiment of the present application.
  • FIG. 11 is a schematic diagram of calculating a checkmark bit in an embodiment of the present application.
  • Fig. 12 is a schematic diagram of further detecting the specific failure status of the second computing unit after detecting the failure of the second computing unit according to the embodiment of the present application;
  • Fig. 13 is a schematic diagram of judging the specific failure state of the second computing unit according to the embodiment of the present application.
  • FIG. 14 is a schematic flowchart of another AI calculation verification method according to the embodiment of the present application.
  • Fig. 15 is a schematic block diagram of the verification device for AI calculation provided by the embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an AI calculation verification device according to an embodiment of the present application.
  • Neural network inference belongs to AI calculation, and it is necessary to ensure the correctness of its results. For example, in an intelligent driving scenario, a wrong decision made on the basis of a wrong calculation result output by the neural network would bring great danger to driving. Therefore, to ensure the correctness of neural network inference results, several verification methods have been proposed in the industry, including redundancy checking and periodic self-test library (STL) checking.
  • Redundancy checks include dual modular redundancy (DMR), triple modular redundancy (TMR), and the like, which use two or three identical computing chips or computing units to perform the same calculation at the same time and then compare the results. If the results are consistent, the calculation is considered correct; if they are inconsistent, it is wrong. Redundancy checking can in theory verify the output of the neural network, but it doubles or triples the cost.
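  • DMR and TMR as described above reduce to a comparison and a majority vote. The sketch below is illustrative (names are ours) and assumes the compared results are hashable values:

```python
from collections import Counter

def dmr(result_a, result_b):
    """Dual modular redundancy: two identical units, same input.
    A mismatch detects an error but cannot say which unit failed."""
    return result_a if result_a == result_b else None  # None -> mismatch detected

def tmr(results):
    """Triple modular redundancy: three identical units; a majority vote
    masks a single faulty unit."""
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= 2 else None               # None -> no majority
```

Either way the verified computation is executed two or three times in full, which is exactly the doubled or tripled cost noted above.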
  • Figure 1(a) is a schematic diagram of an AI core in an AI chip performing four AI calculations sequentially in chronological order. Each AI calculation runs a neural network inference and obtains an output result. To ensure the accuracy of the output results, the calculation results need to be checked.
  • Figure 1(b) shows the STL detection method applied to the four AI calculations, where the STL detection is periodic: an STL test is performed after each AI calculation, and the next AI calculation proceeds only when the STL test result is normal.
  • Figure 1(c) shows the situation where STL detection finds a hardware failure: in the third detection cycle, when STL detection finds that the AI chip has failed permanently, the AI chip no longer performs AI calculations and a fault is reported.
  • STL detection is performed periodically, and the detection cycle is fixed, but the AI calculation time is uncertain.
  • For example, if the preset detection cycle is 10 milliseconds but a given AI calculation takes more than 10 milliseconds, the AI calculation may be interrupted to run the STL test, delaying decision-making; since AI calculation must stop while the STL test runs, real-time decision-making cannot be guaranteed. Conversely, if the AI calculation takes less than 10 milliseconds, it completes within one detection cycle.
  • Moreover, the AI chip failure was detected only in the third detection cycle; it may have occurred during the first AI calculation of that cycle, but STL detection did not catch it in time, so erroneous calculation results and the wrong decisions based on them may threaten the safety of the equipment.
  • STL detection mainly identifies permanent failures and is weak at detecting transient failures: if a transient failure occurs, STL detection will also judge it to be a permanent failure and report a fault.
  • AI chips contain a large number of multiply-accumulate units, so the chip area is relatively large and the probability of being struck by high-energy particles is, in theory, also high. The probability of a transient failure of an AI chip is therefore higher than that of a permanent failure; when STL detection judges a transient failure to be a permanent failure and reports a fault, the system disables the reported AI chip, wasting hardware resources and reducing system availability.
  • In view of this, the embodiments of the present application provide a verification method for AI calculation that can be executed by a computing unit other than the one processing the AI calculation, without affecting that processing; the verification and the neural network's inference calculation proceed in parallel, ensuring that the calculation results are both correct and timely; and the performance requirements on the computing unit used for verification are low, which reduces hardware cost accordingly and helps guarantee the reliability of the AI chip.
  • The verification method of AI calculation in the embodiments of the present application can be applied to any neural network inference scenario, including scenarios with high security requirements. For example, in 5G smart industry, massive industrial data is analyzed, processed and optimized through AI neural networks, which requires more secure and reliable neural network inference; the verification method of the embodiments of this application can therefore be applied to 5G smart industry, for example in control equipment and servers.
  • Another scenario is smart devices equipped with low-cost graphics processing units. Such graphics processing units have low process requirements, so their failure rate is high, and they usually have no error detection capability; at the same time, the devices that carry them must process large amounts of image data, and the processing is mainly based on matrix calculations. As a low-cost verification method, the verification method of the embodiments of this application can be applied to smart devices equipped with such graphics processing units, reducing the impact of hardware failures on data processing and improving user experience.
  • FIG. 2 shows hardware units that may be involved in the application of the verification method for AI calculation according to the embodiment of the present application, including AI chips, CPU chips, memory chips, buses, and GPUs.
  • Neural network inference is performed by the AI chip, which can include one or more AI cores; AI calculation can be performed by each AI core in the AI chip. The AI core is the smallest computing unit, so it can also be called an AI computing unit. The on-chip storage unit is used to cache the parameters, intermediate results and inference results of the AI model. Each AI core is connected to the on-chip storage unit through a bus.
  • the CPU and the GPU may also respectively include one or more computing units.
  • The verification method of the present application verifies each calculation of the neural network; each verification can be performed by an AI core of the AI chip, or by a computing unit in the CPU chip or the GPU. The verification result is stored in the memory chip; finally, the CPU chip reads the verification result from the memory chip and makes a logical judgment to determine whether the calculation result of the neural network is correct.
  • the AI chip, CPU chip, memory chip, and GPU are connected through a bus.
  • Fig. 3 shows the system architecture of an application of the verification method of AI calculation according to the embodiment of the present application.
  • the data acquired by sensors such as ultrasonic radar is calculated by the intelligent driving computing platform to generate a series of execution instructions and send them to specific actuators for execution.
  • For example, the brake actuator controls vehicle deceleration. If the calculation result of the intelligent driving computing platform is wrong and a wrong execution command is generated, executing that command will cause driving danger. It is therefore particularly necessary to verify the neural network inference in the intelligent driving computing platform.
• The AI calculation verification method can be applied to the intelligent driving computing platform in FIG. 3 to verify the inference calculation of the neural network therein, ensuring the correctness of the calculation results and the safety of intelligent driving.
  • the intelligent driving computing platform in FIG. 3 includes the hardware units shown in FIG. 2 .
• FIG. 4 shows a system architecture diagram of an application of the AI calculation verification method in the embodiment of the present application. The AI calculation verification method in the embodiment of the present application is a heterogeneous parallel verification method, in contrast to traditional redundancy check calculation.
• Redundant verification performs the same calculation as the verified process and compares the calculation results to determine whether the verified process is correct. For example, in addition to the AI computing unit being verified, another AI computing unit used for verification performs the same convolution, pooling, activation, and fully connected calculations on the input data to output a calculation result. By comparing the calculation results of the two AI computing units, it is determined whether the processing of the verified AI computing unit is correct.
• In contrast, the AI calculation verification method in the embodiment of the present application is a heterogeneous parallel verification method. The heterogeneous method uses calculation processing different from the verified calculation for verification: different verification methods are designed for different processing such as convolution, full connection, pooling, and normalization, so the calculation is heterogeneous. Since the calculation is heterogeneous, the computing unit used for verification can also differ from the AI computing unit to be verified; the two can be hardware with different structures and capabilities.
  • the heterogeneous parallel verification may be a calculation process different from the AI calculation to be verified.
  • the computing unit used for verification may be the same as the AI computing unit to be verified.
• Alternatively, heterogeneous parallel verification may both adopt calculation processing different from that of the AI calculation to be verified and use a verification computing unit whose structure or processing capability differs from that of the AI computing unit to be verified.
  • the heterogeneous parallel verification of AI calculations can be performed by AI computing units, or by other computing units.
• For ease of description, the computing units that perform verification of AI inference calculations are all denoted as heterogeneous computing units. The computing power of a heterogeneous computing unit can be the same as, or lower than, that of the computing unit that performs the AI inference calculation. Calculation verification consumes far fewer computing resources than the AI calculation itself, so it can be executed on ordinary computing units with low computing power, such as GPU computing units or CPUs with small computing power, thereby lowering the threshold of verification and saving cost.
• Parallelism means that each calculation in the neural network is verified during the AI calculation process to generate a check flag bit. Each calculation of the neural network here refers to the calculation performed by each processing layer of the neural network when processing data. When the neural network calculation is completed and the calculation result is output, whether the calculation result is correct can be judged according to all the check flag bits, without stopping the AI calculation. The check is therefore real-time and meets the real-time requirements for neural network inference calculation in application scenarios.
  • the method for verifying the AI calculation in the embodiment of the present application will be specifically introduced below with reference to FIG. 4 .
  • the AI computing unit normally executes the inference calculation of the neural network.
• The data to be processed is input into the AI computing unit; after preprocessing, the data is input into the AI core in the form of tensors, and the neural network model is also loaded into the AI core.
  • Neural network models have different structures, and the data to be processed is processed by multiple processing layers in the neural network model to obtain inference results and output.
• The processing layers in a neural network model can be divided by structure into convolutional layers, pooling layers, activation layers, and fully connected layers.
• A neural network model can contain multiple processing layers, for example, one or more convolutional layers, one or more pooling layers, one or more activation layers, and one or more fully connected layers. Among them, the calculations of the convolutional layer and the fully connected layer reduce to matrix multiplication. When the matrix dimension is high, the amount of calculation introduced by the matrix multiplication operation is large and consumes more computing resources.
  • Figure 4 illustrates a neural network model with a typical structure as an example.
• The neural network model includes a convolutional layer, a pooling layer, an activation layer, and a fully connected layer. It should be understood that the AI calculation verification method in the embodiment of the present application is not limited to the neural network model structure in FIG. 4; it is also applicable to neural network models of other structures, and each structure may also have multiple identical processing layers.
• The preprocessed data is calculated by the convolutional layer, pooling layer, activation layer, and fully connected layer to obtain the output calculation result; the AI core then passes the output calculation result to the CPU, and the CPU executes the corresponding decision based on the calculation result.
  • the AI calculation verification method of the embodiment of the present application can verify the inference calculation of the neural network to ensure the correctness of the output calculation results and avoid serious consequences caused by the CPU executing wrong decisions based on wrong calculation results.
• The heterogeneous computing unit synchronously verifies one or more calculations of the neural network in the AI core according to the AI calculation verification method of the embodiment of the present application. As shown in FIG. 4, the first convolution calculation of the convolutional layer is checked to obtain check flag bit 1; since the convolutional layer may perform multiple convolution calculations, the second convolution calculation of the convolutional layer is checked to obtain check flag bit 2, the third convolution calculation of the convolutional layer is checked to obtain check flag bit 3, and so on.
• When the neural network completes its calculation, the synchronous verification of the one or more calculations in the neural network is also completed, and one or more check flag bits are generated and stored in the memory.
• The output result is verified through the one or more check flag bits; according to them, it can be determined whether the calculation result output by the neural network is correct.
• The AI calculation verification method in the embodiment of the present application designs different verification methods for different structures inside the neural network. As shown in FIG. 4, for processing layers such as the convolutional layer and the fully connected layer, which consume more computing resources due to matrix multiplication operations, a matrix-calculation verification method is used, such as dimension-reduced matrix calculations for verification, including multiplication operations between vectors and matrices, or between vectors. The amount of calculation of such dimension-reduced matrix calculations is low compared with the matrix multiplication operations in redundancy checking, where redundancy checking refers to performing the same calculation as the corresponding processing layer. Since more than 99% of the calculation in a neural network comes from matrix multiplication operations in structures such as convolutional layers and fully connected layers, and the remaining less than 1% comes from structures such as pooling layers and activation layers, checking the matrix multiplication operations of the convolutional and fully connected layers, which consume most of the computing resources, in a low-computation way can greatly reduce the amount of calculation required to verify the neural network inference calculation.
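To make the resource argument concrete, the sketch below compares rough multiply-accumulate (MAC) counts for a redundant re-execution of a matrix multiply against a checksum-style (dimension-reduced) check. The dimensions and the exact checksum cost model are made-up example values, not figures from the present application.

```python
# Redundant check of C = A @ B (A: m x k, B: k x n) repeats the full matrix
# multiply: roughly m*k*n multiply-accumulate operations.
# A checksum-style check only needs vector-matrix products:
#   ones(1,m) @ A   -> about m*k MACs
#   (ones @ A) @ B  -> about k*n MACs
#   ones(1,m) @ C   -> about m*n MACs (plus a final scalar comparison)
m, k, n = 512, 512, 512

redundant_macs = m * k * n
checksum_macs = m * k + k * n + m * n

print(f"redundant check : {redundant_macs:,} MACs")
print(f"checksum check  : {checksum_macs:,} MACs")
print(f"ratio           : {redundant_macs / checksum_macs:.0f}x")
```

Even for these modest dimensions, the checksum check is two orders of magnitude cheaper, which is why it can run on a low-power computing unit.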
• For structures with a small amount of calculation, such as the pooling layer and the activation layer, redundancy checks can still be used to simplify the check processing, or checks with a small amount of calculation can be used to further reduce the calculation amount of verifying the neural network inference calculation.
• After the neural network outputs the calculation result, the CPU reads the one or more check flag bits from the memory and judges whether the calculation result output by the neural network is correct according to them. If the calculation result is determined to be correct, the corresponding decision is executed; if the calculation result is determined to be incorrect, a fault is reported.
  • FIG. 5 shows a schematic flow chart of the verification method for AI calculation according to the embodiment of the present application. As shown in FIG. 5 , it includes steps 501 to 504, which will be introduced respectively below.
  • the second calculation unit is an AI core that executes neural network inference calculations.
  • the verification method of AI calculation shown in FIG. 5 is executed by the first calculation unit.
  • the first calculation unit may not be the same calculation unit as the second calculation unit.
• The first computing unit can be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip; the other computing unit may or may not be designed for AI acceleration.
  • the first computing unit may also be a computing unit on another AI chip.
  • the first computing unit may also be a CPU core in a CPU chip.
  • the CPU core is the smallest computing unit in a CPU chip.
  • the first calculation unit and the second calculation unit may also be the same calculation unit, that is, the calculation unit not only performs AI calculation, but also executes the verification method of AI calculation in the embodiment of the present application at the same time.
• The required AI model is loaded into the second computing unit. The AI model may include one or more processing layers for processing the data input to the AI model, among which the first processing layer consumes relatively large computing resources. The AI model may include one or more first processing layers.
  • the AI model may include a convolutional layer and a fully connected layer, and the AI model may also include multiple convolutional layers and multiple fully connected layers.
  • the first processing layer may also be a pooling layer or an activation layer.
  • the first calculation unit may acquire parameters of the AI model from the second calculation unit, and the parameters of the AI model include weights, biases, etc., for example, may be a weight matrix.
  • the parameters of the AI model can be saved in the system memory after offline training.
  • the first calculation unit may also acquire parameters of the AI model from the system memory.
  • Steps S502 to S503 are performed on each first processing layer in the one or more first processing layers:
• The first computing unit obtains the input data of the first processing layer from the second computing unit. Still taking the convolutional neural network model with a typical structure in S501 processing image data as an example: when the first processing layer is a convolutional layer, the input data of the first processing layer is the preprocessed image data; when the first processing layer is a fully connected layer, the input data of the first processing layer is the output data of the activation layer.
• In the first processing layer, the data processing is converted into matrix calculation, so the parameters of the AI model obtained above include the weight matrix of the AI model, and the obtained input data of the first processing layer includes a feature map matrix.
• The weight matrix includes multiple row vectors or multiple column vectors, the feature map matrix includes multiple row vectors or multiple column vectors, and each vector includes multiple elements.
  • the dimension of the matrix can be the number of rows or columns of the matrix. The higher the dimension, the higher the complexity of matrix calculation and the higher the consumption of computing resources.
• The AI calculation verification method in the embodiment of the present application uses dimension-reduced matrix calculation to verify the calculation of the first processing layer and obtain the check flag bit of the first processing layer; that is, the verification uses one or more matrices with fewer rows or columns than the weight matrix or the feature map matrix. The specific process is as follows:
• The first computing unit performs a first check calculation on the weight matrix to obtain a first check flag bit. Since the parameters of the AI model are stored in the system memory after offline training, the check calculation on the AI model parameters that participate in data processing can be performed offline; that is, the first check flag bit can be an offline check flag bit.
• The first check calculation may use a matrix that is dimension-reduced relative to the feature map matrix, that is, a matrix whose number of rows or columns is smaller than that of the feature map matrix, and perform matrix multiplication with the weight matrix to obtain the first check flag bit. As shown in FIG. 6, a matrix multiplication operation is performed on an all-1 row vector and the weight matrix of the AI model, so as to obtain the offline check flag bit (offline checkbit, OC) for the weight matrix.
• The calculated offline check flag bit OC can be stored in the memory and read for use in the subsequent check process.
• It should be noted that the matrix multiplication operation in the embodiment of the present application should satisfy that the number of columns of the left matrix equals the number of rows of the right matrix; that is, the number of columns of the all-1 vector in FIG. 6 should equal the number of rows of the weight matrix, and the other matrix multiplication operations in the embodiments of the present application should also satisfy this rule.
• The all-1 row vector has only one row, which is smaller than the number of rows of the feature map matrix, so the matrix operation of the all-1 row vector with the weight matrix is a dimension-reduced matrix calculation compared with the matrix calculation of the weight matrix and the feature map matrix. It should be noted that this is only an example and is not limited thereto.
• Still taking the convolutional neural network model with a typical structure in S501 processing image data as an example, as shown in FIG. 7, after each frame of image is input into the second computing unit, it is converted into a feature map matrix. The first computing unit performs a second check calculation on the feature map matrix to obtain a second check flag bit, which may also be called the feature map check flag bit (checkbit feature map, CF).
• The second check calculation may use a matrix that is dimension-reduced relative to the weight matrix, that is, a matrix whose number of rows or columns is smaller than that of the weight matrix, and perform matrix calculation with the feature map matrix to obtain the second check flag bit.
• A matrix multiplication operation is performed on the feature map matrix and an all-1 column vector to obtain the feature map check flag bit.
• The all-1 column vector has only one column, which is smaller than the number of columns of the weight matrix, so the matrix operation of the all-1 column vector with the feature map matrix is a dimension-reduced matrix calculation relative to the matrix calculation of the weight matrix and the feature map matrix. It should be noted that this is only an example and is not limited thereto.
• The first computing unit obtains the pre-calculation check flag bit according to the first check flag bit (offline check flag bit OC) and the second check flag bit (feature map check flag bit CF). As shown in FIG. 8, a vector multiplication operation is performed on the offline check flag bit OC and the feature map check flag bit CF to obtain the pre-calculation check flag bit (check bit in, CB_in).
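As a concrete illustration of the checksum steps described above, the sketch below computes OC, CF, and CB_in for a toy layer, assuming the layer's processing is expressed as the matrix product Output = W × F (im2col-style). The matrices, dimensions, and the `matmul` helper are invented for the example; they are not the exact implementation of the present application.

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W = [[1, 2, 3],
     [4, 5, 6]]                 # weight matrix (2 x 3), assumed from training
F = [[1, 0],
     [2, 1],
     [0, 3]]                    # feature map matrix (3 x 2), assumed input

ones_row = [[1] * len(W)]       # all-1 row vector (1 x 2)
ones_col = [[1] for _ in F[0]]  # all-1 column vector (2 x 1)

OC = matmul(ones_row, W)        # offline check flag bit: column sums of W
CF = matmul(F, ones_col)        # feature map check flag bit: row sums of F
CB_in = matmul(OC, CF)[0][0]    # pre-calculation check flag bit (a scalar)

print("OC    =", OC)
print("CF    =", CF)
print("CB_in =", CB_in)
```

Note that CB_in equals the sum of all entries of W × F by associativity, which is what makes the later comparison with the post-calculation check flag bit meaningful.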
  • FIG. 9 shows that the second calculation unit performs convolution operation on the weight matrix and the feature map matrix to obtain the output matrix.
  • This calculation process is the normal data processing process of the first processing layer in the AI model, not the verification process.
• The convolution operation is performed by a convolution operator, and a convolutional layer can include many convolution operators. The convolution operator is also called a convolution kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
• The convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix usually slides over the input image one pixel at a time (or two pixels at a time, depending on the value of the stride) along the horizontal direction, so as to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image.
• The depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends across the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases, instead of a single weight matrix, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The output of each weight matrix is stacked to form the depth dimension of the convolution image, where the dimension can be understood as determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features of the image.
• For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image.
• The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps they extract also have the same size; the extracted feature maps of the same size are then combined to form the output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network can make correct predictions.
• The first computing unit obtains the output matrix from the second computing unit, and then performs a third check calculation (also a matrix multiplication calculation) on the output matrix to obtain the post-calculation check flag bit.
• A matrix multiplication operation is performed using the all-1 matrix and the output matrix, where the all-1 matrix should satisfy that its number of columns equals the number of rows of the output matrix before the matrix multiplication can be performed, thereby obtaining the post-calculation check flag bit (check bit out, CB_out).
• The first computing unit obtains the check flag bit (check bit, CB) according to the pre-calculation check flag bit CB_in and the post-calculation check flag bit CB_out. The check flag bit can be obtained by subtracting CB_out from CB_in, or by dividing CB_in by CB_out; the purpose is to compare the difference between CB_in and CB_out.
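The end-to-end check can be sketched as follows, again assuming the layer's processing is the matrix product Output = W × F and using subtraction to form CB. The matrices, the fault-injection step, and the helper function are invented for illustration; the sketch shows only that CB is zero for a correct computation and non-zero when the output is corrupted.

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W = [[1, 2], [3, 4]]                  # weight matrix (illustrative)
F = [[5, 6], [7, 8]]                  # feature map matrix (illustrative)
ones_row = [[1, 1]]
ones_col = [[1], [1]]

# Pre-calculation check flag bit from the checksums of W and F.
CB_in = matmul(matmul(ones_row, W), matmul(F, ones_col))[0][0]

# Normal processing by the verified computing unit.
output = matmul(W, F)

# Post-calculation check flag bit from the actual output matrix.
CB_out = matmul(matmul(ones_row, output), ones_col)[0][0]
print("fault-free CB =", CB_in - CB_out)      # zero: calculation correct

output[0][1] += 1                             # inject a bit-flip-style fault
CB_out_bad = matmul(matmul(ones_row, output), ones_col)[0][0]
print("faulty CB     =", CB_in - CB_out_bad)  # non-zero: fault detected
```

With floating-point arithmetic a tolerance comparison (|CB| below a small threshold) would be used instead of an exact zero test.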
• Corresponding to FIG. 4, the matrix calculation verification of FIG. 6 to FIG. 11 is adopted for the convolutional layer and the fully connected layer, and the obtained check flag bit CB corresponds to check flag bit 1 and check flag bit 4c in FIG. 4.
• The check flag bit indicates whether the pre-calculation check flag bit is consistent with the post-calculation check flag bit. If at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, the output calculation result is incorrect.
• Taking the check flag bit CB obtained by subtracting CB_out from CB_in as an example: since CB_in is obtained from the offline check flag bit OC and the feature map check flag bit CF, it represents the theoretical matrix-calculation check result of the output matrix, while CB_out is the actual matrix-calculation check result of the output matrix. If CB is 0, then CB_in and CB_out are the same, indicating that the theoretical check result of the output matrix matches the actual check result.
• Through the above process, one or more check flag bits of the first processing layer can be obtained, denoted as the check result. If at least one check flag bit in the check result indicates that the pre-calculation check flag bit is inconsistent with the post-calculation check flag bit, for example, at least one of the check flag bits of the first processing layer is non-zero, it means that the first processing layer corresponding to that check flag bit made an error in the calculation process based on the weight matrix and the feature map matrix, and the calculation result output by the AI model is incorrect. If all check flag bits in the check result indicate that the pre-calculation check flag bit is consistent with the post-calculation check flag bit, for example, the check flag bits of the one or more first processing layers are all 0, it means that the one or more first processing layers made no error in the calculation process based on the weight matrix and the feature map matrix, and the final calculation result output by the AI model is correct.
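The judgment rule above can be sketched in a few lines, assuming the check flag bits are collected as numbers where zero means "pre- and post-calculation checksums consistent" (the function name and list layout are invented for illustration):

```python
def output_is_correct(check_flag_bits):
    """Accept the AI model's output only if every check flag bit is zero."""
    # Any non-zero flag means a pre-/post-calculation mismatch in some layer.
    return all(cb == 0 for cb in check_flag_bits)

print(output_is_correct([0, 0, 0, 0]))   # all layers checked out
print(output_is_correct([0, 0, -3, 0]))  # one layer mismatched -> fault
```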
• The check flag bits of the above one or more first processing layers are stored in the memory. After the AI model outputs the calculation result, the CPU reads the check flag bits of the one or more first processing layers from the memory, and then judges whether the calculation result output by the AI model is correct according to those check flag bits. If the calculation result output by the AI model is judged to be correct, the corresponding decision is executed; for example, the intelligent driving computing platform in FIG. 3 generates a series of execution instructions according to the calculation result, and the actuators control the steering, acceleration, deceleration, etc. of the vehicle according to the execution instructions. If the calculation result output by the AI model is judged to be incorrect, a failure of the second computing unit is reported.
• The dimension-reduced matrix calculation verification shown in FIG. 6 to FIG. 11 (hereinafter referred to as matrix calculation verification) can be used to verify the calculations of the convolutional layer and the fully connected layer, because convolution calculations and fully connected calculations are converted into matrix calculations, for which the dimension-reduced matrix calculation method is applicable for verification.
• For calculations that are not converted into matrix calculations, such as those of the pooling layer and the activation layer (denoted as second processing layers), the AI calculation verification method in the embodiment of the present application further performs a redundancy check on each of the one or more second processing layers to obtain the check flag bit of that second processing layer, so that the above verification result also includes the check flag bits of the one or more second processing layers. When judging the calculation result output by the AI model, the check flag bits of the one or more first processing layers and of the one or more second processing layers must be considered together for a comprehensive judgment.
• The redundancy check performs the same calculations of the one or more second processing layers on another chip or another computing unit using the same input data and the same AI model, and then compares whether the original calculation results of the one or more second processing layers are consistent with the redundant calculation results, so as to obtain the check flag bits of the one or more second processing layers. The check flag bit of a second processing layer can be the difference or quotient between its original calculation result and its redundant calculation result.
• If the original calculation results of the one or more second processing layers are consistent with the redundant calculation results, the second processing layers made no error when processing the data. If they are inconsistent, one or more second processing layers made an error when processing the data, which leads to an error in the final calculation result output by the AI model. Since pooling calculations and activation calculations occupy only a very small portion of resources, even using redundancy checks does not consume excessive resources. When the original calculation result is inconsistent with the redundant calculation result, it is determined that the error occurred when the second processing layer processed the data, rather than in the redundant calculation.
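A minimal sketch of such a redundancy check, using a toy one-dimensional max-pooling step as the second processing layer; the pooling function, window size, and data are invented for illustration, and the difference form of the check flag bit is used:

```python
def max_pool_1d(xs, window=2):
    """Toy pooling layer: max over non-overlapping windows."""
    return [max(xs[i:i + window]) for i in range(0, len(xs), window)]

feature = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]

original = max_pool_1d(feature)    # computed by the verified computing unit
redundant = max_pool_1d(feature)   # recomputed by the checking unit

# Check flag bits: element-wise difference (a quotient would also work).
flags = [o - r for o, r in zip(original, redundant)]
print("flags =", flags)            # all zero -> no error in this layer
```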
• If the original calculation is performed by an ordinary chip or ordinary computing unit and the redundant calculation by a high-reliability chip or high-reliability computing unit, then when the calculation results are inconsistent, the error is considered to have occurred in the original calculation: the chip or computing unit performing the redundant calculation is more reliable and less likely to err, while the one performing the original calculation is less reliable and more likely to err.
• If the output result of the AI model is incorrect, the AI chip where the second computing unit is located has failed; the failure may be transient or permanent. A transient failure affects only the current calculation, not subsequent calculations, while a permanent failure affects all calculations after it occurs. If transient and permanent failures are not distinguished, then after fault reporting, according to the system's safety mechanism, the second computing unit detected as failed is directly disabled; even a second computing unit with only a transient failure, which could continue to be used, is disabled, resulting in wasted resources and reduced system availability.
• Therefore, the AI calculation verification method in the embodiment of the present application further includes: when the output result of the AI model is judged to be incorrect according to the above method, running the self-test library to determine whether the second computing unit has a transient failure or a permanent failure.
• While the second computing unit performs AI calculation, the first computing unit simultaneously performs heterogeneous parallel verification. The self-test library is run to judge whether the second computing unit has a transient failure or a permanent failure: if the second computing unit has a transient failure, the output result of the AI model is discarded and the second computing unit continues to perform AI calculations; if the second computing unit has a permanent failure, the failure of the second computing unit is reported.
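The failure-state decision just described can be sketched as follows; `run_self_test` is a hypothetical stand-in for invoking the real self-test library, and the returned strings are illustrative labels, not the patent's interfaces:

```python
def classify_failure(verification_ok, run_self_test):
    """Decide how to handle the unit after the parallel verification result."""
    if verification_ok:
        return "no_failure"               # keep the output, keep computing
    if run_self_test():                   # self-test finds a hardware fault
        return "permanent: report failure, disable the unit"
    return "transient: discard this output, keep using the unit"

print(classify_failure(True, lambda: False))
print(classify_failure(False, lambda: False))  # transient failure path
print(classify_failure(False, lambda: True))   # permanent failure path
```

The design point is that the (relatively expensive) self-test runs only after verification has already flagged a wrong output, so it does not burden the normal inference path.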
• FIG. 13 shows a schematic diagram of judging the specific failure state of the second computing unit by the AI calculation verification method according to the embodiment of the present application.
• The CPU judges whether the output result of the AI model is wrong according to the multiple check flag bits in the memory; for the specific judgment method, refer to the above description of FIG. 5, which is not repeated here. If the output result of the AI model is judged to be wrong, the hardware has failed, and the self-test library is run to further judge whether the hardware has a permanent failure.
• The self-test library (STL) is mainly used to identify permanent failures; therefore, the STL is called to perform a self-test on the failed hardware.
• If a fault is found in the self-test, the second computing unit has a permanent failure; it cannot continue to participate in calculation, and the fault must be reported. If no fault is found in the self-test, the second computing unit has a transient failure, which does not affect subsequent calculations; it is only necessary to discard this erroneous output result, and the second computing unit can continue to perform subsequent calculations.
• In summary, the AI calculation verification method of the embodiment of the present application uses heterogeneous parallel verification to verify the calculation of the AI model. The verification is performed by computing units other than those performing the AI calculation, so it does not interfere with the normal progress of the AI model's inference calculation and does not affect the acceleration performance of the AI core; it also avoids performing verification on the same AI core that is executing the inference. This ensures the correctness of the AI model's output results while maintaining the real-time performance of the AI model's inference calculation.
• In addition, the AI calculation verification method in the embodiment of the present application designs different verification methods for different processing layers in the AI model, which saves computing resources to the greatest extent, so that verification can be performed on computing units with low computing power, reducing the cost of verification.
  • The AI-calculation verification method of the embodiments of the present application can further judge the specific failure state of the hardware after a hardware failure is determined. If the computing unit has only failed transiently, it can continue to be used, thereby avoiding a waste of resources and improving the availability of the AI chip.
  • FIG. 14 shows a schematic flowchart of another AI-calculation verification method according to an embodiment of the present application; as shown in FIG. 14, the method includes step 1401 and step 1402.
  • The method in FIG. 14 is executed by the first computing unit, which may be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip.
  • the other computing unit may or may not be designed for AI acceleration
  • the first computing unit may also be a computing unit on another AI chip
  • The first computing unit may also be a CPU core in a CPU chip, where the CPU core is the smallest computing unit in the CPU chip.
  • The first computing unit may also be the same computing unit as the second computing unit; that is, the computing unit both performs the AI calculation and executes the AI-calculation verification method of the embodiments of the present application.
  • The verification result of the output result of the AI model is obtained from the second computing unit. The verification result may be obtained by the method shown in FIG. 5 or by any existing verification method.
  • When the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
  • When the state of the second computing unit is transient failure, the output result is discarded; when the state of the second computing unit is permanent failure, the failure state of the second computing unit is reported.
  • The AI-calculation verification method of the embodiments of the present application uses the CPU to perform system scheduling and calls the STL to perform a self-test on an AI core with a hardware failure, judging whether the AI core has failed permanently or transiently. If no fault is found by the self-test, the AI core has only failed transiently, which does not affect subsequent calculation, and the AI core can continue to participate in system operation. If a fault is found by the self-test, the AI core has failed permanently, cannot continue to participate in calculation, and a fault report is required. This avoids directly deactivating a failed AI core, reduces the waste of resources, and improves the availability of the AI chip.
  • The AI-calculation verification apparatus may be the above-mentioned first computing unit, which is used to perform the verification of the AI calculation. It should be understood that the descriptions of the apparatus embodiments correspond to the descriptions of the method embodiments; therefore, for details not described in full, reference may be made to the method embodiments above. For brevity, details are not repeated here.
  • FIG. 15 is a schematic block diagram of an AI-calculation verification apparatus provided by an embodiment of the present application; the apparatus 1500 may specifically be a chip, an intelligent driving hardware platform, or the like.
  • the device 1500 includes a transceiver module 1510 and a processing module 1520 .
  • the transceiver module 1510 can implement a corresponding communication function, and the processing module 1520 is used for data processing.
  • the transceiver module 1510 may also be called a communication interface or a communication unit.
  • The device 1500 may further include a storage module, which may be used to store instructions and/or data; the processing module 1520 may read the instructions and/or data in the storage module, so that the device implements the aforementioned method embodiments.
  • The device 1500 can be used to execute the actions in the above method embodiments; specifically, the transceiver module 1510 is used to execute the sending- and receiving-related operations in the above method embodiments, and the processing module 1520 is used to execute the processing-related operations in the above method embodiments.
  • The apparatus 1500 may implement the steps or processes corresponding to the method embodiments in the embodiments of the present application, and the apparatus 1500 may include modules for executing the methods in FIG. 5 and FIG. 14. Moreover, the modules in the apparatus 1500 and the other operations and/or functions described above are respectively intended to realize the corresponding flows of the method embodiments on the second-node side in FIG. 5 and FIG. 14.
  • the transceiver module 1510 can be used to execute steps 501 and 502 in the method 500 ; the processing module 1520 can be used to execute the processing steps 503 and 504 in the method 500 .
  • The transceiver module 1510 is configured to obtain the parameters of the AI model with which the second computing unit processes the AI calculation, the AI model including one or more first processing layers. For each of the one or more first processing layers, the following verification processing is performed to obtain a check flag bit of that first processing layer: the transceiver module 1510 is further used to obtain the input data of the first processing layer from the second computing unit; the processing module 1520 is used to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, so as to obtain the check flag bit of the first processing layer, where the calculation amount of the verification processing of the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer. The processing module 1520 is further used to determine, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct, the verification result including the check flag bit of each of the one or more first processing layers.
  • The AI model may further include one or more second processing layers; in this case, the processing module 1520 is further configured to perform a redundancy check on each of the one or more second processing layers to obtain a check flag bit of each second processing layer, and the verification result further includes the check flag bit of each of the one or more second processing layers.
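As a rough illustration of such a redundancy check, the checking unit can simply recompute a cheap second processing layer and compare with the output produced by the AI core. The function names and the choice of ReLU as the example activation layer are assumptions for illustration, not taken from the source.

```python
import numpy as np

def relu(x):
    # Example activation layer: activation and pooling layers are cheap
    # enough that full recomputation on the checking unit is affordable.
    return np.maximum(x, 0.0)

def redundancy_check(layer_fn, layer_input, layer_output):
    """Recompute the layer and compare with the AI core's output.

    Returns True (check flag bit: consistent) when the two agree.
    """
    return bool(np.allclose(layer_fn(layer_input), layer_output))
```

For example, `redundancy_check(relu, x, relu(x))` yields a consistent flag, while a corrupted output yields an inconsistent one.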
  • The parameters of the AI model include a weight matrix, and the input data of the first processing layer includes a feature map matrix. The processing module 1520 is specifically configured to: acquire a first check flag bit, obtained by performing a first check calculation on the weight matrix; acquire a second check flag bit, obtained by performing a second check calculation on the feature map matrix; obtain a pre-calculation check flag bit from the first check flag bit and the second check flag bit; obtain an output matrix from the second computing unit, the output matrix being calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; perform a third check calculation on the output matrix to obtain a post-calculation check flag bit; and obtain the check flag bit of the first processing layer from the pre-calculation check flag bit and the post-calculation check flag bit.
  • The check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent. The processing module 1520 is specifically configured to: if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, determine that the output result is incorrect.
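For a fully connected layer computing Y = W·X, one low-cost way to realize such pre- and post-calculation flag bits is a checksum scheme. The source does not specify the exact first, second, and third check calculations, so the sums below (and the combination of the first and second flags via a dot product) are an illustrative assumption:

```python
import numpy as np

def precalc_flag(W, X):
    # First check calculation: column sums of the m x k weight matrix.
    w_flag = W.sum(axis=0)            # shape (k,)
    # Second check calculation: row sums of the k x n feature-map matrix.
    x_flag = X.sum(axis=1)            # shape (k,)
    # Pre-calculation check flag: scalar 1^T (W X) 1, without forming W X.
    return float(w_flag @ x_flag)

def postcalc_flag(Y):
    # Third check calculation: total sum of the output matrix Y = W X.
    return float(Y.sum())

def layer_flag_consistent(W, X, Y, rtol=1e-6, atol=1e-6):
    """Check flag bit: True when pre- and post-calculation flags agree."""
    return bool(np.isclose(precalc_flag(W, X), postcalc_flag(Y),
                           rtol=rtol, atol=atol))
```

The check costs only sum reductions and one k-length dot product, far below the O(m·k·n) cost of the multiplication itself, which is why it can run on a computing unit with low computing power; a mismatch on any layer marks the output result incorrect.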
  • the first processing layer is a convolutional layer or a fully connected layer.
  • the state of the second computing unit includes transient failure and permanent failure.
  • the processing module 1520 is further configured to: determine that the state of the second computing unit is a transient failure or a permanent failure by running a self-check library.
  • the transceiver module 1510 is further configured to: report the failure status of the second computing unit when the status of the second computing unit is permanent failure.
  • the transceiving module 1510 can be used to execute step 1401 in the method 1400 ; the processing module 1520 can be used to execute the processing step 1402 in the method 1400 .
  • The transceiver module 1510 is used to obtain the verification result of the output result of the AI model with which the second computing unit processes the AI calculation, the verification result indicating that the output result is incorrect; the processing module 1520 is used to run the self-test library to determine that the state of the second computing unit is transient failure or permanent failure.
  • When the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
  • The device 1500 is also used to: when the state of the second computing unit is transient failure, discard the output result; when the state of the second computing unit is permanent failure, report the failure state of the second computing unit.
  • the embodiment of the present application also provides a verification device 1600 for AI calculation.
  • The apparatus 1600 shown in FIG. 16 may include: a memory 1610, a processor 1620, and a communication interface 1630.
  • The memory 1610, the processor 1620, and the communication interface 1630 are connected through an internal connection path; the memory 1610 is used to store instructions, and the processor 1620 is used to execute the instructions stored in the memory 1610 to control the communication interface 1630 to receive input samples or send a prediction result.
  • the memory 1610 may be coupled to the processor 1620 through an interface, or may be integrated with the processor 1620 .
  • The above-mentioned communication interface 1630 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1600 and other devices or communication networks.
  • The above-mentioned communication interface 1630 may also include an input/output interface.
  • Each step of the above method may be implemented by an integrated logic circuit of hardware in the processor 1620 or by instructions in the form of software.
  • the methods disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1610; the processor 1620 reads the information in the memory 1610 and completes the steps of the above method in combination with its hardware. To avoid repetition, no detailed description is given here.
  • The processor may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory may include a read-only memory and a random access memory, and provide instructions and data to the processor.
  • a portion of the processor may also include non-volatile random access memory.
  • the processor may also store device type information.
  • An embodiment of the present application further provides a chip, which is characterized in that the chip includes a first computing unit, and the first computing unit is configured to execute the above-mentioned method in FIG. 5 or FIG. 14 .
  • the chip further includes a second calculation unit, and the second calculation unit is used to perform AI calculation.
  • The embodiment also provides a computer-readable medium. The computer-readable medium stores program code, and when the program code runs on a computer, the computer executes the method in FIG. 5 or FIG. 14.
  • the embodiment also provides a computing device, including a first computing unit and a second computing unit, the second computing unit is used to process AI computing based on the AI model, and the first computing unit executes the method in FIG. 5 or FIG. 14 .
  • the processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit.
  • the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device can be components.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • A component may communicate by way of local and/or remote processes, such as based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, interacting with other systems by way of the signal).
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

The present application provides an AI computing verification method and apparatus. The method is executed by a first computing unit, and comprises: acquiring parameters of an AI model of a second computing unit processing AI computing, the AI model comprising one or more first processing layers; for each of the one or more first processing layers, acquiring input data of the first processing layer from the second computing unit; performing verification processing on the first processing layer on the basis of the parameters of the AI model and the input data of the first processing layer to obtain a verification mark bit of the first processing layer, wherein the amount of computation of the verification processing on the first processing layer is less than the amount of computation of the second computing unit processing the input data by means of the first processing layer; and determining, on the basis of the verification result, whether the output result of the second computing unit processing the AI computing is correct, the verification result comprising verification mark bits of the one or more first processing layers. The method can guarantee the correctness and real-time performance of inferential computing.

Description

AI Calculation Verification Method and Apparatus
This application claims priority to Chinese Patent Application No. 202110924997.X, filed with the China Patent Office on August 12, 2021 and entitled "AI Calculation Verification Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of artificial intelligence and, more specifically, to an AI calculation verification method and apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. Artificial intelligence has a wide range of applications: data processing in transportation, medical care, security, and other fields can be completed by AI neural networks, and the more data needs to be analyzed and processed, the greater the computational load on the neural network. Taking intelligent driving as an example, higher-level intelligent driving vehicles are often equipped with multiple cameras, lidar, ultrasonic radar, and other sensors in order to achieve comprehensive perception of the surrounding environment, which produces a large amount of information to be processed. Intelligent driving vehicles also place very high real-time requirements on neural network inference calculation: if the inference calculation lags, environmental information cannot be provided in time for subsequent planning and control decisions, reducing the safety of intelligent driving. A traditional central processing unit (CPU) cannot bear the inference calculation of such a large neural network; therefore, artificial intelligence chips are used as hardware acceleration units dedicated to neural network inference calculation. AI chips are faster and more energy-efficient than traditional chips when executing neural network inference calculation. Commonly used AI chips include graphics processing units (GPUs), field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
Intelligent driving vehicles operate in the external environment and may encounter severe weather, electromagnetic interference, and other problems, which requires the computing platform of the intelligent driving system to be extremely reliable. Traditional devices such as CPUs and memory have error detection and fault tolerance mechanisms, such as program flow monitoring, data flow monitoring, memory error checking and correction (ECC), and parity checking, to ensure that data in the CPU and memory is not affected by soft failures. To achieve high-speed computation, AI chips generally have no effective internal error detection mechanism, and because the computing architecture of AI chips differs from that of traditional chips such as CPUs, the error detection mechanisms of traditional chips cannot be directly applied to AI chips.
To ensure the safety of intelligent driving, the computing platform needs to meet the requirements of the automotive safety integrity level (ASIL), and an error detection method is needed to check AI chips in real time, so as to ensure that the application of AI chips meets the demands of the application scenario.
Summary
The present application provides an AI calculation verification method and apparatus. The verification method can be executed by a computing unit other than the one processing the AI calculation and does not affect the processing of the AI calculation. Compared with redundancy checking, the AI-calculation verification method of the embodiments of the present application has a very small amount of calculation and places low performance requirements on the computing unit used for verification, which correspondingly reduces hardware cost and guarantees the reliability of the AI chip.
In a first aspect, an AI calculation verification method is provided. The method is executed by a first computing unit and includes: obtaining parameters of an AI model with which a second computing unit processes AI calculation, the AI model including one or more first processing layers; performing the following verification processing on each of the one or more first processing layers to obtain a check flag bit of each first processing layer: obtaining input data of the first processing layer from the second computing unit; performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer, where the calculation amount of the verification processing of the first processing layer is less than the calculation amount of the second computing unit processing the input data through the first processing layer; and determining, based on a verification result, whether an output result of the AI calculation processed by the second computing unit is correct, the verification result including the check flag bit of each of the one or more first processing layers.
The AI-calculation verification method of the embodiments of the present application is executed by a computing unit other than the one executing the AI calculation. Compared with the verification method of periodically running a self-test library, the method does not interfere with the normal progress of the inference calculation of the AI model, so it does not affect the acceleration performance of the AI computing unit, and it also avoids having the same AI computing unit perform verification while executing AI calculation, thereby ensuring the correctness of the output result of the AI model while guaranteeing the real-time performance of the inference calculation. Moreover, since the calculation amount of the verification processing is less than that of the AI calculation, the performance requirements on the computing unit used for verification need not be higher than those on the computing unit used for the AI calculation. Compared with verification by a redundant computing unit, the heterogeneous verification manner provided by the embodiments of the present application can save power consumption and reduce cost.
In some possible implementations, the AI model further includes one or more second processing layers, and the method further includes: performing a redundancy check on each of the one or more second processing layers to obtain a check flag bit of each second processing layer; the verification result further includes the check flag bit of each of the one or more second processing layers.
The second processing layers include pooling layers, activation layers, and the like. Since pooling and activation calculations occupy only a very small portion of resources, even a redundancy check does not consume excessive resources.
In some possible implementations, the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer includes:
obtaining a first check flag bit, which is obtained by performing a first check calculation on the weight matrix; obtaining a second check flag bit, which is obtained by performing a second check calculation on the feature map matrix; obtaining a pre-calculation check flag bit from the first check flag bit and the second check flag bit; obtaining an output matrix from the second computing unit, the output matrix being calculated by the second computing unit from the weight matrix and the feature map matrix at the first processing layer; performing a third check calculation on the output matrix to obtain a post-calculation check flag bit; and obtaining the check flag bit from the pre-calculation check flag bit and the post-calculation check flag bit.
The above weight matrix and feature map matrix can be calculated offline and stored in memory, and the matrix calculations used to obtain the different flag bits differ. The AI-calculation verification method of the embodiments of the present application designs different verification manners for different processing layers in the AI model, saving computing resources to the greatest extent, so that verification can be performed on a computing unit with low computing power, reducing the cost of verification.
In some possible implementations, the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct includes: if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, the output result is incorrect.
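The aggregation rule stated above (the output result is incorrect as soon as any layer's check flag bit indicates inconsistency) can be sketched as follows; the helper name is hypothetical, not taken from the source.

```python
def output_result_correct(check_flag_bits):
    """check_flag_bits: iterable of booleans, one per checked layer,
    True meaning the pre- and post-calculation flags were consistent.
    The output result is correct only if every layer's flag is consistent."""
    return all(check_flag_bits)
```

A single inconsistent layer therefore suffices to flag the whole inference result as erroneous.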
In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer. The plurality of first processing layers may include a plurality of convolutional layers, or a plurality of fully connected layers, or one or more convolutional layers and one or more fully connected layers.
在某些可能的实现方式中,当判断输出结果不正确时,第二计算单元的状态包括瞬态失效和永久性失效。In some possible implementation manners, when the judgment output result is incorrect, the state of the second computing unit includes transient failure and permanent failure.
在某些可能的实现方式中,当判断输出结果不正确时,方法还包括:通过运行自检库确定第二计算单元的状态为瞬态失效或永久性失效。In some possible implementation manners, when it is judged that the output result is incorrect, the method further includes: determining that the state of the second computing unit is a transient failure or a permanent failure by running a self-check library.
在某些可能的实现方式中,方法还包括:当第二计算单元的状态为永久性失效时,上报第二计算单元的失效状态。In some possible implementation manners, the method further includes: when the status of the second computing unit is permanent failure, reporting the failure status of the second computing unit.
本申请实施例的AI计算的校验方法在确定了硬件失效后还可以进一步判断硬件的具体失效状态,若计算单元只是发生瞬态失效,则可以继续使用该计算单元,从而避免资源的浪费,提升了AI芯片的可用性。The verification method of AI calculation in the embodiment of the present application can further judge the specific failure state of the hardware after the hardware failure is determined. If the calculation unit only fails transiently, the calculation unit can continue to be used, thereby avoiding the waste of resources. Improved the availability of AI chips.
According to a second aspect, a verification method for AI calculation is provided. The method is executed by a first computing unit and includes: obtaining a verification result for the output result of an AI model with which a second computing unit processes AI calculation, where the verification result is a determination that the output result is incorrect; and running a self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
In some possible implementations, running the self-test library to determine whether the state of the second computing unit is transient failure or permanent failure includes: when the result of running the self-test library reports no fault, the state of the second computing unit is transient failure; when the result of running the self-test library reports a fault, the state of the second computing unit is permanent failure.
In some possible implementations, the method further includes: when the state of the second computing unit is transient failure, discarding the output result; when the state of the second computing unit is permanent failure, reporting the failure state of the second computing unit.
In the verification method for AI calculation of the embodiments of this application, the CPU performs system scheduling and calls the self-test library to run a self-test on the AI core that has suffered a hardware failure, determining whether the AI core has suffered a permanent failure or a transient failure. If the self-test finds no fault, the AI core has suffered a transient failure that does not affect subsequent calculations, and the AI core can continue to participate in system operations. If the self-test finds a fault, the AI core has suffered a permanent failure, cannot continue to participate in operations, and the fault must be reported. This avoids directly deactivating an AI core that has merely failed transiently, reduces the waste of resources, and improves the availability of the AI chip.
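The dispatch logic described above can be sketched as follows; `run_stl_selftest` and `report_failure` are hypothetical callbacks standing in for the self-test library and the fault-reporting path, not an actual API from the patent.

```python
def handle_failed_check(run_stl_selftest, report_failure):
    """Dispatch after a check flag mismatch has flagged the output
    of the second computing unit as incorrect."""
    if run_stl_selftest():
        # Self-test passes (no fault found): the failure was transient.
        # The erroneous output is discarded; the core may keep computing.
        return "transient"
    # Self-test finds a fault: the failure is permanent, so the core
    # stops participating in computation and the fault is reported.
    report_failure()
    return "permanent"
```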
According to a third aspect, a verification apparatus for AI calculation is provided. The apparatus includes: a transceiver unit, configured to obtain parameters of an AI model with which a second computing unit processes AI calculation, where the AI model includes one or more first processing layers, and the following verification processing is performed on each of the one or more first processing layers to obtain a check flag bit for each of the one or more first processing layers: the transceiver unit is further configured to obtain input data of the first processing layer from the second computing unit; and a processing unit, configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, to obtain the check flag bit of the first processing layer, where the computation amount of the verification processing for the first processing layer is less than the computation amount with which the second computing unit processes the input data through the first processing layer. The processing unit is further configured to determine, based on a verification result, whether the output result of the AI calculation processed by the second computing unit is correct, where the verification result includes the check flag bit of each of the one or more first processing layers.
In some possible implementations, the AI model further includes one or more second processing layers, and the processing unit is further configured to: perform a redundancy check on each of the one or more second processing layers to obtain a check flag bit for each of the one or more second processing layers, where the verification result further includes the check flag bit of each of the one or more second processing layers.
In some possible implementations, the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and the processing unit is specifically configured to: obtain a first check flag bit, where the first check flag bit is obtained by performing a first check calculation on the weight matrix; obtain a second check flag bit, where the second check flag bit is obtained by performing a second check calculation on the feature map matrix; obtain a pre-calculation check flag bit based on the first check flag bit and the second check flag bit; obtain an output matrix from the second computing unit, where the output matrix is obtained by the second computing unit performing calculation on the weight matrix and the feature map matrix at the first processing layer; perform a third check calculation on the output matrix to obtain a post-calculation check flag bit; and obtain the check flag bit based on the pre-calculation check flag bit and the post-calculation check flag bit.
In some possible implementations, the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and the processing unit is specifically configured to: if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent, determine that the output result is incorrect.
In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer.
In some possible implementations, when the output result is determined to be incorrect, the state of the second computing unit includes transient failure and permanent failure.
In some possible implementations, when the output result is determined to be incorrect, the processing unit is further configured to: determine, by running a self-test library, whether the state of the second computing unit is transient failure or permanent failure.
In some possible implementations, the transceiver unit is further configured to: when the state of the second computing unit is permanent failure, report the failure state of the second computing unit.
According to a fourth aspect, a verification apparatus for AI calculation is provided. The apparatus includes: a transceiver unit, configured to obtain a verification result for the output result of an AI model with which a second computing unit processes AI calculation, where the verification result is a determination that the output result is incorrect; and a processing unit, configured to run a self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
In some possible implementations, when the result of running the self-test library reports no fault, the state of the second computing unit is transient failure; when the result of running the self-test library reports a fault, the state of the second computing unit is permanent failure.
In some possible implementations, the apparatus is further configured to: when the state of the second computing unit is transient failure, discard the output result; when the state of the second computing unit is permanent failure, report the failure state of the second computing unit.
According to a fifth aspect, a chip is provided, including a first computing unit, where the first computing unit is configured to execute the method in any one of the possible implementations of the first aspect and the second aspect above.
In some possible implementations, the chip further includes a second computing unit, and the second computing unit is configured to perform AI calculation.
According to a sixth aspect, a computer-readable medium is provided. The computer-readable medium stores program code, and when the computer program code is run on a computer, the computer is caused to execute the method in any one of the possible implementations of the first aspect and the second aspect above.
According to a seventh aspect, a computing device is provided, including a first computing unit and a second computing unit, where the second computing unit is configured to process AI calculation based on an AI model, and the first computing unit executes the method in any one of the possible implementations of the first aspect and the second aspect above.
In some possible implementations, the processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit.
In some possible implementations, the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
Description of drawings
FIG. 1 is a schematic diagram of a verification method that periodically runs a self-test library according to an embodiment of this application;
FIG. 2 is a schematic diagram of a system architecture of a possible application of the verification method for AI calculation according to an embodiment of this application;
FIG. 3 is a schematic diagram of hardware units that may be involved when the verification method for AI calculation according to an embodiment of this application is applied to an intelligent driving computing platform;
FIG. 4 is a system architecture diagram of an application of the verification method for AI calculation according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a verification method for AI calculation according to an embodiment of this application;
FIG. 6 is a schematic diagram of calculating an offline check flag bit according to an embodiment of this application;
FIG. 7 is a schematic diagram of calculating a feature map check flag bit according to an embodiment of this application;
FIG. 8 is a schematic diagram of calculating a pre-calculation check flag bit according to an embodiment of this application;
FIG. 9 is a schematic diagram of data processing by an AI model according to an embodiment of this application;
FIG. 10 is a schematic diagram of calculating a post-calculation check flag bit according to an embodiment of this application;
FIG. 11 is a schematic diagram of calculating a check flag bit according to an embodiment of this application;
FIG. 12 is a schematic diagram of further detecting the specific failure state of the second computing unit after a failure of the second computing unit is detected according to an embodiment of this application;
FIG. 13 is a schematic diagram of determining the specific failure state of the second computing unit according to an embodiment of this application;
FIG. 14 is a schematic flowchart of another verification method for AI calculation according to an embodiment of this application;
FIG. 15 is a schematic block diagram of a verification apparatus for AI calculation provided by an embodiment of this application;
FIG. 16 is a schematic structural diagram of a verification device for AI calculation according to an embodiment of this application.
Detailed description of embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
At present, AI chips have no dedicated verification mechanism for checking the results of neural network inference calculations, where neural network inference is a form of AI calculation; yet guaranteeing the correctness of those results is essential. For example, in an intelligent driving scenario, a wrong decision made on the basis of an erroneous result output by a neural network poses a serious driving hazard. To guarantee the correctness of neural network inference results, the industry has therefore proposed several verification methods, including redundancy checking and periodically running a self-test library (STL). Redundancy checking includes dual modular redundancy (DMR) and triple modular redundancy (TMR), among others, and refers to using two or three identical computing chips or computing units to execute the same calculation simultaneously and then comparing the results: if the results agree, the calculation is deemed correct; if they disagree, it is deemed erroneous. Redundancy checking can in theory verify the output of a neural network, but it doubles or triples the hardware cost.
The STL verification method is introduced below with reference to FIG. 1. Part (a) of FIG. 1 shows one AI core in an AI chip executing four AI calculations in chronological order; each AI calculation runs the neural network once to perform inference and produce an output result. To guarantee the accuracy of the output, the calculation results must be checked. Part (b) of FIG. 1 shows the four AI calculations being checked with the STL method, where the STL check is periodic; ideally, the STL check period equals the time of one AI calculation, that is, an STL check is performed after every AI calculation, and the next AI calculation proceeds only if the STL check result is normal. Part (c) of FIG. 1 shows the case where the STL check detects a hardware failure: as shown in part (c), in the third check period the STL check finds that the AI chip has suffered a permanent failure, so the AI chip performs no further AI calculations and the fault is reported.
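For contrast with the parallel verification scheme introduced later, the prior-art periodic STL flow of FIG. 1 can be sketched roughly as follows; the callbacks are illustrative placeholders, not a real STL API.

```python
def run_with_periodic_stl(ai_jobs, stl_check, report_failure):
    # Prior-art flow: run one STL pass after each AI calculation; on a
    # detected (permanent) failure, stop this core and report the fault.
    results = []
    for job in ai_jobs:
        results.append(job())   # one neural network inference
        if not stl_check():     # periodic self-test after the job
            report_failure()
            break               # core performs no further AI calculations
    return results
```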
As the above description shows, STL checking has several shortcomings. First, STL checks run periodically with a fixed period, but the duration of an AI calculation is uncertain. For example, with a preset check period of 10 ms, a given AI calculation may take longer than 10 ms, so running the STL check may interrupt the AI calculation and delay decision-making; moreover, AI calculation must stop while the STL check runs, so real-time decision-making cannot be guaranteed. Conversely, with a preset check period of 10 ms, an AI calculation may also take less than 10 ms: in the third check period of part (c) of FIG. 1, two AI calculations occur within one check period, and the AI chip failure detected in that period may have occurred during the first of the two AI calculations, but the STL check did not detect it in time, so a wrong decision produced by the erroneous result may threaten the safe use of the device. Furthermore, STL checking mainly identifies permanent failures and is weak at detecting transient failures; if a transient failure occurs, the STL check will also judge it to be a permanent failure and report a fault. However, an AI chip contains a large number of multiply-accumulate computing units, so it occupies a relatively large die area and in theory has a higher probability of being struck by high-energy particles; consequently, a transient failure of an AI chip is more likely than a permanent one. When the STL check judges a transient failure to be a permanent failure and reports a fault, the system deactivates the reported AI chip, wasting hardware resources and reducing system availability.
Therefore, the embodiments of this application provide a verification method for AI calculation. The verification can be executed by computing units other than the one processing the AI calculation, so it does not affect the processing of the AI calculation; the verification proceeds in step with the neural network inference, so the results of the neural network are both correct and timely; and compared with redundancy checking, the verification method of the embodiments of this application requires very little computation, places low performance demands on the computing unit used for verification, correspondingly reduces hardware cost, and safeguards the reliability of the AI chip.
The verification method for AI calculation of the embodiments of this application can be applied to any neural network inference scenario, including the following. Scenarios with high safety requirements for neural network inference: for example, in 5G smart industry, massive industrial data is analyzed, processed, and optimized through AI neural networks, which requires safer and more reliable inference, so the verification method of the embodiments of this application can be applied in 5G smart industry to devices deployed with AI chips, such as control equipment and servers, to improve the reliability of neural network inference. Scenarios requiring large-scale deployment of AI chips: for example, cloud computing servers, smart security cameras, intelligent driving vehicles, and terminal devices, which are deployed at large scale and run for long periods, are prone to hardware failures, and hardware failures harm both the accuracy of AI chip results and system availability and degrade the user experience; the verification method of the embodiments of this application can therefore be applied in cloud computing servers, smart security cameras, and intelligent driving vehicles to verify the correctness of AI calculation results in real time. Scenarios with a large volume of matrix operations: smart devices such as smartphones and smart TVs are usually equipped with graphics processing units (GPUs) that, for cost reasons, are built with less demanding processes, have a high failure rate, and usually have no error detection capability, yet these devices must process large amounts of image data dominated by matrix operations, and a failure may show up as garbled images on the smartphone or smart TV and degrade the user experience; the verification method of the embodiments of this application, as a low-cost verification method, can be applied to smart devices equipped with such graphics processing units to reduce the impact of hardware failures on data processing and improve the user experience.
FIG. 2 shows the hardware units that may be involved when the verification method for AI calculation of the embodiments of this application is applied, including an AI chip, a CPU chip, a memory chip, a bus, and a GPU. As shown in FIG. 2, neural network inference is executed by the AI chip, which may include one or more AI cores, and the AI calculation can be executed by each AI core in the AI chip. An AI core is the smallest computing unit for inference in the AI chip, so an AI core may be called an AI computing unit. An on-chip storage unit caches the parameters of the AI model, the intermediate results, and the inference results of the AI calculation, and the AI cores are connected to the on-chip storage unit through a bus. The CPU and the GPU may each also include one or more computing units.
The neural network verification method of this application verifies every calculation of the neural network, where each verification can be executed by an AI core of the AI chip or by a computing unit in the CPU chip or the GPU. The verification results are stored in the memory chip; finally, the CPU chip reads the verification results from the memory chip and makes a logical judgment to determine whether the calculation results of the neural network are correct. The AI chip, CPU chip, memory chip, and GPU are connected through a bus.
Taking an intelligent driving scenario as an example, FIG. 3 shows the system architecture of one application of the verification method for AI calculation of the embodiments of this application. As shown in FIG. 3, for an intelligent driving vehicle, the data acquired by sensors such as cameras, lidar, and ultrasonic radar is processed by the intelligent driving computing platform to generate a series of execution instructions, which are sent to specific actuators for execution; for example, the steering actuator controls vehicle steering according to the execution instructions, the throttle actuator controls vehicle acceleration, and the brake actuator controls vehicle deceleration. If the intelligent driving computing platform produces an erroneous calculation result and hence an erroneous execution instruction, execution of that instruction by an actuator would create a driving hazard, so verifying the neural network inference in the intelligent driving computing platform is particularly necessary. The verification method for AI calculation of this application can be applied to the intelligent driving computing platform in FIG. 3 to verify its neural network inference, thereby guaranteeing the correctness of the calculation results and the safety of intelligent driving. The intelligent driving computing platform in FIG. 3 includes the hardware units shown in FIG. 2.
FIG. 4 shows the system architecture to which the verification method for AI calculation of the embodiments of this application is applied. As shown in FIG. 4, the verification method in the embodiments of this application is a heterogeneous parallel verification method, in contrast to traditional redundancy checking. A redundant check executes the same computation as the computation being verified and compares the results to determine whether the verified computation is correct. For the verified AI computing unit in FIG. 4, a redundant scheme would use another AI computing unit to perform the same convolution, pooling, activation, and fully connected computations on the input data and output its own result; comparing the results of the two AI computing units determines whether the verified AI computing unit processed correctly. That approach requires the verifying AI computing unit to have processing capability equal to or higher than that of the verified AI computing unit. The verification method in the embodiments of this application is instead heterogeneous: it uses computation processing different from the computation being verified, for example, designing different verification schemes for the different internal structures of the neural network, such as convolution, fully connected, pooling, and normalization layers, so the computation is heterogeneous. Because the computation is heterogeneous, the computing unit used for verification may also differ from the verified AI computing unit; the two may be hardware of different structures and different capabilities. In the embodiments of this application, heterogeneous parallel verification may mean computation processing that differs from the verified AI calculation, in which case the computing unit used for verification may be the same as the verified AI computing unit; it may also mean that, in addition to using different computation processing, the computing unit used for verification differs from the verified AI computing unit in structure or processing capability. For example, the heterogeneous parallel verification of AI calculation may be executed by an AI computing unit or by other computing units; here, any computing unit that executes the verification of AI inference is denoted a heterogeneous computing unit. The computing power of the heterogeneous computing unit may equal that of the computing unit executing the AI inference, or it may be lower, because the verification method of the embodiments of this application consumes far fewer resources for verifying AI calculation than the AI calculation itself consumes; the verification can therefore be executed on an ordinary computing unit with lower computing power, such as a GPU computing unit or a CPU with less computing power, thereby lowering the barrier to verification and saving cost. Parallel means that every calculation of the neural network is verified during the AI calculation to generate check flag bits, where every calculation of the neural network refers to the computation performed by each data-processing layer of the neural network when it processes the data; once the neural network finishes and outputs its calculation result, the result can be judged against all the check flag bits to determine whether it is correct, without stopping the AI calculation, and the verification is real-time, satisfying the real-time requirements of neural network inference in application scenarios. The verification method for AI calculation of the embodiments of this application is described in detail below with reference to FIG. 4.
First, the AI computing unit performs the inference computation of the neural network as usual. After the data to be processed is input into the AI computing unit and preprocessed, the data is fed into the AI core in the form of tensors, and the neural network model is also loaded into the AI core. Neural network models have different structures; the data to be processed is processed by multiple processing layers of the neural network model to obtain and output an inference result. Classified by structure, the processing layers of a neural network model may include convolutional layers, pooling layers, activation layers, and fully-connected layers. A neural network model usually contains multiple processing layers, for example one or more convolutional layers, one or more pooling layers, one or more activation layers, and one or more fully-connected layers. The computation of the convolutional layers and the fully-connected layers takes the form of matrix multiplication; when the matrix dimension is high, the matrix multiplication operations introduce a large amount of computation and consume substantial computing resources.
FIG. 4 takes a neural network model with a typical structure as an example; this model includes a convolutional layer, a pooling layer, an activation layer, and a fully-connected layer. It should be understood that the AI computation verification method of the embodiments of the present application is not limited to the structure of the neural network model in FIG. 4 and is also applicable to neural network models of other structures, and each structure may include multiple identical processing layers. The preprocessed data passes through the convolutional, pooling, activation, and fully-connected layers to produce an output computation result, which the AI core passes to the CPU, and the CPU executes a corresponding decision based on that result. As can be seen from the above process, traditional neural network inference does not include a verification mechanism, so the correctness of the computation result delivered to the CPU cannot be guaranteed. The AI computation verification method of the embodiments of the present application can verify the inference computation of the neural network to ensure the correctness of the output result and prevent the serious consequences of the CPU executing a wrong decision based on a wrong result.
Unlike redundant verification, which repeats the inference computation of the neural network one or more times, and unlike verification methods that run a self-test library only after the inference computation has finished, the heterogeneous computing unit synchronously verifies one or more computations of the neural network in the AI core according to the AI computation verification method of the embodiments of the present application. As shown in FIG. 4, verifying the first convolution computation of the convolutional layer yields check flag bit 1; since the convolutional layer may perform multiple convolution computations, verifying its second convolution computation yields check flag bit 2, verifying its third yields check flag bit 3, and so on. Verifying the first computation of the pooling layer yields check flag bit 2a, where a is a positive integer, and verifying its second computation yields check flag bit 2a+1; verifying the first computation of the activation layer yields check flag bit 3b, where b is a positive integer, and verifying its second computation yields check flag bit 3b+1; verifying the first computation of the fully-connected layer yields check flag bit 3c, where c is a positive integer, and verifying its second computation yields check flag bit 3c+1; and so forth.
The above verification is real-time and does not interfere with the normal inference computation of the neural network. When the inference computation finishes and outputs a result, the synchronous verification of the one or more computations in the neural network is also complete, and the one or more generated check flag bits are stored in memory; the CPU verifies the output result against the one or more check flag bits in memory and can thereby determine whether the computation result output by the neural network is correct.
Meanwhile, the AI computation verification method of the embodiments of the present application designs different verification schemes for the different structures inside the neural network. As shown in FIG. 4, processing layers such as the convolutional and fully-connected layers, which consume substantial computing resources because of matrix multiplication, are verified with a matrix-computation verification scheme, for example reduced-dimension matrix computation, including vector-matrix multiplication or vector-vector multiplication; the amount of computation of this scheme is low compared with the matrix multiplication of the redundant verification scheme, where redundant verification means performing the same computation as the corresponding processing layer. Since more than 99% of the computation in a neural network comes from the matrix multiplications in structures such as convolutional and fully-connected layers, and the remaining less than 1% comes from structures such as pooling and activation layers, verifying the matrix multiplications that consume most of the computing resources with a relatively low-cost scheme greatly reduces the computation needed to verify the neural network inference. For the operations in structures such as pooling and activation layers, which consume a very small share of computing resources, redundant verification can still be used as a simplified verification approach; of course, a verification scheme with a small amount of computation may also be used to further reduce the cost of verifying the inference computation of the network.
According to the AI computation verification method of the embodiments of the present application, after the neural network completes the inference computation and outputs the result, the synchronous verification of the one or more computations in the neural network is also complete and one or more check flag bits have been generated. The CPU reads the one or more check flag bits from memory and determines, according to them, whether the computation result output by the neural network is correct; if the result is determined to be correct, the corresponding decision is executed, and if the result is determined to be incorrect, a fault is reported.
FIG. 5 shows a schematic flowchart of the AI computation verification method of the embodiments of the present application. As shown in FIG. 5, the method includes steps S501 to S504, which are introduced below.
S501: Acquire parameters of an AI model with which a second computing unit processes the AI computation, the AI model including one or more first processing layers.
The second computing unit is an AI core that performs the neural network inference computation, and the AI computation verification method shown in FIG. 5 is performed by a first computing unit. The first computing unit and the second computing unit may be different computing units: the first computing unit may be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip, which may or may not be designed for AI acceleration; the first computing unit may also be a computing unit on another AI chip, or a CPU core in a CPU chip, where a CPU core is the smallest computing unit in a CPU chip. In a possible implementation, the first computing unit and the second computing unit may also be the same computing unit, that is, the computing unit both performs the AI computation and executes the AI computation verification method of the embodiments of the present application.
Because the second computing unit performs the inference computation of the neural network, the required AI model is loaded into the second computing unit. The AI model may include one or more processing layers for processing the data input into it, among which a first processing layer is a layer that consumes relatively large computing resources; for a convolutional neural network model with a typical structure, it may be a convolutional layer or a fully-connected layer. The AI model may include one or more first processing layers; for example, it may include a convolutional layer and a fully-connected layer, or multiple convolutional layers and multiple fully-connected layers. The above are examples and are not limiting. A first processing layer may of course also be a pooling layer or an activation layer.
The first computing unit may acquire the parameters of the AI model from the second computing unit; the parameters of the AI model include weights, biases, and the like, and may, for example, be a weight matrix. The parameters of the AI model may be trained offline and then stored in the system memory, so the first computing unit may also acquire the parameters of the AI model from the system memory.
Steps S502 to S503 are performed for each first processing layer of the one or more first processing layers:
S502: Acquire input data of the first processing layer from the second computing unit.
The first computing unit acquires the input data of the first processing layer from the second computing unit. Still taking the convolutional neural network model with a typical structure in S501 processing image data as an example: when the first processing layer is a convolutional layer, the input data of the first processing layer is the preprocessed image data; when the first processing layer is a fully-connected layer, the input data of the first processing layer is the output data of the activation layer.
S503: Perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain a check flag bit of the first processing layer, where the amount of computation of the verification processing of the first processing layer is smaller than the amount of computation with which the second computing unit processes the input data through the first processing layer.
Specifically, taking the convolutional layer and the fully-connected layer as examples of the first processing layer of the AI model, their processing of data is converted into matrix computation. The parameters of the AI model acquired above therefore include a weight matrix of the AI model, and the acquired input data of the first processing layer includes a feature map matrix. The weight matrix includes multiple row vectors or multiple column vectors, the feature map matrix includes multiple row vectors or multiple column vectors, and each vector includes multiple elements. The dimension of a matrix may be its number of rows or columns; the higher the dimension, the higher the complexity of the matrix computation and the more computing resources it consumes. The AI computation verification method of the embodiments of the present application verifies the computation of the first processing layer using reduced-dimension matrix computation to obtain the check flag bit of the first processing layer, for example by computing with one or more matrices that have fewer rows or columns than the weight matrix or the feature map matrix. One possible implementation is as follows.
First, the first computing unit performs a first verification computation on the weight matrix to obtain a first check flag bit. Because the parameters of the AI model are trained offline and then stored in the system memory, the verification computation on the parameters of the AI model that participate in data processing can be performed offline; that is, the first check flag bit may be an offline check flag bit. The first verification computation may matrix-multiply the weight matrix by a matrix of reduced dimension relative to the feature map matrix, i.e. a matrix with fewer rows or columns than the feature map matrix, to obtain the first check flag bit. As shown in FIG. 6, an all-ones row vector is matrix-multiplied with the weight matrix of the AI model, yielding an offline checkpoint (OC) for the weight matrix. The computed offline check flag bit OC can be stored in memory and read in the subsequent verification process. It should be understood that the matrix multiplications in the embodiments of the present application must satisfy the rule that the number of columns of the left matrix equals the number of rows of the right matrix; that is, the number of columns of the all-ones vector in FIG. 6 must equal the number of rows of the weight matrix, and the other matrix multiplications in the embodiments of the present application must also satisfy this rule. The all-ones row vector here has only one row, which, in terms of the row dimension, is fewer than the number of rows of the feature map matrix; the matrix operation of the all-ones row vector with the weight matrix is therefore a reduced-dimension matrix computation relative to the multiplication of the weight matrix with the feature map matrix. It should be noted that this is only an example and is not limiting.
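The offline-checkpoint step above can be sketched in a few lines of NumPy (the 4×3 weight matrix and its values are hypothetical, introduced only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))      # hypothetical weight matrix, 4 rows x 3 columns

# Offline checkpoint OC: an all-ones ROW vector times the weight matrix.
# The ones vector has as many columns as W has rows (1x4 @ 4x3 -> 1x3),
# so the product is well-formed, as the rule above requires.
ones_row = np.ones((1, W.shape[0]))
OC = ones_row @ W                    # equals the column sums of W

assert OC.shape == (1, 3)
assert np.allclose(OC, W.sum(axis=0))
```

Since W is fixed after offline training, OC can be computed once and cached in memory, exactly as the text describes.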
Taking the convolutional neural network model with a typical structure in S501 processing image data as an example, as shown in FIG. 7, after each frame of image is input into the second computing unit it is converted into a feature map matrix (feature matrix). The first computing unit performs a second verification computation on the feature map matrix to obtain a second check flag bit, which may also be called a checkpoint feature map matrix (CF). The second verification computation may matrix-multiply the feature map matrix by a matrix of reduced dimension relative to the weight matrix, i.e. a matrix with fewer rows or columns than the weight matrix, to obtain the second check flag bit. As shown in FIG. 7, the feature map matrix is matrix-multiplied with an all-ones column vector, yielding the feature map check flag bit. The all-ones column vector here has only one column, which, in terms of the column dimension, is fewer than the number of columns of the weight matrix; the matrix operation of the all-ones column vector with the feature map matrix is therefore a reduced-dimension matrix computation relative to the multiplication of the weight matrix with the feature map matrix. It should be noted that this is only an example and is not limiting.
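The feature-map checkpoint can be sketched the same way (the 3×5 feature map matrix is a made-up example, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 5))      # hypothetical feature map matrix, 3 rows x 5 columns

# Checkpoint feature map CF: the feature map matrix times an all-ones
# COLUMN vector (3x5 @ 5x1 -> 3x1), i.e. the row sums of F.
ones_col = np.ones((F.shape[1], 1))
CF = F @ ones_col

assert CF.shape == (3, 1)
assert np.allclose(CF.ravel(), F.sum(axis=1))
```

Unlike OC, this checkpoint must be recomputed per input frame, since F changes with every frame.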
The first computing unit obtains a pre-computation check flag bit from the first check flag bit (the offline check flag bit OC) and the second check flag bit (the feature map check flag bit CF). As shown in FIG. 8, a vector multiplication of the offline check flag bit OC and the feature map check flag bit CF yields the pre-computation check flag bit (check bit in, CB_in).
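One reading of the vector multiplication in FIG. 8 is an inner product that collapses OC and CF into a single scalar (the checkpoint values below are made up for illustration):

```python
import numpy as np

OC = np.array([[2.0, -1.0, 3.0]])     # hypothetical offline checkpoint, 1x3
CF = np.array([[1.0], [4.0], [0.5]])  # hypothetical feature map checkpoint, 3x1

# Pre-computation check bit CB_in: vector product of OC and CF
# (1x3 @ 3x1 -> a single scalar).
CB_in = float(OC @ CF)

assert CB_in == 2.0 * 1.0 + (-1.0) * 4.0 + 3.0 * 0.5
```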
FIG. 9 shows the second computing unit performing a convolution operation on the weight matrix and the feature map matrix to obtain an output matrix; this computation is the normal processing of data by the first processing layer of the AI model, not the verification process.
The convolution operation in a convolutional layer is briefly introduced below. The convolution operation is performed by convolution operators; a convolutional layer may include many convolution operators, also called convolution kernels. In image processing, a convolution kernel acts as a filter that extracts specific information from the input image matrix. The convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is typically applied across the input image along the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends through the entire depth of the input image. Convolving with a single weight matrix therefore produces a convolved output of a single depth dimension, but in most cases multiple weight matrices of the same size (rows × columns), i.e. multiple matrices of the same shape, are applied instead of a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix extracts image edge information, another extracts a specific color of the image, and yet another blurs unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolved feature maps extracted by them also have the same size, and the extracted feature maps of the same size are then combined to form the output of the convolution operation. In practical applications, the weight values in these weight matrices must be obtained through extensive training; the weight matrices formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network to make correct predictions.
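The stride-based sliding described above can be illustrated with a minimal single-channel example (the function name, image, and filter values are illustrative assumptions, not from the patent):

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Naive 2-D sliding-window convolution of one channel with one kernel
    (illustration only; real AI cores lower this to matrix multiplication)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # kernel applied per window
    return out

image = np.arange(16.0).reshape(4, 4)            # toy 4x4 single-channel image
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])            # crude horizontal-edge filter

feature_map = conv2d_single(image, edge_kernel)
assert feature_map.shape == (3, 3)               # (4 - 2) / 1 + 1 = 3 per axis
```

With multiple kernels, the resulting feature maps would be stacked along the depth dimension, as the text describes.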
The first computing unit acquires the output matrix from the second computing unit and then performs a third verification computation (also a matrix multiplication) on the output matrix to obtain a post-computation check flag bit. Specifically, as shown in FIG. 10, an all-ones matrix is multiplied with the output matrix, where the number of columns of the all-ones matrix must equal the number of rows of the output matrix for the matrix multiplication to be performed, yielding the post-computation check flag bit (check bit out, CB_out).
The first computing unit obtains the check flag bit (check bit, CB) from the pre-computation check flag bit CB_in and the post-computation check flag bit CB_out. The check flag bit may be obtained by subtracting CB_out from CB_in, or by dividing one by the other; the purpose is to compare the difference between CB_in and CB_out.
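Putting FIG. 6 to FIG. 10 together, one consistent reading (the patent leaves the exact shape of the all-ones matrix in FIG. 10 open; here both check bits are reduced to a single scalar, and all shapes are hypothetical) is the identity 1ᵀ(WF)1 = (1ᵀW)(F1) — the checksum of the actual output must equal the checksum predicted from the inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 4, 3, 5                        # hypothetical dimensions
W = rng.standard_normal((m, k))          # weight matrix
F = rng.standard_normal((k, n))          # feature map matrix

# Pre-computation check bit: (1^T W)(F 1) -- the expected sum of all
# entries of the output, obtained WITHOUT forming the full m x n product.
OC = np.ones((1, m)) @ W                 # offline checkpoint, 1 x k
CF = F @ np.ones((n, 1))                 # feature map checkpoint, k x 1
CB_in = float(OC @ CF)

# Normal processing by the second computing unit (FIG. 9).
O = W @ F                                # output matrix, m x n

# Post-computation check bit: checksum of the actual output.
CB_out = float(np.ones((1, m)) @ O @ np.ones((n, 1)))

CB = CB_in - CB_out                      # check flag bit: ~0 when fault-free
assert abs(CB) < 1e-9

# A single corrupted output element is caught by the same scalar check:
O_bad = O.copy()
O_bad[1, 2] += 7.0                       # injected fault
CB_out_bad = float(np.ones((1, m)) @ O_bad @ np.ones((n, 1)))
assert abs(CB_in - CB_out_bad) > 1e-6
```

The checkpoints cost O(mk + kn) multiply-adds instead of the O(mkn) of the layer itself, which is the "reduced-dimension" saving claimed above. In floating point the comparison needs a small tolerance rather than exact equality.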
In FIG. 4, the matrix-computation verification scheme of FIG. 6 to FIG. 11 above is applied to the convolutional layer and the fully-connected layer, and the resulting check flag bit CB corresponds to check flag bit 1 and check flag bit 4c in FIG. 4.
S504: Determine, based on a verification result, whether the output result of the AI model is correct, the verification result including the check flag bits of the one or more first processing layers obtained in steps S502 to S503.
A check flag bit indicates whether the pre-computation check flag bit and the post-computation check flag bit are consistent; if at least one check flag bit in the verification result indicates that they are inconsistent, the output result is incorrect. FIG. 11 takes as an example the check flag bit CB obtained by subtracting CB_out from CB_in. Because CB_in is obtained from the offline check flag bit OC and the feature map check flag bit CF, it represents the theoretical matrix-computation check result of the output matrix, while CB_out is the actual matrix-computation check result of the output matrix. If CB is 0, CB_in and CB_out are equal, meaning the theoretical check result of the output matrix equals the actual check result, and no error occurred while the first processing layer computed with the weight matrix and the feature map matrix. Conversely, if CB is not 0, CB_in and CB_out differ, meaning the theoretical check result of the output matrix does not equal the actual check result, and an error occurred while the first processing layer computed with the weight matrix and the feature map matrix.
Through the processes of FIG. 6 to FIG. 11, the check flag bits of the one or more first processing layers can be obtained and are recorded as the verification result. It can be seen that if at least one check flag bit in the verification result indicates that the pre-computation and post-computation check flag bits are inconsistent, for example if at least one of the check flag bits of the one or more first processing layers has a non-zero value, then an error occurred while the first processing layer corresponding to that check flag bit computed with the weight matrix and the feature map matrix, and the final computation result output by the AI model is incorrect. If all check flag bits in the verification result indicate that the pre-computation and post-computation check flag bits are consistent, for example if the check flag bits of the one or more first processing layers are all 0, then no error occurred while the one or more first processing layers computed with the weight matrix and the feature map matrix, and the final computation result output by the AI model is correct.
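The accept/reject decision of S504 can be sketched as a scan over the stored per-layer flag bits (the helper name and the tolerance are illustrative assumptions; the patent only requires that any inconsistent flag bit invalidate the output):

```python
def output_is_correct(check_flag_bits, tol=1e-9):
    """Accept the model output only if every per-layer check flag bit
    (here CB = CB_in - CB_out) is numerically zero."""
    return all(abs(cb) <= tol for cb in check_flag_bits)

# All layers consistent -> the output result is accepted.
assert output_is_correct([0.0, 0.0, 0.0])
# One inconsistent layer is enough to reject the result and report a fault.
assert not output_is_correct([0.0, 4.2e-3, 0.0])
```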
Optionally, the check flag bits of the one or more first processing layers are stored in memory. After the AI model outputs the computation result, the CPU reads the check flag bits of the one or more first processing layers from memory and then judges, according to them, whether the computation result output by the AI model is correct. If the computation result is judged correct, the corresponding decision is executed; for example, in FIG. 2 the intelligent driving computing platform generates a series of execution instructions according to the computation, and the actuators control the steering, acceleration, deceleration, and so on of the vehicle according to the execution instructions. If the computation result is judged incorrect, a fault of the second computing unit is reported.
The reduced-dimension matrix-computation verification shown in FIG. 6 to FIG. 11 (hereinafter, matrix-computation verification) can be used to verify the computation of the convolutional layer and the fully-connected layer, because both convolution computation and fully-connected computation are converted into matrix computation, to which the reduced-dimension matrix-computation verification scheme applies.
For the one or more second processing layers in the AI model to which the dimension-reduced matrix computation verification method does not apply, the AI computation verification method of this embodiment of the application further includes performing a redundancy check on each of the one or more second processing layers to obtain a check flag bit of that second processing layer. The above check result therefore further includes the check flag bits of the one or more second processing layers, and determining whether the computation result output by the AI model is correct requires jointly considering the check flag bits of the one or more first processing layers and the check flag bits of the one or more second processing layers.

Specifically, the redundancy check is, as described above, executing the same computation of the one or more second processing layers on another chip or another computing unit with the same input data and the same AI model, and then comparing whether the original computation result and the redundant computation result of each second processing layer are consistent, thereby obtaining the check flag bits of the one or more second processing layers. A check flag bit of a second processing layer may be the difference or the quotient between the original computation result and the redundant computation result of that layer. If the original computation result of a second processing layer is consistent with its redundant computation result, no error occurred while that second processing layer was processing data; if they are inconsistent, an error occurred while the second processing layer was processing data, which causes the final computation result output by the AI model to be incorrect. Since pooling computation and activation computation occupy only a very small share of resources, even a redundancy check does not consume excessive resources.

When the original computation result and the redundant computation result are inconsistent, the basis for deciding that the error occurred in the second processing layer's original computation, rather than in the redundant computation, may be as follows. In the redundancy check, it may simply be stipulated that whenever the original computation result and the redundant computation result are inconsistent, the original computation is deemed to be in error. Alternatively, the original computation is performed by an ordinary chip or ordinary computing unit while the redundant computation is performed by a high-reliability chip or high-reliability computing unit; when the two results are inconsistent, the chip or computing unit performing the redundant computation is more reliable and thus less likely to err, whereas the chip or computing unit performing the original computation is less reliable and thus more likely to err, so the inconsistency is attributed to an error in the original computation.
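A redundancy check of this kind can be sketched as below, using a ReLU activation as the cheap second-processing-layer computation and attributing any mismatch to the original computation. The function names and the max-absolute-difference flag are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def redundancy_check(layer_fn, inputs, primary_out):
    """Re-run a cheap layer (activation or pooling) as the redundant
    computation and derive a check flag from the two results.

    Returns 0.0 when the original and redundant results agree and a
    non-zero value otherwise. Per the scheme above, a mismatch is
    attributed to the original computation (the redundant copy is
    assumed to run on the more reliable unit).
    """
    redundant_out = layer_fn(inputs)  # redundant computation of the layer
    return float(np.max(np.abs(primary_out - redundant_out)))

x = np.array([-1.0, 2.0, -3.0, 4.0])
good = relu(x)                        # fault-free original result
assert redundancy_check(relu, x, good) == 0.0
bad = good.copy()
bad[1] = -5.0                         # corrupted original result
assert redundancy_check(relu, x, bad) != 0.0
```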
When the output result of the AI model is determined to be incorrect, the AI chip on which the second computing unit is located has failed. Failures include transient failures and permanent failures: a transient failure affects only the one computation during which it occurs and has no effect on subsequent computations, whereas a permanent failure affects every computation after it occurs. If transient failures and permanent failures are not distinguished, then after a fault is reported, the system's safety mechanism directly disables any second computing unit detected as failed; a second computing unit that suffered only a transient failure and could still be used is disabled as well, wasting resources and reducing the availability of the system.
Therefore, the AI computation verification method of this embodiment of the application further includes: when the output result of the AI model is determined to be incorrect according to the method in FIG. 5, running a self-test library to determine whether the state of the failed second computing unit is a transient failure or a permanent failure. As shown in FIG. 12, while the second computing unit performs AI computation, the first computing unit performs heterogeneous parallel verification at the same time. When the output result of the AI model is determined to be incorrect, that is, after a failure of the second computing unit is detected, the self-test library is run to determine whether the second computing unit has suffered a transient failure or a permanent failure. If a transient failure is determined, the output result of this run of the AI model is discarded and the second computing unit continues to be used for AI computation; if a permanent failure is determined, the fault of the second computing unit is reported.
FIG. 13 is a schematic diagram of how the AI computation verification method of this embodiment of the application determines the specific failure state of the second computing unit. As shown in FIG. 13, the CPU determines whether the output result of the AI model is incorrect according to the multiple check flag bits in memory; for the specific determination method, refer to the description of FIG. 5 above, which is not repeated here. If the output result of the AI model is determined to be incorrect, the hardware has failed, and the self-test library is run to further determine whether the hardware has suffered a permanent failure. As noted above, the STL is mainly used to identify permanent failures, so the STL is invoked to self-test the failed hardware. If the self-test finds a fault, the second computing unit has suffered a permanent failure, can no longer participate in computation, and the fault must be reported. If the self-test finds no fault, the second computing unit has suffered a transient failure that will not affect subsequent computations; it suffices to discard this incorrect output result, and the second computing unit can continue with subsequent computations.
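The decision flow of FIG. 12 and FIG. 13 — running the self-test library only after the parallel check flags an incorrect output, then branching on its result — might be sketched as follows, with a hypothetical callable standing in for the STL interface:

```python
def handle_failed_check(stl_finds_fault):
    """Decide the failure state of the computing unit after the
    heterogeneous parallel check has flagged an incorrect output.

    stl_finds_fault: callable returning True when the self-test
    library (STL) detects a hardware fault (illustrative interface).
    Returns the action to take, following the flow of FIG. 13.
    """
    if stl_finds_fault():
        # Fault found: permanent failure; the unit must stop computing.
        return "report_fault_and_disable_unit"
    # No fault found: transient failure; only this one result is bad.
    return "discard_output_and_keep_unit"

assert handle_failed_check(lambda: True) == "report_fault_and_disable_unit"
assert handle_failed_check(lambda: False) == "discard_output_and_keep_unit"
```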
The AI computation verification method of this embodiment of the application verifies the computation of the AI model in a heterogeneous, parallel manner. First, the verification is performed by a computing unit other than the one executing the AI computation. Compared with a verification method that periodically runs a self-test library, the verification method of this embodiment does not interfere with the normal progress of the AI model's inference computation, so it does not affect the acceleration performance of the AI core, and it avoids having the same AI core perform verification while executing AI computation, guaranteeing the real-time performance of the AI model's inference computation while guaranteeing the correctness of its output result. Second, the verification method of this embodiment designs different verification schemes for the different processing layers in the AI model, saving computing resources to the greatest extent, so that verification can be carried out on computing units with lower computing power, reducing the cost of verification. After a hardware failure is determined, the verification method of this embodiment can further determine the specific failure state of the hardware; if the computing unit has suffered only a transient failure, it can continue to be used, avoiding a waste of resources and improving the availability of the AI chip.
FIG. 14 is a schematic flowchart of another AI computation verification method according to an embodiment of the application. As shown in FIG. 14, the method includes step 1401 and step 1402.
S1401: Obtain, from the second computing unit, a check result for the output result of the AI model, where the check result is a determination that the output result is incorrect.
S1402: Run a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
The method in FIG. 14 is executed by the first computing unit. The first computing unit may be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on that AI chip; this other computing unit may or may not be designed for AI acceleration. The first computing unit may also be a computing unit on another AI chip, or a CPU core in a CPU chip, where a CPU core is the smallest computing unit in a CPU chip. In one possible implementation, the first computing unit may be the same computing unit as the second computing unit; that is, the computing unit both executes AI computation and executes the AI computation verification method of this embodiment of the application. The check result obtained from the second computing unit for the output result of the AI model may be a check result obtained by the method shown in FIG. 5, or a check result obtained by any existing verification method.
When running the self-test library finds no fault, the state of the second computing unit is a transient failure; when running the self-test library finds a fault, the state of the second computing unit is a permanent failure. When the state of the second computing unit is a transient failure, the output result is discarded; when the state of the second computing unit is a permanent failure, the failure state of the second computing unit is reported.
In the AI computation verification method of this embodiment of the application, the CPU performs system scheduling and invokes the STL to self-test the AI core on which the hardware failure occurred, determining whether the AI core has suffered a permanent failure or a transient failure. If the self-test finds no fault, the AI core has suffered a transient failure that does not affect subsequent computations, and the AI core can continue to participate in system operations. If the self-test finds a fault, the AI core has suffered a permanent failure, cannot continue to participate in computation, and the fault must be reported. This avoids directly disabling a failed AI core, reduces the waste of resources, and improves the availability of the AI chip.
The AI computation verification method of the embodiments of the application has been described in detail above with reference to FIG. 1 to FIG. 14. Below, the AI computation verification apparatus provided by the embodiments of the application is described in detail with reference to FIG. 15 and FIG. 16. The AI computation verification apparatus here may be the above first computing unit, configured to perform the verification of AI computation. It should be understood that the descriptions of the apparatus embodiments correspond to the descriptions of the method embodiments; therefore, for details not described here, refer to the method embodiments above, which are not repeated for brevity.
FIG. 15 is a schematic block diagram of an AI computation verification apparatus provided by an embodiment of the application. The apparatus 1500 may specifically be a chip, an intelligent driving hardware platform, or the like. The apparatus 1500 includes a transceiver module 1510 and a processing module 1520. The transceiver module 1510 can implement corresponding communication functions, and the processing module 1520 is configured for data processing. The transceiver module 1510 may also be referred to as a communication interface or a communication unit.
Optionally, the apparatus 1500 may further include a storage module, which may be configured to store instructions and/or data; the processing module 1520 may read the instructions and/or data in the storage module, so that the apparatus implements the foregoing method embodiments.
The apparatus 1500 may be configured to execute the actions in the foregoing method embodiments. Specifically, the transceiver module 1510 is configured to execute the sending- and receiving-related operations in the foregoing method embodiments, and the processing module 1520 is configured to execute the processing-related operations in the foregoing method embodiments.
The apparatus 1500 may implement the steps or flows corresponding to the method embodiments of the embodiments of the application, and may include modules for executing the methods in FIG. 5 and FIG. 14. Moreover, the modules in the apparatus 1500 and the other operations and/or functions above are respectively intended to implement the corresponding flows of the method embodiments in FIG. 5 and FIG. 14.
When the apparatus 1500 is configured to execute the method 500 in FIG. 5, the transceiver module 1510 may be configured to execute step 501 and step 502 in the method 500, and the processing module 1520 may be configured to execute processing steps 503 and 504 in the method 500.
Specifically, the transceiver module 1510 is configured to obtain the parameters of the AI model with which the second computing unit processes the AI computation, where the AI model includes one or more first processing layers. For each of the one or more first processing layers, the following check processing is performed to obtain a check flag bit of that first processing layer: the transceiver module 1510 is further configured to obtain the input data of the first processing layer from the second computing unit; and the processing module 1520 is configured to perform check processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer, where the computation amount of the check processing of the first processing layer is less than the computation amount of the second computing unit processing the input data through the first processing layer. The processing module 1520 is further configured to determine, based on the check result, whether the output result of the AI computation processed by the second computing unit is correct, where the check result includes the check flag bit of each of the one or more first processing layers.
In some possible implementations, the AI model further includes one or more second processing layers, and the processing module 1520 is further configured to perform a redundancy check on each of the one or more second processing layers to obtain a check flag bit of each second processing layer; the check result further includes the check flag bit of each of the one or more second processing layers.
In some possible implementations, the parameters of the AI model include a weight matrix, and the input data of the first processing layer includes a feature map matrix. The processing module 1520 is specifically configured to: obtain a first check flag bit, where the first check flag bit is obtained by performing a first check computation on the weight matrix; obtain a second check flag bit, where the second check flag bit is obtained by performing a second check computation on the feature map matrix; obtain a pre-computation check flag bit according to the first check flag bit and the second check flag bit; obtain an output matrix from the second computing unit, where the output matrix is obtained by the second computing unit computing on the weight matrix and the feature map matrix in the first processing layer; perform a third check computation on the output matrix to obtain a post-computation check flag bit; and obtain the check flag bit according to the pre-computation check flag bit and the post-computation check flag bit.
In some possible implementations, the check flag bit indicates whether the pre-computation check flag bit and the post-computation check flag bit are consistent, and the processing module 1520 is specifically configured to determine that the output result is incorrect if at least one check flag bit in the check result indicates that the pre-computation check flag bit and the post-computation check flag bit are inconsistent.
In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer.
In some possible implementations, when the output result is determined to be incorrect, the possible states of the second computing unit include a transient failure and a permanent failure.
In some possible implementations, when the output result is determined to be incorrect, the processing module 1520 is further configured to determine, by running a self-test library, whether the state of the second computing unit is a transient failure or a permanent failure.
In some possible implementations, the transceiver module 1510 is further configured to report the failure state of the second computing unit when the state of the second computing unit is a permanent failure.
When the apparatus 1500 is configured to execute the method 1400 in FIG. 14, the transceiver module 1510 may be configured to execute step 1401 in the method 1400, and the processing module 1520 may be configured to execute processing step 1402 in the method 1400.
Specifically, the transceiver module 1510 is configured to obtain a check result for the output result of the AI model with which the second computing unit processes the AI computation, where the check result is a determination that the output result is incorrect; and the processing module 1520 is configured to run a self-test library to determine whether the state of the second computing unit is a transient failure or a permanent failure.
In some possible implementations, when running the self-test library finds no fault, the state of the second computing unit is a transient failure; when running the self-test library finds a fault, the state of the second computing unit is a permanent failure.
In some possible implementations, the apparatus 1500 is further configured to discard the output result when the state of the second computing unit is a transient failure, and to report the failure state of the second computing unit when the state of the second computing unit is a permanent failure.
It should be understood that the specific processes by which the modules execute the corresponding steps above have been described in detail in the foregoing method embodiments and are not repeated here for brevity.
As shown in FIG. 16, an embodiment of the application further provides an AI computation verification device 1600. The AI computation verification device 1600 shown in FIG. 16 may include a memory 1610, a processor 1620, and a communication interface 1630. The memory 1610, the processor 1620, and the communication interface 1630 are connected through an internal connection path. The memory 1610 is configured to store instructions, and the processor 1620 is configured to execute the instructions stored in the memory 1610, so as to control the communication interface 1630 to receive input samples or send prediction results. Optionally, the memory 1610 may be coupled to the processor 1620 through an interface, or may be integrated with the processor 1620.
It should be noted that the communication interface 1630 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the device 1600 and other devices or communication networks. The communication interface 1630 may further include an input/output interface.
In an implementation process, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor 1620 or by instructions in the form of software. The methods disclosed in the embodiments of the application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1610; the processor 1620 reads the information in the memory 1610 and completes the steps of the above methods in combination with its hardware. To avoid repetition, detailed descriptions are omitted here.
It should be understood that in the embodiments of the application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be understood that in the embodiments of the application, the memory may include a read-only memory and a random access memory and provide instructions and data to the processor. A portion of the processor may also include a non-volatile random access memory. For example, the processor may also store device type information.
An embodiment of the application further provides a chip, where the chip includes a first computing unit, and the first computing unit is configured to execute the method in FIG. 5 or FIG. 14 above.
Optionally, the chip further includes a second computing unit, and the second computing unit is configured to execute AI computation.
An embodiment of the application further provides a computer-readable medium, where the computer-readable medium stores program code, and when the program code runs on a computer, the computer is caused to execute the method in FIG. 5 or FIG. 14.
An embodiment of the application further provides a computing device, including a first computing unit and a second computing unit, where the second computing unit is configured to process AI computation based on an AI model, and the first computing unit executes the method in FIG. 5 or FIG. 14. The processing capability of the first computing unit is less than or equal to the processing capability of the second computing unit. The first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
The terms "component", "module", "system", and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate through local and/or remote processes, for example, according to signals having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet interacting with other systems through signals).
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or May be integrated into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disc and other media that can store program codes. .
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above is only a specific implementation of the application, but the scope of protection of the application is not limited thereto. Anyone familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the application. Should be covered within the protection scope of this application. Therefore, the protection scope of the present application should be determined by the protection scope of the claims.

Claims (28)

  1. A verification method for artificial intelligence (AI) computing, wherein the method is performed by a first computing unit, and the method comprises:
    obtaining parameters of an AI model used by a second computing unit to process the AI computation, the AI model comprising one or more first processing layers;
    performing the following verification processing on each first processing layer of the one or more first processing layers to obtain a check flag bit of each first processing layer of the one or more first processing layers:
    obtaining input data of the first processing layer from the second computing unit;
    performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, to obtain the check flag bit of the first processing layer, wherein a computation amount of the verification processing on the first processing layer is less than a computation amount of the second computing unit processing the input data through the first processing layer; and
    determining, based on a verification result, whether an output result of the AI computation processed by the second computing unit is correct, the verification result comprising the check flag bit of each first processing layer of the one or more first processing layers.
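Outside the claim language itself, the per-layer flow of claim 1 can be sketched in code. This is an illustrative, hypothetical reduction (the function names, the column-sum check, and the tolerance are assumptions, not taken from the patent): a low-cost check is run on each monitored layer, and the AI output is judged correct only when every layer's flag passes.

```python
import numpy as np

def column_sum_check(weights, layer_input, layer_output, tol=1e-6):
    """Cheap consistency check for layer_output == weights @ layer_input.

    Comparing column sums costs O(m*n + n*k) instead of the O(m*n*k) full
    matrix product, matching the claim's requirement that verification
    cost less than the layer computation itself."""
    pre = (np.ones(weights.shape[0]) @ weights) @ layer_input   # expected column sums
    post = np.ones(layer_output.shape[0]) @ layer_output        # column sums of reported output
    return bool(np.allclose(pre, post, atol=tol))               # the layer's check flag bit

def verify_ai_output(layer_records):
    """layer_records: (weights, input, accelerator_output) per monitored layer.

    Returns the per-layer flag bits and the overall verdict: the output is
    judged correct only when no flag reports a mismatch."""
    flags = [column_sum_check(w, x, y) for w, x, y in layer_records]
    return flags, all(flags)
```

In this sketch the verifier never recomputes a full layer; it only forms and compares checksums, which is what makes monitoring by a weaker first computing unit plausible.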
  2. The method according to claim 1, wherein the AI model further comprises one or more second processing layers, and the method further comprises:
    performing a redundancy check on each second processing layer of the one or more second processing layers to obtain a check flag bit of each second processing layer of the one or more second processing layers;
    wherein the verification result further comprises the check flag bit of each second processing layer of the one or more second processing layers.
  3. The method according to claim 1 or 2, wherein the parameters of the AI model comprise a weight matrix, the input data of the first processing layer comprises a feature map matrix, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the check flag bit of the first processing layer comprises:
    obtaining a first check flag bit, the first check flag bit being obtained by performing a first check calculation on the weight matrix;
    obtaining a second check flag bit, the second check flag bit being obtained by performing a second check calculation on the feature map matrix;
    obtaining a pre-calculation check flag bit based on the first check flag bit and the second check flag bit;
    obtaining an output matrix from the second computing unit, the output matrix being obtained by the second computing unit performing calculation on the weight matrix and the feature map matrix at the first processing layer;
    performing a third check calculation on the output matrix to obtain a post-calculation check flag bit; and
    obtaining the check flag bit based on the pre-calculation check flag bit and the post-calculation check flag bit.
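One concrete (hypothetical) instance of the first, second, and third check calculations in claim 3 is a total-sum checksum over the matrix product, in the spirit of algorithm-based fault tolerance. The identity sum(W @ X) = (1ᵀW)·(X·1) lets the pre-calculation value be built from checks on W and X independently; the specific checksum and all names below are assumptions, not the patent's stated scheme.

```python
import numpy as np

def layer_check_flag(W, X, Y, tol=1e-6):
    """W: weight matrix (m x n), X: feature-map matrix (n x k),
    Y: output matrix reported by the accelerated unit (should equal W @ X)."""
    first = np.ones(W.shape[0]) @ W     # first check calculation, on W alone
    second = X @ np.ones(X.shape[1])    # second check calculation, on X alone
    pre = first @ second                # pre-calculation check value: equals sum(W @ X)
    post = Y.sum()                      # third check calculation, on the output matrix
    return bool(np.isclose(pre, post, atol=tol))  # flag: do pre and post agree?
```

A total-sum checksum detects any single corrupted element of Y but can miss errors that happen to cancel; finer-grained row/column checksums trade more verification cost for better coverage.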
  4. The method according to claim 3, wherein the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and determining, based on the verification result, whether the output result of the AI computation processed by the second computing unit is correct comprises:
    determining that the output result is incorrect if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent.
  5. The method according to any one of claims 1 to 4, wherein the first processing layer is a convolutional layer or a fully connected layer.
  6. The method according to any one of claims 1 to 5, wherein, when the output result is determined to be incorrect, the state of the second computing unit includes a transient failure and a permanent failure.
  7. The method according to claim 6, wherein, when the output result is determined to be incorrect, the method further comprises:
    determining, by running a self-test library, that the state of the second computing unit is a transient failure or a permanent failure.
  8. The method according to claim 7, wherein the method further comprises:
    when the state of the second computing unit is a permanent failure, reporting the failure state of the second computing unit.
  9. A verification method for AI computing, wherein the method is performed by a first computing unit, and the method comprises:
    obtaining a verification result of an output result of an AI model used by a second computing unit to process the AI computation, the verification result being a determination that the output result is incorrect; and
    running a self-test library to determine that the state of the second computing unit is a transient failure or a permanent failure.
  10. The method according to claim 9, wherein running the self-test library to determine that the state of the second computing unit is a transient failure or a permanent failure comprises:
    when a running result of the self-test library indicates no fault, the state of the second computing unit is a transient failure; and
    when the running result of the self-test library indicates a fault, the state of the second computing unit is a permanent failure.
  11. The method according to claim 9 or 10, wherein the method further comprises:
    when the state of the second computing unit is a transient failure, discarding the output result; and
    when the state of the second computing unit is a permanent failure, reporting the failure state of the second computing unit.
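The failure classification in claims 9 to 11 can be sketched as a small decision routine. This is an illustrative sketch (the function name and return values are hypothetical): after a verification failure, a clean self-test run implies a one-off transient upset, while a failing self-test implies a permanent hardware fault.

```python
def classify_failure(run_self_test):
    """run_self_test: callable returning True when the self-test library
    finds no hardware fault on the monitored computing unit."""
    if run_self_test():
        # Hardware is healthy, so the bad output was a one-off upset:
        # transient failure; the output result is discarded.
        return "transient"
    # The self-test itself fails: permanent failure; the failure state
    # of the computing unit is reported upward.
    return "permanent"
```

Separating the two cases matters operationally: a transient failure only costs one discarded inference, while a permanent failure takes the unit out of service.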
  12. A verification apparatus for AI computing, wherein the apparatus comprises:
    a transceiver unit, configured to obtain parameters of an AI model used by a second computing unit to process the AI computation, the AI model comprising one or more first processing layers;
    wherein the following verification processing is performed on each first processing layer of the one or more first processing layers to obtain a check flag bit of each first processing layer of the one or more first processing layers:
    the transceiver unit is further configured to obtain input data of the first processing layer from the second computing unit;
    a processing unit, configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer, to obtain the check flag bit of the first processing layer, wherein a computation amount of the verification processing on the first processing layer is less than a computation amount of the second computing unit processing the input data through the first processing layer; and
    the processing unit is further configured to determine, based on a verification result, whether an output result of the AI computation processed by the second computing unit is correct, the verification result comprising the check flag bit of each first processing layer of the one or more first processing layers.
  13. The apparatus according to claim 11, wherein the AI model further comprises one or more second processing layers, and the processing unit is further configured to:
    perform a redundancy check on each second processing layer of the one or more second processing layers to obtain a check flag bit of each second processing layer of the one or more second processing layers;
    wherein the verification result further comprises the check flag bit of each second processing layer of the one or more second processing layers.
  14. The apparatus according to claim 12 or 13, wherein the parameters of the AI model comprise a weight matrix, the input data of the first processing layer comprises a feature map matrix, and the processing unit is specifically configured to:
    obtain a first check flag bit, the first check flag bit being obtained by performing a first check calculation on the weight matrix;
    obtain a second check flag bit, the second check flag bit being obtained by performing a second check calculation on the feature map matrix;
    obtain a pre-calculation check flag bit based on the first check flag bit and the second check flag bit;
    obtain an output matrix from the second computing unit, the output matrix being obtained by the second computing unit performing calculation on the weight matrix and the feature map matrix at the first processing layer;
    perform a third check calculation on the output matrix to obtain a post-calculation check flag bit; and
    obtain the check flag bit based on the pre-calculation check flag bit and the post-calculation check flag bit.
  15. The apparatus according to claim 14, wherein the check flag bit indicates whether the pre-calculation check flag bit and the post-calculation check flag bit are consistent, and the processing unit is specifically configured to:
    determine that the output result is incorrect if at least one check flag bit in the verification result indicates that the pre-calculation check flag bit and the post-calculation check flag bit are inconsistent.
  16. The apparatus according to any one of claims 12 to 15, wherein the first processing layer is a convolutional layer or a fully connected layer.
  17. The apparatus according to any one of claims 12 to 16, wherein, when the output result is determined to be incorrect, the state of the second computing unit includes a transient failure and a permanent failure.
  18. The apparatus according to claim 17, wherein, when the output result is determined to be incorrect, the processing unit is further configured to:
    determine, by running a self-test library, that the state of the second computing unit is a transient failure or a permanent failure.
  19. The apparatus according to claim 18, wherein the transceiver unit is further configured to:
    when the state of the second computing unit is a permanent failure, report the failure state of the second computing unit.
  20. A verification apparatus for AI computing, wherein the apparatus comprises:
    a transceiver unit, configured to obtain a verification result of an output result of an AI model used by a second computing unit to process the AI computation, the verification result being a determination that the output result is incorrect; and
    a processing unit, configured to run a self-test library to determine that the state of the second computing unit is a transient failure or a permanent failure.
  21. The apparatus according to claim 20, wherein:
    when a running result of the self-test library indicates no fault, the state of the second computing unit is a transient failure; and
    when the running result of the self-test library indicates a fault, the state of the second computing unit is a permanent failure.
  22. The apparatus according to claim 20 or 21, wherein the apparatus is further configured to:
    discard the output result when the state of the second computing unit is a transient failure; and
    report the failure state of the second computing unit when the state of the second computing unit is a permanent failure.
  23. A chip, comprising a first computing unit, wherein the first computing unit is configured to perform the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 11.
  24. The chip according to claim 23, wherein the chip further comprises a second computing unit, and the second computing unit is configured to perform AI computation.
  25. A computer-readable medium, wherein the computer-readable medium stores program code, and when the program code is run on a computer, the computer is caused to perform the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 11.
  26. A computing device, wherein the computing device comprises a first computing unit and a second computing unit, the second computing unit is configured to process AI computation based on an AI model, and the first computing unit performs the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 11 to verify the second computing unit.
  27. The computing device according to claim 26, wherein a processing capability of the first computing unit is less than or equal to a processing capability of the second computing unit.
  28. The computing device according to claim 26 or 27, wherein the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a central processing unit (CPU) chip, or a computing unit in a graphics processing unit (GPU) chip, and the second computing unit is a computing unit in an AI chip.
PCT/CN2022/084753 2021-08-12 2022-04-01 Ai computing verification method and apparatus WO2023015919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110924997.XA CN115705487A (en) 2021-08-12 2021-08-12 AI calculation verification method and device
CN202110924997.X 2021-08-12

Publications (1)

Publication Number Publication Date
WO2023015919A1 true WO2023015919A1 (en) 2023-02-16

Family

ID=85180872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084753 WO2023015919A1 (en) 2021-08-12 2022-04-01 Ai computing verification method and apparatus

Country Status (2)

Country Link
CN (1) CN115705487A (en)
WO (1) WO2023015919A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117333A (en) * 2018-09-29 2019-01-01 深圳比特微电子科技有限公司 Computing chip and its operating method
CN109902836A (en) * 2019-02-01 2019-06-18 京微齐力(北京)科技有限公司 The failure tolerant method and System on Chip/SoC of artificial intelligence module
CN113032195A (en) * 2021-03-24 2021-06-25 上海西井信息科技有限公司 Chip simulation verification method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN115705487A (en) 2023-02-17

Similar Documents

Publication Publication Date Title
JP7430735B2 (en) Safety monitor for image misclassification
CN111510158B (en) Fault-tolerant error-correcting decoding method, device and chip of quantum circuit
US11144027B2 (en) Functional safety controls based on soft error information
CN111976623B (en) Chassis domain controller for intelligent automobile, control method of vehicle and vehicle
US20210146939A1 (en) Device and method for controlling a vehicle module
CN112506690A (en) Method and device for controlling processor
US11971803B2 (en) Safety monitor for invalid image transform
US20210070321A1 (en) Abnormality diagnosis system and abnormality diagnosis method
CN112231134B (en) Fault processing method and device for neural network processor, equipment and storage medium
US20200074287A1 (en) Fault detectable and tolerant neural network
CN105550067B (en) A kind of airborne computer binary channels system of selection
WO2023015919A1 (en) Ai computing verification method and apparatus
US20240103964A1 (en) Method and system for fault-tolerant data communication
EP3115900A1 (en) A computer system and a method for executing safety-critical applications using voting
GB2605467A (en) Verifying processing logic of a graphics processing unit
Kaprocki et al. Multiunit automotive perception framework: Synergy between AI and deterministic processing
JP7449193B2 (en) Computing device and vehicle control device
KR20240013877A (en) Error-Proof Inference Computation for Neural Networks
US20230401140A1 (en) Method for carrying out data processing
EP4134875A1 (en) Method and apparatus for diagnostic coverage for ai hardware accelerators
Matsumoto et al. Fault tolerance in small world cellular neural networks for image processing
Koeda et al. Fault-Tolerant Ensemble CNNs Increasing Diversity Based on Knowledge Distillation
CN117999544A (en) Fault tolerant system with minimal hardware
KR20230012139A (en) Method and system for detecting abnormality of processor with vehicle
Ben Abdallah et al. Fault-Tolerant Neuromorphic System Design

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22854924

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE