CN113902112A - Hardware calculation simulation method, system and computer readable storage medium - Google Patents
- Publication number: CN113902112A
- Application number: CN202111503789.9A
- Authority
- CN
- China
- Prior art keywords
- hardware
- software
- operator
- data
- graph structure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/065—Analogue means
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a hardware computation simulation method, a system and a computer readable storage medium. The method is applied to a hardware computation simulation system that comprises a construction interface, a compiler and an executor, and includes the following steps: acquiring a neural network through the construction interface; serializing, by the compiler, the neural network into a software graph structure; and acquiring the data to be processed and the software graph structure through the executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed. The method aims to improve the adaptability and accuracy with which the hardware calculation simulation system simulates the calculation of hardware devices.
Description
Technical Field
The invention relates to the technical field of deep learning, in particular to a hardware calculation simulation method, a hardware calculation simulation system and a computer readable storage medium.
Background
With the rapid development of deep learning and AI chips, deep learning systems have become increasingly complex, and the various frameworks and hardware devices each have unique architectures. This makes development environment deployment, testing, iterative accuracy improvement and performance tuning for deep learning tedious and time-consuming, so deep learning inference frameworks have been introduced to optimize this process.
Conventional inference frameworks mainly comprise open source deep learning inference frameworks such as MNN, NCNN and TNN, which support the common hardware devices on the market. However, for hardware devices with unique designs and optimizations, using an open source deep learning inference framework may cause incompatibility problems such as unsupported operators and precision loss, so the performance and advantages of such hardware devices cannot be fully simulated.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a hardware calculation simulation method, a hardware calculation simulation system and a computer readable storage medium. It aims to solve the problem that, for hardware devices with unique designs and optimizations, using an open source deep learning inference framework may cause incompatibilities such as unsupported operators and precision loss, and to improve the adaptability and precision of simulating the calculation of hardware devices.
In order to achieve the above object, the present invention provides a hardware computation simulation method applied to a hardware computation simulation system, where the hardware computation simulation system includes a build interface, a compiler, and an executor, the build interface is connected to the compiler, the compiler is connected to the executor, and the executor is used to invoke a software operator or a hardware module to compute data to be processed, and the hardware computation simulation method includes:
acquiring a neural network through the construction interface;
serializing, by the compiler, the neural network into a software graph structure;
and acquiring the data to be processed and the software graph structure through the executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed.
Optionally, the compiler includes a hardware compiling unit, and after the step of serializing the neural network into a software graph structure by the compiler, the method further includes:
serializing the software graph structure into a hardware graph structure by the hardware compiling unit;
and acquiring the data to be processed and the hardware graph structure through the executor, and calling the hardware module to perform hardware calculation on the data to be processed.
Optionally, the step of serializing the software graph structure into a hardware graph structure by the hardware compiling unit includes:
and if an operator corresponding to the hardware graph structure contains a software operator, adding a corresponding data copy operator to the hardware graph structure.
Optionally, if an operator corresponding to the hardware graph structure includes a software operator, the step of adding a corresponding data copy operator to the hardware graph structure includes:
and adding a first data copy operator before a software operator among the operators corresponding to the hardware graph structure, so as to copy the data to be processed from the hardware module to the memory where the software operator is located, and calling the corresponding software operator to perform software calculation on the data to be processed.
Optionally, if an operator corresponding to the hardware graph structure includes a software operator, the step of adding a corresponding data copy operator to the hardware graph structure includes:
and adding a second data copy operator after a software operator among the operators corresponding to the hardware graph structure, so as to copy the data to be processed from the memory where the software operator is located back to the hardware module, and calling the hardware module to perform hardware calculation on the data to be processed.
Optionally, the step of serializing the neural network into a software graph structure by the compiler comprises:
acquiring a preset relation between a component of the neural network and the software operator;
and serializing the neural network into the software graph structure according to the preset relation.
Optionally, before the step of obtaining the neural network through the building interface, the method further includes:
when a calling instruction of an upper-layer compiler is received, accepting the call of the upper-layer compiler;
and acquiring the neural network constructed by the upper-layer compiler through the construction interface.
In addition, to achieve the above object, the present invention also provides a hardware computation simulation system, including:
the building module is used for obtaining the neural network through the building interface;
the compiling module is used for serializing the neural network into a software graph structure through a compiler;
and the execution module is used for acquiring the data to be processed and the software graph structure through an executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate a hardware module performing hardware calculation on the data to be processed.
In addition, to achieve the above object, the present invention also provides a hardware computation simulation system, including: a memory, a processor, and a hardware computation simulation program stored on the memory and executable on the processor, wherein the hardware computation simulation program, when executed by the processor, implements the steps of the hardware computation simulation method described above.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium having a hardware computation simulation program stored thereon, which, when executed by a processor, implements the steps of the hardware computation simulation method as described above.
According to the hardware calculation simulation method, the hardware calculation simulation system and the computer readable storage medium, the hardware calculation simulation method is applied to the hardware calculation simulation system. First, the neural network is obtained through the construction interface; the compiler then serializes the neural network into a software graph structure; finally, the executor acquires the data to be processed and the software graph structure, and calls a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed. Compared with conventional hardware calculation simulation methods, the method provided by the embodiment of the invention supports hardware devices with unique designs and optimizations, solves the problems of unsupported operators, precision loss and the like that occur in conventional inference frameworks when simulating hardware device calculation, and improves the adaptability and accuracy of hardware device simulation.
Drawings
Fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a hardware computation simulation method according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating a hardware computation simulation method according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating basic components of a neural network involved in the hardware computation simulation method of the present invention;
FIG. 5 is a diagram illustrating the basic components of a software graph structure involved in the hardware computation simulation method of the present invention;
FIG. 6 is a diagram illustrating basic components of a hardware graph structure involved in the hardware calculation simulation method according to the present invention;
FIG. 7 is a schematic diagram illustrating a software operator calculation flow involved in the hardware calculation simulation method according to the present invention;
fig. 8 is a schematic block diagram of a hardware computation simulation system according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the related art, open source deep learning inference frameworks such as MNN, NCNN and TNN support the common hardware devices on the market. However, for hardware devices with unique designs and optimizations, an open source deep learning inference framework may cause incompatibilities such as unsupported operators and precision loss, cannot fully simulate the performance characteristics and advantages of the hardware device, and cannot provide accurate test support during the design stage of the hardware device.
In order to improve the adaptability and accuracy of simulating hardware device calculation, the embodiment of the invention provides a hardware calculation simulation method, a hardware calculation simulation system and a computer readable storage medium. The method mainly comprises the following steps:
acquiring a neural network through the construction interface;
serializing, by the compiler, the neural network into a software graph structure;
and acquiring the data to be processed and the software graph structure through the executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed.
Therefore, the neural network is serialized into a software graph structure, and the software operators corresponding to the software graph structure are called to perform software calculation on the data to be processed, thereby simulating the calculation process of the hardware device and improving the adaptability and accuracy of the simulation.
The following detailed description of the claimed invention refers to the accompanying drawings.
As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
In the embodiment of the invention, the hardware calculation simulation system can run on the terminal.
As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a memory 1003, and a communication bus 1002. The communication bus 1002 is used to enable communication between these components. The memory 1003 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). The memory 1003 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1003, which is a kind of computer storage medium, may include therein an operating system and a hardware calculation simulation program.
In the terminal shown in fig. 1, the processor 1001 may be configured to call a hardware computation simulation program stored in the memory 1003, and perform the following operations:
acquiring a neural network through the construction interface;
serializing, by the compiler, the neural network into a software graph structure;
and acquiring the data to be processed and the software graph structure through the executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed.
Further, the processor 1001 may call a hardware computation simulation program stored in the memory 1003, and further perform the following operations:
serializing the software graph structure into a hardware graph structure by the hardware compiling unit;
and acquiring the data to be processed and the hardware graph structure through the executor, and calling the hardware module to perform hardware calculation on the data to be processed.
Further, the processor 1001 may call a hardware computation simulation program stored in the memory 1003, and further perform the following operations:
and if an operator corresponding to the hardware graph structure contains a software operator, adding a corresponding data copy operator to the hardware graph structure.
Further, the processor 1001 may call a hardware computation simulation program stored in the memory 1003, and further perform the following operations:
and adding a first data copy operator before a software operator among the operators corresponding to the hardware graph structure, so as to copy the data to be processed from the hardware module to the memory where the software operator is located, and calling the corresponding software operator to perform software calculation on the data to be processed.
Further, the processor 1001 may call a hardware computation simulation program stored in the memory 1003, and further perform the following operations:
and adding a second data copy operator after a software operator among the operators corresponding to the hardware graph structure, so as to copy the data to be processed from the memory where the software operator is located back to the hardware module, and calling the hardware module to perform hardware calculation on the data to be processed.
Further, the processor 1001 may call a hardware computation simulation program stored in the memory 1003, and further perform the following operations:
acquiring a preset relation between a component of the neural network and the software operator;
and serializing the neural network into the software graph structure according to the preset relation.
Further, the processor 1001 may call a hardware computation simulation program stored in the memory 1003, and further perform the following operations:
when a calling instruction of an upper-layer compiler is received, accepting the call of the upper-layer compiler;
and acquiring the neural network constructed by the upper-layer compiler through the construction interface.
With the rapid development of deep learning and AI chips, deep learning systems have become increasingly complex, and the various deep learning frameworks and hardware devices each have unique architectures. Work such as development environment deployment and testing, iterative accuracy improvement and performance tuning therefore becomes tedious and time-consuming, so a deep learning inference framework is introduced to simulate the deep learning calculation process.
At present there are many common deep learning inference frameworks, which fall mainly into two categories. One category is inference frameworks developed by hardware vendors for their own specific hardware, such as OpenVINO by Intel, TensorRT by Nvidia, and MediaPipe by Google. OpenVINO and TensorRT support the model formats of mainstream deep learning frameworks such as TensorFlow, PyTorch, MXNet and Caffe, while MediaPipe only supports the model format of the TensorFlow framework. The OpenVINO, TensorRT and MediaPipe frameworks perform well, but each supports a single hardware platform, mostly the vendor's own self-developed hardware. The other category is open source deep learning inference frameworks such as MNN, NCNN and TNN, which support various training frameworks such as TensorFlow, PyTorch, MXNet and Caffe. However, an open source deep learning inference framework defines its own set of model formats and inevitably requires the converters provided by the framework; for hardware devices with unique designs and optimizations, or for self-developed operators, this may cause problems such as unsupported operators and precision loss, so the full performance characteristics and advantages of a self-developed hardware platform cannot be exploited, and accurate operation tests cannot be provided for the hardware device.
It can be seen that the related hardware calculation simulation methods have the above-mentioned drawbacks. To address them, the embodiments of the present invention provide a hardware computation simulation method applied to a hardware computation simulation system, which aims to solve the problem that, for hardware devices with unique designs and optimizations, using an open source deep learning inference framework may cause incompatibilities such as unsupported operators and precision loss, and to improve the adaptability and precision of simulating hardware device calculation.
Hereinafter, the contents of the claims of the present invention are explained through specific exemplary embodiments so that those skilled in the art can better understand the scope of the claims. It is to be understood that the following exemplary embodiments are merely illustrative of the present invention and are not intended to limit its scope.
Illustratively, referring to fig. 2, in an embodiment of the hardware computation simulation method of the present invention, the hardware computation simulation method comprises the steps of:
s10, acquiring a neural network through the construction interface;
in this embodiment, the main body of the hardware computation simulation method is a hardware computation simulation system, the hardware computation simulation system can schedule an operator to perform computation inference, the hardware computation simulation system includes a construction interface, a compiler, and an actuator, the construction interface is connected to the compiler, the compiler is connected to the actuator, the actuator is used to call a software operator or a hardware module to perform computation on data to be processed, the hardware module includes a hardware device, which may be a CAISA chip, the CAISA chip is an artificial intelligence chip facing edge and cloud inference, the CAISA chip can support all common AI operators, and the CAISA chip can support most CNN algorithms (probabilistic Neural Networks Convolutional Neural Networks) through different configurations and combinations of operators in a data stream network. The building interface is an abstract interface at a higher layer in a hardware computation simulation system and is used for obtaining a neural network, the neural network is an arithmetic mathematical model for performing distributed parallel information processing, the purpose of processing information is achieved by adjusting the interconnection relationship among a large number of internal nodes, and the neural network is obtained through a building window, as shown in fig. 4, fig. 4 is a basic component of a convolutional neural network MobileNet, and the MobileNet is a mobile terminal or an embedded conventional convolutional neural network and is a small and high-efficiency CNN model, wherein 3 × 3DepthWise Conv (3 × 3 deep convolution), BN (normalization), RELU (activation function), and 1 × 1 Conv (1 × 1 convolution) are a computation flow of data to be processed.
Optionally, the construction interface may receive a call from an upper-layer compiler. Before the step of obtaining the neural network through the construction interface, if a calling instruction of the upper-layer compiler is received, the construction interface accepts the call of the upper-layer compiler. The upper-layer compiler may be a RainBuilder compiler, a general end-to-end automatic compilation tool, which can construct the neural network through the construction interface of the hardware computation simulation system. This decouples the RainBuilder compiler from the hardware computation, reduces the difficulty of constructing the neural network, and facilitates maintenance.
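The decoupling described above can be illustrated with a minimal sketch: the construction interface exposes only an abstract way to assemble network components, so an upper-layer compiler never touches the hardware computation beneath it. All class and method names here are illustrative assumptions, not identifiers from the patent.

```python
# Hypothetical sketch of the construction interface: an upper-layer
# compiler (e.g. a RainBuilder-style tool) builds the neural network
# purely through this abstract interface, staying decoupled from the
# compiler and executor layers below.
class BuildInterface:
    def __init__(self):
        self.components = []

    def add_component(self, name):
        """Append one network component; returns self to allow chaining."""
        self.components.append(name)
        return self

    def get_network(self):
        """Hand the assembled component list to the lower-layer compiler."""
        return list(self.components)


# An upper-layer compiler would drive the interface like this:
iface = BuildInterface()
iface.add_component("DepthwiseConv3x3").add_component("BN").add_component("ReLU")
network = iface.get_network()
```

Because the upper-layer compiler only ever sees `BuildInterface`, the serialization and execution machinery can change without affecting it.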
S20, serializing the neural network into a software graph structure through the compiler;
In this embodiment, the compiler of the hardware computation simulation system is connected to the construction interface and is located at a lower layer than the construction interface. After the construction interface obtains the neural network, the components of the neural network are serialized into a software graph structure, which is an ordered arrangement of pure software operators.
Optionally, the step of serializing the neural network into a software graph structure by a compiler comprises: acquiring a preset relation between the neural network component and the software operator; and serializing the neural network into the software graph structure according to the preset relation.
The components of the neural network and the software operators have a corresponding mapping relation, which can be preset. When the compiler loads the neural network, it analyzes the components of the neural network, confirms the software operators corresponding to the neural network according to the preset relation, and arranges those software operators into a software graph structure in order.
Referring to fig. 4 and 5, fig. 4 shows the basic components of the convolutional neural network MobileNet. Taking single batch processing (Single Batch) as an example, when the construction interface of the hardware computation simulation system acquires the neural network MobileNet, the compiler of the hardware computation simulation system serializes the basic components of MobileNet into the MobileNet software graph structure SW_GRAPH shown in fig. 5, where op_m_gconv (depthwise two-dimensional convolution operator) and op_m_conv (two-dimensional convolution operator) form the computation flow of the software graph structure for the data to be processed.
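A minimal sketch of this serialization step, assuming the preset relation is a dictionary from component names to software operator names. The operator names op_m_gconv and op_m_conv appear in fig. 5; op_bn and op_relu, and the component names, are hypothetical placeholders.

```python
# Hypothetical preset relation between neural-network components and
# software operators (op_m_gconv / op_m_conv are from fig. 5; the rest
# are illustrative placeholders).
PRESET_RELATION = {
    "DepthwiseConv3x3": "op_m_gconv",  # depthwise 2-D convolution operator
    "Conv1x1": "op_m_conv",            # 2-D convolution operator
    "BN": "op_bn",
    "ReLU": "op_relu",
}


def serialize_to_software_graph(network_components):
    """Map each component to its software operator, preserving order."""
    graph = []
    for component in network_components:
        op = PRESET_RELATION.get(component)
        if op is None:
            raise ValueError(f"no software operator registered for {component}")
        graph.append(op)
    return graph


# One MobileNet-style block, serialized into a software graph:
mobilenet_block = ["DepthwiseConv3x3", "BN", "ReLU", "Conv1x1", "BN", "ReLU"]
sw_graph = serialize_to_software_graph(mobilenet_block)
```

Raising on an unknown component mirrors the compatibility concern in the background section: an operator with no registered mapping cannot be serialized.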
S30, acquiring the data to be processed and the software graph structure through the executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed.
In this embodiment, the executor of the hardware computation simulation system is connected to the compiler and is located at a lower layer than the compiler. After the compiler serializes the neural network into a software graph structure, the executor obtains the data to be processed and the software graph structure, and calls the corresponding software operators according to the software operators in the software graph structure to compute the data to be processed. The data to be processed may be an image requiring feature extraction. A software operator can serve as a simulation operator that emulates a hardware module; the computation of the simulation operator is completed on the CPU. The executor may call the software operators on the CPU, which are simple to deploy and suitable for demonstration, or it may call the hardware module, which computes the data to be processed quickly and is suitable for actual production scenarios.
When a software operator computes the data to be processed, its computation process is the same as that of the hardware module, so the hardware module can be simulated through software calculation. The hardware module called by the executor may be a CAISA chip. The executor of the hardware calculation simulation system provided by the embodiment of the invention is uniformly scheduled and optimized, can fully exploit the computational performance advantages of the data flow architecture of the CAISA chip, and also supports INT8/INT16 quantization precision modes. Operator calculation that simulates the hardware behavior can provide an accuracy-verification reference during the chip design stage and can be cross-checked against the calculation results of the CAISA chip, further improving verification and debugging efficiency.
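The executor's dispatch loop can be sketched as follows. The two toy operators stand in for real software operators that mirror the hardware module's arithmetic; all names and the operator bodies are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical executor sketch: given a software graph (an ordered list
# of operator names) and the data to be processed, call each software
# operator in turn on the CPU, mimicking the hardware module's
# computation flow.
SOFTWARE_OPERATORS = {
    "scale2x": lambda xs: [2 * x for x in xs],   # placeholder operator
    "add_bias": lambda xs: [x + 1 for x in xs],  # placeholder operator
}


def execute(software_graph, data):
    """Run the data through every operator named in the software graph."""
    for op_name in software_graph:
        data = SOFTWARE_OPERATORS[op_name](data)
    return data


result = execute(["scale2x", "add_bias"], [1, 2, 3])
```

Because the same graph could equally be dispatched to hardware operators, results computed this way can be compared element-for-element against the chip's output for verification.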
In the technical scheme disclosed in this embodiment, a neural network is obtained through the construction interface; the compiler serializes the neural network into a software graph structure; and the executor obtains the data to be processed and the software graph structure, and calls the software operators corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation performed by the hardware module on the data to be processed. By serializing the components of the neural network into a software graph structure and calling the corresponding software operators according to that structure, the software calculation not only serves as a demonstration but also simulates the hardware module's calculation on the data to be processed. It can provide an accuracy-verification reference during the chip design stage and be cross-checked against the CAISA chip's calculation results, further improving verification and debugging efficiency. Self-developed operators can also be adapted, so that the performance and advantages of the hardware module are fully exploited, accurate operation tests are provided for the hardware module's calculation, complex hardware environment installation and deployment are avoided, and the adaptability and accuracy of simulating hardware devices are improved.
Optionally, referring to fig. 3, based on any one of the above embodiments, in another embodiment of the hardware computation simulation method of the present invention, the hardware computation simulation method includes:
s40, serializing the software graph structure into a hardware graph structure through the hardware compiling unit;
In this embodiment, step S40 is performed after step S20. The compiler can not only serialize the neural network into a software graph structure; it also includes a hardware compiling unit that can serialize the software graph structure into a hardware graph structure. The hardware graph structure is optimized by splitting and merging operators according to the hardware features and the characteristics of specific operators, so its calculation is faster.
Optionally, if an operator corresponding to the hardware graph structure includes a software operator, a corresponding data copy operator is added to the hardware graph structure.
The hardware computation simulation method in this embodiment supports user-defined software operators. If software operators exist before or after a hardware operator, a data copy operator OP_DATA_CPY needs to be added accordingly to copy the data to be processed; specifically, to copy the data to be processed from the DDR (double data rate synchronous dynamic random access memory) to the CPU memory, or from the CPU memory back to the DDR.
Specifically, a data copy operator is added before a software operator among the operators corresponding to the hardware graph structure, so that the data to be computed is copied from the hardware module into the memory where the software operator resides and can then undergo software computation. Alternatively, a data copy operator is added after a software operator among those operators, so that the data is copied from the memory where the software operator resides back into the hardware module and can then undergo hardware computation.
It should be understood that the hardware module includes a memory (which may be DDR) storing the operators required for hardware computation, while software operators are stored in the memory associated with the CPU. A hardware graph structure may therefore contain software operators that are not stored in the hardware module's memory, in which case the graph cannot be completed on the hardware module alone. Accordingly, a data copy operator is added before a software operator in the hardware graph structure so that the data to be processed can be copied from the DDR into host memory, the corresponding software operator can be called, and that operator's computation can be completed on the CPU. If the software operator is followed by computation that must be carried out on the hardware module, a data copy operator is added after the software operator so that the data is copied from host memory back into the DDR and hardware computation continues in the hardware module.
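The insertion rule above can be sketched as a small compile pass. This is a minimal sketch under stated assumptions: the hardware-operator set, operator names, and the string labels for the copy directions are all illustrative, not the disclosed implementation.

```python
# Hypothetical compile pass: insert OP_DATA_CPY around software operators in a
# hardware graph, mirroring the DDR <-> host-memory copies described above.
# The HW_OPS set and all operator names are illustrative assumptions.

HW_OPS = {"op_m_conv_hw", "op_m_gconv_hw"}

def insert_data_copies(ops):
    """ops: ordered operator names; any name not in HW_OPS is a software operator."""
    result = []
    for i, op in enumerate(ops):
        prev_is_hw = i > 0 and ops[i - 1] in HW_OPS
        next_is_hw = i + 1 < len(ops) and ops[i + 1] in HW_OPS
        if op not in HW_OPS:
            if prev_is_hw:
                # data currently sits in DDR; copy to host memory for the CPU op
                result.append("OP_DATA_CPY(ddr->mem)")
            result.append(op)
            if next_is_hw:
                # hardware computation continues; copy results back to DDR
                result.append("OP_DATA_CPY(mem->ddr)")
        else:
            result.append(op)
    return result

print(insert_data_copies(["op_m_conv_hw", "my_custom_sw_op", "op_m_gconv_hw"]))
```

Note that a copy operator is emitted only when a hardware operator actually precedes or follows the software operator, which matches the "dotted rectangle" optional operators in fig. 6: a purely software prefix or suffix needs no copy.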
Referring to fig. 5 and 6, fig. 5 shows the software graph structure SW_GRAPH of the neural network MobileNet, and fig. 6 shows the hardware graph structure HW_GRAPH obtained by the hardware compiling module in the compiler serializing SW_GRAPH. In fig. 6, op_m_gconv_hw (the depthwise two-dimensional convolution operator) and op_m_conv_hw (the two-dimensional convolution operator) form the computation flow of the data to be processed through the hardware graph structure. A dotted rectangle indicates that the operator may or may not exist; the operator in the dotted rectangle is the data copy operator OP_DATA_CPY. If a software operator exists in the hardware graph structure, the data copy operator exists, and its exact placement depends on whether an operator that must run in the hardware module precedes or follows the software operator.
S50, acquiring the data to be processed and the hardware graph structure through the executor, and calling the hardware module to perform the hardware calculation on the data to be processed.
In this embodiment, after the hardware compiling unit of the compiler in the hardware computation simulation system converts the software graph structure of the neural network into a hardware graph structure, the executor acquires the data to be processed and the hardware graph structure. The data to be processed may be, for example, an image requiring feature extraction, and the executor calls the hardware module to compute it.
Step S50 may be performed simultaneously with, before, or after step S30; this embodiment does not limit the order.
It can be understood that the software computation mentioned in step S30 is actually used to simulate the hardware computation of this embodiment: the software operators may run on the CPU while exhibiting the expected effect and computation flow of the hardware module. The software computation results and the hardware computation results can then be verified against each other, which simplifies the hardware module's work in development-environment deployment, testing, iterative accuracy improvement, and performance tuning, and provides accurate test support at the design stage for hardware devices intended for deep learning.
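The mutual verification described above reduces, in practice, to an element-wise comparison of the two result streams within a tolerance. A minimal sketch (the helper name and tolerance value are assumptions for illustration; a real quantized pipeline would pick the tolerance from the quantization step size):

```python
# Hypothetical cross-verification helper: compare software-simulated results
# with hardware results element-wise, within a chosen absolute tolerance.

def cross_verify(sw_out, hw_out, atol=1e-3):
    """Return (ok, max_err) for two equal-length result lists."""
    errs = [abs(a - b) for a, b in zip(sw_out, hw_out)]
    max_err = max(errs) if errs else 0.0
    return max_err <= atol, max_err

ok, err = cross_verify([0.34, 1.20], [0.3401, 1.1999])
print(ok, err)
```

Because the simulation also exposes per-module intermediate results, the same comparison can be applied after each operator to localize a divergence rather than only at the final output.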
For convenience of understanding, the present solution is described below by specific application scenarios, and please refer to fig. 4, 5, 6, and 7.
Fig. 4 shows the basic components of the convolutional neural network MobileNet. Taking single-batch processing (Single Batch) as an example, after the construction interface of the hardware computation simulation system acquires MobileNet, the compiler of the system represents its basic components as the software graph structure SW_GRAPH shown in fig. 5. Fig. 6 shows the hardware graph structure HW_GRAPH obtained by the hardware compiling module in the compiler serializing SW_GRAPH. In fig. 6, a dotted rectangle indicates that the operator may or may not exist; the operator in the dotted rectangle is the data copy operator OP_DATA_CPY. If the hardware graph structure contains a software operator, the data copy operator exists, and its position depends on whether an operator that must run in the hardware module precedes or follows the software operator.
After the hardware computation simulation system converts SW_GRAPH into HW_GRAPH, the executor runs the corresponding operators of both graph structures: the software graph structure SW_GRAPH is used to execute, on the CPU, the operators that simulate hardware computation behavior, while the hardware graph structure HW_GRAPH dispatches, through the executor, the hardware module of the CAISA chip to compute the data.
The software graph structure SW_GRAPH invokes, through the executor, software operators that simulate hardware computation behavior; this is illustrated below with the INT8/INT16 quantization process of the software operator M_CONV in fig. 5.
The convolution formula for M_CONV is:

f(x) = Σ_c Σ_kh Σ_kw (w · x) + b

f(x) represents the convolution output, x is the convolutional-layer input, w is the weight matrix of the convolutional layer, and b is the bias unit; kh and kw represent the height and width of the convolution kernel, and c is the number of input channels.
The quantization formula corresponding to M_CONV is:

r = S · (q − z)

r represents a real number, q is the quantized value, S is the scaling factor from the real-number value range to the quantization domain, and z is the zero point, i.e. the value of q when r = 0, which guarantees that r = 0 can be represented exactly.
Substituting the quantization formula into the convolution formula yields the quantized convolution result:

so · (qo − zo) = Σ [si · (qi − zi)] · [sw · (qw − zw)] + b

that is,

qo = zo + (si · sw / so) · Σ (qi − zi) · (qw − zw) + b / so

ro, ri and rw represent the output, input and weight matrices in the real-number domain; qo, qi and qw represent the output, input and weight matrices in the quantization domain; zo, zi and zw represent their zero points; and so, si and sw represent their scaling factors.
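The quantized-convolution relation above, qo = zo + (si·sw/so)·Σ(qi − zi)(qw − zw) + b/so, can be checked numerically. The sketch below uses the scalar (single-term) case with illustrative scale and zero-point values; in practice S and z would come from calibration:

```python
# Numerical check of the quantized-convolution identity (scalar case).
# All scale/zero-point values are illustrative assumptions.

def quantize(r, s, z):
    return round(r / s) + z      # q such that r ≈ s * (q - z)

def dequantize(q, s, z):
    return s * (q - z)

si, zi = 0.1, 5       # input scale / zero point (assumed)
sw, zw = 0.2, 3       # weight scale / zero point (assumed)
so, zo = 0.05, 0      # output scale / zero point (assumed)

ri, rw, b = 0.4, 0.6, 0.1
qi, qw = quantize(ri, si, zi), quantize(rw, sw, zw)

# qo = zo + (si*sw/so) * (qi - zi)*(qw - zw) + b/so
qo = zo + (si * sw / so) * (qi - zi) * (qw - zw) + b / so
ro = dequantize(qo, so, zo)

# the dequantized output matches the real-domain convolution ri*rw + b
print(abs(ro - (ri * rw + b)) < 1e-6)
```

The scales chosen here make qi and qw exact, so the identity holds to floating-point precision; with real data the rounding in `quantize` introduces the usual quantization error.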
The specific computation flow in the hardware module, the CAISA chip, is shown in fig. 7; in the hardware computation simulation system, the computation flow of calling the software operators likewise simulates the hardware computation behavior shown in fig. 7.
The above formula is realized by a KRNL module and an ACTV module, where the computation inside the brackets is completed in the KRNL module:

qo = zo + (si · sw / so) · [ Σ qi · qw − zw · Σ qi − zi · Σ qw + N · zi · zw + b / (si · sw) ]    (7)

where N is the number of terms in the summation.
Since qw, zi, zw and b are all known at compile time, the constant part of equation (7), namely − zi · Σ qw + N · zi · zw + b / (si · sw), can be transferred to the ACTV module and computed as a new bias b', and equation (7) can be expressed as:

qo = zo + (si · sw / so) · [ Σ qi · qw − zw · Σ qi + b' ]
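Folding the constant part into a new bias can be demonstrated with a small sketch. This is an illustrative expansion of the summation Σ(qi − zi)(qw − zw) under stated assumptions (integer vectors, function names invented for the example); only qi is unknown at compile time, so every term not involving qi can be precomputed:

```python
# Sketch of folding the compile-time-constant terms of the expanded quantized
# convolution into a new bias handed to the activation stage.
# Function names and the scalar zero points are illustrative assumptions.

def quantized_conv_direct(qi, qw, zi, zw, b_q, n):
    # full expansion: sum(qi*qw) - zw*sum(qi) - zi*sum(qw) + n*zi*zw + b_q
    return (sum(a * w for a, w in zip(qi, qw))
            - zw * sum(qi) - zi * sum(qw) + n * zi * zw + b_q)

def fold_bias(qw, zi, zw, b_q, n):
    # constant part: computable at compile time (weights and zero points known)
    return -zi * sum(qw) + n * zi * zw + b_q

def quantized_conv_folded(qi, qw, zw, b_new):
    # runtime work left for the kernel stage: sum(qi*qw) - zw*sum(qi), plus bias
    return sum(a * w for a, w in zip(qi, qw)) - zw * sum(qi) + b_new

qi, qw, zi, zw, b_q, n = [9, 7, 5], [6, 4, 2], 5, 3, 17, 3
b_new = fold_bias(qw, zi, zw, b_q, n)
print(quantized_conv_direct(qi, qw, zi, zw, b_q, n) == quantized_conv_folded(qi, qw, zw, b_new))
```

The folded form does strictly less runtime arithmetic, which is the motivation for moving the constant part out of the per-inference path.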
the hardware simulation method provided by the embodiment can completely simulate the computing flow and behavior of the CAISA hardware, and can obtain the intermediate result of each module, thereby providing a basis for hardware verification and debugging.
In the technical solution proposed in this embodiment, the hardware compiling unit serializes the software graph structure into a hardware graph structure, and the executor acquires the data to be processed together with the hardware graph structure and calls the hardware module to perform hardware computation on the data. On the one hand, having the hardware module complete the data processing increases processing speed; on the other hand, the hardware and software computation results can be verified against each other, the software computation can display the hardware computation flow, and each provides accurate test support for the other, which also improves the fidelity and accuracy of the simulated hardware device's computation.
In addition, an embodiment of the present invention further provides a hardware computation simulation system, where the hardware computation simulation system includes a memory, a processor, and a hardware computation simulation method program that is stored in the memory and is executable on the processor, and when the hardware computation simulation method program is executed by the processor, the hardware computation simulation method program implements the steps of the hardware computation simulation method described in the above embodiments.
Furthermore, an embodiment of the present invention further provides a hardware computation simulation system, exemplarily referring to fig. 8, where the hardware computation simulation system 100 includes:
the device comprises a construction module 101, a compiling module 102 and an executing module 103, wherein the construction module 101 is used for acquiring a neural network through a construction interface; a compiling module 102, configured to serialize the neural network into a software graph structure through a compiler; the execution module 103 is configured to obtain data to be processed and the software graph structure through an executor, and call a software operator corresponding to the software graph structure to perform software computation on the data to be processed, so as to simulate a hardware module to perform hardware computation on the data to be processed.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a hardware computation simulation method program is stored, and when the hardware computation simulation method program is executed by a processor, the steps of the hardware computation simulation method described in the above embodiments are implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a hardware computation simulation system to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A hardware computation simulation method is applied to a hardware computation simulation system, the hardware computation simulation system comprises a construction interface, a compiler and an executor, the construction interface is connected with the compiler, the compiler is connected with the executor, the executor is used for calling a software operator or a hardware module to compute data to be processed, and the hardware computation simulation method comprises the following steps:
acquiring a neural network through the construction interface;
serializing, by the compiler, the neural network into a software graph structure;
and acquiring the data to be processed and the software graph structure through the executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate the hardware calculation of the hardware module on the data to be processed.
2. The hardware computation simulation method of claim 1, wherein the compiler comprises a hardware compiling unit, and wherein, after the step of serializing the neural network into a software graph structure by the compiler, the method further comprises:
serializing the software graph structure into a hardware graph structure by the hardware compiling unit;
and acquiring the data to be processed and the hardware graph structure through the executor, and calling the hardware module to perform hardware calculation on the data to be processed.
3. The hardware computation simulation method of claim 2, wherein the step of serializing the software graph structure into a hardware graph structure by the hardware compilation unit comprises:
if the operators corresponding to the hardware graph structure include a software operator, adding a corresponding data copy operator to the hardware graph structure.
4. The hardware computation simulation method of claim 3, wherein if the operator corresponding to the hardware graph structure includes a software operator, the step of adding a corresponding data copy operator to the hardware graph structure comprises:
adding a first data copy operator before a software operator among the operators corresponding to the hardware graph structure, so as to copy the data to be processed from the hardware module to a memory where the software operator is located, and calling the corresponding software operator to perform the software calculation on the data to be processed.
5. The hardware computation simulation method of claim 3, wherein if the operator corresponding to the hardware graph structure includes a software operator, the step of adding a corresponding data copy operator to the hardware graph structure comprises:
adding a second data copy operator after a software operator among the operators corresponding to the hardware graph structure, so as to copy the data to be processed from the memory where the software operator is located to the hardware module, and calling the hardware module to perform the hardware calculation on the data to be processed.
6. The hardware computation simulation method of claim 1, wherein the step of serializing the neural network into a software graph structure by the compiler comprises:
acquiring a preset relation between a component of the neural network and the software operator;
and serializing the neural network into the software graph structure according to the preset relation.
7. The hardware computation simulation method of claim 1, wherein, before the step of acquiring a neural network through the construction interface, the method further comprises:
when a calling instruction of an upper-layer compiler is received, the calling of the upper-layer compiler is accepted;
and acquiring the neural network constructed by the upper-layer compiler through the construction interface.
8. A hardware computation simulation system, comprising:
the building module is used for obtaining the neural network through the building interface;
the compiling module is used for serializing the neural network into a software graph structure through a compiler;
and the execution module is used for acquiring the data to be processed and the software graph structure through an executor, and calling a software operator corresponding to the software graph structure to perform software calculation on the data to be processed, so as to simulate a hardware module performing hardware calculation on the data to be processed.
9. A hardware computation simulation system, comprising: memory, a processor and a hardware computation simulation program stored on the memory and executable on the processor, the hardware computation simulation program when executed by the processor implementing the steps of the hardware computation simulation method of any of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a hardware computation simulation program, which when executed by a processor implements the steps of the hardware computation simulation method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111503789.9A CN113902112A (en) | 2021-12-10 | 2021-12-10 | Hardware calculation simulation method, system and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113902112A true CN113902112A (en) | 2022-01-07 |
Family
ID=79025520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111503789.9A Pending CN113902112A (en) | 2021-12-10 | 2021-12-10 | Hardware calculation simulation method, system and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113902112A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115495093A (en) * | 2022-11-07 | 2022-12-20 | 深圳鲲云信息科技有限公司 | Hybrid compiling method and device, electronic equipment and storage medium |
CN116306856A (en) * | 2023-05-17 | 2023-06-23 | 之江实验室 | Deep learning model deployment method and device based on search |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650922A (en) * | 2016-09-29 | 2017-05-10 | 清华大学 | Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system |
CN111104120A (en) * | 2018-10-29 | 2020-05-05 | 赛灵思公司 | Neural network compiling method and system and corresponding heterogeneous computing platform |
US10789402B1 (en) * | 2019-05-01 | 2020-09-29 | Xilinx, Inc. | Compiler and hardware abstraction layer architecture for a neural network accelerator |
CN112445465A (en) * | 2019-08-28 | 2021-03-05 | 无锡江南计算技术研究所 | Neural network model reasoning and training method based on C code generation |
CN112465108A (en) * | 2020-11-11 | 2021-03-09 | 上海交通大学 | Neural network compiling method for storage and calculation integrated platform |
WO2021069211A1 (en) * | 2019-10-11 | 2021-04-15 | Robert Bosch Gmbh | Method of and apparatus for processing data of a deep neural network |
CN112860420A (en) * | 2019-11-27 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Data processing method and device based on hardware virtualization |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3754495B1 (en) | Data processing method and related products | |
WO2021233069A1 (en) | Quantization training and image processing methods and devices, and storage medium | |
US9043770B2 (en) | Program module applicability analyzer for software development and testing for multi-processor environments | |
KR20210149045A (en) | artificial intelligence chip verification | |
CN109993299A (en) | Data training method and device, storage medium, electronic device | |
CN113902112A (en) | Hardware calculation simulation method, system and computer readable storage medium | |
US8677334B2 (en) | Parallelization method, system and program | |
US6430590B1 (en) | Method and apparatus for processing executable program modules having multiple dependencies | |
US20140006751A1 (en) | Source Code Level Multistage Scheduling Approach for Software Development and Testing for Multi-Processor Environments | |
JP2959525B2 (en) | Data processing apparatus and method, information storage medium | |
JPH11513512A (en) | Method of manufacturing digital signal processor | |
Ahmad et al. | Petri net modeling and deadlock analysis of parallel manufacturing processes with shared-resources | |
US20190102149A1 (en) | Method for providing an integrated process for control unit development and a simulation device for control unit development | |
CN114399019A (en) | Neural network compiling method, system, computer device and storage medium | |
Pedre et al. | Accelerating embedded image processing for real time: a case study | |
JP2013164657A (en) | Parallelization method, system and program | |
Chaves et al. | Octave and python: High-level scripting languages productivity and performance evaluation | |
CN117520205A (en) | AI software stack testing method and device, computer readable storage medium and terminal | |
Yasudo et al. | Performance estimation for exascale reconfigurable dataflow platforms | |
CN116523052A (en) | Rapid reasoning method, device and equipment | |
CN115618943A (en) | Model deployment method, device and system and electronic equipment | |
CN114970847A (en) | Data processing method, device and storage medium | |
CN111950219B (en) | Method, apparatus, device and medium for realizing simulator | |
CN114461225A (en) | Compiling method and device, electronic equipment and storage medium | |
Giesl et al. | Computing Contraction Metrics: Comparison of Different Implementations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |