CN113298259B - CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform - Google Patents
- Publication number
- CN113298259B CN202110647708.6A CN202110647708A
- Authority
- CN
- China
- Prior art keywords
- function
- layer
- pooling
- cnn
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of radar information processing and discloses a CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism on an embedded platform. A model file trained by a deep learning framework is read, and the weight and bias parameters are extracted from it and defined with pointer variables; the operations in the CNN network are each encapsulated into operation kernel functions, and a general programming interface is designed for constructing the prediction function of the CNN reasoning framework; the prediction function is bound to core numbers of the multi-core processor based on the processor's multithreading mechanism; a corresponding VSIPL static library is written according to the platform type on which the CNN reasoning framework is to be deployed; and the CNN reasoning framework is deployed on each operating system. The invention establishes a reasoning framework for neural network models on embedded platforms, meets the real-time processing requirements of application scenarios, supports multiple chips and is compatible with multiple operating systems.
Description
Technical Field
The invention belongs to the technical field of radar information processing, and particularly relates to a CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of an embedded platform.
Background
On embedded-system DSP platforms, the main challenge facing deep learning is the lack of a unified, general, high-performance reasoning framework. Google's deep learning framework TensorFlow targets only mobile CPUs and GPUs; Baidu's artificial-intelligence framework PaddlePaddle is likewise unsuitable for DSP platforms. Deploying a low-storage, low-complexity deep learning framework on low-cost, low-energy-consumption, compute-limited embedded systems is therefore difficult. Hardware manufacturers are striving to develop special-purpose artificial-intelligence chips, accelerate in hardware, optimize deep learning algorithms for embedded devices, perform high-performance parallel computation on embedded platforms, and establish deep learning frameworks for them; this has become a new solution. At present, DSP platforms lack a deep learning framework: for example, on embedded DSP platforms, neural network algorithms either have no inference framework, or the inference framework's real-time performance is poor.
Disclosure of Invention
Aiming at the problems in the prior art that neural network algorithms have no inference framework, or that the inference framework's real-time performance is poor, the invention provides a CNN (convolutional neural network) inference framework design method supporting multi-core parallelism of an embedded platform, which establishes an inference framework for neural network models on embedded platforms such as CPUs (central processing units) and DSPs (digital signal processors), so as to meet the real-time processing requirements of application scenarios.
Specifically, the invention is realized by adopting the following technical scheme.
The invention provides a CNN network reasoning framework design method supporting multi-core parallelism of an embedded platform, which comprises the following steps:
CNN network model loading: reading a model file trained by the deep learning framework, extracting weights and bias parameters from the model file, and outputting the model weights and bias parameters defined by pointer variables;
CNN network function encapsulation: the convolution, pooling, activation and full-connection operations in the CNN network are each encapsulated into operation kernel functions using a vector instruction set, assembly language and the C language; the input of each operation kernel function is the model weights and biases defined by the pointer variables, and the outputs are respectively a convolution layer function, a pooling layer function, an activation function and a full-connection layer function; basic block and view objects are designed based on the VSIPL standard, and the encapsulated convolution layer function, pooling layer function, activation function and full-connection layer function, including their function parameters, are unified to form operation kernel functions with a general neural network programming interface; the prediction function of the CNN (convolutional neural network) reasoning framework is constructed from the operation kernel functions with the general neural network programming interface;
Parallelization design: based on the multithreading mechanism of the multi-core processor, the prediction function of the CNN reasoning framework is used as a thread function; several tasks or threads are created, thread synchronization and communication are designed, the input test data set is partitioned based on the load-balancing principle, and each task or thread is bound to a core number of the multi-core processor through a thread-binding function;
Cross-platform design: a corresponding VSIPL static library is written according to the platform type on which the CNN network reasoning framework is to be deployed; the CNN network reasoning framework is deployed on the VxWorks, Linux, Windows, SylixOS or ReWorks operating system.
Further, the trained model file is a binary file containing model parameters and consists of control header parameters and data;
The control header parameters are integers: the 1st word is the number of layers of the neural network; the 2nd, 3rd and 4th words are the dimensions of the weight matrix of the first-layer neural network model, the 5th, 6th and 7th words are the dimensions of the first-layer pooling, and the 8th word is the dimension of the first-layer bias; the 9th, 10th and 11th words are the dimensions of the weight matrix of the second-layer neural network model, followed by the second-layer pooling and bias dimensions in the same way; and so on, up to the last layer of the neural network;
the data are data which are stored in binary files according to the values of the control header parameters and the weights and the bias data of the neural network models from the first layer to the last layer in sequence.
Further, the convolution layer function carries out one-dimensional convolution, two-dimensional convolution or three-dimensional convolution operation, and the dimension, the number of convolution kernels and the size parameters of the convolution kernels of the convolution operation are set;
the pooling layer function performs one-dimensional pooling, two-dimensional pooling or three-dimensional pooling, and the dimension, pooling type, interval and step length of pooling operation are set;
The full connection layer function sets the dimension of the weight matrix;
The activation function sets the activation type (ReLU, Softmax, Tanh, etc.).
Further, designing the basic block and view objects based on the VSIPL standard, performing the programming-interface design on the encapsulated convolution layer function, pooling layer function, activation function and full-connection layer function, including their function parameters, and unifying them into operation kernel functions with a general neural network programming interface comprises:
Basic blocks and views are defined based on the VSIPL computing-middleware standard; the pointer variables output by CNN network model loading are bound into basic blocks, and data are extracted from the basic blocks and bound into views, where a view is a matrix or a vector; the converted matrices or vectors are then used as input parameters to call the operation kernel functions.
Further, the data partitioning of the input test data set based on the load balancing principle includes:
The input test data set is divided evenly into N parts, where N is the number of cores of the multi-core processor; N tasks or threads are created, bound to the N cores of the multi-core processor, and executed in a data-parallel manner.
The CNN network reasoning framework design method supporting the embedded platform multi-core parallelism has the following beneficial effects:
A processing framework for convolutional neural network reasoning on the embedded platform is established, and convolutional neural network models are quickly built using the encapsulated kernel functions, lowering the threshold for developing artificial-intelligence algorithms on embedded platforms and improving the reasoning efficiency of convolutional neural networks on CPU and DSP platforms;
The artificial-intelligence algorithm is given a multi-core parallel design within the embedded-platform reasoning framework; through the multi-core parallel reasoning framework, data and computing tasks are automatically divided and mapped onto hardware threads, ensuring high-speed real-time processing of the convolutional neural network on CPU and DSP platforms and fully exploiting the hardware resources of the DSP platform;
A general convolutional neural network interface is provided; through the self-defined convolutional neural network operation kernel function operators, the programming interface and the underlying assembly function library, developers can carry out secondary development against the programming interface provided by the invention, improving programming efficiency.
Drawings
Fig. 1 is a schematic diagram of a CNN network reasoning framework design method supporting multi-core parallelism of an embedded platform according to this embodiment.
Fig. 2 is a schematic diagram of the data arrangement in the model file of the present embodiment.
Fig. 3 is a forward reasoning flowchart of the CNN model of the present embodiment.
Detailed Description
The invention is described in further detail below with reference to the examples and with reference to the accompanying drawings.
Example 1:
The embodiment of the invention relates to a CNN network reasoning framework design method supporting DSP multi-core parallelism. As shown in fig. 1, the CNN network reasoning framework design method supporting DSP multi-core parallelism of the present embodiment includes:
1. CNN network model loading
The C language is used to read and write files: the model file trained by the deep learning framework (e.g., TensorFlow, PyTorch) is read, the weights and bias parameters are extracted from it, the model weights and bias parameters defined by pointer variables are output, and they are written into a new file. The trained model file is a binary file containing the model parameters, with the suffix .bin, and consists of control header parameters and data. The control header parameters are integers; as shown in fig. 2, the 1st word is the number of layers of the neural network, the 2nd, 3rd and 4th words are the dimensions of the weight matrix of the first-layer neural network model, the 5th, 6th and 7th words are the dimensions of the first-layer pooling, and the 8th word is the dimension of the first-layer bias; the 9th, 10th and 11th words are the dimensions of the weight matrix of the second-layer neural network model, followed by the second-layer pooling and bias dimensions in the same way; and so on, up to the last layer of the neural network. The data are the weights and bias data of the neural network models from the first layer to the last layer, stored sequentially in the binary file according to the values of the control header parameters.
2. CNN network function encapsulation
(1) The operations in the CNN network (including convolution, pooling, activation, full connection and the prediction function) are each encapsulated into operation kernel functions using a vector instruction set, assembly language and the C language. The input of each operation kernel function is the model weights and biases defined by pointer variables, and the outputs are the convolution layer function, pooling layer function, activation function, normalization function, full-connection layer function, etc.
The kernel functions defined in the present invention are shown in table 1.
Table 1 convolutional neural network unified programming interface
The operation kernel functions comprise convolution layer functions, pooling layer functions, activation functions, full connection layer functions and prediction functions. The method specifically comprises the following steps:
A convolution layer function, performing one-dimensional convolution, two-dimensional convolution and three-dimensional convolution operation, setting parameters such as the dimension, the number and the size of convolution kernels of the convolution operation, and packaging the parameters into an operation kernel function;
Pooling layer functions, carrying out one-dimensional pooling, two-dimensional pooling and three-dimensional pooling, setting the dimensionality, pooling type, interval and step length of pooling operation, and packaging into an operation kernel function;
Activation function encapsulation, designing ReLU, Softmax, Tanh and other activation functions;
Packaging the full-connection layer function and designing the dimension of the weight matrix;
and the prediction function is constructed by adopting a convolution layer function, a pooling layer function, an activation function and a full connection layer function.
(2) Based on VSIPL standard basic block and view object design, the operation kernel functions (convolution layer function, pooling layer function, activation function, full connection layer function) of each package, including function parameters, are subjected to programming interface design and unified into the operation kernel function with a general neural network programming interface. Namely:
Blocks (basic blocks) and views are defined based on the VSIPL (Vector Signal and Image Processing Library) computing-middleware standard; the output of the previous step (neural network model loading), i.e. the pointer variables, is bound as blocks, and data are extracted from the blocks and bound as views, where a view is a matrix or a vector. Through block and view binding, the model parameters are converted into matrices and vectors, and the converted matrices or vectors are used as input parameters (see p1 in Table 1) to call the operation kernel functions defined by the invention.
The operation kernel functions defined by the invention are designed based on the VSIPL standard; their parameters are in a standard data format in which the data are matrices or vectors, while the remaining parameters are application-related, so during secondary development one only needs to focus on the algorithm and set the parameters. The operation kernel functions are optimized in assembly language, which can fully improve pipeline efficiency and the cache hit rate.
The operation kernel functions are written in different assembly languages and can support x86, MIPS or PowerPC architectures.
(3) And constructing a CNN network prediction function by adopting an operation kernel function (a convolution layer function, a pooling layer function, an activation function and a full connection layer function) with a general neural network programming interface.
3. Parallelization design for CNN network prediction function
Based on the multithreading mechanisms of multi-core processors such as CPUs and DSPs, the CNN network prediction function is used as a thread function; several tasks or threads (pthreads) are created, thread synchronization and communication are designed, the input test data set is partitioned based on the load-balancing principle, and each task or thread is bound to a core number of the multi-core processor through a thread-binding function. That is, the input test data set is divided evenly into N parts, where N is the number of cores of the multi-core processor and also the parallelism of the parallel processing, and the tasks or threads are bound to the N cores of the multi-core processor and executed in a data-parallel manner. Concretely, the data to be predicted are divided evenly into several parts, the model parameters produced during neural network model loading are called, the thread functions are written with the encapsulated operation kernel functions, and the thread functions are bound to each core of the multi-core processor for execution, realizing automatic mapping of data and computing tasks onto the multi-core processor.
The prediction function is given a parallel design and a parallelism parameter; the forward-propagation computation task of the convolutional neural network can be mapped onto the multi-core processor, the test data are partitioned, independent computation tasks are divided among tasks, and high-speed real-time processing is achieved.
The hardware platform can partition the data according to the verification data set, create a multithreading or multitasking mechanism, automatically map data and computing tasks onto the hardware, and support single-machine multi-core operation, data parallelism and task parallelism.
Based on the parallel running mechanisms of different operating systems, the prediction function can run on Linux, Windows, VxWorks, SylixOS, ReWorks and other operating systems.
4. Performing cross-platform design
Different platforms call different VSIPL static libraries; the corresponding VSIPL static library is written according to the platform type on which the CNN network reasoning framework is to be deployed, and the CNN network reasoning framework is deployed on the VxWorks, Linux, Windows, SylixOS or ReWorks operating system.
The convolutional neural network reasoning code is written against the neural network programming interface defined by the invention; a secondary-development user can write any convolutional neural network reasoning code as required and link against the static libraries of different platforms, so that the operation kernel functions can be deployed and run on different hardware platforms, achieving cross-platform high performance. Meanwhile, the VSIPL static library is implemented in the assembly languages and C language of different instruction sets; this high-performance function library can fully exploit computing resources and improve software performance.
Example 2:
Another embodiment of the invention is a multi-core parallel processing method for CNN reasoning on a DSP. In this embodiment, the convolutional-neural-network-based target recognition reasoning part can run on a VxWorks DSP platform or on a Linux x86 CPU platform.
As shown in FIG. 3, the forward reasoning flow of the neural network model is for an aircraft-model recognition convolutional neural network with four layers in total. A piece of data read from the training data set, a vector of length 150, is used as the input data for forward reasoning. The first convolution layer performs 6-channel convolution and average pooling with step 2; the second convolution layer performs 12-channel convolution and average pooling with step 2; the third layer is a fully connected layer; the fourth layer is the output layer.
The multi-core parallel processing method for CNN (convolutional neural network) reasoning on the DSP comprises the following steps:
1. CNN network model loading
The CNN network model file trained by TensorFlow is loaded; the model file is a .bin file whose data arrangement is shown in fig. 2. The fourth (output) layer is not pooled, and its pooling parameter defaults to [1 1 1]. The header and data are read in the C language, giving a 4-layer model: the first layer has 6 convolution kernels of size [1 5], a pooling step of 2 and a bias of 6; the second layer has 12 convolution kernels of size [6 5], a pooling step of 2 and a bias of 12; the third layer is a fully connected layer with weight size [384 50] and a bias of 50; the fourth layer is the output layer with weight size [50 10] and a bias of 10.
2. CNN network function encapsulation
The operations in the CNN network (including convolution, pooling, activation, full connection and the prediction function) are each encapsulated into operation kernel functions using a vector instruction set, assembly language and the C language: the vsip_conv1d function (first-layer convolution), vsip_avgpool_f (first-layer pooling), vsip_active_f (first-layer activation), vsip_conv1d (second-layer convolution), vsip_avgpool_f (second-layer pooling), vsip_active_f (second-layer activation), vsip_fullnet_f (third-layer full connection) and vsip_fullnet_f (fourth-layer output), where the output layer (fourth layer) outputs the confidence probabilities of the aircraft-model classes (10 probability values).
In fig. 2, D1, D2, D3 and D4 represent the weight data of the first layer (L1), second layer (L2), third layer (L3) and fourth layer (L4), respectively. The D1, D2, D3 and D4 data are bound to blocks, views (matrices or vectors) are then bound from the blocks, and the views are passed as inputs when calling vsip_conv1d, vsip_avgpool_f, vsip_active_f and vsip_fullnet_f layer by layer. The programming interface of the encapsulated operation kernel functions, including function parameters, is designed based on the VSIPL-standard block and view objects and unified into a general neural network programming interface.
The prediction function of the convolutional neural network inference framework is constructed from the encapsulated kernel functions: vsip_conv1d (first-layer convolution), vsip_avgpool_f (first-layer pooling), vsip_active_f (first-layer activation), vsip_conv1d (second-layer convolution), vsip_avgpool_f (second-layer pooling), vsip_active_f (second-layer activation), vsip_fullnet_f (third-layer full connection) and vsip_fullnet_f (fourth-layer output).
3. Parallelization design
The prediction function of the convolutional neural network reasoning framework is used as the thread function; the data to be predicted are divided evenly into N parts (N being the parallelism of the multi-thread parallel processing), N threads are created and bound to the N cores of the multi-core processor, and execution proceeds in a data-parallel manner.
4. Performing cross-platform design
A corresponding VSIPL static library is written according to the platform type on which the CNN network reasoning framework is to be deployed; the reasoning framework is deployed on the VxWorks or Linux operating system to complete the cross-platform design, predict the labels of the test data set, and improve the accuracy of target recognition and classification.
The CNN network reasoning framework design method supporting DSP multi-core parallelism provided by the invention is based on a convolutional neural network reasoning framework for the DSP platform. By means of model loading, a defined operation-kernel-function operator library and underlying assembly optimization, together with a multi-core parallel software design method and encapsulation into a general neural network interface, the neural network layers, pooling layers and fully connected layers are encapsulated and data-parallelized, meeting the high-performance target-recognition processing requirements of radar systems, with the characteristics of cross-platform support, high performance, parallelization and ease of use.
(1) Cross-platform: general convolutional neural network operation kernel function operators are designed based on VSIPL standard interfaces, the operation kernel functions are encapsulated, and a unified programming interface is established, achieving "program once, run everywhere"; Linux, Windows, VxWorks, SylixOS, ReWorks and other operating systems are supported, as are x86, MIPS and PowerPC architectures.
(2) High performance: targeting the characteristics of different instruction sets, the operation kernel function operators adopt vectorization, parallelization and pipeline design, fully exploiting the multistage pipeline efficiency of the CPU and DSP, improving vector-processor efficiency and reducing the cache miss rate, thereby achieving high-performance computation functions and improving reasoning speed.
(3) Parallelization: through multi-thread design, the algorithm is mapped to the multi-core processor, and multi-core parallel operation of the CPU and the DSP platform is supported.
(4) Ease of use: the TensorFlow and PyTorch frameworks are supported, as are convolutional neural network models.
In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer-readable storage medium. The software may include instructions and certain data that, when executed by one or more processors, operate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer-readable storage medium may include, for example, a magnetic or optical disk storage device, a solid-state storage device such as flash memory, cache, Random Access Memory (RAM), or another non-volatile memory device. Executable instructions stored on a non-transitory computer-readable storage medium may be in source code, assembly language code, object code, or another instruction format that is interpreted or otherwise executed by one or more processors.
A computer-readable storage medium may include any storage medium or combination of storage media that can be accessed by a computer system during use to provide instructions and/or data to the computer system. Such storage media may include, but are not limited to, optical media (e.g., Compact Disc (CD), Digital Versatile Disc (DVD), Blu-ray disc), magnetic media (e.g., floppy disk, magnetic tape, or magnetic hard drive), volatile memory (e.g., Random Access Memory (RAM) or cache), non-volatile memory (e.g., Read-Only Memory (ROM) or flash memory), or microelectromechanical-system (MEMS) based storage media. The computer-readable storage medium may be embedded in a computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB) based flash memory), or coupled to the computer system via a wired or wireless network (e.g., network-accessible storage (NAS)).
Note that not all of the activities or elements in the above general description are required, that a portion of a particular activity or device may not be required, and that one or more further activities may be performed or elements included in addition to those described. Still further, the order in which the activities are listed need not be the order in which they are performed. Moreover, these concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims. Furthermore, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter.
Claims (4)
1. A CNN network inference framework design method supporting multi-core parallelism of an embedded platform, characterized by comprising the following steps:
CNN network model loading: reading a model file trained by a deep learning framework, extracting the weight and bias parameters from the model file, and outputting model weights and biases defined by pointer variables; the trained model file is a binary file containing the model parameters and consists of control header parameters and data; the control header parameters are integers: the 1st word is the number of layers of the neural network; the 2nd, 3rd and 4th words are the dimensions of the weight matrix of the first-layer neural network model; the 5th, 6th and 7th words are the dimensions of the first-layer pooling; the 8th word is the dimension of the first-layer bias; the 9th, 10th and 11th words are the dimensions of the second-layer pooling; the 12th word is the dimension of the second-layer bias; and so on, up to the last layer of the neural network; the data section stores, in accordance with the values of the control header parameters, the weight and bias data of each neural network layer in the binary file in order from the first layer to the last layer;
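The word-by-word control-header layout described above can be sketched as a small parser. This is an illustrative sketch only: the struct and function names are hypothetical, and the layout beyond the words the claim enumerates for the first layer is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical first-layer header fields (names are illustrative,
   not from the patent): word 1 = layer count, words 2-4 = weight
   matrix dims, words 5-7 = pooling dims, word 8 = bias dim. */
typedef struct {
    int32_t num_layers;    /* 1st word: number of network layers        */
    int32_t w_dims[3];     /* 2nd-4th words: first-layer weight dims    */
    int32_t pool_dims[3];  /* 5th-7th words: first-layer pooling dims   */
    int32_t bias_dim;      /* 8th word: first-layer bias dimension      */
} model_header_t;

static model_header_t parse_control_header(const int32_t *words)
{
    model_header_t h;
    h.num_layers = words[0];
    for (int i = 0; i < 3; ++i) {
        h.w_dims[i]    = words[1 + i];
        h.pool_dims[i] = words[4 + i];
    }
    h.bias_dim = words[7];
    return h;
}
```

The weight and bias data that follow the header would then be read sequentially, with each layer's byte count derived from these dimension words.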
CNN network function encapsulation: encapsulating the convolution, pooling, activation and fully connected operations of the CNN network into operation kernel functions using a vector instruction set, assembly language and the C language; the input of each operation kernel function is the model weights and biases defined by pointer variables, and the outputs are a convolution layer function, a pooling layer function, an activation function and a fully connected layer function, respectively; designing basic block and view objects based on the VSIPL standard, and unifying the encapsulated convolution layer function, pooling layer function, activation function and fully connected layer function, including their function parameters, to form operation kernel functions with a general neural network programming interface; and constructing the prediction function of the CNN network inference framework from the operation kernel functions with the general neural network programming interface;
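One way to realize a "general neural network programming interface" is to give every layer kernel the same signature so the prediction function can walk a table of layers. A minimal sketch, assuming hypothetical names (`vview_t`, `op_kernel_t`, `predict`) rather than the patent's or VSIPL's actual API:

```c
#include <assert.h>
#include <stddef.h>

/* A vector view over a basic block (illustrative, VSIPL-inspired). */
typedef struct {
    float *data;
    size_t len;
} vview_t;

/* Unified kernel signature: input view, parameter view, output view. */
typedef void (*op_kernel_t)(const vview_t *in, const vview_t *params, vview_t *out);

/* A ReLU activation kernel as one concrete instance of the interface. */
static void relu_kernel(const vview_t *in, const vview_t *params, vview_t *out)
{
    (void)params; /* the activation takes no weights or biases */
    for (size_t i = 0; i < in->len; ++i)
        out->data[i] = in->data[i] > 0.0f ? in->data[i] : 0.0f;
}

/* The prediction function chains the layer kernels, ping-ponging two
   buffers; the final result ends up in *buf_in after the last swap. */
static void predict(const op_kernel_t *layers, const vview_t **params,
                    size_t n_layers, vview_t *buf_in, vview_t *buf_out)
{
    for (size_t l = 0; l < n_layers; ++l) {
        layers[l](buf_in, params[l], buf_out);
        vview_t tmp = *buf_in;
        *buf_in = *buf_out;
        *buf_out = tmp;
    }
}
```

Because every kernel shares one signature, a network is just an array of function pointers plus an array of parameter views, which is what makes the prediction function reusable as a thread function in the parallelization step.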
automatic parallelization design: based on the multithreading mechanism of the multi-core processor, taking the prediction function of the CNN inference framework as the thread function, creating multiple tasks or threads, designing thread synchronization and communication, partitioning the data of the input test data set according to the load balancing principle, and binding each task or thread to a core of the multi-core processor through a thread binding function;
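The thread creation and data division in this step can be sketched with POSIX threads. The prediction work is stubbed as a sum so the sketch is self-contained; in a real deployment each worker would call the framework's prediction function, and core binding would use a platform-specific call such as `pthread_setaffinity_np()` on Linux or `taskCpuAffinitySet()` on VxWorks (omitted here to stay portable).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NCORES 4  /* assumed core count for this sketch */

typedef struct {
    const float *data;   /* this thread's slice of the test set */
    size_t       count;
    float        result;
} worker_arg_t;

/* Thread function: stands in for the CNN prediction function. */
static void *predict_worker(void *p)
{
    worker_arg_t *a = (worker_arg_t *)p;
    a->result = 0.0f;
    for (size_t i = 0; i < a->count; ++i)
        a->result += a->data[i];
    return NULL;
}

/* Split the test set as evenly as possible, spawn one thread per core,
   then join and combine the per-thread results. */
static float parallel_predict(const float *set, size_t n)
{
    pthread_t    tid[NCORES];
    worker_arg_t args[NCORES];
    size_t chunk = n / NCORES, rem = n % NCORES, off = 0;
    for (int c = 0; c < NCORES; ++c) {
        args[c].data  = set + off;
        args[c].count = chunk + ((size_t)c < rem ? 1 : 0);
        off += args[c].count;
        pthread_create(&tid[c], NULL, predict_worker, &args[c]);
    }
    float total = 0.0f;
    for (int c = 0; c < NCORES; ++c) {
        pthread_join(tid[c], NULL);
        total += args[c].result;
    }
    return total;
}
```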
cross-platform design: writing a corresponding VSIPL static library according to the type of platform on which the CNN network inference framework is to be deployed; the CNN network inference framework can be deployed on the VxWorks, Linux, Windows, SylixOS or ReWorks operating system.
2. The CNN network inference framework design method supporting multi-core parallelism of an embedded platform according to claim 1, wherein:
the convolution layer function performs a one-dimensional, two-dimensional or three-dimensional convolution operation, with the dimension of the convolution operation, the number of convolution kernels and the convolution kernel size set as parameters;
the pooling layer function performs one-dimensional, two-dimensional or three-dimensional pooling, with the dimension, pooling type, interval and stride of the pooling operation set as parameters;
the fully connected layer function sets the dimensions of the weight matrix; and
the activation function sets the type of activation function to apply.
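The parameterization described in claim 2 can be sketched for the simplest case, a one-dimensional single-kernel convolution with stride 1 and "valid" padding. The struct fields and function names below are illustrative, not the patent's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convolution-layer parameters mirroring claim 2. */
typedef struct {
    int dim;          /* 1, 2 or 3: dimensionality of the convolution */
    int num_kernels;  /* number of convolution kernels                */
    int kernel_size;  /* size of each convolution kernel              */
} conv_params_t;

/* 1-D "valid" convolution: output length is n - kernel_size + 1. */
static void conv1d_valid(const float *in, size_t n,
                         const float *kernel, const conv_params_t *p,
                         float *out)
{
    size_t out_n = n - (size_t)p->kernel_size + 1;
    for (size_t i = 0; i < out_n; ++i) {
        float acc = 0.0f;
        for (int k = 0; k < p->kernel_size; ++k)
            acc += in[i + (size_t)k] * kernel[k];
        out[i] = acc;
    }
}
```

The two- and three-dimensional variants, multiple kernels, and the pooling interval/stride parameters would extend this same struct rather than change the kernel's calling convention.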
3. The CNN network inference framework design method supporting multi-core parallelism of an embedded platform according to claim 1, wherein designing the basic block and view objects based on the VSIPL standard and unifying the programming interfaces of the encapsulated convolution layer function, pooling layer function, activation function and fully connected layer function, including their function parameters, into operation kernel functions with a general neural network programming interface comprises:
defining basic blocks and views according to the VSIPL computing middleware standard; binding the pointer variables output by the CNN network model loading step into basic blocks; extracting data from the basic blocks and binding it into views, wherein a view is a matrix or a vector; and calling the operation kernel functions with the resulting matrices or vectors as input parameters.
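The block-to-view binding above can be sketched as follows. The types and functions here are illustrative stand-ins loosely modeled on VSIPL's block/view concept, not the actual VSIPL API: a basic block wraps the raw weight data the loader produced, and a matrix view interprets a region of it without copying.

```c
#include <assert.h>
#include <stddef.h>

/* A basic block: raw data bound from the loader's pointer variables. */
typedef struct {
    float *data;
    size_t length;
} block_t;

/* A row-major matrix view into a region of a block (no copy). */
typedef struct {
    block_t *block;
    size_t offset, rows, cols;
} mview_t;

static mview_t bind_matrix(block_t *b, size_t offset, size_t rows, size_t cols)
{
    mview_t v = {b, offset, rows, cols};
    return v;
}

/* Element access through the view, as a kernel function would use it. */
static float mget(const mview_t *v, size_t r, size_t c)
{
    return v->block->data[v->offset + r * v->cols + c];
}
```

An operation kernel then takes `mview_t` (or a vector view) arguments, so the same kernel works regardless of where in the loaded parameter buffer a given layer's weights live.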
4. The CNN network inference framework design method supporting multi-core parallelism of an embedded platform according to claim 1, wherein the data partitioning of the input test data set based on the load balancing principle comprises:
dividing the input test data set evenly into N parts, where N is the number of cores of the multi-core processor; creating N tasks or threads; binding the tasks or threads to the N cores of the multi-core processor; and executing in a data-parallel manner.
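The even split of claim 4 reduces to computing each core's slice of the test set. A minimal sketch (function name hypothetical), distributing the remainder so no core gets more than one extra sample:

```c
#include <assert.h>
#include <stddef.h>

/* For `total` samples over `ncores` cores, compute the slice assigned
   to `core`: the first (total % ncores) cores each get one extra sample. */
static void partition(size_t total, size_t ncores, size_t core,
                      size_t *start, size_t *count)
{
    size_t chunk = total / ncores;
    size_t rem   = total % ncores;
    *count = chunk + (core < rem ? 1 : 0);
    *start = core * chunk + (core < rem ? core : rem);
}
```

Each (start, count) pair is then handed to the task or thread bound to that core, so the per-core workloads differ by at most one sample.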
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110647708.6A CN113298259B (en) | 2021-06-10 | 2021-06-10 | CNN (convolutional neural network) inference framework design method supporting multi-core parallelism of embedded platform
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110647708.6A CN113298259B (en) | 2021-06-10 | 2021-06-10 | CNN (convolutional neural network) inference framework design method supporting multi-core parallelism of embedded platform
Publications (2)
Publication Number | Publication Date |
---|---|
CN113298259A CN113298259A (en) | 2021-08-24 |
CN113298259B true CN113298259B (en) | 2024-04-26 |
Family
ID=77327859
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110647708.6A Active CN113298259B (en) | 2021-06-10 | 2021-06-10 | CNN (convolutional neural network) inference framework design method supporting multi-core parallelism of embedded platform
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113298259B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113611332B (en) * | 2021-10-09 | 2022-01-18 | 聊城中赛电子科技有限公司 | Intelligent control switching power supply method and device based on neural network |
CN116991564B (en) * | 2023-09-28 | 2024-01-09 | 之江实验室 | Operator internal parallel acceleration method for heterogeneous dual-core MCU |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107784320A (en) * | 2017-09-27 | 2018-03-09 | 电子科技大学 | Radar range profile's target identification method based on convolution SVMs |
CN108549935A (en) * | 2018-05-03 | 2018-09-18 | 济南浪潮高新科技投资发展有限公司 | A kind of device and method for realizing neural network model |
CN110070178A (en) * | 2019-04-25 | 2019-07-30 | 北京交通大学 | A kind of convolutional neural networks computing device and method |
CN110766017A (en) * | 2019-10-22 | 2020-02-07 | 国网新疆电力有限公司信息通信公司 | Mobile terminal character recognition method and system based on deep learning |
CN111709522A (en) * | 2020-05-21 | 2020-09-25 | 哈尔滨工业大学 | Deep learning target detection system based on server-embedded cooperation |
CN112734040A (en) * | 2021-01-22 | 2021-04-30 | 中国人民解放军军事科学院国防科技创新研究院 | Embedded artificial intelligence computing framework and application method |
CN112748953A (en) * | 2020-07-02 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Data processing method and device based on neural network model and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10599978B2 (en) * | 2017-11-03 | 2020-03-24 | International Business Machines Corporation | Weighted cascading convolutional neural networks |
US11580386B2 (en) * | 2019-03-18 | 2023-02-14 | Electronics And Telecommunications Research Institute | Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system |
- 2021-06-10: Application CN202110647708.6A filed in China (CN); granted as patent CN113298259B (status: Active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107784320A (en) * | 2017-09-27 | 2018-03-09 | 电子科技大学 | Radar range profile's target identification method based on convolution SVMs |
CN108549935A (en) * | 2018-05-03 | 2018-09-18 | 济南浪潮高新科技投资发展有限公司 | A kind of device and method for realizing neural network model |
CN110070178A (en) * | 2019-04-25 | 2019-07-30 | 北京交通大学 | A kind of convolutional neural networks computing device and method |
CN110766017A (en) * | 2019-10-22 | 2020-02-07 | 国网新疆电力有限公司信息通信公司 | Mobile terminal character recognition method and system based on deep learning |
CN111709522A (en) * | 2020-05-21 | 2020-09-25 | 哈尔滨工业大学 | Deep learning target detection system based on server-embedded cooperation |
CN112748953A (en) * | 2020-07-02 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Data processing method and device based on neural network model and electronic equipment |
CN112734040A (en) * | 2021-01-22 | 2021-04-30 | 中国人民解放军军事科学院国防科技创新研究院 | Embedded artificial intelligence computing framework and application method |
Non-Patent Citations (1)
Title |
---|
Tianjin Binhai Xunteng. *TensorFlow Project-Based Case Practice* (《TensorFlow项目式案例实战》). Tianjin University Press, 2020, 1st ed., pp. 99-104. *
Also Published As
Publication number | Publication date |
---|---|
CN113298259A (en) | 2021-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10942716B1 (en) | Dynamic computational acceleration using a heterogeneous hardware infrastructure | |
CN113298259B (en) | CNN (convolutional neural network) inference framework design method supporting multi-core parallelism of embedded platform | |
US11803404B2 (en) | Deep learning algorithm compiling method, device, and related product | |
US11275615B2 (en) | Data processing offload using in-storage code execution | |
US11669443B2 (en) | Data layout optimization on processing in memory architecture for executing neural network model | |
US11915149B2 (en) | System for managing calculation processing graph of artificial neural network and method of managing calculation processing graph by using the same | |
CN111160551A (en) | Computation graph execution method, computer device, and storage medium | |
WO2021000971A1 (en) | Method and device for generating operation data and related product | |
CA3114635A1 (en) | System and method for automated precision configuration for deep neural networks | |
CN110689116B (en) | Neural network pruning method and device, computer equipment and storage medium | |
US11656880B2 (en) | Function evaluation using multiple values loaded into registers by a single instruction | |
US20210073625A1 (en) | Partitioning control dependency edge in computation graph | |
US10564947B2 (en) | Computer system and method for multi-processor communication | |
US20220076095A1 (en) | Multi-level sparse neural networks with dynamic rerouting | |
EP4318319A1 (en) | Model processing method and apparatus | |
CN115576561A (en) | Deep neural network model compiling and optimizing method based on Shenwei processor | |
CN116228515B (en) | Hardware acceleration system, method and related device | |
WO2022078400A1 (en) | Device and method for processing multi-dimensional data, and computer program product | |
WO2023287702A1 (en) | Method and apparatus for accelerated inference of machine-learning models | |
US11573777B2 (en) | Method and apparatus for enabling autonomous acceleration of dataflow AI applications | |
CN113887730A (en) | Quantum simulator implementation method and device, related equipment and quantum simulation method | |
US11941383B1 (en) | Compilation with caching of code analysis result | |
US11809849B1 (en) | Global modulo allocation in neural network compilation | |
US20230121052A1 (en) | Resource resettable deep neural network accelerator, system, and method | |
KR20170081952A (en) | Multi-core simulation system and method based on shared translation block cache |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||