CN113298259B - CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform - Google Patents

CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform

Info

Publication number
CN113298259B
CN113298259B
Authority
CN
China
Prior art keywords
function
layer
pooling
cnn
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110647708.6A
Other languages
Chinese (zh)
Other versions
CN113298259A (en)
Inventor
王嘎
杨洋
唐强
韩文俊
丁琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 14 Research Institute
Original Assignee
CETC 14 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 14 Research Institute
Priority to CN202110647708.6A
Publication of CN113298259A
Application granted
Publication of CN113298259B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention belongs to the technical field of radar information processing and discloses a CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism on an embedded platform. A model file trained by a deep learning framework is read, and the weight and bias parameters are extracted from it and defined with pointer variables; the operations in the CNN network are each encapsulated into operation kernel functions, and a general programming interface is designed for constructing the prediction function of the CNN network reasoning framework; the prediction function is bound to core numbers of the multi-core processor based on the processor's multithreading mechanism; a corresponding VSIPL static library is written according to the platform type on which the CNN network reasoning framework is to be deployed; and the CNN network reasoning framework is deployed on each operating system. The invention establishes a reasoning framework for neural network models on embedded platforms, meets the real-time processing requirements of application scenarios, supports various chips, and is compatible with various operating systems.

Description

CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform
Technical Field
The invention belongs to the technical field of radar information processing, and particularly relates to a CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of an embedded platform.
Background
On the DSP platforms of embedded systems, the main challenge facing deep learning is the lack of a unified, general high-performance reasoning framework. Google's deep learning framework TensorFlow is only applicable to mobile CPUs and GPUs, and Baidu's artificial intelligence framework PaddlePaddle is likewise unsuitable for DSP platforms. Deploying low-storage, low-complexity deep learning frameworks on low-cost, low-energy-consumption, computing-power-limited embedded systems remains challenging. Hardware manufacturers are striving to develop special-purpose artificial intelligence chips, accelerate in hardware, optimize deep learning algorithms for embedded devices, and perform high-performance parallel computation on embedded platforms, so establishing a deep learning framework for the embedded platform has become a new solution. At present, DSP platforms lack a deep learning framework: on an embedded DSP platform, a neural network algorithm either has no inference framework or the inference framework's real-time performance is poor.
Disclosure of Invention
Aiming at the prior-art problems that a neural network algorithm has no inference framework or that the inference framework's real-time performance is poor, the invention provides a CNN (convolutional neural network) inference framework design method supporting multi-core parallelism of an embedded platform, which establishes an inference framework for neural network models on embedded platforms such as CPUs (central processing units) and DSPs (digital signal processors) to meet the real-time processing requirements of application scenarios.
Specifically, the invention is realized by adopting the following technical scheme.
The invention provides a CNN network reasoning framework design method supporting multi-core parallelism of an embedded platform, which comprises the following steps:
CNN network model loading: reading a model file trained by the deep learning framework, extracting weights and bias parameters from the model file, and outputting the model weights and bias parameters defined by pointer variables;
CNN network function encapsulation: the convolution, pooling, activation and full-connection operations in the CNN network are each encapsulated into operation kernel functions using a vector instruction set, assembly language and the C language; the input of each operation kernel function is the model weights and biases defined by the pointer variables, and the outputs are a convolution layer function, a pooling layer function, an activation function and a full-connection layer function, respectively; basic block and view objects are designed based on the VSIPL standard, and the encapsulated convolution layer function, pooling layer function, activation function and full-connection layer function, including their function parameters, are unified to form operation kernel functions with a general neural network programming interface; the prediction function of the CNN (convolutional neural network) reasoning framework is constructed from the operation kernel functions with the general neural network programming interface;
And (3) carrying out parallelization design: based on a multithreading mechanism of the multi-core processor, taking a prediction function of the CNN reasoning framework as a thread function, creating a plurality of tasks or a plurality of threads, designing thread synchronization and communication, dividing data of an input test data set based on a load balancing principle, and binding each task or thread to a core number of the multi-core processor through a thread binding function;
Performing cross-platform design: writing a corresponding VSIPL static library according to the platform type on which the CNN network reasoning framework is to be deployed; the CNN network reasoning framework is deployed on the VxWorks, Linux, Windows, SylixOS or ReWorks operating system.
Further, the trained model file is a binary file containing model parameters and consists of control header parameters and data;
The control header parameters are integers: the 1st word of the control header is the number of layers of the neural network; the 2nd, 3rd and 4th words are the dimensions of the weight matrix of the first-layer neural network model; the 5th, 6th and 7th words are the dimensions of the first-layer pooling; the 8th word is the dimension of the first-layer pooling bias; the 9th, 10th and 11th words are the dimensions of the second-layer pooling; the 12th word is the dimension of the second-layer pooling bias; and so on, up to the last layer of the neural network;
The data consist of the weights and bias data of the neural network models from the first layer to the last layer, stored in the binary file in sequence according to the values of the control header parameters.
Further, the convolution layer function carries out one-dimensional, two-dimensional or three-dimensional convolution, with the dimension of the convolution operation and the number and size of the convolution kernels set as parameters;
the pooling layer function performs one-dimensional, two-dimensional or three-dimensional pooling, with the dimension, pooling type, interval and step length of the pooling operation set as parameters;
the full-connection layer function sets the dimension of the weight matrix;
the activation function sets the activation function type.
Further, designing the basic block and view objects based on the VSIPL standard, performing the programming interface design on the encapsulated convolution layer function, pooling layer function, activation function and full-connection layer function, including their function parameters, and unifying them into the operation kernel functions with the general neural network programming interface comprises:
defining basic blocks and views based on the VSIPL computing middleware standard, binding the pointer variables output by the CNN network model loading into basic blocks, and extracting data from the basic blocks and binding it into views, the views being matrices or vectors; and calling the operation kernel functions with the converted matrices or vectors as input parameters.
Further, the data partitioning of the input test data set based on the load balancing principle includes:
The input test data set is divided evenly into N parts, where N is the number of cores of the multi-core processor; N tasks or threads are created and bound to the N cores of the multi-core processor, and execution proceeds in a data-parallel manner.
The CNN network reasoning framework design method supporting embedded-platform multi-core parallelism has the following beneficial effects:
A processing framework for convolutional neural network reasoning on embedded platforms is established, and convolutional neural network models are built quickly from the encapsulated kernel functions, lowering the threshold for developing artificial intelligence algorithms on embedded platforms and improving the reasoning efficiency of convolutional neural networks on CPU and DSP platforms;
The artificial intelligence algorithm is given a multi-core parallel design within the embedded-platform reasoning framework; the multi-core parallel reasoning framework automatically divides data and computing tasks and maps them onto hardware threads, ensuring high-speed real-time processing of convolutional neural networks on CPU and DSP platforms and fully exploiting the hardware resources of the DSP platform;
A general convolutional neural network interface is provided; using the custom convolutional neural network operation kernel function operators, the programming interface and the low-level assembly function library, developers can carry out secondary development against the programming interface provided by the invention, improving programming efficiency.
Drawings
Fig. 1 is a schematic diagram of a CNN network reasoning framework design method supporting multi-core parallelism of an embedded platform according to this embodiment.
Fig. 2 is a schematic diagram of the data arrangement in the model file of the present embodiment.
Fig. 3 is a forward reasoning flowchart of the CNN model of the present embodiment.
Detailed Description
The invention is described in further detail below with reference to the examples and the accompanying drawings.
Example 1:
This embodiment of the invention relates to a CNN network reasoning framework design method supporting DSP multi-core parallelism. As shown in Fig. 1, the CNN network reasoning framework design method supporting DSP multi-core parallelism of the present embodiment comprises:
1. CNN network model loading
The C language is used for file reading and writing: the model file trained by the deep learning framework (for example, TensorFlow or PyTorch) is read, the weight and bias parameters are extracted from it, the model weights and biases defined by pointer variables are output, and they are written to a new file. The trained model file is a binary file containing the model parameters, takes .bin as its suffix, and consists of control header parameters and data. The control header parameters are integers; as shown in Fig. 2, the 1st word of the control header is the number of layers of the neural network, the 2nd, 3rd and 4th words are the dimensions of the weight matrix of the first-layer neural network model, the 5th, 6th and 7th words are the dimensions of the first-layer pooling, the 8th word is the dimension of the first-layer pooling bias, the 9th, 10th and 11th words are the dimensions of the second-layer pooling, and the 12th word is the dimension of the second-layer pooling bias; and so on, up to the last layer of the neural network. The data consist of the weights and bias data of the neural network models from the first layer to the last layer, stored in the binary file in sequence according to the values of the control header parameters.
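As an illustration of this loading step, a minimal C sketch is given below. It is reconstructed from the header layout of Fig. 2 under the assumption of a regular per-layer header record (three weight-dimension words, three pooling words, one bias word); the names LayerParams and load_model, and the use of 32-bit integers and single-precision floats, are assumptions rather than the patent's actual code.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int   weight_dims[3];  /* e.g. words 2-4: weight matrix dimensions */
    int   pool_dims[3];    /* e.g. words 5-7: pooling dimensions       */
    int   bias_dim;        /* e.g. word 8: bias dimension              */
    float *weights;        /* pointer variables later bound to blocks  */
    float *bias;
} LayerParams;

int load_model(const char *path, LayerParams **out, int *nlayers)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;

    fread(nlayers, sizeof(int), 1, fp);          /* word 1: layer count */
    LayerParams *L = calloc(*nlayers, sizeof(LayerParams));

    for (int i = 0; i < *nlayers; i++) {         /* rest of the header  */
        fread(L[i].weight_dims, sizeof(int), 3, fp);
        fread(L[i].pool_dims,   sizeof(int), 3, fp);
        fread(&L[i].bias_dim,   sizeof(int), 1, fp);
    }
    for (int i = 0; i < *nlayers; i++) {         /* data follows header */
        long n = (long)L[i].weight_dims[0] * L[i].weight_dims[1]
               * L[i].weight_dims[2];
        L[i].weights = malloc(n * sizeof(float));
        L[i].bias    = malloc(L[i].bias_dim * sizeof(float));
        fread(L[i].weights, sizeof(float), n, fp);
        fread(L[i].bias,    sizeof(float), L[i].bias_dim, fp);
    }
    fclose(fp);
    *out = L;
    return 0;
}
```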
2. CNN network function encapsulation
(1) A vector instruction set, assembly language and the C language are used to encapsulate the operations in the CNN network (convolution, pooling, activation and full-connection operations, and the prediction function) into operation kernel functions. The input of each operation kernel function is the model weights and biases defined by pointer variables, and the outputs are the convolution layer function, pooling layer function, activation function, normalization function, full-connection layer function, and so on.
The kernel functions defined in the present invention are shown in Table 1.
Table 1 Convolutional neural network unified programming interface
The operation kernel functions comprise the convolution layer function, pooling layer function, activation function, full-connection layer function and prediction function. Specifically:
A convolution layer function performs one-dimensional, two-dimensional or three-dimensional convolution, with the dimension of the convolution operation and the number and size of the convolution kernels set as parameters, and is encapsulated into an operation kernel function;
A pooling layer function performs one-dimensional, two-dimensional or three-dimensional pooling, with the dimension, pooling type, interval and step length of the pooling operation set as parameters, and is encapsulated into an operation kernel function;
The activation functions are encapsulated, with ReLU, Softmax, Tanh and other activation functions designed;
The full-connection layer function is encapsulated, with the dimension of the weight matrix designed;
The prediction function is constructed from the convolution layer function, pooling layer function, activation function and full-connection layer function.
(2) Basic block and view objects are designed based on the VSIPL standard, a programming interface design is carried out for each encapsulated operation kernel function (convolution layer function, pooling layer function, activation function, full-connection layer function), including its function parameters, and they are unified into operation kernel functions with a general neural network programming interface. Namely:
Basic blocks (block) and views (view) are defined based on the VSIPL (Vector Signal and Image Processing Library) computing middleware standard; the output of the previous step (neural network model loading), i.e. the pointer variables, is bound as blocks, and data is extracted from each block and bound as a view, where a view is a matrix or a vector. Through block and view binding, the model parameters are converted into matrices and vectors, and the converted matrices or vectors are used as input parameters (see p1 in Table 1) to call the operation kernel functions defined by the invention.
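A minimal C sketch of this block/view binding follows. The block and view calls (vsip_blockbind_f, vsip_blockadmit_f, vsip_mbind_f) are standard VSIPL; the helper name bind_weights and the row-major layout are assumptions made for illustration.

```c
#include <vsip.h>

/* Bind a weight pointer (rows x cols, row-major) produced by the model-
 * loading step into a VSIPL block, then extract a matrix view that can be
 * passed as parameter p1 to the operation kernel functions. */
vsip_mview_f *bind_weights(float *weights, int rows, int cols)
{
    /* Bind the user pointer into a block (vsip_init must have been
     * called once before any VSIPL use). */
    vsip_block_f *blk = vsip_blockbind_f(weights,
                                         (vsip_length)rows * cols,
                                         VSIP_MEM_NONE);
    vsip_blockadmit_f(blk, VSIP_TRUE);   /* hand the data over to VSIPL */

    /* Row-major matrix view: column stride = cols, row stride = 1. */
    return vsip_mbind_f(blk, 0, cols, rows, 1, cols);
}
```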
The operation kernel functions defined by the invention are designed based on the VSIPL standard: their parameters are in a standard data format, the data parameters are matrices or vectors, and the remaining parameters are application-related, so during secondary development only the algorithm needs attention and only the parameters need to be set. The operation kernel functions are optimized in assembly language, which fully improves pipeline efficiency and the cache hit rate.
The operation kernel functions are written in the assembly languages of the different architectures and can support x86, MIPS or PowerPC.
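The patent does not publish its assembly kernels; as a hedged illustration of the kind of vectorization meant here, the following C sketch uses x86 SSE intrinsics for the dot product at the core of convolution and full connection (a MIPS or PowerPC port would use that architecture's own intrinsics or assembly).

```c
#include <immintrin.h>

/* Vectorized dot product: processes 4 floats per iteration, then a
 * scalar tail. Illustrative of the kernel-level vectorization only. */
float dot_sse(const float *a, const float *b, int n)
{
    __m128 acc = _mm_setzero_ps();
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3];
    for (; i < n; i++)          /* remaining elements */
        sum += a[i] * b[i];
    return sum;
}
```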
(3) The CNN network prediction function is constructed from the operation kernel functions (convolution layer function, pooling layer function, activation function and full-connection layer function) with the general neural network programming interface.
3. Parallelization design for CNN network prediction function
Based on the multithreading mechanisms of multi-core processors such as CPUs and DSPs, the CNN network prediction function is used as the thread function: several tasks (task) or threads (pthread) are created, thread synchronization and communication are designed, the input test data set is partitioned based on the load-balancing principle, and each task or thread is bound to a core number of the multi-core processor through a thread-binding function. That is, the input test data set is divided evenly into N parts, where N is the number of cores of the multi-core processor and also the parallelism of the parallel processing, and the tasks (task) or threads (pthread) are bound to the N cores of the multi-core processor and executed in a data-parallel manner. The data to be predicted is divided evenly into several parts, the model parameters loaded in the model-loading step are called, the thread function is written with the encapsulated operation kernel functions, and the thread functions are bound to the cores of the multi-core processor for execution, realizing the automatic mapping of data and computing tasks onto the multi-core processor.
The prediction function is given a parallel design and provided with a parallelism parameter, so the forward-propagation computing task of the convolutional neural network can be mapped onto the multi-core processor; the test data is partitioned, independent computing tasks are divided among tasks, and high-speed real-time processing is realized.
On the hardware platform, the data can be partitioned according to the validation data set; a multithreading or multitasking mechanism is created, data and computing tasks are automatically mapped onto the hardware, and single-machine multi-core operation, data parallelism and task parallelism are supported. A sketch of this scheme follows.
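The following C sketch illustrates the data-parallel scheme on a Linux pthread target; pthread_setaffinity_np is a GNU extension (VxWorks would use its task-spawning and CPU-affinity routines instead), and the names predict_fn, Slice, NCORES and run_parallel are illustrative assumptions, not the patent's.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NCORES 4

typedef struct {
    const float *samples;   /* this thread's slice of the test set */
    int          count;     /* number of samples in the slice      */
    int         *labels;    /* where predicted labels are written  */
} Slice;

/* Thread function wrapping the CNN prediction function (assumed). */
extern void *predict_fn(void *arg);

void run_parallel(const float *data, int total, int sample_len, int *labels)
{
    pthread_t tid[NCORES];
    Slice     slice[NCORES];
    int per_core = total / NCORES;          /* load-balanced even split */

    for (int i = 0; i < NCORES; i++) {
        slice[i].samples = data + (long)i * per_core * sample_len;
        slice[i].count   = (i == NCORES - 1) ? total - i * per_core
                                             : per_core;
        slice[i].labels  = labels + i * per_core;
        pthread_create(&tid[i], NULL, predict_fn, &slice[i]);

        cpu_set_t set;                      /* bind thread i to core i */
        CPU_ZERO(&set);
        CPU_SET(i, &set);
        pthread_setaffinity_np(tid[i], sizeof(set), &set);
    }
    for (int i = 0; i < NCORES; i++)
        pthread_join(tid[i], NULL);
}
```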
Based on the parallel running mechanisms of different operating systems, the prediction function can run on Linux, Windows, VxWorks, SylixOS, ReWorks and other operating systems.
4. Performing cross-platform design
Different platforms call different VSIPL static libraries; a corresponding VSIPL static library is written according to the platform type on which the CNN network reasoning framework is to be deployed, and the CNN network reasoning framework is deployed on the VxWorks, Linux, Windows, SylixOS or ReWorks operating system.
Convolutional neural network reasoning code is written against the neural network programming interface defined by the invention; a secondary-development user can write any convolutional neural network reasoning code as required and link the static library of the corresponding platform, so that the operation kernel functions can be deployed and run on different hardware platforms, achieving cross-platform high performance. Meanwhile, the VSIPL static library is implemented in the assembly languages and C language of the different instruction sets; this high-performance function library fully exploits the computing resources and improves software performance.
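For example, a single source tree might select the platform-specific VSIPL static library at build time with conditional compilation, as in the hedged sketch below; the macro tests are standard predefined platform macros, but the header and library file names are illustrative assumptions.

```c
/* Select the platform-specific VSIPL build (library names assumed). */
#if defined(__VXWORKS__)
#  include "vsip_vxworks.h"      /* link against libvsipl_dsp.a  */
#elif defined(__linux__)
#  include "vsip_linux.h"        /* link against libvsipl_x86.a  */
#elif defined(_WIN32)
#  include "vsip_win.h"          /* link against vsipl_x86.lib   */
#else
#  error "unsupported platform for the CNN inference framework"
#endif
```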
Example 2:
Another embodiment of the invention relates to a multi-core parallel processing method for CNN (convolutional neural network) reasoning on a DSP. In this embodiment, the convolutional-neural-network-based target recognition reasoning part can run on a DSP platform under the VxWorks operating system or on an x86 CPU platform under the Linux operating system.
As shown in Fig. 3, the forward reasoning flow of the neural network model is based on a convolutional neural network for identifying aircraft models; the model has four neural network layers in total. A piece of data read from the training data set, a vector of length 150, is used as the input for forward reasoning. The first convolution layer performs 6-channel convolution and average pooling with a step length of 2; the second convolution layer performs 12-channel convolution and average pooling with a step length of 2; the third layer is a full-connection layer; the fourth layer is the output layer.
The multi-core parallel processing method of CNN (convolutional neural network) reasoning on the DSP comprises the following steps:
1. CNN network model loading
The CNN network model file trained with TensorFlow is loaded; the model file is a .bin file whose data arrangement is shown in Fig. 2. The fourth (output) layer has no pooling, and its pooling parameter defaults to [1 1 1]. The header and data are read in C to obtain a 4-layer model: the first layer has 6 convolution kernels of size [1 5], a pooling step length of 2 and a bias of 6; the second layer has 12 convolution kernels of size [6 5], a pooling step length of 2 and a bias of 12; the third layer is a full-connection layer with a weight size of [384 50] and a bias of 50; the fourth layer is the output layer with a weight size of [50 10] and a bias of 10.
2. CNN network function encapsulation
A vector instruction set, assembly language and the C language are used to encapsulate the operations in the CNN network (convolution, pooling, activation and full-connection operations, and the prediction function) into operation kernel functions. These include the vsip_conv1d function (first-layer convolution), vsip_avgpool_f function (first-layer pooling), vsip_active_f function (first-layer activation), vsip_conv1d function (second-layer convolution), vsip_avgpool_f function (second-layer pooling), vsip_active_f function (second-layer activation), vsip_fullnet_f function (third-layer full connection) and vsip_fullnet_f function (fourth-layer output), where the output layer (the fourth layer) outputs the confidence probabilities of the aircraft-model classes (10 probability values).
In Fig. 2, D1, D2, D3 and D4 represent the weight data of the first layer (L1), second layer (L2), third layer (L3) and fourth layer (L4), respectively. The D1, D2, D3 and D4 data are bound to blocks, views (matrices or vectors) are obtained from the blocks, and the views are then used as inputs to call the vsip_conv1d, vsip_avgpool_f and vsip_active_f functions for the first and second layers and the vsip_fullnet_f functions for the third and fourth layers. A programming interface is designed for the encapsulated operation kernel functions, including their function parameters, based on the VSIPL-standard block and view object designs, and unified into the general neural network programming interface.
The prediction function of the convolutional neural network inference framework is constructed from the encapsulated vsip_conv1d (first-layer convolution), vsip_avgpool_f (first-layer pooling), vsip_active_f (first-layer activation), vsip_conv1d (second-layer convolution), vsip_avgpool_f (second-layer pooling), vsip_active_f (second-layer activation), vsip_fullnet_f (third-layer full connection) and vsip_fullnet_f (fourth-layer output) functions, as sketched below.
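A sketch of the resulting four-layer prediction function is given below, chaining the kernels in the order just listed. The patent names these functions but does not publish their parameter lists, so every signature, the Model and Scratch structures, and the VSIP_RELU selector are assumptions made for illustration.

```c
#include <vsip.h>

typedef struct {                /* loaded model parameters (assumed form) */
    vsip_mview_f *w1, *w2, *w3, *w4;
    vsip_vview_f *b1, *b2, *b3, *b4;
} Model;

typedef struct {                /* pre-allocated intermediate views */
    vsip_vview_f *c1, *p1, *c2, *p2, *f3;
} Scratch;

/* Assumed operator signatures: input view, parameters, output view. */
typedef enum { VSIP_RELU, VSIP_SOFTMAX, VSIP_TANH } vsip_act_t;
extern void vsip_conv1d(const vsip_vview_f *in, const vsip_mview_f *w,
                        const vsip_vview_f *b, vsip_vview_f *out);
extern void vsip_avgpool_f(const vsip_vview_f *in, int step, vsip_vview_f *out);
extern void vsip_active_f(const vsip_vview_f *in, vsip_act_t f, vsip_vview_f *out);
extern void vsip_fullnet_f(const vsip_vview_f *in, const vsip_mview_f *w,
                           const vsip_vview_f *b, vsip_vview_f *out);

void cnn_predict(const vsip_vview_f *x, const Model *m,
                 Scratch *s, vsip_vview_f *probs /* 10 confidences */)
{
    vsip_conv1d(x, m->w1, m->b1, s->c1);        /* layer 1: 6 kernels [1 5]  */
    vsip_avgpool_f(s->c1, 2, s->p1);            /* average pool, step 2      */
    vsip_active_f(s->p1, VSIP_RELU, s->p1);     /* activation                */

    vsip_conv1d(s->p1, m->w2, m->b2, s->c2);    /* layer 2: 12 kernels [6 5] */
    vsip_avgpool_f(s->c2, 2, s->p2);
    vsip_active_f(s->p2, VSIP_RELU, s->p2);

    vsip_fullnet_f(s->p2, m->w3, m->b3, s->f3); /* layer 3: FC [384 50]      */
    vsip_fullnet_f(s->f3, m->w4, m->b4, probs); /* layer 4: output [50 10]   */
}
```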
3. Parallelization design
The prediction function of the convolutional neural network reasoning framework is used as the thread function; the data to be predicted is divided evenly into N parts (N being the parallelism of the multithreaded parallel processing), N threads are created and bound to the N cores of the multi-core processor, and execution proceeds in a data-parallel manner.
4. Performing cross-platform design
A corresponding VSIPL static library is written according to the platform type on which the CNN network reasoning framework is to be deployed; the reasoning framework is deployed on the VxWorks or Linux operating system, completing the cross-platform design, predicting the labels of the test data set and improving the accuracy of target recognition and classification.
The CNN network reasoning framework design method supporting DSP multi-core parallelism provided by the invention is based on a convolutional neural network reasoning framework for DSP platforms. By means of model loading, a defined operation-kernel-function operator library and low-level assembly optimization, together with the multi-core parallel software design method and encapsulation behind a general neural network interface, the neural network layers, pooling layers and full-connection layers are encapsulated and data-parallelized, meeting the high-performance target recognition processing requirements of a radar system with cross-platform, high-performance, parallelized and easy-to-use characteristics.
(1) Cross-platform: general convolutional neural network operation kernel function operators are designed based on VSIPL standard interfaces; the operation kernel functions are encapsulated and a unified programming interface is established, enabling write-once, run-anywhere operation; Linux, Windows, VxWorks, SylixOS, ReWorks and other operating systems are supported, as are the x86, MIPS and PowerPC architectures.
(2) High performance: targeting the characteristics of the different instruction sets, the operation kernel function operators adopt vectorization, parallelization and pipeline designs, fully exploiting the multi-stage pipelines of CPUs and DSPs, improving vector-processor efficiency and reducing the cache miss rate, thereby realizing high-performance computing functions and improving reasoning speed.
(3) Parallelization: through the multithreaded design, the algorithm is mapped onto the multi-core processor, and multi-core parallel operation on CPU and DSP platforms is supported.
(4) Ease of use: the TensorFlow and PyTorch frameworks are supported, as are convolutional neural network models.
In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer-readable storage medium. The software may include instructions and certain data that, when executed by one or more processors, operate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer-readable storage medium may include, for example, a magnetic or optical disk storage device, a solid state storage device such as flash memory, cache, random access memory (RAM), or another non-volatile memory device. Executable instructions stored on a non-transitory computer-readable storage medium may be in source code, assembly language code, object code, or another instruction format that is interpreted or otherwise executed by one or more processors.
A computer-readable storage medium may include any storage medium or combination of storage media that can be accessed by a computer system during use to provide instructions and/or data to the computer system. Such storage media may include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-ray disc), magnetic media (e.g., floppy disk, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or flash memory), or microelectromechanical system (MEMS) based storage media. The computer-readable storage medium may be embedded in a computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB) based flash memory), or coupled to the computer system via a wired or wireless network (e.g., network-accessible storage (NAS)).
Note that not all of the activities or elements in the above general description are required, that a portion of a particular activity or device may not be required, and that one or more further activities or included elements may be performed in addition to those described. Still further, the order in which the activities are listed need not be the order in which they are performed. Moreover, these concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims. Furthermore, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter.

Claims (4)

1. A CNN network reasoning framework design method supporting embedded-platform multi-core parallelism, characterized by comprising the following steps:
CNN network model loading: reading a model file trained by the deep learning framework, extracting weight and bias parameters from the model file, and outputting the model weights and biases defined by pointer variables; the trained model file is a binary file containing model parameters and consists of control header parameters and data; the control header parameters are integers: the 1st word of the control header is the number of layers of the neural network; the 2nd, 3rd and 4th words are the dimensions of the weight matrix of the first-layer neural network model; the 5th, 6th and 7th words are the dimensions of the first-layer pooling; the 8th word is the dimension of the first-layer pooling bias; the 9th, 10th and 11th words are the dimensions of the second-layer pooling; the 12th word is the dimension of the second-layer pooling bias; and so on, up to the last layer of the neural network; the data consist of the weights and bias data of the neural network models from the first layer to the last layer, stored in the binary file in sequence according to the values of the control header parameters;
CNN network function encapsulation: encapsulating the convolution, pooling, activation and full-connection operations in the CNN network into operation kernel functions respectively, using a vector instruction set, assembly language and the C language, the input of each operation kernel function being the model weights and biases defined by the pointer variables, and the outputs being a convolution layer function, a pooling layer function, an activation function and a full-connection layer function, respectively; designing basic block and view objects based on the VSIPL standard, and unifying the encapsulated convolution layer function, pooling layer function, activation function and full-connection layer function, including their function parameters, to form operation kernel functions with a general neural network programming interface; and constructing the prediction function of the CNN network reasoning framework from the operation kernel functions with the general neural network programming interface;
and (3) carrying out automatic parallelization design: based on a multithreading mechanism of the multi-core processor, taking a prediction function of the CNN reasoning framework as a thread function, creating a plurality of tasks or a plurality of threads, designing thread synchronization and communication, dividing data of an input test data set based on a load balancing principle, and binding each task or thread to a core number of the multi-core processor through a thread binding function;
Performing cross-platform design: writing a corresponding VSIPL static library according to the platform type on which the CNN network reasoning framework is to be deployed; the CNN network reasoning framework is deployed on the VxWorks, Linux, Windows, SylixOS or ReWorks operating system.
2. The CNN network reasoning framework design method supporting multi-core parallelism of embedded platform of claim 1, wherein,
the convolution layer function carries out one-dimensional, two-dimensional or three-dimensional convolution, with the dimension of the convolution operation and the number and size of the convolution kernels set as parameters;
the pooling layer function performs one-dimensional, two-dimensional or three-dimensional pooling, with the dimension, pooling type, interval and step length of the pooling operation set as parameters;
the full-connection layer function sets the dimension of the weight matrix;
the activation function sets the activation function type.
3. The CNN network reasoning framework design method supporting multi-core parallelism of an embedded platform according to claim 1, wherein designing the basic block and view objects based on the VSIPL standard, performing the programming interface design on the encapsulated convolution layer function, pooling layer function, activation function and full-connection layer function, including their function parameters, and unifying them into the operation kernel functions with the general neural network programming interface comprises:
defining basic blocks and views based on the VSIPL computing middleware standard, binding the pointer variables output by the CNN network model loading into basic blocks, and extracting data from the basic blocks and binding it into views, the views being matrices or vectors; and calling the operation kernel functions with the converted matrices or vectors as input parameters.
4. The CNN network reasoning framework design method supporting multi-core parallelism of the embedded platform according to claim 1, wherein the data partitioning of the input test data set based on the load balancing principle comprises:
dividing the input test data set evenly into N parts, where N is the number of cores of the multi-core processor; creating N tasks or threads, binding them to the N cores of the multi-core processor, and executing in a data-parallel manner.
CN202110647708.6A 2021-06-10 2021-06-10 CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform Active CN113298259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110647708.6A CN113298259B (en) 2021-06-10 2021-06-10 CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110647708.6A CN113298259B (en) 2021-06-10 2021-06-10 CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform

Publications (2)

Publication Number Publication Date
CN113298259A CN113298259A (en) 2021-08-24
CN113298259B true CN113298259B (en) 2024-04-26

Family

ID=77327859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110647708.6A Active CN113298259B (en) 2021-06-10 2021-06-10 CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform

Country Status (1)

Country Link
CN (1) CN113298259B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611332B (en) * 2021-10-09 2022-01-18 聊城中赛电子科技有限公司 Intelligent control switching power supply method and device based on neural network
CN116991564B (en) * 2023-09-28 2024-01-09 之江实验室 Operator internal parallel acceleration method for heterogeneous dual-core MCU

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784320A (en) * 2017-09-27 2018-03-09 电子科技大学 Radar range profile's target identification method based on convolution SVMs
CN108549935A (en) * 2018-05-03 2018-09-18 济南浪潮高新科技投资发展有限公司 A kind of device and method for realizing neural network model
CN110070178A (en) * 2019-04-25 2019-07-30 北京交通大学 A kind of convolutional neural networks computing device and method
CN110766017A (en) * 2019-10-22 2020-02-07 国网新疆电力有限公司信息通信公司 Mobile terminal character recognition method and system based on deep learning
CN111709522A (en) * 2020-05-21 2020-09-25 哈尔滨工业大学 Deep learning target detection system based on server-embedded cooperation
CN112734040A (en) * 2021-01-22 2021-04-30 中国人民解放军军事科学院国防科技创新研究院 Embedded artificial intelligence computing framework and application method
CN112748953A (en) * 2020-07-02 2021-05-04 腾讯科技(深圳)有限公司 Data processing method and device based on neural network model and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10599978B2 (en) * 2017-11-03 2020-03-24 International Business Machines Corporation Weighted cascading convolutional neural networks
US11580386B2 (en) * 2019-03-18 2023-02-14 Electronics And Telecommunications Research Institute Convolutional layer acceleration unit, embedded system having the same, and method for operating the embedded system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784320A (en) * 2017-09-27 2018-03-09 电子科技大学 Radar range profile's target identification method based on convolution SVMs
CN108549935A (en) * 2018-05-03 2018-09-18 济南浪潮高新科技投资发展有限公司 A kind of device and method for realizing neural network model
CN110070178A (en) * 2019-04-25 2019-07-30 北京交通大学 A kind of convolutional neural networks computing device and method
CN110766017A (en) * 2019-10-22 2020-02-07 国网新疆电力有限公司信息通信公司 Mobile terminal character recognition method and system based on deep learning
CN111709522A (en) * 2020-05-21 2020-09-25 哈尔滨工业大学 Deep learning target detection system based on server-embedded cooperation
CN112748953A (en) * 2020-07-02 2021-05-04 腾讯科技(深圳)有限公司 Data processing method and device based on neural network model and electronic equipment
CN112734040A (en) * 2021-01-22 2021-04-30 中国人民解放军军事科学院国防科技创新研究院 Embedded artificial intelligence computing framework and application method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
天津滨海迅腾 [Tianjin Binhai Xunteng]. 《TensorFlow项目式案例实战》 [TensorFlow Project-Based Practical Cases]. 天津大学出版社 [Tianjin University Press], 2020, 1st ed., pp. 99-104. *

Also Published As

Publication number Publication date
CN113298259A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
US10942716B1 (en) Dynamic computational acceleration using a heterogeneous hardware infrastructure
CN113298259B (en) CNN (convolutional neural network) reasoning framework design method supporting multi-core parallelism of embedded platform
US11803404B2 (en) Deep learning algorithm compiling method, device, and related product
US11275615B2 (en) Data processing offload using in-storage code execution
US11669443B2 (en) Data layout optimization on processing in memory architecture for executing neural network model
US11915149B2 (en) System for managing calculation processing graph of artificial neural network and method of managing calculation processing graph by using the same
CN111160551A (en) Computation graph execution method, computer device, and storage medium
WO2021000971A1 (en) Method and device for generating operation data and related product
CA3114635A1 (en) System and method for automated precision configuration for deep neural networks
CN110689116B (en) Neural network pruning method and device, computer equipment and storage medium
US11656880B2 (en) Function evaluation using multiple values loaded into registers by a single instruction
US20210073625A1 (en) Partitioning control dependency edge in computation graph
US10564947B2 (en) Computer system and method for multi-processor communication
US20220076095A1 (en) Multi-level sparse neural networks with dynamic rerouting
EP4318319A1 (en) Model processing method and apparatus
CN115576561A (en) Deep neural network model compiling and optimizing method based on Shenwei processor
CN116228515B (en) Hardware acceleration system, method and related device
WO2022078400A1 (en) Device and method for processing multi-dimensional data, and computer program product
WO2023287702A1 (en) Method and apparatus for accelerated inference of machine-learning models
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
CN113887730A (en) Quantum simulator implementation method and device, related equipment and quantum simulation method
US11941383B1 (en) Compilation with caching of code analysis result
US11809849B1 (en) Global modulo allocation in neural network compilation
US20230121052A1 (en) Resource resettable deep neural network accelerator, system, and method
KR20170081952A (en) Multi-core simulation system and method based on shared translation block cache

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant