CN114020450A - Neural network model execution method, device, system and electronic equipment - Google Patents

Neural network model execution method, device, system and electronic equipment

Info

Publication number
CN114020450A
CN114020450A (application CN202111170745.9A)
Authority
CN
China
Prior art keywords
neural network
executed
block structure
network model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111170745.9A
Other languages
Chinese (zh)
Inventor
顾鹏
王成波
刘海军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202111170745.9A priority Critical patent/CN114020450A/en
Publication of CN114020450A publication Critical patent/CN114020450A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a neural network model execution method, apparatus, system, electronic device and storage medium. The method comprises the following steps: loading a neural network model to be executed into a model database and analyzing it to obtain a model analysis result, the result comprising the block structure information of at least one block structure of the neural network model to be executed; allocating an independent block structure storage space to each block structure based on the block structure information of the neural network model to be executed, and storing each block structure in its corresponding block structure storage space; calling the corresponding block structures from the block structure storage space for execution according to the tasks to be executed of the neural network model to be executed, until all the tasks to be executed have been executed; and unloading the executed neural network model from the model database. The utilization rate of storage resources during neural network model execution can thus be improved.

Description

Neural network model execution method, device, system and electronic equipment
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a neural network model execution method, apparatus and system, an electronic device and a storage medium.
Background
With the development of artificial intelligence, neural network models in deep learning have attracted wide attention and use, for example in fields such as computer vision, speech recognition and text recognition. The neural network models used there can reach dozens or even hundreds of layers, and these layers can be organized and divided by function, for example into block structures such as convolution blocks implementing a convolution function and pooling structures implementing a pooling operation. Moreover, most neural network models are computationally intensive, and executing them requires multiple computing resources working in parallel, for example heterogeneous computing modes in which different types of execution units such as a CPU, a DSP and an NPU run in parallel. The storage management design between the heterogeneous computing resources affects the storage requirements and the efficiency of model execution; that is, the different computing resources and storage resources must cooperate efficiently to perform the computation, so the problem of storage resource reuse needs to be solved in heterogeneous computing.
To solve the above storage resource reuse problem, the prior art dynamically allocates a corresponding storage resource for each task to be executed of a neural network model and releases that resource after the task completes, so that it can be dynamically allocated to other tasks. The disadvantage of this storage management method is obvious: the storage resources required for model execution are generally large, and a large number of unfinished tasks may accumulate during execution and occupy a large amount of storage, causing great waste of storage resources, especially on embedded end devices whose storage resources are limited.
Disclosure of Invention
The embodiment of the invention provides a neural network model execution method that can improve the utilization rate of storage resources during neural network model execution.
In a first aspect, an embodiment of the present invention provides a neural network model execution method, including the following steps:
loading a neural network model to be executed into a model database and analyzing to obtain a model analysis result, wherein the model analysis result comprises at least one block structure information of the neural network model to be executed;
distributing independent block structure storage space for each block structure based on the block structure information of the neural network model to be executed, and storing each block structure to the corresponding block structure storage space;
according to the to-be-executed tasks of the to-be-executed neural network model, calling the corresponding block structures from the block structure storage space to execute until all the to-be-executed tasks are executed;
and unloading the executed neural network model from the model database.
Optionally, the loading the neural network model to be executed into the model database and analyzing the neural network model to obtain a model analysis result, including:
reading the neural network model to be executed into the model database and allocating a model identifier;
traversing the to-be-executed neural network model based on the model identifier to obtain a plurality of block structures of the to-be-executed neural network model, and allocating a block index to each block structure;
and acquiring the information of each block structure as the model analysis result.
Optionally, the block structure information includes a model to which a block structure belongs, a block index, and a block calculation type, and allocating a block structure storage space to each block structure based on the block structure information of the to-be-executed neural network model includes:
correspondingly binding at least one computing unit based on the block computing type of each block structure;
and distributing corresponding memory addresses and memory sizes according to the computing units bound by each block structure.
Optionally, the invoking a corresponding block structure from the block structure storage space according to the to-be-executed task of the to-be-executed neural network model for execution includes:
acquiring a task to be executed of the neural network model to be executed from a preset task list;
acquiring block structure information required by the corresponding task to be executed based on the task to be executed;
and calling the corresponding block structure from the block structure storage space to execute according to the block structure information required by the task to be executed.
Optionally, the invoking a corresponding block structure from the block structure storage space according to the block structure information required by the task to be executed to execute includes:
acquiring at least one block structure of the task to be executed according to at least one block index of the block structure information;
and calling the computing unit with the minimum load to execute based on the at least one computing unit bound by the block structure.
Optionally, the unloading the executed neural network model from the model database includes:
releasing the block structure storage space occupied by all executed tasks of the neural network model;
clearing all executed tasks in the preset task list;
and destroying the model identifier of the executed neural network model in the model database.
In a second aspect, an embodiment of the present invention provides a neural network model execution apparatus, including:
the analysis module is used for loading the neural network model to be executed into a model database and analyzing the neural network model to obtain a model analysis result, wherein the model analysis result comprises at least one block structure information of the neural network model to be executed;
the distribution module is used for distributing independent block structure storage space for each block structure based on the block structure information of the to-be-executed neural network model and storing each block structure to the corresponding block structure storage space;
the execution module is used for calling the corresponding block structure from the block structure storage space to execute according to the to-be-executed tasks of the to-be-executed neural network model until all the to-be-executed tasks are executed;
and the unloading module is used for unloading the executed neural network model from the model database.
In a third aspect, an embodiment of the present invention provides a neural network model execution system, where the system includes a neural network model execution end and a model database, where the neural network model execution end is used to execute steps in the neural network model execution method provided in the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps in the neural network model execution method provided by the embodiment of the invention.
In a fifth aspect, the embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the neural network model execution method provided by the embodiment of the present invention.
In the embodiment of the invention, an external neural network model to be executed is loaded into the model database and analyzed to obtain the information of the model's block structures; an independent storage space is then allocated to each block structure according to the obtained block structure information, and each block structure is stored in its corresponding block structure storage space. Different tasks of the same neural network model can therefore repeatedly call the same block structure during execution, without releasing the corresponding storage space after one task finishes and re-applying for it afterwards, which wastes storage and time resources, and the utilization rate of storage resources during neural network model execution is improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for implementing a neural network model according to an embodiment of the present invention;
FIG. 2 is a flow chart of a model parsing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the allocation of block structure storage space for a neural network model according to an embodiment of the present invention;
FIG. 4 is a flowchart of a block structure storage space allocation method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of computing units bound to a block structure's storage space according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for calling a block structure according to an embodiment of the present invention;
FIG. 7 is a flowchart of another method for calling a block structure according to an embodiment of the present invention;
FIG. 8 is a flow chart of a method for offloading a neural network model provided by an embodiment of the invention;
FIG. 9 is a schematic structural diagram of a neural network model execution apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a parsing module according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a distribution module according to an embodiment of the present invention;
FIG. 12 is a block diagram of an execution module according to an embodiment of the present invention;
FIG. 13 is a block diagram of a call and execute unit according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of an unloading module according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of a neural network model execution system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a neural network model execution method according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
101. Loading the neural network model to be executed into the model database and analyzing it to obtain a model analysis result.
The model analysis result includes the block structure information of at least one block structure of the neural network model to be executed.
In an embodiment of the present invention, the neural network model to be executed comprises a trained multi-layer neural network, such as a neural network for image recognition (e.g., an RCNN or YOLO) or a neural network for text recognition (e.g., an RNN). Through pre-construction and pre-training, groups of layers of the neural network model can form neural network modules with particular functions, for example block structures such as a convolution block implementing a convolution function and a pooling structure implementing a pooling operation. The computing resources each block structure requires differ according to its computation characteristics: for example, a convolution block performing convolution operations needs a GPU (graphics processing unit) for its computation, while a classification module uses a CPU for classification computation. Corresponding computing units and storage spaces therefore need to be allocated reasonably according to the computation characteristics of each block structure of a task to be executed, so that the block structures cooperate to complete the corresponding task of the neural network model to be executed efficiently and with minimal resource use.
When a task needs to execute the neural network model, the neural network model to be executed must first be loaded into a model database in memory; the model database can be a relational database (such as MySQL) or a non-relational database (such as a NoSQL store). That is, data such as the structure of the neural network model is read into the model database, the structure is then analyzed, and the resulting parameters are obtained and temporarily stored in the model database. The model is also marked so that it can be quickly found in the model database, which speeds up model loading and analysis.
Further, as shown in fig. 2, which is a flowchart of a model analysis method provided in an embodiment of the present invention, step 101 specifically includes:
1011. Reading the neural network model to be executed into the model database and assigning a model identifier.
1012. Traversing the neural network model to be executed based on the model identifier to obtain a plurality of block structures of the model, and allocating a block index to each block structure.
1013. Acquiring the information of each block structure as the model analysis result.
In the above steps 1011 to 1013, the neural network model to be executed may comprise a trained multi-layer neural network, such as a neural network for image recognition (e.g., an RCNN or YOLO) or a neural network for text recognition (e.g., an RNN). Through pre-construction and pre-training, groups of layers can form neural network modules with particular functions, for example block structures such as a convolution block implementing a convolution function and a pooling structure implementing a pooling operation. Each neural network model is thus composed of a plurality of block structures; equivalently, each neural network model is analyzed into a group of block structures, each group comprising a plurality of block structures.
The neural network model to be executed may be read into the model database in advance and stored by block structure, with a model identifier assigned to the neural network model to be executed and a block index assigned to each of its block structures. One model identifier corresponds to one neural network model to be executed; because that model is composed of a plurality of block structures, one model identifier also corresponds to one group of block indexes, the group containing the block indexes of the block structures that make up the model.
The model database contains a plurality of neural network models to be executed, each allocated a unique model identifier. When neural network model A needs to be executed, the neural network models in the model database can be traversed according to the model identifier of model A to obtain the group of block indexes a that make up model A, and the block structures composing model A are then determined from the group of block indexes a.
In the embodiment of the present invention, the structural data of the neural network model to be executed may be read from external storage (e.g., a hard disk or magnetic disk) into the model database, e.g., the relational database MySQL, with each block structure (block) of the model stored in a database table. The model database assigns a model identifier to the read neural network model; this may be a unique character string (the figures use MODELA, for example), so that a unique neural network model can be located from its model identifier. Through the model identifier, the block structures of the neural network model to be executed can be traversed in the model database table, and each block structure, which contains multiple neural network layers, is allocated a block index, such as block1, which marks the block structure in the model database table. A corresponding block structure can then be obtained from the model database table by its block index, and its block structure information obtained in turn. The block structure information at least includes the computation type of the block structure, the model to which it belongs, its block index, and the neural network layer parameters it contains; a suitable hardware computing unit can be selected according to the computation type, for example a convolution block structure needs a GPU (graphics processing unit) and corresponding storage space for its convolution operations, while a classification module uses a CPU and corresponding storage space for classification computation. By traversing all layers of the neural network model to be executed, the model is divided into a plurality of block structures with block indexes; the corresponding block structure can be looked up quickly through its block index, and different hardware computing units can be selected for different block structures, so the model can run on various kinds of hardware in parallel, improving its parallel computing capability. The parallel design inside each block structure's storage space also supports the concurrent scenario in which multiple tasks call the same block structure simultaneously during neural network execution, and avoids the storage resource waste caused by repeatedly applying for the same block structure's storage space when different tasks execute, thereby improving the utilization rate of storage resources during neural network model execution.
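By way of illustration only, the loading and parsing described above can be sketched as follows. The sketch assumes a simple in-memory table keyed by model identifier; all names (ModelDatabase, BlockInfo, load_and_parse) are hypothetical and chosen only to mirror the terminology of the embodiment, not taken from the patent.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record mirroring the block structure information of the
// embodiment: owning model, block index, computation type, layer parameters.
struct BlockInfo {
    std::string model_id;      // model to which the block belongs, e.g. "MODELA"
    int         block_index;   // e.g. 0 for block0, 1 for block1, ...
    std::string compute_type;  // e.g. "conv" (GPU) or "classify" (CPU)
    std::vector<float> layer_params;  // parameters of the layers in the block
};

// Hypothetical in-memory model database: model identifier -> group of blocks.
struct ModelDatabase {
    std::map<std::string, std::vector<BlockInfo>> tables;

    // Read a model's structural data, assign the model identifier and a block
    // index per block, and return the model analysis result.
    const std::vector<BlockInfo>& load_and_parse(
            const std::string& model_id,
            const std::vector<BlockInfo>& raw_blocks) {
        std::vector<BlockInfo>& blocks = tables[model_id];  // assign model id
        blocks = raw_blocks;
        for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
            blocks[i].model_id = model_id;  // bind the block to its model
            blocks[i].block_index = i;      // allocate a block index
        }
        return blocks;                      // the model analysis result
    }
};
```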
102. Allocating an independent block structure storage space to each block structure based on the block structure information of the neural network model to be executed, and storing each block structure in its corresponding block structure storage space.
In the embodiment of the present invention, the storage space required by each block structure may be calculated by traversing the block structure information of the neural network model to be executed in the model database, and the corresponding computing unit may be bound according to the block computation type contained in the block structure information. The size of the storage space required for executing the whole neural network model is then calculated, an independent block structure storage space is applied for each block structure, and finally the parameter data of each block structure is stored in its corresponding storage space. As shown in fig. 3, which is a schematic diagram of the allocation of block structure storage space for a neural network model provided in the embodiment of the present invention, four block structures block0, block1, block2 and block3 are obtained by traversing the neural network model MODELA to be executed; the storage space of each block structure is then calculated and applied for according to its block structure information, from MODELA buffer.block0 through MODELA buffer.block3. Each block structure contains a different number of neural network layers; the layers compute on externally input data in a certain order, the result is finally output from the last block structure block3, and the output of one block structure can serve as the input of the next. The storage spaces of the block structures are independent of and do not interfere with one another, which facilitates concurrent block execution, but all neural network layers within a block structure share that block structure's storage space, so the storage space inside a block structure is multiplexed and storage resources are saved.
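A minimal sketch of the independent per-block allocation of fig. 3, assuming the per-block storage sizes have already been computed from the block structure information; BlockBuffer and allocate_block_buffers are illustrative names:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical independent storage space for one block structure; all layers
// of the block share this one buffer, so memory inside the block is reused.
struct BlockBuffer {
    std::vector<std::byte> storage;  // e.g. MODELA buffer.block0 ... block3
};

// Allocate one independent buffer per block structure. required_bytes[i] is
// the storage size computed from block i's structure information; buffers of
// different blocks are separate, so blocks can execute concurrently.
std::vector<BlockBuffer> allocate_block_buffers(
        const std::vector<std::size_t>& required_bytes) {
    std::vector<BlockBuffer> buffers(required_bytes.size());
    for (std::size_t i = 0; i < required_bytes.size(); ++i) {
        buffers[i].storage.resize(required_bytes[i]);
    }
    return buffers;
}
```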
Further, as shown in fig. 4, which is a flowchart of a block structure storage space allocation method provided in an embodiment of the present invention, the allocating of an independent block structure storage space to each block structure based on the block structure information of the neural network model to be executed specifically includes:
1021. Correspondingly binding at least one computing unit based on the block computation type of each block structure;
1022. Allocating a corresponding memory address and memory size according to the computing units bound to each block structure.
In the embodiment of the present invention, the block computation type of each block structure may be obtained from the block structure information of the neural network model, and the corresponding computing units are then bound according to the block computation type. A computing unit is the hardware execution structure that performs the actual numerical computation of a block structure; each block structure may be bound to a plurality of computing units, but those computing units share the storage space of the same block structure. The storage space is memory allocated independently for the computing units bound to each block structure, and comprises a memory start address and a memory size; the addresses of the block storage spaces of different block structures may be allocated contiguously or non-contiguously. For example, as shown in fig. 5, which is a schematic diagram of computing units bound to a block structure's storage space provided in an embodiment of the present invention, the computation type of block structure block1 of the neural network model to be executed is Y, and type Y binds two computing units Y1 and Y2 (the specific computing units Y1 and Y2 may be, for example, a GPU and a CPU, selected according to the computation type). Each computing unit is allocated its own memory segment, S11 and S12; the storage space of block structure block1 comprises S11 and S12, with computing unit Y1 assigned segment S11 and computing unit Y2 assigned segment S12. Independent, parallel execution inside the block structure can thus be achieved, which supports the concurrent scenario in which multiple tasks call block structure block1 simultaneously during neural network execution, and avoids the storage resource waste caused by repeatedly applying for the same block structure's storage space when different tasks execute, thereby improving the utilization rate of storage resources during neural network model execution.
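The binding of computing units to segments of a block structure's storage space might look like the following sketch, which simply splits the block's storage evenly among the bound units (assuming at least one unit; BoundComputeUnit and bind_units are illustrative names, not the patent's API):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical descriptor of one computing unit bound to a block structure,
// together with the segment of the block's storage space assigned to it.
struct BoundComputeUnit {
    std::string name;        // e.g. "Y1" (GPU) or "Y2" (CPU)
    std::size_t mem_offset;  // start address of its segment, e.g. S11
    std::size_t mem_size;    // size of its segment
};

// Partition a block's storage space among the computing units bound to its
// computation type, so the units can execute independently and in parallel.
std::vector<BoundComputeUnit> bind_units(
        const std::vector<std::string>& unit_names,
        std::size_t block_storage_size) {
    std::vector<BoundComputeUnit> units;
    std::size_t segment = block_storage_size / unit_names.size();
    std::size_t offset = 0;
    for (const std::string& name : unit_names) {
        units.push_back({name, offset, segment});  // e.g. Y1->S11, Y2->S12
        offset += segment;
    }
    return units;
}
```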
103. Calling the corresponding block structures from the block structure storage space for execution according to the tasks to be executed of the neural network model to be executed, until all the tasks to be executed have been executed.
In the embodiment of the present invention, the neural network model to be executed may be a deep neural network model used in fields such as computer vision, speech recognition and text recognition, where the number of layers can often reach dozens or even hundreds. These layers may be organized and divided by function, for example into block structures such as a convolution block implementing a convolution function, a pooling structure implementing a pooling operation, and a classification structure implementing a classification function. A neural network task to be executed usually needs to call several block structures combined together and computed in a certain order to complete the corresponding task; for example, feature extraction in a convolutional neural network needs to call a convolution block and a pooling block from the storage space and execute them in sequence.
Further, as shown in fig. 6, which is a flowchart of a method for calling a block structure according to an embodiment of the present invention, the calling of the corresponding block structure from the block structure storage space for execution according to a task to be executed of the neural network model to be executed specifically includes:
1031. Acquiring a task to be executed of the neural network model to be executed from a preset task list;
1032. Acquiring the block structure information required by that task to be executed;
1033. Calling the corresponding block structure from the block structure storage space for execution according to the block structure information required by the task to be executed.
In the embodiment of the present invention, the tasks to be executed of the neural network model may be temporarily stored in a list, which may be built and maintained in memory using data structures such as a linked list or a queue and is released only after all tasks to be executed of the neural network model have completed. Tasks to be executed are then fetched from the list in sequence; the block structures each task requires are obtained and called from the block structure storage spaces according to the block structure information the task needs, and are executed to produce the result. Managing the multiple tasks to be executed of a neural network model through this in-memory list is efficient and therefore helps the model execute efficiently.
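A sketch of the task-list handling described above, assuming a double-ended queue as the preset task list and an execute_block callback standing in for the block dispatch; both names are illustrative, not the patent's API:

```cpp
#include <deque>
#include <functional>
#include <vector>

struct Task {
    std::vector<int> block_indexes;  // blocks the task needs, in call order
};

// Fetch tasks from the preset task list in sequence; for each task, call the
// block structures it requires in order, until all tasks have executed.
void run_task_list(std::deque<Task>& task_list,
                   const std::function<void(int)>& execute_block) {
    while (!task_list.empty()) {
        Task task = task_list.front();
        task_list.pop_front();
        for (int idx : task.block_indexes) {
            execute_block(idx);  // call block `idx` from its storage space
        }
    }
    // the list is empty here and its memory can be released
}
```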
Further, as shown in fig. 7, which is a flowchart of another method for calling a block structure according to an embodiment of the present invention, the calling of the corresponding block structure from the block structure storage space for execution according to the block structure information required by the task to be executed includes:
10331. Acquiring at least one block structure of the task to be executed according to the at least one block index in the block structure information;
10332. Calling the least-loaded computing unit for execution based on the at least one computing unit bound to the block structure.
In the embodiment of the invention, the block structure information required by one task to be executed contains at least one block index, and based on these block indexes the corresponding block structures can be called from the model database and executed in sequence until the last block structure has been called. Specifically, the execution of each block structure may call the least loaded of the computing units bound to that block structure: the driver queue of each computing unit is queried, and the computing unit whose driver queue is shortest is taken as the least loaded. For example, for the computing units Y1 and Y2 of block1's computation type Y, a load query function of the hardware's low-level driver may first be invoked to query the driver queues of Y1 and Y2, and the computing unit with the shortest driver queue, i.e., the smallest load, is selected for execution. If the load query shows that Y2 has the smaller load, block structure block1 will call Y2 to execute, while computing unit Y1 can compute in parallel at the same time. This supports the concurrent scenario in which multiple tasks call the same block structure block1 simultaneously, and also avoids the storage resource waste caused by repeatedly applying for the same block structure's storage space when different tasks execute, thereby improving the utilization rate of storage resources during neural network model execution.
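The least-loaded selection can be sketched as follows, assuming the driver queue length of each bound computing unit has already been obtained through the driver's load query; UnitQueue and least_loaded are illustrative names, and the unit list is assumed non-empty:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical view of one computing unit's driver queue; in the embodiment
// the queue length is obtained through a load query of the low-level driver.
struct UnitQueue {
    const char* name;          // e.g. "Y1" or "Y2"
    std::size_t queue_length;  // pending work items in the driver queue
};

// Pick the computing unit with the shortest driver queue, i.e. the smallest
// load, among the units bound to a block structure.
const UnitQueue& least_loaded(const std::vector<UnitQueue>& units) {
    return *std::min_element(units.begin(), units.end(),
                             [](const UnitQueue& a, const UnitQueue& b) {
                                 return a.queue_length < b.queue_length;
                             });
}
```

Called with, for example, {"Y1", 3} and {"Y2", 1}, least_loaded returns the Y2 entry, matching the example above in which Y2 has the smaller load.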
104. Unloading the executed neural network model from the model database.
Specifically, as shown in fig. 8, which is a flowchart of a method for unloading a neural network model according to an embodiment of the present invention, the unloading of the executed neural network model from the model database includes:
1041. Releasing the block structure storage space occupied by all executed tasks of the neural network model;
1042. Clearing all executed tasks from the preset task list;
1043. Destroying the model identifier of the executed neural network model in the model database.
In the embodiment of the invention, when the last task of the neural network model to be executed has been executed, the block structure storage space occupied by the block structures called by each task is released and the locks on the corresponding computing units are removed; all executed tasks are deleted from the preset task list and the memory occupied by the list is released; finally, the model identifier of the executed neural network model is destroyed in the model database and the block structure data of the model is deleted from the model database table. Unloading of the neural network model can thus be completed quickly, and the execution process of the whole neural network model ends.
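Steps 1041 to 1043 might be sketched as follows, with ordinary containers standing in for the block structure storage spaces, the preset task list and the model database table; this is a standalone illustration under those assumptions, not the patent's implementation:

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Standalone sketch of the unload step, mirroring steps 1041-1043 above.
void unload_model(
        std::map<std::string, std::vector<std::vector<std::byte>>>& block_storage,
        std::deque<int>& task_list,
        std::map<std::string, int>& model_table,
        const std::string& model_id) {
    block_storage.erase(model_id);  // 1041: release block structure storage
    task_list.clear();              // 1042: clear executed tasks from the list
    model_table.erase(model_id);    // 1043: destroy the model identifier
}
```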
To sum up, the embodiment of the present invention loads the neural network model to be executed from external storage into the model database and analyzes it to obtain the information of the model's block structures, allocates an independent storage space to each block structure according to that information, and stores each block structure in its corresponding storage space. Different tasks of the same neural network model can therefore repeatedly call the same block structure during execution without re-applying for the corresponding storage space after each task completes, which would waste storage and time resources. The parallel design inside each block structure's storage space supports the concurrent scenario in which multiple tasks call the same block structure simultaneously during neural network execution, and also avoids the storage resource waste caused by repeatedly applying for the same block structure's storage space when different tasks execute, thereby improving the utilization rate of storage resources during neural network model execution.
It should be noted that the neural network model execution method provided in the embodiment of the present invention may be applied to a device such as a mobile phone, a monitor, a computer, and a server that can execute the neural network model.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a neural network model execution device according to an embodiment of the present invention, and as shown in fig. 9, the device includes:
the analysis module 201 is configured to load a neural network model to be executed into a model database and perform analysis to obtain a model analysis result, where the model analysis result includes at least one block structure information of the neural network model to be executed;
an allocating module 202, configured to allocate a block structure storage space to each block structure based on the block structure information of the to-be-executed neural network model, and store each block structure in a corresponding block structure storage space;
the execution module 203 is configured to call, according to the to-be-executed tasks of the to-be-executed neural network model, the corresponding block structures from the block structure storage space to execute until all the to-be-executed tasks are executed;
and an unloading module 204, configured to unload the executed neural network model from the model database.
Optionally, as shown in fig. 10, the parsing module 201 includes:
a reading and assigning unit 2011, configured to read the neural network model to be executed into the model database and assign a model identifier;
a traversal unit 2012, configured to traverse the to-be-executed neural network model based on the model identifier, obtain multiple block structures of the to-be-executed neural network model, and allocate a block index to each block structure;
a first obtaining unit 2013, configured to obtain information of each block structure as the model parsing result.
Optionally, the block structure information includes a model to which the block structure belongs, a block index, and a block calculation type, as shown in fig. 11, the allocating module 202 includes:
a binding unit 2021, configured to correspondingly bind at least one computing unit based on the block computing type of each block structure;
the memory allocation unit 2022 is configured to allocate a corresponding memory address and a corresponding memory size according to the computing unit bound to each block structure.
Optionally, as shown in fig. 12, the executing module 203 includes:
a second obtaining unit 2031, configured to obtain, from a preset task list, one to-be-executed task of the to-be-executed neural network model;
a third obtaining unit 2032, configured to obtain, based on the to-be-executed task, block structure information required by the corresponding to-be-executed task;
a calling and executing unit 2033, configured to call, according to the block structure information required by the task to be executed, a corresponding block structure from the block structure storage space for execution.
Optionally, as shown in fig. 13, the invoking and executing unit 2033 further includes:
an obtaining subunit 20331, configured to obtain, according to the at least one block index of the block structure information, at least one block structure of the task to be executed;
the invoking subunit 20331 is configured to invoke the computation unit with the minimum load to execute based on the at least one computation unit bound by the block structure.
Optionally, as shown in fig. 14, the unloading module 204 includes:
a releasing unit 2041, configured to release a block structure storage space occupied by all executed tasks of the neural network model;
a clearing unit 2042, configured to clear all executed tasks in the preset task list;
a destroying unit 2043, configured to destroy the executed model identifier of the neural network model in the model database.
It should be noted that the neural network model execution apparatus provided in the embodiment of the present invention may be applied to devices such as a mobile phone, a monitor, a computer, and a server that can execute a neural network model.
The neural network model execution device provided by the embodiment of the invention can realize each process realized by the neural network model execution method in the method embodiment, and can achieve the same beneficial effect. To avoid repetition, further description is omitted here.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 15, including: a memory 1302, a processor 1301, and a computer program stored on the memory 1302 and executable on the processor 1301, wherein:
the processor 1301 is used to call the computer program stored in the memory 1302, and performs the following steps:
loading a neural network model to be executed into a model database and analyzing to obtain a model analysis result, wherein the model analysis result comprises at least one block structure information of the neural network model to be executed;
distributing block structure storage space for each block structure based on the block structure information of the neural network model to be executed, and storing each block structure to the corresponding block structure storage space;
according to the to-be-executed tasks of the to-be-executed neural network model, calling the corresponding block structures from the block structure storage space to execute until all the to-be-executed tasks are executed;
and unloading the executed neural network model from the model database.
Optionally, the loading and analyzing, by the processor 1301, of the neural network model to be executed into the model database to obtain a model analysis result includes:
reading the neural network model to be executed into the model database and allocating a model identifier;
traversing the to-be-executed neural network model based on the model identifier to obtain a plurality of block structures of the to-be-executed neural network model, and allocating a block index to each block structure;
and acquiring the information of each block structure as the model analysis result.
Optionally, the block structure information includes a model to which a block structure belongs, a block index, and a block calculation type, and the allocating, by the processor 1301, a block structure storage space to each block structure based on the block structure information of the to-be-executed neural network model includes:
correspondingly binding at least one computing unit based on the block computing type of each block structure;
and distributing corresponding memory addresses and memory sizes according to the computing units bound by each block structure.
Optionally, the calling, by the processor 1301, of the corresponding block structure from the block structure storage space for execution according to the task to be executed of the neural network model to be executed includes:
acquiring a task to be executed of the neural network model to be executed from a preset task list;
acquiring block structure information required by the corresponding task to be executed based on the task to be executed;
and calling the corresponding block structure from the block structure storage space to execute according to the block structure information required by the task to be executed.
Optionally, the calling, by the processor 1301, of the corresponding block structure from the block structure storage space for execution according to the block structure information required by the task to be executed includes:
acquiring at least one block structure of the task to be executed according to at least one block index of the block structure information;
and calling the computing unit with the minimum load to execute based on the at least one computing unit bound by the block structure.
Optionally, the unloading, performed by the processor 1301, of the executed neural network model from the model database includes:
releasing the block structure storage space occupied by all executed tasks of the neural network model;
clearing all executed tasks in the preset task list;
and destroying the model identifier of the executed neural network model in the model database.
The electronic device may be a device such as a mobile phone, a monitor, a computer or a server that can execute the neural network model.
The electronic device provided by the embodiment of the invention can realize each process realized by the neural network model execution method in the method embodiment, can achieve the same beneficial effect, and is not repeated here to avoid repetition.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the neural network model execution method provided in the embodiment of the present invention, and can achieve the same technical effect, and is not described herein again to avoid repetition.
An embodiment of the present invention further provides a neural network model execution system. Referring to fig. 16, which is a schematic structural diagram of a neural network model execution system provided in an embodiment of the present invention, the system comprises a neural network model execution end 1400 and a model database 1404, the execution end comprising a neural network model management layer 1401, a neural network model execution layer 1402 and a neural network model computation driver layer 1403. The neural network model management layer 1401 is configured to load the neural network model to be executed into the model database 1404 and analyze it to obtain a model analysis result comprising the block structure information of at least one block structure of the neural network model to be executed; to allocate an independent block structure storage space to each block structure based on that block structure information and store each block structure in its corresponding block structure storage space; and to unload the executed neural network model from the model database. The neural network model execution layer 1402 is configured to call the corresponding block structures from the block structure storage spaces for execution according to the tasks to be executed of the neural network model to be executed, until all the tasks to be executed have been executed. The neural network model computation driver layer 1403 is configured to provide the corresponding hardware computation for the neural network model execution layer 1402 when the tasks to be executed are executed.
Further, the neural network model may be loaded into the model database 1404 through the neural network model management layer 1401, which analyzes and unloads the neural network model in the model database 1404 and manages the storage resources throughout the model's execution. The neural network model execution layer 1402 executes the neural network model: it schedules the block structures in sequence according to the block structure information obtained by analyzing the model and the task to be executed, and calls the neural network model computation driver layer 1403 to compute according to the computation type of each block structure. The neural network model computation driver layer 1403 performs the hardware computation; a block structure of the neural network model may serve as the minimum unit of computation, and the computation type of one block structure may support parallel execution on multiple computation execution units.
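As an illustrative sketch of the three-layer division of fig. 16, under the assumption that each layer is a simple class; the class and method names are hypothetical, not the patent's actual API:

```cpp
#include <string>
#include <vector>

class ComputeDriverLayer {        // 1403: performs the hardware computation
public:
    void compute(int block_index) {
        // dispatch the block structure (the minimum unit of computation) to
        // hardware; one computation type may use several execution units in
        // parallel, as described above
    }
};

class ExecutionLayer {            // 1402: schedules block structures per task
public:
    explicit ExecutionLayer(ComputeDriverLayer& driver) : driver_(driver) {}
    void execute_task(const std::vector<int>& block_indexes) {
        for (int idx : block_indexes) driver_.compute(idx);
    }
private:
    ComputeDriverLayer& driver_;
};

class ManagementLayer {           // 1401: loads, analyzes and unloads models
public:
    void load(const std::string& model_id)   { /* into model database 1404 */ }
    void unload(const std::string& model_id) { /* release storage and id   */ }
};
```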
The neural network model to be executed, stored in external memory, is loaded into the model database 1404 through the neural network model management layer 1401 and analyzed to obtain the model's block structures; an independent storage space is then allocated to each block structure and each block structure is stored in its corresponding block structure storage space. The neural network model execution layer 1402 can therefore repeatedly call the same block structure when executing different tasks of the same neural network model, without re-applying for the corresponding storage space after each task completes, which would waste storage and time resources. Parallelism of the computing hardware inside each block structure's storage space is realized through the neural network model computation driver layer 1403, i.e., the same block structure supports multiple computation execution units, which favors the concurrent scenario in which multiple tasks call the same block structure simultaneously during neural network execution and avoids the storage resource waste caused by repeatedly applying for the same block's storage space when different tasks execute, thereby improving the utilization rate of storage resources during neural network model execution.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope of the present invention.

Claims (10)

1. A neural network model execution method is characterized by comprising the following steps:
loading a neural network model to be executed into a model database and analyzing to obtain a model analysis result, wherein the model analysis result comprises at least one block structure information of the neural network model to be executed;
distributing independent block structure storage space for each block structure based on the block structure information of the neural network model to be executed, and storing each block structure to the corresponding block structure storage space;
according to the to-be-executed tasks of the to-be-executed neural network model, calling the corresponding block structures from the block structure storage space to execute until all the to-be-executed tasks are executed;
and unloading the executed neural network model from the model database.
2. The method of claim 1, wherein loading and parsing the neural network model to be executed into a model database to obtain a model parsing result comprises:
reading the neural network model to be executed into the model database and allocating a model identifier;
traversing the to-be-executed neural network model based on the model identifier to obtain a plurality of block structures of the to-be-executed neural network model, and allocating a block index to each block structure;
and acquiring the information of each block structure as the model analysis result.
3. The method of claim 2, wherein the block structure information comprises a model to which a block structure belongs, a block index, and a block computation type, and the allocating block structure storage space for each block structure based on the block structure information of the neural network model to be executed comprises:
correspondingly binding at least one computing unit based on the block computing type of each block structure;
and distributing corresponding memory addresses and memory sizes according to the computing units bound by each block structure.
4. The method of claim 3, wherein the invoking the corresponding block structure from the block structure storage space for execution according to the to-be-executed task of the to-be-executed neural network model comprises:
acquiring a task to be executed of the neural network model to be executed from a preset task list;
acquiring block structure information required by the corresponding task to be executed based on the task to be executed;
and calling the corresponding block structure from the block structure storage space to execute according to the block structure information required by the task to be executed.
5. The method according to claim 4, wherein the calling the corresponding block structure from the block structure storage space according to the block structure information required by the task to be executed, comprises:
acquiring at least one block structure of the task to be executed according to at least one block index of the block structure information;
and calling the computing unit with the minimum load to execute based on the at least one computing unit bound by the block structure.
6. The method of claim 5, wherein the unloading of the executed neural network model from the model database comprises:
releasing the block structure storage space occupied by all executed tasks of the neural network model;
clearing all executed tasks in the preset task list;
and destroying the model identifier of the executed neural network model in the model database.
7. A neural network model performing apparatus, the apparatus comprising:
the analysis module is used for loading the neural network model to be executed into a model database and analyzing the neural network model to obtain a model analysis result, wherein the model analysis result comprises at least one block structure information of the neural network model to be executed;
the distribution module is used for distributing independent block structure storage space for each block structure based on the block structure information of the to-be-executed neural network model and storing each block structure to the corresponding block structure storage space;
the execution module is used for calling the corresponding block structure from the block structure storage space to execute according to the to-be-executed tasks of the to-be-executed neural network model until all the to-be-executed tasks are executed;
and the unloading module is used for unloading the executed neural network model from the model database.
8. A neural network model execution system, comprising a neural network model execution end and a model database, wherein the neural network model execution end is used for executing the steps in the neural network model execution method according to any one of claims 1 to 6.
9. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the neural network model execution method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the neural network model execution method according to any one of claims 1 to 6.
CN202111170745.9A 2021-10-08 2021-10-08 Neural network model execution method, device, system and electronic equipment Pending CN114020450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111170745.9A CN114020450A (en) 2021-10-08 2021-10-08 Neural network model execution method, device, system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111170745.9A CN114020450A (en) 2021-10-08 2021-10-08 Neural network model execution method, device, system and electronic equipment

Publications (1)

Publication Number Publication Date
CN114020450A (en) 2022-02-08

Family

ID=80055415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111170745.9A Pending CN114020450A (en) 2021-10-08 2021-10-08 Neural network model execution method, device, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN114020450A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565064A (en) * 2022-04-26 2022-05-31 心鉴智控(深圳)科技有限公司 Method, system and equipment for identifying multitask learning deep network
CN117349034A (en) * 2023-12-05 2024-01-05 创意信息技术股份有限公司 Hierarchical loading method and device for large language model
CN117349034B (en) * 2023-12-05 2024-02-23 创意信息技术股份有限公司 Hierarchical loading method and device for large language model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination