WO2021077284A1 - Neural network operating system and method - Google Patents

Neural network operating system and method

Info

Publication number
WO2021077284A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
neural network
node
calculation
layer
Application number
PCT/CN2019/112466
Other languages
French (fr)
Chinese (zh)
Inventor
熊超
牛昕宇
蔡权雄
Original Assignee
深圳鲲云信息科技有限公司
Application filed by 深圳鲲云信息科技有限公司
Priority to CN201980100192.4A
Priority to PCT/CN2019/112466
Publication of WO2021077284A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Definitions

  • The embodiments of the present application relate to the field of neural networks, for example, to a neural network operation system and method.
  • A neural network is composed of multiple repetitive calculation layers (also called operators).
  • The calculation of a neural network is characterized by high parallelism and a high computational load.
  • Graphics Processing Unit (GPU) devices contain a large number of small computing cores. On the one hand, this matches the needs of neural network applications; on the other hand, the early neural network development frameworks were built on GPUs, so most neural networks are deployed on GPUs.
  • GPUs, however, were originally designed for image rendering and similar applications, not dedicated neural network computation.
  • GPU architecture efficiency and resource utilization are low, usually below 30%; this low architecture efficiency has gradually become a bottleneck in the development of neural network technology.
  • The computing efficiency of the data flow architecture can reach more than 90%. Compared with instruction-set architectures such as the GPU, it makes fuller use of computing resources and is better suited to deploying neural network algorithms.
  • Although the data flow architecture is technologically advanced, its development time has been limited and it is still in the early stage of application, so there are still great unknowns and uncertainties in how applications run on it.
  • The embodiments of the present application provide a neural network operation system and method to distinguish the operation form of a neural network running on a data flow architecture and to reduce the threshold for using data flow devices.
  • An embodiment of the application provides a neural network operation system, including:
  • a software layer, configured to construct a neural network calculation graph for a data flow computing architecture according to a preset network model and the network model data corresponding to the preset network model, and to allocate the calculation space corresponding to the neural network calculation graph;
  • a driver layer, connected to the software layer and configured to initialize computing nodes according to the calculation space and to transmit the node data of multiple computing nodes in the neural network calculation graph to the hardware layer through a data transmission channel between the driver layer and the hardware layer;
  • a hardware layer, connected to the driver layer and configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline and to perform calculations based on the node data.
  • An embodiment of the present application provides a neural network operation method, including:
  • the software layer constructs a neural network calculation graph for the data flow computing architecture according to the preset network model and the network model data corresponding to the preset network model, and allocates the calculation space corresponding to the neural network calculation graph;
  • the driver layer initializes computing nodes according to the computing space and transmits the node data of multiple computing nodes in the neural network calculation graph to the hardware layer through the data transmission channel between the driver layer and the hardware layer;
  • the hardware layer sequentially obtains the node data of the multiple computing nodes through the data transmission pipeline and performs calculations based on the node data.
  • FIG. 1 is a schematic structural diagram of a neural network operating system provided by Embodiment 1 of the application;
  • FIG. 2 is a schematic structural diagram of another neural network operation system provided in the second embodiment of the application.
  • FIG. 3 is a schematic structural diagram of another neural network operating system provided in the third embodiment of this application.
  • FIG. 4 is a schematic flowchart of a neural network operation method provided in the fourth embodiment of this application.
  • Some exemplary embodiments are described as processes or methods depicted in flowcharts. Although a flowchart describes steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. Processing may be terminated when its steps are completed, but there may also be additional steps not shown in the drawing. Processing can correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
  • The terms “first”, “second”, etc. may be used herein to describe various directions, actions, steps or elements, but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step or element from another direction, action, step or element.
  • For example, the first computing node may be referred to as the second computing node and, similarly, the second computing node may be referred to as the first computing node.
  • Both the first computing node and the second computing node are computing nodes, but the first computing node and the second computing node are not the same computing node.
  • FIG. 1 is a schematic structural diagram of a neural network operation system provided in Embodiment 1 of the application, which is applicable to the operation of a neural network based on a data flow computing architecture.
  • a neural network operating system provided by Embodiment 1 of the present application includes: a software layer 110, a driver layer 120 and a hardware layer 130.
  • the software layer 110 is configured to construct a neural network calculation graph for the data flow calculation architecture according to a preset network model and network model data corresponding to the preset network model, and allocate a calculation space corresponding to the neural network calculation graph.
  • the preset network model is a neural network model that needs to be calculated under the data flow architecture.
  • The neural network calculation graph is a form of expression used when the neural network performs actual calculations under the data flow computing architecture; it includes multiple computing nodes and the connection relationships between them.
  • A computing node in the neural network calculation graph can correspond to one or more layers of the neural network.
  • The network model data corresponding to the neural network model is the data of each computing node in the neural network model.
  • According to the preset network model, a neural network calculation graph for the data flow computing architecture can be constructed; the network model data corresponding to the neural network model is then imported into each node of the calculation graph, after which the calculation graph can perform actual calculations.
  • the software layer 110 also allocates the calculation space required by the neural network calculation graph.
  • The driver layer 120 is connected to the software layer 110.
  • The driver layer 120 is configured to initialize the computing nodes according to the calculation space and to transmit the node data of multiple computing nodes in the neural network calculation graph to the hardware layer 130 through the data transmission channel between the driver layer 120 and the hardware layer 130.
  • After the software layer 110 constructs the neural network calculation graph, it passes the calculation graph to the driver layer 120.
  • The driver layer 120 initiates a device initialization request to the hardware layer 130, and the hardware layer 130 initializes the corresponding I/O interface according to the request, thereby constructing a data transmission pipeline between the driver layer 120 and the hardware layer 130.
  • The software layer 110 controls the data transmission in this pipeline and thereby controls the hardware layer 130 to perform calculations.
  • After the software layer 110 allocates the calculation space required by the neural network calculation graph, the driver layer 120 also initializes the computing nodes of the calculation graph, so that they can perform actual operations.
  • the hardware layer 130 is connected to the driver layer 120, and the hardware layer 130 is configured to sequentially obtain node data of multiple computing nodes in the neural network calculation graph through the data transmission pipeline and perform calculations based on the node data.
  • When the user uses the neural network to perform actual calculations, the data that needs to be calculated is input to the neural network.
  • The software layer 110 obtains the input data, imports it into the neural network calculation graph, and then initiates a neural network calculation graph running request to the driver layer 120, so that the calculation graph performs calculations based on the user's input data.
  • The driver layer 120 traverses each computing node of the neural network calculation graph to obtain node data and transmits the node data to the hardware layer 130 through the data transmission pipeline. After the hardware layer 130 obtains the node data, it completes the calculation of the computing node on the data flow engine, obtains the output data corresponding to the input data, and transmits the output data back to the driver layer 120 through the pipeline. After the calculation of the entire graph is completed, the final output data is transmitted from the driver layer 120 to the software layer 110, where the user can obtain the output data corresponding to the input data.
  • The neural network operating system provided in the first embodiment of the application divides the operating form of the neural network across the software layer, the driver layer, and the hardware layer.
  • The three layers have different functions and interact pairwise when the neural network performs actual operations.
  • When using the neural network for actual calculations, the user only needs to input data and obtain output data through the software layer; the actual calculations are controlled by the driver layer and completed at the hardware layer.
  • The user is thus isolated from the underlying hardware computation, which lowers the threshold for using data flow devices and is conducive to their wide application.
  • FIG. 2 is a schematic structural diagram of another neural network operating system provided in the second embodiment of the application. This embodiment is described on the basis of the above-mentioned embodiment.
  • a neural network operating system provided in the second embodiment of the present application includes: a software layer 110, a driver layer 120, and a hardware layer 130.
  • The software layer 110 includes a calculation graph construction module 111, a calculation graph initialization module 112, a memory allocation module 113, and a calculation graph running module 114.
  • The driver layer 120 includes a device initialization module 121, a data transmission module 123, and a computing node initialization module 122.
  • The hardware layer 130 includes an I/O initialization module 131 and a node computing module 132.
  • The calculation graph construction module 111 is configured to construct a neural network calculation graph for the data flow computing architecture according to the preset network model.
  • A neural network is a complex network system formed by a large number of simple processing units that are widely interconnected. These simple processing units are also called operators; that is, a neural network model consists of a large number of interconnected operators. In actual calculations, the one or more layers in a neural network that complete one function are usually called a computing node.
  • The network model data corresponding to the neural network model is the data of the multiple computing nodes in the neural network model.
  • The neural network calculation graph is a form of expression used when the neural network performs actual calculations; it includes the multiple computing nodes of the neural network model and the connection relationships between them.
  • A computing node and an operator can be the same size or different sizes, and the size relationship between them differs between neural network models.
  • For example, suppose the operators included in a neural network model are of four types: A1, A2, A3, and A4.
  • The computing nodes of the neural network calculation graph can then be a first computing node A1+A2 and a second computing node A3+A4.
  • The connection relationship between the nodes can be to run the first computing node A1+A2 and the second computing node A3+A4 first, and then to run their sum: A1+A2+A3+A4.
  • Constructing a neural network calculation graph for the data flow computing architecture according to the preset network model means constructing the operators of the neural network model and the connection relationships between the operators into the computing nodes of the data-flow-based calculation graph and the connection relationships between those computing nodes.
  • The calculation graph initialization module 112 is configured to import the network model data corresponding to the preset network model into the neural network calculation graph and to pass the calculation graph to the driver layer 120.
  • Initializing the neural network calculation graph means importing network model data into each of its computing nodes, so that each computing node contains actual data and can perform actual calculations.
  • The calculation graph initialization module 112 transmits the initialized neural network calculation graph to the driver layer 120.
  • The memory allocation module 113 is configured to allocate the calculation space required by the neural network calculation graph and to initiate a computing node initialization request to the driver layer 120; the request includes the calculation space.
  • The calculation space required by the neural network calculation graph is allocated by the memory allocation module 113.
  • After the memory allocation module 113 allocates the calculation space, it initiates a computing node initialization request to the driver layer 120, so as to provide a suitable environment for the actual calculations of the computing nodes.
  • The calculation graph running module 114 is configured to obtain the input data of the preset network model, import the input data into the neural network calculation graph, and initiate a neural network calculation graph running request to the driver layer 120.
  • the device initialization module 121 is configured to initiate a device initialization request to the hardware layer 130.
  • the device initialization module 121 initiates a device initialization request to the hardware layer 130, so that the hardware layer 130 performs I/O interface initialization.
  • the data transmission module 123 is configured to transmit the node data of multiple computing nodes in the neural network calculation graph to the hardware layer 130 through the data transmission channel between the driver layer 120 and the hardware layer 130 according to the neural network calculation graph running request.
  • the computing node initialization module 122 is configured to perform computing node initialization according to the computing node initialization request.
  • After the computing node initialization module 122 receives the computing node initialization request sent by the memory allocation module 113, it initializes the computing nodes of the neural network calculation graph.
  • the I/O initialization module 131 is configured to complete the initialization of the I/O interface corresponding to the device initialization request according to the device initialization request, and establish the data transmission pipeline.
  • After the I/O initialization module 131 receives the device initialization request sent by the device initialization module 121, it completes the initialization of the I/O interface corresponding to the request, thereby establishing the data transmission pipeline between the driver layer 120 and the hardware layer 130.
  • the node calculation module 132 is configured to sequentially obtain node data of multiple computing nodes through the data transmission pipeline and perform calculations based on the node data.
  • The neural network operating system provided in the second embodiment of the present application completes the initialization of the neural network operating environment through the cooperation of the multiple modules of the software layer, the driver layer, and the hardware layer, and provides a suitable operating environment for the operation of the neural network.
  • FIG. 3 is a schematic structural diagram of another neural network operating system provided in the third embodiment of this application. This embodiment is described on the basis of the above-mentioned embodiment.
  • a neural network operating system provided in the third embodiment of the present application includes: a software layer 110, a driver layer 120, and a hardware layer 130.
  • the software layer 110 includes: a calculation graph construction module 111, a calculation graph initialization module 112, a memory allocation module 113, a calculation graph operation module 114, a data output module 115, and a calculation graph operation management module 116.
  • The driver layer 120 includes a device initialization module 121, a data transmission module 123, a computing node initialization module 122, a register configuration module 124, and a data writing module 125.
  • The hardware layer 130 includes an I/O initialization module 131 and a node calculation module 132. The data transmission module 123 includes a data reading sub-module 1231 and a data transmission sub-module 1232; the node calculation module 132 includes a data acquisition sub-module 1321, an on-chip storage sub-module 1322, and a hardware node calculation sub-module 1323.
  • the calculation graph construction module 111 is configured to construct a neural network calculation graph for the data flow calculation architecture according to the preset network model.
  • A neural network is a complex network system formed by a large number of simple processing units that are widely interconnected. These simple processing units are also called operators; that is, a neural network model consists of a large number of interconnected operators. In actual calculations, the one or more layers in a neural network that complete one function are usually called a computing node.
  • The network model data corresponding to the neural network model is the data of each computing node in the neural network model.
  • The neural network calculation graph is a form of expression used when the neural network performs actual calculations; it includes the multiple computing nodes of the neural network model and the connection relationships between them.
  • A computing node and an operator can be the same size or different sizes, and the size relationship between them differs between neural network models.
  • For example, suppose the operators included in a neural network model are of four types: A1, A2, A3, and A4.
  • The computing nodes of the neural network calculation graph can then be a first computing node A1+A2 and a second computing node A3+A4.
  • The connection relationship between the nodes can be to run the first computing node A1+A2 and the second computing node A3+A4 first, and then to run their sum: A1+A2+A3+A4.
  • Constructing a neural network calculation graph for the data flow computing architecture according to the preset network model means constructing the computing nodes of the neural network model and the connection relationships between them into the computing nodes of the data-flow-based calculation graph and the connection relationships between those computing nodes.
  • The calculation graph initialization module 112 is configured to import the network model data corresponding to the preset network model into the neural network calculation graph and to pass the calculation graph to the driver layer 120.
  • Initializing the neural network calculation graph means importing network model data into each of its computing nodes, so that each computing node contains actual data and can perform actual calculations.
  • The calculation graph initialization module 112 transmits the initialized neural network calculation graph to the driver layer 120.
  • The memory allocation module 113 is configured to allocate the calculation space required by the neural network calculation graph and to initiate a computing node initialization request to the driver layer 120.
  • The calculation space required by the neural network calculation graph is allocated by the memory allocation module 113.
  • After the memory allocation module 113 allocates the calculation space, it initiates a computing node initialization request to the driver layer 120, so as to provide a suitable environment for the actual calculations of the computing nodes.
  • the device initialization module 121 is configured to initiate a device initialization request to the hardware layer 130.
  • the device initialization module 121 initiates a device initialization request to the hardware layer 130, so that the hardware layer 130 performs I/O interface initialization.
  • the computing node initialization module 122 is configured to perform computing node initialization according to the computing node initialization request.
  • After the computing node initialization module 122 receives the computing node initialization request sent by the memory allocation module 113, it initializes the computing nodes of the neural network calculation graph.
  • the I/O initialization module 131 is configured to complete the initialization of the I/O interface corresponding to the device initialization request according to the device initialization request, and establish the data transmission pipeline.
  • After the I/O initialization module 131 receives the device initialization request sent by the device initialization module 121, it completes the initialization of the I/O interface corresponding to the request, thereby establishing the data transmission pipeline between the driver layer 120 and the hardware layer 130.
  • The neural network operation system completes the initialization process through the interaction between the calculation graph construction module 111, the calculation graph initialization module 112, the memory allocation module 113, the device initialization module 121, the computing node initialization module 122, and the I/O initialization module 131, thereby establishing a good operating environment for the actual operation of the neural network.
  • The calculation graph running module 114 is configured to obtain the input data of the preset neural network model, import the input data into the neural network calculation graph, and initiate a neural network calculation graph running request to the driver layer 120.
  • When the user uses the neural network to perform actual calculations, the data that needs to be calculated is input to the neural network. The calculation graph running module 114 obtains the user's input data, imports it into the neural network calculation graph, and then initiates a neural network calculation graph running request to the driver layer 120, so that the calculation graph is computed based on the user's input data.
  • the data reading submodule 1231 is configured to read node data of multiple computing nodes in the neural network calculation graph according to the neural network calculation graph running request.
  • the data transmission sub-module 1232 is configured to transmit the node data of multiple computing nodes in the neural network calculation graph to the hardware layer 130 through the data transmission channel.
  • the data reading submodule 1231 reads the node data of multiple computing nodes in the neural network calculation graph, and the data transmission submodule 1232 transmits the node data of multiple computing nodes to the hardware layer 130 through the data transmission pipeline. This allows the hardware layer 130 to perform actual operations on the computing nodes.
  • the calculation graph operation management module 116 is configured to manage the time sequence and required calculation space when the neural network calculation graph is run.
  • the register configuration module 124 is configured to control the hardware layer 130 to establish hardware nodes corresponding to multiple computing nodes in the neural network calculation graph on the data flow engine.
  • The register configuration module 124 controls the hardware layer 130, which is built on the data flow architecture, to construct the hardware nodes corresponding to the computing nodes of the neural network calculation graph, so that the calculation graph is operated on the data flow engine.
  • the data acquisition submodule 1321 is configured to sequentially acquire node data of multiple computing nodes through the data transmission pipeline.
  • the on-chip storage submodule 1322 is configured to store the data transmitted by the data acquisition submodule 1321 through the data transmission pipeline and the output data calculated by the hardware node calculation submodule 1323.
  • The on-chip storage sub-module 1322 is configured to store data: both the node data of the multiple nodes calculated by the hardware layer 130 and the calculated output data are stored in the on-chip storage sub-module 1322.
  • The hardware node calculation sub-module 1323 is configured to import the node data in the on-chip storage sub-module 1322 into the hardware node, complete the calculation of the hardware node on the data flow engine, obtain the output data, and store the output data in the on-chip storage sub-module 1322.
  • The hardware node calculation sub-module 1323 calls the node data from the on-chip storage sub-module 1322 and imports it into the corresponding hardware node to complete the calculation of the hardware node on the data flow engine. When all the hardware nodes on the data flow engine have completed their calculations, the output data that is finally returned to the user is obtained (a minimal sketch of this flow appears at the end of this section).
  • The hardware node calculation sub-module 1323 stores the output data in the on-chip storage sub-module 1322 for transmission through the driver layer 120 to the software layer 110.
  • the data writing module 125 is configured to transmit the output data to the software layer 110 through the data transmission pipeline.
  • The hardware node calculation sub-module 1323 stores the output data in the on-chip storage sub-module 1322, and the data writing module 125 retrieves the output data from the on-chip storage sub-module 1322 and transmits it through the data transmission pipeline to the software layer 110.
  • the data output module 115 is configured to output the output data to the user.
  • The data writing module 125 of the driver layer 120 transmits the output data to the data output module 115 of the software layer 110, and the data output module 115 outputs the output data to a data storage terminal or host computer, so that the user can obtain, through the storage terminal or host computer, the output data produced by the neural network from the input data.
  • The actual running of the neural network calculation graph is completed through the interaction between the calculation graph running module 114, the data output module 115, the calculation graph operation management module 116, the data transmission module 123, the register configuration module 124, the data writing module 125, and the node calculation module 132.
  • In the actual calculation of the neural network on the data flow engine, the user only needs to input a set of input data to the software layer 110 and, after calculation by the hardware layer 130, obtains the corresponding set of output data.
  • The neural network operating system provided by the third embodiment of the application obtains the user's input data and presents output data to the user through the software layer, completes the data transmission between the software layer and the hardware layer through the driver layer, and completes the calculation of the neural network on the data flow engine in the hardware layer, thereby realizing the actual operation of a data flow architecture device.
  • The operation of the neural network is divided into three parts by the software layer, the driver layer and the hardware layer; the user only needs to operate at the software layer and is isolated from the hardware layer, which facilitates the application of data flow devices.
  • FIG. 4 is a schematic flowchart of a neural network operation method provided in the fourth embodiment of the application, which is applicable to the operation of a neural network based on a data flow computing architecture.
  • This method can be implemented by the neural network operating system provided by any embodiment of this application, and has the beneficial effects of the corresponding functional modules of that system.
  • For content not described in the fourth embodiment of this application, please refer to the description in any system embodiment of this application.
  • a neural network operation method provided in the fourth embodiment of the present application includes:
  • the software layer constructs a neural network calculation graph for the data flow calculation architecture according to the preset network model and the network model data corresponding to the preset network model, and allocates a calculation space corresponding to the neural network calculation graph.
  • the preset network model is a neural network model that needs to be calculated under the data flow architecture.
  • The one or more layers in a neural network that complete one function are usually called a computing node.
  • The neural network model is composed of multiple computing nodes connected according to a specific connection relationship.
  • The network model data corresponding to the neural network model is the data of each computing node in the neural network model.
  • the neural network calculation graph is a form of expression when the neural network performs actual operations under the data flow computing architecture, including each computing node of the neural network model and the connection relationship between the computing nodes.
  • a neural network calculation graph for the data flow calculation architecture can be constructed, and then the network model data corresponding to the neural network model can be imported into each node of the neural network calculation graph, and the neural network calculation graph can perform actual calculations.
  • the calculation of the neural network calculation graph inevitably requires a certain amount of calculation space. Therefore, the calculation space required by the neural network calculation graph needs to be allocated.
  • the driver layer initializes computing nodes according to the computing space and transmits node data of multiple computing nodes in the neural network calculation graph to the hardware layer through a data transmission channel between the driver layer and the hardware layer.
  • the data transmission pipeline is a transmission channel between the node data of the neural network calculation graph and the actual calculated hardware node, and the data transmission of the neural network calculation graph is realized through the data transmission pipeline.
  • Initializing the computing node is to provide a suitable operating environment for the actual operation of the computing node, so that the computing node of the neural network calculation graph can perform the actual operation.
  • the hardware layer sequentially obtains the node data of the multiple computing nodes through the data transmission pipeline and performs calculations based on the node data.
  • When the user uses the neural network to perform actual calculations, the data that needs to be calculated is input to the neural network. After the user's input data is imported into the neural network calculation graph, each computing node of the calculation graph generates corresponding node data; the node data of each computing node is transmitted through the data transmission pipeline to the corresponding hardware node on the data flow engine for actual calculation. When all the hardware nodes have completed their calculations, the output data for the user is obtained.
  • In the fourth embodiment of the present application, a neural network calculation graph for the data flow computing architecture is constructed according to a preset network model and the network model data corresponding to the preset network model, and the calculation space corresponding to the calculation graph is allocated; computing nodes are initialized according to the calculation space, and the node data of multiple computing nodes in the calculation graph is transmitted to the hardware layer through the data transmission channel between the driver layer and the hardware layer; the hardware layer sequentially obtains the node data of the multiple computing nodes through the transmission pipeline and performs calculations based on the node data.
  • The fourth embodiment of the present application thereby makes full use of the characteristics of the data flow architecture to support the actual operation of a data flow architecture device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Advance Control (AREA)

Abstract

Disclosed herein are a neural network operating system and method. The system comprises: a software layer that is configured to, according to a preset network model and network model data corresponding to the preset network model, construct a neural network computational graph for a data stream computing architecture and allocate a computing space corresponding to the neural network computational graph; a driver layer that is connected to the software layer and is configured to carry out computing node initialization according to the computing space and use a data transmission channel between the driver layer and a hardware layer to transmit to the hardware layer node data of multiple computing nodes in the neural network computational graph; and the hardware layer, which is connected to the driver layer and is configured to use the data transmission channel to successively obtain the node data of the multiple computing nodes and perform computing according to the node data.

Description

Neural network operation system and method
Technical Field
The embodiments of the present application relate to the field of neural networks, for example, to a neural network operation system and method.
Background
With the gradual maturing of deep learning technology, neural networks are finding more and more industrial applications, including security, industrial monitoring, and autonomous driving.
A neural network is composed of multiple repetitive calculation layers (also called operators), and its calculation is characterized by high parallelism and a high computational load. Graphics Processing Unit (GPU) devices contain a large number of small computing cores, which on the one hand matches the needs of neural network applications; on the other hand, the early neural network development frameworks were built on GPUs, so most neural networks are deployed on GPUs. However, GPUs were originally designed for image rendering and similar applications, not dedicated neural network computation; GPU architecture efficiency and resource utilization are low, usually below 30%. The low architecture efficiency of the GPU has therefore gradually become a bottleneck in the development of neural network technology.
The computing efficiency of the data flow architecture can reach more than 90%; compared with instruction-set architectures such as the GPU, it makes fuller use of computing resources and is better suited to deploying neural network algorithms. However, although the data flow architecture is technologically advanced, its development time has been limited and it is still in the early stage of application, so there are still great unknowns and uncertainties in how applications run on it.
Summary
The embodiments of the present application provide a neural network operation system and method to distinguish the operation form of a neural network running on a data flow architecture and to reduce the threshold for using data flow devices.
An embodiment of the application provides a neural network operation system, including:
a software layer, configured to construct a neural network calculation graph for a data flow computing architecture according to a preset network model and the network model data corresponding to the preset network model, and to allocate the calculation space corresponding to the neural network calculation graph;
a driver layer, connected to the software layer and configured to initialize computing nodes according to the calculation space and to transmit the node data of multiple computing nodes in the neural network calculation graph to the hardware layer through a data transmission channel between the driver layer and the hardware layer; and
a hardware layer, connected to the driver layer and configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline and to perform calculations based on the node data.
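Purely as an illustration of the division of labor in the claim above, the three layers can be sketched as minimal Python interfaces. All class and method names here are hypothetical, chosen for this sketch rather than taken from the patent.

```python
from abc import ABC, abstractmethod

class SoftwareLayer(ABC):
    """Builds the calculation graph from a preset model and allocates its space."""
    @abstractmethod
    def build_graph(self, model, model_data): ...
    @abstractmethod
    def allocate_space(self, graph): ...

class DriverLayer(ABC):
    """Initializes computing nodes and ships node data to the hardware layer."""
    @abstractmethod
    def init_nodes(self, space): ...
    @abstractmethod
    def transmit(self, node_data): ...

class HardwareLayer(ABC):
    """Obtains node data from the transmission pipeline and computes it."""
    @abstractmethod
    def compute(self, node_data): ...
```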
An embodiment of the present application provides a neural network operation method, including:
the software layer constructs a neural network calculation graph for the data flow computing architecture according to a preset network model and the network model data corresponding to the preset network model, and allocates the calculation space corresponding to the neural network calculation graph;
the driver layer initializes computing nodes according to the calculation space and transmits the node data of multiple computing nodes in the neural network calculation graph to the hardware layer through the data transmission channel between the driver layer and the hardware layer; and
the hardware layer sequentially obtains the node data of the multiple computing nodes through the data transmission pipeline and performs calculations based on the node data.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a neural network operation system provided in Embodiment 1 of the application;
FIG. 2 is a schematic structural diagram of another neural network operation system provided in Embodiment 2 of the application;
FIG. 3 is a schematic structural diagram of another neural network operation system provided in Embodiment 3 of the application;
FIG. 4 is a schematic flowchart of a neural network operation method provided in Embodiment 4 of the application.
Reference numerals: 110 - software layer; 120 - driver layer; 130 - hardware layer; 111 - calculation graph construction module; 112 - calculation graph initialization module; 113 - memory allocation module; 114 - calculation graph running module; 115 - data output module; 116 - calculation graph operation management module; 121 - device initialization module; 122 - computing node initialization module; 123 - data transmission module; 124 - register configuration module; 125 - data writing module; 1231 - data reading sub-module; 1232 - data transmission sub-module; 131 - input/output (I/O) initialization module; 132 - node calculation module; 1321 - data acquisition sub-module; 1322 - on-chip storage sub-module; 1323 - hardware node calculation sub-module.
Detailed Description
The application is described below with reference to the drawings and embodiments. The specific embodiments described herein are only used to explain the application, not to limit it. For ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Some exemplary embodiments are described as processes or methods depicted in flowcharts. Although a flowchart describes steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. Processing may be terminated when its steps are completed, but there may also be additional steps not shown in the drawing. Processing can correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
The terms “first”, “second”, etc. may be used herein to describe various directions, actions, steps or elements, but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step or element from another direction, action, step or element. For example, without departing from the scope of the present application, the first computing node may be referred to as the second computing node and, similarly, the second computing node may be referred to as the first computing node. Both the first computing node and the second computing node are computing nodes, but they are not the same computing node. The terms “first”, “second”, etc. should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated; a feature defined with “first” or “second” may thus explicitly or implicitly include one or more of that feature. In the description of this application, “multiple” means at least two, for example two or three, unless otherwise defined.
Embodiment 1
FIG. 1 is a schematic structural diagram of a neural network operation system provided in Embodiment 1 of the application, which is applicable to the operation of a neural network based on a data flow computing architecture. As shown in FIG. 1, the neural network operation system provided by Embodiment 1 of the present application includes a software layer 110, a driver layer 120 and a hardware layer 130.
The software layer 110 is configured to construct a neural network calculation graph for the data flow computing architecture according to a preset network model and the network model data corresponding to the preset network model, and to allocate the calculation space corresponding to the neural network calculation graph.
In this embodiment, the preset network model is a neural network model that needs to be computed under the data flow architecture. The neural network calculation graph is a form of expression used when the neural network performs actual calculations under the data flow computing architecture; it includes multiple computing nodes and the connection relationships between them, and a computing node in the calculation graph can correspond to one or more layers of the neural network. The network model data corresponding to the neural network model is the data of each computing node in the neural network model. According to the preset network model, a neural network calculation graph for the data flow computing architecture can be constructed; the network model data corresponding to the neural network model is then imported into each node of the calculation graph, after which the calculation graph can perform actual calculations.
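As a concrete, purely illustrative picture of such a calculation graph, the sketch below stores computing nodes and their connection relationships and imports per-node model data. The names `ComputeNode`, `CalcGraph`, and `import_model_data` are assumptions made for this sketch, not the patent's data structures.

```python
class ComputeNode:
    def __init__(self, name, layers):
        self.name = name        # node identifier
        self.layers = layers    # the one or more network layers this node covers
        self.data = None        # node data, filled in when the graph is initialized

class CalcGraph:
    def __init__(self):
        self.nodes = {}         # name -> ComputeNode
        self.edges = []         # (producer, consumer) connection relationships

    def add_node(self, node):
        self.nodes[node.name] = node

    def connect(self, src, dst):
        self.edges.append((src, dst))

    def import_model_data(self, model_data):
        # Import the network model data into every computing node of the graph.
        for name, data in model_data.items():
            self.nodes[name].data = data

# Usage: a two-node graph whose nodes each cover two layers.
g = CalcGraph()
g.add_node(ComputeNode("n1", ["conv", "relu"]))
g.add_node(ComputeNode("n2", ["pool", "fc"]))
g.connect("n1", "n2")
g.import_model_data({"n1": b"weights-1", "n2": b"weights-2"})
```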
In one embodiment, since the calculation of the neural network calculation graph inevitably requires a certain amount of calculation space, the software layer 110 also allocates the calculation space required by the calculation graph.
The driver layer 120 is connected to the software layer 110. The driver layer 120 is configured to initialize the computing nodes according to the calculation space and to transmit the node data of multiple computing nodes in the neural network calculation graph to the hardware layer 130 through the data transmission channel between the driver layer 120 and the hardware layer 130.
In this embodiment, after the software layer 110 constructs the neural network calculation graph, it passes the calculation graph to the driver layer 120. The driver layer 120 initiates a device initialization request to the hardware layer 130, and the hardware layer 130 initializes the corresponding I/O interface according to the request, thereby constructing a data transmission pipeline between the driver layer 120 and the hardware layer 130. The software layer 110 controls the data transmission in this pipeline and thereby controls the hardware layer 130 to perform calculations.
In one embodiment, after the software layer 110 allocates the calculation space required by the neural network calculation graph, the driver layer 120 also initializes the computing nodes of the calculation graph, so that they can perform actual operations.
The hardware layer 130 is connected to the driver layer 120 and is configured to sequentially obtain the node data of the multiple computing nodes in the neural network calculation graph through the data transmission pipeline and to perform calculations based on the node data.
In this embodiment, when the user uses the neural network to perform actual calculations, the data that needs to be calculated is input to the neural network. The software layer 110 obtains the input data, imports it into the neural network calculation graph, and then initiates a neural network calculation graph running request to the driver layer 120, so that the calculation graph performs calculations based on the user's input data. The driver layer 120 traverses each computing node of the calculation graph to obtain node data and transmits the node data to the hardware layer 130 through the data transmission pipeline. After the hardware layer 130 obtains the node data, it completes the calculation of the computing node on the data flow engine, obtains the output data corresponding to the input data, and transmits the output data back to the driver layer 120 through the pipeline. After the calculation of the entire graph is completed, the final output data is transmitted from the driver layer 120 to the software layer 110, where the user can obtain the output data corresponding to the input data.
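The run flow just described (the software layer imports the input, the driver layer traverses the nodes and feeds the pipeline, the hardware layer computes each node on the data flow engine) can be condensed into the following toy sketch. All names are assumed, and a plain queue stands in for the data transmission pipeline.

```python
from collections import deque

class ToyEngine:
    """Stand-in for the hardware data flow engine."""
    def compute(self, node_data, feed):
        return node_data["fn"](feed)   # apply the node's operation to the data

def run_graph(node_list, input_data, engine):
    """Driver-layer traversal: ship each node's data down the pipeline,
    let the engine compute it, and return the final output to the caller."""
    pipeline = deque()
    feed = input_data                      # input imported by the software layer
    for node_data in node_list:            # traverse every computing node
        pipeline.append(node_data)         # transmit node data to the hardware
        feed = engine.compute(pipeline.popleft(), feed)   # hardware computes
    return feed                            # final output flows back to the user

# Usage: two chained nodes (scale, then offset).
nodes = [{"fn": lambda x: 2 * x}, {"fn": lambda x: x + 1}]
print(run_graph(nodes, 3, ToyEngine()))    # -> 7
```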
The neural network operation system provided by Embodiment 1 of the application divides the operating form of the neural network across the software layer, the driver layer and the hardware layer. The three layers have different functions and interact pairwise during the actual operation of the neural network. When using the neural network for actual calculations, the user only needs to input data or obtain output data through the software layer; the actual calculations are controlled by the driver layer and completed at the hardware layer. The user is thus isolated from the underlying hardware computation, which lowers the threshold for using data flow devices and is conducive to their wide application.
Embodiment 2
FIG. 2 is a schematic structural diagram of another neural network operation system provided in Embodiment 2 of the application; this embodiment is described on the basis of the above-mentioned embodiment. As shown in FIG. 2, the neural network operation system provided in Embodiment 2 of the present application includes a software layer 110, a driver layer 120 and a hardware layer 130. In this embodiment, the software layer 110 includes a calculation graph construction module 111, a calculation graph initialization module 112, a memory allocation module 113 and a calculation graph running module 114; the driver layer 120 includes a device initialization module 121, a data transmission module 123 and a computing node initialization module 122; the hardware layer 130 includes an I/O initialization module 131 and a node calculation module 132.
The calculation graph construction module 111 is configured to construct a neural network calculation graph for the data flow computing architecture according to the preset network model.
In this embodiment, a neural network is a complex network system formed by a large number of simple processing units that are widely interconnected. These simple processing units are also called operators; that is, a neural network model consists of a large number of interconnected operators. In actual calculations, the one or more layers in a neural network that complete one function are usually called a computing node. The network model data corresponding to the neural network model is the data of the multiple computing nodes in the neural network model, and the neural network calculation graph is a form of expression used when the neural network performs actual calculations, including the multiple computing nodes of the neural network model and the connection relationships between them. A computing node and an operator can be the same size or different sizes, and the size relationship between them differs between neural network models. For example, suppose the operators included in a neural network model are of four types: A1, A2, A3 and A4. The computing nodes of the neural network calculation graph can then be a first computing node A1+A2 and a second computing node A3+A4, and the connection relationship between the nodes can be to run the first computing node A1+A2 and the second computing node A3+A4 first, and then to run their sum: A1+A2+A3+A4. Constructing a neural network calculation graph for the data flow computing architecture according to the preset network model means constructing the operators of the neural network model and the connection relationships between the operators into the computing nodes of the data-flow-based calculation graph and the connection relationships between those computing nodes.
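To make the A1-A4 example concrete, the sketch below groups consecutive operators into computing nodes. The `fuse` helper and the fixed group size are assumptions made for this illustration only.

```python
operators = ["A1", "A2", "A3", "A4"]   # operator types of the preset model

def fuse(ops, group_size=2):
    """Group consecutive operators into computing nodes, e.g.
    ["A1", "A2", "A3", "A4"] -> [("A1", "A2"), ("A3", "A4")]."""
    return [tuple(ops[i:i + group_size]) for i in range(0, len(ops), group_size)]

nodes = fuse(operators)                # first node A1+A2, second node A3+A4
# Connection relationship: run the two nodes first, then combine their results.
schedule = [nodes[0], nodes[1], nodes[0] + nodes[1]]
print(nodes)                           # [('A1', 'A2'), ('A3', 'A4')]
print(schedule[-1])                    # ('A1', 'A2', 'A3', 'A4')
```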
The calculation graph initialization module 112 is configured to import the network model data corresponding to the preset network model into the neural network calculation graph and to pass the neural network calculation graph to the driver layer 120.
In this embodiment, initializing the neural network calculation graph means importing the network model data into each computing node of the neural network calculation graph, so that each computing node of the calculation graph contains actual data and can perform actual computation. The calculation graph initialization module 112 passes the initialized neural network calculation graph to the driver layer 120.
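A minimal sketch of this initialization step follows; the dictionary layouts of the graph and of the network model data are assumptions of the sketch, since real model data is format-specific.

```python
def initialize_graph(graph, model_data):
    # Import the network model data into every computing node so that each
    # node holds actual data and can perform actual computation; only an
    # initialized graph is passed on to the driver layer.
    for name, node in graph.items():
        node["data"] = model_data[name]  # per-node weights/parameters
    return graph
```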
The memory allocation module 113 is configured to allocate the computing space required by the neural network calculation graph and to initiate a computing node initialization request to the driver layer 120; the computing node initialization request includes the computing space.
In this embodiment, the computing space required by the neural network calculation graph is allocated by the memory allocation module 113. After allocating the computing space, the memory allocation module 113 initiates a computing node initialization request to the driver layer 120, so as to provide a suitable environment for the actual computation of the computing nodes.
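By way of example and not limitation, the allocation step can be sketched as follows, assuming per-node output sizes are known; the driver-layer entry point init_computing_nodes and the node fields are hypothetical names for this sketch.

```python
def allocate_and_request_init(graph, driver):
    # Allocate the computing space each computing node needs, then initiate
    # the computing node initialization request toward the driver layer;
    # as described above, the request carries the allocated space.
    space = {}
    for name, node in graph.items():
        space[name] = node["output_elems"] * node["elem_size"]  # bytes
    driver.init_computing_nodes(space)
    return space
```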
The calculation graph running module 114 is configured to obtain the input data of the preset network model, import the input data into the neural network calculation graph, and initiate a calculation graph running request to the driver layer 120.
The device initialization module 121 is configured to initiate a device initialization request to the hardware layer 130.
In this embodiment, the device initialization module 121 initiates a device initialization request to the hardware layer 130, so that the hardware layer 130 performs I/O interface initialization.
The data transmission module 123 is configured to transmit the node data of the multiple computing nodes in the neural network calculation graph to the hardware layer 130, according to the calculation graph running request, through the data transmission channel between the driver layer 120 and the hardware layer 130.
The computing node initialization module 122 is configured to perform computing node initialization according to the computing node initialization request.
In this embodiment, after receiving the computing node initialization request sent by the memory allocation module 113, the computing node initialization module 122 initializes the computing nodes of the neural network calculation graph.
The I/O initialization module 131 is configured to complete, according to the device initialization request, the initialization of the I/O interface corresponding to the request, and to establish the data transmission pipeline.
In this embodiment, after receiving the device initialization request sent by the device initialization module 121, the I/O initialization module 131 completes the initialization of the I/O interface corresponding to the request, thereby establishing the data transmission pipeline between the driver layer 120 and the hardware layer 130.
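For illustration, the pipeline established here can be modelled in software with an in-process queue; a real implementation would configure DMA or PCIe endpoints rather than a queue, and the request layout is a placeholder.

```python
from queue import Queue


class DataPipeline:
    """Software stand-in for the driver-layer/hardware-layer pipeline."""

    def __init__(self):
        self._queue = Queue()

    def send(self, item):
        self._queue.put(item)

    def receive(self):
        return self._queue.get()


def handle_device_init(request):
    # Complete the initialization of the I/O interface named by the device
    # initialization request, then return the established pipeline.
    # The {"io_id": ...} layout is a hypothetical placeholder.
    _ = request.get("io_id")
    return DataPipeline()
```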
The node calculation module 132 is configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline and to perform computation according to the node data.
The neural network operating system provided in Embodiment 2 of the present application completes the initialization of the neural network operating environment through the cooperation of the multiple modules of the software layer, the driver layer, and the hardware layer, providing a suitable operating environment for running the neural network.
Embodiment 3
FIG. 3 is a schematic structural diagram of another neural network operating system provided in Embodiment 3 of the present application; this embodiment is described on the basis of the embodiments above. As shown in FIG. 3, the neural network operating system provided in Embodiment 3 of the present application includes a software layer 110, a driver layer 120, and a hardware layer 130. In this embodiment, the software layer 110 includes a calculation graph construction module 111, a calculation graph initialization module 112, a memory allocation module 113, a calculation graph running module 114, a data output module 115, and a calculation graph running management module 116; the driver layer 120 includes a device initialization module 121, a data transmission module 123, a computing node initialization module 122, a register configuration module 124, and a data write-out module 125; the hardware layer 130 includes an I/O initialization module 131 and a node calculation module 132. The data transmission module 123 includes a data read-in submodule 1231 and a data transmission submodule 1232; the node calculation module 132 includes a data acquisition submodule 1321, an on-chip storage submodule 1322, and a hardware node calculation submodule 1323.
The calculation graph construction module 111 is configured to construct a neural network calculation graph for the data flow computing architecture according to the preset network model.
In this embodiment, a neural network is a complex network system formed by a large number of simple processing units that are widely interconnected; these simple processing units are also called operators, i.e., the neural network model is formed by a large number of interconnected operators. In actual computation, one or more layers of a neural network that together complete a function are usually called a computing node, and the network model data corresponding to the neural network model is the data of each computing node in the neural network model. The neural network calculation graph is a form in which the neural network is expressed for actual computation, and includes the multiple computing nodes of the neural network model and the connection relationships between those computing nodes. A computing node and an operator may be of the same size or of different sizes, and the size relationship between computing nodes and operators differs between neural network models. For example, if the neural network model contains four types of operators, A1, A2, A3, and A4, the computing nodes of the neural network calculation graph may be a first computing node A1+A2 and a second computing node A3+A4, and the connection relationship between the nodes may be to first run the first computing node A1+A2 and the second computing node A3+A4, and then run their sum: A1+A2+A3+A4. Constructing a neural network calculation graph for the data flow computing architecture according to the preset network model means mapping the computing nodes of the neural network model and the connection relationships between the computing nodes onto the computing nodes of the data-flow-based neural network calculation graph and the connection relationships between those computing nodes.
The calculation graph initialization module 112 is configured to import the network model data corresponding to the preset network model into the neural network calculation graph and to pass the neural network calculation graph to the driver layer 120.
In this embodiment, initializing the neural network calculation graph means importing the network model data into each computing node of the neural network calculation graph, so that each computing node of the calculation graph contains actual data and can perform actual computation. The calculation graph initialization module 112 passes the initialized neural network calculation graph to the driver layer 120.
The memory allocation module 113 is configured to allocate the computing space required by the neural network calculation graph and to initiate a computing node initialization request to the driver layer 120.
In this embodiment, the computing space required by the neural network calculation graph is allocated by the memory allocation module 113. After allocating the computing space, the memory allocation module 113 initiates a computing node initialization request to the driver layer 120, so as to provide a suitable environment for the actual computation of the computing nodes.
The device initialization module 121 is configured to initiate a device initialization request to the hardware layer 130.
In this embodiment, the device initialization module 121 initiates a device initialization request to the hardware layer 130, so that the hardware layer 130 performs I/O interface initialization.
The computing node initialization module 122 is configured to perform computing node initialization according to the computing node initialization request.
In this embodiment, after receiving the computing node initialization request sent by the memory allocation module 113, the computing node initialization module 122 initializes the computing nodes of the neural network calculation graph.
The I/O initialization module 131 is configured to complete, according to the device initialization request, the initialization of the I/O interface corresponding to the request, and to establish the data transmission pipeline.
In this embodiment, after receiving the device initialization request sent by the device initialization module 121, the I/O initialization module 131 completes the initialization of the I/O interface corresponding to the request, thereby establishing the data transmission pipeline between the driver layer 120 and the hardware layer 130.
In this embodiment, the interaction among the calculation graph construction module 111, the calculation graph initialization module 112, the memory allocation module 113, the device initialization module 121, the computing node initialization module 122, and the I/O initialization module 131 completes the initialization process of the neural network operating system, establishing a sound operating environment for the actual running of the neural network.
The calculation graph running module 114 is configured to obtain the input data of the preset neural network model, import the input data into the neural network calculation graph, and initiate a calculation graph running request to the driver layer 120.
In this embodiment, when a user employs the neural network for actual computation, the data to be computed is input to the neural network. The calculation graph running module 114 obtains the user's input data, imports it into the neural network calculation graph, and then initiates a running request for the calculation graph to the driver layer 120, so that the calculation graph computes on the user's input data.
The data read-in submodule 1231 is configured to read the node data of the multiple computing nodes in the neural network calculation graph according to the calculation graph running request.
The data transmission submodule 1232 is configured to transmit the node data of the multiple computing nodes in the neural network calculation graph to the hardware layer 130 through the data transmission channel.
In this embodiment, the data read-in submodule 1231 reads the node data of the multiple computing nodes in the neural network calculation graph, and the data transmission submodule 1232 transmits the node data of the multiple computing nodes to the hardware layer 130 through the data transmission pipeline, so that the hardware layer 130 performs the actual computation of the computing nodes.
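The cooperation of the two submodules can be sketched as follows; the node and pipeline layouts follow the earlier illustrative sketches and are not limiting.

```python
def transmit_node_data(graph_nodes, pipeline):
    # Data read-in submodule: read each computing node's node data from the
    # calculation graph; data transmission submodule: push it through the
    # pipeline so the hardware layer can compute the node.
    for node in graph_nodes:
        node_data = node["data"]
        pipeline.send((node["name"], node_data))
```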
The calculation graph running management module 116 is configured to manage the timing of the neural network calculation graph at run time and the computing space it requires.
The register configuration module 124 is configured to control the hardware layer 130 to establish, on the data flow engine, the hardware nodes corresponding to the multiple computing nodes in the neural network calculation graph.
In this embodiment, the register configuration module 124 controls the hardware nodes, corresponding to the computing nodes of the neural network calculation graph, that are built on the data-flow-architecture-based hardware layer 130, so that the neural network calculation graph is computed on the data flow engine.
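A purely illustrative sketch of this register-driven setup follows; the register addresses, type codes, and the write_register callback are placeholders and do not describe an actual register map of the data flow engine.

```python
NODE_TYPE_CODES = {"conv": 1, "pool": 2, "add": 3}  # placeholder codes


def configure_hardware_nodes(graph_nodes, write_register):
    # Program the data flow engine so that a hardware node exists for each
    # computing node of the calculation graph before its node data arrives.
    for index, node in enumerate(graph_nodes):
        code = NODE_TYPE_CODES.get(node["op"], 0)
        write_register(0x1000 + index * 0x10, code)
```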
The data acquisition submodule 1321 is configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline.
The on-chip storage submodule 1322 is configured to store the data transmitted by the data acquisition submodule 1321 through the data transmission pipeline and the output data computed by the hardware node calculation submodule 1323.
In this embodiment, the on-chip storage submodule 1322 is configured for data storage: both the node data of the multiple nodes computed by the hardware layer 130 and the computed output data are stored in the on-chip storage submodule 1322.
The hardware node calculation submodule 1323 is configured to import the node data in the on-chip storage submodule 1322 into the hardware nodes, complete the computation of the hardware nodes on the data flow engine to obtain the output data, and store the output data in the on-chip storage submodule 1322.
In this embodiment, when the hardware layer 130 computes the neural network, the hardware node calculation submodule 1323 retrieves node data from the on-chip storage submodule 1322 and imports it into the corresponding hardware node, so as to complete the computation of that hardware node on the data flow engine. When all hardware nodes on the data flow engine have completed their computation, output data to be finally delivered to the user is obtained; the hardware node calculation submodule 1323 stores this output data in the on-chip storage submodule 1322 so that it can be transmitted to the software layer 110 through the driver layer 120.
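By way of example and not limitation, the interplay of the three submodules can be sketched as below; the compute callback stands in for the data flow engine's per-node computation, and the on_chip dictionary models the on-chip storage.

```python
def run_hardware_nodes(pipeline, node_count, on_chip, compute):
    # Data acquisition submodule: fetch node data from the pipeline in order.
    # On-chip storage submodule: stage both inputs and results on chip.
    # Hardware node calculation submodule: run each hardware node in turn.
    output = None
    for _ in range(node_count):
        name, node_data = pipeline.receive()
        on_chip[name] = node_data
        output = compute(name, node_data)
        on_chip[name + "_out"] = output
    return output  # final output, kept on chip for the write-out module
```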
The data write-out module 125 is configured to transmit the output data to the software layer 110 through the data transmission pipeline.
In this embodiment, the hardware node calculation submodule 1323 stores the output data in the on-chip storage submodule 1322; the data write-out module 125 retrieves the output data from the on-chip storage submodule 1322 through the data transmission pipeline and transmits it to the software layer 110.
The data output module 115 is configured to output the output data to the user.
In this embodiment, the data write-out module 125 of the driver layer 120 transmits the output data to the data output module 115 of the software layer 110, and the data output module 115 outputs it to a data storage terminal or a host computer, so that the user can obtain, through the data storage terminal or host computer, the output data produced by the neural network from the input data.
In this embodiment, the interaction among the calculation graph running module 114, the data output module 115, the calculation graph running management module 116, the data transmission module 123, the register configuration module 124, the data write-out module 125, and the node calculation module 132 completes the actual computation of the neural network on the data flow engine: the user only needs to input a set of input data to the software layer 110 and, after computation by the hardware layer 130, obtains the corresponding set of output data.
The neural network operating system provided in Embodiment 3 of the present application obtains the user's input data and presents the output data to the user through the software layer, completes the data transmission between the software layer and the hardware layer through the driver layer, and completes the computation of the neural network on the data flow engine through the hardware layer, thereby realizing actual computation on a data flow architecture device. By dividing the running of the neural network into three parts across the software layer, the driver layer, and the hardware layer, the user only needs to operate at the software layer, isolated from the hardware layer, which facilitates the application of data flow devices.
Embodiment 4
FIG. 4 is a schematic flowchart of a neural network operating method provided in Embodiment 4 of the present application, which is applicable to running a neural network based on a data flow computing architecture. The method may be implemented by the neural network operating system provided in any embodiment of the present application and has the beneficial effects of the corresponding functional modules of that system; for content not described in Embodiment 4, reference may be made to the description in any system embodiment of the present application.
As shown in FIG. 4, the neural network operating method provided in Embodiment 4 of the present application includes the following steps.
S410: the software layer constructs a neural network calculation graph for the data flow computing architecture according to a preset network model and the network model data corresponding to the preset network model, and allocates the computing space corresponding to the neural network calculation graph.
In this embodiment, the preset network model is a neural network model that is to be computed under the data flow architecture. One or more layers of a neural network that together complete a function are usually called a computing node; the neural network model is composed of multiple computing nodes arranged in a specific connection relationship, and the network model data corresponding to the neural network model is the data of each computing node in the neural network model. The neural network calculation graph is a form in which the neural network is expressed for actual computation under the data flow computing architecture, and includes each computing node of the neural network model and the connection relationships between the computing nodes. A neural network calculation graph for the data flow computing architecture can be constructed according to the preset network model, and the network model data corresponding to the neural network model can then be imported into each node of the calculation graph, after which the calculation graph can perform actual computation.
In an embodiment, the computation of the neural network calculation graph necessarily requires a certain amount of computing space; therefore, the computing space required by the neural network calculation graph also needs to be allocated.
S420: the driver layer performs computing node initialization according to the computing space and transmits the node data of the multiple computing nodes in the neural network calculation graph to the hardware layer through the data transmission channel between the driver layer and the hardware layer.
In this embodiment, the data transmission pipeline is the transmission channel between the node data of the neural network calculation graph and the hardware nodes performing the actual computation; the data transmission of the neural network calculation graph is realized through this pipeline. Initializing the computing nodes provides a suitable running environment for their actual computation, so that the computing nodes of the neural network calculation graph can perform actual computation.
S430: the hardware layer sequentially obtains the node data of the multiple computing nodes through the data transmission pipeline and performs computation according to the node data.
In this embodiment, when a user employs the neural network for actual computation, the data to be computed is input to the neural network. After the user's input data is imported into the neural network calculation graph, each computing node of the calculation graph produces corresponding node data, and the node data of each computing node is transmitted through the data transmission pipeline to the corresponding hardware node on the data flow engine for actual computation. When all hardware nodes have finished computing, the output data for the user is obtained.
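The three steps S410 to S430 can be condensed into one illustrative end-to-end sketch; the graph layout, the compute callback, and the in-process list modelling the transmission channel are all assumptions of the sketch, not a fixed interface.

```python
def run_neural_network(graph, model_data, user_input, compute):
    # S410 (software layer): import model data and allocate computing space.
    for node in graph:
        node["data"] = model_data[node["name"]]
        node["buffer"] = bytearray(node["out_bytes"])
    # S420 (driver layer): stream every node's data through the channel.
    channel = [(node["name"], node["data"]) for node in graph]
    # S430 (hardware layer): fetch node data in order and compute, threading
    # the running result from node to node until the user's output is ready.
    output = user_input
    for name, node_data in channel:
        output = compute(name, node_data, output)
    return output
```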
In Embodiment 4 of the present application, a neural network calculation graph for the data flow computing architecture is constructed according to a preset network model and the network model data corresponding to the preset network model, and the computing space corresponding to the calculation graph is allocated; computing node initialization is performed according to the computing space, and the node data of the multiple computing nodes in the calculation graph is transmitted to the hardware layer through the data transmission channel between the driver layer and the hardware layer; the node data of the multiple computing nodes is obtained in sequence through the transmission pipeline, and computation is performed according to the node data. Embodiment 4 of the present application makes full use of the characteristics of the data flow architecture and supports the actual computation of data flow architecture devices.

Claims (10)

  1. A neural network operating system, comprising:
    a software layer, configured to construct a neural network calculation graph for a data flow computing architecture according to a preset network model and network model data corresponding to the preset network model, and to allocate computing space corresponding to the neural network calculation graph;
    a driver layer, connected to the software layer and configured to perform computing node initialization according to the computing space and to transmit node data of multiple computing nodes in the neural network calculation graph to a hardware layer through a data transmission channel between the driver layer and the hardware layer; and
    the hardware layer, connected to the driver layer and configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline and to perform computation according to the node data.
  2. The system according to claim 1, wherein the software layer comprises:
    a calculation graph construction module, configured to construct the neural network calculation graph for the data flow computing architecture according to the preset network model;
    a calculation graph initialization module, configured to import the network model data corresponding to the preset network model into the neural network calculation graph and to pass the neural network calculation graph to the driver layer; and
    a memory allocation module, configured to allocate the computing space of the neural network calculation graph and to initiate a computing node initialization request to the driver layer, wherein the computing node initialization request includes the computing space.
  3. The system according to claim 2, wherein the software layer further comprises: a calculation graph running module, configured to obtain input data of the preset neural network model, import the input data into the neural network calculation graph, and initiate a calculation graph running request to the driver layer.
  4. The system according to claim 3, wherein the driver layer comprises:
    a device initialization module, configured to initiate a device initialization request to the hardware layer;
    a data transmission module, configured to transmit the node data of the multiple computing nodes in the neural network calculation graph to the hardware layer, according to the calculation graph running request, through the data transmission channel between the driver layer and the hardware layer; and
    a computing node initialization module, configured to perform computing node initialization according to the computing node initialization request.
  5. The system according to claim 4, wherein the hardware layer comprises:
    an input/output (I/O) initialization module, configured to complete, according to the device initialization request, the initialization of the I/O interface corresponding to the device initialization request, so as to establish the data transmission pipeline; and
    a node calculation module, configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline and to perform computation according to the node data.
  6. The system according to claim 5, wherein the data transmission module comprises:
    a data read-in submodule, configured to read the node data of the multiple computing nodes in the neural network calculation graph according to the calculation graph running request; and
    a data transmission submodule, configured to transmit the node data of the multiple computing nodes in the neural network calculation graph to the hardware layer through the data transmission channel;
    wherein the system further comprises: a register configuration module, configured to control the hardware layer to establish, on a data flow engine, hardware nodes corresponding to the multiple computing nodes.
  7. The system according to claim 6, wherein the node calculation module comprises:
    a data acquisition submodule, configured to sequentially obtain the node data of the multiple computing nodes through the data transmission pipeline;
    an on-chip storage submodule, configured to store the node data obtained by the data acquisition submodule and the output data computed by the hardware node calculation submodule; and
    a hardware node calculation submodule, configured to import the node data in the on-chip storage submodule into the hardware nodes, complete the computation of the hardware nodes on the data flow engine to obtain the output data, and store the output data in the on-chip storage submodule.
  8. The system according to claim 7, wherein the driver layer further comprises:
    a data write-out module, configured to transmit the output data to the software layer through the data transmission pipeline.
  9. The system according to claim 8, wherein the software layer further comprises:
    a data output module, configured to output the output data.
  10. A neural network operating method, comprising:
    constructing, by a software layer, a neural network calculation graph for a data flow computing architecture according to a preset network model and network model data corresponding to the preset network model, and allocating computing space corresponding to the neural network calculation graph;
    performing, by a driver layer, computing node initialization according to the computing space, and transmitting node data of multiple computing nodes in the neural network calculation graph to a hardware layer through a data transmission channel between the driver layer and the hardware layer; and
    sequentially obtaining, by the hardware layer, the node data of the multiple computing nodes through the data transmission pipeline, and performing computation according to the node data.
PCT/CN2019/112466 2019-10-22 2019-10-22 Neural network operating system and method WO2021077284A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980100192.4A CN114365148A (en) 2019-10-22 2019-10-22 Neural network operation system and method
PCT/CN2019/112466 WO2021077284A1 (en) 2019-10-22 2019-10-22 Neural network operating system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/112466 WO2021077284A1 (en) 2019-10-22 2019-10-22 Neural network operating system and method

Publications (1)

Publication Number Publication Date
WO2021077284A1 true WO2021077284A1 (en) 2021-04-29

Family

ID=75619592

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112466 WO2021077284A1 (en) 2019-10-22 2019-10-22 Neural network operating system and method

Country Status (2)

Country Link
CN (1) CN114365148A (en)
WO (1) WO2021077284A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033391B (en) * 2022-08-10 2022-11-11 之江实验室 Data flow method and device for neural network calculation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140289445A1 (en) * 2013-03-22 2014-09-25 Antony Savich Hardware accelerator system and method
US20180114117A1 (en) * 2016-10-21 2018-04-26 International Business Machines Corporation Accelerate deep neural network in an fpga
CN108090560A (en) * 2018-01-05 2018-05-29 中国科学技术大学苏州研究院 The design method of LSTM recurrent neural network hardware accelerators based on FPGA
CN108154229A (en) * 2018-01-10 2018-06-12 西安电子科技大学 Accelerate the image processing method of convolutional neural networks frame based on FPGA
CN109643336A (en) * 2018-01-15 2019-04-16 深圳鲲云信息科技有限公司 Artificial intelligence process device designs a model method for building up, system, storage medium, terminal
CN109858610A (en) * 2019-01-08 2019-06-07 广东浪潮大数据研究有限公司 A kind of accelerated method of convolutional neural networks, device, equipment and storage medium
CN110096401A (en) * 2019-05-13 2019-08-06 苏州浪潮智能科技有限公司 A kind of server data process performance test method and device


Also Published As

Publication number Publication date
CN114365148A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
TWI803663B (en) A computing device and computing method
WO2020029592A1 (en) Conversion method, apparatus, computer device, and storage medium
WO2019127838A1 (en) Method and apparatus for realizing convolutional neural network, terminal, and storage medium
US11348004B2 (en) Method of managing data representation for deep learning, method of processing data for deep learning and deep learning system performing the same
US20160012350A1 (en) Interoperable machine learning platform
CN110506260A (en) It is read by minimizing memory using the blob data being aligned in the processing unit of neural network environment and improves performance
JP2022511716A (en) Decentralized deep learning
CN113469355B (en) Multi-model training pipeline in distributed system
US20230023303A1 (en) Machine learning network implemented by statically scheduled instructions
JP2021504837A (en) Fully connected / regression deep network compression through enhancing spatial locality to the weight matrix and providing frequency compression
WO2014204615A2 (en) Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence
CN114254733A (en) Neural network weight distribution using a tree-shaped Direct Memory Access (DMA) bus
US20230251979A1 (en) Data processing method and apparatus of ai chip and computer device
KR20210125559A (en) Methods and devices for step-assisted workflows
CN113592066A (en) Hardware acceleration method, apparatus, device, computer program product and storage medium
WO2023221406A1 (en) Method and apparatus for operating deep learning compiler, and electronic device
CN109711540B (en) Computing device and board card
WO2021077284A1 (en) Neural network operating system and method
WO2022012563A1 (en) Neural network data processing method, apparatus and device, and storage medium
US20160004803A1 (en) Simulation Sequence In Chemical Process Simulation For Chemical Process Flowsheet With Strongly Connected Components
CN116909748A (en) Computing power resource allocation method and device, electronic equipment and storage medium
WO2023071566A1 (en) Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
WO2022247880A1 (en) Method for fusing operators of neural network, and related product
US11709783B1 (en) Tensor data distribution using grid direct-memory access (DMA) controller
Jeong et al. WebRTC-based resource offloading in smart home environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19949891

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 27/09/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19949891

Country of ref document: EP

Kind code of ref document: A1