CN110046704A - Dataflow-based deep network acceleration method, apparatus, device and storage medium - Google Patents
- Publication number: CN110046704A
- Application number: CN201910280156.2A
- Authority: CN (China)
- Prior art keywords: data, network, target, data flow
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7814—Specially adapted for real time processing, e.g. comprising hardware timers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
This application discloses a dataflow-based deep network acceleration method, apparatus, device, and storage medium. The method comprises: obtaining the target deep network information required for data to be processed; matching, according to the target deep network information, a pre-set target network configuration rule corresponding to that information, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network; configuring a target dataflow network according to the target network configuration rule; and processing the data through the target dataflow network. Because the deep network is accelerated by dataflow, off-chip data communication is reduced and there is no instruction idle overhead, so the hardware acceleration efficiency of the deep network is improved; moreover, through network configuration, different deep network models can be configured, supporting a variety of different deep network models.
Description
Technical field
This application relates to the field of artificial intelligence, and more particularly, to a dataflow-based deep network acceleration method, apparatus, device, and storage medium.
Background

The progress of deep-learning applications based on neural networks calls for an underlying hardware platform with high throughput. As CPU-based platforms fail to satisfy this growing demand, many companies have developed dedicated hardware accelerators to support progress in the field. The common idea of existing hardware accelerators is to accelerate certain specific types of computation used frequently in deep-learning algorithms. Existing hardware architectures are based on instruction execution with an extensible instruction set, and acceleration is achieved by implementing common computations as custom instructions. Instruction-based architectures are typically realized as system-on-chip (SoC) designs.

In an instruction-based architecture, many clock cycles are wasted on operations unrelated to computation. To support a more general instruction architecture, a computation in a deep-learning neural network is often decomposed into multiple instructions, so one computation usually needs multiple clock cycles. The arithmetic logic unit (ALU) in a processor is usually a set of hardware-implemented operations. Owing to limited instruction expressiveness and limited I/O bandwidth, most ALU resources sit idle while a single instruction executes. For example, when performing a multiplication followed by an addition, the multiplication operands are read first; because I/O speed is bandwidth-limited, the addition must wait until the multiplication completes and its result is written to memory, after which the result and the addition operands are read back for the addition. During the multiplication and the read/write, the addition unit is idle. Instruction-based acceleration therefore suffers from low hardware efficiency.
Summary
In view of the above drawbacks of the prior art, the purpose of this application is to provide a dataflow-based deep network acceleration method, apparatus, device, and storage medium, solving the problem that, with limited instruction expressiveness and limited I/O bandwidth, most ALU resources sit idle while a single instruction executes, so that acceleration efficiency is low.

The purpose of this application is achieved through the following technical solutions:
In a first aspect, a dataflow-based deep network acceleration method is provided, the method comprising:

obtaining the target deep network information required for the data to be processed;

matching, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network;

configuring a target dataflow network according to the target network configuration rule;

processing the data to be processed through the target dataflow network.
Optionally, configuring the target dataflow network according to the target network configuration rule comprises:

configuring, according to the global dataflow network, parallel or serial connections among multiple computing engines;

obtaining the dataflow paths of the multiple computing engines according to the parallel or serial connections between the first dataflow memory module and the multiple computing engines;

forming the target dataflow network based on the dataflow paths.
Optionally, processing the data to be processed through the target dataflow network comprises:

reading the data to be processed into the first dataflow memory module;

in the first dataflow memory module, generating an address sequence for the data to be processed by a pre-set generation rule, according to the data format and data path of the data to be processed;

in each clock cycle, reading from the first dataflow memory module, according to the address sequence, a data amount corresponding to the computing engines in the target dataflow network, and obtaining the states of the first dataflow memory module and the computing engines.
Optionally, the target network configuration further comprises compute cores, second dataflow storage units, and a local dataflow network connecting the compute cores and the second dataflow storage units, and the configuration of a computing engine comprises:

configuring the interconnection of the compute cores and the local dataflow network to obtain the computation paths of the compute cores;

configuring the interconnection of the second dataflow storage units and the local dataflow network to obtain storage paths;

obtaining the computing engine according to the computation paths and the storage paths.
In a second aspect, a dataflow-based deep network acceleration method is also provided, the method comprising:

obtaining the target deep network information required for the data to be processed;

matching, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second dataflow memory module, and a local dataflow network;

configuring a target dataflow engine according to the target network configuration rule;

processing the data to be processed through the target dataflow engine.
Optionally, configuring the target dataflow engine according to the target network configuration rule comprises:

configuring the interconnection of the compute cores and the local dataflow network to obtain the computation paths of the compute cores;

configuring the interconnection of the second dataflow memory module and the local dataflow network to obtain storage paths;

obtaining the target dataflow engine according to the computation paths and the storage paths.
Optionally, processing the data to be processed through the target dataflow engine comprises:

reading the data to be processed into the second dataflow memory module;

in the second dataflow memory module, generating an address sequence for the data to be processed by a pre-set generation rule, according to the data format and data path of the data to be processed;

in each clock cycle, reading from the second dataflow memory module, according to the address sequence, a data amount corresponding to the compute cores in the target dataflow engine, and obtaining the states of the second dataflow memory module and the compute cores.
Optionally, the second dataflow memory module comprises a first storage unit and a second storage unit, and processing the data to be processed through the target dataflow engine comprises:

inputting the data in the first storage unit into a compute core to obtain a computation result;

storing the computation result into the second storage unit as input data for the next compute-core computation.
In a third aspect, a dataflow-based deep network acceleration apparatus is also provided, the apparatus comprising:

a first obtaining module, configured to obtain the target deep network information required for the data to be processed;

a first matching module, configured to match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network;

a first configuration module, configured to configure a target dataflow network according to the target network configuration rule;

a first processing module, configured to process the data to be processed through the target dataflow network.
In a fourth aspect, a dataflow-based deep network acceleration apparatus is also provided, the apparatus comprising:

a second obtaining module, configured to obtain the target deep network information required for the data to be processed;

a second matching module, configured to match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second dataflow memory module, and a local dataflow network;

a second configuration module, configured to configure a target dataflow engine according to the target network configuration rule;

a second processing module, configured to process the data to be processed through the target dataflow engine.
In a fifth aspect, an electronic device is provided, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the dataflow-based deep network acceleration method provided by the embodiments of this application.

In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the dataflow-based deep network acceleration method provided by the embodiments of this application.
The beneficial effects brought by this application are as follows: the deep network is accelerated by dataflow, which reduces off-chip data communication; there is thus no instruction idle overhead, and the hardware acceleration efficiency of the deep network is improved. Moreover, through network configuration, different deep network models can be configured, supporting a variety of different deep network models.
Brief description of the drawings

Fig. 1 is a schematic diagram of an optional implementation architecture of a dataflow-based deep network acceleration method provided by an embodiment of this application;

Fig. 2 is a flow diagram of a dataflow-based deep network acceleration method provided by the first aspect of the embodiments of this application;

Fig. 3 is a flow diagram of another dataflow-based deep network acceleration method provided by an embodiment of this application;

Fig. 4 is a flow diagram of a dataflow-based deep network acceleration method provided by the second aspect of the embodiments of this application;

Fig. 5 is a flow diagram of another dataflow-based deep network acceleration method provided by an embodiment of this application;

Fig. 6 is a schematic diagram of a dataflow-based deep network acceleration apparatus provided by the third aspect of the embodiments of this application;

Fig. 7 is a schematic diagram of a dataflow-based deep network acceleration apparatus provided by the fourth aspect of the embodiments of this application.
Detailed description

Preferred embodiments of this application are described below. Those of ordinary skill in the art will be able to realize them according to the related technologies in the field, and will more clearly understand the innovation of this application and the benefits it brings.
To further describe the technical solution of this application, please refer to Fig. 1, which is a schematic diagram of an optional implementation architecture of a dataflow-based deep network acceleration method provided by an embodiment of this application. As shown in Fig. 1, the framework 103 is connected to an off-chip memory module (DDR) 101 and a host CPU through interconnects. The framework 103 comprises: a first memory module 104, a global dataflow network 105, and dataflow engines 106. The first memory module 104 is connected to the off-chip memory module 101 through an interconnect, and is also connected to the global dataflow network 105 through an interconnect; the dataflow engines 106 are connected to the global dataflow network 105 through interconnects, so that the dataflow engines 106 can operate in parallel or in series. A dataflow engine 106 may comprise: compute cores (also called computing modules), a second memory module 108, and a local dataflow network 107. The compute cores may include kernels used for computation, such as a convolution kernel 109, a pooling kernel 110, and an activation-function kernel 111; of course, compute cores other than the convolution kernel 109, pooling kernel 110, and activation-function kernel 111 exemplified here may also be included without limitation, including all kernels used for computation in a deep network. The first memory module 104 and the second memory module 108 may be on-chip cache modules, or DDR or high-speed DDR memory modules, etc. A dataflow engine 106 may be understood as a computing engine that supports dataflow processing, or as a computing engine dedicated to dataflow processing. The CPU may include control registers, in which network configuration rules are set in advance for configuring the network.

It should be noted that the deep network in this application may also be referred to as a deep-learning network, a deep-learning neural network, etc.
This application provides a dataflow-based deep network acceleration method, apparatus, device, and storage medium. The purpose of this application is achieved through the following technical solutions:
In a first aspect, please refer to Fig. 2, which is a flow diagram of a dataflow-based deep network acceleration method provided by an embodiment of this application. As shown in Fig. 2, the method comprises the following steps:

201. Obtain the target deep network information required for the data to be processed.

In this step, the data to be processed may be image data to be recognized, target data to be detected, target data to be tracked, or other data that can be processed by a deep network. The target deep network information corresponds to the deep network information for the data to be processed: for example, if the data to be processed is image data to be recognized, the target deep network information is the configuration parameters of a deep network for image recognition; if the data to be processed is target data to be detected, the target deep network information is the configuration parameters of a deep network for target detection. The target deep network information may be set in advance and determined by matching against the data to be processed, or determined by manual selection; no limitation is made here. Obtaining the target deep network information facilitates configuring the deep network; the deep network information may include the network type, data type, number of layers, computation types, etc.
202. Match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network.

The target deep network information contains the network type, data type, number of layers, computation types, and so on of the deep network required for the data to be processed. The target network configuration rule may be configured in advance: for example, it may be the parameter rules and computation rules pre-set for network types such as image-recognition networks, target-detection networks, and target-tracking networks. The parameter rules may be rules for setting hyperparameters, weights, etc.; the computation rules may be rules for computations such as addition, multiplication, convolution, and deconvolution. The configuration rules among the pre-configured computing engines, the first dataflow memory module, and the global dataflow network may be understood as the number of computing engines and their connections with the global dataflow network, the connections between the first dataflow memory module and the global dataflow network, the routing connections within the global dataflow network, and so on. The global dataflow network can be configured by control registers. The network is realized as a router between the first dataflow memory module and the computing engines. When multiple computing engines are instantiated in a single framework, the global dataflow network can be configured to send different data to different computing engines for data parallelism, or to serially link the inputs and outputs of the computing engines into a longer computation pipeline, in which more neural-network layers can be processed.
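Purely as an illustration of this matching step, the rule lookup might be sketched as follows; the rule table, keys, and function names are all hypothetical, not the patent's data model:

```python
# Hypothetical sketch of step 202: looking up a pre-set configuration rule
# from the target deep network information. All names are illustrative.

network_config_rules = {
    # network type -> configuration rule: engine count and topology
    "image_recognition": {"engines": 4, "topology": "parallel"},
    "target_detection":  {"engines": 6, "topology": "serial"},
    "target_tracking":   {"engines": 4, "topology": "serial"},
}

def match_rule(target_info: dict) -> dict:
    """Return the pre-set rule matching the target deep network information."""
    return network_config_rules[target_info["network_type"]]

rule = match_rule({"network_type": "image_recognition",
                   "data_type": "int8", "layers": 18})
```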
In a possible embodiment, the first dataflow memory module may include two dataflow storage units, one for input and one for output, used for data access: the input dataflow storage unit feeds input data into the computing engines for computation, and the computing engines output their results to the output dataflow storage unit for storage. This avoids the situation where, while the input dataflow storage unit is feeding data to the computing engines, the output results of the computing engines cannot be written back into the input dataflow storage unit. For example, suppose a computing engine needs to compute over a piece of data in the input dataflow storage unit twice. After the first computation completes, the engine needs to read the data from the input dataflow storage unit a second time. Ordinarily it would have to wait for the first result to be stored before reading the data again; but with an output dataflow storage unit, the first result can be stored to the output unit while the second read proceeds, so no waiting is needed, improving data-processing efficiency.
203. Configure a target dataflow network according to the target network configuration rule.

Configuring according to the target network configuration rule may amount to establishing the connection relationships among the pre-configured computing engines, the first dataflow memory module, and the global dataflow network. The connection relationships may include the number of connected computing engines, the connection order, etc. The computing engines can be connected to the global dataflow network through interconnects to form a new deep network; different numbers and orders of computing engines form different deep networks. Configuring according to the target network configuration rule yields a target dataflow network for processing the data to be processed. Since each computing engine reads its data through the first dataflow memory module, the data in the first dataflow memory module can be read into different computing engines to form dataflows; no instruction sequence is needed, so the configured computing engines do not incur computation idleness.
204. Process the data to be processed through the target dataflow network.

The target dataflow network is configured according to the target network information, and may also be called a customized dataflow network. In the target dataflow network, the first dataflow memory module and the computing engines are connected through the global dataflow network to form dataflows; compared with an instruction set, there is no need to wait for the read/write of a previous instruction to complete, achieving highly efficient computation under the deep network architecture.
In this embodiment, the target deep network information required for the data to be processed is obtained; according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information is matched, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network; a target dataflow network is configured according to the target network configuration rule; and the data to be processed is processed through the target dataflow network. The deep network is accelerated by dataflow, which reduces off-chip data communication; there is thus no instruction idle overhead, and the hardware acceleration efficiency of the deep network is improved. Moreover, through network configuration, different deep network models can be configured, supporting a variety of different deep network models.
It should be noted that the dataflow-based deep network acceleration method provided by the embodiments of this application can be applied to devices capable of dataflow-based deep network acceleration, such as computers, servers, and mobile phones.
Please refer to Fig. 3, which is a flow diagram of another dataflow-based deep network acceleration method provided by an embodiment of this application. As shown in Fig. 3, the method comprises the following steps:

301. Obtain the target deep network information required for the data to be processed.

302. Match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network.
303. Configure, according to the global dataflow network, parallel or serial connections among multiple computing engines.

In this step, the global dataflow network may be realized by routing and can be configured by control registers, in which the corresponding global dataflow network configuration rules are set in advance. The network is realized as a router between the first dataflow memory module and each computing engine; the main functions of the network router are to provide skip paths and feedback paths for dataflows between the computing engines. The parallel or serial connections among the multiple computing engines may be configured through dataflow. For example, when computing engine A and computing engine B are parallel in the global dataflow network, the dataflow flows to computing engine A and computing engine B simultaneously, realizing parallel processing of data; when computing engine A and computing engine B are serial in the global dataflow network, the dataflow may first be computed in computing engine A, with the computation result then flowing to computing engine B. The serial case can be understood as deepening the computation layers of the deep network. The specific configuration may control the data flow direction through the global dataflow network, so as to configure parallel or serial connections among the multiple computing engines. This can be achieved by configuring the interconnections between the global dataflow network and the multiple computing engines: for example, multiple computing engines may be interconnected with the global dataflow network by a parallel rule, or by a serial rule, and the first dataflow memory module is configured to interconnect with the global dataflow network.
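As a hypothetical software analogy of this parallel/serial routing (the patent describes hardware routing; the function names and toy engines here are illustrative):

```python
# Illustrative sketch of step 303: routing a dataflow through engines
# either in parallel or in series via the global network.

from typing import Callable, List

Engine = Callable[[list], list]

def run_parallel(engines: List[Engine], data: list) -> List[list]:
    # The global network sends the same data to every engine simultaneously.
    return [engine(data) for engine in engines]

def run_serial(engines: List[Engine], data: list) -> list:
    # The global network chains engines: each output feeds the next engine,
    # deepening the computation pipeline.
    for engine in engines:
        data = engine(data)
    return data

double = lambda xs: [2 * x for x in xs]      # toy computing engine A
inc    = lambda xs: [x + 1 for x in xs]      # toy computing engine B

print(run_parallel([double, inc], [1, 2]))   # [[2, 4], [2, 3]]
print(run_serial([double, inc], [1, 2]))     # [3, 5]
```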
304. Obtain the dataflow paths of the multiple computing engines according to the parallel or serial connections between the first dataflow memory module and the multiple computing engines.

In this step, the first dataflow memory module may be a cache, DDR, or fast-access DDR; in the embodiments of this application a cache is preferred. Specifically, a controllable read/write address generation unit may be set in the cache. Depending on the input data format and data path required for the computation, the address generation unit generates an adapted address sequence to index the data in the cache. The address sequence can be used to index the data in the cache to be input into the corresponding computing engine: for example, if a computing engine needs 80 items of data for a computation, the data at the 80 addresses of the sequence is read from the cache into that computing engine. In addition, by setting counters, the address generation unit can give the generated address sequence different loop sizes, for example a local loop over data 1, data 2, data 3, which improves data reusability and also adapts to the data-processing size of each computing engine. Dataflows are stored through the first dataflow memory module, and the flow of data to each data node among the parallel or serial computing engines is controlled; these are the dataflow paths, which let data processing proceed in the computing engines like a pipeline, improving data-processing efficiency.
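A minimal sketch of such a counter-based address generation unit, assuming a simple replay loop; the function name and parameters are illustrative, not the patent's design:

```python
# Illustrative sketch of the counter-based address generation unit: it
# yields an address sequence with a configurable local loop size, so a
# short span of data can be replayed for reuse.

def address_sequence(total: int, loop_size: int, repeats: int):
    """Yield addresses 0..total-1, replaying each loop_size-long span
    `repeats` times before moving on."""
    for base in range(0, total, loop_size):
        span = range(base, min(base + loop_size, total))
        for _ in range(repeats):
            yield from span

# A local loop over addresses 0, 1, 2 repeated twice, then 3, 4, 5, ...
print(list(address_sequence(total=6, loop_size=3, repeats=2)))
# -> [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5]

cache = list(range(100, 106))
engine_input = [cache[a] for a in address_sequence(6, 3, 2)]  # indexed reads
```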
305. Form the target dataflow network based on the dataflow paths.

In this step, the first dataflow memory module inputs data into the corresponding computing engines through the global dataflow network, and the computing engines output their computation results to the first dataflow memory module through the global dataflow network, without instruction control, so the problem of computation units sitting idle while a single instruction executes does not arise.
306. Process the data to be processed through the target dataflow network.

In this embodiment, dataflows are stored through the first dataflow memory module, and the flow of data to each data node among the parallel or serial computing engines is controlled; these dataflow paths let data processing proceed in the computing engines like a pipeline, improving data-processing efficiency.
Optionally, processing the data to be processed through the target dataflow network comprises:

reading the data to be processed into the first dataflow memory module;

in the first dataflow memory module, generating an address sequence for the data to be processed by a pre-set generation rule, according to the data format and data path of the data to be processed;

in each clock cycle, reading from the first dataflow memory module, according to the address sequence, a data amount corresponding to the computing engines in the target dataflow network, and obtaining the states of the first dataflow memory module and the computing engines.

In this embodiment, the first dataflow memory module may be a cache, DDR, or fast-access DDR; in the embodiments of this application a cache provided with a controllable read/write address generation unit is preferred. Depending on the input data format and data path required for the computation, the address generation unit indexes the data in the cache through the adapted address sequence it generates. The address sequence can be used to index the data in the cache to be input into the corresponding computing engine: for example, if a computing engine needs 80 items of data for a computation, the data at the 80 addresses of the sequence is read from the cache into that computing engine. In addition, by setting counters, the address generation unit can give the generated address sequence different loop sizes, for example a local loop over data 1, data 2, data 3, which improves data reusability and also adapts to the data-processing size of each computing engine. The states of the first dataflow memory module include a data-read-ready state and a data-write-complete state; the states of a computing engine include whether the computation is complete, whether the data for the next computation needs to be read, and so on. The data in the first dataflow memory module can be monitored by a finite state machine to obtain the state of the first dataflow memory module, and the state of a computing engine can be derived from the state of the first dataflow memory module: for example, after a computation result is written to the first dataflow memory module, the state of the computing engine can be determined as computation finished. In each clock cycle, the states of each computing engine and the first dataflow memory module are obtained, so that waiting can be calculated accurately and predictably, enabling maximally efficient hardware optimization and further improving data-processing efficiency.
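As a hedged illustration of this per-cycle state monitoring, a toy finite state machine might look as follows; the states and the derivation rule are simplified stand-ins, not the patent's state machine:

```python
# Hypothetical sketch of per-clock-cycle state monitoring: a finite state
# machine over the memory-module states, from which the computing-engine
# state is derived.

from enum import Enum, auto

class MemState(Enum):
    READ_READY     = auto()   # data ready to be read
    WRITE_COMPLETE = auto()   # computation result written back

def engine_state(mem_state: MemState) -> str:
    # The engine state is inferred from the memory-module state: once the
    # result has been written back, the engine's computation is finished.
    if mem_state is MemState.WRITE_COMPLETE:
        return "computation finished"
    return "awaiting next read"

for cycle, state in enumerate([MemState.READ_READY, MemState.WRITE_COMPLETE]):
    print(f"cycle {cycle}: memory={state.name}, engine={engine_state(state)}")
```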
Optionally, the target network configuration further comprises compute cores, second dataflow storage units, and a local dataflow network connecting the compute cores and the second dataflow storage units, and the configuration of a computing engine comprises:

configuring the interconnection of the compute cores and the local dataflow network to obtain the computation paths of the compute cores;

configuring the interconnection of the second dataflow storage units and the local dataflow network to obtain storage paths;

obtaining the computing engine according to the computation paths and the storage paths.

In this embodiment, the compute cores, the second dataflow memory module, and the local dataflow network form the main configuration of a computing engine. A compute core may be a kernel with computational capability, such as a convolution kernel, pooling kernel, or activation-function kernel; it should also be explained that a compute core may be called a computation kernel, computation unit, computing module, etc. The second dataflow memory module may be a memory module with data-access capability, such as a cache, DDR, or high-speed DDR. The second dataflow memory module and the first dataflow memory module may be different storage areas on the same memory: for example, the second dataflow memory module may be a second data buffer area in a buffer, and the first dataflow memory module a first data buffer area in the same buffer. The local dataflow network can be understood as the routing within a computing engine that connects the compute cores and the second dataflow memory module. For example, the connections between compute cores may be controlled by a network router, whose main functions are to provide skip paths and feedback paths. By setting control registers, the local dataflow network can be configured to form different flow paths with the compute cores available in a computing engine. Along these flow paths, the combination of compute-core types and order provides a continuous data-processing pipeline over multiple layers of a deep-learning neural network. For example, following the dataflow, if the combination of compute cores is convolution kernel, then pooling kernel, then activation-function kernel, a convolutional neural-network layer is obtained; if the combination is deconvolution kernel, then pooling kernel, then activation-function kernel, a deconvolutional neural-network layer is obtained; and so on. It should be noted that the combination of compute-core types and order is determined specifically by the target network configuration rule. By forming dataflows among the compute cores, the computation of a computing engine can be accelerated, further improving the data-processing efficiency of the deep network.
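As an illustrative software analogy of composing compute cores along a flow path, where the type and order of cores determine the layer realized (the kernels below are toy stand-ins, not the patent's hardware kernels):

```python
# Illustrative sketch: chaining compute cores along the local dataflow
# network's flow path to form a neural-network layer.

from typing import Callable, List

Core = Callable[[list], list]

def make_layer(cores: List[Core]) -> Core:
    """Chain compute cores in order along a flow path."""
    def layer(data: list) -> list:
        for core in cores:
            data = core(data)
        return data
    return layer

# Toy stand-ins for the convolution, pooling, and activation kernels:
conv = lambda xs: [3 * x for x in xs]                       # "convolution"
pool = lambda xs: [max(xs[i:i + 2]) for i in range(0, len(xs), 2)]
relu = lambda xs: [max(0, x) for x in xs]                   # activation

conv_layer = make_layer([conv, pool, relu])   # conv -> pool -> activation
print(conv_layer([1, -2, 3, 4]))              # [3, 12]
```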
The above optional embodiments can realize the dataflow-based deep network acceleration methods of the embodiments corresponding to Figs. 2 and 3 and achieve the same effects, which are not repeated here.
In a second aspect, please refer to Fig. 4, which is a flow diagram of a dataflow-based deep network acceleration method provided by an embodiment of this application. As shown in Fig. 4, the method comprises:

401. Obtain the target deep network information required for the data to be processed.

In this step, the data to be processed may be image data to be recognized, target data to be detected, target data to be tracked, or other data that can be processed by a deep network. The target deep network information corresponds to the deep network information for the data to be processed: for example, if the data to be processed is image data to be recognized, the target deep network information is the configuration parameters of a deep network for image recognition; if the data to be processed is target data to be detected, the target deep network information is the configuration parameters of a deep network for target detection. The target deep network information may be set in advance and determined by matching against the data to be processed, or determined by manual selection; no limitation is made here. Obtaining the target deep network information facilitates configuring the deep network; the deep network information may include the network type, data type, number of layers, computation types, etc.
402. Match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second dataflow memory module, and a local dataflow network.

The target deep network information contains the network type, data type, number of layers, computation types, and so on of the deep network required for the data to be processed. The target network configuration rule may be configured in advance: for example, it may be the parameter rules and computation rules pre-set for network types such as image-recognition networks, target-detection networks, and target-tracking networks. The parameter rules may be rules for setting hyperparameters, weights, etc.; the computation rules may be rules for computations such as addition, multiplication, convolution, and deconvolution. The configuration rules among the compute cores, the second dataflow memory module, and the local dataflow network can be understood as the types and number of compute cores, their connections with the local dataflow network, the connections between the second dataflow memory module and the local dataflow network, the routing connections within the local dataflow network, and so on. The local dataflow network can be configured by control registers, and may be realized as a router between the second dataflow memory module and the compute cores. For example, the connections between compute cores may be controlled by a network router, whose main functions are to provide skip paths and feedback paths.
403. Configure a target dataflow engine according to the target network configuration rule.

Configuring according to the target network configuration rule may amount to establishing the connection relationships among the pre-configured compute cores, the second dataflow memory module, and the local dataflow network. The connection relationships may include the types of compute cores, the number of connections, the connection order, etc. The compute cores can be connected to the local dataflow network through interconnects to form a new computing engine, namely a dataflow engine; different types, numbers, and orders of compute cores form the dataflow engines required by different deep networks. Configuring according to the target network configuration rule yields a target dataflow engine for processing the data to be processed. Since each compute core reads its data through the second dataflow memory module, the data in the second dataflow memory module can be read into different compute cores to form dataflows: for example, data requiring multiplication is read into a multiplication core for multiplication, and data requiring addition is read into an addition core for addition. Since dataflow needs no instruction sequence, the configured dataflow engine does not incur computation idleness.
404. Process the data to be processed through the target dataflow engine.

The target dataflow engine is configured according to the target network information, and may also be called a customized dataflow engine. In the target dataflow engine, the second dataflow memory module and each compute core are connected through the local dataflow network to form dataflows; compared with the instruction-set way of realization, there is no need to wait for the read/write of a previous instruction to complete, achieving highly efficient computation under the deep network architecture.

In this embodiment, the target deep network information required for the data to be processed is obtained; according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information is matched, wherein the target network configuration rule comprises compute cores, a second dataflow memory module, and a local dataflow network; a target dataflow engine is configured according to the target network configuration rule; and the data to be processed is processed through the target dataflow engine. The deep network is accelerated by dataflow, which reduces off-chip data communication; there is thus no instruction idle overhead, and the hardware acceleration efficiency of the deep network is improved. Moreover, through network configuration, the computing engines required by different deep network models can be configured, supporting the computing engines required by a variety of different deep network models.
Please refer to Fig. 5, which is a flow diagram of another dataflow-based deep network acceleration method provided by an embodiment of this application. As shown in Fig. 5, the method comprises:

501. Obtain the target deep network information required for the data to be processed.

502. Match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second dataflow memory module, and a local dataflow network.

503. Configure the interconnection of the compute cores and the local dataflow network to obtain the computation paths of the compute cores.

504. Configure the interconnection of the second dataflow memory module and the local dataflow network to obtain storage paths.

505. Obtain the target dataflow engine according to the computation paths and the storage paths.

506. Process the data to be processed through the target dataflow engine.
In this embodiment, the compute cores, the second dataflow memory module, and the local dataflow network form the main configuration of a dataflow engine. A compute core may be a kernel with computational capability, such as a convolution kernel, pooling kernel, or activation-function kernel; it should also be explained that a compute core may be called a computation kernel, computation unit, computing module, etc. The second dataflow memory module may be a memory module with data-access capability, such as a cache, DDR, or high-speed DDR. The second dataflow memory module and the first dataflow memory module may be different storage areas on the same memory: for example, the second dataflow memory module may be a second data buffer area in a buffer, and the first dataflow memory module a first data buffer area in the same buffer. The local dataflow network can be understood as the routing within a computing engine that connects the compute cores and the second dataflow memory module. For example, the connections between compute cores may be controlled by a network router, whose main functions are to provide skip paths and feedback paths. By setting control registers, the local dataflow network can be configured to form different flow paths with the compute cores available in a computing engine. Along these flow paths, the combination of compute-core types and order provides a continuous data-processing pipeline over multiple layers of a deep-learning neural network. For example, following the dataflow, if the combination of compute cores is convolution kernel, then pooling kernel, then activation-function kernel, a convolutional neural-network layer is obtained; if the combination is deconvolution kernel, then pooling kernel, then activation-function kernel, a deconvolutional neural-network layer is obtained; and so on. It should be noted that the combination of compute-core types and order is determined specifically by the target network configuration rule.

By forming dataflows among the compute cores, the computation of a computing engine can be accelerated, further improving the data-processing efficiency of the deep network.
Optionally, processing the data to be processed through the target dataflow engine comprises:

reading the data to be processed into the second dataflow memory module;

in the second dataflow memory module, generating an address sequence for the data to be processed by a pre-set generation rule, according to the data format and data path of the data to be processed;

in each clock cycle, reading from the second dataflow memory module, according to the address sequence, a data amount corresponding to the compute cores in the target dataflow engine, and obtaining the states of the second dataflow memory module and the compute cores.

In this embodiment, the second dataflow memory module may be a cache, DDR, or fast-access DDR; in the embodiments of this application a cache provided with a controllable read/write address generation unit is preferred. Depending on the input data format and data path required for the computation, the address generation unit indexes the data in the cache through the adapted address sequence it generates. The address sequence can be used to index the data in the cache to be input into the corresponding compute core: for example, if a compute core needs 80 items of data for a computation, the data at the 80 addresses of the sequence is read from the cache into that compute core. In addition, by setting counters, the address generation unit can give the generated address sequence different loop sizes, for example a local loop over data 1, data 2, data 3, which improves data reusability and also adapts to the data-processing size of each compute core. The states of the second dataflow memory module include a data-read-ready state and a data-write-complete state; the states of a compute core include whether the computation is complete, whether the data for the next computation needs to be read, and so on. The data in the second dataflow memory module can be monitored by a finite state machine to obtain the state of the second dataflow memory module, and the state of a compute core can be derived from the state of the second dataflow memory module: for example, after a computation result is written to the second dataflow memory module, the state of the compute core can be determined as computation finished.

In each clock cycle, the states of each compute core and the second dataflow memory module are obtained, so that waiting can be calculated accurately and predictably, enabling maximally efficient hardware optimization and further improving data-processing efficiency.
Optionally, the second dataflow memory module comprises a first storage unit and a second storage unit, and processing the data to be processed through the target dataflow engine comprises:

inputting the data in the first storage unit into a compute core to obtain a computation result;

storing the computation result into the second storage unit as input data for the next compute-core computation.

In this embodiment, the first storage unit may be an input dataflow storage unit and the second storage unit an output dataflow storage unit; the first and second storage units alternate for data access. The first storage unit inputs data into the compute core for computation, and the compute core outputs its result to the second storage unit for storage. This avoids the situation where, while the first storage unit is feeding data to the compute core, the output result of the compute core cannot be written into the first storage unit. For example, suppose the compute core needs to compute over a piece of data in the first storage unit twice. After the first computation completes, the compute core needs to read the data from the first storage unit a second time. Ordinarily it would have to wait for the first result to be stored to the first storage unit before reading the data again; but with the second storage unit, the first result can be stored to the second storage unit while the second read from the first storage unit proceeds, so no waiting is needed, improving data-processing efficiency.
The above optional embodiments can realize the dataflow-based deep network acceleration methods of the embodiments corresponding to Figs. 4 and 5 and achieve the same effects, which are not repeated here. It should be noted that the above embodiments can also be combined with the embodiments of Figs. 2 and 3.
In a third aspect, please refer to Fig. 6, which is a schematic diagram of a dataflow-based deep network acceleration apparatus provided by an embodiment of this application. As shown in Fig. 6, the apparatus comprises:

a first obtaining module 601, configured to obtain the target deep network information required for the data to be processed;

a first matching module 602, configured to match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among pre-configured computing engines, a first dataflow memory module, and a global dataflow network;

a first configuration module 603, configured to configure a target dataflow network according to the target network configuration rule;

a first processing module 604, configured to process the data to be processed through the target dataflow network.

Optionally, the first configuration module 603 comprises:

a global configuration submodule, configured to configure, according to the global dataflow network, parallel or serial connections among multiple computing engines;

a path configuration submodule, configured to obtain the dataflow paths of the multiple computing engines according to the parallel or serial connections between the first dataflow memory module and the multiple computing engines;

a forming submodule, configured to form the target dataflow network based on the dataflow paths.

Optionally, the first processing module 604 comprises:

a first obtaining submodule, configured to read the data to be processed into the first dataflow memory module;

a first data-address generation submodule, configured to generate, in the first dataflow memory module, an address sequence for the data to be processed by a pre-set generation rule, according to the data format and data path of the data to be processed;

a first input submodule, configured to read, in each clock cycle and according to the address sequence, from the first dataflow memory module a data amount corresponding to the computing engines in the target dataflow network, and to obtain the states of the first dataflow memory module and the computing engines.

Optionally, the target network configuration further comprises compute cores, second dataflow storage units, and a local dataflow network connecting the compute cores and the second dataflow storage units, and the first configuration module 603 further comprises:

a first local configuration submodule, configured to configure the interconnection of the compute cores and the local dataflow network to obtain the computation paths of the compute cores;

a first local-path submodule, configured to configure the interconnection of the second dataflow storage units and the local dataflow network to obtain storage paths;

a first engine module, configured to obtain the computing engine according to the computation paths and the storage paths.
In a fourth aspect, please refer to Fig. 7, which is a schematic diagram of a dataflow-based deep network acceleration apparatus provided by an embodiment of this application. As shown in Fig. 7, the apparatus comprises:

a second obtaining module 701, configured to obtain the target deep network information required for the data to be processed;

a second matching module 702, configured to match, according to the target deep network information, a pre-set target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second dataflow memory module, and a local dataflow network;

a second configuration module 703, configured to configure a target dataflow engine according to the target network configuration rule;

a second processing module 704, configured to process the data to be processed through the target dataflow engine.

Optionally, the second configuration module 703 comprises:

a second local configuration submodule, configured to configure the interconnection of the compute cores and the local dataflow network to obtain the computation paths of the compute cores;

a second local-path submodule, configured to configure the interconnection of the second dataflow memory module and the local dataflow network to obtain storage paths;

a second engine module, configured to obtain the target dataflow engine according to the computation paths and the storage paths.

Optionally, the second processing module 704 comprises:

a second obtaining submodule, configured to read the data to be processed into the second dataflow memory module;

a second data-address generation submodule, configured to generate, in the second dataflow memory module, an address sequence for the data to be processed by a pre-set generation rule, according to the data format and data path of the data to be processed;

a second input submodule, configured to read, in each clock cycle and according to the address sequence, from the second dataflow memory module a data amount corresponding to the compute cores in the target dataflow engine, and to obtain the states of the second dataflow memory module and the compute cores.

Optionally, the second processing module 704 comprises:

an input computation submodule, configured to input the data in the first storage unit into a compute core to obtain a computation result;

an output storage submodule, configured to store the computation result into the second storage unit as input data for the next compute-core computation.
In a fifth aspect, an embodiment of this application provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the dataflow-based deep network acceleration method provided by the embodiments of this application.

In a sixth aspect, an embodiment of this application provides a computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the dataflow-based deep network acceleration method provided by the embodiments of this application.
It should be noted that, for each of the foregoing method embodiments, for simplicity of description the embodiment is expressed as a series of action combinations; but those skilled in the art should understand that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by this application.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative.
In addition, the processors and chips in the embodiments of the present application may be integrated into one processing unit, may exist alone physically, or two or more pieces of hardware may be integrated into one unit. The computer-readable storage medium or computer-readable program may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The above content is a further detailed description of the present application in combination with specific preferred embodiments, and the specific embodiments of the present application shall not be considered limited to these descriptions. For those of ordinary skill in the technical field to which the present application belongs, a number of simple deductions or substitutions may be made without departing from the concept of the present application, all of which shall be regarded as falling within the protection scope of the present application.
Claims (12)
1. A data-flow-based deep network acceleration method, characterized in that the method comprises:
obtaining target deep network information required by data to be processed;
matching, according to the target deep network information, a preset target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among preconfigured computing engines, a first data flow storage module, and a global data flow network;
obtaining a target data flow network through configuration according to the target network configuration rule; and
processing the data to be processed through the target data flow network.
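Read as a whole, claim 1 amounts to a lookup-configure-execute pipeline. A minimal sketch of that control flow follows, under the assumption that configuration rules are stored in a dictionary keyed by a model identifier; the rule table contents and the build_network helper are invented for illustration.

```python
# Hedged sketch of the claimed method's control flow. The rule table,
# its keys, and the build_network helper are illustrative assumptions.

CONFIG_RULES = {
    "model_a": {"engines": 4, "topology": "sequential"},
    "model_b": {"engines": 8, "topology": "parallel"},
}

def build_network(rule):
    # Placeholder: a real implementation would interconnect the computing
    # engines, the first data flow storage module and the global network.
    return lambda data: [("processed", d, rule["topology"]) for d in data]

def accelerate(pending_data, network_info):
    rule = CONFIG_RULES[network_info]         # match the preset rule
    data_flow_network = build_network(rule)   # configure the target network
    return data_flow_network(pending_data)    # process through the network

out = accelerate([0.1, 0.2], "model_a")
```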
2. The method according to claim 1, characterized in that obtaining the target data flow network through configuration according to the target network configuration rule comprises:
configuring, according to the global data flow network, the parallel or sequential relationships among the multiple computing engines;
obtaining the data flow paths of the multiple computing engines according to the parallel or sequential relationships between the first data flow storage module and the multiple computing engines; and
forming the target data flow network based on the data flow paths.
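Claim 2's configuration step can be pictured as choosing a composition operator over the computing engines: sequential composition chains engine outputs, while parallel composition fans the same input out. A sketch under that assumption, with engines modelled as plain functions:

```python
# Hedged sketch of parallel vs. sequential engine composition under the
# global data flow network. Engines are modelled as plain functions.

def compose_engines(engines, topology):
    if topology == "sequential":
        def path(data):
            for engine in engines:  # engine i feeds engine i + 1
                data = engine(data)
            return data
    elif topology == "parallel":
        def path(data):
            return [engine(data) for engine in engines]  # shared input
    else:
        raise ValueError(f"unknown topology: {topology!r}")
    return path

pipeline = compose_engines([lambda d: d + ["a"], lambda d: d + ["b"]],
                           "sequential")
# pipeline([]) == ["a", "b"]
```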
3. The method according to claim 1, characterized in that processing the data to be processed through the target data flow network comprises:
reading the data to be processed into the first data flow storage module;
generating, in the first data flow storage module, an address sequence for the data to be processed by a preset generation rule according to the data format and data path of the data to be processed; and
reading, in each clock cycle and according to the address sequence, a data amount matching the computing engines in the target data flow network from the first data flow storage module for input, and obtaining the states of the first data flow storage module and the computing engines.
4. The method according to any one of claims 1 to 3, characterized in that the target network configuration further comprises compute cores, a second data flow storage unit, and a local data flow network connecting the compute cores and the second data flow storage unit, and the configuration of the computing engines comprises:
configuring the interconnection between the compute cores and the local data flow network to obtain the calculation paths of the compute cores;
configuring the interconnection between the second data flow storage unit and the local data flow network to obtain a storage path; and
obtaining the computing engines according to the calculation paths and the storage path.
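Claim 4 adds a second level to the hierarchy: each computing engine is itself configured from compute cores, a second data flow storage unit, and a local data flow network, and the resulting engines plug into the global network of claims 1 and 2. A sketch of that nesting, with all names illustrative:

```python
# Hedged sketch of the two-level hierarchy implied by claim 4: compute
# cores and local storage form an engine; engines form the global network.

def build_engine(core_fns):
    local_buffer = []  # stands in for the second data flow storage unit
    def engine(data):
        local_buffer[:] = data                # storage path: load the input
        for core in core_fns:                 # calculation path: core chain
            local_buffer[:] = [core(x) for x in local_buffer]
        return list(local_buffer)
    return engine

# Two engines composed sequentially by the global network.
engine_a = build_engine([lambda x: x * 2, lambda x: x + 1])
engine_b = build_engine([lambda x: x - 3])
global_network = lambda data: engine_b(engine_a(data))
# global_network([1, 2]) == [0, 2]
```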
5. A data-flow-based deep network acceleration method, characterized in that the method comprises:
obtaining target deep network information required by data to be processed;
matching, according to the target deep network information, a preset target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second data flow storage module, and a local data flow network;
obtaining a target data flow engine through configuration according to the target network configuration rule; and
processing the data to be processed through the target data flow engine.
6. The method according to claim 5, characterized in that obtaining the target data flow engine through configuration according to the target network configuration rule comprises:
configuring the interconnection between the compute cores and the local data flow network to obtain the calculation paths of the compute cores;
configuring the interconnection between the second data flow storage module and the local data flow network to obtain a storage path; and
obtaining the target data flow engine according to the calculation paths and the storage path.
7. The method according to claim 5, characterized in that processing the data to be processed through the target data flow engine comprises:
reading the data to be processed into the second data flow storage module;
generating, in the second data flow storage module, an address sequence for the data to be processed by a preset generation rule according to the data format and data path of the data to be processed; and
reading, in each clock cycle and according to the address sequence, a data amount matching the compute cores in the target data flow engine from the second data flow storage module for input, and obtaining the states of the second data flow storage module and the compute cores.
8. The method according to any one of claims 5 to 7, characterized in that the second data flow storage module comprises a first storage unit and a second storage unit, and processing the data to be processed through the target data flow engine comprises:
inputting the data in the first storage unit into a compute core to obtain a calculation result; and
storing the calculation result into the second storage unit as the input data of the next compute core.
9. A data-flow-based deep network acceleration apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain target deep network information required by data to be processed;
a first matching module, configured to match, according to the target deep network information, a preset target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises configuration rules among preconfigured computing engines, a first data flow storage module, and a global data flow network;
a first configuration module, configured to obtain a target data flow network through configuration according to the target network configuration rule;
a first processing module, configured to process the data to be processed through the target data flow network.
10. A data-flow-based deep network acceleration apparatus, characterized in that the apparatus comprises:
a second obtaining module, configured to obtain target deep network information required by data to be processed;
a second matching module, configured to match, according to the target deep network information, a preset target network configuration rule corresponding to the target deep network information, wherein the target network configuration rule comprises compute cores, a second data flow storage module, and a local data flow network;
a second configuration module, configured to obtain a target data flow engine through configuration according to the target network configuration rule;
a second processing module, configured to process the data to be processed through the target data flow engine.
11. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps in the data-flow-based deep network acceleration method according to any one of claims 1 to 4.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, wherein the computer program, when executed by a processor, implements the steps in the data-flow-based deep network acceleration method according to any one of claims 1 to 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910280156.2A CN110046704B (en) | 2019-04-09 | 2019-04-09 | Deep network acceleration method, device, equipment and storage medium based on data stream |
PCT/CN2019/082101 WO2020206637A1 (en) | 2019-04-09 | 2019-04-10 | Deep network acceleration methods and apparatuses based on data stream, device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910280156.2A CN110046704B (en) | 2019-04-09 | 2019-04-09 | Deep network acceleration method, device, equipment and storage medium based on data stream |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110046704A (en) | 2019-07-23 |
CN110046704B CN110046704B (en) | 2022-11-08 |
Family
ID=67276511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910280156.2A Active CN110046704B (en) | 2019-04-09 | 2019-04-09 | Deep network acceleration method, device, equipment and storage medium based on data stream |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110046704B (en) |
WO (1) | WO2020206637A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111404770A (en) * | 2020-02-29 | 2020-07-10 | 华为技术有限公司 | Network device, data processing method, device, system and readable storage medium |
CN111753994A (en) * | 2020-06-22 | 2020-10-09 | 深圳鲲云信息科技有限公司 | Data processing method and device of AI chip and computer equipment |
CN111752887A (en) * | 2020-06-22 | 2020-10-09 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and data processing method based on artificial intelligence chip |
CN111857989A (en) * | 2020-06-22 | 2020-10-30 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and data processing method based on artificial intelligence chip |
WO2021068244A1 (en) * | 2019-10-12 | 2021-04-15 | 深圳鲲云信息科技有限公司 | Local data stream acceleration method, data stream acceleration system, and computer device |
CN112840284A (en) * | 2019-08-13 | 2021-05-25 | 深圳鲲云信息科技有限公司 | Automatic driving method and device based on data stream, electronic equipment and storage medium |
CN112905525A (en) * | 2019-11-19 | 2021-06-04 | 中科寒武纪科技股份有限公司 | Method and equipment for controlling calculation of arithmetic device |
CN114021708A (en) * | 2021-09-30 | 2022-02-08 | 浪潮电子信息产业股份有限公司 | Data processing method, device and system, electronic equipment and storage medium |
WO2022028224A1 (en) * | 2020-08-03 | 2022-02-10 | 深圳鲲云信息科技有限公司 | Data storage method and apparatus, and device and storage medium |
CN114461978A (en) * | 2022-04-13 | 2022-05-10 | 苏州浪潮智能科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN116974654A (en) * | 2023-09-21 | 2023-10-31 | 浙江大华技术股份有限公司 | Image data processing method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066239A (en) * | 2017-03-01 | 2017-08-18 | 智擎信息系统(上海)有限公司 | A kind of hardware configuration for realizing convolutional neural networks forward calculation |
US20170316312A1 (en) * | 2016-05-02 | 2017-11-02 | Cavium, Inc. | Systems and methods for deep learning processor |
US20180189638A1 (en) * | 2016-12-31 | 2018-07-05 | Intel Corporation | Hardware accelerator template and design framework for implementing recurrent neural networks |
CN108268943A (en) * | 2017-01-04 | 2018-07-10 | 意法半导体股份有限公司 | Hardware accelerator engine |
CN108710941A (en) * | 2018-04-11 | 2018-10-26 | 杭州菲数科技有限公司 | The hard acceleration method and device of neural network model for electronic equipment |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9185093B2 (en) * | 2012-10-16 | 2015-11-10 | Mcafee, Inc. | System and method for correlating network information with subscriber information in a mobile network environment |
CN106447034B (en) * | 2016-10-27 | 2019-07-30 | 中国科学院计算技术研究所 | A kind of neural network processor based on data compression, design method, chip |
CN108154165B (en) * | 2017-11-20 | 2021-12-07 | 华南师范大学 | Marriage and love object matching data processing method and device based on big data and deep learning, computer equipment and storage medium |
CN109445935B (en) * | 2018-10-10 | 2021-08-10 | 杭州电子科技大学 | Self-adaptive configuration method of high-performance big data analysis system in cloud computing environment |
- 2019-04-09: CN application CN201910280156.2A filed; granted as CN110046704B (status: Active)
- 2019-04-10: WO application PCT/CN2019/082101 filed (status: Application Filing)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170316312A1 (en) * | 2016-05-02 | 2017-11-02 | Cavium, Inc. | Systems and methods for deep learning processor |
US20180189638A1 (en) * | 2016-12-31 | 2018-07-05 | Intel Corporation | Hardware accelerator template and design framework for implementing recurrent neural networks |
CN108268943A (en) * | 2017-01-04 | 2018-07-10 | 意法半导体股份有限公司 | Hardware accelerator engine |
CN107066239A (en) * | 2017-03-01 | 2017-08-18 | 智擎信息系统(上海)有限公司 | A kind of hardware configuration for realizing convolutional neural networks forward calculation |
CN108710941A (en) * | 2018-04-11 | 2018-10-26 | 杭州菲数科技有限公司 | The hard acceleration method and device of neural network model for electronic equipment |
Non-Patent Citations (3)
Title |
---|
AHMAD SHAWAHNA et al.: "FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review", IEEE Access *
SHUANGLONG LIU et al.: "Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA", ACM Transactions on Reconfigurable Technology and Systems *
LI Jingjun et al.: "Performance analysis of neural networks for the training phase", Journal of Frontiers of Computer Science and Technology *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112840284A (en) * | 2019-08-13 | 2021-05-25 | 深圳鲲云信息科技有限公司 | Automatic driving method and device based on data stream, electronic equipment and storage medium |
CN113272792A (en) * | 2019-10-12 | 2021-08-17 | 深圳鲲云信息科技有限公司 | Local data stream acceleration method, data stream acceleration system and computer equipment |
WO2021068244A1 (en) * | 2019-10-12 | 2021-04-15 | 深圳鲲云信息科技有限公司 | Local data stream acceleration method, data stream acceleration system, and computer device |
CN112905525A (en) * | 2019-11-19 | 2021-06-04 | 中科寒武纪科技股份有限公司 | Method and equipment for controlling calculation of arithmetic device |
CN112905525B (en) * | 2019-11-19 | 2024-04-05 | 中科寒武纪科技股份有限公司 | Method and equipment for controlling computing device to perform computation |
CN111404770A (en) * | 2020-02-29 | 2020-07-10 | 华为技术有限公司 | Network device, data processing method, device, system and readable storage medium |
WO2021259232A1 (en) * | 2020-06-22 | 2021-12-30 | 深圳鲲云信息科技有限公司 | Data processing method and apparatus of ai chip and computer device |
CN111857989A (en) * | 2020-06-22 | 2020-10-30 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and data processing method based on artificial intelligence chip |
CN111752887A (en) * | 2020-06-22 | 2020-10-09 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and data processing method based on artificial intelligence chip |
CN111753994B (en) * | 2020-06-22 | 2023-11-03 | 深圳鲲云信息科技有限公司 | Data processing method and device of AI chip and computer equipment |
CN111857989B (en) * | 2020-06-22 | 2024-02-27 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and data processing method based on same |
CN111752887B (en) * | 2020-06-22 | 2024-03-15 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and data processing method based on same |
CN111753994A (en) * | 2020-06-22 | 2020-10-09 | 深圳鲲云信息科技有限公司 | Data processing method and device of AI chip and computer equipment |
WO2022028224A1 (en) * | 2020-08-03 | 2022-02-10 | 深圳鲲云信息科技有限公司 | Data storage method and apparatus, and device and storage medium |
CN114021708A (en) * | 2021-09-30 | 2022-02-08 | 浪潮电子信息产业股份有限公司 | Data processing method, device and system, electronic equipment and storage medium |
WO2023050807A1 (en) * | 2021-09-30 | 2023-04-06 | 浪潮电子信息产业股份有限公司 | Data processing method, apparatus, and system, electronic device, and storage medium |
CN114461978A (en) * | 2022-04-13 | 2022-05-10 | 苏州浪潮智能科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN116974654A (en) * | 2023-09-21 | 2023-10-31 | 浙江大华技术股份有限公司 | Image data processing method and device, electronic equipment and storage medium |
CN116974654B (en) * | 2023-09-21 | 2023-12-19 | 浙江大华技术股份有限公司 | Image data processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110046704B (en) | 2022-11-08 |
WO2020206637A1 (en) | 2020-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110046704A (en) | Depth network accelerating method, device, equipment and storage medium based on data flow | |
CN110689138B (en) | Operation method, device and related product | |
KR102175044B1 (en) | Apparatus and method for running artificial neural network reverse training | |
CN108241890B (en) | Reconfigurable neural network acceleration method and architecture | |
US20180260709A1 (en) | Calculating device and method for a sparsely connected artificial neural network | |
JP7078758B2 (en) | Improving machine learning models to improve locality | |
JP2020518042A (en) | Processing device and processing method | |
CN110326003A (en) | The hardware node with location-dependent query memory for Processing with Neural Network | |
CN107766935B (en) | Multilayer artificial neural network | |
CN115186821B (en) | Core particle-oriented neural network inference overhead estimation method and device and electronic equipment | |
KR20180102059A (en) | Apparatus and method for executing forward operation of artificial neural network | |
CN109409510A (en) | Neuron circuit, chip, system and method, storage medium | |
CN110309911A (en) | Neural network model verification method, device, computer equipment and storage medium | |
CN103870335A (en) | System and method for efficient resource management of signal flow programmed digital signal processor code | |
CN107402905A (en) | Computational methods and device based on neutral net | |
CN112257368A (en) | Clock layout method, device, EDA tool and computer readable storage medium | |
EP3451240A1 (en) | Apparatus and method for performing auto-learning operation of artificial neural network | |
US20210125042A1 (en) | Heterogeneous deep learning accelerator | |
WO2023125857A1 (en) | Model training method based on machine learning framework system and related device | |
CN112114942A (en) | Streaming data processing method based on many-core processor and computing device | |
WO2023065701A1 (en) | Inner product processing component, arbitrary-precision computing device and method, and readable storage medium | |
KR101825880B1 (en) | Input/output relationship based test case generation method for software component-based robot system and apparatus performing the same | |
CN110490308A (en) | Accelerate design method, terminal device and the storage medium in library | |
CN110442753A (en) | A kind of chart database auto-creating method and device based on OPC UA | |
Gonçalves et al. | Exploring data size to run convolutional neural networks in low density fpgas |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||