CN108647777A - Data mapping system and method for parallel convolution computation - Google Patents
- Publication number: CN108647777A (application CN201810432269.5A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- data
- feature map
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a data mapping system and method for parallel convolution computation, belonging to the field of neural network technology. The data mapping system comprises an input feature map cache module, a mapping logic module, an output feature map cache module, a weight cache module, a convolution computing array, and a control logic module. The input feature map cache module is connected to the control logic module and the mapping logic module; the weight cache module is connected to the control logic module and the mapping logic module; the convolution computing array is connected to the control logic module, the mapping logic module, and the output feature map cache module; and the output feature map cache module is connected to the control logic module. The system eliminates computing resources that are idle or perform invalid computation, improves computing-resource utilization, and has good application value.
Description
Technical field
The present invention relates to the field of neural network technology, and specifically provides a data mapping system and method for parallel convolution computation.
Background technology
With the development of artificial intelligence (AI), convolutional neural networks (CNNs) have come into wide use. Mainstream CNN models are complex, process large volumes of data, and differ greatly in architecture from layer to layer, so a hardware implementation that is both high-performance and general-purpose is not easy: resource utilization and energy efficiency must both be considered. Implementing every layer of a whole network model in hardware at once is impractical, and power, area, and resource utilization are hard to bring to satisfactory levels. The usual solution is to trade time for area: the model is processed layer by layer in blocks, the circuit is designed as a general basic unit, and the whole model is built up by time-sharing under a control circuit, while efficient data mapping raises resource utilization and thereby circuit performance. In prior-art hardware implementations of certain CNN models, when the convolution kernel's sliding stride is greater than 1, invalid computations occur and resource utilization drops. On the other hand, with a fixed computing-array design, when the output feature map does not match the array size, some resources do not participate in computation and are wasted; this waste of computing resources keeps the overall performance from reaching an ideal result.
Invention content
The technical task of the present invention is, in view of the above problems, to provide a data mapping system for parallel convolution computation that eliminates computing resources that are idle or perform invalid computation and improves computing-resource utilization.
A further technical task of the present invention is to provide a data mapping method for parallel convolution computation.
To achieve the above objects, the present invention provides the following technical solutions:
A data mapping system for parallel convolution computation, comprising an input feature map cache module, a mapping logic module, an output feature map cache module, a weight cache module, a convolution computing array, and a control logic module. The input feature map cache module is connected to the control logic module and the mapping logic module; the weight cache module is connected to the control logic module and the mapping logic module; the convolution computing array is connected to the control logic module, the mapping logic module, and the output feature map cache module; and the output feature map cache module is connected to the control logic module.
The data mapping system for parallel convolution computation increases the parallelism of convolution computation by recombining the input feature map, eliminating computing resources that are idle or perform invalid computation. Specifically, the input feature map is partitioned into regular blocks and recombined by effective mapping, replacing the portions that perform invalid or no computation with portions that perform valid computation. This increases the overall parallelism of the convolution, raises the utilization of computing resources, and improves system performance.
Preferably, the input feature map cache module serves as a cache for external input data. According to the commands issued by the control logic module, the mapping logic module fetches data from the input feature map cache module and the weight cache module and sends the fetched data to the convolution computing array; the convolution computing array sends the completed results to the output feature map cache module.
Preferably, the convolution computing array consists of N rows by N columns of convolution computing units, with adjacent convolution computing units interconnected.
Each convolution computing unit contains 2x2 PEs (processing elements); during convolution computation, each PE computes one pixel of one output feature map.
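The one-pixel-per-PE assignment can be sketched in Python as follows; this is a behavioral model under assumed shapes (a 2x2 block of output pixels per unit), not the patent's circuit.

```python
import numpy as np

def conv_unit_2x2(in_map: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Compute the 2x2 block of output pixels assigned to one convolution
    computing unit: each of the 2x2 PEs produces one output pixel as the
    dot product of a kernel-sized input window with the kernel."""
    k = kernel.shape[0]
    out = np.zeros((2, 2))
    for r in range(2):          # one PE per output pixel
        for c in range(2):
            patch = in_map[r*stride:r*stride+k, c*stride:c*stride+k]
            out[r, c] = np.sum(patch * kernel)
    return out

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.ones((2, 2))
print(conv_unit_2x2(x, k))      # four overlapping 2x2 window sums
```

Each PE's work here is one window-times-kernel reduction; the N x N array of such units then tiles the output feature map.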
A data mapping method for parallel convolution computation: the method partitions the input feature map into regular blocks and recombines it by mapping, increasing the parallelism of the convolution computation. The mapping logic sends data obtained from the recombined input feature map to the convolution computing array, and the convolution computing array sends the completed results to the output feature map cache module.
Preferably, when the convolution kernel's sliding stride is greater than 1, the positions of the input feature map where the sliding kernel would perform invalid computation are filled with positions that perform valid computation, and the recombined input feature map is used as the input of the convolution units.
Preferably, the invalid-computation positions are filled with valid-computation data as follows: the invalid portion of the array is filled with data from the valid computing positions toward the upper right of the matrix, that is, the data in the input feature map that participate in valid computation are translated rightward and downward and copied into adjacent convolution computing units.
Preferably, the data copied into adjacent convolution computing units undergo convolution with the kernel weight values read from the weight cache module until the newly combined feature map has traversed all weight values, and the results are sent to the output feature map cache module.
Preferably, when the output feature map does not match the computing-array size, the multichannel input feature map is divided into smaller feature-map units, and the units at the same position in adjacent channels are recombined into a new input feature map, which is used as the input of the convolution computing array.
Preferably, the division ratio of the multichannel input feature map depends on the output feature map size, and the number of channels depends on the convolution computing array size and the output feature map size.
Compared with the prior art, the data mapping method for parallel convolution computation of the present invention has the following prominent beneficial effects: it recombines the input feature map by effective mapping and increases the parallelism of convolution computation. In particular, the input feature map is partitioned into regular blocks, the portions that perform invalid or no computation are replaced with valid computing portions, idle or invalid computing resources are eliminated, the overall parallelism of the convolution increases, the utilization of computing resources rises, and system performance improves; the method therefore has good application value.
Description of the drawings
Fig. 1 is a topology diagram of the data mapping system for parallel convolution computation of the present invention;
Fig. 2 is a topology diagram of a convolution computing unit performing convolution computation in the data mapping system of the present invention;
Fig. 3 is a schematic diagram of the data mapping method of the present invention when the convolution kernel's sliding stride is greater than 1;
Fig. 4 is a schematic diagram of the data mapping method of the present invention when the output feature map and the computing-array size do not match.
Specific implementation
The data mapping system and method for parallel convolution computation of the present invention are described in further detail below with reference to the drawings and an embodiment.
Embodiment
As shown in Fig. 1, the data mapping system for parallel convolution computation of the present invention includes an input feature map cache module, a mapping logic module, an output feature map cache module, a weight cache module, a convolution computing array, and a control logic module.
The input feature map cache module serves as a cache for external input data and is connected to the control logic module and the mapping logic module.
The convolution computing array consists of N rows by N columns of convolution computing units, with adjacent convolution computing units interconnected. As shown in Fig. 2, each convolution computing unit contains 2x2 PEs; during convolution computation, each PE computes one pixel of one output feature map.
According to the commands issued by the control logic module, the mapping logic module obtains data from the input feature map cache module and the weight cache module, and sends the obtained data to the convolution computing array; the convolution computing array sends the completed results to the output feature map cache module.
The weight cache module is connected to the control logic module and the mapping logic module. The convolution computing array is connected to the control logic module, the mapping logic module, and the output feature map cache module. The output feature map cache module is connected to the control logic module.
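The Fig. 1 dataflow can be sketched as a minimal behavioral model; the class and method names below are illustrative assumptions, not the patent's interfaces.

```python
class ConvArray:
    """Stands in for the convolution computing array; results go to the
    output feature map cache (modeled as a plain list)."""
    def __init__(self, out_cache):
        self.out_cache = out_cache

    def compute(self, x, w):
        y = x * w                 # 1x1 convolution, for illustration only
        self.out_cache.append(y)  # completed result to the output cache
        return y

class MappingLogic:
    """On each control-logic command, fetch from the input-feature-map and
    weight caches and forward the pair to the array."""
    def __init__(self, in_cache, w_cache, array):
        self.in_cache, self.w_cache, self.array = in_cache, w_cache, array

    def on_command(self, addr):
        return self.array.compute(self.in_cache[addr], self.w_cache[addr])

out_cache = []
ml = MappingLogic(in_cache=[2.0, 3.0], w_cache=[0.5, 4.0], array=ConvArray(out_cache))
ml.on_command(0)
ml.on_command(1)
print(out_cache)  # [1.0, 12.0]
```

The point of the model is the direction of the arrows: the mapping logic only reads the two caches and feeds the array, and only the array writes the output cache, matching the module connections above.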
The data mapping method for parallel convolution computation of the present invention partitions the input feature map into regular blocks and recombines it by mapping, increasing the parallelism of the convolution computation. The mapping logic sends data obtained from the recombined input feature map to the convolution computing array, and the convolution computing array sends the completed results to the output feature map cache module.
When the convolution kernel's sliding stride is greater than 1, the positions of the input feature map where the sliding kernel performs invalid computation are filled with valid computing positions: the invalid portion of the array is filled with data from the valid computing positions toward the upper right of the matrix, that is, the data participating in valid computation are translated rightward and downward and copied into adjacent computing units. The copied data undergo convolution with the kernel weight values read from the weight cache module until the newly combined feature map has traversed all weight values; the recombined input feature map serves as the input of the convolution units, and the results are sent to the output feature map cache module. The specific process is shown in Fig. 3 and illustrated with an example in which the convolution computing array is 4x4, the output feature map is 2x2, the kernel weight matrix is 1x1, and the sliding stride is 2. With a stride of 2, every valid output point is accompanied by one invalid computation, so the effective utilization of the whole computing array is (2x2)/(4x4) = 1/4 and computing resources are wasted. To make full use of them, the valid computing values are replicated in parallel, each copy is convolved with a different convolution kernel, and the intermediate results are cached.
1. At time T0 of the first cycle, the control logic commands the mapping logic to fetch the point-11 value of the input feature map from the input feature map cache into the computing array, and to fetch the corresponding weight k1 from the weight cache into the array.
2. At T1, the point-11 value and weight k1 are multiplied in the computing array; the result out0 goes to the output feature map cache, while the point-11 value is copied to array position 12.
3. At T2, the value at position 12 and weight k2 are multiplied; the result out1 goes to the output feature map cache, while the point-11 value is copied to array position 21.
4. At T3, the value at position 21 and weight k3 are multiplied; the result out2 goes to the output feature map cache, while the point-11 value is copied to array position 22.
5. At T4, the value at position 22 and weight k4 are multiplied; the result out3 goes to the output feature map cache.
The other computing units proceed in the same way until the first feature value of the input feature map has been computed against all weights and the intermediate results preserved; the next feature value is then processed, and so on. After the entire input feature map of the first channel has been computed, the input feature map of the next channel enters computation, and the intermediate results corresponding to the different channels are summed.
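The net effect of the schedule above can be sketched in Python; this is a minimal model that collapses the cycle-by-cycle copy chain (11 → 12 → 21 → 22) into a batch operation, assuming 1x1 kernels k1..k4 as in the walkthrough.

```python
import numpy as np

def replicate_and_convolve(in_map, kernels, stride=2):
    """Net effect of the T0-T4 schedule: each value fetched at a valid
    stride position is reused, via the copy chain, against every 1x1
    kernel, producing one output map per kernel instead of leaving
    three quarters of the array idle."""
    valid = in_map[::stride, ::stride]      # only the valid sliding positions
    return [valid * k for k in kernels]     # out0..out3 for k1..k4

x = np.arange(16, dtype=float).reshape(4, 4)
outs = replicate_and_convolve(x, [1.0, 2.0, 0.5, -1.0])
```

One memory fetch per valid input point now feeds four multiplications, which is the source of the utilization gain claimed above.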
When the output feature map and the computing-array size do not match, the multichannel input feature map is divided into smaller feature-map units, and the units at the same position in adjacent channels are recombined into a new input feature map, which serves as the input of the convolution computing array. The division ratio of the multichannel input feature map depends on the output feature map size; the number of channels depends on the computing-array size and the output feature map size. The specific process is shown in Fig. 4 and illustrated with an example in which the convolution computing array is 3x3, the output feature map is 2x2, the kernel is 1x1, and the sliding stride is 1. In this case the computing array is larger than the output feature map and is not an integer multiple of it, so the computing-resource utilization is (2x2)/(3x3) = 4/9, and the resources that do not participate in computation are wasted. The input feature map is therefore cut into blocks, and the blocks of different channels are combined so that all resources of the computing array are fully used. The detailed process is as follows:
1. At time T0 of the first cycle, the control logic commands the mapping logic to load the point-11 values at the same position of channels 1, 2, 3, and 4 into positions 11, 12, 21, and 22 of the computing array respectively; four parallel computations simultaneously produce the point-11 values of 4 output feature maps, which are staged in the output feature map cache.
2. At T1, the control logic commands the mapping logic to load the point-12 values at the same position of channels 1, 2, 3, and 4 into positions 11, 12, 21, and 22 of the computing array; four parallel computations simultaneously produce the point-12 values of the 4 output feature maps, which are staged in the output feature map cache.
3. At T2, the point-21 values of channels 1, 2, 3, and 4 are likewise loaded into positions 11, 12, 21, and 22, and four parallel computations produce the point-21 values of the 4 output feature maps, which are staged in the output feature map cache.
4. At T3, the point-22 values of channels 1, 2, 3, and 4 are loaded into positions 11, 12, 21, and 22, and four parallel computations produce the point-22 values of the 4 output feature maps, which are staged in the output feature map cache.
At the end of T3, all point values of the output feature maps of all four channels have been computed.
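The channel packing of the Fig. 4 walkthrough can be sketched as follows; the layout (four channels into array positions 11, 12, 21, 22, one cycle per output pixel) follows the steps above, while the array shapes are illustrative assumptions.

```python
import numpy as np

def pack_four_channels(chans: np.ndarray) -> np.ndarray:
    """chans: (4, H, W) input feature maps. Returns (H*W, 2, 2) array
    loads: at each cycle t (one per pixel position), position (r, c) of
    the 2x2 array block holds channel 2*r + c's value for that pixel,
    so four channels compute in parallel every cycle."""
    four, h, w = chans.shape
    assert four == 4
    loads = chans.reshape(4, h * w).T        # rows: pixels, cols: channels
    return loads.reshape(h * w, 2, 2)        # channels laid out 11,12,21,22

chans = np.arange(16, dtype=float).reshape(4, 2, 2)  # four 2x2 input maps
loads = pack_four_channels(chans)
# cycle T0 carries point "11" of channels 1-4: [[0, 4], [8, 12]]
print(loads[0])
```

For a 2x2 output map this yields exactly four cycles (T0 to T3), matching the walkthrough, and every packed array position does valid work on every cycle.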
The embodiment described above is merely a preferred specific implementation of the present invention; the usual variations and substitutions made by those skilled in the art within the scope of the technical solution of the present invention should all be included within the protection scope of the present invention.
Claims (9)
1. A data mapping system for parallel convolution computation, characterized in that: the system comprises an input feature map cache module, a mapping logic module, an output feature map cache module, a weight cache module, a convolution computing array, and a control logic module; the input feature map cache module is connected to the control logic module and the mapping logic module; the weight cache module is connected to the control logic module and the mapping logic module; the convolution computing array is connected to the control logic module, the mapping logic module, and the output feature map cache module; and the output feature map cache module is connected to the control logic module.
2. The data mapping system for parallel convolution computation according to claim 1, characterized in that: the input feature map cache module serves as a cache for external input data; according to the commands issued by the control logic module, the mapping logic module obtains data from the input feature map cache module and the weight cache module and sends the obtained data to the convolution computing array; and the convolution computing array sends the completed results to the output feature map cache module.
3. The data mapping system for parallel convolution computation according to claim 1 or 2, characterized in that: the convolution computing array consists of N rows by N columns of convolution computing units, with adjacent convolution computing units interconnected.
4. A data mapping method for parallel convolution computation, characterized in that: the method partitions the input feature map into regular blocks and recombines it by mapping, increasing the parallelism of the convolution computation; the mapping logic sends data obtained from the recombined input feature map to the convolution computing array; and the convolution computing array sends the completed results to the output feature map cache module.
5. The data mapping method for parallel convolution computation according to claim 4, characterized in that: when the convolution kernel's sliding stride is greater than 1, the positions of the input feature map where the sliding kernel would perform invalid computation are filled with positions that perform valid computation, and the recombined input feature map is used as the input of the convolution units.
6. The data mapping method for parallel convolution computation according to claim 4 or 5, characterized in that: the filling of the invalid-computation positions with valid-computation positions comprises filling the invalid portion of the array with data from the valid computing positions toward the upper right of the matrix, that is, translating the data of the input feature map that participate in valid computation rightward and downward and copying them into adjacent convolution computing units.
7. The data mapping method for parallel convolution computation according to claim 6, characterized in that: the data copied into adjacent convolution computing units undergo convolution with the kernel weight values read from the weight cache module until the newly combined feature map has traversed all weight values, and the results are sent to the output feature map cache module.
8. The data mapping method for parallel convolution computation according to claim 4, characterized in that: when the output feature map does not match the computing-array size, the multichannel input feature map is divided into smaller feature-map units, and the units at the same position in adjacent channels are recombined into a new input feature map, which is used as the input of the convolution computing array.
9. The data mapping method for parallel convolution computation according to claim 8, characterized in that: the division ratio of the multichannel input feature map depends on the output feature map size, and the number of channels depends on the convolution computing array size and the output feature map size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810432269.5A CN108647777A (en) | 2018-05-08 | 2018-05-08 | A kind of data mapped system and method for realizing that parallel-convolution calculates |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647777A true CN108647777A (en) | 2018-10-12 |
Family
ID=63749398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810432269.5A Pending CN108647777A (en) | 2018-05-08 | 2018-05-08 | A kind of data mapped system and method for realizing that parallel-convolution calculates |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647777A (en) |
- 2018-05-08: CN201810432269.5A patent/CN108647777A/en, status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512723A (en) * | 2016-01-20 | 2016-04-20 | 南京艾溪信息科技有限公司 | Artificial neural network calculating device and method for sparse connection |
CN107506828A (en) * | 2016-01-20 | 2017-12-22 | 南京艾溪信息科技有限公司 | Computing device and method |
CN106446546A (en) * | 2016-09-23 | 2017-02-22 | 西安电子科技大学 | Meteorological data complement method based on automatic convolutional encoding and decoding algorithm |
CN106779060A (en) * | 2017-02-09 | 2017-05-31 | 武汉魅瞳科技有限公司 | A kind of computational methods of the depth convolutional neural networks for being suitable to hardware design realization |
CN107153873A (en) * | 2017-05-08 | 2017-09-12 | 中国科学院计算技术研究所 | A kind of two-value convolutional neural networks processor and its application method |
CN107918794A (en) * | 2017-11-15 | 2018-04-17 | 中国科学院计算技术研究所 | Neural network processor based on computing array |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110163338A (en) * | 2019-01-31 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Chip operation method, device, terminal and chip with operation array |
WO2020156508A1 (en) * | 2019-01-31 | 2020-08-06 | 腾讯科技(深圳)有限公司 | Method and device for operating on basis of chip with operation array, and chip |
CN110163338B (en) * | 2019-01-31 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Chip operation method and device with operation array, terminal and chip |
CN112966807A (en) * | 2019-12-13 | 2021-06-15 | 上海大学 | Convolutional neural network implementation method based on storage resource limited FPGA |
CN112966807B (en) * | 2019-12-13 | 2022-09-16 | 上海大学 | Convolutional neural network implementation method based on storage resource limited FPGA |
CN112101284A (en) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | Image recognition method, training method, device and system of image recognition model |
CN114429207A (en) * | 2022-01-14 | 2022-05-03 | 支付宝(杭州)信息技术有限公司 | Convolution processing method, device, equipment and medium for feature map |
CN114429207B (en) * | 2022-01-14 | 2024-09-06 | 支付宝(杭州)信息技术有限公司 | Convolution processing method, device, equipment and medium for feature map |
CN114565501A (en) * | 2022-02-21 | 2022-05-31 | 格兰菲智能科技有限公司 | Data loading method and device for convolution operation |
CN114565501B (en) * | 2022-02-21 | 2024-03-22 | 格兰菲智能科技有限公司 | Data loading method and device for convolution operation |
CN116306855A (en) * | 2023-05-17 | 2023-06-23 | 之江实验室 | Data processing method and device based on memory and calculation integrated system |
CN116306855B (en) * | 2023-05-17 | 2023-09-01 | 之江实验室 | Data processing method and device based on memory and calculation integrated system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181012 |