CN109948775B - Configurable neural convolution network chip system and configuration method thereof - Google Patents

Configurable neural convolution network chip system and configuration method thereof

Info

Publication number
CN109948775B
CN109948775B (application CN201910128679.5A)
Authority
CN
China
Prior art keywords
weight coefficient
convolution
local pixel
chip system
configurable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910128679.5A
Other languages
Chinese (zh)
Other versions
CN109948775A (en
Inventor
孙建辉
蔡阳健
虞刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN201910128679.5A priority Critical patent/CN109948775B/en
Publication of CN109948775A publication Critical patent/CN109948775A/en
Application granted granted Critical
Publication of CN109948775B publication Critical patent/CN109948775B/en


Abstract

The present disclosure provides a configurable neural convolutional network chip system and a configuration method thereof. The chip system comprises at least one neural network configuration unit, and each configuration unit comprises: a sparse unit, which sparsely configures each local pixel and its corresponding weight coefficient so as to adapt to changes in convolution kernel size; a filter multiply-accumulate array, which convolves each sparsely configured local pixel with a preset convolution kernel, multiplies each convolution result by the weight coefficient of the corresponding local pixel, and accumulates the products; an accumulation unit, which adds a preset bias coefficient to the convolution accumulation result output by the filter multiply-accumulate array so as to adjust how easily the hidden neurons activate; and a max pooling unit, which max-pools the hidden-layer neurons to reduce the number of neurons entering subsequent convolutions.

Description

Configurable neural convolution network chip system and configuration method thereof
Technical Field
The disclosure belongs to the field of chip design, and particularly relates to a configurable neural convolution network chip system and a configuration method thereof.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The neural convolutional network is a feedforward neural network whose artificial neurons respond only to units within a local receptive field, which gives it excellent performance on large-scale image processing. A convolutional network includes convolutional layers and pooling layers.
The inventor finds that current neural network chips and circuit structures have the following problems: the architecture of the neural convolutional network is fixed and cannot adapt to different convolution kernel sizes, which wastes hardware resources; and the neural convolution process as a whole involves a large amount of computation, leading to high power consumption.
Disclosure of Invention
According to one aspect of one or more embodiments of the present disclosure, a configurable neural convolutional network chip system is provided, which achieves low power consumption and configurable resource reuse, and is applied to the identification of image features and image edge contour features.
The configurable neural convolutional network chip system comprises at least one neural network configuration unit, wherein each neural network configuration unit comprises:
the sparse unit, which is used for sparsely configuring each local pixel and its corresponding weight coefficient so as to adapt to changes in convolution kernel size;
the filter multiply-accumulate array, which is used for convolving each sparsely configured local pixel with a preset convolution kernel, multiplying each convolution result by the weight coefficient of the corresponding local pixel, and accumulating;
the accumulation unit, which is used for adding a preset bias coefficient to the convolution accumulation result output by the filter multiply-accumulate array so as to adjust how easily the hidden neurons activate;
the max pooling unit, which is used for max-pooling the hidden-layer neurons to reduce the number of neurons for subsequent convolution.
According to another aspect of one or more embodiments of the present disclosure, there is provided a configuration method for a configurable neural convolutional network chip system, which achieves low power consumption and configurable resource reuse.
The configuration method of the configurable neural convolution network chip system comprises the following steps:
each local pixel and its corresponding weight coefficient are sparsely configured to adapt to changes in convolution kernel size;
each sparsely configured local pixel is convolved with a preset convolution kernel, each convolution result is multiplied by the weight coefficient of the corresponding local pixel, and the products are accumulated;
a preset bias coefficient is added to the convolution accumulation result to adjust how easily the hidden neurons activate;
hidden-layer neurons are max-pooled to reduce the number of neurons for subsequent convolutions.
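The four steps above can be sketched end to end as follows. This is a behavioral sketch only, assuming a single channel, a scalar bias, and non-overlapping pooling windows; the function and parameter names are illustrative and do not come from the patent:

```python
import numpy as np

def configure_and_run(pixels, weights, bias, sparse_mask, pool=2):
    """Behavioral sketch of one configuration unit's dataflow."""
    # Step 1: sparse configuration -- zero out unused connections so each
    # disabled multiply contributes nothing to the accumulation.
    w = weights * sparse_mask

    # Step 2: multiply-accumulate (valid cross-correlation over the image).
    kh, kw = w.shape
    H, W = pixels.shape
    conv = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(conv.shape[0]):
        for j in range(conv.shape[1]):
            conv[i, j] = np.sum(pixels[i:i + kh, j:j + kw] * w)

    # Step 3: add the bias, shifting how easily each hidden neuron activates.
    hidden = conv + bias

    # Step 4: max pooling, reducing the neuron count for the next convolution.
    Hp, Wp = hidden.shape[0] // pool, hidden.shape[1] // pool
    pooled = hidden[:Hp * pool, :Wp * pool].reshape(Hp, pool, Wp, pool).max(axis=(1, 3))
    return pooled
```

For a 4x4 all-ones image convolved with a 3x3 all-ones kernel and zero bias, the 2x2 hidden layer of 9s pools down to a single neuron.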
The beneficial effects of this disclosure are:
(1) In the configurable neural convolutional network chip system of the present disclosure, the connection lines for pixel data, weight coefficients, the activation layer, and hidden-neuron bias coefficients can be pre-warped and rerouted, so that the network structure can be readjusted for different image feature recognition tasks; the architecture of the neural convolutional network can be reconfigured while hardware resource utilization is maximized, making it suitable for different convolution kernel sizes and different edge feature extractions.
(2) The filter multiply-accumulate array employs a low-power management mechanism based on Power Gating (PG) and Clock Gating (CG). Power gating shuts down filter multiply-accumulate arrays that are not working, reducing both dynamic and static power consumption. Clock gating inhibits clock toggling in a filter multiply-accumulate array, reducing dynamic power; the array then enters a hold stage and maintains the convolution data from the previous clock. The connectivity of a variable number of local input-layer neurons to a single hidden-layer neuron can be routed to suit different feature recognition tasks or different convolution kernel operations.
(3) The weight coefficients stored in the weight coefficient memory are broadcast synchronously to the filter multiply-accumulate array in multicast mode, realizing data sharing and fast synchronous loading between the weight coefficient memory and the filter multiply-accumulate array.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a schematic structural diagram of a configurable neural convolutional network chip system according to an embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Interpretation of terms:
The English labels in fig. 1 are interpreted as follows:
MACs: a multiply-add accumulator array.
Filter_MACs_1: the 1st filter multiply-accumulate array, which completes the computation of one hidden neuron in the hidden layer;
Filter_MACs_k: the k-th filter multiply-accumulate array, which completes the computation of the k-th hidden neuron in the hidden layer;
VDD: the working voltage;
Ena_On_Off: the enable switch for the coefficient multicast of a filter multiply-accumulate array; when it is set to 1, i.e., at high level, multicast data can be input to the module's interface through the sub-bus;
PG: power-supply enable; when the PG unit is off, the corresponding module is not powered and enters a shut-down state;
CG: clock enable; when the CG unit is enabled, the clock reaches the module that needs to perform convolution; when the CG unit is disabled, clock toggling is inhibited, and the corresponding convolution module does not update its data but only holds the old data;
Clock: the single synchronous clock signal on which the filter multiply-accumulate arrays operate in this synchronous system.
Example 1
As shown in fig. 1, a configurable neural convolutional network chip system of this embodiment includes at least one neural network configuration unit, where each neural network configuration unit includes: sparse unit, filter multiply accumulate array, accumulate unit and max pooling unit.
In a specific implementation, the sparse unit sparsely configures each local pixel and its corresponding weight coefficient so as to adapt to changes in convolution kernel size.
Specifically, the sparse unit sparsely configures each local pixel and its corresponding weight coefficient as follows:
(1) Pixel-data sparsification: unused pixel points are set to 0, so that in the subsequent multiplication of pixel data with the corresponding coefficient, the multiplicand is 0 and the product is directly 0;
(2) Weight-coefficient sparsification: if the pixel data have been sparsified, the weight coefficients remain unchanged; if the pixel data have not been sparsified, the weight coefficients are sparsified instead, i.e., the corresponding coefficient multiplier is set to 0, so that the product of pixel data and coefficient is likewise 0. If the weight coefficient kernel shrinks, as shown in fig. 1, the connection switch (weights_spark_configure) from the weight coefficient to the multiplication unit is disabled, eliminating that multiplication of multiplicand and multiplier.
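The either/or sparsification rule in (1) and (2) can be sketched as follows. The names are illustrative; the patent specifies only that one operand of each disabled multiply is forced to 0 so the product vanishes:

```python
import numpy as np

def sparsify(pixels, weights, keep_mask, thin_pixels=True):
    """Apply the either/or thinning rule: zeroing either operand of a
    multiply forces that product -- and its contribution to the
    accumulation -- to 0.

    keep_mask: 1 for connections inside the (possibly shrunken) kernel,
    0 for connections to disable.
    """
    if thin_pixels:
        # Rule (1): unused pixel points become 0; weights stay unchanged.
        return pixels * keep_mask, weights
    else:
        # Rule (2): pixels untouched; the corresponding weights become 0.
        return pixels, weights * keep_mask
```

Either path yields the same multiply-accumulate result, which is why only one of the two operands needs to be thinned.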
In a specific implementation, the filter multiply-accumulate array convolves each sparsely configured local pixel with a preset convolution kernel, then multiplies the corresponding convolution results by the weight coefficient of each local pixel and accumulates them.
In one embodiment, the accumulation unit is configured to adjust how easily the hidden neurons activate.
In a specific implementation, the max pooling unit max-pools the hidden-layer neurons to reduce the number of neurons for subsequent convolution.
The max pooling unit down-samples the convolved hidden-layer neurons, reducing their number and hence the number of subsequent convolution operations, thereby achieving dimensionality reduction.
The pooled neurons reuse the configurable neural network of the present disclosure for subsequent convolution operations.
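The down-sampling performed by the max pooling unit can be sketched as follows (non-overlapping k-by-k windows are assumed; the names are illustrative):

```python
import numpy as np

def max_pool(hidden, k=2):
    """k-by-k down-sampling: keep the strongest response in each window,
    shrinking the hidden layer by a factor of k*k before the next convolution."""
    H, W = hidden.shape
    Hp, Wp = H // k, W // k
    return hidden[:Hp * k, :Wp * k].reshape(Hp, k, Wp, k).max(axis=(1, 3))
```

A 4x4 hidden layer of 16 neurons is reduced to 2x2 = 4 neurons, a 4x reduction in subsequent convolution work.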
Example 2
The configurable neural convolutional network chip system of this embodiment, on the basis of embodiment 1, further includes:
and the pixel data memory is used for storing all local pixel data in the image.
Specifically, the image pixel data are pre-compressed: since the goal is edge and contour feature extraction, only single-channel grayscale data are retained and the remaining chrominance channel data are removed, greatly reducing the subsequent computation and the pixel storage overhead;
the image is divided into different local areas, and different edge features are extracted based on the input pixels of each local area, i.e., the local pixel data are convolved with the convolution kernel in preparation for extracting different edge features.
Example 3
The configurable neural convolutional network chip system of this embodiment, on the basis of embodiment 1, further includes:
and the weight coefficient initialization unit is used for initializing, performing integer and aligning the weight coefficient corresponding to each local pixel into data with preset digits.
For example: and initializing the weight coefficient corresponding to each local pixel, carrying out integer transformation and aligning to 16 bits.
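The integer conversion and 16-bit alignment might look as follows. The patent does not specify the exact quantization rule, so a symmetric scaling scheme is assumed here purely for illustration:

```python
import numpy as np

def quantize_weights(w, bits=16):
    """Scale floating-point weights to signed integers of the preset width
    (16 bits here, matching the example in the text).  A symmetric scheme
    is an assumption; the patent only says 'integer' and '16-bit aligned'."""
    qmax = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    scale = qmax / np.max(np.abs(w))    # map the largest magnitude to qmax
    return np.round(w * scale).astype(np.int16), scale
```

The returned scale would let downstream logic recover the real-valued accumulation result after the integer multiply-accumulate.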
Example 4
The configurable neural convolutional network chip system of this embodiment, on the basis of embodiment 1, further includes:
the clock gating unit is connected with the filter multiply-accumulate array; the clock gating unit is used for realizing whether to update the calculation data according to the output clock signal.
The filter multiply accumulate array is composed of several filter MAC modules. If the data calculated by a certain filter MAC module in the filter multiply-accumulate array only needs to be kept and does not need to be updated, the clock input by the filter MAC module is forbidden to be a monotone level through a gating clock technology, so that the new calculation data updating is avoided, only old data after convolution operation is reserved, the dynamic energy consumption of the filter MAC module is reduced, when the gating unit is forbidden, the clock passes through again, the calculation data is updated, and meanwhile, the clock passes through after a period of time, so that the influence of charge leakage caused by electric leakage can be reduced.
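The hold-versus-update behavior of a clock-gated filter MAC module can be modelled as follows (a behavioral sketch; the class and attribute names are illustrative, not from the patent):

```python
class ClockGatedMac:
    """Behavioral model of one filter MAC behind a clock gate: when the
    gate is disabled the module sees no clock edge, so it holds its old
    result instead of latching a new one."""
    def __init__(self):
        self.result = 0          # held convolution result
        self.cg_enabled = True   # CG unit state

    def clock_edge(self, new_value):
        if self.cg_enabled:      # clock reaches the module: latch new data
            self.result = new_value
        # else: clock toggling inhibited -> old data is maintained
        return self.result
```

With the gate closed, repeated clock edges leave the stored result untouched, which is exactly the hold stage described above.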
Example 5
The configurable neural convolutional network chip system of this embodiment, on the basis of embodiment 3, further includes:
and the weight coefficient memory is used for storing the weight coefficient processed by the weight coefficient initialization unit.
In a specific implementation, the weight coefficient memory broadcasts its stored weight coefficients synchronously to the filter multiply-accumulate array in multicast mode to realize weight coefficient sharing.
As shown in fig. 1, multicast (multi-broadcast) means that data from the fixed shared-coefficient memory (the 16-bit fixed shared weights memory) travel over the multicast data bus to the coefficient input port of each convolution filter module. If the coefficient input switch (Ena_on_off) of a filter MAC module (Filter_MAC) is turned on, the shared coefficients flow into that module; otherwise the switch (Ena_on_off) stays off and the coefficients do not enter.
(By analogy, multicast is one of the three basic destination address types of IPv6 packets: one-point-to-multipoint communication.)
In the chip system of this embodiment, the convolution processing hardware architecture of multiple local pixel data matrices and a shared coefficient data matrix, together with the coefficient connection routing applied before the pixel data are convolved with the weight coefficients, can be configured to adapt to different convolution kernel sizes and different feature extractions. The integer-aligned coefficient memory is accessed and, to exploit coefficient sharing, the shared-weight multicast network plays the data of the coefficient memory bank over the bus to the coefficient-bus input interface of each filter multiply-accumulate array. The parallel filter multiply-accumulate arrays can process multiple local pixel input arrays synchronously, producing different feature maps for subsequent pooling.
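The multicast loading of shared coefficients can be sketched as follows (a behavioral model; the dict fields are illustrative stand-ins for each module's Ena_on_off switch and coefficient input port):

```python
def multicast_weights(shared_weights, macs):
    """One read of the shared coefficient memory is broadcast over the bus;
    each Filter_MAC latches the coefficients only if its enable switch is
    high.  `macs` is a list of dicts with 'ena' and 'weights' fields."""
    for mac in macs:
        if mac['ena']:                   # switch on: coefficients flow in
            mac['weights'] = shared_weights
    return macs
```

A single memory access thus loads every enabled array simultaneously, which is the fast synchronous loading claimed in the beneficial effects.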
Example 6
The configurable neural convolutional network chip system of this embodiment, on the basis of embodiment 1, further includes:
a power supply gating unit connected to the filter multiply-accumulate array; and the power supply gating unit is used for controlling the start-stop working state of the filter multiply-accumulate array.
If a certain filter MAC module in the filter multiply-accumulate array does not need to be calculated, the power supply of the filter MAC module is cut off by using the power supply gating unit, and the static energy consumption and the dynamic energy consumption of the whole chip are reduced.
The filter multiply-accumulate array of this embodiment can be power-gated by a PG (power gating) unit, if the filter multiply-accumulate array does not need to perform calculation, PG is prohibited, i.e. the working power supply is turned off to eliminate static power consumption and dynamic power consumption, and when the filter multiply-accumulate array needs to be changed, the PG gate needs to be enabled first to supply power; the filter multiply-accumulate array can perform clock gating through a CG (clock gating) unit, temporarily stops the data updating of the filter multiply-accumulate array, enters a maintaining stage, stores the calculated data of the previous clock, and reduces the dynamic power consumption. The coefficient data after the 16-bit sharing coefficient memory is subjected to integer and 16-bit format alignment is stored in the 16-bit sharing coefficient memory.
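The power-gating behavior described above can be modelled as follows (a behavioral sketch; unpowered logic is modelled as losing its state, and all names are illustrative):

```python
class PowerGatedMac:
    """Behavioral model of a PG-controlled filter MAC: with the power gate
    off, the module is unpowered (state lost, neither static nor dynamic
    power drawn); it must be re-enabled before it can compute again."""
    def __init__(self):
        self.powered = True
        self.result = None

    def set_power(self, on):
        self.powered = on
        if not on:
            self.result = None    # unpowered logic retains no state

    def mac(self, pixels, weights):
        if not self.powered:
            raise RuntimeError("PG gate must be enabled before computing")
        self.result = sum(p * w for p, w in zip(pixels, weights))
        return self.result
```

This contrasts with clock gating, which merely holds old data: power gating discards the state entirely in exchange for eliminating static leakage as well.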
Example 7
The configurable neural convolutional network chip system of this embodiment, on the basis of embodiment 1, further includes:
and the offset coefficient memory is used for storing a preset offset coefficient.
Wherein the bias coefficients are used to adjust the neuron output.
In the chip system of the present embodiment, the pixel/coefficient connections have a routing function, power consumption is low, resource utilization is maximized, and the neural convolutional network is configurable.
Example 8
The configuration method of the configurable neural convolutional network chip system of the embodiment comprises the following steps:
Step 1: each local pixel and its corresponding weight coefficient are sparsely configured to adapt to changes in convolution kernel size;
Step 2: each sparsely configured local pixel is convolved with a preset convolution kernel, each convolution result is multiplied by the weight coefficient of the corresponding local pixel, and the products are accumulated;
Step 3: a preset bias coefficient is added to the convolution accumulation result to adjust how easily the hidden neurons activate;
Step 4: hidden-layer neurons are max-pooled to reduce the number of neurons for subsequent convolutions.
In a specific implementation, before each local pixel and its corresponding weight coefficient are sparsely configured, the method includes:
pre-compressing the image pixel data, retaining only single-channel grayscale data and removing the remaining chrominance channel data;
dividing the image into different local areas to obtain all the local pixel data in the image;
and initializing the weight coefficient corresponding to each local pixel, converting it to integer form, and aligning it to data of a preset bit width.
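The grayscale pre-compression step can be sketched as follows (the ITU-R BT.601 luma weights are an assumption; the patent says only that a single gray channel is retained and the chroma data removed):

```python
import numpy as np

def precompress(rgb):
    """Keep only a single luminance channel for edge/contour extraction,
    discarding the chroma data.  BT.601 luma weights are assumed here."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```

Dropping two of the three channels cuts the pixel storage and the subsequent convolution workload to one third before any sparsification is applied.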
In another embodiment, the power gating unit controls the start/stop working state of the filter multiply-accumulate array.
If a certain filter MAC module in the filter multiply-accumulate array does not need to compute, the power gating unit cuts off that module's power supply, reducing both the static and dynamic energy consumption of the whole chip.
In another embodiment, whether the computation data are updated is determined by the clock signal output by the clock gating unit.
The filter multiply-accumulate array is composed of several filter MAC modules. If the data computed by a certain filter MAC module only needs to be held rather than updated, clock gating forces that module's clock input to a constant level. This prevents new computation data from being latched, preserves only the old post-convolution data, and reduces the module's dynamic energy consumption. When the gate is re-enabled, the clock passes through again and the computation data are updated; letting the clock through again after a period of time also mitigates the effect of charge loss caused by leakage.
In another embodiment, the weight coefficients stored in the weight coefficient memory are synchronously broadcast to the filter multiply accumulate array via a multicast format to enable weight coefficient sharing between the weight coefficient memory and the filter multiply accumulate array.
Wherein, multicast is one of 3 basic destination address types of IPv6 data packets, and multicast is one-point-to-multipoint communication.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (11)

1. A configurable neural convolutional network chip system, comprising at least one neural network configuration unit, each neural network configuration unit comprising:
the sparse unit is used for sparsely configuring each local pixel and its corresponding weight coefficient so as to adapt to changes in convolution kernel size; the image is divided into different local areas, and different edge features are extracted based on the input pixels of each local area, i.e., the local pixel data are convolved with convolution kernels in preparation for extracting different edge features; the filter multiply-accumulate array is used for convolving each sparsely configured local pixel with a preset convolution kernel, multiplying each convolution result by the weight coefficient of the corresponding local pixel, and accumulating; the weight coefficient memory synchronously broadcasts its stored weight coefficients to the filter multiply-accumulate array in multicast mode so as to realize weight coefficient sharing;
the convolution processing hardware architecture of multiple local pixel data matrices and a shared coefficient data matrix, and the coefficient connection routing applied before the pixel data are convolved with the weight coefficients, can be configured to adapt to different convolution kernel sizes and different feature extractions; the integer-aligned weight coefficient memory is accessed and, to exploit coefficient sharing, the shared-weight multicast network plays the data of the coefficient memory bank over the bus to the coefficient-bus input interface of the filter multiply-accumulate array; the parallel filter multiply-accumulate arrays can synchronously process multiple local pixel input arrays to obtain different feature maps for subsequent pooling;
the accumulation unit is used for adding a preset bias coefficient to the convolution accumulation result output by the filter multiply-accumulate array so as to adjust how easily the hidden neurons activate;
a maximum pooling unit for performing maximum pooling processing on hidden layer neurons to reduce the number of neurons for subsequent convolution;
the clock gating unit is connected with the filter multiply-accumulate array; the clock gating unit is used for realizing whether to update the calculation data according to the output clock signal.
2. The configurable neuro-convolutional network chip system of claim 1, further comprising:
and the pixel data memory is used for storing all local pixel data in the image.
3. The configurable neuro-convolutional network chip system of claim 1, further comprising:
and the weight coefficient initialization unit is used for initializing, performing integer and aligning the weight coefficient corresponding to each local pixel into data with preset digits.
4. The configurable neuro-convolutional network chip system of claim 3, further comprising:
and the weight coefficient memory is used for storing the weight coefficient processed by the weight coefficient initialization unit.
5. The configurable neuro-convolutional network chip system of claim 1, further comprising:
a power supply gating unit connected to the filter multiply-accumulate array; and the power supply gating unit is used for controlling the start-stop working state of the filter multiply-accumulate array.
6. The configurable neuro-convolutional network chip system of claim 1, further comprising:
and the offset coefficient memory is used for storing a preset offset coefficient.
7. A method of configuring a configurable neural convolutional network chip system as claimed in any one of claims 1-6, comprising:
each local pixel and its corresponding weight coefficient are sparsely configured to adapt to changes in convolution kernel size;
each sparsely configured local pixel is convolved with a preset convolution kernel, each convolution result is multiplied by the weight coefficient of the corresponding local pixel, and the products are accumulated;
a preset bias coefficient is added to the convolution accumulation result to adjust how easily the hidden neurons activate;
hidden-layer neurons are max-pooled to reduce the number of neurons for subsequent convolutions.
8. The method of configuring a configurable neural convolutional network chip system of claim 7, wherein before sparsely configuring each local pixel and its corresponding weight coefficient, respectively, comprises:
pre-compressing image pixel data, only retaining single-channel gray data, and removing the rest chroma channel data;
dividing the image into different local areas to obtain all local pixel data in the image; and initializing the weight coefficient corresponding to each local pixel, and carrying out integer and alignment to data with preset digits.
9. The method of configuring a configurable neural convolutional network chip system of claim 7, wherein before sparsely configuring each local pixel and its corresponding weight coefficient, further comprising: and controlling the start-stop working state of the filter multiply-accumulate array by using the power supply gating unit.
10. The method of configuring a configurable neural convolutional network chip system of claim 7, wherein before sparsely configuring each local pixel and its corresponding weight coefficient, further comprising: whether the calculation data is updated or not is realized by using the clock signal output by the clock gating unit.
11. The method of configuring a configurable neural convolutional network chip system of claim 7, wherein before sparsely configuring each local pixel and its corresponding weight coefficient, further comprising: and synchronously broadcasting the weight coefficients stored in the weight coefficient memory to the filter multiply-accumulate array through a multicast mode so as to realize the sharing of the weight coefficients between the weight coefficient memory and the filter multiply-accumulate array.
CN201910128679.5A 2019-02-21 2019-02-21 Configurable neural convolution network chip system and configuration method thereof Expired - Fee Related CN109948775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910128679.5A CN109948775B (en) 2019-02-21 2019-02-21 Configurable neural convolution network chip system and configuration method thereof


Publications (2)

Publication Number Publication Date
CN109948775A CN109948775A (en) 2019-06-28
CN109948775B true CN109948775B (en) 2021-10-19

Family

ID=67006908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910128679.5A Expired - Fee Related CN109948775B (en) 2019-02-21 2019-02-21 Configurable neural convolution network chip system and configuration method thereof

Country Status (1)

Country Link
CN (1) CN109948775B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396165A (en) * 2020-11-30 2021-02-23 珠海零边界集成电路有限公司 Arithmetic device and method for convolutional neural network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1010437B (en) * 1988-06-02 1990-11-14 清华大学 Real-time image neighbourhood processor
US9904874B2 (en) * 2015-11-05 2018-02-27 Microsoft Technology Licensing, Llc Hardware-efficient deep convolutional neural networks
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
CN107832841B (en) * 2017-11-14 2020-05-05 福州瑞芯微电子股份有限公司 Power consumption optimization method and circuit of neural network chip
CN107918794A (en) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on computing array
CN108805266B (en) * 2018-05-21 2021-10-26 南京大学 Reconfigurable CNN high-concurrency convolution accelerator
CN108875917A (en) * 2018-06-28 2018-11-23 中国科学院计算技术研究所 A kind of control method and device for convolutional neural networks processor

Also Published As

Publication number Publication date
CN109948775A (en) 2019-06-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211019