CN111882051B - Global broadcast data input circuit for neural network processing

Global broadcast data input circuit for neural network processing

Info

Publication number
CN111882051B
Authority
CN
China
Prior art keywords
data
module
state
input
data packet
Prior art date
Legal status
Active
Application number
CN202010746509.6A
Other languages
Chinese (zh)
Other versions
CN111882051A (en)
Inventor
韩军
张权
张永亮
曾晓洋
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date: 2020-07-29
Filing date: 2020-07-29
Publication date: 2022-05-20
Application filed by Fudan University
Priority to CN202010746509.6A
Publication of CN111882051A
Application granted
Publication of CN111882051B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of integrated circuits, and specifically relates to a global broadcast data input circuit for neural network processing. The circuit of the invention comprises: a top module that records the number of data receptions, a vertical bus module that broadcasts input data in the vertical direction, a horizontal bus module that broadcasts input data in the horizontal direction, and broadcast transmitting modules that deliver data to the designated operation units. The circuit partitions the data path into a two-level bus in the horizontal and vertical directions, so that data can be sent with high parallelism while greatly reducing the extra area and power overhead that the huge bandwidth of a single-bus design would incur. Meanwhile, a handshaking mechanism between operation-unit identification numbers and input data tags is introduced in the broadcast transmitting modules, which increases data reuse, reduces the number of memory accesses of the circuit, and improves its overall energy efficiency while guaranteeing that the input circuit delivers data correctly. The invention can effectively improve the transmission efficiency of input data in neural network processing.

Description

Global broadcast data input circuit for neural network processing
Technical Field
The invention belongs to the technical field of integrated circuits, and particularly relates to a global broadcast data input circuit for neural network processing.
Background
Neural network algorithms have been applied successfully in important fields such as computer vision, speech recognition and robot control, but these applications keep raising the requirements on algorithm accuracy and complexity, so their implementation faces a series of challenging problems. Recent research on neural network processor architectures shows that an array-based parallel spatial processor architecture, combined with a row-stationary dataflow strategy and dedicated data transmission channels, can make good use of the high parallelism and high data reusability in neural network algorithms, greatly reducing the number of memory accesses and improving the overall energy efficiency of the processor.
The data transmission path is the key medium for data interaction between the storage system and the convolution operation array, and its hardware implementation is mainly concerned with transmitting data with high concurrency while limiting the area and power overhead caused by bandwidth. One option is to send input data directly to every operation unit in the convolution operation array, but as the array scales up the bandwidth overhead of this direct delivery becomes very high. Another option is to partition the data path with a two-level bus, whose bandwidth overhead is comparatively small. The two-level bus is a common hardware implementation that suits the array-based parallel spatial neural network processor architecture and greatly reduces the area and power overhead caused by data bandwidth. This design therefore adopts a row-stationary dataflow strategy and uses a two-level bus structure to complete highly parallel global input of data.
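For intuition about the bandwidth argument above, the sketch below gives a first-order count of dedicated broadcast wires for direct delivery versus a two-level bus (one vertical trunk plus one horizontal trunk per row). It ignores the short taps from each horizontal bus to its operation units, and the 12 x 14 array size is a hypothetical example rather than a figure from this patent; the 73-bit packet width matches the packet format described later.

```python
def direct_broadcast_wires(rows: int, cols: int, packet_bits: int) -> int:
    """Every operation unit gets its own dedicated copy of the packet bus."""
    return rows * cols * packet_bits


def two_level_bus_wires(rows: int, cols: int, packet_bits: int) -> int:
    """One vertical trunk feeds the per-row broadcast units; each row then
    shares a single horizontal trunk among all of its operation units."""
    vertical_trunk = packet_bits            # top module -> vertical bus
    horizontal_trunks = rows * packet_bits  # one horizontal bus per row
    return vertical_trunk + horizontal_trunks


if __name__ == "__main__":
    R, C, W = 12, 14, 73  # hypothetical array size, 73-bit packet
    print("direct broadcast:", direct_broadcast_wires(R, C, W), "wires")
    print("two-level bus   :", two_level_bus_wires(R, C, W), "wires")
```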
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a global broadcast data input circuit for neural network processing which cooperates with a row-stationary dataflow strategy and adopts a two-level bus structure to complete the global transmission of input data.
The invention provides a global broadcast data input circuit for neural network processing, which structurally comprises a top layer module, a vertical bus module, a horizontal bus module and a broadcast transmitting module; wherein:
The top module receives data packets from the storage system and, according to signals inside the packets, automatically records the number of data receptions and automatically switches the identification-number (ID) array. Specifically, the top module computes the number of data transmissions of a single convolution layer from an external control signal and records the number of data rows received, so that the transmission count stays correct, and it sends the ID-array switching signal to the broadcast transmitting units so that data are sent in order. The input of the top module consists of data packets and data tags: a data packet is an array of input data and contains 8-bit input data values, the masks corresponding to those values and a convolution-row end signal; the data tags include row tags and column tags.
The entire global broadcast data input process contains six states: the initialization state (eIdle), configuration state (eConfig), load-control-information state (eLoadctrl), read-data-tag state (eUpdatetag), read-input-data-packet state (eTrans) and current-packet-transfer-complete state (eTransdone), represented by s0, s1, s2, s3, s4 and s5 respectively. The state-jump conditions are: ① the configuration start signal; ② the configuration end signal; ③ an unconditional jump; ④ a new data tag array is read from the external FIFO; ⑤ an input data packet is read from the external FIFO and decoded; ⑥ the data of the current row have not finished transmitting; ⑦ the data of the current row have finished loading and input transmission of the next row starts; ⑧ all data of the current pass have finished loading, where a pass refers to the calculation of the current channel. After power-on the hardware is in state s0, and when condition ① is detected it enters state s1. The ID array is configured in state s1, and when condition ② is satisfied the state jumps to s2. In state s2 the hardware loads the relevant control information in one cycle and, under condition ③, enters state s3. In state s3 the top module reads the data tag array from the external FIFO and holds it until the next read update; under condition ④ the state jumps to s4. In state s4 the top module reads an input data packet from the external FIFO and passes it to the decoding module, which decodes it into the input data, the mask corresponding to the data values and the convolution-row end signal; under condition ⑤ the state jumps to s5. In state s5 the row and column tags obtained in s3, together with the decoded input data, mask and convolution-row end signal obtained in s4, are sent to the vertical bus module for transmission of the current data packet. When transmission of the current packet finishes, a decision is made from the convolution-row end signal and the number of transmitted rows: under condition ⑥ the state jumps to s4, under condition ⑦ it jumps to s3, and under condition ⑧ it jumps to s1.
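A minimal behavioral sketch of the six-state controller described above, written in Python rather than HDL. The boolean arguments stand in for the jump conditions ① to ⑧; their names (cfg_start, tag_ready and so on) are illustrative and do not appear in the patent.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # s0: eIdle
    CONFIG = auto()     # s1: eConfig, configure the ID array
    LOADCTRL = auto()   # s2: eLoadctrl, load control information in one cycle
    UPDATETAG = auto()  # s3: eUpdatetag, read the data tag array from the FIFO
    TRANS = auto()      # s4: eTrans, read and decode one input data packet
    TRANSDONE = auto()  # s5: eTransdone, current packet sent to the vertical bus


def next_state(s, cfg_start, cfg_end, tag_ready, pkt_ready,
               row_not_done, row_done, pass_done):
    """One step of the top-module state machine (jump conditions 1 to 8)."""
    if s is State.IDLE:
        return State.CONFIG if cfg_start else s          # condition 1
    if s is State.CONFIG:
        return State.LOADCTRL if cfg_end else s          # condition 2
    if s is State.LOADCTRL:
        return State.UPDATETAG                           # condition 3 (unconditional)
    if s is State.UPDATETAG:
        return State.TRANS if tag_ready else s           # condition 4
    if s is State.TRANS:
        return State.TRANSDONE if pkt_ready else s       # condition 5
    if s is State.TRANSDONE:
        if row_not_done:                                 # condition 6: stay on this row
            return State.TRANS
        if pass_done:                                    # condition 8: whole pass loaded
            return State.CONFIG
        if row_done:                                     # condition 7: start the next row
            return State.UPDATETAG
    return s
```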
The vertical bus module receives the decoded data sent by the top module, including the mask, the ID switching signal and the data tags, copies them, and sends the copies to all broadcast transmitting modules connected between this module and the horizontal bus modules. In addition, when copying, the vertical bus module generates a data packet valid signal Valid and sends it to all broadcast transmitting modules connected between this module and the horizontal bus modules.
The horizontal bus module receives the data sent by the broadcast transmitting modules located between the vertical bus module and the horizontal bus modules, including the mask, the ID switching signal and the column tag, copies them, and sends the copies to all broadcast transmitting modules connected between this module and the operation units. In addition, when copying, the horizontal bus module generates a data packet valid signal Valid and sends it to all broadcast transmitting modules connected between this module and the operation units.
The broadcast transmitting units have two module structures. One is located between the vertical bus module and the horizontal bus module: it selects the corresponding row tag according to the ID switching signal sent by the vertical bus module and compares it with the internal row identification number; if the two match, and both Valid and the non-empty signal ready of the operation unit's input first-in-first-out queue are high, it sends the data, mask, ID switching signal and column tag through the multiplexer Mux to the horizontal bus connected to it; otherwise the related outputs are masked. The other is located between the horizontal bus module and the operation unit: it selects the corresponding column tag according to the ID switching signal sent by the horizontal bus module and compares it with the internal column identification number; if the two match, and both Valid and the non-empty signal ready of the operation unit's input first-in-first-out queue are high, it selects the valid data through the multiplexer Mux and the value mask and sends them to the operation unit; otherwise the related outputs are masked.
The circuit provided by the invention partitions the data path into a two-level bus in the horizontal and vertical directions; the vertical bus module and the horizontal bus modules cooperate to send data with high parallelism, which greatly reduces the extra area and power overhead that the huge bandwidth of a single-bus design would incur. Meanwhile, a handshaking mechanism between operation-unit identification numbers and input data tags is introduced in the broadcast transmitting modules, which increases data reuse, reduces the number of memory accesses of the circuit, and improves its overall energy efficiency while guaranteeing that the input circuit delivers data correctly. The invention can improve the transmission efficiency of input data in neural network processing.
Drawings
Fig. 1 is a basic block diagram of a global broadcast data input circuit structure of the present invention.
Fig. 2 shows the input data packet format.
Fig. 3 is a top module structure diagram.
Fig. 4 is a schematic diagram of data transmission.
Fig. 5 is a diagram of a vertical bus module structure.
Fig. 6 is a block diagram of a horizontal bus module.
Fig. 7 is a block diagram of a broadcast transmitting unit located between a vertical bus module and a horizontal bus module.
Fig. 8 is a block diagram of a broadcast transmitting unit located between a horizontal bus module and an operation unit.
Detailed Description
In the present invention, a basic block diagram of a global broadcast data input circuit structure is shown in fig. 1. The working process of the design is as follows:
The top module records the number of data rows received so far according to the row end signal and sends the automatic switching signal of the identification-number array to the broadcast transmitting units. The input data, together with the corresponding data mask and data tags, are sent to the vertical bus module, which copies them and sends them to the broadcast transmitting units connected to it. The broadcast transmitting units located between the vertical bus module and the horizontal bus modules then complete the sending of data in the row direction according to the result of comparing the row data tag with the identification number stored in the unit; each horizontal bus module receives this input and copies it; finally, the broadcast transmitting units connected to the operation units in each row send the input data to the designated operation units according to the result of comparing the column data tag with the identification number stored in the unit.
The input data packet in this design is a data array that contains several data components. The data format is shown in Fig. 2: bits [72:9] hold the data values, bits [8:1] hold the data mask, and bit [0] is the convolution-row end signal.
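A minimal decoding sketch for the 73-bit packet layout of Fig. 2. It assumes the 64 data bits in [72:9] hold eight 8-bit values whose validity is flagged by the corresponding mask bits in [8:1]; this byte-to-mask-bit correspondence and the helper names are illustrative assumptions, not details taken from the patent.

```python
def decode_packet(packet: int):
    """Split a 73-bit input packet: [72:9] data, [8:1] mask, [0] row-end."""
    row_end = packet & 0x1
    mask = (packet >> 1) & 0xFF
    data_field = (packet >> 9) & ((1 << 64) - 1)
    # Eight 8-bit input values; value i is meaningful only if mask bit i is set.
    values = [(data_field >> (8 * i)) & 0xFF for i in range(8)]
    valid = [bool((mask >> i) & 1) for i in range(8)]
    return values, valid, bool(row_end)


# Example: two valid bytes (0x3C and 0x7F), mask 0b00000011, row-end flag set.
pkt = (0x7F3C << 9) | (0b00000011 << 1) | 0x1
print(decode_packet(pkt))
```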
The structure of the top module is shown in Fig. 3. Its input consists of data packets and data tags: a data packet is an array of input data and contains 8-bit input data values, the masks corresponding to those values and a convolution-row end signal; the data tags include row tags and column tags. The entire global broadcast data input process contains six states: the initialization state eIdle, configuration state eConfig, load-control-information state eLoadctrl, read-data-tag state eUpdatetag, read-input-data-packet state eTrans and current-packet-transfer-complete state eTransdone, represented by s0, s1, s2, s3, s4 and s5 respectively. The state-jump conditions are numbered in the figure: ① the configuration start signal; ② the configuration end signal; ③ an unconditional jump; ④ a new data tag array is read from the external FIFO; ⑤ an input data packet is read from the external FIFO and decoded; ⑥ the data of the current row have not finished transmitting; ⑦ the data of the current row have finished loading and input transmission of the next row starts; ⑧ all data of the current pass have finished loading, where a pass refers to the calculation of the current channel. After power-on the hardware is in state s0, and when condition ① is detected it enters state s1. The ID array is configured in state s1, and when condition ② is satisfied the state jumps to s2. In state s2 the hardware loads the relevant control information in one cycle and, under condition ③, enters state s3. In state s3 the top module reads the data tag array from the external FIFO and holds it until the next read update; under condition ④ the state jumps to s4. In state s4 the top module reads an input data packet from the external FIFO and passes it to the decoding module, which decodes it into the input data, the mask corresponding to the data values and the convolution-row end signal; under condition ⑤ the state jumps to s5. In state s5 the row and column tags obtained in s3, together with the decoded input data, mask and convolution-row end signal obtained in s4, are sent to the vertical bus module for transmission of the current data packet. When transmission of the current packet finishes, a decision is made from the convolution-row end signal and the number of transmitted rows: under condition ⑥ the state jumps to s4, under condition ⑦ it jumps to s3, and under condition ⑧ it jumps to s1. Another important function of the top module is to complete the automatic switching of the identification-number array in cooperation with the row-stationary dataflow strategy. Fig. 4 shows the data transmission scheme: within a single cycle the identification numbers stay unchanged, in the last cycle of a single pass the identification-number array is switched as required, and the passes are repeated according to the number of channels of the convolution layer to complete the ordered configuration of the whole identification-number array.
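A small Python sketch of the pass-by-pass identification-number switching illustrated in Fig. 4. The patent states that the ID values stay fixed within a pass and that the array is switched in the last cycle of each pass, but it does not give the concrete switching rule, so the rotation used here is only an assumed placeholder.

```python
def run_passes(id_array, num_channels, rows_per_pass, send_row):
    """IDs stay fixed within a pass and are switched only at the end of the
    pass; the passes repeat once per channel of the convolution layer."""
    ids = list(id_array)
    for ch in range(num_channels):        # one pass per channel
        for row in range(rows_per_pass):  # ID values unchanged inside the pass
            send_row(ch, row, ids)
        ids = ids[1:] + ids[:1]           # assumed switching rule (placeholder)
    return ids


# Example: 3 channels, 2 rows per pass, print what would be broadcast.
run_passes([0, 1, 2, 3], num_channels=3, rows_per_pass=2,
           send_row=lambda ch, row, ids: print(f"pass {ch} row {row} ids {ids}"))
```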
As shown in Fig. 5, the vertical bus module receives the data, mask, ID switching signal and data tags decoded and sent by the top module, copies them, and sends the copies to all broadcast transmitting modules connected between the vertical bus module and the horizontal bus modules. In addition, when copying, the vertical bus module generates a data packet valid signal Valid and sends it to all broadcast transmitting modules connected between this module and the horizontal bus modules.
The structure of the horizontal bus module is shown in Fig. 6. The horizontal bus module receives the data, mask, ID switching signal and column tag sent by the broadcast transmitting module located between the vertical bus module and the horizontal bus module, copies them, and sends the copies to all broadcast transmitting modules connected between this module and the operation units. In addition, when copying, the horizontal bus module generates a data packet valid signal Valid and sends it to all broadcast transmitting modules connected between this module and the operation units.
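The copy-and-fan-out behavior shared by the vertical and horizontal bus modules can be modeled in a few lines of Python. The dataclass fields mirror the signals named above (data, mask, ID switching signal, row and column tags, Valid), but the class itself is only an illustrative model, not RTL from the patent.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class BusPacket:
    data: int        # decoded input data
    mask: int        # mask corresponding to the data values
    id_switch: int   # ID switching signal from the top module
    row_tag: int
    col_tag: int
    valid: bool = False


def bus_broadcast(pkt: BusPacket, num_ports: int) -> List[BusPacket]:
    """Copy the incoming packet to every attached broadcast transmitting
    module and raise the Valid flag on the copies; the vertical and the
    horizontal bus modules behave the same way, only the fan-out differs."""
    stamped = replace(pkt, valid=True)
    return [stamped for _ in range(num_ports)]
```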
There are two kinds of broadcast transmitting unit. The first kind is located between the vertical bus module and the horizontal bus modules, as shown in Fig. 7: it selects the corresponding row tag according to the ID switching signal sent by the vertical bus module and compares it with the internal row identification number; if the two match, and both Valid and the non-empty signal ready of the operation unit's input first-in-first-out queue are high, it sends the data, mask, ID switching signal and column tag through the multiplexer Mux to the horizontal bus connected to it; otherwise the related outputs are masked. The second kind is located between a horizontal bus module and an operation unit, as shown in Fig. 8: it selects the corresponding column tag according to the ID switching signal sent by the horizontal bus module and compares it with the internal column identification number; if the two match, and both Valid and the non-empty signal ready of the operation unit's input first-in-first-out queue are high, it selects the valid data through the multiplexer Mux and the value mask and sends them to the operation unit; otherwise the related outputs are masked.
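The tag-versus-ID handshake of both kinds of broadcast transmitting unit reduces to the same gating rule: forward only when the selected tag matches the local identification number and both Valid and ready are high. The sketch below models that rule under stated assumptions; in particular, the way the value mask selects bytes out of the data field in the second stage is an assumption, not a detail given in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BusPacket:      # same illustrative model as in the bus-module sketch above
    data: int
    mask: int
    id_switch: int
    row_tag: int
    col_tag: int
    valid: bool = False


def row_stage_forward(pkt: BusPacket, local_row_id: int,
                      ready: bool) -> Optional[BusPacket]:
    """Unit between the vertical bus and a horizontal bus: forward the data,
    mask, ID switching signal and column tag when the row tag matches."""
    if pkt.valid and ready and pkt.row_tag == local_row_id:
        return pkt
    return None       # otherwise the related outputs are masked


def col_stage_forward(pkt: BusPacket, local_col_id: int,
                      ready: bool) -> Optional[List[int]]:
    """Unit between a horizontal bus and an operation unit: when the column
    tag matches, pass only the mask-selected values on to the operation unit."""
    if not (pkt.valid and ready and pkt.col_tag == local_col_id):
        return None
    return [(pkt.data >> (8 * i)) & 0xFF
            for i in range(8) if (pkt.mask >> i) & 1]
```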

Claims (2)

1. A global broadcast data input circuit for neural network processing, characterized in that its structure comprises a top module, a horizontal bus module, a vertical bus module and a broadcast transmitting module; wherein:
the top module is used for receiving data packets from the storage system and, according to signals inside the packets, automatically recording the number of data receptions and automatically switching the identification-number array; specifically, the top module computes the number of data transmissions of a single convolution layer from an external control signal and records the number of data rows received, so that the transmission count stays correct, and sends the ID-array switching signal to the broadcast transmitting units so that data are sent in order; the input of the top module consists of data packets and data tags: a data packet is an array of input data and contains 8-bit input data values, the masks corresponding to those values and a convolution-row end signal; the data tags include row tags and column tags;
the vertical bus module is used for receiving the decoded data sent by the top module, including the mask, the ID switching signal and the data tags, copying them and sending the copies to all broadcast transmitting modules connected between this module and the horizontal bus module; when copying, the vertical bus module also generates a data packet valid signal Valid and sends it to all broadcast transmitting modules connected between the vertical bus module and the horizontal bus module;
the horizontal bus module is used for receiving the data sent by the broadcast transmitting module located between the vertical bus module and the horizontal bus module, including the mask, the ID switching signal and the column tag, copying them and sending the copies to all broadcast transmitting modules connected between this module and the operation units; when copying, the horizontal bus module also generates a data packet valid signal Valid and sends it to all broadcast transmitting modules connected between this module and the operation units;
the broadcast transmitting unit modules are of two kinds: one is located between the vertical bus module and the horizontal bus module, and the other is located between the horizontal bus module and an operation unit; the former selects the corresponding row tag according to the ID switching signal sent by the vertical bus module and compares it with the internal row identification number, and if the two match and both Valid and the non-empty signal ready of the operation unit's input first-in-first-out queue are high, it sends the data, mask, ID switching signal and column tag through the multiplexer Mux to the horizontal bus connected to it, otherwise the related outputs are masked; the latter selects the corresponding column tag according to the ID switching signal sent by the horizontal bus module and compares it with the internal column identification number, and if the two match and both Valid and the non-empty signal ready of the operation unit's input first-in-first-out queue are high, it selects the valid data through the multiplexer Mux and the value mask and sends them to the operation unit, otherwise the related outputs are masked.
2. The global broadcast data input circuit for neural network processing of claim 1, wherein the entire global broadcast data input process comprises six states: the initialization state, configuration state, load-control-information state, read-data-tag state, read-input-data-packet state and current-packet-transfer-complete state, represented by s0, s1, s2, s3, s4 and s5 respectively; the state-jump conditions are: ① the configuration start signal; ② the configuration end signal; ③ an unconditional jump; ④ a new data tag array is read from the external FIFO; ⑤ an input data packet is read from the external FIFO and decoded; ⑥ the data of the current row have not finished transmitting; ⑦ the data of the current row have finished loading and input transmission of the next row starts; ⑧ all data of the current pass have finished loading, where a pass refers to the calculation of the current channel;
after power-on the hardware is in state s0; when jump condition ① (the configuration start signal) is detected, the hardware enters state s1 and the ID array is configured in state s1; when jump condition ② (the configuration end signal) is satisfied, the state jumps to s2; in state s2 the hardware loads the relevant control information in one cycle and, under jump condition ③ (the unconditional jump), enters state s3; in state s3 the top module reads the data tag array from the external FIFO and holds it until the next read update; when jump condition ④ (a new data tag array is read from the external FIFO) is satisfied, the state jumps to s4; in state s4 the top module reads an input data packet from the external FIFO and passes it to the decoding module, which decodes it into the input data, the mask corresponding to the data values and the convolution-row end signal; when jump condition ⑤ (an input data packet is read from the external FIFO and decoded) is satisfied, the state jumps to s5; in state s5 the row and column tags obtained in states s3 and s4, together with the decoded input data, mask and convolution-row end signal, are sent to the vertical bus module for transmission of the current data packet; when transmission of the current packet finishes, a decision is made from the convolution-row end signal and the number of transmitted rows: when condition ⑥ is satisfied (the data of the current row have not finished transmitting), the state jumps to s4; when condition ⑦ is satisfied (the data of the current row have finished loading), the state jumps to s3; and when condition ⑧ is satisfied (all data of the current pass have finished loading), the state jumps to s1.
CN202010746509.6A, filed 2020-07-29 (priority date 2020-07-29): Global broadcast data input circuit for neural network processing. Status: Active. Granted as CN111882051B.

Priority Applications (1)

Application Number: CN202010746509.6A; Priority Date: 2020-07-29; Filing Date: 2020-07-29; Title: Global broadcast data input circuit for neural network processing

Publications (2)

Publication Number: CN111882051A (en), Publication Date: 2020-11-03
Publication Number: CN111882051B (en), Publication Date: 2022-05-20

Family

ID=73201088

Family Applications (1)

Application Number: CN202010746509.6A (Active, granted as CN111882051B); Priority Date: 2020-07-29; Filing Date: 2020-07-29

Country Status (1)

Country Link
CN (1) CN111882051B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418419B (en) * 2020-11-20 2022-10-11 复旦大学 Data output circuit structure processed by neural network and scheduled according to priority

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178518A (en) * 2019-12-24 2020-05-19 杭州电子科技大学 Software and hardware cooperative acceleration method based on FPGA
CN111199277A (en) * 2020-01-10 2020-05-26 中山大学 Convolutional neural network accelerator

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178518A (en) * 2019-12-24 2020-05-19 杭州电子科技大学 Software and hardware cooperative acceleration method based on FPGA
CN111199277A (en) * 2020-01-10 2020-05-26 中山大学 Convolutional neural network accelerator

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A Configurable Nonlinear Operation Unit For Neural Network Accelerator; Yujie Cai et al.; 2017 IEEE 12th International Conference on ASIC; 2018-01-11; pp. 319-322 *
SIMD instruction set extension method and implementation for the AES algorithm; Lu Shiting et al.; Computer Engineering; 2011-03-20 (No. 06); pp. 121-123 *
Design of RLWE Cryptoprocessor Based on Vector-Instruction Extension with RISC-V Architecture; Quan Zhang et al.; 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology; 2018-12-06; pp. 1-3 *
Design and verification of an AES algorithm based on the AMBA bus; Ling Aimin; China Masters' Theses Full-text Database, Information Science and Technology; 2017-03-15 (No. 3); I136-855 *
FPGA implementation of video data transmission and processing based on the AXI bus; Zhong Xueyan et al.; Computer Measurement & Control; 2015-11-25 (No. 11); pp. 3825-3827 *
Image processing and transmission interface design for an embedded machine vision system; Qiu Yonghua; Electronic Product Reliability and Environmental Testing; 2012-06-20 (No. 03); pp. 65-69 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant