CN111459856A - Data transmission device and transmission method - Google Patents

Data transmission device and transmission method

Publication number
CN111459856A
Authority
CN
China
Prior art keywords
data
logic
transmission
memory bank
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010200676.0A
Other languages
Chinese (zh)
Other versions
CN111459856B (en)
Inventor
刘艳欢
李文明
安述倩
吴海彬
冯煜晶
吴萌
叶笑春
范东睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Ruixin Integrated Circuit Technology Co ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010200676.0A priority Critical patent/CN111459856B/en
Publication of CN111459856A publication Critical patent/CN111459856A/en
Application granted granted Critical
Publication of CN111459856B publication Critical patent/CN111459856B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/161 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G06F13/1626 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by reordering requests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4009 Coupling between buses with data restructuring

Abstract

The present invention provides a data transmission apparatus comprising: control logic for generating control signals according to configuration information so as to control the other modules; memory bank interface logic for reading data from a memory bank or writing data into the memory bank; first selection logic for selecting a data transmission path corresponding to a forward transmission mode or a data receiving path corresponding to a reverse transmission mode; data alignment logic for performing an alignment operation on the transmitted data; a recombination module for performing a recombination operation on the transmitted data, either splitting the data blocks read from the memory bank into data components and recombining them into new data blocks that are transmitted to the on-chip storage structure for processing, or splitting the data blocks processed by the on-chip storage structure into data components and recombining them into data blocks that are written into the memory bank; second selection logic for selecting a data receiving path corresponding to the forward transmission mode or a data transmission path corresponding to the reverse transmission mode; and data routing logic for determining the destination address of the data transfer.

Description

Data transmission device and transmission method
Technical Field
The invention relates to the technical field of computer hardware processing, in particular to a data transmission structure in a single-core or multi-core processor comprising a plurality of parallel computing units, and more particularly to a reconfigurable data transmission device and a transmission method.
Background
In high-throughput data processing scenarios such as scientific computing, big data and artificial intelligence, a processor usually contains a plurality of micro-processing cores, each with a plurality of parallel computing units, in order to exploit the parallelism of a specific application. With this structure, data with a plurality of components can be processed at one time, which effectively improves the processing speed and meets the requirement for concurrent data processing. To fully exploit the computing power of the processing units, the data stored on the chip needs to be recombined in the form of data components before being sent to the computing units for processing. For such processors, data is read into memory by a host, then transferred into a DDR-based memory bank, and then transferred from the DDR to the on-chip memory of the processor by a data transfer module. In this process, if the data reassembly operation is performed by the host, the host memory must be read and written frequently, which consumes a lot of time and seriously affects the performance of the processor. If the on-chip memory module performs the reassembly instead, the complexity of the on-chip memory structure increases greatly, the reassembly operation itself still consumes a lot of time, and flexibility is greatly reduced, even though reassembly and non-reassembly requirements often exist at the same time. For these reasons, a more efficient data transmission and reassembly scheme is important for improving processor efficiency.
Analysis shows that the main function of the conventional data transmission structure is to move data directly from a DDR-based memory bank to an on-chip memory without any processing. As shown in FIG. 1, in the forward transmission process of the conventional structure, external configuration information enters through a configuration interface and is first parsed by a configuration analysis module into the information required for transmission, such as the transmission direction, transmission size, data source address, data destination address and transmission start signal. This information is sent to a control logic module, which generates the corresponding control signals. The control signals are sent, on the one hand, to a memory bank interface logic module, which reads or writes the memory bank under their control, and, on the other hand, to an on-chip memory interface logic, which sends the data from the memory bank interface logic module to the destination address of the on-chip memory module as required. The reverse transmission process is the reverse of the forward transmission process. In the existing data transmission structure, therefore, data cannot be processed during transmission, and the processing of data into data components must be completed by either the host or the on-chip storage structure. Whichever of the two performs this operation, it is time-consuming, reduces flexibility and degrades the performance of the processor.
Disclosure of Invention
Therefore, the present invention is directed to overcome the above-mentioned drawbacks of the prior art and to provide a new data transmission structure and transmission method.
According to a first aspect of the present invention, there is provided a data transmission apparatus for data transmission between a memory bank and an on-chip memory structure, comprising: control logic for generating corresponding control signals according to the configuration information so as to control the memory bank interface logic, the first selection logic, the data alignment logic, the recombination module, the second selection logic and the data routing logic, wherein the configuration information at least comprises a data transmission direction and a transmission mode, the transmission direction being forward or reverse and the transmission mode being a simple mode or a recombination mode; memory bank interface logic, connected with the memory bank, for reading data from the memory bank or writing data into the memory bank according to a control signal of the control logic; first selection logic for selecting a data transmission path corresponding to a forward transmission mode or a data receiving path corresponding to a reverse transmission mode under the control of a control signal of the control logic; data alignment logic for performing an alignment operation on the transmitted data; a recombination module for performing a recombination operation on the transmitted data, either splitting the data blocks read from the memory bank into data components and recombining them into new data blocks that are transmitted to the on-chip storage structure for processing, or splitting the data blocks processed by the on-chip storage structure into data components and recombining them into data blocks that are written into the memory bank; second selection logic for selecting a data receiving path corresponding to the forward transmission mode or a data transmission path corresponding to the reverse transmission mode under the control of a control signal of the control logic; and data routing logic for determining a destination address for the data transfer under control of the control signal.
The data transmission apparatus further includes: a configuration port for acquiring configuration data input by an external device initiating a data transmission request; and configuration information analysis logic for parsing the configuration data acquired by the configuration port to obtain configuration information and transmitting the configuration information to the control logic. Preferably, the configuration information includes: the transmission mode, the transmission direction, the on-chip storage start address, the memory bank start address, the horizontal contiguous size of valid data in the memory bank, the number of rows of valid data in each data block in the memory bank, the total size of each row of data in each data block in the memory bank, a transmission end flag and a transmission start signal. The transmission mode is configured to be a simple mode or a recombination mode; in the simple mode data is transmitted directly without being recombined, and in the recombination mode the data is recombined in the form of data components. The transmission direction is configured to be either forward, in which data passes from the memory bank to the on-chip memory structure, or reverse, in which data passes from the on-chip memory structure to the memory bank.
The recombination module comprises: the data caching logic comprises a plurality of caching units and is used for caching the data blocks read from the memory banks or the data blocks needing to be written into the memory banks; data splitting logic for splitting contiguous data in each data block read from the memory bank into a plurality of data components; or splitting contiguous data in a data block from the on-chip storage structure into the form of data components; and the data reorganization logic is used for splicing and reorganizing the data components at the same position on each data block in the memory bank to form a new data block to be sent to the on-chip storage structure, or caching all the data components in the same data block from the on-chip storage structure to the same position on different cache units in the data cache logic respectively. Preferably, the data caching logic is configured as a ping-pong mechanism and is divided into two caching modules, wherein when data in one caching module is processed, the other caching module can simultaneously cache the data.
In some embodiments of the present invention, the data reassembly logic is configured with a reassembly cache unit for caching data blocks to be sent to the on-chip storage structure after reassembly, or for caching data components to be stored in the data cache logic after the data blocks from the on-chip storage structure are split.
According to a second aspect of the present invention, there is provided a data transmission method based on the apparatus of the first aspect of the present invention, for data transmission from a memory bank to an on-chip memory structure, comprising the following steps:
Z1, sequentially reading the continuous data of each data block in the memory bank and sequentially caching each data block in different cache units; preferably, the data block read from the memory bank is subjected to a data alignment operation before being stored in the cache unit;
Z2, splitting the data in each data block in the form of data components, and splicing and recombining the data components at the same position in each data block into a plurality of new data blocks; preferably, this step comprises: reading one data block from each of the plurality of cache units simultaneously and ordering the data blocks, wherein the number of data blocks is consistent with the number of micro-processing cores corresponding to the on-chip storage structure, and the order of the data blocks is consistent with the order in which they were read from the memory bank; splitting each data block into a plurality of data components, with the bit width of the memory bank interface as the unit component; and splicing the data components at the same position in each data block to form a plurality of new data blocks, wherein the number of data components in each new data block is consistent with the number of micro-processing cores, and the splicing order of the data components in each new data block is consistent with the order of the corresponding data blocks before splicing;
Z3, transferring the new data blocks from step Z2 to the on-chip storage structure one by one.
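Steps Z1 to Z3 amount to a transpose over (data block, component position). The following is a minimal software sketch of that rearrangement only, not the hardware described here. The function name `reassemble_forward`, the use of Python lists for cache units and the integer stand-ins for data components are assumptions made for illustration.

```python
from typing import List, Sequence


def reassemble_forward(blocks: Sequence[Sequence[int]]) -> List[List[int]]:
    """Z2 sketch: new block i collects component i of every cached block,
    keeping the order in which the blocks were read from the memory bank."""
    if not blocks:
        return []
    n_components = len(blocks[0])
    if any(len(block) != n_components for block in blocks):
        raise ValueError("all data blocks must hold the same number of components")
    return [[block[i] for block in blocks] for i in range(n_components)]


# Z1 (assumed toy data): 3 blocks cached in read order, 4 components each.
# Component value 10*b + c marks block b, position c.
cache_units = [[10 * b + c for c in range(4)] for b in range(1, 4)]

# Z2: each new block has one component per original block (one per core).
new_blocks = reassemble_forward(cache_units)

# Z3: the new blocks would then be sent to on-chip storage one by one.
for block in new_blocks:
    print(block)
```

In this toy case the three cached blocks stand for three micro-processing cores, so each new block carries three components, one per core.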
According to a third aspect of the present invention, there is provided a data transmission method based on the apparatus of the first aspect of the present invention, for data transmission from an on-chip memory structure to a memory bank, comprising the steps of:
F1, reading the continuous data of the data blocks from the on-chip storage structure one by one;
F2, splitting each data block read in step F1 into a plurality of data components, caching the data components belonging to the same data block at the same position of a plurality of cache units respectively, so that the plurality of data components cached in the same cache unit form a new data block; preferably, this step comprises: splitting each data block from the on-chip storage structure into a plurality of data components, wherein the number of data components is consistent with the number of micro-processing cores corresponding to the on-chip storage structure; sequentially storing the data components belonging to the same data block into the same position of different cache units, wherein one data component corresponds to one cache unit, the order in which the data components are stored into the cache units is consistent with their order in the data block, and earlier data components in the data block are stored into earlier cache units; and forming all data components stored in the same cache unit into a new data block, wherein the order of the data components in the new data block is consistent with the order in which the corresponding data blocks were read from the on-chip storage structure;
F3, storing the new data blocks into the memory bank one by one; preferably, a data alignment operation is performed on the new data blocks before they are stored into the memory bank.
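Steps F1 to F3 can be read as the inverse scatter: component j of the k-th on-chip data block lands at position k of cache unit j. The sketch below illustrates only that indexing. The function name `scatter_reverse` and the list-based model of the cache units are hypothetical, and the alignment operation of step F3 is not modelled.

```python
from typing import List, Optional, Sequence


def scatter_reverse(onchip_blocks: Sequence[Sequence[int]],
                    n_units: int) -> List[List[Optional[int]]]:
    """F2 sketch: write component j of on-chip block k to position k of
    cache unit j; each cache unit then holds one new data block."""
    cache_units: List[List[Optional[int]]] = [
        [None] * len(onchip_blocks) for _ in range(n_units)
    ]
    for k, block in enumerate(onchip_blocks):        # F1: blocks read one by one
        if len(block) != n_units:
            raise ValueError("each block must split into one component per cache unit")
        for j, component in enumerate(block):        # earlier component -> earlier unit
            cache_units[j][k] = component
    return cache_units


# Assumed toy data: 4 on-chip blocks, each with 3 components (3 cores).
onchip = [[10, 20, 30], [11, 21, 31], [12, 22, 32], [13, 23, 33]]
bank_blocks = scatter_reverse(onchip, n_units=3)

# F3: each cache unit's contents would be written to the memory bank in turn.
for block in bank_blocks:
    print(block)
```

With these inputs the result is the original row-wise layout, which is the inverse of the position-wise gather used in forward transmission.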
Compared with the prior art, the invention has the advantages that: the invention realizes data recombination in the data transmission process, avoids time delay caused by host processing or on-chip storage structure processing, has flexible working mode, and meets the diversified requirements of data recombination and non-recombination, thereby greatly reducing time delay and resource consumption and effectively exerting the processing performance of the processor.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of a prior art data transmission structure;
FIG. 2 is a schematic diagram of a data transmission apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a configuration information format adopted by a data transmission device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a forward direction data transmission of a data transmission device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating data reassembly in a forward transmission process of a data transmission apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a data transmission apparatus for transmitting data in a reverse direction according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The inventors found that if data is recombined during transmission, before it is moved from the memory bank to the on-chip storage structure or before it is stored from the on-chip storage structure into the memory bank, data processing efficiency can be greatly improved, the requirement for concurrent data processing can be met effectively, the data transmission speed is increased, and time delay and resource consumption are reduced. For example, a new data transmission structure can be designed that is configurable into two working modes: one implements the traditional simple data transmission function, and the other adds a data recombination function to the transmission process, fusing data transmission with data recombination. The two working modes can be configured flexibly as needed and together meet the diverse demands for recombined and non-recombined data, so that time delay and resource consumption are greatly reduced and the processing performance of the processor can be exploited effectively.
In view of the above, the present invention provides a reconfigurable data transmission device and a transmission method, and the present invention is described in detail below with reference to the accompanying drawings, embodiments, examples, and the like.
According to an embodiment of the present invention, as shown in fig. 2, a data transmission apparatus is provided for data transmission between a memory bank and an on-chip memory structure, and includes a configuration port, a configuration information parsing logic, a control logic, a memory bank interface logic, a first selection logic, a data alignment logic, a reassembly module, a second selection logic, and a data routing logic, where an input/output port and a circuit connection line are configured between the modules.
Specifically, the configuration port is used for acquiring configuration data input from the outside; the configuration information analysis logic is used for parsing the configuration data acquired by the configuration port to obtain configuration information and transmitting the configuration information to the control logic, wherein the configuration information at least comprises a data transmission direction and a transmission mode, the transmission direction being forward or reverse and the transmission mode being a simple mode or a recombination mode; the control logic is used for extracting information from the configuration information and generating the corresponding control signals to control the memory bank interface logic, the first selection logic, the data alignment logic, the recombination module, the second selection logic and the data routing logic; the memory bank interface logic is connected with the memory bank and is used for reading data from the memory bank or writing data into the memory bank according to a control signal of the control logic; the first selection logic is used for selecting a transmission path corresponding to a forward transmission mode or a reception path corresponding to a reverse transmission mode under control of a control signal of the control logic; the data alignment logic is used for performing an alignment operation on the data to be recombined; the recombination module is used for performing a recombination operation on the data to be recombined, either splitting the data blocks read from the memory bank into data components and recombining them into new data blocks that are transmitted to the on-chip storage structure for processing, or splitting the data blocks processed by the on-chip storage structure into data components and recombining them into data blocks that are written into the memory bank; the second selection logic is used for selecting a reception path corresponding to the forward transmission mode or a transmission path corresponding to the reverse transmission mode under control of a control signal of the control logic; and the data routing logic is used for selecting, based on the address, the location to which data is to be sent under control of the control signal.
The data transmission device of the invention first receives configuration information from an upper layer. The format of the configuration information is shown in FIG. 3 and comprises a transmission mode (configurable as a simple mode or a recombination mode as required), a transmission direction (forward or reverse, corresponding respectively to the transfer of data from a memory bank, typically DDR, to on-chip storage and to the opposite transfer), an on-chip storage start address, a memory bank start address, the horizontal contiguous size of valid data in the memory bank, the number of rows of valid data in each data block in the memory bank, the total size of each row of data of each data block in the memory bank, a transmission end flag and a transmission start signal. In the simple mode, data is transmitted directly without being recombined; in the recombination mode, the data is recombined in the form of data components. Forward transfer moves data from the memory bank to the on-chip memory structure, and reverse transfer moves data from the on-chip memory structure to the memory bank. The configuration information thus concisely and completely describes what a data transfer requires: the transmission mode, the transmission direction, the start address of each data block in an external memory bank such as DDR, the shape of the data block, the on-chip storage start address, and so on.
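As an illustration of how these fields might be held together, the following sketch models the configuration information as a plain record. All field names, the byte units and the helper `row_read_ranges` are assumptions; the text does not specify field widths, encodings or how addresses are generated. The helper shows one plausible reading of the three shape fields: each valid row starts one total row size after the previous one, and only the contiguous valid part of the row is read.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    SIMPLE = "simple"
    REASSEMBLY = "reassembly"


class Direction(Enum):
    FORWARD = "forward"   # memory bank -> on-chip storage
    REVERSE = "reverse"   # on-chip storage -> memory bank


@dataclass(frozen=True)
class TransferConfig:
    mode: Mode
    direction: Direction
    onchip_start_addr: int
    bank_start_addr: int
    valid_row_bytes: int    # horizontal contiguous size of valid data (unit assumed)
    valid_rows: int         # number of rows of valid data per data block
    row_total_bytes: int    # total size of each row of a data block (unit assumed)
    end_flag: bool
    start: bool


def row_read_ranges(cfg: TransferConfig) -> List[Tuple[int, int]]:
    """Assumed interpretation: (address, length) of each valid row of one block."""
    if cfg.valid_row_bytes > cfg.row_total_bytes:
        raise ValueError("valid data cannot be wider than the row")
    return [
        (cfg.bank_start_addr + row * cfg.row_total_bytes, cfg.valid_row_bytes)
        for row in range(cfg.valid_rows)
    ]


cfg = TransferConfig(Mode.REASSEMBLY, Direction.FORWARD,
                     onchip_start_addr=0x0, bank_start_addr=0x1000,
                     valid_row_bytes=64, valid_rows=3, row_total_bytes=256,
                     end_flag=False, start=True)
ranges = row_read_ranges(cfg)
```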
Preferably, the reassembly module includes a data caching logic, a data splitting logic, and a data reassembly logic. The data caching logic comprises a plurality of caching units and is used for caching data blocks read from a memory bank or data blocks needing to be written into the memory bank; data splitting logic for splitting contiguous data in each data block read from the memory bank into a plurality of data components or splitting contiguous data in data blocks from the on-chip memory structure into data components; the data reorganization logic is used for splicing and reorganizing the data components at the same position on each data block in the memory bank to form a new data block to be sent to the on-chip storage structure, or caching all the data components in the same data block from the on-chip storage structure to the same position on different cache units in the data cache logic respectively.
Preferably, the data caching logic is configured as a ping-pong (PingPong) mechanism and is divided into two caching modules, each comprising a plurality of caching units; while the data in one caching module is being processed, the other caching module can cache data at the same time, which improves the efficiency of data transmission. Pipelining is introduced into the data transmission process so that all modules work in parallel: a module does not need to wait until the first batch of data has travelled from the memory bank to the on-chip memory before starting to process the next batch, so all modules are kept busy and transmission efficiency is improved. Combined with the ping-pong mechanism, the pipeline lets every transmission stage work in parallel, makes the fullest use of the transmission bandwidth, and fuses data transmission with data reassembly efficiently, so that data reassembly is completed as part of the transmission itself.
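The ping-pong arrangement can be pictured with a small software model. This is only a sketch under stated assumptions: the class `PingPongCache` and its method names are hypothetical, and the sequential Python calls stand in for writes and reads that would take place concurrently in hardware.

```python
from typing import List, Optional, Sequence


class PingPongCache:
    """Two caching modules; one is written while the other is read."""

    def __init__(self, n_units: int) -> None:
        self.n_units = n_units
        self.modules: List[List[Optional[str]]] = [
            [None] * n_units, [None] * n_units
        ]
        self.write_sel = 0  # index of the module currently being written

    def fill(self, blocks: Sequence[str]) -> None:
        """Cache one batch, one data block per caching unit."""
        if len(blocks) != self.n_units:
            raise ValueError("a batch must supply one block per caching unit")
        self.modules[self.write_sel] = list(blocks)

    def swap(self) -> None:
        """Exchange the roles of the two modules."""
        self.write_sel ^= 1

    def drain(self) -> List[Optional[str]]:
        """Read the batch held by the module that is not being written."""
        return list(self.modules[self.write_sel ^ 1])


cache = PingPongCache(n_units=2)
cache.fill(["A1", "A2"])   # batch A goes into module 0
cache.swap()
cache.fill(["B1", "B2"])   # batch B goes into module 1 ...
first = cache.drain()      # ... while batch A is read from module 0
cache.swap()
second = cache.drain()     # batch B is now readable
```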
Preferably, the data reassembly logic is configured with a reassembly cache unit for caching the data blocks to be sent to the on-chip storage structure after reassembly, or for caching the data components to be stored in the data cache logic after the data blocks from the on-chip storage structure are split; this caching improves the data processing capability of the transmission device.
Fig. 4 schematically shows a data flow direction of forward transmission by using the data transmission apparatus of the present invention, and specifically, the forward transmission data transmission and reassembly execution process includes the following steps:
s301: acquiring configuration data input by equipment requesting data transmission from an external from a configuration port, wherein the configuration data at least comprises information such as a transmission mode, a transmission direction, an on-chip storage initial address, a storage bank initial address, the transverse continuous size of effective data in a storage bank, the number of rows of the effective data in each data block in the storage bank, the total size of each row of data in each data block in the storage bank, a transmission ending mark, a transmission starting signal and the like;
s302: the configuration information analysis logic analyzes the configuration data to form configuration information, specifically, the configuration data of the configuration port is stored in a corresponding register according to the number, and meanwhile, required information is obtained according to the number and the corresponding register value according to a mapping relation and is transmitted to the control logic component;
s303: the control logic extracts control information such as a transmission direction, a transmission mode and the like from the input configuration information and generates a corresponding control signal based on a control rule, for example, if the transmission direction in the input configuration information is a forward direction and the transmission mode is a recombination mode, the control logic generates a control signal to control the memory bank interface logic to read data from the memory bank, transmit the data to the recombination module for data recombination and send the data to the on-chip storage structure; if the transmission direction in the input configuration information is the forward direction and the transmission mode is the simple mode, the control logic generates a control signal to control the memory bank interface logic to read data from the memory bank and directly send the data to the on-chip storage structure; if the transmission direction in the input configuration information is reverse and the transmission mode is a recombination mode, the control logic generates a control signal to control the data of the storage structure on the data routing logic receiving chip to be recombined by the recombination module and then stored in the storage body; if the transmission direction in the input configuration information is reverse and the transmission mode is simple, the control logic generates a control signal to control the data of the storage structure on the data routing logic receiving chip to be directly stored in the storage body;
S304: in the recombination (reassembly) mode, the continuous data of each data block in the memory bank is acquired in sequence; in the simple mode, the continuous data in the memory bank is read directly; the number of data blocks read matches the number of micro processing cores, which reflects their parallel processing capacity: for example, 8 data blocks are read if there are 8 micro processing cores, and 16 data blocks are read if there are 16;
S305: after passing through the memory bank interface logic, the data is input into the first selection logic, which selects a path for transmitting the data according to the transmission mode and the transmission direction;
S306: in the reassembly mode, the first selection logic selects the upper path R1 shown in fig. 4, the data block passes through the first selection logic and enters the data alignment logic, and the process goes to step S307; in the simple mode, the first selection logic selects the lower path R2 shown in fig. 4, the data passes through the first selection logic and enters the second selection logic, and the process goes to step S310;
S307: the data alignment logic performs an alignment operation on the data blocks and sends the aligned data blocks to the corresponding data cache units in the data cache logic, where a data block read earlier is stored in an earlier cache unit; according to an example of the present invention, as shown in fig. 4, if 16 data blocks are to be read from the memory bank in the order data block 1, data block 2, …, data block 16, then in the data cache logic data block 1 is cached in cache unit 1, data block 2 is cached in cache unit 2, …, and data block 16 is cached in cache unit 16; preferably, the data cache logic is divided into two parts to form a ping-pong operation mechanism, so that during transmission one group of cache units can be read while the other group is being written, which reduces waiting time and improves transmission efficiency;
S308: after being processed by the cache logic, the data enters the data splitting logic, where each data block is split into a plurality of data components with the bit width of the memory bank interface as the unit component, and the data then enters the data reorganization logic;
S309: in the data reorganization logic, the data components at the same position of each data block are spliced and reorganized into a new data block, and the new data block is input into the second selection logic; based on the transmission mode signal and the transmission direction signal, the second selection logic receives the data from the data reorganization logic over the upper path R1 shown in fig. 4;
S310: after leaving the second selection logic, the data enters the data routing logic, which determines the target address to which the data is sent according to the on-chip storage initial address in the configuration information;
S311: the new data blocks are output from the data routing logic and are written in sequence into the on-chip storage structure at the target addresses determined by the data routing logic.
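The path selection in steps S305 to S311 can be illustrated with a minimal software sketch. This is only a behavioral model under stated assumptions, not the hardware described in the patent: the names `forward_transfer` and `reorganize` are hypothetical, each data block is modeled as a list of unit components, alignment and caching are reduced to a copy, and the routing step is assumed to place consecutive blocks at consecutive addresses starting from the on-chip storage initial address.

```python
from typing import Dict, List

Block = List[int]  # one data block, modeled as a list of unit components


def reorganize(blocks: List[Block]) -> List[Block]:
    """S308-S309: group the components found at the same position of every block."""
    return [list(group) for group in zip(*blocks)]


def forward_transfer(blocks: List[Block], mode: str, base_addr: int) -> Dict[int, Block]:
    """Model one forward transfer; returns {hypothetical on-chip address: block}."""
    if mode == "reassembly":
        # Upper path R1: alignment and caching are modeled as a plain copy,
        # followed by splitting and reorganization.
        cached = [list(b) for b in blocks]
        out = reorganize(cached)
    elif mode == "simple":
        # Lower path R2: data bypasses alignment, caching and reorganization.
        out = [list(b) for b in blocks]
    else:
        raise ValueError("mode must be 'simple' or 'reassembly'")
    # S310-S311: assumed routing rule, one address step per block.
    return {base_addr + i: blk for i, blk in enumerate(out)}


result = forward_transfer([[11, 12], [21, 22]], "reassembly", 0x100)
```

In the reassembly mode the two input blocks come out regrouped by component position, while in the simple mode they reach the on-chip addresses unchanged.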
To better understand the data reassembly process in the forward transmission process, the following description is given with reference to an example.
Assume that the processor has 8 micro processing cores and can therefore process 8 data components in parallel. When data is read from the memory bank, 8 data blocks (data block 1 to data block 8) are read and stored in the 8 cache units of the data cache logic (cache unit 1 to cache unit 8). A data block read earlier is cached in an earlier cache unit, that is, data block 1 is cached in cache unit 1, data block 2 is cached in cache unit 2, and so on.
The reassembly comprises the following steps:
Step 1: a data block is read from each of cache units 1 to 8, and the data blocks are ordered according to the sequence in which they were read from the memory bank, namely data block 1, data block 2, data block 3, data block 4, data block 5, data block 6, data block 7 and data block 8;
Step 2: each data block is split into a plurality of data components with the bit width of the memory bank interface as the unit component. For example, assuming that the size of each data block is 256 bits and the unit component is 32 bits, each data block is split into 8 data components; as shown in fig. 5, data block 1 is split into data component 11, data component 12, data component 13, data component 14, data component 15, data component 16, data component 17 and data component 18, and each of the other data blocks is likewise split into 8 data components;
Step 3: the data components at the same position in each data block are spliced and reorganized into a plurality of new data blocks. The number of data components in each new data block matches the number of micro processing cores, and the components are spliced in the same order as their source data blocks. As shown in fig. 5, the data components at the 1st position in data blocks 1 to 8 form a new data block n1, which comprises data components 11, 21, 31, 41, 51, 61, 71 and 81; similarly, the data components at the 2nd position in data blocks 1 to 8 form a new data block n2, and so on up to the data components at the 8th position, which form a new data block n8. The order in which the new data blocks are sent to the on-chip storage structure matches the order of their data components within the original data blocks, that is, new data blocks n1 to n8 are sent to the on-chip storage structure one by one in the order n1 to n8.
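The worked example above (8 data blocks of 256 bits, 32-bit unit components) can be reproduced with a short sketch. It is an illustration under assumptions rather than the patented circuit: the function names are hypothetical, each 256-bit block is held in a Python integer, and position 1 is assumed to be the least significant 32 bits, a bit ordering the text does not specify. For readability, data component "xy" is given the numeric value 10·x + y.

```python
from typing import List

BLOCK_BITS = 256  # size of one data block in the example
UNIT_BITS = 32    # memory bank interface width, used as the unit component
MASK = (1 << UNIT_BITS) - 1


def split_block(block: int) -> List[int]:
    """Step 2: split a 256-bit block into 8 components, position 1 first."""
    count = BLOCK_BITS // UNIT_BITS
    return [(block >> (UNIT_BITS * i)) & MASK for i in range(count)]


def join_components(components: List[int]) -> int:
    """Splice components back into one block, first component lowest."""
    block = 0
    for i, comp in enumerate(components):
        block |= (comp & MASK) << (UNIT_BITS * i)
    return block


def reorganize(blocks: List[int]) -> List[int]:
    """Step 3: new block k holds the k-th component of every input block."""
    parts = [split_block(b) for b in blocks]
    return [join_components(list(column)) for column in zip(*parts)]


# Data block x (1..8) carries components x1..x8, encoded as 10*x + y.
blocks = [join_components([10 * x + y for y in range(1, 9)]) for x in range(1, 9)]
new_blocks = reorganize(blocks)
print(split_block(new_blocks[0]))  # [11, 21, 31, 41, 51, 61, 71, 81]
```

The operation is a transpose of the 8 × 8 grid of components, so applying `reorganize` twice returns the original blocks.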
Fig. 6 schematically shows the data flow of a reverse transmission performed by the data transmission device of the present invention. The reverse transmission can be regarded as the reverse process of the forward transmission, and the data transmission and reassembly process includes the following steps:
S401: configuration data input by the external device requesting the data transmission is acquired from the configuration port;
S402: the configuration information analysis logic analyzes the configuration data to form configuration information;
S403: the control logic extracts control information such as the transmission direction and the transmission mode from the input configuration information and generates corresponding control signals based on control rules;
S404: data is read from the on-chip storage structure and enters the data routing logic;
S405: after passing through the data routing logic, the data enters the second selection logic; in the simple mode, the second selection logic selects the lower path shown in fig. 6, the data enters the first selection logic directly after passing through the second selection logic, and the process goes to step S409; in the reassembly mode, the second selection logic selects the upper path R3 shown in fig. 6, the data enters the reassembly module after passing through the second selection logic, and the process goes to step S406;
S406: the data blocks from the second selection logic are split and reorganized: each data block from the on-chip storage structure is split into a plurality of data components, the number of which matches the number of micro processing cores corresponding to the on-chip storage structure; the data components belonging to the same data block are stored in sequence at the same position of different cache units, with one data component per cache unit, in the same order as the data components appear in the data block, so that an earlier data component in the data block is stored in an earlier cache unit; all data components stored in the same cache unit form a new data block, in which the order of the data components matches the order in which the corresponding data blocks were read from the on-chip storage structure;
S407: after being processed by the cache logic, the new data block enters the data alignment logic for alignment processing;
S408: after being processed by the data alignment logic, the new data block enters the first selection logic, which, according to the transmission direction and the transmission mode, receives the data from the data alignment logic over the upper path R3 shown in fig. 6;
S409: the data leaving the first selection logic enters the memory bank interface logic;
S410: after being processed by the memory bank interface logic, the data is written into the designated position in the memory bank.
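The split-and-scatter operation of step S406 can be sketched as follows. This is a simplified model under assumptions, not the device itself: `reverse_reorganize` is a hypothetical name, each data block from the on-chip storage structure is modeled as a list that is already split into one component per micro processing core, and each cache unit is modeled as a growing list.

```python
from typing import List


def reverse_reorganize(blocks: List[List[int]], num_cores: int) -> List[List[int]]:
    """Scatter component k of every block into cache unit k (S406).

    The components of one block land at the same position of different
    cache units; each cache unit then holds one new data block.
    """
    cache_units: List[List[int]] = [[] for _ in range(num_cores)]
    for block in blocks:
        if len(block) != num_cores:
            raise ValueError("each block must hold one component per core")
        for unit, component in zip(cache_units, block):
            unit.append(component)
    return cache_units


# Two blocks read from on-chip storage, three cores assumed for brevity.
on_chip_blocks = [[11, 21, 31], [12, 22, 32]]
to_memory = reverse_reorganize(on_chip_blocks, num_cores=3)
print(to_memory)  # [[11, 12], [21, 22], [31, 32]]
```

Within each new data block the components keep the order in which their source blocks were read from the on-chip storage structure, which is what makes the reverse transmission undo the forward reorganization.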
The invention realizes data recombination in the data transmission process, avoids time delay caused by host processing or on-chip storage structure processing, has flexible working mode, and meets the diversified requirements of data recombination and non-recombination, thereby greatly reducing time delay and resource consumption and effectively exerting the processing performance of the processor.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (13)

1. A data transfer apparatus for data transfer between a memory bank and an on-chip memory structure, comprising:
the control logic is used for generating corresponding control signals according to the configuration information so as to control the memory bank interface logic, the first selection logic, the data alignment logic, the recombination module, the second selection logic and the data routing logic; the configuration information at least comprises a data transmission direction and a transmission mode, wherein the transmission direction is a forward direction or a reverse direction, and the transmission mode is a simple mode or a recombination mode;
the memory bank interface logic is connected with the memory bank and used for reading data from the memory bank or writing data into the memory bank according to a control signal of the control logic;
the first selection logic is used for selecting a data transmission path corresponding to a forward transmission mode or a data receiving path corresponding to a reverse transmission mode under the control of a control signal of the control logic;
the data alignment logic is used for carrying out alignment operation on the transmitted data;
the recombination module is used for carrying out a recombination operation on the transmitted data, so as to split the data blocks read out from the memory bank into data components and recombine them into new data blocks, which are then transmitted to the on-chip storage structure for processing, or to split the data blocks processed by the on-chip storage structure into data components and recombine them into data blocks to be written into the memory bank;
the second selection logic is used for selecting a data receiving path corresponding to the forward transmission mode or a data transmission path corresponding to the reverse transmission mode under the control of a control signal of the control logic;
data routing logic for determining a destination address for the data transfer under control of the control signal.
2. A data transfer device according to claim 1, characterized in that the data transfer device further comprises:
the configuration port is used for acquiring configuration data input by external equipment initiating a data transmission request;
the configuration information analysis logic is used for analyzing the configuration data acquired by the configuration port to obtain configuration information and transmitting the configuration information to the control logic, wherein the configuration information at least comprises a data transmission direction and a transmission mode, the transmission direction is a forward direction or a reverse direction, and the transmission mode is a simple mode or a recombination mode.
3. A data transmission apparatus according to claim 1, wherein said configuration information comprises: transmission mode, transmission direction, on-chip storage initial address, memory bank initial address, transverse continuous size of effective data in a memory bank, number of rows of the effective data in each data block in the memory bank, total size of each row of data in each data block in the memory bank, transmission end mark and transmission starting signal;
the transmission mode is configured to be a simple mode or a recombination mode, the simple mode refers to that data is directly transmitted without being recombined, and the recombination mode refers to that the data is recombined in a data component form; the transfer direction is configured to be either a forward direction corresponding to the passage of data from the memory banks to the on-chip memory structure or a reverse direction corresponding to the passage of data from the on-chip memory structure to the memory banks.
4. A data transmission device according to claim 1, wherein the reassembly module comprises:
the data caching logic comprises a plurality of caching units and is used for caching the data blocks read from the memory banks or the data blocks needing to be written into the memory banks;
data splitting logic for splitting contiguous data in each data block read from the memory bank into a plurality of data components; or splitting contiguous data in a data block from the on-chip storage structure into the form of data components;
and the data reorganization logic is used for splicing and reorganizing the data components at the same position on each data block in the memory bank to form a new data block to be sent to the on-chip storage structure, or caching all the data components in the same data block from the on-chip storage structure to the same position on different cache units in the data cache logic respectively.
5. A data transmission device according to claim 4,
the data caching logic is configured as a ping-pong mechanism and is divided into two caching modules, wherein when data in one caching module is processed, the other caching module can cache the data at the same time.
6. The data transmission device according to claim 4, wherein the data reorganization logic is configured with a reassembly buffer unit for buffering data blocks to be sent to the on-chip storage structure after reassembly, or for buffering data components to be stored in the data caching logic after the data blocks from the on-chip storage structure are split.
7. A transmission method based on the data transmission apparatus of any one of claims 1 to 6, for transmitting data from a memory bank to an on-chip memory structure, comprising:
Z1, sequentially reading the continuous data of each data block in the memory bank and sequentially caching each data block in different cache units;
Z2, splitting the data in each data block according to the form of data components, and splicing and recombining the data components at the same position in each data block into a plurality of new data blocks;
Z3, transferring the new data blocks in the step Z2 to an on-chip storage structure one by one.
8. A data transmission method according to claim 7,
the data block read from the memory bank is subjected to a data alignment operation before being stored in the buffer unit.
9. A data transmission method according to claim 7, wherein said step Z2 includes:
Z21, reading a data block from each of the plurality of cache units and sequencing the data blocks, wherein the number of the data blocks is consistent with the number of the microprocessing cores corresponding to the on-chip storage structure, and the sequence of the data blocks is consistent with the sequence of the data blocks read from the memory bank;
Z22, taking the bit width of the memory bank interface as a unit component, and splitting each data block into a plurality of data components;
Z23, splicing the data components at the same position in each data block to form a plurality of new data blocks, wherein the number of the data components in each new data block is consistent with the number of the microprocessing cores, and the splicing sequence of each data component in the new data block is consistent with the sequence of the corresponding data block before splicing.
10. A transmission method based on the data transmission apparatus of any one of claims 1 to 6, for transmitting data from an on-chip memory structure to a memory bank, comprising:
F1, reading continuous data in the data blocks from the on-chip storage structure one by one;
F2, splitting each data block read in the step F1 into a plurality of data components, caching the data components belonging to the same data block at the same position of a plurality of cache units respectively, and enabling the plurality of data components cached in the same cache unit to form a new data block;
F3, storing the new data blocks into the memory bank one by one.
11. A data transmission method according to claim 10, wherein said step F2 includes:
F21, splitting each data block from the on-chip storage structure into a plurality of data components, wherein the number of the data components is consistent with the number of the micro-processing cores corresponding to the on-chip storage structure;
F22, sequentially storing the data components belonging to the same data block into the same position of different cache units, wherein one data component corresponds to one cache unit, the sequence of storing the data components into the cache units is consistent with the sequence of the data components in the data block, and a data component positioned earlier in the data block is correspondingly stored into an earlier cache unit;
F23, combining all data components stored in the same cache unit into a new data block, wherein the sequence of the data components in the new data block is consistent with the sequence of the corresponding data block read from the on-chip storage structure.
12. A data transfer method according to claim 10, wherein the new data block is subjected to a data alignment operation before it is stored in the memory bank.
13. A computer-readable storage medium having embodied thereon a program for implementing the method of any one of claims 7 to 12.
CN202010200676.0A 2020-03-20 2020-03-20 Data transmission device and transmission method Active CN111459856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010200676.0A CN111459856B (en) 2020-03-20 2020-03-20 Data transmission device and transmission method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010200676.0A CN111459856B (en) 2020-03-20 2020-03-20 Data transmission device and transmission method

Publications (2)

Publication Number Publication Date
CN111459856A true CN111459856A (en) 2020-07-28
CN111459856B CN111459856B (en) 2022-02-18

Family

ID=71678435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010200676.0A Active CN111459856B (en) 2020-03-20 2020-03-20 Data transmission device and transmission method

Country Status (1)

Country Link
CN (1) CN111459856B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232498A (en) * 2020-10-12 2021-01-15 安徽寒武纪信息科技有限公司 Data processing device, integrated circuit chip, electronic equipment, board card and method
CN114065905A (en) * 2020-08-07 2022-02-18 深圳先进技术研究院 Data batch processing method and batch processing device thereof, storage medium and computer equipment
CN115061640A (en) * 2022-08-11 2022-09-16 深圳云豹智能有限公司 Fault-tolerant distributed storage system, method, electronic equipment and medium
CN116403512A (en) * 2023-06-08 2023-07-07 永林电子股份有限公司 Display control method and system based on lamp strip

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1890630A (en) * 2003-12-09 2007-01-03 Arm有限公司 A data processing apparatus and method for moving data between registers and memory
CN103984677A (en) * 2014-05-30 2014-08-13 东南大学 Embedded reconfigurable system based on large-scale coarseness and processing method thereof
US20170116237A1 (en) * 2015-10-27 2017-04-27 Teradata Us, Inc. Buffered data-loading in column-partitioned database tables
CN107506328A (en) * 2016-06-14 2017-12-22 想象技术有限公司 Execute out memory requests
CN109416633A (en) * 2016-07-08 2019-03-01 Arm有限公司 For executing the device and method for rearranging operation
CN110046703A (en) * 2019-03-07 2019-07-23 中国科学院计算技术研究所 A kind of on piece storage processing system for neural network
CN110096456A (en) * 2019-05-13 2019-08-06 成都定为电子技术有限公司 A kind of High rate and large capacity caching method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BERKIN AKIN: "《HAMLET ARCHITECTURE FOR PARALLEL DATA REORGANIZATION IN MEMORY》", 《IEEE COMPUTER SOCIETY》 *
叶笑春,李文明,范东睿: "《高通量众核处理器设计》", 《数据与计算发展前沿》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065905A (en) * 2020-08-07 2022-02-18 深圳先进技术研究院 Data batch processing method and batch processing device thereof, storage medium and computer equipment
CN112232498A (en) * 2020-10-12 2021-01-15 安徽寒武纪信息科技有限公司 Data processing device, integrated circuit chip, electronic equipment, board card and method
CN112232498B (en) * 2020-10-12 2022-11-18 安徽寒武纪信息科技有限公司 Data processing device, integrated circuit chip, electronic equipment, board card and method
CN115061640A (en) * 2022-08-11 2022-09-16 深圳云豹智能有限公司 Fault-tolerant distributed storage system, method, electronic equipment and medium
CN116403512A (en) * 2023-06-08 2023-07-07 永林电子股份有限公司 Display control method and system based on lamp strip
CN116403512B (en) * 2023-06-08 2023-08-18 永林电子股份有限公司 Display control method and system based on lamp strip

Also Published As

Publication number Publication date
CN111459856B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN111459856B (en) Data transmission device and transmission method
CN110647480B (en) Data processing method, remote direct access network card and equipment
CN107301455B (en) Hybrid cube storage system for convolutional neural network and accelerated computing method
US11269796B2 (en) Acceleration control system based on binarization algorithm, chip, and robot
US20190228308A1 (en) Deep learning accelerator system and methods thereof
CN111656339B (en) Memory device and control method thereof
CN111782154B (en) Data moving method, device and system
CN112181293B (en) Solid state disk controller, solid state disk, storage system and data processing method
US10922258B2 (en) Centralized-distributed mixed organization of shared memory for neural network processing
CN111324294A (en) Method and apparatus for accessing tensor data
US8667199B2 (en) Data processing apparatus and method for performing multi-cycle arbitration
US11367498B2 (en) Multi-level memory hierarchy
WO2023124304A1 (en) Chip cache system, data processing method, device, storage medium, and chip
JP2009009549A (en) System and method for processing data by series of computers
CN101939733A (en) External device access apparatus, control method thereof, and system lsi
CN115994040A (en) Computing system, method for data broadcasting and data reduction, and storage medium
CN115237349A (en) Data read-write control method, control device, computer storage medium and electronic equipment
JP7152343B2 (en) semiconductor equipment
CN114970848A (en) Data handling device for parallel processor and corresponding processor
CN111625368A (en) Distributed computing system and method and electronic equipment
KR20230059536A (en) Method and apparatus for process scheduling
CN109598669B (en) GPU-oriented triangular rasterization scanning system
JPH08212178A (en) Parallel computer
CN107807888B (en) Data prefetching system and method for SOC architecture
CN112100098A (en) DDR control system and DDR memory system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231220

Address after: 215125 11-303, creative industrial park, No. 328, Xinghu street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee after: Suzhou Ruixin integrated circuit technology Co.,Ltd.

Address before: 100190 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences