CN112967172A - Data processing device, method, computer equipment and storage medium - Google Patents

Publication number
CN112967172A
Authority
CN
China
Prior art keywords
data
storage unit
data processing
control signal
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110221038.1A
Other languages
Chinese (zh)
Inventor
周军
常亮
周亮
何翔
赵能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Original Assignee
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China, Chengdu Sensetime Technology Co Ltd filed Critical University of Electronic Science and Technology of China
Priority to CN202110221038.1A priority Critical patent/CN112967172A/en
Publication of CN112967172A publication Critical patent/CN112967172A/en
Priority to PCT/CN2021/115780 priority patent/WO2022179074A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a data processing apparatus, a method, a computer device, and a storage medium. The apparatus includes a plurality of first storage units and a computing unit; the computing unit includes a processing engine (PE) array; the plurality of first storage units are respectively connected to the PEs in the PE array; each PE is configured to perform read/write access on the first storage unit connected to it; and the plurality of first storage units are configured to store the data transmitted during the read/write accesses of the connected PEs. According to the embodiments of the present disclosure, different PEs connected to different first storage units can access those first storage units in parallel, which improves the efficiency of reading data from and storing data into the first storage units, and thereby improves data processing efficiency.

Description

Data processing device, method, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing apparatus, a data processing method, a computer device, and a storage medium.
Background
Image processing algorithms are widely used in fields such as image recognition and target detection, and are generally implemented on an artificial intelligence (AI) accelerator hardware architecture. A commonly used AI accelerator hardware architecture mainly includes a storage unit, a computing unit, a control unit, and so on. The computing unit, which is the core, generally consists of a two-dimensional processing engine (PE) array and a register array (local register file). The storage unit may consist of several cache levels, including storage spaces such as double data rate (DDR) memory, static random access memory (SRAM), registers, and caches. The input data stream is cached and transferred through the different storage spaces and enters the register array corresponding to the PE array; the PE array reads the data from the register array, performs arithmetic (or logic) operations on it, and finally writes the results back to the corresponding storage space.
Current approaches to controlling how input data is stored into the register array suffer from low data processing efficiency.
Disclosure of Invention
Embodiments of the present disclosure provide at least a data processing apparatus, a data processing method, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing apparatus, including a plurality of first storage units and a computing unit; the computing unit includes a processing engine (PE) array; the plurality of first storage units are respectively connected to the PEs in the PE array; each PE is configured to perform read/write access on the first storage unit connected to it; and the plurality of first storage units are configured to store the data transmitted during the read/write accesses of the connected PEs.
In this way, different PEs connected to different first storage units can access those first storage units in parallel, which improves the efficiency of reading data from and storing data into the first storage units, and thereby improves data processing efficiency.
In an optional embodiment, the plurality of first storage units are configured to be respectively connected to different PE groups in the PE array.
In an alternative embodiment, each first storage unit is connected to a PE in a group of PEs; different PEs belong to different PE groups, respectively.
In an optional embodiment, the PE group includes a plurality of PEs in the PE array, and the plurality of PEs are located in the same row, or in the same half row, or in the same block in the hardware layout.
In this way, the corresponding first storage units are allocated to the PEs in the PE array, so that the number of data transmission channels can be increased, and the PEs can transmit more data when performing read/write access on the first storage units, thereby improving the efficiency of data transmission; meanwhile, the flexibility of the data processing device can be increased to adapt to different data processing requirements.
In an optional implementation manner, the PE is configured to perform read access on a connected first storage unit in a first processing cycle to obtain first data corresponding to the PE; and/or in a second processing cycle, performing write access to the connected first storage unit, and storing second data generated by the PE to the connected first storage unit.
In an alternative embodiment, different PEs connected to the same first storage unit perform read/write access on that first storage unit in different processing cycles; and/or, in each of the PE groups connected to different first storage units, there is one PE that performs read/write access on its connected first storage unit in the same processing cycle.
In an alternative embodiment, in each PE group to which different first storage units are connected, PEs having the same relative position perform read/write access to the connected first storage units in the same processing cycle.
Therefore, the multiple PEs can synchronously carry out read/write access on different first storage units, and the data transmission efficiency is improved.
In an optional embodiment, the apparatus further comprises a control unit; the control unit is configured to generate a first control signal based on a data processing instruction and transmit the first control signal to the PE; and the PE is configured to, in response to receiving the first control signal transmitted by the control unit, read the first data to be processed by the PE from the first storage unit connected to the PE.
In an optional embodiment, the control unit is further configured to generate a second control signal based on the data processing instruction and transmit the second control signal to the PE; and the PE is configured to, in response to the second control signal transmitted by the control unit, write the second data generated by the PE into the first storage unit connected to the PE.
In an optional embodiment, the apparatus further comprises a data scheduler; the control unit is further configured to generate a third control signal based on the data processing instruction and transmit the third control signal to the data scheduler; and the data scheduler is configured to perform write access on the first storage unit based on the third control signal.
In this way, the data scheduler acts as the intermediary for data transmission, so that even large volumes of data can be transmitted efficiently and in order, which helps avoid transmission errors.
In an optional embodiment, the apparatus further comprises a second storage unit; the data scheduler is configured to read the to-be-processed data corresponding to each first storage unit from the second storage unit and, based on a first data storage address carried in the third control signal, store the to-be-processed data corresponding to each first storage unit into that first storage unit; wherein the to-be-processed data corresponding to each first storage unit comprises the data that the PEs connected to that first storage unit need to read.
In an optional embodiment, the control unit is further configured to generate a fourth control signal based on the data processing instruction and transmit the fourth control signal to the data scheduler; and the data scheduler is further configured to perform read access on the first storage unit based on the fourth control signal.
In an optional embodiment, the data scheduler is configured to, based on the fourth control signal, read result data from the plurality of first storage units and store the result data into the second storage unit; wherein the result data comprises the data that was generated by the PEs connected to each first storage unit and stored in that first storage unit.
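As an informal illustration of the four control signals described above, the following Python sketch models a load, compute, and write-back round trip. It is only a sketch under stated assumptions: the names `ControlSignal`, `DataScheduler`, and `pe_step`, the use of dictionaries as storage, and the integer addresses and string keys are all hypothetical and do not come from the disclosure, which does not specify a signal encoding.

```python
from enum import Enum


class ControlSignal(Enum):
    FIRST = 1   # PE reads first data from its first storage unit
    SECOND = 2  # PE writes second data to its first storage unit
    THIRD = 3   # data scheduler writes into a first storage unit
    FOURTH = 4  # data scheduler reads from a first storage unit


class DataScheduler:
    """Moves data between the second storage unit and the first storage units."""

    def __init__(self, second_storage, first_storages):
        self.second_storage = second_storage    # assumed: dict of key -> value
        self.first_storages = first_storages    # assumed: list of dicts, address -> value

    def handle(self, signal, unit, address, key):
        if signal is ControlSignal.THIRD:
            # Store to-be-processed data at the address carried by the signal.
            self.first_storages[unit][address] = self.second_storage[key]
        elif signal is ControlSignal.FOURTH:
            # Collect result data back into the second storage unit.
            self.second_storage[key] = self.first_storages[unit][address]
        else:
            raise ValueError("the data scheduler only handles the third and fourth signals")


def pe_step(signal, first_storage, register, address):
    """Hypothetical PE behaviour for the first and second control signals."""
    if signal is ControlSignal.FIRST:
        return first_storage[address]           # value read by the PE
    if signal is ControlSignal.SECOND:
        first_storage[address] = register       # value written by the PE
        return register
    raise ValueError("a PE only handles the first and second signals")


second = {"input0": 3}
firsts = [{}, {}]
scheduler = DataScheduler(second, firsts)
scheduler.handle(ControlSignal.THIRD, unit=0, address=0, key="input0")
value = pe_step(ControlSignal.FIRST, firsts[0], None, 0)
pe_step(ControlSignal.SECOND, firsts[0], value * 2, 1)   # doubling stands in for the PE's computation
scheduler.handle(ControlSignal.FOURTH, unit=0, address=1, key="result0")
```

In this toy run the scheduler loads one input into first storage unit 0, the PE reads it and writes back a result, and the scheduler then copies that result into the second storage unit.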
In a second aspect, an embodiment of the present disclosure further provides a data processing method applied to a data processing apparatus, where the data processing apparatus includes a plurality of first storage units and a computing unit; the computing unit includes a processing engine (PE) array; the plurality of first storage units are respectively connected to the PEs in the PE array; and the data processing method comprises: the PE performing read/write access on the connected first storage unit; and the plurality of first storage units storing the data transmitted during the read/write accesses of the connected PEs.
In an alternative embodiment, the plurality of first storage units are respectively connected to different PE groups in the PE array.
In an alternative embodiment, each first storage unit is connected to a PE in a group of PEs; different PEs belong to different PE groups, respectively.
In an optional embodiment, the PE group includes a plurality of PEs in the PE array, and the plurality of PEs are located in the same row, or in the same half row, or in the same block in the hardware layout.
In an alternative embodiment, the PE performing read/write access to the connected first storage unit includes: the PE performs read access on a connected first storage unit in a first processing cycle to obtain first data corresponding to the PE; and/or in a second processing cycle, performing write access to the connected first storage unit, and storing second data generated by the PE to the connected first storage unit.
In an alternative embodiment, the PE performing read/write access on the connected first storage unit includes: different PEs connected to the same first storage unit performing read/write access on that first storage unit in different processing cycles; and/or, in each of the PE groups connected to different first storage units, one PE performing read/write access on its connected first storage unit in the same processing cycle.
In an optional embodiment, one PE in each of the PE groups connected to different first storage units performing read/write access on its connected first storage unit in the same processing cycle includes: in the PE groups connected to different first storage units, the PEs with the same relative position performing read/write access on their connected first storage units in the same processing cycle.
In an optional embodiment, the data processing apparatus further comprises a control unit; the data processing method further comprises: the control unit generates a first control signal based on a data processing instruction and transmits the first control signal to the PE; and the PE reads first data to be processed by the PE from a first storage unit connected with the PE in response to receiving a first control signal transmitted by the control unit.
In an optional embodiment, the method further comprises: the control unit generates a second control signal based on the data processing instruction and transmits the second control signal to the PE; and the PE writes second data generated by the PE into a first storage unit connected with the PE in response to receiving a second control signal transmitted by the control unit.
In an optional embodiment, the data processing apparatus further comprises a data scheduler; the data processing method further comprises: the control unit generates a third control signal based on the data processing instruction and transmits the third control signal to the data scheduler; the data scheduler performs a write access to the first storage unit based on the third control signal.
In an optional embodiment, the data processing apparatus further comprises a second storage unit; the data scheduler reads the to-be-processed data corresponding to each first storage unit from the second storage unit and, based on the first data storage address carried in the third control signal, stores the to-be-processed data corresponding to each first storage unit into that first storage unit; wherein the to-be-processed data corresponding to each first storage unit comprises the data that the PEs connected to that first storage unit need to read.
In an optional embodiment, the method further comprises: the control unit generating a fourth control signal based on the data processing instruction and transmitting the fourth control signal to the data scheduler; and the data scheduler performing read access on the first storage unit based on the fourth control signal.
In an optional implementation, the data scheduler performing read access on the first storage unit based on the fourth control signal includes: the data scheduler, based on the fourth control signal, reading result data from the plurality of first storage units and storing the result data into the second storage unit; wherein the result data comprises the data that was generated by the PEs connected to each first storage unit and stored in that first storage unit.
In a third aspect, alternative implementations of the present disclosure also provide a computer device, including an instruction memory and the data processing apparatus provided by the first aspect of the present disclosure.
In a fourth aspect, alternative implementations of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to perform the steps of the second aspect described above, or any one of the possible implementations of the second aspect.
For the description of the effects of the data processing method, the computer device, and the computer-readable storage medium, reference is made to the description of the data processing apparatus, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a PE array provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the internal structure of a PE provided by an embodiment of the present disclosure;
Fig. 4a is a schematic diagram of a connection manner of the first storage unit and the PE array provided by an embodiment of the present disclosure;
Fig. 4b is a schematic diagram of another connection manner of the first storage unit and the PE array provided by an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the data processing apparatus provided by an embodiment of the present disclosure when performing data processing;
Fig. 6 is a flowchart of a data processing method provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research has found that, when image data to be processed is handled with an AI accelerator hardware structure, the image data generally has to be transmitted from an external memory to the registers included in the PEs of the PE array, so that the computing unit in each PE can read its corresponding image data from the registers and process it. However, the register array formed by the registers of the different PEs shares a single bus, and the bandwidth of that bus is limited, so the data for different PEs has to be transmitted over the bus in turn, which limits the efficiency of data transmission.
In addition, after the image data to be processed is processed by the PE array, result data can be generated; the generated result data needs to be stored in an external memory; in this case, it is also necessary to transmit the result data generated by different PEs in the PE array to the external memory one by one using the bus. This results in a relatively long time required for storing the result data in the external memory, which results in a reduction in the efficiency of data transmission and also in a reduction in the efficiency of data processing.
Based on the above research, the present disclosure provides a data processing apparatus that includes a plurality of first storage units. Different first storage units are respectively connected to different PEs in the PE array, and each PE in the PE array can perform read/write access on the first storage unit connected to it. Different PEs connected to different first storage units can therefore access those first storage units in parallel, which improves the efficiency of reading data from and storing data into the first storage units, and thereby improves data processing efficiency.
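The benefit of parallel access can be illustrated with a rough count of transfer slots. The sketch below assumes, purely for illustration, that every PE moves one equally sized item and that each channel (the shared bus, or one first storage unit) serves one PE per slot; the function name `transfer_slots` and the slot model are hypothetical, and the counts of 16 PEs and four first storage units are borrowed from the example of Fig. 1.

```python
import math


def transfer_slots(num_pes, num_channels):
    """Slots needed when each channel serves one PE per slot (assumed model)."""
    return math.ceil(num_pes / num_channels)


shared_bus = transfer_slots(16, 1)    # all PEs take turns on a single bus
per_group = transfer_slots(16, 4)     # four first storage units accessed in parallel
```

Under these assumptions the shared bus needs 16 slots and the four parallel first storage units need 4; actual gains depend on hardware details that this model ignores.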
The drawbacks described above were identified by the inventors through practice and careful study; therefore, both the discovery of these problems and the solutions that the present disclosure proposes for them should be regarded as contributions made by the inventors in the course of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, a detailed description is first given of a data processing apparatus provided in an embodiment of the present disclosure.
Referring to Fig. 1, a schematic diagram of a data processing apparatus provided in an embodiment of the present disclosure is shown. The apparatus includes a plurality of first storage units (first storage unit 0 to first storage unit 3 are shown in the figure) and a computing unit; the computing unit includes a processing engine (PE) array; and the plurality of first storage units are respectively connected to the PEs in the PE array (PE0 to PE15 are shown in the figure), wherein:
the PE is configured to perform read/write access on the connected first storage unit; and
the plurality of first storage units are configured to store the data transmitted during the read/write accesses of the connected PEs.
Illustratively, the computing unit includes at least one PE array. The physical connection relationship between any PE in the PE array and the other PEs is shown in Fig. 2: the PEs together form a two-dimensional (2D) torus network, and each PE may be physically connected to the PEs located above, below, to the left of, and to the right of it.
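For readers unfamiliar with torus networks, the following sketch computes the four neighbours of a PE using the textbook definition of a 2D torus, in which indices wrap around at the array boundary. The function name `torus_neighbors`, the (row, column) coordinates, and the 4 x 4 array size are assumptions made for illustration; the disclosure does not state how wraparound relates to the edge PEs of Fig. 2.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the up, down, left, and right neighbours of PE (row, col) on a 2D torus."""
    return {
        "up": ((row - 1) % rows, col),
        "down": ((row + 1) % rows, col),
        "left": (row, (col - 1) % cols),
        "right": (row, (col + 1) % cols),
    }


corner = torus_neighbors(0, 0, rows=4, cols=4)
center = torus_neighbors(1, 2, rows=4, cols=4)
```

With wraparound, the corner PE at (0, 0) has the PE at (3, 0) as its upper neighbour and the PE at (0, 3) as its left neighbour, so every PE has exactly four neighbours.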
The PE array includes PE22, which actually performs the operation tasks, and PE21, which is located at the edge of the PE array; to distinguish the two, PE21 is labeled "halo" in Fig. 2. PE22 can perform operations such as multiply-accumulate (MAC) operations on data. The PE21 instances form a peripheral ring around the PE array. Because data may be shifted between different PEs while PE22 processes it, the PE21 instances in the peripheral ring ensure that no data is lost during such shifts.
In addition, Fig. 3 is a schematic diagram of the internal structure of a PE provided by an embodiment of the present disclosure. This example shows a PE22, which actually performs the operation tasks, and a PE21 at the edge of the PE array that is connected to it. PE22 includes a memory access module 33, denoted M1; an arithmetic logic unit (ALU) 34, denoted ALU1; an internal register 35, denoted R0_1; and a shift register file 36, wherein:
The memory access module M1 is configured to perform read/write access on the first storage unit connected to PE22. When performing read access on the first storage unit, M1 may transmit the data obtained from the first storage unit to the internal register R0_1 or to ALU1, where it waits to be processed by ALU1 in PE22; or it may transmit the data to the shift register file corresponding to PE22, so that the data is passed on to the PE21 connected to PE22. When performing write access on the first storage unit, M1 may write the result operation data calculated by ALU1 into the corresponding first storage unit.
The arithmetic logic unit ALU1 is configured to process the data it receives. Because data processing may involve several intermediate calculation steps, the intermediate calculation data may be transferred to the internal register in the PE and fetched from the internal register in the next calculation for further processing. Once the result operation data is obtained, it may be transmitted to the memory access module M1 for output, according to the actual data processing instruction; alternatively, it may be transferred to the shift register file corresponding to PE22, where it waits to be transferred to the PE21 connected to PE22.
Because PE21 does not perform data operations, it may omit the ALU, which reduces hardware requirements and therefore cost; alternatively, an ALU may be present but not actually perform data operations, which reduces the complexity of device integration. In Fig. 3, the dotted lines indicate the connections, similar to those of ALU1, that may exist when an ALU2 is present in PE21.
The internal register R0_1 is configured to receive and store the data that M1 reads from the first storage unit connected to PE22; or, being connected to ALU1, to store the intermediate operation data that is produced and pass it back to ALU1 so that ALU1 can obtain the result operation data, and then to store the result operation data; or to transmit the result operation data to M1.
Because PE21 may only transfer data between PEs, or only receive data transmitted from the first storage unit, it may omit the internal register, which reduces hardware requirements and therefore cost; alternatively, an internal register may be present but not actually be used for storage, which reduces the complexity of device integration. In Fig. 3, the dotted lines indicate the connections, similar to those of the internal register R0_1, that may exist when an internal register R0_2 is present in PE21.
The shift register file is configured to transmit the data obtained by a PE to the other PEs connected to it. In Fig. 3, the shift register file 36 of PE22 may be electrically connected to the shift register files of the PEs connected to it in four directions (up, down, left, and right); accordingly, the shift register file contains four shift registers, R1, R2, R3, and R4. Similarly, PE21 has a shift register file 37 containing R1', R2', R3', and R4'. Data is transmitted from PE22 to PE21 through the shift register in shift register file 36 that is connected to shift register file 37. For example, when shift register R4 in PE22 is connected to shift register R4' in PE21, data that PE22 places in R4 can be received by R4' and thus by PE21.
The other PEs have an internal structure similar to the one described above, which is given only as an example and is not described again here.
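The PE structure described above can be summarized in a small behavioural model. This is a simplified sketch, not the circuit of Fig. 3: the classes `FirstStorageUnit` and `PE`, their method names, and the dictionary-based storage are hypothetical, the memory access module always goes through the internal register, and the halo PE is modelled without an ALU or a storage connection.

```python
class FirstStorageUnit:
    """Toy first storage unit: a dictionary from address to value."""

    def __init__(self):
        self.cells = {}

    def read(self, address):
        return self.cells[address]

    def write(self, address, value):
        self.cells[address] = value


class PE:
    def __init__(self, storage, shift_names):
        self.storage = storage                       # connected first storage unit (may be None)
        self.r0 = 0                                  # internal register (R0_1 in Fig. 3)
        self.shift = {name: None for name in shift_names}
        self.links = {}                              # own shift register -> (neighbour PE, its register)

    def load(self, address):
        """Memory access module: read access into the internal register."""
        self.r0 = self.storage.read(address)

    def multiply_add(self, a, b):
        """ALU: add a * b to the value held in the internal register."""
        self.r0 = self.r0 + a * b

    def store(self, address):
        """Memory access module: write access from the internal register."""
        self.storage.write(address, self.r0)

    def shift_out(self, name):
        """Place the register value in a shift register and pass it to the linked PE."""
        self.shift[name] = self.r0
        neighbour, their_name = self.links[name]
        neighbour.shift[their_name] = self.shift[name]


memory = FirstStorageUnit()
memory.write(0, 5)
pe22 = PE(memory, ["R1", "R2", "R3", "R4"])
pe21 = PE(None, ["R1'", "R2'", "R3'", "R4'"])        # halo PE, no storage access in this sketch
pe22.links["R4"] = (pe21, "R4'")
pe22.load(0)
pe22.multiply_add(2, 3)
pe22.store(1)
pe22.shift_out("R4")
```

Here PE22 reads 5, adds 2 x 3, writes the result back to its first storage unit, and forwards the same value to PE21 through the R4 to R4' link.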
Different PEs in the PE array may be connected to different first storage units, or to the same first storage unit. A PE and a first storage unit that are connected can transfer data to each other: the PE can read data from the first storage unit and write processed data back to it.
For each first storage unit, a corresponding plurality of PEs may be connected, and thus the first storage unit may include a plurality of storage units corresponding to the number of PEs, so as to correspondingly store data read/written by each PE of the connected plurality of PEs.
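One way to picture a first storage unit that is subdivided per connected PE is a container with one private region for each PE. The sketch below is an assumption-laden illustration: the class name, the `sub_units` attribute, and the string PE identifiers are hypothetical, and the disclosure does not say how the internal storage units are addressed.

```python
class FirstStorageUnit:
    """One first storage unit with a separate sub-unit for each connected PE."""

    def __init__(self, pe_ids):
        self.sub_units = {pe_id: {} for pe_id in pe_ids}

    def write(self, pe_id, address, value):
        self.sub_units[pe_id][address] = value

    def read(self, pe_id, address):
        return self.sub_units[pe_id][address]


unit0 = FirstStorageUnit(["PE0", "PE1", "PE2", "PE3"])
unit0.write("PE1", 0, 42)
unit0.write("PE2", 0, 7)
```

Because each PE has its own region, PE1 and PE2 can use the same local address without overwriting each other's data.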
Specifically, when determining the connection relationship between the first storage units and the PEs in the PE array, for example, the PEs may be grouped first, and a corresponding first storage unit may be determined for each group of grouped PEs. When multiple PEs are grouped, multiple PEs having a physical connection relationship may be regarded as one PE group, and the multiple PEs are located in the same row, or in the same half row, or in the same block in the hardware layout.
For example, in one possible implementation, each row of PEs may be used as one PE group, as shown in Fig. 4a, which is a schematic diagram of one connection manner of the first storage units and the PE array. In Fig. 4a, the PEs in the first row form a PE group denoted G0, the PEs in the second row form a PE group denoted G1, and so on, up to the PE group for the nth row, denoted Gn. The first storage unit 0 is allocated to PE group G0, the first storage unit 1 to PE group G1, and so on, up to the first storage unit n for PE group Gn.
In another possible implementation, Fig. 4b shows another connection manner of the first storage units and the PE array. In Fig. 4b, every two rows of PEs form one PE group: the PEs in the first and second rows form the PE group G0, the PEs in the third and fourth rows form the PE group G1, and so on. The rows of PEs in the PE array can thus be divided into n different PE groups, and n corresponding first storage units are then allocated to the n PE groups, which may include, for example, first storage unit 0 to first storage unit n.
Specifically, each PE in the PE array may also be regarded as a PE group, and a corresponding first storage unit may be allocated to each PE, that is, each PE in the PE array has a corresponding storage unit. By further dividing the first storage unit in this way, the throughput of data interaction can be maximized, and thus the time consumed in data transmission is reduced.
Here, the PE groups may contain the same or different numbers of PEs. For example, to reduce routing complexity in the circuit, PEs that are physically close to one another may be taken as one PE group; or, in a more specialized device, a plurality of PEs with stronger arithmetic processing capability may be taken as one PE group. That is, the specific manner of determining the PE groups may be chosen according to the actual situation and is not limited here.
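The grouping options discussed above (one row per group, two rows per group, or one PE per group) can be sketched as a mapping from group index to PE indices, where the group index doubles as the index of the allocated first storage unit. The function name `group_by_rows`, the row-major PE numbering, and the 4 x 4 array size are assumptions for illustration only.

```python
def group_by_rows(rows, cols, rows_per_group=1):
    """Map each group index to the row-major indices of the PEs it contains."""
    groups = {}
    for r in range(rows):
        g = r // rows_per_group                      # group (and first storage unit) index
        groups.setdefault(g, []).extend(r * cols + c for c in range(cols))
    return groups


one_row = group_by_rows(4, 4, rows_per_group=1)      # grouping in the style of Fig. 4a
two_rows = group_by_rows(4, 4, rows_per_group=2)     # grouping in the style of Fig. 4b
per_pe = {i: [i] for i in range(16)}                 # every PE is its own group
```

Fewer PEs per group means more first storage units and more parallel channels, at the price of more storage instances and wiring.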
When the PE performs read/write access to the first storage unit, in the case of read access, for example, the PE may perform read access on the connected first storage unit in a first processing cycle to obtain first data corresponding to the PE;
in the case of write access, for example, the PE may perform write access on the connected first storage unit in a second processing cycle, and store second data generated by the PE in the connected first storage unit.
The processing cycle can be determined according to the actual data processing procedure. For example, a processing step such as multiplying and adding data may include two or three clock cycles because the calculation is simple, whereas a processing step such as weighted filtering of data may include four or five clock cycles because the calculation is more complicated. That is, the number of clock cycles included in a processing cycle is related to the actual processing procedure, and the number of clock cycles included in different processing cycles may be the same or different.
In addition, because there are a plurality of PE groups, and data transmission between each PE group and its corresponding first storage unit is implemented in a Single Instruction Multiple Data (SIMD) manner, different PEs connected to the same first storage unit may perform read/write access on that first storage unit in different processing cycles; and/or, in the PE groups connected to different first storage units, one PE of each group performs read/write access on its connected first storage unit in the same processing cycle.
For example, for the multiple first storage units and the multiple PEs shown in fig. 1, in one processing cycle, the PEs at the same position in each PE group may perform read access on the corresponding first storage units. Taking the embodiment corresponding to fig. 1 as an example, there are 4 first storage units, namely first storage unit 0, first storage unit 1, first storage unit 2, and first storage unit 3. The PEs connected to first storage unit 0 include PE0, PE1, PE2, and PE3; the PEs connected to first storage unit 1 include PE4, PE5, PE6, and PE7; the PEs connected to first storage unit 2 include PE8, PE9, PE10, and PE11; and the PEs connected to first storage unit 3 include PE12, PE13, PE14, and PE15. In this example, the first processing cycle may include 4 clock cycles. In the first clock cycle, PE0 performs read access on first storage unit 0, and at the same time PE4 performs read access on first storage unit 1, PE8 performs read access on first storage unit 2, and PE12 performs read access on first storage unit 3.
Then, PE0, PE4, PE8, and PE12 may store the read data in the corresponding internal memory, so that the PE including the arithmetic logic unit may perform arithmetic processing on the read data, or the PE not including the arithmetic logic unit may store the read data, and wait for a shift of the next processing cycle or other data transfer.
In the second clock cycle, PE1 performs read access on first storage unit 0, and at the same time PE5 performs read access on first storage unit 1, PE9 performs read access on first storage unit 2, and PE13 performs read access on first storage unit 3; in the third clock cycle, PE2 performs read access on first storage unit 0, and at the same time PE6 performs read access on first storage unit 1, PE10 performs read access on first storage unit 2, and PE14 performs read access on first storage unit 3; in the fourth clock cycle, PE3 performs read access on first storage unit 0, and at the same time PE7 performs read access on first storage unit 1, PE11 performs read access on first storage unit 2, and PE15 performs read access on first storage unit 3. In this way, in the first processing cycle, the PEs perform read access on the first storage units, and the first data that is stored in the first storage units and awaits processing by the PEs is transferred to the internal registers of the respective PEs to await further data access.
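The four-clock-cycle read sequence described for fig. 1 can be summarized as a small schedule. The following is an illustrative sketch only; it assumes that PE indices are assigned consecutively within each group, as in the example above, and the function name `read_schedule` is hypothetical.

```python
from typing import List, Tuple


def read_schedule(num_units: int = 4, pes_per_unit: int = 4) -> List[List[Tuple[int, int]]]:
    """Return, for each clock cycle, the (PE index, first storage unit index) pairs.

    In every clock cycle exactly one PE per group accesses its own first
    storage unit, so no two PEs contend for the same unit.
    """
    schedule = []
    for clock in range(pes_per_unit):
        accesses = [(unit * pes_per_unit + clock, unit) for unit in range(num_units)]
        schedule.append(accesses)
    return schedule


schedule = read_schedule()
for clock, accesses in enumerate(schedule, start=1):
    text = ", ".join(f"PE{pe} reads first storage unit {unit}" for pe, unit in accesses)
    print(f"clock cycle {clock}: {text}")
```

Each first storage unit appears once per clock cycle in this schedule, which is why a single access path per unit suffices in the sketch.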
In another possible implementation, when the number of PEs in the PE array is large and the size of the image to be processed is small, only a part of the PEs in the PE array may be needed to process the image to be processed. In that case, some of the PEs may not access their corresponding first storage units in the first processing cycle, and instead continue to wait for the data processing instruction of the next processing cycle.
Specifically, when the PE performs read access to the first storage unit, a control unit in the data processing apparatus generates a first control signal based on a data processing instruction and transmits the first control signal to the PE, and the PE reads first data to be processed by the PE from the first storage unit connected to the PE in response to receiving the first control signal transmitted by the control unit.
The data processing instruction may include related instructions for controlling the PE to operate on the data in the first storage unit, such as a data transfer instruction (MOV), an addition instruction (ADD), a subtraction instruction (SUB), a logical AND instruction (AND), and the like.
Taking the processing of any image to be processed by the data processing apparatus as an example, after the image to be processed has been processed and stored in the first storage unit, the control unit may generate a first control signal based on the data transfer instruction. The first control signal includes the data address that the PE receiving it accesses when performing read access on the first storage unit, and is used to control that PE to read data from the corresponding first storage unit and store the read data in its internal register.
For example, since first storage unit 0 shown in fig. 1 is connected to four PEs, namely PE0 to PE3, first storage unit 0 may include four corresponding data storage spaces, denoted s0, s1, s2, and s3. The first control signal transmitted by the control unit to PE0 may include, for example, the address of s0; after receiving the first control signal, PE0 may read the corresponding data from data storage space s0 in the connected first storage unit 0 according to the address of s0 carried in the first control signal.
The manner of reading data from the corresponding first storage unit by the other PEs is similar to the manner of reading data from the first storage unit 0 by the PE0, and is not described herein again.
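The address-carrying read described above can be modelled in software as follows. This is a minimal sketch, not the hardware implementation: the class names `FirstStorageUnit`, `FirstControlSignal`, and `PE`, the method name, and the use of string keys for the storage spaces s0 to s3 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FirstStorageUnit:
    """A first storage unit with one named data storage space per connected PE."""
    spaces: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FirstControlSignal:
    """Hypothetical software stand-in for the first control signal."""
    address: str  # address of the data storage space to read, e.g. "s0"


@dataclass
class PE:
    name: str
    unit: FirstStorageUnit          # the first storage unit this PE is connected to
    register: Optional[Any] = None  # internal register holding the data read

    def on_first_control_signal(self, signal: FirstControlSignal) -> Any:
        """Read the addressed data from the connected unit into the register."""
        self.register = self.unit.spaces[signal.address]
        return self.register


unit0 = FirstStorageUnit({f"s{i}": f"data for PE{i}" for i in range(4)})
pes = [PE(f"PE{i}", unit0) for i in range(4)]
for i, pe in enumerate(pes):
    pe.on_first_control_signal(FirstControlSignal(address=f"s{i}"))
    print(pe.name, "->", pe.register)
```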
In addition, when the image to be processed is processed and stored in the first storage unit, for example, the following manner may be adopted: the control unit generates a third control signal based on the data processing instruction and transmits the third control signal to a data scheduler in the data processing device; the data scheduler performs a write access to the first storage unit based on a third control signal.
The third control signal may carry, for example, a first data storage address, where the first data storage address is used to determine a storage location of the to-be-processed data stored in the first storage unit.
In a specific implementation, the data processing apparatus further includes a second storage unit, and the second storage unit may include an external memory for storing data such as an original image and a feature map to be processed. The embodiment of the present disclosure describes the data processing performed by the data processing apparatus in detail by taking the processing of an original image as an example. Taking the PE array shown in fig. 1 as an example, when each PE can process sub-image data composed of 4 × 4 pixels and the image size (in pixels) is 16 × 16, the image can be divided evenly so that each of the 16 PEs processes a corresponding 4 × 4 block of pixels. At this time, the data of the resulting 16 sub-images may be stored in the second storage unit to wait for the data scheduler to read it. Moreover, since the data stored in the second storage unit is data that can be directly processed by the PEs, when the data in the second storage unit is stored into the first storage units, only the data transfer needs to be completed, without performing division or other processing on the data, thereby reducing the processing tasks of the data processing apparatus during data transmission and improving the efficiency of data transmission. In addition, since the data stored in the second storage unit can be used directly as the data to be processed corresponding to each first storage unit, the data to be processed can be stored in that first storage unit and read by the PEs connected to it.
Specifically, the data scheduler reads the data to be processed corresponding to each first storage unit from the second storage unit, and stores the data to be processed corresponding to each first storage unit into the corresponding first storage unit based on the first data storage address carried in the third control signal; the data to be processed corresponding to each first storage unit includes the data that the PEs connected to that first storage unit need to read.
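The 16 × 16 example can be worked through in a short sketch. It assumes that sub-images are numbered in row-major order and that sub-image k is assigned to PE k, which the disclosure does not specify; the function names and the dictionary keys s0 to s3 are likewise hypothetical.

```python
from typing import Dict, List

Tile = List[List[int]]


def split_into_tiles(image: List[List[int]], tile: int) -> List[Tile]:
    """Divide an image into tile x tile sub-images, in row-major order."""
    tiles = []
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            tiles.append([row[c:c + tile] for row in image[r:r + tile]])
    return tiles


def load_first_storage_units(tiles: List[Tile], pes_per_unit: int) -> List[Dict[str, Tile]]:
    """Stand-in for the data scheduler: copy sub-images into first storage units.

    Only a transfer is performed here; the division has already been done.
    """
    units = []
    for start in range(0, len(tiles), pes_per_unit):
        chunk = tiles[start:start + pes_per_unit]
        units.append({f"s{i}": t for i, t in enumerate(chunk)})
    return units


# A 16 x 16 image whose pixel value encodes its position.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
tiles = split_into_tiles(image, tile=4)          # contents of the second storage unit
units = load_first_storage_units(tiles, pes_per_unit=4)
print(len(tiles), "sub-images in", len(units), "first storage units")
```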
After the first storage unit stores the data to be read by the connected PE, the PE may wait for the control unit to transmit the control signal, and after receiving the first control signal sent by the control unit, read the corresponding data from the corresponding first storage unit for processing. At this time, for a more complex image processing algorithm, for example, when performing convolution processing on an image, a plurality of steps such as weighted summation may be included, so that during processing, a plurality of intermediate data may exist, and these intermediate data may be stored in the internal memories respectively corresponding to the PEs for temporary storage, and then the data temporarily stored in the internal memories is directly called for processing during next processing until all data processing tasks on the original image are completed.
Alternatively, the intermediate data may be transferred to the first storage unit, but the intermediate data in the first storage unit may not be output to the second storage unit because the intermediate data is not the final output result data and further processing is required.
Specifically, the control unit may generate a second control signal based on the data processing instruction and transmit the second control signal to the PE; and the PE writes the data generated by the PE into a first storage unit connected with the PE in response to receiving a second control signal transmitted by the control unit.
The second control signal is similar to the first control signal: it includes the data address accessed by the PE receiving the second control signal when performing write access on the first storage unit, and is used to control that PE to write data into the corresponding first storage unit, so that the first storage unit receives the data written by the corresponding PE and holds it for output to the second storage unit, so as to obtain the processing result of the original image.
After the PE completes all processing steps on the data in the original image, the result data for output can be obtained, and at this time, the control unit can also generate a fourth control signal and transmit the fourth control signal to the data scheduler; the data scheduler reads the result data from the plurality of first storage units and stores the result data into the second storage unit based on the fourth control signal; wherein the result data includes data generated by the PE connected to the first storage unit and stored in the first storage unit.
Specifically, the fourth control signal may carry a second data storage address, where the second data storage address is used to indicate the location at which the data scheduler stores the result data in the second storage unit. Alternatively, the fourth control signal may not carry the second data storage address.
For example, the data scheduler may read the result data respectively generated by the PE0, the PE1, the PE2, and the PE3, that is, the result data stored in the four data storage spaces s0, s1, s2, and s3 in the first storage unit 0, from the first storage unit 0, and then store the result data in the second storage unit, to obtain the processing result of the original image.
In a possible implementation manner, the control unit may further control to sequentially stitch the plurality of result data output in the second storage unit, so as to restore the plurality of result data obtained from the original image divided into the plurality of sub-images to result data corresponding to the original image.
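The stitching step can be sketched as the inverse of dividing the image into sub-images. The sketch assumes that the result sub-images are held in row-major order and all have the same size; the function name `stitch_tiles` is hypothetical.

```python
from typing import List

Tile = List[List[int]]


def stitch_tiles(tiles: List[Tile], tiles_per_row: int) -> List[List[int]]:
    """Reassemble equally sized sub-images (row-major order) into one image."""
    image = []
    for start in range(0, len(tiles), tiles_per_row):
        band = tiles[start:start + tiles_per_row]   # one horizontal band of tiles
        for r in range(len(band[0])):
            row = []
            for t in band:
                row.extend(t[r])
            image.append(row)
    return image


# Four 2 x 2 result sub-images stitched into a 4 x 4 result.
result_tiles = [
    [[1, 2], [5, 6]],
    [[3, 4], [7, 8]],
    [[9, 10], [13, 14]],
    [[11, 12], [15, 16]],
]
stitched = stitch_tiles(result_tiles, tiles_per_row=2)
for row in stitched:
    print(row)
```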
The embodiment of the present disclosure also provides a specific example of performing convolution processing on the original image a by using the data processing apparatus.
Fig. 5 is a schematic diagram of the data processing apparatus during data processing; there are 4 first storage units, denoted PE_RAM0 to PE_RAM3, and the PE array includes 16 PEs, denoted PE0 to PE15.
Among them, PE0 to PE3, PE4 to PE7, PE8 to PE11, and PE12 to PE15 form PE groups denoted G0, G1, G2, and G3, respectively.
After the PE groups are determined, PE_RAM0 may be taken as the first storage unit corresponding to G0; PE_RAM1 as the first storage unit corresponding to G1; PE_RAM2 as the first storage unit corresponding to G2; and PE_RAM3 as the first storage unit corresponding to G3.
When the data processing apparatus is used to complete the operation of a convolutional layer, the control unit generates a third control signal C3 based on the data processing instruction and sends it to the data scheduler. The data scheduler performs read access on the second storage unit, which stores the data corresponding to the original image A, and then stores the data to be used for the convolution calculation from the second storage unit into the first storage units.
Then, the control unit sends a first control signal C1 to the PEs, and each PE operating in the PE array reads the first data to be processed from the corresponding first storage unit, and then performs corresponding calculation.
Here, C1 controls the following operations: in the first clock cycle, PE0, PE4, PE8, and PE12, which correspond to PE_RAM0 to PE_RAM3 respectively, read their respective first data to be processed; in the second clock cycle, PE1, PE5, PE9, and PE13 read their respective first data to be processed; in the third clock cycle, PE2, PE6, PE10, and PE14 read their respective first data to be processed; in the fourth clock cycle, PE3, PE7, PE11, and PE15 read their respective first data to be processed.
Then, PE0 to PE15 perform data processing on the respective corresponding first data to be processed, for example, perform convolution operation processing on the first data to obtain second data.
Here, the second data is result data.
After the PEs in the PE array process the first data to obtain the second data, the control unit sends a second control signal C2 to the PEs, so that each PE writes its second data into the first storage unit corresponding to that PE. The control unit then sends a fourth control signal C4 to the data scheduler, causing the data scheduler to read the result data from the first storage units and store it in the second storage unit.
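The sequence of control signals C3, C1, C2, and C4 in this example can be traced with a simplified software model. This is a sketch under stated assumptions: each PE handles a single value instead of a sub-image, the convolution is replaced by a placeholder operation `op`, write-back is assumed to follow the same one-PE-per-group schedule as the reads, and the function name `run_pipeline` is hypothetical.

```python
from typing import Callable, List


def run_pipeline(
    second_storage_in: List[int],
    num_units: int = 4,
    pes_per_unit: int = 4,
    op: Callable[[int], int] = lambda x: 2 * x,  # placeholder, not a convolution
) -> List[int]:
    """Trace one pass of data through the apparatus and return the result data."""
    # C3: the data scheduler copies pending data into PE_RAM0..PE_RAM3.
    first_units = [
        second_storage_in[u * pes_per_unit:(u + 1) * pes_per_unit]
        for u in range(num_units)
    ]
    registers: List[int] = [0] * (num_units * pes_per_unit)

    # C1: in each clock cycle, one PE per group reads from its own unit.
    for clock in range(pes_per_unit):
        for u in range(num_units):
            registers[u * pes_per_unit + clock] = first_units[u][clock]

    # Every PE processes its first data to obtain its second data.
    registers = [op(value) for value in registers]

    # C2: each PE writes its second data back to its first storage unit.
    for clock in range(pes_per_unit):
        for u in range(num_units):
            first_units[u][clock] = registers[u * pes_per_unit + clock]

    # C4: the data scheduler moves the result data to the second storage unit.
    return [value for unit in first_units for value in unit]


result = run_pipeline(list(range(16)))
print(result)
```

Replacing `op` with a real per-sub-image convolution would not change the order of the four control signals, which is the only aspect this sketch is meant to show.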
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, a data processing method corresponding to the data processing apparatus is also provided in the embodiments of the present disclosure. Since the principle by which the method solves the problem is similar to that of the data processing apparatus in the embodiments of the present disclosure, the implementation of the method may refer to the implementation of the apparatus, and repeated details are not described again.
Referring to fig. 6, a schematic diagram of a data processing method provided in an embodiment of the present disclosure is shown, where the data processing method is applied to a data processing apparatus; the data processing method comprises the following steps:
S601: the PE performs read/write access on the connected first storage unit;
S602: the plurality of first storage units store data transferred during read/write access by the connected PEs.
In an alternative embodiment, the plurality of first storage units are respectively connected to different PE groups in the PE array.
In an alternative embodiment, each first storage unit is connected to a PE in a group of PEs; different PEs belong to different PE groups, respectively.
In an optional embodiment, the PE group includes a plurality of PEs in the PE array, and the plurality of PEs are located in the same row, or in the same half row, or in the same block in the hardware layout.
In an alternative embodiment, the PE performing read/write access to the connected first storage unit includes: the PE performs read access on a connected first storage unit in a first processing cycle to obtain first data corresponding to the PE; and/or in a second processing cycle, performing write access to the connected first storage unit, and storing second data generated by the PE to the connected first storage unit.
In an alternative embodiment, the PE performing read/write access to the connected first storage unit includes: different PEs connected with the same first storage unit perform read/write access on the same first storage unit in different processing cycles; and/or one PE respectively has read/write access to the connected first storage unit in the same processing cycle in the PE group connected with different first storage units.
In an optional embodiment, a group of PEs to which different first storage units are connected, where there is one PE performing read/write access on the connected first storage unit in the same processing cycle, includes: in each PE group connected with different first storage units, the PEs with the same relative position perform read/write access on the connected first storage units in the same processing cycle.
In an optional embodiment, the data processing apparatus further comprises a control unit; the data processing method further comprises: the control unit generates a first control signal based on a data processing instruction and transmits the first control signal to the PE; and the PE reads first data to be processed by the PE from a first storage unit connected with the PE in response to receiving a first control signal transmitted by the control unit.
In an optional embodiment, the method further comprises: the control unit generates a second control signal based on the data processing instruction and transmits the second control signal to the PE; and the PE writes second data generated by the PE into a first storage unit connected with the PE in response to receiving a second control signal transmitted by the control unit.
In an optional embodiment, the data processing apparatus further comprises a data scheduler; the data processing method further comprises: the control unit generates a third control signal based on the data processing instruction and transmits the third control signal to the data scheduler; the data scheduler performs a write access to the first storage unit based on the third control signal.
In an optional embodiment, the data processing apparatus further comprises a second storage unit; the data scheduler reads the data to be processed corresponding to each first storage unit from the second storage unit, and stores the data to be processed corresponding to each first storage unit into the corresponding first storage unit based on the first data storage address carried in the third control signal; the data to be processed corresponding to each first storage unit comprises the data that the PEs connected with that first storage unit need to read.
In an optional embodiment, the method further comprises: the control unit generates a fourth control signal based on the data processing instruction and transmits the fourth control signal to the data scheduler; the data scheduler performs a read access to the first storage unit based on the fourth control signal.
In an optional implementation, the data scheduler performs a read access to the first storage unit based on the fourth control signal, including: the data scheduler reads result data from the plurality of first storage units based on the fourth control signal and stores the result data into a second storage unit; wherein the result data comprises the data generated by the PEs connected with the first storage unit and stored in the first storage unit.
An embodiment of the present disclosure further provides a computer device, including: an instruction memory and the data processing apparatus provided by the embodiments of the present disclosure.
The data processing device provided by the embodiment of the disclosure may include a chip, an AI chip, and the like. The computer device provided by the embodiment of the present disclosure may include an intelligent terminal such as a mobile phone, or may also be other devices, servers, and the like that may be used for data processing, and is not limited herein.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the data processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the data processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the scope of protection of the present disclosure is not limited thereto. Although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still, within the technical scope of the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (25)

1. A data processing apparatus, comprising: a plurality of first storage units and a computing unit; the computing unit includes an array of processing engines (PEs); the first storage units are respectively connected with the PEs in the PE array;
the PE is used for performing read/write access on the connected first storage unit;
the first storage units are used for storing data transmitted in the read/write access process of the connected PE.
2. The data processing apparatus according to claim 1, wherein the plurality of first storage units are respectively connected to different PE groups in the PE array.
3. The data processing apparatus according to claim 1 or 2, wherein each first storage unit is connected to a PE of a group of PEs; different PEs belong to different PE groups, respectively.
4. The data processing apparatus according to claim 3, wherein the PE group includes a plurality of PEs having physical connection relationships in the PE array, and the PEs are located in a same row, a same half row, or a same block in a hardware layout.
5. The data processing apparatus according to any one of claims 1 to 4, wherein the PE is configured to perform, in a first processing cycle, a read access to the connected first storage unit to obtain first data corresponding to the PE; and/or
in a second processing cycle, perform write access on the connected first storage unit, and store second data generated by the PE to the connected first storage unit.
6. The data processing apparatus according to any one of claims 1 to 5, wherein different PEs connected to the same first storage unit perform read/write access to the same first storage unit in different processing cycles;
and/or,
in the PE groups connected with different first storage units, one PE of each PE group performs read/write access to the connected first storage unit in the same processing cycle.
7. The data processing apparatus of claim 6, wherein, of the groups of PEs to which different first storage units are connected, PEs having the same relative position perform read/write access to the connected first storage units in the same processing cycle.
8. The data processing apparatus according to any one of claims 1 to 7, further comprising: a control unit;
the control unit is used for generating a first control signal based on a data processing instruction and transmitting the first control signal to the PE;
and the PE is used for responding to the received first control signal transmitted by the control unit and reading the first data to be processed by the PE from a first storage unit connected with the PE.
9. The data processing apparatus according to claim 8, wherein the control unit is further configured to generate a second control signal based on the data processing instruction, and to transmit the second control signal to the PE;
and the PE is used for responding to the second control signal transmitted by the control unit and writing the data generated by the PE into a first storage unit connected with the PE.
10. The data processing apparatus according to claim 8 or 9, further comprising: a data scheduler;
the control unit is further configured to generate a third control signal based on the data processing instruction, and transmit the third control signal to the data scheduler;
the data scheduler is configured to perform a write access to the first storage unit based on the third control signal.
11. The data processing apparatus according to claim 10, further comprising a second storage unit;
the data scheduler is configured to read to-be-processed data corresponding to each first storage unit from the second storage unit, and store to-be-processed data corresponding to each first storage unit into the corresponding first storage unit based on a first data storage address carried in the third control signal;
the data to be processed corresponding to each first storage unit comprises: the data that the PEs connected with that first storage unit need to read.
12. The data processing apparatus according to claim 10 or 11, wherein the control unit is further configured to generate a fourth control signal based on the data processing instruction, and to transmit the fourth control signal to the data scheduler;
the data scheduler is further configured to perform a read access to the first storage unit based on the fourth control signal.
13. The data processing apparatus according to claim 12, wherein the data scheduler is configured to read result data from the plurality of first storage units and store the result data in a second storage unit based on the fourth control signal;
wherein the result data comprises: the data generated by the PEs connected with the first storage unit and stored in the first storage unit.
14. A data processing method applied to a data processing apparatus, the data processing apparatus comprising: a plurality of first storage units and a computing unit; the computing unit includes an array of processing engines (PEs); the first storage units are respectively connected with the PEs in the PE array; the data processing method comprises the following steps:
the PE performs read/write access to the connected first storage unit;
the plurality of first storage units store data transmitted in a read/write access process of the connected PEs.
15. The data processing method of claim 14, wherein the PE performing read/write access to the connected first storage unit comprises:
the PE performs read access on the connected first storage unit in a first processing cycle to obtain first data corresponding to the PE; and/or
the PE performs write access on the connected first storage unit in a second processing cycle to store second data generated by the PE in the connected first storage unit.
16. The data processing method according to claim 14 or 15, wherein the PE performing read/write access to the connected first storage unit comprises:
different PEs connected with the same first storage unit perform read/write access on the same first storage unit in different processing cycles;
and/or
in each of the PE groups connected with different first storage units, one PE performs read/write access to the connected first storage unit in the same processing cycle.
17. The data processing method according to claim 16, wherein, in each of the PE groups connected with different first storage units, one PE performing read/write access to the connected first storage unit in the same processing cycle comprises: in each PE group connected with a different first storage unit, the PEs at the same relative position perform read/write access on their connected first storage units in the same processing cycle.
18. The data processing method according to any one of claims 14 to 17, wherein the data processing apparatus further comprises a control unit; the data processing method further comprises:
the control unit generates a first control signal based on a data processing instruction and transmits the first control signal to the PE;
and the PE, in response to receiving the first control signal transmitted by the control unit, reads first data to be processed by the PE from the first storage unit connected with the PE.
19. The data processing method of claim 18, further comprising:
the control unit generates a second control signal based on the data processing instruction and transmits the second control signal to the PE;
and the PE, in response to receiving the second control signal transmitted by the control unit, writes second data generated by the PE into the first storage unit connected with the PE.
20. The data processing method according to claim 18 or 19, wherein the data processing apparatus further comprises a data scheduler; the data processing method further comprises:
the control unit generates a third control signal based on the data processing instruction and transmits the third control signal to the data scheduler;
the data scheduler performs a write access to the first storage unit based on the third control signal.
21. The data processing method of claim 20, wherein the data processing apparatus further comprises a second storage unit;
the data scheduler reads the data to be processed corresponding to each first storage unit from the second storage unit, and stores the data to be processed corresponding to each first storage unit into the corresponding first storage unit based on the first data storage address carried in the third control signal;
the to-be-processed data corresponding to each first storage unit comprises: data that the PE connected to that first storage unit needs to read.
22. The data processing method according to claim 20 or 21, further comprising:
the control unit generates a fourth control signal based on the data processing instruction and transmits the fourth control signal to the data scheduler;
the data scheduler performs a read access to the first storage unit based on the fourth control signal.
23. The data processing method of claim 22, wherein the data scheduler performing a read access to the first storage unit based on the fourth control signal comprises:
the data scheduler reads result data from the plurality of first storage units based on the fourth control signal and stores the result data into a second storage unit;
wherein the result data comprises: data generated by the PE connected to the first storage unit and stored in that first storage unit.
24. A computer device, comprising: an instruction memory and a data processing apparatus as claimed in any one of claims 1 to 13.
25. A computer-readable storage medium, characterized in that a computer program is stored thereon, wherein the computer program, when executed by a data processing apparatus, performs the steps of the data processing method according to any one of claims 14 to 23.
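
The flow described by method claims 14 to 23 can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented implementation: the names FirstStorageUnit, scheduler_load, run_pes, scheduler_collect and demo, the tuple-shaped addresses, and the round-robin rule for choosing the active PE are all hypothetical, and the four control signals are modeled as plain function calls rather than hardware signals.

```python
from dataclasses import dataclass, field


@dataclass
class FirstStorageUnit:
    """Hypothetical model of one first storage unit shared by a PE group."""
    cells: dict = field(default_factory=dict)
    accesses: list = field(default_factory=list)  # log of (cycle, pe_position, kind)

    def read(self, cycle, pe_position, addr):
        self.accesses.append((cycle, pe_position, "read"))
        return self.cells[addr]

    def write(self, cycle, pe_position, addr, value):
        self.accesses.append((cycle, pe_position, "write"))
        self.cells[addr] = value


def scheduler_load(second_storage, units, group_size):
    """Stand-in for the third control signal: the data scheduler copies each
    unit's to-be-processed data from the second storage unit into that unit."""
    for u, unit in enumerate(units):
        for p in range(group_size):
            unit.cells[("in", p)] = second_storage[("in", u, p)]


def run_pes(units, group_size, compute):
    """Stand-in for the first and second control signals.

    Assumption: a round-robin schedule. In every processing cycle exactly one
    PE per group is active, and it is the PE at the same relative position in
    every group. All reads happen first, then all writes.
    """
    operands = {}
    for cycle in range(2 * group_size):
        p = cycle % group_size  # relative position active in this cycle
        for u, unit in enumerate(units):
            if cycle < group_size:
                operands[(u, p)] = unit.read(cycle, p, ("in", p))
            else:
                unit.write(cycle, p, ("out", p), compute(operands[(u, p)]))


def scheduler_collect(units, group_size, second_storage):
    """Stand-in for the fourth control signal: the data scheduler reads result
    data from every first storage unit and stores it in the second storage unit."""
    for u, unit in enumerate(units):
        for p in range(group_size):
            second_storage[("out", u, p)] = unit.cells[("out", p)]


def demo():
    group_size, n_units = 2, 3
    second_storage = {
        ("in", u, p): 10 * u + p
        for u in range(n_units)
        for p in range(group_size)
    }
    units = [FirstStorageUnit() for _ in range(n_units)]
    scheduler_load(second_storage, units, group_size)
    run_pes(units, group_size, compute=lambda x: x * 2)
    scheduler_collect(units, group_size, second_storage)
    return second_storage, units
```

In this sketch no first storage unit is accessed by two PEs in the same cycle, which is the property claims 16 and 17 rely on; the per-unit access log makes that easy to inspect after calling demo().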
CN202110221038.1A 2021-02-26 2021-02-26 Data processing device, method, computer equipment and storage medium Pending CN112967172A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110221038.1A CN112967172A (en) 2021-02-26 2021-02-26 Data processing device, method, computer equipment and storage medium
PCT/CN2021/115780 WO2022179074A1 (en) 2021-02-26 2021-08-31 Data processing apparatus and method, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110221038.1A CN112967172A (en) 2021-02-26 2021-02-26 Data processing device, method, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112967172A (en) 2021-06-15

Family

ID=76275819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110221038.1A Pending CN112967172A (en) 2021-02-26 2021-02-26 Data processing device, method, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112967172A (en)
WO (1) WO2022179074A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596472A (en) * 2021-07-27 2021-11-02 安谋科技(中国)有限公司 Data processing method and device
CN113872752A (en) * 2021-09-07 2021-12-31 哲库科技(北京)有限公司 Security engine module, security engine device and communication equipment
WO2022179074A1 (en) * 2021-02-26 2022-09-01 成都商汤科技有限公司 Data processing apparatus and method, computer device, and storage medium
WO2023151216A1 (en) * 2022-02-14 2023-08-17 华为技术有限公司 Graph data processing method and chip

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164752A1 (en) * 2004-08-13 2009-06-25 Clearspeed Technology Plc Processor memory system
JP2012164144A (en) * 2011-02-07 2012-08-30 Denso Corp Microcomputer
US20130024658A1 (en) * 2011-07-21 2013-01-24 Renesas Electronics Corporation Memory controller and simd processor
CN110892373A (en) * 2018-07-24 2020-03-17 深圳市大疆创新科技有限公司 Data access method, processor, computer system and removable device
CN111045727A (en) * 2018-10-14 2020-04-21 天津大学青岛海洋技术研究院 Processing unit array based on nonvolatile memory calculation and calculation method thereof
CN111897579A (en) * 2020-08-18 2020-11-06 腾讯科技(深圳)有限公司 Image data processing method, image data processing device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5625836A (en) * 1990-11-13 1997-04-29 International Business Machines Corporation SIMD/MIMD processing memory element (PME)
CN106502923B (en) * 2016-09-30 2018-08-24 西安邮电大学 Storage accesses ranks two-stage switched circuit in cluster in array processor
CN107590085B (en) * 2017-08-18 2018-05-29 浙江大学 A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN111209249B (en) * 2020-01-10 2021-11-02 中山大学 Hardware accelerator system based on time domain finite difference method and implementation method thereof
CN112967172A (en) * 2021-02-26 2021-06-15 成都商汤科技有限公司 Data processing device, method, computer equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022179074A1 (en) * 2021-02-26 2022-09-01 成都商汤科技有限公司 Data processing apparatus and method, computer device, and storage medium
CN113596472A (en) * 2021-07-27 2021-11-02 安谋科技(中国)有限公司 Data processing method and device
CN113596472B (en) * 2021-07-27 2023-12-22 安谋科技(中国)有限公司 Data processing method and device
CN113872752A (en) * 2021-09-07 2021-12-31 哲库科技(北京)有限公司 Security engine module, security engine device and communication equipment
CN113872752B (en) * 2021-09-07 2023-10-13 哲库科技(北京)有限公司 Security engine module, security engine device, and communication apparatus
WO2023151216A1 (en) * 2022-02-14 2023-08-17 华为技术有限公司 Graph data processing method and chip

Also Published As

Publication number Publication date
WO2022179074A1 (en) 2022-09-01

Similar Documents

Publication Publication Date Title
CN112967172A (en) Data processing device, method, computer equipment and storage medium
Lu et al. Flexflow: A flexible dataflow accelerator architecture for convolutional neural networks
CN110050267B (en) System and method for data management
US6067609A (en) Pattern generation and shift plane operations for a mesh connected computer
JP2022540749A (en) Systems and methods for shift-based information mixing across channels of neural networks similar to shuffle nets
US10768856B1 (en) Memory access for multiple circuit components
CN108170640B (en) Neural network operation device and operation method using same
US11487845B2 (en) Convolutional operation device with dimensional conversion
US20040215677A1 (en) Method for finding global extrema of a set of bytes distributed across an array of parallel processing elements
CN111897579A (en) Image data processing method, image data processing device, computer equipment and storage medium
CN106846235B (en) Convolution optimization method and system accelerated by NVIDIA Kepler GPU assembly instruction
WO1999053412A9 (en) Global input/output support for a mesh connected computer
WO2023045445A1 (en) Data processing device, data processing method, and related product
Huang et al. IECA: An in-execution configuration CNN accelerator with 30.55 GOPS/mm² area efficiency
WO2023045446A1 (en) Computing apparatus, data processing method, and related product
CN114003198B (en) Inner product processing unit, arbitrary precision calculation device, method, and readable storage medium
WO2022179075A1 (en) Data processing method and apparatus, computer device and storage medium
CN116051345A (en) Image data processing method, device, computer equipment and readable storage medium
CN114692844A (en) Data processing device, data processing method and related product
Qiu et al. An FPGA‐Based Convolutional Neural Network Coprocessor
Borges AlexNet deep neural network on a many core platform
CN110766150A (en) Regional parallel data loading device and method in deep convolutional neural network hardware accelerator
WO2022111013A1 (en) Device supporting multiple access modes, method and readable storage medium
US11392667B2 (en) Systems and methods for an intelligent mapping of neural network weights and input data to an array of processing cores of an integrated circuit
WO2022001454A1 (en) Integrated computing apparatus, integrated circuit chip, board card, and computing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40047364; Country of ref document: HK)