CN115423084A - Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium - Google Patents


Info

Publication number
CN115423084A
Authority
CN
China
Prior art keywords
data
convolution
buffer
characteristic
basic operation
Prior art date
Legal status
Pending
Application number
CN202211216188.4A
Other languages
Chinese (zh)
Inventor
王宇
吴珺媛
Current Assignee
Shanghai Lichi Semiconductor Co ltd
Original Assignee
Shanghai Lichi Semiconductor Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Lichi Semiconductor Co ltd
Priority to CN202211216188.4A
Publication of CN115423084A
Priority to US18/158,711 (US20240126716A1)
Priority to EP23165020.1A (EP4345638A1)
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The present disclosure provides a systolic array, a systolic array system, and an operation method, apparatus, and storage medium thereof. The working mode indicated by a received work instruction is determined. When the working mode is a sorting mode, the sorting control signal sent by the array controller assigns different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array; the feature data in the feature buffer and the corresponding tag data are then fed into the systolic array group by group for a synchronous sorting operation. After sorting is finished, the sorted feature data and the corresponding synchronous tag data are output through the output buffer and returned to the system bus. Complex sorting operators, including tracking where each sorted value came from, can thus be computed directly on the systolic array, which improves the utilization of the systolic array and avoids the transmission bandwidth wasted by moving the data elsewhere.

Description

Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to a systolic array, a systolic array system, and an operation method, operation apparatus, and storage medium thereof.
Background
With the rapid development of science and technology, convolutional neural networks (CNNs) have become the neural network structure designed for processing image data in the field of artificial intelligence. Because the convolution operation gives them translation invariance and locality mathematically, CNNs can extract image features effectively, and they are therefore widely used in computer vision tasks such as image classification and object detection.
The high performance of convolutional neural networks on image-processing problems must be supported by an enormous amount of computation. In a typical convolutional neural network, sorting operations account for a large share of the computation in addition to the convolution operations. Sorting is common in object detection algorithms, and as these algorithms have evolved, many detection scenarios need to know the position in the original data of each sorted value; for example, when identical values occur, they must be distinguished by their original positions. As another example, the non-maximum suppression (NMS) operator in object detection sorts data by one specific item and keeps the remaining items of information for subsequent operations.
In the prior art, there are two solutions for such special sorting operators: one adds a dedicated heap sort hardware unit to process sorting operators efficiently, and the other supports sorting operators with an internal vector processing unit. An extra hardware unit adds unnecessary area. Moreover, the data to be sorted is generally produced by the accelerator's internal systolic array, so both the extra hardware and the vector unit require moving the data out of the systolic array for external processing. The bandwidth wasted by this data movement therefore remains, the tracking of sorted data cannot be implemented directly on the existing systolic array, and the utilization of the existing systolic array stays low.
Disclosure of Invention
The present disclosure provides a systolic array, a systolic array system, and methods, apparatuses, and storage media for computing the systolic array system, so as to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a systolic array comprising:
a plurality of first basic operation units and a plurality of second basic operation units, arranged in a matrix with rows of first units paired with corresponding rows of second units, and used to complete a synchronous sorting operation on tagged feature data in a sorting mode; wherein:
each first basic operation unit comprises a first comparator, a first control register, and a first result buffer, wherein the first comparator is used to compare input feature data; the first control register is used to output the comparison sign result, as a synchronous control signal, to the second basic operation unit corresponding to the current first basic operation unit, to output the comparison data result to the first result buffer and to the first feature input register of the next basic operation unit, respectively, and, after sorting is finished, to output the temporary feature data finally stored in the first result buffer as the sorted feature data;
each second basic operation unit comprises a second comparator, a second control register, a second result buffer, and a synchronous weight input register, wherein the second comparator is used to compare input tag data according to the synchronous control signal received by the synchronous weight input register; and the second control register is used to output the comparison tag result to the second result buffer and to the second feature input register of the next basic operation unit, respectively, and, after sorting is finished, to output the temporary tag data finally stored in the second result buffer synchronously, as the synchronous tag data of the sorted feature data.
In an embodiment, each first basic operation unit includes a first feature input register for storing feature data, and the first result buffer is used to temporarily store temporary feature data; each second basic operation unit includes a second feature input register for storing tag data, and the second result buffer is used to temporarily store temporary tag data;
correspondingly, the first control register is further configured to input the feature data of the first feature input register and the temporary feature data of the first result buffer into the first comparator in succession; the second control register is further configured to input the tag data of the second feature input register and the temporary tag data of the second result buffer into the second comparator in succession.
In an embodiment, the first comparator is specifically configured to:
compare, in succession, the feature data input by the first feature input register with the temporary feature data input by the first result buffer; and, according to a preset sorting rule, take the feature data meeting a first sorting condition as new temporary feature data, and take the feature data meeting a second sorting condition as the feature data for the first feature input register of the next basic operation unit;
correspondingly, the first control register is further specifically configured to output the new temporary feature data to the first result buffer, and to output the feature data meeting the second sorting condition to the first feature input register of the next basic operation unit.
In an embodiment, the second comparator is specifically configured to:
according to the synchronous control signal received by the synchronous weight input register, compare the tag data input by the second feature input register with the temporary tag data input by the second result buffer, take the tag data meeting the first sorting condition as new temporary tag data, and take the tag data meeting the second sorting condition as the tag data for the second feature input register of the next basic operation unit;
correspondingly, the second control register is further specifically configured to synchronously output the new temporary tag data to the second result buffer, and to synchronously output the tag data meeting the second sorting condition to the second feature input register of the next basic operation unit.
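To make the interplay between the two units concrete, the following Python sketch models one first basic operation unit and its paired second basic operation unit. It is a hypothetical behavioural model, not the patented circuit: the class names `FeatureCell` and `TagCell`, the use of `None` for an empty result buffer, and the choice of "larger value stays" as the first sorting condition are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FeatureCell:
    """Hypothetical model of a first basic operation unit in sorting mode."""
    temp: Optional[float] = None  # first result buffer (temporary feature data)

    def step(self, x: float) -> Tuple[bool, Optional[float]]:
        """Compare input feature x with the buffered value.

        Returns (sign, passed): sign is True when x meets the first sorting
        condition (assumed: "larger") and is kept; passed is the value sent to
        the next unit's first feature input register (None if nothing).
        """
        if self.temp is None:          # empty buffer: keep x, pass nothing on
            self.temp = x
            return True, None
        if x > self.temp:              # x stays, the old value moves on
            self.temp, passed = x, self.temp
            return True, passed
        return False, x                # x moves on unchanged


@dataclass
class TagCell:
    """Hypothetical model of the paired second basic operation unit."""
    temp: Optional[int] = None  # second result buffer (temporary tag data)

    def step(self, tag: int, sign: bool) -> Optional[int]:
        # No real comparison: the sign from the paired FeatureCell decides.
        if sign:
            self.temp, passed = tag, self.temp
            return passed
        return tag


# Usage: feed (score, position index) pairs into one unit pair.
f, t = FeatureCell(), TagCell()
outs = []
for score, idx in [(0.3, 0), (0.9, 1), (0.5, 2)]:
    sign, out_f = f.step(score)
    out_t = t.step(idx, sign)
    outs.append((sign, out_f, out_t))
```

`TagCell.step` never inspects the tag values; it only obeys `sign`, which plays the role of the synchronous control signal held in the synchronous weight input register.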
In one embodiment, the systolic array further comprises:
the plurality of first basic operation units and the plurality of second basic operation units serve as convolution basic operation units of the same kind, each of which comprises a convolution weight input register, a convolution feature input register, a convolution result buffer, a convolution control register, and a multiplier-adder; wherein:
the convolution weight input register is used for storing convolution weight data;
the convolution characteristic input register is used for storing convolution characteristic data;
the convolution result buffer is used for temporarily storing convolution temporary data;
the multiplier-adder is used to multiply, in succession, the convolution feature data input by the convolution feature input register by the convolution weight data input by the convolution weight input register, add the product to the temporary convolution data held in the convolution result buffer as the accumulated addend, and take the result as new temporary convolution data;
the convolution control register is used to input the convolution weight data of the convolution weight input register, the convolution feature data of the convolution feature input register, and the temporary convolution data held in the convolution result buffer into the multiplier-adder; to pass the convolution feature data and the convolution weight data to the convolution feature input register and the convolution weight input register of the next convolution basic operation unit, respectively, after the current calculation cycle ends; and, after the convolution operation is finished, to output the temporary convolution data finally stored in the convolution result buffer as the convolution data result.
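A minimal sketch of one convolution basic operation unit, assuming scalar operands and a plain Python float as the convolution result buffer; `ConvCell` and `step` are hypothetical names. Each call stands for one calculation cycle: a multiply-add into the buffered partial sum, after which both operands are handed on to the next unit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ConvCell:
    """Hypothetical model of a convolution basic operation unit."""
    acc: float = 0.0  # convolution result buffer (temporary convolution data)

    def step(self, feature: float, weight: float) -> Tuple[float, float]:
        """One calculation cycle of the multiplier-adder."""
        # The buffered partial sum is the accumulated addend.
        self.acc += feature * weight
        # Both operands are forwarded to the next unit's input registers.
        return feature, weight


cell = ConvCell()
for x, w in zip([1, 2, 3], [4, 5, 6]):
    cell.step(x, w)
# cell.acc now holds 1*4 + 2*5 + 3*6
```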
According to a second aspect of the present disclosure, there is provided a systolic array system, comprising the systolic array, a system bus, an array controller, a feature buffer, and an output buffer, which are used to complete the synchronous sorting operation on tagged feature data in a sorting mode; wherein:
the system bus is connected to the array controller, the feature buffer, and the output buffer, and is used to send a sorting control instruction to the array controller and, after sorting is finished, to receive the sorted feature data and the corresponding synchronous tag data uploaded by the output buffer;
the array controller is connected to the feature buffer, the systolic array, and the output buffer, and is used to: send a sorting control signal according to the sorting control instruction, the signal assigning different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array; input the feature data and the corresponding tag data into the feature buffer; feed the feature data in the feature buffer and the corresponding tag data into the systolic array group by group for the synchronous sorting operation; and, after sorting is finished, output the sorted feature data and the corresponding synchronous tag data to the output buffer. The feature data are a plurality of candidate detection box scores generated by a neural network model, and the tag data are the position index information corresponding to those scores.
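The sorting flow of the system can be approximated functionally as follows. This sketch collapses timing (the cycle-by-cycle pipelining of a real systolic array is not modelled) and assumes one feature row paired with one tag row, one (score, index) pair fed per step, and at least as many units as scores in a group; `systolic_tagged_sort` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def systolic_tagged_sort(scores: List[float],
                         num_cells: Optional[int] = None) -> Tuple[list, list]:
    """Sort candidate-box scores in descending order, carrying each score's
    original position index as its tag.

    Assumptions (not specified by the source): one feature row paired with one
    tag row of num_cells units, one (score, index) pair fed in per step, and
    num_cells >= len(scores).
    """
    n = len(scores) if num_cells is None else num_cells
    if n < len(scores):
        raise ValueError("not enough units for this group of scores")
    feat_buf: List[Optional[float]] = [None] * n  # first result buffers
    tag_buf: List[Optional[int]] = [None] * n     # second result buffers
    # Feature buffer -> array: the tag of each score is its position index.
    for idx, score in enumerate(scores):
        x: Optional[float] = score
        t: Optional[int] = idx
        for k in range(n):
            if feat_buf[k] is None or x > feat_buf[k]:
                # Comparison sign "keep incoming": the tag unit swaps in lockstep.
                feat_buf[k], x = x, feat_buf[k]
                tag_buf[k], t = t, tag_buf[k]
            if x is None:  # the displaced value was an empty slot: ripple ends
                break
    # Array -> output buffer: read the result buffers left to right.
    m = len(scores)
    return feat_buf[:m], tag_buf[:m]


sorted_scores, box_indices = systolic_tagged_sort([0.6, 0.9, 0.3, 0.75])
```

In an NMS-style pipeline, the returned indices could then be used to look up the box coordinates belonging to each sorted score.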
In one embodiment, the systolic array system further comprises:
a weight buffer, connected to the system bus, the array controller, and the systolic array, and used for the convolution operation in a convolution mode;
correspondingly, the system bus is further used to send a convolution control instruction to the array controller and, after the convolution operation is finished, to receive the convolution data result uploaded by the output buffer;
the array controller is further configured to: send a convolution control signal according to the convolution control instruction, the signal assigning a convolution configuration value to the control register of each basic operation unit in the systolic array; feed the convolution feature data in the feature buffer and the convolution weight data in the weight buffer into the array one by one, in a preset order, as two corresponding input streams for convolution calculation; and, after the convolution operation is finished, transmit the convolution data result to the output buffer. The convolution weight data are convolution window data arranged in a first preset format, and the convolution feature data are image data arranged in a second preset format.
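The two "preset formats" and the exact data flow of the convolution mode are not specified in detail. Assuming an im2col-style layout (each row of the feature matrix is one flattened window of the image, the weights form a column vector) and an output-stationary array in which features flow rightward and weights flow downward, the convolution mode reduces to a matrix product that can be simulated cycle by cycle as below. `im2col` and `systolic_matmul` are hypothetical helpers; a single channel, stride 1, and no padding are assumed, and, as is usual for CNNs, the kernel is not flipped.

```python
from typing import List

Matrix = List[List[float]]


def im2col(image: Matrix, kh: int, kw: int) -> Matrix:
    """Assumed 'second preset format': one flattened kh x kw window per row
    (single channel, stride 1, no padding)."""
    h, w = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            for i in range(h - kh + 1) for j in range(w - kw + 1)]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Cycle-level sketch of an output-stationary M x N array computing a @ b.

    Row i of `a` (feature stream) enters from the left delayed by i cycles and
    column j of `b` (weight stream) enters from the top delayed by j cycles,
    so unit (i, j) sees a[i][k] and b[k][j] together at cycle i + j + k.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    assert all(len(row) == k_dim for row in a), "inner dimensions must match"
    acc = [[0.0] * n for _ in range(m)]    # convolution result buffers
    a_reg = [[0.0] * n for _ in range(m)]  # convolution feature input registers
    b_reg = [[0.0] * n for _ in range(m)]  # convolution weight input registers
    for t in range(m + n + k_dim - 2):
        # Operands advance one unit per cycle: features right, weights down.
        for i in range(m):
            k = t - i
            a_reg[i] = [a[i][k] if 0 <= k < k_dim else 0.0] + a_reg[i][:-1]
        b_reg = [[b[t - j][j] if 0 <= t - j < k_dim else 0.0
                  for j in range(n)]] + b_reg[:-1]
        # Every unit performs one multiply-add on the operands it now holds.
        for i in range(m):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


image = [[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]]
kernel = [[1, 0], [0, -1]]
weights = [[wt] for row in kernel for wt in row]  # assumed 'first preset format': K x 1
out = systolic_matmul(im2col(image, 2, 2), weights)  # 9 x 1, one row per output pixel
```

The output-stationary choice matches the description of the convolution basic operation unit, whose result buffer accumulates in place while feature and weight data move on to the next unit.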
According to a third aspect of the present disclosure, there is provided an operation method of a systolic array system, applied to the systolic array system, including:
determining a working mode indicated by a working instruction according to the received working instruction;
when the working mode is a sorting mode, after the sorting control signal sent by the array controller has assigned different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array, feeding the feature data of the feature buffer and the corresponding tag data into the systolic array group by group for the synchronous sorting operation, and, after sorting is finished, outputting the sorted feature data and the corresponding synchronous tag data through the output buffer and returning them to the system bus; wherein the first basic operation units and the second basic operation units in the systolic array comprise comparators, the feature data of the feature buffer are a plurality of candidate detection box scores generated by a neural network model, and the tag data are the position index information corresponding to those scores.
In an embodiment, determining, according to the received work instruction, the working mode indicated by the work instruction includes:
sending, by the system bus, a sorting control instruction to the array controller according to the received work instruction, and determining the sorting control signal of the array controller according to the sorting control instruction.
In one embodiment, the operation method of the systolic array system further includes:
when the working mode is a convolution mode, after the convolution control signal sent by the array controller has assigned a convolution configuration value to the control register of each basic operation unit in the systolic array, feeding the convolution feature data of the feature buffer and the convolution weight data of the weight buffer into the systolic array one by one, in a preset order, as two corresponding input streams for convolution calculation, and, after the convolution operation is finished, outputting the convolution data result through the output buffer and returning it to the system bus; wherein the convolution weight data of the weight buffer are convolution window data arranged in a first preset format, and the convolution feature data of the feature buffer are image data arranged in a second preset format.
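How a work instruction might be decoded into per-unit control-register values is sketched below. The instruction format (`{"op": ...}`), the numeric configuration codes, and the alternating feature-row/tag-row layout (extrapolated from the FIG. 1A example of a first-unit row followed by a second-unit row) are all hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    SORT = "sort"
    CONV = "conv"


# Hypothetical configuration values for a basic operation unit's control register.
CFG_SORT_FEATURE = 1  # first basic operation unit: compares feature data
CFG_SORT_TAG = 2      # second basic operation unit: follows the synchronous sign
CFG_CONV_MAC = 3      # convolution basic operation unit: multiply-add


def decode_mode(instruction: Dict[str, str]) -> Mode:
    """Determine the working mode from a (hypothetical) work instruction."""
    return Mode(instruction["op"])


def control_register_config(mode: Mode, rows: int, cols: int) -> List[List[int]]:
    """Configuration value for every unit of a rows x cols array."""
    if mode is Mode.CONV:
        return [[CFG_CONV_MAC] * cols for _ in range(rows)]
    # Sorting mode (assumed layout): even rows compare features, and the row
    # below each one shadows it with tag data.
    return [[CFG_SORT_FEATURE if r % 2 == 0 else CFG_SORT_TAG] * cols
            for r in range(rows)]


cfg = control_register_config(decode_mode({"op": "sort"}), rows=4, cols=3)
```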
According to a fourth aspect of the present disclosure, there is provided an operation apparatus of a systolic array system, the apparatus including:
a mode determining module, configured to determine, according to a received work instruction, the working mode indicated by the work instruction;
and a sorting result output module, configured to: when the working mode is the sorting mode, after the sorting control signal sent by the array controller has assigned different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array, feed the feature data of the feature buffer and the corresponding tag data into the systolic array group by group for the synchronous sorting operation; and, after sorting is finished, output the sorted feature data and the corresponding synchronous tag data through the output buffer and return them to the system bus; wherein the first basic operation units and the second basic operation units in the systolic array comprise comparators, the feature data of the feature buffer are a plurality of candidate detection box scores generated by a neural network model, and the tag data are the position index information corresponding to those scores.
In an implementation, the mode determining module is specifically configured to:
send, through the system bus, a sorting control instruction to the array controller according to the received work instruction, and determine the sorting control signal of the array controller according to the sorting control instruction.
In one embodiment, the operation apparatus of the systolic array system further includes:
a convolution result output module, configured to: when the working mode is the convolution mode, after the convolution control signal sent by the array controller has assigned a convolution configuration value to the control register of each basic operation unit in the systolic array, feed the convolution feature data of the feature buffer and the convolution weight data of the weight buffer into the systolic array one by one, in a preset order, as two corresponding input streams for convolution calculation; and, after the convolution calculation is finished, output the convolution data result through the output buffer and return it to the system bus; wherein the convolution weight data of the weight buffer are convolution window data arranged in a first preset format, and the convolution feature data of the feature buffer are image data arranged in a second preset format.
According to a fifth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the present disclosure.
According to the systolic array, the systolic array system, and the operation method, apparatus, and storage medium thereof provided by the present disclosure, the working mode indicated by a received work instruction is determined. When the working mode is the sorting mode, after the sorting control signal sent by the array controller has assigned different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array, the feature data of the feature buffer and the corresponding tag data are fed into the systolic array group by group for the synchronous sorting operation, and after sorting is finished, the sorted feature data and the corresponding synchronous tag data are output through the output buffer and returned to the system bus.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1A is a schematic diagram of a systolic array structure according to the first embodiment of the present disclosure;
FIG. 1B is a schematic diagram of a basic operation unit of the systolic array in the sorting mode according to an embodiment of the present disclosure;
FIG. 1C is a schematic diagram of a basic operation unit of the systolic array in the convolution mode according to an embodiment of the present disclosure;
FIG. 2A is a schematic structural diagram of a systolic array system according to the second embodiment of the present disclosure;
FIG. 2B is a schematic diagram of the sorting process for four pieces of tagged feature data according to the second embodiment of the present disclosure;
FIG. 3 is a flowchart of an operation method of a systolic array system according to the third embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an operation apparatus of a systolic array system according to the fourth embodiment of the present disclosure.
Detailed Description
In order to make the objects, features and advantages of the present disclosure more apparent and understandable, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Example one
FIG. 1A is a schematic diagram of a systolic array structure provided in an embodiment of the present disclosure. The systolic array includes a plurality of first basic operation units and a plurality of second basic operation units, arranged in a matrix with rows of first units paired with corresponding rows of second units, and used to complete a synchronous sorting operation on tagged feature data in the sorting mode. As shown in FIG. 1A, this embodiment consists of M × N configurable basic operation units.
A basic operation unit is the smallest constituent unit of the systolic array. A first basic operation unit may be an operation unit that sorts the input feature data, and a second basic operation unit may be an operation unit that sorts the tags corresponding to the input feature data so that they follow the feature data.
Specifically, the systolic array provided in this embodiment may include two types of basic operation units: first basic operation units and second basic operation units. They may be arranged in a matrix with corresponding rows; as shown in FIG. 1A, for example, the first row may consist of first basic operation units and the second row of second basic operation units.
FIG. 1B is a schematic diagram of a basic operation unit of the systolic array in the sorting mode according to an embodiment of the present disclosure. As shown in FIG. 1B, in the sorting mode the weight input register and the multiplier-adder do not participate in the operation. The first control register is connected to the first feature input register, the first result buffer, and the first comparator, and the second control register is connected to the second feature input register, the synchronous weight input register, the second result buffer, and the second comparator; these connections are omitted from the figure. Wherein:
each first basic operation unit comprises a first comparator, a first control register, and a first result buffer, wherein the first comparator is used to compare input feature data; the first control register is used to control the output of the comparison sign result, as a synchronous control signal, to the second basic operation unit corresponding to the current first basic operation unit, the output of the comparison data result to the first result buffer and to the first feature input register of the next basic operation unit, respectively, and, after sorting is finished, the output of the temporary feature data finally stored in the first result buffer as the sorted feature data;
each second basic operation unit comprises a second comparator, a second control register, a second result buffer, and a synchronous weight input register, wherein the second comparator is used to compare input tag data according to the synchronous control signal received by the synchronous weight input register; and the second control register is used to control the output of the comparison tag result to the second result buffer and to the second feature input register of the next basic operation unit, respectively, and, after sorting is finished, the synchronous output of the temporary tag data finally stored in the second result buffer as the synchronous tag data of the sorted feature data.
In this embodiment of the present disclosure, each first basic operation unit includes a first feature input register for storing feature data and a first result buffer for temporarily storing temporary feature data; each second basic operation unit includes a second feature input register for storing tag data and a second result buffer for temporarily storing temporary tag data;
correspondingly, the first control register is further used to input the feature data of the first feature input register and the temporary feature data of the first result buffer into the first comparator in succession; and the second control register is further used to input the tag data of the second feature input register and the temporary tag data of the second result buffer into the second comparator in succession.
The first comparator may be an electronic component that compares the magnitudes of the two feature values at its inputs and outputs the comparison result at its output. A control register may be a memory that stores the execution commands of the different working modes; for example, convolution-related instructions are executed in the convolution mode and sorting-related instructions in the sorting mode, and the first control register may execute sorting-related instructions in the sorting mode. The first result buffer may be a memory that temporarily stores the feature data satisfying the comparison condition, and the first feature input register may be a register that stores feature data. The sorted feature data are the feature data arranged according to a given sorting rule.
The second comparator may be an electronic component that directly determines, according to the synchronous control signal, the comparison relationship between the two pieces of tag data at its inputs and outputs the comparison result at its output. In other words, the second comparator does not actually perform a comparison: it simply takes the comparison sign carried by the synchronous control signal as its comparison result, so that the two pieces of tag data move in step with the corresponding feature data. The second control register may execute instructions related to the tag sorting operation; the second result buffer may be a memory that temporarily stores the tag data satisfying the comparison condition; the second feature input register may be a register that stores tag data; and the synchronous weight input register may act as a "bridge" between the first basic operation unit and the second basic operation unit, storing the comparison sign produced by the first comparator of the first basic operation unit. The synchronous tag data of the sorted feature data are the tag data corresponding to the sorted feature data in the first basic operation unit. In the image processing field, the feature data may be a plurality of candidate detection box scores generated by a neural network model, and the tag data may be the position index information corresponding to those scores.
In an embodiment of the present disclosure, the first comparator is specifically configured to: successively compare the feature data input from the first feature input register with the temporary feature data input from the first result buffer; and, according to a preset sorting rule, take the feature data satisfying the first sorting condition as new temporary feature data and the feature data satisfying the second sorting condition as the feature data for the first feature input register of the next basic operation unit;
correspondingly, the first control register is further specifically configured to: output the new temporary feature data to the first result buffer, and output the feature data satisfying the second sorting condition to the first feature input register of the next basic operation unit.
In an embodiment of the present disclosure, the second comparator is specifically configured to: according to the synchronization control signal received by the synchronization weight input register, compare the tag data input from the second feature input register with the temporary tag data input from the second result buffer, take the tag data satisfying the first sorting condition as new temporary tag data, and take the tag data satisfying the second sorting condition as the tag data for the second feature input register of the next basic operation unit;
correspondingly, the second control register is further specifically configured to: synchronously output the new temporary tag data to the second result buffer, and synchronously output the tag data satisfying the second sorting condition to the second feature input register of the next basic operation unit.
The first sorting condition, the second sorting condition and the preset sorting rule may be set according to actual requirements. For example, the first sorting condition may select the larger datum of a comparison, the second sorting condition may select the smaller datum, and the preset sorting rule may require data to be sorted from largest to smallest. That is, the feature data with the larger comparison result are input into the first result buffer, and the tag data with the larger comparison result are input into the second result buffer.
Specifically, this embodiment supports a sorting operator with tags, which may also be called a shadow mode. The first basic operation unit compares the magnitudes of the input feature data. The second basic operation unit does not actually compare the magnitudes of the tag data. Instead, it uses the comparison sign received from the first basic operation unit as the reference for moving its tag data along with the feature data of the first basic operation unit. After sorting is finished, the temporary feature data finally stored in the first result buffer are output as the sorted feature data. The temporary tag data finally stored in the second result buffer are output synchronously as the synchronous tag data of the sorted feature data.
For example, consider the current first basic operation unit and the current second basic operation unit. By default, the first result buffer and the second result buffer may directly store the first datum they receive. The data are input in the following order:

- First, feature data A with its corresponding tag data a. These are the first data input into the systolic array.
- Second, feature data B with its corresponding tag data b.
- Third, feature data C with its corresponding tag data c.

The preset sorting rule is as follows. The feature data with the larger comparison result are input into the first result buffer, and the tag data with the larger comparison result are input into the second result buffer. The feature data and tag data with the smaller comparison result are input into the next basic operation unit. The first sorting condition is the larger datum in the comparison result, and the second sorting condition is the smaller datum in the comparison result.
For example, in the first basic operation unit, the first control register by default first stores the feature data A from the first feature input register into the first result buffer, and outputs no synchronization control signal. When the second feature data B are input into the first comparator, the A in the first result buffer is also input into the first comparator, which compares B with A. Suppose the result is "A > B", so the comparison sign is ">". The feature data A satisfy the first sorting condition and become the new temporary feature data. The feature data B satisfy the second sorting condition and become the feature data for the first feature input register of the next basic operation unit. The first control register then does three things: it outputs A to the first result buffer, it outputs B to the first feature input register of the next basic operation unit, and it outputs the comparison sign ">" as a synchronization control signal to the second basic operation unit corresponding to the current first basic operation unit. Next, in this embodiment, the third feature data C and the feature data A held in the result buffer are input into the first comparator and compared in the same way as described above. The next basic operation unit performs the same kind of comparison on the feature data it receives in sequence, which is not described again.
Illustratively, in the second basic operation unit, the second control register by default stores the tag data a from the second feature input register into the second result buffer at the same time. When the second tag data b are input into the second comparator, the tag data a in the second result buffer are also input into the second comparator. The second comparator does not compare a and b itself. It takes the synchronization control signal ">" received by the synchronization weight input register as its result, "a > b". The tag data a satisfy the first sorting condition and become the new temporary tag data. The tag data b satisfy the second sorting condition and become the tag data for the second feature input register of the next basic operation unit. The second control register outputs a to the second result buffer and b to the second feature input register of the next basic operation unit. As a result, feature data A and tag data a are output synchronously, and so are feature data B and tag data b. Next, in this embodiment, the third tag data c and the tag data a held in the result buffer are input into the second comparator and processed in the same way as described above. The next basic operation unit performs the same kind of operation on the tag data it receives in sequence, which is not described again.
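The A/a, B/b, C/c walkthrough above can be traced with a minimal Python sketch of one first/second unit pair. The class and function names are hypothetical, and the numeric values (A = 9, B = 5, C = 7) are illustrative assumptions. The sketch also assumes that a tie keeps the buffered value. The second unit never compares tags; it only follows the sign emitted by the first unit.

```python
from typing import List, Optional, Tuple


class FirstUnit:
    """First basic operation unit in sorting mode (keeps the larger value)."""

    def __init__(self) -> None:
        self.result: Optional[float] = None  # first result buffer

    def step(self, feature: float) -> Tuple[Optional[float], Optional[str]]:
        """Return (feature passed to the next unit, synchronization control signal)."""
        if self.result is None:
            self.result = feature  # first datum: store directly, no sync signal
            return None, None
        if self.result >= feature:  # buffered value wins (tie handling is an assumption)
            return feature, ">"
        passed, self.result = self.result, feature
        return passed, "<"


class SecondUnit:
    """Second basic operation unit in shadow mode: never compares tags itself."""

    def __init__(self) -> None:
        self.result: Optional[str] = None  # second result buffer

    def step(self, tag: str, sign: Optional[str]) -> Optional[str]:
        """Move tags according to the sign received from the first unit."""
        if self.result is None:
            self.result = tag
            return None
        if sign == ">":
            return tag  # buffered tag stays, incoming tag moves on
        passed, self.result = self.result, tag
        return passed


def run_pair(stream: List[Tuple[float, str]]):
    """Feed (feature, tag) pairs through one unit pair; return buffers and outputs."""
    first, second = FirstUnit(), SecondUnit()
    passed = []
    for feature, tag in stream:
        out_feature, sign = first.step(feature)
        out_tag = second.step(tag, sign)
        passed.append((out_feature, out_tag, sign))
    return first.result, second.result, passed


kept_f, kept_t, passed = run_pair([(9, "a"), (5, "b"), (7, "c")])
print(kept_f, kept_t)  # 9 a
print(passed)          # [(None, None, None), (5, 'b', '>'), (7, 'c', '>')]
```

B/b and then C/c leave the pair together, in the same order as in the walkthrough. A and a stay in the result buffers.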
In this embodiment, comparators are added to a conventional systolic array, and its basic operation units are divided into two classes. First basic operation units sort the feature data, and second basic operation units sort the tag data corresponding to the feature data. The first control register and the second control register are configured with the instructions they execute in the sorting operation. A shadow mode is enabled, in which control signals are transmitted over the weight input/output lines of the systolic array and the data flow of each second basic operation unit is controlled by the operation result of the adjacent unit. This makes it possible to implement a complex synchronous sorting function with tag data without moving the data to another device for sorting. It saves computation time and bandwidth, raises the utilization of the systolic array in multilayer network computation and complex operators, and reduces the bandwidth pressure on the external system bus and the need for additional external computing power.
In an embodiment of the present disclosure, in the systolic array, the plurality of first basic operation units and the plurality of second basic operation units are further all classified as convolution basic operation units. Each convolution basic operation unit comprises a convolution weight input register, a convolution feature input register, a convolution result buffer, a convolution control register and a multiplier-adder, wherein:
the convolution weight input register is used to store convolution weight data;
the convolution feature input register is used to store convolution feature data;
the convolution result buffer is used to temporarily store convolution temporary data;
the multiplier-adder is used to take the convolution temporary data held in the convolution result buffer as the accumulated addend, successively multiply the convolution feature data input from the convolution feature input register by the convolution weight data input from the convolution weight input register, add the product to the addend, and take the result as new convolution temporary data; and
the convolution control register is used to control the input of the convolution weight data in the convolution weight input register, the convolution feature data in the convolution feature input register and the convolution temporary data in the convolution result buffer into the multiplier-adder. After the current calculation cycle is finished, it transmits the convolution feature data and the convolution weight data to the convolution feature input register and the convolution weight input register of the next convolution basic operation unit, respectively. After the convolution operation is finished, it outputs the convolution temporary data finally stored in the convolution result buffer as the convolution data result.
Fig. 1C is a schematic structural diagram of a basic operation unit of the systolic array in the convolution mode according to an embodiment of the present disclosure. As shown in Fig. 1C, in the convolution mode each basic operation unit further includes a weight input register and a multiplier-adder, and the comparator does not participate in the operation. The convolution control register is connected to the convolution feature input register, the convolution weight input register, the convolution result buffer and the multiplier-adder; these connections are omitted from the drawing.
The convolution weight input register is a register that stores convolution weight data. In the image processing field, the convolution weight data may be convolution window (kernel) data; in other fields, they may be any data that need to be convolved.
The convolution feature data may be one or more data stored in the convolution feature input register in the convolution mode. In the image processing field, the convolution feature data may be image data; in other fields, they may be any data to be convolved. The convolution temporary data are the accumulated addend carried over from the previous computation cycle and used in each new cycle. The convolution data result may be data with new features, composed of the convolution temporary data in the convolution result buffers of a plurality of basic operation units.
Specifically, the basic operation unit in this embodiment can also implement the basic convolution function. Two series of data to be convolved are stored in the convolution weight input register and the convolution feature input register, respectively. They are then input into the multiplier-adder for the convolution operation, producing data with new features.
For example, take the convolution operation of the current convolution basic operation unit. The convolution control register issues a control signal corresponding to the convolution mode. It controls the convolution weight data in the convolution weight input register, the convolution feature data in the convolution feature input register and the convolution temporary data in the convolution result buffer to be input into the multiplier-adder. In each calculation cycle, the convolution temporary data in the convolution result buffer serve as the accumulated addend. The product of the convolution feature data and the convolution weight data is computed and added to that addend, and the result is stored as the new convolution temporary data. After the current calculation cycle is completed, the convolution feature data and the convolution weight data are transmitted to the convolution feature input register and the convolution weight input register of the next convolution basic operation unit, respectively. That unit performs the same convolution operation on the newly input convolution feature data and convolution weight data.
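The multiply-accumulate step and the forwarding of feature and weight data to the next unit can be illustrated with a cycle-level Python sketch. It assumes a common output-stationary arrangement, which is not stated in the text above. Feature data flow left to right and weight data flow top to bottom, and each row or column is skewed by one cycle. Each processing element keeps its running sum as the convolution temporary data. The function name `systolic_matmul` and the skew scheme are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def systolic_matmul(features: Matrix, weights: Matrix) -> Matrix:
    """Output-stationary sketch: PE(i, j) accumulates sum_s features[i][s] * weights[s][j]."""
    n, k = len(features), len(features[0])
    m = len(weights[0])
    acc = [[0.0] * m for _ in range(n)]  # per-PE convolution result buffers
    feat: List[List[Optional[float]]] = [[None] * m for _ in range(n)]
    wgt: List[List[Optional[float]]] = [[None] * m for _ in range(n)]
    for t in range(n + m + k - 2):
        new_feat: List[List[Optional[float]]] = [[None] * m for _ in range(n)]
        new_wgt: List[List[Optional[float]]] = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Row i of features enters column 0 delayed by i cycles;
                # otherwise take the value forwarded by the left neighbour.
                s = t - i
                new_feat[i][j] = (features[i][s] if 0 <= s < k else None) if j == 0 else feat[i][j - 1]
                # Column j of weights enters row 0 delayed by j cycles;
                # otherwise take the value forwarded by the upper neighbour.
                s = t - j
                new_wgt[i][j] = (weights[s][j] if 0 <= s < k else None) if i == 0 else wgt[i - 1][j]
        feat, wgt = new_feat, new_wgt
        # Multiply-accumulate in every PE that currently holds a valid pair.
        for i in range(n):
            for j in range(m):
                if feat[i][j] is not None and wgt[i][j] is not None:
                    acc[i][j] += feat[i][j] * wgt[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

Because of the skew, PE(i, j) sees `features[i][s]` and `weights[s][j]` in the same cycle, for the same index s. Its accumulator therefore ends up holding one dot product.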
By configuring the instructions of the convolution control register for the convolution mode, this embodiment can thus also implement the basic convolution function.
Example two
Fig. 2A is a schematic structural diagram of a systolic array system according to the second embodiment of the present disclosure. The system comprises a systolic array, a system bus, an array controller, a feature buffer and an output buffer, and the systolic array completes the sorting operation in a sorting mode, wherein:
the system bus is connected to the array controller, the feature buffer and the output buffer. It is used to send a sorting control instruction to the array controller and, after sorting is finished, to receive the sorted feature data and the corresponding synchronous tag data uploaded by the output buffer;
the array controller is connected to the feature buffer, the systolic array and the output buffer, and is used to:
- issue a sorting control signal according to the sorting control instruction;
- control the feature data and the corresponding tag data to be input into the feature buffer;
- after different configuration values have been assigned to the control registers of the first and second basic operation units in the systolic array, input the feature data and the corresponding tag data from the feature buffer into the systolic array group by group for the synchronous sorting operation;
- after sorting is finished, output the sorted feature data and the corresponding synchronous tag data to the output buffer.

The feature data are a plurality of candidate detection box scores generated by a neural network model, and the tag data are the position index information corresponding to those scores.
The system bus is the main interconnect of the computer system. It receives the working instructions and the data to be sorted sent by the central controller, and sends control instructions for different tasks to the array controller. When the task for the systolic array is a sorting task, the system bus sends a sorting control instruction to the array controller. The array controller decodes the received instruction and controls the storage and forwarding of data as well as the management of the whole array. The feature buffer stores the feature data to be sorted. The output buffer receives the sorted data results uploaded by the systolic array and transmits them back to the system bus.
The system bus obtains the sorting control instruction by parsing the working instruction, and sends it to the array controller. When the task received by the systolic array in this embodiment is a sorting operation, the array controller assigns a first configuration value to the first control register of each first basic operation unit, and a second configuration value to the second control register of each second basic operation unit. The sorting control signal is the sorting execution signal used for interaction among the devices of the array controller to carry out the sorting operation. The first configuration value corresponds to a first control signal, which causes the first basic operation unit to sort the feature data. The second configuration value corresponds to a second control signal, which causes the second basic operation unit to perform the synchronous follow-up sorting of the tag data.
Specifically, in this embodiment the sorting operation is completed in the sorting mode. The system bus sends a sorting control instruction to the array controller, and the array controller issues a sorting control signal. Different configuration values are then assigned to the control registers of the first and second basic operation units in the systolic array. After that, the feature data stored in the feature buffer and the corresponding tag data are input into the systolic array one by one, group by group, for the synchronous sorting operation. After sorting is finished, the systolic array outputs the sorted feature data and the corresponding synchronous tag data to the output buffer, which transmits them to the system bus.
Fig. 2B is a schematic diagram of sorting four feature data with tag data according to the second embodiment of the present disclosure. It belongs to the complex-tag-data scenario, that is, the input data bit width exceeds that of a single basic operation unit. As shown in Fig. 2B, this embodiment sorts four out-of-order data (4, 1, 8, 2) while retaining their position indices (index). The operation uses two rows of basic operation units:
- The 1st row works in the sorting mode. It sorts the feature data (4, 1, 8, 2) and outputs tag-related synchronization control signals.
- The 2nd row works in the shadow mode. It receives the position indices as tag data, in one-to-one correspondence with the feature data, and sorts (transposes) them along with the feature data.

After the sorting operation is completed, the original feature data are arranged in descending order in the first result buffers of the 1st-row basic operation units. The position of each feature datum in the original data, that is, its tag data, is arranged correspondingly in the second result buffers of the 2nd-row basic operation units. The sorted feature data and the synchronous tag data are finally output from the systolic array.
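The outcome expected from the Fig. 2B example can be checked against a plain software reference for a tagged descending sort, which is essentially an argsort. This is a hypothetical test oracle, not part of the disclosed hardware. The text does not say whether the figure counts positions from 0 or from 1; the sketch assumes 0-based indices.

```python
from typing import List, Tuple


def tagged_sort_reference(features: List[float]) -> Tuple[List[float], List[int]]:
    """Return the features in descending order plus their original 0-based positions."""
    order = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    return [features[i] for i in order], order


values, indices = tagged_sort_reference([4, 1, 8, 2])
print(values, indices)  # [8, 4, 2, 1] [2, 0, 3, 1]
```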
Specifically, in this embodiment the feature data with tag data are input from the feature buffer into the systolic array step by step, group by group. The feature data occupy the first row and are input one by one into the first basic operation units of the systolic array. The tag data occupy the second row and are input one by one into the second basic operation units. Each row of data enters the systolic array in sequence through the feature input register of the first-column basic operation unit.
In the sorting mode, the first basic operation unit operates as follows:
i) The weight input register and the multiplier-adder are disabled (bypass);
ii) When the first feature datum enters the unit, it is stored directly in the first result buffer, and no synchronization control signal is output;
iii) Each subsequent feature datum enters the first comparator from the first feature input register and is compared with the datum in the first result buffer. According to a preset sorting rule (for example, output the larger value or output the smaller value), the two values are routed to the first result buffer and to the feature output, respectively. The comparison sign is transmitted over the weight output line to the next computing unit as a synchronization control signal;
iv) After all data have been sorted, the feature data in the first result buffer are output.
In the shadow mode, the second basic operation unit operates as follows:
i) The multiplier-adder is disabled (bypass);
ii) When the first tag datum enters the basic operation unit, it is stored directly in the second result buffer, and no weight data are input or output;
iii) A subsequent tag datum enters the second comparator from the second feature input register. At the same time, the synchronization control signal entering the second comparator from the weight input register carries the comparison result of the first basic operation unit. According to that signal, the second control register routes the incoming tag datum and the temporary tag datum in the second result buffer to the second result buffer and to the feature output, respectively, completing the synchronous movement of the tags. If the basic operation unit below also works in the shadow mode, the control signal is further broadcast downward over the weight output line to that unit;
iv) After all feature data and tag data have been sorted, the tags in the second result buffer are output.
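The two operating modes above can be combined in a functional Python sketch of a one-dimensional chain. In the sketch, row 1 is made of sorting-mode units and row 2 of shadow-mode units. The sketch is not cycle-accurate. It pushes each (feature, tag) pair through the chain before the next one enters, and it assumes one unit per datum. This gives the same final buffers as pipelined operation because every unit handles its inputs in arrival order. The names `first_unit`, `shadow_unit` and `systolic_tagged_sort` are hypothetical. So is the boolean sync signal, which stands in for the transmitted comparison sign, and the rule that a tie keeps the buffered value.

```python
from typing import List, Optional, Tuple


def first_unit(buf: Optional[float], incoming: float, descending: bool
               ) -> Tuple[float, Optional[float], Optional[bool]]:
    """Row-1 unit (sorting mode). Returns (new buffer, value passed on, sync signal)."""
    if buf is None:
        return incoming, None, None  # ii) store directly, no sync signal
    keep = buf >= incoming if descending else buf <= incoming
    # iii) one value stays in the result buffer, the other goes to the feature output
    return (buf, incoming, True) if keep else (incoming, buf, False)


def shadow_unit(buf: Optional[int], incoming: int, sync: Optional[bool]
                ) -> Tuple[int, Optional[int]]:
    """Row-2 unit (shadow mode). Follows `sync`; never compares the tags."""
    if buf is None:
        return incoming, None  # ii) first tag is stored directly
    return (buf, incoming) if sync else (incoming, buf)


def systolic_tagged_sort(features: List[float], tags: List[int], descending: bool = True):
    n = len(features)
    feat_buf: List[Optional[float]] = [None] * n  # first result buffers (row 1)
    tag_buf: List[Optional[int]] = [None] * n     # second result buffers (row 2)
    for f, t in zip(features, tags):
        value: Optional[float] = f
        tag: Optional[int] = t
        col = 0
        while value is not None:  # the pair moves right until a unit absorbs it
            feat_buf[col], value, sync = first_unit(feat_buf[col], value, descending)
            tag_buf[col], tag = shadow_unit(tag_buf[col], tag, sync)
            col += 1
    return feat_buf, tag_buf  # iv) read out both rows of result buffers


print(systolic_tagged_sort([4, 1, 8, 2], [0, 1, 2, 3]))
# ([8, 4, 2, 1], [2, 0, 3, 1])
```

Passing `descending=False` corresponds to the alternative preset rule of keeping the smaller value.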
As shown in Fig. 2A, in an embodiment of the present disclosure, the systolic array system further includes a weight buffer connected to the system bus, the array controller and the systolic array, and the system is used to complete the convolution operation in a convolution mode;
correspondingly, the system bus is further used to send a convolution control instruction to the array controller and, after the convolution operation is finished, to receive the convolution data result uploaded by the output buffer;
the array controller is further used to:
- issue a convolution control signal according to the convolution control instruction;
- assign convolution configuration values to the control registers of all basic operation units in the systolic array;
- take the convolution feature data in the feature buffer and the convolution weight data in the weight buffer as two rows of corresponding data, and input them into the systolic array one by one in a preset sequence for convolution calculation;
- after the convolution operation is finished, transmit the convolution data result to the output buffer.

The convolution weight data are convolution window data arranged in a first preset format, and the convolution feature data are image data arranged in a second preset format.
The system bus receives the working instructions and the data requiring convolution sent by the central controller, and sends the convolution control instruction to the array controller. The preset sequence, the first preset format and the second preset format are all arrangements suited to the convolution operation. The convolution configuration value corresponds to a convolution control signal. It causes each convolution basic operation unit to convolve the convolution feature data with the convolution weight data. For example, in the field of image processing, the convolution weight data may be convolution window data arranged in the first preset format, and the convolution feature data may be image data arranged in the second preset format.
Specifically, in the convolution mode, the array controller in this embodiment is further configured to:
- issue a convolution control signal according to the convolution control instruction sent by the system bus;
- assign convolution configuration values to the convolution control registers of the convolution basic operation units in the systolic array;
- take the convolution feature data in the feature buffer and the convolution weight data in the weight buffer as two rows of corresponding data, and input them into the systolic array one by one in the preset sequence for convolution calculation.

After the convolution operation is completed, the systolic array transmits the convolution data result to the output buffer, which transmits it back to the system bus.
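The text does not specify the "first preset format" and the "second preset format". One common arrangement, assumed here and not confirmed by the source, is im2col. Each convolution window is flattened into one vector, and each image patch is flattened into another. Every output pixel then becomes a single dot product, which is the multiply-accumulate a chain of multiplier-adders performs. The Python sketch below assumes a single channel, stride 1 and no padding. It computes the CNN-style cross-correlation, so the kernel is not flipped. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def im2col(image: Matrix, kh: int, kw: int) -> List[List[float]]:
    """Flatten every kh x kw patch (stride 1, no padding) into one row, row-major."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(kh) for j in range(kw)]
        for r in range(rows - kh + 1)
        for c in range(cols - kw + 1)
    ]


def conv2d_via_dot_products(image: Matrix, kernel: Matrix) -> Matrix:
    """Each output is the dot product of the flattened window with one flattened patch."""
    kh, kw = len(kernel), len(kernel[0])
    window = [kernel[i][j] for i in range(kh) for j in range(kw)]  # flattened window data
    out_w = len(image[0]) - kw + 1
    flat = [sum(w * x for w, x in zip(window, patch)) for patch in im2col(image, kh, kw)]
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 2], [3, 4]]
print(conv2d_via_dot_products(image, kernel))  # [[37, 47], [67, 77]]
```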
The systolic array system provided by this embodiment enables the improved systolic array to sort tagged data synchronously, so that complex sorting operators with tracked tags are computed directly on the systolic array. This raises the utilization of the systolic array and avoids wasting transmission bandwidth on moving data.
Example three
Fig. 3 is a flowchart of an operation method of a systolic array system according to the third embodiment of the present disclosure. The method may be executed by an operation device of the systolic array system provided by an embodiment of the present disclosure, and the device may be implemented in software and/or hardware. The method specifically comprises the following steps:
S310: Determine, according to a received working instruction, the working mode indicated by the working instruction.
In an embodiment of the present disclosure, determining the working mode indicated by the working instruction includes two steps. The system bus sends a sorting control instruction to the array controller according to the received working instruction. The sorting control signal of the array controller is then determined according to the sorting control instruction.
The working instruction may be instruction information containing the task content. In this embodiment there are two working modes: a sorting mode and a convolution mode. In the sorting mode, the array controller sends sorting control signals to the other devices of the systolic array system according to the received sorting control instruction, completing the corresponding sorting operation. In the convolution mode, the array controller sends convolution control signals to the other devices of the systolic array system according to the received convolution control instruction, completing the corresponding convolution operation.
S320: When the working mode is the sorting mode, first assign different configuration values, according to the sorting control signal issued by the array controller, to the control registers of the first and second basic operation units in the systolic array. Then input the feature data of the feature buffer and the corresponding tag data into the systolic array step by step, group by group, for the synchronous sorting operation. After sorting is finished, output the sorted feature data and the corresponding synchronous tag data through the output buffer and transmit them back to the system bus.
The first basic operation units and the second basic operation units in the systolic array include comparators. The feature data of the feature buffer are a plurality of candidate detection box scores generated by a neural network model, and the tag data are the position index information corresponding to those scores.
In an embodiment of the present disclosure, the method further comprises the following. When the working mode is the convolution mode, convolution configuration values are first assigned, according to the convolution control signal issued by the array controller, to the control registers of all basic operation units in the systolic array. The convolution feature data of the feature buffer and the convolution weight data of the weight buffer are then input into the systolic array one by one, as two rows of corresponding data in a preset sequence, for convolution calculation. After the convolution operation is finished, the convolution data result is output through the output buffer and transmitted back to the system bus. The convolution weight data of the weight buffer are convolution window data arranged in a first preset format, and the convolution feature data of the feature buffer are image data arranged in a second preset format.
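Steps S310 and S320, together with the convolution branch, amount to decoding the working mode and then writing one configuration value into each unit's control register. The Python sketch below illustrates that dispatch. Everything in it is hypothetical: the instruction dictionary, the numeric configuration values, and the choice to leave unused rows idle. The source does not specify any of these. Allowing several tag rows follows the statement that the control signal can be broadcast further down to other shadow-mode units.

```python
from enum import Enum
from typing import Dict, List


class WorkMode(Enum):
    SORT = "sort"
    CONVOLUTION = "convolution"


# Hypothetical configuration values for the per-unit control registers.
IDLE_CONFIG = 0    # unused row (assumption)
SORT_CONFIG = 1    # first basic operation unit: compares feature data
SHADOW_CONFIG = 2  # second basic operation unit: follows the sync signal
CONV_CONFIG = 3    # convolution basic operation unit: multiply-accumulate


def determine_mode(instruction: Dict[str, str]) -> WorkMode:
    """S310: read the working mode from a (hypothetical) decoded working instruction."""
    return WorkMode(instruction["mode"])


def configure_rows(mode: WorkMode, num_rows: int, tag_rows: int = 1) -> List[int]:
    """Configuration value written to the control registers of each row of units."""
    if mode is WorkMode.CONVOLUTION:
        return [CONV_CONFIG] * num_rows  # every unit acts as a convolution unit
    if 1 + tag_rows > num_rows:
        raise ValueError("need one feature row plus the tag rows")
    # S320: one feature row in sorting mode, tag rows below it in shadow mode.
    return ([SORT_CONFIG] + [SHADOW_CONFIG] * tag_rows
            + [IDLE_CONFIG] * (num_rows - 1 - tag_rows))


print(configure_rows(determine_mode({"mode": "sort"}), 4))         # [1, 2, 0, 0]
print(configure_rows(determine_mode({"mode": "convolution"}), 4))  # [3, 3, 3, 3]
```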
Example four
Fig. 4 is a schematic structural diagram of an operation device of a systolic array system according to an embodiment of the present disclosure. The device specifically includes:
a mode determining module 410, configured to determine, according to a received working instruction, the working mode indicated by the working instruction;
and a sorting result output module 420, configured to operate as follows when the working mode is the sorting mode. After different configuration values have been assigned, according to the sorting control signal issued by the array controller, to the control registers of the first and second basic operation units in the systolic array, the module inputs the feature data of the feature buffer and the corresponding tag data into the systolic array step by step, group by group, for the synchronous sorting operation. After sorting is finished, it outputs the sorted feature data and the corresponding synchronous tag data through the output buffer and transmits them back to the system bus. The first and second basic operation units in the systolic array include comparators. The feature data of the feature buffer are a plurality of candidate detection box scores generated by a neural network model, and the tag data are the position index information corresponding to those scores.
In one implementation, the mode determining module is specifically configured to:
have the system bus send a sorting control instruction to the array controller according to the received working instruction, and determine the sorting control signal of the array controller according to the sorting control instruction.
In one implementation, the operation device of the systolic array system further includes:
a convolution result output module, configured to operate as follows when the working mode is the convolution mode. After convolution configuration values have been assigned, according to the convolution control signal issued by the array controller, to the control registers of all basic operation units in the systolic array, the module inputs the convolution feature data of the feature buffer and the convolution weight data of the weight buffer into the systolic array one by one, as two rows of corresponding data in a preset sequence, for convolution calculation. After the convolution operation is finished, it outputs the convolution data result through the output buffer and transmits it back to the system bus. The convolution weight data of the weight buffer are convolution window data arranged in a first preset format, and the convolution feature data of the feature buffer are image data arranged in a second preset format.
According to an embodiment of the present disclosure, the present disclosure also provides a readable storage medium.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. A systolic array, comprising:
a plurality of first basic operation units and a plurality of second basic operation units, connected in a matrix arrangement with row-wise correspondence and configured to perform a synchronous sorting operation on tagged feature data in a sorting mode; wherein:
each first basic operation unit comprises a first comparator, a first control register and a first result buffer, wherein the first comparator is configured to compare input feature data; the first control register is configured to output the comparison sign result, as a synchronous control signal, to the second basic operation unit corresponding to the current first basic operation unit, to output the comparison data result to the first result buffer and to the first feature input register of the next basic operation unit respectively, and, after sorting is finished, to output the temporary feature data finally stored in the first result buffer as sorted feature data;
each second basic operation unit comprises a second comparator, a second control register, a second result buffer and a synchronous weight input register, wherein the second comparator is configured to compare input tag data according to the synchronous control signal received by the synchronous weight input register; and the second control register is configured to output the comparison tag result to the second result buffer and to the second feature input register of the next basic operation unit respectively, and, after sorting is finished, to synchronously output the temporary tag data finally stored in the second result buffer as the synchronous tag data of the sorted feature data.
2. The systolic array of claim 1, wherein each first basic operation unit includes a first feature input register for storing feature data, and the first result buffer is used for temporarily storing temporary feature data; each second basic operation unit includes a second feature input register for storing tag data, and the second result buffer is used for temporarily storing temporary tag data;
correspondingly, the first control register is further configured to successively input the feature data of the first feature input register and the temporary feature data of the first result buffer into the first comparator; and the second control register is further configured to successively input the tag data of the second feature input register and the temporary tag data of the second result buffer into the second comparator.
3. The systolic array of claim 2, wherein the first comparator is specifically configured to:
successively compare the magnitude of the feature data input from the first feature input register with that of the temporary feature data input from the first result buffer; and, according to a preset sorting rule, take the feature data meeting a first sorting condition as new temporary feature data, and take the feature data meeting a second sorting condition as the feature data in the first feature input register of the next basic operation unit;
correspondingly, the first control register is further specifically configured to: output the new temporary feature data to the first result buffer, and output the feature data meeting the second sorting condition to the first feature input register of the next basic operation unit.
4. The systolic array of claim 3, wherein the second comparator is specifically configured to:
compare, according to the synchronous control signal received by the synchronous weight input register, the tag data input from the second feature input register with the temporary tag data input from the second result buffer; take the tag data meeting the first sorting condition as new temporary tag data, and take the tag data meeting the second sorting condition as the tag data in the second feature input register of the next basic operation unit;
correspondingly, the second control register is further specifically configured to: synchronously output the new temporary tag data to the second result buffer, and synchronously output the tag data meeting the second sorting condition to the second feature input register of the next basic operation unit.
5. The systolic array of claim 1, further comprising:
the plurality of first basic operation units and the plurality of second basic operation units serve as identical convolution basic operation units, each convolution basic operation unit comprising a convolution weight input register, a convolution feature input register, a convolution result buffer, a convolution control register and a multiplier-adder; wherein:
the convolution weight input register is configured to store convolution weight data;
the convolution feature input register is configured to store convolution feature data;
the convolution result buffer is configured to temporarily store temporary convolution data;
the multiplier-adder is configured to take the temporary convolution data stored in the convolution result buffer as the accumulation addend, successively multiply the convolution feature data input from the convolution feature input register by the convolution weight data input from the convolution weight input register, add the product to the addend, and take the result as new temporary convolution data;
the convolution control register is configured to control the input of the convolution weight data of the convolution weight input register, the convolution feature data of the convolution feature input register and the temporary convolution data stored in the convolution result buffer into the multiplier-adder; after the current calculation cycle is completed, to transmit the convolution feature data and the convolution weight data to the convolution feature input register and the convolution weight input register of the next convolution basic operation unit respectively; and, after the convolution operation is finished, to output the temporary convolution data finally stored in the convolution result buffer as the convolution data result.
6. A systolic array system, comprising: the systolic array of any one of claims 1-5, a system bus, an array controller, a feature buffer and an output buffer, configured to perform a synchronous sorting operation on tagged feature data in a sorting mode; wherein:
the system bus is connected to the array controller, the feature buffer and the output buffer respectively, and is configured to send a sorting control instruction to the array controller and, after sorting is finished, to receive the sorted feature data and the corresponding synchronous tag data uploaded by the output buffer;
the array controller is connected to the feature buffer, the systolic array and the output buffer respectively, and is configured to: after sending a sorting control signal according to the sorting control instruction, control the feature data and the corresponding tag data to be input into the feature buffer; after the control registers of the first basic operation units and the second basic operation units in the systolic array have been assigned different configuration values, input the feature data in the feature buffer and the corresponding tag data into the systolic array group by group for a synchronous sorting operation; and, after sorting is finished, output the sorted feature data and the corresponding synchronous tag data to the output buffer; wherein the feature data are a plurality of candidate detection box scores generated by a neural network model, and the tag data are position index information corresponding to the candidate detection box scores.
7. The systolic array system of claim 6, further comprising:
a weight buffer, connected to the system bus, the array controller and the systolic array respectively, and configured to complete a convolution operation in a convolution mode;
correspondingly, the system bus is also used for sending a convolution control instruction to the array controller and receiving a convolution data result uploaded by the output buffer after the convolution operation is finished;
the array controller is further configured to: after sending a convolution control signal according to the convolution control instruction, assign a convolution configuration value to the control register of each convolution basic operation unit in the systolic array; input the convolution feature data in the feature buffer and the convolution weight data in the weight buffer one by one in a preset order, as two rows of corresponding data, for convolution calculation; and, after the convolution calculation is finished, transmit the convolution data result to the output buffer; wherein the convolution weight data are convolution window data arranged in a first preset format, and the convolution feature data are image data arranged in a second preset format.
8. An operation method of a systolic array system, applied to the systolic array system and characterized by comprising the following steps:
determining a working mode indicated by a working instruction according to the received working instruction;
when the working mode is a sorting mode, after the sorting control signal sent by the array controller has been distributed as different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array, inputting the feature data of a feature buffer and the corresponding tag data into the systolic array group by group for a synchronous sorting operation, and, after sorting is finished, outputting the sorted feature data and the corresponding synchronous tag data through an output buffer and transmitting them back to a system bus; wherein the first basic operation units and the second basic operation units in the systolic array comprise comparators, the feature data of the feature buffer are a plurality of candidate detection box scores generated by a neural network model, and the tag data are position index information corresponding to the candidate detection box scores.
9. The method according to claim 8, wherein the determining the operation mode indicated by the operation instruction according to the received operation instruction comprises:
sending, by a system bus, a sorting control instruction to the array controller according to the received working instruction, and determining the sorting control signal of the array controller according to the sorting control instruction.
10. The method of claim 9, further comprising:
when the working mode is a convolution mode, after the convolution control signal sent by the array controller has been distributed as convolution configuration values to the control register of each basic operation unit in the systolic array, inputting the convolution feature data of the feature buffer and the convolution weight data of the weight buffer into the systolic array one by one in a preset order, as two rows of corresponding data, for convolution calculation, and, after the convolution operation is finished, outputting a convolution data result through the output buffer and transmitting it back to the system bus; wherein the convolution weight data of the weight buffer are convolution window data arranged in a first preset format, and the convolution feature data of the feature buffer are image data arranged in a second preset format.
11. An arithmetic device of a systolic array system, characterized in that the device comprises:
the mode determining module is used for determining a working mode indicated by the working instruction according to the received working instruction;
and a sorting result output module, configured to: when the working mode is the sorting mode, after a sorting control signal sent by the array controller has been distributed as different configuration values to the control registers of the first basic operation units and the second basic operation units in the systolic array, input the feature data of the feature buffer and the corresponding tag data into the systolic array group by group for a synchronous sorting operation; after sorting is finished, output the sorted feature data and the corresponding synchronous tag data through the output buffer and transmit them back to a system bus; wherein the first basic operation units and the second basic operation units in the systolic array comprise comparators, the feature data of the feature buffer are a plurality of candidate detection box scores generated by a neural network model, and the tag data are position index information corresponding to the candidate detection box scores.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to any one of claims 8-10.

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202211216188.4A CN115423084A (en) 2022-09-30 2022-09-30 Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium
US18/158,711 US20240126716A1 (en) 2022-09-30 2023-01-24 Systolic array, systolic array system, computiation method, device, and storage medium
EP23165020.1A EP4345638A1 (en) 2022-09-30 2023-03-29 Systolic array, systolic array system, computiation method, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211216188.4A CN115423084A (en) 2022-09-30 2022-09-30 Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium

Publications (1)

Publication Number Publication Date
CN115423084A 2022-12-02

Family

ID=84206559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211216188.4A Pending CN115423084A (en) 2022-09-30 2022-09-30 Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium

Country Status (1)

Country Link
CN (1) CN115423084A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450216A (en) * 2023-06-12 2023-07-18 上海灵动微电子股份有限公司 Local caching method for shared hardware operation unit
TWI828512B (en) * 2023-01-10 2024-01-01 力晶積成電子製造股份有限公司 Transport scheduling method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI828512B (en) * 2023-01-10 2024-01-01 力晶積成電子製造股份有限公司 Transport scheduling method and device
CN116450216A (en) * 2023-06-12 2023-07-18 上海灵动微电子股份有限公司 Local caching method for shared hardware operation unit
CN116450216B (en) * 2023-06-12 2023-08-29 上海灵动微电子股份有限公司 Local caching method for shared hardware operation unit

Similar Documents

Publication Publication Date Title
CN115423084A (en) Systolic array, systolic array system, method and apparatus for computing systolic array system, and storage medium
CN109543832B (en) Computing device and board card
CN110689126B (en) Device for executing neural network operation
CN106227507A (en) Calculating system and controller thereof
CN110738308A (en) neural network accelerators
CN115880132A (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN102567254B (en) The method that adopts dma controller to carry out data normalization processing
CN114399035A (en) Method for transferring data, direct memory access device and computer system
CN108647780B (en) Reconfigurable pooling operation module structure facing neural network and implementation method thereof
CN117217274A (en) Vector processor, neural network accelerator, chip and electronic equipment
CN115423085A (en) Pulse array, pulse array system, operation method and device thereof, and storage medium
CN109754076B (en) Multi-core brain-like chip
CN115905363A (en) Real-time data sorting system
EP0559100A2 (en) Method and apparatus for data distribution
CN113722668B (en) Processing unit, correlation device and tensor operation method
CN111260046B (en) Operation method, device and related product
CN111046321B (en) Photovoltaic power station operation and maintenance strategy optimization method and device
CN111258641B (en) Operation method, device and related product
EP4345638A1 (en) Systolic array, systolic array system, computiation method, device, and storage medium
CN110533176B (en) Caching device for neural network computation and related computing platform thereof
CN114239646A (en) Radiation source identification system based on plural neural networks
CN113516236A (en) VGG16 network parallel acceleration processing method based on ZYNQ platform
Chen et al. Research on recognition technology of transformer oil leakage based on improved YOLOV3
JPH06266675A (en) Data transfer device and multiprocessor system
CN111260070A (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination